FitLayout - GNN Integration Demo

(c) 2026 Radek Burget (burgetr@fit.vut.cz)

A demo repository that shows how to use FitLayout as a data source for PyTorch Geometric (PyG) ML applications. It implements the dataset preparation and the training of a GCNC network on this dataset. The dataset contains pages from the fictional bookstore available at https://books.toscrape.com/.

Installation

All the scripts assume a running FitLayout server. See the server folder for a sample server configuration that is started as a Docker container. The server hostname should be configured in src/config.py.

The scripts also require the dependencies listed in requirements.txt, which can be installed in the usual way using pip (pip install -r requirements.txt). The communication with the FitLayout server is implemented using the FitLayout Python Client library.

Dataset preparation

The dataset preparation, including the page rendering and annotation, is driven by the scripts in the src/prepare folder. See the separate README for more information.

The src/list_artifacts.py and src/list_tags.py scripts can be used for reviewing the repository contents.

Integration with PyTorch Geometric

The integration with PyG is implemented as two sample components:

Further, the learning itself is implemented in the following scripts:

  • src/convert_all.py converts all the AreaTrees in the repository to PyG graphs using the GraphCreator and saves them as PyTorch files.
  • src/test_train.py trains the GNN using the saved graphs and saves the trained network state.
  • src/test_predict.py tests the trained GNN using a testing dataset and evaluates the results.
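
The conversion step can be illustrated with a minimal, library-free sketch. It is not the repository's GraphCreator: the Area class, its fields, the chosen node features, and the tree_to_graph function are all hypothetical names invented for this example. It only shows the general idea of flattening a tree of page areas into node features, labels, and an edge list in the 2 x num_edges layout that PyG uses for edge_index.

```python
# Hypothetical sketch: the real GraphCreator and AreaTree formats are not
# shown in this README, so every name below is an illustrative assumption.
from dataclasses import dataclass, field


@dataclass
class Area:
    """Illustrative stand-in for one visual area of a rendered page."""
    x: float
    y: float
    width: float
    height: float
    font_size: float
    label: int  # class index of the assigned tag (assumed encoding)
    children: list["Area"] = field(default_factory=list)


def tree_to_graph(root: Area):
    """Flatten an area tree into node features, an edge list, and labels.

    Returns (x, edge_index, y) as plain lists, where edge_index has the
    2 x num_edges layout that PyTorch Geometric expects.
    """
    features, labels = [], []
    sources, targets = [], []

    def visit(area: Area) -> int:
        # Nodes are numbered in depth-first (pre-order) traversal order.
        index = len(features)
        features.append([area.x, area.y, area.width, area.height, area.font_size])
        labels.append(area.label)
        for child in area.children:
            child_index = visit(child)
            # Store both directions so messages can flow up and down the tree.
            sources.extend([index, child_index])
            targets.extend([child_index, index])
        return index

    visit(root)
    return features, [sources, targets], labels


# A tiny page: one root area with two child areas.
page = Area(0, 0, 1200, 800, 0, 0, children=[
    Area(20, 20, 400, 40, 24, 1),
    Area(20, 80, 100, 20, 16, 2),
])
x, edge_index, y = tree_to_graph(page)
print(len(x), edge_index, y)
```

With PyTorch and PyG installed, such lists could be wrapped as torch_geometric.data.Data(x=torch.tensor(x, dtype=torch.float), edge_index=torch.tensor(edge_index, dtype=torch.long), y=torch.tensor(y)) and stored with torch.save. Which features and edges the actual GraphCreator produces may differ from this sketch.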

Dataset export and import

Advanced repository contents management such as export and import can be performed using the provided command line interface tool:

python -i src/prepare/cli.py

Then, the cli.import() and cli.dump() functions can be used for the import and export.