The following scripts can be used for preparing the web page data on the FitLayout server:
- render.py - renders the pages listed in /data/book_urls.txt and creates the Page artifacts
- segment.py - performs simple segmentation and creates an AreaTree for each Page
- tagging.py - an example script that assigns tags to important areas (book titles and prices in this example) using simple DOM-based and presentation-based rules
All the scripts share the FitLayout server configuration defined in config.py.
The scripts can be executed from the root folder, e.g.:
python src/prepare/render.py

For an interactive CLI that allows managing the content of the configured repository, use:
python src/prepare/cli.py
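
The three preparation scripts can also be chained from a small driver, sketched below. This is not part of the repository: the helper names `build_commands` and `run_pipeline` are hypothetical, the order render, segment, tagging is inferred from the script descriptions, and the sketch assumes that segment.py and tagging.py live in src/prepare next to render.py.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: all three scripts are located in src/prepare (only render.py
# and cli.py are confirmed there by the README).
PREPARE_DIR = Path("src/prepare")

# Assumed order: pages must be rendered before they can be segmented,
# and areas must exist before they can be tagged.
STEPS = ["render.py", "segment.py", "tagging.py"]


def build_commands(steps=STEPS, prepare_dir=PREPARE_DIR, python=sys.executable):
    """Return one command (as an argument list) per preparation step."""
    return [[python, (prepare_dir / step).as_posix()] for step in steps]


def run_pipeline(commands):
    """Run the commands in order and stop at the first failing step."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Must be started from the repository root so the relative paths resolve.
    run_pipeline(build_commands())
```

Stopping at the first failure avoids running segmentation or tagging against pages that were never rendered.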