This is a scraper for the corporate registry of the country of Georgia. It is implemented in Python, using the excellent Scrapy framework.
Although it still has bugs, this scraper already significantly exceeds the capabilities of our old scraper, so please use this one from now on.
Setup should be pretty simple:
- `virtualenv geo_corp_scrape`
- `cd geo_corp_scrape`
- `source bin/activate` and clone the repo
- cd into the repo folder and `pip install -r requirements.txt`
- `cp settings.py.example settings.py` and edit to suit.
- Install poppler
- `scrapy crawl corps` -- That's it.
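
If the crawl fails to start, a quick pre-flight check can narrow down which setup step was missed. The following is only a sketch and not part of the scraper: the function name `missing_prerequisites` is made up for illustration, and it assumes poppler is used through its command-line tools (such as `pdftotext`), which this README does not confirm.

```python
import shutil
from pathlib import Path


def missing_prerequisites(repo_dir="."):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # settings.py is created by copying settings.py.example during setup.
    if not (Path(repo_dir) / "settings.py").is_file():
        problems.append("settings.py not found; copy settings.py.example and edit it")
    # Assumption: poppler is reached via its CLI tools, e.g. pdftotext.
    if shutil.which("pdftotext") is None:
        problems.append("pdftotext not on PATH; is poppler installed?")
    # The scrapy command should be available inside the active virtualenv.
    if shutil.which("scrapy") is None:
        problems.append("scrapy not on PATH; is the virtualenv active?")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("missing:", problem)
```

Run it from the repo folder with the virtualenv active; no output means none of the three checks failed.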
You should get a series of JSON files representing the scraped data.
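
To work with the results afterwards, something like the sketch below may help. It assumes that each output file holds a single valid JSON document and that the files sit together in one directory; the directory name `output` and the helper `load_scraped` are hypothetical, so check your `settings.py` for where the files actually land.

```python
import json
from pathlib import Path


def load_scraped(output_dir):
    """Yield (filename, parsed_data) for every .json file in output_dir."""
    for path in sorted(Path(output_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            yield path.name, json.load(fh)


if __name__ == "__main__":
    # "output" is a placeholder; point this at the scraper's real output folder.
    for name, data in load_scraped("output"):
        print(name, type(data).__name__)
```

If the files turn out to contain one JSON object per line instead, parse each line with `json.loads` rather than calling `json.load` on the whole file.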