Finstat Scraper collects structured financial and company profile data for Slovak businesses using their IČO identifiers. It helps teams quickly enrich datasets with revenue-by-year, classification, and workforce signals from a single workflow. If you need a reliable Finstat scraper for company research, lead enrichment, or market segmentation, this project is built for that.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
This project processes a list of Slovak IČOs (company identifiers) and returns one normalized record per company with financial history and key profile attributes. It solves the problem of manually looking up financials and firmographics across many companies, turning that work into a repeatable data pipeline. It’s designed for developers, analysts, growth teams, and data ops workflows that need consistent Slovak company data at scale.
- Accepts a batch list of IČOs and produces a clean JSON dataset (one item per IČO).
- Extracts revenue by year (availability varies by company) for quick time-series analysis.
- Captures identifiers and firmographics like VAT ID, address, and SK NACE classification.
- Includes employment signals (text and minimum employees) for sizing and segmentation.
- Works best for hundreds of IČOs per run to balance speed, stability, and data quality.
| Feature | Description |
|---|---|
| Batch IČO processing | Submit an array of Slovak IČOs and get structured results for each company. |
| Revenue by year | Extracts yearly revenues (e.g., revenue_2019 to revenue_2024 when available) for trend analysis. |
| VAT ID extraction | Captures IČ DPH (VAT ID) for compliance checks and matching across systems. |
| Address normalization | Returns a consistent company seat/address field for mapping and CRM enrichment. |
| SK NACE classification | Provides SK NACE codes to enable industry segmentation and reporting. |
| Employment indicators | Extracts employee-related text plus a minimum employee estimate for company sizing. |
| Established date | Includes company establishment date where available for maturity/tenure analysis. |
| Input validation | Filters invalid/empty IČO values and reports failures without breaking the whole run. |
| Field Name | Field Description |
|---|---|
| ico | Slovak company identifier provided as input (IČO). |
| revenue_2019 | Company revenue for 2019 (may be missing depending on availability). |
| revenue_2020 | Company revenue for 2020 (may be missing depending on availability). |
| revenue_2021 | Company revenue for 2021 (may be missing depending on availability). |
| revenue_2022 | Company revenue for 2022 (may be missing depending on availability). |
| revenue_2023 | Company revenue for 2023 (may be missing depending on availability). |
| revenue_2024 | Company revenue for 2024 (may be missing depending on availability). |
| ic_dph | VAT ID (IČ DPH) when available. |
| sidlo | Company registered seat / address. |
| sk_nace | SK NACE classification code (industry classification). |
| employees_text | Human-readable employment info text (e.g., ranges or labels). |
| min_employees | Minimum employee estimate derived from available employment data. |
| established_date | Company establishment date when available. |
[
  {
    "ico": "35772474",
    "revenue_2019": 1254300,
    "revenue_2020": 1328800,
    "revenue_2021": 1410200,
    "revenue_2022": 1556700,
    "revenue_2023": 1629000,
    "revenue_2024": null,
    "ic_dph": "SK2020271234",
    "sidlo": "Bratislava, Slovak Republic",
    "sk_nace": "62010",
    "employees_text": "25–49 employees",
    "min_employees": 25,
    "established_date": "2002-04-18"
  }
]
Finstat Scraper/
├── src/
│   ├── main.js
│   ├── cli.js
│   ├── config/
│   │   ├── default.json
│   │   └── schema.json
│   ├── core/
│   │   ├── run.js
│   │   ├── validator.js
│   │   └── logger.js
│   ├── scrapers/
│   │   ├── finstatClient.js
│   │   ├── companyParser.js
│   │   └── revenueParser.js
│   ├── transformers/
│   │   ├── normalizeFields.js
│   │   └── toOutputRecord.js
│   ├── outputs/
│   │   ├── writeJson.js
│   │   └── writeNdjson.js
│   └── utils/
│       ├── http.js
│       ├── retry.js
│       ├── time.js
│       └── strings.js
├── data/
│   ├── inputs.sample.json
│   └── output.sample.json
├── tests/
│   ├── fixtures/
│   │   ├── company.sample.html
│   │   └── company.sample.json
│   ├── parser.test.js
│   ├── transformer.test.js
│   └── validator.test.js
├── .gitignore
├── .env.example
├── package.json
├── package-lock.json
├── LICENSE
└── README.md
- B2B sales teams use it to enrich prospect lists with revenue and workforce signals, so they can prioritize accounts that match ICP size and budget.
- Market researchers use it to segment Slovak companies by SK NACE and revenue bands, so they can build sharper industry reports and TAM estimates.
- Data teams use it to standardize firmographic attributes across datasets, so they can improve matching, deduplication, and analytics quality.
- Compliance and operations use it to cross-check VAT IDs and registered seats, so they can reduce onboarding friction and verification effort.
- Growth analysts use it to track revenue changes across years, so they can spot trends, outliers, and fast-growing companies.
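
As a rough illustration of the revenue-trend use case, the sketch below computes year-over-year growth from a single output record. The function name `yearOverYearGrowth` is hypothetical and not part of the project; it only assumes the `revenue_YYYY` field naming shown in the sample output, and it skips missing years instead of interpolating them.

```javascript
// Hypothetical helper (not part of the project): derives year-over-year
// growth rates from the revenue_YYYY fields of one output record.
function yearOverYearGrowth(record) {
  const points = Object.keys(record)
    // Keep only revenue_YYYY keys that hold a number (null years are dropped).
    .filter((key) => /^revenue_\d{4}$/.test(key) && typeof record[key] === "number")
    .map((key) => ({ year: Number(key.slice("revenue_".length)), revenue: record[key] }))
    .sort((a, b) => a.year - b.year);

  const growth = [];
  for (let i = 1; i < points.length; i++) {
    const prev = points[i - 1];
    const curr = points[i];
    // Skip non-consecutive years and avoid dividing by zero.
    if (curr.year - prev.year !== 1 || prev.revenue === 0) continue;
    growth.push({ year: curr.year, growth: (curr.revenue - prev.revenue) / prev.revenue });
  }
  return growth;
}

const sampleRecord = {
  ico: "35772474",
  revenue_2019: 1254300,
  revenue_2020: 1328800,
  revenue_2021: 1410200,
  revenue_2022: 1556700,
  revenue_2023: 1629000,
  revenue_2024: null,
};
console.log(yearOverYearGrowth(sampleRecord));
```

Each entry holds the later year of a pair and the fractional change against the previous year, so multiplying by 100 gives a percentage.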
What input format does it accept?
Provide an array named icos containing Slovak IČO values as strings or numbers. The runner normalizes them to strings, trims whitespace, and skips empty entries.
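
The normalization described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual validator: the function name `normalizeIcos` and the `{ valid, skipped }` return shape are assumptions, and `12345678` is a placeholder value.

```javascript
// Hypothetical helper: normalizes the `icos` input array.
// The name and return shape are assumptions, not the project's real API.
function normalizeIcos(input) {
  const raw = Array.isArray(input && input.icos) ? input.icos : [];
  const valid = [];
  const skipped = [];
  for (const value of raw) {
    // Only strings and numbers are accepted; anything else is skipped.
    if (typeof value !== "string" && typeof value !== "number") {
      skipped.push(value);
      continue;
    }
    const ico = String(value).trim(); // normalize to a trimmed string
    if (ico === "") {
      skipped.push(value); // empty or whitespace-only entry
      continue;
    }
    valid.push(ico);
  }
  return { valid, skipped };
}

const result = normalizeIcos({ icos: ["35772474", " 12345678 ", "", 12345678, null] });
console.log(result.valid);   // normalized string IČOs
console.log(result.skipped); // entries that were dropped
```

Keeping the skipped entries in a separate list makes it possible to report them without interrupting the rest of the run.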
Why are some revenue years missing or null?
Not every company has the same set of published financial years. The scraper returns only what’s available, leaving missing years as null (or omitting them if you configure sparse output).
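
The difference between null-filled and sparse output can be illustrated with a small sketch. The option name `sparse`, the function name `buildRevenueFields`, and the fixed year range are assumptions made for this example; the project's real configuration keys may differ.

```javascript
// Hypothetical year range and option name; the project's real
// configuration keys may differ.
const REVENUE_YEARS = [2019, 2020, 2021, 2022, 2023, 2024];

function buildRevenueFields(revenueByYear, { sparse = false } = {}) {
  const fields = {};
  for (const year of REVENUE_YEARS) {
    const value = revenueByYear[year];
    const missing = value === undefined || value === null;
    if (missing && sparse) continue; // sparse mode: omit the key entirely
    fields[`revenue_${year}`] = missing ? null : value;
  }
  return fields;
}

const scraped = { 2022: 1556700, 2023: 1629000 };
console.log(buildRevenueFields(scraped));                   // missing years present as null
console.log(buildRevenueFields(scraped, { sparse: true })); // only revenue_2022 and revenue_2023
```

Null-filled records keep a stable column set for tabular exports, while sparse records are smaller and make "not published" visually distinct from a present value.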
How many IČOs should I run at once?
For stable operation, aim for a few hundred IČOs per run. Larger batches can be split into chunks to reduce retries, rate limits, and partial failures.
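
Splitting a large list into run-sized chunks could look like the sketch below. The helper name `chunkIcos` and the default size of 300 are assumptions chosen to match the "few hundred per run" guidance; neither is a project setting.

```javascript
// Hypothetical helper: splits a list of IČOs into fixed-size chunks.
// The default of 300 is an assumption, not a documented project setting.
function chunkIcos(icos, size = 300) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks = [];
  for (let i = 0; i < icos.length; i += size) {
    chunks.push(icos.slice(i, i + size));
  }
  return chunks;
}

// Placeholder IČO strings, generated only for the demonstration.
const manyIcos = Array.from({ length: 700 }, (_, i) => String(10000000 + i));
console.log(chunkIcos(manyIcos).map((chunk) => chunk.length)); // [ 300, 300, 100 ]
```

Each chunk can then be submitted as its own run, so a failure affects at most one chunk.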
How does employment data work?
Employment info is returned in a human-readable field (employees_text) and a numeric minimum estimate (min_employees). If only qualitative ranges are available, min_employees reflects the lower bound of that range.
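
One plausible way to derive the lower bound from a range label is to take the first integer in the text. The sketch below assumes exactly that; the function name `minEmployeesFromText` is hypothetical, and the real parser may handle more formats (for example numbers written with thousands separators, which this sketch does not).

```javascript
// Hypothetical helper: takes the first run of digits in the label as the
// lower bound. Does not handle thousands separators such as "1 000".
function minEmployeesFromText(text) {
  if (typeof text !== "string") return null;
  const match = text.match(/\d+/);
  return match ? Number(match[0]) : null;
}

console.log(minEmployeesFromText("25–49 employees")); // 25
console.log(minEmployeesFromText("not disclosed"));   // null
```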
Primary Metric: Typical throughput of 2.0–3.5 companies/second on a standard workstation and network connection, depending on company page complexity and retry rate.
Reliability Metric: 97–99% success rate on clean IČO lists (validated and deduplicated), with failures usually caused by invalid IČOs or incomplete company pages.
Efficiency Metric: Average memory usage stays within 200–350 MB for runs of ~300 IČOs, with streaming output options (NDJSON) to keep large exports lightweight.
Quality Metric: Revenue field completeness commonly lands around 70–90% across mixed datasets, while identifiers (IČO, address, SK NACE) typically exceed 90% completeness when available on the source page.
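
To show why the NDJSON option mentioned under the efficiency metric keeps exports lightweight, here is a minimal sketch of line-by-line serialization. The generator name `ndjsonLines` is hypothetical and is not the project's `writeNdjson.js`; the point is only that each record is emitted as its own line, so the full array never has to be held as one JSON string.

```javascript
// Hypothetical generator: yields one JSON line per record so that records
// can be written as they are produced.
function* ndjsonLines(records) {
  for (const record of records) {
    yield JSON.stringify(record) + "\n";
  }
}

const exampleRecords = [
  { ico: "35772474", min_employees: 25 },
  { ico: "12345678", min_employees: null }, // placeholder IČO
];
for (const line of ndjsonLines(exampleRecords)) {
  process.stdout.write(line);
}
```

Because `ndjsonLines` accepts any iterable, it can also consume another generator that produces records one company at a time.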
