Finstat Scraper

Finstat Scraper collects structured financial and company profile data for Slovak businesses using their IČO identifiers. It helps teams quickly enrich datasets with revenue-by-year, classification, and workforce signals from a single workflow. If you need a reliable Finstat scraper for company research, lead enrichment, or market segmentation, this project is built for that.

Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for a finstat-scraper, you've just found your team. Let's chat.

Introduction

This project processes a list of Slovak IČOs (company identifiers) and returns one normalized record per company with financial history and key profile attributes. It solves the problem of manually looking up financials and firmographics across many companies, turning that work into a repeatable data pipeline. It’s designed for developers, analysts, growth teams, and data ops workflows that need consistent Slovak company data at scale.

Built for Slovak company enrichment

Accepts a batch list of IČOs and produces a clean JSON dataset (one item per IČO).
Extracts revenue by year (availability varies by company) for quick time-series analysis.
Captures identifiers and firmographics like VAT ID, address, and SK NACE classification.
Includes employment signals (text and minimum employees) for sizing and segmentation.
Works best for hundreds of IČOs per run to balance speed, stability, and data quality.

Features

Feature	Description
Batch IČO processing	Submit an array of Slovak IČOs and get structured results for each company.
Revenue by year	Extracts yearly revenues (e.g., revenue_2019 to revenue_2024 when available) for trend analysis.
VAT ID extraction	Captures IČ DPH (VAT ID) for compliance checks and matching across systems.
Address normalization	Returns a consistent company seat/address field for mapping and CRM enrichment.
SK NACE classification	Provides SK NACE codes to enable industry segmentation and reporting.
Employment indicators	Extracts employee-related text plus a minimum employee estimate for company sizing.
Established date	Includes company establishment date where available for maturity/tenure analysis.
Input validation	Filters invalid/empty IČO values and reports failures without breaking the whole run.
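The "reports failures without breaking the whole run" behavior can be sketched roughly as follows. This is a minimal illustration, not the project's actual runner: `runBatch` and `scrapeOne` are hypothetical names, and `scrapeOne` stands for any async function that takes an IČO string and resolves to a record.

```javascript
// Hypothetical sketch: isolate per-IČO failures so one bad lookup
// does not abort the batch. Names here are assumptions, not project APIs.
async function runBatch(icos, scrapeOne) {
  // The async wrapper turns synchronous throws into rejections as well.
  const settled = await Promise.allSettled(
    icos.map(async (ico) => scrapeOne(ico))
  );

  const records = [];
  const failures = [];
  settled.forEach((outcome, index) => {
    if (outcome.status === "fulfilled") {
      records.push(outcome.value);
    } else {
      const reason = outcome.reason;
      failures.push({
        ico: icos[index],
        error: reason instanceof Error ? reason.message : String(reason),
      });
    }
  });
  return { records, failures };
}
```

Note that this sketch starts all lookups at once; a real run would likely cap concurrency and add retries.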

What Data This Scraper Extracts

Field Name	Field Description
ico	Slovak company identifier provided as input (IČO).
revenue_2019	Company revenue for 2019 (may be missing depending on availability).
revenue_2020	Company revenue for 2020 (may be missing depending on availability).
revenue_2021	Company revenue for 2021 (may be missing depending on availability).
revenue_2022	Company revenue for 2022 (may be missing depending on availability).
revenue_2023	Company revenue for 2023 (may be missing depending on availability).
revenue_2024	Company revenue for 2024 (may be missing depending on availability).
ic_dph	VAT ID (IČ DPH) when available.
sidlo	Company registered seat / address.
sk_nace	SK NACE classification code (industry classification).
employees_text	Human-readable employment info text (e.g., ranges or labels).
min_employees	Minimum employee estimate derived from available employment data.
established_date	Company establishment date when available.

Example Output

[
  {
    "ico": "35772474",
    "revenue_2019": 1254300,
    "revenue_2020": 1328800,
    "revenue_2021": 1410200,
    "revenue_2022": 1556700,
    "revenue_2023": 1629000,
    "revenue_2024": null,
    "ic_dph": "SK2020271234",
    "sidlo": "Bratislava, Slovak Republic",
    "sk_nace": "62010",
    "employees_text": "25–49 employees",
    "min_employees": 25,
    "established_date": "2002-04-18"
  }
]

Directory Structure Tree

Finstat Scraper/
├── src/
│   ├── main.js
│   ├── cli.js
│   ├── config/
│   │   ├── default.json
│   │   └── schema.json
│   ├── core/
│   │   ├── run.js
│   │   ├── validator.js
│   │   └── logger.js
│   ├── scrapers/
│   │   ├── finstatClient.js
│   │   ├── companyParser.js
│   │   └── revenueParser.js
│   ├── transformers/
│   │   ├── normalizeFields.js
│   │   └── toOutputRecord.js
│   ├── outputs/
│   │   ├── writeJson.js
│   │   └── writeNdjson.js
│   └── utils/
│       ├── http.js
│       ├── retry.js
│       ├── time.js
│       └── strings.js
├── data/
│   ├── inputs.sample.json
│   └── output.sample.json
├── tests/
│   ├── fixtures/
│   │   ├── company.sample.html
│   │   └── company.sample.json
│   ├── parser.test.js
│   ├── transformer.test.js
│   └── validator.test.js
├── .gitignore
├── .env.example
├── package.json
├── package-lock.json
├── LICENSE
└── README.md

Use Cases

B2B sales teams use it to enrich prospect lists with revenue and workforce signals, so they can prioritize accounts that match their ideal customer profile (ICP) for size and budget.
Market researchers use it to segment Slovak companies by SK NACE and revenue bands, so they can build sharper industry reports and total addressable market (TAM) estimates.
Data teams use it to standardize firmographic attributes across datasets, so they can improve matching, deduplication, and analytics quality.
Compliance and operations use it to cross-check VAT IDs and registered seats, so they can reduce onboarding friction and verification effort.
Growth analysts use it to track revenue changes across years, so they can spot trends, outliers, and fast-growing companies.
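As an illustration of the revenue-trend use case, the sketch below derives year-over-year growth from a single output record. It assumes only the `revenue_YYYY` field naming shown in this README; the function names are hypothetical.

```javascript
// Collect numeric revenue_YYYY fields into a sorted [{ year, revenue }] series.
// Null or missing years are skipped.
function revenueSeries(record) {
  return Object.keys(record)
    .map((key) => /^revenue_(\d{4})$/.exec(key))
    .filter((match) => match !== null && typeof record[match[0]] === "number")
    .map((match) => ({ year: Number(match[1]), revenue: record[match[0]] }))
    .sort((a, b) => a.year - b.year);
}

// Relative change versus the previous year, as a fraction (0.05 = +5%).
// Pairs with a gap between years or a zero base are skipped.
function yearOverYearGrowth(record) {
  const series = revenueSeries(record);
  const growth = [];
  for (let i = 1; i < series.length; i++) {
    const prev = series[i - 1];
    const curr = series[i];
    if (curr.year - prev.year !== 1 || prev.revenue === 0) continue;
    growth.push({
      year: curr.year,
      growth: (curr.revenue - prev.revenue) / prev.revenue,
    });
  }
  return growth;
}
```

Applied to the example record above, this yields one entry for each of 2020 through 2023, since `revenue_2024` is null.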

FAQs

What input format does it accept? Provide an array named icos containing Slovak IČO values as strings or numbers. The runner normalizes them to strings, trims whitespace, and skips empty entries.
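A minimal sketch of that normalization step is shown below. The function name, the 8-digit format check, and the deduplication are assumptions added for illustration; the project's own validator may apply different rules.

```javascript
// Hypothetical sketch of input normalization for { icos: [...] }.
function normalizeIcos(input) {
  const raw = input && Array.isArray(input.icos) ? input.icos : [];
  const valid = [];
  const rejected = [];
  const seen = new Set();

  for (const value of raw) {
    if (value === null || value === undefined) continue;
    const ico = String(value).trim(); // numbers become strings
    if (ico === "") continue; // skip empty entries

    // Assumption: a well-formed IČO is exactly 8 digits.
    // Numeric inputs lose leading zeros, so they may fail this check.
    if (!/^\d{8}$/.test(ico)) {
      rejected.push(ico);
      continue;
    }
    if (seen.has(ico)) continue; // drop duplicates (assumption)
    seen.add(ico);
    valid.push(ico);
  }
  return { valid, rejected };
}
```

For example, `normalizeIcos({ icos: [" 35772474 ", 35772474, ""] })` keeps a single `"35772474"` entry in `valid`.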

Why are some revenue years missing or null? Not every company has the same set of published financial years. The scraper returns only what’s available, leaving missing years as null (or omitting them if you configure sparse output).
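If you prefer the sparse form but received nulls, the conversion is easy to do downstream. The helper below is a hypothetical sketch of that idea, not the scraper's own configuration option.

```javascript
// Hypothetical helper: drop null/undefined fields from one record,
// approximating a sparse output shape.
function toSparseRecord(record) {
  return Object.fromEntries(
    Object.entries(record).filter(
      ([, value]) => value !== null && value !== undefined
    )
  );
}
```

With the example record above, `toSparseRecord(record)` would omit `revenue_2024` and leave the other fields unchanged.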

How many IČOs should I run at once? For stable operation, aim for a few hundred IČOs per run. Larger batches can be split into chunks to reduce retries, rate limits, and partial failures.
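Splitting a large list into run-sized chunks could look like the sketch below. The `chunk` helper is illustrative, and the chunk size of 300 in the usage note is only an example of "a few hundred".

```javascript
// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}
```

For instance, `chunk(allIcos, 300)` on a list of 650 IČOs produces three batches of 300, 300, and 50 items, each of which can be submitted as a separate run.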

How does employment data work? Employment info is returned in a human-readable field (employees_text) and a numeric minimum estimate (min_employees). If only qualitative ranges are available, min_employees reflects the lower bound of that range.
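The lower-bound rule can be sketched as below. This assumes the employment text begins with the range's lower number (as in the example `"25–49 employees"`) and that thousands may be written with a space separator; the real label formats on the source pages may differ, and `parseMinEmployees` is a hypothetical name.

```javascript
// Hypothetical sketch: take the first number in the text as the lower bound.
function parseMinEmployees(text) {
  if (typeof text !== "string") return null;
  // Join space-separated thousands groups, e.g. "1 000" becomes "1000".
  const compact = text.replace(/(\d)\s(?=\d{3}(?:\D|$))/g, "$1");
  const match = compact.match(/\d+/);
  return match ? Number(match[0]) : null;
}
```

Under these assumptions, `parseMinEmployees("25–49 employees")` returns `25`, and text with no digits returns `null`.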

Performance Benchmarks and Results

Primary Metric: Typical throughput of 2.0–3.5 companies/second on a standard workstation and network connection, depending on company page complexity and retry rate.

Reliability Metric: 97–99% success rate on clean IČO lists (validated and deduplicated), with failures usually caused by invalid IČOs or incomplete company pages.

Efficiency Metric: Average memory usage stays under 350 MB (typically 200–350 MB) for runs of ~300 IČOs, with streaming output options (NDJSON) to keep large exports lightweight.
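NDJSON keeps exports lightweight because each record is serialized and written on its own line, so the full dataset never has to be held as one JSON array. A minimal sketch of the idea, assuming any object with a `write(string)` method (such as `process.stdout` or a file write stream); this is not the project's own `writeNdjson.js`:

```javascript
// Hypothetical sketch: write one JSON document per line.
// `records` can be any iterable; `stream` needs only a write(string) method.
// Backpressure handling is omitted for brevity.
function writeNdjson(records, stream) {
  let count = 0;
  for (const record of records) {
    stream.write(JSON.stringify(record) + "\n");
    count += 1;
  }
  return count; // number of lines written
}
```

A consumer can then process the file line by line with `JSON.parse` instead of loading it all at once.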

Quality Metric: Revenue field completeness commonly lands around 70–90% across mixed datasets, while identifier and profile fields (IČO, address, SK NACE) typically exceed 90% completeness when available on the source page.

"Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time."

Nathan Pennington
Marketer
★★★★★

"Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on."

Eliza
SEO Affiliate Expert
★★★★★

"Exceptional results, clear communication, and flawless delivery. Bitbash nailed it."

Syed
Digital Strategist
★★★★★
