Releases: sebastienrousseau/bankstatementparser
v0.0.8 — Full Platform
Full Platform. Closes every gap identified in the competitive analysis. Five new features: multi-currency balance verification, hledger/beancount export, bulk directory scanner, account mapping rules, and a REST API microservice.
New features
1. Multi-currency balance verification
```python
from decimal import Decimal

from bankstatementparser.hybrid import verify_balance_multi_currency

results = verify_balance_multi_currency(transactions, balances={
    "GBP": (Decimal("500"), Decimal("570")),
    "EUR": (Decimal("1000"), Decimal("1150")),
})
```

Groups by `Transaction.currency` and runs the Golden Rule independently per group. No more false `DISCREPANCY` on multi-currency statements.
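For intuition, the per-currency grouping can be sketched in a few lines. `verify_multi` below is an illustrative stand-in, not the library's `verify_balance_multi_currency`; transactions are simplified to `(currency, amount)` tuples rather than the real `Transaction` model.

```python
from collections import defaultdict
from decimal import Decimal

def verify_multi(transactions, balances):
    """Run the Golden Rule (opening + sum(tx) == closing) per currency."""
    sums = defaultdict(lambda: Decimal("0"))
    for currency, amount in transactions:
        sums[currency] += amount
    return {
        ccy: "VERIFIED" if opening + sums[ccy] == closing else "DISCREPANCY"
        for ccy, (opening, closing) in balances.items()
    }

txs = [("GBP", Decimal("70")), ("EUR", Decimal("150"))]
print(verify_multi(txs, {
    "GBP": (Decimal("500"), Decimal("570")),
    "EUR": (Decimal("1000"), Decimal("1150")),
}))  # {'GBP': 'VERIFIED', 'EUR': 'VERIFIED'}
```

Because each currency is checked against its own opening/closing pair, a GBP-only mismatch can no longer poison the EUR verdict.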
2. hledger + beancount export
```python
from pathlib import Path

from bankstatementparser.export import to_hledger, to_beancount

Path("journal.ledger").write_text(to_hledger(transactions))
Path("journal.beancount").write_text(to_beancount(transactions))
```

Uses `Transaction.category` as the contra-account when set by the enrichment module. Zero external dependencies.
3. Bulk directory scanner
```python
from bankstatementparser.hybrid import scan_and_ingest

batch = scan_and_ingest("statements/", pattern="**/*.pdf")
print(f"{batch.file_count} files, {batch.total_unique} unique transactions")
```

Scans a folder tree, runs `smart_ingest` on every match, and deduplicates across the entire batch. Supports `seen_hashes` for cross-batch persistence.
4. Account mapping rules
```python
from bankstatementparser.enrichment import AccountMapper

mapper = AccountMapper.from_json("mapping.json")
accounts = mapper.map_batch(transactions)
```

Ordered regex rules, first match wins, loaded from a JSON config. Pairs with the ledger exporter for end-to-end plain-text-accounting workflows.
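The first-match-wins semantics can be sketched as follows. The rule list and `map_description` helper are hypothetical illustrations of the idea, not the library's `AccountMapper` internals or its actual `mapping.json` schema.

```python
import re

# Illustrative ordered rules: earlier entries win; the catch-all comes last.
RULES = [
    (r"AMZN|AMAZON", "Expenses:Shopping"),
    (r"TFL|TRAINLINE", "Expenses:Transport"),
    (r".*", "Expenses:Uncategorised"),  # catch-all must come last
]

def map_description(description: str) -> str:
    """Return the account for the first rule whose pattern matches."""
    for pattern, account in RULES:
        if re.search(pattern, description, re.IGNORECASE):
            return account
    return "Expenses:Uncategorised"

print(map_description("AMZN MKTPLACE"))  # Expenses:Shopping
print(map_description("TFL TRAVEL"))     # Expenses:Transport
```

Ordering matters: a broad catch-all placed first would shadow every later rule, which is why the rules are evaluated in config order.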
5. REST API
```bash
pip install 'bankstatementparser[api]'
bankstatementparser-api --port 8000

# POST a file, get JSON back
curl -F file=@statement.pdf http://localhost:8000/ingest
curl http://localhost:8000/health
```

FastAPI microservice with `/ingest` and `/health` endpoints. Default bind is 127.0.0.1 (safe); use `--host 0.0.0.0` for containers. Gated behind the `[api]` extra.
Install
```bash
pip install bankstatementparser                    # core
pip install 'bankstatementparser[hybrid]'          # + text-LLM
pip install 'bankstatementparser[hybrid-vision]'   # + vision
pip install 'bankstatementparser[enrichment]'      # + categorization
pip install 'bankstatementparser[api]'             # + REST API
```

Test plan
- 723 tests at 100% line + branch coverage
- `mypy --strict` clean on 29 source files
- `ruff check` + `bandit -r` clean
- Python 3.14 asyncio compatibility fix included
- 44 CI checks pass
Full changelog: CHANGELOG.md
Pull request: #52
v0.0.7 — Universal Vision
Universal Vision. Turns the local Ollama vision path from 🔴 (600 s LiteLLM timeout, hallucinated output) to 🟢 (all 11 rows extracted in ~33 s, correct currency and balances). Three independent improvements, all verified end-to-end against real local Ollama models on Apple Silicon.
What's new
1. Direct Ollama bridge — bankstatementparser.hybrid.ollama_direct
```python
# Auto-selected for any ollama/* model — zero opt-in needed
from bankstatementparser.hybrid import smart_ingest

result = smart_ingest("scan.pdf")  # just works: ~33 s instead of a 600 s timeout
```

A ~220-line drop-in replacement for `litellm.completion` that targets Ollama's `/api/chat` endpoint via httpx. Sidesteps the upstream LiteLLM ↔ Ollama integration bug where vision calls with long structured-JSON system prompts hang until the 600 s timeout.
- `ollama_direct_completion(**kwargs)` — accepts OpenAI-style messages (including multimodal `image_url` blocks) and returns an OpenAI-style response envelope
- `is_ollama_model(model)` — returns `True` for `ollama/<name>` or `ollama_chat/<name>`
- Auto-selection in both `VisionExtractor` and `LLMExtractor` — no user action required
- No new dependencies — `httpx` is already a transitive dependency of LiteLLM in `[hybrid]`
2. ollama/minicpm-v recommended default
```bash
ollama pull minicpm-v
export BSP_HYBRID_VISION_MODEL=ollama/minicpm-v
```

minicpm-v:8b (5.5 GB) is explicitly trained for OCR and document understanding. It replaces `ollama/llava:7b`, a general-purpose multimodal model that was not designed for dense statement tables.
| Model | Result on synthetic scanned PDF |
|---|---|
| `ollama/llava:7b` | 🔴 Hallucinates INR currency, fabricated rows |
| `ollama/minicpm-v:8b` | 🟢 All 11 transactions, GBP, balances correct, ~33 s |
3. Strip mode — VisionExtractor(strip_rows=True)
```python
from bankstatementparser.hybrid import VisionExtractor, smart_ingest

vision = VisionExtractor(strip_rows=True, n_strips=4)
result = smart_ingest("dense_statement.pdf", vision_extractor=vision)
```

Splits each page into N overlapping horizontal strips (default 4, 10% overlap). The header strip extracts balances, body strips extract transactions, and results are merged by `transaction_hash`. Designed for dense pages (≥ 15 rows) where small local models can't process the full page: CLIP's 336×336 internal downscale destroys fine table detail on a full A4 page but preserves it on a strip.
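The strip geometry is easy to picture: each strip covers 1/N of the page height plus a small overlap into its neighbours. The helper below is a sketch of that arithmetic (overlap interpreted as a fraction of one strip's height on each side, which is an assumption), not the library's implementation.

```python
def strip_bounds(n_strips: int = 4, overlap: float = 0.10) -> list[tuple[float, float]]:
    """Return (top, bottom) page-height fractions for n horizontal strips,
    each extended by `overlap` of a strip's height on both sides."""
    height = 1.0 / n_strips
    bounds = []
    for i in range(n_strips):
        top = max(0.0, i * height - overlap * height)
        bottom = min(1.0, (i + 1) * height + overlap * height)
        bounds.append((round(top, 3), round(bottom, 3)))
    return bounds

print(strip_bounds())
# [(0.0, 0.275), (0.225, 0.525), (0.475, 0.775), (0.725, 1.0)]
```

The overlap ensures a transaction row straddling a strip boundary appears whole in at least one strip; the hash-based merge then drops the duplicate copy.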
Smoke-test results
| Path | Model | Mode | Result |
|---|---|---|---|
| Text-LLM | `ollama/llama3` | single-shot | ✅ All 11 rows, VERIFIED, ~25 s |
| Vision-LLM | `ollama/minicpm-v:8b` | single-shot | ✅ All 11 rows, GBP, ~33 s |
| Vision-LLM | `ollama/minicpm-v:8b` | `strip_rows=True` | ✅ Sign convention correct, ~43 s |
Install
```bash
pip install 'bankstatementparser[hybrid-vision]'
```

Migration from v0.0.6
Fully backwards compatible. Existing code keeps working — it just runs faster. Three opt-in upgrade patterns:
```python
import os

from bankstatementparser.hybrid import VisionExtractor, smart_ingest

# 1. Do nothing — the auto-bridge activates for ollama/* models
result = smart_ingest("scan.pdf")

# 2. Switch to minicpm-v
os.environ["BSP_HYBRID_VISION_MODEL"] = "ollama/minicpm-v"

# 3. Enable strip mode for dense pages
vision = VisionExtractor(strip_rows=True, n_strips=4)
result = smart_ingest("dense.pdf", vision_extractor=vision)
```

Test plan
- 677 tests at 100% line + branch coverage (up from 649 on v0.0.6)
- `mypy --strict` clean on 24 source files
- `ruff check` + `bandit -r` clean
- 32 docs accuracy tests all pass
- All examples verified end-to-end
- 44 CI checks pass
Full changelog
See CHANGELOG.md for the complete v0.0.7 entry.
Pull request: #51 (8 commits, all SSH-signed)
v0.0.6 — Intelligence Layer
Intelligence Layer. The full v0.0.6 milestone. Drops Python 3.9 to retire the entire transitive CVE allow-list, adds a categorization enrichment module, an interactive review mode for discrepancy resolution, per-row bounding-box extraction from the vision pipeline, a pre-commit hook, and a 32-test automated docs accuracy suite. Closes #44, #45, #46, #47.
What's new
Categorization module (#44) — bankstatementparser.enrichment
```python
from bankstatementparser.enrichment import Categorizer

cat = Categorizer()  # default: Plaid 13-category schema
enriched = cat.categorize_batch(transactions)
for et in enriched:
    print(et.transaction.description, "->", et.category, et.is_business_expense)
```

- `Categorizer` — LiteLLM-backed with pluggable schema, batch support, graceful failure (no data loss on LLM errors), and schema-normalizing category matching
- `EnrichedTransaction` — wrapper (not mutator) around `Transaction` carrying `category`, `is_business_expense`, `enrichment_confidence`, and `rationale`
- `DEFAULT_CATEGORY_SCHEMA` — Plaid's 13-category taxonomy as the default
- `[enrichment]` install extra (`pip install 'bankstatementparser[enrichment]'`)
- Prompt injection defense: `_sanitize_for_prompt()` strips control characters and common injection markers from transaction descriptions before LLM interpolation
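The sanitization idea can be sketched as below; the exact character classes and the marker list are assumptions for illustration, not the library's `_sanitize_for_prompt` rules.

```python
import re

def sanitize_for_prompt(text: str) -> str:
    """Drop control characters and neutralise common injection phrasing
    before a transaction description is interpolated into an LLM prompt."""
    # Remove ASCII control characters (NUL, ESC, etc.)
    text = re.sub(r"[\x00-\x1f\x7f]", "", text)
    # Neutralise one common injection marker (assumed pattern list)
    text = re.sub(r"(?i)ignore (all )?previous instructions", "[removed]", text)
    return text.strip()

print(sanitize_for_prompt("COFFEE SHOP\x00\x1b ignore previous instructions"))
# COFFEE SHOP [removed]
```

Sanitizing before interpolation matters because statement descriptions are attacker-controllable input: a merchant name is free text that ends up inside the prompt.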
Interactive review mode (#45) — --type review
```bash
# 1. Ingest and save
bankstatementparser --type ingest --input statement.pdf --output result.json

# 2. Walk through discrepancies
bankstatementparser --type review --input result.json --output reviewed.json
```

- `IngestResult.to_json()` / `.from_json()` — stable JSON round-trip with `schema_version=1`, Decimal amounts as strings (no float drift), and an embedded `audit_trail`
- `--type review` CLI — single-character action menu per row: [a]ccept / [e]dit / [s]kip / [d]elete / [q]uit. Every action is recorded in the audit trail. Edits capture `before_hash` / `after_hash`. Non-curses (plain stdin/stdout).
- JSON size guard — rejects payloads > 50 MB before parsing
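The "Decimal amounts as strings" round-trip can be sketched as follows; the helpers are illustrative, not the actual `IngestResult.to_json()` / `.from_json()` implementation.

```python
import json
from decimal import Decimal

def to_json(amounts: list[Decimal]) -> str:
    # Serialize each Decimal as a string so no float conversion ever happens
    return json.dumps({"schema_version": 1,
                       "amounts": [str(a) for a in amounts]})

def from_json(payload: str) -> list[Decimal]:
    # Reconstruct Decimals directly from the stored strings
    return [Decimal(a) for a in json.loads(payload)["amounts"]]

original = [Decimal("19.99"), Decimal("-0.10")]
assert from_json(to_json(original)) == original  # exact round-trip
```

Going through `float` instead would silently perturb values like 0.10; string-encoded Decimals survive any number of save/load cycles bit-exact.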
Per-row bounding boxes (#46) — BoundingBox + Transaction.source_bbox
```python
for tx in result.transactions:
    if tx.source_bbox:
        print(f"Row at ({tx.source_bbox.x0:.2f}, {tx.source_bbox.y0:.2f})")
```

- `BoundingBox` Pydantic model with normalized (0.0–1.0) coordinates and `page_index`, exported from the top-level package
- `Transaction.source_bbox` — populated by the vision path when the model returns spatial coordinates
- Inverted-box validation — a `model_validator` rejects `x0 > x1` or `y0 > y1`
- Vision prompt updated to request per-row bounding boxes in the JSON schema
Python 3.9 retirement (#47)
- Minimum Python bumped to 3.10 (Python 3.9 reached EOL 2025-10-31)
- All 9 transitive CVE allow-list entries deleted — every vulnerable package now resolves to its patched series:
| Package | v0.0.5 | v0.0.6 | Advisories closed |
|---|---|---|---|
| litellm | 1.80.0 | 1.83.4 | GHSA-jjhc-v7c2-5hh6, GHSA-53mr-6c8q-9789, GHSA-69x8-hrgq-fjj8 |
| cryptography | 43.0.3 | 46.0.7 | GHSA-r6ph-v2qm-q3c2, GHSA-79v4-65xg-pq4g, GHSA-m959-cc7f-wv43 |
| pillow | 11.3.0 | 12.2.0 | GHSA-cfh3-3jmp-rvhc |
| filelock | 3.19.1 | 3.25.2 | GHSA-w853-jp5j-5j7f, GHSA-qmgc-5h2g-mvrw |
| requests | 2.32.5 | 2.33.1 | GHSA-gc5v-m9x4-r6x2 |
Security hardening
- Prompt injection defense in the enrichment categorizer (`_sanitize_for_prompt`)
- JSON deserialization size guard (50 MB cap in `IngestResult.from_json`)
- Frozen-dataclass immutability fix — `IngestResult` fields changed from `list` to `tuple`
- BoundingBox inverted-box validation via a Pydantic `model_validator`
- Duplicate-index warning when the LLM returns the same row index twice
Developer experience
- Pre-commit hook (`.githooks/pre-commit`) runs `make verify` (ruff + mypy + pytest + bandit) before every commit. Setup: `make install-hooks`
- Automated docs accuracy test suite (`test_docs_accuracy.py`, 32 tests) validates every factual claim in README, FAQ, CHANGELOG, CONTRIBUTING, and SECURITY against the actual codebase
- Modernised Makefile with `install`, `install-all`, `install-hooks`, `test`, `lint`, `typecheck`, `security`, `verify`, `dist`, `release` targets
- PowerShell CLI walkthrough (`06_cli_walkthrough.ps1`) for native Windows users
Install
```bash
pip install bankstatementparser                    # core (deterministic parsers)
pip install 'bankstatementparser[hybrid]'          # + text-LLM for digital PDFs
pip install 'bankstatementparser[hybrid-vision]'   # + vision for scanned PDFs
pip install 'bankstatementparser[enrichment]'      # + categorization
```

Migration from v0.0.5
The public API is unchanged. v0.0.5 code runs on v0.0.6 without modification provided the interpreter is Python 3.10+. If you are on Python 3.9, pin to v0.0.5:
```
bankstatementparser==0.0.5
```
Test plan
- 649 tests at 100% line + branch coverage (up from 541 on v0.0.5)
- `mypy --strict` clean on 23 source files
- `ruff check` + `bandit -r` clean
- 44 CI checks pass on Python 3.10–3.14
- All hybrid examples verified end-to-end
- Deep-dive security + correctness audit completed with all findings fixed
Full changelog
See CHANGELOG.md for the complete v0.0.6 entry.
Pull request: #48 (15 commits, all SSH-signed)
v0.0.5 — Universal Extraction
Universal Extraction. Combines the deterministic reliability of the existing ISO/exchange-format parsers with an adaptive LLM layer for unstandardized PDFs, including a multimodal vision fallback for scanned/image-only statements. The core "data only, no inference" philosophy of the library is preserved — categorization and review-mode UI are intentionally deferred to v0.0.6.
Three extraction paths via smart_ingest()
| Path | Trigger | Cost | Module |
|---|---|---|---|
| A — Deterministic | `detect_statement_format()` returns a non-PDF format | $0, fastest | existing parsers |
| B — Text-LLM | PDF with ≥ 50 chars of extractable text | tokens | hybrid/llm_extractor.py |
| C — Vision-LLM | PDF below `LOW_TEXT_DENSITY_THRESHOLD` (scan/photo) | tokens + compute | hybrid/vision.py |

`IngestResult.source_method` is tagged with `"deterministic" | "llm" | "vision"` for full audit provenance on every row.
```python
from bankstatementparser.hybrid import smart_ingest

result = smart_ingest("statement.pdf")
print(result.source_method)         # "deterministic" | "llm" | "vision"
print(result.verification.status)   # VERIFIED | DISCREPANCY | FAILED
for tx in result.transactions:
    print(tx.transaction_hash, tx.amount, tx.description)
```

Install
```bash
# Core install — deterministic parsers only (zero AI dependencies)
pip install bankstatementparser

# Add the text-LLM path for digital PDFs
pip install 'bankstatementparser[hybrid]'

# Add higher-fidelity table extraction (adds pdfplumber)
pip install 'bankstatementparser[hybrid-plus]'

# Add the multimodal vision path for scanned/photocopied PDFs
pip install 'bankstatementparser[hybrid-vision]'
```

Every `[hybrid*]` extra is opt-in and pure-Python — no poppler, no system libraries, no GPU required. Works identically on macOS, Linux, and WSL.
Highlights
New bankstatementparser.hybrid subpackage
- `smart_ingest()` — single entry point that implements the three-path routing above. Auto-routes to vision when `pypdf` extracts fewer than `LOW_TEXT_DENSITY_THRESHOLD` (50) characters.
- `LLMExtractor` — LiteLLM-backed text extractor with provider-agnostic configuration via `BSP_HYBRID_MODEL`. Default model is `ollama/llama3` (local, private). Tolerant JSON parsing handles markdown fences and prose wrappers.
- `VisionExtractor` — multimodal extractor for scanned/image-only PDFs. Renders pages with `pypdfium2` (pure-Python wheel, no poppler dependency) and sends base64 PNGs via LiteLLM's multimodal payload. The vision model is opt-in only via `BSP_HYBRID_VISION_MODEL` — no surprise downloads.
- `verify_balance()` — Golden Rule integrity check returning `VERIFIED | DISCREPANCY | FAILED` with the exact delta when mismatched.
- Structured prompts that explicitly instruct the model to sort transactions chronologically, mitigating PDF reading-order issues.
Transaction model upgrades
- `transaction_hash` — computed field, MD5 of `date | normalized_description | amount`. Every row carries an immutable fingerprint for idempotent re-ingestion.
- `source_method` — `Literal["deterministic", "llm"]`, audit provenance per row.
- `confidence` — `Optional[float]`, populated for LLM rows.
- `category` and `raw_source_text` — reserved placeholders for the v0.0.6 "Intelligence Layer" release.
normalize_description() noise stripping
Strips inline dates (2026-04-01), times (12:49), and long alphanumeric IDs so that recurring charges hash identically. `AMZN MKTPLACE 2026-04-01 #A1B2C3` and `AMZN MKTPLACE 2026-04-02 #Z9Y8X7` collapse to the same normalized form, which means `dedupe_by_hash()` actually catches real duplicates instead of being defeated by one rotating reference character.
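A minimal sketch of that normalization, with assumed regex patterns (the library's actual rules may differ, e.g. in how reference IDs are recognized):

```python
import re

def normalize_description(text: str) -> str:
    """Strip per-occurrence noise so recurring charges hash identically."""
    text = re.sub(r"\d{4}-\d{2}-\d{2}", "", text)   # inline ISO dates
    text = re.sub(r"\b\d{1,2}:\d{2}\b", "", text)   # times
    text = re.sub(r"#[A-Z0-9]{4,}\b", "", text)     # #-prefixed reference IDs (assumed form)
    return re.sub(r"\s+", " ", text).strip()

a = normalize_description("AMZN MKTPLACE 2026-04-01 #A1B2C3")
b = normalize_description("AMZN MKTPLACE 2026-04-02 #Z9Y8X7")
assert a == b == "AMZN MKTPLACE"
```

Once the rotating parts are gone, the MD5 fingerprint over `date | normalized_description | amount` is stable across statements that render the same charge slightly differently.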
Deduplicator.dedupe_by_hash()
New strict identity filter using `Transaction.transaction_hash`, designed for incremental ingestion (syncing to Google Sheets / a database). Mutates a caller-owned `seen_hashes: set[str]` so consumers can persist state across batches. Coexists with the existing fuzzy/temporal `deduplicate()` method.
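The caller-owned `seen_hashes` pattern looks roughly like this; the function below is an illustration over plain strings, not the library's `Deduplicator.dedupe_by_hash` signature.

```python
import hashlib

def dedupe_by_hash(rows: list[str], seen_hashes: set[str]) -> list[str]:
    """Keep only rows whose hash has not been seen; mutate seen_hashes
    in place so state carries over to the next batch."""
    fresh = []
    for row in rows:
        h = hashlib.md5(row.encode()).hexdigest()
        if h not in seen_hashes:
            seen_hashes.add(h)
            fresh.append(row)
    return fresh

seen: set[str] = set()
print(dedupe_by_hash(["a", "b", "a"], seen))  # ['a', 'b']
print(dedupe_by_hash(["b", "c"], seen))       # ['c'] — 'b' was seen in batch 1
```

Because the caller owns the set, it can be serialized between runs (or stored next to a Google Sheet) so re-ingesting an overlapping statement adds nothing twice.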
CLI
```bash
bankstatementparser --type ingest --input statement.pdf [--output ledger.csv]
```

New `bankstatementparser` console-script entry point. Both forms work in parallel:

```bash
bankstatementparser --type ingest --input file.pdf
python -m bankstatementparser.cli --type ingest --input file.pdf
```

Graceful degradation when the `[hybrid]` extra is missing — surfaces the specific missing dependency name and prints a `pip install` hint.
Examples — examples/hybrid/
Eight new files including a Mermaid flow diagram, prerequisites table, 15-minute quick start, mock-vs-live mode comparison, cross-platform verification matrix, and troubleshooting table. generate_sample_pdfs.py produces reproducible synthetic UK-bank PDFs (digital + scanned) so the LLM examples are runnable without real bank PDFs. Each LLM example runs in two modes — MOCK (default, fully offline, CI-safe) and LIVE (set BSP_HYBRID_MODEL / BSP_HYBRID_VISION_MODEL).
See examples/hybrid/README.md for the full walkthrough.
Smoke-test results (real Ollama models, Apple Silicon, 2026-04-08)
| Path | Model | Result |
|---|---|---|
| A — Deterministic | n/a | ✅ CAMT.053 fixture, 3 transactions, all hashes computed |
| B — Text-LLM | `ollama/llama3` (4.7 GB) | ✅ All 11 transactions extracted with confidence=1.00, balance VERIFIED, ~25 s end-to-end |
| C — Vision-LLM | `ollama/llava:7b` (4.7 GB) | 🔴 Hallucinated output locally; hosted frontier vision models recommended instead (gpt-4o, claude-opus-4-6, gemini-2.5-pro) |
| Golden Rule | n/a | ✅ All three outcomes (VERIFIED, DISCREPANCY, FAILED) reproduce as documented |
| Dedupe | n/a | ✅ Recurring Amazon dup caught in batch 1, both already-seen rows caught in batch 2 |
| CLI `--type ingest` | n/a | ✅ Deterministic path produces expected DataFrame with all v0.0.5 columns |
Test plan
- 541 tests pass (up from 484 on v0.0.4)
- 100% line + branch coverage across the entire package, including the new hybrid subpackage
- `mypy --strict` clean on 21 source files
- `ruff check` clean on bankstatementparser/, tests/, and examples/
- `bandit -r` clean
- All optional dependencies monkeypatched in tests — CI does not require any `[hybrid*]` extra to be installed
- 48 CI checks green on the merge commit
Security
Allow-listed nine transitive CVEs across litellm (3), cryptography (3), pillow (1), filelock (2), and requests (1). All nine share the same root cause: their patched versions require Python ≥ 3.10, while this release still supports Python 3.9. Each advisory is documented per-CVE with the reason its vulnerable code path is unreachable from anything we ship. The entire allow-list can be deleted in a single commit when the minimum Python is raised — see the strategic note in the v0.0.5 commit history.
Deferred to v0.0.6 — "Intelligence Layer"
- Categorization (`category` field populated, `is_business_expense` flag) — will ship as an opt-in `bankstatementparser.enrichment` module
- Interactive review mode — separate `--type review` subcommand consuming saved `IngestResult` JSON
- OCR chunk-to-row mapping — true bounding-box mapping from the vision path
- Drop Python 3.9 support — Python 3.9 reached EOL on 2025-10-31
Full changelog
See CHANGELOG.md for the complete v0.0.5 entry.
Pull request: #43 (13 commits, all SSH-signed)
v0.0.4 — 27K tx/s streaming, parallel parsing, Python 3.14, ISO 13485
Performance
| Metric | CAMT | PAIN.001 |
|---|---|---|
| Throughput | 27,000+ tx/s | 52,000+ tx/s |
| Per-transaction latency | 37 µs | 19 µs |
| Time to first result | < 1 ms | < 2 ms |
| Memory scaling | Constant (1K–50K) | Constant (1K–50K) |
- 20% CAMT streaming optimization (xpath → find/findtext)
- True streaming for PAIN.001 files > 50 MB via chunk-based temp file
- CI-enforced TPS minimums and latency contracts
New Features
- `parse_files_parallel()` — process multiple statement files across CPU cores using `ProcessPoolExecutor`
- `Deduplicator` — deterministic transaction deduplication with explainable confidence scores
- `Transaction` — Pydantic model normalizing records from any parser with `Decimal` precision
- `to_polars()` / `to_polars_lazy()` — optional Polars DataFrame export (`pip install 'bankstatementparser[polars]'`)
- Python 3.13 and 3.14 — full support with CI matrix testing
Dependencies
| Package | Change |
|---|---|
| lxml | 4.9.3 → 6.0.2 |
| Pygments | 2.19.2 → 2.20.0 (CVE-2026-4539 fix) |
| pydantic | Added (^2.11.0) |
| hypothesis | Added (>=6.82,<7) |
| polars | Added (^1.32.0, optional) |
Documentation
- FAQ.md — 11 questions across 3 personas (CFO/Auditor, Fintech Dev, Treasury Analyst)
- docs/MAPPING.md — Complete XML tag to DataFrame column mapping for all 6 formats
- README — Performance table, parallel parsing, deduplication, PII redaction, output examples
ISO 13485 Compliance Suite
- Risk Register — 7 quantified hazards with severity/probability scoring and residual risk
- V&V Plan — 5-phase, 19-step with pass criteria and evidence retention
- Change Control Procedure — Change workflow, impact assessment, rollback
- SOUP Register — 22 tracked components with risk levels and EOL
- Traceability Matrix — 17 design inputs mapped to implementation and verification
- Secure Path to Production — Gate criteria per stage with approval authority
- Security Policy — Response SLAs (48h ack, 30d fix), severity classification
Quality
| Metric | Value |
|---|---|
| Tests | 467 passed, 0 skipped |
| Branch coverage | 100% |
| Modules | 13 |
| Bandit SAST | 0 findings |
| pip-audit | 0 CVEs |
| Commits | All signed (ED25519) |
| SOUP components | 22 |
| Design inputs | 17 |
Breaking Changes
None. All existing APIs are backward-compatible.
THE ARCHITECT ᛫ Sebastien Rousseau ᛫ https://sebastienrousseau.com
THE ENGINE ᛞ EUXIS ᛫ Enterprise Unified Execution Intelligence System ᛫ https://euxis.co
v0.0.3
What's Changed
- feat(v0.0.3): deduplication, parser performance, and typed hardening by @sebastienrousseau in #25
Full Changelog: v0.0.2...v0.0.3
v0.0.2
Highlights
- Add secure in-memory CAMT parsing with `CamtParser.from_string(...)` and `CamtParser.from_bytes(...)`
- Add hardened ZIP processing for XML statements via `iter_secure_xml_entries(...)`
- Add parser support for bank CSV, OFX/QFX, and MT940 formats
- Add automatic statement-format detection with `detect_statement_format(...)` and `create_parser(...)`
- Add CI, security scanning, SBOM, checksum, and provenance hardening
- Refresh docs, examples, contribution guidance, and cross-platform behavior
- Refresh docs, examples, contribution guidance, and cross-platform behavior
Verification
- PR checks were green before merge
- Release Integrity workflow for tag v0.0.2 passed successfully on 2026-03-22
- Attached artifacts include the wheel, sdist, SHA256 checksums, SBOM, and dependency report
v0.0.1 — 2023-11-08
Bank Statement Parser v0.0.1 🐍
The Bank Statement Parser is a Python library built for finance and treasury professionals. It simplifies the analysis of CAMT and SEPA transaction files: its streamlined design replaces cumbersome manual data review with a concise, accurate report ready for further analysis, so you can focus on financial insights and decisions.
Key Features
- Versatile Parsing: Easily handle formats like CAMT (ISO 20022) and beyond.
- Financial Insights: Unlock detailed analysis with powerful calculation utilities.
- Simple CLI: Automate and integrate with a straightforward command-line interface.
Why Choose the Bank Statement Parser
- Designed for Finance: Tailored features for the finance sector's needs.
- Efficiency at Heart: Transform complex data tasks into simple ones.
- Community First: Built and enhanced by experts, for experts.
Functionality
- CamtParser: Parse CAMT format files with ease.
- Pain001Parser: Handle SEPA PAIN.001 files effortlessly.
Installation
Create a Virtual Environment
We recommend creating a virtual environment to install the Bank Statement Parser. This ensures the package is installed in an isolated environment and will not affect other projects.

```bash
python3 -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```

Getting Started
Install bankstatementparser with just one command:
```bash
pip install bankstatementparser
```

Usage
CAMT Files
```python
from bankstatementparser import CamtParser

# Initialize the parser with the CAMT file path
camt_parser = CamtParser('path/to/camt/file.xml')

# Parse the file and get the results
results = camt_parser.parse()
```

PAIN.001 Files
```python
from bankstatementparser import Pain001Parser

# Initialize the parser with the PAIN.001 file path
pain_parser = Pain001Parser('path/to/pain/file.xml')

# Parse the file and get the results
results = pain_parser.parse()
```

Command Line Interface (CLI) Guide
Leverage the CLI for quick parsing tasks:
Basic Command
```bash
python cli.py --type <file_type> --input <input_file> [--output <output_file>]
```

- `--type`: Type of the bank statement file. Currently supported types are "camt" and "pain001".
- `--input`: Path to the bank statement file.
- `--output`: (Optional) Path to save the parsed data. If not provided, data is printed to the console.

