bug: non-transaction rows (empty/header) included in CSV and JSON output #129

@longieirl

Description

Empty rows appear in the output CSV and JSON files for credit card (CC) statements. For example, one CC output CSV contained rows like:

4 Feb,,,,,Statement_CC_1.pdf
12 Feb,,,,,Statement_CC_1.pdf
18 Feb,,,,,Statement_CC_1.pdf

These rows have empty Details and Debit fields, document_type: bank_statement, confidence_score: 1.0, and no template_id.
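A quick way to confirm the symptom on an output file is to scan for rows where only the outer fields are populated. The sketch below is a hypothetical helper, not part of the project. It assumes, based only on the sample rows above, that the first column is the date and the last column is the filename. The fourth sample row is made up for contrast.

```python
import csv
import io


def find_phantom_rows(csv_text):
    """Return rows whose first and last fields are set but every field between is empty.

    Assumption: first column = date, last column = filename (inferred from the
    sample rows in this issue, not from the real output schema).
    """
    phantoms = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 3:
            continue  # too short to have any "middle" fields
        middle = row[1:-1]
        if row[0].strip() and row[-1].strip() and not any(c.strip() for c in middle):
            phantoms.append(row)
    return phantoms


sample = (
    "4 Feb,,,,,Statement_CC_1.pdf\n"
    "12 Feb,,,,,Statement_CC_1.pdf\n"
    "18 Feb,,,,,Statement_CC_1.pdf\n"
    "19 Feb,EXAMPLE SHOP,12.50,,,Statement_CC_1.pdf\n"  # invented row for contrast
)
print(len(find_phantom_rows(sample)))  # 3
```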

Root Cause

The AIB CC statement PDF splits each transaction across two lines:

  • Line 1: Transaction Date | Posting Date | Details | Amount
  • Line 2: Reference number only (e.g. Ref: <reference_number>)

RowMergerService is not merging these continuation lines for the CC template. As a result:

  • Line 2 becomes a standalone row with just a date and empty Details/Debit
  • The row classifier sees it as metadata and returns early from RowPostProcessor.process()
  • The row is NOT tagged with Filename, document_type, or template_id
  • But it still passes the if row: guard in StatefulPageRowProcessor.process_page() because it has partial content (date field populated)
  • Result: empty transaction row written to output
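The last two points can be illustrated with a small sketch. It assumes rows are plain dicts keyed by column name, which this issue does not confirm. The `is_emittable` helper and the "Credit" and "Date" keys are hypothetical. A dict that has keys is truthy in Python even when most of its values are empty strings, so a bare `if row:` check cannot filter out date-only rows.

```python
def is_emittable(row, content_fields=("Details", "Debit", "Credit")):
    """Return True only if at least one content field holds non-blank text.

    Hypothetical helper: the field names are assumptions, with "Details" and
    "Debit" taken from this issue and "Credit" added as a guess.
    """
    return any(str(row.get(field) or "").strip() for field in content_fields)


# A date-only row like the ones described above (dict shape is assumed).
phantom = {"Date": "4 Feb", "Details": "", "Debit": ""}

passes_bare_guard = bool(phantom)          # True: a non-empty dict is truthy
passes_strict_guard = is_emittable(phantom)  # False: no details, no amount
```

A stricter guard like this would only hide the empty rows. It would not restore the missing transactions, which still depends on the merge step.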

Example affected transactions (3 rows across a single CC PDF):

  • Row 1: date only, no details, no amount — phantom empty row in output
  • Row 2: date only, no details, no amount — phantom empty row in output
  • Row 3: date only, no details, no amount — phantom empty row in output

The corresponding real transactions (with details and amounts) are missing from the output entirely.

Two bugs in one

  1. Missing transactions: Real transactions are dropped because their row data gets split across lines and the merger doesn't recombine them
  2. Empty rows in output: The orphaned continuation line (date only, no details/amount) passes through as a phantom empty row

Expected behaviour

All transactions from CC PDFs appear in output with correct Date, Details, and Amount. No empty rows.

Fix direction

RowMergerService needs to handle CC statement continuation lines — rows that contain only a reference number (Ref: ...) should be merged into the preceding transaction row, not emitted as standalone rows. This likely requires a continuation pattern in the CC template config or a CC-aware merge strategy in RowMergerService.
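As a rough sketch of that merge strategy, the function below folds a reference-only row into the row before it. It is not the actual RowMergerService API. It assumes rows are dicts, that the "Ref: ..." text is extracted into the Details field, and that the reference is kept under a hypothetical "Reference" key. The regex is an assumed continuation pattern of the kind that could live in the CC template config.

```python
import re

# Assumed continuation pattern, based on the "Ref: ..." form described in this issue.
REF_LINE = re.compile(r"^\s*Ref:\s*(\S.*)$")


def merge_reference_rows(rows):
    """Fold reference-only continuation rows into the preceding transaction row.

    Assumptions (not confirmed by the project): rows are dicts, the reference
    text lands in "Details", and "Reference" is a free key on the merged row.
    """
    merged = []
    for row in rows:
        details = (row.get("Details") or "").strip()
        match = REF_LINE.match(details)
        has_amount = bool((row.get("Debit") or "").strip())
        if match and not has_amount and merged:
            # Continuation line: attach the reference to the previous row and drop this one.
            merged[-1]["Reference"] = match.group(1).strip()
            continue
        merged.append(dict(row))  # copy so the caller's rows are not mutated
    return merged
```

A reference-only row with no preceding transaction is left in place rather than dropped, so nothing disappears silently if a page starts with a continuation line.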

Metadata

Labels

bug
