Description
Empty rows are appearing in output CSV and JSON files for CC statements. For example, one CC output CSV contained rows like:
4 Feb,,,,,Statement_CC_1.pdf
12 Feb,,,,,Statement_CC_1.pdf
18 Feb,,,,,Statement_CC_1.pdf
These rows have empty Details and Debit fields, document_type: bank_statement, confidence_score: 1.0, and no template_id.
Root Cause
The AIB CC statement PDF splits each transaction across two lines:
- Line 1: Transaction Date | Posting Date | Details | Amount
- Line 2: Reference number only (e.g. Ref: <reference_number>)
RowMergerService is not merging these continuation lines for the CC template. As a result:
- Line 2 becomes a standalone row with just a date and empty Details/Debit
- The row classifier sees it as metadata and returns early from RowPostProcessor.process()
- The row is NOT tagged with Filename, document_type, or template_id
- But it still passes the if row: guard in StatefulPageRowProcessor.process_page() because it has partial content (date field populated)
- Result: empty transaction row written to output
Example affected transactions (3 rows across a single CC PDF):
- Row 1: date only, no details, no amount — phantom empty row in output
- Row 2: date only, no details, no amount — phantom empty row in output
- Row 3: date only, no details, no amount — phantom empty row in output
And the corresponding real transactions (with details and amounts) are missing from output entirely.
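To illustrate why a truthiness guard lets these rows through, here is a minimal sketch. It assumes rows are plain Python dicts; the column names in CONTENT_FIELDS and the helper has_transaction_content are hypothetical and not part of the existing codebase.

```python
from typing import Dict, Optional

# Hypothetical column names; the real row schema may differ.
CONTENT_FIELDS = ("Details", "Debit", "Credit")


def has_transaction_content(row: Optional[Dict[str, str]]) -> bool:
    """Return True only if at least one content field is non-blank."""
    if not row:
        return False
    return any((row.get(field) or "").strip() for field in CONTENT_FIELDS)


# An orphaned row: date populated, everything else blank.
orphan = {"Date": "4 Feb", "Details": "", "Debit": ""}

print(bool(orphan))                     # True: any non-empty dict is truthy
print(has_transaction_content(orphan))  # False: no details and no amount
```

A content-based check like this would only hide the phantom rows. It would not recover the missing transactions, so it is a possible safety net rather than the fix itself.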
Two bugs in one
- Missing transactions: Real transactions are dropped because their row data gets split across lines and the merger doesn't recombine them
- Empty rows in output: The orphaned continuation line (date only, no details/amount) passes through as a phantom empty row
Expected behaviour
All transactions from CC PDFs appear in output with correct Date, Details, and Amount. No empty rows.
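One way to check the "no empty rows" expectation against a CSV output is sketched below. It assumes the layout seen in the sample rows above (date first, filename last), and the helper name find_empty_rows is hypothetical. The second row in the sample is made-up data for illustration only.

```python
import csv
import io
from typing import List


def find_empty_rows(csv_text: str) -> List[List[str]]:
    """Return rows whose cells between the first and last column are all blank."""
    empty = []
    for cells in csv.reader(io.StringIO(csv_text)):
        middle = cells[1:-1]  # assumes date is first and filename is last
        if middle and not any(cell.strip() for cell in middle):
            empty.append(cells)
    return empty


sample = (
    "4 Feb,,,,,Statement_CC_1.pdf\n"           # phantom row from the report
    "5 Feb,SHOP,,12.50,,Statement_CC_1.pdf\n"  # made-up populated row
)
offenders = find_empty_rows(sample)
print(len(offenders))  # 1
```

A regression test could assert that find_empty_rows returns an empty list for every CC output file.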
Fix direction
RowMergerService needs to handle CC statement continuation lines — rows that contain only a reference number (Ref: ...) should be merged into the preceding transaction row, not emitted as standalone rows. This likely requires a continuation pattern in the CC template config or a CC-aware merge strategy in RowMergerService.
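The merge described above could look roughly like the following sketch. It is not the actual RowMergerService API: the function merge_continuation_rows, the REF_ONLY pattern, the Reference key, and the column names are all assumptions made for illustration.

```python
import re
from typing import Dict, List

# Hypothetical continuation pattern; the real template config may differ.
REF_ONLY = re.compile(r"^\s*Ref:\s*(?P<ref>\S+)\s*$")


def merge_continuation_rows(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Fold reference-only rows into the preceding transaction row."""
    merged: List[Dict[str, str]] = []
    for row in rows:
        match = REF_ONLY.match(row.get("Details") or "")
        has_amount = bool(row.get("Debit") or row.get("Credit"))
        if match and not has_amount and merged:
            # Continuation line: attach the reference to the previous row
            # instead of emitting a standalone row.
            merged[-1]["Reference"] = match.group("ref")
            continue
        merged.append(dict(row))  # copy so the input rows are not mutated
    return merged


rows = [
    {"Date": "4 Feb", "Details": "SHOP", "Debit": "12.50"},  # made-up values
    {"Date": "4 Feb", "Details": "Ref: 123456", "Debit": ""},
]
result = merge_continuation_rows(rows)
print(len(result))  # 1
```

A reference-only row with no preceding transaction is kept as-is in this sketch, so nothing is silently discarded. Whether the pattern lives in the CC template config or in a dedicated merge strategy is the open design choice.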