fix(#129): filter non-transaction rows from output#130

Closed
longieirl wants to merge 1 commit into main from fix/filter-non-transaction-rows

@longieirl
Owner

Summary

Closes #129

  • Changed `if row:` to `if row.get("Filename"):` in `StatefulPageRowProcessor.process_page()`
  • `Filename` is only stamped by `RowPostProcessor` on rows that pass transaction classification, which makes it a reliable sentinel for excluding headers, totals, and blank rows
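
The difference between the two guards can be sketched as below. This is a minimal illustration, not the project's code: the helper names `keep_truthy` and `keep_stamped`, the sample rows, and every field other than `Filename` are assumptions made for the example.

```python
from typing import Dict, List

Row = Dict[str, str]


def keep_truthy(rows: List[Row]) -> List[Row]:
    """Old guard: keeps any non-empty dict, even one holding only partial data."""
    return [row for row in rows if row]


def keep_stamped(rows: List[Row]) -> List[Row]:
    """New guard: keeps only rows that carry a non-empty "Filename" value."""
    return [row for row in rows if row.get("Filename")]


# Hypothetical rows; only the "Filename" key comes from the PR description.
rows: List[Row] = [
    {"Date": "Date", "Details": "Details"},  # header row, never stamped
    {"Date": "3 Feb", "Details": "Card payment", "Filename": "statement.pdf"},
    {"Details": "Total"},  # totals row, never stamped
    {},  # blank row
]

old_kept = keep_truthy(rows)   # header, transaction, and totals rows survive
new_kept = keep_stamped(rows)  # only the stamped transaction row survives
```

Note that the stricter guard also drops any real transaction that was never stamped, so it is only as reliable as the classification step that precedes it.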

Test plan

  • 1484 unit tests passed locally
  • Integration snapshot: bank_statements_9459 row count 22 → 15 (7 empty rows removed)

Non-transaction rows (headers, totals, blanks) were passing the 'if row:'
guard in process_page() because they still held partial data. Changed to
'if row.get("Filename"):' — Filename is only stamped by RowPostProcessor
on rows that pass transaction classification, making it a reliable sentinel.

Removes empty rows from CC output (bank_statements_9459: 22 -> 15 rows).
github-actions bot added the bug label on Apr 9, 2026
@longieirl
Owner Author

Closing — fix was too aggressive. The empty rows (4 Feb, 12 Feb, 18 Feb) are actually real transactions being misclassified as 'metadata' by the row classifier because the CC statement splits each transaction across two lines (transaction data + reference number). The row merger is not combining these continuation lines correctly for the CC template. Root cause is in row merging, not row filtering. Will be tracked separately.
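
The kind of continuation-line merge described above might look roughly like the following. This is a sketch under stated assumptions and not the project's row merger: the function names, the `Date`, `Details`, `Amount`, and `Reference` fields, the sample values, and the heuristic that a continuation line has a reference but no date are all hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]


def is_continuation(row: Row) -> bool:
    # Assumed heuristic: a continuation line has no date, only a reference number.
    return not row.get("Date") and bool(row.get("Reference"))


def merge_continuations(rows: List[Row]) -> List[Row]:
    """Fold each continuation line into the transaction row that precedes it."""
    merged: List[Row] = []
    for row in rows:
        if merged and is_continuation(row):
            previous = merged[-1]
            for key, value in row.items():
                # Fill gaps only; never overwrite data already on the transaction.
                if value and not previous.get(key):
                    previous[key] = value
        else:
            merged.append(dict(row))  # copy so the input rows stay untouched
    return merged


# Hypothetical two-line transactions: data line followed by a reference line.
split_rows: List[Row] = [
    {"Date": "4 Feb", "Details": "Card purchase", "Amount": "12.50"},
    {"Reference": "REF-001"},
    {"Date": "12 Feb", "Details": "Online order", "Amount": "40.00"},
    {"Reference": "REF-002"},
]

merged_rows = merge_continuations(split_rows)
```

If merging runs before classification, each transaction reaches the classifier as one complete row, so a filter applied afterwards would no longer see the two halves as separate partial rows.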

Development

Successfully merging this pull request may close these issues.

bug: non-transaction rows (empty/header) included in CSV and JSON output

2 participants