fix(#129): add aib_credit_card.json template with correct CC column boundaries #138

Merged
longieirl merged 1 commit into main from
fix/129-cc-non-transaction-rows
Apr 9, 2026

Conversation

@longieirl
Owner

Root Cause

Without a credit_card_statement template, TemplateDetector.get_default_for_type("credit_card_statement") fell back to the global default (bank statement columns: Date/Details/Debit €/Credit €/Balance €).
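
The fallback described above can be illustrated with a small stand-in. This is a sketch only, not the project's `TemplateDetector`: the `Template` and `TemplateRegistrySketch` classes, the `"default"` template id, and the `"bank_statement"` type name are hypothetical, and only the method name `get_default_for_type` and the ids `aib_credit_card` / `credit_card_statement` come from this PR.

```python
from dataclasses import dataclass, field


@dataclass
class Template:
    template_id: str
    document_type: str


@dataclass
class TemplateRegistrySketch:
    """Hypothetical stand-in for the detector's per-type default lookup."""

    templates: list[Template] = field(default_factory=list)
    global_default_id: str = "default"  # assumed id of the bank-statement default

    def get_default_for_type(self, document_type: str) -> Template:
        # Prefer the first template registered for the requested type.
        for template in self.templates:
            if template.document_type == document_type:
                return template
        # Nothing registered for this type: fall back to the global default.
        return next(
            t for t in self.templates if t.template_id == self.global_default_id
        )


# Before the fix: only the bank default exists, so CC lookups get bank columns.
before = TemplateRegistrySketch([Template("default", "bank_statement")])

# After the fix: a credit_card_statement template is registered.
after = TemplateRegistrySketch(
    [
        Template("default", "bank_statement"),
        Template("aib_credit_card", "credit_card_statement"),
    ]
)
```

With these two registries, the same `get_default_for_type("credit_card_statement")` call resolves to the bank default before the change and to `aib_credit_card` after it.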

RowBuilder then assigned CC PDF words to the wrong x-boundaries. Because CC column positions differ significantly from bank statement positions (Transaction Details spans 118–370 vs Details 78–255), words landed in wrong or empty columns. RefContinuationClassifier then called _get_description_text() which found an empty description, fell through to TransactionClassifier, and classified Ref: <number> lines as transactions — emitting phantom empty rows in CC output.
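
To make the misassignment concrete, here is a minimal sketch of x-boundary column assignment. It is not the real `RowBuilder`: the `assign_column` helper and the half-open `[x_min, x_max)` convention are assumptions. The boundary numbers are the ones quoted in this PR; only the `Details` span of the bank layout is given here, so the other bank columns are left out.

```python
from typing import Optional

# CC boundaries as listed in this PR.
CC_COLUMNS = {
    "Transaction Date": (29, 80),
    "Posting Date": (80, 118),
    "Transaction Details": (118, 370),
    "Amount": (370, 430),
}

# Only the Details span of the bank layout is stated in this PR;
# the remaining bank columns are deliberately omitted.
BANK_COLUMNS_PARTIAL = {"Details": (78, 255)}


def assign_column(
    x0: float, columns: dict[str, tuple[float, float]]
) -> Optional[str]:
    """Return the column whose [x_min, x_max) span contains x0, else None."""
    for name, (x_min, x_max) in columns.items():
        if x_min <= x0 < x_max:
            return name
    return None


# A word at x=90 is a posting date in the CC layout, but the bank layout
# files it under Details.
posting_date_word = (assign_column(90, CC_COLUMNS), assign_column(90, BANK_COLUMNS_PARTIAL))

# A word at x=300 is description text in the CC layout, but falls outside
# the bank Details span.
description_word = (assign_column(300, CC_COLUMNS), assign_column(300, BANK_COLUMNS_PARTIAL))
```

Under the bank boundaries, date fragments leak into the description column while real description text to the right of x=255 is dropped from it, which is consistent with the empty or wrong descriptions reported above.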

Fix

Adds aib_credit_card.json with:

  • document_type: credit_card_statement — selected by TemplateDetector._classify_document_type() when card number patterns or a "Credit Card Statement" header are found
  • Correct CC column layout: Transaction Date (29–80), Posting Date (80–118), Transaction Details (118–370), Amount (370–430) — matching the boundaries used in the existing test_row_merger_integration.py CC tests
  • Detection via header_keywords (Allied Irish Banks, Credit Card Statement, Card Statement) and column_headers
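
As a rough picture of what the template carries, the fields listed above could be laid out as follows. This is a sketch written as a Python dict, not the contents of aib_credit_card.json: the field names `document_type`, `header_keywords`, and `column_headers` and the boundary numbers come from this PR, while the `columns` key, its `[x_min, x_max]` shape, and the `column_headers` values are assumptions.

```python
AIB_CC_TEMPLATE_SKETCH = {
    "document_type": "credit_card_statement",
    "header_keywords": [
        "Allied Irish Banks",
        "Credit Card Statement",
        "Card Statement",
    ],
    # Assumed values: the PR names the field but not its contents.
    "column_headers": [
        "Transaction Date",
        "Posting Date",
        "Transaction Details",
        "Amount",
    ],
    # Hypothetical key and shape; the numbers are the boundaries from this PR.
    "columns": {
        "Transaction Date": [29, 80],
        "Posting Date": [80, 118],
        "Transaction Details": [118, 370],
        "Amount": [370, 430],
    },
}


def boundaries_are_contiguous(columns: dict[str, list[int]]) -> bool:
    """Check that each column starts exactly where the previous one ends."""
    spans = sorted(columns.values())
    return all(prev[1] == nxt[0] for prev, nxt in zip(spans, spans[1:]))
```

A check like `boundaries_are_contiguous` is one cheap way to catch gaps or overlaps when hand-editing boundaries, since a gap would silently drop words and an overlap would make assignment order-dependent.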

The existing RefContinuationClassifier and _is_date_only_split logic in RowMergerService already handle the two CC row split patterns correctly when the right columns are in scope — confirmed by tests at test_row_merger_integration.py:360 and 402.

Test Plan

  • TestAIBCCTemplateColumnsFix::test_cc_default_template_uses_cc_columns: get_default_for_type('credit_card_statement') returns aib_credit_card (not a bank template)
  • TestAIBCCTemplateColumnsFix::test_cc_columns_include_transaction_details — column names include Transaction Details as required by RefContinuationClassifier
  • TestAIBCCTemplateColumnsFix::test_cc_column_boundaries_match_aib_pdf_layout — x-boundaries match the known AIB CC PDF layout
  • All 1555 existing tests continue to pass

…oundaries

Without a credit_card_statement template, get_default_for_type() fell back
to the global default (bank statement columns). RowBuilder then mapped CC PDF
words to Date/Details/Debit/Credit/Balance — wrong x-boundaries — so
RefContinuationClassifier saw empty description text and classified Ref lines
as transactions, emitting phantom empty rows in CC output.

Adds aib_credit_card.json with:
- document_type: credit_card_statement (selected by TemplateDetector)
- Correct column layout: Transaction Date, Posting Date, Transaction Details, Amount
- Detection via header_keywords and column_headers

Adds 3 regression tests in TestAIBCCTemplateColumnsFix covering:
- get_default_for_type('credit_card_statement') returns aib_credit_card
- Column names include 'Transaction Details' (required by RefContinuationClassifier)
- Column x-boundaries match known AIB CC PDF layout
@longieirl longieirl self-assigned this Apr 9, 2026
@github-actions github-actions bot added the bug Something isn't working label Apr 9, 2026
@longieirl longieirl merged commit bfd42f7 into main Apr 9, 2026
11 checks passed
@longieirl longieirl deleted the fix/129-cc-non-transaction-rows branch April 9, 2026 20:59
