…oundaries
Without a credit_card_statement template, get_default_for_type() fell back
to the global default (bank statement columns). RowBuilder then mapped CC PDF
words to Date/Details/Debit/Credit/Balance — wrong x-boundaries — so
RefContinuationClassifier saw empty description text and classified Ref lines
as transactions, emitting phantom empty rows in CC output.
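The column mismatch described above can be illustrated with a minimal sketch. It is not the project's RowBuilder: the `assign_column` helper, the dict layout, the half-open range rule and the sample x-position are all assumptions made for illustration. Only the x-ranges come from this PR (Details 78–255 for the bank layout; the four CC columns for the new template).

```python
from typing import Dict, Optional, Tuple

# x-ranges quoted in this PR; the dict layout is an assumption for illustration.
BANK_COLUMNS: Dict[str, Tuple[float, float]] = {
    "Details": (78, 255),  # the only bank boundary quoted in the PR
}
CC_COLUMNS: Dict[str, Tuple[float, float]] = {
    "Transaction Date": (29, 80),
    "Posting Date": (80, 118),
    "Transaction Details": (118, 370),
    "Amount": (370, 430),
}


def assign_column(x0: float, columns: Dict[str, Tuple[float, float]]) -> Optional[str]:
    """Return the column whose range [x_min, x_max) contains x0, else None.

    Hypothetical helper: the half-open range rule is an assumption.
    """
    for name, (x_min, x_max) in columns.items():
        if x_min <= x0 < x_max:
            return name
    return None


# Hypothetical description word starting at x=300 on a credit card page.
word_x = 300.0
cc_result = assign_column(word_x, CC_COLUMNS)      # inside Transaction Details
bank_result = assign_column(word_x, BANK_COLUMNS)  # outside Details
print(cc_result, bank_result)
```

With the CC ranges the word lands in Transaction Details; with the bank range it falls outside Details, which is consistent with the description text ending up empty.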
Adds aib_credit_card.json with:
- document_type: credit_card_statement (selected by TemplateDetector)
- Correct column layout: Transaction Date, Posting Date, Transaction Details, Amount
- Detection via header_keywords and column_headers
Adds 3 regression tests in TestAIBCCTemplateColumnsFix covering:
- get_default_for_type('credit_card_statement') returns aib_credit_card
- Column names include 'Transaction Details' (required by RefContinuationClassifier)
- Column x-boundaries match known AIB CC PDF layout
Root Cause
Without a credit_card_statement template, TemplateDetector.get_default_for_type("credit_card_statement") fell back to the global default (bank statement columns: Date/Details/Debit €/Credit €/Balance €). RowBuilder then assigned CC PDF words to the wrong x-boundaries. Because CC column positions differ significantly from bank statement positions (Transaction Details spans 118–370 vs Details 78–255), words landed in wrong or empty columns. RefContinuationClassifier then called _get_description_text(), which found an empty description, fell through to TransactionClassifier, and classified Ref: <number> lines as transactions, emitting phantom empty rows in CC output.
Fix
Adds aib_credit_card.json with:
- document_type: credit_card_statement, selected by TemplateDetector._classify_document_type() when card number patterns or a "Credit Card Statement" header is found
- Column layout: Transaction Date (29–80), Posting Date (80–118), Transaction Details (118–370), Amount (370–430), matching the boundaries used in the existing test_row_merger_integration.py CC tests
- Detection via header_keywords (Allied Irish Banks, Credit Card Statement, Card Statement) and column_headers
The existing RefContinuationClassifier and _is_date_only_split logic in RowMergerService already handle the two CC row split patterns correctly when the right columns are in scope, as confirmed by the tests at test_row_merger_integration.py:360 and 402.
Test Plan
- TestAIBCCTemplateColumnsFix::test_cc_default_template_uses_cc_columns: get_default_for_type('credit_card_statement') returns aib_credit_card (not a bank template)
- TestAIBCCTemplateColumnsFix::test_cc_columns_include_transaction_details: column names include Transaction Details, as required by RefContinuationClassifier
- TestAIBCCTemplateColumnsFix::test_cc_column_boundaries_match_aib_pdf_layout: x-boundaries match the known AIB CC PDF layout
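The fallback behaviour and the three checks in the test plan can be sketched in a self-contained form. This is not the project's TemplateDetector or its test code: the in-memory registry, the `id` and `columns` keys, the `default` id and the `bank_statement` type name are hypothetical stand-ins. The template name `aib_credit_card`, the `credit_card_statement` type and the x-boundaries come from this PR.

```python
from typing import Dict, List

# Hypothetical in-memory templates; the real ones are JSON files.
BANK_DEFAULT: Dict = {
    "id": "default",                    # assumed id for the global default
    "document_type": "bank_statement",  # assumed type name
    "columns": {"Details": (78, 255)},  # only the boundary quoted in this PR
}
AIB_CREDIT_CARD: Dict = {
    "id": "aib_credit_card",
    "document_type": "credit_card_statement",
    "columns": {
        "Transaction Date": (29, 80),
        "Posting Date": (80, 118),
        "Transaction Details": (118, 370),
        "Amount": (370, 430),
    },
}


def get_default_for_type(document_type: str, templates: List[Dict]) -> Dict:
    """Return the first template of the requested type, else the global default.

    Stand-in for the real method; the first list entry acts as the global default.
    """
    for template in templates:
        if template["document_type"] == document_type:
            return template
    return templates[0]


# Before the fix: no CC template registered, so the bank default is returned.
before = get_default_for_type("credit_card_statement", [BANK_DEFAULT])
assert before["id"] == "default"

# After the fix: the CC template is selected and exposes the expected columns.
after = get_default_for_type("credit_card_statement", [BANK_DEFAULT, AIB_CREDIT_CARD])
assert after["id"] == "aib_credit_card"
assert "Transaction Details" in after["columns"]
assert after["columns"]["Transaction Details"] == (118, 370)
```

The three assertions on `after` mirror the three regression tests in shape only; the actual tests run against the loaded aib_credit_card.json template.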