Context
While implementing inline footnote bodies for read_file (#158), we discovered that the NVCA SPA fixture has a footnote whose anchor paragraph is not surfaced by buildDocumentView() — so it's invisible to inline-windowed pagination.
Concrete numbers from tests/test_documents/nvca-regression/source.docx:
| Path | Count |
| --- | --- |
| get_footnotes total | 109 |
| Eligibility filter (display_number > 0, non-empty, anchored) | 108 |
| Inline-attached via paginated read_file(format='json') | 107 |
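The eligibility filter in the second row can be sketched as a simple predicate. This is a minimal illustration rather than the project's code: the `FootnoteLike` shape and the `isEligible` name are assumptions, using only field names that appear elsewhere in this issue.

```typescript
// Hypothetical footnote shape; field names are borrowed from this issue,
// not taken from the real type definitions.
interface FootnoteLike {
  id: number;
  display_number: number;
  text: string;
  anchored_paragraph_id: string | null;
}

// A footnote is eligible for inlining when it has a positive display number,
// a non-empty body, and a known anchor paragraph.
function isEligible(f: FootnoteLike): boolean {
  return (
    f.display_number > 0 &&
    f.text.trim().length > 0 &&
    f.anchored_paragraph_id !== null
  );
}

// Illustrative data only: one entry that fails every check, one that passes.
const sample: FootnoteLike[] = [
  { id: 0, display_number: 0, text: '', anchored_paragraph_id: null },
  { id: 47, display_number: 46, text: 'Body', anchored_paragraph_id: '_bk_6d177a97f7e6' },
];
const eligibleSample = sample.filter(isEligible);
```

Note that footnote id=47 passes this filter, which is why it counts toward 108 but not toward 107: it is eligible, yet its anchor never appears in the document view.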
The missing one is footnote id=47 (display 46), anchored to _bk_6d177a97f7e6. Calling read_file(node_ids: ['_bk_6d177a97f7e6']) returns zero nodes — the paragraph exists in getFootnotes()'s anchor scan but not in the document view. It likely lives in a non-body part (header, footer, comment body, footnote body itself), or in a structure buildDocumentView() skips.
Why it matters
The motivating use case for #158 was fidelity: legal-context was silently losing all 109 footnotes from this exact fixture. We've recovered 107, but a fully faithful single-call render still loses one eligible footnote (107 of 108 inlined). Whether 107 or 108 is the right denominator depends on the maintainer's intent for buildDocumentView().
Reproduction
```typescript
// NVCA is the path to tests/test_documents/nvca-regression/source.docx.
const r = await getFootnotes(mgr, { file_path: NVCA });
// r.footnotes contains id=47, anchored_paragraph_id='_bk_6d177a97f7e6'

const probe = await readFile(mgr, {
  file_path: NVCA,
  format: 'json',
  node_ids: ['_bk_6d177a97f7e6'],
});
// probe.content === '[\n\n]' (paragraph not in document view)
```
The new test "NVCA SPA fixture: get_footnotes returns 109; paginated JSON walk inlines 107 bodies" (in packages/docx-mcp/src/tools/read_file_footnotes.test.ts) pins this down with an explanatory comment.
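The same check generalizes beyond a single probe: compare every footnote's anchor against the set of node ids the document view exposes. The sketch below assumes those ids are already available as a plain list; `findUnreachable` and `AnchoredFootnote` are hypothetical names, not part of the codebase.

```typescript
// Hypothetical minimal shape; only the two fields this check needs.
interface AnchoredFootnote {
  id: number;
  anchored_paragraph_id: string | null;
}

// Returns footnotes whose anchor paragraph is known to the footnote scan
// but absent from the ids surfaced by the document view.
function findUnreachable(
  footnotes: AnchoredFootnote[],
  viewNodeIds: Iterable<string>,
): AnchoredFootnote[] {
  const visible = new Set(viewNodeIds);
  return footnotes.filter(
    (f) => f.anchored_paragraph_id !== null && !visible.has(f.anchored_paragraph_id),
  );
}

// Illustrative data: '_bk_example' is made up; the second anchor is the one from this issue.
const unreachable = findUnreachable(
  [
    { id: 46, anchored_paragraph_id: '_bk_example' },
    { id: 47, anchored_paragraph_id: '_bk_6d177a97f7e6' },
  ],
  ['_bk_example'],
);
```

Footnotes with no anchor at all are deliberately excluded here, since they already fail the eligibility filter and are a different case from an anchor that exists but is not surfaced.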
Options
- Investigate the unsurfaced paragraph: figure out which non-body part it lives in and decide whether buildDocumentView() should include it.
- Surface unreachable footnotes via a top-level field in read_file output (e.g. unreachable_footnotes: [{id, display_number, text, anchored_paragraph_id}]) so callers see them without a separate get_footnotes call.
- Document the gap in read_file's schema description so consumers know to also call get_footnotes for full fidelity on edge-case fixtures.
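For the second option, one possible shape is sketched below. The field names follow the example in that option; everything else (`ReadFileResultSketch`, `withUnreachable`, and the assumption that the result carries a `content` string) is hypothetical and only meant to show how the field could stay optional.

```typescript
// Field names follow the example in the option above; the types are hypothetical.
interface UnreachableFootnote {
  id: number;
  display_number: number;
  text: string;
  anchored_paragraph_id: string;
}

// Assumed minimal result shape; the real read_file output may carry more fields.
interface ReadFileResultSketch {
  content: string;
  unreachable_footnotes?: UnreachableFootnote[];
}

// Adds the field only when at least one eligible footnote was not inlined,
// so output for documents without the gap is unchanged.
function withUnreachable(
  result: ReadFileResultSketch,
  eligible: UnreachableFootnote[],
  inlinedIds: ReadonlySet<number>,
): ReadFileResultSketch {
  const missing = eligible.filter((f) => !inlinedIds.has(f.id));
  return missing.length > 0 ? { ...result, unreachable_footnotes: missing } : result;
}

// Illustrative data: '_bk_example' and the footnote texts are made up.
const out = withUnreachable(
  { content: '[]' },
  [
    { id: 46, display_number: 45, text: 'A', anchored_paragraph_id: '_bk_example' },
    { id: 47, display_number: 46, text: 'B', anchored_paragraph_id: '_bk_6d177a97f7e6' },
  ],
  new Set([46]),
);
```

Keeping the field absent when nothing is missing is one design choice among several; always emitting an empty array would be easier for callers to type-check but changes the output for every document.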
Related
- packages/docx-core/src/primitives/footnotes.ts:224-277: getFootnotes (sees the paragraph)
- packages/docx-core/src/primitives/document_view.ts: buildDocumentView (skips it)
- tests/test_documents/nvca-regression/source.docx: fixture demonstrating the gap