read_file: footnotes anchored to non-body paragraphs are silently dropped from inline output #185

@stevenobiajulu

Context

While implementing inline footnote bodies for read_file (#158), we discovered that the NVCA SPA fixture has a footnote whose anchor paragraph is not surfaced by buildDocumentView() — so it's invisible to inline-windowed pagination.

Concrete numbers from tests/test_documents/nvca-regression/source.docx:

| Path | Count |
| --- | --- |
| get_footnotes total | 109 |
| Eligibility filter (display_number > 0, non-empty, anchored) | 108 |
| Inline-attached via paginated read_file(format='json') | 107 |
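
For reference, the eligibility step in the funnel above could be read as the predicate below. This is a minimal sketch, not the actual implementation: the `FootnoteLike` shape, the `isEligible` name, and the exact treatment of whitespace-only text and empty anchors are assumptions. Only the three conditions themselves (display_number > 0, non-empty, anchored) come from this issue.

```typescript
// Hypothetical footnote shape: field names follow this issue, the types are assumed.
interface FootnoteLike {
  id: number;
  display_number: number;
  text: string;
  anchored_paragraph_id: string | null;
}

// Assumed reading of the filter: display_number > 0, non-empty text, has an anchor.
function isEligible(fn: FootnoteLike): boolean {
  const hasDisplayNumber = fn.display_number > 0;
  const hasText = fn.text.trim().length > 0;
  const hasAnchor =
    fn.anchored_paragraph_id !== null && fn.anchored_paragraph_id !== '';
  return hasDisplayNumber && hasText && hasAnchor;
}

// Illustrative data only, not taken from the NVCA fixture.
const sample: FootnoteLike[] = [
  { id: 1, display_number: 0, text: 'separator', anchored_paragraph_id: null },
  { id: 2, display_number: 1, text: 'See Section 1.', anchored_paragraph_id: '_bk_a' },
  { id: 3, display_number: 2, text: '   ', anchored_paragraph_id: '_bk_b' },
];

const eligible = sample.filter(isEligible);
console.log(eligible.map((fn) => fn.id)); // only id 2 passes all three checks
```

Under this reading, the drop from 109 to 108 is a footnote that fails one of the three checks, while the drop from 108 to 107 is a footnote that passes all of them but whose anchor is absent from the document view.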

The missing one is footnote id=47 (display 46), anchored to _bk_6d177a97f7e6. Calling read_file(node_ids: ['_bk_6d177a97f7e6']) returns zero nodes — the paragraph exists in getFootnotes()'s anchor scan but not in the document view. It likely lives in a non-body part (header, footer, comment body, footnote body itself), or in a structure buildDocumentView() skips.

Why it matters

The motivating use case for #158 was fidelity: legal-context was silently losing all 109 footnotes from this exact fixture. We've recovered 107 of the 108 eligible footnotes, so a single-call render that aims for full fidelity still loses one. Whether the right target is 107 or 108 depends on the maintainer's intent for buildDocumentView().

Reproduction

```typescript
const r = await getFootnotes(mgr, { file_path: NVCA });
// r.footnotes contains id=47, anchored_paragraph_id='_bk_6d177a97f7e6'

const probe = await readFile(mgr, {
  file_path: NVCA,
  format: 'json',
  node_ids: ['_bk_6d177a97f7e6'],
});
// probe.content === '[\n\n]' (the paragraph is not in the document view)
```

The new test "NVCA SPA fixture: get_footnotes returns 109; paginated JSON walk inlines 107 bodies" (in packages/docx-mcp/src/tools/read_file_footnotes.test.ts) pins this down with an explanatory comment.

Options

  1. Investigate the unsurfaced paragraph — figure out which non-body part it lives in and decide whether buildDocumentView() should include it.
  2. Surface unreachable footnotes via a top-level field in read_file output (e.g. unreachable_footnotes: [{id, display_number, text, anchored_paragraph_id}]) so callers see them without a separate get_footnotes call.
  3. Document the gap in read_file's schema description so consumers know to also call get_footnotes for full fidelity on edge-case fixtures.
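
As a rough illustration of option 2, the split between inlinable and unreachable footnotes could look like the sketch below. It assumes the caller already has the set of paragraph ids present in the document view. `FootnoteLike`, `partitionByReachability`, and `viewNodeIds` are hypothetical names, and the sample footnote text is a placeholder rather than fixture content.

```typescript
// Hypothetical shape, using the fields proposed for unreachable_footnotes in option 2.
interface FootnoteLike {
  id: number;
  display_number: number;
  text: string;
  anchored_paragraph_id: string;
}

interface Partition {
  inline: FootnoteLike[];
  unreachable_footnotes: FootnoteLike[];
}

// Split eligible footnotes by whether their anchor paragraph is in the document view.
function partitionByReachability(
  footnotes: FootnoteLike[],
  viewNodeIds: ReadonlySet<string>,
): Partition {
  const inline: FootnoteLike[] = [];
  const unreachable_footnotes: FootnoteLike[] = [];
  for (const fn of footnotes) {
    if (viewNodeIds.has(fn.anchored_paragraph_id)) {
      inline.push(fn);
    } else {
      unreachable_footnotes.push(fn);
    }
  }
  return { inline, unreachable_footnotes };
}

// Illustrative input: the first entry is made up; the second uses the id, display
// number, and anchor reported in this issue, with placeholder text.
const footnotes: FootnoteLike[] = [
  { id: 1, display_number: 1, text: 'placeholder', anchored_paragraph_id: '_bk_example' },
  { id: 47, display_number: 46, text: 'placeholder', anchored_paragraph_id: '_bk_6d177a97f7e6' },
];
const viewNodeIds = new Set<string>(['_bk_example']);

const result = partitionByReachability(footnotes, viewNodeIds);
console.log(JSON.stringify(result.unreachable_footnotes, null, 2));
```

Keeping unreachable footnotes in a separate top-level array leaves the inline path unchanged and lets callers detect the gap by checking whether the array is empty.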

Related

  • Optionally inline footnote bodies in read_file (windowed, budget-aware) #158 — inline footnotes feature that surfaced this
  • packages/docx-core/src/primitives/footnotes.ts:224-277 (getFootnotes, which sees the paragraph)
  • packages/docx-core/src/primitives/document_view.ts (buildDocumentView, which skips it)
  • tests/test_documents/nvca-regression/source.docx — fixture demonstrating the gap
