Skip to content

fix(docx-mcp): seed revision ids from side parts#216

Open
stevenobiajulu wants to merge 2 commits into
mainfrom
171-sidepart-revision-ids
Open

fix(docx-mcp): seed revision ids from side parts#216
stevenobiajulu wants to merge 2 commits into
mainfrom
171-sidepart-revision-ids

Conversation

@stevenobiajulu
Copy link
Copy Markdown
Member

Summary

Closes #171.

getRevisionContextForSession now seeds the session RevisionIdState from document.xml plus available side parts that can carry package-wide w:id revision attributes. This prevents AI-emitted revision IDs from colliding with pre-existing revisions in comments.xml, footnotes.xml, endnotes.xml, commentsExtended.xml, or people.xml.

Implementation

  • Makes getRevisionContextForSession async so it can inspect optional side-part XML from the DOCX package.
  • Reads side parts from the original session buffer before first revision ID allocation.
  • Keeps the existing variadic inferStartingRevisionIdState(...docs) helper and removes the documented limitation.
  • Updates Table A MCP tools to await revision-context creation before emitting tracked changes.
  • Adds a regression where document.xml has w:id=42 but comments.xml has w:id=500, and verifies the next allocated ID starts at 501.

Verification

  • npm -w @usejunior/docx-mcp run test:run -- src/session/manager.test.ts — 31 passed
  • npm -w @usejunior/docx-mcp run lint
  • npm -w @usejunior/docx-mcp run build
  • git diff --check

@vercel
Copy link
Copy Markdown

vercel Bot commented May 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
site Ready Ready Preview, Comment May 22, 2026 6:51am

Request Review

@github-actions github-actions Bot added the fix label May 22, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Refines the side-part revision-id seed (introduced in 8417d7b for #171)
to address peer-review feedback before merge.

Why:

1. **Over-broad scan.** The previous `inferStartingRevisionIdState`
   collected every `w:id`/`id` attribute in the DOM. Non-revision IDs
   (e.g. `<w:comment w:id>`, `<w:footnote w:id>`,
   `<w:bookmarkStart w:id>`) share the attribute name but live in a
   separate ID space, so they were spuriously inflating the starting
   counter. Limit the scan to known revision-bearing WordprocessingML
   elements (`w:ins`, `w:del`, `w:moveFrom/To`, `w:*PrChange`,
   `w:cellIns/Del/Merge`, `w:customXml*RangeStart/End`).

2. **Incomplete part coverage.** Revision IDs are package-wide. The
   prior fixed list missed headers, footers, and the glossary part —
   pre-existing redlines in any of those would still collide with
   AI-emitted IDs. Replace the fixed list with `DocxZip.load` + dynamic
   enumeration of `word/header*.xml` and `word/footer*.xml`, plus a
   static list of `comments.xml`, `footnotes.xml`, `endnotes.xml`,
   `glossary/document.xml`. Drop `commentsExtended.xml` and `people.xml`
   from the seed list — they carry `w15:paraId` / `w15:author`
   identifiers, not revision `w:id` values, so under the narrowed scan
   they would be no-ops.

3. **First-call concurrency race.** With the scan now async, two
   concurrent first callers could each run the seed scan and the
   slower one could overwrite a counter that the faster caller had
   already advanced via emitted revisions, causing real collisions.
   Add a module-level `WeakMap<DocxSession, Promise<RevisionIdState>>`
   single-flight guard that mirrors the existing `baselinePromises`
   pattern in `SessionManager`.

4. **Parse robustness.** `parseXml` throws on malformed input. A
   corrupt optional side part (e.g. truncated `comments.xml`) would
   crash every tracked edit on the session. Wrap each per-part parse
   in `try/catch` and skip on failure — collision risk on an unrelated
   corrupt part is better than bricking the session.

New regression tests:
- non-revision `<w:comment w:id="500">` alone does not seed counter
- header with `<w:ins w:id="900">` seeds `nextId` to 901
- footer with `<w:ins w:id="1234">` seeds `nextId` to 1235
- malformed `comments.xml` does not block seeding from `document.xml`
- 3-way concurrent first callers resolve to a single seeded state

Verification:
- `npm -w @usejunior/docx-mcp run test:run -- src/session/manager.test.ts` — 36 passed
- `npm -w @usejunior/docx-mcp run lint`
- `npm -w @usejunior/docx-mcp run build`
- `npm run lint:workspaces` — clean (one pre-existing warning in edit.test.ts)
- `npm run test:run` — all workspaces green (1239 + 742 + 116 + 3 passed)

Ref: #171

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@stevenobiajulu stevenobiajulu marked this pull request as ready for review May 22, 2026 06:52
@stevenobiajulu stevenobiajulu enabled auto-merge (squash) May 22, 2026 06:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Extend inferStartingRevisionIdState to scan side parts (comments.xml / footnotes.xml)

1 participant