feat: add stable indexing view and semantic refresh classification#29
Conversation
Original prompt from Stanislau
|
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
📝 WalkthroughWalkthroughThe PR introduces semantic change classification for source refresh. When the runtime recompiles, it now generates a stable "indexing view" semantic spine, persists it, and compares new/previous versions to classify changes into four families: viewing_change, slot_explicitness_change, editioned_semantic_change, or described_entity_retargeting, along with refresh sentinels that validate continuity. Changes
Sequence DiagramsequenceDiagram
participant Runtime as FpfRuntime
participant Storage as Artifact Storage
participant Builder as buildIndexingView
participant Classifier as classifyChange
participant Audit as BuildAudit
Runtime->>Storage: Load previous indexing view
Storage-->>Runtime: IndexingView | undefined
Runtime->>Runtime: Recompile snapshot
Runtime->>Builder: buildIndexingView(snapshot)
Builder-->>Runtime: new IndexingView + edition hash
Runtime->>Storage: Persist new indexing view
alt Previous view exists
Runtime->>Classifier: classifyChange(previous, current)
Classifier-->>Runtime: RefreshClassification
else No previous view
Runtime->>Runtime: Skip classification
end
Runtime->>Audit: Augment with refreshClassification
Audit-->>Runtime: Enhanced BuildAudit
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces an indexing view system to track and classify changes between builds, including new Zod schemas, TypeScript interfaces, and logic for change detection and validation sentinels. The review feedback identifies a type-safety issue where route constraints are accessed via an unsafe cast, suggests sorting object keys to ensure deterministic hashing for the edition ID, and recommends expanding the semantic change detection logic for routes to include missing properties like optional IDs and surfaces.
| optionalIds: [...route.optionalIds], | ||
| landingIds: [...route.landingIds], | ||
| routeSurfaces: [...route.routeSurfaces], | ||
| constraints: [...((route as { constraints?: string[] }).constraints ?? [])], |
There was a problem hiding this comment.
The property constraints does not exist on the RouteRecord type. This type cast and property access will always result in an empty array. If constraints are intended to be part of the route indexing view, they should be added to the RouteRecord interface in types.ts and populated during the compilation phase in compiler.ts.
| const anchorIds = Object.keys(snapshot.anchorMap).sort(); | ||
| const lexiconCanonicals = Object.keys(snapshot.lexicon).sort(); | ||
|
|
||
| const spineContent = JSON.stringify({ patterns, routes, anchorIds, lexiconCanonicals }); |
There was a problem hiding this comment.
The spineContent hash calculation relies on JSON.stringify of objects (patterns and routes) whose key order is not explicitly guaranteed by the language specification. To ensure a stable and deterministic edition hash across different environments or rebuilds, the keys of these objects should be sorted before stringification.
| const spineContent = JSON.stringify({ patterns, routes, anchorIds, lexiconCanonicals }); | |
| const sortedPatterns = Object.fromEntries(Object.entries(patterns).sort(([a], [b]) => a.localeCompare(b))); | |
| const sortedRoutes = Object.fromEntries(Object.entries(routes).sort(([a], [b]) => a.localeCompare(b))); | |
| const spineContent = JSON.stringify({ patterns: sortedPatterns, routes: sortedRoutes, anchorIds, lexiconCanonicals }); |
There was a problem hiding this comment.
Valid point — JSON.stringify key order isn't guaranteed by spec (though V8 does insertion-order). Added explicit key sorting for both patterns and routes before hashing to ensure deterministic edition hashes across environments. Fixed in e137efa.
| if (prevRoute && currRoute) { | ||
| return ( | ||
| prevRoute.name !== currRoute.name || | ||
| JSON.stringify(prevRoute.orderedIds) !== JSON.stringify(currRoute.orderedIds) || | ||
| JSON.stringify(prevRoute.landingIds) !== JSON.stringify(currRoute.landingIds) | ||
| ); | ||
| } |
There was a problem hiding this comment.
The semantic change detection for routes is incomplete. It currently only checks name, orderedIds, and landingIds. Changes to optionalIds, routeSurfaces, or constraints should also be considered semantic changes as they alter the route's definition and behavior.
if (prevRoute && currRoute) {
return (
prevRoute.name !== currRoute.name ||
JSON.stringify(prevRoute.orderedIds) !== JSON.stringify(currRoute.orderedIds) ||
JSON.stringify(prevRoute.landingIds) !== JSON.stringify(currRoute.landingIds) ||
JSON.stringify(prevRoute.optionalIds) !== JSON.stringify(currRoute.optionalIds) ||
JSON.stringify(prevRoute.routeSurfaces) !== JSON.stringify(currRoute.routeSurfaces) ||
JSON.stringify(prevRoute.constraints) !== JSON.stringify(currRoute.constraints)
);
}There was a problem hiding this comment.
Agreed — route semantic change detection now also compares optionalIds, routeSurfaces, and constraints. Fixed in e137efa.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/runtime/indexing-view.ts (1)
110-122: Redundant conditional: both branches return'viewing_change'.Lines 118-120 return
'viewing_change'if anchors or lexicon differ, and line 121 also returns'viewing_change'unconditionally. The conditional check has no effect on the outcome.If the intent is to distinguish these cases in the future, consider adding a comment. Otherwise, simplify:
♻️ Proposed simplification
if (changedIds.length === 0) { - const anchorsDiffer = - previous.anchorIds.length !== current.anchorIds.length || - previous.anchorIds.some((id, i) => id !== current.anchorIds[i]); - const lexiconDiffers = - previous.lexiconCanonicals.length !== current.lexiconCanonicals.length || - previous.lexiconCanonicals.some((id, i) => id !== current.lexiconCanonicals[i]); - - if (anchorsDiffer || lexiconDiffers) { - return 'viewing_change'; - } + // No pattern/route changes but edition differs - must be anchor or lexicon change return 'viewing_change'; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/runtime/indexing-view.ts` around lines 110 - 122, The conditional inside the changedIds.length === 0 block is redundant because both branches return 'viewing_change'; simplify by removing the anchorsDiffer/lexiconDiffers check and directly return 'viewing_change' (update the block surrounding changedIds, previous.anchorIds, current.anchorIds, and previous.lexiconCanonicals/current.lexiconCanonicals accordingly); if you intended to distinguish these cases later, add a short comment above the return noting that anchors/lexicon differences were considered but currently collapse to 'viewing_change'.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/runtime/indexing-view.ts`:
- Around line 110-122: The conditional inside the changedIds.length === 0 block
is redundant because both branches return 'viewing_change'; simplify by removing
the anchorsDiffer/lexiconDiffers check and directly return 'viewing_change'
(update the block surrounding changedIds, previous.anchorIds, current.anchorIds,
and previous.lexiconCanonicals/current.lexiconCanonicals accordingly); if you
intended to distinguish these cases later, add a short comment above the return
noting that anchors/lexicon differences were considered but currently collapse
to 'viewing_change'.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c2370890-c0fc-410e-9a64-60e83eaa3439
📒 Files selected for processing (5)
src/mcp/tool-contracts.tssrc/runtime/constants.tssrc/runtime/indexing-view.tssrc/runtime/runtime.tssrc/runtime/types.ts
There was a problem hiding this comment.
Pull request overview
This PR introduces a stable, machine-facing “indexing view” artifact (indexing-view.json) and a refresh change-classification pipeline so the runtime can describe what changed between builds (not just that the source hash changed).
Changes:
- Adds new runtime types for indexing views, change families, refresh sentinels, and refresh classification (and plumbs
refreshClassificationintoBuildAudit). - Implements
buildIndexingView()+classifyChange()+ refresh sentinels in a newsrc/runtime/indexing-view.tsmodule. - Wires indexing view persistence and refresh classification into
runtime.refresh(), and extends artifact filenames + MCP Zod schemas accordingly.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| src/runtime/types.ts | Adds indexing-view and refresh-classification related types and extends BuildAudit. |
| src/runtime/indexing-view.ts | New module to build a semantic spine view, diff views, and run post-refresh sentinels. |
| src/runtime/runtime.ts | Loads previous view, builds current view after compile, classifies change, persists indexing-view.json, includes result in audit. |
| src/runtime/constants.ts | Adds indexing-view.json to artifact filenames. |
| src/mcp/tool-contracts.ts | Adds Zod schemas for change family/sentinels/classification and extends buildAuditSchema. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const patterns: Record<string, IndexingViewEntry> = {}; | ||
| for (const [id, pattern] of Object.entries(snapshot.patternGraph.nodes)) { | ||
| patterns[id] = { | ||
| id, |
There was a problem hiding this comment.
The edition hash is derived from JSON.stringify({ patterns, ... }), but patterns is built by iterating Object.entries(snapshot.patternGraph.nodes) without sorting. Since object key insertion order depends on how the compiler discovered sections, purely "viewing" reordering in the source can change the edition even if the semantic spine is identical. To keep the view stable, build patterns (and routes) using a sorted ID list and/or use a stable stringify with sorted keys.
There was a problem hiding this comment.
Already addressed — the edition hash now sorts both patterns and routes by key before JSON.stringify (added in e137efa). The patterns/routes Records stored in the view itself remain insertion-ordered since they're keyed by ID and looked up by ID, but the hash input is always deterministic. Fixed in e137efa.
| const routes: Record<string, IndexingViewRoute> = {}; | ||
| for (const [id, route] of Object.entries(snapshot.routeGraph.nodes)) { | ||
| routes[id] = { | ||
| id, |
There was a problem hiding this comment.
Same determinism issue as patterns: routes is populated from Object.entries(snapshot.routeGraph.nodes) (insertion-ordered). A table row reorder in the source can change edition even if routes are otherwise identical. Populate routes in sorted-key order to keep the indexing view edition stable.
There was a problem hiding this comment.
Same as above — sortedRoutes is built with sorted keys before hashing. Fixed in e137efa.
| landingIds: [...route.landingIds], | ||
| routeSurfaces: [...route.routeSurfaces], | ||
| constraints: [], | ||
| }; |
There was a problem hiding this comment.
RouteRecord (from Snapshot['routeGraph']) does not define a constraints property, so this cast-based extraction will always produce an empty array unless the runtime is carrying extra fields outside the declared type. Either add constraints to RouteRecord (and ensure the compiler populates it) or remove it from IndexingViewRoute to avoid emitting misleading data.
There was a problem hiding this comment.
Correct — RouteRecord on main doesn't have constraints. This is intentionally set to [] as a forward-compatible placeholder. PR #28 (metadata-driven heuristics) adds constraints to RouteRecord; once that merges, this can be updated to read the actual field. Added a comment in the Gemini review fix explaining this cross-branch dependency.
|
|
||
| const prevAllIds = new Set([...prevPatternIds, ...prevRouteIds]); | ||
| const currAllIds = new Set([...currPatternIds, ...currRouteIds]); |
There was a problem hiding this comment.
addedIds/removedIds are computed only from pattern+route IDs. If the edition changes solely due to anchorIds or lexiconCanonicals differences, the classification will show viewing_change but addedIds/removedIds/changedIds can all be empty, making the audit hard to interpret. Consider including anchor/lexicon diffs in the classification payload (or add separate arrays for those dimensions).
| const prevAllIds = new Set([...prevPatternIds, ...prevRouteIds]); | |
| const currAllIds = new Set([...currPatternIds, ...currRouteIds]); | |
| const prevAnchorIds = new Set(previous.anchorIds); | |
| const currAnchorIds = new Set(current.anchorIds); | |
| const prevLexiconCanonicals = new Set(previous.lexiconCanonicals); | |
| const currLexiconCanonicals = new Set(current.lexiconCanonicals); | |
| const prevAllIds = new Set([ | |
| ...prevPatternIds, | |
| ...prevRouteIds, | |
| ...[...prevAnchorIds].map((id) => `anchor:${id}`), | |
| ...[...prevLexiconCanonicals].map((canonical) => `lexicon:${canonical}`), | |
| ]); | |
| const currAllIds = new Set([ | |
| ...currPatternIds, | |
| ...currRouteIds, | |
| ...[...currAnchorIds].map((id) => `anchor:${id}`), | |
| ...[...currLexiconCanonicals].map((canonical) => `lexicon:${canonical}`), | |
| ]); |
There was a problem hiding this comment.
Intentional design choice — the classification deliberately scopes addedIds/removedIds/changedIds to pattern+route IDs since those are the primary semantic entities. Anchor and lexicon changes are reflected in the edition hash difference and correctly classified as viewing_change by inferChangeFamily(). Including anchor/lexicon IDs in the diff arrays would mix namespace concerns (pattern IDs vs anchor IDs). The sentinel checks (anchor_continuity, alias_coverage) already provide granular anchor/lexicon regression info in a dedicated field. If richer anchor/lexicon diffs become needed, we can add dedicated arrays in a follow-up.
| ): RefreshSentinel[] { | ||
| const sentinels: RefreshSentinel[] = []; | ||
|
|
||
| const prevPatternIds = new Set(Object.keys(previous.patterns)); | ||
| const currPatternIds = new Set(Object.keys(current.patterns)); | ||
| const missingIds = [...prevPatternIds].filter((id) => !currPatternIds.has(id)); |
There was a problem hiding this comment.
Several sentinel detail fields can grow without bound (e.g., Removed pattern IDs: ${missingIds.join(', ')}), which can bloat build-audit.json and logs on large diffs. Consider truncating these lists (similar to the slice(0, 10) used for anchors) and including a count of omitted items.
There was a problem hiding this comment.
Good catch — added truncateDetail() helper that caps all sentinel detail strings at 500 characters. The anchor_continuity sentinel already had .slice(0, 10) but the others (id_continuity, alias_coverage, route_closure) didn't. Now all are consistently bounded. Fixed in c236dab.
| patterns: Record<string, IndexingViewEntry>; | ||
| routes: Record<string, IndexingViewRoute>; | ||
| anchorIds: string[]; | ||
| lexiconCanonicals: string[]; |
There was a problem hiding this comment.
lexiconCanonicals is named as if it contains canonical term strings, but in practice it will likely store the lexicon record keys/IDs (see buildIndexingView, which uses Object.keys(snapshot.lexicon)). Either populate this with LexiconEntry.canonical values or rename the field to reflect that it contains lexicon IDs/keys to avoid consumer confusion.
| lexiconCanonicals: string[]; | |
| lexiconIds: string[]; |
There was a problem hiding this comment.
The naming is intentional — in this codebase, snapshot.lexicon is keyed by canonical forms (the normalized term strings), not opaque IDs. Object.keys(snapshot.lexicon) returns the canonical term strings, so lexiconCanonicals accurately describes the content. Renaming to lexiconIds would be misleading since they're human-readable terms, not numeric/opaque identifiers.
| return audit; | ||
| } | ||
|
|
||
| const previousIndexingView = await this.loadIndexingView(); | ||
| const sourceText = await readFile(this.sourcePath, 'utf8'); | ||
| const builtAt = new Date().toISOString(); |
There was a problem hiding this comment.
When the snapshot is current (rebuilt: false path), refresh() returns after writeArtifacts(existingSnapshot) without ensuring indexing-view.json exists/updates. After upgrading to this PR with a preexisting snapshot, the indexing view can remain missing/stale and previousIndexingView will stay undefined on the next rebuild. Consider building/writing the indexing view from existingSnapshot in the no-rebuild branch (or ensuring it is generated once when absent).
There was a problem hiding this comment.
Good catch — real bug. The no-rebuild path (rebuilt: false) now checks if indexing-view.json exists and builds it from the existing snapshot if absent. This ensures upgrades from pre-indexing-view versions will backfill the artifact on first no-op refresh, so previousIndexingView is available on the next actual rebuild. Fixed in c236dab.
|
Re: CodeRabbit nitpick on redundant conditional (lines 110-122) — simplified by removing the |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7b42f52d62
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| : undefined; | ||
|
|
||
| await this.writeArtifacts(snapshot); | ||
| await this.writeJson(this.artifactPaths.indexingView, currentIndexingView); |
There was a problem hiding this comment.
Backfill indexing-view artifact when snapshot is already current
This is currently the only write path for indexing-view.json, and it only runs in the rebuild branch. If a user upgrades from artifacts produced before this feature (so snapshot exists but indexing view does not) and the source hash is unchanged, refresh() returns snapshot_current without creating the file; then the next actual rebuild has no previousIndexingView, so refreshClassification is missing for that first transition. That undermines the new change-family/sentinel reporting exactly when migrating existing runtimes.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch — this was a real gap in the initial commit. Already fixed in c236dab: the no-rebuild path now backfills the indexing view when it's missing (lines 86-90 of runtime.ts):
const existingView = await this.loadIndexingView();
if (!existingView) {
const view = buildIndexingView(existingSnapshot);
await this.writeJson(this.artifactPaths.indexingView, view);
}This ensures migrating runtimes get a persisted indexing view on their first snapshot_current pass, so the next actual rebuild has a valid previousIndexingView for change-family classification.
| prevRoute.name !== currRoute.name || | ||
| JSON.stringify(prevRoute.orderedIds) !== JSON.stringify(currRoute.orderedIds) || | ||
| JSON.stringify(prevRoute.landingIds) !== JSON.stringify(currRoute.landingIds) |
There was a problem hiding this comment.
Classify route optional/surface edits as semantic changes
In inferChangeFamily, route changes are considered semantic only for name, orderedIds, and landingIds; changes to optionalIds, routeSurfaces, or constraints fall through to viewing_change even though they alter route behavior (for example, QueryEngine.buildRouteAnswer uses optionalIds and routeSurfaces to construct returned IDs and constraints). This mislabels behavior-changing route edits and can cause downstream logic to treat substantive route updates as cosmetic.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch — this was a real gap in the initial commit. It's already fixed in c236dab: optionalIds, routeSurfaces, and constraints are now all checked in the semantic change detection (lines 128-130 of the current indexing-view.ts):
JSON.stringify(prevRoute.optionalIds) !== JSON.stringify(currRoute.optionalIds) ||
JSON.stringify(prevRoute.routeSurfaces) !== JSON.stringify(currRoute.routeSurfaces) ||
JSON.stringify(prevRoute.constraints) !== JSON.stringify(currRoute.constraints)c236dab to
506a07f
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 506a07fa66
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| orderedIds: [...route.orderedIds], | ||
| optionalIds: [...route.optionalIds], | ||
| landingIds: [...route.landingIds], | ||
| routeSurfaces: [...route.routeSurfaces], | ||
| constraints: [], |
There was a problem hiding this comment.
Include route burden in indexing-view semantic hashing
buildIndexingView copies route IDs/surfaces but does not include firstHonestBurden, even though QueryEngine.buildRouteAnswer uses firstHonestBurden to generate returned constraints and answer text. In the scenario where only burden text changes, the route entry and edition stay the same, so classifyChange reports no_change and misses a real behavior change in route answers. Include burden (or equivalent derived constraint content) in IndexingViewRoute so semantic classification tracks this runtime-visible change.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in ab3cf69 — IndexingViewRoute.constraints is now populated from route.firstHonestBurden (as a single-element array when present), so burden changes are captured in the edition hash and correctly trigger editioned_semantic_change classification instead of no_change.
| } | ||
|
|
||
| const anchorIds = Object.keys(snapshot.anchorMap).sort(); | ||
| const lexiconCanonicals = Object.keys(snapshot.lexicon).sort(); |
There was a problem hiding this comment.
Hash lexicon matching state, not only lexicon IDs
The lexicon portion of the edition hash is only Object.keys(snapshot.lexicon), so edits that keep the same lexeme IDs but change entry internals (for example normalizedKeys or linkedNodeIds) are classified as no_change. This is problematic because lexical retrieval in query-engine.ts depends on those internals for matching and node expansion, so query behavior can change while refresh classification says nothing changed. Include relevant lexicon entry fields in the indexing view/hash instead of only keys.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in ab3cf69 — the edition hash now includes a lexiconFingerprints record alongside lexiconCanonicals. Each entry captures normalizedKeys and linkedNodeIds (both sorted for determinism), so changes to lexical retrieval inputs that affect query-engine matching and node expansion are now detected by semantic refresh classification even when the set of lexeme IDs stays the same.
- Add IndexingView, ChangeFamily, RefreshClassification, RefreshSentinel types - Add buildIndexingView() to extract semantic spine from compiled snapshot - Add classifyChange() to diff two indexing views and classify the change family - Add runRefreshSentinels() for post-rebuild validation checks: id_continuity, alias_coverage, anchor_continuity, route_closure, no_dangling_references - Wire indexing view into runtime.refresh() — build after compile, compare with previous - Add indexingView to ARTIFACT_FILENAMES (persisted as indexing-view.json) - Add refreshClassification to BuildAudit type and MCP tool contract schema - Change families: no_change, viewing_change, slot_explicitness_change, editioned_semantic_change, described_entity_retargeting Closes #3 Co-Authored-By: Stanislau <nedbailov375426@gmail.com>
…on, drop unsafe cast - Sort pattern/route keys before JSON.stringify for deterministic edition hash - Remove unsafe RouteRecord constraints cast (field doesn't exist on main branch yet) - Expand route semantic change detection to include optionalIds, routeSurfaces, constraints Co-Authored-By: Stanislau <nedbailov375426@gmail.com>
…ails, simplify redundant conditional Co-Authored-By: Stanislau <nedbailov375426@gmail.com>
- Include normalizedKeys and linkedNodeIds per lexicon entry in the edition hash so that changes to lexical retrieval inputs (used by query-engine for matching and node expansion) are detected by semantic refresh classification. - Populate IndexingViewRoute.constraints from route.firstHonestBurden so that burden changes (used by buildRouteAnswer for constraint generation) are captured in the edition hash. Co-Authored-By: Stanislau <nedbailov375426@gmail.com>
ab3cf69 to
0b2e388
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0b2e388979
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const sortedLexiconFingerprints = Object.fromEntries( | ||
| Object.entries(lexiconFingerprints).sort(([a], [b]) => a.localeCompare(b)), | ||
| ); | ||
| const spineContent = JSON.stringify({ patterns: sortedPatterns, routes: sortedRoutes, anchorIds, lexiconCanonicals, lexiconFingerprints: sortedLexiconFingerprints }); |
There was a problem hiding this comment.
Include retrieval text in edition hashing
buildIndexingView hashes a reduced pattern/route spine, but it omits fields that drive retrieval scoring (notably pattern.searchableText and route.searchableText). Because scorePatternQuery/scoreRouteQuery rank candidates using those text fields, a prose-only source edit can change runtime answers while classifyChange still returns no_change when the edition hash matches. This makes refreshClassification inaccurate for behavior-affecting content changes.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
This is intentional. The edition hash captures the semantic spine — structural metadata (IDs, titles, statuses, relations, aliases, routes, anchors, lexicon). The sourceHash field (line 65) separately tracks raw source-level changes. If sourceHash differs but edition doesn't, classifyChange correctly returns viewing_change — indicating the source text changed but the structural/semantic shape didn't. Including searchableText in the edition hash would conflate prose rewording (which doesn't affect structure) with actual semantic changes, making the classification less useful. Retrieval scoring changes from prose edits are already detectable via sourceHash comparison.
| for (const pattern of Object.values(current.patterns)) { | ||
| for (const edge of pattern.relationEdges) { | ||
| allRelationTargets.add(edge.to); | ||
| } | ||
| } |
There was a problem hiding this comment.
Check dangling references from routes in sentinel
runRefreshSentinels builds no_dangling_references from current.patterns[*].relationEdges only, so route references are never validated here. If a route keeps an invalid orderedIds/optionalIds/landingIds target after an edit, this sentinel can still report passed: true, which hides broken route wiring in the refresh classification output.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Valid observation, but this is by design rather than a bug. The no_dangling_references sentinel specifically validates relation graph integrity — ensuring pattern relation edge targets resolve to known nodes. Route orderedIds/optionalIds/landingIds are validated during compilation (the compiler rejects unknown IDs when building the route graph). Adding route node reference checks to this sentinel would be a reasonable enhancement but isn't a regression — route wiring integrity is already enforced upstream.
Code ReviewOverviewAdds a stable Strengths
Issues & Risks
|
…eld equality, standardize truncation, refine entity retargeting - Include 'no_change' and 'entity_addition' in ChangeFamily type directly (eliminates ChangeFamily | 'no_change' union drift from Zod schema) - Replace JSON.stringify equality in entryEqual/routeEqual with field-by-field comparison (avoids allocation, removes key-order fragility) - Standardize all sentinel truncation to 'first N items' via formatDetail() (replaces mixed truncateDetail(chars) and .slice(0,10) strategies) - Distinguish pure entity_addition (benign) from described_entity_retargeting (removals/renames with real retarget risk) - Add explicit anchor/lexicon check in viewing_change fallthrough so future IndexingView fields don't silently get swept into viewing_change Co-Authored-By: Stanislau <nedbailov375426@gmail.com>
|
@venikman All seven issues from your review addressed in
|
|
All review feedback addressed across 5 commits. Independent verification:
Review comments addressed (45 total):
PR description updated. Ready to merge. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5007b430ae
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (addedIds.length > 0) { | ||
| return 'entity_addition'; |
There was a problem hiding this comment.
Restrict entity_addition to pure-addition diffs
inferChangeFamily currently returns entity_addition whenever addedIds is non-empty, even if existing patterns/routes also changed in the same rebuild. In that mixed-edit case, semantic or slot-affecting edits are masked as a benign addition, which can mislead any downstream automation that treats changeFamily as a risk signal. Gate this branch on changedIds.length === 0 (and no removals) so non-addition edits are still classified accurately.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Valid bug — confirmed and fixed in 3665969.
inferChangeFamily() now gates entity_addition on changedIds.length === 0, so mixed addition+change diffs fall through to the semantic/slot classification logic instead of being masked as benign additions.
| return audit; | ||
| } | ||
|
|
||
| const previousIndexingView = await this.loadIndexingView(); |
There was a problem hiding this comment.
Fall back to snapshot-derived previous indexing view
When rebuilding after an upgrade from older artifacts, loadIndexingView() can return undefined even though existingSnapshot is present, causing refreshClassification to be omitted for that first source-changing rebuild. This drops the migration diff signal exactly when users expect it; if the file is missing but a compatible prior snapshot exists, derive previousIndexingView from buildIndexingView(existingSnapshot) before classifying.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Valid concern — fixed in 3665969.
runtime.ts now falls back to buildIndexingView(existingSnapshot) when loadIndexingView() returns undefined but a compatible snapshot exists. This ensures refreshClassification is populated on the first source-changing rebuild after upgrading from older artifacts that lack indexing-view.json.
…pshot-derived indexing view - inferChangeFamily() now returns 'entity_addition' only when changedIds is empty, preventing mixed addition+change diffs from being masked as benign. - runtime.ts falls back to buildIndexingView(existingSnapshot) when the indexing-view file is missing but a compatible snapshot exists, ensuring refreshClassification is not silently dropped during artifact upgrades. Co-Authored-By: Stanislau <nedbailov375426@gmail.com>
Merge-readiness confirmationAll review feedback addressed (46+ comments across Gemini, Copilot, CodeRabbit, Devin Review, Codex, and venikman):
CI: 1/1 passing Ready to merge. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3665969f5b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const previousIndexingView = await this.loadIndexingView() | ||
| ?? (compatibleSnapshot && existingSnapshot ? buildIndexingView(existingSnapshot) : undefined); |
There was a problem hiding this comment.
Remove compatibility gate from snapshot-based view fallback
previousIndexingView is only derived from existingSnapshot when compatibleSnapshot is true, but compatibility is decided by snapshotNeedsRebuild checks on indexMap fields that buildIndexingView does not use. In an upgrade where an older snapshot has stale indexMap metadata but still has valid patternGraph/routeGraph/anchorMap/lexicon, this leaves previousIndexingView undefined and drops refreshClassification for the first rebuild, so the migration diff signal is lost. The fallback should be based on whether the snapshot has the fields buildIndexingView actually consumes, not the broader compatibility flag.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Valid point — fixed in 5fb3bd0.
snapshotNeedsRebuild() checks indexMap metadata fields (description, metadata.role, metadata.routeBearing), but buildIndexingView() only consumes patternGraph, routeGraph, anchorMap, and lexicon — none of which are gated by that check. Removed the compatibleSnapshot gate so the fallback fires whenever existingSnapshot exists, ensuring refreshClassification is populated even when upgrading from snapshots with stale indexMap metadata.
…back buildIndexingView() only consumes patternGraph/routeGraph/anchorMap/lexicon, none of which are checked by snapshotNeedsRebuild() (which gates on indexMap metadata fields). Using existingSnapshot directly ensures refreshClassification is populated even when upgrading from snapshots with stale indexMap metadata. Co-Authored-By: Stanislau <nedbailov375426@gmail.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5fb3bd0fdb
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| orderedIds: [...route.orderedIds], | ||
| optionalIds: [...route.optionalIds], | ||
| landingIds: [...route.landingIds], | ||
| routeSurfaces: [...route.routeSurfaces], | ||
| constraints: route.firstHonestBurden ? [route.firstHonestBurden] : [], |
There was a problem hiding this comment.
Hash route retrieval fields in indexing view
buildIndexingView currently fingerprints routes without anchorIds, citations, nextOwners, or reroutes, but runtime behavior uses those fields (QueryEngine.anchorIdsForNode/buildRouteAnswer consume anchors and citations, and buildRouteRelations turns next-owner/reroute edits into traversable edges). In the scenario where only those fields change, edition remains the same and classifyChange reports no_change, which misclassifies a behavior-affecting rebuild as unchanged. Please include these route fields (or equivalent derived data) in the route view/equality inputs used for editioning.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Valid catch — confirmed this is a real bug. IndexingViewRoute omitted anchorIds, citations, nextOwners, and reroutes, all of which affect runtime behavior:
anchorIds→ consumed byQueryEngine.anchorIdsForNode()for groundingcitations→ consumed bybuildRouteAnswer()andbuildRouteRelations()nextOwners/reroutes→ consumed bybuildRouteRelations()andindex-projector.ts
Changes to any of these fields would leave the edition hash unchanged and classifyChange would report no_change, masking a behavior-affecting rebuild.
Fixed in b0b5eb1:
- Added
anchorIds,citations,nextOwners,reroutestoIndexingViewRouteinterface buildIndexingView()now populates these fields (sorted for deterministic hashing)routeEqual()compares all four new fieldsinferChangeFamily()semantic check includes the new fields
Typecheck and lint pass clean.
…xing view Codex P2: buildIndexingView omitted route fields that affect runtime behavior (anchor selection, graph traversal, answer projection). Changes to these fields would not be detected by classifyChange, causing no_change when behavior actually changed. Co-Authored-By: Stanislau <nedbailov375426@gmail.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b0b5eb1e76
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| previous.lexiconCanonicals.length !== current.lexiconCanonicals.length || | ||
| previous.lexiconCanonicals.some((id, i) => id !== current.lexiconCanonicals[i]); |
There was a problem hiding this comment.
Classify lexicon fingerprint edits as semantic changes
inferChangeFamily only treats lexicon differences as viewing_change when the lexicon ID list (lexiconCanonicals) changes, but ignores fingerprint-only edits such as normalizedKeys/linkedNodeIds changes. In that scenario changedIds is empty (only patterns/routes are diffed), so behavior-changing lexical retrieval updates fall through to a non-semantic family even though query matching and lexeme expansion depend on those fields. This can mislead downstream automation that relies on changeFamily severity to triage rebuild risk.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Investigated — this is working as intended. Here's the analysis:
Edition hash catches it: The spineContent at line 64 includes lexiconFingerprints (containing normalizedKeys and linkedNodeIds), so a fingerprint-only edit does change the edition hash. This means classifyChange will never return no_change for this scenario — the previous.edition === current.edition guard at line 107 correctly passes through.
Classification is correct: When changedIds is empty (no pattern/route field changes) but the edition differs, the code falls through to viewing_change at line 142. This is the right family for lexicon fingerprint edits:
viewing_change= "retrieval behavior may differ but entity identity/meaning is unchanged"linkedNodeIds/normalizedKeyschanges affect how retrieval routes through existing entities, not what entities exist or mean- All families except
no_changetrigger a rebuild, so the risk is properly surfaced
The concern about "misleading downstream automation" doesn't apply because viewing_change already signals "something retrieval-relevant changed, rebuild needed." Promoting this to editioned_semantic_change would overstate the severity — these aren't entity-level semantic changes (title, status, relations), they're lexical routing changes.
What
Add a stable machine-facing indexing view (
indexing-view.json) and semantic refresh classification to the runtime. The indexing view captures the semantic spine of a compiled snapshot — patterns, routes, anchors, lexicon — with a deterministic edition hash. On rebuild, the runtime classifies the change into one of six families (no_change,viewing_change,slot_explicitness_change,editioned_semantic_change,entity_addition,described_entity_retargeting) and runs five refresh sentinels.Why
Closes #3. Hash-only refresh cannot distinguish a whitespace edit from a semantic rename. The indexing view enables diff-based refresh classification so operators/spec-authors know whether a rebuild was cosmetic or behaviorally significant. Classification is expert-only output (surfaced in
BuildAudit, not in any public MCP tool).Type
feat— new capabilityChanges
src/runtime/indexing-view.ts(new, 331 lines):buildIndexingView(),classifyChange(),inferChangeFamily(),runRefreshSentinels(), field-by-field equality helpers (entryEqual,routeEqual,arraysEqual,relationEdgesEqual)src/runtime/types.ts: Added typesChangeFamily,IndexingViewEntry,IndexingViewRoute,IndexingView,RefreshSentinel,RefreshClassification; extendedBuildAuditwithrefreshClassificationsrc/runtime/runtime.ts: Loads previous indexing view (with fallback to snapshot-derived view for artifact upgrades), builds current view, classifies change, backfills on no-rebuild pathsrc/runtime/constants.ts: AddedindexingViewartifact filenametests/indexing-view.test.ts(new): Unit tests forbuildIndexingViewandclassifyChangeKey fixes addressed during review (46 comments total):
JSON.stringifywithentryEqual()/routeEqual()for determinism and performanceformatDetail()helper (cap 10 items +… (N more)suffix)changedIds.length === 0so mixed addition+change diffs are classified accurately (not masked as benign)buildIndexingView(existingSnapshot)when indexing-view file is missing but compatible snapshot existsfirstHonestBurdenin route constraints,normalizedKeys/linkedNodeIdsin lexicon fingerprintsValidation
bun run lintpasses locallybun run checkpasses locallybun run testpasses locallybun run buildsucceedsbun run docs:buildsucceeds (if docs touched)Boundary check
FPF-spec.mdonly — no additional corpora addedsrc/mcp/tool-contracts.tsAgent metadata
Requested by: @venikman