feat: add stable indexing view and semantic refresh classification #29
@@ -0,0 +1,343 @@
```typescript
import { createHash } from 'node:crypto';
import type {
  ChangeFamily,
  IndexingView,
  IndexingViewEntry,
  IndexingViewRoute,
  RefreshClassification,
  RefreshSentinel,
  Snapshot,
} from './types.js';

export function buildIndexingView(snapshot: Snapshot): IndexingView {
  const patterns: Record<string, IndexingViewEntry> = {};
  for (const [id, pattern] of Object.entries(snapshot.patternGraph.nodes)) {
    patterns[id] = {
      id,
      kind: 'pattern',
      title: pattern.title,
      status: pattern.status,
      type: pattern.type,
      normativity: pattern.normativity,
      part: pattern.part,
      cluster: pattern.cluster,
      aliases: [...pattern.aliases].sort(),
      anchorIds: [...pattern.sectionIds].sort(),
      relationEdges: pattern.relations
        .map((r) => ({ from: r.from, relation: r.relation, to: r.to }))
        .sort((a, b) => `${a.from}:${a.relation}:${a.to}`.localeCompare(`${b.from}:${b.relation}:${b.to}`)),
    };
  }

  const routes: Record<string, IndexingViewRoute> = {};
  for (const [id, route] of Object.entries(snapshot.routeGraph.nodes)) {
    routes[id] = {
      id,
```
Comment on lines +32 to +35
```typescript
      name: route.name,
      orderedIds: [...route.orderedIds],
      optionalIds: [...route.optionalIds],
      landingIds: [...route.landingIds],
      routeSurfaces: [...route.routeSurfaces],
      constraints: route.firstHonestBurden ? [route.firstHonestBurden] : [],
```
Comment on lines +37 to +41
Contributor
Valid catch: confirmed this is a real bug.
Changes to any of these fields would leave the Fixed in
Typecheck and lint pass clean.
```typescript
      anchorIds: [...route.anchorIds].sort(),
      citations: [...route.citations].sort(),
      nextOwners: [...route.nextOwners].sort(),
      reroutes: [...route.reroutes].sort(),
    };
```
Comment on lines +39 to +46
```typescript
  }

  const anchorIds = Object.keys(snapshot.anchorMap).sort();
  const lexiconCanonicals = Object.keys(snapshot.lexicon).sort();
```
The lexicon portion of the edition hash is only
Contributor
Fixed in ab3cf69: the edition hash now includes a
```typescript
  const lexiconFingerprints: Record<string, { normalizedKeys: string[]; linkedNodeIds: string[] }> = {};
  for (const [id, entry] of Object.entries(snapshot.lexicon)) {
    lexiconFingerprints[id] = {
      normalizedKeys: [...entry.normalizedKeys].sort(),
      linkedNodeIds: [...entry.linkedNodeIds].sort(),
    };
  }

  const sortedPatterns = Object.fromEntries(Object.entries(patterns).sort(([a], [b]) => a.localeCompare(b)));
  const sortedRoutes = Object.fromEntries(Object.entries(routes).sort(([a], [b]) => a.localeCompare(b)));
  const sortedLexiconFingerprints = Object.fromEntries(
    Object.entries(lexiconFingerprints).sort(([a], [b]) => a.localeCompare(b)),
  );
  const spineContent = JSON.stringify({ patterns: sortedPatterns, routes: sortedRoutes, anchorIds, lexiconCanonicals, lexiconFingerprints: sortedLexiconFingerprints });
```
Contributor
This is intentional. The edition hash captures the semantic spine: structural metadata (IDs, titles, statuses, relations, aliases, routes, anchors, lexicon). The
```typescript
  const edition = `sha256:${createHash('sha256').update(spineContent).digest('hex').slice(0, 16)}`;

  return {
    edition,
    sourceHash: snapshot.sourceHash,
    builtAt: snapshot.builtAt,
    patterns,
    routes,
    anchorIds,
    lexiconCanonicals,
  };
}

export function classifyChange(
  previous: IndexingView,
  current: IndexingView,
): RefreshClassification {
  const sentinels = runRefreshSentinels(previous, current);

  const prevPatternIds = new Set(Object.keys(previous.patterns));
  const currPatternIds = new Set(Object.keys(current.patterns));
  const prevRouteIds = new Set(Object.keys(previous.routes));
  const currRouteIds = new Set(Object.keys(current.routes));

  const prevAllIds = new Set([...prevPatternIds, ...prevRouteIds]);
  const currAllIds = new Set([...currPatternIds, ...currRouteIds]);
```
Comment on lines +88 to +90
```diff
-  const prevAllIds = new Set([...prevPatternIds, ...prevRouteIds]);
-  const currAllIds = new Set([...currPatternIds, ...currRouteIds]);
+  const prevAnchorIds = new Set(previous.anchorIds);
+  const currAnchorIds = new Set(current.anchorIds);
+  const prevLexiconCanonicals = new Set(previous.lexiconCanonicals);
+  const currLexiconCanonicals = new Set(current.lexiconCanonicals);
+  const prevAllIds = new Set([
+    ...prevPatternIds,
+    ...prevRouteIds,
+    ...[...prevAnchorIds].map((id) => `anchor:${id}`),
+    ...[...prevLexiconCanonicals].map((canonical) => `lexicon:${canonical}`),
+  ]);
+  const currAllIds = new Set([
+    ...currPatternIds,
+    ...currRouteIds,
+    ...[...currAnchorIds].map((id) => `anchor:${id}`),
+    ...[...currLexiconCanonicals].map((canonical) => `lexicon:${canonical}`),
+  ]);
```
Intentional design choice — the classification deliberately scopes addedIds/removedIds/changedIds to pattern+route IDs since those are the primary semantic entities. Anchor and lexicon changes are reflected in the edition hash difference and correctly classified as viewing_change by inferChangeFamily(). Including anchor/lexicon IDs in the diff arrays would mix namespace concerns (pattern IDs vs anchor IDs). The sentinel checks (anchor_continuity, alias_coverage) already provide granular anchor/lexicon regression info in a dedicated field. If richer anchor/lexicon diffs become needed, we can add dedicated arrays in a follow-up.
Classify lexicon fingerprint edits as semantic changes
inferChangeFamily only treats lexicon differences as viewing_change when the lexicon ID list (lexiconCanonicals) changes, but ignores fingerprint-only edits such as normalizedKeys/linkedNodeIds changes. In that scenario changedIds is empty (only patterns/routes are diffed), so behavior-changing lexical retrieval updates fall through to a non-semantic family even though query matching and lexeme expansion depend on those fields. This can mislead downstream automation that relies on changeFamily severity to triage rebuild risk.
Investigated — this is working as intended. Here's the analysis:
Edition hash catches it: The spineContent at line 64 includes lexiconFingerprints (containing normalizedKeys and linkedNodeIds), so a fingerprint-only edit does change the edition hash. This means classifyChange will never return no_change for this scenario — the previous.edition === current.edition guard at line 107 correctly passes through.
Classification is correct: When changedIds is empty (no pattern/route field changes) but the edition differs, the code falls through to viewing_change at line 142. This is the right family for lexicon fingerprint edits:
- viewing_change = "retrieval behavior may differ but entity identity/meaning is unchanged"
- linkedNodeIds/normalizedKeys changes affect how retrieval routes through existing entities, not what entities exist or mean
- All families except no_change trigger a rebuild, so the risk is properly surfaced
The concern about "misleading downstream automation" doesn't apply because viewing_change already signals "something retrieval-relevant changed, rebuild needed." Promoting this to editioned_semantic_change would overstate the severity — these aren't entity-level semantic changes (title, status, relations), they're lexical routing changes.
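The decision order described in this reply can be sketched as a small ladder. This is only an illustration under stated assumptions: `SketchFamily` and `inferFamilySketch` are hypothetical names, only the three families named in this thread are modelled, and added/removed IDs are left out. It is not the PR's actual `inferChangeFamily`.

```typescript
// Hypothetical stand-in for the ChangeFamily type; only the three families
// named in the discussion are modelled here.
type SketchFamily = 'no_change' | 'viewing_change' | 'editioned_semantic_change';

// Assumed ladder: identical editions short-circuit, entity-level field changes
// escalate, and any other edition difference is treated as a viewing change.
function inferFamilySketch(
  prevEdition: string,
  currEdition: string,
  changedIds: readonly string[],
): SketchFamily {
  if (prevEdition === currEdition) return 'no_change';
  if (changedIds.length > 0) return 'editioned_semantic_change';
  return 'viewing_change';
}

// A lexicon fingerprint edit changes the edition but no pattern/route fields.
const fingerprintOnlyEdit = inferFamilySketch('sha256:aaaa', 'sha256:bbbb', []);
// A title edit changes the edition and reports the affected pattern ID.
const titleEdit = inferFamilySketch('sha256:aaaa', 'sha256:cccc', ['A.1']);
const untouched = inferFamilySketch('sha256:aaaa', 'sha256:aaaa', []);

console.log(fingerprintOnlyEdit, titleEdit, untouched);
// prints: viewing_change editioned_semantic_change no_change
```

Under this ordering a fingerprint-only edit can never be reported as no_change, which is the property the reply relies on.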
The semantic change detection for routes is incomplete. It currently only checks name, orderedIds, and landingIds. Changes to optionalIds, routeSurfaces, or constraints should also be considered semantic changes as they alter the route's definition and behavior.
```typescript
if (prevRoute && currRoute) {
  return (
    prevRoute.name !== currRoute.name ||
    JSON.stringify(prevRoute.orderedIds) !== JSON.stringify(currRoute.orderedIds) ||
    JSON.stringify(prevRoute.landingIds) !== JSON.stringify(currRoute.landingIds) ||
    JSON.stringify(prevRoute.optionalIds) !== JSON.stringify(currRoute.optionalIds) ||
    JSON.stringify(prevRoute.routeSurfaces) !== JSON.stringify(currRoute.routeSurfaces) ||
    JSON.stringify(prevRoute.constraints) !== JSON.stringify(currRoute.constraints)
  );
}
```
Agreed — route semantic change detection now also compares optionalIds, routeSurfaces, and constraints. Fixed in e137efa.
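One way to keep this comparison from drifting as route fields are added is to drive it from a single field list. The sketch below assumes a simplified `RouteView` shape with string arrays. `RouteView`, `SEMANTIC_ROUTE_FIELDS`, and `routeChanged` are hypothetical names for illustration, not the code committed in e137efa.

```typescript
// Assumed, simplified route shape; the real IndexingViewRoute may differ.
interface RouteView {
  name: string;
  orderedIds: string[];
  optionalIds: string[];
  landingIds: string[];
  routeSurfaces: string[];
  constraints: string[];
}

// Every field treated as semantic in the review thread, listed once.
const SEMANTIC_ROUTE_FIELDS: ReadonlyArray<keyof RouteView> = [
  'name',
  'orderedIds',
  'optionalIds',
  'landingIds',
  'routeSurfaces',
  'constraints',
];

// True when any listed field differs; JSON.stringify gives an order-sensitive
// comparison for arrays and a plain equality check for strings.
function routeChanged(prev: RouteView, curr: RouteView): boolean {
  return SEMANTIC_ROUTE_FIELDS.some(
    (field) => JSON.stringify(prev[field]) !== JSON.stringify(curr[field]),
  );
}

const baseRoute: RouteView = {
  name: 'onboarding',
  orderedIds: ['A.1', 'B.2'],
  optionalIds: [],
  landingIds: ['A.1'],
  routeSurfaces: ['cli'],
  constraints: [],
};
const withOptional: RouteView = { ...baseRoute, optionalIds: ['C.3'] };

console.log(routeChanged(baseRoute, { ...baseRoute })); // prints: false
console.log(routeChanged(baseRoute, withOptional)); // prints: true
```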
Copilot
AI
Apr 13, 2026
Several sentinel detail fields can grow without bound (e.g., Removed pattern IDs: ${missingIds.join(', ')}), which can bloat build-audit.json and logs on large diffs. Consider truncating these lists (similar to the slice(0, 10) used for anchors) and including a count of omitted items.
Good catch — added truncateDetail() helper that caps all sentinel detail strings at 500 characters. The anchor_continuity sentinel already had .slice(0, 10) but the others (id_continuity, alias_coverage, route_closure) didn't. Now all are consistently bounded. Fixed in c236dab.
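For illustration, the two bounding strategies mentioned in this thread (a hard character cap and a list cap with an omitted-item count) might look roughly like the sketch below. The names `truncateDetailSketch` and `summarizeIds`, the suffix text, and the default limit of 10 are assumptions. The actual `truncateDetail()` in c236dab is not shown on this page and may differ.

```typescript
// Cap taken from the reply above; everything else here is assumed.
const MAX_DETAIL_LENGTH = 500;
const TRUNCATION_SUFFIX = '... (truncated)';

// Hard cap: the result never exceeds MAX_DETAIL_LENGTH characters.
function truncateDetailSketch(detail: string): string {
  if (detail.length <= MAX_DETAIL_LENGTH) return detail;
  return detail.slice(0, MAX_DETAIL_LENGTH - TRUNCATION_SUFFIX.length) + TRUNCATION_SUFFIX;
}

// List cap: show at most `limit` IDs and report how many were left out.
function summarizeIds(label: string, ids: readonly string[], limit = 10): string {
  const shown = ids.slice(0, limit).join(', ');
  const omitted = ids.length - limit;
  return omitted > 0 ? `${label}: ${shown} (+${omitted} more)` : `${label}: ${shown}`;
}

const manyIds = Array.from({ length: 12 }, (_, i) => `P.${i + 1}`);
const summary = summarizeIds('Removed pattern IDs', manyIds);
const capped = truncateDetailSketch('x'.repeat(600));

console.log(summary);
console.log(capped.length); // prints: 500
```

The list cap keeps the detail readable and tells the reader how much was dropped, while the character cap is a last-resort bound on audit and log size.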
Check dangling references from routes in sentinel
runRefreshSentinels builds no_dangling_references from current.patterns[*].relationEdges only, so route references are never validated here. If a route keeps an invalid orderedIds/optionalIds/landingIds target after an edit, this sentinel can still report passed: true, which hides broken route wiring in the refresh classification output.
Valid observation, but this is by design rather than a bug. The no_dangling_references sentinel specifically validates relation graph integrity — ensuring pattern relation edge targets resolve to known nodes. Route orderedIds/optionalIds/landingIds are validated during compilation (the compiler rejects unknown IDs when building the route graph). Adding route node reference checks to this sentinel would be a reasonable enhancement but isn't a regression — route wiring integrity is already enforced upstream.
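If the enhancement mentioned here were added later, a route reference check could look something like the following. This is a sketch under assumptions: `RouteRefs` and `findDanglingRouteRefs` are hypothetical names, the route shape is reduced to the three ID lists discussed above, and nothing here reflects how the compiler currently validates routes.

```typescript
// Assumed minimal route shape: only the reference-bearing fields.
interface RouteRefs {
  orderedIds: string[];
  optionalIds: string[];
  landingIds: string[];
}

// Returns "routeId -> targetId" for every reference that does not resolve
// to a known node ID, sorted for deterministic output.
function findDanglingRouteRefs(
  routes: Record<string, RouteRefs>,
  knownIds: ReadonlySet<string>,
): string[] {
  const dangling: string[] = [];
  for (const [routeId, route] of Object.entries(routes)) {
    for (const target of [...route.orderedIds, ...route.optionalIds, ...route.landingIds]) {
      if (!knownIds.has(target)) dangling.push(`${routeId} -> ${target}`);
    }
  }
  return dangling.sort();
}

const knownPatternIds = new Set(['A.1', 'B.2']);
const sampleRoutes: Record<string, RouteRefs> = {
  intro: { orderedIds: ['A.1', 'B.2'], optionalIds: [], landingIds: ['A.1'] },
  broken: { orderedIds: ['A.1'], optionalIds: ['Z.9'], landingIds: [] },
};

const danglingRefs = findDanglingRouteRefs(sampleRoutes, knownPatternIds);
console.log(danglingRefs); // prints: [ 'broken -> Z.9' ]
```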
The edition hash is derived from JSON.stringify({ patterns, ... }), but patterns is built by iterating Object.entries(snapshot.patternGraph.nodes) without sorting. Since object key insertion order depends on how the compiler discovered sections, purely "viewing" reordering in the source can change the edition even if the semantic spine is identical. To keep the view stable, build patterns (and routes) using a sorted ID list and/or use a stable stringify with sorted keys.
Already addressed: the edition hash now sorts both patterns and routes by key before JSON.stringify (added in e137efa). The patterns/routes Records stored in the view itself remain insertion-ordered since they're keyed by ID and looked up by ID, but the hash input is always deterministic. Fixed in e137efa.
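The effect of sorting before stringifying can be checked in isolation. The sketch below uses hypothetical helpers (`sortByKey`, `editionOf`) and a toy pattern shape. It mirrors the `sha256:` prefix and 16-character slice from the diff but is not the project's implementation, and it only sorts top-level keys.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical helper: rebuild a record with its entries sorted by key so that
// JSON.stringify sees the same order regardless of insertion order.
function sortByKey<T>(record: Record<string, T>): Record<string, T> {
  return Object.fromEntries(
    Object.entries(record).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)),
  );
}

// Hypothetical helper: hash the sorted content and keep 16 hex characters,
// following the `sha256:<hex>` shape used in the diff.
function editionOf(patterns: Record<string, { title: string }>): string {
  const spineContent = JSON.stringify({ patterns: sortByKey(patterns) });
  return `sha256:${createHash('sha256').update(spineContent).digest('hex').slice(0, 16)}`;
}

// Same content, discovered in a different order.
const discoveredAB = { 'A.1': { title: 'First' }, 'B.2': { title: 'Second' } };
const discoveredBA = { 'B.2': { title: 'Second' }, 'A.1': { title: 'First' } };

const sameEdition = editionOf(discoveredAB) === editionOf(discoveredBA);
const naiveDiffers = JSON.stringify(discoveredAB) !== JSON.stringify(discoveredBA);

console.log({ sameEdition, naiveDiffers });
// prints: { sameEdition: true, naiveDiffers: true }
```

Nested objects would need the same treatment (or a stable stringify) if their key order could vary between builds.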