Skip to content

Parse plain-text catalog dependency cells into typed relation edges#23

Closed
venikman wants to merge 1 commit into
mainfrom
fix/parse-catalog-plain-text-relations
Closed

Parse plain-text catalog dependency cells into typed relation edges#23
venikman wants to merge 1 commit into
mainfrom
fix/parse-catalog-plain-text-relations

Conversation

@venikman
Copy link
Copy Markdown
Owner

@venikman venikman commented Apr 12, 2026

Summary

  • Add plain-text fallback parser to parseLabeledRelations in compiler.ts for catalog dependency cells that use Builds on: A.1 format instead of **Builds on:** A.1
  • Extract pushRelationEdges helper to deduplicate edge-creation logic between bold and plain-text parsers
  • Fallback only activates when the bold-markdown regex finds zero matches, preventing double-counting

Impact

Relation type Before After Change
builds_on 353 1,132 +221%
coordinates_with 167 583 +249%
prerequisite_for 37 68 +84%
used_by 32 122 +281%
Total edges 14,006 15,402 +1,396

218 of 224 patterns with dependency text now have parsed typed relations (was 76 before). The remaining 6 reference non-standard IDs (P-4, CHR-CAL, Part C (CHR)) that extractIds cannot parse.

Test plan

  • All 38 existing tests pass (npx rstest run)
  • Force-rebuilt index via refresh_fpf_index MCP tool
  • Verified A.0 now has builds_on and coordinates_with edges (was only explicit_reference)
  • Verified G.9 and G.13 typed relations populated correctly
  • End-to-end MCP client validation: ask_fpf, query_fpf_spec, inspect_fpf_node, trace_fpf_path all return correct results against the improved index

🤖 Generated with Claude Code


Open with Devin

Summary by CodeRabbit

  • Bug Fixes & Improvements
    • Enhanced relation label parsing with improved fallback logic: now attempts bold-markdown format first, then falls back to plain-text format if needed.
    • Strengthened target extraction and label normalization for more robust parsing capabilities.

The parseLabeledRelations function only matched **bold:** markdown
format, but catalog dependency cells use plain text like
"Builds on: A.1, A.2. Constrains: B.3." This caused 148 of 237
patterns to lose typed semantic relations (builds_on, constrains,
coordinates_with, etc.) from catalog metadata.

Add a plain-text fallback parser that activates when the bold regex
finds nothing. Uses lookahead on known RELATION_LABELS keys to split
multi-label cells correctly. Extract pushRelationEdges helper to
deduplicate the edge-creation logic.

Impact: builds_on 353→1132, coordinates_with 167→583,
prerequisite_for 37→68. Total +1396 typed relation edges.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 12, 2026 16:05
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 12, 2026

📝 Walkthrough

Walkthrough

Modified the parseLabeledRelations function in the compiler to use a two-stage parsing strategy: first attempting to parse bold-formatted relation labels (**Label:** Targets), then falling back to plain-text catalog format. Introduced a shared helper function pushRelationEdges to consolidate label normalization and edge construction logic.

Changes

Cohort / File(s) Summary
Relation Label Parsing Refactor
src/runtime/compiler.ts
Replaced single-stage regex-based relation label parsing with a two-stage approach (bold-format first, plain-text fallback). Extracted edge construction logic into a shared pushRelationEdges helper to normalize labels, resolve relation kinds, and build edges for both parsing paths.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Two paths now guide the parser's way,
Bold labels first see the light of day,
When none are found, plain text takes flight,
A fallback dance, precise and right,
Relations flow through edges bright!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: adding plain-text catalog dependency parsing to handle non-bold-markdown relation formats.
Description check ✅ Passed The description covers the What, Impact, and Test plan sections comprehensively with detailed metrics and validation results, though the formal template sections (Why, Type, Changes, Boundary check) are not explicitly structured.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/parse-catalog-plain-text-relations

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the parseLabeledRelations function to support both bold markdown and plain-text catalog formats for relations, introducing a pushRelationEdges helper to handle the edge creation logic. The reviewer suggested a performance optimization to move the dynamic regex construction for plain-text labels to a module-level constant or to memoize it, as the underlying labels are static and re-calculating the regex on every function call adds unnecessary overhead.

Comment thread src/runtime/compiler.ts
Comment on lines +1205 to +1212
const escapedLabels = Object.keys(RELATION_LABELS).map(
(label) => label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'),
);
const labelAlternation = escapedLabels.join('|');
const plainRegex = new RegExp(
`(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$)`,
'gis',
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The plainRegex and its associated strings (escapedLabels, labelAlternation) are re-constructed on every call to parseLabeledRelations when bold relations are absent. Since RELATION_LABELS is a static constant, this regex should be defined as a module-level constant or memoized to avoid unnecessary overhead during the compilation process, especially as this function is called for every pattern in the catalog.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/runtime/compiler.ts`:
- Around line 1204-1216: The current logic skips plain-text label parsing when
relationEdges already contains matches (the if (relationEdges.length === 0)
branch), causing mixed-format cells to miss edges; remove the conditional so the
plainRegex pass always runs after the bold/initial parse: keep the escapedLabels
/ labelAlternation / plainRegex construction and the loop that calls
pushRelationEdges(relationEdges, sourceId, match[1] ?? '', match[2] ?? '',
sourceCitation), but execute it unconditionally (or after the bold-match pass
without gating on relationEdges.length) so both bold and plain labels in text
are parsed and appended to relationEdges.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 07de35c0-0e6e-451c-ae7b-a50a553eebff

📥 Commits

Reviewing files that changed from the base of the PR and between afd7e9d and 737c03c.

📒 Files selected for processing (1)
  • src/runtime/compiler.ts

Comment thread src/runtime/compiler.ts
Comment on lines +1204 to 1216
if (relationEdges.length === 0) {
const escapedLabels = Object.keys(RELATION_LABELS).map(
(label) => label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'),
);
const labelAlternation = escapedLabels.join('|');
const plainRegex = new RegExp(
`(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$)`,
'gis',
);
for (const match of text.matchAll(plainRegex)) {
pushRelationEdges(relationEdges, sourceId, match[1] ?? '', match[2] ?? '', sourceCitation);
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Parse plain-text labels even when bold labels already matched.

Line 1204 makes plain-text parsing conditional on zero bold matches, which drops edges for mixed-format cells (e.g., **Builds on:** A.1 Used by: B.2 loses used_by).

💡 Suggested fix
-  if (relationEdges.length === 0) {
-    const escapedLabels = Object.keys(RELATION_LABELS).map(
-      (label) => label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'),
-    );
-    const labelAlternation = escapedLabels.join('|');
-    const plainRegex = new RegExp(
-      `(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$)`,
-      'gis',
-    );
-    for (const match of text.matchAll(plainRegex)) {
-      pushRelationEdges(relationEdges, sourceId, match[1] ?? '', match[2] ?? '', sourceCitation);
-    }
-  }
+  const escapedLabels = Object.keys(RELATION_LABELS).map(
+    (label) => label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'),
+  );
+  const labelAlternation = escapedLabels.join('|');
+  const plainRegex = new RegExp(
+    `(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$)`,
+    'gis',
+  );
+  for (const match of text.matchAll(plainRegex)) {
+    pushRelationEdges(relationEdges, sourceId, match[1] ?? '', match[2] ?? '', sourceCitation);
+  }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (relationEdges.length === 0) {
const escapedLabels = Object.keys(RELATION_LABELS).map(
(label) => label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'),
);
const labelAlternation = escapedLabels.join('|');
const plainRegex = new RegExp(
`(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$)`,
'gis',
);
for (const match of text.matchAll(plainRegex)) {
pushRelationEdges(relationEdges, sourceId, match[1] ?? '', match[2] ?? '', sourceCitation);
}
}
const escapedLabels = Object.keys(RELATION_LABELS).map(
(label) => label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'),
);
const labelAlternation = escapedLabels.join('|');
const plainRegex = new RegExp(
`(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$)`,
'gis',
);
for (const match of text.matchAll(plainRegex)) {
pushRelationEdges(relationEdges, sourceId, match[1] ?? '', match[2] ?? '', sourceCitation);
}
🧰 Tools
🪛 ast-grep (0.42.1)

[warning] 1208-1211: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$),
'gis',
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/runtime/compiler.ts` around lines 1204 - 1216, The current logic skips
plain-text label parsing when relationEdges already contains matches (the if
(relationEdges.length === 0) branch), causing mixed-format cells to miss edges;
remove the conditional so the plainRegex pass always runs after the bold/initial
parse: keep the escapedLabels / labelAlternation / plainRegex construction and
the loop that calls pushRelationEdges(relationEdges, sourceId, match[1] ?? '',
match[2] ?? '', sourceCitation), but execute it unconditionally (or after the
bold-match pass without gating on relationEdges.length) so both bold and plain
labels in text are parsed and appended to relationEdges.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a plain-text fallback for parsing typed relation edges from catalog dependency cells/sections when relation labels are written as Builds on: A.1 (non-bold), and refactors edge creation into a shared helper to avoid duplication.

Changes:

  • Renamed the existing bold-markdown relation regex and routed edge creation through a new pushRelationEdges helper.
  • Added a plain-text label parser that activates as a fallback when no relation edges were produced by the bold pass.
  • Introduced pushRelationEdges to centralize label normalization, relation kind lookup, and target ID extraction.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/runtime/compiler.ts
Comment on lines +1202 to +1205
// Plain-text catalog format: Builds on: A.1, A.2. Constrains: B.3.
// Uses lookahead to split on the next known label or end-of-string.
if (relationEdges.length === 0) {
const escapedLabels = Object.keys(RELATION_LABELS).map(
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fallback condition is relationEdges.length === 0, which means the plain-text parser will run not only when the bold regex finds zero matches, but also when bold matches exist but produce zero edges (e.g., unknown labels or labels with no parseable IDs). If the intended behavior is “only fallback when the bold-markdown regex finds zero matches” (per PR description), track whether boldRegex matched at least once (separately from edge count) and use that to gate the fallback, or update the PR description to reflect the actual behavior.

Copilot uses AI. Check for mistakes.
Comment thread src/runtime/compiler.ts
Comment on lines +1204 to +1212
if (relationEdges.length === 0) {
const escapedLabels = Object.keys(RELATION_LABELS).map(
(label) => label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'),
);
const labelAlternation = escapedLabels.join('|');
const plainRegex = new RegExp(
`(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$)`,
'gis',
);
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

escapedLabels, labelAlternation, and the plainRegex are rebuilt on every parseLabeledRelations call that reaches the fallback. Since RELATION_LABELS is constant, consider hoisting a precomputed plainRegex (or precomputed labelAlternation) to module scope to avoid repeated allocations during full-spec compilation.

Copilot uses AI. Check for mistakes.
Comment thread src/runtime/compiler.ts
Comment on lines +1202 to 1216
// Plain-text catalog format: Builds on: A.1, A.2. Constrains: B.3.
// Uses lookahead to split on the next known label or end-of-string.
if (relationEdges.length === 0) {
const escapedLabels = Object.keys(RELATION_LABELS).map(
(label) => label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'),
);
const labelAlternation = escapedLabels.join('|');
const plainRegex = new RegExp(
`(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$)`,
'gis',
);
for (const match of text.matchAll(plainRegex)) {
pushRelationEdges(relationEdges, sourceId, match[1] ?? '', match[2] ?? '', sourceCitation);
}
}
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New plain-text relation parsing is introduced here, but the existing integration tests only assert that the snapshot has some relations (e.g., outline_child / explicit_reference) and don’t validate any typed catalog relations. Consider adding an assertion in the runtime snapshot test to check at least one known builds_on / coordinates_with edge from a section that uses the plain Builds on: ... format, so regressions in this parser are caught.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 737c03c56e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/runtime/compiler.ts
);
const labelAlternation = escapedLabels.join('|');
const plainRegex = new RegExp(
`(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$)`,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Stop plain-text parser at unknown label boundaries

The plain-text regex only terminates a relation value at the next known relation label or end-of-string, so IDs that belong to other labels get misattributed to the previous relation. For example, FPF-spec.md row A.6.1 has Builds on: ... Instances: USM (A.2.6), UNM (A.19), and this parser currently emits builds_on edges to A.2.6 and A.19. That silently corrupts the relation graph and can skew retrieval/provenance because extractIds runs over text that should have been excluded from the Builds on segment.

Useful? React with 👍 / 👎.

@venikman
Copy link
Copy Markdown
Owner Author

Superseded by #24 which includes the full Mastra convention restructure

@venikman venikman closed this Apr 12, 2026
@venikman venikman deleted the fix/parse-catalog-plain-text-relations branch April 12, 2026 18:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants