Parse plain-text catalog dependency cells into typed relation edges by venikman · Pull Request #23 · venikman/fpf-memory

venikman · 2026-04-12T16:05:35Z

Summary

Add plain-text fallback parser to parseLabeledRelations in compiler.ts for catalog dependency cells that use Builds on: A.1 format instead of **Builds on:** A.1
Extract pushRelationEdges helper to deduplicate edge-creation logic between bold and plain-text parsers
Fallback only activates when the bold-markdown regex finds zero matches, preventing double-counting

Impact

Relation type	Before	After	Change
`builds_on`	353	1,132	+221%
`coordinates_with`	167	583	+249%
`prerequisite_for`	37	68	+84%
`used_by`	32	122	+281%
Total edges	14,006	15,402	+1,396

218 of 224 patterns with dependency text now have parsed typed relations (was 76 before). The remaining 6 reference non-standard IDs (P-4, CHR-CAL, Part C (CHR)) that extractIds cannot parse.

Test plan

All 38 existing tests pass (npx rstest run)
Force-rebuilt index via refresh_fpf_index MCP tool
Verified A.0 now has builds_on and coordinates_with edges (was only explicit_reference)
Verified G.9 and G.13 typed relations populated correctly
End-to-end MCP client validation: ask_fpf, query_fpf_spec, inspect_fpf_node, trace_fpf_path all return correct results against the improved index

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes & Improvements
- Enhanced relation label parsing with improved fallback logic: now attempts bold-markdown format first, then falls back to plain-text format if needed.
- Strengthened target extraction and label normalization for more robust parsing capabilities.

The parseLabeledRelations function only matched **bold:** markdown format, but catalog dependency cells use plain text like "Builds on: A.1, A.2. Constrains: B.3." This caused 148 of 237 patterns to lose typed semantic relations (builds_on, constrains, coordinates_with, etc.) from catalog metadata. Add a plain-text fallback parser that activates when the bold regex finds nothing. Uses lookahead on known RELATION_LABELS keys to split multi-label cells correctly. Extract pushRelationEdges helper to deduplicate the edge-creation logic. Impact: builds_on 353→1132, coordinates_with 167→583, prerequisite_for 37→68. Total +1396 typed relation edges. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-04-12T16:05:49Z

📝 Walkthrough

Walkthrough

Modified the parseLabeledRelations function in the compiler to use a two-stage parsing strategy: first attempting to parse bold-formatted relation labels (**Label:** Targets), then falling back to plain-text catalog format. Introduced a shared helper function pushRelationEdges to consolidate label normalization and edge construction logic.

Changes

Cohort / File(s)	Summary
Relation Label Parsing Refactor `src/runtime/compiler.ts`	Replaced single-stage regex-based relation label parsing with a two-stage approach (bold-format first, plain-text fallback). Extracted edge construction logic into a shared `pushRelationEdges` helper to normalize labels, resolve relation kinds, and build edges for both parsing paths.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Two paths now guide the parser's way,
Bold labels first see the light of day,
When none are found, plain text takes flight,
A fallback dance, precise and right,
Relations flow through edges bright!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: adding plain-text catalog dependency parsing to handle non-bold-markdown relation formats.
Description check	✅ Passed	The description covers the What, Impact, and Test plan sections comprehensively with detailed metrics and validation results, though the formal template sections (Why, Type, Changes, Boundary check) are not explicitly structured.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/parse-catalog-plain-text-relations

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request refactors the parseLabeledRelations function to support both bold markdown and plain-text catalog formats for relations, introducing a pushRelationEdges helper to handle the edge creation logic. The reviewer suggested a performance optimization to move the dynamic regex construction for plain-text labels to a module-level constant or to memoize it, as the underlying labels are static and re-calculating the regex on every function call adds unnecessary overhead.

gemini-code-assist · 2026-04-12T16:06:57Z

+    const escapedLabels = Object.keys(RELATION_LABELS).map(
+      (label) => label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'),
+    );
+    const labelAlternation = escapedLabels.join('|');
+    const plainRegex = new RegExp(
+      `(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$)`,
+      'gis',
+    );


The plainRegex and its associated strings (escapedLabels, labelAlternation) are re-constructed on every call to parseLabeledRelations when bold relations are absent. Since RELATION_LABELS is a static constant, this regex should be defined as a module-level constant or memoized to avoid unnecessary overhead during the compilation process, especially as this function is called for every pattern in the catalog.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/runtime/compiler.ts`:
- Around line 1204-1216: The current logic skips plain-text label parsing when
relationEdges already contains matches (the if (relationEdges.length === 0)
branch), causing mixed-format cells to miss edges; remove the conditional so the
plainRegex pass always runs after the bold/initial parse: keep the escapedLabels
/ labelAlternation / plainRegex construction and the loop that calls
pushRelationEdges(relationEdges, sourceId, match[1] ?? '', match[2] ?? '',
sourceCitation), but execute it unconditionally (or after the bold-match pass
without gating on relationEdges.length) so both bold and plain labels in text
are parsed and appended to relationEdges.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 07de35c0-0e6e-451c-ae7b-a50a553eebff

📥 Commits

Reviewing files that changed from the base of the PR and between afd7e9d and 737c03c.

📒 Files selected for processing (1)

src/runtime/compiler.ts

coderabbitai · 2026-04-12T16:07:55Z

+  if (relationEdges.length === 0) {
+    const escapedLabels = Object.keys(RELATION_LABELS).map(
+      (label) => label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'),
+    );
+    const labelAlternation = escapedLabels.join('|');
+    const plainRegex = new RegExp(
+      `(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$)`,
+      'gis',
+    );
+    for (const match of text.matchAll(plainRegex)) {
+      pushRelationEdges(relationEdges, sourceId, match[1] ?? '', match[2] ?? '', sourceCitation);
    }
  }


⚠️ Potential issue | 🟡 Minor

Parse plain-text labels even when bold labels already matched.

Line 1204 makes plain-text parsing conditional on zero bold matches, which drops edges for mixed-format cells (e.g., **Builds on:** A.1 Used by: B.2 loses used_by).

💡 Suggested fix

- if (relationEdges.length === 0) { - const escapedLabels = Object.keys(RELATION_LABELS).map( - (label) => label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), - ); - const labelAlternation = escapedLabels.join('|'); - const plainRegex = new RegExp( - `(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$)`, - 'gis', - ); - for (const match of text.matchAll(plainRegex)) { - pushRelationEdges(relationEdges, sourceId, match[1] ?? '', match[2] ?? '', sourceCitation); - } - } + const escapedLabels = Object.keys(RELATION_LABELS).map( + (label) => label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), + ); + const labelAlternation = escapedLabels.join('|'); + const plainRegex = new RegExp( + `(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$)`, + 'gis', + ); + for (const match of text.matchAll(plainRegex)) { + pushRelationEdges(relationEdges, sourceId, match[1] ?? '', match[2] ?? '', sourceCitation); + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if (relationEdges.length === 0) {

const escapedLabels = Object.keys(RELATION_LABELS).map(

(label) => label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'),

);

const labelAlternation = escapedLabels.join('|');

const plainRegex = new RegExp(

`(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$)`,

'gis',

);

for (const match of text.matchAll(plainRegex)) {

pushRelationEdges(relationEdges, sourceId, match[1] ?? '', match[2] ?? '', sourceCitation);

}

}

const escapedLabels = Object.keys(RELATION_LABELS).map(

(label) => label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'),

);

const labelAlternation = escapedLabels.join('|');

const plainRegex = new RegExp(

`(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$)`,

'gis',

);

for (const match of text.matchAll(plainRegex)) {

pushRelationEdges(relationEdges, sourceId, match[1] ?? '', match[2] ?? '', sourceCitation);

}

🧰 Tools

🪛 ast-grep (0.42.1)

[warning] 1208-1211: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$),
'gis',
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/runtime/compiler.ts` around lines 1204 - 1216, The current logic skips plain-text label parsing when relationEdges already contains matches (the if (relationEdges.length === 0) branch), causing mixed-format cells to miss edges; remove the conditional so the plainRegex pass always runs after the bold/initial parse: keep the escapedLabels / labelAlternation / plainRegex construction and the loop that calls pushRelationEdges(relationEdges, sourceId, match[1] ?? '', match[2] ?? '', sourceCitation), but execute it unconditionally (or after the bold-match pass without gating on relationEdges.length) so both bold and plain labels in text are parsed and appended to relationEdges.

Copilot

Pull request overview

Adds a plain-text fallback for parsing typed relation edges from catalog dependency cells/sections when relation labels are written as Builds on: A.1 (non-bold), and refactors edge creation into a shared helper to avoid duplication.

Changes:

Renamed the existing bold-markdown relation regex and routed edge creation through a new pushRelationEdges helper.
Added a plain-text label parser that activates as a fallback when no relation edges were produced by the bold pass.
Introduced pushRelationEdges to centralize label normalization, relation kind lookup, and target ID extraction.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-12T16:09:54Z

+  // Plain-text catalog format: Builds on: A.1, A.2. Constrains: B.3.
+  // Uses lookahead to split on the next known label or end-of-string.
+  if (relationEdges.length === 0) {
+    const escapedLabels = Object.keys(RELATION_LABELS).map(


The fallback condition is relationEdges.length === 0, which means the plain-text parser will run not only when the bold regex finds zero matches, but also when bold matches exist but produce zero edges (e.g., unknown labels or labels with no parseable IDs). If the intended behavior is “only fallback when the bold-markdown regex finds zero matches” (per PR description), track whether boldRegex matched at least once (separately from edge count) and use that to gate the fallback, or update the PR description to reflect the actual behavior.

Copilot · 2026-04-12T16:09:54Z

+  if (relationEdges.length === 0) {
+    const escapedLabels = Object.keys(RELATION_LABELS).map(
+      (label) => label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'),
+    );
+    const labelAlternation = escapedLabels.join('|');
+    const plainRegex = new RegExp(
+      `(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$)`,
+      'gis',
+    );


escapedLabels, labelAlternation, and the plainRegex are rebuilt on every parseLabeledRelations call that reaches the fallback. Since RELATION_LABELS is constant, consider hoisting a precomputed plainRegex (or precomputed labelAlternation) to module scope to avoid repeated allocations during full-spec compilation.

Copilot · 2026-04-12T16:09:54Z

+  // Plain-text catalog format: Builds on: A.1, A.2. Constrains: B.3.
+  // Uses lookahead to split on the next known label or end-of-string.
+  if (relationEdges.length === 0) {
+    const escapedLabels = Object.keys(RELATION_LABELS).map(
+      (label) => label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'),
+    );
+    const labelAlternation = escapedLabels.join('|');
+    const plainRegex = new RegExp(
+      `(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$)`,
+      'gis',
+    );
+    for (const match of text.matchAll(plainRegex)) {
+      pushRelationEdges(relationEdges, sourceId, match[1] ?? '', match[2] ?? '', sourceCitation);
    }
  }


New plain-text relation parsing is introduced here, but the existing integration tests only assert that the snapshot has some relations (e.g., outline_child / explicit_reference) and don’t validate any typed catalog relations. Consider adding an assertion in the runtime snapshot test to check at least one known builds_on / coordinates_with edge from a section that uses the plain Builds on: ... format, so regressions in this parser are caught.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 737c03c56e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-12T16:11:06Z

+    );
+    const labelAlternation = escapedLabels.join('|');
+    const plainRegex = new RegExp(
+      `(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$)`,


Stop plain-text parser at unknown label boundaries

The plain-text regex only terminates a relation value at the next known relation label or end-of-string, so IDs that belong to other labels get misattributed to the previous relation. For example, FPF-spec.md row A.6.1 has Builds on: ... Instances: USM (A.2.6), UNM (A.19), and this parser currently emits builds_on edges to A.2.6 and A.19. That silently corrupts the relation graph and can skew retrieval/provenance because extractIds runs over text that should have been excluded from the Builds on segment.

Useful? React with 👍 / 👎.

venikman · 2026-04-12T18:36:50Z

Superseded by #24 which includes the full Mastra convention restructure

Copilot AI review requested due to automatic review settings April 12, 2026 16:05

Copilot started reviewing on behalf of venikman April 12, 2026 16:06 View session

gemini-code-assist Bot reviewed Apr 12, 2026

View reviewed changes

coderabbitai Bot reviewed Apr 12, 2026

View reviewed changes

Copilot AI reviewed Apr 12, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Apr 12, 2026

View reviewed changes

venikman closed this Apr 12, 2026

venikman deleted the fix/parse-catalog-plain-text-relations branch April 12, 2026 18:36

Conversation

venikman commented Apr 12, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Impact

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

venikman commented Apr 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

venikman commented Apr 12, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 12, 2026 •

edited

Loading