Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
57 changes: 43 additions & 14 deletions src/runtime/compiler.ts
Original file line number Diff line number Diff line change
Expand Up @@ -1191,26 +1191,55 @@ function parseLabeledRelations(
sourceCitation: string,
): RelationEdge[] {
const relationEdges: RelationEdge[] = [];
const relationRegex =

// Bold markdown format: **Builds on:** A.1, A.2.
const boldRegex =
/\*\*([^:*]+):\*\*\s*([\s\S]*?)(?=(?:\n\s*[*-]\s*\*\*[^:*]+:\*\*|\s+\*\*[^:*]+:\*\*|$))/g;
for (const match of text.matchAll(relationRegex)) {
const label = normalizeForLookup(match[1] ?? '');
const relation = RELATION_LABELS[label];
if (!relation) {
continue;
}
for (const target of extractIds(match[2] ?? '')) {
relationEdges.push({
from: sourceId,
relation,
to: target,
source: sourceCitation,
});
for (const match of text.matchAll(boldRegex)) {
pushRelationEdges(relationEdges, sourceId, match[1] ?? '', match[2] ?? '', sourceCitation);
}

// Plain-text catalog format: Builds on: A.1, A.2. Constrains: B.3.
// Uses lookahead to split on the next known label or end-of-string.
if (relationEdges.length === 0) {
const escapedLabels = Object.keys(RELATION_LABELS).map(
Comment on lines +1202 to +1205
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fallback condition is relationEdges.length === 0, which means the plain-text parser will run not only when the bold regex finds zero matches, but also when bold matches exist but produce zero edges (e.g., unknown labels or labels with no parseable IDs). If the intended behavior is “only fallback when the bold-markdown regex finds zero matches” (per PR description), track whether boldRegex matched at least once (separately from edge count) and use that to gate the fallback, or update the PR description to reflect the actual behavior.

Copilot uses AI. Check for mistakes.
(label) => label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'),
);
const labelAlternation = escapedLabels.join('|');
const plainRegex = new RegExp(
`(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$)`,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Stop plain-text parser at unknown label boundaries

The plain-text regex only terminates a relation value at the next known relation label or end-of-string, so IDs that belong to other labels get misattributed to the previous relation. For example, FPF-spec.md row A.6.1 has Builds on: ... Instances: USM (A.2.6), UNM (A.19), and this parser currently emits builds_on edges to A.2.6 and A.19. That silently corrupts the relation graph and can skew retrieval/provenance because extractIds runs over text that should have been excluded from the Builds on segment.

Useful? React with 👍 / 👎.

'gis',
);
Comment on lines +1205 to +1212
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The plainRegex and its associated strings (escapedLabels, labelAlternation) are re-constructed on every call to parseLabeledRelations when bold relations are absent. Since RELATION_LABELS is a static constant, this regex should be defined as a module-level constant or memoized to avoid unnecessary overhead during the compilation process, especially as this function is called for every pattern in the catalog.

Comment on lines +1204 to +1212
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

escapedLabels, labelAlternation, and the plainRegex are rebuilt on every parseLabeledRelations call that reaches the fallback. Since RELATION_LABELS is constant, consider hoisting a precomputed plainRegex (or precomputed labelAlternation) to module scope to avoid repeated allocations during full-spec compilation.

Copilot uses AI. Check for mistakes.
for (const match of text.matchAll(plainRegex)) {
pushRelationEdges(relationEdges, sourceId, match[1] ?? '', match[2] ?? '', sourceCitation);
}
}
Comment on lines +1204 to 1216
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Parse plain-text labels even when bold labels already matched.

Line 1204 makes plain-text parsing conditional on zero bold matches, which drops edges for mixed-format cells (e.g., **Builds on:** A.1 Used by: B.2 loses used_by).

💡 Suggested fix
-  if (relationEdges.length === 0) {
-    const escapedLabels = Object.keys(RELATION_LABELS).map(
-      (label) => label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'),
-    );
-    const labelAlternation = escapedLabels.join('|');
-    const plainRegex = new RegExp(
-      `(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$)`,
-      'gis',
-    );
-    for (const match of text.matchAll(plainRegex)) {
-      pushRelationEdges(relationEdges, sourceId, match[1] ?? '', match[2] ?? '', sourceCitation);
-    }
-  }
+  const escapedLabels = Object.keys(RELATION_LABELS).map(
+    (label) => label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'),
+  );
+  const labelAlternation = escapedLabels.join('|');
+  const plainRegex = new RegExp(
+    `(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$)`,
+    'gis',
+  );
+  for (const match of text.matchAll(plainRegex)) {
+    pushRelationEdges(relationEdges, sourceId, match[1] ?? '', match[2] ?? '', sourceCitation);
+  }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (relationEdges.length === 0) {
const escapedLabels = Object.keys(RELATION_LABELS).map(
(label) => label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'),
);
const labelAlternation = escapedLabels.join('|');
const plainRegex = new RegExp(
`(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$)`,
'gis',
);
for (const match of text.matchAll(plainRegex)) {
pushRelationEdges(relationEdges, sourceId, match[1] ?? '', match[2] ?? '', sourceCitation);
}
}
const escapedLabels = Object.keys(RELATION_LABELS).map(
(label) => label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'),
);
const labelAlternation = escapedLabels.join('|');
const plainRegex = new RegExp(
`(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$)`,
'gis',
);
for (const match of text.matchAll(plainRegex)) {
pushRelationEdges(relationEdges, sourceId, match[1] ?? '', match[2] ?? '', sourceCitation);
}
🧰 Tools
🪛 ast-grep (0.42.1)

[warning] 1208-1211: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
(${labelAlternation}):\\s*(.*?)(?=(?:${labelAlternation}):|$),
'gis',
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/runtime/compiler.ts` around lines 1204 - 1216, The current logic skips
plain-text label parsing when relationEdges already contains matches (the if
(relationEdges.length === 0) branch), causing mixed-format cells to miss edges;
remove the conditional so the plainRegex pass always runs after the bold/initial
parse: keep the escapedLabels / labelAlternation / plainRegex construction and
the loop that calls pushRelationEdges(relationEdges, sourceId, match[1] ?? '',
match[2] ?? '', sourceCitation), but execute it unconditionally (or after the
bold-match pass without gating on relationEdges.length) so both bold and plain
labels in text are parsed and appended to relationEdges.

Comment on lines +1202 to 1216
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New plain-text relation parsing is introduced here, but the existing integration tests only assert that the snapshot has some relations (e.g., outline_child / explicit_reference) and don’t validate any typed catalog relations. Consider adding an assertion in the runtime snapshot test to check at least one known builds_on / coordinates_with edge from a section that uses the plain Builds on: ... format, so regressions in this parser are caught.

Copilot uses AI. Check for mistakes.

return relationEdges;
}

function pushRelationEdges(
edges: RelationEdge[],
sourceId: string,
rawLabel: string,
rawTargets: string,
sourceCitation: string,
): void {
const label = normalizeForLookup(rawLabel);
const relation = RELATION_LABELS[label];
if (!relation) {
return;
}
for (const target of extractIds(rawTargets)) {
edges.push({
from: sourceId,
relation,
to: target,
source: sourceCitation,
});
}
}

function parseKeywords(cell: string): string[] {
const match = cell.match(/Keywords:\s*(.+?)(?:Queries:|$)/i);
if (!match) {
Expand Down