Skip to content

fix(native): align Rust role classification with JS#918

Merged
carlos-alm merged 4 commits intomainfrom
fix/native-roles-parity
Apr 13, 2026
Merged

fix(native): align Rust role classification with JS#918
carlos-alm merged 4 commits intomainfrom
fix/native-roles-parity

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

  • Native Rust role classifier was producing different (less accurate) results than the JS fallback, causing the roles phase to classify nodes incorrectly when using --engine native
  • Adds imports-type edge support to all fan-in, exported-ID, and production fan-in queries
  • Adds reexport-chain exported detection via recursive CTE (codegraph: role classification misses consumers through barrel re-exports #837), matching the JS barrel-tracing logic
  • Adds constant + hasActiveFileSiblings logic so constants in active files classify as leaf instead of dead-leaf
  • Adds !isExported guard to test-only classification
  • Includes imports-type and reexports in incremental neighbour file discovery

Test plan

  • cargo check passes with no new warnings
  • All 41 role classification unit tests pass
  • 509 integration tests pass (7 failures are pre-existing WASM grammar issue in worktree)
  • Run codegraph build --engine native and codegraph build --engine wasm on a real project, compare codegraph roles output — should now match

The native Rust role classifier was producing different (less accurate)
results than the JS fallback path. This fixes all divergences:

- Include 'imports-type' edges in fan-in queries (full, incremental, medians)
- Add reexport-chain exported detection via recursive CTE (#837)
- Include 'imports-type' in exported IDs and production fan-in queries
- Add constant + hasActiveFileSiblings logic (classify as leaf, not dead)
- Add !isExported guard to test-only classification
- Include 'imports-type' and 'reexports' in neighbour file discovery
- Add test_file_filter_col() for arbitrary column filtering
@claude
Copy link
Copy Markdown

claude bot commented Apr 13, 2026

Claude finished @carlos-alm's task —— View job


PR Review Analysis

Reviewing PR #918: fix(native): align Rust role classification with JS

Todo List:

  • Read and analyze the changed file (crates/codegraph-core/src/roles_db.rs)
  • Phase 1: Root Cause Analysis
  • Phase 2: Configuration Change Detection
  • Phase 3: Test Integrity Check
  • Phase 4: Code Quality Review
  • Phase 5: Backlog Compliance Assessment
  • Provide final rating and recommendation

Root Cause Analysis

Problem Being Solved: The native Rust role classification engine was producing different (less accurate) results than the JS fallback when using --engine native. This breaks the fundamental dual-engine design principle that both engines must produce identical results.

Why This Approach: The fix systematically adds missing functionality to match the JS implementation:

  • Adds imports-type edge support to all fan-in queries (lines 234, 266, 280, 312, 389, 500, 550, 615, 666)
  • Implements reexport chain detection via recursive CTE (lines 274-304, 622-658) matching JS barrel-tracing logic
  • Adds constant + hasActiveFileSiblings classification (lines 131-132, 377-386, 464-467)
  • Adds !isExported guard to test-only classification (line 142)

Risk Assessment: ✅ Low risk. This is a bug fix, not a workaround. The discrepancy between engines violates CLAUDE.md's core principle: "If two engines produce different results, that is a bug in the less-accurate engine — not an acceptable 'parity gap.'" The implementation correctly mirrors the proven JS logic.

Configuration Change Detection

No HIGH-RISK configuration changes detected. This is pure Rust code implementing missing functionality. No linting rules, test thresholds, or build configs were relaxed.

Test Integrity

Test integrity maintained. The PR description shows:

  • 41 role classification unit tests pass
  • 509 integration tests pass (7 failures are pre-existing WASM issues)
  • Proper test plan with manual verification step

The existing test at tests/unit/roles.test.ts:196-213 specifically validates imports-type edge handling — this Rust implementation now correctly supports it.

Code Quality Review

Strengths:

  • Systematic parity: Each JS query pattern has been faithfully reproduced in Rust
  • Performance-conscious: Uses prepared statements with parameter binding (raw_bind_parameter)
  • Proper SQL injection protection: All user inputs are parameterized
  • Chunked updates: Batch operations use 500-item chunks to avoid query size limits (line 193)
  • Correct CTE implementation: The recursive prod_reachable CTE (lines 276-287, 625-636) properly traces reexport chains

Technical verification:

  • ✅ Fan-in queries now include imports-type edges matching JS: WHERE e.kind IN ('calls', 'imports-type') (lines 234, 266, 312, 389, 550, 615, 666)
  • ✅ Production fan-in excludes test files with proper filter (lines 314-315, 667-671)
  • ✅ Reexport detection uses identical recursive logic to JS implementation
  • ✅ Constants in active files classify as leaf instead of dead-leaf (lines 131-132, 464-467)
  • ✅ Test-only classification requires !isExported (line 142)

Minor observations:

  • Code could benefit from extracting the recursive CTE SQL as a constant to reduce duplication between full and incremental functions
  • Consider adding inline comments explaining the multi-level reexport chain logic

Backlog Compliance

  • Zero-dep: ✅ No new runtime dependencies added
  • Foundation-aligned: ✅ Fixes engine parity — core to the dual-engine design principle
  • Problem-fit: 5/5 — Directly prevents AI agents from receiving inconsistent results based on engine choice
  • Breaking: No — purely additive internal functionality
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Critical Concerns

None. This is a well-implemented bug fix that restores the design contract between engines.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: Solid implementation that fixes a critical parity bug. The systematic approach mirrors proven JS logic. Only missing point is for minor code organization (extracted constants). This directly addresses the core problem of engine consistency that affects AI agent reliability.

Test validation recommended: Run codegraph build --engine native vs codegraph build --engine wasm on a real project and verify codegraph roles output matches — this will prove the fix works end-to-end.


@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 13, 2026

Codegraph Impact Analysis

10 functions changed4 callers affected across 1 files

  • classify_node in crates/codegraph-core/src/roles_db.rs:110 (3 transitive callers)
  • do_classify_full in crates/codegraph-core/src/roles_db.rs:213 (0 transitive callers)
  • test_file_filter in crates/codegraph-core/src/roles_db.rs:370 (0 transitive callers)
  • test_file_filter_col in crates/codegraph-core/src/roles_db.rs:375 (1 transitive callers)
  • compute_active_files in crates/codegraph-core/src/roles_db.rs:385 (2 transitive callers)
  • compute_global_medians in crates/codegraph-core/src/roles_db.rs:397 (1 transitive callers)
  • classify_rows in crates/codegraph-core/src/roles_db.rs:458 (2 transitive callers)
  • find_neighbour_files in crates/codegraph-core/src/roles_db.rs:494 (1 transitive callers)
  • query_nodes_for_files in crates/codegraph-core/src/roles_db.rs:532 (1 transitive callers)
  • do_classify_incremental in crates/codegraph-core/src/roles_db.rs:589 (0 transitive callers)

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 13, 2026

Greptile Summary

Aligns the Rust native role classifier (roles_db.rs) with the JS implementation across four areas: adds imports-type edges to all fan-in and exported-ID queries, adds a recursive CTE to trace reexport chains for barrel-exported symbols, introduces the constant + hasActiveFileSiblings → leaf heuristic, and adds the missing !isExported guard on the test-only branch. The changes are logically correct and faithfully match src/graph/classifiers/roles.ts and src/features/structure.ts.

Confidence Score: 5/5

Safe to merge — all changes correctly mirror the JS classifier logic with no P0/P1 issues.

All four alignment fixes (imports-type fan-in, reexport CTE, constant leaf heuristic, !isExported test-only guard) are logically correct and verified against the JS reference implementation. The only finding is a P2 note about reexports edges being dead code in the neighbour query — a pre-existing JS quirk faithfully ported, with no correctness impact on the critical classification paths.

No files require special attention beyond the noted P2 in find_neighbour_files.

Important Files Changed

Filename Overview
crates/codegraph-core/src/roles_db.rs Adds imports-type fan-in support, recursive reexport CTE for exported detection, constant+hasActiveFileSiblings leaf heuristic, and !isExported guard for test-only; reexports edges in find_neighbour_files are a no-op due to file→file node kind filtering (pre-existing JS parity)

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Node: fan_in=0, not exported] --> B{kind == constant\nAND hasActiveFileSiblings?}
    B -- Yes --> C[leaf]
    B -- No --> D[classify_dead_sub_role]
    D --> E{LEAF_KINDS?}
    E -- Yes --> F[dead-leaf]
    E -- No --> G{FFI extension?}
    G -- Yes --> H[dead-ffi]
    G -- No --> I{Entry path pattern?}
    I -- Yes --> J[dead-entry]
    I -- No --> K[dead-unresolved]

    A2[Node: fan_in=0, is_exported] --> L[entry]
    A3[Node: fan_in>0, prod_fan_in=0, NOT exported] --> M[test-only]
    A4[Node: fan_in>0, prod_fan_in>0 OR is_exported] --> N{high_in / high_out?}
    N -- high_in only --> O[core]
    N -- both high --> P[utility]
    N -- high_out only --> Q[adapter]
    N -- neither --> R[leaf]

    FW[Framework entry prefix] --> S[entry]
Loading

Reviews (3): Last reviewed commit: "Merge branch 'main' into fix/native-role..." | Re-trigger Greptile

Expand comments on the recursive prod_reachable CTEs in both the full
and incremental classify paths to explain the base-case/recursive-step
mechanics and how barrel reexport chains are traced.
@carlos-alm
Copy link
Copy Markdown
Contributor Author

Addressed the two minor suggestions from the review:

  1. Inline comments for reexport chain logic: Added detailed comments to both CTE blocks (full-classify step 3b and incremental path) explaining the base-case/recursive-step mechanics and how barrel reexport chains are traced. See commit a6b8799.

  2. Extract CTE as constant: Evaluated this but the two CTEs are not identical — the incremental version has an additional AND n.file IN (...) scope filter and uses different parameter binding. Extracting a shared constant would require templating the differences back in, which would reduce clarity rather than improve it. Keeping them separate with cross-references in the comments is cleaner.

@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit 5a1a0f4 into main Apr 13, 2026
22 of 23 checks passed
@carlos-alm carlos-alm deleted the fix/native-roles-parity branch April 13, 2026 03:02
@github-actions github-actions bot locked and limited conversation to collaborators Apr 13, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant