Skip to content

feat(analyzers): syntactic IMPORTS + derived OVERRIDES + tree-sitter resolver name→def fix#698

Draft
DvirDukhan wants to merge 2 commits into
dvirdukhan/mcp-t18-ts-resolverfrom
dvirdukhan/analyzer-import-override-edges
Draft

feat(analyzers): syntactic IMPORTS + derived OVERRIDES + tree-sitter resolver name→def fix#698
DvirDukhan wants to merge 2 commits into
dvirdukhan/mcp-t18-ts-resolverfrom
dvirdukhan/analyzer-import-override-edges

Conversation

@DvirDukhan
Copy link
Copy Markdown
Contributor

@DvirDukhan DvirDukhan commented Jun 3, 2026

What

Three graph-quality improvements on top of the T18 tree-sitter resolver (#691):

  1. Syntactic IMPORTS edges (File→File). Python import resolution with no LSP — dotted-module-parts matching against an exact+suffix file index. Base build_import_index/resolve_imports hooks added to the analyzer base class; source_analyzer.link_imports wires them after first_pass (once every file has a graph id).
  2. Derived OVERRIDES edges. graph.derive_overrides(max_depth=3) derives OVERRIDES from EXTENDS + DEFINES via Cypher MERGE, run after second_pass.
  3. Tree-sitter resolver name→def pairing fix (CALLS precision). ts_resolver.py::_index_file built the per-module symbol table by zip-ing two independently-grouped QueryCursor.captures() lists (@name, @def). Decorated defs shift @def positions, mis-pairing names with defs, so imported-call resolution attached CALLS edges to the wrong target (phantom edges whose token never appears at the call site). Fixed via per-match iteration using a _matches() helper wrapping QueryCursor.matches(), applied across all four indexing loops (top-level funcs, classes, assigns, class methods). Adds 2 regression tests (10 top-level funcs, 8 classes) asserting each imported call resolves to the def whose name matches exactly; 16/16 resolver tests pass.

Why

IMPORTS/OVERRIDES edges improve coverage for all consumers and feed search_code hybrid-ranker centrality (cross-file in-degree). The resolver fix corrects CALLS-edge accuracy — the graph's core "who calls X" capability.

Impact (CALLS resolver fix)

Deterministic graph-vs-jedi-oracle caller-accuracy bench, n=40, paired (identical harness, only the resolver differs):

repo broken macro-F1 fixed macro-F1 median F1 exact-match
uxarray 0.178 0.713 0.0 → 0.94 0.09 → 0.50
arkouda 0.031 0.262 0.0 → 0.12 0.0 → 0.04

Stack

Notes

Lineage note: the IMPORTS/OVERRIDES content was authored on the pre-T18 mcp-smoke-combined branch and cherry-picked here. The old jedi-era resolve_path/resolve_type artifacts from that lineage were intentionally dropped during the cherry-pick (T18 removed them); only the 5 new import-resolution methods + base hooks + _extract_type_target (from #691) remain.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

Add language-agnostic File->File IMPORTS edges via per-analyzer import
resolution (Python: dotted-module index) and derive OVERRIDES edges from
the EXTENDS+DEFINES hierarchy. Wired into the analysis pipeline. Improves
the graph for all consumers (HTTP API + MCP) and feeds search_code centrality.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 3, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8e7b692f-4a7c-410e-9c8a-7e5907c2bd8d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dvirdukhan/analyzer-import-override-edges

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

The per-module symbol table in `_index_file` was built by zipping two
independently-grouped `QueryCursor.captures()` lists (`@name` and `@def`).
When `@def` positions shift relative to `@name` (e.g. decorated defs), the
zip mis-pairs names with definitions, so imported-call resolution attaches
CALLS edges to the wrong target — producing phantom edges to functions whose
token never appears at the call site.

Fix: iterate per-match via a `_matches()` helper wrapping
`QueryCursor.matches()`, which guarantees each match's `@name`/`@def`
captures belong together. Applied across all four indexing loops
(top-level funcs, classes, assigns, class methods).

Impact (deterministic graph-vs-jedi-oracle caller bench, n=40, paired,
identical harness — only the resolver differs):
  uxarray CALLS macro-F1 0.178 → 0.713 (median 0.0 → 0.94)
  arkouda CALLS macro-F1 0.031 → 0.262

Adds two regression tests asserting each imported call resolves to the def
whose name matches exactly (10 top-level functions, 8 classes).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@DvirDukhan DvirDukhan changed the title feat(analyzers): syntactic IMPORTS edges + derived OVERRIDES feat(analyzers): syntactic IMPORTS + derived OVERRIDES + tree-sitter resolver name→def fix Jun 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant