feat(analyzers): syntactic IMPORTS + derived OVERRIDES + tree-sitter resolver name→def fix #698
Draft
DvirDukhan wants to merge 2 commits into
Add language-agnostic File->File IMPORTS edges via per-analyzer import resolution (Python: dotted-module index) and derive OVERRIDES edges from the EXTENDS+DEFINES hierarchy. Wired into the analysis pipeline. Improves the graph for all consumers (HTTP API + MCP) and feeds search_code centrality. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
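The two derivations in this commit can be pictured with a small in-memory sketch. It is only an illustration under stated assumptions: the function names `build_import_index`, `resolve_import`, and `derive_overrides` mirror the PR's vocabulary but the bodies are hypothetical, plain dicts stand in for the graph, the prefix fallback for `pkg.mod.func` is a guess, and OVERRIDES is emitted toward every ancestor within `max_depth` that defines the same method name (the real Cypher MERGE may be narrower).

```python
def build_import_index(paths):
    """Map dotted module names to file paths (hypothetical helper)."""
    index = {}
    for path in paths:
        parts = path[:-3].split("/")      # drop the ".py" suffix
        if parts[-1] == "__init__":
            parts = parts[:-1]            # pkg/__init__.py -> "pkg"
        if parts:
            index[".".join(parts)] = path
    return index


def resolve_import(index, module):
    """Return the file for `module`, falling back to the longest known prefix.

    Assumption: "pkg.mod.func" should resolve to the file of "pkg.mod".
    """
    parts = module.split(".")
    while parts:
        hit = index.get(".".join(parts))
        if hit is not None:
            return hit
        parts.pop()
    return None  # external or unresolved import: no IMPORTS edge


def derive_overrides(extends, defines, max_depth=3):
    """Derive OVERRIDES pairs from EXTENDS + DEFINES.

    extends: class -> list of direct base classes
    defines: class -> set of method names
    Returns a set of ((cls, method), (ancestor, method)) pairs.
    """
    edges = set()
    for cls, methods in defines.items():
        frontier, seen, depth = list(extends.get(cls, [])), set(), 1
        while frontier and depth <= max_depth:
            next_frontier = []
            for base in frontier:
                if base in seen:
                    continue
                seen.add(base)
                for name in methods & defines.get(base, set()):
                    edges.add(((cls, name), (base, name)))
                next_frontier.extend(extends.get(base, []))
            frontier, depth = next_frontier, depth + 1
    return edges


if __name__ == "__main__":
    idx = build_import_index(["pkg/__init__.py", "pkg/mod.py"])
    print(resolve_import(idx, "pkg.mod.func"))
    print(sorted(derive_overrides({"B": ["A"]}, {"A": {"run"}, "B": {"run"}})))
```

Bounding the walk with `max_depth` keeps the derivation cheap on deep hierarchies, at the cost of missing overrides of methods declared further up than that many EXTENDS hops.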
The per-module symbol table in `_index_file` was built by zipping two independently-grouped `QueryCursor.captures()` lists (`@name` and `@def`). When `@def` positions shift relative to `@name` (e.g. decorated defs), the zip mis-pairs names with definitions, so imported-call resolution attaches CALLS edges to the wrong target, producing phantom edges to functions whose token never appears at the call site.

Fix: iterate per-match via a `_matches()` helper wrapping `QueryCursor.matches()`, which guarantees each match's `@name`/`@def` captures belong together. Applied across all four indexing loops (top-level funcs, classes, assigns, class methods).

Impact (deterministic graph-vs-jedi-oracle caller bench, n=40, paired, identical harness; only the resolver differs):

- uxarray: CALLS macro-F1 0.178 → 0.713 (median 0.0 → 0.94)
- arkouda: CALLS macro-F1 0.031 → 0.262

Adds two regression tests asserting each imported call resolves to the def whose name matches exactly (10 top-level functions, 8 classes).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
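The mis-pairing can be reproduced without tree-sitter. The sketch below uses plain `(start_byte, text)` tuples as stand-ins for syntax nodes and a hand-written `MATCHES` list as a stand-in for query matches. All of it is hypothetical: in particular, the idea that a decorated function contributes an extra `@def` node, and that grouped captures are deduplicated and position-sorted per capture name, are assumptions made for illustration rather than a description of the real `QueryCursor` output.

```python
from collections import defaultdict

# Hypothetical matches: the decorated function yields two matches that share
# one @name node but carry different @def nodes (wrapper and inner def).
MATCHES = [
    {"name": (4, "load"),    "def": (0, "def load")},
    {"name": (31, "cached"), "def": (20, "@cache def cached")},
    {"name": (31, "cached"), "def": (27, "def cached")},
    {"name": (54, "save"),   "def": (50, "def save")},
]


def grouped_captures(matches):
    """Mimic a captures-style result: one position-sorted list per capture name."""
    groups = defaultdict(set)
    for match in matches:
        for capture, node in match.items():
            groups[capture].add(node)
    return {capture: sorted(nodes) for capture, nodes in groups.items()}


def symbols_by_zip(matches):
    """Old approach: pair the i-th @name with the i-th @def."""
    caps = grouped_captures(matches)
    return {name[1]: definition[1]
            for name, definition in zip(caps["name"], caps["def"])}


def symbols_by_match(matches):
    """Fixed approach: take @name and @def from the same match."""
    return {m["name"][1]: m["def"][1] for m in matches}


if __name__ == "__main__":
    print("zip:      ", symbols_by_zip(MATCHES))
    print("per-match:", symbols_by_match(MATCHES))
```

With this data the zip variant binds `save` to the definition of `cached`, because the extra `@def` entry shifts every later pair by one. The per-match variant cannot drift, since each pair is read from a single match.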
What
Three graph-quality improvements on top of the T18 tree-sitter resolver (#691):
1. `build_import_index`/`resolve_imports` hooks added to the analyzer base class; `source_analyzer.link_imports` wires them after `first_pass` (once every file has a graph id).
2. `graph.derive_overrides(max_depth=3)` derives OVERRIDES from EXTENDS + DEFINES via Cypher MERGE, run after `second_pass`.
3. `ts_resolver.py::_index_file` built the per-module symbol table by `zip`-ing two independently-grouped `QueryCursor.captures()` lists (`@name`, `@def`). Decorated defs shift `@def` positions, mis-pairing names with defs, so imported-call resolution attached CALLS edges to the wrong target (phantom edges whose token never appears at the call site). Fixed via per-match iteration using a `_matches()` helper wrapping `QueryCursor.matches()`, applied across all four indexing loops (top-level funcs, classes, assigns, class methods). Adds 2 regression tests (10 top-level funcs, 8 classes) asserting each imported call resolves to the def whose name matches exactly; 16/16 resolver tests pass.

Why
IMPORTS/OVERRIDES edges improve coverage for all consumers and feed `search_code` hybrid-ranker centrality (cross-file in-degree). The resolver fix corrects CALLS-edge accuracy, the graph's core "who calls X" capability.

Impact (CALLS resolver fix)
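Deterministic graph-vs-jedi-oracle caller-accuracy bench, n=40, paired (identical harness, only the resolver differs):

- uxarray: CALLS macro-F1 0.178 → 0.713 (median 0.0 → 0.94)
- arkouda: CALLS macro-F1 0.031 → 0.262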
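As a reading aid for the metric: macro-F1 is the unweighted mean of per-item F1 scores. The sketch below assumes each of the n targets is scored by comparing the graph's caller set against the oracle's caller set. The PR does not show the harness, so the set representation, the helper names `f1` and `macro_f1`, and the treatment of two empty sets as a perfect score are all assumptions.

```python
from statistics import mean, median


def f1(predicted, expected):
    """F1 between two caller sets (hypothetical scoring rule)."""
    if not predicted and not expected:
        return 1.0  # assumption: agreeing on "no callers" counts as perfect
    true_positives = len(predicted & expected)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)


def macro_f1(pairs):
    """Return (mean, median) of per-target F1 over (predicted, expected) pairs."""
    scores = [f1(predicted, expected) for predicted, expected in pairs]
    return mean(scores), median(scores)


if __name__ == "__main__":
    pairs = [
        ({"a.f", "b.g"}, {"a.f", "b.g"}),  # exact agreement
        ({"a.f"}, {"c.h"}),                # phantom edge, true caller missed
        ({"a.f", "c.h"}, {"a.f"}),         # one extra caller
    ]
    print(macro_f1(pairs))
```

Under this reading, a median of 0.0 means at least half of the targets had no correctly attributed caller at all.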
Stack
Notes
Lineage note: the IMPORTS/OVERRIDES content was authored on the pre-T18 `mcp-smoke-combined` branch and cherry-picked here. The old jedi-era `resolve_path`/`resolve_type` artifacts from that lineage were intentionally dropped during the cherry-pick (T18 removed them); only the 5 new import-resolution methods + base hooks + `_extract_type_target` (from #691) remain.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>