docs(roadmap): Phase 8.8 — language-specific analysis reference map (34 languages)#1338
carlos-alm wants to merge 10 commits into
…ap (34 languages)

Adds section 8.8 to the Phase 8 roadmap covering every language codegraph supports. For each of the 34 languages (grouped into 13 families including HCL/Terraform), documents:
- State-of-the-art call-graph / points-to analysis tools (Jelly-equivalents)
- Published precision/recall figures with paper/benchmark citations
- Codegraph's current gap vs. those tools
- Concrete adoption candidates (techniques to implement)
- Benchmark suites and fixture sources with license annotations

Fixture acquisition guide added to the 8.8 intro: MIT/Apache/BSD/CC-BY fixtures may be committed directly; GPL fixtures are reference-only; academic suites with no license should be derived by running the tool, not copied.

Cross-references 8.6 (CI gate) to extend benchmarking beyond Jelly/ACG to all 34 languages using the reference map.
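For readers unfamiliar with how the precision/recall figures mentioned above are usually computed for call graphs, here is a minimal sketch. It assumes edges are compared as exact caller/callee name pairs; the `CallEdge` shape, the `scoreEdges` name, and the convention of scoring an empty set as 1 are all illustrative assumptions, not codegraph's actual benchmark code.

```typescript
// Hypothetical edge shape: a call from one qualified name to another.
interface CallEdge {
  caller: string;
  callee: string;
}

// Build a comparable key for an edge (assumes names contain no " -> ").
function edgeKey(edge: CallEdge): string {
  return `${edge.caller} -> ${edge.callee}`;
}

// Deduplicate edges by key so repeated edges are counted once.
function uniqueKeys(edges: CallEdge[]): string[] {
  const seen = new Set<string>();
  edges.forEach((edge) => seen.add(edgeKey(edge)));
  return Array.from(seen);
}

// precision = true positives / reported edges
// recall    = true positives / expected edges
// Assumption: an empty denominator scores 1 (nothing wrong, nothing missed).
function scoreEdges(
  reported: CallEdge[],
  expected: CallEdge[]
): { precision: number; recall: number } {
  const reportedKeys = uniqueKeys(reported);
  const expectedKeys = uniqueKeys(expected);
  const expectedSet = new Set(expectedKeys);
  const truePositives = reportedKeys.filter((key) => expectedSet.has(key)).length;
  return {
    precision: reportedKeys.length === 0 ? 1 : truePositives / reportedKeys.length,
    recall: expectedKeys.length === 0 ? 1 : truePositives / expectedKeys.length,
  };
}
```

In this sketch the expected edges would come from a committed fixture or from a reference tool's recorded output, depending on the fixture's license.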
Claude finished @carlos-alm's task

PR Review: §8.8 Language-Specific Analysis Reference Map

Root Cause Analysis

Problem Being Solved: Phase 8.x sub-tasks lack a shared reference point for "what does best-in-class look like for this language?" Without it, future implementation work either re-researches the landscape ad hoc or ignores non-JS/TS languages entirely. This section establishes a canonical research anchor for all 34 supported languages.

Why This Approach: A single ROADMAP section avoids scattering the literature survey across individual issues, keeps it version-controlled alongside the features it informs, and provides cross-referencing to §8.6 (CI gate). This is the right place for it.

Risk Assessment: Purely additive documentation. Zero runtime/build risk.

Phase 2 & 3: Configuration and Test Integrity

No configuration, test, or code files touched. Nothing to flag.

Phase 4: Content Quality

The content is genuinely high-quality for a research reference document. That said, several factual concerns require attention before this becomes a lasting canonical reference.

Factual accuracy concerns

Language count inconsistency (minor)
Groovy / Clojure section is absent from the per-language subsections
Objective-C subsection is also absent

Structure

The section is well-structured. The fixture acquisition policy at the top is clear, practical, and directly actionable. The cross-reference to §8.6 is correct. The summary table is a genuinely useful at-a-glance view. One readability note: the section header uses

Phase 5: Backlog Compliance

Critical Concerns

None of these are blockers for merging (this is a research reference document, not a spec), but items 1 and 2 reduce self-consistency and should be fixed.

Final Recommendation
Greptile Summary

This PR adds § 8.8: Language-Specific Analysis Reference Map to

Confidence Score: 3/5

Documentation-only change; no runtime behavior affected, but contains two factual inaccuracies that would mislead engineers acting on the reference map.

The Groovy row in the Summary table tells engineers 'source-level analysis is the only tractable path,' which contradicts the dedicated Groovy subsection's explicit statement that no source-level precision improvement is achievable without JVM bytecode access. Separately, DyPyBench, a Python-specific benchmark corpus, is cited in both the Dart subsection and the Summary table as the target benchmark methodology for Dart, sending developers to Python-only tooling. Both errors are internally inconsistent with content elsewhere in the same PR and could cause misdirected engineering effort if the reference map is used as intended.

docs/roadmap/ROADMAP.md: Groovy Summary table row (line 2070) and Dart benchmark citations (lines 1833 and 2056).

Important Files Changed
Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[§ 8.8 Fixture Acquisition] --> B{License?}
B -->|MIT / Apache-2.0 / BSD / CC-BY| C[Commit to tests/benchmarks/resolution/fixtures/lang/]
B -->|CC-BY| D[Commit + add attribution in fixture README]
B -->|GPL| E[Reference-only: run tool, record expected edges — do NOT copy source files]
B -->|No explicit license / Academic| F[Reference-only: derive expected edges by running tool]
C --> G[§ 8.6 CI Gate extended to all 34 languages]
D --> G
E --> G
F --> G
G --> H[Per-language reference tools consulted from § 8.8]
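The license decision in the flowchart can also be read as a small lookup. The sketch below is illustrative only: the `fixtureAction` name, the action labels, and the license-identifier patterns are assumptions, and because the chart lists CC-BY on two branches, the sketch assumes the more specific attribution branch applies.

```typescript
// Hypothetical outcome labels mirroring the flowchart's leaf nodes.
type FixtureAction = "commit" | "commit-with-attribution" | "reference-only";

// Assumed identifier patterns; real license strings may need normalizing first.
const PERMISSIVE = [/^MIT$/, /^Apache-2\.0$/, /^BSD(-\d-Clause)?$/];
const CC_BY = /^CC-BY(-\d\.\d)?$/;

function fixtureAction(license: string | null): FixtureAction {
  // No explicit license (e.g. an academic suite): derive expected edges by running the tool.
  if (license === null) return "reference-only";
  const id = license.trim();
  // CC-BY: commit, plus attribution in the fixture README.
  if (CC_BY.test(id)) return "commit-with-attribution";
  // MIT / Apache-2.0 / BSD: commit directly.
  if (PERMISSIVE.some((pattern) => pattern.test(id))) return "commit";
  // GPL, and (as an assumption) anything unrecognized: do not copy source files.
  return "reference-only";
}
```

Treating unrecognized licenses as reference-only is a conservative default chosen for the sketch; the flowchart itself only names the four cases shown.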
Reviews (12): Last reviewed commit: "docs(roadmap): add dedicated subsections..."
Addressed all concerns in commit 9734376:
Out of scope: CLAUDE.md language count discrepancy (says 23, registry has 34), filed as #1342.
Addressed all 3 Greptile findings in commit 9734376:
Addressed the two remaining Greptile findings in commit 5c82617:
… §8.8 (#1338)

- Add BSD license annotation to JCG entry in JVM benchmark suites
- Add benchmark suites entry to Erlang subsection (ELP call hierarchy, Dialyzer OTP scalability benchmarks, Set-theoretic Types test suite)
- Add benchmark suites entry to Gleam subsection (no dedicated benchmark; Reach project BEAM test cases as closest ground truth)
Addressed all 3 Greptile findings in commit d6533bf:
- Adopt Psalm-style flow-sensitive receiver narrowing: at `$v->method()`, use the narrowed type of `$v` from preceding `instanceof` guards or assignment context rather than the full class hierarchy.
- Phan's two-phase design: build a global class/method index from all parsed files before resolving any call site. Replicates what codegraph's build pipeline already does for JS/TS but is not yet applied to PHP.

**Benchmark suites:** TChecker evaluation corpus (CCS 2022); Artemis corpus (250 PHP web apps, OOPSLA 2025); SWC Registry PHP test cases.
The "SWC Registry PHP test cases" entry is a misattribution. The SWC Registry (swcregistry.io) is the Smart Contract Weakness Classification registry — a Solidity/EVM vulnerability catalogue. It is correctly cited in the Solidity subsection of this very document. There is no PHP benchmark corpus under that name; linking it here will send engineers looking for a PHP test corpus to an Ethereum security resource.
Suggested change:
- **Benchmark suites:** TChecker evaluation corpus (CCS 2022); Artemis corpus (250 PHP web apps, OOPSLA 2025); SWC Registry PHP test cases.
+ **Benchmark suites:** TChecker evaluation corpus (CCS 2022); Artemis corpus (250 PHP web apps, OOPSLA 2025).
Fixed in commit 744edf7 — removed "SWC Registry PHP test cases" from the PHP benchmark suites line. The SWC Registry (swcregistry.io) is correctly cited in the Solidity subsection only. Applied the suggested replacement text.
…r's own class (#1343)

* fix(edge_builder): restrict same-file this-dispatch fallback to caller's own class

When a file contains multiple unrelated classes all defining a method with the same name, the broader same-file suffix scan emitted false-positive call edges (e.g. this.area() in Shape.describe matched Calculator.area and Formatter.area).

The fix: when the scan finds more than one method with the matching suffix, restrict the result to methods whose qualified name starts with the caller's own class prefix. A single unambiguous match is returned as-is (handles the CHA case of one subclass override). If multiple classes match and none is the caller's class, return nothing rather than emitting false edges.

WASM tests are active; native tests marked todo pending next binary release.

Closes #1324

docs check acknowledged

* fix(edge_builder): apply caller-class scope to single-match suffix scan too (#1343)

The 1-match arm of the same-file suffix scan was returning any sole method regardless of whether it belonged to the caller's class. A file with Caller (no area()) and Sibling (area()) would produce a false Caller.run → Sibling.area edge, the same bug fixed for 2+ matches.

Replace the three-arm match with a unified caller-prefix filter applied to all non-empty results. Add fixture single-sibling.ts and a corresponding WASM assertion to cover the single-match false-positive path. Initialize callEdges to [] for cleaner beforeAll failure mode.
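The unified caller-prefix filter described in the second commit can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual edge_builder code: it assumes qualified names take the form `Class.method`, and the `resolveThisCall` name and signature are hypothetical.

```typescript
// Resolve `this.<methodName>()` inside `callerQualifiedName` against the
// methods defined in the same file, keeping only the caller's own class.
function resolveThisCall(
  callerQualifiedName: string, // e.g. "Shape.describe"
  methodName: string, // e.g. "area"
  sameFileMethods: string[] // qualified names, e.g. ["Shape.area", "Calculator.area"]
): string[] {
  // Same-file suffix scan: every method whose name ends in ".<methodName>".
  const suffix = `.${methodName}`;
  const candidates = sameFileMethods.filter((name) => name.endsWith(suffix));

  // Derive the caller's class prefix ("Shape." from "Shape.describe").
  const lastDot = callerQualifiedName.lastIndexOf(".");
  if (lastDot < 0) {
    // Assumption for this sketch: a caller outside any class resolves to nothing.
    return [];
  }
  const classPrefix = callerQualifiedName.slice(0, lastDot + 1);

  // Unified filter: applied whether the scan found one match or many.
  return candidates.filter((name) => name.startsWith(classPrefix));
}
```

With the methods from the commit message, a call from `Shape.describe` keeps only `Shape.area`, while a call from `Caller.run` in a file that defines only `Sibling.area` yields no edge.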
…on, Dart sign, Ruby citation (#1338)

- Remove SWC Registry from PHP benchmark suites (it is the Smart Contract Weakness Classification registry for Solidity/EVM, not a PHP test corpus)
- Fix Dart TFA speedup notation: "−49.5%" → "49.5% reduction in AOT compilation time" (negative speedup is mathematically undefined)
- Clarify Ruby CLBG citation: Ali et al. TSE study covers JVM-hosted languages (Groovy/Clojure/Scala/Kotlin), not MRI Ruby; no dedicated Ruby CG P/R evaluation exists as of the time of writing
Addressed 3 remaining factual issues from Greptile's review in commit 744edf7:
0127e27 to 9d817e7
…analysis-reference-map
…ve-C (#1338)

Addresses review feedback: all 34 languages now have individual subsections in §8.8. Previously Groovy and Clojure were covered only by a shared note inside the JVM section; Objective-C was covered only by a note inside the C/C++ section. Each new subsection follows the standard template: reference tools, codegraph gap, adoption candidates, and benchmark suites.
Addressed all remaining Claude review concerns:
All 34 languages now have dedicated subsections in §8.8.
Claude finished @carlos-alm's task

Re-Review: §8.8 Language-Specific Analysis Reference Map

Re-Review Summary

All items from the previous Claude review (3/5, "approve with conditions") have been addressed:

One remaining open issue

DyPyBench misattribution for Dart (lines 1833 and 2056): not yet fixed. Greptile's P1 finding from the most recent review cycle: arXiv 2403.00539 is "DyPyBench: A Benchmark of Executable Python Software", a Python-only corpus (50 Python projects, 681k lines of Python code, built on the DynaPyt Python analysis framework). The document's Dart subsection describes it as "portable to Dart via Dart Observatory/VM coverage" but no published work ports DyPyBench to Dart.

Current state at line 1833:

And at line 2056 in the summary table:

The correct fix (consistent with how Bash and Haskell are handled; both acknowledge "no public benchmark exists") would be to either:

Revised Rating

⭐⭐⭐⭐☆ (4/5): APPROVE WITH ONE FIX

All four conditions from the previous review are resolved. The section now has full subsection coverage for all 34 languages, all cited review concerns are addressed, and the content quality is genuinely strong. The single remaining blocker is the DyPyBench misattribution, which is the same class of error (Python corpus cited for a different language) that was already corrected for PHP. It should be fixed before merge.
Summary

docs/roadmap/ROADMAP.md

Notable reference tools per family

golang.org/x/tools/go/callgraph (VTA/CHA/RTA), callgraph-go

Fixture sources ready for adoption

Test plan