docs(roadmap): Phase 8.8 — language-specific analysis reference map (34 languages) by carlos-alm · Pull Request #1338 · optave/ops-codegraph-tool

carlos-alm · 2026-06-06T00:05:32Z

Summary

Adds § 8.8 — Language-Specific Analysis Reference Map to docs/roadmap/ROADMAP.md
Covers all 34 languages codegraph supports across 13 families (JVM, Python, JS/TS, Go, Rust, .NET, Ruby/PHP/BEAM, C-family, Swift/Dart/Zig, ML/Scientific, Scripting/HW, HCL/Terraform, multi-language frameworks)
For each family: state-of-the-art Jelly-equivalent tools, published P/R figures with citations, codegraph's current gap, concrete adoption candidates, benchmark suites with license annotations
Adds a fixture acquisition guide in the 8.8 intro: policy for when fixtures can be committed (MIT/Apache/BSD/CC-BY) vs. reference-only (GPL, unlicensed academic)
Cross-references § 8.6 (CI gate) to extend benchmarking to all 34 languages, not just JS/TS vs. Jelly/ACG

Notable reference tools per family

| Family | Jelly-equivalent | Approach |
| --- | --- | --- |
| JVM | Doop, OPAL/Unimocg, Qilin | Datalog points-to, modular CHA/RTA/XTA, context-debloating |
| Python | PyCG, JARVIS, HeaderGen, PoTo, PyAnalyzer | Assignment-graph, stub integration, Andersen-style |
| JS/TS | Jelly, ACG, TAJS | Field-based points-to, approximate interpretation |
| Go | `golang.org/x/tools/go/callgraph` (VTA/CHA/RTA), callgraph-go | VTA type propagation |
| Rust | MIRAI, cargo-call-stack, Rudra | MIR-level analysis, trait dispatch |
| .NET | Roslyn CG, cclyzer++ port | IL-level + source-level analysis |
| Ruby/PHP | TypeProf/Shopify loupe, TChecker/Artemis | Type inference, static class hierarchy |
| C/C++ | SVF, Andersen/Steensgaard, LLVM CallGraph | Field-sensitive points-to, MLTA |
| Swift | Swift compiler SIL CG | SIL protocol dispatch |
| Haskell/OCaml | Calligraphy, Salto | HIE files, IL normalisation |
| HCL | Checkov graph runner, Pulumi Converter, TerraDS | Interpolation walking, module call resolution |

Fixture sources ready for adoption

PyCG micro-benchmark (Apache-2.0, 112 tests) — Python
JCG (BSD, annotated Java CG benchmark) — JVM
TerraDS (CC-BY-4.0, 279k modules) — HCL
Trivy fixture corpus (Apache-2.0) — HCL
flowR corpus (MIT, 779 points) — R
Pulumi converter tests (Apache-2.0) — HCL cross-module resolution

Test plan

Read through the new § 8.8 section in the rendered ROADMAP
Confirm all 34 languages appear as subsections
Confirm fixture acquisition policy is clear at the top of 8.8
Confirm § 8.6 cross-reference to 8.8 reads correctly
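
The "all 34 languages appear as subsections" check can also be done mechanically. The following is a minimal sketch, assuming each language has a level-4 heading whose text is exactly the language name (the `#### Groovy` style mentioned later in this thread); the function name and the sample text are hypothetical, and shared headings such as a combined C-family heading would need extra handling.

```python
import re
from typing import List


def missing_subsections(markdown: str, languages: List[str]) -> List[str]:
    """Return the languages that have no `#### <name>` heading in the text."""
    # Collect the text of every level-4 heading, case-insensitively.
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^####\s+(.+?)\s*$", markdown, flags=re.MULTILINE)
    }
    return [lang for lang in languages if lang.lower() not in headings]


# Hypothetical excerpt, not the real ROADMAP.md content.
sample = """### 8.8 Language-Specific Analysis Reference Map
#### Groovy
text
#### Clojure
text
"""

print(missing_subsections(sample, ["Groovy", "Clojure", "Objective-C"]))
# prints ['Objective-C']
```

Running this against the rendered section with the registry's language list would turn the second test-plan item into a repeatable check.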

…ap (34 languages)

Adds section 8.8 to the Phase 8 roadmap covering every language codegraph supports. For each of the 34 languages (grouped into 13 families including HCL/Terraform), documents:

- State-of-the-art call-graph / points-to analysis tools (Jelly-equivalents)
- Published precision/recall figures with paper/benchmark citations
- Codegraph's current gap vs. those tools
- Concrete adoption candidates (techniques to implement)
- Benchmark suites and fixture sources with license annotations

Fixture acquisition guide added to the 8.8 intro: MIT/Apache/BSD/CC-BY fixtures may be committed directly; GPL fixtures are reference-only; academic suites with no license should be derived by running the tool, not copied.

Cross-references 8.6 (CI gate) to extend benchmarking beyond Jelly/ACG to all 34 languages using the reference map.

claude · 2026-06-06T00:05:49Z

Claude finished @carlos-alm's task

PR Review — §8.8 Language-Specific Analysis Reference Map

Root Cause Analysis

Problem Being Solved: Phase 8.x sub-tasks lack a shared reference point for "what does best-in-class look like for this language?" Without it, future implementation work either re-researches the landscape ad hoc or ignores non-JS/TS languages entirely. This section establishes a canonical research anchor for all 34 supported languages.

Why This Approach: A single ROADMAP section avoids scattering the literature survey across individual issues, keeps it version-controlled alongside the features it informs, and provides cross-referencing to §8.6 (CI gate). This is the right place for it.

Risk Assessment: Purely additive documentation. Zero runtime/build risk.

Phase 2 & 3: Configuration and Test Integrity

No configuration, test, or code files touched. Nothing to flag.

Phase 4: Content Quality

The content is genuinely high-quality for a research reference document. That said, several factual concerns require attention before this becomes a lasting canonical reference.

Factual accuracy concerns

Language count inconsistency (minor)
CLAUDE.md still reads Languages supported (23) at the top. The registry actually has 36 entries covering 34 distinct languages (the two extras are ocaml-interface and fsharp-signature). The PR correctly says 34, but the discrepancy with CLAUDE.md will confuse future readers who check both. This is out of scope for this PR, but should be tracked.

Groovy / Clojure section is absent from the per-language subsections
The summary table at the end lists Groovy and Clojure with appropriate notes ("compiles almost entirely to invokedynamic; source-level analysis is the practical ceiling"). However, there are no dedicated subsection headers for Groovy or Clojure like there are for every other language family. This is an inconsistency: all 34 languages appear in the summary table but only 32 have individual subsections. The summary-table entries are good and accurate; they just need corresponding body sections.

Objective-C subsection is also absent
CUDA and Objective-C each have rows in the summary table (| Objective-C | Clang CallGraph | ... |) but neither has a dedicated subsection. CUDA is mentioned in the C/C++/CUDA/Objective-C section header, and the adoption candidates mention <<<...>>> kernel launch. But Objective-C's [receiver selector] message-send semantics (CHA via protocol/superclass graph) are only in the summary table, not in a dedicated or shared body section. The CUDA case is similarly implicit.

Reach (Gleam section) — unverified claim on dispatch precision
The Gleam subsection states: "Gleam's Hindley-Milner type system enables near-precise dispatch." This is a characterization of Reach's capability, not a stated benchmark result. The table correctly marks it as "No published P/R figures." The phrasing "near-precise dispatch" implies a quality claim without a citation, which contradicts the document's own rule ("Entries marked (unverified) could not be confirmed"). Consider softening to "Gleam's static type system provides full dispatch information at the source level" (which is a language property fact, not a tool performance claim).

.NET section references Roslyn APIs that imply compiler dependency
The adoption candidate says: "walk IMethodSymbol.OverriddenMethod… enumerate interface implementors via FindImplementationsAsync." These are Roslyn API calls that require the Roslyn compiler, not tree-sitter. The document is a research reference so naming the API is appropriate, but it should explicitly note this requires the Roslyn SDK — unlike JS/TS where TypeScript compiler API is already in the codegraph pipeline. Without that note, a future implementer may assume this is achievable from the tree-sitter AST alone.

arXiv 2601.22952 for CodeQL C# claim
The Precision/Recall cell says: "Highest F1 on OWASP Benchmark v1.2 across SAST tools (arXiv 2601.22952, 2025)." That paper covers taint-tracking / SAST vulnerability detection, not raw call-graph edge P/R. The document itself acknowledges this: "OWASP Benchmark v1.2 covers taint analysis but not raw CG edge precision/recall." The table cell implies a stronger call-graph precision claim than the source supports — it would be more accurate to say "highest F1 on taint-tracking benchmark" rather than leaving it in the Precision/Recall column for a CG tool comparison table.
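
To make the distinction concrete: raw call-graph precision and recall are computed over sets of caller-to-callee edges, which is a different measurement from the F1 of a taint-tracking benchmark. A minimal sketch follows, assuming edges are represented as `(caller, callee)` string pairs; the function name and the sample edges are hypothetical and not taken from any cited tool.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (caller, callee); this representation is an assumption


def edge_scores(predicted: Set[Edge], truth: Set[Edge]) -> Dict[str, float]:
    """Precision, recall and F1 of predicted call edges against ground truth."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical ground truth and tool output.
truth = {("main", "parse"), ("main", "emit"), ("parse", "lex")}
predicted = {("main", "parse"), ("main", "emit"), ("main", "lex"), ("emit", "lex")}
scores = edge_scores(predicted, truth)
# Two of four predicted edges are correct (precision 0.5);
# two of three true edges are found (recall 2/3).
```

A figure reported on a vulnerability benchmark counts detected flaws, not edges, so it cannot be substituted into this calculation.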

ArkAnalyzer listed for TypeScript in the summary table without a body entry
The summary table lists Jelly / ArkAnalyzer as the top reference for TypeScript, but the JavaScript/TypeScript subsection body does not mention ArkAnalyzer at all. Either it should be added to the body section's reference table or removed from the summary.

SWARM-JS (50 npm packages, 163K edges, EMSE 2025) — verify publication date
The benchmark is cited as EMSE 2025 for a paper about JS/TS call graph benchmarking. If this is currently in press or a preprint, it should be cited accordingly. Given the knowledge cutoff this can't be verified here, but the entry should be flagged with (unverified) if the 2025 date cannot be confirmed from a published DOI.

Structure

The section is well-structured. The fixture acquisition policy at the top is clear, practical, and directly actionable. The cross-reference to §8.6 is correct. The summary table is a genuinely useful at-a-glance view.

One readability note: the section header uses -- (### 8.8 -- Language-Specific...) while the rest of the roadmap uses — (em dash). Minor, but inconsistent with the surrounding document style.

Phase 5: Backlog Compliance

Zero-dep: ✓ — Documentation only. No new runtime dependencies.
Foundation-aligned: ✓ — No cloud calls, no external toolchain requirements, no agent/generation behavior. Fully local-first documentation.
Problem-fit: 3/5 — Useful research infrastructure, but this section's value is realized when future implementers consult it before writing extractors. It does not itself reduce token waste or prevent hallucinated deps. It is the right enabler for work that will score 4–5.
Breaking: No — Purely additive. No API, CLI, DB, or MCP contract changes.
Tier: 1 — Zero-dep + foundation-aligned. No objection on these axes.

Critical Concerns

1. Missing subsections for Groovy, Clojure, and Objective-C (body coverage inconsistency — 32 sections vs 34 languages in the summary table). Not a blocker, but reduces the document's completeness claim.
2. ArkAnalyzer listed in summary table but absent from JS/TS body section. Should be reconciled.
3. The Roslyn adoption candidate should note the Roslyn SDK dependency to avoid misleading implementers into thinking it's achievable from tree-sitter alone.
4. OWASP Benchmark / CodeQL P/R cell conflates taint-analysis F1 with call-graph precision — worth correcting for accuracy.

None of these are blockers for merging (this is a research reference document, not a spec), but items 1 and 2 reduce self-consistency and should be fixed.

Final Recommendation

- Rating: ⭐⭐⭐☆☆ (3/5)
- Action: APPROVE WITH CONDITIONS
- Reasoning: Strong research content, well-structured, fixture acquisition policy
  is thoughtful and practical, cross-references are correct. Docked from ⭐⭐⭐⭐
  because: (a) Groovy, Clojure, and Objective-C have summary-table entries but no
  body subsections, making the "34 languages" coverage claim partially hollow;
  (b) ArkAnalyzer in the TS summary table without a body entry; (c) the CodeQL
  P/R cell overstates the source. These are accuracy issues in a document whose
  stated purpose is to be the canonical accuracy reference — they set a lower bar
  than the document's own standard. Fix the missing subsections (even brief ones)
  and reconcile the ArkAnalyzer reference before merging.

greptile-apps · 2026-06-06T00:08:16Z

Greptile Summary

This PR adds § 8.8 — Language-Specific Analysis Reference Map to docs/roadmap/ROADMAP.md, a 570-line research reference covering state-of-the-art call-graph and points-to tools, precision/recall figures, codegraph gaps, adoption candidates, and fixture acquisition guidance for all 34 supported languages across 13 families.

Adds dedicated subsections for every language family (JVM, Python, JS/TS, Go, Rust, .NET, Ruby, PHP, BEAM, C-family, Swift/Dart/Zig, ML/Scientific, Scripting/HW, HCL/Terraform) with reference tools, benchmark suites, and concrete adoption candidates.
Introduces a fixture acquisition policy distinguishing which benchmark licenses allow direct commit (MIT/Apache/BSD/CC-BY) vs. reference-only use (GPL, unlicensed academic).
Updates the § 8.6 CI gate cross-reference to scope benchmarking to all 34 languages via the new § 8.8 reference map.

Confidence Score: 3/5

Documentation-only change; no runtime behavior affected, but contains two factual inaccuracies that would mislead engineers acting on the reference map.

The Groovy row in the Summary table tells engineers 'source-level analysis is the only tractable path,' which contradicts the dedicated Groovy subsection's explicit statement that no source-level precision improvement is achievable without JVM bytecode access. Separately, DyPyBench — a Python-specific benchmark corpus — is cited in both the Dart subsection and the Summary table as the target benchmark methodology for Dart, sending developers to Python-only tooling. Both errors are internally inconsistent with content elsewhere in the same PR and could cause misdirected engineering effort if the reference map is used as intended.

docs/roadmap/ROADMAP.md — Groovy Summary table row (line 2070) and Dart benchmark citations (lines 1833 and 2056).

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/roadmap/ROADMAP.md | Adds § 8.8 (570+ lines) covering 34-language reference map with two internal accuracy issues: Groovy Summary table contradicts the Groovy subsection on source-level viability, and DyPyBench (a Python-only benchmark) is cited as the Dart benchmark methodology in both the Dart subsection and Summary table. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[§ 8.8 Fixture Acquisition] --> B{License?}
    B -->|MIT / Apache-2.0 / BSD| C[Commit to tests/benchmarks/resolution/fixtures/lang/]
    B -->|CC-BY| D[Commit + add attribution in fixture README]
    B -->|GPL| E[Reference-only: run tool, record expected edges — do NOT copy source files]
    B -->|No explicit license / Academic| F[Reference-only: derive expected edges by running tool]
    C --> G[§ 8.6 CI Gate extended to all 34 languages]
    D --> G
    E --> G
    F --> G
    G --> H[Per-language reference tools consulted from § 8.8]
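
The license branch of the flowchart can be sketched as a small lookup. This is an illustration of the policy as summarized in this thread, not code from the repository: the `Action` enum, the `fixture_action` name, the exact license identifiers, and the choice to treat any unrecognized license as reference-only are all assumptions.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    COMMIT = "commit fixture"
    COMMIT_WITH_ATTRIBUTION = "commit fixture and add attribution to its README"
    REFERENCE_ONLY = "run the tool and record expected edges; do not copy sources"


# Identifier spellings are assumptions; the policy text only says MIT/Apache/BSD/CC-BY.
PERMISSIVE = {"MIT", "Apache-2.0", "BSD", "BSD-2-Clause", "BSD-3-Clause"}
ATTRIBUTION = {"CC-BY", "CC-BY-4.0"}


def fixture_action(license_id: Optional[str]) -> Action:
    """Map a benchmark's license to the fixture acquisition action."""
    if license_id is None:
        # No explicit license (e.g. an academic suite): reference-only.
        return Action.REFERENCE_ONLY
    normalized = license_id.strip()
    if normalized in PERMISSIVE:
        return Action.COMMIT
    if normalized in ATTRIBUTION:
        return Action.COMMIT_WITH_ATTRIBUTION
    # GPL and anything unrecognized fall through to the conservative branch.
    return Action.REFERENCE_ONLY


print(fixture_action("CC-BY-4.0").value)
```

Defaulting unknown licenses to reference-only keeps the sketch conservative: a fixture is only committed when its license is explicitly on the allowed list.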

Reviews (12): Last reviewed commit: "docs(roadmap): add dedicated subsections..."

…#1338)

carlos-alm · 2026-06-06T02:11:36Z

Addressed all concerns in commit 9734376:

Section header `--` → em dash: Replaced `--` with an em dash in the §8.8 heading to match the rest of the roadmap.
Groovy and Clojure subsection coverage: Added a dedicated "Groovy and Clojure note" paragraph in the JVM body section explaining the invokedynamic precision ceiling, why source-level name matching is the practical ceiling for both, and why no dedicated benchmark exists. The summary table entries remain unchanged.
ArkAnalyzer in summary table but absent from JS/TS body: Added ArkAnalyzer as a fourth row in the JavaScript/TypeScript tool table with its approach and coverage notes.
Roslyn SDK dependency note: Added an explicit Note to the CHA via Roslyn adoption candidate making clear this requires the Roslyn SDK (Microsoft.CodeAnalysis.CSharp) and cannot be driven from tree-sitter output alone.
CodeQL P/R cell overstates source: Changed to "Highest F1 on taint-tracking OWASP Benchmark v1.2" to make clear this is a taint-analysis result, not a raw CG precision figure.
Gleam "near-precise dispatch" unsupported claim: Softened to "Gleam's static type system provides full dispatch information at the source level" in both the tool table and the codegraph gap paragraph.
Stale "as of 2025" assertions: Replaced all three occurrences (.NET, Haskell, Bash benchmark sections) with "as of the time of writing".

Out of scope: CLAUDE.md language count discrepancy (says 23, registry has 34) — filed as #1342.

carlos-alm · 2026-06-06T02:11:43Z

Addressed all 3 Greptile findings in commit 9734376:

Missing horizontal rule before Summary table (line 500): Added --- separator between the Terraform/HCL section and the Summary table — consistent with every other language subsection.
Stale "as of 2025" assertions (line 155): Replaced all three occurrences (.NET, Haskell, Bash benchmark sections) with "as of the time of writing", removing the ambiguity about whether the 2025 verification applies given the 2026 tool version citations.
Groovy and Clojure have no dedicated subsections (line 37): Added a "Groovy and Clojure note" paragraph in the JVM body section explaining why source-level analysis reaches a precision ceiling for both languages (both compile almost entirely to invokedynamic bytecode), satisfying the test plan's "all 34 languages appear as subsections" criterion.

carlos-alm · 2026-06-06T02:11:58Z

@greptileai

carlos-alm · 2026-06-06T04:16:11Z

Addressed the two remaining Greptile findings in commit 5c82617:

TSX not acknowledged in JS/TS subsection — Added a dedicated TSX note paragraph at the end of the JavaScript / TypeScript section (after the benchmark suites line) confirming that TSX analysis is identical to TypeScript: same grammar extension, same TypeScript compiler API pipeline, same Jelly/ACG reference tools, and no additional dispatch patterns from JSX syntax. Satisfies the test plan's 'all 34 languages appear as subsections' criterion.
Objective-C has no body coverage despite appearing in the section header — Added a dedicated Objective-C note paragraph at the end of the C / C++ / CUDA / Objective-C section explaining: Clang's built-in CallGraph and SVF both model ObjC method dispatch via CHA over the class/protocol hierarchy; tree-sitter-objc captures receiver/selector pairs syntactically; no standalone ObjC CG benchmark exists; Apple's objc4/Foundation serve as informal ground truth; all C/C++ toolchain references cover ObjC when compiled with Clang. Directly analogous to the JVM section's Groovy and Clojure note.
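
For readers unfamiliar with the term, class hierarchy analysis (CHA) over receiver/selector pairs can be sketched as follows. This is a generic textbook-style illustration, not how Clang's CallGraph or SVF implement it; it ignores protocols and categories, and the class names, the `superclass` and `methods` maps, and the function names are all hypothetical.

```python
from typing import Dict, Optional, Set


def subclasses_of(cls: str, superclass: Dict[str, str]) -> Set[str]:
    """Return cls together with all of its transitive subclasses."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for child, parent in superclass.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def lookup(cls: str, selector: str, superclass: Dict[str, str],
           methods: Dict[str, Set[str]]) -> Optional[str]:
    """Walk up the hierarchy to the nearest class that defines the selector."""
    current: Optional[str] = cls
    while current is not None:
        if selector in methods.get(current, set()):
            return current
        current = superclass.get(current)
    return None


def cha_targets(receiver: str, selector: str, superclass: Dict[str, str],
                methods: Dict[str, Set[str]]) -> Set[str]:
    """All methods a `[receiver selector]` send may reach under CHA."""
    targets = set()
    for cls in subclasses_of(receiver, superclass):
        owner = lookup(cls, selector, superclass, methods)
        if owner is not None:
            targets.add(f"{owner}.{selector}")
    return targets


# Hypothetical hierarchy: Circle and Square extend Shape, which extends NSObject.
superclass = {"Circle": "Shape", "Square": "Shape", "Shape": "NSObject"}
methods = {"Shape": {"area", "describe"}, "Circle": {"area"}, "NSObject": {"description"}}

print(sorted(cha_targets("Shape", "area", superclass, methods)))
# prints ['Circle.area', 'Shape.area']
```

CHA over-approximates: a receiver typed as `Shape` yields every override in the hierarchy, which is why syntactic receiver/selector pairs alone give sound but imprecise edges.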

carlos-alm · 2026-06-06T04:16:15Z

@greptileai

… §8.8 (#1338)

- Add BSD license annotation to JCG entry in JVM benchmark suites
- Add benchmark suites entry to Erlang subsection (ELP call hierarchy, Dialyzer OTP scalability benchmarks, Set-theoretic Types test suite)
- Add benchmark suites entry to Gleam subsection (no dedicated benchmark; Reach project BEAM test cases as closest ground truth)

carlos-alm · 2026-06-06T06:30:39Z

Addressed all 3 Greptile findings in commit d6533bf:

JCG missing BSD license annotation (line 1540): Added (BSD) to the JCG entry — JCG (opalj/JCG, BSD). Per the fixture acquisition policy in §8.8, license information is required for commit eligibility; this omission would have left future implementers unable to determine if JCG fixtures can be committed to tests/benchmarks/resolution/fixtures/jvm/.
Erlang subsection missing Benchmark suites entry (lines 1718-1722): Added **Benchmark suites:** ELP call hierarchy (WhatsApp/erlang-language-platform, Apache-2.0); Dialyzer OTP scalability benchmarks (Jansen et al.); Set-theoretic Types for Erlang test suite (321 tests, arXiv 2302.12783). after the adoption candidates — consistent with every other language subsection in §8.8.
Gleam subsection missing Benchmark suites entry (lines 1731-1735): Added **Benchmark suites:** No dedicated Gleam call-graph precision/recall benchmark exists as of the time of writing. The Reach project's BEAM bytecode test cases (elixir-vibe/reach, MIT) are the closest available ground truth. — matches the pattern used in Bash and Haskell subsections.

carlos-alm · 2026-06-06T06:30:51Z

@greptileai

greptile-apps · 2026-06-06T06:33:11Z

+- Adopt Psalm-style flow-sensitive receiver narrowing: at `$v->method()`, use the narrowed type of `$v` from preceding `instanceof` guards or assignment context rather than the full class hierarchy.
+- Phan's two-phase design: build a global class/method index from all parsed files before resolving any call site. Replicates what codegraph's build pipeline already does for JS/TS but is not yet applied to PHP.
+
+**Benchmark suites:** TChecker evaluation corpus (CCS 2022); Artemis corpus (250 PHP web apps, OOPSLA 2025); SWC Registry PHP test cases.


The "SWC Registry PHP test cases" entry is a misattribution. The SWC Registry (swcregistry.io) is the Smart Contract Weakness Classification registry — a Solidity/EVM vulnerability catalogue. It is correctly cited in the Solidity subsection of this very document. There is no PHP benchmark corpus under that name; linking it here will send engineers looking for a PHP test corpus to an Ethereum security resource.

Suggested change

**Benchmark suites:** TChecker evaluation corpus (CCS 2022); Artemis corpus (250 PHP web apps, OOPSLA 2025); SWC Registry PHP test cases.

**Benchmark suites:** TChecker evaluation corpus (CCS 2022); Artemis corpus (250 PHP web apps, OOPSLA 2025).

Fixed in commit 744edf7 — removed "SWC Registry PHP test cases" from the PHP benchmark suites line. The SWC Registry (swcregistry.io) is correctly cited in the Solidity subsection only. Applied the suggested replacement text.

…r's own class (#1343)

* fix(edge_builder): restrict same-file this-dispatch fallback to caller's own class

  When a file contains multiple unrelated classes all defining a method with the same name, the broader same-file suffix scan emitted false-positive call edges (e.g. this.area() in Shape.describe matched Calculator.area and Formatter.area).

  The fix: when the scan finds more than one method with the matching suffix, restrict the result to methods whose qualified name starts with the caller's own class prefix. A single unambiguous match is returned as-is (handles the CHA case of one subclass override). If multiple classes match and none is the caller's class, return nothing rather than emitting false edges.

  WASM tests are active; native tests marked todo pending next binary release.

  Closes #1324

  docs check acknowledged

* fix(edge_builder): apply caller-class scope to single-match suffix scan too (#1343)

  The 1-match arm of the same-file suffix scan was returning any sole method regardless of whether it belonged to the caller's class. A file with Caller (no area()) and Sibling (area()) would produce a false Caller.run → Sibling.area edge — the same bug fixed for 2+ matches.

  Replace the three-arm match with a unified caller-prefix filter applied to all non-empty results. Add fixture single-sibling.ts and a corresponding WASM assertion to cover the single-match false-positive path. Initialize callEdges to [] for cleaner beforeAll failure mode.
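
The unified caller-prefix filter described in the second commit can be illustrated with a short sketch. This is not the edge_builder code itself: the function name, the `Class.method` qualified-name format, and the list-of-strings input are assumptions made for illustration.

```python
from typing import List


def resolve_this_call(caller: str, method: str, same_file_methods: List[str]) -> List[str]:
    """Resolve `this.<method>()` inside `caller` to methods of the caller's own class.

    `caller` and the candidates are assumed to be "Class.method" qualified names.
    """
    caller_class = caller.rsplit(".", 1)[0]  # "Shape.describe" -> "Shape"
    suffix = "." + method
    # Same-file suffix scan: every method in the file with a matching name.
    candidates = [name for name in same_file_methods if name.endswith(suffix)]
    # Caller-prefix filter applied to all results, including a single match.
    prefix = caller_class + "."
    return [name for name in candidates if name.startswith(prefix)]


# Hypothetical file with three unrelated classes that all define area().
multi = ["Shape.describe", "Shape.area", "Calculator.area", "Formatter.area"]
print(resolve_this_call("Shape.describe", "area", multi))
# prints ['Shape.area']

# Hypothetical single-sibling case: Caller has no area(), Sibling does.
single = ["Caller.run", "Sibling.area"]
print(resolve_this_call("Caller.run", "area", single))
# prints []
```

Returning an empty list in the single-sibling case trades a possible missed edge for the removal of a known false positive, which matches the intent stated in the commit message.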

…on, Dart sign, Ruby citation (#1338)

- Remove SWC Registry from PHP benchmark suites (it is the Smart Contract Weakness Classification registry for Solidity/EVM, not a PHP test corpus)
- Fix Dart TFA speedup notation: "−49.5%" → "49.5% reduction in AOT compilation time" (negative speedup is mathematically undefined)
- Clarify Ruby CLBG citation: Ali et al. TSE study covers JVM-hosted languages (Groovy/Clojure/Scala/Kotlin), not MRI Ruby; no dedicated Ruby CG P/R evaluation exists as of the time of writing

carlos-alm · 2026-06-06T08:06:06Z

Addressed 3 remaining factual issues from Greptile's review in commit 744edf7:

SWC Registry PHP misattribution (Greptile P1 inline, line 1681): Removed "SWC Registry PHP test cases" from the PHP benchmark suites line. The Smart Contract Weakness Classification registry is a Solidity/EVM catalogue — correctly cited only in the Solidity subsection.
Dart TFA speedup sign (Greptile, line 1784): Changed "AOT compilation speedup −49.5%" to "49.5% reduction in AOT compilation time". Negative speedup is mathematically undefined for the T_old/T_new ratio; the intended meaning is a time reduction.
Ruby CLBG benchmark citation (Greptile, line 1661): Clarified that the Ali et al. IEEE TSE study applied CLBG programs to JVM-hosted languages (Groovy, Clojure, Scala, Kotlin), not MRI Ruby. Added explicit note that no dedicated Ruby CG precision/recall evaluation exists as of the time of writing.
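
The arithmetic behind the Dart wording fix, as a small sketch: speedup is the ratio T_old/T_new of two positive times, so it is always positive, and a time reduction of r percent corresponds to a speedup of 1 / (1 - r/100). The function name is hypothetical.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a percentage reduction in time into a T_old / T_new speedup factor."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be in the range [0, 100)")
    # T_new = T_old * (1 - r/100), so T_old / T_new = 1 / (1 - r/100).
    return 1.0 / (1.0 - reduction_pct / 100.0)


print(round(speedup_from_reduction(49.5), 2))
# prints 1.98
```

A 49.5% reduction in AOT compilation time is therefore roughly a 1.98x speedup, which is why the signed "−49.5%" notation was misleading.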

carlos-alm · 2026-06-06T08:06:13Z

@greptileai

…analysis-reference-map

…ve-C (#1338) Addresses review feedback: all 34 languages now have individual subsections in §8.8. Previously Groovy and Clojure were covered only by a shared note inside the JVM section; Objective-C was covered only by a note inside the C/C++ section. Each new subsection follows the standard template: reference tools, codegraph gap, adoption candidates, and benchmark suites.

carlos-alm · 2026-06-06T09:43:39Z

Addressed all remaining Claude review concerns:

Missing Groovy and Clojure subsections — Added dedicated #### Groovy and #### Clojure subsections (commit 04e606e), each with the full template: reference tools, codegraph gap, adoption candidates, benchmark suites. The JVM section's shared 'Groovy and Clojure note' is preserved for cross-reference.
Missing Objective-C subsection — Converted the **Objective-C note:** paragraph in the C/C++ section into a proper #### Objective-C subsection (same commit). Includes reference tools, codegraph gap, adoption candidates, and benchmark suites.
ArkAnalyzer reconciliation — Already present in the JS/TS body section at the reference tools table (line 1572 as of the previous commit series). No further change needed.
Roslyn SDK dependency note — Already added in a prior commit. The .NET section explicitly states the Roslyn SDK requirement.
CodeQL P/R cell — Already fixed in a prior commit: the C# CodeQL entry now correctly reads 'Highest F1 on taint-tracking OWASP Benchmark v1.2' and includes 'no standalone C# CG P/R benchmark published'.
CI commitlint failure — Fixed by rewriting the 105-character commit header to 50 characters via git filter-branch + force-push (a commitlint violation in a non-HEAD commit can only be fixed by rewriting history).
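
A header-length violation of this kind can be caught locally before pushing. The sketch below is not commitlint itself; the function name is hypothetical, and the 100-character default is an assumption, since this thread does not state the repository's configured maximum.

```python
def header_too_long(message: str, max_len: int = 100) -> bool:
    """Return True if the first line of a commit message exceeds max_len characters."""
    lines = message.splitlines()
    header = lines[0] if lines else ""
    return len(header) > max_len


# Hypothetical messages of the lengths mentioned above (105 and 50 characters).
long_header = "docs(roadmap): " + "x" * 90
short_header = "docs(roadmap): " + "x" * 35

print(header_too_long(long_header), header_too_long(short_header))
# prints True False
```

Only the first line is checked, so a long body paragraph does not trigger the rule in this sketch.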

All 34 languages now have dedicated subsections in §8.8.

carlos-alm · 2026-06-06T09:43:52Z

@greptileai

carlos-alm · 2026-06-06T09:43:59Z

@claude

claude · 2026-06-06T09:44:10Z

Claude finished @carlos-alm's task

Re-Review — §8.8 Language-Specific Analysis Reference Map

Gather context
Read current ROADMAP.md state (§8.8 section)
Verify fixes from previous review rounds
Final assessment

Re-Review Summary

All items from the previous Claude review (3/5, "approve with conditions") have been addressed:

| Previous condition | Status |
| --- | --- |
| Groovy and Clojure subsections missing | ✅ Added (lines 1544, 1558) — full template: reference tools, codegraph gap, adoption candidates, benchmark suites |
| Objective-C subsection missing | ✅ Added (line 1788) — full template |
| ArkAnalyzer in summary table but absent from JS/TS body | ✅ Added to tool table at line 1600 |
| Roslyn SDK dependency note absent | ✅ Added to .NET adoption candidate (line 1667) |
| CodeQL P/R cell overstated taint-analysis as CG precision | ✅ Fixed — now reads "Highest F1 on taint-tracking OWASP Benchmark v1.2" (line 1660) |
| Gleam "near-precise dispatch" unsupported claim | ✅ Softened to "Gleam's static type system provides full dispatch information at the source level" |
| Em dash inconsistency in section header | ✅ Fixed (line 1510: `### 8.8 —`) |
| SWC Registry PHP misattribution | ✅ Removed from PHP benchmark suites (line 1709) |
| Dart TFA speedup sign error | ✅ Fixed — "49.5% reduction in AOT compilation time" (line 1824) |
| Ruby CLBG misattribution | ✅ Fixed with accurate caveat (line 1689) |
| JCG missing BSD license annotation | ✅ Added (line 1540) |
| Erlang benchmark suites missing | ✅ Added (line 1748) |
| Gleam benchmark suites missing | ✅ Added (line 1763) |
| "as of 2025" stale assertions | ✅ Replaced with "as of the time of writing" |

One remaining open issue

DyPyBench misattribution for Dart (lines 1833 and 2056) — not yet fixed.

Greptile's P1 finding from the most recent review cycle: arXiv 2403.00539 is "DyPyBench: A Benchmark of Executable Python Software" — a Python-only corpus (50 Python projects, 681k lines of Python code, built on the DynaPyt Python analysis framework). The document's Dart subsection describes it as "portable to Dart via Dart Observatory/VM coverage" but no published work ports DyPyBench to Dart.

Current state at line 1833:

**Benchmark suites:** DyPyBench methodology (arXiv 2403.00539) — executable benchmark comparing static vs dynamic CG — portable to Dart via Dart Observatory/VM coverage.

And at line 2056 in the summary table:

| Dart | ... | DyPyBench methodology (arXiv 2403.00539) |

The correct fix (consistent with how Bash and Haskell are handled — both acknowledge "no public benchmark exists") would be to either:

Replace with the actual Dart-specific reference: the Aarhus DLS 2016 paper (cited in the Dart body section as "Heinze, Møller, Strocco") and the Dart VM TFA's Flutter app evaluation
Or simply state: "No dedicated Dart CG P/R benchmark exists. Heinze et al. DLS 2016 is the closest reference."

Revised Rating

⭐⭐⭐⭐☆ (4/5) — APPROVE WITH ONE FIX

All four conditions from the previous review are resolved. The section now has full subsection coverage for all 34 languages, all cited review concerns are addressed, and the content quality is genuinely strong. The single remaining blocker is the DyPyBench misattribution, which is the same class of error (a resource for one language or ecosystem cited for another) that was already corrected for PHP. It should be fixed before merge.

carlos-alm mentioned this pull request Jun 6, 2026

docs(CLAUDE.md): language count is stale (says 23, registry has 34) #1342

Open

docs(roadmap): address review comments on §8.8 language reference map (…

9734376

…#1338)

docs(roadmap): add TSX and Objective-C body notes in §8.8 (#1338)

5c82617

carlos-alm mentioned this pull request Jun 6, 2026

fix(ci): Windows EBUSY flaky test in embedding-regression.test.ts #1344

Open

carlos-alm and others added 2 commits June 6, 2026 00:23

Merge branch 'main' into docs/phase8-language-analysis-reference-map

bc8732f

greptile-apps Bot reviewed Jun 6, 2026

carlos-alm and others added 2 commits June 6, 2026 01:59

Merge branch 'main' into docs/phase8-language-analysis-reference-map

9d817e7

carlos-alm force-pushed the docs/phase8-language-analysis-reference-map branch from 0127e27 to 9d817e7 Compare June 6, 2026 09:39

carlos-alm added 2 commits June 6, 2026 03:41

Merge remote-tracking branch 'origin/main' into docs/phase8-language-…

be9eebd

…analysis-reference-map
