Skip to content

research(bench): add javac compilation step to Java fixture for javacg-static comparison#1329

Closed
carlos-alm wants to merge 1 commit into
mainfrom
research/bench-javacg-static-1307
Closed

research(bench): add javac compilation step to Java fixture for javacg-static comparison#1329
carlos-alm wants to merge 1 commit into
mainfrom
research/bench-javacg-static-1307

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

  • Adds tests/benchmarks/resolution/fixtures/java/Makefile — compiles all .java sources to fixture.jar via javac -d . *.java && jar cf fixture.jar benchmark/
  • Adds scripts/compare-javacg.mjs — runs javacg-static on the compiled JAR, parses its M:pkg.Class:method (T)pkg.Class:method output, maps to ClassName.method form, and computes precision/recall vs expected-edges.json
  • Updates docs/benchmarks/RESOLUTION-COMPARISON.md with the javacg-static methodology, a placeholder comparison table, and reproduction instructions

Reproducing

# 1. Compile the fixture
cd tests/benchmarks/resolution/fixtures/java && make && cd -

# 2. Download javacg-static (or build from source)
# https://github.com/gousiosg/java-callgraph/releases

# 3. Run the comparison
node scripts/compare-javacg.mjs --jar /path/to/javacg-0.1-SNAPSHOT.jar

Name mapping

javacg-static uses pkg.ClassName:method(JVM-descriptors) form. The script maps:

  • Source <init>ClassName.ClassName (constructor-as-method, matching expected-edges.json)
  • Target <init>ClassName (constructor target = class name only)
  • JDK calls (HashMap, String, System.out, …) are filtered out — only edges between fixture classes count

Closes #1307

@claude
Copy link
Copy Markdown

claude Bot commented Jun 5, 2026

Claude finished @carlos-alm's task —— View job


PR Review: research(bench): add javac compilation step + javacg-static comparison

  • Gather context and read changed files
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity
  • Phase 4: Code quality review
  • Phase 5: Backlog compliance
  • Final assessment

Root Cause Analysis

Problem Being Solved: The existing Java benchmark (RESOLUTION-COMPARISON.md) had a placeholder row for javacg-static with all dashes, because no Java bytecode existed to actually run the tool against. This PR closes that gap by: (1) compiling the fixture .java files into a JAR, and (2) providing a script to run javacg-static and compute precision/recall vs expected-edges.json.

Why This Approach: javacg-static requires compiled bytecode (.class files / JARs), so a compilation step is a genuine prerequisite — not a workaround. The Makefile approach is minimal and keeps the build artifact out of git via .gitignore. The comparison script follows the exact same pattern as the existing compare-tools.mjs.

Risk Assessment: Low. No production code changed. No tests weakened. The new script is a standalone research/dev tool, not part of the CI pipeline.


Phase 2: Configuration Changes

None. No linting rules, TypeScript flags, test timeouts, or build config were modified.


Phase 3: Test Integrity

No test assertions weakened. No test files modified. The only doc update is additive (filling out the Java section that previously said "run compare-javacg.mjs to populate"). Clean.


Phase 4: Code Quality

Overall the script is well-structured and follows the pattern of compare-tools.mjs. A few issues worth flagging:

scripts/compare-javacg.mjs

  1. buildClassFileMap only picks the first class per file (line 74). The regex stops at the first class/interface declaration. Java files often contain inner classes or multiple top-level types (the fixture's InMemoryUserRepository.java likely declares a single class, so this probably works for the current fixture, but it would silently drop inner classes). Since this is a benchmark script and the fixture is small and controlled, this is acceptable, but worth a comment.

  2. parseJavacgOutput regex [CSOIDE] — the javacg-static docs list C, S, O, I, D as call types. E isn't listed in the header comment (line 10–11) but appears in the regex on line 156. Either the comment should mention E, or E should be removed. Minor doc/code inconsistency.

  3. Self-edge filter at line 182 — skips sourceName === targetName && sourceFile === targetFile. This correctly excludes simple recursion. However, a constructor calling itself (e.g. delegating constructor) would have sourceName = ClassName.ClassName and targetName = ClassName — these are different strings, so such an edge would not be incorrectly filtered. Looks correct.

  4. fs.readdirSync(fixtureDir) in buildClassFileMap (line 71) reads the fixture root, not the compiled benchmark/ subdirectory. This is correct — the .java source files are in the root of the fixture dir. Fine.

  5. runJavacg swallows stderr entirely when stdout has content (lines 138–139). If javacg-static emits partial warnings to stderr alongside useful stdout, those are silently discarded. For a research tool this is fine, but it could obscure surprising tool behavior.

  6. No --help / --version flag handling — missing but consistent with compare-tools.mjs. Not a concern for a dev-only script.

tests/benchmarks/resolution/fixtures/java/Makefile

  • javac -d . $(SRCS) compiles all .java files in the root — correct.
  • jar cf $(JAR) benchmark/ — packages the output. The compiled class files land in benchmark/ because the package is benchmark (matching the Java package declaration in the source files). Worth verifying this assumption: if any source file lacks a package benchmark; declaration, classes will land in the root and jar cf ... benchmark/ would silently produce an empty jar. Since the fixture already exists and was clearly tested, this is almost certainly fine.
  • clean removes fixture.jar and benchmark/ — correct.
  • .PHONY targets declared. Good.

tests/benchmarks/resolution/fixtures/java/.gitignore

Correctly excludes fixture.jar and benchmark/ (the compiled class output). Good.

docs/benchmarks/RESOLUTION-COMPARISON.md

The placeholder javacg-static row (— | — | — | — | —) is a legitimate "populate me by running the script" marker. The text is honest and clear. The Java section correctly explains the name-mapping convention and points to the script. No factual claims are made about javacg-static's actual results (the table is left blank pending a real run). This is the right approach — the alternative would be to run the tool, commit the numbers, and update the PR, but that requires the user to have javacg-static installed, which is reasonable to leave for the researcher running the comparison.


Phase 5: Backlog Compliance

Zero-dep ✓ — No new entries in package.json. javacg-static is an external Java tool invoked via java -jar, not an npm dependency. execFileSync is already used throughout the codebase.

Foundation-aligned ✓ — This is a research/benchmark script in scripts/, not part of the core pipeline. It doesn't modify the graph, touch the DB, or affect the programmatic API. Doesn't violate any FOUNDATION.md principle.

Problem-fit: 2/5 — This is calibration/research tooling for understanding codegraph's Java recall gaps. It doesn't directly reduce token waste for AI agents or prevent hallucinated deps. It feeds the diagnostic work that will eventually improve codegraph (e.g., fixing the same-file and static 0% recalls), but is one step removed from the core problem. That's fine for a research branch — the PR title and branch name make it clear this is research work.

Breaking: No — Purely additive. No API, DB schema, CLI output, or MCP contract changes.

Tier: 1 — Zero new deps, fully foundation-aligned.


Minor Suggestions (non-blocking)

  1. Line 156 — regex character class: Either add E to the call type documentation in the file header, or confirm E (edge?) is actually emitted by javacg-static and add it to the comment at line 10.
  2. Line 74 — inner class handling: Add a brief comment that only the first declared class per file is mapped (e.g., // maps first declared class per file — inner classes not indexed).
  3. Makefile assumption: If not already verified, confirm the fixture Java sources all declare package benchmark; so the class output lands in benchmark/ as expected.

Final Assessment

## Root Cause Analysis
Problem Being Solved: Fill the javacg-static placeholder in the Java
  benchmark comparison — requires compiled bytecode that previously didn't exist.
Why This Approach: javacg-static mandates JARs; the Makefile is the minimal
  correct build step. Script follows existing compare-tools.mjs pattern exactly.
Risk Assessment: Low. No production code, tests, or CI pipeline touched.

## Backlog Compliance
- Zero-dep: ✓ — no new npm deps; java/jar via execFileSync
- Foundation-aligned: ✓ — research script, outside core pipeline
- Problem-fit: 2/5 — calibration tooling, one step removed from core problem
- Breaking: No
- Tier: 1

## Critical Concerns
None.

## Final Recommendation
- Rating: ⭐⭐⭐⭐ (4/5)
- Action: APPROVE
- Reasoning: Well-scoped research addition. Follows existing patterns precisely,
  no tests weakened, no config relaxed, no new deps. The three minor issues
  (call-type `E` undocumented, single-class-per-file assumption undocumented,
  Makefile package assumption) are non-blocking for a research script. The
  placeholder table is honest and correctly deferred to a real tool run.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Jun 5, 2026

Greptile Summary

This PR adds a javac compilation step to the Java benchmark fixture and introduces scripts/compare-javacg.mjs, a standalone script that runs javacg-static (CHA) on the compiled JAR and computes precision/recall against expected-edges.json. Documentation is updated accordingly.

  • tests/benchmarks/resolution/fixtures/java/Makefile compiles all fixture .java files (all in package benchmark) to benchmark/*.class and bundles them into fixture.jar; a .gitignore keeps the build outputs out of source control.
  • scripts/compare-javacg.mjs discovers and runs the javacg-static JAR, parses its M:…(T)… output, maps pkg.Class:method tokens to ClassName.method form (with <init>ClassName.ClassName / ClassName conventions), and computes TP/FP/FN vs the ground-truth edge set.
  • docs/benchmarks/RESOLUTION-COMPARISON.md adds the javacg-static methodology, a placeholder comparison table (populated at runtime), and reproduction instructions.

Confidence Score: 5/5

Safe to merge — this is a self-contained benchmark tooling addition with no changes to production code paths.

All changed files are benchmark fixtures, a new standalone comparison script, and documentation. The core parsing and metrics logic in compare-javacg.mjs correctly maps javacg-static output to the expected-edges.json convention, and precision/recall computation is correct. No production code is touched.

No files require special attention.

Important Files Changed

Filename Overview
scripts/compare-javacg.mjs New comparison script; core parsing and metrics logic is sound; minor edge case in --jar argument handling when no value is provided
tests/benchmarks/resolution/fixtures/java/Makefile Correct two-step build: javac -d . produces benchmark/*.class, jar cf bundles them; clean target removes both outputs
tests/benchmarks/resolution/fixtures/java/.gitignore Correctly ignores fixture.jar and the benchmark/ compiled output directory
docs/benchmarks/RESOLUTION-COMPARISON.md Documentation updated with javacg-static methodology, placeholder table, and reproduction instructions; no issues found

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Start: node compare-javacg.mjs] --> B{Find javacg JAR}
    B -->|--jar arg| C[Use provided path]
    B -->|JAVACG_JAR env| D[Use env var path]
    B -->|scripts/lib/*.jar| E[Use auto-discovered jar]
    B -->|none found| F[Exit: JAR not found]
    C --> G[buildClassFileMap\nScan .java files]
    D --> G
    E --> G
    G --> H[loadGroundTruth\nParse expected-edges.json]
    H --> I{fixture.jar exists?}
    I -->|No| J[Exit: run make first]
    I -->|Yes| K[runJavacg\njava -jar javacgJar fixtureJar]
    K --> L[parseJavacgOutput\nFilter JDK edges, map names]
    L --> M[computeMetrics\nTP / FP / FN]
    M --> N{--json flag?}
    N -->|Yes| O[stdout: JSON]
    N -->|No| P[stdout: Markdown table]
Loading

Fix All in Claude Code

Reviews (3): Last reviewed commit: "chore(bench): add javac fixture compilat..." | Re-trigger Greptile

Comment on lines +69 to +77
function buildClassFileMap(fixtureDir) {
const map = new Map();
for (const filename of fs.readdirSync(fixtureDir)) {
if (!filename.endsWith('.java')) continue;
const src = fs.readFileSync(path.join(fixtureDir, filename), 'utf8');
const m = src.match(/(?:public\s+)?(?:abstract\s+)?(?:class|interface)\s+(\w+)/);
if (m) map.set(m[1], filename);
}
return map;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 buildClassFileMap silently drops classes with unhandled modifiers

The regex only handles optional public and abstract prefixes before class/interface. Any Java file using final class, sealed class, non-sealed class, record, or enum won't match, so the class name won't be added to classFileMap. All edges involving such a class are then silently filtered out in parseJavacgOutput (the !sourceFile || !targetFile guard), which deflates the predicted-edge count and inflates measured precision without any warning. A simple validation — asserting that classFileMap contains one entry per .java file before running the comparison — would catch this class of silent failure.

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — broadened the regex in buildClassFileMap to handle all Java modifier combinations (final, sealed, non-sealed, protected, private, strictfp) as well as additional type keywords (enum, record). Also added a validation that warns (with a clear message) if any .java file doesn't yield a class name, and a final assertion that map.size === javaFiles.length so skewed precision/recall is surfaced immediately.

Comment thread scripts/compare-javacg.mjs Outdated
Comment on lines +155 to +156
// M: caller (T) callee — the space between caller and (T) may vary
const lineRe = /^M:(\S+)\s+\(([CSOIDE])\)(\S+)$/;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The comment at the top of the file enumerates five call-type codes (C, S, O, I, D) but the regex character class [CSOIDE] also includes E, which isn't documented anywhere. javacg-static doesn't emit an E type, so it's harmless, but a stale character in the regex is confusing. Adding an inline legend makes it easy to verify the regex is exhaustive.

Suggested change
// M: caller (T) callee — the space between caller and (T) may vary
const lineRe = /^M:(\S+)\s+\(([CSOIDE])\)(\S+)$/;
// M: caller (T) callee — the space between caller and (T) may vary
// T values: C=virtual, S=static, O=special, I=interface, D=dynamic
const lineRe = /^M:(\S+)\s+\(([CSOIDE])\)(\S+)$/;

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — removed the undocumented E from the character class and added an inline legend for all call-type codes: // T values: C=virtual, S=static, O=special (constructors/super), I=interface, D=dynamic (invokedynamic). The regex is now [CSOID] which exactly matches what javacg-static documents and emits.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 5, 2026

Codegraph Impact Analysis

9 functions changed2 callers affected across 1 files

  • findJavacgJar in scripts/compare-javacg.mjs:51 (1 transitive callers)
  • buildClassFileMap in scripts/compare-javacg.mjs:73 (1 transitive callers)
  • parseMethodSpec in scripts/compare-javacg.mjs:101 (2 transitive callers)
  • toSourceName in scripts/compare-javacg.mjs:116 (2 transitive callers)
  • toTargetName in scripts/compare-javacg.mjs:121 (2 transitive callers)
  • loadGroundTruth in scripts/compare-javacg.mjs:127 (1 transitive callers)
  • runJavacg in scripts/compare-javacg.mjs:142 (1 transitive callers)
  • parseJavacgOutput in scripts/compare-javacg.mjs:171 (1 transitive callers)
  • computeMetrics in scripts/compare-javacg.mjs:208 (1 transitive callers)

@carlos-alm carlos-alm force-pushed the research/bench-javacg-static-1307 branch from 7e3a5ac to b12b579 Compare June 5, 2026 22:23
@carlos-alm
Copy link
Copy Markdown
Contributor Author

Addressed the feedback from this review:

  • buildClassFileMap regex broadened — now handles final, sealed, non-sealed, protected, private, strictfp modifiers and enum/record type keywords. Added a per-file warning and a final size assertion so any unresolved class causes an explicit warning rather than silent result skew.
  • Inner class note added — JSDoc now states "Maps the first top-level type per file — inner classes are not indexed."
  • Call-type E removed — the undocumented character was removed from the lineRe character class. The comment now lists all five documented call types inline: C=virtual, S=static, O=special, I=interface, D=dynamic.
  • Commit type fixed — the original commit used the non-standard research type; the commit has been updated to chore(bench) to comply with the allowed type list.

@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm
Copy link
Copy Markdown
Contributor Author

Superseded by #1332 — branch renamed from research/bench-javacg-static-1307 to chore/bench-javacg-static-1307 to satisfy branch name validation.

@carlos-alm
Copy link
Copy Markdown
Contributor Author

Superseded by #1332 — branch renamed to chore/bench-javacg-static-1307 to fix CI branch name validation.

@carlos-alm carlos-alm closed this Jun 5, 2026
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 5, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

research(bench): add javac compilation step to Java fixture for javacg-static comparison

1 participant