chore(java): bump JDK 21 → 25 + ZGC investigation findings #145
bench-cross-runtime.sh's java leg previously hard-wired the launch command, so any JVM tuning experiment (ZGC vs G1, different heap sizes, preview-flag toggles) required hand-editing the script per run.

Add a JAVA_BENCH_OPTS env var that is word-split into the java launch between `java` and `-cp`. Default empty = no flags = current behavior preserved.

Usage: JAVA_BENCH_OPTS="-XX:+UseZGC" scripts/bench-cross-runtime.sh ...

Word-splitting on env input is intentional (multiple flags); a shellcheck disable comment documents the choice. The other two runtimes (go, cpp) are unaffected: they already use binary CLI flags for tuning.
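A minimal sketch of the word-splitting hook described above. Only `JAVA_BENCH_OPTS` and its position between `java` and `-cp` come from the commit message; the function name `build_java_cmd` and its classpath and main-class arguments are hypothetical, and the real script may be structured differently.

```bash
# Hypothetical helper: prints the argv of the java launch, one word per line.
# Only JAVA_BENCH_OPTS and its position (between `java` and `-cp`) come from
# the commit; the function name and arguments are illustrative.
build_java_cmd() {
  local classpath="$1" main_class="$2"
  # Unquoted on purpose: the variable may carry several flags, so the shell
  # must split it into separate words. Empty or unset expands to nothing.
  # shellcheck disable=SC2086
  set -- java ${JAVA_BENCH_OPTS:-} -cp "$classpath" "$main_class"
  printf '%s\n' "$@"
}
```

With `JAVA_BENCH_OPTS="-XX:+UseZGC -Xmx4g"`, the two flags arrive as two separate arguments. One trade-off of unquoted expansion is that the shell also applies pathname expansion, so a flag value containing `*` or `?` could be globbed.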
We have no historical dependency tying us to JDK 21:
- pine-java is greenfield 2026 code (pine-java-full-implementation reflection) with no `sun.misc.*`, no module overrides, and no reflection trickery against deprecated VM internals.
- The benchmark host already runs OpenJDK 25.0.2, so the v0.10.9 README bench numbers were already measured on a 25 runtime, with only `<source>/<target>` and CI setup-java pinned to 21. Bumping closes that stale gap.

Risk audited and cleared:
- LuaJ 3.0.1 (2014, whose BCEL-backed luajc compiler emits Lua → JVM bytecode and underpins the v0.10 perf baseline) was the obvious worry: would it emit verifier-clean bytecode under target 25? Built and ran the full 246-test suite under target 25: all pass, including TransformByLuaCompilerBackendTest's luajc/luac equivalence cases.
- BCEL 6.10.0 has no regression noted for JDK 25 in its release notes, and the suite confirms it.

10 setup-java references swept (ci.yml × 6, daily-sanitized-fuzz, nightly-benchmark, nightly-diff-fuzz, release). Both READMEs say `Java 25+`.

The language layer gives us no immediate win: we don't use any 22-25 language features yet (Scoped Values, final in 25, and Stable Values, still in preview in 25, are opportunities, not requirements). The win is closing the stale stack-vs-runtime gap and unlocking 22-25 features when we want them.

LTS-to-LTS jump: 21 LTS → 25 LTS. JDK 25 reached GA in 2025-09.
…bration

Three stable docs land + one stale assumption corrected:

1. `memory/decisions/pine-java-gc-choice.md` (NEW, 76 lines): the pine-java GC choice decision. Under the current deployment shape (4G heap / 2C cgroup / throughput-bound / single-request 100+ ms), G1 (the JVM default) is the right choice; ZGC is a net loss of 5 to 7 % because its concurrent-phase CPU steal (1087 ms in an 80 s bench window, ~6.7 % of 2-core wall) is not paid back when G1 already has 0 pauses >50 ms and a max of 12.82 ms. Records the 2026-06-26 A-stage experiment numbers and the GC-log evidence verbatim so future challengers don't have to re-run. Lists the four re-open triggers (heap ≥16 G / cores ≥8 / P99 ≤10 ms SLO / G1 STW max >50 ms in prod) and the layered relationship with `perf-evolution-roadmap.md` (engine-side perf vs JVM process parameters).
2. `memory/reflections/jdk25-upgrade-and-zgc-investigation.md` (NEW, 118 lines): the task-level reflection. Captures the "performance hypothesis must be measured, not guessed" lesson (my initial recommendation was driven by an unverified "ZGC tightens stddev" prior that turned out wrong: G1 max STW pause 12.82 ms ≪ calibrated stddev 35 ms, so GC was never the stddev source). Captures the LuaJ 3.0.1 + BCEL 6.10.0 risk falsification methodology (246 tests passed under target 25, including the luajc/luac equivalence suite) and the compile-target ≠ runtime-version gotcha (the host already ran OpenJDK 25.0.2 while the pom said target 21, so the v0.10.9 README bench numbers were already 25-runtime measurements).
3. `guides/benchmark-hygiene.md` (+10 lines): new "stddev source calibration" subsection between the same-host comparison discipline and the fixture representativeness section. Names the three real stddev drivers (DAG multi-op scheduling jitter / LuaJ JIT warmup tail / HTTP keepalive jitter), explicitly excludes GC pause via the measured G1 max 12.82 ms data point, and warns that GC tuning to tighten stddev is a wrong-direction fix unless GC logs prove otherwise.
4. `index.md`: three line updates: new reflection entry, new decision entry, and the bench-hygiene description gains the stddev calibration topic.

Out of scope (deliberately untouched): `perf-evolution-roadmap.md` (cross-reference left for a future PR if the layering needs to be made explicit), `must/conventions.md` (the JVM toolchain check "run java -version before pom-target audits" deserves a separate promotion pass), `architecture/dag-engine.md:710` (still says "Java 21+", but the README is the user-facing source of truth and was bumped in `62475e27`).
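The four re-open triggers listed in item 1 above can be written down as a small check. This is only a sketch: the function name `gc_reopen_triggers`, its argument order, and the output labels are hypothetical; the thresholds are the ones quoted in the commit message.

```bash
# Hypothetical check encoding the four re-open triggers for the G1 vs ZGC decision.
# Arguments: heap size in GiB, core count, P99 SLO in ms, observed G1 max STW in ms.
# Prints each trigger that fires, or "keep-g1" when none does.
gc_reopen_triggers() {
  awk -v heap="$1" -v cores="$2" -v slo="$3" -v stw="$4" 'BEGIN {
    n = 0
    if (heap  >= 16) { print "heap>=16G";       n++ }
    if (cores >= 8)  { print "cores>=8";        n++ }
    if (slo   <= 10) { print "p99-slo<=10ms";   n++ }
    if (stw   >  50) { print "g1-stw-max>50ms"; n++ }
    if (n == 0) print "keep-g1"
  }'
}
```

For a 4G / 2C deployment with a G1 max STW of 12.82 ms and an assumed (hypothetical) 200 ms P99 SLO, `gc_reopen_triggers 4 2 200 12.82` reports that no trigger fires.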
🔍 PR Review
Clean JDK 21→25 upgrade PR: compile-target alignment + full CI sync +
Verified items:
Two comment-only corrections inside scripts/bench-cross-runtime.sh caught by the agentic PR reviewer; no functional change.

1. The header `Prerequisites` block still listed `Java 21`. The same PR bumped pom + CI + READMEs to 25, so this file-internal comment was the last stale "Java 21" reference. Updated to `Java 25`.
2. The new `JAVA_BENCH_OPTS` example showed `-XX:+UseZGC -XX:+ZGenerational`. Generational ZGC has been the default since JDK 23, and the non-generational mode was removed in JDK 24, so on JDK 25 the `-XX:+ZGenerational` flag is obsolete: passing it emits a warning on every server startup. Reduced the example to just `-XX:+UseZGC` and noted that generational mode is now the default.
🔍 PR Incremental Review
The incremental change is exactly commit
Verified items:
No new issues in this increment. The full PR (JDK 21→25 compile-target alignment + full CI sync +
Summary
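Evaluated the pine-java JVM upgrade along paths A/B/C in order. B (aligning the compile target to JDK 25) is what landed; A (switching to ZGC) was falsified by measurement, and its conclusion is archived in llmdoc.
3 single-domain commits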
c157242a: JAVA_BENCH_OPTS env hook (infrastructure)
`scripts/bench-cross-runtime.sh` gains a `JAVA_BENCH_OPTS` env var that passes JVM flags through, so GC / heap / preview-flag experiments become repeatable. Usage:
```bash
JAVA_BENCH_OPTS="-XX:+UseZGC" scripts/bench-cross-runtime.sh ...
```
Default empty = no flags = existing behavior unchanged.
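62475e27: JDK 21 → 25 upgrade (pom + CI + README)
- `pine-java/pom.xml`: `<source>/<target>` 21 → 25
- `setup-java` 21 → 25 (ci.yml × 6 + daily-sanitized-fuzz + nightly-benchmark + nightly-diff-fuzz + release)

LuaJ risk audit, falsified: before the upgrade we suspected that bytecode emitted by LuaJ 3.0.1 (2014-era code) + BCEL 6.10.0 would blow up under JDK 25's stricter verifier. Measured: 246 tests / 0 failures (including `TransformByLuaCompilerBackendTest`'s luajc/luac dual-backend equivalence tests).

Stale gap fixed: the benchmark host was already on an OpenJDK 25.0.2 runtime while the pom was still at target 21. This means the v0.10.9 README bench numbers had in fact already been produced on a 25 runtime; upgrade B only brings the compile target into line.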
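The compile-target vs runtime gap above is easy to check mechanically. A sketch, assuming the version banner has the usual `version "..."` form printed by `java -version`; both function names are hypothetical and this is not part of the PR.

```bash
# Extract the major version from `java -version` output.
# Handles both modern ("25.0.2") and legacy ("1.8.0_292") version strings.
java_major_from_banner() {
  local v
  v="$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)"
  case "$v" in
    1.*) v="${v#1.}" ;;   # legacy scheme: 1.8.x means Java 8
  esac
  # Keep everything before the first '.', '_' or '-'
  printf '%s\n' "${v%%[._-]*}"
}

# Hypothetical audit: compare a pom <target> value with the runtime major version.
check_target_matches_runtime() {
  local target="$1" runtime
  runtime="$(java_major_from_banner "$2")"
  if [ "$target" = "$runtime" ]; then
    echo "ok: target $target matches runtime $runtime"
  else
    echo "gap: target $target, runtime $runtime"
  fi
}
```

Typical use would be `check_target_matches_runtime 21 "$(java -version 2>&1)"`; the redirect is needed because `java -version` writes its banner to stderr.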
24612e19: llmdoc capture
3 docs + 1 index update:
- [NEW] `llmdoc/memory/decisions/pine-java-gc-choice.md` (76 lines): G1 vs ZGC decision, experiment data, re-open triggers
- [NEW] `llmdoc/memory/reflections/jdk25-upgrade-and-zgc-investigation.md` (118 lines): task-level retrospective
- [UPDATE] `llmdoc/guides/benchmark-hygiene.md` (+10 lines): new "stddev source calibration" subsection
- [UPDATE] `llmdoc/index.md`: entry points synced

A-stage experiment: ZGC vs G1 (2026-06-26)
Result: under pine-java's calibrated configuration, ZGC is a net loss of 5 to 7% QPS, so the default is not switched.
| Fixture | G1 baseline (QPS) | ZGC (QPS) | Δ |
|---|---:|---:|---:|
| calibrated_2c4g | 127.9 | 120.9 | −5.5% |
| calibrated | 128.1 | 119.0 | −7.1% |
| calibrated_itemlua | 126.5 | 119.5 | −5.5% |
stddev stays at 33-36 ms on both sides (directly falsifying the "ZGC tightens stddev" hypothesis).
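The Δ column in the table above is the relative change of the ZGC figure against the G1 baseline. A small helper to reproduce it from the two measured values; the function name `pct_delta` is hypothetical.

```bash
# Relative change of a candidate value against a baseline, in percent,
# rounded to one decimal place.
pct_delta() {
  awk -v base="$1" -v cand="$2" 'BEGIN { printf "%.1f\n", (cand - base) / base * 100 }'
}

pct_delta 127.9 120.9   # calibrated_2c4g
pct_delta 128.1 119.0   # calibrated
pct_delta 126.5 119.5   # calibrated_itemlua
```

The three calls print -5.5, -7.1 and -5.5, matching the table rows.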
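Root cause measured from the GC logs
Root cause: under a 2C cgroup, ZGC's 1087 ms of concurrent work amounts to ≈ 6.7% CPU steal, which matches the measured 5 to 7% QPS loss. The dominant sources of the 35 ms calibrated stddev are DAG scheduling jitter + LuaJ JIT warmup + network jitter, not GC (G1 max STW 12.82 ms ≪ stddev 35 ms).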
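A figure like the G1 max STW pause can be pulled out of a unified-logging GC log with a few lines of awk. This sketch assumes the log was written with `-Xlog:gc`, where pause lines contain the word `Pause` and end with a duration such as `12.820ms`; whether the archived logs use exactly this format is an assumption, and the function name `gc_pause_summary` is hypothetical.

```bash
# Summarise stop-the-world pauses from a unified-logging GC log on stdin.
# Assumes -Xlog:gc style lines that end with a duration like "3.456ms".
# Prints the maximum pause and the number of pauses above 50 ms.
gc_pause_summary() {
  awk '/ Pause / {
         v = $NF; sub(/ms$/, "", v); v += 0   # last field, "ms" stripped
         if (v > max) max = v
         if (v > 50) over++
       }
       END { printf "max=%.2f over50=%d\n", max, over }'
}
```

Usage would look like `gc_pause_summary < /tmp/gc-investigation/g1-gc.log`. Concurrent-phase lines are skipped because they do not contain ` Pause `, so the summary covers stop-the-world time only.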
Current deployment shape vs scenarios where ZGC wins
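Exactly the opposite. The triggers for re-opening the decision are recorded in `decisions/pine-java-gc-choice.md`.
Core lessons (captured in the reflection)
… `java -version` to confirm the actual runtime)
Test plan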
- `mvn test`: all 246 tests pass
Data archive
- `bench-results/report-20260625-113855.txt` (G1) / `report-20260625-114324.txt` (ZGC)
- `/tmp/gc-investigation/{g1,zgc}-gc.log` (kept locally only; the key numbers are excerpted in the reflection)