From c157242a83a776b338f0873e863ae1a0c379d97f Mon Sep 17 00:00:00 2001 From: Liam Date: Fri, 26 Jun 2026 08:40:38 +0800 Subject: [PATCH 1/4] chore(scripts): add JAVA_BENCH_OPTS env hook for JVM flag experiments MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit bench-cross-runtime.sh's java leg previously hard-wired the launch command, so any JVM tuning experiment (ZGC vs G1, different heap sizes, preview-flag toggles) required hand-editing the script per run. Add a JAVA_BENCH_OPTS env var that word-splits into the java launch between `java` and `-cp`. Default empty = no flags = current behavior preserved. Usage: JAVA_BENCH_OPTS="-XX:+UseZGC" scripts/bench-cross-runtime.sh ... Word-splitting on env input is intentional (multiple flags); shellcheck disable comment documents the choice. The other two runtimes (go, cpp) are unaffected — they already use binary CLI flags for tuning. --- scripts/bench-cross-runtime.sh | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/scripts/bench-cross-runtime.sh b/scripts/bench-cross-runtime.sh index 5365aa73..f586cf8c 100755 --- a/scripts/bench-cross-runtime.sh +++ b/scripts/bench-cross-runtime.sh @@ -172,8 +172,16 @@ start_server() { # Set BENCH_VERBOSE=1 to capture server logs for debugging startup failures [[ "${BENCH_VERBOSE:-}" == "1" ]] && sink="$WORK_DIR/${runtime}.log" local -a cmd=() + # JAVA_BENCH_OPTS lets the caller inject JVM flags (e.g. `-XX:+UseZGC + # -XX:+ZGenerational`) for the java leg without touching the script. + # Word-split via $JAVA_BENCH_OPTS expansion; empty default = no flags. 
+ local -a java_opts=() + if [[ -n "${JAVA_BENCH_OPTS:-}" ]]; then + # shellcheck disable=SC2206 # intentional word-splitting for env-supplied flags + java_opts=(${JAVA_BENCH_OPTS}) + fi case "$runtime" in - java) cmd=(java -cp "$JAVA_CP" -Dpine.bench=true -Dpine.config="$config" -Dpine.port="$port" + java) cmd=(java "${java_opts[@]}" -cp "$JAVA_CP" -Dpine.bench=true -Dpine.config="$config" -Dpine.port="$port" page.liam.pine.PineServer) ;; go) cmd=("$WORK_DIR/server-go" -config "$config" -addr ":$port") ;; cpp) if [[ -n "${CPP_LD_PRELOAD:-}" ]]; then From 62475e27cef05a529b0aeb119fcafe1ce3e293f5 Mon Sep 17 00:00:00 2001 From: Liam Date: Fri, 26 Jun 2026 08:40:57 +0800 Subject: [PATCH 2/4] =?UTF-8?q?chore(java):=20bump=20target=20/=20CI=20/?= =?UTF-8?q?=20docs=20from=20JDK=2021=20=E2=86=92=20JDK=2025=20(LTS=20?= =?UTF-8?q?=E2=86=92=20LTS)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit We have no historical dependency tying us to JDK 21: - pine-java is greenfield 2026 code (pine-java-full-implementation reflection) with no `sun.misc.*`, no module overrides, no reflection trickery against deprecated VM internals. - The benchmark host already runs OpenJDK 25.0.2 — so the v0.10.9 README bench numbers were already measured on 25 at runtime, with only `/` and CI setup-java pinned to 21. Bumping closes that stale gap. Risk audited and cleared: - LuaJ 3.0.1 (2014, the BCEL-backed luajc compiler that emits Lua → JVM bytecode and underpins the v0.10 perf baseline) was the obvious worry: would it emit verifier-clean bytecode under target 25? Built + ran the full 246-test suite under target 25 — all pass, including TransformByLuaCompilerBackendTest's luajc/luac equivalence cases. - BCEL 6.10.0 has no JDK 25 release notes regression; the suite proves it. 10 setup-java references swept (ci.yml × 6, daily-sanitized-fuzz, nightly-benchmark, nightly-diff-fuzz, release). Both READMEs say `Java 25+`. 
Language layer gives us no immediate win: we don't use any 22-25 language features yet (Scoped Values final in 25, Stable Values in preview in 25, etc. are opportunities, not requirements). The win is closing the stale stack-vs-runtime gap and unlocking 22-25 features when we want them. LTS-to-LTS jump: 21 LTS → 25 LTS. JDK 25 GA'd 2025-09. --- .github/workflows/ci.yml | 12 ++++++------ .github/workflows/daily-sanitized-fuzz.yml | 2 +- .github/workflows/nightly-benchmark.yml | 2 +- .github/workflows/nightly-diff-fuzz.yml | 2 +- .github/workflows/release.yml | 2 +- README-en.md | 2 +- README.md | 2 +- pine-java/pom.xml | 4 ++-- 8 files changed, 14 insertions(+), 14 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index b427c541..6411a2ce 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -88,7 +88,7 @@ jobs: - uses: actions/setup-java@v5 with: distribution: temurin - java-version: "21" + java-version: "25" cache: maven - run: mvn checkstyle:check -B @@ -157,7 +157,7 @@ jobs: - uses: actions/setup-java@v5 with: distribution: temurin - java-version: "21" + java-version: "25" cache: maven - run: mvn test -B -q - name: Upload Java coverage @@ -323,7 +323,7 @@ jobs: - uses: actions/setup-java@v5 with: distribution: temurin - java-version: "21" + java-version: "25" - name: Run benchmarks run: mvn test -B -Dtest=BenchmarkTest -pl . 
2>&1 | tee benchmark-java.txt - name: Write benchmark summary @@ -349,7 +349,7 @@ jobs: - uses: actions/setup-java@v5 with: distribution: temurin - java-version: "21" + java-version: "25" cache: maven - name: Fuzz Config.load run: mvn test -B -Dtest="JazzerFuzzTest#fuzzConfigLoad" -Djazzer.instrument="page.liam.pine.**" @@ -376,7 +376,7 @@ jobs: - uses: actions/setup-java@v5 with: distribution: temurin - java-version: "21" + java-version: "25" cache: maven - uses: actions/setup-python@v6 with: @@ -513,7 +513,7 @@ jobs: - uses: actions/setup-java@v5 with: distribution: temurin - java-version: "21" + java-version: "25" cache: maven - uses: actions/setup-python@v6 with: diff --git a/.github/workflows/daily-sanitized-fuzz.yml b/.github/workflows/daily-sanitized-fuzz.yml index f48af165..ea4b0cff 100644 --- a/.github/workflows/daily-sanitized-fuzz.yml +++ b/.github/workflows/daily-sanitized-fuzz.yml @@ -73,7 +73,7 @@ jobs: - uses: actions/setup-java@v5 with: distribution: temurin - java-version: "21" + java-version: "25" cache: maven - name: Install C++ build deps diff --git a/.github/workflows/nightly-benchmark.yml b/.github/workflows/nightly-benchmark.yml index 9a3c0441..e2352b27 100644 --- a/.github/workflows/nightly-benchmark.yml +++ b/.github/workflows/nightly-benchmark.yml @@ -39,7 +39,7 @@ jobs: - uses: actions/setup-java@v5 with: distribution: temurin - java-version: "21" + java-version: "25" cache: maven - name: Install system deps diff --git a/.github/workflows/nightly-diff-fuzz.yml b/.github/workflows/nightly-diff-fuzz.yml index 66cd9807..0c181f37 100644 --- a/.github/workflows/nightly-diff-fuzz.yml +++ b/.github/workflows/nightly-diff-fuzz.yml @@ -44,7 +44,7 @@ jobs: - uses: actions/setup-java@v5 with: distribution: temurin - java-version: "21" + java-version: "25" cache: maven - name: Install C++ build deps diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml index d1c808c5..cf8c97a3 100644 --- a/.github/workflows/release.yml +++ 
b/.github/workflows/release.yml @@ -62,7 +62,7 @@ jobs: - uses: actions/setup-java@v5 with: distribution: temurin - java-version: "21" + java-version: "25" cache: maven server-id: central server-username: CENTRAL_USERNAME diff --git a/README-en.md b/README-en.md index 39933533..6528b8ac 100644 --- a/README-en.md +++ b/README-en.md @@ -52,7 +52,7 @@ Python DSL (Apple) ──compile──> JSON Config ### Prerequisites - Go 1.26+ (Pine-Go) -- Java 21+ (Pine-Java) +- Java 25+ (Pine-Java) - Python 3.11+ (Apple DSL) ### 1. Write a Pipeline diff --git a/README.md b/README.md index 8194d751..c2b4a05c 100644 --- a/README.md +++ b/README.md @@ -52,7 +52,7 @@ Python DSL (Apple) ──compile──> JSON Config ### 环境要求 - Go 1.26+(Pine-Go) -- Java 21+(Pine-Java) +- Java 25+(Pine-Java) - Python 3.11+(Apple DSL) - CMake 3.20+ / C++23 编译器 / LuaJIT(Pine-C++,可选) diff --git a/pine-java/pom.xml b/pine-java/pom.xml index 92c8117c..5bc4566b 100644 --- a/pine-java/pom.xml +++ b/pine-java/pom.xml @@ -34,8 +34,8 @@ - 21 - 21 + 25 + 25 UTF-8 5.11.4 2.17.0 From 24612e199198692d616be0d6b5062396b6795d40 Mon Sep 17 00:00:00 2001 From: Liam Date: Fri, 26 Jun 2026 08:55:21 +0800 Subject: [PATCH 3/4] docs(llmdoc): record JDK 25 upgrade + ZGC investigation + stddev calibration MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Three stable docs land + one stale assumption corrected: 1. `memory/decisions/pine-java-gc-choice.md` (NEW, 76 lines) — pine-java GC choice decision. Under the current deployment shape (4G heap / 2C cgroup / throughput-bound / single-request 100+ ms) G1 (the JDK 21+ default) is the right choice; ZGC is a net -5 to -7 % loss because its concurrent-phase CPU steal (1087 ms in an 80 s bench window, ~6.7 % of 2-core wall) is not paid back when G1 already has 0 pauses >50 ms and a max of 12.82 ms. Records the 2026-06-26 A-stage experiment numbers and the GC-log evidence verbatim so future challengers don't have to re-run. 
Lists the four re-open triggers (heap ≥16 G / cores ≥8 / P99 ≤10 ms SLO / G1 STW max >50 ms in prod) and the layered relationship with `perf-evolution-roadmap.md` (engine-side perf vs JVM process parameters). 2. `memory/reflections/jdk25-upgrade-and-zgc-investigation.md` (NEW, 118 lines) — the task-level reflection. Captures the "performance hypothesis must be measured, not guessed" lesson (my initial recommendation was driven by an unverified "ZGC tightens stddev" prior that turned out wrong: G1 max STW pause 12.82 ms ≪ calibrated stddev 35 ms, so GC was never the stddev source). Captures the LuaJ 3.0.1 + BCEL 6.10.0 risk falsification methodology (246 tests passed under target 25 including the luajc/luac equivalence suite), and the compile-target ≠ runtime-version gotcha (the host already ran OpenJDK 25.0.2 while pom said target 21, so v0.10.9 README bench numbers were already 25-runtime measurements). 3. `guides/benchmark-hygiene.md` (+10 lines) — new "stddev 来源 校准" subsection between the same-host comparison discipline and the fixture representativeness section. Names the three real stddev drivers (DAG multi-op scheduling jitter / LuaJ JIT warmup tail / HTTP keepalive jitter), explicitly excludes GC pause via the measured G1 max 12.82 ms data point, and warns that GC-tuning to tighten stddev is a wrong-direction fix unless GC logs prove otherwise. 4. `index.md` — three line updates: new reflection entry, new decision entry, and the bench-hygiene description gains the stddev calibration topic. Out of scope (deliberately untouched): `perf-evolution-roadmap.md` (cross-reference left for a future PR if the layering needs to be made explicit), `must/conventions.md` (the JVM toolchain check "run java -version before pom-target audits" deserves a separate promotion pass), `architecture/dag-engine.md:710` (still says "Java 21+" but README is the user-facing source-of-truth and was bumped in `62475e27`). 
--- llmdoc/guides/benchmark-hygiene.md | 10 ++ llmdoc/index.md | 4 +- .../memory/decisions/pine-java-gc-choice.md | 76 +++++++++++ .../jdk25-upgrade-and-zgc-investigation.md | 118 ++++++++++++++++++ 4 files changed, 207 insertions(+), 1 deletion(-) create mode 100644 llmdoc/memory/decisions/pine-java-gc-choice.md create mode 100644 llmdoc/memory/reflections/jdk25-upgrade-and-zgc-investigation.md diff --git a/llmdoc/guides/benchmark-hygiene.md b/llmdoc/guides/benchmark-hygiene.md index 04b49975..248b7f43 100644 --- a/llmdoc/guides/benchmark-hygiene.md +++ b/llmdoc/guides/benchmark-hygiene.md @@ -29,6 +29,16 @@ - fresh build 之间存在 **±5-7% 的二进制布局噪声**(函数地址/对齐漂移),小于该幅度的 QPS 差异不可下结论 - 落在噪声带内的差异需 `perf stat` 微架构指标交叉验证:instructions / IPC / L1-icache-miss / branch-miss / context-switches。两个 build 微架构指标持平,即可判定"统计无差异" +## stddev 来源校准 + +- calibrated fixtures 的 stddev 33–36 ms **不是噪声来源,是负载的固有抖动**——切 GC / 调 JVM flag / 改 JIT 后端都不会收紧 +- 三大主导源(按贡献排序): + 1. **DAG 多 op 并发调度抖动**:calibrated_itemlua 38-op DAG 内 ready-queue 调度顺序非确定,单请求耗时浮动 ~20 ms + 2. **LuaJ JIT warmup 在前 N 个请求的非均匀分布**:luajc 编译触发点漂移,前段请求耗时尾部偏长 + 3. 
**HTTP keepalive / server 调度抖动**:网络栈与 server 端 work-stealing 微秒级浮动放大到毫秒 +- **GC pause 不在主要源中**:pine-java G1 实测 max STW pause **12.82 ms** 远低于 calibrated stddev 35 ms,整体 STW budget 远小于 stddev × bench 窗口。详见 `llmdoc/memory/decisions/pine-java-gc-choice.md` "ZGC 实测数据" 段的 GC log 实测 +- **推论**:诊断 stddev 之前先采 GC log 验明出处。试图通过 GC tuning 收紧 stddev 是错误方向(除非 GC log 数据反证 STW 主导);"stddev 高 → 怀疑 GC → 切 GC" 这条错误链路已被 2026-06-26 ZGC A 实验证伪一次,不要复制 + ## Fixture 代表性 - `fixtures/benchmarks/realistic_for_you_calibrated*` 是生产 proxy(按真实流量 calibrate,N≈10 行),是**性能决策的唯一裁判** diff --git a/llmdoc/index.md b/llmdoc/index.md index 7e773149..403b10a7 100644 --- a/llmdoc/index.md +++ b/llmdoc/index.md @@ -22,7 +22,7 @@ - `llmdoc/guides/ci-quality-baseline.md` — CI 工程质量基线:lint(含 Java checkstyle `failOnViolation=true` + `OneStatementPerLine`、C++ clang-format)/ test / coverage / fuzz / differential-fuzz / cross-validate / nightly cross-runtime benchmark / release-gate 架构与接入约定(含 pine-cpp 的 4 个 CI job 与 cross-validate cpp 二进制注入路径),统一任务入口 Makefile 体系(顶层 + `pine-go/` Makefile 封装跨四语言 fmt/lint/test/bench/codegen/版本管理,CI 与本地共用同一命令序列、`make bench` 默认 `pine_bench` tag),以及本地 `.githooks/` 体系(`pre-commit` staged-only 格式 gate + `pre-push` 工程级 lint + 自包装 CI watch)。 - `llmdoc/guides/investigation-to-fix-testing.md` — 从调查到修复的测试策略:按缺陷类型选择测试层、最小修复面原则、跟进上游 issue 与临时止血方法论(跨 issue 根因归属不顺 follow-up 措辞、临时止血阈值用 probe 实测标定)。 - `llmdoc/guides/cross-layer-validation.md` — 跨层语义校验:JSON 边界类型枚举、codegen 语义验证(含跨引擎 markdown / Python 产物 byte-equal gate)、边界值 E2E、隐含 metadata 契约检测、扩展点对等验证(能力等价)。 -- `llmdoc/guides/benchmark-hygiene.md` — Benchmark 噪声卫生:跑前/跑后 load 与残留进程检查、同日同机对照纪律、±5-7% 二进制布局噪声与 perf stat 交叉验证、fixture 代表性(calibrated 为性能决策唯一裁判)、microbench 访问模式戒律、逐 op 删除归因法、测量路径对称性(PureVM vs CallOnly vs Boundary 不可互推)。 +- `llmdoc/guides/benchmark-hygiene.md` — Benchmark 噪声卫生:跑前/跑后 load 与残留进程检查、同日同机对照纪律、±5-7% 二进制布局噪声与 perf stat 交叉验证、calibrated stddev 33-36ms 来源校准(DAG 调度抖动 + LuaJ JIT warmup + 网络抖动主导,GC pause 非主要源)、fixture 代表性(calibrated 
为性能决策唯一裁判)、microbench 访问模式戒律、逐 op 删除归因法、测量路径对称性(PureVM vs CallOnly vs Boundary 不可互推)。 ## reference/ @@ -107,7 +107,9 @@ - `llmdoc/memory/reflections/wangshu-borrow-optimization-survey.md` — wangshu borrow/边界优化空间调查复盘(纯调查+原型量化,未改生产代码),记录 borrow 层刻意对称是设计、Arena 列轨 ABI 边界口径 -22%(N=100)~-46%(N=3000) 但端到端会稀释+落地破四引擎 parity、makeArrayTable SetIndex 顺序 append O(N²) 建表(已提 wangshu #10)、Boundary 微基准≠calibrated 裁判的口径纪律、非单调性是红旗须先查根因。 - `llmdoc/memory/reflections/wangshu-v020rc3-upgrade-and-workaround-refactor.md` — wangshu v0.2.0-rc3 升级与两 workaround 重构 / 拆除 / 判据迁移复盘(wangshu 内存系列第四篇):上游一次回应 #9/#10/#11 三 issue(#9 真等价解 `MaybeCollectNow` 等三选一 API;#10 真根因解为 arena LARGE freelist 单链 first-fit→power-of-2 buckets,纠正前篇"rehash 风暴"错误推断;#11 partial:`Arena.Compact()` 解 transient peak、bump 不回退、sustained-fat latch 留作 follow-up),下游 cadence-sweep 真拆 / drop-fat-state 判据迁移(`GCCountKB`→`ArenaCapKB`)/ `makeArrayTable` 切 `NewArrayTable`,沉淀两条 rc 升级方法论(必读 issue close comments、workaround 拆除分 root-cause/proxy 判别);本篇覆盖纠偏第三篇 reflection 与稳定文档多处。 - `llmdoc/memory/reflections/redis-cascade-safety-and-observability.md` — Redis cascade-safety 五参数(`{dial,read,write,pool}_timeout_ms` + `pool_size`)三引擎对齐 + pine-cpp Client 失败收敛与 SIGPIPE 守卫 + codegen markdown 跨引擎 byte-equal + per-command Redis 指标(`pine_redis_command_*` 4-state status)的 PR 复盘(13 commits / 3 轮 review),记录 codegen 单向对齐方向(Go 是 source of truth)、failed-path 静默降级审计契约、单元测试不能替代 byte-equal gate、review-driven scope expansion 接受、cpp 错误类型分层 known follow-up 五条教训。 +- `llmdoc/memory/reflections/jdk25-upgrade-and-zgc-investigation.md` — 2026-06-26 评估 pine-java 升 JDK 25 + 切 ZGC 的 A/B/C 路径复盘,记录"性能假设要测不要猜"与 deployment-shape vs GC-shape 匹配检查的教训,含 G1/ZGC GC log 实测数据点(G1 max STW 12.82ms 证伪 stddev=GC 假设、ZGC 2C cgroup 下 concurrent 6.7% CPU 偷窃致 −5~7% QPS)。 ## memory/decisions/ - `llmdoc/memory/decisions/perf-evolution-roadmap.md` — 引擎侧性能演进路线决策:两个校准事实(per-item VM 边界主导、VM 层加速被端到端稀释,含 itemlua 第二证据点)、三步路线(typed-ColumnFrame/arena → common-mode 列内核负载迁移(含 2026-06-13 wangshu Arena 列轨 
ABI 边界量化数据点,-22%~-46% 但端到端稀释+破 parity,不立即落地)→ 第三步 VM 适配层可插拔已于 2026-06-13 触发,wangshu 翻默认)、明确不做项(VM 直摸 Go heap、简单脚本负载上的 VM 优化)、翻默认三条 AND 闸门(calibrated 不劣化 + 受影响场景显著胜出 + 双 tag 全绿)、按切换范围分档的语义闸门。 +- `llmdoc/memory/decisions/pine-java-gc-choice.md` — pine-java GC 选型决策:4G 堆 / 2C cgroup / throughput-bound 形态下保持 JDK 21+ 默认 G1,不切 ZGC/Shenandoah/Parallel;含 2026-06-26 ZGC A 实验 QPS 表与 G1/ZGC GC log 实测、根因(G1 已无长 pause 痛点 + ZGC 小核 cgroup 下 concurrent CPU 偷窃倒贴 + stddev 主导源与 GC 无关)、重启触发条件(堆 ≥16G / 核 ≥8C / P99 ≤10ms / G1 STW max >50ms 任一)、与 perf-evolution-roadmap 互补的层次关系。 diff --git a/llmdoc/memory/decisions/pine-java-gc-choice.md b/llmdoc/memory/decisions/pine-java-gc-choice.md new file mode 100644 index 00000000..098ce41a --- /dev/null +++ b/llmdoc/memory/decisions/pine-java-gc-choice.md @@ -0,0 +1,76 @@ +# pine-java GC 选型决策 + +记录 2026-06-26 JDK 25 升级 + ZGC 评估收口后确定的 pine-java GC 选型决策。本文档覆盖"JVM 进程参数 / GC 选型"层,与 `perf-evolution-roadmap.md`(引擎侧 typed-ColumnFrame / common-mode / VM 适配层)互补不冲突。 + +## 决策 + +**保持 JDK 21+ 默认的 G1 GC**,不切 ZGC、不切 Shenandoah、不切 Parallel。仅当后续 deployment 形态发生质变且重测后明确有正向证据时才重启选型。 + +## 当前部署形态 + +- 堆:4 G 上限 +- CPU:2 C cgroup(`pine-bench-server.unit` 隔离单元) +- 单请求延迟:100+ ms(DAG 38-op + per-item Lua + stub I/O) +- 负载形态:**throughput-bound**(QPS 决策),不是 latency-bound +- JVM:OpenJDK 25 runtime(v0.10.10 起 compile target 同步升 25,见 commit `62475e27`) +- GC:JDK 21+ 默认 G1 + +## ZGC 实测数据(2026-06-26) + +A 路径实验:同机串行同 fixture / 10k req × 16 conc,G1 baseline 与 ZGC 各 calibrated × 3 fixture。 + +**QPS** + +| Fixture | G1 QPS | ZGC QPS | Δ | +| --- | --- | --- | --- | +| `calibrated_2c4g` | 127.9 | 120.9 | **−5.5%** | +| `calibrated` | 128.1 | 119.0 | **−7.1%** | +| `calibrated_itemlua` | 126.5 | 119.5 | **−5.5%** | + +stddev 33–36 ms,G1 与 ZGC 无差。 + +**GC log 实测(单 fixture 验证)** + +- G1:729 pauses,avg **3.49 ms**,max **12.82 ms**,**0 次** 超 50 ms。 +- ZGC:753 STW pauses,avg **0.008 ms**,max **0.022 ms**(580× 短于 G1)。 +- ZGC concurrent phase:108 events / 总 **1087 ms** / 34 个 >10 ms(80s 
bench 窗口内)。 +- 2 C cgroup 下 1087 ms concurrent ≈ **6.7% CPU 偷窃**,与 −5~7% QPS 吻合。 + +原始报告:`bench-results/report-20260625-113855.txt`(G1 baseline)、`bench-results/report-20260625-114324.txt`(ZGC)。 + +## 根因 + +1. **G1 已无长 pause 痛点**:max STW 12.82 ms ≪ calibrated stddev 35 ms,整体 STW budget 远小于 stddev。stddev 35 ms **完全不是 GC 来源**,因此切任何 GC 都无法收紧 stddev。 +2. **ZGC trade-off 在小核 cgroup 下输**:ZGC 把 STW 换成 concurrent CPU 工作,2 C cgroup 下 6.7% CPU 被 concurrent phase 偷走,直接体现为 QPS −5~7%。STW 的 580× 收益对 100+ ms 单请求不可见。 +3. **ZGC 适用场景与当前形态反向**:ZGC 优势在 ≥16 G 堆 / ≥8 C 核 / 1–10 ms 单请求 / 延迟敏感 SLO;pine-java 当前 4 G / 2 C / 100+ ms / throughput-bound 完全反向。 +4. **calibrated stddev 35 ms 与 GC 无关**:实测主导源是 DAG 38-op 调度抖动 + LuaJ JIT warmup + 网络抖动,证伪了"ZGC 收紧 stddev"的先验假设。详见 `llmdoc/guides/benchmark-hygiene.md` "stddev 来源校准"。 + +## 重启选型的触发条件 + +以下任一条触发即重做 GC 选型评估: + +- **deployment 形态变更**:堆 ≥16 G **或** 核心数 ≥8 **或** 单请求目标 P99 ≤10 ms(latency-bound SLO 出现) +- **G1 worst-case pause 失控**:生产数据显示 G1 STW max > 50 ms +- **calibrated stddev 主导源转移到 GC**:重测 GC log 验证 STW 总贡献接近 stddev 量级(当前 STW total 远小于 stddev × bench duration 时不触发) + +重启时复用 commit `c157242a` 引入的 `JAVA_BENCH_OPTS` 钩子(脚本侧 JVM flag 注入入口),直接复跑 A 实验对照。 + +## 不该做的实验 + +- `-XX:+UseSerialGC`:单线程必输,无对比价值。 +- Shenandoah:与 ZGC 同类 concurrent trade-off,且 OpenJDK 25 Temurin 不默认 ship,引入额外依赖且预期与 ZGC 同向负优化。 +- Parallel GC:throughput-only、无 concurrent class unloading、与 G1 同型号但更老,无替换收益。 + +## 与 perf-evolution-roadmap 的关系 + +- `llmdoc/memory/decisions/perf-evolution-roadmap.md` 圈定**引擎侧**演进(typed-ColumnFrame / common-mode 列内核 / VM 适配层),其性能假设建立在"运行环境层稳定"之上。 +- 本决策圈定**运行环境层**(JVM 进程参数 / GC 选型),是 roadmap 假设的底座。 +- 两者互补不冲突;任何引擎侧优化的 calibrated 数据均需声明 GC 形态(当前 G1),跨 GC 比较不可直接套用。 + +## 引用 + +- 完整复盘:`llmdoc/memory/reflections/jdk25-upgrade-and-zgc-investigation.md` +- 编译目标升级:commit `62475e27`(pom + CI + README target 21→25) +- JVM flag 实验钩子:commit `c157242a`(`JAVA_BENCH_OPTS` 环境变量) +- 
数据存档:`bench-results/report-20260625-113855.txt`(G1)、`bench-results/report-20260625-114324.txt`(ZGC) +- stddev 来源校准:`llmdoc/guides/benchmark-hygiene.md` diff --git a/llmdoc/memory/reflections/jdk25-upgrade-and-zgc-investigation.md b/llmdoc/memory/reflections/jdk25-upgrade-and-zgc-investigation.md new file mode 100644 index 00000000..207b1be6 --- /dev/null +++ b/llmdoc/memory/reflections/jdk25-upgrade-and-zgc-investigation.md @@ -0,0 +1,118 @@ +--- +name: jdk25-upgrade-and-zgc-investigation +description: 2026-06-26 评估 pine-java 升 JDK 25 + 切 ZGC 的 A/B/C 路径复盘,记录"性能假设要测不要猜"与 deployment-shape vs GC-shape 匹配检查的教训 +type: reflection +--- + +## Task + +评估 pine-java 能否升 JDK 25 + 切 ZGC,按三条路径走: + +- A:切 ZGC(保持 JDK 21) +- B:升 JDK 25 编译目标(pom + CI + README) +- C:A + B 串联 + +预期 A 能拿一些尾延迟收益、B 风险来自 LuaJ 3.0.1 + BCEL verifier 严格化。 + +## Expected vs Actual + +| 路径 | 预期 | 实际 | +| --- | --- | --- | +| A (ZGC) | calibrated stddev 35ms 应该是 G1 STW 来源,ZGC 收紧 stddev、QPS 持平或微升 | calibrated 形态下 net loss **−5.5% / −7.1% / −5.5%** QPS,stddev 无变化,证伪 | +| B (JDK 25) | LuaJ 3.0.1 + BCEL 6.10.0 可能因 25 verifier 严格化而炸 | 246 tests / 0 failures,含 luajc/luac 双后端等价测试,风险证伪 | +| C (A+B) | 取决于 A 的结果 | A 证伪后直接不需要 | + +最终落地:commit `62475e27`(B:bump target 25 + CI + README)+ `c157242a`(脚本 `JAVA_BENCH_OPTS` 环境变量钩子,用于以后 JVM flag 实验)。**ZGC 不切默认**。 + +## 实测数据点(GC log 实跑,可复用) + +A 实验设计:同机串行、各 calibrated × 3 fixture(`calibrated_2c4g` / `calibrated` / `itemlua`)、10k req × 16 conc。 + +**QPS** + +| Fixture | G1 baseline | ZGC | Δ | +| --- | --- | --- | --- | +| calibrated_2c4g | 127.9 | 120.9 | −5.5% | +| calibrated | 128.1 | 119.0 | −7.1% | +| itemlua | 126.5 | 119.5 | −5.5% | + +stddev 33–36ms,G1 与 ZGC 无差。 + +**GC log 实测(2026-06-26)** + +- G1:729 pauses,avg **3.49 ms**,max **12.82 ms**,**0 次** 超 50 ms。 +- ZGC:753 STW pauses,avg **0.008 ms**,max **0.022 ms**(580× 短于 G1)。 +- ZGC concurrent phase:108 events / 总 **1087 ms** / 34 个 >10 ms。 +- 2C cgroup 下 1087 ms concurrent ≈ **6.7% CPU 偷窃**,与 −5~7% QPS 吻合。 + +→ G1 STW 
根本不够大(max 12.82 ms),stddev 35ms 完全不是 GC 来源。ZGC 的 0.022ms STW 收益被 concurrent CPU 偷窃在 2C cgroup 下完全吃回去且倒贴。 + +## What Went Wrong + +### 1. A 路径推荐基于错误先验 + +最初推荐 A 的理由是"calibrated stddev 35ms 应该来自 G1 STW pause、ZGC 应该收紧 stddev"。**没先采 G1 GC log 看 max pause**,直接拿 stddev 倒推 GC 是 hot spot。实测 G1 max 才 12.82ms,整体 STW budget 远小于 stddev,假设链从源头就错了。 + +### 2. ZGC 适用场景未在调研前列检查表 + +ZGC 优势场景(≥16 G 堆 / ≥8 C 核 / 1–10 ms 单请求 / 延迟敏感)与 pine-java 当前形态(4 G 堆 / 2 C cgroup / 单请求 100+ ms / throughput-bound)完全反向。如果调研前先列 ZGC 适用场景 vs 当前 deployment shape,30 秒就能判 "我们这种形态根本不该切 ZGC"。 + +### 3. benchmark host runtime 与 maven target 脱节未先确认 + +机器 PATH 第一个 `java` 已是 OpenJDK 25.0.2,但 pom `` 还停在 21。v0.10.9 README bench 数据其实早已跑在 JDK 25 runtime 上,只是字节码 target 21。B 路径升级实质只是补齐编译目标,**不是 runtime 切换**。开调研前没先 `java -version` 确认实际 runtime 版本,差点把 "runtime 切换" 与 "compile target 切换" 混为一谈。 + +### 4. LuaJ 21→25 风险纯凭直觉判定 + +升级前怀疑 LuaJ 3.0.1 + BCEL 6.10.0 在 25 verifier 严格化下会炸——这是基于历史 BCEL 在跨大版本 JDK 升级时 stackmap frame 兼容问题的直觉,但没先跑一遍 test suite。如果先跑 `mvn test`,几分钟就能证伪,不必把 LuaJ 列为高风险阻塞项。 + +## Root Cause + +### 性能假设必须被实测打过才算事实 + +"stddev 35ms 来自 GC pause" 是个看起来合理的假设,但合理 ≠ 真。GC log 一开就立刻证伪。对一切将影响选型决策的性能假设,**先跑一次最小复现拿数据,再下结论**。这条已在 `bench-lua-vs-go-performance.md` 和 `isolated-bench-and-resource-ops.md` 中以"预估偏差"的形式出现过,这是第三次同类教训。 + +### deployment-shape vs GC-shape 匹配检查缺失 + +JVM tuning 选型应先做"我们的部署形态匹配该 GC 的适用场景吗"检查,这是 GC 选型的零号问题。直接跳到"试一下 ZGC"就跳过了零号问题。 + +### LTS→LTS 升级风险评估不应纯靠直觉 + +LuaJ + BCEL 这类字节码生成依赖在跨大版本 JDK 升级时确实是合理怀疑点,但 **test suite + cross-validate 实测** 是最便宜的证伪手段,应优先于"列为风险阻塞 → 开会议 → 拉清单"。 + +### compile target ≠ runtime + +发版数据基线、bench 报告、README 数字这些"实际跑在哪个 JVM 上",与 pom 的 `` 是两件事。任何 JVM 升级讨论先 `java -version` + `mvn help:effective-pom | grep target`,两条命令把状态钉死。 + +## Missing Docs or Signals + +1. **没有 GC 选型决策档**:pine-java 当前用 G1,但没有任何文档说明为什么用 G1、什么形态下应该重新评估切 ZGC/Shenandoah/Parallel。下次再有人问"能不能切 ZGC",会从零重做这次调研。 +2. 
**`benchmark-hygiene.md` 缺 "stddev 来源校准"段**:calibrated 形态下 stddev 33–36ms 是 **应用层** 来源(IO、调度、Lua 调用栈),不是 GC 来源,但当前 guide 没说清楚。 +3. **没有"compile target vs runtime version 必须分别核对"的明文规范**:未来再有 JDK / Maven / Gradle 升级讨论,这个坑会被踩第二次。 + +## Promotion Candidates + +### 应立即新增到 `decisions/pine-java-gc-choice.md` + +- **结论**:4 G 堆 / 2 C cgroup / throughput-bound 形态下保持 G1(默认)。ZGC 与 Shenandoah 不切。 +- **实验证据**:附本次 A 实验 QPS 表 + G1 / ZGC GC log 实测数据点。 +- **重新评估触发条件**:堆 ≥16 G **或** 核心数 ≥8 **或** 单请求目标 P99 ≤10 ms **或** 出现 G1 STW max >50 ms 的生产证据。任一条触发就重做选型。 +- **JAVA_BENCH_OPTS 实验入口**:commit `c157242a` 给后续 GC flag 实验铺路,未来再起类似讨论直接用此钩子复跑 A 实验。 + +### 应补到 `guides/benchmark-hygiene.md` + +- **calibrated stddev 来源校准**:calibrated 形态下 stddev 33–36 ms 的主导来源是应用层(IO、调度、Lua 调用栈),G1 STW 贡献 <10 ms。诊断 stddev 之前先采 GC log 验明出处,避免"stddev 高 → 怀疑 GC → 切 GC"这条错误链路被复制。 + +### 应补到 `must/conventions.md`(JVM 工具链段) + +- **compile target vs runtime version 必须分别核对**:任何 JDK 升级讨论 / bench 报告,先记录 `java -version`(runtime)与 `mvn help:effective-pom | grep target`(compile target),二者可不同步,不可互推。 + +### 仅保留在 memory + +- 三个 fixture 的 QPS 具体数字(127.9 / 120.9 等):当时机器状态相关,不属稳定契约。 +- LuaJ 3.0.1 + BCEL 6.10.0 在 JDK 25 下 246 tests 通过的具体快照:会随版本 drift。 + +## Follow-up + +1. **本次任务实际已完成的部分**:commit `62475e27`(target 25 + CI + README)+ `c157242a`(JAVA_BENCH_OPTS 钩子),ZGC 不切。 +2. **建议在下一次 llmdoc 更新中执行**:新增 `decisions/pine-java-gc-choice.md`、给 `benchmark-hygiene.md` 补 "stddev 来源校准" 段、给 `conventions.md` 补 "compile target vs runtime version" 一句。 +3. 
**方法论沉淀**:以后任何 "切 X 性能优化" 类提案,强制三件套——(a) X 的适用场景表 vs 当前 deployment shape、(b) 当前形态的 baseline 指标 + 假设的瓶颈来源采证、(c) 最小复现 A/B 数据。三件套缺一项就不进决策。 From 0b48a036b072ed1fdbe33f1debf3c148da3af414 Mon Sep 17 00:00:00 2001 From: Liam Date: Fri, 26 Jun 2026 09:05:52 +0800 Subject: [PATCH 4/4] =?UTF-8?q?fix(scripts):=20bench-cross-runtime=20?= =?UTF-8?q?=E2=80=94=20address=20PR=20#145=20review=20nits?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two comment-only corrections inside scripts/bench-cross-runtime.sh caught by the agentic PR reviewer; no functional change. 1. Header `Prerequisites` block still listed `Java 21`. The same PR bumped pom + CI + READMEs to 25, so the file-internal comment was the last stale "Java 21" reference. Updated to `Java 25`. 2. The new `JAVA_BENCH_OPTS` example showed `-XX:+UseZGC -XX:+ZGenerational`. Generational ZGC has been the default since JDK 23 (JEP 474) and the non-generational mode was removed in JDK 24 (JEP 490), so the `-XX:+ZGenerational` flag is obsolete on JDK 25; passing it emits a warning on every server startup. Reduced the example to just `-XX:+UseZGC` and noted that generational mode is the default. --- scripts/bench-cross-runtime.sh | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/scripts/bench-cross-runtime.sh b/scripts/bench-cross-runtime.sh index f586cf8c..b7b31508 100755 --- a/scripts/bench-cross-runtime.sh +++ b/scripts/bench-cross-runtime.sh @@ -6,7 +6,7 @@ # # Prerequisites: # - hey: go install github.com/rakyll/hey@latest -# - Go, Java 21, cmake + build-essential + libluajit +# - Go, Java 25, cmake + build-essential + libluajit # # Usage: # ./scripts/bench-cross-runtime.sh [--skip go] [--modes "row,column"] @@ -172,9 +172,10 @@ start_server() { # Set BENCH_VERBOSE=1 to capture server logs for debugging startup failures [[ "${BENCH_VERBOSE:-}" == "1" ]] && sink="$WORK_DIR/${runtime}.log" local -a cmd=() - # JAVA_BENCH_OPTS lets the caller inject JVM flags (e.g. 
`-XX:+UseZGC - # -XX:+ZGenerational`) for the java leg without touching the script. - # Word-split via $JAVA_BENCH_OPTS expansion; empty default = no flags. + # JAVA_BENCH_OPTS lets the caller inject JVM flags (e.g. `-XX:+UseZGC` + # for generational ZGC, default since JDK 23) for the java leg without + # touching the script. Word-split via $JAVA_BENCH_OPTS expansion; empty + # default = no flags. local -a java_opts=() if [[ -n "${JAVA_BENCH_OPTS:-}" ]]; then # shellcheck disable=SC2206 # intentional word-splitting for env-supplied flags
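
Reviewer note (not part of the patch): the word-splitting behaviour of `JAVA_BENCH_OPTS` can be sketched in isolation as below. This is a minimal illustration, not the script's actual code: `build_java_cmd`, `demo.jar` and `demo.Main` are hypothetical placeholders, and the empty-array guard is an assumption about what would be needed if the script runs under `set -u` on an older bash.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the java_opts handling from the patch, with a
# placeholder classpath and main class (demo.jar / demo.Main are made up).
set -u

build_java_cmd() {
  local -a java_opts=()
  if [[ -n "${JAVA_BENCH_OPTS:-}" ]]; then
    # Unquoted expansion splits on whitespace (and also globs, so a flag
    # containing * or ? could be expanded against the current directory).
    # shellcheck disable=SC2206
    java_opts=(${JAVA_BENCH_OPTS})
  fi
  # ${arr[@]+"${arr[@]}"} expands to nothing for an empty array, which
  # avoids an "unbound variable" error under `set -u` on bash older than 4.4.
  local -a cmd=(java ${java_opts[@]+"${java_opts[@]}"} -cp "demo.jar" demo.Main)
  # Print one argument per line so the split is visible.
  printf '%s\n' "${cmd[@]}"
}
```

Calling `JAVA_BENCH_OPTS="-XX:+UseZGC -Xmx4g" build_java_cmd` prints each argument on its own line, with the two flags landing between `java` and `-cp`; with the variable empty or unset, no extra arguments appear. A flag value that itself contains spaces cannot be passed this way, which is the trade-off the shellcheck comment accepts.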