From c157242a83a776b338f0873e863ae1a0c379d97f Mon Sep 17 00:00:00 2001
From: Liam <liamhuang0205@gmail.com>
Date: Fri, 26 Jun 2026 08:40:38 +0800
Subject: [PATCH 1/4] chore(scripts): add JAVA_BENCH_OPTS env hook for JVM flag
 experiments
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

bench-cross-runtime.sh's java leg previously hard-wired the launch
command, so any JVM tuning experiment (ZGC vs G1, different heap sizes,
preview-flag toggles) required hand-editing the script per run.

Add a JAVA_BENCH_OPTS env var whose value is word-split and inserted
into the java launch command between `java` and `-cp`. When it is unset
or empty, no flags are added and current behavior is preserved. Usage:

  JAVA_BENCH_OPTS="-XX:+UseZGC" scripts/bench-cross-runtime.sh ...

Word-splitting the env value is intentional so that one variable can
carry multiple flags; a shellcheck disable comment documents the choice.
The other two runtimes (go, cpp) are unaffected: they already take
tuning options as CLI flags on their binaries.
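
A minimal sketch of the splitting behavior described above, assuming
bash. `split_opts` is a hypothetical helper used only for illustration;
it is not a function in the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the script's approach: the unquoted
# expansion is split on whitespace, so each flag becomes its own
# array element. An empty value yields an empty array.
split_opts() {
  local -a opts=()
  if [[ -n "${1:-}" ]]; then
    # shellcheck disable=SC2206  # intentional word-splitting
    opts=($1)
  fi
  # Print the element count, then one element per line.
  printf '%s\n' "${#opts[@]}" "${opts[@]}"
}

split_opts "-XX:+UseZGC -Xmx4g"
split_opts ""
```

Unquoted expansion also performs pathname expansion, so a value
containing `*` or `?` could match files in the working directory. The
flags discussed here contain no such characters.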
---
 scripts/bench-cross-runtime.sh | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/scripts/bench-cross-runtime.sh b/scripts/bench-cross-runtime.sh
index 5365aa73..f586cf8c 100755
--- a/scripts/bench-cross-runtime.sh
+++ b/scripts/bench-cross-runtime.sh
@@ -172,8 +172,16 @@ start_server() {
   # Set BENCH_VERBOSE=1 to capture server logs for debugging startup failures
   [[ "${BENCH_VERBOSE:-}" == "1" ]] && sink="$WORK_DIR/${runtime}.log"
   local -a cmd=()
+  # JAVA_BENCH_OPTS lets the caller inject JVM flags (e.g. `-XX:+UseZGC
+  # -XX:+ZGenerational`) for the java leg without touching the script.
+  # Word-split via $JAVA_BENCH_OPTS expansion; empty default = no flags.
+  local -a java_opts=()
+  if [[ -n "${JAVA_BENCH_OPTS:-}" ]]; then
+    # shellcheck disable=SC2206  # intentional word-splitting for env-supplied flags
+    java_opts=(${JAVA_BENCH_OPTS})
+  fi
   case "$runtime" in
-    java) cmd=(java -cp "$JAVA_CP" -Dpine.bench=true -Dpine.config="$config" -Dpine.port="$port"
+    java) cmd=(java "${java_opts[@]}" -cp "$JAVA_CP" -Dpine.bench=true -Dpine.config="$config" -Dpine.port="$port"
               page.liam.pine.PineServer) ;;
     go)   cmd=("$WORK_DIR/server-go" -config "$config" -addr ":$port") ;;
     cpp)  if [[ -n "${CPP_LD_PRELOAD:-}" ]]; then

From 62475e27cef05a529b0aeb119fcafe1ce3e293f5 Mon Sep 17 00:00:00 2001
From: Liam <liamhuang0205@gmail.com>
Date: Fri, 26 Jun 2026 08:40:57 +0800
Subject: [PATCH 2/4] =?UTF-8?q?chore(java):=20bump=20target=20/=20CI=20/?=
 =?UTF-8?q?=20docs=20from=20JDK=2021=20=E2=86=92=20JDK=2025=20(LTS=20?=
 =?UTF-8?q?=E2=86=92=20LTS)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We have no historical dependency tying us to JDK 21:
- pine-java is greenfield 2026 code (pine-java-full-implementation
  reflection) with no `sun.misc.*`, no module overrides, no reflection
  trickery against deprecated VM internals.
- The benchmark host already runs OpenJDK 25.0.2 — so the v0.10.9
  README bench numbers were already measured on 25 at runtime, with
  only `<source>/<target>` and CI setup-java pinned to 21. Bumping
  closes that stale gap.

Risk audited and cleared:
- LuaJ 3.0.1 (2014, the BCEL-backed luajc compiler that emits Lua →
  JVM bytecode and underpins the v0.10 perf baseline) was the obvious
  worry: would it emit verifier-clean bytecode under target 25? Built
  + ran the full 246-test suite under target 25 — all pass, including
  TransformByLuaCompilerBackendTest's luajc/luac equivalence cases.
- No JDK 25 regression is documented for BCEL 6.10.0, and the passing
  suite confirms that in practice.

10 setup-java references swept (ci.yml × 6, daily-sanitized-fuzz,
nightly-benchmark, nightly-diff-fuzz, release). Both READMEs say
`Java 25+`.

The language layer gives us no immediate win: we don't use any 22-25
features yet (Scoped Values, final in 25, and Stable Values, still a
preview API in 25, are opportunities, not requirements). The win is
closing the stale target-vs-runtime gap and unlocking 22-25 features
when we want them.

LTS-to-LTS jump: 21 LTS → 25 LTS. JDK 25 GA'd 2025-09.
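
The target-versus-runtime gap noted above can be checked mechanically.
The following is a rough sketch, not part of this change: both helper
names are hypothetical, and the pom parser assumes the property is set
directly in the given pom on a single line (it does not resolve parent
poms or profiles the way `mvn help:effective-pom` would).

```shell
#!/usr/bin/env bash
# Hypothetical helpers; neither exists in the repository.

# Print the <maven.compiler.target> value declared in a pom file.
# Assumption: the property is declared directly in this pom, on one line.
pom_compiler_target() {
  sed -n 's:.*<maven\.compiler\.target>\([^<]*\)</maven\.compiler\.target>.*:\1:p' "$1" | head -n 1
}

# Print the feature version from the first line of `java -version`
# output, e.g. 'openjdk version "25.0.2" ...' -> 25.
# Legacy "1.8.0_x" style strings are not handled.
runtime_feature_version() {
  sed -n 's/.* version "\([0-9][0-9]*\).*/\1/p' <<<"$1" | head -n 1
}
```

One possible use is to compare `pom_compiler_target pine-java/pom.xml`
with `runtime_feature_version "$(java -version 2>&1 | head -n 1)"` and
report when the two differ.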
---
 .github/workflows/ci.yml                   | 12 ++++++------
 .github/workflows/daily-sanitized-fuzz.yml |  2 +-
 .github/workflows/nightly-benchmark.yml    |  2 +-
 .github/workflows/nightly-diff-fuzz.yml    |  2 +-
 .github/workflows/release.yml              |  2 +-
 README-en.md                               |  2 +-
 README.md                                  |  2 +-
 pine-java/pom.xml                          |  4 ++--
 8 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index b427c541..6411a2ce 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -88,7 +88,7 @@ jobs:
       - uses: actions/setup-java@v5
         with:
           distribution: temurin
-          java-version: "21"
+          java-version: "25"
           cache: maven
       - run: mvn checkstyle:check -B
 
@@ -157,7 +157,7 @@ jobs:
       - uses: actions/setup-java@v5
         with:
           distribution: temurin
-          java-version: "21"
+          java-version: "25"
           cache: maven
       - run: mvn test -B -q
       - name: Upload Java coverage
@@ -323,7 +323,7 @@ jobs:
       - uses: actions/setup-java@v5
         with:
           distribution: temurin
-          java-version: "21"
+          java-version: "25"
       - name: Run benchmarks
         run: mvn test -B -Dtest=BenchmarkTest -pl . 2>&1 | tee benchmark-java.txt
       - name: Write benchmark summary
@@ -349,7 +349,7 @@ jobs:
       - uses: actions/setup-java@v5
         with:
           distribution: temurin
-          java-version: "21"
+          java-version: "25"
           cache: maven
       - name: Fuzz Config.load
         run: mvn test -B -Dtest="JazzerFuzzTest#fuzzConfigLoad" -Djazzer.instrument="page.liam.pine.**"
@@ -376,7 +376,7 @@ jobs:
       - uses: actions/setup-java@v5
         with:
           distribution: temurin
-          java-version: "21"
+          java-version: "25"
           cache: maven
       - uses: actions/setup-python@v6
         with:
@@ -513,7 +513,7 @@ jobs:
       - uses: actions/setup-java@v5
         with:
           distribution: temurin
-          java-version: "21"
+          java-version: "25"
           cache: maven
       - uses: actions/setup-python@v6
         with:
diff --git a/.github/workflows/daily-sanitized-fuzz.yml b/.github/workflows/daily-sanitized-fuzz.yml
index f48af165..ea4b0cff 100644
--- a/.github/workflows/daily-sanitized-fuzz.yml
+++ b/.github/workflows/daily-sanitized-fuzz.yml
@@ -73,7 +73,7 @@ jobs:
       - uses: actions/setup-java@v5
         with:
           distribution: temurin
-          java-version: "21"
+          java-version: "25"
           cache: maven
 
       - name: Install C++ build deps
diff --git a/.github/workflows/nightly-benchmark.yml b/.github/workflows/nightly-benchmark.yml
index 9a3c0441..e2352b27 100644
--- a/.github/workflows/nightly-benchmark.yml
+++ b/.github/workflows/nightly-benchmark.yml
@@ -39,7 +39,7 @@ jobs:
       - uses: actions/setup-java@v5
         with:
           distribution: temurin
-          java-version: "21"
+          java-version: "25"
           cache: maven
 
       - name: Install system deps
diff --git a/.github/workflows/nightly-diff-fuzz.yml b/.github/workflows/nightly-diff-fuzz.yml
index 66cd9807..0c181f37 100644
--- a/.github/workflows/nightly-diff-fuzz.yml
+++ b/.github/workflows/nightly-diff-fuzz.yml
@@ -44,7 +44,7 @@ jobs:
       - uses: actions/setup-java@v5
         with:
           distribution: temurin
-          java-version: "21"
+          java-version: "25"
           cache: maven
 
       - name: Install C++ build deps
diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml
index d1c808c5..cf8c97a3 100644
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@@ -62,7 +62,7 @@ jobs:
       - uses: actions/setup-java@v5
         with:
           distribution: temurin
-          java-version: "21"
+          java-version: "25"
           cache: maven
           server-id: central
           server-username: CENTRAL_USERNAME
diff --git a/README-en.md b/README-en.md
index 39933533..6528b8ac 100644
--- a/README-en.md
+++ b/README-en.md
@@ -52,7 +52,7 @@ Python DSL (Apple)  ──compile──>  JSON Config
 ### Prerequisites
 
 - Go 1.26+ (Pine-Go)
-- Java 21+ (Pine-Java)
+- Java 25+ (Pine-Java)
 - Python 3.11+ (Apple DSL)
 
 ### 1. Write a Pipeline
diff --git a/README.md b/README.md
index 8194d751..c2b4a05c 100644
--- a/README.md
+++ b/README.md
@@ -52,7 +52,7 @@ Python DSL (Apple)  ──compile──>  JSON Config
 ### 环境要求
 
 - Go 1.26+（Pine-Go）
-- Java 21+（Pine-Java）
+- Java 25+（Pine-Java）
 - Python 3.11+（Apple DSL）
 - CMake 3.20+ / C++23 编译器 / LuaJIT（Pine-C++，可选）
 
diff --git a/pine-java/pom.xml b/pine-java/pom.xml
index 92c8117c..5bc4566b 100644
--- a/pine-java/pom.xml
+++ b/pine-java/pom.xml
@@ -34,8 +34,8 @@
     </scm>
 
     <properties>
-        <maven.compiler.source>21</maven.compiler.source>
-        <maven.compiler.target>21</maven.compiler.target>
+        <maven.compiler.source>25</maven.compiler.source>
+        <maven.compiler.target>25</maven.compiler.target>
         <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
         <junit.version>5.11.4</junit.version>
         <jackson.version>2.17.0</jackson.version>

From 24612e199198692d616be0d6b5062396b6795d40 Mon Sep 17 00:00:00 2001
From: Liam <liamhuang0205@gmail.com>
Date: Fri, 26 Jun 2026 08:55:21 +0800
Subject: [PATCH 3/4] docs(llmdoc): record JDK 25 upgrade + ZGC investigation +
 stddev calibration
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Three stable docs land + one stale assumption corrected:

1. `memory/decisions/pine-java-gc-choice.md` (NEW, 76 lines) —
   pine-java GC choice decision. Under the current deployment shape
   (4G heap / 2C cgroup / throughput-bound / single-request 100+ ms)
   G1 (default since JDK 9) is the right choice; ZGC is a net -5 to
   -7 % loss because its concurrent-phase CPU steal (1087 ms in an
   80 s bench window, ~6.7 % of 2-core wall) is not paid back when
   G1 already has 0 pauses >50 ms and a max of 12.82 ms. Records the
   2026-06-26 A-stage experiment numbers and the GC-log evidence
   verbatim so future challengers don't have to re-run. Lists the
   four re-open triggers (heap ≥16 G / cores ≥8 / P99 ≤10 ms SLO /
   G1 STW max >50 ms in prod) and the layered relationship with
   `perf-evolution-roadmap.md` (engine-side perf vs JVM process
   parameters).

2. `memory/reflections/jdk25-upgrade-and-zgc-investigation.md`
   (NEW, 118 lines) — the task-level reflection. Captures the
   "performance hypothesis must be measured, not guessed" lesson
   (my initial recommendation was driven by an unverified
   "ZGC tightens stddev" prior that turned out wrong: G1 max STW
   pause 12.82 ms ≪ calibrated stddev 35 ms, so GC was never the
   stddev source). Captures the LuaJ 3.0.1 + BCEL 6.10.0 risk
   falsification methodology (246 tests passed under target 25
   including the luajc/luac equivalence suite), and the
   compile-target ≠ runtime-version gotcha (the host already ran
   OpenJDK 25.0.2 while pom said target 21, so v0.10.9 README bench
   numbers were already 25-runtime measurements).

3. `guides/benchmark-hygiene.md` (+10 lines) — new "stddev 来源
   校准" subsection between the same-host comparison discipline and
   the fixture representativeness section. Names the three real
   stddev drivers (DAG multi-op scheduling jitter / LuaJ JIT warmup
   tail / HTTP keepalive jitter), explicitly excludes GC pause via
   the measured G1 max 12.82 ms data point, and warns that
   GC-tuning to tighten stddev is a wrong-direction fix unless GC
   logs prove otherwise.

4. `index.md` — three line updates: new reflection entry, new
   decision entry, and the bench-hygiene description gains the
   stddev calibration topic.

Out of scope (deliberately untouched): `perf-evolution-roadmap.md`
(cross-reference left for a future PR if the layering needs to be
made explicit), `must/conventions.md` (the JVM toolchain check
"run java -version before pom-target audits" deserves a separate
promotion pass), `architecture/dag-engine.md:710` (still says
"Java 21+" but README is the user-facing source-of-truth and was
bumped in `62475e27`).
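
The -5 to -7 % figure quoted in item 1 follows from the QPS pairs
recorded in `pine-java-gc-choice.md`. A small sketch of that arithmetic,
assuming a POSIX awk is available; `qps_delta_pct` is a hypothetical
helper name, not something in the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage change from a baseline QPS to a
# candidate QPS, printed with an explicit sign and one decimal place.
qps_delta_pct() {
  awk -v base="$1" -v cand="$2" \
    'BEGIN { printf "%+.1f%%\n", (cand - base) / base * 100 }'
}

# G1 vs ZGC QPS pairs as recorded in the decision doc.
qps_delta_pct 127.9 120.9   # calibrated_2c4g
qps_delta_pct 128.1 119.0   # calibrated
qps_delta_pct 126.5 119.5   # calibrated_itemlua
```

Any of these deltas sits at or below the ±5-7 % binary-layout noise band
that `benchmark-hygiene.md` already describes, which is why the GC-log
evidence matters alongside the QPS table.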
---
 llmdoc/guides/benchmark-hygiene.md            |  10 ++
 llmdoc/index.md                               |   4 +-
 .../memory/decisions/pine-java-gc-choice.md   |  76 +++++++++++
 .../jdk25-upgrade-and-zgc-investigation.md    | 118 ++++++++++++++++++
 4 files changed, 207 insertions(+), 1 deletion(-)
 create mode 100644 llmdoc/memory/decisions/pine-java-gc-choice.md
 create mode 100644 llmdoc/memory/reflections/jdk25-upgrade-and-zgc-investigation.md

diff --git a/llmdoc/guides/benchmark-hygiene.md b/llmdoc/guides/benchmark-hygiene.md
index 04b49975..248b7f43 100644
--- a/llmdoc/guides/benchmark-hygiene.md
+++ b/llmdoc/guides/benchmark-hygiene.md
@@ -29,6 +29,16 @@
 - fresh build 之间存在 **±5-7% 的二进制布局噪声**（函数地址/对齐漂移），小于该幅度的 QPS 差异不可下结论
 - 落在噪声带内的差异需 `perf stat` 微架构指标交叉验证：instructions / IPC / L1-icache-miss / branch-miss / context-switches。两个 build 微架构指标持平，即可判定"统计无差异"
 
+## stddev 来源校准
+
+- calibrated fixtures 的 stddev 33–36 ms **不是噪声来源，是负载的固有抖动**——切 GC / 调 JVM flag / 改 JIT 后端都不会收紧
+- 三大主导源（按贡献排序）：
+  1. **DAG 多 op 并发调度抖动**：calibrated_itemlua 38-op DAG 内 ready-queue 调度顺序非确定，单请求耗时浮动 ~20 ms
+  2. **LuaJ JIT warmup 在前 N 个请求的非均匀分布**：luajc 编译触发点漂移，前段请求耗时尾部偏长
+  3. **HTTP keepalive / server 调度抖动**：网络栈与 server 端 work-stealing 微秒级浮动放大到毫秒
+- **GC pause 不在主要源中**：pine-java G1 实测 max STW pause **12.82 ms** 远低于 calibrated stddev 35 ms，整体 STW budget 远小于 stddev × bench 窗口。详见 `llmdoc/memory/decisions/pine-java-gc-choice.md` "ZGC 实测数据" 段的 GC log 实测
+- **推论**：诊断 stddev 之前先采 GC log 验明出处。试图通过 GC tuning 收紧 stddev 是错误方向（除非 GC log 数据反证 STW 主导）；"stddev 高 → 怀疑 GC → 切 GC" 这条错误链路已被 2026-06-26 ZGC A 实验证伪一次，不要复制
+
 ## Fixture 代表性
 
 - `fixtures/benchmarks/realistic_for_you_calibrated*` 是生产 proxy（按真实流量 calibrate，N≈10 行），是**性能决策的唯一裁判**
diff --git a/llmdoc/index.md b/llmdoc/index.md
index 7e773149..403b10a7 100644
--- a/llmdoc/index.md
+++ b/llmdoc/index.md
@@ -22,7 +22,7 @@
 - `llmdoc/guides/ci-quality-baseline.md` — CI 工程质量基线：lint（含 Java checkstyle `failOnViolation=true` + `OneStatementPerLine`、C++ clang-format）/ test / coverage / fuzz / differential-fuzz / cross-validate / nightly cross-runtime benchmark / release-gate 架构与接入约定（含 pine-cpp 的 4 个 CI job 与 cross-validate cpp 二进制注入路径），统一任务入口 Makefile 体系（顶层 + `pine-go/` Makefile 封装跨四语言 fmt/lint/test/bench/codegen/版本管理，CI 与本地共用同一命令序列、`make bench` 默认 `pine_bench` tag），以及本地 `.githooks/` 体系（`pre-commit` staged-only 格式 gate + `pre-push` 工程级 lint + 自包装 CI watch）。
 - `llmdoc/guides/investigation-to-fix-testing.md` — 从调查到修复的测试策略：按缺陷类型选择测试层、最小修复面原则、跟进上游 issue 与临时止血方法论（跨 issue 根因归属不顺 follow-up 措辞、临时止血阈值用 probe 实测标定）。
 - `llmdoc/guides/cross-layer-validation.md` — 跨层语义校验：JSON 边界类型枚举、codegen 语义验证（含跨引擎 markdown / Python 产物 byte-equal gate）、边界值 E2E、隐含 metadata 契约检测、扩展点对等验证（能力等价）。
-- `llmdoc/guides/benchmark-hygiene.md` — Benchmark 噪声卫生：跑前/跑后 load 与残留进程检查、同日同机对照纪律、±5-7% 二进制布局噪声与 perf stat 交叉验证、fixture 代表性（calibrated 为性能决策唯一裁判）、microbench 访问模式戒律、逐 op 删除归因法、测量路径对称性（PureVM vs CallOnly vs Boundary 不可互推）。
+- `llmdoc/guides/benchmark-hygiene.md` — Benchmark 噪声卫生：跑前/跑后 load 与残留进程检查、同日同机对照纪律、±5-7% 二进制布局噪声与 perf stat 交叉验证、calibrated stddev 33-36ms 来源校准（DAG 调度抖动 + LuaJ JIT warmup + 网络抖动主导，GC pause 非主要源）、fixture 代表性（calibrated 为性能决策唯一裁判）、microbench 访问模式戒律、逐 op 删除归因法、测量路径对称性（PureVM vs CallOnly vs Boundary 不可互推）。
 
 ## reference/
 
@@ -107,7 +107,9 @@
 - `llmdoc/memory/reflections/wangshu-borrow-optimization-survey.md` — wangshu borrow/边界优化空间调查复盘（纯调查+原型量化，未改生产代码），记录 borrow 层刻意对称是设计、Arena 列轨 ABI 边界口径 -22%(N=100)~-46%(N=3000) 但端到端会稀释+落地破四引擎 parity、makeArrayTable SetIndex 顺序 append O(N²) 建表（已提 wangshu #10）、Boundary 微基准≠calibrated 裁判的口径纪律、非单调性是红旗须先查根因。
 - `llmdoc/memory/reflections/wangshu-v020rc3-upgrade-and-workaround-refactor.md` — wangshu v0.2.0-rc3 升级与两 workaround 重构 / 拆除 / 判据迁移复盘（wangshu 内存系列第四篇）：上游一次回应 #9/#10/#11 三 issue（#9 真等价解 `MaybeCollectNow` 等三选一 API；#10 真根因解为 arena LARGE freelist 单链 first-fit→power-of-2 buckets，纠正前篇"rehash 风暴"错误推断；#11 partial：`Arena.Compact()` 解 transient peak、bump 不回退、sustained-fat latch 留作 follow-up），下游 cadence-sweep 真拆 / drop-fat-state 判据迁移（`GCCountKB`→`ArenaCapKB`）/ `makeArrayTable` 切 `NewArrayTable`，沉淀两条 rc 升级方法论（必读 issue close comments、workaround 拆除分 root-cause/proxy 判别）；本篇覆盖纠偏第三篇 reflection 与稳定文档多处。
 - `llmdoc/memory/reflections/redis-cascade-safety-and-observability.md` — Redis cascade-safety 五参数（`{dial,read,write,pool}_timeout_ms` + `pool_size`）三引擎对齐 + pine-cpp Client 失败收敛与 SIGPIPE 守卫 + codegen markdown 跨引擎 byte-equal + per-command Redis 指标（`pine_redis_command_*` 4-state status）的 PR 复盘（13 commits / 3 轮 review），记录 codegen 单向对齐方向（Go 是 source of truth）、failed-path 静默降级审计契约、单元测试不能替代 byte-equal gate、review-driven scope expansion 接受、cpp 错误类型分层 known follow-up 五条教训。
+- `llmdoc/memory/reflections/jdk25-upgrade-and-zgc-investigation.md` — 2026-06-26 评估 pine-java 升 JDK 25 + 切 ZGC 的 A/B/C 路径复盘，记录"性能假设要测不要猜"与 deployment-shape vs GC-shape 匹配检查的教训，含 G1/ZGC GC log 实测数据点（G1 max STW 12.82ms 证伪 stddev=GC 假设、ZGC 2C cgroup 下 concurrent 6.7% CPU 偷窃致 −5~7% QPS）。
 
 ## memory/decisions/
 
 - `llmdoc/memory/decisions/perf-evolution-roadmap.md` — 引擎侧性能演进路线决策：两个校准事实（per-item VM 边界主导、VM 层加速被端到端稀释，含 itemlua 第二证据点）、三步路线（typed-ColumnFrame/arena → common-mode 列内核负载迁移（含 2026-06-13 wangshu Arena 列轨 ABI 边界量化数据点，-22%~-46% 但端到端稀释+破 parity，不立即落地）→ 第三步 VM 适配层可插拔已于 2026-06-13 触发，wangshu 翻默认）、明确不做项（VM 直摸 Go heap、简单脚本负载上的 VM 优化）、翻默认三条 AND 闸门（calibrated 不劣化 + 受影响场景显著胜出 + 双 tag 全绿）、按切换范围分档的语义闸门。
+- `llmdoc/memory/decisions/pine-java-gc-choice.md` — pine-java GC 选型决策：4G 堆 / 2C cgroup / throughput-bound 形态下保持 JDK 21+ 默认 G1，不切 ZGC/Shenandoah/Parallel；含 2026-06-26 ZGC A 实验 QPS 表与 G1/ZGC GC log 实测、根因（G1 已无长 pause 痛点 + ZGC 小核 cgroup 下 concurrent CPU 偷窃倒贴 + stddev 主导源与 GC 无关）、重启触发条件（堆 ≥16G / 核 ≥8C / P99 ≤10ms / G1 STW max >50ms 任一）、与 perf-evolution-roadmap 互补的层次关系。
diff --git a/llmdoc/memory/decisions/pine-java-gc-choice.md b/llmdoc/memory/decisions/pine-java-gc-choice.md
new file mode 100644
index 00000000..098ce41a
--- /dev/null
+++ b/llmdoc/memory/decisions/pine-java-gc-choice.md
@@ -0,0 +1,76 @@
+# pine-java GC 选型决策
+
+记录 2026-06-26 JDK 25 升级 + ZGC 评估收口后确定的 pine-java GC 选型决策。本文档覆盖"JVM 进程参数 / GC 选型"层，与 `perf-evolution-roadmap.md`（引擎侧 typed-ColumnFrame / common-mode / VM 适配层）互补不冲突。
+
+## 决策
+
+**保持 JDK 21+ 默认的 G1 GC**，不切 ZGC、不切 Shenandoah、不切 Parallel。仅当后续 deployment 形态发生质变且重测后明确有正向证据时才重启选型。
+
+## 当前部署形态
+
+- 堆：4 G 上限
+- CPU：2 C cgroup（`pine-bench-server.unit` 隔离单元）
+- 单请求延迟：100+ ms（DAG 38-op + per-item Lua + stub I/O）
+- 负载形态：**throughput-bound**（QPS 决策），不是 latency-bound
+- JVM：OpenJDK 25 runtime（v0.10.10 起 compile target 同步升 25，见 commit `62475e27`）
+- GC：JDK 21+ 默认 G1
+
+## ZGC 实测数据（2026-06-26）
+
+A 路径实验：同机串行同 fixture / 10k req × 16 conc，G1 baseline 与 ZGC 各 calibrated × 3 fixture。
+
+**QPS**
+
+| Fixture | G1 QPS | ZGC QPS | Δ |
+| --- | --- | --- | --- |
+| `calibrated_2c4g` | 127.9 | 120.9 | **−5.5%** |
+| `calibrated` | 128.1 | 119.0 | **−7.1%** |
+| `calibrated_itemlua` | 126.5 | 119.5 | **−5.5%** |
+
+stddev 33–36 ms，G1 与 ZGC 无差。
+
+**GC log 实测（单 fixture 验证）**
+
+- G1：729 pauses，avg **3.49 ms**，max **12.82 ms**，**0 次** 超 50 ms。
+- ZGC：753 STW pauses，avg **0.008 ms**，max **0.022 ms**（580× 短于 G1）。
+- ZGC concurrent phase：108 events / 总 **1087 ms** / 34 个 >10 ms（80s bench 窗口内）。
+- 2 C cgroup 下 1087 ms concurrent ≈ **6.7% CPU 偷窃**，与 −5~7% QPS 吻合。
+
+原始报告：`bench-results/report-20260625-113855.txt`（G1 baseline）、`bench-results/report-20260625-114324.txt`（ZGC）。
+
+## 根因
+
+1. **G1 已无长 pause 痛点**：max STW 12.82 ms ≪ calibrated stddev 35 ms，整体 STW budget 远小于 stddev。stddev 35 ms **完全不是 GC 来源**，因此切任何 GC 都无法收紧 stddev。
+2. **ZGC trade-off 在小核 cgroup 下输**：ZGC 把 STW 换成 concurrent CPU 工作，2 C cgroup 下 6.7% CPU 被 concurrent phase 偷走，直接体现为 QPS −5~7%。STW 的 580× 收益对 100+ ms 单请求不可见。
+3. **ZGC 适用场景与当前形态反向**：ZGC 优势在 ≥16 G 堆 / ≥8 C 核 / 1–10 ms 单请求 / 延迟敏感 SLO；pine-java 当前 4 G / 2 C / 100+ ms / throughput-bound 完全反向。
+4. **calibrated stddev 35 ms 与 GC 无关**：实测主导源是 DAG 38-op 调度抖动 + LuaJ JIT warmup + 网络抖动，证伪了"ZGC 收紧 stddev"的先验假设。详见 `llmdoc/guides/benchmark-hygiene.md` "stddev 来源校准"。
+
+## 重启选型的触发条件
+
+以下任一条触发即重做 GC 选型评估：
+
+- **deployment 形态变更**：堆 ≥16 G **或** 核心数 ≥8 **或** 单请求目标 P99 ≤10 ms（latency-bound SLO 出现）
+- **G1 worst-case pause 失控**：生产数据显示 G1 STW max > 50 ms
+- **calibrated stddev 主导源转移到 GC**：重测 GC log 验证 STW 总贡献接近 stddev 量级（当前 STW total 远小于 stddev × bench duration 时不触发）
+
+重启时复用 commit `c157242a` 引入的 `JAVA_BENCH_OPTS` 钩子（脚本侧 JVM flag 注入入口），直接复跑 A 实验对照。
+
+## 不该做的实验
+
+- `-XX:+UseSerialGC`：单线程必输，无对比价值。
+- Shenandoah：与 ZGC 同类 concurrent trade-off，且 OpenJDK 25 Temurin 不默认 ship，引入额外依赖且预期与 ZGC 同向负优化。
+- Parallel GC：throughput-only、无 concurrent class unloading、与 G1 同型号但更老，无替换收益。
+
+## 与 perf-evolution-roadmap 的关系
+
+- `llmdoc/memory/decisions/perf-evolution-roadmap.md` 圈定**引擎侧**演进（typed-ColumnFrame / common-mode 列内核 / VM 适配层），其性能假设建立在"运行环境层稳定"之上。
+- 本决策圈定**运行环境层**（JVM 进程参数 / GC 选型），是 roadmap 假设的底座。
+- 两者互补不冲突；任何引擎侧优化的 calibrated 数据均需声明 GC 形态（当前 G1），跨 GC 比较不可直接套用。
+
+## 引用
+
+- 完整复盘：`llmdoc/memory/reflections/jdk25-upgrade-and-zgc-investigation.md`
+- 编译目标升级：commit `62475e27`（pom + CI + README target 21→25）
+- JVM flag 实验钩子：commit `c157242a`（`JAVA_BENCH_OPTS` 环境变量）
+- 数据存档：`bench-results/report-20260625-113855.txt`（G1）、`bench-results/report-20260625-114324.txt`（ZGC）
+- stddev 来源校准：`llmdoc/guides/benchmark-hygiene.md`
diff --git a/llmdoc/memory/reflections/jdk25-upgrade-and-zgc-investigation.md b/llmdoc/memory/reflections/jdk25-upgrade-and-zgc-investigation.md
new file mode 100644
index 00000000..207b1be6
--- /dev/null
+++ b/llmdoc/memory/reflections/jdk25-upgrade-and-zgc-investigation.md
@@ -0,0 +1,118 @@
+---
+name: jdk25-upgrade-and-zgc-investigation
+description: 2026-06-26 评估 pine-java 升 JDK 25 + 切 ZGC 的 A/B/C 路径复盘，记录"性能假设要测不要猜"与 deployment-shape vs GC-shape 匹配检查的教训
+type: reflection
+---
+
+## Task
+
+评估 pine-java 能否升 JDK 25 + 切 ZGC，按三条路径走：
+
+- A：切 ZGC（保持 JDK 21）
+- B：升 JDK 25 编译目标（pom + CI + README）
+- C：A + B 串联
+
+预期 A 能拿一些尾延迟收益、B 风险来自 LuaJ 3.0.1 + BCEL verifier 严格化。
+
+## Expected vs Actual
+
+| 路径 | 预期 | 实际 |
+| --- | --- | --- |
+| A (ZGC) | calibrated stddev 35ms 应该是 G1 STW 来源，ZGC 收紧 stddev、QPS 持平或微升 | calibrated 形态下 net loss **−5.5% / −7.1% / −5.5%** QPS，stddev 无变化，证伪 |
+| B (JDK 25) | LuaJ 3.0.1 + BCEL 6.10.0 可能因 25 verifier 严格化而炸 | 246 tests / 0 failures，含 luajc/luac 双后端等价测试，风险证伪 |
+| C (A+B) | 取决于 A 的结果 | A 证伪后直接不需要 |
+
+最终落地：commit `62475e27`（B：bump target 25 + CI + README）+ `c157242a`（脚本 `JAVA_BENCH_OPTS` 环境变量钩子，用于以后 JVM flag 实验）。**ZGC 不切默认**。
+
+## 实测数据点（GC log 实跑，可复用）
+
+A 实验设计：同机串行、各 calibrated × 3 fixture（`calibrated_2c4g` / `calibrated` / `itemlua`）、10k req × 16 conc。
+
+**QPS**
+
+| Fixture | G1 baseline | ZGC | Δ |
+| --- | --- | --- | --- |
+| calibrated_2c4g | 127.9 | 120.9 | −5.5% |
+| calibrated | 128.1 | 119.0 | −7.1% |
+| itemlua | 126.5 | 119.5 | −5.5% |
+
+stddev 33–36ms，G1 与 ZGC 无差。
+
+**GC log 实测（2026-06-26）**
+
+- G1：729 pauses，avg **3.49 ms**，max **12.82 ms**，**0 次** 超 50 ms。
+- ZGC：753 STW pauses，avg **0.008 ms**，max **0.022 ms**（580× 短于 G1）。
+- ZGC concurrent phase：108 events / 总 **1087 ms** / 34 个 >10 ms。
+- 2C cgroup 下 1087 ms concurrent ≈ **6.7% CPU 偷窃**，与 −5~7% QPS 吻合。
+
+→ G1 STW 根本不够大（max 12.82 ms），stddev 35ms 完全不是 GC 来源。ZGC 的 0.022ms STW 收益被 concurrent CPU 偷窃在 2C cgroup 下完全吃回去且倒贴。
+
+## What Went Wrong
+
+### 1. A 路径推荐基于错误先验
+
+最初推荐 A 的理由是"calibrated stddev 35ms 应该来自 G1 STW pause、ZGC 应该收紧 stddev"。**没先采 G1 GC log 看 max pause**，直接拿 stddev 倒推 GC 是 hot spot。实测 G1 max 才 12.82ms，整体 STW budget 远小于 stddev，假设链从源头就错了。
+
+### 2. ZGC 适用场景未在调研前列检查表
+
+ZGC 优势场景（≥16 G 堆 / ≥8 C 核 / 1–10 ms 单请求 / 延迟敏感）与 pine-java 当前形态（4 G 堆 / 2 C cgroup / 单请求 100+ ms / throughput-bound）完全反向。如果调研前先列 ZGC 适用场景 vs 当前 deployment shape，30 秒就能判 "我们这种形态根本不该切 ZGC"。
+
+### 3. benchmark host runtime 与 maven target 脱节未先确认
+
+机器 PATH 第一个 `java` 已是 OpenJDK 25.0.2，但 pom `<maven.compiler.target>` 还停在 21。v0.10.9 README bench 数据其实早已跑在 JDK 25 runtime 上，只是字节码 target 21。B 路径升级实质只是补齐编译目标，**不是 runtime 切换**。开调研前没先 `java -version` 确认实际 runtime 版本，差点把 "runtime 切换" 与 "compile target 切换" 混为一谈。
+
+### 4. LuaJ 21→25 风险纯凭直觉判定
+
+升级前怀疑 LuaJ 3.0.1 + BCEL 6.10.0 在 25 verifier 严格化下会炸——这是基于历史 BCEL 在跨大版本 JDK 升级时 stackmap frame 兼容问题的直觉，但没先跑一遍 test suite。如果先跑 `mvn test`，几分钟就能证伪，不必把 LuaJ 列为高风险阻塞项。
+
+## Root Cause
+
+### 性能假设必须被实测打过才算事实
+
+"stddev 35ms 来自 GC pause" 是个看起来合理的假设，但合理 ≠ 真。GC log 一开就立刻证伪。对一切将影响选型决策的性能假设，**先跑一次最小复现拿数据，再下结论**。这条已在 `bench-lua-vs-go-performance.md` 和 `isolated-bench-and-resource-ops.md` 中以"预估偏差"的形式出现过，这是第三次同类教训。
+
+### deployment-shape vs GC-shape 匹配检查缺失
+
+JVM tuning 选型应先做"我们的部署形态匹配该 GC 的适用场景吗"检查，这是 GC 选型的零号问题。直接跳到"试一下 ZGC"就跳过了零号问题。
+
+### LTS→LTS 升级风险评估不应纯靠直觉
+
+LuaJ + BCEL 这类字节码生成依赖在跨大版本 JDK 升级时确实是合理怀疑点，但 **test suite + cross-validate 实测** 是最便宜的证伪手段，应优先于"列为风险阻塞 → 开会议 → 拉清单"。
+
+### compile target ≠ runtime
+
+发版数据基线、bench 报告、README 数字这些"实际跑在哪个 JVM 上"，与 pom 的 `<maven.compiler.target>` 是两件事。任何 JVM 升级讨论先 `java -version` + `mvn help:effective-pom | grep target`，两条命令把状态钉死。
+
+## Missing Docs or Signals
+
+1. **没有 GC 选型决策档**：pine-java 当前用 G1，但没有任何文档说明为什么用 G1、什么形态下应该重新评估切 ZGC/Shenandoah/Parallel。下次再有人问"能不能切 ZGC"，会从零重做这次调研。
+2. **`benchmark-hygiene.md` 缺 "stddev 来源校准"段**：calibrated 形态下 stddev 33–36ms 是 **应用层** 来源（IO、调度、Lua 调用栈），不是 GC 来源，但当前 guide 没说清楚。
+3. **没有"compile target vs runtime version 必须分别核对"的明文规范**：未来再有 JDK / Maven / Gradle 升级讨论，这个坑会被踩第二次。
+
+## Promotion Candidates
+
+### 应立即新增到 `decisions/pine-java-gc-choice.md`
+
+- **结论**：4 G 堆 / 2 C cgroup / throughput-bound 形态下保持 G1（默认）。ZGC 与 Shenandoah 不切。
+- **实验证据**：附本次 A 实验 QPS 表 + G1 / ZGC GC log 实测数据点。
+- **重新评估触发条件**：堆 ≥16 G **或** 核心数 ≥8 **或** 单请求目标 P99 ≤10 ms **或** 出现 G1 STW max >50 ms 的生产证据。任一条触发就重做选型。
+- **JAVA_BENCH_OPTS 实验入口**：commit `c157242a` 给后续 GC flag 实验铺路，未来再起类似讨论直接用此钩子复跑 A 实验。
+
+### 应补到 `guides/benchmark-hygiene.md`
+
+- **calibrated stddev 来源校准**：calibrated 形态下 stddev 33–36 ms 的主导来源是应用层（IO、调度、Lua 调用栈），G1 STW 贡献 <10 ms。诊断 stddev 之前先采 GC log 验明出处，避免"stddev 高 → 怀疑 GC → 切 GC"这条错误链路被复制。
+
+### 应补到 `must/conventions.md`（JVM 工具链段）
+
+- **compile target vs runtime version 必须分别核对**：任何 JDK 升级讨论 / bench 报告，先记录 `java -version`（runtime）与 `mvn help:effective-pom | grep target`（compile target），二者可不同步，不可互推。
+
+### 仅保留在 memory
+
+- 三个 fixture 的 QPS 具体数字（127.9 / 120.9 等）：当时机器状态相关，不属稳定契约。
+- LuaJ 3.0.1 + BCEL 6.10.0 在 JDK 25 下 246 tests 通过的具体快照：会随版本 drift。
+
+## Follow-up
+
+1. **本次任务实际已完成的部分**：commit `62475e27`（target 25 + CI + README）+ `c157242a`（JAVA_BENCH_OPTS 钩子），ZGC 不切。
+2. **建议在下一次 llmdoc 更新中执行**：新增 `decisions/pine-java-gc-choice.md`、给 `benchmark-hygiene.md` 补 "stddev 来源校准" 段、给 `conventions.md` 补 "compile target vs runtime version" 一句。
+3. **方法论沉淀**：以后任何 "切 X 性能优化" 类提案，强制三件套——(a) X 的适用场景表 vs 当前 deployment shape、(b) 当前形态的 baseline 指标 + 假设的瓶颈来源采证、(c) 最小复现 A/B 数据。三件套缺一项就不进决策。

From 0b48a036b072ed1fdbe33f1debf3c148da3af414 Mon Sep 17 00:00:00 2001
From: Liam <liamhuang0205@gmail.com>
Date: Fri, 26 Jun 2026 09:05:52 +0800
Subject: [PATCH 4/4] =?UTF-8?q?fix(scripts):=20bench-cross-runtime=20?=
 =?UTF-8?q?=E2=80=94=20address=20PR=20#145=20review=20nits?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two comment-only corrections inside scripts/bench-cross-runtime.sh
caught by the agentic PR reviewer; no functional change.

1. Header `Prerequisites` block still listed `Java 21`. The same PR
   bumped pom + CI + READMEs to 25, so the file-internal comment was
   the last stale "Java 21" reference. Updated to `Java 25`.

2. The new `JAVA_BENCH_OPTS` example showed `-XX:+UseZGC
   -XX:+ZGenerational`. Generational ZGC has been the default since
   JDK 23 (JEP 474) and the non-generational mode was removed in JDK 24
   (JEP 490), so the `-XX:+ZGenerational` flag is now obsolete: passing
   it emits a warning on every server startup. Reduced the example to
   just `-XX:+UseZGC` and noted that generational mode is the default.
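
For callers that run the benchmark on more than one JDK, the flag choice
could be derived from the feature version. This is only a sketch under
stated assumptions: `zgc_generational_flags` is a hypothetical helper,
and the version cut-offs follow JEP 439 (generational mode added in
JDK 21), JEP 474 (default in JDK 23) and JEP 490 (non-generational mode
removed in JDK 24).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the flags that select generational ZGC
# for a given JDK feature version (e.g. 21, 25).
zgc_generational_flags() {
  local feature="$1"
  if (( feature < 21 )); then
    # Generational ZGC does not exist before JDK 21.
    return 1
  elif (( feature < 23 )); then
    # JDK 21-22: generational mode is opt-in.
    echo "-XX:+UseZGC -XX:+ZGenerational"
  else
    # JDK 23+: generational mode is the default; from JDK 24 the
    # ZGenerational option is obsolete and only triggers a warning.
    echo "-XX:+UseZGC"
  fi
}
```

With such a helper, a run could look like
`JAVA_BENCH_OPTS="$(zgc_generational_flags 25)" scripts/bench-cross-runtime.sh ...`.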
---
 scripts/bench-cross-runtime.sh | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/scripts/bench-cross-runtime.sh b/scripts/bench-cross-runtime.sh
index f586cf8c..b7b31508 100755
--- a/scripts/bench-cross-runtime.sh
+++ b/scripts/bench-cross-runtime.sh
@@ -6,7 +6,7 @@
 #
 # Prerequisites:
 #   - hey: go install github.com/rakyll/hey@latest
-#   - Go, Java 21, cmake + build-essential + libluajit
+#   - Go, Java 25, cmake + build-essential + libluajit
 #
 # Usage:
 #   ./scripts/bench-cross-runtime.sh [--skip go] [--modes "row,column"]
@@ -172,9 +172,10 @@ start_server() {
   # Set BENCH_VERBOSE=1 to capture server logs for debugging startup failures
   [[ "${BENCH_VERBOSE:-}" == "1" ]] && sink="$WORK_DIR/${runtime}.log"
   local -a cmd=()
-  # JAVA_BENCH_OPTS lets the caller inject JVM flags (e.g. `-XX:+UseZGC
-  # -XX:+ZGenerational`) for the java leg without touching the script.
-  # Word-split via $JAVA_BENCH_OPTS expansion; empty default = no flags.
+  # JAVA_BENCH_OPTS lets the caller inject JVM flags (e.g. `-XX:+UseZGC`
+  # for generational ZGC, default since JDK 23) for the java leg without
+  # touching the script. Word-split via $JAVA_BENCH_OPTS expansion; empty
+  # default = no flags.
   local -a java_opts=()
   if [[ -n "${JAVA_BENCH_OPTS:-}" ]]; then
     # shellcheck disable=SC2206  # intentional word-splitting for env-supplied flags