12 changes: 6 additions & 6 deletions .github/workflows/ci.yml
@@ -88,7 +88,7 @@ jobs:
- uses: actions/setup-java@v5
with:
distribution: temurin
-java-version: "21"
+java-version: "25"
cache: maven
- run: mvn checkstyle:check -B

@@ -157,7 +157,7 @@ jobs:
- uses: actions/setup-java@v5
with:
distribution: temurin
-java-version: "21"
+java-version: "25"
cache: maven
- run: mvn test -B -q
- name: Upload Java coverage
@@ -323,7 +323,7 @@ jobs:
- uses: actions/setup-java@v5
with:
distribution: temurin
-java-version: "21"
+java-version: "25"
- name: Run benchmarks
run: mvn test -B -Dtest=BenchmarkTest -pl . 2>&1 | tee benchmark-java.txt
- name: Write benchmark summary
@@ -349,7 +349,7 @@ jobs:
- uses: actions/setup-java@v5
with:
distribution: temurin
-java-version: "21"
+java-version: "25"
cache: maven
- name: Fuzz Config.load
run: mvn test -B -Dtest="JazzerFuzzTest#fuzzConfigLoad" -Djazzer.instrument="page.liam.pine.**"
@@ -376,7 +376,7 @@ jobs:
- uses: actions/setup-java@v5
with:
distribution: temurin
-java-version: "21"
+java-version: "25"
cache: maven
- uses: actions/setup-python@v6
with:
@@ -513,7 +513,7 @@ jobs:
- uses: actions/setup-java@v5
with:
distribution: temurin
-java-version: "21"
+java-version: "25"
cache: maven
- uses: actions/setup-python@v6
with:
2 changes: 1 addition & 1 deletion .github/workflows/daily-sanitized-fuzz.yml
@@ -73,7 +73,7 @@ jobs:
- uses: actions/setup-java@v5
with:
distribution: temurin
-java-version: "21"
+java-version: "25"
cache: maven

- name: Install C++ build deps
2 changes: 1 addition & 1 deletion .github/workflows/nightly-benchmark.yml
@@ -39,7 +39,7 @@ jobs:
- uses: actions/setup-java@v5
with:
distribution: temurin
-java-version: "21"
+java-version: "25"
cache: maven

- name: Install system deps
2 changes: 1 addition & 1 deletion .github/workflows/nightly-diff-fuzz.yml
@@ -44,7 +44,7 @@ jobs:
- uses: actions/setup-java@v5
with:
distribution: temurin
-java-version: "21"
+java-version: "25"
cache: maven

- name: Install C++ build deps
2 changes: 1 addition & 1 deletion .github/workflows/release.yml
@@ -62,7 +62,7 @@ jobs:
- uses: actions/setup-java@v5
with:
distribution: temurin
-java-version: "21"
+java-version: "25"
cache: maven
server-id: central
server-username: CENTRAL_USERNAME
2 changes: 1 addition & 1 deletion README-en.md
@@ -52,7 +52,7 @@ Python DSL (Apple) ──compile──> JSON Config
### Prerequisites

- Go 1.26+ (Pine-Go)
-- Java 21+ (Pine-Java)
+- Java 25+ (Pine-Java)
- Python 3.11+ (Apple DSL)

### 1. Write a Pipeline
2 changes: 1 addition & 1 deletion README.md
@@ -52,7 +52,7 @@ Python DSL (Apple) ──compile──> JSON Config
### Prerequisites

- Go 1.26+ (Pine-Go)
-- Java 21+ (Pine-Java)
+- Java 25+ (Pine-Java)
- Python 3.11+ (Apple DSL)
- CMake 3.20+ / C++23 compiler / LuaJIT (Pine-C++, optional)

10 changes: 10 additions & 0 deletions llmdoc/guides/benchmark-hygiene.md
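@@ -29,6 +29,16 @@
- Fresh builds differ by **±5-7% binary-layout noise** (drift in function addresses / alignment); QPS differences smaller than that cannot support a conclusion
- Differences inside the noise band must be cross-checked with `perf stat` microarchitectural counters: instructions / IPC / L1-icache-miss / branch-miss / context-switches. If both builds' counters are level, the result can be judged "no statistical difference"

+## stddev source calibration
+
+- The 33–36 ms stddev of the calibrated fixtures is **not a noise artifact but intrinsic jitter of the workload**; switching GC, tuning JVM flags, or changing the JIT backend will not tighten it
+- The three dominant sources (in order of contribution):
+1. **Concurrent scheduling jitter across DAG ops**: the ready-queue scheduling order inside the 38-op calibrated_itemlua DAG is nondeterministic, so per-request latency fluctuates by ~20 ms
+2. **Uneven distribution of LuaJ JIT warmup over the first N requests**: luajc compilation trigger points drift, so early requests have a longer latency tail
+3. **HTTP keepalive / server scheduling jitter**: microsecond-level fluctuations in the network stack and server-side work-stealing are amplified to milliseconds
+- **GC pauses are not among the main sources**: the measured pine-java G1 max STW pause of **12.82 ms** is far below the calibrated stddev of 35 ms, and the total STW budget is far smaller than stddev × the bench window. See the GC log measurements in the "ZGC measured data" section of `llmdoc/memory/decisions/pine-java-gc-choice.md`
+- **Corollary**: before diagnosing stddev, collect a GC log to confirm where it comes from. Trying to tighten stddev through GC tuning is the wrong direction (unless GC log data shows that STW dominates). The faulty chain "high stddev → suspect GC → switch GC" was already refuted once by the 2026-06-26 ZGC experiment A; do not repeat it
+
## Fixture representativeness

- `fixtures/benchmarks/realistic_for_you_calibrated*` is the production proxy (calibrated against real traffic, N≈10 rows) and is the **sole arbiter of performance decisions**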
4 changes: 3 additions & 1 deletion llmdoc/index.md
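@@ -22,7 +22,7 @@
- `llmdoc/guides/ci-quality-baseline.md` — CI engineering-quality baseline: architecture and onboarding conventions for lint (including Java checkstyle `failOnViolation=true` + `OneStatementPerLine`, and C++ clang-format) / test / coverage / fuzz / differential-fuzz / cross-validate / nightly cross-runtime benchmark / release-gate (including pine-cpp's 4 CI jobs and the cross-validate cpp binary injection path); the unified Makefile task entry system (top-level + `pine-go/` Makefiles wrapping fmt/lint/test/bench/codegen/version management across four languages, with CI and local runs sharing the same command sequence and `make bench` defaulting to the `pine_bench` tag); and the local `.githooks/` system (`pre-commit` staged-only format gate + `pre-push` project-level lint + self-wrapping CI watch).
- `llmdoc/guides/investigation-to-fix-testing.md` — Testing strategy from investigation to fix: choosing the test layer by defect type, the minimal-fix-surface principle, and a methodology for following upstream issues and applying temporary stopgaps (root-cause attribution across issues should not simply follow the follow-up wording; calibrate stopgap thresholds with probe measurements).
- `llmdoc/guides/cross-layer-validation.md` — Cross-layer semantic validation: JSON boundary type enumeration, codegen semantic verification (including the cross-engine markdown / Python artifact byte-equal gate), boundary-value E2E, implicit metadata contract detection, and extension-point parity verification (capability equivalence).
-- `llmdoc/guides/benchmark-hygiene.md` — Benchmark noise hygiene: pre/post-run load and leftover-process checks, same-day same-machine comparison discipline, ±5-7% binary-layout noise and perf stat cross-validation, fixture representativeness (calibrated is the sole arbiter of performance decisions), microbench access-pattern rules, per-op deletion attribution, and measurement-path symmetry (PureVM vs CallOnly vs Boundary results cannot be extrapolated to one another).
+- `llmdoc/guides/benchmark-hygiene.md` — Benchmark noise hygiene: pre/post-run load and leftover-process checks, same-day same-machine comparison discipline, ±5-7% binary-layout noise and perf stat cross-validation, source calibration for the 33-36 ms calibrated stddev (dominated by DAG scheduling jitter + LuaJ JIT warmup + network jitter; GC pauses are not a main source), fixture representativeness (calibrated is the sole arbiter of performance decisions), microbench access-pattern rules, per-op deletion attribution, and measurement-path symmetry (PureVM vs CallOnly vs Boundary results cannot be extrapolated to one another).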

## reference/

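@@ -107,7 +107,9 @@
- `llmdoc/memory/reflections/wangshu-borrow-optimization-survey.md` — Retrospective on the wangshu borrow/boundary optimization-space investigation (pure investigation + prototype quantification, no production code changed). It records that the deliberate symmetry of the borrow layer is by design; that the Arena column-track ABI boundary measures -22% (N=100) to -46% (N=3000) but is diluted end to end, and landing it would break four-engine parity; that makeArrayTable builds tables with sequential SetIndex appends in O(N²) (filed as wangshu #10); the measurement discipline that a Boundary microbenchmark ≠ the calibrated arbiter; and that non-monotonicity is a red flag whose root cause must be found first.
- `llmdoc/memory/reflections/wangshu-v020rc3-upgrade-and-workaround-refactor.md` — Retrospective on the wangshu v0.2.0-rc3 upgrade and on refactoring, removing, and migrating the criteria of two workarounds (fourth in the wangshu memory series). Upstream answered three issues at once: #9 with a truly equivalent fix, a choose-one-of-three API such as `MaybeCollectNow`; #10 with a true root-cause fix that moves the arena LARGE freelist from a singly linked first-fit list to power-of-2 buckets, correcting the previous post's mistaken "rehash storm" inference; and #11 partially, where `Arena.Compact()` fixes the transient peak, bump does not roll back, and the sustained-fat latch is left as a follow-up. Downstream, cadence-sweep is truly removed, the drop-fat-state criterion is migrated (`GCCountKB`→`ArenaCapKB`), and `makeArrayTable` is switched to `NewArrayTable`. It distills two rc-upgrade methods (always read issue close comments; when removing a workaround, distinguish root-cause fixes from proxy fixes). This post supersedes and corrects the third reflection and several places in the stable docs.
- `llmdoc/memory/reflections/redis-cascade-safety-and-observability.md` — PR retrospective (13 commits / 3 review rounds) on aligning the five Redis cascade-safety parameters (`{dial,read,write,pool}_timeout_ms` + `pool_size`) across three engines, pine-cpp Client failure convergence and SIGPIPE guard, cross-engine byte-equal codegen markdown, and per-command Redis metrics (`pine_redis_command_*` with 4-state status). It records five lessons: one-way codegen alignment (Go is the source of truth), the audit contract for silent degradation on failed paths, unit tests cannot replace the byte-equal gate, accepting review-driven scope expansion, and cpp error-type layering as a known follow-up.
+- `llmdoc/memory/reflections/jdk25-upgrade-and-zgc-investigation.md` — Retrospective on the 2026-06-26 evaluation of paths A/B/C for upgrading pine-java to JDK 25 and switching to ZGC. It records the lessons "measure performance hypotheses, don't guess" and "check that the deployment shape matches the GC shape", with G1/ZGC GC log data points (G1 max STW of 12.82 ms refutes the stddev = GC hypothesis; under a 2C cgroup, ZGC's concurrent work steals 6.7% of CPU, causing a 5-7% QPS drop).

## memory/decisions/

- `llmdoc/memory/decisions/perf-evolution-roadmap.md` — Engine-side performance evolution roadmap decision: two calibrated facts (the per-item VM boundary dominates; VM-layer speedups are diluted end to end, with itemlua as a second data point); a three-step roadmap (typed-ColumnFrame/arena → migrating load to common-mode column kernels, including the 2026-06-13 wangshu Arena column-track ABI boundary data point of -22%~-46% that is diluted end to end and breaks parity, so it is not landed immediately → step three, a pluggable VM adapter layer, triggered on 2026-06-13 with wangshu flipped to default); explicit non-goals (VM directly touching the Go heap; VM optimizations on simple-script workloads); the three AND gates for flipping a default (calibrated does not regress + affected scenarios win significantly + both tags fully green); and semantic gates tiered by switch scope.
+- `llmdoc/memory/decisions/pine-java-gc-choice.md` — pine-java GC selection decision: under the 4G heap / 2C cgroup / throughput-bound shape, keep the JDK 21+ default G1 and do not switch to ZGC/Shenandoah/Parallel. It includes the 2026-06-26 ZGC experiment A QPS table and G1/ZGC GC log measurements; root causes (G1 no longer has a long-pause problem + ZGC's concurrent CPU stealing is a net loss in a small-core cgroup + the dominant stddev sources are unrelated to GC); restart triggers (any of heap ≥16G / cores ≥8 / P99 ≤10 ms / G1 STW max >50 ms); and how it complements perf-evolution-roadmap as a separate layer.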
76 changes: 76 additions & 0 deletions llmdoc/memory/decisions/pine-java-gc-choice.md
@@ -0,0 +1,76 @@
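# pine-java GC selection decision

This document records the pine-java GC selection decision that was settled once the 2026-06-26 JDK 25 upgrade and ZGC evaluation closed out. It covers the "JVM process flags / GC selection" layer. It complements, and does not conflict with, `perf-evolution-roadmap.md` (engine side: typed-ColumnFrame / common-mode / VM adapter layer).

## Decision

**Keep the JDK 21+ default G1 GC**; do not switch to ZGC, Shenandoah, or Parallel. Reopen the selection only if the deployment shape later changes qualitatively and re-measurement yields clear positive evidence.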

## Current deployment shape

- Heap: 4 G max
- CPU: 2-core cgroup (isolated unit `pine-bench-server.unit`)
- Per-request latency: 100+ ms (38-op DAG + per-item Lua + stub I/O)
- Workload shape: **throughput-bound** (decisions are made on QPS), not latency-bound
- JVM: OpenJDK 25 runtime (starting with v0.10.10 the compile target was also raised to 25; see commit `62475e27`)
- GC: G1, the JDK 21+ default
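Calibrated numbers are only comparable under the same runtime shape. A bench run can therefore record which JDK feature release and which collector it actually ran with, rather than assuming them. JVM ergonomics can pick a different default collector on small machines, so checking at runtime is safer. The following is a minimal sketch using the standard `java.lang.management` API. The class name `GcShape` and the "collector bean name starts with `G1`" heuristic are assumptions for illustration, not part of pine-java.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Hypothetical helper that reports the JDK release and active collectors of the running JVM. */
public class GcShape {

    /** Feature release number of the running JVM, e.g. 25 on an OpenJDK 25 runtime. */
    static int javaFeatureVersion() {
        return Runtime.version().feature();
    }

    /** Collector MXBean names, e.g. "G1 Young Generation" and "G1 Old Generation" under G1. */
    static List<String> collectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    /** Heuristic (assumption): G1's collector MXBeans have names starting with "G1". */
    static boolean isG1(List<String> collectorNames) {
        return collectorNames.stream().anyMatch(name -> name.startsWith("G1"));
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        // availableProcessors() reflects container CPU limits on Linux, i.e. the cgroup's core count.
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.printf(Locale.ROOT, "java=%d collectors=%s g1=%b maxHeapMB=%d cpus=%d%n",
                javaFeatureVersion(), names, isG1(names), maxHeapMb, cpus);
    }
}
```

Running it with the same flags as the bench (for example `-Xmx4g`) and attaching the printed line to the report makes the GC shape explicit next to the calibrated numbers.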

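## ZGC measured data (2026-06-26)

Path A experiment: same machine, runs executed serially on the same fixtures, 10k requests × 16 concurrency; the G1 baseline and ZGC were each run on the 3 calibrated fixtures.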

**QPS**

| Fixture | G1 QPS | ZGC QPS | Δ |
| --- | --- | --- | --- |
| `calibrated_2c4g` | 127.9 | 120.9 | **−5.5%** |
| `calibrated` | 128.1 | 119.0 | **−7.1%** |
| `calibrated_itemlua` | 126.5 | 119.5 | **−5.5%** |

stddev was 33–36 ms, with no difference between G1 and ZGC.
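The Δ column is the relative QPS change of ZGC against the G1 baseline, (ZGC − G1) / G1. The small sketch below recomputes it from the table values so the rounding can be checked. The class and method names are illustrative only.

```java
import java.util.Locale;

/** Recomputes the Δ column of the QPS table: relative change of ZGC against the G1 baseline. */
public class QpsDelta {

    /** Percentage change of candidate relative to baseline, e.g. about -5.47 for 127.9 -> 120.9. */
    static double deltaPercent(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        String[] fixtures = {"calibrated_2c4g", "calibrated", "calibrated_itemlua"};
        double[] g1Qps = {127.9, 128.1, 126.5};
        double[] zgcQps = {120.9, 119.0, 119.5};
        for (int i = 0; i < fixtures.length; i++) {
            // One decimal place, as in the table; Locale.ROOT keeps "." as the decimal separator.
            System.out.printf(Locale.ROOT, "%-20s %+.1f%%%n",
                    fixtures[i], deltaPercent(g1Qps[i], zgcQps[i]));
        }
    }
}
```

It prints -5.5%, -7.1% and -5.5%, matching the table.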

**GC log measurements (verified on a single fixture)**

- G1: 729 pauses, avg **3.49 ms**, max **12.82 ms**, **0** pauses over 50 ms.
- ZGC: 753 STW pauses, avg **0.008 ms**, max **0.022 ms** (about 580× shorter than G1's max).
- ZGC concurrent phases: 108 events, **1087 ms** in total, 34 of them >10 ms (within the 80 s bench window).
- Under the 2-core cgroup, the 1087 ms of concurrent work ≈ **6.7% CPU stolen**, consistent with the 5-7% QPS drop.

Raw reports: `bench-results/report-20260625-113855.txt` (G1 baseline), `bench-results/report-20260625-114324.txt` (ZGC).
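Pause statistics of the kind listed above (count / avg / max / over 50 ms) can be recomputed from a GC log. The following is a minimal sketch. It assumes the log was written with JDK unified logging (for example `-Xlog:gc*:file=gc.log`), so that each completed pause line ends with its duration in `ms`. The format of the archived reports is not shown here, and the class name `GcPauseSummary` is hypothetical. Only lines that mention `Pause` and end in a duration are counted, so ZGC's concurrent phases are excluded.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: summarizes stop-the-world pauses recorded in a JDK unified-logging GC log. */
public class GcPauseSummary {

    // Assumed line shape: "... GC(5) Pause Young (Normal) (G1 Evacuation Pause) 52M->10M(256M) 3.490ms".
    private static final Pattern PAUSE_LINE =
            Pattern.compile("\\bPause\\b.*\\s(\\d+(?:\\.\\d+)?)ms\\s*$");

    record Summary(int count, double avgMs, double maxMs, int overThreshold) {}

    static Summary summarize(List<String> lines, double thresholdMs) {
        int count = 0;
        int over = 0;
        double totalMs = 0;
        double maxMs = 0;
        for (String line : lines) {
            Matcher m = PAUSE_LINE.matcher(line);
            if (!m.find()) {
                continue; // concurrent phases, "start" lines without a duration, other tags
            }
            double ms = Double.parseDouble(m.group(1));
            count++;
            totalMs += ms;
            maxMs = Math.max(maxMs, ms);
            if (ms > thresholdMs) {
                over++;
            }
        }
        return new Summary(count, count == 0 ? 0 : totalMs / count, maxMs, over);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java GcPauseSummary.java <gc.log>");
            return;
        }
        // 50 ms mirrors the G1 worst-case pause threshold used in this document.
        System.out.println(summarize(Files.readAllLines(Path.of(args[0])), 50.0));
    }
}
```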

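## Root causes

1. **G1 no longer has a long-pause problem**: max STW 12.82 ms ≪ the calibrated stddev of 35 ms, and the total STW budget is far smaller than the stddev. The 35 ms stddev **does not come from GC at all**, so switching to any other GC cannot tighten it.
2. **ZGC's trade-off loses in a small-core cgroup**: ZGC trades STW time for concurrent CPU work. Under the 2-core cgroup, concurrent phases steal 6.7% of CPU, which shows up directly as a 5-7% QPS drop. The 580× STW improvement is invisible to requests that take 100+ ms.
3. **ZGC's sweet spot is the opposite of the current shape**: ZGC pays off with ≥16 G heaps / ≥8 cores / 1–10 ms requests / latency-sensitive SLOs; pine-java's current 4 G / 2 cores / 100+ ms / throughput-bound shape is the exact opposite.
4. **The 35 ms calibrated stddev is unrelated to GC**: the measured dominant sources are 38-op DAG scheduling jitter + LuaJ JIT warmup + network jitter, which refutes the prior hypothesis that "ZGC tightens stddev". See "stddev source calibration" in `llmdoc/guides/benchmark-hygiene.md`.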

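## Triggers for reopening the selection

Any one of the following triggers a new GC selection evaluation:

- **Deployment shape changes**: heap ≥16 G **or** core count ≥8 **or** per-request P99 target ≤10 ms (a latency-bound SLO appears)
- **G1 worst-case pauses get out of hand**: production data shows a G1 STW max > 50 ms
- **The dominant source of calibrated stddev shifts to GC**: a re-measured GC log shows the total STW contribution approaching the magnitude of the stddev (no trigger while STW total stays far below stddev × bench duration, as it does now)

When reopening, reuse the `JAVA_BENCH_OPTS` hook introduced in commit `c157242a` (the script-side entry point for injecting JVM flags) to rerun the experiment A comparison directly.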
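The trigger list can be encoded as a check run against each new deployment snapshot. The following is a minimal sketch. The thresholds are the ones listed above, while the `Inputs` record and the class name are hypothetical. The document gives no numeric cut-off for "STW total approaching stddev", so the sketch leaves that judgment to the caller as a boolean.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the "reopen GC selection" check; any returned trigger means re-run the evaluation. */
public class GcReselectionCheck {

    /** Hypothetical inputs; p99TargetMs is null when there is no latency SLO. */
    record Inputs(double heapGb, int cores, Double p99TargetMs,
                  double g1MaxStwMs, boolean stwNearStddev) {}

    static List<String> firedTriggers(Inputs in) {
        List<String> fired = new ArrayList<>();
        // Trigger 1: deployment shape change (any single sub-condition is enough).
        if (in.heapGb() >= 16) {
            fired.add("heap >= 16G");
        }
        if (in.cores() >= 8) {
            fired.add("cores >= 8");
        }
        if (in.p99TargetMs() != null && in.p99TargetMs() <= 10) {
            fired.add("P99 target <= 10 ms");
        }
        // Trigger 2: G1 worst-case pause out of control in production data.
        if (in.g1MaxStwMs() > 50) {
            fired.add("G1 max STW > 50 ms");
        }
        // Trigger 3: GC log shows STW total approaching the stddev (judged by the caller).
        if (in.stwNearStddev()) {
            fired.add("stddev dominated by GC");
        }
        return fired;
    }

    public static void main(String[] args) {
        // Current shape from this document: 4G heap, 2 cores, no latency SLO, G1 max STW 12.82 ms.
        Inputs current = new Inputs(4, 2, null, 12.82, false);
        System.out.println(firedTriggers(current)); // prints [] -> keep G1
    }
}
```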

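## Experiments not worth running

- `-XX:+UseSerialGC`: single-threaded, so it is certain to lose; no comparison value.
- Shenandoah: the same class of concurrent trade-off as ZGC. It is not shipped by default with OpenJDK 25 Temurin, would add an extra dependency, and is expected to regress in the same direction as ZGC.
- Parallel GC: throughput-only, no concurrent class unloading, and similar in kind to G1 but older; switching brings no gain.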

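## Relationship to perf-evolution-roadmap

- `llmdoc/memory/decisions/perf-evolution-roadmap.md` scopes **engine-side** evolution (typed-ColumnFrame / common-mode column kernels / VM adapter layer). Its performance assumptions rest on a stable runtime-environment layer.
- This decision scopes the **runtime-environment layer** (JVM process flags / GC selection), which is the foundation the roadmap's assumptions stand on.
- The two are complementary, not conflicting. Calibrated data for any engine-side optimization must state the GC in use (currently G1), and comparisons across GCs cannot be applied directly.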

## References

- Full retrospective: `llmdoc/memory/reflections/jdk25-upgrade-and-zgc-investigation.md`
- Compile target upgrade: commit `62475e27` (pom + CI + README target 21→25)
- JVM flag experiment hook: commit `c157242a` (`JAVA_BENCH_OPTS` environment variable)
- Data archive: `bench-results/report-20260625-113855.txt` (G1), `bench-results/report-20260625-114324.txt` (ZGC)
- stddev source calibration: `llmdoc/guides/benchmark-hygiene.md`