From b4bc0c56b6ea7748aa59ca9ee98991a151b49499 Mon Sep 17 00:00:00 2001
From: Liam <liamhuang0205@gmail.com>
Date: Thu, 25 Jun 2026 09:09:35 +0800
Subject: [PATCH 1/4] docs(readme): drop v0.7 migration section (legacy
 baggage is now stale)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

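v0.7 moved the Go engine into the pine-go/ subdirectory, renamed
row_dependency to consumes_row_set, replaced barriers with marker
interfaces, and flipped the Field Accessor tri-state default. As of
v0.10 those breaking changes are 3 minor and 14 patch releases behind
us, and production users have long since migrated. Keeping the section
in the main README only lets 80 lines of historical changes block the
first screen, so new readers cannot find the core features.

The migration guide itself is still valid, but its value as in-README
archive material is exhausted: git log, CHANGELOG.md, or design_doc
cover it. design_doc still holds the full semantic description (the
consumes_row_set DSL field in 05_operator_types.md, the registration
forms in 04_operator_registration.md, and the DAG scheduling model in
03_xxx), so no knowledge is lost.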
---
 README-en.md | 83 ---------------------------------------------------
 README.md    | 84 ----------------------------------------------------
 2 files changed, 167 deletions(-)

diff --git a/README-en.md b/README-en.md
index 8d182bcf..2aa004ac 100644
--- a/README-en.md
+++ b/README-en.md
@@ -46,89 +46,6 @@ Python DSL (Apple)  ──compile──>  JSON Config
 - **Tri-engine consistency** — Go/Java/C++ engines verified via CI cross-validation for schema, DAG, execution, error, server, and metrics parity
 - **Pine-C++ benchmark runtime** — Complete third runtime with operator parity, HTTP server (hot reload / graceful shutdown), ColumnFrame/RowFrame dual physical layouts, lazy OperatorInput projection, LuaJIT integration, metrics/resource parity
 
-## Migrating from Older Versions (Breaking Change)
-
-> Starting from v0.7, the Go engine has moved from the repository root into the `pine-go/` subdirectory. The Go module path has changed accordingly.
-
-### What Changed
-
-| Item | Before | After |
-|------|--------|-------|
-| Module path | `github.com/Liam0205/pineapple` | `github.com/Liam0205/pineapple/pine-go` |
-| Import | `github.com/Liam0205/pineapple/internal/...` | `github.com/Liam0205/pineapple/pine-go/internal/...` |
-| Import | `github.com/Liam0205/pineapple/pkg/...` | `github.com/Liam0205/pineapple/pine-go/pkg/...` |
-| Import | `github.com/Liam0205/pineapple/operators` | `github.com/Liam0205/pineapple/pine-go/operators` |
-| Binary | `go build ./cmd/pineapple-server` | `go build ./pine-go/cmd/pineapple-server` |
-
-### Migration Steps
-
-```bash
-# 1. Bulk-replace import paths
-find . -name '*.go' -exec sed -i \
-  's|github.com/Liam0205/pineapple/|github.com/Liam0205/pineapple/pine-go/|g' {} +
-
-# 2. Fix double-nesting if you referenced the module itself
-find . -name '*.go' -exec sed -i \
-  's|github.com/Liam0205/pineapple/pine-go/pine-go/|github.com/Liam0205/pineapple/pine-go/|g' {} +
-
-# 3. Update go.mod
-go get github.com/Liam0205/pineapple/pine-go@latest
-go mod tidy
-```
-
-If your project uses Pineapple through public APIs (`pine.NewEngine`, `pine.BuildOperator`, etc.), the above steps complete the migration.
-
-### Configuration & Runtime Semantic Changes
-
-The following changes affect JSON configuration and operator runtime behavior:
-
-#### 1. `row_dependency` Renamed to `consumes_row_set`
-
-The `"row_dependency": true` field in operator JSON config has been removed. Use `"consumes_row_set": true` instead (same semantics: marks the operator as needing a stable row set before execution).
-
-```diff
- {
-   "type_name": "transform_size",
--  "row_dependency": true,
-+  "consumes_row_set": true,
-   "$metadata": { ... }
- }
-```
-
-Apple DSL side: `OpCall(..., row_dependency=True)` → `OpCall(..., consumes_row_set=True)`.
-
-#### 2. DAG Scheduling Model: Barriers → Row-Set Marker Interfaces
-
-Previously, Filter/Merge/Reorder operators acted as "barriers" — all predecessors had to complete before them, and all successors had to wait.
-
-The new model uses three marker interfaces for precise row-set dependency declaration:
-
-| Marker | Meaning | Typical Operators |
-|--------|---------|-------------------|
-| `ConsumesRowSet` | Iterates all items; needs row set stable | filter_*, merge_*, reorder_*, transform_size |
-| `MutatesRowSet` | Removes or reorders items | filter_*, merge_*, reorder_* |
-| `AdditiveWritesRowSet` | Appends items (parallel with other appenders) | recall_* |
-
-**Impact**: Transform operators that only touch common fields are no longer blocked by barriers and can execute in parallel with Filter/Merge/Reorder. This improves parallelism without changing final results — correctness is guaranteed by field-level data hazard analysis.
-
-**Custom operator migration**: If you implemented a custom Recall-type operator, embed `types.AdditiveWritesRowSetMarker`.
-
-#### 3. Field Accessor Strict Mode
-
-`BuildInput` now distinguishes Strict vs. Defaulted fields:
-
-- **Strict** (fields without a `common_defaults` / `item_defaults` entry): errors immediately at runtime if the value is nil, instead of passing nil to the operator
-- **Defaulted** (fields with a default): substitutes the default when the value is nil or missing
-
-**Impact**: If your pipeline relies on "nil passthrough to operator for self-handling", add a `common_defaults` or `item_defaults` entry for that field (value can be `null`) to preserve the old behavior:
-
-```json
-{
-  "$metadata": { "common_input": ["optional_field"], ... },
-  "common_defaults": { "optional_field": null }
-}
-```
-
 ## Quick Start
 
 ### Prerequisites
diff --git a/README.md b/README.md
index 91a4b7cf..7c89788a 100644
--- a/README.md
+++ b/README.md
@@ -46,90 +46,6 @@ Python DSL (Apple)  ──compile──>  JSON Config
 - **三引擎一致性** — Go/Java/C++ 引擎通过 CI 交叉验证保证 schema、DAG、执行结果、错误消息一致
 - **Pine-C++ 标杆运行时** — 完整第三运行时，内置算子与 Go/Java 完全对等、HTTP server（热加载/graceful shutdown）、ColumnFrame/RowFrame 双物理实现、OperatorInput lazy 投影、LuaJIT 集成、metrics/resource 对等
 
-## 从旧版迁移（Breaking Change）
-
-> 自 v0.7 起，Go 引擎从仓库根目录迁移至 `pine-go/` 子目录，Go module path 随之变更。
-
-### 变更内容
-
-| 项目 | 迁移前 | 迁移后 |
-|------|--------|--------|
-| Module path | `github.com/Liam0205/pineapple` | `github.com/Liam0205/pineapple/pine-go` |
-| Import | `github.com/Liam0205/pineapple/internal/...` | `github.com/Liam0205/pineapple/pine-go/internal/...` |
-| Import | `github.com/Liam0205/pineapple/pkg/...` | `github.com/Liam0205/pineapple/pine-go/pkg/...` |
-| Import | `github.com/Liam0205/pineapple/operators` | `github.com/Liam0205/pineapple/pine-go/operators` |
-| Binary | `go build ./cmd/pineapple-server` | `go build ./pine-go/cmd/pineapple-server` |
-
-### 下游迁移步骤
-
-```bash
-# 1. 批量替换 import path
-find . -name '*.go' -exec sed -i \
-  's|github.com/Liam0205/pineapple/|github.com/Liam0205/pineapple/pine-go/|g' {} +
-
-# 2. 修正 module 自身的引用（避免多余的 pine-go/pine-go）
-find . -name '*.go' -exec sed -i \
-  's|github.com/Liam0205/pineapple/pine-go/pine-go/|github.com/Liam0205/pineapple/pine-go/|g' {} +
-
-# 3. 更新 go.mod
-go get github.com/Liam0205/pineapple/pine-go@latest
-go mod tidy
-```
-
-如果你的项目通过 `pine.NewEngine` / `pine.BuildOperator` 等公共 API 使用 Pineapple，上述步骤即可完成迁移。
-
-### 配置与运行时语义变更
-
-以下变更影响 JSON 配置和算子运行时行为：
-
-#### 1. `row_dependency` 重命名为 `consumes_row_set`
-
-JSON 配置中算子的 `"row_dependency": true` 字段已移除，改用 `"consumes_row_set": true`（语义不变：标记算子需要等待行集稳定后才执行）。
-
-```diff
- {
-   "type_name": "transform_size",
--  "row_dependency": true,
-+  "consumes_row_set": true,
-   "$metadata": { ... }
- }
-```
-
-Apple DSL 侧同步变更：`OpCall(..., row_dependency=True)` → `OpCall(..., consumes_row_set=True)`。
-
-#### 2. DAG 调度模型变更：barrier → row-set marker interfaces
-
-旧模型中 Filter/Merge/Reorder 算子被视为"barrier"——在它们执行前所有前驱必须完成，所有后继必须等它完成。
-
-新模型通过三个 marker interface 精确声明 row-set 依赖：
-
-| Marker | 含义 | 典型算子 |
-|--------|------|----------|
-| `ConsumesRowSet` | 迭代所有 item，需要行集稳定 | filter_*, merge_*, reorder_*, transform_size |
-| `MutatesRowSet` | 删除或重排 item | filter_*, merge_*, reorder_* |
-| `AdditiveWritesRowSet` | 追加 item（与其他追加者并行） | recall_* |
-
-**影响**：仅操作 common 字段的 Transform 算子不再被 barrier 阻塞，可与 Filter/Merge/Reorder 并行执行。这提升了并行度但不改变最终结果——正确性由字段级数据冒险分析保证。
-
-**自定义算子迁移**：如果你实现了自定义的 Recall 类型算子，需要嵌入 `types.AdditiveWritesRowSetMarker`。
-
-#### 3. Field Accessor 三态模型
-
-`BuildInput` 现在支持三种字段模式：
-
-- **Nullable**（默认）：字段缺失时报错，值为 nil 时透传给算子
-- **Strict**（通过 `strict_common` / `strict_item` 声明）：值为 nil 时立即报错
-- **Defaulted**（通过 `common_defaults` / `item_defaults` 声明）：值为 nil 或缺失时替换为默认值
-
-**影响**：v0.9.0 起默认模式从 Strict 变为 Nullable。如果你的流水线依赖"nil 值必须报错"的行为，需要在配置中声明 `strict_common` / `strict_item`：
-
-```json
-{
-  "$metadata": { "common_input": ["required_field"], ... },
-  "strict_common": ["required_field"]
-}
-```
-
 ## Quick Start
 
 ### 环境要求

From a0bc690c9a6735639601ce4517f75fb95eddf7a3 Mon Sep 17 00:00:00 2001
From: Liam <liamhuang0205@gmail.com>
Date: Thu, 25 Jun 2026 09:10:32 +0800
Subject: [PATCH 2/4] docs(readme): refresh core features (wangshu / Redis
 cascade-safety / /stats fan-out)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Core features list was last refreshed pre-v0.10; this batch syncs to
v0.10.9 reality:

- Lua: explicit pine-go default = wangshu (with build-tag escape to
  gopher-lua), pine-java = LuaJC, pine-cpp = LuaJIT. The default flip
  landed in v0.10 series.
- Resources: split into data-typed (snapshot) vs handle-typed (borrow +
  RAII teardown) — `redis_connection` is the canonical handle-typed
  resource. The old one-liner "background-refreshed in-memory resource
  manager" hides the architecture.
- Redis: add dedicated bullet for the 5 cascade-safety params and the
  4-state per-command metrics + fail-on-error contract. These shipped
  in #137 and are production-load-bearing.
- Observability: /stats is no longer just a single endpoint — call out
  /stats.http and /stats.resources sub-trees so readers can find the
  resource fan-out and HTTP middleware metrics.
- Cross-validation bullet: name the actual verification surface (19
  cross-validate sections + differential fuzz + daily sanitized fuzz)
  instead of the vague "verified for schema/DAG/exec/error parity".

EN side kept structurally aligned with the CN edits.
---
 README-en.md | 9 +++++----
 README.md    | 9 +++++----
 2 files changed, 10 insertions(+), 8 deletions(-)
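
The brace shorthand in the new Redis bullet can be spelled out as
below. This is a minimal Python sketch, not pine's API: the helper
names and the example config values are illustrative assumptions; only
the five parameter names and the four status strings come from the
README text.

```python
"""Spell out `{dial,read,write,pool}_timeout_ms` + `pool_size`.

Helper names and example values are hypothetical; only the parameter
names and the four status strings are taken from the README bullet.
"""

# Four timeout knobs plus the pool size: the five cascade parameters.
CASCADE_PARAMS = [f"{kind}_timeout_ms" for kind in ("dial", "read", "write", "pool")]
CASCADE_PARAMS.append("pool_size")

# The 4-state status label carried by the per-command metrics.
COMMAND_STATUSES = ("ok", "timeout", "pool_timeout", "error")


def missing_cascade_params(config: dict) -> list[str]:
    """Return the cascade parameters that a resource config leaves unset."""
    return [name for name in CASCADE_PARAMS if name not in config]


if __name__ == "__main__":
    # Placeholder values for illustration only, not recommended settings.
    example = {"dial_timeout_ms": 100, "read_timeout_ms": 50, "pool_size": 8}
    print(CASCADE_PARAMS)
    print(missing_cascade_params(example))
```

A check of this kind is one way to notice a config that sets only some
of the timeouts; whether pine itself validates this is not stated here.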

diff --git a/README-en.md b/README-en.md
index 2aa004ac..1fce22c2 100644
--- a/README-en.md
+++ b/README-en.md
@@ -38,12 +38,13 @@ Python DSL (Apple)  ──compile──>  JSON Config
 - **Implicit graph construction** — Operators declare input/output fields; engine infers DAG dependencies with transitive reduction
 - **Lock-free parallelism** — Independent operators in the DAG execute in parallel automatically
 - **Compile-time validation** — Dead code, missing fields, write-after-write detected before deployment
-- **Embedded Lua** — Built-in Lua operators for lightweight custom computation. End-to-end overhead ~1.2-2x; isolated operator-level overhead varies by runtime and compute complexity (C++/LuaJIT ~3-5x, Java ~2-9x, Go ~6-17x) — write native operators for compute-heavy hot paths
+- **Embedded Lua** — Built-in Lua operators for lightweight custom computation. pine-go defaults to [wangshu](https://github.com/Liam0205/wangshu) (pure-Go Lua 5.1 VM, NaN-boxing + arena GC); switch back to gopher-lua via `-tags=lua_gopher`. pine-java uses LuaJC (bytecode compilation), pine-cpp uses LuaJIT. End-to-end overhead ~1.2-2x; isolated operator-level overhead varies by runtime and compute complexity (C++/LuaJIT ~3-5x, Java ~2-9x, Go ~6-17x) — write native operators for compute-heavy hot paths
 - **Hot config reload** — Service automatically reloads engine config without downtime
-- **Dynamic resources** — Background-refreshed in-memory resource manager with lock-free reads
-- **White-box observability** — Operator-level traces, `/stats` endpoint, pluggable Prometheus interface
+- **Dynamic resources** — Two-channel resource manager: **data-typed** (e.g. static dict / real-time feature store, snapshot-exported lock-free reads) + **handle-typed** (e.g. `redis_connection`, borrow lease + RAII teardown); background-refreshed
+- **Redis cascade-safety** — The `redis_connection` resource exposes 5 cascade params (`{dial,read,write,pool}_timeout_ms` + `pool_size`); per-command metrics `pine_redis_command_*` with 4-state status (ok / timeout / pool_timeout / error), fail-on-error silent-degradation contract
+- **White-box observability** — Operator-level traces; the `/stats` composite response includes `/stats.http` (request-level 4-state metrics) + `/stats.resources` (resource pool / probe / per-command 4-state categories); pluggable Prometheus interface
 - **Row/Column storage** — DataFrame supports both storage modes
-- **Tri-engine consistency** — Go/Java/C++ engines verified via CI cross-validation for schema, DAG, execution, error, server, and metrics parity
+- **Tri-engine consistency** — Go/Java/C++ engines verified byte-exactly via CI cross-validation (19 sections + tri-engine differential fuzz + daily ASan/TSan sanitized fuzz)
 - **Pine-C++ benchmark runtime** — Complete third runtime with operator parity, HTTP server (hot reload / graceful shutdown), ColumnFrame/RowFrame dual physical layouts, lazy OperatorInput projection, LuaJIT integration, metrics/resource parity
 
 ## Quick Start
diff --git a/README.md b/README.md
index 7c89788a..1245bcd0 100644
--- a/README.md
+++ b/README.md
@@ -38,12 +38,13 @@ Python DSL (Apple)  ──compile──>  JSON Config
 - **隐式构图** — 算子声明输入/输出字段，引擎自动推导 DAG 依赖并执行传递性归约
 - **无锁并行** — DAG 中无依赖的算子自动并行执行
 - **编译期校验** — 死代码、字段缺失、写后未读等问题在部署前拦截
-- **Lua 嵌入** — 内置 Lua 算子支持轻量自定义计算。端到端开销约 1.2-2x;隔离算子级开销随运行时与计算复杂度变化（C++/LuaJIT 约 3-5x、Java 约 2-9x、Go 约 6-17x），计算密集型热路径建议写原生算子
+- **Lua 嵌入** — 内置 Lua 算子支持轻量自定义计算。pine-go 默认 [wangshu](https://github.com/Liam0205/wangshu)（纯 Go Lua 5.1 VM，NaN-boxing + arena GC），可通过 `-tags=lua_gopher` 切回 gopher-lua；pine-java 用 LuaJC（字节码编译），pine-cpp 用 LuaJIT。端到端开销约 1.2-2x；隔离算子级开销随运行时与计算复杂度变化（C++/LuaJIT 约 3-5x、Java 约 2-9x、Go 约 6-17x），计算密集型热路径建议写原生算子
 - **配置热加载** — 服务运行时自动无停机重载引擎配置
-- **动态资源** — 后台定时刷新的内存资源管理器，无锁读
-- **白盒可观测** — 算子级 trace、`/stats` 端点、可插拔 Prometheus 接口
+- **动态资源** — 双通道资源管理：**数据型**（如静态 dict / 实时 feature store，snapshot 导出后无锁读）+ **句柄型**（如 `redis_connection`，borrow 借用 + RAII 拆除）；后台定时刷新
+- **Redis cascade-safety** — `redis_connection` 资源暴露 `{dial,read,write,pool}_timeout_ms` + `pool_size` 五参数，per-command 指标 `pine_redis_command_*`（4-state status：ok / timeout / pool_timeout / error），fail-on-error 静默降级契约
+- **白盒可观测** — 算子级 trace；`/stats` 组合响应含 `/stats.http`（请求级 4-state 指标）+ `/stats.resources`（资源池连接池/探针/per-command 4 状态分类）；可插拔 Prometheus 接口
 - **行存/列存可切换** — DataFrame 支持两种存储模式
-- **三引擎一致性** — Go/Java/C++ 引擎通过 CI 交叉验证保证 schema、DAG、执行结果、错误消息一致
+- **三引擎一致性** — Go/Java/C++ 引擎通过 CI 交叉验证保证 schema、DAG、执行结果、错误消息字节级一致（19 section cross-validate + 三引擎差分 fuzz + 每日 ASan/TSan sanitized fuzz）
 - **Pine-C++ 标杆运行时** — 完整第三运行时，内置算子与 Go/Java 完全对等、HTTP server（热加载/graceful shutdown）、ColumnFrame/RowFrame 双物理实现、OperatorInput lazy 投影、LuaJIT 集成、metrics/resource 对等
 
 ## Quick Start

From d22b1de9c81aa56d1d46decd65a677e4796f454c Mon Sep 17 00:00:00 2001
From: Liam <liamhuang0205@gmail.com>
Date: Thu, 25 Jun 2026 09:12:38 +0800
Subject: [PATCH 3/4] docs(readme): add Makefile / githooks / llmdoc + daily
 sanitized fuzz CI row
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The README has grown out of sync with how dev work actually happens now.
Three additions:

1. **Makefile section** at the top of the dev block. The top-level
   Makefile + pine-go/Makefile are the actual unified entry — CI and
   local share the exact verb sequence. Up to now scripts/ was the only
   surface readers saw, which understates the project's task plumbing.
   Cross-checked the listed verbs against `make` output: all exist.

2. **Local Git Hooks section**. .githooks/{pre-commit,pre-push} ships
   in-tree and provides three concrete ergonomic wins: a staged-only
   format gate (no surprise overwrites), four-language lint on push,
   and the auto `--set-upstream` relay that landed in pine #139
   (absorbed from wangshu#24 / ctex-kit#888). Without docs, first-time
   contributors never set `core.hooksPath` and lose the lint gate.

3. **Daily sanitized fuzz** added to the CI list — promoted from weekly
   to daily in ef24382c (#109) and load-bearing for race / memory-bug
   surveillance separate from the per-push fuzz fast lane.

Also added `llmdoc/` to the Documentation table since it is now the
canonical AI-collaboration knowledge map and constantly referenced from
issue comments. clang-format -Werror clarified as the actual C++ lint
form (the bare "-Werror" was ambiguous).

EN side kept structurally aligned.
---
 README-en.md | 35 +++++++++++++++++++++++++++++++++--
 README.md    | 35 +++++++++++++++++++++++++++++++++--
 2 files changed, 66 insertions(+), 4 deletions(-)
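
To make "staged-only format gate" concrete, the idea can be sketched
as follows. This is a hedged Python illustration, not the contents of
.githooks/pre-commit: the extension list, function names, and grouping
strategy are assumptions. The only git invocation used is
`git diff --cached --name-only --diff-filter=ACM`, which lists added,
copied, and modified paths in the index and so never looks at unstaged
work.

```python
"""Sketch of a staged-only format gate (not the repository's actual hook)."""
import subprocess
from pathlib import PurePosixPath

# Extension -> formatter, limited to the three tools the pre-commit bullet
# names (gofmt / clang-format / ruff). The extension list is an assumption.
FORMATTERS = {
    ".go": "gofmt",
    ".py": "ruff",
    ".cc": "clang-format",
    ".cpp": "clang-format",
    ".h": "clang-format",
    ".hpp": "clang-format",
}


def formatter_for(path: str):
    """Return the formatter name for a path, or None if no gate applies."""
    return FORMATTERS.get(PurePosixPath(path).suffix)


def staged_files() -> list[str]:
    """List added/copied/modified files in the index, ignoring unstaged work."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def plan(paths) -> dict:
    """Group paths by formatter so each tool would be invoked once."""
    groups = {}
    for path in paths:
        tool = formatter_for(path)
        if tool is not None:
            groups.setdefault(tool, []).append(path)
    return groups


if __name__ == "__main__":
    # Demo with a fixed list; a real hook would pass staged_files() instead.
    print(plan(["pine-go/pkg/a.go", "apple/b.py", "README.md", "pine-cpp/c.cpp"]))
```

Grouping by tool keeps the gate cheap on large commits; files with no
matching formatter (such as Markdown here) simply pass through.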

diff --git a/README-en.md b/README-en.md
index 1fce22c2..98749a09 100644
--- a/README-en.md
+++ b/README-en.md
@@ -157,8 +157,28 @@ pineapple/
 
 ## Development
 
+### Top-level Make Targets
+
+Cross-language fmt / lint / test / bench / codegen / version management is unified behind the top-level `Makefile` (with `pine-go/Makefile` for Go-specific work). CI and local dev share the same command sequence.
+
+| Make target | Purpose |
+|---|---|
+| `make fmt` | Format all four languages (gofmt / google-java-format / clang-format / ruff) |
+| `make lint` | Lint all four languages (incl. checkstyle `failOnViolation=true`, clang-format `-Werror`) |
+| `make test` | Full test suite across runtimes |
+| `make bench` | Default `pine_bench` tag |
+| `make bench-cross-runtime` | Cross-engine fixture-driven benchmark (cgroup-isolated) |
+| `make bench-lua-backends` | wangshu vs gopher-lua, same-host serial + benchstat |
+| `make differential-fuzz` | Tri-engine differential fuzz |
+| `make cross-validate` | Tri-engine consistency verification |
+| `make codegen` | Generate `apple_generated/` + `doc/operators/` from pine-go Registry |
+| `make codegen-check` | CI: codegen + `git diff --exit-code` to enforce artifact freshness |
+| `make check-pr-ci` | Watch CI status of the current branch's PR (pre-push hook calls this) |
+
 ### Scripts
 
+`scripts/` holds the actual implementations behind the Make targets and can be invoked standalone:
+
 | Script | Purpose |
 |--------|---------|
 | `scripts/go-test.sh` | Run all Go tests |
@@ -168,6 +188,7 @@ pineapple/
 | `scripts/go-bench.sh` | Go benchmarks |
 | `scripts/java-bench.sh` | Java benchmarks |
 | `scripts/bench-cross-runtime.sh` | Cross-engine HTTP server benchmark (fixture-driven, cgroup-isolated) |
+| `scripts/bench-lua-backends.sh` | wangshu vs gopher-lua backend comparison (benchstat delta) |
 | `scripts/go-fuzz.sh` | Go fuzz testing |
 | `scripts/java-fuzz.sh` | Java fuzz testing |
 | `scripts/differential-fuzz.sh` | Tri-engine differential fuzzing (random pipelines, output diff) |
@@ -178,16 +199,25 @@ pineapple/
 | `scripts/render-dag.sh` | DAG visualization (`--backend go\|java`) |
 | `scripts/apple-compile.sh` | Compile Apple DSL to JSON |
 | `scripts/run-pipeline.sh` | One-shot pipeline execution |
-| `scripts/bump-version.sh` | Synchronize version across all components |
+| `scripts/bump-version.sh` | Synchronize version across all components (incl. pine-cpp `kVersion`) |
+| `scripts/check-pr-ci.sh` | Watch CI status of the current branch's PR (pre-push hook invokes this) |
+
+### Local Git Hooks
+
+`.githooks/` ships with the repository; activate via `git config core.hooksPath .githooks` once after clone:
+
+- **`pre-commit`** — staged-only format gate (gofmt / clang-format / ruff); does not touch unstaged work
+- **`pre-push`** — project-level lint (four-language fail-on-violation) + self-wrapped post-push CI watcher (auto-runs `check-pr-ci.sh` after the actual push) + auto `--set-upstream` relay (first-push of a new branch does not need a manual `-u`)
 
 ### CI Pipeline
 
 CI runs automatically on every push/PR:
 
-- **Lint** — Go (golangci-lint), Java (checkstyle, failOnViolation=true), Python (ruff), C++ (-Werror)
+- **Lint** — Go (golangci-lint), Java (checkstyle, failOnViolation=true), Python (ruff), C++ (clang-format -Werror)
 - **Test** — Full Go/Java/Apple/C++ test suites with coverage
 - **Sanitizer** — C++ ASan/UBSan smoke + ThreadSanitizer stress
 - **Fuzz** — Go/Java fuzz + tri-engine differential fuzzing
+- **Daily sanitized fuzz** — Daily (12:00 UTC+8) ASan/TSan differential fuzz, 3000+2000 rounds, dedicated to race / memory-bug deep diagnostics (independent of the per-push fast lane)
 - **Benchmark** — Go/Java performance benchmarks
 - **Cross-validation** — Tri-engine schema/DAG/execution/error/server/metrics parity
 - **Codegen check** — Ensures generated code is in sync with source
@@ -347,6 +377,7 @@ Highlights:
 | Operator development | [`doc/guide_operator-en.md`](doc/guide_operator-en.md) — Go operator development guide |
 | Third-party extensions | [`design_doc/12_distribution-en.md`](design_doc/12_distribution-en.md) — Add custom operators without modifying source |
 | API reference | [`doc/api-en.md`](doc/api-en.md) — HTTP endpoint documentation |
+| LLM retrieval docs | [`llmdoc/`](llmdoc/) — Stable knowledge map for AI collaboration (architecture / decisions / reflections / index) |
 
 ## License
 
diff --git a/README.md b/README.md
index 1245bcd0..932275bf 100644
--- a/README.md
+++ b/README.md
@@ -172,8 +172,28 @@ pineapple/
 - **Cross-validate**：全 section 接入，三引擎一致性验证
 
 
+### 开发任务入口（Makefile）
+
+跨四语言的 fmt / lint / test / bench / codegen / 版本管理统一通过顶层 `Makefile` + `pine-go/Makefile` 暴露，CI 与本地共用同一命令序列。常用 verb：
+
+| Make 目标 | 用途 |
+|---|---|
+| `make fmt` | 四语言格式化（gofmt / google-java-format / clang-format / ruff） |
+| `make lint` | 四语言 lint（含 checkstyle `failOnViolation=true`、clang-format `-Werror`） |
+| `make test` | 全引擎测试 |
+| `make bench` | 默认 `pine_bench` tag |
+| `make bench-cross-runtime` | 跨引擎 fixture 驱动 benchmark（cgroup 隔离） |
+| `make bench-lua-backends` | wangshu vs gopher-lua 同机串行连跑 + benchstat |
+| `make differential-fuzz` | 三引擎差分 fuzz |
+| `make cross-validate` | 跨引擎一致性验证 |
+| `make codegen` | 从 pine-go Registry 生成 `apple_generated/` + `doc/operators/` |
+| `make codegen-check` | CI 用：codegen 后 `git diff --exit-code`，确保产物新鲜 |
+| `make check-pr-ci` | watch 当前分支 PR 的 CI 状态（pre-push hook 也会自动调用） |
+
 ### 常用脚本
 
+`scripts/` 下的脚本是 Make 目标的具体实现，可单独调用：
+
 | 脚本 | 用途 |
 |------|------|
 | `scripts/go-test.sh` | Go 全量测试 |
@@ -183,6 +203,7 @@ pineapple/
 | `scripts/go-bench.sh` | Go 性能基准 |
 | `scripts/java-bench.sh` | Java 性能基准 |
 | `scripts/bench-cross-runtime.sh` | 跨引擎 HTTP server benchmark（fixture 驱动，cgroup 资源隔离） |
+| `scripts/bench-lua-backends.sh` | wangshu vs gopher-lua 后端对比（benchstat delta） |
 | `scripts/go-fuzz.sh` | Go fuzz 测试 |
 | `scripts/java-fuzz.sh` | Java fuzz 测试 |
 | `scripts/differential-fuzz.sh` | 三引擎差异模糊测试（随机生成 pipeline 比对输出） |
@@ -193,16 +214,25 @@ pineapple/
 | `scripts/render-dag.sh` | DAG 可视化（`--backend go\|java`） |
 | `scripts/apple-compile.sh` | Apple DSL 编译为 JSON |
 | `scripts/run-pipeline.sh` | 单次执行 pipeline |
-| `scripts/bump-version.sh` | 版本号同步更新 |
+| `scripts/bump-version.sh` | 版本号同步更新（含 pine-cpp `kVersion`） |
+| `scripts/check-pr-ci.sh` | watch 当前分支 PR 的 CI 状态（pre-push hook 自动调用） |
+
+### 本地 Git Hooks
+
+仓库内置 `.githooks/` 用 `git config core.hooksPath .githooks` 挂载即生效（首次 clone 后建议配一次）：
+
+- **`pre-commit`** — staged-only 格式 gate（gofmt / clang-format / ruff），不动未 staged 改动
+- **`pre-push`** — 工程级 lint（四语言 fail-on-violation）+ 自包装 CI watch（push 完成后自动起 `check-pr-ci.sh` 等终态）+ 自动 `--set-upstream` 接力（首次 push 新分支无需手动 `-u`）
 
 ### CI 流水线
 
 CI 在每次 push/PR 时自动运行：
 
-- **Lint** — Go (golangci-lint)、Java (checkstyle, failOnViolation=true)、Python (ruff)、C++ (-Werror)
+- **Lint** — Go (golangci-lint)、Java (checkstyle, failOnViolation=true)、Python (ruff)、C++ (clang-format -Werror)
 - **Test** — Go/Java/Apple/C++ 全量测试 + 覆盖率
 - **Sanitizer** — C++ ASan/UBSan 冒烟 + ThreadSanitizer 高并发压测
 - **Fuzz** — Go/Java fuzz + 三引擎差异模糊测试
+- **Daily sanitized fuzz** — 每日（北京时间 12:00）跑 ASan/TSan 加持的差分 fuzz 3000+2000 轮，专门面向 race / memory bug 的 deep-diagnostic（独立于每次 push 的 fast 路径）
 - **Benchmark** — Go/Java 性能基准
 - **Cross-validation** — 三引擎 schema/DAG/执行/错误/server/metrics 一致性
 - **Codegen check** — 确保生成代码与源码同步
@@ -363,6 +393,7 @@ def normalize_json(text):
 | 算子开发 | [`doc/guide_operator.md`](doc/guide_operator.md) — Go 算子开发指南 |
 | 第三方扩展 | [`design_doc/12_distribution.md`](design_doc/12_distribution.md) — 不修改源码添加自定义算子 |
 | API 参考 | [`doc/api.md`](doc/api.md) — HTTP 接口说明 |
+| LLM 检索文档 | [`llmdoc/`](llmdoc/) — 面向 AI 协作的稳定知识地图（架构 / 决策 / 反思 / 索引） |
 
 ## License
 

From 59fc37ff7a70b62cd59b7245569b4deecd4a2312 Mon Sep 17 00:00:00 2001
From: Liam <liamhuang0205@gmail.com>
Date: Thu, 25 Jun 2026 09:31:28 +0800
Subject: [PATCH 4/4] docs(readme): refresh benchmark tables with 2026-06-25
 v0.10.9 numbers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Re-ran the full 14-fixture cross-runtime bench on the standard 2C/4G
cgroup (10000 req × 16 conc) on the same machine that produced the
previous 2026-06-11 numbers. v0.10 series picked up wangshu CallInto /
GlobalsSlot fast paths, outputPool (#119), and Redis cascade-safety
(#137) — re-baseline so the README reflects measured reality.

Changes worth calling out:

- Three calibrated fixtures now listed instead of one. Until now the
  README collapsed the calibrated family to a single row, hiding the
  itemlua variant entirely. itemlua (3000 Lua calls/request) is the
  boundary-dominated workload that anchors the perf-evolution-roadmap
  "calibration fact 2 — end-to-end dilution" finding; it deserves to
  show up.
- C++ headline lift: 1.8x → 1.9x against Go/Java on calibrated. P50
  60.8ms vs 122.3/117.7ms (Go/Java) is a more legible framing than the
  QPS ratio.
- Synthetic small/medium movements are all within ±10 % run-to-run
  noise; the relative shape (Go highest at small, Java reverses on
  large_1000+) is unchanged.
- Reproduce command now lists `make bench-cross-runtime` first.

Source data: bench-results/report-20260625-090834.txt
---
 README-en.md | 44 +++++++++++++++++++++++++-------------------
 README.md    | 46 ++++++++++++++++++++++++++--------------------
 2 files changed, 51 insertions(+), 39 deletions(-)
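
Two numeric claims in the message above (the ±10 % noise band on
synthetic small/medium fixtures and the ~1.9x C++ lead on calibrated)
can be rechecked from the table rows in this patch. The Python sketch
below is only an arithmetic aid: the QPS values are copied from the
removed and added rows, while the function names are illustrative
assumptions.

```python
"""Recompute two commit-message claims from the QPS tables in this patch."""

# (old, new) QPS per engine, copied from the removed/added table rows.
SMALL_MEDIUM = {
    "small_010": {"go": (37078, 36298), "java": (5825, 6318), "cpp": (20794, 20756)},
    "small_050": {"go": (26976, 27270), "java": (5201, 5336), "cpp": (17244, 17227)},
    "small_100": {"go": (19585, 19658), "java": (4748, 4607), "cpp": (13904, 13812)},
    "medium_0100": {"go": (12025, 12514), "java": (3681, 3589), "cpp": (8578, 8542)},
    "medium_0500": {"go": (2921, 3026), "java": (2034, 1965), "cpp": (2938, 2941)},
    "medium_1000": {"go": (1446, 1513), "java": (1360, 1295), "cpp": (1647, 1656)},
}

# New QPS for realistic_for_you_calibrated.
CALIBRATED = {"go": 121, "java": 127, "cpp": 237}


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def max_abs_change(table: dict) -> float:
    """Largest absolute percentage move across all fixtures and engines."""
    return max(
        abs(pct_change(old, new))
        for engines in table.values()
        for old, new in engines.values()
    )


def cpp_speedup(qps: dict) -> dict:
    """C++ QPS divided by each of the other engines' QPS."""
    return {engine: qps["cpp"] / qps[engine] for engine in ("go", "java")}


if __name__ == "__main__":
    print(f"max |delta| on small/medium: {max_abs_change(SMALL_MEDIUM):.2f} %")
    for engine, ratio in cpp_speedup(CALIBRATED).items():
        print(f"C++ vs {engine}: {ratio:.2f}x")
```

On these inputs the largest small/medium move is the small_010 Java
row, which stays under 10 %, and the two calibrated ratios come out at
about 1.96 (vs Go) and 1.87 (vs Java), consistent with the ~1.9x
headline.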

diff --git a/README-en.md b/README-en.md
index 98749a09..39933533 100644
--- a/README-en.md
+++ b/README-en.md
@@ -333,39 +333,45 @@ See `scripts/cross-validate.sh` for a complete production implementation.
 
 ## Benchmark
 
-Cross-engine performance comparison (HTTP server mode, `scripts/bench-cross-runtime.sh`, 10000 requests × 16 concurrency, server cgroup-isolated to 2C/4G). `realistic_calibrated` is a production proxy fixture calibrated against real traffic; the rest are synthetic stress tests.
+Cross-engine performance comparison (HTTP server mode, `scripts/bench-cross-runtime.sh`, 10000 requests × 16 concurrency, server cgroup-isolated to 2C/4G, re-measured 2026-06-25 / v0.10.9). `realistic_*_calibrated*` fixtures are production-proxy benchmarks calibrated against real traffic; the rest are synthetic stress tests.
 
 ### Throughput (QPS)
 
 | Fixture | Go | Java | C++ |
 |---|---|---|---|
-| small_010 (10 items) | 37078 | 5825 | 20794 |
-| small_050 (50 items) | 26976 | 5201 | 17244 |
-| small_100 (100 items) | 19585 | 4748 | 13904 |
-| medium_0100 (100 items) | 12025 | 3681 | 8578 |
-| medium_0500 (500 items) | 2921 | 2034 | 2938 |
-| medium_1000 (1000 items) | 1446 | 1360 | 1647 |
-| large_0100 (100 items) | 6395 | 2855 | 4855 |
-| large_0500 (500 items) | 1439 | 1439 | 1671 |
-| large_1000 (1000 items) | 728 | 917 | 902 |
-| large_5000 (5000 items) | 142 | 212 | 174 |
-| **realistic_calibrated (production proxy)** | **120** | **124** | **221** |
+| small_010 (10 items) | 36298 | 6318 | 20756 |
+| small_050 (50 items) | 27270 | 5336 | 17227 |
+| small_100 (100 items) | 19658 | 4607 | 13812 |
+| medium_0100 (100 items) | 12514 | 3589 | 8542 |
+| medium_0500 (500 items) | 3026 | 1965 | 2941 |
+| medium_1000 (1000 items) | 1513 | 1295 | 1656 |
+| large_0100 (100 items) | 7243 | 3064 | 5120 |
+| large_0500 (500 items) | 1684 | 1508 | 1773 |
+| large_1000 (1000 items) | 825 | 966 | 951 |
+| large_5000 (5000 items) | 155 | 213 | 175 |
+| realistic_for_you | 483 | 303 | 349 |
+| realistic_for_you_latency | 250 | 141 | 212 |
+| **realistic_for_you_calibrated (production proxy)** | **121** | **127** | **237** |
+| **realistic_for_you_calibrated_2c4g** | **121** | **124** | **224** |
+| **realistic_for_you_calibrated_itemlua** | **127** | **126** | **233** |
 
 ### P50 Latency (ms)
 
 | Fixture | Go | Java | C++ |
 |---|---|---|---|
-| small_010 | 0.3 | 2.0 | 0.6 |
-| medium_0500 | 5.0 | 6.3 | 5.2 |
-| large_1000 | 20.5 | 14.8 | 16.1 |
-| large_5000 | 102.2 | 67.9 | 83.9 |
-| **realistic_calibrated** | **123.6** | **121.9** | **65.0** |
+| small_010 | 0.4 | 1.5 | 0.6 |
+| medium_0500 | 4.9 | 6.8 | 5.3 |
+| large_1000 | 18.2 | 14.3 | 15.3 |
+| large_5000 | 94.3 | 68.6 | 83.4 |
+| **realistic_for_you_calibrated** | **122.3** | **117.7** | **60.8** |
+| **realistic_for_you_calibrated_itemlua** | **117.1** | **119.5** | **61.5** |
 
 Highlights:
 
-- **C++ leads by ~1.8x on the production-calibrated scenario** (QPS 221 vs 120/124; P50 65ms vs ~122ms) — this is what the "benchmark runtime" positioning means
+- **C++ leads by ~1.9x on production-calibrated workloads** (calibrated QPS 237 vs 121/127 for Go/Java; P50 60.8ms vs 122.3/117.7ms), which is what the "benchmark runtime" positioning means
 - Go has the highest throughput on synthetic small/medium fixtures (lowest lightweight-request overhead); Java's JIT hot-loop optimization wins at large row counts (large_1000+)
-- Numbers evolve with versions. Reproduce with `scripts/bench-cross-runtime.sh --requests 10000 --concurrency 16`; reports land in `bench-results/`
+- itemlua (3000 Lua calls/request, boundary-dominated shape) is statistically flat against calibrated across all three engines — confirms the "per-item boundary dominates + end-to-end dilution" calibration fact (see `llmdoc/memory/decisions/perf-evolution-roadmap.md`)
+- Numbers evolve with versions. Reproduce with `make bench-cross-runtime` or `scripts/bench-cross-runtime.sh --requests 10000 --concurrency 16`; reports land in `bench-results/`
 
 ## Documentation
 
diff --git a/README.md b/README.md
index 932275bf..8194d751 100644
--- a/README.md
+++ b/README.md
@@ -349,39 +349,45 @@ def normalize_json(text):
 
 ## Benchmark
 
-跨引擎性能对比（HTTP server 模式，`scripts/bench-cross-runtime.sh`，10000 请求 × 16 并发，server 以 2C/4G cgroup 隔离）。`realistic_calibrated` 为按真实流量校准的生产 proxy fixture，其余为合成压测。
+跨引擎性能对比（HTTP server 模式，`scripts/bench-cross-runtime.sh`，10000 请求 × 16 并发，server 以 2C/4G cgroup 隔离，2026-06-25 / v0.10.9 复测）。`realistic_*_calibrated*` 系列为按真实流量校准的生产 proxy fixture，其余为合成压测。
 
 ### 吞吐量 (QPS)
 
 | Fixture | Go | Java | C++ |
 |---|---|---|---|
-| small_010 (10 items) | 37078 | 5825 | 20794 |
-| small_050 (50 items) | 26976 | 5201 | 17244 |
-| small_100 (100 items) | 19585 | 4748 | 13904 |
-| medium_0100 (100 items) | 12025 | 3681 | 8578 |
-| medium_0500 (500 items) | 2921 | 2034 | 2938 |
-| medium_1000 (1000 items) | 1446 | 1360 | 1647 |
-| large_0100 (100 items) | 6395 | 2855 | 4855 |
-| large_0500 (500 items) | 1439 | 1439 | 1671 |
-| large_1000 (1000 items) | 728 | 917 | 902 |
-| large_5000 (5000 items) | 142 | 212 | 174 |
-| **realistic_calibrated (生产校准)** | **120** | **124** | **221** |
+| small_010 (10 items) | 36298 | 6318 | 20756 |
+| small_050 (50 items) | 27270 | 5336 | 17227 |
+| small_100 (100 items) | 19658 | 4607 | 13812 |
+| medium_0100 (100 items) | 12514 | 3589 | 8542 |
+| medium_0500 (500 items) | 3026 | 1965 | 2941 |
+| medium_1000 (1000 items) | 1513 | 1295 | 1656 |
+| large_0100 (100 items) | 7243 | 3064 | 5120 |
+| large_0500 (500 items) | 1684 | 1508 | 1773 |
+| large_1000 (1000 items) | 825 | 966 | 951 |
+| large_5000 (5000 items) | 155 | 213 | 175 |
+| realistic_for_you | 483 | 303 | 349 |
+| realistic_for_you_latency | 250 | 141 | 212 |
+| **realistic_for_you_calibrated (生产校准)** | **121** | **127** | **237** |
+| **realistic_for_you_calibrated_2c4g** | **121** | **124** | **224** |
+| **realistic_for_you_calibrated_itemlua** | **127** | **126** | **233** |
 
 ### P50 延迟 (ms)
 
 | Fixture | Go | Java | C++ |
 |---|---|---|---|
-| small_010 | 0.3 | 2.0 | 0.6 |
-| medium_0500 | 5.0 | 6.3 | 5.2 |
-| large_1000 | 20.5 | 14.8 | 16.1 |
-| large_5000 | 102.2 | 67.9 | 83.9 |
-| **realistic_calibrated** | **123.6** | **121.9** | **65.0** |
+| small_010 | 0.4 | 1.5 | 0.6 |
+| medium_0500 | 4.9 | 6.8 | 5.3 |
+| large_1000 | 18.2 | 14.3 | 15.3 |
+| large_5000 | 94.3 | 68.6 | 83.4 |
+| **realistic_for_you_calibrated** | **122.3** | **117.7** | **60.8** |
+| **realistic_for_you_calibrated_itemlua** | **117.1** | **119.5** | **61.5** |
 
 要点：
 
-- **生产校准场景下 C++ 领先约 1.8x**（QPS 221 vs 120/124；P50 65ms vs ~122ms），这是"标杆运行时"定位的体现
-- 合成 small/medium 场景 Go 吞吐最高（轻量请求路径开销最低）；大行数场景（large_1000+）Java 的 JIT 热循环优化使其反超
-- 各引擎数字会随版本演进，复现方式：`scripts/bench-cross-runtime.sh --requests 10000 --concurrency 16`，报告落在 `bench-results/`
+- **生产校准场景下 C++ 领先约 1.9x**（calibrated QPS 237 vs Go/Java 121/127；P50 60.8ms vs 122.3/117.7ms），这是"标杆运行时"定位的体现
+- 合成 small/medium 场景 Go 吞吐最高（轻量请求路径开销最低）；大行数场景（large_1000+）Java 的 JIT 热循环优化反超
+- itemlua（每请求 3000 次 Lua 调用的 boundary-dominated 形状）与 calibrated 在三引擎都统计持平，符合"per-item 边界主导 + 端到端稀释"的校准事实（详见 `llmdoc/memory/decisions/perf-evolution-roadmap.md`）
+- 各引擎数字会随版本演进，复现方式：`make bench-cross-runtime` 或 `scripts/bench-cross-runtime.sh --requests 10000 --concurrency 16`，报告落在 `bench-results/`
 
 ## 文档