Liam0205 · Jun 25, 2026
diff --git a/README-en.md b/README-en.md
@@ -38,97 +38,15 @@ Python DSL (Apple)  ──compile──>  JSON Config
 - **Implicit graph construction** — Operators declare input/output fields; engine infers DAG dependencies with transitive reduction
 - **Lock-free parallelism** — Independent operators in the DAG execute in parallel automatically
 - **Compile-time validation** — Dead code, missing fields, write-after-write detected before deployment
-- **Embedded Lua** — Built-in Lua operators for lightweight custom computation. End-to-end overhead ~1.2-2x; isolated operator-level overhead varies by runtime and compute complexity (C++/LuaJIT ~3-5x, Java ~2-9x, Go ~6-17x) — write native operators for compute-heavy hot paths
+- **Embedded Lua** — Built-in Lua operators for lightweight custom computation. pine-go defaults to [wangshu](https://github.com/Liam0205/wangshu) (pure-Go Lua 5.1 VM, NaN-boxing + arena GC); switch back to gopher-lua via `-tags=lua_gopher`. pine-java uses LuaJC (bytecode compilation), pine-cpp uses LuaJIT. End-to-end overhead ~1.2-2x; isolated operator-level overhead varies by runtime and compute complexity (C++/LuaJIT ~3-5x, Java ~2-9x, Go ~6-17x) — write native operators for compute-heavy hot paths
 - **Hot config reload** — Service automatically reloads engine config without downtime
-- **Dynamic resources** — Background-refreshed in-memory resource manager with lock-free reads
-- **White-box observability** — Operator-level traces, `/stats` endpoint, pluggable Prometheus interface
+- **Dynamic resources** — Two-channel resource manager: **data-typed** (e.g. static dict / real-time feature store, snapshot-exported lock-free reads) + **handle-typed** (e.g. `redis_connection`, borrow lease + RAII teardown); background-refreshed
+- **Redis cascade-safety** — The `redis_connection` resource exposes 5 cascade params (`{dial,read,write,pool}_timeout_ms` + `pool_size`); per-command metrics `pine_redis_command_*` with 4-state status (ok / timeout / pool_timeout / error), fail-on-error silent-degradation contract
+- **White-box observability** — Operator-level traces; the `/stats` composite response includes `/stats.http` (request-level 4-state metrics) + `/stats.resources` (resource pool / probe / per-command 4-state categories); pluggable Prometheus interface
 - **Row/Column storage** — DataFrame supports both storage modes
-- **Tri-engine consistency** — Go/Java/C++ engines verified via CI cross-validation for schema, DAG, execution, error, server, and metrics parity
+- **Tri-engine consistency** — Go/Java/C++ engines verified byte-exactly via CI cross-validation (19 sections + tri-engine differential fuzz + daily ASan/TSan sanitized fuzz)
 - **Pine-C++ benchmark runtime** — Complete third runtime with operator parity, HTTP server (hot reload / graceful shutdown), ColumnFrame/RowFrame dual physical layouts, lazy OperatorInput projection, LuaJIT integration, metrics/resource parity
 
-## Migrating from Older Versions (Breaking Change)
-
-> Starting from v0.7, the Go engine has moved from the repository root into the `pine-go/` subdirectory. The Go module path has changed accordingly.
-
-### What Changed
-
-| Item | Before | After |
-|------|--------|-------|
-| Module path | `github.com/Liam0205/pineapple` | `github.com/Liam0205/pineapple/pine-go` |
-| Import | `github.com/Liam0205/pineapple/internal/...` | `github.com/Liam0205/pineapple/pine-go/internal/...` |
-| Import | `github.com/Liam0205/pineapple/pkg/...` | `github.com/Liam0205/pineapple/pine-go/pkg/...` |
-| Import | `github.com/Liam0205/pineapple/operators` | `github.com/Liam0205/pineapple/pine-go/operators` |
-| Binary | `go build ./cmd/pineapple-server` | `go build ./pine-go/cmd/pineapple-server` |
-
-### Migration Steps
-
-```bash
-# 1. Bulk-replace import paths
-find . -name '*.go' -exec sed -i \
-  's|github.com/Liam0205/pineapple/|github.com/Liam0205/pineapple/pine-go/|g' {} +
-
-# 2. Fix double-nesting if you referenced the module itself
-find . -name '*.go' -exec sed -i \
-  's|github.com/Liam0205/pineapple/pine-go/pine-go/|github.com/Liam0205/pineapple/pine-go/|g' {} +
-
-# 3. Update go.mod
-go get github.com/Liam0205/pineapple/pine-go@latest
-go mod tidy
-```
-
-If your project uses Pineapple through public APIs (`pine.NewEngine`, `pine.BuildOperator`, etc.), the above steps complete the migration.
-
-### Configuration & Runtime Semantic Changes
-
-The following changes affect JSON configuration and operator runtime behavior:
-
-#### 1. `row_dependency` Renamed to `consumes_row_set`
-
-The `"row_dependency": true` field in operator JSON config has been removed. Use `"consumes_row_set": true` instead (same semantics: marks the operator as needing a stable row set before execution).
-
-```diff
- {
-   "type_name": "transform_size",
--  "row_dependency": true,
-+  "consumes_row_set": true,
-   "$metadata": { ... }
- }
-```
-
-Apple DSL side: `OpCall(..., row_dependency=True)` → `OpCall(..., consumes_row_set=True)`.
-
-#### 2. DAG Scheduling Model: Barriers → Row-Set Marker Interfaces
-
-Previously, Filter/Merge/Reorder operators acted as "barriers" — all predecessors had to complete before them, and all successors had to wait.
-
-The new model uses three marker interfaces for precise row-set dependency declaration:
-
-| Marker | Meaning | Typical Operators |
-|--------|---------|-------------------|
-| `ConsumesRowSet` | Iterates all items; needs row set stable | filter_*, merge_*, reorder_*, transform_size |
-| `MutatesRowSet` | Removes or reorders items | filter_*, merge_*, reorder_* |
-| `AdditiveWritesRowSet` | Appends items (parallel with other appenders) | recall_* |
-
-**Impact**: Transform operators that only touch common fields are no longer blocked by barriers and can execute in parallel with Filter/Merge/Reorder. This improves parallelism without changing final results — correctness is guaranteed by field-level data hazard analysis.
-
-**Custom operator migration**: If you implemented a custom Recall-type operator, embed `types.AdditiveWritesRowSetMarker`.
-
-#### 3. Field Accessor Strict Mode
-
-`BuildInput` now distinguishes Strict vs. Defaulted fields:
-
-- **Strict** (fields without a `common_defaults` / `item_defaults` entry): errors immediately at runtime if the value is nil, instead of passing nil to the operator
-- **Defaulted** (fields with a default): substitutes the default when the value is nil or missing
-
-**Impact**: If your pipeline relies on "nil passthrough to operator for self-handling", add a `common_defaults` or `item_defaults` entry for that field (value can be `null`) to preserve the old behavior:
-
-```json
-{
-  "$metadata": { "common_input": ["optional_field"], ... },
-  "common_defaults": { "optional_field": null }
-}
-```
-
 ## Quick Start
 
 ### Prerequisites
@@ -239,8 +157,28 @@ pineapple/
 
 ## Development
 
+### Top-level Make Targets
+
+Cross-language fmt / lint / test / bench / codegen / version management is unified behind the top-level `Makefile` (with `pine-go/Makefile` for Go-specific work). CI and local dev share the same command sequence.
+
+| Make target | Purpose |
+|---|---|
+| `make fmt` | Format all four languages (gofmt / google-java-format / clang-format / ruff) |
+| `make lint` | Lint all four languages (incl. checkstyle `failOnViolation=true`, `-Werror`) |
+| `make test` | Full test suite across runtimes |
+| `make bench` | Run benchmarks under the default `pine_bench` tag |
+| `make bench-cross-runtime` | Cross-engine fixture-driven benchmark (cgroup-isolated) |
+| `make bench-lua-backends` | wangshu vs gopher-lua, same-host serial + benchstat |
+| `make differential-fuzz` | Tri-engine differential fuzz |
+| `make cross-validate` | Tri-engine consistency verification |
+| `make codegen` | Generate `apple_generated/` + `doc/operators/` from pine-go Registry |
+| `make codegen-check` | CI: codegen + `git diff --exit-code` to enforce artifact freshness |
+| `make check-pr-ci` | Watch CI status of the current branch's PR (pre-push hook calls this) |
+
 ### Scripts
 
+`scripts/` holds the actual implementations behind the Make targets and can be invoked standalone:
+
 | Script | Purpose |
 |--------|---------|
 | `scripts/go-test.sh` | Run all Go tests |
@@ -250,6 +188,7 @@ pineapple/
 | `scripts/go-bench.sh` | Go benchmarks |
 | `scripts/java-bench.sh` | Java benchmarks |
 | `scripts/bench-cross-runtime.sh` | Cross-engine HTTP server benchmark (fixture-driven, cgroup-isolated) |
+| `scripts/bench-lua-backends.sh` | wangshu vs gopher-lua backend comparison (benchstat delta) |
 | `scripts/go-fuzz.sh` | Go fuzz testing |
 | `scripts/java-fuzz.sh` | Java fuzz testing |
 | `scripts/differential-fuzz.sh` | Tri-engine differential fuzzing (random pipelines, output diff) |
@@ -260,16 +199,25 @@ pineapple/
 | `scripts/render-dag.sh` | DAG visualization (`--backend go\|java`) |
 | `scripts/apple-compile.sh` | Compile Apple DSL to JSON |
 | `scripts/run-pipeline.sh` | One-shot pipeline execution |
-| `scripts/bump-version.sh` | Synchronize version across all components |
+| `scripts/bump-version.sh` | Synchronize version across all components (incl. pine-cpp `kVersion`) |
+| `scripts/check-pr-ci.sh` | Watch CI status of the current branch's PR (pre-push hook invokes this) |
+
+### Local Git Hooks
+
+`.githooks/` ships with the repository; activate via `git config core.hooksPath .githooks` once after clone:
+
+- **`pre-commit`** — staged-only format gate (gofmt / clang-format / ruff); does not touch unstaged work
+- **`pre-push`** — project-level lint (four-language fail-on-violation) + self-wrapped post-push CI watcher (auto-runs `check-pr-ci.sh` after the actual push) + auto `--set-upstream` relay (first-push of a new branch does not need a manual `-u`)
 
 ### CI Pipeline
 
 CI runs automatically on every push/PR:
 
-- **Lint** — Go (golangci-lint), Java (checkstyle, failOnViolation=true), Python (ruff), C++ (-Werror)
+- **Lint** — Go (golangci-lint), Java (checkstyle, failOnViolation=true), Python (ruff), C++ (clang-format -Werror)
 - **Test** — Full Go/Java/Apple/C++ test suites with coverage
 - **Sanitizer** — C++ ASan/UBSan smoke + ThreadSanitizer stress
 - **Fuzz** — Go/Java fuzz + tri-engine differential fuzzing
+- **Daily sanitized fuzz** — Daily (12:00 UTC+8) ASan/TSan differential fuzz, 3000+2000 rounds, dedicated to race / memory-bug deep diagnostics (independent of the per-push fast lane)
 - **Benchmark** — Go/Java performance benchmarks
 - **Cross-validation** — Tri-engine schema/DAG/execution/error/server/metrics parity
 - **Codegen check** — Ensures generated code is in sync with source
@@ -385,39 +333,45 @@ See `scripts/cross-validate.sh` for a complete production implementation.
 
 ## Benchmark
 
-Cross-engine performance comparison (HTTP server mode, `scripts/bench-cross-runtime.sh`, 10000 requests × 16 concurrency, server cgroup-isolated to 2C/4G). `realistic_calibrated` is a production proxy fixture calibrated against real traffic; the rest are synthetic stress tests.
+Cross-engine performance comparison (HTTP server mode, `scripts/bench-cross-runtime.sh`, 10000 requests × 16 concurrency, server cgroup-isolated to 2C/4G, re-measured 2026-06-25 / v0.10.9). `realistic_*_calibrated*` fixtures are production-proxy benchmarks calibrated against real traffic; the rest are synthetic stress tests.
 
 ### Throughput (QPS)
 
 | Fixture | Go | Java | C++ |
 |---|---|---|---|
-| small_010 (10 items) | 37078 | 5825 | 20794 |
-| small_050 (50 items) | 26976 | 5201 | 17244 |
-| small_100 (100 items) | 19585 | 4748 | 13904 |
-| medium_0100 (100 items) | 12025 | 3681 | 8578 |
-| medium_0500 (500 items) | 2921 | 2034 | 2938 |
-| medium_1000 (1000 items) | 1446 | 1360 | 1647 |
-| large_0100 (100 items) | 6395 | 2855 | 4855 |
-| large_0500 (500 items) | 1439 | 1439 | 1671 |
-| large_1000 (1000 items) | 728 | 917 | 902 |
-| large_5000 (5000 items) | 142 | 212 | 174 |
-| **realistic_calibrated (production proxy)** | **120** | **124** | **221** |
+| small_010 (10 items) | 36298 | 6318 | 20756 |
+| small_050 (50 items) | 27270 | 5336 | 17227 |
+| small_100 (100 items) | 19658 | 4607 | 13812 |
+| medium_0100 (100 items) | 12514 | 3589 | 8542 |
+| medium_0500 (500 items) | 3026 | 1965 | 2941 |
+| medium_1000 (1000 items) | 1513 | 1295 | 1656 |
+| large_0100 (100 items) | 7243 | 3064 | 5120 |
+| large_0500 (500 items) | 1684 | 1508 | 1773 |
+| large_1000 (1000 items) | 825 | 966 | 951 |
+| large_5000 (5000 items) | 155 | 213 | 175 |
+| realistic_for_you | 483 | 303 | 349 |
+| realistic_for_you_latency | 250 | 141 | 212 |
+| **realistic_for_you_calibrated (production proxy)** | **121** | **127** | **237** |
+| **realistic_for_you_calibrated_2c4g** | **121** | **124** | **224** |
+| **realistic_for_you_calibrated_itemlua** | **127** | **126** | **233** |
 
 ### P50 Latency (ms)
 
 | Fixture | Go | Java | C++ |
 |---|---|---|---|
-| small_010 | 0.3 | 2.0 | 0.6 |
-| medium_0500 | 5.0 | 6.3 | 5.2 |
-| large_1000 | 20.5 | 14.8 | 16.1 |
-| large_5000 | 102.2 | 67.9 | 83.9 |
-| **realistic_calibrated** | **123.6** | **121.9** | **65.0** |
+| small_010 | 0.4 | 1.5 | 0.6 |
+| medium_0500 | 4.9 | 6.8 | 5.3 |
+| large_1000 | 18.2 | 14.3 | 15.3 |
+| large_5000 | 94.3 | 68.6 | 83.4 |
+| **realistic_for_you_calibrated** | **122.3** | **117.7** | **60.8** |
+| **realistic_for_you_calibrated_itemlua** | **117.1** | **119.5** | **61.5** |
 
 Highlights:
 
-- **C++ leads by ~1.8x on the production-calibrated scenario** (QPS 221 vs 120/124; P50 65ms vs ~122ms) — this is what the "benchmark runtime" positioning means
+- **C++ leads by ~1.9x on production-calibrated workloads** (calibrated QPS 237 vs 121/127 for Go/Java; P50 ~61ms vs ~122/118ms); this is what the "benchmark runtime" positioning means
 - Go has the highest throughput on synthetic small/medium fixtures (lowest lightweight-request overhead); Java's JIT hot-loop optimization wins at large row counts (large_1000+)
-- Numbers evolve with versions. Reproduce with `scripts/bench-cross-runtime.sh --requests 10000 --concurrency 16`; reports land in `bench-results/`
+- `realistic_for_you_calibrated_itemlua` (3000 Lua calls/request, boundary-dominated shape) is statistically flat against `realistic_for_you_calibrated` on all three engines, which confirms the "per-item boundary dominates + end-to-end dilution" calibration fact (see `llmdoc/memory/decisions/perf-evolution-roadmap.md`)
+- Numbers evolve with versions. Reproduce with `make bench-cross-runtime` or `scripts/bench-cross-runtime.sh --requests 10000 --concurrency 16`; reports land in `bench-results/`
 
 ## Documentation
 
@@ -429,6 +383,7 @@ Highlights:
 | Operator development | [`doc/guide_operator-en.md`](doc/guide_operator-en.md) — Go operator development guide |
 | Third-party extensions | [`design_doc/12_distribution-en.md`](design_doc/12_distribution-en.md) — Add custom operators without modifying source |
 | API reference | [`doc/api-en.md`](doc/api-en.md) — HTTP endpoint documentation |
+| LLM retrieval docs | [`llmdoc/`](llmdoc/) — Stable knowledge map for AI collaboration (architecture / decisions / reflections / index) |
 
 ## License