Context
#119 (PR #120) brought a profile-driven 7-9% ns/op + 19% B/op reduction to pine-go by pooling *OperatorOutput via sync.Pool in the scheduler, eliminating per-Execute alloc of OperatorOutput + the dominant itemWrites []ItemWrite grow cost.
The win does not transfer to pine-java. Java and Go runtimes are independent — sharing only the JSON contract and Apple DSL. #119 touched pine-go/internal/runtime/scheduler.go; nothing in that change reaches the JVM side.
Current Java state
pine-java/src/main/java/page/liam/pine/OperatorOutput.java uses:
private final Map<String, Object> commonWrites = new LinkedHashMap<>();
private final Map<Integer, Map<String, Object>> itemWrites = new HashMap<>(); // nested
private final List<Map<String, Object>> addedItems = new ArrayList<>();
private final Set<Integer> removedItems = new HashSet<>();
The blocker is structural, not lifetime-related:
| Storage | Per-Execute behaviour | Pooling viability |
| --- | --- | --- |
| itemWrites: Map<Integer, Map<String,Object>> (nested) | Each setItem(i, field, v) does computeIfAbsent(i, k -> new LinkedHashMap<>()).put(field, v): a new inner LinkedHashMap (plus an outer-map entry) on first touch per row, then a per-cell map put | Pooling the outer map keeps O(N) inner LinkedHashMaps live; clearing them on Reset is itself O(N×M), worse than the current alloc |
pine-go had the same nested form historically (commit d238098 "replace nested map item writes with flat []ItemWrite slice", v0.7 era). The refactor unlocked everything downstream — including #119's pool. Java needs the same structural refactor first.
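A minimal sketch of why the nested form resists pooling. NestedItemWrites is a hypothetical stand-in for the itemWrites storage, not the real OperatorOutput, and both reset variants are illustrative only.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the nested itemWrites storage; not the real class.
final class NestedItemWrites {
    private final Map<Integer, Map<String, Object>> itemWrites = new HashMap<>();

    void setItem(int index, String field, Object value) {
        // First touch of a row allocates a fresh inner LinkedHashMap.
        itemWrites.computeIfAbsent(index, k -> new LinkedHashMap<>()).put(field, value);
    }

    // Reset option 1: drop the inner maps. The next Execute reallocates all of them,
    // so pooling the outer map saves almost nothing.
    void resetDroppingInnerMaps() {
        itemWrites.clear();
    }

    // Reset option 2: keep the inner maps for reuse. Every row has to be visited,
    // and the row keys stay in the outer map even though their maps are now empty.
    void resetKeepingInnerMaps() {
        for (Map<String, Object> row : itemWrites.values()) {
            row.clear();
        }
    }

    int liveInnerMaps() {
        return itemWrites.size();
    }
}
```

Option 2 also leaves previously touched rows looking "written" to any consumer that iterates the outer map, which is a correctness hazard on top of the reset cost.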
Suggested phasing
Phase 1 (refactor — separate PR, no pooling yet):
- Introduce ItemWrite { int index; String field; Object value; } record
- Change itemWrites to List<ItemWrite> (or ArrayList<ItemWrite> for capacity reuse later)
- Update Engine.applyOutput / ColumnFrame.applyOutput / DataFrame.applyOutput / ParallelExecutor.mergeOutputs to iterate the flat list
- Update Java fuzz / unit tests as needed. Cross-validate's byte-equal /execute parity (scripts/cross-validate/02-engine-byte-exact.sh) gates correctness, so no behaviour change should leak
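A rough sketch of the Phase 1 shape, assuming Java 16+ for records. FlatOperatorOutput and applyTo are hypothetical names for illustration; the real applyOutput methods take whatever frame types the engine already uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Flat write entry as proposed in the phasing list above.
record ItemWrite(int index, String field, Object value) {}

// Hypothetical reduced OperatorOutput holding only the flat item writes.
final class FlatOperatorOutput {
    private final ArrayList<ItemWrite> itemWrites = new ArrayList<>();

    void setItem(int index, String field, Object value) {
        // One list append per cell; no per-row map is created.
        itemWrites.add(new ItemWrite(index, field, value));
    }

    List<ItemWrite> itemWrites() {
        return itemWrites;
    }

    // Hypothetical consumer: replays writes in insertion order onto row maps,
    // so a later write to the same (row, field) wins, as with the nested put.
    static void applyTo(List<Map<String, Object>> rows, FlatOperatorOutput out) {
        for (ItemWrite w : out.itemWrites()) {
            rows.get(w.index()).put(w.field(), w.value());
        }
    }
}
```

One thing to watch: the nested form iterates rows in HashMap order, while the flat list iterates in write order. The final cell values should match, but any consumer that depends on iteration order is exactly what the byte-equal parity check should catch.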
Phase 2 (pool — once Phase 1 lands):
- Reset() method analogous to Go's: null slot refs, truncate to size 0 (ArrayList.clear() retains capacity)
- ThreadLocal<OperatorOutput> or ConcurrentLinkedDeque pool keyed at Engine instance
- Expect 5-10% throughput improvement based on Go numbers + JVM GC overhead profile
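A sketch of what Phase 2 could look like with the ThreadLocal variant, assuming Java 16+ and that Phase 1 has landed. PooledOutput and OutputPool are hypothetical names, and the pool would live on the Engine instance rather than stand alone.

```java
import java.util.ArrayList;

// Hypothetical post-Phase-1 output, reduced to the flat item writes.
final class PooledOutput {
    record ItemWrite(int index, String field, Object value) {}

    private final ArrayList<ItemWrite> itemWrites = new ArrayList<>();

    void setItem(int index, String field, Object value) {
        itemWrites.add(new ItemWrite(index, field, value));
    }

    int size() {
        return itemWrites.size();
    }

    // ArrayList.clear() nulls every slot and sets size to 0 while keeping the
    // backing array, so the next Execute reuses the grown capacity.
    void reset() {
        itemWrites.clear();
    }
}

// Hypothetical per-Engine pool: one reusable output per thread.
final class OutputPool {
    private final ThreadLocal<PooledOutput> local = ThreadLocal.withInitial(PooledOutput::new);

    // Returns this thread's output, emptied. The caller must be done with the
    // previous contents before calling acquire() again on the same thread.
    PooledOutput acquire() {
        PooledOutput out = local.get();
        out.reset();
        return out;
    }
}
```

Two caveats. Unlike Go's []ItemWrite, each Java ItemWrite is still its own heap object, so this reuses only the backing array and the output object itself. A ThreadLocal pool is also only safe if nothing (ParallelExecutor.mergeOutputs, for instance) holds on to an output past the next acquire() on that thread.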
Why JVM-specific concerns matter
JVM is not Go:
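- Short-lived objects are usually allocated in a TLAB (Thread-Local Allocation Buffer) in the young generation and die before they reach the old generation
- Escape analysis can sometimes remove an allocation entirely (HotSpot does this via scalar replacement rather than true stack allocation)
- BUT: the nested LinkedHashMap allocs may still be heavy enough to show up as GC pressure on hot paths; profiling is required to confirm the win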
Recommended approach: profile pine-java first via JMH (which #119 already noted is missing — see pine-java/benchmarks/'s placeholder note). If OperatorOutput.setItem map allocs dominate hot-path GC pressure, do Phase 1+2. If JVM is amortizing it away cheaply, defer indefinitely.
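Until a JMH harness exists, a rough allocation probe can give a first signal. The sketch below is not a JMH replacement: it uses the HotSpot-specific com.sun.management.ThreadMXBean, and AllocProbe and its workload are hypothetical, imitating the nested setItem path rather than calling the real OperatorOutput.

```java
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class AllocProbe {
    // Bytes allocated by the current thread while running body, as reported by
    // HotSpot's per-thread accounting (assumes a HotSpot-based JVM).
    static long allocatedBytes(Runnable body) {
        com.sun.management.ThreadMXBean bean =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long id = Thread.currentThread().getId();
        long before = bean.getThreadAllocatedBytes(id);
        body.run();
        return bean.getThreadAllocatedBytes(id) - before;
    }

    // Hypothetical workload: rows x fields writes through the nested-map path.
    static Map<Integer, Map<String, Object>> nestedWrites(int rows, String[] fields) {
        Map<Integer, Map<String, Object>> itemWrites = new HashMap<>();
        for (int i = 0; i < rows; i++) {
            for (String f : fields) {
                itemWrites.computeIfAbsent(i, k -> new LinkedHashMap<>()).put(f, f);
            }
        }
        return itemWrites;
    }
}
```

Usage would be along the lines of AllocProbe.allocatedBytes(() -> AllocProbe.nestedWrites(1_000, new String[] {"a", "b", "c"})). The number is only a coarse bytes-per-Execute estimate; warm-up, JIT effects and GC cost are what JMH is needed for.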
Risk
- Behaviour parity: byte-equal /execute parity is gated by cross-validate/02-engine-byte-exact.sh. Both phases must preserve it.
- Concurrency: Java's OperatorOutput is currently mutable-by-single-thread per Execute; the same contract must hold post-refactor.
- Phase 1 LOC: ~80-120 lines (Java reformat of the nested-map walk is the bulk).
Related
- d238098: the prior-art refactor (Go side, v0.7 era)