supernovae · Apr 27, 2026
diff --git a/.github/workflows/harness.yml b/.github/workflows/harness.yml
@@ -39,8 +39,8 @@ jobs:
       - name: Run tests
         run: npm test
 
-      - name: Run chat-demo (smoke test)
-        run: npx tsx examples/chat-demo.ts "What is 2+2?"
+      - name: Run chat pipeline demo (smoke test)
+        run: npx tsx examples/chat-pipeline-demo.ts "What is 2+2?"
 
-      - name: Run coder-demo (smoke test)
-        run: npx tsx examples/coder-demo.ts "Add error handling to the code"
+      - name: Run coder pipeline demo (smoke test)
+        run: npx tsx examples/coder-pipeline-demo.ts "Add error handling to the code"
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -23,6 +23,6 @@ The project follows semantic versioning for schema and registry compatibility:
 ### Changed
 
 - Breaking temporal normalization across governance RFCs: canonical fields (`observed_at`, `decided_at`, `effective_at`, `expires_at`, `started_at`, `completed_at`, `superseded_at`) replace legacy aliases.
-- Governance spine schemas updated (policy, permissions, delegation, audit, receipts, lifecycle, telemetry, memory, multi-agent protocol) and registry regenerated.
+- Governance spine schemas updated (policy, permissions, delegation, audit, receipts, lifecycle, telemetry, memory, multi-party protocol) and registry regenerated.
 - Reference harness runtime/types aligned to canonical temporal fields, logical ordering metadata, and updated governance artifacts.
 - Example fixtures now use registry shortname folders for delegation, permissions, and execution/audit receipts.
diff --git a/README.md b/README.md
@@ -43,7 +43,7 @@ This makes Open CoT useful beyond any one framework. An implementation can use R
 |------|------|
 | [`rfcs/`](./rfcs/) | **53 RFCs** covering reasoning traces, tool invocation, governed execution, policy, delegation, receipts, capability manifests, cognitive artifacts, and reconciliation results |
 | [`schemas/`](./schemas/) | Versioned JSON Schemas per RFC, including `registry.json` |
-| [`harness/`](./harness/) | Reference TypeScript harness that exercises earlier governed execution RFCs |
+| [`harness/`](./harness/) | Reference TypeScript core package that exercises earlier governed execution RFCs |
 | [`examples/`](./examples/) | Validated instance fixtures keyed by registry shortname |
 | [`reference/python/`](./reference/python/) | Reference Python tooling |
 | [`tools/`](./tools/) | Schema and fixture validation, registry sync, and RFC helpers |
@@ -89,7 +89,7 @@ pip install -r requirements-tools.txt
 python tools/validate.py
 ```
 
-Run the reference harness:
+Run the reference package:
 
 ```bash
 cd harness && npm install && npm test
@@ -105,7 +105,7 @@ That implementation pressure-tests Open CoT. If Open Lagrange needs a portable s
 
 - **53 RFCs** and a versioned JSON Schema registry.
 - New draft schemas for cognitive artifacts and reconciliation results.
-- Reference harness coverage for governed execution, policy, delegation, receipts, budgets, and capability manifests.
+- Reference package coverage for governed execution, policy, delegation, receipts, budgets, and capability manifests.
 - Cross-language validation tooling for schemas and examples.
 - Experiment cards and local runbooks under [`docs/experiments/`](./docs/experiments/).
 

diff --git a/datasets/synthetic/generate_scaled.py b/datasets/synthetic/generate_scaled.py
@@ -35,7 +35,7 @@ def build_scaled_traces() -> list[dict[str, object]]:
         "benchmark",
         "validator",
         "trace",
-        "agent",
+        "pipeline",
         "memory",
         "policy",
         "budget",

diff --git a/datasets/synthetic/task_bank_v1_large.jsonl b/datasets/synthetic/task_bank_v1_large.jsonl
@@ -24,7 +24,7 @@
 {"version": "0.1", "task": "Reverse the string 'benchmark'.", "steps": [{"id": "s1", "type": "thought", "content": "Read characters from right to left."}, {"id": "s2", "type": "code", "content": "'benchmark'[::-1] -> 'kramhcneb'", "parent": "s1"}], "final_answer": "kramhcneb"}
 {"version": "0.1", "task": "Reverse the string 'validator'.", "steps": [{"id": "s1", "type": "thought", "content": "Read characters from right to left."}, {"id": "s2", "type": "code", "content": "'validator'[::-1] -> 'rotadilav'", "parent": "s1"}], "final_answer": "rotadilav"}
 {"version": "0.1", "task": "Reverse the string 'trace'.", "steps": [{"id": "s1", "type": "thought", "content": "Read characters from right to left."}, {"id": "s2", "type": "code", "content": "'trace'[::-1] -> 'ecart'", "parent": "s1"}], "final_answer": "ecart"}
-{"version": "0.1", "task": "Reverse the string 'agent'.", "steps": [{"id": "s1", "type": "thought", "content": "Read characters from right to left."}, {"id": "s2", "type": "code", "content": "'agent'[::-1] -> 'tnega'", "parent": "s1"}], "final_answer": "tnega"}
+{"version": "0.1", "task": "Reverse the string 'cognitive pipeline'.", "steps": [{"id": "s1", "type": "thought", "content": "Read characters from right to left."}, {"id": "s2", "type": "code", "content": "'cognitive pipeline'[::-1] -> 'enilepip evitingoc'", "parent": "s1"}], "final_answer": "enilepip evitingoc"}
 {"version": "0.1", "task": "Reverse the string 'memory'.", "steps": [{"id": "s1", "type": "thought", "content": "Read characters from right to left."}, {"id": "s2", "type": "code", "content": "'memory'[::-1] -> 'yromem'", "parent": "s1"}], "final_answer": "yromem"}
 {"version": "0.1", "task": "Reverse the string 'policy'.", "steps": [{"id": "s1", "type": "thought", "content": "Read characters from right to left."}, {"id": "s2", "type": "code", "content": "'policy'[::-1] -> 'ycilop'", "parent": "s1"}], "final_answer": "ycilop"}
 {"version": "0.1", "task": "Reverse the string 'budget'.", "steps": [{"id": "s1", "type": "thought", "content": "Read characters from right to left."}, {"id": "s2", "type": "code", "content": "'budget'[::-1] -> 'tegdub'", "parent": "s1"}], "final_answer": "tegdub"}

diff --git a/docs/bibliography.md b/docs/bibliography.md
@@ -1,4 +1,4 @@
-# 📚 Annotated Bibliography: Chain‑of‑Thought & LLM Reasoning  
+# 📚 Annotated Bibliography: Chain‑of‑Thought & LLM Reasoning
 *With direct arXiv PDF links where available.*
 
 A curated bibliography covering foundational, structured, search‑based, RL‑based, and mechanistic reasoning research for LLMs. All arXiv‑hosted papers include stable PDF links.
@@ -7,109 +7,109 @@ A curated bibliography covering foundational, structured, search‑based, RL‑b
 
 ## 1. Foundational Chain‑of‑Thought (CoT)
 
-### Wei et al. (2022). *Chain‑of‑Thought Prompting Elicits Reasoning in Large Language Models.*  
-https://arxiv.org/pdf/2201.11903.pdf  
-Introduces CoT prompting and demonstrates large gains in arithmetic, symbolic, and commonsense reasoning.  
+### Wei et al. (2022). *Chain‑of‑Thought Prompting Elicits Reasoning in Large Language Models.*
+https://arxiv.org/pdf/2201.11903.pdf
+Introduces CoT prompting and demonstrates large gains in arithmetic, symbolic, and commonsense reasoning.
 **Relevance:** Defines the modern concept of “reasoning traces.”
 
-### Wang et al. (2022). *Self‑Consistency Improves Chain‑of‑Thought Reasoning in LLMs.*  
-https://arxiv.org/pdf/2203.11171.pdf  
-Proposes sampling multiple CoTs and voting for the most consistent answer.  
+### Wang et al. (2022). *Self‑Consistency Improves Chain‑of‑Thought Reasoning in LLMs.*
+https://arxiv.org/pdf/2203.11171.pdf
+Proposes sampling multiple CoTs and voting for the most consistent answer.
 **Relevance:** Establishes statistical evaluation of reasoning.
 
-### Zhou et al. (2022). *Least‑to‑Most Prompting.*  
-https://arxiv.org/pdf/2205.10625.pdf  
-Breaks complex tasks into simpler subproblems.  
+### Zhou et al. (2022). *Least‑to‑Most Prompting.*
+https://arxiv.org/pdf/2205.10625.pdf
+Breaks complex tasks into simpler subproblems.
 **Relevance:** Motivates structured decomposition fields in reasoning schemas.
 
 ---
 
 ## 2. Structured Reasoning & Agentic CoT
 
-### Yao et al. (2022). *ReAct: Synergizing Reasoning and Acting in Language Models.*  
-https://arxiv.org/pdf/2210.03629.pdf  
-Combines reasoning (“Thought”) with tool actions (“Act”).  
-**Relevance:** Foundation of modern agent loops.
+### Yao et al. (2022). *ReAct: Synergizing Reasoning and Acting in Language Models.*
+https://arxiv.org/pdf/2210.03629.pdf
+Combines reasoning (“Thought”) with tool actions (“Act”).
+**Relevance:** Foundation of modern cognitive pipelines.
 
-### Shinn et al. (2023). *Reflexion: Language Agents with Verbal Reinforcement Learning.*  
-https://arxiv.org/pdf/2303.11366.pdf  
-Introduces self‑critique and iterative refinement loops.  
+### Shinn et al. (2023). *Reflexion: Language Agents with Verbal Reinforcement Learning.*
+https://arxiv.org/pdf/2303.11366.pdf
+Introduces self‑critique and iterative refinement loops.
 **Relevance:** Motivates `critique` and `revision` fields in schemas.
 
-### Chen et al. (2022). *Program‑of‑Thoughts (PoT).*  
-https://arxiv.org/pdf/2211.12588.pdf  
-Uses executable code as reasoning traces.  
+### Chen et al. (2022). *Program‑of‑Thoughts (PoT).*
+https://arxiv.org/pdf/2211.12588.pdf
+Uses executable code as reasoning traces.
 **Relevance:** Demonstrates typed, verifiable reasoning.
 
 ---
 
 ## 3. Search‑Based Reasoning (Beyond Linear CoT)
 
-### Yao et al. (2023). *Tree‑of‑Thoughts: Deliberate Problem Solving with Large Language Models.*  
-https://arxiv.org/pdf/2305.10601.pdf  
-Generalizes CoT into a search tree with branching and pruning.  
+### Yao et al. (2023). *Tree‑of‑Thoughts: Deliberate Problem Solving with Large Language Models.*
+https://arxiv.org/pdf/2305.10601.pdf
+Generalizes CoT into a search tree with branching and pruning.
 **Relevance:** Motivates branching reasoning structures.
 
-### Besta et al. (2023). *Graph‑of‑Thoughts: Solving Problems with Large Language Models and Search.*  
-https://arxiv.org/pdf/2308.09687.pdf  
-Extends ToT into graph‑structured reasoning.  
+### Besta et al. (2023). *Graph of Thoughts: Solving Elaborate Problems with Large Language Models.*
+https://arxiv.org/pdf/2308.09687.pdf
+Extends ToT into graph‑structured reasoning.
 **Relevance:** Encourages flexible graph‑based schemas.
 
-### Long‑Horizon CoT Studies  
-(Various works; no single canonical arXiv source.)  
-Show that longer reasoning traces improve performance but increase instability.  
+### Long‑Horizon CoT Studies
+(Various works; no single canonical arXiv source.)
+Show that longer reasoning traces improve performance but increase instability.
 **Relevance:** Motivates metadata like `confidence`, `verification_status`, and `error_type`.
 
 ---
 
 ## 4. RL‑Based Reasoning (R1‑Style, DeepSeek‑Style, Qwen‑Style)
 
-### DeepSeek‑R1 (2024). *DeepSeek‑R1: Incentivizing Reasoning in LLMs via Reinforcement Learning.*  
-https://arxiv.org/pdf/2501.12948.pdf  
-Uses RL with verifiable rewards to produce long, structured reasoning.  
+### DeepSeek‑R1 (2025). *DeepSeek‑R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.*
+https://arxiv.org/pdf/2501.12948.pdf
+Uses RL with verifiable rewards to produce long, structured reasoning.
 **Relevance:** Aligns with structured scratchpad formats.
 
-### Qwen2.5‑R1 (2024). *Reinforcement Learning for Reasoning.*  
-https://arxiv.org/pdf/2501.19393.pdf  
-Documents RL-centric post-training strategies that improve reasoning quality while preserving broad instruction utility.  
+### Qwen2.5‑R1 (2024). *Reinforcement Learning for Reasoning.*
+https://arxiv.org/pdf/2501.19393.pdf
+Documents RL-centric post-training strategies that improve reasoning quality while preserving broad instruction utility.
 **Relevance:** Supports reward-aware post-training pipelines and reproducibility-oriented run metadata.
 
 ---
 
 ## 5. Evaluation, Reliability, and Calibration
 
-### Lin et al. (2021). *TruthfulQA: Measuring How Models Mimic Human Falsehoods.*  
-https://arxiv.org/pdf/2109.07958.pdf  
-Introduces reliability-oriented evaluation emphasizing truthful behavior under difficult prompts.  
+### Lin et al. (2021). *TruthfulQA: Measuring How Models Mimic Human Falsehoods.*
+https://arxiv.org/pdf/2109.07958.pdf
+Introduces reliability-oriented evaluation emphasizing truthful behavior under difficult prompts.
 **Relevance:** Motivates safety-aware benchmark slices and failure-mode tracking.
 
-### Kadavath et al. (2022). *Language Models (Mostly) Know What They Know.*  
-https://arxiv.org/pdf/2207.05221.pdf  
-Studies calibration and confidence quality in language models.  
+### Kadavath et al. (2022). *Language Models (Mostly) Know What They Know.*
+https://arxiv.org/pdf/2207.05221.pdf
+Studies calibration and confidence quality in language models.
 **Relevance:** Motivates confidence and uncertainty metrics in verifier outputs.
 
-### Gao et al. (2023). *Pal: Program-Aided Language Models.*  
-https://arxiv.org/pdf/2211.10435.pdf  
-Uses executable programs to verify intermediate reasoning steps.  
+### Gao et al. (2023). *Pal: Program-Aided Language Models.*
+https://arxiv.org/pdf/2211.10435.pdf
+Uses executable programs to verify intermediate reasoning steps.
 **Relevance:** Supports stronger step-level verification beyond format checks.
 
 ---
 
 ## 6. Open-Source Tooling and Reuse Guidance
 
-### EleutherAI LM Evaluation Harness  
-https://github.com/EleutherAI/lm-evaluation-harness  
-De facto open benchmark runner for reproducible LLM evaluation.  
+### EleutherAI LM Evaluation Harness
+https://github.com/EleutherAI/lm-evaluation-harness
+De facto open benchmark runner for reproducible LLM evaluation.
 **Relevance:** Should be integrated through adapters rather than reimplemented.
 
-### Hugging Face TRL  
-https://github.com/huggingface/trl  
-Open-source stack for SFT, DPO, PPO/GRPO-style fine-tuning workflows.  
+### Hugging Face TRL
+https://github.com/huggingface/trl
+Open-source stack for SFT, DPO, PPO/GRPO-style fine-tuning workflows.
 **Relevance:** Preferred training primitive for alignment and preference experiments.
 
-### vLLM  
-https://github.com/vllm-project/vllm  
-High-throughput inference engine with consistent generation behavior for evaluation and serving.  
+### vLLM
+https://github.com/vllm-project/vllm
+High-throughput inference engine with consistent generation behavior for evaluation and serving.
 **Relevance:** Stabilizes benchmark throughput and reproducibility for large eval runs.
 
 ---
@@ -142,9 +142,9 @@ Reports 56% token reduction vs JSON with native relationship support, type safet
 
 Use this checklist when building, fine-tuning, and validating models with Open CoT:
 
-1. **Always emit structured traces** (`version`, `task`, `steps`, `final_answer`) and validate before scoring.  
-2. **Use multi-sample evaluation** with consistency metrics, not only single greedy outputs.  
-3. **Track lineage metadata** (dataset hash, model base, adapter hash, seed, decoding config) for every run.  
-4. **Enforce data governance gates** (license allowlist, dedup, contamination checks, provenance fields).  
-5. **Run policy and safety checks** (budget limits, tool restrictions, redaction/audit events) in runtime scripts.  
+1. **Always emit structured traces** (`version`, `task`, `steps`, `final_answer`) and validate before scoring.
+2. **Use multi-sample evaluation** with consistency metrics, not only single greedy outputs.
+3. **Track lineage metadata** (dataset hash, model base, adapter hash, seed, decoding config) for every run.
+4. **Enforce data governance gates** (license allowlist, dedup, contamination checks, provenance fields).
+5. **Run policy and safety checks** (budget limits, tool restrictions, redaction/audit events) in runtime scripts.
 6. **Reuse mature OSS tooling** for training/evaluation kernels and keep Open CoT logic focused on schemas/adapters/conformance.
diff --git a/docs/cognitive-participation-pivot.md b/docs/cognitive-participation-pivot.md
@@ -0,0 +1,54 @@
+# Cognitive Participation Pivot
+
+Open CoT defines a portable interface between cognition and execution. The
+model contributes fuzzy text processing and structured cognitive artifacts; the
+runtime boundary validates, authorizes, executes, records observations, and
+reconciles final state.
+
+This distinction matters because natural-language reasoning is useful evidence,
+but it is not authority. A reasoning trace can explain how a model reached a
+proposal. It cannot grant permission, prove correctness, or bypass policy.
+
+| Common market framing | Open CoT framing |
+| --- | --- |
+| A model owns the loop | A runtime boundary owns reconciliation |
+| Tool use is part of the model experience | Endpoint execution is a governed side effect |
+| Prompts carry safety expectations | Capability snapshots and policy gates carry authority |
+| Reasoning explains the whole run | Reasoning is cognitive evidence inside a larger audit record |
+| A failed tool call is explained by natural language | A failed endpoint execution is recorded as a structured observation and error |
+| Safety is mostly instruction-following | Safety is layered validation, permission, budget, and result reconciliation |
+| Interfaces are private runtime details | Interfaces are portable schemas that independent runtimes can implement |
+
+## Reasoning Remains Central
+
+Open CoT keeps reasoning traces because they are evidence of cognitive
+participation. They help answer:
+
+- What objective did the cognitive step believe it was handling?
+- What constraints and assumptions shaped the proposal?
+- What uncertainty was present before execution?
+- What explanation can be shared safely with reviewers?
+- What detailed evidence, if any, must remain restricted or redacted?
+
+The trace is intentionally separated from execution authority. A runtime may use
+reasoning evidence during review, auditing, debugging, or evaluation, but it
+must reconcile execution intents against capability snapshots, policy gates,
+budgets, preconditions, and endpoint results.
+
+## Interface Boundary
+
+Open CoT should standardize portable artifacts:
+
+- Cognitive artifacts.
+- Capability snapshots.
+- Execution intents.
+- Reasoning evidence.
+- Observations.
+- Policy evaluation records.
+- Reconciliation results.
+- Error taxonomy.
+- Budget and cost boundaries.
+
+Open Lagrange and other implementations can then choose their own durable
+runtime, transport, endpoint registry, policy engine, and storage model while
+still sharing the same interface contract.
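
The reconciliation rule in `docs/cognitive-participation-pivot.md` (execution intents are checked against capability snapshots, policy gates, and budgets, while reasoning stays evidence only) could be sketched roughly as follows. This is a minimal illustration, not the harness implementation: every type name, field name, and reason code here (`CapabilitySnapshot`, `ExecutionIntent`, `allowedEndpoints`, `estimatedCost`, `reconcile`, and so on) is hypothetical and not taken from the Open CoT schemas.

```typescript
// Hypothetical shapes for illustration only; real field names live in the Open CoT schemas.
interface CapabilitySnapshot {
  allowedEndpoints: string[];
}

interface ExecutionIntent {
  endpoint: string;
  estimatedCost: number;
  // Evidence for reviewers and auditors; reconcile() never reads it as authority.
  reasoningEvidence?: string;
}

interface ReconciliationResult {
  status: "authorized" | "denied";
  reason?: "capability_missing" | "policy_denied" | "budget_exceeded";
}

type PolicyGate = (intent: ExecutionIntent) => boolean;

// Runtime-side check: capability first, then policy gates, then budget.
function reconcile(
  intent: ExecutionIntent,
  snapshot: CapabilitySnapshot,
  policyGates: PolicyGate[],
  remainingBudget: number,
): ReconciliationResult {
  if (!snapshot.allowedEndpoints.includes(intent.endpoint)) {
    return { status: "denied", reason: "capability_missing" };
  }
  if (!policyGates.every((gate) => gate(intent))) {
    return { status: "denied", reason: "policy_denied" };
  }
  if (intent.estimatedCost > remainingBudget) {
    return { status: "denied", reason: "budget_exceeded" };
  }
  return { status: "authorized" };
}
```

In this sketch a persuasive `reasoningEvidence` string has no effect on the outcome: an intent naming an endpoint that is absent from the snapshot is denied regardless of how the model justified it, which is the separation between evidence and authority that the new document describes.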