From 8c1ec75ebaaed125a1ddf957381fb928052708a5 Mon Sep 17 00:00:00 2001
From: cyber-ayi <259769279+cyber-ayi@users.noreply.github.com>
Date: Wed, 3 Jun 2026 16:24:19 -0700
Subject: [PATCH] docs(exploration): prior-art survey for L2 runtimes + L3
 cognition
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fills the T2/T4 context gap — the L2/L3 prior-art survey was only in chat, never
written to docs (grep found just scattered name-drops).

- exploration/prior-art-l2-l3.md — current (2026) community/research work for the
  harness runtime (L2: ElizaOS/OpenClaw/Hermes/Voyager/PIANO) and cognition layer
  (L3: Generative Agents, Letta/MemGPT/Mem0, Colas/MAGELLAN, needs/personality
  frameworks), mapped to our needs. Two tensions: optimization bias is everywhere
  (borrow goal-generation, drop the reward objective — adr-0004); the "non-optimal
  believable long-horizon" gap is confirmed = the project's contribution.
  Recommendations feed T2 (runtime: ElizaOS-core + OpenClaw-gateway +
  Hermes-fallback) and T4 (reuse Generative Agents memory/reflection + Letta/Mem0;
  goal-gen from Colas minus learning-progress; needs/identity from Sophia/needs-emergence).
- ROADMAP — harness-runtime gate now cites the lean; note added to Open explorations.

Session-Id: 019e8d56-605a-7b45-8ef0-21ee576aa7a9
Agent: cc-rc-bot

Co-authored-by: cyber-ayi <259769279+cyber-ayi@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 ROADMAP.md                     |  3 +-
 exploration/prior-art-l2-l3.md | 93 ++++++++++++++++++++++++++++++++++
 2 files changed, 95 insertions(+), 1 deletion(-)
 create mode 100644 exploration/prior-art-l2-l3.md

diff --git a/ROADMAP.md b/ROADMAP.md
index 8ea1e81..df6b842 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -83,7 +83,7 @@ Each downstream task carries its own small decision-gate. None block T1.
 
 | Open decision | Surfaces in | Status |
 |---|---|---|
-| harness **runtime**: OpenClaw vs ElizaOS vs Hermes | T2 / T5 | candidates listed in `harness/README.md`, unchosen |
+| harness **runtime**: OpenClaw vs ElizaOS vs Hermes | T2 / T5 | unchosen; **lean = ElizaOS-core + OpenClaw-gateway + Hermes-fallback** per `exploration/prior-art-l2-l3.md` |
 | **IPC transport** TS↔Python (how JSON crosses the seam) | T5 | explicitly deferred by ADR-0006 |
 | Melvor **act path**: mod API vs CDP / headless-Chromium | T2 | `TASKS.md` requires choose-and-justify |
 | **salient push** channel: ntfy vs Discord | T3 | **converging → MVP: AstrBot over Discord, two-way** (transport-not-brain; `discord.py` fallback). See `exploration/gateway-selection.md` |
@@ -96,6 +96,7 @@ Evaluations still in motion live in `exploration/` and resolve into ADRs.
 |---|---|---|
 | `exploration/substrate-selection.md` | **resolved → adr-0007** | substrate ladder: Melvor P0, Stardew P1+ (dual-control staged to P1) |
 | `exploration/gateway-selection.md` | converging | operator↔agent channel; MVP = AstrBot over Discord, two-way (transport-not-brain; `discord.py` fallback) |
+| `exploration/prior-art-l2-l3.md` | converging | L2 runtimes + L3 cognition survey; feeds T2 (runtime) and T4 (drive design): reuse-vs-refuse, drop the optimization objective |
 
 ## Governance / ops track (parallel to the product)
 
diff --git a/exploration/prior-art-l2-l3.md b/exploration/prior-art-l2-l3.md
new file mode 100644
index 0000000..cc5ed96
--- /dev/null
+++ b/exploration/prior-art-l2-l3.md
@@ -0,0 +1,93 @@
+---
+topic: prior-art-l2-l3
+status: converging
+date: 2026-06-03
+related-adrs: [adr-0004, adr-0006]
+resolves-to:
+---
+
+# Prior art — L2 runtimes & L3 cognition (what to build on, what to refuse)
+
+> Status: **converging**. A survey of current (2026) community/research work for
+> the harness runtime (L2) and the cognition layer (L3), mapped to our needs.
+> Feeds T2 (runtime choice) and T4 (drive-layer design). Resolves into ADRs once
+> the T2 and T4 decisions are committed.
+
+## Question
+
+What existing L2 runtimes and L3 cognition systems can we build on, and which
+carry biases (reward/optimization) we must **not** adopt under the
+autotelic-not-reward stance (`adr-0004`)?
+
+## Criteria
+
+- **L2 harness:** runtime + perception↔action loop, local-model friendly, async
+  operator channel, tolerant of dual-control; **no built-in reward/curriculum bias**.
+- **L3 cognition:** autotelic-not-reward, identity + needs/drives + memory/reflection,
+  long-horizon *non-optimal believable* behavior.
+
+## L2 — harness / runtime / game-playing agents
+
+| Impl | What | Fits | Don't / risk |
+|---|---|---|---|
+| **ElizaOS** | TS agent runtime; plugin = actions/providers/**evaluators** loop; PostgreSQL memory; Worlds/Rooms; targets game NPCs/companions | best cognition-friendly harness; TS (adr-0006); "agent-in-world" precedent | heavily **Web3/crypto**-oriented now — use only the core loop |
+| **OpenClaw** | TS/Node/Electron; messaging-platform UI; **browser relay over CDP**; Ollama | browser/CDP + **operator async gateway** | heavy (Electron/sandboxes); CDP moot for SMAPI/MCP substrates |
+| **Hermes** (NousResearch) | provider-agnostic, **local-first** (Ollama/LM Studio) | local "fallback brain" (privacy routing, adr-0006 rule 6) | needs ≥64k ctx; no world precedent |
+| **Voyager / ODYSSEY / MindForge / Co-Voyager** | Minecraft open-world agents: **auto-curriculum + skill library** | reference architecture for open-world exploration | **skill-acquisition + curriculum = reward-shaped → conflicts with adr-0004**; reference only |
+| **PIANO** (Project Sid / Altera) | parallel multi-stream cognition; agents **generate own goals** from social motivation; 1000+ agents | good **L3** reference (multi-stream + self-generated goals) | civilization/multi-agent framing; not a single-dyad harness |
+
+## L3 — cognition / memory / autotelic motivation
+
+| Impl | What | Fits | Gap / risk |
+|---|---|---|---|
+| **Generative Agents** (Park 2023) | memory stream + **reflection** + planning | memory/reflection **paradigm reusable directly** | behavior still serves schedules/goals |
+| **Letta/MemGPT · Mem0** | tiered memory (core/recall/archival), auto promote/compress | **memory infrastructure off the shelf** | memory layer only; no drive/identity |
+| **autotelic line** (Colas; **MAGELLAN** ICML'25; "Beyond Utility" NeurIPS'25) | agents self-generate NL goals; MAGELLAN uses **learning-progress (LP)** to guide goal choice | goal-generation machinery = the selector's theoretical core | **LP is an intrinsic reward → still optimizing**; borrow the mechanism, **drop the optimization objective** |
+| **needs / personality / artificial life** (**Sophia** 2512.18202; "personality from **needs alone**"; evolving_personality; SPeCtrum) | personality/behavior **emerging from basic needs**; persistent identity | closest to the identity+needs engine | social-emergence framing; known **persona drift** + convergence to "average persona" |
+
+## Two tensions
+
+1. **Optimization bias is everywhere.** Open-world agents (Voyager/PIANO) and
+   "autotelic RL" (Colas/MAGELLAN learning-progress) ultimately **optimize**
+   (skills / curriculum / LP). Our stance ("purposeless, non-optimal, an end in
+   itself") is *more radical*. → borrow the goal-generation machinery, but
+   **deliberately discard the optimization objective**; otherwise the drive layer
+   collapses back into a reward maximizer (`adr-0004`).
+2. **The gap is confirmed.** "Persistent personality/needs sustaining long-horizon
+   *non-optimal believable* behavior" is essentially unstudied in the work surveyed
+   here, which is exactly the niche `design/autotelic-drives.md` claims. The 2026
+   needs/artificial-life line is the closest, but it is still social-emergence-framed
+   and drift-prone. **This unfilled niche is the project's contribution.**
+
+## Recommendations
+
+- **L2 (don't build a runtime — adopt one):** **ElizaOS** core (cognition-friendly
+  loop + memory + world precedent, TS) — *strip the crypto*; **OpenClaw** for
+  browser/CDP + the operator async gateway when needed; **Hermes** as the local-first
+  fallback brain. Voyager and PIANO are **reference architectures only** (reward bias; multi-agent framing).
+- **L3 (self-build, since it's the IP, but stand on the shoulders of giants):** reuse the
+  **Generative Agents** memory+reflection paradigm with **Letta/Mem0** for storage; take
+  goal-generation from **Colas/MAGELLAN** but **cut the learning-progress reward**; take identity+needs
+  from the **needs-emergence/Sophia** line. Spend the budget on the unfilled niche:
+  long-horizon, non-optimal, believable inhabitation.
+
+## Per-task relevance
+
+- **T2 (Melvor adapter / runtime):** the L2 table + lean (ElizaOS-core /
+  OpenClaw-gateway / Hermes-fallback) is the input to the `harness runtime`
+  decision-gate (ROADMAP). Note: for Melvor/Stardew (mod/SMAPI/MCP adapters)
+  OpenClaw's CDP edge is less decisive.
+- **T4 (drive layer):** the L3 table + the two tensions are the design backdrop —
+  *what to reuse, what to refuse*. The "drop the optimization objective" rule is the
+  load-bearing constraint, alongside `adr-0004`.
+
+## Open items
+
+- Commit the L2 runtime choice → a `harness runtime` ADR (resolves the ROADMAP gate).
+- T4 will likely spawn its own exploration (selector design without an optimization
+  objective; memory/identity stack choice).
+
+## Sources
+
+- L2: [ElizaOS](https://www.elizaos.ai/) · [ElizaOS/OpenClaw/Hermes compared](https://innfactory.ai/en/blog/openclaw-vs-hermes-agent-comparison/) · [OpenClaw browser harness](https://openclawlaunch.com/guides/openclaw-browser-harness) · [Hermes Agent](https://github.com/nousresearch/hermes-agent) · [Voyager](https://voyager.minedojo.org/) · [ODYSSEY](https://openreview.net/pdf?id=vtGLtSxtqv) · [MindForge](https://arxiv.org/pdf/2411.12977) · [Project Sid / PIANO](https://arxiv.org/abs/2411.00114)
+- L3: [Generative Agents](https://arxiv.org/pdf/2304.03442) · [Letta/MemGPT vs Mem0](https://vectorize.io/articles/mem0-vs-letta) · [Augmenting Autotelic Agents w/ LLMs (Colas)](https://proceedings.mlr.press/v232/colas23a/colas23a.pdf) · [Colas publications (MAGELLAN)](https://cedriccolas.com/publications/) · [LLM Agents Beyond Utility](https://arxiv.org/abs/2510.14548) · [Sophia: Persistent Agent Framework for Artificial Life](https://arxiv.org/pdf/2512.18202) · [Personality from needs alone](https://www.eurekalert.org/news-releases/1099709) · [SPeCtrum identity](https://arxiv.org/pdf/2502.08599)