From 978258199adfdc1177e7e149734f02e3e41d8156 Mon Sep 17 00:00:00 2001
From: cyber-ayi <259769279+cyber-ayi@users.noreply.github.com>
Date: Sat, 6 Jun 2026 12:25:31 -0700
Subject: [PATCH] =?UTF-8?q?docs(exploration):=20prior-art=20L2/L3=20?=
 =?UTF-8?q?=E2=80=94=20Chinese-community=20supplement?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

A separate research pass surfaced zh-ecosystem items the original survey
missed: four L2/L3 table rows and five supplement items. The most load-bearing
is GenerativeAgentsCN, a Smallville reimplementation with a verified run of 25
LLM agents on local Ollama + Qwen3-4B / DeepSeek-R1, which materially de-risks
the local-cost question for any future multi-agent substrate work.

L2/L3 table additions:
- L2: AgentScope (Alibaba DAMO, Apache-2.0) — msghub multi-agent broadcast +
  pipeline; official 7-agent Werewolf template; closest off-the-shelf
  "drop N agents into a room" runtime. Flagged for spike vs ElizaOS-core
  only if the multi-agent substrate option is pursued.
- L3: GenerativeAgentsCN (x-glacier, MIT) — Smallville zh fork with verified
  Ollama + Qwen3-4B / DeepSeek-R1 at N=25. Concrete local-LLM cost evidence.
- L3: AgentVerse simulation track (OpenBMB/Tsinghua, arxiv 2308.10848) —
  third multi-agent-emergence reference besides Smallville/PIANO.
- L3: EconAgent (Tsinghua, ACL'24 Outstanding) — 100 LLM × 20-yr macro sim
  reproducing stylized facts. Strongest evidence that long-horizon
  multi-LLM sims can stay coherent; supports "non-optimal believable
  long-horizon" feasibility (borrow the coherence evidence, not the
  objective).

Chinese-community supplement section (items that don't fit the L2/L3 dichotomy):
- CharacterGLM-6B (THU CoAI + Lingxin, EMNLP'24, open 6B) — Chinese
  role-customised pre-trained dialogue model; NPC local-model candidate.
- Chinese RP corpora — ChatHaruhi (54k), CharacterEval (1785/77), RoleBench,
  SuperCLUE-Role; reusable as persona-fidelity / believability evaluator
  + RAG corpus.
- Wuxia-MUD lib assets (pkuxkx.net wiki + mudcore) — 30 yrs of LPMud zh
  content as RAG corpus if a zh-setting substrate is chosen (setting
  choice deferred to substrate-evennia-multi-agent).
- AI-companion product observation — 筑梦岛/猫箱/星野/Tavo ship
  multi-AI-in-one-scene features; zh-market consumer validation of
  multi-agent co-presence UX. Closed-source — informative only.
- One MUD × LLM lead: mud.ren/threads/436 'Yanhuang MUD' (炎黄 MUD) —
  the only public zh MUD + LLM signal found; single-NPC + no GitHub.
  Worth contacting author if multi-agent MUD work proceeds.

Recommendations + Open items updated to gate the AgentScope spike behind
the upcoming substrate-evennia-multi-agent exploration; the note's overall
status stays converging (lean unchanged: ElizaOS-core + OpenClaw + Hermes).

Session-Id: 019e9e62-7e3f-7286-9de2-7b3bc7b9369d
Agent: cc-rc-bot
---
 exploration/prior-art-l2-l3.md | 52 ++++++++++++++++++++++++++++++++--
 1 file changed, 50 insertions(+), 2 deletions(-)

diff --git a/exploration/prior-art-l2-l3.md b/exploration/prior-art-l2-l3.md
index cc5ed96..f948f6e 100644
--- a/exploration/prior-art-l2-l3.md
+++ b/exploration/prior-art-l2-l3.md
@@ -35,6 +35,7 @@ carry biases (reward/optimization) we must **not** adopt given autotelic-not-rew
 | **Hermes** (NousResearch) | provider-agnostic, **local-first** (Ollama/LM Studio) | local "fallback brain" (privacy routing, adr-0006 rule 6) | needs ≥64k ctx; no world precedent |
 | **Voyager / ODYSSEY / MindForge / Co-Voyager** | Minecraft open-world agents: **auto-curriculum + skill library** | reference architecture for open-world exploration | **skill-acquisition + curriculum = reward-shaped → conflicts with adr-0004**; reference only |
 | **PIANO** (Project Sid / Altera) | parallel multi-stream cognition; agents **generate own goals** from social motivation; 1000+ agents | good **L3** reference (multi-stream + self-generated goals) | civilization/multi-agent framing; not a single-dyad harness |
+| **AgentScope** (Alibaba DAMO, Apache-2.0) | Python multi-agent framework; **`msghub` broadcast** + pipeline orchestration; official 7-agent Werewolf game template; Ollama / local-LLM friendly | **multi-agent comms primitive** — closest thing to a "drop N agents into a room" runtime; useful for the multi-agent substrate option in [[substrate-evennia-multi-agent]] | workflow/task-oriented; less *persistent-world* precedent than Eliza — needs a world layer above |
 
 ## L3 — cognition / memory / autotelic motivation
 
@@ -44,6 +45,44 @@ carry biases (reward/optimization) we must **not** adopt given autotelic-not-rew
 | **Letta/MemGPT · Mem0** | tiered memory (core/recall/archival), auto promote/compress | **memory infrastructure off the shelf** | memory layer only; no drive/identity |
 | **autotelic line** (Colas; **MAGELLAN** ICML'25; "Beyond Utility" NeurIPS'25) | agents self-generate NL goals; MAGELLAN uses **learning-progress (LP)** to guide goal choice | goal-generation machinery = the selector's theoretical core | **LP is an intrinsic reward → still optimizing**; borrow the mechanism, **drop the optimization objective** |
 | **needs / personality / artificial life** (**Sophia** 2512.18202; "personality from **needs alone**"; evolving_personality; SPeCtrum) | personality/behavior **emerging from basic needs**; persistent identity | closest to the identity+needs engine | social-emergence framing; known **persona drift** + convergence to "average persona" |
+| **GenerativeAgentsCN** (x-glacier, MIT, 463⭐) | Smallville Chinese reimplementation; **verified Ollama + Qwen3-4B / DeepSeek-R1 running 25 agents** locally | **concrete local-LLM cost evidence** for an N≥25 multi-agent run + a ready zh scaffold; fork-and-run start for multi-agent emergence work | still Smallville-shaped (schedules/goals) — same optimization framing as Park 2023 |
+| **AgentVerse — `simulation` track** (OpenBMB/Tsinghua, arxiv 2308.10848) | LLM multi-agent framework split into `task-solving` + `simulation`; Minecraft branch studies emergent multi-agent behavior | third multi-agent-emergence reference besides Smallville/PIANO; cleaner sim/task separation than ElizaOS | not autotelic — sim runs still framed by task success |
+| **EconAgent** (Tsinghua, ACL'24 Outstanding) | 100 LLM agents × 20 simulated years; macro-economic sim that **reproduces stylized economic facts** | strongest existing evidence that a **long-horizon multi-LLM sim can stay coherent** — supports the "non-optimal believable long-horizon" feasibility | optimization-shaped objective (macro outcomes) → borrow the coherence-evidence, not the objective |
+
+## Chinese-community supplement (added 2026-06-06)
+
+Section added after the original note converged. Covers Chinese-ecosystem
+items that sit alongside the L2/L3 tables — role-LLM model layer, RP corpora,
+and MUD assets — plus a relevant consumer-product observation. The new L2/L3
+rows above (AgentScope, GenerativeAgentsCN, AgentVerse-sim, EconAgent) belong
+in their tables; this section is for the items that don't.
+
+- **CharacterGLM-6B** (THU CoAI + Lingxin AI, EMNLP'24, open 6B) — Chinese
+  role-customised **pre-trained** dialogue model with a six-dimension subjective
+  evaluator. Candidate **NPC local model** when role-fidelity matters more than
+  general capability; slots beneath the L3 table as a model-layer choice.
+- **Chinese RP / role-eval corpora**: ChatHaruhi (54k dialogues, 32 zh+en
+  characters, MIT) · CharacterEval (1785 multi-turn dialogues, 77 zh
+  novel/drama characters) · RoleBench · SuperCLUE-Role. Collectively the
+  largest open Chinese role-fidelity dataset stack. Reusable as (a) NPC
+  persona-fidelity evaluator, (b) drive-layer believability evaluator,
+  (c) RAG corpus for character knowledge.
+- **Wuxia-MUD lib assets** (pkuxkx.net wiki + `mudcore` / `xwjy_mud/mudcore`):
+  30 years of LPMud-based Chinese MUD content — characters / sects / techniques
+  / geography / NPC dialogue — usable as RAG corpus *if* a Chinese-setting
+  substrate is chosen. Setting choice is left to [[substrate-evennia-multi-agent]];
+  the asset's existence is the relevant prior-art fact.
+- **AI-companion product observation** (informative, not adoptable): the
+  closed-source Chinese RP apps 筑梦岛 (Yuewen/Tencent), 猫箱 (ByteDance), 星野
+  (MiniMax), and Tavo all ship "multi-AI characters in one shared scene" features.
+  **Multi-agent co-presence has consumer-product validation in the zh market**
+  that the en market lacks — a weak signal that the operator-in-multi-agent-world
+  UX is not unprecedented (relevant to [[substrate-evennia-multi-agent]]).
+- **One MUD × LLM lead** — `mud.ren/threads/436` describes a project called
+  "Yanhuang MUD" (炎黄 MUD) running `npc_manager.py` for LLM NPCs with memory +
+  knowledge-base retrieval. No GitHub repo surfaced; appears to be single-NPC,
+  not multi-agent. The **only public Chinese MUD + LLM signal found**; worth
+  contacting the thread author if multi-agent MUD work proceeds.
 
 ## Two tensions
 
@@ -65,6 +104,9 @@ carry biases (reward/optimization) we must **not** adopt given autotelic-not-rew
   loop + memory + world precedent, TS) — *strip the crypto*; **OpenClaw** for
   browser/CDP + the operator async gateway when needed; **Hermes** as the local-first
   fallback brain. Voyager/PIANO are **reference architectures only** (reward bias).
+  **AgentScope** is a newly surfaced multi-agent-comms candidate (`msghub` + Werewolf
+  template); merits a spike comparison vs ElizaOS-core *only if* the multi-agent
+  substrate option in [[substrate-evennia-multi-agent]] is pursued.
 - **L3 (self-build — it's the IP — but stand on giants):** reuse **Generative Agents**
   memory+reflection + **Letta/Mem0** for storage; take goal-generation from
   **Colas/MAGELLAN** but **cut the learning-progress reward**; take identity+needs
@@ -86,8 +128,14 @@ carry biases (reward/optimization) we must **not** adopt given autotelic-not-rew
 - Commit the L2 runtime choice → a `harness runtime` ADR (resolves the ROADMAP gate).
 - T4 will likely spawn its own exploration (selector design without an optimization
   objective; memory/identity stack choice).
+- **Multi-agent substrate spike** (gated by [[substrate-evennia-multi-agent]]): fork
+  `GenerativeAgentsCN` and measure tokens per tick at N=25 with Qwen3-4B on the operator's
+  local hardware; only then can an AgentScope-vs-ElizaOS-core comparison be decisive.
+- Contact `mud.ren/threads/436` author re: "Yanhuang MUD" — the single public
+  zh MUD × LLM lead; cheap, may yield code or design insight.
 
 ## Sources
 
-- L2: [ElizaOS](https://www.elizaos.ai/) · [ElizaOS/OpenClaw/Hermes compared](https://innfactory.ai/en/blog/openclaw-vs-hermes-agent-comparison/) · [OpenClaw browser harness](https://openclawlaunch.com/guides/openclaw-browser-harness) · [Hermes Agent](https://github.com/nousresearch/hermes-agent) · [Voyager](https://voyager.minedojo.org/) · [ODYSSEY](https://openreview.net/pdf?id=vtGLtSxtqv) · [MindForge](https://arxiv.org/pdf/2411.12977) · [Project Sid / PIANO](https://arxiv.org/abs/2411.00114)
-- L3: [Generative Agents](https://arxiv.org/pdf/2304.03442) · [Letta/MemGPT vs Mem0](https://vectorize.io/articles/mem0-vs-letta) · [Augmenting Autotelic Agents w/ LLMs (Colas)](https://proceedings.mlr.press/v232/colas23a/colas23a.pdf) · [Colas publications (MAGELLAN)](https://cedriccolas.com/publications/) · [LLM Agents Beyond Utility](https://arxiv.org/abs/2510.14548) · [Sophia: Persistent Agent Framework for Artificial Life](https://arxiv.org/pdf/2512.18202) · [Personality from needs alone](https://www.eurekalert.org/news-releases/1099709) · [SPeCtrum identity](https://arxiv.org/pdf/2502.08599)
+- L2: [ElizaOS](https://www.elizaos.ai/) · [ElizaOS/OpenClaw/Hermes compared](https://innfactory.ai/en/blog/openclaw-vs-hermes-agent-comparison/) · [OpenClaw browser harness](https://openclawlaunch.com/guides/openclaw-browser-harness) · [Hermes Agent](https://github.com/nousresearch/hermes-agent) · [Voyager](https://voyager.minedojo.org/) · [ODYSSEY](https://openreview.net/pdf?id=vtGLtSxtqv) · [MindForge](https://arxiv.org/pdf/2411.12977) · [Project Sid / PIANO](https://arxiv.org/abs/2411.00114) · [AgentScope](https://github.com/modelscope/agentscope)
+- L3: [Generative Agents](https://arxiv.org/pdf/2304.03442) · [Letta/MemGPT vs Mem0](https://vectorize.io/articles/mem0-vs-letta) · [Augmenting Autotelic Agents w/ LLMs (Colas)](https://proceedings.mlr.press/v232/colas23a/colas23a.pdf) · [Colas publications (MAGELLAN)](https://cedriccolas.com/publications/) · [LLM Agents Beyond Utility](https://arxiv.org/abs/2510.14548) · [Sophia: Persistent Agent Framework for Artificial Life](https://arxiv.org/pdf/2512.18202) · [Personality from needs alone](https://www.eurekalert.org/news-releases/1099709) · [SPeCtrum identity](https://arxiv.org/pdf/2502.08599) · [GenerativeAgentsCN](https://github.com/x-glacier/GenerativeAgentsCN) · [AgentVerse](https://github.com/OpenBMB/AgentVerse) (paper: [arxiv 2308.10848](https://arxiv.org/abs/2308.10848)) · [EconAgent (ACL'24)](https://aclanthology.org/2024.acl-long.829/)
+- Chinese-community supplement: [CharacterGLM-6B](https://github.com/thu-coai/CharacterGLM-6B) · [Chat-Haruhi-Suzumiya](https://github.com/LC1332/Chat-Haruhi-Suzumiya) · [CharacterEval](https://arxiv.org/abs/2401.01275) · [RoleBench / RoleLLM](https://github.com/InteractiveNLP-Team/RoleLLM-public) · [SuperCLUE-Role](https://github.com/CLUEbenchmark/SuperCLUE-Role) · [pkuxkx wiki](https://www.pkuxkx.net/wiki) · [mudcore](https://gitee.com/mudcore/mudcore) · [mudchina site list](https://mudchina.github.io/) · [mud.ren/threads/436 (炎黄 MUD)](https://mud.ren/threads/436) · [筑梦岛](https://zhumengdao.com/) · [猫箱 (ByteDance)](https://www.maoxiang.com/)