From 978258199adfdc1177e7e149734f02e3e41d8156 Mon Sep 17 00:00:00 2001
From: cyber-ayi <259769279+cyber-ayi@users.noreply.github.com>
Date: Sat, 6 Jun 2026 12:25:31 -0700
Subject: [PATCH] =?UTF-8?q?docs(exploration):=20prior-art=20L2/L3=20?=
 =?UTF-8?q?=E2=80=94=20Chinese-community=20supplement?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

A separate research pass surfaced five concrete zh-ecosystem items the
original survey missed; the most load-bearing is GenerativeAgentsCN — a
Smallville reimplementation that empirically validates 25 LLM agents
running on local Ollama + Qwen3-4B / DeepSeek-R1, which materially
de-risks the local-cost question for any future multi-agent substrate
work.

L2/L3 table additions:
- L2: AgentScope (Alibaba DAMO, Apache-2.0) — msghub multi-agent
  broadcast + pipeline; official 7-agent Werewolf template; closest
  off-the-shelf "drop N agents into a room" runtime. Flagged for spike
  vs ElizaOS-core only if the multi-agent substrate option is pursued.
- L3: GenerativeAgentsCN (x-glacier, MIT) — Smallville zh fork with
  verified Ollama + Qwen3-4B / DeepSeek-R1 at N=25. Concrete local-LLM
  cost evidence.
- L3: AgentVerse simulation track (OpenBMB/Tsinghua, arxiv 2308.10848) —
  third multi-agent-emergence reference besides Smallville/PIANO.
- L3: EconAgent (Tsinghua, ACL'24 Outstanding) — 100 LLM × 20-yr macro
  sim reproducing stylized facts. Strongest evidence that long-horizon
  multi-LLM sims can stay coherent — supports the "non-optimal
  believable long-horizon" feasibility (separate from objective).

Chinese-community supplement subsection (don't fit the L2/L3 dichotomy):
- CharacterGLM-6B (THU CoAI + Lingxin, EMNLP'24, open 6B) — Chinese
  role-customised pre-trained dialogue model; NPC local-model candidate.
- Chinese RP corpora — ChatHaruhi (54k), CharacterEval (1785/77),
  RoleBench, SuperCLUE-Role; reusable as persona-fidelity /
  believability evaluator + RAG corpus.
- Wuxia-MUD lib assets (pkuxkx.net wiki + mudcore) — 30 yrs of LPMud zh
  content as RAG corpus if a zh-setting substrate is chosen (setting
  choice deferred to substrate-evennia-multi-agent).
- AI-companion product observation — 筑梦岛/猫箱/星野/Tavo ship
  multi-AI-in-one-scene features; zh-market consumer validation of
  multi-agent co-presence UX. Closed-source — informative only.
- One MUD × LLM lead: mud.ren/threads/436 'Yanhuang MUD' (炎黄 MUD) — the
  only public zh MUD + LLM signal found; single-NPC + no GitHub. Worth
  contacting author if multi-agent MUD work proceeds.

Recommendations + Open items updated to gate the AgentScope spike behind
the upcoming substrate-evennia-multi-agent exploration; note's overall
status stays converging (lean unchanged: ElizaOS-core + OpenClaw +
Hermes).

Session-Id: 019e9e62-7e3f-7286-9de2-7b3bc7b9369d
Agent: cc-rc-bot
---
 exploration/prior-art-l2-l3.md | 52 ++++++++++++++++++++++++++++++++--
 1 file changed, 50 insertions(+), 2 deletions(-)

diff --git a/exploration/prior-art-l2-l3.md b/exploration/prior-art-l2-l3.md
index cc5ed96..f948f6e 100644
--- a/exploration/prior-art-l2-l3.md
+++ b/exploration/prior-art-l2-l3.md
@@ -35,6 +35,7 @@ carry biases (reward/optimization) we must **not** adopt given autotelic-not-rew
 | **Hermes** (NousResearch) | provider-agnostic, **local-first** (Ollama/LM Studio) | local "fallback brain" (privacy routing, adr-0006 rule 6) | needs ≥64k ctx; no world precedent |
 | **Voyager / ODYSSEY / MindForge / Co-Voyager** | Minecraft open-world agents: **auto-curriculum + skill library** | reference architecture for open-world exploration | **skill-acquisition + curriculum = reward-shaped → conflicts with adr-0004**; reference only |
 | **PIANO** (Project Sid / Altera) | parallel multi-stream cognition; agents **generate own goals** from social motivation; 1000+ agents | good **L3** reference (multi-stream + self-generated goals) | civilization/multi-agent framing; not a single-dyad harness |
+| **AgentScope** (Alibaba DAMO, Apache-2.0) | Python multi-agent framework; **`msghub` broadcast** + pipeline orchestration; official 7-agent Werewolf game template; Ollama / local-LLM friendly | **multi-agent comms primitive** — closest thing to a "drop N agents into a room" runtime; useful for the multi-agent substrate option in [[substrate-evennia-multi-agent]] | workflow/task-oriented; less *persistent-world* precedent than Eliza — needs a world layer above |
 
 ## L3 — cognition / memory / autotelic motivation
 
@@ -44,6 +45,44 @@ carry biases (reward/optimization) we must **not** adopt given autotelic-not-rew
 | **Letta/MemGPT · Mem0** | tiered memory (core/recall/archival), auto promote/compress | **memory infrastructure off the shelf** | memory layer only; no drive/identity |
 | **autotelic line** (Colas; **MAGELLAN** ICML'25; "Beyond Utility" NeurIPS'25) | agents self-generate NL goals; MAGELLAN uses **learning-progress (LP)** to guide goal choice | goal-generation machinery = the selector's theoretical core | **LP is an intrinsic reward → still optimizing**; borrow the mechanism, **drop the optimization objective** |
 | **needs / personality / artificial life** (**Sophia** 2512.18202; "personality from **needs alone**"; evolving_personality; SPeCtrum) | personality/behavior **emerging from basic needs**; persistent identity | closest to the identity+needs engine | social-emergence framing; known **persona drift** + convergence to "average persona" |
+| **GenerativeAgentsCN** (x-glacier, MIT, 463⭐) | Smallville Chinese reimplementation; **verified Ollama + Qwen3-4B / DeepSeek-R1 running 25 agents** locally | **concrete local-LLM cost evidence** for an N≥25 multi-agent run + a ready zh scaffold; fork-and-run start for multi-agent emergence work | still Smallville-shaped (schedules/goals) — same optimization framing as Park 2023 |
+| **AgentVerse — `simulation` track** (OpenBMB/Tsinghua, arxiv 2308.10848) | LLM multi-agent framework split into `task-solving` + `simulation`; Minecraft branch studies emergent multi-agent behavior | third multi-agent-emergence reference besides Smallville/PIANO; cleaner sim/task separation than ElizaOS | not autotelic — sim runs still framed by task success |
+| **EconAgent** (Tsinghua, ACL'24 Outstanding) | 100 LLM agents × 20 simulated years; macro-economic sim that **reproduces stylized economic facts** | strongest existing evidence that a **long-horizon multi-LLM sim can stay coherent** — supports the "non-optimal believable long-horizon" feasibility | optimization-shaped objective (macro outcomes) → borrow the coherence-evidence, not the objective |
+
+## Chinese-community supplement (added 2026-06-06)
+
+Section added after the original note converged. Covers Chinese-ecosystem
+items that sit alongside the L2/L3 tables — role-LLM model layer, RP corpora,
+and MUD assets — plus a relevant consumer-product observation. The new L2/L3
+rows above (AgentScope, GenerativeAgentsCN, AgentVerse-sim, EconAgent) belong
+in their tables; this subsection is for the items that don't.
+
+- **CharacterGLM-6B** (THU CoAI + Lingxin AI, EMNLP'24, open 6B) — Chinese
+  role-customised dialogue **pre-trained** model with a six-dimension subjective
+  evaluator. Candidate **NPC local model** when role-fidelity matters more than
+  general capability; slots beneath the L3 table as a model-layer choice.
+- **Chinese RP / role-eval corpora**: ChatHaruhi (54k dialogues, 32 zh+en
+  characters, MIT) · CharacterEval (1785 multi-turn dialogues, 77 zh
+  novel/drama characters) · RoleBench · SuperCLUE-Role. Collectively the
+  largest open Chinese role-fidelity dataset stack. Reusable as (a) NPC
+  persona-fidelity evaluator, (b) drive-layer believability evaluator,
+  (c) RAG corpus for character knowledge.
+- **Wuxia-MUD lib assets** (pkuxkx.net wiki + `mudcore` / `xwjy_mud/mudcore`):
+  30 years of LPMud-based Chinese MUD content — characters / sects / techniques
+  / geography / NPC dialogue — usable as RAG corpus *if* a Chinese-setting
+  substrate is chosen. Setting choice is left to [[substrate-evennia-multi-agent]];
+  the asset's existence is the relevant prior-art fact.
+- **AI-companion product observation** (informative, not adoptable):
+  closed-source Chinese RP apps — 筑梦岛 (Yuewen/Tencent), 猫箱 (ByteDance), 星野
+  (MiniMax), Tavo — all ship "multi-AI characters in one shared scene" features.
+  **Multi-agent co-presence has consumer-product validation in the zh market**
+  that the en market lacks — a weak signal that the operator-in-multi-agent-world
+  UX is not unprecedented (relevant to [[substrate-evennia-multi-agent]]).
+- **One MUD × LLM lead** — `mud.ren/threads/436` describes a project called
+  "Yanhuang MUD" (炎黄 MUD) running `npc_manager.py` for LLM NPCs with memory +
+  knowledge-base retrieval. No GitHub repo surfaced; appears to be single-NPC,
+  not multi-agent. The **only public Chinese MUD + LLM signal found**; worth
+  contacting the thread author if multi-agent MUD work proceeds.
 
 ## Two tensions
 
@@ -65,6 +104,9 @@ carry biases (reward/optimization) we must **not** adopt given autotelic-not-rew
   loop + memory + world precedent, TS) — *strip the crypto*; **OpenClaw** for browser/CDP +
   the operator async gateway when needed; **Hermes** as the local-first fallback brain.
   Voyager/PIANO are **reference architectures only** (reward bias).
+  **AgentScope** is a newly surfaced multi-agent-comms candidate (`msghub` + Werewolf
+  template); merits a spike comparison vs ElizaOS-core *only if* the multi-agent
+  substrate option in [[substrate-evennia-multi-agent]] is pursued.
 - **L3 (self-build — it's the IP — but stand on giants):** reuse **Generative Agents**
   memory+reflection + **Letta/Mem0** for storage; take goal-generation from
   **Colas/MAGELLAN** but **cut the learning-progress reward**; take identity+needs
@@ -86,8 +128,14 @@ carry biases (reward/optimization) we must **not** adopt given autotelic-not-rew
 - Commit the L2 runtime choice → a `harness runtime` ADR (resolves the ROADMAP gate).
 - T4 will likely spawn its own exploration (selector design without an optimization
   objective; memory/identity stack choice).
+- **Multi-agent substrate spike** (gated by [[substrate-evennia-multi-agent]]): fork
+  `GenerativeAgentsCN`, measure token/tick at N=25 with Qwen3-4B on operator's local
+  hardware; only then is AgentScope-vs-ElizaOS-core comparison decisive.
+- Contact `mud.ren/threads/436` author re: "Yanhuang MUD" — the single public
+  zh MUD × LLM lead; cheap, may yield code or design insight.
 
 ## Sources
 
-- L2: [ElizaOS](https://www.elizaos.ai/) · [ElizaOS/OpenClaw/Hermes compared](https://innfactory.ai/en/blog/openclaw-vs-hermes-agent-comparison/) · [OpenClaw browser harness](https://openclawlaunch.com/guides/openclaw-browser-harness) · [Hermes Agent](https://github.com/nousresearch/hermes-agent) · [Voyager](https://voyager.minedojo.org/) · [ODYSSEY](https://openreview.net/pdf?id=vtGLtSxtqv) · [MindForge](https://arxiv.org/pdf/2411.12977) · [Project Sid / PIANO](https://arxiv.org/abs/2411.00114)
-- L3: [Generative Agents](https://arxiv.org/pdf/2304.03442) · [Letta/MemGPT vs Mem0](https://vectorize.io/articles/mem0-vs-letta) · [Augmenting Autotelic Agents w/ LLMs (Colas)](https://proceedings.mlr.press/v232/colas23a/colas23a.pdf) · [Colas publications (MAGELLAN)](https://cedriccolas.com/publications/) · [LLM Agents Beyond Utility](https://arxiv.org/abs/2510.14548) · [Sophia: Persistent Agent Framework for Artificial Life](https://arxiv.org/pdf/2512.18202) · [Personality from needs alone](https://www.eurekalert.org/news-releases/1099709) · [SPeCtrum identity](https://arxiv.org/pdf/2502.08599)
+- L2: [ElizaOS](https://www.elizaos.ai/) · [ElizaOS/OpenClaw/Hermes compared](https://innfactory.ai/en/blog/openclaw-vs-hermes-agent-comparison/) · [OpenClaw browser harness](https://openclawlaunch.com/guides/openclaw-browser-harness) · [Hermes Agent](https://github.com/nousresearch/hermes-agent) · [Voyager](https://voyager.minedojo.org/) · [ODYSSEY](https://openreview.net/pdf?id=vtGLtSxtqv) · [MindForge](https://arxiv.org/pdf/2411.12977) · [Project Sid / PIANO](https://arxiv.org/abs/2411.00114) · [AgentScope](https://github.com/modelscope/agentscope)
+- L3: [Generative Agents](https://arxiv.org/pdf/2304.03442) · [Letta/MemGPT vs Mem0](https://vectorize.io/articles/mem0-vs-letta) · [Augmenting Autotelic Agents w/ LLMs (Colas)](https://proceedings.mlr.press/v232/colas23a/colas23a.pdf) · [Colas publications (MAGELLAN)](https://cedriccolas.com/publications/) · [LLM Agents Beyond Utility](https://arxiv.org/abs/2510.14548) · [Sophia: Persistent Agent Framework for Artificial Life](https://arxiv.org/pdf/2512.18202) · [Personality from needs alone](https://www.eurekalert.org/news-releases/1099709) · [SPeCtrum identity](https://arxiv.org/pdf/2502.08599) · [GenerativeAgentsCN](https://github.com/x-glacier/GenerativeAgentsCN) · [AgentVerse](https://github.com/OpenBMB/AgentVerse) (paper: [arxiv 2308.10848](https://arxiv.org/abs/2308.10848)) · [EconAgent (ACL'24)](https://aclanthology.org/2024.acl-long.829/)
+- Chinese-community supplement: [CharacterGLM-6B](https://github.com/thu-coai/CharacterGLM-6B) · [Chat-Haruhi-Suzumiya](https://github.com/LC1332/Chat-Haruhi-Suzumiya) · [CharacterEval](https://arxiv.org/abs/2401.01275) · [RoleBench / RoleLLM](https://github.com/InteractiveNLP-Team/RoleLLM-public) · [SuperCLUE-Role](https://github.com/CLUEbenchmark/SuperCLUE-Role) · [pkuxkx wiki](https://www.pkuxkx.net/wiki) · [mudcore](https://gitee.com/mudcore/mudcore) · [mudchina site list](https://mudchina.github.io/) · [mud.ren/threads/436 — 炎黄 MUD](https://mud.ren/threads/436) · [筑梦岛](https://zhumengdao.com/) · [猫箱 (ByteDance)](https://www.maoxiang.com/)