From e4bdb6a95c55eaa5f6b425b2c7e4f52b1a820706 Mon Sep 17 00:00:00 2001 From: MilkyCode <145179065+light12222@users.noreply.github.com> Date: Fri, 12 Jun 2026 00:09:43 -0400 Subject: [PATCH] Revise README for clarity and updated content Updated README.md to enhance project description, structure, and content organization. --- README.md | 706 +++++++++++++++++++----------------------------------- 1 file changed, 247 insertions(+), 459 deletions(-) diff --git a/README.md b/README.md index 86ae711..6b85d7c 100644 --- a/README.md +++ b/README.md @@ -1,604 +1,392 @@ -# Symphony-coord +

+Symphony-Coord +

-**Symphony-Coord: Emergent Coordination in Decentralized Agent Systems** +

+Adaptive Routing for Multi-Agent LLM Systems +

-Symphony-Coord is a decentralized multi-agent framework that transforms agent selection into an online multi-armed bandit problem, enabling roles to emerge organically through interaction. - -## Table of Contents - -- [Overview](#overview) -- [Key Features](#key-features) -- [Directory Structure](#directory-structure) -- [Installation](#installation) -- [Quick Start](#quick-start) -- [Running Experiments](#running-experiments) -- [Benchmark Data Generation](#benchmark-data-generation) -- [Reproducing Paper Results](#reproducing-paper-results) -- [Configuration Guide](#configuration-guide) -- [Citation](#citation) - -## Project Demo +

+Agents That Learn Who Should Solve What +

- - + + + +

-## Overview +

+ 📄 Paper + · + 🌐 Live Demo + · + 💡 Ecosystem +

-Symphony employs a three-stage pipeline: -1. **Planning Phase**: Multiple planning agents decompose complex queries into executable sub-tasks -2. **Execution Phase**: Beacon-guided routing matches sub-tasks to specialized agents using LinUCB-based selection -3. **Voting Phase**: CoT voting aggregates multiple agent responses for robust final answers +--- +## Contents - +- [Main Results](#main-results) +- [Overview](#overview) +- [Why Symphony-Coord?](#why-symphony-coord) +- [Demo](#demo) +- [System Architecture](#system-architecture) +- [Quick Start](#quick-start) +- [Reproducing Results](#reproducing-results) +- [Citation](#citation) -### Architecture +--- -``` -User Query - │ - ▼ -┌─────────────────────────────────────────┐ -│ Planning Phase │ -│ - Task decomposition (k plans) │ -│ - LinUCB plan selection │ -└─────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────┐ -│ Execution Phase │ -│ - Beacon broadcast for each sub-task │ -│ - Top-L agent candidate selection │ -│ - LinUCB agent selection │ -│ - Parallel CoT execution │ -└─────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────┐ -│ Voting Phase │ -│ - CoT voting across responses │ -│ - Final answer aggregation │ -└─────────────────────────────────────────┘ - │ - ▼ -Final Result -``` +

+ +

-## Key Features +

+Three-stage coordination pipeline: +Planning → Adaptive Routing → Voting & Aggregation +

-- **Decentralized Architecture**: No central orchestrator required, fault-tolerant -- **Intelligent Task Routing**: Beacon-based capability matching with LinUCB learning -- **Advanced Reasoning**: Multi-path CoT with majority voting -- **Edge-Optimized**: Runs on consumer-grade GPUs (RTX 3060/4090, Jetson, M-series Mac) +--- +# Main Results -## Directory Structure +Symphony-Coord consistently outperforms both single-agent and multi-agent baselines across mathematical reasoning, multi-hop reasoning, and domain-specific QA benchmarks. -``` -symphony/ -├── README.md # This file -├── requirements.txt # Python dependencies -├── pyproject.toml # Package configuration -│ -├── core/ # Core algorithms -│ ├── capability.py # Capability matching -│ ├── linucb_selector.py # LinUCB bandit selector -│ ├── routing.py # Task routing -│ └── voting.py # CoT voting mechanisms -│ -├── agents/ # Agent implementations -│ ├── agent.py # Main Agent class -│ └── user.py # User client -│ -├── protocol/ # Protocol definitions -│ ├── task_contract.py # Task data structures -│ └── beacon.py # Beacon messages -│ -├── infra/ # Infrastructure -│ └── ISEP.py # Service exchange protocol -│ -├── models/ # Model loaders -│ └── base_loader.py # LLM loading utilities -│ -├── symphony.py # Core orchestrator -├── main.py # Simple entry point -├── agent_register.py # Agent registration runner -├── user_register.py # User registration runner -│ -├── experiments/ # All experiments -│ ├── README.md # Experiments overview -│ ├── pretrain.py # Main experiment runner -│ ├── configs/ # Configuration files -│ ├── scripts/ # Shell scripts -│ ├── exp1/ # Exp1: Efficiency & Cost -│ ├── exp2/ # Exp2: Robustness & Recovery -│ └── exp3/ # Exp3: System Optimization -│ -├── scripts/ # 
Utility scripts -│ ├── plotting/ # Visualization -│ │ ├── paper_figures/ # Paper figure generation -│ │ └── routing/ # Routing analysis plots -│ └── analysis/ # Analysis utilities -│ -├── symphony-data-generator/ # Benchmark data generation -│ ├── config/data_config.yaml # Benchmark configurations -│ ├── src/data_generator.py # Core difficulty scoring module -│ └── src/quick_start.py # Quick start script -│ -├── docs/ # Documentation -├── examples/ # Example configurations -└── tests/ # Test suite -``` +Compared with single-agent baselines across evaluated backbones, Symphony-Coord achieves: -## Installation +| Benchmark | Accuracy Gain | +|------------|------------| +| GSM8K | +8.5 to +22.0 | +| BBH | +16.5 to +23.5 | +| MedicalQA | +27.0 to +33.0 | -### System Requirements +Across all evaluated backbones, Symphony-Coord achieves the strongest average performance while remaining robust under heterogeneous agent capabilities and cold-start conditions. -| Requirement | Minimum | Recommended | -| ----------- | --------------------- | --------------------------- | -| Python | 3.9 | 3.10 or 3.11 | -| RAM | 8 GB | 16 GB | -| GPU | Optional | CUDA-compatible (RTX 3060+) | -| OS | Linux, macOS, Windows | Linux (Ubuntu 20.04+) | +--- +# Overview -### Step-by-Step Setup +Symphony-Coord is a decentralized multi-agent LLM framework that formulates adaptive routing as an online contextual bandit problem. -```bash -# 1. Clone the repository -git clone https://github.com/anonymous/symphony.git -cd symphony +Instead of relying on static expert assignment or handcrafted orchestration policies, Symphony continuously learns routing decisions from interaction outcomes. -# 2. Create and activate virtual environment -python -m venv venv -source venv/bin/activate # On Windows: venv\Scripts\activate +The framework consists of three stages: -# 3. Upgrade pip -pip install --upgrade pip +1. Planning +2. 
Adaptive Routing +3. Voting & Aggregation -# 4. Install core dependencies -pip install -r requirements.txt +Core mechanisms include: -# 5. Install Symphony in development mode -pip install -e . +- beacon-based capability advertisement +- Top-L candidate selection +- LinUCB-based adaptive routing +- reward-driven adaptation +- Chain-of-Thought voting -# 6. Verify installation -python -c "import symphony; print('Symphony installed successfully')" -``` +Through continual feedback, routing policies evolve online and improve coordination quality over time. -### Dependencies Overview +--- -The `requirements.txt` includes: +# Why Symphony-Coord? +Existing multi-agent systems often rely on: -**Core Dependencies** (required): -- `torch>=2.0.0` - Deep learning framework -- `transformers>=4.30.0` - Hugging Face model library -- `numpy>=1.24.0` - Numerical computing -- `pyyaml>=6.0` - Configuration file parsing -- `requests>=2.28.0` - HTTP client for API calls -- `pyzmq>=25.0.0` - Distributed messaging -- `aiohttp>=3.8.0` - Async HTTP client +- centralized orchestrators +- static expert assignment +- fixed routing heuristics -**Optional Dependencies** (for GPU acceleration): -- `accelerate>=0.20.0` - Distributed training -- `bitsandbytes>=0.41.0` - 8-bit quantization -- `peft>=0.4.0` - Parameter-efficient fine-tuning +However, real-world decentralized systems are inherently dynamic. -### API Key Setup (Required for Real Experiments) +Agent capability, latency, availability, and specialization continuously evolve during execution. -Symphony uses [OpenRouter](https://openrouter.ai/) for LLM API access: +Symphony-Coord studies how adaptive routing policies can continuously improve coordination quality under changing execution conditions. 
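The LinUCB-based routing named among the core mechanisms can be illustrated with a minimal, dependency-free sketch. It is not the project's own selector: the class `LinUCBArm`, the function `select_agent`, the one-hot task features, and the `QUALITY` reward table are all hypothetical and chosen only to show how reward feedback shifts routing toward the better-suited agent.

```python
import math


class LinUCBArm:
    """Per-agent state for disjoint LinUCB (hypothetical helper, standard library only)."""

    def __init__(self, dim):
        # A^{-1} starts as the identity matrix and b as the zero vector.
        self.a_inv = [[float(i == j) for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_dot(self, x):
        return [sum(r * v for r, v in zip(row, x)) for row in self.a_inv]

    def ucb(self, x, alpha):
        a_inv_x = self._a_inv_dot(x)
        # theta^T x with theta = A^{-1} b; A^{-1} is symmetric, so this is b^T (A^{-1} x).
        mean = sum(bi * vi for bi, vi in zip(self.b, a_inv_x))
        width = math.sqrt(max(0.0, sum(xi * vi for xi, vi in zip(x, a_inv_x))))
        return mean + alpha * width

    def update(self, x, reward):
        # Sherman-Morrison update of A^{-1} for A <- A + x x^T, then b <- b + reward * x.
        a_inv_x = self._a_inv_dot(x)
        denom = 1.0 + sum(xi * vi for xi, vi in zip(x, a_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
            self.b[i] += reward * x[i]


def select_agent(arms, x, alpha=1.0):
    """Return the agent name with the highest upper confidence bound for context x."""
    return max(arms, key=lambda name: arms[name].ucb(x, alpha))


# Assumed toy setup: one-hot task features and fixed (deterministic) rewards.
CONTEXTS = {"math": [1.0, 0.0], "medical": [0.0, 1.0]}
QUALITY = {
    "math_agent": {"math": 0.9, "medical": 0.2},
    "med_agent": {"math": 0.2, "medical": 0.9},
}

arms = {name: LinUCBArm(dim=2) for name in QUALITY}
for step in range(200):
    task = "math" if step % 2 == 0 else "medical"
    x = CONTEXTS[task]
    chosen = select_agent(arms, x)
    arms[chosen].update(x, QUALITY[chosen][task])

# alpha=0.0 drops the exploration bonus and shows the learned greedy routing.
print(select_agent(arms, CONTEXTS["math"], alpha=0.0))
print(select_agent(arms, CONTEXTS["medical"], alpha=0.0))
```

With these assumed reward values, the greedy choice after the loop is `math_agent` for math tasks and `med_agent` for medical tasks. A real deployment would use richer context features and noisy rewards, so the exploration weight `alpha` would matter far more than it does here.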
-```bash -# Option 1: Export in terminal (temporary) -export OPENROUTER_API_KEY="sk-or-v1-your-key-here" +By formulating routing as an online contextual bandit problem, the system learns which agents should solve which tasks while balancing capability, uncertainty, and reward feedback. -# Option 2: Add to shell profile (persistent) -echo 'export OPENROUTER_API_KEY="sk-or-v1-your-key-here"' >> ~/.bashrc -source ~/.bashrc +--- -# Option 3: Create .env file (recommended for development) -echo 'OPENROUTER_API_KEY=sk-or-v1-your-key-here' > .env -``` +# Demo -**Verify API key is set:** -```bash -python -c "import os; print('API Key configured' if os.getenv('OPENROUTER_API_KEY') else 'API Key NOT set')" -``` +## Video Demo -See [docs/OPENROUTER_CONFIG_GUIDE.md](docs/OPENROUTER_CONFIG_GUIDE.md) for detailed API setup instructions. +Explore adaptive routing and emergent specialization in decentralized multi-agent systems. -## Quick Start +

+ + + +

-### Running a Simple Task +--- -```python -from symphony import SymphonyOrchestrator -from agents.agent import Agent +## Interactive Demo -# Initialize orchestrator -orchestrator = SymphonyOrchestrator( - agents=["agent1", "agent2", "agent3"], - topL=3, - cot_count=3 -) +

+ + + +

-# Execute a task -result = orchestrator.run_task( - task_description="Solve: What is 25 * 37?", - requirements=["math"] -) +Interactive features include: -print(f"Result: {result['final_answer']}") -``` +- live routing visualization +- evolving specialization dynamics +- decentralized coordination simulation +- adaptive recovery under failure +- multi-agent execution tracing -### Using OpenRouter API +--- -```bash -# Set API key -export OPENROUTER_API_KEY="sk-or-v1-..." - -# Run with OpenRouter models -python experiments/pretrain.py \ - --task-pool path/to/tasks.jsonl \ - --agents "deepseek-v3" \ - --runtime-dir experiments/configs \ - --n 100 -``` +# System Architecture -## Running Experiments +## 1. Planning -All experiments are in the `experiments/` directory. See [experiments/README.md](experiments/README.md) for detailed documentation. +๐Ÿงฉ Task decomposition and candidate plan generation. -### Overview of Experiments +Core components: -| Experiment | Description | Type | Estimated Time | -| ------------ | -------------------------- | ----------------- | ----------------------------- | -| **Exp1** | Efficiency & Cost Analysis | Simulation + Real | 30 min (sim) / 2-4 hrs (real) | -| **Exp2** | Robustness & Recovery | Simulation + Real | 1-2 hrs | -| **Exp3** | System Optimization | Simulation | 30 min | -| **Pretrain** | Main benchmark evaluation | Real | 4-8 hrs per benchmark | +- task decomposition +- candidate plan generation +- plan selection --- -### Experiment 1: Efficiency & Cost Analysis +## 2. Adaptive Routing -**Goal**: Compare agent selection strategies (Always-A, Static Rule, Random, LinUCB). +๐ŸŒ Beacon-guided decentralized coordination. 
-**Simulation Mode** (no API key needed): -```bash -cd experiments/exp1/sim -python sim_efficiency_cost.py --n 1000 --seed 42 +Core components: -# Output: Results saved to exp1_sim_results/ -``` - -**Real Mode** (requires OpenRouter API key): -```bash -cd experiments/exp1/real -python exp1_real_openrouter.py --n 100 - -# Output: Results saved to exp1_real_results/ -``` - -**Expected Output Files**: -- `accuracy_by_strategy.csv` - Accuracy comparison -- `cost_by_strategy.csv` - API cost comparison -- `selection_trace.json` - Agent selection decisions +- beacon broadcasting +- capability matching +- Top-L candidate selection +- LinUCB routing +- online reward updates --- -### Experiment 2: Robustness & Recovery +## 3. Voting & Aggregation -**Goal**: Evaluate adaptation when agents become unavailable or degraded. +🧠 Multi-path reasoning fusion. -**Run both simulation and real:** -```bash -bash experiments/exp2/scripts/run_exp2_both.sh - -# Or run separately: -python experiments/exp2/sim/exp2_sim.py --shock-type A_unavailable -python experiments/exp2/real/exp2_real.py --shock-type A_degraded -``` +Core components: -**Shock Types**: -- `A_unavailable`: Agent suddenly becomes unavailable -- `A_degraded`: Agent performance drops significantly - -**Expected Output Files**: -- `recovery_curve.csv` - Accuracy over time after shock -- `adaptation_metrics.json` - Recovery time and final accuracy +- parallel Chain-of-Thought execution +- confidence estimation +- voting-based aggregation +- final answer synthesis --- -### Experiment 3: System Optimization +# Quick Start -**Goal**: Evaluate routing optimization under latency and load variations. 
+## Requirements -```bash -bash experiments/exp3/run_exp3.sh - -# Or run directly: -python experiments/exp3/sim_system_optimization.py --scenario latency_heterogeneous -``` - -**Scenarios** (defined in `experiments/exp3/configs/scenarios.yaml`): -- `latency_heterogeneous`: Agents with different response latencies -- `load_burst`: Dynamic load spikes -- `combined`: Both latency and load variations - -**Expected Output Files**: -- `latency_comparison.csv` - Response time metrics -- `load_balance_metrics.csv` - Task distribution across agents +* Python 3.10+ +* OpenRouter API key --- -### Main Pretrain Experiments (Benchmark Evaluation) - -**Goal**: Evaluate Symphony on standard benchmarks (GSM8K, BBH, Medical QA). +## Installation -**Run individual benchmarks:** ```bash -# GSM8K (math reasoning) -bash experiments/scripts/run_gsm8k_pretrain.sh - -# BBH (Big-Bench Hard) -bash experiments/scripts/run_bbh_pretrain.sh +git clone https://github.com/GradientHQ/symphony-coord.git +cd symphony-coord -# Balanced sampling across all tasks -bash experiments/scripts/run_balanced_pretrain.sh +python -m venv venv +source venv/bin/activate -# All datasets sequentially -bash experiments/scripts/run_all_datasets.sh +pip install --upgrade pip +pip install -r requirements.txt +pip install -e . 
``` -**Run with custom parameters:** +Verify installation: + ```bash -python experiments/pretrain.py \ - --task-pool data/gsm8k_full.jsonl \ - --benchmark gsm8k \ - --n 600 \ - --cold-n 200 \ - --pretrain-n 300 \ - --test-n 100 \ - --topL 3 \ - --plan-k 3 \ - --cot-count 3 \ - --agents "deepseek-v3,openai-gpt-5-nano,openai-gpt-4-1-nano" \ - --runtime-dir experiments/configs +python -c "import symphony; print('Symphony installed successfully')" ``` -**Expected Output** (saved to `pretrain_results//`): -- `accuracy_summary.csv` - Per-phase accuracy -- `ucb_trace.md` - LinUCB arm selection trace -- `progress_state.json` - Checkpoint for resumption - -## Benchmark Data Generation - -Symphony includes a unified data generator for creating experiment-ready task pools with difficulty scoring across 5 benchmarks. +--- -### Quick Start +## Configure API Key ```bash -cd symphony-data-generator -pip install -r requirements.txt -python src/quick_start.py +export OPENROUTER_API_KEY="your-key" ``` -### Supported Benchmarks - -| Benchmark | Source | Tasks | Type | -| ------------- | ------------------------------ | ----- | ---------------------- | -| **HumanEval** | `openai_humaneval` | 164 | Code Generation | -| **GSM8K** | `gsm8k` | 1,319 | Mathematical Reasoning | -| **BBH** | `lukaemon/bbh` | 2,437 | Multi-hop Reasoning | -| **AMC** | `AI-MO/aimo-validation-amc` | 83 | Competition Math | -| **MedicalQA** | `GBaker/MedQA-USMLE-4-options` | 1,273 | Domain-Specific QA | +Verify API configuration: -### Difficulty Scoring Formulas +```bash +python -c "import os; print('API Key configured' if os.getenv('OPENROUTER_API_KEY') else 'API Key NOT set')" +``` -Each benchmark uses a domain-specific difficulty scoring function: +--- -**HumanEval (Code Generation)**: -$$d_{\text{code}} = 0.6 \cdot \frac{n_{\text{asserts}}}{\hat{a}} + 0.4 \cdot \frac{|\text{prompt}|}{\hat{p}}$$ +## Run Example -**GSM8K (Mathematical Reasoning)**: -$$d_{\text{math}} = \frac{\text{reasoning\_steps}}{\hat{s}}$$ 
+```python +from symphony import SymphonyOrchestrator -**BBH (Multi-hop Reasoning)**: -$$d_{\text{BBH}} = c_{\text{task}} + 0.3 \cdot \frac{|\text{input}|}{\hat{i}}$$ +orchestrator = SymphonyOrchestrator( + agents=["agent1", "agent2", "agent3"], + topL=3, + cot_count=3, +) -**AMC (Competition Mathematics)**: -$$d_{\text{AMC}} = 0.7 \cdot \frac{|\text{problem}|}{\hat{p}} + 0.3 + 0.12 \cdot \mathbb{1}[\text{math\_notation}]$$ +result = orchestrator.run_task( + task_description="Solve: What is 25 * 37?", + requirements=["math"], +) -**Medical QA (Domain-Specific)**: -$$d_{\text{med}} = 0.4 \cdot \bar{q} + 0.3 \cdot \bar{k} + 0.2 \cdot \bar{o} + 0.2 \cdot \mathbb{1}[\text{clinical}]$$ +print(result["final_answer"]) +``` -Where $\hat{\cdot}$ denotes 95th percentile normalizers computed from the full dataset. +--- +# Reproducing Results -### Difficulty Binning +This section provides the commands used to reproduce the main experimental results reported in the paper. -Tasks are categorized using percentile-based thresholds (P20/P80): -- **Easy**: score ≤ P20 -- **Hard**: score ≥ P80 -- **Medium**: P20 < score < P80 +## Main Benchmark Results -### Generating Task Pools +Run all benchmark evaluations: -```python -from src.data_generator import DatasetBuilder +```bash +bash experiments/scripts/run_all_datasets.sh +``` -builder = DatasetBuilder('config/data_config.yaml') +Run individual benchmarks: -# Preprocess all benchmarks (one-time) -builder.preprocess_all_benchmarks(output_dir='data/benchmarks/full') +```bash +bash experiments/scripts/run_gsm8k_pretrain.sh +bash experiments/scripts/run_bbh_pretrain.sh +bash experiments/scripts/run_balanced_pretrain.sh +``` -# Generate experiment stream -tasks = builder.build_task_stream( - benchmarks_to_include=['humaneval', 'gsm8k'], - difficulty_split='80:20', # 80% easy, 20% hard - n_total_tasks=1000, - random_seed=2025, -) +Benchmarks include: -builder.save_task_pool(tasks, 'data/exp1/task_pool.jsonl') -``` +| Benchmark | Task 
Type | +| --------- | ---------------------- | +| GSM8K | Mathematical Reasoning | +| BBH | Multi-hop Reasoning | +| MedicalQA | Domain-Specific QA | --- -## Reproducing Paper Results - -This section provides step-by-step instructions to reproduce all results in the paper. +## System-Level Experiments -### Step 1: Environment Setup +### Exp1: Efficiency & Cost Analysis ```bash -# Create fresh environment -python -m venv venv && source venv/bin/activate - -# Install dependencies -pip install --upgrade pip -pip install -r requirements.txt -pip install -e . - -# Set API key -export OPENROUTER_API_KEY="sk-or-v1-your-key" - -# Verify setup -python -c "import symphony; import os; print('Ready!' if os.getenv('OPENROUTER_API_KEY') else 'Missing API key')" +python experiments/exp1/real/exp1_real_openrouter.py --n 2000 ``` -### Step 2: Run All Experiments +### Exp2: Robustness & Recovery ```bash -# Exp1: Efficiency & Cost Analysis (Table 2 in paper) -python experiments/exp1/real/exp1_real_openrouter.py --n 2000 - -# Exp2: Robustness & Recovery (Figure 4 in paper) bash experiments/exp2/scripts/run_all_experiments.sh +``` -# Exp3: System Optimization (Figure 5 in paper) -bash experiments/exp3/run_exp3.sh +### Exp3: System Optimization -# Main Benchmark Results (Table 1 in paper) -bash experiments/scripts/run_all_datasets.sh +```bash +bash experiments/exp3/run_exp3.sh ``` -### Step 3: Generate Paper Figures +--- + +## Generate Paper Figures ```bash -# Figure 3: Robustness bar charts python scripts/plotting/paper_figures/plot_robustness_bars.py - -# Figure 4: 3D robustness surface -python scripts/plotting/paper_figures/plot_robustness_3d_surface.py - -# Figure 5: Gap analysis python scripts/plotting/paper_figures/plot_gap_analysis.py - -# Figure 6: Parallel coordinates python scripts/plotting/paper_figures/plot_parallel_coordinates.py - -# Routing analysis visualizations -python scripts/plotting/routing/plot_from_json.py pretrain_results/ -python 
scripts/plotting/routing/plot_agent_donut.py pretrain_results/ ``` -### Expected Results Summary +Additional routing visualizations: -| Experiment | Key Metric | Expected Range | -| ----------------- | ---------------------------------- | -------------- | -| Exp1 (Efficiency) | LinUCB vs Always-A cost reduction | 15-25% | -| Exp2 (Robustness) | Recovery time after shock | < 50 tasks | -| Exp3 (Latency) | Load-balanced vs naive improvement | 10-20% | -| GSM8K | Test accuracy (LinUCB) | 75-85% | -| BBH | Macro-average accuracy | 60-70% | +```bash +python scripts/plotting/routing/plot_from_json.py pretrain_results/ +python scripts/plotting/routing/plot_agent_donut.py pretrain_results/ +``` --- -## Troubleshooting +## Detailed Experiment Documentation -### Common Issues +For complete experiment configurations, task generation procedures, benchmark preprocessing, troubleshooting, and advanced settings, see: -**1. `ModuleNotFoundError: No module named 'symphony'`** -```bash -# Ensure you're in the project root and installed in dev mode -pip install -e . +```text +experiments/README.md +docs/EXPERIMENTS.md +docs/CONFIGS.md +docs/TROUBLESHOOTING.md +docs/OPENROUTER_CONFIG_GUIDE.md ``` -**2. `OPENROUTER_API_KEY not set`** -```bash -# Check if key is exported -echo $OPENROUTER_API_KEY +--- -# If empty, set it -export OPENROUTER_API_KEY="sk-or-v1-your-key" -``` +# Documentation -**3. `CUDA out of memory`** -```bash -# Use CPU-only mode or reduce batch size -export CUDA_VISIBLE_DEVICES="" # Force CPU -``` +Detailed setup and experiment guides are available in: -**4. `Connection timeout` or `Rate limit exceeded`** -```bash -# Reduce concurrent requests in config -# Edit experiments/configs/openrouter//config_*.yaml -# Add: rate_limit_delay: 1.0 +```text +docs/ +├── INSTALL.md +├── EXPERIMENTS.md +├── CONFIGS.md +├── TROUBLESHOOTING.md +└── OPENROUTER_CONFIG_GUIDE.md ``` -**5. 
`FileNotFoundError: task-pool not found`** -```bash -# Ensure task data files exist -# Download from paper supplementary materials or generate: -python scripts/analysis/balanced_task_pool.py --output data/tasks.jsonl -``` - -### Getting Help - -- Check [experiments/README.md](experiments/README.md) for experiment-specific issues -- Check [docs/OPENROUTER_CONFIG_GUIDE.md](docs/OPENROUTER_CONFIG_GUIDE.md) for API setup -- Verify Python version: `python --version` (requires 3.9+) +--- -## Configuration Guide +# Repository Structure + +```text +symphony-coord/ +├── agents/ # Agent implementations +├── core/ # Routing and coordination algorithms +├── experiments/ # Benchmark and robustness experiments +├── protocol/ # Task and beacon protocols +├── scripts/ # Plotting and analysis scripts +├── docs/ # Documentation +├── tests/ # Test suite +└── symphony.py # Main orchestrator +``` -### Agent Configuration +--- -Configs in `experiments/configs/openrouter//`: +# Citation -```yaml -debug: false -role: "agent" -node_id: "agent-openrouter-016" -base_model: "openrouter:deepseek/deepseek-chat" -capabilities: [math, reasoning, code] -max_tokens: 512 -temperature: 0.2 +```bibtex +@misc{guan2026symphonycoordadaptiveroutingmultiagent, + title={Symphony-Coord: Adaptive Routing for Multi-Agent LLM Systems}, + author={Zhaoyang Guan and Huixi Cao and Ming Zhong and Yin Wang and Guanyu Liu and Eric Yang and Lynn Ai and Yongxin Ni and Bill Shi}, + year={2026}, + eprint={2602.00966}, + archivePrefix={arXiv}, + primaryClass={cs.MA}, + url={https://arxiv.org/abs/2602.00966}, +} ``` -### Key Experiment Parameters +--- + +# Acknowledgements -| Parameter | Description | Default | -| ------------- | ----------------- | -------- | -| `--task-pool` | Task JSONL file | Required | -| `--n` | Total tasks | 100 | -| `--topL` | Top-L candidates | 3 | -| `--plan-k` | Plans to generate | 3 | -| `--cot-count` | CoT paths | 3 | -| `--agents` | Agent IDs | 
Required | +We thank the open-source research community for foundational work in: -See [docs/OPENROUTER_CONFIG_GUIDE.md](docs/OPENROUTER_CONFIG_GUIDE.md) for detailed setup. +* decentralized systems +* online bandit optimization +* multi-agent reasoning +* Chain-of-Thought coordination +* distributed inference systems -## Citation +--- -If you use Symphony in your research, please cite: +# License -```bibtex -@article{guan2026symphony, - title={Symphony-Coord: Emergent Coordination in Decentralized Agent Systems}, - author={Guan, Zhaoyang and Cao, Huixi and Zhong, Ming and Yang, Eric and Ai, Lynn and Ni, Yongxin and Shi, Bill}, - journal={arXiv preprint arXiv:2602.00966}, - year={2026} -} -``` +MIT License