Watch LLMs play poker so you can see how they reason under uncertainty.
AgentBattle is a small, open experiment by umage.ai: drop several language models into a 6-max No-Limit Hold'em table, force each one to narrate its reasoning, then replay the full transcript hand by hand. Every event — deal, snapshot, thought, retry, showdown — lives in one JSONL file. The viewer is just a renderer on top.
Live site: https://umage-ai.github.io/AgentBattle/
- Each agent is a YAML template in agents/: a model, an endpoint, a persona prompt, and a reference to an environment variable that holds the API key (the key itself is never committed).
- A battle config in configs/ defines the game (poker-6max), the number of hands, and which agent IDs sit at the table.
- The orchestrator drives the table via MCP, sends each agent its state snapshot, and requires 2-4 sentences of reasoning before every legal action.
- Battles get appended to battles/<timestamp>-<id>.jsonl line by line.
- The static site reads those JSONL files plus generated manifests and renders the leaderboards + replay.
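The 2-4 sentence rule above could be checked along these lines. This is only a sketch: the names count_sentences and reasoning_is_valid are hypothetical, and the sentence splitting is a naive punctuation heuristic, not necessarily what the orchestrator does.

```python
import re


def count_sentences(text: str) -> int:
    """Count sentences naively: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return sum(1 for part in parts if part)


def reasoning_is_valid(text: str, lo: int = 2, hi: int = 4) -> bool:
    """Return True when the reasoning has between lo and hi sentences (inclusive)."""
    return lo <= count_sentences(text) <= hi
```

A check like this would let a runner reject an action and ask the agent to retry when the narration is missing or too long.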
- .NET 10 SDK
- For local-model battles: Ollama running with at least one model pulled (ollama pull llama3.2:3b)
- For cloud-model battles: an API key for whichever provider you want to use, exposed as an environment variable
dotnet run --project src/AgentBattle.BattleRunner -- battle run `
--config configs/poker-3p-ollama.yaml `
--agents-dir agents
Three hands of 6-max NLHE between three llama3.2:3b agents. First hand takes ~30s; subsequent hands are faster. Output goes to battles/.
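Once a battle has written its transcript, the JSONL file can be inspected with a few lines of Python. This is a sketch under assumptions: the event schema is not described here, so the "type" field name is a guess, and load_events and count_event_types are hypothetical helpers, not part of the project.

```python
import json
from collections import Counter


def load_events(path):
    """Read a JSONL transcript: one JSON object per non-empty line."""
    events = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                events.append(json.loads(line))
    return events


def count_event_types(events, key="type"):
    """Tally events by an assumed discriminator field (default "type")."""
    return Counter(str(event.get(key, "unknown")) for event in events)
```

Calling count_event_types(load_events(path)) on a file under battles/ gives a quick summary of what the run produced; pass a different key if the discriminator field has another name.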
Two options:
Live Razor app (reads from and writes to your local disk, supports the in-app "Suggest a battle" form):
dotnet run --project src/AgentBattle.Web
Static viewer (the same site that ships to GitHub Pages):
dotnet run --project src/AgentBattle.SiteGenerator -- `
--battles-dir battles --agents-dir agents --out-dir static-site
# then serve static-site/ — for example:
python -m http.server -d static-site 8000
Open http://localhost:8000.
Update static-site/suggest.html — change the <meta name="github-repo" content="..."> value to your owner/repo. The Suggest page reads it to build pre-filled GitHub-issue links and to list open suggestions.
Also update the live-site URL near the top of this README, and the link in battle-suggestion.yml.
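For illustration, a pre-filled issue link of the kind the Suggest page builds can be assembled from the owner/repo value like this. The sketch is in Python rather than the site's own JavaScript, the function name suggestion_url and the example title are hypothetical, and it relies only on GitHub's template and title query parameters for new issues; any form-field parameters the real page fills in are not shown.

```python
from urllib.parse import urlencode


def suggestion_url(repo: str, title: str, template: str = "battle-suggestion.yml") -> str:
    """Build a GitHub "new issue" URL that opens the given issue form pre-filled."""
    # urlencode escapes spaces and special characters in the title.
    query = urlencode({"template": template, "title": title})
    return f"https://github.com/{repo}/issues/new?{query}"


print(suggestion_url("<owner>/<repo>", "GPT-4o vs Llama"))
```

If the meta tag still points at the upstream repository, links built this way open issues there rather than on your fork, which is why the value needs updating.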
Agents in agents/*.yaml are public templates. They define everything except the secret — the API key lives in your environment, addressed by the api_key_env field. Example:
id: openai-gpt-4o
display_name: GPT-4o
base_url: https://api.openai.com/v1
model: gpt-4o
api_key_env: OPENAI_API_KEY
temperature: 0.7
max_tokens: 1500
timeout_seconds: 60
persona_prompt: |
  You play tight and patient. Big hands only.
Then reference the id from a battle config. Set the env var before running:
$env:OPENAI_API_KEY = "sk-..."
dotnet run --project src/AgentBattle.BattleRunner -- battle run `
--config configs/your-config.yaml --agents-dir agents
For local-only providers (Ollama, etc.) use api_key_env: NONE; no Bearer header gets sent.
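The api_key_env lookup can be pictured as follows. This is a minimal Python sketch of the behaviour described above, not the project's actual client code: auth_headers is a hypothetical name, and raising an error when the variable is unset is an assumption.

```python
import os


def auth_headers(api_key_env: str, environ=None) -> dict:
    """Resolve request headers from an agent's api_key_env setting."""
    env = os.environ if environ is None else environ
    headers = {"Content-Type": "application/json"}
    if api_key_env == "NONE":
        # Local providers such as Ollama: send no Authorization header at all.
        return headers
    key = env.get(api_key_env)
    if not key:
        # Assumed behaviour: fail early rather than send an empty token.
        raise RuntimeError(f"environment variable {api_key_env} is not set")
    headers["Authorization"] = f"Bearer {key}"
    return headers
```

Because the template stores only the variable name, the same YAML file works on any machine where that variable is set, and the key never appears in the repository.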
Coming soon: local-only agent overlays in a gitignored agents.local/ folder, plus matching battles.local/ for transcripts that shouldn't be published. See docs/ for the design sketch.
Visitors to the static site can suggest matchups. The Suggest page builds a pre-filled GitHub Issue link using the battle-suggestion.yml issue form. We triage suggestions periodically and run the ones that win the queue — when a suggestion runs, the resulting battle just shows up on the site.
agents/                          # public agent templates (YAML)
battles/                         # battle transcripts (*.jsonl), appended by the runner
configs/                         # battle configs (YAML)
src/
  AgentBattle.Domain/            # poker rules, battle event types, JSON options
  AgentBattle.Orchestrator/      # the turn loop, agent clients, MCP plumbing
  AgentBattle.Poker.Mcp/         # MCP server that runs the actual poker rules
  AgentBattle.BattleRunner/      # CLI that launches a battle from a config
  AgentBattle.Web.Core/          # shared services (BattleArchive, StatsAggregator, etc.)
  AgentBattle.Web/               # Razor Pages live viewer (uses Web.Core)
  AgentBattle.SiteGenerator/     # console tool that emits the static-site data manifests
static-site/                     # the GitHub Pages site (HTML, CSS, Alpine.js)
.github/
  workflows/deploy-pages.yml     # builds + deploys static-site to GH Pages on push
  ISSUE_TEMPLATE/                # battle-suggestion issue form
docs/                            # design notes, ADRs, specs
tests/                           # xunit test projects
The included workflow rebuilds the data manifests and redeploys the site whenever you push to main and touch battles/, agents/, static-site/, or the generator code. To enable on your fork:
- Settings → Pages → Source: "GitHub Actions"
- Push to main (or run the workflow manually from the Actions tab)
- First run will provision the site at https://<owner>.github.io/<repo>/
Each new battle is just a commit to battles/ — the workflow regenerates data/*.json, copies the JSONL into the deploy, and the site reflects it within a couple of minutes. No HTML rebuild required.
See CONTRIBUTING.md. The short version: open an issue first if you're touching the protocol or schema; tracer-bullet PRs over a "big refactor" branch are easier to review.
- Design spec: docs/superpowers/specs/2026-05-13-agentbattle-poker-mvp-design.md
- Implementation plan: docs/superpowers/plans/2026-05-13-agentbattle-poker-mvp.md
- Static-site architecture: docs/static-site.md
MIT.