pabloformoso/apollo-agents


ApolloAgents


ApolloAgents Logo

An AI-powered DJ set builder — from track catalog to rendered YouTube video, guided by a team of specialized agents.

ApolloAgents uses a multi-agent pipeline to plan, critique, and build DJ mixes. You describe the vibe. The agents handle harmonic mixing, BPM matching, energy arc planning, and audio quality validation. You stay in control at every checkpoint.


✨ Live Mode

Apollo DJs in real time — no pre-render, no waiting. Just music, events, and autonomous decisions.

%%{init: {'theme': 'base', 'themeVariables': {
  'background': '#0d0d1a',
  'primaryColor': '#1a1a2e',
  'primaryTextColor': '#e0e0ff',
  'primaryBorderColor': '#4a4a8a',
  'lineColor': '#6060aa',
  'secondaryColor': '#12122a',
  'edgeLabelBackground': '#1a1a2e',
  'clusterBkg': '#12122a',
  'clusterBorder': '#3a3a6a',
  'nodeTextColor': '#e0e0ff',
  'fontFamily': 'monospace'
}}}%%

flowchart LR
    TRACK["🎵 Track playing\nlive audio"]:::pipeline

    CF{"⏱ Approaching\ncrossfade?"}:::checkpoint

    GOOD["✅ Let it ride"]:::agent
    MED["⏸ extend_track(20)"]:::agent
    BAD["⚡ crossfade_now()"]:::agent

    USER(["👤 next · stay\nmore energetic\nwind down"]):::user

    NEXT["🎵 Next track\n(pre-stretched)"]:::pipeline

    TRACK -->|"30s warning"| CF
    CF -->|"≤1 Camelot step\n≤8 BPM diff"| GOOD
    CF -->|"2 steps\nOR 8–20 BPM"| MED
    CF -->|">2 steps\nOR >20 BPM"| BAD
    GOOD --> NEXT
    MED  --> NEXT
    BAD  --> NEXT
    USER -->|"mid-set command"| CF

    classDef agent      fill:#1a1a3a,stroke:#5858b0,color:#c8c8ff
    classDef pipeline   fill:#0a1f1a,stroke:#20a060,color:#60ffb0
    classDef checkpoint fill:#2a1a0a,stroke:#c07820,color:#ffc060
    classDef user       fill:#0a0a1a,stroke:#4040a0,color:#8080d0
uv run python agent/run.py
# → go live

Full Live Mode docs, thread architecture & cycle diagram


Example Output

Every session in this YouTube channel was built with ApolloAgents — from the earliest proof-of-concept cuts in v0.0 to today's fully orchestrated pipeline in v1.0. Same tracks, same taste, progressively better mixing as the agents learned.

Watch on YouTube

Architecture

ApolloAgents: The Multi-Agent DJ Architecture

%%{init: {'theme': 'base', 'themeVariables': {
  'background': '#0d0d1a',
  'primaryColor': '#1a1a2e',
  'primaryTextColor': '#e0e0ff',
  'primaryBorderColor': '#4a4a8a',
  'lineColor': '#6060aa',
  'secondaryColor': '#12122a',
  'tertiaryColor': '#0d0d1a',
  'edgeLabelBackground': '#1a1a2e',
  'clusterBkg': '#12122a',
  'clusterBorder': '#3a3a6a',
  'titleColor': '#c0c0ff',
  'nodeTextColor': '#e0e0ff',
  'fontFamily': 'monospace'
}}}%%

flowchart TD
    User(["👤 User\nprompt"]):::user

    subgraph APOLLO["☀️  APOLLO — Orchestrator"]
        direction TB

        JANUS["🚪 JANUS\nGenre Guard\n─────────────\nvalidates genre · duration · mood"]:::agent
        HERMES["⚡ HERMES\nCatalog Manager\n─────────────\nsyncs WAVs · detects BPM & key"]:::agent

        MUSE["🎵 MUSE\nPlanner\n─────────────\nenergy arc · harmonic order\nreads memory → avoids weak tracks"]:::agent

        CP1{{"🛑 Checkpoint 1\nreview playlist"}}:::checkpoint

        MOMUS["🎭 MOMUS\nCritic\n─────────────\ncold review · PROBLEMS / VERDICT\nreads memory → flags patterns"]:::agent

        CP2{{"🛑 Checkpoint 2\napply fixes"}}:::checkpoint

        EDITOR["✏️ Editor REPL\nswap · move · refine"]:::agent

        PIPELINE[["⚙️ Mix Pipeline\nBPM match → crossfade → WAV\n1080p video + YouTube Short"]]:::pipeline

        THEMIS["⚖️ THEMIS\nValidator\n─────────────\nclipping · spectral flatness\nsilence gaps · RMS drops"]:::agent

        MEMORY[("🧠 Memory\nrating + notes\n→ agents improve")]:::memory
    end

    User --> JANUS
    User --> HERMES
    JANUS -->|"confirmed genre"| MUSE
    MUSE -->|"playlist"| CP1
    CP1 -->|"proceed"| MOMUS
    MOMUS -->|"verdict"| CP2
    CP2 -->|"ok"| EDITOR
    EDITOR -->|"build"| PIPELINE
    PIPELINE --> THEMIS
    THEMIS -->|"PASS"| MEMORY
    MEMORY -.->|"past sessions"| MUSE
    MEMORY -.->|"problem patterns"| MOMUS

    classDef agent        fill:#1a1a3a,stroke:#5858b0,color:#c8c8ff,rx:6
    classDef checkpoint   fill:#2a1a0a,stroke:#c07820,color:#ffc060,shape:diamond
    classDef pipeline     fill:#0a1f1a,stroke:#20a060,color:#60ffb0
    classDef memory       fill:#1a0a2a,stroke:#8040c0,color:#c080ff
    classDef user         fill:#0a0a1a,stroke:#4040a0,color:#8080d0,shape:circle
| Agent | Mythological name | Role |
|-------|-------------------|------|
| Genre Guard | Janus | Gatekeeper — validates genre, duration, mood before planning starts |
| Catalog Manager | Hermes | Keeper of records — syncs WAV files to catalog, detects BPM & key |
| Planner | Muse | Inspires the set — energy arc, harmonic ordering, track selection |
| Critic | Momus | God of fault-finding — cold independent review, structured verdict |
| Editor (REPL) | — | Interactive editor — swap, move, insert bridge tracks, trigger build or go live |
| LiveDJ | Apollo LiveDJ | Real-time DJ engine — autonomous crossfade decisions, reacts to engine events and listener commands |
| Validator | Themis | Goddess of order — audio quality analysis after every build |
| Orchestrator | Apollo | Conductor — sequences all agents, manages state, collects memory |

Pipeline phases

1. Janus (Genre Guard)   → confirms genre / duration / mood
2. Muse  (Planner)       → proposes playlist + energy arc
3. Checkpoint 1          → user reviews; manual adjustments allowed
4. Momus (Critic)        → cold review: flags key clashes, BPM stretch, arc gaps
5. Checkpoint 2          → user sees critique; decides what to apply
6. Editor REPL           → swap · move · insert bridge tracks → build
7. Themis (Validator)    → audio quality report after build

Checkpoints are hard gates — agents never auto-apply fixes. You stay in control.


Features

  • Conversational planning — describe the vibe, iterate with the agents, build when ready
  • Harmonic mixing — Camelot wheel-based track ordering for smooth key transitions
  • BPM matching — gradual tempo ramps between tracks via pyrubberband
  • BPM stretch safety — transitions with pyrubberband ratio >1.5× are flagged; Critic mandates a bridge track fix
  • Bridge track insertion — suggest_bridge_track finds candidates between mismatched positions; insert_bridge_track splices one in
  • EQ matching at crossfade — shelving EQ applied to outgoing/incoming segments based on key distance, reducing frequency masking
  • Energy arc planning — Planner evaluates set shape (warmup → build → peak → wind-down) and iterates until no gaps or plateaus
  • Audio validation — peak clipping, spectral flatness (bleach detection), silence gap and RMS anomaly checks
  • Per-transition ratings — rate each session 1–5 after build; Critic memory flags recurring problem transitions
  • Session memory — agents learn from past sessions: which tracks get swapped, what energy arcs rate highly
  • Catalog management — scan new WAVs, detect missing BPM/key fields, keep tracks.json in sync
  • Live Mode — Apollo DJs in real time: autonomous crossfade decisions, responds to next, stay, more energetic, wind down mid-set
  • Multi-provider — Claude (Anthropic), GPT-4o (OpenAI), or any local model via Ollama; auto-detected from .env
  • 1080p video output — spectral waveform visualizer, beat-reactive particles, DALL-E 3 artwork, retro pixel titles
  • YouTube Short — auto-generated 20s teaser alongside the full mix
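Harmonic mixing is built on the Camelot wheel: 12 positions, each in an A (minor) and B (major) ring. A minimal sketch of a key-distance metric, assuming the common heuristic of counting wheel steps plus one for a ring change (the project's actual scoring is not shown here and may, for example, count diagonal moves like 10B→11A as a single step):

```python
def camelot_distance(key_a: str, key_b: str) -> int:
    """Rough compatibility distance between two Camelot keys, e.g. '8A' and '9A'.

    Counts the shortest way around the 12-position wheel, plus 1 for
    switching between the A (minor) and B (major) rings. Hypothetical
    helper, not the repo's actual implementation.
    """
    num_a, ring_a = int(key_a[:-1]), key_a[-1].upper()
    num_b, ring_b = int(key_b[:-1]), key_b[-1].upper()
    around = abs(num_a - num_b)
    wheel_steps = min(around, 12 - around)  # the wheel wraps: 12 -> 1 is one step
    return wheel_steps + (0 if ring_a == ring_b else 1)


print(camelot_distance("8A", "9A"))   # adjacent, same ring: 1
print(camelot_distance("12A", "1A"))  # wraps around the wheel: 1
print(camelot_distance("5A", "11A"))  # opposite side: 6
```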

Setup

Requirements: Python 3.12+, uv, ffmpeg

git clone https://github.com/YOUR_USERNAME/apollo-agents.git
cd apollo-agents

# Install dependencies
uv sync

# Copy and fill in your API keys
cp .env.example .env

.env keys:

| Key | Required | Purpose |
|-----|----------|---------|
| ANTHROPIC_API_KEY | One of these | Claude (recommended, default: claude-opus-4-6) |
| OPENAI_API_KEY | One of these | GPT-4o — also used for DALL-E 3 artwork |
| AGENT_PROVIDER=ollama | One of these | Use a local Ollama model (default: gemma4:4b) |
| OLLAMA_BASE_URL | Optional | Override Ollama endpoint (default: http://localhost:11434/v1) |
| AGENT_MODEL | Optional | Override the model for any provider (e.g. AGENT_MODEL=gpt-4o-mini) |
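The "auto-detected from .env" behavior could work roughly like this (a sketch; the actual precedence order in agent/run.py is an assumption):

```python
def detect_provider(env: dict) -> str:
    """Pick an LLM provider from environment-style key/value pairs.

    Sketch of the auto-detection idea; the real precedence in the repo
    may differ.
    """
    if env.get("AGENT_PROVIDER"):     # explicit override wins
        return env["AGENT_PROVIDER"]
    if env.get("ANTHROPIC_API_KEY"):  # Claude is the recommended default
        return "anthropic"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    raise RuntimeError("Set ANTHROPIC_API_KEY, OPENAI_API_KEY, or AGENT_PROVIDER=ollama")


print(detect_provider({"ANTHROPIC_API_KEY": "sk-..."}))  # anthropic
```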

Adding Your Tracks

Put WAV files into genre subfolders under tracks/:

tracks/
  techno/
    Acid Rain.wav
    Zero Day.wav
  deep house/
    Solar Drift.wav
  lofi - ambient/
    Kernel Space.wav
  cyberpunk/
    Chrome Horizon.wav

Then build the catalog (detects BPM + Camelot key for each file):

python main.py --build-catalog
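Conceptually, the catalog build walks the genre folders and records any WAV not yet in tracks.json, then runs audio analysis to fill in BPM and key. A simplified sketch of the scan step (the field names and schema here are assumptions, not the repo's actual format):

```python
from pathlib import Path


def find_new_tracks(tracks_dir: str, catalog: dict) -> list[dict]:
    """Return stub entries for WAVs under tracks/<genre>/ missing from the catalog."""
    known = {entry["path"] for entry in catalog.get("tracks", [])}
    new_entries = []
    for wav in sorted(Path(tracks_dir).glob("*/*.wav")):
        if str(wav) not in known:
            new_entries.append({
                "path": str(wav),
                "genre": wav.parent.name,  # the folder name doubles as the genre
                "bpm": None,               # filled in later by audio analysis
                "key": None,
            })
    return new_entries
```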

Or let Hermes do it conversationally:

uv run python agent/run.py
# → "I added new tracks"

Usage

Conversational agent (recommended)

# Default (Claude / GPT-4o, whichever key is in .env)
uv run python agent/run.py

# Local model via Ollama (no API key required)
AGENT_PROVIDER=ollama uv run python agent/run.py

# Override model for any provider
AGENT_MODEL=claude-haiku-4-5-20251001 uv run python agent/run.py

Example session:

What would you like to do?

You: 60min techno set, dark industrial build to a hard peak

── Janus (Genre Guard) ──
[confirms genre: techno, 60min, mood: dark industrial build]

── Muse (Planner) ──
[surveys catalog, proposes 12-track playlist]
[evaluates energy arc: plateau detected at pos 6-8, swaps pos 7 to fix]
Energy arc: 3-track warmup → hard build → peak at pos 9 → wind-down

── Checkpoint 1 ──
You: move track 4 to position 7
[shows updated playlist]
You: proceed

── Momus (Critic) ──
PROBLEMS:
- [pos 2→3] key clash 5A → 11A — fix: swap pos 3 for zero-day
- [pos 8→9] ⚠ Stretch 1.8× — bridge track required
VERDICT: NEEDS_FIXES

── Checkpoint 2 ──
You: swap pos 3 like the critic said
You: ok

── Editor ──
You: fix the stretch at 8→9
[suggest_bridge_track(8, 9) → 3 candidates at 142 BPM]
[insert_bridge_track(after_position=8, track_id="techno--acid-rain")]
You: build midnight-industrial

── Themis (Validator) ──
AUDIO QUALITY REPORT — midnight-industrial
Status: PASS — no issues detected ✓

Rate 1-5 (Enter to skip): 5
Any notes?: peak section was perfect
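The plateau check Muse performs in the session above (pos 6–8) can be sketched as a scan for runs of near-equal energy values. This is a hypothetical helper; the planner's real heuristic is not shown in this README:

```python
def find_plateaus(energies: list[float], min_len: int = 3, tol: float = 0.5) -> list[tuple[int, int]]:
    """Return (start, end) index pairs where energy stays within tol for min_len+ tracks."""
    plateaus = []
    run_start = 0
    for i in range(1, len(energies) + 1):
        # close the current run when the sequence ends or energy moves past tol
        if i == len(energies) or abs(energies[i] - energies[run_start]) > tol:
            if i - run_start >= min_len:
                plateaus.append((run_start, i - 1))
            run_start = i
    return plateaus


print(find_plateaus([2, 3, 4, 5, 5, 5, 6, 8, 7]))  # [(3, 5)]
```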

Direct CLI (no agent)

# Generate a session directly
python main.py --name "midnight-techno" --genre "techno" --duration 60

# Re-render video from existing mix audio
python main.py --name "midnight-techno" --genre "techno" --video-only

# Fix missing BPM/key fields in catalog
python main.py --fix-incomplete

Supported Genres

| Folder name | Visual theme |
|-------------|--------------|
| techno | Dark red, industrial |
| deep house | Neon violet, deep |
| lofi - ambient | Warm cream, anime-style artwork |
| cyberpunk | Neon green, dystopic |

Add new genres by creating a subfolder under tracks/ and running --build-catalog.


Output

Every session writes to output/<session-name>/:

output/midnight-techno/
  mix_output.wav      # lossless mix
  mix_video.mp4       # 1920×1080, 24fps, spectral waveform
  short.mp4           # 1080×1920, 20s YouTube Short
  session.json        # playlist for reproducibility
  transitions.json    # crossfade timestamps
  youtube.md          # title, description, tracklist, tags

Live Mode

Live Mode skips the pre-rendered pipeline entirely. Apollo plays tracks in real time and makes autonomous crossfade decisions as the music unfolds.

How to start

Say any of these at the Editor prompt (or as your opening request):

go live
play live
spin it live late-night-study

What happens

── Apollo LiveDJ ──
Commands: next | stay [N] | skip | quit | or anything natural language

[LiveDJ] On deck. Let's go.

  TRACK_STARTED: 'Quiet Notes bis' (76 BPM, 10B)
  ...
  APPROACHING_CF in 18s: 'Quiet Notes bis' → 'Soft Focus Loop' (76→76 BPM, 10B→11A)

[LiveDJ] Clean 1-step key move, same BPM — letting it ride.

  CROSSFADE_TRIGGERED: 'Quiet Notes bis' → 'Soft Focus Loop'

You: more energetic
[LiveDJ] Swapped track 4 → 'No more socials' (82 BPM, 11B).

You: next
[LiveDJ] Crossfading now.

Apollo's decision rules

| Transition quality | Action |
|--------------------|--------|
| Camelot ≤1 step, BPM diff ≤8 | Let it ride — no intervention |
| Camelot 2 steps or BPM diff 8–20 | extend_track(20) — buys time |
| Camelot >2 steps or BPM diff >20 | crossfade_now() or queue_swap() a better track |
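A minimal sketch of the rule table as code, with thresholds taken from the table (the worst case is checked first so the two OR rows don't overlap):

```python
def decide_transition(camelot_steps: int, bpm_diff: float) -> str:
    """Map upcoming-transition quality to an action, per the rule table above."""
    if camelot_steps > 2 or bpm_diff > 20:
        return "crossfade_now()"      # bad blend: cut early (or swap the track)
    if camelot_steps <= 1 and bpm_diff <= 8:
        return "let_it_ride"          # clean harmonic and tempo match
    return "extend_track(20)"         # mediocre: buy 20s to decide


print(decide_transition(1, 0))    # let_it_ride
print(decide_transition(2, 10))   # extend_track(20)
print(decide_transition(3, 5))    # crossfade_now()
```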

Live commands

| What you type | What Apollo does |
|---------------|------------------|
| next / skip | Crossfades immediately |
| stay / longer | Extends current track 30s |
| stay 60 | Extends by a specific number of seconds |
| more energetic | Swaps next track for a higher-BPM option |
| wind down / chill | Swaps next track for lower BPM / softer key |
| quit / q | Ends the session |

Cycle diagram

sequenceDiagram
    participant U  as 👤 User
    participant DJ as Apollo LiveDJ<br/>(event loop · 100ms tick)
    participant LM as LLM
    participant EQ as Event Queue
    participant EN as LiveEngine<br/>(sounddevice callback)
    participant PS as Pre-stretch Thread<br/>(pyrubberband)

    Note over EN: play() — loads track 1, starts OutputStream
    EN->>EQ: TRACK_STARTED
    EN->>PS: start_prestretch(track 1 → track 2)
    PS-->>EN: _next_audio ready (BPM-stretched)

    loop Every 100 ms
        DJ->>EQ: drain events
        DJ->>U: drain stdin (1 line max)
    end

    Note over EN,EQ: 30s before crossfade point…
    EN->>EQ: APPROACHING_CF (track 1 → track 2, Δbpm, Δkey, secs)
    DJ->>LM: batch turn (events + state)

    alt Good transition (≤1 Camelot step, ≤8 BPM diff)
        LM-->>DJ: (silent — no tool call)
    else Mediocre (2 steps OR 8–20 BPM diff)
        LM->>EN: extend_track(20)
        EN-->>LM: "Crossfade delayed 20s."
    else Bad (>2 steps OR >20 BPM diff)
        LM->>EN: crossfade_now()
        EN-->>LM: "Crossfade triggered."
    end

    U->>DJ: "more energetic"
    DJ->>LM: batch turn (user input + state)
    LM->>EN: queue_swap(position=3, track_id="…")
    EN-->>LM: "Queued 'No more socials' at position 3."
    LM-->>DJ: "Swapped track 3 → No more socials."
    DJ->>U: [LiveDJ] Swapped track 3 → No more socials.

    Note over EN: crossfade point reached — watchdog fires
    EN->>EQ: CROSSFADE_TRIGGERED
    Note over EN: 12s linear blend in audio callback
    EN->>EQ: CROSSFADE_FINISHED
    EN->>EQ: TRACK_ENDED (track 1)
    EN->>EQ: TRACK_STARTED (track 2)
    EN->>PS: start_prestretch(track 2 → track 3)
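The 100 ms tick on the LiveDJ side boils down to draining a threading.Queue without ever blocking the loop. A sketch:

```python
import queue


def drain(event_queue: "queue.Queue") -> list:
    """Pull every pending event off the queue without blocking.

    Called once per tick; an empty list means nothing happened since
    the last tick.
    """
    events = []
    while True:
        try:
            events.append(event_queue.get_nowait())
        except queue.Empty:
            return events


q = queue.Queue()
q.put("TRACK_STARTED")
q.put("APPROACHING_CF")
print(drain(q))  # ['TRACK_STARTED', 'APPROACHING_CF']
print(drain(q))  # []
```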

How the threads fit together

%%{init: {'theme': 'base', 'themeVariables': {
  'background': '#0d0d1a',
  'primaryColor': '#1a1a2e',
  'primaryTextColor': '#e0e0ff',
  'primaryBorderColor': '#4a4a8a',
  'lineColor': '#6060aa',
  'secondaryColor': '#12122a',
  'tertiaryColor': '#0d0d1a',
  'edgeLabelBackground': '#1a1a2e',
  'clusterBkg': '#12122a',
  'clusterBorder': '#3a3a6a',
  'titleColor': '#c0c0ff',
  'nodeTextColor': '#e0e0ff',
  'fontFamily': 'monospace'
}}}%%

flowchart TB
    subgraph MAIN["🧵 Main thread — LiveDJ event loop (100ms tick)"]
        direction LR
        DRAIN["drain Event Queue\ndrain stdin (1 line)"]:::pipeline
        FORMAT["format turn\ncall LLM ≤5 turns\nexec tool calls"]:::agent
        DRAIN --> FORMAT
    end

    subgraph ENGINE["⚙️ LiveEngine"]
        direction TB
        WATCHDOG["🔍 Watchdog thread\n50ms tick\n─────────────────\nAPPROACHING_CF\nCROSSFADE_TRIGGERED\nCROSSFADE_FINISHED\nTRACK_ENDED · SESSION_ENDED"]:::agent
        CALLBACK["🔊 sounddevice callback\nlow-latency audio thread\n2048-sample blocks\n─────────────────\nstate=playing → copy samples\nstate=crossfading →\n  out*(1−t) + in*t\nblend done → swap buffers"]:::pipeline
        API["🔧 Public API\nthread-safe (_lock)\n─────────────────\ncrossfade_now()\nextend_track(N)\nskip_track()\nqueue_swap(pos, id)\nset_crossfade_point(sec)"]:::memory
        WATCHDOG -->|"_cf_just_finished flag"| CALLBACK
    end

    PRESTRETCH["🎛️ Pre-stretch daemon\n─────────────────\nload next WAV\npyrubberband time-stretch\nto match current BPM\n_STRETCH_MAX 1.5×\nsignal _prestretch_ready"]:::memory

    USER(["👤 User\nnext · stay · skip\nmore energetic\nwind down · quit"]):::user
    EQ[("📨 Event Queue\nthreading.Queue")]:::memory

    WATCHDOG -->|"emit events"| EQ
    EQ -->|"drained each tick"| DRAIN
    USER -->|"stdin"| DRAIN
    FORMAT -->|"tool calls"| API
    API -->|"state mutations"| CALLBACK
    CALLBACK -->|"_next_audio\npre-stretched WAV"| WATCHDOG
    ENGINE -->|"triggers prestretch\non each track start"| PRESTRETCH
    PRESTRETCH -->|"_next_audio ready"| CALLBACK

    classDef agent    fill:#1a1a3a,stroke:#5858b0,color:#c8c8ff
    classDef pipeline fill:#0a1f1a,stroke:#20a060,color:#60ffb0
    classDef memory   fill:#1a0a2a,stroke:#8040c0,color:#c080ff
    classDef user     fill:#0a0a1a,stroke:#4040a0,color:#8080d0

Key design decisions:

  • Pre-stretch runs ahead of time — by the time the crossfade fires, the next track's audio is already in memory at the right BPM. Crossfades are instant with no stutter.
  • Watchdog at 50ms, event loop at 100ms — the engine detects state changes twice as fast as the LLM loop polls, so events are never missed.
  • LLM budget capped at 5 turns per batch — prevents the agent from spending unbounded tokens on a single event while music is playing.
  • _extend_samples shifts the crossfade point — extend_track(N) adds N×44100 samples to the threshold, cleanly delaying the auto-crossfade without touching the audio buffer.
  • Hot cue OUT marks set the crossfade point — if a track has an out hot cue, the engine crossfades from that exact position instead of duration − 17s.
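The 12 s linear blend in the audio callback reduces to out*(1−t) + in*t per sample. A pure-Python sketch of the idea (the real callback works on 2048-sample numpy blocks, not Python lists):

```python
def crossfade_block(outgoing, incoming, block_start, fade_len):
    """Blend one block of samples: gain ramps linearly from old track to new.

    block_start is the block's offset into the fade; fade_len is the total
    fade length in samples (12 s * 44100 in the engine). Simplified sketch,
    not the repo's callback.
    """
    mixed = []
    for i, (a, b) in enumerate(zip(outgoing, incoming)):
        t = min(1.0, (block_start + i) / fade_len)  # 0.0 = all old, 1.0 = all new
        mixed.append(a * (1.0 - t) + b * t)
    return mixed


print(crossfade_block([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], 0, 4))
# [1.0, 0.75, 0.5, 0.25]
```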

Agent Memory

After each build, rate your session 1-5. Ratings accumulate in agent/memory.json. On the next session of the same genre:

  • Muse (Planner) avoids tracks that have been swapped out 2+ times
  • Momus (Critic) flags transition patterns that have been problems before
  • High-rated mood/arc combinations are surfaced as references
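The swap-count rule for Muse could be implemented along these lines (the memory.json schema shown is an assumption; only the 2+ threshold comes from this README):

```python
from collections import Counter


def swapped_out_blacklist(sessions: list[dict], threshold: int = 2) -> set[str]:
    """Track IDs swapped out in `threshold`+ past sessions; Muse avoids these."""
    counts = Counter(
        track_id
        for session in sessions
        for track_id in session.get("swapped_out", [])
    )
    return {track_id for track_id, n in counts.items() if n >= threshold}


history = [
    {"rating": 3, "swapped_out": ["techno--zero-day"]},
    {"rating": 5, "swapped_out": ["techno--zero-day", "techno--acid-rain"]},
]
print(swapped_out_blacklist(history))  # {'techno--zero-day'}
```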

Project Structure

main.py              # Core pipeline (~2600 lines): catalog, mixing, video
agent/
  run.py             # Apollo orchestrator + all agent loops
  tools.py           # Tool functions for all agents
  memory.json        # Persistent session history (auto-created)
tracks/
  tracks.json        # Unified catalog (auto-generated)
  <genre>/           # WAV files per genre
output/              # Generated mixes and videos (gitignored)
artwork/             # DALL-E 3 backgrounds (cached, gitignored)
fonts/
  PressStart2P-Regular.ttf

License

MIT — see LICENSE
