pabloformoso/apollo-agents


ApolloAgents


ApolloAgents Logo

An AI-powered DJ set builder — from track catalog to rendered YouTube video, guided by a team of specialized agents.

ApolloAgents uses a multi-agent pipeline to plan, critique, and build DJ mixes. You describe the vibe. The agents handle harmonic mixing, BPM matching, energy arc planning, and audio quality validation. You stay in control at every checkpoint.


✨ Live Mode

Apollo DJs in real time — no pre-render, no waiting. Just music, events, and autonomous decisions.

%%{init: {'theme': 'base', 'themeVariables': {
  'background': '#0d0d1a',
  'primaryColor': '#1a1a2e',
  'primaryTextColor': '#e0e0ff',
  'primaryBorderColor': '#4a4a8a',
  'lineColor': '#6060aa',
  'secondaryColor': '#12122a',
  'edgeLabelBackground': '#1a1a2e',
  'clusterBkg': '#12122a',
  'clusterBorder': '#3a3a6a',
  'nodeTextColor': '#e0e0ff',
  'fontFamily': 'monospace'
}}}%%

flowchart LR
    TRACK["🎵 Track playing\nlive audio"]:::pipeline

    CF{"⏱ Approaching\ncrossfade?"}:::checkpoint

    GOOD["✅ Let it ride"]:::agent
    MED["⏸ extend_track(20)"]:::agent
    BAD["⚡ crossfade_now()"]:::agent

    USER(["👤 next · stay\nmore energetic\nwind down"]):::user

    NEXT["🎵 Next track\n(pre-stretched)"]:::pipeline

    TRACK -->|"30s warning"| CF
    CF -->|"≤1 Camelot step\n≤8 BPM diff"| GOOD
    CF -->|"2 steps\nOR 8–20 BPM"| MED
    CF -->|">2 steps\nOR >20 BPM"| BAD
    GOOD --> NEXT
    MED  --> NEXT
    BAD  --> NEXT
    USER -->|"mid-set command"| CF

    classDef agent      fill:#1a1a3a,stroke:#5858b0,color:#c8c8ff
    classDef pipeline   fill:#0a1f1a,stroke:#20a060,color:#60ffb0
    classDef checkpoint fill:#2a1a0a,stroke:#c07820,color:#ffc060
    classDef user       fill:#0a0a1a,stroke:#4040a0,color:#8080d0
uv run python agent/run.py
# → go live

Full Live Mode docs, thread architecture & cycle diagram


Example Output

Every session in this YouTube channel was built with ApolloAgents — from the earliest proof-of-concept cuts in v0.0 to today's fully orchestrated pipeline in v1.0. Same tracks, same taste, progressively better mixing as the agents learned.

Watch on YouTube

Architecture

ApolloAgents: The Multi-Agent DJ Architecture

%%{init: {'theme': 'base', 'themeVariables': {
  'background': '#0d0d1a',
  'primaryColor': '#1a1a2e',
  'primaryTextColor': '#e0e0ff',
  'primaryBorderColor': '#4a4a8a',
  'lineColor': '#6060aa',
  'secondaryColor': '#12122a',
  'tertiaryColor': '#0d0d1a',
  'edgeLabelBackground': '#1a1a2e',
  'clusterBkg': '#12122a',
  'clusterBorder': '#3a3a6a',
  'titleColor': '#c0c0ff',
  'nodeTextColor': '#e0e0ff',
  'fontFamily': 'monospace'
}}}%%

flowchart TD
    User(["👤 User\nprompt"]):::user

    subgraph APOLLO["☀️  APOLLO — Orchestrator"]
        direction TB

        JANUS["🚪 JANUS\nGenre Guard\n─────────────\nvalidates genre · duration · mood"]:::agent
        HERMES["⚡ HERMES\nCatalog Manager\n─────────────\nsyncs WAVs · detects BPM & key"]:::agent

        MUSE["🎵 MUSE\nPlanner\n─────────────\nenergy arc · harmonic order\nreads memory → avoids weak tracks"]:::agent

        CP1{{"🛑 Checkpoint 1\nreview playlist"}}:::checkpoint

        MOMUS["🎭 MOMUS\nCritic\n─────────────\ncold review · PROBLEMS / VERDICT\nreads memory → flags patterns"]:::agent

        CP2{{"🛑 Checkpoint 2\napply fixes"}}:::checkpoint

        EDITOR["✏️ Editor REPL\nswap · move · refine"]:::agent

        PIPELINE[["⚙️ Mix Pipeline\nBPM match → crossfade → WAV\n1080p video + YouTube Short"]]:::pipeline

        THEMIS["⚖️ THEMIS\nValidator\n─────────────\nclipping · spectral flatness\nsilence gaps · RMS drops"]:::agent

        MEMORY[("🧠 Memory\nrating + notes\n→ agents improve")]:::memory
    end

    User --> JANUS
    User --> HERMES
    JANUS -->|"confirmed genre"| MUSE
    MUSE -->|"playlist"| CP1
    CP1 -->|"proceed"| MOMUS
    MOMUS -->|"verdict"| CP2
    CP2 -->|"ok"| EDITOR
    EDITOR -->|"build"| PIPELINE
    PIPELINE --> THEMIS
    THEMIS -->|"PASS"| MEMORY
    MEMORY -.->|"past sessions"| MUSE
    MEMORY -.->|"problem patterns"| MOMUS

    classDef agent        fill:#1a1a3a,stroke:#5858b0,color:#c8c8ff,rx:6
    classDef checkpoint   fill:#2a1a0a,stroke:#c07820,color:#ffc060,shape:diamond
    classDef pipeline     fill:#0a1f1a,stroke:#20a060,color:#60ffb0
    classDef memory       fill:#1a0a2a,stroke:#8040c0,color:#c080ff
    classDef user         fill:#0a0a1a,stroke:#4040a0,color:#8080d0,shape:circle
| Agent | Mythological name | Role |
|-------|-------------------|------|
| Genre Guard | Janus | Gatekeeper — validates genre, duration, mood before planning starts |
| Catalog Manager | Hermes | Keeper of records — syncs WAV files to catalog, detects BPM & key |
| Planner | Muse | Inspires the set — energy arc, harmonic ordering, track selection |
| Critic | Momus | God of fault-finding — cold independent review, structured verdict |
| Editor (REPL) | — | Interactive editor — swap, move, insert bridge tracks, trigger build or go live |
| LiveDJ | Apollo LiveDJ | Real-time DJ engine — autonomous crossfade decisions, reacts to engine events and listener commands |
| Validator | Themis | Goddess of order — audio quality analysis after every build |
| Orchestrator | Apollo | Conductor — sequences all agents, manages state, collects memory |

Pipeline phases

1. Janus (Genre Guard)   → confirms genre / duration / mood
2. Muse  (Planner)       → proposes playlist + energy arc
3. Checkpoint 1          → user reviews; manual adjustments allowed
4. Momus (Critic)        → cold review: flags key clashes, BPM stretch, arc gaps
5. Checkpoint 2          → user sees critique; decides what to apply
6. Editor REPL           → swap · move · insert bridge tracks → build
7. Themis (Validator)    → audio quality report after build

Checkpoints are hard gates — agents never auto-apply fixes. You stay in control.


Features

  • Conversational planning — describe the vibe, iterate with the agents, build when ready
  • Harmonic mixing — Camelot wheel-based track ordering for smooth key transitions
  • BPM matching — gradual tempo ramps between tracks via pyrubberband
  • BPM stretch safety — transitions with pyrubberband ratio >1.5× are flagged; Critic mandates a bridge track fix
  • Bridge track insertion — suggest_bridge_track finds candidates between mismatched positions; insert_bridge_track splices one in
  • EQ matching at crossfade — shelving EQ applied to outgoing/incoming segments based on key distance, reducing frequency masking
  • Energy arc planning — Planner evaluates set shape (warmup → build → peak → wind-down) and iterates until no gaps or plateaus
  • Audio validation — peak clipping, spectral flatness (bleach detection), silence gap and RMS anomaly checks
  • Per-transition ratings — rate each session 1–5 after build; Critic memory flags recurring problem transitions
  • Session memory — agents learn from past sessions: which tracks get swapped, what energy arcs rate highly
  • Catalog management — scan new WAVs, detect missing BPM/key fields, keep tracks.json in sync
  • Live Mode — Apollo DJs in real time: autonomous crossfade decisions, responds to next, stay, more energetic, wind down mid-set
  • Multi-provider — Claude (Anthropic), GPT-4o (OpenAI), or any local model via Ollama; auto-detected from .env
  • 1080p video output — spectral waveform visualizer, beat-reactive particles, DALL-E 3 artwork, retro pixel titles
  • YouTube Short — auto-generated 20s teaser alongside the full mix
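Harmonic mixing is built on the Camelot wheel: 12 positions, each in an A (minor) and B (major) ring. A minimal sketch of a key-distance metric, assuming the common heuristic of counting wheel steps plus one for a ring change (the project's actual scoring is not shown here and may, for example, count diagonal moves like 10B→11A as a single step):

```python
def camelot_distance(key_a: str, key_b: str) -> int:
    """Rough compatibility distance between two Camelot keys, e.g. '8A' and '9A'.

    Counts the shortest way around the 12-position wheel, plus 1 for
    switching between the A (minor) and B (major) rings. Hypothetical
    helper, not the repo's actual implementation.
    """
    num_a, ring_a = int(key_a[:-1]), key_a[-1].upper()
    num_b, ring_b = int(key_b[:-1]), key_b[-1].upper()
    around = abs(num_a - num_b)
    wheel_steps = min(around, 12 - around)  # the wheel wraps: 12 -> 1 is one step
    return wheel_steps + (0 if ring_a == ring_b else 1)


print(camelot_distance("8A", "9A"))   # adjacent, same ring: 1
print(camelot_distance("12A", "1A"))  # wraps around the wheel: 1
print(camelot_distance("5A", "11A"))  # opposite side: 6
```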

Setup

Requirements: Python 3.12+, uv, ffmpeg

git clone https://github.com/YOUR_USERNAME/apollo-agents.git
cd apollo-agents

# Install dependencies
uv sync

# Copy and fill in your API keys
cp .env.example .env

.env keys:

| Key | Required | Purpose |
|-----|----------|---------|
| ANTHROPIC_API_KEY | One of these | Claude (recommended, default: claude-opus-4-6) |
| OPENAI_API_KEY | One of these | GPT-4o — also used for DALL-E 3 artwork |
| AGENT_PROVIDER=ollama | One of these | Use a local Ollama model (default: gemma4:4b) |
| OLLAMA_BASE_URL | Optional | Override Ollama endpoint (default: http://localhost:11434/v1) |
| AGENT_MODEL | Optional | Override the model for any provider (e.g. AGENT_MODEL=gpt-4o-mini) |
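The "auto-detected from .env" behavior could work roughly like this (a sketch; the actual precedence order in agent/run.py is an assumption):

```python
def detect_provider(env: dict) -> str:
    """Pick an LLM provider from environment-style key/value pairs.

    Sketch of the auto-detection idea; the real precedence in the repo
    may differ.
    """
    if env.get("AGENT_PROVIDER"):     # explicit override wins
        return env["AGENT_PROVIDER"]
    if env.get("ANTHROPIC_API_KEY"):  # Claude is the recommended default
        return "anthropic"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    raise RuntimeError("Set ANTHROPIC_API_KEY, OPENAI_API_KEY, or AGENT_PROVIDER=ollama")


print(detect_provider({"ANTHROPIC_API_KEY": "sk-..."}))  # anthropic
```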

Adding Your Tracks

Put WAV files into genre subfolders under tracks/:

tracks/
  techno/
    Acid Rain.wav
    Zero Day.wav
  deep house/
    Solar Drift.wav
  lofi - ambient/
    Kernel Space.wav
  cyberpunk/
    Chrome Horizon.wav

Then build the catalog (detects BPM + Camelot key for each file):

python main.py --build-catalog
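Conceptually, the catalog build walks the genre folders and records any WAV not yet in tracks.json, then runs audio analysis to fill in BPM and key. A simplified sketch of the scan step (the field names and schema here are assumptions, not the repo's actual format):

```python
from pathlib import Path


def find_new_tracks(tracks_dir: str, catalog: dict) -> list[dict]:
    """Return stub entries for WAVs under tracks/<genre>/ missing from the catalog."""
    known = {entry["path"] for entry in catalog.get("tracks", [])}
    new_entries = []
    for wav in sorted(Path(tracks_dir).glob("*/*.wav")):
        if str(wav) not in known:
            new_entries.append({
                "path": str(wav),
                "genre": wav.parent.name,  # the folder name doubles as the genre
                "bpm": None,               # filled in later by audio analysis
                "key": None,
            })
    return new_entries
```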

Or let Hermes do it conversationally:

uv run python agent/run.py
# → "I added new tracks"

Usage

Conversational agent (recommended)

# Default (Claude / GPT-4o, whichever key is in .env)
uv run python agent/run.py

# Local model via Ollama (no API key required)
AGENT_PROVIDER=ollama uv run python agent/run.py

# Override model for any provider
AGENT_MODEL=claude-haiku-4-5-20251001 uv run python agent/run.py

Example session:

What would you like to do?

You: 60min techno set, dark industrial build to a hard peak

── Janus (Genre Guard) ──
[confirms genre: techno, 60min, mood: dark industrial build]

── Muse (Planner) ──
[surveys catalog, proposes 12-track playlist]
[evaluates energy arc: plateau detected at pos 6-8, swaps pos 7 to fix]
Energy arc: 3-track warmup → hard build → peak at pos 9 → wind-down

── Checkpoint 1 ──
You: move track 4 to position 7
[shows updated playlist]
You: proceed

── Momus (Critic) ──
PROBLEMS:
- [pos 2→3] key clash 5A → 11A — fix: swap pos 3 for zero-day
- [pos 8→9] ⚠ Stretch 1.8× — bridge track required
VERDICT: NEEDS_FIXES

── Checkpoint 2 ──
You: swap pos 3 like the critic said
You: ok

── Editor ──
You: fix the stretch at 8→9
[suggest_bridge_track(8, 9) → 3 candidates at 142 BPM]
[insert_bridge_track(after_position=8, track_id="techno--acid-rain")]
You: build midnight-industrial

── Themis (Validator) ──
AUDIO QUALITY REPORT — midnight-industrial
Status: PASS — no issues detected ✓

Rate 1-5 (Enter to skip): 5
Any notes?: peak section was perfect
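The plateau check Muse performs in the session above (pos 6–8) can be sketched as a scan for runs of near-equal energy values. This is a hypothetical helper; the planner's real heuristic is not shown in this README:

```python
def find_plateaus(energies: list[float], min_len: int = 3, tol: float = 0.5) -> list[tuple[int, int]]:
    """Return (start, end) index pairs where energy stays within tol for min_len+ tracks."""
    plateaus = []
    run_start = 0
    for i in range(1, len(energies) + 1):
        # close the current run when the sequence ends or energy moves past tol
        if i == len(energies) or abs(energies[i] - energies[run_start]) > tol:
            if i - run_start >= min_len:
                plateaus.append((run_start, i - 1))
            run_start = i
    return plateaus


print(find_plateaus([2, 3, 4, 5, 5, 5, 6, 8, 7]))  # [(3, 5)]
```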

Direct CLI (no agent)

# Generate a session directly
python main.py --name "midnight-techno" --genre "techno" --duration 60

# Re-render video from existing mix audio
python main.py --name "midnight-techno" --genre "techno" --video-only

# Fix missing BPM/key fields in catalog
python main.py --fix-incomplete

Supported Genres

| Folder name | Visual theme |
|-------------|--------------|
| techno | Dark red, industrial |
| deep house | Neon violet, deep |
| lofi - ambient | Warm cream, anime-style artwork |
| cyberpunk | Neon green, dystopic |

Add new genres by creating a subfolder under tracks/ and running --build-catalog.


Output

Every session writes to output/<session-name>/:

output/midnight-techno/
  mix_output.wav      # lossless mix
  mix_video.mp4       # 1920×1080, 24fps, spectral waveform
  short.mp4           # 1080×1920, 20s YouTube Short
  session.json        # playlist for reproducibility
  transitions.json    # crossfade timestamps
  youtube.md          # title, description, tracklist, tags

Live Mode

Live Mode skips the pre-rendered pipeline entirely. Apollo plays tracks in real time and makes autonomous crossfade decisions as the music unfolds.

How to start

Say any of these at the Editor prompt (or as your opening request):

go live
play live
spin it live late-night-study

What happens

── Apollo LiveDJ ──
Commands: next | stay [N] | skip | quit | or anything natural language

[LiveDJ] On deck. Let's go.

  TRACK_STARTED: 'Quiet Notes bis' (76 BPM, 10B)
  ...
  APPROACHING_CF in 18s: 'Quiet Notes bis' → 'Soft Focus Loop' (76→76 BPM, 10B→11A)

[LiveDJ] Clean 1-step key move, same BPM — letting it ride.

  CROSSFADE_TRIGGERED: 'Quiet Notes bis' → 'Soft Focus Loop'

You: more energetic
[LiveDJ] Swapped track 4 → 'No more socials' (82 BPM, 11B).

You: next
[LiveDJ] Crossfading now.

Apollo's decision rules

| Transition quality | Action |
|--------------------|--------|
| Camelot ≤1 step, BPM diff ≤8 | Let it ride — no intervention |
| Camelot 2 steps or BPM diff 8–20 | extend_track(20) — buys time |
| Camelot >2 steps or BPM diff >20 | crossfade_now() or queue_swap() a better track |
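A minimal sketch of the rule table as code, with thresholds taken from the table (the worst case is checked first so the two OR rows don't overlap):

```python
def decide_transition(camelot_steps: int, bpm_diff: float) -> str:
    """Map upcoming-transition quality to an action, per the rule table above."""
    if camelot_steps > 2 or bpm_diff > 20:
        return "crossfade_now()"      # bad blend: cut early (or swap the track)
    if camelot_steps <= 1 and bpm_diff <= 8:
        return "let_it_ride"          # clean harmonic and tempo match
    return "extend_track(20)"         # mediocre: buy 20s to decide


print(decide_transition(1, 0))    # let_it_ride
print(decide_transition(2, 10))   # extend_track(20)
print(decide_transition(3, 5))    # crossfade_now()
```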

Live commands

| What you type | What Apollo does |
|---------------|------------------|
| next / skip | Crossfades immediately |
| stay / longer | Extends current track 30s |
| stay 60 | Extends by a specific number of seconds |
| more energetic | Swaps next track for a higher-BPM option |
| wind down / chill | Swaps next track for lower BPM / softer key |
| quit / q | Ends the session |

Cycle diagram

sequenceDiagram
    participant U  as 👤 User
    participant DJ as Apollo LiveDJ<br/>(event loop · 100ms tick)
    participant LM as LLM
    participant EQ as Event Queue
    participant EN as LiveEngine<br/>(sounddevice callback)
    participant PS as Pre-stretch Thread<br/>(pyrubberband)

    Note over EN: play() — loads track 1, starts OutputStream
    EN->>EQ: TRACK_STARTED
    EN->>PS: start_prestretch(track 1 → track 2)
    PS-->>EN: _next_audio ready (BPM-stretched)

    loop Every 100 ms
        DJ->>EQ: drain events
        DJ->>U: drain stdin (1 line max)
    end

    Note over EN,EQ: 30s before crossfade point…
    EN->>EQ: APPROACHING_CF (track 1 → track 2, Δbpm, Δkey, secs)
    DJ->>LM: batch turn (events + state)

    alt Good transition (≤1 Camelot step, ≤8 BPM diff)
        LM-->>DJ: (silent — no tool call)
    else Mediocre (2 steps OR 8–20 BPM diff)
        LM->>EN: extend_track(20)
        EN-->>LM: "Crossfade delayed 20s."
    else Bad (>2 steps OR >20 BPM diff)
        LM->>EN: crossfade_now()
        EN-->>LM: "Crossfade triggered."
    end

    U->>DJ: "more energetic"
    DJ->>LM: batch turn (user input + state)
    LM->>EN: queue_swap(position=3, track_id="…")
    EN-->>LM: "Queued 'No more socials' at position 3."
    LM-->>DJ: "Swapped track 3 → No more socials."
    DJ->>U: [LiveDJ] Swapped track 3 → No more socials.

    Note over EN: crossfade point reached — watchdog fires
    EN->>EQ: CROSSFADE_TRIGGERED
    Note over EN: 12s linear blend in audio callback
    EN->>EQ: CROSSFADE_FINISHED
    EN->>EQ: TRACK_ENDED (track 1)
    EN->>EQ: TRACK_STARTED (track 2)
    EN->>PS: start_prestretch(track 2 → track 3)
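The 100 ms tick on the LiveDJ side boils down to draining a threading.Queue without ever blocking the loop. A sketch:

```python
import queue


def drain(event_queue: "queue.Queue") -> list:
    """Pull every pending event off the queue without blocking.

    Called once per tick; an empty list means nothing happened since
    the last tick.
    """
    events = []
    while True:
        try:
            events.append(event_queue.get_nowait())
        except queue.Empty:
            return events


q = queue.Queue()
q.put("TRACK_STARTED")
q.put("APPROACHING_CF")
print(drain(q))  # ['TRACK_STARTED', 'APPROACHING_CF']
print(drain(q))  # []
```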

How the threads fit together

%%{init: {'theme': 'base', 'themeVariables': {
  'background': '#0d0d1a',
  'primaryColor': '#1a1a2e',
  'primaryTextColor': '#e0e0ff',
  'primaryBorderColor': '#4a4a8a',
  'lineColor': '#6060aa',
  'secondaryColor': '#12122a',
  'tertiaryColor': '#0d0d1a',
  'edgeLabelBackground': '#1a1a2e',
  'clusterBkg': '#12122a',
  'clusterBorder': '#3a3a6a',
  'titleColor': '#c0c0ff',
  'nodeTextColor': '#e0e0ff',
  'fontFamily': 'monospace'
}}}%%

flowchart TB
    subgraph MAIN["🧵 Main thread — LiveDJ event loop (100ms tick)"]
        direction LR
        DRAIN["drain Event Queue\ndrain stdin (1 line)"]:::pipeline
        FORMAT["format turn\ncall LLM ≤5 turns\nexec tool calls"]:::agent
        DRAIN --> FORMAT
    end

    subgraph ENGINE["⚙️ LiveEngine"]
        direction TB
        WATCHDOG["🔍 Watchdog thread\n50ms tick\n─────────────────\nAPPROACHING_CF\nCROSSFADE_TRIGGERED\nCROSSFADE_FINISHED\nTRACK_ENDED · SESSION_ENDED"]:::agent
        CALLBACK["🔊 sounddevice callback\nlow-latency audio thread\n2048-sample blocks\n─────────────────\nstate=playing → copy samples\nstate=crossfading →\n  out*(1−t) + in*t\nblend done → swap buffers"]:::pipeline
        API["🔧 Public API\nthread-safe (_lock)\n─────────────────\ncrossfade_now()\nextend_track(N)\nskip_track()\nqueue_swap(pos, id)\nset_crossfade_point(sec)"]:::memory
        WATCHDOG -->|"_cf_just_finished flag"| CALLBACK
    end

    PRESTRETCH["🎛️ Pre-stretch daemon\n─────────────────\nload next WAV\npyrubberband time-stretch\nto match current BPM\n_STRETCH_MAX 1.5×\nsignal _prestretch_ready"]:::memory

    USER(["👤 User\nnext · stay · skip\nmore energetic\nwind down · quit"]):::user
    EQ[("📨 Event Queue\nthreading.Queue")]:::memory

    WATCHDOG -->|"emit events"| EQ
    EQ -->|"drained each tick"| DRAIN
    USER -->|"stdin"| DRAIN
    FORMAT -->|"tool calls"| API
    API -->|"state mutations"| CALLBACK
    CALLBACK -->|"_next_audio\npre-stretched WAV"| WATCHDOG
    ENGINE -->|"triggers prestretch\non each track start"| PRESTRETCH
    PRESTRETCH -->|"_next_audio ready"| CALLBACK

    classDef agent    fill:#1a1a3a,stroke:#5858b0,color:#c8c8ff
    classDef pipeline fill:#0a1f1a,stroke:#20a060,color:#60ffb0
    classDef memory   fill:#1a0a2a,stroke:#8040c0,color:#c080ff
    classDef user     fill:#0a0a1a,stroke:#4040a0,color:#8080d0

Key design decisions:

  • Pre-stretch runs ahead of time — by the time the crossfade fires, the next track's audio is already in memory at the right BPM. Crossfades are instant with no stutter.
  • Watchdog at 50ms, event loop at 100ms — the engine detects state changes twice as fast as the LLM loop polls, so events are never missed.
  • LLM budget capped at 5 turns per batch — prevents the agent from spending unbounded tokens on a single event while music is playing.
  • _extend_samples shifts the crossfade point — extend_track(N) adds N×44100 samples to the threshold, cleanly delaying the auto-crossfade without touching the audio buffer.
  • Hot cue OUT marks set the crossfade point — if a track has an out hot cue, the engine crossfades from that exact position instead of duration − 17s.
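The 12 s linear blend in the audio callback reduces to out*(1−t) + in*t per sample. A pure-Python sketch of the idea (the real callback works on 2048-sample numpy blocks, not Python lists):

```python
def crossfade_block(outgoing, incoming, block_start, fade_len):
    """Blend one block of samples: gain ramps linearly from old track to new.

    block_start is the block's offset into the fade; fade_len is the total
    fade length in samples (12 s * 44100 in the engine). Simplified sketch,
    not the repo's callback.
    """
    mixed = []
    for i, (a, b) in enumerate(zip(outgoing, incoming)):
        t = min(1.0, (block_start + i) / fade_len)  # 0.0 = all old, 1.0 = all new
        mixed.append(a * (1.0 - t) + b * t)
    return mixed


print(crossfade_block([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], 0, 4))
# [1.0, 0.75, 0.5, 0.25]
```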

Agent Memory

After each build, rate your session 1-5. Ratings accumulate in agent/memory.json. On the next session of the same genre:

  • Muse (Planner) avoids tracks that have been swapped out 2+ times
  • Momus (Critic) flags transition patterns that have been problems before
  • High-rated mood/arc combinations are surfaced as references
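The swap-count rule for Muse could be implemented along these lines (the memory.json schema shown is an assumption; only the 2+ threshold comes from this README):

```python
from collections import Counter


def swapped_out_blacklist(sessions: list[dict], threshold: int = 2) -> set[str]:
    """Track IDs swapped out in `threshold`+ past sessions; Muse avoids these."""
    counts = Counter(
        track_id
        for session in sessions
        for track_id in session.get("swapped_out", [])
    )
    return {track_id for track_id, n in counts.items() if n >= threshold}


history = [
    {"rating": 3, "swapped_out": ["techno--zero-day"]},
    {"rating": 5, "swapped_out": ["techno--zero-day", "techno--acid-rain"]},
]
print(swapped_out_blacklist(history))  # {'techno--zero-day'}
```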

Project Structure

main.py              # Core pipeline (~2600 lines): catalog, mixing, video
agent/
  run.py             # Apollo orchestrator + all agent loops
  tools.py           # Tool functions for all agents
  memory.json        # Persistent session history (auto-created)
tracks/
  tracks.json        # Unified catalog (auto-generated)
  <genre>/           # WAV files per genre
output/              # Generated mixes and videos (gitignored)
artwork/             # DALL-E 3 backgrounds (cached, gitignored)
fonts/
  PressStart2P-Regular.ttf

License

MIT — see LICENSE
