
Implementation Details

What Was Built

A ~150-line Python solution that extends Claude Code usage through automatic API fallbacks, with commit tracking for code review.

Components

  1. LiteLLM Proxy (config file)

    • Handles API routing and automatic fallbacks
    • Switches: Claude → Gemini → DeepSeek
    • Built-in retry logic and rate limit handling
  2. Monitor (50 lines)

    • Flask webhook listener
    • Creates git commits/tags on model switches
    • Sends Discord notifications
  3. Review Script (100 lines)

    • Analyzes commits by model
    • Shows diffs for each model's work
    • Enables accept/rollback of model work

Architecture Decisions

Why LiteLLM Instead of Custom Orchestrator?

Before: 2,500+ lines of custom Python orchestrator with:

  • Custom provider adapters
  • ClaudeMonitor subprocess wrapper
  • Complex session management
  • HTTP server with SSE
  • Nested git branching logic

After: Use battle-tested LiteLLM proxy

  • Already handles 100+ LLM providers
  • Built-in fallback logic
  • Well-documented and maintained
  • 95% less code to maintain

Why Webhook + Flask Instead of Subprocess Monitoring?

The Problem: Claude Code is a web app making HTTP requests, not a CLI tool you can wrap as a subprocess.

The Solution: Intercept at the HTTP level using a proxy, then use webhooks to trigger git operations.
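Pointing Claude Code at the proxy is then a one-line environment change. A minimal sketch — `ANTHROPIC_BASE_URL` is the variable Claude Code reads for a custom endpoint; treat the exact name as an assumption if your version differs:

```shell
# Route Claude Code's Anthropic API traffic through the local LiteLLM proxy
export ANTHROPIC_BASE_URL=http://localhost:8000
```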

Alternative Considered: mitmproxy with custom addon

  • More complex setup
  • Requires HTTPS certificate management
  • Harder to debug

Why We Chose LiteLLM: Official support for proxying, easier configuration, better documentation.

Why Simple Tags Instead of Nested Branches?

Before: Nested branches like dev → dev-openrouter → dev-openrouter-gemini

  • Hard to visualize
  • Complex merge/rollback logic
  • Over-engineered for the use case

After: Linear tags on commits

  • Clear which model did what
  • Simple git revert for rollback
  • Easy to analyze with git log --oneline --decorate

Example:

abc123f (tag: model-gemini-20260216-1430) Auto-commit before switch to gemini
def456g (tag: model-deepseek-20260216-1500) Auto-commit before switch to deepseek

Data Flow

┌─────────────┐
│ Claude Code │
│  (Web App)  │
└──────┬──────┘
       │ HTTP POST
       │ to api.anthropic.com
       ↓
┌─────────────────┐
│  LiteLLM Proxy  │ ← listens on localhost:8000
│  (Port 8000)    │
└────────┬────────┘
         │
         ├─→ Try T0: Claude API
         │   ↓ (429 rate limit)
         ├─→ Try T1: Gemini API
         │   ↓ (429 rate limit)
         └─→ Try T2: DeepSeek API

         │
         │ Webhook on fallback
         ↓
┌──────────────────┐
│  Monitor (Flask) │ ← listens on localhost:5000
│  (Port 5000)     │
└────────┬─────────┘
         │
         ├─→ git commit -am "..."
         ├─→ git tag model-X-timestamp
         └─→ Discord notification
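The monitor's commit-and-tag step can be sketched as below. The tag format (`model-<name>-<timestamp>`) comes from the examples above; the function names and the `--allow-empty` flag are assumptions, not the actual monitor.py:

```python
import subprocess
from datetime import datetime

def make_tag(model, now=None):
    """Build a tag like model-gemini-20260216-1430 (format from the examples above)."""
    ts = (now or datetime.now()).strftime("%Y%m%d-%H%M")
    return f"model-{model}-{ts}"

def commit_and_tag(model):
    """Auto-commit the working tree and tag it at the moment of the model switch."""
    tag = make_tag(model)
    # --allow-empty records the switch even if no files changed (assumption;
    # see "Commits Are Empty" under Troubleshooting)
    subprocess.run(["git", "commit", "--allow-empty", "-am",
                    f"Auto-commit before switch to {model}"], check=True)
    subprocess.run(["git", "tag", tag], check=True)
    return tag
```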

Review Workflow

┌──────────────┐
│ Claude Usage │
│   Returns    │
└──────┬───────┘
       │
       ↓
┌─────────────────────────┐
│ python review.py -a     │  Show what each model did
└──────┬──────────────────┘
       │
       ├─→ Good work? → python review.py --accept gemini
       │                ↓
       │                Tag renamed: model-* → accepted-*
       │
       └─→ Bad work?  → python review.py --rollback deepseek
                        ↓
                        git revert <commits>

Configuration Options

Adding More Fallback Models

Edit litellm_config.yaml:

model_list:
  - model_name: my-custom-model
    litellm_params:
      model: provider/model-name
      api_key: os.environ/MY_API_KEY

router_settings:
  fallbacks: [
    {"claude-sonnet-4": ["gemini-flash"]},
    {"gemini-flash": ["my-custom-model"]},  # Add here
  ]

Disabling Auto-Commit

export GIT_AUTO_COMMIT=false
python monitor.py

Monitor will still receive webhooks but won't create commits.
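Inside monitor.py, a guard like this is all that's needed (the function name is hypothetical; `GIT_AUTO_COMMIT` is the variable from the snippet above):

```python
import os

def auto_commit_enabled(env=None):
    """Commits are on unless GIT_AUTO_COMMIT is explicitly set to false."""
    env = os.environ if env is None else env
    return env.get("GIT_AUTO_COMMIT", "true").lower() != "false"
```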

Custom Webhook Actions

Edit monitor.py webhook handler to add custom logic:

@app.route("/webhook", methods=["POST"])
def webhook():
    data = request.json
    model = data.get("model")

    # Your custom logic here
    if model == "gemini-flash":
        # Run tests before accepting fallback
        subprocess.run(["pytest"])

    # ... existing commit logic

Testing

Unit Test the Monitor

# Start monitor
python monitor.py

# In another terminal, send test webhook
curl -X POST http://localhost:5000/webhook \
  -H "Content-Type: application/json" \
  -d '{"model":"test-model","event_type":"fallback"}'

# Check git tags
git tag -l "model-*"

End-to-End Test

  1. Set very low rate limit on Claude API (via Anthropic console)
  2. Start proxy and monitor
  3. Use Claude Code intensively
  4. Verify automatic fallback to Gemini
  5. Check commits were created: git log --oneline
  6. Review work: python review.py -a

Maintenance

Daily

  • Check review.py --analyze for model activity
  • Review and accept/rollback model work

Weekly

  • Clean up old tags: git tag -l "accepted-*" | xargs git tag -d
  • Check LiteLLM proxy logs for errors

Monthly

  • Update dependencies: pip install -U litellm flask gitpython requests
  • Review fallback model list, adjust as providers change offerings

Troubleshooting

Monitor Creates Too Many Commits

Cause: Every webhook triggers a commit, including non-fallback events

Fix: Edit monitor.py to filter events more strictly:

if event_type == "fallback_success" and "rate_limit" in str(data):
    git_commit_and_tag(model, event_type)

Commits Are Empty

Cause: No files changed when model switches

Fix: This is expected if Claude was just thinking/planning. The commit still marks the moment of the switch even when no code changed.

Review Script Shows No Tags

Cause: Monitor not creating tags, or tags were deleted

Fix:

  1. Check monitor is running: curl http://localhost:5000/health
  2. Check git tag creation: git tag -l "model-*"
  3. Manually test webhook: see "Testing" section above

Future Enhancements

If this proves useful, consider:

  1. Automated Testing on Fallback

    • Run test suite before accepting fallback
    • Auto-rollback if tests fail
  2. Cost Tracking

    • Log token usage by model
    • Generate cost reports
  3. Context Preservation

    • Save conversation state before switch
    • Replay context to new model
  4. Smart Routing

    • Route by task type (coding vs docs vs planning)
    • Use cheaper models for simple tasks
  5. Web Dashboard

    • Real-time monitoring UI
    • Visual git history by model
    • One-click accept/rollback

But: Keep it simple. The current solution solves the core problem without over-engineering.

Why This Works

  1. Existing Tools: LiteLLM handles the hard parts (API compatibility, rate limit detection)
  2. Simple State: Git commits are the only state, no databases
  3. Manual Review: Human judgment for code quality, not automated merges
  4. Minimal Code: Easy to understand, debug, and modify

Total maintenance burden: ~2 hours/month for a solo developer.

Comparison to Original Implementation

| Aspect | Original | New |
|---|---|---|
| Lines of Code | ~2,500 | ~150 |
| External Dependencies | Custom orchestrator | LiteLLM (maintained) |
| Complexity | High (nested branches, sessions) | Low (tags, commits) |
| API Interception | ❌ Subprocess wrapper | ✅ HTTP proxy |
| Setup Time | Hours | Minutes |
| Maintenance | High | Minimal |
| Works with Claude Code | ❌ No | ✅ Yes |

License

MIT - Do whatever you want with this code.