PopAgent: Multi-Agent LLM Trading with Adaptive Method Selection
Core Innovation: Agents Learn to SELECT Methods
Unlike fixed-strategy trading systems, PopAgent maintains populations of agents that learn to SELECT which methods to use from a shared inventory. This creates a meta-learning system where agents discover optimal method combinations through continual learning.
Preference Transfer: Knowledge sharing is about WHAT to select
Context-Aware Selection: Different methods for different market regimes
Online Learning: Models update after EVERY observation (like real hedge funds)
Feature-Aligned Learning (v0.9.8) - The Right Way
Key insight: Update frequency should match FEATURE TIMESCALE, not model complexity!
```
┌──────────────────────────────────────────────────────────────────────┐
│                FEATURE-ALIGNED LEARNING ARCHITECTURE                 │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  FAST FEATURES (momentum, vol) ────────►  Update: EVERY BAR          │
│  Model: Any (even XGBoost!)               Why: These change every 4h │
│                                                                      │
│  MEDIUM FEATURES (trend, daily) ───────►  Update: EVERY 6 BARS       │
│  Model: Any                               Why: Trend changes daily   │
│                                                                      │
│  SLOW FEATURES (regime, corr) ─────────►  Update: EVERY 42 BARS      │
│  Model: Any (even simple!)                Why: Regime changes weekly │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
```
| Wrong Approach (Model-Based) | Right Approach (Feature-Based) |
| --- | --- |
| Simple model → fast update | Fast-changing feature → fast update |
| Complex model → slow update | Slow-changing feature → slow update |
| Computational constraint drives design | Data dynamics drive design |
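In code, feature-aligned scheduling reduces to checking each group's cadence against the current bar index. A minimal sketch (the group names and frequencies follow this README's tables, but `UPDATE_EVERY` and `groups_due` are hypothetical helpers, not the repo's actual API):

```python
# Feature-aligned update scheduling: each feature group updates on its own
# cadence, matching how fast its features move (not how complex its model is).
UPDATE_EVERY = {"fast": 1, "medium": 6, "slow": 42}  # bars between updates

def groups_due(bar_index: int) -> list[str]:
    """Return the feature groups whose models should update on this bar."""
    return [g for g, n in UPDATE_EVERY.items() if bar_index % n == 0]

# Example: over 42 bars, fast models update every bar,
# medium models every 6 bars, slow models once.
counts = {"fast": 0, "medium": 0, "slow": 0}
for bar in range(42):
    for g in groups_due(bar):
        counts[g] += 1
```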
How It Works:
```
Bar 1:    Observe → Predict → Trade → See outcome → UPDATE WEIGHTS
Bar 2:    Observe → Predict (better) → Trade → See outcome → UPDATE WEIGHTS
Bar 3:    Observe → Predict (even better) → Trade → See outcome → UPDATE WEIGHTS
...
Bar 8700: Model has been learning for 4 years
```
Feature Groups and Models:
| Feature Group | Features | Update Freq | Models Used |
| --- | --- | --- | --- |
| Fast | ret_1bar, ret_5bar, vol_intrabar, momentum | Every bar | OnlineLinear + OnlineRidge |
| Medium | trend_strength, daily_vol, sma_ratio | Every 6 bars | Ridge with batch refit |
| Slow | regime, cross_correlation | Every 42 bars | RandomForest + regime means |
Online Models (used in Fast features):
| Model | Algorithm | What It Learns |
| --- | --- | --- |
| OnlineLinearRegression | SGD | Return prediction |
| OnlineRidge | Recursive Least Squares | Trend prediction |
| OnlineVolatility | EWMA | Volatility estimation |
| OnlineRegimeDetector | Bayesian HMM | Market regime (Bull/Bear/Neutral) |
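As a concrete illustration of the online-update pattern, here is a pure-Python sketch of an SGD-based linear model in the spirit of OnlineLinearRegression. The class name `SGDOnlineLinear` and the hyperparameters are ours, not the repo's actual implementation:

```python
class SGDOnlineLinear:
    """Illustrative online linear regression: one SGD step per observation."""

    def __init__(self, n_features: int, lr: float = 0.1):
        self.w = [0.0] * n_features  # weights
        self.b = 0.0                 # bias
        self.lr = lr                 # learning rate

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x, y):
        # One gradient step on squared error for this single observation.
        err = self.predict(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err

# Streaming noise-free data from y = 2 * x0; the weights converge online.
model = SGDOnlineLinear(n_features=1, lr=0.1)
for _ in range(200):
    for x0 in (0.5, 1.0, -0.5):
        model.update([x0], 2.0 * x0)
```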
Code Example:
```python
# Online models update after EVERY bar:
for bar in price_data:
    features = extract_features(bar)

    # Predict BEFORE seeing outcome
    prediction = model.predict(features)

    # Execute trade
    execute_trade(prediction)

    # Next bar: see actual outcome
    actual_return = next_bar.close / bar.close - 1

    # UPDATE model weights with observation
    model.update(features, actual_return)  # ← This is online learning!
```
RL Enhancements (v0.7.0)
Three lightweight, theoretically grounded RL improvements for robust learning:
1. Thompson Sampling (Bayesian Exploration)
Instead of deterministic UCB, agents sample from Beta distributions to naturally balance exploration and exploitation:
```
For each method m:
    sample ~ Beta(α_m, β_m)
    # High uncertainty → high variance → more exploration
    # High success rate → high mean → more exploitation
```
| Scenario | Alpha | Beta | Behavior |
| --- | --- | --- | --- |
| New method | 1 | 1 | Uniform sampling (explore) |
| 10 wins, 2 losses | 11 | 3 | High mean, exploit |
| 3 wins, 10 losses | 4 | 11 | Low mean, avoid |
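The selection rule above can be reproduced in a few lines: sample each method's Beta posterior and pick the highest draw. `thompson_select` and the scenario stats are illustrative, not necessarily the repo's API:

```python
import random

random.seed(7)  # fixed seed so the illustration is reproducible

def thompson_select(stats: dict) -> str:
    """Pick a method by sampling each Beta(alpha, beta) posterior and
    choosing the highest draw. `stats` maps method -> (alpha, beta)."""
    draws = {m: random.betavariate(a, b) for m, (a, b) in stats.items()}
    return max(draws, key=draws.get)

# The three scenarios from the table above:
stats = {
    "new_method": (1, 1),   # uniform prior: pure exploration
    "strong":     (11, 3),  # 10 wins, 2 losses: usually exploited
    "weak":       (4, 11),  # 3 wins, 10 losses: usually avoided
}
picks = [thompson_select(stats) for _ in range(2000)]
```

Over many draws the high-mean method dominates, but the uniform prior still gets sampled, so exploration never fully stops.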
2. Contextual Baselines (Regime-Aware Learning)
Per-regime baselines for proper credit assignment:
Bull market: +2% is average (baseline = 2.5%) → advantage ≈ 0
Bear market: +2% is exceptional (baseline = -0.5%) → advantage ≈ +2.5%
Agents learn context-specific method preferences, not global averages.
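A minimal sketch of the per-regime advantage computation, using the baseline values from the example above (`REGIME_BASELINE` and `advantage` are illustrative names, not the repo's actual identifiers):

```python
# Per-regime credit assignment: the same raw return yields different
# advantages depending on what is "normal" for the current regime.
REGIME_BASELINE = {"bull": 0.025, "bear": -0.005, "neutral": 0.0}

def advantage(raw_return: float, regime: str) -> float:
    """Advantage = outcome minus the regime's baseline return."""
    return raw_return - REGIME_BASELINE[regime]

# +2% looks unremarkable in a bull market, exceptional in a bear market.
bull_adv = advantage(0.02, "bull")  # slightly below the bull baseline
bear_adv = advantage(0.02, "bear")  # strongly positive vs the bear baseline
```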
```shell
# Run all 4 conditions: A=Baseline, B=LLM, C=News, D=Full
python -m trading_agents.cli ablation --condition all \
  --symbols BTC,ETH,SOL,XRP,DOGE \
  --start 2022-01-01 \
  --end 2024-12-01

# Run single condition (e.g., baseline only)
python -m trading_agents.cli ablation --condition A
```
| Condition | LLM | News | Description |
| --- | --- | --- | --- |
| A (Baseline) | No | No | Pure Thompson Sampling |
| B (LLM Only) | Yes | No | LLM reasoning, no news |
| C (News Only) | No | Yes | News as features |
| D (Full) | Yes | Yes | Complete system |
Step 5: Real-Time Learning Mode (Live Trading)
```shell
# Run real-time learning with 4-hour iterations
python -m trading_agents.cli live --symbols BTC,ETH,SOL,XRP,DOGE

# With options
python -m trading_agents.cli live \
  --symbols BTC,ETH,SOL \
  --interval 4.0 \
  --use-llm \
  --use-news \
  --testnet    # Execute on Bybit testnet

# Test single iteration (no waiting)
python -m trading_agents.cli live --test-once
```
Key difference from backtesting:
Backtest: Simulates historical data rapidly (1000+ iterations in minutes)
Live Mode: Waits actual 4 hours between iterations, fetches live data
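Structurally, the live cadence is just a loop that really sleeps between iterations, which is the only difference from a backtest driver that replays bars as fast as possible. A hypothetical sketch (the actual CLI internals may differ):

```python
import time

def live_loop(run_iteration, interval_hours=4.0, max_iters=None):
    """Run one iteration on live data, then wait for the next bar.
    A backtest driver would call run_iteration in a tight loop instead."""
    i = 0
    while max_iters is None or i < max_iters:
        run_iteration()              # fetch live data, predict, trade, update
        i += 1
        time.sleep(interval_hours * 3600)  # real waiting, unlike a backtest
```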
This implementation builds on TradingAgents (Apache-2.0) and Population-Based Training research.