Autonomous Trading Logic & Analysis System
A research project investigating whether causal-chain discovery over public macroeconomic, energy, weather, trade, and sentiment data can produce tradeable predictive edge in liquid futures and FX markets.
Short answer, found through honest out-of-sample testing: no. The reason why is the interesting part.
ATLAS is an end-to-end quantitative research system (~90k lines of Python, ~110 PostgreSQL tables, 100+ external data sources). It spans:
- Asset Tracker — daily price history for ~1,400 equities + 55 traded futures/FX pairs.
- Data Engine — 8 deterministic data pipelines (macro, energy, weather, trade, agriculture, sentiment, tech, event) producing ~1,000 daily indicator columns from public sources (FRED, EIA, JODI, NOAA, USGS, UN Comtrade, GDELT, and others).
- Unified Dataset — all pipelines aligned onto a single date spine.
- Prediction Engine — XGBoost models and a PCMCI+/Granger causal-chain discovery pipeline.
- Portfolio & Execution — dollar-neutral construction and broker-agnostic routing (paper only).
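As a minimal sketch of dollar-neutral construction, assume the signal layer emits one score per instrument. Demeaning the scores makes longs and shorts cancel in notional terms, and rescaling fixes the gross exposure. The helper name `dollar_neutral_weights` and the unit-gross convention are illustrative assumptions, not ATLAS's actual portfolio code.

```python
def dollar_neutral_weights(scores: dict[str, float], gross: float = 1.0) -> dict[str, float]:
    """Hypothetical helper: map raw scores to weights that sum to zero
    (dollar-neutral) and whose absolute values sum to `gross`."""
    if not scores:
        return {}
    mean = sum(scores.values()) / len(scores)
    # Demeaning removes any net long/short tilt, so the weights sum to zero.
    centered = {k: v - mean for k, v in scores.items()}
    total_abs = sum(abs(v) for v in centered.values())
    if total_abs == 0:
        # Identical scores carry no cross-sectional view: hold nothing.
        return {k: 0.0 for k in scores}
    # Rescale so the long and short books each carry gross / 2.
    return {k: gross * v / total_abs for k, v in centered.items()}


weights = dollar_neutral_weights({"CL": 2.0, "ZS": 0.5, "GC": -1.0})
print(weights)
```

Dollar neutrality only removes net directional exposure in notional terms; it does not by itself equalize volatility or beta across the long and short legs.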
The design principle throughout is "math first, LLM second": the data pipelines are deterministic, not autonomous agents. There is no black-box reasoning in the signal path.
Causal chains are intuitive and real: rainfall affects crop supply, supply affects prices; interest-rate differentials affect currencies; upstream input costs affect downstream producers. The hypothesis was that a disciplined causal-discovery method (PCMCI+ for discovery, Granger for confirmation, walk-forward stability filtering) could surface these chains from public data and convert them into tradeable signals.
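The Granger confirmation step can be illustrated with the textbook restricted-versus-unrestricted regression test: do lagged values of a candidate driver `x` reduce the residual variance of target `y` beyond what `y`'s own lags already explain? The sketch below uses plain NumPy least squares and returns only the F-statistic; a p-value would come from the F distribution (for example `scipy.stats.f.sf`). The function name `granger_f_stat` is hypothetical, and this is a generic illustration rather than ATLAS's implementation.

```python
import numpy as np


def granger_f_stat(y: np.ndarray, x: np.ndarray, p: int) -> float:
    """F-statistic for H0: lags 1..p of x add nothing to an AR(p) model of y."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y)
    target = y[p:]
    # Column k-1 holds the value k steps back, aligned row-by-row with `target`.
    y_lags = np.column_stack([y[p - k : n - k] for k in range(1, p + 1)])
    x_lags = np.column_stack([x[p - k : n - k] for k in range(1, p + 1)])
    const = np.ones((n - p, 1))
    restricted = np.hstack([const, y_lags])            # y on its own past
    unrestricted = np.hstack([const, y_lags, x_lags])  # ... plus x's past

    def rss(design: np.ndarray) -> float:
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    df_num = p
    df_den = (n - p) - unrestricted.shape[1]
    return ((rss_r - rss_u) / df_num) / (rss_u / df_den)


# Synthetic check: y depends on x two steps earlier, but not the reverse.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
noise = rng.standard_normal(500)
y = np.zeros(500)
y[2:] = 0.8 * x[:-2] + noise[2:]
print(granger_f_stat(y, x, p=3))  # large: x's past helps predict y
print(granger_f_stat(x, y, p=3))  # near 1: y's past does not help predict x
```

Running a test like this across many driver/target pairs and lags is exactly the multiple-testing situation that makes family-wise correction necessary.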
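Walk-forward stability filtering can be sketched as a vote across discovery windows: an edge (driver, target) survives only if it is rediscovered in at least `min_windows` windows at lags close enough to count as the same relationship. The tolerance rule (within `lag_tol` days of the median lag), the data layout, and the name `stable_edges` are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median

Edge = tuple[str, str]  # (driver, target)


def stable_edges(
    windows: list[dict[Edge, int]], min_windows: int, lag_tol: int
) -> dict[Edge, int]:
    """Return {edge: consensus_lag} for edges that recur across windows.

    `windows` holds one dict per discovery window, mapping each discovered
    edge to the lag (in days) at which it was found in that window.
    """
    lags_by_edge: dict[Edge, list[int]] = defaultdict(list)
    for found in windows:
        for edge, lag in found.items():
            lags_by_edge[edge].append(lag)

    survivors: dict[Edge, int] = {}
    for edge, lags in lags_by_edge.items():
        center = median(lags)
        # A window votes for the edge only if its lag is near the consensus lag.
        votes = sum(1 for lag in lags if abs(lag - center) <= lag_tol)
        if votes >= min_windows:
            survivors[edge] = round(center)
    return survivors
```

A filter like this rewards persistence, not economic content: a mechanical identity that holds in every window passes it trivially, so stability is a necessary screen rather than evidence of edge.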
The earlier version of the system reported a strong in-sample backtest (annualized Sharpe ~2.2 on 2024 data). This project set out to test whether that result survived honest, leak-free, out-of-sample evaluation.
The core contribution of this project is not a strategy — it is a test harness that refuses to fool itself. Specifically:
- A curated, health-audited dataset. A full audit of all 1,009 data-engine columns found only ~27% genuinely healthy; the rest were empty (never wired to a source), stubbed (constant placeholders), truncated (short history hidden behind a full-looking chart axis), or stale (source discontinued). 296 verified-healthy factors were carved into a separate aligned table.
- Release-date alignment. Many public series are dated to a reference period but published weeks later (e.g. January CPI released in mid-February). Naively aligning by reference date leaks future information into the past. A release-date shift transform ensures every value is only visible on the date it was actually knowable — eliminating a major source of lookahead bias.
- Leak-free walk-forward evaluation. Discovery runs on six historical windows ending no later than 2025-01-02; all out-of-sample testing happens strictly after that date, on data discovery never saw.
- Honest significance testing. Cross-target family-wise error correction (Bonferroni over the effective number of independent signals, after collapsing redundant near-duplicate chains) and block-bootstrap null distributions (block length ≥ forecast horizon, to preserve autocorrelation in overlapping-window targets). Strategies are judged on realized profit, not Sharpe.
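The four failure modes in the data audit map onto simple per-column checks. The sketch below classifies one column from its dated observations; the thresholds (`min_years`, `max_staleness_days`), the check order, and the name `classify_column` are assumptions, not the criteria the actual audit used.

```python
from datetime import date
from typing import Optional


def classify_column(
    obs: list[tuple[date, Optional[float]]],
    as_of: date,
    min_years: float = 5.0,
    max_staleness_days: int = 120,
) -> str:
    """Label a column as empty, stubbed, truncated, stale, or healthy."""
    points = [(d, v) for d, v in obs if v is not None]
    if not points:
        return "empty"      # never wired to a source
    if len({v for _, v in points}) == 1:
        return "stubbed"    # a constant placeholder value
    first = min(d for d, _ in points)
    last = max(d for d, _ in points)
    if (last - first).days < min_years * 365.25:
        return "truncated"  # short real history behind a long date axis
    if (as_of - last).days > max_staleness_days:
        return "stale"      # source stopped updating
    return "healthy"
```

Checking the span of non-null values, rather than the span of the date index, is what exposes truncation that a full-looking chart axis would hide.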
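A release-date shift is, in effect, an as-of join keyed on publication date instead of reference date. The minimal sketch below assumes each observation carries both dates and that each period is published once (no revisions); for every date on the spine it returns the latest value that had actually been released by then. The names `Observation` and `align_on_release` are illustrative. In a pandas pipeline the same idea is commonly expressed with `pandas.merge_asof` (backward direction) on the release-date column.

```python
from bisect import bisect_right
from datetime import date
from typing import NamedTuple, Optional


class Observation(NamedTuple):
    reference: date  # period the value describes, e.g. 2025-01-01 for January
    released: date   # date the value was actually published
    value: float


def align_on_release(obs: list[Observation], spine: list[date]) -> list[Optional[float]]:
    """Value knowable on each spine date, or None if nothing was published yet."""
    # Sort by release date; ties go to the later reference period.
    ordered = sorted(obs, key=lambda o: (o.released, o.reference))
    release_dates = [o.released for o in ordered]
    out: list[Optional[float]] = []
    for day in spine:
        i = bisect_right(release_dates, day)  # count of releases on or before `day`
        out.append(ordered[i - 1].value if i else None)
    return out


# Made-up figures: a January value published on 12 February.
jan = Observation(date(2025, 1, 1), date(2025, 2, 12), 3.0)
feb = Observation(date(2025, 2, 1), date(2025, 3, 12), 2.8)
spine = [date(2025, 1, 31), date(2025, 2, 11), date(2025, 2, 12), date(2025, 3, 12)]
print(align_on_release([feb, jan], spine))
```

Aligning by reference date would instead have exposed the January value from 1 January, roughly six weeks before anyone could have traded on it.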
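The two significance checks can be sketched together. Redundant signals are first collapsed greedily (a signal whose absolute correlation with an already-kept one exceeds a threshold is dropped), and the Bonferroni threshold divides α by the number that remain. A circular moving-block bootstrap then resamples contiguous blocks of strategy returns, so the null distribution keeps the serial correlation created by overlapping forward-return windows. The greedy rule, the 0.9 threshold, the circular block scheme, and all function names are assumptions; the harness's actual choices are not specified here.

```python
import math
import random


def _corr(a: list[float], b: list[float]) -> float:
    """Pearson correlation of two equal-length series."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def effective_signal_count(signals: list[list[float]], max_abs_corr: float = 0.9) -> int:
    """Greedily collapse near-duplicate signals; return how many remain."""
    kept: list[list[float]] = []
    for s in signals:
        if all(abs(_corr(s, k)) <= max_abs_corr for k in kept):
            kept.append(s)
    return len(kept)


def passes_fwer(p_value: float, n_effective: int, alpha: float = 0.05) -> bool:
    """Bonferroni: each test must clear alpha / n_effective."""
    return p_value <= alpha / n_effective


def block_bootstrap_pvalue(
    returns: list[float], block_len: int, n_boot: int = 2000, seed: int = 0
) -> float:
    """One-sided p-value for 'mean return is zero' vs 'positive'.

    Choose block_len >= the forecast horizon so each block preserves the
    autocorrelation of overlapping forward-return windows.
    """
    n = len(returns)
    observed = sum(returns) / n
    centered = [r - observed for r in returns]  # impose the zero-mean null
    rng = random.Random(seed)
    n_blocks = math.ceil(n / block_len)
    exceed = 0
    for _ in range(n_boot):
        sample: list[float] = []
        for _ in range(n_blocks):
            start = rng.randrange(n)
            # Wrap around the end so every start index yields a full block.
            sample.extend(centered[(start + j) % n] for j in range(block_len))
        if sum(sample[:n]) / n >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)
```

A candidate chain would then need `passes_fwer(block_bootstrap_pvalue(...), effective_signal_count(...))` to count as a winner; with ten effective signals at α = 0.05, that means p ≤ 0.005.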
Across five independent discovery+test campaigns — crude oil, soybeans, six FX majors, gold, and horizon-reframed targets (60-day and 90-day forward returns, 20-day realized volatility) — using tight, economically-motivated driver sets (~20–55 factors each), two stability thresholds, and lag-tolerant edge voting:
No discovered chain exceeded the noise floor under honest multiple-testing correction.
In every campaign, nominal "winners" landed at the number expected by pure chance once corrected for how many hypotheses were tested. The strongest individual chains were fragile (a 2–3 day change in one link's lag flipped a top performer into a bottom one) and the most "stable" lead-lag relationships were economically empty:
- Physics, not signal: for example, a region's max temperature "predicting" its own min temperature (6/6 windows). True, useless.
- Accounting identities: the dollar index is, by construction, a weighted basket of USD exchange rates, so it moves mechanically with the major USD pairs (inversely with pairs quoted as XXX/USD, such as EUR/USD; in step with USD/XXX pairs, such as USD/JPY). PCMCI correctly flags it as maximally stable; it is not exploitable edge, it is the same information expressed twice. This identity substantially explains the earlier system's headline "FX chains."
- Already-priced macro co-movement: real-rate / dollar / credit-spread relationships that the market prices instantly.
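The dollar-index identity can be reproduced with synthetic data. The ICE US Dollar Index is a weighted geometric mean of six exchange rates, so its log return is a fixed linear combination of theirs. The exponents below are the published basket weights; the returns are independent random draws invented for this demonstration, and the helper names are hypothetical. Even with no economic linkage at all, the index tracks the pairs it is built from.

```python
import math
import random

# Exponents in the ICE US Dollar Index formula: negative where USD is the
# quote currency (XXX/USD), positive where USD is the base (USD/XXX).
DXY_EXPONENTS = {
    "EURUSD": -0.576, "USDJPY": 0.136, "GBPUSD": -0.119,
    "USDCAD": 0.091, "USDSEK": 0.042, "USDCHF": 0.036,
}


def pearson(a: list[float], b: list[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def simulate(n_days: int = 2000, seed: int = 7) -> dict[str, float]:
    """Correlation of a synthetic DXY with each component pair."""
    rng = random.Random(seed)
    # Independent daily log returns with equal volatility: no link between pairs.
    pair_returns = {p: [rng.gauss(0.0, 0.006) for _ in range(n_days)] for p in DXY_EXPONENTS}
    # Log return of a geometric-mean index = weighted sum of component log returns.
    dxy = [sum(w * pair_returns[p][t] for p, w in DXY_EXPONENTS.items()) for t in range(n_days)]
    return {p: pearson(dxy, r) for p, r in pair_returns.items()}


print(simulate())
```

With independent, equal-volatility components the theoretical correlation with pair i is w_i / sqrt(Σ w_j²), about -0.94 for EUR/USD and +0.22 for USD/JPY. A discovery method that flags this as a maximally stable relationship is working correctly; the relationship is simply not tradeable.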
The honest conclusion is not "causal chains are fake," and it is not "we picked the wrong phenomena." Rainfall, trade flows, energy, positioning: these are real economic drivers. The constraint is not which phenomena you choose; it is that public data on any of them is already priced. A lead-lag relationship visible in public data gets traded until the trading itself prices it in, which is the mechanism by which markets become efficient. And markets are far more efficient now than they were ten or fifteen years ago: the window between a public edge appearing and being arbitraged away has compressed to days. The width of that window scales with how widely watched the data source is; the less watched the input, the longer an edge in it survives.
So the load-bearing variable is the data, not the model. Once you hold an input the crowd does not have, and is not yet watching, the signal in it can be extracted in many ways: with multiple models, or even by direct discretionary reading. Model sophistication is the replaceable part; the data advantage is what lasts. The moat is the data, not the method.
One caveat this project is careful about: a null result on its own cannot distinguish an edge that was real and later decayed from one that never existed outside in-sample noise; both present identically as weak out-of-sample performance. Separating them requires tracking edge strength over rolling windows (a measurable half-life points to a real, arbitraged edge; flat-at-noise from the start points to a spurious one). That analysis is the natural next step; until it is done, "perishable edge" is the most economically plausible reading of these nulls, not a proven one.
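The rolling-window diagnostic described above could start as simply as fitting an exponential decay to per-window edge strength (for example, each window's information coefficient). This sketch assumes strengths are positive and decay roughly geometrically: it regresses log-strength on window index by ordinary least squares and converts the slope to a half-life. The name `edge_half_life` and the input format are hypothetical.

```python
import math


def edge_half_life(strengths: list[float]) -> float:
    """Half-life, in windows, of an exponentially decaying edge.

    Fits log(s_t) = a - lam * t over the positive strengths and returns
    ln(2) / lam, or math.inf when the fitted slope shows no decay.
    """
    # Non-positive windows are skipped (log undefined); this biases the fit
    # toward surviving windows, so treat the result as a rough indicator.
    pts = [(t, math.log(s)) for t, s in enumerate(strengths) if s > 0]
    if len(pts) < 3:
        raise ValueError("need at least three positive observations")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    lam = -sxy / sxx
    return math.log(2) / lam if lam > 0 else math.inf
```

A finite, well-estimated half-life would support the "real but arbitraged" reading; a fit that returns no decay while strengths sit at the noise floor from the first window would support the "never real" reading.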
This points to where edge can still live:
- Faster than the arbitrageurs — intraday / order-flow data (a latency game, not an insight game).
- Data the crowd doesn't have or isn't watching — non-public or alternative data, unpriced precisely because it is hard to acquire and not yet widely tracked.
- Markets too small to be worth arbitraging — neglected, illiquid instruments where edge persists because large funds don't bother.
None of these is "a better algorithm on the same public daily data"; that terrain is already efficiently priced. The next version of ATLAS is built around the first two: sourcing alternative data that has not yet been picked over.
A leaky or Sharpe-chasing backtest would have produced a confident fake winner. This one didn't, because it was built not to. Publishing the null — with the full statistical work behind it (see reports/) — is the point. The discipline of building an instrument that tells the truth, and then believing it when it says "not here," is the actual research contribution.
The infrastructure is reusable: a clean aligned dataset, leak-safe release alignment, an isolated strategy-module pattern, and an out-of-sample + FWER + block-bootstrap harness that returns an honest verdict in ~80 seconds. When better inputs become available, they can be tested in an afternoon.
```
src/
  core/                # database, config, shared utilities
  asset_tracker/       # price layer
  *_agent/             # the 8 data pipelines (macro, energy, weather, ...)
  unified_dataset/     # alignment layer + refined factor table builder
  prediction_engine/
    chains/            # PCMCI+/Granger discovery pipeline
    crude_strategy/    # isolated discovery+OOS campaign (crude)
    soy_strategy/      # isolated campaign (soybeans)
    fx_strategy/       # isolated campaign (FX majors)
    horizon_strategy/  # isolated campaign (long-horizon + volatility targets)
migrations/            # PostgreSQL schema migrations
reports/               # audit + out-of-sample test results (the evidence)
```
Requires PostgreSQL, with the data pipelines populated from their respective public sources (API keys in .env; template in .env.example). See the architecture documentation for the full data-source list and build sequence. Each *_strategy module exposes a pipeline entry point and an OOS test harness, with results written to its reports/ output directory.
- Performance numbers from the earlier version of this system (e.g. the ~2.2 in-sample Sharpe) were measured before the leak-safe pipeline existed and do not survive the out-of-sample evaluation described above. They are retained in historical documentation for transparency, not as claims.
- Execution is paper-only; no live broker connection.
- This is a research project, not investment advice.
This was version 1 of ATLAS, where we found that data is the key. I have already begun work on V2, where I have come across some very interesting alternative datasets, all of which look promising.
Author: Yash Bhosale
