Summary
I'd like to gauge interest in adding OPRA options support to the existing DataBentoProvider. Today the Databento integration covers equities and CME futures (GLBX.MDP3) only — there's no path to options data, even though Databento exposes the full OPRA feed (OPRA.PILLAR) through the same Historical client already vendored.
Before writing any code or tests, I wanted to confirm this is in scope for the library (the futures/ module looks deliberately CME-focused, so options may be an intentional boundary rather than a gap).
Motivation
Options are the one major asset class in the README's Databento coverage line ("CME, CBOE, ICE futures/options") that isn't actually reachable. A single DataBentoProvider already normalizes Databento → the standard Polars schema and plugs into DataManager; options OHLCV could ride the same path with a dataset/symbology switch.
What I'm proposing
Extend DataBentoProvider (not a new top-level module), since the work splits cleanly along the existing provider contract:
| Capability | Fits BaseProvider? | Proposed surface |
| --- | --- | --- |
| Single-contract OHLCV (OSI symbol) | ✅ it is OHLCV | fetch_ohlcv(...) with dataset="OPRA.PILLAR", stype_in="raw_symbol" |
| Chain discovery (definition + filter) | ❌ extra method | fetch_option_chain(underlying, date, *, expiry=None, spot=None, moneyness=None, right="both") |
| Consolidated bid/ask | ❌ non-OHLCV columns | fetch_option_quotes(contract, schema="cbbo-1m", ...) |
This mirrors how fetch_continuous_futures() already sits alongside fetch_ohlcv() for the futures case.
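For the single-contract path, callers need an OSI symbol to pass as the raw symbol. A minimal sketch of a builder follows. The helper name osi_symbol is hypothetical (not part of the library or of Databento), and it assumes the standard 21-character OSI layout: root left-justified and space-padded to 6 characters, expiry as YYMMDD, C or P, then strike × 1000 zero-padded to 8 digits. Whether OPRA.PILLAR raw symbols use exactly this padding should be checked against real definition records.

```python
from datetime import date
from decimal import Decimal


def osi_symbol(root: str, expiry: date, right: str, strike: str) -> str:
    """Build a 21-character OSI option symbol (hypothetical helper).

    `strike` is passed as a string so it can be parsed exactly with Decimal
    instead of going through binary floating point.
    """
    if right not in ("C", "P"):
        raise ValueError("right must be 'C' or 'P'")
    if not 1 <= len(root) <= 6:
        raise ValueError("root must be 1 to 6 characters")

    # OSI encodes the strike as price x 1000, zero-padded to 8 digits.
    scaled = Decimal(strike) * 1000
    if scaled != scaled.to_integral_value() or not 0 < scaled < 10**8:
        raise ValueError("strike must be positive with at most 3 decimals")

    # Root is left-justified and space-padded to 6 characters.
    return f"{root:<6}{expiry:%y%m%d}{right}{int(scaled):08d}"


if __name__ == "__main__":
    print(repr(osi_symbol("SPY", date(2024, 1, 19), "C", "450")))
```

A helper like this could sit behind fetch_ohlcv so users pass root, expiry, right, and strike rather than hand-formatting the symbol.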
Findings from a working prototype
I built a standalone script against databento.Historical to validate behavior end-to-end (happy to share it / attach as a gist). A few things any implementation needs to handle — and which I think are worth encoding so users don't hit them:
- Request size scales with schema, not asset class. For a full chain (parent symbology), get_billable_size grows by orders of magnitude moving from ohlcv-1d (small) to trades (very large); single-contract pulls are negligible by comparison. Actual dollar cost depends on the user's plan and streaming-vs-batch mode, so the API should (a) make chain-wide price/quote pulls deliberate rather than accidental, and (b) expose get_billable_size / get_cost so users see size and their own quote before committing.
- estimate_cost() caveat (existing futures code). The current ContinuousDownloader.estimate_cost() is a flat years × constant heuristic that ignores schema: it returns the same estimate for ohlcv-1d and ohlcv-1m, which can be very wrong for finer schemas. For options I'd lean on metadata.get_billable_size()/get_cost() directly. (Possibly worth a separate fix: for futures, but flagging here.)
- OPRA OHLCV is multi-publisher. Each contract gets one bar per reporting venue (~17), so raw row counts are dates × venues and volume is split. Needs a documented consolidation step. The consolidated quote schemas (cbbo/tcbbo/cmbp-1) are single-publisher (id 30) and need no dedup.
- Quote availability is per-schema. get_dataset_range shows cbbo-1m back to 2013, cmbp-1/tcbbo to 2023, but cbbo-1s only from 2025-02-20. A runtime check beats any hardcoded date.
- Index roots split. SPX has SPX (AM-settled monthlies) and SPXW (PM weeklys/EOM) as separate parent chains, so full coverage means querying both.
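To make the multi-publisher point concrete, here is a rough sketch of one possible consolidation step, written with the standard library only so it does not presume the provider's Polars schema. The VenueBar type, the consolidate function, and the publisher ids in the example are all hypothetical. It aggregates only the fields that are well-defined from per-venue bars (max high, min low, summed volume); a consolidated open/close is left out on purpose, because per-venue bars alone do not say which venue traded first or last within the interval.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class VenueBar(NamedTuple):
    """One OHLCV bar as reported by a single venue (hypothetical shape)."""
    ts: str            # bar timestamp, e.g. an ISO date for ohlcv-1d
    symbol: str        # contract symbol
    publisher_id: int  # reporting venue
    high: float
    low: float
    volume: int


def consolidate(bars: Iterable[VenueBar]) -> Dict[Tuple[str, str], dict]:
    """Collapse per-venue bars into one row per (ts, symbol)."""
    groups = defaultdict(list)
    for bar in bars:
        groups[(bar.ts, bar.symbol)].append(bar)

    out = {}
    for key, venue_bars in groups.items():
        out[key] = {
            "high": max(b.high for b in venue_bars),
            "low": min(b.low for b in venue_bars),
            "volume": sum(b.volume for b in venue_bars),  # volume is split across venues
            "venues": len(venue_bars),
        }
    return out


if __name__ == "__main__":
    sym = "SPY   240119C00450000"
    rows = [
        VenueBar("2024-01-02", sym, 20, 5.10, 4.80, 120),
        VenueBar("2024-01-02", sym, 21, 5.25, 4.90, 30),
    ]
    print(consolidate(rows)[("2024-01-02", sym)])
```

If an exact consolidated open/close is required, it would likely have to come from trade-level data or a consolidated schema rather than from this aggregation.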
Open questions
- In scope? Is OPRA options something the library wants, or is Databento intentionally futures-only here?
- Provider vs. module: extend DataBentoProvider, or a parallel options/ module like futures/?
- Quotes scope: include bid/ask (cbbo/tcbbo) in a first cut, or OHLCV + chain only to start?
- Greeks/IV: out of scope (Databento doesn't provide them; they'd be a downstream computation)?
Scope / non-goals (if welcomed)
- First PR would target OHLCV + chain discovery with mocked-client unit tests (core lane) and @pytest.mark.integration/paid_tier for live calls; quotes could be a follow-up.
- No greeks/IV, no live/streaming.
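As an illustration of what a mocked-client unit test in the core lane might look like, here is a sketch of a size guard for chain-wide pulls together with two tests that never touch the network. The names check_billable_size and RequestTooLarge are hypothetical, as are the max_bytes thresholds and the "SPX.OPT" parent symbol. The only call assumed from Databento is metadata.get_billable_size on the Historical client, invoked with keyword arguments.

```python
from unittest.mock import MagicMock


class RequestTooLarge(Exception):
    """Raised when a request exceeds the caller's size budget (hypothetical)."""


def check_billable_size(client, *, symbols, schema, start, end, max_bytes,
                        dataset="OPRA.PILLAR", stype_in="parent"):
    """Return the billable size in bytes, or raise if it exceeds max_bytes.

    `client` is expected to behave like databento.Historical.
    """
    size = client.metadata.get_billable_size(
        dataset=dataset, symbols=symbols, schema=schema,
        stype_in=stype_in, start=start, end=end,
    )
    if size > max_bytes:
        raise RequestTooLarge(
            f"{schema} request is {size} bytes, over the {max_bytes}-byte limit"
        )
    return size


def test_guard_blocks_large_chain_pull():
    client = MagicMock()
    client.metadata.get_billable_size.return_value = 5_000_000_000
    try:
        check_billable_size(client, symbols=["SPX.OPT"], schema="trades",
                            start="2024-01-02", end="2024-01-03",
                            max_bytes=100_000_000)
    except RequestTooLarge:
        pass
    else:
        raise AssertionError("guard should have rejected the request")


def test_guard_allows_small_pull():
    client = MagicMock()
    client.metadata.get_billable_size.return_value = 1_024
    size = check_billable_size(client, symbols=["SPX.OPT"], schema="ohlcv-1d",
                               start="2024-01-02", end="2024-01-03",
                               max_bytes=100_000_000)
    assert size == 1_024
    # The guard should forward the schema unchanged to the metadata call.
    kwargs = client.metadata.get_billable_size.call_args.kwargs
    assert kwargs["schema"] == "ohlcv-1d"
    assert kwargs["stype_in"] == "parent"
```

The same functions would run unchanged under pytest; the live equivalents would carry the integration/paid_tier marks and use a real client.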