Add OPRA options support to DataBentoProvider (chain discovery + OHLCV + consolidated quotes) #25

@pandashark

Summary

I'd like to gauge interest in adding OPRA options support to the existing DataBentoProvider. Today the Databento integration covers equities and CME futures (GLBX.MDP3) only — there's no path to options data, even though Databento exposes the full OPRA feed (OPRA.PILLAR) through the same Historical client already vendored.

Before writing any code or tests, I wanted to confirm this is in scope for the library (the futures/ module looks deliberately CME-focused, so options may be an intentional boundary rather than a gap).

Motivation

Options are the one major asset class in the README's Databento coverage line ("CME, CBOE, ICE futures/options") that isn't actually reachable. A single DataBentoProvider already normalizes Databento → the standard Polars schema and plugs into DataManager; options OHLCV could ride the same path with a dataset/symbology switch.

What I'm proposing

Extend DataBentoProvider (not a new top-level module), since the work splits cleanly along the existing provider contract:

| Capability | Fits BaseProvider? | Proposed surface |
|---|---|---|
| Single-contract OHLCV (OSI symbol) | ✅ it is OHLCV | fetch_ohlcv(...) with dataset="OPRA.PILLAR", stype_in="raw_symbol" |
| Chain discovery (definition + filter) | ❌ extra method | fetch_option_chain(underlying, date, *, expiry=None, spot=None, moneyness=None, right="both") |
| Consolidated bid/ask | ❌ non-OHLCV columns | fetch_option_quotes(contract, schema="cbbo-1m", ...) |

This mirrors how fetch_continuous_futures() already sits alongside fetch_ohlcv() for the futures case.
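
To make the two symbology modes in the table concrete, here is a minimal sketch of the symbol strings involved. The helper names osi_symbol and parent_symbol are hypothetical (nothing like them exists in the library today). The 21-character OSI layout (root padded to 6, YYMMDD, C/P, strike × 1000 padded to 8 digits) is the standard OCC format; that OPRA raw symbols on Databento use exactly this space-padded form, and that chains are addressed as ROOT.OPT under stype_in="parent", is my understanding and should be checked against a definition pull.

```python
from datetime import date


def osi_symbol(root: str, expiry: date, right: str, strike: float) -> str:
    """Build a 21-character OSI option symbol (hypothetical helper).

    Layout: root left-justified in 6 chars, expiry as YYMMDD,
    'C' or 'P', then strike * 1000 zero-padded to 8 digits.
    """
    right = right.upper()
    if right not in ("C", "P"):
        raise ValueError("right must be 'C' or 'P'")
    if not 1 <= len(root) <= 6:
        raise ValueError("root must be 1 to 6 characters")
    strike_milli = round(strike * 1000)
    if not 0 <= strike_milli <= 99_999_999:
        raise ValueError("strike out of range for the OSI format")
    return f"{root.upper():<6}{expiry:%y%m%d}{right}{strike_milli:08d}"


def parent_symbol(root: str) -> str:
    """Parent-symbology string for a whole option chain (assumed ROOT.OPT form)."""
    return f"{root.upper()}.OPT"


if __name__ == "__main__":
    # Single contract, for stype_in="raw_symbol"
    print(repr(osi_symbol("SPY", date(2024, 1, 19), "C", 450.0)))
    # Whole chain, for stype_in="parent"
    print(parent_symbol("SPX"), parent_symbol("SPXW"))
```

The first string would go to fetch_ohlcv(...) as a single-contract request; the second form is what a chain-discovery method would send when pulling definitions for an underlying.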

Findings from a working prototype

I built a standalone script against databento.Historical to validate behavior end-to-end (happy to share it / attach as a gist). A few things any implementation needs to handle — and which I think are worth encoding so users don't hit them:

  1. Request size scales with schema, not asset class. For a full chain (parent symbology), get_billable_size grows by orders of magnitude moving from ohlcv-1d (small) to trades (very large); single-contract pulls are negligible by comparison. Actual dollar cost depends on the user's plan and streaming-vs-batch mode, so the API should (a) make chain-wide price/quote pulls deliberate rather than accidental, and (b) expose get_billable_size / get_cost so users see size and their own quote before committing.

  2. estimate_cost() caveat (existing futures code). The current ContinuousDownloader.estimate_cost() is a flat years × constant heuristic that ignores schema: it returns the same estimate for ohlcv-1d and ohlcv-1m, which can be very wrong for finer schemas. For options I'd lean on metadata.get_billable_size()/get_cost() directly. (Possibly worth a separate fix for futures, but flagging it here.)

  3. OPRA OHLCV is multi-publisher. Each contract gets one bar per reporting venue (~17), so raw row counts are dates × venues and volume is split. Needs a documented consolidation step. The consolidated quote schemas (cbbo/tcbbo/cmbp-1) are single-publisher (id 30) and need no dedup.

  4. Quote availability is per-schema. get_dataset_range shows cbbo-1m back to 2013, cmbp-1/tcbbo to 2023, but cbbo-1s only from 2025-02-20. A runtime check beats any hardcoded date.

  5. Index roots split. SPX has SPX (AM-settled monthlies) and SPXW (PM weeklys/EOM) as separate parent chains — full coverage means querying both.
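
The consolidation step from finding 3 could look roughly like the sketch below. It is plain Python over dict rows so the rule itself is visible; a real implementation would do the same group-by in Polars. The row keys (symbol, date, publisher_id, and so on) and the function name consolidate_bars are illustrative, not an existing schema or API. High, low, and volume combine unambiguously (max, min, sum). Open and close do not, because per-venue daily bars carry no trade ordering across venues; taking them from the highest-volume venue is an assumption of this sketch, not an established convention.

```python
from collections import defaultdict


def consolidate_bars(rows):
    """Collapse per-venue OHLCV bars into one bar per (symbol, date).

    Each row is a dict with keys: symbol, date, publisher_id,
    open, high, low, close, volume (illustrative names).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["symbol"], row["date"])].append(row)

    consolidated = []
    for (symbol, day), bars in sorted(groups.items()):
        # ASSUMPTION: the highest-volume venue supplies open/close.
        lead = max(bars, key=lambda b: b["volume"])
        consolidated.append(
            {
                "symbol": symbol,
                "date": day,
                "open": lead["open"],
                "high": max(b["high"] for b in bars),
                "low": min(b["low"] for b in bars),
                "close": lead["close"],
                "volume": sum(b["volume"] for b in bars),
                "venues": len(bars),  # how many publishers reported
            }
        )
    return consolidated
```

Keeping a venues count in the output makes it easy to spot contracts that traded on only one or two publishers, where the open/close choice matters least.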

Open questions

  • In scope? Is OPRA options something the library wants, or is Databento intentionally futures-only here?
  • Provider vs. module: extend DataBentoProvider, or a parallel options/ module like futures/?
  • Quotes scope: include bid/ask (cbbo/tcbbo) in a first cut, or OHLCV + chain only to start?
  • Greeks/IV: out of scope (Databento doesn't provide them; they'd be a downstream computation)?

Scope / non-goals (if welcomed)

  • First PR would target OHLCV + chain discovery with mocked-client unit tests (core lane) and @pytest.mark.integration/paid_tier for live calls; quotes could be a follow-up.
  • No greeks/IV, no live/streaming.
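
As a rough idea of what a mocked-client test in the core lane could look like, here is a sketch of a size guard that asks metadata.get_billable_size before a chain-wide pull and refuses above a caller-chosen limit. check_billable_size, RequestTooLarge, and make_fake_client are hypothetical names, and the byte counts in the fake are invented purely to exercise the guard; they are not measurements. The keyword arguments passed to get_billable_size follow my reading of the databento.Historical metadata API and should be verified against the installed client version.

```python
from types import SimpleNamespace


class RequestTooLarge(RuntimeError):
    """Raised when a request exceeds the caller's size budget."""


def check_billable_size(client, *, symbols, schema, start, end, max_bytes,
                        dataset="OPRA.PILLAR", stype_in="parent"):
    """Return the billable size in bytes, or raise if it exceeds max_bytes."""
    size = client.metadata.get_billable_size(
        dataset=dataset,
        symbols=symbols,
        schema=schema,
        stype_in=stype_in,
        start=start,
        end=end,
    )
    if size > max_bytes:
        raise RequestTooLarge(
            f"{schema} request is {size} bytes, over the {max_bytes}-byte limit"
        )
    return size


def make_fake_client(sizes_by_schema):
    """Stand-in for the Historical client; returns a canned size per schema."""
    def get_billable_size(**kwargs):
        return sizes_by_schema[kwargs["schema"]]

    return SimpleNamespace(
        metadata=SimpleNamespace(get_billable_size=get_billable_size)
    )


def test_guard_blocks_large_schema():
    # Invented sizes, chosen only so one schema passes and one fails.
    client = make_fake_client({"ohlcv-1d": 1_000, "trades": 10_000_000})
    common = dict(symbols=["SPX.OPT"], start="2024-01-02", end="2024-01-03",
                  max_bytes=1_000_000)
    assert check_billable_size(client, schema="ohlcv-1d", **common) == 1_000
    try:
        check_billable_size(client, schema="trades", **common)
    except RequestTooLarge:
        pass
    else:
        raise AssertionError("expected RequestTooLarge")
```

The same fake can back the live-call variants: an @pytest.mark.integration test would swap in a real client and assert only that the call returns a positive integer.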