diff --git a/.claude/skills/review-pr/SKILL.md b/.claude/skills/review-pr/SKILL.md new file mode 100644 index 00000000..2ff79a80 --- /dev/null +++ b/.claude/skills/review-pr/SKILL.md @@ -0,0 +1,377 @@ +--- +name: review-pr +description: Review a GitHub pull request against py-questdb-client (Cython + C-ABI) coding standards +argument-hint: [PR number or URL] [--level=0..3] +allowed-tools: Bash(gh *), Bash(git *), Read, Grep, Glob, Agent +--- + +Review the pull request `$ARGUMENTS`. + +## Review mindset + +You are a senior QuestDB engineer performing a blocking code review. `py-questdb-client` is mission-critical software: a **Cython** extension that wraps the **`c-questdb-client` (Rust) library** through its **C ABI**, and is used to ingest production data from customer Python applications. A bug here causes data loss, silent data corruption, segfaults that take down the host Python interpreter, reference-count leaks, or native memory leaks. There is zero tolerance for correctness issues, memory unsafety, refcount imbalance, GIL violations, or an FFI binding that disagrees with the C header it calls. Be critical, thorough, and opinionated. Your job is to catch problems before they ship, not to be nice. + +- **Assume nothing is correct until you've verified it.** Read surrounding code to understand context — don't just look at the diff in isolation. +- **The diff is a hint, not the boundary of the review.** The highest-value bugs almost always live at callsites outside the diff that depend on contracts the diff quietly changed (a `cdef` helper's error-return convention, a buffer's ownership, a `qdb_pystr_buf` arena's lifetime). Treat the diff as the entry point, not the scope. +- **Flag every issue you find**, no matter how small. Do not soften language or hedge. Say "this is wrong" not "this might be an issue". +- **Do not praise the code.** Skip "looks good", "nice work", "clever approach". Focus entirely on problems and risks. 
+- **Think adversarially.** For each change, work through: + - Inputs: which values break this? Empty buffers, zero-length strings, `None`, NaN/inf floats, boundary integers (`INT64_MAX`/`INT64_MIN`), max-length symbols, non-UTF-8 `str`, `bytes` with embedded NULs, huge `int` that overflows `int64_t`. + - Encoding: how does the code behave when a Python `str` contains lone surrogates, astral codepoints, or characters that fail UTF-8 encoding? + - Memory: every `malloc`/`calloc`/`realloc` — is it freed on the error path, the exception path, and the early-return path? Every `Py_INCREF` — is there a matching `Py_DECREF`? Every `PyObject_GetBuffer` — a matching `PyBuffer_Release`? + - GIL: does a `with nogil` block touch a Python object or call a CPython API function? Does a `cdef ... nogil` function need the GIL it doesn't hold? + - Failure modes: connection dropping mid-flush, partial write, TLS handshake failure, auth rejection, server rejection — does the buffer/sender end in a usable state, and does native memory get released? + - C-ABI callers: what happens when a C function returns `NULL`, returns an error via its out-param, or hands back a pointer the Cython side must free exactly once? +- **Check what's missing**, not just what's there. Missing tests, missing error handling, missing edge cases, missing `ingress.pyi` stub updates for public API changes, `.pxd` declarations out of sync with the C header. +- **Verify every claim.** If the PR title says "fix", verify the bug actually existed and the fix is correct. If it says "improve performance", look for benchmarks or reason about the change against the per-row hot path. If it says "simplify", verify the new code is actually simpler and doesn't drop behavior (e.g. a dropped `free` on an error branch). Treat the PR description as an unverified hypothesis. +- **Read the full context of changed files** when the diff alone is ambiguous. Use Read/Grep/Glob to inspect surrounding code, callers, and related tests. 
+- **Assess reachability before reporting.** For every potential bug, trace the actual callers and inputs. If a problem requires physically impossible conditions (a length larger than `SIZE_MAX`, a NUL injected through an API that already rejects it, a panic behind a validation guard), it is not a real finding — drop it. Focus on bugs that real workloads can trigger, not theoretical edge cases. +- **Never review generated or build artifacts.** `src/questdb/ingress.c`, `*.html` (Cython annotation), and `*.so` are build outputs. The source of truth is `*.pyx`, `*.pxi`, `*.pxd`, and `*.pyi`. If the diff contains a regenerated `ingress.c`, review the `.pyx`/`.pxi` change that produced it, not the generated C. + +## Review level + +Parse `$ARGUMENTS` for a level token: `--level=N`, `-lN`, or a bare single digit `0`-`3`. **If no level is given, default to 0.** Strip the level token before feeding the remainder (PR number or URL) to `gh` commands. + +The level controls how much of the review below actually runs. Lower levels keep the same review *spirit* — adversarial, blocking, no praise — but cut the breadth of the analysis. Higher levels have significantly higher token cost; reserve level 3 for high-stakes PRs (C-ABI `.pxd` changes, a `c-questdb-client` submodule bump, the dataframe/Arrow ingestion path, `nogil` sections, manual `malloc`/refcount code, ILP wire format, or auth/TLS configuration). + +| Level | What runs | +|-------|-----------| +| **0 (default)** | Steps 1, 2, 4. Skip Steps 2.5a-d, but still run Step 2.5e (build & binding profile — mandatory at every level). Skip Step 3 — no agent spawn; review the diff inline in the main loop, using Read/Grep on demand to resolve ambiguities. Skip Step 3b — verify each finding inline as you write it. Single-pass review covering correctness, Cython memory/refcount/GIL safety, C-ABI binding correctness, tests, and coding standards on the diff itself. 
| +| **1** | Adds Step 2.5a (semantic delta only — skip 2.5b/2.5c/2.5d; Step 2.5e still runs, as at every level). In Step 3, launch only Agent 1 (correctness), Agent 2 (Cython memory & refcount safety), and Agent 7 (tests) in parallel. Skip all other agents. Skip Step 3b — verify findings inline as you draft the report. | +| **2** | Full Step 2.5, but in 2.5b restrict the callsite inventory to public Python symbols (exported in `__all__` / `ingress.pyi`) plus every `cdef`/`cpdef` function and every C-ABI symbol declared in the `.pxd` files. In Step 3, launch Agents 1-8. Skip Agent 9 (cross-context) and Agent 10 (adversarial fresh-context). Step 3b uses a single batched verification agent for all findings instead of one per finding. | +| **3** | Every step below as written, all 10 agents, per-finding verification. The full mission-critical pass. | + +State the chosen level in one line at the start of the review so the user knows what they're getting (e.g., "Reviewing PR #141 at level 2"). If the level was defaulted, mention that level 3 exists for full review. + +## Step 1: Gather PR context + +Capture the PR identifier in `$PR` (the part of `$ARGUMENTS` left after stripping the level token), then fetch metadata, diff, and review comments in a single bash call so `$PR` is in scope for all three `gh` invocations: + +```bash +# Set PR to what remains of $ARGUMENTS once the level token is stripped (a PR number or URL). +PR='<PR number or URL>' +gh pr view "$PR" --json number,title,body,labels,state +gh pr diff "$PR" +gh pr view "$PR" --comments +``` + +If the diff modifies `c-questdb-client` (the git submodule pointer) or any `.pxd` file, note it now — a submodule bump or binding change is the highest-risk class of change in this repo and forces level-3 scrutiny of the C-ABI surface regardless of the requested level. 
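The level-token handling described under "Review level", and the `$PR` remainder used above, could be sketched as follows. This is a minimal illustration, not part of the skill: `split_level` is a hypothetical helper, and its rule for a bare digit (only the last token counts as a level, and only when another token is left over to serve as the PR) is an assumption the text does not spell out.

```python
import re
from typing import List, Optional, Tuple

# Matches the two explicit spellings named in the skill: --level=N and -lN.
_LEVEL_FLAG = re.compile(r"^(?:--level=|-l)([0-3])$")


def split_level(arguments: str) -> Tuple[int, str]:
    """Return (level, remainder) for a raw argument string.

    Hypothetical helper: the name and the bare-digit rule are assumptions.
    """
    level: Optional[int] = None
    rest: List[str] = []
    for tok in arguments.split():
        match = _LEVEL_FLAG.match(tok)
        if match and level is None:
            level = int(match.group(1))
        else:
            rest.append(tok)

    # Assumption: a bare digit 0-3 is a level only when it is the last token
    # and a PR token remains, so a lone "3" still means PR #3.
    if level is None and len(rest) > 1 and rest[-1] in {"0", "1", "2", "3"}:
        level = int(rest.pop())

    # No level token found: default to 0, as the skill specifies.
    return (0 if level is None else level, " ".join(rest))
```

The remainder returned here is what would be assigned to `PR` before the `gh` calls. Inputs such as a lone single digit stay ambiguous in the skill's wording; the sketch resolves them in favor of the PR number.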
+ +## Step 2: PR title and description + +Check: +- Title is clear and describes the change +- Description speaks to end-user impact, not implementation internals +- If fixing an issue, `Fixes #NNN` or a link to the issue is present +- Tone is level-headed and analytical +- For public API changes (anything in `__all__`, a new/changed method on `Sender`/`Buffer`/`Client`, a new keyword argument, or a changed default), the description calls out the API change explicitly, and `CHANGELOG.rst` is updated +- For a `c-questdb-client` submodule bump, the description states which upstream change is being pulled in and why + +## Step 2.5: Map the change surface + +Before launching review agents, produce a structured change surface map. This step is mandatory and must use Grep/Glob — do not reason about callsites from memory. The output of this step is required input for every Step 3 agent except Agent 10 (the fresh-context adversarial agent, which deliberately works from the diff alone). + +### 2.5a Semantic delta per changed symbol + +For every modified or added function (`def`, `cdef`, `cpdef`), method, class, `cdef class` attribute, module-level constant, enum member, or C-ABI declaration in a `.pxd`, write: + +- **Symbol:** fully-qualified name (e.g., `questdb.ingress.Buffer.column`, `_dataframe`, `c_err_to_py`, `line_sender_buffer_column_f64`) +- **Before:** signature, return type, **Cython exception convention** (`except -1` / `except *` / `except? -1` / `except +` / none / `noexcept`), what it raises and on which inputs, `nogil`-ness, whether it touches Python objects, allocation behavior (`malloc`/`calloc`/`realloc`), refcount effect (does it steal/borrow/own a reference?), C-ABI ownership semantics (who frees returned pointers), thread-safety +- **After:** same fields +- **Delta:** one line stating what semantically changed + +"Refactored", "cleaned up", "improved", "simplified" are not acceptable deltas. State the actual behavioral difference. 
If nothing semantically changed, write "no behavioral change" — but only after checking, not as a default. + +### 2.5b Callsite inventory + +For every changed symbol that is public (in `__all__` / `ingress.pyi`), `cdef`/`cpdef`, declared in a `.pxd`, or a C-ABI function, run Grep across the repository to find every callsite, override, or reference outside the diff. + +Produce a list grouped by file. Search at minimum: + +- **Cython implementation & includes:** `grep -rn 'symbol_name' src/questdb/*.pyx src/questdb/*.pxi` +- **Cython C-ABI / helper declarations:** `grep -rn 'symbol_name' src/questdb/*.pxd` +- **Type stubs:** `grep -rn 'symbol_name' src/questdb/ingress.pyi` +- **C-ABI header (source of truth):** `grep -rn 'symbol_name' c-questdb-client/include/questdb/ingress/` +- **Rust helper crate:** `grep -rn 'symbol_name' rpyutils/src/ rpyutils/include/` +- **Unit & mock-server tests:** `grep -rn 'symbol_name' test/test.py test/mock_server.py test/test_tools.py` +- **System / integration tests:** `grep -rn 'symbol_name' test/system_test.py` +- **DataFrame tests, fuzz tests, leak tests:** `grep -rn 'symbol_name' test/test_dataframe.py test/test_client_dataframe_fuzz.py test/test_dataframe_fuzz.py test/test_dataframe_leaks.py test/test_client_capsule_path.py` +- **Examples:** `grep -rn 'symbol_name' examples/` +- **Docs:** `grep -rn 'symbol_name' docs/` + +A changed public / `cdef` / `.pxd` symbol with zero recorded Grep calls in the trace is a skill violation. The model is not allowed to assert "this is only used here" without showing the search. + +### 2.5c Implicit contract list + +For each changed symbol, walk this checklist and write one line per item, stating before vs after: + +- **Cython exception convention:** does the function return a C type with the right `except` clause? 
A `cdef` function returning `int`/`void`/a pointer that is declared `noexcept`, or that has **no** `except` clause under Cython 0.29.x or the `legacy_implicit_noexcept` directive, **swallows any Python exception raised inside it** (Cython 3 otherwise propagates exceptions by default). Did the convention change, and do all callers still propagate errors correctly? +- **Raises which exceptions on which inputs** (`IngressError`, `ValueError`, `TypeError`, `IngressServerRejectionError`, `UnsupportedDataFrameShapeError`) and which callers catch vs propagate them +- **Native memory:** does the symbol allocate (`malloc`/`calloc`/`realloc`) and who frees it? Does it free on every path including the exception path? +- **Reference counting:** does it `Py_INCREF`/`Py_DECREF`, store a borrowed `PyObject*`, hold a weakref/capsule, or return a borrowed vs owned reference? +- **Buffer protocol:** does it call `PyObject_GetBuffer` (and the matching `PyBuffer_Release`)? Does it keep the exporter alive while the raw pointer is in use? +- **GIL:** does it run under `nogil`? Does it release the GIL around a blocking C call (flush/connect)? Does it reacquire to raise? +- **C-ABI ownership:** does it pass a `line_sender_buffer`/`line_sender_utf8`/`qdb_pystr_buf` pointer into Rust, and who owns it afterward? Is a returned `line_sender_error*` freed exactly once (`line_sender_error_free`)? +- **`qdb_pystr_buf` arena lifetime:** are UTF-8 pointers obtained from the arena still valid after a subsequent `clear`/append (which may reallocate and invalidate earlier pointers)? +- **Buffer/sender state on error:** does a failed call leave the `Buffer` half-written, or the `Sender` in an unusable state requiring reconstruction? +- **`.pxd` ↔ C header agreement:** parameter types, `const`-ness, struct layout, enum discriminant order, return type — does the Cython declaration still match `c-questdb-client/include/questdb/ingress/*.h`? +- **`.pyi` ↔ implementation agreement:** does the stub still match the real signature, defaults, and return type? 
+- **Wire format:** any change to the ILP bytes produced (protocol v1 / v2), timestamp units, or column encoding. + +### 2.5d Cross-context exposure list + +End this step with an explicit list of "places this change is visible from but the diff does not touch". This is the highest-priority input for the bug-hunting agents in Step 3. + +Group the callsites from 2.5b by execution context. Typical contexts in this codebase: + +- **C-ABI binding surface:** every C-ABI function declared in `src/questdb/line_sender.pxd` / `conf_str.pxd` / `arrow_c_data_interface.pxd` / `mpdecimal_compat.pxd` / `rpyutils.pxd` that the changed code calls (transitively) +- **Buffer build hot path:** `Buffer.column`, `Buffer.symbol`, `Buffer.row`, `Buffer.at*`, and their `cdef` helpers +- **DataFrame / Arrow ingestion path:** everything in `dataframe.pxi`, the pandas/numpy/pyarrow/polars code paths, Arrow C Data Interface (`ArrowArray`/`ArrowSchema`/`ArrowArrayStream`) consumption and release callbacks, PyCapsule handling +- **Egress / query path:** `egress.pxi`, `QueryResult` +- **Flush path:** `Sender.flush`, `Buffer` → transport, the `with nogil` blocking sections +- **Auto-flush logic:** any callsite that triggers flush implicitly (row count / byte threshold / interval) +- **Configuration parsing:** `Sender.from_conf` / `from_env`, the `conf_str` parser, keyword-argument handling +- **Authentication / TLS:** auth token / basic-auth / TLS-CA configuration paths +- **`nogil` / threading surface:** the `active_senders` registry (`rpyutils/src/active_senders.rs`), any code reachable from multiple threads +- **`qdb_pystr_buf` arena users:** every function that obtains UTF-8 pointers from the per-`Buffer` string arena +- **Python type stubs:** `ingress.pyi` +- **Tests:** `test/test.py`, `test/system_test.py`, `test/test_dataframe.py`, fuzz and leak tests +- **Examples & docs:** `examples/*.py`, `docs/` + +Every entry on this list must be reviewed in Step 3. 
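The callsite inventory from 2.5b, grouped by context as 2.5d asks, could be assembled with a small script along these lines. This is a sketch under stated assumptions: `inventory` and `CONTEXT_GLOBS` are hypothetical names, the context labels cover only a subset of the search paths listed in 2.5b, and plain substring matching stands in for `grep -rn`. It does not replace the recorded Grep calls the skill requires.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Tuple

# Assumption: an illustrative subset of the 2.5b search paths, with made-up
# context labels. Extend it to the full list before relying on the output.
CONTEXT_GLOBS: Dict[str, List[str]] = {
    "cython": ["src/questdb/*.pyx", "src/questdb/*.pxi", "src/questdb/*.pxd"],
    "stubs": ["src/questdb/ingress.pyi"],
    "tests": ["test/*.py"],
    "examples": ["examples/*.py"],
}

Hit = Tuple[str, int, str]  # (relative path, 1-based line number, stripped line)


def inventory(root, symbol: str) -> Dict[str, List[Hit]]:
    """Group every line mentioning `symbol` by context label."""
    root = Path(root)
    hits: Dict[str, List[Hit]] = defaultdict(list)
    for context, patterns in CONTEXT_GLOBS.items():
        for pattern in patterns:
            for path in sorted(root.glob(pattern)):
                text = path.read_text(encoding="utf-8", errors="replace")
                for lineno, line in enumerate(text.splitlines(), start=1):
                    # Plain substring match, a simplification of grep.
                    if symbol in line:
                        rel = path.relative_to(root).as_posix()
                        hits[context].append((rel, lineno, line.strip()))
    # Contexts with no hits are omitted from the result.
    return dict(hits)
```

Each context key that comes back non-empty corresponds to one entry on the cross-context exposure list; a context that is absent still needs its search recorded, since "no hits" is itself a finding to show.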
+ +### 2.5e Build & binding profile facts + +**This sub-step runs at every level, including levels 0 and 1, where the rest of Step 2.5 is skipped in whole or in part.** A single Cython directive or a submodule bump can flip the safety story for the entire extension; agents must reason from the actual profile, not from defaults. + +Record, with file:line citations: + +- **Cython compiler directives** at the top of `ingress.pyx` and in `setup.py` (`language_level`, `binding`, and — if set — `boundscheck`, `wraparound`, `cdivision`, `initializedcheck`, `nonecheck`). If `boundscheck=False` / `wraparound=False`, **out-of-range or negative C-array/typed-memoryview indexing is undefined behavior, not an `IndexError`** — agents must treat indexing as a crash surface, not a guarded operation. +- **Cython exception-default fact:** record the Cython major version in use and whether `legacy_implicit_noexcept` is set. In Cython 3, a non-extern `cdef`/`cpdef` function returning a non-object type without an explicit `except` clause propagates Python exceptions by default, `nogil` or not; it **swallows them** (printing only a warning) when declared `noexcept` or when `legacy_implicit_noexcept` is enabled. Under Cython 0.29.x, a missing `except` clause swallows by default. Agents 1, 2, and 3 must check the actual `except` clause on every changed `cdef` and not assume exceptions propagate. +- **`c-questdb-client` submodule commit** (`git submodule status`) — if the diff moves it, the pinned commit's headers under `c-questdb-client/include/questdb/ingress/` are the *new* source of truth that every `.pxd` must match. Re-verify the `.pxd` ↔ `.h` agreement against the new commit. +- **`rpyutils` Rust crate:** if `rpyutils/src/**` or `rpyutils/Cargo.toml` changed, note its panic/profile behavior — a panic in `rpyutils` reached across the C ABI aborts the Python process. Its headers (`rpyutils/include/`, generated via `cbindgen.toml`) must match `rpyutils.pxd`. +- **Minimum numpy / Python versions** (`pyproject.toml`: `requires-python`, `numpy>=1.21.0`). Code that uses a newer numpy C-API or Python C-API symbol than the floor breaks the oldest supported build. State the floor. +- **`abort()` is imported** (`from libc.stdlib cimport ... 
abort`). Any reachable `abort()` call, or any Rust panic that crosses the C ABI, terminates the host interpreter with no traceback. Flag the path. + +A review without this section is incomplete. State the relevant facts (directives, exception default, submodule commit) in one line at the top of every Step 3 agent prompt (except Agent 10's, which works from the diff alone) so the agent reasons from the right premise. + +## Step 3: Parallel review + +Every agent except Agent 10 receives: +1. The PR diff +2. The full change surface map from Step 2.5 (semantic deltas, callsite inventory, implicit contracts, cross-context exposure list, build & binding profile facts) + +### Anti-anchoring directive (applies to all agents) + +- **Bugs at callsites outside the diff outrank bugs inside the diff.** A confirmed bug in a file the PR did not touch but that calls a changed symbol is a P0 finding. +- **"Looks correct in isolation" is not a valid conclusion.** Before clearing a changed symbol, the agent must walk the callsite inventory from 2.5b and explicitly state, per callsite, whether the new behavior is still correct there. +- **The diff is the entry point, not the scope.** If the change surface map shows the symbol is reachable from N other files, the review covers N+1 files. +- **Project-wide settings affect untouched code.** A change to a Cython directive in `ingress.pyx` or `setup.py` (e.g. flipping `boundscheck` off), a `c-questdb-client` submodule bump, or a `.pxd` declaration change retroactively changes the safety/ABI story for **every** function that compiles under that directive or calls that binding — not just the diff. When directives, `setup.py`, `pyproject.toml`, or `.pxd`/submodule pointers appear in the diff, the review covers the affected surface of the whole extension, not just the touched lines. 
+- A single finding of the form "in `dataframe.pxi` the new behavior of `Buffer.column` leaks `b.validity` on the exception path" is worth more than five findings inside the diff. + +### Agents + +Launch the following agents in parallel. + +**Agent 1 — Correctness & bugs:** `None`/NULL handling, edge cases, logic errors, off-by-one, operator precedence, error paths. Integer correctness across the Python↔C boundary: Python `int` → `int64_t`/`size_t` conversion and overflow, explicit Cython casts that truncate or wrap, signed/unsigned mismatches, negative-length math. NaN/inf float handling. Timestamp unit conversions (micros vs nanos). Correct ILP wire format (v1 / v2). Cross-reference every changed symbol against its callsite inventory and verify the new behavior is correct at each callsite. + +**Agent 2 — Cython memory, refcount & crash surface:** In a Cython extension, anything that corrupts memory or aborts the native side takes down the host Python interpreter with no traceback. Flag every reachable instance of: + +- **Native memory leaks / double-free / use-after-free:** every `malloc`/`calloc`/`realloc` must be `free`d on **all** paths — success, early `return`, and the exception/`except` path (prefer `try/finally`). A `realloc` whose return value is assigned back to the same pointer leaks the original on failure (it returns `NULL` without freeing). Freeing a pointer twice, or using it after `free`, corrupts the heap. +- **Reference-count errors:** every `Py_INCREF` needs a matching `Py_DECREF` on all paths; a missing `DECREF` leaks, an extra `DECREF` causes a later use-after-free crash. Borrowed references (`PyWeakref_GetObject`, dict/list borrows, `PyObject*` stored without incref) must not outlive their owner. Verify `PyCapsule` and weakref handling. +- **Buffer-protocol imbalance:** every `PyObject_GetBuffer` must have a matching `PyBuffer_Release` on all paths, and the raw pointer must not be used after the exporting object can be collected. 
+- **Indexing under `boundscheck=False`:** per 2.5e, C-array and typed-memoryview indexing is unchecked — an out-of-range or negative index is UB, not an exception. Verify bounds are established before every index on the hot path. +- **Silent exception swallowing:** a `cdef` function returning a C type without the correct `except` clause (or `noexcept`) drops Python exceptions on the floor, turning an error into wrong data. Verify the `except` convention against what the body raises. +- **Direct aborts:** any reachable `abort()` (it is imported), and any **Rust panic crossing the C ABI** (from `c-questdb-client` or `rpyutils`) — both terminate the interpreter. The only defense is that the native side returns an error code/`line_sender_error*`, never panics. +- **Uninitialized memory:** a struct field or `malloc`'d region read before it is written (use `calloc` or explicit init), especially partially-built `pyobj_built_t`-style structs on an error path that then get freed. + +State the relevant build facts (directives, exception default, submodule commit) from 2.5e in the agent's first sentence, and evaluate every finding under the actual settings, not the textbook defaults. + +**Agent 3 — C-ABI boundary safety:** Check every call into the `c-questdb-client` / `rpyutils` C ABI. Verify: +- **`.pxd` matches the C header.** For every changed or called C-ABI symbol, read the actual declaration in `c-questdb-client/include/questdb/ingress/*.h` (or `rpyutils/include/`) and confirm the `.pxd` declaration matches it exactly: parameter types, pointer/`const`-ness, return type, struct field order and types, enum discriminant order. A mismatch is silent memory corruption / ABI breakage. If the submodule pointer moved, verify against the **new** pinned commit. +- **NULL handling:** every pointer returned from a C function checked before dereference; every pointer argument that could be `NULL` handled. 
+- **Error object lifecycle:** every `line_sender_error*` obtained via an out-param is converted (`c_err_to_py`) and freed exactly once (`line_sender_error_free`) — never leaked, never double-freed, never freed then read. +- **Ownership transfer:** `line_sender_buffer`, `line_sender_utf8`, `qdb_pystr_buf`, `line_sender` handles — who allocates, who frees, and is the lifetime correct relative to the owning `cdef class` (`__cinit__`/`__dealloc__`)? +- **`qdb_pystr_buf` arena invalidation:** UTF-8 pointers handed to Rust must remain valid until the buffer write completes and must not be invalidated by an intervening arena `clear`/append. +- **String encoding:** Python `str` → UTF-8 (`line_sender_utf8`), correct length passed, no lone surrogates, embedded-NUL handling, `bytes` vs `str` distinction. + +**Agent 4 — GIL & concurrency:** Verify: +- **`nogil` correctness:** no `with nogil` block (or `cdef ... nogil` function) touches a Python object, calls the CPython C-API, raises a Python exception, or `INCREF`/`DECREF`s — doing so without the GIL is a crash/corruption. Errors discovered under `nogil` must be deferred and raised after reacquiring the GIL. +- **GIL release around blocking calls:** the flush/connect/network C calls should release the GIL (`with nogil`) so other threads run; verify the released region doesn't reference Python state. +- **Thread-safety:** `Sender`, `Buffer`, and the `active_senders` registry (`rpyutils/src/active_senders.rs`) — verify documented thread-safety matches the implementation, and that shared mutable state reachable from multiple threads is synchronized. Cross-reference every callsite from 2.5b for violations of the concurrency contract. +- **Free-threaded build:** if the change assumes the GIL serializes access, note whether it holds under a free-threaded (no-GIL) CPython build (the CI matrix includes `*t` free-threaded targets). + +**Agent 5 — Resource management & lifecycle:** Leaks on all code paths (especially errors). 
Check `__cinit__`/`__dealloc__` pairing on every `cdef class` (does `__dealloc__` free everything `__cinit__` and methods allocated, and is it safe when `__cinit__` failed partway?). Native handle lifecycle (`line_sender`, `line_sender_buffer`, `qdb_pystr_buf`). Socket/connection/TLS teardown on error (handled by Rust, but verify the Cython side calls close/free). **Arrow C Data Interface:** `ArrowArray`/`ArrowSchema`/`ArrowArrayStream` `release` callbacks invoked exactly once; PyCapsule consumption semantics correct; no double-release. Walk every callsite from 2.5b that constructs, owns, or transfers ownership of a native handle and verify cleanup on all paths (success, exception, early return). + +**Agent 6 — Performance & allocations:** Unnecessary work on hot paths — the per-row buffer build (`Buffer.column`/`symbol`/`row`) and the per-column DataFrame loop (`dataframe.pxi`). Flag: Python-level operations (attribute lookups, `dict` access, object boxing, `str` re-encoding) inside the inner per-row/per-cell loop that should be hoisted or done at C level; allocations per row/cell that should be amortized; excessive copying of data that could be zero-copy via the buffer protocol / Arrow; O(n²) patterns over rows or columns. Analyze scaling at realistic volume: millions of rows per flush, hundreds of columns. Setup-path costs (sender construction, config parsing, schema inspection done once per DataFrame) are acceptable; per-row/per-cell costs are not. + +**Agent 7 — Test review & coverage:** Coverage gaps, error-path tests, `None`/edge-case tests, boundary conditions, regression tests, test quality. 
Check: +- Unit / mock-server tests in `test/test.py` (uses `test/mock_server.py`) +- System / integration tests against a real QuestDB in `test/system_test.py` +- DataFrame tests in `test/test_dataframe.py`, fuzz tests in `test/test_client_dataframe_fuzz.py` / `test/test_dataframe_fuzz.py`, and **leak tests** in `test/test_dataframe_leaks.py` (new native-memory or refcount handling should have a leak test) +- Capsule / Arrow path tests in `test/test_client_capsule_path.py` +- Examples in `examples/` still run (and `examples.manifest.yaml` is consistent) + +Cross-reference 2.5d: every cross-context exposure should have a test that exercises the changed symbol from that context. Missing tests for cross-context callsites — especially a new native-memory path without a leak test, or a new C-ABI binding without a system test — is a high-priority finding. + +**Agent 8 — Code quality & API design:** Public API ergonomics and consistency. **`ingress.pyi` stub must match the implementation** (signatures, defaults, return types, new symbols added to `__all__`). Docstrings on public classes/methods. `CHANGELOG.rst` updated for user-visible changes. Backward compatibility of the Python API (renamed/removed kwargs, changed defaults, changed exception types) — breaking changes must be intentional and called out in the PR body. Naming consistent with the codebase. No dead code, no unused `cimport`/`import`. Docs under `docs/` updated for API changes. + +**Agent 9 — Cross-context caller impact:** Walk the callsite inventory from 2.5b. For every callsite, fetch the surrounding code (the calling function plus its callers up two levels) and answer: + +- Does this caller pass inputs the new behavior handles incorrectly? +- Does this caller depend on a contract from the implicit contract list (2.5c) that the change broke — e.g. relying on the old `except` convention, the old ownership of a buffer, the old `qdb_pystr_buf` lifetime, the old refcount behavior? 
+- Is this caller in a context (a `with nogil` block, the per-row hot loop, an auto-flush trigger, an Arrow release callback, a `__dealloc__`, an exception/error path) where the new behavior misbehaves even if the inputs are valid? +- For a changed `cdef`/`cpdef` exception convention: do all callers still detect and propagate the error? +- For a changed C-ABI declaration: does the `.pxd` still match the C header, and do all Cython callers pass the right types/ownership? +- For a changed buffer/sender state machine: do all callers respect the new state transitions (buffer cleared after error before reuse; flush only when flushable)? + +This agent's output is structured per callsite, not per failure mode. Each callsite gets a verdict: SAFE / BROKEN / NEEDS VERIFICATION. Every BROKEN entry is a P0 finding regardless of whether the file is in the diff. + +This agent is not optional even when the diff is small. Small diffs to widely-used symbols (`Buffer.column`, `Sender.flush`, the dataframe entry point, a C-ABI binding) have the largest blast radius. + +**Agent 10 — Fresh-context adversarial:** Dispatched separately from agents 1-9 to escape checklist anchoring. This agent operates under different rules from the rest: + +- It receives ONLY the PR diff and the names of the changed files. It does NOT receive the change surface map from Step 2.5, the implicit contract list, the cross-context exposure list, or any of the review checklists below. +- Its sole instruction: "find ways this code is wrong". No category list, no failure-mode taxonomy, no project-specific style guide. +- It is free to use Read, Grep, and Glob to explore the repository however it wants. +- Findings are not pre-classified by category. Each finding states: what's wrong, why it's wrong, and the code path that demonstrates it. + +The point of this agent is to surface bugs the structured agents cannot see because they are reasoning inside the same frame. 
A finding here that none of agents 1-9 produced is high signal — it means the structured review missed it. A finding here that overlaps with agents 1-9 is corroboration. + +Run this agent in parallel with agents 1-9. It is mandatory regardless of diff size. + +Combine all agent findings into a single deduplicated **draft** report. Do NOT present this draft to the user yet — it goes straight into verification. + +## Step 3b: Verify every finding against source code + +The parallel review agents work from the diff plus the change surface map and frequently produce false positives — especially around native memory ownership, refcounting, GIL boundaries, Cython exception conventions, and C-ABI lifecycle. Every finding MUST be verified before it is reported. + +For each finding in the draft report: + +1. **Read the actual source code** at the exact lines cited (in the `.pyx`/`.pxi`/`.pxd`/`.pyi`, never the generated `ingress.c`). Do not rely on the agent's description alone. +2. **Trace the full code path:** follow callers and `cdef` helpers. Remember Cython's `include` model — `dataframe.pxi` and `egress.pxi` are textually included into `ingress.pyx`, so symbols are shared across them. +3. **Check both sides of the C ABI:** if a finding involves Cython↔Rust interaction, read both the Cython call and the C header in `c-questdb-client/include/questdb/ingress/` (or `rpyutils/include/`). Verify ownership transfer, error propagation, and freeing on both sides. +4. **For native-memory-leak claims:** trace every `malloc`/`calloc`/`realloc` to its `free` on ALL paths (success, early return, `except`/exception unwind). Confirm the intervening code can actually raise before claiming the exception path leaks. +5. **For refcount claims:** count `Py_INCREF`/`Py_DECREF` on every path; confirm borrowed-vs-owned reasoning against the CPython C-API contract of each function used. +6. 
**For exception-swallowing claims:** check the actual `except` clause on the `cdef` and whether the body can raise. Under Cython 3 a `nogil` `cdef` defaults to `noexcept` — confirm whether that's the real declaration. +7. **For GIL claims:** verify the cited code is actually inside a `nogil` region and actually touches a Python object / C-API; a `cdef` function called from `nogil` may itself acquire the GIL. +8. **For C-ABI / `.pxd` mismatch claims:** read the exact declaration in the pinned header and compare field-by-field. A claimed mismatch that actually matches is a false positive. +9. **For numeric overflow/truncation claims:** check reachability at realistic scale — ILP buffers up to a few hundred MB, millions of rows per flush, columns in the tens to low hundreds. Drop overflows that require values beyond that scale. +10. **For performance claims:** confirm the cost is on the per-row/per-cell hot path and measurable relative to surrounding I/O. Downgrade negligible savings to a nit. Exception: a per-row or per-cell allocation / Python-object operation on the buffer-build path is always worth flagging. +11. **For cross-context findings (Agent 9):** re-read the callsite in full, including callers up two levels, and confirm the broken behavior is reachable from production or test paths users will exercise. + +**Classify each finding** as: +- **CONFIRMED in-diff** — the bug is real and inside the diff +- **CONFIRMED at out-of-diff callsite** — the bug is in an unchanged file because the changed symbol is used there in a way that's now broken (cite the file and the contract from 2.5c that was violated) +- **FALSE POSITIVE** — the code is actually correct (explain why) +- **CONFIRMED with nuance** — the issue exists but is less severe than stated (explain) + +**Move false positives to a separate "Downgraded" section** at the end of the report. For each, give a one-line explanation of why it was dismissed. 
This lets the PR author verify the reasoning and catch verification mistakes. + +Launch verification agents in parallel where findings are independent. Each verification agent should read surrounding source files, not just the diff. + +## Review checklists + +Review the diff for: + +### Correctness & bugs +- `None`/NULL handling at API boundaries +- Edge cases and error paths +- Logic errors, off-by-one, incorrect bounds, wrong operator precedence +- Integer overflow/truncation across the Python↔C boundary (`int` → `int64_t`/`size_t`, ``/`` casts, signed/unsigned) +- Float edge cases (NaN, inf), timestamp unit conversions (micros vs nanos) +- Correct ILP wire format (v1 / v2) +- **Reachability expansion:** for each changed symbol, list the new contexts it can appear in (DataFrame path, `nogil` section, auto-flush, Arrow callback, error path) and verify it works in each. + +### Cython memory & refcount safety +- Every `malloc`/`calloc`/`realloc` freed on success, early-return, and exception paths (prefer `try/finally`); no double-free, no use-after-free; `realloc`-failure path doesn't leak the original +- Every `Py_INCREF` matched by `Py_DECREF`; borrowed references not outliving their owner; weakref/capsule handling correct +- Every `PyObject_GetBuffer` matched by `PyBuffer_Release`; exporter kept alive while the pointer is used +- Correct Cython `except` convention on every `cdef`/`cpdef` returning a C type (no silent exception swallowing; `noexcept` is the Cython-3 default for `nogil` `cdef`) +- No reachable `abort()`, and no Rust panic crossing the C ABI (both kill the interpreter) +- Indexing safe under the active `boundscheck`/`wraparound` directives +- No uninitialized struct/heap memory read (use `calloc` or init before use, especially on partially-built error paths) + +### C-ABI boundary +- `.pxd` declarations match `c-questdb-client/include/questdb/ingress/*.h` (and `rpyutils/include/`) exactly — types, `const`, struct layout, enum order, return type — 
against the **pinned** submodule commit +- All pointers returned from C checked for NULL before dereference +- Every `line_sender_error*` freed exactly once (`line_sender_error_free`), never double-freed or leaked +- Ownership semantics clear and correct (who allocates the handle, who frees it, lifetime vs the owning `cdef class`) +- `qdb_pystr_buf` arena pointers stay valid until consumed; not invalidated by an intervening `clear`/append +- String handling: `str` → UTF-8 with correct length, lone-surrogate rejection, embedded-NUL handling, `bytes`/`str` distinction +- ABI stability: a submodule bump that reorders a struct or renumbers an enum requires matching `.pxd` updates + +### GIL & concurrency +- No Python object access / C-API call / refcount op / raise inside a `with nogil` block or `cdef ... nogil` function +- GIL released around blocking network/flush C calls; released region references no Python state; errors deferred and raised after reacquiring +- `Sender`/`Buffer`/`active_senders` thread-safety matches documentation; shared mutable state synchronized +- Assumptions that the GIL serializes access re-checked for the free-threaded CPython build + +### Performance +- No per-row/per-cell Python-level operations (attribute/dict lookups, boxing, `str` re-encoding) in the buffer-build or DataFrame inner loops that belong at C level or hoisted to setup +- No per-row/per-cell allocations that should be amortized +- Zero-copy where possible (buffer protocol, Arrow) instead of copying +- No O(n²) over rows or columns at realistic scale (millions of rows, hundreds of columns) + +### Resource management +- `__cinit__`/`__dealloc__` pair frees everything allocated, and `__dealloc__` is safe after a partially-failed `__cinit__` +- Native handles (`line_sender`, `line_sender_buffer`, `qdb_pystr_buf`) released on all paths +- Socket/connection/TLS cleanup on error (Cython side invokes the Rust close/free) +- Arrow `release` callbacks invoked exactly once; PyCapsule 
consumed correctly; no double-release +- No leak through the C-ABI boundary (ownership documented and consistent) + +### Code quality +- `ingress.pyi` stub matches the implementation (signatures, defaults, return types, `__all__`) +- Public API consistent and ergonomic; backward-compatible (or breaking changes called out in the PR body) +- `CHANGELOG.rst` updated for user-visible changes; `docs/` updated for API changes +- Docstrings on public classes/methods +- Naming consistent with the codebase; no dead code or unused `import`/`cimport` + +### Test review +- **Coverage gaps:** every new/changed code path has a corresponding test; flag missing ones explicitly as "missing test for X" +- **Cross-context coverage:** every entry in the cross-context exposure list (2.5d) has a test exercising the changed symbol from that context +- **Leak coverage:** new native-memory or refcount-handling code has a test in `test/test_dataframe_leaks.py` (or equivalent) +- **Error-path coverage:** failure cases, partial writes, connection drops, TLS/auth failures, server rejections, and edge conditions tested — not just the happy path +- **Edge-case tests:** `None`, empty buffers, zero-length strings, max-length symbols, boundary integers, NaN/inf, non-UTF-8 strings +- **C-ABI / binding changes** covered by a system test in `test/system_test.py` +- **DataFrame / Arrow changes** covered in `test/test_dataframe.py` and the fuzz/capsule tests +- **Test quality:** tests assert the right thing; watch for trivially-passing tests +- **Regression tests:** a bug fix has a test that reproduces the original bug and fails without the fix + +### Unresolved TODOs and FIXMEs +- Scan the diff for `TODO`, `FIXME`, `HACK`, `XXX`, `WORKAROUND`. For each: + - Pre-existing (just moved/reformatted) or newly introduced in this PR? + - If new: unfinished work that should block merge, or an acceptable known limitation? Flag deferred bugs or incomplete implementations. 
+ - If it references a ticket/issue, verify the reference exists. + +### Commit messages +- Plain English titles, under 50 chars +- Active voice, naming the acting subject + +## Step 4: Output + +Present ONLY verified findings (false positives are excluded from Critical/Moderate/Minor). Structure as: + +### Critical +Issues that must be fixed before merge. Each must include: +- Exact file path and line numbers (including out-of-diff files) +- Whether the finding is **in-diff** or **out-of-diff** +- Code path trace showing why the bug is real +- For out-of-diff findings: the contract from 2.5c that was violated and the callsite that triggers it +- Suggested fix + +### Moderate +Issues worth addressing but not blocking. + +### Minor +Style nits and suggestions. + +### Downgraded (false positives) +Findings from the initial review that were dismissed after source code verification. For each, state: +- The original claim (one line) +- Why it was dismissed (one line, citing the specific code that disproves it) + +### Summary +- One-line verdict: approve, request changes, or needs discussion +- Highlight any regressions or tradeoffs +- State how many draft findings were verified vs dropped as false positives (e.g., "8 findings verified, 4 false positives removed") +- State the in-diff vs out-of-diff split (e.g., "5 findings in-diff, 3 findings out-of-diff"). If the diff is non-trivial and out-of-diff is zero, the cross-context pass likely underran — re-invoke Agent 9 with a wider grep before finalizing. 
\ No newline at end of file diff --git a/CHANGELOG.rst b/CHANGELOG.rst index c3deae50..672455bb 100644 --- a/CHANGELOG.rst +++ b/CHANGELOG.rst @@ -5,6 +5,55 @@ Changelog ========= +Unreleased +---------- + +Features +~~~~~~~~ + +OIDC Authentication (:mod:`questdb.auth`) +************************************************ + +New :mod:`questdb.auth` module to sign in interactively to OIDC-secured +QuestDB Enterprise from Python, including from **remote** kernels +(JupyterHub, SageMaker, Colab, VS Code-remote) that have no local browser. + +It runs the OAuth 2.0 Device Authorization Grant (RFC 8628) client-side: you +authorize in any browser (laptop or phone), and the token is presented to +QuestDB over the auth paths it already supports (HTTP ``Bearer`` / PG-wire +``_sso``). No server change is required. + +.. code-block:: python + + from questdb.auth import OidcDeviceAuth, connect + + # Just the token (use it with PG-wire, HTTP, or any client): + auth = OidcDeviceAuth.from_questdb("https://questdb.example.com:9000") + token = auth.token() + + # Or the integrated session (query to a DataFrame, feed adapters): + qdb = connect("https://questdb.example.com:9000") + df = qdb.sql("SELECT * FROM trades LIMIT 10") + +Highlights: + +* Auto-discovery of OIDC config from the QuestDB ``/settings`` endpoint, with a + fallback to the IdP ``.well-known`` document. +* In-process token cache with silent refresh (tokens are never written to + disk). +* Adapters for pandas (REST ``/exec``), SQLAlchemy, psycopg and the ingestion + ``Sender``. +* ``token()`` / ``headers()`` require no dependencies beyond the standard + library; ``pandas`` / ``sqlalchemy`` / ``psycopg`` / ``qrcode`` / ``IPython`` + are imported lazily. + +See the :ref:`OIDC authentication guide <oidc_auth>` for details. + +Python Version Support +~~~~~~~~~~~~~~~~~~~~~~~~ + +* Raised the minimum supported Python version to 3.10.
+ 4.1.0 (2025-11-28) ------------------ diff --git a/ci/cibuildwheel.yaml b/ci/cibuildwheel.yaml index c0d31767..f6ca473f 100644 --- a/ci/cibuildwheel.yaml +++ b/ci/cibuildwheel.yaml @@ -107,7 +107,7 @@ stages: cmd /c "call `"$vsPath`" && set > env_vars.txt" Get-Content env_vars.txt | ForEach-Object { - if ($_ -match "^([^=]+?)=(.*)$" -and $matches[1] -notmatch '^(SYSTEM|AGENT|BUILD|RELEASE|VSTS|TASK|USE_|FAIL_|MSDEPLOY|AZP_75787|AZP_AGENT|AZP_ENABLE|AZURE_HTTP|COPYFILESOVERSSHV0|ENABLE_ISSUE_SOURCE_VALIDATION|MODIFY_NUMBER_OF_RETRIES_IN_ROBOCOPY|MSBUILDHELPERS_ENABLE_TELEMETRY|RETIRE_AZURERM_POWERSHELL_MODULE|ROSETTA2_WARNING|AZP_PS_ENABLE)') { + if ($_ -match "^([^=]+?)=(.*)$" -and $matches[1] -notmatch '^(SYSTEM|AGENT|BUILD|RELEASE|VSTS|TASK|USE_|FAIL_|MSDEPLOY|AZP_75787|AZP_AGENT|AZP_ENABLE|AZP_ENHANCED|AZURE_HTTP|COPYFILESOVERSSHV0|ENABLE_ISSUE_SOURCE_VALIDATION|MODIFY_NUMBER_OF_RETRIES_IN_ROBOCOPY|MSBUILDHELPERS_ENABLE_TELEMETRY|RETIRE_AZURERM_POWERSHELL_MODULE|ROSETTA2_WARNING|AZP_PS_ENABLE)') { [System.Environment]::SetEnvironmentVariable($matches[1], $matches[2], "Process") Write-Host "##vso[task.setvariable variable=$($matches[1])]$($matches[2])" } @@ -137,7 +137,7 @@ stages: cmd /c "call `"$vsPath`" && set > env_vars.txt" Get-Content env_vars.txt | ForEach-Object { - if ($_ -match "^([^=]+?)=(.*)$" -and $matches[1] -notmatch '^(SYSTEM|AGENT|BUILD|RELEASE|VSTS|TASK|USE_|FAIL_|MSDEPLOY|AZP_75787|AZP_AGENT|AZP_ENABLE|AZURE_HTTP|COPYFILESOVERSSHV0|ENABLE_ISSUE_SOURCE_VALIDATION|MODIFY_NUMBER_OF_RETRIES_IN_ROBOCOPY|MSBUILDHELPERS_ENABLE_TELEMETRY|RETIRE_AZURERM_POWERSHELL_MODULE|ROSETTA2_WARNING|AZP_PS_ENABLE)') { + if ($_ -match "^([^=]+?)=(.*)$" -and $matches[1] -notmatch 
'^(SYSTEM|AGENT|BUILD|RELEASE|VSTS|TASK|USE_|FAIL_|MSDEPLOY|AZP_75787|AZP_AGENT|AZP_ENABLE|AZP_ENHANCED|AZURE_HTTP|COPYFILESOVERSSHV0|ENABLE_ISSUE_SOURCE_VALIDATION|MODIFY_NUMBER_OF_RETRIES_IN_ROBOCOPY|MSBUILDHELPERS_ENABLE_TELEMETRY|RETIRE_AZURERM_POWERSHELL_MODULE|ROSETTA2_WARNING|AZP_PS_ENABLE)') { [System.Environment]::SetEnvironmentVariable($matches[1], $matches[2], "Process") Write-Host "##vso[task.setvariable variable=$($matches[1])]$($matches[2])" } diff --git a/ci/pip_install_deps.py b/ci/pip_install_deps.py index d70b9761..e3ee25b4 100644 --- a/ci/pip_install_deps.py +++ b/ci/pip_install_deps.py @@ -77,12 +77,18 @@ def install_pandas3_and_numpy(): def should_use_pandas3(py_version=None): if py_version is None: py_version = sys.version_info[:2] - return py_version >= (3, 11) + # Pandas 3 ships no 32-bit wheels, so only take the pandas 3 / numpy 2 + # path on 64-bit interpreters. On 32-bit (e.g. win32) the pandas 3 install + # would be silently skipped, fastparquet would then drag in a numpy-1-built + # pandas 2.0.3 alongside numpy 2, and importing pandas would crash. + is_64bits = sys.maxsize > 2 ** 32 + return is_64bits and py_version >= (3, 11) def install_default_pandas_and_numpy(): - # Pandas 3 currently requires Python 3.11+, so keep 3.10 wheel tests on - # the pandas 2 / numpy 1.x-compatible path unless explicitly overridden. + # Pandas 3 requires Python 3.11+ and ships only 64-bit wheels, so keep + # 3.10 and all 32-bit wheel tests on the pandas 2 / numpy 1.x-compatible + # path unless explicitly overridden. 
if should_use_pandas3(): install_pandas3_and_numpy() else: diff --git a/ci/run_tests_pipeline.yaml b/ci/run_tests_pipeline.yaml index 80099fb9..3025ab1a 100644 --- a/ci/run_tests_pipeline.yaml +++ b/ci/run_tests_pipeline.yaml @@ -63,12 +63,24 @@ stages: git clone --depth 1 https://github.com/questdb/questdb.git displayName: git clone questdb master condition: eq(variables.vsQuestDbMaster, true) - - task: Maven@3 + # Decide whether to build java-questdb-client from the bundled + # submodule (-P local-client, for a -SNAPSHOT client not on Maven + # Central) or resolve it from Maven Central. Sets $(CLIENT_PROFILE). + - template: templates/detect-local-client.yml + parameters: + qdbRepoPath: questdb + condition: eq(variables.vsQuestDbMaster, true) + # The Maven@3 task crashes parsing JDK 25 ("Cannot read properties of + # null (reading 'major')") since its JDK support tops out at 21, so + # invoke Maven directly on the preinstalled JDK 25 instead. Mirrors the + # task's defaults: POM questdb/pom.xml, goal "package". + - bash: | + set -eu + export JAVA_HOME="$(JAVA_HOME_25_X64)" + export PATH="$JAVA_HOME/bin:$PATH" + java -version + mvn -B -f questdb/pom.xml package -DskipTests -Pbuild-web-console $(CLIENT_PROFILE) displayName: "Compile QuestDB master" - inputs: - mavenPOMFile: "questdb/pom.xml" - jdkVersionOption: "1.17" - options: "-DskipTests -Pbuild-web-console" condition: eq(variables.vsQuestDbMaster, true) - script: python3 proj.py test 1 displayName: "Test vs released" @@ -77,8 +89,20 @@ stages: - script: python3 proj.py test 1 displayName: "Test vs master" env: - JAVA_HOME: $(JAVA_HOME_17_X64) + JAVA_HOME: $(JAVA_HOME_25_X64) QDB_REPO_PATH: "./questdb" + # QuestDB master runs as the io.questdb JPMS module and needs these + # JDK 25 access flags (mirrors questdb.sh). The test fixture launches + # questdb.jar directly rather than via questdb.sh, so feed them to the + # java launcher through JDK_JAVA_OPTIONS. 
+ JDK_JAVA_OPTIONS: >- + --sun-misc-unsafe-memory-access=allow + --enable-native-access=io.questdb + --add-opens=java.base/java.lang=io.questdb + --add-opens=java.base/java.lang.reflect=io.questdb + --add-opens=java.base/java.nio=io.questdb + --add-opens=java.base/java.time.zone=io.questdb + --add-exports=java.base/jdk.internal.vm=io.questdb condition: eq(variables.vsQuestDbMaster, true) - job: TestsAgainstVariousNumpyVersion1x pool: diff --git a/ci/templates/detect-local-client.yml b/ci/templates/detect-local-client.yml new file mode 100644 index 00000000..55c9a5fb --- /dev/null +++ b/ci/templates/detect-local-client.yml @@ -0,0 +1,36 @@ +# Adapted from questdb/questdb's ci/templates/detect-local-client.yml. +# +# Decide how a cloned QuestDB checkout resolves its java-questdb-client +# dependency: a -SNAPSHOT client version is not published to Maven Central, so +# build it from the bundled java-questdb-client submodule via the `local-client` +# profile; a released version is taken from Maven Central. Sets the +# CLIENT_PROFILE pipeline variable (``-P local-client`` or empty) for the +# following Maven build, and inits the submodule only when it is needed. +# +# Unlike the upstream template, QuestDB is cloned into a subdirectory here, so +# the repo path is a parameter; ``condition`` lets the caller gate this to the +# matrix leg that builds QuestDB master. 
+parameters: + - name: qdbRepoPath + type: string + default: questdb + - name: condition + type: string + default: succeeded() + +steps: + - bash: | + set -eu + pom="${{ parameters.qdbRepoPath }}/core/pom.xml" + CLIENT_VERSION=$(sed -n 's/.*<questdb.client.version>\(.*\)<\/questdb.client.version>.*/\1/p' "$pom" | head -1) + echo "questdb.client.version=$CLIENT_VERSION" + if echo "$CLIENT_VERSION" | grep -q '\-SNAPSHOT$'; then + echo "SNAPSHOT client detected -> build it locally (local-client profile)" + git -C "${{ parameters.qdbRepoPath }}" submodule update --init java-questdb-client + echo "##vso[task.setvariable variable=CLIENT_PROFILE]-P local-client" + else + echo "Release client detected -> resolve from Maven Central" + echo "##vso[task.setvariable variable=CLIENT_PROFILE]" + fi + displayName: "Detect QuestDB local client profile" + condition: ${{ parameters.condition }} diff --git a/docs/api.rst b/docs/api.rst index b3e1f11e..9ff4daf3 100644 --- a/docs/api.rst +++ b/docs/api.rst @@ -67,3 +67,66 @@ questdb.ingress :members: :undoc-members: :show-inheritance: + +questdb.auth +============ + +See the :ref:`oidc_auth` guide for an overview. + +.. autofunction:: questdb.auth.connect + +.. autoclass:: questdb.auth.QuestDB + :members: + :undoc-members: + :show-inheritance: + +.. autoclass:: questdb.auth.OidcDeviceAuth + :members: + :undoc-members: + :show-inheritance: + +.. autoclass:: questdb.auth.OidcConfig + :members: + :undoc-members: + :show-inheritance: + +.. autoclass:: questdb.auth.TokenCache + :members: + :undoc-members: + :show-inheritance: + +.. autoclass:: questdb.auth.TokenSet + :members: + :undoc-members: + :show-inheritance: + +.. autoclass:: questdb.auth.MemoryCache + :members: + :undoc-members: + :show-inheritance: + +.. autoclass:: questdb.auth.NullCache + :members: + :undoc-members: + :show-inheritance: + +.. autoexception:: questdb.auth.OidcError + :show-inheritance: + +.. autoexception:: questdb.auth.OidcConfigError + :show-inheritance: + +.. 
autoexception:: questdb.auth.OidcInteractionRequired + :show-inheritance: + +.. autoexception:: questdb.auth.OidcDeviceFlowError + :show-inheritance: + +.. autoexception:: questdb.auth.OidcTimeoutError + :show-inheritance: + +.. autoexception:: questdb.auth.OidcAuthError + :show-inheritance: + +.. autoexception:: questdb.auth.OidcNetworkError + :show-inheritance: diff --git a/docs/auth.rst b/docs/auth.rst new file mode 100644 index 00000000..fbef0b3e --- /dev/null +++ b/docs/auth.rst @@ -0,0 +1,250 @@ +.. _oidc_auth: + +=================== +OIDC Authentication +=================== + +QuestDB Enterprise can be secured with `OpenID Connect (OIDC) +`_. The :mod:`questdb.auth` module +lets you sign in interactively from Python — including from a **remote** kernel +(JupyterHub, SageMaker, Colab, VS Code-remote, containers) where there is no +local browser. + +It runs the `OAuth 2.0 Device Authorization Grant (RFC 8628) +`_ entirely client-side: you +authorize in **any** browser (your laptop or your phone), while the kernel only +makes outbound calls to your identity provider (IdP). The resulting token is +then presented to QuestDB over the auth paths it already supports — HTTP +``Authorization: Bearer`` or PG-wire ``_sso`` — so **no server change is +required**. + +.. note:: + + This feature targets **QuestDB Enterprise with OIDC enabled**. The IdP + client referenced by ``acl.oidc.client.id`` must have the device grant + (``urn:ietf:params:oauth:grant-type:device_code``) enabled and be a public + client. See :ref:`oidc_idp_requirements`. + +Two ways to use it +================== + +You can let the helper drive everything, or you can just take the token and use +it with your own tooling. + +Just the token (PG-wire / HTTP / anything) +------------------------------------------ + +If you connect to QuestDB yourself — over PG-wire, raw HTTP, or any other +client — you only need a valid token. This path has **no extra dependencies**. + +.. 
code-block:: python + + from questdb.auth import OidcDeviceAuth + + # Discover the OIDC configuration from the QuestDB server: + auth = OidcDeviceAuth.from_questdb("https://questdb.example.com:9000") + + token = auth.token() # runs the device flow on first use, else cached + headers = auth.headers() # {"Authorization": "Bearer "} + + # Use the token however you like, e.g. PG-wire via psycopg: + import psycopg + conn = psycopg.connect( + host="questdb.example.com", port=8812, dbname="qdb", + user="_sso", password=token) + +The integrated session +---------------------- + +The high-level :func:`questdb.auth.connect` returns a :class:`~questdb.auth.QuestDB` +session that signs you in and adapts the token into the common Python access +paths. + +.. code-block:: python + + from questdb.auth import connect + + qdb = connect("https://questdb.example.com:9000") # interactive sign-in + df = qdb.sql("SELECT * FROM trades WHERE ts > dateadd('h', -1, now())") + + # Bring-your-own client, same auto-refreshed token: + engine = qdb.sqlalchemy_engine() # PG-wire, token as _sso + with qdb.psycopg() as conn: # raw psycopg + ... + + from questdb.ingress import TimestampNanos # the compiled extension + with qdb.sender() as sender: # ingestion (ILP/HTTP) + sender.row("trades", columns={"price": 101.5}, + at=TimestampNanos.now()) + +On first use you will see a sign-in prompt (rendered as a clickable link in +Jupyter, plain text on a terminal):: + + 🔐 Sign in to QuestDB + Open https://idp.example.com/device and enter code: WDJB-MJHT + (or open directly: https://idp.example.com/device?user_code=WDJB-MJHT) + ⏳ waiting for authorization… (4:51 left) + ✅ Signed in as alice@example.com — token cached, expires in 60 min + +Re-running any cell is silent — the token is cached and refreshed silently on +the next use once it nears expiry. 
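The "cached, then silently refreshed near expiry" behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the `questdb.auth` implementation: `CachedToken`, `SilentTokenCache`, the `fetch` callable and the 30-second `skew_seconds` default are all hypothetical names and values chosen for the example.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CachedToken:
    """Hypothetical token record: the token string plus its expiry time."""
    value: str
    expires_at: float  # absolute time on the same clock as `now`


class SilentTokenCache:
    """Hypothetical sketch of a cache that refreshes a token near expiry."""

    def __init__(self,
                 fetch: Callable[[], CachedToken],
                 skew_seconds: float = 30.0,  # assumed clock-skew margin
                 now: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # stands in for "refresh or sign in again"
        self._skew = skew_seconds
        self._now = now
        self._lock = threading.Lock()
        self._token: Optional[CachedToken] = None

    def token(self) -> str:
        # The lock serializes callers so only one of them fetches a new token.
        with self._lock:
            tok = self._token
            # Fetch when nothing is cached or the token is within the margin.
            if tok is None or self._now() >= tok.expires_at - self._skew:
                tok = self._fetch()
                self._token = tok
            return tok.value
```

In this sketch a second call to `token()` inside the validity window returns the cached value without invoking `fetch`, which is the property that makes re-running a cell silent.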
+ +How it works +============ + +Configuration discovery +------------------------ + +:meth:`OidcDeviceAuth.from_questdb ` +(and :func:`~questdb.auth.connect`) resolve the OIDC configuration in this +order: + +1. ``GET {url}/settings`` (public, no auth) for the QuestDB-authoritative + values: ``acl.oidc.client.id``, ``acl.oidc.scope``, ``acl.oidc.token.endpoint``, + ``acl.oidc.groups.encoded.in.token`` and (on newer servers) + ``acl.oidc.device.authorization.endpoint``. +2. If the device-authorization endpoint is not advertised, the helper falls + back to the IdP discovery document + (``{issuer}/.well-known/openid-configuration``). This path **requires** an + explicit ``issuer=`` (or ``discovery_url=``) argument. + +Anything you pass explicitly overrides discovery. You can also skip discovery +entirely: + +.. code-block:: python + + auth = OidcDeviceAuth( + client_id="questdb", + device_authorization_endpoint="https://idp/.../device", + token_endpoint="https://idp/.../token", + scope="openid groups", + groups_in_token=True, # send id_token (True) vs access_token (False) + audience="questdb", # optional; some IdPs need it to set `aud` + cache="memory") + +Which token is sent +------------------- + +The helper mirrors QuestDB's own selection logic +(``groupsEncodedInToken ? idToken : accessToken``): + +============================================ ================= +``acl.oidc.groups.encoded.in.token`` Helper sends +============================================ ================= +``true`` ``id_token`` +``false`` ``access_token`` +============================================ ================= + +When sending the ``id_token`` the ``openid`` scope is requested automatically. + +Token lifecycle (cache + refresh) +--------------------------------- + +``token()`` returns the cached token while it is valid (with a small clock-skew +margin). When it nears expiry the helper silently refreshes it using the +``refresh_token`` if one was issued. 
If the refresh token is missing or rejected +(expired/revoked), it re-runs the interactive sign-in; a transient network error +is raised instead, so you can retry without being needlessly re-prompted. A lock +serializes refresh so parallel cells/threads don't double-prompt. + +Cache backends (``cache=`` argument): + +* ``"memory"`` *(default)* — process-global, nothing written to disk. + Re-running cells is silent; a kernel restart re-prompts once. +* ``None`` — never persist; prompt every time. + +Tokens are deliberately never written to disk: a kernel restart re-prompts +(an interactive sign-in is cheap relative to the risk of a refresh token +sitting in a plaintext file at rest). + +Non-interactive contexts +------------------------- + +Scheduled / non-interactive notebooks (papermill, cron, CI) have no human to +authorize the device. The helper detects this and raises +:class:`~questdb.auth.OidcInteractionRequired` instead of hanging. Use a QuestDB +**service-account REST token** or the **client-credentials** grant there. + +Connection adapters +=================== + +* :meth:`QuestDB.sql ` — query over REST ``/exec`` to a + pandas DataFrame using ``Authorization: Bearer``. Recommended: there is no + token-length limit (a groups-encoded JWT can be several KB). +* :meth:`QuestDB.sqlalchemy_engine ` — + PG-wire engine that injects a fresh token as the ``_sso`` password for every + new connection. Requires ``acl.oidc.pg.token.as.password.enabled=true``. +* :meth:`QuestDB.psycopg ` — a raw psycopg / + psycopg2 connection. +* :meth:`QuestDB.sender ` — a + :class:`~questdb.ingress.Sender` for ingestion (ILP over HTTP). + +.. note:: + + QuestDB validates the token at **authentication** time, not per query. An + already-open PG connection survives token expiry; only **new** connections + need a fresh token — which is why the PG-wire adapter supplies the token + per-connect. + +.. 
_oidc_idp_requirements: + +IdP requirements +=============== + +The OIDC client referenced by ``acl.oidc.client.id`` must: + +* have the **Device Authorization grant** enabled; +* be a **public client** (no secret in a notebook); +* optionally issue **refresh tokens** for the device grant (for silent refresh); +* issue tokens whose ``aud`` matches ``acl.oidc.audience`` (some IdPs need an + ``audience``/``resource`` request parameter); +* include the **groups** claim in the token (``groups.encoded.in.token=true``) + or expose it via the **userinfo** endpoint (``false``), matching the server. + +Security notes +============= + +* No IdP passwords are ever entered in the notebook; MFA/SSO happen at the IdP. +* ``https`` is required. Plaintext ``http`` to a **loopback** address + (``localhost`` / ``127.0.0.1`` / ``::1``) is always allowed — it never leaves + the host. ``insecure=True`` additionally permits plaintext to a non-loopback + **QuestDB** host (local development only); it does **not** downgrade the + **IdP**, so the device code and refresh token are never sent in cleartext + over the network. Certificate verification is never disabled. +* **Endpoint trust.** The device code and the long-lived refresh token are sent + to the device-authorization and token endpoints, which are discovered from + QuestDB ``/settings``. The helper requires both endpoints to share a single + origin and rejects the configuration otherwise. Because ``/settings`` is + authoritative-by-QuestDB, a compromised server could in principle point them + elsewhere; pass ``issuer=`` to **pin** the IdP so the endpoints are verified + to belong to it and credentials can't be redirected to another host. The pin + checks both the **origin** and, for endpoints advertised by ``/settings``, the + issuer **path** — so on a path-based multi-tenant IdP (e.g. 
Keycloak issuers + ``https://host/realms/{realm}``) a tampered ``/settings`` cannot redirect the + device code / refresh token to a *different realm on the same host*. (Caller- + supplied endpoints and endpoints from the IdP's own discovery document are + trusted as-is and not path-restricted, since some IdPs — e.g. Azure AD — place + their endpoints outside the issuer path; pass such endpoints explicitly or let + discovery resolve them.) When the server does not advertise the device- + authorization endpoint (so it must be discovered from the IdP), ``issuer=`` + (or ``discovery_url=``) is **required** for exactly this reason — the helper + refuses to guess the discovery origin from the server-supplied token endpoint. +* Adapters avoid logging the token / PG DSN. Avoid logging them yourself. +* Standard proxy / CA settings (``HTTPS_PROXY``, ``REQUESTS_CA_BUNDLE``, + ``SSL_CERT_FILE``) are honoured; you can also pass ``ca_bundle=``. The same + private CA is forwarded to the ingestion :meth:`~questdb.auth.QuestDB.sender` + (as the ILP ``tls_roots``) for an ``https`` QuestDB, so REST queries and ILP + ingestion trust the same roots. (Only a PEM **file** is forwarded this way; + for a CA *directory*, or to override, pass ``tls_roots=``/``tls_ca=`` to + ``sender()``.) + +Dependencies +=========== + +``token()`` / ``headers()`` need nothing beyond the standard library. The +following are imported lazily, only when used: + +* ``pandas`` — for :meth:`QuestDB.sql`; +* ``sqlalchemy`` and ``psycopg`` / ``psycopg2`` — for the PG-wire adapters; +* ``qrcode`` — to render a QR code for phone-based authorization (``qr=True``); +* ``IPython`` — for the rich Jupyter prompt (falls back to plain text). 
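The lazy-import pattern named in the dependency list above can be sketched as below. This is an assumption-laden illustration, not the helper's actual code: `lazy_import` and its `feature` parameter are hypothetical names, and the error wording is invented for the example.

```python
import importlib


def lazy_import(module_name: str, feature: str):
    """Import an optional dependency on first use (hypothetical helper).

    Raises ImportError with an actionable message when the package is
    missing, so the core token path never needs the optional package.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} requires the optional '{module_name}' package; "
            f"install it to use this feature") from exc
```

A caller would invoke it inside the method that needs the package, for example `pd = lazy_import("pandas", "sql()")`, so that importing the module itself stays dependency-free.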
diff --git a/docs/index.rst b/docs/index.rst index 4540c9b5..6babf211 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -14,6 +14,7 @@ Contents installation sender conf + auth examples api troubleshooting diff --git a/docs/installation.rst b/docs/installation.rst index dc2f0405..fccd7599 100644 --- a/docs/installation.rst +++ b/docs/installation.rst @@ -6,7 +6,7 @@ Dependency ========== The Python QuestDB client does not have any additional run-time dependencies and -will run on any version of Python >= 3.9 on most platforms and architectures. +will run on any version of Python >= 3.10 on most platforms and architectures. From version 3.0.0, this library depends on ``numpy>=1.21.0``. @@ -23,6 +23,12 @@ These are bundled as the ``dataframe`` extra. Without this option, you may still ingest data row-by-row. +The :ref:`OIDC authentication helper <oidc_auth>` (:mod:`questdb.auth`) needs +no extra dependencies for ``token()`` / ``headers()``. Some of its conveniences +import the following lazily, only when used: ``pandas`` (for ``sql()``), +``sqlalchemy`` and ``psycopg`` / ``psycopg2`` (PG-wire adapters), ``qrcode`` +(QR-code prompt) and ``IPython`` (rich Jupyter prompt). + PIP --- diff --git a/examples/oidc_device_auth.py b/examples/oidc_device_auth.py new file mode 100644 index 00000000..1691659f --- /dev/null +++ b/examples/oidc_device_auth.py @@ -0,0 +1,70 @@ +""" +Interactive OIDC sign-in to QuestDB Enterprise from Python (e.g. a notebook). + +Runs the OAuth 2.0 Device Authorization Grant (RFC 8628) client-side: you +authorize in any browser (laptop or phone), while the code runs on a possibly +remote kernel that only makes outbound calls to your identity provider. + +This requires QuestDB Enterprise with OIDC enabled and an IdP client that has +the device grant enabled. It cannot run unattended (there is a human in the +loop), so it is not part of the automated example suite.
+""" + +import sys + +from questdb.auth import connect, OidcDeviceAuth, OidcError + + +QUESTDB_URL = 'https://questdb.example.com:9000' + + +def integrated(url: str = QUESTDB_URL): + """The high-level path: sign in, then query / ingest with one object.""" + # First call triggers the interactive device-flow sign-in; the token is + # cached, so re-running this is silent until it expires. + qdb = connect(url) + + # Query straight to a pandas DataFrame over REST (Authorization: Bearer). + df = qdb.sql("SELECT * FROM trades WHERE ts > dateadd('h', -1, now())") + print(df) + + # Feed the same auto-refreshed token into your existing tooling: + # engine = qdb.sqlalchemy_engine() # PG-wire, token as _sso password + # with qdb.psycopg() as conn: ... # raw psycopg + # + # questdb.ingress is the compiled extension; import it lazily (only the + # ingestion path needs it) so this module also loads for the pure-Python + # bring_your_own_client() path, which needs no extension. + from questdb.ingress import TimestampNanos + with qdb.sender() as sender: # ingestion (ILP over HTTP) + sender.row( + 'trades', + symbols={'symbol': 'ETH-USD', 'side': 'sell'}, + columns={'price': 2615.54, 'amount': 0.00044}, + at=TimestampNanos.now()) + + +def bring_your_own_client(url: str = QUESTDB_URL): + """The low-level path: you just want the token (PG-wire / HTTP / anything).""" + auth = OidcDeviceAuth.from_questdb(url) + + token = auth.token() # valid, auto-refreshed id/access token + headers = auth.headers() # {"Authorization": "Bearer "} + print('Authorization header ready:', 'Authorization' in headers) + + # e.g. 
hand the token to psycopg yourself over PG-wire: + # import psycopg + # conn = psycopg.connect(host='questdb.example.com', port=8812, + # dbname='qdb', user='_sso', password=token) + return token + + +def main(): + try: + integrated() + except OidcError as e: + sys.stderr.write(f'OIDC sign-in failed: {e}\n') + + +if __name__ == '__main__': + main() diff --git a/setup.py b/setup.py index 74438319..2f6ab44a 100755 --- a/setup.py +++ b/setup.py @@ -175,7 +175,7 @@ def readme(): name='questdb', version='4.1.0', platforms=['any'], - python_requires='>=3.8', + python_requires='>=3.10', install_requires=[], ext_modules = cythonize([ingress_extension()], annotate=True), cmdclass={'build_ext': questdb_build_ext}, diff --git a/src/questdb/auth/__init__.py b/src/questdb/auth/__init__.py new file mode 100644 index 00000000..a06acaf5 --- /dev/null +++ b/src/questdb/auth/__init__.py @@ -0,0 +1,85 @@ +################################################################################ +## ___ _ ____ ____ +## / _ \ _ _ ___ ___| |_| _ \| __ ) +## | | | | | | |/ _ \/ __| __| | | | _ \ +## | |_| | |_| | __/\__ \ |_| |_| | |_) | +## \__\_\\__,_|\___||___/\__|____/|____/ +## +## Copyright (c) 2014-2019 Appsicle +## Copyright (c) 2019-2024 QuestDB +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +################################################################################ + +""" +OIDC authentication helper for QuestDB (Jupyter-first). 
+ +Runs the OAuth 2.0 Device Authorization Grant (RFC 8628) client-side and +presents the token to QuestDB (HTTP ``Bearer`` / PG-wire ``_sso``). Works on +browserless local and remote kernels (JupyterHub, SageMaker, Colab, +VS Code-remote): authorize in any browser, the kernel only calls the IdP. + +* **Just the token** — works with anything; no optional dependencies:: + + from questdb.auth import OidcDeviceAuth + + auth = OidcDeviceAuth.from_questdb("https://questdb.example.com:9000") + token = auth.token() # device flow on first use + headers = auth.headers() # {"Authorization": "Bearer .."} + +* **The integrated session** — query to a DataFrame and feed adapters:: + + from questdb.auth import connect + + qdb = connect("https://questdb.example.com:9000") + df = qdb.sql("SELECT * FROM trades LIMIT 10") + engine = qdb.sqlalchemy_engine() # PG-wire, token as _sso password + with qdb.sender() as sender: # ingestion (ILP/HTTP) + ... + +Optional deps (``pandas``, ``sqlalchemy``/``psycopg``, ``qrcode``, ``IPython``) +are imported lazily, only when used. 
+""" + +from ._device import OidcDeviceAuth +from ._discovery import OidcConfig +from ._cache import TokenCache, TokenSet, MemoryCache, NullCache +from ._errors import ( + OidcError, + OidcConfigError, + OidcNetworkError, + OidcInteractionRequired, + OidcDeviceFlowError, + OidcTimeoutError, + OidcAuthError, +) +from ._questdb import QuestDB, connect + +__all__ = [ + 'MemoryCache', + 'NullCache', + 'OidcAuthError', + 'OidcConfig', + 'OidcConfigError', + 'OidcDeviceAuth', + 'OidcDeviceFlowError', + 'OidcError', + 'OidcInteractionRequired', + 'OidcNetworkError', + 'OidcTimeoutError', + 'QuestDB', + 'TokenCache', + 'TokenSet', + 'connect', +] diff --git a/src/questdb/auth/_cache.py b/src/questdb/auth/_cache.py new file mode 100644 index 00000000..b8373fb6 --- /dev/null +++ b/src/questdb/auth/_cache.py @@ -0,0 +1,173 @@ +################################################################################ +## ___ _ ____ ____ +## / _ \ _ _ ___ ___| |_| _ \| __ ) +## | | | | | | |/ _ \/ __| __| | | | _ \ +## | |_| | |_| | __/\__ \ |_| |_| | |_) | +## \__\_\\__,_|\___||___/\__|____/|____/ +## +## Copyright (c) 2014-2019 Appsicle +## Copyright (c) 2019-2024 QuestDB +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +################################################################################ + +"""Token state and cache backends for :mod:`questdb.auth`.""" + +from __future__ import annotations + +import threading +from dataclasses import dataclass, field, replace +from typing import Dict, Optional, Union + +from ._errors import OidcConfigError + +# Refresh a little before the real expiry to absorb clock skew / latency. +DEFAULT_SKEW_SECONDS = 30 + + +@dataclass(frozen=True) +class TokenSet: + """ + IdP tokens plus their expiry. + + ``frozen`` because the lock-free fast path in + :class:`~questdb.auth._device.OidcDeviceAuth` reads a published ``TokenSet`` + without a lock, which is safe only if its fields never change; use + :func:`dataclasses.replace` for a modified copy. The secret fields are + excluded from ``repr`` so a token can't leak into a log or traceback. + """ + + access_token: Optional[str] = field(default=None, repr=False) + id_token: Optional[str] = field(default=None, repr=False) + refresh_token: Optional[str] = field(default=None, repr=False) + expires_at: float = 0.0 # epoch seconds; 0 == unknown + token_type: str = 'Bearer' + scope: Optional[str] = None + sub: Optional[str] = None + issued_at: float = 0.0 # epoch seconds; 0 == unknown + + def is_valid(self, now: float, skew: float = DEFAULT_SKEW_SECONDS) -> bool: + """True if the token is present and not within ``skew`` of expiry.""" + if self.expires_at <= 0: + return False + # Cap skew at half the token lifetime, so a short-lived (< 2*skew) + # token isn't reported expired the instant it's issued. 
+ if self.issued_at: + lifetime = self.expires_at - self.issued_at + if lifetime > 0: + skew = min(skew, lifetime / 2) + return now < (self.expires_at - skew) + + +class TokenCache: + """Interface for token caches.""" + + def load(self, key: str) -> Optional[TokenSet]: # pragma: no cover + raise NotImplementedError + + def store(self, key: str, tokens: TokenSet) -> None: # pragma: no cover + raise NotImplementedError + + def clear(self, key: str) -> None: # pragma: no cover + raise NotImplementedError + + +# Module-global so a re-run notebook cell (fresh ``OidcDeviceAuth``) reuses the +# acquired token instead of re-prompting. +_MEMORY_STORE: Dict[str, TokenSet] = {} +# Per-key counter bumped on every clear(); store_if_current() uses it to drop a +# write from an acquisition that began before a concurrent clear() — even a +# clear() on a different OidcDeviceAuth sharing this store, whose per-instance +# lock doesn't serialize against this one — so clear() can't be silently undone. +_MEMORY_GENERATION: Dict[str, int] = {} +_MEMORY_LOCK = threading.Lock() + + +class MemoryCache(TokenCache): + """ + Process-global, in-memory cache (the default). + + Safest backend: nothing hits disk. Tokens live for the life of the process, + so re-running cells is silent; a kernel restart re-prompts once. + """ + + def load(self, key: str) -> Optional[TokenSet]: + # Return a copy so callers can't mutate the cached entry in place. + with _MEMORY_LOCK: + tokens = _MEMORY_STORE.get(key) + return replace(tokens) if tokens is not None else None + + def store(self, key: str, tokens: TokenSet) -> None: + with _MEMORY_LOCK: + _MEMORY_STORE[key] = replace(tokens) + + def clear(self, key: str) -> None: + with _MEMORY_LOCK: + _MEMORY_STORE.pop(key, None) + _MEMORY_GENERATION[key] = _MEMORY_GENERATION.get(key, 0) + 1 + + def generation(self, key: str) -> int: + """ + Current clear()-generation for ``key``. 
+ + Capture before an IdP round-trip and pass to :meth:`store_if_current`, + which drops the write if a ``clear()`` bumped the counter meanwhile. + """ + with _MEMORY_LOCK: + return _MEMORY_GENERATION.get(key, 0) + + def store_if_current( + self, key: str, tokens: TokenSet, generation: int) -> bool: + """ + Store ``tokens`` only if no :meth:`clear` happened since ``generation``. + + If a concurrent ``clear()`` (on any OidcDeviceAuth sharing this store) + bumped the counter after ``generation`` was captured, the write is + dropped (``False``) so the cleared entry isn't resurrected with a stale + token; returns ``True`` when stored. + """ + with _MEMORY_LOCK: + if _MEMORY_GENERATION.get(key, 0) != generation: + return False + _MEMORY_STORE[key] = replace(tokens) + return True + + +class NullCache(TokenCache): + """Never persists anything; prompts every time.""" + + def load(self, key: str) -> Optional[TokenSet]: + return None + + def store(self, key: str, tokens: TokenSet) -> None: + pass + + def clear(self, key: str) -> None: + pass + + +_CacheSpec = Union[str, None, TokenCache] + + +def make_cache(spec: _CacheSpec) -> TokenCache: + """Resolve a cache spec (``"memory"``, ``"none"``/``None``, or a TokenCache).""" + if isinstance(spec, TokenCache): + return spec + if spec is None or spec == 'none': + return NullCache() + if spec == 'memory': + return MemoryCache() + raise OidcConfigError( + f'Unknown cache backend {spec!r}; ' + "expected 'memory', 'none', None, or a TokenCache instance.") diff --git a/src/questdb/auth/_device.py b/src/questdb/auth/_device.py new file mode 100644 index 00000000..a3db901a --- /dev/null +++ b/src/questdb/auth/_device.py @@ -0,0 +1,813 @@ +################################################################################ +## ___ _ ____ ____ +## / _ \ _ _ ___ ___| |_| _ \| __ ) +## | | | | | | |/ _ \/ __| __| | | | _ \ +## | |_| | |_| | __/\__ \ |_| |_| | |_) | +## \__\_\\__,_|\___||___/\__|____/|____/ +## +## Copyright (c) 2014-2019 Appsicle +## 
Copyright (c) 2019-2024 QuestDB +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +################################################################################ + +"""The OAuth 2.0 device authorization grant (RFC 8628) token manager.""" + +from __future__ import annotations + +import base64 +import binascii +import json +import threading +import time +import webbrowser +from dataclasses import replace +from typing import Any, Dict, Optional + +from ._cache import TokenSet, make_cache +from ._discovery import OidcConfig, resolve_config, validate_endpoint_origins +from ._errors import ( + OidcConfigError, + OidcDeviceFlowError, + OidcError, + OidcInteractionRequired, + OidcNetworkError, + OidcTimeoutError, +) +from ._http import build_ssl_context, post_form, safe_urlparse +from ._render import ( + Renderer, + _safe_link_url, + detect_interactive, + in_ipython_kernel, + make_renderer, +) + +DEVICE_CODE_GRANT = 'urn:ietf:params:oauth:grant-type:device_code' +REFRESH_GRANT = 'refresh_token' + +_VALID_FLOWS = ('auto', 'device', 'loopback') + +# A non-positive expires_in is non-conformant; treat it as "unknown". +_DEFAULT_EXPIRES_IN = 3600 + +# Clamp the device-authorization timing fields (RFC 8628): a hostile/buggy +# response must not time the flow out before its first poll, pin the polling +# thread (which holds the acquisition lock) in one huge sleep, or keep the loop +# (and lock) alive indefinitely. 
+_DEFAULT_DEVICE_CODE_LIFETIME = 600 # expires_in fallback (absent/invalid/<=0) +_MAX_DEVICE_CODE_LIFETIME = 1800 # cap on how long we keep polling +_MAX_POLL_INTERVAL = 60 # cap on the poll interval (incl. slow_down) + + +class _SystemClock: + """Real time source; the default for :class:`OidcDeviceAuth`.""" + sleep = staticmethod(time.sleep) + monotonic = staticmethod(time.monotonic) + now = staticmethod(time.time) + + +_SYSTEM_CLOCK = _SystemClock() + + +def _decode_jwt_claims(token: Optional[str]) -> Dict[str, Any]: + """ + Best-effort decode of a JWT payload **without signature verification**. + + Used only to show a friendly identity in the sign-in message; QuestDB does + the real validation. Returns ``{}`` for opaque/invalid tokens. + """ + if not token or token.count('.') < 2: + return {} + try: + payload = token.split('.')[1] + payload += '=' * (-len(payload) % 4) # restore base64 padding + raw = base64.urlsafe_b64decode(payload.encode('ascii')) + claims = json.loads(raw) + return claims if isinstance(claims, dict) else {} + except (ValueError, binascii.Error, UnicodeDecodeError, RecursionError): + # RecursionError (deeply-nested JSON exhausts the decoder stack) isn't a + # ValueError, so list it explicitly: a hostile token must not crash + # token()/refresh here. + return {} + + +def _identity_from_claims(claims: Dict[str, Any]) -> Optional[str]: + for key in ('email', 'preferred_username', 'upn', 'name', 'sub'): + value = claims.get(key) + if value: + return str(value) + return None + + +def _http_status_is_terminal_4xx(status: Optional[int]) -> bool: + """ + True for a 4xx that is a definitive rejection. + + A non-JSON body with such a status (e.g. an HTML ``403`` from a WAF/proxy or + non-conformant IdP) is never an ``authorization_pending`` / ``slow_down`` + (those are always JSON), so the poll must fail fast rather than retry to a + misleading "code expired". ``429`` is excluded — it's a transient rate-limit. 
+ """ + return status is not None and 400 <= status < 500 and status != 429 + + +def _http_status_is_transient(status: Optional[int]) -> bool: + """True for a server-side (5xx) or rate-limit (429) status worth retrying.""" + return status is not None and (status >= 500 or status == 429) + + +class OidcDeviceAuth: + """ + Acquire and refresh an OIDC token via the device authorization grant. + + The token is presented to QuestDB over the auth paths it already supports: + HTTP ``Authorization: Bearer`` or PG-wire ``_sso`` (token as password). The + flow runs entirely client-side; QuestDB is never in the acquisition path. + + Most users only call :meth:`token` (or :meth:`headers`). The first call runs + the interactive device flow; later calls return the cached token, refreshing + it silently and synchronously once it nears expiry (no background thread). + Acquisition is serialized so concurrent callers don't double-prompt, while a + valid cached token is returned without blocking on another's sign-in. + + **Concurrency note.** The lock is held for a whole interactive sign-in (up + to the device-code lifetime, ~30 min): a caller with a *valid* cached token + never blocks, but one whose token is missing/expired waits behind the + signer. So when threads share an auth object (e.g. a SQLAlchemy/psycopg + pool), sign in once up front — :func:`questdb.auth.connect` does this via + ``eager=True`` (the default), running the flow once on the main thread before + the pool opens connections. + + .. code-block:: python + + from questdb.auth import OidcDeviceAuth + + # Discover everything from the QuestDB server: + auth = OidcDeviceAuth.from_questdb("https://questdb.example.com:9000") + token = auth.token() # device flow on first use, else cached + + Or fully explicit (no server discovery): + + .. 
code-block:: python + + auth = OidcDeviceAuth( + client_id="questdb", + device_authorization_endpoint="https://idp/.../device", + token_endpoint="https://idp/.../token", + scope="openid groups", + groups_in_token=True, + audience="questdb", + cache="memory") + """ + + def __init__( + self, + client_id: str, + device_authorization_endpoint: str, + token_endpoint: str, + *, + scope: str = 'openid', + groups_in_token: bool = True, + audience: Optional[str] = None, + issuer: Optional[str] = None, + cache: Any = 'memory', + insecure: bool = False, + ca_bundle: Optional[str] = None, + open_browser: bool = False, + interactive: Optional[bool] = None, + qr: bool = False, + renderer: Optional[Renderer] = None, + default_interval: int = 5, + timeout: float = 30, + _clock=None): # injectable time source for testing + if not client_id: + raise OidcConfigError('client_id is required') + if not device_authorization_endpoint: + raise OidcConfigError('device_authorization_endpoint is required') + if not token_endpoint: + raise OidcConfigError('token_endpoint is required') + + # Sending the id_token requires the ``openid`` scope. + if groups_in_token and 'openid' not in scope.split(): + scope = ('openid ' + scope).strip() + + self.config = OidcConfig( + client_id=client_id, + token_endpoint=token_endpoint, + device_authorization_endpoint=device_authorization_endpoint, + scope=scope, + groups_in_token=groups_in_token, + audience=audience, + issuer=issuer) + + # Enforce the credential-endpoint co-location / issuer pin here too (not + # just on the discovery path), so the guarantee holds for this + # constructor as well. + validate_endpoint_origins( + self.config.token_endpoint, + self.config.device_authorization_endpoint, + self.config.issuer) + + # `insecure` permits plaintext http only to QuestDB (e.g. local dev). + # _idp_post always holds the IdP to https (or loopback http), so the + # device code / refresh token are never sent in cleartext even when set. 
+ self.insecure = insecure + self.open_browser = open_browser + # Kept so adapters with their own transport (QuestDB.sender's ILP Sender) + # can forward the same private CA as _ctx rather than the default roots. + self._ca_bundle = ca_bundle + self._interactive = interactive + self._default_interval = default_interval + # Per-request network timeout for every IdP call (device-code, each poll, + # refresh). Bounds how long one network leg pins the acquisition lock if + # the IdP stalls; the total poll duration is separately capped by + # _MAX_DEVICE_CODE_LIFETIME. + self._timeout = timeout + self._cache = make_cache(cache) + self._ctx = build_ssl_context(ca_bundle) + self._renderer = renderer if renderer is not None else make_renderer(qr=qr) + # Serializes token *acquisition* (silent refresh or interactive sign-in) + # only. Without it, threaded SQLAlchemy/psycopg connections opening as + # the token expires would run overlapping refreshes — and with + # refresh-token rotation all but one would fail and re-prompt. NOT held + # on the fast path, so a valid cached token never blocks behind a + # sign-in. 
+ self._lock = threading.Lock() + self._tokens: Optional[TokenSet] = None + clock = _clock or _SYSTEM_CLOCK + self._sleep = clock.sleep + self._monotonic = clock.monotonic + self._now = clock.now + + # -- construction ------------------------------------------------------- + + @classmethod + def from_questdb( + cls, + url: str, + *, + client_id: Optional[str] = None, + scope: Optional[str] = None, + audience: Optional[str] = None, + groups_in_token: Optional[bool] = None, + issuer: Optional[str] = None, + discovery_url: Optional[str] = None, + token_endpoint: Optional[str] = None, + device_authorization_endpoint: Optional[str] = None, + flow: str = 'auto', + cache: Any = 'memory', + insecure: bool = False, + ca_bundle: Optional[str] = None, + open_browser: bool = False, + interactive: Optional[bool] = None, + qr: bool = False, + renderer: Optional[Renderer] = None, + default_interval: int = 5, + timeout: float = 30, + _clock=None) -> 'OidcDeviceAuth': # injectable time source + """ + Build an :class:`OidcDeviceAuth` by discovering config from QuestDB. + + Reads ``{url}/settings`` for the OIDC client id, scope, endpoints and + groups mode, falling back to the IdP ``.well-known`` document for the + device-authorization endpoint when QuestDB doesn't advertise it. Any + explicit keyword overrides discovery. 
+ """ + _validate_flow(flow) + ctx = build_ssl_context(ca_bundle) + cfg = resolve_config( + questdb_url=url, + client_id=client_id, + scope=scope, + audience=audience, + groups_in_token=groups_in_token, + token_endpoint=token_endpoint, + device_authorization_endpoint=device_authorization_endpoint, + issuer=issuer, + discovery_url=discovery_url, + ctx=ctx, + insecure=insecure, + timeout=timeout) + return cls( + client_id=cfg.client_id, + device_authorization_endpoint=cfg.device_authorization_endpoint, + token_endpoint=cfg.token_endpoint, + scope=cfg.scope, + groups_in_token=cfg.groups_in_token, + audience=cfg.audience, + issuer=cfg.issuer, + cache=cache, + insecure=insecure, + ca_bundle=ca_bundle, + open_browser=open_browser, + interactive=interactive, + qr=qr, + renderer=renderer, + default_interval=default_interval, + timeout=timeout, + _clock=_clock) + + # -- public API --------------------------------------------------------- + + def token(self) -> str: + """ + Return a valid token for QuestDB, acquiring or refreshing as needed. + + Returns the ``id_token`` when the server expects groups encoded in the + token (``acl.oidc.groups.encoded.in.token=true``), else the + ``access_token`` — mirroring QuestDB's own selection logic. + """ + return self._select(self._obtain_tokens()) + + def headers(self) -> Dict[str, str]: + """Return ``{"Authorization": "Bearer "}``.""" + return {'Authorization': f'Bearer {self.token()}'} + + @property + def cache_key(self) -> str: + """ + Identifies the token's security context for caching. + + Two sessions share a cached token only when they'd accept the same one: + same IdP token endpoint (**path included**, so multi-tenant realms on one + host don't collide), client id, scope *set* (order-insensitive), + audience, and token-kind mode (``groups_in_token`` — id_token vs + access_token). The QuestDB URL is excluded — the same IdP token is valid + against any QuestDB that trusts it. 
+ + ``groups_in_token`` is keyed because it selects the token kind + :meth:`_select` returns; otherwise two sessions differing only in that + mode would collide and repeatedly evict each other's token (self- + correcting, but at the cost of avoidable refreshes / re-prompts). + """ + c = self.config + scope = ' '.join(sorted(c.scope.split())) if c.scope else '' + return '\x1f'.join([ + c.issuer or '', + _normalize_url(c.token_endpoint), + c.client_id, + scope, + c.audience or '', + 'groups' if c.groups_in_token else 'access']) + + def clear(self) -> None: + """Forget the cached token (forces a fresh sign-in next time).""" + # self._lock serializes against THIS instance's acquisition; the shared + # MemoryCache also bumps a per-key generation, so an in-flight acquire on + # ANOTHER instance sharing the process-global store can't repopulate the + # entry (its _store sees the bumped generation and drops the write). + # Resets the local/process cache only — does not revoke at the IdP. + with self._lock: + self._tokens = None + self._cache.clear(self.cache_key) + + # -- token lifecycle ---------------------------------------------------- + + def _select(self, tokens: TokenSet) -> str: + if self.config.groups_in_token: + if not tokens.id_token: + raise OidcConfigError( + 'Server expects groups encoded in the token but the IdP ' + 'returned no id_token. Ensure the "openid" scope is ' + 'requested (current scope: ' + f'{self.config.scope!r}).') + return tokens.id_token + if not tokens.access_token: + raise OidcConfigError('IdP returned no access_token.') + return tokens.access_token + + def _has_required_token(self, tokens: TokenSet) -> bool: + """ + True if ``tokens`` carries the kind :meth:`_select` will return (the + ``id_token`` in groups mode, else the ``access_token``). The cache gate + and post-refresh check share this predicate so they can't disagree with + ``_select``. 
+ """ + if self.config.groups_in_token: + return bool(tokens.id_token) + return bool(tokens.access_token) + + def _missing_required_token_error(self) -> OidcDeviceFlowError: + """ + Terminal error for a *completed* grant whose response omits the kind + :meth:`_select` needs. Mirrors :meth:`_select`'s diagnostics but as an + :class:`OidcDeviceFlowError`, so the poll can raise it without first + caching an unusable response. + """ + if self.config.groups_in_token: + return OidcDeviceFlowError( + 'Device authorization completed but the IdP returned no ' + 'id_token, which this server requires (it expects groups ' + 'encoded in the token). Ensure the "openid" scope is requested ' + f'(current scope: {self.config.scope!r}).') + return OidcDeviceFlowError( + 'Device authorization completed but the IdP returned no ' + 'access_token.') + + def _obtain_tokens(self) -> TokenSet: + # Fast path: return a valid token without the lock, so a caller with a + # usable token never blocks behind another thread's refresh/sign-in. + # READ-ONLY — never writes self._tokens; every write to that field is + # under the lock (the promotion below, _store, clear), so this lock-free + # reader can't race a write or resurrect a just-cleared token. + tokens = self._valid_cached() + if tokens is not None: + return tokens + # Slow path: serialize acquisition so concurrent callers don't overlap + # refreshes or double-prompt; the loser re-checks and reuses the + # winner's token. + with self._lock: + # Capture the generation before reading/acquiring, so a racing + # clear() — including on another instance sharing the process-global + # MemoryCache (whose per-instance lock doesn't serialize against + # ours) — invalidates the store below instead of resurrecting the + # cleared entry. + generation = self._cache_generation() + # Promote a cached token under the lock (even expired, so _acquire + # can reuse its refresh_token). 
Here, not on the fast path, so every + # write to self._tokens stays serialized. + if self._tokens is None: + cached = self._cache.load(self.cache_key) + if cached is not None: + self._tokens = cached + tokens = self._valid_cached() + if tokens is not None: + return tokens + return self._acquire(generation) + + def _valid_cached(self) -> Optional[TokenSet]: + # Read-only: reads the published field, falling back to the shared cache + # backend. Never writes self._tokens (that's lock-only), so it's safe on + # the lock-free fast path. + tokens = self._tokens + if tokens is None: + tokens = self._cache.load(self.cache_key) + if (tokens is not None and tokens.is_valid(self._now()) + and self._has_required_token(tokens)): + return tokens + return None + + def _acquire(self, generation: int) -> TokenSet: + # Holds self._lock. Try a silent refresh, else run the device flow. + # `generation` was captured before the cache read in _obtain_tokens; + # _store drops its write if a concurrent clear() bumped it since. + tokens = self._tokens + if tokens is not None and tokens.refresh_token: + try: + refreshed = self._refresh(tokens) + except OidcNetworkError: + # Transient: the refresh token is still valid, so the interactive + # flow (same network) wouldn't help and would needlessly + # re-prompt. Surface it; the cached token is kept for a retry. + raise + except OidcError: + # Refresh token rejected (expired/revoked) or unusable response: + # fall through to a fresh interactive sign-in. + pass + else: + # Accept only a refresh that yields the kind we need: some IdPs + # don't re-issue the id_token on refresh, so fall through rather + # than cache an unusable response and loop on every call. 
+ if self._has_required_token(refreshed): + self._store(refreshed, generation) + return refreshed + + fresh = self._run_device_flow() + self._store(fresh, generation) + return fresh + + def _store(self, tokens: TokenSet, generation: int) -> None: + # self._tokens is this instance's own view, so always set it (the caller + # uses what it just acquired). The shared-cache write is conditional: a + # clear() (here or on another instance sharing the store) that bumped the + # generation drops the write, so clear() isn't silently undone. Backends + # without generation support (NullCache / custom TokenCache) store + # unconditionally. + self._tokens = tokens + store_if_current = getattr(self._cache, 'store_if_current', None) + if store_if_current is not None: + store_if_current(self.cache_key, tokens, generation) + else: + self._cache.store(self.cache_key, tokens) + + def _cache_generation(self) -> int: + # MemoryCache tracks a per-key clear()-generation for the cross-instance + # CAS in _store; other backends don't, so default to 0 (unconditional + # store). + generation = getattr(self._cache, 'generation', None) + return generation(self.cache_key) if generation is not None else 0 + + def _tokenset_from_response(self, body: Dict[str, Any]) -> TokenSet: + try: + expires_in = int(body.get('expires_in', _DEFAULT_EXPIRES_IN)) + except (TypeError, ValueError, OverflowError): + # OverflowError: a JSON Infinity (json.loads accepts it) → int(inf) + # isn't a ValueError, so list it to keep the typed contract. + expires_in = _DEFAULT_EXPIRES_IN + if expires_in <= 0: + # A non-positive lifetime marks a just-issued token as expired, + # causing refresh/re-prompt churn. Treat it as unknown. 
+ expires_in = _DEFAULT_EXPIRES_IN + claims = (_decode_jwt_claims(body.get('id_token')) + or _decode_jwt_claims(body.get('access_token'))) + now = self._now() + return TokenSet( + access_token=body.get('access_token'), + id_token=body.get('id_token'), + refresh_token=body.get('refresh_token'), + expires_at=now + expires_in, + issued_at=now, + token_type=body.get('token_type', 'Bearer'), + scope=body.get('scope', self.config.scope), + sub=claims.get('sub')) + + def _idp_post(self, url: str, form: Dict[str, Any]): + # IdP POSTs carry the device code / refresh token, so always https + # (loopback http is fine for local dev); the user's `insecure` flag (the + # QuestDB link) never downgrades them. The timeout bounds how long this + # leg can hold the acquisition lock if the IdP stalls. + return post_form( + url, form, ctx=self._ctx, insecure=False, timeout=self._timeout) + + def _refresh(self, tokens: TokenSet) -> TokenSet: + try: + status, body = self._idp_post( + self.config.token_endpoint, + { + 'grant_type': REFRESH_GRANT, + 'refresh_token': tokens.refresh_token, + 'client_id': self.config.client_id, + 'scope': self.config.scope, + # Re-send the audience (mirroring the device-authorization + # request): some IdPs (e.g. Auth0) need it to keep the + # rotated token's `aud`, else they mint one QuestDB rejects + # only after a silent refresh. Others ignore it; post_form + # drops it when audience is None. + 'audience': self.config.audience, + }) + except OidcNetworkError: + # Already transient (socket drop / DNS / timeout): propagate so + # _acquire keeps the still-valid refresh token and retries later. + raise + except OidcError as e: + # Non-JSON HTTP error body (e.g. an HTML 5xx from a proxy). 5xx/429 + # is transient → re-raise as a network error so _acquire keeps the + # refresh token; a 4xx is a genuine rejection, so let it fall through + # to a fresh interactive sign-in. 
+ if _http_status_is_transient(getattr(e, 'status', None)): + raise OidcNetworkError(str(e)) from e + raise + if status == 200: + refreshed = self._tokenset_from_response(body) + # Many IdPs don't rotate the refresh token; keep the old one. + # TokenSet is frozen, so derive a copy. + if not refreshed.refresh_token: + refreshed = replace( + refreshed, refresh_token=tokens.refresh_token) + return refreshed + # A transient 5xx/429 during a silent refresh must not tear down the + # session: the refresh token is still valid, so surface it as a network + # error for _acquire to retry — matching the poll loop. Only a genuine + # rejection (expired/revoked token, 4xx invalid_grant) falls through to a + # fresh sign-in. + if _http_status_is_transient(status): + raise OidcNetworkError( + f'Token refresh hit a transient IdP error (HTTP {status}); ' + 'the refresh token is still valid — retry later.') + raise OidcDeviceFlowError( + f"Token refresh failed: {body.get('error', 'unknown error')}", + error=body.get('error'), + error_description=body.get('error_description')) + + # -- device flow (RFC 8628) --------------------------------------------- + + def _run_device_flow(self) -> TokenSet: + if not self._is_interactive(): + raise OidcInteractionRequired( + 'Interactive sign-in is required, but no interactive terminal ' + 'or notebook was detected (e.g. papermill / cron / CI). 
Use a ' + 'QuestDB service-account REST token or the OAuth2 ' + 'client-credentials grant for non-interactive contexts.') + + resp = self._request_device_code() + self._renderer.on_prompt(resp) + self._maybe_open_browser(resp) + tokens = self._poll_for_token(resp) + claims = (_decode_jwt_claims(tokens.id_token) + or _decode_jwt_claims(tokens.access_token)) + identity = _identity_from_claims(claims) + self._renderer.on_success( + identity, max(0.0, tokens.expires_at - self._now())) + return tokens + + def _request_device_code(self) -> Dict[str, Any]: + form = { + 'client_id': self.config.client_id, + 'scope': self.config.scope, + } + if self.config.audience: + form['audience'] = self.config.audience + status, body = self._idp_post( + self.config.device_authorization_endpoint, form) + if status == 200 and body.get('device_code') and body.get('user_code'): + return body + error = body.get('error') + if status == 200: + # 200 but the guard above failed: device_code/user_code missing. + # A non-conformant body, not an HTTP failure — say so plainly rather + # than a contradictory "failed (HTTP 200)". + raise OidcDeviceFlowError( + 'The IdP returned a 200 device-authorization response that is ' + 'missing the required "device_code"/"user_code" fields; cannot ' + 'start the device flow.', + error=error, + error_description=body.get('error_description')) + if status in (400, 404, 405) or error in ( + 'invalid_client', 'unauthorized_client', + 'unsupported_grant_type'): + raise OidcDeviceFlowError( + 'The IdP rejected the device-authorization request ' + f'(HTTP {status}, error={error!r}). 
Ensure the OIDC client ' + f'{self.config.client_id!r} has the device grant ' + "('urn:ietf:params:oauth:grant-type:device_code') enabled and " + 'is registered as a public client.', + error=error, + error_description=body.get('error_description')) + raise OidcDeviceFlowError( + f'Device authorization request failed (HTTP {status}): ' + f'{body.get("error_description") or error or body}', + error=error, + error_description=body.get('error_description')) + + def _poll_for_token(self, resp: Dict[str, Any]) -> TokenSet: + device_code = resp['device_code'] + try: + interval = int(resp.get('interval', self._default_interval)) + except (TypeError, ValueError, OverflowError): + interval = self._default_interval + # At least 1s (RFC 8628 floor), capped so a hostile value can't pin the + # polling thread (which holds the lock) in one enormous sleep. + interval = min(_MAX_POLL_INTERVAL, max(1, interval)) + try: + expires_in = int(resp.get('expires_in', _DEFAULT_DEVICE_CODE_LIFETIME)) + except (TypeError, ValueError, OverflowError): + expires_in = _DEFAULT_DEVICE_CODE_LIFETIME + # A non-positive lifetime would time out before the first poll (the code + # is already shown); treat it as unknown. Cap the upper end so a hostile + # value can't keep the loop — and the lock — alive indefinitely. + if expires_in <= 0: + expires_in = _DEFAULT_DEVICE_CODE_LIFETIME + expires_in = min(expires_in, _MAX_DEVICE_CODE_LIFETIME) + deadline = self._monotonic() + expires_in + + while True: + remaining = deadline - self._monotonic() + if remaining <= 0: + self._renderer.on_failure( + 'Code expired — run the cell again to retry.') + raise OidcTimeoutError( + 'The device code expired before authorization completed. ' + 'Run the sign-in again.', + error='expired_token') + self._renderer.on_waiting(remaining) + # Never sleep past the deadline (remaining > 0 here). 
+ self._sleep(min(interval, remaining)) + + try: + status, body = self._idp_post( + self.config.token_endpoint, + { + 'grant_type': DEVICE_CODE_GRANT, + 'device_code': device_code, + 'client_id': self.config.client_id, + }) + except OidcError as e: + # A non-JSON 4xx is a terminal rejection (e.g. an HTML error page + # from a WAF/proxy, or a non-conformant IdP): a conformant OAuth + # error is JSON, so it can never be authorization_pending / + # slow_down. Fail fast instead of polling on to "code expired". + if _http_status_is_terminal_4xx(getattr(e, 'status', None)): + self._renderer.on_failure( + 'Sign-in failed: the identity provider rejected the ' + 'request.') + raise OidcDeviceFlowError( + f'Device flow failed: the IdP rejected the token ' + f'request ({e}).') from e + # Otherwise transient: a dropped connection / DNS blip / timeout + # (OidcNetworkError) or a non-JSON 5xx/429 from a proxy (bare + # OidcError). The user may already have authorized, and RFC 8628 + # §3.4 expects polling to continue until the code expires, so + # poll again rather than discard the sign-in (the deadline bounds + # the total wait; a genuine JSON rejection arrives below). + if getattr(e, 'status', None) == 429: + interval = min(_MAX_POLL_INTERVAL, interval + 5) + continue + + if status == 200: + # The RFC 6749 §5.1 token response: the grant completed. Accept + # it only if it carries the kind _select hands to QuestDB, using + # the same predicate as the cache gate and post-refresh check so + # the three can't disagree. + tokens = self._tokenset_from_response(body) + if self._has_required_token(tokens): + return tokens + # Grant completed but the required kind is absent: a stable + # misconfiguration, not a transient poll state. Raise a terminal + # error rather than cache an unusable token and silently re-run + # the whole flow on every later token() call. 
+ self._renderer.on_failure( + 'Sign-in failed: the identity provider did not return the ' + 'token this server requires.') + raise self._missing_required_token_error() + + # A 5xx/429 with a JSON body is also transient (server error or + # rate-limit), not a terminal rejection: back off on 429 and keep + # polling until the deadline, as above. + if status >= 500 or status == 429: + if status == 429: + interval = min(_MAX_POLL_INTERVAL, interval + 5) + continue + + error = body.get('error') + if error == 'authorization_pending': + continue + if error == 'slow_down': + interval = min(_MAX_POLL_INTERVAL, interval + 5) + continue + if error == 'expired_token': + self._renderer.on_failure( + 'Code expired — run the cell again to retry.') + raise OidcTimeoutError( + 'The device code expired before authorization completed. ' + 'Run the sign-in again.', + error=error) + # access_denied or any other terminal error. + description = body.get('error_description') or error or 'unknown error' + self._renderer.on_failure(f'Sign-in failed: {description}') + raise OidcDeviceFlowError( + f'Device flow failed: {description}', + error=error, + error_description=body.get('error_description')) + + # -- helpers ------------------------------------------------------------ + + def _is_interactive(self) -> bool: + if self._interactive is not None: + return self._interactive + return detect_interactive() + + def _maybe_open_browser(self, resp: Dict[str, Any]) -> None: + # Never auto-open on a (possibly remote) notebook kernel; only on an + # opted-in local terminal. + if not self.open_browser or in_ipython_kernel(): + return + # Only http(s) — never a javascript:/data: scheme from a malicious or + # MITM'd device response. 
+ target = _safe_link_url( + resp.get('verification_uri_complete') + or resp.get('verification_uri') + or resp.get('verification_url')) + if target: + try: + webbrowser.open(target) + except Exception: + pass + + +def _validate_flow(flow: str) -> None: + if flow not in _VALID_FLOWS: + raise OidcConfigError( + f'Unknown flow {flow!r}; expected one of {_VALID_FLOWS}.') + if flow == 'loopback': + raise OidcConfigError( + "The 'loopback' (Authorization Code + PKCE) flow is not yet " + "implemented. Use flow='device' (works on local and remote " + 'kernels alike).') + + +def _normalize_url(url: str) -> str: + # Full URL with scheme/host lower-cased and default port dropped, but path + # kept (it distinguishes multi-tenant realms). Used for the cache key so + # trivial spelling differences don't cause a spurious re-prompt. + parts, port = safe_urlparse(url) + scheme = (parts.scheme or '').lower() + host = (parts.hostname or '').lower() + default_port = {'https': 443, 'http': 80}.get(scheme) + if port and port != default_port: + netloc = f'{host}:{port}' + else: + netloc = host + query = f'?{parts.query}' if parts.query else '' + return f'{scheme}://{netloc}{parts.path}{query}' diff --git a/src/questdb/auth/_discovery.py b/src/questdb/auth/_discovery.py new file mode 100644 index 00000000..68fe7301 --- /dev/null +++ b/src/questdb/auth/_discovery.py @@ -0,0 +1,536 @@ +################################################################################ +## ___ _ ____ ____ +## / _ \ _ _ ___ ___| |_| _ \| __ ) +## | | | | | | |/ _ \/ __| __| | | | _ \ +## | |_| | |_| | __/\__ \ |_| |_| | |_) | +## \__\_\\__,_|\___||___/\__|____/|____/ +## +## Copyright (c) 2014-2019 Appsicle +## Copyright (c) 2019-2024 QuestDB +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. 
+## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +################################################################################ + +""" +OIDC configuration discovery. + +Resolution order: + +1. ``GET {questdb_url}/settings`` (public) -> QuestDB-authoritative + ``acl.oidc.*`` values (client id, scope, endpoints, groups mode). +2. If QuestDB doesn't advertise the device-authorization endpoint, fall back to + the IdP discovery document (``{issuer}/.well-known/openid-configuration``). +""" + +from __future__ import annotations + +import ssl +import urllib.parse +from dataclasses import dataclass +from typing import Any, Dict, Optional + +from ._errors import OidcConfigError +from ._http import get_json, safe_urlparse, _is_loopback + +# QuestDB /settings keys (see EntPropServerConfiguration.exportConfiguration()). 
+_K_ENABLED = 'acl.oidc.enabled' +_K_CLIENT_ID = 'acl.oidc.client.id' +_K_SCOPE = 'acl.oidc.scope' +_K_TOKEN_ENDPOINT = 'acl.oidc.token.endpoint' +_K_AUTHORIZATION_ENDPOINT = 'acl.oidc.authorization.endpoint' +_K_DEVICE_ENDPOINT = 'acl.oidc.device.authorization.endpoint' # design §7 (new) +_K_GROUPS_IN_TOKEN = 'acl.oidc.groups.encoded.in.token' +_K_AUDIENCE = 'acl.oidc.audience' +_K_HOST = 'acl.oidc.host' +_K_PORT = 'acl.oidc.port' +_K_TLS_ENABLED = 'acl.oidc.tls.enabled' + + +@dataclass +class OidcConfig: + """Resolved OIDC parameters needed to run the device flow.""" + + client_id: str + token_endpoint: str + device_authorization_endpoint: str + scope: str = 'openid' + groups_in_token: bool = True + audience: Optional[str] = None + issuer: Optional[str] = None + authorization_endpoint: Optional[str] = None + + +def _as_bool(value: Any, default: Optional[bool] = None) -> Optional[bool]: + if value is None: + return default + if isinstance(value, bool): + return value + if isinstance(value, (int, float)): + return bool(value) + if isinstance(value, str): + v = value.strip().lower() + if v in ('true', '1', 'yes', 'on'): + return True + if v in ('false', '0', 'no', 'off', ''): + return False + return default + + +def _str_setting(value: Any) -> Optional[str]: + """ + A ``/settings`` value as a non-empty string, else ``None``. + + Drops a non-string ``acl.oidc.*`` value (a JSON list/number from a buggy or + hostile server) so it can't reach ``scope.split()`` / the cache-key join as a + raw object and escape the typed-error contract with an ``AttributeError`` / + ``TypeError``. Mirrors :func:`_resolve_endpoint`. + """ + return value if isinstance(value, str) and value else None + + +def settings_config(settings: Any) -> Dict[str, Any]: + """ + Return the trusted config map from a ``/settings`` response. + + Modern QuestDB nests server-authoritative values under ``"config"``, + alongside a **user-writable** ``"preferences"`` sibling (written via + ``PUT /settings``). 
Read only ``"config"`` so a user who can write a + preference can't smuggle an ``acl.oidc.*`` key (e.g. a redirected + ``token.endpoint``) into the resolved config. A genuinely flat legacy + response (no ``config`` / ``preferences`` split) is still tolerated. + """ + if not isinstance(settings, dict): + return {} + cfg = settings.get('config') + if isinstance(cfg, dict): + return cfg + # Either marker present => structured response: read "config" or nothing, + # never the user-writable top level — even when "config" is absent/malformed. + if 'config' in settings or 'preferences' in settings: + return {} + # Legacy flat response: no config/preferences split; tolerate top-level keys. + return settings + + +def fetch_settings( + questdb_url: str, + *, + ctx: Optional[ssl.SSLContext] = None, + insecure: bool = False, + timeout: float = 30) -> Dict[str, Any]: + """Fetch and return the QuestDB ``/settings`` config map.""" + base = questdb_url.rstrip('/') + data = get_json(base + '/settings', ctx=ctx, insecure=insecure, + timeout=timeout) + return settings_config(data) + + +_DEFAULT_PORTS = {'https': 443, 'http': 80} + + +def _normalized_origin(url: str) -> tuple: + """(scheme, host, port) with default ports filled in, for comparison.""" + parts, explicit_port = safe_urlparse(url) + scheme = (parts.scheme or '').lower() + host = (parts.hostname or '').lower() + port = explicit_port or _DEFAULT_PORTS.get(scheme) + return (scheme, host, port) + + +def _origin_str(url: str) -> str: + scheme, host, port = _normalized_origin(url) + return f'{scheme}://{host}:{port}' if port else f'{scheme}://{host}' + + +def _settings_channel_is_plaintext(questdb_url: str) -> bool: + """ + True if QuestDB ``/settings`` was fetched over plaintext http to a + non-loopback host — a MITM-tamperable channel (only reachable with + ``insecure=True``). Endpoints advertised over it must not route credentials + without an out-of-band pin. 
+ """ + parts, _ = safe_urlparse(questdb_url) + return (parts.scheme or '').lower() == 'http' and not _is_loopback( + parts.hostname) + + +def _decode_path_segments(path: str) -> list: + """ + Fully percent-decode a URL path and split it into ``/`` segments. + + Decoding repeats until stable so a multiply-encoded dot segment + (``%252e%252e`` -> ``..``) or encoded slash (``%2f``) — which a server/proxy + may unescape more than once before normalizing — is unmasked. Backslash is a + separator (some proxies fold ``\\`` to ``/``). The containment check compares + these decoded segments, not the raw wire string, so an encoding the server + later undoes can't hide a ``..``. Loop bounded: a real path needs 0-1 passes. + """ + decoded = path + for _ in range(10): # bounded; each pass peels one percent-encoding layer + nxt = urllib.parse.unquote(decoded) + if nxt == decoded: + break + decoded = nxt + return decoded.replace('\\', '/').split('/') + + +def _endpoint_path_under_issuer(endpoint: str, issuer: str) -> bool: + """ + True if ``endpoint``'s path is the issuer's path or a sub-path of it. + + Segment-aware, so ``/realms/prod`` does not match ``/realms/production``. A + root issuer (no path) constrains the origin only and matches any path. Stops + a tampered ``/settings`` from redirecting credentials to a different tenant + on a path-based multi-tenant IdP (Keycloak issuers are + ``https://host/realms/{realm}``), which an origin-only check can't catch. + + Compared on fully *decoded* path segments, not the raw wire string. A ``.`` / + ``..`` segment is rejected outright: the server normalizes it, so + ``/realms/prod/../attacker/token`` passes a naive prefix test yet resolves to + a *different* realm. ``_decode_path_segments`` unmasks encoded dot segments, + and the last segment's ``;params`` (which urllib splits off ``.path``) is + folded back, so neither can hide a traversal. Legitimate paths have no dot + segments. 
+ """ + base = (safe_urlparse(issuer)[0].path or '').rstrip('/') + if not base: + return True + base_segs = _decode_path_segments(base) + eparts = safe_urlparse(endpoint)[0] + # Fold the last segment's ;params back into the path so a traversal hidden + # there (…/token;..%2f..%2fEVIL) can't slip past the scan. + ep_path = eparts.path or '' + if eparts.params: + ep_path = f'{ep_path};{eparts.params}' + ep_segs = _decode_path_segments(ep_path) + if '.' in ep_segs or '..' in ep_segs: + return False + return ep_segs[:len(base_segs)] == base_segs + + +def validate_endpoint_origins( + token_endpoint: str, + device_authorization_endpoint: str, + issuer: Optional[str] = None) -> None: + """ + Reject an OIDC configuration that would send credentials off-origin. + + The device code and long-lived refresh token are POSTed to the device- + authorization and token endpoints. This keeps a tampered or MITM'd config + from steering those credentials to an attacker host: + + * the two credential endpoints must share a single origin (always co-located + on the authorization server per RFC 8628); and + * when ``issuer`` is known independently (explicit or from the IdP + ``.well-known``), both endpoints must share its **origin**. + + Origin-level only: it does **not** isolate path-based multi-tenant realms + (e.g. Keycloak ``https://host/realms/{realm}``, where all realms share one + origin). That + path-scoping lives in :func:`resolve_config`, and only for endpoints from the + untrusted QuestDB ``/settings``; endpoints from IdP discovery or the caller + are authoritative and not path-restricted (some IdPs, e.g. Azure AD, + legitimately place endpoints outside the issuer path). + + Pass ``issuer=`` to pin the IdP when QuestDB advertises the endpoints + directly, so a compromised server cannot redirect the token POST.
+ """ + if _normalized_origin(token_endpoint) != _normalized_origin( + device_authorization_endpoint): + raise OidcConfigError( + 'OIDC token and device-authorization endpoints are on different ' + f'origins ({_origin_str(token_endpoint)} vs ' + f'{_origin_str(device_authorization_endpoint)}); refusing to send ' + 'credentials. This indicates a misconfigured or tampered OIDC ' + 'configuration.') + if issuer: + issuer_origin = _normalized_origin(issuer) + for label, url in ( + ('token endpoint', token_endpoint), + ('device-authorization endpoint', + device_authorization_endpoint)): + if _normalized_origin(url) != issuer_origin: + raise OidcConfigError( + f'OIDC {label} origin ({_origin_str(url)}) does not match ' + f'the issuer origin ({_origin_str(issuer)}); refusing to ' + 'send credentials to an endpoint outside the trusted ' + 'issuer.') + + +def _resolve_endpoint(value: Optional[str], cfg: Dict[str, Any]) -> Optional[str]: + """ + Turn a possibly-relative endpoint into a full URL. + + QuestDB usually exports fully-resolved URLs, but some deployments store + only the path (e.g. ``/as/token.oauth2``) alongside ``acl.oidc.host``. + """ + if not value: + return None + if not isinstance(value, str): + # Non-string endpoint (e.g. a JSON number): treat as absent so resolution + # yields a clear OidcConfigError instead of an AttributeError from + # .startswith() escaping the typed-error contract. + return None + if value.startswith('http://') or value.startswith('https://'): + return value + if value.startswith('/'): + # _str_setting drops a non-string acl.oidc.host so it can't be + # interpolated raw into the netloc (e.g. https://12345:9000/path). + host = _str_setting(cfg.get(_K_HOST)) + if not host: + # Path-only endpoint with no host to resolve against: treat as absent + # for the clear "could not resolve" error, rather than passing a + # scheme-less "/path" on to a confusing "malformed URL" downstream. 
+ return None + tls = _as_bool(cfg.get(_K_TLS_ENABLED), default=True) + scheme = 'https' if tls else 'http' + # A usable port is an int or digit string; anything else would corrupt + # the netloc, so drop it and resolve host-only. + port = cfg.get(_K_PORT) + if isinstance(port, bool) or not ( + isinstance(port, int) + or (isinstance(port, str) and port.isdigit())): + port = None + netloc = f'{host}:{port}' if port else host + return f'{scheme}://{netloc}{value}' + return value + + +def well_known_url(issuer: str) -> str: + return issuer.rstrip('/') + '/.well-known/openid-configuration' + + +def discover_device_endpoint_from_idp( + *, + issuer: Optional[str], + discovery_url: Optional[str], + ctx: Optional[ssl.SSLContext] = None, + insecure: bool = False, + timeout: float = 30) -> Dict[str, Any]: + """ + Fetch the IdP ``.well-known/openid-configuration`` and return it. + + The discovery URL comes from ``discovery_url``, else built from ``issuer``; + one is required. The discovery origin is **never** derived from a + QuestDB-advertised endpoint — that would let a tampered ``/settings`` choose + where credentials are sent, with the co-location / issuer-pin checks passing + trivially because every value would share the attacker's origin. + """ + url = discovery_url or (well_known_url(issuer) if issuer else None) + if not url: + raise OidcConfigError( + 'Cannot discover the IdP device-authorization endpoint: no issuer ' + 'or discovery_url was given. Pass issuer=... (or ' + 'device_authorization_endpoint=... to skip discovery).') + doc = get_json(url, ctx=ctx, insecure=insecure, timeout=timeout) + # get_json guarantees valid JSON, not a JSON *object*. Coerce a non-dict + # document (from a captive portal, bad proxy, or hostile IdP) to empty so + # resolve_config's doc.get(...) yields a clear "could not resolve" error + # rather than an AttributeError. Mirrors settings_config. 
+ return doc if isinstance(doc, dict) else {} + + +def resolve_config( + *, + questdb_url: Optional[str] = None, + client_id: Optional[str] = None, + scope: Optional[str] = None, + audience: Optional[str] = None, + groups_in_token: Optional[bool] = None, + token_endpoint: Optional[str] = None, + device_authorization_endpoint: Optional[str] = None, + authorization_endpoint: Optional[str] = None, + issuer: Optional[str] = None, + discovery_url: Optional[str] = None, + ctx: Optional[ssl.SSLContext] = None, + insecure: bool = False, + timeout: float = 30) -> OidcConfig: + """ + Resolve a complete :class:`OidcConfig`. + + Explicit keyword arguments always win; anything left ``None`` is filled in + from QuestDB ``/settings`` (if ``questdb_url`` is given) and, as a last + resort for the device endpoint, the IdP discovery document. + """ + cfg: Dict[str, Any] = {} + if questdb_url: + cfg = fetch_settings( + questdb_url, ctx=ctx, insecure=insecure, timeout=timeout) + enabled = _as_bool(cfg.get(_K_ENABLED), default=None) + if enabled is False: + raise OidcConfigError( + f'QuestDB at {questdb_url} reports OIDC is disabled ' + f'({_K_ENABLED}=false). Nothing to authenticate against.') + + # _str_setting drops a non-string /settings value so a non-string client.id + # reads as absent and hits the clear "Missing client_id" error below. + client_id = client_id or _str_setting(cfg.get(_K_CLIENT_ID)) + if not client_id: + raise OidcConfigError( + 'Missing OIDC client_id. QuestDB did not advertise ' + f'{_K_CLIENT_ID!r} via /settings; pass client_id=... 
explicitly.') + + if scope is None: + scope = _str_setting(cfg.get(_K_SCOPE)) or 'openid' + if groups_in_token is None: + groups_in_token = _as_bool(cfg.get(_K_GROUPS_IN_TOKEN), default=True) + if audience is None: + audience = _str_setting(cfg.get(_K_AUDIENCE)) + + # Track caller-supplied credential endpoints: those are trusted, whereas + # /settings endpoints are only as trustworthy as the channel that delivered + # them (see the insecure-channel guard below). An empty string counts as + # not supplied, matching the `or` fallback to /settings just below. + explicit_token_endpoint = bool(token_endpoint) + explicit_device_endpoint = bool(device_authorization_endpoint) + + token_endpoint = ( + token_endpoint or _resolve_endpoint(cfg.get(_K_TOKEN_ENDPOINT), cfg)) + authorization_endpoint = ( + authorization_endpoint + or _resolve_endpoint(cfg.get(_K_AUTHORIZATION_ENDPOINT), cfg)) + device_authorization_endpoint = ( + device_authorization_endpoint + or _resolve_endpoint(cfg.get(_K_DEVICE_ENDPOINT), cfg)) + + # Over a plaintext-http /settings channel (insecure=True, non-loopback), a + # tampered response can advertise BOTH credential endpoints at one attacker + # origin: the discovery path below is skipped, co-location passes trivially + # (shared origin) and the issuer-pin check is vacuous (no issuer), so nothing + # else catches it. Demand the same out-of-band pin (issuer= / discovery_url=) + # before trusting /settings endpoints here. Caller-explicit endpoints and + # those from an authenticated (https / loopback) /settings are unaffected.
+ settings_supplied_credentials = ( + (token_endpoint and not explicit_token_endpoint) + or (device_authorization_endpoint and not explicit_device_endpoint)) + if (questdb_url and settings_supplied_credentials + and not issuer and not discovery_url + and _settings_channel_is_plaintext(questdb_url)): + raise OidcConfigError( + 'QuestDB was reached over plaintext http (insecure=True), so its ' + '/settings response — and the OIDC endpoints it advertises — can be ' + 'tampered in transit and used to redirect the device-code and ' + 'refresh-token requests to an attacker. Pin the identity provider ' + 'out-of-band with issuer="https://your-idp" (or discovery_url=...), ' + 'pass the endpoints explicitly (token_endpoint=..., ' + 'device_authorization_endpoint=...), or connect to QuestDB over ' + 'https so /settings is authenticated.') + + # For /settings endpoints with an out-of-band issuer, require each under the + # issuer's PATH, not just its origin: on a path-based IdP all tenants share + # one origin (Keycloak https://host/realms/{realm}), so the origin check alone + # can't stop a tampered /settings steering credentials to a different realm. + # The out-of-band issuer can't be forged. Caller-explicit endpoints and those + # from IdP discovery are authoritative and skip this — some IdPs (e.g. Azure + # AD) legitimately place endpoints outside the issuer path. + if issuer: + for label, url, from_settings in ( + ('token endpoint', token_endpoint, + not explicit_token_endpoint), + ('device-authorization endpoint', + device_authorization_endpoint, not explicit_device_endpoint)): + if url and from_settings and not _endpoint_path_under_issuer( + url, issuer): + raise OidcConfigError( + f'The OIDC {label} advertised by QuestDB /settings ' + f'({url!r}) is not under the pinned issuer ({issuer!r}); ' + 'refusing to send credentials to an endpoint outside the ' + 'trusted issuer (e.g. a different realm on the same host). 
' + 'If your IdP places endpoints outside the issuer path, pass ' + 'them explicitly (token_endpoint=..., ' + 'device_authorization_endpoint=...).') + + # Fall back to IdP discovery when QuestDB doesn't advertise the device + # (and/or token) endpoint. This contacts the IdP, so it is held to + # https/loopback (insecure=False) regardless of the QuestDB flag. + if not device_authorization_endpoint or not token_endpoint: + # Require an out-of-band trust anchor first. Otherwise the discovery + # target would be guessed from the /settings token endpoint, so a + # tampered /settings could steer discovery (and the credential POSTs) to + # an attacker origin with co-location / issuer-pin passing trivially. + if not issuer and not discovery_url: + raise OidcConfigError( + 'QuestDB did not advertise the OIDC device-authorization ' + 'endpoint (and/or the token endpoint), so it must be ' + 'discovered from the identity provider, but the IdP is not ' + 'pinned. Pass issuer="https://your-idp" (its origin) so a ' + 'tampered or intercepted /settings response cannot redirect ' + 'the device-code and refresh-token requests to an attacker. ' + 'Alternatively pass the endpoint(s) explicitly ' + '(device_authorization_endpoint=..., token_endpoint=...) to ' + 'skip discovery, or discovery_url=... to pin the discovery ' + 'document.') + doc = discover_device_endpoint_from_idp( + issuer=issuer, discovery_url=discovery_url, + ctx=ctx, insecure=False, timeout=timeout) + # The discovery document is untrusted too: coerce its values like + # /settings. A non-string endpoint / issuer reads as absent (clear + # "could not resolve" below, or no issuer pin) instead of reaching + # safe_urlparse / the cache-key join as a raw object. 
+ device_authorization_endpoint = ( + device_authorization_endpoint + or _str_setting(doc.get('device_authorization_endpoint'))) + token_endpoint = ( + token_endpoint or _str_setting(doc.get('token_endpoint'))) + authorization_endpoint = ( + authorization_endpoint + or _str_setting(doc.get('authorization_endpoint'))) + # OIDC Discovery §4.3 / RFC 8414 §3: when pinned ONLY by discovery_url, + # the document's self-declared issuer (the anchor + # validate_endpoint_origins would use) comes from that same untrusted + # document, so it's a vacuous check. Anchor to the caller-pinned + # discovery_url instead: require the credential endpoints on its origin + # so the document can't redirect the POSTs off it. Origin-level; pass + # issuer= and explicit endpoints if discovery and tokens differ in origin. + if discovery_url and not issuer: + discovery_origin = _normalized_origin(discovery_url) + for label, url in ( + ('token endpoint', token_endpoint), + ('device-authorization endpoint', + device_authorization_endpoint)): + if url and _normalized_origin(url) != discovery_origin: + raise OidcConfigError( + f'The OIDC {label} ({url!r}) discovered via the pinned ' + f'discovery_url ({discovery_url!r}) is on a different ' + 'origin; refusing to let a discovery document redirect ' + 'credentials off the pinned IdP origin (OIDC Discovery ' + '§4.3). Pin the IdP with issuer="https://your-idp" and ' + 'pass token_endpoint=/device_authorization_endpoint= ' + 'explicitly if it serves discovery and tokens from ' + 'different origins.') + issuer = issuer or _str_setting(doc.get('issuer')) + + if not token_endpoint: + raise OidcConfigError( + 'Could not resolve the OIDC token endpoint from QuestDB /settings ' + 'or IdP discovery. Pass token_endpoint=... explicitly.') + if not device_authorization_endpoint: + raise OidcConfigError( + 'Could not resolve the device-authorization endpoint. The IdP ' + 'discovery document did not contain ' + '"device_authorization_endpoint". 
Ensure the IdP supports the ' + 'device grant, or pass device_authorization_endpoint=... ' + 'explicitly.') + + # The credential-endpoint origin check (validate_endpoint_origins) is + # enforced centrally in OidcDeviceAuth.__init__, which every path goes + # through. + + return OidcConfig( + client_id=client_id, + token_endpoint=token_endpoint, + device_authorization_endpoint=device_authorization_endpoint, + scope=scope, + groups_in_token=bool(groups_in_token), + audience=audience, + issuer=issuer, + authorization_endpoint=authorization_endpoint) diff --git a/src/questdb/auth/_errors.py b/src/questdb/auth/_errors.py new file mode 100644 index 00000000..17b83e9e --- /dev/null +++ b/src/questdb/auth/_errors.py @@ -0,0 +1,89 @@ +################################################################################ +## ___ _ ____ ____ +## / _ \ _ _ ___ ___| |_| _ \| __ ) +## | | | | | | |/ _ \/ __| __| | | | _ \ +## | |_| | |_| | __/\__ \ |_| |_| | |_) | +## \__\_\\__,_|\___||___/\__|____/|____/ +## +## Copyright (c) 2014-2019 Appsicle +## Copyright (c) 2019-2024 QuestDB +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +################################################################################ + +"""Exceptions raised by :mod:`questdb.auth`.""" + +from __future__ import annotations + +from typing import Optional + + +class OidcError(Exception): + """Base class for every error raised by :mod:`questdb.auth`.""" + + def __init__(self, *args, status: Optional[int] = None): + super().__init__(*args) + # HTTP status behind a non-JSON HTTP response (else None), so the poll + # loop and silent refresh can tell a terminal 4xx (e.g. a WAF error + # page) from a transient 5xx/429/network blip. + self.status = status + + +class OidcConfigError(OidcError): + """ + The OIDC configuration could not be resolved or is inconsistent (e.g. + QuestDB does not advertise OIDC, the IdP device-authorization endpoint + cannot be discovered, or a required argument is missing). + """ + + +class OidcNetworkError(OidcError): + """A network-level failure while talking to QuestDB or the IdP.""" + + +class OidcInteractionRequired(OidcError): + """ + Interactive sign-in is required, but raised instead of hanging in a + non-interactive context (``papermill``, cron, CI). Use a QuestDB + service-account REST token or the OAuth2 client-credentials grant there. + """ + + +class OidcDeviceFlowError(OidcError): + """ + The OAuth 2.0 device authorization grant failed; the IdP + ``error``/``error_description`` are preserved when available. + """ + + def __init__( + self, + message: str, + *, + error: Optional[str] = None, + error_description: Optional[str] = None): + super().__init__(message) + self.error = error + self.error_description = error_description + + +class OidcTimeoutError(OidcDeviceFlowError): + """The user did not authorize the device in time (the code expired).""" + + +class OidcAuthError(OidcError): + """ + QuestDB rejected the token (typically a ``401``/``403`` from the server); + the message hints at common causes (scope / ``groups.encoded.in.token`` / + ``audience`` mismatches). 
+ """ diff --git a/src/questdb/auth/_http.py b/src/questdb/auth/_http.py new file mode 100644 index 00000000..a07444ed --- /dev/null +++ b/src/questdb/auth/_http.py @@ -0,0 +1,289 @@ +################################################################################ +## ___ _ ____ ____ +## / _ \ _ _ ___ ___| |_| _ \| __ ) +## | | | | | | |/ _ \/ __| __| | | | _ \ +## | |_| | |_| | __/\__ \ |_| |_| | |_) | +## \__\_\\__,_|\___||___/\__|____/|____/ +## +## Copyright (c) 2014-2019 Appsicle +## Copyright (c) 2019-2024 QuestDB +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +################################################################################ + +""" +A tiny stdlib-only HTTP helper. + +Avoids a hard dependency on ``requests``/``httpx`` so ``OidcDeviceAuth.token()`` +/ ``headers()`` work with no extra installs. Only the device flow, discovery and +the REST adapter use this; heavier adapters (SQLAlchemy / psycopg / ingestion +``Sender``) bring their own transports. + +``urllib`` honours the standard proxy env vars (``HTTPS_PROXY`` / ``HTTP_PROXY`` +/ ``NO_PROXY``); a custom CA bundle can come from ``REQUESTS_CA_BUNDLE`` / +``SSL_CERT_FILE``. 
+""" + +from __future__ import annotations + +import http.client +import ipaddress +import json +import os +import ssl +import urllib.error +import urllib.parse +import urllib.request +from typing import Any, Dict, Mapping, Optional + +from ._errors import OidcConfigError, OidcNetworkError, OidcError + +_DEFAULT_TIMEOUT = 30 +_USER_AGENT = 'questdb-python-client (oidc-auth)' + + +def build_ssl_context(ca_bundle: Optional[str] = None) -> ssl.SSLContext: + """ + Build an SSL context from an explicit CA bundle or the ``REQUESTS_CA_BUNDLE`` + / ``SSL_CERT_FILE`` env vars (useful behind a TLS-intercepting proxy). + """ + ca = ( + ca_bundle + or os.environ.get('REQUESTS_CA_BUNDLE') + or os.environ.get('SSL_CERT_FILE')) + if not ca: + return ssl.create_default_context() + # Map the raw FileNotFoundError / ssl.SSLError from a missing/invalid bundle + # to a typed error so a mistyped path fails clearly. + try: + if os.path.isdir(ca): + return ssl.create_default_context(capath=ca) + return ssl.create_default_context(cafile=ca) + except (OSError, ssl.SSLError) as e: + raise OidcConfigError( + f'Could not load the CA bundle {ca!r}: {e}. Check the path points ' + 'to a readable PEM/DER certificate file (or a directory of them).' + ) from e + + +class HttpResponse: + """A minimal response wrapper (status + raw body + headers).""" + + __slots__ = ('status', 'body', 'headers') + + def __init__(self, status: int, body: bytes, headers: Mapping[str, str]): + self.status = status + self.body = body + self.headers = dict(headers) + + def text(self) -> str: + return self.body.decode('utf-8', errors='replace') + + def json(self) -> Any: + return json.loads(self.body.decode('utf-8')) + + @property + def ok(self) -> bool: + return 200 <= self.status < 300 + + +def safe_urlparse(url: str) -> tuple: + """ + ``urlparse(url)`` paired with its port, but with a typed error. 
+ + Both ``urlparse`` (malformed IPv6 literal) and ``ParseResult.port`` + (non-integer port) raise a bare ``ValueError``; re-raise as + :class:`OidcConfigError` to keep a malformed URL within the error contract. + Returns ``(parts, port)``. + """ + try: + parts = urllib.parse.urlparse(url) + return parts, parts.port + except ValueError as e: + raise OidcConfigError( + f'Malformed endpoint URL {url!r}: {e}.') from e + + +def _is_loopback(host: Optional[str]) -> bool: + # Loopback traffic never leaves the host, so plaintext http is safe here. + if not host: + return False + if host.lower() == 'localhost': + return True + try: + return ipaddress.ip_address(host).is_loopback + except ValueError: + return False + + +def _require_secure(url: str, insecure: bool) -> None: + # safe_urlparse maps a malformed URL to OidcConfigError, not a bare ValueError. + parts, _ = safe_urlparse(url) + scheme = parts.scheme.lower() + if scheme == 'https': + return + if scheme == 'http': + if _is_loopback(parts.hostname): + return + if insecure: + return + raise OidcConfigError( + f'Refusing to use insecure URL {url!r} (scheme {scheme!r}). Use https ' + '(loopback http is always allowed for local development); pass ' + 'insecure=True only to permit plaintext to a non-loopback host.') + + +class _NoRedirect(urllib.request.HTTPRedirectHandler): + """Refuse to follow HTTP redirects. + + These endpoints never legitimately redirect, and auto-following is unsafe: + only the *original* URL is vetted (``_require_secure`` / + ``validate_endpoint_origins`` never see the target), and urllib does not + strip ``Authorization`` on a cross-origin redirect — so one ``302`` from + ``/exec`` could re-send the bearer token to an attacker host, even + downgrading to plaintext ``http``. + + Returning ``None`` surfaces the ``30x`` as an ``HTTPError`` (which + :func:`request` turns into a non-2xx :class:`HttpResponse`), giving callers a + clean failure. 
+ """ + + def redirect_request(self, *args, **kwargs): + return None + + +def _opener(ctx: Optional[ssl.SSLContext]) -> urllib.request.OpenerDirector: + # build_opener keeps the default ProxyHandler (reads *_PROXY env vars) while + # letting us pin our own TLS context and forbid redirects. + handlers: list = [_NoRedirect()] + if ctx is not None: + handlers.append(urllib.request.HTTPSHandler(context=ctx)) + return urllib.request.build_opener(*handlers) + + +def request( + method: str, + url: str, + *, + form: Optional[Mapping[str, Any]] = None, + data: Optional[bytes] = None, + headers: Optional[Mapping[str, str]] = None, + timeout: float = _DEFAULT_TIMEOUT, + ctx: Optional[ssl.SSLContext] = None, + insecure: bool = False) -> HttpResponse: + """ + Perform a single HTTP request. + + ``form`` is encoded into the body as ``application/x-www-form-urlencoded``. + HTTP error statuses (``4xx``/``5xx``) are returned as an + :class:`HttpResponse`, not raised, so callers can inspect OAuth error bodies + (e.g. ``authorization_pending``); only genuine network failures raise + (:class:`OidcNetworkError`). + """ + _require_secure(url, insecure) + body: Optional[bytes] = data + req_headers = {'User-Agent': _USER_AGENT, 'Accept': 'application/json'} + if form is not None: + body = urllib.parse.urlencode( + {k: v for k, v in form.items() if v is not None}).encode('utf-8') + req_headers['Content-Type'] = 'application/x-www-form-urlencoded' + if headers: + req_headers.update(headers) + + req = urllib.request.Request( + url, data=body, headers=req_headers, method=method.upper()) + try: + with _opener(ctx).open(req, timeout=timeout) as resp: + return HttpResponse( + getattr(resp, 'status', resp.getcode()), + resp.read(), + resp.headers) + except urllib.error.HTTPError as e: + # 4xx/5xx still carry a (possibly JSON) body to inspect. 
Map a mid-body + # read failure to a network error, and close the response so its socket + # isn't leaked (the poll loop drives many 400s during a long sign-in). + try: + body = e.read() + except (TimeoutError, OSError) as read_err: + raise OidcNetworkError( + f'Failed to read response from {url}: {read_err}') from read_err + finally: + e.close() + return HttpResponse(e.code, body, e.headers or {}) + except urllib.error.URLError as e: + raise OidcNetworkError(f'Failed to reach {url}: {e.reason}') from e + except http.client.InvalidURL as e: + # A malformed URL (e.g. non-integer port) can't become a request; + # surface it as a config error, not a raw http.client exception. + raise OidcConfigError(f'Malformed URL {url!r}: {e}') from e + except (TimeoutError, OSError, http.client.HTTPException) as e: + raise OidcNetworkError(f'Failed to reach {url}: {e}') from e + + +def get_json( + url: str, + *, + headers: Optional[Mapping[str, str]] = None, + timeout: float = _DEFAULT_TIMEOUT, + ctx: Optional[ssl.SSLContext] = None, + insecure: bool = False) -> Any: + """GET a URL and parse a JSON response, raising on non-2xx.""" + resp = request( + 'GET', url, headers=headers, timeout=timeout, ctx=ctx, + insecure=insecure) + if not resp.ok: + raise OidcError( + f'HTTP {resp.status} from {url}: {resp.text()[:200]}') + try: + return resp.json() + except (ValueError, UnicodeDecodeError, RecursionError) as e: + # RecursionError (deeply-nested JSON) isn't a ValueError, so catch it + # explicitly to keep the typed contract. + raise OidcError(f'Invalid JSON from {url}: {e}') from e + + +def post_form( + url: str, + form: Mapping[str, Any], + *, + headers: Optional[Mapping[str, str]] = None, + timeout: float = _DEFAULT_TIMEOUT, + ctx: Optional[ssl.SSLContext] = None, + insecure: bool = False) -> tuple[int, Dict[str, Any]]: + """ + POST a form-url-encoded body and parse the JSON response. + + Returns ``(status, parsed_json)``. 
Used for the device-authorization and + token endpoints, which return JSON on both success and error. + """ + resp = request( + 'POST', url, form=form, headers=headers, timeout=timeout, ctx=ctx, + insecure=insecure) + try: + parsed = resp.json() + except (ValueError, UnicodeDecodeError, RecursionError): + # RecursionError (deeply-nested JSON) isn't a ValueError, so catch it + # explicitly to keep the typed contract. + if resp.ok: + raise OidcError( + f'Expected JSON from {url}, got: {resp.text()[:200]}', + status=resp.status) + # Non-JSON error body: attach the HTTP status so callers (poll loop / + # silent refresh) can tell a terminal 4xx from a transient 5xx/429. + raise OidcError( + f'HTTP {resp.status} from {url}: {resp.text()[:200]}', + status=resp.status) + if not isinstance(parsed, dict): + raise OidcError(f'Unexpected JSON shape from {url}: {parsed!r}') + return resp.status, parsed diff --git a/src/questdb/auth/_questdb.py b/src/questdb/auth/_questdb.py new file mode 100644 index 00000000..2cb7da05 --- /dev/null +++ b/src/questdb/auth/_questdb.py @@ -0,0 +1,405 @@ +################################################################################ +## ___ _ ____ ____ +## / _ \ _ _ ___ ___| |_| _ \| __ ) +## | | | | | | |/ _ \/ __| __| | | | _ \ +## | |_| | |_| | __/\__ \ |_| |_| | |_) | +## \__\_\\__,_|\___||___/\__|____/|____/ +## +## Copyright (c) 2014-2019 Appsicle +## Copyright (c) 2019-2024 QuestDB +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## +################################################################################ + +"""High-level QuestDB session: token, REST queries, and connection adapters.""" + +from __future__ import annotations + +import os +import re +import urllib.parse +from typing import Any, Dict, Optional + +from ._device import OidcDeviceAuth +from ._errors import OidcAuthError, OidcConfigError, OidcError +from ._http import request, safe_urlparse + +_DEFAULT_PG_PORT = 8812 +_DEFAULT_DATABASE = 'qdb' + +# Reject ILP conf-string delimiters (';', '=') and whitespace/control chars in +# the host: a real hostname/IP never has them, so their presence means a +# tampered URL trying to inject conf params like ';tls_verify=unsafe_off;' +# (which disables TLS verification) into the addr= string. ':' is allowed +# (IPv6 literals contain it; _ilp_addr brackets them). +_ILLEGAL_HOST_CHARS = re.compile(r'[\x00-\x20\x7f;=]') + +_AUTH_HINT = ( + 'QuestDB rejected the token (HTTP {status}). Common causes:\n' + " * scope / 'acl.oidc.groups.encoded.in.token' mismatch — the server may " + 'expect the id_token (groups in token) while an access_token was sent, or ' + 'vice-versa;\n' + " * the 'groups'/'sub' claim is missing — check the requested scope;\n" + " * 'aud' mismatch — the token's audience does not match " + "'acl.oidc.audience' (try passing audience=...).") + + +def _import_pandas(): + try: + import pandas # type: ignore + return pandas + except ImportError as e: + raise ImportError( + 'Missing optional dependency `pandas`, required for ' + 'QuestDB.sql(). Install it with `pip install questdb[dataframe]`. ' + 'See https://py-questdb-client.readthedocs.io/en/latest/' + 'installation.html') from e + + +def _exec_json_to_df(data: Dict[str, Any], pandas): + columns = data.get('columns') or [] + # /exec returns a list of {"name", "type"} descriptors. 
Validate the shape + # (a real column name is always a string) so a malformed response raises a + # clean OidcError rather than a raw AttributeError from .get() or a + # TypeError from `name in df.columns` on a non-hashable name. + if not isinstance(columns, list) or not all( + isinstance(c, dict) + and isinstance(c.get('name'), (str, type(None))) + for c in columns): + raise OidcError( + 'QuestDB /exec returned a malformed "columns" field; ' + 'cannot build a DataFrame.') + names = [c.get('name') for c in columns] + dataset = data.get('dataset') + if dataset is None: + dataset = data.get('data') or [] + try: + df = pandas.DataFrame(dataset, columns=names or None) + except (ValueError, TypeError) as e: + # A malformed dataset shape can make the pandas constructor raise + # ValueError or TypeError; keep both within OidcError. + raise OidcError( + f'Unexpected shape in QuestDB /exec response: {e}') from e + for col in columns: + name = col.get('name') + if col.get('type') in ('TIMESTAMP', 'DATE') and name in df.columns: + try: + df[name] = pandas.to_datetime(df[name], errors='coerce') + except Exception: + pass + return df + + +def _pg_module(): + try: + import psycopg # type: ignore # psycopg v3 + return psycopg + except ImportError: + pass + try: + import psycopg2 # type: ignore + return psycopg2 + except ImportError as e: + raise ImportError( + 'A PostgreSQL driver is required: install `psycopg` (v3) or ' + '`psycopg2-binary`.') from e + + +class QuestDB: + """ + A thin, authenticated QuestDB session built on an :class:`OidcDeviceAuth`. + + Offers a one-call DataFrame query over REST plus adapters that feed the + same auto-refreshed token into SQLAlchemy / psycopg / the ingestion + ``Sender``, or take :meth:`token` / :meth:`headers` and wire it up yourself. 
+ """ + + def __init__( + self, + url: str, + auth: OidcDeviceAuth, + *, + insecure: bool = False): + self.url = url.rstrip('/') + self.auth = auth + self._insecure = insecure + self._ctx = auth._ctx + # Same private CA bundle the auth/REST transport uses, so sender() can + # forward it to the ILP Sender's own TLS stack. getattr keeps test + # doubles that only set _ctx working. + self._ca_bundle = getattr(auth, '_ca_bundle', None) + # safe_urlparse validates the port up-front, raising OidcConfigError + # (not a bare ValueError) for a malformed one. + self._parts, self._port = safe_urlparse(self.url) + + # -- token access ------------------------------------------------------- + + def token(self) -> str: + """Return a valid, auto-refreshed token (see :meth:`OidcDeviceAuth.token`).""" + return self.auth.token() + + def headers(self) -> Dict[str, str]: + """Return ``{"Authorization": "Bearer <token>"}``.""" + return self.auth.headers() + + # -- REST query --------------------------------------------------------- + + def sql(self, query: str, *, limit: Optional[str] = None, + timeout: float = 60) -> 'pandas.DataFrame': + """ + Run a SQL query over QuestDB's REST ``/exec`` endpoint and return a + :class:`pandas.DataFrame`. + + Uses ``Authorization: Bearer`` (no token-length limit, unlike PG-wire), + so it's the recommended path for large groups-encoded JWTs. + + :param query: The SQL query to run. + :param limit: Optional QuestDB ``limit`` (e.g. ``"1,1000"``). + :param timeout: Request timeout in seconds. + """ + pandas = _import_pandas() + params = {'query': query} + if limit is not None: + params['limit'] = limit + url = f'{self.url}/exec?' 
+ urllib.parse.urlencode(params) + resp = request( + 'GET', url, headers=self.headers(), ctx=self._ctx, + insecure=self._insecure, timeout=timeout) + if resp.status in (401, 403): + raise OidcAuthError(_AUTH_HINT.format(status=resp.status)) + if not resp.ok: + detail = resp.text()[:300] + try: + detail = resp.json().get('error', detail) + except Exception: + pass + raise OidcError( + f'QuestDB query failed (HTTP {resp.status}): {detail}') + try: + data = resp.json() + except (ValueError, UnicodeDecodeError, RecursionError): + # A 2xx body that isn't JSON (e.g. an HTML page from a proxy/captive + # portal) or deeply-nested JSON that exhausts the decoder's stack + # (RecursionError) surfaces as a clean OidcError. Mirrors post_form(). + raise OidcError( + 'QuestDB returned a non-JSON success response from /exec: ' + f'{resp.text()[:300]}') + if not isinstance(data, dict): + # Valid JSON but not an object (e.g. a bare list) would break + # _exec_json_to_df's .get(); reject it. + raise OidcError( + 'QuestDB /exec returned JSON that is not an object ' + f'(got {type(data).__name__}); cannot build a DataFrame.') + return _exec_json_to_df(data, pandas) + + # -- connection adapters ------------------------------------------------ + + def _require_host(self, host: Optional[str] = None) -> str: + """ + Resolve the PG-wire / ILP host: an explicit ``host`` override, else the + host from the QuestDB URL. Raises (rather than passing a bare ``None`` + to the driver) when neither yields one, e.g. a URL with no authority + such as ``"localhost"`` or ``"questdb:9000"``. + + The returned host is *unbracketed* — psycopg and SQLAlchemy take address + and port separately. :meth:`_ilp_addr` adds the brackets an IPv6 literal + needs in the ILP ``addr=host:port`` form. + """ + resolved = host or self._parts.hostname + if not resolved: + raise OidcConfigError( + f'The QuestDB URL {self.url!r} has no host. Use a URL with an ' + 'explicit host (e.g. 
"https://questdb.example.com:9000"), or ' + 'pass host=... to the adapter.') + if _ILLEGAL_HOST_CHARS.search(resolved): + raise OidcConfigError( + f'The QuestDB host {resolved!r} contains an illegal character ' + "(';', '=', whitespace or a control character). A hostname or " + 'IP address never does; this indicates a malformed or tampered ' + 'URL. (Such a host could otherwise inject ILP conf parameters ' + 'such as "tls_verify=unsafe_off" into the sender, silently ' + 'disabling TLS certificate verification.)') + return resolved + + @staticmethod + def _ilp_addr(host: str, port: int) -> str: + # Bracket an IPv6 literal (it contains ':', unlike hostnames/IPv4) so + # the ILP conf parser reads host:port unambiguously. + bracketed = f'[{host}]' if ':' in host else host + return f'{bracketed}:{port}' + + def sqlalchemy_engine( + self, + *, + host: Optional[str] = None, + pg_port: int = _DEFAULT_PG_PORT, + database: str = _DEFAULT_DATABASE, + drivername: Optional[str] = None, + **engine_kwargs) -> 'sqlalchemy.engine.Engine': + """ + Build a SQLAlchemy ``Engine`` for QuestDB's PG-wire endpoint. + + Connects as user ``_sso``, injecting a **fresh** token as the password + on every new connection (via a ``do_connect`` listener) so pooled + connections always authenticate with a valid token. Requires + ``acl.oidc.pg.token.as.password.enabled=true`` on the server. 
+ """ + try: + from sqlalchemy import create_engine, event + from sqlalchemy.engine import URL + except ImportError as e: + raise ImportError( + 'SQLAlchemy is required for QuestDB.sqlalchemy_engine(); ' + 'install it with `pip install sqlalchemy`.') from e + + if drivername is None: + mod = _pg_module() + drivername = ( + 'postgresql+psycopg' + if mod.__name__ == 'psycopg' + else 'postgresql+psycopg2') + + url = URL.create( + drivername=drivername, + username='_sso', + host=self._require_host(host), + port=pg_port, + database=database) + engine = create_engine(url, **engine_kwargs) + + auth = self.auth + + @event.listens_for(engine, 'do_connect') + def _provide_token(dialect, conn_rec, cargs, cparams): # noqa: ANN001 + cparams['password'] = auth.token() + + return engine + + def psycopg( + self, + *, + host: Optional[str] = None, + pg_port: int = _DEFAULT_PG_PORT, + database: str = _DEFAULT_DATABASE, + **connect_kwargs) -> 'Any': + """ + Open a raw psycopg (v3) or psycopg2 connection to QuestDB's PG-wire + endpoint, authenticating as ``_sso`` with the current token. + + The token is captured at connect time; reconnect to pick up a refreshed + token. + """ + mod = _pg_module() + return mod.connect( + host=self._require_host(host), + port=pg_port, + dbname=database, + user='_sso', + password=self.auth.token(), + **connect_kwargs) + + def sender(self, *, port: Optional[int] = None, + **sender_kwargs) -> 'questdb.ingress.Sender': + """ + Build a :class:`questdb.ingress.Sender` (ILP-over-HTTP) for ingestion, + configured with the current bearer token. + + The token is captured at creation time; create a new sender to pick up + a refreshed token. 
+ """ + scheme = 'https' if self._parts.scheme == 'https' else 'http' + resolved_port = port or self._port or ( + 443 if scheme == 'https' else 9000) + # Coerce to int (before the heavy import, so bad input fails fast) so a + # stray non-integer port can't inject conf params like + # "9000;tls_verify=unsafe_off" into the addr= string via _ilp_addr — + # the same injection _require_host() blocks for the host. The + # URL-derived self._port is already an int. + try: + resolved_port = int(resolved_port) + except (TypeError, ValueError): + raise OidcConfigError( + f'Invalid port {resolved_port!r} for QuestDB.sender(); expected ' + 'an integer.') + + try: + from questdb.ingress import Sender + except ImportError as e: + raise ImportError( + 'The compiled `questdb.ingress` module is required for ' + 'QuestDB.sender(). Install the full client wheel ' + '(`pip install questdb`).') from e + + conf = (f'{scheme}::addr=' + f'{self._ilp_addr(self._require_host(), resolved_port)};') + # Forward the private CA bundle (explicit ca_bundle=, else the + # REQUESTS_CA_BUNDLE / SSL_CERT_FILE env vars — same precedence as + # build_ssl_context) to the Sender as tls_roots, so an https Sender + # against a private-CA QuestDB trusts the same roots the REST/IdP paths + # do. Only a PEM file works (tls_roots takes a file, no capath), only + # over https, and the caller can still override via tls_roots=/tls_ca=. 
+ if (scheme == 'https' + and 'tls_roots' not in sender_kwargs + and 'tls_ca' not in sender_kwargs): + ca = (self._ca_bundle + or os.environ.get('REQUESTS_CA_BUNDLE') + or os.environ.get('SSL_CERT_FILE')) + if ca and os.path.isfile(ca): + sender_kwargs['tls_roots'] = ca + return Sender.from_conf(conf, token=self.auth.token(), **sender_kwargs) + + +def connect( + url: str, + *, + flow: str = 'auto', + cache: Any = 'memory', + insecure: bool = False, + eager: bool = True, + **opts) -> QuestDB: + """ + High-level entry point: authenticate to QuestDB and return a + :class:`QuestDB` session. + + .. code-block:: python + + from questdb.auth import connect + + qdb = connect("https://questdb.example.com:9000") # signs in + df = qdb.sql("SELECT * FROM trades LIMIT 10") + + Configuration (OIDC client id, scope, endpoints, groups mode) is discovered + from ``{url}/settings`` and, as needed, the IdP ``.well-known`` document. + Re-running the same call reuses the cached token (no re-prompt). + + :param url: The QuestDB HTTP(S) base URL, e.g. + ``"https://questdb.example.com:9000"``. + :param flow: ``"auto"`` (default), ``"device"`` or ``"loopback"``. Today + ``"auto"`` resolves to the device flow (works on local and remote + kernels); ``"loopback"`` is reserved for a future release. + :param cache: Token cache backend: ``"memory"`` (default) or ``None``. + :param insecure: Allow plaintext ``http://`` URLs (development only). + :param eager: If ``True`` (default), sign in immediately; otherwise defer + until the first call that needs a token. + :param opts: Forwarded to :meth:`OidcDeviceAuth.from_questdb` (e.g. + ``client_id``, ``scope``, ``audience``, ``issuer``, ``open_browser``, + ``qr``, ``ca_bundle``, ``timeout`` — the per-request IdP network timeout, + which also bounds how long a stalled IdP can hold the token lock). 
+ """ + auth = OidcDeviceAuth.from_questdb( + url, flow=flow, cache=cache, insecure=insecure, **opts) + qdb = QuestDB(url, auth, insecure=insecure) + if eager: + auth.token() + return qdb diff --git a/src/questdb/auth/_render.py b/src/questdb/auth/_render.py new file mode 100644 index 00000000..85dc29c0 --- /dev/null +++ b/src/questdb/auth/_render.py @@ -0,0 +1,373 @@ +################################################################################ +## ___ _ ____ ____ +## / _ \ _ _ ___ ___| |_| _ \| __ ) +## | | | | | | |/ _ \/ __| __| | | | _ \ +## | |_| | |_| | __/\__ \ |_| |_| | |_) | +## \__\_\\__,_|\___||___/\__|____/|____/ +## +## Copyright (c) 2014-2019 Appsicle +## Copyright (c) 2019-2024 QuestDB +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +################################################################################ + +""" +Presentation of the device-flow prompt. + +Renders a clickable link + user code in Jupyter (via ``IPython.display``), +falling back to plain text on a terminal. Not required for ``token()`` / +``headers()``; ``IPython`` and ``qrcode`` are imported lazily. 
+""" + +from __future__ import annotations + +import html +import re +import sys +import urllib.parse +from typing import Any, Dict, Optional, TextIO + + +def in_ipython_kernel() -> bool: + """True when running inside an interactive IPython shell (a Jupyter/ZMQ kernel or terminal IPython).""" + try: + from IPython import get_ipython # type: ignore + except Exception: + return False + ip = get_ipython() + if ip is None: + return False + # ZMQInteractiveShell == notebook/qtconsole/lab; TerminalInteractiveShell + # == ipython in a terminal. + return ip.__class__.__name__ in ( + 'ZMQInteractiveShell', 'TerminalInteractiveShell') + + +def detect_interactive() -> bool: + """ + Best-effort detection of whether a human can complete the sign-in. + + Interactive when attached to a TTY or an interactive IPython shell; guards + against hanging forever in a non-interactive context (papermill/cron/CI). + """ + if in_ipython_kernel(): + return True + try: + return bool(sys.stdin and sys.stdin.isatty() + and sys.stdout and sys.stdout.isatty()) + except Exception: + return False + + +def _verification_uri(resp: Dict[str, Any]) -> str: + # RFC 8628 uses ``verification_uri``; some IdPs (older Google) use + # ``verification_url``. Coerce to str: the device response is untrusted, and + # a non-string (e.g. a JSON number) would crash the renderer. + uri = resp.get('verification_uri') or resp.get('verification_url') or '' + return uri if isinstance(uri, str) else '' + + +def _verification_uri_complete(resp: Dict[str, Any]) -> Optional[str]: + # Coerce to str/None for the same untrusted-input reason as _verification_uri. + uri = (resp.get('verification_uri_complete') + or resp.get('verification_url_complete')) + return uri if isinstance(uri, str) else None + + +def _safe_link_url(url: Optional[str]) -> Optional[str]: + """ + Return ``url`` only if it uses an ``http(s)`` scheme, else ``None``. 
+ + The verification URL is untrusted (from the IdP's device-authorization + response); the scheme allowlist blocks a ``javascript:`` / ``data:`` href + from executing in the notebook DOM (``html.escape`` guards markup, not the + scheme). + """ + if not url or not isinstance(url, str): + # A non-string has no scheme to vet and would make urlparse raise. + return None + try: + scheme = urllib.parse.urlparse(url).scheme.lower() + except (ValueError, TypeError): + return None + return url if scheme in ('http', 'https') else None + + +def _render_link(url: Optional[str], *, text: Optional[str] = None) -> str: + """ + Render ``url`` as a clickable link, or as inert escaped text if its scheme + is not ``http(s)``. + + The label defaults to the URL itself. A rejected URL is shown as escaped + plain text (still visible/copyable) but never made clickable. + """ + safe = _safe_link_url(url) + label = html.escape(text if text is not None else (url or '')) + if safe is None: + return label + return (f'<a href="{html.escape(safe)}">{label}</a>') + + +# Strips C0/C1/ESC, bidi overrides, zero-width and line/paragraph separators +# (all can spoof the prompt, e.g. U+202E reverses a URL's host). Applied to +# untrusted device-response fields on both paths; html.escape would not catch +# these. +_CONTROL_CHARS = re.compile( + r'[\x00-\x1f\x7f-\x9f\u00ad\u061c\u115f\u180e\u200b-\u200f' + r'\u2028-\u202e\u2060-\u2064\u2066-\u2069\ufeff\ufff9-\ufffb]') + + +def _strip_control(text: Optional[str]) -> str: + """ + Strip control / format characters from an untrusted string before display. + + The verification URL, user code and IdP error strings are untrusted; raw + ANSI escapes or bidi/zero-width/line-separator chars could spoof the prompt + or hide the real sign-in URL. Needed on both paths, since ``html.escape`` does + not catch bidi/zero-width spoofing. 
+ """ + if not text: + return '' + return _CONTROL_CHARS.sub('', text) + + +def format_prompt(resp: Dict[str, Any]) -> str: + """Plain-text sign-in prompt (also used as the notebook fallback).""" + uri = _strip_control(_verification_uri(resp)) + code = _strip_control(str(resp.get('user_code', ''))) + complete = _strip_control(_verification_uri_complete(resp)) + lines = [ + '🔐 Sign in to QuestDB', + f' Open {uri} and enter code: {code}', + ] + if complete: + lines.append(f' (or open directly: {complete})') + return '\n'.join(lines) + + +def _fmt_mmss(seconds: float) -> str: + seconds = max(0, int(seconds)) + return f'{seconds // 60}:{seconds % 60:02d}' + + +class Renderer: + """No-op renderer interface; subclasses present the prompt to the user.""" + + def on_prompt(self, resp: Dict[str, Any]) -> None: + pass + + def on_waiting(self, seconds_left: float) -> None: + pass + + def on_success(self, identity: Optional[str], expires_in: float) -> None: + pass + + def on_failure(self, message: str) -> None: + pass + + +class TerminalRenderer(Renderer): + """Plain-text rendering for terminals (writes to ``stderr`` by default).""" + + def __init__(self, stream: Optional[TextIO] = None, qr: bool = False): + self._stream = stream if stream is not None else sys.stderr + self._qr = qr + self._countdown_active = False + + def _write(self, text: str) -> None: + try: + try: + self._stream.write(text) + except UnicodeEncodeError: + # The stream's encoding can't represent some chars (e.g. the + # emoji on a legacy Windows console or ascii PYTHONIOENCODING). + # Degrade only those, so the URL/code don't vanish and look like + # a silent hang. 
+ enc = getattr(self._stream, 'encoding', None) or 'ascii' + self._stream.write( + text.encode(enc, 'replace').decode(enc, 'replace')) + self._stream.flush() + except Exception: + pass + + def on_prompt(self, resp: Dict[str, Any]) -> None: + self._write(format_prompt(resp) + '\n') + if self._qr: + target = _verification_uri_complete(resp) or _verification_uri(resp) + art = _qr_ascii(target) + if art: + self._write(art + '\n') + + def on_waiting(self, seconds_left: float) -> None: + self._countdown_active = True + self._write(f'\r ⏳ waiting for authorization… ({_fmt_mmss(seconds_left)} left) ') + + def on_success(self, identity: Optional[str], expires_in: float) -> None: + if self._countdown_active: + self._write('\n') + self._countdown_active = False + who = f' as {_strip_control(identity)}' if identity else '' + mins = max(1, int(round(expires_in / 60))) + self._write(f'✅ Signed in{who} — token cached, expires in {mins} min\n') + + def on_failure(self, message: str) -> None: + if self._countdown_active: + self._write('\n') + self._countdown_active = False + self._write(f'❌ {_strip_control(message)}\n') + + +class JupyterRenderer(Renderer): + """Rich rendering for Jupyter using an updatable display handle.""" + + def __init__(self, qr: bool = False): + self._qr = qr + self._handle = None + self._resp: Dict[str, Any] = {} + + def _display(self, html_str: str): + from IPython.display import HTML, display # type: ignore + if self._handle is None: + self._handle = display(HTML(html_str), display_id=True) + else: + self._handle.update(HTML(html_str)) + + def _panel(self, body: str) -> str: + return ( + '
' + + body + '
') + + def _prompt_head(self): + """Header + sanitized verification link and user code. + + Shared by :meth:`on_prompt` and :meth:`_render_with_status` so + sanitization is applied on both paths, never forgotten on one. The + untrusted device-response fields are stripped of control/bidi/zero-width + chars (which ``html.escape`` does NOT remove) before rendering; + ``_render_link`` also html-escapes and scheme-vets the URL. Returns + ``(body, uri, complete)`` so the QR target isn't re-derived. + """ + resp = self._resp + uri = _strip_control(_verification_uri(resp)) + code = html.escape(_strip_control(str(resp.get('user_code', '')))) + complete = _verification_uri_complete(resp) + complete = _strip_control(complete) if complete else None + body = [ + '
' + '🔐 Sign in to QuestDB
', + f'
Open {_render_link(uri)} and enter code:
', + f'
{code}
', + ] + if _safe_link_url(complete): + body.append( + '
' + _render_link( + complete, text='Click here to authorize directly →') + + '
') + return body, uri, complete + + def on_prompt(self, resp: Dict[str, Any]) -> None: + self._resp = resp + body, uri, complete = self._prompt_head() + if self._qr: + qr_target = _safe_link_url(complete) or _safe_link_url(uri) + data_uri = _qr_data_uri(qr_target) if qr_target else None + if data_uri: + body.append( + f'QR code') + body.append( + '
' + '⏳ waiting for authorization…
') + self._display(self._panel(''.join(body))) + + def on_waiting(self, seconds_left: float) -> None: + # Re-render the whole panel (cheap) with an updated countdown. + if not self._resp: + return + self._resp = dict(self._resp) + self._render_with_status( + f'⏳ waiting for authorization… ({_fmt_mmss(seconds_left)} left)', + color='#888') + + def on_success(self, identity: Optional[str], expires_in: float) -> None: + # identity comes from untrusted JWT claims: strip then html-escape. + who = html.escape(_strip_control(identity)) if identity else '' + mins = max(1, int(round(expires_in / 60))) + suffix = f' as {who}' if who else '' + self._render_with_status( + f'✅ Signed in{suffix} — token cached, expires in {mins} min', + color='#2e7d32') + + def on_failure(self, message: str) -> None: + # message may interpolate the IdP's untrusted error_description. + self._render_with_status( + '❌ ' + html.escape(_strip_control(message)), color='#c62828') + + def _render_with_status(self, status_html: str, color: str) -> None: + body, _uri, _complete = self._prompt_head() + body.append( + f'
{status_html}
') + self._display(self._panel(''.join(body))) + + +def make_renderer(qr: bool = False) -> Renderer: + """Pick a renderer appropriate for the current environment.""" + if in_ipython_kernel(): + try: + import IPython.display # noqa: F401 # type: ignore + return JupyterRenderer(qr=qr) + except Exception: + pass + return TerminalRenderer(qr=qr) + + +def _qr_ascii(data: str) -> Optional[str]: + if not data: + return None + try: + import qrcode # type: ignore + except Exception: + return None + try: + qr = qrcode.QRCode(border=1) + qr.add_data(data) + qr.make(fit=True) + import io + buf = io.StringIO() + qr.print_ascii(out=buf, invert=True) + return buf.getvalue() + except Exception: + return None + + +def _qr_data_uri(data: str) -> Optional[str]: + if not data: + return None + try: + import qrcode # type: ignore + except Exception: + return None + try: + import base64 + import io + img = qrcode.make(data) + buf = io.BytesIO() + img.save(buf, format='PNG') + b64 = base64.b64encode(buf.getvalue()).decode('ascii') + return f'data:image/png;base64,{b64}' + except Exception: + return None diff --git a/test/mock_server.py b/test/mock_server.py index 6178a4f7..708be7f4 100644 --- a/test/mock_server.py +++ b/test/mock_server.py @@ -3,6 +3,7 @@ import select import re import http.server as hs +import sys import threading import time import struct @@ -121,6 +122,21 @@ def __exit__(self, _ex_type, _ex_value, _ex_tb): SETTINGS_WITH_PROTOCOL_VERSION_V1_V2_V3 = '{"config":{"release.type":"OSS","release.version":"[DEVELOPMENT]","line.proto.support.versions":[1,2,3],"ilp.proto.transports":["tcp","http"],"posthog.enabled":false,"posthog.api.key":null,"cairo.max.file.name.length":127},"preferences.version":0,"preferences":{}}' SETTINGS_WITHOUT_PROTOCOL_VERSION = '{ "release.type": "OSS", "release.version": "[DEVELOPMENT]", "acl.enabled": false, "posthog.enabled": false, "posthog.api.key": null }' +class _QuietHTTPServer(hs.HTTPServer): + """HTTPServer that stays quiet when a client 
disconnects abruptly. + + Several tests (e.g. the request-timeout and min-throughput cases) drop the + connection mid-request on purpose. The stdlib would otherwise print a + harmless but noisy traceback for the resulting connection error -- most + visibly on Windows, where the keep-alive read of the next request line + raises ConnectionResetError outside of any request handler's try/except. + """ + def handle_error(self, request, client_address): + if isinstance(sys.exc_info()[1], ConnectionError): + return + super().handle_error(request, client_address) + + class HttpServer: def __init__(self, settings=SETTINGS_WITH_PROTOCOL_VERSION_V1_V2_V3, delay_seconds=0): self.delay_seconds = delay_seconds @@ -162,7 +178,12 @@ def do_GET(self): else: self.send_error(404, "Endpoint not found") self.close_connection = False - except BrokenPipeError: + except ConnectionError: + # The client (sender under test) may disconnect mid-request, + # e.g. in the timeout / min-throughput tests. On Windows this + # surfaces as ConnectionAbortedError/ConnectionResetError + # rather than the BrokenPipeError seen on Unix; both derive + # from ConnectionError. pass def do_POST(self): @@ -187,7 +208,12 @@ def do_POST(self): if body: self.wfile.write(body) self.close_connection = False - except BrokenPipeError: + except ConnectionError: + # The client (sender under test) may disconnect mid-request, + # e.g. in the timeout / min-throughput tests. On Windows this + # surfaces as ConnectionAbortedError/ConnectionResetError + # rather than the BrokenPipeError seen on Unix; both derive + # from ConnectionError. 
pass return IlpHttpHandler @@ -195,7 +221,7 @@ def do_POST(self): def __enter__(self): self._stop_event = threading.Event() handler_class = self.create_handler() - self._http_server = hs.HTTPServer(('', 0), handler_class, bind_and_activate=True) + self._http_server = _QuietHTTPServer(('', 0), handler_class, bind_and_activate=True) self._http_server.timeout = 30 self._http_server_thread = threading.Thread(target=self._serve) self._http_server_thread.start() diff --git a/test/test.py b/test/test.py index 18f4461a..054efe5e 100755 --- a/test/test.py +++ b/test/test.py @@ -33,6 +33,25 @@ from fixture import _parse_version +# OIDC auth tests (pure-Python; no compiled extension required). +# Imported here so they are picked up by ``unittest.main()`` in CI. +from test_auth import ( + TestDeviceFlow, + TestNonInteractive, + TestRefresh, + TestDiscovery, + TestInsecureSettingsGuard, + TestRestAdapter, + TestRestAdapterAuthErrors, + TestAdapters, + TestConcurrency, + TestConfigHelpers, + TestEndpointValidation, + TestCacheKey, + TestTransportSecurity, + TestRendererSecurity, +) + NUMPY_VERSION = _parse_version(np.__version__) try: diff --git a/test/test_auth.py b/test/test_auth.py new file mode 100644 index 00000000..1a3afb96 --- /dev/null +++ b/test/test_auth.py @@ -0,0 +1,2588 @@ +#!/usr/bin/env python3 +################################################################################ +## ___ _ ____ ____ +## / _ \ _ _ ___ ___| |_| _ \| __ ) +## | | | | | | |/ _ \/ __| __| | | | _ \ +## | |_| | |_| | __/\__ \ |_| |_| | |_) | +## \__\_\\__,_|\___||___/\__|____/|____/ +## +## Copyright (c) 2014-2019 Appsicle +## Copyright (c) 2019-2024 QuestDB +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. 
+## You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +################################################################################ + +""" +Standalone unit tests for ``questdb.auth``. + +These do not require the compiled ``questdb.ingress`` extension; they exercise +the device flow, discovery, caching, refresh and the REST adapter against an +in-process mock IdP + mock QuestDB server. + +Run directly:: + + python3 test/test_auth.py -v +""" + +import base64 +import contextlib +import importlib.util +import json +import os +import sys +import threading +import types +import unittest +import http.server +import urllib.parse +from unittest import mock + +sys.dont_write_bytecode = True +sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', 'src')) + +from questdb.auth import ( # noqa: E402 + OidcDeviceAuth, + QuestDB, + connect, + OidcError, + OidcConfigError, + OidcDeviceFlowError, + OidcTimeoutError, + OidcInteractionRequired, + OidcAuthError, + OidcNetworkError, + TokenSet, +) +from questdb.auth._cache import ( # noqa: E402 + MemoryCache, _MEMORY_GENERATION, _MEMORY_STORE) +from questdb.auth._render import Renderer # noqa: E402 + +try: + import pandas as pd +except ImportError: + pd = None + +_HAS_PG_DRIVER = ( + importlib.util.find_spec('psycopg') is not None + or importlib.util.find_spec('psycopg2') is not None) + + +class _FakeAuth: + """A stand-in OidcDeviceAuth for adapter tests (no network).""" + + _ctx = None + + def __init__(self, token='TKN'): + self._token = token + self.calls = 0 + + def token(self): + self.calls += 1 + return self._token + + def headers(self): + return 
{'Authorization': f'Bearer {self._token}'} + + +def _jwt(claims): + """Build an unsigned JWT-shaped string with the given payload claims.""" + def b64(obj): + raw = json.dumps(obj).encode() + return base64.urlsafe_b64encode(raw).rstrip(b'=').decode() + return f'{b64({"alg": "none"})}.{b64(claims)}.sig' + + +ID_TOKEN = _jwt({'sub': 'user-1', 'email': 'alice@example.com', + 'groups': ['analysts']}) +ACCESS_TOKEN = _jwt({'sub': 'user-1', 'scope': 'openid'}) + + +@contextlib.contextmanager +def _raw_response_server(status, content_type, body): + """A throwaway HTTP server that returns one fixed (status, type, body). + + Used to exercise the transport's handling of responses the scripted mock + IdP can't produce (non-JSON 2xx, non-dict JSON, non-2xx) on the token / + device / settings / discovery endpoints. Yields the base URL. + """ + class _H(http.server.BaseHTTPRequestHandler): + def log_message(self, *a): + pass + + def _send(self): + self.send_response(status) + self.send_header('Content-Type', content_type) + self.send_header('Content-Length', str(len(body))) + self.end_headers() + self.wfile.write(body) + + def do_GET(self): + self._send() + + def do_POST(self): + self.rfile.read(int(self.headers.get('Content-Length', 0))) + self._send() + + srv = http.server.HTTPServer(('127.0.0.1', 0), _H) + threading.Thread(target=srv.serve_forever, daemon=True).start() + try: + yield f'http://127.0.0.1:{srv.server_port}' + finally: + srv.shutdown() + srv.server_close() + + +class FakeClock: + """Deterministic clock: ``sleep`` advances both monotonic and wall time.""" + + def __init__(self): + self.mono = 0.0 + self.wall = 1_000_000.0 + self.sleeps = [] + + def sleep(self, dt): + self.sleeps.append(dt) + self.mono += dt + self.wall += dt + + def monotonic(self): + return self.mono + + def now(self): + return self.wall + + +class MockState: + """Scriptable behaviour shared with the request handler.""" + + def __init__(self): + self.settings = {} + self.well_known = None + # 
FIFO of (status, body) returned for device_code grant polls. + # When exhausted, the last entry repeats. + self.token_script = [(200, None)] # None => default success body + self.refresh_response = None # (status, body) or None + self.device_response = None # override device-auth response body + self.device_status = 200 + self.expected_bearer = None # for /exec auth check + self.exec_response = None + self.exec_status = 200 + self.exec_raw = None # (status, content_type, bytes) override + # Recording. + self.device_requests = 0 + self.token_requests = [] + self.refresh_requests = 0 + self.refresh_forms = [] + self.exec_requests = [] + + +class _Handler(http.server.BaseHTTPRequestHandler): + def log_message(self, *args): + pass + + @property + def state(self): + return self.server.state + + def _send_json(self, status, obj): + data = json.dumps(obj).encode() + self.send_response(status) + self.send_header('Content-Type', 'application/json') + self.send_header('Content-Length', str(len(data))) + self.end_headers() + self.wfile.write(data) + + def _read_form(self): + length = int(self.headers.get('Content-Length', 0)) + body = self.rfile.read(length).decode() + return {k: v[0] for k, v in urllib.parse.parse_qs(body).items()} + + def do_GET(self): + path = urllib.parse.urlparse(self.path).path + if path == '/settings': + self._send_json(200, self.state.settings) + elif path == '/.well-known/openid-configuration': + if self.state.well_known is None: + self._send_json(404, {'error': 'not found'}) + else: + self._send_json(200, self.state.well_known) + elif path == '/exec': + auth = self.headers.get('Authorization') + if self.state.expected_bearer and auth != ( + 'Bearer ' + self.state.expected_bearer): + self._send_json(401, {'error': 'unauthorized'}) + return + self.state.exec_requests.append(self.path) + if self.state.exec_raw is not None: + status, ctype, raw = self.state.exec_raw + self.send_response(status) + self.send_header('Content-Type', ctype) + 
self.send_header('Content-Length', str(len(raw))) + self.end_headers() + self.wfile.write(raw) + return + self._send_json(self.state.exec_status, self.state.exec_response or { + 'columns': [ + {'name': 'ts', 'type': 'TIMESTAMP'}, + {'name': 'price', 'type': 'DOUBLE'}, + ], + 'dataset': [ + ['2021-01-01T00:00:00.000000Z', 1.5], + ['2021-01-02T00:00:00.000000Z', 2.5], + ], + 'count': 2, + }) + else: + self._send_json(404, {'error': 'not found'}) + + def do_POST(self): + path = urllib.parse.urlparse(self.path).path + form = self._read_form() + if path == '/device': + self.state.device_requests += 1 + if self.state.device_status != 200: + self._send_json(self.state.device_status, + self.state.device_response or + {'error': 'invalid_client'}) + return + body = self.state.device_response or { + 'device_code': 'DEV-CODE', + 'user_code': 'WDJB-MJHT', + 'verification_uri': 'https://idp.example.com/device', + 'verification_uri_complete': + 'https://idp.example.com/device?user_code=WDJB-MJHT', + 'expires_in': 600, + 'interval': 5, + } + self._send_json(200, body) + elif path == '/token': + grant = form.get('grant_type') + if grant == 'refresh_token': + self.state.refresh_requests += 1 + self.state.refresh_forms.append(form) + status, body = self.state.refresh_response or ( + 200, self._default_token_body()) + self._send_json(status, body) + return + self.state.token_requests.append(form) + idx = min(len(self.state.token_requests) - 1, + len(self.state.token_script) - 1) + status, body = self.state.token_script[idx] + if body is None: + body = self._default_token_body() + self._send_json(status, body) + else: + self._send_json(404, {'error': 'not found'}) + + @staticmethod + def _default_token_body(): + return { + 'access_token': ACCESS_TOKEN, + 'id_token': ID_TOKEN, + 'refresh_token': 'REFRESH-1', + 'token_type': 'Bearer', + 'expires_in': 3600, + 'scope': 'openid groups', + } + + +class _MockServer(http.server.HTTPServer): + def __init__(self): + 
super().__init__(('127.0.0.1', 0), _Handler) + self.state = MockState() + + +class AuthTestBase(unittest.TestCase): + def setUp(self): + _MEMORY_STORE.clear() + _MEMORY_GENERATION.clear() + self.server = _MockServer() + self.state = self.server.state + self.thread = threading.Thread( + target=lambda: self.server.serve_forever(poll_interval=0.02), + daemon=True) + self.thread.start() + self.base = f'http://127.0.0.1:{self.server.server_port}' + + def tearDown(self): + self.server.shutdown() + self.server.server_close() + self.thread.join(timeout=5) + + def make_auth(self, *, clock=None, groups_in_token=True, cache='memory', + interactive=True, **kw): + clock = clock or FakeClock() + self._clock = clock + return OidcDeviceAuth( + client_id='questdb', + device_authorization_endpoint=self.base + '/device', + token_endpoint=self.base + '/token', + scope='openid groups', + groups_in_token=groups_in_token, + cache=cache, + insecure=True, + interactive=interactive, + renderer=Renderer(), + _clock=clock, + **kw) + + +class TestDeviceFlow(AuthTestBase): + def test_happy_path_returns_id_token(self): + self.state.token_script = [ + (400, {'error': 'authorization_pending'}), + (400, {'error': 'authorization_pending'}), + (200, None), + ] + auth = self.make_auth() + token = auth.token() + self.assertEqual(token, ID_TOKEN) + # 3 token polls, slept 'interval' (5s) before each. + self.assertEqual(len(self.state.token_requests), 3) + self.assertEqual(self._clock.sleeps, [5, 5, 5]) + + def test_access_token_when_groups_not_in_token(self): + auth = self.make_auth(groups_in_token=False) + self.assertEqual(auth.token(), ACCESS_TOKEN) + + def test_headers(self): + auth = self.make_auth() + self.assertEqual(auth.headers(), + {'Authorization': 'Bearer ' + ID_TOKEN}) + + def test_slow_down_backs_off(self): + self.state.token_script = [ + (400, {'error': 'slow_down'}), + (200, None), + ] + auth = self.make_auth() + auth.token() + # interval starts at 5, +5 after slow_down. 
+ self.assertEqual(self._clock.sleeps, [5, 10]) + + def test_transient_network_error_during_poll_keeps_polling(self): + # A dropped connection / DNS blip / timeout on a single poll must not + # abort a sign-in the user may already have completed in the browser: + # the loop keeps polling until the deadline (RFC 8628 §3.4). M1. + self.state.token_script = [(200, None)] # success once actually polled + auth = self.make_auth() + real_idp_post = auth._idp_post + token_polls = {'n': 0} + + def flaky(url, form): + # Fail only the first poll of the token endpoint; pass the device- + # code request and later polls through to the real transport. + if url == auth.config.token_endpoint: + token_polls['n'] += 1 + if token_polls['n'] == 1: + raise OidcNetworkError('connection reset mid-poll') + return real_idp_post(url, form) + + auth._idp_post = flaky + self.assertEqual(auth.token(), ID_TOKEN) + # First poll raised (transient, retried); second poll reached the IdP. + self.assertEqual(token_polls['n'], 2) + self.assertEqual(len(self.state.token_requests), 1) + self.assertEqual(self._clock.sleeps, [5, 5]) + + def test_transient_5xx_and_429_during_poll_keep_polling(self): + # A 5xx server error or a 429 rate-limit (even carrying a JSON body) is + # transient, not a terminal OAuth rejection: keep polling, backing off + # on the rate-limit. M1. + self.state.token_script = [ + (503, {'error': 'server_error'}), + (429, {'error': 'slow_down'}), + (200, None), + ] + auth = self.make_auth() + self.assertEqual(auth.token(), ID_TOKEN) + self.assertEqual(len(self.state.token_requests), 3) + # 503 polled at the base interval; 429 bumps the interval by 5. 
+ self.assertEqual(self._clock.sleeps, [5, 5, 10]) + + def test_non_json_4xx_during_poll_is_terminal(self): + # A non-JSON 4xx during polling (an HTML/plain error page from a WAF or + # reverse proxy in front of the IdP, or a non-conformant IdP) is a + # terminal rejection: a conformant OAuth error is JSON, so it can't be + # authorization_pending / slow_down. Fail fast with a device-flow error + # instead of polling on to a misleading "code expired". M1. + auth = self.make_auth() + with _raw_response_server( + 403, 'text/html', b'denied') as raw: + # Point only the poll (token) endpoint at the non-JSON 403; the + # device-code request still hits the JSON mock IdP. Set it post- + # construction so the (already-satisfied) co-location check isn't + # re-run against the throwaway origin. + auth.config.token_endpoint = raw + '/token' + with self.assertRaises(OidcDeviceFlowError) as cm: + auth.token() + # Terminal on the first poll: not a timeout, and it did not keep polling + # to the device-code deadline. + self.assertNotIsInstance(cm.exception, OidcTimeoutError) + self.assertLessEqual(len(self._clock.sleeps), 1) + + def test_device_200_without_codes_is_rejected_clearly(self): + # A 200 device-authorization response missing device_code/user_code is + # a non-conformant body, not an HTTP failure: the error must say so + # plainly (NOT the self-contradictory "failed (HTTP 200)") and the flow + # must never start polling. 
+ self.state.device_status = 200 + self.state.device_response = {'verification_uri': 'https://idp/device'} + auth = self.make_auth() + with self.assertRaises(OidcDeviceFlowError) as cm: + auth.token() + msg = str(cm.exception) + self.assertNotIn('HTTP 200', msg) + self.assertIn('device_code', msg) + self.assertEqual(self.state.token_requests, []) # never polled + + def test_timeout_when_never_authorized(self): + self.state.device_response = { + 'device_code': 'DEV-CODE', 'user_code': 'X', + 'verification_uri': 'https://idp/device', + 'expires_in': 10, 'interval': 5, + } + self.state.token_script = [(400, {'error': 'authorization_pending'})] + auth = self.make_auth() + with self.assertRaises(OidcTimeoutError): + auth.token() + + def test_idp_expired_token_error_raises_timeout(self): + # The token endpoint can itself answer a poll with error=expired_token + # (RFC 8628) — distinct from the local-deadline timeout. It must surface + # as OidcTimeoutError carrying that error, not loop or mis-classify it. + self.state.token_script = [(400, {'error': 'expired_token'})] + auth = self.make_auth() + with self.assertRaises(OidcTimeoutError) as cm: + auth.token() + self.assertEqual(cm.exception.error, 'expired_token') + + def test_nonpositive_expires_in_still_polls(self): + # A non-positive expires_in in the device-auth response must be treated + # as unknown, not as "already expired" — otherwise the flow times out + # before its first poll even though the user can still authorize. M2. 
+ self.state.device_response = { + 'device_code': 'DEV-CODE', 'user_code': 'X', + 'verification_uri': 'https://idp/device', + 'expires_in': 0, 'interval': 5, + } + self.state.token_script = [(200, None)] # success on the first poll + auth = self.make_auth() + self.assertEqual(auth.token(), ID_TOKEN) + self.assertEqual(len(self.state.token_requests), 1) # it actually polled + + def test_oversized_interval_is_clamped(self): + # A hostile/huge interval must not pin the polling thread (which holds + # the acquisition lock) in one enormous sleep; the per-poll sleep is + # capped at _MAX_POLL_INTERVAL. M2. + from questdb.auth._device import _MAX_POLL_INTERVAL + self.state.device_response = { + 'device_code': 'DEV-CODE', 'user_code': 'X', + 'verification_uri': 'https://idp/device', + 'expires_in': 600, 'interval': 10 ** 9, + } + self.state.token_script = [(200, None)] + auth = self.make_auth() + auth.token() + self.assertTrue(self._clock.sleeps) + self.assertLessEqual(max(self._clock.sleeps), _MAX_POLL_INTERVAL) + + def test_oversized_expires_in_is_capped(self): + # A hostile expires_in must not keep the poll loop (and the lock) alive + # indefinitely; the lifetime is capped so a never-authorized flow still + # terminates promptly rather than looping millions of times. M2. 
+ from questdb.auth._device import ( + _MAX_DEVICE_CODE_LIFETIME, _MAX_POLL_INTERVAL) + self.state.device_response = { + 'device_code': 'DEV-CODE', 'user_code': 'X', + 'verification_uri': 'https://idp/device', + 'expires_in': 10 ** 9, 'interval': 10 ** 9, # interval clamps too + } + self.state.token_script = [(400, {'error': 'authorization_pending'})] + auth = self.make_auth() + with self.assertRaises(OidcTimeoutError): + auth.token() + max_polls = _MAX_DEVICE_CODE_LIFETIME // _MAX_POLL_INTERVAL + 1 + self.assertLessEqual(len(self.state.token_requests), max_polls) + + def test_access_denied_is_surfaced(self): + self.state.token_script = [ + (400, {'error': 'access_denied', + 'error_description': 'user said no'}), + ] + auth = self.make_auth() + with self.assertRaises(OidcDeviceFlowError) as cm: + auth.token() + self.assertEqual(cm.exception.error, 'access_denied') + self.assertIn('user said no', str(cm.exception)) + + def test_device_endpoint_rejects_grant(self): + self.state.device_status = 400 + self.state.device_response = {'error': 'invalid_client'} + auth = self.make_auth() + with self.assertRaises(OidcDeviceFlowError) as cm: + auth.token() + self.assertIn('device grant', str(cm.exception)) + + def test_token_caches_in_memory_across_instances(self): + self.make_auth().token() + self.assertEqual(self.state.device_requests, 1) + # A brand-new instance with the same config reuses the cached token. + self.make_auth().token() + self.assertEqual(self.state.device_requests, 1) + + def test_groups_mode_missing_id_token_fails_without_caching(self): + # groups_in_token=True but the completed grant carries only an + # access_token: the poll must reject it as a terminal flow error and + # NOT cache it (otherwise every later token() re-runs the whole + # interactive flow). See M1. 
+ self.state.token_script = [(200, { + 'access_token': ACCESS_TOKEN, 'token_type': 'Bearer', + 'expires_in': 3600})] # no id_token + auth = self.make_auth(groups_in_token=True) + with self.assertRaises(OidcDeviceFlowError): + auth.token() + self.assertIsNone(auth._tokens) # nothing was cached + + def test_groups_mode_accepts_id_token_without_access_token(self): + # A completed grant that returns only an id_token (no access_token) is + # usable in groups mode and must be returned, not discarded as it was + # when success gated on access_token. See M1. + self.state.token_script = [(200, { + 'id_token': ID_TOKEN, 'token_type': 'Bearer', + 'expires_in': 3600})] # no access_token + auth = self.make_auth(groups_in_token=True) + self.assertEqual(auth.token(), ID_TOKEN) + + def test_200_without_access_token_is_not_success(self): + # A 200 with no access_token must not be treated as a token. + self.state.token_script = [(200, {'token_type': 'Bearer'})] + auth = self.make_auth() + with self.assertRaises(OidcDeviceFlowError): + auth.token() + + def test_access_token_headers(self): + auth = self.make_auth(groups_in_token=False) + self.assertEqual(auth.headers(), + {'Authorization': 'Bearer ' + ACCESS_TOKEN}) + + def test_clear_forces_resignin(self): + auth = self.make_auth() + auth.token() + self.assertEqual(self.state.device_requests, 1) + auth.clear() + auth.token() + self.assertEqual(self.state.device_requests, 2) # prompted again + + def test_openid_scope_auto_added_for_groups_in_token(self): + # groups-in-token requires an id_token, which needs the openid scope. 
+ auth = OidcDeviceAuth( + client_id='questdb', + device_authorization_endpoint=self.base + '/device', + token_endpoint=self.base + '/token', + scope='groups', groups_in_token=True, # no 'openid' + cache='memory', insecure=True, renderer=Renderer()) + self.assertIn('openid', auth.config.scope.split()) + + def test_zero_expires_in_is_treated_as_unknown(self): + # A non-positive expires_in must not mark the just-issued token expired. + self.state.token_script = [(200, { + 'access_token': ACCESS_TOKEN, 'id_token': ID_TOKEN, + 'token_type': 'Bearer', 'expires_in': 0})] + auth = self.make_auth() + auth.token() + self.assertTrue(auth._tokens.is_valid(self._clock.now())) + + def test_short_lived_token_valid_at_issue(self): + # A small positive expires_in (< 2*skew) must not read as expired the + # instant it is issued (adaptive skew = min(skew, lifetime/2)). + self.state.token_script = [(200, { + 'access_token': ACCESS_TOKEN, 'id_token': ID_TOKEN, + 'token_type': 'Bearer', 'expires_in': 20})] + auth = self.make_auth() + auth.token() + t = auth._tokens + self.assertEqual(round(t.expires_at - t.issued_at), 20) + self.assertTrue(t.is_valid(t.issued_at)) # usable right after issue + self.assertFalse(t.is_valid(t.expires_at)) # but still does expire + + def test_overflow_expires_in_treated_as_unknown(self): + # A non-finite expires_in (JSON Infinity, which json.loads accepts and + # int(inf) turns into an OverflowError — not a ValueError) must not + # crash; treat it as unknown so the token stays usable. See M1. 
+ self.state.token_script = [(200, { + 'access_token': ACCESS_TOKEN, 'id_token': ID_TOKEN, + 'token_type': 'Bearer', 'expires_in': float('inf')})] + auth = self.make_auth() + self.assertEqual(auth.token(), ID_TOKEN) + self.assertTrue(auth._tokens.is_valid(self._clock.now())) + + def test_overflow_device_timing_fields_do_not_crash(self): + # Non-finite interval / expires_in in the device-auth response (JSON + # Infinity) must be treated as unknown, not raise OverflowError. See M1. + self.state.device_response = { + 'device_code': 'DEV-CODE', 'user_code': 'X', + 'verification_uri': 'https://idp/device', + 'expires_in': float('inf'), 'interval': float('inf')} + self.state.token_script = [(200, None)] # success on the first poll + auth = self.make_auth() + self.assertEqual(auth.token(), ID_TOKEN) + + def test_deeply_nested_jwt_payload_does_not_crash(self): + # A hostile/buggy IdP returning an id_token whose payload base64-decodes + # to deeply-nested JSON must not crash token() with a raw RecursionError + # from the best-effort identity decode (RecursionError is not a + # ValueError); the decode degrades to no-identity and the token is still + # returned. See _decode_jwt_claims. + payload = base64.urlsafe_b64encode( + (('[' * 60000) + (']' * 60000)).encode()).rstrip(b'=').decode() + nested = f'aaa.{payload}.sig' + self.state.token_script = [(200, { + 'id_token': nested, 'token_type': 'Bearer', 'expires_in': 3600})] + auth = self.make_auth() + self.assertEqual(auth.token(), nested) + + def test_idp_requests_use_configured_timeout(self): + # The device-code / poll / refresh POSTs must use the configured + # timeout, so a stalled IdP can't pin the acquisition lock for the + # urllib default (30s) per network leg. See M3. 
+ seen = [] + + def fake_post_form(url, form, *, ctx=None, insecure=False, + timeout=None): + seen.append(timeout) + if url.endswith('/device'): + return 200, {'device_code': 'D', 'user_code': 'U', + 'verification_uri': 'https://idp/d', + 'expires_in': 600, 'interval': 5} + return 200, {'access_token': ACCESS_TOKEN, 'id_token': ID_TOKEN, + 'token_type': 'Bearer', 'expires_in': 3600} + + from questdb.auth import _device + auth = self.make_auth(timeout=3) + with mock.patch.object(_device, 'post_form', fake_post_form): + self.assertEqual(auth.token(), ID_TOKEN) + self.assertTrue(seen) + self.assertTrue( + all(t == 3 for t in seen), + f'IdP POSTs did not all use the configured timeout: {seen}') + + def test_connect_lazy_defers_signin(self): + # eager=False must return a session WITHOUT running the device flow; the + # first token-needing call then triggers exactly one sign-in. See M4. + self.state.settings = {'config': { + 'acl.oidc.enabled': True, + 'acl.oidc.client.id': 'questdb', + 'acl.oidc.scope': 'openid groups', + 'acl.oidc.groups.encoded.in.token': True, + 'acl.oidc.token.endpoint': self.base + '/token', + 'acl.oidc.device.authorization.endpoint': self.base + '/device'}} + qdb = connect(self.base, insecure=True, eager=False, + renderer=Renderer(), interactive=True, _clock=FakeClock()) + self.assertEqual(self.state.device_requests, 0) # deferred + self.assertEqual(qdb.token(), ID_TOKEN) # first use signs in + self.assertEqual(self.state.device_requests, 1) + + def test_open_browser_rejects_dangerous_scheme(self): + auth = self.make_auth(open_browser=True) + with mock.patch('webbrowser.open') as opener: + auth._maybe_open_browser({'verification_uri': 'javascript:alert(1)'}) + opener.assert_not_called() + auth._maybe_open_browser( + {'verification_uri': 'https://idp.example.com/device'}) + opener.assert_called_once_with('https://idp.example.com/device') + + def test_memory_cache_returns_independent_copy(self): + cache = MemoryCache() + stored = 
TokenSet(access_token='a', refresh_token='r', expires_at=1.0) + cache.store('k', stored) + # Each load is a distinct copy — never the object handed to store(), nor + # shared between loads — so a cached entry can't be aliased and reused. + first = cache.load('k') + second = cache.load('k') + self.assertIsNot(first, stored) + self.assertIsNot(first, second) + self.assertEqual(first.refresh_token, 'r') + + def test_tokenset_is_frozen(self): + # TokenSet is immutable: the lock-free fast path reads a published + # TokenSet without a lock, which is only safe if its fields never change + # after construction. Mutating one must raise, not silently succeed. + import dataclasses + t = TokenSet(access_token='a', refresh_token='r', expires_at=1.0) + with self.assertRaises(dataclasses.FrozenInstanceError): + t.refresh_token = 'MUTATED' + # Deriving a modified copy is the supported idiom. + t2 = dataclasses.replace(t, refresh_token='r2') + self.assertEqual(t.refresh_token, 'r') + self.assertEqual(t2.refresh_token, 'r2') + + def test_tokenset_repr_redacts_secrets(self): + # The access/id/refresh tokens must never appear in repr() — a TokenSet + # landing in a log line or traceback would otherwise leak credentials. 
+ r = repr(TokenSet(access_token='SECRET-A', id_token='SECRET-I', + refresh_token='SECRET-R', scope='openid')) + self.assertNotIn('SECRET-A', r) + self.assertNotIn('SECRET-I', r) + self.assertNotIn('SECRET-R', r) + self.assertIn('openid', r) # non-secret metadata still shown + + +class TestNonInteractive(AuthTestBase): + def test_non_interactive_raises_without_polling(self): + auth = self.make_auth(interactive=False) + with self.assertRaises(OidcInteractionRequired): + auth.token() + self.assertEqual(self.state.device_requests, 0) + + +class TestRefresh(AuthTestBase): + def _seed_expired(self, auth, refresh_token='REFRESH-1'): + expired = TokenSet( + access_token='old-access', id_token='old-id', + refresh_token=refresh_token, + expires_at=self._clock.now() - 10) + auth._cache.store(auth.cache_key, expired) + + def test_silent_refresh(self): + auth = self.make_auth() + self._seed_expired(auth) + token = auth.token() + self.assertEqual(token, ID_TOKEN) + self.assertEqual(self.state.refresh_requests, 1) + self.assertEqual(self.state.device_requests, 0) # no re-prompt + + def test_refresh_failure_falls_back_to_device_flow(self): + auth = self.make_auth() + self._seed_expired(auth) + self.state.refresh_response = (400, {'error': 'invalid_grant'}) + token = auth.token() + self.assertEqual(token, ID_TOKEN) + self.assertEqual(self.state.refresh_requests, 1) + self.assertEqual(self.state.device_requests, 1) # re-prompted + + def test_refresh_token_preserved_when_not_rotated(self): + auth = self.make_auth() + self._seed_expired(auth) + self.state.refresh_response = (200, { + 'access_token': ACCESS_TOKEN, 'id_token': ID_TOKEN, + 'token_type': 'Bearer', 'expires_in': 3600}) # no new refresh + auth.token() + self.assertEqual(auth._tokens.refresh_token, 'REFRESH-1') + + def test_rotated_refresh_token_is_stored(self): + # When the IdP DOES rotate the refresh token, the new one must replace + # the old in the cached token set — else an IdP with one-time-use + # refresh tokens 
breaks on the NEXT refresh. + auth = self.make_auth() + self._seed_expired(auth) + self.state.refresh_response = (200, { + 'access_token': ACCESS_TOKEN, 'id_token': ID_TOKEN, + 'refresh_token': 'REFRESH-2', # rotated + 'token_type': 'Bearer', 'expires_in': 3600}) + auth.token() + self.assertEqual(auth._tokens.refresh_token, 'REFRESH-2') + self.assertEqual(self.state.device_requests, 0) # no re-prompt + + def test_refresh_without_id_token_falls_back_to_device_flow(self): + # groups_in_token=True but the IdP's refresh omits the id_token: the + # refresh is unusable, so fall back to the interactive flow rather than + # caching it and looping (the device flow yields a complete token). + auth = self.make_auth(groups_in_token=True) + self._seed_expired(auth) + self.state.refresh_response = (200, { + 'access_token': ACCESS_TOKEN, 'token_type': 'Bearer', + 'expires_in': 3600}) # no id_token + token = auth.token() + self.assertEqual(token, ID_TOKEN) # from the device flow + self.assertEqual(self.state.refresh_requests, 1) + self.assertEqual(self.state.device_requests, 1) # fell back + + def test_refresh_without_id_token_non_interactive_does_not_loop(self): + # Same situation but non-interactive: surface a clear error rather than + # repeatedly re-running a refresh that can never satisfy _select. + auth = self.make_auth(groups_in_token=True, interactive=False) + self._seed_expired(auth) + self.state.refresh_response = (200, { + 'access_token': ACCESS_TOKEN, 'token_type': 'Bearer', + 'expires_in': 3600}) # no id_token + with self.assertRaises(OidcInteractionRequired): + auth.token() + self.assertEqual(self.state.device_requests, 0) + + def test_cached_token_missing_required_kind_is_refreshed(self): + # A cached, non-expired token that lacks the required kind (here: + # access_token in non-groups mode) must not pass the cache gate and + # then hard-fail in _select; it should trigger a refresh instead. 
+ auth = self.make_auth(groups_in_token=False) + auth._cache.store(auth.cache_key, TokenSet( + access_token=None, id_token='id', refresh_token='REFRESH-1', + expires_at=self._clock.now() + 3600)) + token = auth.token() + self.assertEqual(token, ACCESS_TOKEN) + self.assertEqual(self.state.refresh_requests, 1) + self.assertEqual(self.state.device_requests, 0) + + def test_refresh_network_error_propagates_without_reprompt(self): + # Both endpoints point at a closed port (same origin, so the co-location + # check passes), so the refresh POST fails at the transport layer. The + # error must propagate from the *token* endpoint (the refresh), proving + # the flow did NOT fall back to the device flow on a transient blip. + clock = FakeClock() + auth = OidcDeviceAuth( + client_id='questdb', + device_authorization_endpoint='http://127.0.0.1:1/device', + token_endpoint='http://127.0.0.1:1/token', # connection refused + scope='openid groups', groups_in_token=True, cache='memory', + insecure=True, interactive=True, renderer=Renderer(), + _clock=clock) + expired = TokenSet( + access_token='old', id_token='old-id', refresh_token='REFRESH-1', + expires_at=clock.now() - 10) + auth._cache.store(auth.cache_key, expired) + + with self.assertRaises(OidcNetworkError) as cm: + auth.token() + # The error is from the refresh (token endpoint), not a device-flow + # fallback (device endpoint), and the refresh token is kept for a retry. + self.assertIn('/token', str(cm.exception)) + self.assertEqual(auth._tokens.refresh_token, 'REFRESH-1') + + def test_refresh_transient_5xx_kept_for_retry(self): + # A transient IdP error (5xx) during a silent refresh must NOT tear the + # session down and re-prompt: the refresh token is still valid, so it is + # surfaced as a retryable OidcNetworkError and the cached token (with its + # refresh token) is kept for a later retry — matching the poll loop, + # which also treats 5xx/429 as transient. M2. 
+ auth = self.make_auth() + self._seed_expired(auth) + self.state.refresh_response = (503, {'error': 'temporarily_unavailable'}) + with self.assertRaises(OidcNetworkError): + auth.token() + self.assertEqual(self.state.refresh_requests, 1) + self.assertEqual(self.state.device_requests, 0) # NOT re-prompted + self.assertEqual(auth._tokens.refresh_token, 'REFRESH-1') # kept + + def test_refresh_transient_429_kept_for_retry(self): + # Same as the 5xx case for a 429 rate-limit. M2. + auth = self.make_auth() + self._seed_expired(auth) + self.state.refresh_response = (429, {'error': 'slow_down'}) + with self.assertRaises(OidcNetworkError): + auth.token() + self.assertEqual(self.state.device_requests, 0) + self.assertEqual(auth._tokens.refresh_token, 'REFRESH-1') + + def test_refresh_includes_audience_when_configured(self): + # The audience is re-sent on refresh (mirroring the device-auth + # request), so an IdP that scopes `aud` per request keeps it on the + # rotated token instead of minting one QuestDB rejects after a silent + # refresh. When no audience is configured the param is omitted. + auth = self.make_auth(audience='questdb-api') + self._seed_expired(auth) + self.assertEqual(auth.token(), ID_TOKEN) + self.assertEqual(self.state.refresh_requests, 1) + self.assertEqual( + self.state.refresh_forms[-1].get('audience'), 'questdb-api') + + # Without an audience, the refresh form carries no audience key. 
+ _MEMORY_STORE.clear() + _MEMORY_GENERATION.clear() + self.state.refresh_forms.clear() + auth2 = self.make_auth() # no audience + self._seed_expired(auth2) + auth2.token() + self.assertNotIn('audience', self.state.refresh_forms[-1]) + + def test_refresh_transient_5xx_non_interactive_does_not_hard_fail(self): + # The worst case: in a non-interactive context (papermill / cron / CI) a + # transient refresh error must surface as a retryable OidcNetworkError, + # NOT escalate to OidcInteractionRequired — which a fall-through to the + # device flow would raise, hard-failing a session whose refresh token is + # still valid and would succeed on the next attempt. M2. + auth = self.make_auth(interactive=False) + self._seed_expired(auth) + self.state.refresh_response = (503, {'error': 'temporarily_unavailable'}) + with self.assertRaises(OidcNetworkError): + auth.token() + self.assertEqual(self.state.device_requests, 0) + + +class TestDiscovery(AuthTestBase): + def test_from_questdb_reads_settings(self): + self.state.settings = {'config': { + 'acl.oidc.enabled': True, + 'acl.oidc.client.id': 'questdb', + 'acl.oidc.scope': 'openid groups', + 'acl.oidc.groups.encoded.in.token': True, + 'acl.oidc.token.endpoint': self.base + '/token', + 'acl.oidc.device.authorization.endpoint': self.base + '/device', + }} + auth = OidcDeviceAuth.from_questdb( + self.base, insecure=True, interactive=True, renderer=Renderer(), + _clock=FakeClock()) + self.assertEqual(auth.config.client_id, 'questdb') + self.assertTrue(auth.config.groups_in_token) + self.assertEqual(auth.config.device_authorization_endpoint, + self.base + '/device') + self.assertEqual(auth.token(), ID_TOKEN) + + def test_user_writable_preferences_cannot_override_config(self): + # A user-writable "preferences" sibling in /settings must never override + # the trusted "config" object during discovery: end-to-end, the resolved + # credential endpoints come from "config", not the attacker's prefs. 
+ self.state.settings = { + 'config': { + 'acl.oidc.enabled': True, + 'acl.oidc.client.id': 'questdb', + 'acl.oidc.scope': 'openid groups', + 'acl.oidc.groups.encoded.in.token': True, + 'acl.oidc.token.endpoint': self.base + '/token', + 'acl.oidc.device.authorization.endpoint': self.base + '/device', + }, + 'preferences.version': 1, + 'preferences': { + 'acl.oidc.token.endpoint': 'https://evil.example.com/token', + 'acl.oidc.device.authorization.endpoint': + 'https://evil.example.com/device', + }, + } + auth = OidcDeviceAuth.from_questdb( + self.base, insecure=True, interactive=True, renderer=Renderer(), + _clock=FakeClock()) + self.assertEqual(auth.config.token_endpoint, self.base + '/token') + self.assertEqual(auth.config.device_authorization_endpoint, + self.base + '/device') + self.assertEqual(auth.token(), ID_TOKEN) + + def test_well_known_fallback_for_device_endpoint(self): + # Settings advertise OIDC + token endpoint but NOT the device endpoint; + # issuer= is pinned, so the IdP .well-known fallback is allowed. + self.state.settings = {'config': { + 'acl.oidc.enabled': True, + 'acl.oidc.client.id': 'questdb', + 'acl.oidc.scope': 'openid', + 'acl.oidc.groups.encoded.in.token': False, + 'acl.oidc.token.endpoint': self.base + '/token', + }} + self.state.well_known = { + 'issuer': self.base, + 'token_endpoint': self.base + '/token', + 'device_authorization_endpoint': self.base + '/device', + } + auth = OidcDeviceAuth.from_questdb(self.base, issuer=self.base, + insecure=True, renderer=Renderer()) + self.assertEqual(auth.config.device_authorization_endpoint, + self.base + '/device') + + def test_device_fallback_without_issuer_is_rejected(self): + # M4: QuestDB advertises the token endpoint but not the device + # endpoint, and no issuer is pinned. Discovery would otherwise be + # steered by the (possibly tampered) /settings response, so refuse and + # demand an out-of-band issuer pin — even though a usable .well-known + # is reachable here, it must NOT be fetched. 
+ self.state.settings = {'config': { + 'acl.oidc.enabled': True, + 'acl.oidc.client.id': 'questdb', + 'acl.oidc.token.endpoint': self.base + '/token', + }} + self.state.well_known = { + 'issuer': self.base, + 'token_endpoint': self.base + '/token', + 'device_authorization_endpoint': self.base + '/device', + } + with self.assertRaises(OidcConfigError) as cm: + OidcDeviceAuth.from_questdb(self.base, insecure=True) + self.assertIn('issuer', str(cm.exception)) + + def test_device_fallback_with_discovery_url_is_accepted(self): + # discovery_url= is an out-of-band pin too, accepted in lieu of issuer=. + self.state.settings = {'config': { + 'acl.oidc.enabled': True, + 'acl.oidc.client.id': 'questdb', + 'acl.oidc.token.endpoint': self.base + '/token', + }} + self.state.well_known = { + 'issuer': self.base, + 'token_endpoint': self.base + '/token', + 'device_authorization_endpoint': self.base + '/device', + } + auth = OidcDeviceAuth.from_questdb( + self.base, + discovery_url=self.base + '/.well-known/openid-configuration', + insecure=True, renderer=Renderer()) + self.assertEqual(auth.config.device_authorization_endpoint, + self.base + '/device') + + def test_discovery_url_rejects_off_origin_issuer_in_doc(self): + # M4: discovery_url= is advertised as an out-of-band pin, but the doc it + # points to could declare an attacker issuer AND endpoints all on one + # (attacker) origin — which passes co-location / issuer-origin vacuously. + # The discovered issuer must share the pinned discovery_url origin (OIDC + # Discovery §4.3), else refuse. /settings advertises NO endpoints, so + # both come from the (hostile) doc — the exact gap the fix closes. 
+ self.state.settings = {'config': { + 'acl.oidc.enabled': True, + 'acl.oidc.client.id': 'questdb', + }} + self.state.well_known = { + 'issuer': 'https://attacker.example.net', + 'token_endpoint': 'https://attacker.example.net/token', + 'device_authorization_endpoint': + 'https://attacker.example.net/device', + } + with self.assertRaises(OidcConfigError) as cm: + OidcDeviceAuth.from_questdb( + self.base, + discovery_url=self.base + '/.well-known/openid-configuration', + insecure=True) + self.assertIn('origin', str(cm.exception).lower()) + + def test_oidc_disabled_raises(self): + self.state.settings = {'config': {'acl.oidc.enabled': False}} + with self.assertRaises(OidcConfigError): + OidcDeviceAuth.from_questdb(self.base, insecure=True) + + def test_missing_device_endpoint_raises(self): + # issuer= is pinned (so the fallback is allowed), but the IdP's + # discovery doc carries no device_authorization_endpoint: that is the + # error under test, not the missing-issuer guard above. + self.state.settings = {'config': { + 'acl.oidc.enabled': True, + 'acl.oidc.client.id': 'questdb', + 'acl.oidc.token.endpoint': self.base + '/token', + }} + self.state.well_known = {'issuer': self.base, + 'token_endpoint': self.base + '/token'} + with self.assertRaises(OidcConfigError): + OidcDeviceAuth.from_questdb(self.base, issuer=self.base, + insecure=True) + + def test_non_dict_well_known_doc_raises_config_error(self): + # M2: an IdP discovery document that is valid JSON but not an object + # (a list/null/number/string from a captive portal, a misconfigured + # proxy, or a hostile IdP) must surface as a typed OidcConfigError, not + # a raw AttributeError from doc.get(...). issuer= is pinned so the + # fallback is allowed; the doc's shape is the error under test. 
+ self.state.settings = {'config': { + 'acl.oidc.enabled': True, + 'acl.oidc.client.id': 'questdb', + 'acl.oidc.token.endpoint': self.base + '/token', + }} + self.state.well_known = [] # valid JSON, but not an object + with self.assertRaises(OidcConfigError): + OidcDeviceAuth.from_questdb(self.base, issuer=self.base, + insecure=True) + + def test_malformed_endpoint_port_raises_config_error(self): + # /settings advertising a non-integer port in an endpoint must raise + # OidcConfigError (the typed contract), not a bare ValueError that + # callers catching OidcError would miss. See M6. + self.state.settings = {'config': { + 'acl.oidc.enabled': True, + 'acl.oidc.client.id': 'questdb', + 'acl.oidc.token.endpoint': 'https://idp:notaport/token', + 'acl.oidc.device.authorization.endpoint': + 'https://idp:notaport/device', + }} + with self.assertRaises(OidcConfigError): + OidcDeviceAuth.from_questdb(self.base, insecure=True) + + def test_loopback_flow_not_implemented(self): + # Reserved-but-unimplemented flow raises an OidcError subclass so it's + # caught by `except OidcError` like other config problems. + with self.assertRaises(OidcConfigError): + OidcDeviceAuth.from_questdb(self.base, flow='loopback', + insecure=True) + + def test_endpoint_origin_mismatch_rejected(self): + # /settings advertises the device endpoint on a different origin than + # the token endpoint: refuse rather than POST credentials off-origin. 
+ self.state.settings = {'config': { + 'acl.oidc.enabled': True, + 'acl.oidc.client.id': 'questdb', + 'acl.oidc.token.endpoint': self.base + '/token', + 'acl.oidc.device.authorization.endpoint': + 'http://127.0.0.2:9/device', # different host:port + }} + with self.assertRaises(OidcConfigError): + OidcDeviceAuth.from_questdb(self.base, insecure=True) + + def test_issuer_pin_rejects_off_origin_endpoints(self): + # Endpoints are internally consistent, but an explicit issuer pins them + # to a different origin -> reject (a compromised /settings can't + # redirect the token POST when the IdP is pinned). + self.state.settings = {'config': { + 'acl.oidc.enabled': True, + 'acl.oidc.client.id': 'questdb', + 'acl.oidc.token.endpoint': self.base + '/token', + 'acl.oidc.device.authorization.endpoint': self.base + '/device', + }} + with self.assertRaises(OidcConfigError): + OidcDeviceAuth.from_questdb( + self.base, issuer='https://idp.attacker.example', + insecure=True) + + def test_issuer_pin_accepts_matching_origin(self): + self.state.settings = {'config': { + 'acl.oidc.enabled': True, + 'acl.oidc.client.id': 'questdb', + 'acl.oidc.token.endpoint': self.base + '/token', + 'acl.oidc.device.authorization.endpoint': self.base + '/device', + }} + auth = OidcDeviceAuth.from_questdb( + self.base, issuer=self.base, insecure=True, renderer=Renderer()) + self.assertEqual(auth.config.device_authorization_endpoint, + self.base + '/device') + + def test_well_known_404_raises_oidc_error(self): + # issuer pinned (so the IdP fallback is allowed), but the .well-known + # document 404s: get_json maps the non-2xx to OidcError rather than a + # silent miss that would later masquerade as a missing-endpoint error. + # See M4. 
+ self.state.settings = {'config': { + 'acl.oidc.enabled': True, + 'acl.oidc.client.id': 'questdb', + 'acl.oidc.token.endpoint': self.base + '/token'}} + self.state.well_known = None # the handler returns 404 for /.well-known + with self.assertRaises(OidcError): + OidcDeviceAuth.from_questdb(self.base, issuer=self.base, + insecure=True) + + def test_connect_forwards_default_interval(self): + # M5: connect(**opts) routes through from_questdb; default_interval must + # be accepted (it previously raised TypeError) and reach the auth. + self.state.settings = {'config': { + 'acl.oidc.enabled': True, + 'acl.oidc.client.id': 'questdb', + 'acl.oidc.token.endpoint': self.base + '/token', + 'acl.oidc.device.authorization.endpoint': self.base + '/device'}} + qdb = connect(self.base, insecure=True, eager=False, default_interval=9, + renderer=Renderer(), interactive=True, _clock=FakeClock()) + self.assertEqual(qdb.auth._default_interval, 9) + + +class TestInsecureSettingsGuard(unittest.TestCase): + """ + M1: a /settings response fetched over plaintext http to a non-loopback host + (only reachable with insecure=True) is MITM-able, so IdP endpoints it + advertises must not be trusted to route the device code / refresh token + without an out-of-band issuer/discovery_url pin — even when BOTH endpoints + are present (so the co-location check would otherwise pass trivially). + """ + + _TAMPERED = { + 'acl.oidc.enabled': True, + 'acl.oidc.client.id': 'questdb', + 'acl.oidc.token.endpoint': 'https://evil.example.com/token', + 'acl.oidc.device.authorization.endpoint': + 'https://evil.example.com/device', + } + + def _resolve(self, settings, **kw): + # Stub the network: /settings returns the given (possibly tampered) map, + # and IdP discovery must never be contacted in these guard paths. 
+ from questdb.auth import _discovery + with mock.patch.object(_discovery, 'fetch_settings', + return_value=settings), \ + mock.patch.object( + _discovery, 'discover_device_endpoint_from_idp', + side_effect=AssertionError('IdP discovery must not run')): + return _discovery.resolve_config(**kw) + + def test_both_endpoints_over_plaintext_without_pin_rejected(self): + # The M1 case: both endpoints present at one (attacker) origin, plaintext + # channel, no pin -> refuse, and never contact the IdP. + with self.assertRaises(OidcConfigError) as cm: + self._resolve(self._TAMPERED, + questdb_url='http://qdb.internal.example:9000', + insecure=True) + self.assertIn('issuer', str(cm.exception)) + + def test_plaintext_guard_does_not_fire_for_loopback(self): + # Loopback http never leaves the host, so /settings is not MITM-able; + # the guard must not fire (the common local-dev path). + cfg = self._resolve(self._TAMPERED, + questdb_url='http://127.0.0.1:9000', insecure=True) + self.assertEqual(cfg.token_endpoint, 'https://evil.example.com/token') + + def test_plaintext_guard_does_not_fire_over_https(self): + # Over https /settings is authenticated by TLS; the documented + # trust-the-server behavior is preserved (issuer= stays optional). + cfg = self._resolve(self._TAMPERED, + questdb_url='https://qdb.example.com:9000') + self.assertEqual(cfg.device_authorization_endpoint, + 'https://evil.example.com/device') + + def test_explicit_endpoints_over_plaintext_are_trusted(self): + # Endpoints the caller passed explicitly are not /settings-supplied, so + # the guard must not force a pin even over a plaintext channel. 
+ cfg = self._resolve( + {'acl.oidc.enabled': True, 'acl.oidc.client.id': 'questdb'}, + questdb_url='http://qdb.internal.example:9000', insecure=True, + token_endpoint='https://idp.example.com/token', + device_authorization_endpoint='https://idp.example.com/device') + self.assertEqual(cfg.token_endpoint, 'https://idp.example.com/token') + + def test_pin_satisfies_guard_over_plaintext(self): + # With an out-of-band issuer pin the guard is satisfied (the actual + # origin validation then happens in OidcDeviceAuth.__init__). + cfg = self._resolve(self._TAMPERED, + questdb_url='http://qdb.internal.example:9000', + insecure=True, issuer='https://evil.example.com') + self.assertEqual(cfg.token_endpoint, 'https://evil.example.com/token') + + def test_issuer_path_scopes_settings_endpoints(self): + # M1: a tampered /settings advertising a DIFFERENT realm's endpoints on + # the SAME host (Keycloak path-based multi-tenancy) is rejected when the + # issuer is pinned to a specific realm — the origin check alone can't + # catch it because both realms share one origin. + kc = 'https://idp.example.com/realms' + evil = { + 'acl.oidc.enabled': True, 'acl.oidc.client.id': 'questdb', + 'acl.oidc.token.endpoint': + kc + '/EVIL/protocol/openid-connect/token', + 'acl.oidc.device.authorization.endpoint': + kc + '/EVIL/protocol/openid-connect/auth/device'} + with self.assertRaises(OidcConfigError) as cm: + self._resolve(evil, questdb_url='https://qdb.example.com:9000', + issuer=kc + '/prod') + self.assertIn('issuer', str(cm.exception).lower()) + # The pinned realm's own endpoints are accepted. 
+ good = { + 'acl.oidc.enabled': True, 'acl.oidc.client.id': 'questdb', + 'acl.oidc.token.endpoint': + kc + '/prod/protocol/openid-connect/token', + 'acl.oidc.device.authorization.endpoint': + kc + '/prod/protocol/openid-connect/auth/device'} + cfg = self._resolve(good, questdb_url='https://qdb.example.com:9000', + issuer=kc + '/prod') + self.assertEqual(cfg.token_endpoint, + kc + '/prod/protocol/openid-connect/token') + + def test_issuer_path_scope_skips_explicit_endpoints(self): + # Caller-explicit endpoints are trusted and NOT path-checked, so an IdP + # that places endpoints outside the issuer path (e.g. Azure AD) still + # works when the endpoints are passed explicitly. + cfg = self._resolve( + {'acl.oidc.enabled': True, 'acl.oidc.client.id': 'questdb'}, + questdb_url='https://qdb.example.com:9000', + issuer='https://idp.example.com/realms/prod', + token_endpoint='https://idp.example.com/oauth2/v2.0/token', + device_authorization_endpoint=( + 'https://idp.example.com/oauth2/v2.0/devicecode')) + self.assertEqual(cfg.token_endpoint, + 'https://idp.example.com/oauth2/v2.0/token') + + def test_issuer_path_scope_rejects_dot_segment_traversal(self): + # A tampered /settings can't slip a different realm past the issuer-path + # scope with a '..' segment: '/realms/prod/../EVIL/...' satisfies a naive + # prefix test but the IdP normalizes it to the EVIL realm. The dotted + # path must be rejected (even percent-encoded). See + # _endpoint_path_under_issuer. + kc = 'https://idp.example.com/realms' + for ep in (kc + '/prod/../EVIL/protocol/openid-connect', + kc + '/prod/%2e%2e/EVIL/protocol/openid-connect', + # double-encoded: a server that unescapes twice resolves the + # '..' the old single-decode check missed (M4). 
+ kc + '/prod/%252e%252e/EVIL/protocol/openid-connect'): + evil = { + 'acl.oidc.enabled': True, 'acl.oidc.client.id': 'questdb', + 'acl.oidc.token.endpoint': ep + '/token', + 'acl.oidc.device.authorization.endpoint': ep + '/auth/device'} + with self.assertRaises(OidcConfigError) as cm: + self._resolve(evil, questdb_url='https://qdb.example.com:9000', + issuer=kc + '/prod') + self.assertIn('issuer', str(cm.exception).lower()) + + +@unittest.skipIf(pd is None, 'pandas not installed') +class TestRestAdapter(AuthTestBase): + def _connected(self): + self.state.settings = {'config': { + 'acl.oidc.enabled': True, + 'acl.oidc.client.id': 'questdb', + 'acl.oidc.scope': 'openid groups', + 'acl.oidc.groups.encoded.in.token': True, + 'acl.oidc.token.endpoint': self.base + '/token', + 'acl.oidc.device.authorization.endpoint': self.base + '/device', + }} + self.state.expected_bearer = ID_TOKEN + return connect(self.base, insecure=True, renderer=Renderer(), + interactive=True, _clock=FakeClock()) + + def test_sql_returns_dataframe(self): + qdb = self._connected() + df = qdb.sql('SELECT * FROM trades') + self.assertEqual(list(df.columns), ['ts', 'price']) + self.assertEqual(len(df), 2) + self.assertEqual(df['price'].tolist(), [1.5, 2.5]) + # TIMESTAMP column coerced to datetime. + self.assertTrue(str(df['ts'].dtype).startswith('datetime64')) + + def test_sql_unauthorized_maps_to_auth_error(self): + qdb = self._connected() + self.state.expected_bearer = 'something-else' # force 401 + with self.assertRaises(OidcAuthError): + qdb.sql('SELECT 1') + + def test_connect_is_eager(self): + qdb = self._connected() + self.assertIsInstance(qdb, QuestDB) + # Sign-in already happened during connect(). 
+ self.assertEqual(self.state.device_requests, 1) + + def test_sql_query_error_maps_to_oidc_error(self): + qdb = self._connected() + self.state.exec_status = 400 + self.state.exec_response = {'error': 'unexpected token', 'position': 5} + with self.assertRaises(OidcError) as cm: + qdb.sql('SELEKT 1') + self.assertIn('unexpected token', str(cm.exception)) + self.assertNotIsInstance(cm.exception, OidcAuthError) + + def test_sql_passes_limit(self): + qdb = self._connected() + qdb.sql('SELECT * FROM trades', limit='1,10') + self.assertTrue(any('limit=1' in p for p in self.state.exec_requests)) + + def test_sql_handles_empty_dataset(self): + qdb = self._connected() + self.state.exec_response = {'ddl': 'OK'} # no columns / dataset + df = qdb.sql('CREATE TABLE x (a INT)') + self.assertEqual(len(df), 0) + + def test_sql_malformed_shape_raises_oidc_error(self): + qdb = self._connected() + self.state.exec_response = { # rows shorter than the column list + 'columns': [{'name': 'a', 'type': 'LONG'}, + {'name': 'b', 'type': 'LONG'}], + 'dataset': [[1]]} + with self.assertRaises(OidcError): + qdb.sql('SELECT a, b FROM t') + + def test_sql_non_json_2xx_raises_oidc_error(self): + # A 2xx body that isn't JSON (e.g. an HTML page from a reverse proxy) + # must raise a clean OidcError, not a raw JSONDecodeError. See M3. + qdb = self._connected() + self.state.exec_raw = (200, 'text/html', b'proxy') + with self.assertRaises(OidcError) as cm: + qdb.sql('SELECT 1') + self.assertNotIsInstance(cm.exception, OidcAuthError) + + def test_sql_non_dict_json_raises_oidc_error(self): + # A valid-JSON-but-not-an-object 2xx body (e.g. a bare list) must raise + # OidcError, not AttributeError from .get(). See M3. 
+ qdb = self._connected() + self.state.exec_response = ['not', 'an', 'object'] + with self.assertRaises(OidcError) as cm: + qdb.sql('SELECT 1') + self.assertNotIsInstance(cm.exception, OidcAuthError) + + def test_sql_non_dict_columns_raises_oidc_error(self): + # A /exec body whose "columns" entries aren't objects must raise a clean + # OidcError, not an AttributeError from .get() on the column. See M3. + qdb = self._connected() + self.state.exec_response = {'columns': [None], 'dataset': [[1]]} + with self.assertRaises(OidcError) as cm: + qdb.sql('SELECT 1') + self.assertNotIsInstance(cm.exception, OidcAuthError) + + def test_sql_non_string_column_name_raises_oidc_error(self): + # M2: a column descriptor with a non-hashable name (a JSON list/object) + # and a TIMESTAMP/DATE type must raise a clean OidcError, not a raw + # TypeError ("unhashable type") from `name in df.columns` during the + # timestamp coercion. + qdb = self._connected() + self.state.exec_response = { + 'columns': [{'name': ['evil'], 'type': 'TIMESTAMP'}, + {'name': 'b', 'type': 'LONG'}], + 'dataset': [['2021-01-01T00:00:00.000000Z', 2]]} + with self.assertRaises(OidcError) as cm: + qdb.sql('SELECT 1') + self.assertNotIsInstance(cm.exception, OidcAuthError) + + +class TestRestAdapterAuthErrors(AuthTestBase): + """QuestDB.sql maps 401/403 to OidcAuthError BEFORE it builds a DataFrame, + so the mapping is testable without a real pandas. Kept out of the + pandas-gated TestRestAdapter so this security-relevant mapping runs on EVERY + CI leg, not just the ones where pandas is installed. 
M5.""" + + def _connected(self): + self.state.settings = {'config': { + 'acl.oidc.enabled': True, + 'acl.oidc.client.id': 'questdb', + 'acl.oidc.scope': 'openid groups', + 'acl.oidc.groups.encoded.in.token': True, + 'acl.oidc.token.endpoint': self.base + '/token', + 'acl.oidc.device.authorization.endpoint': self.base + '/device', + }} + self.state.expected_bearer = ID_TOKEN + return connect(self.base, insecure=True, renderer=Renderer(), + interactive=True, _clock=FakeClock()) + + @staticmethod + def _stub_pandas(): + # sql() reaches the 401/403 check before it touches pandas, so a bare + # stub module is enough to exercise the mapping without the real + # (possibly absent) dependency. + return mock.patch.dict( + sys.modules, {'pandas': types.ModuleType('pandas')}) + + def test_sql_401_maps_to_auth_error_without_pandas(self): + qdb = self._connected() + self.state.expected_bearer = 'something-else' # force 401 + with self._stub_pandas(), self.assertRaises(OidcAuthError): + qdb.sql('SELECT 1') + + def test_sql_403_maps_to_auth_error_without_pandas(self): + qdb = self._connected() + self.state.exec_status = 403 # bearer matches; server forbids + self.state.exec_response = {'error': 'forbidden'} + with self._stub_pandas(), self.assertRaises(OidcAuthError): + qdb.sql('SELECT 1') + + +class TestConcurrency(AuthTestBase): + def test_valid_cached_token_does_not_block_during_signin(self): + # A caller with a valid cached token must NOT block behind another + # thread's in-progress sign-in: the fast path takes no lock. 
+ auth = self.make_auth() + valid = TokenSet( + access_token='a', id_token=ID_TOKEN, refresh_token='r', + expires_at=self._clock.now() + 3600) + auth._cache.store(auth.cache_key, valid) + + auth._lock.acquire() # simulate another thread mid-sign-in + try: + result = {} + t = threading.Thread( + target=lambda: result.update(tok=auth.token())) + t.start() + t.join(timeout=5) + self.assertFalse( + t.is_alive(), 'token() blocked behind an in-progress sign-in') + self.assertEqual(result.get('tok'), ID_TOKEN) + finally: + auth._lock.release() + + def test_concurrent_signin_prompts_only_once(self): + # Two threads racing with an empty cache must trigger exactly ONE + # device flow; the loser reuses the winner's token. + auth = self.make_auth() + entered = threading.Event() + release = threading.Event() + + class GatingRenderer(Renderer): + def on_prompt(self, resp): + entered.set() # first thread is now inside the flow + release.wait(5) # ...holding the acquisition lock + + auth._renderer = GatingRenderer() + results = {} + + def call(name): + try: + results[name] = auth.token() + except Exception as e: # noqa: BLE001 + results[name] = e + + t1 = threading.Thread(target=call, args=('a',)) + t1.start() + self.assertTrue(entered.wait(5)) # t1 holds the lock in the flow + t2 = threading.Thread(target=call, args=('b',)) + t2.start() + release.set() # let t1 finish signing in + t1.join(5) + t2.join(5) + # Fail loudly on a deadlock regression: a hung thread would otherwise + # leak and let the assertions below pass on a stale/half-filled dict. + self.assertFalse(t1.is_alive(), 'sign-in thread deadlocked') + self.assertFalse(t2.is_alive(), 'waiter thread deadlocked') + self.assertEqual(results.get('a'), ID_TOKEN) + self.assertEqual(results.get('b'), ID_TOKEN) + self.assertEqual(self.state.device_requests, 1) # no second prompt + + def test_fast_path_does_not_write_tokens_field(self): + # M4: the lock-free fast path must be READ-ONLY. 
Serving a valid token + # from the shared cache must not write self._tokens — only the locked + # slow path (and _store/clear) write it — so the lock-free reader can't + # race a concurrent write (lost update / clear() resurrection). + auth = self.make_auth() + valid = TokenSet(access_token='a', id_token=ID_TOKEN, refresh_token='r', + expires_at=self._clock.now() + 3600) + auth._cache.store(auth.cache_key, valid) + self.assertIsNone(auth._tokens) # nothing published yet + self.assertEqual(auth.token(), ID_TOKEN) # served via the fast path + self.assertIsNone(auth._tokens) # fast path did not write it + + def test_clear_on_other_instance_survives_inflight_acquire(self): + # Two OidcDeviceAuth instances share the process-global MemoryCache + # (same cache_key) but have separate per-instance locks. If instance B + # clears the entry while instance A's sign-in is in flight, A's store + # must NOT resurrect it: the per-key generation A captured before its + # round-trip no longer matches, so the write is dropped and the cache + # stays cleared (the next fresh load re-prompts, honoring clear()). A + # still returns the token it just acquired. See store_if_current. + a = self.make_auth() + b = self.make_auth() + self.assertEqual(a.cache_key, b.cache_key) + + class _ClearMidFlow(Renderer): + def on_prompt(self, resp): + b.clear() # concurrent clear during A's sign-in + + a._renderer = _ClearMidFlow() + self.assertEqual(a.token(), ID_TOKEN) # A still gets its token + # A's store was dropped, so the shared cache is NOT repopulated; a fresh + # instance therefore re-signs in rather than reusing the cleared token. + self.assertNotIn(a.cache_key, _MEMORY_STORE) + + def test_store_if_current_drops_write_after_concurrent_clear(self): + # Unit cover for the CAS primitive the cross-instance guard relies on: + # a generation captured before a clear() must not be allowed to store. 
+ cache = MemoryCache() + key = 'k' + gen = cache.generation(key) # captured before clear + cache.clear(key) # concurrent clear + self.assertFalse( + cache.store_if_current(key, TokenSet(access_token='T1'), gen)) + self.assertIsNone(cache.load(key)) # write dropped + gen2 = cache.generation(key) # unraced store succeeds + self.assertTrue( + cache.store_if_current(key, TokenSet(access_token='T2'), gen2)) + self.assertIsNotNone(cache.load(key)) + + +class TestAdapters(unittest.TestCase): + """Connection adapters: tested via injected fake modules (the real + sqlalchemy / psycopg / questdb.ingress need not be installed).""" + + def _qdb(self, url='http://db.example.com:9000', token='TKN'): + return QuestDB(url, _FakeAuth(token), insecure=True) + + def test_sender_builds_conf_with_token(self): + qdb = self._qdb('http://db.example.com:9000', token='TKN') + captured = {} + + fake = types.ModuleType('questdb.ingress') + + class Sender: + @staticmethod + def from_conf(conf, *, token=None, **kw): + captured.update(conf=conf, token=token, kw=kw) + return 'SENDER' + + fake.Sender = Sender + with mock.patch.dict(sys.modules, {'questdb.ingress': fake}): + sender = qdb.sender(auto_flush=False) + self.assertEqual(sender, 'SENDER') + self.assertEqual(captured['conf'], 'http::addr=db.example.com:9000;') + self.assertEqual(captured['token'], 'TKN') + self.assertEqual(captured['kw'], {'auto_flush': False}) + + def test_sender_https_defaults_to_443(self): + qdb = self._qdb('https://db.example.com') # no explicit port + captured = {} + fake = types.ModuleType('questdb.ingress') + + class Sender: + @staticmethod + def from_conf(conf, *, token=None, **kw): + captured['conf'] = conf + return 'S' + + fake.Sender = Sender + with mock.patch.dict(sys.modules, {'questdb.ingress': fake}): + qdb.sender() + self.assertEqual(captured['conf'], 'https::addr=db.example.com:443;') + + def test_psycopg_connects_as_sso_with_token(self): + qdb = self._qdb('http://db.example.com:9000', token='TKN') + 
captured = {} + fake = types.ModuleType('psycopg') + + def connect(**kw): + captured.update(kw) + return 'CONN' + + fake.connect = connect + with mock.patch.dict(sys.modules, {'psycopg': fake}): + conn = qdb.psycopg(connect_timeout=3) + self.assertEqual(conn, 'CONN') + self.assertEqual(captured['user'], '_sso') + self.assertEqual(captured['password'], 'TKN') + self.assertEqual(captured['host'], 'db.example.com') + self.assertEqual(captured['port'], 8812) + self.assertEqual(captured['dbname'], 'qdb') + self.assertEqual(captured['connect_timeout'], 3) + # The token is fetched at connect time (fresh per connection). + self.assertEqual(qdb.auth.calls, 1) + + def test_sqlalchemy_engine_injects_fresh_token_per_connect(self): + auth = _FakeAuth('TKN') + qdb = QuestDB('http://db.example.com:9000', auth, insecure=True) + created = {} + events = {} + engine_obj = object() + + fake_sa = types.ModuleType('sqlalchemy') + fake_sa.__path__ = [] + + def create_engine(url, **kw): + created.update(url=url, engine_kw=kw) + return engine_obj + + class _Event: + @staticmethod + def listens_for(target, name): + def deco(fn): + events.update(name=name, fn=fn) + return fn + return deco + + fake_sa.create_engine = create_engine + fake_sa.event = _Event + + fake_eng = types.ModuleType('sqlalchemy.engine') + + class _URL: + @staticmethod + def create(**kw): + created.update(kw) + return 'URL' + + fake_eng.URL = _URL + fake_pg = types.ModuleType('psycopg') # drives the drivername choice + + with mock.patch.dict(sys.modules, { + 'sqlalchemy': fake_sa, + 'sqlalchemy.engine': fake_eng, + 'psycopg': fake_pg}): + engine = qdb.sqlalchemy_engine(pool_pre_ping=True) + + self.assertIs(engine, engine_obj) + self.assertEqual(created['drivername'], 'postgresql+psycopg') + self.assertEqual(created['username'], '_sso') + self.assertEqual(created['host'], 'db.example.com') + self.assertEqual(created['port'], 8812) + self.assertEqual(created['database'], 'qdb') + self.assertEqual(created['url'], 'URL') + 
self.assertEqual(created['engine_kw'], {'pool_pre_ping': True}) + self.assertEqual(events['name'], 'do_connect') + # The listener injects a fresh token on each new connection. + before = auth.calls + for _ in range(2): + cparams = {} + events['fn'](None, None, [], cparams) + self.assertEqual(cparams['password'], 'TKN') + self.assertEqual(auth.calls - before, 2) + + def test_sender_brackets_ipv6_addr(self): + # An IPv6 literal must be bracketed in the ILP addr=host:port conf, + # else "::1:9000" is ambiguous to the conf parser. See M5. + qdb = self._qdb('https://[::1]:9000') + captured = {} + fake = types.ModuleType('questdb.ingress') + + class Sender: + @staticmethod + def from_conf(conf, *, token=None, **kw): + captured['conf'] = conf + return 'S' + + fake.Sender = Sender + with mock.patch.dict(sys.modules, {'questdb.ingress': fake}): + qdb.sender() + self.assertEqual(captured['conf'], 'https::addr=[::1]:9000;') + + def test_sender_forwards_ca_bundle_as_tls_roots(self): + # M2: an https Sender must inherit the private CA bundle (as tls_roots) + # so it trusts the same roots as the REST/IdP paths; http does not, and + # an explicit tls_roots= is never overridden. + import tempfile + + def captured_conf_kwargs(url, *, ca_bundle, **sender_kwargs): + auth = _FakeAuth('TKN') + auth._ca_bundle = ca_bundle + qdb = QuestDB(url, auth, insecure=True) + captured = {} + fake = types.ModuleType('questdb.ingress') + + class Sender: + @staticmethod + def from_conf(conf, *, token=None, **kw): + captured['kw'] = kw + return 'S' + + fake.Sender = Sender + with mock.patch.dict(sys.modules, {'questdb.ingress': fake}): + qdb.sender(**sender_kwargs) + return captured['kw'] + + with tempfile.NamedTemporaryFile('w', suffix='.pem', delete=False) as f: + f.write('-----dummy-----') + ca = f.name + try: + # https + a real CA file -> forwarded as tls_roots. 
+ self.assertEqual( + captured_conf_kwargs('https://db.example.com:9000', + ca_bundle=ca).get('tls_roots'), ca) + # http -> never forwarded (TLS roots are irrelevant). + self.assertNotIn( + 'tls_roots', + captured_conf_kwargs('http://db.example.com:9000', + ca_bundle=ca)) + # An explicit tls_roots= wins over the inherited bundle. + self.assertEqual( + captured_conf_kwargs('https://db.example.com:9000', + ca_bundle=ca, + tls_roots='/other/ca.pem').get('tls_roots'), + '/other/ca.pem') + finally: + os.unlink(ca) + + def test_psycopg_uses_bare_ipv6_host(self): + # psycopg takes host and port separately, so the IPv6 host is passed + # WITHOUT brackets (unlike the ILP addr= form). See M5. + qdb = self._qdb('http://[::1]:9000') + captured = {} + fake = types.ModuleType('psycopg') + + def connect(**kw): + captured.update(kw) + return 'CONN' + + fake.connect = connect + with mock.patch.dict(sys.modules, {'psycopg': fake}): + qdb.psycopg() + self.assertEqual(captured['host'], '::1') + + def test_require_host_rejects_hostless_url(self): + # A URL with no extractable host must raise, not pass None to a driver; + # an explicit host= override still resolves. See M5. + for bad in ('localhost', 'questdb:9000'): + with self.subTest(url=bad): + with self.assertRaises(OidcConfigError): + QuestDB(bad, _FakeAuth(), insecure=True)._require_host() + self.assertEqual( + QuestDB('localhost', _FakeAuth())._require_host('h.example'), + 'h.example') + + def test_malformed_port_url_raises_config_error(self): + # A QuestDB URL with a non-integer port must raise OidcConfigError at + # construction, not a bare ValueError when an adapter reads .port. M3. + with self.assertRaises(OidcConfigError): + QuestDB('https://questdb.example.com:notaport', _FakeAuth(), + insecure=True) + + def test_host_with_conf_metachars_rejected(self): + # C1: a host containing the ILP conf delimiters (';' / '=') or + # whitespace must be rejected, never spliced into the + # `addr=host:port;` conf string. 
Otherwise a crafted/tampered URL host + # injects extra conf params — e.g. `tls_verify=unsafe_off`, which + # silently disables the sender's TLS certificate verification, or + # `auto_flush=off` (data loss). urlparse() keeps ';'/'=' in .hostname. + for bad in ('https://realhost;tls_verify=unsafe_off;x=', + 'https://a=b'): + with self.subTest(url=bad): + with self.assertRaises(OidcConfigError): + self._qdb(bad)._require_host() + # An explicit host= override goes through the same guard (incl. + # whitespace, which is never valid in a host). + for bad_host in ('evil;tls_verify=unsafe_off', 'a=b', 'h ost'): + with self.subTest(host=bad_host): + with self.assertRaises(OidcConfigError): + self._qdb()._require_host(bad_host) + # A legitimate host (incl. an IPv6 literal, which contains ':') is + # still accepted — the guard must not over-reject. + self.assertEqual(self._qdb()._require_host('::1'), '::1') + self.assertEqual( + self._qdb()._require_host('questdb.example.com'), + 'questdb.example.com') + # The guard fires through the adapter (sender), before the conf string + # is built and handed to Sender.from_conf. + qdb = self._qdb('https://realhost;tls_verify=unsafe_off:9000') + fake = types.ModuleType('questdb.ingress') + fake.Sender = object() # import must succeed so we reach the guard + with mock.patch.dict(sys.modules, {'questdb.ingress': fake}): + with self.assertRaises(OidcConfigError): + qdb.sender() + + def test_sender_hostless_url_raises(self): + # The guard propagates through an adapter (not just the helper): + # sender() on a host-less URL raises OidcConfigError. See M5. 
+ qdb = self._qdb('questdb:9000') + fake = types.ModuleType('questdb.ingress') + fake.Sender = object() # import must succeed so we reach the guard + with mock.patch.dict(sys.modules, {'questdb.ingress': fake}): + with self.assertRaises(OidcConfigError): + qdb.sender() + + def test_sql_missing_pandas_raises(self): + qdb = self._qdb() + with mock.patch.dict(sys.modules, {'pandas': None}): + with self.assertRaises(ImportError): + qdb.sql('SELECT 1') + + @unittest.skipIf(importlib.util.find_spec('sqlalchemy') is not None, + 'sqlalchemy installed') + def test_sqlalchemy_engine_missing_dep_raises(self): + with self.assertRaises(ImportError): + self._qdb().sqlalchemy_engine() + + @unittest.skipIf(_HAS_PG_DRIVER, 'a PostgreSQL driver is installed') + def test_psycopg_missing_dep_raises(self): + with self.assertRaises(ImportError): + self._qdb().psycopg() + + @unittest.skipIf(_HAS_PG_DRIVER, 'a PostgreSQL driver is installed') + def test_pg_module_missing_chains_cause(self): + # The "no PG driver" ImportError chains the underlying import failure + # (raise ... from e) so the traceback preserves the real cause. + from questdb.auth._questdb import _pg_module + with self.assertRaises(ImportError) as cm: + _pg_module() + self.assertIsInstance(cm.exception.__cause__, ImportError) + + @unittest.skipIf(importlib.util.find_spec('questdb.ingress') is not None, + 'questdb.ingress extension is built') + def test_sender_missing_extension_raises(self): + with self.assertRaises(ImportError): + self._qdb().sender() + + def test_sender_rejects_non_integer_port(self): + # A non-integer port kwarg must be rejected before it can be + # interpolated into the addr= conf string, where ";tls_verify= + # unsafe_off" would silently disable TLS verification — the same + # injection _require_host() blocks for the host. The coercion runs + # before the extension import, so this fails cleanly even without it. 
+ qdb = self._qdb('https://db.example.com:9000') + for bad in ('9000;tls_verify=unsafe_off', 'notaport', ['9000']): + with self.assertRaises(OidcConfigError): + qdb.sender(port=bad) + + +class TestConfigHelpers(unittest.TestCase): + def test_as_bool_variants(self): + from questdb.auth._discovery import _as_bool + for v in ('true', 'True', '1', 'yes', 'on', True, 1): + self.assertIs(_as_bool(v), True) + for v in ('false', '0', 'no', 'off', '', False, 0): + self.assertIs(_as_bool(v), False) + self.assertIsNone(_as_bool(None)) + self.assertIs(_as_bool(None, default=True), True) + + def test_resolve_endpoint_relative_path(self): + from questdb.auth._discovery import _resolve_endpoint + cfg = {'acl.oidc.host': 'idp.example.com', + 'acl.oidc.tls.enabled': True, 'acl.oidc.port': 443} + self.assertEqual(_resolve_endpoint('/as/token.oauth2', cfg), + 'https://idp.example.com:443/as/token.oauth2') + self.assertEqual(_resolve_endpoint('https://idp/x', cfg), + 'https://idp/x') # absolute is kept verbatim + + def test_resolve_endpoint_ignores_non_string(self): + # A non-string endpoint from /settings (e.g. a JSON number) must be + # treated as absent, not raise AttributeError from .startswith(). M3. + from questdb.auth._discovery import _resolve_endpoint + self.assertIsNone(_resolve_endpoint(8080, {})) + self.assertIsNone(_resolve_endpoint(True, {})) + + def test_str_setting_ignores_non_string(self): + # A non-empty string passes through; anything else (a JSON list / + # number / dict, None, empty string) reads as absent so it can't reach + # scope.split() / the cache-key join as a raw object. 
+ from questdb.auth._discovery import _str_setting + self.assertEqual(_str_setting('openid email'), 'openid email') + for bad in (['openid'], 12345, {'x': 1}, True, '', None): + self.assertIsNone(_str_setting(bad)) + + def test_non_string_settings_do_not_crash_resolution(self): + # A buggy/tampered /settings advertising non-string acl.oidc.* values + # must stay within the typed-error contract instead of crashing later + # with a bare AttributeError / TypeError (scope.split() / the cache-key + # join). scope falls back to 'openid', audience drops to None, and a + # non-string client.id reads as absent -> clear OidcConfigError. + from questdb.auth import _discovery + base = { + 'acl.oidc.enabled': True, 'acl.oidc.client.id': 'questdb', + 'acl.oidc.token.endpoint': 'https://idp.example.com/token', + 'acl.oidc.device.authorization.endpoint': + 'https://idp.example.com/device'} + + def from_settings(settings): + with mock.patch.object(_discovery, 'fetch_settings', + return_value=settings): + return OidcDeviceAuth.from_questdb( + 'https://qdb.example.com:9000', renderer=Renderer()) + + auth = from_settings({**base, 'acl.oidc.scope': ['openid', 'groups'], + 'acl.oidc.audience': {'x': 1}}) + self.assertEqual(auth.config.scope, 'openid') # non-string -> default + self.assertIsNone(auth.config.audience) # non-string -> dropped + self.assertTrue(auth.cache_key) # crash site now safe + # A non-string client.id reads as absent -> clear typed error. + with self.assertRaises(OidcConfigError): + from_settings({**base, 'acl.oidc.client.id': 12345}) + + def test_non_string_idp_discovery_values_do_not_crash(self): + # The IdP .well-known discovery document is untrusted too: a non-string + # endpoint / issuer (a JSON number/list from a buggy or hostile IdP) + # must read as absent -> a clear OidcConfigError, not a bare + # AttributeError from safe_urlparse later. See resolve_config discovery. 
+ from questdb.auth import _discovery + settings = {'acl.oidc.enabled': True, 'acl.oidc.client.id': 'questdb'} + + def from_discovery(well_known, **kw): + with mock.patch.object(_discovery, 'fetch_settings', + return_value=settings), \ + mock.patch.object( + _discovery, 'discover_device_endpoint_from_idp', + return_value=well_known): + return OidcDeviceAuth.from_questdb( + 'https://qdb.example.com:9000', renderer=Renderer(), **kw) + + # Non-string token / device endpoint -> absent -> clear typed error. + with self.assertRaises(OidcConfigError): + from_discovery( + {'device_authorization_endpoint': 'https://idp.example.com/device', + 'token_endpoint': 12345}, + issuer='https://idp.example.com') + with self.assertRaises(OidcConfigError): + from_discovery( + {'device_authorization_endpoint': ['nope'], + 'token_endpoint': 'https://idp.example.com/token'}, + issuer='https://idp.example.com') + # A non-string discovered issuer is dropped (no pin); valid endpoints + # still resolve and the cache key builds (the former crash site). + auth = from_discovery( + {'device_authorization_endpoint': 'https://idp.example.com/device', + 'token_endpoint': 'https://idp.example.com/token', + 'issuer': ['not', 'a', 'string']}, + discovery_url='https://idp.example.com/.well-known/openid-configuration') + self.assertIsNone(auth.config.issuer) + self.assertTrue(auth.cache_key) + + def test_resolve_endpoint_relative_path_without_host_is_none(self): + # A path-only endpoint with no acl.oidc.host can't be resolved; it must + # be treated as absent (None) so resolution fails with a clear "could + # not resolve the ... endpoint" error rather than a scheme-less "/path" + # that later surfaces as a confusing "insecure/malformed URL". 
+ from questdb.auth._discovery import _resolve_endpoint + self.assertIsNone(_resolve_endpoint('/as/token.oauth2', {})) + self.assertIsNone( # port present but host missing -> still unresolved + _resolve_endpoint('/as/token.oauth2', {'acl.oidc.port': 443})) + + def test_resolve_endpoint_ignores_non_string_host(self): + # A non-string acl.oidc.host (a JSON number/list from a buggy or hostile + # /settings) must not be interpolated raw into the netloc (e.g. + # https://12345:9000/path); treat it as absent so a path-only endpoint + # reads as unresolvable, mirroring how endpoint values are coerced. + from questdb.auth._discovery import _resolve_endpoint + for bad_host in (12345, ['idp'], {'h': 'idp'}, True): + self.assertIsNone( + _resolve_endpoint('/as/token', {'acl.oidc.host': bad_host})) + # A non-numeric port is dropped rather than corrupting the netloc. + self.assertEqual( + _resolve_endpoint('/as/token', { + 'acl.oidc.host': 'idp', 'acl.oidc.tls.enabled': True, + 'acl.oidc.port': ['x']}), + 'https://idp/as/token') + + def test_settings_config_nesting(self): + from questdb.auth._discovery import settings_config + self.assertEqual(settings_config({'config': {'a': 1}}), {'a': 1}) + self.assertEqual(settings_config({'a': 1}), {'a': 1}) # flat fallback + + def test_settings_config_ignores_user_writable_preferences(self): + # QuestDB /settings nests server-authoritative values under "config" + # alongside a user-writable "preferences" sibling (the web console + # persists UI prefs there). Discovery must read only "config", so a user + # who can write a preference cannot smuggle an acl.oidc.* key in to + # redirect the device code / refresh token. Ported from the Java client. 
+ from questdb.auth._discovery import settings_config + resp = { + 'config': { + 'acl.oidc.client.id': 'questdb', + 'acl.oidc.token.endpoint': 'https://idp.example.com/token'}, + 'preferences.version': 0, + 'preferences': { + 'acl.oidc.token.endpoint': 'https://evil.example.com/token'}, + } + cfg = settings_config(resp) + self.assertEqual(cfg['acl.oidc.token.endpoint'], + 'https://idp.example.com/token') + self.assertNotIn('evil', str(cfg)) + # A structured response (one carrying the user-writable "preferences" + # sibling) must NOT fall back to trusting the top level when "config" is + # absent or malformed: read nothing rather than the top level. + self.assertEqual( + settings_config({'preferences': {'acl.oidc.token.endpoint': 'x'}}), + {}) + self.assertEqual( + settings_config({'config': None, + 'preferences': {'acl.oidc.client.id': 'x'}}), + {}) + # A genuinely flat / legacy response (no config/preferences split) is + # still tolerated at the top level. + self.assertEqual(settings_config({'acl.oidc.client.id': 'q'}), + {'acl.oidc.client.id': 'q'}) + + def test_make_cache_variants(self): + # The cache factory resolves the documented specs and rejects an + # unknown one with a typed error. See M4. + from questdb.auth._cache import make_cache, MemoryCache, NullCache + self.assertIsInstance(make_cache('memory'), MemoryCache) + self.assertIsInstance(make_cache(None), NullCache) + self.assertIsInstance(make_cache('none'), NullCache) + custom = MemoryCache() + self.assertIs(make_cache(custom), custom) # a TokenCache passes through + with self.assertRaises(OidcConfigError): + make_cache('disk') + + +class TestEndpointValidation(unittest.TestCase): + def setUp(self): + from questdb.auth._discovery import validate_endpoint_origins + self._validate = validate_endpoint_origins + + def test_default_port_equivalence_accepted(self): + # https default (443) vs explicit :443 normalize to the same origin. 
+ self._validate('https://idp/token', 'https://idp:443/device') + + def test_ipv6_same_origin_accepted(self): + self._validate('https://[::1]/token', 'https://[::1]/device') + + def test_off_origin_device_rejected(self): + with self.assertRaises(OidcConfigError): + self._validate('https://idp/token', 'https://evil.example/device') + + def test_both_endpoints_off_issuer_rejected(self): + # Endpoints agree with each other but not with the pinned issuer: + # the issuer-pin loop must check both, not just their consistency. + with self.assertRaises(OidcConfigError): + self._validate('https://idp/token', 'https://idp/device', + issuer='https://other-issuer.example') + + def test_malformed_port_raises_config_error(self): + # A non-integer port must surface as OidcConfigError, not urllib's bare + # ValueError (which callers catching OidcError would miss). See M6. + with self.assertRaises(OidcConfigError): + self._validate('https://idp:notaport/token', + 'https://idp:notaport/device') + + def test_malformed_ipv6_endpoint_raises_config_error(self): + # A malformed IPv6 literal makes urllib.parse.urlparse() itself raise + # ValueError (before .port is read); it must surface as OidcConfigError, + # not a bare ValueError escaping the typed-error contract. See M1. + with self.assertRaises(OidcConfigError): + self._validate('https://[::1', 'https://[::1') + with self.assertRaises(OidcConfigError): + OidcDeviceAuth( + client_id='questdb', + device_authorization_endpoint='https://[::1', + token_endpoint='https://[::1', + renderer=Renderer()) + + def test_explicit_constructor_enforces_co_location(self): + with self.assertRaises(OidcConfigError): + OidcDeviceAuth( + client_id='questdb', + device_authorization_endpoint='https://idp.example.com/device', + token_endpoint='https://attacker.example/token', + renderer=Renderer()) + + def test_endpoint_path_under_issuer(self): + # M1: segment-aware path containment used to isolate path-based realms. 
+ from questdb.auth._discovery import _endpoint_path_under_issuer as under + iss = 'https://idp.example.com/realms/prod' + self.assertTrue(under(iss + '/protocol/openid-connect/token', iss)) + self.assertTrue(under(iss, iss)) # exact path + self.assertTrue(under(iss + '/', iss)) # trailing slash + self.assertFalse(under('https://idp.example.com/realms/EVIL/token', iss)) + self.assertFalse( # not a *segment* prefix: prod != production + under('https://idp.example.com/realms/production/token', iss)) + # A root issuer (no path) constrains the origin only -> any path is in. + self.assertTrue( + under('https://idp.example.com/anything', 'https://idp.example.com')) + self.assertTrue( + under('https://idp.example.com/x', 'https://idp.example.com/')) + # A '.' / '..' segment (even percent-encoded) is rejected: urllib sends + # the dotted path verbatim and the IdP / proxy normalizes it to a + # DIFFERENT realm, which an origin check can't catch. + self.assertFalse(under(iss + '/../EVIL/protocol/token', iss)) + self.assertFalse(under(iss + '/%2e%2e/EVIL/token', iss)) + self.assertFalse(under(iss + '/./token', iss)) + # Encodings/escapes the old decode-once-then-compare-raw check let + # through (M4): a server that unescapes more than once, folds a + # backslash to '/', or normalizes the last segment's ;params would + # resolve these to a DIFFERENT realm, so they must be rejected too. + self.assertFalse(under(iss + '/%252e%252e/EVIL/token', iss)) # 2x-enc + self.assertFalse(under(iss + '/..\\EVIL/token', iss)) # backslash + self.assertFalse(under(iss + '/token;..%2f..%2fEVIL', iss)) # ;params + # A legitimate sub-path with a (non-traversal) percent-escape or matrix + # param is still accepted — only dot traversal is rejected. 
+ self.assertTrue(under(iss + '/some%20path/token', iss)) + self.assertTrue(under(iss + '/token;jsessionid=abc', iss)) + + +class TestCacheKey(unittest.TestCase): + def _auth(self, **kw): + opts = dict( + client_id='questdb', + device_authorization_endpoint='https://idp.example.com/device', + token_endpoint='https://idp.example.com/token', + scope='openid groups', groups_in_token=True, cache='memory', + renderer=Renderer()) + opts.update(kw) + return OidcDeviceAuth(**opts) + + def test_normalize_url_malformed_port_raises_config_error(self): + # cache_key normalization shares the same typed-port guard: a malformed + # port raises OidcConfigError, not a bare ValueError. See M6. + from questdb.auth._device import _normalize_url + with self.assertRaises(OidcConfigError): + _normalize_url('https://idp:notaport/token') + + def test_normalize_url_malformed_ipv6_raises_config_error(self): + # cache_key normalization must also map a malformed IPv6 literal (which + # makes urlparse itself raise) to OidcConfigError, not a bare + # ValueError. See M1. + from questdb.auth._device import _normalize_url + with self.assertRaises(OidcConfigError): + _normalize_url('https://[::1') + + def test_realm_path_distinguishes_key(self): + # Multi-tenant IdP: same host, different realm path -> distinct keys + # (the old origin-only key collided, leaking one realm's token). 
+ a = self._auth( + token_endpoint='https://idp.example.com/realmA/token', + device_authorization_endpoint='https://idp.example.com/realmA/dev') + b = self._auth( + token_endpoint='https://idp.example.com/realmB/token', + device_authorization_endpoint='https://idp.example.com/realmB/dev') + self.assertNotEqual(a.cache_key, b.cache_key) + + def test_scope_order_does_not_change_key(self): + self.assertEqual( + self._auth(scope='openid groups').cache_key, + self._auth(scope='groups openid').cache_key) + + def test_audience_distinguishes_key(self): + self.assertNotEqual( + self._auth(audience='aud-1').cache_key, + self._auth(audience='aud-2').cache_key) + + def test_default_port_normalized(self): + self.assertEqual( + self._auth(token_endpoint='https://idp.example.com/token').cache_key, + self._auth( + token_endpoint='https://idp.example.com:443/token').cache_key) + + def test_groups_in_token_distinguishes_key(self): + # groups_in_token selects which token kind _select returns, so two + # sessions differing ONLY in that mode must not collide on one cache + # entry (and evict each other). scope already has 'openid' here, so the + # keys can differ only by the mode. + self.assertNotEqual( + self._auth(groups_in_token=True).cache_key, + self._auth(groups_in_token=False).cache_key) + + +class TestTransportSecurity(unittest.TestCase): + def test_require_secure_policy(self): + from questdb.auth._http import _require_secure + # https is always fine. + _require_secure('https://idp.example.com/x', insecure=False) + # loopback http never leaves the host -> always allowed. + _require_secure('http://127.0.0.1:9000/x', insecure=False) + _require_secure('http://localhost/x', insecure=False) + _require_secure('http://[::1]:8080/x', insecure=False) + # non-loopback http is refused unless insecure is explicitly set. 
+ with self.assertRaises(OidcConfigError): + _require_secure('http://idp.example.com/x', insecure=False) + _require_secure('http://idp.example.com/x', insecure=True) + + def test_post_form_attaches_status_to_non_json_error(self): + # The device-flow poll loop and the silent refresh classify a non-JSON + # token-endpoint failure (4xx terminal vs 5xx/429 transient) by the HTTP + # status, so post_form must attach it to the raised OidcError. M1/M2. + from questdb.auth._http import post_form + with _raw_response_server(403, 'text/plain', b'forbidden') as raw: + with self.assertRaises(OidcError) as cm: + post_form(raw + '/token', {'grant_type': 'x'}) + self.assertEqual(cm.exception.status, 403) + # A non-JSON 5xx likewise carries its status (classified as transient). + with _raw_response_server(503, 'text/html', b'bad gw') as raw: + with self.assertRaises(OidcError) as cm: + post_form(raw + '/token', {'grant_type': 'x'}) + self.assertEqual(cm.exception.status, 503) + + def test_insecure_does_not_downgrade_idp(self): + # insecure=True must NOT permit plaintext to a non-loopback IdP: the + # device code / refresh token must never traverse the network in clear. + auth = OidcDeviceAuth( + client_id='questdb', + device_authorization_endpoint='http://idp.example.com/device', + token_endpoint='http://idp.example.com/token', + scope='openid', groups_in_token=False, cache='memory', + insecure=True, interactive=True, renderer=Renderer(), + _clock=FakeClock()) + with self.assertRaises(OidcConfigError): + auth.token() + + def test_redirects_are_not_followed(self): + # A 30x must NOT be followed: urllib would otherwise re-send the + # Authorization: Bearer header (and downgrade to plaintext http) to the + # redirect target, leaking the QuestDB token off-origin (only the + # original URL is vetted, never the redirect target). The redirect must + # surface as a non-2xx response, and the off-origin host must never be + # contacted. See C1. 
+ from questdb.auth import _http + + seen = [] + + class _Redir(http.server.BaseHTTPRequestHandler): + def log_message(self, *a): + pass + + def do_GET(self): + seen.append((self.path, self.headers.get('Authorization'))) + if self.path == '/exec': + self.send_response(302) + self.send_header('Location', attacker + '/stolen') + self.end_headers() + else: + self.send_response(200) + self.send_header('Content-Length', '2') + self.end_headers() + self.wfile.write(b'{}') + + victim = http.server.HTTPServer(('127.0.0.1', 0), _Redir) + thief = http.server.HTTPServer(('127.0.0.1', 0), _Redir) + attacker = f'http://127.0.0.1:{thief.server_port}' + for srv in (victim, thief): + threading.Thread(target=srv.serve_forever, daemon=True).start() + try: + resp = _http.request( + 'GET', f'http://127.0.0.1:{victim.server_port}/exec', + headers={'Authorization': 'Bearer SECRET'}, timeout=5) + finally: + for srv in (victim, thief): + srv.shutdown() + srv.server_close() + + # The redirect surfaced as a non-2xx response, was not followed, and the + # off-origin target never saw the request (or the bearer token). + self.assertEqual(resp.status, 302) + self.assertEqual(seen, [('/exec', 'Bearer SECRET')]) + + def test_malformed_url_raises_config_error(self): + # A non-integer port must surface as OidcConfigError, not a raw + # http.client.InvalidURL escaping the typed-error contract — this is the + # path the QuestDB /settings / discovery fetches go through. See M3. + from questdb.auth._http import request + with self.assertRaises(OidcConfigError): + request('GET', 'https://questdb.example.com:notaport/settings', + timeout=5) + + def test_require_secure_rejects_malformed_ipv6(self): + # _require_secure routes through safe_urlparse, so a malformed IPv6 + # endpoint raises OidcConfigError instead of a bare ValueError (urlparse + # raises before the scheme is even inspected). See M1. 
+ from questdb.auth._http import _require_secure + with self.assertRaises(OidcConfigError): + _require_secure('https://[::1', insecure=False) + # A well-formed IPv6 URL is still accepted (loopback http is allowed). + _require_secure('http://[::1]:8080/x', insecure=False) + + def test_bad_ca_bundle_raises_config_error(self): + # A missing or invalid CA bundle path (explicit or via env) must surface + # as OidcConfigError, not a raw FileNotFoundError / ssl.SSLError. See M1. + import tempfile + from questdb.auth._http import build_ssl_context + with self.assertRaises(OidcConfigError): + build_ssl_context('/no/such/path/ca.pem') + with tempfile.NamedTemporaryFile('w', suffix='.pem', delete=False) as f: + f.write('not a certificate') + bad = f.name + try: + with self.assertRaises(OidcConfigError): + build_ssl_context(bad) + finally: + os.unlink(bad) + + def test_deeply_nested_json_raises_oidc_error(self): + # A RecursionError from json.loads (a deeply-nested JSON body exhausts + # the decoder's stack) must be mapped to OidcError, not escape the + # typed-error contract. The depth at which json actually raises is a + # Python-version detail — the C scanner ignores sys.setrecursionlimit, + # and 3.14 parses far deeper than 3.13, so a fixed-depth body no longer + # raises there — so inject the RecursionError directly to test the + # mapping deterministically across versions. See M1. + from questdb.auth import _http + with _raw_response_server( + 200, 'application/json', b'{"ok": true}') as base, \ + mock.patch.object( + _http.json, 'loads', + side_effect=RecursionError('nesting too deep')): + with self.assertRaises(OidcError): + _http.get_json(base + '/x', timeout=5) + with self.assertRaises(OidcError): + _http.post_form(base + '/x', {'a': 'b'}, timeout=5) + + def test_post_form_non_json_2xx_raises_oidc_error(self): + # A 2xx body from the token/device endpoint that isn't JSON (e.g. 
an + # HTML login page from a proxy in front of the IdP) must surface as + # OidcError, not a raw decoder error. Only /exec had this before. M4. + from questdb.auth import _http + with _raw_response_server(200, 'text/html', b'login') as b: + with self.assertRaises(OidcError): + _http.post_form(b + '/token', {'a': 'b'}, timeout=5) + + def test_post_form_non_dict_json_raises_oidc_error(self): + # A 2xx JSON array (valid JSON but not an object) from the token + # endpoint must surface as OidcError. See M4. + from questdb.auth import _http + with _raw_response_server(200, 'application/json', b'[1, 2, 3]') as b: + with self.assertRaises(OidcError): + _http.post_form(b + '/token', {'a': 'b'}, timeout=5) + + def test_get_json_non_2xx_raises_oidc_error(self): + # A non-2xx /settings or discovery response must surface as OidcError. + # See M4. + from questdb.auth import _http + with _raw_response_server(500, 'text/plain', b'boom') as b: + with self.assertRaises(OidcError): + _http.get_json(b + '/settings', timeout=5) + + def test_get_json_non_json_2xx_raises_oidc_error(self): + # A 2xx /settings or discovery body that isn't JSON must surface as + # OidcError, not a raw JSONDecodeError. See M4. 
+ from questdb.auth import _http + with _raw_response_server(200, 'text/html', b'x') as b: + with self.assertRaises(OidcError): + _http.get_json( + b + '/.well-known/openid-configuration', timeout=5) + + +class TestRendererSecurity(unittest.TestCase): + """The Jupyter prompt must never turn an IdP-supplied URL into a + clickable/executable link unless it uses an http(s) scheme.""" + + def test_safe_link_url_allowlist(self): + from questdb.auth._render import _safe_link_url + self.assertEqual(_safe_link_url('https://idp/x'), 'https://idp/x') + self.assertEqual(_safe_link_url('http://idp/x'), 'http://idp/x') + self.assertEqual(_safe_link_url('HTTPS://idp/x'), 'HTTPS://idp/x') + for bad in ('javascript:alert(1)', 'data:text/html,x', + 'vbscript:x', 'file:///etc/passwd', '', None): + self.assertIsNone(_safe_link_url(bad)) + + def test_render_link_inert_for_dangerous_scheme(self): + from questdb.auth._render import _render_link + safe = _render_link('https://idp/x') + self.assertIn('= 3. + # Derive it instead of hardcoding so the test is version-agnostic. + str_dtype = pd.Series(['x']).dtype fallback_exp_dtypes = [ - np.dtype('O'), + str_dtype, np.dtype('int16'), np.dtype('float64'), np.dtype('float64')] - fallback_df = df.astype({'s': 'object', 'b': 'float64'}) + fallback_df = df.astype({'s': str_dtype, 'b': 'float64'}) df_eq(df, pa2pa_df, exp_dtypes) if fp_wrote: