feat(core): OIDC sign-in via device flow (RFC 8628)#52
Open
glasstiger wants to merge 15 commits into
Open
Conversation
Two fixes for the OIDC device flow in the Java client. M2 - Bidi / zero-width Unicode bypassed the display sanitizer. sanitizeForDisplay and OidcAuthException.putSanitized filtered only on Character.isISOControl, which covers C0/C1 and DEL but not the bidirectional overrides (U+202A-202E), isolates (U+2066-2069), marks (U+200E/200F), zero-width characters or the BOM (U+FEFF). Those fields - user_code, verification_uri(_complete), error and error_description - all come from the IdP/settings boundary and reach System.out and the exception messages, so a hostile or MITM'd IdP could embed a right-to-left override and spoof the verification URL a human reads and then opens. The JSON lexer's \uXXXX decoding widens the vector, since an escaped override decodes to the real character before display. Both sanitizers now share OidcAuthException.isUnsafeForDisplay, which also strips the Unicode format category (Cf) plus the explicit bidi/BOM set. The predicate uses hex int literals rather than char escapes, keeping the source strictly ASCII so the file carries none of the characters it guards against. M3 - httpTokenProvider forced a successful sign-in before build(). createLineSender eagerly rebuilt the pending request when a provider was set, calling getToken() at build time. With the documented .httpTokenProvider(auth::getTokenSilently), that threw unless the caller had already signed in, so the natural "construct the sender, sign in, then send" ordering was impossible. The first token pull is now deferred off the build path to the first row (table()). The provider is wired at build but not queried; the initial request is stamped with a token when the first row starts, and the pending flag is cleared only after the pull succeeds, so a not-yet-signed-in provider that throws leaves the stamp pending for a retry. The Sender.httpTokenProvider Javadoc now states the provider is not called at build time. Tests: new bidi/zero-width cases for the challenge fields and the oauth error message (fed as JSON \uXXXX escapes so they exercise the decode-then-display path), and a new LineHttpSenderTokenProviderTest covering the deferred pull and a lazily signing-in provider. Each test was confirmed to fail without its fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Three robustness fixes in the OIDC device-flow parser. m3 - A JSON null arrives from the lexer as the literal "null". The token and device parsers used putValue, which stored it verbatim, so "access_token": null became the 4-char token "null" and "error": null was read as an OAuth error code "null". Merged putValue with SettingsDiscoveryParser's null-guarding putNonNull into one shared helper used by all three parsers, so a JSON null is treated as absent everywhere. m4 - Endpoint.parse did not range-check the port, so host:0, host:-1 and host:99999 parsed and flowed to the transport. Added a 1..65535 guard that rejects them with a clear message. m5 - The token-response expires_in was not clamped, unlike the device-auth value, so a TTL near Integer.MAX_VALUE cached the token for ~68 years. storeTokens now applies the same boundedSeconds clamp (the default for a non-positive value, capped at MAX_EXPIRES_IN_SECONDS). The server still enforces the real expiry; this only bounds how long the client trusts its cached copy. Tests: null access_token and null error are rejected/ignored, out-of-range ports are rejected at build, and a clamped token expiry forces a fresh sign-in (observed via a clock-skew margin set above the clamp). Each test was confirmed to fail without its fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-questdb-client into ia_oidc_device_flow
The post-flush reset() eagerly rebuilt the next request and pulled the provider token via httpTokenProvider.getToken() after the current batch had already been sent and accepted. If that pull threw (e.g. OidcDeviceAuth::getTokenSilently when a silent refresh fails) it turned an already-successful flush into a thrown exception and left the shared Request half-built (contentStart == -1, no withContent()), so the next row's data went into the header region - a malformed request, lost rows and a permanently corrupted sender. Route every request's token pull through the same deferred, retriable path the initial request already used: newRequest() no longer pulls the provider token (it marks the request token-pending and builds a valid token-less request), and stampTokenIfPending() pulls it lazily when the first row of a request starts. A failed pull leaves the flag set and the sender untouched, so the next row re-runs the stamp and fully rebuilds the request. Per-request token rotation is unchanged. Rename isInitialTokenPending/stampInitialTokenIfPending to isTokenPending/stampTokenIfPending since the deferral now covers every request, and stamp the token in putRawMessage() too. Add a regression test that fails at the first, successful flush without the fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
getToken() and getTokenSilently() were both synchronized on the instance
monitor, and getToken() holds it for the entire interactive device flow -
up to the device-code lifetime, clamped to one hour. A long-lived Sender
wired with httpTokenProvider(auth::getTokenSilently) therefore stalled on
the flush path for up to an hour whenever another thread ran an
interactive sign-in (e.g. a re-auth after the refresh token died). The
javadoc claimed the opposite ("safe on a request/flush path").
Replace the synchronized methods with a ReentrantLock. getToken() and
clearCache() still acquire it blocking, but getTokenSilently() now uses
tryLock() and fails fast with an OidcAuthException instead of waiting:
while a sign-in is in progress there is no token to serve anyway, so the
caller gets a prompt, retriable exception rather than a wedged flush. The
interactive flow still holds the lock for its whole duration and close()
still sets the volatile cancellation flag before acquiring the lock, so
the no-use-after-free guarantee is unchanged.
Correct the class and getTokenSilently() javadocs, and add a regression
test that fails (getTokenSilently blocks ~10s behind an in-flight
sign-in) without the fix.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
isUnsafeForDisplay() inspected one UTF-16 code unit at a time, so a supplementary-plane (>= U+10000) format or control character - an invisible U+E00xx "tag" char, for instance - arrived as a surrogate pair whose halves are each neither a control nor category Cf and so passed the filter unstripped. Because the JSON lexer reassembles such 😀-style escapes, a hostile or man-in-the-middled identity provider could smuggle invisible/spoofing characters into a user_code, a verification_uri, or an error_description and on into the terminal prompt and exception messages. Judge a Unicode code point instead: isUnsafeForDisplay() takes an int, and both sanitizers (putSanitized for exception messages, sanitizeForDisplay for the prompt) walk the text by code point with Character.codePointAt/charCount, so Character.getType classifies a supplementary char as one character. A legitimate astral character (an emoji) is still preserved. Make the assertNoUnsafeDisplayChars test helper code-point-aware too - it shared the blind spot - and add a regression test that fails (the U+E0001 tag char survives) without the fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
pollOnce() checked for a token before the HTTP status and the OAuth error field, so a response that carried a token alongside an error, or under a non-2xx status, was cached as a valid grant. tryRefresh() had the same flaw: it accepted the refreshed token on token presence alone. Both contradict RFC 6749 - 5.1 makes a grant a 2xx response carrying a token, and 5.2 says an error response must not be treated as a grant. Handle the OAuth error first in pollOnce(), so a token smuggled alongside an error never counts, and accept a token only when the status is 2xx; a token under a non-2xx status goes to the transport- error budget instead of being trusted. Guard tryRefresh() the same way: cache the refreshed token only from a clean 2xx response with no error, otherwise fall back to the interactive flow. The happy path and the existing pending/slow_down/access_denied/empty- body outcomes are unchanged. Add regression tests for a token alongside an error, a token under a non-2xx status, and a refresh that smuggles a token with an error - each fails without the fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
newRequest() passed the token from httpTokenProvider.getToken() straight to authToken(), which does not null- or empty-check it. A provider that returned null, "", or whitespace therefore produced a malformed "Authorization: Bearer " header that the server only answered with a 401 far from the cause - no client-side error at all. The HttpTokenProvider contract forbids such a return but nothing enforced it, and httpToken() already rejects a blank token, so the provider path was the weaker spot. Validate the pulled token with Chars.isBlank (as httpToken does) and throw a clear LineSenderException instead. The check sits inside the deferred pull, so a rejected token leaves the stamp pending and the next row retries cleanly, just like a throwing provider does. OidcDeviceAuth never returns a blank token, so this guards custom providers. Add tests that a null, an empty, and a whitespace-only provider token is rejected at first use - each fails without the fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
JsonLexer.getCharSequence rescanned every decoded value and name from the start to look for a backslash, even though the parse loop already detects one when it sets ignoreNext. Record that in a sawEscape flag (carried across parse() fragments) and resolve escapes only when it is set, so the common no-escape value returns the assembled sink without a second pass. OidcDeviceAuth.Endpoint.parse now rejects a host that contains control characters or whitespace - a smuggled CR/LF would otherwise flow into the outbound Host header. Add the tests these paths lacked: a cross-fragment escape; the lexer's lenient and exotic escape arms (surrogate pairs, \b/\f, unknown and malformed escapes, lone surrogates); the version-probe settings parser reading an escaped key through unescape; HTTP-token-provider rejection for UDP and WebSocket (not just TCP); and the control-character host cases above. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Port the issuer feature from py-questdb-client (PR #133) onto OidcDeviceAuth, so the device flow keeps working against servers that do not advertise their device-authorization endpoint, and so the device code and refresh token are only sent where the caller pins. The issuer plays three roles: - Discovery fallback: when /settings omits the device (and/or token) endpoint, fromQuestDB(url, issuer) reads it from the issuer's .well-known/openid-configuration document. The discovery origin comes only from the out-of-band issuer (or an explicit discoveryUrl), never from a /settings-supplied value, so a tampered /settings cannot redirect discovery. Without a pin, discovery is refused. - Plaintext-channel pin: a /settings response fetched over plaintext http to a non-loopback host (only reachable with allowInsecureTransport) cannot route credentials to its advertised endpoints without a pin. - Endpoint-origin pin: validateEndpointOrigins, enforced in Builder.build() on every construction path, requires the token and device endpoints to share one origin (RFC 8628 co-location) and, when an issuer is set, to belong to it. Config surface: Builder.issuer(...); new fromQuestDB overloads (url, issuer), (url, issuer, allowInsecure), and a 5-arg master taking issuer, discoveryUrl and a TLS config. Tradeoffs: - The co-location check makes the token and device endpoints share an origin. testPersistentTransportFailureDuringPollingAborts simulated an unreachable token endpoint with a dead second port; it now uses a new MockOidcServer.dropConnection() against a co-located path. - The origin pin compares scheme/host/port and ignores the path, so an identity provider that hosts its endpoints on a different origin than its issuer must be configured without an issuer. This matches the Python client. - allowInsecureTransport still relaxes the identity provider endpoints too (unchanged); the Python client always forces https/loopback for the IdP. Left as-is to avoid changing settled transport behavior. Adds 7 tests and updates the README OIDC section. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
[PR Coverage check]😍 pass : 822 / 880 (93.41%) file detail
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Adds interactive OIDC sign-in to the Java client using the OAuth 2.0 Device Authorization Grant (RFC 8628). A process with no local browser — a remote notebook kernel, a container, a headless job — can sign a human in against QuestDB Enterprise: the user authorizes on any device (laptop or phone) while the process only makes outbound calls to the identity provider.
On first use it prints a verification URL and a short code; once the user authorizes, the token is cached in memory and refreshed silently on later calls.
What's new
OidcDeviceAuth(io.questdb.client.cutlass.auth) — runs the flow and owns the token:OidcDeviceAuth.fromQuestDB(url)discovers the client id, scope and IdP endpoints from the server's unauthenticated/settings;OidcDeviceAuth.fromQuestDB(url, issuer)additionally pins the identity provider (see Discovery and trust below);OidcDeviceAuth.builder()configures the identity provider explicitly.getToken()signs in interactively on first use, then serves a cached token and refreshes it silently;getTokenSilently()never prompts (safe on a request/flush path);getAuthorizationHeaderValue()returns the fullBearer …value;close()cancels an in-flight sign-in. Token state lives in memory only and does not survive a process restart.Senderintegration — newHttpTokenProviderinterface andSender.builder(...).httpTokenProvider(auth::getTokenSilently). The sender pulls a freshly refreshed token on every request, so a long-lived sender keeps working as the token rotates — unlike a fixedhttpToken(...), which is captured once and eventually starts returning 401s. Mutually exclusive withhttpToken/httpUsernamePassword, and HTTP-transport only (rejected for TCP/UDP/WebSocket).DeviceCodePrompt/DeviceAuthorizationChallenge— how the verification URL and user code are shown. The default prints toSystem.out; supply your own to render a clickable link or a QR code, e.g. in a notebook.The token can be presented to QuestDB over any auth path the server already validates:
Authorization: Bearer <token>._ssowith the token as the password (requiresacl.oidc.pg.token.as.password.enabled=trueon the server).Discovery and trust
fromQuestDB(...)takes the IdP endpoints from the server's unauthenticated/settings, so by default it trusts that server to designate where the user signs in: a spoofed, compromised, or man-in-the-middled server could otherwise redirect the sign-in — and the long-lived refresh token — to an attacker-controlled identity provider. An optionalissuerargument addresses this, and also covers servers that do not advertise a device-authorization endpoint. It plays three roles:.well-knowndiscovery fallback. Current servers do not advertise the device-authorization endpoint. When it (and/or the token endpoint) is missing,fromQuestDB(url, issuer)reads it from the issuer's{issuer}/.well-known/openid-configurationdocument. The discovery origin comes only from the caller-suppliedissuer(or an explicitdiscoveryUrl), never from a/settings-supplied value, so a tampered/settingscannot choose where discovery — and the credential POSTs it resolves — are aimed. Without an issuer, discovery is refused rather than guessed.validateEndpointOrigins, enforced on every construction path (discovery and the explicitbuilder()), requires the token and device-authorization endpoints to share one origin (RFC 8628 co-locates them), and — when an issuer is set — to belong to that issuer's origin, so a compromised-but-reachable server cannot redirect the credentials off the trusted issuer./settingsresponse fetched over plaintexthttpto a non-loopback host (only reachable withallowInsecureTransport) is MITM-able, so its advertised endpoints are not trusted to route credentials without an issuer/discoveryUrlpin.Without an
issuer, the behaviour against anhttpsserver that advertises its endpoints is unchanged: that server is trusted, as before.Security
httpsis required by default for both the QuestDB server and the IdP endpoints;httpis rejected unless the caller opts in withallowInsecureTransport(true)(local development only).expires_in/intervalare clamped; a persistent transport failure while polling aborts on a small error budget instead of running silently to the device-code timeout; IPv6-literal hosts are rejected rather than mis-parsed.Supporting changes
JsonLexernow resolves JSON string escape sequences (\",\\,\/,\b \f \n \r \t,\uXXXX; lenient on malformed input), so string values arrive fully decoded.Responsegainsrecv(int timeout)for deadline-bounded body reads.AbstractLineHttpSenderplumbs the token provider through and rebuilds the pending request so the very first send already carries a provider-sourced token.Tradeoffs and limitations
builder(). This matches the Python client.MockOidcServer.dropConnection()).allowInsecureTransportstill relaxes the IdP endpoints as well as the QuestDB link; it is not narrowed to the server link only (unlike the Python client, which always holds the IdP to https/loopback). The plaintext-channel pin mitigates the credential-routing risk this leaves, but the final hop to the IdP can still be plaintext when this flag is set.Tests & docs
OidcDeviceAuthTest+MockOidcServer,SenderBuilderErrorApiTest,JsonLexerTest; a runnableOidcDeviceFlowExample; and a README "OIDC Sign-In (Device Flow)" section. The issuer feature adds tests for:.well-knowndiscovery via an issuer, and via an explicitdiscoveryUrl