Skip to content

fix: honor server-directed FDv1 Fallback Directive in initializer phase#423

Open
keelerm84 wants to merge 5 commits intomainfrom
mk/sdk-2290/fdv1-fallback
Open

fix: honor server-directed FDv1 Fallback Directive in initializer phase#423
keelerm84 wants to merge 5 commits intomainfrom
mk/sdk-2290/fdv1-fallback

Conversation

@keelerm84
Copy link
Copy Markdown
Member

@keelerm84 keelerm84 commented Apr 30, 2026

Summary

  • FDv2 initializer responses carrying X-LD-FD-Fallback: true now apply any accompanying payload first, then the data system swaps the synchronizer list to the FDv1 Fallback Synchronizer (or transitions to OFF when none is configured). Synchronizer-phase behavior is unchanged; the directive is one-way.
  • Adds Basis.fallback_to_fdv1 and renames Update.revert_to_fdv1 to Update.fallback_to_fdv1 so initializer/synchronizer results carry the signal explicitly. Polling and streaming data sources detect the header on both success and error paths via a shared helper.
  • Contract test service declares the fdv1-fallback capability and accepts a top-level dataSystem.fdv1Fallback config object, wiring it directly to datasystem.fdv1_compatible_synchronizer(...) instead of inferring it from the last polling synchronizer. Bumps the contract-tests pin from v3.0.0-alpha.4 to v3.0.0-alpha.6.

Mirrors the Go reference change in go-server-sdk#365 and contract-test wiring in go-server-sdk#368. Spec rationale lives in sdk-specs#155; the new harness suite is in sdk-test-harness#336.

Test plan

  • pytest (LD_SKIP_DATABASE_TESTS=1): 975 passed, 194 skipped (database-only).
  • mypy ldclient: clean.
  • isort --check, pycodestyle: clean.
  • New deterministic tests cover initializer fallback on success, on error, with and without FDv1 configured, plus streaming Start-then-payload and synchronizer fallback after a Valid update. No real wall clocks or sleeps introduced.
  • Contract-test workflow against v3.0.0-alpha.6 runs the new "FDv1 Fallback Directive" suite cleanly in CI.

Note

Medium Risk
Changes FDv2 initialization and streaming/polling fallback behavior based on server headers, which can alter data-source lifecycle and shutdown semantics. Risk is mitigated by extensive new unit/contract-test wiring but impacts core update/fallback paths.

Overview
Implements server-directed FDv1 fallback during FDv2 initialization. Initializers can now trigger a terminal switch to the configured FDv1 fallback synchronizer when X-LD-FD-Fallback: true is seen (including on initializer error responses), and the system transitions to OFF if no FDv1 fallback is configured.

Unifies and hardens fallback signaling across FDv2 components. Adds Basis.fallback_to_fdv1 and renames Update.revert_to_fdv1 to Update.fallback_to_fdv1, updates polling to forward headers on failures and to honor fallback even for non-HTTP errors, and updates streaming to latch the directive from Start, propagate it through subsequent updates/errors, and force-close the underlying urllib3 pool on shutdown.

Updates contract-test integration. CI contract tests are bumped to v3.0.0-alpha.6, the contract test service advertises fdv1-fallback, and the test client config now accepts dataSystem.fdv1Fallback as a dedicated FDv1 fallback synchronizer configuration.

Reviewed by Cursor Bugbot for commit cafcca1. Bugbot is set up for automated code reviews on this repo. Configure here.

Previously the X-LD-FD-Fallback response header was honored only in
the synchronizer phase, and in the synchronizer phase a payload
arriving alongside the directive was discarded because the streaming
processor halted before applying it. Per the updated FDv2 Data System
spec, the directive must be honored in both initializer and
synchronizer phases, must take precedence over the section 1.2
failover algorithm, and must let any accompanying payload be applied
before the SDK switches terminally to the FDv1 Fallback Synchronizer.

Carry the signal explicitly on the public initializer/synchronizer
result types -- Basis.fallback_to_fdv1 and Update.fallback_to_fdv1 --
so callers cannot silently drop it. Update.revert_to_fdv1 is renamed
to Update.fallback_to_fdv1 to match the spec terminology.
Declare the fdv1-fallback capability and accept the new top-level
dataSystem.fdv1Fallback config object, wiring it directly to the SDK's
FDv1 Fallback Synchronizer. Drop the heuristic that previously inferred
the FDv1 fallback from the last polling synchronizer entry -- the
FDv1 Fallback Synchronizer is a distinct configuration concern from
the FDv2 Primary/Fallback chain.

Also bump the test harness pin to v3.0.0-alpha.6 so the new
fdv1-fallback test suite runs in CI.
Per the FDv1 Fallback Directive (Data System spec 1.6.3), the Primary
Synchronizer must be terminated when the directive engages. The
streaming source called ``SSEClient.close()`` to do this, but on
Python 3.10 that wasn't enough: ``SSEClient.close()`` ultimately
invokes ``urllib3.HTTPResponse.shutdown()`` (which only performs a
half-close on the local read side) plus ``release_conn()`` (which
returns the live connection to the pool). The TCP socket stayed open
until the response was garbage-collected, which the contract test
harness flagged on the linux/3.10 matrix entry as
"SDK did not close the FDv2 streaming connection after the directive
— the Primary Synchronizer must be stopped when Directed Fallback
engages." Newer Python versions GC the response promptly enough to
hide the leak.

Track the urllib3 PoolManager alongside the SSEClient and call
``HTTPConnectionPool.close()`` on each pool when the synchronizer
stops. ``HTTPConnectionPool.close()`` drains the connection queue and
closes each socket, sending the FIN the server is waiting on.
``PoolManager.clear()`` alone won't do this -- it drops the dict of
pools without closing the connections inside them. The fix is
deterministic (no sleeps) and runs both at the natural end of
``sync()`` and in ``stop()`` so it's robust to whichever thread
arrives first.
@keelerm84 keelerm84 marked this pull request as ready for review May 4, 2026 20:44
@keelerm84 keelerm84 requested a review from a team as a code owner May 4, 2026 20:44
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit f37695b. Configure here.

Comment thread ldclient/impl/datasourcev2/streaming.py
Comment thread ldclient/impl/datasourcev2/streaming.py
keelerm84 added 2 commits May 5, 2026 09:46
When a streaming Start action latched the FDv1 Fallback Directive
(via X-LD-FD-Fallback: true), a subsequent Fault, JSONDecodeError, or
generic exception path could yield an Update with
fallback_to_fdv1=False because each error path independently checked
the failure's headers for the directive instead of consulting the
latched state from the earlier Start. The most visible symptom is an
HTTPStatusError 401 mid-stream after the directive was set: the SDK
treated it as an ordinary unrecoverable failure (REMOVE) instead of
honoring the directive (FDV1).

Funnel every error-path Update through a small _with_fallback_signal
helper that propagates the latched signal, and break out of the SSE
read loop unconditionally once the directive is set so we don't keep
retrying the FDv2 endpoint -- the directive is one-way and terminal.
Two follow-up cleanups from review of the FDv1 Fallback Directive work:

- Drop the test-only shim in StreamingDataSource.sync() that accepted
  either a bare SSEClient or a (client, pool) tuple from the injected
  builder. The SseClientBuilder type alias already declares the tuple
  return; annotating the field as SseClientBuilder lets pyright align,
  and the lone test fixture (list_sse_client) now returns
  (client, None) like the production create_sse_client does.
- Hoist basis.change_set.selector.is_defined() into a local in
  FDv2._run_initializers() so it isn't computed twice on the
  success-with-fallback path. Also fixes a typo in the adjacent
  comment ("if an only if" -> "if and only if").
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant