Per-Partition Automatic Failover: Faster detection of per-partition write region through availability strategy for writes. #48421
Draft
jeet1995 wants to merge 3 commits into Azure:main
Enable proactive write hedging for Per-Partition Automatic Failover (PPAF) on single-writer
Cosmos DB accounts. When a write to the primary region is slow or failing, the SDK now hedges
the write to a read region — reducing time-to-recovery from 60-120s (retry-based) to the
hedging threshold (~1s with default config).
## Problem
In PPAF-enabled single-writer accounts, when a partition fails over, the SDK waits for error
signals (503, 408, 410) before marking a region as failed for that partition. This retry-based
path in GlobalPartitionEndpointManagerForPerPartitionAutomaticFailover can take 60-120s.
## Solution
Plug the existing availability strategy (hedging) machinery into the write path for PPAF:
1. **Speculation gating** (RxDocumentClientImpl.getApplicableRegionsForSpeculation):
   - Relax the canUseMultipleWriteLocations() gate for PPAF single-writer accounts
   - Relax the isIdempotentWriteRetriesEnabled gate (PPAF provides partition-level consistency)
   - Use ALL account-level read regions (getAvailableReadRoutingContexts) as hedge candidates,
     not just preferred regions, because PPAF failover can target any read region
2. **Routing** (tryAddPartitionLevelLocationOverride + CrossRegionAvailabilityContext):
   - Add ppafWriteHedgeTargetRegion field to CrossRegionAvailabilityContextForRxDocumentServiceRequest
   - In tryAddPartitionLevelLocationOverride: when ppafWriteHedgeTargetRegion is set, create the
     ConcurrentHashMap entry via computeIfAbsent and route via hedgeFailoverInfo.getCurrent()
   - This is synchronous and deterministic: the map is updated in the same request pipeline
   - Thread safety: uses getCurrent() from the computeIfAbsent result (not the raw hedgeTarget)
     to avoid routing to a region the concurrent retry path may have marked as failed
3. **Default E2E policy** (evaluatePpafEnforcedE2eLatencyPolicyCfgForWrites):
   - Mirrors the read defaults exactly, giving symmetric hedging behavior for reads and writes
   - Only applied to point write operations (batch excluded via isPointOperation gate)
   - DIRECT: timeout=networkRequestTimeout+1s, threshold=min(timeout/2, 1s), step=500ms
   - GATEWAY: timeout=min(6s, httpTimeout), threshold=min(timeout/2, 1s), step=500ms
4. **Safety lever** (Configs.isWriteAvailabilityStrategyEnabledWithPpaf):
   - System property COSMOS.IS_WRITE_AVAILABILITY_STRATEGY_ENABLED_WITH_PPAF (default: true)
   - Allows opt-out without code changes if a regression is observed
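The default write policy in item 3 reduces to a few duration computations. The following is a minimal sketch of that arithmetic only, not the SDK's evaluatePpafEnforcedE2eLatencyPolicyCfgForWrites. The class name PpafWriteHedgeDefaults, its method names, and the 5s network request timeout used as input are assumptions made for illustration.

```java
import java.time.Duration;

// Hypothetical helper that mirrors the formulas listed in the PR description.
// It is not part of the Cosmos DB SDK.
final class PpafWriteHedgeDefaults {
    // step=500ms for both DIRECT and GATEWAY
    static final Duration STEP = Duration.ofMillis(500);
    static final Duration THRESHOLD_CAP = Duration.ofSeconds(1);
    static final Duration GATEWAY_TIMEOUT_CAP = Duration.ofSeconds(6);

    static Duration min(Duration a, Duration b) {
        return a.compareTo(b) <= 0 ? a : b;
    }

    // DIRECT: timeout = networkRequestTimeout + 1s
    static Duration directTimeout(Duration networkRequestTimeout) {
        return networkRequestTimeout.plusSeconds(1);
    }

    // GATEWAY: timeout = min(6s, httpTimeout)
    static Duration gatewayTimeout(Duration httpTimeout) {
        return min(GATEWAY_TIMEOUT_CAP, httpTimeout);
    }

    // Both modes: threshold = min(timeout / 2, 1s)
    static Duration threshold(Duration timeout) {
        return min(timeout.dividedBy(2), THRESHOLD_CAP);
    }

    public static void main(String[] args) {
        // Assumed input: a 5s network request timeout.
        Duration timeout = directTimeout(Duration.ofSeconds(5));
        System.out.println("timeout=" + timeout.toMillis() + "ms"
            + " threshold=" + threshold(timeout).toMillis() + "ms"
            + " step=" + STEP.toMillis() + "ms");
    }
}
```

With any timeout of 2s or more, the threshold is capped at 1s, which is consistent with the ~1s recovery figure quoted in the summary.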
## Files changed (6)
- Configs.java: Write availability strategy PPAF config flag
- RxDocumentClientImpl.java: Speculation gating, region resolution, write E2E policy
- CrossRegionAvailabilityContextForRxDocumentServiceRequest.java: ppafWriteHedgeTargetRegion field
- ClientRetryPolicy.java: Honor ppafWriteHedgeTargetRegion in tryAddPartitionLevelLocationOverride
- GlobalPartitionEndpointManagerForPerPartitionAutomaticFailover.java: Hedge target handling
  in tryAddPartitionLevelLocationOverride with computeIfAbsent + getCurrent()
- PerPartitionAutomaticFailoverE2ETests.java: 26 new test cases
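The computeIfAbsent + getCurrent() pattern mentioned for tryAddPartitionLevelLocationOverride can be illustrated in isolation. This is a sketch under stated assumptions: HedgeRoutingSketch, FailoverInfo, resolveHedgeRegion, and retryPathMovesTo are hypothetical stand-ins, and only the map name partitionKeyRangeToFailoverInfo and the two method names computeIfAbsent and getCurrent come from the PR text. The SDK's actual types and signatures may differ.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of per-partition failover state; not the SDK's classes.
final class HedgeRoutingSketch {
    static final class FailoverInfo {
        private final AtomicReference<String> current;

        FailoverInfo(String region) {
            this.current = new AtomicReference<>(region);
        }

        String getCurrent() {
            return current.get();
        }

        void moveTo(String region) {
            current.set(region);
        }
    }

    private final ConcurrentHashMap<String, FailoverInfo> partitionKeyRangeToFailoverInfo =
        new ConcurrentHashMap<>();

    // Hedge path: create the entry only if none exists, then route to whatever
    // the entry currently points at, rather than to the raw hedge target.
    String resolveHedgeRegion(String partitionKeyRangeId, String hedgeTarget) {
        FailoverInfo info = partitionKeyRangeToFailoverInfo.computeIfAbsent(
            partitionKeyRangeId, id -> new FailoverInfo(hedgeTarget));
        return info.getCurrent();
    }

    // Stand-in for the retry-based path moving a partition to another region.
    void retryPathMovesTo(String partitionKeyRangeId, String region) {
        partitionKeyRangeToFailoverInfo.compute(partitionKeyRangeId, (id, existing) -> {
            if (existing == null) {
                return new FailoverInfo(region);
            }
            existing.moveTo(region);
            return existing;
        });
    }

    public static void main(String[] args) {
        HedgeRoutingSketch sketch = new HedgeRoutingSketch();
        System.out.println(sketch.resolveHedgeRegion("0", "West US"));
        sketch.retryPathMovesTo("0", "North Europe");
        // The existing entry wins over the raw hedge target.
        System.out.println(sketch.resolveHedgeRegion("0", "West US"));
    }
}
```

Reading the region from the stored entry means a hedge that races with the retry path follows the entry's latest region instead of a target that may already have been abandoned.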
## Test coverage
| Op | DIRECT (mocked transport) | GATEWAY (mocked HttpClient) |
|---------|--------------------------|----------------------------|
| Create | 410/21005 + 503/21008 | delayed write region |
| Replace | 410/21005 | delayed write region |
| Upsert | 410/21005 | delayed write region |
| Delete | 410/21005 | delayed write region |
| Patch | 410/21005 | delayed write region |
Additional tests:
- Opt-out via COSMOS.IS_WRITE_AVAILABILITY_STRATEGY_ENABLED_WITH_PPAF=false
- Batch bypass verification (batch uses retry-based PPAF, not hedging)
- Explicit ConcurrentHashMap verification: after hedge success, asserts that the PPAF manager's
  partitionKeyRangeToFailoverInfo entry points to a region other than the failed write region
All assertions are exact match: 2 regions before failover, 1 region after failover.
165 tests total (existing + new), 0 regressions, 0 modifications to existing test logic.
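The opt-out exercised by the tests above is driven by a JVM system property. A minimal sketch of such a flag read follows, assuming the property is parsed as a boolean with a default of true. The class WriteHedgeOptOutSketch and its method are hypothetical, and the SDK's Configs class may resolve the value differently (for example by also consulting environment variables).

```java
// Hypothetical flag reader; only the property name comes from the PR description.
final class WriteHedgeOptOutSketch {
    static final String PROPERTY = "COSMOS.IS_WRITE_AVAILABILITY_STRATEGY_ENABLED_WITH_PPAF";

    // Assumption: an unset property means the feature is enabled (default: true).
    static boolean isWriteHedgingEnabled() {
        return Boolean.parseBoolean(System.getProperty(PROPERTY, "true"));
    }

    public static void main(String[] args) {
        System.out.println("enabled by default: " + isWriteHedgingEnabled());
        System.setProperty(PROPERTY, "false");
        System.out.println("enabled after opt-out: " + isWriteHedgingEnabled());
    }
}
```

On the command line, the same property would typically be passed as `-DCOSMOS.IS_WRITE_AVAILABILITY_STRATEGY_ENABLED_WITH_PPAF=false` when starting the JVM.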
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ible error codes

Add 34 new test configurations to write availability strategy hedging tests covering all error
codes from the base PPAF E2E test suite:

DIRECT mode:
- 503/21008 (SERVICE_UNAVAILABLE) for Replace, Upsert, Delete, Patch
- 403/3 (FORBIDDEN_WRITEFORBIDDEN) for all 5 write ops
- 408/UNKNOWN (REQUEST_TIMEOUT) for all 5 write ops

GATEWAY mode:
- 403/3 (FORBIDDEN_WRITEFORBIDDEN) for all 5 write ops
- 408/UNKNOWN (REQUEST_TIMEOUT) for all 5 write ops
- 408/GATEWAY_ENDPOINT_READ_TIMEOUT (network error) for all 5 write ops
- 503/GATEWAY_ENDPOINT_UNAVAILABLE (network error) for all 5 write ops

Parameterize gateway test method to accept error codes instead of hardcoding 503.
Extend setupHttpClientToThrowCosmosException to support combined delay + network error mode
for gateway-specific fault types.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
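The count of 34 configurations in the commit message above can be checked by enumerating the listed combinations. The sketch below is only a tally: HedgeTestMatrixSketch and its string encoding are hypothetical and do not reflect how the test class defines its data providers. The five write operations are taken from the test coverage table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical enumeration of the (mode, error code, operation) combinations
// listed in the commit message; not the actual test data provider.
final class HedgeTestMatrixSketch {
    static final List<String> ALL_OPS =
        List.of("Create", "Replace", "Upsert", "Delete", "Patch");

    static void add(List<String> out, String mode, String error, List<String> ops) {
        for (String op : ops) {
            out.add(mode + " " + error + " " + op);
        }
    }

    static List<String> configurations() {
        List<String> out = new ArrayList<>();
        // DIRECT: Create with 503/21008 is already covered by the original tests.
        add(out, "DIRECT", "503/21008", List.of("Replace", "Upsert", "Delete", "Patch"));
        add(out, "DIRECT", "403/3", ALL_OPS);
        add(out, "DIRECT", "408/UNKNOWN", ALL_OPS);
        add(out, "GATEWAY", "403/3", ALL_OPS);
        add(out, "GATEWAY", "408/UNKNOWN", ALL_OPS);
        add(out, "GATEWAY", "408/GATEWAY_ENDPOINT_READ_TIMEOUT", ALL_OPS);
        add(out, "GATEWAY", "503/GATEWAY_ENDPOINT_UNAVAILABLE", ALL_OPS);
        return out;
    }

    public static void main(String[] args) {
        // 14 DIRECT (4 + 5 + 5) plus 20 GATEWAY (4 x 5) configurations.
        System.out.println(configurations().size());
    }
}
```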