feat(storage): multi-copy upload with store->pull->commit flow #593
base: rvagg/sp-sp-fetch
Conversation
| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs | synapse-dev | 29ac8ad | Commit Preview URL · Branch Preview URL | Feb 10 2026, 11:55 PM |
Docs lint failing; this still needs a big docs addition, but that can come a little later as we get through review here. Here are some notes I built up about failure modes and handling:

Multi-Copy Upload: Failure Handling

Philosophy
Partial Success Over Atomicity

When a user requests N copies and we can only achieve fewer, we commit what we have rather than throwing everything away.
Failure Modes by Stage

The multi-copy upload has a sequential pipeline: select → store → pull → commit.

Stage 0: Provider Selection (before any upload)

Provider selection uses a tiered approach with ping validation at each step.
Ping validation: before selecting any provider, we ping its PDP endpoint. If the ping fails, we try the next provider in the current tier before falling back to the next tier (sketched below).
Key distinction:
- For primary selection (first context), exhaustion = error (can't proceed)
- For secondary selection (extra copies), exhaustion = fewer copies than requested, reported as partial success
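As a rough sketch of that selection loop (the `Provider` shape, the `pingProvider` helper, and the `/ping` path here are illustrative assumptions, not the SDK's actual internals):

```ts
// Illustrative sketch only - types and helper names are assumptions,
// not the SDK's real internals.
interface Provider {
  id: number
  pdpEndpoint: string
}

// Ping a provider's PDP endpoint; any network error counts as a failed ping.
async function pingProvider (provider: Provider): Promise<boolean> {
  try {
    const res = await fetch(`${provider.pdpEndpoint}/ping`)
    return res.ok
  } catch {
    return false
  }
}

// Walk the tiers in order (e.g. endorsed, then approved). Within a tier,
// try each provider before falling back to the next tier.
async function selectProvider (tiers: Provider[][]): Promise<Provider | undefined> {
  for (const tier of tiers) {
    for (const provider of tier) {
      if (await pingProvider(provider)) return provider
    }
  }
  return undefined // all tiers exhausted
}
```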
Stage 1: Store (upload data to primary SP)

Store has two sub-stages:
- 1a: HTTP upload of the data to the SP
- 1b: waiting for the SP to confirm the piece is parked
Store failure is unambiguous from the SDK's perspective: either we have confirmed parked data, or we don't. The user can safely retry.

Note: if 1b times out, data might exist on the SP but we can't confirm it. The SP will eventually GC parked pieces that aren't committed.
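Given those semantics, caller-side retry can be blind. A sketch using the StoreError type introduced below; the `store()` signature and the `StorageContextLike` shape are assumptions, not the SDK's confirmed API:

```ts
// Minimal structural stand-in for this PR's StorageContext (assumed shape)
interface StorageContextLike {
  store(data: Uint8Array): Promise<{ pieceCid: string }>
}

// Because StoreError means "no data confirmed stored anywhere",
// retrying store() blindly is safe.
async function storeWithRetry (context: StorageContextLike, data: Uint8Array, attempts = 3) {
  for (let i = 1; i <= attempts; i++) {
    try {
      return await context.store(data)
    } catch (err) {
      // Rethrow non-StoreError failures, and rethrow on the final attempt
      if (!(err instanceof StoreError) || i === attempts) throw err
    }
  }
  throw new Error('unreachable')
}
```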
Stage 2: Pull (SP-to-SP fetch to secondaries)

Pull failure is recoverable: the data is still on the primary and no on-chain state exists yet. Retrying a pull is cheap (SP-to-SP, no client bandwidth).

Stage 3: Commit (addPieces on-chain transaction)
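Commit is the only stage that creates on-chain state. End to end, the multi-copy path strings the stages together roughly like this sketch, written against an assumed shape of the split API (real signatures may differ):

```ts
// Assumed structural shape of this PR's split operations API
interface StorageContextLike {
  store(data: Uint8Array): Promise<{ pieceCid: string }>
  presignForCommit(pieceCids: string[]): Promise<unknown>
  getPieceUrl(pieceCid: string): string
  pull(pieceCid: string, sourceUrl: string): Promise<void>
  commit(pieceCids: string[], opts?: { extraData?: unknown }): Promise<void>
}

// Sketch of store -> presign -> pull -> commit; error handling and
// partial-success bookkeeping are elided.
async function uploadCopies (
  primary: StorageContextLike,
  secondaries: StorageContextLike[],
  data: Uint8Array
): Promise<void> {
  // Stage 1: upload to the primary SP and wait for parking confirmation
  const { pieceCid } = await primary.store(data)

  // Pre-sign EIP-712 extraData once so pulls and commits reuse it
  // without additional wallet prompts
  const extraData = await primary.presignForCommit([pieceCid])

  // Stage 2: SP-to-SP pulls; no client bandwidth involved
  const sourceUrl = primary.getPieceUrl(pieceCid)
  await Promise.all(secondaries.map(async (s) => { await s.pull(pieceCid, sourceUrl) }))

  // Stage 3: commit on-chain on all providers in parallel
  await Promise.all(
    [primary, ...secondaries].map(async (c) => { await c.commit([pieceCid], { extraData }) })
  )
}
```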
Behaviour Matrix

| Stage | Failure signal | Data state after failure | Recovery |
|---|---|---|---|
| Store | StoreError thrown | Nothing confirmed stored | Safe to retry the upload |
| Pull | Secondary recorded in result.failures[] | Data on primary, nothing on-chain | Retry is cheap (SP-to-SP) |
| Commit (all providers fail) | CommitError thrown | Data parked on SP(s), nothing on-chain | Safe to retry before SP GC |
| Commit (some providers fail) | Entries in result.failures[] | Successful copies committed on-chain | Inspect failures, retry as needed |
Error Types

```ts
/** Primary store failed - no data stored anywhere, safe to retry */
class StoreError extends Error {
  name = 'StoreError'
}

/** All commits failed - data stored on SP(s) but nothing on-chain, safe to retry */
class CommitError extends Error {
  name = 'CommitError'
}

// Partial commit failures appear in result.failures[] with role: 'primary' or 'secondary'
// Only throws CommitError when ALL providers fail to commit
```

What Users Must Check

Users should always inspect the result:

```ts
// If ALL commits fail, upload() throws CommitError
// If at least one succeeds, we get a result:
const result = await synapse.storage.upload(data, { count: 3 })
// Check if endorsed provider (primary) failed
const primaryFailed = result.failures.find(f => f.role === 'primary')
if (primaryFailed) {
console.warn(`Endorsed provider ${primaryFailed.providerId} failed: ${primaryFailed.error}`)
// Data is only on non-endorsed secondaries
}
// Check if we got all requested copies
if (result.copies.length < 3) {
console.warn(`Only ${result.copies.length}/3 copies succeeded`)
for (const failure of result.failures) {
console.warn(` Provider ${failure.providerId} (${failure.role}): ${failure.error}`)
}
}
// Every copy in copies[] is committed on-chain
for (const copy of result.copies) {
  console.log(`Provider ${copy.providerId}, dataset ${copy.dataSetId}, piece ${copy.pieceId}`)
}
```

Auto-Retry Logic

When the user calls upload() with auto-selected providers, failed secondaries are automatically retried with provider exclusion (up to 5 attempts).

When the user specifies providers explicitly, failed providers are not automatically substituted.

Design Decision: Primary Commit Failure Handling

The current implementation commits on all providers in parallel. Endorsed providers are selected as primary because they're curated for reliability. If the primary (endorsed) fails but a secondary (non-endorsed) succeeds, the user ends up with data only on non-endorsed providers. This may not meet the product requirement of having one copy on an endorsed provider.

```ts
// Check if the endorsed provider failed
const primaryFailed = result.failures.some(f => f.role === 'primary')
if (primaryFailed) {
// Handle: retry, alert, or treat as error depending on requirements
}
```
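Taken together, the two error classes give a simple decision tree at the call site. A sketch (the ~24-hour GC window is Curio's behaviour, discussed later in this thread):

```ts
// Sketch: mapping the failure semantics above onto a catch block
try {
  const result = await synapse.storage.upload(data, { count: 3 })
  // Partial failures, if any, are reported in result.failures[]
  console.log(`${result.copies.length} copies committed`)
} catch (err) {
  if (err instanceof StoreError) {
    // Nothing stored anywhere: safe to retry the whole upload
  } else if (err instanceof CommitError) {
    // Data is parked on the SP(s) but nothing is on-chain; uncommitted
    // pieces are GC'd eventually (~24h in Curio), so retry promptly
  } else {
    throw err
  }
}
```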
I noticed this:
What is the test for the availability of an Endorsed Provider when we have more than one? If the first store fails, is there a retry? Under retry: if we have two Endorsed providers and the store on the primary fails, do we retry with the other Endorsed provider?
@timfong888 I've clarified the post above with more detail:
Docs updated to pass lint, additional tests added to address some gaps.
I am not clear on this: my understanding is that if no Endorsed SP succeeds, the operation is a failure, because if there is no Endorsed copy and we only have Approved, that is a low durability guarantee.

> For primary selection (first context), exhaustion = error (can't proceed)

The above seems right. If primary selection exhausts, it's an error, not a fallback to the next tier, right?
Question: if the endorsed provider passed ping during selection but then fails during store() (HTTP upload or parking confirmation), what happens? And what happens if GC runs before the retry?
Implement store->pull->commit flow for efficient multi-copy storage replication.

Split operations API on StorageContext:
- store(): upload data to SP, wait for parking confirmation
- presignForCommit(): pre-sign EIP-712 extraData for pull + commit reuse
- pull(): request SP-to-SP transfer from another provider
- commit(): add pieces on-chain with optional pre-signed extraData
- getPieceUrl(): get retrieval URL for SP-to-SP pulls

StorageManager.upload() orchestration:
- Default 2 copies (primary + endorsed secondary)
- Single-provider: store->commit flow
- Multi-copy: store on primary, presign, pull to secondaries, commit all
- Auto-retry failed secondaries with provider exclusion (up to 5 attempts)
- Pre-signing avoids redundant wallet prompts across providers

Callback refinements:
- Remove redundant onUploadComplete (use onStored instead)
- onStored(providerId, pieceCid) - after data parked on provider
- onPieceAdded(providerId, pieceCid) - after on-chain submission
- onPieceConfirmed(providerId, pieceCid, pieceId) - after confirmation

Type clarity:
- Rename UploadOptions.metadata -> pieceMetadata (piece-level)
- Rename CommitOptions.pieces[].metadata -> pieceMetadata
- Dataset-level metadata remains in CreateContextOptions.metadata
- New: StoreError, CommitError for clear failure semantics
- New: CopyResult, FailedCopy for multi-copy transparency

Implements #494
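For illustration, the refined callbacks would be wired up roughly like this; the option shape is an assumption based on the callback names above, not a confirmed API:

```ts
// Sketch: assumed option shape for the refined per-provider callbacks
const result = await synapse.storage.upload(data, {
  count: 2,
  callbacks: {
    // Fires once data is parked on a provider (replaces onUploadComplete)
    onStored: (providerId, pieceCid) =>
      console.log(`parked on provider ${providerId}: ${pieceCid}`),
    // Fires after the addPieces transaction is submitted on-chain
    onPieceAdded: (providerId, pieceCid) =>
      console.log(`addPieces submitted for provider ${providerId}`),
    // Fires after on-chain confirmation assigns a piece id
    onPieceConfirmed: (providerId, pieceCid, pieceId) =>
      console.log(`confirmed on provider ${providerId} as piece ${pieceId}`)
  }
})
```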
Force-pushed from eb878ac to 29ac8ad.
On the tier question: yes, the current code does fall back to approved-only if no endorsed provider passes the health check.

Not right now. A couple of reasons:

Curio GCs unreferenced pieces after 24 hours, so there's a comfortable window for commit-phase retries.
Okay. So it randomizes across the Endorsed SPs for the ping if there's no existing context. As long as they are good and an Endorsed provider stores and commits successfully, we are good. That's a fair assumption.
Borrowed a lot of this from #593, and merged with foc-devnet-info support.
Sits on top of #544 which has the synapse-core side of this.