
@rvagg (Collaborator) commented Feb 6, 2026

Sits on top of #544, which has the synapse-core side of this.


Implement store->pull->commit flow for efficient multi-copy storage replication.

Split operations API on StorageContext:

  • store(): upload data to SP, wait for parking confirmation
  • presignForCommit(): pre-sign EIP-712 extraData for pull + commit reuse
  • pull(): request SP-to-SP transfer from another provider
  • commit(): add pieces on-chain with optional pre-signed extraData
  • getPieceUrl(): get retrieval URL for SP-to-SP pulls

StorageManager.upload() orchestration:

  • Default 2 copies (primary + endorsed secondary)
  • Single-provider: store->commit flow
  • Multi-copy: store on primary, presign, pull to secondaries, commit all
  • Auto-retry failed secondaries with provider exclusion (up to 5 attempts)
  • Pre-signing avoids redundant wallet prompts across providers

Callback refinements:

  • Remove redundant onUploadComplete (use onStored instead)
  • onStored(providerId, pieceCid) - after data parked on provider
  • onPieceAdded(providerId, pieceCid) - after on-chain submission
  • onPieceConfirmed(providerId, pieceCid, pieceId) - after confirmation

Type clarity:

  • Rename UploadOptions.metadata -> pieceMetadata (piece-level)
  • Rename CommitOptions.pieces[].metadata -> pieceMetadata
  • Dataset-level metadata remains in CreateContextOptions.metadata
  • New: StoreError, CommitError for clear failure semantics
  • New: CopyResult, FailedCopy for multi-copy transparency
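
Roughly how these split operations compose for a multi-copy upload. This is a simplified sketch: the `StorageContext` shape here is an illustrative stand-in, not the exact SDK interface.

```ts
// Hypothetical, simplified StorageContext for illustration only.
interface StorageContext {
  providerId: number
  store(data: Uint8Array): Promise<string>            // upload + wait for parking; returns pieceCid
  presignForCommit(pieceCid: string): Promise<string> // pre-signed EIP-712 extraData
  getPieceUrl(pieceCid: string): string               // retrieval URL for SP-to-SP pulls
  pull(url: string, extraData: string): Promise<void> // request SP-to-SP transfer
  commit(pieceCid: string, extraData: string): Promise<void>
}

async function replicate(primary: StorageContext, secondaries: StorageContext[], data: Uint8Array) {
  const pieceCid = await primary.store(data)                  // client uploads once, to primary only
  const extraData = await primary.presignForCommit(pieceCid)  // one wallet prompt, reused everywhere
  const url = primary.getPieceUrl(pieceCid)
  for (const s of secondaries) {
    await s.pull(url, extraData)                              // SP-to-SP, no client bandwidth
  }
  // commit on all providers with the same pre-signed extraData
  await Promise.all([primary, ...secondaries].map((c) => c.commit(pieceCid, extraData)))
  return pieceCid
}
```

The point of the pre-sign step is that the expensive, interactive operation (the wallet prompt) happens once, while pull and commit can be fanned out per provider.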

Implements #494

@rvagg rvagg requested a review from hugomrdias as a code owner February 6, 2026 14:06
@github-project-automation github-project-automation bot moved this to 📌 Triage in FOC Feb 6, 2026
@cloudflare-workers-and-pages bot commented Feb 6, 2026

Deploying with Cloudflare Workers: synapse-dev deployment successful (commit 29ac8ad, updated Feb 10 2026, 11:55 PM UTC).

@rvagg (Collaborator, Author) commented Feb 6, 2026

Docs lint is failing; this still needs a big docs addition, but that can come a little later as we work through review here.

Here are some notes I built up about failure modes and handling:

Multi-Copy Upload: Failure Handling

Philosophy

  1. Store failure = hard fail: If we can't store data anywhere, throw immediately
  2. All commits fail = hard fail: If no provider commits successfully, throw CommitError
  3. Partial commit failure = record and return: Record failed providers in failures[] (with role), return result with successful copies[]
  4. Secondary failure = best-effort: Retry with replacement SPs, then commit whatever succeeded
  5. Never throw away successful work: If data is committed on any provider, the user gets a result -- not an exception
  6. Explicit providers = no retry: User specified providers, respect their choice
  7. Batch semantics: All pieces must succeed on a provider, or that provider is failed
  8. Transparency over exceptions: failures[] tells the user what went wrong; copies[] tells them what worked

Partial Success Over Atomicity

When a user requests N copies and we can only achieve fewer, we commit what we have rather than throwing everything away:

  • Best-effort exhaustion: For auto-selected providers, we retry up to 5 secondaries before giving up
  • Upload work is expensive: Throwing discards successful uploads; parked pieces get GC'd by the SP
  • No information loss: throwing after partial success destroys information about what did succeed
  • Result inspection is the contract: result.copies.length < count tells the user they got fewer copies; result.failures tells them why

Failure Modes by Stage

The multi-copy upload has a sequential pipeline: select → store → pull → commit.

Stage 0: Provider Selection (before any upload)

Provider selection uses a tiered approach with ping validation at each step:

| Priority | Selection Strategy | When Used |
| --- | --- | --- |
| 1 | Existing data set with endorsed provider | Primary selection, has stored before |
| 2 | New data set with endorsed provider | Primary selection, fresh start |
| 3 | Existing data set with non-endorsed provider | Fallback if no endorsed available |
| 4 | New data set with non-endorsed provider | Final fallback |

Ping validation: Before selecting any provider, we ping their PDP endpoint. If ping fails, we try the next provider in the current tier before falling to the next tier.

| What happens | Behavior |
| --- | --- |
| Provider ping succeeds | Use this provider |
| Provider ping fails | Try next provider in tier, warn to console |
| All providers in tier fail ping | Move to next tier |
| All tiers exhausted (providers remain but unreachable) | Throw error: "All N providers failed health check" |
| No providers remain after filtering | Throw error for primary, break loop for secondaries |

Key distinction:

  • For primary selection (first context), exhaustion = error (can't proceed)
  • For secondary selection (subsequent contexts), exhaustion = get fewer copies (proceed with what we have)
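
A rough sketch of this tiered selection with ping validation. The tier contents, the `ping` callback, and the `selectProvider` name are illustrative assumptions, not the SDK's actual internals.

```ts
// Illustrative provider shape; the real SDK's provider info is richer.
interface Provider { id: number; endorsed: boolean }

async function selectProvider(
  tiers: Provider[][],                        // ordered highest-priority first, per the table above
  ping: (p: Provider) => Promise<boolean>,    // health check against the PDP endpoint
  role: 'primary' | 'secondary'
): Promise<Provider | null> {
  let sawAny = false
  for (const tier of tiers) {
    for (const p of tier) {
      sawAny = true
      if (await ping(p)) return p             // first reachable provider in the highest tier wins
      console.warn(`provider ${p.id} failed health check, trying next`)
    }
    // all providers in this tier failed ping: fall through to the next tier
  }
  if (role === 'primary') {
    // primary exhaustion is fatal; secondary exhaustion just means fewer copies
    throw new Error(sawAny ? 'All providers failed health check' : 'No providers available')
  }
  return null
}
```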

Stage 1: Store (upload data to primary SP)

Store has two sub-stages:

| Sub-stage | What happens | Data state | Behavior |
| --- | --- | --- | --- |
| 1a: Upload | HTTP upload stream succeeds | Data on SP (parked) | Continue to 1b |
| 1a: Upload | HTTP upload stream fails (network, timeout) | No data on SP | Throw StoreError |
| 1b: Confirm | Polling for "parked" status succeeds | Data on SP (parked) | Continue to pull |
| 1b: Confirm | Polling for "parked" status times out | Unknown (may or may not exist) | Throw StoreError |

Store failure is unambiguous from the SDK's perspective: either we have confirmed parked data, or we don't. The user can safely retry.

Note: If 1b times out, data might exist on the SP but we can't confirm it. The SP will eventually GC parked pieces that aren't committed.
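
The two sub-stages could look roughly like this. The `storeWithConfirm` name, polling parameters, and callback shapes are assumptions for illustration, not the SDK's real internals.

```ts
// Thrown for both sub-stage failures; mirrors the StoreError described in this PR.
class StoreError extends Error { name = 'StoreError' }

async function storeWithConfirm(
  upload: () => Promise<string>,                            // 1a: HTTP upload, resolves to pieceCid
  checkStatus: (cid: string) => Promise<'parked' | 'pending'>,
  { attempts = 10, intervalMs = 1000 } = {}                 // illustrative polling defaults
): Promise<string> {
  // 1a: if the upload stream fails, no data exists on the SP -- safe to retry
  const pieceCid = await upload().catch((err) => {
    throw new StoreError(`upload failed: ${err}`)
  })
  // 1b: poll until the SP reports "parked", or give up
  for (let i = 0; i < attempts; i++) {
    if ((await checkStatus(pieceCid)) === 'parked') return pieceCid
    await new Promise((r) => setTimeout(r, intervalMs))
  }
  // data state unknown: it may exist but unconfirmed; the SP will GC it if never committed
  throw new StoreError('timed out waiting for parked confirmation')
}
```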

Stage 2: Pull (SP-to-SP fetch to secondaries)

| What happens | Data on secondary? | On-chain? | Behaviour |
| --- | --- | --- | --- |
| Pull succeeds | Yes (parked) | No | Continue to commit |
| Pull fails (auto-selected) | No | No | Retry with next provider (up to 5 attempts) |
| Pull fails (explicit provider) | No | No | Record in failures[], no retry |
| All secondary attempts exhausted | No | No | Proceed to commit with primary only |

Pull failure is recoverable: data is still on the primary, no on-chain state exists yet. Retrying pull is cheap (SP-to-SP, no client bandwidth).

Stage 3: Commit (addPieces on-chain transaction)

| What happens | Data on SP? | On-chain? | Behaviour |
| --- | --- | --- | --- |
| All commits succeed | Yes | Yes | Build result with all copies |
| Primary commit succeeds, secondary fails | Yes | Primary: yes | Record secondary in failures[] |
| Primary commit fails, secondary succeeds | Yes | Secondary: yes | Record primary in failures[] with role: 'primary', return with secondary in copies[] |
| Primary commit fails, secondary also fails | Yes (parked) | No | Throw CommitError -- nothing on-chain, safe to retry |
| Secondary commit fails | Yes (parked) | No | Record in failures[] -- data on SP, will be GC'd |

Behaviour Matrix

| Scenario | Behaviour |
| --- | --- |
| Primary store fails | Throw StoreError -- nothing happened |
| Primary commit fails, secondary succeeds | Record primary in failures[] with role: 'primary', return result |
| All commits fail | Throw CommitError -- nothing on-chain |
| Secondary pull fails (auto-selected) | Retry with next provider (up to 5 attempts) |
| Secondary pull fails (explicit) | Record in failures[], no retry |
| All secondary attempts exhausted | Commit primary only, record failures |
| Secondary commit fails | Record in failures[] -- data on SP, will be GC'd |
| Failover creates new dataset | Mark isNewDataSet: true in CopyResult |
| copies.length < count | Partial success -- user should inspect failures[] |

Error Types

```ts
/** Primary store failed - no data stored anywhere, safe to retry */
class StoreError extends Error {
  name = 'StoreError'
}

/** All commits failed - data stored on SP(s) but nothing on-chain, safe to retry */
class CommitError extends Error {
  name = 'CommitError'
}

// Partial commit failures appear in result.failures[] with role: 'primary' or 'secondary'
// Only throws CommitError when ALL providers fail to commit
```

What Users Must Check

Users should always inspect result.failures, not just check that upload() didn't throw:

```ts
// If ALL commits fail, upload() throws CommitError
// If at least one succeeds, we get a result:
const result = await synapse.storage.upload(data, { count: 3 })

// Check if endorsed provider (primary) failed
const primaryFailed = result.failures.find(f => f.role === 'primary')
if (primaryFailed) {
  console.warn(`Endorsed provider ${primaryFailed.providerId} failed: ${primaryFailed.error}`)
  // Data is only on non-endorsed secondaries
}

// Check if we got all requested copies
if (result.copies.length < 3) {
  console.warn(`Only ${result.copies.length}/3 copies succeeded`)
  for (const failure of result.failures) {
    console.warn(`  Provider ${failure.providerId} (${failure.role}): ${failure.error}`)
  }
}

// Every copy in copies[] is committed on-chain
for (const copy of result.copies) {
  console.log(`Provider ${copy.providerId}, dataset ${copy.dataSetId}, piece ${copy.pieceId}`)
}
```

Auto-Retry Logic

When user calls upload(data, { count: 2 }) without explicit providerIds or dataSetIds:

  1. Select primary (endorsed preferred)
  2. Store on primary
  3. Select secondary candidate from pool (excluding primary)
  4. Pull to secondary
  5. If pull fails:
    • Mark secondary as failed
    • Select next secondary from pool
    • Retry pull (data already on primary)
    • Repeat until: success OR exhausted pool OR hit MAX_SECONDARY_ATTEMPTS (5)
  6. If no secondary succeeded → proceed to commit with primary only
  7. Commit on all successful providers
  8. Return result with copies[] and failures[]

When user specifies providerIds or dataSetIds: no auto-retry, failures recorded in failures[].
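
The retry loop in steps 3-6 could be sketched roughly as below. The pool handling and the `acquireSecondary` name are illustrative; only the `MAX_SECONDARY_ATTEMPTS` constant mirrors the description above.

```ts
const MAX_SECONDARY_ATTEMPTS = 5

// Try candidates from the pool until one pull succeeds, recording each failure.
// Returns the successful provider, or null if the pool or attempt budget is exhausted.
async function acquireSecondary<P>(
  pool: P[],                                      // candidates, primary already excluded
  pullTo: (p: P) => Promise<void>,                // trigger SP-to-SP pull to this candidate
  failures: { provider: P; error: string }[]
): Promise<P | null> {
  for (let attempt = 0; attempt < MAX_SECONDARY_ATTEMPTS && pool.length > 0; attempt++) {
    const candidate = pool.shift()!
    try {
      await pullTo(candidate)                     // data already on primary; retry is cheap
      return candidate
    } catch (err) {
      failures.push({ provider: candidate, error: String(err) })
    }
  }
  return null                                     // exhausted: caller commits with primary only
}
```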

Design Decision: Primary Commit Failure Handling

Current implementation commits on all providers in parallel via Promise.allSettled(). If primary commit fails but secondary commit succeeds, we record the primary failure and return with the secondary in copies[].

Endorsed providers are selected as primary because they're curated for reliability. If primary (endorsed) fails but secondary (non-endorsed) succeeds, the user ends up with data only on non-endorsed providers. This may not meet product requirements of having one copy on an endorsed provider.

```ts
// Check if endorsed provider failed
const primaryFailed = result.failures.some(f => f.role === 'primary')
if (primaryFailed) {
  // Handle: retry, alert, or treat as error depending on requirements
}
```
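
A sketch of that parallel commit and classification: commit on every provider via Promise.allSettled(), record partial failures, and throw only when all commits fail. The `Target` shape and `commitAll` name are illustrative stand-ins.

```ts
// Mirrors the CommitError described in this PR.
class CommitError extends Error { name = 'CommitError' }

// Illustrative commit target; the real SDK carries more state per provider.
interface Target { providerId: number; role: 'primary' | 'secondary'; commit(): Promise<void> }

async function commitAll(targets: Target[]) {
  const settled = await Promise.allSettled(targets.map((t) => t.commit()))
  const copies: number[] = []
  const failures: { providerId: number; role: string; error: string }[] = []
  settled.forEach((r, i) => {
    const t = targets[i]
    if (r.status === 'fulfilled') copies.push(t.providerId)
    else failures.push({ providerId: t.providerId, role: t.role, error: String(r.reason) })
  })
  if (copies.length === 0) {
    throw new CommitError('all commits failed')   // nothing on-chain, safe to retry
  }
  return { copies, failures }                     // partial success: caller inspects failures[]
}
```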

@timfong888 commented Feb 6, 2026

I noticed this:

> Store failure = hard fail: If we can't store data anywhere, throw immediately

What is the test for the availability of an Endorsed Provider in the case where we have more than one? If the first store fails, is there a retry?

Under retry:

> Select primary (endorsed preferred)
> Store on primary

If we have 2 Endorsed, and the store on primary operation fails, do we retry the other endorsed?

@rvagg (Collaborator, Author) commented Feb 8, 2026

@timfong888 I've clarified the post above with more detail:

  • Now says: Store failure = hard fail: If we can't store data anywhere, throw immediately.
  • There's now also a "Stage 0" that details how we select a provider
  • I updated "Stage 1" with details about the failure modes that can happen there too, because there are nuanced ways it can go wrong.

@rvagg (Collaborator, Author) commented Feb 9, 2026

Docs updated to pass lint, additional tests added to address some gaps.

@rvagg rvagg mentioned this pull request Feb 9, 2026
@BigLep BigLep linked an issue Feb 9, 2026 that may be closed by this pull request
5 tasks
@BigLep BigLep moved this from 📌 Triage to 🔎 Awaiting review in FOC Feb 9, 2026
@timfong888 commented

I am not clear on this:

> All providers in tier fail ping: Move to next tier

My understanding is that if no Endorsed SP succeeds, the operation is a failure, because if there is no Endorsed and we only have Approved, that has a lower durability guarantee.

@timfong888 commented

> Key distinction:
>
> For primary selection (first context), exhaustion = error (can't proceed)
> For secondary selection (subsequent contexts), exhaustion = get fewer copies (proceed with what we have)

The above seems right. If primary selection exhausts, it's an error, not a move to the next tier, right?

@timfong888 commented

Question: If the endorsed provider passed ping during selection but then fails during store() (HTTP upload or parking confirmation), StoreError is thrown immediately. There doesn't appear to be an attempt to try another endorsed provider. If there is, great, but I'm checking.

@timfong888 commented

> All commits failed - data stored on SP(s) but nothing on-chain, safe to retry
> parked pieces get GC'd by the SP

What happens if GC runs before a retry?

@rvagg rvagg force-pushed the rvagg/pull-upload-flow branch from eb878ac to 29ac8ad Compare February 10, 2026 23:49
@rvagg (Collaborator, Author) commented Feb 11, 2026

@timfong888:

On the tier question: yes, the current code does fall back to approved-only if no endorsed provider passes the health check. A requireEndorsed option is something I wrote down as on the table for the future, but right now the priority is "data gets stored" over "only endorsed". If that's a problem we should talk about it, but I think for launch it's the right trade-off, since endorsed providers failing the health check would be an unusual situation. Maybe a hard failure is a better signal for us, though.

> There doesn't appear to be an attempt to try another endorsed provider

Not right now. Couple of reasons:

  1. Scope, this is where I'm drawing the line for the first iteration. First pass, best effort, fail clearly.
  2. It's hard because streams can only be consumed once. If the user gives us raw bytes or a File we could restart, but for a plain stream we can't, and the DX gets complicated fast (do we silently re-send 1GiB? what about streams that can't restart?). Better to throw and let the user decide until we work through the DX of it and see if it's worth the complexity.

> What happens if GC before retry?

Curio GCs unreferenced pieces after 24 hours, so there's a comfortable window for retries for the commit phase.

@timfong888 commented

Okay. So it randomizes across the Endorsed SPs for ping if there's no existing context.

As long as they are good, and an endorsed provider stores and commits successfully, we are good. That's a fair assumption.


Development

Successfully merging this pull request may close these issues:

  • GA DURABILITY: Multi-copy upload via SP-to-SP pull