Skip to content

piece-retriever: add try different SP on retrieval failure#438

Merged
juliangruber merged 11 commits intomainfrom
add/retry-SPs
Nov 7, 2025
Merged

piece-retriever: add try different SP on retrieval failure#438
juliangruber merged 11 commits intomainfrom
add/retry-SPs

Conversation

@juliangruber
Copy link
Member

No description provided.

Comment on lines 93 to 96
const response = new Response('No retrieval candidate found', {
status: 404,
})
setContentSecurityPolicy(response)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to better understand why we cannot use httpAssert here. Can you please explain?

I am also concerned about the error message "No retrieval candidate found" - from the perspective of a retrieval client, it's not clear to me how to troubleshoot this error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See code below, with http assert we can't add the content security policy.

I am also concerned about the error message "No retrieval candidate found" - from the perspective of a retrieval client, it's not clear to me how to troubleshoot this error.

Do you have a suggestion? Tbh I don't know if this case can even ever happen, since this is after all of those checks in

httpAssert(
results && results.length > 0,
404,
`Piece_cid '${pieceCid}' does not exist or may not have been indexed yet.`,
)
const withServiceProvider = results.filter(
(row) => row && row.service_provider_id != null,
)
httpAssert(
withServiceProvider.length > 0,
404,
`Piece_cid '${pieceCid}' exists but has no associated service provider.`,
)
const withPaymentRail = withServiceProvider.filter(
(row) =>
row.payer_address && row.payer_address.toLowerCase() === payerAddress,
)
httpAssert(
withPaymentRail.length > 0,
402,
`There is no Filecoin Warm Storage Service deal for payer '${payerAddress}' and piece_cid '${pieceCid}'.`,
)
const withCDN = withPaymentRail.filter(
(row) => row.with_cdn && row.with_cdn === 1,
)
httpAssert(
withCDN.length > 0,
402,
`The Filecoin Warm Storage Service deal for payer '${payerAddress}' and piece_cid '${pieceCid}' has withCDN=false.`,
)
const withPayerNotSanctioned = withCDN.filter((row) => !row.is_sanctioned)
httpAssert(
withPayerNotSanctioned.length > 0,
403,
`Wallet '${payerAddress}' is sanctioned and cannot retrieve piece_cid '${pieceCid}'.`,
)
const withApprovedProvider = withPayerNotSanctioned.filter(
(row) => row.service_url,
)
httpAssert(
withApprovedProvider.length > 0,
404,
`No approved service provider found for payer '${payerAddress}' and piece_cid '${pieceCid}'.`,
)
// Check CDN quota first
const withSufficientCDNQuota = enforceEgressQuota
? withApprovedProvider.filter((row) => {
return BigInt(row.cdn_egress_quota ?? '0') > 0n
})
: withApprovedProvider
httpAssert(
withSufficientCDNQuota.length > 0,
402,
`CDN egress quota exhausted for payer '${payerAddress}' and data set '${withApprovedProvider[0]?.data_set_id}'. Please top up your CDN egress quota.`,
)
// Check cache-miss quota
const withSufficientCacheMissQuota = enforceEgressQuota
? withSufficientCDNQuota.filter((row) => {
return BigInt(row.cache_miss_egress_quota ?? '0') > 0n
})
: withSufficientCDNQuota
httpAssert(
withSufficientCacheMissQuota.length > 0,
402,
`Cache miss egress quota exhausted for payer '${payerAddress}' and data set '${withSufficientCDNQuota[0]?.data_set_id}'. Please top up your cache miss egress quota.`,
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a content security policy for a 404 response with a text body? We are currently not setting up the content security policy in this case, as far as I can tell from the code. What changed?

Alternatively, is there a reason why we should not set CSP for every execution path? We can move the call of setContentSecurityPolicy to the fetch function above, which is a thin wrapper around the real request handler that adds error handling for HTTP asserts.

I am also concerned about the error message "No retrieval candidate found" - from the perspective of a retrieval client, it's not clear to me how to troubleshoot this error.

Do you have a suggestion? Tbh I don't know if this case can even ever happen, since this is after all of those checks in

So before this change, the error message mentioned the service provider that we are refusing to cache-miss to.

When I look at getRetrievalCandidatesAndValidatePayer, there is a check before returning the filtered array to ensure there is at least one candidate left, so I agree with you that this error will never happen in the current codebase.

I propose to rephrase the error message to something like the following:

Internal error: No available provider was found for cache-miss request Piece CID ${pieceCid} and Payer ${payerWalletAddress}

I am struggling to find a good way to describe the fact that this is not only about finding an SP willing to serve the content, but also about the client having a payment rail with non-zero remaining cache-miss egress remaining. I.e. we want a generic message saying that our internal provider discovery mechanism failed in an unexpected way.

How about this:

Internal error: service provider lookup failed.

I think we should also return status code 500 for Internal Server Error.

Thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed & addressed in c2e1d34

@juliangruber juliangruber requested a review from bajtos November 6, 2025 08:59
Copy link
Contributor

@bajtos bajtos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there!

juliangruber and others added 3 commits November 7, 2025 10:09
@juliangruber juliangruber requested a review from bajtos November 7, 2025 09:12
)
const response = new Response(
`Service provider ${serviceProviderId} is unavailable${retrievalResult ? ` at ${retrievalResult.url}` : ''}`,
`No available service provider found. Attempted: ${retrievalAttempts.map((a) => `ID=${a.serviceProviderId} (Service URL=${a.serviceUrl})`).join(', ')}`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will eventually need to update filbeam-bot to support an array of IDs in this header:

https://github.com/filbeam/bot/blob/c4479005a7834e0e708d1dc910bc9fd4baf5794a/index.js#L197-L203

Maybe after we land my PR filbeam/bot#40 first, to avoid merge conflicts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@bajtos bajtos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏🏻 :shipit:

@juliangruber juliangruber merged commit 000c19b into main Nov 7, 2025
8 checks passed
@juliangruber juliangruber deleted the add/retry-SPs branch November 7, 2025 09:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants