Skip to content

Conversation

@bajtos
Copy link
Contributor

@bajtos bajtos commented Sep 29, 2025

URL schema:

  • https://1-{base32(dataSetId)}-{base32(pieceId)}.ipfs.filbeam.io/{favicon.ico}
  • https://ipfs.filbeam.io/{WalletAddres}/{IpfsRootCid}/{favicon.ico} (redirects to the above)

This is a proof of concept that I don't expect to land. Instead, I'll extract smaller pull requests and land them independently.

I will be updating filbeam-ipfs-retriever-calibration worker on CF with the code from this pull request.

I have a special-case path sending the retrieval requests to our Frisbii instance as a workaround until Curio implements IPFS retrievals for PDP deals:

Raw image:

CAR format:

Does not work yet - we need to figure out Cloudflare config:
https://ipfs.calibration.filbeam.io/0x000000000000000000000000000000000000dead/bafybeiagrjpf2rwth5oylc64czsrz2jm7a4fgo67b2luygqjrivjbswuku/rusty-lassie.png

TODO:

  • Add CI & CD steps
  • Re-enable the skipped tests (see FIXME comments).
    This is blocked until there is at least one calibnet SP running the Curio version that scans PDP pieces for IPFS CAR blocks.
  • Create real SP test for subpath support (currently not supported by FOC SPs)
  • Support all ?format=raw
  • Support for other IPFS Trustless GW options, like dag-scope, etc.
  • Calculate egress consumption, account for different size of the cache-miss CAR response vs client-side RAW response
  • Bot to peridically check IPFS retrievals
  • Preserve ?format=car and other query-string params when redirecting from /wallet/cid to 1-{dataset}-{piece}.
  • Convert the wallet address to lowercase before querying the DB
  • Implement bad-bits lookup
  • Don't deploy to mainnet

Links:

@bajtos bajtos requested a review from juliangruber September 29, 2025 13:48
Copy link
Member

@juliangruber juliangruber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are quite a few things both retrievers share, like

  • measure egress
  • check bad bits
  • etc

I think we shouldn't duplicate these. Instead, I propose:

  • Create a new module /retriever, acting as a lib, exporting functions mentioned above and more
  • Use that lib in /ipfs-retriever
  • And /piece-retriever

Wdyt?

@bajtos
Copy link
Contributor Author

bajtos commented Sep 30, 2025

There are quite a few things both retrievers share, like

  • measure egress
  • check bad bits
  • etc
    I think we shouldn't duplicate these. Instead, I propose:
  • Create a new module /retriever, acting as a lib, exporting functions mentioned above and more
  • Use that lib in /ipfs-retriever
  • And /piece-retriever
    Wdyt?

That's a great idea, I like it.

My main concern is about how long it will take to ship this, I'd rather have a working PoC ipfs-retriever by Friday, even if I deploy it from an open PR.

Having said that, I am waiting for Curio to start scanning PDP Pieces for IPFS Car blocks, so I can do the suggested refactor in the meantime.

Let me give it a try and see how it goes.

bajtos added 2 commits October 2, 2025 08:21
Clone `retriever` to `ipfs-retriever`.

Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
@bajtos bajtos force-pushed the serve-ipfs-retrievals branch from 8c9649a to f73c566 Compare October 2, 2025 06:21
bajtos added 11 commits October 2, 2025 08:21
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
@bajtos bajtos changed the title feat: serve IPFS retrievals PoC: serve IPFS retrievals [DO NOT MERGE] Oct 20, 2025
@socket-security
Copy link

socket-security bot commented Oct 20, 2025

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff Package Supply Chain
Security
Vulnerability Quality Maintenance License
Added@​web3-storage/​car-block-validator@​1.2.2871008082100
Added@​ipld/​car@​5.4.29910010086100
Addedipfs-unixfs-exporter@​13.7.39810010090100

View full report

@juliangruber juliangruber changed the title PoC: serve IPFS retrievals [DO NOT MERGE] PoC: serve IPFS retrievals Oct 27, 2025
* retrieval
* @returns {Promise<void>} - A promise that resolves when the log is inserted.
*/
export async function logRetrievalResult(env, params) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is shared with the retriever-worker.

However, we need to enhance it to accept two sizes (byte-counts) - one for the CDN response stream returned to the client, and another for the cache-miss response from the SP (set to 0 or NULL for cache-hit requests).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +103 to +129
const withServiceProvider = results.filter(
(row) => row && row.service_provider_id != null,
)
httpAssert(
withServiceProvider.length > 0,
404,
`${lookupKey} exists but has no associated service provider.`,
)

const withPaymentRail = withServiceProvider.filter(
(row) =>
row.payer_address && row.payer_address.toLowerCase() === payerAddress,
)
httpAssert(
withPaymentRail.length > 0,
402,
`There is no Filecoin Warm Storage Service deal for payer '${payerAddress}' and ${lookupKey}.`,
)

const withCDN = withPaymentRail.filter(
(row) => row.with_cdn && row.with_cdn === 1,
)
httpAssert(
withCDN.length > 0,
402,
`The Filecoin Warm Storage Service deal for payer '${payerAddress}' and ${lookupKey} has withCDN=false.`,
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SQL query and some of the following filters are specific to IPFS retrievals, but many filters are shared with the Piece retrievals. It makes me wonder if it's worth extracting this into a shared helper?

Comment on lines +138 to +154
const withPayerNotSanctioned = withIpfsIndexing.filter(
(row) => !row.is_sanctioned,
)
httpAssert(
withPayerNotSanctioned.length > 0,
403,
`Wallet '${payerAddress}' is sanctioned and cannot retrieve ${lookupKey}.`,
)

const withApprovedProvider = withPayerNotSanctioned.filter(
(row) => row.service_url,
)
httpAssert(
withApprovedProvider.length > 0,
404,
`No approved service provider found for payer '${payerAddress}' and ${lookupKey}.`,
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be shared as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to create a discussion thread that we can mark as resolved later, and the only way to do so is by attaching the first comment to some code.

Let's review the TODO check-list in the issue description:

  • Support all ?format=raw
  • Support for other IPFS Trustless GW options, like dag-scope, etc.
  • Calculate egress consumption, account for different size of the cache-miss CAR response vs client-side RAW response
  • Bot to peridically check IPFS retrievals
  • Preserve ?format=car and other query-string params when redirecting from /wallet/cid to 1-{dataset}-{piece}.

I feel many of these items can be implemented later and don't block the landing of this PR.

The only hard requirement is IMO reporting egress consumption, as it is tied to billing. I can imagine the first iteration can report CAR size instead of the real response size. We will charge clients a bit more (CAR is larger than the raw data), but I think that's okay for a few days until we fix the problem in a follow-up pull request.

It would be nice to preserve the query string when redirecting from link.* to 1-{dataset}-{piece}.*, since it should be a quick fix.

I propose to convert the rest of TODOs into new GH issues and add them as children of the theme MVP of FilBeam for IPFS (filbeam/roadmap#85).

@juliangruber thoughts?

Comment on lines +12 to +20
// This Piece must have IPFS RootCID set and IPFS Indexing enabled at the dataset level
serviceProviderId: '23',
serviceUrl: 'https://pdp.oplian.com/',
pieceCid:
'bafkzcibe2g5acdgp624n6qglofslq4dl2aixoeecjsqiqzcbk4ji6vpmyvcr2pytaq',
ipfsRootCid: 'bafybeidt6ugk5xeoeeumev3eexamnjxvexbfpfajx4kgzgsa5hkrwlhavu',
dataSetId: 845,
pieceId: '0',
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Piece is no longer present on Calibnet after we upgraded to FWSS v1.0.0. We need to upload new IPFS content.

Comment on lines +165 to +181
const {
piece_id: pieceId,
data_set_id: dataSetId,
ipfs_root_cid: ipfsRootCid,
service_provider_id: serviceProviderId,
service_url: serviceUrl,
} = withApprovedProvider[0]

// We need this assertion to supress TypeScript error. The compiler is not able to infer that
// `withApprovedProvider.filter()` above returns only rows with `service_url` defined.
httpAssert(serviceUrl, 500, 'should never happen')

console.log(
`Validated data set ID '${dataSetId}', piece ID '${pieceId}', and service provider id '${serviceProviderId}' for ${lookupKey} and payer '${payerAddress}'. Service URL: ${serviceUrl}`,
)

return { serviceProviderId, serviceUrl, dataSetId, pieceId, ipfsRootCid }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need to modify this to follow the changes made in #438

Comment on lines +200 to +209
// FIXME: move this logic into processIpfsResponse function
// When converting from CAR to RAW, set content-disposition to inline
// so browsers display the content instead of downloading it.
if (ipfsFormat !== 'car') {
response.headers.set('content-disposition', 'inline')
// Also remove the content-type header, remove x-content-type-options,
// and let the browser to sniff the content type.
response.headers.delete('content-type')
response.headers.delete('x-content-type-options')
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After we move this block to processIpfsResponse(), the code post-processing the response should be pretty much identical in both piece-retriever and ipfs-retriever.

Comment on lines +244 to +270
/**
* Extracts status and message from an error object.
*
* - If the error has a numeric `status`, it is used; otherwise, defaults to 500.
* - If the status is < 500 and a string `message` exists, it's used; otherwise, a
* generic message is returned.
*
* @param {unknown} error - The error object to extract from.
* @returns {{ status: number; message: string }}
*/
function getErrorHttpStatusMessage(error) {
const isObject = typeof error === 'object' && error !== null
const status =
isObject && 'status' in error && typeof error.status === 'number'
? error.status
: 500

const message =
isObject &&
status < 500 &&
'message' in error &&
typeof error.message === 'string'
? error.message
: 'Internal Server Error'

return { status, message }
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a good candidate to extract into a helper shared by both piece-retriever and ipfs-retriever.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These test-data builders can be shared by piece-retriever and ipfs-retriever. However, we need to preserve the two new options - ipfsRootCid and withIpfsIndexing.

Should we start a new shared package called /test-data-builders?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants