-
Notifications
You must be signed in to change notification settings - Fork 7
Serve IPFS retrievals #312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are quite a few things both retrievers share, like
- measure egress
- check bad bits
- etc
I think we shouldn't duplicate these. Instead, I propose:
- Create a new module
/retriever, acting as a lib, exporting functions mentioned above and more - Use that lib in
/ipfs-retriever - And
/piece-retriever
Wdyt?
That's a great idea, I like it. My main concern is about how long it will take to ship this, I'd rather have a working PoC ipfs-retriever by Friday, even if I deploy it from an open PR. Having said that, I am waiting for Curio to start scanning PDP Pieces for IPFS Car blocks, so I can do the suggested refactor in the meantime. Let me give it a try and see how it goes. |
Clone `retriever` to `ipfs-retriever`. Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
8c9649a to
f73c566
Compare
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
ipfs-retriever/lib/store.js
Outdated
| * retrieval | ||
| * @returns {Promise<void>} - A promise that resolves when the log is inserted. | ||
| */ | ||
| export async function logRetrievalResult(env, params) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is shared with the retriever-worker.
However, we need to enhance it to accept two sizes (byte-counts) - one for the CDN response stream returned to the client, and another for the cache-miss response from the SP (set to 0 or NULL for cache-hit requests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| const withServiceProvider = results.filter( | ||
| (row) => row && row.service_provider_id != null, | ||
| ) | ||
| httpAssert( | ||
| withServiceProvider.length > 0, | ||
| 404, | ||
| `${lookupKey} exists but has no associated service provider.`, | ||
| ) | ||
|
|
||
| const withPaymentRail = withServiceProvider.filter( | ||
| (row) => | ||
| row.payer_address && row.payer_address.toLowerCase() === payerAddress, | ||
| ) | ||
| httpAssert( | ||
| withPaymentRail.length > 0, | ||
| 402, | ||
| `There is no Filecoin Warm Storage Service deal for payer '${payerAddress}' and ${lookupKey}.`, | ||
| ) | ||
|
|
||
| const withCDN = withPaymentRail.filter( | ||
| (row) => row.with_cdn && row.with_cdn === 1, | ||
| ) | ||
| httpAssert( | ||
| withCDN.length > 0, | ||
| 402, | ||
| `The Filecoin Warm Storage Service deal for payer '${payerAddress}' and ${lookupKey} has withCDN=false.`, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SQL query and some of the following filters are specific to IPFS retrievals, but many filters are shared with the Piece retrievals. It makes me wonder if it's worth extracting this into a shared helper?
| const withPayerNotSanctioned = withIpfsIndexing.filter( | ||
| (row) => !row.is_sanctioned, | ||
| ) | ||
| httpAssert( | ||
| withPayerNotSanctioned.length > 0, | ||
| 403, | ||
| `Wallet '${payerAddress}' is sanctioned and cannot retrieve ${lookupKey}.`, | ||
| ) | ||
|
|
||
| const withApprovedProvider = withPayerNotSanctioned.filter( | ||
| (row) => row.service_url, | ||
| ) | ||
| httpAssert( | ||
| withApprovedProvider.length > 0, | ||
| 404, | ||
| `No approved service provider found for payer '${payerAddress}' and ${lookupKey}.`, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be shared as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to create a discussion thread that we can mark as resolved later, and the only way to do so is by attaching the first comment to some code.
Let's review the TODO check-list in the issue description:
- Support all
?format=raw - Support for other IPFS Trustless GW options, like dag-scope, etc.
- Calculate egress consumption, account for different size of the cache-miss CAR response vs client-side RAW response
- Bot to peridically check IPFS retrievals
- Preserve
?format=carand other query-string params when redirecting from/wallet/cid to 1-{dataset}-{piece}.
I feel many of these items can be implemented later and don't block the landing of this PR.
The only hard requirement is IMO reporting egress consumption, as it is tied to billing. I can imagine the first iteration can report CAR size instead of the real response size. We will charge clients a bit more (CAR is larger than the raw data), but I think that's okay for a few days until we fix the problem in a follow-up pull request.
It would be nice to preserve the query string when redirecting from link.* to 1-{dataset}-{piece}.*, since it should be a quick fix.
I propose to convert the rest of TODOs into new GH issues and add them as children of the theme MVP of FilBeam for IPFS (filbeam/roadmap#85).
@juliangruber thoughts?
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
| // This Piece must have IPFS RootCID set and IPFS Indexing enabled at the dataset level | ||
| serviceProviderId: '23', | ||
| serviceUrl: 'https://pdp.oplian.com/', | ||
| pieceCid: | ||
| 'bafkzcibe2g5acdgp624n6qglofslq4dl2aixoeecjsqiqzcbk4ji6vpmyvcr2pytaq', | ||
| ipfsRootCid: 'bafybeidt6ugk5xeoeeumev3eexamnjxvexbfpfajx4kgzgsa5hkrwlhavu', | ||
| dataSetId: 845, | ||
| pieceId: '0', | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This Piece is no longer present on Calibnet after we upgraded to FWSS v1.0.0. We need to upload new IPFS content.
| const { | ||
| piece_id: pieceId, | ||
| data_set_id: dataSetId, | ||
| ipfs_root_cid: ipfsRootCid, | ||
| service_provider_id: serviceProviderId, | ||
| service_url: serviceUrl, | ||
| } = withApprovedProvider[0] | ||
|
|
||
| // We need this assertion to supress TypeScript error. The compiler is not able to infer that | ||
| // `withApprovedProvider.filter()` above returns only rows with `service_url` defined. | ||
| httpAssert(serviceUrl, 500, 'should never happen') | ||
|
|
||
| console.log( | ||
| `Validated data set ID '${dataSetId}', piece ID '${pieceId}', and service provider id '${serviceProviderId}' for ${lookupKey} and payer '${payerAddress}'. Service URL: ${serviceUrl}`, | ||
| ) | ||
|
|
||
| return { serviceProviderId, serviceUrl, dataSetId, pieceId, ipfsRootCid } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will need to modify this to follow the changes made in #438
| // FIXME: move this logic into processIpfsResponse function | ||
| // When converting from CAR to RAW, set content-disposition to inline | ||
| // so browsers display the content instead of downloading it. | ||
| if (ipfsFormat !== 'car') { | ||
| response.headers.set('content-disposition', 'inline') | ||
| // Also remove the content-type header, remove x-content-type-options, | ||
| // and let the browser to sniff the content type. | ||
| response.headers.delete('content-type') | ||
| response.headers.delete('x-content-type-options') | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After we move this block to processIpfsResponse(), the code post-processing the response should be pretty much identical in both piece-retriever and ipfs-retriever.
| /** | ||
| * Extracts status and message from an error object. | ||
| * | ||
| * - If the error has a numeric `status`, it is used; otherwise, defaults to 500. | ||
| * - If the status is < 500 and a string `message` exists, it's used; otherwise, a | ||
| * generic message is returned. | ||
| * | ||
| * @param {unknown} error - The error object to extract from. | ||
| * @returns {{ status: number; message: string }} | ||
| */ | ||
| function getErrorHttpStatusMessage(error) { | ||
| const isObject = typeof error === 'object' && error !== null | ||
| const status = | ||
| isObject && 'status' in error && typeof error.status === 'number' | ||
| ? error.status | ||
| : 500 | ||
|
|
||
| const message = | ||
| isObject && | ||
| status < 500 && | ||
| 'message' in error && | ||
| typeof error.message === 'string' | ||
| ? error.message | ||
| : 'Internal Server Error' | ||
|
|
||
| return { status, message } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be a good candidate to extract into a helper shared by both piece-retriever and ipfs-retriever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These test-data builders can be shared by piece-retriever and ipfs-retriever. However, we need to preserve the two new options - ipfsRootCid and withIpfsIndexing.
Should we start a new shared package called /test-data-builders?
URL schema:
https://1-{base32(dataSetId)}-{base32(pieceId)}.ipfs.filbeam.io/{favicon.ico}https://ipfs.filbeam.io/{WalletAddres}/{IpfsRootCid}/{favicon.ico}(redirects to the above)This is a proof of concept that I don't expect to land. Instead, I'll extract smaller pull requests and land them independently.
I will be updating
filbeam-ipfs-retriever-calibrationworker on CF with the code from this pull request.I have a special-case path sending the retrieval requests to our Frisbii instance as a workaround until Curio implements IPFS retrievals for PDP deals:
Raw image:
CAR format:
Does not work yet - we need to figure out Cloudflare config:
https://ipfs.calibration.filbeam.io/0x000000000000000000000000000000000000dead/bafybeiagrjpf2rwth5oylc64czsrz2jm7a4fgo67b2luygqjrivjbswuku/rusty-lassie.png
TODO:
This is blocked until there is at least one calibnet SP running the Curio version that scans PDP pieces for IPFS CAR blocks.
?format=raw?format=carand other query-string params when redirecting from/wallet/cidto1-{dataset}-{piece}.Links: