Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
50 commits
Select commit Hold shift + click to select a range
cf025ce
feat: add ipfs-retriever worker
bajtos Sep 29, 2025
f73c566
feat: implement IPFS-style retrievals
bajtos Sep 29, 2025
dd025cc
fixup! package-lock maintenance
bajtos Oct 2, 2025
c26393b
feat: add BigInt<>Base32 converters
bajtos Oct 2, 2025
b2e4c81
feat: add `getSlugForWalletAndCid` helper
bajtos Oct 2, 2025
56b4cee
feat: redirect /wallet/cid/pathname to slug.ipfs.filbeam.io/pathname
bajtos Oct 2, 2025
d7e4692
feat: https://1-{dataset}-{piece}.ipfs.filbeam.io
bajtos Oct 2, 2025
4b7b1c6
fix: drop multibase `b` prefix from the slugs
bajtos Oct 2, 2025
f8de5b0
fix: CF worker name
bajtos Oct 2, 2025
6faf291
feat: special-case handling for our Frisbii instance
bajtos Oct 2, 2025
290b38d
fix: improve error message about the slug format
bajtos Oct 2, 2025
4425cc3
fix: use 0x00dead address for the special case
bajtos Oct 2, 2025
aed1018
fix: serve redirects also at link.ipfs.calibration.filbeam.io
bajtos Oct 2, 2025
c47071f
Merge branch 'main' into serve-ipfs-retrievals
bajtos Oct 20, 2025
78d175c
feat: convert CAR to RAW
bajtos Oct 20, 2025
4d1541e
disable bad-bits lookup for IPFS retrievals
bajtos Oct 24, 2025
9baaea0
Merge branch 'main' into serve-ipfs-retrievals
juliangruber Oct 27, 2025
eee955b
re-enable bad bits
juliangruber Oct 27, 2025
4723465
fix bad bits method signature
juliangruber Oct 27, 2025
6b28eb1
Merge branch 'main' into serve-ipfs-retrievals
juliangruber Oct 27, 2025
0e0451d
fix missing `await`
juliangruber Oct 27, 2025
992e284
add CD
juliangruber Oct 27, 2025
2105982
fix type
juliangruber Oct 27, 2025
07c4dec
add support mixed case wallet addresses
juliangruber Oct 27, 2025
9c0c2a4
fix recreate node module tree
juliangruber Oct 27, 2025
8d9fdec
tests wip
juliangruber Oct 27, 2025
2e0f385
fix test
juliangruber Oct 27, 2025
7ccfd61
refactor
juliangruber Oct 27, 2025
78bb34d
tests wip
juliangruber Oct 27, 2025
784006d
fix test
juliangruber Oct 27, 2025
044a105
fix test
juliangruber Oct 27, 2025
1690c01
SP retrievals wip
juliangruber Oct 29, 2025
fa1011f
real SP test passes
juliangruber Oct 29, 2025
4900845
fmt
juliangruber Oct 29, 2025
d40e592
no mainnet for now
juliangruber Oct 29, 2025
89e8fad
remove frisbii special case
juliangruber Nov 2, 2025
1f3c471
fix: enable content-type sniffing for RAW responses
bajtos Oct 21, 2025
ff768d2
fix: handle 404 responses and empty response body
bajtos Oct 21, 2025
9f2272c
Merge branch 'main' into serve-ipfs-retrievals
bajtos Nov 4, 2025
e223831
fix support entry type `raw`
juliangruber Nov 5, 2025
0aac84d
Merge branch 'main' into serve-ipfs-retrievals
bajtos Nov 6, 2025
8e04420
refactor: `@filbeam/retrieval` in ipfs-retriever
bajtos Nov 6, 2025
78a7d32
Merge branch 'main' into serve-ipfs-retrievals
juliangruber Nov 13, 2025
def2c30
Merge branch 'main' into serve-ipfs-retrievals
juliangruber Nov 14, 2025
ef6cecf
use shared `getDataSetStats()`
juliangruber Nov 14, 2025
4320137
clean up
juliangruber Nov 14, 2025
ceed20c
re-enable mainnet
juliangruber Nov 14, 2025
78778c3
clean up
juliangruber Nov 14, 2025
5be53ac
Merge branch 'main' into serve-ipfs-retrievals
juliangruber Nov 14, 2025
9ba5aa5
refactor `logRetrievalResult()`
juliangruber Nov 14, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,14 @@ jobs:
accountId: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
preCommands: ../db/deploy-${{ matrix.environment }}.sh
environment: ${{ matrix.environment }}
- name: Deploy IPFS Retriever and Migrate Database
uses: cloudflare/wrangler-action@v3
with:
workingDirectory: ipfs-retriever
apiToken: ${{ secrets.CLOUDFLARE_API_TOKEN }}
accountId: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
preCommands: ../db/deploy-${{ matrix.environment }}.sh
environment: ${{ matrix.environment }}
- name: Deploy Indexer
uses: cloudflare/wrangler-action@v3
with:
Expand Down
6 changes: 6 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,12 @@ Create `piece-retriever/.dev.vars` file with the following content:
BOT_TOKENS="{\"secret\":\"dev\"}"
```

Create `ipfs-retriever/.dev.vars` file with the following content:

```
BOT_TOKENS="{\"secret\":\"dev\"}"
```

Create `terminator/.dev.vars` file with the following content:

```
Expand Down
331 changes: 331 additions & 0 deletions ipfs-retriever/bin/ipfs-retriever.js
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to create a discussion thread that we can mark as resolved later, and the only way to do so is by attaching the first comment to some code.

Let's review the TODO check-list in the issue description:

  • Support all ?format=raw
  • Support for other IPFS Trustless GW options, like dag-scope, etc.
  • Calculate egress consumption, account for different size of the cache-miss CAR response vs client-side RAW response
  • Bot to peridically check IPFS retrievals
  • Preserve ?format=car and other query-string params when redirecting from /wallet/cid to 1-{dataset}-{piece}.

I feel many of these items can be implemented later and don't block the landing of this PR.

The only hard requirement is IMO reporting egress consumption, as it is tied to billing. I can imagine the first iteration can report CAR size instead of the real response size. We will charge clients a bit more (CAR is larger than the raw data), but I think that's okay for a few days until we fix the problem in a follow-up pull request.

It would be nice to preserve the query string when redirecting from link.* to 1-{dataset}-{piece}.*, since it should be a quick fix.

I propose to convert the rest of TODOs into new GH issues and add them as children of the theme MVP of FilBeam for IPFS (filbeam/roadmap#85).

@juliangruber thoughts?

Original file line number Diff line number Diff line change
@@ -0,0 +1,331 @@
import {
isValidEthereumAddress,
httpAssert,
setContentSecurityPolicy,
getBadBitsEntry,
updateDataSetStats,
logRetrievalResult,
} from '@filbeam/retrieval'

import { parseRequest } from '../lib/request.js'
import {
retrieveIpfsContent as defaultRetrieveIpfsContent,
measureStreamedEgress,
processIpfsResponse,
} from '../lib/retrieval.js'
import {
getStorageProviderAndValidatePayerByDataSetAndPiece,
getSlugForWalletAndCid,
} from '../lib/store.js'

export default {
/**
* @param {Request} request
* @param {Env} env
* @param {ExecutionContext} ctx
* @param {object} options
* @param {typeof defaultRetrieveIpfsContent} [options.retrieveIpfsContent]
* @returns
*/
async fetch(
request,
env,
ctx,
{ retrieveIpfsContent = defaultRetrieveIpfsContent } = {},
) {
try {
return await this._fetch(request, env, ctx, {
retrieveIpfsContent,
})
} catch (error) {
return this._handleError(error)
}
},

/**
* @param {Request} request
* @param {Env} env
* @param {ExecutionContext} ctx
* @param {object} options
* @param {typeof defaultRetrieveIpfsContent} [options.retrieveIpfsContent]
* @returns
*/
async _fetch(
request,
env,
ctx,
{ retrieveIpfsContent = defaultRetrieveIpfsContent } = {},
) {
httpAssert(
['GET', 'HEAD'].includes(request.method),
405,
'Method Not Allowed',
)

if (
URL.parse(request.url)?.hostname === env.DNS_ROOT.slice(1) ||
URL.parse(request.url)?.hostname === `link${env.DNS_ROOT}`
) {
return handleDnsRootRequest(request, env)
}

if (URL.parse(request.url)?.hostname.endsWith('filcdn.io')) {
return Response.redirect(
request.url.replace('filcdn.io', 'filbeam.io'),
301,
)
}

const requestTimestamp = new Date().toISOString()
const workerStartedAt = performance.now()
const requestCountryCode = request.headers.get('CF-IPCountry')

const { dataSetId, pieceId, ipfsSubpath, ipfsFormat, botName } = parseRequest(
request,
env,
)

try {
// Timestamp to measure file retrieval performance (from cache and from SP)
const fetchStartedAt = performance.now()

const { serviceProviderId, serviceUrl, ipfsRootCid } =
await getStorageProviderAndValidatePayerByDataSetAndPiece(
env,
dataSetId,
pieceId,
)

// Now check Bad Bits with the ipfsRootCid we got from the database
const isBadBit = await env.BAD_BITS_KV.get(
`bad-bits:${await getBadBitsEntry(ipfsRootCid)}`,
{
type: 'json',
},
)

httpAssert(
!isBadBit,
404,
'The requested CID was flagged by the Bad Bits Denylist at https://badbits.dwebops.pub',
)

httpAssert(
serviceProviderId,
404,
`Unsupported Service Provider: ${serviceProviderId}`,
)

const { response: originResponse, cacheMiss } = await retrieveIpfsContent(
serviceUrl,
ipfsRootCid,
ipfsSubpath,
env.ORIGIN_CACHE_TTL,
{ signal: request.signal },
)

const responseBody = await processIpfsResponse(originResponse, {
ipfsRootCid,
ipfsSubpath,
ipfsFormat,
signal: request.signal,
})

if (!responseBody) {
// The upstream response does not have any readable body
// There is no need to measure response body size, we can
// return the original response object.
ctx.waitUntil(
logRetrievalResult(env, {
cacheMiss,
responseStatus: originResponse.status,
egressBytes: 0,
requestCountryCode,
timestamp: requestTimestamp,
dataSetId,
botName,
}),
)
const response = new Response(originResponse.body, originResponse)
setContentSecurityPolicy(response)
response.headers.set('X-Data-Set-ID', dataSetId)
response.headers.set(
'Cache-Control',
`public, max-age=${env.CLIENT_CACHE_TTL}`,
)
return response
}

// Stream and count bytes
// We create two identical streams, one for the egress measurement and the other for returning the response as soon as possible
const [returnedStream, egressMeasurementStream] = responseBody.tee()
const reader = egressMeasurementStream.getReader()
const firstByteAt = performance.now()

ctx.waitUntil(
(async () => {
const egressBytes = await measureStreamedEgress(reader)
const lastByteFetchedAt = performance.now()

await logRetrievalResult(env, {
cacheMiss,
responseStatus: originResponse.status,
egressBytes,
requestCountryCode,
timestamp: requestTimestamp,
performanceStats: {
fetchTtfb: firstByteAt - fetchStartedAt,
fetchTtlb: lastByteFetchedAt - fetchStartedAt,
workerTtfb: firstByteAt - workerStartedAt,
},
dataSetId,
botName
})

await updateDataSetStats(env, {
dataSetId,
egressBytes,
cacheMiss,
enforceEgressQuota: env.ENFORCE_EGRESS_QUOTA,
})
})(),
)

// Return immediately, proxying the transformed response
const response = new Response(returnedStream, {
status: originResponse.status,
statusText: originResponse.statusText,
headers: originResponse.headers,
})
setContentSecurityPolicy(response)
response.headers.set('X-Data-Set-ID', dataSetId)
response.headers.set(
'Cache-Control',
`public, max-age=${env.CLIENT_CACHE_TTL}`,
)

// FIXME: move this logic into processIpfsResponse function
// When converting from CAR to RAW, set content-disposition to inline
// so browsers display the content instead of downloading it.
if (ipfsFormat !== 'car') {
response.headers.set('content-disposition', 'inline')
// Also remove the content-type header, remove x-content-type-options,
// and let the browser to sniff the content type.
response.headers.delete('content-type')
response.headers.delete('x-content-type-options')
}
Comment on lines +207 to +216
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After we move this block to processIpfsResponse(), the code post-processing the response should be pretty much identical in both piece-retriever and ipfs-retriever.


return response
} catch (error) {
const { status } = getErrorHttpStatusMessage(error)

ctx.waitUntil(
logRetrievalResult(env, {
cacheMiss: null,
responseStatus: status,
egressBytes: null,
requestCountryCode,
timestamp: requestTimestamp,
dataSetId: null,
botName,
}),
)

throw error
}
},

/**
* @param {unknown} error
* @returns
*/
_handleError(error) {
const { status, message } = getErrorHttpStatusMessage(error)

if (status >= 500) {
console.error(error)
}
return new Response(message, { status })
},
}

/**
* Extracts status and message from an error object.
*
* - If the error has a numeric `status`, it is used; otherwise, defaults to 500.
* - If the status is < 500 and a string `message` exists, it's used; otherwise, a
* generic message is returned.
*
* @param {unknown} error - The error object to extract from.
* @returns {{ status: number; message: string }}
*/
function getErrorHttpStatusMessage(error) {
const isObject = typeof error === 'object' && error !== null
const status =
isObject && 'status' in error && typeof error.status === 'number'
? error.status
: 500

const message =
isObject &&
status < 500 &&
'message' in error &&
typeof error.message === 'string'
? error.message
: 'Internal Server Error'

return { status, message }
}
Comment on lines +252 to +278
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a good candidate to extract into a helper shared by both piece-retriever and ipfs-retriever.


/**
* Handles requests to the bare DNS_ROOT domain (e.g., ipfs.filbeam.io).
*
* - If no path is provided, redirects to https://filbeam.com
* - If path is /wallet/cid or /wallet/cid/pathname, generates a slug and
* redirects to the subdomain-based URL
*
* @param {Request} request - The incoming request
* @param {Env} env - Worker environment
* @returns {Promise<Response>} Redirect response
*/
async function handleDnsRootRequest(request, env) {
// Parse the URL path to extract wallet, cid, and optional pathname
const parsedUrl = URL.parse(request.url)
const pathname = parsedUrl?.pathname || '/'

// If no path, redirect to filbeam.com
if (pathname === '/' || pathname === '') {
return Response.redirect('https://filbeam.com/', 302)
}

// Parse path as /wallet/cid/pathname
const pathParts = pathname.slice(1).split('/') // Remove leading slash and split

if (pathParts.length < 2) {
httpAssert(
false,
404,
'Invalid path format. Expected: /wallet/cid or /wallet/cid/pathname',
)
}

const wallet = pathParts[0].toLowerCase()
const cid = pathParts[1]
const subpath = pathParts.slice(2).join('/')

// Validate wallet address
httpAssert(
isValidEthereumAddress(wallet),
404,
`Invalid wallet address: ${wallet}. Address must be a valid ethereum address.`,
)

// Get slug for the wallet and CID
const slug = await getSlugForWalletAndCid(env, wallet, cid)

// Build redirect URL
const redirectPath = subpath ? `/${subpath}` : ''
const redirectUrl = `https://${slug}${env.DNS_ROOT}${redirectPath}`

return Response.redirect(redirectUrl, 302)
}
Loading
Loading