Skip to content

Conversation

@SgtPooki
Copy link
Collaborator

@SgtPooki SgtPooki commented Feb 10, 2026

Summary

Adds CAR content validation to IPNI retrieval, replacing the size-only check with a full round-trip integrity verification.

Problem

When retrieving data from storage providers via IPNI, we only validated that the downloaded CAR file's byte count matched the expected size. A corrupted or tampered CAR file would pass validation as long as its length was correct. (Fixes #144)

Solution

We validate retrieved CARs by round-tripping the content:

  1. Unpack the CAR — parse it, load blocks into a memory blockstore, and use the UnixFS exporter to extract the original file content
  2. Rebuild a new CAR from the extracted content using the same chunker/layout
  3. Compare the rebuilt root CID against the expected root CID

If the CIDs match, the entire chain is proven valid: CAR structure, UnixFS DAG integrity, block linkage, and content correctness. The existing size check is kept as a fast pre-check to fail immediately on obvious mismatches before doing the heavier round-trip.

Changes

  • car-utils.ts — Added unpackCarToPath() and validateCarContent() functions, plus CarValidationResult type
  • car-utils.spec.ts — New test file with 6 tests: round-trip CID matching, byte-level content preservation, valid/invalid/corrupt/garbage CAR validation
  • ipni.strategy.tsvalidateData now runs content validation after size pre-check; method updated from car-size-check to car-content-validation; removed TODO referencing bug: when we retrieve or verify content downloaded from SP or IPFS, we do not validate the actual content #144
  • package.json — Added blockstore-core and ipfs-unixfs-exporter as direct deps (already present as transitive deps of filecoin-pin)

Notes for reviewers

  • The round-trip approach is heavier than per-block hash verification, but it validates strictly more — it catches DAG structural issues (broken links, missing intermediate nodes, incorrect UnixFS metadata) that block-level hash checks would miss.
  • validateCarContent writes to temp dirs and cleans up in a finally block. The extracted content lives only for the duration of the validation call.

Copilot AI review requested due to automatic review settings February 10, 2026 14:33
@FilOzzy FilOzzy added this to FOC Feb 10, 2026
@github-project-automation github-project-automation bot moved this to 📌 Triage in FOC Feb 10, 2026
@SgtPooki SgtPooki moved this from 📌 Triage to 🔎 Awaiting review in FOC Feb 10, 2026
@SgtPooki SgtPooki self-assigned this Feb 10, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR strengthens IPNI retrieval integrity checks by replacing a size-only CAR validation with a full content round-trip validation that rebuilds a CAR and compares root CIDs (addressing #144).

Changes:

  • Added CAR unpacking + round-trip content validation utilities (unpackCarToPath, validateCarContent) and result typing.
  • Updated IPNI retrieval validation to run a size pre-check, then perform full content validation when rootCID is available.
  • Added a new Vitest suite covering valid/invalid/corrupt/garbage CAR validation scenarios and introduced required direct dependencies.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
pnpm-lock.yaml Locks new direct dependencies needed for UnixFS export + blockstore usage.
apps/backend/src/retrieval-addons/strategies/ipni.strategy.ts Switches IPNI validation from size-only to size pre-check + content round-trip validation.
apps/backend/src/common/car-utils.ts Implements CAR unpacking via UnixFS exporter and round-trip root CID validation.
apps/backend/src/common/car-utils.spec.ts Adds tests for round-trip correctness and failure modes (corrupt/garbage/wrong CID).
apps/backend/package.json Adds blockstore-core and ipfs-unixfs-exporter as direct deps.
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

SgtPooki and others added 4 commits February 10, 2026 09:39
class CarReaderBlockstore {
constructor(private readonly reader: CarReader) {}

async *get(cid: CID, _options?: unknown): AsyncGenerator<Uint8Array> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it an AsyncGenerator? I think you're just supposed to conform to get(cid): Promise<Uint8Array> to be a blockstore for exporter

@rvagg
Copy link

rvagg commented Feb 12, 2026

full round-trip integrity verification

I think this might be overengineering the solution here, CIDs should provide this free of charge.

There's two ways to look at integrity of these deals:

  1. If you care about the piece-level, i.e. the intact CAR: fetch the piece from /piece/, hash it and check commp is correct (commpv2 which has size, or just commp + length to get the same guarantee [see https://github.com/filecoin-project/FIPs/blob/master/FRCs/frc-0069.md for more on the length problem with v1]).
  2. If you care about the data in the CAR: fetch traverse the full DAG from the root, fetching each one from /ipfs/, hash the bytes and confirm the CID Is correct (js-car doesn't do this by default for you btw, you need to go the extra mile, there's probably a helia util for it, or just use verified-fetch since that's the whole hog and I presume it hashes).

You shouldn't need to reconstruct anything, CIDs do the work for you, just hash and be assured.

Here's the thing about the SP modifying the CAR:

  • If you care about the CAR being unmolested, then you care about whole-piece retrieval cause you care about the CAR, so just fetch it as a piece.
  • If you just care about your IPFS data being unmolested then you don't care about the CAR, you care about the DAG and the CAR was just a way of getting it to the SP. The /ipfs/ endpoint can also do some DAG magic for you: regardless of how the CAR was put together, you an ask for the root plus dag-scope=all (edit: you get this by default, sorry) and you'll get back a new CAR that's correctly depth-first ordered (we don't guarantee depth-first when we generate in filecoin-pin, we just bucket-of-blocks as we make them cause who cares since /ipfs/ will do the nice ordering for us to make download validation easier -- this is what verified-fetch will verify). So you could even use that API to get a fresh CAR that's easy to traverse in clean block order.

Perhaps you care about both things, but just caring about the latter is legitimate for many users. Either way though, verifying CIDs is enough to validate. From a user perspective: trust the root CID (IPFS root CID or Piece CID, same concept) and you can trust the data underneath it because you have the tools to extend that downward.

@BigLep
Copy link
Contributor

BigLep commented Feb 12, 2026

Thanks for all the details @rvagg .

I agree we should verify the CIDs match - that is good enough.

Lets do /ipfs retrieval for all the blocks since this is all about making sure SPs support "standard IPFS tooling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: 🔎 Awaiting review

Development

Successfully merging this pull request may close these issues.

bug: when we retrieve or verify content downloaded from SP or IPFS, we do not validate the actual content

3 participants