
Phase 3 withholding: incomplete withheld set can leak blobs in cleartext (ref namespace scope + fail-open ref walk) #42

@beardthelion

Summary

Two issues in crates/gitlawb-node/src/git/visibility_pack.rs (blob_paths, which builds the withheld-blob set for Phase 3 subtree withholding) can produce an incomplete withheld set. Downstream code treats any blob missing from that set as public, so an incomplete set is a potential cleartext disclosure of withheld content.

1. Withholding only walks refs/heads/* and refs/tags/*

if !refname.starts_with("refs/heads/") && !refname.starts_with("refs/tags/") {
    continue;
}

blob_paths enumerates blob→path pairs from heads and tags only. But the consumers of the resulting withheld set operate over a broader object graph:

  • crates/gitlawb-node/src/git/smart_http.rs uses rev-list --objects --all
  • crates/gitlawb-node/src/ipfs_pin.rs filters all repo objects through it

A blob reachable only from another ref namespace (e.g. refs/notes/*, refs/merge-requests/*, hidden refs) never enters withheld, so it can be packed or pinned in cleartext even though its path is denied by visibility rules.
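
To make the gap concrete, here is a minimal sketch that mirrors the namespace filter quoted above and lists the refs it drops. The helper names `is_walked` and `unwalked` are hypothetical and do not exist in the crate; the example ref names are illustrative only.

```rust
/// Mirrors the namespace filter quoted above: only heads and tags are walked.
/// (Hypothetical helper, not part of gitlawb-node.)
fn is_walked(refname: &str) -> bool {
    refname.starts_with("refs/heads/") || refname.starts_with("refs/tags/")
}

/// Returns the refs that a `--all` traversal would include (it covers
/// everything under refs/) but that the filter above skips.
fn unwalked<'a>(refs: &[&'a str]) -> Vec<&'a str> {
    refs.iter().copied().filter(|r| !is_walked(r)).collect()
}
```

Any blob reachable only from a ref returned by `unwalked` is visible to the pack and pin paths but never reaches the withheld computation.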

2. Ref walk failures fail open

if !listing.status.success() {
    continue;
}

When git ls-tree exits non-zero, the ref is skipped, which turns a traversal error into a partial withheld set. Callers then treat the missing blobs as public. This should fail closed (return an error so the fetch/pack/pin aborts) rather than silently leak.
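
A fail-closed version of this step might look like the sketch below. It is an illustration under assumptions, not the crate's code: the function names `blob_paths_for_rev` and `parse_blob_record`, the use of `io::Error`, and the choice of `git ls-tree -r -z` are all hypothetical. The point is only that a non-zero exit, non-UTF-8 output, or a malformed record becomes an `Err` instead of a `continue`.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

fn invalid(msg: String) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

/// Parse one `git ls-tree -r -z` record: "<mode> <type> <oid>\t<path>".
/// Returns Ok(Some((oid, path))) for blobs, Ok(None) for other entry types,
/// and Err for anything that does not match the expected shape.
fn parse_blob_record(record: &str) -> io::Result<Option<(String, String)>> {
    let (meta, path) = record
        .split_once('\t')
        .ok_or_else(|| invalid(format!("malformed ls-tree record: {record:?}")))?;
    let mut fields = meta.split(' ');
    match (fields.next(), fields.next(), fields.next()) {
        (Some(_mode), Some("blob"), Some(oid)) => {
            Ok(Some((oid.to_string(), path.to_string())))
        }
        // Non-blob entries (e.g. submodule commits) are not blobs to withhold.
        (Some(_), Some(_), Some(_)) => Ok(None),
        _ => Err(invalid(format!("malformed ls-tree record: {record:?}"))),
    }
}

/// Hypothetical fail-closed walk of a single revision.
fn blob_paths_for_rev(repo: &Path, rev: &str) -> io::Result<Vec<(String, String)>> {
    let listing = Command::new("git")
        .arg("-C")
        .arg(repo)
        .args(["ls-tree", "-r", "-z", rev])
        .output()?; // failure to spawn git is also an error, not a skip

    if !listing.status.success() {
        // Fail closed: surface the error so the caller aborts.
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!(
                "git ls-tree failed for {rev}: {}",
                String::from_utf8_lossy(&listing.stderr)
            ),
        ));
    }

    // Assumption: paths are treated as UTF-8; anything else is rejected
    // rather than lossily converted.
    let stdout = String::from_utf8(listing.stdout).map_err(|e| invalid(e.to_string()))?;
    let mut pairs = Vec::new();
    for record in stdout.split('\0').filter(|r| !r.is_empty()) {
        if let Some(pair) = parse_blob_record(record)? {
            pairs.push(pair);
        }
    }
    Ok(pairs)
}
```

With `?` propagation in the caller, a single failed listing aborts the whole withheld-set computation, so a partial set can never be handed to the pack or pin paths.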

Suggested direction

  • Enumerate the object graph for withholding with the same ref/object scope the pack and pin paths use (align on one shared enumerator over --all / the full reachable set), so nothing the consumers see can escape the withheld computation.
  • On any git ls-tree / ref-walk error, return Err so the caller aborts instead of producing a partial set.
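
One way to enforce the first point is an explicit coverage check at the consumer: every blob the pack or pin path is about to emit must have been classified by the withholding walk, and an unclassified blob is an error. The sketch below is hypothetical (the `Visibility` enum and `public_blobs` function are not part of the crate) and assumes both sides can be reduced to blob OIDs.

```rust
use std::collections::HashMap;

/// Hypothetical result of the withholding walk for one blob.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Visibility {
    Public,
    Withheld,
}

/// Given the blob OIDs a consumer is about to emit and the classification
/// produced by the withholding walk, return only the public ones.
/// A blob with no classification is an error, never treated as public.
fn public_blobs<'a>(
    emitted: &[&'a str],
    classified: &HashMap<String, Visibility>,
) -> Result<Vec<&'a str>, String> {
    let mut out = Vec::new();
    for &oid in emitted {
        match classified.get(oid) {
            Some(Visibility::Public) => out.push(oid),
            Some(Visibility::Withheld) => {} // dropped from cleartext output
            None => {
                return Err(format!("blob {oid} was never classified; refusing to emit"));
            }
        }
    }
    Ok(out)
}
```

This turns "absent from the withheld set" from an implicit "public" into a hard failure, so a scope mismatch between the enumerators surfaces as an aborted fetch/pack/pin rather than a leak.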

Notes

This is Phase 3 (#28, issue #18) code. Surfaced by CodeRabbit during review of #40 (it appears in that PR only because the cross-fork diff is cumulative over the unmerged stack). Filing here so it's tracked against the withholding work rather than the recipient-set-blinding PR.
