fix(node): close under-withholding via full ref scope and reachable-only pin set#46
fix(node): close under-withholding via full ref scope and reachable-only pin set#46beardthelion wants to merge 30 commits into
Conversation
|
Warning Review limit reached
More reviews will be available in 45 minutes and 24 seconds. Learn how PR review limits work. To continue reviewing without waiting, enable usage-based billing in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (18)
📝 WalkthroughWalkthroughThis PR implements end-to-end encryption and withholding for Phase 3 confidentiality: withheld blobs are sealed to recipient keys via ephemeral X25519 envelopes, encrypted with XChaCha20Poly1305, pinned to IPFS with opaque recipient tags, anchored to Arweave, replicated across mirrors using promisor-based filtering, and recovered locally via a new ChangesCore encryption and visibility foundations
Server-side filtered pack serving and encrypted blob management
Mirror mode classification and encrypted blob replication
Client-side clone command with recovery paths
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (4)
crates/gl/src/clone.rs (1)
262-267: ⚡ Quick winParse
/encrypted-blobsinto a typed response instead ofValue.A missing or mistyped
blobsfield currently degrades to[]and silently disables node-side recovery. Making this schema-strict would catch server/client contract drift the same wayWithheldPathsResponsealready does.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gl/src/clone.rs` around lines 262 - 267, Replace the loose serde_json::Value parsing for resp in clone.rs with a strongly typed EncryptedBlobsResponse struct so schema mismatches surface as errors: define EncryptedBlobsResponse { blobs: Vec<EncryptedBlobOrString> } (matching the server shape, analogous to WithheldPathsResponse), then call resp.json::<EncryptedBlobsResponse>().await.context("parsing encrypted-blobs")? and use the returned .blobs directly instead of body.get(...).and_then(...).cloned().unwrap_or_default(); this ensures missing or mistyped "blobs" no longer silently becomes [] and will fail fast (update the EncryptedBlobOrString type to the actual blob element type used).crates/gitlawb-node/src/api/encrypted.rs (1)
109-123: ⚡ Quick winAdd a handler test for the 404/404 access-control contract.
The new test only locks the wire shape. A small handler-level test that exercises both “caller denied” and “OID missing” on
get_encrypted_blobwould protect the deliberate non-oracle behavior from future refactors.Based on learnings, this endpoint should keep “not found” and “not authorized” externally indistinguishable.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/api/encrypted.rs` around lines 109 - 123, Add a handler-level test for the access-control contract of get_encrypted_blob that verifies the external response is indistinguishable when the caller is denied versus when the OID is missing: write two tests (or one parameterized test) invoking the get_encrypted_blob handler — one scenario where the auth check fails (simulate "caller denied") and one where the blob store returns NotFound (simulate "OID missing") — and assert both responses have the same HTTP status and body (e.g., both return 404/not-found) to ensure the endpoint does not leak authorization state; reference the get_encrypted_blob handler in your test and reuse existing test scaffolding in the encrypted.rs tests module.Source: Learnings
crates/gitlawb-core/src/encrypt.rs (1)
66-88: ⚡ Quick winZeroize transient key material in
seal_blobandopen_blob.
content_key, the seed-derived X25519 secret, and the unwrapped content key all live in plain stack/heap buffers until drop. In this confidentiality path that is avoidable exposure; keep them inZeroizing(or explicitly callzeroize()) once the ciphers are initialized.Hardening sketch
+use zeroize::Zeroizing; + - let mut content_key = [0u8; 32]; - OsRng.fill_bytes(&mut content_key); + let mut content_key = Zeroizing::new([0u8; 32]); + OsRng.fill_bytes(&mut *content_key); let body_cipher = XChaCha20Poly1305::new_from_slice(&content_key) .map_err(|e| anyhow::anyhow!("content key: {e}"))?; ... - let my_x = XSecret::from(x25519_secret_from_seed(&keypair.seed_bytes())); + let seed = keypair.to_seed(); + let my_x = XSecret::from(x25519_secret_from_seed(&seed)); ... - let content_key = content_key.context("not a recipient of this envelope")?; + let content_key = Zeroizing::new(content_key.context("not a recipient of this envelope")?);Also applies to: 126-167
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-core/src/encrypt.rs` around lines 66 - 88, In seal_blob and open_blob, sensitive key material (the 32-byte content_key, the ephemeral XSecret `eph`, the seed-derived X25519 secret, and any unwrapped content key buffers) must be zeroed after use: wrap stack/heap buffers in zeroize::Zeroizing (e.g., Zeroizing<[u8;32]>, Zeroizing<Vec<u8>>) or explicitly call zeroize() on types that implement Zeroize immediately after the ciphers are initialized and after encryption/decryption completes; ensure you zero `content_key`, the ephemeral secret `eph` (or call its zeroize method), the seed-derived X25519 secret produced by x25519_private/related variable, and the unwrapped content key before returning or pushing into Recipient so no plaintext key remains in memory.crates/gitlawb-node/src/git/visibility_pack.rs (1)
101-130: 🏗️ Heavy liftAvoid re-traversing the repo to derive recipients.
withheld_blob_recipients()does one fullblob_paths()sweep throughwithheld_blob_oids()and then a second fullblob_paths()sweep for the recipient pass. On the push path,git_receive_pack()has already computed the withheld set before it gets here, so repos with visibility rules now pay up to three completels-treetraversals per push. Threading the precomputed withheld/blob-path data into this API would keep recipient derivation single-pass.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/git/visibility_pack.rs` around lines 101 - 130, withheld_blob_recipients currently re-traverses the repo by calling withheld_blob_oids(...) and then blob_paths(...), causing multiple ls-tree sweeps; change withheld_blob_recipients signature to accept precomputed withheld data (e.g., Option<&HashSet<String>> or Option<&HashMap<String,String>> representing withheld oids or oid->path map) so callers like git_receive_pack can pass the already-computed set/map, skip the internal withheld_blob_oids call when provided, and iterate the provided oid->path mapping (or filter blob_paths once) to build recipients; update callers (notably git_receive_pack) to supply the precomputed data and keep behavior identical when None is passed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/gitlawb-core/src/identity.rs`:
- Around line 55-59: The new seed_bytes() method leaks a non-zeroizing copy of
the Ed25519 seed; replace its usage with the existing to_seed() which returns a
Zeroizing<[u8;32]>, or change seed_bytes() to return a zeroizing wrapper (e.g.
Zeroizing<[u8;32]>) instead of plain [u8;32]; update callers such as the code
paths that call seed_bytes() (including open_blob and the encrypted-pin task in
crates/gitlawb-node/src/api/repos.rs) to accept and use the Zeroizing seed or to
call to_seed() directly so the raw seed is never held in plain memory.
In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 398-425: Short-circuit the expensive withheld walk by detecting
when the repo's rules cannot withhold any path-scoped blobs (e.g., rules is
empty or only applies to root) before calling
visibility_pack::withheld_blob_oids; if that check shows no possible withheld
blobs, skip the tokio::task::spawn_blocking(...) and call
smart_http::upload_pack(&disk_path, body).await directly, otherwise proceed to
spawn_blocking and then choose between smart_http::upload_pack and
smart_http::upload_pack_excluding based on the resulting withheld set; use the
existing variables and functions (rules, visibility_pack::withheld_blob_oids,
withheld, smart_http::upload_pack, smart_http::upload_pack_excluding) to
implement the early return.
In `@crates/gitlawb-node/src/encrypted_pin.rs`:
- Around line 77-80: Replace silent `continue` branches when
`crate::git::store::read_object(repo_path, oid)` and the IPFS pin operation fail
with warning logs that include contextual details (repo_path, oid, and the
error/result) so failures are observable; in the match arms around `let data =
match crate::git::store::read_object(repo_path, oid) { ... }` and the subsequent
IPFS pin match, call the module’s logger (e.g., process_logger.warn or
log::warn) with a concise message and the error/result, then continue as before
to preserve behavior.
- Around line 72-76: The current code builds `keys: Vec<VerifyingKey>` by
`filter_map(|d| did_to_key(d))`, which silently drops unresolved DIDs and
proceeds to encrypt to a partial recipient set; change this so any unresolved
DID causes the blob to be skipped: instead of `filter_map`, resolve all DIDs and
detect failures (e.g., collect Options or compare `keys.len()` to `dids.len()`
after mapping via `did_to_key`), and when a mismatch is found log a warning
referencing `oid` (similar to the existing log) and `continue` so you do not
seal to a partial set; ensure you keep the `VerifyingKey` handling consistent
with downstream code that expects `keys`.
In `@crates/gitlawb-node/src/ipfs_pin.rs`:
- Around line 75-85: The cat function creates a reqwest::Client with no timeout
(reqwest::Client::new()) which can hang get_encrypted_blob and the sync worker;
modify cat (or callers) to use a client with a bounded timeout (e.g., build a
Client via reqwest::Client::builder().timeout(Duration::from_secs(...)) and
reuse it) or apply a per-request timeout (e.g., wrap the send/bytes await with
tokio::time::timeout) so the POST to {}/api/v0/cat?arg={} fails fast on
slow/stalled Kubo; update the cat signature or module to accept/reuse a Client
or enforce a timeout around resp = client.post(&url).send().await and
resp.bytes().await to avoid indefinite hangs.
In `@crates/gitlawb-node/src/sync.rs`:
- Around line 181-188: The code currently treats a None result from
fetch_withheld as MirrorMode::Plain which incorrectly downgrades unknown/error
states; instead, check the Option returned by fetch_withheld and only set
MirrorMode::Plain when you get Some(empty); if fetch_withheld returns None
preserve the repository's existing promisor/partialclone state (do not call
classify_mirror on None). Concretely, change the branch around
fetch_withheld/classify_mirror to: if withheld.is_some() { let mode =
classify_mirror(withheld); } else { read the current repo promisor/partialclone
setting from the local repo at local_path (or otherwise keep the current
MirrorMode) and use that mode for fetch_repo/clone_repo }, so
fetch_repo/clone_repo are invoked with the preserved mode rather than
unconditionally using MirrorMode::Plain.
In `@crates/gl/src/clone.rs`:
- Around line 428-459: The current GraphQL fetch (the local `query` +
`gql_body`, use of `client.post(...).json(&gql_body).send().await`, and
subsequent `parse_tx_ids(&gql)`) only requests first:100 and stops, causing
missing manifests; change the logic to paginate: rewrite the GraphQL query to
take variables for "first" and "after" (cursor), then loop POSTing the query
(building `gql_body` with `{ "repo": slug, "first": batch_size, "after": cursor
}`), call `parse_tx_ids` (or adjust it to return edges + pageInfo) to collect tx
ids across pages, update `cursor` from pageInfo.nextCursor/hasNextPage until no
more pages, then proceed to fetch manifests and call
`merge_manifests(manifests)` as before; ensure the loop uses the same
`client.post(...).json(&gql_body).send().await` error handling as existing code
and retains the existing behavior when responses fail.
- Around line 415-416: The slug normalization and Arweave discovery/parsing need
tightening: ensure owner short-form uses the same split logic as the node by
keeping owner_short = owner.split(':').next_back().unwrap_or(owner) and forming
slug = format!("{owner_short}/{name}"); update the GraphQL discovery for
encrypted-manifest anchors in discover/recovery code to paginate instead of
hard-coding first:100 (iterate queries until no more edges or timestamps
covered) so recover_encrypted_blobs sees all anchors; change
recover_encrypted_blobs to deserialize the /encrypted-blobs response into a
strict typed struct (like the existing withheld-paths type) and fail on
missing/wrong fields instead of using Value with unwrap_or_default(), ensuring
schema-strict parsing and surfaced errors.
---
Nitpick comments:
In `@crates/gitlawb-core/src/encrypt.rs`:
- Around line 66-88: In seal_blob and open_blob, sensitive key material (the
32-byte content_key, the ephemeral XSecret `eph`, the seed-derived X25519
secret, and any unwrapped content key buffers) must be zeroed after use: wrap
stack/heap buffers in zeroize::Zeroizing (e.g., Zeroizing<[u8;32]>,
Zeroizing<Vec<u8>>) or explicitly call zeroize() on types that implement Zeroize
immediately after the ciphers are initialized and after encryption/decryption
completes; ensure you zero `content_key`, the ephemeral secret `eph` (or call
its zeroize method), the seed-derived X25519 secret produced by
x25519_private/related variable, and the unwrapped content key before returning
or pushing into Recipient so no plaintext key remains in memory.
In `@crates/gitlawb-node/src/api/encrypted.rs`:
- Around line 109-123: Add a handler-level test for the access-control contract
of get_encrypted_blob that verifies the external response is indistinguishable
when the caller is denied versus when the OID is missing: write two tests (or
one parameterized test) invoking the get_encrypted_blob handler — one scenario
where the auth check fails (simulate "caller denied") and one where the blob
store returns NotFound (simulate "OID missing") — and assert both responses have
the same HTTP status and body (e.g., both return 404/not-found) to ensure the
endpoint does not leak authorization state; reference the get_encrypted_blob
handler in your test and reuse existing test scaffolding in the encrypted.rs
tests module.
In `@crates/gitlawb-node/src/git/visibility_pack.rs`:
- Around line 101-130: withheld_blob_recipients currently re-traverses the repo
by calling withheld_blob_oids(...) and then blob_paths(...), causing multiple
ls-tree sweeps; change withheld_blob_recipients signature to accept precomputed
withheld data (e.g., Option<&HashSet<String>> or Option<&HashMap<String,String>>
representing withheld oids or oid->path map) so callers like git_receive_pack
can pass the already-computed set/map, skip the internal withheld_blob_oids call
when provided, and iterate the provided oid->path mapping (or filter blob_paths
once) to build recipients; update callers (notably git_receive_pack) to supply
the precomputed data and keep behavior identical when None is passed.
In `@crates/gl/src/clone.rs`:
- Around line 262-267: Replace the loose serde_json::Value parsing for resp in
clone.rs with a strongly typed EncryptedBlobsResponse struct so schema
mismatches surface as errors: define EncryptedBlobsResponse { blobs:
Vec<EncryptedBlobOrString> } (matching the server shape, analogous to
WithheldPathsResponse), then call
resp.json::<EncryptedBlobsResponse>().await.context("parsing encrypted-blobs")?
and use the returned .blobs directly instead of
body.get(...).and_then(...).cloned().unwrap_or_default(); this ensures missing
or mistyped "blobs" no longer silently becomes [] and will fail fast (update the
EncryptedBlobOrString type to the actual blob element type used).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: fab81d81-dc64-46fe-b486-709131c8a108
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (23)
crates/gitlawb-core/Cargo.tomlcrates/gitlawb-core/src/encrypt.rscrates/gitlawb-core/src/identity.rscrates/gitlawb-core/src/lib.rscrates/gitlawb-node/Cargo.tomlcrates/gitlawb-node/src/api/encrypted.rscrates/gitlawb-node/src/api/mod.rscrates/gitlawb-node/src/api/repos.rscrates/gitlawb-node/src/api/visibility.rscrates/gitlawb-node/src/arweave.rscrates/gitlawb-node/src/db/mod.rscrates/gitlawb-node/src/encrypted_pin.rscrates/gitlawb-node/src/git/mod.rscrates/gitlawb-node/src/git/smart_http.rscrates/gitlawb-node/src/git/visibility_pack.rscrates/gitlawb-node/src/ipfs_pin.rscrates/gitlawb-node/src/main.rscrates/gitlawb-node/src/pinata.rscrates/gitlawb-node/src/server.rscrates/gitlawb-node/src/sync.rscrates/gitlawb-node/src/visibility.rscrates/gl/src/clone.rscrates/gl/src/main.rs
| /// The raw 32-byte Ed25519 seed. Used to derive the X25519 secret for | ||
| /// envelope decryption (see `crate::encrypt`). | ||
| pub fn seed_bytes(&self) -> [u8; 32] { | ||
| self.signing_key.to_bytes() | ||
| } |
There was a problem hiding this comment.
Avoid adding a non-zeroizing raw-seed accessor.
to_seed() already exposes this material behind Zeroizing; seed_bytes() reintroduces the same long-lived signing seed as a plain [u8; 32]. That copy now flows into open_blob and the async encrypted-pin path, which widens private-key lifetime without adding any new capability. Prefer reusing to_seed() here, or make this helper return a zeroizing wrapper instead of raw bytes. This already shows up in crates/gitlawb-node/src/api/repos.rs:659-707, where the seed is moved into the spawned encrypted-pin task.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gitlawb-core/src/identity.rs` around lines 55 - 59, The new
seed_bytes() method leaks a non-zeroizing copy of the Ed25519 seed; replace its
usage with the existing to_seed() which returns a Zeroizing<[u8;32]>, or change
seed_bytes() to return a zeroizing wrapper (e.g. Zeroizing<[u8;32]>) instead of
plain [u8;32]; update callers such as the code paths that call seed_bytes()
(including open_blob and the encrypted-pin task in
crates/gitlawb-node/src/api/repos.rs) to accept and use the Zeroizing seed or to
call to_seed() directly so the raw seed is never held in plain memory.
| // withheld_blob_oids walks every ref with blocking `git ls-tree`; keep that | ||
| // off the async worker thread. | ||
| let withheld = { | ||
| let path = disk_path.clone(); | ||
| let rules = rules.clone(); | ||
| let owner_did = record.owner_did.clone(); | ||
| let caller_owned = caller.map(str::to_string); | ||
| let is_public = record.is_public; | ||
| tokio::task::spawn_blocking(move || { | ||
| visibility_pack::withheld_blob_oids( | ||
| &path, | ||
| &rules, | ||
| is_public, | ||
| &owner_did, | ||
| caller_owned.as_deref(), | ||
| ) | ||
| }) | ||
| .await | ||
| .map_err(|e| { | ||
| let msg = e.to_string(); | ||
| if msg.contains("bad line length") || msg.contains("protocol error") { | ||
| tracing::warn!(repo = %name, err = %msg, "git-upload-pack: bad client request"); | ||
| AppError::BadRequest(msg) | ||
| } else { | ||
| tracing::error!(repo = %name, err = %msg, "git-upload-pack failed"); | ||
| AppError::Git(msg) | ||
| } | ||
| })?; | ||
| .map_err(|e| AppError::Git(e.to_string()))? | ||
| .map_err(|e| AppError::Git(e.to_string()))? | ||
| }; | ||
|
|
||
| let resp = if withheld.is_empty() { | ||
| smart_http::upload_pack(&disk_path, body).await | ||
| } else { | ||
| tracing::info!(repo = %name, caller = ?caller, withheld = withheld.len(), "serving filtered pack"); | ||
| smart_http::upload_pack_excluding(&disk_path, body, &withheld).await | ||
| } |
There was a problem hiding this comment.
Skip the withheld walk when no path-scoped rule can withhold anything.
After the "/" gate above, repos whose rule set is empty or root-only cannot produce a non-empty withheld blob set, but this branch still runs the full withheld_blob_oids() ref walk before falling back to the ordinary upload_pack(). That adds an extra full repo scan to every clone/fetch on the common public path.
💡 Minimal short-circuit
- let withheld = {
- let path = disk_path.clone();
- let rules = rules.clone();
- let owner_did = record.owner_did.clone();
- let caller_owned = caller.map(str::to_string);
- let is_public = record.is_public;
- tokio::task::spawn_blocking(move || {
- visibility_pack::withheld_blob_oids(
- &path,
- &rules,
- is_public,
- &owner_did,
- caller_owned.as_deref(),
- )
- })
- .await
- .map_err(|e| AppError::Git(e.to_string()))?
- .map_err(|e| AppError::Git(e.to_string()))?
- };
-
- let resp = if withheld.is_empty() {
+ let has_path_scoped_rules = rules.iter().any(|r| r.path_glob != "/");
+ let resp = if !has_path_scoped_rules {
smart_http::upload_pack(&disk_path, body).await
} else {
- tracing::info!(repo = %name, caller = ?caller, withheld = withheld.len(), "serving filtered pack");
- smart_http::upload_pack_excluding(&disk_path, body, &withheld).await
+ let withheld = {
+ let path = disk_path.clone();
+ let rules = rules.clone();
+ let owner_did = record.owner_did.clone();
+ let caller_owned = caller.map(str::to_string);
+ let is_public = record.is_public;
+ tokio::task::spawn_blocking(move || {
+ visibility_pack::withheld_blob_oids(
+ &path,
+ &rules,
+ is_public,
+ &owner_did,
+ caller_owned.as_deref(),
+ )
+ })
+ .await
+ .map_err(|e| AppError::Git(e.to_string()))?
+ .map_err(|e| AppError::Git(e.to_string()))?
+ };
+ if withheld.is_empty() {
+ smart_http::upload_pack(&disk_path, body).await
+ } else {
+ tracing::info!(repo = %name, caller = ?caller, withheld = withheld.len(), "serving filtered pack");
+ smart_http::upload_pack_excluding(&disk_path, body, &withheld).await
+ }
}
.map_err(|e| {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gitlawb-node/src/api/repos.rs` around lines 398 - 425, Short-circuit
the expensive withheld walk by detecting when the repo's rules cannot withhold
any path-scoped blobs (e.g., rules is empty or only applies to root) before
calling visibility_pack::withheld_blob_oids; if that check shows no possible
withheld blobs, skip the tokio::task::spawn_blocking(...) and call
smart_http::upload_pack(&disk_path, body).await directly, otherwise proceed to
spawn_blocking and then choose between smart_http::upload_pack and
smart_http::upload_pack_excluding based on the resulting withheld set; use the
existing variables and functions (rules, visibility_pack::withheld_blob_oids,
withheld, smart_http::upload_pack, smart_http::upload_pack_excluding) to
implement the early return.
| let keys: Vec<VerifyingKey> = dids.iter().filter_map(|d| did_to_key(d)).collect(); | ||
| if keys.is_empty() { | ||
| tracing::warn!(oid = %oid, "no resolvable recipient keys; skipping encrypted pin"); | ||
| continue; | ||
| } |
There was a problem hiding this comment.
Do not encrypt to a partial recipient set when some DIDs fail key resolution.
At Line 72, filter_map drops unresolved DIDs and still seals for the remaining keys. That can make authorized readers unrecoverable even though the blob is recorded as successfully encrypted/pinned. Treat any unresolved DID as a per-blob failure and skip instead of partially sealing.
💡 Suggested fix
- let keys: Vec<VerifyingKey> = dids.iter().filter_map(|d| did_to_key(d)).collect();
- if keys.is_empty() {
- tracing::warn!(oid = %oid, "no resolvable recipient keys; skipping encrypted pin");
+ let mut keys = Vec::with_capacity(dids.len());
+ let mut unresolved = Vec::new();
+ for did in dids {
+ match did_to_key(did) {
+ Some(k) => keys.push(k),
+ None => unresolved.push(did.clone()),
+ }
+ }
+ if !unresolved.is_empty() {
+ tracing::warn!(oid = %oid, unresolved = ?unresolved, "some recipient DIDs could not be resolved; skipping encrypted pin");
+ continue;
+ }
+ if keys.is_empty() {
+ tracing::warn!(oid = %oid, "empty recipient set; skipping encrypted pin");
continue;
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gitlawb-node/src/encrypted_pin.rs` around lines 72 - 76, The current
code builds `keys: Vec<VerifyingKey>` by `filter_map(|d| did_to_key(d))`, which
silently drops unresolved DIDs and proceeds to encrypt to a partial recipient
set; change this so any unresolved DID causes the blob to be skipped: instead of
`filter_map`, resolve all DIDs and detect failures (e.g., collect Options or
compare `keys.len()` to `dids.len()` after mapping via `did_to_key`), and when a
mismatch is found log a warning referencing `oid` (similar to the existing log)
and `continue` so you do not seal to a partial set; ensure you keep the
`VerifyingKey` handling consistent with downstream code that expects `keys`.
| let data = match crate::git::store::read_object(repo_path, oid) { | ||
| Ok(Some((_t, bytes))) => bytes, | ||
| _ => continue, | ||
| }; |
There was a problem hiding this comment.
Add warnings for object-read and IPFS pin failures instead of silent skips.
At Line 77 and Line 88, failures currently continue without logs. This makes production retry/debugging hard and weakens observability during partial outages.
💡 Suggested fix
- let data = match crate::git::store::read_object(repo_path, oid) {
- Ok(Some((_t, bytes))) => bytes,
- _ => continue,
- };
+ let data = match crate::git::store::read_object(repo_path, oid) {
+ Ok(Some((_t, bytes))) => bytes,
+ Ok(None) => {
+ tracing::warn!(oid = %oid, "blob object not found locally; skipping encrypted pin");
+ continue;
+ }
+ Err(e) => {
+ tracing::warn!(oid = %oid, err = %e, "failed reading blob object; skipping encrypted pin");
+ continue;
+ }
+ };
@@
- let cid = match crate::ipfs_pin::pin_git_object(ipfs_api, oid, &envelope).await {
- Ok(c) if !c.is_empty() => c,
- _ => continue,
- };
+ let cid = match crate::ipfs_pin::pin_git_object(ipfs_api, oid, &envelope).await {
+ Ok(c) if !c.is_empty() => c,
+ Ok(_) => {
+ tracing::warn!(oid = %oid, "pin_git_object returned empty CID; skipping record");
+ continue;
+ }
+ Err(e) => {
+ tracing::warn!(oid = %oid, err = %e, "pin_git_object failed; skipping encrypted pin");
+ continue;
+ }
+ };Also applies to: 88-90
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gitlawb-node/src/encrypted_pin.rs` around lines 77 - 80, Replace
silent `continue` branches when `crate::git::store::read_object(repo_path, oid)`
and the IPFS pin operation fail with warning logs that include contextual
details (repo_path, oid, and the error/result) so failures are observable; in
the match arms around `let data = match
crate::git::store::read_object(repo_path, oid) { ... }` and the subsequent IPFS
pin match, call the module’s logger (e.g., process_logger.warn or log::warn)
with a concise message and the error/result, then continue as before to preserve
behavior.
| /// Fetch raw bytes for a CID from the local Kubo node (`/api/v0/cat`). | ||
| pub async fn cat(ipfs_api: &str, cid: &str) -> Result<Vec<u8>> { | ||
| if ipfs_api.is_empty() { | ||
| return Err(anyhow::anyhow!("IPFS not configured")); | ||
| } | ||
| let url = format!("{}/api/v0/cat?arg={}", ipfs_api.trim_end_matches('/'), cid); | ||
| let resp = reqwest::Client::new().post(&url).send().await?; | ||
| if !resp.status().is_success() { | ||
| return Err(anyhow::anyhow!("ipfs cat {cid}: {}", resp.status())); | ||
| } | ||
| Ok(resp.bytes().await?.to_vec()) |
There was a problem hiding this comment.
Bound the Kubo /cat request.
This new helper creates a fresh reqwest::Client with no timeout, so a slow or stalled Kubo fetch can hang both get_encrypted_blob and the sync worker indefinitely. Please reuse a client with a bounded timeout, or set a per-request timeout here.
⏱️ Suggested fix
pub async fn cat(ipfs_api: &str, cid: &str) -> Result<Vec<u8>> {
if ipfs_api.is_empty() {
return Err(anyhow::anyhow!("IPFS not configured"));
}
let url = format!("{}/api/v0/cat?arg={}", ipfs_api.trim_end_matches('/'), cid);
- let resp = reqwest::Client::new().post(&url).send().await?;
+ let resp = reqwest::Client::new()
+ .post(&url)
+ .timeout(std::time::Duration::from_secs(30))
+ .send()
+ .await?;
if !resp.status().is_success() {
return Err(anyhow::anyhow!("ipfs cat {cid}: {}", resp.status()));
}
Ok(resp.bytes().await?.to_vec())
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gitlawb-node/src/ipfs_pin.rs` around lines 75 - 85, The cat function
creates a reqwest::Client with no timeout (reqwest::Client::new()) which can
hang get_encrypted_blob and the sync worker; modify cat (or callers) to use a
client with a bounded timeout (e.g., build a Client via
reqwest::Client::builder().timeout(Duration::from_secs(...)) and reuse it) or
apply a per-request timeout (e.g., wrap the send/bytes await with
tokio::time::timeout) so the POST to {}/api/v0/cat?arg={} fails fast on
slow/stalled Kubo; update the cat signature or module to accept/reuse a Client
or enforce a timeout around resp = client.post(&url).send().await and
resp.bytes().await to avoid indefinite hangs.
| let withheld = fetch_withheld(client, &origin_url, owner_short, repo_name).await; | ||
| let mode = classify_mirror(withheld); | ||
|
|
||
| let result = if local_path.exists() { | ||
| fetch_repo(&local_path, &remote_url).await | ||
| fetch_repo(&local_path, &remote_url, mode).await | ||
| } else { | ||
| clone_repo(&remote_url, &local_path).await | ||
| clone_repo(&remote_url, &local_path, mode).await | ||
| }; |
There was a problem hiding this comment.
Don’t downgrade an unknown mirror state to Plain.
fetch_withheld collapses 404, 5xx, network, and parse failures into None, and this branch always turns that into MirrorMode::Plain. For an existing promisor mirror, that flows into fetch_repo(..., MirrorMode::Plain), which clears remote.origin.promisor and remote.origin.partialclonefilter before refetching. A transient withheld-paths outage on a still-withheld repo will therefore start failing syncs until the endpoint recovers. Only downgrade on Some(empty), and preserve the current promisor mode when the lookup is unknown.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gitlawb-node/src/sync.rs` around lines 181 - 188, The code currently
treats a None result from fetch_withheld as MirrorMode::Plain which incorrectly
downgrades unknown/error states; instead, check the Option returned by
fetch_withheld and only set MirrorMode::Plain when you get Some(empty); if
fetch_withheld returns None preserve the repository's existing
promisor/partialclone state (do not call classify_mirror on None). Concretely,
change the branch around fetch_withheld/classify_mirror to: if
withheld.is_some() { let mode = classify_mirror(withheld); } else { read the
current repo promisor/partialclone setting from the local repo at local_path (or
otherwise keep the current MirrorMode) and use that mode for
fetch_repo/clone_repo }, so fetch_repo/clone_repo are invoked with the preserved
mode rather than unconditionally using MirrorMode::Plain.
| let owner_short = owner.split(':').next_back().unwrap_or(owner); | ||
| let slug = format!("{owner_short}/{name}"); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Client-side Arweave lookup normalization:"
rg -n -C2 'owner_short|slug = format!|graphql' crates/gl/src/clone.rs
echo
echo "Server-side encrypted-manifest tag construction:"
rg -n -C4 'encrypted-manifest|Schema|Repo|App-Name' crates/gitlawb-node/src/arweave.rs crates/gitlawb-node/src/api/repos.rsRepository: Gitlawb/node
Length of output: 20700
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Find EncryptedManifest and where `manifest.repo` is set:"
rg -n "struct EncryptedManifest|manifest\.repo|EncryptedManifest<'_>" crates/gitlawb-node/src/arweave.rs
echo
echo "Search for `repo:` field construction / slug derivation in node code:"
rg -n "Repo\s*[:=]|repo\s*:\s*|format!\(.*owner.*name|owner.*split\(':'\)|next_back\(\)" crates/gitlawb-node/src/arweave.rs crates/gitlawb-node/src -SRepository: Gitlawb/node
Length of output: 10904
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== crates/gitlawb-node/src/arweave.rs (top + EncryptedManifest + tag header) ==="
nl -ba crates/gitlawb-node/src/arweave.rs | sed -n '1,260p'
echo
echo "=== crates/gl/src/clone.rs (Arweave fallback slug + GraphQL variables) ==="
nl -ba crates/gl/src/clone.rs | sed -n '360,520p'Repository: Gitlawb/node
Length of output: 199
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "arweave.rs lines:"
wc -l crates/gitlawb-node/src/arweave.rs
echo
echo "crates/gitlawb-node/src/arweave.rs: EncryptedManifest + manifest_json payload + build_manifest_tags_header + any construction sites:"
cat -n crates/gitlawb-node/src/arweave.rs | sed -n '1,260p'
echo
echo "clone.rs: Arweave fallback slug + GraphQL body:"
wc -l crates/gl/src/clone.rs
echo
cat -n crates/gl/src/clone.rs | sed -n '380,480p'Repository: Gitlawb/node
Length of output: 15653
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Call sites for anchor_encrypted_manifest / EncryptedManifest construction ==="
rg -n "anchor_encrypted_manifest|EncryptedManifest\s*\{" crates/gitlawb-node/src
echo
echo "=== Inspect around likely repo_slug construction (owner_short + repo_name) in api/repos.rs ==="
rg -n "owner_short\s*=.*split\(':'\)\.next_back\(\)\.unwrap_or|repo_slug\s*=\s*format!" crates/gitlawb-node/src/api/repos.rs -n
sed -n '650,820p' crates/gitlawb-node/src/api/repos.rs | cat -nRepository: Gitlawb/node
Length of output: 9929
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== clone.rs: encrypted-blobs parsing ==="
rg -n "/encrypted-blobs|encrypted-blobs|recover_encrypted_blobs|serde_json::Value" crates/gl/src/clone.rs
echo
echo "=== clone.rs: recover_encrypted_blobs implementation ==="
# Print the function body area (best-effort) around its definition
rg -n "async fn recover_encrypted_blobs" -n crates/gl/src/clone.rs | head -n 5
# Use line ranges around likely area
sed -n '1,220p' crates/gl/src/clone.rs | head -n 220
echo
echo "=== clone.rs: fetch_withheld + schema strictness for withheld-paths ==="
rg -n "withheld-paths|fetch_withheld|recover_from_arweave" crates/gl/src/clone.rsRepository: Gitlawb/node
Length of output: 9912
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# encrypted-blobs + parsing behavior
echo "=== clone.rs: recover_encrypted_blobs + parse_tx_ids ==="
sed -n '220,420p' crates/gl/src/clone.rs
echo
echo "=== clone.rs: WithheldPathsResponse parsing (schema strictness / fail-open) ==="
sed -n '180,260p' crates/gl/src/clone.rs
echo
echo "=== clone.rs: tests relevant to strictness / arweave recovery ==="
rg -n "withheld-paths|encrypted-blobs|recover_from_arweave|unauthorized|happy|schema" crates/gl/src/clone.rsRepository: Gitlawb/node
Length of output: 11990
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '740,820p' crates/gl/src/clone.rsRepository: Gitlawb/node
Length of output: 3031
Fix Arweave manifest discovery pagination; encrypted-blobs parsing should be schema-strict
- Arweave
Reposlug normalization matches: the client uses{last-':' owner segment}/{name}, and the node anchorsEncryptedManifest.repowith the sameowner_did.split(':').next_back()short-form, taggingRepo:{manifest.repo}. - Arweave GraphQL discovery hard-codes
first:100; if a repo has >100 encrypted-manifest anchors under the sameRepotag, recovery can miss the newest per-blob manifests and leave still-live blobs unrecovered—needs pagination/iteration until coverage is complete. /encrypted-blobsparsing is fail-open on schema drift:recover_encrypted_blobsusesValueand defaultsblobsto[]viaunwrap_or_default(), unlike/withheld-pathswhich deserializes a typed struct and errors on missing/wrong fields; switch/encrypted-blobsto a strict typed response and fail on missing/wrong types.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gl/src/clone.rs` around lines 415 - 416, The slug normalization and
Arweave discovery/parsing need tightening: ensure owner short-form uses the same
split logic as the node by keeping owner_short =
owner.split(':').next_back().unwrap_or(owner) and forming slug =
format!("{owner_short}/{name}"); update the GraphQL discovery for
encrypted-manifest anchors in discover/recovery code to paginate instead of
hard-coding first:100 (iterate queries until no more edges or timestamps
covered) so recover_encrypted_blobs sees all anchors; change
recover_encrypted_blobs to deserialize the /encrypted-blobs response into a
strict typed struct (like the existing withheld-paths type) and fail on
missing/wrong fields instead of using Value with unwrap_or_default(), ensuring
schema-strict parsing and surfaced errors.
| let query = r#"query($repo:String!){transactions(tags:[{name:"App-Name",values:["gitlawb"]},{name:"Schema",values:["gitlawb/encrypted-manifest/v1"]},{name:"Repo",values:[$repo]}],first:100){edges{node{id}}}}"#; | ||
| let gql_body = serde_json::json!({ "query": query, "variables": { "repo": slug } }); | ||
| let resp = match client | ||
| .post(format!("{ag}/graphql")) | ||
| .json(&gql_body) | ||
| .send() | ||
| .await | ||
| { | ||
| Ok(r) if r.status().is_success() => r, | ||
| _ => return Ok(vec![]), | ||
| }; | ||
| let gql: serde_json::Value = match resp.json().await { | ||
| Ok(v) => v, | ||
| Err(_) => return Ok(vec![]), | ||
| }; | ||
| let tx_ids = parse_tx_ids(&gql); | ||
| if tx_ids.is_empty() { | ||
| return Ok(vec![]); | ||
| } | ||
|
|
||
| // 2. Fetch and parse each manifest body, then merge latest-wins per oid. | ||
| let mut manifests = Vec::new(); | ||
| for tx in tx_ids { | ||
| let m = match client.get(format!("{ag}/{tx}")).send().await { | ||
| Ok(r) if r.status().is_success() => r, | ||
| _ => continue, | ||
| }; | ||
| if let Ok(parsed) = m.json::<Manifest>().await { | ||
| manifests.push(parsed); | ||
| } | ||
| } | ||
| let oid_cid = merge_manifests(manifests); |
There was a problem hiding this comment.
Paginate manifest discovery instead of truncating at 100 transactions.
This fallback only inspects the first 100 manifest txs, then merges per-push manifests to find the latest CID per OID. Once a repo has more than 100 pushes, any private blob whose last seal is older than that window disappears from recovery even if it is still live at HEAD.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gl/src/clone.rs` around lines 428 - 459, The current GraphQL fetch
(the local `query` + `gql_body`, use of
`client.post(...).json(&gql_body).send().await`, and subsequent
`parse_tx_ids(&gql)`) only requests first:100 and stops, causing missing
manifests; change the logic to paginate: rewrite the GraphQL query to take
variables for "first" and "after" (cursor), then loop POSTing the query
(building `gql_body` with `{ "repo": slug, "first": batch_size, "after": cursor
}`), call `parse_tx_ids` (or adjust it to return edges + pageInfo) to collect tx
ids across pages, update `cursor` from pageInfo.nextCursor/hasNextPage until no
more pages, then proceed to fetch manifests and call
`merge_manifests(manifests)` as before; ensure the loop uses the same
`client.post(...).json(&gql_body).send().await` error handling as existing code
and retains the existing behavior when responses fail.
|
hello @beardthelion do you want to be a maintainer of this node repository? I would like to add you as one. let me know if you are interested. |
|
Thanks @kevincodex1, I'd be glad to. I've enjoyed working on this and I'm happy to take on more responsibility for it. Couple of things so I know what I'm signing up for: what's the expectation for the role, just merge access, or also triage/releases and general upkeep? And who else maintains it currently? Either way, count me in. |
|
please rebase and fix conflicts |
cc30427 to
90dc8d4
Compare
At the push chokepoint, after pinning withheld objects, resolve each withheld blob's recipient DIDs and seal it to their Ed25519 keys with seal_blob, pinning the ciphertext to IPFS and recording it in encrypted_blobs. Best-effort per blob: failures are logged and skipped, never pinned in plaintext. Pinata replication is unchanged; B1 encrypts to IPFS only. Adds ed25519-dalek as a direct dependency of gitlawb-node (it was only declared in the workspace Cargo.toml).
Skip re-pinning only when an existing envelope already covers exactly the current recipients. A reader added to a rule after the first pin now gets a re-seal on the next push instead of being permanently locked out. Reader removal stays non-retroactive (the old envelope is already public).
…or present-check Add two hermetic integration tests for recover_from_arweave that drive the full read path over mocked Arweave GraphQL + IPFS gateways: discover the manifest, fetch it, fetch the envelope, decrypt, and install the withheld blob. One covers an authorized recipient (blob installed), the other a non-recipient (nothing recovered). Both simulate origin death by removing the promisor remote and enable uploadpack.allowFilter so the blob is truly withheld over file://. Also harden the local presence check in recover_from_arweave with GIT_NO_LAZY_FETCH=1 and .output() so the expected 'missing object' case does not trigger a wasted promisor fetch or leak git stderr to the user.
db: parse_recipients now propagates a descriptive error instead of defaulting corrupt recipients JSON to an empty list, which would have denied authorized readers and handed peers incomplete metadata. gl: clone recovery now warns when the sparse-checkout file cannot be read or written, or when the post-recovery checkout fails, instead of silently discarding those errors and claiming files were recovered.
Envelope v2 drops the cleartext recipient public key (kid) from each wrapped-key header entry, so a party holding the public IPFS/Arweave copy can no longer enumerate who is authorized to decrypt a withheld blob. The version is bumped to 2 and v1 envelopes are rejected. open_blob now selects the reader's entry by trial decryption (the AEAD tag authenticates exactly one entry) instead of matching on the public key. Recipient count is still visible; identity is not. Scope is envelope-only, the node DB recipients column and peer replication metadata are unchanged. Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
open_blob fed attacker-controlled envelopes to GenericArray::from_slice and XNonce::from_slice, which panic on a wrong-length input. Validate both the per-recipient box nonce and the body nonce to 24 bytes before use: skip a recipient entry whose nonce is malformed, and return an error for a malformed body nonce, so the public recovery path surfaces an error rather than panicking.
The encrypted-blobs/replicate endpoint shipped every blob's full recipient
DID list to mirrors, which persisted it, so any mirroring peer learned the
reader set. The v2 envelope blinding already removed recipient public keys
from the pinned envelopes, so the comment justifying this (DIDs are already
public) was no longer true.
/replicate now returns {oid, cid} only. Mirrors detect a re-seal by the CID
changing (the OID is stable across re-seals) instead of comparing recipient
sets, and store no recipient identities. Origin-side authz and the at-rest
recipients column are unchanged; this blinds only the peer-facing surface.
The B3 encrypted-blob manifest anchored {oid, cid, recipients} to Arweave, a
permanent public record of every blob's reader set. The v2 envelope blinding
already removed recipient keys from the pinned envelopes, so the comment
justifying this (recipient DIDs are already public) was no longer true. The
manifest now anchors {oid, cid} only. The gl reader already ignores the
recipients field (it recovers by trial decryption), so no reader changes.
…adability
The origin no longer stores recipient DIDs. Migration v5 replaces the
encrypted_blobs.recipients column with an opaque, node-keyed recipients_tag
used only to detect a recipient-set change for re-seal. B1 discovery and fetch
are now gated by the same repo-readability check the git read path uses, not by
per-recipient matching; decryption is gated by the envelope crypto, so a
non-recipient who can read the repo sees a blob's {oid, cid} but cannot open it.
encrypt_and_pin keys the tag from the node seed and returns {oid, cid}; the
Arweave manifest tuple drops the now-unused recipient vec.
A DB compromise no longer reveals the reader set; recovering it would require
brute-forcing candidate DID sets against the keyed tag with the node key.
…rror
Address review on the at-rest blinding change:
- The encrypted-blobs/replicate listing returned {oid, cid} with no visibility
check, so a non-readable repo's blob index was reachable by an unauthenticated
caller who guessed {owner}/{repo}. Gate it by the same repo-readability check
discovery and fetch use. For the intended case (a public repo with withheld
subtrees) the public root keeps this open to peers; only fully non-readable
repos are withheld, which is the desired behavior.
- encrypt_and_pin treated a recipients_tag DB read error as a cache miss and
resealed, causing avoidable IPFS writes during a partial outage; skip and retry
on the next push instead.
- Correct the get_encrypted_blob doc comment to describe repo-readability access.
…nly pin set blob_paths walked only refs/heads/* and refs/tags/* and skipped silently on a failed git ls-tree, so a blob reachable only through another namespace, or a ref that failed to traverse, could fall out of the withheld set and ship in cleartext. Walk every ref and fail closed on traversal error. The pin enumerators (ipfs_pin, pinata) used git cat-file --batch-all-objects, which includes unreachable/dangling objects that have no path and cannot be classified for withholding. Switch them to git rev-list --objects --all so the pin set matches the reachable graph blob_paths evaluates.
90dc8d4 to
67afb20
Compare
Closes #42.
blob_pathsinvisibility_pack.rsbuilds the blob→path map that every withholding decision rests on. Two flaws let it produce an incomplete withheld set, which downstream code then treats as public, leaking withheld content in cleartext:Ref-namespace gap. It walked only
refs/heads/*andrefs/tags/*, but the serve path enumeratesgit rev-list --objects --all(every object reachable from any ref). A blob reachable only through another namespace (e.g.refs/notes/*) never entered the withheld set and shipped in cleartext. Fix: drop the heads/tags filter and walk every ref, still per-ref viagit ls-tree -rso all paths a blob lives at are seen (a blob is withheld only if denied at every path; if it is public at some allowed path it must still be sent). Note this is deliberately notrev-list --objects, which reports one representative path per object and would regress that correctness.Fail-open ref walk. A non-zero
git ls-treeexit was silently skipped, yielding a partial set. Fix: fail closed (bail!). All three callers already abort on error (serve propagates, push/encrypt-pin skip the block), so no path serves unfiltered.It also narrows the two pin enumerators (
ipfs_pin,pinata) fromgit cat-file --batch-all-objectstogit rev-list --objects --all. The old call included unreachable/dangling objects, which have no ref and no tree path, so visibility rules cannot classify them; a dangling copy of a secret blob would bypass withholding straight into a permanent IPFS pin. Reachable-only keeps the pin set aligned with whatblob_pathscan evaluate.Four tests added: a blob reachable only via a non-head/tag ref is still withheld; an untraversable ref makes the withheld computation return
Err; and a dangling blob is absent from each pin enumerator while a committed blob is present.Stacks on #40 (base=main), carrying the cumulative Phase-2/3 prereq diff since these files exist only in the unmerged stack.
Summary by CodeRabbit
gl clonecommand supporting partial clones with content filtering and automatic blob recovery