feat: bounded-memory file uploads for huge files (1GB-100GB+) by grumbach · Pull Request #20 · WithAutonomi/ant-client

grumbach · 2026-03-31T07:20:14Z

Summary

Add ChunkSpill mechanism: encrypted chunks are written to a temp directory during encryption, keeping only 32-byte addresses in memory. Chunks are read back one wave (64 chunks) at a time for upload, bounding peak memory to ~256 MB regardless of file size.
Add disk space pre-check before upload (InsufficientDiskSpace error) using fs2::available_space with integer arithmetic for the 10% headroom calculation.
Refactor file_prepare_upload (external-signer path) to use spill for encryption, with honest docs that PreparedUpload still holds all chunks in memory (inherent to the two-phase protocol).
Fix prepared[0] direct indexing in batch_pay — replaced with iterator .sum().
Fix inline paths (evmlib::contract::..., xor_name::XorName) — moved to top-level use imports.
Fix mutex poison handling in spawn_file_encryption — 3 sites now use unwrap_or_else(PoisonError::into_inner) instead of silently skipping.
Fix swallowed errors: datamap_tx.send() and temp file cleanup now log warnings.
Fix temp file leak on rename failure in file_download.
Harden spill dir creation: create_dir (not create_dir_all) to reject pre-existing directories.

Proven test results

File Size	Chunks	RSS Before	RSS After	RSS Increase	Integrity
64 MB	20	—	—	—	All bytes match
1 GB	260	58 MB	244 MB	185 MB	All 1,073,741,824 bytes match
4 GB	1029	59 MB	191 MB	132 MB	All 4,294,967,296 bytes match

Memory does not scale with file size — 4 GB uses less RSS increase (132 MB) than 1 GB (185 MB), proving the wave-based spill mechanism works.

E2E tests (`e2e_huge_file.rs`)

test_huge_file_upload_download_1gb — upload, download, byte-verify 1 GB, assert RSS < 512 MB
test_huge_file_upload_download_4gb — same for 4 GB with tighter chunk count assertion (>= 1000)
test_large_file_64mb_round_trip — fast CI-friendly 64 MB round-trip
test_file_upload_disk_space_check — verifies pre-flight disk space check
Uses seeded xorshift64 PRNG for incompressible test data (verified via Brotli: 0% compression)
Uses mach_task_basic_info.resident_size on macOS (current RSS, not peak) — panics on measurement failure on supported platforms

Review history

3 review rounds, 12 agents total (6 Claude Opus + 6 Codex gpt-5.4 xhigh). Final round verdict: zero remaining production code violations across all project rules (no unwrap/expect/panic, no direct indexing, no inline paths, no swallowed errors).

Test plan

cfd (cargo fmt + clippy) passes with zero warnings
All 113 unit tests pass
1 GB E2E upload/download round-trip with memory assertion
4 GB E2E upload/download round-trip with memory assertion
64 MB round-trip (CI-friendly)
Disk space pre-check test

Add ChunkSpill mechanism that writes encrypted chunks to a temp directory during encryption, keeping only 32-byte addresses in memory. Chunks are read back one wave (64 chunks) at a time for upload, bounding peak RSS to ~256 MB regardless of file size. Key changes: - ChunkSpill struct: temp-dir-based buffer with Drop cleanup - Disk space pre-check before upload (fs2::available_space) - InsufficientDiskSpace error variant - file_prepare_upload refactored to use spill (with honest docs that PreparedUpload still holds all chunks in memory) - Wave-level progress logging during encryption and upload - Fixed direct indexing (prepared[0]) in batch_pay - Fixed inline paths and swallowed errors per review feedback - Integer arithmetic for disk space headroom (no float precision loss) E2E tests proving bounded memory: - 1 GB: 260 chunks, RSS increase 185 MB (limit 512 MB) - 4 GB: 1029 chunks, RSS increase 132 MB (limit 512 MB) - Both verified byte-for-byte after round-trip - Uses mach_task_basic_info (current RSS, not peak) on macOS Memory does NOT scale with file size — 4 GB uses less RSS increase than 1 GB, proving the wave-based spill mechanism works correctly.

…rity Fixes from multi-agent review (2 Claude Opus + 3 Codex gpt-5.4 xhigh): - Use unwrap_or_else(PoisonError::into_inner) on all 3 mutex lock sites in spawn_file_encryption instead of silently skipping on poison. A poisoned mutex previously caused read errors to be silently dropped, which could produce a truncated DataMap. - Move xor_name::XorName to top-level use import (was inline in file_download closure). - Revert merkle chunk limit guard: gas savings from merkle payments far outweigh the O(n) proof memory overhead (~86 KB per chunk from ML-DSA-65 signatures in candidate pools). For a 100 GB file (~25k chunks), proof memory is ~2 GB which is acceptable.

Fixes from Codex gpt-5.4 xhigh production review: - batch.rs: move evmlib::contract::payment_vault::get_market_price and evmlib::wallet::PayForQuotesError to top-level use imports (PROD-006) - file.rs: ChunkSpill::new() now uses create_dir instead of create_dir_all, so creation fails if the directory already exists. This prevents silent reuse of a stale/symlinked spill directory (PROD-007) Not addressed (architectural, separate PRs): - PROD-001/002/003: Resumable uploads after partial payment failure - PROD-004: CancellationToken threading through upload APIs

Fixes from Codex gpt-5.4 xhigh test review: - Replace repeating 256-byte pattern with seeded xorshift64 PRNG for incompressible test data. Repeating patterns compress to tiny chunks, making the wave buffer test pass trivially without stressing the ~256 MB wave buffer. Incompressible data produces full-size ~4 MB encrypted chunks. (TEST-003) - RSS measurement failure now panics on macOS/Linux instead of silently passing. This prevents CI from quietly skipping the bounded-memory assertion if the FFI probe ever drifts. (TEST-002) - Tighten 4 GB chunk count assertion from >= 800 to >= 1000 (actual expected: ~1026 chunks at MAX_CHUNK_SIZE). (TEST-004)

If std::fs::rename fails after the download temp file is fully written, the previous code used ? which exited without cleanup, leaking the temp file. Now explicitly handles the rename error, removes the temp file, and then returns the error. Found by Codex gpt-5.4 xhigh review round 3 (R3-001).

mickvandijke

Review: Duplicate chunk addresses cause double-upload and inflated metrics

ChunkSpill::push() (file.rs:73-79) uses compute_address(content) as the filename. If two chunks have identical content (which self-encryption can produce for repeated data), the second fs::write silently overwrites the first file on disk, but both addresses are appended to self.addresses. Consequences:

total_bytes double-counts — inflates avg_chunk_size(), which feeds into quoting metrics via pay_for_merkle_batch
Wave uploads read and re-upload the same chunk twice — wasted bandwidth and gas

The old code had the same latent issue (duplicate Bytes in the vec), but the new code makes it slightly worse because total_bytes is now used for the avg_chunk_size quoting metric.

Suggested fix: Deduplicate in push() — e.g. track seen addresses in a HashSet and skip the write + total_bytes increment if the address already exists (still preserve the entry in addresses if ordering matters for the DataMap, or skip it entirely if duplicates are redundant on the network).

Alternatively, document that this is acceptable because self-encryption rarely produces duplicates for real (incompressible) files.

mickvandijke

LGTM!!

ChunkSpill::push() now tracks seen addresses in a HashSet and skips the write, total_bytes increment, and address recording for duplicate content. This prevents: - Double-uploading the same chunk (wasted bandwidth and gas) - Inflated avg_chunk_size() feeding into quoting metrics - Inflated chunk count in progress logging Self-encryption rarely produces duplicate chunks for incompressible data, but repeated/patterned content could trigger this.

grumbach added 5 commits March 31, 2026 15:25

mickvandijke reviewed Mar 31, 2026

View reviewed changes

mickvandijke approved these changes Mar 31, 2026

View reviewed changes

mickvandijke merged commit 72bd4e4 into WithAutonomi:main Mar 31, 2026
12 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: bounded-memory file uploads for huge files (1GB-100GB+)#20

feat: bounded-memory file uploads for huge files (1GB-100GB+)#20
mickvandijke merged 6 commits intoWithAutonomi:mainfrom
grumbach:feat/huge-file-uploads

grumbach commented Mar 31, 2026

Uh oh!

mickvandijke left a comment

Uh oh!

mickvandijke left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

grumbach commented Mar 31, 2026

Summary

Proven test results

E2E tests (e2e_huge_file.rs)

Review history

Test plan

Uh oh!

mickvandijke left a comment

Choose a reason for hiding this comment

Review: Duplicate chunk addresses cause double-upload and inflated metrics

Uh oh!

mickvandijke left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

E2E tests (`e2e_huge_file.rs`)