
lore-storage: Batch FastCDC chunking to cut per-chunk overhead#61

Open
Vazcore wants to merge 1 commit into EpicGames:main from Vazcore:lore-storage-fragment-optimization


@Vazcore Vazcore commented Jun 23, 2026


When you store a big file in Lore, write_fragmented cuts it into chunks using FastCDC. It currently ships each cut to the compute pool separately, waits for the answer, then repeats for the next one. For a 1 GiB buffer with ~4 KiB chunks that works out to roughly 256k (1 GiB / 4 KiB = 262,144) spawn dispatches, oneshot allocations and await yields. The chunking work itself is fast, but I believe all that round-tripping adds up to a real cost on medium and large writes.
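For reference, the dispatch count above is just the buffer length divided by the average chunk size. A minimal sketch of that arithmetic follows; `estimated_dispatches` is a hypothetical helper for illustration and is not part of lore-storage, and the 4 KiB average is the assumption stated above, not a value read from Lore's configuration.

```rust
/// Rough estimate of how many per-chunk round trips a chunk-at-a-time loop
/// performs. Hypothetical helper for illustration only.
/// `avg_chunk` is an assumed average chunk size in bytes.
fn estimated_dispatches(buffer_len: u64, avg_chunk: u64) -> u64 {
    assert!(avg_chunk > 0, "average chunk size must be non-zero");
    // Ceiling division: a trailing partial chunk still costs one dispatch.
    (buffer_len + avg_chunk - 1) / avg_chunk
}
```

With `buffer_len = 1 << 30` and `avg_chunk = 4 << 10` this gives 262,144, which is the "~256k" figure quoted in the description.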

Why

A single chunking pass on the compute pool is enough - there's no reason to wait per boundary.

How

A single compute_pool task drives the FastCDC Iterator to completion and sends back a list of boundaries through one oneshot. The existing storage dispatch loop then iterates them locally. For fixed-size chunking the boundaries are just step arithmetic so those are computed inline.
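The shape of this change can be sketched with the standard library alone. This is not the code in the PR: `std::thread::spawn` and `std::sync::mpsc::channel` stand in for the compute pool and the oneshot, the `(offset, length)` boundary representation is an assumption, and both function names are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Fixed-size boundaries are plain step arithmetic, so no worker is needed.
/// Returns (offset, length) pairs; the last chunk may be shorter.
fn fixed_size_boundaries(len: usize, chunk_size: usize) -> Vec<(usize, usize)> {
    assert!(chunk_size > 0, "chunk size must be non-zero");
    (0..len)
        .step_by(chunk_size)
        .map(|start| (start, chunk_size.min(len - start)))
        .collect()
}

/// Drives a chunker iterator to completion on one worker and sends the whole
/// boundary list back in a single message, instead of one round trip per cut.
/// The thread and channel are stand-ins for a compute pool task and a oneshot.
fn batch_boundaries<I, F>(make_chunker: F) -> Vec<(usize, usize)>
where
    I: Iterator<Item = (usize, usize)>,
    F: FnOnce() -> I + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The iterator is built and exhausted entirely on the worker.
        let all: Vec<(usize, usize)> = make_chunker().collect();
        // Ignore the error case where the receiver has already gone away.
        let _ = tx.send(all);
    });
    rx.recv().expect("worker exited without sending boundaries")
}
```

A caller would then loop over the returned vector locally, for example `for (offset, len) in batch_boundaries(|| fixed_size_boundaries(10, 4).into_iter()) { ... }`, and dispatch one storage task per entry.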

The Arc wrapper is gone since we only touch the chunker once now. The single-fragment fast path, the JoinSet of storage tasks, hash-only mode and clone_buffer are all preserved, and the public signature of write_fragmented is unchanged. Behaviorally the output is identical to the old code: the same input produces the same cut points.

Testing

I added a small reference helper that does the standard FastCDC iteration synchronously and compared its output to the batched path on a 256 KiB random buffer. There's also coverage for empty input, sub-min-size input, an all-zero buffer (which defeats the rolling hash), and a non-aligned fixed-size case. cargo test -p lore-storage goes from 154 to 160 passing, clippy is clean with -D warnings, and the whole workspace still compiles. The fastcdc_batch_matches_reference test is the one that would catch a regression in the FastCDC version or a misconfiguration of the chunk sizes.
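The parity check has roughly the following shape. This sketch does not use the fastcdc crate or the PR's test code: `toy_cut` is a deliberately simple stand-in for a content-defined cut function (it is not FastCDC), and every name, the min/max sizes, and the xorshift data generator are assumptions made so the example runs with the standard library only.

```rust
/// Toy stand-in for a content-defined cut function; NOT the FastCDC algorithm.
/// Returns the length of the next chunk at the start of `data`.
fn toy_cut(data: &[u8], min: usize, max: usize) -> usize {
    assert!(min >= 1 && min <= max);
    if data.len() <= min {
        return data.len();
    }
    let limit = data.len().min(max);
    let mut hash: u64 = 0;
    for i in 0..limit {
        let gear = (data[i] as u64 + 1).wrapping_mul(0x9E37_79B9_7F4A_7C15);
        hash = (hash << 1).wrapping_add(gear);
        // Cut once the minimum size is reached and the top bits are all zero.
        if i + 1 >= min && (hash >> 58) == 0 {
            return i + 1;
        }
    }
    limit // forced cut at the maximum size (or end of data)
}

/// Reference path: one cut at a time, like the old per-chunk loop.
fn reference_boundaries(data: &[u8], min: usize, max: usize) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    let mut offset = 0;
    while offset < data.len() {
        let len = toy_cut(&data[offset..], min, max);
        out.push((offset, len));
        offset += len;
    }
    out
}

/// Iterator form of the same chunker, used by the batched path.
struct ToyChunker<'a> {
    data: &'a [u8],
    offset: usize,
    min: usize,
    max: usize,
}

impl Iterator for ToyChunker<'_> {
    type Item = (usize, usize);
    fn next(&mut self) -> Option<Self::Item> {
        if self.offset >= self.data.len() {
            return None;
        }
        let len = toy_cut(&self.data[self.offset..], self.min, self.max);
        let item = (self.offset, len);
        self.offset += len;
        Some(item)
    }
}

/// Batched path: drive the iterator to completion in one go.
fn batched_boundaries(data: &[u8], min: usize, max: usize) -> Vec<(usize, usize)> {
    ToyChunker { data, offset: 0, min, max }.collect()
}

/// Deterministic xorshift bytes so the example needs no external crate.
/// `seed` must be non-zero.
fn pseudo_random_bytes(len: usize, mut seed: u64) -> Vec<u8> {
    (0..len)
        .map(|_| {
            seed ^= seed << 13;
            seed ^= seed >> 7;
            seed ^= seed << 17;
            (seed >> 24) as u8
        })
        .collect()
}
```

A test built on these helpers would assert that `reference_boundaries` and `batched_boundaries` return equal vectors for a 256 KiB pseudo-random buffer, and would separately check the empty, sub-min-size and all-zero inputs, for instance that the chunk lengths sum to the buffer length and never exceed `max`.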

@Vazcore Vazcore marked this pull request as draft June 23, 2026 18:37
write_fragmented called FastCDC::cut once per chunk, shipping the work to the compute pool, awaiting the result, and repeating. For a 1 GiB buffer with ~4 KiB chunks that is ~256k spawn dispatches, oneshot allocations and await yields — overhead that can rival the chunking work itself for small remaining chunks.

Compute all FastCDC boundaries in a single compute_pool task by driving the FastCDC Iterator to completion. Fixed-size chunking has trivial per-step math so its boundaries are computed inline. The single-fragment fast path and the storage dispatch loop are unchanged.

Verified by a new parity test that compares the batched boundaries against a reference FastCDC iteration on a 256 KiB random buffer, plus edge cases for empty, sub-min-size, all-zero, and non-aligned fixed-size inputs.

Signed-off-by: Oleksii Habrusiev <alexgabrusev@gmail.com>
@Vazcore Vazcore force-pushed the lore-storage-fragment-optimization branch from 2fe3c48 to 80d041b Compare June 24, 2026 03:18
@Vazcore Vazcore marked this pull request as ready for review June 24, 2026 03:27
