lore-storage: Batch FastCDC chunking to cut per-chunk overhead by Vazcore · Pull Request #61 · EpicGames/lore

Vazcore · 2026-06-23T17:29:11Z

When you store a big file in Lore, write_fragmented cuts it into chunks using FastCDC, but it ships each cut to the compute pool separately: dispatch one cut, await the answer, repeat. A 1 GiB buffer with ~4 KiB chunks is 1 GiB / 4 KiB = 262,144 chunks, so roughly 256k spawn dispatches, oneshot allocations and await yields. The chunking work itself is fast, but I believe all that round-tripping adds up to a real cost on medium and large writes.

Why

A single chunking pass on the compute pool is enough - there's no reason to wait per boundary.

How

A single compute_pool task drives the FastCDC Iterator to completion and sends back a list of boundaries through one oneshot. The existing storage dispatch loop then iterates them locally. For fixed-size chunking the boundaries are just step arithmetic so those are computed inline.
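To illustrate the shape of the change, here is a minimal std-only sketch, not the code in this PR. `ToyChunker` is a hypothetical stand-in for the FastCDC iterator (its hash is not FastCDC's), a scoped thread stands in for the compute_pool task, and an `mpsc` channel used for a single send stands in for the oneshot. `batch_boundaries` and `fixed_boundaries` are illustrative names.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical content-defined chunker, NOT FastCDC.
/// Each call to `next` yields the end offset of the next chunk.
struct ToyChunker<'a> {
    data: &'a [u8],
    pos: usize,
    min_size: usize,
    max_size: usize,
    mask: u32,
}

impl Iterator for ToyChunker<'_> {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        if self.pos >= self.data.len() {
            return None;
        }
        let start = self.pos;
        let limit = (start + self.max_size).min(self.data.len());
        let mut hash: u32 = 0;
        let mut end = limit; // fall back to a max-size (or final) chunk
        for (offset, &byte) in self.data[start..limit].iter().enumerate() {
            hash = (hash << 1).wrapping_add(u32::from(byte));
            let len = offset + 1;
            if len >= self.min_size && (hash & self.mask) == 0 {
                end = start + len;
                break;
            }
        }
        self.pos = end;
        Some(end)
    }
}

/// One task drives the iterator to completion and sends every boundary
/// back in a single message, instead of one round trip per cut.
fn batch_boundaries(data: &[u8], min_size: usize, max_size: usize, mask: u32) -> Vec<usize> {
    assert!(min_size >= 1 && max_size >= min_size, "sizes must be positive and ordered");
    let (tx, rx) = mpsc::channel();
    thread::scope(|scope| {
        scope.spawn(move || {
            let chunker = ToyChunker { data, pos: 0, min_size, max_size, mask };
            let _ = tx.send(chunker.collect::<Vec<usize>>());
        });
        rx.recv().expect("chunking task ended without sending boundaries")
    })
}

/// Fixed-size boundaries are plain step arithmetic, so no task is needed.
fn fixed_boundaries(len: usize, chunk_size: usize) -> Vec<usize> {
    assert!(chunk_size > 0, "chunk size must be positive");
    let count = (len + chunk_size - 1) / chunk_size;
    (1..=count).map(|i| (i * chunk_size).min(len)).collect()
}
```

The caller then walks the returned `Vec<usize>` locally, slicing the buffer between consecutive boundaries, which is the role the existing storage dispatch loop plays here.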

The Arc wrapper is gone since we only touch the chunker once now. The single-fragment fast path, the JoinSet of storage tasks, hash-only mode and clone_buffer are all preserved, and the public signature of write_fragmented is unchanged. Behaviorally the output is identical to the old code: same cut points for the same input.

Testing

I added a small reference helper that does the standard FastCDC iteration synchronously and compared its output to the batched path on a 256 KiB random buffer. There's also coverage for empty input, sub-min-size input, an all-zero buffer (which defeats the rolling hash), and a non-aligned fixed-size case. cargo test -p lore-storage goes from 154 to 160 passing, clippy is clean with -D warnings, and the whole workspace still compiles. The fastcdc_batch_matches_reference test is the one that would catch a regression in the FastCDC version or a misconfiguration of the chunk sizes.
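A parity check of this kind could be structured roughly as follows. This is a sketch under assumptions, not the test added in this PR: `pseudo_random`, `edge_case_inputs` and `check_parity` are hypothetical names, the two chunking paths are passed in as closures or functions, and every buffer size other than the 256 KiB random case is illustrative.

```rust
/// Deterministic pseudo-random bytes (xorshift32) so failures reproduce.
/// `seed` must be non-zero, otherwise the generator stays at zero.
fn pseudo_random(len: usize, seed: u32) -> Vec<u8> {
    let mut state = seed;
    (0..len)
        .map(|_| {
            state ^= state << 13;
            state ^= state >> 17;
            state ^= state << 5;
            (state >> 24) as u8
        })
        .collect()
}

/// Content-defined inputs named in the Testing section above.
fn edge_case_inputs(min_size: usize) -> Vec<(&'static str, Vec<u8>)> {
    assert!(min_size >= 1, "min_size must be positive");
    vec![
        ("empty", Vec::new()),
        ("sub-min-size", vec![7u8; min_size - 1]),
        ("all-zero", vec![0u8; 64 * 1024]),
        ("random 256 KiB", pseudo_random(256 * 1024, 0x9E37_79B9)),
    ]
}

/// Asserts that both paths produce the same boundaries and that the
/// boundaries are strictly increasing and end exactly at the buffer length.
fn check_parity(
    reference: impl Fn(&[u8]) -> Vec<usize>,
    batched: impl Fn(&[u8]) -> Vec<usize>,
    min_size: usize,
) {
    for (name, data) in edge_case_inputs(min_size) {
        let want = reference(&data);
        let got = batched(&data);
        assert_eq!(got, want, "boundary mismatch on {name} input");
        assert_eq!(
            got.last().copied().unwrap_or(0),
            data.len(),
            "{name}: boundaries must end at the buffer length"
        );
        assert!(
            got.windows(2).all(|w| w[0] < w[1]),
            "{name}: boundaries must be strictly increasing"
        );
    }
}
```

In a real test, `reference` would be the synchronous FastCDC iteration and `batched` the new compute-pool path, both built with the same chunk-size configuration.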

write_fragmented called FastCDC::cut once per chunk, shipping the work to the compute pool, awaiting the result, and repeating. For a 1 GiB buffer with ~4 KiB chunks that is ~256k spawn dispatches, oneshot allocations and await yields: overhead that can rival the chunking work itself for small remaining chunks.

Compute all FastCDC boundaries in a single compute_pool task by driving the FastCDC Iterator to completion. Fixed-size chunking has trivial per-step math so its boundaries are computed inline. The single-fragment fast path and the storage dispatch loop are unchanged.

Verified by a new parity test that compares the batched boundaries against a reference FastCDC iteration on a 256 KiB random buffer, plus edge cases for empty, sub-min-size, all-zero, and non-aligned fixed-size inputs.

Signed-off-by: Oleksii Habrusiev <alexgabrusev@gmail.com>

Vazcore marked this pull request as draft June 23, 2026 18:37

Vazcore force-pushed the lore-storage-fragment-optimization branch from 2fe3c48 to 80d041b Compare June 24, 2026 03:18

Vazcore marked this pull request as ready for review June 24, 2026 03:27

Vazcore wants to merge 1 commit into EpicGames:main from Vazcore:lore-storage-fragment-optimization
