
packing: replace greedy merge with statistical partitioning#107

Draft
jlebon wants to merge 2 commits into coreos:main from jlebon:pr/algo-tweaks

Conversation

@jlebon
Member

@jlebon jlebon commented Apr 10, 2026

The old algorithm used a greedy merge approach (BinaryHeap-based,
minimizing TEV loss per merge) which was fundamentally unstable: small
changes in input caused completely different merge decisions, resulting
in poor layer reuse across updates.

Additionally, because of its greediness, it easily fell into a local
optimum: one giant catch-all bucket with low stability, while the safer,
more stable items ended up in the other layers.

Replace it with a two-phase statistical partitioning approach inspired
by rpm-ostree's chunking algorithm:

Phase 1 classifies components by size using median + MAD, giving large
components (linux-firmware, kernel, firefox) their own singleton layers.
Phase 2 assigns all remaining components to bins using stability tiers
(high/mid/low via mean+stddev) and deterministic name-based hashing,
which ensures stable bin membership across builds without needing to
track prior build state.
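The two phases might be sketched roughly as follows. This is a hedged illustration, not the actual src/packing.rs code: the `k` cutoff and the hasher choice are assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Median of an already-sorted slice.
fn median(sorted: &[f64]) -> f64 {
    let n = sorted.len();
    if n % 2 == 1 {
        sorted[n / 2]
    } else {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
    }
}

/// Phase 1: a component is a size outlier (and gets a singleton layer)
/// if it exceeds median + k * MAD (median absolute deviation).
fn size_cutoff(sizes: &[f64], k: f64) -> f64 {
    let mut sorted = sizes.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let med = median(&sorted);
    let mut devs: Vec<f64> = sorted.iter().map(|s| (s - med).abs()).collect();
    devs.sort_by(|a, b| a.total_cmp(b));
    med + k * median(&devs)
}

/// Phase 2: deterministic bin assignment by hashing the component name,
/// so bin membership stays stable across builds with no prior-state
/// tracking. (Note: std's DefaultHasher is not guaranteed stable across
/// Rust releases; a real implementation would want a hash function with a
/// cross-version stability guarantee.)
fn bin_for(name: &str, bins_in_tier: u64) -> u64 {
    let mut h = DefaultHasher::new();
    name.hash(&mut h);
    h.finish() % bins_in_tier
}
```

The appeal of the name hash is that it needs no memory of previous builds: the same component name always lands in the same bin for a given bin count.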

Also remove the stability fallback that assigned min(known)/2 to
components without stability data (xattr, bigfiles, unclaimed). These
now stay at 0.0 and are naturally handled by the stability tiers.
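One plausible reading of the mean+stddev tier split, with the cutoffs as illustrative assumptions rather than the PR's actual values:

```rust
#[derive(Debug, PartialEq)]
enum Tier {
    High,
    Mid,
    Low,
}

/// Classify a stability score against mean +/- stddev of the known scores.
/// Components without stability data carry 0.0 and naturally fall into the
/// low tier whenever mean - stddev is above zero.
fn tier(score: f64, mean: f64, stddev: f64) -> Tier {
    if score >= mean + stddev {
        Tier::High
    } else if score >= mean - stddev {
        Tier::Mid
    } else {
        Tier::Low
    }
}
```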

Benchmarked against rpm-ostree's native chunking on FCOS F43 (10
biweekly stable releases) and Silverblue F43 (10 daily builds):

FCOS:       49.2% reuse, 826 MiB avg download (rpm-ostree: 33.0%, 1.1 GiB)
Silverblue: 89.5% reuse, 543 MiB avg download (rpm-ostree: 71.8%, 1.5 GiB)

Assisted-by: OpenCode (Claude Opus 4.6)


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request replaces the existing greedy clustering algorithm for OCI layer packing with a new statistical partitioning approach inspired by rpm-ostree. The new algorithm classifies components into layers based on size outliers (using median and MAD) and stability tiers, utilizing name-based hashing for deterministic binning. Feedback identifies critical logic errors where components could be silently dropped if the layer budget is exhausted or if a stability tier is allocated zero bins. Additionally, a fix is required for the use of an unstable Rust feature in the median calculation.
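On the dropped-components concern: a minimal guard is to clamp every tier's bin count to at least one, so the name hash always has a target bin. This is a hypothetical helper, not code from the PR:

```rust
/// Allocate bins to a tier proportionally to its weight, but never zero,
/// so every component assigned to the tier has a bin to land in.
fn bins_for_tier(weight: f64, total_bins: usize) -> usize {
    ((weight * total_bins as f64).round() as usize).max(1)
}
```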

3 comment threads on src/packing.rs
@jlebon
Member Author

jlebon commented Apr 10, 2026

Don't look too closely at the code yet. This is still rough/raw from Opus; I still need to go through it carefully, but the results are encouraging. I made it sweep for optimal parameters on the FCOS and Silverblue sets (hoping those parameters generalize well to non-Fedora data sets -- will try to do more validation there).

@jlebon jlebon force-pushed the pr/algo-tweaks branch 2 times, most recently from 2db7255 to 3a357ad on April 10, 2026 22:25
jlebon added 2 commits April 10, 2026 22:13
(first commit message is identical to the PR description above)
Switch test-arch.sh and test-self.sh to use --write-manifest-to for
checking the unclaimed component size. The previous approach checked
the layer size via skopeo inspect, which is inaccurate since the
unclaimed component may share a layer with other components.

For test-self.sh, also drop the is_chunked shortcut so that we always
run chunkah and produce a manifest.

Assisted-by: OpenCode (Claude Opus 4.6)
@cgwalters
Member

There's plenty of other container images to look at, I think registry.redhat.io/rhelai1/bootc-cuda-rhel9 is another one we should have in our references. IIRC Chris was looking at tweaking its build to use user.component.

@castrojo

castrojo commented Apr 13, 2026

ghcr.io/projectbluefin/dakota:latest is a buildstream-based image. We want to start with chunkah out of the gate and would love to offer a prototype. We have at least a few people on this, and I'm dogfooding on metal! projectbluefin/dakota#212

Happy to turn on whatever is necessary to help!


3 participants