As part of optimising our workflow, we store build results in the cache so that later builds can reuse them. However, as builds grow over time, restoring the build cache slowly comes to dominate the build process, since everything else is cached.
We found a potential change that could significantly improve throughput without many code changes: exchanging `(p)gzip` for `Zstd`.
For testing, I used a cache artefact from one of our Rust-based builds, which contains the `target/` dir and all of the objects within.
- Archive size shrank from 2.8GB to 2.4GB
- Archive list time (measured with `tar -tf`) shrank from 51 to 18 seconds. Listing is basically a sequential scan through the file, so in the limit it should take roughly as long as decompression. It just doesn't write anything.
- Introducing multi-threading gave a 2x improvement, reducing the list time to 9s, which is a smaller gain than I expected
- Unpacking times followed a similar pattern, taking 1s longer than just listing for both gz (52s) and zstd (19s), because they do the same additional work
- Similarly, unpacking with multi-threaded decompression takes 10s, one second longer than it takes to simply list the archive
- Total achieved write speed is 1GB/s. I don't know if you can do meaningfully better than that without CoW.
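
To put the numbers above side by side, here is a small sketch that only does arithmetic on the figures already listed (no new measurements). The variable and function names are my own. It works out to an archive that is about 14% smaller, roughly 2.8x faster listing and 2.7x faster unpacking single-threaded, and about 5.2x faster unpacking once multi-threading is enabled.

```python
# Figures copied from the list above: sizes in GB, times in seconds.
SIZE_GB = {"gz": 2.8, "zstd": 2.4}
LIST_S = {"gz": 51, "zstd": 18, "zstd_mt": 9}
UNPACK_S = {"gz": 52, "zstd": 19, "zstd_mt": 10}


def speedup(before: float, after: float) -> float:
    """Return how many times faster `after` is compared with `before`."""
    return before / after


def summarise() -> dict:
    """Derive relative improvements from the reported measurements."""
    return {
        "size_saving": 1 - SIZE_GB["zstd"] / SIZE_GB["gz"],
        "list_speedup": speedup(LIST_S["gz"], LIST_S["zstd"]),
        "list_speedup_mt": speedup(LIST_S["gz"], LIST_S["zstd_mt"]),
        "unpack_speedup": speedup(UNPACK_S["gz"], UNPACK_S["zstd"]),
        "unpack_speedup_mt": speedup(UNPACK_S["gz"], UNPACK_S["zstd_mt"]),
    }


if __name__ == "__main__":
    for name, value in summarise().items():
        print(f"{name}: {value:.2f}")
```
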
Would you consider implementing this? From what we can tell, it should be a relatively straightforward patch, as the current pgzip library you use has an equivalent zstd library with a similar API.
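
One thing a patch like this might need is a way to keep reading existing gzip caches after the switch. I don't know how your cache code is structured, so the following is only a sketch under that assumption, in Python rather than your codebase's language: it sniffs the first bytes of an archive and reports which format it is. `detect_compression` is a hypothetical helper name; the magic numbers themselves come from the gzip and Zstandard format specifications.

```python
import gzip

# Leading bytes defined by the respective formats.
GZIP_MAGIC = b"\x1f\x8b"          # gzip member header (RFC 1952)
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # Zstandard frame, 0xFD2FB528 little-endian


def detect_compression(header: bytes) -> str:
    """Guess the compression format from the first bytes of an archive.

    Hypothetical helper: returns "zstd", "gzip", or "unknown".
    Zstandard skippable frames are not handled in this sketch.
    """
    if header.startswith(ZSTD_MAGIC):
        return "zstd"
    if header.startswith(GZIP_MAGIC):
        return "gzip"
    return "unknown"


if __name__ == "__main__":
    # A real gzip stream, produced with the standard library.
    print(detect_compression(gzip.compress(b"cache contents")))
    # Only the magic bytes of a zstd frame, not a complete stream.
    print(detect_compression(ZSTD_MAGIC + b"\x00"))
```

With something like this in the restore path, old gzip artefacts could still be unpacked while new ones are written as zstd, so the cache would not need to be invalidated all at once.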