Add compressed image support for QEMU driver#789
Add compressed image support for QEMU driver#789bennyz merged 5 commits intojumpstarter-dev:mainfrom
Conversation
✅ Deploy Preview for jumpstarter-docs ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
📝 WalkthroughWalkthroughAdds file-signature based compression detection and an async AutoDecompressIterator (gzip, xz, bz2, zstd), exposes detection/creator helpers, adds tests, and integrates AutoDecompressIterator into QemuFlasher.flash to transparently decompress input image streams before writing. Changes
Sequence Diagram(s)sequenceDiagram
participant Qemu as QemuFlasher.flash()
participant Stream as Input AsyncIterator
participant AutoDec as AutoDecompressIterator
participant Factory as create_decompressor()
participant Target as Target Stream/Writer
Qemu->>AutoDec: wrap(Stream)
Note right of AutoDec: buffer up to SIGNATURE_BUFFER_SIZE bytes
AutoDec->>AutoDec: inspect buffered signature
alt compression detected
AutoDec->>Factory: create_decompressor(type)
Factory-->>AutoDec: decompressor instance
AutoDec->>AutoDec: decompress buffered + incoming chunks
AutoDec-->>Target: yield decompressed chunks
else no compression
AutoDec-->>Target: yield original chunks (passthrough)
end
Target->>Qemu: writes/chunks to device
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (1)
🧰 Additional context used📓 Path-based instructions (1)**/*.py📄 CodeRabbit inference engine (.cursor/rules/project-structure.mdc)
Files:
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
🔇 Additional comments (10)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/jumpstarter/jumpstarter/streams/encoding.py (1)
62-72: Consider adding explicit handling for unmatched compression types.The function relies on exhaustive pattern matching but lacks a default case. If a new
Compressionenum value is added without updating this function, it would silently returnNone.🔎 Proposed improvement
def create_decompressor(compression: Compression) -> Any: """Create a decompressor object for the given compression type.""" match compression: case Compression.GZIP: return zlib.decompressobj(wbits=47) # Auto-detect gzip/zlib case Compression.XZ: return lzma.LZMADecompressor() case Compression.BZ2: return bz2.BZ2Decompressor() case Compression.ZSTD: return zstd.ZstdDecompressor() + case _: + raise ValueError(f"Unsupported compression type: {compression}")
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.pypackages/jumpstarter/jumpstarter/streams/encoding.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/project-structure.mdc)
Ruff should be used for code formatting and linting, excluding
jumpstarter-protocolpackage
Files:
packages/jumpstarter/jumpstarter/streams/encoding.pypackages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py
packages/jumpstarter-driver-*/jumpstarter_driver_*/driver.py
📄 CodeRabbit inference engine (.cursor/rules/creating-new-drivers.mdc)
Driver class names should be in CamelCase and be descriptive with appropriate suffixes based on functionality: Power drivers should end with
*Power, Network drivers with*Network, Flasher drivers with*Flasher, Console drivers with*Console, Server drivers with*Server
Files:
packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py
packages/jumpstarter-driver-**/jumpstarter_driver_**/*.py
📄 CodeRabbit inference engine (.cursor/rules/creating-new-drivers.mdc)
Driver implementations should follow existing code style validated with
make lint(fix withmake lint-fix), perform static type checking withmake ty-pkg-${package_name}, add comprehensive tests, and verify all tests pass withmake test-pkg-${package_name}ormake test
Files:
packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py
🧠 Learnings (1)
📚 Learning: 2025-11-27T09:58:41.875Z
Learnt from: CR
Repo: jumpstarter-dev/jumpstarter PR: 0
File: .cursor/rules/creating-new-drivers.mdc:0-0
Timestamp: 2025-11-27T09:58:41.875Z
Learning: Applies to packages/jumpstarter-driver-**/jumpstarter_driver_**/*.py : Driver implementations should follow existing code style validated with `make lint` (fix with `make lint-fix`), perform static type checking with `make ty-pkg-${package_name}`, add comprehensive tests, and verify all tests pass with `make test-pkg-${package_name}` or `make test`
Applied to files:
packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py
🧬 Code graph analysis (2)
packages/jumpstarter/jumpstarter/streams/encoding.py (1)
packages/jumpstarter-driver-iscsi/jumpstarter_driver_iscsi/driver.py (1)
decompress(318-357)
packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py (1)
packages/jumpstarter/jumpstarter/streams/encoding.py (1)
AutoDecompressIterator(155-216)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Redirect rules - jumpstarter-docs
- GitHub Check: Header rules - jumpstarter-docs
- GitHub Check: Pages changed - jumpstarter-docs
- GitHub Check: build
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
- GitHub Check: e2e
- GitHub Check: pytest-matrix (macos-15, 3.13)
- GitHub Check: pytest-matrix (macos-15, 3.12)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
🔇 Additional comments (4)
packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py (1)
44-55: LGTM! Clean integration of auto-decompression.The flash method correctly wraps the source stream with
AutoDecompressIteratorand the docstring accurately describes the new decompression capability. The implementation maintains the existing streaming behavior while adding transparent decompression support.Minor style consideration:
res.__aiter__()works butaiter(res)is the idiomatic built-in (Python 3.10+). Not required to change if you prefer explicitness here.packages/jumpstarter/jumpstarter/streams/encoding.py (3)
26-44: LGTM! Well-structured signature definitions.The
FileSignaturedataclass andCOMPRESSION_SIGNATUREStuple are clean and correctly define the magic bytes for each compression format. The frozen dataclass is appropriate since signatures are immutable.
47-59: LGTM! Straightforward signature detection.The function correctly iterates through signatures and uses
startswith()for matching. ReturningNonefor unknown formats is the right approach.
13-16: No changes needed. Thebackports-zstd>=1.1.0dependency is correctly declared inpyproject.tomlwith the version constraintpython_full_version < '3.14', matching the conditional import at lines 13-16.
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
packages/jumpstarter/jumpstarter/streams/encoding.py (1)
200-202: Missing flush when stream exhausted during detection phase.The flush logic at lines 209-214 only runs when
StopAsyncIterationis caught in the main read loop. However, if the source stream is exhausted during_detect_compression()(small compressed file < 8 bytes), the decompressor is never flushed here.Scenario:
- Small gzip file fits entirely in buffer during detection
_exhausted = True, decompressor created- Buffer is decompressed and returned
- Next call: buffer empty,
_exhaustedis True → raises immediately without flushingThis could cause data loss for trailing decompressed bytes.
🔎 Proposed fix
# Stream exhausted if self._exhausted: + # Flush any remaining data from decompressor + if self._decompressor is not None and hasattr(self._decompressor, "flush"): + remaining = self._decompressor.flush() + self._decompressor = None + if remaining: + return remaining raise StopAsyncIteration
🧹 Nitpick comments (1)
packages/jumpstarter/jumpstarter/streams/encoding.py (1)
62-72: Consider adding a default case for defensive coding.The match statement covers all current
Compressionenum values, but lacks a fallback. If the enum is ever extended without updating this function, it would silently returnNone.🔎 Proposed fix
def create_decompressor(compression: Compression) -> Any: """Create a decompressor object for the given compression type.""" match compression: case Compression.GZIP: return zlib.decompressobj(wbits=47) # Auto-detect gzip/zlib case Compression.XZ: return lzma.LZMADecompressor() case Compression.BZ2: return bz2.BZ2Decompressor() case Compression.ZSTD: return zstd.ZstdDecompressor() + case _: + raise ValueError(f"Unsupported compression type: {compression}")
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
packages/jumpstarter/jumpstarter/streams/encoding.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/project-structure.mdc)
Ruff should be used for code formatting and linting, excluding
jumpstarter-protocolpackage
Files:
packages/jumpstarter/jumpstarter/streams/encoding.py
🧬 Code graph analysis (1)
packages/jumpstarter/jumpstarter/streams/encoding.py (1)
packages/jumpstarter-driver-iscsi/jumpstarter_driver_iscsi/driver.py (1)
decompress(318-357)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Redirect rules - jumpstarter-docs
- GitHub Check: Header rules - jumpstarter-docs
- GitHub Check: Pages changed - jumpstarter-docs
- GitHub Check: build
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
- GitHub Check: pytest-matrix (macos-15, 3.13)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
- GitHub Check: pytest-matrix (macos-15, 3.12)
- GitHub Check: pytest-matrix (macos-15, 3.11)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
- GitHub Check: e2e
🔇 Additional comments (6)
packages/jumpstarter/jumpstarter/streams/encoding.py (6)
26-31: LGTM!The
FileSignaturefrozen dataclass is a clean, immutable value object for associating magic bytes with compression types.
34-44: LGTM!Magic bytes are correct for all four compression formats, and
SIGNATURE_BUFFER_SIZE = 8adequately covers the longest signature (xz at 6 bytes).
47-59: LGTM!The detection logic is clean and handles edge cases gracefully—
startswithcorrectly handles buffers shorter than the signature.
154-185: LGTM!The detection phase correctly buffers initial bytes and handles early stream exhaustion. The class design with clear internal state fields is well-structured.
207-219: Flush logic for main loop is correctly implemented.The handling here properly flushes the decompressor, guards against double-flush by setting
_decompressor = None, and returns remaining bytes before raisingStopAsyncIteration. Good implementation.
71-72: TheZstdDecompressor()usage is compatible across bothbackports.zstdand Python 3.14'scompression.zstd. Thebackports.zstdpackage is explicitly designed as a backport ofcompression.zstd(PEP 784) and exposes the same API, including theZstdDecompressorclass with identicaldecompress(data)method signature for incremental decompression. Both libraries support the same streaming decompression behavior used in this code and inCompressedStream.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
packages/jumpstarter/jumpstarter/streams/encoding_test.py (2)
153-155: Consider a more idiomatic empty async generator.The
if False: yieldpattern works but is unconventional. A clearer alternative would be:async def empty_iter(): return yield # Make it a generatorOr simply:
async def empty_iter(): pass yield
100-166: Consider adding tests for malformed compressed data.The current test suite comprehensively covers happy paths and basic edge cases. Consider adding tests for error handling scenarios:
- Malformed/corrupted compressed data (e.g., valid signature but invalid compression stream)
- Truncated compressed data (stream ends mid-decompression)
- Data with valid signature but incorrect compression format
These tests would verify that the
AutoDecompressIteratorhandles decompression errors gracefully and fails with appropriate error messages.Example test cases to add
async def test_malformed_gzip(self): """Malformed gzip data should raise an error.""" # Valid gzip signature followed by random data malformed = b"\x1f\x8b\x08" + os.urandom(20) with pytest.raises(Exception): # Adjust exception type as needed await self._decompress_and_check(malformed, b"", chunk_size=16) async def test_truncated_compressed_data(self): """Truncated compressed stream should raise an error.""" original = b"hello world" * 100 compressed = gzip.compress(original) truncated = compressed[:len(compressed) // 2] # Cut in half with pytest.raises(Exception): # Adjust exception type as needed await self._decompress_and_check(truncated, original, chunk_size=16)
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
packages/jumpstarter/jumpstarter/streams/encoding_test.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/project-structure.mdc)
Ruff should be used for code formatting and linting, excluding
jumpstarter-protocolpackage
Files:
packages/jumpstarter/jumpstarter/streams/encoding_test.py
🧬 Code graph analysis (1)
packages/jumpstarter/jumpstarter/streams/encoding_test.py (1)
packages/jumpstarter/jumpstarter/streams/encoding.py (4)
AutoDecompressIterator(155-222)Compression(19-23)compress_stream(124-151)detect_compression_from_signature(47-59)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Redirect rules - jumpstarter-docs
- GitHub Check: Header rules - jumpstarter-docs
- GitHub Check: Pages changed - jumpstarter-docs
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
- GitHub Check: pytest-matrix (macos-15, 3.12)
- GitHub Check: pytest-matrix (macos-15, 3.13)
- GitHub Check: pytest-matrix (macos-15, 3.11)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
- GitHub Check: e2e
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
- GitHub Check: build
🔇 Additional comments (2)
packages/jumpstarter/jumpstarter/streams/encoding_test.py (2)
49-54: LGTM!The helper function provides a clean abstraction for retrieving compression signatures in tests.
57-98: LGTM!Comprehensive test coverage for signature detection including all compression formats and edge cases (empty input, truncated signatures, uncompressed data, and real compressed data).
|
cc @mangelajo @bennyz @bkhizgiy for review. thanks! |
packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py
Outdated
Show resolved
Hide resolved
|
@bkhizgiy updated, tnx for the suggestions 👍 on a side note, pipeline error seems unrelated: |
bennyz
left a comment
There was a problem hiding this comment.
Personally I feel like the signature checking is a bit of an overkill for this. Tools like unxz/gunzip/etc rely on the extensions, which can be good enough for our case as well, since flashing qemu has no risk of corruption
however I will not block on this
yep that would work too i guess. ended up doing the sign check as the bot warned me that in some corner case with the URL we can lose the original filename. not sure how probable this thing is. but do you think it's worth reimplementing now? |
nope, merging |
Closes jumpstarter-dev/jumpstarter#116
Summary
Adds transparent decompression support for compressed disk images when flashing to the QEMU driver. Compressed images (
.gz,.xz,.bz2,.zstd) are automatically detected and decompressed on the fly.Changes
packages/jumpstarter/jumpstarter/streams/encoding.pyFileSignaturedataclass to represent compression format signaturesCOMPRESSION_SIGNATUREStuple with file signatures for gzip, xz, bz2, and zstddetect_compression_from_signature()function for auto-detectioncreate_decompressor()helper functionAutoDecompressIteratorasync iterator that wraps a byte stream and transparently decompresses if neededpackages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.pyQemuFlasher.flash()to wrap the source stream withAutoDecompressIteratorHow it works
flash()is called, the first 8 bytes are bufferedSummary by CodeRabbit
New Features
Documentation
Tests
✏️ Tip: You can customize this high-level summary in your review settings.