fuzz: Fix checksum coverage gap in parse and differential fuzzers#12
Closed
fuzz: Fix checksum coverage gap in parse and differential fuzzers#12
Conversation
The parse.rs and differential.rs fuzz targets were getting almost zero coverage of deeper parser logic (PAX extensions, GNU long name/link, sparse files, field parsing, etc.) because random fuzz input almost never has valid tar header checksums. The parser's verify_checksum() call at the top of parse_header() rejects ~100% of random inputs immediately, causing the fuzzer to break on every input before exercising any interesting code paths. For the parse.rs fuzzer, add a Parser::set_verify_checksums(bool) API that allows skipping checksum verification entirely. The fuzzer uses this ~90% of the time (determined by the first byte of input), letting it exercise all the parsing code paths beyond the checksum gate. The remaining 10% keeps checksum validation testing. For the differential fuzzer, since both tar-core and tar-rs must see identical data with valid checksums, use a fixup_checksums() approach that walks through 512-byte blocks and rewrites checksum fields in-place before passing to both parsers. It also parses the size field to skip over content blocks, so subsequent headers get fixed too. The roundtrip.rs fuzzer already uses EntryBuilder which produces valid checksums, so it does not need changes. Assisted-by: OpenCode (Claude claude-opus-4-6)
…rences The differential fuzzer was hitting false positives from two sources: 1. Truncated archives: tar-core's test harness pushed entries before verifying the full padded content was present. tar-rs refuses to yield entries whose content can't be fully read. Fix by checking that the entire padded content is available before pushing the entry. 2. Parsing leniency: tar-core intentionally accepts some inputs that tar-rs rejects (e.g. all-null numeric fields are parsed as 0, while tar-rs errors on them). Change the entry count assertion to only require tar-core parses at least as many entries as tar-rs, not exactly the same count. Found by running the improved checksum-aware fuzzers from the previous commit. Assisted-by: OpenCode (Claude claude-opus-4-6)
Multiple fixes found by running the improved checksum-aware fuzzers: parse_octal: Truncate at null first, then trim all ASCII whitespace (not just spaces) from both ends. This matches tar-rs's behavior of calling .trim() after null-truncation, fixing a differential where tar-core rejected fields with tab characters that tar-rs accepted. testutil parse_tar_core: Check that enough content data is present before yielding an entry, matching tar-rs's read_exact semantics. testutil parse_tar_rs: Skip metadata entry types (GNU long name/link, PAX headers) that tar-core handles internally. Require numeric fields to parse successfully instead of silently defaulting to 0. Skip entries with empty paths since tar-core rejects those. differential fuzzer: Dump raw header via Header's Debug impl on mismatch for easier diagnosis. Allow tar-core to parse more entries than tar-rs (it's intentionally more lenient for e.g. all-null fields). Assisted-by: OpenCode (Claude claude-opus-4-6)
Fix several sources of differential fuzzer crashes between tar-core and tar-rs: Parser (src/parse.rs): Require recognized magic (is_gnu() || is_ustar()) before treating typeflags L/K/x/g/S as extension entries. A V7 header whose typeflag byte happens to be 'L' should be emitted as a regular entry, not consumed as a GNU long name extension. This matches tar-rs's `is_recognized_header` guard. Octal parsing (src/lib.rs): Add `is_tar_whitespace` helper that treats VT (0x0b) as whitespace, matching real tar implementations. Rust's `u8::is_ascii_whitespace` omits VT, causing fields like "0000000\x0b" to fail. Test harness (testutil/src/lib.rs): Three alignment fixes for `parse_tar_core`: - Call `set_allow_empty_path(true)` so parsing continues past entries with empty paths (previously it would error out early). - Skip entries with empty paths after parsing, matching `parse_tar_rs`. - Skip entries with metadata-only typeflags (L/K/x/g) that tar-core emits as regular entries when magic is unrecognized. `parse_tar_rs` already had an equivalent skip via its `continue` on these types. The root cause of the specific crash artifact: a fuzzed header had typeflag 'L' (GnuLongName) but no valid magic bytes. tar-core treated it as an extension and consumed it, while tar-rs treated it as a regular entry and skipped it. After these fixes, both parsers handle V7-style metadata typeflags consistently. Assisted-by: OpenCode (Claude claude-opus-4-6)
Headers with a null byte at position 0 in the uname or gname field produce empty slices from username()/groupname(). Treat these as None to match tar-rs behavior, which filters empty values via .filter(|b| !b.is_empty()). Found via differential fuzzing. Assisted-by: OpenCode (Claude claude-opus-4-6)
Global PAX headers (type 'g') are defined by POSIX independently of the UStar/GNU magic field. Routing them through emit_entry (the catch-all path for unrecognized-magic extensions) is wrong because global headers may have arbitrary metadata in their numeric fields. Remove the is_extension_format guard for XGlobalHeader so these are always emitted as GlobalExtensions events. Found via differential fuzzing. Assisted-by: OpenCode (Claude claude-opus-4-6)
Move numeric field validation (mode, uid, gid, mtime, device_major, device_minor) before the metadata-type and empty-path skip logic. Previously, parse_tar_rs would skip metadata-type entries (L/K/x/g) before checking numeric fields, while tar-core's parser validates all fields eagerly in emit_entry. This caused false mismatches when an unrecognized-magic 'x'-typed block had garbage in its numeric fields: tar-rs skipped it, tar-core errored. Also replace the lenient unwrap_or(None) for device fields with strict break-on-error, matching tar-core's behavior. Assisted-by: OpenCode (Claude claude-opus-4-6)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The
parse.rsanddifferential.rsfuzz targets were getting almost zero coverage of deeper parser logic because random fuzz input almost never has valid tar header checksums. The parser'sverify_checksum()call at the top ofparse_header()rejects ~100% of random inputs immediately, so the fuzzers never exercise PAX extensions, GNU long name/link, sparse files, field parsing, etc.Changes
parse.rsfuzzer: Uses a newParser::set_verify_checksums(bool)API to skip checksum verification ~90% of the time (determined bydata[0] % 10). This lets the fuzzer exercise all code paths beyond the checksum gate. The remaining 10% keeps checksum validation testing.differential.rsfuzzer: Uses afixup_checksums()approach that walks through 512-byte blocks and rewrites checksum fields in-place, since both tar-core and tar-rs must see identical valid-checksum input. Also factors comparison logic into acompare_entries()helper to avoid duplication between the fixup and raw paths.src/parse.rs: AddsParser::set_verify_checksums(bool)API (analogous to existingset_allow_empty_path). Defaults totrue.roundtrip.rs: No changes needed — already usesEntryBuilderwhich produces valid checksums.