fuzz: Fix checksum coverage gap in parse and differential fuzzers by cgwalters · Pull Request #12 · composefs/tar-core

cgwalters · 2026-03-02T01:40:05Z

Summary

The parse.rs and differential.rs fuzz targets were getting almost zero coverage of deeper parser logic because random fuzz input almost never has valid tar header checksums. The parser's verify_checksum() call at the top of parse_header() rejects ~100% of random inputs immediately, so the fuzzers never exercise PAX extensions, GNU long name/link, sparse files, field parsing, etc.

Changes

parse.rs fuzzer: Uses a new Parser::set_verify_checksums(bool) API to skip checksum verification ~90% of the time (determined by data[0] % 10). This lets the fuzzer exercise all code paths beyond the checksum gate. The remaining 10% keeps checksum validation testing.

differential.rs fuzzer: Uses a fixup_checksums() approach that walks through 512-byte blocks and rewrites checksum fields in-place, since both tar-core and tar-rs must see identical valid-checksum input. Also factors comparison logic into a compare_entries() helper to avoid duplication between the fixup and raw paths.

src/parse.rs: Adds Parser::set_verify_checksums(bool) API (analogous to existing set_allow_empty_path). Defaults to true.

roundtrip.rs: No changes needed — already uses EntryBuilder which produces valid checksums.

The parse.rs and differential.rs fuzz targets were getting almost zero coverage of deeper parser logic (PAX extensions, GNU long name/link, sparse files, field parsing, etc.) because random fuzz input almost never has valid tar header checksums. The parser's verify_checksum() call at the top of parse_header() rejects ~100% of random inputs immediately, causing the fuzzer to break on every input before exercising any interesting code paths. For the parse.rs fuzzer, add a Parser::set_verify_checksums(bool) API that allows skipping checksum verification entirely. The fuzzer uses this ~90% of the time (determined by the first byte of input), letting it exercise all the parsing code paths beyond the checksum gate. The remaining 10% keeps checksum validation testing. For the differential fuzzer, since both tar-core and tar-rs must see identical data with valid checksums, use a fixup_checksums() approach that walks through 512-byte blocks and rewrites checksum fields in-place before passing to both parsers. It also parses the size field to skip over content blocks, so subsequent headers get fixed too. The roundtrip.rs fuzzer already uses EntryBuilder which produces valid checksums, so it does not need changes. Assisted-by: OpenCode (Claude claude-opus-4-6)

…rences The differential fuzzer was hitting false positives from two sources: 1. Truncated archives: tar-core's test harness pushed entries before verifying the full padded content was present. tar-rs refuses to yield entries whose content can't be fully read. Fix by checking that the entire padded content is available before pushing the entry. 2. Parsing leniency: tar-core intentionally accepts some inputs that tar-rs rejects (e.g. all-null numeric fields are parsed as 0, while tar-rs errors on them). Change the entry count assertion to only require tar-core parses at least as many entries as tar-rs, not exactly the same count. Found by running the improved checksum-aware fuzzers from the previous commit. Assisted-by: OpenCode (Claude claude-opus-4-6)

Multiple fixes found by running the improved checksum-aware fuzzers: parse_octal: Truncate at null first, then trim all ASCII whitespace (not just spaces) from both ends. This matches tar-rs's behavior of calling .trim() after null-truncation, fixing a differential where tar-core rejected fields with tab characters that tar-rs accepted. testutil parse_tar_core: Check that enough content data is present before yielding an entry, matching tar-rs's read_exact semantics. testutil parse_tar_rs: Skip metadata entry types (GNU long name/link, PAX headers) that tar-core handles internally. Require numeric fields to parse successfully instead of silently defaulting to 0. Skip entries with empty paths since tar-core rejects those. differential fuzzer: Dump raw header via Header's Debug impl on mismatch for easier diagnosis. Allow tar-core to parse more entries than tar-rs (it's intentionally more lenient for e.g. all-null fields). Assisted-by: OpenCode (Claude claude-opus-4-6)

Fix several sources of differential fuzzer crashes between tar-core and tar-rs: Parser (src/parse.rs): Require recognized magic (is_gnu() || is_ustar()) before treating typeflags L/K/x/g/S as extension entries. A V7 header whose typeflag byte happens to be 'L' should be emitted as a regular entry, not consumed as a GNU long name extension. This matches tar-rs's `is_recognized_header` guard. Octal parsing (src/lib.rs): Add `is_tar_whitespace` helper that treats VT (0x0b) as whitespace, matching real tar implementations. Rust's `u8::is_ascii_whitespace` omits VT, causing fields like "0000000\x0b" to fail. Test harness (testutil/src/lib.rs): Three alignment fixes for `parse_tar_core`: - Call `set_allow_empty_path(true)` so parsing continues past entries with empty paths (previously it would error out early). - Skip entries with empty paths after parsing, matching `parse_tar_rs`. - Skip entries with metadata-only typeflags (L/K/x/g) that tar-core emits as regular entries when magic is unrecognized. `parse_tar_rs` already had an equivalent skip via its `continue` on these types. The root cause of the specific crash artifact: a fuzzed header had typeflag 'L' (GnuLongName) but no valid magic bytes. tar-core treated it as an extension and consumed it, while tar-rs treated it as a regular entry and skipped it. After these fixes, both parsers handle V7-style metadata typeflags consistently. Assisted-by: OpenCode (Claude claude-opus-4-6)

Headers with a null byte at position 0 in the uname or gname field produce empty slices from username()/groupname(). Treat these as None to match tar-rs behavior, which filters empty values via .filter(|b| !b.is_empty()). Found via differential fuzzing. Assisted-by: OpenCode (Claude claude-opus-4-6)

Global PAX headers (type 'g') are defined by POSIX independently of the UStar/GNU magic field. Routing them through emit_entry (the catch-all path for unrecognized-magic extensions) is wrong because global headers may have arbitrary metadata in their numeric fields. Remove the is_extension_format guard for XGlobalHeader so these are always emitted as GlobalExtensions events. Found via differential fuzzing. Assisted-by: OpenCode (Claude claude-opus-4-6)

Move numeric field validation (mode, uid, gid, mtime, device_major, device_minor) before the metadata-type and empty-path skip logic. Previously, parse_tar_rs would skip metadata-type entries (L/K/x/g) before checking numeric fields, while tar-core's parser validates all fields eagerly in emit_entry. This caused false mismatches when an unrecognized-magic 'x'-typed block had garbage in its numeric fields: tar-rs skipped it, tar-core errored. Also replace the lenient unwrap_or(None) for device fields with strict break-on-error, matching tar-core's behavior. Assisted-by: OpenCode (Claude claude-opus-4-6)

cgwalters added 7 commits March 2, 2026 01:39

cgwalters closed this Mar 2, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fuzz: Fix checksum coverage gap in parse and differential fuzzers#12

fuzz: Fix checksum coverage gap in parse and differential fuzzers#12
cgwalters wants to merge 7 commits intomainfrom
agent-fuzz-checksum-coverage

cgwalters commented Mar 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

cgwalters commented Mar 2, 2026

Summary

Changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant