Skip to content

Various improvements#16

Merged
cgwalters merged 7 commits intomainfrom
parser-fixes
Mar 4, 2026
Merged

Various improvements#16
cgwalters merged 7 commits intomainfrom
parser-fixes

Conversation

@cgwalters
Copy link
Collaborator

No description provided.

Replace PendingMetadata (which stored extension data as Vec<u8>) with
a Copy struct holding borrowed &'a [u8] slices from the input buffer.
Extension data is now threaded through parse_header → handle_extension →
emit_entry as function parameters instead of being stored in the Parser
struct.

This eliminates all heap allocations during extension processing. PAX-
derived paths, link targets, user/group names, and xattrs are now
Cow::Borrowed directly from the input, and ParsedEntry.pax is &'a [u8]
instead of Vec<u8>. The save/restore pattern on NeedData is also gone
since no state is written to self during extension handling.

Assisted-by: OpenCode (Claude claude-opus-4-6)
Change InvalidPaxValue.value and InvalidPaxSparseMap from String to
Cow<'static, str>. Static error messages use Cow::Borrowed (zero
allocation), while the few cases that interpolate values (format!) use
Cow::Owned. The from_utf8_lossy calls on non-UTF-8 PAX values are
replaced with a static "<non-UTF-8>" placeholder since the key alone
identifies the problem.

Assisted-by: OpenCode (Claude claude-opus-4-6)
Replace max_pax_size, max_gnu_long_size, and the previous max_path_len
with two cleaner limits:

- max_metadata_size: u32 — aggregate budget for all extension data
  (PAX headers + GNU long name/link) per entry. Default 1 MiB.
  Consolidates PaxTooLarge/GnuLongTooLarge into MetadataTooLarge.

- max_path_len: Option<u32> — optional filesystem-level path length
  check. None by default (we're a parser, not a filesystem). Callers
  extracting to disk should set this to libc::PATH_MAX or equivalent.

Also add Limits::check_path_len() helper and track running metadata
size in PendingMetadata.

Assisted-by: OpenCode (Claude claude-opus-4-6)
The strict/lenient toggle for PAX parse errors was a parser behavior
flag, not a resource limit — it was misplaced on Limits. Move it to
Parser as set_ignore_parsing_errors(bool), matching the pattern of
set_allow_empty_path and set_verify_checksums.

Remove Limits::strict() entirely. The defaults are already safe for
untrusted input (1 MiB metadata cap, bounded pending/sparse counts).
Callers who want tighter resource limits can set fields directly;
callers who want lenient PAX parsing call set_ignore_parsing_errors.

Assisted-by: OpenCode (Claude claude-opus-4-6)
Replace inline cargo commands in GHA workflows with `just ci`,
`just fuzz-all`, and `just generate-corpus`. This eliminates
duplication between the Justfile and the workflow definitions.

Also add `cargo test --no-default-features` to the `unit` target
and remove `fuzz-all` from the `ci` target (fuzzing runs as a
separate GHA job since it requires a nightly toolchain).

Assisted-by: OpenCode (Claude Opus)
The fuzzer found that chained extension headers (GNU long name/link,
PAX) with large declared sizes can cause an arithmetic overflow in
add_consumed() when the recursive parse_header calls unwind. Each
layer adds total_size (HEADER_SIZE + padded_size) to the consumed or
min_bytes field, and with Limits::permissive() allowing metadata up
to u32::MAX, two such additions can overflow usize.

Fix by using saturating_add instead of plain +. A saturated value
(usize::MAX) will always exceed the input length, so callers
correctly treat it as insufficient data.

Crash: parse.rs:413 panic_const_add_overflow in GlobalExtensions
variant, found by extended fuzzing on commit 518dcfb.

Assisted-by: OpenCode (Claude Opus)
When a fuzzer crashes, the crash-reproducing input and logs are now
saved as GitHub Actions artifacts, making it possible to download
and reproduce failures locally without re-running long fuzz sessions.

Assisted-by: OpenCode (Claude Opus)
@cgwalters cgwalters merged commit d6dce93 into main Mar 4, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant