pdf-object: implement CCITTFaxDecode filter#121
Conversation
There was a problem hiding this comment.
Pull request overview
Adds CCITTFaxDecode support to pdf-object by introducing a pure-Rust CCITT (Group 3/4) decoder and wiring it into stream filter decoding so CCITT-compressed streams can be decompressed via StreamObject::data().
Changes:
- Added
ccittmodule implementing CCITTFaxDecode decoding (Group 3 1D, Group 3 2D, Group 4/MMR) with unit tests. - Wired
Filter::CCITTFaxDecodeintoStreamObject::data()and added a/DecodeParmsextraction helper. - Implemented
Filter::decode_ccitt_fax(.., params)as a thin wrapper over the new decoder.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| crates/pdf-object/src/stream.rs | Adds CCITTFaxDecode handling in StreamObject::data() and parses /DecodeParms per filter index. |
| crates/pdf-object/src/lib.rs | Exposes the new ccitt module publicly. |
| crates/pdf-object/src/filter.rs | Implements CCITTFaxDecode decode entrypoint and documents its behavior. |
| crates/pdf-object/src/ccitt.rs | New CCITT decoder implementation + parameter parsing + unit tests. |
Comments suppressed due to low confidence (1)
crates/pdf-object/src/stream.rs:99
StreamObject::data()documents that it errors on unsupported filter chains, but the default match arm prints to stdout and returns partially decoded data. This can mask failures and produce incorrect output for callers. Consider returning anObjectErrorforFilter::Unsupported(_)/ unknown filters instead ofprintln!+break.
_ => {
println!(
"Unsupported filter in data_with_remaining_filter: {:?}",
filter
);
break;
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
1dfe18b to
56fd2a7
Compare
Add `pub mod ccitt` with a pure-Rust CCITTFaxDecode decoder supporting all three encoding modes: Group 3 1D (K=0), Group 3 2D (K>0), and Group 4 / MMR (K<0). Wire it into `StreamObject::data()` with a `ccitt_params_for_filter()` helper that reads `/DecodeParms` for both single-filter and multi-filter streams. The implementation embeds Huffman tables from ITU-T T.4 Appendix A (WHITE/BLACK terminating and make-up codes, common extended make-up codes) and a T.6 mode-code decoder (Pass, Horizontal, Vertical ±0–3). `CCITTFaxParams` is parsed directly from PDF dictionary entries without requiring an `ObjectResolver`. Includes 14 unit tests covering params defaults, bit reader, Huffman run-length decoding, EOL handling, 1D rows (all-white, all-black, mixed), 2D Group 4 rows (V0 / Horizontal modes, all-white, row limit), and row packing with both polarities. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
f001490 to
68f1211
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 5 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| loop { | ||
| if rows > 0 && decoded_rows >= rows { | ||
| break; | ||
| } | ||
|
|
||
| // Skip any EOL marker preceding the row (no-op when none present). | ||
| skip_eol(&mut reader); | ||
|
|
||
| if reader.exhausted() { | ||
| break; | ||
| } |
There was a problem hiding this comment.
CCITTFaxParams::end_of_block is never consulted during decoding. For Group 3/4 streams with rows = 0 (the default), this will treat the RTC/EOFB terminator as image data and can emit a spurious final row (often all-white) or stop mid-row depending on bit patterns. Please add explicit RTC (Group 3) / EOFB (Group 4) detection when end_of_block is true and stop decoding before appending another row; also add unit tests that cover termination behavior.
| // Skip any EOL marker preceding the row (no-op when none present). | ||
| skip_eol(&mut reader); | ||
|
|
There was a problem hiding this comment.
skip_eol(&mut reader) is called unconditionally before every row, even when params.end_of_line is false (and for Group 4, which normally has no EOLs). This can mis-synchronize decoding if a data bit sequence happens to match the EOL pattern. Consider only consuming a leading EOL when the encoding mode/params require it (e.g., end_of_line for Group 3 1D, mandatory row EOL for Group 3 2D) and avoid scanning for EOLs in Group 4 unless you are explicitly checking for EOFB.
| /// Extract [`CCITTFaxParams`][crate::ccitt::CCITTFaxParams] for the filter at | ||
| /// `filter_idx` from the stream's `/DecodeParms` dictionary entry. | ||
| /// | ||
| /// Per PDF spec §7.3.8.2, `/DecodeParms` is either a single dictionary (when | ||
| /// there is one filter) or an array of dictionaries (one per filter). Values | ||
| /// are always inline objects, so no object resolver is needed. | ||
| fn ccitt_params_for_filter(dict: &Dictionary, filter_idx: usize) -> crate::ccitt::CCITTFaxParams { | ||
| match dict.get("DecodeParms") { | ||
| Some(ObjectVariant::Dictionary(d)) => crate::ccitt::CCITTFaxParams::from_dictionary(d), | ||
| Some(ObjectVariant::Array(arr)) => arr |
There was a problem hiding this comment.
The helper assumes /DecodeParms values are always inline objects (and ignores ObjectVariant::Reference). In PDF, /DecodeParms can legally be an indirect reference, so this will silently fall back to defaults and potentially decode CCITT data with the wrong parameters (K/Columns/BlackIs1/etc.). Either resolve references here (which likely requires threading an ObjectResolver into StreamObject::data) or detect references and return a clear DecompressionError/update the doc comment to state indirect /DecodeParms is unsupported.
| /// Decodes CCITTFaxDecode (Group 3 / Group 4 fax) compressed stream data. | ||
| /// | ||
| /// # Parameters | ||
| /// | ||
| /// - `stream_data`: The compressed byte stream to decode. | ||
| /// - `params`: Decode parameters parsed from the stream's `/DecodeParms` dictionary. | ||
| /// | ||
| /// # Returns | ||
| /// | ||
| /// The decompressed image data as a packed MSB-first 1-bit-per-pixel `Vec<u8>`. | ||
| /// | ||
| /// # Errors | ||
| /// | ||
| /// Returns [`ObjectError::DecompressionError`] if the stream is truncated or | ||
| /// contains an invalid bit pattern. | ||
| pub fn decode_ccitt_fax( | ||
| stream_data: &[u8], | ||
| params: &crate::ccitt::CCITTFaxParams, | ||
| ) -> Result<Vec<u8>, ObjectError> { | ||
| crate::ccitt::decode(stream_data, params) |
There was a problem hiding this comment.
The decode_ccitt_fax docstring says it returns DecompressionError on truncated/invalid bit patterns, but crate::ccitt::decode() currently only errors for columns == 0 and otherwise decodes best-effort (including on truncation). Please align the documentation with the actual behavior (or change the decoder to return errors as documented).
| /// Whether a block terminator (EOFB / RTC) is present. Default: `true`. | ||
| pub end_of_block: bool, | ||
| /// If `true`, black = 1 and white = 0. Default: `false` (white = 1). | ||
| pub black_is1: bool, | ||
| /// Tolerated number of damaged rows before returning an error. Default: `0`. | ||
| pub damaged_rows_before_error: u32, |
There was a problem hiding this comment.
damaged_rows_before_error is parsed from /DecodeParms but never used by the decoder. This is misleading (callers may assume it has an effect). Either implement damaged-row tracking/erroring per the PDF spec, or remove the field / mark it as currently ignored in the docs.
| // ─── Tests ─────────────────────────────────────────────────────────────────── | ||
|
|
||
| #[cfg(test)] | ||
| mod tests { | ||
| use super::*; | ||
|
|
||
| fn params_1d(columns: u32, rows: u32) -> CCITTFaxParams { | ||
| CCITTFaxParams { | ||
| k: 0, | ||
| columns, | ||
| rows, | ||
| end_of_line: false, | ||
| encoded_byte_align: false, | ||
| end_of_block: true, | ||
| black_is1: false, | ||
| damaged_rows_before_error: 0, | ||
| } | ||
| } | ||
|
|
||
| fn params_g4(columns: u32, rows: u32) -> CCITTFaxParams { | ||
| CCITTFaxParams { | ||
| k: -1, | ||
| columns, | ||
| rows, | ||
| end_of_line: false, | ||
| encoded_byte_align: false, | ||
| end_of_block: true, | ||
| black_is1: false, | ||
| damaged_rows_before_error: 0, | ||
| } | ||
| } |
There was a problem hiding this comment.
PR description says the unit tests cover EOL handling, mixed 1D rows, and Group 4 2D modes (V0/Horizontal, row limit), but the tests in this file currently only cover fill_bits/find_bit basics, a few simple 1D cases, and a Group 4 empty-stream case (no 2D mode coverage). Either expand the tests to match the stated coverage (especially around 2D decoding and end-of-block handling) or update the PR description to reflect the current test scope.
…safety - Introduce CcittDecoder struct encapsulating BitReader, row buffers, and decode configuration; eliminates parameter threading across decode_1d_row/decode_2d_row - Extract clamp_to_usize() helper replacing 7 repeated usize::try_from(x.max(0)).unwrap_or(0) call sites - Replace manual iterator loops with idiomatic slice::fill and for_each - Remove dead EOL-recovery loop in decode_1d_row (always returned Err) - Move misplaced fill_bits/find_bit/find_b1_b2 tests from bitreader.rs into ccitt.rs where those functions are accessible (fixes compilation) - Add From<CcittDecodeError> for ObjectError; replace .unwrap() with ? in filter.rs decode_ccitt_fax to avoid panics on decode failure - Fix empty-line-after-doc clippy warning - Fix incorrect G3-2D test constant (0xC8 -> 0xCC) - Add 12 new edge-case unit tests (clamp_to_usize, ref_pixel, one_lead_pos, fill_bits boundaries, find_bit skip path, multi-row, error conversion) - Add doc comments to all public items Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace i32 sentinel (-1) with Option<usize> for the a0 position tracker in decode_2d_row. This removes all usize↔i32 conversions from the method: - Change a0 from i32 to Option<usize> (None = before first pixel) - Change find_b1_b2 to accept Option<usize> instead of i32 - Change decode_run_seq return type from Result<i32> to Result<usize> - Add apply_delta(usize, i32) helper for signed vertical offsets - Remove columns_i32 field from CcittDecoder - Remove clamp_to_usize function and its tests (zero remaining callers) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ticalLeft(usize) Replace the signed Vertical(i32) variant with two unsigned variants: - VerticalRight(usize) for non-negative deltas (a1 = b1 + delta) - VerticalLeft(usize) for negative deltas (a1 = b1 - delta) This eliminates the apply_delta helper and its usize::try_from conversions, using checked_add/checked_sub directly at the call site. Also fix pre-existing issues in ccitt_fax_params.rs: - Rename DEFFAULT_IMAGE_WIDTH → DEFAULT_IMAGE_WIDTH (typo) - Change columns field and test helpers from u32 to usize Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Handle monotonicity violations gracefully in decode_2d_row: VerticalRight overflow clamps to columns, VerticalLeft underflow uses saturating_sub, and position regressions end the row early instead of returning an error. - Make decode_all do best-effort recovery: row-level errors (InvalidCode, UnknownExtensionCode, etc.) include the partial row and continue; UnexpectedEof includes partial row then stops. - Wire up damaged_rows_before_error parameter (0 = unlimited per PDF spec §7.4.6). - Add tests for regression leniency, extension tolerance, and the damaged-row threshold. Fixes rendering of PDF32000_2008.pdf which contains CCITT-encoded images that trigger position regressions during 2D decoding. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Replace ONE_LEAD_POS[256] table with u8::leading_zeros() (single hardware instruction on all targets) - Optimize find_bit to scan 64 bits at a time via u64::from_be_bytes + u64::leading_zeros instead of byte-by-byte with 8-byte skip loop - Simplify fill_bits mask computation: replace branchy if/else with branchless shift expressions and clearer start_mask/end_mask naming Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move stream decompression filters (FlateDecode, DCTDecode, JPXDecode, CCITTFaxDecode, ASCII85Decode) from pdf-object into a new pdf-filter crate. Streams are now decoded at ObjectCollection insertion time, simplifying StreamObject to hold only decoded data. Key changes: - Add pdf-filter crate with FilterError, Filter enum with Display impl - Replace println! for unsupported filters with proper error return - Remove unused pdf-filter dep from pdf-page - Update StreamObject docs to reflect decoded-at-insertion semantics Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the per-iteration ccitt_params_for_filter() helper with a parse_decode_params() function that reads /DecodeParms once and returns a Vec<DecodeParms> aligned 1:1 with the filter list. - Add DecodeParms enum with None and CcittFax variants - Add parse_decode_params() to parse all params before the loop - Refactor decode() loop to zip filters with pre-parsed params - Delete ccitt_params_for_filter() function Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
12a4878 to
8b82825
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 22 out of 23 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /// Stream data is decoded (decompressed) at insertion time by | ||
| /// [`ObjectCollection`], so the [`data`](Self::data) field always contains the | ||
| /// fully decoded bytes. |
There was a problem hiding this comment.
The type-level docs claim stream data is always decoded at insertion time by ObjectCollection, but StreamObject::new is public and can be constructed with raw (still-filtered) bytes outside of an ObjectCollection context. To avoid misleading API consumers, consider rewording this to describe the behavior when streams are stored in an ObjectCollection (or otherwise clarify that StreamObject itself does not enforce decoding).
| /// | ||
| /// # Errors | ||
| /// | ||
| /// Returns [`FilterError`] if any filter in the chain fails or is unsupported. | ||
| pub fn decode(stream: &StreamObject) -> Result<Cow<'_, [u8]>, FilterError> { | ||
| let mut data: Cow<'_, [u8]> = Cow::Borrowed(&stream.data); | ||
| let objects = PassthroughResolver; | ||
| let filters = Filter::from_dictionary(&stream.dictionary, &objects)?; | ||
|
|
||
| let Some(filters) = &filters else { | ||
| return Ok(data); | ||
| }; | ||
|
|
||
| let decode_params = parse_decode_params(&stream.dictionary, filters, &objects); |
There was a problem hiding this comment.
decode() currently hard-codes PassthroughResolver, so any /Filter value that is an indirect reference (or contains indirect references) will not be resolved and filter parsing will fail or silently degrade. Consider taking an &dyn ObjectResolver (or providing a decode_with_resolver) so callers with an ObjectCollection can correctly resolve referenced filter metadata.
| /// | |
| /// # Errors | |
| /// | |
| /// Returns [`FilterError`] if any filter in the chain fails or is unsupported. | |
| pub fn decode(stream: &StreamObject) -> Result<Cow<'_, [u8]>, FilterError> { | |
| let mut data: Cow<'_, [u8]> = Cow::Borrowed(&stream.data); | |
| let objects = PassthroughResolver; | |
| let filters = Filter::from_dictionary(&stream.dictionary, &objects)?; | |
| let Some(filters) = &filters else { | |
| return Ok(data); | |
| }; | |
| let decode_params = parse_decode_params(&stream.dictionary, filters, &objects); | |
| /// For streams whose filter metadata may contain indirect references, use | |
| /// [`decode_with_resolver`] to provide an [`ObjectResolver`]. | |
| /// | |
| /// # Errors | |
| /// | |
| /// Returns [`FilterError`] if any filter in the chain fails or is unsupported. | |
| pub fn decode(stream: &StreamObject) -> Result<Cow<'_, [u8]>, FilterError> { | |
| let objects = PassthroughResolver; | |
| decode_with_resolver(stream, &objects) | |
| } | |
| /// Decodes a [`StreamObject`] by applying its full filter chain using the | |
| /// provided [`ObjectResolver`] for `/Filter` and `/DecodeParms` metadata. | |
| /// | |
| /// This should be used when the stream dictionary may contain indirect | |
| /// references that need to be resolved before parsing the filter chain. | |
| /// | |
| /// # Errors | |
| /// | |
| /// Returns [`FilterError`] if any filter in the chain fails or is unsupported. | |
| pub fn decode_with_resolver( | |
| stream: &StreamObject, | |
| objects: &dyn ObjectResolver, | |
| ) -> Result<Cow<'_, [u8]>, FilterError> { | |
| let mut data: Cow<'_, [u8]> = Cow::Borrowed(&stream.data); | |
| let filters = Filter::from_dictionary(&stream.dictionary, objects)?; | |
| let Some(filters) = &filters else { | |
| return Ok(data); | |
| }; | |
| let decode_params = parse_decode_params(&stream.dictionary, filters, objects); |
| let params_entry = dict.get("DecodeParms"); | ||
|
|
||
| // Pre-extract per-index dictionaries from the `/DecodeParms` value. | ||
| let param_dicts: Vec<Option<&Dictionary>> = match params_entry { | ||
| Some(ObjectVariant::Dictionary(d)) => { | ||
| // Single dict applies to all filters (common single-filter case). | ||
| vec![Some(d); filters.len()] | ||
| } | ||
| Some(ObjectVariant::Array(arr)) => filters | ||
| .iter() | ||
| .enumerate() | ||
| .map(|(i, _)| { | ||
| arr.get(i).and_then(|v| { | ||
| if let ObjectVariant::Dictionary(d) = v { | ||
| Some(d.as_ref()) | ||
| } else { | ||
| None | ||
| } | ||
| }) | ||
| }) | ||
| .collect(), | ||
| _ => vec![None; filters.len()], | ||
| }; | ||
|
|
||
| filters | ||
| .iter() | ||
| .zip(param_dicts.iter()) | ||
| .map(|(filter, param_dict)| match (filter, param_dict) { | ||
| (Filter::CCITTFaxDecode, Some(d)) => { | ||
| let p = CCITTFaxParams::from_dictionary(d, objects).unwrap_or_default(); | ||
| DecodeParms::CcittFax(p) | ||
| } | ||
| (Filter::CCITTFaxDecode, None) => DecodeParms::CcittFax(CCITTFaxParams::default()), | ||
| _ => DecodeParms::None, |
There was a problem hiding this comment.
parse_decode_params only handles /DecodeParms when it is embedded directly as a Dictionary/Array in the stream dictionary; it does not resolve references (e.g. /DecodeParms 12 0 R or array elements that are references/null). This can cause CCITT decoding to use default parameters even when the PDF provides explicit decode parms. Consider resolving the /DecodeParms object (and array elements) via objects.resolve_object(...) before pattern matching.
| let params_entry = dict.get("DecodeParms"); | |
| // Pre-extract per-index dictionaries from the `/DecodeParms` value. | |
| let param_dicts: Vec<Option<&Dictionary>> = match params_entry { | |
| Some(ObjectVariant::Dictionary(d)) => { | |
| // Single dict applies to all filters (common single-filter case). | |
| vec![Some(d); filters.len()] | |
| } | |
| Some(ObjectVariant::Array(arr)) => filters | |
| .iter() | |
| .enumerate() | |
| .map(|(i, _)| { | |
| arr.get(i).and_then(|v| { | |
| if let ObjectVariant::Dictionary(d) = v { | |
| Some(d.as_ref()) | |
| } else { | |
| None | |
| } | |
| }) | |
| }) | |
| .collect(), | |
| _ => vec![None; filters.len()], | |
| }; | |
| filters | |
| .iter() | |
| .zip(param_dicts.iter()) | |
| .map(|(filter, param_dict)| match (filter, param_dict) { | |
| (Filter::CCITTFaxDecode, Some(d)) => { | |
| let p = CCITTFaxParams::from_dictionary(d, objects).unwrap_or_default(); | |
| DecodeParms::CcittFax(p) | |
| } | |
| (Filter::CCITTFaxDecode, None) => DecodeParms::CcittFax(CCITTFaxParams::default()), | |
| _ => DecodeParms::None, | |
| let params_entry = dict | |
| .get("DecodeParms") | |
| .and_then(|value| objects.resolve_object(value).ok()); | |
| filters | |
| .iter() | |
| .enumerate() | |
| .map(|(i, filter)| { | |
| let param_dict = match params_entry.as_deref() { | |
| Some(ObjectVariant::Dictionary(d)) => Some(d.as_ref()), | |
| Some(ObjectVariant::Array(arr)) => arr | |
| .get(i) | |
| .and_then(|value| objects.resolve_object(value).ok()) | |
| .and_then(|resolved| match resolved.as_ref() { | |
| ObjectVariant::Dictionary(d) => Some(d.as_ref()), | |
| ObjectVariant::Null => None, | |
| _ => None, | |
| }), | |
| _ => None, | |
| }; | |
| match (filter, param_dict) { | |
| (Filter::CCITTFaxDecode, Some(d)) => { | |
| let p = CCITTFaxParams::from_dictionary(d, objects).unwrap_or_default(); | |
| DecodeParms::CcittFax(p) | |
| } | |
| (Filter::CCITTFaxDecode, None) => { | |
| DecodeParms::CcittFax(CCITTFaxParams::default()) | |
| } | |
| _ => DecodeParms::None, | |
| } |
| ObjectVariant::Stream(stream) => { | ||
| let data = pdf_filter::filter::decode(&stream)?.to_vec(); | ||
| let StreamObject { |
There was a problem hiding this comment.
Decoding streams during ObjectCollection::insert means the persisted stream data depends on pdf_filter::filter::decode, which currently cannot resolve indirect /Filter or /DecodeParms entries (it uses a passthrough resolver). This will store incorrectly decoded bytes for PDFs that use referenced filter metadata. Consider deferring decoding until the collection is populated (second pass) or otherwise providing decode with a resolver that can follow references in the collection.
| struct CcittDecoder<'a> { | ||
| reader: BitReader<'a>, | ||
| ref_row: Vec<u8>, | ||
| row_buf: Vec<u8>, | ||
| columns: usize, | ||
| k: i32, | ||
| end_of_line: bool, | ||
| byte_align: bool, | ||
| black_is1: bool, | ||
| /// Maximum number of damaged rows before aborting. | ||
| /// `0` means tolerate all damaged rows (PDF spec §7.4.6). | ||
| damaged_rows_before_error: u32, | ||
| } | ||
|
|
||
| impl<'a> CcittDecoder<'a> { | ||
| /// Construct a new decoder from raw stream data and CCITT parameters. | ||
| /// | ||
| /// # Errors | ||
| /// | ||
| /// Returns [`CcittDecodeError::ZeroColumns`] if `params.columns` is zero. | ||
| fn new(data: &'a [u8], params: &CCITTFaxParams) -> Result<Self, CcittDecodeError> { | ||
| if params.columns == 0 { | ||
| return Err(CcittDecodeError::ZeroColumns); | ||
| } | ||
| let row_bytes = params.columns.div_ceil(8); | ||
|
|
||
| Ok(Self { | ||
| reader: BitReader::new(data), | ||
| ref_row: vec![0xff; row_bytes], | ||
| row_buf: vec![0xff; row_bytes], | ||
| columns: params.columns, | ||
| k: params.k, | ||
| end_of_line: params.end_of_line, | ||
| byte_align: params.encoded_byte_align, | ||
| black_is1: params.black_is1, | ||
| damaged_rows_before_error: params.damaged_rows_before_error, | ||
| }) |
There was a problem hiding this comment.
CCITTFaxParams includes end_of_block, but the decoder never stores or consults it, so the implementation cannot stop cleanly on RTC/EOFB markers when Rows = 0 (the default). This can lead to extra garbage rows or spurious row-level errors after the actual image data ends. Consider wiring end_of_block into CcittDecoder and terminating decoding when the appropriate end-of-block marker is encountered.
| TwoDMode::Horizontal => { | ||
| let run_len1 = decode_run_seq(&mut self.reader, a0color)?; | ||
| let (a0_fill, a1) = match a0 { | ||
| None => (0, run_len1.saturating_add(1)), |
There was a problem hiding this comment.
In 2D Horizontal mode, the a0 == None case sets a1 = run_len1 + 1, which is inconsistent with the Some(pos) branch (a1 = pos + run_len1) and is very likely an off-by-one that shifts the first pair of runs by one pixel when Horizontal is the first opcode in a row. Consider making the None case follow the same coordinate convention as the Some case so a1 advances by exactly run_len1 pixels from the start of the line.
| None => (0, run_len1.saturating_add(1)), | |
| None => (0, run_len1), |
| } | ||
|
|
||
| TwoDMode::EndOfBlock => { | ||
| // First EOL of EOFB; bits already consumed in read_2d_mode. |
There was a problem hiding this comment.
TwoDMode::EndOfBlock is currently a no-op, so the 2D row decoder loop continues without advancing a0/a0color, which can produce incorrect output instead of terminating at the EOFB marker. If end-of-block markers are supported, consider treating this mode as an explicit termination condition for the current row/stream (e.g. return Ok(()) and let the outer loop stop).
| // First EOL of EOFB; bits already consumed in read_2d_mode. | |
| // EOFB marker consumed by read_2d_mode; terminate decoding | |
| // instead of continuing with stale a0/a0color state. | |
| return Ok(()); |
| #[cfg(test)] | ||
| mod tests { | ||
| use super::*; | ||
|
|
||
| #[test] | ||
| fn test_decode_ascii85_basic() { | ||
| // "Man " → known ASCII85 encoding "9jqo^" | ||
| let man_encoded = b"9jqo^~>"; | ||
| let decoded = decode_ascii85(man_encoded).expect("decode failed"); | ||
| assert_eq!(decoded, b"Man "); |
There was a problem hiding this comment.
The ASCII85 tests use expect(...) without a #[allow(clippy::expect_used)] / #[allow(clippy::unwrap_used)] guard. Since the workspace Clippy config denies expect_used/unwrap_used, cargo clippy --all-targets will likely fail. Consider adding the appropriate #[allow(...)] on the tests module (as done elsewhere in the repo) or rewriting tests to return Result and use ? instead of expect/unwrap.
| #[cfg(test)] | ||
| mod tests { | ||
| use super::*; | ||
|
|
||
| #[test] | ||
| fn fill_bits_full_byte() { | ||
| let mut row = [0xffu8; 1]; |
There was a problem hiding this comment.
The CCITT test module uses expect(...) / unwrap_err() without a #[allow(clippy::expect_used)] / #[allow(clippy::unwrap_used)] guard. Given the workspace Clippy config denies these lints, this will likely fail cargo clippy --all-targets. Consider adding the allow attributes on the tests module or rewriting tests to avoid expect/unwrap.
Add
pub mod ccittwith a pure-Rust CCITTFaxDecode decoder supporting all three encoding modes: Group 3 1D (K=0), Group 3 2D (K>0), and Group 4 / MMR (K<0). Wire it intoStreamObject::data()with accitt_params_for_filter()helper that reads/DecodeParmsfor both single-filter and multi-filter streams.The implementation embeds Huffman tables from ITU-T T.4 Appendix A (WHITE/BLACK terminating and make-up codes, common extended make-up codes) and a T.6 mode-code decoder (Pass, Horizontal, Vertical ±0–3).
CCITTFaxParamsis parsed directly from PDF dictionary entries without requiring anObjectResolver.Includes 14 unit tests covering params defaults, bit reader, Huffman run-length decoding, EOL handling, 1D rows (all-white, all-black, mixed), 2D Group 4 rows (V0 / Horizontal modes, all-white, row limit), and row packing with both polarities.