fix: preserve dictionary object numbers and resolve glyphs via encoding#125
fix: preserve dictionary object numbers and resolve glyphs via encoding#125
Conversation
Inline dictionaries should not default to object number 0. Make Dictionary::object_number optional, assign it when an indirect object is inserted into ObjectCollection, and only populate resource caches when a real object number exists. This avoids bogus cache hits and lets try_object_number report missing numbers explicitly. For embedded TrueType fonts, consult the text state's glyph name, map it through the Adobe Glyph List, and resolve it through the cached skrifa charmap before falling back to direct Unicode and raw glyph IDs. This follows PDF encoding-based glyph selection more closely for non-CID fonts. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR adjusts PDF object-number tracking to avoid treating inline dictionaries as object 0, and refines TrueType glyph resolution for non-CID fonts by preferring encoding-derived glyph names mapped through the Adobe Glyph List (AGL) into the font’s cmap.
Changes:
- Make
Dictionary::object_numberoptional and assign it when indirect objects are inserted intoObjectCollection. - Update resource caching to only key caches when a real dictionary object number exists, avoiding bogus cache hits.
- Rework non-CID TrueType glyph selection to use
/Encodingglyph names → AGL → cmap, with fallbacks.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
crates/pdf-parser/src/indirect_object.rs |
Stops forcing dictionary object numbers during parse; defers assignment to collection insertion. |
crates/pdf-page/src/resources.rs |
Gates font/extGState caching on Dictionary::object_number being present. |
crates/pdf-object/src/object_variant.rs |
Makes try_object_number() report missing dictionary numbers explicitly. |
crates/pdf-object/src/dictionary.rs |
Changes dictionary object number storage to Option<usize> (default None). |
crates/pdf-object-collection/src/object_collection.rs |
Assigns dictionary object numbers when inserting indirect objects into the collection. |
crates/pdf-canvas/src/truetype_font_renderer.rs |
Adds cmap caching (Charmap) and switches glyph resolution to encoding-name-based lookup via AGL. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| fn resolve_glyph_id(charmap: &Charmap<'_>, char_code: u16, glyph_name: Option<&str>) -> GlyphId { | ||
| // Step 1: Encoding -> glyph name -> Unicode (via AGL) -> cmap | ||
| if let Some(name) = glyph_name | ||
| && let Some(unicode_char) = glyph_name_to_unicode(name) | ||
| && let Some(gid) = charmap.map(unicode_char) | ||
| { | ||
| return gid; | ||
| } | ||
|
|
||
| if let Some(id) = resolved { | ||
| return id; | ||
| // Step 2: treat char_code as a Unicode codepoint directly | ||
| if let Some(unicode_char) = char::from_u32(u32::from(char_code)) | ||
| && let Some(gid) = charmap.map(unicode_char) | ||
| { | ||
| return gid; | ||
| } | ||
|
|
||
| // Step 3: use the character code as a raw glyph index | ||
| GlyphId::new(u32::from(char_code)) | ||
| } |
There was a problem hiding this comment.
resolve_glyph_id dropped the previous Symbol cmap fallbacks (e.g. probing U+F000/U+F100 + code). For TrueType fonts that only expose a Microsoft Symbol cmap (common for Symbol/ZapfDingbats-like fonts) and/or when the PDF /Encoding can't provide a glyph name, steps (1) and (2) will likely miss, and step (3) will incorrectly treat the PDF char code as a raw glyph index. Consider adding an additional fallback before step (3) that maps char_code into the private-use ranges (e.g. U+F000 and U+F100) and probes charmap with those codepoints, preserving the previous behavior for symbol-encoded fonts.
Inline dictionaries should not default to object number 0. Make Dictionary::object_number optional, assign it when an indirect object is inserted into ObjectCollection, and only populate resource caches when a real object number exists. This avoids bogus cache hits and lets try_object_number report missing numbers explicitly.
For embedded TrueType fonts, consult the text state's glyph name, map it through the Adobe Glyph List, and resolve it through the cached skrifa charmap before falling back to direct Unicode and raw glyph IDs. This follows PDF encoding-based glyph selection more closely for non-CID fonts.