fix: preserve dictionary object numbers and resolve glyphs via encoding #125

Merged: Velli20 merged 1 commit into main from fix-obj-num on Apr 11, 2026
Velli20 (Owner) commented Apr 9, 2026

Inline dictionaries should not default to object number 0. Make Dictionary::object_number optional, assign it when an indirect object is inserted into ObjectCollection, and only populate resource caches when a real object number exists. This avoids bogus cache hits and lets try_object_number report missing numbers explicitly.
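
The object-number change described above can be sketched roughly as follows. This is a minimal illustration, not the crate's code: the field layout, the `MissingObjectNumber` error type, and the `insert`/`get` signatures are assumptions made for the example. Only the names `Dictionary`, `object_number`, `ObjectCollection`, and `try_object_number` come from the description.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the parser's dictionary type.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Dictionary {
    /// `None` for inline dictionaries, `Some(n)` once stored as indirect object `n`.
    pub object_number: Option<usize>,
    /// Simplified entries; the real type stores parsed PDF objects.
    pub entries: HashMap<String, String>,
}

/// Hypothetical error reported when a dictionary has no object number.
#[derive(Debug, PartialEq)]
pub struct MissingObjectNumber;

impl Dictionary {
    /// Reports a missing number explicitly instead of defaulting to 0.
    pub fn try_object_number(&self) -> Result<usize, MissingObjectNumber> {
        self.object_number.ok_or(MissingObjectNumber)
    }
}

/// Hypothetical stand-in for the collection of indirect objects.
#[derive(Default)]
pub struct ObjectCollection {
    objects: HashMap<usize, Dictionary>,
}

impl ObjectCollection {
    /// The number is assigned at insertion time, not during parsing.
    pub fn insert(&mut self, object_number: usize, mut dict: Dictionary) {
        dict.object_number = Some(object_number);
        self.objects.insert(object_number, dict);
    }

    pub fn get(&self, object_number: usize) -> Option<&Dictionary> {
        self.objects.get(&object_number)
    }
}
```

With this shape, an inline dictionary can never be confused with indirect object 0, because it carries no number at all until the collection assigns one.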

For embedded TrueType fonts, consult the text state's glyph name, map it through the Adobe Glyph List, and resolve it through the cached skrifa charmap before falling back to direct Unicode and raw glyph IDs. This follows PDF encoding-based glyph selection more closely for non-CID fonts.

Inline dictionaries should not default to object number 0. Make
Dictionary::object_number optional, assign it when an indirect object
is inserted into ObjectCollection, and only populate resource caches
when a real object number exists. This avoids bogus cache hits and
lets try_object_number report missing numbers explicitly.

For embedded TrueType fonts, consult the text state's glyph name, map
it through the Adobe Glyph List, and resolve it through the cached
skrifa charmap before falling back to direct Unicode and raw glyph
IDs. This follows PDF encoding-based glyph selection more closely for
non-CID fonts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

This PR adjusts PDF object-number tracking to avoid treating inline dictionaries as object 0, and refines TrueType glyph resolution for non-CID fonts by preferring encoding-derived glyph names mapped through the Adobe Glyph List (AGL) into the font’s cmap.

Changes:

  • Make Dictionary::object_number optional and assign it when indirect objects are inserted into ObjectCollection.
  • Update resource caching to only key caches when a real dictionary object number exists, avoiding bogus cache hits.
  • Rework non-CID TrueType glyph selection to use /Encoding glyph names → AGL → cmap, with fallbacks.
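
The cache gating in the second bullet might look something like the sketch below. `FontCache`, `Font`, `get_or_load`, and the `loads` counter are hypothetical names invented for illustration; the actual code in `resources.rs` is not shown in this review and may differ.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified font handle.
#[derive(Debug, Clone, PartialEq)]
pub struct Font {
    pub name: String,
}

/// Hypothetical cache keyed by the dictionary's object number.
#[derive(Default)]
pub struct FontCache {
    by_object_number: HashMap<usize, Font>,
    /// Counts how often a font had to be built rather than served from cache.
    pub loads: usize,
}

impl FontCache {
    /// Consults and populates the cache only when a real object number exists.
    pub fn get_or_load(&mut self, object_number: Option<usize>, name: &str) -> Font {
        if let Some(n) = object_number {
            if let Some(font) = self.by_object_number.get(&n) {
                return font.clone();
            }
        }

        // Stand-in for the expensive step of building a font from its dictionary.
        self.loads += 1;
        let font = Font { name: name.to_owned() };

        // Inline dictionaries (None) are never cached, so they cannot collide.
        if let Some(n) = object_number {
            self.by_object_number.insert(n, font.clone());
        }
        font
    }
}
```

If every inline dictionary defaulted to object number 0, two unrelated inline font dictionaries would share one cache slot and the second lookup would return the first font. Keying only on `Some(n)` removes that collision at the cost of rebuilding inline resources each time.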

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `crates/pdf-parser/src/indirect_object.rs` | Stops forcing dictionary object numbers during parse; defers assignment to collection insertion. |
| `crates/pdf-page/src/resources.rs` | Gates font/extGState caching on `Dictionary::object_number` being present. |
| `crates/pdf-object/src/object_variant.rs` | Makes `try_object_number()` report missing dictionary numbers explicitly. |
| `crates/pdf-object/src/dictionary.rs` | Changes dictionary object number storage to `Option<usize>` (default `None`). |
| `crates/pdf-object-collection/src/object_collection.rs` | Assigns dictionary object numbers when inserting indirect objects into the collection. |
| `crates/pdf-canvas/src/truetype_font_renderer.rs` | Adds cmap caching (`Charmap`) and switches glyph resolution to encoding-name-based lookup via AGL. |

Comment on lines +65 to 83
fn resolve_glyph_id(charmap: &Charmap<'_>, char_code: u16, glyph_name: Option<&str>) -> GlyphId {
    // Step 1: Encoding -> glyph name -> Unicode (via AGL) -> cmap
    if let Some(name) = glyph_name
        && let Some(unicode_char) = glyph_name_to_unicode(name)
        && let Some(gid) = charmap.map(unicode_char)
    {
        return gid;
    }

    // Step 2: treat char_code as a Unicode codepoint directly
    if let Some(unicode_char) = char::from_u32(u32::from(char_code))
        && let Some(gid) = charmap.map(unicode_char)
    {
        return gid;
    }

    // Step 3: use the character code as a raw glyph index
    GlyphId::new(u32::from(char_code))
}
Copilot AI Apr 9, 2026
resolve_glyph_id dropped the previous Symbol cmap fallbacks (e.g. probing U+F000/U+F100 + code). For TrueType fonts that only expose a Microsoft Symbol cmap (common for Symbol/ZapfDingbats-like fonts) and/or when the PDF /Encoding can't provide a glyph name, steps (1) and (2) will likely miss, and step (3) will incorrectly treat the PDF char code as a raw glyph index. Consider adding an additional fallback before step (3) that maps char_code into the private-use ranges (e.g. U+F000 and U+F100) and probes charmap with those codepoints, preserving the previous behavior for symbol-encoded fonts.
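
One possible shape for the suggested fallback is sketched below. It is an assumption-laden illustration, not the project's previous implementation: `ToyCharmap` is a `HashMap` standing in for the real `Charmap`, `probe_symbol_cmap` is a hypothetical helper name, and restricting the probe to single-byte codes is a choice made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a font cmap: Unicode scalar value -> glyph index.
pub type ToyCharmap = HashMap<char, u32>;

/// Probes the private-use bases named in the review (U+F000, then U+F100)
/// for a single-byte character code, returning the first glyph index found.
pub fn probe_symbol_cmap(charmap: &ToyCharmap, char_code: u16) -> Option<u32> {
    const SYMBOL_BASES: [u32; 2] = [0xF000, 0xF100];

    // Assumption: only single-byte codes are shifted into the private-use area.
    if char_code > 0xFF {
        return None;
    }

    SYMBOL_BASES.iter().find_map(|&base| {
        let candidate = char::from_u32(base + u32::from(char_code))?;
        charmap.get(&candidate).copied()
    })
}
```

In `resolve_glyph_id`, a probe like this would run after step 2 misses and before step 3, so the raw-glyph-index interpretation is used only when no cmap lookup succeeds.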

@Velli20 Velli20 merged commit f65a441 into main Apr 11, 2026
8 checks passed
@Velli20 Velli20 deleted the fix-obj-num branch April 11, 2026 08:14