feat: Document.Outline() accessor for PDF bookmarks (/Outlines) #5

@hallelx2

Context

pdftable currently has no public accessor for a PDF's /Outlines (bookmark) tree. Consumers that want bookmark titles for section-tree construction must keep a parallel PDF reader. In vectorless-engine, integrating pdftable as the primary parser still required keeping github.com/ledongthuc/pdf as a secondary dependency purely for reader.Outline().

Proposed API

type OutlineEntry struct {
    Title    string
    Level    int    // 0 = top-level
    Page     int    // 1-indexed
}

func (d *Document) Outline() ([]OutlineEntry, error)

Outline() returns a flat list in depth-first, pre-order traversal; each entry carries its Level so callers can rebuild the tree. When the PDF has no /Outlines dict, it returns an empty slice and a nil error.
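To illustrate how a caller might rebuild the tree from the flat list, here is a minimal sketch. OutlineNode and BuildTree are hypothetical caller-side names, not part of pdftable, and clamping a Level that skips a depth is an assumption made here for robustness.

```go
package main

import "fmt"

// OutlineEntry mirrors the struct proposed in this issue.
type OutlineEntry struct {
	Title string
	Level int // 0 = top-level
	Page  int // 1-indexed
}

// OutlineNode is a hypothetical caller-side tree node (not a pdftable type).
type OutlineNode struct {
	Entry    OutlineEntry
	Children []*OutlineNode
}

// BuildTree rebuilds the hierarchy from a depth-first, pre-order flat list.
func BuildTree(entries []OutlineEntry) []*OutlineNode {
	root := &OutlineNode{}
	// stack[0] is a synthetic root; stack[i+1] is the latest node seen at level i.
	stack := []*OutlineNode{root}
	for _, e := range entries {
		level := e.Level
		if level < 0 {
			level = 0
		}
		// Assumption: a level that skips a depth is clamped to the deepest open node.
		if level > len(stack)-1 {
			level = len(stack) - 1
		}
		stack = stack[:level+1]
		node := &OutlineNode{Entry: e}
		parent := stack[level]
		parent.Children = append(parent.Children, node)
		stack = append(stack, node)
	}
	return root.Children
}

func main() {
	tree := BuildTree([]OutlineEntry{
		{Title: "Chapter 1", Level: 0, Page: 1},
		{Title: "Section 1.1", Level: 1, Page: 2},
		{Title: "Chapter 2", Level: 0, Page: 5},
	})
	for _, n := range tree {
		fmt.Println(n.Entry.Title, "children:", len(n.Children))
	}
}
```

Because the list is pre-order, a single pass with a stack is enough; no lookahead or sorting is needed.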

Why this matters

  • Eliminates the need for downstream consumers to carry a second PDF library.
  • Vectorless-engine's PDF parser can do the entire ingest pipeline (chars, words, tables, outline) through pdftable.
  • The implementation is mechanical: read Catalog -> Outlines -> First/Next/Title/Dest -> Page. The outline structure is specified in PDF 1.7 (ISO 32000-1) §12.3.3, "Document Outline".
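The First/Next walk described in the last bullet could look roughly like the sketch below. It runs over a hypothetical in-memory type, outlineItem, standing in for already-parsed outline dictionaries; pdftable's real object model may differ, and the cycle guard is an assumption added here to cope with malformed files.

```go
package main

import "fmt"

// OutlineEntry mirrors the struct proposed in this issue.
type OutlineEntry struct {
	Title string
	Level int // 0 = top-level
	Page  int // 1-indexed
}

// outlineItem is a hypothetical, already-resolved view of one outline
// dictionary; it is not an existing pdftable type.
type outlineItem struct {
	Title string
	Page  int          // page resolved from /Dest; 0 if unresolved (assumption)
	First *outlineItem // /First: first child
	Next  *outlineItem // /Next: next sibling
}

// flatten walks the First/Next links depth-first, pre-order.
// It returns an empty, non-nil slice when first is nil.
func flatten(first *outlineItem) []OutlineEntry {
	out := []OutlineEntry{}
	seen := map[*outlineItem]bool{} // guards against cyclic /Next or /First links
	var walk func(it *outlineItem, level int)
	walk = func(it *outlineItem, level int) {
		for ; it != nil && !seen[it]; it = it.Next {
			seen[it] = true
			out = append(out, OutlineEntry{Title: it.Title, Level: level, Page: it.Page})
			walk(it.First, level+1)
		}
	}
	walk(first, 0)
	return out
}

func main() {
	ch2 := &outlineItem{Title: "Chapter 2", Page: 5}
	sec := &outlineItem{Title: "Section 1.1", Page: 2}
	ch1 := &outlineItem{Title: "Chapter 1", Page: 1, First: sec, Next: ch2}
	for _, e := range flatten(ch1) {
		fmt.Println(e.Level, e.Title, e.Page)
	}
}
```

Siblings are followed iteratively and only children recurse, so stack depth tracks outline depth rather than the number of entries.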

Acceptance

  • Existing tests + golden fixtures unchanged.
  • New test fixture: a PDF with a 3-entry outline (nested 2 levels deep). Assert Outline() returns the flat list with correct Level + Page.
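  • Reference behaviour: pdfminer.six, which pdfplumber builds on, exposes outlines via PDFDocument.get_outlines(). Mirror the entry order and titles it returns. Note that pdfminer.six numbers top-level entries as level 1, whereas Level here is 0-based.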

Out of scope for this issue

  • Outline mutation / writing.
  • Outline destinations that carry a view (XYZ with zoom, or a fit rectangle): for now, just emit the page number; callers can derive the view themselves, e.g. via Page.Bbox or similar.
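As a rough illustration of "just emit the page number", the sketch below keeps only the page reference from a destination and ignores the view type and its parameters. The destination type, the object-number representation, and returning 0 for an unresolved page are all assumptions for illustration, not pdftable's actual design.

```go
package main

import "fmt"

// destination is a hypothetical parsed form of an explicit destination array,
// such as [pageRef /XYZ left top zoom] or [pageRef /FitR left bottom right top].
type destination struct {
	PageObj int       // object number of the target page (assumed representation)
	Kind    string    // "XYZ", "Fit", "FitR", ...; ignored for this issue
	Args    []float64 // view parameters; ignored for this issue
}

// pageNumber returns the 1-indexed page for d, or 0 when the target page is
// not found. pageOrder lists page object numbers in document order.
func pageNumber(d destination, pageOrder []int) int {
	for i, obj := range pageOrder {
		if obj == d.PageObj {
			return i + 1
		}
	}
	return 0
}

func main() {
	pages := []int{7, 12, 30} // hypothetical page object numbers, in page order
	d := destination{PageObj: 12, Kind: "XYZ", Args: []float64{0, 792, 1.5}}
	fmt.Println(pageNumber(d, pages))
}
```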

Filed during the vectorless-engine integration of pdftable v0.3.0 (PR hallelx2/vectorless-engine#20).
