Context
pdftable currently has no public accessor for a PDF's /Outlines (bookmark) tree. Consumers that want bookmark titles for section-tree construction must keep a parallel PDF reader. In vectorless-engine, integrating pdftable as the primary parser still required keeping github.com/ledongthuc/pdf as a secondary dependency purely for reader.Outline().
Proposed API
type OutlineEntry struct {
    Title string
    Level int // 0 = top-level
    Page  int // 1-indexed
}

func (d *Document) Outline() ([]OutlineEntry, error)
Flat list (depth-first, pre-order) with Level so callers can rebuild the tree. Returns an empty slice + nil error when the PDF has no /Outlines dict.
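A rough sketch of how a caller might rebuild the tree from the flat list, assuming the entries arrive in depth-first pre-order as described. Node and BuildTree are hypothetical names used only for illustration; they are not part of the proposed API.

```go
package main

import "fmt"

// OutlineEntry mirrors the struct proposed in this issue.
type OutlineEntry struct {
	Title string
	Level int // 0 = top-level
	Page  int // 1-indexed
}

// Node is a hypothetical caller-side tree node (not part of pdftable).
type Node struct {
	Entry    OutlineEntry
	Children []*Node
}

// BuildTree rebuilds the hierarchy from a depth-first, pre-order flat list.
func BuildTree(entries []OutlineEntry) []*Node {
	root := &Node{} // synthetic root that holds the top-level entries
	// stack[i+1] is the most recent node seen at level i; stack[0] is root.
	stack := []*Node{root}
	for _, e := range entries {
		level := e.Level
		if level < 0 {
			level = 0
		}
		// Clamp level jumps (e.g. 0 -> 2) so the entry attaches to the
		// deepest node that is currently open.
		if level > len(stack)-1 {
			level = len(stack) - 1
		}
		stack = stack[:level+1]
		n := &Node{Entry: e}
		stack[level].Children = append(stack[level].Children, n)
		stack = append(stack, n)
	}
	return root.Children
}

// printTree writes the tree with two spaces of indentation per level.
func printTree(nodes []*Node, depth int) {
	for _, n := range nodes {
		fmt.Printf("%*s%s (p.%d)\n", depth*2, "", n.Entry.Title, n.Entry.Page)
		printTree(n.Children, depth+1)
	}
}

func main() {
	flat := []OutlineEntry{
		{Title: "Introduction", Level: 0, Page: 1},
		{Title: "Background", Level: 1, Page: 2},
		{Title: "Methods", Level: 0, Page: 5},
	}
	printTree(BuildTree(flat), 0)
}
```

A single stack is enough because pre-order guarantees that an entry's parent is always the most recent entry one level up.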
Why this matters
- Eliminates the need for downstream consumers to carry a second PDF library.
- Vectorless-engine's PDF parser can do the entire ingest pipeline (chars, words, tables, outline) through pdftable.
- The implementation is mechanical: walk Catalog -> Outlines -> First/Next/Title/Dest -> Page. The Adobe spec covers this in PDF 1.7 §12.3.3.
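The First/Next walk could look roughly like the sketch below. It runs on a hypothetical in-memory outlineItem type that stands in for a parsed outline item dictionary. The Page field is assumed to be already resolved from /Dest, and none of these names reflect pdftable's actual internals.

```go
package main

import "fmt"

// OutlineEntry mirrors the struct proposed in this issue.
type OutlineEntry struct {
	Title string
	Level int // 0 = top-level
	Page  int // 1-indexed
}

// outlineItem is a hypothetical stand-in for a PDF outline item dictionary.
type outlineItem struct {
	Title string
	Page  int          // assumed already resolved from /Dest
	First *outlineItem // /First: first child
	Next  *outlineItem // /Next: next sibling
}

// flatten walks siblings via Next and children via First, emitting entries
// in depth-first pre-order. It returns an empty, non-nil slice when there
// are no items.
func flatten(first *outlineItem) []OutlineEntry {
	out := []OutlineEntry{}
	seen := map[*outlineItem]bool{} // guards against cycles in malformed files
	var walk func(item *outlineItem, level int)
	walk = func(item *outlineItem, level int) {
		for ; item != nil; item = item.Next {
			if seen[item] {
				return
			}
			seen[item] = true
			out = append(out, OutlineEntry{Title: item.Title, Level: level, Page: item.Page})
			walk(item.First, level+1)
		}
	}
	walk(first, 0)
	return out
}

func main() {
	// A 3-entry outline nested 2 levels deep.
	background := &outlineItem{Title: "Background", Page: 2}
	methods := &outlineItem{Title: "Methods", Page: 5}
	intro := &outlineItem{Title: "Introduction", Page: 1, First: background, Next: methods}

	for _, e := range flatten(intro) {
		fmt.Printf("level=%d page=%d title=%s\n", e.Level, e.Page, e.Title)
	}
}
```

The seen map is a defensive choice: real-world PDFs sometimes contain outline items whose Next or First pointers loop back, and an unguarded walk would never terminate on them.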
Acceptance
- Existing tests + golden fixtures unchanged.
- New test fixture: a PDF with a 3-entry outline (nested 2 levels deep). Assert Outline() returns the flat list with correct Level + Page.
- pdfplumber's reference behaviour: pdfminer.six exposes outlines via PDFDocument.get_outlines(). Mirror what it returns.
Out of scope for this issue
- Outline mutation / writing.
- Outline destinations that are XYZ-with-zoom or fit-rect: for now, just emit the page number; callers can call Page.Bbox or similar themselves.
Filed during the vectorless-engine integration of pdftable v0.3.0 (PR hallelx2/vectorless-engine#20).