GitHub - hanyeonjee/evidence-units: Official evaluation code & QA pairs for "Evidence Units: Ontology-Grounded Document Organization for Parser-Independent Retrieval"


Evidence Units is a parser-independent document organization framework that groups visual assets with their contextual text into semantically complete retrieval units. It yields consistent retrieval gains across the document parsers evaluated (ground-truth layouts, MinerU, and Docling).

🔍 What is an Evidence Unit?

An Evidence Unit (EU) is a semantically complete document unit that groups visual assets (tables, charts, figures) with their contextual text (captions, headers, labels, paragraphs). EUs are constructed through ontology-grounded normalization, which works regardless of the document parser used.

```
┌─────────────────────────────────────┐
│  section_header  "2.2 Methods"      │
│  table           [HTML data]        │  ← Evidence Unit
│  unit_label      "(Unit: mg/L)"     │
│  support_para    "As shown above…"  │
└─────────────────────────────────────┘
```
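To make the grouping concrete, here is a minimal Python sketch of the unit above as a data structure. The repository does not publish an EU schema, so the `Element` and `EvidenceUnit` classes, the role vocabulary, and the `retrieval_text` concatenation are hypothetical illustrations, not the paper's representation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical role vocabulary, taken from the diagram above.
VISUAL_ROLES = {"table", "chart", "figure"}


@dataclass
class Element:
    role: str     # e.g. "section_header", "table", "unit_label", "support_para"
    content: str  # plain text, or serialized HTML for tables


@dataclass
class EvidenceUnit:
    elements: List[Element] = field(default_factory=list)

    def anchor(self) -> Optional[Element]:
        """Return the first visual asset in the unit, or None if there is none."""
        return next((e for e in self.elements if e.role in VISUAL_ROLES), None)

    def retrieval_text(self) -> str:
        """Join member contents in reading order into one indexable string."""
        return "\n".join(e.content for e in self.elements)


eu = EvidenceUnit([
    Element("section_header", "2.2 Methods"),
    Element("table", "<table>...</table>"),
    Element("unit_label", "(Unit: mg/L)"),
    Element("support_para", "As shown above..."),
])
print(eu.anchor().role)  # table
```

Indexing `retrieval_text()` rather than each element separately is what lets one retrieval hit return the table together with its header, unit label, and supporting paragraph.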

Key property: EU spatial footprints converge across parsers (MinerU, Docling, etc.) even when individual element bounding boxes differ, which makes downstream retrieval parser-independent.
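One way to picture this property is to compare footprints with intersection-over-union (IoU), assuming an EU's footprint is the union bounding box of its member elements. That definition is an assumption made for illustration, and the two "parsers" and their coordinates below are invented: they segment the same table differently, so element-level boxes overlap poorly while the EU footprints nearly coincide.

```python
from typing import Iterable, Tuple

# A box is (x0, y0, x1, y1) in page coordinates.
Box = Tuple[float, float, float, float]


def footprint(boxes: Iterable[Box]) -> Box:
    """Union bounding box of all element boxes in one EU (assumed definition)."""
    x0s, y0s, x1s, y1s = zip(*boxes)
    return (min(x0s), min(y0s), max(x1s), max(y1s))


def area(b: Box) -> float:
    """Area of a box; degenerate or inverted boxes have zero area."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical parser A splits the table into a header row and a body, plus a label.
parser_a = [(100, 100, 500, 130), (100, 130, 500, 400), (100, 400, 300, 420)]
# Hypothetical parser B returns one table box and a slightly larger unit label.
parser_b = [(100, 100, 500, 400), (100, 400, 310, 422)]

print(round(iou(parser_a[0], parser_b[0]), 3))                  # element level: 0.1
print(round(iou(footprint(parser_a), footprint(parser_b)), 3))  # EU level: 0.994
```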

📦 This Repository

This repo releases the evaluation code and QA pairs used in the paper.

| File | Description |
|---|---|
| `eval_retrieval.py` | Retrieval evaluation script (LCS, Recall@K, MinK) |
| `qas_en.json` | 1,551 QA pairs generated from OmniDocBench v1.0 |

The full EU construction pipeline is not included in this release.

🚀 Quick Start

```bash
git clone https://github.com/hanyeonjee/evidence-units
cd evidence-units
pip install sentence-transformers numpy

# Baseline evaluation (GT annotations, element-level)
python eval_retrieval.py \
    --gt   OmniDocBench.json \
    --qas  qas_en.json \
    --output results/

# Cross-parser evaluation with pre-computed EU outputs
python eval_retrieval.py \
    --gt              OmniDocBench.json \
    --qas             qas_en.json \
    --output          results/ \
    --docling-eu-dir  path/to/eu_docling \
    --mineru-eu-dir   path/to/eu_mineru
```
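The install step pulls in `sentence-transformers`, which suggests that the script ranks candidate units by embedding similarity. The sketch below shows cosine-similarity ranking in plain Python with toy vectors. The embedding model, the scoring function, and the unit ids are assumptions, not details taken from `eval_retrieval.py`; in practice the vectors would come from a model such as `SentenceTransformer(...).encode(...)`.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_units(query: Sequence[float], units: Dict[str, Sequence[float]]) -> List[str]:
    """Return unit ids ordered from most to least similar to the query."""
    return sorted(units, key=lambda uid: cosine(query, units[uid]), reverse=True)


# Toy 2-d "embeddings" with hypothetical ids; real vectors would be model outputs.
units = {"eu_a": [0.9, 0.1], "eu_b": [0.0, 1.0], "eu_c": [0.5, 0.5]}
print(rank_units([1.0, 0.0], units))  # ['eu_a', 'eu_c', 'eu_b']
```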

📊 Results on OmniDocBench (1,340 pages · 1,551 QA pairs)

| Method | Avg LCS | Recall@1 | MinK ↓ |
|---|---|---|---|
| w/o EU (baseline) | 0.4417 | 0.157 | 2.70 |
| w/ EU (ours) | 0.7172 | 0.406 | 2.00 |
| Δ | +0.275 | +0.249 | −0.70 |

Cross-parser consistency: ΔLCS ≈ +0.26–0.28 across GT, MinerU, and Docling.
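The README names the metrics but does not define them. The sketch below uses one plausible reading, stated here as an assumption: Recall@K is the fraction of a question's gold `evidence_node_ids` found in the top-K retrieved units; MinK is the smallest K whose top-K units jointly contain every gold node (lower is better); and LCS is the token-level longest-common-subsequence length between retrieved and gold text, normalized by the gold length. The rankings and texts are toy data, and `node_020` is a hypothetical distractor.

```python
from typing import List, Optional, Sequence, Set


def recall_at_k(ranked: Sequence[Sequence[str]], gold: Set[str], k: int) -> float:
    """Fraction of gold node ids contained in the top-k retrieved units."""
    found = set().union(*ranked[:k])
    return len(found & gold) / len(gold) if gold else 0.0


def min_k(ranked: Sequence[Sequence[str]], gold: Set[str]) -> Optional[int]:
    """Smallest k whose top-k units jointly cover every gold node (None if never)."""
    covered: Set[str] = set()
    for k, unit in enumerate(ranked, start=1):
        covered.update(unit)
        if gold <= covered:
            return k
    return None


def lcs_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Longest common subsequence length via DP with two rolling rows."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_score(retrieved: str, gold: str) -> float:
    """Token-level LCS length normalized by the gold text length."""
    r, g = retrieved.split(), gold.split()
    return lcs_len(r, g) / len(g) if g else 0.0


gold = {"node_012", "node_013", "node_014"}
# Element-level retrieval: each hit is a single node.
element_ranked: List[List[str]] = [["node_012"], ["node_020"], ["node_013"], ["node_014"]]
# EU-level retrieval: the first hit already groups the table with its context.
eu_ranked: List[List[str]] = [["node_012", "node_013", "node_014"], ["node_020"]]

print(round(recall_at_k(element_ranked, gold, 1), 3), min_k(element_ranked, gold))  # 0.333 4
print(recall_at_k(eu_ranked, gold, 1), min_k(eu_ranked, gold))  # 1.0 1
print(lcs_score("water quality table mg/L", "table of water quality in mg/L"))  # 0.5
```

Under this reading, grouping a table's nodes into one EU lets a single retrieval cover all of them, so Recall@1 rises and MinK falls, which matches the direction of the changes in the table above.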

🗂️ QA Pair Format

```json
{
  "qa_id": "omnidoc_table_0042",
  "type": "table",
  "question": "Table 1. Water quality in the experiments.",
  "evidence_node_ids": ["node_012", "node_013", "node_014"],
  "page_id": "scihub_page_002"
}
```

`type` is one of `table`, `figure`, or `text`.
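A quick way to sanity-check the QA file before running the evaluation is to validate each record against the fields shown above. This sketch assumes the file is a JSON array of such objects; the `load_qas` helper is hypothetical and not part of the repository.

```python
import json
from collections import Counter
from typing import Dict, List

REQUIRED_KEYS = {"qa_id", "type", "question", "evidence_node_ids", "page_id"}
VALID_TYPES = {"table", "figure", "text"}


def load_qas(raw: str) -> List[Dict]:
    """Parse a JSON array of QA pairs and check each against the documented format."""
    qas = json.loads(raw)
    for qa in qas:
        missing = REQUIRED_KEYS - set(qa)
        if missing:
            raise ValueError(f"{qa.get('qa_id', '?')}: missing fields {sorted(missing)}")
        if qa["type"] not in VALID_TYPES:
            raise ValueError(f"{qa['qa_id']}: unexpected type {qa['type']!r}")
    return qas


sample = json.dumps([
    {"qa_id": "omnidoc_table_0042", "type": "table",
     "question": "Table 1. Water quality in the experiments.",
     "evidence_node_ids": ["node_012", "node_013", "node_014"],
     "page_id": "scihub_page_002"},
])
qas = load_qas(sample)
print(Counter(qa["type"] for qa in qas))  # Counter({'table': 1})
```

To check the released file, pass in its contents, e.g. `load_qas(open("qas_en.json", encoding="utf-8").read())`.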

📝 Citation

```bibtex
@article{han2025evidenceunits,
  title     = {Evidence Units: Ontology-Grounded Document Organization
               for Parser-Independent Retrieval},
  author    = {Han, Yeonjee},
  journal   = {arXiv preprint arXiv:XXXX.XXXXX},
  year      = {2025}
}
```

📬 Contact

Questions or issues → yeonjee.han@kt.com
