This repository accompanies the paper *A Human-in-the-Loop Corpus for LLM-Based Simplification of Scientific Summaries*.
It presents a new dataset, evaluation scripts, and supporting materials for a two-phase workflow combining LLM-based generation with human feedback and expert post-editing.
Scientific papers are often written for in-field experts, limiting accessibility to readers from other domains.
This project introduces a human-in-the-loop approach to make scientific summaries more readable for non-specialists while maintaining technical accuracy.
We combine:
- Baseline LLM simplification of scientific summaries using GPT-4o-mini,
- Human feedback from non-CS STEM readers on simplicity, coherence, and fluency,
- Expert-edited gold summaries for high-fidelity simplification.
The resulting dataset supports the development and evaluation of LLMs and simplification systems aimed at cross-disciplinary scientific communication.
Scientific-Text-Simplification/
├── dataset/
│ ├── SciSummNet_summary_full.txt # Original SciSummNet expert summaries (source)
│ ├── GPT-simplified_summary_full.txt # Baseline simplifications by GPT-4o-mini
│ ├── phase1_GPT-simplified_passages_47 # 47-sample subset used for non-expert user study
│ ├── original_passages_47.txt # Matching original summaries for Phase 1
│ └── gold_simplified_summaries.txt # Expert-edited gold simplifications (Phase 2)
│
├── evaluation/
│ ├── evaluate_simp.py # Script for SARI, FKGL, BLEU, etc.
│ ├── final_evaluation_score_GPT.txt # Evaluation results for GPT-simplified summaries
│ └── final_evaluation_score_gold.txt # Evaluation results for gold simplifications
│
├── phase1/ # Non-expert annotation materials (if added)
├── phase2/ # Expert post-editing materials (if added)
└── README.md
| File | Description |
|---|---|
| SciSummNet_summary_full.txt | Source summaries from SciSummNet (expert-written). |
| GPT-simplified_summary_full.txt | LLM-generated simplified versions using GPT-4o-mini. |
| phase1_GPT-simplified_passages_47 | Subset of 47 passages used for the Phase 1 user study. |
| original_passages_47.txt | The corresponding 47 original summaries (pre-simplification). |
| gold_simplified_summaries.txt | Expert post-edited gold-standard simplifications. |
Each file contains one aligned entry per sample (one per line or paragraph).
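Because the files are line-aligned, entries can be paired by zipping the lines of each file together. A minimal sketch (`load_aligned` is a hypothetical helper, not part of this repository):

```python
from pathlib import Path

def load_aligned(*paths):
    """Read one entry per line from each file and zip them into aligned tuples.
    Raises ValueError if the files have different line counts (misalignment)."""
    columns = [Path(p).read_text(encoding="utf-8").splitlines() for p in paths]
    lengths = {len(col) for col in columns}
    if len(lengths) != 1:
        raise ValueError(f"files are misaligned; line counts: {sorted(lengths)}")
    return list(zip(*columns))

# Hypothetical usage with the Phase 1 files:
# pairs = load_aligned("dataset/original_passages_47.txt",
#                      "dataset/gold_simplified_summaries.txt")
```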
evaluation/evaluate_simp.py computes readability and simplification metrics such as:
- SARI (lexical simplification),
- Flesch–Kincaid Grade Level (FKGL) and Reading Ease (FKRE),
- BLEU, BERTScore, and LENS (semantic alignment).
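For intuition about what FKGL measures, here is a rough stand-alone sketch using a naive vowel-group syllable heuristic; the script itself presumably relies on a dedicated library such as textstat, so exact numbers will differ:

```python
import re

def _syllables(word):
    # crude heuristic: count contiguous vowel groups, minimum one per word
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fkgl(text):
    """Flesch-Kincaid Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)
```

Lower scores indicate text readable at a lower school grade level, which is why simplification should drive FKGL down.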
- final_evaluation_score_GPT.txt — evaluation results for GPT outputs
- final_evaluation_score_gold.txt — evaluation results for expert gold summaries
Key finding:
GPT-4o-mini improves surface readability, but expert post-editing produces text with higher precision, stylistic consistency, and scientific fidelity.
python >= 3.9
pip install pandas numpy textstat sacrebleu bert-score easse
cd evaluation
python evaluate_simp.py --original ../dataset/original_passages_47.txt --simplified ../dataset/GPT-simplified_summary_full.txt --gold ../dataset/gold_simplified_summaries.txt

Outputs are saved to:
- evaluation/final_evaluation_score_GPT.txt
- evaluation/final_evaluation_score_gold.txt
- **Source Corpus:** SciSummNet — 1,000 ACL papers with abstracts, citation contexts, and 150-word expert summaries.
- **Phase 1 – Non-Expert Evaluation:** Participants from STEM fields (outside computer science) identified difficult sentences and rated GPT outputs on simplicity, coherence, and fluency.
- **Phase 2 – Expert Post-Editing:** Computer science experts refined GPT outputs using Phase 1 feedback to ensure domain fidelity and stylistic clarity.
- **Evaluation:** Combination of automatic metrics (SARI, FKGL, BLEU, BERTScore) and human qualitative comparison.
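To make the SARI component concrete, here is a toy single-reference, unigram-level illustration; the real metric works over n-grams up to order 4 and multiple references, so this is not the official implementation:

```python
def unigram_sari(orig, simplified, reference):
    """Toy SARI: mean F1 over tokens correctly kept, added, and deleted."""
    O, S, R = (set(t.lower().split()) for t in (orig, simplified, reference))

    def f1(hyp, gold):
        if not hyp and not gold:  # nothing to keep/add/delete: perfect agreement
            return 1.0
        tp = len(hyp & gold)
        p = tp / len(hyp) if hyp else 0.0
        r = tp / len(gold) if gold else 0.0
        return 2 * p * r / (p + r) if p + r else 0.0

    keep = f1(S & O, R & O)   # tokens correctly kept from the original
    add = f1(S - O, R - O)    # tokens correctly introduced
    delete = f1(O - S, O - R) # tokens correctly removed
    return 100.0 * (keep + add + delete) / 3.0
```

A system output that matches the reference's keep/add/delete decisions exactly scores 100; unwarranted additions or deletions pull the score down.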
- LLM-generated simplifications increase readability but risk loss of precision.
- Expert-in-the-loop editing corrects oversimplification while preserving accuracy.
- The dataset supports training and benchmarking for scientific simplification systems.
- Encourages cross-disciplinary accessibility in scientific communication.
- This repository: intended for release under the CC BY 4.0 license.
- SciSummNet data: not redistributed; obtain from the official source.