faerber-lab/scientific-text-simplification-corpus

 
 

A Human-in-the-Loop Corpus for LLM-Based Simplification of Scientific Summaries

This repository accompanies the paper A Human-in-the-Loop Corpus for LLM-Based Simplification of Scientific Summaries.
It presents a new dataset, evaluation scripts, and supporting materials for a two-phase workflow combining LLM-based generation with human feedback and expert post-editing.


🌍 Overview

Scientific papers are often written for in-field experts, limiting accessibility to readers from other domains.
This project introduces a human-in-the-loop approach to make scientific summaries more readable for non-specialists while maintaining technical accuracy.

We combine:

  1. Baseline LLM simplification of scientific summaries using GPT-4o-mini,
  2. Human feedback from non-CS STEM readers on simplicity, coherence, and fluency,
  3. Expert-edited gold summaries for high-fidelity simplification.

The resulting dataset supports the development and evaluation of LLMs and simplification systems aimed at cross-disciplinary scientific communication.
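The baseline generation step above can be sketched with the OpenAI Python client. The instruction text below is illustrative only; the actual prompt used in the paper is not reproduced here:

```python
def build_prompt(summary: str) -> str:
    # Hypothetical instruction text; the paper's actual prompt may differ.
    return (
        "Rewrite the following scientific summary so that a STEM reader "
        "outside computer science can understand it. Keep all technical "
        "claims accurate.\n\n" + summary
    )

def simplify(summary: str, model: str = "gpt-4o-mini") -> str:
    # Requires the `openai` package and an OPENAI_API_KEY in the environment.
    from openai import OpenAI
    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": build_prompt(summary)}],
    )
    return response.choices[0].message.content
```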


📂 Repository Structure

Scientific-Text-Simplification/
├── dataset/
│   ├── SciSummNet_summary_full.txt            # Original SciSummNet expert summaries (source)
│   ├── GPT-simplified_summary_full.txt        # Baseline simplifications by GPT-4o-mini
│   ├── phase1_GPT-simplified_passages_47      # 47-sample subset used for non-expert user study
│   ├── original_passages_47.txt               # Matching original summaries for Phase 1
│   └── gold_simplified_summaries.txt          # Expert-edited gold simplifications (Phase 2)
│
├── evaluation/
│   ├── evaluate_simp.py                       # Script for SARI, FKGL, BLEU, etc.
│   ├── final_evaluation_score_GPT.txt         # Evaluation results for GPT-simplified summaries
│   └── final_evaluation_score_gold.txt        # Evaluation results for gold simplifications
│
├── phase1/                                    # Non-expert annotation materials (if added)
├── phase2/                                    # Expert post-editing materials (if added)
└── README.md

🧱 Data Description

  • SciSummNet_summary_full.txt — source summaries from SciSummNet (expert-written).
  • GPT-simplified_summary_full.txt — LLM-generated simplified versions produced by GPT-4o-mini.
  • phase1_GPT-simplified_passages_47 — subset of 47 passages used in the Phase 1 user study.
  • original_passages_47.txt — the corresponding 47 original summaries (pre-simplification).
  • gold_simplified_summaries.txt — expert post-edited gold-standard simplifications.

Each file contains one aligned entry per sample (one per line or paragraph).
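Because the files are line-aligned, originals can be paired with their simplifications by position. A minimal sketch (file names as in the tree above):

```python
from pathlib import Path

def load_lines(path: str) -> list[str]:
    # One aligned entry per non-empty line.
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]

def load_pairs(original_path: str, simplified_path: str) -> list[tuple[str, str]]:
    # Zip the two files entry-by-entry; lengths must match for valid alignment.
    originals = load_lines(original_path)
    simplified = load_lines(simplified_path)
    assert len(originals) == len(simplified), "files are not aligned"
    return list(zip(originals, simplified))
```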


⚙️ Evaluation

Script

evaluation/evaluate_simp.py computes readability and simplification metrics such as:

  • SARI (simplification quality, scored from n-grams added, kept, and deleted),
  • Flesch–Kincaid Grade Level (FKGL) and Reading Ease (FKRE),
  • BLEU, BERTScore, and LENS (semantic alignment).
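For intuition, FKGL is a simple function of word, sentence, and syllable counts. A minimal pure-Python sketch with a heuristic syllable counter (the evaluation script may rely on a library such as textstat instead):

```python
import re

def count_syllables(word: str) -> int:
    # Heuristic: count groups of consecutive vowels; drop a trailing silent 'e'.
    word = word.lower()
    if word.endswith("e") and len(word) > 2:
        word = word[:-1]
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def fkgl(text: str) -> float:
    # Flesch-Kincaid Grade Level:
    #   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)
```

Lower scores indicate more readable text, which is why simplification should drive FKGL down relative to the source summaries.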

Results

  • final_evaluation_score_GPT.txt — evaluation results for GPT outputs
  • final_evaluation_score_gold.txt — evaluation results for expert gold summaries

Key finding:
GPT-4o-mini improves surface readability, but expert post-editing produces text with higher precision, stylistic consistency, and scientific fidelity.


🚀 Quick Start

Requirements

python >= 3.9
pip install pandas numpy textstat sacrebleu bert-score easse

Run evaluation

cd evaluation
python evaluate_simp.py \
    --original ../dataset/original_passages_47.txt \
    --simplified ../dataset/GPT-simplified_summary_full.txt \
    --gold ../dataset/gold_simplified_summaries.txt

Outputs are saved to:

  • evaluation/final_evaluation_score_GPT.txt
  • evaluation/final_evaluation_score_gold.txt

🧠 Methodology Summary

  1. Source Corpus:
     SciSummNet — 1,000 ACL papers with abstracts, citation contexts, and 150-word expert summaries.

  2. Phase 1 – Non-Expert Evaluation:
     Participants from STEM fields (outside computer science) identified difficult sentences and rated GPT outputs on simplicity, coherence, and fluency.

  3. Phase 2 – Expert Post-Editing:
     Computer science experts refined GPT outputs using Phase 1 feedback to ensure domain fidelity and stylistic clarity.

  4. Evaluation:
     Combination of automatic metrics (SARI, FKGL, BLEU, BERTScore) and human qualitative comparison.

📊 Key Insights

  • LLM-generated simplifications increase readability but risk loss of precision.
  • Expert-in-the-loop editing corrects oversimplification while preserving accuracy.
  • The dataset supports training and benchmarking for scientific simplification systems.
  • Encourages cross-disciplinary accessibility in scientific communication.

⚖️ License

  • This repository: intended to be released under the CC BY 4.0 license.
  • SciSummNet data: not redistributed; obtain from the official source.
