Skip to content

JinyiHan99/SCR

Repository files navigation

🧠 SCR: Structured Reasoning for Large Language Models

Paper HuggingFace License Python

📖 Overview

A single long-CoT trajectory entangles generation, verification, and revision into one token stream, so neither token-level SFT nor outcome-level RL can assign ability-specific credit. As a result, large reasoning models (LRMs) often keep verifying and revising even after reaching the correct answer.

SCR (Structured Reasoning) reorganizes long-CoT reasoning into stages that are explicit, evaluable, and independently trainable, realized through a Generate–Verify–Revise paradigm:

  • Dynamic Termination Supervision (DTS). Stops reasoning once self-verification confirms correctness.
  • Selective Loss Masking (SLM). Excludes incorrect initial answers from the imitation loss while keeping them as context.
  • Progressive Two-Stage RL. Stage I jointly optimizes generation and verification; Stage II focuses on revision once the verifier is reliable.

Across three backbones and multiple benchmarks, SCR improves reasoning quality and self-verification while reducing output length by up to 50%.

SCR method overview

📁 Repository Structure

SCR/
├── LLaMA-Factory/        # SFT codebase (adapted)
├── EasyR1/               # RL codebase (adapted)
├── data/                 # Training and evaluation data
├── infer/                # Inference / evaluation scripts
├── figs/                 # Figures and paper PDF
├── run_SCR-SFT.sh        # Entry script for the SFT phase
├── run_SCR-Stage1.sh     # Entry script for RL Stage I (generation + verification)
├── run_SCR-Stage2.sh     # Entry script for RL Stage II (revision)
├── infer.sh              # Entry script for evaluation
└── rl_config.yaml        # RL hyper-parameters

🚀 Quick Start

1. SFT Phase

Install Dependencies

conda create -n SCR-SFT python=3.10.9
conda activate SCR-SFT
cd LLaMA-Factory
pip install -e .

Run

bash run_SCR-SFT.sh

Key parameters in run_SCR-SFT.sh:

Parameter Description
model_path Path to the base language model. This is the model fine-tuned during training.
template Conversation/prompt template for the model.
dataset Path to the training dataset. Please update the EOS token according to the model type before running SFT.

2. RL Phase

Install Dependencies

conda create -n SCR-RL python=3.10.9
conda activate SCR-RL
cd EasyR1
pip install -e .

Run

The RL phase uses a progressive two-stage curriculum. Stage I jointly trains generation and self-verification; Stage II focuses on revision once the verifier becomes reliable.

# Stage I: jointly optimize initial generation and self-verification
bash run_SCR-Stage1.sh

# Stage II: optimize revision based on the trained verifier
bash run_SCR-Stage2.sh

3. Evaluation

bash infer.sh

📊 Highlights

  • Explicit reasoning stages. Each step (generation / verification / revision) is exposed and can be supervised individually.
  • Trajectory-aware supervision. DTS + SLM provide targeted SFT signals that prevent imitation of false starts and over-long verification.
  • Decoupled credit assignment. Two-stage GRPO assigns ability-specific credit instead of a single outcome-level reward.
  • Shorter, sharper reasoning. Up to 50% reduction in output length without sacrificing accuracy.

🎉 Acknowledgements

This repository includes code adapted from the following open-source projects:

  • LLaMA-Factory: Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
  • EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based

We thank the authors and contributors of these projects for making their code publicly available.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages