Official repo for *Learning to Reason for Long-Form Story Generation*, available on arXiv.
This work presents a new RL-reward paradigm, Verifiable Rewards via Completion Likelihood Improvement (VR-CLI), which is then used to train a model to predict useful plans for the next chapter of a story.
This repo contains five parts:
- `setup_data`: Compile a Next-Chapter Prediction (NCP) dataset, used for training and story generation
- `rl_training`: Train a model using our VR-CLI reward paradigm, on either the NCP task or another task of your choosing
- `sft_training`: Train a model using supervised finetuning on the NCP task
- `story_generation`: Generate reasoning and story continuations using either a pretrained model or a model you have trained yourself
- `evaluation`: Replicate our evaluations of the story generation models using human annotations and automated metrics
Consult the `instructions.md` file in each directory for more details.
If you find this work useful, please cite it as follows:
```bibtex
@misc{gurung2025learningreasonlongformstory,
  title={Learning to Reason for Long-Form Story Generation},
  author={Alexander Gurung and Mirella Lapata},
  year={2025},
  eprint={2503.22828},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2503.22828},
}
```