CHIRON

This is the official repository for the EMNLP 2024 Findings paper "CHIRON: Rich Character Representations in Long-Form Narratives" by Alex Gurung and Mirella Lapata. In the paper we propose a new character-sheet based representation, CHIRON, useful for long-story tasks that require character understanding and for story analysis. The code in this repository will allow you to create your own character sheets, and calculate the density character-centricity metric we propose. Refer to the paper for more details.

Installation

(Code was tested using docker image pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel)

$ conda create -n chiron python=3.11 # might be unnecessary if you are using a docker image
$ conda activate chiron
$ pip install -r requirements.txt
$ huggingface-cli login --token HUGGINFACE_TOKEN # get your token from https://huggingface.co/settings/tokens
$ python -m spacy download en_core_web_lg # can use different model if you want
$ pip install vllm==0.6.3.post1

Note: vllm sometimes has breaking changes, we used 0.5.4 for the paper (with beam search) but as it is no longer supported this code uses the latest available version as of writing.

Usage

From Snippet

For creating your own character sheets from a snippet, run

python end_to_end_chiron_from_snippet.py

We provide a sample snippet and default parameters, run python end_to_end_chiron_from_snippet.py --help for more details.

From Dataset

For creating character sheets from a dataset, run

python end_to_end_chiron_from_dataset.py (see python end_to_end_chiron_from_dataset.py --help for more details on parameters).

This will output a character sheet to the provided directory for each story-character pair in the dataset. There will be three files for each story-character pair:

story_id-character_unfiltered.txt: the character sheet without the verification classifier filtering
story_id-character_postfiltering.txt: the character sheet after verification classifier filtering
story_id-character_postfilteringdedup.txt: the character sheet after verification classifier filtering and statement deduplication

This last file is the one you should use for further analysis/downstream tasks.

Your dataset of stories and snippets should be in the following format:

{
    "story_id": str, # unique identifier for the story
    "character": str, # character sheet name
    "snippets": List[List[str]]
}

snippets should be a list of snippets from the story, where each snippet is either a list of [snippet_role, original_text, snippet] or just [original_text, snippet] - this depends on dataset and allows you to split snippets to a normal size while maintaining references to the original snippet snippet role.

Dataset

We build the paper's work on three data sources: STORIUM, New Yorker/Art or Artifice?, and DOC/RE³. Due to copyright concerns we cannot release this data here, but the instructions for reconstructing the datasets are present in the paper. As of writing the full STORIUM dataset is available here. For academic research purposes only you may also reach out to Alex Gurung for a subset of the STORIUM data that we used for this paper.

Citation

@article{gurung2024chiron,
  title={CHIRON: Rich Character Representations in Long-Form Narratives},
  author={Gurung, Alexander and Lapata, Mirella},
  journal={arXiv preprint arXiv: 2406.10190},
  year={2024}
}

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
chiron_compile_character_sheet_utils.py		chiron_compile_character_sheet_utils.py
chiron_dataset_setup_utils.py		chiron_dataset_setup_utils.py
chiron_filter_utils.py		chiron_filter_utils.py
chiron_generation_module_utils.py		chiron_generation_module_utils.py
chiron_reasoning_util.py		chiron_reasoning_util.py
chiron_simplification_utils.py		chiron_simplification_utils.py
chiron_utils.py		chiron_utils.py
dummy_example.jsonl		dummy_example.jsonl
end_to_end_chiron_from_dataset.py		end_to_end_chiron_from_dataset.py
end_to_end_chiron_from_snippet.py		end_to_end_chiron_from_snippet.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CHIRON

Table of Contents

Installation

Usage

From Snippet

From Dataset

Dataset

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

CHIRON

Table of Contents

Installation

Usage

From Snippet

From Dataset

Dataset

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages