article_2025_histories_textreuse

Dataset and replication package for the Histories of England text reuse article.

Access the published article here: A Computational Approach to Literary Borrowing in Enlightenment Britain

License & Citing

CC BY-NC 4.0. https://creativecommons.org/licenses/by-nc/4.0/

If using the data, cite the published article:

Vaara, Ville. "Chapter 2 Charting the Circulation of Histories of England: A Computational Approach to Literary Borrowing in Enlightenment Britain." In Enlightenment Histories, edited by Marc Hanvelt, Mark Gregory Spencer, and Mikko Sakari Tolonen, 37-72. De Gruyter Oldenbourg, 2026.
https://doi.org/10.1515/9783111637181-003

Scripts

Each article_* script runs independently and produces part of the article's data and plots.

article0_datasets.py loads common source datasets and is used by the other scripts.

config_visuals.py sets the visual style of the plots.

Input data

Note that cloning all the data requires Git LFS.

All input files are in data/raw.

hoe_meta.csv - Histories of England titles metadata.

'manifestation_id': Unique id for each physical book.
'estc_id': ID of title in English Short Title Catalogue.
'actor_id': Unique ID for each author.
'name_unified': Name of author.
'publication_year': Publication year of the volume / edition.
'title': Title of the volume / edition.
'text_length': Length of text, in characters.
'work_id': Unique ID of the work.
'main_category': Genre category of the work. All are Histories of England.
'text_length_p': Length of text in pages (characters / 3000).
'sequence': First or subsequent edition.
'author_volume_group': Unique ID for the volume group, which groups corresponding volumes across editions (e.g. the first volume of Hume's History in each edition).
'publication_decade': Publication decade of the volume / edition.
'originality': Originality ratio of the volume.
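As a minimal sketch (not part of the replication scripts), the page estimate and the volume grouping can be reproduced from hoe_meta.csv with the standard library. It assumes the CSV header uses the column names listed above; the function names are hypothetical.

```python
import csv
from collections import defaultdict

CHARS_PER_PAGE = 3000  # page estimate used for 'text_length_p' in hoe_meta.csv


def pages_from_chars(text_length: int) -> float:
    """Convert a character count to the page estimate (characters / 3000)."""
    return text_length / CHARS_PER_PAGE


def volumes_by_group(path: str) -> dict[str, list[str]]:
    """Map each 'author_volume_group' to the manifestation IDs it contains.

    Assumes hoe_meta.csv has a header row with the column names above.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            groups[row["author_volume_group"]].append(row["manifestation_id"])
    return dict(groups)
```

For example, volumes_by_group("data/raw/hoe_meta.csv") would list every edition of a given volume under one group ID.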

reception_inception_coverage.csv - Reused character counts between titles.

'src_manifestation_id': Manifestation ID of the source document.
'dst_manifestation_id': Manifestation ID of the destination document.
'coverage_src_in_dst_abs': Absolute number of characters reused.
'same_author': Whether both manifestations have the same author.
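One typical use of this file is to total how much of each source title was reused by other authors. The sketch below does that with the standard library; it is illustrative only, and it assumes 'same_author' is stored as True/False (or 1/0), which the README does not specify.

```python
import csv
from collections import Counter


def _is_true(value: str) -> bool:
    # Assumption: booleans are serialised as "True"/"False" or 1/0.
    return value.strip().lower() in {"true", "1"}


def reuse_received_by_source(path: str, include_same_author: bool = False) -> Counter:
    """Total characters of each source title reused in destination titles.

    Sums 'coverage_src_in_dst_abs' per 'src_manifestation_id', skipping
    same-author pairs unless include_same_author is True.
    """
    totals: Counter = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if not include_same_author and _is_true(row["same_author"]):
                continue
            # float() first so values written as e.g. "120.0" are accepted
            totals[row["src_manifestation_id"]] += int(float(row["coverage_src_in_dst_abs"]))
    return totals
```

Excluding same-author pairs by default keeps an author's own later editions from counting as borrowing by others.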

coverage_full.csv - Pairwise coverage between titles.

'mi1': Manifestation ID of title 1.
'mi2': Manifestation ID of title 2.
'reuse_t1_t2': Characters of title 1 reused in title 2.
'reuse_t2_t1': Characters of title 2 reused in title 1.
'coverage_t1_t2': Portion of title 1 reused in title 2.
'coverage_t2_t1': Portion of title 2 reused in title 1.
't1_length': Length of title 1.
't2_length': Length of title 2.
'coverage_max': Max of 'coverage_t1_t2' and 'coverage_t2_t1'.
'reuse_max': Max of 'reuse_t1_t2' and 'reuse_t2_t1'.
'publication_year1': Publication year of title 1.
'publication_year2': Publication year of title 2.
'same_author': Whether both titles have the same author.
'coverage_directed': Coverage of the earlier publication in the later one.
'both_authors_present': Whether both titles have authors in the metadata.
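The derived columns can be read as simple functions of the raw reuse counts. The sketch below is one plausible reconstruction, not the package's own code: it assumes coverage is reused characters divided by the length of the reused title, and that a tie in publication year treats title 1 as the earlier one.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    reuse_t1_t2: int   # characters of title 1 reused in title 2
    reuse_t2_t1: int   # characters of title 2 reused in title 1
    t1_length: int
    t2_length: int
    publication_year1: int
    publication_year2: int


def derived_coverage(p: Pair) -> dict[str, float]:
    """Recompute the derived coverage_full.csv columns for one pair (assumed formulas)."""
    cov_12 = p.reuse_t1_t2 / p.t1_length  # portion of title 1 reused in title 2
    cov_21 = p.reuse_t2_t1 / p.t2_length  # portion of title 2 reused in title 1
    # Assumption: equal publication years treat title 1 as the earlier one.
    directed = cov_12 if p.publication_year1 <= p.publication_year2 else cov_21
    return {
        "coverage_t1_t2": cov_12,
        "coverage_t2_t1": cov_21,
        "coverage_max": max(cov_12, cov_21),
        "reuse_max": max(p.reuse_t1_t2, p.reuse_t2_t1),
        "coverage_directed": directed,
    }
```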

as_original_source_maps/dst/*.csv - Text reuses originating from each title, located in the destination texts. Filename is the Manifestation ID of the origin.

'dst_trs_start': Character index of beginning of reuse in destination.
'dst_trs_end': Character index of end of reuse in destination.
'dst_mi': Manifestation ID of the destination.
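Because several reuse spans can overlap in the same destination, counting reused characters per destination means merging the spans first. A minimal sketch under the assumption that indices are 0-based with exclusive ends (the README does not say); the function names are hypothetical.

```python
import csv
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def reused_chars_per_destination(path: str) -> dict[str, int]:
    """Characters of each destination covered by reuse from one origin title.

    Reads one as_original_source_maps/dst/<manifestation_id>.csv file and
    merges overlapping spans so no character is counted twice.
    Assumption: end indices are exclusive.
    """
    spans = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            spans[row["dst_mi"]].append((int(row["dst_trs_start"]), int(row["dst_trs_end"])))
    return {mi: sum(e - s for s, e in merge_intervals(iv)) for mi, iv in spans.items()}
```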

as_original_source_maps/src/*.csv - Text reuses originating from each title, located in the origin (source) text. Filename is the Manifestation ID of the origin.

'dst_trs_start': Character index of beginning of reuse in source.
'dst_trs_end': Character index of end of reuse in source.
'dst_mi': Manifestation ID of the destination.
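The source-side maps make it possible to estimate how much of an origin title was taken up by at least one later text. The sketch below marks reused characters in a byte mask; it uses the column names exactly as listed above, takes text_length from hoe_meta.csv, and assumes 0-based indices with exclusive ends. It is illustrative only.

```python
import csv


def reused_share_of_origin(path: str, text_length: int) -> float:
    """Share of an origin title's characters reused in at least one destination.

    Reads one as_original_source_maps/src/<manifestation_id>.csv file.
    Assumption: indices are 0-based and end indices are exclusive.
    """
    if text_length <= 0:
        return 0.0
    covered = bytearray(text_length)  # 1 marks a character reused somewhere
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            start = max(0, int(row["dst_trs_start"]))
            end = min(text_length, int(row["dst_trs_end"]))
            if end > start:
                covered[start:end] = b"\x01" * (end - start)
    return sum(covered) / text_length
```

A mask is simpler than interval merging here and stays cheap for texts of a few million characters.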

originality_maps/*.json - Original segments of each title. Filename denotes Manifestation ID.

"manifestation_id": Manifestation ID of the manifestation.
"originality_ratio": Ratio of original content to all content (0-1).
"originality_segments": Character indices of original segments.
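As a consistency check, the stored ratio can be compared with one recomputed from the segments. This sketch assumes each segment is a [start, end] pair with an exclusive end, that segments do not overlap, and that text_length comes from hoe_meta.csv; the tolerance is an arbitrary choice.

```python
import json


def segment_ratio(segments, text_length: int) -> float:
    """Share of a text covered by [start, end) segments (assumed non-overlapping)."""
    if text_length <= 0:
        return 0.0
    return sum(end - start for start, end in segments) / text_length


def check_originality_map(path: str, text_length: int, tol: float = 0.01) -> bool:
    """Compare the stored originality_ratio with one recomputed from its segments."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    recomputed = segment_ratio(data["originality_segments"], text_length)
    return abs(recomputed - data["originality_ratio"]) <= tol
```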
