
COMHIS/article_2025_histories_textreuse


article_2025_histories_textreuse

Dataset and replication package for the Histories of England text reuse article.

Access the published article here: A Computational Approach to Literary Borrowing in Enlightenment Britain

License & Citing

CC BY-NC. https://creativecommons.org/licenses/by-nc/4.0/

If using the data, cite the published article:

Vaara, Ville. "Chapter 2 Charting the Circulation of Histories of England: A Computational Approach to Literary Borrowing in Enlightenment Britain" In Enlightenment Histories edited by Marc Hanvelt, Mark Gregory Spencer and Mikko Sakari Tolonen, 37-72. De Gruyter Oldenbourg, 2026.
https://doi.org/10.1515/9783111637181-003

Scripts

Each article_* script is independent and produces part of the article's data and plots.

article0_datasets.py loads common source datasets and is used by the other scripts.

config_visuals.py sets the visual style of the plots.

Input data

Note that cloning all the data requires Git LFS.

Found in data/raw.


hoe_meta.csv - Histories of England titles metadata.

  • 'manifestation_id': Unique id for each physical book.
  • 'estc_id': ID of title in English Short Title Catalogue.
  • 'actor_id': Unique ID for each author.
  • 'name_unified': Name of author.
  • 'publication_year': Publication year of the volume / edition.
  • 'title': Title of the volume / edition.
  • 'text_length': Length of text, in characters.
  • 'work_id': Unique ID of the work.
  • 'main_category': Genre category of the work. All are Histories of England.
  • 'text_length_p': Estimated length of text, in pages (characters / 3000).
  • 'sequence': First or subsequent edition.
  • 'author_volume_group': Unique ID for the volume group. Grouping volumes across editions, e.g. First Volume of Hume's History in each edition, etc.
  • 'publication_decade': Publication decade of the volume / edition.
  • 'originality': Originality ratio of the volume.
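
As a rough illustration of the derived columns above, the sketch below recomputes 'text_length_p' from 'text_length' using the stated divisor of 3000 characters per page. The sample rows are made up for illustration and are not taken from hoe_meta.csv, and the decade helper assumes that 'publication_decade' is the year rounded down to a multiple of ten, which the README does not state.

```python
import csv
import io

CHARS_PER_PAGE = 3000  # divisor given in the 'text_length_p' description

# Made-up sample rows for illustration only; not real dataset values.
SAMPLE = """manifestation_id,publication_year,text_length
m1,1754,900000
m2,1762,1500000
"""


def estimated_pages(text_length: int) -> float:
    """Estimate page count as characters divided by 3000."""
    return text_length / CHARS_PER_PAGE


def decade(year: int) -> int:
    """Assumed convention: round the year down to a multiple of 10."""
    return year - year % 10


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    row["text_length_p"] = estimated_pages(int(row["text_length"]))
    row["publication_decade"] = decade(int(row["publication_year"]))
```

To run this against the real file, replace the in-memory sample with an opened data/raw/hoe_meta.csv.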

reception_inception_coverage.csv - Reuse numbers between titles.

  • 'src_manifestation_id': Manifestation ID of the source document.
  • 'dst_manifestation_id': Manifestation ID of the destination document.
  • 'coverage_src_in_dst_abs': Absolute number of characters reused.
  • 'same_author': Whether both manifestations have the same author.

coverage_full.csv - Pairwise coverage between titles.

  • 'mi1': Manifestation ID of title 1.
  • 'mi2': Manifestation ID of title 2.
  • 'reuse_t1_t2': Number of characters of title 1 reused in title 2.
  • 'reuse_t2_t1': Number of characters of title 2 reused in title 1.
  • 'coverage_t1_t2': Portion of title 1 reused in title 2.
  • 'coverage_t2_t1': Portion of title 2 reused in title 1.
  • 't1_length': Length of title 1.
  • 't2_length': Length of title 2.
  • 'coverage_max': Maximum of 'coverage_t1_t2' and 'coverage_t2_t1'.
  • 'reuse_max': Maximum of 'reuse_t1_t2' and 'reuse_t2_t1'.
  • 'publication_year1': Publication year of title 1.
  • 'publication_year2': Publication year of title 2.
  • 'same_author': Whether both titles have the same author.
  • 'coverage_directed': Coverage of the earlier publication in the later one.
  • 'both_authors_present': Whether both titles have authors in the metadata.

as_original_source_maps/dst/*.csv - Text reuses originating from each title, located by their position in the destination texts. The filename is the Manifestation ID of the origin.

  • 'dst_trs_start': Character index of beginning of reuse in destination.
  • 'dst_trs_end': Character index of end of reuse in destination.
  • 'dst_mi': Manifestation ID of the destination.
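
One way to turn these spans into a per-destination character count is to merge overlapping spans and sum their lengths. The sketch below is an assumption about how such a count could be derived, not a description of how the dataset was built: the function names are hypothetical, end indices are assumed to be exclusive, and overlapping spans are counted once.

```python
from collections import defaultdict


def covered_characters(spans):
    """Sum the length of the union of (start, end) spans, end-exclusive."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(spans):
        if current_end is None or start > current_end:
            # Close the previous merged span before starting a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching span: extend the current one.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def coverage_by_destination(rows):
    """Group rows by 'dst_mi' and count covered characters in each."""
    spans = defaultdict(list)
    for row in rows:
        spans[row["dst_mi"]].append(
            (int(row["dst_trs_start"]), int(row["dst_trs_end"]))
        )
    return {mi: covered_characters(s) for mi, s in spans.items()}


# Made-up rows for illustration only.
sample_rows = [
    {"dst_mi": "a", "dst_trs_start": "0", "dst_trs_end": "100"},
    {"dst_mi": "a", "dst_trs_start": "50", "dst_trs_end": "150"},
    {"dst_mi": "b", "dst_trs_start": "300", "dst_trs_end": "400"},
]
result = coverage_by_destination(sample_rows)
```

The rows use string values because that is what `csv.DictReader` would yield when reading one of these files.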

as_original_source_maps/src/*.csv - Text reuses originating from each title, located by their position in the source text. The filename is the Manifestation ID of the origin.

  • 'dst_trs_start': Character index of beginning of reuse in source.
  • 'dst_trs_end': Character index of end of reuse in source.
  • 'dst_mi': Manifestation ID of the destination.

originality_maps/*.json - Original segments of each title. Filename denotes Manifestation ID.

  • "manifestation_id": Manifestation ID of the manifestation.
  • "originality_ratio": 0-1. Ratio of original content to all content.
  • "originality_segments": Character indices of original segments.
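
The link between "originality_segments" and "originality_ratio" can be illustrated with a short sketch. The README does not specify the segment layout, so this assumes each segment is a [start, end] pair with an exclusive end, and that the ratio is the original character count divided by the title's 'text_length' from hoe_meta.csv. The JSON sample and function names are made up for illustration.

```python
import json

# Made-up record for illustration only; the segment layout is an assumption.
SAMPLE_JSON = """
{
  "manifestation_id": "m1",
  "originality_ratio": 0.4,
  "originality_segments": [[0, 100], [300, 600]]
}
"""


def original_characters(segments) -> int:
    """Count characters in [start, end) segments (assumed non-overlapping)."""
    return sum(end - start for start, end in segments)


def originality_ratio(segments, text_length: int) -> float:
    """Ratio of original characters to all characters in the title."""
    return original_characters(segments) / text_length


record = json.loads(SAMPLE_JSON)
# text_length would come from hoe_meta.csv; 1000 is a made-up value.
recomputed = originality_ratio(record["originality_segments"], 1000)
```

Comparing a recomputed ratio with the stored "originality_ratio" is a quick consistency check, provided the assumed segment layout matches the actual files.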
