Digital Scriptorium OpenRefine documentation and JSON recipes for data reconciliation
To use the JSON instructions (also known as recipes) found in this repository with DS data in OpenRefine, go to the left column, select the Undo/Redo tab, select Apply, paste the JSON code, and then select Perform operations. This executes the prewritten commands, which perform various actions on the data as part of the reconciliation process.
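Before pasting a recipe into the Apply dialog, it can help to confirm that the text is valid JSON and to see which operations it contains. The following is a minimal sketch, not part of the repository: the sample recipe, the column name `example_column`, and the helper `summarize_recipe` are all hypothetical and only illustrate the general shape of an OpenRefine operation history (a JSON array of objects, each with an `op` key).

```python
import json

# Hypothetical sample recipe; the recipes in this repository will differ.
recipe_text = """
[
  {
    "op": "core/text-transform",
    "engineConfig": {"facets": [], "mode": "row-based"},
    "columnName": "example_column",
    "expression": "value.trim()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10,
    "description": "Trim whitespace in column example_column"
  }
]
"""


def summarize_recipe(text):
    """Parse a recipe and return (op, description) pairs, one per operation."""
    operations = json.loads(text)  # raises ValueError if the text is not valid JSON
    if not isinstance(operations, list):
        raise ValueError("a recipe is expected to be a JSON array of operations")
    summary = []
    for operation in operations:
        if "op" not in operation:
            raise ValueError("each operation is expected to have an 'op' key")
        summary.append((operation["op"], operation.get("description", "")))
    return summary


for op, description in summarize_recipe(recipe_text):
    print(f"{op}: {description}")
```

If the check passes, the same text can be pasted into OpenRefine unchanged; the script only reads the recipe and does not modify it.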
Facets and filters can also be used on the data by using drop-down menus available on each column header and displayed in the left column when selecting the Facet/Filter tab.
The following notes apply to file naming conventions for editing file name variables found in the instructions in this repository (use all lowercase letters where applicable):
- DATE = the date the file/dataset was generated/created/extracted, in `YYYYMMDD` format
- INSTITUTION = the code for the name of the institutional source for the data, such as `penn` or `kansas` or `csl`
- DATATYPE = the type of encoding standard or technical format of the metadata source, such as `marcxml` or `mets` or `csv`
- One or more DIFFERENTIATORS may also be added to the file name to disambiguate files, using source names of collections or databases, such as `bibliophilly` or `muslimworld`, or batch numbers, such as `batch-1`, `batch-2`, etc.
Examples of correctly formatted file names:
- `20230518-materials-rome-mets-legacy-enriched.csv`
- `20230630-genres-penn-marcxml-bibliophilly-enriched.csv`
- `20230715-names-kansas-marc-enriched.csv`
- `20230816-languages-princeton-marcxml-batch-3-enriched.csv`
- `20230901-places-hrc-csv-fragments-batch-1-enriched.csv`
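The naming convention above can be sketched as a small helper that assembles a file name from its parts. This is an illustration only: the function `build_file_name` and its parameter names are hypothetical, and the field order, the reconciliation-type field (such as `genres` or `names`), and the `-enriched.csv` ending are inferred from the example file names rather than stated as rules in this repository.

```python
import re


def build_file_name(date, category, institution, datatype,
                    differentiators=(), suffix="enriched", extension="csv"):
    """Assemble a file name in the pattern inferred from the examples:
    DATE-CATEGORY-INSTITUTION-DATATYPE[-DIFFERENTIATORS...]-SUFFIX.EXTENSION
    """
    # DATE must be eight digits (YYYYMMDD); calendar validity is not checked here.
    if not re.fullmatch(r"\d{8}", date):
        raise ValueError("DATE must be in YYYYMMDD format")
    parts = [date, category, institution, datatype, *differentiators, suffix]
    # The convention asks for all lowercase letters where applicable.
    return "-".join(part.lower() for part in parts) + "." + extension


print(build_file_name("20230630", "genres", "penn", "marcxml", ["bibliophilly"]))
print(build_file_name("20230901", "places", "hrc", "csv", ["fragments", "batch-1"]))
```

Passing the differentiators as a list keeps multi-part cases, such as a collection name followed by a batch number, in the order they should appear in the file name.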
Genre reconciliation instructions
Language reconciliation instructions
Material reconciliation instructions
Name reconciliation instructions