Context
Several scripts in the https://github.com/ror-community/curation_ops repository are used by the https://github.com/ror-community/ror-records workflows to generate and publish data dumps. Most of these scripts are path-agnostic and need only minimal changes. Some, however, contain hardcoded references and conventions that should be updated to complete #353.
Scripts
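generate_dump.py
Current behavior:
Arguments: -r (release dir), -e (previous dump name), -i (input path), -o (output path)
Reads the previous dump from {output_path}/{prev-release}.zip
Generates the new dump at {input_path}/{release}-{YYYY-MM-DD}-ror-data.json and .csv
Creates the zip at {output_path}/{release}-{YYYY-MM-DD}-ror-data.zip
Has no hardcoded ror-data repo references; the workflow handles all repo interactions
Changes needed:
Minimal or none. The script does not depend on specific I/O locations and reads from whatever paths are passed as arguments. The workflow changes (sub-issue 01) handle downloading from release artifacts and uploading the results back.
Optional: update the -ror-data suffix constant (line 18) if the naming convention changes. This is probably best kept as-is, since "ror-data" is an established brand and naming convention for the dump files.
upload_dump_zenodo.py
Current behavior: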
DUMP_FILE_DIR = "./" (line 15) scans the current directory for the dump zip
get_dump_file() (lines 19-24) matches the filename prefix against the release name
check_release_data() (line 221) checks for "ror-data.zip" in the filename
Error message (line 228): "Dump file not found in ror-data"
GITHUB_API_URL (line 12) points to ror-community/ror-updates for release notes (a different repo from ror-data, used for release notes metadata)
format_description() (lines 50-108) contains hardcoded HTML with links to the ror-schema and ror-updates repos
Changes needed:
Update the error message on line 228 from "ror-data" to something generic (e.g., "Dump file not found in working directory")
Consider parameterizing DUMP_FILE_DIR as a CLI argument instead of hardcoding "./", so the workflow can explicitly pass the download directory
requirements.txt currently pins requests to requests==2.27.1. We should update it to a more recent version.
Files to review/modify
generate_dump.py
upload_dump_zenodo.py
requirements.txt
Acceptance criteria
generate_dump.py continues to work correctly with the modified workflow from sub-issue 01
upload_dump_zenodo.py error messages are accurate and no longer reference "ror-data" as a directory
DUMP_FILE_DIR is parameterizable via a CLI argument, with "./" as the default
requirements.txt dependencies are reviewed and updated where appropriate
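The DUMP_FILE_DIR and error-message criteria could be met along the lines of the sketch below. It is a minimal illustration, not the actual upload_dump_zenodo.py code. The flag names --release and --dump-dir, the helper dump_zip_name(), the DUMP_SUFFIX constant, and this get_dump_file() signature are all hypothetical. The sketch also makes an assumption of its own: it matches "<release>-" with a trailing hyphen rather than the bare release prefix, so that a release such as v1.5 cannot pick up a v1.50 file.

```python
import argparse
import os
import sys
from datetime import date
from typing import List, Optional

# Hypothetical constant mirroring the "-ror-data" suffix used by generate_dump.py.
DUMP_SUFFIX = "-ror-data"


def dump_zip_name(release: str, day: date) -> str:
    """Build a {release}-{YYYY-MM-DD}-ror-data.zip filename (hypothetical helper)."""
    return f"{release}-{day.isoformat()}{DUMP_SUFFIX}.zip"


def get_dump_file(release: str, dump_dir: str = "./") -> Optional[str]:
    """Return the path of the dump zip for `release` in `dump_dir`, or None.

    The filename must start with "<release>-" (the trailing hyphen keeps v1.5
    from matching v1.50) and contain "-ror-data.zip".
    """
    for name in sorted(os.listdir(dump_dir)):
        if name.startswith(f"{release}-") and f"{DUMP_SUFFIX}.zip" in name:
            return os.path.join(dump_dir, name)
    return None


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Locate the dump zip to upload (sketch).")
    parser.add_argument("--release", required=True, help="Release name, e.g. v1.50")
    # "./" stays the default, so existing invocations keep their current behavior.
    parser.add_argument(
        "--dump-dir",
        default="./",
        help="Directory containing the dump zip (default: ./)",
    )
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> int:
    args = parse_args(argv)
    dump_file = get_dump_file(args.release, args.dump_dir)
    if dump_file is None:
        # Generic message naming the directory searched, instead of "ror-data".
        print(f"Dump file not found in directory: {args.dump_dir}", file=sys.stderr)
        return 1
    print(f"Found dump file: {dump_file}")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

With this shape, the workflow could run something like python upload_dump_zenodo.py --release v1.50 --dump-dir downloads/ after its download step. Running it without --dump-dir keeps today's behavior of scanning the current directory.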