Add DOI support and enhance dataset metadata structure #44

Merged
headmetal merged 9 commits into main from zenodo_doi_updates on Mar 18, 2026

@headmetal (Collaborator)

No description provided.

Expose dataset DOIs in the catalog and update repository metadata. DataCatalog now resolves a "doi" field (default None) and includes it in dataset records/DataFrame for both subdatasets and versioned datasets. datasets.yaml was updated to include placeholder DOIs (10.000/ABCD) for many entries. .zenodo.json creators, title, description and keywords were also updated to reflect the project and contributor changes.
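The first commit has DataCatalog read an optional "doi" key, defaulting to None, and carry it into each record for both plain and versioned datasets. The following is a minimal stdlib sketch of that lookup, not ccdtools' actual code: the config shape, the `resolve_doi`/`build_records` helpers and the record keys are hypothetical, and a list of dicts stands in for the catalog DataFrame.

```python
from typing import Any, Optional


def resolve_doi(entry: dict[str, Any]) -> Optional[str]:
    """Return the entry's DOI, or None when the config omits it (hypothetical helper)."""
    return entry.get("doi")


def build_records(datasets: dict[str, dict[str, Any]]) -> list[dict[str, Any]]:
    """Flatten an assumed datasets config into one record per dataset or subdataset."""
    records: list[dict[str, Any]] = []
    for name, entry in datasets.items():
        subdatasets = entry.get("subdatasets")
        if subdatasets:
            # Versioned dataset: one record per subdataset, each with its own DOI.
            for sub_name, sub_entry in subdatasets.items():
                records.append(
                    {"dataset": name, "subdataset": sub_name, "doi": resolve_doi(sub_entry)}
                )
        else:
            # Plain dataset: a single record; DOI falls back to None if absent.
            records.append({"dataset": name, "subdataset": None, "doi": resolve_doi(entry)})
    return records


# Assumed example config; "10.000/ABCD" is the placeholder DOI used in the PR.
example = {
    "bedmap": {"path": "bedmap"},
    "measures_bedmachine_antarctica": {
        "subdatasets": {"v1": {"doi": "10.000/ABCD"}, "v2": {}},
    },
}
records = build_records(example)
```

Defaulting to None rather than raising keeps entries that still lack a DOI loadable, which matters while many entries only carry placeholders.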
Remove the top-level DOI for measures_bedmachine_antarctica and add a subdatasets block with individual DOIs for versions v1, v2, and v3. This updates dataset metadata to track DOIs per dataset version while leaving paths and other attributes unchanged.
Update src/ccdtools/config/datasets.yaml to add 'subpath' and 'extension' entries for subdatasets v1, v2, and v3 (subpath set to the version name and extension set to 'nc') to make dataset locations and file types explicit.
Wrap doi, subpath and extension for subdatasets v1, v2, and v3 under a new `data` mapping in src/ccdtools/config/datasets.yaml to standardize the subdataset schema and make room for additional metadata fields.
Adjust datasets.yaml structure: remove the duplicate top-level `extension: nc` and replace the nested `data:` keys under each subdataset with explicit version keys (`v1`, `v2`, `v3`). This aligns the file with the expected config schema so each subdataset's metadata (doi, subpath, extension) is nested under its version key and can be parsed correctly.
Remove deeply nested subdatasets mapping and replace with a flatter structure: a single 'extension' field and a 'doi' mapping keyed by version (v1, v2, v3). This simplifies the datasets config, avoids repeating the extension for each subdataset, and centralizes DOIs for easier maintenance. No changes to the bedmap entry.
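After this commit, an entry carries one shared `extension` and a `doi` mapping keyed by version. Below is a sketch of how a reader of that flattened shape might pick the DOI for a given version. It assumes the YAML has already been loaded into a dict; `BEDMACHINE` and `doi_for_version` are hypothetical names, and accepting a scalar `doi` is an assumption made so that entries outside this block still resolve.

```python
from typing import Any, Optional

# Assumed shape of the flattened entry once YAML is loaded into a dict.
# DOIs reuse the PR's placeholder value.
BEDMACHINE: dict[str, Any] = {
    "extension": "nc",
    "doi": {"v1": "10.000/ABCD", "v2": "10.000/ABCD", "v3": "10.000/ABCD"},
}


def doi_for_version(entry: dict[str, Any], version: str) -> Optional[str]:
    """Return the DOI for one version of an entry (hypothetical helper)."""
    doi = entry.get("doi")
    if isinstance(doi, dict):
        # Per-version mapping: an unknown version yields None.
        return doi.get(version)
    # Scalar DOI (or none at all) applies to every version.
    return doi


# The shared extension is read once rather than repeated per subdataset.
extension = BEDMACHINE["extension"]
v3_doi = doi_for_version(BEDMACHINE, "v3")
```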
Test adding DOI to subdatasets
Add DOI to the metadata printout in DataCatalog by inserting print(f'\nDOI: {doi}') after the version line. This ensures the catalog display includes the DOI for easier reference.
Extract the first row from `subset` earlier to obtain and print the DOI, and remove the later redundant `row = subset.iloc[0]`. This ensures `doi` is read from the row before printing and avoids duplicating the row extraction when accessing metadata.
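The last two commits read the first matching row once, before printing, and add a DOI line after the version line. The sketch below shows that ordering with the standard library only. A list of dicts stands in for the pandas `subset` (where the PR uses `subset.iloc[0]`), and `format_metadata` plus its field names are hypothetical; it returns the text so the caller decides where to print it.

```python
from typing import Any, Mapping, Sequence


def format_metadata(subset: Sequence[Mapping[str, Any]]) -> str:
    """Build a metadata printout from the first matching row (hypothetical helper)."""
    if not subset:
        raise ValueError("no matching dataset rows")
    # Extract the row once, up front, so the DOI is available before printing.
    row = subset[0]
    doi = row.get("doi")
    lines = [
        f"Dataset: {row['dataset']}",
        f"Version: {row['version']}",
        # Mirrors the commit's print(f'\nDOI: {doi}'): a blank line, then the DOI.
        f"\nDOI: {doi}",
    ]
    return "\n".join(lines)


rows = [{"dataset": "measures_bedmachine_antarctica", "version": "v2", "doi": "10.000/ABCD"}]
text = format_metadata(rows)
print(text)
```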
headmetal self-assigned this on Mar 18, 2026
headmetal marked this pull request as ready for review on March 18, 2026 at 21:40
headmetal merged commit 1267c2e into main on Mar 18, 2026
5 checks passed

Labels: None yet
Projects: None yet

1 participant