Add DOI support and enhance dataset metadata structure #44
Merged
Expose dataset DOIs in the catalog and update repository metadata. DataCatalog now resolves a "doi" field (default None) and includes it in dataset records/DataFrame for both subdatasets and versioned datasets. datasets.yaml was updated to include placeholder DOIs (10.000/ABCD) for many entries. .zenodo.json creators, title, description and keywords were also updated to reflect the project and contributor changes.
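The "default None" behaviour described above can be sketched as follows. This is a minimal illustration, not the actual `DataCatalog` code: `CONFIG`, `build_records`, and the dataset names are hypothetical, and plain dictionaries stand in for the DataFrame the catalog builds.

```python
from typing import Any, Optional

# Hypothetical config, loosely shaped like entries in datasets.yaml.
# "10.000/ABCD" is the placeholder DOI mentioned in this PR.
CONFIG: dict[str, dict[str, Any]] = {
    "dataset_with_doi": {"path": "some/path", "doi": "10.000/ABCD"},
    "dataset_without_doi": {"path": "other/path"},
}


def build_records(config: dict[str, dict[str, Any]]) -> list[dict[str, Optional[str]]]:
    """Return one record per dataset, always including a "doi" key."""
    records = []
    for name, entry in config.items():
        records.append(
            {
                "dataset": name,
                "path": entry.get("path"),
                # A missing DOI resolves to None instead of raising KeyError.
                "doi": entry.get("doi", None),
            }
        )
    return records


records = build_records(CONFIG)
for record in records:
    print(record["dataset"], record["doi"])
```

Because every record carries a `doi` key, downstream code can rely on the column existing even for datasets that have no DOI yet.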
Remove the top-level DOI for measures_bedmachine_antarctica and add a subdatasets block with individual DOIs for versions v1, v2, and v3. This updates dataset metadata to track DOIs per dataset version while leaving paths and other attributes unchanged.
Update src/ccdtools/config/datasets.yaml to add 'subpath' and 'extension' entries for subdatasets v1, v2, and v3 (subpath set to the version name and extension set to 'nc') to make dataset locations and file types explicit.
Wrap doi, subpath and extension for subdatasets v1, v2, and v3 under a new `data` mapping in src/ccdtools/config/datasets.yaml to standardize the subdataset schema and make room for additional metadata fields.
Adjust datasets.yaml structure: remove the duplicate top-level `extension: nc` and replace the nested `data:` keys under each subdataset with explicit version keys (`v1`, `v2`, `v3`). This aligns the file with the expected config schema so each subdataset's metadata (doi, subpath, extension) is nested under its version key and can be parsed correctly.
Remove deeply nested subdatasets mapping and replace with a flatter structure: a single 'extension' field and a 'doi' mapping keyed by version (v1, v2, v3). This simplifies the datasets config, avoids repeating the extension for each subdataset, and centralizes DOIs for easier maintenance. No changes to the bedmap entry.
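One way to read the flatter layout is sketched below, assuming the YAML is loaded into a plain dictionary. `ENTRY` and `resolve_doi` are hypothetical names, and the DOI strings are placeholders derived from the `10.000/ABCD` placeholder used in this PR, not real identifiers.

```python
from typing import Any, Optional

# Hypothetical Python mirror of the flatter entry: one shared extension
# and a "doi" mapping keyed by version. The DOI values are placeholders.
ENTRY: dict[str, Any] = {
    "extension": "nc",
    "doi": {
        "v1": "10.000/ABCD-v1",
        "v2": "10.000/ABCD-v2",
        "v3": "10.000/ABCD-v3",
    },
}


def resolve_doi(entry: dict[str, Any], version: Optional[str] = None) -> Optional[str]:
    """Return the DOI for a version, or None when none is recorded."""
    doi = entry.get("doi")
    if isinstance(doi, dict):
        # Versioned entry: look the DOI up by version key.
        return doi.get(version)
    # Unversioned entry: "doi" is a single string, or absent (None).
    return doi


print(resolve_doi(ENTRY, "v2"))
```

Accepting both a single string and a per-version mapping lets the same helper serve entries that keep a top-level DOI as well as versioned ones.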
Test adding DOIs to subdatasets.
Add DOI to the metadata printout in DataCatalog by inserting print(f'\nDOI: {doi}') after the version line. This ensures the catalog display includes the DOI for easier reference.
Extract the first row from `subset` earlier to obtain and print the DOI, and remove the later redundant `row = subset.iloc[0]`. This ensures `doi` is read from the row before printing and avoids duplicating the row extraction when accessing metadata.
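The reordering described above can be sketched as follows. This is an assumption-laden illustration: `format_metadata` is a hypothetical helper, a list of dictionaries stands in for the `subset` DataFrame (so `subset[0]` replaces `subset.iloc[0]`), and the "Dataset" and "Version" labels are invented for the example.

```python
from typing import Any


def format_metadata(subset: list[dict[str, Any]]) -> str:
    """Build the metadata printout, reading the first row only once."""
    if not subset:
        raise ValueError("subset is empty")

    # Extract the row up front so every field, including the DOI,
    # comes from the same single lookup.
    row = subset[0]
    doi = row.get("doi")

    lines = [
        f"Dataset: {row['dataset']}",
        f"Version: {row['version']}",
        # The DOI line follows the version line, as in the PR.
        f"\nDOI: {doi}",
    ]
    return "\n".join(lines)


example = [{"dataset": "example_dataset", "version": "v1", "doi": "10.000/ABCD"}]
output = format_metadata(example)
print(output)
```

Reading the row once keeps the DOI and the remaining metadata consistent and avoids a second, redundant row extraction later in the function.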
No description provided.