
GeoScienceWorld harvester: add temporal/epoch extraction once geoextent#122 is implemented #257

@nuest

Description


Context

This is a follow-up to #251 (Add GeoScienceWorld as a source).

GeoScienceWorld (GSW) article pages embed GeoRef subject metadata that includes stratigraphic age/epoch terms (e.g. "Jurassic, Upper", "Holocene"). The initial GeoScienceWorld harvester (#251) extracts spatial metadata only.

Temporal/epoch extraction has been intentionally deferred because it belongs in geoextent's GeoScienceWorld content provider, not in OPTIMAP's harvester. The upstream feature request is tracked at:

nuest/geoextent#122: GeoScienceWorld provider: extract stratigraphic age/epoch from GeoRef subject metadata

What to do when geoextent#122 is merged

Once geoextent ships the epoch extraction in its GSW provider (i.e. from_remote(doi, bbox=True, tbox=True) returns a tbox for geological age), update works/harvesting/geoscienceworld.py to:

  1. Pass tbox=True to the relevant geoextent call.
  2. Map the returned tbox value (signed ISO 8601 envelope) to timeperiod_startdate / timeperiod_enddate on the Work record.
  3. Note in docs/CHANGELOG that geological timescale dates (e.g. -201300000-01-01) are stored as strings. The UI timeline will not render them, so a follow-up UI task may be needed.
  4. Add/update the @tag('online') test for the GSW harvester to assert timeperiod_startdate is populated for a paper with a known geological age.
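Steps 2 and 3 can be sketched as a small pure function. This is a minimal sketch, not the planned implementation. It assumes geoextent returns a dict whose `"tbox"` entry is a two-element list `[start, end]` of (possibly signed, expanded-year) ISO 8601 date strings. The helper names `tbox_to_timeperiod` and `date_key` are hypothetical and exist in neither OPTIMAP nor geoextent. The sketch does not call geoextent itself, so it runs offline.

```python
import re
from typing import Optional, Tuple

# Signed ISO 8601 calendar date, optionally with an expanded year,
# e.g. "-201300000-01-01" or "2020-05-17".
# Assumption: geoextent emits tbox bounds in this form.
_SIGNED_ISO_DATE = re.compile(r"^([+-]?)(\d{4,})-(\d{2})-(\d{2})$")


def date_key(value: str) -> Tuple[int, int, int]:
    """Return a sortable (signed year, month, day) key for a date string."""
    match = _SIGNED_ISO_DATE.match(value)
    if match is None:
        raise ValueError(f"not a signed ISO 8601 date: {value!r}")
    sign, year, month, day = match.groups()
    signed_year = -int(year) if sign == "-" else int(year)
    return signed_year, int(month), int(day)


def tbox_to_timeperiod(result: dict) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: map a geoextent-style result to
    (timeperiod_startdate, timeperiod_enddate).

    Bounds are kept as strings because Python's datetime cannot
    represent years before datetime.MINYEAR (1).
    """
    tbox = result.get("tbox")
    if not tbox or len(tbox) != 2:
        # No temporal extent returned: leave both Work fields empty.
        return None, None
    start, end = (str(v) for v in tbox)
    # date_key validates both bounds (raises ValueError otherwise);
    # swap them if they arrive in reverse order.
    if date_key(start) > date_key(end):
        start, end = end, start
    return start, end
```

For example, an Early Jurassic envelope maps straight through:

```python
tbox_to_timeperiod({"tbox": ["-201300000-01-01", "-174100000-01-01"]})
# -> ("-201300000-01-01", "-174100000-01-01")
```

Keeping the mapping separate from the network call means the ordering and validation logic can be unit-tested without the `@tag('online')` test.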

Metadata

Assignees: no one assigned

Labels: enhancement (New feature or request)
