Context
This is a follow-up to #251 (Add GeoScienceWorld as a source).
GSW article pages embed GeoRef subject metadata with stratigraphic age/epoch terms (e.g. "Jurassic, Upper", "Holocene"). The initial GeoScienceWorld harvester (#251) focuses on spatial metadata only.
Temporal/epoch extraction has been intentionally deferred because it belongs in geoextent's GeoScienceWorld content provider, not in OPTIMAP's harvester. The upstream feature request is tracked at:
nuest/geoextent#122 — GeoScienceWorld provider: extract stratigraphic age/epoch from GeoRef subject metadata
What to do when geoextent#122 is merged
Once geoextent ships the epoch extraction in its GSW provider (i.e. from_remote(doi, bbox=True, tbox=True) returns a tbox for geological age), update works/harvesting/geoscienceworld.py to:
- Pass tbox=True to the relevant geoextent call.
- Map the returned tbox value (signed ISO 8601 envelope) to timeperiod_startdate / timeperiod_enddate on the Work record.
- Note in docs/CHANGELOG that geological timescale dates (e.g. -201300000-01-01) are stored as strings and that the UI timeline will not render them, so a follow-up UI task may be needed.
- Add/update the @tag('online') test for the GSW harvester to assert that timeperiod_startdate is populated for a paper with a known geological age.
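The mapping step might look roughly like the following minimal sketch. It assumes that the tbox arrives as a two-element [start, end] list of date strings; the exact return shape of geoextent's GSW provider is not confirmed until geoextent#122 lands. `tbox_to_timeperiod` is a hypothetical helper name and is not existing OPTIMAP code. The helper keeps the values as strings, because Python's `datetime` only supports years 1 to 9999. It also orders the two values by signed year so that deep-time dates compare chronologically instead of lexically.

```python
import re
from typing import Optional, Sequence, Tuple

# Signed ISO 8601 calendar date with an optionally expanded year,
# e.g. "-201300000-01-01" or "2020-05-17" (assumed tbox value format).
_SIGNED_DATE = re.compile(r"^([+-]?)(\d{4,})-(\d{2})-(\d{2})$")


def _sort_key(value: str) -> Tuple[int, int, int]:
    """Return a (year, month, day) key that orders signed dates chronologically."""
    match = _SIGNED_DATE.match(value)
    if match is None:
        raise ValueError(f"not a signed ISO 8601 date: {value!r}")
    sign, year, month, day = match.groups()
    signed_year = -int(year) if sign == "-" else int(year)
    return (signed_year, int(month), int(day))


def tbox_to_timeperiod(
    tbox: Optional[Sequence[str]],
) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: map a tbox envelope to
    (timeperiod_startdate, timeperiod_enddate), kept as strings."""
    if not tbox:
        # No temporal extent found: leave both Work fields empty.
        return (None, None)
    if len(tbox) != 2:
        raise ValueError(f"expected a [start, end] envelope, got {tbox!r}")
    start, end = (v.strip() for v in tbox)
    # Validate both values and swap them if they arrive in reverse order.
    if _sort_key(start) > _sort_key(end):
        start, end = end, start
    return (start, end)


if __name__ == "__main__":
    # Illustrative deep-time values given in reverse order.
    start, end = tbox_to_timeperiod(["-145000000-01-01", "-201300000-01-01"])
    print(start, end)  # -201300000-01-01 -145000000-01-01
```

A plain string comparison would rank "-145000000-01-01" before "-201300000-01-01". For that reason, the sketch orders the values on the parsed signed year.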