dgrotebeverborg/doi_resolver

DOI Resolver

DOI Resolver compares publication metadata across scholarly sources. Paste one DOI and see what each source reports. It highlights agreement, conflicts, and missing data. Every value keeps provenance (where it came from). It can prepare controlled updates for PURE.

Plain-Language Summary

DOI Resolver helps you answer one simple question: “What do different scholarly systems say about this publication?”

You paste a DOI, and the app:

  • checks multiple trusted scholarly sources,
  • shows where information agrees and where it differs,
  • shows where each piece of data came from,
  • helps prepare safe updates for PURE.

In short: it is a metadata comparison and quality-control tool for research outputs.

Who It Is For

  • Research support staff
  • University libraries
  • CRIS/PURE administrators
  • Researchers who want to verify publication metadata across systems

Screenshots

  • Resolver start
  • Comparison view
  • PURE update workflow
  • Full text and citations

DOI Resolver is built for transparency. You can see:

  • what was found,
  • where each value came from,
  • where sources disagree,
  • what can safely be pushed to PURE.

What It Does

1) Resolve and compare metadata

  • Input one DOI.
  • Query selected scholarly sources concurrently.
  • Return normalized publication, persons, organizations, identifiers, provenance, and comparison results.
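
The concurrent query step can be sketched as follows. This is a minimal illustration, not the project's resolver: it assumes each connector is an async callable that takes a DOI, and the names `fetch_all`, `fake_crossref`, and `fake_openalex` are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical connector signature: takes a DOI, returns parsed metadata.
Connector = Callable[[str], Awaitable[dict]]


async def fetch_all(doi: str, connectors: dict[str, Connector]) -> dict[str, dict]:
    """Query all connectors concurrently; a failing source does not abort the rest."""
    names = list(connectors)
    outcomes = await asyncio.gather(
        *(connectors[name](doi) for name in names),
        return_exceptions=True,  # collect errors instead of raising the first one
    )
    results: dict[str, dict] = {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, BaseException):
            results[name] = {"status": "error", "error": str(outcome)}
        else:
            results[name] = {"status": "ok", "data": outcome}
    return results


# Stand-in connectors for demonstration only (no network access).
async def fake_crossref(doi: str) -> dict:
    return {"title": "Example title"}


async def fake_openalex(doi: str) -> dict:
    raise TimeoutError("source timed out")


demo = asyncio.run(
    fetch_all("10.1234/example", {"crossref": fake_crossref, "openalex": fake_openalex})
)
```

Here `demo` holds an `ok` entry for the first source and an `error` entry for the second, so a single slow or failing source still leaves a usable partial result.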

2) Enrich entities

  • Merge person and organization candidates from multiple sources.
  • Keep provenance and source-level evidence.

3) Citation/network view

  • Show references/citations in a dedicated tab.
  • Keep citation sources separate from metadata comparison.

4) Full text discovery

  • Collect full-text candidates (PDF/landing/etc.) with access status and provenance.
  • Rank candidates without pretending all are open.
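
One way to rank candidates while keeping their access status visible is sketched below. The `FullTextCandidate` fields, the status labels, and the ordering tables are assumptions made for illustration; the project's actual model and ranking rules may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FullTextCandidate:
    url: str
    kind: str           # assumed labels: "pdf" or "landing"
    access_status: str  # assumed labels: "open", "unknown", "closed"
    source: str         # provenance: which source reported this candidate


# Hypothetical ordering tables: a lower number ranks earlier.
ACCESS_RANK = {"open": 0, "unknown": 1, "closed": 2}
KIND_RANK = {"pdf": 0, "landing": 1}


def rank_candidates(candidates: list[FullTextCandidate]) -> list[FullTextCandidate]:
    """Sort by access status, then by kind; the status itself is never rewritten."""
    return sorted(
        candidates,
        key=lambda c: (ACCESS_RANK.get(c.access_status, 1), KIND_RANK.get(c.kind, 2)),
    )


ranked = rank_candidates([
    FullTextCandidate("https://example.org/landing", "landing", "open", "crossref"),
    FullTextCandidate("https://example.org/paywalled.pdf", "pdf", "closed", "crossref"),
    FullTextCandidate("https://example.org/oa.pdf", "pdf", "open", "unpaywall"),
])
```

The closed PDF stays in the list, ranked last, rather than being dropped or presented as open.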

5) PURE update workflow

  • Generate candidate updates.
  • Apply selected metadata/person/org/publication-identifier updates.
  • Attach PDF to PURE through a separate controlled action.

Source Model

Sources are grouped by capability:

  • comparison: Crossref, OpenAlex, Unpaywall, DataCite, Europe PMC, PURE
  • enrichment: ORCID, ROR
  • citation: Semantic Scholar, OpenCitations
  • fulltext: Unpaywall, Europe PMC, CORE, Crossref links

This keeps the UI readable and avoids overloading the comparison table.
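
A capability-grouped registry could look roughly like this. The groupings follow the list above; the class names, keys, and the `sources_for` helper are hypothetical and are not taken from app/sources/registry.py.

```python
from dataclasses import dataclass
from enum import Enum


class Capability(str, Enum):
    COMPARISON = "comparison"
    ENRICHMENT = "enrichment"
    CITATION = "citation"
    FULLTEXT = "fulltext"


@dataclass(frozen=True)
class SourceSpec:
    key: str  # hypothetical internal identifier
    label: str
    capabilities: frozenset[Capability]


def _spec(key: str, label: str, *caps: Capability) -> SourceSpec:
    return SourceSpec(key, label, frozenset(caps))


C, E, N, F = (Capability.COMPARISON, Capability.ENRICHMENT,
              Capability.CITATION, Capability.FULLTEXT)

REGISTRY = (
    _spec("crossref", "Crossref", C, F),
    _spec("openalex", "OpenAlex", C),
    _spec("unpaywall", "Unpaywall", C, F),
    _spec("datacite", "DataCite", C),
    _spec("europe_pmc", "Europe PMC", C, F),
    _spec("pure", "PURE", C),
    _spec("orcid", "ORCID", E),
    _spec("ror", "ROR", E),
    _spec("semantic_scholar", "Semantic Scholar", N),
    _spec("opencitations", "OpenCitations", N),
    _spec("core", "CORE", F),
)


def sources_for(capability: Capability) -> list[str]:
    """Return the keys of all sources that offer the given capability."""
    return [s.key for s in REGISTRY if capability in s.capabilities]
```

With a single registry like this, the comparison table only needs `sources_for(Capability.COMPARISON)`, and a source such as Unpaywall can appear in more than one group without being declared twice.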

Tech Stack

  • Backend: FastAPI, Pydantic, httpx
  • Frontend: React, TypeScript, Vite
  • Tests: pytest (backend), typed frontend build checks

API Endpoints

  • GET /health
  • GET /sources
  • POST /resolve
  • POST /pure/enrichment/dry-run
  • POST /pure/enrichment/update
  • POST /pure/enrichment/attach-pdf

Architecture Overview

Backend

  • app/connectors/: one connector per source (fetch + parse only).
  • app/sources/registry.py: centralized typed source registry.
  • app/services/resolver.py: selected-source orchestration and partial-failure handling.
  • app/services/merge.py: merge/comparison logic for publication/entities.
  • app/services/pure_enrichment.py: PURE dry-run/update/attach workflows.
  • app/models/domain.py: typed internal domain models.

Frontend

  • src/services/api.ts: API client layer.
  • src/features/doi-resolver/: resolver logic + mapping helpers.
  • src/components/: presentation and interaction components.
  • Source configuration is centralized and typed; UI is generated from config.

Merge and Provenance Strategy

  • Preserve all source values in source_results.
  • Classify field comparison states (exact_match, close_match, conflict, missing).
  • Choose merged values using explainable rules (majority + source priority fallback).
  • Never discard provenance; every normalized value can carry source evidence.
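
The rules above can be sketched for a single text field as follows. This is an illustrative reading of the strategy, not the code in app/services/merge.py: the normalization, the meaning assigned to each state, and the `SOURCE_PRIORITY` order are all assumptions.

```python
from collections import Counter
from typing import Optional

# Hypothetical priority order, used only to break ties.
SOURCE_PRIORITY = ["crossref", "datacite", "openalex"]


def normalize(value: str) -> str:
    """Case-fold and collapse whitespace so trivial differences do not count."""
    return " ".join(value.casefold().split())


def classify(values: dict[str, Optional[str]]) -> str:
    """Assumed semantics: 'missing' means no source reported a value."""
    present = [v for v in values.values() if v]
    if not present:
        return "missing"
    if len(set(present)) == 1:
        return "exact_match"
    if len({normalize(v) for v in present}) == 1:
        return "close_match"
    return "conflict"


def _priority(source: str) -> int:
    return SOURCE_PRIORITY.index(source) if source in SOURCE_PRIORITY else len(SOURCE_PRIORITY)


def choose_merged(values: dict[str, Optional[str]]) -> tuple[Optional[str], list[str]]:
    """Majority vote on normalized values; source priority decides ties.

    Returns the chosen raw value and the sources that support it (provenance).
    """
    present = {s: v for s, v in values.items() if v}
    if not present:
        return None, []
    counts = Counter(normalize(v) for v in present.values())
    best = max(counts.values())
    leaders = {key for key, n in counts.items() if n == best}
    backers = sorted((s for s, v in present.items() if normalize(v) in leaders), key=_priority)
    winner = backers[0]
    chosen = normalize(present[winner])
    supporters = [s for s in backers if normalize(present[s]) == chosen]
    return present[winner], supporters


titles = {
    "crossref": "Deep Learning",
    "openalex": "deep  learning",
    "datacite": "Deep Learning for All",
}
state = classify(titles)
merged, supporters = choose_merged(titles)
```

In this example two of three sources agree after normalization, so the field is a `conflict`, the merged value comes from the majority, and the returned supporter list explains which sources backed it.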

PURE Integration (Current Behavior)

Supported in Update Pure

  • Publication fields: title, journal, abstract, volume, issue, pages, open_access_status, license, keywords, subjects, language, published_year, published_date
  • Publication identifiers (filtered to publication-level types)
  • Linking matched persons and organizations to the research output

Separate action

  • Attach PDF to PURE:
    • tries eligible PDF candidates,
    • skips blocked/challenge pages,
    • uploads through PURE file upload API,
    • links file as FileElectronicVersion.
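
The candidate-selection part of this action might be sketched as below, assuming a download helper is injected so the logic can be tested without network access. `first_usable_pdf`, `looks_like_pdf`, and the `fetch` signature are hypothetical; the upload and linking steps depend on the PURE API and are not shown.

```python
from typing import Callable, Iterable, Optional

# Hypothetical download helper: returns (content_type, body) for a URL.
Fetch = Callable[[str], tuple[str, bytes]]


def looks_like_pdf(content_type: str, body: bytes) -> bool:
    """Reject HTML challenge or blocked pages served in place of a PDF."""
    return body.startswith(b"%PDF-") and "html" not in content_type.lower()


def first_usable_pdf(urls: Iterable[str], fetch: Fetch) -> Optional[tuple[str, bytes]]:
    """Try candidates in order and return the first one that is really a PDF."""
    for url in urls:
        try:
            content_type, body = fetch(url)
        except Exception:
            continue  # network error or blocked request: try the next candidate
        if looks_like_pdf(content_type, body):
            return url, body
    return None


# Stand-in responses for demonstration only.
_FAKE_RESPONSES = {
    "https://example.org/challenge": ("text/html", b"<html>Verify you are human</html>"),
    "https://example.org/paper.pdf": ("application/pdf", b"%PDF-1.7 ..."),
}


def fake_fetch(url: str) -> tuple[str, bytes]:
    return _FAKE_RESPONSES[url]  # raises KeyError for unknown URLs


picked = first_usable_pdf(
    ["https://example.org/missing", "https://example.org/challenge", "https://example.org/paper.pdf"],
    fake_fetch,
)
```

Checking the `%PDF-` file signature, and not only the declared content type, guards against anti-bot pages that answer with HTML.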

Quick Start

1) Configure environment

Copy the component-specific templates:

cp backend/.env.example backend/.env
cp frontend/.env.example frontend/.env

Do not commit .env files.

Important variables:

  • UNPAYWALL_EMAIL (required for Unpaywall)
  • PURE_BASE_URL, PURE_APIKEY (required for PURE features)
  • SEMANTIC_SCHOLAR_API_KEY (optional but useful)
  • CORE_API_KEY (optional, enables CORE)
  • VITE_API_BASE_URL (frontend -> backend URL)
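
A quick way to see which optional features a given environment would enable is sketched below. The variable names come from the list above; the `enabled_features` helper and its feature labels are hypothetical and do not reflect how the backend actually reads its settings.

```python
import os
from typing import Mapping


def enabled_features(env: Mapping[str, str] = os.environ) -> dict[str, bool]:
    """Report which features have all of their required variables set."""

    def has(*names: str) -> bool:
        return all(env.get(name, "").strip() for name in names)

    return {
        "unpaywall": has("UNPAYWALL_EMAIL"),
        "pure": has("PURE_BASE_URL", "PURE_APIKEY"),
        "core": has("CORE_API_KEY"),
        "semantic_scholar_key": has("SEMANTIC_SCHOLAR_API_KEY"),
    }


features = enabled_features({
    "UNPAYWALL_EMAIL": "library@example.org",
    "PURE_BASE_URL": "https://pure.example.org",
    # PURE_APIKEY is intentionally missing, so PURE features stay off.
})
```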

2) Start backend

cd backend
../.venv/bin/pip install -e ".[dev]"
../.venv/bin/uvicorn app.main:app --host 127.0.0.1 --port 8011

3) Start frontend

cd frontend
npm install
npm run dev -- --host 127.0.0.1 --port 4173

Open: http://127.0.0.1:4173

Optional: start both with one command

./scripts/dev-up.sh

Defaults:

  • backend: 127.0.0.1:8011
  • frontend: 127.0.0.1:4173

Override ports if needed:

BACKEND_PORT=8013 FRONTEND_PORT=4175 ./scripts/dev-up.sh

Testing

Run backend tests:

cd backend
../.venv/bin/python -m pytest

A detailed pytest scheme (layers, markers, skip policy, quality gates) is documented separately.

Optional live DOI tests:

cd backend
RUN_LIVE_DOI_TESTS=1 ../.venv/bin/python -m pytest -m live_doi

Run frontend build/type checks:

cd frontend
npm run build

Run frontend tests:

cd frontend
npm test

Add a New Source

  1. Add connector in backend/app/connectors/.
  2. Declare source metadata/category in backend/app/sources/registry.py.
  3. Return normalized SourceResult from connector parser.
  4. Add env config if needed.
  5. Add tests for parser + resolver behavior.
  6. Update frontend source descriptions (single centralized config).
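
Step 3 could look roughly like the parser below. Everything here is a stand-in: this `SourceResult` is a simplified hypothetical model, not the project's class, and the payload shape belongs to an imaginary source. It only illustrates the "fetch + parse only" split, where the parser turns a raw payload into a normalized result and leaves merging to other layers.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SourceResult:
    """Simplified, hypothetical stand-in for the project's normalized result."""
    source: str
    status: str  # assumed labels: "ok", "not_found", "error"
    publication: dict[str, Any] = field(default_factory=dict)
    error: Optional[str] = None


def parse_example_source(payload: Optional[dict]) -> SourceResult:
    """Map a raw payload from an imaginary source onto normalized fields."""
    if not payload:
        return SourceResult(source="example_source", status="not_found")
    try:
        publication = {
            "title": payload["title"].strip(),
            "published_year": int(payload["year"]) if payload.get("year") else None,
            "journal": payload.get("container"),
        }
    except (KeyError, AttributeError, TypeError, ValueError) as exc:
        # Report parse problems in the result instead of raising.
        return SourceResult(source="example_source", status="error", error=repr(exc))
    return SourceResult(source="example_source", status="ok", publication=publication)


parsed = parse_example_source({"title": " An Example Paper ", "year": "2021"})
empty = parse_example_source(None)
broken = parse_example_source({"year": "2021"})
```

Returning an error status rather than raising keeps one malformed response from breaking the whole resolve call.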

Current Limitations

  • External source rate limits and anti-bot pages can reduce coverage.
  • Some fields are source-type/PURE-type dependent and cannot always be persisted.
  • Citation graph data is presented in the app but not written as native PURE citation graph fields.

Pre-Publish Checklist

  • Verify no secrets are committed (.env files, API keys, tokens).
  • Confirm .env setup works from backend/.env.example and frontend/.env.example.
  • Run backend tests: cd backend && ../.venv/bin/python -m pytest
  • Run frontend tests/build: cd frontend && npm test && npm run build
  • Test key manual flows:
    • resolve DOI
    • compare sources
    • PURE dry-run/update
    • Attach PDF to PURE (safe and risky mode)
  • Confirm README startup steps work on a clean machine.

License

This project is licensed under the GNU Affero General Public License v3.0 only (AGPL-3.0-only). See LICENSE.
