Add uniqueId and organization fields to concept set metadata for cross-system sharing #1

@BorisDelange

Context

Concept sets in this repository use a local integer id that is only meaningful within our system. The OHDSI Concept Set Specification explicitly states that id is "a unique identifier for the concept set within a given system" and does not define any cross-system identifier.

This makes it difficult to:

  • Share concept sets with other organizations or OHDSI community members
  • Reference a specific concept set unambiguously across systems (e.g., in publications, study packages, or federated analyses)
  • Track provenance and attribution when concept sets are reused

This is a well-known gap in the OHDSI ecosystem. See notably:

  • OHDSI/Atlas#496 — extensive discussion on adding UUID/GUID to cohort definitions (2017–2019, never implemented)
  • OHDSI/Strategus#114 — modern JSON schema using URI references for portable artifacts
  • OHDSI/Athena#48 — proposal for a shared design repository with versioning

Proposal

Add two new fields to the concept set metadata object:

1. uniqueId (string, UUID v4)

A globally unique identifier for the concept set, generated once at creation and stable across versions.

{
  "metadata": {
    "uniqueId": "550e8400-e29b-41d4-a716-446655440000",
    ...
  }
}
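A minimal sketch of the "generated once, stable across versions" behaviour, assuming a concept set is held as a plain Python dict. The helper name `ensure_unique_id` and the sample concept set are hypothetical and not part of this repository.

```python
import uuid


def ensure_unique_id(concept_set: dict) -> dict:
    """Assign metadata.uniqueId once; leave an existing value untouched."""
    metadata = concept_set.setdefault("metadata", {})
    if "uniqueId" not in metadata:
        # uuid4() is random-based, so no central registry or network call is needed.
        metadata["uniqueId"] = str(uuid.uuid4())
    return concept_set


# Hypothetical concept set, for illustration only.
concept_set = {"id": 1, "name": "Example concept set", "version": "1.0.0", "metadata": {}}

ensure_unique_id(concept_set)
first_id = concept_set["metadata"]["uniqueId"]

# A later version bump must not change the identifier.
concept_set["version"] = "1.1.0"
ensure_unique_id(concept_set)
second_id = concept_set["metadata"]["uniqueId"]
```

Calling the helper on every save would keep it idempotent: only concept sets that lack a `uniqueId` receive one.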

Rationale: The OHDSI community debated UUIDs versus content hashes (OHDSI/Atlas#496). A UUID v4 is simpler and more practical for our use case:

  • It stays stable when the concept set is updated (content hashes would change on every edit)
  • It can be generated offline without any central registry
  • It is universally understood and supported

The local id (integer) would remain for internal use (file naming, URLs, backward compatibility).

2. organization (object)

Attribution to the organization that created or maintains the concept set.

{
  "metadata": {
    "organization": {
      "name": "INDICATE Consortium",
      "url": "https://indicate-eu.org"
    },
    ...
  }
}

This would make it clear where a concept set comes from when shared externally.

Example of a full metadata block after these changes

{
  "id": 1,
  "name": "3-minute Diagnostic Interview for CAM-defined Delirium (3D-CAM) score",
  "version": "1.0.0",
  "metadata": {
    "uniqueId": "550e8400-e29b-41d4-a716-446655440000",
    "organization": {
      "name": "INDICATE Consortium",
      "url": "https://indicate-eu.org"
    },
    "translations": { "..." : "..." },
    "createdByDetails": { "..." : "..." },
    "reviews": [],
    "versions": []
  }
}
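A consumer receiving a shared concept set might check the two new fields before trusting them. The sketch below assumes both fields are required and that `url` must be an http(s) URL. Those rules, and the function name `validate_metadata`, are assumptions for illustration; the proposal itself does not define validation.

```python
import uuid
from urllib.parse import urlparse


def validate_metadata(metadata: dict) -> list:
    """Return a list of problems found in uniqueId and organization."""
    errors = []

    unique_id = metadata.get("uniqueId")
    if not isinstance(unique_id, str):
        errors.append("uniqueId must be a string")
    else:
        try:
            if uuid.UUID(unique_id).version != 4:
                errors.append("uniqueId must be a UUID v4")
        except ValueError:
            errors.append("uniqueId is not a valid UUID")

    organization = metadata.get("organization")
    if not isinstance(organization, dict):
        errors.append("organization must be an object")
    else:
        name = organization.get("name")
        if not isinstance(name, str) or not name.strip():
            errors.append("organization.name must be a non-empty string")
        url = organization.get("url")
        parsed = urlparse(url) if isinstance(url, str) else None
        if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append("organization.url must be an http(s) URL")

    return errors


# Values taken from the example block above.
good = {
    "uniqueId": "550e8400-e29b-41d4-a716-446655440000",
    "organization": {"name": "INDICATE Consortium", "url": "https://indicate-eu.org"},
}
# Deliberately broken input to exercise the checks.
bad = {"uniqueId": "not-a-uuid", "organization": {"name": "", "url": "indicate-eu.org"}}

good_errors = validate_metadata(good)
bad_errors = validate_metadata(bad)
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in a shared file at once.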

Design decisions

  • UUID v4 for uniqueId (not a content hash): stable across edits, generated offline, universally supported.
  • uniqueId inside metadata: respects the current OHDSI Concept Set Specification, which treats metadata as extensible. This avoids deviating from the spec while we propose changes upstream.

Open questions

  1. Additional organization fields? Should we add fields like contactEmail, license, or a ROR ID?

References

Next step: If we reach consensus on this proposal, we will open an issue on OHDSI/TAB to propose extending the official Concept Set Specification accordingly.
