Feature Request: Automated Retrieval of Distinct Column Values in Schema Page #296

@YemreGurses

Description

Currently, users must manually identify all possible values for categorical columns (e.g., encounter class code) when configuring data sources. This process is time-consuming and error-prone. For example, we end up entering the possible values for the EncounterClassCode column of a data source by hand:

Image

It would be highly beneficial if toFHIR could verify and retrieve distinct values directly from the data source and display them within the schema page.

Proposed Solution:

UI: Add a "Fetch Distinct Values" button (or similar control) next to each column in the schema view.

Backend: When triggered, toFHIR should use the underlying Spark engine to run a groupBy or distinct query on the selected column and fetch its unique values.

UX: The retrieved values should then be displayed to the user (for example, placed in the Definition field), allowing for easier mapping or validation.

We must be careful with non-categorical columns such as dates or unique IDs. Running distinct-value queries on these high-cardinality fields can be resource-intensive, and the resulting list is rarely useful for mapping purposes. The backend query must enforce a hard limit (e.g., LIMIT 100) on the number of distinct values retrieved to avoid crashes. If the column contains more unique values than the limit, the system should return only the first N results (or a sample) to ensure stability.
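
To make the capping behaviour concrete, here is a minimal sketch in plain Python. It is not toFHIR code: the names `fetch_distinct_values` and `DistinctValues` and the `truncated` flag are hypothetical, and the column is modelled as an ordinary iterable rather than a Spark column. In Spark, a comparable effect could presumably be achieved with something like `df.select(column).distinct().limit(limit + 1)`, where fetching one extra row is enough to tell whether the cap was hit.

```python
from dataclasses import dataclass
from typing import Hashable, Iterable, List


@dataclass
class DistinctValues:
    """Hypothetical result type for a capped distinct-value lookup."""
    values: List[Hashable]  # at most `limit` distinct values, in first-seen order
    truncated: bool         # True if the column holds more than `limit` distinct values


def fetch_distinct_values(column: Iterable[Hashable], limit: int = 100) -> DistinctValues:
    """Collect up to `limit` distinct values and stop scanning once the cap is exceeded."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    seen = {}  # dict used as an insertion-ordered set
    for value in column:
        if value in seen:
            continue
        if len(seen) == limit:
            # A new distinct value appeared after the cap was reached:
            # report truncation instead of reading the rest of the column.
            return DistinctValues(list(seen), truncated=True)
        seen[value] = None
    return DistinctValues(list(seen), truncated=False)


# A categorical column fits under the cap; an ID-like column is cut off early.
codes = fetch_distinct_values(["AMB", "IMP", "AMB", "EMER"], limit=100)
ids = fetch_distinct_values(range(1_000_000), limit=100)
print(codes.values, codes.truncated)
print(len(ids.values), ids.truncated)
```

Returning a truncation flag alongside the values would let the schema page tell the user that the list is incomplete, which is a likely sign that the column is not categorical.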
