
Implement data standards module (0.5.0) #19

@ericpan64

Description

Problem

Alright, this is a big reason why I wrote pydian: to make it easy to convert data between two data standards. Given two well-defined data schemas, there should be a way to import a repeatable mapping function between them, so people can work together on one standard mapping instead of rewriting the code from scratch or using a bunch of spreadsheets that no one likes working in (though they are necessary and make sense).

Requested feature

(In progress: update as ideas evolve)

```python
from typing import Any

from pydian import Mapper
from pydian import interchange  # Name TBD: get, select, validate, __interchange?__
from pydian.standards import DataStandard, DataMapping  # Proposed module (not implemented yet)

some_mapping_fn = Mapper(...)
some_input_data: dict[str, Any] = { ... }
some_output_data = some_mapping_fn(some_input_data)

# `DataStandard`: validation with "compliance" levels, and can combine other standards.


# `DataMapping`: importable logic between two standards
input_to_output_mapping = DataMapping(
    some_mapping_fn,  # Some callable expecting input, returning output
    description="...",  # Human-readable description
    input_schema={ ... },  # Validates input; on failure, wraps in Err with the data (strict=True raises instead)
    output_schema={ ... },  # Validates output; on failure, wraps in Err with the data (strict=True raises instead)
    strict=False,
    # ... other options
)

# Export a mapping function in some standard, readable, and hard-to-hack format
with open("./input_to_output.mappy-v1.json", "w") as f:  # A mapping in Python: `map py`? Also sounds like happy lol
    input_to_output_mapping.export(f)

# Have a standard way to load a mapping file (`mimport` is a proposed name):
some_mapping = mimport("./input_to_output.mappy-v1.json")
```
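
The `DataStandard` idea above (validation with "compliance" levels that can combine other standards) could be prototyped along these lines. This is a minimal sketch, not pydian's API: the class, its `checks` layout, and the rule that a record's compliance level is the highest level L whose checks at levels 1 through L all pass are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

# A check takes a record and says whether it satisfies one rule
Check = Callable[[Dict[str, Any]], bool]


@dataclass
class DataStandard:
    """Hypothetical sketch: a named standard whose checks are grouped by compliance level."""

    name: str
    # Compliance level (1 = most basic, then 2, 3, ...) -> checks required at that level
    checks: Dict[int, List[Check]] = field(default_factory=dict)

    def compliance_level(self, data: Dict[str, Any]) -> int:
        """Highest level L such that every check at levels 1..L passes (0 if level 1 fails)."""
        level = 0
        for lvl in sorted(self.checks):
            # Levels must be consecutive; stop at a gap or at the first failing level
            if lvl != level + 1 or not all(check(data) for check in self.checks[lvl]):
                break
            level = lvl
        return level

    def combine(self, other: "DataStandard", name: Optional[str] = None) -> "DataStandard":
        """New standard containing both sets of checks, merged level by level."""
        merged = {lvl: list(cs) for lvl, cs in self.checks.items()}
        for lvl, cs in other.checks.items():
            merged.setdefault(lvl, []).extend(cs)
        return DataStandard(name or f"{self.name}+{other.name}", merged)


# Usage: a toy level-1 rule combined with a toy level-2 rule
basic = DataStandard("basic", {1: [lambda d: "id" in d]})
named = DataStandard("named", {2: [lambda d: isinstance(d.get("name"), str)]})
combined = basic.combine(named)

print(combined.compliance_level({"id": "123"}))                 # 1
print(combined.compliance_level({"id": "123", "name": "Ada"}))  # 2
print(combined.compliance_level({"name": "Ada"}))               # 0
```

Because `combine` copies the check lists, the original standards are left unchanged, so a shared base standard can be reused in several combinations.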

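Similarly, here is a rough sketch of how `DataMapping` might wrap a mapping function with input/output validation, returning an `Err` that carries the data on failure and raising when `strict=True`. Everything here is hypothetical: the proposal calls for jsonschema, but to keep the sketch dependency-free it swaps in a toy schema format (a required key mapped to an expected Python type), and the `Err` dataclass and `check_schema` helper are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Union

# Stand-in for JSON Schema (assumption): required key -> expected Python type
Schema = Dict[str, type]


@dataclass
class Err:
    """Failure result that keeps the offending data next to the error messages."""

    data: Any
    errors: List[str]


def check_schema(data: Dict[str, Any], schema: Schema) -> List[str]:
    """Return validation errors; an empty list means `data` conforms."""
    errors = []
    for key, expected in schema.items():
        if key not in data:
            errors.append(f"missing key: {key!r}")
        elif not isinstance(data[key], expected):
            errors.append(f"{key!r} should be {expected.__name__}")
    return errors


@dataclass
class DataMapping:
    """Hypothetical sketch: a mapping function wrapped with input/output validation."""

    mapping_fn: Callable[[Dict[str, Any]], Dict[str, Any]]
    description: str = ""
    input_schema: Schema = field(default_factory=dict)
    output_schema: Schema = field(default_factory=dict)
    strict: bool = False  # strict=True raises instead of returning Err

    def _check(self, data: Dict[str, Any], schema: Schema, stage: str) -> List[str]:
        errors = check_schema(data, schema)
        if errors and self.strict:
            raise ValueError(f"{stage} validation failed: {errors}")
        return errors

    def __call__(self, data: Dict[str, Any]) -> Union[Dict[str, Any], Err]:
        errors = self._check(data, self.input_schema, "input")
        if errors:
            return Err(data, errors)
        result = self.mapping_fn(data)
        errors = self._check(result, self.output_schema, "output")
        if errors:
            return Err(result, errors)
        return result


# Usage: a toy mapping that joins two name fields
join_names = DataMapping(
    lambda d: {"full_name": f"{d['first']} {d['last']}"},
    description="Join first and last name",
    input_schema={"first": str, "last": str},
    output_schema={"full_name": str},
)
print(join_names({"first": "Ada", "last": "Lovelace"}))  # {'full_name': 'Ada Lovelace'}
print(join_names({"first": "Ada"}))  # Err(data={'first': 'Ada'}, errors=["missing key: 'last'"])
```

A real version could call the `jsonschema` package's `validate` function in place of `check_schema`.
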
The JSON might look like:

```jsonc
{
  "version": "...",
  "description": "...",
  "input_schema": "...",   // JSON Schema
  "output_schema": "...",  // JSON Schema
  "mapping_fn": {
    // Number each step taken (for pipelines later, or for a series of functions)
    "1": {
      // ... some way to make this look nice? Otherwise, just the source code as a naive implementation
    }
  },
  "helper_imports": [
    // ... list of imported modules
  ],
  "helper_fns": {
    "some_fn": "<python source code>"
  }
}
```
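
The naive "just store the source code" option for this layout could be sketched as follows: `export_mapping` (a hypothetical stand-in for `DataMapping.export`) writes the file, and `mimport` reads it back and runs the numbered steps in numeric order. The convention that each step's source defines a function named `step` is an assumption, and the schemas are stored but not enforced here.

```python
import importlib
import json
import os
import tempfile
from typing import Any, Callable, Dict, Iterable, Optional, TextIO

MAPPY_VERSION = "mappy-v1"  # Hypothetical format tag


def export_mapping(
    f: TextIO,
    *,
    description: str,
    steps: Dict[str, str],  # Step number -> source defining `def step(data): ...`
    helper_imports: Iterable[str] = (),
    helper_fns: Optional[Dict[str, str]] = None,  # Helper name -> source
    input_schema: Optional[Dict[str, Any]] = None,
    output_schema: Optional[Dict[str, Any]] = None,
) -> None:
    """Write the mapping using the JSON layout sketched in the proposal."""
    json.dump(
        {
            "version": MAPPY_VERSION,
            "description": description,
            "input_schema": input_schema or {},
            "output_schema": output_schema or {},
            "mapping_fn": steps,
            "helper_imports": list(helper_imports),
            "helper_fns": helper_fns or {},
        },
        f,
        indent=2,
    )


def mimport(path: str) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Load a mappy-v1 file and return a function that runs its steps in numeric order.

    WARNING: uses exec(), so only load files you trust.
    """
    with open(path) as f:
        doc = json.load(f)
    if doc.get("version") != MAPPY_VERSION:
        raise ValueError(f"unsupported version: {doc.get('version')!r}")
    namespace: Dict[str, Any] = {}
    for module in doc["helper_imports"]:
        namespace[module] = importlib.import_module(module)
    for source in doc["helper_fns"].values():
        exec(source, namespace)  # Defines each helper in the shared namespace
    step_fns = []
    for key in sorted(doc["mapping_fn"], key=int):  # "1", "2", ..., "10" in numeric order
        step_ns = dict(namespace)  # Each step sees the helpers but not the other steps
        exec(doc["mapping_fn"][key], step_ns)
        step_fns.append(step_ns["step"])

    def run(data: Dict[str, Any]) -> Dict[str, Any]:
        for step_fn in step_fns:
            data = step_fn(data)
        return data

    return run


# Usage: round-trip a two-step mapping through a temporary file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "input_to_output.mappy-v1.json")
    with open(path, "w") as f:
        export_mapping(
            f,
            description="Uppercase the name, then add a greeting",
            steps={
                "1": "def step(d):\n    return {**d, 'name': shout(d['name'])}\n",
                "2": "def step(d):\n    return {**d, 'greeting': 'Hello, ' + d['name']}\n",
            },
            helper_fns={"shout": "def shout(s):\n    return s.upper()\n"},
        )
    result = mimport(path)({"name": "ada"})

print(result)  # {'name': 'ADA', 'greeting': 'Hello, ADA'}
```

Because loading relies on `exec`, anyone who can edit the file can run arbitrary code, so this naive format is not yet the "hard-to-hack" export the proposal asks for; a declarative step format or sandboxing would be needed for that.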

Alternatives considered

  • Consider making mappings like React components, which are imported as their own modules
    • Read more on how React does it (right now I only have a high-level understanding)
      • Basically subclassing (`extends React.Component`) plus a standard function protocol (e.g., `render` should return HTML/JSX)
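
To make the React comparison concrete, a subclassing protocol might look like this sketch: a hypothetical `MappingComponent` base class plays the role of `React.Component`, subclasses implement `map` the way components implement `render`, and the base class enforces the input/output contract. The class names and the required-keys stand-in for schemas are assumptions, and the field names are toy examples rather than a real OMOP or FHIR mapping.

```python
from abc import ABC, abstractmethod
from typing import Any, ClassVar, Dict, Tuple


class MappingComponent(ABC):
    """Hypothetical base class: subclasses fill in a fixed protocol, like `extends React.Component`."""

    description: ClassVar[str] = ""
    # Stand-in for real schemas (assumption): required top-level keys
    input_keys: ClassVar[Tuple[str, ...]] = ()
    output_keys: ClassVar[Tuple[str, ...]] = ()

    @abstractmethod
    def map(self, data: Dict[str, Any]) -> Dict[str, Any]:
        """Plays the role of `render`: must return a record in the output standard."""

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # The base class, not each subclass, enforces the input/output contract
        missing = [k for k in self.input_keys if k not in data]
        if missing:
            raise KeyError(f"input is missing {missing}")
        out = self.map(data)
        missing = [k for k in self.output_keys if k not in out]
        if missing:
            raise KeyError(f"output is missing {missing}")
        return out


class RenameIdComponent(MappingComponent):
    """Toy example: rename `patient_id` to `person_id`."""

    description = "Rename patient_id to person_id"
    input_keys = ("patient_id",)
    output_keys = ("person_id",)

    def map(self, data: Dict[str, Any]) -> Dict[str, Any]:
        return {"person_id": data["patient_id"]}


print(RenameIdComponent()({"patient_id": 7}))  # {'person_id': 7}
```

One upside of this design is that each mapping can live in its own importable module, similar to how React components are shared, while `ABC` prevents instantiating a component that forgot to implement `map`.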

Additional context

  • Good module to mix in the DSL parser (seems generic enough, and a good time to "standardize" how pydian DSL strings work)
  • Groundwork for pipelines (and a good initial area for ideas)
  • ... OMOP<>FHIR stuff! Worth reviewing prior notes from CS343D and CS343S too
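
As a starting point for standardizing DSL strings, a path grammar could be pinned down with a small parser. The sketch below only handles dotted keys and integer indices (e.g., `a.b[0].c`); the grammar and the `parse_path`/`get_path` names are hypothetical and deliberately narrower than the existing pydian DSL strings.

```python
import re
from typing import Any, List, Union

# Hypothetical grammar (a subset, not pydian's actual DSL):
#   path := key? ( "." key | "[" int "]" )*   (must contain at least one step)
_FIRST_KEY = re.compile(r"[A-Za-z_]\w*")
_NEXT = re.compile(r"\.([A-Za-z_]\w*)|\[(-?\d+)\]")

Step = Union[str, int]


def parse_path(path: str) -> List[Step]:
    """Parse e.g. "a.b[0].c" into ["a", "b", 0, "c"]; raise ValueError on malformed input."""
    steps: List[Step] = []
    pos = 0
    first = _FIRST_KEY.match(path)
    if first:
        steps.append(first.group(0))
        pos = first.end()
    while pos < len(path):
        m = _NEXT.match(path, pos)
        if m is None:
            raise ValueError(f"bad path {path!r} at position {pos}")
        key, index = m.groups()
        steps.append(key if key is not None else int(index))
        pos = m.end()
    if not steps:
        raise ValueError("empty path")
    return steps


def get_path(data: Any, path: str, default: Any = None) -> Any:
    """Follow the parsed steps; return `default` as soon as a step is missing."""
    current = data
    for step in parse_path(path):
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current


record = {"patient": {"names": [{"given": "Ada"}]}}
print(parse_path("patient.names[0].given"))        # ['patient', 'names', 0, 'given']
print(get_path(record, "patient.names[0].given"))  # Ada
print(get_path(record, "patient.names[3].given"))  # None
```

Separating parsing from lookup means the same parsed steps could later be validated, cached, or reused by other modules (such as a pipeline runner) without re-reading the string.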
