cohort-tests

Property-based testing framework for generating OHDSI cohort definitions that conform to the Atlas/Circe JSON schema.

Overview

This project uses test.check to generate random, valid OHDSI cohort definitions. Rather than relying on hand-written cohort definitions, it generates hundreds of valid cohorts automatically, so you can check that your cohort-processing code handles a wide variety of inputs.

Features

  • Generators for all cohort definition components:

    • Concepts with OMOP vocabulary metadata
    • Concept sets with unique sequential IDs
    • Primary criteria with various domain types (always referencing existing concept sets)
    • Correlated criteria with temporal windows and occurrence constraints
    • Inclusion rules with complex correlation expressions
    • Optional components (observation windows, limits, collapse settings, demographic criteria)
  • Property-based tests that verify:

    • Generated cohorts have required fields
    • All nested structures are valid
    • The primary criteria list is never empty
    • Concept set IDs are unique
    • CodesetIds always reference existing concept sets (in both primary criteria and inclusion rules)
    • Correlated criteria have valid temporal windows
    • JSON serialization/deserialization preserves structure
    • Schema compliance (ConceptSets, PrimaryCriteria, InclusionRules, etc.)
  • 21 property tests running 100+ iterations each (2100+ generated cohorts) to validate cohort structure

Quick Start

Generate Cohorts

Generate cohorts and display them in the console:

clj -M:run 10

Generate cohorts and save them to disk (one JSON file per cohort):

clj -M:run 20 --output-dir output

Save to a custom directory:

clj -M:run 5 --output-dir C:\my-cohorts

Arguments:

  • <count> - Number of cohorts to generate (required)
  • --output-dir <directory> - Optional directory to save cohorts

Output files (when using --output-dir):

  • cohort-0.json
  • cohort-1.json
  • cohort-2.json
  • etc.

Build Uberjar

Build a standalone executable JAR file:

clj -T:build uber

This creates target/cohort-tests-0.1.0-standalone.jar.

Run the uberjar:

# Generate 10 cohorts
java -jar target/cohort-tests-0.1.0-standalone.jar 10

# Generate 20 cohorts and save to directory
java -jar target/cohort-tests-0.1.0-standalone.jar 20 --output-dir output

# Generate 5 cohorts and save to custom location
java -jar target/cohort-tests-0.1.0-standalone.jar 5 --output-dir C:\cohorts

Clean build artifacts:

clj -T:build clean

The uberjar includes all dependencies and can be distributed as a single file. Running it requires only Java, not a Clojure installation.

Run Tests

clj -M:test

This runs both the property-based tests and the unit tests.

Alternatively, run tests directly:

clj -M:test -e '(require (quote cohort-tests.core-test)) (clojure.test/run-tests (quote cohort-tests.core-test))'

REPL Usage

clj -M:repl

Then in the REPL:

(require '[cohort-tests.core :as cohort])
(require '[clojure.test.check.generators :as gen])

;; Generate a single cohort
(gen/generate cohort/gen-cohort-definition)

;; Generate 10 cohorts
(cohort/generate-cohorts 10)

;; Generate and validate
(def cohorts (cohort/generate-cohorts 5))
(every? cohort/validate-cohort-definition cohorts)
; => true

;; Convert to JSON
(cohort/cohort->json (first cohorts))

;; Generate specific components
(gen/generate cohort/gen-concept)
(gen/generate cohort/gen-concept-set)
(gen/generate cohort/gen-primary-criteria)
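The JSON round-trip property can also be checked by hand at the REPL. The following is a minimal sketch, assuming clojure.data.json (listed under Dependencies) is on the classpath. The `sample` map is an illustrative stand-in with string keys, not real generator output, and the generators may use a different key style internally.

```clojure
(require '[clojure.data.json :as json])

;; Illustrative stand-in for a generated cohort (string keys, as in the JSON).
(def sample
  {"ConceptSets"     [{"id" 0 "name" "abc123" "expression" {"items" []}}]
   "PrimaryCriteria" {"CriteriaList" [{"ConditionOccurrence" {"CodesetId" 0}}]}})

;; Serialize to a JSON string, then parse it back into Clojure data.
(def round-tripped (json/read-str (json/write-str sample)))

;; The parsed value should equal the original map.
(= sample round-tripped)
;; => true
```

If the cohort maps use keyword keys instead, passing `:key-fn keyword` to `json/read-str` restores them on the way back in.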

Schema Compliance

Generated cohorts conform to the OHDSI Atlas/Circe JSON schema (draft-04):

Required Fields

  • ConceptSets - Array of concept set definitions
  • PrimaryCriteria - Entry criteria with CriteriaList

Optional Fields

  • QualifiedLimit - String or object with Type
  • ExpressionLimit - String or object with Type
  • InclusionRules - Array of inclusion rule objects
  • CollapseSettings - Era collapse configuration
  • cdmVersionRange - CDM version specification
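As a rough illustration of the required-field rule above, a check over a parsed cohort might look like the sketch below. `has-required-fields?` is a hypothetical helper, not a function from cohort-tests.core, and it assumes the cohort is a map with string keys as produced by parsing the JSON.

```clojure
;; Hypothetical helper, not part of cohort-tests.core.
;; Assumes string keys, as in the JSON representation.
(defn has-required-fields?
  "True when the cohort has a ConceptSets collection and a PrimaryCriteria
  map that contains a CriteriaList collection."
  [cohort]
  (let [concept-sets (get cohort "ConceptSets")
        primary      (get cohort "PrimaryCriteria")]
    (boolean
     (and (sequential? concept-sets)
          (map? primary)
          (sequential? (get primary "CriteriaList"))))))

;; Both required fields present.
(has-required-fields? {"ConceptSets" []
                       "PrimaryCriteria" {"CriteriaList" []}})

;; PrimaryCriteria missing.
(has-required-fields? {"ConceptSets" []})
```

This only checks that the two required fields exist with the right shape. It is not a substitute for validating against the full schema.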

Architecture

Generators (src/cohort_tests/core.clj)

Basic Components:

  • gen-concept - OMOP concept with ID, name, domain, vocabulary
  • gen-concept-set-item - Concept with inclusion/exclusion flags
  • gen-concept-set - Named set with concept expression
  • gen-observation-window - Prior/post days window
  • gen-criteria-item - Domain criteria (Condition, Drug, Procedure, Measurement, Observation)
  • gen-primary-criteria - Entry criteria with optional window (always referencing valid concept sets)

Correlated Criteria Components:

  • gen-window - Temporal windows (StartWindow/EndWindow) with start/end days
  • gen-correlated-criteria-item - Criteria with temporal windows and occurrence constraints
  • gen-demographic-criteria - Age and gender criteria
  • gen-correlated-criteria - Complete correlation expression with criteria lists and groups
  • gen-inclusion-rule - Inclusion rule with name and correlated criteria expression
  • gen-inclusion-rules - List of inclusion rules
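To make the window structure concrete, here is a small sketch of a check that a correlated criteria item carries well-formed temporal windows. The function names are hypothetical, the cohort is assumed to use string keys, and the `Days`/`Coeff` entries in the example are placeholder values based on the Circe format rather than anything confirmed by this project. The sketch also assumes StartWindow is always present and EndWindow is optional.

```clojure
;; Hypothetical predicates, not part of cohort-tests.core.
(defn window-has-start-and-end?
  "True when the window is a map containing both Start and End."
  [window]
  (boolean (and (map? window)
                (contains? window "Start")
                (contains? window "End"))))

(defn correlated-item-windows-valid?
  "Checks StartWindow (assumed required) and EndWindow (only if present)."
  [item]
  (let [start-window (get item "StartWindow")
        end-window   (get item "EndWindow")]
    (and (window-has-start-and-end? start-window)
         (or (nil? end-window)
             (window-has-start-and-end? end-window)))))

;; Placeholder item; the contents of Start and End are illustrative only.
(correlated-item-windows-valid?
 {"StartWindow" {"Start" {"Days" 30 "Coeff" -1}
                 "End"   {"Days" 0 "Coeff" 1}}})
```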

Top-Level Generator:

  • gen-cohort-definition - Complete valid cohort with:
    • Unique sequential concept set IDs (0, 1, 2, ...)
    • Primary criteria that always reference existing concept sets
    • Optional inclusion rules with correlated criteria
    • Optional qualified/expression limits, collapse settings, CDM version
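One way to get both guarantees (sequential IDs and references that always resolve) is to generate the concept sets first and then draw each CodesetId from the IDs that were just produced. The sketch below illustrates that idea with test.check's `gen/bind`. It is not the project's actual implementation: `gen-concept-sets-sketch` and `gen-cohort-sketch` are hypothetical names, and the string keys and the fixed ConditionOccurrence domain are simplifying assumptions.

```clojure
(require '[clojure.test.check.generators :as gen])

;; Hypothetical generators; the real ones live in cohort-tests.core.
(def gen-concept-sets-sketch
  "Generates 1-5 concept sets whose ids are 0, 1, 2, ... in order."
  (gen/fmap (fn [names]
              (vec (map-indexed
                    (fn [i n] {"id" i "name" n "expression" {"items" []}})
                    names)))
            (gen/vector gen/string-alphanumeric 1 5)))

(def gen-cohort-sketch
  "Generates concept sets first, then 1-3 primary criteria whose CodesetId
  values are drawn only from the ids that were just generated."
  (gen/bind gen-concept-sets-sketch
            (fn [concept-sets]
              (let [ids (mapv #(get % "id") concept-sets)]
                (gen/fmap
                 (fn [codeset-ids]
                   {"ConceptSets" concept-sets
                    "PrimaryCriteria"
                    {"CriteriaList"
                     (mapv (fn [id] {"ConditionOccurrence" {"CodesetId" id}})
                           codeset-ids)}})
                 (gen/vector (gen/elements ids) 1 3))))))

;; Produce one example value.
(gen/generate gen-cohort-sketch)
```

Because the criteria generator only ever sees IDs that already exist, a dangling CodesetId cannot be produced, so the property holds by construction rather than by filtering.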

Property Tests (test/cohort_tests/core_test.clj)

21 property-based tests verify structural correctness (2100+ cohorts validated):

Basic Structure:

  1. generated-cohorts-are-valid - All required fields present
  2. generated-cohorts-have-concept-sets - ConceptSets is non-empty vector
  3. generated-cohorts-have-primary-criteria - PrimaryCriteria is valid map
  4. concept-sets-have-required-fields - Each set has id, name, expression
  5. concept-set-items-have-concepts - Items contain concept objects
  6. concepts-have-required-fields - Concepts have ID and name
  7. criteria-list-is-vector - CriteriaList is a vector

Enhanced Validations:

  8. concept-set-ids-are-unique - No duplicate concept set IDs
  9. primary-criteria-never-empty - CriteriaList always has ≥1 item
  10. codeset-ids-reference-existing-concept-sets - All CodesetIds in primary criteria are valid

Correlated Criteria:

  11. inclusion-rules-have-valid-structure - Rules have name and expression
  12. inclusion-rule-codeset-ids-reference-existing-concept-sets - CodesetIds in inclusion rules are valid
  13. correlated-criteria-have-valid-windows - Temporal windows have Start and End

Optional Fields:

  14. observation-window-has-valid-structure - Window has PriorDays/PostDays
  15. qualified-limit-is-valid - Limit is string or object
  16. expression-limit-is-valid - Limit is string or object
  17. collapse-settings-is-valid - Settings have valid structure

Serialization:

  18. json-roundtrip-preserves-structure - Serialization is lossless
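For readers who want to apply similar checks to cohorts from another source, the uniqueness and reference properties can be approximated with plain predicates. This is a sketch under assumptions, not the test code from core_test.clj: the function names are hypothetical, cohorts are assumed to be parsed JSON with string keys, and only primary criteria are inspected.

```clojure
;; Hypothetical predicates, not the project's test code.
(defn concept-set-ids
  "Returns the id of every concept set in the cohort."
  [cohort]
  (map #(get % "id") (get cohort "ConceptSets")))

(defn concept-set-ids-unique?
  "True when no two concept sets share an id."
  [cohort]
  (let [ids (concept-set-ids cohort)]
    (or (empty? ids) (apply distinct? ids))))

(defn primary-codeset-ids
  "Collects every CodesetId used in the primary criteria list. Each item is
  assumed to be a single-entry map such as {\"ConditionOccurrence\" {...}}."
  [cohort]
  (for [item (get-in cohort ["PrimaryCriteria" "CriteriaList"])
        [_domain body] item
        :let [id (get body "CodesetId")]
        :when (some? id)]
    id))

(defn codeset-ids-valid?
  "True when every CodesetId in the primary criteria names an existing concept set."
  [cohort]
  (let [known (set (concept-set-ids cohort))]
    (every? #(contains? known %) (primary-codeset-ids cohort))))

;; Illustrative cohort with one concept set and one criterion that references it.
(def example-cohort
  {"ConceptSets"     [{"id" 0 "name" "abc123" "expression" {"items" []}}]
   "PrimaryCriteria" {"CriteriaList" [{"ConditionOccurrence" {"CodesetId" 0}}]}})

(and (concept-set-ids-unique? example-cohort)
     (codeset-ids-valid? example-cohort))
```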

Example Output

{
  "ConceptSets": [
    {
      "id": 0,
      "name": "abc123",
      "expression": {
        "items": [
          {
            "concept": {
              "CONCEPT_ID": 42,
              "CONCEPT_NAME": "xyz789",
              "DOMAIN_ID": "Condition",
              "VOCABULARY_ID": "SNOMED",
              "CONCEPT_CODE": "def456"
            },
            "includeDescendants": true,
            "isExcluded": false,
            "includeMapped": false
          }
        ]
      }
    }
  ],
  "PrimaryCriteria": {
    "CriteriaList": [
      {
        "ConditionOccurrence": {
          "CodesetId": 0
        }
      }
    ],
    "ObservationWindow": {
      "PriorDays": 30,
      "PostDays": 0
    }
  }
}

Dependencies

  • Clojure 1.11.1
  • test.check 1.1.1 (property-based testing)
  • data.json 2.4.0 (JSON serialization)

License

Copyright © 2026
