braindatalab/GHOSTS-Experiments

GHOSTS: Generation of synthetic hospital time series for clinical machine learning research

Abstract: Machine learning (ML) holds great promise to support, improve, and automate clinical decision-making in hospitals. Model training on abundantly available routine data, however, is hindered by data protection regulations. Generative models can comply with privacy laws by learning to synthesize hospital data from a target population while ensuring data privacy. Clinical time series acquired during intensive care are difficult to model using established techniques, especially due to uneven sampling intervals. Here we introduce GHOSTS (Generator of Hospital Time Series), a novel generator of synthetic patient trajectories that is capable of generating realistic heterogeneous hospital data, including time series with uneven sampling intervals and static patient attributes. To achieve this, GHOSTS introduces novel regularizers and a postprocessing module leveraging low-dimensional summary statistics. We further present a suite of novel benchmarks for synthetic hospital time series, GHOSTS-Bench. We train GHOSTS on a large cohort of patient data from the MIMIC-IV and EICU critical care datasets. Along with measuring the quality of the generated data in terms of how faithfully the distributions of the real data as well as their spatio-temporal dynamics are preserved, we also measure how well ML models trained on the generated data can solve a clinical prediction task on the real data. We observe that GHOSTS outperforms a state-of-the-art approach, DoppelGANger, with respect to these criteria. We intend to make the GHOSTS model, a corpus of synthetic data, and Python code implementing GHOSTS and GHOSTS-Bench publicly available. These resources will become instrumental in the future development of powerful predictive models for intensive and perioperative care.

Companion repositories

  1. GHOSTS
  2. GHOSTS-Bench
  3. DoppelGANger
  4. HALO

This is the source code for reproducing experiments in the paper "GHOSTS: Generation of synthetic hospital time series for clinical machine learning research".

  1. Dataset generation: Code for generating the datasets can be found in the data_preparation folder.
     a. MIMIC-IV dataset: Run the SQL queries in data_preparation/queries.sql, then run data_preparation/preprocess.ipynb, substituting your own paths.
     b. EICU dataset: Run data_preparation/extract_r_ricu_eicu.ipynb with an R Jupyter kernel, or copy the code into an R script. Then run data_preparation/eicu_preprocessing.ipynb, substituting your own paths.
     c. HALO: The HALO model requires a special encoding. To generate HALO datasets, run data_preparation/convert_to_halo.ipynb to encode the data for HALO. The same notebook contains the code for decoding back from the HALO format.

  2. Benchmarking: To run the benchmarks, install the companion repository GHOSTS-Bench. Configs and instructions can be found in the benchmark_configs folder.

  3. Figures and tables: Jupyter notebooks for the figures and tables can be found in the figures folder.
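
Before starting the steps above, it can help to confirm that a local checkout contains the files and folders those steps refer to. The following is a minimal sketch, not part of the repository: the helper name `find_missing` and the constants are hypothetical, and only the paths themselves are taken from this README.

```python
from pathlib import Path

# Paths named in the steps above, relative to the repository root.
EXPECTED_FILES = {
    "MIMIC-IV": [
        "data_preparation/queries.sql",
        "data_preparation/preprocess.ipynb",
    ],
    "EICU": [
        "data_preparation/extract_r_ricu_eicu.ipynb",
        "data_preparation/eicu_preprocessing.ipynb",
    ],
    "HALO": ["data_preparation/convert_to_halo.ipynb"],
}
# Folders named in the benchmarking and figures steps.
EXPECTED_DIRS = ["benchmark_configs", "figures"]


def find_missing(repo_root):
    """Return the expected paths that do not exist under repo_root (hypothetical helper)."""
    root = Path(repo_root)
    missing = []
    for files in EXPECTED_FILES.values():
        missing += [f for f in files if not (root / f).is_file()]
    missing += [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    return missing


if __name__ == "__main__":
    # Assumes the script is run from the root of a local checkout.
    missing = find_missing(".")
    if missing:
        print("Missing paths:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected files and folders are present.")
```

The check only verifies that the paths exist; it does not run the SQL queries or notebooks, and the dataset paths inside the notebooks still need to be substituted by hand.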
