Overview of public clinical datasets
| Dataset | Input type | Task | Sample size |
|---|---|---|---|
| BreakHis | images | binary classification | 700 x 460 x 3 |
| OpenML | tabular data | classification | 1 x 34; 1 x 19 |
| FLamby | images | cancer detection | 10,000 x 2,048 |
| FLamby | tabular data | survival prediction | 1 x 39 |
| FLamby | images | melanoma classification | 200 x 200 x 3 |
| FLamby | tabular data | heart disease detection | 1 x 13 |
| FLamby | 3D CT scans | segmentation | 128 x 128 x 128 |
| FLamby | 3D MRI images | segmentation | 83 x 44 x 55 |
| FLamby | 3D CT scans | segmentation | 64 x 192 x 192 |
| FLOP | images | classification | 480 x 480 x 3 |
| FLOP | images | gastrointestinal disease classification | from 720 x 576 up to 1920 x 1072 |
| RSNA Breast Cancer | images | breast cancer detection | from 2,776 x 2,082 to 3,328 x 2,560 |
| MIMIC-III | mixed (mostly tabular) | ? | |
| NHANES | tabular data | ? | |
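
To compare the sample sizes in the table, it can help to convert each shape into a count of values per sample. The sketch below does this for three of the rows. The shapes are copied from the table; the dictionary keys are illustrative labels chosen here, not official dataset names.

```python
from math import prod

# Shapes copied from the table above; the keys are illustrative labels only.
sample_shapes = {
    "BreakHis image": (700, 460, 3),
    "FLamby 3D CT scan": (128, 128, 128),
    "FLOP image": (480, 480, 3),
}

# Number of scalar values in one sample of each shape.
values_per_sample = {name: prod(shape) for name, shape in sample_shapes.items()}

for name, count in values_per_sample.items():
    print(f"{name}: {count:,} values per sample")
```

The actual memory footprint per sample additionally depends on the data type used to store each value, which these notes do not specify.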
Joshi et al. Federated Learning for Healthcare Domain - Pipeline, Applications and Challenges. 2022.
Tables 1 and 2 list studies on FL and ML in healthcare. The datasets used in these studies still need to be investigated.
PhysioNet is a collection of physiological signals and clinical data for use in research and education. It includes electrocardiograms (ECGs), electroencephalograms (EEGs), blood pressure recordings, and other physiological signals.
The Global Health Observatory (GHO) data from the World Health Organization (WHO) provides health-related statistics for countries around the world, including mortality rates, disease incidence, health expenditures, and other health indicators.
The Pima Indians Diabetes dataset contains data on diabetes patients, including information such as age, body mass index, blood pressure, and blood serum measurements.
- 768 patients (females at least 21 years old of Pima Indian heritage)
- features: pregnancies, glucose, blood pressure, skin thickness, insulin, bmi, diabetes pedigree function, age
- 500 without diabetes, 268 with diabetes
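
The class counts above imply a noticeable imbalance, which is worth keeping in mind when reading accuracy figures reported on this dataset. A minimal sketch of the arithmetic, using only the counts listed above (the variable names are illustrative):

```python
# Class counts as listed above.
n_negative = 500  # patients without diabetes
n_positive = 268  # patients with diabetes
n_total = n_negative + n_positive

# Share of positive cases, and the accuracy of a trivial predictor
# that always outputs the majority class.
positive_rate = n_positive / n_total
majority_baseline = max(n_negative, n_positive) / n_total

print(f"total patients: {n_total}")
print(f"positive rate: {positive_rate:.3f}")
print(f"majority-class baseline accuracy: {majority_baseline:.3f}")
```

A classifier trained on this data should therefore be compared against a baseline of roughly 65% accuracy rather than 50%.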
The Cancer Genome Atlas Lung Adenocarcinoma Collection
https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=6881474
OASIS: Brain MRIs, 400+, 256 x 256
ADNI: Alzheimer's Disease Neuroimaging Initiative, MRI
Cancer Imaging Archive: 21M images (11 TB), only cancer images??
https://wiki.cancerimagingarchive.net/display/NLST/National+Lung+Screening+Trial
Chest X-ray images (normal, pneumonia, COVID-19), 100-200 per class
https://www.kaggle.com/datasets/pranavraikokte/covid19-image-dataset