MartinNoc/PublicDatasets

ClinicalDatasets

Overview of public clinical datasets

| Dataset | Input type | Task | Sample size |
|---|---|---|---|
| BreakHis | images | binary classification | 700 x 460 x 3 |
| OpenML: dermatology | tabular data | classification | 1 x 34 |
| OpenML: pbcseq | tabular data | classification | 1 x 19 |
| FLamby: Camelyon16 | images | cancer detection | 10,000 x 2,048 |
| FLamby: TCGA-BRCA | tabular data | survival prediction | 1 x 39 |
| FLamby: ISIC2019 | images | melanoma classification | 200 x 200 x 3 |
| FLamby: Heart-Disease | tabular data | heart disease detection | 1 x 13 |
| FLamby: LIDC-IDRI | 3D CT scans | segmentation | 128 x 128 x 128 |
| FLamby: IXI Tiny | 3D MRI images | segmentation | 83 x 44 x 55 |
| FLamby: KiTS2019 | 3D CT scans | segmentation | 64 x 192 x 192 |
| FLOP: COVIDx | images | classification | 480 x 480 x 3 |
| FLOP: Kvasir | images | gastrointestinal disease classification | from 720 x 576 up to 1920 x 1072 |
| RSNA Breast Cancer | images | breast cancer detection | from 2,776 x 2,082 to 3,328 x 2,560 |
| MIMIC-III | mixed (mostly tabular) | ? | |
| NHANES | tabular data | ? | |
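
The entries in the sample size column appear to describe the shape of a single sample (for example image height x width x channels, or 1 x number of features) rather than the number of samples. As a rough sketch, the number of values per sample can be derived from those strings as shown below. The helper name `parse_shape` is hypothetical and only used for illustration; the shape strings are copied from the table above.

```python
from math import prod


def parse_shape(text: str) -> tuple[int, ...]:
    """Turn a string such as '10,000 x 2,048' into (10000, 2048)."""
    # Thousands separators are removed before converting each part to int.
    return tuple(int(part.replace(",", "")) for part in text.split("x"))


# A few entries copied from the table above.
shapes = {
    "BreakHis": "700 x 460 x 3",
    "Camelyon16": "10,000 x 2,048",
    "Heart-Disease": "1 x 13",
    "LIDC-IDRI": "128 x 128 x 128",
}

for name, text in shapes.items():
    shape = parse_shape(text)
    # The product of all dimensions is the number of values in one sample.
    print(f"{name}: shape={shape}, values per sample={prod(shape):,}")
```

This only handles fixed shapes; entries given as a range (such as Kvasir or RSNA Breast Cancer) would need separate handling.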

Additional dataset information:

Joshi et al.

Joshi et al. Federated Learning for Healthcare Domain - Pipeline, Applications and Challenges. 2022.

Tables 1 and 2 list studies on FL and ML in healthcare. The datasets used in these studies still need to be investigated.

UK Biobank

https://www.ukbiobank.ac.uk/

PhysioNet

https://physionet.org/

PhysioNet is a repository of physiological signals and clinical data for use in research and education. It includes electrocardiograms (ECGs), electroencephalograms (EEGs), blood pressure recordings, and other physiological signals.

Global Health Observatory (GHO) data

https://www.who.int/data/gho

The GHO data from the World Health Organization (WHO) provides health-related statistics for countries around the world, including mortality rates, disease incidence, health expenditures, and other health indicators.

Todo

Diabetes Dataset (1988)

This dataset contains diagnostic measurements for female patients, such as age, number of pregnancies, body mass index, blood pressure, glucose, and insulin, together with a label indicating whether the patient has diabetes.

Kaggle dataset

  • 768 patients (females at least 21 years old of Pima Indian heritage)
  • features: pregnancies, glucose, blood pressure, skin thickness, insulin, bmi, diabetes pedigree function, age
  • 500 without diabetes, 268 with diabetes
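
The class counts above imply a moderate class imbalance. A minimal sketch of the arithmetic, using only the counts from the list: it computes the share of each class, the accuracy of a trivial majority-class predictor, and inverse-frequency class weights. The weighting formula is one common heuristic chosen here as an assumption, not something prescribed by the dataset, and the label names are illustrative.

```python
# Class counts taken from the list above.
counts = {"no_diabetes": 500, "diabetes": 268}

total = sum(counts.values())  # 768 patients in total

# Share of each class in the dataset.
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial classifier that always predicts the majority class.
majority_baseline = max(shares.values())

# Inverse-frequency ("balanced") weights: total / (n_classes * class_count).
# Assumption: this heuristic is used for illustration only.
weights = {label: total / (len(counts) * n) for label, n in counts.items()}

for label in counts:
    print(f"{label}: share={shares[label]:.3f}, weight={weights[label]:.3f}")
print(f"majority baseline accuracy: {majority_baseline:.3f}")
```

Any classifier trained on this data should be compared against the majority baseline, since plain accuracy can look acceptable without the model detecting any diabetes cases.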

TCGA-LUAD

The Cancer Genome Atlas Lung Adenocarcinoma Collection

https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=6881474

OASIS

OASIS: Brain MRIs, 400+, 256 x 256

https://oasis-brains.org/

ADNI

ADNI: Alzheimer's Disease Neuroimaging Initiative, MRI

https://adni.loni.usc.edu/

NLST

Cancer Imaging Archive: 21M images (11 TB), only cancer images??

https://wiki.cancerimagingarchive.net/display/NLST/National+Lung+Screening+Trial

Covid-19

Chest X-ray images (normal, pneumonia, COVID-19), 100-200 per class

https://www.kaggle.com/datasets/pranavraikokte/covid19-image-dataset
