AIPAL Validation

External validation pipeline for AIPAL (Acute Leukemia Prediction), a machine learning model that predicts acute leukemia subtypes (ALL, AML, APL) from routine laboratory measurements. This tool provides a FHIR-based data extraction pipeline, outlier detection, and model retraining capabilities for multicentric validation.

Citation

If you use this software in your research, please cite:

Citation will be added upon publication.

Local Setup

Prerequisites

Python >= 3.10

R with packages: dplyr, tidyr, yaml, caret, xgboost

sudo apt-get install r-base
R -e "install.packages(c('dplyr', 'tidyr', 'yaml', 'caret', 'xgboost'), repos='https://cran.r-project.org/')"

Install dependencies:
```
poetry install
```

Run the validation pipeline:

poetry run aipal_validation --task aipal --step [all,data,sampling,test]

Docker Setup

Set your data directory and run the container:

export DATA_DIR=/path/to/your/data
docker compose run aipal bash

Inside the container:

python -m aipal_validation --task aipal --step [all,data,sampling,test]

Project Structure

aipal_validation/ — Main package
- r/ — R scripts for prediction and model training
- config/ — Configuration files
- data_preprocessing/ — Data preprocessing modules
- eval/ — Evaluation and analysis scripts
- fhir/ — FHIR data extraction, filtering, and validation
- ml/ — Machine learning modules
- outlier/ — Outlier detection (Isolation Forest, LOF)
- helper/ — Utility functions

Importing Data from Excel

If you don't have a FHIR server and want to import data from an Excel sheet:

Update the run_id in the config to match your cohort name.
In your root_dir, create <cohort_name>/aipal/ and place your Excel sheet there.
Generate samples:
```
python -m aipal_validation --task aipal --step sampling
```
Ensure column names in your Excel file match those expected by generate_custom_samples.py.

Run the validation pipeline:

python -m aipal_validation --task aipal --step test

Outlier Detection

Run outlier detection using Isolation Forest and Local Outlier Factor (LOF):

poetry run aipal_validation --task outlier --step detect

Model Retraining (Pediatric Subset)

Retrain the AIPAL model on a pediatric subset (age < 18):

poetry run aipal_validation --task retrain --step all

This will split data into training/testing sets, train an XGBoost model, save the retrained model to aipal_validation/r/, and evaluate on the test set.

Credits

The XGBoost model file (221003_Final_model_res_list.rds) was obtained from VincentAlcazer/AIPAL (MIT License). No source code from the original repository is reused; only the trained model asset is redistributed for validation purposes.

License

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 375 Commits
aipal_validation		aipal_validation
jupyter		jupyter
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
Dockerfile		Dockerfile
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
docker-compose.yml		docker-compose.yml
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml
ruff.toml		ruff.toml
tmp.cmd		tmp.cmd

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AIPAL Validation

Citation

Local Setup

Prerequisites

Docker Setup

Project Structure

Importing Data from Excel

Outlier Detection

Model Retraining (Pediatric Subset)

Credits

License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

License

UMEssen/aipal-validation

Folders and files

Latest commit

History

Repository files navigation

AIPAL Validation

Citation

Local Setup

Prerequisites

Docker Setup

Project Structure

Importing Data from Excel

Outlier Detection

Model Retraining (Pediatric Subset)

Credits

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages