External validation pipeline for AIPAL (Acute Leukemia Prediction), a machine learning model that predicts acute leukemia subtypes (ALL, AML, APL) from routine laboratory measurements. This tool provides a FHIR-based data extraction pipeline, outlier detection, and model retraining capabilities for multicentric validation.
If you use this software in your research, please cite:
Citation will be added upon publication.
- Python >= 3.10
- R with packages: dplyr, tidyr, yaml, caret, xgboost

    sudo apt-get install r-base
    R -e "install.packages(c('dplyr', 'tidyr', 'yaml', 'caret', 'xgboost'), repos='https://cran.r-project.org/')"
- Install dependencies:

    poetry install

- Run the validation pipeline:

    poetry run aipal_validation --task aipal --step [all,data,sampling,test]
- Set your data directory and run the container:

    export DATA_DIR=/path/to/your/data
    docker compose run aipal bash

- Inside the container:

    python -m aipal_validation --task aipal --step [all,data,sampling,test]
- aipal_validation/: Main package
  - r/: R scripts for prediction and model training
  - config/: Configuration files
  - data_preprocessing/: Data preprocessing modules
  - eval/: Evaluation and analysis scripts
  - fhir/: FHIR data extraction, filtering, and validation
  - ml/: Machine learning modules
  - outlier/: Outlier detection (Isolation Forest, LOF)
  - helper/: Utility functions
If you don't have a FHIR server and want to import data from an Excel sheet:
- Update the run_id in the config to match your cohort name.
- In your root_dir, create <cohort_name>/aipal/ and place your Excel sheet there.
- Generate samples. Ensure column names in your Excel file match those expected by generate_custom_samples.py.

    python -m aipal_validation --task aipal --step sampling

- Run the validation pipeline:

    python -m aipal_validation --task aipal --step test
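Before generating samples, it can help to compare the sheet's header row against the expected column names. The following is a minimal sketch, assuming you have copied the expected names out of generate_custom_samples.py yourself. The names in EXPECTED_COLUMNS are hypothetical placeholders, not the pipeline's actual column names, and missing_columns is an illustrative helper that is not part of this package.

```python
def missing_columns(actual, expected):
    """Return the expected column names that do not appear in `actual`.

    The comparison is exact (case- and whitespace-sensitive), so a header
    such as "Age " is reported as missing when "age" is expected.
    """
    present = set(actual)
    return [name for name in expected if name not in present]


# Hypothetical names for illustration only; replace them with the names
# that generate_custom_samples.py actually expects.
EXPECTED_COLUMNS = ["age", "WBC_G_L", "Platelets_G_L"]

# Header row as it might be read from the Excel sheet.
header = ["age", "WBC_G_L", "Fibrinogen_g_L"]

print(missing_columns(header, EXPECTED_COLUMNS))
```

An empty list means every expected column is present; any name that is returned needs to be added or renamed in the sheet before running the sampling step.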
Run outlier detection using Isolation Forest and Local Outlier Factor (LOF):

    poetry run aipal_validation --task outlier --step detect

Retrain the AIPAL model on a pediatric subset (age < 18):

    poetry run aipal_validation --task retrain --step all

This will split data into training/testing sets, train an XGBoost model, save the retrained model to aipal_validation/r/, and evaluate on the test set.
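The pediatric filter and train/test split described above can be pictured roughly as follows. This is only a sketch under stated assumptions: the field name "age", the 80/20 ratio, the fixed seed, and the helper pediatric_split are all hypothetical, and the package's actual split (including whether it stratifies by leukemia subtype) may differ. Model training itself is not shown.

```python
import random


def pediatric_split(records, test_fraction=0.2, seed=42, age_key="age"):
    """Keep records with age < 18, then split them into (train, test).

    All parameter defaults are assumptions made for illustration.
    """
    pediatric = [r for r in records if r[age_key] < 18]
    shuffled = pediatric[:]                    # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)      # seeded for a reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy cohort: 10 pediatric records and 3 adult records.
ages = (3, 7, 12, 17, 9, 15, 1, 16, 5, 11, 18, 45, 30)
records = [{"age": a} for a in ages]

train, test = pediatric_split(records)
print(len(train), len(test))
```

With imbalanced subtypes, a stratified split is often preferable so that each class appears in both sets; that refinement is left out here to keep the sketch short.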
The XGBoost model file (221003_Final_model_res_list.rds) was obtained from VincentAlcazer/AIPAL (MIT License). No source code from the original repository is reused; only the trained model asset is redistributed for validation purposes.
This project is licensed under the MIT License.