This project aims to develop a data-driven approach to standardize the diagnosis of Class III skeletal malocclusion (SCIII). We use landmark coordinates annotated from lateral cephalometric radiographs to cluster patients based on craniofacial morphology.
We preprocess the data using geometric morphometric techniques, specifically Generalized Procrustes Analysis (GPA), and apply k-means clustering (with k=6) to a reduced set of 12 key anatomical landmarks. Additionally, we explore other clustering algorithms and different values of k.
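To make the GPA step concrete, here is a minimal sketch for 2D landmarks. It is not the code used in this repository's R notebooks. It assumes each landmark is stored as a complex number x + iy (a convenience of this sketch), and the function names `center_and_scale`, `rotate_onto` and `gpa` are hypothetical.

```python
import cmath
import math


def center_and_scale(shape):
    """Remove translation and size: centroid at 0, unit centroid size."""
    centroid = sum(shape) / len(shape)
    centered = [p - centroid for p in shape]
    size = math.sqrt(sum(abs(p) ** 2 for p in centered))
    return [p / size for p in centered]


def rotate_onto(shape, reference):
    """Rotate `shape` to minimise its squared distance to `reference`."""
    # For complex landmarks the optimal rotation is the unit complex
    # number pointing along sum(reference_j * conj(shape_j)).
    s = sum(r * p.conjugate() for p, r in zip(shape, reference))
    rotation = s / abs(s)  # assumes s != 0 (non-degenerate shapes)
    return [rotation * p for p in shape]


def gpa(shapes, tol=1e-10, max_iter=100):
    """Iteratively align all shapes to their (re-estimated) mean shape."""
    aligned = [center_and_scale(s) for s in shapes]
    mean = aligned[0]  # start from the first shape as reference
    for _ in range(max_iter):
        aligned = [rotate_onto(s, mean) for s in aligned]
        new_mean = center_and_scale(
            [sum(pts) / len(aligned) for pts in zip(*aligned)]
        )
        shift = sum(abs(a - b) ** 2 for a, b in zip(new_mean, mean))
        mean = new_mean
        if shift < tol:
            break
    return aligned, mean


# Toy example: one 4-landmark shape seen under different
# rotations, scales and translations (made-up coordinates).
base = [0j, 2 + 0j, 2 + 1j, 0.5 + 1.5j]
copies = [
    [2.0 * cmath.exp(0.3j) * p + (1 + 1j) for p in base],
    [0.5 * cmath.exp(-1.1j) * p - 3j for p in base],
    base,
]
aligned, mean_shape = gpa(copies)
```

After alignment, the three toy shapes coincide up to floating-point error, because they differ only in position, size and orientation. With real patients, what remains after alignment is the shape variation that the clustering works on.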
We provide a comprehensive evaluation of clustering robustness through a cross-validation framework and implement a K-Nearest Neighbors (KNN) classifier to assign new patients to clusters. The resulting subphenotypes were interpreted in close collaboration with clinical experts in the field.
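The idea of assigning a new patient with KNN can be sketched as follows. This is an illustration, not the repository's implementation. It assumes every shape is a flattened coordinate vector that has already been aligned to the same reference as the training shapes. The function name `knn_assign` and the default `k=5` are hypothetical, since the number of neighbours is not stated here.

```python
import math
from collections import Counter


def knn_assign(new_shape, train_shapes, train_labels, k=5):
    """Return the majority cluster label among the k nearest training shapes."""
    # Euclidean distance between flattened, already-aligned coordinate vectors.
    distances = sorted(
        (math.dist(new_shape, shape), label)
        for shape, label in zip(train_shapes, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy 2-dimensional "shapes" with made-up cluster labels.
train_shapes = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
                (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
train_labels = ["A", "A", "A", "B", "B", "B"]
assigned = knn_assign((0.05, 0.05), train_shapes, train_labels, k=3)
```

With 12 two-dimensional landmarks, each vector would have 24 entries instead of the 2 used in this toy example.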
Further details on the methodology and findings are described in the associated publication: https://doi.org/10.1038/s43856-026-01557-y
The datasets comprise a cohort of 655 Class III patients of White origin (used for training) and external datasets of 186 patients of Korean origin and 85 patients of White origin. To access them, please request access from the corresponding authors via the link below.
We start by exploring our dataset of 655 SCIII patients (step1_dataset_exploration.ipynb) and then analyse the reliability of the annotated anatomical landmarks (step2_annotation_reliability.ipynb). Following a geometric morphometrics approach, we apply Generalized Procrustes Analysis (step3_generalized_procrustes_analysis.ipynb). Finally, we present the results of the clustering analysis and introduce six subphenotypes of skeletal Class III malocclusion (step4_clustering_analysis.ipynb). Additional notebooks cover the analyses conducted for algorithm selection (step4.1_clustering_algorithms.ipynb), as well as robustness assessment and subphenotype assignment for new samples (step4.2_cross_validation.ipynb, step4.3_clustering_prediction_korean_population.ipynb, step4.4_clustering_prediction_white_external_population.ipynb).
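For readers unfamiliar with the clustering step, the following is a minimal sketch of k-means (Lloyd's algorithm) on coordinate vectors. It is a stand-in illustration rather than the code in the notebooks, which run on an R kernel. The function name `kmeans` and its `init_centroids`, `n_iter` and `seed` parameters are hypothetical.

```python
import math
import random


def kmeans(points, k, init_centroids=None, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from the given centroids, or from k randomly chosen points.
    centroids = list(init_centroids) if init_centroids else rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


# Toy data: two well-separated groups of made-up 2D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2,
                           init_centroids=[(0.0, 0.0), (10.0, 10.0)])
```

In the setting described here, each point would be a patient's Procrustes-aligned landmark vector and k would be 6. Because k-means depends on its starting centroids, results from a single run should be checked for stability, which is the role of the cross-validation notebook.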
├── README.md <- The top-level README for developers using this project.
│
│
├── notebooks <- Jupyter notebooks.
│   ├── step1_dataset_exploration.ipynb
│   ├── step2_annotation_reliability.ipynb
│   ├── step3_generalized_procrustes_analysis.ipynb
│   ├── step4_clustering_analysis.ipynb
│   ├── step4.1_clustering_algorithms.ipynb
│   ├── step4.2_cross_validation.ipynb
│   ├── step4.3_clustering_prediction_korean_population.ipynb
│   └── step4.4_clustering_prediction_white_external_population.ipynb
│
├── outputs <- Results from models/analysis go here, like figures, metrics, or predictions
│
├── config.yaml <- config file with parameters for running the pipelines
│
└── scripts <- scripts developed that are related to this project
- Make sure you have R installed. Recommended version: R version 4.4.3 (2025-02-28)
- Install Conda (via Miniconda or Anaconda)
- Install Jupyter and the R kernel:
Rscript -e "install.packages('IRkernel'); IRkernel::installspec()"
Use the terminal to clone the GitHub repository:
git clone https://github.com/istars-fmul/proj-sciii-diagnostic.git
cd proj-sciii-diagnostic
Move to the root of the project directory (where environment.yml is located), then run:
conda env create -f environment.yml
This will create a new Conda environment named r_env_sciii_diagnosis. Activate it with:
conda activate r_env_sciii_diagnosis
To reproduce the results in this repository, a data/ folder must be present in the project root. This folder should contain:
- Cephalometric measurements dataset
- Landmark coordinate dataset
- Patient info dataset
If needed, change the data paths in the config.yaml file.
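Before running the notebooks, it can help to confirm that the data/ folder is where the project expects it. The snippet below is a small optional sketch and not part of the repository. The helper name `check_data_folder` is hypothetical, and it only lists whatever files are present, since the expected file names are not given here.

```python
from pathlib import Path


def check_data_folder(project_root="."):
    """Return the sorted file names found in <project_root>/data."""
    data_dir = Path(project_root) / "data"
    if not data_dir.is_dir():
        raise FileNotFoundError(
            f"Expected a data/ folder at {data_dir.resolve()}"
        )
    # Only report regular files; subfolders are ignored in this sketch.
    return sorted(p.name for p in data_dir.iterdir() if p.is_file())
```

Calling `check_data_folder()` from the project root should list the three datasets named above. A `FileNotFoundError` means the folder is missing or the working directory is not the project root.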
To obtain a patient's subphenotype, you can use our online application. Please ensure that your input Excel file containing the annotated landmarks is in the correct format.
Faria-Teixeira, M.C., Carvalho, I.M.N., Dehesa-Santos, A. et al. Geometric morphometrics based diagnostic model for Skeletal Class III patients. Commun Med (2026). https://doi.org/10.1038/s43856-026-01557-y
For questions, feedback, or collaboration, feel free to reach out:
iStars Team:
- Inês M. N. Carvalho: ines.c@edu.ulisboa.pt
- João C. Guimarães: joao.guimaraes@medicina.ulisboa.pt
Website: https://istars.pt/
Clinical team:
- Maria Cristina Faria-Teixeira: cristina.vft@gmail.com
- Alejandro Iglesias-Linares: aleigl01@ucm.es
This software is available under the European Union Public Licence (EUPL) v1.2 or later. For proprietary use, commercial support, or alternative licensing terms, please contact the authors.