istars-fmul/proj-sciii-diagnostic


Geometric morphometrics based diagnostic model for Skeletal Class III patients

This project aims to develop a data-driven approach to standardize the diagnosis of Class III skeletal malocclusion (SCIII). We use landmark coordinates annotated from lateral cephalometric radiographs to cluster patients based on craniofacial morphology.

We preprocess the data using geometric morphometric techniques, specifically Generalized Procrustes Analysis (GPA), and apply k-means clustering (with k=6) to a reduced set of 12 key anatomical landmarks. We also explore other clustering algorithms and different values of k.
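As an illustration of what GPA does, here is a minimal sketch in Python using only the standard library. It is not the code from this repository (the notebooks run on an R kernel), and the function names `center_and_scale`, `rotate_onto`, and `gpa` are hypothetical. It assumes 2D landmarks stored as complex numbers (x + yj), removes translation and scale, and then iteratively rotates each configuration onto the current mean shape. Reflection is not handled.

```python
import cmath
import math


def center_and_scale(shape):
    """Move the centroid to the origin and scale to unit centroid size."""
    centroid = sum(shape) / len(shape)
    centered = [z - centroid for z in shape]
    size = math.sqrt(sum(abs(z) ** 2 for z in centered))
    return [z / size for z in centered]


def rotate_onto(shape, reference):
    """Rotate a centered shape to minimise squared distance to the reference."""
    # For 2D shapes the optimal rotation has a closed form: the unit complex
    # number pointing in the direction of sum(r_j * conj(z_j)).
    inner = sum(r * z.conjugate() for z, r in zip(shape, reference))
    if abs(inner) == 0:
        return list(shape)
    w = inner / abs(inner)
    return [w * z for z in shape]


def gpa(shapes, max_iter=100, tol=1e-10):
    """Align all shapes to their iteratively re-estimated mean shape."""
    aligned = [center_and_scale(s) for s in shapes]
    mean = aligned[0]  # start from the first configuration
    for _ in range(max_iter):
        aligned = [rotate_onto(s, mean) for s in aligned]
        # Landmark-wise average, renormalised to unit centroid size.
        new_mean = center_and_scale(
            [sum(col) / len(aligned) for col in zip(*aligned)]
        )
        change = sum(abs(a - b) ** 2 for a, b in zip(new_mean, mean))
        mean = new_mean
        if change < tol:
            break
    return aligned, mean


if __name__ == "__main__":
    # Toy example: one shape and a rotated, scaled, shifted copy of it.
    base = [0 + 0j, 4 + 0j, 5 + 3j, 1 + 2j]
    copy = [cmath.rect(2.0, 0.5) * z + (3 - 1j) for z in base]
    aligned, mean = gpa([base, copy])
    print(aligned[0])
    print(aligned[1])
```

After alignment, the two configurations in the toy example differ only by floating-point error, which is the property that makes the aligned coordinates suitable as input to clustering.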

We provide a comprehensive evaluation of clustering robustness through a cross-validation framework and implement a K-Nearest Neighbors (KNN) classifier to assign new patients to clusters. The interpretation of the resulting subphenotypes was carried out in close collaboration with clinical experts in the field.
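The KNN assignment step can be sketched as follows. This is a minimal Python illustration using only the standard library, not the fitted model from this repository. The function names `flatten` and `knn_assign` and the default `k=5` are assumptions chosen for the example. It also assumes the new patient's landmarks have already been aligned into the same shape space as the training cohort.

```python
import math
from collections import Counter


def flatten(shape):
    """Turn a list of (x, y) landmarks into one flat feature vector."""
    return [coord for point in shape for coord in point]


def knn_assign(new_shape, train_shapes, train_labels, k=5):
    """Assign a cluster label by majority vote among the k nearest shapes."""
    query = flatten(new_shape)
    # Euclidean distance between flattened landmark vectors.
    neighbours = sorted(
        zip(train_shapes, train_labels),
        key=lambda pair: math.dist(query, flatten(pair[0])),
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy training set: two landmarks per shape, two clusters (1 and 2).
    train_shapes = [
        [(0.0, 0.0), (1.0, 0.0)],
        [(0.0, 0.1), (1.0, 0.1)],
        [(5.0, 5.0), (6.0, 5.0)],
        [(5.0, 5.1), (6.0, 5.1)],
    ]
    train_labels = [1, 1, 2, 2]
    new_patient = [(0.05, 0.0), (1.0, 0.05)]
    print(knn_assign(new_patient, train_shapes, train_labels, k=3))
```

In a real setting the training labels would be the cluster memberships from the clustering step, and k would be chosen by validation rather than fixed in advance.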

Further details on the methodology and findings are described in the associated publication: https://doi.org/10.1038/s43856-026-01557-y

The datasets comprise a training cohort of 655 Class III patients of White origin and an external dataset of 186 patients of Korean origin and 85 patients of White origin. To access them, please request access from the corresponding authors via the link below.

Project overview

We start by exploring our dataset of 655 SCIII patients (step1_dataset_exploration.ipynb) and then analyse the reliability of the annotated anatomical landmarks (step2_annotation_reliability.ipynb). Following a geometric morphometrics approach, we apply Generalized Procrustes Analysis (step3_generalized_procrustes_analysis.ipynb). Finally, we present the results of the clustering analysis and introduce six subphenotypes of skeletal Class III malocclusion (step4_clustering_analysis.ipynb). Additional notebooks cover the analyses conducted for algorithm selection (step4.1_clustering_algorithms.ipynb), as well as robustness assessment and subphenotype assignment for new samples (step4.2_cross_validation.ipynb, step4.3_clustering_prediction_korean_population.ipynb, step4.4_clustering_prediction_white_external_population.ipynb).
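To make the clustering step concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in standard-library Python. It is an illustration only, not the code in step4_clustering_analysis.ipynb, and the function name `kmeans` and its parameters are hypothetical. Each point stands for one patient's flattened, Procrustes-aligned landmark coordinates. The toy data below uses k=2 for readability, whereas the analysis described above uses k=6.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Cluster points with Lloyd's algorithm; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct observations as initial centers
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    # Toy data: two well-separated groups of 2D points.
    points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    labels, centers = kmeans(points, k=2)
    print(labels)
    print(centers)
```

Because k-means depends on its random initialisation, results on real data are usually compared across several seeds, which is one reason a robustness assessment such as cross-validation is useful.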

Directory structure

├── README.md          <- The top-level README for developers using this project.
│
├── notebooks          <- Jupyter notebooks.
│                       step1_dataset_exploration.ipynb
│                       step2_annotation_reliability.ipynb
│                       step3_generalized_procrustes_analysis.ipynb
│                       step4_clustering_analysis.ipynb
│                       step4.1_clustering_algorithms.ipynb
│                       step4.2_cross_validation.ipynb
│                       step4.3_clustering_prediction_korean_population.ipynb
│                       step4.4_clustering_prediction_white_external_population.ipynb
│
├── outputs            <- Results from models/analysis go here, such as figures, metrics, or predictions.
│
├── config.yaml        <- Config file with parameters for running the pipelines.
│
└── scripts            <- Scripts developed for this project.

Step-by-step Instructions

🔧 1. Install & Configure Environment

  • Make sure you have R installed. Recommended version: R version 4.4.3 (2025-02-28)
  • Install Conda (via Miniconda or Anaconda)
  • Install Jupyter and the R kernel:
Rscript -e "install.packages('IRkernel'); IRkernel::installspec()"

📂 2. Clone the Repository

Use the terminal to clone the GitHub repository:

git clone https://github.com/istars-fmul/proj-sciii-diagnostic.git
cd proj-sciii-diagnostic

📦 3. Set Up the Conda Environment

Move to the root of the project directory (where environment.yml is located), then run:

conda env create -f environment.yml

This will create a new Conda environment named: r_env_sciii_diagnosis


▶️ 4. Activate the Environment

conda activate r_env_sciii_diagnosis

📁 5. Include the Dataset

To reproduce the results in this repository, a data/ folder must be present in the project root. This folder should contain:

  • Cephalometric measurements dataset
  • Landmark coordinate dataset
  • Patient info dataset

If needed, change the data path in the config.yaml file.

SCIII Diagnostic Tool Web Interface

To compute and obtain the subphenotype, you can use our online application. Please ensure that your input Excel file containing the annotated landmarks is in the correct format.

Citation

Faria-Teixeira, M.C., Carvalho, I.M.N., Dehesa-Santos, A. et al. Geometric morphometrics based diagnostic model for Skeletal Class III patients. Commun Med (2026). https://doi.org/10.1038/s43856-026-01557-y

Contact

For questions, feedback, or collaboration, feel free to reach out:

iStars Team:

Website: https://istars.pt/

Clinical team:

License

This software is available under the European Union Public Licence (EUPL) v1.2 or later. For proprietary use, commercial support, or alternative licensing terms, please contact the authors.

About

In this project, we explore the use of clinical data on skeletal Class III malocclusion (SCIII) to propose and present subphenotypes. We employ an unsupervised method to cluster the data. We also analyze the robustness of the resulting clusters through cross-validation and provide subphenotype prediction using a fitted KNN model.