Density Functional Theory (DFT) is a quantum mechanical method that computes the energy of a system using its electron density instead of its full wavefunction.
For most DFT approximations, errors arise primarily from the approximate functional (functional errors). In some systems, however, errors are dominated by inaccuracies in the electron density (density-driven errors). Such cases are known as density-sensitive.[1]
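To quantify density-driven errors in DFT, the density sensitivity metric $\tilde{s}$ is defined as

$$\tilde{s} = \left| \tilde{E}[n^{\mathrm{LDA}}] - \tilde{E}[n^{\mathrm{HF}}] \right|$$

where $\tilde{E}$ is the reaction energy from the approximate functional, evaluated on the LDA and HF densities, and reactions with $\tilde{s} \geq 2$ kcal/mol are considered density-sensitive.[2]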
Computing s̃ directly requires both HF and LDA densities, with formal costs of O(N^4) and O(N^3) in system size N. We instead use machine learning as a surrogate to classify GMTKN55 benchmark reactions as density-sensitive (s̃ ≥ 2) or density-insensitive (s̃ < 2) without explicitly evaluating s̃.
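The labeling rule can be written out in a few lines. This is a minimal sketch, assuming s̃ is the absolute difference between the functional's reaction energy evaluated on the LDA density and on the HF density, in kcal/mol, as defined in [2]. The function names and the example energies are hypothetical and not taken from the repository.

```python
def density_sensitivity(e_on_lda_density: float, e_on_hf_density: float) -> float:
    """Return |E[n_LDA] - E[n_HF]| in the units of the inputs (kcal/mol assumed)."""
    return abs(e_on_lda_density - e_on_hf_density)


def is_density_sensitive(s_tilde: float, threshold: float = 2.0) -> bool:
    """Label a reaction as density-sensitive when s_tilde >= threshold."""
    return s_tilde >= threshold


# Hypothetical reaction energies in kcal/mol, for illustration only.
s_tilde = density_sensitivity(-12.4, -9.1)
label = is_density_sensitive(s_tilde)
print(f"s_tilde = {s_tilde:.2f} kcal/mol, density-sensitive: {label}")
```

These labels are what the classifiers in this project try to predict without running the HF and LDA calculations.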
This work was done as part of ML research in the Burke Group at UC Irvine.
The pipeline integrates physics-informed molecular encoding with modern ML techniques:
- Molecular Parsing – Uses the Atomic Simulation Environment (ASE) to read `.xyz` files and construct `Atoms` objects containing atomic numbers and 3D coordinates. These standardized structures serve as inputs for molecular descriptor generation.
- Coulomb Matrix Molecular Descriptor – Converts each ASE `Atoms` object into a rotation- and permutation-invariant Coulomb matrix molecular descriptor using the `dscribe` library. This descriptor captures interatomic electrostatic interactions in a fixed numerical representation.
- Reaction Matrices – Extends the molecular descriptors to construct block-diagonal reaction matrices that account for the stoichiometric coefficients of reactant and product molecules.
- Spectral Feature Extraction – Computes and sorts eigenvalues of each reaction matrix to obtain fixed-length, invariant feature vectors.
- Learning and Prediction – Trains Decision Tree, Random Forest, and XGBoost models for binary classification (density-sensitive vs. density-insensitive).
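
The descriptor steps above can be sketched in plain NumPy. This is a minimal sketch under stated assumptions, not the repository's implementation. The Coulomb matrix uses the standard definition (diagonal 0.5 Z^2.4, off-diagonal Z_i Z_j / |R_i - R_j|) instead of calling `dscribe`. Scaling each block by a signed stoichiometric coefficient (negative for reactants, positive for products), sorting eigenvalues by descending magnitude, and zero-padding to a fixed length are assumptions about details the text does not specify. All function names are hypothetical.

```python
import numpy as np


def coulomb_matrix(numbers, positions):
    """Standard Coulomb matrix from atomic numbers and 3D coordinates."""
    z = np.asarray(numbers, dtype=float)
    r = np.asarray(positions, dtype=float)
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)  # placeholder to avoid 0/0; diagonal is overwritten below
    cm = np.outer(z, z) / dist
    np.fill_diagonal(cm, 0.5 * z ** 2.4)
    return cm


def reaction_matrix(blocks, coefficients):
    """Block-diagonal matrix; each block is scaled by its signed coefficient (assumption)."""
    sizes = [b.shape[0] for b in blocks]
    out = np.zeros((sum(sizes), sum(sizes)))
    start = 0
    for block, coeff, n in zip(blocks, coefficients, sizes):
        out[start:start + n, start:start + n] = coeff * block
        start += n
    return out


def spectral_features(matrix, length):
    """Eigenvalues sorted by descending magnitude, truncated or zero-padded to `length`."""
    eig = np.linalg.eigvalsh(matrix)  # the reaction matrix is symmetric
    eig = eig[np.argsort(-np.abs(eig))]
    features = np.zeros(length)
    k = min(length, eig.size)
    features[:k] = eig[:k]
    return features


# Toy reaction H2 -> 2 H with illustrative coordinates.
h2 = coulomb_matrix([1, 1], [[0.0, 0.0, 0.0], [0.0, 0.0, 0.74]])
h = coulomb_matrix([1], [[0.0, 0.0, 0.0]])
features = spectral_features(reaction_matrix([h2, h], [-1, 2]), length=5)
print(features)
```

Because only interatomic distances and eigenvalues enter, the resulting vector does not change when a molecule is rotated or its atoms are reordered, which is the property the pipeline relies on.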
For a full summary of methods and results, see the project poster.
density_sensitivity-classification/
├── Descriptor1/
│ ├── Descriptor1_complete_features.npy — feature matrix (reaction eigenvalues + metadata)
│ └── Descriptor1_complete_targets.npy — target labels for reactions (density sensitivity)
│
├── descriptor1_model.ipynb — model training and evaluation notebook
├── dimensionality_reduction.ipynb — PCA, UMAP, and t-SNE notebook
├── diagonalize_matrices.py — computes eigenvalues of reaction matrices
├── generate_cm.py — constructs Coulomb matrices
├── pad_and_metadata.py — pads eigenvalue vectors and attaches metadata
├── preprocess.py — preprocessing utility functions
├── main.py — full descriptor generation workflow
├── final_dict_allsets.pkl — Coulomb matrices for all GMTKN55 systems
├── Density_sensitivity_classification_poster.pdf — project poster
├── requirements.txt — Python dependencies
└── README.md — documentation
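
The two `.npy` files listed under `Descriptor1/` can be loaded with NumPy. This is a minimal sketch: the helper name `load_descriptor` is hypothetical, the array shapes are not documented here, and `allow_pickle=True` is an assumption in case the metadata columns are stored as Python objects. Only enable it for files you trust.

```python
from pathlib import Path

import numpy as np


def load_descriptor(directory, name="Descriptor1"):
    """Load the feature matrix and target labels saved for one descriptor."""
    directory = Path(directory)
    # allow_pickle=True is an assumption; only use it with trusted files.
    X = np.load(directory / f"{name}_complete_features.npy", allow_pickle=True)
    y = np.load(directory / f"{name}_complete_targets.npy", allow_pickle=True)
    if len(X) != len(y):
        raise ValueError(f"{len(X)} feature rows but {len(y)} target labels")
    return X, y
```

From the repository root, `X, y = load_descriptor("Descriptor1")` would return one feature row and one label per reaction.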
# Clone the repository
git clone https://github.com/nedamhs/density-sensitivity-classification.git
cd density-sensitivity-classification
# Install dependencies
pip install -r requirements.txt
# Generate the datasets used for ML training
python main.py
ASE, dscribe, NumPy, scikit-learn, XGBoost, Matplotlib.
The dataset exhibits a moderate class imbalance (~33% density-sensitive vs. ~67% density-insensitive reactions). Models were evaluated using metrics robust to imbalance, including balanced accuracy, recall, and precision.
| Model | K* | Accuracy | Balanced Accuracy | ROC-AUC | Recall (Minority) | Precision (Minority) |
|---|---|---|---|---|---|---|
| XGBoost | 22 | 0.821 | 0.812 | 0.883 | 0.784 | 0.710 |
| Random Forest | 22 | 0.801 | 0.791 | 0.864 | 0.763 | 0.679 |
| Decision Tree | 24 | 0.808 | 0.806 | 0.825 | 0.804 | 0.678 |
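
To make the imbalance-aware metrics in the table concrete, the sketch below computes them from confusion-matrix counts. It assumes the density-sensitive (minority) class is encoded as 1. The labels are toy values, not results from this project, and the function name is hypothetical. scikit-learn offers equivalents in `sklearn.metrics` (`balanced_accuracy_score`, `recall_score`, `precision_score`).

```python
import numpy as np


def minority_class_metrics(y_true, y_pred, positive=1):
    """Accuracy, balanced accuracy, and minority-class recall and precision."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == positive) & (y_pred == positive)))
    fn = int(np.sum((y_true == positive) & (y_pred != positive)))
    fp = int(np.sum((y_true != positive) & (y_pred == positive)))
    tn = int(np.sum((y_true != positive) & (y_pred != positive)))

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    recall = ratio(tp, tp + fn)       # fraction of density-sensitive reactions found
    specificity = ratio(tn, tn + fp)  # recall of the majority class
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "balanced_accuracy": (recall + specificity) / 2,
        "recall": recall,
        "precision": ratio(tp, tp + fp),
    }


# Toy labels with roughly one third positives, mirroring the class imbalance.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = minority_class_metrics(y_true, y_pred)
print(metrics)
```

Balanced accuracy averages the recall of both classes, so a model that always predicts the majority class scores 0.5 on it even though its plain accuracy would be about 0.67 on this dataset.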
- GMTKN55 database from Goerigk Research Group
- SWARM dataset from Burke Group
- Burke Group @ UCI
- Goerigk Research Group @ University of Melbourne
- https://hunterheidenreich.com/posts/molecular-descriptor-coulomb-matrix/#the-coulomb-matrix
- https://goerigk.chemistry.unimelb.edu.au/research/the-gmtkn55-database
[1] Burke, K.
Density-Corrected Density Functional Theory.
Burke Research Group, University of California, Irvine.
https://dft.uci.edu/projects_DC.php
[2] Sim, E.; Song, S.; Burke, K.
Quantifying density errors in DFT.
J. Phys. Chem. Lett. 2018, 9 (22), 6385–6392.
DOI: 10.1021/acs.jpclett.8b02855
[3] Lee, M.; Kim, B.; Sim, M.; Sogal, M.; Kim, Y.; Yu, H.; Burke, K.; Sim, E.
Correcting dispersion corrections with density-corrected DFT.
J. Chem. Theory Comput. 2024, 20 (16), 7155–7167.
DOI: 10.1021/acs.jctc.4c00689