$$$$$$$\ $$$$$$\ $$\
$$ __$$\ $$ __$$\ $$ |
$$ | $$ | $$$$$$\ $$$$$$\ $$$$$$\ $$ / \__| $$$$$$\ $$ |$$\ $$\
$$ | $$ |$$ __$$\ $$ __$$\ $$ __$$\ \$$$$$$\ $$ __$$\ $$ |\$$\ $$ |
$$ | $$ |$$$$$$$$ |$$$$$$$$ |$$ / $$ | \____$$\ $$ / $$ |$$ | \$$\$$ /
$$ | $$ |$$ ____|$$ ____|$$ | $$ |$$\ $$ |$$ | $$ |$$ | \$$$ /
$$$$$$$ |\$$$$$$$\ \$$$$$$$\ $$$$$$$ |\$$$$$$ |\$$$$$$ |$$ | \$ /
\_______/ \_______| \_______|$$ ____/ \______/ \______/ \__| \_/
$$ |
$$ |
\__|
git clone https://github.com/Tuttlelab/DeepSolv
cd DeepSolv
pip install -r requirements.txt
python DeepSolv.py

This is a generalizable method based on the free energies and solvation of molecules.
This method uses the ANI2x deep learning model to optimize molecules and predict their energies, together with an auxiliary model that we have trained on solvation energy.
Such that the total energy is calculated as:
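
Reading the description above as "gas-phase ANI2x energy plus the solvation energy from the auxiliary model", the relation could be sketched as follows. The symbol names $E_{total}$ and $E_{ANI2x}$ are illustrative assumptions rather than notation confirmed by this README; $\Delta G^{0}_{solv}$ follows the notation used further down.

$$E_{total} = E_{ANI2x} + \Delta G^{0}_{solv}$$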
This model focuses on reproducing the work of Magill et al. (https://doi.org/10.1021/ja038973x) in the investigation of carbene pKa values.
- It is trained on the CPCM(Water) solvation energy difference between conformers in the aqueous and gaseous phases.
- Molecular self-interaction energy is not removed before training (as it is already removed by equation #).
- There is an additional particle given to positively charged molecules (uranium, as it need only be a placeholder that represents the presence of a +1 charge).
- In this study we have focused on the prediction of carbene pKa values, and thus that is what is represented in the dataset.
- The addition of a charged particle has proven to work well, though it has only been tested on relatively small molecules with a +1 charge, with the particle placed at the center of mass (COM).
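
The placeholder-particle idea in the list above can be illustrated with a minimal sketch. It is not the repository's implementation. The function names, the small mass table, and the tuple-based geometry format are assumptions made for illustration. It computes the mass-weighted center of a molecule and appends a uranium placeholder there when the charge is +1.

```python
# Standard atomic masses (u) for the few elements used in this sketch.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

# Placeholder element marking a +1 charge (uranium, as described above).
PLACEHOLDER = "U"


def center_of_mass(symbols, coords):
    """Return the mass-weighted center (x, y, z) of a geometry."""
    total_mass = sum(ATOMIC_MASS[s] for s in symbols)
    return tuple(
        sum(ATOMIC_MASS[s] * xyz[axis] for s, xyz in zip(symbols, coords)) / total_mass
        for axis in range(3)
    )


def add_charge_placeholder(symbols, coords, charge):
    """Append the placeholder at the COM for +1 species; leave neutrals unchanged.

    Hypothetical helper: only charges 0 and +1 are accepted, since the text
    says the approach has only been tested for a +1 charge.
    """
    new_symbols = list(symbols)
    new_coords = [tuple(c) for c in coords]
    if charge == 0:
        return new_symbols, new_coords
    if charge != 1:
        raise ValueError("this sketch only handles charges 0 and +1")
    new_symbols.append(PLACEHOLDER)
    new_coords.append(center_of_mass(symbols, coords))
    return new_symbols, new_coords


if __name__ == "__main__":
    # Toy two-atom geometry (not a real carbene), coordinates in angstrom.
    syms, xyz = add_charge_placeholder(
        ["C", "H"], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.1)], charge=1
    )
    print(syms, xyz[-1])
```

The COM is computed from the real atoms only, so the placeholder itself never influences where it is placed.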
We used the QMSpin dataset, which contains thousands of carbene conformers, as the corpus of carbene molecules on which to train the model.
These carbenes and their protonated forms were optimized and also run as MD trajectories (in the gas phase) to yield a large number of conformers. Each of these conformers was then simulated in CPCM water in order to calculate the
In order to gain additional data, the original ANI1 dataset was also used as a base to generate additional
The training data (in HDF5 format) can be downloaded from the University of Strathclyde's KnowledgeBase at: LINK
The training set does not contain any of the carbenes for which we are predicting the pKa.
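
As a rough sketch of how a per-conformer training target could be formed from paired gas-phase and CPCM(Water) calculations, the snippet below takes the difference of the two energies for the same geometry. The dictionary keys `e_gas` and `e_cpcm`, the Hartree input units, and the kcal/mol output units are all assumptions for illustration. The actual layout of the HDF5 training data is not described here.

```python
# Conversion factor from Hartree to kcal/mol.
HARTREE_TO_KCAL_PER_MOL = 627.509474


def solvation_energy_kcal(e_cpcm_hartree, e_gas_hartree):
    """Energy in CPCM water minus gas-phase energy for one conformer, in kcal/mol."""
    return (e_cpcm_hartree - e_gas_hartree) * HARTREE_TO_KCAL_PER_MOL


def solvation_labels(conformers):
    """Build one label per conformer from hypothetical 'e_cpcm' and 'e_gas' fields."""
    return [solvation_energy_kcal(c["e_cpcm"], c["e_gas"]) for c in conformers]


if __name__ == "__main__":
    # Made-up energies, for illustration only.
    toy = [{"e_gas": -1.00, "e_cpcm": -1.01}, {"e_gas": -2.00, "e_cpcm": -2.03}]
    print(solvation_labels(toy))
```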
We test several different methods of calculating pKa, which fall under two categories:
- Using reference and calculated $G(H)$ and $\Delta G^{0}_{solv}(H)$ along with the carbene thermodynamic cycle energetic values
- Using linear machine learning models to fit the carbene thermodynamic cycle energetic values
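
For the first category, a generic thermodynamic-cycle calculation can be sketched as below. This is the textbook form for the deprotonation of a protonated carbene, not necessarily the exact expression used in DeepSolv.py. It assumes all energies are in kcal/mol and that any standard-state correction has already been folded into the proton terms by the caller. The function and argument names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def deprotonation_free_energy(
    g_gas_base, dg_solv_base,      # carbene: gas-phase free energy, solvation energy
    g_gas_acid, dg_solv_acid,      # protonated carbene: same two quantities
    g_proton, dg_solv_proton,      # G(H) and solvation energy of H (reference or calculated)
):
    """Aqueous free energy change for acid -> base + proton, via the cycle."""
    products = (g_gas_base + dg_solv_base) + (g_proton + dg_solv_proton)
    reactant = g_gas_acid + dg_solv_acid
    return products - reactant


def pka_from_free_energy(dg_aq, temperature=298.15):
    """Convert an aqueous deprotonation free energy (kcal/mol) to a pKa."""
    return dg_aq / (R_KCAL * temperature * math.log(10.0))


if __name__ == "__main__":
    # Made-up numbers, chosen only to show the call pattern.
    dg = deprotonation_free_energy(-100.0, -10.0, -300.0, -60.0, -170.0, -50.0)
    print(dg, pka_from_free_energy(dg))
```

Keeping the proton terms as explicit arguments makes it easy to swap reference values for model-consistent ones without touching the rest of the calculation.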
Using the reference
Using ANI model consistent values for
This works well because the
Ultimately it doesn't matter what values are used for
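
The second category, fitting a linear model to the thermodynamic-cycle energies, can be sketched with a one-feature ordinary least-squares fit. This is a minimal stand-in under assumed inputs (one cycle energy per molecule and one known pKa per molecule), not the models shipped in the repository. The function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x_value, slope, intercept):
    """Evaluate the fitted line at a single x value."""
    return slope * x_value + intercept


if __name__ == "__main__":
    # Made-up cycle energies (kcal/mol) and pKa values, for illustration only.
    energies = [20.0, 25.0, 30.0, 35.0]
    pkas = [15.1, 18.6, 22.4, 25.9]
    slope, intercept = fit_line(energies, pkas)
    print(slope, intercept, predict(28.0, slope, intercept))
```

A property of such a fit is that adding the same constant to every input energy changes only the fitted intercept. The slope and the predictions stay the same.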
