Commit ad92164

Merge pull request #28 from MobleyLab/mobley
Deposit new files for major rebuild of database
2 parents 8330104 + c845217 commit ad92164

3,254 files changed

Lines changed: 67691 additions & 1409690 deletions

README.md

Lines changed: 42 additions & 5 deletions
@@ -26,14 +26,17 @@ The current set and format is motivated by several factors:
2626

2727
The database consists of a .tar.gz file containing:
2828
* `database.txt`: A semicolon delimited text file containing compound IDs, SMILES, IUPAC names or similar, experimental values and uncertainties, calculated values, DOIs for references, and notes. Format described in the header
29-
* `database.pickle`: Python pickle file containing the same database, with some extra fields as well (notably, 'groups', which provides functional groups for the compounds as assigned by checkmol)
29+
* `database.pickle`: Python pickle file containing the same database, with some extra fields as well, including 'groups' (which provides functional groups for the compounds as assigned by checkmol), PubChem compound IDs, calculated enthalpies of hydration, some experimental enthalpies of hydration (from ORCHYD), and components of the enthalpy of hydration and hydration free energy (as described in our forthcoming paper, to be linked here soon).
3030
* `groups.txt`: Functional groups for compounds as assigned by checkmol. Semicolon delimited. First field is compound ID, second field is compound name, and subsequent fields are functional groups.
3131
* `iupac_to_cid.pickle, smiles_to_cid.pickle`: Python pickle files containing conversion of IUPAC name to compound id and SMILES string to compound id, stored in dictionaries
3232
* Structure files:
33-
* `mol2files_sybyl`: `mol2` files with partial charges as written by OEChem in Sybyl format/Sybyl atom types
34-
* `mol2files_gaff`: `mol2` files with partial charges as used for our hydration free energy calculations (AMBER GAFF atom types)
35-
* `sdffiles`: `sdf` files with partial charges as written by OEChem
36-
* `topgro`: GROMACS format topology and coordinate files as used for our AM1-BCC GAFF hydration free energy calculations. Technical note: There may be some variation as to whether water molecules are or are not included in these files; these are intended to be used for the small molecule parameters only.
33+
* `mol2files_sybyl.tar.gz`: `mol2` files with partial charges as written by OEChem in Sybyl format/Sybyl atom types
34+
* `mol2files_gaff.tar.gz`: `mol2` files with partial charges as used for our hydration free energy calculations (AMBER GAFF atom types)
35+
* `sdffiles.tar.gz`: `sdf` files with partial charges as written by OEChem
36+
* `topgro.tar.gz`: GROMACS format topology and coordinate files as used for our AM1-BCC GAFF hydration free energy calculations. Technical note: There may be some variation as to whether water molecules are or are not included in these files; these are intended to be used for the small molecule parameters only.
37+
38+
(See the Manifest below for a more complete list of all available files.)
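
The `groups.txt` layout described above (semicolon delimited; compound ID first, compound name second, functional groups after that) can be read with a few lines of Python. This is only a minimal sketch, not a script shipped with the database: the function name `parse_groups`, the skipping of blank and `#`-prefixed lines, and the sample rows are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def parse_groups(lines: Iterable[str]) -> Dict[str, Tuple[str, List[str]]]:
    """Map compound ID -> (compound name, list of functional groups)."""
    records: Dict[str, Tuple[str, List[str]]] = {}
    for raw in lines:
        line = raw.strip()
        # Assumption: blank lines and '#'-prefixed lines carry no data.
        if not line or line.startswith("#"):
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) < 2:
            continue  # need at least a compound ID and a name
        cid, name = fields[0], fields[1]
        # Every remaining non-empty field is treated as one functional group.
        groups = [group for group in fields[2:] if group]
        records[cid] = (name, groups)
    return records


# Made-up rows in the described layout (not real FreeSolv entries).
sample = [
    "# compound id; name; functional groups",
    "example_0001; example compound A; alcohol; aromatic compound",
    "example_0002; example compound B",
]
parsed = parse_groups(sample)
print(parsed["example_0001"])
```

For the real file, something like `with open("groups.txt") as fh: parsed = parse_groups(fh)` should work once the archive is unpacked, provided the file follows the layout described above.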
39+
3740

3841
# The future:
3942

@@ -42,6 +45,30 @@ The database is maintained on the cite-able eScholarship repository of the Unive
4245
Please cite:
4346
> Mobley, David L. (2013). Experimental and Calculated Small Molecule Hydration Free Energies. UC Irvine: Department of Pharmaceutical Sciences, UCI. Retrieved from: http://www.escholarship.org/uc/item/6sd403pz
4447
48+
# Manifest
49+
- `gromacs_analysis`: Contains plots resulting from GROMACS analysis of some of the data in FreeSolv.
50+
- `gromacs_energies`: Contains XVG files associated with the most recent (2017) update of FreeSolv calculated values; these files are large and are only available in the archived version of the database and not on GitHub.
51+
- `gromacs_mdpfiles`: Contains GROMACS run (.mdp) files used for the calculations connected with the most recent (2017) update of the calculated hydration free energies and enthalpies reported here.
52+
- `mol2files_gaff.tar.gz`: contains mol2 files for all compounds with AM1-BCC charges and GAFF atom types
53+
- `mol2files_sybyl.tar.gz`: contains mol2 files for all compounds with AM1-BCC charges and SYBYL atom types
54+
- `primary-data`: Primary data from which the contents of this database can be re-generated; obtained from full database via `scripts/extract-primary-data.py`
55+
- `prmcrd.tar.gz`: AMBER format parameter, coordinate, and frcmod files corresponding to the systems we ultimately simulated in GROMACS.
56+
- `scripts`: Scripts pertaining to the material deposited here
57+
- `sdffiles.tar.gz`: SDF-format files for all of the molecules deposited here (as in `mol2files_gaff` and `mol2files_sybyl`)
58+
- `topgro.tar.gz`: GROMACS format topology and coordinate files for the calculations associated with the computed values in FreeSolv, for calculations in gas phase
59+
- `topgro_solvated.tar.gz`: GROMACS format topology and coordinate files for the calculations associated with the computed values in FreeSolv, for calculations in solution
60+
- `README.md`: This file
61+
- `database.pickle`: Python pickle file of the FreeSolv database
62+
- `database.json`: JSON format version of the FreeSolv database also stored in `database.pickle`
63+
- `database.txt`: Text format version of some of the fields from the database
64+
- `groups.txt`: Functional groups assigned to the different compounds in the database
65+
- `iupac_to_cid.pickle` and `.json`: Python pickle file and JSON file containing a dictionary for converting IUPAC names to FreeSolv compound IDs
66+
- `smiles_to_cid.pickle` and `.json`: Python pickle file and JSON file containing a dictionary for converting SMILES strings to FreeSolv compound IDs
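
Because the lookup tables are described as plain dictionaries and are now also provided as JSON, they can be read without unpickling anything. The following is a minimal sketch under assumptions: the helper name `load_lookup` is hypothetical, and the demonstration writes a small stand-in file with a made-up entry rather than reading the real `smiles_to_cid.json`.

```python
import json
import os
import tempfile


def load_lookup(path):
    """Load a JSON lookup table (e.g. SMILES -> compound ID) as a dict."""
    with open(path, encoding="utf-8") as fh:
        table = json.load(fh)
    if not isinstance(table, dict):
        raise ValueError("expected a JSON object at the top level")
    return table


# Stand-in data with a made-up compound ID (not a real FreeSolv entry).
demo = {"CCO": "example_0001"}

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "smiles_to_cid.json")
    with open(demo_path, "w", encoding="utf-8") as fh:
        json.dump(demo, fh)
    smiles_to_cid = load_lookup(demo_path)

# .get() returns None for a SMILES string that is not in the table.
print(smiles_to_cid.get("CCO"))
```

A dictionary lookup matches keys exactly, so a SMILES string written differently from the stored one (for example, a different atom ordering for the same molecule) will not be found unless it is first converted to the same form used in the file.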
67+
68+
# Rebuilding FreeSolv
69+
70+
The input files deposited here can be rebuilt (from SMILES strings) using the script `scripts/rebuild_freesolv.py`, which requires the Chodera lab's `openmoltools` package and the Mobley Lab's `SolvationToolkit`, both of which are `conda` installable from the `omnia` channel.
71+
4572
# Change log/version history:
4673

4774
This dataset started by taking all of the compounds we have studied previously with hydration free energies (references 1, 2, 4-9) including those from SAMPL4 and compiling them all into one big set, removing any redundancies and providing data, references, etc. for all of them. Details of changes for specific versions are found below.
@@ -112,6 +139,16 @@ Please also note that some discrepancies between experimental values here and va
112139
## Version 0.320:
113140
Same as the above but initiates Zenodo DOIs. DOI http://dx.doi.org/10.5281/zenodo/159499
114141

142+
## Version 0.5 (Jan. 26, 2017):
143+
* Re-generates all input files (`.mol2`, `.sdf`, GROMACS and AMBER format files, etc.) from primary data (SMILES strings)
144+
* Deposits scripts used for re-generating the database in the `scripts` directory
145+
* Re-calculates all calculated values (in conjunction with forthcoming paper)
146+
* Adds calculated enthalpies of hydration and components of enthalpy
147+
* Adds charge and non-polar components of hydration free energy
148+
* Adds a few experimental enthalpies of hydration obtained from the ORCHYD dataset
149+
* Adds `README.md` files in some of the sub-directories better indicating their contents
150+
* Corrects `tripos_mol2` back to `mol2files_sybyl` for consistency with `mol2files_gaff` (as in a prior version, but we had lost this change)
151+
* Provides JSON versions of database files
115152

116153
## Changes not yet in a formal release:
117154

database.json

Lines changed: 1 addition & 0 deletions
Large diffs are not rendered by default.
