Preprocessing of raw .pdb protein files for Uni-Mol pocket encoder #370

### Details

Hello! Thank you for your amazing work on molecular representation learning!

I am interested in computing pocket representations with Uni-Mol for some experimental structures from the PDB.
As I understand from the paper (Appendix A), raw PDB data is first preprocessed: missing heavy atoms, hydrogen atoms, and water molecules are added.

While going through the repository, specifically the [example](https://github.com/deepmodeling/Uni-Mol/blob/main/unimol/notebooks/unimol_pocket_repr_demo.ipynb) for computing pocket representations, I could not find the part where such preprocessing is performed.
As far as I understand, this preprocessing has to be done beforehand, as a prerequisite for running the example.

Could you please share the scripts that were used to preprocess raw protein data for the pocket encoder pretraining?

I also have a few other related questions about preprocessing:
1) In Appendix C, it is stated that hydrogen atoms were removed from the pocket input structures during pretraining. However, in the pretraining [example](https://github.com/deepmodeling/Uni-Mol/tree/main/unimol#pocket-pretraining), the `remove-hydrogen` flag is not used. It also seems that the pocket pretraining dataset [transformations](https://github.com/deepmodeling/Uni-Mol/blob/main/unimol/unimol/tasks/unimol_pocket.py) retain hydrogens in the structure. Could you clarify this discrepancy?
2) Do you remove heterogens (ions, cofactors) during raw data preprocessing?
3) Were the added hydrogen and water positions minimized with a force field, or were they placed using templates?
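
To make questions 1 and 2 concrete, here is a minimal sketch of the kind of filtering I have in mind. It is my own assumption and is not taken from the Uni-Mol code base: the names `filter_pdb_lines` and `is_hydrogen` are hypothetical, and the logic relies only on the fixed-column layout of `ATOM`/`HETATM` records in the PDB format.

```python
from typing import Iterable, List

# Residue names commonly used for water in PDB files (assumed list).
WATER_NAMES = {"HOH", "WAT", "DOD"}


def is_hydrogen(line: str) -> bool:
    """Heuristically decide whether an ATOM/HETATM record is a hydrogen."""
    # Columns 77-78 hold the element symbol when the file provides it.
    element = line[76:78].strip().upper() if len(line) >= 78 else ""
    if element:
        return element in {"H", "D"}
    # Fallback: atom name (columns 13-16) with leading digits stripped.
    # This is a heuristic and can misclassify hetero atoms such as mercury (HG).
    name = line[12:16].strip().lstrip("0123456789").upper()
    return name.startswith("H")


def filter_pdb_lines(
    lines: Iterable[str],
    remove_hydrogens: bool = True,
    remove_waters: bool = False,
    remove_heterogens: bool = False,
) -> List[str]:
    """Drop hydrogens, waters, and/or non-water heterogens from PDB lines.

    Non-coordinate records are passed through unchanged. Atom serial numbers
    and CONECT records are not renumbered.
    """
    kept = []
    for line in lines:
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM"):
            kept.append(line)
            continue
        res_name = line[17:20].strip().upper()  # columns 18-20
        if remove_hydrogens and is_hydrogen(line):
            continue
        if res_name in WATER_NAMES:
            if remove_waters:
                continue
        elif record == "HETATM" and remove_heterogens:
            continue
        kept.append(line)
    return kept
```

For example, `filter_pdb_lines(open("pocket.pdb"), remove_heterogens=True)` would strip hydrogens, ions, and cofactors while keeping waters (`pocket.pdb` is a placeholder file name). Knowing which of these switches matches the actual pretraining pipeline is exactly what I am asking about.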
