PIGEN (Physics Informed Generation) is a framework for generating novel crystal structures by integrating physics-informed sampling, chemically guided control, and structural evaluation into a denoising diffusion model. Building on DiffCSP, PIGEN enables targeted generation beyond known chemical spaces and supports out-of-distribution extrapolation. In our benchmarks, it produces physically plausible crystal candidates with a higher fraction of stable structures per batch, and with greater chemical and structural diversity, than frameworks such as DiffCSP or MatterGen.
```bash
conda env create -f environment.yml
conda activate pigen
```

All dependencies are managed via conda; `setup.py` is only for local package registration:

```bash
pip install -e .
```

The default dataset, Alex_MP_20_M_LED, is available at: https://huggingface.co/datasets/UoLiverpool/Alex_MP_20_M_LED/
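One way to fetch it into the location suggested by the repository layout below (this assumes the `huggingface_hub` CLI is installed; it is not part of the commands above):

```bash
# Hypothetical download path; adjust --local-dir if your configuration
# expects the dataset elsewhere.
huggingface-cli download UoLiverpool/Alex_MP_20_M_LED \
    --repo-type dataset \
    --local-dir data/Alex_MP_20_M_LED
```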
For re-training the model with the default dataset, Alex_MP_20_M_LED, run:

```bash
python pigen/train.py
```

This will use the default data and conditioning properties and is equivalent to:

```bash
python pigen/train.py --data_name Alex_MP_20_M_LED --prop ['entropy_sum', 'target_energy']
```
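The same flags can point at a different dataset or property subset. For example (a hypothetical invocation, assuming `--prop` accepts any subset of the featurised conditioning properties):

```bash
# Hypothetical: condition on a single property instead of the default pair.
python pigen/train.py --data_name Alex_MP_20_M_LED --prop ['target_energy']
```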
You can use your own trained model or download the model checkpoint from: https://huggingface.co/DeepDrew/PIGEN/

After downloading, place the checkpoint file in `checkpoints/` so that `pigen/generate.py` can locate it.
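If you use the hosted checkpoint, one way to fetch it straight into that directory (again assuming the `huggingface_hub` CLI is available):

```bash
# Downloads the published checkpoint files into checkpoints/.
huggingface-cli download DeepDrew/PIGEN --local-dir checkpoints
```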
Run:

```bash
cd pigen
python generate.py
```

This repository builds on DiffCSP, an open-source implementation of denoising diffusion probabilistic models for crystal structure prediction. We have further developed and extended it as described below.
- Physics-informed logic integrated into the sampling process
- Conditional generation with target-guided control via classifier-free guidance
- Featurised dataset with local chemical and structural environment features, enabling out-of-distribution extrapolation
- Chemistry-informed structure evaluation tools
- Modular refactoring for better reproducibility and configuration management.
- Support for PyTorch Distributed Data Parallel to accelerate large-scale training across multiple GPUs or nodes (see the launch sketch after this list)
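As a minimal sketch of a multi-GPU launch, assuming `pigen/train.py` reads the standard `torchrun` environment variables (the actual launcher used for DDP is not specified here):

```bash
# Hypothetical single-node launch on 4 GPUs; adjust --nproc_per_node to your hardware.
torchrun --nproc_per_node=4 pigen/train.py --data_name Alex_MP_20_M_LED
```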
Run:

```bash
pytest tests
```

Note: The Docker image is intentionally left with a flexible entry point (`/bin/bash`) to allow the user to either train or generate as needed, following the instructions below. This design choice supports both CPU and GPU environments.
```bash
docker build -t pigen .
docker run --rm pigen              # CPU
docker run --rm --gpus all pigen   # GPU
```

Repository structure:

```
├── checkpoints
├── data
│   └── Alex_MP_20_M_LED/
├── environment.yml
├── log
├── pigen
│   ├── __init__.py
│   ├── assets/
│   ├── common/
│   ├── eval/
│   ├── generate.py
│   ├── normalization
│   ├── partial_sample.py
│   ├── settings.py
│   └── train.py
├── README.md
├── setup.py
├── tests
│   ├── dummy_data/
│   ├── dummy_logs/
│   ├── fixtures/
│   ├── conftest.py
│   ├── test_dependecies.py
│   ├── test_dummy_training.py
│   ├── test_pd_structure_parsing.py
│   └── test_torch_installation.py
└── verify_environment_installs.py
```
The original DiffCSP repository is licensed under the MIT License. We retain this license and clearly mark any modified components. We gratefully acknowledge the authors of DiffCSP for their contribution to the research and open-source community.