This repository contains the codebase for training a scientific reasoning model to generate protein-binding ligands: small drug-like molecules that bind to diverse protein targets. It includes code for chain-of-thought (CoT) supervised fine-tuning (SFT) and reinforcement learning (RL) training.

For RL we use group relative policy optimization (GRPO) with custom reward functions that optimize the chemical validity, synthesizability, and binding affinity of generated molecules across diverse protein targets.
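As an illustration, the three reward components could be combined along these lines. This is a minimal sketch, not the implementation in this repository: the function name, weights, and score ranges are assumptions. In practice, validity would come from an RDKit parse, synthesizability from an SA score, and affinity from a docking or ML-based predictor.

```python
def combined_reward(validity, sa_score, affinity,
                    w_valid=1.0, w_sa=0.5, w_aff=1.0):
    """Weighted combination of per-molecule reward components (illustrative).

    validity: 1.0 if the generated SMILES parses to a valid molecule, else 0.0
              (e.g. via rdkit.Chem.MolFromSmiles)
    sa_score: synthetic accessibility score in [1, 10], lower = easier to make
    affinity: predicted binding score, assumed normalized to [0, 1]

    The weights and normalizations here are assumptions for the sketch.
    """
    if validity == 0.0:
        # An invalid molecule earns nothing, regardless of its other scores.
        return 0.0
    sa_reward = (10.0 - sa_score) / 9.0  # map SA [1, 10] -> reward [1, 0]
    return w_valid * validity + w_sa * sa_reward + w_aff * affinity
```

Gating the whole reward on validity keeps the policy from collecting partial credit for unparsable strings.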
To install the packages we recommend creating two separate conda environments: one for SFT and RL training, and one for running inference with vLLM.

To create the training environment, run:

```
conda env create -f molflow.yml
```

To create the vLLM inference environment, run:

```
conda env create -f vllm.yml
```

For both SFT and RL training we use accelerate to distribute training across multiple GPUs.
The SFT training parameters are:

```
max_length = 512
per_device_batch_size = 2
gradient_accumulation_steps = 2
max_steps = 25000
learning_rate = 5e-6
weight_decay = 0.05
warmup_steps = 1000
seed = 42
```

To start training, run:
```
accelerate launch sft_train.py
```

For RL training we use LoRA and trl's GRPOTrainer. Details about the training parameters can be found in grpo_train.py.
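The "group relative" part of GRPO can be sketched as follows: each prompt gets a group of sampled completions, and each completion's reward is normalized against its own group's statistics. This is a simplified illustration of the idea behind trl's GRPOTrainer, not its actual code:

```python
def group_relative_advantages(rewards, eps=1e-8):
    """Normalize one group of rewards (one group = completions for one prompt).

    Each completion's advantage is its reward minus the group mean, divided
    by the group standard deviation; eps guards against zero-variance groups.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

Because advantages are computed relative to the group, a completion is only rewarded for beating its siblings, and no separate value network is needed.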
To start training, run:
```
accelerate launch grpo_train.py
```

All training runs were conducted on 4x NVIDIA H100 80GB GPUs.
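For reference, the SFT warmup and step settings imply a learning-rate trajectory along these lines, assuming linear warmup followed by linear decay to zero; the actual scheduler used by sft_train.py may differ:

```python
def lr_at_step(step, base_lr=5e-6, warmup_steps=1000, max_steps=25000):
    """Learning rate at a given optimizer step (illustrative schedule).

    Linear warmup from 0 over `warmup_steps`, then linear decay to 0 at
    `max_steps`. Defaults mirror the SFT parameters listed above.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (max_steps - step) / (max_steps - warmup_steps))
```

Note also that with per_device_batch_size = 2, gradient_accumulation_steps = 2, and 4 GPUs, the effective global batch size is 2 x 2 x 4 = 16 sequences per optimizer step.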