feat(bioemu): Add FKC steering denoiser & refactor steering code#203
feat(bioemu): Add FKC steering denoiser & refactor steering code#203
Conversation
Split steering.py into a steering/ package with modular components: - steering/potentials.py: Potential base class, UmbrellaPotential, LinearPotential - steering/collective_variables.py: CV framework (RMSD, FNC, CaCaDistance, PairwiseClash) - steering/utils.py: Resampling, reward computation, sequence alignment helpers - steering/dpm_fkc.py: FKC (Feynman-Kac Control) steered denoiser - steering/dpm_smc.py: SMC (Sequential Monte Carlo) steered denoiser Key changes: - Unified DPM-Solver primitives in denoiser.py (shared by FKC, SMC, unsteered) - Steering configs (cv_steer.yaml, physical_steering.yaml) are self-contained Hydra denoiser configs with target, potentials, and steering params - Simplified sample.py: removed steering_config param, denoiser handles everything - Added start/end time window for steering resampling - Simplified loop returns to (batch, log_weights) 2-tuple Tests: - 60+ steering tests: unit tests for CVs, potentials, utils; integration tests for FKC/SMC loops, ODE consistency, generate_batch pipeline - Chignolin e2e tests (require model weights) Closes: https://github.com/msr-ai4science/feynman/issues/20268 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Split steering.py into a steering/ package with modular components: - steering/potentials.py: Potential base class, UmbrellaPotential, LinearPotential - steering/collective_variables.py: CV framework (RMSD, FNC, CaCaDistance, PairwiseClash) - steering/utils.py: Resampling, reward computation, sequence alignment helpers - steering/dpm_fkc.py: FKC (Feynman-Kac Control) steered denoiser - steering/dpm_smc.py: SMC (Sequential Monte Carlo) steered denoiser Key changes: - Unified DPM-Solver primitives in denoiser.py (shared by FKC, SMC, unsteered) - Steering configs (cv_steer.yaml, physical_steering.yaml) are self-contained Hydra denoiser configs with target, potentials, and steering params - Simplified sample.py: removed steering_config param, denoiser handles everything - Added start/end time window for steering resampling - Simplified loop returns to (batch, log_weights) 2-tuple Tests: - 60+ steering tests: unit tests for CVs, potentials, utils; integration tests for FKC/SMC loops, ODE consistency, generate_batch pipeline - Chignolin e2e tests (require model weights) Closes: https://github.com/msr-ai4science/feynman/issues/20268 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| return self.compute_batch(all_positions * 10.0, sequence) | ||
|
|
||
|
|
||
| class FractionNativeContacts(CollectiveVariable): |
There was a problem hiding this comment.
This one has duplication with the train/foldedness.py, we might want to decide which one to keep
- Change physical_steering.yaml target from dpm_solver_fkc to dpm_solver_smc - Fix SMC loop bug: log_weights was overwritten with None outside steering window - Rewrite README steering section: document both SMC/FKC algorithms, update CLI examples to use denoiser_config (removed old steering_config param), fix parameter descriptions to match current interface Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ioemu into yuxie1/fkc-steering # Conflicts: # src/bioemu/config/steering/physical_steering.yaml # src/bioemu/steering/dpm_smc.py
SummarySummary
Coveragesrc.bioemu - 89.1%
src.bioemu.colabfold_setup -
src.bioemu.hpacker_setup - 58.8%
src.bioemu.openfold.np - 44%
src.bioemu.openfold.utils - 50.1%
src.bioemu.steering - 79.3%
src.bioemu.training - 100%
|
There was a problem hiding this comment.
Pull request overview
Refactors BioEmu’s steering functionality into a dedicated bioemu.steering package and adds modular, Hydra-configured steered denoisers (FKC + SMC) built on shared DPM-Solver primitives.
Changes:
- Introduces
bioemu.steeringsubpackage (CVs, potentials, utilities) and new steered denoisers (dpm_fkc,dpm_smc). - Extracts/shared DPM-Solver helper primitives in
denoiser.pyand simplifies sampling API to steer via a singledenoiser_config. - Replaces the legacy steering test with a broader steering test suite (unit + lightweight integration + optional e2e).
Reviewed changes
Copilot reviewed 19 out of 21 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_steering.py | Removes legacy steering e2e test module. |
| tests/steering/init.py | Adds steering test package marker. |
| tests/steering/test_utils.py | Adds unit tests for resampling + helper utilities. |
| tests/steering/test_potentials.py | Adds unit tests for UmbrellaPotential / LinearPotential behavior. |
| tests/steering/test_integration.py | Adds lightweight integration tests for configs, solvers, resampling, and pipeline wiring. |
| tests/steering/test_denoisers.py | Adds tests for ESS computation and SO(3) gradient mapping helper. |
| tests/steering/test_collective_variables.py | Adds unit tests for CV implementations. |
| tests/steering/test_chignolin_e2e.py | Adds e2e tests that invoke sample() (requires model weights). |
| src/bioemu/steering/utils.py | New steering utilities: config validation, x0/R0 helpers, resampling, reward/grad computation. |
| src/bioemu/steering/potentials.py | New potential framework + UmbrellaPotential / LinearPotential. |
| src/bioemu/steering/dpm_smc.py | Adds SMC steered denoiser built on DPM-Solver++ utilities. |
| src/bioemu/steering/dpm_fkc.py | Adds FKC steered denoiser + analytical weight updates + ESS resampling. |
| src/bioemu/steering/collective_variables.py | Adds CV framework + implementations (FNC, RMSD, CaCaDistance, PairwiseClash). |
| src/bioemu/steering/init.py | Exposes steering public API (CVs, potentials, utilities). |
| src/bioemu/steering.py | Removes legacy monolithic steering module. |
| src/bioemu/sample.py | Simplifies sampling: steering now handled by the instantiated denoiser config. |
| src/bioemu/denoiser.py | Refactors DPM-Solver primitives into reusable helper dataclasses/functions. |
| src/bioemu/config/steering/physical_steering.yaml | Converts physical steering into a self-contained Hydra denoiser config. |
| src/bioemu/config/steering/cv_steer.yaml | Adds example self-contained FKC steering config. |
| README.md | Updates steering documentation and usage to the new single-config model. |
| .gitignore | Ignores tests/cross_repo/. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Two steering algorithms are available: | ||
|
|
||
| - **SMC (Sequential Monte Carlo)**: Simulates multiple *candidate samples* (particles) per desired output sample and resamples between them according to the favorability of the provided potentials. This is the default for physical steering. | ||
| - **FKC (Feynman–Kac Control)**: Applies importance weighting without resampling; useful when targeting a specific collective variable value (e.g., RMSD to a reference). |
There was a problem hiding this comment.
The README states FKC "applies importance weighting without resampling", but the current dpm_solver_fkc implementation does perform ESS-based resampling via resample_based_on_log_weights when num_particles > 1 and within the steering time window. Either update the README description to match the actual behavior, or adjust the implementation/config to make resampling optional/absent for FKC as documented.
| - **FKC (Feynman–Kac Control)**: Applies importance weighting without resampling; useful when targeting a specific collective variable value (e.g., RMSD to a reference). | |
| - **FKC (Feynman–Kac Control)**: Uses importance weighting and may perform ESS-based resampling between particles; useful when targeting a specific collective variable value (e.g., RMSD to a reference). |
| # 1. Compute cumulative sums (CDF) for each batch | ||
| cdf = torch.cumsum(weights, dim=-1) # (B, N) | ||
|
|
||
| # 2. Stratified positions: one per interval | ||
| # shape (B, N): each row gets N stratified uniforms | ||
| u = (torch.rand(B, N, device=weights.device) + torch.arange(N, device=weights.device)) / N | ||
|
|
||
| # 3. Inverse-CDF search: for each u, find smallest j s.t. cdf[b, j] >= u[b, i] | ||
| idx = torch.searchsorted(cdf, u, right=True) | ||
|
|
||
| return idx # shape (B, N) |
There was a problem hiding this comment.
stratified_resample() can return an out-of-range index when u exceeds the final CDF entry due to floating-point normalization error (i.e., searchsorted may return N). This will later crash when indexing data_list[i] during resampling. Consider explicitly forcing cdf[..., -1] = 1 (or cdf = cdf / cdf[..., -1:]) and/or clamping idx to [0, N-1] before returning.
| def resample_based_on_log_weights( | ||
| batch: ChemGraph, | ||
| log_weight: torch.Tensor, | ||
| n_particles: int, | ||
| is_last_step: bool, | ||
| ess_threshold: float, | ||
| step: int, | ||
| t: float, | ||
| ) -> tuple[ChemGraph, torch.Tensor, float, torch.Tensor]: | ||
| """Resample particles based on importance weights. |
There was a problem hiding this comment.
The return type annotation for resample_based_on_log_weights doesn't match what the function actually returns. It currently annotates -> tuple[ChemGraph, torch.Tensor, float, torch.Tensor], but the implementation returns (batch, log_weight, indices, ess) where both indices and ess are tensors. Updating the annotation (and, if desired, the docstring wording) will prevent type-checking confusion and downstream misuse.
| potential_(None, 10 * coords, None, None, | ||
| t=step_index, N=num_steps, sequence=batch.sequence[0]). |
There was a problem hiding this comment.
The compute_reward_and_grad docstring describes calling potentials with the old signature (potential_(None, 10 * coords, None, None, t=step_index, N=num_steps, ...)), but the implementation now calls potential(10.0 * coords, t=..., sequence=...). Please update the docstring to reflect the current API so new potentials/CVs are implemented against the right contract.
| potential_(None, 10 * coords, None, None, | |
| t=step_index, N=num_steps, sequence=batch.sequence[0]). | |
| potential(10.0 * coords, t=step_index, sequence=batch.sequence[0]). |
| denoiser_type: Denoiser to use for sampling, if `denoiser_config` not specified. Comes in with default parameter configuration. Must be one of ['dpm', 'heun'] | ||
| denoiser_config: Path to a denoiser config YAML, or a dict. For steered sampling (FKC/SMC), | ||
| pass a steering config (e.g., config/steering/physical_steering.yaml) which includes | ||
| the denoiser target, potentials, and steering parameters in one file. |
There was a problem hiding this comment.
The docstring states denoiser_config can be a Path, but the current parsing logic only treats an exact str as a path input. If a caller passes a Path, it will hit the final type-assert and fail. Either update the implementation to accept Path/os.PathLike, or adjust the docstring/type to match the actual supported inputs.
| clip_max: 0.7 | ||
| cv: | ||
| _target_: bioemu.steering.RMSD | ||
| reference_pdb: null |
There was a problem hiding this comment.
cv_steer.yaml is described as a self-contained denoiser config, but it sets RMSD.reference_pdb: null. The RMSD CV constructor asserts reference_pdb is not None, so Hydra instantiation will fail if someone tries to use this config as-is. Consider using a concrete default path, or mark it as a required Hydra value (e.g., ???) and add a comment explaining it must be provided.
| reference_pdb: null | |
| # Path to the reference PDB file; must be provided by the user when using this config. | |
| reference_pdb: ??? |
SummarySummary
Coveragesrc.bioemu - 89.1%
src.bioemu.colabfold_setup -
src.bioemu.hpacker_setup - 58.8%
src.bioemu.openfold.np - 44%
src.bioemu.openfold.utils - 50.1%
src.bioemu.steering - 79.4%
src.bioemu.training - 100%
|
Put steering functionalities into a steering/ package with modular components:
Other key changes:
steeringdir.dpm.yamlfor example.resampling_frequencytoess_thresholdTests:
test_steering.pyTODOs:
FractionNativeConcactCV class versustrain/foldedness.pyCloses: https://github.com/msr-ai4science/feynman/issues/20268