Skip to content

feat(bioemu): Add FKC steering denoiser & refactor steering code#203

Draft
YuuuXie wants to merge 4 commits intomainfrom
yuxie1/fkc-steering
Draft

feat(bioemu): Add FKC steering denoiser & refactor steering code#203
YuuuXie wants to merge 4 commits intomainfrom
yuxie1/fkc-steering

Conversation

@YuuuXie
Copy link
Collaborator

@YuuuXie YuuuXie commented Feb 27, 2026

Put steering functionalities into a steering/ package with modular components:

  • steering/potentials.py: Potential base class, UmbrellaPotential, LinearPotential
  • steering/collective_variables.py: CV framework (RMSD, FNC, CaCaDistance, PairwiseClash)
  • steering/utils.py: Resampling, reward computation, sequence alignment helpers
  • steering/dpm_fkc.py: Added FKC (Feynman-Kac Control) steered denoiser
  • steering/dpm_smc.py: SMC (Sequential Monte Carlo) steered denoiser

Other key changes:

  • Extracted DPM-Solver primitives in denoiser.py (shared by FKC, SMC, unsteered)
  • Now outside the steering folder, there's only unsteered sampling functionalities, all the steering features are isolated in steering dir.
  • Steering configs (cv_steer.yaml, physical_steering.yaml) are self-contained Hydra denoiser configs with target, potentials, and steering params, and can be used to replace dpm.yaml for example.
  • Simplified sample.py: removed steering_config param, denoiser handles everything
  • Kept start/end time window for steering resampling, but changed resampling_frequency to ess_threshold

Tests:

  • 60+ steering tests: unit tests for CVs, potentials, utils; integration tests for FKC/SMC loops, ODE consistency, generate_batch pipeline
  • Chignolin e2e tests (require model weights), kept from the original test_steering.py

TODOs:

  • Some of the tests need to be cleaned up
  • Add a notebook example
  • Decide what to do with FractionNativeConcact CV class versus train/foldedness.py

Closes: https://github.com/msr-ai4science/feynman/issues/20268

Yu Xie and others added 2 commits February 27, 2026 20:05
Split steering.py into a steering/ package with modular components:
- steering/potentials.py: Potential base class, UmbrellaPotential, LinearPotential
- steering/collective_variables.py: CV framework (RMSD, FNC, CaCaDistance, PairwiseClash)
- steering/utils.py: Resampling, reward computation, sequence alignment helpers
- steering/dpm_fkc.py: FKC (Feynman-Kac Control) steered denoiser
- steering/dpm_smc.py: SMC (Sequential Monte Carlo) steered denoiser

Key changes:
- Unified DPM-Solver primitives in denoiser.py (shared by FKC, SMC, unsteered)
- Steering configs (cv_steer.yaml, physical_steering.yaml) are self-contained
  Hydra denoiser configs with target, potentials, and steering params
- Simplified sample.py: removed steering_config param, denoiser handles everything
- Added start/end time window for steering resampling
- Simplified loop returns to (batch, log_weights) 2-tuple

Tests:
- 60+ steering tests: unit tests for CVs, potentials, utils; integration tests
  for FKC/SMC loops, ODE consistency, generate_batch pipeline
- Chignolin e2e tests (require model weights)

Closes: https://github.com/msr-ai4science/feynman/issues/20268

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Split steering.py into a steering/ package with modular components:
- steering/potentials.py: Potential base class, UmbrellaPotential, LinearPotential
- steering/collective_variables.py: CV framework (RMSD, FNC, CaCaDistance, PairwiseClash)
- steering/utils.py: Resampling, reward computation, sequence alignment helpers
- steering/dpm_fkc.py: FKC (Feynman-Kac Control) steered denoiser
- steering/dpm_smc.py: SMC (Sequential Monte Carlo) steered denoiser

Key changes:
- Unified DPM-Solver primitives in denoiser.py (shared by FKC, SMC, unsteered)
- Steering configs (cv_steer.yaml, physical_steering.yaml) are self-contained
  Hydra denoiser configs with target, potentials, and steering params
- Simplified sample.py: removed steering_config param, denoiser handles everything
- Added start/end time window for steering resampling
- Simplified loop returns to (batch, log_weights) 2-tuple

Tests:
- 60+ steering tests: unit tests for CVs, potentials, utils; integration tests
  for FKC/SMC loops, ODE consistency, generate_batch pipeline
- Chignolin e2e tests (require model weights)

Closes: https://github.com/msr-ai4science/feynman/issues/20268

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
return self.compute_batch(all_positions * 10.0, sequence)


class FractionNativeContacts(CollectiveVariable):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one has duplication with the train/foldedness.py, we might want to decide which one to keep

YuuuXie and others added 2 commits February 27, 2026 20:33
- Change physical_steering.yaml target from dpm_solver_fkc to dpm_solver_smc
- Fix SMC loop bug: log_weights was overwritten with None outside steering window
- Rewrite README steering section: document both SMC/FKC algorithms, update CLI
  examples to use denoiser_config (removed old steering_config param), fix
  parameter descriptions to match current interface

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ioemu into yuxie1/fkc-steering

# Conflicts:
#	src/bioemu/config/steering/physical_steering.yaml
#	src/bioemu/steering/dpm_smc.py
@github-actions
Copy link

Summary

Summary
Generated on: 02/27/2026 - 20:40:59
Parser: Cobertura
Assemblies: 7
Classes: 31
Files: 31
Line coverage: 75.1% (2437 of 3241)
Covered lines: 2437
Uncovered lines: 804
Coverable lines: 3241
Total lines: 10699
Covered branches: 0
Total branches: 0
Method coverage: Feature is only available for sponsors

Coverage

src.bioemu - 89.1%
Name Line Branch
src.bioemu 89.1% ****
init.py 100%
chemgraph.py 100%
convert_chemgraph.py 97.6%
denoiser.py 98.3%
get_embeds.py 90.3%
md_utils.py 85.8%
model_utils.py 78%
models.py 94.1%
run_hpacker.py 0%
sample.py 92.3%
sde_lib.py 86.6%
seq_io.py 100%
shortcuts.py 100%
sidechain_relax.py 77.2%
so3_sde.py 90.3%
structure_module.py 84.3%
utils.py 65.6%
src.bioemu.colabfold_setup -
Name Line Branch
src.bioemu.colabfold_setup **** ****
init.py
src.bioemu.hpacker_setup - 58.8%
Name Line Branch
src.bioemu.hpacker_setup 58.8% ****
init.py
setup_hpacker.py 58.8%
src.bioemu.openfold.np - 44%
Name Line Branch
src.bioemu.openfold.np 44% ****
protein.py 31.2%
residue_constants.py 60.7%
src.bioemu.openfold.utils - 50.1%
Name Line Branch
src.bioemu.openfold.utils 50.1% ****
rigid_utils.py 50.1%
src.bioemu.steering - 79.3%
Name Line Branch
src.bioemu.steering 79.3% ****
init.py 100%
collective_variables.py 32.8%
dpm_fkc.py 100%
dpm_smc.py 100%
potentials.py 95.5%
utils.py 92.1%
src.bioemu.training - 100%
Name Line Branch
src.bioemu.training 100% ****
foldedness.py 100%
loss.py 100%

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors BioEmu’s steering functionality into a dedicated bioemu.steering package and adds modular, Hydra-configured steered denoisers (FKC + SMC) built on shared DPM-Solver primitives.

Changes:

  • Introduces bioemu.steering subpackage (CVs, potentials, utilities) and new steered denoisers (dpm_fkc, dpm_smc).
  • Extracts/shared DPM-Solver helper primitives in denoiser.py and simplifies sampling API to steer via a single denoiser_config.
  • Replaces the legacy steering test with a broader steering test suite (unit + lightweight integration + optional e2e).

Reviewed changes

Copilot reviewed 19 out of 21 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tests/test_steering.py Removes legacy steering e2e test module.
tests/steering/init.py Adds steering test package marker.
tests/steering/test_utils.py Adds unit tests for resampling + helper utilities.
tests/steering/test_potentials.py Adds unit tests for UmbrellaPotential / LinearPotential behavior.
tests/steering/test_integration.py Adds lightweight integration tests for configs, solvers, resampling, and pipeline wiring.
tests/steering/test_denoisers.py Adds tests for ESS computation and SO(3) gradient mapping helper.
tests/steering/test_collective_variables.py Adds unit tests for CV implementations.
tests/steering/test_chignolin_e2e.py Adds e2e tests that invoke sample() (requires model weights).
src/bioemu/steering/utils.py New steering utilities: config validation, x0/R0 helpers, resampling, reward/grad computation.
src/bioemu/steering/potentials.py New potential framework + UmbrellaPotential / LinearPotential.
src/bioemu/steering/dpm_smc.py Adds SMC steered denoiser built on DPM-Solver++ utilities.
src/bioemu/steering/dpm_fkc.py Adds FKC steered denoiser + analytical weight updates + ESS resampling.
src/bioemu/steering/collective_variables.py Adds CV framework + implementations (FNC, RMSD, CaCaDistance, PairwiseClash).
src/bioemu/steering/init.py Exposes steering public API (CVs, potentials, utilities).
src/bioemu/steering.py Removes legacy monolithic steering module.
src/bioemu/sample.py Simplifies sampling: steering now handled by the instantiated denoiser config.
src/bioemu/denoiser.py Refactors DPM-Solver primitives into reusable helper dataclasses/functions.
src/bioemu/config/steering/physical_steering.yaml Converts physical steering into a self-contained Hydra denoiser config.
src/bioemu/config/steering/cv_steer.yaml Adds example self-contained FKC steering config.
README.md Updates steering documentation and usage to the new single-config model.
.gitignore Ignores tests/cross_repo/.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Two steering algorithms are available:

- **SMC (Sequential Monte Carlo)**: Simulates multiple *candidate samples* (particles) per desired output sample and resamples between them according to the favorability of the provided potentials. This is the default for physical steering.
- **FKC (Feynman–Kac Control)**: Applies importance weighting without resampling; useful when targeting a specific collective variable value (e.g., RMSD to a reference).
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The README states FKC "applies importance weighting without resampling", but the current dpm_solver_fkc implementation does perform ESS-based resampling via resample_based_on_log_weights when num_particles > 1 and within the steering time window. Either update the README description to match the actual behavior, or adjust the implementation/config to make resampling optional/absent for FKC as documented.

Suggested change
- **FKC (Feynman–Kac Control)**: Applies importance weighting without resampling; useful when targeting a specific collective variable value (e.g., RMSD to a reference).
- **FKC (Feynman–Kac Control)**: Uses importance weighting and may perform ESS-based resampling between particles; useful when targeting a specific collective variable value (e.g., RMSD to a reference).

Copilot uses AI. Check for mistakes.
Comment on lines +84 to +94
# 1. Compute cumulative sums (CDF) for each batch
cdf = torch.cumsum(weights, dim=-1) # (B, N)

# 2. Stratified positions: one per interval
# shape (B, N): each row gets N stratified uniforms
u = (torch.rand(B, N, device=weights.device) + torch.arange(N, device=weights.device)) / N

# 3. Inverse-CDF search: for each u, find smallest j s.t. cdf[b, j] >= u[b, i]
idx = torch.searchsorted(cdf, u, right=True)

return idx # shape (B, N)
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stratified_resample() can return an out-of-range index when u exceeds the final CDF entry due to floating-point normalization error (i.e., searchsorted may return N). This will later crash when indexing data_list[i] during resampling. Consider explicitly forcing cdf[..., -1] = 1 (or cdf = cdf / cdf[..., -1:]) and/or clamping idx to [0, N-1] before returning.

Copilot uses AI. Check for mistakes.
Comment on lines +178 to +187
def resample_based_on_log_weights(
batch: ChemGraph,
log_weight: torch.Tensor,
n_particles: int,
is_last_step: bool,
ess_threshold: float,
step: int,
t: float,
) -> tuple[ChemGraph, torch.Tensor, float, torch.Tensor]:
"""Resample particles based on importance weights.
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type annotation for resample_based_on_log_weights doesn't match what the function actually returns. It currently annotates -> tuple[ChemGraph, torch.Tensor, float, torch.Tensor], but the implementation returns (batch, log_weight, indices, ess) where both indices and ess are tensors. Updating the annotation (and, if desired, the docstring wording) will prevent type-checking confusion and downstream misuse.

Copilot uses AI. Check for mistakes.
Comment on lines +325 to +326
potential_(None, 10 * coords, None, None,
t=step_index, N=num_steps, sequence=batch.sequence[0]).
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compute_reward_and_grad docstring describes calling potentials with the old signature (potential_(None, 10 * coords, None, None, t=step_index, N=num_steps, ...)), but the implementation now calls potential(10.0 * coords, t=..., sequence=...). Please update the docstring to reflect the current API so new potentials/CVs are implemented against the right contract.

Suggested change
potential_(None, 10 * coords, None, None,
t=step_index, N=num_steps, sequence=batch.sequence[0]).
potential(10.0 * coords, t=step_index, sequence=batch.sequence[0]).

Copilot uses AI. Check for mistakes.
Comment on lines +105 to +108
denoiser_type: Denoiser to use for sampling, if `denoiser_config` not specified. Comes in with default parameter configuration. Must be one of ['dpm', 'heun']
denoiser_config: Path to a denoiser config YAML, or a dict. For steered sampling (FKC/SMC),
pass a steering config (e.g., config/steering/physical_steering.yaml) which includes
the denoiser target, potentials, and steering parameters in one file.
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring states denoiser_config can be a Path, but the current parsing logic only treats an exact str as a path input. If a caller passes a Path, it will hit the final type-assert and fail. Either update the implementation to accept Path/os.PathLike, or adjust the docstring/type to match the actual supported inputs.

Copilot uses AI. Check for mistakes.
clip_max: 0.7
cv:
_target_: bioemu.steering.RMSD
reference_pdb: null
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cv_steer.yaml is described as a self-contained denoiser config, but it sets RMSD.reference_pdb: null. The RMSD CV constructor asserts reference_pdb is not None, so Hydra instantiation will fail if someone tries to use this config as-is. Consider using a concrete default path, or mark it as a required Hydra value (e.g., ???) and add a comment explaining it must be provided.

Suggested change
reference_pdb: null
# Path to the reference PDB file; must be provided by the user when using this config.
reference_pdb: ???

Copilot uses AI. Check for mistakes.
@github-actions
Copy link

Summary

Summary
Generated on: 02/27/2026 - 21:02:23
Parser: Cobertura
Assemblies: 7
Classes: 31
Files: 31
Line coverage: 75.2% (2439 of 3243)
Covered lines: 2439
Uncovered lines: 804
Coverable lines: 3243
Total lines: 10701
Covered branches: 0
Total branches: 0
Method coverage: Feature is only available for sponsors

Coverage

src.bioemu - 89.1%
Name Line Branch
src.bioemu 89.1% ****
init.py 100%
chemgraph.py 100%
convert_chemgraph.py 97.6%
denoiser.py 98.3%
get_embeds.py 90.3%
md_utils.py 85.8%
model_utils.py 78%
models.py 94.1%
run_hpacker.py 0%
sample.py 92.3%
sde_lib.py 86.6%
seq_io.py 100%
shortcuts.py 100%
sidechain_relax.py 77.2%
so3_sde.py 90.3%
structure_module.py 84.3%
utils.py 65.6%
src.bioemu.colabfold_setup -
Name Line Branch
src.bioemu.colabfold_setup **** ****
init.py
src.bioemu.hpacker_setup - 58.8%
Name Line Branch
src.bioemu.hpacker_setup 58.8% ****
init.py
setup_hpacker.py 58.8%
src.bioemu.openfold.np - 44%
Name Line Branch
src.bioemu.openfold.np 44% ****
protein.py 31.2%
residue_constants.py 60.7%
src.bioemu.openfold.utils - 50.1%
Name Line Branch
src.bioemu.openfold.utils 50.1% ****
rigid_utils.py 50.1%
src.bioemu.steering - 79.4%
Name Line Branch
src.bioemu.steering 79.4% ****
init.py 100%
collective_variables.py 32.8%
dpm_fkc.py 100%
dpm_smc.py 100%
potentials.py 95.5%
utils.py 92.1%
src.bioemu.training - 100%
Name Line Branch
src.bioemu.training 100% ****
foldedness.py 100%
loss.py 100%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants