feat(bioemu): Add FKC steering denoiser & refactor steering code by YuuuXie · Pull Request #203 · microsoft/bioemu

YuuuXie · 2026-02-27T20:16:29Z

Put steering functionalities into a steering/ package with modular components:

steering/potentials.py: Potential base class, UmbrellaPotential, LinearPotential
steering/collective_variables.py: CV framework (RMSD, FNC, CaCaDistance, PairwiseClash)
steering/utils.py: Resampling, reward computation, sequence alignment helpers
steering/dpm_fkc.py: Added FKC (Feynman-Kac Control) steered denoiser
steering/dpm_smc.py: SMC (Sequential Monte Carlo) steered denoiser

Other key changes:

Extracted DPM-Solver primitives in denoiser.py (shared by FKC, SMC, unsteered)
Now outside the steering folder, there's only unsteered sampling functionalities, all the steering features are isolated in steering dir.
Steering configs (cv_steer.yaml, physical_steering.yaml) are self-contained Hydra denoiser configs with target, potentials, and steering params, and can be used to replace dpm.yaml for example.
Simplified sample.py: removed steering_config param, denoiser handles everything
Kept start/end time window for steering resampling, but changed resampling_frequency to ess_threshold

Tests:

60+ steering tests: unit tests for CVs, potentials, utils; integration tests for FKC/SMC loops, ODE consistency, generate_batch pipeline
Chignolin e2e tests (require model weights), kept from the original test_steering.py

TODOs:

Some of the tests need to be cleaned up
Add a notebook example
Decide what to do with FractionNativeConcact CV class versus train/foldedness.py

Closes: https://github.com/msr-ai4science/feynman/issues/20268

Split steering.py into a steering/ package with modular components: - steering/potentials.py: Potential base class, UmbrellaPotential, LinearPotential - steering/collective_variables.py: CV framework (RMSD, FNC, CaCaDistance, PairwiseClash) - steering/utils.py: Resampling, reward computation, sequence alignment helpers - steering/dpm_fkc.py: FKC (Feynman-Kac Control) steered denoiser - steering/dpm_smc.py: SMC (Sequential Monte Carlo) steered denoiser Key changes: - Unified DPM-Solver primitives in denoiser.py (shared by FKC, SMC, unsteered) - Steering configs (cv_steer.yaml, physical_steering.yaml) are self-contained Hydra denoiser configs with target, potentials, and steering params - Simplified sample.py: removed steering_config param, denoiser handles everything - Added start/end time window for steering resampling - Simplified loop returns to (batch, log_weights) 2-tuple Tests: - 60+ steering tests: unit tests for CVs, potentials, utils; integration tests for FKC/SMC loops, ODE consistency, generate_batch pipeline - Chignolin e2e tests (require model weights) Closes: https://github.com/msr-ai4science/feynman/issues/20268 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

YuuuXie · 2026-02-27T20:19:11Z

src/bioemu/steering/collective_variables.py

+        return self.compute_batch(all_positions * 10.0, sequence)
+
+
+class FractionNativeContacts(CollectiveVariable):


This one has duplication with the train/foldedness.py, we might want to decide which one to keep

- Change physical_steering.yaml target from dpm_solver_fkc to dpm_solver_smc - Fix SMC loop bug: log_weights was overwritten with None outside steering window - Rewrite README steering section: document both SMC/FKC algorithms, update CLI examples to use denoiser_config (removed old steering_config param), fix parameter descriptions to match current interface Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ioemu into yuxie1/fkc-steering # Conflicts: # src/bioemu/config/steering/physical_steering.yaml # src/bioemu/steering/dpm_smc.py

github-actions · 2026-02-27T20:41:01Z

Summary


Generated on:	02/27/2026 - 20:40:59
Parser:	Cobertura
Assemblies:	7
Classes:	31
Files:	31
Line coverage:	75.1% (2437 of 3241)
Covered lines:	2437
Uncovered lines:	804
Coverable lines:	3241
Total lines:	10699
Covered branches:	0
Total branches:	0
Method coverage:	Feature is only available for sponsors

Coverage

src.bioemu - 89.1%

Name	Line	Branch
src.bioemu	89.1%	****
init.py	100%
chemgraph.py	100%
convert_chemgraph.py	97.6%
denoiser.py	98.3%
get_embeds.py	90.3%
md_utils.py	85.8%
model_utils.py	78%
models.py	94.1%
run_hpacker.py	0%
sample.py	92.3%
sde_lib.py	86.6%
seq_io.py	100%
shortcuts.py	100%
sidechain_relax.py	77.2%
so3_sde.py	90.3%
structure_module.py	84.3%
utils.py	65.6%

src.bioemu.colabfold_setup -

Name	Line	Branch
src.bioemu.colabfold_setup	****	****
init.py

src.bioemu.hpacker_setup - 58.8%

Name	Line	Branch
src.bioemu.hpacker_setup	58.8%	****
init.py
setup_hpacker.py	58.8%

src.bioemu.openfold.np - 44%

Name	Line	Branch
src.bioemu.openfold.np	44%	****
protein.py	31.2%
residue_constants.py	60.7%

src.bioemu.openfold.utils - 50.1%

Name	Line	Branch
src.bioemu.openfold.utils	50.1%	****
rigid_utils.py	50.1%

src.bioemu.steering - 79.3%

Name	Line	Branch
src.bioemu.steering	79.3%	****
init.py	100%
collective_variables.py	32.8%
dpm_fkc.py	100%
dpm_smc.py	100%
potentials.py	95.5%
utils.py	92.1%

src.bioemu.training - 100%

Name	Line	Branch
src.bioemu.training	100%	****
foldedness.py	100%
loss.py	100%

Copilot

Pull request overview

Refactors BioEmu’s steering functionality into a dedicated bioemu.steering package and adds modular, Hydra-configured steered denoisers (FKC + SMC) built on shared DPM-Solver primitives.

Changes:

Introduces bioemu.steering subpackage (CVs, potentials, utilities) and new steered denoisers (dpm_fkc, dpm_smc).
Extracts/shared DPM-Solver helper primitives in denoiser.py and simplifies sampling API to steer via a single denoiser_config.
Replaces the legacy steering test with a broader steering test suite (unit + lightweight integration + optional e2e).

Reviewed changes

Copilot reviewed 19 out of 21 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
tests/test_steering.py	Removes legacy steering e2e test module.
tests/steering/init.py	Adds steering test package marker.
tests/steering/test_utils.py	Adds unit tests for resampling + helper utilities.
tests/steering/test_potentials.py	Adds unit tests for `UmbrellaPotential` / `LinearPotential` behavior.
tests/steering/test_integration.py	Adds lightweight integration tests for configs, solvers, resampling, and pipeline wiring.
tests/steering/test_denoisers.py	Adds tests for ESS computation and SO(3) gradient mapping helper.
tests/steering/test_collective_variables.py	Adds unit tests for CV implementations.
tests/steering/test_chignolin_e2e.py	Adds e2e tests that invoke `sample()` (requires model weights).
src/bioemu/steering/utils.py	New steering utilities: config validation, x0/R0 helpers, resampling, reward/grad computation.
src/bioemu/steering/potentials.py	New potential framework + `UmbrellaPotential` / `LinearPotential`.
src/bioemu/steering/dpm_smc.py	Adds SMC steered denoiser built on DPM-Solver++ utilities.
src/bioemu/steering/dpm_fkc.py	Adds FKC steered denoiser + analytical weight updates + ESS resampling.
src/bioemu/steering/collective_variables.py	Adds CV framework + implementations (FNC, RMSD, CaCaDistance, PairwiseClash).
src/bioemu/steering/init.py	Exposes steering public API (CVs, potentials, utilities).
src/bioemu/steering.py	Removes legacy monolithic steering module.
src/bioemu/sample.py	Simplifies sampling: steering now handled by the instantiated denoiser config.
src/bioemu/denoiser.py	Refactors DPM-Solver primitives into reusable helper dataclasses/functions.
src/bioemu/config/steering/physical_steering.yaml	Converts physical steering into a self-contained Hydra denoiser config.
src/bioemu/config/steering/cv_steer.yaml	Adds example self-contained FKC steering config.
README.md	Updates steering documentation and usage to the new single-config model.
.gitignore	Ignores `tests/cross_repo/`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-27T20:49:27Z

README.md

+Two steering algorithms are available:
+
+- **SMC (Sequential Monte Carlo)**: Simulates multiple *candidate samples* (particles) per desired output sample and resamples between them according to the favorability of the provided potentials. This is the default for physical steering.
+- **FKC (Feynman–Kac Control)**: Applies importance weighting without resampling; useful when targeting a specific collective variable value (e.g., RMSD to a reference).


The README states FKC "applies importance weighting without resampling", but the current dpm_solver_fkc implementation does perform ESS-based resampling via resample_based_on_log_weights when num_particles > 1 and within the steering time window. Either update the README description to match the actual behavior, or adjust the implementation/config to make resampling optional/absent for FKC as documented.

Suggested change

- **FKC (Feynman–Kac Control)**: Applies importance weighting without resampling; useful when targeting a specific collective variable value (e.g., RMSD to a reference).

- **FKC (Feynman–Kac Control)**: Uses importance weighting and may perform ESS-based resampling between particles; useful when targeting a specific collective variable value (e.g., RMSD to a reference).

Copilot · 2026-02-27T20:49:28Z

src/bioemu/steering/utils.py

+    # 1. Compute cumulative sums (CDF) for each batch
+    cdf = torch.cumsum(weights, dim=-1)  # (B, N)
+
+    # 2. Stratified positions: one per interval
+    # shape (B, N): each row gets N stratified uniforms
+    u = (torch.rand(B, N, device=weights.device) + torch.arange(N, device=weights.device)) / N
+
+    # 3. Inverse-CDF search: for each u, find smallest j s.t. cdf[b, j] >= u[b, i]
+    idx = torch.searchsorted(cdf, u, right=True)
+
+    return idx  # shape (B, N)


stratified_resample() can return an out-of-range index when u exceeds the final CDF entry due to floating-point normalization error (i.e., searchsorted may return N). This will later crash when indexing data_list[i] during resampling. Consider explicitly forcing cdf[..., -1] = 1 (or cdf = cdf / cdf[..., -1:]) and/or clamping idx to [0, N-1] before returning.

Copilot · 2026-02-27T20:49:28Z

src/bioemu/steering/utils.py

+def resample_based_on_log_weights(
+    batch: ChemGraph,
+    log_weight: torch.Tensor,
+    n_particles: int,
+    is_last_step: bool,
+    ess_threshold: float,
+    step: int,
+    t: float,
+) -> tuple[ChemGraph, torch.Tensor, float, torch.Tensor]:
+    """Resample particles based on importance weights.


The return type annotation for resample_based_on_log_weights doesn't match what the function actually returns. It currently annotates -> tuple[ChemGraph, torch.Tensor, float, torch.Tensor], but the implementation returns (batch, log_weight, indices, ess) where both indices and ess are tensors. Updating the annotation (and, if desired, the docstring wording) will prevent type-checking confusion and downstream misuse.

Copilot · 2026-02-27T20:49:28Z

src/bioemu/steering/utils.py

+            potential_(None, 10 * coords, None, None,
+                       t=step_index, N=num_steps, sequence=batch.sequence[0]).


The compute_reward_and_grad docstring describes calling potentials with the old signature (potential_(None, 10 * coords, None, None, t=step_index, N=num_steps, ...)), but the implementation now calls potential(10.0 * coords, t=..., sequence=...). Please update the docstring to reflect the current API so new potentials/CVs are implemented against the right contract.

Suggested change

potential_(None, 10 * coords, None, None,

t=step_index, N=num_steps, sequence=batch.sequence[0]).

potential(10.0 * coords, t=step_index, sequence=batch.sequence[0]).

Copilot · 2026-02-27T20:49:29Z

src/bioemu/sample.py

+        denoiser_type: Denoiser to use for sampling, if `denoiser_config` not specified. Comes in with default parameter configuration. Must be one of ['dpm', 'heun']
+        denoiser_config: Path to a denoiser config YAML, or a dict. For steered sampling (FKC/SMC),
+            pass a steering config (e.g., config/steering/physical_steering.yaml) which includes
+            the denoiser target, potentials, and steering parameters in one file.


The docstring states denoiser_config can be a Path, but the current parsing logic only treats an exact str as a path input. If a caller passes a Path, it will hit the final type-assert and fail. Either update the implementation to accept Path/os.PathLike, or adjust the docstring/type to match the actual supported inputs.

Copilot · 2026-02-27T20:49:29Z

src/bioemu/config/steering/cv_steer.yaml

+    clip_max: 0.7
+    cv:
+      _target_: bioemu.steering.RMSD
+      reference_pdb: null


cv_steer.yaml is described as a self-contained denoiser config, but it sets RMSD.reference_pdb: null. The RMSD CV constructor asserts reference_pdb is not None, so Hydra instantiation will fail if someone tries to use this config as-is. Consider using a concrete default path, or mark it as a required Hydra value (e.g., ???) and add a comment explaining it must be provided.

Suggested change

reference_pdb: null

# Path to the reference PDB file; must be provided by the user when using this config.

reference_pdb: ???

github-actions · 2026-02-27T21:02:24Z

Summary


Generated on:	02/27/2026 - 21:02:23
Parser:	Cobertura
Assemblies:	7
Classes:	31
Files:	31
Line coverage:	75.2% (2439 of 3243)
Covered lines:	2439
Uncovered lines:	804
Coverable lines:	3243
Total lines:	10701
Covered branches:	0
Total branches:	0
Method coverage:	Feature is only available for sponsors

Coverage

src.bioemu - 89.1%

Name	Line	Branch
src.bioemu	89.1%	****
init.py	100%
chemgraph.py	100%
convert_chemgraph.py	97.6%
denoiser.py	98.3%
get_embeds.py	90.3%
md_utils.py	85.8%
model_utils.py	78%
models.py	94.1%
run_hpacker.py	0%
sample.py	92.3%
sde_lib.py	86.6%
seq_io.py	100%
shortcuts.py	100%
sidechain_relax.py	77.2%
so3_sde.py	90.3%
structure_module.py	84.3%
utils.py	65.6%

src.bioemu.colabfold_setup -

Name	Line	Branch
src.bioemu.colabfold_setup	****	****
init.py

src.bioemu.hpacker_setup - 58.8%

Name	Line	Branch
src.bioemu.hpacker_setup	58.8%	****
init.py
setup_hpacker.py	58.8%

src.bioemu.openfold.np - 44%

Name	Line	Branch
src.bioemu.openfold.np	44%	****
protein.py	31.2%
residue_constants.py	60.7%

src.bioemu.openfold.utils - 50.1%

Name	Line	Branch
src.bioemu.openfold.utils	50.1%	****
rigid_utils.py	50.1%

src.bioemu.steering - 79.4%

Name	Line	Branch
src.bioemu.steering	79.4%	****
init.py	100%
collective_variables.py	32.8%
dpm_fkc.py	100%
dpm_smc.py	100%
potentials.py	95.5%
utils.py	92.1%

src.bioemu.training - 100%

Name	Line	Branch
src.bioemu.training	100%	****
foldedness.py	100%
loss.py	100%

Yu Xie and others added 2 commits February 27, 2026 20:05

YuuuXie commented Feb 27, 2026

View reviewed changes

YuuuXie and others added 2 commits February 27, 2026 20:33

Merge branch 'yuxie1/fkc-steering' of github.com-personal:microsoft/b…

4741092

…ioemu into yuxie1/fkc-steering # Conflicts: # src/bioemu/config/steering/physical_steering.yaml # src/bioemu/steering/dpm_smc.py

YuuuXie requested review from Copilot, ludwigwinkler, nw13slx and sarahnlewis February 27, 2026 20:43

YuuuXie assigned YuuuXie and ludwigwinkler and unassigned YuuuXie Feb 27, 2026

Copilot started reviewing on behalf of YuuuXie February 27, 2026 20:44 View session

Copilot AI reviewed Feb 27, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(bioemu): Add FKC steering denoiser & refactor steering code#203

feat(bioemu): Add FKC steering denoiser & refactor steering code#203
YuuuXie wants to merge 4 commits intomainfrom
yuxie1/fkc-steering

YuuuXie commented Feb 27, 2026 •

edited

Loading

Uh oh!

YuuuXie Feb 27, 2026

Uh oh!

github-actions bot commented Feb 27, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 27, 2026

Uh oh!

Copilot AI Feb 27, 2026

Uh oh!

Copilot AI Feb 27, 2026

Uh oh!

Copilot AI Feb 27, 2026

Uh oh!

Copilot AI Feb 27, 2026

Uh oh!

Copilot AI Feb 27, 2026

Uh oh!

github-actions bot commented Feb 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		return self.compute_batch(all_positions * 10.0, sequence)


		class FractionNativeContacts(CollectiveVariable):

	- FKC (Feynman–Kac Control): Applies importance weighting without resampling; useful when targeting a specific collective variable value (e.g., RMSD to a reference).
	- FKC (Feynman–Kac Control): Uses importance weighting and may perform ESS-based resampling between particles; useful when targeting a specific collective variable value (e.g., RMSD to a reference).

		potential_(None, 10 * coords, None, None,
		t=step_index, N=num_steps, sequence=batch.sequence[0]).

	potential_(None, 10 * coords, None, None,
	t=step_index, N=num_steps, sequence=batch.sequence[0]).
	potential(10.0 * coords, t=step_index, sequence=batch.sequence[0]).

	reference_pdb: null
	# Path to the reference PDB file; must be provided by the user when using this config.
	reference_pdb: ???

Conversation

YuuuXie commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

YuuuXie Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Feb 27, 2026

Summary

Coverage

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Feb 27, 2026

Summary

Coverage

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

YuuuXie commented Feb 27, 2026 •

edited

Loading