@Scienfitz just to make sure - I guess since this is marked as a draft, you do not require a PR review for now, right? Is there anything else that we can assist with?

@AVHopp yes exactly, and it will always be like that for PRs that I open in draft: ignore until requested or asked in any other way.
Pull Request Overview
This PR implements automatic data augmentation for measurements when constraints support symmetry assumptions, particularly for permutation and dependency invariance constraints. This enhancement helps surrogate models better learn from symmetric relationships in the data without requiring users to manually generate augmented points.
- Adds `consider_data_augmentation` flags to both surrogate models and relevant constraints to control augmentation behavior
- Integrates augmentation logic into the Bayesian recommender workflow, applying it before model fitting when configured
- Provides comprehensive examples and documentation showing the performance benefits of augmentation
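To illustrate the idea behind the feature, here is a minimal, self-contained sketch of permutation-based data augmentation (plain Python, not the BayBE implementation; the function name `augment` is illustrative). Each measurement gains copies for every permutation of its permutation-invariant slots, so the surrogate sees the symmetry explicitly:

```python
from itertools import permutations

# Hypothetical sketch: each measurement is (x, y, target), where x and y are
# permutation-invariant parameters. Augmentation adds all permuted copies.
def augment(points):
    out = set()
    for (x, y, target) in points:
        for perm in permutations((x, y)):
            out.add((*perm, target))
    return sorted(out)

points = [(1, 2, 0.5), (2, 3, 0.9)]
augmented = augment(points)
# Each off-diagonal point gains its swapped twin, e.g. (1, 2) -> (2, 1).
```

In the actual feature, this happens automatically before model fitting when the surrogate and constraints are configured accordingly.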
Reviewed Changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `tests/test_measurement_augmentation.py` | New test file verifying augmentation is applied when configured |
| `examples/Constraints_Discrete/augmentation.py` | New example demonstrating augmentation effects on optimization performance |
| `docs/userguide/surrogates.md` | Documentation updates explaining data augmentation feature |
| `docs/userguide/constraints.md` | Documentation updates for augmentation flags in constraints |
| `docs/scripts/build_examples.py` | Build script improvement to ignore `__pycache__` folders |
| `baybe/utils/dataframe.py` | Added documentation note about constraint considerations |
| `baybe/utils/augmentation.py` | Cleaned up duplicate example in docstring |
| `baybe/surrogates/gaussian_process/core.py` | Added `consider_data_augmentation` flag with temporary default |
| `baybe/surrogates/base.py` | Added base `consider_data_augmentation` flag to surrogate interface |
| `baybe/searchspace/core.py` | Core augmentation logic and `augment_measurements` method |
| `baybe/recommenders/pure/bayesian/base.py` | Integration of augmentation into Bayesian recommender workflow |
| `baybe/recommenders/pure/base.py` | Minor cleanup of validation logic |
| `baybe/constraints/discrete.py` | Added `consider_data_augmentation` flags to constraint classes |
| `baybe/constraints/base.py` | Moved augmentation flag to base constraint class |
| `CHANGELOG.md` | Documented new features and changes |
```python
# Validate compatibility of surrogate symmetries with searchspace
if hasattr(self._surrogate_model, "symmetries"):
    for s in self._surrogate_model.symmetries:
        s.validate_searchspace_context(searchspace)
```
Important: Validation so far is only part of the recommend call here in the recommenders. Validation has not been included in the Campaign yet. This is due to two factors:
- To properly validate the compatibility of symmetries and searchspace, there needs to be a mechanism that can iterate over all possible recommenders of a meta-recommender. Otherwise, this upfront validation already fails for the two-phase recommender if the second recommender has symmetries.
- There would be double validation between the campaign and the recommend call, so the context information of whether validation was already performed needs to be passed somewhere. This is likely fixable with the settings mechanism, which is not yet available.
@AdrianSosic I see now that the 2nd point could be solved with the Settings mechanism, but I have no idea how to solve issue 1.
In the absence of that, it's not really possible to turn it into an upfront validation, so I would probably not change the validation for the moment unless you have a smarter idea.
+1 for being pragmatic and not trying to come up with something potentially convoluted right now. Even if we find a better way for the validation later, including it is just a plain improvement without negative consequences to users, so we can add it later without problems.
The `_autoreplicate` converter on `main` wraps surrogates in a `CompositeSurrogate`. Access the inner template for symmetry validation and augmentation.
Use full module paths (e.g., `baybe.symmetries.base.Symmetry`) instead of short paths via `__init__.py` re-exports, which Sphinx cannot resolve.
@AdrianSosic @AVHopp this PR is again ready for review
`DiscretePermutationInvarianceConstraint` was always internally applying a `DiscreteNoLabelDuplicates` constraint to remove the diagonal elements, which is not correct and can always be achieved separately by explicitly using `DiscreteNoLabelDuplicates`.
AVHopp left a comment:
First batch of comments, not yet done
```python
def to_symmetries(
    self, use_data_augmentation=True
) -> tuple[DependencySymmetry, ...]:
    """Convert to a :class:`~baybe.symmetries.dependency.DependencySymmetry`."""
```
Please add a description of what `use_data_augmentation` does.
```python
# Perform data augmentation if configured
surrogate_for_augmentation = self._get_surrogate_for_augmentation()
if surrogate_for_augmentation is not None:
```
I find this way of checking for and applying the augmentations a bit cumbersome. Couldn't we also add a property to `Surrogate` named something like `uses_augmentation` or `applies_symmetries`, which basically just checks the length of the symmetries of the `Surrogate`? I would find this a bit clearer to read, since we now call `augment_measurements` independently of whether or not a `Surrogate` actually contains `Symmetries` and augments something.
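A minimal sketch of the property suggested here (all names are illustrative, not the final BayBE API): the recommender could then simply guard on the property instead of probing for a surrogate to augment with:

```python
# Hypothetical sketch of the suggested design; `symmetries`,
# `use_data_augmentation`, and `uses_augmentation` are assumed names.
class PermutationSymmetry:
    def __init__(self, use_data_augmentation=True):
        self.use_data_augmentation = use_data_augmentation

class Surrogate:
    def __init__(self, symmetries=()):
        self.symmetries = tuple(symmetries)

    @property
    def uses_augmentation(self) -> bool:
        """Whether fitting this surrogate involves data augmentation."""
        return any(s.use_data_augmentation for s in self.symmetries)

plain = Surrogate()
symmetric = Surrogate(symmetries=[PermutationSymmetry()])
# The caller can now write `if surrogate.uses_augmentation: ...`
```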
```python
@override
def _fit(self, train_x: Tensor, train_y: Tensor) -> None:
    import botorch
    import botorch.models.transforms
```
Why does this work? 😆 We use `botorch.models.SingleTaskGP` down below in line 200, I would have thought that this line requires the full `botorch` import (or at least the import of that model)?
And even if it does, why do we need the additional `from botorch.models.transforms import Normalize, Standardize` if we have this import?
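For context, this is standard Python import semantics: `import a.b.c` imports and binds the top-level package `a`, and `a.b` becomes an attribute of it, so names re-exported from `a.b`'s `__init__.py` are then reachable. A quick demonstration with a stdlib package instead of botorch:

```python
# Importing a deep submodule binds the top-level package name and makes the
# intermediate packages attributes of it (shown here with stdlib `xml`
# instead of botorch, but the mechanism is identical).
import xml.etree.ElementTree

# `xml` is bound, and `xml.etree` / `xml.etree.ElementTree` are attributes:
tree = xml.etree.ElementTree.fromstring("<root/>")
```

So `import botorch.models.transforms` alone makes `botorch.models.SingleTaskGP` reachable (since `botorch.models` gets imported along the way and re-exports the model); the separate `from botorch.models.transforms import Normalize, Standardize` would then only be a convenience for shorter names.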
```python
    """The parameters affected by the dependency."""

    n_discretization_points: int = field(
        default=3, validator=(instance_of(int), ge(2)), kw_only=True
```
Is there any good argument for having 3 as a default? Isn't this completely arbitrary, and also only necessary in special cases when continuous parameters are involved?
3 is the minimum to cover both ends of the range plus one point in the middle to capture change. That's actually somewhat similar to what DOE does for designs.
The only other number I can imagine is 2, but I'd go with 3.
Anything more would be a bonus and more fidelity; 1 would not be enough.
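A small sketch of the argument (illustrative code, not the shipped implementation): with `n = 3` equally spaced points you get both range ends plus the midpoint, which is the smallest grid that can still reveal a change across the interior of the range:

```python
# Hypothetical helper illustrating the discretization argument above.
def discretize(lower: float, upper: float, n: int = 3) -> list[float]:
    """Return n equally spaced points covering [lower, upper]."""
    if n < 2:
        raise ValueError("Need at least 2 points to cover both range ends.")
    step = (upper - lower) / (n - 1)
    return [lower + i * step for i in range(n)]

points = discretize(0.0, 10.0)  # both ends plus the midpoint
```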
```python
    the presence of invariances.
    """

    use_data_augmentation: bool = field(
```
When would I want to set this to False? Isn't the whole purpose of a symmetry that I want to use it for data augmentation?
In general, everything that increases training points must be treated with care purely for computational reasons.
But here is a concrete example: GPs don't scale well with the number of training points (which increase with this ON); instead, they can work better with invariant kernels and would likely not need the augmentation at all (or at least not be affected if it were on).
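The computational argument can be made concrete with a back-of-the-envelope sketch (illustrative numbers, not a benchmark): permutation augmentation multiplies the training set by k! for k permuted parameters, while exact GP training cost grows roughly cubically in the number of training points:

```python
from math import factorial

# Hypothetical cost model: exact GP training is O(n^3) in the number of
# training points n; augmentation inflates n by k! for k permuted parameters.
def augmented_cost_ratio(n_points: int, k_permuted: int) -> float:
    """Relative O(n^3) GP training cost with vs. without augmentation."""
    n_aug = n_points * factorial(k_permuted)
    return (n_aug / n_points) ** 3

# With 3 permuted parameters, the data grows 3! = 6x and the cubic cost
# grows by a factor of 6**3, which is why turning augmentation off can
# make sense for GPs that handle the symmetry via invariant kernels.
ratio = augmented_cost_ratio(100, 3)
```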
| Symmetry | Functional Definition | Corresponding Constraint |
|:---|:---|:---|
| {class}`~baybe.symmetries.permutation.PermutationSymmetry` | $f(x,y) = f(y,x)$ | {class}`~baybe.constraints.discrete.DiscretePermutationInvarianceConstraint` |
| {class}`~baybe.symmetries.dependency.DependencySymmetry` | $f(x,y) = \begin{cases}f(x,y) & \text{if }c(x) \\f(x) & \text{otherwise}\end{cases}$<br>where $c(x)$ is a condition that is either true or false | {class}`~baybe.constraints.discrete.DiscreteDependenciesConstraint` |
This is not well-defined - what exactly is the space that f is defined on? You cannot simply "ignore" the second part of the arguments, this is not how functions work.
I guess what you want to say is "only if c(x) is True, the value of the y-part is relevant. Otherwise, we do not care". So this means that you are technically defining a function f that has the property that f(x,y_1) = f(x,y_2) for all y_1, y_2 if c(x) is False, which means that you could in theory (but not formally) skip the y argument completely, right? Not sure yet what is the best way to write that down, but let me first check that I understand the concept.
That's exactly what I want to say, which I take as evidence that the language here is indeed sufficient to convey the idea.
(If you still don't believe that functions can work like that, I can plot you one.)
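Such a function is easy to write down concretely. A tiny sketch of the semantics being discussed (convention assumed here: when the condition `c(x)` is False, the dependent slot `y` is inactive and the value must not depend on it):

```python
# Illustrative dependency-symmetric function, not BayBE code.
def c(x: float) -> bool:
    """Condition controlling whether the dependent parameter y is active."""
    return x > 0

def f(x: float, y: float) -> float:
    # y only enters when it is active; otherwise f collapses to f(x).
    return x + (y if c(x) else 0.0)

# Invariance in the inactive regime: f(x, y1) == f(x, y2) for all y1, y2.
assert f(-1.0, 5.0) == f(-1.0, 99.0)
# In the active regime, y matters:
assert f(1.0, 5.0) != f(1.0, 99.0)
```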
Co-authored-by: Alexander V. Hopp <alexander.hopp@merckgroup.com>
Implements #621
New: A `Symmetry` class which is part of baybe `Surrogate`s. Three distinct symmetries are included; for more info check the userguide, and for a demonstration of the effect see the new example. The ability to perform data augmentation has been included for all symmetries.

I have left some initial comments on design questions that are still open or where I am kind of indifferent and just had to choose one. Feel free to leave an opinion there first so the large-scale design picture can be finalized independent of small comments.
TODO
Other Notes
Symmetries and constraints are conceptually so similar that they should probably have the same interface. The design here has been done from scratch completely ignoring the constraint interface because it is already known to be not optimal and needs refactoring.
`parameters` or similar, because some symmetries allow single and some multiple such parameters. Instead, the parameters are treated like the objectives treat target(s).

Unrelated Bugfix
I noticed that the permutation constraint also removed the diagonal in its filtering process. However, this seems unreasonable since the diagonal is a set of points that are unique and have no invariant equivalent, hence nothing needs removing. Turns out there was an automatic removal of the diagonal because internally `DiscretePermutationInvarianceConstraint` also always applied a `DiscreteNoLabelDuplicates` constraint. I think the rationale was that label duplicates don't make sense in these mixture situations, so they need removing. However, this has nothing to do with the invariance and is achieved anyway in mixture use cases by adding a no-label-duplicates constraint explicitly. So it was removed from the `DiscretePermutationInvarianceConstraint`, which now leads to the expected amount of removed points (one of the matrix triangles).
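The fixed behavior can be illustrated with a tiny sketch (plain Python, not the BayBE implementation): permutation-invariance filtering keeps one representative per unordered tuple, so diagonal points like (1, 1) survive, and removing label duplicates is a separate, explicit step:

```python
from itertools import product

values = [1, 2, 3]
grid = list(product(values, repeat=2))  # full 3x3 grid: 9 points

# Permutation-invariance filtering: keep one representative per permutation
# orbit (here: the sorted tuple). One matrix triangle is removed; the
# diagonal stays, since diagonal points have no distinct permuted twin.
permutation_filtered = {tuple(sorted(p)) for p in grid}

# Dropping the diagonal is the job of a separate no-label-duplicates filter,
# applied explicitly when label duplicates are undesired (e.g. mixtures):
no_duplicates = {p for p in permutation_filtered if len(set(p)) == len(p)}
```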