Parameter validation for kernel factories #776
Conversation
Pull request overview
This PR introduces a parameter-kind validation mechanism for Gaussian process kernel factories and updates kernel translation to infer dimensions from a SearchSpace, enabling factories to explicitly declare and enforce which parameter roles (e.g., task vs. regular) they support.
Changes:
- Add `ParameterKind` (flag enum) + `Parameter.kind` and enforce supported parameter kinds in `KernelFactory`.
- Introduce parameter sub-selection via `parameter_selector`/`parameter_names`, and refactor `Kernel.to_gpytorch` to take a `SearchSpace` for automatic `active_dims`/`ard_num_dims`.
- Add a deprecation guard that raises a `DeprecationError` when using a custom kernel in multi-task GP contexts unless suppressed via env var.
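The validation idea can be sketched with a flag enum. All class and attribute names below are illustrative assumptions for this sketch, not the actual BayBE implementation:

```python
from enum import Flag, auto


class ParameterKind(Flag):
    """Roles a parameter can play in a search space (illustrative)."""

    REGULAR = auto()
    TASK = auto()


class KernelFactory:
    """Base factory declaring which parameter kinds it supports (illustrative)."""

    supported_kinds: ParameterKind = ParameterKind.REGULAR

    def validate(self, kinds: ParameterKind) -> None:
        # Reject search spaces containing kinds the factory cannot handle.
        unsupported = kinds & ~self.supported_kinds
        if unsupported:
            raise ValueError(f"Unsupported parameter kinds: {unsupported}")


factory = KernelFactory()
factory.validate(ParameterKind.REGULAR)  # passes silently
```

A factory that also handles task parameters would declare `supported_kinds = ParameterKind.REGULAR | ParameterKind.TASK`, which is exactly what a `Flag` (rather than a plain `Enum`) buys here.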
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `baybe/kernels/base.py` | Refactors `to_gpytorch` to use `SearchSpace`-derived dimensions; adds `parameter_names` to basic kernels. |
| `baybe/parameters/enum.py` | Introduces `ParameterKind` flag enum. |
| `baybe/parameters/base.py` | Adds `Parameter.kind` property derived from `ParameterKind`. |
| `baybe/parameters/__init__.py` | Exposes `ParameterKind` in public parameters API. |
| `baybe/parameters/selector.py` | Adds parameter selector protocol + concrete selectors (e.g., `TypeSelector`). |
| `baybe/surrogates/gaussian_process/components/generic.py` | Renames factory protocol type to `GPComponentFactoryProtocol` and updates conversion helper typing. |
| `baybe/surrogates/gaussian_process/components/kernel.py` | Adds kernel factory base class with parameter-kind validation and introduces `ICMKernelFactory`. |
| `baybe/surrogates/gaussian_process/components/mean.py` | Switches to protocol-based mean factory typing. |
| `baybe/surrogates/gaussian_process/components/likelihood.py` | Switches to protocol-based likelihood factory typing. |
| `baybe/surrogates/gaussian_process/components/__init__.py` | Exposes new `*Protocol` factory types. |
| `baybe/surrogates/gaussian_process/presets/baybe.py` | Replaces alias with explicit default kernel/task-kernel factories (incl. multitask handling). |
| `baybe/surrogates/gaussian_process/presets/edbo.py` | Updates EDBO kernel factory to use `parameter_names` selection; adjusts likelihood factory typing. |
| `baybe/surrogates/gaussian_process/presets/edbo_smoothed.py` | Same as above for smoothed EDBO. |
| `baybe/surrogates/gaussian_process/core.py` | Updates GP surrogate to use protocol factories, `SearchSpace`-based kernel conversion, and adds multitask custom-kernel deprecation guard. |
| `baybe/settings.py` | Whitelists and validates `BAYBE_DISABLE_CUSTOM_KERNEL_WARNING`. |
| `tests/test_kernels.py` | Updates kernel assembly test to build a `SearchSpace` and validate inferred dims / mapping. |
| `tests/hypothesis_strategies/kernels.py` | Extends kernel strategies to optionally generate `parameter_names`. |
| `tests/test_deprecations.py` | Adds deprecation test for multitask custom-kernel behavior and env-var suppression. |
| `CHANGELOG.md` | Documents new features, breaking changes, and the new deprecation. |
Comments suppressed due to low confidence (2)
baybe/surrogates/gaussian_process/presets/edbo.py:76
`effective_dims` is now `train_x.shape[-1]`, which will include task columns in multi-task settings even though this factory can be used with `parameter_selector` to exclude `TaskParameter`s. Since `effective_dims` controls prior/initialization regime selection, it should reflect the dimensionality of the kernel's active (selected) inputs (e.g., based on `self.get_parameter_names(searchspace)` / `BasicKernel._get_dimensions(searchspace)`); otherwise priors will shift when adding a task parameter.
```python
@override
def _make(
    self, searchspace: SearchSpace, train_x: Tensor, train_y: Tensor
) -> Kernel:
    effective_dims = train_x.shape[-1]
    switching_condition = _contains_encoding(
        searchspace.discrete, _EDBO_ENCODINGS
    ) and (effective_dims >= 50)
    # low D priors
    if effective_dims < 5:
        lengthscale_prior = GammaPrior(1.2, 1.1)
```
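A possible fix along the lines suggested above counts only the columns belonging to the selected parameters. The helper and its inputs are hypothetical stand-ins for the real BayBE APIs (`self.get_parameter_names(searchspace)`, the computational-representation columns):

```python
def effective_dims_from_selection(comp_rep_columns, selected_names):
    """Count only the columns that belong to the selected parameters.

    `comp_rep_columns` and `selected_names` are hypothetical stand-ins for
    the search space's computational-representation columns and the result
    of `self.get_parameter_names(searchspace)`, respectively.
    """
    return sum(1 for col in comp_rep_columns if col in selected_names)


# Five numerical columns plus one task column; the selector excludes the task.
columns = ["x1", "x2", "x3", "x4", "x5", "task"]
selected = {"x1", "x2", "x3", "x4", "x5"}
effective_dims_from_selection(columns, selected)  # -> 5, not len(columns) == 6
```

With this, adding a `TaskParameter` leaves the prior regime selection unchanged, which is the stability property the comment asks for.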
baybe/surrogates/gaussian_process/presets/edbo_smoothed.py:60
Same as in `EDBOKernelFactory`: `effective_dims = train_x.shape[-1]` will count task dimensions even when a `parameter_selector` excludes them. Since the interpolated priors depend on `effective_dims`, compute it from the selected/active dimensions instead of the raw `train_x` width to keep behavior stable in multi-task setups.
```python
@override
def _make(
    self, searchspace: SearchSpace, train_x: Tensor, train_y: Tensor
) -> Kernel:
    effective_dims = train_x.shape[-1]
    # Interpolate prior moments linearly between low D and high D regime.
    # The high D regime itself is the average of the EDBO OHE and Mordred regime.
    # Values outside the dimension limits will get the border value assigned.
    lengthscale_prior = GammaPrior(
        np.interp(effective_dims, _DIM_LIMITS, [1.2, 2.5]),
        np.interp(effective_dims, _DIM_LIMITS, [1.1, 0.55]),
    )
    lengthscale_initial_value = np.interp(effective_dims, _DIM_LIMITS, [0.2, 6.0])
    outputscale_prior = GammaPrior(
```
```diff
 # Convert specified inner kernels to gpytorch, if provided
 kernel_dict = {
-    key: value.to_gpytorch(**kw)
+    key: value.to_gpytorch(searchspace, **kw)
```
`Kernel.to_gpytorch` recurses into inner kernels via `value.to_gpytorch(searchspace, **kw)`, but the new `to_gpytorch` signature only accepts `(searchspace)` (no `**kw`). This will raise a `TypeError` for composite kernels at runtime. Call inner kernels with just the search space (or update the signature consistently) and let each kernel compute its own dimensions.
```diff
-    key: value.to_gpytorch(searchspace, **kw)
+    key: value.to_gpytorch(searchspace)
```
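The failure mode can be reproduced in isolation, under the assumption that the new method accepts only the search space:

```python
# New-style signature: accepts only the search space, no **kw.
def to_gpytorch(searchspace):
    return "converted"


to_gpytorch("space")  # fine

try:
    # How composite kernels currently forward the call:
    to_gpytorch("space", extra=1)
except TypeError as err:
    print(err)  # got an unexpected keyword argument 'extra'
```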
```diff
 # Exception: initial values are not used during construction but are set
 # on the created object (see code at the end of the method).
-missing = set(unmatched) - set(kernel_attrs)
+missing = set(unmatched) - set(kernel_attrs) - self._whitelisted_attributes
```
The sanity check for unmatched attributes computes `missing` from `unmatched` (the last loop iteration) instead of the accumulated `unmatched_attrs`. This can silently miss unmatched BayBE attributes depending on base-class iteration order. Use the collected `unmatched_attrs` when computing `missing`.
```diff
-missing = set(unmatched) - set(kernel_attrs) - self._whitelisted_attributes
+missing = set(unmatched_attrs) - set(kernel_attrs) - self._whitelisted_attributes
```
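The pitfall can be demonstrated in isolation: after a loop, the per-iteration variable only holds the last iteration's leftovers, while the accumulated set holds everything. Variable names mirror the review comment; the data is made up for illustration:

```python
attribute_groups = [{"a", "b"}, {"c"}]
known = {"b"}

unmatched_attrs = set()
for group in attribute_groups:
    unmatched = group - known      # leftovers of *this* iteration only
    unmatched_attrs |= unmatched   # accumulated leftovers across iterations

print(unmatched)        # {'c'}        -- last iteration only
print(unmatched_attrs)  # {'a', 'c'}   -- what the sanity check should use
```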
```python
gpytorch_kernel = kernel_cls(**kernel_attrs, ard_num_dims=ard_num_dims, **kw)
```
`to_gpytorch` always passes `ard_num_dims=ard_num_dims` into the GPyTorch constructor, even when `ard_num_dims` is `None`. The comment above explicitly notes that passing `None` can break kernels that use `kwargs.get('ard_num_dims', ...)` semantics, and it also defeats the earlier filtering of default kwargs. Only pass `ard_num_dims` when it is not `None` (consistent with how `active_dims` is handled).
```diff
-gpytorch_kernel = kernel_cls(**kernel_attrs, ard_num_dims=ard_num_dims, **kw)
+# Only pass `ard_num_dims` if it is not None, to avoid overriding
+# kernels that rely on `kwargs.get("ard_num_dims", ...)` semantics.
+kernel_kwargs: dict[str, Any] = {**kernel_attrs, **kw}
+if ard_num_dims is not None:
+    kernel_kwargs["ard_num_dims"] = ard_num_dims
+gpytorch_kernel = kernel_cls(**kernel_kwargs)
```
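Why an explicit `None` is harmful here: `kwargs.get` with a default only applies the default when the key is *absent*, not when it is present with value `None`. A minimal illustration:

```python
def make_kernel(**kwargs):
    # Mimics kernels that resolve the setting via kwargs.get(..., default).
    return kwargs.get("ard_num_dims", 3)


make_kernel()                   # -> 3 (key absent, default applies)
make_kernel(ard_num_dims=None)  # -> None (explicit None defeats the default)
```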
```python
base_kernel_factory: KernelFactoryProtocol = field(alias="base_kernel_or_factory")
"""The factory for the base kernel operating on numerical input features."""

task_kernel_factory: KernelFactoryProtocol = field(alias="task_kernel_or_factory")
"""The factory for the task kernel operating on the task indices."""

@base_kernel_factory.default
def _default_base_kernel_factory(self) -> KernelFactoryProtocol:
    from baybe.surrogates.gaussian_process.presets.baybe import (
        BayBENumericalKernelFactory,
    )

    return BayBENumericalKernelFactory(TypeSelector((TaskParameter,), exclude=True))

@task_kernel_factory.default
def _default_task_kernel_factory(self) -> KernelFactoryProtocol:
    from baybe.surrogates.gaussian_process.presets.baybe import (
        BayBETaskKernelFactory,
    )

    return BayBETaskKernelFactory(TypeSelector((TaskParameter,)))

@override
def _make(
    self, searchspace: SearchSpace, train_x: Tensor, train_y: Tensor
) -> Kernel:
    base_kernel = self.base_kernel_factory(searchspace, train_x, train_y)
    task_kernel = self.task_kernel_factory(searchspace, train_x, train_y)
    return ProductKernel([base_kernel, task_kernel])
```
`ICMKernelFactory` fields are aliased as `base_kernel_or_factory` / `task_kernel_or_factory`, but they have no converter/validator like `GaussianProcessSurrogate.kernel_factory`. Passing a `Kernel` instance (or any non-callable component) will fail at runtime when invoked. Either (a) rename the aliases to `*_factory` and validate with `is_callable()`, or (b) add a `to_component_factory` converter here as well.
Related: `_make` builds a BayBE `ProductKernel`, so these sub-factories should be constrained to return BayBE `Kernel` objects (not raw GPyTorch kernels); otherwise `ProductKernel([base_kernel, task_kernel])` will break.
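Option (b) could look roughly like the following sketch. `to_component_factory` is a hypothetical stand-in here, and the `(searchspace, train_x, train_y)` wrapping signature is an assumption taken from the factory calls shown above:

```python
def to_component_factory(value):
    """Accept either a factory callable or a ready-made component.

    Hypothetical sketch: a concrete (non-callable) component is wrapped
    into a constant factory matching the assumed factory call signature.
    """
    if callable(value):
        return value  # already a factory
    return lambda searchspace, train_x, train_y: value


class DummyKernel:  # stand-in for a concrete BayBE Kernel instance
    pass


kernel = DummyKernel()
factory = to_component_factory(kernel)
factory(None, None, None) is kernel  # -> True
```

Used as `field(alias="base_kernel_or_factory", converter=to_component_factory)`, this would make the `*_or_factory` aliases honest while keeping the internal attribute a plain callable.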
```python
from baybe.parameters.base import Parameter


class ParameterKind(Flag):
```
or `ParameterRole`?
DevPR, parent is #745
Adds a validation mechanism to ensure kernel factories only produce kernels for search spaces they are intended for.
This is achieved via a new `ParameterKind` flag enum that factories can use to signal which parameter types they support.