30 commits
eb4545a
Add narwhal lazyframe converter
myrazma May 13, 2026
507a38b
Add polars(pyarrow) and narwhals as hard dependencies
myrazma May 13, 2026
f499cbb
Add InfiniteSpaceError
myrazma May 13, 2026
d920ae2
Add CandidateProtocol as well as TabelCandidates and ProductCandidates
myrazma May 13, 2026
2c724e2
Add tests for ProductCandidates and TableCandidates
myrazma May 13, 2026
22aa41b
Add parameter name check to Candidates
myrazma May 13, 2026
92af0ca
Update CHANGELOG.md
myrazma May 13, 2026
fc19bc4
Fix typo
myrazma May 13, 2026
4a43d8b
Remove attribute validation from CandidateProtocol
myrazma May 13, 2026
eacd434
Fix terminology: enumerable -> finite
AdrianSosic May 21, 2026
6cb0e34
Remove polars as hard dependency
AdrianSosic May 21, 2026
8be8647
Adjust narwhals version constraints
AdrianSosic May 21, 2026
2eacc75
Use narwhals stable.v2 namespace
AdrianSosic May 21, 2026
053c16a
Turn protocol (class) attribute into property
AdrianSosic May 21, 2026
05b3ad3
Refine attrs coding conventions in AGENTS.md
AdrianSosic May 21, 2026
831aafe
Add missing garbage collection step
AdrianSosic May 21, 2026
a4abf8f
Fix attribute definitions
AdrianSosic May 21, 2026
dda34ff
Drop unnecessary __attrs_post_init__
AdrianSosic May 21, 2026
adabdd4
Rework candidate module docstrings
AdrianSosic May 21, 2026
2a7f4b9
Turn delayed validation into eager validation
AdrianSosic May 21, 2026
c5ebbba
Add DiscreteParameter.is_finite
AdrianSosic May 21, 2026
4605151
Drop unused helper function
AdrianSosic May 21, 2026
0a4b9b8
Adjust lazyframe conversion utility
AdrianSosic May 21, 2026
10f8118
Rename CandidatesProtocol.to_lazy_candidates to to_lazy
AdrianSosic May 21, 2026
17fc099
Fix changelog
AdrianSosic May 21, 2026
41580f0
Refactor table candidates tests
AdrianSosic May 21, 2026
9d042f1
Enable check for extra dataframe columns
AdrianSosic May 21, 2026
8017b65
Refactor product candidates tests
AdrianSosic May 21, 2026
6b59e42
Expose candidates classes via namespace
AdrianSosic May 21, 2026
f6b516b
Fix CI: Add polars as an optional dependency for testing
fabianliebig May 23, 2026
39 changes: 31 additions & 8 deletions AGENTS.md
@@ -58,10 +58,9 @@ More specific conventions for subdirectories:
 ### attrs Only
 All domain classes use `attrs` `@define`. No dataclasses, no Pydantic.
 - Immutable value objects (parameters, kernels, priors, transformations, objectives,
-  targets): `@define(frozen=True, slots=False)`.
+  targets): `@define(frozen=True)`.
 - Mutable stateful objects (campaign, surrogates, recommenders): `@define`.
-- `slots=False` required with `frozen=True` when `cached_property` is needed. See
-  `attrs` issue #164
+- `slots=False` required when `cached_property` is needed. See `attrs` issue #164
 - Also use `slots=False` when monkeypatching is needed (e.g., `register_hooks`)

 ### Inheritance: ABC + SerialMixin + Protocol
@@ -71,14 +70,25 @@ All domain classes use `attrs` `@define`. No dataclasses, no Pydantic.
 3. Concrete classes: Inherit from ABC.

 ### Fields and Methods
-- Use `field()` with `validator=`, `converter=`, `default=`, `factory=`, `alias=`.
+- Use `field()` with arguments in this order: 1) `alias=` (if needed), 2) `init=`
+  (if needed), 3) `default=` / `factory=`, 4) `converter=`, 5) `validator=`.
 - Private fields: `_` prefix, typically `init=False`.
 - Store each piece of information once — no data duplication.
 - Use `attrs.evolve()` for modified copies of frozen objects.
 - Use `on_setattr` hooks for cache invalidation on mutable objects.
+- Use `kw_only=True` deliberately: only when positional construction would be
+  ambiguous or error-prone (e.g., multiple fields of the same type, or
+  optional/secondary fields that should not be passed positionally). Do not
+  apply `kw_only` to all fields by default.
 - `ClassVar[bool]` for capability flags (`supports_transfer_learning`, etc.).
-- Order class content like this: 1) Attributes, 2) validators and post_init, 3)
-  properties, 4) methods. Within each group use alphabetical order.
+- Order class content like this: 1) Attributes, 2) default and validator methods,
+  3) `__attrs_post_init__`, 4) properties, 5) methods.
+- Attributes are ordered by functionality/importance (primary identity fields
+  first, optional/secondary fields last), not alphabetically.
+- Default and validator methods mirror the attribute order. For a given
+  attribute, the default method (`_default_<attr>`) comes before its validator
+  (`_validate_<attr>`).
+- Regular methods are ordered alphabetically.

 ### Attribute Docstrings
 String literals immediately below field declarations, blank lines between attributes.
@@ -225,11 +235,24 @@ Three tiers:

 ## 11. Validation Patterns
 - Inline validators: `field(validator=(instance_of(str), min_len(1)))`, `in_()`,
-  `deep_iterable()`, custom `finite_float`, `gt()`.
+  `deep_iterable()`, custom `finite_float`, `gt()`. Order validators from simplest
+  to most complex: cheap structural checks (e.g., `min_len`, `instance_of`) before
+  expensive semantic checks (e.g., cross-field consistency, name uniqueness).
 - Method validators: `@_field.validator` with `# noqa: DOC101, DOC103` for
   validators needing `self` access.
-- Cross-field: `__attrs_post_init__` when validation involves multiple fields.
+- Cross-field: `__attrs_post_init__` is a last resort. Method validators
+  (`@field.validator`) already receive `self` and can read other already-set
+  attributes, so most cross-field checks belong there instead. When one field
+  must be compatible with another, attach the validator to the later field —
+  attrs sets fields in declaration order, so earlier fields are always available
+  via `self` at that point. When one attribute's value must be adjusted after
+  all fields are set — which is typically a workaround and should itself be
+  questioned — `__attrs_post_init__` is acceptable.
 - Converters: `field(converter=to_searchspace)` for automatic type coercion.
+- If a converter already guarantees a specific type (e.g., `converter=list`
+  always produces a `list`, a custom converter always returns a known type),
+  omit any `instance_of(...)` validator for that same type — the check is
+  redundant.
 - Reusable validators in `baybe/utils/validation.py`: `finite_float`,
   `non_nan_float`, `non_inf_float`, `validate_not_nan`, `validate_target_input`,
   `validate_parameter_input`, `validate_object_names`.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -17,6 +17,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Coding convention instructions for agentic developers (`AGENTS.md`, `CLAUDE.md`)
 - `has_polars_implementation` property on `DiscreteConstraint`
 - `allow_missing` flag on `DiscreteConstraint.get_invalid` and `get_valid`
+- `narwhals` as a hard dependency
+- `CandidatesProtocol` as an interface for candidates generation
+- `TableCandidates` and `ProductCandidates` classes implementing `CandidatesProtocol`
+- `DiscreteParameter.is_finite` property

 ### Breaking Changes
 - `parameter_cartesian_prod_pandas` and `parameter_cartesian_prod_polars` moved
4 changes: 4 additions & 0 deletions baybe/exceptions.py
@@ -179,5 +179,9 @@ class UnsupportedEarlyFilteringError(Exception):
     """A constraint does not support early filtering with the given parameters."""


+class InfiniteSpaceError(Exception):
+    """An operation requires a finite search space but the space is infinite."""
+
+
 # Collect leftover original slotted classes processed by `attrs.define`
 gc.collect()
6 changes: 6 additions & 0 deletions baybe/parameters/base.py
@@ -116,6 +116,12 @@ class DiscreteParameter(Parameter, ABC):
     def values(self) -> tuple:
         """The values the parameter can take."""

+    @property
+    def is_finite(self) -> bool:
+        """Indicates whether the parameter has a finite number of values."""
+        len(self.values)  # <-- raises an error if the parameter is infinite
+        return True
Comment on lines +119 to +123
Collaborator

I'm not sure I understand why an error is triggered intentionally here, but perhaps I don't see the full picture yet. Since this is not an abstract method, it looks to me as if it violates the property's contract of returning a bool. In essence, `len()` calls `__len__()`, right? So an error is thrown if that method is not implemented. Is that meant as some kind of safeguard until follow-up PRs are implemented? In that case I would expect a readable assert such as `assert isinstance(param.values, Sized)`, or simply `return isinstance(param.values, Sized)`. Or would that conflict with the changes to come?
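To make the two options in this comment concrete, here is a small standard-library sketch. The function names are hypothetical, and `itertools.count()` is only a stand-in for an unbounded value source; how an infinite `DiscreteParameter` would actually expose its values is not shown in this diff.

```python
from collections.abc import Sized
from itertools import count


def is_finite_by_len(values) -> bool:
    """Mirror the diff: rely on ``len`` raising for objects without ``__len__``."""
    len(values)  # raises TypeError if `values` has no __len__
    return True


def is_finite_by_sized(values) -> bool:
    """Reviewer's alternative: report sizedness as a plain boolean."""
    return isinstance(values, Sized)


finite_values = (1, 2, 3)
infinite_values = count()  # hypothetical stand-in for an unbounded value source
```

With the first variant, callers never see `False`: they either get `True` or an exception. The second variant always returns a bool, which matches the annotated return type.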


     @property
     def active_values(self) -> tuple:
         """The values that are considered for recommendation."""
11 changes: 11 additions & 0 deletions baybe/searchspace/__init__.py
@@ -1,5 +1,10 @@
 """BayBE search spaces."""

+from baybe.searchspace.candidates import (
+    CandidatesProtocol,
+    ProductCandidates,
+    TableCandidates,
+)
 from baybe.searchspace.continuous import SubspaceContinuous
 from baybe.searchspace.core import (
     SearchSpace,
@@ -9,9 +14,15 @@
 from baybe.searchspace.discrete import SubspaceDiscrete

 __all__ = [
+    # Search space
     "validate_searchspace_from_config",
     "SearchSpace",
     "SearchSpaceType",
+    # Discrete
+    "CandidatesProtocol",
+    "ProductCandidates",
+    "TableCandidates",
     "SubspaceDiscrete",
+    # Continuous
     "SubspaceContinuous",
 ]
123 changes: 123 additions & 0 deletions baybe/searchspace/candidates.py
@@ -0,0 +1,123 @@
+"""Candidates module for managing lazy candidate generation."""
+
+import gc
+from typing import Protocol
+
+import narwhals.stable.v2 as nw
+from attr.validators import deep_iterable, instance_of, min_len
+from attrs import Attribute, define, field
+from typing_extensions import override
+
+from baybe.constraints import DISCRETE_CONSTRAINTS_FILTERING_ORDER, validate_constraints
+from baybe.constraints.base import DiscreteConstraint
+from baybe.exceptions import InfiniteSpaceError
+from baybe.parameters.base import DiscreteParameter
+from baybe.parameters.utils import sort_parameters
+from baybe.searchspace.utils import build_constrained_product
+from baybe.searchspace.validation import validate_parameter_names
+from baybe.utils.basic import to_tuple
+from baybe.utils.dataframe import to_lazy
+from baybe.utils.validation import validate_parameter_input
+
+
+class CandidatesProtocol(Protocol):
+    """Type protocol specifying the interface candidate generators need to implement."""
+
+    @property
+    def parameters(self) -> tuple[DiscreteParameter, ...]:
+        """The parameters spanning the space from which candidates are generated."""
+
+    @property
+    def is_finite(self) -> bool:
+        """Indicates whether the candidate set is finite or infinite."""
+
+    def to_lazy(self) -> nw.LazyFrame:
+        """Generate all candidates."""
+
Comment on lines +23 to +36

+@define(frozen=True)
+class ProductCandidates(CandidatesProtocol):
+    """Class for managing candidates from (filtered) Cartesian product spaces."""
+
+    parameters: tuple[DiscreteParameter, ...] = field(
+        converter=sort_parameters,
+        validator=[
+            min_len(1),
+            deep_iterable(member_validator=instance_of(DiscreteParameter)),
+            lambda _, __, x: validate_parameter_names(x),
+        ],
+    )
+    """See :attr:`CandidatesProtocol.parameters`."""
+
+    constraints: tuple[DiscreteConstraint, ...] = field(
+        default=(),
+        converter=lambda x: to_tuple(
+            sorted(
+                x, key=lambda c: DISCRETE_CONSTRAINTS_FILTERING_ORDER.index(c.__class__)
+            )
+        ),
+        validator=deep_iterable(member_validator=instance_of(DiscreteConstraint)),
+    )
+    """Constraints to filter the Cartesian product of parameter values."""
+
+    @constraints.validator
+    def _validate_constraints(
+        self, _: Attribute, value: tuple[DiscreteConstraint, ...]
+    ):  # noqa: DOC101, DOC103
+        validate_constraints(value, self.parameters)
+
+    @override
+    @property
+    def is_finite(self) -> bool:
+        return all(p.is_finite for p in self.parameters)
+
+    @override
+    def to_lazy(self) -> nw.LazyFrame:
+        if not self.is_finite:
+            raise InfiniteSpaceError(
+                "Cannot generate all candidates from an infinite space."
+            )
Comment on lines +75 to +79
Collaborator

Does it make sense to override the doc string as well to indicate that this override can raise an exception?
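One way to read this suggestion is sketched below with standard-library stand-ins. `CandidatesSketch` and `ProductSketch` are hypothetical simplifications: they return a plain list instead of a `nw.LazyFrame` and take the finiteness flag as a constructor argument. The point is only that the overriding method carries its own docstring with a `Raises:` section.

```python
from itertools import product
from typing import Protocol


class InfiniteSpaceError(Exception):
    """An operation requires a finite search space but the space is infinite."""


class CandidatesSketch(Protocol):
    """Simplified stand-in for the protocol (list instead of a lazy frame)."""

    def to_lazy(self) -> list[tuple]:
        """Generate all candidates."""
        ...


class ProductSketch:
    """Hypothetical product-based implementation with a documented exception."""

    def __init__(self, values_per_parameter, is_finite=True):
        self.values_per_parameter = values_per_parameter
        self.is_finite = is_finite

    def to_lazy(self) -> list[tuple]:
        """Generate all candidates.

        Raises:
            InfiniteSpaceError: If the underlying space is infinite.
        """
        if not self.is_finite:
            raise InfiniteSpaceError(
                "Cannot generate all candidates from an infinite space."
            )
        return list(product(*self.values_per_parameter))


candidates = ProductSketch([(1, 2), ("a", "b")]).to_lazy()
```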


+        candidates_df = build_constrained_product(self.parameters, self.constraints)
+
+        # TODO: Remove to lazy once build_constrained_product returns a nw.LazyFrame
+        assert not isinstance(candidates_df, nw.LazyFrame)
+        return to_lazy(candidates_df)
+
+
+@define(frozen=True)
+class TableCandidates(CandidatesProtocol):
+    """Class for managing candidates provided in a tabular format."""
+
+    parameters: tuple[DiscreteParameter, ...] = field(
+        converter=sort_parameters,
+        validator=[
+            min_len(1),
+            deep_iterable(member_validator=instance_of(DiscreteParameter)),
+            lambda _, __, x: validate_parameter_names(x),
+        ],
+    )
+    """See :attr:`CandidatesProtocol.parameters`."""
Comment on lines +92 to +100
Collaborator

This code is repeated above and will likely also appear in further implementations, such as the upcoming generator, right? I understand now that the protocol is better suited to defining nothing but the contract. I was just wondering whether we should have an ABC base class for internal purposes, because I cannot think of a scenario where parameters are not defined as they are in both of these classes right now. But feel free to keep it as is if you think anything else would be overengineered.

Collaborator

I see from the commits that this has been explicitly moved out of the protocol, which makes sense. I don't know the code base as well as you do, but I liked the initial plan in #787 of keeping the parameters and the candidates separate, so that the class only applies the candidate materialization logic and every method takes the parameters as function arguments. Was that dropped just because it is incompatible with the way we currently apply constraints etc., and will it be added later, or has it become generally impossible?
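The internal base class floated in this thread might look roughly like the sketch below. Everything here is hypothetical: `CandidatesBase` and `TableSketch` are invented names, parameters are reduced to plain name strings, and a plain `__init__` replaces the `attrs` fields the project would actually use. It only shows how the shared `parameters` handling could live in one place while the protocol stays a pure contract.

```python
from abc import ABC, abstractmethod


class CandidatesBase(ABC):
    """Hypothetical internal base class holding the shared ``parameters`` logic."""

    def __init__(self, parameters):
        params = tuple(sorted(parameters))  # stand-in for `sort_parameters`
        if not params:
            raise ValueError("At least one parameter is required.")
        if len(set(params)) != len(params):
            raise ValueError("Parameter names must be unique.")
        self.parameters = params

    @property
    @abstractmethod
    def is_finite(self) -> bool:
        """Indicates whether the candidate set is finite."""

    @abstractmethod
    def to_lazy(self):
        """Generate all candidates."""


class TableSketch(CandidatesBase):
    """Hypothetical table-backed implementation reusing the base validation."""

    def __init__(self, parameters, rows):
        super().__init__(parameters)
        self.rows = list(rows)

    @property
    def is_finite(self) -> bool:
        return True

    def to_lazy(self):
        return self.rows


table = TableSketch(["b", "a"], [(1, 2)])
```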


+    dataframe: nw.LazyFrame = field(converter=to_lazy)
+    """The dataframe containing the candidates."""
+
+    @dataframe.validator
+    def _validate_dataframe(self, _: Attribute, value: nw.LazyFrame) -> None:  # noqa: DOC101, DOC103
+        # TODO: Remove collect().to_pandas() once validation on lazy frames is supported
+        validate_parameter_input(
+            value.collect().to_pandas(), self.parameters, allow_extra=False
+        )
+
+    @override
+    @property
+    def is_finite(self) -> bool:
+        return True
+
+    @override
+    def to_lazy(self) -> nw.LazyFrame:
+        return self.dataframe
+
+
+# Collect leftover original slotted classes processed by `attrs.define`
+gc.collect()
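The `constraints` converter in `ProductCandidates` above sorts constraints by the position of their class in `DISCRETE_CONSTRAINTS_FILTERING_ORDER`. The sketch below reproduces that idea with hypothetical constraint classes and a hypothetical order list; the actual contents of `DISCRETE_CONSTRAINTS_FILTERING_ORDER` are not shown in this diff.

```python
class CheapConstraint:
    """Hypothetical constraint type that should be applied first."""


class CostlyConstraint:
    """Hypothetical constraint type that should be applied last."""


# Hypothetical stand-in for DISCRETE_CONSTRAINTS_FILTERING_ORDER.
FILTERING_ORDER = [CheapConstraint, CostlyConstraint]


def sort_constraints(constraints):
    """Return the constraints as a tuple, sorted by their class's list position."""
    return tuple(
        sorted(constraints, key=lambda c: FILTERING_ORDER.index(c.__class__))
    )


unsorted_constraints = [CostlyConstraint(), CheapConstraint(), CostlyConstraint()]
ordered = sort_constraints(unsorted_constraints)
```

Because `sorted` is stable, constraints of the same class keep their relative input order. Note that `list.index` raises a `ValueError` for a class that is not in the order list, including subclasses of listed classes.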
7 changes: 7 additions & 0 deletions baybe/utils/dataframe.py
@@ -7,8 +7,10 @@
 from collections.abc import Callable, Collection, Iterable, Sequence
 from typing import TYPE_CHECKING, Any, Literal, TypeVar, overload

+import narwhals.stable.v2 as nw
 import numpy as np
 import pandas as pd
+from narwhals.stable.v2.typing import IntoDataFrame
 from typing_extensions import assert_never

 from baybe.exceptions import InputDataTypeWarning, SearchSpaceMatchWarning
@@ -778,3 +780,8 @@ def needs_float_dtype(obj) -> bool:
     for col in cols_to_convert:
         df[col] = df[col].astype(active_settings.DTypeFloatNumpy)
     return df
+
+
+def to_lazy(df: IntoDataFrame, /) -> nw.LazyFrame:
+    """Convert any dataframe to a :class:`~narwhals.LazyFrame`."""
+    return nw.from_native(df).lazy()
14 changes: 14 additions & 0 deletions baybe/utils/validation.py
@@ -149,6 +149,8 @@ def validate_parameter_input(
     data: pd.DataFrame,
     parameters: Iterable[Parameter],
     numerical_measurements_must_be_within_tolerance: bool = False,
+    *,
+    allow_extra: bool = True,
 ) -> None:
     """Validate input dataframe columns corresponding to parameters.

@@ -158,10 +160,14 @@ def validate_parameter_input(
         numerical_measurements_must_be_within_tolerance: If ``True``, numerical
             parameter values must match to parameter values within the
             parameter-specific tolerance.
+        allow_extra: If ``False``, the dataframe is not allowed to contain columns that
+            do not correspond to any parameter.

     Raises:
         ValueError: If the data is empty.
         ValueError: If the data misses columns for a parameter.
+        ValueError: If the data contains columns that do not correspond to any parameter
+            and the corresponding check is enabled.
         ValueError: If a parameter contains NaN.
         TypeError: If a parameter contains non-numeric values.
     """
@@ -174,6 +180,14 @@ def validate_parameter_input(
             f"{missing}"
         )

+    if not allow_extra and (
+        extra := set(data.columns).difference({p.name for p in parameters})
+    ):
+        raise ValueError(
+            f"The input dataframe contains columns that do not correspond to any "
+            f"parameter: {extra}"
+        )
+
     for p in parameters:
         if data[p.name].isna().any():
             raise ValueError(
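The new `allow_extra` check boils down to a set difference between the dataframe's column names and the parameter names. A minimal sketch of that logic, with a hypothetical helper `check_columns` that works on plain name collections rather than a dataframe and `Parameter` objects:

```python
def check_columns(columns, parameter_names, *, allow_extra=True):
    """Raise if `columns` contains names outside `parameter_names`.

    Hypothetical helper mirroring the `allow_extra` branch in the diff; the
    check is skipped entirely when `allow_extra` is True.
    """
    if not allow_extra and (extra := set(columns).difference(parameter_names)):
        raise ValueError(
            f"The input dataframe contains columns that do not correspond to any "
            f"parameter: {extra}"
        )


# Extra column "note" is tolerated by default ...
check_columns(["temperature", "note"], ["temperature"])

# ... and rejected once the check is enabled.
try:
    check_columns(["temperature", "note"], ["temperature"], allow_extra=False)
except ValueError as error:
    message = str(error)
```

Keeping `allow_extra` keyword-only (after the bare `*` in the signature) leaves existing positional calls untouched, and the default of `True` preserves the previous behavior.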
9 changes: 5 additions & 4 deletions docs/conf.py
@@ -274,15 +274,16 @@
 # Mappings to all external packages that we want to have clickable links to
 intersphinx_mapping = {
     "botorch": ("https://botorch.readthedocs.io/en/latest", None),
-    "python": ("https://docs.python.org/3", None),
+    "narwhals": ("https://narwhals-dev.github.io/narwhals/", None),
+    "numpy": ("https://numpy.org/doc/stable/", None),
     "pandas": ("https://pandas.pydata.org/docs/", None),
     "polars": ("https://docs.pola.rs/api/python/stable/", None),
+    "python": ("https://docs.python.org/3", None),
+    "rdkit": ("https://rdkit.org/docs/", None),
+    "shap": ("https://shap.readthedocs.io/en/stable/", None),
     "skfp": ("https://scikit-fingerprints.readthedocs.io/latest/", None),
     "sklearn": ("https://scikit-learn.org/stable/", None),
-    "numpy": ("https://numpy.org/doc/stable/", None),
     "torch": ("https://pytorch.org/docs/main/", None),
-    "rdkit": ("https://rdkit.org/docs/", None),
-    "shap": ("https://shap.readthedocs.io/en/stable/", None),
     "xyzpy": ("https://xyzpy.readthedocs.io/en/latest/", None),
 }

2 changes: 2 additions & 0 deletions pyproject.toml
@@ -36,6 +36,7 @@ dependencies = [
     "exceptiongroup",
     "gpytorch>=1.9.1,<2",
     "joblib>1.4.0,<2",
+    "narwhals>=2,<3",
     "numpy>=1.24.1,<3",
     "pandas>=1.4.2,<3",
     "scikit-learn>=1.1.1,<2",
Comment thread
AdrianSosic marked this conversation as resolved.
@@ -165,6 +166,7 @@ benchmarking = [
 ]

 test = [
+    "baybe[polars]",
     "hypothesis[pandas]>=6.88.4",
     "tenacity>=8.5.0",
     "pytest>=7.2.0",