Add CandidatesProtocol#800
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces a new candidate-generation abstraction for discrete search spaces, adds Narwhals-based lazy dataframe conversion utilities, and wires in new hard dependencies (narwhals, polars) alongside tests to validate the new candidate classes.
Changes:
- Added
CandidatesProtocolplusProductCandidates(cartesian product + constraints) andTableCandidates(user-provided table) for candidate handling. - Added Narwhals lazy dataframe conversion helpers (
to_lazy_narwhals,from_lazy_narwhals). - Added dependencies and changelog entries, plus a new test module for candidates.
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
baybe/searchspace/candidates.py |
New candidate protocol and implementations for product/table candidate generation. |
baybe/utils/dataframe.py |
Adds Narwhals lazy/native conversion helpers. |
baybe/exceptions.py |
Adds InfiniteSpaceError for operations requiring finite spaces. |
tests/test_candidates.py |
New tests for ProductCandidates / TableCandidates. |
pyproject.toml |
Adds narwhals and polars[pyarrow] as core dependencies. |
uv.lock |
Updates lockfile to include new dependencies/versions. |
CHANGELOG.md |
Documents the new dependencies and candidate system additions. |
Comments suppressed due to low confidence (1)
tests/test_candidates.py:137
- Pytest parametrizes
parameter_namesandconstraint_namesfor this test, buttest_product_candidates_creationdoes not accept these arguments. As written, pytest collection will error (“function uses no argument …”). Add the missing arguments to the signature or use indirect parametrization of theparameters/constraintsfixtures.
@pytest.mark.parametrize(
"parameter_names",
[
["Num_disc_1", "Num_disc_2", "Fraction_2"],
["Categorical_1", "Num_disc_1"],
["Categorical_1", "Categorical_2", "Categorical_1_subset"],
],
ids=["numerical", "mixed", "categorical"],
)
@pytest.mark.parametrize(
"constraint_names",
[
[],
["DiscreteSumConstraint"],
["DiscreteExcludeConstraint"],
],
ids=["no_constraint", "sum", "exclude"],
)
def test_product_candidates_creation(parameters, constraints):
"""ProductCandidates can be created with valid parameters and constraints."""
candidates = ProductCandidates(parameters=parameters, constraints=constraints)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
is this PR related to any of the Issues 793 to 798? |
Yes, to #796, but only as a preparation step. Will add it to description. @Scienfitz, @AVHopp: Let me do a first round of polish before you start your review, will save us some unnecessary iterations |
|
please put PRs not ready fort review in |
cc5829c to
3dfbbc4
Compare
Using version from beginnign of 2026
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
3dfbbc4 to
6b59e42
Compare
| class CandidatesProtocol(Protocol): | ||
| """Type protocol specifying the interface candidate generators need to implement.""" | ||
|
|
||
| @property | ||
| def parameters(self) -> tuple[DiscreteParameter, ...]: | ||
| """The parameters spanning the space from which candidates are generated.""" | ||
|
|
||
| @property | ||
| def is_finite(self) -> bool: | ||
| """Indicates whether the candidate set is finite or infinite.""" | ||
|
|
||
| def to_lazy(self) -> nw.LazyFrame: | ||
| """Generate all candidates.""" | ||
|
|
| if missing := {p.name for p in parameters}.difference(data.columns): | ||
| raise ValueError( | ||
| f"The input dataframe is missing columns for the following parameters: " | ||
| f"{missing}" | ||
| ) | ||
|
|
||
| if not allow_extra and ( | ||
| extra := set(data.columns).difference({p.name for p in parameters}) | ||
| ): | ||
| raise ValueError( | ||
| f"The input dataframe contains columns that do not correspond to any " | ||
| f"parameter: {extra}" | ||
| ) | ||
|
|
||
| for p in parameters: |
| @property | ||
| def is_finite(self) -> bool: | ||
| """Indicates whether the parameter has a finite number of values.""" | ||
| len(self.values) # <-- raises an error if the parameter is infinite |
| @@ -0,0 +1,125 @@ | |||
| """Tests for candidate generators.""" | |||
|
|
|||
| import narwhals as nw | |||
| - Mutable stateful objects (campaign, surrogates, recommenders): `@define`. | ||
| - `slots=False` required with `frozen=True` when `cached_property` is needed. See | ||
| `attrs` issue #164 | ||
| - `slots=False` required `cached_property` is needed. See `attrs` issue #164 |
| - Coding convention instructions for agentic developers (`AGENTS.md`, `CLAUDE.md`) | ||
| - `has_polars_implementation` property on `DiscreteConstraint` | ||
| - `allow_missing` flag on `DiscreteConstraint.get_invalid` and `get_valid` | ||
| - `narwhals` as hard dependencies |
fabianliebig
left a comment
There was a problem hiding this comment.
Looks pretty good to me, I have a few questions but most of them are really mainly for my understanding and out of curiosity also regarding the upcoming changes. Just because planning and coding is rather different, there are potentially some comments that are obsolete because it's just a preparation PR so feel free to resolve if they don't make much sense to you :)
| @property | ||
| def is_finite(self) -> bool: | ||
| """Indicates whether the parameter has a finite number of values.""" | ||
| len(self.values) # <-- raises an error if the parameter is infinite | ||
| return True |
There was a problem hiding this comment.
I'm not sure if I understand why an error is triggered by intend here, but perhaps I don't see the full picture at the moment. Because it's not an abstract method, it looks to me as if it would violate the function's contract of returning a bool. In essence, the len() function calls __len__(), right? So an error is thrown if that method is not implemented. Is that meant as some kind of a safeguard until follow-up PRs are implemented? But then I would expect some kind of a readable assert like: assert isinstance(param.values, Sized) or directly use return isinstance(param.values, Sized) directly? Or does that violate the changes to come?
| parameters: tuple[DiscreteParameter, ...] = field( | ||
| converter=sort_parameters, | ||
| validator=[ | ||
| min_len(1), | ||
| deep_iterable(member_validator=instance_of(DiscreteParameter)), | ||
| lambda _, __, x: validate_parameter_names(x), | ||
| ], | ||
| ) | ||
| """See :attr:`CandidatesProtocol.parameters`.""" |
There was a problem hiding this comment.
This code is repeated above and will likely also be in further implementations, such as the upcoming generator, right? Now I understand that the protocol is better for defining nothing but the contract. I was just wondering if we should have an ABC base class for internal purposes, because I cannot think of a scenario where parameters are not defined as in these both classes right now. But feel free to keep it if you think something else would be overengineered.
There was a problem hiding this comment.
I see based on the commits that this has been explicitly moved from the protocol which makes sense. Just because I don't know the code base as well as you do but liked the initial plan of having the parameter and the candidates separated so that it only applies the candidate materialization logic as in #787 with every method having parameters as function arguments: Is that just because it's incompatible with the current way we apply constraints etc. and will be added later or has it become generally impossible?
| def to_lazy(self) -> nw.LazyFrame: | ||
| if not self.is_finite: | ||
| raise InfiniteSpaceError( | ||
| "Cannot generate all candidates from an infinite space." | ||
| ) |
There was a problem hiding this comment.
Does it make sense to override the doc string as well to indicate that this override can raise an exception?
Prepares #796
CandidatesProtocolalong withProductCandidatesandTableCandidatesclasses for representing discrete candidate sets, backed by narwhals lazy framesDiscreteParameter.is_finiteproperty andInfiniteSpaceErrorexception