Skip to content

Add CandidatesProtocol#800

Open
myrazma wants to merge 29 commits into
dev/candidatesfrom
feature/candidates_base
Open

Add CandidatesProtocol#800
myrazma wants to merge 29 commits into
dev/candidatesfrom
feature/candidates_base

Conversation

@myrazma
Copy link
Copy Markdown
Collaborator

@myrazma myrazma commented May 13, 2026

Prepares #796

  • Adds narwhals as a hard dependency for backend-agnostic dataframe operations
  • Introduces CandidatesProtocol along with ProductCandidates and TableCandidates classes for representing discrete candidate sets, backed by narwhals lazy frames
  • Adds DiscreteParameter.is_finite property and InfiniteSpaceError exception
  • Enables validation for extra columns in candidate tables

@myrazma myrazma marked this pull request as ready for review May 13, 2026 19:02
Copilot AI review requested due to automatic review settings May 13, 2026 19:02
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new candidate-generation abstraction for discrete search spaces, adds Narwhals-based lazy dataframe conversion utilities, and wires in new hard dependencies (narwhals, polars) alongside tests to validate the new candidate classes.

Changes:

  • Added CandidatesProtocol plus ProductCandidates (cartesian product + constraints) and TableCandidates (user-provided table) for candidate handling.
  • Added Narwhals lazy dataframe conversion helpers (to_lazy_narwhals, from_lazy_narwhals).
  • Added dependencies and changelog entries, plus a new test module for candidates.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
baybe/searchspace/candidates.py New candidate protocol and implementations for product/table candidate generation.
baybe/utils/dataframe.py Adds Narwhals lazy/native conversion helpers.
baybe/exceptions.py Adds InfiniteSpaceError for operations requiring finite spaces.
tests/test_candidates.py New tests for ProductCandidates / TableCandidates.
pyproject.toml Adds narwhals and polars[pyarrow] as core dependencies.
uv.lock Updates lockfile to include new dependencies/versions.
CHANGELOG.md Documents the new dependencies and candidate system additions.
Comments suppressed due to low confidence (1)

tests/test_candidates.py:137

  • Pytest parametrizes parameter_names and constraint_names for this test, but test_product_candidates_creation does not accept these arguments. As written, pytest collection will error (“function uses no argument …”). Add the missing arguments to the signature or use indirect parametrization of the parameters/constraints fixtures.
@pytest.mark.parametrize(
    "parameter_names",
    [
        ["Num_disc_1", "Num_disc_2", "Fraction_2"],
        ["Categorical_1", "Num_disc_1"],
        ["Categorical_1", "Categorical_2", "Categorical_1_subset"],
    ],
    ids=["numerical", "mixed", "categorical"],
)
@pytest.mark.parametrize(
    "constraint_names",
    [
        [],
        ["DiscreteSumConstraint"],
        ["DiscreteExcludeConstraint"],
    ],
    ids=["no_constraint", "sum", "exclude"],
)
def test_product_candidates_creation(parameters, constraints):
    """ProductCandidates can be created with valid parameters and constraints."""
    candidates = ProductCandidates(parameters=parameters, constraints=constraints)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/test_candidates.py Outdated
Comment thread baybe/searchspace/candidates.py Outdated
Comment thread pyproject.toml
@Scienfitz
Copy link
Copy Markdown
Collaborator

is this PR related to any of the Issues 793 to 798?

@AdrianSosic
Copy link
Copy Markdown
Collaborator

is this PR related to any of the Issues 793 to 798?

Yes, to #796, but only as a preparation step. Will add it to description.

@Scienfitz, @AVHopp: Let me do a first round of polish before you start your review, will save us some unnecessary iterations

@Scienfitz Scienfitz marked this pull request as draft May 18, 2026 08:46
@Scienfitz
Copy link
Copy Markdown
Collaborator

please put PRs not ready fort review in draft

@AdrianSosic AdrianSosic changed the title Add CandidateProtocol and Base Candidate Classes Add CandidatesProtocol May 21, 2026
@AdrianSosic AdrianSosic force-pushed the feature/candidates_base branch from cc5829c to 3dfbbc4 Compare May 21, 2026 14:40
@AdrianSosic AdrianSosic marked this pull request as ready for review May 21, 2026 14:40
@AdrianSosic AdrianSosic force-pushed the feature/candidates_base branch from 3dfbbc4 to 6b59e42 Compare May 21, 2026 14:46
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 6 comments.

Comment on lines +23 to +36
class CandidatesProtocol(Protocol):
"""Type protocol specifying the interface candidate generators need to implement."""

@property
def parameters(self) -> tuple[DiscreteParameter, ...]:
"""The parameters spanning the space from which candidates are generated."""

@property
def is_finite(self) -> bool:
"""Indicates whether the candidate set is finite or infinite."""

def to_lazy(self) -> nw.LazyFrame:
"""Generate all candidates."""

Comment thread baybe/utils/validation.py
Comment on lines 177 to 191
if missing := {p.name for p in parameters}.difference(data.columns):
raise ValueError(
f"The input dataframe is missing columns for the following parameters: "
f"{missing}"
)

if not allow_extra and (
extra := set(data.columns).difference({p.name for p in parameters})
):
raise ValueError(
f"The input dataframe contains columns that do not correspond to any "
f"parameter: {extra}"
)

for p in parameters:
Comment thread baybe/parameters/base.py
@property
def is_finite(self) -> bool:
"""Indicates whether the parameter has a finite number of values."""
len(self.values) # <-- raises an error if the parameter is infinite
Comment thread tests/test_candidates.py
@@ -0,0 +1,125 @@
"""Tests for candidate generators."""

import narwhals as nw
Comment thread AGENTS.md
- Mutable stateful objects (campaign, surrogates, recommenders): `@define`.
- `slots=False` required with `frozen=True` when `cached_property` is needed. See
`attrs` issue #164
- `slots=False` required `cached_property` is needed. See `attrs` issue #164
Comment thread CHANGELOG.md
- Coding convention instructions for agentic developers (`AGENTS.md`, `CLAUDE.md`)
- `has_polars_implementation` property on `DiscreteConstraint`
- `allow_missing` flag on `DiscreteConstraint.get_invalid` and `get_valid`
- `narwhals` as hard dependencies
@AdrianSosic AdrianSosic changed the base branch from main to dev/candidates May 21, 2026 14:46
@AdrianSosic AdrianSosic self-assigned this May 22, 2026
@AdrianSosic AdrianSosic added new feature New functionality dev labels May 22, 2026
@AdrianSosic AdrianSosic added this to the Roadmap (no ETA) milestone May 22, 2026
Copy link
Copy Markdown
Collaborator

@fabianliebig fabianliebig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good to me, I have a few questions but most of them are really mainly for my understanding and out of curiosity also regarding the upcoming changes. Just because planning and coding is rather different, there are potentially some comments that are obsolete because it's just a preparation PR so feel free to resolve if they don't make much sense to you :)

Comment thread baybe/parameters/base.py
Comment on lines +119 to +123
@property
def is_finite(self) -> bool:
"""Indicates whether the parameter has a finite number of values."""
len(self.values) # <-- raises an error if the parameter is infinite
return True
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I understand why an error is triggered by intend here, but perhaps I don't see the full picture at the moment. Because it's not an abstract method, it looks to me as if it would violate the function's contract of returning a bool. In essence, the len() function calls __len__(), right? So an error is thrown if that method is not implemented. Is that meant as some kind of a safeguard until follow-up PRs are implemented? But then I would expect some kind of a readable assert like: assert isinstance(param.values, Sized) or directly use return isinstance(param.values, Sized) directly? Or does that violate the changes to come?

Comment on lines +92 to +100
parameters: tuple[DiscreteParameter, ...] = field(
converter=sort_parameters,
validator=[
min_len(1),
deep_iterable(member_validator=instance_of(DiscreteParameter)),
lambda _, __, x: validate_parameter_names(x),
],
)
"""See :attr:`CandidatesProtocol.parameters`."""
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is repeated above and will likely also be in further implementations, such as the upcoming generator, right? Now I understand that the protocol is better for defining nothing but the contract. I was just wondering if we should have an ABC base class for internal purposes, because I cannot think of a scenario where parameters are not defined as in these both classes right now. But feel free to keep it if you think something else would be overengineered.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see based on the commits that this has been explicitly moved from the protocol which makes sense. Just because I don't know the code base as well as you do but liked the initial plan of having the parameter and the candidates separated so that it only applies the candidate materialization logic as in #787 with every method having parameters as function arguments: Is that just because it's incompatible with the current way we apply constraints etc. and will be added later or has it become generally impossible?

Comment on lines +75 to +79
def to_lazy(self) -> nw.LazyFrame:
if not self.is_finite:
raise InfiniteSpaceError(
"Cannot generate all candidates from an infinite space."
)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to override the doc string as well to indicate that this override can raise an exception?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dev new feature New functionality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants