Skip to content

fix(devserver): honor evaluator.project_id when request omits it#372

Merged
Abhijeet Prasad (AbhiPrasad) merged 3 commits intobraintrustdata:mainfrom
willfrey:fix/devserver-honor-evaluator-project-id
May 1, 2026
Merged

fix(devserver): honor evaluator.project_id when request omits it#372
Abhijeet Prasad (AbhiPrasad) merged 3 commits intobraintrustdata:mainfrom
willfrey:fix/devserver-honor-evaluator-project-id

Conversation

@willfrey
Copy link
Copy Markdown
Contributor

Summary

The dev-server's run_eval builds EvalAsync(...) kwargs with:

{**eval_kwargs, ..., "project_id": eval_data.get("project_id")}

The trailing key always wins in dict-spread merging, so a request body that omits project_id silently overrides the registered evaluator's project_id to None. EvalAsync(name=..., project_id=None) then falls back to using the eval name as the project name (per Eval(...) docstring: "If specified, uses the given project ID instead of the evaluator's name to identify the project."), so experiments route into a per-evaluator-name auto-created project instead of the project the evaluator was registered against.

This bites consumers who mount the dev-server behind a custom auth layer and trigger evals from anything other than the Braintrust playground UI: every triggered run lands in a fresh eval-name-keyed project rather than the canonical project the registered Evaluator(project_id=...) named.

Fix

Fall back to evaluator.project_id when the request omits it. An explicit request-level project_id still takes precedence (no behavior change for the playground UI flow).

project_id = eval_data.get("project_id") or evaluator.project_id

Test plan

  • test_eval_falls_back_to_evaluator_project_id_when_request_omits_it — registers an evaluator with a known project_id, POSTs /eval without project_id, asserts EvalAsync receives the registered id. (Fails on main, passes with this fix.)
  • test_eval_request_project_id_overrides_evaluator — confirms an explicit request-level project_id still wins.
  • Full py/src/braintrust/devserver/ test suite green (21 passed, 2 pre-existing skips).
  • nox -s pylint passes.
  • Pre-commit hooks (ruff format, ruff check, codespell) pass.

The dev-server's run_eval built EvalAsync(...) kwargs with

    {**eval_kwargs, ..., "project_id": eval_data.get("project_id")}

The trailing key always wins in dict-spread merging, so a request body
that omits project_id silently overrode the registered evaluator's
project_id to None. EvalAsync(name=..., project_id=None) then fell back
to using the eval name as the project name (per Eval(... project_id) docs:
"If specified, uses the given project ID instead of the evaluator's name
to identify the project."), so experiments routed into a per-evaluator-name
auto-created project instead of the project the evaluator was registered
against.

Use evaluator.project_id as a fallback when the request omits it. An
explicit project_id in the request still takes precedence.

Tests:
- test_eval_falls_back_to_evaluator_project_id_when_request_omits_it —
  registers an evaluator with a known project_id, posts /eval without
  project_id, asserts EvalAsync receives the registered id.
- test_eval_request_project_id_overrides_evaluator — confirms an
  explicit request-level project_id still wins.
@willfrey Will Frey (willfrey) force-pushed the fix/devserver-honor-evaluator-project-id branch from bff47a1 to 69505ff Compare April 29, 2026 19:07
@willfrey
Copy link
Copy Markdown
Contributor Author

Updated to use the cleaner two-arg dict.get() form:

project_id = eval_data.get("project_id", evaluator.project_id)

The previous ... or evaluator.project_id form treated empty string the same as a missing key, which conflated "absent" with "falsy" — surprising for a literal value the request explicitly sent. The two-arg form expresses the intent more directly: fall back only when the key is absent. The validator in schemas.py already rejects non-string project_id values, so None is impossible through this path.

Both tests still pass (the regression test omits the key; the override test passes a non-empty string — neither exercises the empty-string edge case where the two forms diverge).

@AbhiPrasad
Copy link
Copy Markdown
Member

Abhijeet Prasad (AbhiPrasad) commented May 1, 2026

gonna push up a small adjustment for this, and we can merge it in. As always, thanks for the PR Will Frey (@willfrey)!

The previous ... or evaluator.project_id form treated empty string the same as a missing key, which conflated "absent" with "falsy" — surprising for a literal value the request explicitly sent. The two-arg form expresses the intent more directly: fall back only when the key is absent. The validator in schemas.py already rejects non-string project_id values, so None is impossible through this path.

I know this is more correct, but our backend implementation actually uses or, so I'm swapping out for this and adding a test for it accordingly. Just feels safer to have this if things change.

@willfrey
Copy link
Copy Markdown
Contributor Author

Thank you! Adjust away :) I appreciate your responsiveness!

@AbhiPrasad Abhijeet Prasad (AbhiPrasad) enabled auto-merge (squash) May 1, 2026 19:31
@AbhiPrasad Abhijeet Prasad (AbhiPrasad) merged commit 7c04444 into braintrustdata:main May 1, 2026
82 checks passed
Abhijeet Prasad (AbhiPrasad) added a commit that referenced this pull request May 4, 2026
@AbhiPrasad
Copy link
Copy Markdown
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants