feat: Add hosted evals commands #303

Closed
d42me wants to merge 8 commits into main from feature/hosted-evals

@d42me d42me commented Jan 13, 2026

Note

Medium Risk
Adds new CLI flows that create/cancel remote evaluations and handle polling/log streaming; risk is mainly around API contract correctness, error handling, and user-facing prompts, but changes are isolated to eval-related commands.

Overview
Adds hosted evaluation support to prime eval / prime env eval: a new --hosted mode starts evaluations on the platform (resolving owner/name from slug or local metadata, optionally prompting to prime env push if missing), and supports streaming status/logs with polling plus hosted-only options (timeouts, sandbox/instances access, custom secrets, naming).
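
The polling flow described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `poll_status`, `RateLimited`, the status names, and the backoff values are all assumptions, and the fetch callable stands in for whatever API call the CLI actually makes.

```python
import time
from typing import Callable, Iterator

# Assumed terminal states; the platform may use different names.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}


class RateLimited(Exception):
    """Hypothetical error raised by the fetch callable on a rate-limit response."""


def poll_status(
    fetch_status: Callable[[], str],
    interval: float = 5.0,
    max_backoff: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield each newly observed status until a terminal one is reached."""
    delay = interval
    last = None
    while True:
        try:
            status = fetch_status()
        except RateLimited:
            # Back off exponentially (capped) and retry the same request.
            delay = min(delay * 2, max_backoff)
            sleep(delay)
            continue
        delay = interval  # a successful call resets the backoff
        if status != last:
            yield status
            last = status
        if status in TERMINAL_STATUSES:
            return
        sleep(delay)
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; a caller would print each yielded status and stop once the generator is exhausted.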

Introduces new prime eval management commands for hosted runs: logs (tail/follow with status-aware termination + rate-limit handling), stop (cancel), and pull (export completed results into verifiers metadata.json + results.jsonl on disk). Output UX is improved by standardizing viewer URLs via get_eval_viewer_url, and tests add coverage for hosted log cleaning/streaming helpers and rename _load_eval_directory to _load_verifiers_format.
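
For readers unfamiliar with the on-disk layout that `pull` targets, a minimal sketch of writing a `metadata.json` plus `results.jsonl` pair might look like this. The helper name `write_verifiers_format` and the fields in the sample data are hypothetical; only the two file names come from the description above.

```python
import json
from pathlib import Path
from typing import Any, Iterable


def write_verifiers_format(
    out_dir: str,
    metadata: dict[str, Any],
    samples: Iterable[dict[str, Any]],
) -> tuple[Path, Path]:
    """Write one JSON metadata file and one JSON-lines results file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    metadata_path = out / "metadata.json"
    results_path = out / "results.jsonl"

    metadata_path.write_text(json.dumps(metadata, indent=2) + "\n", encoding="utf-8")

    # One JSON object per line, so results can be streamed back later.
    with results_path.open("w", encoding="utf-8") as fh:
        for sample in samples:
            fh.write(json.dumps(sample) + "\n")

    return metadata_path, results_path
```

Writing results as JSON lines means a partially written file still has valid complete rows, which is convenient if an export is interrupted.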

Written by Cursor Bugbot for commit 34cda1b. This will update automatically on new commits.

@d42me d42me marked this pull request as ready for review January 15, 2026 15:24
@d42me d42me force-pushed the feature/hosted-evals branch from a3fa3c1 to 0e7f627 Compare February 6, 2026 01:15
@d42me d42me requested review from JannikSt and manveerxyz February 6, 2026 03:19
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

@d42me d42me changed the title from "feat: Add hosted evals options." to "feat: Add hosted evals commands" Feb 6, 2026
Comment on lines +140 to +143
console.print(
    f"[dim]Configuration: num_examples={config.num_examples}, "
    f"rollouts_per_example={config.rollouts_per_example}[/dim]"
)
Member

It would be nice to follow the same output structure as prime rl run:

(cli) cli % prime rl run test-dev.toml 
Loading config from test-dev.toml

Configuration:

Model & Environment
  Model:        PrimeIntellect/Qwen3-0.6B-Reverse-Text-SFT
  Environments: d42me/reverse-text

Training
  Max Steps:           100
  Batch Size:          128
  Rollouts per Example: 8

Sampling
  Max Tokens:  2048

Evaluation
  Environments: manveerxyz/meow
  Interval:     10

Checking Environment Actions...
  ✓ d42me/reverse-text (success)

Creating RL training run...

✓ Run created successfully!

Monitor run at:
  https://<REDACTED>/dashboard/training/pbghzxpwjc6o6z4al5whc8md

View logs with:
  prime rl logs pbghzxpwjc6o6z4al5whc8md -f
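
One possible way to get that grouped layout for the eval config is a small formatter like the sketch below. It is only a suggestion: `format_config_sections` is a hypothetical helper, the section and label names in the usage are made up, and the resulting lines would still need to be passed to `console.print`.

```python
def format_config_sections(sections: dict[str, dict[str, object]]) -> list[str]:
    """Render titled groups of label/value pairs with values aligned per group."""
    lines = ["Configuration:", ""]
    for title, fields in sections.items():
        lines.append(title)
        # Pad every "Label:" to the widest label in this group (+1 for the colon).
        width = max((len(label) for label in fields), default=0) + 1
        for label, value in fields.items():
            lines.append(f"  {(label + ':').ljust(width)} {value}")
        lines.append("")  # blank line between groups
    return lines


if __name__ == "__main__":
    # Hypothetical section and field names, for illustration only.
    for line in format_config_sections(
        {"Evaluation": {"Examples": 5, "Rollouts per Example": 3}}
    ):
        print(line)
```

Computing the width per group rather than globally keeps short sections compact, which matches how the Model & Environment and Sampling blocks above are each aligned on their own.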

console.print("[green]✓ Hosted evaluation started[/green]")
console.print(f"\n[cyan]Evaluation ID:[/cyan] {eval_id}")
console.print(f"\n[dim]View logs with:[/dim] prime eval logs {eval_id} -f")
console.print(f"[dim]Stop eval with:[/dim] prime eval stop {eval_id}")
Member

there's an extra space here that's kind of annoying haha

d42me commented Feb 10, 2026

Closing this and merging the features into #368.

@d42me d42me closed this Feb 10, 2026