feat: Add hosted evals commands by d42me · Pull Request #303 · PrimeIntellect-ai/prime

d42me · 2026-01-13T23:09:37Z

Note

Medium Risk
Adds new CLI flows that create/cancel remote evaluations and handle polling/log streaming; risk is mainly around API contract correctness, error handling, and user-facing prompts, but changes are isolated to eval-related commands.

Overview
Adds hosted evaluation support to prime eval / prime env eval. A new --hosted mode starts evaluations on the platform, resolving owner/name from the slug or from local metadata and optionally prompting to run prime env push if the environment is missing. It supports streaming status and logs via polling, plus hosted-only options (timeouts, sandbox/instances access, custom secrets, naming).
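As a rough illustration of the owner/name resolution described above, here is a minimal sketch. The helper name `resolve_owner_and_name` and the `owner` / `name` metadata keys are assumptions made for this example; the PR's actual logic is not shown on this page and may differ.

```python
from typing import Mapping, Optional, Tuple


def resolve_owner_and_name(
    slug: Optional[str],
    local_metadata: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Return (owner, name) from an "owner/name" slug, else from local metadata.

    Hypothetical helper: the key names and fallback order are assumptions.
    """
    # Prefer an explicit "owner/name" slug when one is given.
    if slug and slug.count("/") == 1:
        owner, name = (part.strip() for part in slug.split("/"))
        if owner and name:
            return owner, name
    # Otherwise fall back to metadata stored alongside the local environment.
    if local_metadata:
        owner = local_metadata.get("owner")
        name = local_metadata.get("name")
        if owner and name:
            return owner, name
    # At this point a CLI could prompt the user to push the environment.
    raise ValueError("could not resolve environment owner/name")
```

A caller could catch the `ValueError` and offer to run prime env push before retrying.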

Introduces new prime eval management commands for hosted runs: logs (tail/follow, with status-aware termination and rate-limit handling), stop (cancel), and pull (export completed results to disk in verifiers format as metadata.json and results.jsonl). Output UX is improved by standardizing viewer URLs via get_eval_viewer_url. Tests add coverage for the hosted log cleaning/streaming helpers, and _load_eval_directory is renamed to _load_verifiers_format.
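The log-following behaviour summarized above (polling, stopping on a terminal status, backing off when rate limited) might look roughly like the sketch below. Everything here is hypothetical: the status names, the `RateLimited` exception, and the `fetch` callable stand in for whatever the real API client provides.

```python
import time
from typing import Callable, Iterator, List, Tuple

# Assumed status names; the platform's real enum may differ.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}


class RateLimited(Exception):
    """Stand-in for an HTTP 429 response from a hypothetical API client."""


def follow_logs(
    fetch: Callable[[], Tuple[str, List[str]]],
    poll_interval: float = 2.0,
    max_backoff: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield new log lines until the evaluation reaches a terminal status."""
    seen = 0
    delay = poll_interval
    while True:
        try:
            # fetch() returns (status, all log lines so far).
            status, lines = fetch()
        except RateLimited:
            # Double the wait (capped at max_backoff) and retry.
            delay = min(delay * 2, max_backoff)
            sleep(delay)
            continue
        delay = poll_interval  # reset after a successful poll
        # Emit only the lines that have not been shown yet.
        for line in lines[seen:]:
            yield line
        seen = max(seen, len(lines))
        if status in TERMINAL_STATUSES:
            return
        sleep(delay)
```

Injecting `sleep` keeps the loop testable without real waiting; a CLI would leave the default in place.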

Written by Cursor Bugbot for commit 34cda1b. This will update automatically on new commits.

packages/prime/src/prime_cli/utils/hosted_eval.py

packages/prime/src/prime_cli/commands/env.py

packages/prime/src/prime_cli/utils/eval_push.py

packages/prime/src/prime_cli/commands/evals.py


cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.


packages/prime/src/prime_cli/commands/evals.py

packages/prime/src/prime_cli/utils/hosted_eval.py

manveerxyz · 2026-02-08T23:52:33Z

packages/prime/src/prime_cli/utils/hosted_eval.py

+        console.print(
+            f"[dim]Configuration: num_examples={config.num_examples}, "
+            f"rollouts_per_example={config.rollouts_per_example}[/dim]"
+        )


It would be nice to follow the same output structure as on prime rl run

(cli) cli % prime rl run test-dev.toml
Loading config from test-dev.toml

Configuration:
  Model & Environment
    Model: PrimeIntellect/Qwen3-0.6B-Reverse-Text-SFT
    Environments: d42me/reverse-text
  Training
    Max Steps: 100
    Batch Size: 128
    Rollouts per Example: 8
  Sampling
    Max Tokens: 2048
  Evaluation
    Environments: manveerxyz/meow
    Interval: 10

Checking Environment Actions...
✓ d42me/reverse-text (success)

Creating RL training run...
✓ Run created successfully!

Monitor run at:
  https://<REDACTED>/dashboard/training/pbghzxpwjc6o6z4al5whc8md

View logs with:
  prime rl logs pbghzxpwjc6o6z4al5whc8md -f
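One way the hosted eval output could adopt the grouped layout the reviewer points to is a small formatter like the sketch below. The function name `format_config`, the section heading, and the sample values are illustrative assumptions, not code from this PR or from prime rl run.

```python
from typing import Mapping


def format_config(sections: Mapping[str, Mapping[str, object]]) -> str:
    """Render config values grouped under indented section headings."""
    lines = ["Configuration:"]
    for heading, values in sections.items():
        lines.append(f"  {heading}")
        for key, value in values.items():
            lines.append(f"    {key}: {value}")
    return "\n".join(lines)


# Hypothetical values, shown only to illustrate the layout.
print(format_config({"Evaluation": {"Num Examples": 5, "Rollouts per Example": 3}}))
```

Returning a string rather than printing directly keeps the layout easy to unit test and lets the caller decide how to style it.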

manveerxyz · 2026-02-08T23:54:14Z

packages/prime/src/prime_cli/commands/env.py

+                console.print("[green]✓ Hosted evaluation started[/green]")
+                console.print(f"\n[cyan]Evaluation ID:[/cyan] {eval_id}")
+                console.print(f"\n[dim]View logs with:[/dim]   prime eval logs {eval_id} -f")
+                console.print(f"[dim]Stop eval with:[/dim]     prime eval stop {eval_id}")


there's an extra space here that's kind of annoying haha
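The comment does not say which whitespace is meant. One plausible reading is the hand-typed padding after the two labels, which differs (three spaces versus five) even though both labels are the same length. Under that assumption, a sketch that computes the padding instead could look like this; `format_hints` is a hypothetical helper and `abc` is a placeholder evaluation ID.

```python
from typing import List, Sequence, Tuple


def format_hints(hints: Sequence[Tuple[str, str]], gap: int = 2) -> List[str]:
    """Pad labels to a common width, measured before markup is added."""
    width = max(len(label) for label, _ in hints) + gap
    return [
        # Padding sits outside the [dim] tags so markup does not skew the width.
        f"[dim]{label}[/dim]{' ' * (width - len(label))}{command}"
        for label, command in hints
    ]


for line in format_hints(
    [
        ("View logs with:", "prime eval logs abc -f"),
        ("Stop eval with:", "prime eval stop abc"),
    ]
):
    print(line)
```

Each returned string could then be passed to console.print as in the diff above.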

d42me · 2026-02-10T01:45:07Z

Closing and merging features to #368

d42me marked this pull request as ready for review January 15, 2026 15:24

cursor bot reviewed Jan 16, 2026

packages/prime/src/prime_cli/utils/hosted_eval.py (outdated, resolved)

cursor bot reviewed Jan 29, 2026

packages/prime/src/prime_cli/commands/env.py (resolved)

cursor bot reviewed Jan 31, 2026

packages/prime/src/prime_cli/commands/env.py (outdated, resolved)

packages/prime/src/prime_cli/utils/eval_push.py (outdated, resolved)

d42me added 5 commits February 5, 2026 17:14

Add hosted evals options.

a781736

Update endpoints:

852f7c0

Improvements.

372f6ac

Add separate logs command. Refactor enums.

6fd2f68

WIP: Add prime eval pull commands. Add timeout command.

0e7f627

d42me force-pushed the feature/hosted-evals branch from a3fa3c1 to 0e7f627 February 6, 2026 01:15

cursor bot reviewed Feb 6, 2026

packages/prime/src/prime_cli/commands/evals.py (resolved)

Removed extend-timeout command. Auto-resolving environment slugs. Status check for prime eval logs.

e1f8295

d42me requested review from JannikSt and manveerxyz February 6, 2026 03:19

Fix test.

dc421ec

cursor bot reviewed Feb 6, 2026

packages/prime/src/prime_cli/commands/evals.py (resolved)

packages/prime/src/prime_cli/utils/hosted_eval.py (resolved)

Check status before prime eval pull.

34cda1b

d42me changed the title from "feat: Add hosted evals options." to "feat: Add hosted evals commands" Feb 6, 2026

manveerxyz reviewed Feb 9, 2026


d42me closed this Feb 10, 2026

d42me wants to merge 8 commits into main from feature/hosted-evals


2 participants
