feat(pipeline): add --experiment flag for labeling eval run conditions #802

Merged
christso merged 1 commit into main from feat/experiment-flag on Mar 28, 2026

@christso
Collaborator

Summary

  • Add --experiment option to pipeline input and pipeline run commands
  • Experiment label flows through manifest.json → pipeline bench → index.jsonl entries + benchmark.json metadata
  • New feature example: examples/features/experiments/
  • Docs: experiment section in running-evals.mdx, experiments workflow in skill-improvement-workflow.mdx
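
The propagation described above can be sketched roughly as follows. This is an illustrative Python sketch, not the code from this PR: only the `experiment` key on the manifest comes from the description, while the function names and the `eval_file`, `test_id`, and `score` fields are hypothetical.

```python
from typing import Any, Optional


def build_index_entries(
    manifest: dict[str, Any], results: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Copy the run-level experiment label, if present, onto each index entry."""
    experiment: Optional[str] = manifest.get("experiment")
    entries = []
    for result in results:
        entry = dict(result)  # shallow copy so the caller's data is not mutated
        if experiment is not None:
            entry["experiment"] = experiment
        entries.append(entry)
    return entries


def build_benchmark_metadata(manifest: dict[str, Any]) -> dict[str, Any]:
    """Build benchmark metadata, including the label only when the run has one."""
    metadata: dict[str, Any] = {"eval_file": manifest.get("eval_file")}  # hypothetical field
    if manifest.get("experiment") is not None:
        metadata["experiment"] = manifest["experiment"]
    return metadata
```

Leaving the key out entirely for unlabeled runs, rather than writing a null value, is assumed here because it matches the "omitted when not provided" behavior listed in the test plan.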

Motivation

Following the convex-evals pattern, an experiment is a run-level label that records conditions (with_skills, without_skills, web_search, etc.) while keeping eval files identical across runs. This enables dashboard filtering and structured A/B comparison.

agentv pipeline run evals/coding-ability.eval.yaml --experiment with_skills
agentv pipeline run evals/coding-ability.eval.yaml --experiment without_skills
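
Once both runs are labeled, an A/B comparison could look something like the sketch below. It assumes each index.jsonl line is a JSON object carrying an `experiment` label and a numeric `score`; the `score` field and the function name are assumptions made for illustration, not the documented schema.

```python
import json
from collections import defaultdict
from statistics import mean


def mean_score_by_experiment(jsonl_text: str) -> dict[str, float]:
    """Group JSONL entries by experiment label and average their scores."""
    scores: dict[str, list[float]] = defaultdict(list)
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        entry = json.loads(line)
        # Runs made without --experiment carry no label; bucket them separately.
        label = entry.get("experiment", "(unlabeled)")
        scores[label].append(entry["score"])  # "score" is a hypothetical field
    return {label: mean(values) for label, values in scores.items()}


# Made-up entries, for illustration only.
sample = "\n".join(
    [
        '{"experiment": "with_skills", "score": 1.0}',
        '{"experiment": "with_skills", "score": 0.5}',
        '{"experiment": "without_skills", "score": 0.5}',
    ]
)
print(mean_score_by_experiment(sample))
```

Because the eval file is identical across runs, any difference between the two groups can be attributed to the run condition rather than to changes in the test cases.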

Test plan

  • 2 new tests: experiment flag writes to manifest, omitted when not provided
  • All 14 pipeline tests pass
  • All 81 results tests pass
  • Eval YAML validates
  • Pre-commit hooks pass (build, typecheck, lint, test, validate)
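
The two new tests check presence and absence of the label in the manifest. A minimal Python sketch of that pair of checks is shown here; `write_manifest` and the `eval_file` field are hypothetical stand-ins, not the project's actual helpers or test code.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def write_manifest(out_dir: Path, eval_file: str, experiment: Optional[str] = None) -> Path:
    """Write manifest.json, adding the experiment key only when a label is given."""
    manifest = {"eval_file": eval_file}  # hypothetical field
    if experiment is not None:
        manifest["experiment"] = experiment
    path = out_dir / "manifest.json"
    path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return path


def check_experiment_written() -> None:
    """With a label, the manifest records it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = write_manifest(Path(tmp), "evals/coding-ability.eval.yaml", "with_skills")
        data = json.loads(path.read_text(encoding="utf-8"))
        assert data["experiment"] == "with_skills"


def check_experiment_omitted() -> None:
    """Without a label, the key is absent rather than null."""
    with tempfile.TemporaryDirectory() as tmp:
        path = write_manifest(Path(tmp), "evals/coding-ability.eval.yaml")
        data = json.loads(path.read_text(encoding="utf-8"))
        assert "experiment" not in data
```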

🤖 Generated with Claude Code

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Mar 28, 2026

Deploying agentv with Cloudflare Pages

Latest commit: 61c2c7a
Status: ⚡️ Build in progress...

Add --experiment option to pipeline input and pipeline run commands.
The label is written to manifest.json and propagated through
pipeline bench into index.jsonl entries and benchmark.json metadata.

- pipeline input: accepts --experiment, writes to manifest
- pipeline run: accepts --experiment, writes to manifest
- pipeline bench: reads manifest.experiment, includes in index entries
- New feature example: examples/features/experiments/
- Docs: add experiment section to running-evals.mdx
- Docs: add experiments workflow to skill-improvement-workflow.mdx
- Tests: 2 new tests for experiment flag presence/absence

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@christso force-pushed the feat/experiment-flag branch from 2b7250f to 61c2c7a on March 28, 2026 05:51
@christso merged commit 443766e into main on Mar 28, 2026
1 of 2 checks passed
@christso deleted the feat/experiment-flag branch on March 28, 2026 05:53