feat(eval): multi-turn conversation mode with turn-by-turn evaluation by christso · Pull Request #1054 · EntityProcess/agentv

christso · 2026-04-12T10:32:22Z

Summary

Adds `mode: conversation` with a `turns` array for live, turn-by-turn LLM evaluation, graded both per turn and at the conversation level.

Changes

Types: ConversationTurn, ConversationMode, ConversationAggregation, TurnFailurePolicy
Zod schema updates for new YAML fields
YAML parser support for conversation turns
Conversation runner in orchestrator with turn-by-turn provider calls
Score aggregation (mean/min/max), on_turn_failure (continue/stop), window_size
Cross-field validation rules
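
As an illustration of the score aggregation item above, here is a minimal sketch of how per-turn scores might be folded into one conversation score. Only the `ConversationAggregation` type name, the three modes, and mean being the default come from this PR; the function name `aggregateTurnScores` and the empty-list behavior are assumptions, not the merged implementation.

```typescript
// Sketch only: aggregateTurnScores is a hypothetical name, not the PR's actual function.
type ConversationAggregation = "mean" | "min" | "max";

function aggregateTurnScores(
  scores: number[],
  aggregation: ConversationAggregation = "mean", // mean is the default per the PR
): number {
  // Assumption: an empty score list is treated as an error rather than scored.
  if (scores.length === 0) {
    throw new Error("cannot aggregate an empty list of turn scores");
  }
  if (aggregation === "min") {
    // Weakest-link: one bad turn drags the whole conversation down.
    return Math.min(...scores);
  }
  if (aggregation === "max") {
    return Math.max(...scores);
  }
  // Arithmetic mean over all graded turns.
  return scores.reduce((sum, s) => sum + s, 0) / scores.length;
}
```

With `min`, a conversation is only as good as its worst turn, which suits cases where any single bad reply should fail the test.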

Verification

bun run typecheck ✅
bun run build ✅
bun run test ✅ (all 1944 tests pass)

…tion

Implements issue #1052: support for evaluating multi-turn conversations where the agent generates each assistant turn with per-turn grading.

- Add ConversationTurn type, mode, turns, aggregation, on_turn_failure, window_size to EvalTest
- Zod schema and YAML parser updates for new fields
- Turn-by-turn loop in orchestrator: accumulate messages, call provider, grade, repeat
- Conversation assertions run after all turns
- Aggregation: mean (default), min (weakest-link), max
- String shorthand in per-turn assertions works identically to top-level
- Cross-field validation (turns requires mode:conversation, etc.)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
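
The turn-by-turn loop described in this commit (accumulate messages, call provider, grade, repeat) might look roughly like the sketch below. Every name here (`runConversation`, `RunOptions`, `provider`, `gradeTurn`, `passThreshold`) is hypothetical, as is the choice of `continue` as the default failure policy. The sketch is synchronous for brevity, whereas a real provider call would be asynchronous, and `window_size` is left out because its semantics are not described here.

```typescript
// Sketch only: all names are illustrative assumptions, not the orchestrator's real API.
type Message = { role: "user" | "assistant"; content: string };
type Turn = { input: string };
type TurnFailurePolicy = "continue" | "stop";

interface RunOptions {
  turns: Turn[];
  provider: (messages: Message[]) => string; // stand-in for the LLM call
  gradeTurn: (reply: string, turnIndex: number) => number; // score in [0, 1]
  onTurnFailure?: TurnFailurePolicy; // assumed default: "continue"
  passThreshold?: number; // assumed cutoff below which a turn counts as failed
}

function runConversation(opts: RunOptions): { messages: Message[]; scores: number[] } {
  const { turns, provider, gradeTurn, onTurnFailure = "continue", passThreshold = 0.5 } = opts;
  const messages: Message[] = [];
  const scores: number[] = [];

  for (let i = 0; i < turns.length; i++) {
    // 1. Accumulate: append the scripted user input to the running history.
    messages.push({ role: "user", content: turns[i].input });
    // 2. Call the provider with a copy of the full history so far.
    const reply = provider([...messages]);
    messages.push({ role: "assistant", content: reply });
    // 3. Grade this turn's reply.
    const score = gradeTurn(reply, i);
    scores.push(score);
    // 4. Stop early only when the policy says so.
    if (score < passThreshold && onTurnFailure === "stop") break;
  }
  return { messages, scores };
}
```

Conversation-level assertions would then run once on the returned `messages`, after the loop has finished.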

cloudflare-workers-and-pages · 2026-04-12T10:32:52Z

Deploying agentv with Cloudflare Pages

Latest commit:	`8affcd0`
Status:	✅ Deploy successful!
Preview URL:	https://9530562a.agentv.pages.dev
Branch Preview URL:	https://feat-1052-conversation-mode.agentv.pages.dev


Adds examples/features/multi-turn-conversation-live/ with 5 test cases exercising conversation mode features: context retention, aggregation modes, on_turn_failure, mixed assertions, and conversation-level assertions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Tests for conversation-mode orchestrator, validation rules, and score aggregation (mean/min/max).

Also fixes buildTurnAssertions to emit type: 'llm-grader' with rubrics instead of type: 'rubrics' (which is not registered in the builtin registry). The evaluator-parser uses the same pattern for YAML-sourced rubrics.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
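
To make the buildTurnAssertions fix concrete, here is a rough sketch of how string shorthand could be collected into a single `llm-grader` assertion carrying `rubrics`. The function name `buildTurnAssertionsSketch`, the `AssertionConfig` shape, and the decision to group all strings into one grader are assumptions for illustration. Only the `type: 'llm-grader'` with rubrics output comes from the commit message.

```typescript
// Sketch only: shapes and names are assumptions, not the real agentv types.
type AssertionConfig = { type: string; [key: string]: unknown };
type TurnAssertionInput = string | AssertionConfig;

function buildTurnAssertionsSketch(inputs: TurnAssertionInput[]): AssertionConfig[] {
  const rubrics: string[] = [];
  const explicit: AssertionConfig[] = [];

  for (const item of inputs) {
    if (typeof item === "string") {
      rubrics.push(item); // string shorthand becomes a rubric line
    } else {
      explicit.push(item); // already a full assertion config, kept as-is
    }
  }

  const result: AssertionConfig[] = [];
  if (rubrics.length > 0) {
    // Emit a registered evaluator type; per the commit, 'rubrics' is not
    // registered in the builtin registry, so it must not be used as the type.
    result.push({ type: "llm-grader", rubrics });
  }
  return [...result, ...explicit];
}
```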

- YAML loader: include `turns` in completeness gate so conversation-only cases (no top-level criteria/assertions) are not silently skipped
- Orchestrator: stop falling back to evalCase.assertions per-turn — turns without own assertions score 1.0 instead of double-counting top-level
- Orchestrator: pass full transcript as candidate for conversation-level grading instead of only the last assistant reply
- Orchestrator: serialize structured message content with JSON.stringify instead of producing [object Object] in transcript strings
- Validator: reject whitespace-only and empty-array turn inputs
- Tests: add regression coverage for double-counting, transcript candidate, and whitespace input validation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
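
Two of the fixes above, transcript serialization and turn-input validation, can be illustrated with a small sketch. The names `contentToString`, `formatTranscript`, and `isValidTurnInput`, and the `role: content` line format, are hypothetical. The sketch only shows why `JSON.stringify` avoids `[object Object]` and what rejecting whitespace-only or empty-array inputs might look like.

```typescript
// Sketch only: names and the "role: content" layout are illustrative assumptions.
type MessageContent = string | unknown[] | Record<string, unknown>;
type TranscriptMessage = { role: string; content: MessageContent };

function contentToString(content: MessageContent): string {
  // Interpolating an object into a template string yields "[object Object]";
  // JSON.stringify keeps structured content readable in the transcript.
  return typeof content === "string" ? content : JSON.stringify(content);
}

function formatTranscript(messages: TranscriptMessage[]): string {
  return messages.map((m) => `${m.role}: ${contentToString(m.content)}`).join("\n");
}

function isValidTurnInput(input: unknown): boolean {
  if (typeof input === "string") return input.trim().length > 0; // rejects "" and "   "
  if (Array.isArray(input)) return input.length > 0; // rejects []
  return false;
}
```

Passing the output of `formatTranscript` to the conversation-level grader gives it the whole exchange, not just the final assistant reply.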

christso and others added 3 commits April 12, 2026 10:36

christso marked this pull request as ready for review April 12, 2026 12:48

christso merged commit bdcd007 into main Apr 12, 2026
4 checks passed

christso deleted the feat/1052-conversation-mode branch April 12, 2026 12:48

christso merged 4 commits into main from feat/1052-conversation-mode
