Open
Trace Timestamps & Persistence
Goal
Add startTime/endTime to trace types and persist full traces to disk via --trace flag.
Scope
1. Add startTime/endTime to core type interfaces
Replace timestamp with startTime/endTime on ToolCall, OutputMessage, and ProviderResponse. Add startTime/endTime/llmCallCount to TraceSummary and ExecutionMetrics. Since we have few users, replace timestamp directly (no soft deprecation).
Files:
- packages/core/src/evaluation/providers/types.ts
- packages/core/src/evaluation/trace.ts
- All references to old timestamp field across providers and tests
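As a rough illustration of the type change, the sketch below shows what the interfaces might look like after the replacement. Only the names startTime, endTime, llmCallCount, durationMs, ToolCall, OutputMessage, and TraceSummary come from this issue; the other fields, the ISO 8601 string representation, and the elapsedMs helper are assumptions made for the example.

```typescript
// Hypothetical shapes. The issue does not specify field types; ISO 8601
// strings are assumed here for startTime/endTime.
interface Timed {
  startTime?: string; // replaces the old `timestamp` field
  endTime?: string;
}

interface ToolCall extends Timed {
  tool: string; // assumed field name
  durationMs?: number;
}

interface OutputMessage extends Timed {
  role: string; // assumed field name
  toolCalls?: ToolCall[]; // assumed field name
}

interface TraceSummary extends Timed {
  llmCallCount?: number;
}

// Illustrative helper (not part of the issue): elapsed milliseconds
// between startTime and endTime, or undefined if either is missing/invalid.
function elapsedMs(t: Timed): number | undefined {
  if (!t.startTime || !t.endTime) return undefined;
  const ms = Date.parse(t.endTime) - Date.parse(t.startTime);
  return Number.isNaN(ms) ? undefined : ms;
}
```

Keeping both fields optional is one way to let providers that cannot report timing omit them; whether the real types do this is not stated in the issue.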
2. Update computeTraceSummary to derive timing from spans
- Derive startTime/endTime from message boundaries (earliest start, latest end)
- Compute toolDurations from startTime/endTime when durationMs is not provided
- Count llmCallCount from assistant messages
Files:
- packages/core/src/evaluation/trace.ts
- New: packages/core/test/evaluation/trace-summary.test.ts
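The three derivation rules above could be sketched roughly as follows. This is not the actual computeTraceSummary; the function name summarizeTrace, the message and tool-call shapes, the "assistant" role string, and the per-tool array layout of toolDurations are all assumptions for illustration.

```typescript
// Assumed shapes; only startTime, endTime, durationMs, llmCallCount and
// toolDurations are names taken from the issue.
interface ToolCall { tool: string; startTime?: string; endTime?: string; durationMs?: number; }
interface OutputMessage { role: string; startTime?: string; endTime?: string; toolCalls?: ToolCall[]; }
interface TraceSummary {
  startTime?: string;
  endTime?: string;
  llmCallCount: number;
  toolDurations: Record<string, number[]>;
}

// Hypothetical stand-in for computeTraceSummary.
function summarizeTrace(messages: OutputMessage[]): TraceSummary {
  let earliest: number | undefined;
  let latest: number | undefined;
  let llmCallCount = 0;
  const toolDurations: Record<string, number[]> = {};

  for (const msg of messages) {
    // Rule 3: each assistant message counts as one LLM call.
    if (msg.role === "assistant") llmCallCount += 1;

    // Rule 1: track the earliest start and the latest end across messages.
    const start = msg.startTime ? Date.parse(msg.startTime) : NaN;
    const end = msg.endTime ? Date.parse(msg.endTime) : NaN;
    if (!Number.isNaN(start) && (earliest === undefined || start < earliest)) earliest = start;
    if (!Number.isNaN(end) && (latest === undefined || end > latest)) latest = end;

    // Rule 2: prefer an explicit durationMs, else derive it from the span.
    for (const call of msg.toolCalls ?? []) {
      let duration = call.durationMs;
      if (duration === undefined && call.startTime && call.endTime) {
        const derived = Date.parse(call.endTime) - Date.parse(call.startTime);
        if (!Number.isNaN(derived)) duration = derived;
      }
      if (duration !== undefined) {
        const list = toolDurations[call.tool] ?? [];
        list.push(duration);
        toolDurations[call.tool] = list;
      }
    }
  }

  return {
    startTime: earliest === undefined ? undefined : new Date(earliest).toISOString(),
    endTime: latest === undefined ? undefined : new Date(latest).toISOString(),
    llmCallCount,
    toolDurations,
  };
}
```

Tool calls with neither durationMs nor a usable start/end pair are skipped in this sketch rather than recorded as zero, so missing timing does not skew the durations.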
3. Add --trace flag and TraceWriter for trace persistence
- Add --trace CLI flag that writes full outputMessages to .agentv/traces/ as JSONL
- Add outputMessages to EvaluationResult (optional, stripped before results output)
- TraceWriter writes JSONL trace records with spans derived from outputMessages
Files:
- New: apps/cli/src/commands/eval/trace-writer.ts
- apps/cli/src/commands/eval/index.ts
- apps/cli/src/commands/eval/run-eval.ts
- packages/core/src/evaluation/types.ts
- packages/core/src/evaluation/orchestrator.ts
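A minimal sketch of how a TraceWriter and the stripping step might fit together, assuming Node's built-in fs and path modules. The record layout (evalId, spans, outputMessages), the one-span-per-message-and-tool-call rule, the default file name, and the stripOutputMessages helper are all hypothetical; the issue only fixes the JSONL format, the .agentv/traces/ directory, and the optional outputMessages field.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Assumed shapes, for illustration only.
interface ToolCall { tool: string; startTime?: string; endTime?: string; }
interface OutputMessage { role: string; startTime?: string; endTime?: string; toolCalls?: ToolCall[]; }
interface TraceSpan { name: string; startTime?: string; endTime?: string; }
interface EvaluationResult { evalId: string; score: number; outputMessages?: OutputMessage[]; }

class TraceWriter {
  readonly filePath: string;

  // `dir` would be .agentv/traces/ in the CLI; the file name is an assumption.
  constructor(dir: string, fileName = "trace.jsonl") {
    fs.mkdirSync(dir, { recursive: true });
    this.filePath = path.join(dir, fileName);
  }

  // Appends one JSON object per line (JSONL).
  append(evalId: string, outputMessages: OutputMessage[]): void {
    const spans: TraceSpan[] = [];
    for (const msg of outputMessages) {
      // One span per message, then one per tool call (assumed derivation).
      spans.push({ name: `message:${msg.role}`, startTime: msg.startTime, endTime: msg.endTime });
      for (const call of msg.toolCalls ?? []) {
        spans.push({ name: `tool:${call.tool}`, startTime: call.startTime, endTime: call.endTime });
      }
    }
    const record = { evalId, spans, outputMessages };
    fs.appendFileSync(this.filePath, JSON.stringify(record) + "\n", "utf8");
  }
}

// Drops outputMessages so the regular results output stays compact.
function stripOutputMessages(result: EvaluationResult): Omit<EvaluationResult, "outputMessages"> {
  const { outputMessages: _omitted, ...rest } = result;
  return rest;
}
```

Appending synchronously keeps each record on its own line without buffering logic; a real implementation might prefer a write stream if traces are large.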
Out of scope
Aggregate threshold checks (max_total_duration_ms, max_llm_calls, max_tool_calls) are handled by #103 (execution_metrics evaluator), not this issue.
Related
- feat: add built-in execution_metrics evaluator #103 (aggregate thresholds)
- feat: add duration ms for VS Code eval runs #178