feat(eval): dependency-aware eval ordering with DAG wave scheduler by christso · Pull Request #1051 · EntityProcess/agentv

christso · 2026-04-12T08:11:12Z

Summary

Closes #331

- Add depends_on and on_dependency_failure fields to EvalTest for declaring inter-test dependencies
- Replace flat pLimit + Promise.allSettled dispatch with a DAG-aware wave scheduler that topologically sorts tests and dispatches independent tests in parallel per wave
- Inject dependency_results into evaluator context so downstream assertions can access prior test outputs
- Validate dependency graph at load time (cycle detection, missing IDs, self-dependencies)
- Support three failure policies: skip (default), fail, run
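The wave computation described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `TestNode` shape is a hypothetical reduction of EvalTest, and the dependencies given to `consistency-check` are assumed for the example. It also assumes the graph has already been validated.

```typescript
// Hypothetical minimal shape; the real EvalTest carries more fields.
interface TestNode {
  id: string;
  depends_on?: string[];
}

// Group tests into waves: a test lands in the first wave in which all of its
// dependencies have already been scheduled. Assumes a validated graph.
function computeWaves(tests: TestNode[]): string[][] {
  const remaining = new Map<string, Set<string>>();
  for (const t of tests) remaining.set(t.id, new Set(t.depends_on ?? []));

  const waves: string[][] = [];
  while (remaining.size > 0) {
    // Ready = none of the test's dependencies are still unscheduled.
    const ready = Array.from(remaining.entries())
      .filter(([, deps]) => Array.from(deps).every((d) => !remaining.has(d)))
      .map(([id]) => id);
    if (ready.length === 0) {
      // Only reachable if a cycle slipped past validation.
      throw new Error(`Unschedulable tests: ${Array.from(remaining.keys()).join(", ")}`);
    }
    for (const id of ready) remaining.delete(id);
    waves.push(ready);
  }
  return waves;
}

const waves = computeWaves([
  { id: "backend-api" },
  { id: "frontend-ui" },
  { id: "integration", depends_on: ["backend-api"] },
  // Assumed dependencies, for illustration only.
  { id: "consistency-check", depends_on: ["backend-api", "frontend-ui"] },
]);
console.log(waves);
// [["backend-api", "frontend-ui"], ["integration", "consistency-check"]]
```

Tests with no `depends_on` all fall into the first wave, which matches the "single wave" behaviour for suites that declare no dependencies.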

Schema additions

tests:
  - id: backend-api
    criteria: Build REST API endpoints
    input: Implement CRUD for tasks

  - id: integration
    criteria: Frontend calls backend correctly
    input: Verify API integration
    depends_on: [backend-api]
    on_dependency_failure: skip  # skip | fail | run
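
For readers who prefer types, the two new fields might be modelled like this. The type and function names here (`DependencyFailurePolicy`, `EvalTestDeps`, `normalizeDeps`) are hypothetical; only the field names and the `skip` default come from the description above.

```typescript
// The three policies listed in the summary.
type DependencyFailurePolicy = "skip" | "fail" | "run";

// Hypothetical subset of EvalTest covering only the dependency fields.
interface EvalTestDeps {
  id: string;
  depends_on?: string[];
  on_dependency_failure?: DependencyFailurePolicy;
}

// Fill in defaults: no dependencies, and "skip" when no policy is declared.
function normalizeDeps(test: EvalTestDeps): Required<EvalTestDeps> {
  return {
    id: test.id,
    depends_on: test.depends_on ?? [],
    on_dependency_failure: test.on_dependency_failure ?? "skip",
  };
}

console.log(normalizeDeps({ id: "integration", depends_on: ["backend-api"] }));
// { id: "integration", depends_on: ["backend-api"], on_dependency_failure: "skip" }
```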

Test plan

E2E red/green evidence

Red (main): All 4 tests dispatch simultaneously. integration and consistency-check run without waiting for their dependencies; depends_on is ignored entirely.

Green (feature branch): Wave scheduler respects dependencies. Wave 1 runs backend-api and frontend-ui in parallel. Wave 2 waits for wave 1 to complete, then applies failure policies: integration (skip policy) is skipped, consistency-check (fail policy) is marked failed. Error reason code is dependency_failed.
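
The green behaviour can be approximated with a small dispatch loop. This is a sketch under assumptions, not the scheduler in this PR: `runWaves`, `WaveTest`, and `WaveResult` are hypothetical names, the concurrency limit (pLimit) is omitted, and per-test errors are caught inline rather than through Promise.allSettled.

```typescript
type Policy = "skip" | "fail" | "run";
interface WaveTest { id: string; depends_on?: string[]; on_dependency_failure?: Policy }
interface WaveResult {
  id: string;
  status: "ok" | "skipped" | "failed" | "execution_error";
  reason?: string;
}

// Run waves one after another; tests inside a wave start together.
async function runWaves(
  waves: WaveTest[][],
  runTest: (t: WaveTest) => Promise<void>,
): Promise<Map<string, WaveResult>> {
  const done = new Map<string, WaveResult>();
  for (const wave of waves) {
    const results = await Promise.all(
      wave.map(async (t): Promise<WaveResult> => {
        // A dependency counts as broken if it did not complete normally.
        const depBroken = (t.depends_on ?? []).some((d) => done.get(d)?.status !== "ok");
        const policy = t.on_dependency_failure ?? "skip";
        if (depBroken && policy !== "run") {
          const status = policy === "skip" ? "skipped" : "failed";
          return { id: t.id, status, reason: "dependency_failed" };
        }
        try {
          await runTest(t);
          return { id: t.id, status: "ok" };
        } catch {
          return { id: t.id, status: "execution_error" };
        }
      }),
    );
    // Results become visible to later waves only after the whole wave settles.
    for (const r of results) done.set(r.id, r);
  }
  return done;
}

// Demo: "a" crashes, so "c" (default skip) is skipped and "d" (fail) is failed.
const demo = runWaves(
  [
    [{ id: "a" }, { id: "b" }],
    [
      { id: "c", depends_on: ["a"] },
      { id: "d", depends_on: ["a"], on_dependency_failure: "fail" },
    ],
  ],
  async (t) => {
    if (t.id === "a") throw new Error("boom");
  },
);
demo.then((r) => console.log(Array.from(r.values()).map((x) => `${x.id}:${x.status}`)));
// ["a:execution_error", "b:ok", "c:skipped", "d:failed"]
```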

Generated with Claude Code

#331) Add `depends_on` and `on_dependency_failure` fields to EvalTest, enabling multi-agent swarm evaluation with dependency ordering between tests.

The flat pLimit + Promise.allSettled dispatch is replaced with a DAG-aware wave scheduler that:

- Validates the dependency graph (rejects cycles, missing IDs, self-deps)
- Computes execution waves via topological sort
- Dispatches independent tests in parallel within each wave
- Tracks completed results for downstream context injection
- Supports three failure policies: skip (default), fail, run
- Injects `dependency_results` into evaluator context for dependent tests

Tests without `depends_on` behave identically to before (single wave).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
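
The validation step named in this commit (cycles, missing IDs, self-dependencies) could look roughly like the sketch below. It borrows the name validateDependencyGraph for readability, but the signature, the `GraphTest` shape, and the error strings are assumptions, not the PR's code.

```typescript
// Hypothetical minimal shape for validation purposes.
interface GraphTest { id: string; depends_on?: string[] }

// Returns human-readable problems; an empty array means the graph is valid.
function validateDependencyGraph(tests: GraphTest[]): string[] {
  const errors: string[] = [];
  const byId = new Map<string, GraphTest>();
  for (const t of tests) byId.set(t.id, t);

  // Self-dependencies and references to unknown test IDs.
  for (const t of tests) {
    for (const dep of t.depends_on ?? []) {
      if (dep === t.id) errors.push(`${t.id}: depends on itself`);
      else if (!byId.has(dep)) errors.push(`${t.id}: unknown dependency "${dep}"`);
    }
  }

  // Depth-first search; meeting a node that is still "visiting" means a cycle.
  const state = new Map<string, "visiting" | "done">();
  const visit = (id: string, path: string[]): void => {
    if (state.get(id) === "done") return;
    if (state.get(id) === "visiting") {
      const cycle = [...path.slice(path.indexOf(id)), id];
      errors.push(`cycle: ${cycle.join(" -> ")}`);
      return;
    }
    state.set(id, "visiting");
    for (const dep of byId.get(id)?.depends_on ?? []) {
      // Self and missing deps were reported above; skip them here.
      if (dep !== id && byId.has(dep)) visit(dep, [...path, id]);
    }
    state.set(id, "done");
  };
  for (const t of tests) visit(t.id, []);
  return errors;
}

console.log(validateDependencyGraph([
  { id: "a", depends_on: ["b"] },
  { id: "b", depends_on: ["a"] },
]));
// ["cycle: a -> b -> a"]
```

Collecting every problem instead of throwing on the first one is a design choice made for this sketch; it lets a load-time check report all graph errors in a single pass.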

- Only treat execution_error (not quality_failure) as dependency failure.
  A dependency test scoring 0.2 now correctly allows downstream to run.
  Only actual crashes/errors trigger skip/fail policies.
- Add duplicate test ID validation in validateDependencyGraph
- Add defensive assertion in computeWaves for unscheduled tests
- Deduplicate skip/fail result construction into single branch
- Add tests: transitive cascade (A->B->C), quality_failure distinction

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
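
The distinction in this commit, and the transitive cascade it tests, can be illustrated with a small sketch. The `DepResult` shape and both function names are hypothetical, and recording a blocked test as an execution_error with reason dependency_failed is an assumption about how the cascade propagates.

```typescript
// Hypothetical result shape; field names are assumptions for illustration.
interface DepResult {
  status: "ok" | "quality_failure" | "execution_error";
  score?: number;
  reason?: string;
}

// Only a crash/error in a dependency blocks downstream tests; a low score does not.
function blocksDownstream(result: DepResult | undefined): boolean {
  return result === undefined || result.status === "execution_error";
}

// Resolve a linear chain in order. A test whose predecessor is blocked gets a
// dependency_failed result, which in turn blocks its own successor (A -> B -> C).
function resolveChain(ids: string[], own: Record<string, DepResult>): Record<string, DepResult> {
  const out: Record<string, DepResult> = {};
  ids.forEach((id, i) => {
    const blocked = i > 0 && blocksDownstream(out[ids[i - 1]]);
    out[id] = blocked ? { status: "execution_error", reason: "dependency_failed" } : own[id];
  });
  return out;
}

// A low score (0.2) is a quality failure, so B still runs.
const lowScore = resolveChain(["A", "B"], {
  A: { status: "quality_failure", score: 0.2 },
  B: { status: "ok", score: 1 },
});
// A crash in A cascades through B to C.
const crash = resolveChain(["A", "B", "C"], {
  A: { status: "execution_error" },
  B: { status: "ok", score: 1 },
  C: { status: "ok", score: 1 },
});
console.log(lowScore.B.status); // "ok"
console.log(crash.C.reason); // "dependency_failed"
```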

cloudflare-workers-and-pages · 2026-04-12T08:24:27Z

Deploying agentv with Cloudflare Pages

Latest commit:	`04686dc`
Status:	✅ Deploy successful!
Preview URL:	https://4ea079b0.agentv.pages.dev
Branch Preview URL:	https://feat-331-dependency-aware-sc.agentv.pages.dev

christso and others added 2 commits April 12, 2026 08:09

christso marked this pull request as ready for review April 12, 2026 08:35

christso merged commit 80f2a64 into main Apr 12, 2026
4 checks passed

christso deleted the feat/331-dependency-aware-scheduling branch April 12, 2026 08:35

christso mentioned this pull request Apr 12, 2026

feat(eval): multi-turn conversational test case — live turn-by-turn evaluation #1052

Closed

14 tasks
