
feat(eval): dependency-aware eval ordering with DAG wave scheduler #1051

Merged
christso merged 2 commits into main from feat/331-dependency-aware-scheduling on Apr 12, 2026
@christso christso commented Apr 12, 2026

Summary

Closes #331

  • Add depends_on and on_dependency_failure fields to EvalTest for declaring inter-test dependencies
  • Replace flat pLimit + Promise.allSettled dispatch with a DAG-aware wave scheduler that topologically sorts tests and dispatches independent tests in parallel per wave
  • Inject dependency_results into evaluator context so downstream assertions can access prior test outputs
  • Validate dependency graph at load time (cycle detection, missing IDs, self-dependencies)
  • Support three failure policies: skip (default), fail, run
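The load-time checks in the list above (cycles, missing IDs, self-dependencies) could look roughly like the following. This is a minimal sketch, not the PR's code: the `TestNode` shape and the function name `findDependencyErrors` are hypothetical, and only the `id` and `depends_on` fields come from the description.

```typescript
// Hypothetical minimal shape; only `id` and `depends_on` are taken from the PR.
interface TestNode {
  id: string;
  depends_on?: string[];
}

// Returns human-readable problems; an empty list means the graph is usable.
function findDependencyErrors(tests: TestNode[]): string[] {
  const errors: string[] = [];
  const byId = new Map<string, TestNode>();
  tests.forEach((t) => byId.set(t.id, t));

  // Self-dependencies and references to unknown ids.
  tests.forEach((t) => {
    (t.depends_on ?? []).forEach((dep) => {
      if (dep === t.id) errors.push(`${t.id}: depends on itself`);
      else if (!byId.has(dep)) errors.push(`${t.id}: unknown dependency "${dep}"`);
    });
  });

  // Depth-first search; reaching a node that is still "visiting" means a cycle.
  const state = new Map<string, "visiting" | "done">();
  const visit = (id: string, path: string[]): void => {
    if (state.get(id) === "done") return;
    if (state.get(id) === "visiting") {
      const cycle = path.slice(path.indexOf(id)).concat(id);
      errors.push(`cycle: ${cycle.join(" -> ")}`);
      return;
    }
    state.set(id, "visiting");
    (byId.get(id)?.depends_on ?? []).forEach((dep) => {
      // Self and missing references were already reported above.
      if (dep !== id && byId.has(dep)) visit(dep, path.concat(id));
    });
    state.set(id, "done");
  };
  tests.forEach((t) => visit(t.id, []));
  return errors;
}
```

Collecting every problem instead of throwing on the first one is a design choice of this sketch; it lets a loader report all graph errors in one pass.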

Schema additions

tests:
  - id: backend-api
    criteria: Build REST API endpoints
    input: Implement CRUD for tasks

  - id: integration
    criteria: Frontend calls backend correctly
    input: Verify API integration
    depends_on: [backend-api]
    on_dependency_failure: skip  # skip | fail | run
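For illustration, the two new fields could be normalized from a parsed YAML entry as shown below. The PR validates with Zod; this hand-rolled stand-in only shows the intended defaults (no dependencies, `skip` policy), and `parseDependencyFields` and `DependencyFields` are hypothetical names.

```typescript
type DependencyFailurePolicy = "skip" | "fail" | "run";
const POLICIES: readonly string[] = ["skip", "fail", "run"];

// Hypothetical normalized form of the two fields added by this PR.
interface DependencyFields {
  depends_on: string[];
  on_dependency_failure: DependencyFailurePolicy;
}

// Reads the fields from one parsed test entry and applies the defaults.
function parseDependencyFields(raw: Record<string, unknown>): DependencyFields {
  const deps = raw["depends_on"] ?? [];
  if (!Array.isArray(deps) || !deps.every((d) => typeof d === "string")) {
    throw new Error("depends_on must be a list of test ids");
  }
  const policy = raw["on_dependency_failure"] ?? "skip"; // skip is the default
  if (typeof policy !== "string" || POLICIES.indexOf(policy) === -1) {
    throw new Error("on_dependency_failure must be skip, fail, or run");
  }
  return {
    depends_on: deps as string[],
    on_dependency_failure: policy as DependencyFailurePolicy,
  };
}
```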

Test plan

  • Tests without depends_on behave identically (no regression) - 1492 core tests pass
  • Circular dependency detected and rejected at load time
  • Missing dependency IDs rejected at load time
  • Self-dependency rejected at load time
  • Independent tests within a wave run in parallel
  • Dependent tests wait for all dependencies to complete
  • on_dependency_failure: skip skips downstream
  • on_dependency_failure: fail marks downstream as failed
  • on_dependency_failure: run executes downstream regardless
  • dependency_results available in evaluator context
  • Multi-level dependency chains work correctly
  • Schema validation updated (Zod + generated JSON schema)
  • E2E verification with real eval (dry-run)

E2E red/green evidence

Red (main): All 4 tests dispatch simultaneously - integration and consistency-check run without waiting for dependencies. depends_on is completely ignored.

Green (feature branch): Wave scheduler respects dependencies. Wave 1 runs backend-api and frontend-ui in parallel. Wave 2 waits for wave 1 to complete, then applies failure policies: integration (skip policy) is skipped, consistency-check (fail policy) is marked failed. Error reason code is dependency_failed.
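The policy step described in the green run can be sketched as a pure decision function. The `Decision` type and `decide` are hypothetical names, not the PR's API; only the three policy values and the `dependency_failed` reason code come from the description.

```typescript
type Policy = "skip" | "fail" | "run";

// Hypothetical decision record; "dependency_failed" is the reason code from the PR.
type Decision =
  | { action: "run" }
  | { action: "skip" | "fail"; reason: "dependency_failed"; failedDeps: string[] };

// Decides what happens to a test once all of its dependencies have finished.
function decide(policy: Policy, deps: string[], failed: ReadonlySet<string>): Decision {
  const failedDeps = deps.filter((d) => failed.has(d));
  // No failed dependency, or the "run" policy: execute the test normally.
  if (failedDeps.length === 0 || policy === "run") return { action: "run" };
  return { action: policy, reason: "dependency_failed", failedDeps };
}

// Assumed scenario for illustration: backend-api errored in wave 1.
const failedInWave1 = new Set(["backend-api"]);
const integrationDecision = decide("skip", ["backend-api"], failedInWave1);
const consistencyDecision = decide("fail", ["backend-api"], failedInWave1);
```

Under that assumed scenario, `integrationDecision.action` is `"skip"` and `consistencyDecision.action` is `"fail"`, matching the outcomes reported above.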

Generated with Claude Code

christso and others added 2 commits April 12, 2026 08:09
#331)

Add `depends_on` and `on_dependency_failure` fields to EvalTest, enabling
multi-agent swarm evaluation with dependency ordering between tests.

The flat pLimit + Promise.allSettled dispatch is replaced with a DAG-aware
wave scheduler that:
- Validates the dependency graph (rejects cycles, missing IDs, self-deps)
- Computes execution waves via topological sort
- Dispatches independent tests in parallel within each wave
- Tracks completed results for downstream context injection
- Supports three failure policies: skip (default), fail, run
- Injects `dependency_results` into evaluator context for dependent tests

Tests without `depends_on` behave identically to before (single wave).
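A rough sketch of the wave computation and per-wave dispatch described in this commit message follows. The name `computeWaves` appears in the PR, but this body is a guess at the technique (Kahn-style layering), and `runInWaves` and `runTest` are hypothetical. Failure policies and error handling are left out, so a rejected `runTest` simply propagates.

```typescript
interface TestNode {
  id: string;
  depends_on?: string[];
}

// Groups test ids into waves; every test lands in the first wave after all its dependencies.
function computeWaves(tests: TestNode[]): string[][] {
  const remaining = new Map<string, Set<string>>();
  tests.forEach((t) => remaining.set(t.id, new Set(t.depends_on ?? [])));
  const waves: string[][] = [];
  while (remaining.size > 0) {
    const ready: string[] = [];
    remaining.forEach((deps, id) => {
      if (deps.size === 0) ready.push(id);
    });
    // Defensive check: nothing is ready but tests remain (cycle or unknown id).
    if (ready.length === 0) throw new Error("unscheduled tests remain");
    ready.forEach((id) => remaining.delete(id));
    remaining.forEach((deps) => ready.forEach((id) => deps.delete(id)));
    waves.push(ready);
  }
  return waves;
}

// Runs waves in order; tests inside one wave run concurrently.
async function runInWaves<R>(
  tests: TestNode[],
  runTest: (id: string, dependencyResults: Record<string, R>) => Promise<R>,
): Promise<Map<string, R>> {
  const depsOf = new Map<string, string[]>();
  tests.forEach((t) => depsOf.set(t.id, t.depends_on ?? []));
  const results = new Map<string, R>();
  for (const wave of computeWaves(tests)) {
    await Promise.all(
      wave.map(async (id) => {
        // Hand each test the results of its direct dependencies.
        const dependencyResults: Record<string, R> = {};
        for (const dep of depsOf.get(id) ?? []) {
          const r = results.get(dep);
          if (r !== undefined) dependencyResults[dep] = r;
        }
        results.set(id, await runTest(id, dependencyResults));
      }),
    );
  }
  return results;
}
```

With no `depends_on` anywhere, `computeWaves` returns a single wave containing every test, which corresponds to the single-wave behavior noted above.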

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Only treat execution_error (not quality_failure) as dependency failure
  A dependency test scoring 0.2 now correctly allows downstream to run.
  Only actual crashes/errors trigger skip/fail policies.
- Add duplicate test ID validation in validateDependencyGraph
- Add defensive assertion in computeWaves for unscheduled tests
- Deduplicate skip/fail result construction into single branch
- Add tests: transitive cascade (A->B->C), quality_failure distinction

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
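The distinction drawn in this commit (only `execution_error` counts as a dependency failure) and the transitive cascade A->B->C can be illustrated as follows. This is a sketch under stated assumptions: the `Outcome` type and `findBlocked` are hypothetical, every test is assumed to use a `skip` or `fail` policy, and a blocked test is assumed to block its own dependents in turn.

```typescript
// Hypothetical outcome labels; execution_error and quality_failure are named in the PR.
type Outcome = "passed" | "quality_failure" | "execution_error";

interface TestNode {
  id: string;
  depends_on?: string[];
}

// A low score (quality_failure) does not block dependents; a crash does.
function blocksDependents(outcome: Outcome): boolean {
  return outcome === "execution_error";
}

// Expects tests listed dependencies-first. Returns ids that would be skipped or failed.
function findBlocked(tests: TestNode[], outcomes: Record<string, Outcome>): string[] {
  const failed = new Set<string>();
  const blocked: string[] = [];
  for (const t of tests) {
    const deps = t.depends_on ?? [];
    if (deps.some((d) => failed.has(d))) {
      blocked.push(t.id);
      failed.add(t.id); // assumption: a blocked test also blocks its dependents
    } else {
      const outcome = outcomes[t.id];
      if (outcome !== undefined && blocksDependents(outcome)) failed.add(t.id);
    }
  }
  return blocked;
}

const chain: TestNode[] = [
  { id: "A" },
  { id: "B", depends_on: ["A"] },
  { id: "C", depends_on: ["B"] },
];
const afterCrash = findBlocked(chain, { A: "execution_error" });
const afterLowScore = findBlocked(chain, { A: "quality_failure", B: "passed", C: "passed" });
```

Here `afterCrash` contains `B` and `C`, while `afterLowScore` is empty because a low-scoring `A` still lets its dependents run.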
cloudflare-workers-and-pages Bot commented Apr 12, 2026

Deploying agentv with Cloudflare Pages

Latest commit: 04686dc
Status: ✅  Deploy successful!
Preview URL: https://4ea079b0.agentv.pages.dev
Branch Preview URL: https://feat-331-dependency-aware-sc.agentv.pages.dev

@christso christso marked this pull request as ready for review April 12, 2026 08:35
@christso christso merged commit 80f2a64 into main Apr 12, 2026
4 checks passed
@christso christso deleted the feat/331-dependency-aware-scheduling branch April 12, 2026 08:35
Development

Successfully merging this pull request may close these issues.

feat(eval): multi-agent swarm evaluation — dependency-aware eval ordering and cross-agent scoring
