feat(core): optional governance metadata on EvalMetadata and EvalTest by christso · Pull Request #1165 · EntityProcess/agentv

christso · 2026-04-27T06:47:42Z

Summary

Add optional GovernanceMetadata block (OWASP LLM Top 10 / Agentic, MITRE ATLAS, cross-framework controls, EU AI Act risk_tier, owner) to suite-level EvalMetadata and case-level EvalTest.metadata.
Suite ↔ case merge: arrays concatenate (deduplicated), scalars on the case override.
Soft-warning lint in eval-validator.ts for unknown fields, malformed controls strings, and risk_tier values outside the EU AI Act vocabulary. No hard errors — custom prefixes (e.g. INTERNAL-AI-POLICY-3.2:CTRL-7) and other vocabularies (e.g. NIST 800-30) pass.
GovernanceMetadata exported from @agentv/core.
Case metadata propagated onto EvaluationResult so it round-trips into JSONL artifacts.

Permissive by default, to match the issue's design latitude: every field is optional (including schema_version), and unknown keys pass through with at most a soft warning.

Surface map (matches issue)

packages/core/src/evaluation/metadata.ts — GovernanceMetadata type + parsing
packages/core/src/evaluation/types.ts — EvaluationResult.metadata for pass-through
packages/core/src/evaluation/yaml-parser.ts — case ↔ suite merge
packages/core/src/evaluation/orchestrator.ts — single funnel that attaches case metadata to the result
packages/core/src/evaluation/validation/eval-validator.ts — soft warnings
packages/core/src/index.ts — re-export via existing barrel
apps/cli/src/commands/eval/artifact-writer.ts — writes metadata into JSONL index

Manual test plan (green)

1. Non-breaking baseline — bun run validate:examples returns Total: 56 | Valid: 56 | Invalid: 0. None of the existing examples without a governance block emit any warning.

2. Suite-level governance round-trips into JSONL. Created /tmp/agentv-1161-uat/g.eval.yaml with a suite governance: block, ran bun apps/cli/src/cli.ts eval ... --dry-run --target llm, then:

{
  "test_id": "case-1",
  "metadata": {
    "governance": {
      "schema_version": "1.0",
      "owasp_llm_top_10_2025": ["LLM01"],
      "controls": ["NIST-AI-RMF-1.0:MEASURE-2.7", "INTERNAL-POLICY-1.0:CTRL-1"],
      "risk_tier": "high",
      "owner": "platform-team"
    }
  }
}

3. Case-level overrides merge with suite-level. Same run, second case (which adds owasp_llm_top_10_2025: [LLM06] to its own metadata):

{
  "test_id": "case-2",
  "metadata": {
    "governance": {
      "owasp_llm_top_10_2025": ["LLM01", "LLM06"],
      "controls": ["NIST-AI-RMF-1.0:MEASURE-2.7", "INTERNAL-POLICY-1.0:CTRL-1"],
      ...
    }
  }
}

Arrays concatenated (suite + case, deduplicated).
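The merge rule exercised in steps 2 and 3 can be sketched as follows. This is a minimal illustration, not the code in yaml-parser.ts: the function name `mergeGovernance` and the loose `Governance` type are hypothetical, and the sample values are taken from the test-plan output above.

```typescript
// Hypothetical loose shape; the real GovernanceMetadata type lives in metadata.ts.
type Governance = Record<string, unknown>;

// Sketch of the suite <-> case merge described in the Summary:
// arrays concatenate (deduplicated), any other value on the case overrides.
function mergeGovernance(suite: Governance = {}, testCase: Governance = {}): Governance {
  const merged: Governance = { ...suite };
  for (const key of Object.keys(testCase)) {
    const suiteValue = merged[key];
    const caseValue = testCase[key];
    if (Array.isArray(suiteValue) && Array.isArray(caseValue)) {
      // A Set keeps first-seen order, so suite entries stay ahead of case entries.
      merged[key] = Array.from(new Set([...suiteValue, ...caseValue]));
    } else {
      // Scalars (and any non-array value) on the case win.
      merged[key] = caseValue;
    }
  }
  return merged;
}

const suiteGovernance: Governance = {
  schema_version: '1.0',
  owasp_llm_top_10_2025: ['LLM01'],
  controls: ['NIST-AI-RMF-1.0:MEASURE-2.7', 'INTERNAL-POLICY-1.0:CTRL-1'],
  risk_tier: 'high',
  owner: 'platform-team',
};
const case2Governance: Governance = { owasp_llm_top_10_2025: ['LLM06'] };

const mergedCase2 = mergeGovernance(suiteGovernance, case2Governance);
console.log(JSON.stringify(mergedCase2.owasp_llm_top_10_2025)); // ["LLM01","LLM06"]
```

Keys that only the suite defines (controls, risk_tier, owner) carry over unchanged, which matches the case-2 output shown above.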

5. SDK type export.

import type { GovernanceMetadata } from '@agentv/core';
const g: GovernanceMetadata = {
  schema_version: '1.0',
  controls: ['ISO-42001-2023:A.6.2.4'],
  owasp_llm_top_10_2025: ['LLM01'],
  risk_tier: 'high',
};

Type compiles via bun run typecheck (both @agentv/core and agentv).

6. Validator warns on typos but does not block. Validating a fixture with the three intentional issues from the test plan:

✓ /tmp/agentv-1161-uat/governance-typos.eval.yaml
  ⚠ [governance.owasp_lm_top_10_2025] Unknown governance field 'owasp_lm_top_10_2025'. ...
  ⚠ [governance.controls[0]] Malformed control 'NIST-AI-RMF:MEASURE-2.7'. Expected '<FRAMEWORK>-<VERSION>:<ID>' ...
  ⚠ [governance.risk_tier] 'risk_tier: critical' is outside EU AI Act vocabulary (prohibited | high | limited | minimal). ...

Total files: 1
Valid: 1
Invalid: 0

7. Custom prefixes are first-class, no warning. Same run, controls: [..., 'INTERNAL-POLICY-1.0:CTRL-1'] validates clean.
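For illustration, the soft-warning behaviour in steps 6 and 7 could look roughly like the sketch below. It is not the implementation in eval-validator.ts: the function `lintGovernance`, the `LintWarning` shape, and in particular the regular expression are assumptions chosen to reproduce the outcomes reported above (a control without a version segment warns, custom prefixes pass, and `critical` warns as a risk tier).

```typescript
interface LintWarning {
  path: string;
  message: string;
}

// Assumed pattern for '<FRAMEWORK>-<VERSION>:<ID>': a prefix whose last
// hyphen-separated segment starts with a digit, then a colon and a non-empty ID.
const CONTROL_PATTERN = /^[A-Za-z][A-Za-z0-9-]*-\d[0-9.]*:.+$/;

// Vocabulary quoted in the validator warning above.
const EU_AI_ACT_TIERS = ['prohibited', 'high', 'limited', 'minimal'];

// Returns warnings only; nothing here throws or marks the file invalid.
function lintGovernance(governance: { controls?: unknown; risk_tier?: unknown }): LintWarning[] {
  const warnings: LintWarning[] = [];

  if (Array.isArray(governance.controls)) {
    governance.controls.forEach((control: unknown, index: number) => {
      if (typeof control !== 'string' || !CONTROL_PATTERN.test(control)) {
        warnings.push({
          path: `governance.controls[${index}]`,
          message: `Malformed control '${String(control)}'. Expected '<FRAMEWORK>-<VERSION>:<ID>'`,
        });
      }
    });
  }

  const tier = governance.risk_tier;
  if (typeof tier === 'string' && EU_AI_ACT_TIERS.indexOf(tier) === -1) {
    warnings.push({
      path: 'governance.risk_tier',
      message: `'risk_tier: ${tier}' is outside EU AI Act vocabulary (${EU_AI_ACT_TIERS.join(' | ')})`,
    });
  }

  return warnings;
}

const typoWarnings = lintGovernance({
  controls: ['NIST-AI-RMF:MEASURE-2.7', 'INTERNAL-POLICY-1.0:CTRL-1'],
  risk_tier: 'critical',
});
console.log(typoWarnings.map((w) => w.path));
// [ 'governance.controls[0]', 'governance.risk_tier' ]
```

Because the function only returns a list, the caller stays free to print the warnings and still report the file as valid, as in the step 6 output.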

Unit tests: 1718 (core) + 67 (eval) + 491 (cli) = 2276 pass / 0 fail. New tests cover suite-level governance, case-level merge, lint warnings for typos / malformed strings / non-EU vocabulary, and case-level governance lint.

Pre-push hook: Build / Typecheck / Lint / Test / Validate eval YAML files all Passed.

Quality-gate self-check

❌ no new CLI flags or subcommands
❌ no new fields on workspace.yaml / targets.yaml
❌ no new runtime dependencies
❌ no hard-coded ID lists for OWASP / NIST / ATLAS
❌ no rejection of unknown frameworks or custom prefixes
❌ no behavior change to agentv eval / compare / results beyond passing the metadata through

Notes

The issue's surface map said result propagation was "already verified" through existing paths, but the JSONL artifact writer didn't carry case-level metadata. This PR adds a single line in orchestrator.ts that attaches evalCase.metadata to the result on the success funnel, plus the matching metadata field on IndexArtifactEntry / ResultIndexArtifact. That keeps the change strictly additive — no existing field is renamed or repurposed.

🤖 Generated with Claude Code

Adds an optional `governance` block (OWASP LLM Top 10 / OWASP Agentic / MITRE ATLAS / cross-framework controls / EU AI Act risk tier / owner) to suite-level EvalMetadata and case-level EvalTest.metadata. The shape is permissive: every field is optional, custom prefixes are first-class, and value validation is a soft warning, never an error. Existing evals without the block validate and run unchanged. Case-level blocks merge with suite-level (arrays concat with dedupe, scalars override). Result metadata is surfaced into the JSONL artifact so reports and `jq` pipelines can aggregate by control. Closes #1161
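As a rough sketch of the "aggregate by control" use case, the snippet below counts how many result records reference each control. The helper name `countByControl` and the sample records are hypothetical; only the `test_id` / `metadata.governance.controls` layout is taken from the JSONL output shown in the test plan.

```typescript
// Count result records per control from JSONL text.
// Records without a governance block (or without controls) are skipped.
function countByControl(jsonl: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of jsonl.split('\n')) {
    if (line.trim() === '') continue;
    const record = JSON.parse(line);
    const controls = record?.metadata?.governance?.controls;
    if (!Array.isArray(controls)) continue;
    for (const control of controls) {
      if (typeof control === 'string') {
        counts[control] = (counts[control] ?? 0) + 1;
      }
    }
  }
  return counts;
}

// Illustrative sample data, not real run output.
const sampleJsonl = [
  JSON.stringify({
    test_id: 'case-1',
    metadata: { governance: { controls: ['NIST-AI-RMF-1.0:MEASURE-2.7', 'INTERNAL-POLICY-1.0:CTRL-1'] } },
  }),
  JSON.stringify({
    test_id: 'case-2',
    metadata: { governance: { controls: ['NIST-AI-RMF-1.0:MEASURE-2.7'] } },
  }),
  JSON.stringify({ test_id: 'case-3' }),
].join('\n');

const controlCounts = countByControl(sampleJsonl);
console.log(controlCounts);
// { 'NIST-AI-RMF-1.0:MEASURE-2.7': 2, 'INTERNAL-POLICY-1.0:CTRL-1': 1 }
```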

cloudflare-workers-and-pages · 2026-04-27T06:47:51Z

Deploying agentv with Cloudflare Pages

Latest commit:	`aecd755`
Status:	✅ Deploy successful!
Preview URL:	https://e9734571.agentv.pages.dev
Branch Preview URL:	https://feat-1161-governance-metadat.agentv.pages.dev


The `yaml` package leaves the YAML 1.1 merge key (`<<: *anchor`) as a literal sibling key when parsing in YAML 1.2 mode (its default). Since PR #1165 surfaced suite-level governance into JSONL, this caused a literal `"<<"` key to leak into `metadata.governance` for any case that used merge syntax (e.g. red-team suites authored in #1166). Funnel every YAML parse through a new `parseYamlValue` helper that sets `{ merge: true }`, so merge keys are unwrapped at the parse boundary once and downstream consumers (loaders, validators, JSONL artifacts) all benefit consistently. Promptfoo handles this via js-yaml whose default schema already supports merge keys; we get equivalent behavior. Regression test asserts `<<` is not retained as a key after parsing a document with `<<: *anchor`.

christso marked this pull request as ready for review April 27, 2026 06:47

christso mentioned this pull request Apr 27, 2026

test: pipeline-e2e flake at 5000ms default timeout #1169

Closed

christso merged commit cd76bf8 into main Apr 27, 2026
4 checks passed

christso deleted the feat/1161-governance-metadata branch April 27, 2026 09:07

christso mentioned this pull request Apr 27, 2026

refactor(core): remove typed governance schema, generalize metadata merge (Phase 2 of #1172) #1179

Merged

3 tasks
