feat(core): optional governance metadata on EvalMetadata and EvalTest #1165
Merged
Adds an optional `governance` block (OWASP LLM Top 10 / OWASP Agentic / MITRE ATLAS / cross-framework controls / EU AI Act risk tier / owner) to suite-level EvalMetadata and case-level EvalTest.metadata. The shape is permissive: every field is optional, custom prefixes are first-class, and value validation is a soft warning, never an error. Existing evals without the block validate and run unchanged. Case-level blocks merge with suite-level (arrays concat with dedupe, scalars override). Result metadata is surfaced into the JSONL artifact so reports and `jq` pipelines can aggregate by control. Closes #1161
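As a rough illustration of the "aggregate by control" use case, the sketch below counts how often each control ID appears across JSONL result lines. The `ResultLine` shape and the `countByControl` helper are hypothetical names introduced here; they assume only that each line carries `metadata.governance.controls` as in the samples from this PR, and are not part of `@agentv/core`.

```typescript
// Hypothetical shape of one JSONL result line; only the fields used here are modelled.
interface ResultLine {
  test_id: string;
  metadata?: { governance?: { controls?: string[] } };
}

// Count how many results reference each control ID.
function countByControl(jsonl: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines
    const row = JSON.parse(line) as ResultLine;
    for (const control of row.metadata?.governance?.controls ?? []) {
      counts[control] = (counts[control] ?? 0) + 1;
    }
  }
  return counts;
}

// Illustrative input: two cases with governance metadata, one without.
const sampleJsonl = [
  '{"test_id":"case-1","metadata":{"governance":{"controls":["NIST-AI-RMF-1.0:MEASURE-2.7","INTERNAL-POLICY-1.0:CTRL-1"]}}}',
  '{"test_id":"case-2","metadata":{"governance":{"controls":["NIST-AI-RMF-1.0:MEASURE-2.7"]}}}',
  '{"test_id":"case-3"}',
].join("\n");

const controlCounts = countByControl(sampleJsonl);
console.log(controlCounts);
```

Results without a `governance` block are simply skipped, which matches the claim that existing evals keep working unchanged.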
Deploying agentv with Cloudflare Pages

| | |
|---|---|
| Latest commit: | aecd755 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://e9734571.agentv.pages.dev |
| Branch Preview URL: | https://feat-1161-governance-metadat.agentv.pages.dev |
This was referenced Apr 27, 2026: feat(examples): scenario-based red-team suites for coding and customer-facing agent archetypes #1168 (Merged)
christso added a commit that referenced this pull request Apr 27, 2026:
The `yaml` package leaves the YAML 1.1 merge key (`<<: *anchor`) as a literal sibling key when parsing in YAML 1.2 mode (its default). Since PR #1165 surfaced suite-level governance into JSONL, this caused a literal `"<<"` key to leak into `metadata.governance` for any case that used merge syntax (e.g. red-team suites authored in #1166). Funnel every YAML parse through a new `parseYamlValue` helper that sets `{ merge: true }`, so merge keys are unwrapped at the parse boundary once and downstream consumers (loaders, validators, JSONL artifacts) all benefit consistently. Promptfoo handles this via js-yaml whose default schema already supports merge keys; we get equivalent behavior. Regression test asserts `<<` is not retained as a key after parsing a document with `<<: *anchor`.
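The fix described in this commit is the `{ merge: true }` parse option applied inside `parseYamlValue`. The sketch below does not reproduce that helper. It only mirrors the regression assertion by scanning already-parsed output for a literal `<<` key. `findLeakedMergeKeys` and both sample objects are hypothetical, written here for illustration.

```typescript
// Recursively collect the paths of any literal "<<" keys in parsed YAML output.
function findLeakedMergeKeys(value: unknown, path: string = "$"): string[] {
  const hits: string[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, i) => {
      hits.push(...findLeakedMergeKeys(item, `${path}[${i}]`));
    });
  } else if (typeof value === "object" && value !== null) {
    for (const key of Object.keys(value)) {
      const child = (value as Record<string, unknown>)[key];
      if (key === "<<") hits.push(`${path}.${key}`);
      hits.push(...findLeakedMergeKeys(child, `${path}.${key}`));
    }
  }
  return hits;
}

// Roughly what a parse without merge support hands back for `<<: *anchor`.
const leakedDoc = {
  metadata: { governance: { "<<": { owner: "platform-team" }, risk_tier: "high" } },
};
// The same document once the merge key has been resolved at the parse boundary.
const resolvedDoc = {
  metadata: { governance: { owner: "platform-team", risk_tier: "high" } },
};

const leakedPaths = findLeakedMergeKeys(leakedDoc);
const resolvedPaths = findLeakedMergeKeys(resolvedDoc);
console.log(leakedPaths); // ["$.metadata.governance.<<"]
console.log(resolvedPaths); // []
```

A check of this kind is useful at the artifact boundary too, since a leaked key would otherwise surface only when someone inspects the JSONL.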
Closes #1161
Summary
- Adds an optional `GovernanceMetadata` block (OWASP LLM Top 10 / Agentic, MITRE ATLAS, cross-framework `controls`, EU AI Act `risk_tier`, `owner`) to suite-level `EvalMetadata` and case-level `EvalTest.metadata`.
- Soft warnings in `eval-validator.ts` for unknown fields, malformed `controls` strings, and `risk_tier` values outside the EU AI Act vocabulary. No hard errors: custom prefixes (e.g. `INTERNAL-AI-POLICY-3.2:CTRL-7`) and other vocabularies (e.g. NIST 800-30) pass.
- `GovernanceMetadata` exported from `@agentv/core`.
- Metadata passes through `EvaluationResult` so it round-trips into JSONL artifacts.

Permissive by default to match the issue's design latitude: every field is optional, unknown keys pass through, and `schema_version` is also optional.
Surface map (matches issue)
- `packages/core/src/evaluation/metadata.ts`: `GovernanceMetadata` type + parsing
- `packages/core/src/evaluation/types.ts`: `EvaluationResult.metadata` for pass-through
- `packages/core/src/evaluation/yaml-parser.ts`: case ↔ suite merge
- `packages/core/src/evaluation/orchestrator.ts`: single funnel that attaches case metadata to the result
- `packages/core/src/evaluation/validation/eval-validator.ts`: soft warnings
- `packages/core/src/index.ts`: re-export via existing barrel
- `apps/cli/src/commands/eval/artifact-writer.ts`: writes `metadata` into JSONL index

Manual test plan (green)
1. Non-breaking baseline. `bun run validate:examples` returns `Total: 56 | Valid: 56 | Invalid: 0`. None of the existing examples without a `governance` block emit any warning.

2. Suite-level governance round-trips into JSONL. Created `/tmp/agentv-1161-uat/g.eval.yaml` with a suite `governance:` block, ran `bun apps/cli/src/cli.ts eval ... --dry-run --target llm`, then:

```json
{
  "test_id": "case-1",
  "metadata": {
    "governance": {
      "schema_version": "1.0",
      "owasp_llm_top_10_2025": ["LLM01"],
      "controls": ["NIST-AI-RMF-1.0:MEASURE-2.7", "INTERNAL-POLICY-1.0:CTRL-1"],
      "risk_tier": "high",
      "owner": "platform-team"
    }
  }
}
```

3. Case-level overrides merge with suite-level. Same run, second case (which adds `owasp_llm_top_10_2025: [LLM06]` to its own metadata):

```json
{
  "test_id": "case-2",
  "metadata": {
    "governance": {
      "owasp_llm_top_10_2025": ["LLM01", "LLM06"],
      "controls": ["NIST-AI-RMF-1.0:MEASURE-2.7", "INTERNAL-POLICY-1.0:CTRL-1"],
      ...
    }
  }
}
```

Arrays concatenated (suite + case, deduplicated).
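The merge rule exercised in steps 2 and 3 (arrays concatenate with dedupe, scalars override) can be sketched as follows. This is a minimal illustration under the assumption that a governance block is a flat record. `mergeGovernance` is a hypothetical name and not the function in `yaml-parser.ts`.

```typescript
// Assumed shape: a flat record, so unknown keys pass through untouched.
type GovernanceBlock = Record<string, unknown>;

function mergeGovernance(
  suiteBlock: GovernanceBlock = {},
  caseBlock: GovernanceBlock = {},
): GovernanceBlock {
  const merged: GovernanceBlock = { ...suiteBlock };
  for (const key of Object.keys(caseBlock)) {
    const suiteValue = suiteBlock[key];
    const caseValue = caseBlock[key];
    if (Array.isArray(suiteValue) && Array.isArray(caseValue)) {
      // Arrays: suite entries first, then case entries, duplicates dropped.
      const combined: unknown[] = [];
      for (const item of [...suiteValue, ...caseValue]) {
        if (combined.indexOf(item) === -1) combined.push(item);
      }
      merged[key] = combined;
    } else {
      // Scalars (and mismatched types): the case-level value wins.
      merged[key] = caseValue;
    }
  }
  return merged;
}

// Values taken from the test plan above.
const suiteGovernance: GovernanceBlock = {
  owasp_llm_top_10_2025: ["LLM01"],
  controls: ["NIST-AI-RMF-1.0:MEASURE-2.7", "INTERNAL-POLICY-1.0:CTRL-1"],
  risk_tier: "high",
  owner: "platform-team",
};
const caseTwoGovernance: GovernanceBlock = { owasp_llm_top_10_2025: ["LLM06"] };

const mergedCaseTwo = mergeGovernance(suiteGovernance, caseTwoGovernance);
console.log(JSON.stringify(mergedCaseTwo.owasp_llm_top_10_2025)); // ["LLM01","LLM06"]
```

Keeping suite entries first gives a stable order in the JSONL output, which makes diffs between runs easier to read.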
5. SDK type export. Type compiles via `bun run typecheck` (both `@agentv/core` and `agentv`).

6. Validator warns on typos but does not block. Checked by validating a fixture with the three intentional issues from the test plan.

7. Custom prefixes are first-class, no warning. Same run, `controls: [..., 'INTERNAL-POLICY-1.0:CTRL-1']` validates clean.

Unit tests: 1718 (core) + 67 (eval) + 491 (cli) = 2276 pass / 0 fail. New tests cover suite-level governance, case-level merge, lint warnings for typos / malformed strings / non-EU vocabulary, and case-level governance lint.
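The warn-but-never-block behaviour from steps 6 and 7 might look roughly like the sketch below. Everything in it is an assumption for illustration: the field list is partial (the OWASP Agentic and MITRE ATLAS field names are not shown in this PR text, so they are omitted), the tier vocabulary is the commonly cited EU AI Act set, and the control pattern is inferred from the example strings. The real checks live in `eval-validator.ts` and may differ.

```typescript
// Assumed, partial vocabularies; the real lists live in eval-validator.ts.
const KNOWN_FIELDS = ["schema_version", "owasp_llm_top_10_2025", "controls", "risk_tier", "owner"];
const EU_AI_ACT_TIERS = ["unacceptable", "high", "limited", "minimal"];
// Assumed control format: "<PREFIX>:<ID>", both halves non-empty and free of whitespace.
const CONTROL_PATTERN = /^[^\s:]+:[^\s:]+$/;

// Returns warnings only; it never throws, so validation cannot fail on governance content.
function lintGovernance(block: Record<string, unknown>): string[] {
  const warnings: string[] = [];
  for (const key of Object.keys(block)) {
    if (KNOWN_FIELDS.indexOf(key) === -1) {
      warnings.push(`unknown governance field "${key}"`);
    }
  }
  const controls = block["controls"];
  if (Array.isArray(controls)) {
    for (const control of controls) {
      if (typeof control !== "string" || !CONTROL_PATTERN.test(control)) {
        warnings.push(`malformed control "${String(control)}"`);
      }
    }
  }
  const tier = block["risk_tier"];
  if (typeof tier === "string" && EU_AI_ACT_TIERS.indexOf(tier) === -1) {
    warnings.push(`risk_tier "${tier}" is outside the EU AI Act vocabulary`);
  }
  return warnings;
}

// Hypothetical fixture: a misspelled field, a control missing its colon, a non-EU tier.
const typoFixture = {
  ownr: "platform-team",
  controls: ["NIST-AI-RMF-1.0 MEASURE-2.7", "INTERNAL-POLICY-1.0:CTRL-1"],
  risk_tier: "critical",
};
const fixtureWarnings = lintGovernance(typoFixture);
console.log(fixtureWarnings.length); // 3
```

Note that the custom-prefixed control `INTERNAL-POLICY-1.0:CTRL-1` produces no warning, because the pattern checks only the shape of the string and not the prefix.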
Pre-push hook: Build / Typecheck / Lint / Test / Validate eval YAML files all Passed.

Quality-gate self-check
- `workspace.yaml` / `targets.yaml`
- `agentv eval` / `compare` / `results` beyond passing the metadata through

Notes
The issue's surface map said result propagation was "already verified" through existing paths, but the JSONL artifact writer didn't carry case-level `metadata`. This PR adds a single line in `orchestrator.ts` that attaches `evalCase.metadata` to the result on the success funnel, plus the matching `metadata` field on `IndexArtifactEntry` / `ResultIndexArtifact`. That keeps the change strictly additive: no existing field is renamed or repurposed.

🤖 Generated with Claude Code