feat(core): optional governance metadata on EvalMetadata and EvalTest #1165

Merged
christso merged 1 commit into main from feat/1161-governance-metadata on Apr 27, 2026

@christso
Collaborator

Closes #1161

Summary

  • Add optional GovernanceMetadata block (OWASP LLM Top 10 / Agentic, MITRE ATLAS, cross-framework controls, EU AI Act risk_tier, owner) to suite-level EvalMetadata and case-level EvalTest.metadata.
  • Suite ↔ case merge: arrays concatenate (deduplicated), scalars on the case override.
  • Soft-warning lint in eval-validator.ts for unknown fields, malformed controls strings, and risk_tier values outside the EU AI Act vocabulary. No hard errors — custom prefixes (e.g. INTERNAL-AI-POLICY-3.2:CTRL-7) and other vocabularies (e.g. NIST 800-30) pass.
  • GovernanceMetadata exported from @agentv/core.
  • Case metadata propagated onto EvaluationResult so it round-trips into JSONL artifacts.

Permissive-by-default to match the issue's design latitude: every field is optional, unknown keys pass through, and the schema_version is also optional.

Surface map (matches issue)

  • packages/core/src/evaluation/metadata.ts: GovernanceMetadata type + parsing
  • packages/core/src/evaluation/types.ts: EvaluationResult.metadata for pass-through
  • packages/core/src/evaluation/yaml-parser.ts: case ↔ suite merge
  • packages/core/src/evaluation/orchestrator.ts: single funnel that attaches case metadata to the result
  • packages/core/src/evaluation/validation/eval-validator.ts: soft warnings
  • packages/core/src/index.ts: re-export via existing barrel
  • apps/cli/src/commands/eval/artifact-writer.ts: writes metadata into JSONL index

Manual test plan (green)

1. Non-breaking baseline. bun run validate:examples returns Total: 56 | Valid: 56 | Invalid: 0. None of the existing examples without a governance block emit any warning.

2. Suite-level governance round-trips into JSONL. Created /tmp/agentv-1161-uat/g.eval.yaml with a suite governance: block, ran bun apps/cli/src/cli.ts eval ... --dry-run --target llm, then:

{
  "test_id": "case-1",
  "metadata": {
    "governance": {
      "schema_version": "1.0",
      "owasp_llm_top_10_2025": ["LLM01"],
      "controls": ["NIST-AI-RMF-1.0:MEASURE-2.7", "INTERNAL-POLICY-1.0:CTRL-1"],
      "risk_tier": "high",
      "owner": "platform-team"
    }
  }
}

3. Case-level overrides merge with suite-level. Same run, second case (which adds owasp_llm_top_10_2025: [LLM06] to its own metadata):

{
  "test_id": "case-2",
  "metadata": {
    "governance": {
      "owasp_llm_top_10_2025": ["LLM01", "LLM06"],
      "controls": ["NIST-AI-RMF-1.0:MEASURE-2.7", "INTERNAL-POLICY-1.0:CTRL-1"],
      ...
    }
  }
}

Arrays concatenated (suite + case, deduplicated).
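The merge rule exercised in steps 2 and 3 (arrays concatenate with dedupe, case-level scalars override) could be sketched roughly as below. This is an illustration only, not the code in yaml-parser.ts: `GovernanceLike` and `mergeGovernance` are hypothetical names, and the type lists only the fields that appear in this PR's examples.

```typescript
// Hypothetical shape: only fields visible in this PR's examples are named.
// The index signature lets unknown keys pass through untouched.
type GovernanceLike = {
  schema_version?: string;
  owasp_llm_top_10_2025?: string[];
  controls?: string[];
  risk_tier?: string;
  owner?: string;
  [key: string]: unknown;
};

// Sketch of the suite <-> case merge: arrays concat + dedupe, scalars override.
function mergeGovernance(
  suite: GovernanceLike = {},
  testCase: GovernanceLike = {},
): GovernanceLike {
  const merged: GovernanceLike = { ...suite };
  for (const [key, caseValue] of Object.entries(testCase)) {
    const suiteValue = merged[key];
    if (Array.isArray(suiteValue) && Array.isArray(caseValue)) {
      // Suite entries first, then case entries, duplicates dropped.
      merged[key] = Array.from(new Set([...suiteValue, ...caseValue]));
    } else if (caseValue !== undefined) {
      // Scalars (and any type mismatch): the case-level value wins.
      merged[key] = caseValue;
    }
  }
  return merged;
}

const suiteGovernance: GovernanceLike = {
  owasp_llm_top_10_2025: ["LLM01"],
  controls: ["NIST-AI-RMF-1.0:MEASURE-2.7", "INTERNAL-POLICY-1.0:CTRL-1"],
  risk_tier: "high",
  owner: "platform-team",
};
const case2 = mergeGovernance(suiteGovernance, { owasp_llm_top_10_2025: ["LLM06"] });
console.log(JSON.stringify(case2.owasp_llm_top_10_2025)); // ["LLM01","LLM06"]
```

Using a Set keeps first-seen order, so suite entries stay ahead of case entries, matching the case-2 output above.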

5. SDK type export.

import type { GovernanceMetadata } from '@agentv/core';
const g: GovernanceMetadata = {
  schema_version: '1.0',
  controls: ['ISO-42001-2023:A.6.2.4'],
  owasp_llm_top_10_2025: ['LLM01'],
  risk_tier: 'high',
};

Type compiles via bun run typecheck (both @agentv/core and agentv).

6. Validator warns on typos but does not block. Validating a fixture with the three intentional issues from the test plan:

✓ /tmp/agentv-1161-uat/governance-typos.eval.yaml
  ⚠ [governance.owasp_lm_top_10_2025] Unknown governance field 'owasp_lm_top_10_2025'. ...
  ⚠ [governance.controls[0]] Malformed control 'NIST-AI-RMF:MEASURE-2.7'. Expected '<FRAMEWORK>-<VERSION>:<ID>' ...
  ⚠ [governance.risk_tier] 'risk_tier: critical' is outside EU AI Act vocabulary (prohibited | high | limited | minimal). ...

Total files: 1
Valid: 1
Invalid: 0
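A minimal sketch of a warn-only lint that would produce the three warnings above. It is not the implementation in eval-validator.ts: `lintGovernance`, `KNOWN_FIELDS`, and `CONTROL_PATTERN` are hypothetical, the known-field list covers only fields shown in this PR's examples (the OWASP Agentic and MITRE ATLAS field names are not shown here), and the regex is one assumed reading of `<FRAMEWORK>-<VERSION>:<ID>`.

```typescript
type GovernanceWarning = { path: string; message: string };

// Assumption: incomplete list, limited to fields seen in this PR's examples.
const KNOWN_FIELDS = new Set([
  "schema_version",
  "owasp_llm_top_10_2025",
  "controls",
  "risk_tier",
  "owner",
]);

// Vocabulary quoted in the validator output above.
const EU_AI_ACT_TIERS = new Set(["prohibited", "high", "limited", "minimal"]);

// Assumed reading of '<FRAMEWORK>-<VERSION>:<ID>': a prefix ending in a
// numeric version (1.0, 2023, 3.2), then ':' and a non-empty identifier.
const CONTROL_PATTERN = /^[A-Za-z][A-Za-z0-9-]*-\d+(?:\.\d+)*:\S+$/;

// Returns warnings only; it never throws and never marks the file invalid.
function lintGovernance(governance: Record<string, unknown>): GovernanceWarning[] {
  const warnings: GovernanceWarning[] = [];

  for (const key of Object.keys(governance)) {
    if (!KNOWN_FIELDS.has(key)) {
      warnings.push({ path: `governance.${key}`, message: `Unknown governance field '${key}'.` });
    }
  }

  const controls = governance["controls"];
  if (Array.isArray(controls)) {
    controls.forEach((control, i) => {
      if (typeof control !== "string" || !CONTROL_PATTERN.test(control)) {
        warnings.push({
          path: `governance.controls[${i}]`,
          message: `Malformed control '${String(control)}'. Expected '<FRAMEWORK>-<VERSION>:<ID>'.`,
        });
      }
    });
  }

  const tier = governance["risk_tier"];
  if (typeof tier === "string" && !EU_AI_ACT_TIERS.has(tier)) {
    warnings.push({
      path: "governance.risk_tier",
      message: `'risk_tier: ${tier}' is outside EU AI Act vocabulary.`,
    });
  }

  return warnings;
}

const typoFixture = {
  owasp_lm_top_10_2025: ["LLM01"],
  controls: ["NIST-AI-RMF:MEASURE-2.7", "INTERNAL-POLICY-1.0:CTRL-1"],
  risk_tier: "critical",
};
for (const w of lintGovernance(typoFixture)) {
  console.log(`⚠ [${w.path}] ${w.message}`);
}
```

Because the pattern checks shape rather than a list of known frameworks, a custom prefix such as INTERNAL-POLICY-1.0:CTRL-1 passes without a warning.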

7. Custom prefixes are first-class, no warning. Same run, controls: [..., 'INTERNAL-POLICY-1.0:CTRL-1'] validates clean.

Unit tests: 1718 (core) + 67 (eval) + 491 (cli) = 2276 pass / 0 fail. New tests cover suite-level governance, case-level merge, lint warnings for typos / malformed strings / non-EU vocabulary, and case-level governance lint.

Pre-push hook: Build / Typecheck / Lint / Test / Validate eval YAML files all Passed.

Quality-gate self-check

  • ❌ no new CLI flags or subcommands
  • ❌ no new fields on workspace.yaml / targets.yaml
  • ❌ no new runtime dependencies
  • ❌ no hard-coded ID lists for OWASP / NIST / ATLAS
  • ❌ no rejection of unknown frameworks or custom prefixes
  • ❌ no behavior change to agentv eval / compare / results beyond passing the metadata through

Notes

The issue's surface map said result propagation was "already verified" through existing paths, but the JSONL artifact writer didn't carry case-level metadata. This PR adds a single line in orchestrator.ts that attaches evalCase.metadata to the result on the success funnel, plus the matching metadata field on IndexArtifactEntry / ResultIndexArtifact. That keeps the change strictly additive — no existing field is renamed or repurposed.
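To make the additive pass-through concrete, here is a hedged sketch. The shapes are trimmed-down stand-ins for the real EvalTest / EvaluationResult types, and `attachCaseMetadata` and `toIndexLine` are hypothetical helpers, not functions from orchestrator.ts or artifact-writer.ts.

```typescript
// Hypothetical, trimmed-down shapes; the real types carry many more fields.
type EvalCaseLike = { id: string; metadata?: Record<string, unknown> };
type ResultLike = { test_id: string; metadata?: Record<string, unknown> };

// Attach case metadata only when present, so results for cases without
// metadata serialize exactly as they did before.
function attachCaseMetadata(result: ResultLike, evalCase: EvalCaseLike): ResultLike {
  if (evalCase.metadata === undefined) return result;
  return { ...result, metadata: evalCase.metadata };
}

// One JSON object per line, as in a JSONL index.
function toIndexLine(result: ResultLike): string {
  return JSON.stringify(result);
}

const withGovernance = attachCaseMetadata(
  { test_id: "case-1" },
  { id: "case-1", metadata: { governance: { owner: "platform-team" } } },
);
console.log(toIndexLine(withGovernance));
console.log(toIndexLine(attachCaseMetadata({ test_id: "case-0" }, { id: "case-0" })));
```

The second call shows why the change is non-breaking in this sketch: with no metadata on the case, no `metadata` key is emitted at all.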

🤖 Generated with Claude Code

Adds an optional `governance` block (OWASP LLM Top 10 / OWASP Agentic / MITRE
ATLAS / cross-framework controls / EU AI Act risk tier / owner) to suite-level
EvalMetadata and case-level EvalTest.metadata. The shape is permissive: every
field is optional, custom prefixes are first-class, and value validation is a
soft warning, never an error.

Existing evals without the block validate and run unchanged. Case-level blocks
merge with suite-level (arrays concat with dedupe, scalars override). Result
metadata is surfaced into the JSONL artifact so reports and `jq` pipelines can
aggregate by control.
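
As an illustration of aggregating by control, the sketch below groups test IDs by control string from JSONL text. It assumes each line has the `test_id` / `metadata.governance.controls` shape shown in the test plan; `testsByControl` is a hypothetical helper, not part of @agentv/core.

```typescript
// Assumed line shape, taken from the JSONL excerpts in the test plan.
type IndexLine = {
  test_id: string;
  metadata?: { governance?: { controls?: string[] } };
};

// Group test IDs under every control they reference.
function testsByControl(jsonl: string): Map<string, string[]> {
  const byControl = new Map<string, string[]>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // tolerate blank and trailing lines
    const row = JSON.parse(line) as IndexLine;
    for (const control of row.metadata?.governance?.controls ?? []) {
      const ids = byControl.get(control) ?? [];
      ids.push(row.test_id);
      byControl.set(control, ids);
    }
  }
  return byControl;
}

const sampleIndex = [
  '{"test_id":"case-1","metadata":{"governance":{"controls":["NIST-AI-RMF-1.0:MEASURE-2.7","INTERNAL-POLICY-1.0:CTRL-1"]}}}',
  '{"test_id":"case-2","metadata":{"governance":{"controls":["NIST-AI-RMF-1.0:MEASURE-2.7"]}}}',
  '{"test_id":"case-3"}',
].join("\n");

testsByControl(sampleIndex).forEach((ids, control) => {
  console.log(`${control}: ${ids.join(", ")}`);
});
```

Lines without a governance block (case-3 here) are simply skipped, which mirrors the permissive, optional nature of the block.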

Closes #1161
@cloudflare-workers-and-pages

Deploying agentv with Cloudflare Pages

Latest commit: aecd755
Status: ✅  Deploy successful!
Preview URL: https://e9734571.agentv.pages.dev
Branch Preview URL: https://feat-1161-governance-metadat.agentv.pages.dev

@christso christso marked this pull request as ready for review April 27, 2026 06:47
@christso christso merged commit cd76bf8 into main Apr 27, 2026
4 checks passed
@christso christso deleted the feat/1161-governance-metadata branch April 27, 2026 09:07
christso added a commit that referenced this pull request Apr 27, 2026
The `yaml` package leaves the YAML 1.1 merge key (`<<: *anchor`) as a
literal sibling key when parsing in YAML 1.2 mode (its default). Since
PR #1165 surfaced suite-level governance into JSONL, this caused a
literal `"<<"` key to leak into `metadata.governance` for any case that
used merge syntax (e.g. red-team suites authored in #1166).

Funnel every YAML parse through a new `parseYamlValue` helper that sets
`{ merge: true }`, so merge keys are unwrapped at the parse boundary
once and downstream consumers (loaders, validators, JSONL artifacts)
all benefit consistently. Promptfoo handles this via js-yaml whose
default schema already supports merge keys; we get equivalent behavior.

Regression test asserts `<<` is not retained as a key after parsing a
document with `<<: *anchor`.