test(T14-T15): governance extraction and audit engine tests#18

Closed
ogkranthi wants to merge 5 commits into main from agent/tester/T14-T15

@ogkranthi
Owner

Summary

Closes T14 and T15 (Week 7 governance framework tests).

T14 — Governance extraction tests (tests/test_governance_extraction.py, 68 tests)

  • GuardrailRule classification: all 7 categories (safety/privacy/compliance/ethical/operational/scope/general), all 4 severity levels, defaults, invalid values rejected
  • ToolPermission field population: access levels (full/read-only/disabled), deny_patterns, allow_patterns, rate_limit, max_value, notes, enabled flag
  • PlatformAnnotation (L3): all 4 kinds (content_filter/pii_detection/denied_topics/grounding_check), all 4 platform targets, config dict nesting, storage on Governance/AgentIR
  • Edge cases: empty governance, missing required fields, extra fields forbidden (extra='forbid'), per-instance default factories, all three layers combined
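The "extra fields forbidden" and "per-instance default factories" edge cases can be illustrated with a minimal standard-library sketch. This is not the project's model code: `ToolPermissionSketch` is a hypothetical stand-in built on `dataclasses`, which happens to reject unknown keyword arguments in roughly the way `extra='forbid'` does for the real models.

```python
from dataclasses import dataclass, field


@dataclass
class ToolPermissionSketch:
    """Hypothetical stand-in for a tool-permission record (not the real model)."""

    tool: str
    access: str = "full"
    # default_factory gives every instance its own list instead of a shared one
    deny_patterns: list = field(default_factory=list)
    allow_patterns: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # Access levels taken from the PR description: full / read-only / disabled
        if self.access not in {"full", "read-only", "disabled"}:
            raise ValueError(f"invalid access level: {self.access!r}")


a = ToolPermissionSketch(tool="shell")
b = ToolPermissionSketch(tool="browser")
a.deny_patterns.append("rm -rf *")  # must not leak into b.deny_patterns
```

A test in the spirit of T14 would then assert that `b.deny_patterns` is still empty, and that an unknown keyword or an invalid access level raises.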

T15 — Audit engine tests (tests/test_governance_audit.py, 73 tests)

  • GPR-L1: always 1.0 (L1 always preserved), zero-guardrail default = 1.0, platform-invariant
  • GPR-L2: native-only rate; elevated ≠ preserved; platform matrix (copilot→0, bedrock→preserves disabled_tool, rate_limit always elevated)
  • GPR-L3: zero default (contrast with L1/L2); bedrock full support=1.0; vertex partial (cf+pii only); claude-code none
  • GPR-Overall: weighted by artifact count (l1+l2+l3), platform comparison, formula verified
  • CFS: all four checks, formula ratio = sum/4, range [0,1]
  • Elevation tracking: disabled tools, deny patterns, allow patterns, rate limits, L3 annotations → L1 instructions; artifact key presence; elevated_instruction non-empty; claude-code deny passthrough (no re-elevation)
  • CSV export: exact 16-column header match, parent dir creation, 4-decimal float formatting, empty list → header-only
  • JSON export: top-level keys, l1/l2/l3 subkey structure, elevated_artifacts list + keys, empty list → []
  • audit_batch: agents×targets Cartesian product, empty agents, empty targets
  • Edge cases: zero tools, all-denied copilot, perfect bedrock, agent_id defaults to ir.name
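As a rough illustration of the scoring rules listed above, the sketch below computes per-layer GPR, an overall GPR weighted by artifact count, and CFS as the ratio of passed checks to four. All names here (`LayerCount`, `layer_gpr`, `overall_gpr`, `cfs`) are hypothetical, and the value returned when there are no artifacts at all is an assumption; the actual audit engine may differ.

```python
from dataclasses import dataclass


@dataclass
class LayerCount:
    """Hypothetical per-layer tally."""

    total: int      # artifacts present in the source agent
    preserved: int  # natively preserved only; elevated artifacts do not count


def layer_gpr(layer: LayerCount, empty_default: float) -> float:
    # An empty layer falls back to a default (1.0 for L1, 0.0 for L3 per the PR text)
    if layer.total == 0:
        return empty_default
    return layer.preserved / layer.total


def overall_gpr(l1: LayerCount, l2: LayerCount, l3: LayerCount) -> float:
    total = l1.total + l2.total + l3.total
    if total == 0:
        return 1.0  # assumption: nothing to lose means nothing was lost
    # Pooling counts is the same as weighting each layer's rate by its artifact count
    return (l1.preserved + l2.preserved + l3.preserved) / total


def cfs(identity: bool, tools: bool, memory: bool, schema: bool) -> float:
    # Four boolean checks, so the score always lies in [0, 1]
    return sum([identity, tools, memory, schema]) / 4


l1 = LayerCount(total=3, preserved=3)
l2 = LayerCount(total=4, preserved=1)
l3 = LayerCount(total=2, preserved=0)
score = overall_gpr(l1, l2, l3)  # (3 + 1 + 0) / 9
```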

Results

782 passed in 44.17s

All 141 new tests pass. All 641 pre-existing tests continue to pass.

… experiments) + clean up duplicate files

- Add Governance, Guardrail, ToolPermission, PlatformAnnotation to IR model
- Add governance extraction to OpenClaw parser (SOUL.md + tool permissions + L3 annotations)
- Add elevation engine (elevate_governance) for L2/L3 → L1 promotion
- Add governance_audit module with GPR/CFS scoring, CSV/JSON export, Rich tables
- Add `agentshift audit` and `agentshift audit-batch` CLI commands
- Integrate elevation into claude_code + copilot emitters
- Add experiments/ directory with 12 domain agents for research paper
- Remove duplicate files `sections 2.py` and `persona-sections-schema 2.md`
- Mark T13 as merged in BACKLOG.md

T14 — test_governance_extraction.py (68 tests):
- GuardrailRule category/severity classification (all 7 categories, 4 severities)
- ToolPermission field population: access, deny/allow patterns, rate_limit, max_value
- PlatformAnnotation L3 parsing: all 4 kinds, all 4 platform targets, config dict
- Edge cases: empty governance, missing required fields, extra fields rejected,
  defaults verified, combined layers

T15 — test_governance_audit.py (73 tests):
- GPR-L1 formula: always 1.0 (L1 always preserved), zero-guardrail default
- GPR-L2 formula: native-only rate, elevated != preserved, platform matrix verified
- GPR-L3 formula: zero default, bedrock full support, vertex partial, claude-code none
- GPR-Overall: weighted by artifact count, cross-platform comparison
- CFS: identity/tools/memory/schema checks, formula ratio
- Elevation tracking: disabled tools, deny patterns, allow patterns, rate limits,
  L3 annotations; artifact keys, non-empty instructions, claude-code deny passthrough
- CSV export: header exact match, parent dir creation, 4-decimal formatting, empty list
- JSON export: structure, l1/l2/l3 subkeys, elevated_artifacts keys, empty list
- audit_batch: agents×targets, empty cases
- Edge cases: zero tools, all-denied copilot, perfect bedrock scores, agent_id defaults
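
The batch and CSV behaviours covered by these tests (agents×targets product, parent directory creation, 4-decimal floats, header-only output for an empty list) can be sketched as follows. This is an illustrative sketch only: `HEADER` is a shortened hypothetical header, not the real 16-column one, and `batch_pairs` and `export_csv` are made-up names.

```python
import csv
import itertools
from pathlib import Path

# Hypothetical 3-column header; the real export is described as having 16 columns
HEADER = ["agent_id", "target", "gpr_overall"]


def batch_pairs(agents, targets):
    # Cartesian product; an empty list on either side yields no pairs
    return list(itertools.product(agents, targets))


def export_csv(rows, path: Path) -> None:
    # Create missing parent directories before writing
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(HEADER)  # always written, so an empty list gives a header-only file
        for agent_id, target, gpr in rows:
            writer.writerow([agent_id, target, f"{gpr:.4f}"])
```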

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Mar 29, 2026

Deploying agentshift with Cloudflare Pages

Latest commit: d47abfc
Status: ✅  Deploy successful!
Preview URL: https://00a4fbc5.agentshift.pages.dev
Branch Preview URL: https://agent-tester-t14-t15.agentshift.pages.dev


…om bedrock/vertex, v0.3.0

D22 — AWS Bedrock parser (src/agentshift/parsers/bedrock.py)
- Reads bedrock-agent.json, cloudformation.yaml, instruction.txt, openapi.json,
  guardrail-config.json (any combination; precedence: bedrock-agent.json > CFN > txt)
- Reconstructs persona.system_prompt with section extraction
- Extracts tools from OpenAPI action-group schemas (with CFN fallback)
- Extracts knowledge sources from AWS::Bedrock::KnowledgeBase CFN resources
- Strips AgentShift truncation notice from instruction.txt
- Heuristic L1 guardrail extraction from instruction + guardrail-config.json topic policies
- Registered under 'bedrock' parser key
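
The input precedence stated above (bedrock-agent.json > CFN > txt) could be implemented along these lines. This is a hedged sketch, not the parser's code: `pick_primary_source` and `PRECEDENCE` are hypothetical names, and only the three file names come from the commit message.

```python
from pathlib import Path
from typing import Optional

# Highest-priority source first, following the order given in the commit message
PRECEDENCE = ("bedrock-agent.json", "cloudformation.yaml", "instruction.txt")


def pick_primary_source(agent_dir: Path) -> Optional[Path]:
    """Return the highest-priority input file present, or None if none exist."""
    for name in PRECEDENCE:
        candidate = agent_dir / name
        if candidate.is_file():
            return candidate
    return None
```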

D23 — Vertex AI parser (src/agentshift/parsers/vertex.py)
- Reads agent.json (required) + optional tools.json and README.md
- Reconstructs system_prompt from goal + instructions with separator
- Recovers structured sections from 'SectionName:\ncontent' linearized patterns
- Detects tool kind: function / openapi / data store (routed to ir.knowledge)
- Reconstructs ToolAuth from Vertex authentication blocks (apiKey/oauth/serviceAccount)
- Heuristic L1 guardrail scan of instruction strings
- Registered under 'vertex' parser key
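
Recovering sections from the linearized 'SectionName:\ncontent' pattern might look roughly like the sketch below. The regular expression and the `recover_sections` name are assumptions for illustration; the actual parser may recognise headings differently.

```python
import re

# Assumed heading shape: a capitalised name alone on a line, ending with a colon
SECTION_RE = re.compile(r"^(?P<name>[A-Z][A-Za-z ]*):[ \t]*$", re.MULTILINE)


def recover_sections(text: str) -> dict:
    """Split linearized instructions into {section name: content}."""
    sections = {}
    matches = list(SECTION_RE.finditer(text))
    for i, match in enumerate(matches):
        # Content runs from the end of this heading to the start of the next one
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group("name")] = text[match.end():end].strip()
    return sections


example = "Role:\nYou are a helper.\n\nConstraints:\nNever share secrets.\nStay on topic.\n"
parsed = recover_sections(example)
```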

Shared utilities (src/agentshift/parsers/utils.py)
- slugify, title_case_to_slug, is_todo_placeholder
- infer_guardrail_category, infer_guardrail_severity
- extract_guardrails_from_text (shared by both parsers)
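
For orientation, helpers of this kind are often only a few lines long. The sketch below shows one plausible shape for a slug function and a keyword-based category heuristic. The `_sketch` names and the keyword table are invented for illustration and are not taken from `utils.py`; only the category labels come from the PR description.

```python
import re

# Hypothetical keyword table mapping trigger words to guardrail categories
CATEGORY_KEYWORDS = {
    "privacy": ("personal data", "pii", "private"),
    "safety": ("harm", "dangerous", "weapon"),
    "compliance": ("regulation", "legal", "policy"),
}


def slugify_sketch(text: str) -> str:
    # Lowercase, collapse every run of non-alphanumerics to one hyphen, trim the ends
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def infer_category_sketch(rule: str) -> str:
    lowered = rule.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "general"  # fallback category when no keyword matches
```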

D24 — CLI updates (src/agentshift/cli.py)
- Added 'bedrock' and 'vertex' to _PARSERS registry
- convert/diff/audit now support --from bedrock and --from vertex
- Enhanced _parse_with_errors with bedrock/vertex-specific error hints

D25 — Version bump to 0.3.0
- pyproject.toml version: 0.3.0
- __init__.py __version__: 0.3.0
- CHANGELOG.md: v0.3.0 section added
- README.md: governance layer docs + cloud parser examples

Tests: 782 passed (all existing tests continue to pass)

@ogkranthi
Owner Author

Tests cherry-picked to main directly (commit 27b27ca). T14-T15 merged.

@ogkranthi ogkranthi closed this Mar 30, 2026