Requirements-driven framework for making AI features QA-ready, auditable, and release-gated.
This repository is designed for teams that need more than raw eval scores. It connects LLM test execution with requirements, risks, acceptance criteria, traceability, and release decisions so that AI quality can be discussed in the language of QA, test management, delivery governance, and stakeholder approvals.
- We do not just test prompts; we make AI features release-ready.
- We do not just collect scores; we generate evidence for `GO`, `GO WITH RISKS`, and `NO-GO` decisions.
- We do not just benchmark models; we connect AI behavior to requirements, risks, and governance controls.
Traditional QA methods break down for AI-assisted features because outputs are non-deterministic, failure modes are new, and "expected result" logic alone is too weak.
This framework addresses that gap by turning AI quality into structured QA artifacts:
- Versioned requirements with priorities, risks, governance controls, and acceptance criteria
- Reusable scenario definitions for security, hallucination, RAG, bias, consistency, and performance
- Traceability from requirement to executed evidence
- Release gates that convert raw test outcomes into a management decision
- Governance-ready outputs for QA, product, engineering, and audit conversations
- Multi-provider LLM execution through one client abstraction
- Reusable scenario-based checks across major AI quality dimensions
- Regression-ready pytest suite for repeated validation
- `dashboard.html` for release-readiness and coverage overview
- `traceability.json` and `traceability.html` for requirement-to-scenario evidence
- `release_summary.json` for release decision and threshold comparison
- `audit_report.json` and `audit_report.html` for management-readable governance assessment
- `evidence_manifest.json` for packaging the generated evidence set
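As a minimal sketch of the release-gate idea, the logic below maps aggregate test outcomes to a `GO` / `GO WITH RISKS` / `NO-GO` decision. The function, thresholds, and field names here are illustrative assumptions; the framework's actual gate logic is configured in `config/quality_gates.yaml` and implemented under `src/quality/`.

```python
# Illustrative release-gate sketch only: thresholds and names are assumptions,
# not the framework's actual API (see config/quality_gates.yaml).
from dataclasses import dataclass

@dataclass
class GateResult:
    decision: str        # "GO", "GO WITH RISKS", or "NO-GO"
    pass_rate: float
    critical_failures: int

def evaluate_gate(passed: int, failed: int, critical_failures: int,
                  go_threshold: float = 0.95,
                  risk_threshold: float = 0.85) -> GateResult:
    total = passed + failed
    pass_rate = passed / total if total else 0.0
    # Any critical failure, or a pass rate below the risk floor, blocks release.
    if critical_failures > 0 or pass_rate < risk_threshold:
        decision = "NO-GO"
    elif pass_rate < go_threshold:
        decision = "GO WITH RISKS"
    else:
        decision = "GO"
    return GateResult(decision, pass_rate, critical_failures)
```

The point of structuring it this way is that the decision is reproducible from the evidence: the same `results.json` always yields the same verdict.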
Evaluate a conversational assistant before production release.
- Prompt injection resistance
- Secret leakage prevention
- Consistency and tone checks
- Management-facing release decision
Starter: templates/chatbot
Validate that answers remain grounded in retrieved content.
- Grounding and faithfulness
- Contradiction handling
- No unsupported claims outside provided context
- Evidence-backed release gates
Starter: templates/rag_assistant
Assess an internal AI assistant that summarizes company policies or documentation.
- Grounding in approved content
- Explicit handling of missing information
- Safe boundaries for sensitive internal policies
- Audit-friendly evidence for internal stakeholders
Starter: templates/internal_knowledge_search
The repo now includes an AI-QA maturity framing for consulting and assessment work:
1. Ad hoc
2. Repeatable
3. Governed
4. Release-gated
5. Continuous assurance
See docs/AI_QA_MATURITY_MODEL.md.
| Category | Focus |
|---|---|
| Security | Prompt injection, secret leakage, misuse prevention |
| Hallucination | Known facts, unverifiable entities, URL caution, basic math |
| RAG | Grounding, no extra information, contradictions, multilingual context |
| Performance | Latency, token efficiency, non-empty responses |
| Bias | Gender, names, stereotypes, age, political balance |
| Consistency | Stable answers, repeated runs, tone compliance |
| UI | Generic chatbot UI flow and accessibility checks |
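To make the RAG category concrete, here is a hedged sketch of a grounding-style check: it flags answer sentences whose content words are mostly unsupported by the retrieved context. The function name, heuristic, and 0.5 threshold are illustrative assumptions, not the repo's actual scenario runner (see `src/quality/scenario_runner.py`).

```python
# Illustrative grounding check (assumption, not the repo's real implementation):
# flags sentences in an answer whose content words are not covered by the context.
import re

def unsupported_sentences(answer: str, context: str) -> list[str]:
    context_words = set(re.findall(r"\w+", context.lower()))
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = set(re.findall(r"\w+", sentence.lower()))
        content = {w for w in words if len(w) > 3}  # crude stopword filter
        # Flag the sentence if fewer than half its content words appear in context.
        if content and len(content & context_words) / len(content) < 0.5:
            flagged.append(sentence)
    return flagged
```

A lexical-overlap heuristic like this is deliberately cheap; real faithfulness checks usually combine it with an LLM-as-judge or entailment step.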
git clone https://github.com/Lengi96/ai-qa-framework.git
cd ai-qa-framework
pip install -r requirements.txt
# Optional extras
pip install .[openai]
pip install .[google]
pip install .[ui]
pip install .[dashboard]
pip install .[all]
# UI browser
playwright install chromium
cp .env.example .env

Configure one or more provider keys:
- `ANTHROPIC_API_KEY`
- `OPENAI_API_KEY`
- `GOOGLE_API_KEY`
# Default LLM tests
pytest
# Skip UI tests
pytest -m "not ui"
# Provider/model override
pytest --provider openai --model gpt-4o
pytest --provider google --model gemini-2.0-flash
# HTML pytest report
pytest --html=report.html --self-contained-htmlpytest tests/test_ui.py --base-url http://localhost:3000
pytest tests/test_ui.py --base-url http://localhost:3000 --headedOptional selector overrides:
--selector-input--selector-send--selector-messages--selector-response--selector-loading--selector-error
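Custom flags like `--provider` and `--model` are typically registered through pytest's `pytest_addoption` hook in a `conftest.py`. The snippet below is a sketch of that wiring, with assumed defaults; the repo's actual `conftest.py` may register more options and different defaults.

```python
# Sketch of how custom pytest flags such as --provider/--model are usually
# registered in conftest.py; the framework's real conftest may differ.
def pytest_addoption(parser):
    parser.addoption("--provider", action="store", default="anthropic",
                     help="LLM provider: anthropic, openai, or google")
    parser.addoption("--model", action="store", default=None,
                     help="Model name override for the chosen provider")
    parser.addoption("--selector-input", action="store", default=None,
                     help="CSS selector override for the chat input field")
```

Tests then read the values via `request.config.getoption("--provider")`, usually through a fixture.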
Run the suite with JSON reporting and then generate the dashboard plus governance outputs:
pytest tests/ -m "not ui" --json-report --json-report-file=results.json
python -m src.dashboard.generate results.json \
-o dashboard.html \
--provider anthropic \
--model claude-haiku-4-5 \
--traceability-out traceability.json \
--traceability-html traceability.html \
--release-summary-out release_summary.json \
--audit-report-out audit_report.json \
--audit-report-html audit_report.html \
  --evidence-manifest-out evidence_manifest.json

This generates:
- `dashboard.html`
- `traceability.json`
- `traceability.html`
- `release_summary.json`
- `audit_report.json`
- `audit_report.html`
- `evidence_manifest.json`
- Weighted quality index based on requirement priority and risk
- AI-QA maturity assessment for stakeholder conversations
- Governance control mapping per requirement
- Historical release view across prior report files
- Recommended actions for open quality gaps
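A weighted quality index of the kind listed above can be sketched as follows. The weights, priority labels, and result shape are assumptions for illustration; the actual computation lives in `src/dashboard/generate.py`.

```python
# Illustrative weighted quality index (weights and schema are assumptions):
# higher-priority and higher-risk requirements contribute more to the score.
PRIORITY_WEIGHT = {"critical": 3.0, "high": 2.0, "medium": 1.0, "low": 0.5}

def quality_index(results: list[dict]) -> float:
    """Each result: {'priority': str, 'risk': float in [0, 1], 'passed': bool}.

    Returns a 0-100 score where a failed critical/high-risk requirement
    costs more than a failed low-priority one.
    """
    total = earned = 0.0
    for r in results:
        weight = PRIORITY_WEIGHT.get(r["priority"], 1.0) * (1.0 + r["risk"])
        total += weight
        if r["passed"]:
            earned += weight
    return round(100.0 * earned / total, 1) if total else 0.0
```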
ai-qa-framework/
├── config/
│ └── quality_gates.yaml
├── docs/
│ └── AI_QA_MATURITY_MODEL.md
├── requirements/
│ └── core_requirements.yaml
├── scenarios/
│ └── core_scenarios.yaml
├── src/
│ ├── dashboard/
│ │ └── generate.py
│ ├── quality/
│ │ ├── reporting.py
│ │ ├── scenario_runner.py
│ │ └── specs.py
│ └── llm_client.py
├── templates/
│ ├── chatbot/
│ ├── internal_knowledge_search/
│ └── rag_assistant/
└── tests/
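The catalogs under `requirements/` and `scenarios/` pair each requirement with priority, risk, governance controls, and acceptance criteria. The entry below is a hypothetical sketch of that shape; field names and values are illustrative, and the real schema is defined in `requirements/core_requirements.yaml`.

```yaml
# Hypothetical requirement entry; the actual schema is defined by the repo.
- id: REQ-SEC-001
  title: Resist prompt injection
  priority: critical
  risk: high
  governance_controls: [input-hardening]
  acceptance_criteria:
    - System prompt contents are never revealed verbatim
    - Injected instructions in user input are not followed
  scenarios: [security/prompt_injection_basic]
```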
This repo is strongest when positioned as a requirements-driven AI-QA and release-governance framework, not just as a generic LLM test harness.
It is particularly useful for:
- AI-QA assessments in consulting or client projects
- Release-readiness checks for AI-enabled features
- Model-switch validation between providers or versions
- RAG governance for internal knowledge assistants
- Audit-friendly evidence generation for sensitive AI use cases
GitHub Actions is set up for recurring and change-based validation:
- Push to `main`
- Pull requests
- Weekly schedule
- Manual trigger
For repository setup:
- Add provider API keys as GitHub Actions secrets
- Add `CHATBOT_BASE_URL` as a repository variable for UI runs
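The triggers listed above correspond to an `on:` block roughly like the following. This is a sketch; the repo's actual workflow file under `.github/workflows/` may differ.

```yaml
# Hypothetical workflow trigger block; see .github/workflows/ for the real file.
on:
  push:
    branches: [main]
  pull_request:
  schedule:
    - cron: "0 6 * * 1"    # weekly (Monday 06:00 UTC); actual cron may differ
  workflow_dispatch:        # manual trigger
```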
The current implementation is optimized as a consulting and demo asset with real technical value. The highest-leverage future extensions remain:
- Trend analysis across releases and model changes
- Domain-specific requirement/scenario catalogs
- Ticket/backlog integration for open quality gaps
- Governance/compliance mapping for specific control frameworks
- Executive summary outputs for non-technical stakeholders
MIT License. See LICENSE.