Skip to content

feat(backend): add ControlVerificationTemplate table and CRUD endpoints#229

Open
AaronAlijani wants to merge 3 commits into
mainfrom
feature/26T1-BAC-AA-verification-templates
Open

feat(backend): add ControlVerificationTemplate table and CRUD endpoints#229
AaronAlijani wants to merge 3 commits into
mainfrom
feature/26T1-BAC-AA-verification-templates

Conversation

@AaronAlijani
Copy link
Copy Markdown
Collaborator

Summary

Phase 1 of the semi-automated manual verification system. Adds a per-control template table that stores auditor instructions, keywords, severity, and expected evidence type for the 14 manual CIS controls that AutoAudit cannot automate via the M365 collectors.

This is the foundation that later phases build on:

  • Phase 2 (subsequent commits to this branch), confidence scoring algorithm, seed the 14 templates, link evidence uploads to scan_result, auditor workflow.
  • Phase 3 (subsequent commits), status propagation from verdict to scan_result.status, scan finalisation for manual controls.

Companion to Ash Alijani's GRC documentation work on classifying manual controls and authoring the per-control templates JSON.

Type of Change

  • New feature

Affected Components

  • /backend-api

Motivation

Khan asked us to connect the evidence scanner with manual verification to reduce auditor workload. The idea: each manual control gets a verification template with specific auditor instructions and per-control keywords, so when an auditor uploads evidence, the existing OCR and keyword matching can validate it automatically and suggest a verdict with a confidence score. The auditor confirms or overrides.

This PR delivers the foundation, the table that holds those templates and the CRUD endpoints to manage them.

What's Included

  • New table control_verification_template (Alembic revision 8a7b91ea95d9, autogenerated)
    • control_id (unique, indexed)
    • title, instructions, keywords (JSONB), severity, evidence_type
    • created_at / updated_at with now() defaults
  • Pydantic schemas: Create, Update, Read
  • CRUD endpoints under /v1/verification-templates:
    • POST / create (admin only)
    • GET / list all
    • GET /{control_id} fetch one
    • PATCH /{control_id} partial update (admin only)
    • DELETE /{control_id} delete (admin only)
  • Write operations gated via the existing require_admin permission
  • Read operations available to any authenticated user (auditors/viewers need to see instructions in the UI)
  • Unique constraint on control_id returns 409 Conflict on duplicate creates

Testing Done

  • Tested manually — describe how:

Smoke tested locally end-to-end via curl:

  1. alembic upgrade head applied the migration cleanly with no errors
  2. POST /v1/verification-templates/ with admin token created a template for control 1.1.2 and returned 201 Created with the full record including id, created_at, updated_at
  3. GET /v1/verification-templates/1.1.2 returned 200 OK with the same record, JSONB keywords array intact
  4. Duplicate POST for the same control_id returned 409 Conflict with the expected error message, confirms unique constraint and conflict-handling logic
  5. Confirmed the new endpoints appear in /docs (Swagger) under the "Verification Templates" tag

Security Considerations

  • All write endpoints (POST / PATCH / DELETE) gated to admin role via require_admin from app/core/permissions.py (existing pattern).
  • All read endpoints require an authenticated user via get_current_user.
  • No secrets, tokens, or sensitive data introduced.
  • keywords field is stored as JSONB, same pattern as EvidenceValidation.matches_json and ScanResult.evidence, so no new attack surface.

Breaking Changes

  • No breaking changes

The new table is additive. No existing schemas, contracts, or workflows are modified.

Rollback Plan

  • Revert commit is sufficient

If a rollback is required:

docker compose exec backend-api uv run alembic downgrade j1k2l3m4n567

The migration's downgrade() drops the new index and the new table cleanly.

Checklist

  • Code follows project conventions
  • No secrets, credentials, or tokens committed
  • Relevant documentation updated (if applicable), docs are Ash's parallel work
  • CI/CD workflows pass on this branch
  • PR is focused on one thing

Screenshots

All 5 endpoints visible in Swagger UI under the "Verification Templates" tag:

image

GET request returns the created template with all fields and timestamps:

image

Phase 1 of the semi-automated manual verification system. Adds a per-control template that stores auditor instructions, keywords, severity, and expected evidence type for the 14 manual CIS controls.

- New table control_verification_template (Alembic 8a7b91ea95d9)

- Pydantic schemas: Create, Update, Read

- CRUD endpoints under /v1/verification-templates

- Write operations (POST/PATCH/DELETE) gated to admin role

- Read operations available to any authenticated user

- Unique constraint on control_id with 409 Conflict on duplicates

Smoke-tested locally: POST creates row, GET returns it, duplicate POST returns 409.

Refs: Khan's product brief on manual control verification.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 358f6754df

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +134 to +136
update_fields = data.model_dump(exclude_unset=True)
for field, value in update_fields.items():
setattr(template, field, value)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject nulls before applying template updates

When a PATCH request explicitly sends null for non-nullable fields such as title, instructions, keywords, or severity, the update schema accepts it and exclude_unset=True still includes that field, so this loop writes None to columns declared nullable=False. In that scenario the subsequent commit raises a database integrity error and returns a 500 instead of a client error; either disallow None in the update schema or reject provided nulls before assigning.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator

@du-dhartley du-dhartley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AaronAlijani Great work on this one so far. The concept makes sense and has been well thought through.
There are couple of comments on specific lines in files to address, however there is one major issue that needs to be considered before this can be merged.

The short summary is that control_id alone is not a suitable unique identifier and we're completely missing the concept of a benchmark/framework version when saving this info.

When a user runs a scan they're selecting a framework, a benchmark and a version, and the policies directory already carries v3.1.0, v4.0.0 and v6.0.0 of the M365 benchmark side by side. CIS renumbers and rewrites controls between versions, so "1.1.2" in v3 isn't the same control as "1.1.2" in v6, and once the Azure benchmarks come online we'll have a second framework sharing the same numeric space. The new table keys uniquely on control_id alone, which means it can only hold one set of instructions per control across the whole platform — there's no way to say "these are the v6.0.0 instructions for 1.1.2" distinctly from the v3.1.0 ones. That'll bite us in Phase 2 the moment we try to seed: either we pick one version and quietly mismatch every other scan, or the second seed run trips the unique constraint. Could you fold framework, benchmark and version onto the table now, and make uniqueness composite across those plus control_id? The lookup pattern in scans.py already passes that tuple through end to end, so it'll plug in cleanly and keep the Phase 2 seeding honest about which version each template belongs to.

Comment thread backend-api/app/models/__init__.py Outdated
from app.models.evidence_validation import EvidenceValidation
from app.models.contact import ContactSubmission, SubmissionNote, SubmissionHistory
from app.models.user_settings import UserSettings
from app.models.user_settings import UserSettings
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is a duplicate of the one above and should be removed

keywords: Mapped[list] = mapped_column(JSONB, nullable=False)

# Risk severity: "high" | "medium" | "low".
severity: Mapped[str] = mapped_column(String(20), nullable=False)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment above shows what the values should be, but there's no actual enforcement that what's coming in here matches those values. If this is what we expect to see, we should enforce this - Pydantic supports the concept of a Literal that does this, such as

from typing import Literal

Severity = Literal["high", "medium", "low"]

@AaronAlijani
Copy link
Copy Markdown
Collaborator Author

Hi David, thanks for the thorough review. All four points addressed in the latest commit:

  1. Composite key : control_id alone is not enough. Templates are now keyed on the full (framework, benchmark, version, control_id) tuple, mirroring the same fields Scan already carries (String(50)/(100)/(20) respectively). Column types and lookup shape match scans.py end-to-end, so Phase 2 seeding can pass the tuple straight through. The unique constraint is composite (uq_template_framework_benchmark_version_control), and control_id is still indexed for lookups, but no longer uniquely so. Endpoints now use the path pattern /{framework}/{benchmark}/{version}/{control_id}, and the list endpoint accepts optional framework/benchmark/version query filters for scoped seeding.
  2. Severity enforcement. Added Severity = Literal["high", "medium", "low"] (and EvidenceType = Literal["screenshot", "pdf_export", "comment_only"] for the same reason) on Create and Update. Invalid values now return 422 at the API layer instead of being silently accepted as strings. DB column kept as String(20) for forward-compat.
  3. PATCH null handling (Codex review). Added a @model_validator on Update that rejects explicit nulls on title, instructions, keywords, and severity (the non-nullable columns). Clients now get a clean 422 instead of triggering a 500 through the integrity error.
  4. Duplicate UserSettings import. Removed.

Migration chain note: PR #166 and this PR both branch from j1k2l3m4n567 as their down_revision. That's fine while they're both unmerged, but whichever merges second will need its down_revision updated to point at the first one's revision, so Alembic has a single linear head.

@du-dhartley du-dhartley self-requested a review May 22, 2026 11:59
Copy link
Copy Markdown
Collaborator

@du-dhartley du-dhartley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AaronAlijani Thanks for the changes, this PR looks good now.
I've approved this, but agree with your comment about the migration - if you merge this one, after you address the last remaining comment in your other PR, push up the migration change and that can be approved also.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

CI: Backend API

Job Result
Security analysis (CodeQL + Bandit) failure
Lint success

One or more checks failed. View logs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants