New agent: Red Team security scanner #10

@markgar

Summary

Add a new "red team" agent that runs after all builders finish, spins up the application in its own Docker containers, and executes automated security scanning using OWASP ZAP, Nuclei, and Trivy. Findings are filed as GitHub Issues with a new security label.

Design

Trigger: end-of-build (not milestone-triggered)

Unlike the tester/validator (which run after each milestone), the red team agent waits for all builders to finish, then runs a single comprehensive security audit against the complete application. It polls is_builder_done() like other agents, but executes only once instead of looping per milestone.
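The poll-then-run-once behaviour could look roughly like the sketch below. It assumes is_builder_done() is a zero-argument callable returning a bool (its real signature is not shown in this issue); run_audit, poll_seconds, and the injectable sleep are hypothetical names added for illustration.

```python
import time
from typing import Callable


def run_once_when_builders_done(
    is_builder_done: Callable[[], bool],
    run_audit: Callable[[], None],
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until is_builder_done() is true, then run the audit exactly once.

    Returns the number of polls that came back False before the audit ran.
    """
    waits = 0
    while not is_builder_done():
        waits += 1
        sleep(poll_seconds)
    # Single comprehensive pass: there is no per-milestone loop here.
    run_audit()
    return waits
```

Passing sleep as a parameter keeps the loop testable without real delays.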

Security scanning tools (all Docker-based, no Python deps)

Copilot CLI orchestrates three Docker-based scanners via shell commands:

OWASP ZAP (full scan) — dynamic application security testing (DAST)

  • Crawls the app from the root URL, discovers pages/forms/endpoints automatically
  • Injects attack payloads into every discovered input (SQL injection, XSS, path traversal, etc.)
  • API scan mode if OpenAPI/Swagger spec exists at common paths (/openapi.json, /swagger.json)
  • Authenticated scanning: Copilot reads the codebase for seed data / registration endpoints, creates a test user, logs in, and passes auth context to ZAP so it can test the surface behind login
  • docker run --rm --network host -v "$(pwd)":/zap/wrk/:rw ghcr.io/zaproxy/zaproxy zap-full-scan.py -t http://localhost:{port} -J zap-report.json (the /zap/wrk mount is needed so the report is written back to the host)

Nuclei — known vulnerability template scanner (6000+ community templates)

  • Checks for exposed admin panels, default credentials, debug endpoints, git/config exposure, open redirects, tech fingerprinting mapped to known CVEs
  • docker run --rm --network host -v "$(pwd)":/out projectdiscovery/nuclei -u http://localhost:{port} -t http/ -j -o /out/nuclei-results.json (the mount keeps the results file after the container exits)

Trivy — container + dependency scanning (static, doesn't probe the running app)

  • Scans the Docker image for vulnerable OS packages and language dependencies (npm, pip, NuGet)
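  • docker run --rm -v /var/run/docker.sock:/var/run/docker.sock -v "$(pwd)":/out aquasec/trivy image --format json -o /out/trivy-results.json {app_image} (the socket mount lets Trivy see locally built images; the /out mount keeps the report)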

What it catches (concrete examples)

| Tool | Vulnerability | How |
| --- | --- | --- |
| ZAP | SQL injection on login form | Injects ' OR 1=1 -- into username field, detects auth bypass |
| ZAP | Reflected XSS | Injects <script>alert(1)</script> into inputs, checks if rendered unescaped |
| ZAP | Broken access control | Accesses authenticated endpoints without session cookie, flags if data returned |
| ZAP | IDOR (with auth) | Changes /api/users/5/profile to /api/users/6/profile, checks if user 5 sees user 6's data |
| ZAP | CSRF | Checks if state-changing POST endpoints work without CSRF tokens |
| ZAP | Missing security headers | Flags missing X-Content-Type-Options, CSP, HSTS, etc. |
| ZAP | Cookie issues | Session cookie missing HttpOnly, Secure, SameSite flags |
| ZAP | CORS misconfiguration | Sends Origin: https://evil.com, flags Access-Control-Allow-Origin: * on auth endpoints |
| Nuclei | Exposed debug endpoints | /actuator/env, /debug, /.env accessible |
| Nuclei | Default credentials | admin/admin works on login |
| Nuclei | Git config exposure | /.git/config accessible |
| Trivy | Vulnerable dependencies | lodash@4.17.15 has prototype pollution CVE |
| Trivy | Vulnerable base image | Alpine has OpenSSL vulnerability |

Containers: spins up its own

Uses the same compute_project_ports() function as the validator, but passes "redteam-" + project_name as the input so the derived ports are deterministic and do not collide with the validator's. After scanning, it tears the containers down with docker compose down.
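The body of compute_project_ports() is not shown in this issue, so the sketch below uses a hypothetical hash-based stand-in (derive_port, with an assumed port range of 20000 to 29999) only to illustrate why prefixing the name yields a different but stable port. Note that hashing alone makes a collision unlikely rather than impossible, which is why the sketch checks for it explicitly.

```python
import hashlib

# Assumed port range for illustration; the real range is not stated in the issue.
BASE_PORT = 20000
PORT_SPAN = 10000


def derive_port(name: str) -> int:
    """Hypothetical stand-in for compute_project_ports(): map a name to a stable port."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return BASE_PORT + int.from_bytes(digest[:4], "big") % PORT_SPAN


def redteam_port(project_name: str) -> int:
    """Derive the red team port from the prefixed name, avoiding the validator's port."""
    validator = derive_port(project_name)
    port = derive_port("redteam-" + project_name)
    if port == validator:
        # Rare hash collision: step to the next port, wrapping inside the range.
        port = BASE_PORT + (port - BASE_PORT + 1) % PORT_SPAN
    return port
```

The same project name always maps to the same pair of ports, so repeated runs do not need any shared state to stay out of each other's way.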

GitHub Issues: new security label

Each finding filed as: gh issue create --title '[security] <category>: <description>' --body '<reproduction steps, severity, affected endpoint>' --label security

Duplicate checking via gh issue list --label security --state open before filing.
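In this design Copilot would run the gh commands itself, but the filing logic can be sketched in Python to make the duplicate check concrete. The function names below are hypothetical, and comparing titles case-insensitively is an assumption about what counts as a duplicate.

```python
import json
import subprocess
from typing import Callable, Iterable


def fetch_open_security_titles() -> list[str]:
    """Return titles of open issues labelled 'security' (needs an authenticated gh CLI)."""
    out = subprocess.run(
        ["gh", "issue", "list", "--label", "security", "--state", "open",
         "--json", "title", "--limit", "200"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [item["title"] for item in json.loads(out)]


def file_finding(
    category: str,
    description: str,
    body: str,
    existing_titles: Iterable[str],
    run: Callable[..., object] = subprocess.run,
) -> bool:
    """File one finding unless an open issue already has the same title.

    Returns True if an issue was created, False if it was skipped as a duplicate.
    """
    title = f"[security] {category}: {description}"
    known = {t.casefold() for t in existing_titles}
    if title.casefold() in known:
        return False
    run(
        ["gh", "issue", "create", "--title", title, "--body", body,
         "--label", "security"],
        check=True,
    )
    return True
```

Passing the argument list directly (rather than a shell string) avoids quoting problems when a finding's description contains quotes or backticks.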

Results file

Writes redteam-results.txt (one line per test: PASS/FAIL with category tags like [injection], [xss], [auth], [headers], [deps]). Copied to logs/redteam-results.txt by the Python orchestration. Summary printed to terminal.
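A minimal sketch of the summary parsing follows. The exact line layout is not specified in this issue, so the sketch assumes each result line starts with PASS or FAIL and carries its category tags in square brackets somewhere after that; summarize_results is a hypothetical name.

```python
import re
from collections import Counter

# Assumed layout: "FAIL [xss] reflected input on /search"
LINE_RE = re.compile(r"^\s*(PASS|FAIL)\b(.*)$")
TAG_RE = re.compile(r"\[([a-z0-9_-]+)\]")


def summarize_results(text: str) -> dict:
    """Count PASS/FAIL lines and tally failures per category tag."""
    passed = failed = 0
    failed_by_tag: Counter = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # ignore blank lines and anything that is not a result
        status, rest = match.groups()
        if status == "PASS":
            passed += 1
        else:
            failed += 1
            failed_by_tag.update(TAG_RE.findall(rest))
    return {"passed": passed, "failed": failed, "failed_by_tag": dict(failed_by_tag)}
```

A line with several tags counts once toward the failed total but once per tag in the per-category tally.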

Implementation

Files to create

| File | Contents |
| --- | --- |
| src/buildteam/redteam.py | Agent module: redteamloop() CLI function, register(app), poll-then-run loop, results summary |
| src/buildteam/prompts/redteam.py | REDTEAM_PROMPT format string; instructs Copilot to run ZAP/Nuclei/Trivy, authenticated scan, parse results, file issues |
| tests/test_redteam.py | Tests for results summary parsing, port isolation |

Files to modify

| File | Change |
| --- | --- |
| src/buildteam/prompts/__init__.py | Import + re-export REDTEAM_PROMPT |
| src/buildteam/cli.py | Import redteam module, call register(app) |
| src/buildteam/orchestrator.py | Add "redteam" to _AGENT_ROLES, _clone_all_agents, _pull_all_clones; add --redteam-model option to go(); call ensure_security_label_exists() in launch; spawn redteam terminal |
| src/buildteam/bootstrap.py | Add "redteam" to _clone_agent_copies list |
| src/buildteam/utils.py | Add ensure_security_label_exists(), which runs gh label create security --color b60205 --force |
| src/buildteam/utils.py | Move compute_project_ports() from validator.py to utils.py (shared by validator + redteam) |
| src/buildteam/validator.py | Update import of compute_project_ports from utils |
| src/buildteam/prompts/builder.py | Add gh issue list --label security --state open to builder's issue-checking instructions so builder fixes security findings |
| AGENTS.md | Add Red Team agent section |
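For the new helper in utils.py, a minimal sketch might look like the following. The injectable run parameter and the returned command list are assumptions added for testability, not part of the proposal.

```python
import subprocess
from typing import Callable


def ensure_security_label_exists(run: Callable[..., object] = subprocess.run) -> list[str]:
    """Create the 'security' label, or update it if it already exists.

    With --force, gh updates an existing label instead of failing,
    so the call is safe to repeat on every launch.
    """
    cmd = ["gh", "label", "create", "security", "--color", "b60205", "--force"]
    run(cmd, check=True)
    return cmd
```

Because the command is idempotent, the orchestrator can call it unconditionally during launch rather than checking for the label first.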

Prompt design notes

The prompt should instruct Copilot to:

  1. Read DEPLOY.md for container config, SPEC.md for tech stack, REQUIREMENTS.md for feature inventory
  2. Start app containers on isolated ports (redteam-prefixed COMPOSE_PROJECT_NAME)
  3. Wait for health check
  4. Run ZAP baseline scan first to verify connectivity
  5. Discover auth mechanism from codebase — create test user if registration exists, use seed data if available
  6. Run ZAP full scan (unauthenticated)
  7. Run ZAP full scan (authenticated) if login was possible
  8. Run ZAP API scan if OpenAPI spec found
  9. Run Nuclei HTTP templates
  10. Run Trivy on app Docker image
  11. Parse all JSON reports, file GitHub Issues for medium+ severity findings
  12. Write redteam-results.txt with one line per check
  13. Tear down containers
  14. Commit artifacts with [redteam] prefix, do NOT push
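Step 11 (parse reports, keep medium+ findings) could be sketched as below for the Trivy and Nuclei outputs. The field names reflect my understanding of those tools' JSON output (Trivy: Results, Vulnerabilities, Severity; Nuclei: one JSON object per line with info.severity) and should be checked against the versions actually used. The normalized finding shape and the function names are hypothetical.

```python
import json

SEVERITY_RANK = {"info": 0, "unknown": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


def _keep(severity: str, minimum: str) -> bool:
    """True if severity is at or above the minimum; unrecognized values rank lowest."""
    return SEVERITY_RANK.get(severity.lower(), 0) >= SEVERITY_RANK[minimum]


def trivy_findings(report_text: str, minimum: str = "medium") -> list[dict]:
    """Extract dependency/image vulnerabilities from a Trivy JSON report."""
    report = json.loads(report_text)
    found = []
    for result in report.get("Results") or []:
        # Targets with no vulnerabilities may omit the key entirely.
        for vuln in result.get("Vulnerabilities") or []:
            severity = vuln.get("Severity", "")
            if _keep(severity, minimum):
                found.append({
                    "tool": "trivy",
                    "category": "deps",
                    "severity": severity.lower(),
                    "title": f'{vuln.get("PkgName", "?")}: {vuln.get("VulnerabilityID", "?")}',
                })
    return found


def nuclei_findings(jsonl_text: str, minimum: str = "medium") -> list[dict]:
    """Extract template matches from Nuclei JSON Lines output."""
    found = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        hit = json.loads(line)
        info = hit.get("info", {})
        severity = info.get("severity", "")
        if _keep(severity, minimum):
            found.append({
                "tool": "nuclei",
                "category": hit.get("template-id", "?"),
                "severity": severity.lower(),
                "title": f'{info.get("name", "?")} at {hit.get("matched-at", "?")}',
            })
    return found
```

Normalizing both tools into one shape means the issue-filing step only has to handle a single finding format, and a ZAP parser could be added in the same style.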

What it won't catch

Business logic flaws, authorization rules between specific users beyond simple IDOR, race conditions, and vulnerabilities that require domain-specific multi-step setup. Those need manual penetration testing.

Labels: enhancement
