Summary
Add a new "red team" agent that runs after all builders finish, spins up the application in its own Docker containers, and executes automated security scanning using OWASP ZAP, Nuclei, and Trivy. Findings are filed as GitHub Issues with a new security label.
Design
Trigger: end-of-build (not milestone-triggered)
Unlike the tester/validator (which run after each milestone), the red team agent waits for all builders to finish, then runs a single comprehensive security audit against the complete application. It polls is_builder_done() like other agents, but executes only once instead of looping per milestone.
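A minimal sketch of the poll-then-run control flow. It assumes `is_builder_done()` takes a builder name and returns a bool; its real signature isn't given here. `wait_then_run_once` and `run_audit` are hypothetical names. The sleep function is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Iterable


def wait_then_run_once(
    builders: Iterable[str],
    is_builder_done: Callable[[str], bool],  # assumed signature: name -> finished?
    run_audit: Callable[[], None],  # hypothetical: one full security audit
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until every builder is done, then run the audit exactly once.

    Returns how many polling rounds found at least one builder still running.
    """
    pending = list(builders)
    rounds_waited = 0
    while True:
        # Keep only the builders that have not finished yet.
        pending = [b for b in pending if not is_builder_done(b)]
        if not pending:
            break
        rounds_waited += 1
        sleep(poll_seconds)
    # Unlike milestone-triggered agents there is no outer loop:
    # the audit runs once against the complete application.
    run_audit()
    return rounds_waited
```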
Security scanning tools (all Docker-based, no Python deps)
Copilot CLI orchestrates three Docker-based scanners via shell commands:
OWASP ZAP (full scan) — dynamic application security testing (DAST)
- Crawls the app from the root URL, discovers pages/forms/endpoints automatically
- Injects attack payloads into every discovered input (SQL injection, XSS, path traversal, etc.)
- API scan mode (zap-api-scan.py) if an OpenAPI/Swagger spec exists at a common path (/openapi.json, /swagger.json)
- Authenticated scanning: Copilot reads the codebase for seed data / registration endpoints, creates a test user, logs in, and passes the resulting session cookie or token to ZAP (for example as a request header) so it can test the surface behind login
docker run --rm --network host -v "$(pwd)":/zap/wrk/:rw ghcr.io/zaproxy/zaproxy:stable zap-full-scan.py -t http://localhost:{port} -J zap-report.json
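A sketch of how the orchestration side might build the ZAP invocation as an argv list and interpret its result. It assumes ZAP's packaged-scan conventions: reports passed to `-J` are written relative to `/zap/wrk` inside the container, so a host directory is mounted there. The `ZAP_AUTH_HEADER_VALUE` environment variable adds a header to every request, which is one way to hand over the test user's token. Exit codes distinguish "alerts found" from "scan broke". The function names are hypothetical.

```python
import os
from typing import Optional

ZAP_IMAGE = "ghcr.io/zaproxy/zaproxy:stable"


def zap_full_scan_argv(
    port: int,
    report_dir: str,
    report_name: str = "zap-report.json",
    auth_header_value: Optional[str] = None,
) -> list[str]:
    """Build a `docker run` argv for a ZAP full scan of http://localhost:{port}."""
    argv = [
        "docker", "run", "--rm", "--network", "host",
        # Packaged scans write -J reports relative to /zap/wrk in the container.
        "-v", f"{os.path.abspath(report_dir)}:/zap/wrk/:rw",
    ]
    if auth_header_value:
        # ZAP adds this value to every request (as Authorization unless
        # ZAP_AUTH_HEADER names a different header).
        argv += ["-e", f"ZAP_AUTH_HEADER_VALUE={auth_header_value}"]
    argv += [
        ZAP_IMAGE, "zap-full-scan.py",
        "-t", f"http://localhost:{port}",
        "-J", report_name,
    ]
    return argv


def zap_scan_failed_to_run(returncode: int) -> bool:
    """Packaged-scan exit codes: 0 = clean, 1 = FAILs found, 2 = WARNs only,
    3 = the scan itself errored. Only 3 (or anything unexpected) means no report."""
    return returncode not in (0, 1, 2)
```

A non-zero exit from ZAP is therefore not an agent error by itself. Codes 1 and 2 simply mean there are alerts to parse and file.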
Nuclei — known vulnerability template scanner (6000+ community templates)
- Checks for exposed admin panels, default credentials, debug endpoints, git/config exposure, open redirects, tech fingerprinting mapped to known CVEs
docker run --rm --network host -v "$(pwd)":/output projectdiscovery/nuclei -u http://localhost:{port} -t http/ -j -o /output/nuclei-results.jsonl
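Nuclei's `-j` flag writes JSON Lines (one JSON object per match), not a single JSON document. A reader therefore has to parse line by line. This sketch assumes the field names in current Nuclei output (`template-id`, `info.name`, `info.severity`, `matched-at`). Those names should be checked against the Nuclei version in use, and the `Finding` type is hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    tool: str
    severity: str  # lowercased: info/low/medium/high/critical/unknown
    title: str
    location: str


def parse_nuclei_jsonl(text: str) -> list[Finding]:
    """Parse Nuclei's JSON Lines output into Finding records."""
    findings = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        info = record.get("info", {})
        findings.append(
            Finding(
                tool="nuclei",
                severity=str(info.get("severity", "unknown")).lower(),
                title=info.get("name") or record.get("template-id", "unknown"),
                location=record.get("matched-at") or record.get("host", ""),
            )
        )
    return findings
```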
Trivy — container + dependency scanning (static, doesn't probe the running app)
- Scans the Docker image for vulnerable OS packages and language dependencies (npm, pip, NuGet)
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock -v "$(pwd)":/output aquasec/trivy image --format json -o /output/trivy-results.json {app_image}
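A sketch of flattening Trivy's JSON report into one list of vulnerable packages. It assumes the current report shape: a top-level `Results` list in which each target has a `Vulnerabilities` list carrying `VulnerabilityID`, `PkgName`, `InstalledVersion`, `FixedVersion`, and `Severity`. `Vulnerabilities` is absent or null for clean targets, so the parser treats it as optional. The `DepVuln` type is hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DepVuln:
    vuln_id: str
    package: str
    installed: str
    fixed: str  # empty when no fixed version has been published
    severity: str  # Trivy uses uppercase: LOW/MEDIUM/HIGH/CRITICAL/UNKNOWN
    target: str  # e.g. an OS package DB or a lockfile inside the image


def parse_trivy_json(text: str) -> list[DepVuln]:
    """Flatten Trivy's per-target results into a single list."""
    report = json.loads(text)
    vulns = []
    for result in report.get("Results") or []:
        # "Vulnerabilities" is omitted or null for targets with no findings.
        for v in result.get("Vulnerabilities") or []:
            vulns.append(
                DepVuln(
                    vuln_id=v.get("VulnerabilityID", ""),
                    package=v.get("PkgName", ""),
                    installed=v.get("InstalledVersion", ""),
                    fixed=v.get("FixedVersion", ""),
                    severity=str(v.get("Severity", "UNKNOWN")).upper(),
                    target=result.get("Target", ""),
                )
            )
    return vulns
```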
What it catches (concrete examples)
| Tool | Vulnerability | How |
|---|---|---|
| ZAP | SQL injection on login form | Injects `' OR 1=1 --` into username field, detects auth bypass |
| ZAP | Reflected XSS | Injects `<script>alert(1)</script>` into inputs, checks if rendered unescaped |
| ZAP | Broken access control | Accesses authenticated endpoints without session cookie, flags if data returned |
| ZAP | IDOR (with auth) | Changes `/api/users/5/profile` to `/api/users/6/profile`, checks if user 5 sees user 6's data |
| ZAP | CSRF | Checks if state-changing POST endpoints work without CSRF tokens |
| ZAP | Missing security headers | Flags missing `X-Content-Type-Options`, `CSP`, `HSTS`, etc. |
| ZAP | Cookie issues | Session cookie missing `HttpOnly`, `Secure`, `SameSite` flags |
| ZAP | CORS misconfiguration | Sends `Origin: https://evil.com`, flags `Access-Control-Allow-Origin: *` on auth endpoints |
| Nuclei | Exposed debug endpoints | `/actuator/env`, `/debug`, `/.env` accessible |
| Nuclei | Default credentials | `admin/admin` works on login |
| Nuclei | Git config exposure | `/.git/config` accessible |
| Trivy | Vulnerable dependencies | `lodash@4.17.15` has prototype pollution CVE |
| Trivy | Vulnerable base image | Alpine has OpenSSL vulnerability |
Containers: spins up its own
Uses the same compute_project_ports() function but with a "redteam-" + project_name input to produce deterministic non-colliding ports vs. the validator. Tears down containers after scanning (docker compose down).
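The body of compute_project_ports() isn't shown here. The following hypothetical stand-in (`derive_ports`, with made-up `base`/`span` defaults) illustrates the idea of deterministic ports derived from a name. It uses a stable hash (SHA-256) rather than Python's per-process salted `hash()`, so the same name always maps to the same ports. A hash makes a clash with the validator's ports unlikely but not impossible, so comparing the two port sets before `docker compose up` is cheap insurance.

```python
import hashlib


def derive_ports(project_key: str, count: int = 2, base: int = 20000, span: int = 20000) -> list[int]:
    """Hypothetical stand-in for compute_project_ports(): map a key to `count`
    consecutive ports inside [base, base + span)."""
    # SHA-256 is stable across processes, unlike Python's salted hash().
    digest = hashlib.sha256(project_key.encode("utf-8")).digest()
    # Leave room so the last of the `count` ports still fits inside the range.
    offset = int.from_bytes(digest[:4], "big") % (span - count + 1)
    start = base + offset
    return list(range(start, start + count))


project_name = "myapp"
validator_ports = derive_ports(project_name)
redteam_ports = derive_ports("redteam-" + project_name)
```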
GitHub Issues: new security label
Each finding is filed as: gh issue create --title '[security] <category>: <description>' --body '<reproduction steps, severity, affected endpoint>' --label security
Duplicate checking via gh issue list --label security --state open before filing.
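A sketch of the dedup-then-file step. It makes two assumptions. First, a duplicate means an open `security` issue with the exact same title. Second, `gh issue list` returns only 30 issues by default, so the sketch passes a larger `--limit` and asks for `--json title`. Commands are built as argv lists rather than single-quoted shell strings, so a reproduction body containing a payload like `' OR 1=1 --` needs no escaping. The function names are hypothetical.

```python
import json
import subprocess
from typing import Iterable


def open_security_issue_titles(limit: int = 500) -> set[str]:
    """Titles of open issues labeled 'security' (needs an authenticated gh)."""
    out = subprocess.run(
        ["gh", "issue", "list", "--label", "security", "--state", "open",
         "--limit", str(limit), "--json", "title"],
        check=True, capture_output=True, text=True,
    ).stdout
    return {issue["title"] for issue in json.loads(out)}


def security_issue_title(category: str, description: str) -> str:
    return f"[security] {category}: {description}"


def plan_issue_commands(
    findings: Iterable[tuple[str, str, str]],  # (category, description, body)
    existing_titles: set[str],
) -> list[list[str]]:
    """Return one `gh issue create` argv per finding whose title is not yet open."""
    seen = set(existing_titles)
    commands = []
    for category, description, body in findings:
        title = security_issue_title(category, description)
        if title in seen:
            continue  # already open on GitHub, or repeated within this run
        seen.add(title)
        commands.append(["gh", "issue", "create", "--title", title,
                         "--body", body, "--label", "security"])
    return commands
```

Usage: `for argv in plan_issue_commands(findings, open_security_issue_titles()): subprocess.run(argv, check=True)`.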
Results file
Writes redteam-results.txt (one line per test: PASS/FAIL with category tags like [injection], [xss], [auth], [headers], [deps]). Copied to logs/redteam-results.txt by the Python orchestration. Summary printed to terminal.
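A sketch of the results summary that tests/test_redteam.py would exercise. The exact line format isn't pinned down above. This sketch assumes `PASS` or `FAIL`, then one or more `[tag]`s, then free text (for example `FAIL [injection] SQL injection on POST /login`). Malformed lines are collected rather than silently dropped, so a prompt regression shows up in the summary.

```python
import re
from collections import Counter

# Assumed line shape: "PASS|FAIL [tag] [tag2] free-text description"
LINE_RE = re.compile(r"^(PASS|FAIL)\s+((?:\[[\w-]+\]\s*)+)(.*)$")
TAG_RE = re.compile(r"\[([\w-]+)\]")


def summarize_results(text: str) -> dict:
    """Count PASS/FAIL lines and FAILs per category tag; keep unparsed lines."""
    status_counts: Counter = Counter()
    fails_by_tag: Counter = Counter()
    unparsed = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = LINE_RE.match(line)
        if m is None:
            unparsed.append(line)  # surface malformed lines instead of dropping them
            continue
        status, tags, _description = m.groups()
        status_counts[status] += 1
        if status == "FAIL":
            for tag in TAG_RE.findall(tags):
                fails_by_tag[tag] += 1
    return {
        "pass": status_counts["PASS"],
        "fail": status_counts["FAIL"],
        "fails_by_tag": dict(fails_by_tag),
        "unparsed": unparsed,
    }
```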
Implementation
Files to create
| File | Contents |
|---|---|
| `src/buildteam/redteam.py` | Agent module: `redteamloop()` CLI function, `register(app)`, poll-then-run loop, results summary |
| `src/buildteam/prompts/redteam.py` | `REDTEAM_PROMPT` format string: instructs Copilot to run ZAP/Nuclei/Trivy, authenticated scan, parse results, file issues |
| `tests/test_redteam.py` | Tests for results summary parsing, port isolation |
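Because `REDTEAM_PROMPT` is a `str.format` template, any literal brace in the prompt text must be doubled, or `.format()` will raise or substitute the wrong thing. The following is a hypothetical, heavily trimmed sketch. The placeholder names `project_name`, `web_port`, and `app_image` are assumptions, not the module's actual fields.

```python
# Hypothetical, heavily trimmed sketch of src/buildteam/prompts/redteam.py.
# Placeholder names (project_name, web_port, app_image) are illustrative only.
REDTEAM_PROMPT = """\
You are the red team agent for {project_name}.
1. Read DEPLOY.md, SPEC.md and REQUIREMENTS.md.
2. Start the app with COMPOSE_PROJECT_NAME=redteam-{project_name} on port {web_port}.
3. Run ZAP, Nuclei and Trivy (image: {app_image}) and parse their reports.
   Nuclei writes JSON Lines: one {{...}} object per match.
4. File medium+ findings with gh issue create ... --label security.
5. Write redteam-results.txt, tear down containers, commit with a [redteam] prefix.
   Do NOT push.
"""

prompt = REDTEAM_PROMPT.format(
    project_name="shop", web_port=18080, app_image="shop-web:latest"
)
```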
Files to modify
| File | Change |
|---|---|
| `src/buildteam/prompts/__init__.py` | Import + re-export `REDTEAM_PROMPT` |
| `src/buildteam/cli.py` | Import redteam module, call `register(app)` |
| `src/buildteam/orchestrator.py` | Add `"redteam"` to `_AGENT_ROLES`, `_clone_all_agents`, `_pull_all_clones`; add `--redteam-model` option to `go()`; call `ensure_security_label_exists()` in launch; spawn redteam terminal |
| `src/buildteam/bootstrap.py` | Add `"redteam"` to `_clone_agent_copies` list |
| `src/buildteam/utils.py` | Add `ensure_security_label_exists()`: `gh label create security --color b60205 --force` |
| `src/buildteam/utils.py` | Move `compute_project_ports()` from `validator.py` to `utils.py` (shared by validator + redteam) |
| `src/buildteam/validator.py` | Update import of `compute_project_ports` from `utils` |
| `src/buildteam/prompts/builder.py` | Add `gh issue list --label security --state open` to builder's issue-checking instructions so builder fixes security findings |
| `AGENTS.md` | Add Red Team agent section |
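A sketch of `ensure_security_label_exists()` wrapping the listed `gh label create ... --force` command. `--force` updates the label if it already exists, so the call is safe on every launch. Returning a bool instead of raising is a design choice in this sketch, assuming the launcher should warn and carry on rather than abort when `gh` is missing or unauthenticated. The `run` parameter is a hypothetical injection point for tests.

```python
import subprocess
from typing import Callable, Sequence

# The command listed for utils.py; --force updates the label if it exists.
SECURITY_LABEL_ARGV = ["gh", "label", "create", "security", "--color", "b60205", "--force"]


def _run_quietly(argv: Sequence[str]) -> subprocess.CompletedProcess:
    """Run a command with captured output so gh doesn't clutter the launcher."""
    try:
        return subprocess.run(list(argv), capture_output=True, text=True)
    except FileNotFoundError:
        # gh is not installed or not on PATH; report it like a shell would (127).
        return subprocess.CompletedProcess(list(argv), 127, "", "gh not found")


def ensure_security_label_exists(
    run: Callable[[Sequence[str]], subprocess.CompletedProcess] = _run_quietly,
) -> bool:
    """Create or update the 'security' label; return True if gh succeeded."""
    result = run(SECURITY_LABEL_ARGV)
    return result.returncode == 0
```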
Prompt design notes
The prompt should instruct Copilot to:
- Read DEPLOY.md for container config, SPEC.md for tech stack, REQUIREMENTS.md for feature inventory
- Start app containers on isolated ports (redteam-prefixed COMPOSE_PROJECT_NAME)
- Wait for the health check to pass
- Run ZAP baseline scan first to verify connectivity
- Discover auth mechanism from codebase: create a test user if registration exists, use seed data if available
- Run ZAP full scan (unauthenticated)
- Run ZAP full scan (authenticated) if login was possible
- Run ZAP API scan if OpenAPI spec found
- Run Nuclei HTTP templates
- Run Trivy on app Docker image
- Parse all JSON reports, file GitHub Issues for medium+ severity findings
- Write redteam-results.txt with one line per check
- Tear down containers
- Commit artifacts with a [redteam] prefix; do NOT push
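"Medium+" needs one definition across tools. ZAP reports a numeric `riskcode` (0 Informational, 1 Low, 2 Medium, 3 High), while Nuclei and Trivy use words in different cases. This sketch normalizes all three onto one scale and filters ZAP alerts. It assumes ZAP's traditional JSON report shape: a `site` list whose entries carry `alerts` with a string `riskcode` and `instances[].uri`. The report format should be checked against the ZAP version in use.

```python
import json

# One severity scale for all tools; "medium+" means rank >= 2.
SEVERITY_RANK = {"info": 0, "informational": 0, "unknown": 0, "low": 1,
                 "medium": 2, "high": 3, "critical": 4}
# ZAP's riskcode: 0=Informational, 1=Low, 2=Medium, 3=High.
ZAP_RISKCODE_TO_SEVERITY = {"0": "info", "1": "low", "2": "medium", "3": "high"}


def is_medium_plus(severity: str) -> bool:
    """Accepts Nuclei ('medium'), Trivy ('MEDIUM') and mapped ZAP severities."""
    return SEVERITY_RANK.get(severity.strip().lower(), 0) >= SEVERITY_RANK["medium"]


def zap_medium_plus(report_text: str) -> list[dict]:
    """Return one entry per medium+ ZAP alert with its distinct affected URIs."""
    report = json.loads(report_text)
    sites = report.get("site", [])
    if isinstance(sites, dict):  # tolerate a single site serialized as an object
        sites = [sites]
    out = []
    for site in sites:
        for alert in site.get("alerts", []):
            severity = ZAP_RISKCODE_TO_SEVERITY.get(str(alert.get("riskcode")), "unknown")
            if not is_medium_plus(severity):
                continue
            out.append({
                "title": alert.get("alert") or alert.get("name", "unknown"),
                "severity": severity,
                "uris": sorted({i.get("uri", "") for i in alert.get("instances", [])}),
            })
    return out
```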
What it won't catch
Business logic flaws, authorization between specific users beyond simple IDOR, race conditions, or vulnerabilities requiring domain-specific multi-step setup. Those would need manual penetration testing.