docs: add 2026-04-26 repository audit and update status links#274
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a new repository audit report and updates the project status documentation to include recent audit references. The review feedback focuses on improving documentation navigability by using markdown links, qualifying audit findings due to a logic error in the underlying import-checking tool, and refining the recommended fixes for subprocess handling in the auto-ml and gpu-renderer services to ensure they are accurate and comprehensive.
| - Repository audit (2026-04-26): `docs/development/repository-audit-2026-04-26.md` | ||
| - Security audit (2026-04-24): `docs/security/full-repo-audit-2026-04-24.md` |
There was a problem hiding this comment.
The pull request description mentions linking the new audit and the recent security audit. However, these are currently provided as plain text paths. To improve documentation navigability and fulfill the intent of the change, these should be formatted as markdown links using relative paths.
| - Repository audit (2026-04-26): `docs/development/repository-audit-2026-04-26.md` | |
| - Security audit (2026-04-24): `docs/security/full-repo-audit-2026-04-24.md` | |
| - [Repository audit (2026-04-26)](repository-audit-2026-04-26.md) | |
| - [Security audit (2026-04-24)](../security/full-repo-audit-2026-04-24.md) |
|
|
||
| Current guardrails are in a healthy state for import/runtime boundaries: | ||
|
|
||
| - ✅ Runtime import leak checks pass. |
There was a problem hiding this comment.
The conclusion that runtime import leak checks pass may be a false positive. The tool used for this check, tools/check_no_runtime_imports.py, contains a logic error in its AST traversal (lines 41-43). It iterates over child nodes of top-level statements but fails to recursively call check_nodes on their bodies. Consequently, illegal imports nested within functions or classes in the services/ directory are not detected. This audit finding should be qualified until the tool is fixed to use a complete traversal (e.g., via ast.walk).
| - **File:** `services/auto-ml/loop.py` | ||
| - **Observed pattern:** `subprocess.run(...)` | ||
| - **Risk:** Blocking process execution can reduce worker responsiveness and increases blast radius during command stalls/timeouts. | ||
| - **Recommended fix:** Replace with `asyncio.create_subprocess_exec(...)`, bounded timeout handling, and explicit stderr/stdout capture policy. |
There was a problem hiding this comment.
The recommended fix for the auto-ml finding is partially redundant. The current implementation in services/auto-ml/loop.py (lines 30-32) already implements bounded timeouts and explicit stderr/stdout capture. The recommendation should be refined to focus on the transition to asyncio and the necessary cascading changes to the calling logic.
| - **Recommended fix:** Replace with `asyncio.create_subprocess_exec(...)`, bounded timeout handling, and explicit stderr/stdout capture policy. | |
| - **Recommended fix:** Replace with asyncio.create_subprocess_exec(...) and refactor the calling logic (train, evaluate, deploy, and the main loop) to be async-safe. |
| - **File:** `services/gpu-renderer/src/core/render.py` | ||
| - **Observed pattern:** `subprocess.run(cmd, check=True)` | ||
| - **Risk:** Same blocking/availability concerns; additionally can create throughput bottlenecks under concurrent render load. | ||
| - **Recommended fix:** Convert render execution path to async subprocess management and enforce command allowlist validation before execution. |
There was a problem hiding this comment.
The recommendation for the gpu-renderer finding should explicitly include timeout handling and output capture. Unlike the auto-ml service, the current implementation in services/gpu-renderer/src/core/render.py (line 57) lacks these protections, which are critical for managing long-running ffmpeg processes and diagnosing failures.
| - **Recommended fix:** Convert render execution path to async subprocess management and enforce command allowlist validation before execution. | |
| - **Recommended fix:** Convert render execution path to async subprocess management, implement bounded timeouts with output capture, and enforce command allowlist validation. |
Motivation
Description
docs/development/repository-audit-2026-04-26.mdthat documents scope, methodology, findings, and prioritized next actions.docs/development/current-status-and-next-steps.mdto correct the roadmap wording (change "Priority A-D" to "Priority A-C") and add a new "Latest audit references" section linking the new audit and the recent security audit.subprocess.run(...)use inservices/auto-ml/loop.pyandservices/gpu-renderer/src/core/render.py, and a missing test dependency (httpx) that prevents local collection of a runtime test.Testing
python tools/check_no_runtime_imports.pywhich passed and reported no runtime import leaks.python tools/check_runtime_dependency_guards.pyandpython tools/check_imports.py, both of which passed.rg -n "shell=True|subprocess\.run\(|os\.system\("to locate synchronous subprocess patterns and identifiedservices/auto-ml/loop.pyandservices/gpu-renderer/src/core/render.pyas relevant findings.pytest -q tests/runtime/test_service_health_import_safe.py tests/security_platform/test_runtime_engine.pywhich failed at collection due to a missinghttpxdependency required bystarlette.testclient, and this failure is recorded in the audit for follow-up.Codex Task