feat: pluggable security scanner architecture (Phase 1)#190
Conversation
Create factory/security/ package with Protocol-based scanner abstraction: - SecurityScanner Protocol with detect() and run() methods - ScannerRegistry with auto-detection and error-resilient scanning - SecurityIssue and SecurityScanResult Pydantic models - Five concrete scanners: BanditScanner, NpmAuditScanner, SemgrepScanner, TrivyScanner, GitSecretsScanner - Refactored eval_security() hygiene dimension to use the registry - Rebalanced hygiene weights for 7 dimensions (sum 1.0) - Updated dimension counts in runner, hygiene tests, runner tests - 44 new tests covering models, registry, all scanners, integration Implements Phase 1 of #189. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Luke Inglis <lukeinglis21@yahoo.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #190 +/- ##
==========================================
+ Coverage 86.23% 86.68% +0.44%
==========================================
Files 45 50 +5
Lines 6313 6960 +647
==========================================
+ Hits 5444 6033 +589
- Misses 869 927 +58 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
akashgit
left a comment
There was a problem hiding this comment.
Code Review: PR #190 — Pluggable Security Scanner Architecture (Phase 1)
Good architecture — Protocol-based scanner interface, registry pattern, clean separation of concerns. But there are two blockers and several issues that need addressing before merge.
BLOCKER 1: NpmAuditScanner issue_count vs vulnerability count mismatch
NpmAuditScanner.run() creates one SecurityIssue per severity level, not one per vulnerability. For {"low": 2, "moderate": 1, "high": 1, "critical": 0}:
- Creates 3
SecurityIssueobjects (one each for low, moderate, high) result.issue_count= 3 (vialen(self.issues))- But
result.details= "4 vulnerabilities"
Then in eval_security():
count = result.issue_count # = 3, not 4
total_issues += countScore uses 3 instead of 4. BanditScanner creates one issue per finding, so npm projects get an artificial scoring advantage. The issue_count property lies — it returns the number of severity buckets, not the number of vulnerabilities.
Additionally, NpmAuditScanner.run() parses json.loads(stdout) twice (creating issues, then computing total_vulns). If the first parse succeeds but the second fails, you get issues populated but total_vulns = 0, so passed = True despite having issues.
Fix: Either create N SecurityIssue objects to match the actual vulnerability count, or add a vulnerability_count field to SecurityScanResult that eval_security() uses instead of issue_count.
BLOCKER 2: runner.py and docs still have stale dimension counts
This was flagged in both reviews on #167. PR #190 only partially fixes it:
| Location | Current (on main) | PR #190 fixes? |
|---|---|---|
runner.py:4 |
"6 hygiene dimensions" | ✅ → "7 hygiene" |
runner.py:97 |
"mandatory 6 + eval/score.py additions" | ❌ |
runner.py:126 |
"mandatory 11 dimensions" | ✅ → "12" |
runner.py:244 |
"Compute 6 mandatory hygiene dimensions" | ❌ |
docs/architecture.md:129 |
"6 mandatory dimensions" | ❌ |
docs/eval.md:7 |
"Hygiene (6 dimensions)" | ❌ |
docs/eval.md table |
Missing security row | ❌ |
Two of four runner.py references and all docs references are still stale. These are authoritative references that agents (CEO, Strategist) and the eval system rely on.
ISSUE 1: SemgrepScanner and TrivyScanner run on ALL projects
class SemgrepScanner:
def detect(self, project_path: Path) -> bool:
return _tool_available(["semgrep", "--version"])If semgrep or trivy is installed system-wide, they run on every project on every eval cycle. Semgrep with --config auto downloads rules from the registry (network dependency) and has a 5-minute timeout. In factory run --loop --interval 1800, this adds up to 10 minutes of scanning per 30-minute cycle.
At minimum, add language/source-file detection or make these scanners opt-in.
ISSUE 2: "Command not found" dead code in every scanner's run()
Every scanner has:
if rc == 1 and "Command not found" in stderr:
return SecurityScanResult(passed=True, ...)Through ScannerRegistry.scan(), detect() already filters out unavailable tools, making this unreachable. The original #167 bug persists in this dead code — bandit produces "No module named bandit", not "Command not found". Either fix the check or remove the dead code.
ISSUE 3: Returning passed=True for "not installed"
"Not installed" is not "passed." Should be a neutral result or distinct status.
ISSUE 4: Scoring ignores severity entirely
max(0.0, 1.0 - total_issues * 0.1) — every issue deducts 0.1 regardless of severity. 10 INFO findings = 10 CRITICAL findings = score 0.0. The models have SecuritySeverity but it's unused in scoring. If severity-weighted scoring isn't ready for Phase 1, at minimum filter out INFO-level issues from the score.
ISSUE 5: structlog import in hygiene.py is unused
The new eval_security() delegates to the registry and never uses log. Dead code.
ISSUE 6: Global _default_registry singleton not test-isolated
No autouse fixture to reset _default_registry between tests. The existing _isolate_registry conftest fixture only covers factory.registry, not this new global.
ISSUE 7: info severity inconsistency in NpmAuditScanner
_NPM_SEVERITY_MAP maps "info" → SecuritySeverity.INFO, so info-level vulns get added to issues. But total_vulns only counts ("low", "moderate", "high", "critical"). If a project has only info-level findings: issues is non-empty but passed = True and details = "clean".
NIT 1: Import ordering in __init__.py
structlog (third-party) before pathlib (stdlib). Ruff should flag this.
NIT 2: _run_cmd duplicated
factory/security/scanners.py defines its own _run_cmd that's nearly identical to factory/eval/hygiene.py's _run_cmd. Should share or acknowledge.
Summary
| Category | Count |
|---|---|
| BLOCKER | 2 (npm issue_count mismatch, stale dimension counts) |
| ISSUE | 7 (scanner applicability, dead code, passed semantics, severity scoring, unused import, test isolation, info inconsistency) |
| NIT | 2 (import ordering, duplicated helper) |
The architecture is solid. Fix the two blockers and address issues 1 and 4 (scanner applicability and severity-aware scoring) before merge. The rest can be follow-ups.
Summary
Implements Phase 1: Scanner Abstraction from the security audit mode plan (#189), building on the foundation laid in PR #167.
factory/security/package with aSecurityScannerProtocol andScannerRegistryfor pluggable, auto-detecting scanner supportSecurityIssueandSecurityScanResultPydantic models with severity classificationBanditScanner,NpmAuditScanner,SemgrepScanner,TrivyScanner,GitSecretsScannereval_security()hygiene dimension to delegate to the registry instead of inline bandit/npm logicContext
PR #167 introduced a single
eval_security()hygiene dimension with inline bandit + npm audit support. @akashgit's review comment proposed expanding this into a full security audit mode. Issue #189 captures the phased plan; this PR delivers Phase 1.What changed
factory/security/__init__.pySecurityScannerProtocol,ScannerRegistry,get_default_registry()factory/security/models.pySecuritySeverity,SecurityIssue,SecurityScanResultmodelsfactory/security/scanners.pyfactory/eval/hygiene.pyeval_security()refactored to use registry, weight rebalancedfactory/eval/runner.pytests/test_security.pytests/eval/test_hygiene.pytests/eval/test_runner.pyTest plan
Refs: #189, #167