Skip to content

feat: pluggable security scanner architecture (Phase 1)#190

Open
lukeinglis wants to merge 1 commit into
mainfrom
feat/security-scanner-abstraction
Open

feat: pluggable security scanner architecture (Phase 1)#190
lukeinglis wants to merge 1 commit into
mainfrom
feat/security-scanner-abstraction

Conversation

@lukeinglis
Copy link
Copy Markdown
Collaborator

Summary

Implements Phase 1: Scanner Abstraction from the security audit mode plan (#189), building on the foundation laid in PR #167.

  • Creates factory/security/ package with a SecurityScanner Protocol and ScannerRegistry for pluggable, auto-detecting scanner support
  • Adds SecurityIssue and SecurityScanResult Pydantic models with severity classification
  • Implements five concrete scanners: BanditScanner, NpmAuditScanner, SemgrepScanner, TrivyScanner, GitSecretsScanner
  • Refactors eval_security() hygiene dimension to delegate to the registry instead of inline bandit/npm logic
  • Rebalances hygiene weights for 7 dimensions (sum 1.0)
  • 44 new tests covering models, registry, all scanners, and integration

Context

PR #167 introduced a single eval_security() hygiene dimension with inline bandit + npm audit support. @akashgit's review comment proposed expanding this into a full security audit mode. Issue #189 captures the phased plan; this PR delivers Phase 1.

What changed

Area Change
factory/security/__init__.py SecurityScanner Protocol, ScannerRegistry, get_default_registry()
factory/security/models.py SecuritySeverity, SecurityIssue, SecurityScanResult models
factory/security/scanners.py 5 concrete scanner implementations
factory/eval/hygiene.py eval_security() refactored to use registry, weight rebalanced
factory/eval/runner.py Updated dimension counts in docstrings
tests/test_security.py 44 new tests (models, registry, scanners, integration)
tests/eval/test_hygiene.py Updated dimension count assertions
tests/eval/test_runner.py Updated mandatory dimension counts and sets

Test plan

  • All 44 new security tests pass
  • Full test suite (1368 tests) passes
  • Lint clean (ruff)
  • Type check clean (mypy)
  • Hygiene weights sum to 1.0
  • Verify scanners work end-to-end on a project with bandit/npm installed

Refs: #189, #167

Create factory/security/ package with Protocol-based scanner abstraction:

- SecurityScanner Protocol with detect() and run() methods
- ScannerRegistry with auto-detection and error-resilient scanning
- SecurityIssue and SecurityScanResult Pydantic models
- Five concrete scanners: BanditScanner, NpmAuditScanner,
  SemgrepScanner, TrivyScanner, GitSecretsScanner
- Refactored eval_security() hygiene dimension to use the registry
- Rebalanced hygiene weights for 7 dimensions (sum 1.0)
- Updated dimension counts in runner, hygiene tests, runner tests
- 44 new tests covering models, registry, all scanners, integration

Implements Phase 1 of #189.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Luke Inglis <lukeinglis21@yahoo.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented May 4, 2026

Codecov Report

❌ Patch coverage is 92.90541% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.68%. Comparing base (4fa11d5) to head (d8c6d67).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
factory/security/scanners.py 88.77% 21 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #190      +/-   ##
==========================================
+ Coverage   86.23%   86.68%   +0.44%     
==========================================
  Files          45       50       +5     
  Lines        6313     6960     +647     
==========================================
+ Hits         5444     6033     +589     
- Misses        869      927      +58     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Owner

@akashgit akashgit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: PR #190 — Pluggable Security Scanner Architecture (Phase 1)

Good architecture — Protocol-based scanner interface, registry pattern, clean separation of concerns. But there are two blockers and several issues that need addressing before merge.


BLOCKER 1: NpmAuditScanner issue_count vs vulnerability count mismatch

NpmAuditScanner.run() creates one SecurityIssue per severity level, not one per vulnerability. For {"low": 2, "moderate": 1, "high": 1, "critical": 0}:

  • Creates 3 SecurityIssue objects (one each for low, moderate, high)
  • result.issue_count = 3 (via len(self.issues))
  • But result.details = "4 vulnerabilities"

Then in eval_security():

count = result.issue_count  # = 3, not 4
total_issues += count

Score uses 3 instead of 4. BanditScanner creates one issue per finding, so npm projects get an artificial scoring advantage. The issue_count property lies — it returns the number of severity buckets, not the number of vulnerabilities.

Additionally, NpmAuditScanner.run() parses json.loads(stdout) twice (creating issues, then computing total_vulns). If the first parse succeeds but the second fails, you get issues populated but total_vulns = 0, so passed = True despite having issues.

Fix: Either create N SecurityIssue objects to match the actual vulnerability count, or add a vulnerability_count field to SecurityScanResult that eval_security() uses instead of issue_count.


BLOCKER 2: runner.py and docs still have stale dimension counts

This was flagged in both reviews on #167. PR #190 only partially fixes it:

Location Current (on main) PR #190 fixes?
runner.py:4 "6 hygiene dimensions" ✅ → "7 hygiene"
runner.py:97 "mandatory 6 + eval/score.py additions"
runner.py:126 "mandatory 11 dimensions" ✅ → "12"
runner.py:244 "Compute 6 mandatory hygiene dimensions"
docs/architecture.md:129 "6 mandatory dimensions"
docs/eval.md:7 "Hygiene (6 dimensions)"
docs/eval.md table Missing security row

Two of four runner.py references and all docs references are still stale. These are authoritative references that agents (CEO, Strategist) and the eval system rely on.


ISSUE 1: SemgrepScanner and TrivyScanner run on ALL projects

class SemgrepScanner:
    def detect(self, project_path: Path) -> bool:
        return _tool_available(["semgrep", "--version"])

If semgrep or trivy is installed system-wide, they run on every project on every eval cycle. Semgrep with --config auto downloads rules from the registry (network dependency) and has a 5-minute timeout. In factory run --loop --interval 1800, this adds up to 10 minutes of scanning per 30-minute cycle.

At minimum, add language/source-file detection or make these scanners opt-in.

ISSUE 2: "Command not found" dead code in every scanner's run()

Every scanner has:

if rc == 1 and "Command not found" in stderr:
    return SecurityScanResult(passed=True, ...)

Through ScannerRegistry.scan(), detect() already filters out unavailable tools, making this unreachable. The original #167 bug persists in this dead code — bandit produces "No module named bandit", not "Command not found". Either fix the check or remove the dead code.

ISSUE 3: Returning passed=True for "not installed"

"Not installed" is not "passed." Should be a neutral result or distinct status.

ISSUE 4: Scoring ignores severity entirely

max(0.0, 1.0 - total_issues * 0.1) — every issue deducts 0.1 regardless of severity. 10 INFO findings = 10 CRITICAL findings = score 0.0. The models have SecuritySeverity but it's unused in scoring. If severity-weighted scoring isn't ready for Phase 1, at minimum filter out INFO-level issues from the score.

ISSUE 5: structlog import in hygiene.py is unused

The new eval_security() delegates to the registry and never uses log. Dead code.

ISSUE 6: Global _default_registry singleton not test-isolated

No autouse fixture to reset _default_registry between tests. The existing _isolate_registry conftest fixture only covers factory.registry, not this new global.

ISSUE 7: info severity inconsistency in NpmAuditScanner

_NPM_SEVERITY_MAP maps "info"SecuritySeverity.INFO, so info-level vulns get added to issues. But total_vulns only counts ("low", "moderate", "high", "critical"). If a project has only info-level findings: issues is non-empty but passed = True and details = "clean".

NIT 1: Import ordering in __init__.py

structlog (third-party) before pathlib (stdlib). Ruff should flag this.

NIT 2: _run_cmd duplicated

factory/security/scanners.py defines its own _run_cmd that's nearly identical to factory/eval/hygiene.py's _run_cmd. Should share or acknowledge.


Summary

Category Count
BLOCKER 2 (npm issue_count mismatch, stale dimension counts)
ISSUE 7 (scanner applicability, dead code, passed semantics, severity scoring, unused import, test isolation, info inconsistency)
NIT 2 (import ordering, duplicated helper)

The architecture is solid. Fix the two blockers and address issues 1 and 4 (scanner applicability and severity-aware scoring) before merge. The rest can be follow-ups.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants