feat: pluggable security scanner architecture (Phase 1) by lukeinglis · Pull Request #190 · akashgit/remote-factory

lukeinglis · 2026-05-04T18:32:07Z

Summary

Implements Phase 1: Scanner Abstraction from the security audit mode plan (#189), building on the foundation laid in PR #167.

Creates factory/security/ package with a SecurityScanner Protocol and ScannerRegistry for pluggable, auto-detecting scanner support
Adds SecurityIssue and SecurityScanResult Pydantic models with severity classification
Implements five concrete scanners: BanditScanner, NpmAuditScanner, SemgrepScanner, TrivyScanner, GitSecretsScanner
Refactors eval_security() hygiene dimension to delegate to the registry instead of inline bandit/npm logic
Rebalances hygiene weights for 7 dimensions (sum 1.0)
44 new tests covering models, registry, all scanners, and integration

Context

PR #167 introduced a single eval_security() hygiene dimension with inline bandit + npm audit support. @akashgit's review comment proposed expanding this into a full security audit mode. Issue #189 captures the phased plan; this PR delivers Phase 1.

What changed

Area	Change
`factory/security/__init__.py`	`SecurityScanner` Protocol, `ScannerRegistry`, `get_default_registry()`
`factory/security/models.py`	`SecuritySeverity`, `SecurityIssue`, `SecurityScanResult` models
`factory/security/scanners.py`	5 concrete scanner implementations
`factory/eval/hygiene.py`	`eval_security()` refactored to use registry, weight rebalanced
`factory/eval/runner.py`	Updated dimension counts in docstrings
`tests/test_security.py`	44 new tests (models, registry, scanners, integration)
`tests/eval/test_hygiene.py`	Updated dimension count assertions
`tests/eval/test_runner.py`	Updated mandatory dimension counts and sets

Test plan

All 44 new security tests pass
Full test suite (1368 tests) passes
Lint clean (ruff)
Type check clean (mypy)
Hygiene weights sum to 1.0
Verify scanners work end-to-end on a project with bandit/npm installed

Refs: #189, #167

Create factory/security/ package with Protocol-based scanner abstraction: - SecurityScanner Protocol with detect() and run() methods - ScannerRegistry with auto-detection and error-resilient scanning - SecurityIssue and SecurityScanResult Pydantic models - Five concrete scanners: BanditScanner, NpmAuditScanner, SemgrepScanner, TrivyScanner, GitSecretsScanner - Refactored eval_security() hygiene dimension to use the registry - Rebalanced hygiene weights for 7 dimensions (sum 1.0) - Updated dimension counts in runner, hygiene tests, runner tests - 44 new tests covering models, registry, all scanners, integration Implements Phase 1 of #189. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Luke Inglis <lukeinglis21@yahoo.com>

codecov · 2026-05-04T18:33:35Z

Codecov Report

❌ Patch coverage is 92.90541% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.68%. Comparing base (4fa11d5) to head (d8c6d67).
⚠️ Report is 3 commits behind head on main.

Files with missing lines	Patch %	Lines
factory/security/scanners.py	88.77%	21 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #190      +/-   ##
==========================================
+ Coverage   86.23%   86.68%   +0.44%     
==========================================
  Files          45       50       +5     
  Lines        6313     6960     +647     
==========================================
+ Hits         5444     6033     +589     
- Misses        869      927      +58

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

akashgit

Code Review: PR #190 — Pluggable Security Scanner Architecture (Phase 1)

Good architecture — Protocol-based scanner interface, registry pattern, clean separation of concerns. But there are two blockers and several issues that need addressing before merge.

BLOCKER 1: NpmAuditScanner issue_count vs vulnerability count mismatch

NpmAuditScanner.run() creates one SecurityIssue per severity level, not one per vulnerability. For {"low": 2, "moderate": 1, "high": 1, "critical": 0}:

Creates 3 SecurityIssue objects (one each for low, moderate, high)
result.issue_count = 3 (via len(self.issues))
But result.details = "4 vulnerabilities"

Then in eval_security():

count = result.issue_count  # = 3, not 4
total_issues += count

Score uses 3 instead of 4. BanditScanner creates one issue per finding, so npm projects get an artificial scoring advantage. The issue_count property lies — it returns the number of severity buckets, not the number of vulnerabilities.

Additionally, NpmAuditScanner.run() parses json.loads(stdout) twice (creating issues, then computing total_vulns). If the first parse succeeds but the second fails, you get issues populated but total_vulns = 0, so passed = True despite having issues.

Fix: Either create N SecurityIssue objects to match the actual vulnerability count, or add a vulnerability_count field to SecurityScanResult that eval_security() uses instead of issue_count.

BLOCKER 2: runner.py and docs still have stale dimension counts

This was flagged in both reviews on #167. PR #190 only partially fixes it:

Location	Current (on main)	PR #190 fixes?
`runner.py:4`	"6 hygiene dimensions"	✅ → "7 hygiene"
`runner.py:97`	"mandatory 6 + eval/score.py additions"	❌
`runner.py:126`	"mandatory 11 dimensions"	✅ → "12"
`runner.py:244`	"Compute 6 mandatory hygiene dimensions"	❌
`docs/architecture.md:129`	"6 mandatory dimensions"	❌
`docs/eval.md:7`	"Hygiene (6 dimensions)"	❌
`docs/eval.md` table	Missing security row	❌

Two of four runner.py references and all docs references are still stale. These are authoritative references that agents (CEO, Strategist) and the eval system rely on.

ISSUE 1: SemgrepScanner and TrivyScanner run on ALL projects

class SemgrepScanner:
    def detect(self, project_path: Path) -> bool:
        return _tool_available(["semgrep", "--version"])

If semgrep or trivy is installed system-wide, they run on every project on every eval cycle. Semgrep with --config auto downloads rules from the registry (network dependency) and has a 5-minute timeout. In factory run --loop --interval 1800, this adds up to 10 minutes of scanning per 30-minute cycle.

At minimum, add language/source-file detection or make these scanners opt-in.

ISSUE 2: "Command not found" dead code in every scanner's `run()`

Every scanner has:

if rc == 1 and "Command not found" in stderr:
    return SecurityScanResult(passed=True, ...)

Through ScannerRegistry.scan(), detect() already filters out unavailable tools, making this unreachable. The original #167 bug persists in this dead code — bandit produces "No module named bandit", not "Command not found". Either fix the check or remove the dead code.

ISSUE 3: Returning `passed=True` for "not installed"

"Not installed" is not "passed." Should be a neutral result or distinct status.

ISSUE 4: Scoring ignores severity entirely

max(0.0, 1.0 - total_issues * 0.1) — every issue deducts 0.1 regardless of severity. 10 INFO findings = 10 CRITICAL findings = score 0.0. The models have SecuritySeverity but it's unused in scoring. If severity-weighted scoring isn't ready for Phase 1, at minimum filter out INFO-level issues from the score.

ISSUE 5: `structlog` import in `hygiene.py` is unused

The new eval_security() delegates to the registry and never uses log. Dead code.

ISSUE 6: Global `_default_registry` singleton not test-isolated

No autouse fixture to reset _default_registry between tests. The existing _isolate_registry conftest fixture only covers factory.registry, not this new global.

ISSUE 7: `info` severity inconsistency in NpmAuditScanner

_NPM_SEVERITY_MAP maps "info" → SecuritySeverity.INFO, so info-level vulns get added to issues. But total_vulns only counts ("low", "moderate", "high", "critical"). If a project has only info-level findings: issues is non-empty but passed = True and details = "clean".

NIT 1: Import ordering in `init.py`

structlog (third-party) before pathlib (stdlib). Ruff should flag this.

NIT 2: `_run_cmd` duplicated

factory/security/scanners.py defines its own _run_cmd that's nearly identical to factory/eval/hygiene.py's _run_cmd. Should share or acknowledge.

Summary

Category	Count
BLOCKER	2 (npm issue_count mismatch, stale dimension counts)
ISSUE	7 (scanner applicability, dead code, passed semantics, severity scoring, unused import, test isolation, info inconsistency)
NIT	2 (import ordering, duplicated helper)

The architecture is solid. Fix the two blockers and address issues 1 and 4 (scanner applicability and severity-aware scoring) before merge. The rest can be follow-ups.

akashgit mentioned this pull request May 7, 2026

feat: add security hygiene eval dimension (bandit + npm audit) #167

Closed

7 tasks

akashgit requested changes May 7, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: pluggable security scanner architecture (Phase 1)#190

feat: pluggable security scanner architecture (Phase 1)#190
lukeinglis wants to merge 1 commit into
mainfrom
feat/security-scanner-abstraction

lukeinglis commented May 4, 2026

Uh oh!

codecov Bot commented May 4, 2026

Uh oh!

akashgit left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

lukeinglis commented May 4, 2026

Summary

Context

What changed

Test plan

Uh oh!

codecov Bot commented May 4, 2026

Codecov Report

Uh oh!

akashgit left a comment

Choose a reason for hiding this comment

Code Review: PR #190 — Pluggable Security Scanner Architecture (Phase 1)

BLOCKER 1: NpmAuditScanner issue_count vs vulnerability count mismatch

BLOCKER 2: runner.py and docs still have stale dimension counts

ISSUE 1: SemgrepScanner and TrivyScanner run on ALL projects

ISSUE 2: "Command not found" dead code in every scanner's run()

ISSUE 3: Returning passed=True for "not installed"

ISSUE 4: Scoring ignores severity entirely

ISSUE 5: structlog import in hygiene.py is unused

ISSUE 6: Global _default_registry singleton not test-isolated

ISSUE 7: info severity inconsistency in NpmAuditScanner

NIT 1: Import ordering in __init__.py

NIT 2: _run_cmd duplicated

Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ISSUE 2: "Command not found" dead code in every scanner's `run()`

ISSUE 3: Returning `passed=True` for "not installed"

ISSUE 5: `structlog` import in `hygiene.py` is unused

ISSUE 6: Global `_default_registry` singleton not test-isolated

ISSUE 7: `info` severity inconsistency in NpmAuditScanner

NIT 1: Import ordering in `init.py`

NIT 2: `_run_cmd` duplicated