Commit 87fe0d2

intel352 and claude authored
Codebase audit: SecurityScannerProvider and scanner plugin (#279)
* docs: actor model integration design document Explores integrating goakt v4 actor framework into the workflow engine as a complementary paradigm alongside pipelines. Covers architecture, YAML config schemas, deployment model, documentation strategy, and the path toward a future actor-native engine. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: actor model implementation plan 11 tasks, 29 tests, TDD throughout. Covers plugin skeleton, actor.system module, actor.pool module, bridge actor, step.actor_send/ask, actor workflow handler, schemas, config example, and integration tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: add goakt v4 dependency and fix k8s API compat Add github.com/tochemey/goakt/v4 as a dependency. Fix VolumeResourceRequirements type change from k8s API v0.35 upgrade. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(actors): plugin skeleton, actor.system and actor.pool modules with tests * feat(actors): bridge actor and message types for pipeline-in-actor execution * feat(actors): bridge actor that executes step pipelines inside goakt BridgeActor is the core goakt<->pipeline integration: - Implements goakt v4 Actor interface (PreStart/Receive/PostStop) - Dispatches incoming ActorMessages to HandlerPipeline by message type - Builds PipelineContext with .message, .state, .actor template variables - Falls back to inline step.set creation when no step registry is available - Merges last step output back into actor state for persistence - Returns error map for unknown message types (not panics) - 3 tests pass: receive, unknown type, state persistence across messages Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(actors): integration tests, step.actor_send/ask, and actor spawning Integration tests (Task 10): - TestIntegration_FullActorLifecycle: full lifecycle from module creation through message passing and state persistence verification - 
TestIntegration_MultipleActorsIndependentState: two actors maintain independent state — no shared-state bugs step.actor_send (Task 5): fire-and-forget Tell to actor pools, template resolution for identity/payload, pool lookup from metadata step.actor_ask (Task 6): request-response Ask with configurable timeout, template resolution, returns actor response as step output module_pool.go: GetOrSpawnActor helper for identity-based spawning, pids map with mutex for concurrent access, SetStepRegistry injection, handlers typed to map[string]*HandlerPipeline bridge_actor.go: NewBridgeActor constructor, State() inspector method All 25 tests pass with -race flag. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(actors): step.actor_send and step.actor_ask pipeline steps - step.actor_send: fire-and-forget Tell to identity-based or pool actors - step.actor_ask: request-response Ask with configurable timeout (default 10s) - Both resolve message/identity as template expressions via PipelineContext - Use GetOrSpawnActor for auto-managed pools, ActorOf for permanent pools - Registered in plugin.StepFactories via wrapStepFactory helper - 9 unit tests covering config validation and timeout parsing Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(actors): actor workflow handler, wiring hook, and pool actor management * feat(actors): module and step schemas, handler tests * docs(actors): example config demonstrating actor-based order processing Shows actor.system, actor.pool, and step.actor_ask in a complete HTTP + actor workflow: stateful order processing where each order_id maps to its own auto-managed actor instance. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs: add actor model types to DOCUMENTATION.md - actor.system and actor.pool module types in Actor Model section - step.actor_send and step.actor_ask in Pipeline Steps table - Actors workflow type in Workflow Types section Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs(actors): refine config example with cleaner format and cancel workflow * fix(actors): address spec review — pool lookup via service registry, permanent pool spawning - Fix step.actor_send and step.actor_ask to look up pools via app.GetService("actor-pool:<name>") instead of unreachable pc.Metadata["__actor_pools"] map - Implement permanent pool actor spawning in ActorPoolModule.Start() — spawns poolSize BridgeActor instances into the goakt system - Remove double error response in BridgeActor.Receive() — use only ctx.Err(err), not both ctx.Err and ctx.Response - Remove unused ActorResponse dead code from messages.go - Add BridgeGrain integration test (state persistence via grain API) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(actors): nil guard before system access, safe type assertions, ask step test - Move nil check for pool.system before ActorSystem() call in both step.actor_send and step.actor_ask to prevent nil pointer dereference - Use sys variable consistently instead of redundant ActorSystem() calls - Convert bare type assertions to two-value form in integration and bridge actor tests to fail gracefully instead of panicking - Add TestActorAskStep_RequiresMessageType test for factory validation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(actors): address Copilot review — naming, CBOR tags, identity validation, step caching - Spawn primary actor under pool name so sys.ActorOf(ctx, poolName) succeeds for permanent pools (was spawning pool-0..N only) - Add CBOR struct tags to ActorMessage for cluster mode serialization - Validate identity is required for auto-managed pools at Execute 
time instead of silently falling through to a failing ActorOf path - Cache step instances in executePipeline to avoid rebuilding per message - Remove AI-generated plan doc (local paths, Claude instructions) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(actors): implement runtime routing, recovery, and remove cluster-only fields Steps now use pool.SelectActor(msg) for permanent pools instead of sys.ActorOf(), enabling actual routing strategy execution. Round-robin, random, broadcast, and sticky routing all work at runtime. Recovery supervisors are applied via actor.WithSupervisor() during Spawn. Removed non-functional cluster-only schema fields (cluster, metrics, tracing, placement, targetRoles, failover) since single-node is the current scope. Added 10 new tests covering routing distribution, permanent pool spawning, broadcast delivery, and recovery. All 41 actor plugin tests pass. Example config updated with permanent pool. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(actors): resolve lint issues and step caching race condition - Pre-build step instances during pool init (preBuildSteps) instead of caching in shared handler maps at runtime, eliminating a data race when multiple actors process messages concurrently - Convert if-else chain to switch statement (gocritic) - Guard negative int-to-uint32 conversion (gosec G115) - Remove unnecessary nil check around range (staticcheck S1031) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: tidy example module for goakt v4 transitive deps Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address review comments — fresh steps per execution, RequiresServices, error handling - Build fresh step instances per execution in executePipeline() to avoid sharing mutable state across concurrent actors in the same pool - Remove BuiltSteps field from HandlerPipeline and preBuildSteps() from pool (no longer needed with per-execution step building) - Return real error for 
unknown message types instead of map with error key - Accumulate all step outputs into actor state, not just the last step's - Add RequiresServices() on ActorPoolModule to declare dependency on its actor.system module for correct init ordering - Document nil return convention in module factories (engine handles nil) - Update TestBridgeActor_UnknownMessageType for new error behavior Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add codebase audit design and implementation plan Comprehensive audit found 3 stubbed scan steps in core engine, confirmed all external plugins are fully implemented, and identified 8 plugins with zero scenario coverage. Plan: SecurityScannerProvider interface, DockerSandbox module, security scanner plugin, 4 public scenarios, 5 private scenarios. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(module): add SecurityScannerProvider interface and rewrite scan steps - Create scan_provider.go with SecurityScannerProvider interface and SASTScanOpts/ContainerScanOpts/DepsScanOpts config structs; export SeverityRank as a public wrapper around the existing severityRank helper - Rewrite ScanSASTStep, ScanContainerStep, and ScanDepsStep Execute() methods to delegate to a SecurityScannerProvider looked up from the modular service registry under "security-scanner" - Steps return a clear error when no provider is configured instead of ErrNotImplemented; severity gate evaluation uses existing EvaluateGate() - Add scan_provider_test.go with mock provider covering success, gate failure, and provider error cases for all three scan steps - Update pipeline_step_scan_test.go to verify the no-provider error path Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(scanner): add security.scanner built-in plugin with mock mode Adds plugins/scanner package implementing SecurityScannerProvider. 
The security.scanner module registers as "security-scanner" service, enabling the existing scan_sast/scan_container/scan_deps steps. Supports mock mode with configurable findings for testing, and defaults to sensible scanner backends (semgrep, trivy, grype). 12 tests covering all scan types, gate evaluation, and config. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(plugin/external): add SecurityScannerRemoteModule adapter for security.scanner - Add security_scanner_adapter.go with SecurityScannerRemoteModule that wraps RemoteModule and registers a remoteSecurityScannerProvider in the service registry on Init(app), enabling core scan steps to find the provider via app.GetService("security-scanner", &provider) - Update adapter.go ModuleFactories() to wrap security.scanner remote modules with SecurityScannerRemoteModule instead of the plain RemoteModule - remoteSecurityScannerProvider implements module.SecurityScannerProvider by delegating ScanSAST/ScanContainer/ScanDeps to InvokeService gRPC calls Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: address PR #279 review comments - Add nil guards for app in all 3 scan steps - Validate severity threshold in scan step factories - Default fail_on_severity to "high" (was "error", not a valid severity) - Validate scanner module mode config - Log errors in scanner module factory - Update plugin description to not claim CLI mode support - Add TODO for context propagation in remote scanner adapter Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address remaining PR #279 review comments - Validate target_image is non-empty in scan_container step factory - Validate mockFindings scan type keys (sast/container/deps only) - Error on malformed finding items instead of silently skipping - Handle both int and float64 types for line field in parseMockFindings - Remove tool-specific instructions and absolute paths from plan doc - Fix provider lookup reference to match implementation 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent e59d21d commit 87fe0d2

14 files changed

Lines changed: 1949 additions & 74 deletions
Lines changed: 180 additions & 0 deletions
@@ -0,0 +1,180 @@
# Codebase Audit & Stub Completion Design

## Goal

Complete all stubbed/incomplete code in the workflow engine, implement DockerSandbox for secure container execution, and build realistic test scenarios for plugins with zero coverage.

## Audit Summary

### Stubs Found in Core Engine (3 items)

| File | Step Type | Issue |
|------|-----------|-------|
| `module/pipeline_step_scan_sast.go` | step.scan_sast | Returns `ErrNotImplemented` |
| `module/pipeline_step_scan_container.go` | step.scan_container | Returns `ErrNotImplemented` |
| `module/pipeline_step_scan_deps.go` | step.scan_deps | Returns `ErrNotImplemented` |

**Root cause**: The steps are blocked on `sandbox.DockerSandbox`, which doesn't exist. They also keep their scanning logic inline in the core engine, which is the wrong architecture: scanning should be delegated to plugins via a provider interface.

### External Plugins: All Fully Implemented

Deep audits confirmed all plugin code is production-ready:
- **workflow-plugin-supply-chain**: 4 steps + 6 modules (Trivy, Grype, Snyk, ECR, GCP)
- **workflow-plugin-data-protection**: 3 steps + 4 modules (regex, GCP DLP, AWS Macie, Presidio)
- **workflow-plugin-github**: 3 steps + 1 module (41 tests, real GitHub API client)
- All other plugins: bento, authz, payments, waf, security, sandbox are fully implemented

### Missing: DockerSandbox

`workflow-plugin-sandbox` provides WASM execution + goroutine guards but no Docker container isolation. A `sandbox.docker` module is needed for secure container execution (used by CI/CD steps, scan steps, etc.).

### Scenario Coverage Gaps

8 plugins with zero scenario coverage: authz, payments, github, waf, security, sandbox, supply-chain, data-protection.

## Architecture

### Part 1A: Security Scanner Provider Interface

The core engine defines a `SecurityScannerProvider` interface. Scan steps delegate to whichever plugin registers as provider.

```go
// In module/scan_provider.go
type SecurityScannerProvider interface {
	// ScanSAST runs static analysis on source code.
	ScanSAST(ctx context.Context, opts SASTScanOpts) (*ScanResult, error)
	// ScanContainer scans a container image for vulnerabilities.
	ScanContainer(ctx context.Context, opts ContainerScanOpts) (*ScanResult, error)
	// ScanDeps scans dependencies for known vulnerabilities.
	ScanDeps(ctx context.Context, opts DepsScanOpts) (*ScanResult, error)
}

type ScanResult struct {
	Passed       bool
	Findings     []ScanFinding
	Summary      map[string]int // severity -> count
	OutputFormat string         // sarif, json, table
	RawOutput    string
}

type ScanFinding struct {
	ID          string // CVE-2024-1234
	Severity    string // critical, high, medium, low, info
	Title       string
	Description string
	Package     string
	Version     string
	FixVersion  string
	Location    string // file path or image layer
}
```

The 3 scan steps become thin wrappers:
1. Look up `SecurityScannerProvider` from service registry
2. Call the appropriate method
3. Evaluate severity gate (fail_on_severity)
4. Return structured results

### Part 1B: DockerSandbox Module

Add `sandbox.docker` to the existing `workflow-plugin-sandbox`. Provides secure container execution:

```yaml
modules:
  - name: docker-sandbox
    type: sandbox.docker
    config:
      maxCPU: "1.0"              # CPU limit
      maxMemory: "512m"          # Memory limit
      networkMode: "none"        # No network by default
      readOnlyRootfs: true       # Immutable filesystem
      noPrivileged: true         # Never allow --privileged
      allowedImages:             # Whitelist of allowed images
        - "semgrep/semgrep:*"
        - "aquasec/trivy:*"
        - "anchore/grype:*"
      timeout: "5m"              # Max execution time
```

Interface:
```go
type DockerSandbox interface {
	Run(ctx context.Context, opts DockerRunOpts) (*DockerRunResult, error)
}

type DockerRunOpts struct {
	Image       string
	Command     []string
	Env         map[string]string
	Mounts      []Mount // Read-only bind mounts
	WorkDir     string
	NetworkMode string // "none", "bridge" (not "host")
}
```

The module uses the Docker Engine API client (`github.com/docker/docker/client`), NOT `os/exec` with `docker run`, which is open to command injection.

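As an illustration of the `allowedImages` and `networkMode` rules in the config above, a pre-run policy check might look like the sketch below. The `sandboxPolicy` type and its `check` method are hypothetical, and matching the whitelist with `path.Match` globs is an assumption; the design does not say how the `*` patterns are evaluated.

```go
package main

import (
	"fmt"
	"path"
)

// sandboxPolicy is a hypothetical holder for the whitelist from the module config.
type sandboxPolicy struct {
	AllowedImages []string
}

// check rejects disallowed network modes and images that match no whitelist pattern.
// An empty network mode is treated as the default ("none").
func (p sandboxPolicy) check(image, networkMode string) error {
	switch networkMode {
	case "", "none", "bridge":
		// permitted modes; "host" and anything else is refused
	default:
		return fmt.Errorf("network mode %q is not allowed", networkMode)
	}
	for _, pattern := range p.AllowedImages {
		ok, err := path.Match(pattern, image)
		if err != nil {
			return fmt.Errorf("bad allowedImages pattern %q: %w", pattern, err)
		}
		if ok {
			return nil
		}
	}
	return fmt.Errorf("image %q is not in allowedImages", image)
}

func main() {
	policy := sandboxPolicy{AllowedImages: []string{
		"semgrep/semgrep:*", "aquasec/trivy:*", "anchore/grype:*",
	}}
	for _, image := range []string{"aquasec/trivy:0.50.1", "alpine:3.19"} {
		fmt.Printf("%s -> %v\n", image, policy.check(image, "none"))
	}
}
```

This sketch compares raw strings only. References that carry a registry host or a digest would need normalizing before the match, which is left out here.
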
### Part 1C: Security Scanner Plugin

Create `workflow-plugin-security-scanner` (public, Apache-2.0) that:
1. Implements `SecurityScannerProvider` interface
2. Provides `security.scanner` module type
3. Supports backends: semgrep (SAST), trivy (container + deps), grype (deps)
4. Optionally uses DockerSandbox for isolated execution when available
5. Falls back to direct CLI execution when DockerSandbox isn't configured
6. Includes `mock` mode for testing without real tools

### Part 2: Public Scenarios (workflow-scenarios)

| # | Scenario | Plugin | Tests | Verification |
|---|----------|--------|-------|-------------|
| 46 | github-cicd | workflow-plugin-github | Webhook HMAC validation, action trigger/status, check runs | Verify payload parsing, signature validation, event filtering |
| 47 | authz-rbac | workflow-plugin-authz | Casbin policy CRUD, role enforcement, deny access | Verify policy creates, role assigns, access denied returns 403 |
| 48 | payment-processing | workflow-plugin-payments | Charge, capture, refund, subscription lifecycle | Verify amounts, status transitions, webhook handling |
| 49 | security-scanning | Core + scanner plugin | SAST, container scan, dependency scan | Verify findings count, severity filtering, pass/fail gate |

### Part 3: Private Scenarios (workflow-scenarios-private)

| # | Scenario | Plugin | Tests | Verification |
|---|----------|--------|-------|-------------|
| 01 | waf-protection | workflow-plugin-waf | Input sanitization, IP check, WAF evaluate | Verify blocked requests return 403, clean requests pass |
| 02 | mfa-encryption | workflow-plugin-security | TOTP enroll/verify, AES encrypt/decrypt | Verify TOTP codes validate, encrypted != plaintext, decrypt == original |
| 03 | wasm-sandbox | workflow-plugin-sandbox | WASM exec, goroutine guards | Verify WASM output, resource limits enforced |
| 04 | data-protection | workflow-plugin-data-protection | PII detect, data mask, classify | Verify PII found in test data, masked values differ, classifications correct |
| 05 | supply-chain | workflow-plugin-supply-chain | Signature verify, vuln scan, SBOM | Verify signature validation, finding counts, SBOM component counts |

### Scenario Design Principles

Every test script uses `jq` for JSON validation:
```bash
# Good: verify specific field values
RESULT=$(curl -s "$BASE_URL/api/scan" -d '{"target":"test-image:v1"}')
PASSED=$(echo "$RESULT" | jq -r '.passed')
SEVERITY=$(echo "$RESULT" | jq -r '.summary.critical')
[ "$PASSED" = "false" ] && [ "$SEVERITY" -gt 0 ] && echo "PASS: scan detected critical vulns" || echo "FAIL: expected critical findings"

# Bad: just check HTTP status
curl -s -o /dev/null -w "%{http_code}" "$BASE_URL/api/scan" | grep -q "200" && echo "PASS"
```

Tests must verify:
1. **Data transforms**: output values match expected transformations
2. **State changes**: persistence confirmed by reading back after writing
3. **Enforcement**: denied/blocked requests fail with proper error codes (403, 400)
4. **Error paths**: invalid inputs return descriptive error messages

## Implementation Order

1. **Part 1A**: SecurityScannerProvider interface in core engine
2. **Part 1B**: DockerSandbox module in workflow-plugin-sandbox
3. **Part 1C**: Security scanner plugin implementing the provider
4. **Part 1 wiring**: Update core scan steps to delegate to provider
5. **Part 2**: Public scenarios 46-49
6. **Part 3**: Private scenarios repo + scenarios 01-05

## Out of Scope

- Real cloud API integration tests (need credentials)
- `cache.modular` interface gap (needs modular framework changes)
- Phase 5 architecture refactoring (separate effort)
- Documentation updates (separate effort)
