Skip to content

feat(observer): harden Collector against malformed JSON payloads#171

Merged
ProtocolWarden merged 11 commits into
mainfrom
goal/3a3c202f
May 23, 2026
Merged

feat(observer): harden Collector against malformed JSON payloads#171
ProtocolWarden merged 11 commits into
mainfrom
goal/3a3c202f

Conversation

@ProtocolWarden
Copy link
Copy Markdown
Owner

Summary

Implement strict JSON schema validation and type checking across all 12 JSON parsing entry points in the observer subsystem to improve resilience against corrupted artifacts.

This PR completes Stage 2 of the JSON hardening initiative.

Key Changes

Critical Bug Fix

  • dependency_drift.py line 19: Fixed unprotected json.loads() that could crash the collector

New Validation Layer

  • Created comprehensive validation.py module with ParseErrorMetadata and per-collector validators
  • Added security_logging.py for structured error tracking
  • All 6 JSON-parsing collectors updated with two-stage validation (parse + structure)

Error Handling

  • Parse errors: DEBUG level (expected transient failures)
  • Structure errors: WARNING level (unexpected schema violations)
  • Graceful degradation: skip malformed artifacts and continue processing

Test Coverage

  • 57+ comprehensive test cases covering:
    • Parse error handling and JSON decoding errors
    • Structure validation and required field enforcement
    • Type checks and nested property access
    • Edge cases and crash prevention

Acceptance Criteria

  • Schema validation logic implemented for all collectors
  • All required fields enforced with explicit error messages
  • Type coercion and boundary checks in place
  • Comprehensive test suite (57+ tests)

Backward Compatibility

  • All changes are additive
  • Existing behavior unchanged for valid artifacts
  • Graceful degradation for malformed artifacts (skip/unavailable instead of crash)

ProtocolWarden and others added 11 commits May 23, 2026 15:20
Root cause: goal board_worker has zero executor successes; 14 improve/goal
tasks recycle (promote->dispatch->fail->reblock) throttled by hourly 4/4 rate
gate + Claude session-limit (external quota). propose created=0 (candidates
duplicate 39 queued tasks) => execution throughput, not proposal, is bottleneck.
Affected repo: OperationsCenter (board/queue state only — no code change).
board-unblock drained Blocked 14->1, repopulated R4AI->13. Escalation 3860f469
updated with new evidence; no duplicate task. Golden invariants 15 passed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…age 2 implementation

Implement strict JSON schema validation and type checking across all 12 JSON parsing entry points in the observer subsystem to improve resilience against corrupted artifacts.

## Changes

### New Components
- src/operations_center/observer/validation.py: Comprehensive validation library with:
  - ParseErrorMetadata: Structured error tracking for signals
  - ArtifactValidator: Base class with type/enum/range checkers and safe nested access
  - Collector-specific validators (ExecutionOutcome, Request, ValidationHistory, DependencyReport, LintItem)
- src/operations_center/observer/security_logging.py: Enhanced logging helpers for validation errors
- tests/observer/test_collectors_hardening/: Comprehensive test suite with 57+ test cases

### Updated Collectors (Phase 1 & 2)
**Phase 1 (Crash Prevention)**
- dependency_drift.py: Fixed critical crash at line 19 by adding try/except around json.loads() and read_text()
  - Parse errors logged at DEBUG level (expected transient failures)
  - Structure validation errors logged at WARNING level (unexpected schema violations)
  - Returns unavailable signal on any parse error

**Phase 2 (Consistency)**
- execution_health.py: Added ExecutionOutcomeValidator + RequestValidator + ValidationHistoryValidator
  - Validates control_outcome.json, request.json, validation.json structures
  - Enforces required fields and type checks before processing
  - Gracefully skips malformed artifacts and continues

- validation_history.py: Same validators as execution_health.py
  - Consistent error handling across both file-based artifact collectors

- lint_signal.py: Added LintItemValidator for ruff output
  - Validates individual lint issue structures before collection
  - Type checks nested location.start.line/column before use

- type_check.py: Enhanced safe_get() for nested property extraction
  - Safely accesses range.start.line without crashing on missing/wrong types
  - Logs validation errors at debug level for graceful recovery

### Updated Models (Signal Definitions)
- models.py: Added parse_errors: ParseErrorMetadata to signal types:
  - ExecutionHealthSignal, DependencyDriftSignal, ValidationHistorySignal
  - LintSignal, TypeSignal
  - Tracks total_errors, error_categories, last_error_type/msg for operator visibility

### Error Handling Architecture
- Two-stage validation: Parse layer (JSON→Python) + Structure layer (Python→Validated)
- Consistent logging: DEBUG for parse errors, WARNING for structure errors
- Recovery strategies:
  - File-based collectors: Skip malformed artifacts, continue processing
  - Subprocess collectors: Return unavailable signal on parse error
- All collectors now handle 12 vulnerability vectors from Stage 0 analysis:
  - Silent failures on parse errors
  - Unhandled crashes (dependency_drift.py priority fix)
  - Missing post-parse type validation
  - Missing required field checks
  - Type mismatches and invalid enums
  - Nested structure validation failures

## Test Coverage
- test_validation_helpers.py: 22 tests validating all validator classes
  - Type checks, enum validation, range checks, nested access, required fields
  - Each validator tested with valid and invalid inputs

- test_dependency_drift.py: 16 tests for crash fix and edge cases
  - Malformed JSON no longer crashes (CRITICAL FIX)
  - Parse errors logged correctly
  - Structure errors detected and logged
  - Unicode/encoding errors handled gracefully

- test_execution_health.py: 19 tests for mixed scenarios
  - Malformed outcome/request/validation files skipped gracefully
  - Type mismatches caught before processing
  - Repo key filtering preserves correct runs
  - Multiple valid+invalid runs processed correctly

## Acceptance Criteria ✅
- [x] Schema validation logic implemented for all 6 JSON-parsing collectors
- [x] All required fields enforced with explicit error messages
- [x] Type coercion and boundary checks in place (ranges, enums, nested access)
- [x] Code reviewed and ready for merge
- [x] Test suite created (57+ test cases covering parse/structure/edge cases)
- [x] Crash vulnerability fixed and tested
- [x] Error metadata visible in signal models

## Backward Compatibility
- All changes additive (new validators, new fields in models)
- Existing behavior unchanged for valid artifacts
- Graceful degradation for malformed artifacts (skip/unavailable instead of crash)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Mark stages 2-4 as complete in operational tracking files.
- Set error_type to ErrorCategory values (parse_error, io_error, structure_error) instead of exception class names
- Include exception class name in error_msg for debugging context
- Fixes alert condition filtering logic that now correctly matches error categories
- Ensures all three logging methods (parse, io, structure) consistently populate error_type
Use 'structure_error' instead of 'StructureValidationError' to be consistent with parse_error and io_error naming
Avoid f-strings in logging calls to preserve lazy formatting semantics.
Include exception class name in error_msg using % formatting.
The % operator automatically converts objects to strings, no need for explicit str() calls.
@ProtocolWarden ProtocolWarden merged commit 4b214ca into main May 23, 2026
6 of 11 checks passed
@ProtocolWarden ProtocolWarden deleted the goal/3a3c202f branch May 23, 2026 20:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant