diff --git a/docs/PROXY-CORE-R13-SCHEMA-DETECTION.md b/docs/PROXY-CORE-R13-SCHEMA-DETECTION.md
new file mode 100644
index 0000000..43f0760
--- /dev/null
+++ b/docs/PROXY-CORE-R13-SCHEMA-DETECTION.md
@@ -0,0 +1,266 @@
+# Round 13: WU 2.4 - Schema Change Detection
+
+**Work Unit:** 2.4
+**Objective:** Detect and alert on API schema changes, the #1 unsolved problem in autonomous agent infrastructure
+**Status:** KICKOFF
+**Depends on:** WU 2.1 (proxy core), WU 2.2 (agent identity), WU 2.3 (metering)
+**Assigned to:** Codex sub-agent (`model="codex53"`)
+**Timeline:** ~11–13 hours (thin-slice execution, 4 slices, 12–15 min each)
+**Target metrics:** 40+ tests, 5 modules, 0 regressions
+
+---
+
+## Why This Work
+
+Schema changes are the #1 unsolved problem. From research:
+- **Cost cited:** $2.1K–$45M in losses per incident
+- **Frequency:** Weekly for actively-developed APIs (OpenAI, Stripe, HubSpot)
+- **Current status:** No standard detection mechanism
+- **Impact:** Agents silently fail or return garbled data when schemas drift
+
+Rhumb's unique advantage: we see every call through the proxy. We can fingerprint schema, detect diffs, and alert **before agents break**.
+
+---
+
+## Deliverables
+
+### Module 1: Schema Fingerprinting Engine
+**File:** `packages/api/services/schema_fingerprint.py`
+**Responsibility:** Capture and normalize API response schemas
+
+**Key Components:**
+- `SchemaFingerprint` dataclass: captures structure of response (field names, types, nested structure)
+- `fingerprint_response()` function: parse response → structural hash (content-agnostic)
+- Deep structural comparison: field add/remove, type change, nesting depth, cardinality (singular vs array)
+- Semantic drift detection: detect common renames (parse similarity, Levenshtein distance on field names)
+- Metadata extraction: latency, status code, headers (content-type, cache-control)
+
+**Tests (10 tests):**
+- Fingerprint stable response → consistent hash
+- Add field → detected as change
+- Remove field → detected as change
+- Type change (string → int) → detected
+- Nested object structure change → detected
+- Same fields, different order → same fingerprint (order-insensitive)
+- Semantic rename (old_field → new_field) → similarity score >0.8 flags as likely rename
+- Null/optional fields → treated as cardinality change
+- Array → single object change → detected
+- Complex nested structure (3+ levels) → fingerprinted correctly
+
+---
+
+### Module 2: Schema Change Detector
+**File:** `packages/api/services/schema_change_detector.py`
+**Responsibility:** Detect, classify, and track schema drift
+
+**Key Components:**
+- `SchemaChangeDetector` class: compares current vs baseline fingerprint
+- `detect_changes()` method: returns list of `SchemaChange` (add, remove, rename, type_change, nesting_change)
+- `classify_severity()` method: breaking vs non-breaking vs advisory
+  - **Breaking:** field removal, type change, nesting change
+  - **Non-breaking:** field addition, optional field changes
+  - **Advisory:** naming convention changes
+- `alert_required()` method: returns bool (breaking changes → alert always, non-breaking → only if configured)
+- Redis-backed baseline tracking: `schema:baseline:{service}:{endpoint}` → latest fingerprint hash + timestamp
+
+**Tests (12 tests):**
+- No changes detected when responses identical
+- Field addition flagged as non-breaking
+- Field removal flagged as breaking
+- Type change flagged as breaking
+- Multiple changes in single response → detected and classified
+- Empty response → handled gracefully
+- Breaking change + advisory change in same diff → both surfaced
+- Baseline update flow (store new fingerprint after validation)
+- Stale baseline (>7 days old) → handled with age warning
+- Non-JSON response (HTML error) → graceful fallback
+- Rate limit response (429) → not treated as schema drift
+- Error response schema (500) → separate from success schema
+
+---
+
+### Module 3: Alert Pipeline
+**File:** `packages/api/services/schema_alert_pipeline.py`
+**Responsibility:** Route schema change alerts to operators
+
+**Key Components:**
+- `AlertDispatcher` class: routes breaking changes to webhook + email + in-app
+- `webhook_dispatch()`: POST to operator's configured webhook URL (auth token in header)
+  - Payload: service name, endpoint, change detail, severity, timestamp, fingerprint diff
+  - Retry logic: exponential backoff (3x, max 1h)
+  - Error handling: webhook failure logged, alert marked retry_pending
+- `email_dispatch()`: Slack/email notification (mock for now, real integration in Phase 3)
+- `inapp_dispatch()`: Store schema alert in `schema_alerts` table, queryable via `/v1/admin/schema-alerts`
+- Alert deduplication: same change on same endpoint → only alert once per 24h (unless severity escalates)
+
+**Tests (8 tests):**
+- Breaking change → webhook dispatched
+- Non-breaking change → no alert (unless configured)
+- Webhook success (200 OK) → logged, alert marked sent
+- Webhook failure (500) → retry scheduled, alert marked pending
+- Alert deduplication: same change 2x in 1h → only one webhook call
+- Payload shape validation: includes all required fields
+- Email dispatch (mock) → logged with recipient
+- Alert query: `/v1/admin/schema-alerts?service=stripe&limit=10` → returns recent alerts
+
+---
+
+### Module 4: Proxy Integration
+**File:** `packages/api/routes/proxy.py` (extension)
+**Responsibility:** Integrate schema detection into the proxy call path
+
+**Key Components:**
+- Extend `POST /proxy/` to call `schema_change_detector.detect_changes()` after every proxied call
+- Store fingerprint in `schema_events` table (lightweight, no blocking)
+- If breaking change detected: dispatch alert asynchronously (don't block response)
+- New endpoint: `GET /v1/admin/schema/{service}/{endpoint}` → returns latest fingerprint + change history
+- Leaderboard integration: schema freshness feeds into AN Score confidence (if schema stable 30 days, freshness bonus)
+
+**Integration tests (10 tests):**
+- Proxy call + schema stable → response unaffected, fingerprint stored
+- Proxy call + breaking change detected → response unaffected, alert dispatched async
+- Multiple calls, same schema → fingerprint reused (no re-computation)
+- Admin endpoint `/v1/admin/schema/stripe/create-payment-intent` → returns current fingerprint + last 5 changes
+- Leaderboard: service with stable schema (30d no changes) → freshness multiplier applied
+- Operator receives webhook on breaking change → webhook payload well-formed
+- Schema drift on error response (500) → not treated as drift on success schema
+- High-volume endpoint (1,000 calls/sec) → fingerprinting doesn't block, stored async
+- Multi-tenant: agent A's schema alert doesn't leak to agent B
+- Admin schema-alerts query: filters by service, date range, severity
+
+---
+
+### Module 5: Supabase Migration
+**File:** `packages/api/migrations/0007_schema_detection.sql`
+**Responsibility:** Database schema for schema tracking
+
+**Tables:**
+```sql
+CREATE TABLE IF NOT EXISTS schema_fingerprints (
+  id BIGSERIAL PRIMARY KEY,
+  service_id BIGINT NOT NULL,
+  endpoint TEXT NOT NULL,
+  fingerprint_hash TEXT NOT NULL,
+  updated_at TIMESTAMP DEFAULT now(),
+  UNIQUE(service_id, endpoint)
+);
+
+CREATE TABLE IF NOT EXISTS schema_events (
+  id BIGSERIAL PRIMARY KEY,
+  service_id BIGINT NOT NULL,
+  endpoint TEXT NOT NULL,
+  fingerprint_hash TEXT NOT NULL,
+  change_type TEXT, -- add, remove, type_change, rename, nesting_change
+  severity TEXT, -- breaking, non_breaking, advisory
+  captured_at TIMESTAMP DEFAULT now(),
+  CONSTRAINT fk_service FOREIGN KEY (service_id) REFERENCES services(id)
+);
+
+CREATE INDEX IF NOT EXISTS idx_schema_events_service_endpoint ON schema_events(service_id, endpoint);
+CREATE INDEX IF NOT EXISTS idx_schema_events_captured_at ON schema_events(captured_at DESC);
+
+CREATE TABLE IF NOT EXISTS schema_alerts (
+  id BIGSERIAL PRIMARY KEY,
+  service_id BIGINT NOT NULL,
+  endpoint TEXT NOT NULL,
+  change_detail JSONB,
+  severity TEXT,
+  alert_sent_at TIMESTAMP,
+  webhook_url TEXT,
+  webhook_status INT,
+  retry_count INT DEFAULT 0,
+  retry_at TIMESTAMP,
+  CONSTRAINT fk_service FOREIGN KEY (service_id) REFERENCES services(id)
+);
+
+CREATE INDEX IF NOT EXISTS idx_schema_alerts_service_pending ON schema_alerts(service_id) WHERE webhook_status IS NULL;
+CREATE INDEX IF NOT EXISTS idx_schema_alerts_alert_sent_at ON schema_alerts(alert_sent_at DESC);
+```
+
+---
+
+## Acceptance Criteria
+
+### Functional
+- ✅ Fingerprint captures schema structure (fields, types, nesting)
+- ✅ Detector identifies changes (add, remove, type change, rename, nesting)
+- ✅ Severity classification (breaking vs non-breaking vs advisory)
+- ✅ Alert dispatch (webhook + email + in-app) for breaking changes
+- ✅ Deduplication (same change only alerts once per 24h)
+- ✅ Proxy integration (non-blocking, async alert dispatch)
+- ✅ Admin endpoint: `/v1/admin/schema/{service}/{endpoint}` returns fingerprint + change history
+- ✅ Leaderboard integration: schema stability feeds AN Score freshness
+
+### Quality
+- ✅ 40+ integration tests (fingerprint, detector, alerts, proxy, admin)
+- ✅ Type annotations complete (mypy clean)
+- ✅ Linting clean (flake8, isort)
+- ✅ Zero regressions from Phase 2 (239 tests still passing)
+- ✅ Supabase migration idempotent (no side effects)
+
+### Performance
+- ✅ Fingerprinting: <5ms per response (streaming, not blocking)
+- ✅ Change detection: <10ms per call (O(n) in field count, not response size)
+- ✅ Alert dispatch: async (webhook calls don't block proxy response)
+- ✅ Baseline lookups: O(1) Redis (or in-memory cache)
+
+### Operability
+- ✅ Config: operator can set webhook URL + alert preferences per service
+- ✅ Logging: all fingerprints + changes logged (queryable via admin endpoint)
+- ✅ Monitoring: alert success/failure rates visible in dashboards
+- ✅ Graceful degradation: if Redis unavailable, in-memory fallback (with limits)
+
+---
+
+## Thin-Slice Decomposition
+
+| Slice | Focus | Deliverables | Tests | Approx Time |
+|-------|-------|--------------|-------|-------------|
+| A | Fingerprinting + comparison | Module 1 + tests | 10 | 12 min |
+| B | Change detection + classification | Module 2 + tests | 12 | 13 min |
+| C | Alert pipeline + integration | Module 3 + Module 4 integration | 10 | 14 min |
+| D | Admin endpoints + E2E | Module 4 completion + Module 5 + integration tests | 10+ | 12 min |
+
+**Total target:** 40+ tests, 4–5 modules, 50–60 min execution time (single sub-agent run)
+
+---
+
+## Dependencies
+
+**Runtime:** Depends on Phase 2 (WU 2.1–2.3) being live
+- Proxy router must be operational (for call interception)
+- Agent identity must be in place (for per-agent alert routing)
+- Billing/metering must exist (for schema events to include cost context)
+
+**No blocking dependencies:** All modules can be developed in parallel slices
+
+---
+
+## Success Signal
+
+By end of Round 13:
+- ✅ Proxy detects schema changes on every call (non-blocking)
+- ✅ Breaking changes trigger webhook alerts within 1 second
+- ✅ Operators can query schema history per endpoint
+- ✅ Leaderboard scores account for schema stability (freshness bonus for stable services)
+- ✅ 239 Phase 2 tests still passing + 40+ new tests
+- ✅ Ready for Phase 3 GTM launch with complete Access Layer
+
+---
+
+## Questions / Clarifications
+
+**Q: How do we handle schema versioning (e.g., API v1 → v2)?**
+A: For now, each version is treated as a separate endpoint. Future: add a version parameter to the fingerprint key.
+
+**Q: What if an endpoint returns different schemas based on auth level?**
+A: Each (agent + endpoint) pair gets a separate baseline. Schema changes are scoped per agent if needed.
+
+**Q: Real Slack/email integration for alerts?**
+A: No. Phase 3 work. For now: webhook dispatch + in-app alerts. Email is mocked.
+
+**Q: How do we avoid false positives (e.g., timestamps, UUIDs)?**
+A: Fingerprint ignores scalar values, only captures structure. Timestamps/UUIDs don't trigger alerts.
+
+**Q: Can operators opt out of alerts for non-breaking changes?**
+A: Yes. Config: `alert_severity: breaking_only` (default) or `all`. Per-service setting.
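
The last two answers (structure-only fingerprints, and the `alert_severity` setting) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the planned implementation: `structure_of` is a hypothetical helper, `fingerprint_response` and `alert_required` are written as standalone functions rather than the module and class members the spec describes, all JSON numbers are collapsed into one `number` type, and `all` is assumed to include advisory changes.

```python
import hashlib
import json
from typing import Any


def structure_of(value: Any) -> Any:
    """Reduce a JSON value to its shape: keys and type names, never scalar contents."""
    if isinstance(value, dict):
        return {key: structure_of(child) for key, child in value.items()}
    if isinstance(value, list):
        # Keep only the distinct element shapes, so list length and order do not matter.
        shapes = {json.dumps(structure_of(item), sort_keys=True) for item in value}
        return ["array", *sorted(shapes)]
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool is a subclass of int, so check it first
        return "bool"
    if isinstance(value, (int, float)):
        return "number"  # assumption: int vs float is not treated as a type change
    return "string"


def fingerprint_response(body: Any) -> str:
    """Hash the shape of a parsed JSON body; key order and scalar values are ignored."""
    canonical = json.dumps(structure_of(body), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def alert_required(severity: str, alert_severity: str = "breaking_only") -> bool:
    """Breaking changes always alert; anything else only when the service opts into `all`."""
    if severity == "breaking":
        return True
    return alert_severity == "all"


# Same shape, different IDs, timestamps, key order, and list length.
first = {"id": "pi_1", "created": 1700000000, "items": [{"sku": "x", "qty": 1}]}
second = {"created": 1800000000, "id": "pi_2",
          "items": [{"qty": 7, "sku": "y"}, {"sku": "z", "qty": 2}]}
# `id` changed from string to number: a structural change.
drifted = {"id": 42, "created": 1700000000, "items": [{"sku": "x", "qty": 1}]}

same_shape = fingerprint_response(first) == fingerprint_response(second)
type_change_detected = fingerprint_response(first) != fingerprint_response(drifted)
```

Hashing a canonical, key-sorted JSON dump of the shape is one simple way to get order-insensitive fingerprints; renames and severity classification would still need the field-level diff described in Module 2, which a bare hash cannot provide.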