Diagnostics And Maintenance

This guide covers the operator-facing checks that help explain what the local system is doing and why.

Plane Doctor

./scripts/operations-center.sh plane-doctor --task-id TASK-123

Use this to verify:

Plane base URL
configured workspace/project
API user identity
project and work-item endpoint reachability

This is the fastest check when polling is not finding work.

Plane Smoke Test

./scripts/operations-center.sh smoke --task-id TASK-123 --comment-only

Use this to verify Plane fetch/comment behavior without running a full execution task.

Retained smoke artifacts are written under:

tools/report/execution_plane/<timestamp>_<task_id>_<run_id>/

Providers Status

./scripts/operations-center.sh providers-status

Use this to re-check:

installed provider CLIs
versions
auth readiness
headless readiness where supported

Dependency Check

./scripts/operations-center.sh dependency-check
./scripts/operations-center.sh dependency-check --create-plane-tasks

This maintenance checker:

reads local version pins
checks installed/local health
compares upstream latest where practical
writes a retained report
can create Plane tasks with task-kind: improve for drift or breakage

This is a maintenance path, not part of normal task execution.

Retained Summaries

Retained result_summary.md files align with board/log language and include:

worker_role
task_kind
run_id
final_status
blocked_classification
follow_up_task_ids
human_attention_required when relevant

Watcher Heartbeat Check

python -m operations_center.entrypoints.worker.main heartbeat-check --log-dir logs/local/watch-all

Returns exit code 0 if all watchers wrote a heartbeat within the last 5 minutes. Returns exit code 1 with a message listing stale roles. Run from cron to get paged when a watcher dies silently.
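The staleness rule can be sketched in a few lines of Python. This is a minimal illustration, not the worker's code. It assumes the heartbeats have already been read into a mapping of role name to last-write time, and the name heartbeat_exit_code is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sketch of the documented rule: a role is stale when its last
# heartbeat is older than 5 minutes, and any stale role means exit code 1.
STALE_AFTER = timedelta(minutes=5)


def heartbeat_exit_code(last_beats: dict[str, datetime], now: datetime) -> tuple[int, list[str]]:
    """Return (exit_code, stale_roles) for a role -> last-heartbeat mapping."""
    stale = sorted(role for role, ts in last_beats.items() if now - ts > STALE_AFTER)
    return (1 if stale else 0), stale


now = datetime.now(timezone.utc)
beats = {"goal": now - timedelta(minutes=1), "improve": now - timedelta(minutes=9)}
print(heartbeat_exit_code(beats, now))  # (1, ['improve'])
```

The real command reports failure through its exit status, so a cron entry only needs to alert on a non-zero exit.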

Credential Validation and Expiry Detection

On the first cycle of each watcher run, Operations Center validates GitHub and Plane tokens.

Invalid tokens (401/403): The watcher logs credential_invalid, records an escalation event, and exits. If a watcher fails to start with watch_credential_failure, check that GITHUB_TOKEN and PLANE_API_TOKEN are set and valid for the configured workspace.

Upcoming expiry (fine-grained PATs): If the GitHub /user response includes an x-token-expiration header, Operations Center checks whether expiry is within escalation.credential_expiry_warn_days days (default 7). When approaching expiry:

≤ warn_days remaining: logs credential_expiry_soon warning with days remaining and expiry date
≤ 1 day remaining: logs error and records a credential_github_expiring escalation event

Set escalation.credential_expiry_warn_days: 0 to disable expiry monitoring. Only fine-grained GitHub PATs expose this header; classic tokens do not.
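The thresholds above can be expressed as a small decision function. The sketch below is illustrative only. It assumes the expiry date has already been parsed out of the response header, whose exact format this guide does not document, and expiry_action is a hypothetical name.

```python
from datetime import date


def expiry_action(expires_on: date, today: date, warn_days: int = 7) -> str:
    """Map days-until-expiry to the documented behavior (hypothetical helper)."""
    if warn_days <= 0:
        return "disabled"        # credential_expiry_warn_days: 0 turns monitoring off
    remaining = (expires_on - today).days
    if remaining <= 1:
        return "error+escalate"  # log an error and record credential_github_expiring
    if remaining <= warn_days:
        return "warn"            # log credential_expiry_soon
    return "ok"


print(expiry_action(date(2024, 3, 10), date(2024, 3, 5)))  # warn
```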

Config Schema Drift Check

At watcher startup (cycle 1), Operations Center compares your deployed config against config/operations_center.example.yaml. If any top-level or nested key in the example is absent from your config, it is logged as a config_drift_detected warning:

{"event": "config_drift_detected", "missing_key": "escalation", ...}
{"event": "config_drift_summary", "missing_count": 2, "missing_keys": ["escalation", "stale_pr_days"]}

This fires on every watcher start until the gap is resolved. Check the watcher log if a feature appears to be silently disabled.
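The comparison amounts to a recursive walk over the example config. The sketch below is a minimal version of that idea, not Operations Center's implementation. It reports a missing parent key once rather than listing each of its children, which matches the sample log. Joining nested paths with "." is an assumption about the log format.

```python
def missing_keys(example: dict, deployed: dict, prefix: str = "") -> list[str]:
    """Key paths present in `example` but absent from `deployed`.

    A missing parent is reported once; its children are not listed separately.
    """
    missing: list[str] = []
    for key, value in example.items():
        path = f"{prefix}{key}"
        if key not in deployed:
            missing.append(path)
        elif isinstance(value, dict) and isinstance(deployed[key], dict):
            # Both sides are mappings: descend and compare nested keys.
            missing.extend(missing_keys(value, deployed[key], prefix=f"{path}."))
    return missing


example = {"escalation": {"cooldown_seconds": 3600}, "stale_pr_days": 7}
print(missing_keys(example, {}))  # ['escalation', 'stale_pr_days']
```

To compare real files, load both YAML documents first (for example with PyYAML's yaml.safe_load) and pass the resulting dicts.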

Workspace Health Check

The improve watcher automatically verifies and repairs repo environments every 25 cycles. To check manually:

Look for workspace_health_unhealthy and workspace_health_repair_failed events in the improve watcher log.
If a [Workspace] Repair environment for <repo> task appears on the board, the automatic repair failed — investigate the venv or bootstrap script for that repo.

Spend Report

View execution count and estimated cost for the last N days:

# Last 24 hours
python -m operations_center.entrypoints.worker.main spend-report

# Last 7 days
python -m operations_center.entrypoints.worker.main spend-report --window-days 7

Returns JSON:

{
  "window_days": 7,
  "total_executions": 42,
  "total_estimated_usd": 6.30,
  "per_repo": {
    "OperationsCenter": {"executions": 18, "estimated_usd": 2.70},
    "ExternalRepo": {"executions": 24, "estimated_usd": 3.60}
  }
}

Requires cost_per_execution_usd to be set in config (default 0.0 = disabled).
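The sample output is consistent with a flat rate of $0.15 per execution: 18 × 0.15 = 2.70, 24 × 0.15 = 3.60, and 42 × 0.15 = 6.30. The sketch below reproduces that arithmetic. It assumes every execution is charged the same flat rate, as the single cost_per_execution_usd setting suggests. spend_summary is a hypothetical name.

```python
from collections import Counter


def spend_summary(executions: list[str], window_days: int, cost_per_execution_usd: float) -> dict:
    """Build a spend-report-shaped dict from one repo key per execution."""
    per_repo = Counter(executions)
    return {
        "window_days": window_days,
        "total_executions": len(executions),
        # Round to cents so float artifacts such as 2.6999999999999997 don't leak out.
        "total_estimated_usd": round(len(executions) * cost_per_execution_usd, 2),
        "per_repo": {
            repo: {"executions": n, "estimated_usd": round(n * cost_per_execution_usd, 2)}
            for repo, n in per_repo.items()
        },
    }


runs = ["OperationsCenter"] * 18 + ["ExternalRepo"] * 24
print(spend_summary(runs, window_days=7, cost_per_execution_usd=0.15))
```

With the default cost_per_execution_usd of 0.0, every estimate comes out as 0.0, which is why the report is effectively disabled until the setting is configured.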

Circuit Breaker Diagnosis

When a systemic failure (bad executor version, auth regression) causes every execution to fail, the circuit breaker opens to prevent burning the full daily budget. Look for:

{"event": "budget_decision", "allowed": false, "reason": "circuit_breaker_open", ...}

in the watcher log. The breaker closes again automatically once the failure rate drops below 80% over the last 5 executions. The quickest way to clear it is to fix the underlying issue and let successful executions bring the failure rate back below the threshold.

Thresholds are tunable via env vars:

OPERATIONS_CENTER_CIRCUIT_BREAKER_THRESHOLD (default 0.8)
OPERATIONS_CENTER_CIRCUIT_BREAKER_WINDOW (default 5)
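A sliding-window breaker driven by these two variables might look like the sketch below. It is an illustration under stated assumptions, not the watcher's code. The CircuitBreaker class is hypothetical, and so is the rule that no verdict is given until the window is full. Only the env var names and defaults come from this guide.

```python
from __future__ import annotations

import os
from collections import deque


class CircuitBreaker:
    """Open while the failure rate over the last N executions is >= threshold."""

    def __init__(self, threshold: float | None = None, window: int | None = None) -> None:
        # Fall back to the documented env vars and their defaults.
        if threshold is None:
            threshold = float(os.environ.get("OPERATIONS_CENTER_CIRCUIT_BREAKER_THRESHOLD", "0.8"))
        if window is None:
            window = int(os.environ.get("OPERATIONS_CENTER_CIRCUIT_BREAKER_WINDOW", "5"))
        self.threshold = threshold
        self.outcomes: deque[bool] = deque(maxlen=window)  # True means the run failed

    def record(self, failed: bool) -> None:
        self.outcomes.append(failed)  # the oldest outcome drops off automatically

    @property
    def is_open(self) -> bool:
        # Assumption: no verdict until the window is full.
        if not self.outcomes or len(self.outcomes) < self.outcomes.maxlen:
            return False
        return sum(self.outcomes) / len(self.outcomes) >= self.threshold


cb = CircuitBreaker(threshold=0.8, window=5)
for _ in range(5):
    cb.record(True)
print(cb.is_open)  # True
```

In this sketch, one success after five straight failures leaves the window at 4/5 = 80%, which still meets the threshold. A second success is needed to close the breaker.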

Connection Error Backoff

Transient network failures (Plane API down, DNS failure) trigger exponential backoff in the watcher. Look for "event": "watch_error" with "consecutive_errors": N and "backoff_seconds": N in the log. The backoff caps at 5 minutes, and the counter resets on the next successful cycle.

If consecutive_errors is climbing and not resetting, the Plane API is unreachable — check network or the PLANE_API_TOKEN.
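Capped exponential backoff can be sketched as follows. Only the 5-minute cap comes from this guide. The 5-second base delay, the absence of jitter, and the name backoff_seconds are assumptions for illustration, so real backoff_seconds values in the log may differ.

```python
def backoff_seconds(consecutive_errors: int, base: float = 5.0, cap: float = 300.0) -> float:
    """Delay before the next cycle: doubles per consecutive error, capped at 5 minutes."""
    if consecutive_errors <= 0:
        return 0.0  # a successful cycle resets the counter
    exponent = min(consecutive_errors - 1, 30)  # keep 2**n small; the cap dominates anyway
    return min(cap, base * 2 ** exponent)


print([backoff_seconds(n) for n in range(1, 9)])
# [5.0, 10.0, 20.0, 40.0, 80.0, 160.0, 300.0, 300.0]
```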

Observer Snapshot Staleness

generate-insights warns if the most recent observer snapshot is older than 2 hours:

[warn] Latest observer snapshot is 4.2h old — insights may not reflect current repo state.

If you see this, re-run observe-repo before generating insights:

./scripts/operations-center.sh observe-repo
./scripts/operations-center.sh generate-insights
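To check staleness before deciding whether to re-run observe-repo, a rough equivalent of the 2-hour check can be sketched in Python. This sketch uses file modification time as a proxy for snapshot age. That is an assumption: generate-insights may read a timestamp stored inside the artifact instead. The function name is hypothetical.

```python
from __future__ import annotations

import time
from pathlib import Path

OBSERVER_DIR = Path("tools/report/operations_center/observer")
MAX_AGE_HOURS = 2.0  # documented warning threshold


def latest_snapshot_age_hours(directory: Path = OBSERVER_DIR, now: float | None = None) -> float | None:
    """Age in hours of the newest *.json snapshot by mtime, or None if there are none."""
    snapshots = list(directory.glob("*.json"))
    if not snapshots:
        return None
    newest = max(p.stat().st_mtime for p in snapshots)
    current = time.time() if now is None else now
    return (current - newest) / 3600
```

For example, re-run observe-repo when latest_snapshot_age_hours() returns None or a value greater than MAX_AGE_HOURS.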

Proposer Quiet Diagnosis

When the proposer emits 0 candidates for 5 or more consecutive cycles, a diagnosis file is written automatically:

logs/autonomy_cycle/quiet_diagnosis.json

It contains:

cycles_analyzed — how many recent cycles were checked
suppression_reasons — reason counts across all cycles, sorted by frequency
advice — a plain-language summary of the dominant suppression reason

Check this file before manually inspecting individual cycle JSON files.

The file is deleted automatically when the proposer starts emitting again.

Proposal Rejection Store

To see which autonomy candidates have been permanently rejected (by human cancellation):

cat state/proposal_rejections.json

Each entry has reason, task_id, task_title, and recorded_at. These are checked before budget, cooldown, or dedup — a rejected key will never be proposed again. To allow a re-proposal, delete the entry from the JSON file and rerun autonomy-cycle.
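Removing an entry by hand is easy to get wrong in a text editor, so a small script can help. The sketch below assumes the file is a JSON object keyed by rejection key. This guide does not document the key format, so inspect the file to find the key you want to drop. forget_rejection is a hypothetical helper, not an Operations Center command.

```python
import json
from pathlib import Path

REJECTIONS = Path("state/proposal_rejections.json")


def forget_rejection(key: str, store: Path = REJECTIONS) -> bool:
    """Delete one rejection entry; return False if the key was not present."""
    data = json.loads(store.read_text())
    if key not in data:
        return False
    del data[key]
    # Write a sibling temp file, then swap it in, so a crash mid-write
    # cannot leave truncated JSON behind.
    tmp = store.with_name(store.name + ".tmp")
    tmp.write_text(json.dumps(data, indent=2) + "\n")
    tmp.replace(store)
    return True
```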

Quality Erosion Warnings

After each executor run, the execution service scans the diff for quality-suppressing additions:

# noqa annotations
# type: ignore annotations
bare pass statements

When the combined total reaches or exceeds 3, an executor_quality_warning event is written to the usage store and a note is appended to the PR comment:

> [quality] This run added N quality suppressions: {"noqa": N, "type_ignore": N}. Review before merging.

To audit suppression trends, filter the usage store for "kind": "executor_quality_warning" entries. These are tracked for observability only — they do not affect task status.
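The scan can be approximated by counting matches on the added lines of a unified diff. The regexes below are illustrative guesses, not the execution service's actual rules. Only the three categories and the threshold of 3 come from this guide. The helper names are hypothetical.

```python
import re
from collections import Counter

# Illustrative patterns; the real scanner's rules may differ.
PATTERNS = {
    "noqa": re.compile(r"#\s*noqa\b"),
    "type_ignore": re.compile(r"#\s*type:\s*ignore\b"),
    "bare_pass": re.compile(r"^\s*pass\s*(#.*)?$"),
}
WARN_AT = 3  # documented threshold for executor_quality_warning


def count_suppressions(diff_text: str) -> Counter:
    """Count suppressions on lines a unified diff adds, skipping '+++' file headers."""
    counts: Counter = Counter()
    for line in diff_text.splitlines():
        if not line.startswith("+") or line.startswith("+++"):
            continue
        added = line[1:]
        for name, pattern in PATTERNS.items():
            if pattern.search(added):
                counts[name] += 1
    return counts


def needs_warning(counts: Counter) -> bool:
    return sum(counts.values()) >= WARN_AT
```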

Scope Violation Observability

When allowed_paths is configured and an executor run modifies files outside the allowed set, the policy is enforced (changes are not pushed) and a scope_violation event is written to the usage store. Fields include violated_files and repo_key.

Filter for "kind": "scope_violation" in tools/report/operations_center/execution/usage.json to see which tasks have exceeded their path budget. Recurring violations may indicate the allowed_paths config is too narrow for the goal.
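Because this guide does not describe the layout of usage.json, the sketch below walks the whole JSON tree and collects every object with the requested "kind". It then counts scope violations per repo_key. The helper names are hypothetical. The same iter_events call works for "executor_quality_warning" and "quota_event".

```python
import json
from collections import Counter
from pathlib import Path

USAGE_STORE = Path("tools/report/operations_center/execution/usage.json")


def iter_events(node, kind: str):
    """Yield every dict anywhere in the JSON tree whose "kind" equals `kind`."""
    if isinstance(node, dict):
        if node.get("kind") == kind:
            yield node
        for value in node.values():
            yield from iter_events(value, kind)
    elif isinstance(node, list):
        for item in node:
            yield from iter_events(item, kind)


def violations_by_repo(path: Path = USAGE_STORE) -> Counter:
    """Count scope_violation events per repo_key."""
    data = json.loads(path.read_text())
    return Counter(event.get("repo_key", "?") for event in iter_events(data, "scope_violation"))
```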

Quota Exhaustion Detection

Hard quota exhaustion from the executor orchestrator (e.g. Anthropic API quota exceeded) is detected separately from rate limiting. When detected:

a quota_event is written to the usage store (does not feed the circuit breaker)
the task is moved to Blocked with blocked_classification: quota_exhausted

Filter for "kind": "quota_event" to track frequency. Unlike transient rate limits, quota exhaustion typically requires manual intervention (quota increase or wait for reset).

Disk Space Check

See the Disk Space Guardrail section in the Runtime Guide. If writes to the usage store are failing with OSError, disk space is the first thing to check.

Failure Classification Reference

classify_execution_result maps execution failures to one of these classifications (checked in priority order):

| Classification | Trigger | Follow-up action |
|---|---|---|
| `scope_policy` | `allowed_paths` violation | policy-retry fired |
| `oom` | Out of memory / killed | investigate memory pressure |
| `timeout` | Process timed out | increase `executor.timeout_seconds` or split task |
| `model_error` | API 5xx / overloaded | transient; retry usually succeeds |
| `context_limit` | Token limit exceeded | split task with `prior_progress` handoff |
| `dependency_missing` | ModuleNotFoundError / command not found | fix bootstrap |
| `flaky_test` | Known-flaky command | stabilise the test |
| `validation_failure` | Tests / lint fail | investigate test output |
| `awaiting_input` | Executor embedded `<!-- cp:question: ... -->` | human must reply; improve watcher re-queues automatically |
| `tool_failure` | Bash/git tool error | investigate tool configuration |
| `infra_tooling` | Auth / missing file | fix credentials or environment |
| `unknown` | None of the above | investigate stderr |
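"Checked in priority order" means the first matching rule wins. The sketch below shows that first-match structure for an illustrative subset of the rows. The regexes are guesses made up for demonstration, and classify is a hypothetical stand-in. classify_execution_result's real rules are not shown in this guide.

```python
import re

# Illustrative subset of the table, kept in the same relative order.
# These patterns are demonstration guesses, not the real classifier's rules.
RULES = [
    ("oom", r"out of memory|MemoryError|\bKilled\b"),
    ("timeout", r"timed out|TimeoutExpired"),
    ("model_error", r"HTTP 5\d\d|overloaded"),
    ("context_limit", r"context length|token limit"),
    ("dependency_missing", r"ModuleNotFoundError|command not found"),
    ("awaiting_input", r"<!-- cp:question:"),
]


def classify(output: str, scope_violation: bool = False) -> str:
    if scope_violation:
        return "scope_policy"  # highest priority, decided by the allowed_paths check
    for label, pattern in RULES:
        if re.search(pattern, output, re.IGNORECASE):
            return label  # first match wins
    return "unknown"


print(classify("Killed: process timed out"))  # oom (listed before timeout)
```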

Self-Healing Log Events

When a task is blocked for the third consecutive time without a successful execution in between, the system posts an [Improve] Repeated-block self-healing triggered comment and logs:

{"event": "self_healing_repeated_block", "task_id": "...", "consecutive_blocks": 3, "classification": "..."}

This means the task needs human review — autonomous retries for it are paused. The consecutive block counter resets after a successful execution.

Cross-Repo Impact Warnings

When a goal task touches paths declared in another repo's impact_report_paths, a warning is logged:

{"event": "cross_repo_impact_detected", "task_id": "...", "warnings": ["repo=shared_lib shared_path=src/api/ changed_file=src/api/client.py"]}

A comment, [Goal] Cross-repo impact detected, is also posted on the task. The warning is advisory: verify that dependent repos still build and pass tests.
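The warning strings in the log sample look like the output of a simple path-prefix match. The sketch below reproduces that format under two assumptions: impact_report_paths is treated as a mapping of repo key to path prefixes, and matching is plain string-prefix. impact_warnings is a hypothetical name.

```python
def impact_warnings(changed_files: list[str], impact_paths: dict[str, list[str]]) -> list[str]:
    """One warning per (repo, shared path, changed file) prefix match.

    Plain prefix matching means "src/api" (no trailing slash) would also
    match "src/apiary/x.py"; declaring prefixes with a trailing slash avoids that.
    """
    warnings = []
    for repo, prefixes in impact_paths.items():
        for prefix in prefixes:
            for changed in changed_files:
                if changed.startswith(prefix):
                    warnings.append(f"repo={repo} shared_path={prefix} changed_file={changed}")
    return warnings


print(impact_warnings(["src/api/client.py", "README.md"], {"shared_lib": ["src/api/"]}))
# ['repo=shared_lib shared_path=src/api/ changed_file=src/api/client.py']
```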

Supervisor Status

When using the process supervisor, check its status:

cat logs/local/supervisor.status.json

Fields per process: role, alive, pid, restart_count, last_restart_at. A high restart_count on a role indicates a crash loop — investigate the watcher log for that role.

Circuit Breaker Escalation

When the circuit breaker trips (≥80% failure over last 5 executions) AND an escalation webhook is configured, a webhook POST is sent automatically (cooldown-guarded). Look for:

{"event": "circuit_breaker_escalation_sent", "role": "...", "reason": "circuit_breaker_open"}

in the watcher log. The escalation fires once per cooldown period (escalation.cooldown_seconds, default 3600). The circuit breaker still resets when the failure rate improves — the escalation is informational.

Campaign Status

Track multi-step plan progress from the board:

# Show all active campaigns
python -m operations_center.entrypoints.campaign_status.main

# Show only in-progress campaigns
python -m operations_center.entrypoints.campaign_status.main --status in_progress

# JSON output for scripting
python -m operations_center.entrypoints.campaign_status.main --json

Campaigns are created automatically when the improve watcher decomposes a multi-step plan (tasks with titles containing refactor, migrate, redesign, etc. or labeled plan: multi-step). Each campaign tracks step completion and overall progress.

Campaign records are stored in state/campaigns.json. Each record contains campaign_id, title, step_ids, done_step_ids, cancelled_step_ids, total_steps, completed_steps, progress_pct, and status (pending, in_progress, complete, partially_cancelled).

Awaiting-Input Tasks

When the executor embeds a `<!-- cp:question: ... -->` marker in its output, the task is classified as awaiting_input and blocked. The improve watcher:

Extracts the question text from the HTML comment.
Posts it as a Plane comment asking for human input.
Scans every 8 improve cycles for a human reply.
When a reply is detected, injects the answer into the task description and re-queues it to Ready for AI.
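Extracting the question from the HTML comment can be done with a single regular expression. The sketch below assumes the marker format shown in the classification table. How much whitespace the real parser tolerates is an assumption, and extract_questions is a hypothetical name.

```python
import re

# Non-greedy match captures several markers separately;
# DOTALL lets one question span multiple lines.
QUESTION_RE = re.compile(r"<!--\s*cp:question:\s*(.*?)\s*-->", re.DOTALL)


def extract_questions(executor_output: str) -> list[str]:
    return QUESTION_RE.findall(executor_output)


out = "Done with step 1.\n<!-- cp:question: Should the migration keep the old column? -->\n"
print(extract_questions(out))  # ['Should the migration keep the old column?']
```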

To find pending awaiting-input tasks:

grep "awaiting_input" logs/local/watch-all/improve.log | tail -20

Or check the Plane board for tasks with blocked_classification: awaiting_input in their latest comment.

CI Webhook

The CI webhook server receives GitHub check-run events and triggers autonomy cycles reactively:

# Start the webhook server
python -m operations_center.entrypoints.ci_webhook.main --port 8765 --secret "$WEBHOOK_SECRET"

# Trigger files land in state/ci_webhook_triggers/
ls state/ci_webhook_triggers/

Each trigger file is named <timestamp>_<check_suite_id>.json and contains the repository, branch, check name, status, and conclusion. The pipeline trigger watcher watches this directory and fires autonomy-cycle when a new trigger appears.

HMAC-SHA256 signature validation (X-Hub-Signature-256 header) is enforced when --secret is provided. Requests without a valid signature return HTTP 401.
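GitHub's X-Hub-Signature-256 value is the string sha256= followed by the hex HMAC-SHA256 of the raw request body, keyed with the webhook secret. The sketch below shows both computing and checking that value. The sign and verify names are illustrative and are not the webhook server's own functions.

```python
from __future__ import annotations

import hashlib
import hmac


def sign(body: bytes, secret: str) -> str:
    """Header value: 'sha256=' + hex HMAC-SHA256 of the exact bytes sent."""
    return "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()


def verify(body: bytes, secret: str, header: str | None) -> bool:
    """Constant-time comparison; a missing header fails closed."""
    if not header:
        return False
    return hmac.compare_digest(sign(body, secret).encode(), header.encode())


print(verify(b"{}", "s3cret", sign(b"{}", "s3cret")))  # True
```

When replaying a payload by hand against a server started with --secret, compute sign() over the exact bytes you send. Any re-serialization of the JSON changes the signature and produces a 401.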

Stray Artifact Isolation

Self-review verdict files must land inside the task's ephemeral workspace (/tmp/oc-task-<id>/), not in $HOME or the repo root. The prompt uses an absolute path and the supervisor starts each watcher with cd <ROOT_DIR> so relative paths in the worker process resolve correctly.

If you find .review/ directories or *_verdict.txt files in unexpected locations ($HOME, bare /home/dev/repo/, unrelated repo clones), they are leftovers from before this fix and can be safely deleted. List candidates under $HOME before removing anything:

find "$HOME" -maxdepth 2 \( -name '*verdict*' -o \( -name '.review' -type d \) \) 2>/dev/null

The root cause of stray verdicts is either an old executor run that predates the absolute-path fix, or a supervisor that was not starting the worker process from ROOT_DIR. Both are fixed; this should not recur.

Suggested Debugging Order

watch-all-status
watcher log in logs/local/watch-all/
Plane comments on the task
retained artifact directory in tools/report/execution_plane/
plane-doctor if the board/API contract looks wrong
heartbeat check: python -m operations_center.entrypoints.worker.main heartbeat-check
supervisor status: cat logs/local/supervisor.status.json (if using supervisor)
config drift: look for config_drift_detected in watcher log at cycle 1
workspace health: look for workspace_health_* events in improve watcher log
circuit breaker: look for reason: circuit_breaker_open in watcher log; check for circuit_breaker_escalation_sent
connection backoff: look for watch_error with consecutive_errors > 1 in watcher log
quota exhaustion: look for "kind": "quota_event" in usage store
quality erosion: look for "kind": "executor_quality_warning" in usage store
scope violations: look for "kind": "scope_violation" in usage store
board saturation: look for "event": "propose_skipped_board_saturated" in propose watcher log
propose backlog gate: look for "event": "watch_propose_skipped_backlog" in propose watcher log — board has ≥ propose_skip_when_ready_count ready tasks
executor concurrency gate: look for "dispatch skipped" with "reason": "concurrency_cap" — another executor run is active
memory gate: look for "dispatch skipped" with "reason": "low_memory" — less than backend_caps.team_executor.min_available_memory_mb free
self-healing: look for self_healing_repeated_block events in improve watcher log
credential expiry: look for credential_expiry_soon in watcher log at cycle 1
cross-repo impact: look for cross_repo_impact_detected in goal watcher log
dependency updates: look for dependency_update_task_created in improve watcher log
quality trends: look for quality_trend/lint_degrading or type_degrading insights in tools/report/operations_center/insights/
confidence calibration: run tune-autonomy and check the calibration table for ⚠ families
error ingest: look for error_ingest_task_created events; check state/error_ingest_dedup.json for dedup state
no-op loop: look for noop_loop/family_cycling in the latest insights artifact; family is cycling without acceptance
coverage gap: look for coverage_gap/low_overall or coverage_gap/uncovered_files insights; check coverage.xml is being generated
execution env: look for execution_env_warning in the goal watcher log; check that required tools are installed in venv
awaiting-input tasks: look for blocked_classification: awaiting_input in improve watcher log; check Plane task for extracted question
priority rescore: look for priority_rescore_demoted and priority_rescore_promoted events in improve watcher log every 45 cycles
cross-repo patterns: look for cross_repo/pattern_detected in the latest insights artifact; consider org-wide fix tasks
campaign status: run campaign-status CLI; check state/campaigns.json for stalled campaigns
ci webhook triggers: check state/ci_webhook_triggers/ for unprocessed trigger files
orphaned workspaces: look for watch_cleanup_orphaned_workspaces at cycle 1 or every 20 cycles; leftover /tmp/oc-task-* dirs indicate prior worker crashes

For autonomy-layer inputs:

./scripts/operations-center.sh observe-repo
./scripts/operations-center.sh generate-insights
./scripts/operations-center.sh decide-proposals
./scripts/operations-center.sh propose-from-candidates --dry-run
retained observer artifacts in tools/report/operations_center/observer/
retained insight artifacts in tools/report/operations_center/insights/
retained decision artifacts in tools/report/operations_center/decision/
retained proposer artifacts in tools/report/operations_center/proposer/

When the board is quiet, also check the proposer lane:

proposer heartbeat/status in watch-all-status
proposer log in logs/local/watch-all/
new tasks labeled source: proposer
logs/autonomy_cycle/quiet_diagnosis.json for aggregated suppression reasons

No-Op Loop Warnings

When the NoOpLoopDeriver detects a family cycling without acceptance, a noop_loop/family_cycling insight is written. Check:

cat tools/report/operations_center/insights/$(ls -t tools/report/operations_center/insights/ | head -1) | python3 -m json.tool | grep -A5 noop_loop

Evidence fields: family, proposals_in_window, merges_in_window, look_back_days.
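If the shell pipeline is awkward to adapt, the same lookup can be sketched in Python. The code picks the newest file in the insights directory, as `ls -t | head -1` does, and walks the JSON for entries whose kind starts with a prefix. Walking the whole tree avoids assuming a particular artifact layout. Both helper names are hypothetical.

```python
import json
from pathlib import Path

INSIGHTS_DIR = Path("tools/report/operations_center/insights")


def latest_artifact(directory: Path = INSIGHTS_DIR):
    """Parse the most recently modified file in the directory, or return None."""
    files = [p for p in directory.iterdir() if p.is_file()]
    if not files:
        return None
    return json.loads(max(files, key=lambda p: p.stat().st_mtime).read_text())


def find_kind(node, prefix: str):
    """Yield every dict in the JSON tree whose "kind" starts with `prefix`."""
    if isinstance(node, dict):
        kind = node.get("kind")
        if isinstance(kind, str) and kind.startswith(prefix):
            yield node
        for value in node.values():
            yield from find_kind(value, prefix)
    elif isinstance(node, list):
        for item in node:
            yield from find_kind(item, prefix)
```

For example, `for insight in find_kind(latest_artifact(), "noop_loop/"): print(insight)`. Passing "quality_trend/" or "coverage_gap/" instead covers the other insight kinds in this guide.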

Common causes:

The family's threshold is too low — it fires too easily on signals that don't warrant a fix
The generated proposals are consistently rejected without a clear path to acceptance (check state/rejection_patterns.json)
The execution environment is missing the required tool (check execution_env_warning in the goal watcher log)

Remediation: Consider raising the family's signal threshold in config/autonomy_tuning.json, or demoting the family's tier via autonomy-tiers set --family <family> --tier 0.

Coverage Gap Detection

When coverage.xml, a text coverage report, or htmlcov/index.html is present in a repo, the observer collects coverage data automatically. Check whether coverage reports exist:

ls <repo_path>/coverage.xml <repo_path>/htmlcov/index.html 2>/dev/null

If coverage data is available but no coverage_gap proposals are appearing, the total may be above the 60% threshold or the uncovered file count may be below 3. Check the latest observer artifact:

python3 -c "import json; d=json.load(open('$(ls -t tools/report/operations_center/observer/*.json | head -1)')); print(d['signals'].get('coverage_signal', {}))"

Coverage collection requires pre-existing report files. OperationsCenter never runs coverage tools itself. Generate coverage reports as part of your CI or test script, then retain the output files.
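To see roughly where a repo stands against the two thresholds, you can read coverage.py's XML report (Cobertura format). It records the overall fraction in the line-rate attribute of the root coverage element and per-file rates on class elements. This is a sketch only. Treating a file as "uncovered" when its line-rate is exactly 0 is an assumption, and the observer may count differently. coverage_gap is a hypothetical name.

```python
import xml.etree.ElementTree as ET

MIN_TOTAL = 0.60         # documented total-coverage threshold
MIN_UNCOVERED_FILES = 3  # documented minimum count of uncovered files


def coverage_gap(xml_text: str) -> dict:
    """Summarise a Cobertura-style coverage.xml against the documented thresholds."""
    root = ET.fromstring(xml_text)
    total = float(root.get("line-rate", "0"))
    # Assumption: a file is "uncovered" when its own line-rate is exactly 0.
    uncovered = sorted({
        cls.get("filename", "")
        for cls in root.iter("class")
        if float(cls.get("line-rate", "0")) == 0.0
    })
    return {
        "total": total,
        "uncovered_files": uncovered,
        "low_overall": total < MIN_TOTAL,
        "uncovered_files_gap": len(uncovered) >= MIN_UNCOVERED_FILES,
    }
```

Call it with the report text, for example `coverage_gap(Path("<repo_path>/coverage.xml").read_text())`.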

Theme Aggregation

When the same source file appears in top lint or type violations across 3+ consecutive observer snapshots, ThemeAggregationDeriver emits theme/lint_cluster or theme/type_cluster insights. The LintClusterRule proposes a single [Refactor] task for that file rather than repeated lint_fix proposals.

If you see [Refactor] Systematic lint cleanup: <file> tasks being proposed, it means the file has persistent violations that individual lint fixes are not resolving. The refactor task asks Kodo to address the root pattern rather than individual violations.
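The "3+ consecutive snapshots" condition can be expressed as a set intersection over the most recent snapshots. This is one possible reading, not ThemeAggregationDeriver's actual code. It assumes the streak must end at the newest snapshot and that each snapshot has already been reduced to the list of files in its top violations.

```python
def persistent_files(snapshots: list[list[str]], min_streak: int = 3) -> set[str]:
    """Files named in the top violations of each of the newest `min_streak` snapshots.

    `snapshots` is ordered oldest to newest.
    """
    if len(snapshots) < min_streak:
        return set()
    recent = [set(files) for files in snapshots[-min_streak:]]
    return set.intersection(*recent)


history = [["a.py"], ["a.py", "b.py"], ["a.py", "b.py"], ["b.py", "a.py"]]
print(sorted(persistent_files(history)))  # ['a.py', 'b.py']
```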

Rejection Patterns Store

When a PR is escalated to human review and the human reviewer leaves comments, rejection patterns are automatically extracted and stored:

cat state/rejection_patterns.json

Each key is {repo_key}:{family}. Each entry has patterns (pattern name → count) and last_seen (pattern name → ISO timestamp). The most frequently seen patterns are the main recurring feedback from human reviewers for that family in that repo.
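Using the entry shape described above, a short script can rank the recurring feedback for each key. top_patterns is a hypothetical helper. It reads only the documented patterns field and breaks ties alphabetically so the output is stable.

```python
def top_patterns(store: dict, n: int = 3) -> dict[str, list[tuple[str, int]]]:
    """Most frequent reviewer patterns per "{repo_key}:{family}" key."""
    ranked = {}
    for key, entry in store.items():
        counts = entry.get("patterns", {})
        # Highest count first; ties broken by pattern name.
        ranked[key] = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
    return ranked
```

Load the store with json.loads(Path("state/rejection_patterns.json").read_text()) and pass the result to top_patterns.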

These patterns are currently persisted for observability. Future work can wire them into proposal descriptions to pre-empt known objections.

Execution Environment Warnings

When a goal task is claimed for a family that requires specific tools, the watcher checks tool availability:

grep "execution_env_warning" logs/local/watch-all/goal.log

Warning fields: task_id, family, warning (describes which tool group is missing). The task is not blocked — it proceeds to execution — but repeated warnings for the same family indicate the tool should be installed in the repo's venv or system PATH.

Quality Trend Warnings

The QualityTrendDeriver emits insights when lint or type error counts are trending in a direction across ≥3 observer snapshots. Check the latest insights artifact for these signals:

cat tools/report/operations_center/insights/$(ls -t tools/report/operations_center/insights/ | head -1)

Look for entries with kind starting with quality_trend/:

| Insight | Meaning |
|---|---|
| `quality_trend/lint_degrading` | Lint errors increased >10% from oldest to newest snapshot |
| `quality_trend/type_degrading` | Type errors increased >10% |
| `quality_trend/lint_improving` | Lint errors decreased >10% |
| `quality_trend/type_improving` | Type errors decreased >10% |
| `quality_trend/stagnant` | Metrics present but <10% change in either direction |

stagnant on a repo with many outstanding lint/type tasks may indicate the autonomy loop is proposing tasks that are not reaching execution, or that tasks complete but introduce equivalent new violations.
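The table's rule compares the oldest and newest counts as a relative change. The sketch below applies it to one metric. It makes two assumptions the table leaves open: a change of exactly ±10% is treated as stagnant, and no insight is emitted when the oldest count is 0. trend is a hypothetical name.

```python
from __future__ import annotations


def trend(counts: list[int], kind: str = "lint") -> str | None:
    """Classify oldest-to-newest error counts for one metric ("lint" or "type")."""
    if len(counts) < 3:
        return None  # needs at least 3 snapshots
    oldest, newest = counts[0], counts[-1]
    if oldest == 0:
        return None  # relative change undefined (assumption: no insight)
    change = (newest - oldest) / oldest
    if change > 0.10:
        return f"quality_trend/{kind}_degrading"
    if change < -0.10:
        return f"quality_trend/{kind}_improving"
    return "quality_trend/stagnant"


print(trend([120, 131, 140]))  # quality_trend/lint_degrading
```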

Confidence Calibration

After enough feedback records accumulate (≥5 per family/confidence combination), tune-autonomy prints a calibration table:

  Confidence calibration:
  family                       conf     n  accept%  expected    ratio
  lint_fix                     high     8    62.5%     80.0%   0.78
  type_fix                     high     6    33.3%     80.0%   0.42⚠

The ratio column is acceptance_rate / expected_rate. Interpretation:

ratio ≥ 0.9 (✓) — well-calibrated; the confidence label matches observed outcomes
0.6 ≤ ratio < 0.9 — mildly over-confident; monitor but no immediate action needed
ratio < 0.6 (⚠) — over-confident; the system is creating high-confidence proposals that are frequently rejected

When a family shows ⚠, consider lowering its min_confidence threshold in config or demoting its tier until the acceptance rate recovers.
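The sample table can be reproduced from raw counts. 62.5% of 8 is 5 accepted proposals, and 33.3% of 6 is 2. Against an expected 80% rate, that gives 0.625 / 0.8 ≈ 0.78 and 0.333 / 0.8 ≈ 0.42. The sketch below applies the documented bands and the 5-record minimum. The calibration function is hypothetical, and the 80% expected rate is taken from the sample "high" rows only.

```python
from __future__ import annotations

MIN_SAMPLES = 5  # documented minimum per family/confidence combination


def calibration(accepted: int, n: int, expected_rate: float) -> tuple[float, str] | None:
    """Return (ratio, marker), or None when there are too few records."""
    if n < MIN_SAMPLES:
        return None
    ratio = (accepted / n) / expected_rate
    if ratio >= 0.9:
        marker = "✓"  # well-calibrated
    elif ratio >= 0.6:
        marker = ""   # mildly over-confident: monitor
    else:
        marker = "⚠"  # over-confident
    return ratio, marker


for family, accepted, n in [("lint_fix", 5, 8), ("type_fix", 2, 6)]:
    ratio, marker = calibration(accepted, n, expected_rate=0.80)
    print(family, f"{ratio:.2f}{marker}")
# lint_fix 0.78
# type_fix 0.42⚠
```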

To record calibration data manually:

python -m operations_center.entrypoints.feedback.main record \
    --task-id <uuid> --outcome merged \
    --family lint_fix --confidence high

Calibration data is stored in state/calibration_store.json.

Maintenance Boundary

Normal execution should stay pinned and stable.

Use diagnostics and maintenance commands to:

inspect health
investigate failures
check version drift

Do not treat normal run or watch cycles as the place to auto-upgrade tooling.

Likewise, do not treat the proposer lane as unlimited self-directed work generation. It is intentionally bounded by cooldown, quota, and deduplication guardrails.
