A lightweight self-hosted dashboard for monitoring and controlling an OpenClaw assistant environment.
- OpenClaw health/status check
- Raw status output for quick debugging
- Gateway restart action from the UI
- Fetches recent failed workflow runs for any owner/repo
- Enriches runs with failed job/step details
- Pulls failed logs and classifies the likely failure type: test_failure, lint_or_typecheck, dependency_or_build, infra_or_flaky, auth_or_secrets, or unknown
- Provides:
  - likely failure reason
  - fail-point hotspots (job -> step)
  - confidence score
  - severity (high, medium, low) with badges
  - short failed-log excerpt
- Supports sorting and filtering by severity
- Quick actions:
  - open top 3 high-severity runs
  - export visible summary to .txt
  - auto-refresh every 30s
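The failure categories above come from a heuristic pass over the failed-log text. A minimal sketch of such a classifier, assuming one list of regex rules per category: the patterns, the tie-break by rule order, and the confidence formula (share of all matched signals that point to the winning category) are illustrative assumptions, not the dashboard's actual rules.

```javascript
// Hypothetical rule table: category names are from the README,
// the patterns are illustrative guesses at common CI log signatures.
const RULES = [
  { type: "auth_or_secrets", patterns: [/bad credentials/i, /\b(401|403)\b/, /secret\b.*\bnot (set|found)/i] },
  { type: "dependency_or_build", patterns: [/npm ERR!/, /\bERESOLVE\b/, /module not found/i, /build failed/i] },
  { type: "lint_or_typecheck", patterns: [/\beslint\b/i, /error TS\d+/, /prettier/i] },
  { type: "test_failure", patterns: [/\bFAIL\b/, /assertion ?error/i, /\d+ failing\b/, /tests? failed/i] },
  { type: "infra_or_flaky", patterns: [/ETIMEDOUT|ECONNRESET/, /timed out/i, /runner .* (lost|shut ?down)/i] },
];

// Returns the category with the most matching patterns; earlier rules win ties.
function classifyLog(log) {
  const text = String(log ?? "");
  let best = { type: "unknown", hits: 0 };
  let totalHits = 0;
  for (const rule of RULES) {
    // No /g flag on the patterns, so RegExp#test keeps no lastIndex state.
    const hits = rule.patterns.filter((re) => re.test(text)).length;
    totalHits += hits;
    if (hits > best.hits) best = { type: rule.type, hits };
  }
  // Confidence: fraction of all matched signals that agree with the winner.
  const confidence = totalHits === 0 ? 0 : Math.round((best.hits / totalHits) * 100) / 100;
  return { type: best.type, confidence };
}

console.log(classifyLog("npm ERR! code ERESOLVE"));
```

Counting agreeing signals keeps the score explainable in the UI; a log that mixes test assertions with network timeouts gets a lower confidence than one where every match points the same way.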
- Node.js 18+
- GitHub CLI (gh) authenticated with access to the target repos
- OpenClaw installed on the host
This project now runs HTTPS by default with:
- minVersion: TLSv1.2
- TLS 1.2 cipher allowlist: ECDHE-ECDSA-AES128-GCM-SHA256, ECDHE-RSA-AES128-GCM-SHA256, ECDHE-ECDSA-AES256-GCM-SHA384, ECDHE-RSA-AES256-GCM-SHA384
- HTTP (:80) → HTTPS (:443) redirect with 301
- HSTS header: Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
- Outbound TLS verification enabled (rejectUnauthorized: true)
- Optional certificate pinning via TLS_PINNED_SPKI_SHA256
- Strict dynamic CORS whitelist (no wildcard)
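A minimal sketch of how the listed TLS settings map onto Node's built-in https/tls options, plus the 301 redirect and the HSTS header. Assumptions: the helper names are hypothetical, the redirect derives the hostname from the Host header (a production setup should prefer a fixed, configured hostname), and the HTTPS port defaults to 443.

```javascript
// TLS 1.2 cipher allowlist and HSTS value as listed in this README.
const TLS_CIPHERS = [
  "ECDHE-ECDSA-AES128-GCM-SHA256",
  "ECDHE-RSA-AES128-GCM-SHA256",
  "ECDHE-ECDSA-AES256-GCM-SHA384",
  "ECDHE-RSA-AES256-GCM-SHA384",
].join(":");
const HSTS_VALUE = "max-age=31536000; includeSubDomains; preload";

// Options object for https.createServer(); key and cert are PEM strings/Buffers.
function httpsServerOptions(key, cert) {
  return { key, cert, minVersion: "TLSv1.2", ciphers: TLS_CIPHERS, honorCipherOrder: true };
}

// Outbound requests (e.g. an https.Agent): keep certificate verification on.
const outboundAgentOptions = { rejectUnauthorized: true, minVersion: "TLSv1.2" };

// Handler for the plain-HTTP listener: permanent (301) redirect to HTTPS.
function redirectToHttps(httpsPort = 443) {
  return (req, res) => {
    // Strip any port from the Host header; a fixed hostname is safer in production.
    const host = String(req.headers.host || "localhost").replace(/:\d+$/, "");
    const port = Number(httpsPort) === 443 ? "" : `:${httpsPort}`;
    res.writeHead(301, { Location: `https://${host}${port}${req.url}` });
    res.end();
  };
}

// Call on every HTTPS response (e.g. at the top of the request handler).
function setHsts(res) {
  res.setHeader("Strict-Transport-Security", HSTS_VALUE);
}

// Wiring, assuming certs/privkey.pem and certs/fullchain.pem exist:
// const fs = require("node:fs"), https = require("node:https"), http = require("node:http");
// https.createServer(httpsServerOptions(fs.readFileSync("certs/privkey.pem"),
//   fs.readFileSync("certs/fullchain.pem")), (req, res) => { setHsts(res); /* app */ }).listen(443);
// http.createServer(redirectToHttps(443)).listen(80);
```

For the local setup with HTTPS_PORT=3443, the redirect handler would be created with redirectToHttps(3443) so the Location header keeps the non-default port.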
ALLOWED_ORIGINS=https://yourdomain.com,https://app.yourdomain.com
ALLOWED_METHODS=GET,POST,OPTIONS
ALLOWED_HEADERS=Authorization,Content-Type

ADMIN_ALLOWED_IPS=203.0.113.10,10.8.0.0/24,2001:db8::/48
WEBHOOK_ALLOWED_IPS=198.51.100.0/24,2001:db8:feed::/48
WEBHOOK_HMAC_SECRET=replace-with-strong-secret

CIRCUIT_BREAKER_WEBHOOK=https://hooks.slack.com/services/xxx
CIRCUIT_BREAKER_EMAIL=admin@yourdomain.com
CIRCUIT_BREAKER_WARNING=100
CIRCUIT_BREAKER_SOFT=250
CIRCUIT_BREAKER_HARD=500
CIRCUIT_BREAKER_EMERGENCY=1000
REDIS_URL=redis://127.0.0.1:6379
RATE_LIMIT_REDIS_PREFIX=aod:rl:

# Optional monitoring bypass for health/observability
INTERNAL_MONITORING_IPS=10.0.0.10
INTERNAL_MONITORING_HEADER=x-internal-monitoring
INTERNAL_MONITORING_HEADER_VALUE=allow

# 32-byte key (raw/base64/hex) for AES-256-GCM field encryption
DATA_ENCRYPTION_KEY=...
# Redis over TLS (certificate validation in production)
REDIS_URL=rediss://redis.example.com:6379
# API key secret material (plaintext only for one-time migration/startup hash)
API_KEY_SECRETS_JSON={"ak_ops_default":"super-secret"}
# Store encrypted webhook secret (preferred) instead of plaintext WEBHOOK_HMAC_SECRET
ENCRYPTED_WEBHOOK_SECRET=...

Utilities:
- lib/encryption.js: AES-256-GCM encrypt/decrypt + bcrypt hashing
- scripts/migrate-sensitive-data.js: generate encrypted secrets + hashed API keys
- scripts/data-retention.sh: 7-day prompt/response purge, daily session cleanup, 90-day analytics anonymization
- scripts/backup-encrypted.sh: encrypted backup (gpg AES256)
- scripts/backup-verify.sh: decrypt/verify backup integrity
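A minimal sketch of AES-256-GCM field encryption with Node's crypto module, in the spirit of lib/encryption.js. Assumptions: the function names and the storage format (base64 of a 12-byte IV, then the 16-byte auth tag, then the ciphertext) are hypothetical, and the real module's format may differ. bcrypt hashing is omitted because it needs a third-party package.

```javascript
const crypto = require("node:crypto");

// Accepts DATA_ENCRYPTION_KEY as 64 hex chars, base64, or a raw 32-byte string.
function parseKey(value) {
  const v = String(value);
  if (/^[0-9a-fA-F]{64}$/.test(v)) return Buffer.from(v, "hex");
  const b64 = Buffer.from(v, "base64");
  if (b64.length === 32) return b64;
  const raw = Buffer.from(v, "utf8");
  if (raw.length === 32) return raw;
  throw new Error("DATA_ENCRYPTION_KEY must decode to exactly 32 bytes");
}

// Hypothetical format: base64(iv[12] | authTag[16] | ciphertext).
function encryptField(plaintext, key) {
  const iv = crypto.randomBytes(12); // fresh random 96-bit nonce per encryption
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(String(plaintext), "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

// Throws if the payload was tampered with or the key is wrong.
function decryptField(payload, key) {
  const buf = Buffer.from(payload, "base64");
  const iv = buf.subarray(0, 12);
  const tag = buf.subarray(12, 28);
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(buf.subarray(28)), decipher.final()]).toString("utf8");
}
```

Storing the IV and tag alongside the ciphertext keeps each encrypted field self-describing, so decryption needs only the key.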
Built-in protections on /api/models/invoke:
- Input limits: max 4000 characters, 50 turns, 2000-character system prompt
- Hard-blocks jailbreak patterns
- Sanitizes script/SQL/path-traversal sequences
- Quarantines the API key after repeated blocked attempts
- Output redaction for key/PII/system-prompt leakage
- Canary token leak detection and alert logging
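The input limits and canary check above can be sketched as plain validation helpers. Assumptions: the request shape ({ system, messages: [{ content }] }), treating the 4000-character limit as the combined length of all message contents, and the canary format are hypothetical; the README does not specify them.

```javascript
const crypto = require("node:crypto");

// Limits from the README.
const LIMITS = { maxInputChars: 4000, maxTurns: 50, maxSystemPromptChars: 2000 };

// Returns { ok, errors } for a hypothetical { system, messages } request body.
function validateInvokeRequest(body) {
  const errors = [];
  const messages = Array.isArray(body?.messages) ? body.messages : [];
  const system = typeof body?.system === "string" ? body.system : "";
  if (messages.length > LIMITS.maxTurns) {
    errors.push(`too many turns: ${messages.length} > ${LIMITS.maxTurns}`);
  }
  if (system.length > LIMITS.maxSystemPromptChars) {
    errors.push(`system prompt too long: ${system.length} chars`);
  }
  const inputChars = messages.reduce((n, m) => n + String(m?.content ?? "").length, 0);
  if (inputChars > LIMITS.maxInputChars) {
    errors.push(`input too long: ${inputChars} chars`);
  }
  return { ok: errors.length === 0, errors };
}

// Canary: a random marker planted in the system prompt; seeing it in the
// model's output suggests the system prompt leaked.
function makeCanary() {
  return `canary-${crypto.randomBytes(8).toString("hex")}`;
}

function leaksCanary(output, canary) {
  return String(output).includes(canary);
}
```

Collecting every violation instead of stopping at the first makes the rejection message more useful and gives the quarantine logic a richer signal to count.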
Admin endpoints:
- GET /admin/costs
- POST /admin/costs/reset (owner only)
- PUT /admin/costs/limits (owner only)
- GET /admin/logs (admin/owner; filter by category, level, since, until, q)
- GET /admin/logs/digest (daily medium-severity summaries)
- GET /admin/incidents / GET /admin/incidents/:id / POST /admin/incidents/:id/resolve
- POST /admin/killswitch / POST /admin/killswitch/release (owner only)
Detections:
- API key compromise signals (unusual IP/geography)
- Brute force attacks (10+ failed logins in 5 min)
- Cost anomaly (>3x rolling 7-day average)
- Data exfiltration signal (large responses)
- Service degradation (error rate >10%, latency >5s avg)
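Two of the detections above reduce to simple threshold checks. A minimal in-memory sketch, assuming timestamps in milliseconds and a list of daily cost totals as the 7-day baseline; the class and function names are hypothetical, and a multi-instance deployment would keep this state somewhere shared (for example Redis) rather than in process memory.

```javascript
const WINDOW_MS = 5 * 60 * 1000; // 5 minutes
const BRUTE_FORCE_THRESHOLD = 10; // failed logins per IP within the window

// Sliding window of failed-login timestamps per source IP.
class FailedLoginTracker {
  constructor() {
    this.attempts = new Map(); // ip -> timestamps (ms) inside the window
  }

  // Records one failure; returns true once the IP reaches the threshold.
  record(ip, now = Date.now()) {
    const recent = (this.attempts.get(ip) || []).filter((t) => now - t < WINDOW_MS);
    recent.push(now);
    this.attempts.set(ip, recent);
    return recent.length >= BRUTE_FORCE_THRESHOLD;
  }
}

// Cost anomaly: current cost above 3x the average of the most recent 7 days.
function isCostAnomaly(currentCost, dailyHistory) {
  const baseline = dailyHistory.slice(-7);
  if (baseline.length === 0) return false;
  const avg = baseline.reduce((sum, c) => sum + c, 0) / baseline.length;
  return avg > 0 && currentCost > 3 * avg;
}
```

Pruning old timestamps on every call keeps memory bounded per IP without a separate cleanup timer.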
Automatic responses include:
- key disable + source IP block for compromise
- 24h IP block for brute force (+ CAPTCHA if distributed)
- circuit breaker escalation and model downgrades for cost anomalies
- incident timeline creation for every incident
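The circuit breaker escalation can be sketched as a mapping from a running cost total onto the CIRCUIT_BREAKER_* levels from the environment example. Assumptions: the README does not state the unit or the period the thresholds apply to, nor whether a level triggers at or above its threshold; this sketch uses "at or above" and the example values as defaults, and the function names are hypothetical.

```javascript
// Highest level first, so the first threshold reached wins.
function readThresholds(env = process.env) {
  return [
    ["emergency", Number(env.CIRCUIT_BREAKER_EMERGENCY ?? 1000)],
    ["hard", Number(env.CIRCUIT_BREAKER_HARD ?? 500)],
    ["soft", Number(env.CIRCUIT_BREAKER_SOFT ?? 250)],
    ["warning", Number(env.CIRCUIT_BREAKER_WARNING ?? 100)],
  ];
}

// Returns "ok", "warning", "soft", "hard", or "emergency".
function breakerLevel(total, thresholds = readThresholds()) {
  for (const [level, limit] of thresholds) {
    if (total >= limit) return level;
  }
  return "ok";
}
```

A caller would typically notify CIRCUIT_BREAKER_WEBHOOK / CIRCUIT_BREAKER_EMAIL only when the returned level changes, and apply model downgrades or the kill switch at the higher levels.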
JSON log schema:
- timestamp (ISO 8601)
- level (info|warn|error|critical)
- category (auth|api|model|admin|security)
- event
- metadata (masked identifiers)
- request_id
Sensitive data is not logged (passwords/full tokens/full API keys/prompt content).
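A minimal sketch of a logger that emits the JSON schema above and keeps sensitive fields out. Assumptions: which metadata keys count as sensitive or as identifiers to mask, and the masking rule itself (keep the first and last four characters), are hypothetical choices for illustration.

```javascript
const crypto = require("node:crypto");

const LEVELS = new Set(["info", "warn", "error", "critical"]);
const CATEGORIES = new Set(["auth", "api", "model", "admin", "security"]);
const DROPPED_KEYS = new Set(["password", "token", "api_key", "prompt", "content"]); // never logged
const MASKED_KEYS = new Set(["ip", "key_id", "user_id", "session_id"]); // logged partially

// Keeps the first and last 4 characters of an identifier.
function maskIdentifier(value) {
  const s = String(value);
  return s.length <= 8 ? "****" : `${s.slice(0, 4)}****${s.slice(-4)}`;
}

function buildLogEntry(level, category, event, metadata = {}, requestId = crypto.randomUUID()) {
  if (!LEVELS.has(level)) throw new Error(`unknown level: ${level}`);
  if (!CATEGORIES.has(category)) throw new Error(`unknown category: ${category}`);
  const safe = {};
  for (const [key, value] of Object.entries(metadata)) {
    if (DROPPED_KEYS.has(key)) continue;
    safe[key] = MASKED_KEYS.has(key) ? maskIdentifier(value) : value;
  }
  return JSON.stringify({
    timestamp: new Date().toISOString(), // ISO 8601
    level,
    category,
    event,
    metadata: safe,
    request_id: requestId,
  });
}

console.log(buildLogEntry("warn", "auth", "login_failed", { ip: "203.0.113.10", password: "hunter2" }));
```

Dropping sensitive keys at the point where the entry is built means no downstream sink (file, LOG_FORWARD_URL) ever receives them.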
Environment:
LOG_DIR=/path/to/logs
LOG_FORWARD_URL=https://logs.example.com/ingest
ADMIN_RECOGNIZED_IPS=203.0.113.10,10.8.0.0/24

Critical (immediate):
- 10+ failed logins from same IP in 5 min
- circuit breaker emergency
- unknown API key usage
- admin action from unrecognized IP
High:
- 5+ prompt injection attempts from same key
- cost spike >3x normal hourly
- error rate >10% over 5 min
Medium (daily digest):
- rate limit summary by key/IP
- failed auth summary
- cost summary by model/key (/admin/logs/digest)
Script: scripts/log-rotate.sh
- rotate daily
- compress logs older than 1 day
- retain security logs 365 days
- retain general logs 30 days
LaunchAgent installed:
- ~/Library/LaunchAgents/com.assistant-ops-dashboard.log-rotate.plist
- runs daily at 00:05
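The rotation and retention rules above can be expressed as a small decision function, for example to test the policy independently of the shell script. This is a hypothetical sketch: scripts/log-rotate.sh may apply the rules differently (for instance by file name rather than modification time), and the "security"/"general" category names are assumptions.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;
// Retention in days, per the rules above.
const RETENTION_DAYS = { security: 365, general: 30 };

// Decides what to do with one log file given its last-modified time (ms).
function rotationAction(category, lastModifiedMs, nowMs = Date.now()) {
  const ageDays = (nowMs - lastModifiedMs) / DAY_MS;
  const keepDays = category === "security" ? RETENTION_DAYS.security : RETENTION_DAYS.general;
  if (ageDays > keepDays) return "delete";
  if (ageDays > 1) return "compress"; // older than 1 day
  return "keep";
}
```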
- Admin IP allowlist applies to /admin/* and /api/v1/admin/*
- Webhooks require both a source IP allowlist match and x-webhook-signature HMAC-SHA256 validation
- In production, /docs, /swagger, /api-docs, /debug/*, and /test/* are blocked
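A minimal sketch of the two webhook checks, plus the CIDR matching the admin allowlist also needs, using Node's built-in net.BlockList and crypto.timingSafeEqual. Assumptions: the signature header carries a hex HMAC-SHA256 of the raw request body, optionally prefixed with sha256= (the README does not specify the encoding), and the helper names are hypothetical.

```javascript
const net = require("node:net");
const crypto = require("node:crypto");

// Builds a BlockList from e.g. "203.0.113.10,10.8.0.0/24,2001:db8::/48".
function parseAllowlist(csv) {
  const list = new net.BlockList();
  for (const entry of String(csv || "").split(",").map((s) => s.trim()).filter(Boolean)) {
    const [addr, prefix] = entry.split("/");
    const family = net.isIPv6(addr) ? "ipv6" : "ipv4";
    if (prefix === undefined) list.addAddress(addr, family);
    else list.addSubnet(addr, Number(prefix), family);
  }
  return list;
}

function isAllowed(list, ip) {
  // IPv4 clients often appear as IPv4-mapped IPv6 ("::ffff:10.8.0.5").
  const addr = ip.startsWith("::ffff:") ? ip.slice(7) : ip;
  return list.check(addr, net.isIPv6(addr) ? "ipv6" : "ipv4");
}

// Constant-time comparison of the received signature with the expected HMAC.
function verifyWebhookSignature(rawBody, header, secret) {
  if (typeof header !== "string") return false;
  const received = Buffer.from(header.replace(/^sha256=/, ""), "hex");
  const expected = crypto.createHmac("sha256", secret).update(rawBody).digest();
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return received.length === expected.length && crypto.timingSafeEqual(received, expected);
}

// Both checks must pass, as the README requires.
function acceptWebhook({ ip, rawBody, signature }, allowlist, secret) {
  return isAllowed(allowlist, ip) && verifyWebhookSignature(rawBody, signature, secret);
}
```

The HMAC must be computed over the exact raw bytes received, so the webhook route needs access to the unparsed body rather than re-serialized JSON.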
CORS behavior:
- Returns the matched origin (never *)
- Returns no CORS headers for unmatched origins
- Logs rejected origins as [cors-rejected] ...
- Caches preflight for 24h (Access-Control-Max-Age: 86400)
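A minimal sketch of the CORS behavior above as a pure function that computes response headers. Assumptions: the helper names and the rejection log format beyond the [cors-rejected] prefix are hypothetical, and the sketch adds Vary: Origin, which is standard practice when the allowed origin is echoed back so caches do not serve one origin's headers to another.

```javascript
// Parses ALLOWED_ORIGINS into an exact-match set (no wildcards, no patterns).
function parseOrigins(csv) {
  return new Set(String(csv || "").split(",").map((s) => s.trim()).filter(Boolean));
}

// Computes CORS headers for one request; returns {} when none should be sent.
function corsHeaders(origin, allowed, options = {}) {
  const {
    methods = "GET,POST,OPTIONS",
    headers = "Authorization,Content-Type",
    log = console.warn,
  } = options;
  if (!origin) return {}; // no Origin header: nothing to add
  if (!allowed.has(origin)) {
    log(`[cors-rejected] origin=${origin}`);
    return {}; // unmatched origin: no CORS headers at all
  }
  return {
    "Access-Control-Allow-Origin": origin, // the matched origin, never "*"
    "Access-Control-Allow-Methods": methods,
    "Access-Control-Allow-Headers": headers,
    "Access-Control-Max-Age": "86400", // preflight cacheable for 24h (some browsers cap lower)
    Vary: "Origin",
  };
}
```

For an OPTIONS preflight the server would reply 204 with these headers; for other methods it merges them into the normal response.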
npm install
mkdir -p certs
openssl req -x509 -newkey rsa:2048 -sha256 -days 30 -nodes \
-keyout certs/privkey.pem \
-out certs/fullchain.pem \
-subj "/CN=localhost" \
-addext "subjectAltName=DNS:localhost,IP:127.0.0.1"
HTTPS_PORT=3443 HTTP_PORT=3080 npm start

Then open:
https://localhost:3443/login
Add a screenshot at docs/screenshot.png and it will render automatically.
- Enter owner/repo, click Load failures
- Show that failures are sorted with high severity first
- Set Severity = High to filter
- Expand a Failed log excerpt
- Click Open top 3 high severity
- Click Export summary and show downloaded file
- Toggle Auto-refresh (30s)
Certbot renewal is installed and configured with a LaunchAgent:
- ~/Library/LaunchAgents/com.assistant-ops-dashboard.certbot-renew.plist
- Script: scripts/certbot-renew.sh
- Schedule: daily at 03:17 local time
Dry-run test command:
certbot renew --dry-run \
--config-dir ~/.assistant-ops-dashboard/certbot/config \
--work-dir ~/.assistant-ops-dashboard/certbot/work \
--logs-dir ~/.assistant-ops-dashboard/certbot/logs

API endpoints:
- GET /api/health: OpenClaw status
- POST /api/restart: restart gateway
- GET /api/ci/failures?repo=owner/repo&limit=5: CI failed run summaries
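GET /api/ci/failures is built on gh command output. A minimal sketch of the underlying calls, assuming gh run list with --status failure and gh run view --log-failed; the validation regex, the limit cap of 50, the chosen --json fields, and the helper names are assumptions, not the project's actual code. execFile passes arguments without a shell, so the repo string is never shell-interpreted.

```javascript
const { execFile } = require("node:child_process");
const { promisify } = require("node:util");

const execFileAsync = promisify(execFile);
const REPO_RE = /^[A-Za-z0-9_.-]+\/[A-Za-z0-9_.-]+$/; // owner/repo

// Arguments for listing recent failed runs; limit is clamped to 1..50.
function failedRunsArgs(repo, limit = 5) {
  if (!REPO_RE.test(String(repo))) throw new Error("repo must look like owner/repo");
  const n = Math.min(Math.max(Number.parseInt(limit, 10) || 5, 1), 50);
  return [
    "run", "list", "--repo", repo, "--status", "failure", "--limit", String(n),
    "--json", "databaseId,workflowName,displayTitle,headBranch,createdAt,url",
  ];
}

// Arguments for fetching only the logs of failed steps in one run.
function failedLogArgs(repo, runId) {
  if (!REPO_RE.test(String(repo))) throw new Error("repo must look like owner/repo");
  return ["run", "view", String(runId), "--repo", repo, "--log-failed"];
}

async function listFailedRuns(repo, limit) {
  const { stdout } = await execFileAsync("gh", failedRunsArgs(repo, limit));
  return JSON.parse(stdout); // array of run objects with the --json fields above
}

async function fetchFailedLog(repo, runId) {
  // Failed logs can be large; raise maxBuffer above the 1 MiB default.
  const { stdout } = await execFileAsync("gh", failedLogArgs(repo, runId), { maxBuffer: 16 * 1024 * 1024 });
  return stdout;
}
```

Keeping argument construction in pure functions makes the input validation testable without gh installed; only listFailedRuns and fetchFailedLog need an authenticated gh.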
- CI failed run summarization (workflow + run metadata)
- Failed jobs/steps enrichment
- Log-based failure classification + quick fix hints
- Confidence scoring + severity badges
- Severity filter/sort + quick triage actions
- Screenshot + GIF walkthrough
- PR comment bot mode (post summary to pull requests)
- Slack/Discord notifier for high-severity CI failures
- This dashboard relies on gh command output; make sure you are logged in (gh auth status).
- Failure classification is heuristic-based and intended as triage assistance, not a root-cause guarantee.
