diff --git a/README.md b/README.md index 7130b6e..41a7a55 100644 --- a/README.md +++ b/README.md @@ -1,14 +1,14 @@ # openclaw-superpowers -**53 ready-to-use skills that make your AI agent autonomous, self-healing, and self-improving.** +**60 ready-to-use skills that make your AI agent autonomous, self-healing, and self-improving.** -[![Skills](https://img.shields.io/badge/skills-53-blue)](#skills-included) +[![Skills](https://img.shields.io/badge/skills-60-blue)](#skills-included) [![Security](https://img.shields.io/badge/security_skills-6-green)](#security--guardrails) -[![Cron](https://img.shields.io/badge/cron_scheduled-20-orange)](#openclaw-native-37-skills) -[![Scripts](https://img.shields.io/badge/companion_scripts-37-purple)](#companion-scripts) +[![Cron](https://img.shields.io/badge/cron_scheduled-23-orange)](#openclaw-native-44-skills) +[![Scripts](https://img.shields.io/badge/companion_scripts-44-purple)](#companion-scripts) [![License: MIT](https://img.shields.io/badge/license-MIT-yellow.svg)](LICENSE) -A plug-and-play skill library for [OpenClaw](https://github.com/openclaw/openclaw) — the open-source AI agent runtime. Gives your agent structured thinking, security guardrails, persistent memory, cron scheduling, runtime verification, self-recovery, and the ability to write its own new skills during conversation. +A plug-and-play skill library for [OpenClaw](https://github.com/openclaw/openclaw) — the open-source AI agent runtime. Gives your agent structured thinking, security guardrails, persistent memory, cron scheduling, deployment preflight, runtime verification, auth lifecycle tracking, self-recovery, and the ability to write its own new skills during conversation. Built for developers who want their AI agent to run autonomously 24/7, not just respond to prompts in a chat window. 
@@ -20,13 +20,18 @@ Built for developers who want their AI agent to run autonomously 24/7, not just Most AI agent frameworks give you a chatbot that forgets everything between sessions. OpenClaw is different — it runs persistently, handles multi-hour tasks, and has native cron scheduling. But out of the box, it doesn't know *how* to use those capabilities well. -**openclaw-superpowers bridges that gap.** Install 53 skills in one command, and your agent immediately knows how to: +**openclaw-superpowers bridges that gap.** Install 60 skills in one command, and your agent immediately knows how to: - **Think before it acts** — brainstorming, planning, and systematic debugging skills prevent the "dive in and break things" failure mode - **Protect itself** — 6 security skills detect prompt injection, block dangerous actions, audit installed code, and scan for leaked credentials -- **Run unattended** — 20 cron-scheduled skills handle memory cleanup, health checks, budget tracking, and community monitoring while you sleep +- **Run unattended** — 23 cron-scheduled skills handle memory cleanup, health checks, budget tracking, and community monitoring while you sleep +- **Keep MCP auth alive** — auth lifecycle tracking catches missing env vars, expiring tokens, and undefined refresh paths before a healthy server turns unusable +- **Prove delivery** — cron execution proofs distinguish "the job fired" from "the user actually got the output" +- **Scale delegation safely** — subagent capability auditing catches missing spawn tools, unsafe depth settings, and bloated fleet definitions before they burn time and tokens +- **Roll back cleanly** — upgrade rollback snapshots preserve configs and restore instructions before runtime changes become irreversible +- **Deploy safely** — deployment preflight catches missing mounts, missing bootstrap files, and public gateway exposure before the runtime starts drifting - **Verify itself** — runtime verification catches missing cron
registrations, stale state, dependency drift, and install layout mistakes before they silently break automation -- **Recover from failures** — self-recovery, loop-breaking, and task handoff skills keep long-running work alive across crashes and restarts +- **Recover from failures** — self-recovery, loop-breaking, task handoff, and reset recovery keep long-running work alive across crashes and routine session resets - **Never forget** — DAG-based memory compaction, integrity checking, context scoring, and SQLite session persistence ensure the agent preserves critical information even in month-long conversations - **Improve itself** — the agent can write new skills during normal conversation using `create-skill`, encoding your preferences as permanent behaviors @@ -82,7 +87,7 @@ Methodology skills that work in any AI agent runtime. Adapted from [obra/superpo | `skill-conflict-detector` | Detects name shadowing and description-overlap conflicts between installed skills | `detect.py` | | `skill-portability-checker` | Validates OS/binary dependencies in companion scripts; catches non-portable calls | `check.py` | -### OpenClaw-Native (37 skills) +### OpenClaw-Native (44 skills) Skills that require OpenClaw's persistent runtime — cron scheduling, session state, or long-running execution. These are the skills that make a 24/7 autonomous agent actually work reliably. 
@@ -108,6 +113,12 @@ Skills that require OpenClaw's persistent runtime — cron scheduling, session s | `channel-context-bridge` | Writes a context card at session end for seamless channel switching | — | `bridge.py` | | `skill-doctor` | Diagnoses silent skill discovery failures — YAML errors, path violations, schema mismatches | — | `doctor.py` | | `installed-skill-auditor` | Weekly post-install audit of all skills for injection, credentials, and drift | Mondays 9am | `audit.py` | +| `deployment-preflight` | Validates deployment safety before install, upgrade, or unattended use — workspace visibility, persistent mounts, gateway exposure, and runtime paths | — | `check.py` | +| `session-reset-recovery` | Checkpoints active work before the overnight reset window and restores a concise resume brief after restart | daily 3:45am | `recover.py` | +| `cron-execution-prover` | Wraps scheduled workflows with proof records — start, finish, evidence, and stale-run detection | — | `prove.py` | +| `message-delivery-verifier` | Tracks outbound notification delivery across channels so sent, acknowledged, failed, and stale messages are explicit | every 15 min | `verify.py` | +| `subagent-capability-auditor` | Audits subagent configuration for spawn depth, tool exposure, and fleet shape before multi-agent work begins | — | `audit.py` | +| `upgrade-rollback-manager` | Snapshots config and state before upgrades and writes rollback instructions tied to the previous runtime version | — | `manage.py` | | `skill-loadout-manager` | Named skill profiles to manage active skill sets and prevent system prompt bloat | — | `loadout.py` | | `skill-compatibility-checker` | Checks installed skills against the current OpenClaw version for feature compatibility | — | `check.py` | | `runtime-verification-dashboard` | Verifies cron registration, state freshness, install layout, and dependency readiness across the live runtime; can dry-run or apply safe remediations | every 6h | `check.py` | @@ -117,6 
+128,7 @@ Skills that require OpenClaw's persistent runtime — cron scheduling, session s | `config-encryption-auditor` | Scans config directories for plaintext API keys, tokens, and world-readable permissions | Sundays 9am | `audit.py` | | `tool-description-optimizer` | Scores skill descriptions for trigger quality — clarity, specificity, keyword density — and suggests rewrites | — | `optimize.py` | | `mcp-health-checker` | Monitors MCP server connections for health, latency, and availability; detects stale connections | every 6h | `check.py` | +| `mcp-auth-lifecycle-manager` | Tracks MCP auth expiry, missing env vars, refresh commands, and interactive-login risk before credentials silently age out | every 6h | `manage.py` | | `memory-dag-compactor` | Builds hierarchical summary DAGs from MEMORY.md with depth-aware prompts (d0 leaf → d3+ durable) | daily 11pm | `compact.py` | | `large-file-interceptor` | Detects oversized files, generates structural exploration summaries, stores compact references | — | `intercept.py` | | `context-assembly-scorer` | Scores how well current context represents full conversation; detects blind spots | every 4h | `score.py` | @@ -155,12 +167,15 @@ Six skills form a defense-in-depth security layer for autonomous agents: | Feature | openclaw-superpowers | obra/superpowers | Custom prompts | |---|---|---|---| -| Skills included | **53** | 8 | 0 | +| Skills included | **60** | 8 | 0 | | Self-modifying (agent writes new skills) | Yes | No | No | -| Cron scheduling | **20 scheduled skills** | No | No | +| Cron scheduling | **23 scheduled skills** | No | No | | Persistent state across sessions | **YAML state schemas** | No | No | | Security guardrails | **6 defense-in-depth skills** | No | No | -| Companion scripts with CLI | **37 scripts** | No | No | +| Companion scripts with CLI | **44 scripts** | No | No | +| MCP auth lifecycle tracking | Yes | No | No | +| Upgrade rollback planning | Yes | No | No | +| Deployment preflight / Docker 
safety | Yes | No | No | | Memory graph / knowledge graph | Yes | No | No | | SQLite session persistence + FTS5 search | Yes | No | No | | Sub-agent recall with token-budgeted grants | Yes | No | No | @@ -185,7 +200,7 @@ Six skills form a defense-in-depth security layer for autonomous agents: │ │ │ ├── SKILL.md │ │ │ └── TEMPLATE.md │ │ └── ... -│ ├── openclaw-native/ # 37 persistent-runtime skills +│ ├── openclaw-native/ # 44 persistent-runtime skills │ │ ├── memory-graph-builder/ │ │ │ ├── SKILL.md # Skill definition + YAML frontmatter │ │ │ ├── STATE_SCHEMA.yaml # State shape (committed, versioned) @@ -208,7 +223,7 @@ Six skills form a defense-in-depth security layer for autonomous agents: Skills marked with a script ship a small executable alongside their `SKILL.md`: -- **36 Python scripts** (`run.py`, `audit.py`, `check.py`, `guard.py`, `bridge.py`, `onboard.py`, `sync.py`, `doctor.py`, `loadout.py`, `governor.py`, `detect.py`, `test.py`, `radar.py`, `graph.py`, `optimize.py`, `compact.py`, `intercept.py`, `score.py`, `integrity.py`, `persist.py`, `recall.py`) — run directly to manipulate state, generate reports, or trigger actions. Install `PyYAML` for any helper that reads or writes skill state. +- **43 Python scripts** (`audit.py`, `bridge.py`, `check.py`, `compact.py`, `detect.py`, `doctor.py`, `governor.py`, `graph.py`, `guard.py`, `integrity.py`, `intercept.py`, `loadout.py`, `manage.py`, `onboard.py`, `optimize.py`, `persist.py`, `prove.py`, `radar.py`, `recall.py`, `recover.py`, `run.py`, `score.py`, `sync.py`, `test.py`, `verify.py`) — run directly to manipulate state, generate reports, or trigger actions. Install `PyYAML` for any helper that reads or writes skill state. - **`vet.sh`** — Pure bash scanner; runs on any system with grep. - Every script supports `--help` and `--format json`. Dry-run mode available on scripts that make changes. 
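The shared CLI contract above can be sketched in a few lines. This is an illustrative example of the convention, not code taken from any one script in the repo; `build_parser` and `emit` are hypothetical names:

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    # Flags every companion script in this repo is documented to support.
    parser = argparse.ArgumentParser(description="Example companion script")
    parser.add_argument("--format", choices=["human", "json"], default="human")
    parser.add_argument("--dry-run", action="store_true",
                        help="Report what would change without writing state")
    return parser


def emit(report: dict, fmt: str) -> str:
    # JSON for machine consumers, a short human summary otherwise.
    if fmt == "json":
        return json.dumps(report, indent=2)
    return f"{report['checked']} items checked | {report['failed']} failed"


args = build_parser().parse_args(["--format", "json"])
print(emit({"checked": 3, "failed": 0}, args.format))
```

Piping the JSON form into `jq` or a monitoring hook is the intended use of `--format json`; the human form is what the agent shows in conversation.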
- See the `example-state.yaml` in each skill directory for sample state and a commented walkthrough of cron behaviour. @@ -220,9 +235,30 @@ Skills marked with a script ship a small executable alongside their `SKILL.md`: **Solo developer with a persistent AI agent** > Install superpowers, and your agent handles memory cleanup, security audits, and daily briefings on autopilot. You focus on building; the agent maintains itself. +**Anyone bitten by the overnight reset** +> Use `session-reset-recovery` to checkpoint active work before the routine reset window and recover with a concise "here is what changed, here is what to do next" brief after restart. + +**Teams depending on scheduled delivery** +> Use `cron-execution-prover` around cron workflows that write files or send notifications, so "started" and "delivered" are no longer treated as the same thing. + +**Anyone shipping notifications to real humans** +> Use `message-delivery-verifier` for the last mile. It tells you whether a Telegram or Slack-style notification was only queued, actually sent, acknowledged, failed, or left stale. + +**Anyone moving from one agent to a fleet** +> Run `subagent-capability-auditor` before trusting subagents in production. It catches missing spawn capability, risky delegation depth, and flat fleets that will be painful to operate. + +**Anyone upgrading frequently** +> Use `upgrade-rollback-manager` before changing the runtime version so you have preserved config, a version fingerprint, and a rollback plan if the new release behaves badly. + +**Anyone relying on MCP servers with expiring auth** +> Use `mcp-auth-lifecycle-manager` to record expiry windows, refresh commands, and headless-login risk so MCP tools do not fail halfway through unattended work. + **Team running multiple OpenClaw agents** > Use `multi-agent-coordinator` for fleet health checks, `skill-loadout-manager` to keep system prompts lean per agent role, and `heartbeat-governor` to prevent runaway cron costs. 
+**Self-hosted or Docker deployment** +> Run `deployment-preflight` before the first rollout or after compose changes to catch missing mounts, missing bootstrap files, and public gateway exposure. Follow it with `runtime-verification-dashboard` once the runtime is live. + **Open-source maintainer** > `community-skill-radar` scans Reddit for pain points automatically. `skill-vetting` catches malicious community contributions before they're installed. `installed-skill-auditor` detects post-install tampering. diff --git a/skills/openclaw-native/cron-execution-prover/SKILL.md b/skills/openclaw-native/cron-execution-prover/SKILL.md new file mode 100644 index 0000000..da203f3 --- /dev/null +++ b/skills/openclaw-native/cron-execution-prover/SKILL.md @@ -0,0 +1,66 @@ +--- +name: cron-execution-prover +version: "1.0" +category: openclaw-native +description: Wraps scheduled workflows with proof records — start, finish, evidence, and stale-run detection — so cron jobs can be trusted instead of merely assumed. +stateful: true +--- + +# Cron Execution Prover + +## What it does + +Cron jobs often "kind of run": the task starts, some work happens, but the final side effect never lands. Cron Execution Prover gives scheduled workflows a durable proof trail so you can tell the difference between a completed run and an abandoned one. + +## When to invoke + +- Around any cron-driven workflow that writes files, sends messages, or produces deliverables +- When scheduled jobs seem to run but users still do not receive the expected output +- When debugging stuck or half-finished cron chains + +## Proof model + +Every cron run gets a ledger entry: + +- `expected_at` +- `started_at` +- `finished_at` +- `status` +- `evidence` +- `notes` + +If a run never finishes, the prover can surface it as stale. 
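The stale rule can be sketched in a few lines. This mirrors the threshold logic in `prove.py` (60 minutes is the script's default); `is_stale` is an illustrative helper, not part of the shipped CLI:

```python
from datetime import datetime, timedelta

STALE_AFTER_MINUTES = 60  # default threshold used by prove.py


def is_stale(entry: dict, now: datetime) -> bool:
    # A run is stale when it neither succeeded nor failed and its
    # start (or expected) timestamp is older than the threshold.
    if entry.get("status") in {"succeeded", "failed"}:
        return False
    ts = entry.get("started_at") or entry.get("expected_at")
    if not ts:
        return False
    return now - datetime.fromisoformat(ts) > timedelta(minutes=STALE_AFTER_MINUTES)


run = {
    "skill": "community-skill-radar",
    "run_id": "csr-20260330-0900",
    "started_at": "2026-03-30T09:00:10",
    "status": "in_progress",
}
print(is_stale(run, datetime(2026, 3, 30, 10, 35)))  # started ~95 minutes ago
```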
+ +## How to use + +```bash +python3 prove.py --expect morning-briefing --expected-at "2026-03-30T07:00:00" +python3 prove.py --start morning-briefing --run-id mb-20260330-0700 +python3 prove.py --finish morning-briefing --run-id mb-20260330-0700 --evidence "telegram:msg-8812" +python3 prove.py --fail morning-briefing --run-id mb-20260330-0700 --notes "Telegram send timed out" +python3 prove.py --stale +python3 prove.py --report +python3 prove.py --report --format json +``` + +## Operating rule + +For important cron workflows: + +1. Create or infer an expected run record +2. Mark the run started +3. Record proof of side effects on completion +4. Mark failures explicitly +5. Review stale runs before assuming the schedule is healthy + +## Difference from runtime-verification-dashboard + +`runtime-verification-dashboard` verifies whether cron skills are registered and whether state is fresh. + +`cron-execution-prover` verifies whether specific scheduled runs actually produced the expected outcome. + +## State + +State file: `~/.openclaw/skill-state/cron-execution-prover/state.yaml` + +Fields: `runs`, `stale_runs`, `last_report_at`, `report_history`. diff --git a/skills/openclaw-native/cron-execution-prover/STATE_SCHEMA.yaml b/skills/openclaw-native/cron-execution-prover/STATE_SCHEMA.yaml new file mode 100644 index 0000000..a3e8456 --- /dev/null +++ b/skills/openclaw-native/cron-execution-prover/STATE_SCHEMA.yaml @@ -0,0 +1,31 @@ +version: "1.0" +description: Cron proof ledger, stale-run detection, and reporting history.
+fields: + runs: + type: list + items: + skill: { type: string } + run_id: { type: string } + expected_at: { type: datetime } + started_at: { type: datetime } + finished_at: { type: datetime } + status: { type: enum, values: [expected, in_progress, succeeded, failed, stale] } + evidence: { type: list, items: { type: string } } + notes: { type: string } + stale_runs: + type: list + items: + skill: { type: string } + run_id: { type: string } + expected_at: { type: datetime } + age_minutes: { type: integer } + last_report_at: + type: datetime + report_history: + type: list + description: Rolling history of report summaries (last 12) + items: + reported_at: { type: datetime } + total_runs: { type: integer } + stale_run_count: { type: integer } + failed_run_count: { type: integer } diff --git a/skills/openclaw-native/cron-execution-prover/example-state.yaml b/skills/openclaw-native/cron-execution-prover/example-state.yaml new file mode 100644 index 0000000..12151c7 --- /dev/null +++ b/skills/openclaw-native/cron-execution-prover/example-state.yaml @@ -0,0 +1,41 @@ +# Example runtime state for cron-execution-prover +runs: + - skill: "morning-briefing" + run_id: "mb-20260330-0700" + expected_at: "2026-03-30T07:00:00" + started_at: "2026-03-30T07:00:02" + finished_at: "2026-03-30T07:00:18" + status: succeeded + evidence: + - "telegram:msg-8812" + - "state:morning-briefing:last_briefing_date=2026-03-30" + notes: "Delivered successfully." + - skill: "community-skill-radar" + run_id: "csr-20260330-0900" + expected_at: "2026-03-30T09:00:00" + started_at: "2026-03-30T09:00:10" + finished_at: "" + status: stale + evidence: + - "reddit-fetch-started" + notes: "Started, but PROPOSALS.md was never written." 
+stale_runs: + - skill: "community-skill-radar" + run_id: "csr-20260330-0900" + expected_at: "2026-03-30T09:00:00" + age_minutes: 94 +last_report_at: "2026-03-30T10:34:00" +report_history: + - reported_at: "2026-03-30T10:34:00" + total_runs: 2 + stale_run_count: 1 + failed_run_count: 0 +# ── Walkthrough ────────────────────────────────────────────────────────────── +# python3 prove.py --stale +# +# Cron Execution Prover +# ─────────────────────────────────────────────────────── +# 2 tracked runs | 1 stale | 0 failed +# +# STALE community-skill-radar (csr-20260330-0900) +# Started, but PROPOSALS.md was never written. diff --git a/skills/openclaw-native/cron-execution-prover/prove.py b/skills/openclaw-native/cron-execution-prover/prove.py new file mode 100755 index 0000000..40df24b --- /dev/null +++ b/skills/openclaw-native/cron-execution-prover/prove.py @@ -0,0 +1,226 @@ +#!/usr/bin/env python3 +from __future__ import annotations + +""" +Cron Execution Prover for openclaw-superpowers. + +Maintains a proof ledger around cron-driven workflows so start/finish/failure +and evidence are explicit. 
+""" + +import argparse +import json +import os +from datetime import datetime +from pathlib import Path + +try: + import yaml + HAS_YAML = True +except ImportError: + HAS_YAML = False + +OPENCLAW_DIR = Path(os.environ.get("OPENCLAW_HOME", Path.home() / ".openclaw")) +STATE_FILE = OPENCLAW_DIR / "skill-state" / "cron-execution-prover" / "state.yaml" +MAX_RUNS = 100 +MAX_HISTORY = 12 +STALE_AFTER_MINUTES = 60 + + +def default_state() -> dict: + return { + "runs": [], + "stale_runs": [], + "last_report_at": "", + "report_history": [], + } + + +def load_state() -> dict: + if not STATE_FILE.exists(): + return default_state() + try: + text = STATE_FILE.read_text() + if HAS_YAML: + return yaml.safe_load(text) or default_state() + return json.loads(text) + except Exception: + return default_state() + + +def save_state(state: dict) -> None: + STATE_FILE.parent.mkdir(parents=True, exist_ok=True) + if HAS_YAML: + with open(STATE_FILE, "w") as handle: + yaml.dump(state, handle, default_flow_style=False, allow_unicode=True, sort_keys=False) + else: + STATE_FILE.write_text(json.dumps(state, indent=2)) + + +def now_iso() -> str: + return datetime.now().isoformat(timespec="seconds") + + +def find_run(state: dict, skill: str, run_id: str) -> dict | None: + for item in state.get("runs", []): + if item.get("skill") == skill and item.get("run_id") == run_id: + return item + return None + + +def ensure_run(state: dict, skill: str, run_id: str) -> dict: + existing = find_run(state, skill, run_id) + if existing: + return existing + entry = { + "skill": skill, + "run_id": run_id, + "expected_at": "", + "started_at": "", + "finished_at": "", + "status": "expected", + "evidence": [], + "notes": "", + } + state["runs"] = [entry] + (state.get("runs") or []) + state["runs"] = state["runs"][:MAX_RUNS] + return entry + + +def refresh_stale(state: dict) -> None: + stale = [] + current = datetime.now() + for item in state.get("runs", []): + if item.get("status") in {"succeeded", "failed"}: + 
continue + ts = item.get("started_at") or item.get("expected_at") + if not ts: + continue + try: + age = int((current - datetime.fromisoformat(ts)).total_seconds() / 60) + except ValueError: + continue + if age > STALE_AFTER_MINUTES: + item["status"] = "stale" + stale.append( + { + "skill": item.get("skill", ""), + "run_id": item.get("run_id", ""), + "expected_at": item.get("expected_at", ""), + "age_minutes": age, + } + ) + state["stale_runs"] = stale + + +def record_history(state: dict) -> None: + refresh_stale(state) + now = now_iso() + history = state.get("report_history") or [] + history.insert( + 0, + { + "reported_at": now, + "total_runs": len(state.get("runs", [])), + "stale_run_count": len(state.get("stale_runs", [])), + "failed_run_count": sum(1 for item in state.get("runs", []) if item.get("status") == "failed"), + }, + ) + state["last_report_at"] = now + state["report_history"] = history[:MAX_HISTORY] + + +def print_report(state: dict, stale_only: bool = False) -> None: + refresh_stale(state) + runs = state.get("runs", []) + stale_runs = state.get("stale_runs", []) + failed_count = sum(1 for item in runs if item.get("status") == "failed") + print("\nCron Execution Prover") + print("───────────────────────────────────────────────────────") + print(f" {len(runs)} tracked runs | {len(stale_runs)} stale | {failed_count} failed") + if stale_only: + if not stale_runs: + print("\n No stale runs.") + return + print() + for item in stale_runs: + print(f" STALE {item['skill']} ({item['run_id']})") + return + if not runs: + print("\n No runs recorded.") + return + print() + for item in runs[:10]: + print(f" {item.get('status', '').upper():10} {item.get('skill', '')} ({item.get('run_id', '')})") + if item.get("notes"): + print(f" {item['notes']}") + + +def main() -> None: + parser = argparse.ArgumentParser(description="Proof ledger for cron-driven workflows") + group = parser.add_mutually_exclusive_group(required=True) + group.add_argument("--expect", 
metavar="SKILL", help="Record an expected run") + group.add_argument("--start", metavar="SKILL", help="Mark a run started") + group.add_argument("--finish", metavar="SKILL", help="Mark a run finished") + group.add_argument("--fail", metavar="SKILL", help="Mark a run failed") + group.add_argument("--stale", action="store_true", help="Show stale runs") + group.add_argument("--report", action="store_true", help="Show run report") + parser.add_argument("--run-id", help="Unique run identifier") + parser.add_argument("--expected-at", help="Expected run time in ISO format") + parser.add_argument("--evidence", nargs="*", default=[], help="Proof artifacts or side effects") + parser.add_argument("--notes", default="", help="Extra notes") + parser.add_argument("--format", choices=["human", "json"], default="human") + args = parser.parse_args() + + state = load_state() + if args.expect: + run_id = args.run_id or f"{args.expect}-{datetime.now().strftime('%Y%m%d%H%M%S')}" + run = ensure_run(state, args.expect, run_id) + run["expected_at"] = args.expected_at or now_iso() + run["notes"] = args.notes or run.get("notes", "") + save_state(state) + elif args.start: + if not args.run_id: + raise SystemExit("--run-id is required for --start") + run = ensure_run(state, args.start, args.run_id) + run["started_at"] = now_iso() + run["status"] = "in_progress" + if args.notes: + run["notes"] = args.notes + save_state(state) + elif args.finish: + if not args.run_id: + raise SystemExit("--run-id is required for --finish") + run = ensure_run(state, args.finish, args.run_id) + run["finished_at"] = now_iso() + run["status"] = "succeeded" + run["evidence"] = args.evidence or run.get("evidence", []) + if args.notes: + run["notes"] = args.notes + save_state(state) + elif args.fail: + if not args.run_id: + raise SystemExit("--run-id is required for --fail") + run = ensure_run(state, args.fail, args.run_id) + run["finished_at"] = now_iso() + run["status"] = "failed" + run["evidence"] = args.evidence 
or run.get("evidence", []) + run["notes"] = args.notes or run.get("notes", "") + save_state(state) + + if args.report or args.stale: + record_history(state) + save_state(state) + + refresh_stale(state) + if args.format == "json": + print(json.dumps(state, indent=2)) + return + if args.stale: + print_report(state, stale_only=True) + else: + print_report(state) + + +if __name__ == "__main__": + main() diff --git a/skills/openclaw-native/deployment-preflight/SKILL.md b/skills/openclaw-native/deployment-preflight/SKILL.md new file mode 100644 index 0000000..044372f --- /dev/null +++ b/skills/openclaw-native/deployment-preflight/SKILL.md @@ -0,0 +1,70 @@ +--- +name: deployment-preflight +version: "1.0" +category: openclaw-native +description: Validates OpenClaw deployment safety before install, upgrade, or unattended use — checks workspace visibility, persistent mounts, gateway exposure, and critical runtime paths. +stateful: true +--- + +# Deployment Preflight + +## What it does + +OpenClaw often fails in boring ways before the agent does anything wrong: the workspace is not mounted, `.openclaw` is ephemeral, the gateway is publicly exposed, or the extension install path points somewhere unexpected. + +Deployment Preflight checks the environment before you trust it with unattended work. 
+ +## When to invoke + +- Before first-time OpenClaw setup +- Before or after container / compose changes +- Before enabling cron-heavy autonomous workflows +- After upgrades, migrations, or moving the runtime to a new machine + +## What it checks + +| Check | Why it matters | +|---|---| +| OpenClaw home | Missing or unwritable runtime directories break stateful skills immediately | +| Workspace bootstrap | If `AGENTS.md`, `SOUL.md`, or `MEMORY.md` are absent, the agent starts half-configured | +| Superpowers install path | Detects missing or non-standard extension wiring before skills silently disappear | +| Compose / Docker persistence | Flags deployments that do not persist `.openclaw` or workspace data | +| Gateway exposure | Warns when common OpenClaw ports are published publicly or `network_mode: host` is used | +| Tooling readiness | Confirms `openclaw`, `docker`, and `PyYAML` are present when the deployment depends on them | + +## How to use + +```bash +python3 check.py --check +python3 check.py --check --path /srv/openclaw +python3 check.py --status +python3 check.py --findings +python3 check.py --format json +``` + +## Procedure + +1. Point the checker at the deployment root if your compose files live outside the current directory. +2. Run `python3 check.py --check`. +3. Fix all FAIL items before deploying. +4. Review WARN items before enabling unattended cron jobs. +5. Save the last known-good output in state so later drift is obvious. + +## Output levels + +- **PASS** — the preflight area looks healthy +- **WARN** — the deployment may work, but there is drift or risk +- **FAIL** — fix this before trusting the runtime + +## Scope + +This skill is for deployment wiring, not live runtime behaviour. + +- Use `runtime-verification-dashboard` after install to verify cron registration, stale state, and live runtime health. +- Use `deployment-preflight` before install or after infrastructure changes to catch environment mistakes early. 
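The gateway-exposure check can be approximated in a few lines. This simplified sketch uses the same port list the shipped checker scans for (18789, 3000, 8080) but drops the surrounding context matching; `gateway_exposed` is an illustrative helper, not the script's API:

```python
import re

# Simplified versions of the patterns check.py compiles.
PUBLIC_PORT_RE = re.compile(r'(?:0\.0\.0\.0:)?(?:18789|3000|8080):(?:18789|3000|8080)')
LOOPBACK_PORT_RE = re.compile(r'127\.0\.0\.1:(?:18789|3000|8080):(?:18789|3000|8080)')


def gateway_exposed(compose_text: str) -> bool:
    # A published port counts as exposed unless it is bound to loopback;
    # host networking is always treated as exposure.
    for line in compose_text.splitlines():
        if LOOPBACK_PORT_RE.search(line):
            continue
        if PUBLIC_PORT_RE.search(line) or "network_mode: host" in line:
            return True
    return False


public = 'ports:\n  - "18789:18789"'
private = 'ports:\n  - "127.0.0.1:18789:18789"'
print(gateway_exposed(public), gateway_exposed(private))
```

Binding the published port to `127.0.0.1` (and fronting it with a reverse proxy if remote access is needed) is the usual way to clear this warning in a compose file.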
+ +## State + +State file: `~/.openclaw/skill-state/deployment-preflight/state.yaml` + +Fields: `last_check_at`, `deployment_root`, `deployment_mode`, `environment`, `findings`, `check_history`. diff --git a/skills/openclaw-native/deployment-preflight/STATE_SCHEMA.yaml b/skills/openclaw-native/deployment-preflight/STATE_SCHEMA.yaml new file mode 100644 index 0000000..9412eb8 --- /dev/null +++ b/skills/openclaw-native/deployment-preflight/STATE_SCHEMA.yaml @@ -0,0 +1,41 @@ +version: "1.0" +description: Deployment preflight results, environment summary, and rolling history. +fields: + last_check_at: + type: datetime + deployment_root: + type: string + deployment_mode: + type: enum + values: [local, docker-compose, dockerfile, unknown] + environment: + type: object + items: + openclaw_home: { type: string } + workspace_dir: { type: string } + superpowers_path: { type: string } + openclaw_cli_found: { type: boolean } + pyyaml_found: { type: boolean } + docker_found: { type: boolean } + docker_compose_found: { type: boolean } + files_checked: { type: integer, default: 0 } + findings: + type: list + items: + severity: { type: enum, values: [FAIL, WARN, INFO] } + check: { type: string } + detail: { type: string } + suggestion: { type: string } + file_path: { type: string } + detected_at: { type: datetime } + resolved: { type: boolean } + check_history: + type: list + description: Rolling preflight summaries (last 12) + items: + checked_at: { type: datetime } + deployment_mode: { type: string } + fail_count: { type: integer } + warn_count: { type: integer } + info_count: { type: integer } + files_checked: { type: integer } diff --git a/skills/openclaw-native/deployment-preflight/check.py b/skills/openclaw-native/deployment-preflight/check.py new file mode 100755 index 0000000..d45c7da --- /dev/null +++ b/skills/openclaw-native/deployment-preflight/check.py @@ -0,0 +1,501 @@ +#!/usr/bin/env python3 +from __future__ import annotations + +""" +Deployment Preflight for 
openclaw-superpowers. + +Checks OpenClaw deployment safety before install, upgrade, or unattended use. +It focuses on runtime paths, workspace visibility, Docker/compose persistence, +and obvious gateway exposure mistakes. + +Usage: + python3 check.py --check + python3 check.py --check --path /srv/openclaw + python3 check.py --status + python3 check.py --findings + python3 check.py --format json +""" + +import argparse +import json +import os +import re +import shutil +import subprocess +import sys +from datetime import datetime +from pathlib import Path + +try: + import yaml + HAS_YAML = True +except ImportError: + HAS_YAML = False + +OPENCLAW_DIR = Path(os.environ.get("OPENCLAW_HOME", Path.home() / ".openclaw")) +STATE_FILE = OPENCLAW_DIR / "skill-state" / "deployment-preflight" / "state.yaml" +WORKSPACE_DIR = Path(os.environ.get("OPENCLAW_WORKSPACE", OPENCLAW_DIR / "workspace")) +SUPERPOWERS_PATH = OPENCLAW_DIR / "extensions" / "superpowers" +MAX_HISTORY = 12 +COMPOSE_NAMES = ( + "compose.yml", + "compose.yaml", + "docker-compose.yml", + "docker-compose.yaml", +) +CONFIG_EXTENSIONS = {".json", ".yaml", ".yml", ".toml", ".conf", ".ini"} +PUBLIC_PORT_RE = re.compile(r'(^|["\'\s-])((?:0\.0\.0\.0:)?(?:18789|3000|8080):(?:18789|3000|8080))') +LOOPBACK_PORT_RE = re.compile(r'127\.0\.0\.1:(?:18789|3000|8080):(?:18789|3000|8080)') +PUBLIC_BIND_RE = re.compile(r'0\.0\.0\.0|ws://0\.0\.0\.0|http://0\.0\.0\.0') + + +def default_state() -> dict: + return { + "last_check_at": "", + "deployment_root": "", + "deployment_mode": "unknown", + "environment": {}, + "findings": [], + "check_history": [], + } + + +def load_state() -> dict: + if not STATE_FILE.exists(): + return default_state() + try: + text = STATE_FILE.read_text() + if HAS_YAML: + return yaml.safe_load(text) or default_state() + return json.loads(text) + except Exception: + return default_state() + + +def save_state(state: dict) -> None: + STATE_FILE.parent.mkdir(parents=True, exist_ok=True) + if HAS_YAML: + with 
open(STATE_FILE, "w") as handle: + yaml.dump(state, handle, default_flow_style=False, allow_unicode=True, sort_keys=False) + else: + STATE_FILE.write_text(json.dumps(state, indent=2)) + + +def finding(severity: str, check: str, detail: str, suggestion: str, file_path: Path | str = "") -> dict: + return { + "severity": severity, + "check": check, + "detail": detail, + "suggestion": suggestion, + "file_path": str(file_path), + "detected_at": datetime.now().isoformat(), + "resolved": False, + } + + +def docker_compose_available() -> bool: + if shutil.which("docker") is None: + return False + try: + proc = subprocess.run( + ["docker", "compose", "version"], + capture_output=True, + text=True, + timeout=5, + check=False, + ) + except Exception: + return False + return proc.returncode == 0 + + +def detect_deployment_files(root: Path) -> tuple[str, list[Path]]: + if not root.exists() or not root.is_dir(): + return "unknown", [] + + compose_files = [root / name for name in COMPOSE_NAMES if (root / name).exists()] + if compose_files: + return "docker-compose", compose_files + + dockerfiles = [path for path in root.iterdir() if path.is_file() and path.name.startswith("Dockerfile")] + if dockerfiles: + return "dockerfile", dockerfiles + + return "local", [] + + +def check_runtime_paths() -> list[dict]: + items = [] + + if not OPENCLAW_DIR.exists(): + items.append( + finding( + "FAIL", + "OPENCLAW_HOME_MISSING", + f"OpenClaw home does not exist: {OPENCLAW_DIR}", + "Create or mount OPENCLAW_HOME before deployment.", + OPENCLAW_DIR, + ) + ) + return items + + if not os.access(str(OPENCLAW_DIR), os.W_OK): + items.append( + finding( + "FAIL", + "OPENCLAW_HOME_UNWRITABLE", + f"OpenClaw home is not writable: {OPENCLAW_DIR}", + "Fix filesystem permissions or container user mapping.", + OPENCLAW_DIR, + ) + ) + + state_dir = OPENCLAW_DIR / "skill-state" + if not state_dir.exists(): + items.append( + finding( + "WARN", + "STATE_DIR_MISSING", + f"Skill state directory does not exist yet: 
{state_dir}", + "Run ./install.sh or create the directory before enabling stateful skills.", + state_dir, + ) + ) + + if not WORKSPACE_DIR.exists(): + items.append( + finding( + "WARN", + "WORKSPACE_MISSING", + f"Workspace directory does not exist: {WORKSPACE_DIR}", + "Mount or create the workspace before trusting project memory or identity files.", + WORKSPACE_DIR, + ) + ) + else: + required = ["AGENTS.md", "SOUL.md", "MEMORY.md"] + present = [name for name in required if (WORKSPACE_DIR / name).exists()] + missing = [name for name in required if not (WORKSPACE_DIR / name).exists()] + if not present: + items.append( + finding( + "WARN", + "WORKSPACE_BOOTSTRAP_MISSING", + "Workspace exists but none of AGENTS.md, SOUL.md, or MEMORY.md were found.", + "Add the bootstrap files the agent depends on before unattended use.", + WORKSPACE_DIR, + ) + ) + elif missing: + items.append( + finding( + "WARN", + "WORKSPACE_BOOTSTRAP_PARTIAL", + f"Workspace exists but {', '.join(missing)} {'is' if len(missing) == 1 else 'are'} missing.", + "Add the missing bootstrap files before relying on persistent context.", + WORKSPACE_DIR, + ) + ) + + if not SUPERPOWERS_PATH.exists(): + items.append( + finding( + "WARN", + "SUPERPOWERS_NOT_INSTALLED", + f"superpowers is not installed at {SUPERPOWERS_PATH}", + "Run ./install.sh after cloning the repository outside the extensions directory.", + SUPERPOWERS_PATH, + ) + ) + elif SUPERPOWERS_PATH.is_symlink(): + items.append( + finding( + "INFO", + "SUPERPOWERS_SYMLINK", + "superpowers is installed as a symlink to the skills directory.", + "No action required.", + SUPERPOWERS_PATH, + ) + ) + else: + items.append( + finding( + "WARN", + "SUPERPOWERS_NOT_SYMLINK", + "superpowers exists but is not a symlink.", + "Confirm this is intentional and not a stale checkout inside the extensions directory.", + SUPERPOWERS_PATH, + ) + ) + + return items + + +def check_tooling(mode: str) -> list[dict]: + items = [] + docker_found = shutil.which("docker") is not 
None + docker_compose_found = docker_compose_available() + + if shutil.which("openclaw") is None: + items.append( + finding( + "WARN", + "OPENCLAW_CLI_MISSING", + "`openclaw` CLI is not on PATH.", + "Install the OpenClaw CLI in the runtime environment.", + ) + ) + if not HAS_YAML: + items.append( + finding( + "WARN", + "PYYAML_MISSING", + "PyYAML is unavailable; some stateful helpers will degrade.", + "Install PyYAML with `python3 -m pip install PyYAML`.", + ) + ) + if mode in {"docker-compose", "dockerfile"} and not docker_found: + items.append( + finding( + "WARN", + "DOCKER_MISSING", + "Deployment files suggest Docker, but `docker` is not on PATH.", + "Run the checker where Docker is installed or point it at the correct environment.", + ) + ) + if mode == "docker-compose" and docker_found and not docker_compose_found: + items.append( + finding( + "WARN", + "DOCKER_COMPOSE_MISSING", + "Docker is installed, but `docker compose` is unavailable.", + "Install the compose plugin or run the checker where compose is available.", + ) + ) + return items + + +def check_compose_files(files: list[Path]) -> list[dict]: + items = [] + for path in files: + try: + text = path.read_text() + except Exception as exc: + items.append( + finding( + "WARN", + "COMPOSE_UNREADABLE", + f"Could not read compose file: {exc}", + "Fix file permissions or path selection before relying on preflight output.", + path, + ) + ) + continue + + if ".openclaw" not in text: + items.append( + finding( + "WARN", + "EPHEMERAL_OPENCLAW_HOME", + f"{path.name} does not appear to mount `.openclaw`.", + "Persist OPENCLAW_HOME so skills, memory, and config survive container restarts.", + path, + ) + ) + + if "/workspace" not in text and "workspace" not in text: + items.append( + finding( + "WARN", + "WORKSPACE_MOUNT_UNCLEAR", + f"{path.name} does not clearly mount a workspace path.", + "Confirm the runtime can see the project workspace and bootstrap files.", + path, + ) + ) + + if "network_mode: host" in text: 
+ items.append( + finding( + "WARN", + "HOST_NETWORK_MODE", + f"{path.name} uses `network_mode: host`.", + "Prefer explicit loopback bindings unless host networking is required.", + path, + ) + ) + + if PUBLIC_PORT_RE.search(text) and not LOOPBACK_PORT_RE.search(text): + items.append( + finding( + "WARN", + "PUBLIC_GATEWAY_PORT", + f"{path.name} publishes a common OpenClaw port without loopback binding.", + "Bind the gateway to 127.0.0.1 or put it behind an authenticated reverse proxy.", + path, + ) + ) + return items + + +def check_config_exposure() -> list[dict]: + items = [] + if not OPENCLAW_DIR.exists(): + return items + + scanned = 0 + for path in OPENCLAW_DIR.rglob("*"): + if not path.is_file() or path.suffix not in CONFIG_EXTENSIONS: + continue + scanned += 1 + try: + text = path.read_text(errors="replace") + except Exception: + continue + + if PUBLIC_BIND_RE.search(text): + items.append( + finding( + "WARN", + "PUBLIC_BIND_ADDRESS", + f"{path.name} appears to bind OpenClaw services to 0.0.0.0.", + "Use loopback bindings unless you have an authenticated proxy in front.", + path, + ) + ) + return items[:10] + + +def run_check(root: Path) -> dict: + mode, deployment_files = detect_deployment_files(root) + findings = [] + if not root.exists() or not root.is_dir(): + findings.append( + finding( + "FAIL", + "DEPLOYMENT_ROOT_MISSING", + f"Deployment root does not exist or is not a directory: {root}", + "Point `--path` at the directory containing your compose or Docker files.", + root, + ) + ) + findings.extend(check_runtime_paths()) + findings.extend(check_tooling(mode)) + if mode == "docker-compose": + findings.extend(check_compose_files(deployment_files)) + findings.extend(check_config_exposure()) + + fail_count = sum(1 for item in findings if item["severity"] == "FAIL") + warn_count = sum(1 for item in findings if item["severity"] == "WARN") + info_count = sum(1 for item in findings if item["severity"] == "INFO") + environment = { + "openclaw_home": 
str(OPENCLAW_DIR), + "workspace_dir": str(WORKSPACE_DIR), + "superpowers_path": str(SUPERPOWERS_PATH), + "openclaw_cli_found": shutil.which("openclaw") is not None, + "pyyaml_found": HAS_YAML, + "docker_found": shutil.which("docker") is not None, + "docker_compose_found": docker_compose_available(), + "files_checked": len(deployment_files), + } + + state = load_state() + history = state.get("check_history") or [] + now = datetime.now().isoformat() + history.insert( + 0, + { + "checked_at": now, + "deployment_mode": mode, + "fail_count": fail_count, + "warn_count": warn_count, + "info_count": info_count, + "files_checked": len(deployment_files), + }, + ) + + report = { + "last_check_at": now, + "deployment_root": str(root), + "deployment_mode": mode, + "environment": environment, + "findings": findings, + "check_history": history[:MAX_HISTORY], + } + save_state(report) + return report + + +def print_summary(report: dict) -> None: + findings = report.get("findings", []) + fail_count = sum(1 for item in findings if item["severity"] == "FAIL") + warn_count = sum(1 for item in findings if item["severity"] == "WARN") + info_count = sum(1 for item in findings if item["severity"] == "INFO") + print("\nDeployment Preflight") + print("───────────────────────────────────────────────────────") + print(f" Mode: {report.get('deployment_mode', 'unknown')}") + print(f" Root: {report.get('deployment_root', '')}") + print(f" Files checked: {report.get('environment', {}).get('files_checked', 0)}") + print(f" {fail_count} FAIL | {warn_count} WARN | {info_count} INFO") + + +def print_findings(report: dict) -> None: + findings = report.get("findings", []) + if not findings: + print("\n PASS No deployment findings.") + return + print("\nFindings") + print("───────────────────────────────────────────────────────") + for item in findings: + print(f" {item['severity']:4} {item['check']}") + print(f" {item['detail']}") + if item["file_path"]: + print(f" File: {item['file_path']}") + + +def 
print_status(report: dict) -> None: + print_summary(report) + history = report.get("check_history", []) + if history: + print("\nRecent checks") + print("───────────────────────────────────────────────────────") + for item in history[:5]: + print( + f" {item['checked_at'][:19]} " + f"{item['deployment_mode']} " + f"F:{item['fail_count']} W:{item['warn_count']} I:{item['info_count']}" + ) + + +def main() -> None: + parser = argparse.ArgumentParser(description="Deployment safety checks for OpenClaw") + group = parser.add_mutually_exclusive_group(required=True) + group.add_argument("--check", action="store_true", help="Run a deployment preflight") + group.add_argument("--status", action="store_true", help="Show the last preflight summary") + group.add_argument("--findings", action="store_true", help="Show findings from the last preflight") + parser.add_argument("--path", default=".", help="Deployment root containing compose or Docker files") + parser.add_argument("--format", choices=["human", "json"], default="human") + args = parser.parse_args() + + if args.check: + report = run_check(Path(args.path).expanduser().resolve()) + else: + report = load_state() + + if args.format == "json": + print(json.dumps(report, indent=2)) + return + + if args.status: + print_status(report) + return + + print_summary(report) + print_findings(report) + if args.check: + fail_count = sum(1 for item in report.get("findings", []) if item["severity"] == "FAIL") + sys.exit(1 if fail_count else 0) + + +if __name__ == "__main__": + main() diff --git a/skills/openclaw-native/deployment-preflight/example-state.yaml b/skills/openclaw-native/deployment-preflight/example-state.yaml new file mode 100644 index 0000000..5a79b60 --- /dev/null +++ b/skills/openclaw-native/deployment-preflight/example-state.yaml @@ -0,0 +1,62 @@ +# Example runtime state for deployment-preflight +last_check_at: "2026-03-29T09:15:05.000000" +deployment_root: "/srv/openclaw" +deployment_mode: docker-compose +environment: + 
openclaw_home: "/srv/openclaw/.openclaw" + workspace_dir: "/srv/openclaw/.openclaw/workspace" + superpowers_path: "/srv/openclaw/.openclaw/extensions/superpowers" + openclaw_cli_found: true + pyyaml_found: true + docker_found: true + docker_compose_found: true + files_checked: 3 +findings: + - severity: WARN + check: WORKSPACE_BOOTSTRAP_PARTIAL + detail: "Workspace exists but MEMORY.md is missing." + suggestion: "Add the missing bootstrap file before relying on persistent context." + file_path: "/srv/openclaw/.openclaw/workspace" + detected_at: "2026-03-29T09:15:05.000000" + resolved: false + - severity: WARN + check: PUBLIC_GATEWAY_PORT + detail: "compose.yaml publishes 18789 without loopback binding." + suggestion: "Bind the gateway to 127.0.0.1 or put it behind an authenticated reverse proxy." + file_path: "/srv/openclaw/compose.yaml" + detected_at: "2026-03-29T09:15:05.000000" + resolved: false + - severity: INFO + check: SUPERPOWERS_SYMLINK + detail: "superpowers is installed as a symlink to the skills directory." + suggestion: "No action required." + file_path: "/srv/openclaw/.openclaw/extensions/superpowers" + detected_at: "2026-03-29T09:15:05.000000" + resolved: false +check_history: + - checked_at: "2026-03-29T09:15:05.000000" + deployment_mode: docker-compose + fail_count: 0 + warn_count: 2 + info_count: 1 + files_checked: 3 + - checked_at: "2026-03-28T18:00:00.000000" + deployment_mode: docker-compose + fail_count: 1 + warn_count: 1 + info_count: 0 + files_checked: 3 +# ── Walkthrough ────────────────────────────────────────────────────────────── +# python3 check.py --check --path /srv/openclaw +# +# Deployment Preflight +# ─────────────────────────────────────────────────────── +# Mode: docker-compose +# Files checked: 3 +# 0 FAIL | 2 WARN | 1 INFO +# +# WARN WORKSPACE_BOOTSTRAP_PARTIAL +# Workspace exists but MEMORY.md is missing. +# +# WARN PUBLIC_GATEWAY_PORT +# compose.yaml publishes 18789 without loopback binding. 
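Because `--format json` emits this same report structure, a downstream step (CI, a deploy wrapper) can gate on it mechanically. A minimal sketch — the report shape follows the state file above, but `gate` and the block-on-FAIL policy are illustrative assumptions, not part of `check.py`:

```python
def gate(report: dict) -> tuple[int, int]:
    """Count (FAIL, WARN) findings in a preflight JSON report."""
    findings = report.get("findings", [])
    fails = sum(1 for f in findings if f["severity"] == "FAIL")
    warns = sum(1 for f in findings if f["severity"] == "WARN")
    return fails, warns

# Report shaped like the example state above.
sample = {"findings": [
    {"severity": "WARN", "check": "WORKSPACE_BOOTSTRAP_PARTIAL", "detail": "MEMORY.md missing"},
    {"severity": "WARN", "check": "PUBLIC_GATEWAY_PORT", "detail": "18789 bound publicly"},
    {"severity": "INFO", "check": "SUPERPOWERS_SYMLINK", "detail": "symlink install"},
]}
fails, warns = gate(sample)
for f in sample["findings"]:
    if f["severity"] == "WARN":
        print(f"warn: {f['check']}: {f['detail']}")
if fails:
    raise SystemExit(f"preflight blocked deploy: {fails} FAIL finding(s)")
```

This mirrors `check.py`'s own exit behavior (non-zero only on FAIL), so WARNs surface without blocking a deploy.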
diff --git a/skills/openclaw-native/mcp-auth-lifecycle-manager/SKILL.md b/skills/openclaw-native/mcp-auth-lifecycle-manager/SKILL.md new file mode 100644 index 0000000..1bf23b2 --- /dev/null +++ b/skills/openclaw-native/mcp-auth-lifecycle-manager/SKILL.md @@ -0,0 +1,76 @@ +--- +name: mcp-auth-lifecycle-manager +version: "1.0" +category: openclaw-native +description: Tracks MCP auth dependencies, token expiry, missing env vars, and refresh readiness so MCP servers do not silently fail after credentials age out. +stateful: true +cron: "0 */6 * * *" +--- + +# MCP Auth Lifecycle Manager + +## What it does + +MCP outages often start as auth problems, not transport problems. A server can be reachable but still unusable because a token expired, a refresh command is missing, or a required environment variable is no longer set. + +MCP Auth Lifecycle Manager keeps a per-server auth ledger: expiry windows, refresh cadence, missing env vars, interactive login requirements, and the last successful refresh event. 
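The expiry side of the ledger reduces to simple timestamp arithmetic. A standalone sketch of the same buckets `manage.py` uses (24-hour warning window); `classify_expiry` is a hypothetical helper for illustration, not part of the skill's API:

```python
from datetime import datetime, timedelta

def classify_expiry(expires_at: str, now: datetime) -> str:
    """Bucket an ISO expiry timestamp: EXPIRED, EXPIRING (<24h), OK, or UNKNOWN."""
    try:
        expiry = datetime.fromisoformat(expires_at)
    except ValueError:
        return "UNKNOWN"  # empty or malformed timestamp: nothing to track
    delta = expiry - now
    if delta.total_seconds() <= 0:
        return "TOKEN_EXPIRED"
    if delta <= timedelta(hours=24):
        return "TOKEN_EXPIRING"
    return "OK"

now = datetime(2026, 4, 2, 12, 0)
print(classify_expiry("2026-04-02T06:00:00", now))  # TOKEN_EXPIRED
print(classify_expiry("2026-04-03T09:00:00", now))  # TOKEN_EXPIRING (21h away)
print(classify_expiry("2026-04-20T12:00:00", now))  # OK
```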
+ +## When to invoke + +- After adding or changing an MCP server +- Before unattended runs that depend on OAuth or rotating tokens +- When an MCP server is healthy but tool calls still fail with auth errors +- As a scheduled audit every 6 hours + +## What it checks + +| Check | What it means | +|---|---| +| `TOKEN_MISSING` | Required auth environment variable is not set | +| `TOKEN_EXPIRING` | Auth expires within the next 24 hours | +| `TOKEN_EXPIRED` | Auth is already past its expiry timestamp | +| `REFRESH_UNDEFINED` | Server has auth but no refresh command or refresh guidance | +| `REFRESH_OVERDUE` | Last successful refresh is older than the expected interval | +| `STATIC_SECRET_IN_CONFIG` | Token or secret appears to be hardcoded in MCP config | +| `INTERACTIVE_REFRESH` | Refresh still requires a browser or manual login | + +## How to use + +```bash +python3 manage.py --scan +python3 manage.py --scan --server github +python3 manage.py --status +python3 manage.py --plan github +python3 manage.py --record-refresh github --result success --note "gh auth refresh completed" +python3 manage.py --record-refresh github --result success --expires-at 2026-04-20T12:00:00 +python3 manage.py --history +python3 manage.py --format json +``` + +## Optional auth registry + +Use `~/.openclaw/config/mcp-auth.yaml` or `mcp-auth.json` to record provider-specific lifecycle data: + +```yaml +servers: + github: + provider: github + auth_type: oauth + expires_at: "2026-04-20T12:00:00" + refresh_command: "gh auth refresh -h github.com" + refresh_interval_hours: 12 + interactive_refresh: false + notes: "Bot token rotated by SSO policy" +``` + +## Difference from mcp-health-checker + +`mcp-health-checker` asks: "Can I reach the server right now?" + +`mcp-auth-lifecycle-manager` asks: "Will auth still work tomorrow, and do I know how to recover it when it stops?" 
+ +## State + +State file: `~/.openclaw/skill-state/mcp-auth-lifecycle-manager/state.yaml` + +Fields: `last_scan_at`, `last_config_path`, `last_registry_path`, `servers`, `refresh_history`. diff --git a/skills/openclaw-native/mcp-auth-lifecycle-manager/STATE_SCHEMA.yaml b/skills/openclaw-native/mcp-auth-lifecycle-manager/STATE_SCHEMA.yaml new file mode 100644 index 0000000..da043d4 --- /dev/null +++ b/skills/openclaw-native/mcp-auth-lifecycle-manager/STATE_SCHEMA.yaml @@ -0,0 +1,43 @@ +version: "1.0" +description: MCP auth inventory, expiry tracking, refresh readiness, and refresh history. +fields: + last_scan_at: + type: datetime + last_config_path: + type: string + last_registry_path: + type: string + servers: + type: list + items: + name: { type: string } + provider: { type: string } + auth_type: { type: string } + status: { type: enum, values: [healthy, degraded, critical, unknown] } + env_vars: + type: list + items: { type: string } + missing_env_vars: + type: list + items: { type: string } + expires_at: { type: datetime } + last_refresh_at: { type: datetime } + refresh_interval_hours: { type: integer } + interactive_refresh: { type: boolean } + refresh_command: { type: string } + notes: { type: string } + findings: + type: list + items: + check: { type: string } + severity: { type: string } + detail: { type: string } + refresh_history: + type: list + description: Rolling log of manual or automated refresh outcomes + items: + recorded_at: { type: datetime } + server: { type: string } + result: { type: enum, values: [success, failed] } + note: { type: string } + expires_at: { type: datetime } diff --git a/skills/openclaw-native/mcp-auth-lifecycle-manager/example-state.yaml b/skills/openclaw-native/mcp-auth-lifecycle-manager/example-state.yaml new file mode 100644 index 0000000..d541cdd --- /dev/null +++ b/skills/openclaw-native/mcp-auth-lifecycle-manager/example-state.yaml @@ -0,0 +1,69 @@ +# Example runtime state for mcp-auth-lifecycle-manager +last_scan_at: 
"2026-04-02T12:00:00" +last_config_path: "/Users/you/.openclaw/config/mcp.yaml" +last_registry_path: "/Users/you/.openclaw/config/mcp-auth.yaml" +servers: + - name: github + provider: github + auth_type: oauth + status: healthy + env_vars: ["GITHUB_TOKEN"] + missing_env_vars: [] + expires_at: "2026-04-20T12:00:00" + last_refresh_at: "2026-04-02T06:30:00" + refresh_interval_hours: 12 + interactive_refresh: false + refresh_command: "gh auth refresh -h github.com" + notes: "SSO rotation policy" + findings: [] + - name: linear + provider: linear + auth_type: bearer + status: critical + env_vars: ["LINEAR_API_TOKEN"] + missing_env_vars: ["LINEAR_API_TOKEN"] + expires_at: "" + last_refresh_at: "" + refresh_interval_hours: 24 + interactive_refresh: true + refresh_command: "" + notes: "Browser login required until service account is configured" + findings: + - check: TOKEN_MISSING + severity: CRITICAL + detail: "Missing environment variable: LINEAR_API_TOKEN" + - check: REFRESH_UNDEFINED + severity: HIGH + detail: "No refresh command or documented recovery path" + - check: INTERACTIVE_REFRESH + severity: MEDIUM + detail: "Refresh still requires a manual browser login" +refresh_history: + - recorded_at: "2026-04-02T06:30:00" + server: github + result: success + note: "gh auth refresh -h github.com" + expires_at: "2026-04-20T12:00:00" + - recorded_at: "2026-04-01T12:10:00" + server: linear + result: failed + note: "Interactive login not available on headless host" + expires_at: "" +# ── Walkthrough ────────────────────────────────────────────────────────────── +# python3 manage.py --scan +# +# MCP Auth Lifecycle Manager +# ─────────────────────────────────────────────────────── +# 2 servers | 1 healthy | 0 degraded | 1 critical +# +# HEALTHY github oauth +# CRITICAL linear bearer +# [CRITICAL] TOKEN_MISSING: Missing environment variable: LINEAR_API_TOKEN +# [HIGH] REFRESH_UNDEFINED: No refresh command or documented recovery path +# +# python3 manage.py --plan github +# +# 
Refresh plan for github +# 1. Confirm env vars: GITHUB_TOKEN +# 2. Run: gh auth refresh -h github.com +# 3. Re-run mcp-health-checker and record the result with manage.py --record-refresh diff --git a/skills/openclaw-native/mcp-auth-lifecycle-manager/manage.py b/skills/openclaw-native/mcp-auth-lifecycle-manager/manage.py new file mode 100755 index 0000000..60a08fe --- /dev/null +++ b/skills/openclaw-native/mcp-auth-lifecycle-manager/manage.py @@ -0,0 +1,480 @@ +#!/usr/bin/env python3 +from __future__ import annotations + +""" +MCP Auth Lifecycle Manager for openclaw-superpowers. + +Tracks MCP auth expiry, missing env vars, refresh readiness, and refresh history. +""" + +import argparse +import json +import os +import re +from datetime import datetime, timedelta +from pathlib import Path + +try: + import yaml + HAS_YAML = True +except ImportError: + HAS_YAML = False + +OPENCLAW_DIR = Path(os.environ.get("OPENCLAW_HOME", Path.home() / ".openclaw")) +STATE_FILE = OPENCLAW_DIR / "skill-state" / "mcp-auth-lifecycle-manager" / "state.yaml" +MAX_HISTORY = 20 + +MCP_CONFIG_PATHS = [ + OPENCLAW_DIR / "config" / "mcp.yaml", + OPENCLAW_DIR / "config" / "mcp.json", + OPENCLAW_DIR / "mcp.yaml", + OPENCLAW_DIR / "mcp.json", + Path.home() / ".config" / "openclaw" / "mcp.yaml", + Path.home() / ".config" / "openclaw" / "mcp.json", +] + +AUTH_REGISTRY_PATHS = [ + OPENCLAW_DIR / "config" / "mcp-auth.yaml", + OPENCLAW_DIR / "config" / "mcp-auth.json", + OPENCLAW_DIR / "mcp-auth.yaml", + OPENCLAW_DIR / "mcp-auth.json", + Path.home() / ".config" / "openclaw" / "mcp-auth.yaml", + Path.home() / ".config" / "openclaw" / "mcp-auth.json", +] + +SECRET_KEY_PATTERN = re.compile(r"(token|secret|password|api[_-]?key|authorization)", re.I) +ENV_REF_PATTERN = re.compile(r"^\$(\w+)$|^\$\{(\w+)\}$") + + +def default_state() -> dict: + return { + "last_scan_at": "", + "last_config_path": "", + "last_registry_path": "", + "servers": [], + "refresh_history": [], + } + + +def load_structured(path: Path) 
-> dict: + text = path.read_text() + if path.suffix == ".json": + return json.loads(text) + if HAS_YAML: + return yaml.safe_load(text) or {} + return {} + + +def load_state() -> dict: + if not STATE_FILE.exists(): + return default_state() + try: + text = STATE_FILE.read_text() + if HAS_YAML: + return yaml.safe_load(text) or default_state() + return json.loads(text) + except Exception: + return default_state() + + +def save_state(state: dict) -> None: + STATE_FILE.parent.mkdir(parents=True, exist_ok=True) + if HAS_YAML: + with open(STATE_FILE, "w") as handle: + yaml.dump(state, handle, default_flow_style=False, allow_unicode=True, sort_keys=False) + else: + STATE_FILE.write_text(json.dumps(state, indent=2)) + + +def now_iso() -> str: + return datetime.now().isoformat(timespec="seconds") + + +def find_config(paths: list[Path]) -> tuple[Path | None, dict]: + for path in paths: + if not path.exists(): + continue + try: + return path, load_structured(path) + except Exception: + continue + return None, {} + + +def extract_servers(config: dict) -> list[dict]: + servers = [] + mcp_servers = config.get("mcpServers") or config.get("servers") or config + if not isinstance(mcp_servers, dict): + return servers + for name, definition in mcp_servers.items(): + if isinstance(definition, dict): + servers.append({"name": name, "definition": definition}) + return servers + + +def extract_registry_entry(registry: dict, name: str) -> dict: + if not isinstance(registry, dict): + return {} + servers = registry.get("servers") or registry + if isinstance(servers, dict): + entry = servers.get(name) + if isinstance(entry, dict): + return entry + return {} + + +def get_existing_server(state: dict, name: str) -> dict: + for item in state.get("servers", []): + if item.get("name") == name: + return item + return {} + + +def detect_auth_type(definition: dict, registry_entry: dict) -> str: + explicit = registry_entry.get("auth_type") + if explicit: + return str(explicit) + env_map = 
definition.get("env") if isinstance(definition.get("env"), dict) else {} + combined = " ".join(str(v) for v in definition.values()) + if any(SECRET_KEY_PATTERN.search(key) for key in env_map): + return "env-token" + if "oauth" in combined.lower() or "pkce" in combined.lower(): + return "oauth" + if SECRET_KEY_PATTERN.search(combined): + return "bearer" + return "none" + + +def extract_env_vars(definition: dict) -> tuple[list[str], list[str], int]: + env_vars: list[str] = [] + missing: list[str] = [] + literal_secrets = 0 + env_map = definition.get("env") if isinstance(definition.get("env"), dict) else {} + for key, value in env_map.items(): + value_str = str(value) + match = ENV_REF_PATTERN.match(value_str.strip()) + if match: + env_name = match.group(1) or match.group(2) or key + env_vars.append(env_name) + if not os.environ.get(env_name): + missing.append(env_name) + elif SECRET_KEY_PATTERN.search(key) or SECRET_KEY_PATTERN.search(value_str): + literal_secrets += 1 + return sorted(set(env_vars)), sorted(set(missing)), literal_secrets + + +def count_literal_secrets(value) -> int: + if isinstance(value, dict): + total = 0 + for key, nested in value.items(): + if isinstance(nested, str) and SECRET_KEY_PATTERN.search(str(key)) and not ENV_REF_PATTERN.match(nested.strip()): + total += 1 + total += count_literal_secrets(nested) + return total + if isinstance(value, list): + return sum(count_literal_secrets(item) for item in value) + if isinstance(value, str): + lowered = value.lower() + if value.startswith("Bearer ") or ("token" in lowered and not ENV_REF_PATTERN.match(value.strip())): + return 1 + return 0 + + +def parse_time(value: str) -> datetime | None: + if not value: + return None + try: + return datetime.fromisoformat(value) + except ValueError: + return None + + +def build_findings(server: dict) -> list[dict]: + findings: list[dict] = [] + for env_name in server.get("missing_env_vars", []): + findings.append( + { + "check": "TOKEN_MISSING", + "severity": 
"CRITICAL", + "detail": f"Missing environment variable: {env_name}", + } + ) + + expires_at = parse_time(server.get("expires_at", "")) + if expires_at is not None: + delta = expires_at - datetime.now() + if delta.total_seconds() <= 0: + findings.append( + { + "check": "TOKEN_EXPIRED", + "severity": "CRITICAL", + "detail": f"Auth expired at {server['expires_at']}", + } + ) + elif delta <= timedelta(hours=24): + findings.append( + { + "check": "TOKEN_EXPIRING", + "severity": "HIGH", + "detail": f"Auth expires in {int(delta.total_seconds() // 3600)}h", + } + ) + + auth_type = server.get("auth_type", "none") + if auth_type != "none" and not server.get("refresh_command") and not server.get("notes"): + findings.append( + { + "check": "REFRESH_UNDEFINED", + "severity": "HIGH", + "detail": "No refresh command or documented recovery path", + } + ) + + last_refresh = parse_time(server.get("last_refresh_at", "")) + interval_hours = int(server.get("refresh_interval_hours", 0) or 0) + if interval_hours > 0 and last_refresh is not None: + age_hours = (datetime.now() - last_refresh).total_seconds() / 3600 + if age_hours > interval_hours: + findings.append( + { + "check": "REFRESH_OVERDUE", + "severity": "HIGH", + "detail": f"Last successful refresh was {int(age_hours)}h ago (interval: {interval_hours}h)", + } + ) + + if int(server.get("literal_secret_count", 0) or 0) > 0: + findings.append( + { + "check": "STATIC_SECRET_IN_CONFIG", + "severity": "HIGH", + "detail": "Token-like material appears to be hardcoded in MCP config", + } + ) + + if server.get("interactive_refresh"): + findings.append( + { + "check": "INTERACTIVE_REFRESH", + "severity": "MEDIUM", + "detail": "Refresh still requires a manual browser login", + } + ) + + return findings + + +def compute_status(findings: list[dict]) -> str: + severities = {item.get("severity") for item in findings} + if "CRITICAL" in severities: + return "critical" + if "HIGH" in severities or "MEDIUM" in severities: + return "degraded" + 
return "healthy" + + +def scan_servers(state: dict, server_filter: str | None = None) -> dict: + config_path, config = find_config(MCP_CONFIG_PATHS) + registry_path, registry = find_config(AUTH_REGISTRY_PATHS) + + scanned = [] + for server in extract_servers(config): + name = server["name"] + if server_filter and name != server_filter: + continue + definition = server["definition"] + registry_entry = extract_registry_entry(registry, name) + existing = get_existing_server(state, name) + env_vars, missing_env_vars, literal_from_env = extract_env_vars(definition) + literal_secrets = literal_from_env + count_literal_secrets(definition) + entry = { + "name": name, + "provider": registry_entry.get("provider", name), + "auth_type": detect_auth_type(definition, registry_entry), + "env_vars": env_vars, + "missing_env_vars": missing_env_vars, + "expires_at": str(registry_entry.get("expires_at", existing.get("expires_at", "")) or ""), + "last_refresh_at": str(existing.get("last_refresh_at", registry_entry.get("last_refresh_at", "")) or ""), + "refresh_interval_hours": int( + registry_entry.get("refresh_interval_hours", existing.get("refresh_interval_hours", 0)) or 0 + ), + "interactive_refresh": bool( + registry_entry.get("interactive_refresh", existing.get("interactive_refresh", False)) + ), + "refresh_command": str(registry_entry.get("refresh_command", existing.get("refresh_command", "")) or ""), + "notes": str(registry_entry.get("notes", existing.get("notes", "")) or ""), + "literal_secret_count": literal_secrets, + } + entry["findings"] = build_findings(entry) + entry["status"] = compute_status(entry["findings"]) if entry["auth_type"] != "none" else "unknown" + scanned.append(entry) + + if server_filter: + existing = {item.get("name"): item for item in state.get("servers", []) if item.get("name")} + for item in scanned: + existing[item["name"]] = item + scanned = list(existing.values()) + + scanned.sort(key=lambda item: item["name"]) + state["last_scan_at"] = now_iso() + 
state["last_config_path"] = str(config_path or "") + state["last_registry_path"] = str(registry_path or "") + state["servers"] = scanned + return state + + +def record_refresh(state: dict, server_name: str, result: str, note: str, expires_at: str) -> dict: + servers = state.get("servers", []) + server = next((item for item in servers if item.get("name") == server_name), None) + if server is None: + server = { + "name": server_name, + "provider": server_name, + "auth_type": "unknown", + "status": "unknown", + "env_vars": [], + "missing_env_vars": [], + "expires_at": "", + "last_refresh_at": "", + "refresh_interval_hours": 0, + "interactive_refresh": False, + "refresh_command": "", + "notes": "", + "findings": [], + } + servers.append(server) + + recorded_at = now_iso() + history = state.get("refresh_history") or [] + history.insert( + 0, + { + "recorded_at": recorded_at, + "server": server_name, + "result": result, + "note": note, + "expires_at": expires_at, + }, + ) + state["refresh_history"] = history[:MAX_HISTORY] + + if result == "success": + server["last_refresh_at"] = recorded_at + if expires_at: + server["expires_at"] = expires_at + + server["findings"] = build_findings(server) + server["status"] = compute_status(server["findings"]) if server.get("auth_type") != "none" else "unknown" + return state + + +def print_scan(state: dict) -> None: + servers = state.get("servers", []) + healthy = sum(1 for item in servers if item.get("status") == "healthy") + degraded = sum(1 for item in servers if item.get("status") == "degraded") + critical = sum(1 for item in servers if item.get("status") == "critical") + + print("\nMCP Auth Lifecycle Manager") + print("───────────────────────────────────────────────────────") + print(f" {len(servers)} servers | {healthy} healthy | {degraded} degraded | {critical} critical") + if state.get("last_config_path"): + print(f" Config: {state['last_config_path']}") + if state.get("last_registry_path"): + print(f" Registry: 
{state['last_registry_path']}") + if not servers: + print("\n No MCP servers discovered.") + return + for item in servers: + print(f" {item['status'].upper():9} {item['name']} {item['auth_type']}") + for finding in item.get("findings", [])[:3]: + print(f" [{finding['severity']}] {finding['check']}: {finding['detail']}") + + +def print_status(state: dict) -> None: + print(f"\nMCP Auth Lifecycle Manager — Last scan: {state.get('last_scan_at') or 'never'}") + print("───────────────────────────────────────────────────────") + servers = state.get("servers", []) + if not servers: + print(" No auth state recorded.") + return + for item in servers: + expiry = item.get("expires_at") or "n/a" + refresh = item.get("last_refresh_at") or "never" + print(f" {item['name']}: {item['status']} expires={expiry} refreshed={refresh}") + + +def print_history(state: dict) -> None: + history = state.get("refresh_history", []) + print("\nMCP Auth Refresh History") + print("───────────────────────────────────────────────────────") + if not history: + print(" No refresh events recorded.") + return + for item in history[:10]: + print(f" {item['recorded_at']} {item['server']} {item['result']} {item['note']}") + + +def print_plan(state: dict, server_name: str) -> None: + server = next((item for item in state.get("servers", []) if item.get("name") == server_name), None) + if server is None: + raise SystemExit(f"No auth record found for server '{server_name}'") + print(f"\nRefresh plan for {server_name}") + print("───────────────────────────────────────────────────────") + if server.get("env_vars"): + print(f"1. Confirm env vars: {', '.join(server['env_vars'])}") + else: + print("1. Confirm auth inputs are available to the MCP process.") + if server.get("refresh_command"): + print(f"2. Run: {server['refresh_command']}") + else: + print("2. Follow the documented provider refresh flow or update the registry with refresh_command.") + print("3. 
Re-run mcp-health-checker to confirm the server still initializes cleanly.") + print(f"4. Record the result with: python3 manage.py --record-refresh {server_name} --result success") + if server.get("interactive_refresh"): + print("5. This server still requires an interactive login; do not rely on it for unattended workloads.") + + +def main() -> None: + parser = argparse.ArgumentParser(description="Track MCP auth expiry and refresh readiness") + group = parser.add_mutually_exclusive_group(required=True) + group.add_argument("--scan", action="store_true", help="Scan MCP auth dependencies and refresh readiness") + group.add_argument("--status", action="store_true", help="Show last recorded auth status") + group.add_argument("--history", action="store_true", help="Show refresh history") + group.add_argument("--plan", metavar="SERVER", help="Print refresh steps for a server") + group.add_argument("--record-refresh", metavar="SERVER", help="Record a refresh outcome for a server") + parser.add_argument("--server", help="Optional single-server filter for --scan") + parser.add_argument("--result", choices=["success", "failed"], help="Refresh result for --record-refresh") + parser.add_argument("--note", default="", help="Context or command used during refresh") + parser.add_argument("--expires-at", default="", help="Updated auth expiry for successful refreshes") + parser.add_argument("--format", choices=["human", "json"], default="human") + args = parser.parse_args() + + state = load_state() + + if args.scan: + state = scan_servers(state, args.server) + save_state(state) + elif args.record_refresh: + if not args.result: + raise SystemExit("--result is required for --record-refresh") + state = record_refresh(state, args.record_refresh, args.result, args.note, args.expires_at) + save_state(state) + + if args.format == "json": + print(json.dumps(state, indent=2)) + return + + if args.scan: + print_scan(state) + elif args.status: + print_status(state) + elif args.history: + 
print_history(state) + elif args.plan: + print_plan(state, args.plan) + elif args.record_refresh: + print_status(state) + + +if __name__ == "__main__": + main() diff --git a/skills/openclaw-native/message-delivery-verifier/SKILL.md b/skills/openclaw-native/message-delivery-verifier/SKILL.md new file mode 100644 index 0000000..46a529a --- /dev/null +++ b/skills/openclaw-native/message-delivery-verifier/SKILL.md @@ -0,0 +1,61 @@ +--- +name: message-delivery-verifier +version: "1.0" +category: openclaw-native +description: Tracks outbound message delivery across Telegram, Slack, and similar channels — queued, sent, acknowledged, failed, or stale — so users stop missing the final result. +stateful: true +cron: "*/15 * * * *" +--- + +# Message Delivery Verifier + +## What it does + +Cron jobs and agents often complete the work but fail on the last step: the message never reaches the user. Message Delivery Verifier maintains a delivery ledger so queued, sent, acknowledged, failed, and stale messages are explicit instead of guessed. + +## When to invoke + +- Around any workflow that sends a notification, briefing, or deliverable +- When users report "the agent ran but I never got the message" +- As a periodic watchdog for outbound message queues + +## Delivery states + +- `queued` +- `sent` +- `acknowledged` +- `failed` +- `stale` + +## How to use + +```bash +python3 verify.py --queue telegram --recipient ops-chat --body "Morning briefing ready" +python3 verify.py --sent telegram --delivery-id msg-001 --receipt telegram:8812 +python3 verify.py --ack telegram --delivery-id msg-001 +python3 verify.py --fail telegram --delivery-id msg-001 --reason "403 chat not found" +python3 verify.py --stale +python3 verify.py --report +python3 verify.py --format json +``` + +## Watchdog behaviour + +Every 15 minutes: + +1. Load queued and sent-but-unacknowledged messages +2. Mark any message older than the stale threshold as `stale` +3. 
Surface the channel, recipient, and last known receipt +4. Preserve retry recommendations in state + +## Difference from cron-execution-prover + +`cron-execution-prover` proves that a scheduled workflow ran. + +`message-delivery-verifier` proves that the last-mile notification or output actually reached the user. + +## State + +State file: `~/.openclaw/skill-state/message-delivery-verifier/state.yaml` + +Fields: `deliveries`, `stale_deliveries`, `last_report_at`, `report_history`. diff --git a/skills/openclaw-native/message-delivery-verifier/STATE_SCHEMA.yaml b/skills/openclaw-native/message-delivery-verifier/STATE_SCHEMA.yaml new file mode 100644 index 0000000..5262403 --- /dev/null +++ b/skills/openclaw-native/message-delivery-verifier/STATE_SCHEMA.yaml @@ -0,0 +1,34 @@ +version: "1.0" +description: Delivery ledger, stale message tracking, and report history. +fields: + deliveries: + type: list + items: + channel: { type: string } + delivery_id: { type: string } + recipient: { type: string } + body: { type: string } + queued_at: { type: datetime } + sent_at: { type: datetime } + acknowledged_at: { type: datetime } + status: { type: enum, values: [queued, sent, acknowledged, failed, stale] } + receipt: { type: string } + failure_reason: { type: string } + retry_count: { type: integer } + stale_deliveries: + type: list + items: + delivery_id: { type: string } + channel: { type: string } + recipient: { type: string } + age_minutes: { type: integer } + last_report_at: + type: datetime + report_history: + type: list + description: Rolling history of the last 12 delivery reports + items: + reported_at: { type: datetime } + total_deliveries: { type: integer } + stale_count: { type: integer } + failed_count: { type: integer } diff --git a/skills/openclaw-native/message-delivery-verifier/example-state.yaml b/skills/openclaw-native/message-delivery-verifier/example-state.yaml new file mode 100644 index 0000000..e9c477a --- /dev/null +++ 
b/skills/openclaw-native/message-delivery-verifier/example-state.yaml @@ -0,0 +1,42 @@ +# Example runtime state for message-delivery-verifier +deliveries: + - channel: telegram + delivery_id: "msg-001" + recipient: "ops-chat" + body: "Morning briefing ready" + queued_at: "2026-04-01T07:00:00" + sent_at: "2026-04-01T07:00:03" + acknowledged_at: "2026-04-01T07:00:07" + status: acknowledged + receipt: "telegram:8812" + failure_reason: "" + retry_count: 0 + - channel: slack + delivery_id: "msg-002" + recipient: "#alerts" + body: "Nightly sync completed" + queued_at: "2026-04-01T08:00:00" + sent_at: "2026-04-01T08:00:05" + acknowledged_at: "" + status: stale + receipt: "slack:ts-191.88" + failure_reason: "" + retry_count: 1 +stale_deliveries: + - delivery_id: "msg-002" + channel: slack + recipient: "#alerts" + age_minutes: 92 +last_report_at: "2026-04-01T09:32:00" +report_history: + - reported_at: "2026-04-01T09:32:00" + total_deliveries: 2 + stale_count: 1 + failed_count: 0 +# ── Walkthrough ────────────────────────────────────────────────────────────── +# python3 verify.py --stale +# +# Message Delivery Verifier +# ─────────────────────────────────────────────────────── +# 2 deliveries | 1 stale | 0 failed +# STALE slack msg-002 -> #alerts diff --git a/skills/openclaw-native/message-delivery-verifier/verify.py b/skills/openclaw-native/message-delivery-verifier/verify.py new file mode 100755 index 0000000..cdd838a --- /dev/null +++ b/skills/openclaw-native/message-delivery-verifier/verify.py @@ -0,0 +1,216 @@ +#!/usr/bin/env python3 +from __future__ import annotations + +""" +Message Delivery Verifier for openclaw-superpowers. + +Tracks the last-mile state of outbound notifications across supported channels. 
+""" + +import argparse +import json +import os +from datetime import datetime +from pathlib import Path + +try: + import yaml + HAS_YAML = True +except ImportError: + HAS_YAML = False + +OPENCLAW_DIR = Path(os.environ.get("OPENCLAW_HOME", Path.home() / ".openclaw")) +STATE_FILE = OPENCLAW_DIR / "skill-state" / "message-delivery-verifier" / "state.yaml" +MAX_DELIVERIES = 200 +MAX_HISTORY = 12 +STALE_AFTER_MINUTES = 60 + + +def default_state() -> dict: + return { + "deliveries": [], + "stale_deliveries": [], + "last_report_at": "", + "report_history": [], + } + + +def load_state() -> dict: + if not STATE_FILE.exists(): + return default_state() + try: + text = STATE_FILE.read_text() + if HAS_YAML: + return yaml.safe_load(text) or default_state() + return json.loads(text) + except Exception: + return default_state() + + +def save_state(state: dict) -> None: + STATE_FILE.parent.mkdir(parents=True, exist_ok=True) + if HAS_YAML: + with open(STATE_FILE, "w") as handle: + yaml.dump(state, handle, default_flow_style=False, allow_unicode=True, sort_keys=False) + else: + STATE_FILE.write_text(json.dumps(state, indent=2)) + + +def now_iso() -> str: + return datetime.now().isoformat(timespec="seconds") + + +def find_delivery(state: dict, delivery_id: str) -> dict | None: + for item in state.get("deliveries", []): + if item.get("delivery_id") == delivery_id: + return item + return None + + +def ensure_delivery(state: dict, channel: str, delivery_id: str, recipient: str = "", body: str = "") -> dict: + existing = find_delivery(state, delivery_id) + if existing: + return existing + entry = { + "channel": channel, + "delivery_id": delivery_id, + "recipient": recipient, + "body": body, + "queued_at": now_iso(), + "sent_at": "", + "acknowledged_at": "", + "status": "queued", + "receipt": "", + "failure_reason": "", + "retry_count": 0, + } + state["deliveries"] = [entry] + (state.get("deliveries") or []) + state["deliveries"] = state["deliveries"][:MAX_DELIVERIES] + return entry + + 
+def refresh_stale(state: dict) -> None: + stale = [] + current = datetime.now() + for item in state.get("deliveries", []): + if item.get("status") in {"acknowledged", "failed"}: + continue + ts = item.get("sent_at") or item.get("queued_at") + if not ts: + continue + try: + age = int((current - datetime.fromisoformat(ts)).total_seconds() / 60) + except ValueError: + continue + if age > STALE_AFTER_MINUTES: + item["status"] = "stale" + stale.append( + { + "delivery_id": item.get("delivery_id", ""), + "channel": item.get("channel", ""), + "recipient": item.get("recipient", ""), + "age_minutes": age, + } + ) + state["stale_deliveries"] = stale + + +def record_history(state: dict) -> None: + refresh_stale(state) + history = state.get("report_history") or [] + history.insert( + 0, + { + "reported_at": now_iso(), + "total_deliveries": len(state.get("deliveries", [])), + "stale_count": len(state.get("stale_deliveries", [])), + "failed_count": sum(1 for item in state.get("deliveries", []) if item.get("status") == "failed"), + }, + ) + state["last_report_at"] = history[0]["reported_at"] + state["report_history"] = history[:MAX_HISTORY] + + +def print_report(state: dict, stale_only: bool = False) -> None: + refresh_stale(state) + deliveries = state.get("deliveries", []) + stale = state.get("stale_deliveries", []) + failed_count = sum(1 for item in deliveries if item.get("status") == "failed") + print("\nMessage Delivery Verifier") + print("───────────────────────────────────────────────────────") + print(f" {len(deliveries)} deliveries | {len(stale)} stale | {failed_count} failed") + if stale_only: + for item in stale: + print(f" STALE {item['channel']} {item['delivery_id']} -> {item['recipient']}") + if not stale: + print("\n No stale deliveries.") + return + if not deliveries: + print("\n No deliveries recorded.") + return + for item in deliveries[:10]: + print(f" {item.get('status', '').upper():12} {item.get('channel', '')} {item.get('delivery_id', '')}") + + +def main() 
-> None: + parser = argparse.ArgumentParser(description="Track outbound message delivery state") + group = parser.add_mutually_exclusive_group(required=True) + group.add_argument("--queue", metavar="CHANNEL", help="Queue a delivery") + group.add_argument("--sent", metavar="CHANNEL", help="Mark a delivery sent") + group.add_argument("--ack", metavar="CHANNEL", help="Mark a delivery acknowledged") + group.add_argument("--fail", metavar="CHANNEL", help="Mark a delivery failed") + group.add_argument("--stale", action="store_true", help="Show stale deliveries") + group.add_argument("--report", action="store_true", help="Show delivery report") + parser.add_argument("--delivery-id", help="Stable delivery identifier") + parser.add_argument("--recipient", default="", help="Channel recipient") + parser.add_argument("--body", default="", help="Message body") + parser.add_argument("--receipt", default="", help="Provider receipt or message ID") + parser.add_argument("--reason", default="", help="Failure reason") + parser.add_argument("--format", choices=["human", "json"], default="human") + args = parser.parse_args() + + state = load_state() + if args.queue: + delivery_id = args.delivery_id or f"{args.queue}-{datetime.now().strftime('%Y%m%d%H%M%S')}" + ensure_delivery(state, args.queue, delivery_id, args.recipient, args.body) + save_state(state) + elif args.sent: + if not args.delivery_id: + raise SystemExit("--delivery-id is required for --sent") + entry = ensure_delivery(state, args.sent, args.delivery_id, args.recipient, args.body) + entry["sent_at"] = now_iso() + entry["status"] = "sent" + entry["receipt"] = args.receipt or entry.get("receipt", "") + save_state(state) + elif args.ack: + if not args.delivery_id: + raise SystemExit("--delivery-id is required for --ack") + entry = ensure_delivery(state, args.ack, args.delivery_id) + entry["acknowledged_at"] = now_iso() + entry["status"] = "acknowledged" + save_state(state) + elif args.fail: + if not args.delivery_id: + raise 
SystemExit("--delivery-id is required for --fail") + entry = ensure_delivery(state, args.fail, args.delivery_id) + entry["status"] = "failed" + entry["failure_reason"] = args.reason + entry["retry_count"] = int(entry.get("retry_count", 0)) + 1 + save_state(state) + + if args.report or args.stale: + record_history(state) + save_state(state) + + refresh_stale(state) + if args.format == "json": + print(json.dumps(state, indent=2)) + return + if args.stale: + print_report(state, stale_only=True) + else: + print_report(state) + + +if __name__ == "__main__": + main() diff --git a/skills/openclaw-native/session-reset-recovery/SKILL.md b/skills/openclaw-native/session-reset-recovery/SKILL.md new file mode 100644 index 0000000..5419079 --- /dev/null +++ b/skills/openclaw-native/session-reset-recovery/SKILL.md @@ -0,0 +1,73 @@ +--- +name: session-reset-recovery +version: "1.0" +category: openclaw-native +description: Checkpoints active work before the overnight session reset window and restores a concise resume brief after restart so long tasks survive routine session loss. +stateful: true +cron: "45 3 * * *" +--- + +# Session Reset Recovery + +## What it does + +Some OpenClaw users lose active context during routine overnight session resets. Work is not always gone, but the active thread of thought is. Session Reset Recovery writes a compact checkpoint before the risky window and turns it into a resume brief after restart. 
+ +## When to invoke + +- Automatically at 03:45 local time before the common overnight reset window +- Manually before stopping a long-running session +- Immediately after a reset when the user says "pick up where we left off" + +## What to capture + +- Current task name +- Current status +- The last stable checkpoint +- The next concrete action +- Files in play +- Risks or blockers +- Whether the checkpoint was created automatically or manually + +## How to use + +```bash +python3 recover.py --checkpoint --task "ship deployment-preflight" --next "run tests and open PR" +python3 recover.py --checkpoint --task "ship deployment-preflight" --checkpoint-text "deployment-preflight skill implemented" +python3 recover.py --resume +python3 recover.py --status +python3 recover.py --clear +python3 recover.py --format json +``` + +## Reset-window behaviour + +At 03:45: + +1. Read the current checkpoint state +2. If a task is in progress, write a fresh recovery checkpoint +3. Mark it as `pending_resume: true` +4. Save a short resume brief that can be injected after restart + +If no task is active, skip. + +## Recovery protocol + +After a reset: + +1. Run `python3 recover.py --resume` +2. Read the latest recovery brief +3. Confirm the last stable checkpoint and next step +4. Continue from the recorded next action instead of reconstructing context from memory + +## Difference from task-handoff + +`task-handoff` is a deliberate pause between agents or sessions. + +`session-reset-recovery` is for routine or accidental session loss where the user wants the same agent to recover quickly with minimal friction. + +## State + +State file: `~/.openclaw/skill-state/session-reset-recovery/state.yaml` + +Fields: `active_task`, `latest_checkpoint`, `resume_brief`, `pending_resume`, `checkpoint_history`. 
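The recovery protocol above can also be wired into a session bootstrap hook. A minimal sketch, assuming the JSON fallback form of the state file — `recover.py` writes YAML when PyYAML is installed, so a real hook would try `yaml.safe_load` first; the `pending_resume` and `resume_brief` field names come from the schema, everything else here is hypothetical:

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

def resume_brief_if_pending(state_file: Path) -> "str | None":
    """Return the stored resume brief when a reset left a task pending."""
    if not state_file.exists():
        return None
    try:
        # recover.py falls back to JSON when PyYAML is missing; a production
        # hook would attempt yaml.safe_load first and fall back to json.loads.
        state = json.loads(state_file.read_text())
    except ValueError:
        return None
    if state.get("pending_resume") and state.get("resume_brief"):
        return state["resume_brief"]
    return None

# Demo against a throwaway state file.
with TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "state.yaml"
    state_file.write_text(json.dumps({
        "pending_resume": True,
        "resume_brief": "Resume `ship-deployment-preflight`: checkpoint recorded. Next: open the PR.",
    }))
    print(resume_brief_if_pending(state_file))
```

Hooks that prefer shelling out can get the same field from `recover.py --resume --format json`, which emits only `{"resume_brief": ...}`.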
diff --git a/skills/openclaw-native/session-reset-recovery/STATE_SCHEMA.yaml b/skills/openclaw-native/session-reset-recovery/STATE_SCHEMA.yaml new file mode 100644 index 0000000..ec29591 --- /dev/null +++ b/skills/openclaw-native/session-reset-recovery/STATE_SCHEMA.yaml @@ -0,0 +1,31 @@ +version: "1.0" +description: Recovery checkpoints and resume briefs for routine session resets. +fields: + active_task: + type: string + pending_resume: + type: boolean + default: false + latest_checkpoint: + type: object + items: + task: { type: string } + status: { type: string } + checkpoint_text: { type: string } + next_action: { type: string } + files_in_play: { type: list, items: { type: string } } + blockers: { type: list, items: { type: string } } + mode: { type: enum, values: [manual, automatic] } + written_at: { type: datetime } + resume_brief: + type: string + checkpoint_history: + type: list + description: Rolling log of the last 12 checkpoints + items: + task: { type: string } + status: { type: string } + checkpoint_text: { type: string } + next_action: { type: string } + mode: { type: string } + written_at: { type: datetime } diff --git a/skills/openclaw-native/session-reset-recovery/example-state.yaml b/skills/openclaw-native/session-reset-recovery/example-state.yaml new file mode 100644 index 0000000..99bf2b8 --- /dev/null +++ b/skills/openclaw-native/session-reset-recovery/example-state.yaml @@ -0,0 +1,44 @@ +# Example runtime state for session-reset-recovery +active_task: "ship-deployment-preflight" +pending_resume: true +latest_checkpoint: + task: "ship-deployment-preflight" + status: "in_progress" + checkpoint_text: "Deployment preflight skill is implemented and validated locally." + next_action: "Push the branch and open the PR." 
+ files_in_play: + - "skills/openclaw-native/deployment-preflight/SKILL.md" + - "skills/openclaw-native/deployment-preflight/check.py" + - "README.md" + blockers: [] + mode: automatic + written_at: "2026-03-29T03:45:02.000000" +resume_brief: "Resume `ship-deployment-preflight`: Deployment preflight skill is implemented and validated locally. Next: Push the branch and open the PR." +checkpoint_history: + - task: "ship-deployment-preflight" + status: "in_progress" + checkpoint_text: "Deployment preflight skill is implemented and validated locally." + next_action: "Push the branch and open the PR." + mode: automatic + written_at: "2026-03-29T03:45:02.000000" + - task: "ship-deployment-preflight" + status: "in_progress" + checkpoint_text: "README inventory updated to include deployment-preflight." + next_action: "Run the test runner and preflight smoke test." + mode: manual + written_at: "2026-03-28T23:18:10.000000" +# ── Walkthrough ────────────────────────────────────────────────────────────── +# Cron runs at 03:45: python3 recover.py --checkpoint --automatic +# +# Session Reset Recovery +# ─────────────────────────────────────────────────────── +# Task: ship-deployment-preflight +# Status: in_progress +# Checkpoint: Deployment preflight skill is implemented and validated locally. +# Next: Push the branch and open the PR. +# +# python3 recover.py --resume +# +# Resume `ship-deployment-preflight` +# Deployment preflight skill is implemented and validated locally. +# Next: Push the branch and open the PR. diff --git a/skills/openclaw-native/session-reset-recovery/recover.py b/skills/openclaw-native/session-reset-recovery/recover.py new file mode 100755 index 0000000..732ec01 --- /dev/null +++ b/skills/openclaw-native/session-reset-recovery/recover.py @@ -0,0 +1,166 @@ +#!/usr/bin/env python3 +""" +Session Reset Recovery for openclaw-superpowers. + +Writes recovery checkpoints before the overnight reset window and prints a +resume brief after restart. 
+""" + +import argparse +import json +import os +from datetime import datetime +from pathlib import Path + +try: + import yaml + HAS_YAML = True +except ImportError: + HAS_YAML = False + +OPENCLAW_DIR = Path(os.environ.get("OPENCLAW_HOME", Path.home() / ".openclaw")) +STATE_FILE = OPENCLAW_DIR / "skill-state" / "session-reset-recovery" / "state.yaml" +MAX_HISTORY = 12 + + +def default_state() -> dict: + return { + "active_task": "", + "pending_resume": False, + "latest_checkpoint": {}, + "resume_brief": "", + "checkpoint_history": [], + } + + +def load_state() -> dict: + if not STATE_FILE.exists(): + return default_state() + try: + text = STATE_FILE.read_text() + if HAS_YAML: + return yaml.safe_load(text) or default_state() + return json.loads(text) + except Exception: + return default_state() + + +def save_state(state: dict) -> None: + STATE_FILE.parent.mkdir(parents=True, exist_ok=True) + if HAS_YAML: + with open(STATE_FILE, "w") as handle: + yaml.dump(state, handle, default_flow_style=False, allow_unicode=True, sort_keys=False) + else: + STATE_FILE.write_text(json.dumps(state, indent=2)) + + +def build_resume_brief(checkpoint: dict) -> str: + task = checkpoint.get("task", "unknown-task") + checkpoint_text = checkpoint.get("checkpoint_text", "No checkpoint recorded.") + next_action = checkpoint.get("next_action", "No next action recorded.") + return f"Resume `{task}`: {checkpoint_text} Next: {next_action}" + + +def write_checkpoint(state: dict, args: argparse.Namespace) -> dict: + now = datetime.now().isoformat() + latest = state.get("latest_checkpoint") or {} + checkpoint = { + "task": args.task or state.get("active_task") or latest.get("task", ""), + "status": args.task_status or latest.get("status", "in_progress"), + "checkpoint_text": args.checkpoint_text or latest.get("checkpoint_text", "Checkpoint recorded."), + "next_action": args.next or latest.get("next_action", "Resume the active task."), + "files_in_play": args.files or latest.get("files_in_play", []), 
+ "blockers": args.blockers or latest.get("blockers", []), + "mode": "automatic" if args.automatic else "manual", + "written_at": now, + } + state["active_task"] = checkpoint["task"] + state["pending_resume"] = True + state["latest_checkpoint"] = checkpoint + state["resume_brief"] = build_resume_brief(checkpoint) + history = state.get("checkpoint_history") or [] + history.insert( + 0, + { + "task": checkpoint["task"], + "status": checkpoint["status"], + "checkpoint_text": checkpoint["checkpoint_text"], + "next_action": checkpoint["next_action"], + "mode": checkpoint["mode"], + "written_at": now, + }, + ) + state["checkpoint_history"] = history[:MAX_HISTORY] + save_state(state) + return state + + +def clear_state(state: dict) -> dict: + state["pending_resume"] = False + state["resume_brief"] = "" + state["active_task"] = "" + state["latest_checkpoint"] = {} + save_state(state) + return state + + +def print_status(state: dict) -> None: + checkpoint = state.get("latest_checkpoint") or {} + print("\nSession Reset Recovery") + print("───────────────────────────────────────────────────────") + if not checkpoint: + print(" No checkpoint recorded.") + return + print(f" Task: {checkpoint.get('task', '')}") + print(f" Status: {checkpoint.get('status', '')}") + print(f" Pending resume: {state.get('pending_resume', False)}") + print(f" Written at: {checkpoint.get('written_at', '')}") + print(f" Next: {checkpoint.get('next_action', '')}") + + +def print_resume(state: dict) -> None: + brief = state.get("resume_brief") + if not brief: + print("No recovery brief recorded.") + return + print(brief) + + +def main() -> None: + parser = argparse.ArgumentParser(description="Recovery checkpoints for session resets") + group = parser.add_mutually_exclusive_group(required=True) + group.add_argument("--checkpoint", action="store_true", help="Write a recovery checkpoint") + group.add_argument("--resume", action="store_true", help="Print the latest resume brief") + 
group.add_argument("--status", action="store_true", help="Show current checkpoint status") + group.add_argument("--clear", action="store_true", help="Clear the active recovery checkpoint") + parser.add_argument("--automatic", action="store_true", help="Mark the checkpoint as cron-generated") + parser.add_argument("--task", help="Active task name") + parser.add_argument("--status-text", dest="task_status", help="Task status") + parser.add_argument("--checkpoint-text", help="Last stable checkpoint summary") + parser.add_argument("--next", help="Next concrete action") + parser.add_argument("--files", nargs="*", default=[], help="Files in play") + parser.add_argument("--blockers", nargs="*", default=[], help="Known blockers") + parser.add_argument("--format", choices=["human", "json"], default="human") + args = parser.parse_args() + + state = load_state() + if args.checkpoint: + state = write_checkpoint(state, args) + elif args.clear: + state = clear_state(state) + + if args.format == "json": + if args.resume: + print(json.dumps({"resume_brief": state.get("resume_brief", "")}, indent=2)) + else: + print(json.dumps(state, indent=2)) + return + + if args.resume: + print_resume(state) + elif args.status or args.checkpoint or args.clear: + print_status(state) + + +if __name__ == "__main__": + main() diff --git a/skills/openclaw-native/subagent-capability-auditor/SKILL.md b/skills/openclaw-native/subagent-capability-auditor/SKILL.md new file mode 100644 index 0000000..0d738b9 --- /dev/null +++ b/skills/openclaw-native/subagent-capability-auditor/SKILL.md @@ -0,0 +1,60 @@ +--- +name: subagent-capability-auditor +version: "1.0" +category: openclaw-native +description: Audits subagent configuration for spawn depth, tool exposure, and fleet shape so multi-agent setups fail early instead of becoming expensive mystery states. 
+stateful: true +--- + +# Subagent Capability Auditor + +## What it does + +Subagent failures are often configuration failures hiding behind runtime symptoms: no spawn tool, unsafe `maxSpawnDepth`, too many loosely defined agents, or configs spread across files with no obvious ownership. + +Subagent Capability Auditor inspects the available configuration and records the issues before the orchestrator tries to use them. + +## When to invoke + +- Before enabling subagents in a new environment +- After changing agent profiles or tool exposure +- When subagents fail to spawn or produce inconsistent fleet behaviour +- Before trusting large multi-agent workflows + +## What it checks + +| Check | Why it matters | +|---|---| +| Subagent config discovery | You cannot audit what the runtime cannot see | +| `maxSpawnDepth` | Too low blocks useful delegation; too high creates runaway trees | +| Spawn tool exposure | Missing `sessions_spawn` / equivalent makes subagents impossible | +| Fleet size | Large flat fleets without strong roles are hard to reason about | +| Role metadata | Agents without clear roles increase overlap and wasted token usage | + +## How to use + +```bash +python3 audit.py --audit +python3 audit.py --audit --path ~/.openclaw +python3 audit.py --status +python3 audit.py --findings +python3 audit.py --format json +``` + +## Output levels + +- **PASS** — no configuration concerns found +- **WARN** — something is likely to work poorly or expensively +- **FAIL** — a required subagent capability appears to be missing + +## Difference from multi-agent-coordinator + +`multi-agent-coordinator` manages an active fleet. + +`subagent-capability-auditor` validates whether the fleet is configured sanely enough to exist in the first place. + +## State + +State file: `~/.openclaw/skill-state/subagent-capability-auditor/state.yaml` + +Fields: `last_audit_at`, `config_files`, `findings`, `audit_history`. 
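To make the PASS path concrete, here is a hypothetical `agents.yaml` that would clear every check in the table above. Only `maxSpawnDepth`, the `role:` metadata, and the `sessions_spawn` tool name are patterns the auditor actually scans for; the surrounding field names are illustrative, not a documented OpenClaw schema:

```yaml
# Hypothetical config sketch — field layout is illustrative.
maxSpawnDepth: 2          # declared explicitly, within the 2-3 sweet spot
tools:
  - sessions_spawn        # spawn capability exposed, so delegation can work
agents:
  - name: researcher
    role: "Gathers sources and summarizes findings"
  - name: reviewer
    role: "Checks output against acceptance criteria"
```

A small fleet with distinct roles like this avoids the `ROLE_METADATA_MISSING` and `LARGE_AGENT_FLEET` warnings while keeping spawn depth predictable.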
diff --git a/skills/openclaw-native/subagent-capability-auditor/STATE_SCHEMA.yaml b/skills/openclaw-native/subagent-capability-auditor/STATE_SCHEMA.yaml new file mode 100644 index 0000000..9b28853 --- /dev/null +++ b/skills/openclaw-native/subagent-capability-auditor/STATE_SCHEMA.yaml @@ -0,0 +1,28 @@ +version: "1.0" +description: Subagent capability findings and rolling audit summaries. +fields: + last_audit_at: + type: datetime + config_files: + type: list + items: + type: string + findings: + type: list + items: + severity: { type: enum, values: [FAIL, WARN, INFO] } + check: { type: string } + detail: { type: string } + suggestion: { type: string } + file_path: { type: string } + detected_at: { type: datetime } + resolved: { type: boolean } + audit_history: + type: list + description: Rolling summaries of the last 12 audits + items: + audited_at: { type: datetime } + fail_count: { type: integer } + warn_count: { type: integer } + info_count: { type: integer } + file_count: { type: integer } diff --git a/skills/openclaw-native/subagent-capability-auditor/audit.py b/skills/openclaw-native/subagent-capability-auditor/audit.py new file mode 100755 index 0000000..b0f4317 --- /dev/null +++ b/skills/openclaw-native/subagent-capability-auditor/audit.py @@ -0,0 +1,284 @@ +#!/usr/bin/env python3 +from __future__ import annotations + +""" +Subagent Capability Auditor for openclaw-superpowers. + +Audits subagent-related configuration for spawn depth, tool exposure, and +fleet shape before users trust multi-agent workflows. 
+""" + +import argparse +import json +import os +import re +from datetime import datetime +from pathlib import Path + +try: + import yaml + HAS_YAML = True +except ImportError: + HAS_YAML = False + +OPENCLAW_DIR = Path(os.environ.get("OPENCLAW_HOME", Path.home() / ".openclaw")) +STATE_FILE = OPENCLAW_DIR / "skill-state" / "subagent-capability-auditor" / "state.yaml" +MAX_HISTORY = 12 +CONFIG_NAMES = {"agents.yaml", "agents.yml", "runtime.json", "openclaw.json", "config.yaml", "config.yml"} + + +def default_state() -> dict: + return { + "last_audit_at": "", + "config_files": [], + "findings": [], + "audit_history": [], + } + + +def load_state() -> dict: + if not STATE_FILE.exists(): + return default_state() + try: + text = STATE_FILE.read_text() + if HAS_YAML: + return yaml.safe_load(text) or default_state() + return json.loads(text) + except Exception: + return default_state() + + +def save_state(state: dict) -> None: + STATE_FILE.parent.mkdir(parents=True, exist_ok=True) + if HAS_YAML: + with open(STATE_FILE, "w") as handle: + yaml.dump(state, handle, default_flow_style=False, allow_unicode=True, sort_keys=False) + else: + STATE_FILE.write_text(json.dumps(state, indent=2)) + + +def finding(severity: str, check: str, detail: str, suggestion: str, file_path: Path | str = "") -> dict: + return { + "severity": severity, + "check": check, + "detail": detail, + "suggestion": suggestion, + "file_path": str(file_path), + "detected_at": datetime.now().isoformat(), + "resolved": False, + } + + +def discover_files(root: Path) -> list[Path]: + if not root.exists(): + return [] + matches = [] + for path in root.rglob("*"): + if path.is_file() and path.name in CONFIG_NAMES: + matches.append(path) + return sorted(matches) + + +def parse_agent_count(text: str) -> int: + match = re.search(r'agent[s]?\s*[:=]\s*\[', text, re.IGNORECASE) + if match: + return text.count("role:") + return text.lower().count("role:") + + +def audit_files(files: list[Path]) -> list[dict]: + findings = [] 
+ if not files: + findings.append( + finding( + "FAIL", + "SUBAGENT_CONFIG_MISSING", + "No candidate subagent config files were found.", + "Point the auditor at the correct config root or add the relevant runtime config files.", + ) + ) + return findings + + saw_spawn_tool = False + saw_depth = False + total_agents = 0 + + for path in files: + try: + text = path.read_text(errors="replace") + except Exception as exc: + findings.append( + finding( + "WARN", + "CONFIG_UNREADABLE", + f"Could not read config file: {exc}", + "Fix file permissions or path selection.", + path, + ) + ) + continue + + total_agents += parse_agent_count(text) + + if "sessions_spawn" in text or "spawn_subagent" in text or "spawn-agent" in text: + saw_spawn_tool = True + + depth_match = re.search(r'"?maxSpawnDepth"?\s*[:=]\s*(\d+)', text) + if depth_match: + saw_depth = True + depth = int(depth_match.group(1)) + if depth < 1: + findings.append( + finding( + "FAIL", + "SPAWN_DEPTH_INVALID", + f"maxSpawnDepth is {depth}.", + "Use a positive depth; 2 or 3 is a safer starting point.", + path, + ) + ) + elif depth == 1: + findings.append( + finding( + "WARN", + "SPAWN_DEPTH_LOW", + "maxSpawnDepth is set to 1; deeper delegation will be blocked.", + "Raise maxSpawnDepth to 2 or 3 if nested delegation is expected.", + path, + ) + ) + elif depth > 3: + findings.append( + finding( + "WARN", + "SPAWN_DEPTH_HIGH", + f"maxSpawnDepth is set to {depth}; runaway delegation becomes harder to reason about.", + "Lower maxSpawnDepth to 2 or 3 unless you have a strong reason to exceed it.", + path, + ) + ) + + if ("agents:" in text or '"agents"' in text or "- name:" in text) and "role:" not in text and '"role"' not in text: + findings.append( + finding( + "WARN", + "ROLE_METADATA_MISSING", + "This config file does not show explicit agent roles.", + "Give agents distinct roles so fleet overlap stays manageable.", + path, + ) + ) + + if not saw_spawn_tool: + findings.append( + finding( + "FAIL", + 
"SPAWN_TOOL_MISSING", + "No spawn tool exposure was detected in the scanned config.", + "Expose `sessions_spawn` or the runtime's equivalent spawn capability before expecting subagents to work.", + ) + ) + else: + findings.append( + finding( + "INFO", + "SPAWN_TOOL_PRESENT", + "Spawn tooling appears to be exposed in the current configuration.", + "No action required.", + ) + ) + + if not saw_depth: + findings.append( + finding( + "WARN", + "SPAWN_DEPTH_UNDECLARED", + "No maxSpawnDepth setting was found in the scanned config.", + "Declare a depth limit explicitly so delegation behaviour is predictable.", + ) + ) + + if total_agents >= 10: + findings.append( + finding( + "WARN", + "LARGE_AGENT_FLEET", + f"Detected {total_agents} configured agents in a flat fleet.", + "Consolidate overlapping roles or create stronger role boundaries before scaling further.", + ) + ) + + return findings + + +def print_summary(state: dict) -> None: + findings = state.get("findings", []) + fail_count = sum(1 for item in findings if item["severity"] == "FAIL") + warn_count = sum(1 for item in findings if item["severity"] == "WARN") + info_count = sum(1 for item in findings if item["severity"] == "INFO") + print("\nSubagent Capability Auditor") + print("───────────────────────────────────────────────────────") + print(f" {len(state.get('config_files', []))} config files | {fail_count} FAIL | {warn_count} WARN | {info_count} INFO") + + +def print_findings(state: dict) -> None: + findings = state.get("findings", []) + if not findings: + print("\n PASS No findings.") + return + print("\nFindings") + print("───────────────────────────────────────────────────────") + for item in findings: + print(f" {item['severity']:4} {item['check']}") + print(f" {item['detail']}") + + +def main() -> None: + parser = argparse.ArgumentParser(description="Audit subagent configuration safety") + group = parser.add_mutually_exclusive_group(required=True) + group.add_argument("--audit", action="store_true", 
help="Run the subagent capability audit") + group.add_argument("--status", action="store_true", help="Show the last audit summary") + group.add_argument("--findings", action="store_true", help="Show findings from the last audit") + parser.add_argument("--path", default=str(OPENCLAW_DIR), help="Config root to inspect") + parser.add_argument("--format", choices=["human", "json"], default="human") + args = parser.parse_args() + + if args.audit: + root = Path(args.path).expanduser().resolve() + files = discover_files(root) + findings = audit_files(files) + now = datetime.now().isoformat() + state = { + "last_audit_at": now, + "config_files": [str(path) for path in files], + "findings": findings, + "audit_history": [], + } + previous = load_state() + history = previous.get("audit_history") or [] + history.insert( + 0, + { + "audited_at": now, + "fail_count": sum(1 for item in findings if item["severity"] == "FAIL"), + "warn_count": sum(1 for item in findings if item["severity"] == "WARN"), + "info_count": sum(1 for item in findings if item["severity"] == "INFO"), + "file_count": len(files), + }, + ) + state["audit_history"] = history[:MAX_HISTORY] + save_state(state) + else: + state = load_state() + + if args.format == "json": + print(json.dumps(state, indent=2)) + return + + print_summary(state) + if args.findings or args.audit: + print_findings(state) + + +if __name__ == "__main__": + main() diff --git a/skills/openclaw-native/subagent-capability-auditor/example-state.yaml b/skills/openclaw-native/subagent-capability-auditor/example-state.yaml new file mode 100644 index 0000000..baa2cfe --- /dev/null +++ b/skills/openclaw-native/subagent-capability-auditor/example-state.yaml @@ -0,0 +1,42 @@ +# Example runtime state for subagent-capability-auditor +last_audit_at: "2026-03-30T14:20:05.000000" +config_files: + - "/Users/you/.openclaw/config/agents.yaml" + - "/Users/you/.openclaw/config/runtime.json" +findings: + - severity: WARN + check: SPAWN_DEPTH_HIGH + detail: 
"maxSpawnDepth is set to 6; runaway delegation becomes harder to reason about." + suggestion: "Lower maxSpawnDepth to 2 or 3 unless you have a strong reason to exceed it." + file_path: "/Users/you/.openclaw/config/runtime.json" + detected_at: "2026-03-30T14:20:05.000000" + resolved: false + - severity: WARN + check: LARGE_AGENT_FLEET + detail: "Detected 12 configured agents in a flat fleet." + suggestion: "Consolidate overlapping roles or create stronger role boundaries before scaling further." + file_path: "" + detected_at: "2026-03-30T14:20:05.000000" + resolved: false + - severity: INFO + check: SPAWN_TOOL_PRESENT + detail: "Spawn tooling appears to be exposed in the current configuration." + suggestion: "No action required." + file_path: "" + detected_at: "2026-03-30T14:20:05.000000" + resolved: false +audit_history: + - audited_at: "2026-03-30T14:20:05.000000" + fail_count: 0 + warn_count: 2 + info_count: 1 + file_count: 2 +# ── Walkthrough ────────────────────────────────────────────────────────────── +# python3 audit.py --audit +# +# Subagent Capability Auditor +# ─────────────────────────────────────────────────────── +# 2 config files | 0 FAIL | 2 WARN | 1 INFO +# +# WARN SPAWN_DEPTH_HIGH +# maxSpawnDepth is set to 6; runaway delegation becomes harder to reason about. diff --git a/skills/openclaw-native/upgrade-rollback-manager/SKILL.md b/skills/openclaw-native/upgrade-rollback-manager/SKILL.md new file mode 100644 index 0000000..48f6f25 --- /dev/null +++ b/skills/openclaw-native/upgrade-rollback-manager/SKILL.md @@ -0,0 +1,61 @@ +--- +name: upgrade-rollback-manager +version: "1.0" +category: openclaw-native +description: Snapshots OpenClaw config and state before upgrades, records version fingerprints, and writes rollback instructions so runtime updates stop being one-way bets.
+stateful: true +--- + +# Upgrade Rollback Manager + +## What it does + +OpenClaw upgrades can change config shape, break skill assumptions, or leave the runtime in a half-working state. Upgrade Rollback Manager takes a snapshot before the change, records what version and config you upgraded from, and writes a rollback brief you can use if the new runtime goes sideways. + +## When to invoke + +- Immediately before upgrading OpenClaw +- Before changing runtime config in a way that is hard to reverse +- After an upgrade, to compare the current runtime against the last known-good snapshot +- When users report "it worked before the update" and you need a rollback path + +## What it records + +- Detected OpenClaw version +- Timestamped snapshot directory +- Copies of important config files and directories +- A list of preserved paths +- Human-readable rollback instructions + +## How to use + +```bash +python3 manage.py --snapshot +python3 manage.py --snapshot --label before-1-5-upgrade +python3 manage.py --status +python3 manage.py --list +python3 manage.py --rollback-plan before-1-5-upgrade +python3 manage.py --status --format json +``` + +## Safety model + +This skill does not automatically downgrade or overwrite the runtime. It prepares a rollback kit: + +- snapshot files +- version/config metadata +- exact restore steps + +The operator still chooses when to execute the rollback. + +## Difference from deployment-preflight + +`deployment-preflight` tells you whether a deployment is wired safely. + +`upgrade-rollback-manager` preserves a way back before you change that deployment. + +## State + +State file: `~/.openclaw/skill-state/upgrade-rollback-manager/state.yaml` + +Fields: `last_snapshot_at`, `latest_snapshot`, `snapshots`, `rollback_history`.
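Because the rollback kit is inert by design, the operator still has to judge whether restoring it would change anything. A minimal sketch of that pre-rollback check, assuming the snapshot layout `manage.py --snapshot` produces (files copied into the snapshot directory under their original names); the `config_drift` helper is illustrative, not part of the skill:

```python
from pathlib import Path


def config_drift(snapshot_dir: Path, live_config: Path) -> bool:
    """Return True when the live config differs from the preserved copy.

    Illustrative only: assumes each preserved file sits in snapshot_dir
    under its original filename, as manage.py --snapshot lays it out.
    """
    preserved = snapshot_dir / live_config.name
    if not preserved.exists():
        return True  # nothing preserved to compare against; treat as drift
    # Byte comparison is enough here; both files are small config documents.
    return preserved.read_bytes() != live_config.read_bytes()
```

If this returns `False` for every preserved file, executing the rollback would be a no-op and the problem likely lies elsewhere (e.g. the runtime binary itself).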
diff --git a/skills/openclaw-native/upgrade-rollback-manager/STATE_SCHEMA.yaml b/skills/openclaw-native/upgrade-rollback-manager/STATE_SCHEMA.yaml new file mode 100644 index 0000000..a130908 --- /dev/null +++ b/skills/openclaw-native/upgrade-rollback-manager/STATE_SCHEMA.yaml @@ -0,0 +1,28 @@ +version: "1.0" +description: Upgrade snapshots, preserved config metadata, and rollback planning history. +fields: + last_snapshot_at: + type: datetime + latest_snapshot: + type: object + items: + label: { type: string } + snapshot_dir: { type: string } + openclaw_version: { type: string } + created_at: { type: datetime } + files: { type: list, items: { type: string } } + snapshots: + type: list + items: + label: { type: string } + snapshot_dir: { type: string } + openclaw_version: { type: string } + created_at: { type: datetime } + files: { type: list, items: { type: string } } + rollback_history: + type: list + description: Rolling history of generated rollback plans (last 12) + items: + generated_at: { type: datetime } + label: { type: string } + snapshot_dir: { type: string } diff --git a/skills/openclaw-native/upgrade-rollback-manager/example-state.yaml b/skills/openclaw-native/upgrade-rollback-manager/example-state.yaml new file mode 100644 index 0000000..3cc0abc --- /dev/null +++ b/skills/openclaw-native/upgrade-rollback-manager/example-state.yaml @@ -0,0 +1,39 @@ +# Example runtime state for upgrade-rollback-manager +last_snapshot_at: "2026-03-31T08:45:00.000000" +latest_snapshot: + label: "before-1-5-upgrade" + snapshot_dir: "/Users/you/.openclaw/rollback-snapshots/20260331-084500-before-1-5-upgrade" + openclaw_version: "1.4.3" + created_at: "2026-03-31T08:45:00.000000" + files: + - "/Users/you/.openclaw/openclaw.json" + - "/Users/you/.openclaw/extensions" + - "/Users/you/.openclaw/workspace" +snapshots: + - label: "before-1-5-upgrade" + snapshot_dir: "/Users/you/.openclaw/rollback-snapshots/20260331-084500-before-1-5-upgrade" + openclaw_version: "1.4.3" + 
created_at: "2026-03-31T08:45:00.000000" + files: + - "/Users/you/.openclaw/openclaw.json" + - "/Users/you/.openclaw/extensions" + - "/Users/you/.openclaw/workspace" +rollback_history: + - generated_at: "2026-03-31T08:46:12.000000" + label: "before-1-5-upgrade" + snapshot_dir: "/Users/you/.openclaw/rollback-snapshots/20260331-084500-before-1-5-upgrade" +# ── Walkthrough ────────────────────────────────────────────────────────────── +# python3 manage.py --snapshot --label before-1-5-upgrade +# +# Upgrade Rollback Manager +# ─────────────────────────────────────────────────────── +# Latest: before-1-5-upgrade +# Version: 1.4.3 +# Snapshot dir: /Users/you/.openclaw/rollback-snapshots/20260331-084500-before-1-5-upgrade +# Files preserved: 3 +# +# python3 manage.py --rollback-plan before-1-5-upgrade +# +# Rollback plan for before-1-5-upgrade +# ─────────────────────────────────────────────────────── +# 1. Stop the OpenClaw gateway and background workers. +# 2. Restore preserved files from /Users/you/.openclaw/rollback-snapshots/20260331-084500-before-1-5-upgrade. +# 3. Reinstall or restart the previous runtime version: 1.4.3. +# 4. Re-run deployment-preflight and runtime-verification-dashboard before reopening automation. diff --git a/skills/openclaw-native/upgrade-rollback-manager/manage.py b/skills/openclaw-native/upgrade-rollback-manager/manage.py new file mode 100755 index 0000000..d04372e --- /dev/null +++ b/skills/openclaw-native/upgrade-rollback-manager/manage.py @@ -0,0 +1,205 @@ +#!/usr/bin/env python3 +from __future__ import annotations + +""" +Upgrade Rollback Manager for openclaw-superpowers. + +Creates pre-upgrade snapshots of important OpenClaw paths and prints rollback +instructions from recorded snapshots.
+""" + +import argparse +import json +import os +import shutil +import subprocess +from datetime import datetime +from pathlib import Path + +try: + import yaml + HAS_YAML = True +except ImportError: + HAS_YAML = False + +OPENCLAW_DIR = Path(os.environ.get("OPENCLAW_HOME", Path.home() / ".openclaw")) +STATE_FILE = OPENCLAW_DIR / "skill-state" / "upgrade-rollback-manager" / "state.yaml" +SNAPSHOT_ROOT = OPENCLAW_DIR / "rollback-snapshots" +MAX_HISTORY = 12 +PRESERVED_PATHS = [ + OPENCLAW_DIR / "openclaw.json", + OPENCLAW_DIR / "config", + OPENCLAW_DIR / "extensions", + OPENCLAW_DIR / "workspace", +] + + +def default_state() -> dict: + return { + "last_snapshot_at": "", + "latest_snapshot": {}, + "snapshots": [], + "rollback_history": [], + } + + +def load_state() -> dict: + if not STATE_FILE.exists(): + return default_state() + try: + text = STATE_FILE.read_text() + if HAS_YAML: + return yaml.safe_load(text) or default_state() + return json.loads(text) + except Exception: + return default_state() + + +def save_state(state: dict) -> None: + STATE_FILE.parent.mkdir(parents=True, exist_ok=True) + if HAS_YAML: + with open(STATE_FILE, "w") as handle: + yaml.dump(state, handle, default_flow_style=False, allow_unicode=True, sort_keys=False) + else: + STATE_FILE.write_text(json.dumps(state, indent=2)) + + +def detect_openclaw_version() -> str: + if shutil.which("openclaw") is None: + return "unknown" + try: + proc = subprocess.run( + ["openclaw", "--version"], + capture_output=True, + text=True, + timeout=5, + check=False, + ) + except Exception: + return "unknown" + if proc.returncode != 0: + return "unknown" + return proc.stdout.strip() or "unknown" + + +def snapshot(label: str | None) -> dict: + timestamp = datetime.now().strftime("%Y%m%d-%H%M%S") + label = label or "pre-upgrade" + snapshot_dir = SNAPSHOT_ROOT / f"{timestamp}-{label}" + snapshot_dir.mkdir(parents=True, exist_ok=True) + + copied = [] + for path in PRESERVED_PATHS: + if not path.exists(): + continue + 
target = snapshot_dir / path.name + if path.is_dir(): + shutil.copytree(path, target, dirs_exist_ok=True) + else: + shutil.copy2(path, target) + copied.append(str(path)) + + entry = { + "label": label, + "snapshot_dir": str(snapshot_dir), + "openclaw_version": detect_openclaw_version(), + "created_at": datetime.now().isoformat(), + "files": copied, + } + + state = load_state() + snapshots = state.get("snapshots") or [] + snapshots.insert(0, entry) + state["last_snapshot_at"] = entry["created_at"] + state["latest_snapshot"] = entry + state["snapshots"] = snapshots[:MAX_HISTORY] + save_state(state) + return state + + +def generate_plan(state: dict, label: str) -> dict: + snapshots = state.get("snapshots") or [] + match = next((item for item in snapshots if item.get("label") == label), None) + if not match: + raise SystemExit(f"No snapshot found for label '{label}'") + history = state.get("rollback_history") or [] + history.insert( + 0, + { + "generated_at": datetime.now().isoformat(), + "label": match["label"], + "snapshot_dir": match["snapshot_dir"], + }, + ) + state["rollback_history"] = history[:MAX_HISTORY] + save_state(state) + return match + + +def print_status(state: dict) -> None: + latest = state.get("latest_snapshot") or {} + print("\nUpgrade Rollback Manager") + print("───────────────────────────────────────────────────────") + if not latest: + print(" No snapshots recorded.") + return + print(f" Latest: {latest.get('label', '')}") + print(f" Version: {latest.get('openclaw_version', '')}") + print(f" Snapshot dir: {latest.get('snapshot_dir', '')}") + print(f" Files preserved: {len(latest.get('files', []))}") + + +def print_list(state: dict) -> None: + snapshots = state.get("snapshots") or [] + if not snapshots: + print("No snapshots recorded.") + return + print("\nSnapshots") + print("───────────────────────────────────────────────────────") + for item in snapshots: + print(f" {item['label']} {item['openclaw_version']} {item['created_at'][:19]}") + + +def 
print_plan(snapshot_entry: dict) -> None: + print(f"\nRollback plan for {snapshot_entry['label']}") + print("───────────────────────────────────────────────────────") + print("1. Stop the OpenClaw gateway and background workers.") + print(f"2. Restore preserved files from {snapshot_entry['snapshot_dir']}.") + print(f"3. Reinstall or restart the previous runtime version: {snapshot_entry['openclaw_version']}.") + print("4. Re-run deployment-preflight and runtime-verification-dashboard before reopening automation.") + + +def main() -> None: + parser = argparse.ArgumentParser(description="Create rollback snapshots before upgrades") + group = parser.add_mutually_exclusive_group(required=True) + group.add_argument("--snapshot", action="store_true", help="Create a rollback snapshot") + group.add_argument("--status", action="store_true", help="Show the latest snapshot summary") + group.add_argument("--list", action="store_true", help="List recent snapshots") + group.add_argument("--rollback-plan", metavar="LABEL", help="Print rollback instructions for a snapshot label") + parser.add_argument("--label", help="Optional snapshot label") + parser.add_argument("--format", choices=["human", "json"], default="human") + args = parser.parse_args() + + if args.snapshot: + state = snapshot(args.label) + else: + state = load_state() + + if args.format == "json": + if args.rollback_plan: + print(json.dumps(generate_plan(state, args.rollback_plan), indent=2)) + else: + print(json.dumps(state, indent=2)) + return + + if args.status or args.snapshot: + print_status(state) + elif args.list: + print_list(state) + elif args.rollback_plan: + plan = generate_plan(state, args.rollback_plan) + print_plan(plan) + + +if __name__ == "__main__": + main()
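Because both skills persist machine-readable state, other automation can consume it: a pre-upgrade hook could, for example, refuse to proceed unless a recent snapshot exists. A sketch against the state shape shown in example-state.yaml (as also emitted by `manage.py --status --format json`); the `snapshot_is_fresh` name and the 24-hour policy are illustrative, not part of the skill:

```python
from datetime import datetime, timedelta


def snapshot_is_fresh(state: dict, max_age_hours: int = 24) -> bool:
    """Gate an upgrade on having a recent rollback snapshot.

    Sketch only: reads the `latest_snapshot.created_at` field from the
    state document that upgrade-rollback-manager writes.
    """
    latest = state.get("latest_snapshot") or {}
    created = latest.get("created_at", "")
    if not created:
        return False  # no snapshot recorded at all
    age = datetime.now() - datetime.fromisoformat(created)
    return age <= timedelta(hours=max_age_hours)
```

A wrapper script could run `manage.py --snapshot` automatically whenever this check fails, then re-check before letting the upgrade continue.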