ArchieIndian · Mar 29, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,15 @@
 # Changelog
 
+## [0.2.0] - 2026-03-29
+
+### Added
+- Runtime reliability skills: `runtime-verification-dashboard`, `deployment-preflight`, `session-reset-recovery`, `cron-execution-prover`, `message-delivery-verifier`, `subagent-capability-auditor`, `upgrade-rollback-manager`, and `mcp-auth-lifecycle-manager`
+- Operational playbooks in `docs/OPERATIONS.md`
+
+### Changed
+- README and contributor guidance now reflect the expanded operational skill set and validation workflow
+- Recent Python helpers now load and save state through the new shared `scripts/state_helpers.py` instead of repeating that code
+
 ## [0.1.0] - 2026-03-15
 
 ### Added

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -6,7 +6,7 @@ We'd love your skills! Here's how to contribute.
 
 1. **Propose your idea** — [Open a Skill Proposal issue](../../issues/new?template=skill-proposal.yml) to get feedback
 2. **Create the skill** — Use the `create-skill` superpower or copy the [template](skills/core/create-skill/TEMPLATE.md)
-3. **Validate locally** — Run `./scripts/validate-skills.sh` to catch issues
+3. **Validate locally** — Run `./scripts/validate-skills.sh` and `bash ./tests/test-runner.sh`
 4. **Submit a PR** — CI validates automatically on any PR that touches `skills/`
 
 ## Where to Put Your Skill
@@ -46,7 +46,15 @@ Run the validation script before submitting:
 ./scripts/validate-skills.sh
 ```
 
-It checks: frontmatter format, naming conventions, file structure, line count, stateful skill coherence (`STATE_SCHEMA.yaml` present when `stateful: true`), and cron expression format.
+It checks: frontmatter format, naming conventions, file structure, line count, stateful skill coherence (`STATE_SCHEMA.yaml` present when `stateful: true`), cron expression format, and README inventory metrics.
+
+Run the repository smoke tests too:
+
+```bash
+bash ./tests/test-runner.sh
+```
+
+If you are adding or updating a stateful helper script, prefer reusing the shared helpers in `scripts/state_helpers.py` instead of open-coding the same YAML/JSON state loader again.
 
 ## Pull Requests
 

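The stateful coherence rule in the CONTRIBUTING.md hunk above (`STATE_SCHEMA.yaml` present when `stateful: true`) can be pictured with a small Python sketch. This is not the logic of `scripts/validate-skills.sh`, which this diff does not show. The function name `check_stateful_coherence` and the simplified line-based frontmatter parsing are assumptions made for illustration only.

```python
from pathlib import Path
import tempfile


def check_stateful_coherence(skill_dir: Path) -> list[str]:
    """Return a list of problems for one skill directory (hypothetical helper)."""
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.exists():
        return [f"{skill_dir.name}: SKILL.md is missing"]

    lines = skill_md.read_text(encoding="utf-8").splitlines()
    stateful = False
    # Assumption: frontmatter sits between the first two '---' lines.
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break
            key, _, value = line.partition(":")
            if key.strip() == "stateful" and value.strip().lower() == "true":
                stateful = True

    if stateful and not (skill_dir / "STATE_SCHEMA.yaml").exists():
        return [f"{skill_dir.name}: stateful: true but STATE_SCHEMA.yaml is missing"]
    return []


# Demo against a throwaway directory so nothing in a real checkout is touched.
with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "demo-skill"
    skill.mkdir()
    (skill / "SKILL.md").write_text(
        "---\nname: demo-skill\nstateful: true\n---\nBody\n", encoding="utf-8"
    )
    problems_before = check_stateful_coherence(skill)
    (skill / "STATE_SCHEMA.yaml").write_text("version: 1\n", encoding="utf-8")
    problems_after = check_stateful_coherence(skill)

print(problems_before)
print(problems_after)
```

The real validator also covers naming, line count, cron format, and README inventory metrics; the sketch only mirrors the one rule that ties a frontmatter flag to a required file.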
diff --git a/README.md b/README.md
@@ -61,6 +61,22 @@ openclaw gateway restart
 
 Install `PyYAML` before using the stateful Python helpers: `python3 -m pip install PyYAML`.
 
+For operator workflows and rollout order, see [docs/OPERATIONS.md](docs/OPERATIONS.md).
+
+---
+
+## Start Here
+
+If you plan to run the repo unattended, adopt the skills in this order:
+
+1. Run `deployment-preflight` before the first install or after Docker/compose changes.
+2. Install the repo and run `runtime-verification-dashboard` once the runtime is live.
+3. Wrap scheduled work with `cron-execution-prover` and `message-delivery-verifier`.
+4. Add `session-reset-recovery` and `upgrade-rollback-manager` before relying on overnight or upgrade-heavy automation.
+5. If you use MCP servers, pair `mcp-health-checker` with `mcp-auth-lifecycle-manager`.
+
+This gives you deployment safety, runtime visibility, proof of execution, proof of delivery, reset survival, rollback coverage, and MCP auth coverage without enabling every skill at once.
+
 ---
 
 ## Skills included
@@ -259,6 +275,9 @@ Skills marked with a script ship a small executable alongside their `SKILL.md`:
 **Self-hosted or Docker deployment**
 > Run `deployment-preflight` before the first rollout or after compose changes to catch missing mounts, missing bootstrap files, and public gateway exposure. Follow it with `runtime-verification-dashboard` once the runtime is live.
 
+**Operators building a reliability stack**
+> Use the playbooks in [docs/OPERATIONS.md](docs/OPERATIONS.md) to layer deployment safety, cron proofing, delivery verification, reset recovery, upgrade rollback, and MCP auth checks in a sane order.
+
 **Open-source maintainer**
 > `community-skill-radar` scans Reddit for pain points automatically. `skill-vetting` catches malicious community contributions before they're installed. `installed-skill-auditor` detects post-install tampering.
 

diff --git a/docs/OPERATIONS.md b/docs/OPERATIONS.md
@@ -0,0 +1,68 @@
+# Operational Playbooks
+
+This repo has enough always-on skills now that the useful question is no longer "which skills exist?" but "which ones should I turn on together?"
+
+Use these playbooks as a rollout order.
+
+## 1. First deployment
+
+Use this when bringing up OpenClaw on a laptop, server, or Docker host for the first time.
+
+1. Run `deployment-preflight` before install or after any compose change.
+2. Install `openclaw-superpowers`.
+3. Run `runtime-verification-dashboard` once the runtime is live.
+4. Fix any missing mounts, missing bootstrap files, missing cron registrations, or state path issues before enabling unattended workflows.
+
+Why this order:
+- `deployment-preflight` catches layout and exposure mistakes before the runtime starts.
+- `runtime-verification-dashboard` catches post-install drift inside the live runtime.
+
+## 2. Scheduled workflow with proof
+
+Use this when a cron workflow writes files, posts a report, or notifies a human.
+
+1. Wrap the workflow in `cron-execution-prover`.
+2. Track the last-mile notification in `message-delivery-verifier`.
+3. Review stale executions and stale deliveries before trusting the automation.
+
+Why this order:
+- `cron-execution-prover` proves the job started and finished.
+- `message-delivery-verifier` proves the output was actually sent and acknowledged.
+
+## 3. Overnight continuity
+
+Use this when long-running work regularly crosses the session reset window.
+
+1. Enable `session-reset-recovery`.
+2. Pair it with `task-handoff` for tasks that may span multiple sessions or agents.
+3. Review `resume_brief` output after restart before resuming work.
+
+Why this order:
+- `session-reset-recovery` preserves the active checkpoint.
+- `task-handoff` keeps the next operator or session from restarting blind.
+
+## 4. Safer upgrades
+
+Use this before changing OpenClaw versions, config structure, or deployment layout.
+
+1. Run `upgrade-rollback-manager --snapshot`.
+2. Apply the upgrade.
+3. Re-run `deployment-preflight`.
+4. Re-run `runtime-verification-dashboard`.
+5. If something regresses, generate rollback instructions with `upgrade-rollback-manager --rollback-plan <label>`.
+
+Why this order:
+- Take the snapshot first so a pre-upgrade state exists to roll back to.
+- Then verify both the deployment surface and the live runtime after the change.
+
+## 5. MCP-dependent automation
+
+Use this when OpenClaw depends on GitHub, Linear, filesystem, browser, or other MCP servers.
+
+1. Use `mcp-health-checker` to verify transport reachability.
+2. Use `mcp-auth-lifecycle-manager` to verify token expiry, env vars, and refresh readiness.
+3. Avoid unattended dependency on MCP servers that still require interactive re-authentication.
+
+Why this order:
+- Reachability and auth are different failure modes.
+- A healthy server can still be unusable if the auth path is broken.
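
Playbook 2 in docs/OPERATIONS.md asks operators to review stale executions before trusting a cron workflow, and `prove.py` carries a `STALE_AFTER_MINUTES = 60` constant. A minimal sketch of what such a review could compute is given here. The record fields `run_id`, `started_at`, and `finished_at` and the function `find_stale_runs` are hypothetical; the actual state schema of `cron-execution-prover` is not shown in this diff.

```python
from datetime import datetime, timedelta

STALE_AFTER_MINUTES = 60  # same value as the constant visible in prove.py


def find_stale_runs(runs: list[dict], now: datetime) -> list[dict]:
    """Return runs that started but have not finished within the stale window."""
    cutoff = now - timedelta(minutes=STALE_AFTER_MINUTES)
    stale = []
    for run in runs:
        if run.get("finished_at"):
            continue  # a finished run is never stale
        started = datetime.fromisoformat(run["started_at"])
        if started < cutoff:
            stale.append(run)
    return stale


# Fixed clock and hand-written records keep the demo deterministic.
now = datetime(2026, 3, 29, 12, 0, 0)
runs = [
    {"run_id": "a", "started_at": "2026-03-29T10:30:00", "finished_at": "2026-03-29T10:31:00"},
    {"run_id": "b", "started_at": "2026-03-29T10:45:00", "finished_at": None},
    {"run_id": "c", "started_at": "2026-03-29T11:30:00", "finished_at": None},
]
stale_ids = [run["run_id"] for run in find_stale_runs(runs, now)]
print(stale_ids)
```

Only run `b` is reported: it started more than 60 minutes before `now` and never recorded a finish. Run `c` is still inside the window, so it is left alone.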
diff --git a/scripts/state_helpers.py b/scripts/state_helpers.py
@@ -0,0 +1,68 @@
+from __future__ import annotations
+
+import copy
+import json
+import os
+from datetime import datetime
+from pathlib import Path
+
+try:
+    import yaml
+
+    HAS_YAML = True
+except ImportError:
+    HAS_YAML = False
+
+
+def openclaw_dir() -> Path:
+    return Path(os.environ.get("OPENCLAW_HOME", Path.home() / ".openclaw"))
+
+
+def skill_state_file(skill_name: str, filename: str = "state.yaml") -> Path:
+    return openclaw_dir() / "skill-state" / skill_name / filename
+
+
+def now_iso(timespec: str = "seconds") -> str:
+    return datetime.now().isoformat(timespec=timespec)
+
+
+def _default_value(default_factory):
+    if callable(default_factory):
+        return default_factory()
+    return copy.deepcopy(default_factory)
+
+
+def load_state(path: Path, default_factory) -> dict:
+    default_value = _default_value(default_factory)
+    if not path.exists():
+        return default_value
+    try:
+        text = path.read_text(encoding="utf-8")
+        if HAS_YAML:
+            return yaml.safe_load(text) or default_value
+        return json.loads(text)
+    except Exception:
+        return default_value
+
+
+def save_state(path: Path, state: dict) -> None:
+    path.parent.mkdir(parents=True, exist_ok=True)
+    if HAS_YAML:
+        with open(path, "w", encoding="utf-8") as handle:
+            yaml.dump(state, handle, default_flow_style=False, allow_unicode=True, sort_keys=False)
+    else:
+        path.write_text(json.dumps(state, indent=2), encoding="utf-8")
+
+
+def load_structured(path: Path, default_factory=dict):
+    if not path.exists():
+        return _default_value(default_factory)
+    try:
+        text = path.read_text(encoding="utf-8")
+        if path.suffix == ".json":
+            return json.loads(text)
+        if HAS_YAML:
+            return yaml.safe_load(text) or _default_value(default_factory)
+    except Exception:
+        pass
+    return _default_value(default_factory)
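
Two behaviours of the new helpers are easy to miss when reading the diff, so a small standalone sketch may help. `_default_value` is copied from the hunk above. The `template` dict and the fixed `moment` timestamp are made up for the demonstration.

```python
import copy
from datetime import datetime


def _default_value(default_factory):
    # Same logic as the helper in this diff: call factories, deep-copy plain values.
    if callable(default_factory):
        return default_factory()
    return copy.deepcopy(default_factory)


# 1. A plain dict used as the default is deep-copied, so callers cannot
#    accidentally mutate the shared template.
template = {"runs": []}
first = _default_value(template)
first["runs"].append("run-1")
second = _default_value(template)

# A callable (here the built-in dict) is invoked instead of copied.
from_factory = _default_value(dict)

# 2. now_iso() defaults to timespec="seconds", while a bare isoformat()
#    keeps microseconds when they are non-zero.
moment = datetime(2026, 3, 29, 8, 15, 30, 123456)
with_seconds = moment.isoformat(timespec="seconds")
with_default = moment.isoformat()

print(second, from_factory)
print(with_seconds, with_default)
```

The second point matters for `deployment-preflight/check.py`, where `datetime.now().isoformat()` is replaced by `now_iso()`: newly written `detected_at` and history timestamps drop their fractional seconds.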
diff --git a/skills/openclaw-native/cron-execution-prover/prove.py b/skills/openclaw-native/cron-execution-prover/prove.py
@@ -10,18 +10,19 @@
 
 import argparse
 import json
-import os
+import sys
 from datetime import datetime
 from pathlib import Path
 
-try:
-    import yaml
-    HAS_YAML = True
-except ImportError:
-    HAS_YAML = False
+REPO_ROOT = Path(__file__).resolve().parents[3]
+SCRIPTS_DIR = REPO_ROOT / "scripts"
+if str(SCRIPTS_DIR) not in sys.path:
+    sys.path.insert(0, str(SCRIPTS_DIR))
 
-OPENCLAW_DIR = Path(os.environ.get("OPENCLAW_HOME", Path.home() / ".openclaw"))
-STATE_FILE = OPENCLAW_DIR / "skill-state" / "cron-execution-prover" / "state.yaml"
+from state_helpers import load_state as load_state_file
+from state_helpers import now_iso, save_state as save_state_file, skill_state_file
+
+STATE_FILE = skill_state_file("cron-execution-prover")
 MAX_RUNS = 100
 MAX_HISTORY = 12
 STALE_AFTER_MINUTES = 60
@@ -37,28 +38,11 @@ def default_state() -> dict:
 
 
 def load_state() -> dict:
-    if not STATE_FILE.exists():
-        return default_state()
-    try:
-        text = STATE_FILE.read_text()
-        if HAS_YAML:
-            return yaml.safe_load(text) or default_state()
-        return json.loads(text)
-    except Exception:
-        return default_state()
+    return load_state_file(STATE_FILE, default_state)
 
 
 def save_state(state: dict) -> None:
-    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
-    if HAS_YAML:
-        with open(STATE_FILE, "w") as handle:
-            yaml.dump(state, handle, default_flow_style=False, allow_unicode=True, sort_keys=False)
-    else:
-        STATE_FILE.write_text(json.dumps(state, indent=2))
-
-
-def now_iso() -> str:
-    return datetime.now().isoformat(timespec="seconds")
+    save_state_file(STATE_FILE, state)
 
 
 def find_run(state: dict, skill: str, run_id: str) -> dict | None:

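Both migrated scripts locate the shared module through `Path(__file__).resolve().parents[3]`. The sketch below walks the parent chain for one of them to show why index 3 is the repository root. The checkout location `/srv/openclaw-superpowers` is a hypothetical example; only the repo-relative part of the path comes from the diff.

```python
from pathlib import PurePosixPath

# Hypothetical absolute path; the part after the checkout directory matches the diff.
script = PurePosixPath(
    "/srv/openclaw-superpowers/skills/openclaw-native/cron-execution-prover/prove.py"
)

# parents[0] is the skill folder, [1] the category, [2] skills/, [3] the repo root.
for depth in range(4):
    print(depth, script.parents[depth])

repo_root = script.parents[3]
scripts_dir = repo_root / "scripts"
print(scripts_dir)
```

The import therefore depends on the skill staying three directories below the repository root. A skill script copied elsewhere without the `scripts/` directory would fail at `from state_helpers import ...`.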
diff --git a/skills/openclaw-native/deployment-preflight/check.py b/skills/openclaw-native/deployment-preflight/check.py
@@ -23,17 +23,18 @@
 import shutil
 import subprocess
 import sys
-from datetime import datetime
 from pathlib import Path
 
-try:
-    import yaml
-    HAS_YAML = True
-except ImportError:
-    HAS_YAML = False
+REPO_ROOT = Path(__file__).resolve().parents[3]
+SCRIPTS_DIR = REPO_ROOT / "scripts"
+if str(SCRIPTS_DIR) not in sys.path:
+    sys.path.insert(0, str(SCRIPTS_DIR))
 
-OPENCLAW_DIR = Path(os.environ.get("OPENCLAW_HOME", Path.home() / ".openclaw"))
-STATE_FILE = OPENCLAW_DIR / "skill-state" / "deployment-preflight" / "state.yaml"
+from state_helpers import HAS_YAML, load_state as load_state_file
+from state_helpers import now_iso, openclaw_dir, save_state as save_state_file, skill_state_file
+
+OPENCLAW_DIR = openclaw_dir()
+STATE_FILE = skill_state_file("deployment-preflight")
 WORKSPACE_DIR = Path(os.environ.get("OPENCLAW_WORKSPACE", OPENCLAW_DIR / "workspace"))
 SUPERPOWERS_PATH = OPENCLAW_DIR / "extensions" / "superpowers"
 MAX_HISTORY = 12
@@ -61,24 +62,11 @@ def default_state() -> dict:
 
 
 def load_state() -> dict:
-    if not STATE_FILE.exists():
-        return default_state()
-    try:
-        text = STATE_FILE.read_text()
-        if HAS_YAML:
-            return yaml.safe_load(text) or default_state()
-        return json.loads(text)
-    except Exception:
-        return default_state()
+    return load_state_file(STATE_FILE, default_state)
 
 
 def save_state(state: dict) -> None:
-    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
-    if HAS_YAML:
-        with open(STATE_FILE, "w") as handle:
-            yaml.dump(state, handle, default_flow_style=False, allow_unicode=True, sort_keys=False)
-    else:
-        STATE_FILE.write_text(json.dumps(state, indent=2))
+    save_state_file(STATE_FILE, state)
 
 
 def finding(severity: str, check: str, detail: str, suggestion: str, file_path: Path | str = "") -> dict:
@@ -88,7 +76,7 @@ def finding(severity: str, check: str, detail: str, suggestion: str, file_path:
         "detail": detail,
         "suggestion": suggestion,
         "file_path": str(file_path),
-        "detected_at": datetime.now().isoformat(),
+        "detected_at": now_iso(),
         "resolved": False,
     }
 
@@ -401,7 +389,7 @@ def run_check(root: Path) -> dict:
 
     state = load_state()
     history = state.get("check_history") or []
-    now = datetime.now().isoformat()
+    now = now_iso()
     history.insert(
         0,
         {