Add tool-description-optimizer skill (#30)

ArchieIndian · claude · web-flow · commit 42e19317ca20 · 2026-03-16T00:53:07.000+05:30
Scores skill descriptions for trigger quality across 5 dimensions:
clarity, specificity, keyword density, uniqueness, and length.
Grades A-F with concrete rewrite suggestions. Companion script:
optimize.py with --scan, --skill, --suggest, --compare, --status.

Inspired by OpenLobster's tool-description scoring layer.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
diff --git a/skills/openclaw-native/tool-description-optimizer/SKILL.md b/skills/openclaw-native/tool-description-optimizer/SKILL.md
@@ -0,0 +1,109 @@
+---
+name: tool-description-optimizer
+version: "1.0"
+category: openclaw-native
+description: Analyzes skill descriptions for trigger quality: scores clarity, specificity, keyword density, uniqueness, and length, then suggests rewrites that improve discovery accuracy.
+stateful: true
+---
+
+# Tool Description Optimizer
+
+## What it does
+
+A skill's description is its only discovery mechanism. If the description is vague, overlapping, or keyword-poor, the agent won't trigger it — or worse, will trigger the wrong skill. Tool Description Optimizer analyzes every installed skill's description for trigger quality and suggests concrete rewrites.
+
+Inspired by OpenLobster's tool-description scoring layer, which penalizes vague descriptions and rewards keyword-rich, action-specific ones.
+
+## When to invoke
+
+- After installing new skills — check if descriptions are trigger-ready
+- When a skill isn't firing when expected — diagnose whether the description is the problem
+- Periodically to audit all descriptions for quality drift
+- Before publishing a skill — polish the description for discoverability
+
+## How it works
+
+### Scoring dimensions (5 metrics, 0–10 each)
+
+| Metric | What it measures | Weight |
+|---|---|---|
+| Clarity | Single clear purpose, no ambiguity | 2x |
+| Specificity | Action verbs, concrete nouns vs. vague terms | 2x |
+| Keyword density | Trigger-relevant keywords per sentence | 1.5x |
+| Uniqueness | Low overlap with other installed skill descriptions | 1.5x |
+| Length | Optimal range (15–40 words) — too short = vague, too long = diluted | 1x |
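+
+The table gives weights but not the formula that combines them. Here is a minimal sketch, assuming `overall` is the weighted mean of the five metrics divided by the total weight (8.0), which keeps it on the 0-10 scale. `WEIGHTS` and `overall_score` are hypothetical names; the metric keys come from `STATE_SCHEMA.yaml`. This is not code from `optimize.py`.
+
+```python
+# Hypothetical: weights copied from the scoring table; the formula is an assumption.
+WEIGHTS = {
+    "clarity": 2.0,
+    "specificity": 2.0,
+    "keyword_density": 1.5,
+    "uniqueness": 1.5,
+    "length_score": 1.0,
+}
+
+
+def overall_score(metrics: dict[str, float]) -> float:
+    """Weighted mean of five 0-10 metrics, rounded to one decimal place."""
+    total_weight = sum(WEIGHTS.values())  # 2 + 2 + 1.5 + 1.5 + 1 = 8.0
+    weighted_sum = sum(weight * metrics[name] for name, weight in WEIGHTS.items())
+    return round(weighted_sum / total_weight, 1)
+
+
+print(overall_score({"clarity": 9.5, "specificity": 8.5, "keyword_density": 7.8,
+                     "uniqueness": 7.0, "length_score": 8.8}))  # 8.4
+```
+
+Given the description B scores from the `--compare` walkthrough in `example-state.yaml`, this sketch yields 8.4.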
+
+### Quality grades
+
+| Grade | Score range | Meaning |
+|---|---|---|
+| A | 8.0–10.0 | Excellent — high trigger accuracy expected |
+| B | 6.0–7.9 | Good — minor improvements possible |
+| C | 4.0–5.9 | Fair — likely to miss triggers or overlap |
+| D | 2.0–3.9 | Poor — needs rewrite |
+| F | 0.0–1.9 | Failing — will not trigger reliably |
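+
+The sketch below shows one way to map the composite to a letter and to implement the `--grade C` filter ("C or below"). It assumes each band's lower bound is inclusive, so a score such as 7.95, which falls between the printed ranges, lands in the lower band (B). `grade_for` and `at_or_below` are hypothetical helpers.
+
+```python
+# Hypothetical: lower bounds copied from the grade table, checked best-first.
+GRADE_BANDS = [(8.0, "A"), (6.0, "B"), (4.0, "C"), (2.0, "D")]
+GRADE_ORDER = "ABCDF"  # best to worst
+
+
+def grade_for(score: float) -> str:
+    """Return the letter grade for a 0-10 composite score."""
+    for lower_bound, letter in GRADE_BANDS:
+        if score >= lower_bound:
+            return letter
+    return "F"  # anything below 2.0
+
+
+def at_or_below(grade: str, cutoff: str) -> bool:
+    """True if `grade` is `cutoff` or worse, e.g. C, D, or F for cutoff C."""
+    return GRADE_ORDER.index(grade) >= GRADE_ORDER.index(cutoff)
+
+
+print([grade_for(s) for s in (8.4, 5.6, 2.8, 1.9)])  # ['A', 'C', 'D', 'F']
+```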
+
+## How to use
+
+```bash
+python3 optimize.py --scan                    # Score all installed skills
+python3 optimize.py --scan --grade C          # Only show skills graded C or below
+python3 optimize.py --skill <name>            # Deep analysis of a single skill
+python3 optimize.py --suggest <name>          # Generate rewrite suggestions
+python3 optimize.py --compare "desc A" "desc B"  # Compare two descriptions
+python3 optimize.py --status                  # Last scan summary
+python3 optimize.py --format json             # Machine-readable output
+```
+
+## Procedure
+
+**Step 1 — Run a full scan**
+
+```bash
+python3 optimize.py --scan
+```
+
+Review the scorecard. Focus on skills graded C or below — these are the ones most likely to cause trigger failures.
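+
+A scan starts by reading each skill's `description:` from its `SKILL.md` frontmatter. Here is a minimal stdlib-only sketch, assuming flat, single-line `key: value` frontmatter like the block at the top of this file. `read_description` is a hypothetical helper; multi-line or escaped values would need a real YAML parser.
+
+```python
+from typing import Optional
+
+
+def read_description(skill_md: str) -> Optional[str]:
+    """Return the frontmatter `description:` value, or None if there is none."""
+    lines = skill_md.splitlines()
+    if not lines or lines[0].strip() != "---":
+        return None  # file has no frontmatter block
+    for line in lines[1:]:
+        if line.strip() == "---":
+            break  # closing delimiter reached without a description
+        key, sep, value = line.partition(":")
+        if sep and key.strip() == "description":
+            # partition() splits on the first colon only, so colons in the value survive
+            return value.strip().strip('"')
+    return None
+
+
+sample = "---\nname: demo\ndescription: Scans config files for plaintext secrets.\n---\n# Demo\n"
+print(read_description(sample))  # Scans config files for plaintext secrets.
+```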
+
+**Step 2 — Get rewrite suggestions for low-scoring skills**
+
+```bash
+python3 optimize.py --suggest <skill-name>
+```
+
+The optimizer generates 2–3 alternative descriptions with predicted score improvements.
+
+**Step 3 — Compare alternatives**
+
+```bash
+python3 optimize.py --compare "original description" "suggested rewrite"
+```
+
+Side-by-side scoring shows exactly which metrics improved.
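+
+The per-metric verdict column in the `--compare` walkthrough in `example-state.yaml` amounts to one comparison per metric. Here is a minimal sketch. `compare_metrics` is a hypothetical name, and treating exactly equal scores as a tie (`=`) is an assumption drawn from that walkthrough.
+
+```python
+METRICS = ["clarity", "specificity", "keyword_density", "uniqueness", "length_score"]
+
+
+def compare_metrics(a: dict[str, float], b: dict[str, float]) -> list[tuple[str, str]]:
+    """Return (metric, winner) pairs, where winner is "A", "B", or "=" for a tie."""
+    verdicts = []
+    for metric in METRICS:
+        if a[metric] > b[metric]:
+            verdicts.append((metric, "A"))
+        elif b[metric] > a[metric]:
+            verdicts.append((metric, "B"))
+        else:
+            verdicts.append((metric, "="))
+    return verdicts
+
+
+# Scores taken from the --compare walkthrough in example-state.yaml
+a = {"clarity": 2.0, "specificity": 0.0, "keyword_density": 0.0, "uniqueness": 7.0, "length_score": 5.2}
+b = {"clarity": 9.5, "specificity": 8.5, "keyword_density": 7.8, "uniqueness": 7.0, "length_score": 8.8}
+for metric, winner in compare_metrics(a, b):
+    print(f"{metric:<16} {winner}")
+```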
+
+**Step 4 — Apply the best rewrite**
+
+Replace the `description:` field in the skill's `SKILL.md` frontmatter with the chosen rewrite. The optimizer never edits skill files, so this step is manual.
+
+## Vague word penalties
+
+These words and phrases score 0 on specificity because they say nothing actionable:
+
+`helps`, `manages`, `handles`, `deals with`, `works with`, `does stuff`, `various`, `things`, `general`, `misc`, `utility`, `tool for`, `assistant for`
+
+## Strong trigger keywords (examples)
+
+`scans`, `detects`, `validates`, `generates`, `audits`, `monitors`, `checks`, `reports`, `fixes`, `migrates`, `syncs`, `schedules`, `blocks`, `scores`, `diagnoses`
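+
+The two lists above can be matched mechanically. The sketch below only flags hits, matching each term as whole words and case-insensitively. This section does not say how hits translate into a 0-10 specificity score, so the sketch assumes no formula. `flag_terms` is a hypothetical helper. Because matching is whole-word, `helpful` does not count as `helps`.
+
+```python
+import re
+
+VAGUE = ["helps", "manages", "handles", "deals with", "works with", "does stuff", "various",
+         "things", "general", "misc", "utility", "tool for", "assistant for"]
+STRONG = ["scans", "detects", "validates", "generates", "audits", "monitors", "checks",
+          "reports", "fixes", "migrates", "syncs", "schedules", "blocks", "scores", "diagnoses"]
+
+
+def flag_terms(text: str) -> tuple[list[str], list[str]]:
+    """Return (strong keywords found, vague terms found) as whole-word, case-insensitive hits."""
+    lowered = text.lower()
+
+    def hits(terms: list[str]) -> list[str]:
+        return [t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", lowered)]
+
+    return hits(STRONG), hits(VAGUE)
+
+
+print(flag_terms("A helpful utility tool that manages various things"))
+# ([], ['manages', 'various', 'things', 'utility'])
+```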
+
+## State
+
+Scan results and per-skill scores are stored in `~/.openclaw/skill-state/tool-description-optimizer/state.yaml`.
+
+Fields: `last_scan_at`, `skill_scores` (per-skill scores from the most recent scan), and `scan_history` (rolling log of the last 20 scans).
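+
+`STATE_SCHEMA.yaml` says `scan_history` keeps only the last 20 scans, and `example-state.yaml` lists them newest first. Here is a minimal in-memory sketch of recording one scan under those rules. `record_scan` is a hypothetical helper, and the sketch omits writing the dict back to `state.yaml` (for example with PyYAML).
+
+```python
+from collections import Counter
+from datetime import datetime
+
+MAX_HISTORY = 20  # "Rolling log of past scans (last 20)"
+
+
+def record_scan(state: dict, skill_scores: list[dict]) -> dict:
+    """Store the latest per-skill scores and prepend a summary entry to scan_history."""
+    now = datetime.now().isoformat()
+    grades = Counter(s["grade"] for s in skill_scores)
+    avg = sum(s["overall"] for s in skill_scores) / len(skill_scores) if skill_scores else 0.0
+    entry = {
+        "scanned_at": now,
+        "skills_scanned": len(skill_scores),
+        "avg_score": round(avg, 1),
+        "grade_distribution": {g: grades.get(g, 0) for g in "ABCDF"},
+    }
+    state["last_scan_at"] = now
+    state["skill_scores"] = skill_scores
+    # Newest entry first; anything beyond the cap is dropped.
+    state["scan_history"] = ([entry] + state.get("scan_history", []))[:MAX_HISTORY]
+    return state
+
+
+state: dict = {}
+for _ in range(25):
+    record_scan(state, [{"skill_name": "demo", "overall": 8.5, "grade": "A"}])
+print(len(state["scan_history"]))  # 20
+```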
+
+## Notes
+
+- Does not modify any skill files — analysis and suggestions only
+- Uniqueness scoring uses Jaccard similarity against all other installed descriptions
+- Length scoring uses a bell curve centered at 25 words (optimal)
+- Rewrite suggestions are heuristic-based, not LLM-generated — deterministic and fast
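+
+The uniqueness and length notes above can be sketched with the standard library. The sketch rests on three assumptions, none confirmed by this commit: descriptions are compared as lower-cased word sets; uniqueness is 10 * (1 - the highest Jaccard similarity to any other installed description); and the bell curve is a Gaussian with a hypothetical width of 10 words. All function names are illustrative.
+
+```python
+import math
+import re
+
+
+def word_set(text: str) -> set[str]:
+    """Lower-cased words with punctuation stripped."""
+    return set(re.findall(r"[a-z0-9']+", text.lower()))
+
+
+def jaccard(a: str, b: str) -> float:
+    """Size of the intersection over size of the union of the word sets; 0.0 if both are empty."""
+    sa, sb = word_set(a), word_set(b)
+    union = sa | sb
+    return len(sa & sb) / len(union) if union else 0.0
+
+
+def uniqueness(desc: str, others: list[str]) -> float:
+    """10 for a description sharing no words with any other, 0 for an exact duplicate."""
+    if not others:
+        return 10.0
+    return round(10 * (1 - max(jaccard(desc, o) for o in others)), 1)
+
+
+def length_score(word_count: int, center: int = 25, width: float = 10.0) -> float:
+    """Gaussian bell curve that peaks at 10.0 when word_count == center."""
+    return round(10 * math.exp(-((word_count - center) ** 2) / (2 * width ** 2)), 1)
+
+
+print(uniqueness("scans config files", ["scans log files"]))  # 5.0
+print(length_score(25), length_score(15))  # 10.0 6.1
+```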
diff --git a/skills/openclaw-native/tool-description-optimizer/STATE_SCHEMA.yaml b/skills/openclaw-native/tool-description-optimizer/STATE_SCHEMA.yaml
@@ -0,0 +1,27 @@
+version: "1.0"
+description: Tool description quality scores, rewrite suggestions, and scan history.
+fields:
+  last_scan_at:
+    type: datetime
+  skill_scores:
+    type: list
+    description: Per-skill quality scores from the most recent scan
+    items:
+      skill_name:   { type: string }
+      description:  { type: string }
+      word_count:   { type: integer }
+      clarity:      { type: float, description: "0-10 clarity score" }
+      specificity:  { type: float, description: "0-10 specificity score" }
+      keyword_density: { type: float, description: "0-10 keyword density score" }
+      uniqueness:   { type: float, description: "0-10 uniqueness vs other skills" }
+      length_score: { type: float, description: "0-10 length optimality score" }
+      overall:      { type: float, description: "Weighted composite score" }
+      grade:        { type: string, description: "A/B/C/D/F" }
+  scan_history:
+    type: list
+    description: Rolling log of past scans (last 20)
+    items:
+      scanned_at:      { type: datetime }
+      skills_scanned:  { type: integer }
+      avg_score:       { type: float }
+      grade_distribution: { type: object, description: "Count per grade: A, B, C, D, F" }
diff --git a/skills/openclaw-native/tool-description-optimizer/example-state.yaml b/skills/openclaw-native/tool-description-optimizer/example-state.yaml
@@ -0,0 +1,94 @@
+# Example runtime state for tool-description-optimizer
+last_scan_at: "2026-03-16T14:00:05.221000"
+skill_scores:
+  - skill_name: using-superpowers
+    description: "Bootstrap — teaches the agent how to find and invoke skills"
+    word_count: 11
+    clarity: 7.2
+    specificity: 3.8
+    keyword_density: 3.3
+    uniqueness: 8.1
+    length_score: 4.8
+    overall: 5.6
+    grade: C
+  - skill_name: config-encryption-auditor
+    description: "Scans OpenClaw config directories for plaintext API keys, tokens, and secrets in unencrypted files."
+    word_count: 14
+    clarity: 9.2
+    specificity: 8.5
+    keyword_density: 8.0
+    uniqueness: 9.0
+    length_score: 7.5
+    overall: 8.5
+    grade: A
+  - skill_name: memory-graph-builder
+    description: "Parses MEMORY.md into a knowledge graph with typed relationships, detects duplicates and contradictions, and generates a compressed memory digest."
+    word_count: 19
+    clarity: 8.8
+    specificity: 7.6
+    keyword_density: 7.2
+    uniqueness: 9.4
+    length_score: 9.5
+    overall: 8.5
+    grade: A
+scan_history:
+  - scanned_at: "2026-03-16T14:00:05.221000"
+    skills_scanned: 40
+    avg_score: 7.2
+    grade_distribution:
+      A: 18
+      B: 14
+      C: 6
+      D: 2
+      F: 0
+  - scanned_at: "2026-03-13T14:00:00.000000"
+    skills_scanned: 36
+    avg_score: 6.8
+    grade_distribution:
+      A: 14
+      B: 12
+      C: 7
+      D: 3
+      F: 0
+# ── Walkthrough ──────────────────────────────────────────────────────────────
+# python3 optimize.py --scan
+#
+#   Tool Description Quality Scan — 2026-03-16
+#   ────────────────────────────────────────────────────────────
+#     40 skills scanned | avg score: 7.2
+#     Grades: 18xA  14xB  6xC  2xD  0xF
+#
+#     ! [D] 3.8 — some-vague-skill
+#          clarity=2.0 spec=1.5 kw=1.2 uniq=8.0 len=6.5
+#          "A helpful utility tool that manages various things..."
+#
+#     ~ [C] 5.6 — using-superpowers
+#          clarity=7.2 spec=3.8 kw=3.3 uniq=8.1 len=4.8
+#          "Bootstrap — teaches the agent how to find and invoke skills"
+#
+# python3 optimize.py --suggest using-superpowers
+#
+#   Rewrite Suggestions: using-superpowers
+#   ──────────────────────────────────────────────────
+#     Current: "Bootstrap — teaches the agent how to find and invoke skills"
+#     Score: 5.6 (C)
+#
+#     1. Front-load action verb
+#        "Teaches the agent how to discover, invoke, and chain installed skills"
+#        Predicted: 7.4 (B) [+1.8]
+#
+# python3 optimize.py --compare "A tool that helps manage stuff" "Scans config files for plaintext secrets and suggests env var migration"
+#
+#   Description Comparison
+#   ──────────────────────────────────────────────────
+#     A: "A tool that helps manage stuff"
+#     B: "Scans config files for plaintext secrets and suggests env var migration"
+#
+#     Clarity       A=2.0   B=9.5   B
+#     Specificity   A=0.0   B=8.5   B
+#     Keywords      A=0.0   B=7.8   B
+#     Uniqueness    A=7.0   B=7.0   =
+#     Length        A=5.2   B=8.8   B
+#     OVERALL       A=2.8   B=8.4   B
+#
+#     Grade:  A=D  B=A
diff --git a/skills/openclaw-native/tool-description-optimizer/optimize.py b/skills/openclaw-native/tool-description-optimizer/optimize.py