Commit 42e1931

ArchieIndian and claude authored
Add tool-description-optimizer skill (#30)
Scores skill descriptions for trigger quality across 5 dimensions: clarity, specificity, keyword density, uniqueness, and length. Grades A-F with concrete rewrite suggestions.

Companion script: optimize.py with --scan, --skill, --suggest, --compare, --status.

Inspired by OpenLobster's tool-description scoring layer.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent e7cabf4 commit 42e1931

4 files changed

Lines changed: 779 additions & 0 deletions

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
---
name: tool-description-optimizer
version: "1.0"
category: openclaw-native
description: Analyzes skill descriptions for trigger quality — scores clarity, keyword density, and specificity, then suggests rewrites that improve discovery accuracy.
stateful: true
---

# Tool Description Optimizer

## What it does

A skill's description is its only discovery mechanism. If the description is vague, overlapping, or keyword-poor, the agent won't trigger it — or worse, will trigger the wrong skill. Tool Description Optimizer analyzes every installed skill's description for trigger quality and suggests concrete rewrites.

Inspired by OpenLobster's tool-description scoring layer, which penalizes vague descriptions and rewards keyword-rich, action-specific ones.

## When to invoke

- After installing new skills — check if descriptions are trigger-ready
- When a skill isn't firing when expected — diagnose whether the description is the problem
- Periodically to audit all descriptions for quality drift
- Before publishing a skill — polish the description for discoverability

## How it works

### Scoring dimensions (5 metrics, 0–10 each)

| Metric | What it measures | Weight |
|---|---|---|
| Clarity | Single clear purpose, no ambiguity | 2x |
| Specificity | Action verbs, concrete nouns vs. vague terms | 2x |
| Keyword density | Trigger-relevant keywords per sentence | 1.5x |
| Uniqueness | Low overlap with other installed skill descriptions | 1.5x |
| Length | Optimal range (15–40 words) — too short = vague, too long = diluted | 1x |
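
The table gives weights but not the formula that combines them. A minimal sketch, assuming the overall score is a weighted mean of the five metrics (the `WEIGHTS` dictionary and `overall_score` function are illustrative names, not taken from `optimize.py`):

```python
# Weights copied from the scoring table; combining them as a weighted
# mean is an assumption, not the confirmed behavior of optimize.py.
WEIGHTS = {
    "clarity": 2.0,
    "specificity": 2.0,
    "keyword_density": 1.5,
    "uniqueness": 1.5,
    "length": 1.0,
}


def overall_score(metrics: dict[str, float]) -> float:
    """Weighted mean of the five 0-10 metrics, so the result is also 0-10."""
    total_weight = sum(WEIGHTS.values())  # 8.0 with the weights above
    weighted_sum = sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)
    return round(weighted_sum / total_weight, 1)


example = {
    "clarity": 9.5,
    "specificity": 8.5,
    "keyword_density": 7.8,
    "uniqueness": 7.0,
    "length": 8.8,
}
print(overall_score(example))  # 8.4
```

The example values are description B from the `--compare` walkthrough in the example state file, where the reported overall is also 8.4. Some other rows in that file differ from a plain weighted mean by about 0.1 to 0.4, so the real script may round or combine the metrics differently.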

### Quality grades

| Grade | Score range | Meaning |
|---|---|---|
| A | 8.0–10.0 | Excellent — high trigger accuracy expected |
| B | 6.0–7.9 | Good — minor improvements possible |
| C | 4.0–5.9 | Fair — likely to miss triggers or overlap |
| D | 2.0–3.9 | Poor — needs rewrite |
| F | 0.0–1.9 | Failing — will not trigger reliably |
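
The grade bands can be expressed as a threshold lookup. This is a sketch only: `grade_for` is a hypothetical helper, and it assumes each band's lower bound is the cutoff, so an unrounded score such as 7.95 falls into B.

```python
def grade_for(score: float) -> str:
    """Map a 0-10 overall score to a letter grade using the table's lower bounds."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score out of range: {score}")
    # Checked from the highest band down; the first threshold met wins.
    for threshold, letter in ((8.0, "A"), (6.0, "B"), (4.0, "C"), (2.0, "D")):
        if score >= threshold:
            return letter
    return "F"


for value in (8.5, 5.6, 3.8):
    print(value, grade_for(value))
```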

## How to use

```bash
python3 optimize.py --scan                    # Score all installed skills
python3 optimize.py --scan --grade C          # Only show skills graded C or below
python3 optimize.py --skill <name>            # Deep analysis of a single skill
python3 optimize.py --suggest <name>          # Generate rewrite suggestions
python3 optimize.py --compare "desc A" "desc B"  # Compare two descriptions
python3 optimize.py --status                  # Last scan summary
python3 optimize.py --format json             # Machine-readable output
```
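
For readers who want to see how a flag set like this could be wired up, here is a hypothetical `argparse` sketch. It is not the actual `optimize.py`, whose source is not shown here. Treating the modes as mutually exclusive, the `text` default for `--format`, and the `build_parser` name are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(prog="optimize.py")
    # Assumption: only one mode is meaningful per invocation.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--scan", action="store_true", help="score all installed skills")
    mode.add_argument("--skill", metavar="NAME", help="deep analysis of a single skill")
    mode.add_argument("--suggest", metavar="NAME", help="generate rewrite suggestions")
    mode.add_argument("--compare", nargs=2, metavar="DESC", help="compare two descriptions")
    mode.add_argument("--status", action="store_true", help="last scan summary")
    # Modifiers that combine with a mode.
    parser.add_argument("--grade", choices=list("ABCDF"), help="show this grade or below")
    parser.add_argument("--format", choices=["text", "json"], default="text")
    return parser


args = build_parser().parse_args(["--scan", "--grade", "C"])
print(args.scan, args.grade, args.format)  # True C text
```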

## Procedure

**Step 1 — Run a full scan**

```bash
python3 optimize.py --scan
```

Review the scorecard. Focus on skills graded C or below — these are the ones most likely to cause trigger failures.

**Step 2 — Get rewrite suggestions for low-scoring skills**

```bash
python3 optimize.py --suggest <skill-name>
```

The optimizer generates 2–3 alternative descriptions with predicted score improvements.

**Step 3 — Compare alternatives**

```bash
python3 optimize.py --compare "original description" "suggested rewrite"
```

Side-by-side scoring shows exactly which metrics improved.

**Step 4 — Apply the best rewrite**

Edit the skill's `SKILL.md` frontmatter `description:` field with the chosen rewrite.

## Vague word penalties

These words and phrases score 0 on specificity because they say nothing actionable:

`helps`, `manages`, `handles`, `deals with`, `works with`, `does stuff`, `various`, `things`, `general`, `misc`, `utility`, `tool for`, `assistant for`

## Strong trigger keywords (examples)

`scans`, `detects`, `validates`, `generates`, `audits`, `monitors`, `checks`, `reports`, `fixes`, `migrates`, `syncs`, `schedules`, `blocks`, `scores`, `diagnoses`
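
A sketch of how the two lists above might be matched against a description. How hits turn into a 0–10 score is not specified here, so the code only detects them. `find_terms` is a hypothetical helper, and whole-word, case-insensitive matching is an assumption.

```python
import re

VAGUE = [
    "helps", "manages", "handles", "deals with", "works with", "does stuff",
    "various", "things", "general", "misc", "utility", "tool for", "assistant for",
]
STRONG = [
    "scans", "detects", "validates", "generates", "audits", "monitors", "checks",
    "reports", "fixes", "migrates", "syncs", "schedules", "blocks", "scores", "diagnoses",
]


def find_terms(description: str, terms: list[str]) -> list[str]:
    """Return the terms that occur as whole words or phrases, ignoring case."""
    text = description.lower()
    return [t for t in terms if re.search(rf"\b{re.escape(t)}\b", text)]


print(find_terms("A helpful utility tool that manages various things", VAGUE))
# ['manages', 'various', 'things', 'utility']
print(find_terms("Scans config files for plaintext secrets", STRONG))
# ['scans']
```

Note that whole-word matching does not flag `helpful` as `helps`. A real scorer might stem words first.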

## State

Scan results and per-skill scores stored in `~/.openclaw/skill-state/tool-description-optimizer/state.yaml`.

Fields: `last_scan_at`, `skill_scores` list, `scan_history`.

## Notes

- Does not modify any skill files — analysis and suggestions only
- Uniqueness scoring uses Jaccard similarity against all other installed descriptions
- Length scoring uses a bell curve centered at 25 words (optimal)
- Rewrite suggestions are heuristic-based, not LLM-generated — deterministic and fast
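
The uniqueness and length notes can be illustrated with a short sketch. Several details are assumptions rather than documented behavior: the tokenizer, taking the highest similarity across other descriptions (rather than the mean), and a Gaussian width of `sigma=10.0`. All function names are hypothetical.

```python
import math
import re


def tokens(description: str) -> set[str]:
    """Lowercased alphanumeric tokens; the real tokenizer is unknown."""
    return set(re.findall(r"[a-z0-9]+", description.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def uniqueness_score(description: str, others: list[str]) -> float:
    """10 means no token overlap with any other description, 0 an exact duplicate."""
    worst = max((jaccard(description, o) for o in others), default=0.0)
    return round(10.0 * (1.0 - worst), 1)


def length_score(word_count: int, center: float = 25.0, sigma: float = 10.0) -> float:
    """Gaussian bell curve peaking at `center` words; `sigma` is a guess."""
    return round(10.0 * math.exp(-((word_count - center) ** 2) / (2 * sigma ** 2)), 1)


print(jaccard("scans config files", "scans log files"))  # 0.5
print(uniqueness_score("scans config files", ["scans log files", "generates reports"]))  # 5.0
print(length_score(25))  # 10.0
```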
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
version: "1.0"
description: Tool description quality scores, rewrite suggestions, and scan history.
fields:
  last_scan_at:
    type: datetime
  skill_scores:
    type: list
    description: Per-skill quality scores from the most recent scan
    items:
      skill_name: { type: string }
      description: { type: string }
      word_count: { type: integer }
      clarity: { type: float, description: "0-10 clarity score" }
      specificity: { type: float, description: "0-10 specificity score" }
      keyword_density: { type: float, description: "0-10 keyword density score" }
      uniqueness: { type: float, description: "0-10 uniqueness vs other skills" }
      length_score: { type: float, description: "0-10 length optimality score" }
      overall: { type: float, description: "Weighted composite score" }
      grade: { type: string, description: "A/B/C/D/F" }
  scan_history:
    type: list
    description: Rolling log of past scans (last 20)
    items:
      scanned_at: { type: datetime }
      skills_scanned: { type: integer }
      avg_score: { type: float }
      grade_distribution: { type: object, description: "Count per grade: A, B, C, D, F" }
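
A sketch of how a `scan_history` entry could be built from `skill_scores` and kept to the last 20 scans. `summarize_scan` and `append_history` are hypothetical names. Storing the newest entry first is an assumption, chosen to match the ordering in the example state file.

```python
from collections import Counter

MAX_HISTORY = 20  # "last 20" in the schema description


def summarize_scan(skill_scores: list[dict], scanned_at: str) -> dict:
    """Build one scan_history entry from the per-skill results."""
    grades = Counter(s["grade"] for s in skill_scores)
    total = sum(s["overall"] for s in skill_scores)
    avg = total / len(skill_scores) if skill_scores else 0.0
    return {
        "scanned_at": scanned_at,
        "skills_scanned": len(skill_scores),
        "avg_score": round(avg, 1),
        # Always emit all five grades so absent grades show as 0.
        "grade_distribution": {g: grades.get(g, 0) for g in "ABCDF"},
    }


def append_history(history: list[dict], entry: dict) -> list[dict]:
    """Prepend the newest entry and drop anything beyond MAX_HISTORY."""
    return ([entry] + history)[:MAX_HISTORY]


scores = [
    {"overall": 5.6, "grade": "C"},
    {"overall": 8.5, "grade": "A"},
    {"overall": 8.5, "grade": "A"},
]
entry = summarize_scan(scores, "2026-03-16T14:00:05.221000")
print(entry["avg_score"], entry["grade_distribution"])
```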
Lines changed: 94 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,94 @@
# Example runtime state for tool-description-optimizer
last_scan_at: "2026-03-16T14:00:05.221000"
skill_scores:
  - skill_name: using-superpowers
    description: "Bootstrap — teaches the agent how to find and invoke skills"
    word_count: 11
    clarity: 7.2
    specificity: 3.8
    keyword_density: 3.3
    uniqueness: 8.1
    length_score: 4.8
    overall: 5.6
    grade: C
  - skill_name: config-encryption-auditor
    description: "Scans OpenClaw config directories for plaintext API keys, tokens, and secrets in unencrypted files."
    word_count: 15
    clarity: 9.2
    specificity: 8.5
    keyword_density: 8.0
    uniqueness: 9.0
    length_score: 7.5
    overall: 8.5
    grade: A
  - skill_name: memory-graph-builder
    description: "Parses MEMORY.md into a knowledge graph with typed relationships, detects duplicates and contradictions, and generates a compressed memory digest."
    word_count: 22
    clarity: 8.8
    specificity: 7.6
    keyword_density: 7.2
    uniqueness: 9.4
    length_score: 9.5
    overall: 8.5
    grade: A
scan_history:
  - scanned_at: "2026-03-16T14:00:05.221000"
    skills_scanned: 40
    avg_score: 7.2
    grade_distribution:
      A: 18
      B: 14
      C: 6
      D: 2
      F: 0
  - scanned_at: "2026-03-13T14:00:00.000000"
    skills_scanned: 36
    avg_score: 6.8
    grade_distribution:
      A: 14
      B: 12
      C: 7
      D: 3
      F: 0
# ── Walkthrough ──────────────────────────────────────────────────────────────
# python3 optimize.py --scan
#
# Tool Description Quality Scan — 2026-03-16
# ────────────────────────────────────────────────────────────
# 40 skills scanned | avg score: 7.2
# Grades: 18xA 14xB 6xC 2xD 0xF
#
# ! [D] 3.8 — some-vague-skill
# clarity=2.0 spec=1.5 kw=1.2 uniq=8.0 len=6.5
# "A helpful utility tool that manages various things..."
#
# ~ [C] 5.6 — using-superpowers
# clarity=7.2 spec=3.8 kw=3.3 uniq=8.1 len=4.8
# "Bootstrap — teaches the agent how to find and invoke skills"
#
# python3 optimize.py --suggest using-superpowers
#
# Rewrite Suggestions: using-superpowers
# ──────────────────────────────────────────────────
# Current: "Bootstrap — teaches the agent how to find and invoke skills"
# Score: 5.6 (C)
#
# 1. Front-load action verb
# "Teaches the agent how to discover, invoke, and chain installed skills"
# Predicted: 7.4 (B) [+1.8]
#
# python3 optimize.py --compare "A tool that helps manage stuff" "Scans config files for plaintext secrets and suggests env var migration"
#
# Description Comparison
# ──────────────────────────────────────────────────
# A: "A tool that helps manage stuff"
# B: "Scans config files for plaintext secrets and suggests env var migration"
#
# Clarity A=2.0 B=9.5 B
# Specificity A=0.0 B=8.5 B
# Keywords A=0.0 B=7.8 B
# Uniqueness A=7.0 B=7.0 =
# Length A=5.2 B=8.8 B
# OVERALL A=2.8 B=8.4 B
#
# Grade: A=D B=A
