Integrate Skillgrade for skill quality evaluation and testing

## Type

feature (high confidence)

## Description

[Skillgrade](https://github.com/mgechev/skillgrade) is a new CLI tool by Minko Gechev (Angular team at Google) that provides **unit tests for AI agent skills**. It evaluates skills using deterministic checks and LLM-based rubric scoring, with Docker isolation, CI integration, and multi-agent support (Claude, Gemini, Codex).

ASM manages the skill lifecycle (discovery, install, update, removal) but has no way to measure whether a skill actually works. Skillgrade fills exactly that gap — "skills are instructions for agents, and instructions need tests."

### Key Skillgrade capabilities relevant to ASM

- **`skillgrade init`** — AI-powered scaffolding that reads `SKILL.md` and auto-generates `eval.yaml` with tasks and graders
- **`skillgrade run`** with presets (`--smoke`, `--reliable`, `--regression`) for different confidence levels
- **Deterministic + LLM rubric graders** with weighted scoring
- **Docker isolation** for safe execution; `--provider=local` for CI
- **`--ci` flag** with configurable pass/fail threshold (exits non-zero below threshold)
- **`skillgrade preview browser`** — web UI for viewing results
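
To make the weighted scoring and `--ci` threshold concrete, here is a minimal sketch of how deterministic and rubric scores could combine into one pass/fail decision. The `GraderResult` shape, the weighted-mean rule, and the `0.8` threshold are all assumptions for illustration. Skillgrade's actual result format and aggregation may differ.

```typescript
// Hypothetical result shape; not taken from Skillgrade's source.
interface GraderResult {
  name: string;
  kind: "deterministic" | "llm_rubric";
  score: number; // normalized to [0, 1]
  weight: number; // relative importance, must be > 0 overall
}

// Assumed aggregation rule: weighted mean of the grader scores.
function weightedScore(results: GraderResult[]): number {
  const totalWeight = results.reduce((sum, r) => sum + r.weight, 0);
  if (totalWeight <= 0) {
    throw new Error("at least one grader with positive weight is required");
  }
  const weighted = results.reduce((sum, r) => sum + r.score * r.weight, 0);
  return weighted / totalWeight;
}

// Exit code a CI gate would return: 0 on pass, 1 when below the threshold.
function ciExitCode(score: number, threshold: number): number {
  return score >= threshold ? 0 : 1;
}

const example: GraderResult[] = [
  { name: "output-file-exists", kind: "deterministic", score: 1.0, weight: 0.75 },
  { name: "follows-workflow", kind: "llm_rubric", score: 0.5, weight: 0.25 },
];

const score = weightedScore(example);
console.log(score.toFixed(3), ciExitCode(score, 0.8)); // prints "0.875 0"
```

If ASM surfaces pass rates in the catalog, a single normalized number like this would be the natural thing to store alongside the skill.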

### Integration opportunities

1. **`asm test <skill-name>`** — new command that invokes Skillgrade against an installed skill's directory, running `--smoke` by default
2. **Quality scores in catalog** — skills that ship `eval.yaml` get a "tested" badge; display pass rates as a quality signal
3. **Pre-install validation** — run `skillgrade --validate` against a skill's bundled eval before installing from untrusted sources
4. **`asm publish` gate** — require `skillgrade --ci` to pass before a skill is listed in the registry (ties into #105)
5. **Benchmark integration** — Skillgrade's grading engine could power the prompt benchmark suite proposed in #114
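
As a rough illustration of the first item, `asm test` could be a thin wrapper that runs Skillgrade inside the skill's directory. In this sketch `planSkillTest`, `runSkillTest`, and `TestPlan` are hypothetical names, and the argument layout (a bare preset flag, optionally followed by `--ci`) is an assumption that should be confirmed against `skillgrade --help` before implementation.

```typescript
import { spawnSync } from "node:child_process";

type Preset = "smoke" | "reliable" | "regression";

interface TestPlan {
  command: string;
  args: string[];
  cwd: string;
}

// Build the invocation for a hypothetical `asm test <skill-name>`.
// Assumption: presets are passed as plain flags such as `--smoke`.
function planSkillTest(skillDir: string, preset: Preset = "smoke", ci = false): TestPlan {
  const args = [`--${preset}`];
  if (ci) args.push("--ci");
  return { command: "skillgrade", args, cwd: skillDir };
}

// Run the plan and return an exit code that ASM could propagate.
// On Windows, globally installed npm binaries may need `shell: true`.
function runSkillTest(plan: TestPlan): number {
  const result = spawnSync(plan.command, plan.args, {
    cwd: plan.cwd,
    stdio: "inherit", // stream Skillgrade's own output to the user
  });
  if (result.error) {
    // Typically ENOENT when the binary is not on PATH.
    console.error(`Could not start ${plan.command}: ${result.error.message}`);
    return 127;
  }
  return result.status ?? 1;
}

// Only the plan is printed here; nothing is executed.
console.log(planSkillTest("./skills/example"));
```

Separating the plan from its execution keeps the command construction unit-testable without Skillgrade installed, which also suits the optional-dependency requirement.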

### Why now

- MIT licensed, lightweight npm package (`npm i -g skillgrade`)
- Actively maintained at the time of filing (v0.1.3, 318+ stars in its first month, last push 3 days earlier)
- Operates on the same `SKILL.md` convention ASM already uses
- Targets the same agents (Claude Code, Codex, Gemini CLI)

> **Reporter Context**
> 
> check skillgrade repo, could be very interesting for integrating into asm: https://github.com/mgechev/skillgrade

## Acceptance Criteria

- [ ] ASM can detect whether Skillgrade is installed and suggest installation if missing (medium confidence)
- [ ] `asm test <skill-name>` runs Skillgrade evaluation against the skill's directory and surfaces results (medium confidence)
- [ ] Skills with `eval.yaml` are flagged as "tested" in catalog/search output (medium confidence)
- [ ] Integration does not create a hard dependency: Skillgrade is optional and ASM works without it (high confidence)
- [ ] Documentation covers how skill authors can add `eval.yaml` to their skills for ASM quality scoring (medium confidence)
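
The detection and "tested" criteria above could be met with plain filesystem checks, which keeps Skillgrade optional. This is only a sketch: `findOnPath`, `skillgradeHint`, and `isTested` are hypothetical helpers, and it assumes `eval.yaml` sits at the root of the skill directory next to `SKILL.md`.

```typescript
import { existsSync } from "node:fs";
import { delimiter, join } from "node:path";

// Look for an executable name on PATH without spawning a process.
function findOnPath(name: string, pathVar: string = process.env.PATH ?? ""): string | null {
  // npm installs `.cmd` shims on Windows, so check those as well.
  const extensions = process.platform === "win32" ? [".cmd", ".exe", ""] : [""];
  for (const dir of pathVar.split(delimiter)) {
    if (!dir) continue; // skip empty PATH entries
    for (const ext of extensions) {
      const candidate = join(dir, name + ext);
      if (existsSync(candidate)) return candidate;
    }
  }
  return null;
}

// Return an install suggestion when Skillgrade is missing, otherwise null.
function skillgradeHint(pathVar?: string): string | null {
  return findOnPath("skillgrade", pathVar) === null
    ? "Skillgrade not found. Install it with `npm i -g skillgrade` to enable `asm test`."
    : null;
}

// A skill counts as "tested" when it ships an eval.yaml (assumed location).
function isTested(skillDir: string): boolean {
  return existsSync(join(skillDir, "eval.yaml"));
}

const hint = skillgradeHint();
if (hint) console.log(hint);
```

Because every check degrades to `null` or `false` when Skillgrade or `eval.yaml` is absent, the rest of ASM can ignore the integration entirely unless the user opts in.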

## Metadata

- **Priority:** medium
- **Effort:** L
- **Labels:** feature, integration, quality
- **Related:** #114 (prompt benchmark suite — complementary scope), #105 (asm publish), #54 (skill security audit)

## References

- **Repository:** https://github.com/mgechev/skillgrade
- **Blog post:** https://blog.mgechev.com/2026/03/14/skillgrade/
- **npm:** `npm i -g skillgrade`

Integrate Skillgrade for skill quality evaluation and testing #118
