Skillgrade is a new CLI tool by Minko Gechev (Angular team at Google) that provides unit testing for AI agent skills. It evaluates skills using deterministic checks and LLM-based rubric scoring, with Docker isolation, CI integration, and multi-agent support (Claude, Gemini, Codex).
ASM manages the skill lifecycle (discovery, install, update, removal) but has no way to measure whether a skill actually works. Skillgrade fills exactly that gap — "skills are instructions for agents, and instructions need tests."
Key Skillgrade capabilities relevant to ASM
skillgrade init — AI-powered scaffolding that reads SKILL.md and auto-generates eval.yaml with tasks and graders
skillgrade run with presets (--smoke, --reliable, --regression) for different confidence levels
Deterministic + LLM rubric graders with weighted scoring
Docker isolation for safe execution; --provider=local for CI
--ci flag with configurable pass/fail threshold (exits non-zero below threshold)
skillgrade preview browser — web UI for viewing results
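The weighted scoring mentioned above (deterministic checks plus LLM rubric scores combined by weight) can be sketched in a few lines of Python; the grader names, weights, and tuple layout are hypothetical, not Skillgrade's actual schema:

```python
def combine_scores(results: list[tuple[str, float, float]]) -> float:
    """Weighted average of per-grader scores, each in [0, 1]."""
    total_weight = sum(weight for _, weight, _ in results)
    weighted = sum(weight * score for _, weight, score in results)
    return weighted / total_weight

# Hypothetical grader results: (grader name, weight, score)
results = [
    ("output_file_exists", 1.0, 1.0),   # deterministic check: pass
    ("rubric_helpfulness", 2.0, 0.75),  # LLM rubric score
]
print(combine_scores(results))  # 2.5 / 3.0 ≈ 0.833
```

A `--ci` run would then compare an overall score like this against the configured pass/fail threshold and exit non-zero below it.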
Integration opportunities
asm test <skill-name> — new command that invokes Skillgrade against an installed skill's directory, running --smoke by default
Quality scores in catalog — skills that ship eval.yaml get a "tested" badge; display pass rates as a quality signal
Pre-install validation — run skillgrade --validate against a skill's bundled eval before installing from untrusted sources
asm publish gate — require skillgrade --ci to pass before a skill is listed in the registry (ties into asm publish — validate, audit, and submit skills to the registry #105)
Why now
Skillgrade is installable today via npm (npm i -g skillgrade)
It builds on the same SKILL.md convention ASM already uses
Acceptance Criteria
asm test <skill-name> runs Skillgrade evaluation against the skill's directory and surfaces results (medium confidence)
Skills that ship eval.yaml are flagged as "tested" in catalog/search output (medium confidence)
Skill authors can add eval.yaml to their skills for ASM quality scoring (medium confidence)
Metadata
Type: feature (high confidence)
References
npm i -g skillgrade