Integrate Skillgrade for skill quality evaluation and testing #118

@luongnv89

Type

feature (high confidence)

Description

Skillgrade is a new CLI tool by Minko Gechev (Angular team at Google) that provides unit tests for AI agent skills. It evaluates skills using deterministic checks and LLM-based rubric scoring, with Docker isolation, CI integration, and multi-agent support (Claude, Gemini, Codex).

ASM manages the skill lifecycle (discovery, install, update, removal) but has no way to measure whether a skill actually works. Skillgrade fills exactly that gap — "skills are instructions for agents, and instructions need tests."

Key Skillgrade capabilities relevant to ASM

  • skillgrade init — AI-powered scaffolding that reads SKILL.md and auto-generates eval.yaml with tasks and graders
  • skillgrade run with presets (--smoke, --reliable, --regression) for different confidence levels
  • Deterministic + LLM rubric graders with weighted scoring
  • Docker isolation for safe execution; --provider=local for CI
  • --ci flag with configurable pass/fail threshold (exits non-zero below threshold)
  • skillgrade preview browser — web UI for viewing results
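
To make the weighted scoring and the CI threshold concrete, here is a minimal sketch of how deterministic and rubric scores could be combined and turned into an exit code. It is not Skillgrade's actual implementation: the `GraderResult` type, the grader names, the weights, and the 0.8 default threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class GraderResult:
    name: str      # hypothetical grader label
    score: float   # normalized to the range 0.0-1.0
    weight: float  # relative importance of this grader


def weighted_score(results: list[GraderResult]) -> float:
    """Return the weight-normalized average of the grader scores."""
    total_weight = sum(r.weight for r in results)
    if total_weight <= 0:
        raise ValueError("at least one grader needs a positive weight")
    return sum(r.score * r.weight for r in results) / total_weight


def ci_exit_code(score: float, threshold: float = 0.8) -> int:
    """Mimic a CI gate: 0 when the score meets the threshold, 1 otherwise."""
    return 0 if score >= threshold else 1


results = [
    GraderResult("file_exists", score=1.0, weight=3),     # deterministic check
    GraderResult("rubric_quality", score=0.5, weight=1),  # LLM rubric score
]
overall = weighted_score(results)
print(f"overall={overall:.3f} exit={ci_exit_code(overall)}")
```

Normalizing by the total weight keeps the overall score in the 0.0-1.0 range regardless of how authors scale their weights, which makes a single threshold meaningful across skills.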

Integration opportunities

  1. asm test <skill-name> — new command that invokes Skillgrade against an installed skill's directory, running --smoke by default
  2. Quality scores in catalog — skills that ship eval.yaml get a "tested" badge; display pass rates as a quality signal
  3. Pre-install validation — run skillgrade --validate against a skill's bundled eval before installing from untrusted sources
  4. asm publish gate — require skillgrade --ci to pass before a skill is listed in the registry (ties into #105, "asm publish — validate, audit, and submit skills to the registry")
  5. Benchmark integration — Skillgrade's grading engine could power the prompt benchmark suite proposed in #114 ("[FEATURE] Add prompt benchmark suite to compare skill repos on popular tasks")
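
As a rough illustration of item 1, the sketch below shows one way an `asm test` wrapper could shell out to Skillgrade while keeping it optional. It is written in Python for brevity and is not ASM code: the function names and the 127 "not installed" exit code are hypothetical, and the exact Skillgrade argument form (`skillgrade --smoke`) is an assumption based on the flags listed in this issue, so it should be checked against the Skillgrade README.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: the binary is named "skillgrade" and accepts a preset flag
# such as --smoke, as described in this issue.
SKILLGRADE_BIN = "skillgrade"
PRESETS = {"smoke", "reliable", "regression"}
MISSING_EXIT_CODE = 127  # arbitrary choice, borrowed from the shell convention


def build_test_command(preset: str = "smoke") -> list[str]:
    """Build the argv for a Skillgrade run with the given preset."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    return [SKILLGRADE_BIN, f"--{preset}"]


def run_skill_test(skill_dir: Path, preset: str = "smoke") -> int:
    """Run Skillgrade inside skill_dir and return its exit code.

    Returns MISSING_EXIT_CODE without raising when Skillgrade is absent,
    so the rest of the tool keeps working without it.
    """
    if shutil.which(SKILLGRADE_BIN) is None:
        print("Skillgrade not found. Install it with: npm i -g skillgrade")
        return MISSING_EXIT_CODE
    completed = subprocess.run(build_test_command(preset), cwd=skill_dir)
    return completed.returncode
```

Calling `run_skill_test(Path("path/to/skill"))` would run the smoke preset in that directory. Checking for the binary at call time, rather than at startup, is what keeps the dependency soft.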

Why now

  • MIT licensed, lightweight npm package (npm i -g skillgrade)
  • Actively maintained (v0.1.3, 318+ stars in first month, last push 3 days ago)
  • Operates on the same SKILL.md convention ASM already uses
  • Targets the same agents (Claude Code, Codex, Gemini CLI)

Reporter Context

Check the Skillgrade repo; it could be a very interesting integration for ASM: https://github.com/mgechev/skillgrade

Acceptance Criteria

  • ASM can detect whether Skillgrade is installed and suggest installation if missing (medium confidence)
  • asm test <skill-name> runs Skillgrade evaluation against the skill's directory and surfaces results (medium confidence)
  • Skills with eval.yaml are flagged as "tested" in catalog/search output (medium confidence)
  • Integration does not create a hard dependency — Skillgrade is optional, ASM works without it (high confidence)
  • Documentation covers how skill authors can add eval.yaml to their skills for ASM quality scoring (medium confidence)
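
The "tested" flag in the third criterion could be derived purely from the filesystem, without invoking Skillgrade at all. The sketch below assumes a hypothetical layout in which each skill lives in its own directory directly under a skills root, with `eval.yaml` sitting next to `SKILL.md`; the function names and the `[tested]` label are illustrative, not part of ASM.

```python
from pathlib import Path


def catalog_entries(skills_root: Path) -> list[dict]:
    """List skills under skills_root, marking those that ship an eval.yaml."""
    entries = []
    for skill_md in sorted(skills_root.glob("*/SKILL.md")):
        skill_dir = skill_md.parent
        entries.append({
            "name": skill_dir.name,
            # Presence of eval.yaml only means tests exist, not that they pass.
            "tested": (skill_dir / "eval.yaml").is_file(),
        })
    return entries


def format_entry(entry: dict) -> str:
    """Render one catalog line, appending a badge for tested skills."""
    badge = " [tested]" if entry["tested"] else ""
    return f"{entry['name']}{badge}"
```

Because this check only looks for a file, it works even when Skillgrade is not installed, which fits the requirement that the integration stay optional.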

Assignees

No one assigned

Labels

feature (New feature or request)

Projects

Status: Done

Milestone

No milestone
