feat(skills): native agentskills.io support by darkrishabh · Pull Request #1 · darkrishabh/bench-ai

darkrishabh · 2026-04-30T18:14:04Z

Summary

- Add native agentskills.io support via the new @darkrishabh/bench-ai/skills entrypoint: loadSkill, discoverSkills, runEval, gradeOutputs, writeRunArtifacts, buildBenchmark, ensureIterationDir, and evaluateSkills.
- Add the @darkrishabh/bench-ai/providers entrypoint for provider imports, and extend the Provider contract with optional capabilities and completeChat while preserving complete(prompt) compatibility.
- Add a bench-ai skills <root> CLI command, spec-shaped artifact writing, baseline mode, README examples, a fixture skill, and tests, and bump the minor version to 1.1.0.
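The Provider change can be pictured with a small sketch. It assumes a hypothetical ChatMessage type and a capabilities object with a single chat flag, and a hypothetical helper callProvider; the actual Provider interface exported from @darkrishabh/bench-ai/providers may use different names and fields. What it illustrates is the compatibility rule stated above: completeChat is optional, and a caller falls back to complete(prompt) when a provider does not implement it.

```typescript
// Hypothetical shapes for illustration only; the real Provider contract in
// @darkrishabh/bench-ai/providers may differ.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface Provider {
  // Single-prompt call that existing providers already implement.
  complete(prompt: string): Promise<string>;
  // Optional additions; older providers simply omit them.
  // capabilities is informational in this sketch and does not drive dispatch.
  capabilities?: { chat?: boolean };
  completeChat?(messages: ChatMessage[]): Promise<string>;
}

// Hypothetical helper: prefer completeChat when the provider has it,
// otherwise flatten the messages into one prompt for complete(prompt).
async function callProvider(provider: Provider, messages: ChatMessage[]): Promise<string> {
  if (provider.completeChat) {
    return provider.completeChat(messages);
  }
  const prompt = messages.map((m) => `${m.role}: ${m.content}`).join("\n\n");
  return provider.complete(prompt);
}

// A legacy provider that only knows complete(prompt); it echoes the prompt.
const legacy: Provider = {
  async complete(prompt) {
    return prompt;
  },
};

// A newer provider that also accepts structured chat messages.
const chatty: Provider = {
  capabilities: { chat: true },
  async complete(prompt) {
    return `prompt:${prompt}`;
  },
  async completeChat(messages) {
    return `chat:${messages.length}`;
  },
};
```

Keeping complete(prompt) required and completeChat optional means, in this sketch, that existing providers type-check unchanged, while providers that support it can receive the system and user prompts as separate messages.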

Spec: https://agentskills.io/skill-creation/evaluating-skills

Testing

npm test
npm run type-check
npm run build
node ./bench-ai.mjs skills --help
node --input-type=module -e 'import { evaluateSkills } from "./dist/skills/index.js"; import { OpenAICompatibleProvider } from "./dist/providers/index.js"; console.log(typeof evaluateSkills, typeof OpenAICompatibleProvider);'

vercel · 2026-04-30T18:14:08Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
bench-ai	Ready	Preview, Comment	Apr 30, 2026 7:08pm

…op mode, HTML report

- Add structured SkillsEvent stream (suite/eval start/end) so consumers can build rich UIs and reporters without re-parsing logs.
- Bundle a default consoleReporter with color-coded headers, prompt and output snippets, per-assertion verdicts with evidence, and suite-level pass-rate delta across with_skill / without_skill modes.
- Default workspace layout is now flat: <workspace>/<skill-slug>/..., overwritten on each run (CI-friendly, easy to diff).
- Opt-in loop mode (loop: true) snapshots each run into <workspace>/.history/iteration-N/<skill-slug>/... for skill-improvement workflows, returning historyIteration in the result.
- Persist meta.json (skill metadata) and prompts.json (system, user, and judge prompts actually sent) alongside the spec-mandated artifacts so reports and debugging surface exactly what each model saw.
- Add a self-contained static HTML report (report.ts) at <workspace>/report/index.html with overall stats, per-skill drill-down, side-by-side with_skill / without_skill output, and a collapsible judge prompt section.
- gradeOutputs now returns { grading, judgePrompt, judgeResponse } so the judge prompt can be exposed in events and reports.
- evaluate-skills emits events, defaults to consoleReporter when no onEvent/onLog is provided, generates the HTML report unless report: false, and returns historyIteration + reportPath in EvaluateSkillsResult.
- src/cli/skills.ts: update CLI JSON output to the new result shape.
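A consumer of the event stream could tally assertion verdicts per mode and compute the with_skill / without_skill pass-rate delta on its own. The sketch below assumes a hypothetical SkillsEvent union whose eval_end events carry passed/total counts, plus a hypothetical createDeltaReporter helper; the real event names and payloads in bench-ai may differ, and onEvent here only mirrors the onEvent hook mentioned above.

```typescript
// Hypothetical event shape loosely modeled on the suite/eval start/end stream;
// the real SkillsEvent type in bench-ai may differ.
type Mode = "with_skill" | "without_skill";

type SkillsEvent =
  | { type: "suite_start"; skill: string }
  | { type: "eval_start"; skill: string; evalId: string; mode: Mode }
  | { type: "eval_end"; skill: string; evalId: string; mode: Mode; passed: number; total: number }
  | { type: "suite_end"; skill: string };

interface ModeTally {
  passed: number;
  total: number;
}

// Accumulates assertion counts per mode from eval_end events.
function createDeltaReporter() {
  const tallies: Record<Mode, ModeTally> = {
    with_skill: { passed: 0, total: 0 },
    without_skill: { passed: 0, total: 0 },
  };
  // Pass rate for one mode; 0 when no assertions were recorded.
  const rate = (t: ModeTally): number => (t.total === 0 ? 0 : t.passed / t.total);

  return {
    onEvent(event: SkillsEvent): void {
      // The discriminant narrows event to the eval_end variant here.
      if (event.type === "eval_end") {
        tallies[event.mode].passed += event.passed;
        tallies[event.mode].total += event.total;
      }
    },
    // A positive delta means the skill run beat the baseline run.
    summary() {
      const withRate = rate(tallies.with_skill);
      const withoutRate = rate(tallies.without_skill);
      return { withRate, withoutRate, delta: withRate - withoutRate };
    },
  };
}
```

Because the reporter only reacts to events, the same tallying logic could sit behind a console printer or an HTML generator without either one re-parsing log text.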

vercel Bot deployed to Preview April 30, 2026 18:14

cursor Bot force-pushed the cursor/agent-skills-standard-6cb3 branch from bf427ee to 045dc91 April 30, 2026 19:07

cursor Bot changed the title from "Add Agent Skills eval support" to "feat(skills): native agentskills.io support" Apr 30, 2026

vercel Bot deployed to Preview April 30, 2026 19:08

darkrishabh added 2 commits April 30, 2026 13:51

feat(skills): native agentskills.io support

aac77d1

darkrishabh closed this Apr 30, 2026

darkrishabh force-pushed the cursor/agent-skills-standard-6cb3 branch from 045dc91 to eb0ec5c April 30, 2026 20:52

vercel Bot deployed to Preview April 30, 2026 20:53

darkrishabh wants to merge 2 commits into master from cursor/agent-skills-standard-6cb3

darkrishabh commented Apr 30, 2026 (edited by cursor Bot)
