skill-trigger evaluator cannot detect pi-cli skill loading #780

@christso

Summary

The skill-trigger evaluator fails for pi-cli because pi's JSONL output doesn't include tool call names in a format the evaluator recognizes.

Evidence

The skill IS loaded (scores improve from ~4/9 to 6/9 when skills are in the workspace), but the evaluator reports:

Skill "agent-plugin-review" not found in 1 tool call(s)

Pi's stream log shows toolcall_start/toolcall_delta/toolcall_end events without the tool name or arguments. The extractToolCalls function in pi-cli.ts looks for type: "tool_use" or type: "toolCall" with a name field in msg.content, but pi may structure tool calls differently.
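The lookup described above can be sketched roughly as follows. This is not the actual code from pi-cli.ts: the `ContentBlock`, `Message`, and `ToolCall` shapes and the function name `extractToolCallsSketch` are hypothetical, and only the `type` values and the `name` field come from the description in this issue.

```typescript
// Hypothetical shapes; the real types in pi-cli.ts may differ.
interface ContentBlock {
  type: string;
  name?: string;
  [key: string]: unknown;
}

interface Message {
  content?: ContentBlock[] | string;
}

interface ToolCall {
  name: string;
}

// Collects tool calls the way the issue describes: content blocks whose type is
// "tool_use" or "toolCall" and that carry a string `name`.
function extractToolCallsSketch(messages: Message[]): ToolCall[] {
  const calls: ToolCall[] = [];
  for (const msg of messages) {
    // Skip messages whose content is plain text or missing.
    if (!Array.isArray(msg.content)) continue;
    for (const block of msg.content) {
      const isToolBlock = block.type === "tool_use" || block.type === "toolCall";
      if (isToolBlock && typeof block.name === "string") {
        calls.push({ name: block.name });
      }
    }
  }
  return calls;
}
```

Under this sketch, any event that uses a different `type` value or omits `name` is skipped, which would be consistent with the skill name never being matched.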

Current behavior

  • skill-trigger works for Claude Code (checks Skill tool use) and Copilot (checks readFile of SKILL.md)
  • skill-trigger silently fails for pi-cli: tool calls are detected, but the skill name is not found in any of them

Expected behavior

The evaluator should detect when pi reads a SKILL.md file (via read_file or similar tool) that matches the skill name.
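A minimal sketch of such a check, assuming tool calls have already been extracted with their arguments. The `ToolCallRecord` shape, the `readsSkillFile` name, and the list of read-tool names are all assumptions for illustration; which tool name and argument key pi actually emits has not been confirmed here.

```typescript
// Hypothetical shape of an extracted tool call; field names are assumptions.
interface ToolCallRecord {
  name: string;
  args?: Record<string, unknown>;
}

// Assumed names of file-reading tools; adjust to whatever pi actually emits.
const READ_TOOL_NAMES = new Set(["read", "read_file", "readFile"]);

// Returns true when any read-style tool call targets <skillName>/SKILL.md.
function readsSkillFile(calls: ToolCallRecord[], skillName: string): boolean {
  const suffix = `${skillName}/SKILL.md`;
  return calls.some((call) => {
    if (!READ_TOOL_NAMES.has(call.name)) return false;
    // Check every string argument, since the key holding the path is unknown.
    return Object.values(call.args ?? {}).some((value) => {
      if (typeof value !== "string") return false;
      // Normalize Windows separators before comparing.
      const normalized = value.replace(/\\/g, "/");
      return normalized === suffix || normalized.endsWith(`/${suffix}`);
    });
  });
}
```

Requiring a `/` before the skill directory name keeps a skill such as "plugin-review" from matching a path under "agent-plugin-review".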

Workaround

Removed skill-trigger assertions from the agentic-engineering eval. Content assertions still validate review quality.

Related
