Summary
The skill-trigger evaluator fails for pi-cli because pi's JSONL output doesn't include tool call names in a format the evaluator recognizes.
Evidence
The skill IS loaded (scores improve from ~4/9 to 6/9 when skills are in the workspace), but the evaluator reports:
Skill "agent-plugin-review" not found in 1 tool call(s)
Pi's stream log shows toolcall_start/toolcall_delta/toolcall_end events without the tool name or arguments. The extractToolCalls function in pi-cli.ts looks for type: "tool_use" or type: "toolCall" with a name field in msg.content, but pi may structure tool calls differently.
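As a rough illustration of the mismatch, the lookup described above can be sketched as follows. This is not the actual code in pi-cli.ts: the `ContentItem` and `ToolCall` shapes, the `extractNamedToolCalls` name, and the layout of the stream events are all assumptions made for the example.

```typescript
// Hypothetical shapes: the real types in pi-cli.ts are not shown in this issue.
interface ContentItem {
  type: string;
  name?: string;
  [key: string]: unknown;
}

interface ToolCall {
  name: string;
  input?: unknown;
}

// Mirrors the lookup described above: keep only content items whose type is
// "tool_use" or "toolCall" and that carry a string `name`.
function extractNamedToolCalls(content: ContentItem[]): ToolCall[] {
  const calls: ToolCall[] = [];
  for (const item of content) {
    const isToolItem = item.type === "tool_use" || item.type === "toolCall";
    if (isToolItem && typeof item.name === "string") {
      // The argument field name is an assumption; accept either spelling.
      calls.push({ name: item.name, input: item["input"] ?? item["arguments"] });
    }
  }
  return calls;
}

// Stream-style events (layout assumed) carry no name, so nothing is extracted.
const streamOnly: ContentItem[] = [
  { type: "toolcall_start" },
  { type: "toolcall_delta" },
  { type: "toolcall_end" },
];

// A named item of the expected kind is extracted.
const named: ContentItem[] = [
  { type: "toolCall", name: "read_file", arguments: { path: "SKILL.md" } },
];

console.log(extractNamedToolCalls(streamOnly).length);
console.log(extractNamedToolCalls(named).map((c) => c.name));
```

If pi reports tool calls only through events like the first array, a lookup of this kind has no name to compare against the skill, which would be consistent with the "not found" message.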
Current behavior
skill-trigger works for Claude Code (checks Skill tool use) and Copilot (checks readFile of SKILL.md).
skill-trigger silently fails for pi-cli: tool calls are detected, but the skill name is not found in any of them.
Expected behavior
The evaluator should detect when pi reads a SKILL.md file (via read_file or similar tool) that matches the skill name.
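A minimal sketch of that check, assuming the extracted tool calls expose a tool name and an argument object. The set of read-tool names, the `readsSkillFile` helper, and the `<skill-name>/SKILL.md` directory layout are hypothetical and would need to be confirmed against pi's real output.

```typescript
// Hypothetical shape of an extracted tool call.
interface ToolCall {
  name: string;
  input?: unknown;
}

// Assumed names for pi's file-read tool; only read_file is mentioned in this issue.
const READ_TOOL_NAMES = new Set(["read_file", "readFile", "read"]);

// True when the call is a file read whose path ends in <skillName>/SKILL.md.
// Assumes each skill lives in a directory named after the skill.
function readsSkillFile(call: ToolCall, skillName: string): boolean {
  if (!READ_TOOL_NAMES.has(call.name)) return false;
  if (typeof call.input !== "object" || call.input === null) return false;

  const pointsAtSkill = (value: unknown): boolean => {
    if (typeof value !== "string") return false;
    // Split on either path separator and drop empty segments.
    const parts = value.split(/[\\/]+/).filter(Boolean);
    const n = parts.length;
    return n >= 2 && parts[n - 1] === "SKILL.md" && parts[n - 2] === skillName;
  };

  // The argument key (path, file_path, ...) is unknown, so check every string value.
  return Object.values(call.input).some(pointsAtSkill);
}

// Example with a made-up path.
const call: ToolCall = {
  name: "read_file",
  input: { path: "skills/agent-plugin-review/SKILL.md" },
};
console.log(readsSkillFile(call, "agent-plugin-review"));
```

Matching on the last two path segments rather than a substring keeps a skill named "review" from matching a read of "agent-plugin-review/SKILL.md".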
Workaround
Removed skill-trigger assertions from the agentic-engineering eval. Content assertions still validate review quality.
Related