
fix(core): extract system messages in prompt builder for LLM grader #983

Merged
christso merged 3 commits into main from fix/982-grader-system-prompt
Apr 8, 2026


christso (Collaborator) commented Apr 8, 2026

Closes #982

Summary

  • Fix buildPromptInputs to extract system messages into systemMessage field
  • Fix orchestrator to pass systemPrompt directly instead of in metadata
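
The first fix can be illustrated with a minimal sketch. It assumes a simple chat-message shape and is not the actual `buildPromptInputs` code: only the `systemMessage` field name comes from this PR, while `ChatMessage`, `PromptInputsSketch`, `extractSystemMessage`, and the `messages` field are hypothetical names chosen for illustration.

```typescript
// Hypothetical message shape; the real agentv types may differ.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface PromptInputsSketch {
  // Field name taken from the PR summary.
  systemMessage: string | undefined;
  // Remaining non-system messages (assumed field name).
  messages: ChatMessage[];
}

// Split system messages out of the conversation so a grader can receive
// them as a dedicated field rather than mixed into the rest of the input.
function extractSystemMessage(input: ChatMessage[]): PromptInputsSketch {
  const systemParts = input
    .filter((m) => m.role === "system")
    .map((m) => m.content);
  const messages = input.filter((m) => m.role !== "system");
  return {
    // Joining multiple system messages with a blank line is an assumption.
    systemMessage: systemParts.length > 0 ? systemParts.join("\n\n") : undefined,
    messages,
  };
}
```

A caller, such as the orchestrator in the second fix, could then pass `systemMessage` to the grader as its own argument instead of placing it in a metadata object.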

Test plan

  • Unit tests pass (1901 tests across all packages)
  • Build succeeds
  • Manual eval with system prompt produces correct grader scores

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The buildPromptInputs function now correctly extracts system messages
and returns them in the systemMessage field. The orchestrator passes
the system prompt directly instead of burying it in metadata.

Closes #982

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cloudflare-workers-and-pages Bot commented Apr 8, 2026

Deploying agentv with Cloudflare Pages

Latest commit: 2f58cfa
Status: ✅  Deploy successful!
Preview URL: https://78c2eebe.agentv.pages.dev
Branch Preview URL: https://fix-982-grader-system-prompt.agentv.pages.dev


christso and others added 2 commits April 8, 2026 22:49
…ride (#982)

When a user writes `prompt: "Check step-by-step work"` in an llm-grader
assertion, the text was being used as the entire evaluator template,
replacing the DEFAULT_EVALUATOR_TEMPLATE which contains {{output}},
{{input}}, {{criteria}} variables. This meant the grader never saw the
actual candidate response, always scoring 0.

Now bare inline prompt strings (without template variables like
{{output}}) are injected into the default template's {{criteria}} slot
instead. Prompts that contain recognized template variables, and prompts
from scripts/files, continue to work as full template overrides.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
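
The behavior described in the commit message above can be sketched as follows. This is an illustration under stated assumptions, not the code from this PR: `DEFAULT_TEMPLATE_SKETCH` is a stand-in for `DEFAULT_EVALUATOR_TEMPLATE` (whose real text is not shown here), the recognized variable set is limited to the three names the message mentions, and `resolveEvaluatorTemplate` is a hypothetical function name.

```typescript
// Stand-in for DEFAULT_EVALUATOR_TEMPLATE; the real template text is not shown in this PR.
const DEFAULT_TEMPLATE_SKETCH =
  "Criteria: {{criteria}}\nInput: {{input}}\nCandidate output: {{output}}";

// Only the variables named in the commit message; the full recognized set is an assumption.
const RECOGNIZED_VARIABLE = /\{\{\s*(output|input|criteria)\s*\}\}/;

// Decide whether an inline prompt is a full template override or bare criteria text.
function resolveEvaluatorTemplate(inlinePrompt: string): string {
  if (RECOGNIZED_VARIABLE.test(inlinePrompt)) {
    // The prompt references template variables, so treat it as a full override.
    return inlinePrompt;
  }
  // Bare text: inject it into the default template's {{criteria}} slot.
  // A replacer function avoids special handling of "$" sequences in the prompt.
  return DEFAULT_TEMPLATE_SKETCH.replace("{{criteria}}", () => inlinePrompt);
}
```

With this sketch, `resolveEvaluatorTemplate("Check step-by-step work")` keeps the `{{output}}` and `{{input}}` placeholders, so the grader still sees the candidate response, while a prompt such as `"Grade {{output}} strictly"` is returned unchanged.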
Replace while-loop assignment with matchAll iterator (biome
noAssignInExpressions) and collapse short import to single line (biome
formatter).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
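
For readers unfamiliar with the lint fix above, here is a small sketch of the pattern. The regular expression and the `collectVariableNames` helper are hypothetical; the actual loop changed in this PR is not shown in the conversation.

```typescript
// Hypothetical pattern for {{name}} placeholders; matchAll requires the global flag.
const VARIABLE_PATTERN = /\{\{\s*(\w+)\s*\}\}/g;

// Before (the shape flagged by Biome's noAssignInExpressions):
//   let m: RegExpExecArray | null;
//   while ((m = VARIABLE_PATTERN.exec(text)) !== null) names.push(m[1]);

// After: iterate the matches directly, with no assignment inside a condition.
function collectVariableNames(text: string): string[] {
  const names: string[] = [];
  for (const match of text.matchAll(VARIABLE_PATTERN)) {
    const name = match[1];
    if (name !== undefined) names.push(name);
  }
  return names;
}
```

Besides satisfying the linter, `matchAll` works on an internal copy of the regex, so the shared pattern's `lastIndex` is not mutated between calls.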
@christso christso marked this pull request as ready for review April 8, 2026 23:01
@christso christso merged commit 8f4a29b into main Apr 8, 2026
4 checks passed
@christso christso deleted the fix/982-grader-system-prompt branch April 8, 2026 23:01
christso added a commit to EntityProcess/agentv-bench-skills that referenced this pull request Apr 9, 2026
Rerun of with-superpowers vs without-superpowers logic-puzzle evals after merging
fix(core): extract system messages in prompt builder for LLM grader (EntityProcess/agentv#983).

Both experiments now score 100%/0.990 on gemini and azure. Previous 0% for with-superpowers/azure
was entirely a grader artifact from bug #982.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

bug: LLM grader reports 'no response provided' when system prompt is present in input
