Objective
plugins/agentv-dev/skills/agentv-bench/agents/grader.md:42 documents:
contains | Check if response includes the value substring (case-insensitive by default)
The implementation at packages/core/src/evaluation/graders/assertions.ts:14-25 does:
export function runContainsAssertion(output: string, value: string): AssertionResult {
  const passed = output.includes(value); // case-sensitive: no toLowerCase
  ...
}
contains-any (assertions.ts:28-45) and contains-all (assertions.ts:48-63) are also case-sensitive (raw .includes()). The icontains* family at assertions.ts:68-123 explicitly lowercases both sides — which only makes sense as a variant if the bare contains* functions are case-sensitive.
So grader.md:42 is both factually wrong and internally inconsistent with the icontains* entries at grader.md:45.
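The mismatch can be illustrated with a minimal, self-contained sketch. The names `SketchResult`, `containsSketch`, and `icontainsSketch` are hypothetical stand-ins, not the real exports from assertions.ts, and the result shape is an assumption reduced to the `passed` and `score` fields mentioned in this issue.

```typescript
// Assumed, simplified result shape; the real AssertionResult may carry more fields.
interface SketchResult {
  passed: boolean;
  score: number;
}

// Stand-in for bare contains: raw String.prototype.includes, so case matters.
function containsSketch(output: string, value: string): SketchResult {
  const passed = output.includes(value);
  return { passed, score: passed ? 1 : 0 };
}

// Stand-in for icontains: lowercase both sides before comparing.
function icontainsSketch(output: string, value: string): SketchResult {
  const passed = output.toLowerCase().includes(value.toLowerCase());
  return { passed, score: passed ? 1 : 0 };
}

const caseSensitive = containsSketch("Hello, world!", "hello");
const caseInsensitive = icontainsSketch("Hello, world!", "hello");

console.log(caseSensitive.passed, caseInsensitive.passed); // false true
```

If bare contains really were case-insensitive, as grader.md:42 claims, the two stand-ins would return the same result and the icontains variant would be redundant.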
Reproducer
tests:
  - id: t
    input: test
    assertions:
      - name: has_hello
        type: contains
        value: hello
Response "Hello, world!" → assertion fails. The auto-generated failure text "Output does not contain \"hello\"" comes from assertions.ts:20 and is a grep anchor for the case-sensitive branch.
Design latitude
- Option 1 (recommended): fix the doc. Update grader.md:42 to state that contains is case-sensitive by default, and direct users to icontains* for case-insensitive matching. This aligns with the existing icontains convention.
- Option 2: fix the implementation. Make contains* case-insensitive by default (change assertions.ts:15, :32, :52). This is a breaking change; any eval relying on case-sensitive contains would start passing incorrectly.
Option 1 is the YAGNI path unless there's concrete evidence users expected case-insensitive behavior from bare contains. icontains* already covers the case-insensitive use case.
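To gauge how disruptive the implementation change would be, one could compare both behaviors over recorded output/value pairs and list those whose verdict flips. This is only a sketch: `Pair` and `flippedByCaseInsensitivity` are hypothetical names, and the sample data is invented for illustration rather than taken from any real eval.

```typescript
// Hypothetical record of one contains check: the model output and the expected substring.
interface Pair {
  output: string;
  value: string;
}

// Returns the pairs whose verdict would change if contains ignored case.
function flippedByCaseInsensitivity(pairs: Pair[]): Pair[] {
  return pairs.filter(({ output, value }) => {
    const current = output.includes(value); // today's case-sensitive behavior
    const proposed = output.toLowerCase().includes(value.toLowerCase());
    return current !== proposed;
  });
}

// Invented sample data, for illustration only.
const sample: Pair[] = [
  { output: "Hello, world!", value: "hello" }, // fails today, would pass
  { output: "status: ok", value: "ok" }, // passes either way
  { output: "exit code 1", value: "ERROR" }, // fails either way
];

const flipped = flippedByCaseInsensitivity(sample);
console.log(flipped.length); // 1
```

Any non-empty result marks a check that currently fails and would silently start passing, which is the risk attached to changing the implementation.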
Acceptance signals
- grader.md:42-44 accurately describes contains, contains-any, contains-all case-sensitivity (case-sensitive if Option 1).
- Regression test in packages/core/test/evaluation/graders/ pinning the chosen behavior, e.g. for Option 1: expect(runContainsAssertion("Hello", "hello").score).toBe(0) and expect(runContainsAssertion("hello", "hello").score).toBe(1).
- No other skill/doc file claims contains* is case-insensitive.
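A table-driven version of the regression test could also pin contains-any and contains-all. The sketch below runs without a test framework and uses local stand-ins (`containsAny`, `containsAll`) built on raw `.includes()`; the real functions in assertions.ts may have different signatures, so a real test would import them instead.

```typescript
type Check = (output: string) => boolean;

// Hypothetical stand-ins mirroring the case-sensitive behavior described in this issue.
const containsAny = (values: string[]): Check => (output) =>
  values.some((v) => output.includes(v));
const containsAll = (values: string[]): Check => (output) =>
  values.every((v) => output.includes(v));

interface Case {
  name: string;
  check: Check;
  output: string;
  expected: boolean;
}

const cases: Case[] = [
  { name: "contains-any: case mismatch fails", check: containsAny(["hello", "bye"]), output: "Hello", expected: false },
  { name: "contains-any: exact case passes", check: containsAny(["hello", "bye"]), output: "hello", expected: true },
  { name: "contains-all: one case mismatch fails", check: containsAll(["foo", "bar"]), output: "foo Bar", expected: false },
  { name: "contains-all: exact case passes", check: containsAll(["foo", "bar"]), output: "foo bar", expected: true },
];

// Collect the names of any cases whose actual verdict differs from the pinned one.
const failures = cases
  .filter((c) => c.check(c.output) !== c.expected)
  .map((c) => c.name);

console.log(failures.length === 0 ? "all cases pinned" : failures.join("\n"));
```

Keeping the cases in one table makes it cheap to flip the expected values if the implementation change is chosen instead.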
Non-goals
- equals (assertions.ts:196), starts-with (:126), ends-with (:140) are all case-sensitive and their grader.md:46-49 entries do not claim otherwise, so they are explicitly out of scope.
- regex case flags are handled via the flags parameter and are out of scope.
Related