Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
33 changes: 33 additions & 0 deletions apps/web/src/content/docs/docs/evaluation/eval-cases.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -328,6 +328,19 @@ tests:
# No assertions → default llm-grader evaluates against criteria
```

Suite-level `preprocessors` also apply to this implicit grader. That matters when the agent output is a `ContentFile` block rather than plain text:

```yaml
preprocessors:
- type: xlsx
command: ["bun", "run", "scripts/preprocessors/xlsx-to-csv.ts"]

tests:
- id: spreadsheet-eval
criteria: Output includes the revenue rows
input: Generate the spreadsheet report
```

### `assertions` present — explicit evaluators only

When `assertions` is defined, only the declared evaluators run. No implicit grader is added. Graders that are declared (such as `llm-grader`, `code-grader`, or `rubrics`) receive `criteria` as input automatically.
Expand All @@ -353,6 +366,26 @@ tests:
value: "fix"
```

When you need a custom file conversion for only one grader, add `preprocessors` directly to that evaluator:

```yaml
preprocessors:
- type: xlsx
command: ["bun", "run", "scripts/preprocessors/xlsx-to-csv.ts"]

tests:
- id: mixed-eval
criteria: Response is helpful and mentions the fix
input: "Debug this function..."
assertions:
- type: llm-grader
preprocessors:
- type: xlsx
command: ["bun", "run", "scripts/preprocessors/xlsx-to-json.ts"]
- type: contains
value: "fix"
```

## Metadata

Pass additional context to evaluators via the `metadata` field:
Expand Down
29 changes: 29 additions & 0 deletions apps/web/src/content/docs/docs/evaluation/examples.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -103,6 +103,35 @@ tests:
}
```

## File Output Preprocessing

Convert a binary file output into text before the `llm-grader` sees it:

```yaml
description: Grade spreadsheet output via a preprocessor

preprocessors:
- type: xlsx
command: ["bun", "run", "../scripts/preprocessors/xlsx-to-csv.ts"]

execution:
target: file_output

tests:
- id: spreadsheet-output
input: Generate the spreadsheet report
criteria: The extracted spreadsheet content includes the revenue rows
assertions:
- name: file-check
type: llm-grader
prompt: |
Check whether the transformed spreadsheet text contains the revenue rows:

{{ output }}
```

See [`examples/features/preprocessors/`](../../../../examples/features/preprocessors/) for a runnable end-to-end example with a file-producing target and custom grader target.

## Tool Trajectory

Validate that an agent uses specific tools during execution:
Expand Down
36 changes: 36 additions & 0 deletions apps/web/src/content/docs/docs/evaluators/llm-graders.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -143,6 +143,42 @@ assertions:

The `config` object is available as `ctx.config` inside the template function.

## Preprocessing File Outputs

If an agent returns a `ContentFile` block instead of plain text, you can preprocess that file into text before `llm-grader` builds the candidate prompt.

AgentV always tries a default UTF-8 text read first. That is enough for text-based formats such as CSV, JSON, SQL, Markdown, YAML, HTML, XML, and plain text. For binary formats such as `.xlsx`, `.pdf`, or `.docx`, add a preprocessor command:

```yaml
preprocessors:
- type: xlsx
command: ["bun", "run", "scripts/preprocessors/xlsx-to-csv.ts"]

tests:
- id: spreadsheet-output
criteria: Output includes the revenue rows
input: Generate the spreadsheet report
assertions:
- name: spreadsheet-check
type: llm-grader
prompt: |
Check whether the transformed spreadsheet text contains the revenue rows:

{{ output }}
```

`type` accepts either a short alias such as `xlsx` or a full MIME type such as `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet`.

Resolution order:

- per-evaluator `preprocessors` override suite-level entries
- if no preprocessor matches, AgentV falls back to a UTF-8 text read
- if the fallback read looks binary or invalid, the grader receives a warning note instead of failing the test run

The implicit default `llm-grader` also inherits suite-level `preprocessors`, so you can omit `assertions` and still preprocess file outputs before grading.

See [`examples/features/preprocessors/`](../../../../examples/features/preprocessors/) for a runnable example with a file-producing target and a custom preprocessor script.

## Available Context Fields

TypeScript templates receive a context object with these fields:
Expand Down
2 changes: 2 additions & 0 deletions examples/features/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ Focused examples for specific AgentV capabilities. Find your use case below, the
| [composite](composite/) | Safety gate and weighted aggregation patterns |
| [threshold-evaluator](threshold-evaluator/) | Pass a test if a configurable percentage of sub-evaluators pass |
| [multi-turn-conversation](multi-turn-conversation/) | Grade a multi-turn conversation with per-turn score breakdowns |
| [preprocessors](preprocessors/) | Convert `ContentFile` outputs into grader-readable text before `llm-grader` runs |

---

Expand Down Expand Up @@ -159,6 +160,7 @@ Focused examples for specific AgentV capabilities. Find your use case below, the
| [matrix-evaluation](matrix-evaluation/) | Benchmarking |
| [multi-turn-conversation](multi-turn-conversation/) | LLM grading |
| [nlp-metrics](nlp-metrics/) | Deterministic assertions |
| [preprocessors](preprocessors/) | LLM grading |
| [prompt-template-sdk](prompt-template-sdk/) | TypeScript SDK |
| [repo-lifecycle](repo-lifecycle/) | Workspace & targets |
| [rubric](rubric/) | LLM grading |
Expand Down
30 changes: 30 additions & 0 deletions examples/features/preprocessors/.agentv/providers/file-output.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
import { mkdirSync, writeFileSync } from 'node:fs';
import path from 'node:path';

const outputFile = process.argv[2];
if (!outputFile) {
throw new Error('missing output file path');
}

const generatedDir = path.join(process.cwd(), 'generated');
mkdirSync(generatedDir, { recursive: true });
writeFileSync(path.join(generatedDir, 'report.xlsx'), Buffer.from([0, 159, 146, 150]));

writeFileSync(
outputFile,
JSON.stringify({
output: [
{
role: 'assistant',
content: [
{
type: 'file',
media_type: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
path: 'generated/report.xlsx',
},
],
},
],
}),
'utf8',
);
30 changes: 30 additions & 0 deletions examples/features/preprocessors/.agentv/providers/grader-check.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
import { readFileSync, writeFileSync } from 'node:fs';

const promptFile = process.argv[2];
const outputFile = process.argv[3];

if (!promptFile || !outputFile) {
throw new Error('missing args');
}

const prompt = readFileSync(promptFile, 'utf8');
const passed = prompt.includes('spreadsheet: revenue,total') && prompt.includes('Q1,42');

writeFileSync(
outputFile,
JSON.stringify({
text: JSON.stringify({
score: passed ? 1 : 0,
assertions: [
{
text: 'preprocessed file content reached the llm grader',
passed,
evidence: passed
? 'found transformed spreadsheet text in prompt'
: 'transformed spreadsheet text missing from prompt',
},
],
}),
}),
'utf8',
);
12 changes: 12 additions & 0 deletions examples/features/preprocessors/.agentv/targets.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
$schema: agentv-targets-v2.2
targets:
- name: file_output
provider: file-output
command: bun run .agentv/providers/file-output.ts {OUTPUT_FILE}
cwd: ..
grader_target: grader_check

- name: grader_check
provider: grader-check
command: bun run .agentv/providers/grader-check.ts {PROMPT_FILE} {OUTPUT_FILE}
cwd: ..
27 changes: 27 additions & 0 deletions examples/features/preprocessors/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# Content Preprocessors

Demonstrates how `llm-grader` preprocessors turn `ContentFile` outputs into text before grading.

## What This Shows

- top-level `preprocessors:` shared by all graders in an eval
- an agent target returning a `ContentFile` block instead of plain text
- an `llm-grader` receiving transformed spreadsheet text
- relative `ContentFile.path` resolution against the target workspace

## Running

```bash
# From repository root
bun apps/cli/src/cli.ts eval examples/features/preprocessors/evals/dataset.eval.yaml --target file_output
```

Expected result: the eval passes because the grader sees the transformed spreadsheet text from `generated/report.xlsx`.

## Key Files

- `evals/dataset.eval.yaml` - eval with top-level `preprocessors`
- `.agentv/targets.yaml` - custom file-producing target and custom grader target
- `.agentv/providers/file-output.ts` - emits a relative `ContentFile` path
- `.agentv/providers/grader-check.ts` - passes only when transformed text reaches the grader prompt
- `scripts/preprocessors/xlsx-to-csv.ts` - example spreadsheet preprocessor script
17 changes: 17 additions & 0 deletions examples/features/preprocessors/evals/dataset.eval.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
description: Convert file outputs to grader-readable text before llm grading

preprocessors:
- type: xlsx
command: ["bun", "run", "../scripts/preprocessors/xlsx-to-csv.ts"]

tests:
- id: spreadsheet-output
input: Generate the spreadsheet report
criteria: The extracted spreadsheet content includes the revenue rows
assertions:
- name: file-check
type: llm-grader
prompt: |
Check whether the answer contains the transformed spreadsheet text:

{{ output }}
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
import { readFileSync } from 'node:fs';

const payload = JSON.parse(readFileSync(0, 'utf8')) as { path?: string };

if (!payload.path) {
throw new Error('missing file path');
}

// Example-only placeholder transformation. Copy this script into your project
// and replace it with real spreadsheet extraction logic.
console.log('spreadsheet: revenue,total');
console.log('Q1,42');
Loading
Loading