feat(core): Content preprocessor for LLM graders to handle binary agent outputs #963

@christso

Description

Objective

Add a content preprocessor pipeline so LLM graders can evaluate agents that produce binary file outputs (e.g., .xlsx, .pdf, .docx). Currently, ContentFile blocks are defined in content.ts but silently ignored by the grading pipeline — the grader receives an empty candidate string.

Problem

  1. extractLastAssistantContent() in providers/types.ts:250 only extracts ContentText blocks — ContentFile is ignored
  2. LLM grader receives empty candidate when agent output is a file
  3. Built-in agent mode's read_file skips binary extensions
  4. Code grader's materializeContentForGrader() handles images but not files
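The gap can be illustrated with a minimal sketch of the content model (simplified, hypothetical type shapes; the real definitions live in content.ts and providers/types.ts):

```typescript
// Hypothetical simplified content model; actual types live in content.ts.
type ContentText = { type: "text"; text: string };
type ContentFile = { type: "file"; path: string; mimeType: string };
type ContentBlock = ContentText | ContentFile;

// Sketch of today's extraction behavior: only text blocks contribute,
// so an agent that answers with a single file yields an empty candidate.
function extractCandidate(blocks: ContentBlock[]): string {
  return blocks
    .filter((b): b is ContentText => b.type === "text")
    .map((b) => b.text)
    .join("\n");
}

const blocks: ContentBlock[] = [
  {
    type: "file",
    path: "report.xlsx",
    mimeType:
      "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
  },
];
// extractCandidate(blocks) yields "" — the grader sees nothing to evaluate.
```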

Prerequisites — ContentFile Production

Before this feature is useful end-to-end, at least one provider must emit ContentFile blocks in agent output. Check whether any current providers (claude, codex, copilot) already produce ContentFile when agents write files, or whether provider-side changes are also needed. If provider work is required, it can be a parallel workstream — the preprocessor pipeline should be built to be testable with mock ContentFile blocks regardless.

Proposed Design

Add a preprocessor pipeline that converts ContentFile blocks to ContentText before graders see them.

Default behavior: read as text

Any ContentFile without a registered preprocessor is read as UTF-8 text. This covers csv, json, sql, md, yaml, html, xml, txt, and any other text-based format — no registration needed.

Preprocessors: only for formats that need transformation

Preprocessors exist only where a raw text read is insufficient (binary formats, or a text format that needs restructuring before grading). Core ships no built-in preprocessors; it provides only the registry and the default text read. Converter scripts are provided as examples that users copy into their projects and customize.

Resolution order:

  1. User-defined preprocessor in YAML → takes priority (overrides default text read)
  2. Default fallback → readFile(path, 'utf-8')
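The resolution order can be sketched as follows (names and signatures are illustrative, not the final API; the injectable `readText` parameter exists only to make the sketch testable):

```typescript
import { readFileSync } from "node:fs";

type ContentText = { type: "text"; text: string };
type ContentFile = { type: "file"; path: string; mimeType: string };
type Preprocessor = (file: ContentFile) => ContentText;

// Hypothetical registry; the real one would live in content-preprocessor.ts
// and be populated only by user-defined preprocessors.
const registry = new Map<string, Preprocessor>();

function preprocess(
  file: ContentFile,
  readText: (path: string) => string = (p) => readFileSync(p, "utf-8"),
): ContentText {
  // 1. A user-defined preprocessor wins, even for text-readable formats.
  const custom = registry.get(file.mimeType);
  if (custom) return custom(file);
  // 2. Default fallback: read the file as UTF-8 text.
  return { type: "text", text: readText(file.path) };
}
```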

Core implementation

  • Preprocessor registry (content-preprocessor.ts): Map<type, (ContentFile) => ContentText> — populated only by user-defined preprocessors
  • Format alias map: Short aliases resolve to MIME types (xlsx → full MIME string); unrecognized values are treated as literal MIME types, so a single type field covers both cases
  • Pipeline integration: Run preprocessing on ContentFile blocks before candidate extraction
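The alias map can be sketched as a plain lookup with pass-through for unrecognized values (the alias set shown is illustrative, not exhaustive):

```typescript
// Hypothetical alias table; unrecognized values pass through unchanged,
// so users can write either a short alias or a literal MIME type.
const FORMAT_ALIASES: Record<string, string> = {
  xlsx: "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
  docx: "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  pdf: "application/pdf",
  html: "text/html",
};

function resolveType(type: string): string {
  return FORMAT_ALIASES[type] ?? type;
}
```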

YAML config — scoping and syntax

Preprocessors are declared top-level in the eval file (shared by all evaluators). Per-evaluator override is possible but optional.

```yaml
# Top-level: applies to all evaluators in this file
preprocessors:
  - type: xlsx
    command: ["bun", "run", "scripts/preprocessors/xlsx-to-csv.ts"]
  - type: html
    command: ["bun", "run", "scripts/preprocessors/html-to-md.ts"]

tests:
  - id: report-check
    assertions:
      - type: llm-grader          # inherits xlsx/html preprocessors
        prompt: grade-report.txt
      - type: rubrics             # also inherits
        criteria:
          - Has revenue column

  - id: special-case
    assertions:
      - type: llm-grader
        preprocessors:            # per-evaluator override
          - type: xlsx
            command: ["bun", "run", "scripts/preprocessors/xlsx-to-json.ts"]
```

Command path resolution

Preprocessor command paths follow the same resolution as code-grader: the last element of the command array is resolved relative to searchRoots (eval file directory + project root) via resolveFileReference(). This keeps preprocessor scripts at a project-level location, not mixed into eval folders:

```
my-project/
  scripts/preprocessors/
    xlsx-to-csv.ts
    html-to-md.ts
  evals/
    dataset.eval.yaml          # references scripts/preprocessors/xlsx-to-csv.ts
```
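The resolution rule can be sketched as a first-match walk over the search roots (this assumes resolveFileReference-like semantics; the injectable `exists` parameter is only there to make the sketch testable):

```typescript
import { existsSync } from "node:fs";
import { resolve } from "node:path";

// Sketch: try each search root in order (eval file directory, then project
// root) and return the first candidate path that exists on disk.
function resolveScriptPath(
  script: string,
  searchRoots: string[],
  exists: (path: string) => boolean = existsSync,
): string | undefined {
  for (const root of searchRoots) {
    const candidate = resolve(root, script);
    if (exists(candidate)) return candidate;
  }
  return undefined;
}
```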

Integration points — hybrid approach

Use both integration strategies:

  1. At extraction boundary (for LLM graders): Modify or wrap extractLastAssistantContent() to run preprocessors on ContentFile blocks → all LLM graders benefit automatically
  2. At materialization (for code graders): Extend materializeContentForGrader() to write ContentFile blocks to temp files and pass paths to code-grader scripts — code graders may want raw file access, not just text
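The extraction-boundary half can be sketched as a pre-pass that rewrites ContentFile blocks to ContentText before the existing text-only extraction runs (type names and the `toText` callback are hypothetical; the real extractLastAssistantContent lives in providers/types.ts):

```typescript
type ContentText = { type: "text"; text: string };
type ContentFile = { type: "file"; path: string; mimeType: string };
type ContentBlock = ContentText | ContentFile;

// Pre-pass: convert file blocks to text blocks so every LLM grader
// benefits without per-grader changes.
function preprocessBlocks(
  blocks: ContentBlock[],
  toText: (file: ContentFile) => ContentText,
): ContentBlock[] {
  return blocks.map((b) => (b.type === "file" ? toText(b) : b));
}

// Existing text-only extraction, unchanged in spirit.
function extractCandidate(blocks: ContentBlock[]): string {
  return blocks
    .filter((b): b is ContentText => b.type === "text")
    .map((b) => b.text)
    .join("\n");
}
```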

Error handling

  • Binary file with no preprocessor: attempt text read → if it fails (invalid UTF-8), log warning, skip the block, note in grader evidence that file content was not evaluable
  • Preprocessor command fails: log stderr, skip the block, note in grader evidence
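Both failure paths share one shape: skip the block with a warning and surface the reason in grader evidence rather than failing the eval. A minimal sketch (names hypothetical):

```typescript
type ContentText = { type: "text"; text: string };
type ContentFile = { type: "file"; path: string; mimeType: string };

type PreprocessResult =
  | { ok: true; block: ContentText }
  | { ok: false; reason: string }; // surfaced in grader evidence

// Sketch: a failed text read (invalid UTF-8) or a failed preprocessor
// command logs a warning and skips the block instead of throwing.
function safePreprocess(
  file: ContentFile,
  run: (file: ContentFile) => ContentText,
): PreprocessResult {
  try {
    return { ok: true, block: run(file) };
  } catch (err) {
    console.warn(`preprocess failed for ${file.path}: ${String(err)}`);
    return { ok: false, reason: `file content was not evaluable: ${file.path}` };
  }
}
```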

Example converter scripts

Ship ready-to-copy converter scripts in examples/features/preprocessors/:

```
examples/features/preprocessors/
  scripts/preprocessors/
    xlsx-to-csv.ts             # xlsx → CSV (zero deps, uses built-in zip/XML parsing)
    html-to-md.ts              # HTML → markdown (zero deps, regex-based)
  evals/
    dataset.eval.yaml          # demonstrates top-level preprocessor config
  README.md                    # usage guide
```

Users copy converter scripts into their project's scripts/preprocessors/ and customize as needed (e.g., pick specific xlsx sheets, filter HTML elements, change output format).
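A converter script's contract is not pinned down by this issue; one plausible convention is that the script receives the file path as its first argument and writes converted text to stdout. A minimal regex-based sketch in that style (hypothetical; the shipped html-to-md.ts would be more thorough):

```typescript
// Hypothetical converter contract: file path in argv, converted text on stdout.
import { readFileSync } from "node:fs";

function htmlToMarkdown(html: string): string {
  // Minimal regex-based sketch: convert a few common elements, strip the rest.
  return html
    .replace(/<h1[^>]*>(.*?)<\/h1>/gi, "# $1\n")
    .replace(/<strong[^>]*>(.*?)<\/strong>/gi, "**$1**")
    .replace(/<[^>]+>/g, "")
    .trim();
}

const [inputPath] = process.argv.slice(2);
if (inputPath) {
  process.stdout.write(htmlToMarkdown(readFileSync(inputPath, "utf-8")));
}
```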

Design Latitude

  • The preprocessor registry pattern is prescribed (aligns with Inspect AI Tier 1 approach from research)
  • Hybrid integration (extraction + materialization) is recommended but implementer may simplify if warranted
  • YAML config schema for custom preprocessors can be deferred to a follow-up if simpler to start with programmatic-only registration
  • Implementation details (sync vs async, exact function signatures) are flexible

Acceptance Signals

  • ContentFile blocks in agent output are converted to text before reaching LLM graders
  • Default: text-based files read as UTF-8 with no configuration needed
  • Top-level preprocessors config shared across all evaluators, with per-evaluator override
  • Custom preprocessors can be registered via YAML config (command scripts) and override default text read
  • Command path resolution matches code-grader behavior (resolveFileReference)
  • Existing text-only workflows are unaffected (non-breaking, ContentFile absent = no-op)
  • Code graders receive materialized file paths alongside text content
  • Binary files with no preprocessor produce a warning, not a failure
  • Example converter scripts: xlsx → CSV, HTML → markdown (TypeScript, zero deps)
  • Unit tests for preprocessor registry and pipeline
  • E2E test: eval with a mock agent that outputs a file, graded via preprocessor

Non-Goals

  • Multimodal LLM grading (sending files natively to vision models) — separate concern
  • Preprocessing for trace display or non-grader stages
  • Streaming preprocessing
  • Provider-side changes to emit ContentFile (separate issue if needed)
  • Built-in preprocessors in core (converters are examples, not built-ins)

Industry Context

| Framework  | Approach                                                      |
|------------|---------------------------------------------------------------|
| Inspect AI | Structured Content union preserved end-to-end (gold standard) |
| Braintrust | Attachment → S3, AttachmentReference passed to scorers        |
| promptfoo  | output: string only, no binary support                        |
| deepeval   | Slug injection into strings (anti-pattern)                    |

No framework has a first-class preprocessor primitive — this is an industry gap. The converter registry pattern (media type → converter function) is universal in adjacent domains (Apache Tika, LangChain document loaders, Unstructured.io).

Related

  • Research: agentevals-research/research/findings/binary-output-preprocessing/README.md
  • Multimodal content model research: agentevals-research/research/findings/multimodal-content-model/README.md

Metadata

Labels: core (Anything pertaining to core functionality of AgentV)
Status: Done