Merged
114 changes: 114 additions & 0 deletions .github/copilot-instructions.md
@@ -0,0 +1,114 @@
# Copilot Instructions for pytest-codingagents

## Build, Test & Lint Commands

```bash
# Install all dependencies (including dev and docs extras)
uv sync --all-extras

# Unit tests (fast, no credentials needed)
uv run pytest tests/unit/ -v

# Run a single unit test file
uv run pytest tests/unit/test_event_mapper.py -v

# Run a single test by name
uv run pytest tests/unit/test_result.py::test_name -v

# Integration tests (require GitHub Copilot credentials via GITHUB_TOKEN or `gh` CLI auth)
uv run pytest tests/ -v -m copilot

# Run one integration test file for a specific model
uv run pytest tests/test_basic.py -k "gpt-5.2" -v

# Lint
uv run ruff check src tests

# Format
uv run ruff format src tests

# Type check
uv run pyright src

# Multi-file integration run with per-file HTML reports
uv run python scripts/run_all.py
```

## Architecture

This is a **pytest plugin** (`pytest11` entry point) that provides a test harness for empirically validating GitHub Copilot agent configurations.

### Data Flow

```
CopilotAgent (frozen config dataclass)
→ runner.run_copilot(agent, prompt)
→ GitHub Copilot SDK client + session
→ SDK SessionEvent stream
→ EventMapper.process_event() (38+ event types → structured data)
→ Turn / ToolCall accumulation
→ CopilotResult (turns, success, usage, reasoning, subagents)
→ copilot_run fixture stashes result for pytest-aitest
→ HTML report with AI-powered insights
```

### Key Modules (`src/pytest_codingagents/`)

| Module | Role |
|--------|------|
| `plugin.py` | Pytest plugin entry point; registers fixtures and `pytest_aitest_analysis_prompt` hook |
| `copilot/agent.py` | `CopilotAgent` frozen dataclass; `build_session_config()` maps user fields → SDK TypedDict |
| `copilot/runner.py` | `run_copilot()` — manages SDK client lifecycle, streams events, returns `CopilotResult` |
| `copilot/events.py` | `EventMapper` — translates raw SDK events into `Turn`/`ToolCall` objects |
| `copilot/result.py` | `CopilotResult`, `UsageInfo`, `SubagentInvocation`; re-exports `Turn`/`ToolCall` from `pytest_aitest` |
| `copilot/fixtures.py` | `copilot_run` and `ab_run` pytest fixtures |
| `copilot/agents.py` | `load_custom_agent()` — parses `.agent.md` YAML frontmatter files |
| `copilot/optimizer.py` | `optimize_instruction()` — uses pydantic-ai to suggest instruction improvements |
| `copilot/personas.py` | `VSCodePersona`, `ClaudeCodePersona`, `CopilotCLIPersona`, `HeadlessPersona` — inject IDE context |

### Two Core Fixtures

**`copilot_run(agent, prompt)`** — Executes a single agent run, auto-stashes result for aitest reporting.

**`ab_run(baseline_agent, treatment_agent, task)`** — Runs two agents in isolated `tmp_path` directories and returns `(baseline_result, treatment_result)` for direct comparison.

## Key Conventions

### Every module uses `from __future__ import annotations`
This enables PEP 563 postponed evaluation of annotations, so forward references can be written without quotes. Add it to every new module.
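
As a minimal illustration of what the import changes (the `Node` class below is hypothetical and not part of this repository), annotations are stored as strings rather than evaluated at definition time, so a class can refer to itself without quoting:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical linked-list node, used only to demonstrate the import."""

    value: int
    # `Node` is not fully defined yet at this point. The annotation is kept
    # as a string instead of being evaluated, so no quotes are needed.
    next: Node | None = None


chain = Node(1, Node(2))
stored_annotation = Node.__annotations__["next"]
print(type(stored_annotation).__name__, stored_annotation)
```

The import has to be the first statement in the module (after the docstring and comments), which is why it is listed as a per-module convention.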

### `CopilotAgent` is a frozen dataclass
It is immutable and safe to share across parametrized tests. User-friendly field names (e.g., `instructions`) are mapped to SDK internals in `build_session_config()`. Unknown SDK fields go in `extra_config: dict`.
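
The immutability guarantee can be sketched with the standard library alone. `AgentConfig` below is a hypothetical stand-in with assumed fields, not the real `CopilotAgent`; it only shows how a frozen dataclass behaves when shared and varied:

```python
from dataclasses import FrozenInstanceError, dataclass, field, replace


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical stand-in for a frozen config; not the real CopilotAgent."""

    model: str
    instructions: str = ""
    extra_config: dict = field(default_factory=dict)


baseline = AgentConfig(model="gpt-5.2")

# Attribute assignment on a frozen instance raises FrozenInstanceError.
try:
    baseline.model = "claude-opus-4.5"
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False

# Variants are derived with dataclasses.replace(), which builds a new
# instance and leaves the original untouched.
treatment = replace(baseline, instructions="Always add docstrings.")
print(mutation_blocked, baseline.instructions == "", treatment.model)
```

Freezing is shallow: a `dict` field can still be mutated in place, and `replace()` reuses the same dict object unless a new one is passed, so treat such fields as read-only by convention.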

### Async-first
All SDK interactions are async. Test functions using `copilot_run` or `ab_run` must be `async def`. `asyncio_mode = "auto"` is set in `pyproject.toml`, so no `@pytest.mark.asyncio` decorator is needed.
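
A rough sketch of the shape such a test takes, using only `asyncio`. `fake_run` is a hypothetical stand-in for an awaitable agent run, and the assumption that the real fixtures are awaited in the same way is not confirmed by this file:

```python
import asyncio


async def fake_run(prompt: str) -> dict:
    """Hypothetical stand-in for an awaitable agent run."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return {"success": True, "prompt": prompt}


async def example_async_test() -> None:
    # Inside a coroutine, the run is awaited like any other async call.
    result = await fake_run("Create hello.py")
    assert result["success"]


# Outside pytest there is no plugin to drive the coroutine, so this sketch
# runs it explicitly. Calling it without a loop only creates a coroutine.
asyncio.run(example_async_test())
outcome = asyncio.run(fake_run("Create hello.py"))
print(outcome["prompt"])
```

With `asyncio_mode = "auto"`, the explicit `asyncio.run()` calls are unnecessary in a real test module because the test runner schedules each `async def` test itself.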

### Integration tests are parametrized over models
```python
import pytest

from pytest_codingagents import CopilotAgent
from tests.conftest import MODELS


@pytest.mark.parametrize("model", MODELS)
async def test_something(copilot_run, model):
    agent = CopilotAgent(model=model, ...)
```
`MODELS = ["gpt-5.2", "claude-opus-4.5"]` is defined in `tests/conftest.py`.

### Result introspection methods
Prefer the typed helper methods over raw field access:
- `result.success` / `result.error`
- `result.tool_was_called("create_file")`
- `result.all_tool_calls` / `result.final_response`
- `result.file(path)` — reads a file from the agent's working directory
- `result.usage` — `UsageInfo` with token counts and estimated cost
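
To show why the helpers read better than raw field access, here is a sketch built on hypothetical stand-in classes. `FakeResult`, `FakeTurn`, and `FakeToolCall` are assumptions made for illustration; the real `CopilotResult` may be structured differently:

```python
from dataclasses import dataclass, field


@dataclass
class FakeToolCall:
    name: str
    arguments: dict = field(default_factory=dict)


@dataclass
class FakeTurn:
    role: str
    content: str = ""
    tool_calls: list = field(default_factory=list)


@dataclass
class FakeResult:
    """Hypothetical result object; the real CopilotResult may differ."""

    turns: list

    @property
    def all_tool_calls(self) -> list:
        # Flatten tool calls across every turn, preserving order.
        return [call for turn in self.turns for call in turn.tool_calls]

    def tool_was_called(self, name: str) -> bool:
        return any(call.name == name for call in self.all_tool_calls)

    @property
    def final_response(self) -> str:
        # Last assistant message with text, or an empty string if none.
        for turn in reversed(self.turns):
            if turn.role == "assistant" and turn.content:
                return turn.content
        return ""


result = FakeResult(turns=[
    FakeTurn("user", "Create hello.py"),
    FakeTurn("assistant", tool_calls=[FakeToolCall("create_file", {"path": "hello.py"})]),
    FakeTurn("assistant", "Created hello.py."),
])

# Helper-based checks state the intent directly.
assert result.tool_was_called("create_file")
assert result.final_response == "Created hello.py."
# Raw access has to walk the turn structure by hand.
assert result.turns[1].tool_calls[0].name == "create_file"
```

The raw-access assertion breaks as soon as the agent adds or reorders a turn, while the helper-based assertions keep passing.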

### Personas inject IDE context post-config
Apply a persona to a `CopilotAgent` before running to simulate a specific IDE environment (e.g., `VSCodePersona` polyfills `runSubagent`). This is separate from the agent config.

### Custom agents use `.agent.md` files
YAML frontmatter + Markdown body. Parsed by `load_custom_agent(path)`. The `mode` frontmatter field controls agent type.
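
The file layout can be illustrated with a simplified, standard-library-only sketch. `split_frontmatter` and the sample values (`name: reviewer`, `mode: agent`) are hypothetical; this is not the implementation of `load_custom_agent`, which presumably relies on a proper YAML parser:

```python
def split_frontmatter(text: str) -> tuple[dict, str]:
    """Split '---'-delimited frontmatter from the Markdown body.

    Simplified sketch: handles flat `key: value` pairs only. A real parser
    would hand the frontmatter block to a YAML library.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter present
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return {}, text  # unterminated frontmatter, treat as plain body
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


# Hypothetical sample file content, for illustration only.
SAMPLE = """---
name: reviewer
mode: agent
---
You review pull requests for style issues.
"""

meta, body = split_frontmatter(SAMPLE)
print(meta["mode"], "|", body)
```

The frontmatter carries the machine-readable settings, and everything after the closing `---` is the agent's prompt text.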

### Ruff rules: E, F, B, I — 100 char line length, double quotes
Enforced by pre-commit hooks and CI. Run `uv run ruff check --fix src tests` before committing.

### Pyright type checking is `basic` mode, scoped to `src/` only
The tests directory is not type-checked by pyright. Type annotations in `src/` should be complete and valid.
4 changes: 2 additions & 2 deletions docs/how-to/optimize.md
@@ -87,11 +87,11 @@ async def test_docstring_instruction_iterates(ab_run, tmp_path):

## API Reference

-::: pytest_codingagents.copilot.optimizer.optimize_instruction
+::: pytest_aitest.execution.optimizer.optimize_instruction

---

-::: pytest_codingagents.copilot.optimizer.InstructionSuggestion
+::: pytest_aitest.execution.optimizer.InstructionSuggestion

## Choosing a Model

4 changes: 2 additions & 2 deletions docs/reference/api.md
@@ -8,11 +8,11 @@
    options:
      show_source: false

-::: pytest_codingagents.optimize_instruction
+::: pytest_aitest.execution.optimizer.optimize_instruction
    options:
      show_source: false

-::: pytest_codingagents.InstructionSuggestion
+::: pytest_aitest.execution.optimizer.InstructionSuggestion
    options:
      show_source: false

14 changes: 11 additions & 3 deletions docs/reference/result.md
@@ -8,14 +8,22 @@
    options:
      show_source: false

-::: pytest_codingagents.copilot.result.SubagentInvocation
+## SubagentInvocation
+
+`SubagentInvocation` is defined in [`pytest_aitest.core.result`](https://sbroenne.github.io/pytest-aitest/reference/result/) and available as:
+
+```python
+from pytest_aitest import SubagentInvocation
+```
+
+::: pytest_aitest.core.result.SubagentInvocation
    options:
      show_source: false

## Turn and ToolCall

-`Turn` and `ToolCall` are re-exported from [`pytest_aitest.core.result`](https://sbroenne.github.io/pytest-aitest/reference/result/) for convenience. See the pytest-aitest documentation for their full API.
+`Turn` and `ToolCall` are defined in [`pytest_aitest.core.result`](https://sbroenne.github.io/pytest-aitest/reference/result/) and available as:

```python
-from pytest_codingagents.copilot.result import Turn, ToolCall
+from pytest_aitest import Turn, ToolCall
```
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -28,7 +28,7 @@ classifiers = [
dependencies = [
"pytest>=9.0",
"github-copilot-sdk>=0.1.25",
-"pytest-aitest>=0.5.6",
+"pytest-aitest>=0.5.7",
"azure-identity>=1.25.2",
"pyyaml>=6.0",
"pydantic-ai>=1.0",
6 changes: 2 additions & 4 deletions src/pytest_codingagents/__init__.py
@@ -2,12 +2,10 @@

from __future__ import annotations

+from pytest_aitest.execution.optimizer import InstructionSuggestion, optimize_instruction
+
from pytest_codingagents.copilot.agent import CopilotAgent
from pytest_codingagents.copilot.agents import load_custom_agent, load_custom_agents
-from pytest_codingagents.copilot.optimizer import (
-    InstructionSuggestion,
-    optimize_instruction,
-)
from pytest_codingagents.copilot.personas import (
ClaudeCodePersona,
CopilotCLIPersona,
2 changes: 1 addition & 1 deletion src/pytest_codingagents/copilot/events.py
@@ -60,11 +60,11 @@
import time
from typing import TYPE_CHECKING, Any

+from pytest_aitest.core.result import SubagentInvocation
from pytest_aitest.execution.cost import estimate_cost

from pytest_codingagents.copilot.result import (
CopilotResult,
-    SubagentInvocation,
ToolCall,
Turn,
UsageInfo,
161 changes: 0 additions & 161 deletions src/pytest_codingagents/copilot/optimizer.py

This file was deleted.
