Merged
2 changes: 1 addition & 1 deletion README.md
Expand Up @@ -14,7 +14,7 @@ pytest-codingagents gives you a complete **test→optimize→test loop** for Git
4. **A/B confirm** — use `ab_run` to prove the change actually helps
5. **Ship it** — you now have evidence, not vibes

-Currently supports **GitHub Copilot** via [copilot-sdk](https://www.npmjs.com/package/github-copilot-sdk). More agents (Claude Code, etc.) coming soon.
+Currently supports **GitHub Copilot** via [copilot-sdk](https://www.npmjs.com/package/github-copilot-sdk) with **IDE personas** for VS Code, Claude Code, and Copilot CLI environments.

```python
from pytest_codingagents import CopilotAgent, optimize_instruction
Expand Down
5 changes: 4 additions & 1 deletion docs/how-to/copilot-config.md
Expand Up @@ -9,7 +9,9 @@ test fixture project, a shared team config repo, or anything else.
| Source | Path (relative to the root you point at) | Maps to |
|--------|------------------------------------------|---------|
| Instructions | `.github/copilot-instructions.md` | `instructions` |
-| Custom agents | `.github/agents/*.agent.md` | `custom_agents` |
+| Custom agents | `.github/agents/**/*.agent.md` (recursive) | `custom_agents` |

Agent files are discovered recursively — agents in `subagents/` subdirectories (e.g. `.github/agents/hve-core/subagents/`) are included automatically.
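
The difference between the old and the new discovery rule can be seen with `pathlib` alone. The sketch below is illustrative, not the library's loader: `discover_agent_files` and the two agent file names are hypothetical, while the `hve-core/subagents/` layout follows the example above.

```python
import tempfile
from pathlib import Path


def discover_agent_files(agents_dir: Path) -> list[Path]:
    """Return every ``*.agent.md`` file under ``agents_dir``, at any depth."""
    return sorted(agents_dir.rglob("*.agent.md"))


with tempfile.TemporaryDirectory() as tmp:
    agents_dir = Path(tmp) / ".github" / "agents"
    nested_dir = agents_dir / "hve-core" / "subagents"
    nested_dir.mkdir(parents=True)

    # Hypothetical agent files: one at the top level, one in a subdirectory.
    (agents_dir / "reviewer.agent.md").write_text("Review the change.\n", encoding="utf-8")
    (nested_dir / "planner.agent.md").write_text("Plan the work.\n", encoding="utf-8")

    # A non-recursive glob only sees the top-level file.
    top_level_names = sorted(p.name for p in agents_dir.glob("*.agent.md"))
    # The recursive walk also finds the nested one.
    recursive_names = sorted(p.name for p in discover_agent_files(agents_dir))

print(top_level_names)
print(recursive_names)
```

Sorting the result keeps the agent order stable across platforms, which matters if tests compare agent lists between runs.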

## Basic usage

Expand Down Expand Up @@ -102,6 +104,7 @@ The Markdown body becomes the agent's prompt.

## See also

- [IDE Personas Guide](ide-personas.md) — Simulate VS Code, Claude Code, or Copilot CLI environments
- [A/B Testing Guide](ab-testing.md)
- [GitHub Copilot custom agents docs](https://docs.github.com/en/copilot/how-tos/copilot-cli/customize-copilot/create-custom-agents-for-cli)
- [Custom agents configuration reference](https://docs.github.com/en/copilot/reference/custom-agents-configuration)
108 changes: 108 additions & 0 deletions docs/how-to/ide-personas.md
@@ -0,0 +1,108 @@
# IDE Personas

Agents written for VS Code, Claude Code, or the Copilot CLI each expect a
different native tool set. A `Persona` tells `pytest-codingagents` which
runtime environment to simulate so your tests run the agent the same way
the IDE would.

## The problem

An agent like `rpi-agent` is written for VS Code, where `runSubagent` is a
native tool. In the Copilot SDK headless mode `runSubagent` does not exist,
so the agent silently falls back to direct implementation — the RPI pipeline
never fires, and the test proves nothing.

A persona solves this by:

1. **Injecting polyfill tools** — e.g. a Python-side `runSubagent` that
dispatches registered custom agents as nested SDK runs.
2. **Auto-loading custom instructions** — VS Code and Copilot CLI read
`.github/copilot-instructions.md`; Claude Code reads `CLAUDE.md`. The
persona does the same, prepending the file to the session's system
message when `working_directory` is set.
3. **Setting IDE context** — adds a system-message fragment so the model
knows which environment it is in.

## Built-in personas

| Persona | Auto-loaded file | Polyfilled tools | Use for |
|---|---|---|---|
| `VSCodePersona` *(default)* | `.github/copilot-instructions.md` | `runSubagent` | VS Code Copilot agents |
| `CopilotCLIPersona` | `.github/copilot-instructions.md` | none — `task` + `skill` are native | Copilot terminal agents |
| `ClaudeCodePersona` | `CLAUDE.md` | `task`-dispatch | Claude Code agents |
| `HeadlessPersona` | nothing | none | Raw SDK baseline |

## Usage

```python
from pytest_codingagents import (
    ClaudeCodePersona,
    CopilotAgent,
    CopilotCLIPersona,
    HeadlessPersona,
    VSCodePersona,
)

# VS Code agent — auto-loads .github/copilot-instructions.md, polyfills runSubagent
agent = CopilotAgent(
    persona=VSCodePersona(),
    working_directory=str(workspace),
    custom_agents=my_agents,
)

# Default — VSCodePersona is used automatically
agent = CopilotAgent(custom_agents=my_agents)

# Copilot CLI — same instructions file; task+skill already native, no polyfill needed
agent = CopilotAgent(persona=CopilotCLIPersona(), working_directory=str(workspace))

# Claude Code — loads CLAUDE.md, polyfills task-dispatch
agent = CopilotAgent(
    persona=ClaudeCodePersona(),
    working_directory=str(workspace),
    custom_agents=my_agents,
)

# Headless baseline — no IDE context, no file loaded, no polyfills
agent = CopilotAgent(persona=HeadlessPersona())
```

## Custom instructions loading

Custom instruction loading is **automatic and additive**:

- Fires only when `agent.working_directory` is set
- Fires only when the target file exists in that directory
- Prepends the file content to the session system message (before any
`instructions` you set on the agent)
- If the file is absent, the persona works exactly as without it

This means the same test works against a workspace that has
`.github/copilot-instructions.md` and one that does not — the persona
adapts silently.
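
The loading rules above can be sketched as a small pure function. This is an approximation under stated assumptions, not the persona's actual code: `compose_system_message` is a hypothetical helper, and joining the two parts with a blank line is an assumed separator.

```python
import tempfile
from pathlib import Path
from typing import Optional


def compose_system_message(
    working_directory: Optional[str],
    instructions_file: str,
    instructions: Optional[str],
) -> Optional[str]:
    """Prepend the persona's instruction file, if present, to the agent instructions."""
    parts = []
    if working_directory is not None:
        candidate = Path(working_directory) / instructions_file
        if candidate.is_file():
            text = candidate.read_text(encoding="utf-8").strip()
            if text:
                parts.append(text)
    if instructions:
        parts.append(instructions)
    # Assumed separator: a blank line between file content and instructions.
    return "\n\n".join(parts) or None


RELATIVE_PATH = ".github/copilot-instructions.md"

with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    # No instructions file yet: only the agent's own instructions are used.
    without_file = compose_system_message(str(workspace), RELATIVE_PATH, "Be brief.")

    (workspace / ".github").mkdir()
    (workspace / RELATIVE_PATH).write_text("Use type hints.\n", encoding="utf-8")
    # File present: its content comes first.
    with_file = compose_system_message(str(workspace), RELATIVE_PATH, "Be brief.")

# No working directory: the file lookup is skipped entirely.
no_workspace = compose_system_message(None, RELATIVE_PATH, "Be brief.")
```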

## `runSubagent` polyfill

`VSCodePersona` injects `runSubagent` as a Python-side tool when
`agent.custom_agents` is non-empty. The tool dispatches the named agent
as a nested `run_copilot` call, so the model's sub-agent invocations
produce real results — not stub responses.

The polyfill is a no-op when `custom_agents` is empty.
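
A rough sketch of the dispatch idea follows. The names `make_run_subagent_tool` and `fake_nested_run` are hypothetical, and the nested runner is a stand-in for a real `run_copilot` call, whose signature is not shown here.

```python
from typing import Any, Callable, Optional

# Assumed shape of the nested runner: (agent config, prompt) -> result text.
Runner = Callable[[dict[str, Any], str], str]


def make_run_subagent_tool(
    custom_agents: list[dict[str, Any]],
    run_nested: Runner,
) -> Optional[Callable[[str, str], str]]:
    """Build a runSubagent-style handler, or None when there is nothing to dispatch to."""
    if not custom_agents:
        return None
    registry = {agent["name"]: agent for agent in custom_agents}

    def run_subagent(agent_name: str, prompt: str) -> str:
        agent = registry.get(agent_name)
        if agent is None:
            # Report the problem to the model instead of raising inside the tool.
            return f"Unknown agent {agent_name!r}; available: {sorted(registry)}"
        return run_nested(agent, prompt)

    return run_subagent


def fake_nested_run(agent: dict[str, Any], prompt: str) -> str:
    """Stand-in for a nested SDK run; a real one would start a new session."""
    return f"[{agent['name']}] handled: {prompt}"


agents = [{"name": "researcher", "prompt": "Research the codebase."}]
tool = make_run_subagent_tool(agents, fake_nested_run)
result = tool("researcher", "Find the entry point")
missing = tool("planner", "Plan the work")
no_tool = make_run_subagent_tool([], fake_nested_run)
```

Returning an error string for an unknown agent name is a design choice of this sketch: it lets the model see the list of valid names and retry.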

## Extending personas

Subclass `Persona` and override `apply()`:

```python
from pytest_codingagents import Persona, CopilotAgent

class MyPersona(Persona):
    def apply(self, agent, session_config, mapper):
        # Add your tool polyfills or system message additions here
        session_config.setdefault("system_message", {})["content"] = (
            "Custom context. " +
            session_config.get("system_message", {}).get("content", "")
        )

agent = CopilotAgent(persona=MyPersona())
```
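
Because `apply()` only mutates the `session_config` dict, a custom persona can be checked in isolation before it is wired into an agent. The sketch below assumes that contract and uses a stand-in `Persona` base class so it runs without the package installed; `PrefixPersona` is a hypothetical name.

```python
from typing import Any


class Persona:
    """Stand-in for pytest_codingagents.Persona (assumed interface)."""

    def apply(self, agent: Any, session_config: dict[str, Any], mapper: Any) -> None:
        raise NotImplementedError


class PrefixPersona(Persona):
    """Prepend a fixed fragment to whatever system message already exists."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix

    def apply(self, agent: Any, session_config: dict[str, Any], mapper: Any) -> None:
        message = session_config.setdefault("system_message", {})
        message["content"] = self.prefix + message.get("content", "")


persona = PrefixPersona("Custom context. ")

# Existing content is preserved and ends up after the prefix.
existing = {"system_message": {"content": "Existing instructions."}}
persona.apply(agent=None, session_config=existing, mapper=None)

# A config with no system message gets one created.
empty: dict[str, Any] = {}
persona.apply(agent=None, session_config=empty, mapper=None)
```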

## See also

- [Load from Copilot Config](copilot-config.md)
- [Tool Control](tool-control.md)
1 change: 1 addition & 0 deletions docs/how-to/index.md
Expand Up @@ -6,6 +6,7 @@ Practical guides for common tasks.
- [Optimize Instructions](optimize.md) — Use AI to turn test failures into actionable instruction improvements
- [Assertions](assertions.md) — File helpers and semantic assertions with `llm_assert`
- [Load from Copilot Config](copilot-config.md) — Build a `CopilotAgent` from your real `.github/` config files
- [IDE Personas](ide-personas.md) — Simulate VS Code, Claude Code, or Copilot CLI tool environments
- [Skill Testing](skills.md) — Measure the impact of domain knowledge
- [MCP Server Testing](mcp-servers.md) — Test that the agent uses your custom tools
- [CLI Tool Testing](cli-tools.md) — Verify the agent operates CLI tools correctly
Expand Down
22 changes: 22 additions & 0 deletions docs/reference/api.md
Expand Up @@ -15,3 +15,25 @@
::: pytest_codingagents.InstructionSuggestion
    options:
      show_source: false

## IDE Personas

::: pytest_codingagents.Persona
    options:
      show_source: false

::: pytest_codingagents.VSCodePersona
    options:
      show_source: false

::: pytest_codingagents.CopilotCLIPersona
    options:
      show_source: false

::: pytest_codingagents.ClaudeCodePersona
    options:
      show_source: false

::: pytest_codingagents.HeadlessPersona
    options:
      show_source: false
1 change: 1 addition & 0 deletions docs/reference/configuration.md
Expand Up @@ -4,6 +4,7 @@

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `persona` | `Persona` | `VSCodePersona()` | IDE runtime persona — controls polyfill tools and auto-loads IDE-specific custom instructions. See [IDE Personas](../how-to/ide-personas.md) |
| `name` | `str` | `"copilot"` | Agent identifier for reports |
| `model` | `str \| None` | `None` | Model to use (e.g., `claude-sonnet-4`) |
| `instructions` | `str \| None` | `None` | Instructions for the agent |
Expand Down
2 changes: 1 addition & 1 deletion pyproject.toml
Expand Up @@ -4,7 +4,7 @@ build-backend = "hatchling.build"

[project]
name = "pytest-codingagents"
-version = "0.2.0"
+version = "0.2.1"
description = "Pytest plugin for testing real coding agents via their SDK"
readme = "README.md"
license = { text = "MIT" }
Expand Down
12 changes: 12 additions & 0 deletions src/pytest_codingagents/__init__.py
Expand Up @@ -8,12 +8,24 @@
    InstructionSuggestion,
    optimize_instruction,
)
from pytest_codingagents.copilot.personas import (
    ClaudeCodePersona,
    CopilotCLIPersona,
    HeadlessPersona,
    Persona,
    VSCodePersona,
)
from pytest_codingagents.copilot.result import CopilotResult

__all__ = [
    "CopilotAgent",
    "CopilotResult",
    "InstructionSuggestion",
    "ClaudeCodePersona",
    "CopilotCLIPersona",
    "HeadlessPersona",
    "Persona",
    "VSCodePersona",
    "load_custom_agent",
    "load_custom_agents",
    "optimize_instruction",
Expand Down
26 changes: 23 additions & 3 deletions src/pytest_codingagents/copilot/agent.py
Expand Up @@ -4,10 +4,13 @@

from dataclasses import dataclass, field
from pathlib import Path
-from typing import Any, Literal
+from typing import TYPE_CHECKING, Any, Literal

import yaml

if TYPE_CHECKING:
    from pytest_codingagents.copilot.personas import Persona

Check failure

Code scanning / CodeQL

Module-level cyclic import Error

'Persona' may not be defined if module pytest_codingagents.copilot.personas is imported before module pytest_codingagents.copilot.agent, as the definition of Persona occurs after the cyclic import of pytest_codingagents.copilot.agent.

Copilot Autofix

AI 11 days ago

In general, to fix this, avoid importing from pytest_codingagents.copilot.personas at module level in agent.py, even under TYPE_CHECKING, and instead use forward references (string annotations) that do not require importing the other module. This breaks the cycle at the source while keeping type hints.

The best targeted fix here is to remove the TYPE_CHECKING import of Persona and replace the string-annotated "Persona" hints with either typing.ForwardRef / typing.get_type_hints patterns or, more simply and idiomatically for modern Python, by using from __future__ import annotations (which is already present) and a typing.Protocol or a typing.Any fallback. However, since from __future__ import annotations is already in place, we can safely use the string "Persona" as a forward reference without needing the import at type-check time, and robust type checkers can be configured to resolve it via stub files or by importing personas separately. To stay within the given snippet and not change other files, the minimal fix is:

  • Delete the if TYPE_CHECKING: block that imports Persona.
  • Replace the annotation persona: "Persona" with a more generic but safe type such as Any, while keeping the runtime behaviour (_default_persona() still returns a VSCodePersona object). This removes the dependency on Persona’s definition order and the cyclic import, without affecting functionality.

Concretely, in src/pytest_codingagents/copilot/agent.py:

  1. Remove lines 11–13 (the TYPE_CHECKING import of Persona).
  2. Change the persona field’s annotation from "Persona" to Any (which is already imported at line 7), and keep its default_factory unchanged.

No new imports or helpers are needed: Any is already imported, and _default_persona remains as-is, still performing a deferred import of VSCodePersona for runtime use.

Suggested changeset 1
src/pytest_codingagents/copilot/agent.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/pytest_codingagents/copilot/agent.py b/src/pytest_codingagents/copilot/agent.py
--- a/src/pytest_codingagents/copilot/agent.py
+++ b/src/pytest_codingagents/copilot/agent.py
@@ -8,10 +8,10 @@
 
 import yaml
 
-if TYPE_CHECKING:
-    from pytest_codingagents.copilot.personas import Persona
 
 
+
+
 def _parse_agent_file(path: Path) -> dict[str, Any]:
     """Parse a ``.agent.md`` file into a ``CustomAgentConfig`` dict.
 
@@ -137,7 +135,7 @@
     # the target runtime environment (VS Code, Claude Code, Copilot CLI, etc.)
     # VSCodePersona is the default: it polyfills runSubagent when custom_agents
     # are present, matching VS Code's native behaviour.
-    persona: "Persona" = field(default_factory=lambda: _default_persona())
+    persona: Any = field(default_factory=lambda: _default_persona())
 
     def build_session_config(self) -> dict[str, Any]:
         """Build a SessionConfig dict for the Copilot SDK.
EOF


def _parse_agent_file(path: Path) -> dict[str, Any]:
    """Parse a ``.agent.md`` file into a ``CustomAgentConfig`` dict.
Expand Down Expand Up @@ -130,6 +133,12 @@
    # SDK passthrough for unmapped fields
    extra_config: dict[str, Any] = field(default_factory=dict)

    # IDE persona — controls which polyfill tools are injected to simulate
    # the target runtime environment (VS Code, Claude Code, Copilot CLI, etc.)
    # VSCodePersona is the default: it polyfills runSubagent when custom_agents
    # are present, matching VS Code's native behaviour.
    persona: "Persona" = field(default_factory=lambda: _default_persona())

Check notice

Code scanning / CodeQL

Unnecessary lambda Note

This 'lambda' is just a simple wrapper around a callable object. Use that object directly.

Copilot Autofix

AI 11 days ago

In general, when a lambda in Python merely calls another function with the same arguments (or no arguments), you should pass the function object directly instead of wrapping it in a lambda. This removes unnecessary indirection and follows idiomatic Python use of first-class functions.

In this file, the best fix is to update the persona field definition in the CopilotAgent dataclass so that default_factory is set directly to _default_persona instead of lambda: _default_persona(). This keeps behavior identical: dataclasses will still call default_factory with no arguments each time a default value is needed, and _default_persona is already a zero-argument function. No imports or additional definitions are needed; only the single field definition at line 140 should be modified.

Suggested changeset 1
src/pytest_codingagents/copilot/agent.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/pytest_codingagents/copilot/agent.py b/src/pytest_codingagents/copilot/agent.py
--- a/src/pytest_codingagents/copilot/agent.py
+++ b/src/pytest_codingagents/copilot/agent.py
@@ -137,7 +137,7 @@
     # the target runtime environment (VS Code, Claude Code, Copilot CLI, etc.)
     # VSCodePersona is the default: it polyfills runSubagent when custom_agents
     # are present, matching VS Code's native behaviour.
-    persona: "Persona" = field(default_factory=lambda: _default_persona())
+    persona: "Persona" = field(default_factory=_default_persona)
 
     def build_session_config(self) -> dict[str, Any]:
         """Build a SessionConfig dict for the Copilot SDK.
EOF

    def build_session_config(self) -> dict[str, Any]:
        """Build a SessionConfig dict for the Copilot SDK.

Expand Down Expand Up @@ -243,11 +252,11 @@
        if instructions_file.exists():
            instructions = instructions_file.read_text(encoding="utf-8").strip() or None

-        # Load custom agents
+        # Load custom agents — recursive so subagents/ subdirectories are included
        agents: list[dict[str, Any]] = []
        agents_dir = github_dir / "agents"
        if agents_dir.exists():
-            for agent_file in sorted(agents_dir.glob("*.agent.md")):
+            for agent_file in sorted(agents_dir.rglob("*.agent.md")):
                agents.append(_parse_agent_file(agent_file))

        config: dict[str, Any] = {
Expand All @@ -256,3 +265,14 @@
        }
        config.update(overrides)
        return cls(**config)


def _default_persona() -> "Persona":
    """Return the default persona (VSCodePersona).

    Defined as a function to avoid a circular-import at module level:
    ``personas.py`` imports ``agent.py``, so we defer the import.
    """
    from pytest_codingagents.copilot.personas import VSCodePersona  # noqa: PLC0415

    return VSCodePersona()
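
The deferred-import pattern used by `_default_persona` can be tried in isolation. In the sketch below, `fractions.Fraction` is only a stand-in for a class that lives in a module which cannot be imported at the top of the file, and `Settings` and `_default_ratio` are hypothetical names. The factory is passed directly as `default_factory`, the form suggested by the "unnecessary lambda" note above.

```python
from dataclasses import dataclass, field
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by type checkers only; never executed at runtime.
    from fractions import Fraction


def _default_ratio() -> "Fraction":
    """Build the default value, importing its class only when first needed."""
    from fractions import Fraction  # deferred import

    return Fraction(1, 2)


@dataclass
class Settings:
    # The string annotation is never evaluated, so no runtime import is required.
    ratio: "Fraction" = field(default_factory=_default_ratio)


first = Settings()
second = Settings()
print(first.ratio)
```

Each instance calls the factory once, so `first` and `second` hold separate default objects rather than sharing one.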
25 changes: 25 additions & 0 deletions src/pytest_codingagents/copilot/events.py
Expand Up @@ -274,6 +274,31 @@ def _handle_tool_execution_complete(self, event: SessionEvent) -> None:
        result_text = tc.result if tc else str(result_data)
        self._turns.append(Turn(role="tool", content=f"[{tool_name}] {result_text or ''}"))

    # ── Subagent recording (used by runSubagent tool handler) ──

    def record_subagent_start(self, name: str) -> None:
        """Record a subagent invocation dispatched via the runSubagent tool."""
        self._subagent_start_times[name] = time.monotonic()
        self._subagents.append(SubagentInvocation(name=name, status="started"))

    def record_subagent_complete(self, name: str) -> None:
        """Mark a previously started subagent invocation as completed."""
        start = self._subagent_start_times.pop(name, None)
        duration = (time.monotonic() - start) * 1000 if start else None
        for sa in self._subagents:
            if sa.name == name and sa.status == "started":
                sa.status = "completed"
                sa.duration_ms = duration
                return

    def record_subagent_failed(self, name: str) -> None:
        """Mark a previously started subagent invocation as failed."""
        self._subagent_start_times.pop(name, None)
        for sa in self._subagents:
            if sa.name == name and sa.status == "started":
                sa.status = "failed"
                return

    # ── Subagent events ──

    def _handle_subagent_selected(self, event: SessionEvent) -> None:
Expand Down
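
The start/complete/failed bookkeeping added in `events.py` can be exercised with a reduced stand-in. The sketch below is not the library's event mapper: `SubagentRecorder` is a hypothetical class, and `SubagentInvocation` is redefined with only the three fields the diff touches (`name`, `status`, `duration_ms`).

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SubagentInvocation:
    """Stand-in record type; fields assumed from the diff above."""

    name: str
    status: str
    duration_ms: Optional[float] = None


@dataclass
class SubagentRecorder:
    invocations: list[SubagentInvocation] = field(default_factory=list)
    start_times: dict[str, float] = field(default_factory=dict)

    def start(self, name: str) -> None:
        """Record a new invocation and remember when it began."""
        self.start_times[name] = time.monotonic()
        self.invocations.append(SubagentInvocation(name=name, status="started"))

    def finish(self, name: str, status: str) -> None:
        """Close the oldest still-running invocation with this name."""
        started_at = self.start_times.pop(name, None)
        for invocation in self.invocations:
            if invocation.name == name and invocation.status == "started":
                invocation.status = status
                # Only completed runs get a duration, as in the diff.
                if status == "completed" and started_at is not None:
                    elapsed = time.monotonic() - started_at
                    invocation.duration_ms = elapsed * 1000
                return


recorder = SubagentRecorder()
recorder.start("planner")
recorder.finish("planner", "completed")
recorder.start("coder")
recorder.finish("coder", "failed")
summary = [(inv.name, inv.status) for inv in recorder.invocations]
```

The sketch compares the start time against `None` rather than relying on truthiness, so a start time of `0.0` would still produce a duration.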