Merged
29 commits
cd461cd
Resume 1682 iterating
edburns Jun 18, 2026
edf457c
Phase 03 answer questions
edburns Jun 18, 2026
6d42a43
On branch edburns/1682-java-tool-ergonomics
edburns Jun 18, 2026
0751844
WIP: Phase 3. Question 3.4
edburns Jun 22, 2026
9441d48
WIP: Phase 3. Question 3.6
edburns Jun 22, 2026
cfe0874
WIP: Phase 3. Question 3.6: Answer
edburns Jun 22, 2026
e8408fa
Answer 3.7
edburns Jun 22, 2026
ba84711
Resolve 3.8
edburns Jun 22, 2026
c36ab4c
Initial plan
Copilot Jun 22, 2026
9c4de05
feat(java): create @CopilotTool and @Param annotations with tests
Copilot Jun 22, 2026
a0c1623
spotless
edburns Jun 22, 2026
1a5778a
fix(java): make ToolDefer.NONE serialize as null to prevent wire leak
edburns Jun 22, 2026
3865f46
WIP Phase 4.1
edburns Jun 22, 2026
9f78437
feat(java): create @CopilotTool and @Param annotations (#1763)
Copilot Jun 23, 2026
5ec1aee
Initial plan
Copilot Jun 22, 2026
f413c7f
feat(java): add SchemaGenerator compile-time type-to-JSON-Schema util…
Copilot Jun 23, 2026
403ac7d
WIP 4.3
edburns Jun 23, 2026
7634198
Initial plan
Copilot Jun 23, 2026
3ecfa57
feat(java): Add CopilotToolProcessor annotation processor (task 4.3)
Copilot Jun 23, 2026
eaa25b6
fix: Address code review feedback
Copilot Jun 23, 2026
027a45b
fix: Fix Spotless formatting and test classpath for JDK 17
Copilot Jun 23, 2026
903740b
fix: Fix remaining Spotless violations and test classpath resolution
Copilot Jun 23, 2026
b53f838
fix: Add jackson-core and jackson-annotations to test classpath
Copilot Jun 23, 2026
1ac82f6
fix: Fix Spotless formatting for keyClasses array initializer
Copilot Jun 23, 2026
c3c6291
fix(java): Pass ObjectMapper as parameter in generated $$CopilotToolM…
edburns Jun 24, 2026
f5f7956
fix(java): restrict single-param shortcut to records only
edburns Jun 24, 2026
ac2e240
fix(java): emit typed default values in JSON Schema
edburns Jun 24, 2026
e939709
fix(java): fix double $$CopilotToolMeta suffix in test helper
edburns Jun 24, 2026
887fef0
fix(java): use record constructor for independent flag combination
edburns Jun 24, 2026
@@ -0,0 +1,94 @@
# Plan: Add E2E test for non-ergonomic (low-level) tool definition

## Goal

Add a failsafe IT test that exercises the **current explicit** `ToolDefinition.create()` / `ToolDefinition.createOverride()` API — the "non-ergonomic" approach — with multiple tools, `ToolSet` with `addCustom`/`addBuiltIn`, `getArgumentsAs()` deserialization into a record, and a tool handler that mutates application state. This establishes baseline test coverage before issue #1682 adds the annotation-driven ergonomic API.

## Instructions

Read `java.instructions.md` in my user-level Copilot instructions. This session is about Java.

Use the `new-java-e2e-test-yaml-and-test` skill to create a new failsafe IT test that exercises the non-ergonomic (low-level) approach to tool definition.

### What the test must exercise

The test class should be `LowLevelToolDefinitionIT.java` in `java/src/test/java/com/github/copilot/`. It must demonstrate **all** of the following in a single session:

1. **`ToolDefinition.create(name, description, schema, handler)`**: define at least two custom tools explicitly with `Map<String, Object>` schemas.
2. **`ToolDefinition.createOverride(name, description, schema, handler)`**: define one tool that overrides a built-in tool.
3. **`invocation.getArgumentsAs(SomeRecord.class)`**: at least one handler must deserialize its arguments into a Java record (rather than calling `getArguments()`, which returns the raw `Map`).
4. **`invocation.getArguments()`**: at least one handler must use the raw `Map<String, Object>` accessor.
5. **`ToolSet` with `addCustom("*").addBuiltIn("web_fetch")`**: pass this `ToolSet` to `setAvailableTools(...)` on the `SessionConfig`.
6. **Handler mutates state**: one tool handler must mutate a field on the test class, and the test must assert that the field was updated after the response arrives.
7. **Handler returns `CompletableFuture.completedFuture(...)`**: all handlers return completed futures, which is the current pattern.
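
The handler requirements (typed deserialization, raw `Map` access, state mutation, completed futures) can be sketched without the SDK on the classpath. Everything below is a stand-in for illustration only: `HandlerPatternSketch`, `FakeInvocation` and `getArgumentsAsPhaseArgs` are hypothetical names, not SDK types, and the real `getArgumentsAs(Class)` is assumed to be generic rather than hand-rolled for a single record.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;

/** Stand-in sketch of the handler patterns; none of these types come from the SDK. */
final class HandlerPatternSketch {

    /** Typed argument holder, mirroring the plan's PhaseArgs record. */
    record PhaseArgs(String phase) {}

    /** Hypothetical stand-in for the SDK's tool invocation object. */
    record FakeInvocation(Map<String, Object> arguments) {
        /** Raw accessor: hands back the argument map unchanged. */
        Map<String, Object> getArguments() {
            return arguments;
        }

        /** Hand-rolled typed accessor for this sketch only. */
        PhaseArgs getArgumentsAsPhaseArgs() {
            return new PhaseArgs((String) arguments.get("phase"));
        }
    }

    /** State that the test would assert on after the response arrives. */
    private volatile String currentPhase;

    String currentPhase() {
        return currentPhase;
    }

    /** Typed handler: deserializes, mutates state, returns an already-completed future. */
    CompletableFuture<String> setPhase(FakeInvocation invocation) {
        PhaseArgs args = invocation.getArgumentsAsPhaseArgs();
        currentPhase = args.phase();
        return CompletableFuture.completedFuture("Phase set to " + args.phase());
    }

    /** Raw-map handler: reads the argument straight from the Map. */
    CompletableFuture<String> grepOverride(FakeInvocation invocation) {
        Object query = invocation.getArguments().get("query");
        return CompletableFuture.completedFuture("CUSTOM_GREP: " + query);
    }
}
```

In the real test, the two methods would be passed as handlers to `ToolDefinition.create(...)` and `ToolDefinition.createOverride(...)`; the field is `volatile` here on the assumption that handlers may run on a different thread from the test method.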

### Concrete test design

#### Snapshot category

`tools` (reuse the existing category under `test/snapshots/tools/`).

#### Snapshot file

`test/snapshots/tools/low_level_tool_definition.yaml`

#### Java test method name

`lowLevelToolDefinition` (converts to `low_level_tool_definition` for snapshot lookup).
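
The name conversion can be illustrated with a plain camelCase-to-snake_case helper. This is only a sketch: `SnapshotNames` and `toSnakeCase` are hypothetical names, and the real harness may apply further rules (for example, `testInvokesCustomTool` maps to `invokes_custom_tool.yaml`, which suggests a leading `test` prefix is also dropped; that step is not modelled here).

```java
/** Hypothetical helper: converts a camelCase test method name to snake_case. */
final class SnapshotNames {
    private SnapshotNames() {}

    static String toSnakeCase(String methodName) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < methodName.length(); i++) {
            char c = methodName.charAt(i);
            if (Character.isUpperCase(c)) {
                // Start a new word at every upper-case letter except the first character.
                if (i > 0) {
                    out.append('_');
                }
                out.append(Character.toLowerCase(c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this helper, `SnapshotNames.toSnakeCase("lowLevelToolDefinition")` yields `low_level_tool_definition`, matching the snapshot file name chosen above.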

#### Tool definitions for the test

| Tool | Factory | Name | Description | Schema | Handler behavior |
|------|---------|------|-------------|--------|-----------------|
| Set Phase | `create` | `set_current_phase` | "Sets the current phase of the agent" | `{ type: object, properties: { phase: { type: string, enum: [searching, analyzing, done] } }, required: [phase] }` | Deserializes via `getArgumentsAs(PhaseArgs.class)` where `record PhaseArgs(String phase) {}`. Mutates a `currentPhase` field on the test. Returns `"Phase set to " + phase`. |
| Search | `create` | `search_items` | "Search for items by keyword" | `{ type: object, properties: { keyword: { type: string } }, required: [keyword] }` | Uses `getArguments()` raw Map. Returns a fixed string like `"Found: item_alpha, item_beta"`. |
| Override grep | `createOverride` | `grep` | "Custom grep override" | `{ type: object, properties: { query: { type: string } }, required: [query] }` | Uses `getArguments()`. Returns `"CUSTOM_GREP: " + query`. |
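
As a rough illustration of what "explicit `Map<String, Object>` schemas" means for the table above, the schemas could be built with the JDK's immutable collection factories. `LowLevelSchemas` and its method names are hypothetical; only the schema contents come from the table.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical holder for the hand-written schemas listed in the table. */
final class LowLevelSchemas {
    private LowLevelSchemas() {}

    /** Schema for set_current_phase: one required string property restricted by an enum. */
    static Map<String, Object> setCurrentPhase() {
        return Map.of(
                "type", "object",
                "properties", Map.of(
                        "phase", Map.of(
                                "type", "string",
                                "enum", List.of("searching", "analyzing", "done"))),
                "required", List.of("phase"));
    }

    /** Shape shared by search_items ("keyword") and the grep override ("query"). */
    static Map<String, Object> singleRequiredString(String property) {
        return Map.of(
                "type", "object",
                "properties", Map.of(property, Map.of("type", "string")),
                "required", List.of(property));
    }
}
```

`Map.of` and `List.of` return immutable collections and reject `null` values, which is usually acceptable for a fixed schema; a mutable `LinkedHashMap` would be the alternative if key order or later modification matters.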

#### Prompt

```
First, set the current phase to 'analyzing'. Then search for items with keyword 'copilot'. Report the phase and search results.
```

#### YAML snapshot structure

Two conversations (one for the tool-call turn, one for the final response turn after tool results are provided):

- **Conversation 1** (tool call turn): system `${system}` + user prompt → assistant with `tool_calls` for `set_current_phase` and `search_items`.
- **Conversation 2** (final response turn): full history including tool results → assistant final content mentioning "analyzing", "item_alpha", "item_beta".

Study the existing snapshot files in `test/snapshots/tools/` carefully. In particular, study the snapshot file for the `testInvokesCustomTool` test in `ToolsTest.java` (`test/snapshots/tools/invokes_custom_tool.yaml`). It shows how tool call and tool result conversations are structured. Additionally, study `test/snapshots/tools/should_execute_multiple_custom_tools_in_parallel_single_turn.yaml` which shows multiple parallel tool calls in a single turn.

#### Assertions

1. `response` is not null.
2. Response content contains `"analyzing"` (confirming the phase tool was called).
3. Response content contains `"item_alpha"` or `"item_beta"` (confirming the search tool was called).
4. The `currentPhase` field on the test class equals `"analyzing"` (confirming handler mutated state).
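
The four checks can be written out in plain Java as follows. This is a sketch under assumptions: `LowLevelAssertions.check` is a hypothetical helper that takes the response content as a `String`, whereas the real test would use its test framework's assertions on the actual response object.

```java
/** Hypothetical plain-Java rendering of the four assertions listed above. */
final class LowLevelAssertions {
    private LowLevelAssertions() {}

    /** Throws AssertionError on the first failed check; returns normally if all hold. */
    static void check(String responseContent, String currentPhase) {
        if (responseContent == null) {
            throw new AssertionError("response is null");
        }
        if (!responseContent.contains("analyzing")) {
            throw new AssertionError("phase tool result missing from response");
        }
        if (!(responseContent.contains("item_alpha") || responseContent.contains("item_beta"))) {
            throw new AssertionError("search tool result missing from response");
        }
        if (!"analyzing".equals(currentPhase)) {
            throw new AssertionError("handler did not mutate currentPhase");
        }
    }
}
```

Checking the field separately from the response text matters because the response alone only shows what the model reported, not that the handler actually ran.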

#### Session config

```java
new SessionConfig()
        .setOnPermissionRequest(PermissionHandler.APPROVE_ALL)
        .setAvailableTools(new ToolSet().addCustom("*").addBuiltIn("web_fetch"))
        .setTools(List.of(setPhaseTool, searchTool, grepOverrideTool))
```

### Step-by-step execution

1. Create the YAML snapshot file at `test/snapshots/tools/low_level_tool_definition.yaml`.
2. Create the Java IT file at `java/src/test/java/com/github/copilot/LowLevelToolDefinitionIT.java`.
3. Run `mvn spotless:apply` from the `java/` directory (using the background + log pattern from `java.instructions.md`).
4. Run the test in isolation:
```sh
cd java
# Assign LOG in the current shell (";" rather than "&&") so that tail can see it.
LOG="$(date +%Y%m%d-%H%M)-job-logs.txt"; mvn failsafe:integration-test -Dit.test="LowLevelToolDefinitionIT#lowLevelToolDefinition" -Denforcer.skip=true > "$LOG" 2>&1 & tail -f "$LOG"
```
5. Fix any failures. Iterate until the isolated test passes cleanly.
6. Run the full build:
```sh
cd java
# Assign LOG in the current shell (";" rather than "&&") so that tail can see it.
LOG="$(date +%Y%m%d-%H%M)-job-logs.txt"; mvn clean verify > "$LOG" 2>&1 & tail -f "$LOG"
```
7. Fix any failures from the full build. Iterate until `mvn clean verify` passes cleanly.