
[Java] @CopilotTool ergonomics 4.5: E2E integration test with replay proxy #1762

Description

@edburns

Overview

Create an E2E failsafe integration test that proves the ergonomic @CopilotTool + ToolDefinition.fromObject() API produces wire behavior identical to that of the low-level ToolDefinition.create() API, as tested against the replay proxy.

Branch: edburns/1682-java-tool-ergonomics on upstream (⚠️ NOT main — PRs must target this branch)

Prerequisites

  • Tasks 4.1–4.4 must be complete and merged to the branch.
  • Before writing any code, read the entire implementation plan at:
    1682-java-tool-ergonomics-prompts-remove-before-merge/dd-3018003-ignorance-reduction-for-implementation-plan.md

Relevant plan sections to carefully re-read

  • Section 4.5 — E2E integration test (the primary task description, includes test code outline)
  • Phase 2 ✅ — Verify the existing low-level path works in Java (context: LowLevelToolDefinitionIT is the reference)
  • The existing skill file for creating Java E2E tests: .github/skills/new-java-e2e-test-yaml-and-test/SKILL.md — read this skill file for the exact patterns, conventions, and harness usage for creating new E2E tests.

Deliverables

Files to create

  1. test/snapshots/tools/ergonomic_tool_definition.yaml — replay proxy snapshot (may be identical to low_level_tool_definition.yaml since the wire format is the same; if so, symlink or copy is acceptable).
  2. java/src/test/java/com/github/copilot/e2e/ErgonomicToolDefinitionIT.java — the failsafe IT class.

Test specification

The test must define a tools class using the ergonomic API:

class MyTestTools {
    // Mutated by setCurrentPhase; the test asserts on this field after the session.
    String currentPhase;

    @CopilotTool("Sets the current phase of the agent")
    String setCurrentPhase(@Param("The phase to transition to") String phase) {
        currentPhase = phase;
        return "Phase set to " + phase;
    }

    @CopilotTool("Search for items by keyword")
    String searchItems(@Param("Search keyword") String keyword) {
        return "Found: item_alpha, item_beta";
    }

    // Registered under the name "grep" and flagged as overriding the built-in tool.
    @CopilotTool(value = "Custom grep override", name = "grep", overridesBuiltInTool = true)
    String grepOverride(@Param("Search query") String query) {
        return "CUSTOM_GREP: " + query;
    }
}

The test method:

  1. Creates MyTestTools instance.
  2. Calls ToolDefinition.fromObject(tools) to get tool definitions.
  3. Creates a CopilotSession configured with the replay proxy URL and the tool definitions.
  4. Sends a prompt that triggers tool invocations.
  5. Asserts that:
    • Tools were invoked (via the tool handler callbacks).
    • The correct arguments were passed to each tool.
    • The session completed successfully.
    • The wire-level behavior is identical to LowLevelToolDefinitionIT.

Snapshot YAML

The snapshot YAML must match the exchange pattern from test/snapshots/tools/low_level_tool_definition.yaml. The tool schemas sent over the wire by the ergonomic API must be byte-for-byte identical to what the low-level API sends (proving the abstraction is lossless).

If the snapshot can be reused as-is (same tool names, same schemas), reference the existing file. If tool names differ, create a new snapshot with appropriate tool definitions.
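
One way to make the byte-for-byte requirement concrete is a small comparison helper like the sketch below. It is only an illustration: WireEquivalence and its methods are hypothetical names, not part of the SDK or the test harness, and it assumes the serialized tool schemas from both paths have already been captured as strings.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper; neither the class nor its methods exist in the SDK.
final class WireEquivalence {
    private WireEquivalence() {}

    /** Returns true only if both payloads encode to exactly the same UTF-8 bytes. */
    static boolean sameBytes(String lowLevelJson, String ergonomicJson) {
        byte[] expected = lowLevelJson.getBytes(StandardCharsets.UTF_8);
        byte[] actual = ergonomicJson.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(expected, actual);
    }

    /** Returns the index of the first differing character, or -1 if the strings are equal. */
    static int firstDifference(String expected, String actual) {
        int limit = Math.min(expected.length(), actual.length());
        for (int i = 0; i < limit; i++) {
            if (expected.charAt(i) != actual.charAt(i)) {
                return i;
            }
        }
        // One string is a prefix of the other, or they are identical.
        return expected.length() == actual.length() ? -1 : limit;
    }
}
```

A strict byte comparison also fails on differences in key order or whitespace, even when the two schemas are semantically equal. That strictness is what "byte-for-byte identical" asks for, and firstDifference helps locate where the two payloads diverge.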

Gating tests and criteria

All of the following must pass before this task is considered complete:

  1. IT runs and passes: mvn verify -Dit.test=ErgonomicToolDefinitionIT passes.

  2. Wire equivalence: The tool definitions registered via fromObject() produce the same JSON-RPC tool registration messages as those from LowLevelToolDefinitionIT. Verify by comparing:

    • Tool names sent to the server.
    • Tool schemas (JSON Schema) sent to the server.
    • Tool invocation request/response format.
  3. Tool invocation verification: Assert that during the test session:

    • At least one tool was invoked by the model.
    • The tool handler received the correct arguments.
    • The tool handler's return value was sent back to the server.
    • State was mutated correctly (e.g., currentPhase field was set).
  4. Override tool verification: If the snapshot exercises the grep override tool, verify it was invoked and returned "CUSTOM_GREP: ...".

  5. No regression: mvn clean verify passes (all existing ITs including LowLevelToolDefinitionIT still pass).

  6. Spotless format check: mvn spotless:check passes.
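
The handler-side checks in criteria 3 and 4 can be sketched in plain Java, as below. This is a minimal illustration, not the IT itself: it calls the tool methods directly rather than through a session, it omits the SDK annotations so that it compiles on its own, and PlainTestTools, HandlerChecks, and the argument values "planning" and "TODO" are hypothetical.

```java
// Plain-Java copy of two tool methods; SDK annotations are omitted so this compiles standalone.
class PlainTestTools {
    String currentPhase;

    String setCurrentPhase(String phase) {
        currentPhase = phase;
        return "Phase set to " + phase;
    }

    String grepOverride(String query) {
        return "CUSTOM_GREP: " + query;
    }
}

// Hypothetical holder for the checks; in the real IT these would be test-framework assertions.
final class HandlerChecks {
    private HandlerChecks() {}

    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    static void run() {
        PlainTestTools tools = new PlainTestTools();
        check(tools.currentPhase == null, "no phase should be set before any tool call");

        // Criterion 3: the handler receives the argument, mutates state, and returns a value.
        String result = tools.setCurrentPhase("planning");
        check("planning".equals(tools.currentPhase), "currentPhase should hold the argument");
        check("Phase set to planning".equals(result), "return value should echo the phase");

        // Criterion 4: the override handler's output carries the CUSTOM_GREP prefix.
        check(tools.grepOverride("TODO").startsWith("CUSTOM_GREP: "), "override prefix expected");
    }
}
```

In the actual IT, the same conditions would be asserted after the session completes, using the values the replay proxy snapshot causes the model to pass.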

Constraints

  • ✅✅ YOU MUST run mvn spotless:apply before every commit.

  • Follow the exact E2E test patterns established by LowLevelToolDefinitionIT and documented in .github/skills/new-java-e2e-test-yaml-and-test/SKILL.md.

  • Use E2ETestContext for managing the replay proxy lifecycle.

  • Test method names are converted to lowercase snake_case for snapshot filenames.

  • Do NOT modify any files outside the java/ and test/snapshots/ directories.

  • Do NOT modify LowLevelToolDefinitionIT or its snapshot.
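
To illustrate the snake_case constraint above, the sketch below shows one plausible camelCase-to-snake_case conversion. It is an assumption about how the mapping behaves, not the harness's actual code: SnapshotNames and toSnakeCase are hypothetical names, and the handling of acronyms or digits is not specified in this issue, so check SKILL.md for the real rule.

```java
// Hypothetical helper; the real conversion lives in the E2E harness and may differ in edge cases.
final class SnapshotNames {
    private SnapshotNames() {}

    /** Lowercases each uppercase letter and inserts an underscore before it (except at index 0). */
    static String toSnakeCase(String methodName) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < methodName.length(); i++) {
            char c = methodName.charAt(i);
            if (Character.isUpperCase(c)) {
                // Avoid a leading underscore and avoid doubling an existing one.
                if (i > 0 && out.charAt(out.length() - 1) != '_') {
                    out.append('_');
                }
                out.append(Character.toLowerCase(c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Under this assumed rule, a test method named ergonomicToolDefinition maps to ergonomic_tool_definition, which lines up with the snapshot filename listed under "Files to create".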
