Skip to content

More e2e nodejs tests#466

Open
friggeri wants to merge 2 commits intomainfrom
frg/more-e2e-tests
Open

More e2e nodejs tests#466
friggeri wants to merge 2 commits intomainfrom
frg/more-e2e-tests

Conversation

@friggeri
Copy link
Collaborator

As titled

@friggeri friggeri requested a review from a team as a code owner February 13, 2026 17:37
Copilot AI review requested due to automatic review settings February 13, 2026 17:37
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds additional Node.js E2E coverage by introducing new Vitest suites and the corresponding replay-proxy snapshots, extending coverage around tool results, streaming/event fidelity, lifecycle APIs, hooks, and built-in tool behaviors. It also updates the replaying CAPI proxy to support “request-only” snapshots used for timeout scenarios.

Changes:

  • Added multiple new Node.js E2E test suites (tool results, streaming fidelity, lifecycle, hooks, built-in tools, multi-turn, error resilience) and their snapshot YAMLs under test/snapshots/**.
  • Updated the replaying proxy to hang indefinitely when a request matches a “request-only” snapshot (for timeout tests).
  • Updated a handful of existing snapshots and skipped the Node compaction E2E suite.

Reviewed changes

Copilot reviewed 51 out of 51 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
test/harness/replayingCapiProxy.ts Adds “request-only snapshot” detection to support timeout test scenarios.
nodejs/test/e2e/builtin_tools.test.ts New E2E suite covering built-in tools (bash/view/edit/create/grep/glob).
nodejs/test/e2e/client_lifecycle.test.ts New E2E coverage for last-session-id and lifecycle event emission.
nodejs/test/e2e/compaction.test.ts Marks compaction suite skipped.
nodejs/test/e2e/error_resilience.test.ts New E2E coverage for error behaviors (destroyed sessions, abort, resume invalid).
nodejs/test/e2e/event_fidelity.test.ts New E2E coverage validating ordering and required fields on emitted events.
nodejs/test/e2e/hooks_extended.test.ts New E2E coverage for extended session hooks.
nodejs/test/e2e/multi_turn.test.ts New E2E coverage for multi-turn behavior using prior tool/file results.
nodejs/test/e2e/session_config.test.ts New E2E coverage for workingDirectory, provider config, and attachments.
nodejs/test/e2e/session_lifecycle.test.ts New E2E coverage for list/delete sessions and getMessages behavior.
nodejs/test/e2e/streaming_fidelity.test.ts New E2E coverage ensuring deltas appear only when streaming is enabled.
nodejs/test/e2e/tool_results.test.ts New E2E coverage for ToolResultObject handling and Zod param validation.
test/snapshots/ask_user/should_invoke_user_input_handler_when_model_uses_ask_user_tool.yaml Updates expected assistant text for ask_user flow.
test/snapshots/builtin_tools/should_capture_exit_code_in_output.yaml New snapshot for bash output/exit code behavior.
test/snapshots/builtin_tools/should_capture_stderr_output.yaml New snapshot for bash stderr capture behavior.
test/snapshots/builtin_tools/should_create_a_new_file.yaml New snapshot for create tool behavior.
test/snapshots/builtin_tools/should_edit_a_file_successfully.yaml New snapshot for edit tool behavior.
test/snapshots/builtin_tools/should_find_files_by_pattern.yaml New snapshot for glob tool behavior.
test/snapshots/builtin_tools/should_handle_nonexistent_file_gracefully.yaml New snapshot for view error handling.
test/snapshots/builtin_tools/should_read_file_with_line_range.yaml New snapshot for ranged view behavior.
test/snapshots/builtin_tools/should_search_for_patterns_in_files.yaml New snapshot for grep behavior.
test/snapshots/client_lifecycle/should_emit_session_lifecycle_events.yaml New snapshot for lifecycle-events test.
test/snapshots/client_lifecycle/should_return_last_session_id_after_sending_a_message.yaml New snapshot for last-session-id test.
test/snapshots/compaction/should_trigger_compaction_with_low_threshold_and_emit_events.yaml Updates snapshot content for compaction scenario.
test/snapshots/event_fidelity/should_emit_assistant_message_with_messageid.yaml New snapshot for assistant messageId behavior.
test/snapshots/event_fidelity/should_emit_events_in_correct_order_for_tool_using_conversation.yaml New snapshot for event ordering in tool-using flow.
test/snapshots/event_fidelity/should_emit_tool_execution_events_with_correct_fields.yaml New snapshot for tool execution events.
test/snapshots/event_fidelity/should_include_valid_fields_on_all_events.yaml New snapshot for general event field validity.
test/snapshots/hooks/should_invoke_both_pretooluse_and_posttooluse_hooks_for_a_single_tool_call.yaml Updates expected assistant text for hook scenario.
test/snapshots/hooks/should_invoke_pretooluse_hook_when_model_runs_a_tool.yaml Updates expected assistant text for hook scenario.
test/snapshots/hooks_extended/should_invoke_onerroroccurred_hook_when_error_occurs.yaml New snapshot for extended hook coverage.
test/snapshots/hooks_extended/should_invoke_onsessionend_hook_when_session_is_destroyed.yaml New snapshot for extended hook coverage.
test/snapshots/hooks_extended/should_invoke_onsessionstart_hook_on_new_session.yaml New snapshot for extended hook coverage.
test/snapshots/hooks_extended/should_invoke_onuserpromptsubmitted_hook_when_sending_a_message.yaml New snapshot for extended hook coverage.
test/snapshots/multi_turn/should_handle_file_creation_then_reading_across_turns.yaml New snapshot for create+read multi-turn flow.
test/snapshots/multi_turn/should_use_tool_results_from_previous_turns.yaml New snapshot for multi-turn state usage.
test/snapshots/permissions/should_deny_permission_when_handler_returns_denied.yaml Updates expected assistant text in permissions flow.
test/snapshots/permissions/should_handle_permission_handler_errors_gracefully.yaml Updates shell tool description text in permissions flow.
test/snapshots/session/should_abort_a_session.yaml Updates expected assistant response in abort scenario.
test/snapshots/session_config/should_accept_message_attachments.yaml New snapshot for attachments request content.
test/snapshots/session_config/should_use_workingdirectory_for_tool_execution.yaml New snapshot validating workingDirectory paths.
test/snapshots/session_lifecycle/should_delete_session_permanently.yaml New snapshot for delete-session lifecycle behavior.
test/snapshots/session_lifecycle/should_list_created_sessions_after_sending_a_message.yaml New snapshot for session listing after activity.
test/snapshots/session_lifecycle/should_return_events_via_getmessages_after_conversation.yaml New snapshot for getMessages lifecycle behavior.
test/snapshots/session_lifecycle/should_support_multiple_concurrent_sessions.yaml New snapshot for concurrent sessions scenario.
test/snapshots/skills/should_load_and_apply_skill_from_skilldirectories.yaml Updates expected assistant text for skills scenario.
test/snapshots/streaming_fidelity/should_not_produce_deltas_when_streaming_is_disabled.yaml New snapshot for non-streaming case.
test/snapshots/streaming_fidelity/should_produce_delta_events_when_streaming_is_enabled.yaml New snapshot for streaming-enabled case.
test/snapshots/tool_results/should_handle_structured_toolresultobject_from_custom_tool.yaml New snapshot for structured ToolResultObject success case.
test/snapshots/tool_results/should_handle_tool_result_with_failure_resulttype.yaml New snapshot for ToolResultObject failure case.
test/snapshots/tool_results/should_pass_validated_zod_parameters_to_tool_handler.yaml New snapshot for Zod-validated tool params flow.
Comments suppressed due to low confidence (2)

nodejs/test/e2e/tool_results.test.ts:13

  • openAiEndpoint is destructured from createSdkTestContext() but never used. With the repo's ESLint config (@typescript-eslint/no-unused-vars), this will fail lint; remove it or rename to _openAiEndpoint if intentionally unused.
describe("Tool Results", async () => {
    const { copilotClient: client } = await createSdkTestContext();

nodejs/test/e2e/hooks_extended.test.ts:16

  • workDir is destructured from createSdkTestContext() but never used. This will fail ESLint @typescript-eslint/no-unused-vars; remove it or rename to _workDir if intentionally unused.
describe("Extended session hooks", async () => {
    const { copilotClient: client } = await createSdkTestContext();

Comment on lines +295 to +301
const headers = {
"content-type": "text/event-stream",
...commonResponseHeaders,
};
options.onResponseStart(200, headers);
// Never call onResponseEnd - hang indefinitely for timeout tests
await new Promise(() => {});
Copy link

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The request-only snapshot branch hangs by awaiting a never-resolving Promise. This keeps an async stack alive indefinitely and is unnecessary to keep the HTTP response open. Consider returning immediately after onResponseStart (and simply never calling onResponseEnd) to simulate a hang without leaking a pending Promise; also consider matching the response content-type (SSE vs JSON) to the request's stream flag, as the non-streaming path uses application/json.

Suggested change
const headers = {
"content-type": "text/event-stream",
...commonResponseHeaders,
};
options.onResponseStart(200, headers);
// Never call onResponseEnd - hang indefinitely for timeout tests
await new Promise(() => {});
const streamingIsRequested =
options.body &&
(JSON.parse(options.body) as { stream?: boolean }).stream ===
true;
const headers = {
"content-type": streamingIsRequested
? "text/event-stream"
: "application/json",
...commonResponseHeaders,
};
options.onResponseStart(200, headers);
// Never call onResponseEnd - hang indefinitely for timeout tests.
// Returning here keeps the HTTP response open without leaking a pending Promise.

Copilot uses AI. Check for mistakes.
});

expect(assistantMessage).not.toBeNull();
expect(assistantMessage?.data.content).toBeTruthy();
Copy link

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is intended to validate failure handling (prompt: "If it fails, say 'service is down'"), but the assertions only check that some content exists. This can pass even if the SDK ignores resultType: "failure" entirely; assert that the assistant response includes the expected fallback text (e.g., "service is down").

Suggested change
expect(assistantMessage?.data.content).toBeTruthy();
const failureContent = assistantMessage?.data.content ?? "";
expect(failureContent).toMatch(/service is down/i);

Copilot uses AI. Check for mistakes.
it("should return undefined for getLastSessionId with no sessions", async () => {
// On a fresh client this may return undefined or an older session ID
const lastSessionId = await client.getLastSessionId();
expect(() => lastSessionId).not.toThrow();
Copy link

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion is a no-op: expect(() => lastSessionId).not.toThrow() cannot fail because it just returns a value. If the intent is to validate behavior when no sessions exist, assert on the return value (e.g., undefined) or explicitly document and check the allowed shapes (string | undefined).

Suggested change
expect(() => lastSessionId).not.toThrow();
expect(lastSessionId === undefined || typeof lastSessionId === "string").toBe(true);

Copilot uses AI. Check for mistakes.
import { createSdkTestContext } from "./harness/sdkTestContext.js";

describe("Compaction", async () => {
describe.skip("Compaction", async () => {
Copy link

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entire Compaction E2E suite is now skipped via describe.skip, which disables coverage for compaction behavior. If this is due to flakiness, consider describe.skipIf(...) with a tracked condition, or leave the suite enabled and mark only the flaky test(s) as skipped with a TODO + issue link.

Suggested change
describe.skip("Compaction", async () => {
describe("Compaction", async () => {

Copilot uses AI. Check for mistakes.
@github-actions
Copy link

Cross-SDK Consistency Review ✅

I've reviewed this PR's additions of comprehensive E2E tests for the Node.js SDK against the Python, Go, and .NET implementations.

📊 Test Coverage Gap (Not a Consistency Issue)

This PR adds extensive test coverage for Node.js that doesn't yet exist in other SDKs:

  • ✅ Built-in tools (bash, view, edit, create, glob, grep)
  • ✅ Event ordering and fidelity
  • ✅ Multi-turn conversation context
  • ✅ Error resilience (destroyed sessions, double abort, etc.)
  • ✅ Session configuration (working directory, provider config, attachments)
  • ✅ Streaming delta events vs non-streaming mode
  • ✅ Tool result objects (success/failure types)
  • ✅ Extended session hooks (onSessionStart, onUserPromptSubmitted, etc.)

Good news: The underlying SDK features exist in all languages—this is purely a test coverage gap, not an API inconsistency. The new tests provide excellent patterns that could be ported to Python/Go/.NET test suites in future work.

🔍 One API Inconsistency Found

getLastSessionId() is missing in Python and Go SDKs:

  • ✅ Node.js: client.getLastSessionId() (tested in client_lifecycle.test.ts)
  • ✅ .NET: client.GetLastSessionIdAsync() (exists but not E2E tested)
  • ❌ Python: Missing (no client.get_last_session_id())
  • ❌ Go: Missing (no client.GetLastSessionId())

This is a minor utility method, but should be added to Python and Go for complete feature parity.

✅ Recommendation

For this PR: ✅ Approve—this adds valuable test coverage and doesn't introduce inconsistencies.

Follow-up work suggested:

  1. Add get_last_session_id() to Python SDK and GetLastSessionId() to Go SDK
  2. Port these comprehensive E2E test patterns to Python/Go/.NET for better cross-SDK validation (especially built-in tools, event fidelity, and error resilience tests)

Note: All other features tested here (streaming, list/delete sessions, extended hooks, tool results, etc.) already exist across all SDKs—this PR just demonstrates best practices for testing them thoroughly.

AI generated by SDK Consistency Review Agent

await session.sendAndWait({ prompt: "Say hello" });

// Wait for session data to flush to disk
await new Promise((r) => setTimeout(r, 500));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make sure none of the tests rely on timings like this? Should poll until the desired condition is met.

@SteveSandersonMS
Copy link
Contributor

@friggeri This looks very useful! Eventually we should make sure we have the same e2e tests across the other languages as well, but I'm personally fine with this PR doing it for Node first and then getting Copilot to copy the same set of tests to other languages as a later follow-up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants