Skip to content

Python: Fix FunctionShellTool throw and empty streaming shell command#6763

Open
westey-m wants to merge 2 commits into
microsoft:mainfrom
westey-m:python-shelltool-bugfix
Open

Python: Fix FunctionShellTool throw and empty streaming shell command#6763
westey-m wants to merge 2 commits into
microsoft:mainfrom
westey-m:python-shelltool-bugfix

Conversation

@westey-m

Copy link
Copy Markdown
Contributor

Motivation & Context

The OpenAI Responses shell tool was unusable in two ways:

  1. Hard failure when a shell tool was attached. Invoking an agent that had a
    shell tool raised ChatClientException: 'FunctionShellTool' object is not subscriptable. The client constructed the OpenAI pydantic model
    FunctionShellTool for the request tools payload, but the Responses request
    path expects the corresponding TypedDict param (FunctionShellToolParam)
    and indexes it like a dict, so a model instance blew up. This made every shell
    tool call fail (e.g. "value MSFT for me" in the build_your_own_claw sample).

  2. Empty command on streaming shell calls. When streaming, a shell call
    surfaced with an empty command — the run failed and the UI printed the
    tool/parameter name with no value. The streaming parser read shell items off
    the response.output_item.added event, whose action is an in-progress
    skeleton with no commands yet. The command (and output) are only populated on
    the completed response.output_item.done item, and there are no shell-specific
    streaming delta events to fill the gap.

Description & Review Guide

  • What are the major changes?

    • Use the TypedDict param type FunctionShellToolParam (instead of the pydantic
      model FunctionShellTool) when building the Responses tools payload — two
      call sites and the import in _chat_client.py. This resolves the
      not subscriptable throw.
    • Move streaming shell-item parsing from response.output_item.added to
      response.output_item.done, where the item carries the fully-populated
      action/output. The .added case is now a no-op (pass) with an
      explanatory comment. This mirrors how the client already defers mcp_call
      results to .done.
    • Extract a shared _shell_item_to_contents() helper that converts
      shell_call / local_shell_call / shell_call_output items into framework
      Content. Both the non-streaming parser and the new streaming .done handler
      call it, removing three near-duplicate parsing blocks.
    • Tests: added test_prepared_local_shell_tool_survives_make_tools,
      test_parse_chunk_from_openai_shell_call_added_defers_command, and
      test_parse_chunk_from_openai_shell_call_done_emits_command; updated the two
      get_shell_tool tests for dict (param) access.
  • What is the impact of these changes?

    • Shell tools no longer throw when attached, and streaming shell calls carry the
      actual command and output. Hosted shell, local shell, and shell output items
      are all handled. Non-streaming behavior is unchanged (it now goes through the
      shared helper). No public API changes.
  • What do you want reviewers to focus on?

    • That deferring shell parsing to response.output_item.done is correct for
      both hosted (shell_call/shell_call_output) and local (local_shell_call)
      items, and that the shared helper preserves the previous non-streaming output.

Related Issue

Fixes #6761

Contribution Checklist

  • The code builds clean without any errors or warnings
  • All unit tests pass, and I have added new tests where possible
  • The PR follows the Contribution Guidelines
  • This PR is linked to an issue and there is no other open PR for this issue (see Related Issue above).
  • This is not a breaking change. If it is a breaking change, add the breaking change label (or add "[BREAKING]" to the title prefix, before or after any language prefix) — a workflow keeps the label and title prefix in sync automatically.

Copilot AI review requested due to automatic review settings June 26, 2026 11:41
@moonbox3 moonbox3 added the python Usage: [Issues, PRs], Target: Python label Jun 26, 2026

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 5 | Confidence: 88%

✓ Correctness

The PR correctly fixes two real bugs: (1) replacing the pydantic model FunctionShellTool with the TypedDict FunctionShellToolParam to fix the 'not subscriptable' error, and (2) deferring streaming shell-item parsing from response.output_item.added to response.output_item.done where the command/output is actually populated. The extracted _shell_item_to_contents helper faithfully reproduces the logic of the old inline code in both the non-streaming and streaming paths, with a minor improvement: using getattr(item, "call_id", None) or " instead of item.call_id if hasattr(item, "call_id") else ", which also handles None values. Both call sites for the helper correctly receive local_shell_tool_name from the enclosing method scope. No correctness issues found.

✓ Security Reliability

This PR correctly fixes two bugs: (1) switching from FunctionShellTool (pydantic model) to FunctionShellToolParam (TypedDict) to avoid 'not subscriptable' errors in the OpenAI SDK's _make_tools path, and (2) deferring streaming shell-item parsing from response.output_item.added to response.output_item.done where commands are actually populated. The shared _shell_item_to_contents helper faithfully consolidates the previous duplicate parsing logic. The local_shell_tool_name variable is correctly in scope at the .done handler (resolved at line 2590, used at line 3086). No security, reliability, or resource leak issues were identified.

✓ Test Coverage

The PR adds 4 new tests and updates 2 existing tests to cover the FunctionShellTool→FunctionShellToolParam fix and the streaming shell deferral. The non-streaming path has solid coverage for all three shell item types (shell_call, local_shell_call, shell_call_output) via pre-existing tests that now go through the new _shell_item_to_contents helper. However, the streaming path's new response.output_item.done handler only has a test for shell_call — there are no streaming tests for local_shell_call done events or shell_call_output done events. Additionally, the streaming done test doesn't verify additional_properties metadata.

✓ Failure Modes

The PR correctly fixes two shell tool bugs: (1) replacing the pydantic model FunctionShellTool with the TypedDict FunctionShellToolParam to avoid 'not subscriptable' errors, and (2) deferring streaming shell-item parsing from response.output_item.added to response.output_item.done where commands are actually populated. The shared _shell_item_to_contents helper is a faithful extraction of the existing logic. The local_shell_tool_name variable is properly in scope at the new call site in the done handler. No silent failure modes, lost errors, or missing cleanup were identified in the changed code.

✓ Design Approach

The shell-tool fix itself looks consistent with the surrounding parsing and tests. The only design concern I found is that this PR also carries a broad unrelated uv.lock refresh/downgrade, which does not match the stated shell-focused scope and goes against the repo’s documented guidance to keep runtime dependency lock updates targeted.

Suggestions

  • Split the unrelated python/uv.lock dependency refresh out of this shell-tool fix, or regenerate only the specific dependency update that is actually required. The repo guidance says targeted runtime dependency changes should use uv lock --upgrade-package <dependency-name> and that full lock refreshes are for repo-wide dev tooling updates (python/DEV_SETUP.md:236-238,395-399; python/.github/skills/python-package-management/SKILL.md:42-43,70-72).

Automated review by westey-m's agents

@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Python Test Coverage

Python Test Coverage Report •
FileStmtsMissCoverMissing
packages/openai/agent_framework_openai
   _chat_client.py123512789%279, 292, 642–646, 654–657, 663–667, 717–724, 726–728, 735–737, 795, 803, 826, 944, 1043, 1102, 1104, 1106, 1108, 1174, 1188, 1268, 1278, 1283, 1326, 1437–1438, 1453, 1680, 1685, 1689–1691, 1695–1696, 1779, 1789, 1816, 1822, 1832, 1838, 1843, 1849, 1854–1855, 1935, 1979, 1982–1985, 1999, 2001, 2009–2010, 2022, 2064, 2129, 2146, 2149, 2176–2178, 2217, 2234, 2237, 2298–2299, 2334, 2372–2373, 2391–2392, 2435, 2550–2551, 2569, 2652–2660, 2690, 2800, 2835, 2850, 2899–2902, 2912–2914, 2925–2927, 2941–2943, 2953–2954, 2960, 2975
TOTAL42539505688% 

Python Unit Test Overview

Tests Skipped Failures Errors Time
8345 37 💤 0 ❌ 0 🔥 2m 12s ⏱️

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes the OpenAI Responses shell tool integration in the Python OpenAI chat client by ensuring shell tools are built using the SDK’s dict/param types (avoiding “not subscriptable” failures) and by deferring streaming shell parsing until the completed response.output_item.done events so commands aren’t emitted empty.

Changes:

  • Switch shell tool construction from the Pydantic model to FunctionShellToolParam when preparing the OpenAI Responses tools payload.
  • Refactor shell output-item parsing into a shared helper and move streaming shell parsing from response.output_item.added to response.output_item.done.
  • Add/adjust unit tests for shell tool preparation and streaming shell-call parsing behavior.

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.

File Description
python/uv.lock Updates the Python lockfile, including dependency specifier alignment and resolved-version changes.
python/packages/openai/agent_framework_openai/_chat_client.py Fixes shell tool payload typing and corrects streaming parsing to emit shell command/output only once items are complete.
python/packages/openai/tests/openai/test_openai_chat_client.py Updates shell tool tests for dict access and adds regressions for the streaming .added vs .done shell-call behavior.

Comment thread python/packages/openai/tests/openai/test_openai_chat_client.py
@westey-m westey-m marked this pull request as ready for review June 26, 2026 13:11

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 5 | Confidence: 92% | Result: All clear

Reviewed: Correctness, Security Reliability, Test Coverage, Failure Modes, Design Approach


Automated review by westey-m's agents

@westey-m westey-m enabled auto-merge June 26, 2026 14:15
@westey-m westey-m added this pull request to the merge queue Jun 26, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 26, 2026
@westey-m westey-m added this pull request to the merge queue Jun 26, 2026
@westey-m westey-m removed this pull request from the merge queue due to a manual request Jun 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

python Usage: [Issues, PRs], Target: Python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python: Fix FunctionShellTool throw

5 participants