Python: Fix FunctionShellTool throw and empty streaming shell command#6763
Python: Fix FunctionShellTool throw and empty streaming shell command#6763westey-m wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Automated Code Review
Reviewers: 5 | Confidence: 88%
✓ Correctness
The PR correctly fixes two real bugs: (1) replacing the pydantic model
FunctionShellToolwith the TypedDictFunctionShellToolParamto fix the 'not subscriptable' error, and (2) deferring streaming shell-item parsing fromresponse.output_item.addedtoresponse.output_item.donewhere the command/output is actually populated. The extracted_shell_item_to_contentshelper faithfully reproduces the logic of the old inline code in both the non-streaming and streaming paths, with a minor improvement: usinggetattr(item, "call_id", None) or "instead ofitem.call_id if hasattr(item, "call_id") else ", which also handlesNonevalues. Both call sites for the helper correctly receivelocal_shell_tool_namefrom the enclosing method scope. No correctness issues found.
✓ Security Reliability
This PR correctly fixes two bugs: (1) switching from FunctionShellTool (pydantic model) to FunctionShellToolParam (TypedDict) to avoid 'not subscriptable' errors in the OpenAI SDK's _make_tools path, and (2) deferring streaming shell-item parsing from response.output_item.added to response.output_item.done where commands are actually populated. The shared _shell_item_to_contents helper faithfully consolidates the previous duplicate parsing logic. The local_shell_tool_name variable is correctly in scope at the .done handler (resolved at line 2590, used at line 3086). No security, reliability, or resource leak issues were identified.
✓ Test Coverage
The PR adds 4 new tests and updates 2 existing tests to cover the FunctionShellTool→FunctionShellToolParam fix and the streaming shell deferral. The non-streaming path has solid coverage for all three shell item types (shell_call, local_shell_call, shell_call_output) via pre-existing tests that now go through the new _shell_item_to_contents helper. However, the streaming path's new
response.output_item.donehandler only has a test forshell_call— there are no streaming tests forlocal_shell_calldone events orshell_call_outputdone events. Additionally, the streaming done test doesn't verify additional_properties metadata.
✓ Failure Modes
The PR correctly fixes two shell tool bugs: (1) replacing the pydantic model FunctionShellTool with the TypedDict FunctionShellToolParam to avoid 'not subscriptable' errors, and (2) deferring streaming shell-item parsing from response.output_item.added to response.output_item.done where commands are actually populated. The shared _shell_item_to_contents helper is a faithful extraction of the existing logic. The local_shell_tool_name variable is properly in scope at the new call site in the done handler. No silent failure modes, lost errors, or missing cleanup were identified in the changed code.
✓ Design Approach
The shell-tool fix itself looks consistent with the surrounding parsing and tests. The only design concern I found is that this PR also carries a broad unrelated
uv.lockrefresh/downgrade, which does not match the stated shell-focused scope and goes against the repo’s documented guidance to keep runtime dependency lock updates targeted.
Suggestions
- Split the unrelated
python/uv.lockdependency refresh out of this shell-tool fix, or regenerate only the specific dependency update that is actually required. The repo guidance says targeted runtime dependency changes should useuv lock --upgrade-package <dependency-name>and that full lock refreshes are for repo-wide dev tooling updates (python/DEV_SETUP.md:236-238,395-399;python/.github/skills/python-package-management/SKILL.md:42-43,70-72).
Automated review by westey-m's agents
Python Test Coverage Report •
Python Unit Test Overview
|
||||||||||||||||||||||||||||||
There was a problem hiding this comment.
Pull request overview
Fixes the OpenAI Responses shell tool integration in the Python OpenAI chat client by ensuring shell tools are built using the SDK’s dict/param types (avoiding “not subscriptable” failures) and by deferring streaming shell parsing until the completed response.output_item.done events so commands aren’t emitted empty.
Changes:
- Switch shell tool construction from the Pydantic model to
FunctionShellToolParamwhen preparing the OpenAI Responsestoolspayload. - Refactor shell output-item parsing into a shared helper and move streaming shell parsing from
response.output_item.addedtoresponse.output_item.done. - Add/adjust unit tests for shell tool preparation and streaming shell-call parsing behavior.
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| python/uv.lock | Updates the Python lockfile, including dependency specifier alignment and resolved-version changes. |
| python/packages/openai/agent_framework_openai/_chat_client.py | Fixes shell tool payload typing and corrects streaming parsing to emit shell command/output only once items are complete. |
| python/packages/openai/tests/openai/test_openai_chat_client.py | Updates shell tool tests for dict access and adds regressions for the streaming .added vs .done shell-call behavior. |
Motivation & Context
The OpenAI Responses shell tool was unusable in two ways:
Hard failure when a shell tool was attached. Invoking an agent that had a
shell tool raised
ChatClientException: 'FunctionShellTool' object is not subscriptable. The client constructed the OpenAI pydantic modelFunctionShellToolfor the requesttoolspayload, but the Responses requestpath expects the corresponding TypedDict param (
FunctionShellToolParam)and indexes it like a dict, so a model instance blew up. This made every shell
tool call fail (e.g. "value MSFT for me" in the
build_your_own_clawsample).Empty command on streaming shell calls. When streaming, a shell call
surfaced with an empty
command— the run failed and the UI printed thetool/parameter name with no value. The streaming parser read shell items off
the
response.output_item.addedevent, whoseactionis an in-progressskeleton with no
commandsyet. The command (and output) are only populated onthe completed
response.output_item.doneitem, and there are no shell-specificstreaming delta events to fill the gap.
Description & Review Guide
What are the major changes?
FunctionShellToolParam(instead of the pydanticmodel
FunctionShellTool) when building the Responsestoolspayload — twocall sites and the import in
_chat_client.py. This resolves thenot subscriptablethrow.response.output_item.addedtoresponse.output_item.done, where the item carries the fully-populatedaction/output. The.addedcase is now a no-op (pass) with anexplanatory comment. This mirrors how the client already defers
mcp_callresults to
.done._shell_item_to_contents()helper that convertsshell_call/local_shell_call/shell_call_outputitems into frameworkContent. Both the non-streaming parser and the new streaming.donehandlercall it, removing three near-duplicate parsing blocks.
test_prepared_local_shell_tool_survives_make_tools,test_parse_chunk_from_openai_shell_call_added_defers_command, andtest_parse_chunk_from_openai_shell_call_done_emits_command; updated the twoget_shell_tooltests for dict (param) access.What is the impact of these changes?
actual command and output. Hosted shell, local shell, and shell output items
are all handled. Non-streaming behavior is unchanged (it now goes through the
shared helper). No public API changes.
What do you want reviewers to focus on?
response.output_item.doneis correct forboth hosted (
shell_call/shell_call_output) and local (local_shell_call)items, and that the shared helper preserves the previous non-streaming output.
Related Issue
Fixes #6761
Contribution Checklist
breaking changelabel (or add "[BREAKING]" to the title prefix, before or after any language prefix) — a workflow keeps the label and title prefix in sync automatically.