Release v1.23.1 #3374
Conversation
|
Hi! I started running the behavior tests on your PR. You will receive a comment with the results shortly. |
|
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly. |
Python API breakage checks: ✅ PASSED |
REST API breakage checks (OpenAPI): ✅ PASSED |
🔄 Running Examples with
|
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ❌ FAIL Exit code 1 | 8.7s | -- |
| 01_standalone_sdk/03_activate_skill.py | ❌ FAIL Exit code 1 | 8.1s | -- |
| 01_standalone_sdk/05_use_llm_registry.py | ❌ FAIL Exit code 1 | 8.7s | -- |
| 01_standalone_sdk/07_mcp_integration.py | ❌ FAIL Exit code 1 | 18.9s | -- |
| 01_standalone_sdk/09_pause_example.py | ❌ FAIL Exit code 1 | 8.5s | -- |
| 01_standalone_sdk/10_persistence.py | ❌ FAIL Exit code 1 | 11.4s | -- |
| 01_standalone_sdk/11_async.py | ❌ FAIL Exit code 1 | 7.4s | -- |
| 01_standalone_sdk/12_custom_secrets.py | ❌ FAIL Exit code 1 | 8.0s | -- |
| 01_standalone_sdk/13_get_llm_metrics.py | ❌ FAIL Exit code 1 | 11.4s | -- |
| 01_standalone_sdk/14_context_condenser.py | ❌ FAIL Exit code 1 | 7.2s | -- |
| 01_standalone_sdk/17_image_input.py | ❌ FAIL Exit code 1 | 8.6s | -- |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 8.8s | $0.00 |
| 01_standalone_sdk/19_llm_routing.py | ❌ FAIL Exit code 1 | 10.1s | -- |
| 01_standalone_sdk/20_stuck_detector.py | ❌ FAIL Exit code 1 | 8.9s | -- |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ❌ FAIL Exit code 1 | 7.0s | -- |
| 01_standalone_sdk/22_anthropic_thinking.py | ❌ FAIL Exit code 1 | 7.0s | -- |
| 01_standalone_sdk/23_responses_reasoning.py | ❌ FAIL Exit code 1 | 5.9s | -- |
| 01_standalone_sdk/24_planning_agent_workflow.py | ❌ FAIL Exit code 1 | 6.1s | -- |
| 01_standalone_sdk/25_agent_delegation.py | ❌ FAIL Exit code 1 | 6.3s | -- |
| 01_standalone_sdk/26_custom_visualizer.py | ❌ FAIL Exit code 1 | 7.1s | -- |
| 01_standalone_sdk/28_ask_agent_example.py | ❌ FAIL Exit code 1 | 8.1s | -- |
| 01_standalone_sdk/29_llm_streaming.py | ❌ FAIL Exit code 1 | 6.9s | -- |
| 01_standalone_sdk/30_tom_agent.py | ❌ FAIL Exit code 1 | 7.1s | -- |
| 01_standalone_sdk/31_iterative_refinement.py | ❌ FAIL Exit code 1 | 6.7s | -- |
| 01_standalone_sdk/32_configurable_security_policy.py | ❌ FAIL Exit code 1 | 6.8s | -- |
| 01_standalone_sdk/33_hooks/main.py | ❌ FAIL Exit code 1 | 13.2s | -- |
| 01_standalone_sdk/34_critic_example.py | ❌ FAIL Exit code 1 | 7.0s | -- |
| 01_standalone_sdk/36_event_json_to_openai_messages.py | ❌ FAIL Exit code 1 | 6.9s | -- |
| 01_standalone_sdk/37_llm_profile_store/main.py | ✅ PASS | 7.7s | $0.00 |
| 01_standalone_sdk/38_browser_session_recording.py | ❌ FAIL Timed out after 600 seconds | 10m 0s | -- |
| 01_standalone_sdk/39_llm_fallback.py | ❌ FAIL Exit code 1 | 7.3s | -- |
| 01_standalone_sdk/40_acp_agent_example.py | ❌ FAIL Exit code 1 | 1m 5s | -- |
| 01_standalone_sdk/41_task_tool_set.py | ❌ FAIL Exit code 1 | 6.6s | -- |
| 01_standalone_sdk/42_file_based_subagents.py | ❌ FAIL Exit code 1 | 6.2s | -- |
| 01_standalone_sdk/43_mixed_marketplace_skills/main.py | ✅ PASS | 7.1s | $0.00 |
| 01_standalone_sdk/44_model_switching_in_convo.py | ❌ FAIL Exit code 1 | 7.2s | -- |
| 01_standalone_sdk/45_parallel_tool_execution.py | ❌ FAIL Exit code 1 | 7.2s | -- |
| 01_standalone_sdk/46_agent_settings.py | ❌ FAIL Exit code 1 | 7.5s | -- |
| 01_standalone_sdk/47_defense_in_depth_security.py | ✅ PASS | 4.1s | $0.00 |
| 01_standalone_sdk/48_conversation_fork.py | ❌ FAIL Exit code 1 | 7.0s | -- |
| 01_standalone_sdk/49_switch_llm_tool.py | ❌ FAIL Exit code 1 | 4.0s | -- |
| 01_standalone_sdk/50_async_cancellation.py | ❌ FAIL Exit code 1 | 5.3s | -- |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ❌ FAIL Exit code 1 | 12.1s | -- |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ❌ FAIL Exit code 1 | 1m 8s | -- |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ❌ FAIL Exit code 1 | 1m 0s | -- |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ❌ FAIL Exit code 1 | 57.0s | -- |
| 02_remote_agent_server/06_custom_tool/main.py | ❌ FAIL Exit code 1 | 4m 51s | -- |
| 02_remote_agent_server/07_convo_with_cloud_workspace.py | ❌ FAIL Exit code 1 | 9.3s | -- |
| 02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py | ❌ FAIL Exit code 1 | 4m 1s | -- |
| 02_remote_agent_server/09_acp_agent_with_remote_runtime.py | ✅ PASS | 59.9s | $0.00 |
| 02_remote_agent_server/10_cloud_workspace_share_credentials.py | ✅ PASS | 36.9s | $0.06 |
| 02_remote_agent_server/11_conversation_fork.py | ❌ FAIL Exit code 1 | 21.1s | -- |
| 02_remote_agent_server/12_settings_and_secrets_api.py | ✅ PASS | 21.7s | $0.00 |
| 02_remote_agent_server/13_workspace_get_llm.py | ❌ FAIL Exit code 1 | 25.6s | -- |
| 04_llm_specific_tools/01_gpt5_apply_patch_preset.py | ❌ FAIL Exit code 1 | 21.8s | -- |
| 04_llm_specific_tools/02_gemini_file_tools.py | ❌ FAIL Exit code 1 | 21.4s | -- |
| 05_skills_and_plugins/01_loading_agentskills/main.py | ❌ FAIL Exit code 1 | 10.0s | -- |
| 05_skills_and_plugins/02_loading_plugins/main.py | ❌ FAIL Exit code 1 | 5.5s | -- |
❌ Some tests failed
Total: 58 | Passed: 7 | Failed: 51 | Total Cost: $0.06
Failed examples:
- examples/01_standalone_sdk/02_custom_tools.py: Exit code 1
- examples/01_standalone_sdk/03_activate_skill.py: Exit code 1
- examples/01_standalone_sdk/05_use_llm_registry.py: Exit code 1
- examples/01_standalone_sdk/07_mcp_integration.py: Exit code 1
- examples/01_standalone_sdk/09_pause_example.py: Exit code 1
- examples/01_standalone_sdk/10_persistence.py: Exit code 1
- examples/01_standalone_sdk/11_async.py: Exit code 1
- examples/01_standalone_sdk/12_custom_secrets.py: Exit code 1
- examples/01_standalone_sdk/13_get_llm_metrics.py: Exit code 1
- examples/01_standalone_sdk/14_context_condenser.py: Exit code 1
- examples/01_standalone_sdk/17_image_input.py: Exit code 1
- examples/01_standalone_sdk/19_llm_routing.py: Exit code 1
- examples/01_standalone_sdk/20_stuck_detector.py: Exit code 1
- examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py: Exit code 1
- examples/01_standalone_sdk/22_anthropic_thinking.py: Exit code 1
- examples/01_standalone_sdk/23_responses_reasoning.py: Exit code 1
- examples/01_standalone_sdk/24_planning_agent_workflow.py: Exit code 1
- examples/01_standalone_sdk/25_agent_delegation.py: Exit code 1
- examples/01_standalone_sdk/26_custom_visualizer.py: Exit code 1
- examples/01_standalone_sdk/28_ask_agent_example.py: Exit code 1
- examples/01_standalone_sdk/29_llm_streaming.py: Exit code 1
- examples/01_standalone_sdk/30_tom_agent.py: Exit code 1
- examples/01_standalone_sdk/31_iterative_refinement.py: Exit code 1
- examples/01_standalone_sdk/32_configurable_security_policy.py: Exit code 1
- examples/01_standalone_sdk/33_hooks/main.py: Exit code 1
- examples/01_standalone_sdk/34_critic_example.py: Exit code 1
- examples/01_standalone_sdk/36_event_json_to_openai_messages.py: Exit code 1
- examples/01_standalone_sdk/38_browser_session_recording.py: Timed out after 600 seconds
- examples/01_standalone_sdk/39_llm_fallback.py: Exit code 1
- examples/01_standalone_sdk/40_acp_agent_example.py: Exit code 1
- examples/01_standalone_sdk/41_task_tool_set.py: Exit code 1
- examples/01_standalone_sdk/42_file_based_subagents.py: Exit code 1
- examples/01_standalone_sdk/44_model_switching_in_convo.py: Exit code 1
- examples/01_standalone_sdk/45_parallel_tool_execution.py: Exit code 1
- examples/01_standalone_sdk/46_agent_settings.py: Exit code 1
- examples/01_standalone_sdk/48_conversation_fork.py: Exit code 1
- examples/01_standalone_sdk/49_switch_llm_tool.py: Exit code 1
- examples/01_standalone_sdk/50_async_cancellation.py: Exit code 1
- examples/02_remote_agent_server/01_convo_with_local_agent_server.py: Exit code 1
- examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/06_custom_tool/main.py: Exit code 1
- examples/02_remote_agent_server/07_convo_with_cloud_workspace.py: Exit code 1
- examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/11_conversation_fork.py: Exit code 1
- examples/02_remote_agent_server/13_workspace_get_llm.py: Exit code 1
- examples/04_llm_specific_tools/01_gpt5_apply_patch_preset.py: Exit code 1
- examples/04_llm_specific_tools/02_gemini_file_tools.py: Exit code 1
- examples/05_skills_and_plugins/01_loading_agentskills/main.py: Exit code 1
- examples/05_skills_and_plugins/02_loading_plugins/main.py: Exit code 1
🔄 Running Examples with
|
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ❌ FAIL Exit code 1 | 8.7s | -- |
| 01_standalone_sdk/03_activate_skill.py | ❌ FAIL Exit code 1 | 7.8s | -- |
| 01_standalone_sdk/05_use_llm_registry.py | ❌ FAIL Exit code 1 | 8.7s | -- |
| 01_standalone_sdk/07_mcp_integration.py | ❌ FAIL Exit code 1 | 19.3s | -- |
| 01_standalone_sdk/09_pause_example.py | ❌ FAIL Exit code 1 | 8.1s | -- |
| 01_standalone_sdk/10_persistence.py | ❌ FAIL Exit code 1 | 11.3s | -- |
| 01_standalone_sdk/11_async.py | ❌ FAIL Exit code 1 | 6.5s | -- |
| 01_standalone_sdk/12_custom_secrets.py | ❌ FAIL Exit code 1 | 6.8s | -- |
| 01_standalone_sdk/13_get_llm_metrics.py | ❌ FAIL Exit code 1 | 11.3s | -- |
| 01_standalone_sdk/14_context_condenser.py | ❌ FAIL Exit code 1 | 6.7s | -- |
| 01_standalone_sdk/17_image_input.py | ❌ FAIL Exit code 1 | 6.5s | -- |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 8.1s | $0.00 |
| 01_standalone_sdk/19_llm_routing.py | ❌ FAIL Exit code 1 | 9.9s | -- |
| 01_standalone_sdk/20_stuck_detector.py | ❌ FAIL Exit code 1 | 8.6s | -- |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ❌ FAIL Exit code 1 | 6.8s | -- |
| 01_standalone_sdk/22_anthropic_thinking.py | ❌ FAIL Exit code 1 | 7.0s | -- |
| 01_standalone_sdk/23_responses_reasoning.py | ❌ FAIL Exit code 1 | 5.7s | -- |
| 01_standalone_sdk/24_planning_agent_workflow.py | ❌ FAIL Exit code 1 | 6.4s | -- |
| 01_standalone_sdk/25_agent_delegation.py | ❌ FAIL Exit code 1 | 6.2s | -- |
| 01_standalone_sdk/26_custom_visualizer.py | ❌ FAIL Exit code 1 | 7.0s | -- |
| 01_standalone_sdk/28_ask_agent_example.py | ❌ FAIL Exit code 1 | 7.6s | -- |
| 01_standalone_sdk/29_llm_streaming.py | ❌ FAIL Exit code 1 | 6.7s | -- |
| 01_standalone_sdk/30_tom_agent.py | ❌ FAIL Exit code 1 | 6.2s | -- |
| 01_standalone_sdk/31_iterative_refinement.py | ❌ FAIL Exit code 1 | 6.4s | -- |
| 01_standalone_sdk/32_configurable_security_policy.py | ❌ FAIL Exit code 1 | 6.9s | -- |
| 01_standalone_sdk/33_hooks/main.py | ❌ FAIL Exit code 1 | 10.2s | -- |
| 01_standalone_sdk/34_critic_example.py | ❌ FAIL Exit code 1 | 7.0s | -- |
| 01_standalone_sdk/36_event_json_to_openai_messages.py | ❌ FAIL Exit code 1 | 6.7s | -- |
| 01_standalone_sdk/37_llm_profile_store/main.py | ✅ PASS | 7.8s | $0.00 |
| 01_standalone_sdk/38_browser_session_recording.py | ❌ FAIL Timed out after 600 seconds | 10m 0s | -- |
| 01_standalone_sdk/39_llm_fallback.py | ❌ FAIL Exit code 1 | 6.7s | -- |
| 01_standalone_sdk/40_acp_agent_example.py | ❌ FAIL Exit code 1 | 1m 5s | -- |
| 01_standalone_sdk/41_task_tool_set.py | ❌ FAIL Exit code 1 | 5.9s | -- |
| 01_standalone_sdk/42_file_based_subagents.py | ❌ FAIL Exit code 1 | 6.1s | -- |
| 01_standalone_sdk/43_mixed_marketplace_skills/main.py | ✅ PASS | 9.7s | $0.00 |
| 01_standalone_sdk/44_model_switching_in_convo.py | ❌ FAIL Exit code 1 | 6.6s | -- |
| 01_standalone_sdk/45_parallel_tool_execution.py | ❌ FAIL Exit code 1 | 6.9s | -- |
| 01_standalone_sdk/46_agent_settings.py | ❌ FAIL Exit code 1 | 6.8s | -- |
| 01_standalone_sdk/47_defense_in_depth_security.py | ✅ PASS | 3.9s | $0.00 |
| 01_standalone_sdk/48_conversation_fork.py | ❌ FAIL Exit code 1 | 5.6s | -- |
| 01_standalone_sdk/49_switch_llm_tool.py | ❌ FAIL Exit code 1 | 3.7s | -- |
| 01_standalone_sdk/50_async_cancellation.py | ❌ FAIL Exit code 1 | 5.4s | -- |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ❌ FAIL Exit code 1 | 13.3s | -- |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ❌ FAIL Exit code 1 | 1m 5s | -- |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ❌ FAIL Exit code 1 | 53.7s | -- |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ❌ FAIL Exit code 1 | 1m 11s | -- |
| 02_remote_agent_server/06_custom_tool/main.py | ❌ FAIL Exit code 1 | 4m 44s | -- |
| 02_remote_agent_server/07_convo_with_cloud_workspace.py | ❌ FAIL Exit code 1 | 7.7s | -- |
| 02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py | ❌ FAIL Exit code 1 | 3m 52s | -- |
| 02_remote_agent_server/09_acp_agent_with_remote_runtime.py | ✅ PASS | 1m 20s | $0.00 |
| 02_remote_agent_server/10_cloud_workspace_share_credentials.py | ✅ PASS | 44.3s | $0.04 |
| 02_remote_agent_server/11_conversation_fork.py | ❌ FAIL Exit code 1 | 19.2s | -- |
| 02_remote_agent_server/12_settings_and_secrets_api.py | ✅ PASS | 21.9s | $0.00 |
| 02_remote_agent_server/13_workspace_get_llm.py | ❌ FAIL Exit code 1 | 20.4s | -- |
| 04_llm_specific_tools/01_gpt5_apply_patch_preset.py | ❌ FAIL Exit code 1 | 23.1s | -- |
| 04_llm_specific_tools/02_gemini_file_tools.py | ❌ FAIL Exit code 1 | 20.6s | -- |
| 05_skills_and_plugins/01_loading_agentskills/main.py | ❌ FAIL Exit code 1 | 9.1s | -- |
| 05_skills_and_plugins/02_loading_plugins/main.py | ❌ FAIL Exit code 1 | 5.5s | -- |
❌ Some tests failed
Total: 58 | Passed: 7 | Failed: 51 | Total Cost: $0.04
Failed examples:
- examples/01_standalone_sdk/02_custom_tools.py: Exit code 1
- examples/01_standalone_sdk/03_activate_skill.py: Exit code 1
- examples/01_standalone_sdk/05_use_llm_registry.py: Exit code 1
- examples/01_standalone_sdk/07_mcp_integration.py: Exit code 1
- examples/01_standalone_sdk/09_pause_example.py: Exit code 1
- examples/01_standalone_sdk/10_persistence.py: Exit code 1
- examples/01_standalone_sdk/11_async.py: Exit code 1
- examples/01_standalone_sdk/12_custom_secrets.py: Exit code 1
- examples/01_standalone_sdk/13_get_llm_metrics.py: Exit code 1
- examples/01_standalone_sdk/14_context_condenser.py: Exit code 1
- examples/01_standalone_sdk/17_image_input.py: Exit code 1
- examples/01_standalone_sdk/19_llm_routing.py: Exit code 1
- examples/01_standalone_sdk/20_stuck_detector.py: Exit code 1
- examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py: Exit code 1
- examples/01_standalone_sdk/22_anthropic_thinking.py: Exit code 1
- examples/01_standalone_sdk/23_responses_reasoning.py: Exit code 1
- examples/01_standalone_sdk/24_planning_agent_workflow.py: Exit code 1
- examples/01_standalone_sdk/25_agent_delegation.py: Exit code 1
- examples/01_standalone_sdk/26_custom_visualizer.py: Exit code 1
- examples/01_standalone_sdk/28_ask_agent_example.py: Exit code 1
- examples/01_standalone_sdk/29_llm_streaming.py: Exit code 1
- examples/01_standalone_sdk/30_tom_agent.py: Exit code 1
- examples/01_standalone_sdk/31_iterative_refinement.py: Exit code 1
- examples/01_standalone_sdk/32_configurable_security_policy.py: Exit code 1
- examples/01_standalone_sdk/33_hooks/main.py: Exit code 1
- examples/01_standalone_sdk/34_critic_example.py: Exit code 1
- examples/01_standalone_sdk/36_event_json_to_openai_messages.py: Exit code 1
- examples/01_standalone_sdk/38_browser_session_recording.py: Timed out after 600 seconds
- examples/01_standalone_sdk/39_llm_fallback.py: Exit code 1
- examples/01_standalone_sdk/40_acp_agent_example.py: Exit code 1
- examples/01_standalone_sdk/41_task_tool_set.py: Exit code 1
- examples/01_standalone_sdk/42_file_based_subagents.py: Exit code 1
- examples/01_standalone_sdk/44_model_switching_in_convo.py: Exit code 1
- examples/01_standalone_sdk/45_parallel_tool_execution.py: Exit code 1
- examples/01_standalone_sdk/46_agent_settings.py: Exit code 1
- examples/01_standalone_sdk/48_conversation_fork.py: Exit code 1
- examples/01_standalone_sdk/49_switch_llm_tool.py: Exit code 1
- examples/01_standalone_sdk/50_async_cancellation.py: Exit code 1
- examples/02_remote_agent_server/01_convo_with_local_agent_server.py: Exit code 1
- examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/06_custom_tool/main.py: Exit code 1
- examples/02_remote_agent_server/07_convo_with_cloud_workspace.py: Exit code 1
- examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/11_conversation_fork.py: Exit code 1
- examples/02_remote_agent_server/13_workspace_get_llm.py: Exit code 1
- examples/04_llm_specific_tools/01_gpt5_apply_patch_preset.py: Exit code 1
- examples/04_llm_specific_tools/02_gemini_file_tools.py: Exit code 1
- examples/05_skills_and_plugins/01_loading_agentskills/main.py: Exit code 1
- examples/05_skills_and_plugins/02_loading_plugins/main.py: Exit code 1
🧪 Integration Tests Results
Overall Success Rate: 50.0%
📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results
litellm_proxy_deepseek_deepseek_v4_flash
litellm_proxy_gemini_3.1_pro_preview
Failed Tests:
litellm_proxy_openai_gpt_5.5
litellm_proxy_moonshot_kimi_k2.6
Failed Tests:
|
🧪 Integration Tests Results
Overall Success Rate: 48.6%
📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results
litellm_proxy_deepseek_deepseek_v4_flash
Skipped Tests:
litellm_proxy_gemini_3.1_pro_preview
Failed Tests:
litellm_proxy_openai_gpt_5.5
litellm_proxy_moonshot_kimi_k2.6
Failed Tests:
|
|
@OpenHands examine all the integration test errors, try to fix, and re-trigger the tests |
|
I'm on it! neubig can track my progress at all-hands.dev |
|
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly. |
🧪 Integration Tests Results
Overall Success Rate: 100.0%
📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results
litellm_proxy_deepseek_deepseek_v4_flash
Skipped Tests:
litellm_proxy_gemini_3.1_pro_preview
litellm_proxy_openai_gpt_5.5
litellm_proxy_minimax_MiniMax_M2.5
Skipped Tests:
|
|
Integration Tests Triggered
Results will be posted here when complete. |
|
OpenHands encountered an error: Request timeout after 30 seconds to https://sqonvajcojrpebas.prod-runtime.all-hands.dev/api/conversations/83bb6474-8bf6-4b56-a38c-20ec6369657d/ask_agent See the conversation for more information. |
Co-authored-by: openhands <openhands@all-hands.dev>
Drop non-release Gemini and workflow-comment experiments from the release branch so the PR diff only carries release mechanics. Co-authored-by: openhands <openhands@all-hands.dev>
|
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly. |
🔄 Running Examples with
|
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 1m 35s | $0.15 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 22.8s | $0.03 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 12.9s | $0.01 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 28.1s | $0.02 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 12.4s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 57.8s | $0.04 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 30.3s | $0.04 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 9.6s | $0.01 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 34.3s | $0.01 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 2m 20s | $0.17 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 22.9s | $0.02 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 15.3s | $0.01 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 14.0s | $0.02 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 13.7s | $0.02 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 14.6s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 24.0s | $0.02 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 1m 33s | $0.02 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 5m 27s | $0.45 |
| 01_standalone_sdk/25_agent_delegation.py | ✅ PASS | 1m 12s | $0.08 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 19.0s | $0.03 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 45.2s | $0.03 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 39.5s | $0.02 |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 9.4s | $0.01 |
| 01_standalone_sdk/31_iterative_refinement.py | ✅ PASS | 5m 34s | $0.38 |
| 01_standalone_sdk/32_configurable_security_policy.py | ✅ PASS | 20.0s | $0.02 |
| 01_standalone_sdk/33_hooks/main.py | ✅ PASS | 30.8s | $0.04 |
| 01_standalone_sdk/34_critic_example.py | ✅ PASS | 8m 43s | $0.78 |
| 01_standalone_sdk/36_event_json_to_openai_messages.py | ✅ PASS | 10.6s | $0.00 |
| 01_standalone_sdk/37_llm_profile_store/main.py | ✅ PASS | 16.8s | $0.00 |
| 01_standalone_sdk/38_browser_session_recording.py | ✅ PASS | 33.4s | $0.03 |
| 01_standalone_sdk/39_llm_fallback.py | ✅ PASS | 10.3s | $0.01 |
| 01_standalone_sdk/40_acp_agent_example.py | ✅ PASS | 30.1s | $0.32 |
| 01_standalone_sdk/41_task_tool_set.py | ✅ PASS | 28.7s | $0.03 |
| 01_standalone_sdk/42_file_based_subagents.py | ✅ PASS | 48.8s | $0.05 |
| 01_standalone_sdk/43_mixed_marketplace_skills/main.py | ✅ PASS | 8.1s | $0.00 |
| 01_standalone_sdk/44_model_switching_in_convo.py | ✅ PASS | 10.5s | $0.01 |
| 01_standalone_sdk/45_parallel_tool_execution.py | ✅ PASS | 6m 26s | $0.58 |
| 01_standalone_sdk/46_agent_settings.py | ✅ PASS | 11.0s | $0.01 |
| 01_standalone_sdk/47_defense_in_depth_security.py | ✅ PASS | 3.2s | $0.00 |
| 01_standalone_sdk/48_conversation_fork.py | ✅ PASS | 13.9s | $0.00 |
| 01_standalone_sdk/49_switch_llm_tool.py | ✅ PASS | 7.0s | $0.03 |
| 01_standalone_sdk/50_async_cancellation.py | ✅ PASS | 12.8s | $0.00 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 35.6s | $0.02 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ✅ PASS | 1m 48s | $0.05 |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ✅ PASS | 1m 9s | $0.06 |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ✅ PASS | 1m 49s | $0.08 |
| 02_remote_agent_server/06_custom_tool/main.py | ✅ PASS | 5m 34s | $0.03 |
| 02_remote_agent_server/07_convo_with_cloud_workspace.py | ✅ PASS | 40.3s | $0.03 |
| 02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py | ✅ PASS | 5m 1s | $0.03 |
| 02_remote_agent_server/09_acp_agent_with_remote_runtime.py | ✅ PASS | 46.3s | $0.11 |
| 02_remote_agent_server/10_cloud_workspace_share_credentials.py | ✅ PASS | 38.2s | $0.06 |
| 02_remote_agent_server/11_conversation_fork.py | ✅ PASS | 1m 30s | $0.00 |
| 02_remote_agent_server/12_settings_and_secrets_api.py | ✅ PASS | 2m 19s | $0.02 |
| 02_remote_agent_server/13_workspace_get_llm.py | ✅ PASS | 55.0s | $0.02 |
| 04_llm_specific_tools/01_gpt5_apply_patch_preset.py | ✅ PASS | 40.6s | $0.03 |
| 04_llm_specific_tools/02_gemini_file_tools.py | ✅ PASS | 55.7s | $0.05 |
| 05_skills_and_plugins/01_loading_agentskills/main.py | ✅ PASS | 14.8s | $0.02 |
| 05_skills_and_plugins/02_loading_plugins/main.py | ✅ PASS | 37.7s | $0.02 |
✅ All tests passed!
Total: 58 | Passed: 58 | Failed: 0 | Total Cost: $4.15
🧪 Integration Tests Results
Overall Success Rate: 48.6%
📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results
litellm_proxy_deepseek_deepseek_v4_flash
Skipped Tests:
litellm_proxy_gemini_3.1_pro_preview
Failed Tests:
litellm_proxy_openai_gpt_5.5
litellm_proxy_moonshot_kimi_k2.6
Failed Tests:
|
🧪 Integration Tests Results
Overall Success Rate: 100.0%
📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results
litellm_proxy_deepseek_deepseek_v4_flash
Skipped Tests:
litellm_proxy_openai_gpt_5.5
litellm_proxy_minimax_MiniMax_M2.5
Skipped Tests:
|
|
Update after the requested rebase/test rerun:
This comment was created by an AI agent (OpenHands) on behalf of neubig. |
|
OpenHands encountered an error: Request timeout after 30 seconds to https://evryoikfbwwdnjky.prod-runtime.all-hands.dev/api/conversations/978ac4fa-7140-4783-b75d-385046923d1c/ask_agent See the conversation for more information. |
I'm reading in the agent's response that Gemini doesn't work and Kimi doesn't work, so it ran other models instead. Can we maybe look into this? @neubig If something broke Gemini, maybe we could investigate and fix the bug? |
|
@OpenHands debug and fix the gemini issues |
|
I'm on it! neubig can track my progress at all-hands.dev |
Co-authored-by: openhands <openhands@all-hands.dev>
|
Debugged the Gemini integration failure and pushed a fix in
Root cause: Vertex/Gemini rejects the OpenAI-style tool call
Fix:
Validation:
This comment was created by an AI agent (OpenHands) on behalf of the maintainer. |
|
Historical note on the original Gemini integration regression (ignoring the release-branch-specific cleanup/revert): Gemini was working in the daily integration workflow, then started failing in the next daily run.
Last good daily Gemini integration run
First bad daily Gemini integration run
So the historical regression window is between the May 21 nightly at
This comment was created by an AI agent (OpenHands) on behalf of the maintainer. |
|
Additional upstream LiteLLM context for the original Gemini regression: I found a very relevant recent LiteLLM fix: BerriAI/litellm#28324 —
The PR summary says it:
and explicitly:
That matches our failure class exactly: There is also a suspicious preceding LiteLLM PR: BerriAI/litellm#28268 —
Its commits include:
The diff added logic like:

    forward_tool_call_id = bool(
        model and VertexGeminiConfig._is_gemini_3_or_newer(model)
    )

and then forwarded OpenAI tool-call IDs into Gemini payloads:

    function_call["id"] = clean_id
    ...
    _function_response["id"] = gemini_call_id

Because
Then #28324 corrected that to be provider-aware:
There is also an older related issue confirming this class of incompatibility: BerriAI/litellm#17080 — It reports: So the likely external explanation for our original timeline is:
This comment was created by an AI agent (OpenHands) on behalf of the maintainer. |
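To make the "provider-aware" idea in the comment above concrete, here is a minimal sketch. It is not LiteLLM's code: `build_function_call`, the provider labels, and the choice of which provider accepts tool-call IDs are all assumptions made for illustration. It only shows the general pattern of attaching an OpenAI-style tool-call ID to a Gemini-style payload when the target provider is known to accept it.

```python
from typing import Any, Optional

# Hypothetical allow-list. Which providers actually accept tool-call IDs is an
# assumption for this sketch, not something taken from LiteLLM.
PROVIDERS_ACCEPTING_TOOL_CALL_IDS = {"gemini"}


def build_function_call(
    name: str,
    args: dict[str, Any],
    tool_call_id: Optional[str],
    provider: str,
) -> dict[str, Any]:
    """Build a Gemini-style function-call part, forwarding the ID only when allowed."""
    function_call: dict[str, Any] = {"name": name, "args": args}
    # Gate on the provider, not only on the model generation.
    if tool_call_id and provider in PROVIDERS_ACCEPTING_TOOL_CALL_IDS:
        function_call["id"] = tool_call_id
    return function_call


vertex_part = build_function_call("finish", {"message": "done"}, "call_abc123", "vertex_ai")
studio_part = build_function_call("finish", {"message": "done"}, "call_abc123", "gemini")
print(vertex_part)
print(studio_part)
```

Under these assumptions, the `vertex_ai` payload carries no `id` field while the `gemini` payload does. A check keyed only on the model name, like the `_is_gemini_3_or_newer` logic quoted above, cannot make that distinction.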
|
Question for whoever manages the eval LiteLLM proxy: can we confirm exactly which LiteLLM proxy build/version was running during the Gemini regression window?
The relevant upstream LiteLLM release/PR timeline I found is:
Candidate buggy LiteLLM release
Relevant included commit:
That PR introduced Gemini 3+ tool-call ID forwarding, including logic equivalent to:

    forward_tool_call_id = bool(
        model and VertexGeminiConfig._is_gemini_3_or_newer(model)
    )

and then forwarded OpenAI tool-call IDs into Gemini payloads:

    function_call["id"] = clean_id
    ...
    _function_response["id"] = gemini_call_id

Since
Candidate fixed LiteLLM release
Relevant included commits:
#28324 makes the ID forwarding provider-aware:
Stable/backport notes
I also checked stable-ish tags:
Why I am asking
Our SDK repo lockfile did not change LiteLLM/OpenAI/Google package versions between the last-good and first-bad daily Gemini integration runs, so the most plausible explanation is an eval LiteLLM proxy deployment/config change.
Can someone confirm whether
This comment was created by an AI agent (OpenHands) on behalf of the maintainer. |
|
@OpenHands revert commit hash e4fe2ba Be careful and don't do something else; once done and pushed, verify the PR diff: is it a regular release only PR? /codereview it. Post directly on github. |
|
I'm on it! enyst can track my progress at all-hands.dev |
Reverts e4fe2ba to keep the v1.23.1 release PR focused on release-only changes. Co-authored-by: openhands <openhands@all-hands.dev>
enyst left a comment
🟢 Taste Rating: Good taste
I reverted e4fe2ba57e15fb3112ed20d113e626a51c3fbf26 in d0b37ec236de2c3ef966533fa9faa44d4415bd95 and re-checked the PR diff against main.
Regular release-only diff check: ✅ Yes. The current PR diff only contains release mechanics:
- version bumps from `1.23.0` to `1.23.1` in the four package `pyproject.toml` files
- matching editable package version updates in `uv.lock`
- `.github/workflows/run-eval.yml` default `sdk_ref` updated to `v1.23.1`
No SDK behavior/test changes remain in the PR diff after the revert.
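A release-only check like the one described above can be approximated with a small path filter. This is a sketch, not the reviewer's actual procedure: the allow-list patterns are inferred from the three kinds of changes listed, and the example file paths are hypothetical.

```python
from fnmatch import fnmatch

# Assumed allow-list, inferred from the release mechanics listed in the review.
RELEASE_ONLY_PATTERNS = [
    "*/pyproject.toml",
    "uv.lock",
    ".github/workflows/run-eval.yml",
]


def non_release_paths(changed_paths: list[str]) -> list[str]:
    """Return the changed paths that match none of the release-only patterns."""
    return [
        path
        for path in changed_paths
        if not any(fnmatch(path, pattern) for pattern in RELEASE_ONLY_PATTERNS)
    ]


# Hypothetical file lists; the real package directory names may differ.
clean_diff = ["pkg-a/pyproject.toml", "uv.lock", ".github/workflows/run-eval.yml"]
dirty_diff = clean_diff + ["pkg-a/src/llm.py"]

print(non_release_paths(clean_diff))
print(non_release_paths(dirty_diff))
```

An empty result means every changed path is on the allow-list. The filter looks only at paths, so a reviewer would still need to confirm that the contents of each file change nothing but version strings.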
[TESTING / RELEASE VALIDATION]
- [PR validation] I am not approving yet because this is a release PR and the repo-specific release review rules require current, PR-specific successful results for `Run tests`, `Run Examples Scripts`, and `Run Integration Tests`.
- At the current head `d0b37ec236de2c3ef966533fa9faa44d4415bd95`, normal checks are still in progress/queued.
- The latest example-test comments visible on the PR are stale and failing.
- The latest integration-test comment visible on the PR is stale and reports a 50% success rate.
A human maintainer should re-check once the latest release validation comments/checks are present and passing.
[RISK ASSESSMENT]
- [Overall PR] ⚠️ Risk Assessment: 🟢 LOW
The diff is a standard patch-release-only change after the revert: package version bumps, lock metadata for local editable packages, and the eval workflow default version. No runtime logic, public API, prompt, tool, or dependency-version changes remain.
VERDICT:
⏸️ Hold approval for validation only: the diff is regular release-only and looks structurally correct, but release-specific workflow evidence is not yet current/passing.
KEY INSIGHT:
The revert successfully restored the PR to release mechanics only; the remaining blocker is release validation evidence, not code content.
This PR review was created by an AI agent (OpenHands) on behalf of @enyst.
|
Since my last summary, there have been no additional changes. Checklist:
The request has been completely addressed, and the changes are concise and limited to removing the non-release SDK/test changes from the release PR. |
Updated exclude-newer date for compatibility.
Release v1.23.1
This PR prepares the release for version 1.23.1.
Release Checklist
- integration-test)
- behavior-test)
- test-examples)
- `release-note-required` PRs are accurately called out in the final release notes
What happens on merge
When this PR is merged, the `create-release.yml` workflow will automatically:
- `v1.23.1` and auto-generated notes, plus an explicit preamble for merged `release-note-required` PRs
- `pypi-release.yml` to publish all packages to PyPI
- `version-bump-prs.yml` to create downstream version bump PRs
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.13-nodejs22-slim
- golang:1.21-bookworm
Pull (multi-arch manifest)

    # Each variant is a multi-arch manifest supporting both amd64 and arm64
    docker pull ghcr.io/openhands/agent-server:b490947-python

Run
All tags pushed for this build
About Multi-Architecture Support
- b490947-python) is a multi-arch manifest supporting both amd64 and arm64
- b490947-python-amd64) are also available if needed