From 4d7844729ba64a375dc11628e472a794c96acc85 Mon Sep 17 00:00:00 2001
From: openhands
Date: Thu, 21 May 2026 15:18:56 +0000
Subject: [PATCH] Refresh recommended LLM docs

Update the LLM recommendations with current OpenHands Index-backed cloud and open-weight model guidance, and feature Qwen3.6-35B-A3B on the local LLM page.

Co-authored-by: openhands
---
 openhands/usage/llms/llms.mdx       | 51 ++++++++++++++++++++---------
 openhands/usage/llms/local-llms.mdx | 38 ++++++++++-----------
 2 files changed, 54 insertions(+), 35 deletions(-)

diff --git a/openhands/usage/llms/llms.mdx b/openhands/usage/llms/llms.mdx
index 301f1c321..b09163f68 100644
--- a/openhands/usage/llms/llms.mdx
+++ b/openhands/usage/llms/llms.mdx
@@ -15,20 +15,38 @@ for the canonical list of supported parameters.

 ## Model Recommendations

-Based on our evaluations of language models for coding tasks (using the SWE-bench dataset), we can provide some
-recommendations for model selection. Our latest benchmarking results can be found in
-[this spreadsheet](https://docs.google.com/spreadsheets/d/1wOUdFCMyY6Nt0AIqF705KN4JKOWgeI4wUGUP60krXXs/edit?gid=0).
+Model quality for coding agents changes quickly. These recommendations are based on current
+[OpenHands Index](https://index.openhands.dev/home) results where available. The linked
+[openhands-index-results repository](https://github.com/OpenHands/openhands-index-results) contains the full scores and
+trajectories for each run.

-Based on these findings and community feedback, these are the latest models that have been verified to work reasonably well with OpenHands:
+Use the strongest model you can afford for long-running or high-stakes tasks. Use lower-cost profiles for routine edits,
+then switch back to a stronger model for planning, debugging, and review.

-### Cloud / API-Based Models
+### Best Cloud Models by Family

-- [anthropic/claude-sonnet-4-20250514](https://www.anthropic.com/api) (recommended)
-- [anthropic/claude-sonnet-4-5-20250929](https://www.anthropic.com/api) (recommended)
-- [openai/gpt-5-2025-08-07](https://openai.com/api/) (recommended)
-- [gemini/gemini-3-pro-preview](https://blog.google/products/gemini/gemini-3/)
-- [deepseek/deepseek-chat](https://api-docs.deepseek.com/)
-- [moonshot/kimi-k2-0711-preview](https://platform.moonshot.ai/docs/pricing/chat#generation-model-kimi-k2)
+| Family | Recommended Model | Model String | OpenHands Index Average | Notes |
+|--------|-------------------|--------------|-------------------------|-------|
+| Claude | [Claude Opus 4.7](https://github.com/OpenHands/openhands-index-results/tree/main/results/claude-opus-4-7) | `anthropic/claude-opus-4-7` | 68.2 | Best Claude-series result in the OpenHands Index. Use it for complex, long-running software work. Claude Opus 4.6 is close behind at 66.7. |
+| GPT | [GPT-5.5](https://github.com/OpenHands/openhands-index-results/tree/main/results/GPT-5.5) | `openai/gpt-5.5` | 65.9 | Best GPT-series result in the OpenHands Index. GPT-5.4 is close behind at 64.3. |
+| Gemini | [Gemini 3.1 Pro](https://github.com/OpenHands/openhands-index-results/tree/main/results/Gemini-3.1-Pro) | `gemini/gemini-3.1-pro-preview` | 57.0 | Best Gemini-series result in the OpenHands Index. Use Gemini 3 Flash when cost or latency is more important than top accuracy. |
+
+### Strong Open / Open-Weight Models
+
+These open or open-weight models have good OpenHands Index scores or are recommended for local OpenHands setups:
+
+| Model | Suggested Model String | OpenHands Index Average | Notes |
+|-------|------------------------|-------------------------|-------|
+| [GLM-5.1](https://github.com/OpenHands/openhands-index-results/tree/main/results/GLM-5.1) | `openrouter/z-ai/glm-5.1` | 58.2 | Strongest open-weight result currently listed in the OpenHands Index. |
+| [Kimi-K2.6](https://github.com/OpenHands/openhands-index-results/tree/main/results/Kimi-K2.6) | `openrouter/moonshotai/kimi-k2.6` | 57.1 | Strong open-weight option, especially for coding and information-gathering tasks. |
+| [DeepSeek-V4-Pro](https://github.com/OpenHands/openhands-index-results/tree/main/results/DeepSeek-V4-Pro) | `openrouter/deepseek/deepseek-v4-pro` | 51.3 | Strong coding and test-generation scores; current Index entry covers three benchmarks. |
+| [MiniMax-M2.7](https://github.com/OpenHands/openhands-index-results/tree/main/results/MiniMax-M2.7) | `openrouter/minimax/minimax-m2.7` | 43.4 | Recommended as a lower-cost open-weight option with strong SWE-bench and SWT-bench scores. Also available from MiniMax-compatible OpenAI endpoints as `openai/MiniMax-M2.7`. |
+| [Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) | `openai/Qwen3.6-35B-A3B` for local OpenAI-compatible servers, or `openrouter/qwen/qwen3.6-35b-a3b` through OpenRouter | Not yet listed | Recommended local / self-hosted model for OpenHands. It is open-weight, supports a large context window, and is featured in the [local LLM guide](/openhands/usage/llms/local-llms). |
+
+
+Hosted model strings can vary by provider and region. If a model string is not accepted, check the provider console and
+the [LiteLLM provider list](https://docs.litellm.ai/docs/providers), then use the provider-specific model ID shown there.
+

 If you have successfully run OpenHands with specific providers, we encourage you to open a PR to share your setup process
 to help others using the same provider!
@@ -43,15 +61,16 @@ limits and monitor usage.

 ### Local / Self-Hosted Models

-- [mistralai/devstral-small](https://openhands.dev/blog/devstral-a-new-state-of-the-art-open-model-for-coding-agents) (20 May 2025) -- also available through [OpenRouter](https://openrouter.ai/mistralai/devstral-small:free)
-- [all-hands/openhands-lm-32b-v0.1](https://openhands.dev/blog/introducing-openhands-lm-32b----a-strong-open-coding-agent-model) (31 March 2025) -- also available through [OpenRouter](https://openrouter.ai/all-hands/openhands-lm-32b-v0.1)
+For local and self-hosted usage, start with
+[Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B). See the
+[local LLM guide](/openhands/usage/llms/local-llms) for LM Studio, Ollama, SGLang, and vLLM setup examples.

 ### Known Issues

-Most current local and open source models are not as powerful. When using such models, you may see long
-wait times between messages, poor responses, or errors about malformed JSON. OpenHands can only be as powerful as the
-models driving it. However, if you do find ones that work, please add them to the verified list above.
+Open-weight and local models still vary widely in tool-use reliability. If you see long wait times, poor responses, or
+errors about malformed JSON, try a stronger model, increase the context window, or switch to a frontier cloud model for
+that task.

 ## LLM Configuration

diff --git a/openhands/usage/llms/local-llms.mdx b/openhands/usage/llms/local-llms.mdx
index 395e70f22..0a09f6134 100644
--- a/openhands/usage/llms/local-llms.mdx
+++ b/openhands/usage/llms/local-llms.mdx
@@ -5,7 +5,7 @@ description: When using a Local LLM, OpenHands may have limited functionality. I

 ## News

-- 2025/12/12: We now recommend two powerful local models for OpenHands: [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) and [Devstral Small 2 (24B)](https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512). Both models deliver excellent performance on coding tasks and work great with OpenHands!
+- 2026/05/21: We now recommend [Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) as the first local model to try with OpenHands. It is an open-weight MoE model built for agentic coding, supports a large context window, and is available through LM Studio, Ollama, vLLM, and SGLang.

 ## Quickstart: Running OpenHands with a Local LLM using LM Studio

@@ -13,13 +13,13 @@ This guide explains how to serve a local LLM using [LM Studio](https://lmstudio.

 We recommend:
 - **LM Studio** as the local model server, which handles metadata downloads automatically and offers a simple, user-friendly interface for configuration.
-- **Qwen3-Coder-30B-A3B-Instruct** as the LLM for software development. This model is optimized for coding tasks and works excellently with agent-style workflows like OpenHands.
+- **Qwen3.6-35B-A3B** as the LLM for software development. This model is optimized for agentic coding and works well with tool-heavy workflows like OpenHands.

 ### Hardware Requirements

-Running Qwen3-Coder-30B-A3B-Instruct requires:
-- A recent GPU with at least 12GB of VRAM (tested on RTX 3060 with 12GB VRAM + 64GB RAM), or
-- A Mac with Apple Silicon with at least 32GB of RAM
+Running Qwen3.6-35B-A3B requires:
+- A recent GPU with at least 24GB of VRAM for quantized variants, or multiple GPUs for full precision and larger context windows, or
+- A Mac with Apple Silicon with at least 64GB of unified memory for quantized variants

 ### 1. Install LM Studio

@@ -32,7 +32,7 @@ Download and install the LM Studio desktop app from [lmstudio.ai](https://lmstud

 ![image](./screenshots/01_lm_studio_open_model_hub.png)

-3. Search for **"Qwen3-Coder-30B-A3B-Instruct"**, confirm you're downloading from the official Qwen publisher, then proceed to download.
+3. Search for **"Qwen3.6-35B-A3B"**, confirm you're downloading from the official Qwen publisher, then proceed to download.

 ![image](./screenshots/02_lm_studio_download_devstral.png)

@@ -46,7 +46,7 @@ Download and install the LM Studio desktop app from [lmstudio.ai](https://lmstud
 ![image](./screenshots/03_lm_studio_open_load_model.png)

 3. Enable the "Manually choose model load parameters" switch.
-4. Select **Qwen3-Coder-30B-A3B-Instruct** from the model list.
+4. Select **Qwen3.6-35B-A3B** from the model list.

 ![image](./screenshots/04_lm_studio_setup_devstral_part_1.png)

@@ -108,7 +108,7 @@ When started for the first time, OpenHands will prompt you to set up the LLM pro

 2. Enable the "Advanced" switch at the top of the page to show all the available settings.
 3. Set the following values:
-   - **Custom Model**: `openai/qwen/qwen3-coder-30b-a3b-instruct` (the Model API identifier from LM Studio, prefixed with "openai/")
+   - **Custom Model**: `openai/qwen/qwen3.6-35b-a3b` (the Model API identifier from LM Studio, prefixed with "openai/")
    - **Base URL**: `http://host.docker.internal:1234/v1`
    - **API Key**: `local-llm`

@@ -137,14 +137,14 @@ This section describes how to run local LLMs with OpenHands using alternative ba
 ### Create an OpenAI-Compatible Endpoint with Ollama

 - Install Ollama following [the official documentation](https://ollama.com/download).
-- Example launch command for Qwen3-Coder-30B-A3B-Instruct:
+- Example launch command for Qwen3.6-35B-A3B:

 ```bash
 # ⚠️ WARNING: OpenHands requires a large context size to work properly.
 # When using Ollama, set OLLAMA_CONTEXT_LENGTH to at least 22000.
 # The default (4096) is way too small — not even the system prompt will fit, and the agent will not behave correctly.
 OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_KEEP_ALIVE=-1 nohup ollama serve &
-ollama pull qwen3-coder:30b
+ollama pull qwen3.6:35b-a3b
 ```

 ### Create an OpenAI-Compatible Endpoint with vLLM or SGLang
@@ -152,7 +152,7 @@ ollama pull qwen3-coder:30b
 First, download the model checkpoint:

 ```bash
-huggingface-cli download Qwen/Qwen3-Coder-30B-A3B-Instruct --local-dir Qwen/Qwen3-Coder-30B-A3B-Instruct
+huggingface-cli download Qwen/Qwen3.6-35B-A3B --local-dir Qwen/Qwen3.6-35B-A3B
 ```

 #### Serving the model using SGLang
@@ -162,8 +162,8 @@ huggingface-cli download Qwen/Qwen3-Coder-30B-A3B-Instruct --local-dir Qwen/Qwen

 ```bash
 SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python3 -m sglang.launch_server \
-    --model Qwen/Qwen3-Coder-30B-A3B-Instruct \
-    --served-model-name Qwen3-Coder-30B-A3B-Instruct \
+    --model Qwen/Qwen3.6-35B-A3B \
+    --served-model-name Qwen3.6-35B-A3B \
     --port 8000 \
     --tp 2 --dp 1 \
     --host 0.0.0.0 \
@@ -176,11 +176,11 @@ SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python3 -m sglang.launch_server \
 - Example launch command (with at least 2 GPUs):

 ```bash
-vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
+vllm serve Qwen/Qwen3.6-35B-A3B \
     --host 0.0.0.0 --port 8000 \
     --api-key mykey \
     --tensor-parallel-size 2 \
-    --served-model-name Qwen3-Coder-30B-A3B-Instruct \
+    --served-model-name Qwen3.6-35B-A3B \
     --enable-prefix-caching
 ```

@@ -197,11 +197,11 @@ pip install git+https://github.com/snowflakedb/ArcticInference.git
 2. Run the launch command with speculative decoding enabled:

 ```bash
-vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
+vllm serve Qwen/Qwen3.6-35B-A3B \
     --host 0.0.0.0 --port 8000 \
     --api-key mykey \
     --tensor-parallel-size 2 \
-    --served-model-name Qwen3-Coder-30B-A3B-Instruct \
+    --served-model-name Qwen3.6-35B-A3B \
     --speculative-config '{"method": "suffix"}'
 ```

@@ -225,8 +225,8 @@ Once OpenHands is running, open the Settings page in the UI and go to the `LLM`
 2. Enable the **Advanced** toggle at the top of the page.
 3. Set the following parameters, if you followed the examples above:
    - **Custom Model**: `openai/`
-     - For **Ollama**: `openai/qwen3-coder:30b`
-     - For **SGLang/vLLM**: `openai/Qwen3-Coder-30B-A3B-Instruct`
+     - For **Ollama**: `openai/qwen3.6:35b-a3b`
+     - For **SGLang/vLLM**: `openai/Qwen3.6-35B-A3B`
    - **Base URL**: `http://host.docker.internal:/v1`
      Use port `11434` for Ollama, or `8000` for SGLang and vLLM.
    - **API Key**:
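
The backend-specific settings touched by this patch (model identifier, `openai/` prefix, port) can be summarised as a small lookup. The following is a minimal sketch and not part of the patch: `BACKENDS` and `openhands_llm_settings` are hypothetical names, not an OpenHands API, and the values are copied from the documentation text in the hunks above (LM Studio on port 1234, Ollama on 11434, SGLang and vLLM on 8000).

```python
"""Hypothetical helper mapping a local backend to the OpenHands LLM settings
described in local-llms.mdx. Illustrative only; not an OpenHands API."""

# (model identifier as served by the backend, port), taken from the docs text.
BACKENDS = {
    "lmstudio": ("qwen/qwen3.6-35b-a3b", 1234),
    "ollama": ("qwen3.6:35b-a3b", 11434),
    "sglang": ("Qwen3.6-35B-A3B", 8000),
    "vllm": ("Qwen3.6-35B-A3B", 8000),
}


def openhands_llm_settings(backend: str, host: str = "host.docker.internal") -> dict:
    """Return the Custom Model and Base URL values for the given backend."""
    try:
        model_id, port = BACKENDS[backend.lower()]
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None
    return {
        # OpenAI-compatible servers are addressed with the "openai/" prefix.
        "custom_model": f"openai/{model_id}",
        "base_url": f"http://{host}:{port}/v1",
    }


if __name__ == "__main__":
    # Print the settings for every backend covered by the guide.
    for name in BACKENDS:
        print(name, openhands_llm_settings(name))
```

Keeping the served model name and the port in one table makes it harder for the model string and the Base URL to drift apart when the recommended model changes again.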