OpenHands · neubig · May 22, 2026 · May 21, 2026 · all-hands-bot · May 21, 2026
@@ -4,7 +4,7 @@
 ---

 <Note>
 This section is for users who want to connect OpenHands to different LLMs.
 </Note>

 <Info>
@@ -15,20 +15,38 @@
 
 ## Model Recommendations
 
-Based on our evaluations of language models for coding tasks (using the SWE-bench dataset), we can provide some
-recommendations for model selection. Our latest benchmarking results can be found in
-[this spreadsheet](https://docs.google.com/spreadsheets/d/1wOUdFCMyY6Nt0AIqF705KN4JKOWgeI4wUGUP60krXXs/edit?gid=0).
+Model quality for coding agents changes quickly. These recommendations are based on current
+[OpenHands Index](https://index.openhands.dev/home) results where available. The linked
+[openhands-index-results repository](https://github.com/OpenHands/openhands-index-results) contains the full scores and
+trajectories for each run.
 
-Based on these findings and community feedback, these are the latest models that have been verified to work reasonably well with OpenHands:
+Use the strongest model you can afford for long-running or high-stakes tasks. Use lower-cost profiles for routine edits,
+then switch back to a stronger model for planning, debugging, and review.
 
-### Cloud / API-Based Models
+### Best Cloud Models by Family
 
-- [anthropic/claude-sonnet-4-20250514](https://www.anthropic.com/api) (recommended)
-- [anthropic/claude-sonnet-4-5-20250929](https://www.anthropic.com/api) (recommended)
-- [openai/gpt-5-2025-08-07](https://openai.com/api/) (recommended)
-- [gemini/gemini-3-pro-preview](https://blog.google/products/gemini/gemini-3/)
-- [deepseek/deepseek-chat](https://api-docs.deepseek.com/)
-- [moonshot/kimi-k2-0711-preview](https://platform.moonshot.ai/docs/pricing/chat#generation-model-kimi-k2)
+| Family | Recommended Model | Model String | OpenHands Index Average | Notes |
+|--------|-------------------|--------------|-------------------------|-------|
+| Claude | [Claude Opus 4.7](https://github.com/OpenHands/openhands-index-results/tree/main/results/claude-opus-4-7) | `anthropic/claude-opus-4-7` | 68.2 | Best Claude-series result in the OpenHands Index. Use it for complex, long-running software work. Claude Opus 4.6 is close behind at 66.7. |
+| GPT | [GPT-5.5](https://github.com/OpenHands/openhands-index-results/tree/main/results/GPT-5.5) | `openai/gpt-5.5` | 65.9 | Best GPT-series result in the OpenHands Index. GPT-5.4 is close behind at 64.3. |
+| Gemini | [Gemini 3.1 Pro](https://github.com/OpenHands/openhands-index-results/tree/main/results/Gemini-3.1-Pro) | `gemini/gemini-3.1-pro-preview` | 57.0 | Best Gemini-series result in the OpenHands Index. Use Gemini 3 Flash when cost or latency is more important than top accuracy. |
+
+### Strong Open / Open-Weight Models
+
+These open or open-weight models have good OpenHands Index scores or are recommended for local OpenHands setups:
+
+| Model | Suggested Model String | OpenHands Index Average | Notes |
+|-------|------------------------|-------------------------|-------|
+| [GLM-5.1](https://github.com/OpenHands/openhands-index-results/tree/main/results/GLM-5.1) | `openrouter/z-ai/glm-5.1` | 58.2 | Strongest open-weight result currently listed in the OpenHands Index. |
+| [Kimi-K2.6](https://github.com/OpenHands/openhands-index-results/tree/main/results/Kimi-K2.6) | `openrouter/moonshotai/kimi-k2.6` | 57.1 | Strong open-weight option, especially for coding and information-gathering tasks. |
+| [DeepSeek-V4-Pro](https://github.com/OpenHands/openhands-index-results/tree/main/results/DeepSeek-V4-Pro) | `openrouter/deepseek/deepseek-v4-pro` | 51.3 | Strong coding and test-generation scores; current Index entry covers three benchmarks. |
+| [MiniMax-M2.7](https://github.com/OpenHands/openhands-index-results/tree/main/results/MiniMax-M2.7) | `openrouter/minimax/minimax-m2.7` | 43.4 | Recommended as a lower-cost open-weight option with strong SWE-bench and SWT-bench scores. Also available from MiniMax-compatible OpenAI endpoints as `openai/MiniMax-M2.7`. |
+| [Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) | `openai/Qwen3.6-35B-A3B` for local OpenAI-compatible servers, or `openrouter/qwen/qwen3.6-35b-a3b` through OpenRouter | Not yet listed | Recommended local / self-hosted model for OpenHands. It is open-weight, supports a large context window, and is featured in the [local LLM guide](/openhands/usage/llms/local-llms). |
+
+<Note>
+Hosted model strings can vary by provider and region. If a model string is not accepted, check the provider console and
+the [LiteLLM provider list](https://docs.litellm.ai/docs/providers), then use the provider-specific model ID shown there.
+</Note>
 
 If you have successfully run OpenHands with specific providers, we encourage you to open a PR to share your setup process
 to help others using the same provider!
@@ -37,21 +55,22 @@
 [litellm documentation](https://docs.litellm.ai/docs/providers).

 <Warning>
 OpenHands will issue many prompts to the LLM you configure. Most of these LLMs cost money, so be sure to set spending
 limits and monitor usage.
 </Warning>
 
 ### Local / Self-Hosted Models
 
-- [mistralai/devstral-small](https://openhands.dev/blog/devstral-a-new-state-of-the-art-open-model-for-coding-agents) (20 May 2025) -- also available through [OpenRouter](https://openrouter.ai/mistralai/devstral-small:free)
-- [all-hands/openhands-lm-32b-v0.1](https://openhands.dev/blog/introducing-openhands-lm-32b----a-strong-open-coding-agent-model) (31 March 2025) -- also available through [OpenRouter](https://openrouter.ai/all-hands/openhands-lm-32b-v0.1)
+For local and self-hosted usage, start with
+[Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B). See the
+[local LLM guide](/openhands/usage/llms/local-llms) for LM Studio, Ollama, SGLang, and vLLM setup examples.
 
 ### Known Issues
 
 <Note>
-Most current local and open source models are not as powerful. When using such models, you may see long
-wait times between messages, poor responses, or errors about malformed JSON. OpenHands can only be as powerful as the
-models driving it. However, if you do find ones that work, please add them to the verified list above.
+Open-weight and local models still vary widely in tool-use reliability. If you see long wait times, poor responses, or
+errors about malformed JSON, try a stronger model, increase the context window, or switch to a frontier cloud model for
+that task.
 </Note>
 
 ## LLM Configuration
@@ -96,7 +115,7 @@

 LLM providers have specific settings that can be customized to optimize their performance with OpenHands, such as:

 - **Custom Tokenizers**: For specialized models, you can add a suitable tokenizer.
 - **Native Tool Calling**: Toggle native function/tool calling capabilities.

 For detailed information about model customization, see

@@ -1,25 +1,25 @@
 ---
 title: Local LLMs
 description: When using a Local LLM, OpenHands may have limited functionality. It is highly recommended that you use GPUs to serve local models for optimal experience.
 ---
 
 ## News
 
-- 2025/12/12: We now recommend two powerful local models for OpenHands: [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) and [Devstral Small 2 (24B)](https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512). Both models deliver excellent performance on coding tasks and work great with OpenHands!
+- 2026/05/21: We now recommend [Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) as the first local model to try with OpenHands. It is an open-weight MoE model built for agentic coding, supports a large context window, and is available through LM Studio, Ollama, vLLM, and SGLang.
 
 ## Quickstart: Running OpenHands with a Local LLM using LM Studio
 
 This guide explains how to serve a local LLM using [LM Studio](https://lmstudio.ai/) and have OpenHands connect to it.
 
 We recommend:
 - **LM Studio** as the local model server, which handles metadata downloads automatically and offers a simple, user-friendly interface for configuration.
-- **Qwen3-Coder-30B-A3B-Instruct** as the LLM for software development. This model is optimized for coding tasks and works excellently with agent-style workflows like OpenHands.
+- **Qwen3.6-35B-A3B** as the LLM for software development. This model is optimized for agentic coding and works well with tool-heavy workflows like OpenHands.
 
 ### Hardware Requirements
 
-Running Qwen3-Coder-30B-A3B-Instruct requires:
-- A recent GPU with at least 12GB of VRAM (tested on RTX 3060 with 12GB VRAM + 64GB RAM), or
-- A Mac with Apple Silicon with at least 32GB of RAM
+Running Qwen3.6-35B-A3B requires:
+- A recent GPU with at least 24GB of VRAM for quantized variants, or multiple GPUs for full precision and larger context windows, or
+- A Mac with Apple Silicon with at least 64GB of unified memory for quantized variants
 
 ### 1. Install LM Studio
 
@@ -32,7 +32,7 @@
 
 ![image](./screenshots/01_lm_studio_open_model_hub.png)
 
-3. Search for **"Qwen3-Coder-30B-A3B-Instruct"**, confirm you're downloading from the official Qwen publisher, then proceed to download.
+3. Search for **"Qwen3.6-35B-A3B"**, confirm you're downloading from the official Qwen publisher, then proceed to download.
 
 ![image](./screenshots/02_lm_studio_download_devstral.png)
 
@@ -46,11 +46,11 @@
 ![image](./screenshots/03_lm_studio_open_load_model.png)
 
 3. Enable the "Manually choose model load parameters" switch.
-4. Select **Qwen3-Coder-30B-A3B-Instruct** from the model list.
+4. Select **Qwen3.6-35B-A3B** from the model list.
 
 ![image](./screenshots/04_lm_studio_setup_devstral_part_1.png)
 
 5. Enable the "Show advanced settings" switch at the bottom of the Model settings flyout to show all the available settings.
 6. Set "Context Length" to at least 22000 (for lower VRAM systems) or 32768 (recommended for better performance) and enable Flash Attention.
 7. Click "Load Model" to start loading the model.

@@ -108,7 +108,7 @@
 2. Enable the "Advanced" switch at the top of the page to show all the available settings.
 
 3. Set the following values:
-    - **Custom Model**: `openai/qwen/qwen3-coder-30b-a3b-instruct` (the Model API identifier from LM Studio, prefixed with "openai/")
+    - **Custom Model**: `openai/qwen/qwen3.6-35b-a3b` (the Model API identifier from LM Studio, prefixed with "openai/")
     - **Base URL**: `http://host.docker.internal:1234/v1`
     - **API Key**: `local-llm`
 
@@ -122,48 +122,48 @@

 ## Community-Reported Notes and Troubleshooting

 If OpenHands behaves like a plain chatbot, refuses to use tools or files, or has constant failed tool calls with a local model, the issue may be with the model itself rather than your setup. Even with a large context window, some local models may struggle with reliable tool use.

 **Community-reported working models:**
 - `qwen2.5-coder-14b-instruct` — reported to resolve chatbot-like behavior
 - `qwopus3.5-27b-v3 Q8_0` (and similar retrained qwopus variants) — reported to work well with tool calls

 If you're experiencing issues, try switching to one of these models before assuming the setup is broken.

 ## Advanced: Alternative LLM Backends

 This section describes how to run local LLMs with OpenHands using alternative backends like Ollama, SGLang, or vLLM — without relying on LM Studio.

 ### Create an OpenAI-Compatible Endpoint with Ollama
 
 - Install Ollama following [the official documentation](https://ollama.com/download).
-- Example launch command for Qwen3-Coder-30B-A3B-Instruct:
+- Example launch command for Qwen3.6-35B-A3B:
 
 ```bash
 # ⚠️ WARNING: OpenHands requires a large context size to work properly.
 # When using Ollama, set OLLAMA_CONTEXT_LENGTH to at least 22000.
 # The default (4096) is way too small — not even the system prompt will fit, and the agent will not behave correctly.
 OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_KEEP_ALIVE=-1 nohup ollama serve &
-ollama pull qwen3-coder:30b
+ollama pull qwen3.6:35b-a3b
 ```
 
 ### Create an OpenAI-Compatible Endpoint with vLLM or SGLang
 
 First, download the model checkpoint:
 
 ```bash
-huggingface-cli download Qwen/Qwen3-Coder-30B-A3B-Instruct --local-dir Qwen/Qwen3-Coder-30B-A3B-Instruct
+huggingface-cli download Qwen/Qwen3.6-35B-A3B --local-dir Qwen/Qwen3.6-35B-A3B
 ```
 
 #### Serving the model using SGLang

 - Install SGLang following [the official documentation](https://docs.sglang.io/get_started/install.html).
 - Example launch command (with at least 2 GPUs):
 
 ```bash
 SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python3 -m sglang.launch_server \
-    --model Qwen/Qwen3-Coder-30B-A3B-Instruct \
-    --served-model-name Qwen3-Coder-30B-A3B-Instruct \
+    --model Qwen/Qwen3.6-35B-A3B \
+    --served-model-name Qwen3.6-35B-A3B \
     --port 8000 \
     --tp 2 --dp 1 \
     --host 0.0.0.0 \
@@ -173,14 +173,14 @@
 #### Serving the model using vLLM

 - Install vLLM following [the official documentation](https://docs.vllm.ai/en/latest/getting_started/installation.html).
 - Example launch command (with at least 2 GPUs):
 
 ```bash
-vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
+vllm serve Qwen/Qwen3.6-35B-A3B \
     --host 0.0.0.0 --port 8000 \
     --api-key mykey \
     --tensor-parallel-size 2 \
-    --served-model-name Qwen3-Coder-30B-A3B-Instruct \
+    --served-model-name Qwen3.6-35B-A3B \
     --enable-prefix-caching
 ```
 
@@ -197,11 +197,11 @@
 2. Run the launch command with speculative decoding enabled:
 
 ```bash
-vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
+vllm serve Qwen/Qwen3.6-35B-A3B \
     --host 0.0.0.0 --port 8000 \
     --api-key mykey \
     --tensor-parallel-size 2 \
-    --served-model-name Qwen3-Coder-30B-A3B-Instruct \
+    --served-model-name Qwen3.6-35B-A3B \
     --speculative-config '{"method": "suffix"}'
 ```
 
@@ -225,10 +225,10 @@
 2. Enable the **Advanced** toggle at the top of the page.
 3. Set the following parameters, if you followed the examples above:
    - **Custom Model**: `openai/<served-model-name>`
-     - For **Ollama**: `openai/qwen3-coder:30b`
-     - For **SGLang/vLLM**: `openai/Qwen3-Coder-30B-A3B-Instruct`
+     - For **Ollama**: `openai/qwen3.6:35b-a3b`
+     - For **SGLang/vLLM**: `openai/Qwen3.6-35B-A3B`
    - **Base URL**: `http://host.docker.internal:<port>/v1`
      Use port `11434` for Ollama, or `8000` for SGLang and vLLM.
    - **API Key**:
     - For **Ollama**: any placeholder value (e.g. `dummy`, `local-llm`)
     - For **SGLang** or **vLLM**: use the same key provided when starting the server (e.g. `mykey`)