-
Notifications
You must be signed in to change notification settings - Fork 26
Refresh recommended LLM docs #517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,25 +1,25 @@ | ||
| --- | ||
| title: Local LLMs | ||
| description: When using a Local LLM, OpenHands may have limited functionality. It is highly recommended that you use GPUs to serve local models for optimal experience. | ||
| --- | ||
|
|
||
| ## News | ||
|
|
||
| - 2025/12/12: We now recommend two powerful local models for OpenHands: [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) and [Devstral Small 2 (24B)](https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512). Both models deliver excellent performance on coding tasks and work great with OpenHands! | ||
| - 2026/05/21: We now recommend [Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) as the first local model to try with OpenHands. It is an open-weight MoE model built for agentic coding, supports a large context window, and is available through LM Studio, Ollama, vLLM, and SGLang. | ||
|
Check warning on line 8 in openhands/usage/llms/local-llms.mdx
|
||
|
|
||
| ## Quickstart: Running OpenHands with a Local LLM using LM Studio | ||
|
|
||
| This guide explains how to serve a local LLM using [LM Studio](https://lmstudio.ai/) and have OpenHands connect to it. | ||
|
|
||
| We recommend: | ||
| - **LM Studio** as the local model server, which handles metadata downloads automatically and offers a simple, user-friendly interface for configuration. | ||
| - **Qwen3-Coder-30B-A3B-Instruct** as the LLM for software development. This model is optimized for coding tasks and works excellently with agent-style workflows like OpenHands. | ||
| - **Qwen3.6-35B-A3B** as the LLM for software development. This model is optimized for agentic coding and works well with tool-heavy workflows like OpenHands. | ||
|
|
||
| ### Hardware Requirements | ||
|
|
||
| Running Qwen3-Coder-30B-A3B-Instruct requires: | ||
| - A recent GPU with at least 12GB of VRAM (tested on RTX 3060 with 12GB VRAM + 64GB RAM), or | ||
| - A Mac with Apple Silicon with at least 32GB of RAM | ||
| Running Qwen3.6-35B-A3B requires: | ||
| - A recent GPU with at least 24GB of VRAM for quantized variants, or multiple GPUs for full precision and larger context windows, or | ||
| - A Mac with Apple Silicon with at least 64GB of unified memory for quantized variants | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 Suggestion: Hardware requirements increased significantly from 12GB to 24GB VRAM. Consider adding a note in the "News" section or as a callout warning to help users with older hardware understand this change upfront. Example: <Warning>
Qwen3.6-35B-A3B requires more VRAM than the previous Qwen3-Coder-30B-A3B-Instruct (24GB vs 12GB for quantized variants). If you have limited hardware, consider using a smaller quantized variant or one of the community-reported models mentioned below.
</Warning> |
||
|
|
||
| ### 1. Install LM Studio | ||
|
|
||
|
|
@@ -32,7 +32,7 @@ | |
|
|
||
|  | ||
|
|
||
| 3. Search for **"Qwen3-Coder-30B-A3B-Instruct"**, confirm you're downloading from the official Qwen publisher, then proceed to download. | ||
| 3. Search for **"Qwen3.6-35B-A3B"**, confirm you're downloading from the official Qwen publisher, then proceed to download. | ||
|
|
||
|  | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 Suggestion: The screenshot filename references "devstral" but should show Qwen3.6-35B-A3B. Verify that screenshot files match the current model recommendations, or update filenames/alt text to be model-agnostic (e.g., |
||
|
|
||
|
|
@@ -46,11 +46,11 @@ | |
|  | ||
|
|
||
| 3. Enable the "Manually choose model load parameters" switch. | ||
| 4. Select **Qwen3-Coder-30B-A3B-Instruct** from the model list. | ||
| 4. Select **Qwen3.6-35B-A3B** from the model list. | ||
|
|
||
|  | ||
|
|
||
| 5. Enable the "Show advanced settings" switch at the bottom of the Model settings flyout to show all the available settings. | ||
| 6. Set "Context Length" to at least 22000 (for lower VRAM systems) or 32768 (recommended for better performance) and enable Flash Attention. | ||
| 7. Click "Load Model" to start loading the model. | ||
|
|
||
|
|
@@ -108,7 +108,7 @@ | |
| 2. Enable the "Advanced" switch at the top of the page to show all the available settings. | ||
|
|
||
| 3. Set the following values: | ||
| - **Custom Model**: `openai/qwen/qwen3-coder-30b-a3b-instruct` (the Model API identifier from LM Studio, prefixed with "openai/") | ||
| - **Custom Model**: `openai/qwen/qwen3.6-35b-a3b` (the Model API identifier from LM Studio, prefixed with "openai/") | ||
| - **Base URL**: `http://host.docker.internal:1234/v1` | ||
| - **API Key**: `local-llm` | ||
|
|
||
|
|
@@ -122,48 +122,48 @@ | |
|
|
||
| ## Community-Reported Notes and Troubleshooting | ||
|
|
||
| If OpenHands behaves like a plain chatbot, refuses to use tools or files, or has constant failed tool calls with a local model, the issue may be with the model itself rather than your setup. Even with a large context window, some local models may struggle with reliable tool use. | ||
|
|
||
| **Community-reported working models:** | ||
| - `qwen2.5-coder-14b-instruct` — reported to resolve chatbot-like behavior | ||
| - `qwopus3.5-27b-v3 Q8_0` (and similar retrained qwopus variants) — reported to work well with tool calls | ||
|
|
||
| If you're experiencing issues, try switching to one of these models before assuming the setup is broken. | ||
|
|
||
| ## Advanced: Alternative LLM Backends | ||
|
|
||
| This section describes how to run local LLMs with OpenHands using alternative backends like Ollama, SGLang, or vLLM — without relying on LM Studio. | ||
|
Check warning on line 135 in openhands/usage/llms/local-llms.mdx
|
||
|
|
||
| ### Create an OpenAI-Compatible Endpoint with Ollama | ||
|
|
||
| - Install Ollama following [the official documentation](https://ollama.com/download). | ||
| - Example launch command for Qwen3-Coder-30B-A3B-Instruct: | ||
| - Example launch command for Qwen3.6-35B-A3B: | ||
|
|
||
| ```bash | ||
| # ⚠️ WARNING: OpenHands requires a large context size to work properly. | ||
| # When using Ollama, set OLLAMA_CONTEXT_LENGTH to at least 22000. | ||
| # The default (4096) is way too small — not even the system prompt will fit, and the agent will not behave correctly. | ||
| OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_KEEP_ALIVE=-1 nohup ollama serve & | ||
| ollama pull qwen3-coder:30b | ||
| ollama pull qwen3.6:35b-a3b | ||
| ``` | ||
|
|
||
| ### Create an OpenAI-Compatible Endpoint with vLLM or SGLang | ||
|
|
||
| First, download the model checkpoint: | ||
|
|
||
| ```bash | ||
| huggingface-cli download Qwen/Qwen3-Coder-30B-A3B-Instruct --local-dir Qwen/Qwen3-Coder-30B-A3B-Instruct | ||
| huggingface-cli download Qwen/Qwen3.6-35B-A3B --local-dir Qwen/Qwen3.6-35B-A3B | ||
| ``` | ||
|
|
||
| #### Serving the model using SGLang | ||
|
|
||
| - Install SGLang following [the official documentation](https://docs.sglang.io/get_started/install.html). | ||
| - Example launch command (with at least 2 GPUs): | ||
|
|
||
| ```bash | ||
| SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python3 -m sglang.launch_server \ | ||
| --model Qwen/Qwen3-Coder-30B-A3B-Instruct \ | ||
| --served-model-name Qwen3-Coder-30B-A3B-Instruct \ | ||
| --model Qwen/Qwen3.6-35B-A3B \ | ||
| --served-model-name Qwen3.6-35B-A3B \ | ||
| --port 8000 \ | ||
| --tp 2 --dp 1 \ | ||
| --host 0.0.0.0 \ | ||
|
|
@@ -173,14 +173,14 @@ | |
| #### Serving the model using vLLM | ||
|
|
||
| - Install vLLM following [the official documentation](https://docs.vllm.ai/en/latest/getting_started/installation.html). | ||
| - Example launch command (with at least 2 GPUs): | ||
|
|
||
| ```bash | ||
| vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \ | ||
| vllm serve Qwen/Qwen3.6-35B-A3B \ | ||
| --host 0.0.0.0 --port 8000 \ | ||
| --api-key mykey \ | ||
| --tensor-parallel-size 2 \ | ||
| --served-model-name Qwen3-Coder-30B-A3B-Instruct \ | ||
| --served-model-name Qwen3.6-35B-A3B \ | ||
| --enable-prefix-caching | ||
| ``` | ||
|
|
||
|
|
@@ -197,11 +197,11 @@ | |
| 2. Run the launch command with speculative decoding enabled: | ||
|
|
||
| ```bash | ||
| vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \ | ||
| vllm serve Qwen/Qwen3.6-35B-A3B \ | ||
| --host 0.0.0.0 --port 8000 \ | ||
| --api-key mykey \ | ||
| --tensor-parallel-size 2 \ | ||
| --served-model-name Qwen3-Coder-30B-A3B-Instruct \ | ||
| --served-model-name Qwen3.6-35B-A3B \ | ||
| --speculative-config '{"method": "suffix"}' | ||
| ``` | ||
|
|
||
|
|
@@ -225,10 +225,10 @@ | |
| 2. Enable the **Advanced** toggle at the top of the page. | ||
| 3. Set the following parameters, if you followed the examples above: | ||
| - **Custom Model**: `openai/<served-model-name>` | ||
| - For **Ollama**: `openai/qwen3-coder:30b` | ||
| - For **SGLang/vLLM**: `openai/Qwen3-Coder-30B-A3B-Instruct` | ||
| - For **Ollama**: `openai/qwen3.6:35b-a3b` | ||
| - For **SGLang/vLLM**: `openai/Qwen3.6-35B-A3B` | ||
| - **Base URL**: `http://host.docker.internal:<port>/v1` | ||
| Use port `11434` for Ollama, or `8000` for SGLang and vLLM. | ||
|
Check warning on line 231 in openhands/usage/llms/local-llms.mdx
|
||
| - **API Key**: | ||
| - For **Ollama**: any placeholder value (e.g. `dummy`, `local-llm`) | ||
| - For **SGLang** or **vLLM**: use the same key provided when starting the server (e.g. `mykey`) | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 Suggestion: The entry for Qwen3.6-35B-A3B says "Not yet listed" in the OpenHands Index Average column. Consider either:
This helps users understand the recommendation basis and maintains trust in the OpenHands Index as the primary evaluation source.