Guide to configuring OpenCode to use your local llama.cpp server.
Note: Droid (Factory CLI) currently doesn't support local models - see section below.
- Complete the setup: `./setup.sh`
- Download a model: `./download-model.sh qwen2.5-coder-7b`
- Start the server: `./server.sh`
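If you prefer a single command, the three steps above can be chained (run from the repository root; the script names are the ones listed above):

```bash
# One-time setup, first model download, and server start in one line
./setup.sh && ./download-model.sh qwen2.5-coder-7b && ./server.sh
```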
Keep the server running while using OpenCode.
File Location: `~/.config/opencode/opencode.json`
Create the directory if it doesn't exist:

```bash
mkdir -p ~/.config/opencode
```

Configuration:
```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "llamacpp": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Local Model",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1"
      },
      "models": {
        "local": {
          "name": "Local Model"
        }
      }
    }
  }
}
```

Create the file:
```bash
cat > ~/.config/opencode/opencode.json <<'EOF'
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "llamacpp": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Local Model",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1"
      },
      "models": {
        "local": {
          "name": "Local Model"
        }
      }
    }
  }
}
EOF
```

Note: Using the generic "local" model name means you never need to update this config when switching models.
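To confirm the file was written correctly, you can optionally pretty-print it (assumes `python3` is available; it will report an error if the JSON is malformed):

```bash
python3 -m json.tool ~/.config/opencode/opencode.json
```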
⚠️ Known Issue: Droid (Factory CLI) currently does not work with local models. It ignores the `base_url` and routes requests through Factory's remote servers. Use OpenCode instead for local models.
File Location: `~/.factory/settings.json` (managed through the Droid UI)
If Factory fixes this issue in the future, add a custom model through Droid's settings with:
- Base URL: `http://127.0.0.1:8080/v1`
- Provider: `generic-chat-completion-api`
- API Key: `not-needed`
All server settings are in `config.sh`. Edit the file and restart the server to apply changes:
| Parameter | Default | Description |
|---|---|---|
| `ACTIVE_MODEL` | `qwen2.5-coder-7b` | Model ID (from models.conf) |
| `N_THREADS` | `6` | CPU threads (leave 1 for the system) |
| `CONTEXT_SIZE` | `4096` | Context window (higher = more memory) |
| `TEMPERATURE` | `0.5` | Creativity (0.1-1.0, lower = more focused) |
| `TOP_P` | `0.95` | Nucleus sampling |
| `REPEAT_PENALTY` | `1.1` | Reduce repetition |
| `SERVER_PORT` | `8080` | API server port |
| Setting | Value | Notes |
|---|---|---|
| Base URL | `http://127.0.0.1:8080/v1` | Must match `SERVER_PORT` in config.sh |
| Model Name | `local` | Generic name - works with any model |
| API Key | not needed | Local server has no auth |
See SWITCHING_MODELS.md for details.
Quick version:
```bash
./download-model.sh MODEL_ID   # Download
nano config.sh                 # Change ACTIVE_MODEL
./server.sh                    # Restart (type x first if running)
```

OpenCode config doesn't need to change - it uses a generic "local" model name.
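If you'd rather not open an editor, the active model can be switched with a one-liner (a sketch assuming `config.sh` assigns `ACTIVE_MODEL` as a plain shell variable, as in the sketch above; GNU sed syntax):

```bash
# Switch to the 3B model, then restart the server to pick it up
sed -i 's/^ACTIVE_MODEL=.*/ACTIVE_MODEL="qwen2.5-coder-3b"/' config.sh
```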
Test that the server is running:

```bash
curl http://127.0.0.1:8080/v1/models
```

Should return:

```json
{
  "object": "list",
  "data": [...]
}
```

Test chat completion:
```bash
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Say hello"}],
    "temperature": 0.7,
    "max_tokens": 100
  }'
```

If either test fails:

- Server not running: start with `./server.sh`
- Wrong port: check `SERVER_PORT` in `config.sh`
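For a quick scripted check, something like this works (a sketch assuming the default port 8080; adjust if you changed `SERVER_PORT`):

```bash
# Print a status line depending on whether the API answers
if curl -sf http://127.0.0.1:8080/v1/models > /dev/null; then
    echo "llama-server is responding"
else
    echo "llama-server is not responding - start it with ./server.sh"
fi
```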
If the model ignores your question or gives unexpected responses:
- Check the server is running: `curl http://127.0.0.1:8080/v1/models`
- Restart the server: type `x` in the server terminal, then `./server.sh`
- If using a specific model name in the OpenCode config, ensure it matches `ACTIVE_MODEL` in `config.sh`
Normal shutdown: Type `x` and press Enter in the server terminal.
If stuck: Find and kill the process:
```bash
# Find the process
ps aux | grep llama-server

# Kill it
pkill llama-server

# Force kill if needed
pkill -9 llama-server
```

Note: Ctrl+C doesn't work with llama-server. Use `x` + Enter instead.
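To confirm the port is actually free after stopping (assumes the default port 8080), `lsof` should print nothing:

```bash
lsof -i :8080
```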
- Try a smaller model: `qwen2.5-coder-3b`
- Reduce `CONTEXT_SIZE` in `config.sh`
- Change `SERVER_PORT` in `config.sh` (e.g., to `8081`)
- Update `baseURL` in the OpenCode config
- Restart the server
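Both edits can be scripted (a sketch assuming `config.sh` assigns `SERVER_PORT` as a plain shell variable and the OpenCode config still points at the default port; GNU sed syntax):

```bash
# Move the server to port 8081
sed -i 's/^SERVER_PORT=.*/SERVER_PORT=8081/' config.sh
# Point OpenCode at the new port
sed -i 's|127.0.0.1:8080|127.0.0.1:8081|' ~/.config/opencode/opencode.json
```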
Copy-paste to set up OpenCode:
```bash
mkdir -p ~/.config/opencode
cat > ~/.config/opencode/opencode.json <<'EOF'
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "llamacpp": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Local Model",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1"
      },
      "models": {
        "local": {
          "name": "Local Model"
        }
      }
    }
  }
}
EOF
echo "OpenCode configured. Start server with: ./server.sh"
```
- Start the server (once per session):

  ```bash
  cd ~/PycharmProjects/coding-assistant
  ./server.sh
  ```

- Use OpenCode normally - it connects to your local model
- Stop the server when done: type `x` and Enter in the server terminal
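If you start the server every session, a shell alias saves a step (purely optional; the alias name is just an example for your `~/.bashrc`):

```bash
alias llama-start='cd ~/PycharmProjects/coding-assistant && ./server.sh'
```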
- Close other applications while using the local model
- Your hardware (i7-8665U, 7 threads) will generate 5-15 tokens/second
- First response takes 2-5 seconds to load context
- Smaller models (3B) are faster but less capable