Forgot everything? Start here.
cd ~/PycharmProjects/coding-assistant
# 1. Build llama.cpp (one-time, takes ~5 minutes)
./setup.sh
# 2. Download model (one-time, ~4GB)
./download-model.sh qwen2.5-coder-7b
# 3. Start using it
./chat.sh
# or
./server.sh
# Then configure your IDE to use: http://127.0.0.1:8080/v1
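Once `./server.sh` is up, you can sanity-check the endpoint from another terminal before pointing your IDE at it. llama.cpp's server exposes an OpenAI-compatible API, so these paths should work; the exact JSON it returns depends on the loaded model:

```shell
# List what the server has loaded, via the OpenAI-compatible endpoint:
curl -s http://127.0.0.1:8080/v1/models

# Ask for a chat completion (llama.cpp serves its loaded model, so the
# "model" field can usually be omitted, though most IDE clients send one):
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Write a Python hello world"}]}'
```

If both calls return JSON (not connection errors), the IDE configuration above will work.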
To switch models:
./download-model.sh qwen2.5-coder-3b  # Smaller, faster
- Open config.sh in any editor
- Change this line: ACTIVE_MODEL="qwen2.5-coder-3b"
- Save and restart chat/server
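If you prefer a one-liner over opening an editor, the same edit can be scripted with plain `sed` (this is not a script shipped by the repo; it assumes the `ACTIVE_MODEL=...` assignment sits on its own line in config.sh):

```shell
# Demo setup so this snippet is self-contained; in the repo, config.sh exists.
[ -f config.sh ] || echo 'ACTIVE_MODEL="qwen2.5-coder-7b"' > config.sh

# Rewrite the ACTIVE_MODEL line in place; keeps a config.sh.bak backup.
sed -i.bak 's/^ACTIVE_MODEL=.*/ACTIVE_MODEL="qwen2.5-coder-3b"/' config.sh
grep '^ACTIVE_MODEL=' config.sh   # confirm the change took
```

`-i.bak` works with both GNU and BSD sed, so the same line runs on Linux and macOS.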
Edit config.sh - key settings:
- ACTIVE_MODEL - which model to use
- N_THREADS - CPU threads (default: 6)
- CONTEXT_SIZE - context window (default: 4096)
- TEMPERATURE - creativity 0.1-1.0 (default: 0.5)
- SERVER_PORT - API port (default: 8080)
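Put together, a config.sh with the defaults above looks something like this (variable names are from the list; your actual file may contain more settings):

```shell
# config.sh - key settings, shown with their stated defaults
ACTIVE_MODEL="qwen2.5-coder-7b"   # which model to use
N_THREADS=6                       # CPU threads
CONTEXT_SIZE=4096                 # context window (tokens)
TEMPERATURE=0.5                   # creativity, 0.1-1.0
SERVER_PORT=8080                  # API port
```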
Troubleshooting:
- llama.cpp not built → Run ./setup.sh
- Model file missing → Run ./download-model.sh qwen2.5-coder-7b
- Responses too slow → Try smaller model: ./download-model.sh qwen2.5-coder-3b
- Out of memory → Edit config.sh: reduce CONTEXT_SIZE to 4096
- Port already in use → Edit config.sh: change SERVER_PORT=8081
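For the port-conflict case, a quick way to see whether something is already listening on the default port before editing config.sh (a generic bash check using `/dev/tcp`, not one of this repo's scripts):

```shell
# Probe the default SERVER_PORT (8080) with bash's /dev/tcp redirection.
if (exec 3<>/dev/tcp/127.0.0.1/8080) 2>/dev/null; then
  echo "port 8080 is in use - change SERVER_PORT or stop the other process"
else
  echo "port 8080 is free"
fi
```

`lsof -i :8080` gives the same answer plus the owning process, if you have lsof installed.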
- models.conf - Add new models here (one line per model)
- config.sh - Change settings here (active model, threads, etc.)
- README.md - Full documentation (read when you have time)
Really, it's just:
- ./setup.sh (once)
- ./download-model.sh qwen2.5-coder-7b (once per model)
- ./chat.sh or ./server.sh (daily use)