Ollama Copilot integrates Ollama code completion models into Neovim, providing GitHub Copilot-style tab completions.

- Suggestion streaming: completions stream into your editor as the model generates them
- Debouncing of subsequent completion requests, avoiding a flood of Ollama requests that leads to CPU over-utilization
- Full control over triggers, using textChange events instead of Neovim client requests
- A language server that provides code completions from an Ollama model
- Ghost text completions that can be inserted into the editor
- Streamed ghost text completions that populate in real time
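The debouncing behavior above can be sketched in a few lines: rapid keystrokes repeatedly reset a timer, so only the last request within the delay window actually reaches the model. This is an illustrative sketch (the `Debouncer` class is hypothetical, not the plugin's actual implementation):

```python
import threading
import time

class Debouncer:
    """Illustrative debouncer: of several calls made within `delay`
    seconds, only the last one actually fires."""

    def __init__(self, delay):
        self.delay = delay
        self._timer = None
        self._lock = threading.Lock()

    def call(self, fn, *args):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # drop the superseded request
            self._timer = threading.Timer(self.delay, fn, args)
            self._timer.start()

# Simulate rapid keystrokes: only the final payload triggers a "completion".
fired = []
d = Debouncer(0.05)
for text in ("d", "de", "def "):
    d.call(fired.append, text)
time.sleep(0.2)
print(fired)  # only the last payload survives
```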
To use Ollama-Copilot, you need to have Ollama installed (github.com/ollama/ollama):

```sh
curl -fsSL https://ollama.com/install.sh | sh
```

The language server runs on Python and requires two libraries (also listed in `python/requirements.txt`):

```sh
pip install pygls ollama
```

Make sure you have the model you want to use installed; a catalog can be found at ollama.com/library:
```sh
# To view your available models:
ollama ls
# To pull a new model:
ollama pull <Model name>
```
Lazy:
```lua
-- Default configuration
{ "Jacob411/Ollama-Copilot", opts = {} }
```

```lua
-- Custom configuration (defaults shown)
{
  'jacob411/Ollama-Copilot',
  opts = {
    -- Prefer base code models for autocomplete, not *-instruct chat variants.
    model_name = "qwen2.5-coder:3b",
    ollama_url = "http://localhost:11434", -- URL for the Ollama server; leave blank to use the default local instance
    stream_suggestion = false,
    python_command = "python3",
    filetypes = { 'python', 'lua', 'vim', 'markdown' },
    capabilities = nil, -- LSP capabilities, auto-detected if not provided
    ollama_model_opts = {
      temperature = 0.1, -- keep entropy low for stable tab completion
      top_p = 0.9,
      num_predict = 128, -- 64-256 is usually best for autocomplete
      num_ctx = 8192,
      fim_enabled = true, -- include prefix + suffix (fill-in-the-middle)
      fim_mode = "auto", -- "auto" | "template" | "manual" | "off"
      context_lines_before = 80,
      context_lines_after = 40,
      max_prefix_chars = 8000,
      max_suffix_chars = 3000,
      stop = { "<|im_start|>", "<|im_end|>", "<|fim_prefix|>", "<|fim_suffix|>", "<|fim_middle|>", "```" },
      -- Internal payload/response logging (or set OLLAMA_COPILOT_DEBUG=1).
      -- debug = true,
      -- debug_log_file = "/tmp/ollama-copilot-debug.log",
    },
    keymaps = {
      suggestion = '<leader>os',
      reject = '<leader>or',
      insert_accept = '<Tab>',
    },
  },
}
```

For more Ollama customization, see github.com/ollama/ollama/blob/main/docs/modelfile.md
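With `fim_enabled`, a fill-in-the-middle completion sends the text before and after the cursor to the model, wrapped in sentinel tokens, and truncated to `max_prefix_chars`/`max_suffix_chars`. A minimal sketch of how such a prompt can be assembled (the token names match Qwen2.5-Coder's FIM format, as in the `stop` list above; the function itself is illustrative, not the plugin's exact code):

```python
def build_fim_prompt(prefix: str, suffix: str,
                     max_prefix_chars: int = 8000,
                     max_suffix_chars: int = 3000) -> str:
    """Assemble a fill-in-the-middle prompt in Qwen2.5-Coder token style.

    Truncation keeps the characters closest to the cursor: the *end* of
    the prefix and the *start* of the suffix.
    """
    prefix = prefix[-max_prefix_chars:]
    suffix = suffix[:max_suffix_chars]
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

prompt = build_fim_prompt("def add(a, b):\n    return ", "\n\nprint(add(1, 2))")
print(prompt)
```

The model then generates the "middle" text, which the plugin shows as a ghost-text suggestion.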
The plugin automatically detects and configures LSP capabilities for optimal completion support:
- **Auto-detection (default):** If `capabilities` is not specified, the plugin will:
  - Try to use `cmp_nvim_lsp.default_capabilities()` if nvim-cmp is installed
  - Fall back to `vim.lsp.protocol.make_client_capabilities()` if nvim-cmp is not available
- **Custom capabilities:** You can override the auto-detection by providing your own capabilities:

```lua
opts = {
  capabilities = require('cmp_nvim_lsp').default_capabilities(),
  -- or use custom capabilities
  capabilities = vim.tbl_deep_extend('force',
    vim.lsp.protocol.make_client_capabilities(),
    { your_custom_capability = true }
  ),
}
```
This ensures backward compatibility while allowing the plugin to work without requiring nvim-cmp as a dependency.
The Ollama Copilot language server attaches when you enter a buffer and can be inspected with:

```vim
:LspInfo
```

Prefer base coder models for completion quality (qwen2.5-coder:*, deepseek-coder:*) and avoid *-instruct variants unless you explicitly want chat-like behavior. 3B models are fast but can be weak or unstable on instruction-heavy files (markdown, docs), so 7B is often a better default if your machine can handle it.
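When streaming is enabled, Ollama's `/api/generate` endpoint returns newline-delimited JSON chunks, each carrying a `response` fragment and a final `done` flag; the plugin concatenates the fragments into the ghost-text suggestion. A minimal sketch of reassembling such a stream (the sample chunks below are fabricated for illustration):

```python
import json

def assemble_stream(lines):
    """Concatenate the `response` fragments from an Ollama NDJSON
    stream, stopping at the chunk marked done."""
    out = []
    for line in lines:
        chunk = json.loads(line)
        out.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(out)

# Fabricated sample of what /api/generate streams back:
sample = [
    '{"response": "return a", "done": false}',
    '{"response": " + b", "done": false}',
    '{"response": "", "done": true}',
]
print(assemble_stream(sample))  # -> return a + b
```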
To inspect exact requests sent to Ollama and raw streamed chunks:
```sh
OLLAMA_COPILOT_DEBUG=1 nvim
```

or set in `ollama_model_opts`:

```lua
debug = true
debug_log_file = "/tmp/ollama-copilot-debug.log"
```

Use the included payload test script to verify prompt shape and suffix usage:

```sh
cd ~/path/to/Ollama-Copilot
python3 python/payload_debug_demo.py
```

Contributions are welcome! If you have any ideas for new features, improvements, or bug fixes, please open an issue or submit a pull request.
I also hope to do more on the model side: I am interested in fine-tuning models and implementing RAG techniques, moving beyond just Ollama.
This project is licensed under the MIT License.
