Run powerful AI models locally on your machine with Ollama—no API keys, no internet required, and completely private.
- Install Ollama from ollama.com
- Pull a model: `ollama pull llama3.2`
- Start the Ollama server (runs on port 11434 by default)
- Configure graphify-dotnet with `OllamaClientFactory` or the unified `ChatClientFactory` (see the sketch below)
- Analyze code locally!
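As a preview of the configuration step, creating a client against the local server with the SDK types used throughout this guide looks roughly like this (a minimal sketch; the complete, runnable example appears later in this guide):

```csharp
using Graphify.Sdk;
using Microsoft.Extensions.AI;

// Point graphify-dotnet at the local Ollama server (the defaults used in this guide)
var options = new AiProviderOptions(
    Provider: AiProvider.Ollama,
    Endpoint: "http://localhost:11434",
    ModelId: "llama3.2");

IChatClient client = ChatClientFactory.Create(options);
```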
| Feature | Ollama | Cloud APIs |
|---|---|---|
| Privacy | Data stays on your machine | Sent to servers |
| Cost | Free (one-time download) | Pay per request |
| Offline | Works without internet | Requires connectivity |
| Speed | GPU-accelerated locally | Network latency |
| Models | Llama, CodeLlama, Mistral, etc. | Limited selection |
Perfect for:
- Development & testing without spending API credits
- Sensitive code analysis (keeps your code private)
- Prototyping features that will later use cloud APIs (see the sketch after this list)
- Offline environments (airgapped networks, laptops without internet)
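Because every provider is created through the same `AiProviderOptions` and `ChatClientFactory` surface shown later in this guide, prototyping against Ollama and moving to a cloud API later is mostly a configuration change. A rough sketch; the `AiProvider.AzureOpenAI` value and the Azure endpoint/model below are illustrative assumptions rather than something this guide defines (see the Azure OpenAI guide linked at the end):

```csharp
using Graphify.Sdk;
using Microsoft.Extensions.AI;

// Prototype locally, then switch provider by changing the options only.
// NOTE: AiProvider.AzureOpenAI and the Azure endpoint/model are assumptions
// for illustration; consult the Azure OpenAI guide for the real values.
bool useLocal = true;

var options = useLocal
    ? new AiProviderOptions(
          Provider: AiProvider.Ollama,
          Endpoint: "http://localhost:11434",
          ModelId: "llama3.2")
    : new AiProviderOptions(
          Provider: AiProvider.AzureOpenAI,                 // assumed enum value
          Endpoint: "https://my-resource.openai.azure.com", // hypothetical endpoint
          ModelId: "gpt-4o");                               // hypothetical deployment

IChatClient client = ChatClientFactory.Create(options);
```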
System requirements:
- 4GB+ RAM minimum (8GB+ recommended)
- GPU strongly recommended (NVIDIA, AMD, or Apple Silicon for best performance)
- 2GB+ disk space per model
Install Ollama for your platform:

macOS:
```bash
# Download and run the installer from https://ollama.com
# Or use Homebrew:
brew install ollama

# Start the server (runs in background)
ollama serve
```

Linux:
```bash
# Official installation script
curl -fsSL https://ollama.com/install.sh | sh

# Start the server
ollama serve
```

Windows:
- Download the Windows installer from ollama.com/download
- Run the `.exe` installer
- The server starts automatically in the background
- Verify it's running: open PowerShell and run `curl http://localhost:11434/api/tags`
Docker:
```bash
docker run -d -v ollama:/root/.ollama -p 11434:11434 ollama/ollama
```

The first time you use a model, Ollama downloads it. This may take a few minutes.

```bash
# llama3.2 - Excellent for general coding tasks, 8B/70B
ollama pull llama3.2
# Or pull the larger 70B version for better analysis
ollama pull llama3.2:70b
# CodeLlama - Specialized for code, faster
ollama pull codellama
# Deepseek Coder - Excellent code understanding
ollama pull deepseek-coder

# Other general-purpose models
ollama pull mistral
ollama pull neural-chat

# List installed models
ollama list

# Remove a model you no longer need
ollama rm llama3.2
```

```bash
# Check if Ollama is serving (any response = success)
curl http://localhost:11434/api/tags
# Expected response:
# {"models":[{"name":"llama3.2:latest","modified_at":"..."}]}
# On Windows with PowerShell:
curl -Uri http://localhost:11434/api/tags -UseBasicParsing
```

If you see connection errors, restart the Ollama server:
macOS/Linux:
```bash
# Kill existing process
pkill ollama

# Start fresh
ollama serve
```

Windows:
- Restart the Ollama application from the system tray
- Or run `Restart-Service ollama` in PowerShell (admin)
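If you prefer to check availability from .NET before creating a client, here is a minimal sketch against the `/api/tags` endpoint shown above (plain `HttpClient`; the `IsOllamaRunningAsync` helper is illustrative, not part of the SDK):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Top-level program: prints whether the local server answers /api/tags.
Console.WriteLine(await IsOllamaRunningAsync()
    ? "Ollama is up"
    : "Ollama is not reachable");

static async Task<bool> IsOllamaRunningAsync(string endpoint = "http://localhost:11434")
{
    using var http = new HttpClient { Timeout = TimeSpan.FromSeconds(2) };
    try
    {
        var response = await http.GetAsync($"{endpoint}/api/tags");
        return response.IsSuccessStatusCode;
    }
    catch (Exception) // connection refused, timeout, etc.
    {
        return false;
    }
}
```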
Use the new System.CommandLine CLI syntax to configure Ollama:

```bash
# Run with default Ollama settings (localhost:11434, llama3.2)
graphify run ./my-project --provider ollama
# Specify a custom model
graphify run ./my-project --provider ollama --model codellama
# Use a custom endpoint
graphify run ./my-project --provider ollama --endpoint http://custom:11434
# Combine options
graphify run ./my-project --provider ollama --model deepseek-coder --endpoint http://192.168.1.100:11434
```

graphify-dotnet supports a layered configuration system (priority order; a sketch of how the layers compose follows the list):
- CLI arguments (highest priority)
- User secrets (.NET user secrets)
- Environment variables
- appsettings.local.json (saved by the `graphify config` wizard)
- appsettings.json (lowest priority)
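As a rough illustration of that layering, the standard Microsoft.Extensions.Configuration pattern composes sources so that later ones override earlier ones. This is a sketch of the typical setup, not necessarily the SDK's exact internals:

```csharp
using System;
using Microsoft.Extensions.Configuration;

// Later sources override earlier ones, mirroring the priority list above.
// (Top-level program: `args` and the generated `Program` class are in scope.)
var config = new ConfigurationBuilder()
    .AddJsonFile("appsettings.json", optional: true)        // lowest priority
    .AddJsonFile("appsettings.local.json", optional: true)  // written by `graphify config`
    .AddEnvironmentVariables()                               // e.g. GRAPHIFY__Ollama__ModelId
    .AddUserSecrets<Program>(optional: true)                 // .NET user secrets
    .AddCommandLine(args)                                    // highest priority
    .Build();

// Keys are case-insensitive, and "__" in environment variable names maps to ":".
Console.WriteLine(config["Graphify:Ollama:ModelId"]); // e.g. "llama3.2"
```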
Set these environment variables for automatic configuration:

```bash
# Linux/macOS
export GRAPHIFY__Provider=ollama
export GRAPHIFY__Ollama__Endpoint=http://localhost:11434
export GRAPHIFY__Ollama__ModelId=llama3.2
# Windows (PowerShell)
$env:GRAPHIFY__Provider = "ollama"
$env:GRAPHIFY__Ollama__Endpoint = "http://localhost:11434"
$env:GRAPHIFY__Ollama__ModelId = "llama3.2"
```

Use .NET user secrets for local development (keeps secrets out of source):

```bash
# Set secrets for your project
dotnet user-secrets set "Graphify:Provider" "Ollama"
dotnet user-secrets set "Graphify:Ollama:Endpoint" "http://localhost:11434"
dotnet user-secrets set "Graphify:Ollama:ModelId" "llama3.2"
# List configured secrets
dotnet user-secrets list
```

Configure in your application's appsettings.json:

```json
{
  "Graphify": {
    "Provider": "Ollama",
    "Ollama": {
      "Endpoint": "http://localhost:11434",
      "ModelId": "llama3.2"
    }
  }
}
```

Use the `graphify config show` command to verify your configuration:

```bash
graphify config show
```

This displays the active configuration values from all sources.
For SDK usage in your own applications:

```csharp
using Graphify.Sdk;
using Microsoft.Extensions.AI;
// Create Ollama options
var aiOptions = new AiProviderOptions(
Provider: AiProvider.Ollama,
Endpoint: "http://localhost:11434",
ModelId: "llama3.2"
);
IChatClient client = ChatClientFactory.Create(aiOptions);
// Use the client for local analysis
var response = await client.GetResponseAsync(
[new ChatMessage(ChatRole.User, "Explain this C# code...")]);
Console.WriteLine(response.Text);
```

A complete, runnable example:

```csharp
using System;
using System.Threading.Tasks;
using Graphify.Sdk;
using Microsoft.Extensions.AI;

public class LocalCodeAnalyzer
{
    public static async Task Main(string[] args)
    {
        // 1. Create Ollama options
        var options = new AiProviderOptions(
            Provider: AiProvider.Ollama,
            Endpoint: "http://localhost:11434",
            ModelId: "llama3.2"
        );

        // 2. Create the client
        IChatClient client = ChatClientFactory.Create(options);

        // 3. Analyze code locally (no internet needed!)
        string codeSnippet = @"
public class Calculator {
    public int Add(int a, int b) => a + b;
    public int Multiply(int a, int b) => a * b;
}";

        string prompt = $"Analyze this C# code:\n\n{codeSnippet}";

        Console.WriteLine("Analyzing with Ollama (llama3.2)...");

        var response = await client.GetResponseAsync(
            [new ChatMessage(ChatRole.User, prompt)]);

        Console.WriteLine("\nAnalysis:");
        Console.WriteLine(response.Text);
    }
}
```

| Model | Size | Speed | Quality | Best For |
|---|---|---|---|---|
| llama3.2 | 8B / 70B | Fast / Slow | Good / Excellent | General coding, good balance |
| codellama | 7B / 34B | Fast / Moderate | Very Good | Code-specific tasks |
| deepseek-coder | 6B / 33B | Very Fast / Fast | Excellent | Code understanding |
| mistral | 7B | Very Fast | Good | Lightweight, fast |
Approximate VRAM requirements:
- 7B models: ~4-5GB VRAM, suitable for laptop development
- 13B models: ~8GB VRAM, balanced performance
- 70B+ models: 16GB+ VRAM, best quality (GPU required)
No GPU? Use 7B models and set `OLLAMA_NUM_GPU=0` to run on the CPU (slower, but it works).
NVIDIA:
```bash
# Automatically detected if CUDA installed
ollama serve
```

AMD (ROCm):
```bash
# Requires ROCm installation
export OLLAMA_NUM_PARALLEL=4
ollama serve
```

Apple Silicon (M1/M2/M3):
```bash
# Automatically uses the GPU via Metal
ollama serve
```

CPU-only (if no GPU):
```bash
OLLAMA_NUM_GPU=0 ollama serve
```

Choosing between models in code:

```csharp
// Smaller, faster model for quick analysis
var smallOptions = new AiProviderOptions(
    Provider: AiProvider.Ollama,
    Endpoint: "http://localhost:11434",
    ModelId: "mistral" // 7B, very fast
);

// Larger, higher-quality model for detailed analysis
var largeOptions = new AiProviderOptions(
    Provider: AiProvider.Ollama,
    Endpoint: "http://localhost:11434",
    ModelId: "llama3.2:70b" // 70B, slower but better
);

// In production, choose based on your needs:
var model = needsSpeed ? "mistral" : "llama3.2:70b";
```

Running Ollama as a background service:

```bash
# Background service (macOS/Linux)
nohup ollama serve > ollama.log 2>&1 &
# Or use systemd (Linux)
sudo systemctl enable ollama
sudo systemctl start ollama
```

For analyzing multiple files, queue requests to avoid overloading local resources:

```csharp
// Cap concurrent requests so the local server is not overwhelmed
var semaphore = new SemaphoreSlim(initialCount: 2);

var analysisTasks = codeFiles.Select(async file =>
{
    await semaphore.WaitAsync();
    try { await AnalyzeFile(file); }
    finally { semaphore.Release(); }
});

await Task.WhenAll(analysisTasks);
```

| Variable | Description | Default |
|---|---|---|
| `OLLAMA_ENDPOINT` | Ollama server URL | `http://localhost:11434` |
| `OLLAMA_MODEL` | Model to use | `llama3.2` |
| `OLLAMA_NUM_GPU` | GPU layers to load (0 = CPU only) | Auto-detect |
| `OLLAMA_NUM_PARALLEL` | Parallel requests | 1 |
Example:
```bash
# Use more GPU layers for faster inference
export OLLAMA_NUM_GPU=40
export OLLAMA_NUM_PARALLEL=2
ollama serve
```

Troubleshooting common issues:

Connection errors:

Cause: Ollama server not running

Solution:
```bash
# Start the server
ollama serve
# Or check if it's running
curl http://localhost:11434/api/tags
# On Windows, restart from system tray
```

Model not found:

Cause: Model hasn't been pulled yet

Solution:
```bash
# List available models
ollama list
# Pull the model
ollama pull llama3.2
# Check progress
ollama list
```

Out of memory:

Cause: Model is too large for available VRAM

Solution:
- Use a smaller model: `ollama pull mistral` (7B)
- Disable GPU: `OLLAMA_NUM_GPU=0 ollama serve`
- Add more RAM or increase swap
- Reduce parallel requests: `OLLAMA_NUM_PARALLEL=1`
Slow performance:

Cause: GPU not being used

Solution:
```bash
# Check whether the loaded model is running on GPU or CPU
ollama ps
# Verify CUDA is installed (NVIDIA)
nvidia-smi
# Restart Ollama to detect GPU
ollama serve
# Force CPU if you want (not recommended)
OLLAMA_NUM_GPU=0 ollama serve
```

Request timeouts:

Cause: Model taking too long (CPU inference or large model)

Solution:
- Increase the timeout in your code:

  ```csharp
  var cts = new CancellationTokenSource(TimeSpan.FromSeconds(300)); // 5 min
  var response = await client.GetResponseAsync(
      [new ChatMessage(ChatRole.User, prompt)],
      cancellationToken: cts.Token);
  ```
- Use a smaller/faster model
- Increase available VRAM
Development configuration:

```csharp
// Easy, local setup
var options = new AiProviderOptions(
    Provider: AiProvider.Ollama,
    Endpoint: "http://localhost:11434",
    ModelId: "mistral" // Fast 7B model
);
var client = ChatClientFactory.Create(options);
```

Production configuration:

```csharp
// Use environment variables, larger model
var options = new AiProviderOptions(
    Provider: AiProvider.Ollama,
    Endpoint: Environment.GetEnvironmentVariable("OLLAMA_ENDPOINT") ?? "http://localhost:11434",
    ModelId: Environment.GetEnvironmentVariable("OLLAMA_MODEL") ?? "llama3.2"
);
var client = ChatClientFactory.Create(options);
```

See also:
- Using graphify-dotnet with Azure OpenAI
- Using graphify-dotnet with GitHub Copilot SDK
- Ollama Documentation
- Available Models
- API Reference: OllamaClientFactory
Next steps:
- Install Ollama and pull a model
- Verify it's running: `curl http://localhost:11434/api/tags`
- Run the example code above
- Explore the README for SDK features
- Build your own code analysis tools!
Need help? Open an issue on GitHub or check the documentation.