Run powerful AI models locally on your machine with Ollama—no API keys, no internet required, and completely private.
- Install Ollama from ollama.com
- Pull a model: `ollama pull llama3.2`
- Start the Ollama server (runs on port 11434 by default)
- Configure graphify-dotnet with `OllamaClientFactory` or the unified `ChatClientFactory` (see the sketch below)
- Analyze code locally!
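As a preview of the configuration step, creating a client against the local server with the SDK types used throughout this guide looks roughly like this (a minimal sketch; the complete, runnable example appears later in this guide):

```csharp
using Graphify.Sdk;
using Microsoft.Extensions.AI;

// Point graphify-dotnet at the local Ollama server (the defaults used in this guide)
var options = new AiProviderOptions(
    Provider: AiProvider.Ollama,
    Endpoint: "http://localhost:11434",
    ModelId: "llama3.2");

IChatClient client = ChatClientFactory.Create(options);
```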
| Feature | Ollama | Cloud APIs |
|---|---|---|
| Privacy | Data stays on your machine | Sent to servers |
| Cost | Free (one-time download) | Pay per request |
| Offline | Works without internet | Requires connectivity |
| Speed | GPU-accelerated locally | Network latency |
| Models | Llama, CodeLlama, Mistral, etc. | Limited selection |
Perfect for:
- Development & testing without spending API credits
- Sensitive code analysis (keeps your code private)
- Prototyping features that will later use cloud APIs (see the sketch after this list)
- Offline environments (airgapped networks, laptops without internet)
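Because every provider is created through the same `AiProviderOptions` and `ChatClientFactory` surface shown later in this guide, prototyping against Ollama and moving to a cloud API later is mostly a configuration change. A rough sketch; the `AiProvider.AzureOpenAI` value and the Azure endpoint/model below are illustrative assumptions rather than something this guide defines (see the Azure OpenAI guide linked at the end):

```csharp
using Graphify.Sdk;
using Microsoft.Extensions.AI;

// Prototype locally, then switch provider by changing the options only.
// NOTE: AiProvider.AzureOpenAI and the Azure endpoint/model are assumptions
// for illustration; consult the Azure OpenAI guide for the real values.
bool useLocal = true;

var options = useLocal
    ? new AiProviderOptions(
          Provider: AiProvider.Ollama,
          Endpoint: "http://localhost:11434",
          ModelId: "llama3.2")
    : new AiProviderOptions(
          Provider: AiProvider.AzureOpenAI,                 // assumed enum value
          Endpoint: "https://my-resource.openai.azure.com", // hypothetical endpoint
          ModelId: "gpt-4o");                               // hypothetical deployment

IChatClient client = ChatClientFactory.Create(options);
```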
System requirements:
- 4GB+ RAM minimum (8GB+ recommended)
- GPU strongly recommended (NVIDIA, AMD, or Apple Silicon for best performance)
- 2GB+ disk space per model
Install Ollama for your platform:

macOS:
```bash
# Download and run the installer from https://ollama.com
# Or use Homebrew:
brew install ollama

# Start the server (runs in background)
ollama serve
```

Linux:
```bash
# Official installation script
curl -fsSL https://ollama.com/install.sh | sh

# Start the server
ollama serve
```

Windows:
- Download the Windows installer from ollama.com/download
- Run the `.exe` installer
- The server starts automatically in the background
- Verify it's running: open PowerShell and run `curl http://localhost:11434/api/tags`
Docker:
```bash
docker run -d -v ollama:/root/.ollama -p 11434:11434 ollama/ollama
```

The first time you use a model, Ollama downloads it. This may take a few minutes.

```bash
# llama3.2 - Excellent for general coding tasks, 8B/70B
ollama pull llama3.2
# Or pull the larger 70B version for better analysis
ollama pull llama3.2:70b
# CodeLlama - Specialized for code, faster
ollama pull codellama
# Deepseek Coder - Excellent code understanding
ollama pull deepseek-coder

# Other general-purpose models
ollama pull mistral
ollama pull neural-chat

# List installed models
ollama list

# Remove a model you no longer need
ollama rm llama3.2
```

```bash
# Check if Ollama is serving (any response = success)
curl http://localhost:11434/api/tags
# Expected response:
# {"models":[{"name":"llama3.2:latest","modified_at":"..."}]}
# On Windows with PowerShell:
curl -Uri http://localhost:11434/api/tags -UseBasicParsing
```

If you see connection errors, restart the Ollama server:
macOS/Linux:
```bash
# Kill existing process
pkill ollama

# Start fresh
ollama serve
```

Windows:
- Restart the Ollama application from the system tray
- Or run `Restart-Service ollama` in PowerShell (admin)
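If you prefer to check availability from .NET before creating a client, here is a minimal sketch against the `/api/tags` endpoint shown above (plain `HttpClient`; the `IsOllamaRunningAsync` helper is illustrative, not part of the SDK):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Top-level program: prints whether the local server answers /api/tags.
Console.WriteLine(await IsOllamaRunningAsync()
    ? "Ollama is up"
    : "Ollama is not reachable");

static async Task<bool> IsOllamaRunningAsync(string endpoint = "http://localhost:11434")
{
    using var http = new HttpClient { Timeout = TimeSpan.FromSeconds(2) };
    try
    {
        var response = await http.GetAsync($"{endpoint}/api/tags");
        return response.IsSuccessStatusCode;
    }
    catch (Exception) // connection refused, timeout, etc.
    {
        return false;
    }
}
```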
Use the new System.CommandLine CLI syntax to configure Ollama:

```bash
# Run with default Ollama settings (localhost:11434, llama3.2)
graphify run ./my-project --provider ollama
# Specify a custom model
graphify run ./my-project --provider ollama --model codellama
# Use a custom endpoint
graphify run ./my-project --provider ollama --endpoint http://custom:11434
# Combine options
graphify run ./my-project --provider ollama --model deepseek-coder --endpoint http://192.168.1.100:11434
```

graphify-dotnet supports a layered configuration system (priority order; a sketch of how the layers compose follows the list):
- CLI arguments (highest priority)
- User secrets (.NET user secrets)
- Environment variables
- appsettings.local.json (saved by the `graphify config` wizard)
- appsettings.json (lowest priority)
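As a rough illustration of that layering, the standard Microsoft.Extensions.Configuration pattern composes sources so that later ones override earlier ones. This is a sketch of the typical setup, not necessarily the SDK's exact internals:

```csharp
using System;
using Microsoft.Extensions.Configuration;

// Later sources override earlier ones, mirroring the priority list above.
// (Top-level program: `args` and the generated `Program` class are in scope.)
var config = new ConfigurationBuilder()
    .AddJsonFile("appsettings.json", optional: true)        // lowest priority
    .AddJsonFile("appsettings.local.json", optional: true)  // written by `graphify config`
    .AddEnvironmentVariables()                               // e.g. GRAPHIFY__Ollama__ModelId
    .AddUserSecrets<Program>(optional: true)                 // .NET user secrets
    .AddCommandLine(args)                                    // highest priority
    .Build();

// Keys are case-insensitive, and "__" in environment variable names maps to ":".
Console.WriteLine(config["Graphify:Ollama:ModelId"]); // e.g. "llama3.2"
```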
Set these environment variables for automatic configuration:

```bash
# Linux/macOS
export GRAPHIFY__Provider=ollama
export GRAPHIFY__Ollama__Endpoint=http://localhost:11434
export GRAPHIFY__Ollama__ModelId=llama3.2
# Windows (PowerShell)
$env:GRAPHIFY__Provider = "ollama"
$env:GRAPHIFY__Ollama__Endpoint = "http://localhost:11434"
$env:GRAPHIFY__Ollama__ModelId = "llama3.2"
```

Use .NET user secrets for local development (keeps secrets out of source):

```bash
# Set secrets for your project
dotnet user-secrets set "Graphify:Provider" "Ollama"
dotnet user-secrets set "Graphify:Ollama:Endpoint" "http://localhost:11434"
dotnet user-secrets set "Graphify:Ollama:ModelId" "llama3.2"
# List configured secrets
dotnet user-secrets list
```

Configure in your application's appsettings.json:

```json
{
  "Graphify": {
    "Provider": "Ollama",
    "Ollama": {
      "Endpoint": "http://localhost:11434",
      "ModelId": "llama3.2"
    }
  }
}
```

Use the `graphify config show` command to verify your configuration:

```bash
graphify config show
```

This displays the active configuration values from all sources.
For SDK usage in your own applications:

```csharp
using Graphify.Sdk;
using Microsoft.Extensions.AI;
// Create Ollama options
var aiOptions = new AiProviderOptions(
Provider: AiProvider.Ollama,
Endpoint: "http://localhost:11434",
ModelId: "llama3.2"
);
IChatClient client = ChatClientFactory.Create(aiOptions);
// Use the client for local analysis
var response = await client.GetResponseAsync(
[new ChatMessage(ChatRole.User, "Explain this C# code...")]);
Console.WriteLine(response.Text);
```

A complete, runnable example:

```csharp
using System;
using System.Threading.Tasks;
using Graphify.Sdk;
using Microsoft.Extensions.AI;

public class LocalCodeAnalyzer
{
    public static async Task Main(string[] args)
    {
        // 1. Create Ollama options
        var options = new AiProviderOptions(
            Provider: AiProvider.Ollama,
            Endpoint: "http://localhost:11434",
            ModelId: "llama3.2"
        );

        // 2. Create the client
        IChatClient client = ChatClientFactory.Create(options);

        // 3. Analyze code locally (no internet needed!)
        string codeSnippet = @"
public class Calculator {
    public int Add(int a, int b) => a + b;
    public int Multiply(int a, int b) => a * b;
}";

        string prompt = $"Analyze this C# code:\n\n{codeSnippet}";

        Console.WriteLine("Analyzing with Ollama (llama3.2)...");

        var response = await client.GetResponseAsync(
            [new ChatMessage(ChatRole.User, prompt)]);

        Console.WriteLine("\nAnalysis:");
        Console.WriteLine(response.Text);
    }
}
```

| Model | Size | Speed | Quality | Best For |
|---|---|---|---|---|
| llama3.2 | 8B / 70B | Fast / Slow | Good / Excellent | General coding, good balance |
| codellama | 7B / 34B | Fast / Moderate | Very Good | Code-specific tasks |
| deepseek-coder | 6B / 33B | Very Fast / Fast | Excellent | Code understanding |
| mistral | 7B | Very Fast | Good | Lightweight, fast |
Approximate VRAM requirements:
- 7B models: ~4-5GB VRAM, suitable for laptop development
- 13B models: ~8GB VRAM, balanced performance
- 70B+ models: 16GB+ VRAM, best quality (GPU required)
No GPU? Use 7B models and set `OLLAMA_NUM_GPU=0` to run on the CPU (slower, but it works).
NVIDIA:
```bash
# Automatically detected if CUDA installed
ollama serve
```

AMD (ROCm):
```bash
# Requires ROCm installation
export OLLAMA_NUM_PARALLEL=4
ollama serve
```

Apple Silicon (M1/M2/M3):
```bash
# Automatically uses the GPU via Metal
ollama serve
```

CPU-only (if no GPU):
```bash
OLLAMA_NUM_GPU=0 ollama serve
```

Choosing between models in code:

```csharp
// Smaller, faster model for quick analysis
var smallOptions = new AiProviderOptions(
    Provider: AiProvider.Ollama,
    Endpoint: "http://localhost:11434",
    ModelId: "mistral" // 7B, very fast
);

// Larger, higher-quality model for detailed analysis
var largeOptions = new AiProviderOptions(
    Provider: AiProvider.Ollama,
    Endpoint: "http://localhost:11434",
    ModelId: "llama3.2:70b" // 70B, slower but better
);

// In production, choose based on your needs:
var model = needsSpeed ? "mistral" : "llama3.2:70b";
```

Running Ollama as a background service:

```bash
# Background service (macOS/Linux)
nohup ollama serve > ollama.log 2>&1 &
# Or use systemd (Linux)
sudo systemctl enable ollama
sudo systemctl start ollama
```

For analyzing multiple files, queue requests to avoid overloading local resources:

```csharp
// Cap concurrent requests so the local server is not overwhelmed
var semaphore = new SemaphoreSlim(initialCount: 2);

var analysisTasks = codeFiles.Select(async file =>
{
    await semaphore.WaitAsync();
    try { await AnalyzeFile(file); }
    finally { semaphore.Release(); }
});

await Task.WhenAll(analysisTasks);
```

| Variable | Description | Default |
|---|---|---|
| `OLLAMA_ENDPOINT` | Ollama server URL | `http://localhost:11434` |
| `OLLAMA_MODEL` | Model to use | `llama3.2` |
| `OLLAMA_NUM_GPU` | GPU layers to load (0 = CPU only) | Auto-detect |
| `OLLAMA_NUM_PARALLEL` | Parallel requests | 1 |
Example:
```bash
# Use more GPU layers for faster inference
export OLLAMA_NUM_GPU=40
export OLLAMA_NUM_PARALLEL=2
ollama serve
```

Troubleshooting common issues:

Connection errors:

Cause: Ollama server not running

Solution:
```bash
# Start the server
ollama serve
# Or check if it's running
curl http://localhost:11434/api/tags
# On Windows, restart from system tray
```

Model not found:

Cause: Model hasn't been pulled yet

Solution:
```bash
# List available models
ollama list
# Pull the model
ollama pull llama3.2
# Check progress
ollama list
```

Out of memory:

Cause: Model is too large for available VRAM

Solution:
- Use a smaller model: `ollama pull mistral` (7B)
- Disable GPU: `OLLAMA_NUM_GPU=0 ollama serve`
- Add more RAM or increase swap
- Reduce parallel requests: `OLLAMA_NUM_PARALLEL=1`
Slow performance:

Cause: GPU not being used

Solution:
```bash
# Check whether the loaded model is running on GPU or CPU
ollama ps
# Verify CUDA is installed (NVIDIA)
nvidia-smi
# Restart Ollama to detect GPU
ollama serve
# Force CPU if you want (not recommended)
OLLAMA_NUM_GPU=0 ollama serve
```

Request timeouts:

Cause: Model taking too long (CPU inference or large model)

Solution:
- Increase the timeout in your code:

  ```csharp
  var cts = new CancellationTokenSource(TimeSpan.FromSeconds(300)); // 5 min
  var response = await client.GetResponseAsync(
      [new ChatMessage(ChatRole.User, prompt)],
      cancellationToken: cts.Token);
  ```
- Use a smaller/faster model
- Increase available VRAM
Development configuration:

```csharp
// Easy, local setup
var options = new AiProviderOptions(
    Provider: AiProvider.Ollama,
    Endpoint: "http://localhost:11434",
    ModelId: "mistral" // Fast 7B model
);
var client = ChatClientFactory.Create(options);
```

Production configuration:

```csharp
// Use environment variables, larger model
var options = new AiProviderOptions(
    Provider: AiProvider.Ollama,
    Endpoint: Environment.GetEnvironmentVariable("OLLAMA_ENDPOINT") ?? "http://localhost:11434",
    ModelId: Environment.GetEnvironmentVariable("OLLAMA_MODEL") ?? "llama3.2"
);
var client = ChatClientFactory.Create(options);
```

See also:
- Using graphify-dotnet with Azure OpenAI
- Using graphify-dotnet with GitHub Copilot SDK
- Ollama Documentation
- Available Models
- API Reference: OllamaClientFactory
Next steps:
- Install Ollama and pull a model
- Verify it's running: `curl http://localhost:11434/api/tags`
- Run the example code above
- Explore the README for SDK features
- Build your own code analysis tools!
Need help? Open an issue on GitHub or check the documentation.