Summary
SkillSpector currently requires a cloud API key (OpenAI, Anthropic, or NVIDIA) to run LLM-powered analysis. This creates a barrier for developers who want to evaluate the tool, run it in airgapped environments, or avoid cloud API costs during development.
Ollama serves an OpenAI-compatible API on localhost:11434 and supports hundreds of models (llama, mistral, qwen, gemma, deepseek, phi, etc.) running entirely on local hardware. Adding an Ollama provider would make SkillSpector usable without any API key, internet connection, or cloud spend.
Why This Matters
Scenario 1: First-time evaluation
A security engineer wants to evaluate SkillSpector before requesting API key budget approval. Today they can only run --no-llm mode, which skips the semantic analyzers and meta-analyzer enrichment — the most valuable part of the tool. With an Ollama provider, they could run a full scan with a locally-hosted model in minutes.
Scenario 2: Airgapped / restricted environments
Government and financial institutions often prohibit sending source code to external APIs. An Ollama provider enables full LLM analysis without any data leaving the machine.
Scenario 3: CI cost optimization
Running SkillSpector on every PR via cloud APIs costs money per scan. Self-hosted Ollama on a GPU runner eliminates per-scan API costs entirely.
Proposed Implementation
Ollama exposes an OpenAI-compatible endpoint at http://localhost:11434/v1. The implementation follows the existing provider pattern:
providers/
ollama/
__init__.py
provider.py
model_registry.yaml
provider.py — uses ChatOpenAI with Ollama's base URL:
class OllamaProvider:
DEFAULT_MODEL = "llama3.1:8b"
SLOT_DEFAULTS: dict[str, str] = {}
def resolve_credentials(self) -> tuple[str, str | None] | None:
base_url = os.environ.get(
"OLLAMA_BASE_URL", "http://localhost:11434/v1"
).strip()
# Ollama doesn't require an API key; ChatOpenAI needs a non-empty string
return "ollama", base_url
def create_chat_model(self, model, *, max_tokens, timeout=120):
return create_openai_compatible_chat_model(
model=model,
credentials=self.resolve_credentials(),
max_tokens=max_tokens,
timeout=timeout,
)
model_registry.yaml — token budgets for common Ollama models:
models:
llama3.1:8b:
context_length: 131072
max_output_tokens: 4096
llama3.1:70b:
context_length: 131072
max_output_tokens: 4096
mistral:7b:
context_length: 32768
max_output_tokens: 4096
qwen2.5:7b:
context_length: 131072
max_output_tokens: 8192
gemma2:9b:
context_length: 8192
max_output_tokens: 4096
deepseek-v3:latest:
context_length: 65536
max_output_tokens: 8192
Selection via env var:
export SKILLSPECTOR_PROVIDER=ollama
# Optionally override the default model:
export SKILLSPECTOR_MODEL=mistral:7b
# Optionally point to a remote Ollama instance:
export OLLAMA_BASE_URL=http://gpu-server:11434/v1
skillspector scan ./my-skill/
Registration in providers/__init__.py:
if name == "ollama":
from .ollama import OllamaProvider
return OllamaProvider()
Related
Structured Output Consideration
Ollama models may not support response_format: {"type": "json_object"} consistently. The provider should work well with PR #71 (text-based structured output parsing) when that lands. For models that don't support structured output natively, the existing --no-llm fallback remains available.
Scope
- New
providers/ollama/ subpackage (~80 lines)
- Registration in
providers/__init__.py (~5 lines)
- Update error message to mention
ollama as a valid provider
- Tests: provider construction, credential resolution, model registry lookup
- Documentation: add Ollama to README provider table and
.env.example
Summary
SkillSpector currently requires a cloud API key (OpenAI, Anthropic, or NVIDIA) to run LLM-powered analysis. This creates a barrier for developers who want to evaluate the tool, run it in airgapped environments, or avoid cloud API costs during development.
Ollama serves an OpenAI-compatible API on
localhost:11434and supports hundreds of models (llama, mistral, qwen, gemma, deepseek, phi, etc.) running entirely on local hardware. Adding an Ollama provider would make SkillSpector usable without any API key, internet connection, or cloud spend.Why This Matters
Scenario 1: First-time evaluation
A security engineer wants to evaluate SkillSpector before requesting API key budget approval. Today they can only run
--no-llmmode, which skips the semantic analyzers and meta-analyzer enrichment — the most valuable part of the tool. With an Ollama provider, they could run a full scan with a locally-hosted model in minutes.Scenario 2: Airgapped / restricted environments
Government and financial institutions often prohibit sending source code to external APIs. An Ollama provider enables full LLM analysis without any data leaving the machine.
Scenario 3: CI cost optimization
Running SkillSpector on every PR via cloud APIs costs money per scan. Self-hosted Ollama on a GPU runner eliminates per-scan API costs entirely.
Proposed Implementation
Ollama exposes an OpenAI-compatible endpoint at
http://localhost:11434/v1. The implementation follows the existing provider pattern:provider.py— usesChatOpenAIwith Ollama's base URL:model_registry.yaml— token budgets for common Ollama models:Selection via env var:
Registration in
providers/__init__.py:Related
OPENAI_BASE_URLworkaround. PR fix(schemas): normalize confidence from 0-100 scale before Pydantic validation #93 fixed a confidence-scale crash that surfaced in that configuration. A dedicated provider would formalize this path with proper env var naming, a model registry, and documentation.Structured Output Consideration
Ollama models may not support
response_format: {"type": "json_object"}consistently. The provider should work well with PR #71 (text-based structured output parsing) when that lands. For models that don't support structured output natively, the existing--no-llmfallback remains available.Scope
providers/ollama/subpackage (~80 lines)providers/__init__.py(~5 lines)ollamaas a valid provider.env.example