Överblick Security

Overview

Överblick is a security-first multi-identity agent framework. Every LLM interaction passes through a 6-layer fail-closed pipeline. This document describes the threat model, security guarantees, limitations, and reporting process.

Threat Model

Överblick is designed to protect against:

1. Prompt Injection & Jailbreaks

Attempts to override system instructions
Hidden commands in user input
Multi-message attacks
Persona hijacking

2. AI Language Leakage

LLM revealing it's an AI/model
Mentioning internal rules or safety guidelines
Breaking character/persona

3. Content Safety Violations

Harmful instructions (violence, self-harm)
Hate speech, discrimination
Dangerous technical instructions

4. Resource Exhaustion

Rate limit bypass attempts
Denial-of-service via LLM calls
Memory exhaustion through large inputs

5. Local Process Isolation Breaches

Unauthorized inter-process communication
Privilege escalation between identities
Secret leakage between identities

6. Configuration & Secret Exposure

Hardcoded credentials in plugins
Unencrypted secrets at rest
Permission bypass through misconfiguration

Security Architecture

Överblick implements defense-in-depth with 6 interlocking layers:

Layer 1: Input Sanitizer

Strips null bytes and control characters (except \n, \t, \r)
Unicode NFC normalization
Length truncation (10,000 characters)
Boundary markers wrap external content: <<<EXTERNAL_*_START>>> ... <<<EXTERNAL_*_END>>>
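
The four sanitizer steps above can be sketched with the standard library alone. This is a minimal illustration, not Överblick's implementation: the function names sanitize and wrap_external and the POST marker label are assumptions made for the example.

```python
import unicodedata

MAX_LENGTH = 10_000  # truncation limit stated above
ALLOWED_CONTROLS = {"\n", "\t", "\r"}


def sanitize(text: str, max_length: int = MAX_LENGTH) -> str:
    """Strip control characters, apply NFC normalization, then truncate."""
    # Category "Cc" covers null bytes and the other C0/C1 control characters.
    cleaned = "".join(
        ch for ch in text
        if ch in ALLOWED_CONTROLS or unicodedata.category(ch) != "Cc"
    )
    normalized = unicodedata.normalize("NFC", cleaned)
    return normalized[:max_length]


def wrap_external(content: str, source: str = "POST") -> str:
    """Wrap untrusted content in boundary markers (the label is hypothetical)."""
    label = source.upper()
    body = sanitize(content)
    return f"<<<EXTERNAL_{label}_START>>>\n{body}\n<<<EXTERNAL_{label}_END>>>"
```

A production sanitizer would presumably also need to neutralize marker-like strings inside the content itself, so that external text cannot forge an END marker; the sketch leaves that out.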

Layer 2: Preflight Checker (3 sublayers)

Fast pattern matching — 17 instant-block, 8 suspicion patterns
AI analysis — LLM-classifies suspicious messages (confidence ≥ 0.7)
User context tracking — Per-user suspicion scoring with temporary bans
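
To make the interplay between pattern matching and per-user suspicion scoring concrete, here is a rough sketch. The patterns, point values, ban threshold, and ban duration are all assumptions; the actual pattern lists and scoring rules are not given in this document, and the AI analysis sublayer is omitted.

```python
import re
import time
from collections import defaultdict

# Hypothetical patterns; the real lists (17 instant-block, 8 suspicion) are not shown here.
INSTANT_BLOCK = [re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)]
SUSPICIOUS = [re.compile(r"system prompt", re.IGNORECASE)]

BAN_THRESHOLD = 3    # assumed score at which a user is temporarily banned
BAN_SECONDS = 600.0  # assumed ban duration


class PreflightSketch:
    def __init__(self):
        self.scores = defaultdict(int)
        self.banned_until = {}

    def check(self, user_id, message, now=None):
        """Return "blocked", "suspicious", or "allowed" for one message."""
        now = time.monotonic() if now is None else now
        if self.banned_until.get(user_id, 0.0) > now:
            return "blocked"
        if any(p.search(message) for p in INSTANT_BLOCK):
            self._penalize(user_id, now, points=2)
            return "blocked"
        if any(p.search(message) for p in SUSPICIOUS):
            # In the real pipeline this is where the AI analysis would run.
            self._penalize(user_id, now, points=1)
            return "suspicious"
        return "allowed"

    def _penalize(self, user_id, now, points):
        self.scores[user_id] += points
        if self.scores[user_id] >= BAN_THRESHOLD:
            self.banned_until[user_id] = now + BAN_SECONDS
            self.scores[user_id] = 0
```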

Layer 3: Rate Limiter

Token bucket algorithm with LRU eviction
Per-key limits (e.g., "llm_pipeline:user123")
Configurable burst capacity and refill rate
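
A token bucket with LRU eviction can be sketched in a few lines. The class name, default values, and the choice of an OrderedDict are assumptions for illustration; only the algorithm itself comes from the list above.

```python
import time
from collections import OrderedDict


class TokenBucketLimiter:
    def __init__(self, capacity=5.0, refill_rate=1.0, max_keys=1000):
        self.capacity = capacity        # burst capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.max_keys = max_keys        # bound on tracked keys
        self._buckets = OrderedDict()   # key -> (tokens, last_seen)

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        # Unknown keys start with a full bucket.
        tokens, last = self._buckets.pop(key, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.refill_rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[key] = (tokens, now)  # re-insert as most recently used
        while len(self._buckets) > self.max_keys:
            self._buckets.popitem(last=False)  # evict least recently used key
        return allowed
```

One consequence of this design is that an evicted key comes back with a full bucket, so max_keys should be comfortably larger than the number of keys an attacker can cycle through.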

Layer 4: LLM Call

Actual model invocation
Fail-closed: Empty response or error → blocked result
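
Fail-closed behavior means that every failure path produces a blocked result rather than passing something through. A minimal sketch, assuming a hypothetical PipelineResult type and a model exposed as a plain callable:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:  # hypothetical result type for this sketch
    blocked: bool
    content: str = ""
    reason: str = ""


def call_fail_closed(llm: Callable[[str], str], prompt: str) -> PipelineResult:
    """Invoke the model; any error or empty response yields a blocked result."""
    try:
        response = llm(prompt)
    except Exception as exc:
        return PipelineResult(blocked=True, reason=f"llm_error: {type(exc).__name__}")
    if not response or not response.strip():
        return PipelineResult(blocked=True, reason="empty_response")
    return PipelineResult(blocked=False, content=response)
```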

Layer 5: Output Safety (4 sublayers)

AI language detection — Blocks "I am an AI", model names, etc.
Persona break detection — Blocks "I'm not {identity}", role-breaking statements
Banned slang replacement — Identity-specific word filtering with replacements
Blocked content — Harmful content patterns
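
The four output sublayers could be chained roughly as follows. The patterns, the function name check_output, and the return convention are assumptions; the real pattern sets are not listed in this document.

```python
import re

# Hypothetical patterns for illustration only.
AI_PATTERNS = [
    re.compile(r"\bI(?: am|'m) an AI\b", re.IGNORECASE),
    re.compile(r"\blanguage model\b", re.IGNORECASE),
]
BLOCKED_PATTERNS = [re.compile(r"\bEXAMPLE_BLOCKED_TERM\b")]


def check_output(text, identity, slang_replacements):
    """Return (safe_text, None), or (None, reason) if the output is blocked."""
    if any(p.search(text) for p in AI_PATTERNS):
        return None, "ai_language"
    persona_break = re.compile(
        rf"\bI(?: am|'m) not {re.escape(identity)}\b", re.IGNORECASE
    )
    if persona_break.search(text):
        return None, "persona_break"
    for banned, replacement in slang_replacements.items():
        pattern = rf"\b{re.escape(banned)}\b"
        # A function replacement keeps any backslashes in `replacement` literal.
        text = re.sub(pattern, lambda _m, r=replacement: r, text, flags=re.IGNORECASE)
    if any(p.search(text) for p in BLOCKED_PATTERNS):
        return None, "blocked_content"
    return text, None
```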

Layer 6: Audit Log

Structured SQLite audit trail
Every action logged with identity, plugin, success/failure
Automatic 90-day retention cleanup
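
A structured audit trail with retention cleanup might look like the sketch below. The table name, column names, and function names are assumptions; only the use of SQLite, the logged fields, and the 90-day retention come from the list above.

```python
import sqlite3
import time

RETENTION_SECONDS = 90 * 24 * 3600  # 90-day retention stated above


def open_audit_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS audit_log (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               ts REAL NOT NULL,
               identity TEXT NOT NULL,
               plugin TEXT NOT NULL,
               action TEXT NOT NULL,
               success INTEGER NOT NULL
           )"""
    )
    return conn


def log_action(conn, identity, plugin, action, success, ts=None):
    ts = time.time() if ts is None else ts
    conn.execute(
        "INSERT INTO audit_log (ts, identity, plugin, action, success) "
        "VALUES (?, ?, ?, ?, ?)",
        (ts, identity, plugin, action, int(success)),
    )
    conn.commit()


def cleanup(conn, now=None):
    """Delete entries older than the retention window; return rows removed."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "DELETE FROM audit_log WHERE ts < ?", (now - RETENTION_SECONDS,)
    )
    conn.commit()
    return cur.rowcount
```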

Additional Protections

IPC Security

Unix domain sockets (macOS/Linux) or TCP localhost (Windows)
HMAC-authenticated messages, verified in constant time with hmac.compare_digest
Message size limits (1MB)
Rate limiting on IPC senders
Token files with restrictive permissions (0600)
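
Message authentication with a size limit can be illustrated as follows. The frame layout (hex tag, newline, JSON body) and the choice of SHA-256 are assumptions made for this sketch; the document only states that messages are HMAC-authenticated, compared with hmac.compare_digest, and capped at 1 MB.

```python
import hashlib
import hmac
import json

MAX_MESSAGE_BYTES = 1024 * 1024  # 1 MB limit stated above


def sign_message(key: bytes, payload: dict) -> bytes:
    """Serialize the payload and prepend a hex HMAC tag (assumed framing)."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).hexdigest().encode("ascii")
    return tag + b"\n" + body


def verify_message(key: bytes, frame: bytes) -> dict:
    """Check size and signature before parsing; raise ValueError on failure."""
    if len(frame) > MAX_MESSAGE_BYTES:
        raise ValueError("message too large")
    tag, sep, body = frame.partition(b"\n")
    if not sep:
        raise ValueError("malformed frame")
    expected = hmac.new(key, body, hashlib.sha256).hexdigest().encode("ascii")
    # compare_digest avoids leaking the position of the first mismatch.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad signature")
    return json.loads(body)
```

Verifying the tag before parsing the body keeps unauthenticated input away from the JSON parser.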

Secrets Management

Fernet encryption at rest (AES-128-CBC with HMAC-SHA256 authentication)
Per-identity secret isolation
Master key stored in the macOS Keychain, with a 0600-permission file as fallback
ctx.get_secret("key") API — never hardcode credentials
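
The file-based fallback for the master key can be sketched with the standard library on POSIX systems. The function name and the refusal to read a group- or world-accessible key file are assumptions; Keychain access and the Fernet encryption itself are outside this example. The generated value has the format Fernet expects: 32 random bytes, URL-safe base64 encoded.

```python
import base64
import os
import stat


def load_or_create_master_key(path: str) -> bytes:
    """Return the key stored at `path`, creating it with mode 0600 if absent."""
    try:
        # O_EXCL makes creation atomic; the mode applies at creation time.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        mode = stat.S_IMODE(os.stat(path).st_mode)
        if mode & 0o077:
            raise PermissionError(
                f"{path} is accessible to group/others: {oct(mode)}"
            )
        with open(path, "rb") as fh:
            return fh.read()
    key = base64.urlsafe_b64encode(os.urandom(32))
    with os.fdopen(fd, "wb") as fh:
        fh.write(key)
    return key
```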

Permission System

Default-deny — actions must be explicitly permitted
Per-identity permission rules (YAML config)
Rate limits, cooldowns, boss-agent approval requirements
PermissionChecker.is_allowed("action") runtime checks
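
Default-deny with a cooldown can be sketched as below. The class is a hypothetical stand-in and not Överblick's PermissionChecker; the rule dictionary mimics what a per-identity YAML file might load into, and its keys are assumptions.

```python
import time


class DefaultDenyChecker:
    """Hypothetical stand-in for illustration; not the real PermissionChecker."""

    def __init__(self, rules):
        # rules: action -> {"allowed": bool, "cooldown_seconds": float}
        self.rules = rules
        self._last_used = {}

    def is_allowed(self, action, now=None):
        now = time.monotonic() if now is None else now
        rule = self.rules.get(action)
        if rule is None or not rule.get("allowed", False):
            return False  # default-deny: unlisted actions are refused
        cooldown = rule.get("cooldown_seconds", 0.0)
        last = self._last_used.get(action)
        if last is not None and now - last < cooldown:
            return False  # still cooling down since the last permitted use
        self._last_used[action] = now
        return True
```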

Centralized Policy Gate

PolicyGate — unified security decision point for all plugin requests
Combines permission checks, capability verification, preflight, output safety, and rate limiting
Ensures consistent security enforcement across all plugin operations
Used by SafeLLMPipeline and available to plugins via ctx.security.policy_gate

Plugin Isolation

Plugins only access framework via PluginContext
No direct filesystem/network access outside provided APIs
Per-identity data directory isolation
Capability system (beta) — fine-grained control via plugin capability bundles (RuntimeServices, SecurityServices, LLMServices, DataServices, IdentityServices, CommunicationServices)
Plugins declare REQUIRED_CAPABILITIES; users grant permissions in identity YAML under plugin_capabilities:

What Överblick Does NOT Protect Against

1. Malicious Plugins

Plugins run with the same permissions as the identity
A malicious plugin can bypass all security layers
Mitigation: Only run plugins from trusted sources, review plugin code

2. Physical Access Attacks

Anyone with physical access to the host can read/write files
Secrets are encrypted at rest but decrypted in memory
Mitigation: Use full-disk encryption, secure host access

3. Network Attacks (when plugins expose network services)

Plugins that open network ports are outside the security boundary
No built-in firewall or network isolation
Mitigation: Run behind firewall, use VPN for remote access

4. Supply Chain Attacks

Compromised Python packages in dependencies
Malicious LLM models (for example, Ollama pulls from untrusted registries)
Mitigation: Pin dependencies, verify checksums, use trusted model sources

5. Social Engineering

Convincing an identity to reveal secrets via legitimate conversation
Social engineering the human operator
Mitigation: Operator education, secret redaction in logs

6. Advanced Adversarial ML Attacks

Sophisticated prompt injections that evade pattern matching
AI-generated jailbreaks that pass preflight checks
Mitigation: Regular updates to preflight patterns, community testing

7. Side-Channel Attacks

Timing attacks on HMAC comparison (mitigated by hmac.compare_digest)
Memory inspection via /proc or debugging tools
Mitigation: Secure host configuration, disable debug interfaces

Safe by Default Configuration

Överblick enables strict mode by default (since version 0.1.0-beta):

SafeLLMPipeline(strict=True) requires all security components
PolicyGate: centralized security gate used by SafeLLMPipeline and plugins for unified permission and capability checks
Missing preflight checker, output safety, or rate limiter raises ConfigError
Opt-out: Set environment variable OVERBLICK_SAFE_MODE=0
Beta testers: We recommend keeping strict mode enabled

Centralized security settings: All security-related environment variables are defined in overblick/core/security/settings.py. This module provides consistent defaults, boolean parsing, and helper functions (safe_mode(), raw_llm(), strict_capabilities()). Plugins should import from this module rather than reading environment variables directly.
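
The boolean parsing and helper functions of the settings module described above might be shaped like this. The helper names and environment variable names come from this document; the function bodies, the env_bool helper, the accepted spellings, and the optional environ parameter are assumptions.

```python
import os

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def env_bool(name, default, environ=None):
    """Parse a boolean environment variable, falling back to `default`."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    return default  # unrecognized values keep the documented default


def safe_mode(environ=None):
    return env_bool("OVERBLICK_SAFE_MODE", True, environ)


def raw_llm(environ=None):
    return env_bool("OVERBLICK_RAW_LLM", False, environ)


def strict_capabilities(environ=None):
    return env_bool("OVERBLICK_STRICT_CAPABILITIES", False, environ)
```

Falling back to the default on unrecognized values keeps a typo such as OVERBLICK_SAFE_MODE=flase from silently disabling strict mode.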

Skip flags are dangerous: skip_preflight=True and skip_output_safety=True bypass critical security layers. Use them only for:

Internal system prompts (supervisor handlers)
Code analysis where injection risk is minimal

Never expose skip flags to untrusted input paths.

Raw LLM client protection: Plugin access to the raw LLM client (ctx.llm_client) is disabled by default.

Default: OVERBLICK_RAW_LLM=0 (raises RuntimeError when accessed)
Migration: OVERBLICK_RAW_LLM=1 allows raw access (not recommended for production)
Use ctx.llm_pipeline for secure LLM calls with full security chain
ResponseGenerator requires llm_pipeline or explicit allow_raw_fallback=True
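
Guarding the raw client behind an environment flag could be implemented with a property, roughly as sketched here. The SketchContext class is hypothetical and much simpler than the real PluginContext; only the attribute name llm_client, the variable OVERBLICK_RAW_LLM, and the RuntimeError come from this document.

```python
import os


class SketchContext:
    """Hypothetical, simplified context object for illustration."""

    def __init__(self, client, environ=None):
        self._client = client
        self._environ = os.environ if environ is None else environ

    @property
    def llm_client(self):
        # Only an explicit "1" enables raw access; anything else stays closed.
        if self._environ.get("OVERBLICK_RAW_LLM", "0") != "1":
            raise RuntimeError(
                "raw LLM access is disabled; use ctx.llm_pipeline instead"
            )
        return self._client
```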

Strict capability enforcement: The capability system only warns by default.

Default: OVERBLICK_STRICT_CAPABILITIES=0 (warnings only)
Strict: OVERBLICK_STRICT_CAPABILITIES=1 (raises PermissionError for missing grants)
Configure grants in identity YAML under plugin_capabilities: section
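
The difference between the two modes can be shown with a small check of declared against granted capabilities. The function name and the use of capability bundle names as grant identifiers are assumptions; the warn-by-default behavior and the PermissionError in strict mode come from the list above.

```python
import warnings


def check_capabilities(required, granted, strict=False):
    """Return the missing capabilities; warn, or raise when strict is set."""
    missing = sorted(set(required) - set(granted))
    if not missing:
        return []
    message = f"missing capability grants: {', '.join(missing)}"
    if strict:
        raise PermissionError(message)
    warnings.warn(message, stacklevel=2)
    return missing
```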

Beta Testing Security Guidelines

If you're testing Överblick in external beta:

Start with strict mode enabled (OVERBLICK_SAFE_MODE=1, default)
Review plugin permissions before enabling new plugins
Monitor audit logs for blocked attempts and false positives
Test with non-privileged identities first (no email/SMS/API access)
Use isolated test environments — not production credentials

Reporting Security Vulnerabilities

Please do NOT open public GitHub issues for security vulnerabilities.

Responsible Disclosure Process

Email: security@overblick.ai (PGP key available on request)
Include:
- Description of the vulnerability
- Steps to reproduce
- Impact assessment
- Suggested fix (if any)

Response Commitment

Acknowledgement: Within 48 hours
Assessment: Within 7 days
Fix timeline: Depends on severity (critical: <72 hours, high: <1 week)
Disclosure: Coordinated after fix is released

Scope

In-scope:

Security bypass in SafeLLMPipeline
Permission escalation
Secret leakage
Remote code execution via plugin API

Out-of-scope:

Theoretical attacks without proof-of-concept
Social engineering
Physical access attacks
Issues in dependencies (report upstream)

Security Updates

Subscribe to GitHub Releases for security updates
Critical security fixes will be marked with [SECURITY] in release notes
Consider enabling Dependabot for dependency updates
