Överblick is a security-first multi-identity agent framework. Every LLM interaction passes through a 6-layer fail-closed pipeline. This document describes the threat model, security guarantees, limitations, and reporting process.
Överblick is designed to protect against:
- Attempts to override system instructions
- Hidden commands in user input
- Multi-message attacks
- Persona hijacking
- LLM revealing it's an AI/model
- Mentioning internal rules or safety guidelines
- Breaking character/persona
- Harmful instructions (violence, self-harm)
- Hate speech, discrimination
- Dangerous technical instructions
- Rate limit bypass attempts
- Denial-of-service via LLM calls
- Memory exhaustion through large inputs
- Unauthorized inter-process communication
- Privilege escalation between identities
- Secret leakage between identities
- Hardcoded credentials in plugins
- Unencrypted secrets at rest
- Permission bypass through misconfiguration
Överblick implements defense-in-depth with 6 interlocking layers:
- Strips null bytes, control characters (except `\n`, `\t`, `\r`)
- Unicode NFC normalization
- Length truncation (10,000 characters)
- Boundary markers wrap external content: `<<<EXTERNAL_*_START>>> ... <<<EXTERNAL_*_END>>>`
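The sanitization steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the framework's actual implementation: the function names and the `label` parameter are assumptions, and only the 10,000-character limit and the marker format come from the list above.

```python
import unicodedata

MAX_LENGTH = 10_000  # truncation limit stated in the list above
ALLOWED_CONTROL = {"\n", "\t", "\r"}


def sanitize(text: str, max_length: int = MAX_LENGTH) -> str:
    """Strip control characters, normalize to NFC, then truncate."""
    # Unicode category "Cc" covers null bytes and other control characters.
    cleaned = "".join(
        ch for ch in text
        if ch in ALLOWED_CONTROL or unicodedata.category(ch) != "Cc"
    )
    normalized = unicodedata.normalize("NFC", cleaned)
    return normalized[:max_length]


def wrap_external(text: str, label: str = "CONTENT") -> str:
    """Wrap untrusted content in boundary markers (the label is hypothetical)."""
    body = sanitize(text)
    return f"<<<EXTERNAL_{label}_START>>>\n{body}\n<<<EXTERNAL_{label}_END>>>"
```

Normalizing before truncating keeps the length limit meaningful, since NFC can change the number of code points in a string.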
- Fast pattern matching — 17 instant-block, 8 suspicion patterns
- AI analysis — LLM-classifies suspicious messages (confidence ≥ 0.7)
- User context tracking — Per-user suspicion scoring with temporary bans
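A rough sketch of how the fast pattern stage and per-user suspicion scoring could fit together is shown below. The patterns, the ban threshold, and the return values are all assumptions made for illustration; the real block and suspicion patterns are not listed in this document, and ban expiry is omitted for brevity.

```python
import re
from collections import defaultdict

# Hypothetical patterns; the actual 17 block and 8 suspicion patterns are not shown here.
BLOCK_PATTERNS = [re.compile(r"ignore (all )?previous instructions", re.I)]
SUSPICION_PATTERNS = [re.compile(r"\bsystem prompt\b", re.I)]
BAN_THRESHOLD = 3  # assumed value

suspicion_scores = defaultdict(int)  # user id -> accumulated suspicion


def preflight(user_id: str, message: str) -> str:
    """Return "block", "review", or "allow" for a message."""
    if suspicion_scores[user_id] >= BAN_THRESHOLD:
        return "block"  # user is banned (expiry of the ban is not modeled)
    if any(p.search(message) for p in BLOCK_PATTERNS):
        suspicion_scores[user_id] += 1
        return "block"
    if any(p.search(message) for p in SUSPICION_PATTERNS):
        suspicion_scores[user_id] += 1
        return "review"  # would be escalated to the AI classifier
    return "allow"
```

Only messages that land in the "review" bucket would need the slower LLM classification step.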
- Token bucket algorithm with LRU eviction
- Per-key limits (e.g., `"llm_pipeline:user123"`)
- Configurable burst capacity and refill rate
- Actual model invocation
- Fail-closed: Empty response or error → blocked result
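The rate limiting and fail-closed invocation steps above can be sketched together. The class and function names below are hypothetical, and the sketch assumes a simple callable stands in for the model client; it illustrates the token bucket and fail-closed ideas rather than the framework's own code.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class TokenBucketLimiter:
    """Per-key token buckets with LRU eviction of idle keys."""

    def __init__(self, capacity: float, refill_rate: float, max_keys: int = 1000):
        self.capacity = capacity        # burst capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.max_keys = max_keys
        self._buckets = OrderedDict()   # key -> (tokens, last_seen)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        tokens, last = self._buckets.pop(key, (self.capacity, now))
        # Refill in proportion to elapsed time, capped at the burst capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_rate)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self._buckets[key] = (tokens, now)  # re-insert as most recently used
        if len(self._buckets) > self.max_keys:
            self._buckets.popitem(last=False)  # evict the least recently used key
        return allowed


def guarded_call(limiter: TokenBucketLimiter, key: str,
                 invoke: Callable[[str], str], prompt: str) -> Optional[str]:
    """Return the model response, or None whenever the call is blocked."""
    if not limiter.allow(key):
        return None  # rate limited
    try:
        response = invoke(prompt)
    except Exception:
        return None  # fail closed on any error
    if not response or not response.strip():
        return None  # fail closed on empty output
    return response
```

Treating errors and empty output the same way as an explicit block means a failing model can never produce an unchecked response.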
- AI language detection — Blocks "I am an AI", model names, etc.
- Persona break detection — Blocks "I'm not {identity}", role-breaking statements
- Banned slang replacement — Identity-specific word filtering with replacements
- Blocked content — Harmful content patterns
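The output checks above could look roughly like the following. The regular expressions and the slang table are invented for illustration, and the function name is an assumption; the sketch only shows the distinction between blocking a response and rewriting words in it.

```python
import re
from typing import Tuple

# Hypothetical patterns and replacements, for illustration only.
AI_LANGUAGE = re.compile(r"\b(i am|i'm) an? (ai|language model)\b", re.I)
BANNED_SLANG = {"lol": "haha"}


def check_output(text: str, identity: str) -> Tuple[bool, str]:
    """Return (blocked, text); banned words are replaced rather than blocked."""
    persona_break = re.compile(rf"\bi'?m not {re.escape(identity)}\b", re.I)
    if AI_LANGUAGE.search(text) or persona_break.search(text):
        return True, ""
    for word, replacement in BANNED_SLANG.items():
        text = re.sub(rf"\b{re.escape(word)}\b", replacement, text, flags=re.I)
    return False, text
```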
- Structured SQLite audit trail
- Every action logged with identity, plugin, success/failure
- Automatic 90-day retention cleanup
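A structured audit trail with retention cleanup can be sketched with the standard `sqlite3` module. The table layout and function names are assumptions; only the logged fields (identity, plugin, success/failure) and the 90-day retention window come from the list above.

```python
import sqlite3
import time

RETENTION_SECONDS = 90 * 24 * 3600  # 90-day retention stated above


def open_audit_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the audit database; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS audit (
               ts REAL NOT NULL,
               identity TEXT NOT NULL,
               plugin TEXT NOT NULL,
               action TEXT NOT NULL,
               success INTEGER NOT NULL
           )"""
    )
    return conn


def log_action(conn, identity, plugin, action, success, ts=None):
    """Record one action with its outcome."""
    ts = time.time() if ts is None else ts
    conn.execute(
        "INSERT INTO audit VALUES (?, ?, ?, ?, ?)",
        (ts, identity, plugin, action, int(success)),
    )
    conn.commit()


def cleanup(conn, now=None) -> int:
    """Delete entries older than the retention window; return how many were removed."""
    now = time.time() if now is None else now
    cur = conn.execute("DELETE FROM audit WHERE ts < ?", (now - RETENTION_SECONDS,))
    conn.commit()
    return cur.rowcount
```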
- Unix domain sockets (macOS/Linux) or TCP localhost (Windows)
- HMAC-authenticated messages with `hmac.compare_digest`
- Message size limits (1MB)
- Rate limiting on IPC senders
- Token files with restrictive permissions (0600)
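Message authentication of the kind described above can be illustrated as follows. The framing (a tag prepended to the payload) and the choice of SHA-256 are assumptions, since this document does not specify the wire format; `hmac.compare_digest` and the 1 MB limit are taken from the list.

```python
import hashlib
import hmac

MAX_MESSAGE_BYTES = 1024 * 1024  # 1 MB limit stated above
TAG_LEN = hashlib.sha256().digest_size  # SHA-256 is an assumed choice


def sign(key: bytes, payload: bytes) -> bytes:
    """Prepend an HMAC tag to the payload (framing is hypothetical)."""
    if len(payload) > MAX_MESSAGE_BYTES:
        raise ValueError("message too large")
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def verify(key: bytes, message: bytes) -> bytes:
    """Check size and tag, then return the payload; raise ValueError otherwise."""
    if len(message) < TAG_LEN or len(message) > TAG_LEN + MAX_MESSAGE_BYTES:
        raise ValueError("bad message size")
    tag, payload = message[:TAG_LEN], message[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many tag bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return payload
```

Checking the size before computing the HMAC keeps oversized messages from consuming CPU time.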
- Fernet encryption at rest (AES-128-CBC with HMAC-SHA256 authentication)
- Per-identity secret isolation
- Master key in macOS Keychain (or file with 0600 fallback)
- `ctx.get_secret("key")` API: never hardcode credentials
- Default-deny — actions must be explicitly permitted
- Per-identity permission rules (YAML config)
- Rate limits, cooldowns, boss-agent approval requirements
- `PermissionChecker.is_allowed("action")` runtime checks
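The default-deny behavior can be illustrated with a small stand-in class. `DefaultDenyChecker` and its rule format are hypothetical; the sketch mirrors only the `is_allowed("action")` call shape named above and does not reproduce the real `PermissionChecker`.

```python
class DefaultDenyChecker:
    """Default-deny permission checker; the rule format is an assumption."""

    def __init__(self, rules: dict):
        # In the framework these rules would come from the identity's YAML config.
        self._rules = rules

    def is_allowed(self, action: str) -> bool:
        rule = self._rules.get(action)
        # Anything not explicitly marked as allowed is denied.
        return bool(rule) and rule.get("allowed") is True


checker = DefaultDenyChecker({
    "send_email": {"allowed": True},
    "post_comment": {"allowed": False},
})
```

An action that is missing from the rules is treated exactly like one that is explicitly denied, which is what makes a misconfigured identity fail safe.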
- PolicyGate — unified security decision point for all plugin requests
- Combines permission checks, capability verification, preflight, output safety, and rate limiting
- Ensures consistent security enforcement across all plugin operations
- Used by `SafeLLMPipeline` and available to plugins via `ctx.security.policy_gate`
- Plugins only access framework via `PluginContext`
- No direct filesystem/network access outside provided APIs
- Per-identity data directory isolation
- Capability system (beta) — fine-grained control via plugin capability bundles (RuntimeServices, SecurityServices, LLMServices, DataServices, IdentityServices, CommunicationServices)
- Plugins declare `REQUIRED_CAPABILITIES`; users grant permissions in identity YAML under `plugin_capabilities:`
- Plugins run with the same permissions as the identity
- A malicious plugin can bypass all security layers
- Mitigation: Only run plugins from trusted sources, review plugin code
- Anyone with physical access to the host can read/write files
- Secrets are encrypted at rest but decrypted in memory
- Mitigation: Use full-disk encryption, secure host access
- Plugins that open network ports are outside the security boundary
- No built-in firewall or network isolation
- Mitigation: Run behind firewall, use VPN for remote access
- Compromised Python packages in dependencies
- Malicious LLM models (Ollama pulls from untrusted registries)
- Mitigation: Pin dependencies, verify checksums, use trusted model sources
- Convincing an identity to reveal secrets via legitimate conversation
- Social engineering the human operator
- Mitigation: Operator education, secret redaction in logs
- Sophisticated prompt injections that evade pattern matching
- AI-generated jailbreaks that pass preflight checks
- Mitigation: Regular updates to preflight patterns, community testing
- Timing attacks on HMAC comparison (mitigated by `hmac.compare_digest`)
- Memory inspection via `/proc` or debugging tools
- Mitigation: Secure host configuration, disable debug interfaces
Överblick now enables strict mode by default (since version 0.1.0-beta):
- `SafeLLMPipeline(strict=True)` requires all security components
- `PolicyGate`: centralized security gate used by `SafeLLMPipeline` and plugins for unified permission and capability checks
- Missing preflight checker, output safety, or rate limiter raises `ConfigError`
- Opt-out: Set environment variable `OVERBLICK_SAFE_MODE=0`
- Beta testers: We recommend keeping strict mode enabled
Centralized security settings: All security-related environment variables are defined in `overblick/core/security/settings.py`. This module provides consistent defaults, boolean parsing, and helper functions (`safe_mode()`, `raw_llm()`, `strict_capabilities()`). Plugins should import from this module rather than reading environment variables directly.
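To show what centralized boolean parsing of these environment variables might look like, here is a minimal sketch. The helper names match those mentioned for the settings module, but the bodies, the `_env_bool` helper, and the accepted spellings of true and false are assumptions, not the module's actual code.

```python
import os

_TRUE = {"1", "true", "yes", "on"}    # assumed accepted spellings
_FALSE = {"0", "false", "no", "off"}


def _env_bool(name: str, default: bool) -> bool:
    """Parse a boolean environment variable, falling back to the default."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    return default  # unrecognized values keep the documented default


def safe_mode() -> bool:
    return _env_bool("OVERBLICK_SAFE_MODE", True)  # strict mode is on by default


def raw_llm() -> bool:
    return _env_bool("OVERBLICK_RAW_LLM", False)  # raw client access is off by default


def strict_capabilities() -> bool:
    return _env_bool("OVERBLICK_STRICT_CAPABILITIES", False)  # warnings only by default
```

Keeping the parsing in one place means a typo such as `OVERBLICK_SAFE_MODE=flase` resolves to the safe default in every plugin, rather than being interpreted differently by each reader.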
Skip flags are dangerous: skip_preflight=True and skip_output_safety=True bypass critical security layers. Use only for:
- Internal system prompts (supervisor handlers)
- Code analysis where injection risk is minimal
- Never expose skip flags to untrusted input paths
Raw LLM client protection: Plugin access to raw LLM client (ctx.llm_client) is disabled by default.
- Default: `OVERBLICK_RAW_LLM=0` (raises RuntimeError when accessed)
- Migration: `OVERBLICK_RAW_LLM=1` allows raw access (not recommended for production)
- Use `ctx.llm_pipeline` for secure LLM calls with full security chain
- `ResponseGenerator` requires `llm_pipeline` or explicit `allow_raw_fallback=True`
Strict capability enforcement: Capability system warns by default.
- Default: `OVERBLICK_STRICT_CAPABILITIES=0` (warnings only)
- Strict: `OVERBLICK_STRICT_CAPABILITIES=1` (raises PermissionError for missing grants)
- Configure grants in identity YAML under `plugin_capabilities:` section
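The difference between the warning and strict modes can be sketched as a single check. The function name and its arguments are hypothetical; the sketch assumes a plugin's required capabilities and the identity's grants are available as plain lists of capability names.

```python
import warnings


def check_capabilities(required, granted, strict=False):
    """Return missing capabilities; warn by default, raise when strict."""
    missing = sorted(set(required) - set(granted))
    if missing:
        message = f"missing capability grants: {', '.join(missing)}"
        if strict:
            raise PermissionError(message)
        warnings.warn(message)
    return missing
```

In warning mode the plugin still loads, so operators can collect the list of missing grants from logs before switching to strict enforcement.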
If you're testing Överblick in external beta:
- Start with strict mode enabled (`OVERBLICK_SAFE_MODE=1`, default)
- Review plugin permissions before enabling new plugins
- Monitor audit logs for blocked attempts and false positives
- Test with non-privileged identities first (no email/SMS/API access)
- Use isolated test environments — not production credentials
Please do NOT open public GitHub issues for security vulnerabilities.
- Email: security@overblick.ai (PGP key available on request)
- Include:
- Description of the vulnerability
- Steps to reproduce
- Impact assessment
- Suggested fix (if any)
- Acknowledgement: Within 48 hours
- Assessment: Within 7 days
- Fix timeline: Depends on severity (critical: <72 hours, high: <1 week)
- Disclosure: Coordinated after fix is released
In-scope:
- Security bypass in SafeLLMPipeline
- Permission escalation
- Secret leakage
- Remote code execution via plugin API
Out-of-scope:
- Theoretical attacks without proof-of-concept
- Social engineering
- Physical access attacks
- Issues in dependencies (report upstream)
- Subscribe to GitHub Releases for security updates
- Critical security fixes will be marked with `[SECURITY]` in release notes
- Consider enabling Dependabot for dependency updates
- ARCHITECTURE.md — Detailed security architecture
- overblick/core/security/README.md — Security module documentation
- AGENTS.md — Framework overview and agent guidelines
Last updated: March 2025
Överblick Security Team