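Kagura Memory Cloud uses a layered, horizontally scalable architecture built for production use. Client applications authenticate and then reach either the MCP server or the REST API. Both entry points delegate to a service layer, a repository layer and the storage backends shown below.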
┌─────────────────────────────────────────────────────────────┐
│ Client Applications │
│ (Claude Desktop, Claude Code, ChatGPT, Web UI, Custom) │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Authentication Layer │
│ OAuth2 (Google, GitHub) │ API Keys │ JWT Tokens │
└─────────────────────────────────────────────────────────────┘
↓
┌──────────────────────┬──────────────────────────────────────┐
│ MCP Server (SSE) │ REST API (FastAPI) │
│ - 21 MCP Tools │ - Memory CRUD │
│ (memory / ctx / │ - OAuth2 endpoints │
│ edge / search / │ - API Key management │
│ usage / sleep) │ - Admin: sleep-reports, neural cfg │
│ - Session Mgmt │ │
│ - JSON-RPC │ │
└──────────────────────┴──────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Service Layer │
│ MemoryService │ SearchService │ EmbeddingService │
│ GraphService │ NeuralMemoryEngine │ AuthService │
│ SleepService │ LLMService │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Repository Layer │
│ MemoryRepository │ GraphRepository │ UserRepository │
└─────────────────────────────────────────────────────────────┘
↓
┌──────────────┬──────────────┬──────────────┬───────────────┐
│ PostgreSQL │ Qdrant │ Redis │ External API │
│ - Memories │ - Vectors │ - Sessions │ - OpenAI │
│ - Users │ - Full-text │ - Cache │ - Cohere │
│ - Graph │ (1u=1coll) │ - Rate Lmt │ │
└──────────────┴──────────────┴──────────────┴───────────────┘
Memories are stored in three layers:
Layer 1 (summary):
- Purpose: Quick search and retrieval
- Content: Concise summary (10-500 characters)
- Storage: PostgreSQL + Qdrant (vector)
- Use case: Initial search results
Layer 2 (context):
- Purpose: Understanding context
- Content: Medium-length explanation (max 2000 characters)
- Storage: PostgreSQL
- Use case: Showing search result previews
Layer 3 (full details):
- Purpose: Complete information
- Content: Full content + structured metadata (JSON)
- Storage: PostgreSQL
- Use case: Detailed view after selection
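A minimal sketch of the three layers as a single record, assuming a Python dataclass. The class name `LayeredMemory`, the field names `summary`/`context`/`details` and the `view()` helper are hypothetical and are not the service's actual model. The sketch enforces the documented length limits and returns more data as the requested level rises.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class LayeredMemory:
    """Hypothetical container for the three memory layers."""

    summary: str  # Layer 1: 10-500 chars, the part that gets embedded
    context: str = ""  # Layer 2: at most 2000 chars
    details: dict[str, Any] = field(default_factory=dict)  # Layer 3: full content + metadata

    def __post_init__(self) -> None:
        # Enforce the documented size limits for layers 1 and 2.
        if not 10 <= len(self.summary) <= 500:
            raise ValueError("summary must be 10-500 characters")
        if len(self.context) > 2000:
            raise ValueError("context must be at most 2000 characters")

    def view(self, level: int) -> dict[str, Any]:
        """1 = search hit, 2 = preview, 3 = detailed view."""
        if level not in (1, 2, 3):
            raise ValueError("level must be 1, 2 or 3")
        out: dict[str, Any] = {"summary": self.summary}
        if level >= 2:
            out["context"] = self.context
        if level == 3:
            out["details"] = self.details
        return out
```

Search results would only need `view(1)`. Previews and detail pages load the heavier layers on demand.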
User Query
↓
┌─────────────────────────────┐
│ Query Processing │
│ - Tokenization │
│ - OpenAI Embedding │
└─────────────────────────────┘
↓
┌──────────────┬──────────────┐
│ Semantic │ BM25 │
│ (Vector) │ (Keyword) │
│ 60% weight │ 40% weight │
└──────────────┴──────────────┘
↓ ↓
Results A Results B
└───────┬───────┘
↓
┌───────────────┐
│ Fusion │
│ (Weighted) │
└───────────────┘
↓
┌───────────────┐
│ Cohere │
│ Reranking │
└───────────────┘
↓
Final Results
- Semantic Search: 60% (better for meaning)
- BM25 Full-text: 40% (better for exact keywords)
- Reranking: Cohere rerank-multilingual-v3.0
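The weighted fusion step can be sketched as follows. The 60/40 weights come from the list above. Two details are assumptions: each result list is min-max normalized, and a document missing from a list scores 0 there. These are needed because raw cosine and BM25 scores live on different scales. Cohere reranking would run afterwards on the fused list and is not shown.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores to [0, 1] so vector and BM25 scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_fusion(
    semantic: dict[str, float],
    bm25: dict[str, float],
    w_semantic: float = 0.6,
    w_bm25: float = 0.4,
) -> list[tuple[str, float]]:
    """Blend normalized scores; a document absent from one list gets 0 there."""
    sem, kw = min_max(semantic), min_max(bm25)
    fused = {
        doc: w_semantic * sem.get(doc, 0.0) + w_bm25 * kw.get(doc, 0.0)
        for doc in sem.keys() | kw.keys()
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Usage: "b" is found by both retrievers, "a" only by vector search.
ranked = weighted_fusion({"a": 0.9, "b": 0.7, "c": 0.2}, {"b": 12.0, "d": 3.0})
```

Here `b` ranks first with about 0.83, ahead of `a` at 0.60, even though `a` was the top semantic hit. This is the behavior hybrid search is meant to reward.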
Relationships between memories are learned automatically from co-activation (a Hebbian-style update):
# When memories A and B are accessed together
weight(A, B) += learning_rate * activation(A) * activation(B)
Features:
- Decays over time (forgetting curve)
- Strengthens with repeated co-access
- Creates knowledge graph automatically
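The following is a small sketch of the co-activation rule combined with forgetting. The update line follows the formula above. Several parts are assumptions for illustration: the cap at 1.0, the exponential half-life decay, the default learning rate, and the class and method names.

```python
import math
from collections import defaultdict


class HebbianGraph:
    """Hypothetical co-activation graph with undirected, decaying edges."""

    def __init__(self, learning_rate: float = 0.1, half_life_days: float = 30.0) -> None:
        self.learning_rate = learning_rate
        # Assumed exponential forgetting curve: weights halve every half_life_days.
        self.decay_per_day = math.log(2) / half_life_days
        self.weights: dict[tuple[str, str], float] = defaultdict(float)

    @staticmethod
    def _key(a: str, b: str) -> tuple[str, str]:
        return (a, b) if a <= b else (b, a)  # same edge for (A, B) and (B, A)

    def co_activate(self, a: str, b: str, act_a: float = 1.0, act_b: float = 1.0) -> float:
        key = self._key(a, b)
        # weight(A, B) += learning_rate * activation(A) * activation(B), capped at 1.0
        self.weights[key] = min(1.0, self.weights[key] + self.learning_rate * act_a * act_b)
        return self.weights[key]

    def decay(self, days: float) -> None:
        factor = math.exp(-self.decay_per_day * days)
        for key in self.weights:
            self.weights[key] *= factor

    def weight(self, a: str, b: str) -> float:
        return self.weights.get(self._key(a, b), 0.0)
```

Repeated co-access pushes an edge toward 1.0, while idle edges fade. Edges above a threshold then form the knowledge graph.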
Graph-based exploration from seed memory:
Seed Memory
↓
[depth=1]
├── Related Memory 1 (weight=0.9)
├── Related Memory 2 (weight=0.7)
└── Related Memory 3 (weight=0.5)
↓
[depth=2]
├── Sub-related 1 (weight=0.8)
└── Sub-related 2 (weight=0.6)
Parameters:
- depth: Max hops (default: 2, max: 5)
- min_weight: Minimum edge weight (default: 0.5)
- relation_types: Filter by relation types
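The exploration can be sketched as a breadth-first walk that honors the three parameters. Several elements are hypothetical: the adjacency-list format of (neighbor, weight, relation type) tuples, the function name `explore`, and relation names such as `"related"`. Only the defaults and the depth cap of 5 come from the text above.

```python
from collections import deque
from typing import Optional


def explore(
    graph: dict[str, list[tuple[str, float, str]]],
    seed: str,
    depth: int = 2,
    min_weight: float = 0.5,
    relation_types: Optional[set[str]] = None,
) -> dict[str, int]:
    """Breadth-first walk from seed; returns {memory_id: hop distance}."""
    depth = min(depth, 5)  # documented maximum
    visited = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if visited[node] == depth:
            continue  # do not expand past the hop limit
        for neighbor, weight, relation in graph.get(node, []):
            if weight < min_weight:
                continue  # edge too weak
            if relation_types is not None and relation not in relation_types:
                continue  # filtered-out relation type
            if neighbor not in visited:
                visited[neighbor] = visited[node] + 1
                queue.append(neighbor)
    del visited[seed]
    return visited
```

Breadth-first order guarantees that each memory is reported at its shortest hop distance from the seed.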
Combines multiple signals:
score = (
0.4 * semantic_score +
0.3 * graph_score +
0.2 * temporal_score +
0.1 * trust_score
)
PostgreSQL tables:
- users - User accounts
- api_keys - API key management (SHA256 hashed)
- external_api_keys - OpenAI/Cohere keys (Fernet encrypted)
- memories - 3-layer memory storage
- graph_memory - Neural memory relationships (NetworkX JSON)
- oauth_clients - OAuth2 client applications
- oauth_authorization_codes - OAuth2 auth codes
- oauth_tokens - OAuth2 access/refresh tokens
Qdrant design: 1 user = 1 collection
- Collection name: kagura_user_{user_id}
- Vector size: 512 (OpenAI text-embedding-3-small, reduced from its default 1536 dimensions)
- Distance metric: Cosine
- Tokenizer: Multilingual (auto Japanese support)
Features:
- Semantic vector search
- Full-text BM25 search (MatchText)
- Metadata filtering
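The sketch below illustrates the one-collection-per-user design and cosine ranking without a running Qdrant instance. Each user's vectors live in a separate in-memory collection named with the kagura_user_{user_id} pattern. It is a stand-in and does not use the qdrant-client API. In Qdrant itself, the collection would be created with a 512-dimension, cosine-distance vector configuration. BM25 and metadata filtering are not modeled here.

```python
import math
from collections import defaultdict


def collection_name(user_id: str) -> str:
    return f"kagura_user_{user_id}"


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class PerUserVectorStore:
    """In-memory stand-in for Qdrant: one collection per user, cosine ranking."""

    def __init__(self, dim: int = 512) -> None:
        self.dim = dim
        self.collections: dict[str, dict[str, list[float]]] = defaultdict(dict)

    def upsert(self, user_id: str, memory_id: str, vector: list[float]) -> None:
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim}-dim vector, got {len(vector)}")
        self.collections[collection_name(user_id)][memory_id] = vector

    def search(self, user_id: str, query: list[float], limit: int = 5) -> list[tuple[str, float]]:
        # Only the caller's own collection is scanned, so users are isolated.
        points = self.collections.get(collection_name(user_id), {})
        scored = [(mid, cosine(query, vec)) for mid, vec in points.items()]
        return sorted(scored, key=lambda x: x[1], reverse=True)[:limit]
```

Keeping one collection per user makes tenant isolation structural rather than dependent on a payload filter being applied correctly.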
Redis:
- Sessions: Session-based authentication (7-day TTL)
- Cache: Embedding cache, search results
- Rate Limiting: Per-key request counters
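Per-key request counters are commonly implemented as a fixed-window counter. In Redis, that means INCR on a per-window key plus EXPIRE. The sketch below simulates this with a dict. The limit of 60 requests per 60-second window and the class name are assumptions, not documented values. The injectable clock makes the sketch testable.

```python
import time
from typing import Callable


class FixedWindowRateLimiter:
    """Per-key fixed-window counter; a dict stands in for Redis INCR + EXPIRE."""

    def __init__(self, limit: int = 60, window_seconds: int = 60,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        # api_key_id -> (window index, request count in that window)
        self.counters: dict[str, tuple[int, int]] = {}

    def allow(self, api_key_id: str) -> bool:
        window_start = int(self.clock() // self.window)
        start, count = self.counters.get(api_key_id, (window_start, 0))
        if start != window_start:  # new window: reset, like an expired Redis key
            start, count = window_start, 0
        count += 1
        self.counters[api_key_id] = (start, count)
        return count <= self.limit
```

A fixed window is the simplest option. It can allow up to twice the limit across a window boundary, which sliding-window variants avoid.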
User → OAuth2 Login → Session Cookie → Access Token
↓
API Key Creation
↓
External API Access
User roles:
- Admin: Full access (user management, system config)
- User: Standard access (own memories, API keys)
- Read-only: View-only access
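One way to express the three roles is a role-to-permission table combined with an ownership check. Several parts are assumptions for illustration: the permission strings, the rule that non-admins may only act on resources they own (applied to read-only users too), and the function name `can`.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    USER = "user"
    READ_ONLY = "read_only"


# Hypothetical permission names; the real permission set is not documented here.
PERMISSIONS: dict[Role, set[str]] = {
    Role.ADMIN: {"memory:read", "memory:write", "api_key:manage", "user:manage", "system:config"},
    Role.USER: {"memory:read", "memory:write", "api_key:manage"},
    Role.READ_ONLY: {"memory:read"},
}


def can(role: Role, permission: str, owner_id: str, requester_id: str) -> bool:
    """Check the role grants the permission; non-admins must also own the resource."""
    if permission not in PERMISSIONS[role]:
        return False
    return role is Role.ADMIN or owner_id == requester_id
```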
- API Keys: SHA256 hash storage
- External API Keys: Fernet symmetric encryption
- JWT Tokens: HS256 signing (1 hour expiration)
- OAuth2 Secrets: SHA256 hash storage
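Storing only a SHA-256 digest of each API key means a database leak does not expose usable keys. Below is a minimal standard-library sketch. The `kg_` prefix and the 32-byte token length are hypothetical choices. Plain SHA-256, rather than a slow password hash, is generally considered adequate here because the keys are long random tokens, not user-chosen passwords.

```python
import hashlib
import hmac
import secrets


def generate_api_key(prefix: str = "kg_") -> tuple[str, str]:
    """Return (plaintext key shown once to the user, SHA-256 hex digest to store)."""
    plaintext = prefix + secrets.token_urlsafe(32)
    return plaintext, hashlib.sha256(plaintext.encode()).hexdigest()


def verify_api_key(presented: str, stored_hash: str) -> bool:
    """Hash the presented key and compare digests in constant time."""
    candidate = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)
```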
- Backend: Stateless FastAPI (scale with replicas)
- Frontend: Next.js static export (CDN-ready)
- Database: PostgreSQL connection pooling (asyncpg)
- Redis: Cluster mode support
- Async I/O: All database operations async
- Connection Pooling: PostgreSQL (20 max), Redis (10 max)
- Caching: Redis cache for embeddings (60 min TTL)
- Batch Processing: Background tasks with APScheduler
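Caching embeddings avoids paying for repeated OpenAI calls on identical text. This sketch makes two assumptions: the cache key is a SHA-256 of the model name plus the text, and a dict with expiry timestamps replaces Redis SETEX/GET. The key format and class name are hypothetical. Only the 60-minute TTL comes from the list above.

```python
import hashlib
import time
from typing import Callable


class EmbeddingCache:
    """TTL cache keyed by hash(model + text); a dict stands in for Redis."""

    def __init__(self, embed: Callable[[str], list[float]],
                 model: str = "text-embedding-3-small",
                 ttl_seconds: int = 60 * 60,
                 clock: Callable[[], float] = time.time) -> None:
        self.embed = embed
        self.model = model
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: dict[str, tuple[float, list[float]]] = {}  # key -> (expires_at, vector)

    def key(self, text: str) -> str:
        digest = hashlib.sha256(f"{self.model}:{text}".encode()).hexdigest()
        return f"emb:{digest}"  # hypothetical key format

    def get(self, text: str) -> list[float]:
        k = self.key(text)
        hit = self._store.get(k)
        if hit is not None and hit[0] > self.clock():
            return hit[1]  # fresh cache hit
        vector = self.embed(text)  # miss or expired: call the embedding API
        self._store[k] = (self.clock() + self.ttl, vector)
        return vector
```

Including the model name in the key keeps vectors from different embedding models from being mixed if the model is ever changed.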
- CPU: 2-4 cores (recommended)
- Memory: 4-8 GB (recommended)
- Storage: ~100 MB per 1000 memories (PostgreSQL + Qdrant)
┌──────────────────────────────────────┐
│ Load Balancer (GCP) │
└──────────────────────────────────────┘
↓
┌──────────────────────────────────────┐
│ Backend Instances (Docker) │
│ - your-domain.com (production) │
│ - localhost:8080 (development) │
└──────────────────────────────────────┘
↓
┌──────────────────────────────────────┐
│ Managed Services (GCP) │
│ - Cloud SQL (PostgreSQL) │
│ - Qdrant Cloud │
│ - Memorystore (Redis) │
└──────────────────────────────────────┘
| Layer | Technology | Version |
|---|---|---|
| Backend | FastAPI | 0.115+ |
| Database | PostgreSQL | 15+ |
| Vector DB | Qdrant | 1.15+ |
| Cache | Redis | 7+ |
| Frontend | Next.js | 16 |
| ORM | SQLAlchemy | 2.0+ (async) |
| Auth | Authlib | 1.3+ |
| AI | OpenAI API | Latest |
| Reranking | Cohere API | Latest |
| Graph | NetworkX | 3.0+ |
- Format: Structured JSON logs (structlog)
- Levels: DEBUG, INFO, WARNING, ERROR
- Destinations: stdout (Docker), CloudWatch (prod)
Metrics:
- Request latency (p50, p95, p99)
- Error rates
- Database query performance
- Memory usage per user
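The p50/p95/p99 latency figures can be computed from raw samples with the nearest-rank method sketched below. This is only one of several percentile definitions. Monitoring systems often estimate percentiles from histograms instead, so their values may differ slightly.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank; multiply first to avoid float error
    return ordered[rank - 1]


def latency_summary(latencies_ms: list[float]) -> dict[str, float]:
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
```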
- /health - Basic health check
- /api/v1/info - Detailed system info
- Database connection status
- Qdrant connection status
- Redis connection status
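The connection-status checks above can be aggregated into a single response. In this sketch each dependency is a probe function, for example a hypothetical `SELECT 1` against PostgreSQL or a ping to Redis. The function name and the ok/degraded response shape are assumptions and do not reflect the actual /api/v1/info payload.

```python
from typing import Callable


def health_report(checks: dict[str, Callable[[], bool]]) -> dict[str, object]:
    """Run each dependency probe; any failure or exception marks the service degraded."""
    results: dict[str, str] = {}
    for name, probe in checks.items():
        try:
            results[name] = "ok" if probe() else "down"
        except Exception as exc:  # a probe that raises counts as down
            results[name] = f"down: {exc}"
    overall = "ok" if all(v == "ok" for v in results.values()) else "degraded"
    return {"status": overall, "checks": results}
```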
- Multi-region Deployment: Global CDN + regional databases
- Real-time Collaboration: WebSocket support for shared memories
- Advanced Analytics: User behavior insights, usage patterns
- Custom Embeddings: Fine-tuned models for specific domains
- GraphQL API: Alternative to REST for complex queries