Evolving AI-powered assistant for organizing knowledge, automating tasks, and multimodal reasoning
Goal: Build a personal AI assistant that helps users:
- Remember everything: Smart memory for user facts, preferences, and conversation history
- Organize knowledge from documents, notes, websites, images, and emails
- Search and summarize information efficiently
- Automate repetitive tasks (web forms, reminders, data extraction)
- Gradually evolve into a multimodal, fine-tuned, deployed agent
Key Features:
- Smart Context: Long-term memory and user awareness
- Multi-format ingestion (PDF, HTML, images, video, audio)
- RAG (Retrieval-Augmented Generation) for searching knowledge
- Agents that perform tasks using tools and automation
- Optional fine-tuning for domain-specific knowledge
- Web + API deployment for real-world usability
Status: Phase 2 - Memory & Context (Complete); Phase 3 - Document Ingestion / RAG (Next Up)
Last Updated: February 14, 2026
- Core Backend: FastAPI + Gemini 2.0 Flash + LangChain
- Observability: Langfuse integration for full trace monitoring
- Frontend: React + Vite + Tailwind + Modern UI/UX
- Authentication: Supabase Auth (Email/OTP) + Protected Routes
- Memory Management System:
- Smart Classifier: LLM-based classification including Project Milestones & Skills.
- Entity Resolution: Intelligently links updates to existing projects (e.g., "frontend done" -> linked to Project NeuraDesk).
- Auto-Enhancement: Expands generic updates (e.g., "finished" → "Backend Finished") for richer context.
- Refactored Manager: Clean architecture with dedicated helper methods for key generation and value enhancement.
- Hybrid Storage:
- Structured: Supabase for exact facts.
- Vector: `pgvector` with HuggingFace Embeddings (768d) for semantic search.
- Memory RAG: Auto-retrieval of semantically relevant memories using Vector Search.
- Background Processing: Non-blocking memory extraction (see the sketch after this feature list)
- Conversation Persistence:
- Conversation Management: Create and track conversations per user
- Message Storage: All user and AI messages saved to database
- Unified Chat Endpoint: Single endpoint handles both new and existing conversations
- Auto Title Generation: Conversations automatically titled from first message
- Frontend Enhancements:
- Growth Board: New dashboard widget visualizing Projects, Skills, Focus, and Bio.
- Skill Graph: Dedicated visualization for user skills and tools.
- Smart Milestones: Clean date formatting for project updates.
- Chat UI: Smart dates, context menus, and favorites.
- Database Integration:
- Centralized Supabase service
- Row Level Security (RLS) for user privacy
- Pydantic models for structured data validation
- Conversation and Message repositories
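For context on the background processing above, a minimal sketch, assuming FastAPI's built-in `BackgroundTasks` drives the non-blocking extraction (`extract_memories` is a hypothetical stand-in for the `MemoryManager` pipeline):

```python
# Minimal sketch: non-blocking memory extraction with FastAPI BackgroundTasks.
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()

def extract_memories(user_id: str, message: str) -> None:
    # Classify the message and persist facts; runs after the HTTP response is sent.
    ...

@app.post("/chat-demo")
async def chat_demo(user_id: str, message: str, background_tasks: BackgroundTasks):
    answer = "..."  # generate the LLM response first, so the user never waits on memory work
    background_tasks.add_task(extract_memories, user_id, message)
    return {"answer": answer}
```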
- Document Ingestion & RAG: Upload PDF/Markdown files into the vector store to power a searchable knowledge base.
- Streaming Responses: Implement real-time token streaming.
- Monitor Classification: Fine-tune the LLM prompt to ensure consistent skill/milestone detection.
- Memory Explorer: Advanced UI to browse and manage all stored memories.
- Archive: Implement "Archive" functionality for old chats.
📊 See MILESTONES.md for detailed progress tracking.
| Phase | TODO / Description | Skills / Concepts | Tools / Tech | Expected Output |
|---|---|---|---|---|
| 1. Setup and LLM API Base | ✅ Done - FastAPI backend with Gemini 2.0 Flash, LangChain integration, Langfuse monitoring | Prompting, JSON output, LLM observability | Gemini 2.0 API, FastAPI, LangChain, Langfuse | User asks question → AI answers from text input |
| 2. Memory & Context | ✅ Done - "Smart Memory" (Structured + Vector RAG), Conversation history | Vector stores (long-term), User profiling | Supabase (pgvector), Postgres RPC, LangChain Memory | AI remembers you, your past chats, and preferences |
| 3. Document Ingestion / RAG | 🔄 Next Up - Add PDF / HTML ingestion → store embeddings → searchable | Embeddings, chunking, vector DB, retrieval | HuggingFace Embeddings, Supabase Vector | Ask questions → AI answers using documents |
| 4. Multimodal Support | TODO: Add image, screenshot, audio ingestion + OCR | Image → text, audio transcription, TTS | Gemini Vision, Whisper, XTTS, OpenCV | AI can understand images / screenshots / audio and answer questions |
| 5. Agents & Automation | TODO: Enable AI to perform tasks like web scraping, form filling, email sending | Tool calling, multi-step reasoning, memory | LangChain / LlamaIndex, Selenium / Playwright | AI completes automated workflows for the user |
| 6. Fine-Tuning | TODO: Fine-tune model on personal / domain-specific data | LoRA, QLoRA, SFT | HuggingFace TRL, LoRA adapters | AI answers more accurately for personal workflow or specialized domain |
| 7. Deployment | TODO: Make the assistant accessible via web / API | FastAPI, Docker, Redis, async pipelines | Vercel / Railway / GCP | Live personal assistant available via web + API |
| 8. Scaling & Monitoring | TODO: Optimize for performance, multi-user, logging, error handling | Async pipelines, Redis queues, monitoring | Celery, Redis, Prometheus, Grafana | Production-ready system with dashboard & analytics |
- LLM APIs: Gemini 2.0
- Embeddings / RAG: HuggingFace `all-mpnet-base-v2`, Supabase `pgvector`
- Agents / Workflow: LangChain, LlamaIndex, CrewAI
- Automation: Selenium, Playwright, Python scripts
- Multimodal: Gemini Vision, OpenCV, Whisper, XTTS
- Fine-tuning: LoRA, QLoRA, HuggingFace TRL
- Backend / Deployment: FastAPI, Docker, Redis, Vercel / Railway / GCP, Supabase
- Frontend / Dashboard: React / Next.js, Streamlit (optional)
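As a quick illustration of the embedding side of this stack, a minimal sketch (assumes the `sentence-transformers` package) that produces the 768-dimensional vectors used for semantic search:

```python
# Minimal sketch: producing the 768-dim vectors used for semantic search.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
vectors = model.encode(["Sarah loves Python", "What language do I like?"])
print(vectors.shape)  # (2, 768)
```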
NeuraDesk features a sophisticated Memory Management System that allows the AI to "remember" users over time.
- Classifier (`app/memory/classifier.py`):
  - Analyzes every user message in the background.
  - Uses Gemini with Structured Output (JSON Schema) to categorize facts (see the first sketch below).
  - Categories: `Personal Profile`, `Preference`, `Project`, `Ephemeral`.
  - Assigns an Importance Score (0.0 - 1.0).
- Manager (`app/memory/manager.py`):
  - Orchestrates the flow: User Query -> Classification -> Storage.
  - Hybrid Storage Strategy:
    - Structured: Stores exact Key/Value pairs in Postgres for instant "User Profile" lookup.
    - Vector: Generates embeddings (numbers representing meaning) and stores them in Supabase Vector for "fuzzy" semantic search.
  - Retrieves relevant memories to inject into the chat context.
- Memory Repository (`app/database/repositories/memory.py`):
  - Persists facts to Supabase (`user_memories` table).
  - Uses Row Level Security (RLS) to ensure users only access their own data.
- Vector Repository (`app/database/repositories/vector.py`):
  - Persists embeddings to Supabase (`memory_embeddings` table).
  - Uses Postgres RPC (`match_embeddings`) to perform fast cosine similarity search (see the second sketch below).
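First, a sketch of what the classifier's structured-output call could look like. The schema fields and prompt are illustrative stand-ins, not the actual contents of `app/schemas/classification_schema.py`:

```python
# Hypothetical sketch of the classifier's structured-output call.
# Schema fields are illustrative; the real ones live in classification_schema.py.
from pydantic import BaseModel, Field
from langchain_google_genai import ChatGoogleGenerativeAI

class MemoryClassification(BaseModel):
    category: str = Field(description="Personal Profile, Preference, Project, or Ephemeral")
    key: str = Field(description="Short identifier for the fact, e.g. 'name'")
    value: str = Field(description="The fact itself, e.g. 'Sarah'")
    importance: float = Field(ge=0.0, le=1.0, description="Importance score (0.0 - 1.0)")

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
classifier = llm.with_structured_output(MemoryClassification)
result = classifier.invoke("Classify this message: 'My name is Sarah and I love Python.'")
print(result.category, result.key, result.value, result.importance)
```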
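Second, a sketch of the semantic-recall path through the `match_embeddings` RPC using `supabase-py`. The RPC parameter names here are assumptions and should be checked against the SQL function definition:

```python
# Sketch of semantic recall via the match_embeddings RPC (parameter names assumed).
import os
from sentence_transformers import SentenceTransformer
from supabase import create_client

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

# Embed the query, then ask Postgres for the closest stored memories.
query_vector = model.encode("What language do I like?").tolist()  # 768 floats
result = supabase.rpc(
    "match_embeddings",
    {"query_embedding": query_vector, "match_count": 5},
).execute()
for row in result.data:
    print(row)
```

Example Flow (how these pieces work together):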
- User asks: "My name is Sarah and I love Python."
- LLM Answers: "Nice to meet you Sarah! Python is great."
- Background Process:
  - Classifier detects: `Category: Personal`, `Key: name`, `Value: Sarah`.
  - Action 1: Save to Structured DB (Role: Profile Display).
  - Action 2: Generate Embedding -> Save to Vector DB (Role: Deep Recall).
- Next Query: "What language do I like?" (Note: "language" != "Python")
- Context Injection (Future): Vector Search finds "Sarah loves Python" because "language" is semantically close to "Python".
- LLM Answers: "You mentioned you love Python!"
NeuraDesk now features a Conversation Persistence System that tracks all conversations and messages for each user.
- Conversation Repository (`app/database/repositories/conversations.py`):
  - Creates new conversations with auto-generated titles
  - Retrieves conversations for a user
  - Updates and deletes conversations
  - Stores: `id`, `user_id`, `title`, `is_favorite`, `is_archived`, timestamps
- Message Repository (`app/database/repositories/messages.py`):
  - Saves every user and AI message to the database
  - Retrieves message history for a conversation
  - Formats conversation history for LLM context (future use)
  - Stores: `id`, `conversation_id`, `role` (user/assistant), `content`, `created_at`
- Chat Service (`app/services/chat_service.py`):
  - Orchestrates conversation creation and message storage
  - Handles both new and existing conversations seamlessly
  - Auto-generates conversation titles from first message
New Conversation Flow:

```
1. User sends message WITHOUT conversation_id
   ↓
2. Create new conversation in database
   - Title: First 50 chars of message
   - Returns: conversation_id
   ↓
3. Save user message to messages table
   ↓
4. Generate AI response (with user facts)
   ↓
5. Save AI response to messages table
   ↓
6. Return response WITH conversation_id
```
Existing Conversation Flow:

```
1. User sends message WITH conversation_id
   ↓
2. Save user message to messages table
   ↓
3. Generate AI response (with user facts)
   ↓
4. Save AI response to messages table
   ↓
5. Return response WITH conversation_id
```
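Both flows collapse into the same service logic. A condensed, hypothetical view of that orchestration (the repository and LLM method names are illustrative, not the exact `chat_service.py` API):

```python
# Condensed sketch of the unified chat flow; method names are illustrative.
from typing import Optional

class ChatService:
    def __init__(self, conversations, messages, llm):
        self.conversations = conversations  # ConversationRepository
        self.messages = messages            # MessageRepository
        self.llm = llm                      # LLMService (Gemini + LangChain)

    async def handle(self, user_id: str, text: str,
                     conversation_id: Optional[str] = None) -> dict:
        if conversation_id is None:
            # New conversation: title from the first 50 chars of the message
            conversation_id = await self.conversations.create(user_id, title=text[:50])
        await self.messages.save(conversation_id, role="user", content=text)
        answer = await self.llm.generate(user_id, text)  # injects stored user facts
        await self.messages.save(conversation_id, role="assistant", content=answer)
        return {"message": "success", "answer": answer,
                "conversation_id": conversation_id}
```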
Conversations Table:

```sql
CREATE TABLE conversations (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    user_id UUID REFERENCES auth.users(id) ON DELETE CASCADE,
    title TEXT NOT NULL,
    is_favorite BOOLEAN DEFAULT FALSE,
    is_archived BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);
```

Messages Table:

```sql
CREATE TABLE messages (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    conversation_id UUID REFERENCES conversations(id) ON DELETE CASCADE,
    role TEXT NOT NULL, -- 'user' or 'assistant'
    content TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);
```

POST /api/v1/chat
- Request: `{ user_id, message, conversation_id? }`
- Response: `{ message, answer, conversation_id }`
- Behavior:
  - If `conversation_id` is null → creates new conversation
  - If `conversation_id` is provided → continues existing conversation
  - All messages are automatically saved to database
PATCH /api/v1/conversations/{conversation_id}
- Request: `{ title?, is_favorite? }`
- Response: `{ status: "success" }`
- Behavior: Updates conversation title or favorite status
DELETE /api/v1/conversations/{conversation_id}
- Request: None
- Response: `{ status: "success" }`
- Behavior: Permanently deletes conversation and all associated messages
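A quick client-side example against these endpoints (base URL and `user_id` are placeholders):

```python
# Placeholder base URL and user_id; adjust to the running backend.
import requests

BASE = "http://localhost:8000"

# Start a new conversation (no conversation_id in the payload)
data = requests.post(f"{BASE}/api/v1/chat",
                     json={"user_id": "<user-uuid>", "message": "Hello!"}).json()
print(data["answer"])

# Continue the same conversation, then mark it as a favorite
requests.post(f"{BASE}/api/v1/chat", json={
    "user_id": "<user-uuid>",
    "message": "What did I just say?",
    "conversation_id": data["conversation_id"],
})
requests.patch(f"{BASE}/api/v1/conversations/{data['conversation_id']}",
               json={"is_favorite": True})
```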
As NeuraDesk evolved from a simple LLM wrapper to a sophisticated memory-aware assistant, the codebase needed to scale accordingly. The refactor implements clean architecture principles to ensure:
- Separation of Concerns: Each layer has a single, well-defined responsibility
- Maintainability: Easy to locate, understand, and modify code
- Testability: Isolated components can be tested independently
- Scalability: New features can be added without touching existing code
- Team Collaboration: Clear boundaries make parallel development easier
```
┌────────────────────────────────────────┐
│  API Layer (Outer)                     │ ← HTTP endpoints, request/response
├────────────────────────────────────────┤
│  Services (Business Logic)             │ ← Orchestration, workflows
├────────────────────────────────────────┤
│  Memory (Domain Logic)                 │ ← Memory management, classification
├────────────────────────────────────────┤
│  Database (Data Access)                │ ← Repositories, Supabase client
├────────────────────────────────────────┤
│  Schemas (Data Models)                 │ ← Pydantic validation models
└────────────────────────────────────────┘
```
Key Principles:
- Outer layers depend on inner layers (never the reverse)
- Database layer knows nothing about business logic
- Memory layer focuses on domain logic, not DB operations
- Services orchestrate between layers
- API layer is thin, delegates to services
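A sketch of how these principles translate into FastAPI wiring. The module names mirror the project layout below, though the exact signatures in `app/api/deps.py` may differ:

```python
# Illustrative layering: thin API endpoint delegating to an injected service.
from fastapi import APIRouter, Depends
from app.schemas.chat_models import ChatRequest, ChatResponse
from app.services.chat_service import ChatService

router = APIRouter(prefix="/api/v1")

def get_chat_service() -> ChatService:
    # Dependency provider; the service composes memory, AI, and repositories internally.
    return ChatService()

@router.post("/chat", response_model=ChatResponse)
async def chat(req: ChatRequest, svc: ChatService = Depends(get_chat_service)):
    # API layer stays thin: validate (via Pydantic), delegate, return.
    return await svc.handle(req)  # handle() is an assumed method name
```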
```
NeuraDesk/
├── backend/
│ ├── app/
│ │ ├── main.py # FastAPI entry point with CORS & routing
│ │ │
│ │ ├── api/ # 🌐 API Layer (Outer)
│ │ │ ├── deps.py # Dependency injection
│ │ │ └── v1/
│ │ │ └── chat.py # Chat endpoints (/api/v1/chat, /api/v1/test)
│ │ │
│ │ ├── services/ # 🧠 Business Logic Layer
│ │ │ ├── chat_service.py # Chat orchestration & LLM invocation
│ │ │ └── langfuse_service.py # Observability & prompt management
│ │ │
│ │ ├── memory/ # 💾 Memory Domain Layer
│ │ │ ├── manager.py # Memory orchestration (process, retrieve)
│ │ │ └── classifier.py # Fact classification logic
│ │ │
│ │ ├── database/ # 🗄️ Data Access Layer
│ │ │ ├── client.py # Supabase client singleton
│ │ │ └── repositories/
│ │ │ ├── memory.py # Memory CRUD operations
│ │ │ ├── conversations.py # Conversation CRUD operations
│ │ │ ├── messages.py # Message CRUD operations
│ │ │ └── vector.py # Vector storage (future)
│ │ │
│ │ ├── schemas/ # 📋 Data Models Layer
│ │ │ ├── chat_models.py # ChatRequest, ChatResponse
│ │ │ ├── memory.py # MemoryFact, MemoryType, MemoryClassificationResult
│ │ │ ├── conversations.py # Conversation model
│ │ │ ├── messages.py # Message, MessageCreate models
│ │ │ └── classification_schema.py # LLM structured output schemas
│ │ │
│ │ └── ai/ # 🤖 AI/LLM Layer
│ │ ├── llm.py # LLMService (Gemini + LangChain)
│ │ └── chat_engine.py # AI response & fact classification functions
│ │
│ ├── requirements.txt
│ └── .env # API Keys & Config
│
├── frontend/
│ ├── src/
│ │ ├── api/ # API integration
│ │ ├── components/ # React components
│ │ └── pages/ # Application routes
│ └── package.json
│
└── README.md
```
API Layer (`app/api/`):
- Defines HTTP endpoints
- Validates requests/responses
- Delegates to services
- Does NOT contain business logic

Services Layer (`app/services/`):
- Orchestrates workflows
- Coordinates between memory, AI, and database
- Implements business rules
- Does NOT directly access database

Memory Layer (`app/memory/`):
- Memory classification logic
- Memory retrieval strategies
- Does NOT know about Supabase or SQL

Database Layer (`app/database/`):
- Single source of truth for data access
- Repository pattern for each entity
- Supabase client management
- Does NOT contain business logic

Schemas Layer (`app/schemas/`):
- Pydantic models for validation
- Shared data contracts
- Type safety across layers
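For instance, a shared data contract in the schemas layer might look like this (field names mirror the `messages` table; the actual model in `app/schemas/messages.py` may differ):

```python
# Sketch of a shared Pydantic contract; fields mirror the messages table.
from datetime import datetime
from pydantic import BaseModel

class Message(BaseModel):
    id: str
    conversation_id: str
    role: str  # 'user' or 'assistant'
    content: str
    created_at: datetime
```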
- Multi-agent orchestration (agents querying each other)
- Auto-update knowledge base (ingest new docs automatically)
- Personalized recommendations and reminders
- User authentication & multi-user support
- Analytics dashboard to track queries, usage, and model performance
- Phase 1: Working LLM answering questions from text
- Phase 2: Smart Memory & Context Awareness (History, User Facts)
- Phase 3: RAG-powered knowledge retrieval
- Phase 4: Multimodal support (images + audio)
- Phase 5: Automation agent capable of tasks
- Phase 6: Fine-tuned personalized AI
- Phase 7: Deployed web + API assistant
- Phase 8: Production-ready scaling + monitoring
Current Focus:
- Complete RAG Pipeline: Connect vector retrieval to ChatService.
- Document Ingestion: Implement PDF upload and chunking.
- Improve session management on top of the existing conversation persistence
- Add streaming responses for better UX
Maintenance Notes:
- Keep dependencies updated (especially SDK versions)
- Monitor Langfuse dashboard for LLM performance metrics
- Document API changes and breaking updates