A powerful enterprise-grade application that enables semantic search across visual and textual data using MongoDB Atlas Vector Search, LangGraph, and the Vercel AI SDK. Transform your charts, diagrams, PDFs, and images into searchable knowledge and interact with it through an intelligent AI agent with planning capabilities.
- Modern Two-Column Layout: Collapsible side panel (384px) with main agent view
- Multi-Select Workflow: Search/Browse → Select items → Feed to Agent → Get answers
- Focus Mode: Distraction-free interface (Cmd/Ctrl+Shift+F)
- Keyboard Shortcuts: Cmd/Ctrl+B to toggle side panel
- Image Preview Modal: Full-screen zoom, keyboard navigation, download
- Mandatory Planning Phase: Agent creates visible execution plan before taking action
- Advanced Tool Suite: 7 specialized tools including vector search, image analysis, web search, and email
- Reference Tracking: Bidirectional tracking between conversations and data sources
- Step Execution Monitoring: Detailed metrics for each tool call (duration, tokens, outputs)
- Budget-Aware Reasoning: Manages step limits intelligently (5 steps general, 8 steps deep mode)
- Feature Flags: Enable/disable web search and email tools independently
- Vector Search: Semantic similarity across text and images
- Visual Query Support: Search using natural language or images
- Project-Scoped Search: Fast, paginated results within projects
- Configurable Strategies: Different search parameters for Search/Chat/Agent modes
- Dual LLM Support: Choose between Claude or OpenAI for analysis
- Visual Content Extraction: Analyzes charts, diagrams, tables, and images
- Smart Compression: Reduces token usage by 60-80% before analysis
- Contextual Understanding: Project-aware analysis with user query context
- Project Organization: Group related documents into searchable projects
- Batch Processing: Analyze and process multiple files simultaneously
- PDF Support: Converts PDFs to images page-by-page for analysis
- Supported Formats: JPEG and PNG images; PDFs up to 20MB
- Node.js 18+ and npm
- MongoDB Atlas account with M10+ cluster (required for vector search)
- API Keys:
  - Anthropic Claude API key (required)
  - VoyageAI API key (required)
  - OpenAI API key (optional, for OpenAI analysis)
  - Perplexity API key (optional, for agent web search)
  - Resend API key (optional, for agent email functionality)
  - LangSmith API key (optional, for tracing/debugging)
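The required/optional split above can be checked at startup. The following is a minimal sketch, not project code: `checkApiKeys` and its report shape are hypothetical, and the variable names are assumed to match the .env.local template shown later in this README.

```typescript
// Illustrative helper, not part of the project: reports which API keys are
// missing. Variable names are taken from the .env.local template in this README.
type Env = Record<string, string | undefined>;

const REQUIRED_KEYS: string[] = ["ANTHROPIC_API_KEY", "VOYAGE_API_KEY"];

// Optional keys and the feature each one unlocks.
const OPTIONAL_KEYS: Array<{ key: string; feature: string }> = [
  { key: "OPENAI_API_KEY", feature: "OpenAI analysis" },
  { key: "PERPLEXITY_API_KEY", feature: "agent web search" },
  { key: "EMAIL_API_KEY", feature: "agent email" },
  { key: "LANGCHAIN_API_KEY", feature: "LangSmith tracing" },
];

interface KeyReport {
  missingRequired: string[]; // the app cannot work while this is non-empty
  unavailableFeatures: string[]; // optional features that have no key set
}

function checkApiKeys(env: Env): KeyReport {
  // A key counts as set only if it contains non-whitespace characters.
  const isSet = (key: string): boolean => (env[key] || "").trim().length > 0;
  return {
    missingRequired: REQUIRED_KEYS.filter((key) => !isSet(key)),
    unavailableFeatures: OPTIONAL_KEYS.filter((o) => !isSet(o.key)).map((o) => o.feature),
  };
}
```

In a Node.js or Next.js server context this would typically be called as `checkApiKeys(process.env)`, failing fast when `missingRequired` is not empty.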
git clone https://github.com/your-repo/mongo-multimodal.git
cd mongo-multimodal
npm install
Create a .env.local file in the root directory:
# MongoDB Atlas Connection String (M10+ cluster required)
MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/test?retryWrites=true&w=majority
# Required: AI Service API Keys
VOYAGE_API_KEY=your-voyage-api-key # VoyageAI for embeddings
ANTHROPIC_API_KEY=your-anthropic-api-key # Claude for analysis/chat
# Optional: Alternative LLM Provider
OPENAI_API_KEY=your-openai-api-key # OpenAI (optional alternative)
LLM_FOR_ANALYSIS=claude # Options: "claude" or "openai"
# Optional: Agent External Tools
PERPLEXITY_API_KEY=your-perplexity-api-key # Web search capability
AGENT_WEB_SEARCH_ENABLED=true # Enable/disable web search
EMAIL_API_KEY=your-resend-api-key # Email sending via Resend
EMAIL_FROM=noreply@yourdomain.com # From address for emails
EMAIL_ENABLED=true # Enable/disable email tool
# Optional: Agent Configuration
AGENT_PLANNING_ENABLED=true # Enable planning phase (recommended)
# Optional: LangSmith Tracing (for debugging agent)
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=your-langsmith-api-key
LANGCHAIN_PROJECT=mongo-multimodal # Organize runs in LangSmith
# Optional: Unstructured.io (for advanced document parsing)
UNSTRUCTURED_API_URL=http://localhost:8000
UNSTRUCTURED_API_KEY= # Leave empty for self-hosted
You can create the vector search index in two ways:
Option 1: Using the Script (Recommended)
npm run create:index
Option 2: Manual Creation in Atlas UI
- Go to your MongoDB Atlas cluster
- Navigate to "Search" → "Create Search Index"
- Select "JSON Editor" and use this configuration:
{
"mappings": {
"dynamic": true,
"fields": {
"embedding": {
"type": "knnVector",
"dimensions": 1024,
"similarity": "cosine"
}
}
}
}
- Name the index: vector_index
- Select the database: test
- Select the collection: projectData
Note: Vector search requires MongoDB Atlas M10 tier or higher.
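The index definition above ranks documents by cosine similarity between 1024-dimensional embeddings. As a rough illustration of what that metric measures, here is a minimal sketch. The function name is hypothetical; Atlas computes similarity server-side and returns its own score, so nothing like this runs in the application.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// Returns 1 for vectors pointing the same way, 0 for orthogonal vectors,
// and -1 for opposite vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Because the metric depends only on direction, scaling an embedding by a constant does not change its similarity to other embeddings.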
# Test database connection
npm run test:db
# Create vector search indexes
npm run create:index
npm run dev
Open http://localhost:3000 in your browser.
- Navigate to the homepage
- Click "New Project" button
- Fill in the project details:
  - Name: e.g., "Q4 Financial Reports"
  - Description: e.g., "Quarterly financial charts and analysis documents"
- Click "Create Project"
- Open your newly created project
- Use the side panel's Upload tab
- Select files to upload:
  - Images: JPEG/PNG files (charts, graphs, diagrams)
  - PDFs: Multi-page documents (reports, presentations)
  - Maximum file size: 20MB for PDFs
- Files are automatically uploaded and stored
After uploading, documents need to be processed to generate embeddings:
- Each uploaded file shows a "Process" button if not yet processed
- Click "Process" on individual files
- Wait for the processing indicator to complete
- Select multiple unprocessed files using checkboxes
- Click "Batch Process Selected" button
- Monitor the progress bar as files are processed
Processing includes:
- Analyzing images/PDFs with Claude AI or OpenAI (based on LLM_FOR_ANALYSIS)
- Generating 1024-dimensional vector embeddings with VoyageAI
- Extracting metadata, tags, and insights
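The individual and batch processing flows above map onto the analyze and process endpoints listed in the API reference later in this README. The sketch below only builds the request descriptions and sends nothing. `buildProcessingRequests` and `ApiRequest` are hypothetical names, and running analysis before embedding generation is an assumption about ordering.

```typescript
// Describes one HTTP call; a real client would pass this to fetch().
interface ApiRequest {
  method: "POST";
  path: string;
  body?: { dataIds: string[] };
}

// Assumption: analysis runs first, then embedding generation.
const PROCESSING_STEPS: string[] = ["analyze", "process"];

function buildProcessingRequests(projectId: string, dataIds: string[]): ApiRequest[] {
  if (dataIds.length === 0) return [];

  // One item: use the single-item endpoints, which take no request body.
  if (dataIds.length === 1) {
    const id = encodeURIComponent(dataIds[0]);
    return PROCESSING_STEPS.map((step): ApiRequest => ({
      method: "POST",
      path: `/api/projects/data/${id}/${step}`,
    }));
  }

  // Several items: use the bulk endpoints scoped to the project.
  const project = encodeURIComponent(projectId);
  return PROCESSING_STEPS.map((step): ApiRequest => ({
    method: "POST",
    path: `/api/projects/${project}/data/${step}`,
    body: { dataIds: dataIds.slice() },
  }));
}
```

Keeping request construction separate from sending makes the single-versus-bulk decision easy to test without a running server.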
The agent-centric interface provides a seamless workflow:
- Search or Browse (Side Panel):
  - Use the Search tab for vector search queries
  - Use the Browse tab to explore all uploaded documents
  - Preview images with the eye icon (full-screen modal with zoom)
- Select Context (Multi-Select):
  - Check boxes next to relevant documents
  - View selected items in the selection tray
  - Feed selected items as context to the agent
- Ask the Agent:
  - Type your question in the agent chat
  - Agent creates a visible plan showing its strategy
  - Watch step-by-step progress as tools are executed
  - See real-time reference tracking
- Review Sources:
  - Expand the References panel to see all sources used
  - Click on references to view full content
  - Track which data items contributed to the answer
The agent has access to these tools:
Core Tools (Always Available):
- planQuery: Creates execution plan (mandatory first step)
- searchProjectData: Vector search with configurable results (1-10)
- searchSimilarItems: Find related content by similarity
- analyzeImage: Context-aware image analysis
- projectDataAnalysis: Fetch stored analysis without base64
External Tools (Optional):
- searchWeb: Perplexity AI web search with citations (requires PERPLEXITY_API_KEY)
- sendEmail: Send emails via Resend API (requires EMAIL_API_KEY)
- General Mode (5 steps): Quick queries, 1-2 searches + 1-2 analyses
- Deep Mode (8 steps): Complex queries, 2-3 searches + 3-4 analyses
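The two step budgets above can be modeled as a small counter that refuses tool calls once the limit is reached. This is a sketch under assumptions: `StepBudget` is a hypothetical name, and whether the planning step counts against the budget is not stated in this README, so every call is treated alike here.

```typescript
type AnalysisDepth = "general" | "deep";

// Step limits as documented above: 5 for general mode, 8 for deep mode.
const STEP_LIMITS: Record<AnalysisDepth, number> = { general: 5, deep: 8 };

class StepBudget {
  private used = 0;
  private readonly limit: number;

  constructor(depth: AnalysisDepth) {
    this.limit = STEP_LIMITS[depth];
  }

  // Number of tool calls still allowed.
  get remaining(): number {
    return this.limit - this.used;
  }

  // Records one tool call. Returns false, and records nothing,
  // when the budget is already exhausted.
  tryConsume(): boolean {
    if (this.remaining <= 0) return false;
    this.used += 1;
    return true;
  }
}
```

An agent loop could call `tryConsume()` before each tool invocation and switch to producing a final answer as soon as it returns false.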
Track agent performance and usage:
- Navigate to the /api/agent/analytics endpoint
- Filter by project, session, or date range
- View tool usage statistics, step budgets, and reference patterns
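A filtered analytics request could be assembled as in the sketch below. The `projectId`, `startDate`, and `endDate` parameter names come from the API reference in this README. `buildAnalyticsUrl` is a hypothetical helper, and sending dates as ISO 8601 strings is an assumption. The session filter is left out because its parameter name is not documented here.

```typescript
interface AnalyticsFilter {
  projectId?: string;
  startDate?: Date;
  endDate?: Date;
}

function buildAnalyticsUrl(filter: AnalyticsFilter): string {
  const params: string[] = [];
  // Adds a query parameter only when a value is present.
  const add = (name: string, value: string | undefined): void => {
    if (value) params.push(`${name}=${encodeURIComponent(value)}`);
  };

  add("projectId", filter.projectId);
  // Assumption: the endpoint accepts ISO 8601 timestamps.
  add("startDate", filter.startDate ? filter.startDate.toISOString() : undefined);
  add("endDate", filter.endDate ? filter.endDate.toISOString() : undefined);

  const path = "/api/agent/analytics";
  return params.length > 0 ? `${path}?${params.join("&")}` : path;
}
```

The returned path is relative, so it can be passed to `fetch` from a page served by the same application.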
┌─────────────────────────────────────────────────────────────────┐
│                    Next.js 15 App (React 19)                    │
│                Agent-Centric UI with Side Panel                 │
└───────────────────────────────┬─────────────────────────────────┘
                                │
                ┌───────────────┼───────────────┐
                ▼               ▼               ▼
          ┌───────────┐   ┌───────────┐   ┌───────────┐
          │  Search   │   │   Chat    │   │   Agent   │
          │   Mode    │   │   Mode    │   │   Mode    │
          │ (Vector)  │   │ (AI SDK)  │   │(LangGraph)│
          └─────┬─────┘   └─────┬─────┘   └─────┬─────┘
                │               │               │
                └───────────────┼───────────────┘
                                ▼
                       ┌──────────────────┐
                       │  Service Layer   │
                       ├──────────────────┤
                       │ • Vector Search  │
                       │ • Project Data   │
                       │ • References     │
                       │ • Perplexity     │
                       │ • Email          │
                       └────────┬─────────┘
                                │
               ┌────────────────┼────────────────┐
               ▼                ▼                ▼
        ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
        │   MongoDB    │ │   VoyageAI   │ │  Claude AI   │
        │    Atlas     │ │ (Embeddings) │ │  / OpenAI    │
        │ (Vector DB)  │ │   1024-dim   │ │  (Analysis)  │
        └──────────────┘ └──────────────┘ └──────────────┘
/app/
├── api/ # Backend API routes
│ ├── agent/ # LangGraph agent endpoint (AI SDK)
│ │ └── route.ts # 7 tools, planning, reference tracking
│ ├── chat/ # Vercel AI SDK chat endpoint
│ ├── projects/ # Project CRUD operations
│ │ ├── [projectId]/
│ │ │ ├── search/ # Vector search endpoint
│ │ │ ├── upload/ # File upload endpoint
│ │ │ └── data/ # Bulk operations (analyze, process)
│ │ └── data/[id]/ # Single-item operations
│ │ ├── analyze/ # AI analysis endpoint
│ │ ├── process/ # Embedding generation
│ │ └── references/ # Reference tracking
│ └── ...
├── lib/ # Core utilities
│ ├── services/ # Service layer (business logic)
│ │ ├── projectData.service.ts # Data operations
│ │ ├── vectorSearch.service.ts # Unified vector search
│ │ ├── references.service.ts # Bidirectional tracking
│ │ ├── perplexity.service.ts # Web search
│ │ └── email.service.ts # Email sending
│ ├── mongodb.ts # Database connection
│ ├── claude.ts # LLM response generation
│ ├── voyageai.ts # Embedding generation
│ ├── image-utils.ts # Image compression
│ └── pdf-to-image.ts # PDF processing
├── projects/[projectId]/ # Project UI pages
│ └── components/ # React components
│ ├── AgentCentricLayout.tsx # Main layout
│ ├── AgentView.tsx # Agent interface
│ ├── SidePanel/ # Search, Browse, Upload
│ ├── Agent/ # Plan, Progress, References
│ └── SelectionContext.tsx # Multi-select state
├── types/ # TypeScript definitions
│ ├── models.ts # Server-side types
│ └── clientTypes.ts # Client-side types
└── scripts/ # Utility scripts
    ├── test-db.ts # Test MongoDB connection
    └── create-vector-index.ts # Create vector indexes
# Development
npm run dev # Start development server with Turbopack
npm run build # Build for production
npm run start # Start production server
npm run lint # Run ESLint
# Database
npm run test:db # Test MongoDB connection
npm run create:index # Create vector search indexes
- Search quarterly reports by chart patterns
- Find anomalies across financial visualizations
- Compare internal reports with web articles
- Connect insights from multiple documents
- Locate technical diagrams by description
- Find similar defect patterns across quality reports
- Search equipment manuals with natural language
- Predict maintenance needs from visual patterns
- Search medical images by visual similarity
- Find molecular structures across research papers
- Compare clinical trial results
- Connect patient data with clinical insights
- Unified search across all visual assets
- Break down information silos
- Preserve institutional knowledge
- Automated report generation via email
- Install Vercel CLI: npm i -g vercel
- Run: vercel
- Set environment variables in Vercel dashboard
- Deploy: vercel --prod
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
# Install all dependencies: the build step below typically needs devDependencies
RUN npm ci
COPY . .
RUN npm run build
EXPOSE 3000
CMD ["npm", "start"]
- API Keys: Never commit API keys to version control
- MongoDB: Use connection strings with authentication
- File Uploads: Implement file size and type restrictions (already in place)
- Access Control: Add authentication for production use
- Data Privacy: Ensure compliance with data regulations (GDPR, HIPAA, etc.)
- Rate Limiting: Implement rate limits for API endpoints
- Input Validation: All inputs validated with Zod schemas
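For the rate-limiting recommendation above, a fixed-window counter is one simple starting point. The sketch below is not the project's implementation: the class name and limits are hypothetical, and state is kept in process memory, so it only suits a single server instance. Serverless or multi-instance deployments would need a shared store.

```typescript
// Fixed-window rate limiter: at most `maxRequests` per client in each
// window of `windowMs` milliseconds. State lives in process memory.
class FixedWindowRateLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly maxRequests: number;
  private readonly windowMs: number;

  constructor(maxRequests: number, windowMs: number) {
    if (maxRequests < 1 || windowMs <= 0) {
      throw new Error("maxRequests and windowMs must be positive");
    }
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed. `now` is injectable for testing.
  allow(clientKey: string, now: number = Date.now()): boolean {
    const current = this.windows.get(clientKey);
    // No window yet, or the previous window has expired: start a new one.
    if (!current || now - current.start >= this.windowMs) {
      this.windows.set(clientKey, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.maxRequests) return false;
    current.count += 1;
    return true;
  }
}
```

An API route could key the limiter on the client IP or session ID and respond with HTTP 429 when `allow` returns false.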
- "Vector index not found"
  - Run npm run create:index
  - Ensure the index name is vector_index
  - Check that the database name is test (hardcoded in lib/mongodb.ts)
- "Failed to generate embeddings"
  - Check VOYAGE_API_KEY in .env.local
  - Verify the VoyageAI API quota hasn't been exceeded
  - Check network connectivity
- "MongoDB connection failed"
  - Verify the MONGODB_URI connection string
  - Check the IP access list in Atlas (allow your IP; 0.0.0.0/0 allows all addresses)
  - Ensure the cluster is M10+ for vector search
- "File upload fails"
  - Check file size (max 20MB for PDFs)
  - Ensure proper file format (JPEG, PNG for images)
  - Check browser console for errors
- "Agent planning not working"
  - Verify AGENT_PLANNING_ENABLED=true in .env.local
  - Check LangSmith for trace logs if enabled
  - Ensure the Claude API key is valid
- "Web search tool not available"
  - Set PERPLEXITY_API_KEY in .env.local
  - Ensure AGENT_WEB_SEARCH_ENABLED=true
  - Check the Perplexity API quota
// Project-specific vector search
POST /api/projects/[projectId]/search
{
"query": "revenue charts Q3",
"type": "text" | "image",
"page": 1,
"limit": 10
}
Response: {
results: ProjectData[],
total: number,
page: number,
totalPages: number
}
// Upload file to project
POST /api/projects/[projectId]/upload
FormData: { file: File }
Response: { success: true, data: ProjectData }
// Analyze single document (AI analysis)
POST /api/projects/data/[id]/analyze
Response: { success: true, data: ProjectData }
// Process single document (generate embedding)
POST /api/projects/data/[id]/process
Response: { success: true, data: ProjectData }
// Bulk analyze
POST /api/projects/[projectId]/data/analyze
{ "dataIds": ["id1", "id2", ...] }
// Bulk process
POST /api/projects/[projectId]/data/process
{ "dataIds": ["id1", "id2", ...] }
// Agent chat (streaming)
POST /api/agent
{
"messages": Message[],
"projectId": string,
"sessionId": string,
"analysisDepth": "general" | "deep"
}
Response: StreamingTextResponse with tool calls
// Agent analytics
GET /api/agent/analytics?projectId=xxx&startDate=xxx&endDate=xxx
Response: {
toolUsage: { [tool: string]: { count, avgDuration, totalDuration } },
stepBudget: { average, min, max },
planAccuracy: { avgEstimated, avgActual },
references: { total, byType: {...}, topItems: [...] },
insights: {...}
}
// Get references for a data item
GET /api/projects/data/[id]/references
Response: {
dataItem: ProjectData,
conversations: Conversation[]
}
Configure which LLM analyzes your images:
# Use Claude (faster, cheaper for images)
LLM_FOR_ANALYSIS=claude
# Use OpenAI (alternative)
LLM_FOR_ANALYSIS=openai
Models used:
- Claude: claude-haiku-4-5-20251001
- OpenAI: gpt-5-nano-2025-08-07
The application uses different search strategies based on mode:
// Search Mode: Paginated, broader threshold
{ limit: 200, numCandidates: 800, threshold: 0.3 }
// Chat Mode: Tight focus
{ limit: 2, numCandidates: 150, threshold: 0.2 }
// Agent Mode: High precision
{ limit: 2, numCandidates: 150, threshold: 0.6 }
Control agent depth:
// General mode (default)
analysisDepth: "general" // 5 steps total
// Deep mode
analysisDepth: "deep" // 8 steps total
- Vector Search Pagination: Limits result sets for fast responses
- Image Compression: Reduces token usage by 60-80%
- Conversation Storage: Base64 payloads are stripped before saving to stay under MongoDB's 16MB document limit
- HMR Safety: MongoDB client uses global caching in development
- Reference Tracking: Non-blocking updates
- Tool Execution: Parallel execution where possible
- Service Layer: Centralized logic prevents duplication
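The per-mode search parameters listed earlier and the paginated response shape can be combined as in the sketch below. It works on results that have already been scored. `selectResults`, `paginate`, and the `Scored` type are hypothetical names, and reading `threshold` as a minimum similarity score is an assumption about how the project applies it.

```typescript
type SearchMode = "search" | "chat" | "agent";

interface SearchStrategy {
  limit: number;
  numCandidates: number;
  threshold: number;
}

// Values copied from the search strategy configuration in this README.
const STRATEGIES: Record<SearchMode, SearchStrategy> = {
  search: { limit: 200, numCandidates: 800, threshold: 0.3 },
  chat: { limit: 2, numCandidates: 150, threshold: 0.2 },
  agent: { limit: 2, numCandidates: 150, threshold: 0.6 },
};

interface Scored {
  id: string;
  score: number;
}

// Assumption: `threshold` is a minimum score. Keeps the best `limit` matches.
function selectResults(mode: SearchMode, candidates: Scored[]): Scored[] {
  const { limit, threshold } = STRATEGIES[mode];
  return candidates
    .filter((c) => c.score >= threshold) // filter() copies, so the input is not mutated
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// Produces the { results, total, page, totalPages } shape of the search response.
function paginate<T>(items: T[], page: number, pageSize: number) {
  if (page < 1 || pageSize < 1) throw new Error("page and pageSize must be >= 1");
  const start = (page - 1) * pageSize;
  return {
    results: items.slice(start, start + pageSize),
    total: items.length,
    page,
    totalPages: Math.ceil(items.length / pageSize),
  };
}
```

With these values, Agent mode discards anything scoring below 0.6 and keeps at most two items, while Search mode keeps a broad set that is then served one page at a time.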
projects
{
_id: ObjectId,
name: string,
description: string,
createdAt: Date,
updatedAt: Date
}
projectData
{
_id: ObjectId,
projectId: ObjectId,
type: 'image' | 'document',
content: {
text?: string,
base64?: string
},
metadata: {
filename: string,
mimeType: string,
size: number
},
analysis?: {
description: string,
tags: string[],
insights: string[],
facets: Record<string, any>
},
embedding?: number[], // 1024-dimensional vector
referencedBy?: Array<{ // Bidirectional tracking
conversationId: ObjectId,
sessionId: string,
timestamp: Date,
context: string,
toolCall: string
}>,
processedAt?: Date,
createdAt: Date,
updatedAt: Date
}
conversations
{
_id: ObjectId,
projectId: string,
sessionId: string,
message: {
role: 'user' | 'assistant',
content: string
},
timestamp: Date,
plan?: { // Agent's execution plan
steps: string[],
estimatedToolCalls: number,
rationale: string,
needsExternalData: boolean,
toolsToUse: string[]
},
references?: Array<{ // Sources used
type: 'projectData' | 'web' | 'email',
dataId?: string,
url?: string,
title: string,
usedInStep: number,
toolCall: string,
score?: number
}>,
toolExecutions?: Array<{ // Detailed tracking
step: number,
tool: string,
input: Record<string, unknown>,
output: unknown,
duration: number,
tokens?: number,
timestamp: Date
}>
}
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Commit changes: git commit -m 'Add amazing feature'
- Push to branch: git push origin feature/amazing-feature
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- MongoDB Atlas for enterprise-grade vector search
- Anthropic for Claude AI and the incredible multimodal capabilities
- VoyageAI for state-of-the-art multimodal embeddings
- Vercel for the AI SDK and seamless deployment
- LangChain/LangGraph for agentic workflow capabilities
- Next.js team for the amazing React framework
- Documentation: This README
- Issues: GitHub Issues
- CLAUDE.md: See project instructions for developers
Built with MongoDB Atlas Vector Search, Claude AI, VoyageAI, and Next.js 15