RAGChatbot - Architecture & Design Decisions

Problem Statement & Approach

Core Challenge

Build a full-stack RAG (Retrieval-Augmented Generation) chatbot that supports both anonymous and authenticated users, with persistent chat history and production-ready AWS deployment.

Key Requirements

  1. User Management: Support both anonymous users (first-time visitors) and authenticated users (returning users)
  2. RAG Integration: Vector search + AI generation for contextual responses
  3. Chat Persistence: Store and retrieve conversation history
  4. Production Deployment: AWS-ready with scalability and security
  5. Modern Tech Stack: FastAPI + React + TypeScript

Architectural Decisions

1. Backend Architecture (FastAPI)

Project Structure

Server/
├── app/
│   ├── api/v1/endpoints/     # REST API routes
│   ├── auth/                 # JWT authentication
│   ├── core/                 # Configuration & database
│   ├── models/               # SQLAlchemy ORM models
│   ├── schemas/              # Pydantic data validation
│   └── services/             # Business logic layer
├── alembic/                  # Database migrations
└── requirements.txt          # Dependencies

Key Design Patterns

  • Alembic: Versioned, repeatable database schema migrations
  • Service Layer: Business logic isolated from API endpoints, keeping route handlers thin and readable
  • Middleware Pattern: Centralized authentication checks that protect sensitive routes (such as chat endpoints)
  • Pydantic: Request and response validation to ensure data integrity

2. Database Architecture (PostgreSQL + Amazon RDS)

Why PostgreSQL?

  • ACID Compliance: Full transactional support
  • JSON Support: Native JSONB for flexible metadata storage (chat sources, retrieved context)
  • Relational Model: Users, chat sessions, and messages map naturally onto related tables, which makes chat history easy to track

Why Amazon RDS?

  • Managed Service: Automated backups, patching, and maintenance, with native integration into the rest of the AWS deployment
  • Monitoring: CloudWatch integration for performance metrics
  • Cost Effective: Pay-as-you-go on-demand pricing, with reserved instances available to discount long-running production capacity

Database Schema Design

-- Users Table (supports both anonymous and authenticated users)
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    identifier VARCHAR(255) UNIQUE NOT NULL,  -- UUID for anonymous, email for authenticated
    user_type VARCHAR(20) NOT NULL,           -- 'anonymous' or 'authenticated'
    email VARCHAR(255) UNIQUE,                -- NULL for anonymous users
    hashed_password VARCHAR(255),             -- NULL for anonymous users
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    is_active BOOLEAN DEFAULT true,
    is_verified BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    last_login TIMESTAMP
);

-- Chat Sessions
CREATE TABLE chat_sessions (
    id SERIAL PRIMARY KEY,
    session_id UUID UNIQUE NOT NULL,          -- Public session identifier used by the frontend
    user_id INTEGER REFERENCES users(id),
    title VARCHAR(255) NOT NULL,
    is_active BOOLEAN DEFAULT true,           -- Soft-delete flag
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    last_activity TIMESTAMP DEFAULT NOW()
);

-- Chat Messages
CREATE TABLE chat_messages (
    id SERIAL PRIMARY KEY,
    session_id INTEGER REFERENCES chat_sessions(id),  -- Internal integer key, not the UUID
    content TEXT NOT NULL,
    role VARCHAR(20) NOT NULL,                -- 'user' or 'assistant'
    retrieved_context JSONB,                  -- RAG retrieved documents
    sources JSONB,                            -- Source metadata
    tokens_used INTEGER DEFAULT 0,
    processing_time INTEGER DEFAULT 0,        -- Response time in milliseconds
    created_at TIMESTAMP DEFAULT NOW()
);

Key Design Decisions

  • JSONB for Metadata: Flexible storage of RAG context and sources without schema changes
  • UUID Session IDs: Globally unique session identifiers exposed to the frontend, separate from the internal integer primary keys
  • Soft Deletes: is_active flags instead of hard deletes
  • Audit Trail: created_at, updated_at, and last_activity timestamps
  • Foreign Key Constraints: Referential integrity between users, sessions, and messages
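The soft-delete decision can be sketched as follows. This uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so it runs anywhere; the trimmed table and the helper names (`create_session`, `delete_session`, `list_sessions`) are hypothetical, since the real backend goes through SQLAlchemy.

```python
import sqlite3
import uuid


def create_schema(conn: sqlite3.Connection) -> None:
    # Trimmed-down chat_sessions table (SQLite stand-in for the PostgreSQL schema).
    conn.execute(
        """
        CREATE TABLE chat_sessions (
            id INTEGER PRIMARY KEY,
            session_id TEXT UNIQUE NOT NULL,
            user_id INTEGER NOT NULL,
            title TEXT NOT NULL,
            is_active INTEGER NOT NULL DEFAULT 1,
            updated_at TEXT DEFAULT CURRENT_TIMESTAMP
        )
        """
    )


def create_session(conn: sqlite3.Connection, user_id: int, title: str) -> str:
    session_id = str(uuid.uuid4())  # public identifier handed to the frontend
    conn.execute(
        "INSERT INTO chat_sessions (session_id, user_id, title) VALUES (?, ?, ?)",
        (session_id, user_id, title),
    )
    return session_id


def delete_session(conn: sqlite3.Connection, user_id: int, session_id: str) -> bool:
    # Soft delete: flip the flag and bump updated_at instead of removing the row.
    cur = conn.execute(
        "UPDATE chat_sessions SET is_active = 0, updated_at = CURRENT_TIMESTAMP "
        "WHERE session_id = ? AND user_id = ? AND is_active = 1",
        (session_id, user_id),
    )
    return cur.rowcount == 1


def list_sessions(conn: sqlite3.Connection, user_id: int) -> list:
    # Only active sessions are visible to the user.
    rows = conn.execute(
        "SELECT title FROM chat_sessions WHERE user_id = ? AND is_active = 1 ORDER BY id",
        (user_id,),
    ).fetchall()
    return [title for (title,) in rows]


conn = sqlite3.connect(":memory:")
create_schema(conn)
keep = create_session(conn, 1, "Card declined")
gone = create_session(conn, 1, "Old question")
delete_session(conn, 1, gone)
print(list_sessions(conn, 1))  # ['Card declined']
```

Because the row survives, deleted sessions can still be audited or restored; the trade-off is that every read query must remember the `is_active` filter.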

Migration Strategy

  • Alembic: Version-controlled database migrations
  • Forward-Only: Production schema changes ship only as new forward migrations
  • Rollback Support: Downgrade migrations are used in development environments

3. User Management Strategy

Dual User System Design

# Anonymous Users (Stateless)
- UUID-based identification
- JWT tokens with anonymous claims
- No profile data persisted (email, password, and name fields stay NULL)
- Chat history tied to session tokens

# Authenticated Users (Stateful)
- Email-based registration/login
- Database persistence with full profiles
- JWT tokens with user ID claims
- Persistent chat history across sessions

User Conversion Flow

Anonymous User → Chat → Convert to Authenticated → Preserve History
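One way to preserve history during conversion, assuming the anonymous user already has a `users` row (as the schema's `user_type = 'anonymous'` value suggests), is to upgrade that row in place: its `id` never changes, so every `chat_sessions.user_id` reference keeps pointing at it. The sketch uses sqlite3 as a stand-in for PostgreSQL; `convert_anonymous` is a hypothetical helper, and password hashing is assumed to happen before it is called.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        identifier TEXT UNIQUE NOT NULL,
        user_type TEXT NOT NULL,
        email TEXT UNIQUE,
        hashed_password TEXT
    );
    CREATE TABLE chat_sessions (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        title TEXT NOT NULL
    );
    """
)


def convert_anonymous(conn, anon_identifier: str, email: str, hashed_password: str) -> int:
    """Upgrade an anonymous user in place and return the unchanged user id."""
    with conn:  # one transaction: commit on success, roll back on error
        row = conn.execute(
            "SELECT id FROM users WHERE identifier = ? AND user_type = 'anonymous'",
            (anon_identifier,),
        ).fetchone()
        if row is None:
            raise ValueError("unknown anonymous user")
        # The identifier switches from the UUID to the email, as in the schema comments.
        conn.execute(
            "UPDATE users SET identifier = ?, email = ?, hashed_password = ?, "
            "user_type = 'authenticated' WHERE id = ?",
            (email, email, hashed_password, row[0]),
        )
    return row[0]


# An anonymous visitor chats first...
anon_identifier = str(uuid.uuid4())
uid = conn.execute(
    "INSERT INTO users (identifier, user_type) VALUES (?, 'anonymous')", (anon_identifier,)
).lastrowid
conn.execute("INSERT INTO chat_sessions (user_id, title) VALUES (?, ?)", (uid, "Card declined"))

# ...then registers; the earlier session still belongs to the same user id.
user_id = convert_anonymous(conn, anon_identifier, "ana@example.com", "<bcrypt hash>")
print(conn.execute("SELECT title FROM chat_sessions WHERE user_id = ?", (user_id,)).fetchall())
```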

Authentication Architecture

  • JWT Tokens: Stateless authentication supporting both user types
  • Token Claims: user_id, identifier, user_type for flexible user handling
  • Middleware Protection: Global auth middleware + endpoint-level validation
  • CORS Configuration: Cross-origin requests are accepted only from configured frontend origins

4. RAG Implementation Strategy

Vector Store Architecture

# Pluggable Embedding Providers
- OpenAI: text-embedding-3-small
- NVIDIA NIM: nvidia/nv-embed-v1 (1024-dim)

# Vector Database: Pinecone
- 1024-dimensional vectors
- Automatic dimension normalization so vectors from either provider match the index dimension
- Similarity search with configurable thresholds
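The document does not spell out what "automatic dimension normalization" does. One plausible reading, sketched here as an assumption, is that each provider's vector is truncated or zero-padded to the index's 1024 dimensions and then L2-normalized so cosine scores stay comparable across providers. `fit_to_index` and `INDEX_DIM` are hypothetical names.

```python
import math

INDEX_DIM = 1024  # Pinecone index dimension from the design above


def fit_to_index(vector: list, dim: int = INDEX_DIM) -> list:
    """Truncate or zero-pad to `dim`, then L2-normalize (hypothetical helper)."""
    if len(vector) >= dim:
        fitted = list(vector[:dim])
    else:
        fitted = list(vector) + [0.0] * (dim - len(vector))
    norm = math.sqrt(sum(x * x for x in fitted))
    if norm == 0.0:
        return fitted  # all-zero vector: nothing to normalize
    return [x / norm for x in fitted]


def cosine(a: list, b: list) -> float:
    # For unit-length vectors, cosine similarity reduces to the dot product.
    return sum(x * y for x, y in zip(a, b))
```

Truncation is only meaningful for models trained to tolerate it. OpenAI's text-embedding-3 models, for example, can return shorter vectors directly through the embeddings API's `dimensions` parameter, which is preferable to truncating client-side.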

An NVIDIA API key provides access to NIM embedding models comparable to llama-text-embed-v2; these were adopted because the OpenAI embeddings did not perform as well.

RAG Pipeline Design

Query → Query Expansion → Embedding Generation → Vector Search → Context Retrieval → AI Generation → Response
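The pipeline can be expressed as a chain of small, swappable functions. This is a sketch under assumptions: `expand`, `embed`, `search`, and `generate` are hypothetical callables standing in for the query expander, the embedding provider, the Pinecone query, and the LLM call, and the stubs exist only so the example runs.

```python
from typing import Callable, Dict, List

Vector = List[float]
Hit = Dict[str, object]  # e.g. {"id": ..., "score": ..., "text": ...}


def answer(query: str,
           expand: Callable[[str], List[str]],
           embed: Callable[[str], Vector],
           search: Callable[[Vector, int], List[Hit]],
           generate: Callable[[str, List[Hit]], str],
           top_k: int = 5) -> Dict[str, object]:
    hits: Dict[object, Hit] = {}
    for text in expand(query):                      # 1. query expansion
        for hit in search(embed(text), top_k):      # 2-3. embedding + vector search
            best = hits.get(hit["id"])
            if best is None or hit["score"] > best["score"]:
                hits[hit["id"]] = hit               # 4. keep best score per document
    context = sorted(hits.values(), key=lambda h: h["score"], reverse=True)[:top_k]
    return {"message": generate(query, context), "sources": context}  # 5. generation


# --- stubs so the sketch runs without external services ---
DOCS = {"d1": "Declined payments: check your balance.", "d2": "How to reset a password."}


def stub_expand(q: str) -> List[str]:
    return [q, q.lower()]


def stub_embed(text: str) -> Vector:
    return [float("declin" in text.lower()), float("password" in text.lower())]


def stub_search(vec: Vector, k: int) -> List[Hit]:
    hits = [{"id": "d1", "score": vec[0], "text": DOCS["d1"]},
            {"id": "d2", "score": vec[1], "text": DOCS["d2"]}]
    return sorted(hits, key=lambda h: h["score"], reverse=True)[:k]


def stub_generate(q: str, ctx: List[Hit]) -> str:
    return f"Based on {ctx[0]['id']}: {ctx[0]['text']}"


result = answer("My card was declined", stub_expand, stub_embed, stub_search, stub_generate)
print(result["message"])
```

Injecting each stage as a parameter is what makes the embedding provider pluggable: switching from OpenAI to NVIDIA NIM only changes the `embed` argument.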

Query Expansion Strategy

The system uses query expansion to maximize recall, generating many variations of each query so that relevant documents are retrieved even when their wording differs from the user's:

1. Keyword Extraction & Combinations

  • Extract meaningful keywords from user queries
  • Generate single keyword searches for broad matching
  • Create bigram combinations for phrase matching
  • Combine top keywords in various combinations
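Strategy 1 can be sketched as below. The stop-word list, the regex tokenizer, and the helper names are assumptions for illustration; the document does not say how keywords are actually extracted, and this sketch does no stemming.

```python
import re
from itertools import combinations

# Hypothetical stop-word list; a real one would be larger.
STOP_WORDS = {"a", "an", "the", "of", "my", "i", "do", "what", "should", "got", "to", "is", "one"}


def extract_keywords(query: str) -> list:
    # Lowercase word tokens minus stop words, de-duplicated in first-seen order.
    words = re.findall(r"[a-z']+", query.lower())
    return list(dict.fromkeys(w for w in words if w not in STOP_WORDS))


def keyword_variations(query: str, top_n: int = 4) -> list:
    keywords = extract_keywords(query)
    variations = list(keywords)                                         # single keywords
    variations += [f"{a} {b}" for a, b in zip(keywords, keywords[1:])]  # adjacent bigrams
    variations += [" ".join(c) for c in combinations(keywords[:top_n], 2)]  # top-keyword pairs
    return list(dict.fromkeys(variations))                              # drop duplicates


print(keyword_variations("One of my transactions got declined, what should I do?"))
# ['transactions', 'declined', 'transactions declined']
```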

2. Question-Based Variations

  • Add question starters (How, What, Why, When, Where) to non-question queries
  • Generate multiple question formats for better coverage
  • Include both question and non-question versions

3. Action-Oriented Expansions

  • Add action verbs: help, support, troubleshoot, resolve, fix, solve
  • Include information-seeking variations: information about, guide for
  • Cover different user intents and search patterns
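Strategies 2 and 3 are both template expansions. In this sketch the starters, verbs, and prefixes come from the lists above, while the simple "prefix + query" pattern is an assumption; search variations only need to be close enough to match, not grammatical.

```python
QUESTION_STARTERS = ["How", "What", "Why", "When", "Where"]
ACTION_VERBS = ["help", "support", "troubleshoot", "resolve", "fix", "solve"]
INFO_PREFIXES = ["information about", "guide for"]


def is_question(query: str) -> bool:
    q = query.strip()
    return q.endswith("?") or q.split(" ", 1)[0].capitalize() in QUESTION_STARTERS


def template_variations(query: str) -> list:
    original = query.strip()
    base = original.rstrip("?").strip()
    variations = [original, base]            # question and non-question versions
    if not is_question(original):
        # Only non-question queries get question starters added.
        variations += [f"{s} {base}" for s in QUESTION_STARTERS]
    variations += [f"{verb} {base}" for verb in ACTION_VERBS]
    variations += [f"{p} {base}" for p in INFO_PREFIXES]
    return list(dict.fromkeys(variations))   # de-duplicate, keep order
```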

4. Synonym-Based Searches

  • Map financial terms to common synonyms
  • Example: "transaction declined" → "payment rejected", "purchase failed"
  • Ensures matching even when users use different terminology

5. Context-Specific Expansions

  • Add domain-specific contexts for financial queries
  • Include: banking, financial services, payment processing, customer support
  • Broadens search scope while maintaining relevance, especially for a targeted knowledge base
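Strategies 4 and 5 can be sketched together. The synonym table is a tiny hypothetical sample built from the example above ("transaction declined" → "payment rejected", "purchase failed"); a real table would be larger and curated for the knowledge base. The context prefixes are the ones listed in strategy 5.

```python
# Hypothetical sample table; a production table would be curated per knowledge base.
SYNONYMS = {
    "transaction declined": ["payment rejected", "purchase failed"],
    "declined": ["rejected", "failed"],
}
DOMAIN_CONTEXTS = ["banking", "financial services", "payment processing", "customer support"]


def synonym_variations(query: str) -> list:
    q = query.lower()
    variations = []
    for term, alternatives in SYNONYMS.items():
        if term in q:
            # Swap the matched term for each synonym, keeping the rest of the query.
            variations += [q.replace(term, alt) for alt in alternatives]
    return variations


def context_variations(query: str) -> list:
    # Prefix the query with each domain context, e.g. "banking transaction declined".
    return [f"{ctx} {query.lower()}" for ctx in DOMAIN_CONTEXTS]
```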

Example: for the query "One of my transactions got declined, what should I do?"

  • Generates 30+ search variations including:
    • Single keywords: "transaction", "declined", "payment", "rejected"
    • Phrases: "transaction declined", "payment rejected", "help with declined transaction"
    • Questions: "How to fix declined transaction", "What is declined transaction"
    • Context: "banking transaction declined", "customer support declined transaction"

5. Frontend Architecture (React + Vite + TypeScript)

Component Architecture

Client/
├── components/
│   ├── auth/          # Authentication components
│   └── chat/          # Chat interface components
├── contexts/          # React context for state management (specifically auth)
├── hooks/             # Hooks required for state management (specifically auth)
├── services/          # API client, utilities, and types
└── pages/             # Route-based components

State Management Strategy

  • React Context: Global auth state and user management
  • API Integration: Axios for HTTP requests with interceptors
  • Local State: React hooks for component-level state (a future iteration would move to SSR/TanStack for cleaner management)

6. API Design Decisions

RESTful Endpoint Structure

/api/v1/auth/
├── POST /register          # User registration
├── POST /login             # User authentication
├── POST /anonymous         # Anonymous user creation
├── POST /convert-anonymous # User conversion
├── GET /me                 # Current user info
└── POST /change-password   # Password management

/api/v1/chat/
├── POST /send             # Send message
├── GET /sessions          # List chat sessions
├── GET /sessions/{id}     # Get specific session
├── POST /sessions         # Create new session
└── DELETE /sessions/{id}  # Delete session

Response Format Standardization

{
  "message": "AI response content",
  "session_id": "uuid",
  "sources": [{"title": "...", "content": "..."}],
  "tokens_used": 150,
  "processing_time": 1200
}
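In the backend this shape would be a Pydantic response model; the standard-library dataclass below is a stand-in that shows the same contract. The names `ChatResponse` and `Source` are assumptions; field names follow the JSON above, with `processing_time` in milliseconds as in the `chat_messages` table.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Source:
    title: str
    content: str


@dataclass
class ChatResponse:
    message: str
    session_id: str                     # UUID of the chat session
    sources: List[Source] = field(default_factory=list)
    tokens_used: int = 0
    processing_time: int = 0            # milliseconds

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, so sources become plain dicts.
        return json.dumps(asdict(self))


resp = ChatResponse(
    message="AI response content",
    session_id=str(uuid.uuid4()),
    sources=[Source(title="Fee schedule", content="...")],
    tokens_used=150,
    processing_time=1200,
)
print(resp.to_json())
```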

7. Security Architecture

Authentication & Authorization

  • JWT Tokens: Secure, stateless authentication
  • Token Expiration: 30-minute access tokens
  • Password Security: bcrypt hashing with salt
  • CORS Protection: Configurable allowed origins
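The design calls for bcrypt. To show the same salted hash-and-verify flow without third-party packages, this sketch swaps in the standard library's PBKDF2 (`hashlib.pbkdf2_hmac`); it is a stand-in, not a bcrypt implementation. `hash_password`, `verify_password`, and the storage format are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # work factor; tune upward as hardware improves


def hash_password(password: str, iterations: int = ITERATIONS) -> str:
    salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Store the parameters with the hash so the work factor can be raised later.
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    _, iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison of the recomputed and stored digests.
    return hmac.compare_digest(digest.hex(), digest_hex)
```

The resulting string is about 120 characters, so it fits the `hashed_password VARCHAR(255)` column.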

API Security

  • Global Middleware: Automatic route protection
  • Input Validation: Pydantic schemas for all inputs
  • SQL Injection Prevention: Parameterized queries through the SQLAlchemy ORM

Production Security

  • Environment Variables: Secure configuration management (AWS Parameter Store, Secrets Manager)
  • HTTPS Only: TLS encryption in production
  • Audit Logging: Request/response logging capability

8. AWS Deployment Strategy

Service Selection Rationale

Frontend: AWS Amplify
├── Global CDN distribution
├── Automatic HTTPS
├── Git-based deployments
└── Built-in CI/CD

Backend: AWS App Runner
├── Serverless container platform
├── Auto-scaling capabilities
├── Health check integration
└── Managed infrastructure

Database: Amazon RDS
├── Managed PostgreSQL
├── Automated backups
├── Multi-AZ availability
└── Security group isolation

Deployment Architecture

Amplify (Frontend) → App Runner (Backend) → RDS (Database)
                            ↓
                    Pinecone (Vector DB)

Environment Management

  • Development: Local environment with hot reload
  • Staging: App Runner with staging database
  • Production: App Runner with production RDS + monitoring
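A minimal sketch of environment-driven configuration, so the same image runs in development, staging, and production. The variable names (`APP_ENV`, `DATABASE_URL`, `PINECONE_API_KEY`, `ALLOWED_ORIGINS`) are hypothetical; in App Runner they would be supplied as environment variables, with secrets sourced from Parameter Store or Secrets Manager rather than committed to the repository.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class Settings:
    env: str
    database_url: str
    pinecone_api_key: str
    allowed_origins: Tuple[str, ...]
    debug: bool


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    env = environ.get("APP_ENV", "development")
    if env not in {"development", "staging", "production"}:
        raise ValueError(f"unknown APP_ENV: {env}")
    origins = tuple(o.strip() for o in environ.get("ALLOWED_ORIGINS", "").split(",") if o.strip())
    if not origins:
        if env == "production":
            raise ValueError("ALLOWED_ORIGINS must be set in production")  # keep CORS strict
        origins = ("http://localhost:5173",)  # Vite dev server default
    return Settings(
        env=env,
        database_url=environ["DATABASE_URL"],          # required: fail fast if missing
        pinecone_api_key=environ["PINECONE_API_KEY"],
        allowed_origins=origins,
        debug=(env == "development"),
    )
```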

9. Scalability Considerations

Horizontal Scaling

  • App Runner: Automatic scaling based on concurrent requests per instance
  • RDS: Read replicas for high read loads
  • Pinecone: Managed vector database with auto-scaling (as the number of documents and vectors grows)

Performance Optimization

  • Database Indexing: Optimized queries with proper indexes
  • Caching Strategy: Redis-ready for session caching
  • Connection Pooling: Database connection management
  • Async Processing: Non-blocking I/O operations
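Non-blocking I/O matters most on the RAG path, where one user query fans out into 30+ search variations. This sketch assumes a hypothetical async `embed_and_search` call: `asyncio.gather` runs the lookups concurrently, and a semaphore caps in-flight requests to stay within provider rate limits.

```python
import asyncio


async def embed_and_search(variation: str) -> list:
    # Hypothetical stand-in for an embedding call plus a Pinecone query.
    await asyncio.sleep(0.05)  # simulated network latency
    return [f"doc-for:{variation}"]


async def search_all(variations: list, max_concurrency: int = 8) -> list:
    limit = asyncio.Semaphore(max_concurrency)

    async def bounded(v: str) -> list:
        async with limit:  # at most max_concurrency calls in flight
            return await embed_and_search(v)

    # gather() returns results in input order, regardless of completion order.
    results = await asyncio.gather(*(bounded(v) for v in variations))
    # Flatten and de-duplicate while preserving order.
    return list(dict.fromkeys(doc for batch in results for doc in batch))


docs = asyncio.run(search_all([f"variation {i}" for i in range(30)]))
print(len(docs))  # 30
```

With 30 variations and a limit of 8, the simulated lookups finish in about four rounds of latency instead of thirty.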

Monitoring & Observability

  • Health Checks: /health endpoint for load balancers
  • Logging: Structured logging for debugging
  • Metrics: CloudWatch integration ready
  • Error Handling: Comprehensive exception management
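Structured logging needs only the standard `logging` module: a custom formatter emits one JSON object per record, which CloudWatch Logs Insights can then query by field. `JsonFormatter`, the logger name, and the extra fields (`session_id`, `user_type`, `processing_time`) are illustrative assumptions.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Optional context passed via logger.info(..., extra={...}).
        for key in ("session_id", "user_type", "processing_time"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("ragchatbot")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # avoid duplicate lines if the root logger is configured too

logger.info("chat response sent", extra={"session_id": "abc", "processing_time": 1200})
```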

Technical Challenges & Solutions

1. Anonymous User Persistence

Challenge: How to maintain chat history for anonymous users without database storage?

Solution:

  • UUID-based identification in JWT tokens
  • Session-based chat history with ephemeral storage
  • Conversion path to authenticated accounts

2. RAG Pipeline

Challenge: How to integrate multiple embedding providers with consistent vector dimensions and ensure comprehensive document retrieval?

Solution:

  • Pluggable Embedding Architecture: Support for multiple providers (OpenAI text-embedding-3-small, NVIDIA NIM nvidia/nv-embed-v1)
  • Advanced Query Expansion: 30+ search variations including keywords, synonyms, questions, and context-specific expansions
  • Configurable RAG Parameters: Chunk size (1000), overlap (200), similarity threshold (0.08), top-k results (5)
  • Quality Filtering: Duplicate detection, relevance scoring, and content similarity checks
  • Fallback Mechanisms: Graceful fallbacks when embedding services are unavailable
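Two of the configurable parameters above can be illustrated directly: sliding-window chunking with a chunk size of 1000 and an overlap of 200, and post-retrieval filtering with a similarity threshold of 0.08 and a top-k of 5. The sketch assumes character-based chunks and normalized exact-text duplicate detection; the document does not say whether chunk size is counted in characters or tokens, or how duplicates are detected.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list:
    """Split text into chunk_size-character windows; consecutive windows share `overlap` chars."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def filter_results(hits: list, threshold: float = 0.08, top_k: int = 5) -> list:
    """Drop low-score and duplicate hits, then keep the top_k by score."""
    seen = set()
    kept = []
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        key = " ".join(hit["text"].split()).lower()  # whitespace- and case-insensitive
        if hit["score"] < threshold or key in seen:
            continue
        seen.add(key)
        kept.append(hit)
        if len(kept) == top_k:
            break
    return kept
```

Sorting before de-duplication means that when two hits share the same text, the higher-scoring one survives.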

3. Real-time Chat Experience

Challenge: How to keep the chat interface responsive while responses are generated asynchronously?

Solution:

  • Optimistic UI updates
  • Loading states and error handling
  • Efficient API design with minimal round trips

Conclusion

This architecture provides a solid foundation for a production-ready RAG chatbot with:

  • Security First: Comprehensive authentication and authorization
  • Developer Experience: Clean code structure and documentation
  • Production Ready: AWS deployment strategy with monitoring
  • Future Proof: Architecture that is adaptable and scalable