RAGChatbot - Architecture & Design Decisions

Problem Statement & Approach

Core Challenge

Build a full-stack RAG (Retrieval-Augmented Generation) chatbot that supports both anonymous and authenticated users, with persistent chat history and production-ready AWS deployment.

Key Requirements

User Management: Anonymous users (first-time visitors) vs authenticated users (returning users)
RAG Integration: Vector search + AI generation for contextual responses
Chat Persistence: Store and retrieve conversation history
Production Deployment: AWS-ready with scalability and security
Modern Tech Stack: FastAPI + React + TypeScript

Architectural Decisions

1. Backend Architecture (FastAPI)

Project Structure

Server/
├── app/
│   ├── api/v1/endpoints/     # REST API routes
│   ├── auth/                 # JWT authentication
│   ├── core/                 # Configuration & database
│   ├── models/               # SQLAlchemy ORM models
│   ├── schemas/              # Pydantic data validation
│   └── services/             # Business logic layer
├── alembic/                  # Database migrations
└── requirements.txt          # Dependencies

Key Design Patterns

Alembic: Versioned, repeatable database schema migrations
Service Layer: Business logic isolated from API endpoints, keeping route handlers thin and readable
Middleware Pattern: Protects sensitive routes (such as chat endpoints) by checking authentication before requests reach their handlers
Pydantic: Request and response validation to ensure data integrity

2. Database Architecture (PostgreSQL + Amazon RDS)

Why PostgreSQL?

ACID Compliance: Full transactional support
JSON Support: Native JSONB for flexible metadata storage (chat sources, retrieved context)
Relational Model: Users, chat sessions, and messages map naturally onto related tables, which makes per-user chat history easy to track

Why Amazon RDS?

Managed Service: Automated backups, patching, and maintenance, with native integration into the rest of the AWS deployment
Monitoring: CloudWatch integration for performance metrics
Cost-Effective: On-demand, pay-as-you-go pricing, with reserved instances available to reduce production costs

Database Schema Design

-- Users Table (Supports both anonymous and authenticated users)
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    identifier VARCHAR(255) UNIQUE NOT NULL,  -- UUID for anonymous, email for authenticated
    user_type VARCHAR(20) NOT NULL,           -- 'anonymous' or 'authenticated'
    email VARCHAR(255) UNIQUE,                -- NULL for anonymous users
    hashed_password VARCHAR(255),             -- NULL for anonymous users
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    is_active BOOLEAN DEFAULT true,
    is_verified BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    last_login TIMESTAMP
);

-- Chat Sessions
CREATE TABLE chat_sessions (
    id SERIAL PRIMARY KEY,
    session_id UUID UNIQUE NOT NULL,          -- Public session identifier used by the frontend
    user_id INTEGER REFERENCES users(id),
    title VARCHAR(255) NOT NULL,
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    last_activity TIMESTAMP DEFAULT NOW()
);

-- Chat Messages
CREATE TABLE chat_messages (
    id SERIAL PRIMARY KEY,
    session_id INTEGER REFERENCES chat_sessions(id),  -- Internal id, not the public UUID
    content TEXT NOT NULL,
    role VARCHAR(20) NOT NULL,                -- 'user' or 'assistant'
    retrieved_context JSONB,                  -- RAG retrieved documents
    sources JSONB,                            -- Source metadata
    tokens_used INTEGER DEFAULT 0,
    processing_time INTEGER DEFAULT 0,        -- Response time in milliseconds
    created_at TIMESTAMP DEFAULT NOW()
);

Key Design Decisions

JSONB for Metadata: Flexible storage of RAG context and sources without schema changes
UUID Session IDs: Globally unique, non-sequential session identifiers exposed to the frontend, while the integer id stays internal
Soft Deletes: is_active flags instead of hard deletes
Audit Trail: created_at, updated_at, and last_activity timestamps
Foreign Key Constraints: Referential integrity between users, sessions, and messages
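The soft-delete and UUID-session decisions can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL, so UUIDs are stored as text and booleans as integers. The function names are hypothetical and are not taken from the project.

```python
import sqlite3
import uuid


def make_db() -> sqlite3.Connection:
    """In-memory stand-in for chat_sessions (SQLite types, not PostgreSQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE chat_sessions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,  -- internal key referenced by messages
            session_id TEXT UNIQUE NOT NULL,       -- public UUID returned to the frontend
            user_id INTEGER NOT NULL,
            title TEXT NOT NULL,
            is_active INTEGER NOT NULL DEFAULT 1   -- soft-delete flag
        )
        """
    )
    return conn


def create_session(conn: sqlite3.Connection, user_id: int, title: str) -> str:
    """Insert a session and return its public UUID rather than the integer id."""
    public_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO chat_sessions (session_id, user_id, title) VALUES (?, ?, ?)",
        (public_id, user_id, title),
    )
    return public_id


def soft_delete_session(conn: sqlite3.Connection, user_id: int, public_id: str) -> bool:
    """Flip is_active off instead of deleting; only the owner's active row matches."""
    cur = conn.execute(
        "UPDATE chat_sessions SET is_active = 0 "
        "WHERE session_id = ? AND user_id = ? AND is_active = 1",
        (public_id, user_id),
    )
    return cur.rowcount == 1


def list_active_titles(conn: sqlite3.Connection, user_id: int) -> list[str]:
    """Return titles of the user's sessions that have not been soft-deleted."""
    rows = conn.execute(
        "SELECT title FROM chat_sessions WHERE user_id = ? AND is_active = 1 ORDER BY id",
        (user_id,),
    )
    return [title for (title,) in rows]
```

A soft-deleted session disappears from listings, but its row (and any messages that reference it) stays in the table for auditing.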

Migration Strategy

Alembic: Version-controlled database migrations
Forward-Only: Production only ever applies upgrade migrations
Rollback Support: Downgrade scripts used for rollbacks in development

3. User Management Strategy

Dual User System Design

# Anonymous Users (Stateless)
- UUID-based identification
- JWT tokens with anonymous claims
- No profile data stored (email, password, and name fields remain NULL)
- Chat history tied to session tokens

# Authenticated Users (Stateful)
- Email-based registration/login
- Database persistence with full profiles
- JWT tokens with user ID claims
- Persistent chat history across sessions

User Conversion Flow

Anonymous User → Chat → Convert to Authenticated → Preserve History
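The schema lets anonymous users hold a row in users. That suggests one way to implement this flow, shown here as an assumption: update the existing row in place rather than create a new one. Every chat_sessions.user_id then still matches, and no history has to be copied. The sketch uses plain dataclasses with hypothetical names. Password hashing is out of scope, so the hash is passed in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    id: int
    identifier: str                       # UUID while anonymous, email once converted
    user_type: str                        # 'anonymous' or 'authenticated'
    email: Optional[str] = None
    hashed_password: Optional[str] = None


@dataclass
class ChatSession:
    session_id: str
    user_id: int                          # refers to User.id, which conversion never changes


def convert_anonymous(user: User, email: str, hashed_password: str) -> User:
    """Upgrade an anonymous user in place so existing sessions stay attached."""
    if user.user_type != "anonymous":
        raise ValueError("only anonymous users can be converted")
    user.identifier = email
    user.email = email
    user.hashed_password = hashed_password  # hashing (bcrypt) happens before this call
    user.user_type = "authenticated"
    return user


def sessions_for(user: User, sessions: list[ChatSession]) -> list[ChatSession]:
    """History lookup by user id; unaffected by the conversion above."""
    return [s for s in sessions if s.user_id == user.id]
```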

Authentication Architecture

JWT Tokens: Stateless authentication supporting both user types
Token Claims: user_id, identifier, user_type for flexible user handling
Middleware Protection: Global auth middleware + endpoint-level validation
CORS Configuration: Cross-origin requests restricted to configured frontend origins
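The following sketch shows how one token format can carry either user type. It hand-builds HS256 JWTs with the standard library (hmac, hashlib, base64) so the mechanics are visible. A real deployment would normally use an established JWT library instead. The claim names follow the list above, and the 30-minute default matches the access-token lifetime in the security section. The function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(data: bytes) -> str:
    """Base64url without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _sign(signing_input: bytes, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()


def encode_token(claims: dict, secret: str, ttl_seconds: int = 30 * 60) -> str:
    """Build an HS256 JWT from the claims plus an `exp` timestamp (30 minutes by default)."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {**claims, "exp": int(time.time()) + ttl_seconds}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = _sign(signing_input.encode(), secret)
    return f"{signing_input}.{_b64url_encode(signature)}"


def decode_token(token: str, secret: str) -> dict:
    """Check algorithm, signature and expiry; return the claims or raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(f"{header_b64}.{payload_b64}".encode(), secret)
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) <= time.time():
        raise ValueError("token expired")
    return claims
```

The same decode path serves both user types. Middleware can branch on `claims["user_type"]` once the signature and expiry have been verified.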

4. RAG Implementation Strategy

Vector Store Architecture

# Pluggable Embedding Providers
- OpenAI: text-embedding-3-small
- NVIDIA NIM: nvidia/nv-embed-v1 (1024-dim)

# Vector Database: Pinecone
- 1024-dimensional vectors
- Automatic dimension normalization so every provider's vectors match the index dimension
- Similarity search with configurable thresholds

The NVIDIA API was used to access embedding models comparable to llama-text-embed-v2, because the OpenAI embeddings did not perform well enough for this use case.
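The document does not spell out what "automatic dimension normalization" does. A common approach, shown here purely as an assumption, is to truncate or zero-pad each vector to the index's 1024 dimensions and then L2-normalize it, so cosine and dot-product scores agree. Truncation only preserves quality for models trained to tolerate it. OpenAI's text-embedding-3 models, for example, accept a `dimensions` request parameter for this purpose. Vectors from different models are not comparable even at equal length, so each index should be populated by a single provider. `normalize_dimension` is a hypothetical name.

```python
import math

TARGET_DIM = 1024  # Pinecone index dimension stated in this document


def normalize_dimension(vec: list[float], target: int = TARGET_DIM) -> list[float]:
    """Truncate or zero-pad to `target` dims, then L2-normalize (assumed strategy)."""
    if len(vec) >= target:
        fitted = list(vec[:target])
    else:
        fitted = list(vec) + [0.0] * (target - len(vec))
    norm = math.sqrt(sum(x * x for x in fitted))
    if norm == 0.0:
        return fitted  # leave all-zero vectors unchanged instead of dividing by zero
    return [x / norm for x in fitted]
```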

RAG Pipeline Design

Query → Query Expansion → Embedding Generation → Vector Search → Context Retrieval → AI Generation → Response
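The stages above can be sketched end to end. An in-memory list stands in for Pinecone, and injected callables stand in for the embedding provider and the LLM. The defaults for threshold (0.08) and top-k (5) come from the RAG parameters listed under Technical Challenges. `Doc`, `retrieve`, and `answer` are hypothetical names, and the linear scan is only a stand-in for a real vector index.

```python
import math
from dataclasses import dataclass
from typing import Callable

Embed = Callable[[str], list[float]]
Expand = Callable[[str], list[str]]


@dataclass
class Doc:
    doc_id: str
    text: str
    vector: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, expand: Expand, embed: Embed, docs: list[Doc],
             threshold: float = 0.08, top_k: int = 5) -> list[tuple[float, Doc]]:
    """Query expansion → embedding → similarity search, keeping each doc's best score."""
    best: dict[str, tuple[float, Doc]] = {}
    for variant in expand(query):
        query_vec = embed(variant)
        for doc in docs:
            score = cosine(query_vec, doc.vector)
            if score >= threshold and score > best.get(doc.doc_id, (-1.0, doc))[0]:
                best[doc.doc_id] = (score, doc)
    ranked = sorted(best.values(), key=lambda pair: pair[0], reverse=True)
    return ranked[:top_k]


def answer(query: str, expand: Expand, embed: Embed, docs: list[Doc],
           generate: Callable[[str, str], str]) -> str:
    """Context retrieval → AI generation; the LLM call is injected as `generate`."""
    hits = retrieve(query, expand, embed, docs)
    context = "\n\n".join(doc.text for _, doc in hits)
    return generate(query, context)
```

Each document keeps only its best score across all query variants. This stops a document that matches many variants from crowding the top-k list.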

Query Expansion Strategy

The system expands each query into many search variations to retrieve as many relevant documents as possible:

1. Keyword Extraction & Combinations

Extract meaningful keywords from user queries
Generate single keyword searches for broad matching
Create bigram combinations for phrase matching
Combine top keywords in various combinations

2. Question-Based Variations

Add question starters (How, What, Why, When, Where) to non-question queries
Generate multiple question formats for better coverage
Include both question and non-question versions

3. Action-Oriented Expansions

Add action verbs: help, support, troubleshoot, resolve, fix, solve
Include information-seeking variations: information about, guide for
Cover different user intents and search patterns

4. Synonym-Based Searches

Map financial terms to common synonyms
Example: "transaction declined" → "payment rejected", "purchase failed"
Ensures matching even when users use different terminology

5. Context-Specific Expansions

Add domain-specific contexts for financial queries
Include: banking, financial services, payment processing, customer support
Broadens search scope while maintaining relevance, especially for a targeted knowledge base

Example: For query "One of my transactions got declined, what should I do?"

Generates 30+ search variations including:
- Single keywords: "transaction", "declined", "payment", "rejected"
- Phrases: "transaction declined", "payment rejected", "help with declined transaction"
- Questions: "How to fix declined transaction", "What is declined transaction"
- Context: "banking transaction declined", "customer support declined transaction"
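The five expansion strategies can be sketched as a single function. The action verbs, question starters, and context terms come from the lists above. The stopword list and synonym map are small illustrative assumptions, and the plural stripping is deliberately crude. The real implementation's vocabularies and exact output are not shown in this document.

```python
import re
from itertools import combinations

# Illustrative vocabularies; a production system would use larger, curated lists.
STOPWORDS = {"a", "an", "the", "of", "my", "i", "me", "one", "got", "what", "should",
             "do", "to", "is", "how", "why", "when", "where", "with", "for", "and", "or"}
QUESTION_STARTERS = ["How", "What", "Why", "When", "Where"]
ACTION_PREFIXES = ["help with", "support for", "troubleshoot", "resolve", "fix", "solve",
                   "information about", "guide for"]
SYNONYMS = {"declined": ["rejected", "failed"], "transaction": ["payment", "purchase"]}
CONTEXTS = ["banking", "financial services", "payment processing", "customer support"]


def extract_keywords(query: str) -> list[str]:
    """Lowercase, drop stopwords and very short words, crudely strip a plural 's'."""
    keywords: list[str] = []
    for word in re.findall(r"[a-z]+", query.lower()):
        if word in STOPWORDS or len(word) < 3:
            continue
        if word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]
        if word not in keywords:
            keywords.append(word)
    return keywords


def expand_query(query: str, max_keywords: int = 4) -> list[str]:
    """Return the query plus keyword, question, action, synonym and context variants."""
    keywords = extract_keywords(query)[:max_keywords]
    phrase = " ".join(keywords)
    variants = [query, *keywords, phrase]                                  # single keywords
    variants += [" ".join(pair) for pair in combinations(keywords, 2)]     # keyword pairs
    variants += [f"{starter} {phrase}" for starter in QUESTION_STARTERS]   # question forms
    variants += [f"{action} {phrase}" for action in ACTION_PREFIXES]       # user intents
    for i, keyword in enumerate(keywords):                                 # synonym swaps
        for synonym in SYNONYMS.get(keyword, []):
            variants.append(synonym)
            variants.append(" ".join(keywords[:i] + [synonym] + keywords[i + 1:]))
    variants += [f"{context} {phrase}" for context in CONTEXTS]            # domain contexts
    cleaned = [v.strip() for v in variants]
    return list(dict.fromkeys(v for v in cleaned if v))  # dedupe, keep first-seen order
```

For the example query, this sketch yields 29 unique variants. These include "transaction declined", "payment declined", "help with transaction declined", and "banking transaction declined". Larger synonym and context tables would produce more.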

5. Frontend Architecture (React + Vite + TypeScript)

Component Architecture

Client/
├── components/
│   ├── auth/          # Authentication components
│   └── chat/          # Chat interface components
├── contexts/          # React context for state management (specifically auth)
├── hooks/             # Custom hooks for state management (currently auth)
├── services/          # API client, utilities, and types
└── pages/             # Route-based components

State Management Strategy

React Context: Global auth state and user management
API Integration: Axios for HTTP requests with interceptors
Local State: React hooks for component-level state (a future iteration would move to TanStack Query or SSR for cleaner server-state management)

6. API Design Decisions

RESTful Endpoint Structure

/api/v1/auth/
├── POST /register          # User registration
├── POST /login             # User authentication
├── POST /anonymous         # Anonymous user creation
├── POST /convert-anonymous # User conversion
├── GET /me                 # Current user info
└── POST /change-password   # Password management

/api/v1/chat/
├── POST /send             # Send message
├── GET /sessions          # List chat sessions
├── GET /sessions/{id}     # Get specific session
├── POST /sessions         # Create new session
└── DELETE /sessions/{id}  # Delete session

Response Format Standardization

{
  "message": "AI response content",
  "session_id": "uuid",
  "sources": [{"title": "...", "content": "..."}],
  "tokens_used": 150,
  "processing_time": 1200
}
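In the backend this response shape is a Pydantic schema. As a dependency-free sketch, the same contract can be expressed with standard-library dataclasses. The class names and the `elapsed_ms` helper are hypothetical. `processing_time` is assumed to be in milliseconds, matching the `chat_messages.processing_time` column.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class Source:
    title: str
    content: str


@dataclass
class ChatResponse:
    message: str
    session_id: str                       # public UUID of the chat session
    sources: list[Source] = field(default_factory=list)
    tokens_used: int = 0
    processing_time: int = 0              # milliseconds

    def to_json(self) -> str:
        """Serialize to the standardized response shape (nested dataclasses included)."""
        return json.dumps(asdict(self))


def elapsed_ms(started: float) -> int:
    """Whole milliseconds since `started`, a time.perf_counter() reading."""
    return int((time.perf_counter() - started) * 1000)
```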

7. Security Architecture

Authentication & Authorization

JWT Tokens: Secure, stateless authentication
Token Expiration: 30-minute access tokens
Password Security: bcrypt hashing with salt
CORS Protection: Configurable allowed origins

API Security

Global Middleware: Automatic route protection
Input Validation: Pydantic schemas for all inputs
SQL Injection Prevention: SQLAlchemy ORM

Production Security

Environment Variables: Configuration and secrets supplied through environment variables, sourced from AWS Systems Manager Parameter Store or Secrets Manager
HTTPS Only: TLS encryption in production
Audit Logging: Request/response logging capability

8. AWS Deployment Strategy

Service Selection Rationale

Frontend: AWS Amplify
├── Global CDN distribution
├── Automatic HTTPS
├── Git-based deployments
└── Built-in CI/CD

Backend: AWS App Runner
├── Serverless container platform
├── Auto-scaling capabilities
├── Health check integration
└── Managed infrastructure

Database: Amazon RDS
├── Managed PostgreSQL
├── Automated backups
├── Multi-AZ availability
└── Security group isolation

Deployment Architecture

Amplify (Frontend) → App Runner (Backend) → RDS (Database)
                            ↓
                    Pinecone (Vector DB)

Environment Management

Development: Local environment with hot reload
Staging: App Runner with staging database
Production: App Runner with production RDS + monitoring
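This sketch shows how one codebase could serve all three environments by reading configuration from environment variables. In AWS, those variables would be populated from Parameter Store or Secrets Manager. The variable names `APP_ENV`, `DATABASE_URL`, `CORS_ORIGINS`, and `ACCESS_TOKEN_EXPIRE_MINUTES` are hypothetical, as is the local database URL. The 30-minute default mirrors the token lifetime stated in the security section, and `http://localhost:5173` is Vite's default dev-server address.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

ENVIRONMENTS = {"development", "staging", "production"}


@dataclass(frozen=True)
class Settings:
    environment: str
    database_url: str
    cors_origins: tuple[str, ...]
    access_token_minutes: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default); fail fast outside development."""
    env = os.environ if env is None else env
    environment = env.get("APP_ENV", "development")
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown APP_ENV: {environment!r}")
    database_url = env.get("DATABASE_URL", "")
    if not database_url:
        if environment != "development":
            # Staging and production must point at their own RDS instance explicitly.
            raise RuntimeError("DATABASE_URL must be set outside development")
        database_url = "postgresql://localhost:5432/ragchatbot"  # hypothetical local default
    raw_origins = env.get("CORS_ORIGINS", "http://localhost:5173")
    origins = tuple(o.strip() for o in raw_origins.split(",") if o.strip())
    return Settings(
        environment=environment,
        database_url=database_url,
        cors_origins=origins,
        access_token_minutes=int(env.get("ACCESS_TOKEN_EXPIRE_MINUTES", "30")),
    )
```

Failing at startup when a deployed environment lacks `DATABASE_URL` surfaces misconfiguration during the App Runner deploy rather than on the first request.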

9. Scalability Considerations

Horizontal Scaling

App Runner: Automatic scaling based on the number of concurrent requests per instance
RDS: Read replicas for high read loads
Pinecone: Managed vector database with auto-scaling (if more documents/vectors needed)

Performance Optimization

Database Indexing: Indexes on frequently queried columns, including foreign keys (PostgreSQL does not index these automatically)
Caching Strategy: Designed so a Redis session cache can be added
Connection Pooling: Database connection management
Async Processing: Non-blocking I/O operations
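"Redis-ready" suggests keeping session caching behind a small interface so the backing store can change later. A minimal sketch under that assumption is an in-process TTL cache that a Redis-backed class could replace by implementing the same `get`/`set` methods. redis-py's `set` accepts an `ex` expiry in seconds. `SessionCache`, `InMemoryTTLCache`, and `get_or_load` are hypothetical names. The in-process version lives inside one instance, so it is not shared across App Runner instances.

```python
import time
from typing import Any, Callable, Optional, Protocol


class SessionCache(Protocol):
    """Interface the app depends on; a Redis client wrapper could implement it."""

    def get(self, key: str) -> Optional[Any]: ...

    def set(self, key: str, value: Any, ttl_seconds: int) -> None: ...


class InMemoryTTLCache:
    """Single-process cache whose entries expire after ttl_seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # lazily evict expired entries on read
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: int) -> None:
        self._items[key] = (self._clock() + ttl_seconds, value)


def get_or_load(cache: SessionCache, key: str, loader: Callable[[], Any],
                ttl_seconds: int = 300) -> Any:
    """Read-through helper: serve from cache, otherwise load (e.g. from PostgreSQL) and cache."""
    value = cache.get(key)
    if value is None:
        value = loader()
        cache.set(key, value, ttl_seconds)
    return value
```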

Monitoring & Observability

Health Checks: /health endpoint for load balancers
Logging: Structured logging for debugging
Metrics: CloudWatch integration ready
Error Handling: Comprehensive exception management
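This is a minimal sketch of structured logging using only the standard `logging` module. Each record becomes one JSON object, which CloudWatch Logs Insights can query by field. The extra field names (`request_id`, `path`, `status_code`, `duration_ms`) and the `ragchatbot` logger name are assumptions. The fields are passed in via `extra=`.

```python
import json
import logging

EXTRA_FIELDS = ("request_id", "path", "status_code", "duration_ms")  # assumed field names


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for key in EXTRA_FIELDS:
            if hasattr(record, key):  # present only when supplied via extra=
                payload[key] = getattr(record, key)
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def get_logger(name: str = "ragchatbot") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr, configured once."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False  # avoid duplicate lines via the root logger
    return logger
```

Usage: `get_logger().info("chat handled", extra={"path": "/api/v1/chat/send", "status_code": 200, "duration_ms": 1200})`.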

Technical Challenges & Solutions

1. Anonymous User Persistence

Challenge: How to maintain chat history for anonymous users without database storage?

Solution:

UUID-based identification in JWT tokens
Session-based chat history with ephemeral storage
Conversion path to authenticated accounts

2. RAG Pipeline

Challenge: How to integrate multiple embedding providers with consistent vector dimensions and ensure comprehensive document retrieval?

Solution:

Pluggable Embedding Architecture: Support for multiple providers (OpenAI text-embedding-3-small, NVIDIA NIM nvidia/nv-embed-v1)
Advanced Query Expansion: 30+ search variations including keywords, synonyms, questions, and context-specific expansions
Configurable RAG Parameters: Chunk size (1000), overlap (200), similarity threshold (0.08), top-k results (5)
Quality Filtering: Duplicate detection, relevance scoring, and content similarity checks
Fallback Mechanisms: Graceful fallbacks when embedding services are unavailable
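The chunking parameters and the duplicate check can be sketched as follows. This sketch assumes chunk size and overlap are counted in characters; the document does not say whether characters or tokens are meant. It treats chunks as duplicates when they match after whitespace and case normalization. Both functions are hypothetical illustrations, not the project's code.

```python
import hashlib
import re


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters, each sharing `overlap` with the last."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def dedupe_chunks(chunks: list[str]) -> list[str]:
    """Drop chunks that are identical after collapsing whitespace and lowercasing."""
    seen: set[str] = set()
    unique: list[str] = []
    for chunk in chunks:
        normalized = re.sub(r"\s+", " ", chunk).strip().lower()
        key = hashlib.sha256(normalized.encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique
```

The 200-character overlap means a sentence cut at one chunk boundary still appears whole in the neighbouring chunk.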

3. Real-time Chat Experience

Challenge: How to keep the chat interface responsive while responses are generated asynchronously?

Solution:

Optimistic UI updates
Loading states and error handling
Efficient API design with minimal round trips

Conclusion

This architecture provides a solid foundation for a production-ready RAG chatbot with:

Security First: Comprehensive authentication and authorization
Developer Experience: Clean code structure and documentation
Production Ready: AWS deployment strategy with monitoring
Future-Proof: Pluggable, modular components that can be adapted and scaled
