
AI Model Comparison Tool

A sophisticated web application for comparing responses from multiple AI models simultaneously. Built with React, TypeScript, and Express, supporting all major AI providers with real-time response comparison and analysis.

Architecture Overview

Core Technology Stack

  • Frontend: React 18 + TypeScript + Vite for development/building
  • Backend: Express.js + TypeScript with ES modules
  • Database: PostgreSQL with Drizzle ORM (with in-memory fallback)
  • Styling: Tailwind CSS + shadcn/ui component library
  • State Management: Zustand store with TanStack Query for server state
  • Variable System: Unified template engine with server-side resolution
  • Routing: Wouter for lightweight client-side routing
  • Streaming: Server-Sent Events (SSE) for real-time AI responses with browser extension compatibility (v0.4.5)

State Management Architecture

Variable Resolution System

  • Isomorphic Engine: Shared template resolution between frontend preview and server execution
  • Single Source of Truth: Server performs authoritative variable substitution with audit logging
  • Type-Safe Registry: Mode-specific variable schemas with validation and auto-generated UI
  • Migration Support: Backward compatibility with alias mapping (e.g., {RESPONSE} → {response})
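A minimal sketch of what such an isomorphic engine can look like: the same pure function runs in the browser (preview) and on the server (authoritative substitution). The names `resolveTemplate` and `ALIASES` are illustrative, not the repo's actual API.

```typescript
// Legacy placeholder names map onto canonical lowercase variables,
// so old templates keep working after the migration.
const ALIASES: Record<string, string> = {
  RESPONSE: "response", // legacy {RESPONSE} resolves to {response}
};

function resolveTemplate(
  template: string,
  vars: Record<string, string>,
): string {
  return template.replace(/\{(\w+)\}/g, (match, name: string) => {
    const key = ALIASES[name] ?? name;
    // Unknown placeholders are left intact so the server can audit them.
    return key in vars ? vars[key] : match;
  });
}
```

Because the function is pure and shared, the frontend preview and the server's audited substitution cannot drift apart.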

Unified API Design

  • Single Endpoint: /api/generate handles all modes (creative, battle, debate, compare)
  • Streaming Support: Real-time SSE events for token-by-token updates
  • Legacy Compatibility: Feature-flagged legacy routes during transition period
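An illustrative request shape for the unified /api/generate endpoint; the field names below are assumptions inferred from the modes described above, not the actual contract.

```typescript
// One endpoint, discriminated by mode.
type GenerateMode = "creative" | "battle" | "debate" | "compare";

interface GenerateRequest {
  mode: GenerateMode;
  template: string;                  // prompt template with {variables}
  variables: Record<string, string>; // resolved server-side
  models: string[];                  // selected model keys
  stream?: boolean;                  // request SSE token-by-token updates
}

const example: GenerateRequest = {
  mode: "compare",
  template: "Summarize: {topic}",
  variables: { topic: "transformer attention" },
  models: ["openai/gpt-5", "anthropic/claude-sonnet-4"],
  stream: true,
};
```

Keeping all modes behind one typed request makes the feature-flagged legacy routes easy to retire once clients migrate.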

Store Architecture

  • Zustand Integration: Optimized state management replacing useState patterns
  • Derived State: Computed values via selectors prevent drift and enable streaming
  • Message-Centric: UnifiedMessage format supports debates, tools, and streaming status
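A minimal sketch of the message-centric state and a derived selector. The real store uses Zustand, but the selector logic is plain TypeScript; the `UnifiedMessage` fields beyond what the text names are assumptions.

```typescript
interface UnifiedMessage {
  id: string;
  modelKey: string;
  content: string;
  status: "pending" | "streaming" | "complete" | "error";
}

// Derived state: computed from the message list rather than stored,
// so it can never drift out of sync while tokens stream in.
const selectActiveStreams = (messages: UnifiedMessage[]): number =>
  messages.filter((m) => m.status === "streaming").length;
```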

Project Structure

├── client/                 # Frontend React application
│   ├── src/
│   │   ├── components/     # Reusable UI components
│   │   ├── hooks/          # Custom React hooks
│   │   ├── lib/            # Utility libraries and configurations
│   │   ├── pages/          # Page components (routing)
│   │   └── types/          # TypeScript type definitions
├── server/                 # Backend Express API
│   ├── services/           # Business logic and external integrations
│   ├── routes.ts           # API route definitions
│   ├── storage.ts          # Data persistence interface
│   └── index.ts            # Server entry point
├── shared/                 # Shared types and schemas
└── attached_assets/        # Static assets and generated images

Model Source of Truth

All AI model configurations, capabilities, pricing, and specifications are maintained in the modular provider system located in server/providers/. This serves as the single source of truth for:

Latest Model Versions (Updated: August 10, 2025)

  • OpenAI: GPT-5 (flagship model), GPT-4.1 series, o3/o4 reasoning models
  • Anthropic: Claude Sonnet 4, Claude 3.7 Sonnet, Claude 3.5 series
  • Google: Gemini 2.5 Pro/Flash, Gemini 2.0 Flash series
  • xAI: Grok 4 (reasoning), Grok 3 series variants
  • DeepSeek: R1 Reasoner (CoT), V3 Chat

Model Capabilities Tracked

  • Reasoning: Full chain-of-thought support with visible reasoning logs
  • Multimodal: Text and image input processing capabilities
  • Function Calling: Tool use and API integration support
  • Streaming: Real-time response generation support

Provider-Specific Features

  • OpenAI: All models use the Responses API (/v1/responses) exclusively; the proactive migration was completed in October 2025, ahead of the mid-2026 Chat Completions deprecation. Full reasoning support via reasoning.summary = "auto" for GPT-5 and the o3/o4 series
  • Anthropic: Structured reasoning with <reasoning> tags for Claude 3.7/4
  • Google: Thinking budget configuration for Gemini 2.5 models
  • xAI: Advanced reasoning capabilities in Grok 4
  • DeepSeek: Complete reasoning transparency via reasoning_content field

🔄 OpenAI Responses API Proactive Migration (October 2025)

PROACTIVE MIGRATION: As of October 2025, ModelCompare exclusively uses OpenAI's modern Responses API (/v1/responses). While OpenAI's legacy Chat Completions API remains supported until mid-2026, we've migrated early to ensure forward compatibility and unlock advanced reasoning capabilities.

Migration Details:

  • Endpoint: All OpenAI calls use the Responses API (/v1/responses) exclusively; the legacy Chat Completions API is not used in this codebase
  • Reasoning Support: Requests include reasoning.summary = "auto" for GPT-5 and o-series models; UI displays full reasoning summaries
  • Output Token Caps:
    • GPT-5 series (flagship, mini, nano): max_output_tokens = 128000
    • All other OpenAI models default to max_output_tokens = 16384
    • Global minimum floor of 16,300 visible output tokens enforced at provider level (env overrides cannot go lower)
  • Timeouts: Default request timeout is 10 minutes (600,000 ms), configurable via OPENAI_TIMEOUT_MS
  • Retries: Exponential backoff on HTTP 429/5xx and timeouts (up to 3 attempts for reliability)
  • Content Parsing: Primary content from response.output_text; reasoning from response.output_reasoning.summary with safe fallbacks
  • Conversation Chaining: Response IDs (response.id) captured for multi-turn debates and conversations
  • Diagnostics: Optional raw JSON logging via DEBUG_SAVE_RAW environment variable for debugging
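The retry policy above can be sketched as a small generic helper: exponential backoff, up to 3 attempts. The `call` parameter stands in for the actual Responses API request; the name `withRetries` is illustrative.

```typescript
// Retry a request with exponential backoff. In the real provider, only
// HTTP 429/5xx and timeout errors are retried; this sketch retries any
// rejection for brevity.
async function withRetries<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      // 500 ms, 1 s, 2 s, ... between attempts.
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError; // exhausted: surface the last failure
}
```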

API Integration Architecture

All AI providers are abstracted through a unified modular provider system that:

  1. Normalizes Request Format: Converts internal prompt format to provider-specific API calls
  2. Handles Authentication: Manages API keys securely through environment variables
  3. Implements Error Handling: Provides graceful degradation when providers fail
  4. Tracks Performance: Measures response times for comparison
  5. Supports Concurrency: Makes parallel API calls for efficient comparison
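A hypothetical shape for the provider abstraction the five points describe; the interface and method names are assumptions, not the repo's actual types.

```typescript
interface ProviderRequest {
  model: string;
  prompt: string;
  maxOutputTokens?: number;
}

interface ProviderResponse {
  content: string;
  responseTimeMs: number;
  error?: string;
}

interface AIProvider {
  name: string;
  // Normalizes the internal prompt format into a provider-specific call.
  generate(req: ProviderRequest): Promise<ProviderResponse>;
}

// Graceful degradation: a failing provider yields an error entry
// instead of rejecting the whole comparison.
async function callProvider(
  p: AIProvider,
  req: ProviderRequest,
): Promise<ProviderResponse> {
  const start = Date.now();
  try {
    return await p.generate(req);
  } catch (err) {
    return { content: "", responseTimeMs: Date.now() - start, error: String(err) };
  }
}
```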

Provider Configuration

Each provider requires specific environment variables:

  • OPENAI_API_KEY - OpenAI API access
  • ANTHROPIC_API_KEY - Anthropic Claude access
  • GEMINI_API_KEY - Google Gemini access
  • GROK_API_KEY - xAI Grok access
  • DEEPSEEK_API_KEY - DeepSeek access

OpenAI Responses Configuration (Environment)

  • OPENAI_MAX_OUTPUT_TOKENS (optional)
    • Overrides default caps above. A provider-level floor of 16,300 is always enforced.
  • OPENAI_TIMEOUT_MS (optional)
    • Overrides default 10-minute timeout.
  • DEBUG_SAVE_RAW (optional)
    • When set, enables saving raw OpenAI Responses JSON for diagnostics.
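The cap resolution described above (model-family defaults, an optional env override, and a hard floor of 16,300 the override cannot undercut) can be sketched as follows; the helper name is illustrative.

```typescript
const OPENAI_OUTPUT_FLOOR = 16_300;

function resolveMaxOutputTokens(model: string, envOverride?: string): number {
  // GPT-5 family (flagship, mini, nano) gets the large cap.
  const isGpt5 = model.startsWith("gpt-5");
  const defaultCap = isGpt5 ? 128_000 : 16_384;
  const override = envOverride ? Number.parseInt(envOverride, 10) : NaN;
  const cap = Number.isFinite(override) ? override : defaultCap;
  // The provider-level floor always wins over a too-low env override.
  return Math.max(OPENAI_OUTPUT_FLOOR, cap);
}
```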

Data Architecture

Database Schema (shared/schema.ts)

The application uses a flexible schema supporting both PostgreSQL and in-memory storage:

// Core comparison result storage (shared/schema.ts)
import { pgTable, text, json, timestamp } from 'drizzle-orm/pg-core';

export const comparisons = pgTable('comparisons', {
  id: text('id').primaryKey(),
  prompt: text('prompt').notNull(),
  selectedModels: text('selected_models').array().notNull(),
  responses: json('responses').$type<Record<string, ModelResponse>>().notNull(),
  createdAt: timestamp('created_at').defaultNow().notNull(),
});

Storage Interface

The IStorage interface in server/storage.ts provides:

  • CRUD Operations: Create, read, update, delete comparisons
  • Type Safety: Full TypeScript support with Drizzle ORM
  • Fallback Support: Automatic fallback to in-memory storage
  • Session Management: PostgreSQL-backed user sessions
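A minimal sketch of the IStorage contract together with the in-memory fallback the text mentions; the actual interface in server/storage.ts is richer than this.

```typescript
interface Comparison {
  id: string;
  prompt: string;
  selectedModels: string[];
}

interface IStorage {
  createComparison(c: Comparison): Promise<Comparison>;
  getComparison(id: string): Promise<Comparison | undefined>;
}

// In-memory fallback used when PostgreSQL is unavailable; same async
// contract, so callers never know which backend they are talking to.
class MemStorage implements IStorage {
  private items = new Map<string, Comparison>();

  async createComparison(c: Comparison): Promise<Comparison> {
    this.items.set(c.id, c);
    return c;
  }

  async getComparison(id: string): Promise<Comparison | undefined> {
    return this.items.get(id);
  }
}
```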

Core Features

Comparison Modes

Compare Mode (/)

  • Side-by-side model response comparison
  • Multi-model selection with provider grouping
  • Real-time response timing and cost tracking
  • Export and raw prompt preview functionality

Battle Chat Mode (/battle)

  • Interactive chat-style model comparison with unlimited model seats
  • Turn-based conversation analysis with proper prompt memory persistence
  • PersonX/Challenger prompt template system for dynamic rebuttals
  • Challenger prompts automatically receive previous responses and original prompts

Debate Mode (/debate)

  • Structured 10-round AI debates between models with real-time streaming (v0.4.4+)
  • Topic selection with adversarial intensity levels (1-10)
  • Same-model debate support for self-analysis scenarios
  • Automated debate progression with manual turn-by-turn controls
  • Advanced Streaming Features:
    • Server-Sent Events (SSE) for real-time reasoning and content display
    • Two-phase streaming handshake via POST /api/debate/stream/init followed by GET /api/debate/stream/:taskId/:modelKey/:sessionId
    • Conversation chaining using OpenAI Responses API response.id tracking
    • Database session persistence with turn history
    • Model-specific configuration (reasoning effort, temperature, max tokens)
    • Live progress indicators and cost estimation during generation
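The two-phase handshake can be sketched as: POST creates and validates the streaming session, then the client opens an SSE connection at the returned coordinates. The function names and the init response fields are assumptions.

```typescript
function buildStreamUrl(taskId: string, modelKey: string, sessionId: string): string {
  // Model keys may contain "/" (e.g. "openai/gpt-5"), so encode each segment.
  return `/api/debate/stream/${encodeURIComponent(taskId)}/${encodeURIComponent(modelKey)}/${encodeURIComponent(sessionId)}`;
}

async function initDebateStream(payload: unknown): Promise<string> {
  // Phase 1: validate the payload and create the streaming session.
  const init = await fetch("/api/debate/stream/init", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  const { taskId, modelKey, sessionId } = await init.json();
  // Phase 2: the client subscribes with `new EventSource(url)` to receive
  // reasoning and content tokens as SSE events.
  return buildStreamUrl(taskId, modelKey, sessionId);
}
```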

Creative Combat Mode (/creative)

  • Sequential creative editing workflow
  • Multiple AI models enhance content iteratively
  • Editorial pass tracking and version comparison
  • Manual model selection for each enhancement round

Vixra Mode (/vixra)

  • Generate satirical academic-style papers with template-driven sections
  • Auto Mode: One-click generation of complete papers with automatic section progression
  • Intelligent dependency resolution (abstract → introduction → methodology → results → discussion → conclusion)
  • Real-time progress tracking with pause/resume functionality
  • Manual section control still available alongside auto mode
  • Uses the same model selection UI and ResponseCard display as Compare mode
  • Loads templates from client/public/docs/vixra-prompts.md
  • Calls existing endpoints (GET /api/models, POST /api/models/respond)

Universal Features

Export Functionality

  • Comprehensive markdown export across all modes
  • One-click clipboard copy for easy sharing
  • Safe filename generation for downloaded files
  • Session metadata and timing information included

Raw Prompt Preview

  • Transparency widgets showing exact prompts sent to models
  • Toggle visibility with Eye icon controls
  • Template variable substitution preview
  • Debugging and prompt optimization support

Browser Extension Compatibility (v0.4.5)

Problem Solved: Browser extensions (LastPass, Grammarly, etc.) were causing application crashes during streaming operations due to MutationObserver interference.

Comprehensive Solution Applied:

  • All Streaming Modes Protected: Debate, Battle Chat, Vixra, and Luigi modes now include defensive programming against browser extension interference
  • Defensive Pattern:
    • Null checks before all scrollIntoView operations
    • Try-catch blocks with debug logging (no error propagation)
    • Browser extension opt-out data attributes on all dynamic content containers
  • Extensions Supported:
    • Grammarly: data-gramm="false", data-gramm_editor="false", data-enable-grammarly="false"
    • LastPass: data-lpignore="true", data-form-type="other"
  • Impact: All streaming and chat interfaces now gracefully handle rapid DOM updates without crashes, even with multiple browser extensions active

Technical Details: Browser extensions inject content scripts using MutationObserver to watch DOM changes. During rapid streaming updates, these observers can fail when trying to observe nodes being removed/updated, causing crashes. Our defensive pattern prevents these failures from affecting the user experience.
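The defensive pattern can be sketched as a null check plus a try/catch that logs and swallows, so an extension-induced failure never propagates into React. The structural type stands in for a DOM Element so the sketch stays environment-independent; the helper name is illustrative.

```typescript
type Scrollable = { scrollIntoView: (opts?: { behavior?: string }) => void } | null;

function safeScrollIntoView(el: Scrollable): boolean {
  if (!el) return false; // the node may already be gone mid-stream
  try {
    el.scrollIntoView({ behavior: "smooth" });
    return true;
  } catch (err) {
    // Log for debugging, never rethrow: extension interference is benign.
    console.debug("scrollIntoView suppressed:", err);
    return false;
  }
}
```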

Frontend Architecture

Modular Component Architecture

The application follows a strict modular component approach to ensure consistency and reusability across all modes:

Core Reusable Components

  • AppNavigation - Unified navigation header with theme toggle
  • ModelButton - Enhanced model selection with provider colors, capabilities, and cost info
  • MessageCard - Universal message display for all modes (responses, reasoning, costs)
  • ExportButton - Standardized export functionality (markdown, clipboard)
  • ThemeProvider - Global dark/light mode management

Component Hierarchy

App (Theme Provider + Router)
├── AppNavigation (consistent across all pages)
├── Home Page (Compare Mode)
│   ├── ModelButton[] (provider-grouped with quick actions)
│   ├── PromptInput (template system integration)
│   └── ResponseCard[] (individual model responses)
├── Battle Chat Page
│   ├── ModelButton[] (same selection UI)
│   └── MessageCard[] (turn-based conversation display)
├── Debate Page
│   ├── ModelButton[] (consistent model selection)
│   └── MessageCard[] (structured debate display)
├── Creative Combat Page
│   ├── ModelButton[] (reused provider-grouped layout)
│   └── MessageCard[] (creative pass evolution)
└── ThemeProvider (light/dark mode support)

Design Principles

  1. Consistency: All pages use the same ModelButton layout and MessageCard display
  2. Reusability: Components are designed to work across different modes
  3. Type Safety: Shared TypeScript interfaces ensure compatibility
  4. No Duplication: Custom UI is avoided in favor of existing components

State Management Pattern

The application uses TanStack Query for all server state:

  1. Model Loading: useQuery for fetching available models
  2. Comparison Execution: useMutation for submitting prompts
  3. Cache Management: Automatic invalidation and refetching
  4. Loading States: Built-in loading and error state handling

Form Handling

React Hook Form with Zod validation provides:

  • Type-safe Forms: Schema-driven validation
  • Real-time Validation: Immediate feedback
  • Performance: Minimal re-renders
  • Accessibility: ARIA compliance

Backend Architecture

API Design

RESTful endpoints with clear responsibilities:

# Core Comparison
GET  /api/models           # Fetch available AI models
POST /api/compare          # Submit prompt for comparison
GET  /api/comparisons/:id  # Retrieve specific comparison

# Debate Mode (with streaming)
POST /api/debate/session      # Create new debate session
GET  /api/debate/sessions     # List existing debate sessions
POST /api/debate/stream/init                     # Validate payload and create streaming session
GET  /api/debate/stream/:taskId/:modelKey/:sessionId  # Stream debate responses via SSE

# Model Responses
POST /api/models/respond      # Get single model response

Request/Response Flow

  1. Model Selection: Frontend fetches available models
  2. Prompt Submission: User input validated and submitted
  3. Parallel Processing: Backend calls multiple AI APIs simultaneously
  4. Response Aggregation: Results collected and formatted
  5. Real-time Updates: Frontend receives formatted responses
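Steps 3 and 4 above can be sketched with Promise.allSettled: fan out to all selected models in parallel and aggregate, so one slow or failing provider cannot block the rest. The `askModel` parameter is a stand-in for the real provider dispatch.

```typescript
async function compareAll(
  models: string[],
  askModel: (model: string) => Promise<string>,
): Promise<Record<string, { ok: boolean; value: string }>> {
  // allSettled never rejects: every provider reports either a value
  // or a reason, keyed back to its model.
  const settled = await Promise.allSettled(models.map((m) => askModel(m)));
  const out: Record<string, { ok: boolean; value: string }> = {};
  settled.forEach((res, i) => {
    out[models[i]] =
      res.status === "fulfilled"
        ? { ok: true, value: res.value }
        : { ok: false, value: String(res.reason) };
  });
  return out;
}
```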

Error Handling Strategy

Multi-layered error handling:

  • Provider Level: Individual API failures don't affect others
  • Request Level: Validation errors return 400 with details
  • Server Level: Unexpected errors return 500 with safe messages
  • Client Level: UI shows specific error states per model

Development Workflow

File Modification Guidelines

  1. Shared Types First: Always update shared/schema.ts for data model changes
  2. Backend Implementation: Implement storage and API routes
  3. Frontend Integration: Update components and hooks
  4. Testing: Verify end-to-end functionality

Key Development Files

  • shared/schema.ts - Central data model definitions
  • server/services/ai-providers.ts - AI integration logic
  • client/src/pages/home.tsx - Main application interface
  • client/src/components/ - Reusable UI components
  • server/routes.ts - API endpoint definitions

Code Style & Standards

  • TypeScript: Strict type checking enabled
  • ES Modules: Import/export syntax throughout
  • React Patterns: Hooks-based functional components
  • Tailwind: Utility-first CSS with design system
  • Error Boundaries: Graceful error handling

Performance Considerations

Frontend Optimizations

  • Code Splitting: Automatic route-based splitting with Vite
  • Query Caching: TanStack Query reduces redundant API calls
  • Virtual DOM: React optimizations for large response lists
  • Lazy Loading: Components loaded on demand

Backend Optimizations

  • Concurrent API Calls: Parallel provider requests
  • Connection Pooling: PostgreSQL connection management
  • Memory Management: Efficient in-memory fallback storage
  • Request Validation: Early rejection of invalid requests

Scaling Considerations

  • Stateless Design: Easy horizontal scaling
  • Database Separation: Can extract to dedicated DB instance
  • CDN Ready: Static assets can be served from CDN
  • API Rate Limiting: Ready for rate limiting middleware

Security & Best Practices

API Key Management

  • Environment variable storage only
  • No keys in client-side code
  • Separate keys per environment
  • Secure transmission to providers

Data Privacy

  • No persistent storage of API responses by default
  • User prompts stored only if explicitly saved
  • No cross-user data access
  • Secure session management

Input Validation

  • Client-side validation for UX
  • Server-side validation for security
  • Schema-driven validation with Zod
  • Sanitization of user inputs

Future Enhancements

Planned Features

  1. Battle Mode: Models critique each other's responses
  2. Response Analytics: Sentiment analysis and metrics
  3. Export Functionality: Save comparisons as PDF/JSON
  4. Custom Model Configs: Temperature and parameter controls
  5. Prompt Templates: Saved prompt collections
  6. Collaboration: Share comparisons with others

Technical Roadmap

  • Streaming Responses: Real-time response streaming
  • Advanced Caching: Redis integration for better performance
  • Monitoring: Application performance monitoring
  • Testing Suite: Comprehensive test coverage
  • Documentation: API documentation with OpenAPI

Environment Setup

All API keys and optional configuration overrides are loaded from the .env file.

Development

Set NODE_ENV to select the runtime mode:

NODE_ENV=development


Development Commands

  • npm run dev - Start development server (frontend + backend)
  • npm run build - Build for production
  • npm run preview - Preview production build

The application automatically handles missing API keys by excluding unavailable providers from the model selection interface.