Classify CLI - LLM Providers

Version: 1.0
Last Updated: 2025-01-26

Overview

Classify supports 6 LLM providers with 30+ models optimized for document classification. This guide covers provider-specific configuration, model selection strategies, pricing, and integration details.

Provider Interface

All providers implement the same interface:

interface LLMProvider {
  name: string;
  
  // Template selection (Stage 1)
  selectTemplate(
    documentText: string,
    templateCatalog: TemplateCatalog
  ): Promise<TemplateSelection>;
  
  // Classification (Stage 2)
  classify(
    documentText: string,
    template: Template,
    config: LLMConfig
  ): Promise<ClassificationResult>;
  
  // Utility methods
  validateApiKey(): Promise<boolean>;
  getCost(tokens: number): {input: number; output: number};
  getModelInfo(): ModelInfo;
  healthCheck(): Promise<boolean>;
}
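As a minimal sketch of how the utility methods might be used together, the helper below filters a list of providers down to those whose API key validates and whose health check passes. `ProviderChecks` and `readyProviders` are hypothetical names introduced for illustration; only `name`, `validateApiKey()`, and `healthCheck()` come from the interface above.

```typescript
// Hypothetical subset of LLMProvider: only the members this sketch needs.
interface ProviderChecks {
  name: string;
  validateApiKey(): Promise<boolean>;
  healthCheck(): Promise<boolean>;
}

// Returns the names of providers whose key validates and whose health check passes.
// A check that throws is treated as "not ready" instead of aborting the whole scan.
async function readyProviders(providers: ProviderChecks[]): Promise<string[]> {
  const results = await Promise.all(
    providers.map(async (p) => {
      try {
        const ok = (await p.validateApiKey()) && (await p.healthCheck());
        return ok ? p.name : null;
      } catch {
        return null;
      }
    })
  );
  return results.filter((name): name is string => name !== null);
}
```

Running the checks in parallel keeps startup time close to that of the slowest provider.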

Supported Providers

1. DeepSeek (RECOMMENDED - Default)

Best for: Cost-effective general classification

Configuration:

Provider: DeepSeek
Default Model: deepseek-chat
API Endpoint: https://api.deepseek.com/v1/chat/completions
Authentication: Bearer token
API Key Env: DEEPSEEK_API_KEY

Available Models:

Model	Pricing (input / output, USD per 1M tokens)	Speed	Best For
`deepseek-chat`	$0.14 / $0.28	Medium	General classification (RECOMMENDED)
`deepseek-r1`	$0.14 / $0.28	Medium	Reasoning tasks
`deepseek-reasoner`	$0.14 / $0.28	Slow	Complex analysis
`deepseek-v3`	$0.14 / $0.28	Medium	Latest features

Example Usage:

const client = new ClassifyClient({
  provider: 'deepseek',
  model: 'deepseek-chat',
  apiKey: process.env.DEEPSEEK_API_KEY
});

CLI:

npx @hivellm/classify document file.pdf --provider deepseek --model deepseek-chat

Rate Limits:

Requests: 60/minute
Tokens: 1M/minute
Concurrent: 10 requests

Error Handling:

429: Rate limit exceeded → retry with exponential backoff
500: Service error → fallback to alternative model
401: Invalid API key → check DEEPSEEK_API_KEY
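The status-code handling listed above can be sketched as a small lookup. `RecoveryAction` and `recoveryFor` are hypothetical names, and treating every other status as a hard failure is an assumption, not documented behavior.

```typescript
// Hypothetical action labels for the three documented status codes.
type RecoveryAction = "retry-with-backoff" | "fallback-model" | "check-api-key" | "fail";

// Maps an HTTP status from the provider to the recovery step listed in this guide.
function recoveryFor(status: number): RecoveryAction {
  switch (status) {
    case 429:
      return "retry-with-backoff"; // rate limit exceeded
    case 500:
      return "fallback-model"; // service error
    case 401:
      return "check-api-key"; // invalid API key
    default:
      return "fail"; // assumption: anything else is not recoverable
  }
}
```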

2. OpenAI

Best for: High-accuracy requirements

Configuration:

Provider: OpenAI
Default Model: gpt-4o-mini
API Endpoint: https://api.openai.com/v1/chat/completions
Authentication: Bearer token
API Key Env: OPENAI_API_KEY

Available Models:

Model	Pricing (input / output, USD per 1M tokens)	Speed	Best For
`gpt-4o-mini`	$0.15 / $0.60	Fast	Cost-effective accuracy (RECOMMENDED)
`chatgpt-4o-latest`	$5.00 / $15.00	Medium	Latest ChatGPT
`gpt-4o`	$5.00 / $15.00	Medium	Standard GPT-4 Optimized
`gpt-5-mini`	$0.50 / $1.50	Fast	Latest mini model
`gpt-4.1-mini`	$0.30 / $1.20	Fast	Version 4.1 mini
`o1-mini`	$3.00 / $12.00	Slow	Reasoning model
`gpt-4-turbo`	$10.00 / $30.00	Fast	Legacy turbo
`gpt-4o-search-preview`	$5.00 / $15.00	Medium	With search capability

Example Usage:

const client = new ClassifyClient({
  provider: 'openai',
  model: 'gpt-4o-mini',
  apiKey: process.env.OPENAI_API_KEY
});

CLI:

npx @hivellm/classify document file.pdf --provider openai --model gpt-4o-mini

Rate Limits (Tier 3):

Requests: 5,000/minute
Tokens: 2M/minute
Concurrent: 500 requests

3. Anthropic

Best for: Fast, quality classification

Configuration:

Provider: Anthropic
Default Model: claude-3-5-haiku-latest
API Endpoint: https://api.anthropic.com/v1/messages
Authentication: x-api-key header
API Key Env: ANTHROPIC_API_KEY

Available Models:

Model	Pricing (input / output, USD per 1M tokens)	Speed	Best For
`claude-3-5-haiku-latest`	$0.25 / $1.25	Fast	Fast & cheap (RECOMMENDED)
`claude-3-7-sonnet-latest`	$3.00 / $15.00	Medium	Sonnet 3.7
`claude-4-sonnet-20250514`	$3.00 / $15.00	Medium	Claude 4 latest
`claude-sonnet-4-20250514`	$3.00 / $15.00	Medium	Sonnet 4 variant
`claude-3-5-sonnet-latest`	$3.00 / $15.00	Medium	Sonnet 3.5
`claude-3-5-sonnet-20241022`	$3.00 / $15.00	Medium	Dated version
`claude-3-opus-latest`	$15.00 / $75.00	Slow	Highest quality (expensive)

Example Usage:

const client = new ClassifyClient({
  provider: 'anthropic',
  model: 'claude-3-5-haiku-latest',
  apiKey: process.env.ANTHROPIC_API_KEY
});

CLI:

npx @hivellm/classify document file.pdf --provider anthropic --model claude-3-5-haiku-latest

Rate Limits:

Requests: 4,000/minute
Tokens: 400K/minute
Concurrent: 50 requests

4. Google Gemini

Best for: Fast classification within the Google ecosystem

Configuration:

Provider: Gemini
Default Model: gemini-2.0-flash
API Endpoint: https://generativelanguage.googleapis.com/v1/models
Authentication: API key parameter
API Key Env: GEMINI_API_KEY

Available Models:

Model	Pricing (input / output, USD per 1M tokens)	Speed	Best For
`gemini-2.0-flash`	$0.50 / $1.50	Fast	Fast version 2.0 (RECOMMENDED)
`gemini-2.5-flash`	$0.50 / $1.50	Fast	Latest flash
`gemini-2.5-pro`	$1.25 / $5.00	Medium	Pro version
`gemini-1.5-pro-latest`	$3.50 / $10.50	Medium	Legacy pro
`gemini-1.5-flash-latest`	$0.35 / $1.05	Fast	Legacy flash

Example Usage:

const client = new ClassifyClient({
  provider: 'gemini',
  model: 'gemini-2.0-flash',
  apiKey: process.env.GEMINI_API_KEY
});

CLI:

npx @hivellm/classify document file.pdf --provider gemini --model gemini-2.0-flash

Rate Limits:

Requests: 1,500/minute (free tier)
Tokens: 1.5M/minute
Concurrent: 15 requests

5. xAI (Grok)

Best for: Alternative provider

Configuration:

Provider: xAI
Default Model: grok-3-mini-latest
API Endpoint: https://api.x.ai/v1/chat/completions
Authentication: Bearer token
API Key Env: XAI_API_KEY

Available Models:

Model	Pricing	Speed	Best For
`grok-3-mini-latest`	Variable	Fast	Cost-effective (RECOMMENDED)
`grok-3-fast-latest`	Variable	Very Fast	Speed priority
`grok-code-fast-1`	Variable	Fast	Code-optimized
`grok-3-latest`	Variable	Medium	Standard
`grok-4-latest`	Variable	Medium	Latest (premium)

Example Usage:

const client = new ClassifyClient({
  provider: 'xai',
  model: 'grok-3-mini-latest',
  apiKey: process.env.XAI_API_KEY
});

CLI:

npx @hivellm/classify document file.pdf --provider xai --model grok-3-mini-latest

Note: Pricing varies; check x.ai for current rates.

6. Groq (Ultra-Fast)

Best for: Ultra-fast batch processing

Configuration:

Provider: Groq
Default Model: llama-3.1-8b-instant
API Endpoint: https://api.groq.com/openai/v1/chat/completions
Authentication: Bearer token
API Key Env: GROQ_API_KEY

Available Models:

Model	Pricing	Speed	Best For
`llama-3.1-8b-instant`	Very Low	Ultra-Fast	Speed priority (RECOMMENDED)
`llama-3.1-70b-versatile`	Low	Fast	Balanced
`llama-3.3-70b-versatile`	Low	Fast	Latest 70B

Example Usage:

const client = new ClassifyClient({
  provider: 'groq',
  model: 'llama-3.1-8b-instant',
  apiKey: process.env.GROQ_API_KEY
});

CLI:

npx @hivellm/classify document file.pdf --provider groq --model llama-3.1-8b-instant

Rate Limits (Free Tier):

Requests: 30/minute
Tokens: 20K/minute
Concurrent: 10 requests

Performance: Up to 500 tokens/second, the fastest of the providers listed in this guide

Provider Selection Strategy

By Use Case

Use Case	Recommended Provider	Model	Reason
Default	DeepSeek	deepseek-chat	Best value, good quality
Speed	Groq	llama-3.1-8b-instant	Ultra-fast inference
Accuracy	OpenAI	gpt-4o-mini	High accuracy, reasonable cost
Quality + Speed	Anthropic	claude-3-5-haiku-latest	Fast model with Anthropic quality
Google Ecosystem	Gemini	gemini-2.0-flash	Google integration
Cost Optimization	DeepSeek	deepseek-chat	Lowest cost per token
Reasoning	OpenAI	o1-mini	Complex logical tasks

Fallback Chain

Default fallback order (configured in classify.config.json):

const fallbackChain = [
  { provider: 'deepseek', model: 'deepseek-chat' },        // Primary
  { provider: 'groq', model: 'llama-3.1-8b-instant' },    // Fallback 1: Speed
  { provider: 'openai', model: 'gpt-4o-mini' },           // Fallback 2: Accuracy
  { provider: 'gemini', model: 'gemini-2.0-flash' }       // Fallback 3: Alternative
];
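A minimal sketch of how such a chain could be walked: each entry is tried in order, and the first success wins. `classifyWithFallback` and the `attempt` callback are hypothetical; how the CLI actually consumes the chain is not shown in this guide.

```typescript
interface ChainEntry {
  provider: string;
  model: string;
}

// Tries each chain entry in order and returns the first successful result.
// `attempt` is a hypothetical callback that performs the real call for one entry.
async function classifyWithFallback<T>(
  chain: ChainEntry[],
  attempt: (entry: ChainEntry) => Promise<T>
): Promise<{ entry: ChainEntry; result: T }> {
  const errors: string[] = [];
  for (const entry of chain) {
    try {
      return { entry, result: await attempt(entry) };
    } catch (err) {
      // Record the failure and move on to the next entry.
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${entry.provider}/${entry.model}: ${message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}
```

Collecting the per-provider errors makes the final failure message useful when every entry in the chain is down.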

Two-Stage LLM Interaction

Stage 1: Template Selection

Purpose: Select the best template for the document.

Prompt Structure:

const selectionPrompt = {
  system: "You are an expert document classifier that selects appropriate templates.",
  user: `
Available Templates:
${templateCatalog.toMarkdown()}

Document to classify (compressed):
${compressedDocument}

Select the most appropriate template and explain why.
Return JSON: {"selected": "legal.json", "confidence": 0.95, "reasoning": "..."}
  `
};

Token Usage: ~500-1000 tokens (lightweight)
Time: 500-1000ms
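Because the Stage 1 reply is free-form model output, it is worth validating before use. The sketch below parses a reply shaped like the JSON example in the prompt; `TemplateSelectionJson` and `parseTemplateSelection` are hypothetical names, and the 0 to 1 confidence range is an assumption based on that example.

```typescript
// Shape taken from the JSON example in the Stage 1 prompt.
interface TemplateSelectionJson {
  selected: string;
  confidence: number;
  reasoning: string;
}

function parseTemplateSelection(raw: string): TemplateSelectionJson {
  // Models sometimes wrap JSON in prose or code fences; keep the outermost {...} span.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in response");
  }
  const data = JSON.parse(raw.slice(start, end + 1)) as Record<string, unknown>;
  const { selected, confidence, reasoning } = data;
  if (typeof selected !== "string" || selected.length === 0) {
    throw new Error("Missing 'selected' template name");
  }
  // Assumption: confidence is a probability-like value in [0, 1].
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) {
    throw new Error("'confidence' must be a number between 0 and 1");
  }
  return {
    selected,
    confidence,
    reasoning: typeof reasoning === "string" ? reasoning : "",
  };
}
```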

Stage 2: Classification

Purpose: Extract entities, relationships, and metadata using selected template.

Prompt Structure:

const classificationPrompt = {
  system: template.llm_config.system_prompt,
  user: `
Template: ${template.name}
Document Types: ${template.document_types}
Entities: ${template.entity_definitions}
Graph Schema: ${template.graph_schema}
Full-text Schema: ${template.fulltext_schema}

Document (compressed):
${compressedDocument}

Extract:
1. Entities and relationships for graph structure
2. Metadata fields for full-text indexing
3. Keywords and categories

Return structured JSON.
  `
};

Token Usage: ~1500-2500 tokens (detailed)
Time: 1000-2000ms

Cost Analysis

Per-Document Cost (20-page PDF, ~15,000 tokens → 7,500 after compression)

Provider	Model	Stage 1	Stage 2	Total	Notes
DeepSeek	deepseek-chat	$0.0011	$0.0013	$0.0024	Best value
Groq	llama-3.1-8b-instant	$0.0005	$0.0005	$0.0010	Fastest
OpenAI	gpt-4o-mini	$0.0012	$0.0030	$0.0042	High accuracy
Anthropic	claude-3-5-haiku-latest	$0.0020	$0.0063	$0.0083	Quality
Gemini	gemini-2.0-flash	$0.0040	$0.0075	$0.0115	Google
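Per-request figures like these presumably come from multiplying token counts by the per-million prices in the provider tables. The sketch below shows that arithmetic; `Pricing` and `requestCost` are hypothetical names, and the output token count is an illustrative assumption because the table does not state one.

```typescript
// Prices in USD per 1M tokens, as listed in the provider tables.
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Cost in USD of one request with the given token counts.
function requestCost(inputTokens: number, outputTokens: number, p: Pricing): number {
  return (inputTokens * p.inputPerMillion + outputTokens * p.outputPerMillion) / 1e6;
}

const deepseekChat: Pricing = { inputPerMillion: 0.14, outputPerMillion: 0.28 };

// 7,500 compressed input tokens cost about $0.00105 before any output is counted.
const inputOnly = requestCost(7500, 0, deepseekChat);

// Assumption: 900 output tokens, chosen only to illustrate the formula.
const withOutput = requestCost(7500, 900, deepseekChat);

console.log(inputOnly.toFixed(5), withOutput.toFixed(5));
```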

Cost Savings

With Compression (50% reduction):

DeepSeek: $0.0024 (saved $0.0021 from $0.0045)
GPT-4o-mini: $0.0042 (saved $0.0048 from $0.0090)

With Cache Hit:

Cost: $0.00 (100% savings)
Time: 2-5ms (vs 2000ms)
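This guide does not describe how the cache is keyed. One plausible sketch, assuming an in-memory cache keyed by a SHA-256 hash of provider, model, and document text, is shown below; `ResultCache` is a hypothetical class, not the CLI's implementation.

```typescript
import { createHash } from "node:crypto";

// Hypothetical in-memory cache; the real CLI may persist results differently.
class ResultCache<T> {
  private readonly entries = new Map<string, T>();

  // The key covers everything assumed to change the result: provider, model, document text.
  static keyFor(provider: string, model: string, documentText: string): string {
    return createHash("sha256")
      .update(`${provider}\n${model}\n${documentText}`)
      .digest("hex");
  }

  // Returns the cached value on a hit; otherwise computes, stores, and returns it.
  async getOrCompute(key: string, compute: () => Promise<T>): Promise<T> {
    if (this.entries.has(key)) {
      return this.entries.get(key) as T;
    }
    const value = await compute();
    this.entries.set(key, value);
    return value;
  }
}
```

A hit skips the provider call entirely, which is where the zero cost and millisecond latency would come from.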

Batch Processing Cost (1000 documents)

Provider	Cold Start	70% Cache Hit	Notes
DeepSeek	$2.40	$0.72	Best value
Groq	$1.00	$0.30	Fastest
GPT-4o-mini	$4.20	$1.26	Accuracy
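The batch figures above are consistent with charging only for cache misses. A minimal sketch of that arithmetic, with `batchCost` as a hypothetical helper name:

```typescript
// Cost in USD of a batch where cache hits are free and misses pay the per-document cost.
function batchCost(perDocumentCost: number, documents: number, cacheHitRate: number): number {
  if (cacheHitRate < 0 || cacheHitRate > 1) {
    throw new RangeError("cacheHitRate must be between 0 and 1");
  }
  const misses = documents * (1 - cacheHitRate);
  return misses * perDocumentCost;
}

// DeepSeek row: 1000 documents at $0.0024 each.
console.log(batchCost(0.0024, 1000, 0).toFixed(2)); // cold start
console.log(batchCost(0.0024, 1000, 0.7).toFixed(2)); // 70% cache hit
```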

Rate Limit Management

Exponential Backoff

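// Resolve after `ms` milliseconds.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function callWithRetry<T>(fn: () => Promise<T>, maxRetries = 3): Promise<T> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const status = (error as { status?: number } | null)?.status;
      // Retry only on HTTP 429, and only while attempts remain.
      if (status === 429 && attempt < maxRetries - 1) {
        const delay = Math.pow(2, attempt) * 1000; // 1s, then 2s with the default maxRetries = 3
        await sleep(delay);
        continue;
      }
      throw error;
    }
  }
  // Only reachable when maxRetries <= 0.
  throw new Error('callWithRetry: maxRetries must be at least 1');
}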

Concurrent Request Management

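import PQueue from 'p-queue';

// `provider` is an LLMProvider instance; Groq's free tier gets a tighter cap.
const isGroq = provider.name === 'groq';

const queue = new PQueue({
  concurrency: isGroq ? 5 : 10,
  interval: 60000, // 1 minute
  intervalCap: isGroq ? 30 : 100 // max requests started per interval
});

const result = await queue.add(() => provider.classify(documentText, template, config));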

Provider-Specific Features

DeepSeek

Reasoning Models: deepseek-r1, deepseek-reasoner
Best For: Cost-sensitive applications
Special: Built-in Chinese language support

OpenAI

Search Integration: gpt-4o-search-preview
Reasoning: o1-mini for complex logic
Best For: Production applications requiring reliability

Anthropic

Context Window: Up to 200K tokens (Claude 3)
Safety: Built-in safety filtering
Best For: Complex document analysis

Gemini

Multimodal: Supports images with gemini-pro-vision
Free Tier: 1,500 requests/minute
Best For: Google Cloud integration

Groq

Speed: Up to 500 tokens/second
Open Source: Llama models
Best For: Real-time applications

xAI

Code Optimized: grok-code-fast-1
Latest Models: Grok-4
Best For: Alternative to mainstream providers

Environment Configuration

# ~/.bashrc or .env
export DEEPSEEK_API_KEY=sk-...
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GEMINI_API_KEY=AI...
export XAI_API_KEY=xai-...
export GROQ_API_KEY=gsk_...

# Default provider
export CLASSIFY_DEFAULT_PROVIDER=deepseek
export CLASSIFY_DEFAULT_MODEL=deepseek-chat
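A sketch of how these variables might be resolved into a client configuration. `resolveConfig` is a hypothetical helper; the variable names and default models are the ones listed in this guide, but the fallback logic is an assumption, not the CLI's documented behavior.

```typescript
// API key variable and default model per provider, as listed in this guide.
const PROVIDER_DEFAULTS: Record<string, { keyVar: string; model: string }> = {
  deepseek: { keyVar: "DEEPSEEK_API_KEY", model: "deepseek-chat" },
  openai: { keyVar: "OPENAI_API_KEY", model: "gpt-4o-mini" },
  anthropic: { keyVar: "ANTHROPIC_API_KEY", model: "claude-3-5-haiku-latest" },
  gemini: { keyVar: "GEMINI_API_KEY", model: "gemini-2.0-flash" },
  xai: { keyVar: "XAI_API_KEY", model: "grok-3-mini-latest" },
  groq: { keyVar: "GROQ_API_KEY", model: "llama-3.1-8b-instant" },
};

interface ResolvedConfig {
  provider: string;
  model: string;
  apiKey: string;
}

// Assumed precedence: explicit CLASSIFY_DEFAULT_* variables, then the provider's default model.
function resolveConfig(env: Record<string, string | undefined>): ResolvedConfig {
  const provider = env.CLASSIFY_DEFAULT_PROVIDER ?? "deepseek";
  const defaults = PROVIDER_DEFAULTS[provider];
  if (defaults === undefined) {
    throw new Error(`Unknown provider: ${provider}`);
  }
  const apiKey = env[defaults.keyVar];
  if (!apiKey) {
    throw new Error(`Missing ${defaults.keyVar} for provider ${provider}`);
  }
  return { provider, model: env.CLASSIFY_DEFAULT_MODEL ?? defaults.model, apiKey };
}
```

In a Node.js process this would be called as `resolveConfig(process.env)`; taking the environment as a parameter keeps the function easy to test.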

Monitoring and Observability

Metrics to Track

interface ProviderMetrics {
  provider: string;
  model: string;
  
  // Performance
  avgLatency: number;
  p95Latency: number;
  requestsPerMinute: number;
  
  // Reliability
  successRate: number;
  errorRate: number;
  rateLimitHits: number;
  
  // Cost
  tokensUsed: number;
  costIncurred: number;
  
  // Quality
  avgConfidence: number;
  lowConfidenceCount: number;
}
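As an illustration of how the performance and reliability fields could be derived, the sketch below aggregates raw request samples. `RequestSample`, `percentile`, and `summarize` are hypothetical names, and the nearest-rank percentile is one common choice, not necessarily what the CLI uses.

```typescript
// Hypothetical raw record for one provider request.
interface RequestSample {
  latencyMs: number;
  ok: boolean;
  rateLimited: boolean;
}

// Nearest-rank percentile over an ascending, non-empty array.
function percentile(sortedAsc: number[], p: number): number {
  const rank = Math.ceil((p * sortedAsc.length) / 100);
  return sortedAsc[Math.max(0, rank - 1)];
}

// Derives a subset of ProviderMetrics from raw samples.
function summarize(samples: RequestSample[]) {
  if (samples.length === 0) {
    throw new Error("Cannot summarize an empty sample set");
  }
  const latencies = samples.map((s) => s.latencyMs).sort((a, b) => a - b);
  const successes = samples.filter((s) => s.ok).length;
  const successRate = successes / samples.length;
  return {
    avgLatency: latencies.reduce((sum, v) => sum + v, 0) / latencies.length,
    p95Latency: percentile(latencies, 95),
    successRate,
    errorRate: 1 - successRate,
    rateLimitHits: samples.filter((s) => s.rateLimited).length,
  };
}
```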

Health Checks

# Check provider health
npx @hivellm/classify health-check --provider deepseek

# Output:
✓ DeepSeek API: Healthy
  - Latency: 234ms
  - Rate Limit: 45/60 rpm available
  - Model: deepseek-chat

Next: See INTEGRATION.md for integration examples.
