Version: 1.0
Last Updated: 2025-01-26
Classify supports 6 LLM providers with 30+ models optimized for document classification. This guide covers provider-specific configuration, model selection strategies, pricing, and integration details.
All providers implement the same interface:

```typescript
interface LLMProvider {
  name: string;

  // Template selection (Stage 1)
  selectTemplate(
    documentText: string,
    templateCatalog: TemplateCatalog
  ): Promise<TemplateSelection>;

  // Classification (Stage 2)
  classify(
    documentText: string,
    template: Template,
    config: LLMConfig
  ): Promise<ClassificationResult>;

  // Utility methods
  validateApiKey(): Promise<boolean>;
  getCost(tokens: number): { input: number; output: number };
  getModelInfo(): ModelInfo;
  healthCheck(): Promise<boolean>;
}
```
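
For orientation, here is a minimal sketch of driving both stages through this interface. The `selection.template` field and the availability of an `LLMConfig` value are assumptions for illustration, not guaranteed API details:

```typescript
// Sketch only: exercises the shared interface end to end.
// Assumes TemplateSelection exposes the resolved Template as `selection.template`.
async function runPipeline(
  provider: LLMProvider,
  text: string,
  catalog: TemplateCatalog,
  config: LLMConfig
): Promise<ClassificationResult> {
  if (!(await provider.healthCheck())) {
    throw new Error(`${provider.name} is unavailable`);
  }
  const selection = await provider.selectTemplate(text, catalog); // Stage 1
  return provider.classify(text, selection.template, config);    // Stage 2
}
```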

Best for: Cost-effective general classification
Configuration:
- Provider: DeepSeek
- Default Model: deepseek-chat
- API Endpoint: https://api.deepseek.com/v1/chat/completions
- Authentication: Bearer token
- API Key Env: DEEPSEEK_API_KEY

Available Models:
| Model | Pricing (per 1M tokens, input / output) | Speed | Best For |
|---|---|---|---|
| deepseek-chat | $0.14 / $0.28 | Medium | General classification (RECOMMENDED) |
| deepseek-r1 | $0.14 / $0.28 | Medium | Reasoning tasks |
| deepseek-reasoner | $0.14 / $0.28 | Slow | Complex analysis |
| deepseek-v3 | $0.14 / $0.28 | Medium | Latest features |
Example Usage:

```typescript
const client = new ClassifyClient({
  provider: 'deepseek',
  model: 'deepseek-chat',
  apiKey: process.env.DEEPSEEK_API_KEY
});
```

CLI:

```bash
npx @hivellm/classify document file.pdf --provider deepseek --model deepseek-chat
```

Rate Limits:
- Requests: 60/minute
- Tokens: 1M/minute
- Concurrent: 10 requests
Error Handling:
- 429: Rate limit exceeded → retry with exponential backoff
- 500: Service error → fall back to an alternative model (see the fallback chain below)
- 401: Invalid API key → check DEEPSEEK_API_KEY
Best for: High-accuracy requirements
Configuration:
- Provider: OpenAI
- Default Model: gpt-4o-mini
- API Endpoint: https://api.openai.com/v1/chat/completions
- Authentication: Bearer token
- API Key Env: OPENAI_API_KEY

Available Models:
| Model | Pricing (per 1M tokens, input / output) | Speed | Best For |
|---|---|---|---|
| gpt-4o-mini | $0.15 / $0.60 | Fast | Cost-effective accuracy (RECOMMENDED) |
| chatgpt-4o-latest | $5.00 / $15.00 | Medium | Latest ChatGPT |
| gpt-4o | $5.00 / $15.00 | Medium | Standard GPT-4 Optimized |
| gpt-5-mini | $0.50 / $1.50 | Fast | Latest mini model |
| gpt-4.1-mini | $0.30 / $1.20 | Fast | Version 4.1 mini |
| o1-mini | $3.00 / $12.00 | Slow | Reasoning model |
| gpt-4-turbo | $10.00 / $30.00 | Fast | Legacy turbo |
| gpt-4o-search-preview | $5.00 / $15.00 | Medium | With search capability |
Example Usage:

```typescript
const client = new ClassifyClient({
  provider: 'openai',
  model: 'gpt-4o-mini',
  apiKey: process.env.OPENAI_API_KEY
});
```

CLI:

```bash
npx @hivellm/classify document file.pdf --provider openai --model gpt-4o-mini
```

Rate Limits (Tier 3):
- Requests: 5,000/minute
- Tokens: 2M/minute
- Concurrent: 500 requests
Best for: Fast, high-quality classification
Configuration:
- Provider: Anthropic
- Default Model: claude-3-5-haiku-latest
- API Endpoint: https://api.anthropic.com/v1/messages
- Authentication: x-api-key header
- API Key Env: ANTHROPIC_API_KEY

Available Models:
| Model | Pricing (per 1M tokens, input / output) | Speed | Best For |
|---|---|---|---|
| claude-3-5-haiku-latest | $0.25 / $1.25 | Fast | Fast & cheap (RECOMMENDED) |
| claude-3-7-sonnet-latest | $3.00 / $15.00 | Medium | Sonnet 3.7 |
| claude-4-sonnet-20250514 | $3.00 / $15.00 | Medium | Claude 4 latest |
| claude-sonnet-4-20250514 | $3.00 / $15.00 | Medium | Sonnet 4 variant |
| claude-3-5-sonnet-latest | $3.00 / $15.00 | Medium | Sonnet 3.5 |
| claude-3-5-sonnet-20241022 | $3.00 / $15.00 | Medium | Dated version |
| claude-3-opus-latest | $15.00 / $75.00 | Slow | Highest quality (expensive) |
Example Usage:

```typescript
const client = new ClassifyClient({
  provider: 'anthropic',
  model: 'claude-3-5-haiku-latest',
  apiKey: process.env.ANTHROPIC_API_KEY
});
```

CLI:

```bash
npx @hivellm/classify document file.pdf --provider anthropic --model claude-3-5-haiku-latest
```

Rate Limits:
- Requests: 4,000/minute
- Tokens: 400K/minute
- Concurrent: 50 requests
Best for: Fast classification in the Google ecosystem
Configuration:
- Provider: Gemini
- Default Model: gemini-2.0-flash
- API Endpoint: https://generativelanguage.googleapis.com/v1/models
- Authentication: API key parameter
- API Key Env: GEMINI_API_KEY

Available Models:
| Model | Pricing (per 1M tokens, input / output) | Speed | Best For |
|---|---|---|---|
| gemini-2.0-flash | $0.50 / $1.50 | Fast | Fast version 2.0 (RECOMMENDED) |
| gemini-2.5-flash | $0.50 / $1.50 | Fast | Latest flash |
| gemini-2.5-pro | $1.25 / $5.00 | Medium | Pro version |
| gemini-1.5-pro-latest | $3.50 / $10.50 | Medium | Legacy pro |
| gemini-1.5-flash-latest | $0.35 / $1.05 | Fast | Legacy flash |
Example Usage:

```typescript
const client = new ClassifyClient({
  provider: 'gemini',
  model: 'gemini-2.0-flash',
  apiKey: process.env.GEMINI_API_KEY
});
```

CLI:

```bash
npx @hivellm/classify document file.pdf --provider gemini --model gemini-2.0-flash
```

Rate Limits:
- Requests: 1,500/minute (free tier)
- Tokens: 1.5M/minute
- Concurrent: 15 requests
Best for: An alternative to mainstream providers
Configuration:
- Provider: xAI
- Default Model: grok-3-mini-latest
- API Endpoint: https://api.x.ai/v1/chat/completions
- Authentication: Bearer token
- API Key Env: XAI_API_KEY

Available Models:
| Model | Pricing | Speed | Best For |
|---|---|---|---|
| grok-3-mini-latest | Variable | Fast | Cost-effective (RECOMMENDED) |
| grok-3-fast-latest | Variable | Very Fast | Speed priority |
| grok-code-fast-1 | Variable | Fast | Code-optimized |
| grok-3-latest | Variable | Medium | Standard |
| grok-4-latest | Variable | Medium | Latest (premium) |
Example Usage:

```typescript
const client = new ClassifyClient({
  provider: 'xai',
  model: 'grok-3-mini-latest',
  apiKey: process.env.XAI_API_KEY
});
```

CLI:

```bash
npx @hivellm/classify document file.pdf --provider xai --model grok-3-mini-latest
```

Note: Pricing varies; check x.ai for current rates.
Best for: Ultra-fast batch processing
Configuration:
- Provider: Groq
- Default Model: llama-3.1-8b-instant
- API Endpoint: https://api.groq.com/openai/v1/chat/completions
- Authentication: Bearer token
- API Key Env: GROQ_API_KEY

Available Models:
| Model | Pricing | Speed | Best For |
|---|---|---|---|
| llama-3.1-8b-instant | Very Low | Ultra-Fast | Speed priority (RECOMMENDED) |
| llama-3.1-70b-versatile | Low | Fast | Balanced |
| llama-3.3-70b-versatile | Low | Fast | Latest 70B |
Example Usage:

```typescript
const client = new ClassifyClient({
  provider: 'groq',
  model: 'llama-3.1-8b-instant',
  apiKey: process.env.GROQ_API_KEY
});
```

CLI:

```bash
npx @hivellm/classify document file.pdf --provider groq --model llama-3.1-8b-instant
```

Rate Limits (Free Tier):
- Requests: 30/minute
- Tokens: 20K/minute
- Concurrent: 10 requests
Performance: Up to 500 tokens/second, among the fastest inference speeds available
Provider selection by use case:

| Use Case | Recommended Provider | Model | Reason |
|---|---|---|---|
| Default | DeepSeek | deepseek-chat | Best value, good quality |
| Speed | Groq | llama-3.1-8b-instant | Ultra-fast inference |
| Accuracy | OpenAI | gpt-4o-mini | High accuracy, reasonable cost |
| Quality + Speed | Anthropic | claude-3-5-haiku-latest | Fast Anthropic |
| Google Ecosystem | Gemini | gemini-2.0-flash | Google integration |
| Cost Optimization | DeepSeek | deepseek-chat | Lowest cost per token |
| Reasoning | OpenAI | o1-mini | Complex logical tasks |
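
Where selection needs to happen programmatically, the table above can be encoded as a simple lookup. The `UseCase` names below are illustrative, not part of the Classify API:

```typescript
// Illustrative mapping of the selection table; adjust cases to your workload.
type UseCase = 'default' | 'speed' | 'accuracy' | 'quality-speed' | 'reasoning';

const PROVIDER_BY_USE_CASE: Record<UseCase, { provider: string; model: string }> = {
  'default':       { provider: 'deepseek',  model: 'deepseek-chat' },
  'speed':         { provider: 'groq',      model: 'llama-3.1-8b-instant' },
  'accuracy':      { provider: 'openai',    model: 'gpt-4o-mini' },
  'quality-speed': { provider: 'anthropic', model: 'claude-3-5-haiku-latest' },
  'reasoning':     { provider: 'openai',    model: 'o1-mini' },
};

const choice = PROVIDER_BY_USE_CASE['speed']; // { provider: 'groq', model: 'llama-3.1-8b-instant' }
```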
Default fallback order (configured in classify.config.json):

```typescript
const fallbackChain = [
  { provider: 'deepseek', model: 'deepseek-chat' },    // Primary
  { provider: 'groq', model: 'llama-3.1-8b-instant' }, // Fallback 1: Speed
  { provider: 'openai', model: 'gpt-4o-mini' },        // Fallback 2: Accuracy
  { provider: 'gemini', model: 'gemini-2.0-flash' }    // Fallback 3: Alternative
];
```
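
Since the chain lives in classify.config.json, the client applies it internally; purely to illustrate the semantics, a hand-rolled version might look like this. The `classifyDocument` call is a hypothetical stand-in for whatever classification method the client exposes:

```typescript
// Illustration only: try each (provider, model) pair until one succeeds.
async function classifyWithChain(
  chain: Array<{ provider: string; model: string }>,
  filePath: string
) {
  let lastError: unknown;
  for (const { provider, model } of chain) {
    try {
      const client = new ClassifyClient({ provider, model }); // apiKey resolution omitted
      return await client.classifyDocument(filePath);         // hypothetical method name
    } catch (error) {
      lastError = error; // fall through to the next entry in the chain
    }
  }
  throw lastError;
}
```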
Purpose: Select the best template for the document.

Prompt Structure:

```typescript
const selectionPrompt = {
  system: "You are an expert document classifier that selects appropriate templates.",
  user: `
Available Templates:
${templateCatalog.toMarkdown()}

Document to classify (compressed):
${compressedDocument}

Select the most appropriate template and explain why.
Return JSON: {"selected": "legal.json", "confidence": 0.95, "reasoning": "..."}
`
};
```

Token Usage: ~500-1000 tokens (lightweight). Time: 500-1000 ms.
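
Because the prompt requests JSON, the Stage 1 reply can be parsed directly. A minimal sketch, where `responseText` stands in for the raw model reply and the 0.7 threshold is an illustrative choice:

```typescript
// Shape mirrors the JSON requested by the Stage 1 prompt.
interface TemplateSelection {
  selected: string;    // e.g. "legal.json"
  confidence: number;  // 0..1
  reasoning: string;
}

// Stand-in for the raw model reply.
const responseText = '{"selected": "legal.json", "confidence": 0.95, "reasoning": "..."}';
const selection: TemplateSelection = JSON.parse(responseText);

if (selection.confidence < 0.7) {
  // Illustrative threshold: a low-confidence pick may warrant a stronger model.
  console.warn(`Low-confidence template selection: ${selection.reasoning}`);
}
```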
Purpose: Extract entities, relationships, and metadata using the selected template.

Prompt Structure:

```typescript
const classificationPrompt = {
  system: template.llm_config.system_prompt,
  user: `
Template: ${template.name}
Document Types: ${template.document_types}
Entities: ${template.entity_definitions}
Graph Schema: ${template.graph_schema}
Full-text Schema: ${template.fulltext_schema}

Document (compressed):
${compressedDocument}

Extract:
1. Entities and relationships for graph structure
2. Metadata fields for full-text indexing
3. Keywords and categories

Return structured JSON.
`
};
```

Token Usage: ~1500-2500 tokens (detailed). Time: 1000-2000 ms.
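
The exact fields are template-driven, but the Stage 2 output broadly follows the three numbered extraction targets. A sketch of one plausible shape, with illustrative field names only:

```typescript
// Illustrative shape of the Stage 2 JSON; real fields come from the template's schemas.
interface StageTwoOutput {
  entities: Array<{ type: string; value: string }>;                 // graph nodes
  relationships: Array<{ from: string; to: string; type: string }>; // graph edges
  metadata: Record<string, string>;                                 // full-text index fields
  keywords: string[];
  categories: string[];
}
```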
Per-document cost comparison (Stage 1 + Stage 2):

| Provider | Model | Stage 1 | Stage 2 | Total | Notes |
|---|---|---|---|---|---|
| DeepSeek | deepseek-chat | $0.0011 | $0.0013 | $0.0024 | Best value |
| Groq | llama-3.1-8b-instant | $0.0005 | $0.0005 | $0.0010 | Fastest |
| OpenAI | gpt-4o-mini | $0.0012 | $0.0030 | $0.0042 | High accuracy |
| Anthropic | claude-3-5-haiku | $0.0020 | $0.0063 | $0.0083 | Quality |
| Gemini | gemini-2.0-flash | $0.0040 | $0.0075 | $0.0115 | |
With Compression (50% reduction):
- DeepSeek: $0.0024 (saved $0.0021 from $0.0045)
- GPT-4o-mini: $0.0042 (saved $0.0048 from $0.0090)
With Cache Hit:
- Cost: $0.00 (100% savings)
- Time: 2-5ms (vs 2000ms)
Batch cost (1,000 documents):

| Provider | Cold Start | 70% Cache Hit | Notes |
|---|---|---|---|
| DeepSeek | $2.40 | $0.72 | Best value |
| Groq | $1.00 | $0.30 | Fastest |
| GPT-4o-mini | $4.20 | $1.26 | Accuracy |
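
These figures follow from simple arithmetic; a sketch of the computation, where the per-document costs and the 70% hit rate come from the tables above:

```typescript
// Compression halves billable tokens, so cost scales with the retained fraction.
const compressedCost = (uncompressedCost: number, retained = 0.5) =>
  uncompressedCost * retained;

// Cache hits cost $0.00, so a batch only pays for misses.
const batchCost = (documents: number, perDocCost: number, cacheHitRate: number) =>
  documents * perDocCost * (1 - cacheHitRate);

compressedCost(0.0045);       // ~$0.0023 per document (the guide lists $0.0024; savings are approximate)
batchCost(1000, 0.0024, 0.7); // $0.72 (DeepSeek)
batchCost(1000, 0.0042, 0.7); // $1.26 (gpt-4o-mini)
```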
Retry on rate-limit errors (429) with exponential backoff:

```typescript
// Simple sleep helper used by the retry loop.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function callWithRetry<T>(fn: () => Promise<T>, maxRetries = 3): Promise<T> {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error: any) {
      if (error.status === 429 && i < maxRetries - 1) {
        const delay = Math.pow(2, i) * 1000; // 1s, 2s, 4s
        await sleep(delay);
        continue;
      }
      throw error;
    }
  }
  throw new Error('unreachable'); // the final attempt either returns or rethrows
}
```

Respect provider rate limits with a request queue:

```typescript
import PQueue from 'p-queue';

// Cap concurrency and per-minute volume; Groq's free tier is the strictest.
const queue = new PQueue({
  concurrency: provider === 'groq' ? 5 : 10,
  interval: 60000, // 1 minute
  intervalCap: provider === 'groq' ? 30 : 100
});

const result = await queue.add(() => provider.classify(document));
```

Provider notes:

DeepSeek:
- Reasoning Models: deepseek-r1, deepseek-reasoner
- Best For: Cost-sensitive applications
- Special: Built-in Chinese language support
OpenAI:
- Search Integration: gpt-4o-search-preview
- Reasoning: o1-mini for complex logic
- Best For: Production applications requiring reliability
Anthropic:
- Context Window: Up to 200K tokens (Claude 3)
- Safety: Built-in safety filtering
- Best For: Complex document analysis
Gemini:
- Multimodal: Supports images with gemini-pro-vision
- Free Tier: 1,500 requests/minute
- Best For: Google Cloud integration
Groq:
- Speed: Up to 500 tokens/second
- Open Source: Llama models
- Best For: Real-time applications
xAI:
- Code Optimized: grok-code-fast-1
- Latest Models: Grok-4
- Best For: Alternative to mainstream providers
Set API keys and defaults via environment variables:

```bash
# ~/.bashrc or .env
export DEEPSEEK_API_KEY=sk-...
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GEMINI_API_KEY=AI...
export XAI_API_KEY=xai-...
export GROQ_API_KEY=gsk_...

# Default provider
export CLASSIFY_DEFAULT_PROVIDER=deepseek
export CLASSIFY_DEFAULT_MODEL=deepseek-chat
```
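
A minimal sketch of consuming these variables at startup; the defaulting logic here is illustrative, and in real use the API key should match the resolved provider:

```typescript
// Resolve provider/model from the environment, defaulting to DeepSeek.
// Note: pick the key that matches the resolved provider; DeepSeek is shown here.
const client = new ClassifyClient({
  provider: process.env.CLASSIFY_DEFAULT_PROVIDER ?? 'deepseek',
  model: process.env.CLASSIFY_DEFAULT_MODEL ?? 'deepseek-chat',
  apiKey: process.env.DEEPSEEK_API_KEY
});
```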
Per-provider metrics tracked for monitoring:

```typescript
interface ProviderMetrics {
  provider: string;
  model: string;

  // Performance
  avgLatency: number;
  p95Latency: number;
  requestsPerMinute: number;

  // Reliability
  successRate: number;
  errorRate: number;
  rateLimitHits: number;

  // Cost
  tokensUsed: number;
  costIncurred: number;

  // Quality
  avgConfidence: number;
  lowConfidenceCount: number;
}
```

```bash
# Check provider health
npx @hivellm/classify health-check --provider deepseek
# Output:
✓ DeepSeek API: Healthy
  - Latency: 234ms
  - Rate Limit: 45/60 rpm available
  - Model: deepseek-chat
```

Next: See INTEGRATION.md for integration examples.