The LessTokens SDK is designed as a thin wrapper that:
- Compresses prompts via the LessTokens API
- Delegates to official LLM provider SDKs for API calls
- Provides a unified interface across multiple providers
┌─────────────────────────────────────────────────────────────┐
│ Application Code │
└──────────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ LessTokensSDK │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ - process_prompt() │ │
│ │ - process_prompt_stream() │ │
│ │ - compress_prompt() │ │
│ └──────────────────────────────────────────────────────┘ │
└──────────────┬───────────────────────────────┬──────────────┘
│ │
▼ ▼
┌──────────────────────────┐ ┌──────────────────────────────┐
│ LessTokensClient │ │ LLMClient │
│ ┌────────────────────┐ │ │ ┌────────────────────────┐ │
│ │ - compress() │ │ │ │ - chat() │ │
│ │ - _perform_ │ │ │ │ - chat_stream() │ │
│ │ compression_ │ │ │ └────────────────────────┘ │
│ │ request() │ │ │ │ │
│ └────────────────────┘ │ │ ▼ │
└──────────┬───────────────┘ │ ┌────────────────────────┐ │
│ │ │ Provider Factory │ │
▼ │ │ - create_provider() │ │
┌──────────────────────────┐ │ └────────────────────────┘ │
│ LessTokens API │ │ │ │
│ (HTTP/REST) │ │ ▼ │
└──────────────────────────┘ │ ┌────────────────────────┐ │
│ │ Provider │ │
│ │ Implementations │ │
│ │ - OpenAIProvider │ │
│ │ - AnthropicProvider │ │
│ │ - GoogleProvider │ │
│ │ - DeepSeekProvider │ │
│ └────────────────────────┘ │
└────────────┬───────────────┘
│
▼
┌──────────────────────────────┐
│ Official Provider SDKs │
│ - openai │
│ - anthropic │
│ - google-generativeai │
└──────────────────────────────┘
Main entry point for the SDK. Coordinates compression and LLM API calls.
Responsibilities:
- Validate configuration and inputs
- Orchestrate compression and LLM requests
- Calculate and attach compression metrics
- Handle streaming responses
Client for communicating with the LessTokens API.
Responsibilities:
- Handle HTTP requests to LessTokens API
- Implement retry logic with exponential backoff
- Parse and normalize API responses
- Handle errors and timeouts
Wrapper around provider implementations.
Responsibilities:
- Provide unified interface for all providers
- Delegate to appropriate provider implementation
- Handle provider-specific differences
Provider-specific adapters that wrap official SDKs.
Responsibilities:
- Convert SDK types to internal types
- Handle provider-specific API differences
- Implement streaming support
- Extract usage metrics
Supporting utilities for validation, retry, and error handling.
Responsibilities:
- Input validation
- Retry logic with exponential backoff
- Error creation and handling
-
Input Validation
- Validate SDK configuration
- Validate prompt and options
- Validate LLM configuration
-
Compression
- Send prompt to LessTokens API
- Receive compressed prompt and metrics
- Handle errors and retries
-
LLM Request
- Create provider instance
- Build messages array (with optional conversation history)
- Send to LLM provider
- Receive response and usage metrics
-
Response Assembly
- Combine LLM response with compression metrics
- Calculate total savings
- Return unified response
Similar to process prompt flow, but:
- LLM response is streamed chunk by chunk
- Compression metrics are added to final chunk
- Client receives chunks asynchronously
The SDK is a thin wrapper that:
- Uses official provider SDKs internally
- Doesn't reimplement provider logic
- Passes through all provider-specific options
All providers share the same interface:
- Same method signatures
- Same response types
- Same error handling
Full type hints for:
- All public APIs
- Internal types
- Provider-specific options (via TypedDict)
Consistent error handling:
- Custom error class (
LessTokensError) - Error codes for programmatic handling
- Detailed error messages
Fully async implementation:
- Non-blocking I/O
- Efficient resource usage
- Native Python async support
- Create provider class implementing
LLMProvider - Add provider to factory function
- Update validation and types
Override retry configuration:
- Per-request retry settings
- Custom retryable errors
- Custom delay calculation
Use message_content option:
- String: Direct content
- Callable: Dynamic content based on compression results
HTTP clients use connection pooling for efficiency.
Streaming responses reduce memory usage for large responses.
Exponential backoff prevents overwhelming APIs during retries.
Minimal type conversions to reduce overhead.
API keys are never logged or exposed in error messages.
All API calls use HTTPS.
Configurable timeouts prevent hanging requests.
- Test each component in isolation
- Mock external dependencies
- Test error cases
- Test with real APIs (using test keys)
- Test provider-specific behavior
- Test streaming
- Aim for high test coverage
- Test edge cases
- Test error handling