feat: Add support for grok-4-fast model via LiteLLM proxy #154
Open
johnbean393 wants to merge 2 commits into bytebot-ai:main from
Remove max_tokens and reasoning_effort parameters from proxy service to improve compatibility with grok-4-fast model through OpenRouter. These model-specific parameters were causing issues with the new model.
- Add proxy.model-info.ts to dynamically fetch context windows from OpenRouter API
- Update tasks.controller.ts to use async extractContextWindow function
- Replace hardcoded 128K context window with dynamic values from OpenRouter
- Implement caching layer (1-hour TTL) to minimize API calls
- Fix Dockerfile to properly handle Prisma in Alpine Linux

Benefits:
- Grok 4 Fast now correctly reports 2M token context window
- Claude Sonnet 4.5 reports 1M tokens instead of 200K
- Gemini 2.5 models report 1048576 tokens
- All models automatically get accurate, up-to-date context windows
- Improves agent performance by preventing premature summarization

Fixes context window inaccuracies by prioritizing:
1. LiteLLM model_info (when available)
2. OpenRouter API context_length (when LiteLLM returns null)
3. Default fallback (128K)

Related to Grok 4 Fast support
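The priority order and the 1-hour cache described in this commit could look roughly like the sketch below. It is not the code in proxy.model-info.ts: `createContextWindowResolver`, `resolveContextWindow`, and the injected `lookupOpenRouter` callback are hypothetical names, the OpenRouter call is abstracted behind that callback rather than shown, and 128K is assumed to mean 128,000 tokens.

```typescript
// Hypothetical sketch of the fallback chain; names are illustrative only.
const DEFAULT_CONTEXT_WINDOW = 128_000; // assumed reading of the "128K" fallback
const CACHE_TTL_MS = 60 * 60 * 1000; // 1-hour TTL, per the commit message

// Stand-in for whatever reads context_length from OpenRouter for a model.
type ContextLengthLookup = (model: string) => Promise<number | null>;

interface CacheEntry {
  value: number;
  expiresAt: number;
}

function createContextWindowResolver(
  lookupOpenRouter: ContextLengthLookup,
  now: () => number = Date.now,
) {
  const cache = new Map<string, CacheEntry>();

  return async function resolveContextWindow(
    model: string,
    liteLlmContextWindow: number | null | undefined,
  ): Promise<number> {
    // 1. Prefer the value from LiteLLM model_info when it is present.
    if (typeof liteLlmContextWindow === "number" && liteLlmContextWindow > 0) {
      return liteLlmContextWindow;
    }

    // 2. Otherwise use a cached OpenRouter value if it has not expired.
    const hit = cache.get(model);
    if (hit !== undefined && hit.expiresAt > now()) {
      return hit.value;
    }

    // 2b. Cache miss: ask OpenRouter and remember the answer for one hour.
    try {
      const fetched = await lookupOpenRouter(model);
      if (typeof fetched === "number" && fetched > 0) {
        cache.set(model, { value: fetched, expiresAt: now() + CACHE_TTL_MS });
        return fetched;
      }
    } catch {
      // A failed lookup falls through to the default below.
    }

    // 3. Nothing usable from either source.
    return DEFAULT_CONTEXT_WINDOW;
  };
}
```

Injecting the lookup and the clock keeps the sketch free of network calls, so the ordering and expiry can be checked in isolation. Only successful lookups are cached here; whether failures should also be cached is a design choice this sketch leaves open.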
Description
This PR adds support for the new grok-4-fast model via OpenRouter through the bytebot-llm-proxy (LiteLLM).
Changes
- Removed max_tokens parameter from proxy service Chat Completion requests
- Removed reasoning_effort parameter from proxy service Chat Completion requests

These model-specific parameters were causing compatibility issues with the grok-4-fast model. With them removed, the proxy service works with grok-4-fast and other models that don't support these parameters, while LiteLLM handles model-specific parameter mapping.
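As a rough illustration of the change, the sketch below builds a Chat Completion request body that forwards only the model and messages, so max_tokens and reasoning_effort never reach the proxy. The `ProxyRequestOptions` type and `buildChatCompletionParams` function are hypothetical and are not taken from the proxy service in this PR.

```typescript
// Hypothetical sketch; type and function names are illustrative only.
type ChatMessage = {
  role: "system" | "user" | "assistant";
  content: string;
};

interface ProxyRequestOptions {
  model: string;
  messages: ChatMessage[];
  // Options a caller may still pass, but which are no longer forwarded.
  max_tokens?: number;
  reasoning_effort?: "low" | "medium" | "high";
}

interface ChatCompletionParams {
  model: string;
  messages: ChatMessage[];
}

function buildChatCompletionParams(
  options: ProxyRequestOptions,
): ChatCompletionParams {
  // Copy only the fields every model accepts; model-specific limits are
  // left to the proxy layer instead of being set on the request.
  return {
    model: options.model,
    messages: options.messages,
  };
}
```

Building the body from an explicit allowlist, rather than deleting keys from the caller's object, means a newly added model-specific option cannot leak into the request by accident.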
Testing