Factory Droid — Response Truncation Bug Report (max_tokens / Context Limit)
Summary
During long conversation sessions, assistant responses are being cut off mid-sentence across multiple BYOK models. Investigation shows this is caused by the Factory client not correctly enforcing model-specific output token limits, or the session context exceeding the model's input token limit.
Affected Models (Confirmed)
| Model |
Error Pattern |
Notes |
custom:kimi-k2.6:cloud |
413 Request Entity Too Large (llmContextExceeded) |
Persists even after compaction |
custom:deepseek-v4-pro:cloud |
400 max_tokens exceeds model's maximum output tokens (65536) |
Requested 170,394 tokens |
custom:qwen3-vl:235b-cloud |
400 max_tokens exceeds model's maximum output tokens (32768) |
Requested 64,000 tokens |
custom:glm-5.1:cloud |
400 max_tokens exceeds model's maximum output tokens (131072) |
Requested 194,000 tokens |
All models have isByok: true.
Error Log Excerpts
Pattern A: max_tokens Exceeds Output Limit
400 "max_tokens (170394) exceeds model's maximum output tokens (65536)
for model deepseek-v4-pro (ref: ...)"
Pattern B: Context Size Exceeds Input Limit
413 "Request Entity Too Large (ref: ...)"
reason: "llmContextExceeded"
Pattern C: Invalid Message Format After Compaction
400 "invalid message format"
Occurs after Compaction (reason: context_limit)
Reproduction Steps
- Select any of the BYOK models above.
- Continue a long conversation (including tool calls and large file reads).
- Input tokens accumulate in the session.
- On the next turn, the model call fails with
400 or 413, and the response is truncated.
Root Cause Analysis
1. Broken max_tokens Calculation
The client appears to compute max_tokens = context_window - input_tokens, but ignores the model-specific maximum output token limit, resulting in API returning 400 Bad Request.
Correct calculation should be:
max_tokens = min(context_window - input_tokens, model_max_output_tokens)
2. Incomplete BYOK Model Spec Resolution
Logs show Unknown model, falling back to default when resolving getTuiModelConfig for BYOK models, indicating the client may not be retrieving the correct token limits for custom:<model>:cloud aliases.
3. Compaction Side-Effects
When llmContextExceeded occurs, Factory compacts the session, but the resulting message structure can trigger invalid message format on some models (notably kimi-k2.6).
Requested Fixes
-
Add per-model max_output_tokens hard caps
- deepseek-v4-pro → 65,536
- qwen3-vl → 32,768
- glm-5.1 → 131,072
- kimi-k2.6 → limit according to model specs
-
Fix max_tokens computation
- Always clip to
model_max_output_tokens
-
Improve BYOK model spec resolution
- Ensure
custom:<model>:cloud aliases resolve to correct limits
-
Validate message structure after Compaction
- Ensure compacted summaries conform to each model's format constraints
Environment
- OS: Windows 11 (win32 10.0.26200)
- Factory Droid versions: 0.105.0 through 0.137.1 (reproduced across versions)
- Installation ID:
af46be50-2fc2-4e76-b07b-30aeab5ee2b0
This is a client-side model invocation control issue that cannot be mitigated by the end user.
Factory Droid — Response Truncation Bug Report (max_tokens / Context Limit)
Summary
During long conversation sessions, assistant responses are being cut off mid-sentence across multiple BYOK models. Investigation shows this is caused by the Factory client not correctly enforcing model-specific output token limits, or the session context exceeding the model's input token limit.
Affected Models (Confirmed)
custom:kimi-k2.6:cloud413 Request Entity Too Large(llmContextExceeded)custom:deepseek-v4-pro:cloud400 max_tokens exceeds model's maximum output tokens (65536)custom:qwen3-vl:235b-cloud400 max_tokens exceeds model's maximum output tokens (32768)custom:glm-5.1:cloud400 max_tokens exceeds model's maximum output tokens (131072)All models have
isByok: true.Error Log Excerpts
Pattern A: max_tokens Exceeds Output Limit
Pattern B: Context Size Exceeds Input Limit
Pattern C: Invalid Message Format After Compaction
Reproduction Steps
400or413, and the response is truncated.Root Cause Analysis
1. Broken max_tokens Calculation
The client appears to compute
max_tokens = context_window - input_tokens, but ignores the model-specific maximum output token limit, resulting in API returning400 Bad Request.Correct calculation should be:
2. Incomplete BYOK Model Spec Resolution
Logs show
Unknown model, falling back to defaultwhen resolvinggetTuiModelConfigfor BYOK models, indicating the client may not be retrieving the correct token limits forcustom:<model>:cloudaliases.3. Compaction Side-Effects
When
llmContextExceededoccurs, Factory compacts the session, but the resulting message structure can triggerinvalid message formaton some models (notablykimi-k2.6).Requested Fixes
Add per-model
max_output_tokenshard capsFix
max_tokenscomputationmodel_max_output_tokensImprove BYOK model spec resolution
custom:<model>:cloudaliases resolve to correct limitsValidate message structure after Compaction
Environment
af46be50-2fc2-4e76-b07b-30aeab5ee2b0This is a client-side model invocation control issue that cannot be mitigated by the end user.