Bug Description
When vlm.max_tokens is not set in the config, the OpenAI VLM backend defaults to 32768 completion tokens:
# openviking/models/vlm/backends/openai_vlm.py (lines 236, 274, on main and in 0.4.4)
max_tokens = self.max_tokens or 32768
kwargs["max_completion_tokens" if is_reasoning else "max_tokens"] = max_tokens
Many widely-used OpenAI models cap completion tokens below 32768 (e.g. gpt-4o-mini and gpt-4o allow at most 16384). The API then rejects the memory-extraction call with HTTP 400:
openviking.session.compressor_v2 - ERROR - [trajectory] Failed to extract:
Error code: 400 - {'error': {'message': 'max_tokens is too large: 32768.
This model supports at most 16384 completion tokens, whereas you provided 32768.',
'type': 'invalid_request_error', 'param': 'max_tokens', 'code': 'invalid_value'}}
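The fallback quoted above can be reproduced in isolation. The following is a minimal sketch, not the backend's actual code: `build_token_kwargs` is a hypothetical helper that only mirrors the two quoted lines, and the 16384 cap is the value reported in the error message.

```python
from typing import Dict, Optional

# Fallback used by the backend when vlm.max_tokens is unset (from the quoted source).
HARDCODED_DEFAULT = 32768
# Completion-token cap reported by the API for gpt-4o-mini in the error above.
GPT_4O_MINI_CAP = 16384


def build_token_kwargs(configured: Optional[int], is_reasoning: bool) -> Dict[str, int]:
    """Hypothetical helper mirroring the two quoted backend lines."""
    max_tokens = configured or HARDCODED_DEFAULT
    key = "max_completion_tokens" if is_reasoning else "max_tokens"
    return {key: max_tokens}


# With no max_tokens configured, the request asks for more than the model allows.
kwargs = build_token_kwargs(None, is_reasoning=False)
exceeds_cap = kwargs["max_tokens"] > GPT_4O_MINI_CAP
```

Here `exceeds_cap` is true, which is the condition that makes the API reject the call with the 400 shown in the log.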
Impact
- Memory extraction completes without raising, but logs Extracted 0 memories, and total_memories stays 0.
- The failure is silent from the API's perspective: commit returns 200 / accepted, and the error is only visible in server logs. It is easy to mistake for "extraction just didn't find anything."
- Affects any deployment using a model whose max completion tokens < 32768 without explicitly setting vlm.max_tokens. gpt-4o-mini is a very common default choice.
Reproduction
- ov.conf → vlm: { provider: "openai", model: "gpt-4o-mini", ... } (no max_tokens)
- POST /api/v1/sessions → .../messages/batch → .../commit
- Server log shows the 400 above; GET /api/v1/stats/memories returns total_memories: 0
Workaround
Set an explicit cap in the vlm block:
"vlm": { "provider": "openai", "model": "gpt-4o-mini", "max_tokens": 16384 }
After this, extraction succeeds (Extracted N memories for long_term) and memories become searchable.
Suggested fix
A hardcoded "or 32768" fallback is too high for most non-Volcengine models. Options:
- Lower the default to a broadly-safe value (e.g. 16384), or
- Derive a per-model cap, or
- At minimum, when the API returns the "max_tokens is too large" 400, surface it as a clear configuration error (e.g. log a WARNING/raise) instead of silently extracting 0 memories.
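As a rough illustration of the second and third options, the sketch below clamps the requested value to a per-model cap and turns the specific 400 into an explicit configuration error. All names here (`MODEL_COMPLETION_CAPS`, `resolve_max_tokens`, `VLMConfigError`, `raise_if_max_tokens_error`) are hypothetical and do not exist in OpenViking. The cap table contains only the two values stated in this report, and the error is matched on its message text rather than through the OpenAI client's exception types.

```python
from typing import Optional

# Hypothetical cap table; only the caps stated in this report are listed.
MODEL_COMPLETION_CAPS = {"gpt-4o": 16384, "gpt-4o-mini": 16384}
# Assumed broadly-safe fallback, as proposed in the first option.
SAFE_DEFAULT_MAX_TOKENS = 16384


class VLMConfigError(Exception):
    """Hypothetical error type for misconfigured VLM settings."""


def resolve_max_tokens(model: str, configured: Optional[int]) -> int:
    """Pick a completion-token limit that does not exceed the model's known cap."""
    requested = configured or SAFE_DEFAULT_MAX_TOKENS
    cap = MODEL_COMPLETION_CAPS.get(model)
    # Unknown models keep the requested value; known models are clamped.
    return requested if cap is None else min(requested, cap)


def raise_if_max_tokens_error(error_message: str, model: str) -> None:
    """Turn the 'max_tokens is too large' 400 into an explicit config error."""
    if "max_tokens is too large" in error_message:
        raise VLMConfigError(
            f"vlm.max_tokens exceeds the completion-token limit of {model}; "
            "set vlm.max_tokens to a value the model supports."
        )
```

Clamping avoids the 400 for models in the table, while the explicit error covers models that are not listed, so a misconfiguration would no longer look like an extraction that found nothing.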
Environment: OpenViking 0.4.4 (PyPI), Python 3.12, api_key auth mode, OpenAI gpt-4o-mini.