[Bug] VLM default max_tokens=32768 exceeds completion-token limit of common models (e.g. gpt-4o-mini → 400), silently yielding 0 extracted memories #2751

@gleydson115-code

Bug Description

When vlm.max_tokens is not set in the config, the OpenAI VLM backend defaults to 32768 completion tokens:

# openviking/models/vlm/backends/openai_vlm.py  (lines 236, 274, on main and in 0.4.4)
max_tokens = self.max_tokens or 32768
kwargs["max_completion_tokens" if is_reasoning else "max_tokens"] = max_tokens

Many widely-used OpenAI models cap completion tokens below 32768 (e.g. gpt-4o-mini and gpt-4o allow at most 16384). The API then rejects the memory-extraction call with HTTP 400:

openviking.session.compressor_v2 - ERROR - [trajectory] Failed to extract:
Error code: 400 - {'error': {'message': 'max_tokens is too large: 32768.
This model supports at most 16384 completion tokens, whereas you provided 32768.',
'type': 'invalid_request_error', 'param': 'max_tokens', 'code': 'invalid_value'}}
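
The error message above carries both the requested value and the model's limit, so it could be recognized and turned into an actionable configuration error. A minimal sketch, assuming the message text keeps the wording shown in the log; `VLMConfigError` and `classify_extraction_error` are hypothetical names, not part of OpenViking:

```python
import re
from typing import Optional


class VLMConfigError(RuntimeError):
    """Hypothetical exception for a misconfigured vlm block (not an OpenViking class)."""


# Matches the wording seen in the log above; the exact text is an assumption
# and may differ across providers or API versions.
_TOO_LARGE = re.compile(
    r"max_tokens is too large: (\d+)\.\s*"
    r"This model supports at most (\d+) completion tokens"
)


def classify_extraction_error(message: str) -> Optional[VLMConfigError]:
    """Return a VLMConfigError if the message is the token-limit 400, else None."""
    match = _TOO_LARGE.search(message)
    if match is None:
        return None
    requested = int(match.group(1))
    limit = int(match.group(2))
    return VLMConfigError(
        f"vlm.max_tokens={requested} exceeds the model limit of {limit}; "
        f"set vlm.max_tokens to {limit} or lower"
    )
```

A caller could log or raise the returned error instead of reporting 0 extracted memories.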

Impact

  • Memory extraction completes without raising, but logs "Extracted 0 memories" and total_memories stays 0.
  • It's silent from the API's perspective: commit returns 200 / accepted; the failure is only visible in server logs. Easy to mistake for "extraction just didn't find anything."
  • Affects any deployment using a model whose max completion tokens < 32768 without explicitly setting vlm.max_tokens; gpt-4o-mini is a very common default choice.

Reproduction

  1. In ov.conf, set vlm: { provider: "openai", model: "gpt-4o-mini", ... } (no max_tokens)
  2. POST /api/v1/sessions, then .../messages/batch, then .../commit
  3. Server log shows the 400 above; GET /api/v1/stats/memories returns total_memories: 0

Workaround

Set an explicit cap in the vlm block:

"vlm": { "provider": "openai", "model": "gpt-4o-mini", "max_tokens": 16384 }

After this, extraction succeeds (Extracted N memories for long_term) and memories become searchable.
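
Until the default changes, a deployment could check its own config for the missing cap before starting the server. A small sketch, assuming the config is available as parsed JSON with the vlm block shown above; `vlm_cap_is_explicit` is a hypothetical helper, not an OpenViking function:

```python
import json


def vlm_cap_is_explicit(config: dict) -> bool:
    """Return True if the vlm block sets a positive integer max_tokens."""
    vlm = config.get("vlm") or {}
    value = vlm.get("max_tokens")
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


# The workaround config from this report passes the check.
with_cap = json.loads(
    '{"vlm": {"provider": "openai", "model": "gpt-4o-mini", "max_tokens": 16384}}'
)
# The reproduction config (no max_tokens) does not.
without_cap = json.loads('{"vlm": {"provider": "openai", "model": "gpt-4o-mini"}}')
```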

Suggested fix

The hardcoded fallback (self.max_tokens or 32768) is too high for most non-Volcengine models. Options:

  • Lower the default to a broadly-safe value (e.g. 16384), or
  • Derive a per-model cap, or
  • At minimum, when the API returns the "max_tokens is too large" 400, surface it as a clear configuration error (e.g. log a WARNING or raise) instead of silently extracting 0 memories.
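
The per-model option could look roughly like the sketch below. It is an illustration, not a patch against openai_vlm.py: `MODEL_COMPLETION_CAPS` and `resolve_max_tokens` are hypothetical names, and the table holds only the two limits cited in this report. Models not in the table keep the current 32768 fallback, which is an assumption made here to avoid changing behavior for providers that accept it.

```python
from typing import Optional

# Hypothetical lookup table. Only these two limits come from this report;
# any other model would need its documented limit added by hand.
MODEL_COMPLETION_CAPS = {
    "gpt-4o-mini": 16384,
    "gpt-4o": 16384,
}

# The fallback currently hardcoded in the backend.
CURRENT_DEFAULT_MAX_TOKENS = 32768


def resolve_max_tokens(model: str, configured: Optional[int]) -> int:
    """Pick the completion-token budget for a request.

    Uses the configured value if set, else the current default, and
    clamps the result to the model's known cap when one is listed.
    """
    requested = configured or CURRENT_DEFAULT_MAX_TOKENS
    cap = MODEL_COMPLETION_CAPS.get(model)
    if cap is None:
        # Unknown model: behave exactly as today.
        return requested
    return min(requested, cap)
```

With this shape, an unset vlm.max_tokens resolves to 16384 for gpt-4o-mini, while an explicit smaller value such as 4096 is left untouched.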

Environment: OpenViking 0.4.4 (PyPI), Python 3.12, api_key auth mode, OpenAI gpt-4o-mini.

Labels: bug, scenario:kernel, urgency:bug
Status: In progress