[Bug] VLM default max_tokens=32768 exceeds completion-token limit of common models (e.g. gpt-4o-mini → 400), silently yielding 0 extracted memories #2751

## Bug Description

When `vlm.max_tokens` is not set in the config, the OpenAI VLM backend defaults to **32768** completion tokens:

```python
# openviking/models/vlm/backends/openai_vlm.py  (lines 236, 274, on main and in 0.4.4)
max_tokens = self.max_tokens or 32768
kwargs["max_completion_tokens" if is_reasoning else "max_tokens"] = max_tokens
```

Many widely used OpenAI models cap completion tokens **below** 32768 (e.g. `gpt-4o-mini` and `gpt-4o` allow at most **16384**). The API then rejects the memory-extraction call with HTTP 400:

```
openviking.session.compressor_v2 - ERROR - [trajectory] Failed to extract:
Error code: 400 - {'error': {'message': 'max_tokens is too large: 32768.
This model supports at most 16384 completion tokens, whereas you provided 32768.',
'type': 'invalid_request_error', 'param': 'max_tokens', 'code': 'invalid_value'}}
```
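The limit the model actually accepts is stated in the 400 message itself, so a backend could recover it programmatically. A minimal sketch, assuming the provider keeps the wording shown above (`supports at most N completion tokens`); `parse_completion_cap` is a hypothetical helper, not part of OpenViking or the OpenAI SDK:

```python
import re
from typing import Optional

# Matches the wording of the OpenAI 400 shown above; this wording is assumed, not guaranteed stable.
_CAP_RE = re.compile(r"supports at most (\d+) completion tokens")


def parse_completion_cap(message: str) -> Optional[int]:
    """Return the model's completion-token cap from a 'max_tokens is too large'
    error message, or None if the message does not match the expected wording."""
    match = _CAP_RE.search(message)
    return int(match.group(1)) if match else None


msg = (
    "max_tokens is too large: 32768. This model supports at most 16384 "
    "completion tokens, whereas you provided 32768."
)
print(parse_completion_cap(msg))  # 16384
```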

## Impact

- Memory extraction **completes without raising**, but logs `Extracted 0 memories` and `total_memories` stays `0`.
- It is **silent from the API's perspective**: `commit` returns 200 / `accepted`, and the failure is visible only in server logs. This is easy to mistake for "extraction just didn't find anything."
- It affects any deployment that uses a model whose completion-token limit is below 32768 without explicitly setting `vlm.max_tokens`; `gpt-4o-mini` is a very common default choice.

## Reproduction

1. `ov.conf` → `vlm: { provider: "openai", model: "gpt-4o-mini", ... }` (no `max_tokens`)
2. `POST /api/v1/sessions` → `.../messages/batch` → `.../commit`
3. Server log shows the 400 above; `GET /api/v1/stats/memories` returns `total_memories: 0`

## Workaround

Set an explicit cap in the `vlm` block:
```json
"vlm": { "provider": "openai", "model": "gpt-4o-mini", "max_tokens": 16384 }
```
After this, extraction succeeds (`Extracted N memories for long_term`) and memories become searchable.
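To catch a missing cap before the first commit, a deployment could lint the `vlm` block at startup. A minimal sketch, assuming `ov.conf` parses as JSON with a top-level `vlm` object as in the snippet above; `check_vlm_max_tokens` and `KNOWN_COMPLETION_CAPS` are hypothetical, and the table holds only the two limits quoted in this issue:

```python
import json
from typing import List

# Hypothetical table: only the completion-token caps quoted in this issue.
KNOWN_COMPLETION_CAPS = {"gpt-4o-mini": 16384, "gpt-4o": 16384}
# Mirrors `self.max_tokens or 32768` in openai_vlm.py.
OPENVIKING_DEFAULT = 32768


def check_vlm_max_tokens(conf: dict) -> List[str]:
    """Return human-readable problems with conf['vlm']['max_tokens'] (empty list if none)."""
    vlm = conf.get("vlm", {})
    model = vlm.get("model", "")
    cap = KNOWN_COMPLETION_CAPS.get(model)
    # Same fallback semantics as the backend: a missing or falsy value becomes 32768.
    effective = vlm.get("max_tokens") or OPENVIKING_DEFAULT
    problems = []
    if "max_tokens" not in vlm:
        problems.append(f"vlm.max_tokens is unset; the backend will send {OPENVIKING_DEFAULT}")
    if cap is not None and effective > cap:
        problems.append(f"{model} accepts at most {cap} completion tokens, got {effective}")
    return problems


conf = json.loads('{"vlm": {"provider": "openai", "model": "gpt-4o-mini"}}')
for problem in check_vlm_max_tokens(conf):
    print("WARNING:", problem)
```

Models missing from the table only trigger the "unset" warning, so the check errs toward asking for an explicit value rather than guessing a limit.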

## Suggested fix

A hardcoded `or 32768` default is too high for most non-Volcengine models. Options:
- Lower the default to a broadly-safe value (e.g. 16384), or
- Derive a per-model cap, or
- At minimum, when the API returns the `max_tokens is too large` 400, surface it as a clear configuration error (e.g. log a `WARNING`/raise) instead of silently extracting 0 memories.
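The options above can be combined: clamp the requested value to a known per-model cap, fall back to a broadly safe default otherwise, and turn the 400 into an explicit configuration error instead of an empty extraction. A minimal sketch; `resolve_max_tokens`, `raise_if_max_tokens_too_large`, `VLMConfigError`, `MODEL_COMPLETION_CAPS`, and `SAFE_DEFAULT` are hypothetical names, not existing OpenViking APIs, and the caps table lists only the limits cited in this issue:

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

# Hypothetical per-model caps (option 2); only the values cited in this issue.
MODEL_COMPLETION_CAPS = {"gpt-4o-mini": 16384, "gpt-4o": 16384}
# Hypothetical broadly safe fallback (option 1) when nothing is configured or known.
SAFE_DEFAULT = 16384


class VLMConfigError(RuntimeError):
    """Raised when the provider rejects the configured completion-token budget."""


def resolve_max_tokens(configured: Optional[int], model: str) -> int:
    """Pick the completion-token budget: explicit config wins, clamped to a known cap."""
    cap = MODEL_COMPLETION_CAPS.get(model)
    requested = configured or (cap if cap is not None else SAFE_DEFAULT)
    if cap is not None and requested > cap:
        logger.warning("vlm.max_tokens=%d exceeds %s cap %d; clamping", requested, model, cap)
        return cap
    return requested


def raise_if_max_tokens_too_large(error_message: str, model: str) -> None:
    """Option 3: convert a 'max_tokens is too large' 400 into a clear config error."""
    if "max_tokens is too large" in error_message:
        raise VLMConfigError(
            f"vlm.max_tokens exceeds the completion-token limit of {model}; "
            f"set vlm.max_tokens explicitly in ov.conf. Provider said: {error_message}"
        )


print(resolve_max_tokens(None, "gpt-4o-mini"))       # 16384
print(resolve_max_tokens(32768, "gpt-4o-mini"))      # 16384 (clamped, logs a WARNING)
print(resolve_max_tokens(None, "some-other-model"))  # 16384 (safe default)

try:
    raise_if_max_tokens_too_large("max_tokens is too large: 32768.", "gpt-4o-mini")
except VLMConfigError as exc:
    print("config error:", exc)
```

Clamping keeps extraction working for models in the table, while the error conversion covers models missing from it, so a bad budget shows up as a failure rather than as `Extracted 0 memories`. Where the conversion hook would be called (e.g. the exception handler around the extraction request) is left open here.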

Environment: OpenViking **0.4.4** (PyPI), Python 3.12, `api_key` auth mode, OpenAI `gpt-4o-mini`.
