--llm flag is parsed but not used; OMB_ANSWER_LLM env default is 'groq' (not 'gemini')

## Summary

Two compounding issues in the answer-LLM wiring at commit `45fa380` mean that `omb run --llm gemini` still dispatches answer generation to Groq:

1. **`--llm` is parsed but never threaded to `get_answer_llm()`**. At [src/memory_bench/cli.py#L39](https://github.com/vectorize-io/agent-memory-benchmark/blob/main/src/memory_bench/cli.py#L39) the flag is captured into a local `llm` variable, but [cli.py#L68](https://github.com/vectorize-io/agent-memory-benchmark/blob/main/src/memory_bench/cli.py#L68) calls `get_answer_llm()` with no arguments:

   ```python
   mode=get_mode(mode, llm=get_answer_llm()),
   ```

2. **`get_answer_llm()` defaults to `"groq"`**. At [src/memory_bench/llm/__init__.py#L24](https://github.com/vectorize-io/agent-memory-benchmark/blob/main/src/memory_bench/llm/__init__.py#L24):

   ```python
   def get_answer_llm() -> LLM:
       provider = os.environ.get("OMB_ANSWER_LLM", "groq")
       ...
   ```

Combined effect: the `--llm` flag is purely decorative. The `OMB_ANSWER_LLM` environment variable is the only way to select an answer LLM, and its fallback of `"groq"` disagrees with `--llm`'s documented default of `"gemini"`.
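To make the interaction concrete, here is a minimal stand-alone sketch of the behavior described above. It is not the project's code: `answer_provider` and `run` are hypothetical stand-ins for `get_answer_llm()` and the CLI handler, and the environment is passed in as a plain mapping so the example has no side effects.

```python
from typing import Mapping


def answer_provider(env: Mapping[str, str]) -> str:
    # Stand-in for get_answer_llm(): only the env var and the
    # hard-coded "groq" fallback are consulted.
    return env.get("OMB_ANSWER_LLM", "groq")


def run(llm: str, env: Mapping[str, str]) -> str:
    # Stand-in for the CLI handler: `llm` is accepted but never
    # forwarded, mirroring the bare get_answer_llm() call.
    return answer_provider(env)


# --llm gemini with OMB_ANSWER_LLM unset still selects Groq.
flag_ignored = run("gemini", {})

# The env var decides the provider even when the flag says otherwise.
env_wins = run("groq", {"OMB_ANSWER_LLM": "gemini"})

print(flag_ignored, env_wins)
```

In this sketch the flag value has no influence on the result, which is the same shape as the reported behavior.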

## Reproducer

```bash
git clone https://github.com/vectorize-io/agent-memory-benchmark.git
cd agent-memory-benchmark
git checkout 45fa380
# Even though --llm gemini matches the documented default, Groq is invoked:
unset OMB_ANSWER_LLM
uv run --python 3.12 omb run --dataset locomo --memory bm25 --split locomo10 --query-limit 1 --llm gemini
# → the answer phase reaches memory_bench/llm/groq.py:32 (GroqLLM.generate) and
#   fails with APIConnectionError / 401 / etc., depending on whether
#   GROQ_API_KEY is set and whether api.groq.com is reachable.
```

Workaround for callers today: `OMB_ANSWER_LLM=gemini omb run ...` (env var wins because `--llm` is unused).

## Suggested fix

Two minimal, independent fixes:

1. **Honor `--llm` at the call site** ([cli.py#L68](https://github.com/vectorize-io/agent-memory-benchmark/blob/main/src/memory_bench/cli.py#L68)):

   ```python
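   # Resolve the answer LLM before the existing call at cli.py#L68:
   answer_llm = get_llm(llm) if llm else get_answer_llm()
   # ...then pass it through in place of the bare get_answer_llm():
   mode=get_mode(mode, llm=answer_llm),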
   ```

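   This makes `--llm gemini` do what it documents, while still letting `OMB_ANSWER_LLM` work for callers who prefer the env-var path. One caveat: the env-var branch is only reachable when `llm` is falsy, so if the flag's parser default is the string `"gemini"`, that default would need to become `None` for `OMB_ANSWER_LLM` to keep any effect.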

2. **Align `get_answer_llm()` default with `--llm`'s default** ([llm/__init__.py#L24](https://github.com/vectorize-io/agent-memory-benchmark/blob/main/src/memory_bench/llm/__init__.py#L24)):

   ```python
   provider = os.environ.get("OMB_ANSWER_LLM", "gemini")
   ```

   This matches the Gemini 2.5 Flash judge baseline that landed in `45fa380` as well as the default documented in the README and CLI help. With the current `"groq"` default, the CLI ships with internally inconsistent semantics.
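Taken together, the two fixes amount to a single precedence rule: explicit flag first, then `OMB_ANSWER_LLM`, then the `"gemini"` default. The sketch below illustrates that rule in isolation. `resolve_answer_provider` and `DEFAULT_PROVIDER` are hypothetical names, not part of the repository, and it assumes the `--llm` option yields `None` when the user does not pass it.

```python
from typing import Mapping, Optional

# Assumed shared default, matching the documented --llm default.
DEFAULT_PROVIDER = "gemini"


def resolve_answer_provider(flag: Optional[str], env: Mapping[str, str]) -> str:
    """Pick the answer-LLM provider name: flag > OMB_ANSWER_LLM > default."""
    if flag:
        # An explicit --llm value always wins.
        return flag
    # An unset or empty env var falls back to the shared default.
    return env.get("OMB_ANSWER_LLM") or DEFAULT_PROVIDER


# No flag, no env var: the documented default applies.
print(resolve_answer_provider(None, {}))
# Explicit flag overrides the env var.
print(resolve_answer_provider("groq", {"OMB_ANSWER_LLM": "gemini"}))
# No flag: the env var is honored.
print(resolve_answer_provider(None, {"OMB_ANSWER_LLM": "groq"}))
```

Keeping the default in one place means the CLI flag and the env-var path cannot drift apart again. Treating an empty `OMB_ANSWER_LLM` as unset is a design choice of this sketch, not something the current code does.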

Happy to file a PR with both fixes if helpful. Let me know whether you'd prefer one combined PR or two, separated by concern.
