Summary
Two compounding issues in the answer-LLM wiring at commit 45fa380 mean that omb run --llm gemini still dispatches answer generation to Groq:
- --llm is parsed but never threaded to get_answer_llm(). At src/memory_bench/cli.py#L39 the flag is captured into a local llm variable, but cli.py#L68 calls get_answer_llm() with no arguments:
      mode=get_mode(mode, llm=get_answer_llm()),
- get_answer_llm() defaults to "groq". At src/memory_bench/llm/__init__.py#L24:
      def get_answer_llm() -> LLM:
          provider = os.environ.get("OMB_ANSWER_LLM", "groq")
          ...
Combined effect: the --llm flag is decorative. The OMB_ANSWER_LLM environment variable is the only way to actually pick an answer LLM, and its fallback of "groq" disagrees with --llm's documented default of "gemini".
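To make the interaction concrete, here is a minimal, self-contained model of the wiring described above. It is a sketch rather than the repository's code: get_answer_provider and run are hypothetical stand-ins that return only the provider name instead of constructing an LLM object.

```python
import os


def get_answer_provider() -> str:
    # Hypothetical stand-in for get_answer_llm(): same env lookup and
    # same "groq" fallback as reported, but returns just the name.
    return os.environ.get("OMB_ANSWER_LLM", "groq")


def run(llm: str = "gemini") -> str:
    # Hypothetical stand-in for the CLI entry point: `llm` is accepted
    # but never passed on, mirroring the call site at cli.py#L68.
    return get_answer_provider()


if __name__ == "__main__":
    os.environ.pop("OMB_ANSWER_LLM", None)
    print(run(llm="gemini"))  # the flag is ignored, so the fallback is used

    os.environ["OMB_ANSWER_LLM"] = "gemini"
    print(run(llm="gemini"))  # only the env var changes the result
```

With the variable unset, the model returns "groq" regardless of the llm argument, which is the behavior the reproducer below exercises end to end.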
Reproducer
git clone https://github.com/vectorize-io/agent-memory-benchmark.git
cd agent-memory-benchmark
git checkout 45fa380
# Even though --llm gemini matches the documented default, Groq is invoked:
unset OMB_ANSWER_LLM
uv run --python 3.12 omb run --dataset locomo --memory bm25 --split locomo10 --query-limit 1 --llm gemini
# → answer phase hits memory_bench/llm/groq.py:32 (GroqLLM.generate) and
# fails with APIConnectionError / 401 / etc., depending on GROQ_API_KEY
# presence and network egress to api.groq.com.
Workaround for callers today: OMB_ANSWER_LLM=gemini omb run ... (env var wins because --llm is unused).
Suggested fix
Two minimal, independent fixes:
- Honor --llm at the call site (cli.py#L68):
      answer_llm = get_llm(llm) if llm else get_answer_llm()
      mode=get_mode(mode, llm=answer_llm),
  This makes --llm gemini actually do what it documents, while still letting OMB_ANSWER_LLM work for callers who prefer the env-var path.
- Align get_answer_llm() default with --llm's default (llm/__init__.py#L24):
      provider = os.environ.get("OMB_ANSWER_LLM", "gemini")
  This matches the Gemini 2.5 Flash judge baseline landed in 45fa380 and the README/CLI documented default. The current "groq" default makes the CLI ship with internally inconsistent semantics.
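The precedence that the two fixes imply together (explicit flag, then OMB_ANSWER_LLM, then a shared default) can be sketched as a single pure function. This is an illustration under assumptions, not the project's API: resolve_answer_provider and DEFAULT_ANSWER_LLM are hypothetical names, and the function returns a provider name rather than an LLM instance.

```python
import os
from typing import Mapping, Optional

# Hypothetical shared constant so the CLI and the env fallback cannot drift.
DEFAULT_ANSWER_LLM = "gemini"


def resolve_answer_provider(
    cli_llm: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Pick the answer provider: --llm first, then the env var, then the default."""
    if cli_llm:
        # An explicit --llm value always wins.
        return cli_llm
    # No flag given: fall back to OMB_ANSWER_LLM, then the shared default.
    return env.get("OMB_ANSWER_LLM", DEFAULT_ANSWER_LLM)


if __name__ == "__main__":
    print(resolve_answer_provider("gemini", {}))
    print(resolve_answer_provider(None, {"OMB_ANSWER_LLM": "groq"}))
    print(resolve_answer_provider(None, {}))
```

One design point to check: the env-var branch is only reachable when the CLI passes None for an omitted flag. If the --llm option itself carries a default of "gemini", the flag branch always wins and OMB_ANSWER_LLM is never consulted, so the option's default would need to be None for both paths to keep working.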
Happy to file a PR with both fixes if helpful — let me know whether you'd prefer one combined PR or two separated by concern.