TypeError: LoComoDataset.load_documents() got an unexpected keyword argument 'user_ids' #13

@King-Brownie

Summary

At commit 45fa380, running the following command raises a TypeError:

omb run --dataset locomo --memory bm25 --split locomo10 --query-limit N

The error is:

TypeError: LoComoDataset.load_documents() got an unexpected keyword argument 'user_ids'

Root cause

The base class Dataset.load_documents() (src/memory_bench/dataset/base.py#L81-L92) declares:

@abstractmethod
def load_documents(
    self,
    split: str,
    category: str | None = None,
    limit: int | None = None,
    ids: set[str] | None = None,
    user_ids: set[str] | None = None,
) -> list[Document]:

The runner at src/memory_bench/runner.py#L127-L128 always passes user_ids when the dataset has an isolation_unit and a query_limit is set:

query_user_ids = {q.user_id for q in queries if q.user_id}
documents = dataset.load_documents(split, category=doc_category, limit=doc_limit, user_ids=query_user_ids)
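The effect of that call on an override with an older signature can be reproduced in isolation. The sketch below is not the project's code: `StaleDataset` and `call_like_runner` are hypothetical stand-ins, and only the shape of the `load_documents` call is taken from the snippet above.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class Dataset(ABC):
    """Simplified stand-in for the base class; only the signature matters here."""

    isolation_unit: str | None = None

    @abstractmethod
    def load_documents(self, split, category=None, limit=None, ids=None, user_ids=None):
        ...


class StaleDataset(Dataset):
    """Hypothetical override written before user_ids was added to the base class."""

    isolation_unit = "conversation"

    def load_documents(self, split, category=None, limit=None, ids=None):
        return []


def call_like_runner(dataset, split, query_user_ids):
    # Same keyword arguments as the runner call quoted above (assumed shape only).
    return dataset.load_documents(split, category=None, limit=None, user_ids=query_user_ids)


try:
    call_like_runner(StaleDataset(), "locomo10", {"conv-1"})
    error_message = None
except TypeError as exc:
    # Python rejects the unknown keyword before the method body ever runs.
    error_message = str(exc)

print(error_message)
```

Note that `ABC` does not compare signatures: `StaleDataset` instantiates without complaint because it defines a method with the right name, so the mismatch only surfaces at call time.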

However, 4 of the 7 concrete dataset overrides omit the user_ids parameter from their signature:

Dataset             user_ids in override?   isolation_unit   Bug fires today?
LoComoDataset       no                      'conversation'   YES
MemBenchDataset     no                      None             latent
MemSimDataset       no                      None             latent
PersonaMemDataset   no                      None             latent
LongMemEvalDataset  yes                     'question'       no
LifeBenchDataset    yes                     'user'           no
BEAMDataset         yes                     'conversation'   no

Only LoCoMo actively fires the TypeError (it has isolation_unit='conversation'). The other 3 are latent abstract-contract violations that would fire the moment their isolation_unit is set.
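Latent mismatches of this kind can be caught without running the benchmark by comparing each override's signature against the base method. The following is a minimal sketch using the standard `inspect` module; `missing_parameters` and the three toy classes are hypothetical and not part of the repository.

```python
import inspect


def missing_parameters(base_cls, sub_cls, method_name):
    """Return base-method parameter names that the override does not accept."""
    base_params = inspect.signature(getattr(base_cls, method_name)).parameters
    sub_params = inspect.signature(getattr(sub_cls, method_name)).parameters
    # An override declaring **kwargs can absorb any keyword, so nothing is missing.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in sub_params.values()):
        return []
    return [name for name in base_params if name not in sub_params]


class Base:
    def load_documents(self, split, category=None, limit=None, ids=None, user_ids=None):
        raise NotImplementedError


class UpToDate(Base):
    def load_documents(self, split, category=None, limit=None, ids=None, user_ids=None):
        return []


class Stale(Base):
    # Hypothetical override missing the newest keyword.
    def load_documents(self, split, category=None, limit=None, ids=None):
        return []


for cls in (UpToDate, Stale):
    print(cls.__name__, missing_parameters(Base, cls, "load_documents"))
```

A check like this could run as a unit test over every concrete dataset class, so that a new base-class parameter fails fast rather than waiting for a dataset's isolation_unit to be set.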

Minimal reproducer

git clone https://github.com/vectorize-io/agent-memory-benchmark.git
cd agent-memory-benchmark
git checkout 45fa380
uv run --python 3.12 omb run --dataset locomo --memory bm25 --split locomo10 --query-limit 3
# → TypeError: LoComoDataset.load_documents() got an unexpected keyword argument 'user_ids'

Fix

Filed as PR: adds the missing user_ids kwarg to all 4 affected datasets. LoCoMo gets the filter logic matching the BEAM/LongMemEval/LifeBench reference pattern. The other 3 are signature-only changes, because their isolation_unit=None means the runner never passes them user_ids today.
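For illustration, an override that accepts and honors user_ids might look like the sketch below. This is an assumption about the general shape of the fix, not the reference pattern itself, which is not quoted in this issue: the `Document` fields, the `FixedDataset` class, and the order in which filters and the limit are applied are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical minimal document; the real class may carry more fields.
    id: str
    content: str
    user_id: str | None = None


class FixedDataset:
    """Toy dataset whose load_documents matches the base signature."""

    isolation_unit = "conversation"

    def __init__(self, documents):
        self._documents = list(documents)

    def load_documents(
        self,
        split: str,
        category: str | None = None,
        limit: int | None = None,
        ids: set[str] | None = None,
        user_ids: set[str] | None = None,
    ) -> list[Document]:
        docs = self._documents
        if ids is not None:
            docs = [d for d in docs if d.id in ids]
        if user_ids is not None:
            # Keep only documents belonging to the requested isolation units.
            docs = [d for d in docs if d.user_id in user_ids]
        if limit is not None:
            # Assumption: the limit applies after filtering.
            docs = docs[:limit]
        return docs


dataset = FixedDataset(
    [
        Document("d1", "hello", user_id="conv-1"),
        Document("d2", "world", user_id="conv-2"),
        Document("d3", "again", user_id="conv-1"),
    ]
)
selected = dataset.load_documents("locomo10", user_ids={"conv-1"})
print([d.id for d in selected])
```

For the three datasets with isolation_unit=None, accepting the parameter and ignoring it would be enough to satisfy the runner's call.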
