ContextFit submission path for Agent Memory Benchmark / comparison tracking #16

ContextFit has a fresh LongMemEval-S retrieval/evidence-ranking artifact that may be relevant for comparison tracking.

Result summary:

- Dataset: LongMemEval-S cleaned, 500 rows, 470 scored after excluding 30 abstention rows
- Metric: retrieval/evidence ranking, not end-to-end QA accuracy
- Any evidence @5: 96.60%
- Any evidence @10: 98.72%
- All evidence @5: 83.62%
- All evidence @10: 91.28%
- MRR: 0.8999
- Storage/search: ContextFit local token-native index; no vector database
- Optional fusion signal: OpenAI `text-embedding-3-small`, cached locally and fused with ContextFit rankings
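
For readers comparing numbers across systems, the ranking metrics above can be sketched as follows. This is a minimal illustration that assumes the conventional definitions of Any@k, All@k, and MRR; it is not ContextFit's scoring code, and the names `any_at_k`, `all_at_k`, `reciprocal_rank`, and `summarize` are hypothetical. The repro report is the authority on how the published figures were actually computed.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Assumed definitions (not taken from the ContextFit harness):
#   Any@k: at least one gold evidence item appears in the top k results.
#   All@k: every gold evidence item appears in the top k results.
#   MRR:   mean of 1 / rank of the first gold item (0 if none is retrieved).


def any_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> bool:
    return any(item in gold for item in ranked[:k])


def all_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> bool:
    return gold <= set(ranked[:k])


def reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    for rank, item in enumerate(ranked, start=1):
        if item in gold:
            return 1.0 / rank
    return 0.0


def summarize(
    rows: List[Tuple[Sequence[str], Set[str]]],
    ks: Sequence[int] = (5, 10),
) -> Dict[str, float]:
    """Average each metric over the scored rows (abstention rows excluded upstream)."""
    n = len(rows)
    if n == 0:
        raise ValueError("no scored rows")
    out: Dict[str, float] = {}
    for k in ks:
        out[f"any@{k}"] = sum(any_at_k(r, g, k) for r, g in rows) / n
        out[f"all@{k}"] = sum(all_at_k(r, g, k) for r, g in rows) / n
    out["mrr"] = sum(reciprocal_rank(r, g) for r, g in rows) / n
    return out
```

Each row pairs a ranked list of retrieved item IDs with the set of gold evidence IDs for that question. Rows with no gold evidence (the abstention rows) have to be filtered out before calling `summarize`, which matches the 470-of-500 scoring described above.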

Public note: https://www.context.fit/longmemeval-fusion-20260519.html
Repro report: https://github.com/ContextFit/cf/blob/contextfit-lockin-20260518/benchmarks/longmemeval_fusion_claim_966_987_20260519.md
Raw JSON artifact: https://github.com/ContextFit/cf/blob/contextfit-lockin-20260518/benchmarks/longmemeval_fusion_claim_966_987_20260519.json
Artifact SHA-256: `059c778ca389e2a5939505800acffd6349f0be7ada579238023d342784214932`
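
To check a downloaded copy against the digest above, something like the following should work. It is a sketch that assumes the digest covers the raw bytes of the JSON artifact and that the file has been saved locally; the local filename and the helper name `sha256_of_file` are hypothetical. Note that the `blob` URL serves an HTML page, so the raw file is what needs to be hashed.

```python
import hashlib
from pathlib import Path

# Digest as stated in this issue.
EXPECTED_SHA256 = "059c778ca389e2a5939505800acffd6349f0be7ada579238023d342784214932"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Hypothetical local path to the raw JSON artifact.
    local_copy = Path("longmemeval_fusion_claim_966_987_20260519.json")
    if local_copy.exists():
        actual = sha256_of_file(str(local_copy))
        print("match" if actual == EXPECTED_SHA256 else f"mismatch: {actual}")
    else:
        print(f"{local_copy} not found; download the raw file first")
```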

Important caveat: this is a retrieval result and should not be listed as an official LongMemEval QA score unless and until ContextFit is run through the answer-generation and judging harness. The supported wording is: "ContextFit with optional OpenAI fusion reaches 96.6% Any@5 and 98.7% Any@10 evidence retrieval on LongMemEval-S, with no vector database required."

I realize AMB has its own provider interface and answer-generation/judge pipeline, so this LongMemEval retrieval artifact may not be directly leaderboard-eligible. If external memory providers are accepted, I can follow up with a ContextFit provider adapter and an AMB-native output package.
