ContextFit has a fresh LongMemEval-S retrieval/evidence-ranking artifact that may be relevant for comparison tracking.
Result summary:
- Dataset: LongMemEval-S cleaned, 500 rows, 470 scored after excluding 30 abstention rows
- Metric: retrieval/evidence ranking, not end-to-end QA accuracy
- Any evidence @5: 96.60%
- Any evidence @10: 98.72%
- All evidence @5: 83.62%
- All evidence @10: 91.28%
- MRR: 0.8999
- Storage/search: ContextFit local token-native index; no vector database
- Optional fusion signal: OpenAI text-embedding-3-small, cached locally and fused with ContextFit rankings
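
For readers comparing these numbers against other retrieval results, here is a minimal sketch of how Any@k, All@k, and MRR are commonly computed. It assumes the usual definitions (Any@k: at least one gold evidence item in the top k; All@k: every gold evidence item in the top k; MRR: mean of 1/rank of the first gold item). The function names and the row layout are hypothetical and are not taken from the ContextFit harness; the repro report is the authority on the exact definitions used.

```python
from typing import Iterable, Sequence, Set, Tuple

# Hypothetical row layout: (ranked candidate ids, set of gold evidence ids).
Row = Tuple[Sequence[str], Set[str]]


def any_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> bool:
    """True if at least one gold evidence id appears in the top k."""
    return any(item in gold for item in ranked[:k])


def all_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> bool:
    """True if every gold evidence id appears in the top k."""
    return gold.issubset(ranked[:k])


def reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    """1/rank of the first gold evidence id, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in gold:
            return 1.0 / rank
    return 0.0


def summarize(rows: Iterable[Row], ks: Sequence[int] = (5, 10)) -> dict:
    """Average the per-row metrics over all scored rows."""
    rows = list(rows)
    n = len(rows)
    if n == 0:
        raise ValueError("no scored rows")
    out = {}
    for k in ks:
        out[f"any@{k}"] = sum(any_at_k(r, g, k) for r, g in rows) / n
        out[f"all@{k}"] = sum(all_at_k(r, g, k) for r, g in rows) / n
    out["mrr"] = sum(reciprocal_rank(r, g) for r, g in rows) / n
    return out
```

Under these definitions, abstention rows have no gold evidence to rank, which is consistent with excluding them and scoring 470 of the 500 rows.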
Public note: https://www.context.fit/longmemeval-fusion-20260519.html
Repro report: https://github.com/ContextFit/cf/blob/contextfit-lockin-20260518/benchmarks/longmemeval_fusion_claim_966_987_20260519.md
Raw JSON artifact: https://github.com/ContextFit/cf/blob/contextfit-lockin-20260518/benchmarks/longmemeval_fusion_claim_966_987_20260519.json
Artifact SHA-256: 059c778ca389e2a5939505800acffd6349f0be7ada579238023d342784214932
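
To check a downloaded copy against the hash above, something like the following sketch should work. It assumes the SHA-256 covers the raw bytes of the JSON artifact file; the helper names and any local filename you pass in are hypothetical.

```python
import hashlib

# Hash quoted in this post for the raw JSON artifact.
EXPECTED_SHA256 = "059c778ca389e2a5939505800acffd6349f0be7ada579238023d342784214932"


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of the file's raw bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large artifacts are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest to the expected hex string, ignoring case."""
    return sha256_of_file(path) == expected.strip().lower()
```

Calling `matches_expected()` with the path to a locally saved copy of the raw file returns `True` only if the bytes match. Saving the rendered GitHub blob page instead of the raw file would produce a different digest.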
Important caveat: this should not be listed as an official LongMemEval QA score unless/until ContextFit is run through the answer-generation + judging harness. The supported wording is: ContextFit with optional OpenAI fusion reaches 96.6% Any@5 and 98.7% Any@10 evidence retrieval on LongMemEval-S, with no vector database required.
I realize AMB has its own provider interface and answer-generation/judge pipeline, so this LongMemEval retrieval artifact may not be directly leaderboard-eligible. If external memory providers are accepted, I can follow up with a ContextFit provider adapter and AMB-native output package.