
kvcache: skip multi-turn cache reads in decode-only mode #284

Open
LouisDDN wants to merge 1 commit into mlcommons:main from LouisDDN:ld/skip-multiturn-decode-only

@LouisDDN
Contributor

Skip multi-turn conversation cache reads when running in decode-only mode, since previous-turn cache entries are never written in this mode.

This change:

  • Avoids cache lookups that are guaranteed to miss
  • Keeps the multi_turn_cache_misses metric accurate, since it is no longer inflated by those guaranteed misses
  • Improves correctness by no longer reading a cache that is never written in this mode

The multi-turn cache read (Step 2) is now guarded by the same `if not self.decode_only` check as the prefill write (Step 3), since neither operation does useful work in decode-only mode.
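A minimal sketch of the guarded flow, assuming a simplified in-memory model. The class `MultiTurnKVCacheSketch`, the `process_request` method, the `multi_turn_cache_hits` counter, the `(conversation_id, turn)` key scheme, and skipping the lookup on a conversation's first turn are hypothetical illustrations, not the benchmark's actual code. Only the `decode_only` guard, the `multi_turn_cache_misses` metric name, and the Step 2 / Step 3 ordering come from this PR.

```python
from dataclasses import dataclass, field


@dataclass
class MultiTurnKVCacheSketch:
    """Hypothetical, simplified model of the per-request cache flow.

    Only the decode_only guard mirrors the PR; the class, method,
    hit counter, and key scheme are illustrative assumptions.
    """

    decode_only: bool = False
    cache: dict = field(default_factory=dict)
    multi_turn_cache_hits: int = 0
    multi_turn_cache_misses: int = 0

    def process_request(self, conversation_id: str, turn: int) -> None:
        # Step 2: look up the previous turn's entry. Guarded because in
        # decode-only mode Step 3 never runs, so this lookup could only miss.
        # (Skipping turn 0 is an assumption of this sketch.)
        if not self.decode_only and turn > 0:
            if (conversation_id, turn - 1) in self.cache:
                self.multi_turn_cache_hits += 1
            else:
                self.multi_turn_cache_misses += 1

        # Step 3: write this turn's prefill entry (skipped in decode-only mode).
        if not self.decode_only:
            self.cache[(conversation_id, turn)] = f"kv:{conversation_id}:{turn}"


# Three turns of one conversation in each mode.
normal = MultiTurnKVCacheSketch(decode_only=False)
decode = MultiTurnKVCacheSketch(decode_only=True)
for turn in range(3):
    normal.process_request("conv-1", turn)
    decode.process_request("conv-1", turn)

print("normal:", normal.multi_turn_cache_hits, normal.multi_turn_cache_misses)
print("decode-only:", decode.multi_turn_cache_hits, decode.multi_turn_cache_misses)
```

In this sketch, the normal run records two hits and no misses, and the decode-only run records nothing. Removing the guard from Step 2 would make the decode-only run record two misses (one per follow-up turn) while its cache stays empty, which is the kind of metric pollution described above.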

Performance impact: negligible (<0.01%); the main benefit is code clarity.

@LouisDDN requested a review from a team on March 20, 2026, 15:52
github-actions bot commented Mar 20, 2026

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@LouisDDN force-pushed the ld/skip-multiturn-decode-only branch from 984ee49 to f45de66 on March 20, 2026, 15:55