Clarify explanation of latencies in KV results #266

We should consider adding explanatory text at the top of the KV benchmark's stdout output stating that the measured latencies are for entire KV blocks. These blocks can be quite large (e.g. 1 GB); the exact sizes and the distribution of sizes depend on the simulated KV cache workload.

This should reduce confusion when people interpret the reported latencies.
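
As a starting point for discussion, here is a minimal sketch of what such a note could look like. The names `LATENCY_NOTE` and `print_latency_note` are hypothetical and do not come from the benchmark code, and the wording of the note is only a placeholder.

```python
import sys

# Hypothetical wording: the final text would need to be agreed on.
LATENCY_NOTE = (
    "NOTE: Latencies reported below are measured for entire KV blocks.\n"
    "KV blocks can be quite large (e.g. 1 GB); exact sizes and the\n"
    "distribution of sizes depend on the simulated KV cache workload."
)


def print_latency_note(stream=None):
    """Write the explanatory note, framed by rule lines, to `stream`.

    Falls back to sys.stdout when no stream is given.
    """
    if stream is None:
        stream = sys.stdout
    rule = "=" * 72
    print(rule, file=stream)
    print(LATENCY_NOTE, file=stream)
    print(rule, file=stream)


if __name__ == "__main__":
    # Assumed call site: before any benchmark results are printed.
    print_latency_note()
```

Printing the note once, before any results, keeps it visible without changing the format of the result lines themselves.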

(Issue raised and discussed on the 2026-03-10 KV Cache TF call)
