
Clarify explanation of latencies in KV results #266

@dslik


We should consider adding explanatory text at the top of the KV benchmark's stdout output stating that the measured latencies are for entire KV blocks. These blocks can be quite large (e.g., 1 GB), and their exact sizes and size distribution depend on the simulated KV cache workload.

This should reduce confusion when interpreting the results.
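A minimal sketch of what such a notice could look like, assuming the benchmark knows the block sizes it generated for the run. The function name `latency_notice`, its parameter, and the notice wording are all hypothetical and not taken from the benchmark's code. Sizes are reported in decimal GB to match the "1 GB" example above.

```python
import sys

GB = 10**9  # decimal gigabytes, matching the "1 GB" wording in this issue


def latency_notice(block_sizes_bytes):
    """Hypothetical helper: build a stdout header explaining that reported
    latencies cover whole KV blocks, not individual bytes or tokens."""
    lines = ["NOTE: Reported latencies are measured per KV block, not per byte or per token."]
    if block_sizes_bytes:
        smallest = min(block_sizes_bytes)
        largest = max(block_sizes_bytes)
        lines.append(
            f"Block sizes in this run range from {smallest / GB:.2f} GB to {largest / GB:.2f} GB."
        )
    lines.append("Exact sizes and their distribution depend on the simulated KV cache workload.")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative sizes only; a real run would use the sizes its workload generated.
    example_sizes = [250_000_000, 500_000_000, 1_000_000_000]
    print(latency_notice(example_sizes), file=sys.stdout)
```

Printing the observed size range, rather than a fixed example size, would keep the notice accurate across different simulated workloads.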

(Issue raised and discussed on the 2026-03-10 KV Cache TF call)
