We should consider printing a short explanatory note at the top of the KV benchmark's stdout output, stating that the measured latencies are for entire KV blocks. These blocks can be quite large (e.g. 1 GB); the exact sizes and their distribution depend on the simulated KV cache workload.
Stating this up front should reduce confusion about the results.
(Issue raised and discussed on the 2026-03-10 KV Cache TF call)
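
To make the proposal concrete, here is a minimal sketch of what such a note could look like. It is not taken from the benchmark's code: the function names (`format_size`, `build_latency_notice`, `print_latency_notice`), the `largest_block_bytes` parameter, the wording of the note, and the use of binary units are all assumptions for illustration.

```python
import sys


def format_size(num_bytes: int) -> str:
    """Render a byte count in binary units, e.g. 1073741824 -> '1.0 GiB'."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


def build_latency_notice(largest_block_bytes: int) -> str:
    """Build the explanatory note (hypothetical wording, not the benchmark's)."""
    return (
        "NOTE: Reported latencies are for entire KV blocks, "
        "not for individual small I/O operations.\n"
        f"NOTE: Blocks in this run can be as large as "
        f"{format_size(largest_block_bytes)}; exact sizes and their "
        "distribution depend on the simulated KV cache workload.\n"
    )


def print_latency_notice(largest_block_bytes: int, stream=None) -> None:
    """Write the note before any results; defaults to stdout."""
    out = stream if stream is not None else sys.stdout
    out.write(build_latency_notice(largest_block_bytes))


if __name__ == "__main__":
    # Assumed example value: a 1 GiB upper bound on block size.
    print_latency_notice(1024 ** 3)
```

Deriving the size from the run's actual workload configuration, rather than hard-coding it, would keep the note accurate when the simulated workload changes.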