My understanding is that we have not yet resolved the question of whether the technique currently used to generate test data in KVCache leads to overly high dedupe ratios.
This issue is intended to restart that discussion and get us to a common understanding of the problem and a rough consensus on how to resolve it.
My understanding of the problem, based on talking to Russ, is that he has seen real storage products that dedupe bulk data at 512-byte granularity. That is the core reason he has raised questions about the technique KVCache uses today to generate non-dedupable data: it does not look to him like it can guarantee uniqueness at that granularity, which would give those vendors an unfair advantage over the other vendors.
IMHO it's important to separate the question of the speed and/or CPU cost of generating random data from the question of its non-dedupability. Our first priority must be deciding whether the current technique results in unfairness; only then should we decide what to do about it.
I could not assign Hazem as an "assignee", so we'll just have to pull him into the discussions manually.