Skip to content

On optimizing histogram findBounds() #671

Description

@jdmarshall

Apologies if this has become word salad. This was the contents of my morning walk and some mental exercises last night.

There's a histogram problem that OTEL figured out a couple years ago, and that's the linear bucket scan in the observe() logic. As you increase the number of buckets per label set you increase hosting costs but also collection time. And if you try to generalize your bucket sizes over different classes of traffic, (eg KV store access and write timing and overall response times with the same bucket sets), you can make a real mess.

They solved the insertion problem with a binary search. I'm not convinced that is the best solution to the problem. It might be worth considering other options as any changes to observe() result in diminishing returns if another better option comes along later.

The most obvious problem with BST is that we aren't looking for an equality, we are looking for an inequality. So we need to find the leftmost bucket that is greater than the our sample value. We are in effect looking for a 'between' rather than a match. So when you get close you need to look at one bucket and its subsequent and next bucket. A binary search is fine for that but other data structures like sorted heaps less so.

Secondly, insertions are not uniformly distributed, they are normally distributed. Most bucket sets I have thus far encountered represent a geometric progression of values. Most of the time every second bucket is twice the bound, and the bucket between them is the average of the two, resulting in an effective exponent of around 1.4. This geometric distribution of bucket bounds leads to some buckets having most of the entries and the outliers having few or sometimes zero. Most of those extra buckets are about making sure the p95 and p99 statistics don't artifact so badly that they look like hallucinations.

... except under heavy load most of these bets are off. Histogram cost is proportional to the samples per second, so when server load goes up due to traffic rather than request cost, a bigger and bigger slice of the CPU time is spent on telemetry instead of retiring requests. Especially in Node, if you aren't using cluster mode, because every stat calculation is blocking the event loop. Timers being by far the worst in this regard because they also occupy additional memory (Little's Law), but bucket lookups can also cost L1 cache.

We would be doing the users a service if findBounds required fewer lookups per sample, and particularly for slightly larger buckets than the long-term mean, because as system contention increases the p50 response times will shift right, looking more like the typical the p75 response times. Or p90. So ideally the most popular bucket should be quickest to find and the next ±2 buckets are nearly as efficient.

That suggests a priority balanced Treap, which would be simple enough to implement given that bucket.count is exactly the priority of each bucket. But I don't know how many users would configure enough buckets for that to win out over binary search.

The simplest thing that could work would be to track the n/2 bucket, and do a linear scan of the first or second half of the bucket set based on value <= middle.upperBound. Or the n/3 bucket, if the long tail pattern tends to hold.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions