Replies: 1 comment
I ran into the same thing in 2022 when we swapped the default DHT for a Redis directory in a 10‑node cluster. The DHT’s hash‑range locks mean every activation request triggers a gossip round‑trip to the node that owns the range. Even with a small cluster, that coordination cost adds ~1-2 ms per lookup, which shows up as a 2-3× hit when you have millions of activations per second. Redis, on the other hand, is a single‑threaded key/value store that can batch pipelined requests and has sub‑millisecond latency on local networks. It also avoids the DHT’s “split‑brain” scenarios where a node can temporarily hold a stale copy of a range and you get extra coordination hops. In production I saw the same pattern: the Redis directory was consistently ~3× faster than the in‑cluster DHT, especially under bursty traffic. If you’re using the default DHT, try measuring the time spent in `DirectoryCl
In our Orleans clusters we use the default in-memory grain directory (DHT). Recently we ran a stress test with the same setup but with the Redis grain directory, because we wanted stronger consistency (link here).
What we saw, and did not expect, was a 2.5-3× performance gain when the cluster used the Redis grain directory instead of the DHT.
Specifically, we ran the same scenarios with both directories, and the cluster using the Redis grain directory handled 2.5-3 times the throughput of the DHT cluster without issues.
Is anyone aware of other cases like this in production environments or similar tests that could verify our findings?
We have one theory for the performance gain, but we haven’t been able to find any mention of it in the literature.
Our theory is:
Directory Coordination
Since the DHT is eventually consistent, it requires sophisticated and frequent internal cluster communication. This coordination adds latency and overhead to grain lookups.
By using Redis, we centralize the directory in an external, strongly consistent key-value store. Each silo performs a lookup in Redis, and the Redis server handles consistency. This removes the need for complex coordination protocols between the silos, which we believe is what greatly improves overall cluster throughput.
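To sanity-check whether per-lookup directory latency alone could plausibly explain a gain of this size, here is a rough back-of-the-envelope sketch in Python. It is not a measurement and not a model of Orleans internals. It assumes a closed-loop load test with a fixed number of in-flight requests, assumes every request pays exactly one directory lookup (no lookup caching), and uses hypothetical numbers: 0.5 ms of non-directory work per request, a 0.5 ms Redis lookup, and a DHT lookup that costs 1.5 ms more than that. All function and variable names are made up for illustration.

```python
def max_throughput(concurrency: int, latency_ms: float) -> float:
    """Little's law for a closed loop: requests/s = in-flight requests / latency."""
    return concurrency * 1000.0 / latency_ms


def throughput_ratio(base_ms: float, fast_lookup_ms: float, slow_lookup_ms: float) -> float:
    """How many times more throughput the fast directory allows at equal concurrency."""
    fast_total = base_ms + fast_lookup_ms
    slow_total = base_ms + slow_lookup_ms
    return slow_total / fast_total


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to illustrate the arithmetic.
    base_ms = 0.5          # assumed non-directory work per request
    redis_lookup_ms = 0.5  # assumed sub-millisecond Redis round trip
    dht_lookup_ms = 2.0    # assumed Redis cost plus 1.5 ms of coordination
    concurrency = 100      # assumed in-flight requests during the test

    redis_rps = max_throughput(concurrency, base_ms + redis_lookup_ms)
    dht_rps = max_throughput(concurrency, base_ms + dht_lookup_ms)
    ratio = throughput_ratio(base_ms, redis_lookup_ms, dht_lookup_ms)

    print(f"Redis directory: {redis_rps:.0f} req/s")
    print(f"DHT directory:   {dht_rps:.0f} req/s")
    print(f"Ratio:           {ratio:.2f}x")
```

Under these assumptions the ratio lands in the 2.5-3× range only when the directory lookup dominates the per-request cost. If the non-directory work per request is much larger than a millisecond, the same extra lookup latency would barely move throughput, so measuring where the time actually goes is still needed to confirm the theory.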