Goal
Use GRPC clients as a more direct subscription update source instead of as a backup.
Background
Currently our GRPC clients don't subscribe to an account when a subscription is requested, but
instead add it to a hashset and then activate those pending subscriptions at an interval.
This has the following problems:
- we get no update until the subscription for an account is activated after the first request for that account
- for triton GRPC, which does not support the backfill feature, we miss those updates forever
- for helius GRPC we see those updates coming in, but there is a long delay from the time a
subscription is requested until the time it is activated and starts receiving updates
The reason for not immediately activating subscriptions is that each GRPC update connection
takes a complete filter and does not support adding/removing individual accounts from it.
Instead, every time we need to add/remove an account we need to send the entire filter again. Due
to the large payload I felt this wouldn't scale to run at a millisecond level, especially once
the filter grew to thousands of accounts.
However, if that were made possible then, combined with the backfill feature that helius provides,
the GRPC client could become a reliable real-time source of account updates.
Proposal
Introduce the concept of stable sets of subscriptions vs. fluctuating sets. Keep the streams
with fluctuating sets small such that we can update them very frequently.
Move subscriptions from the fluctuating set to the stable set after they pass a time or other
threshold in the fluctuating set.
This approach would be used for account subscriptions, but not for program subscriptions
since the latter are already far more stable and we have fewer of them, so they can be managed in
a single separate stream.
Implementation Overview
The stable vs. fluctuating sets would be managed via an approach similar to a generational
garbage collector. Stable sets would be part of the old generation and fluctuating sets would
be part of the new generation.
Subscriptions
- when a user requests a subscription it is immediately added to the
subscriptions: HashSet<Pubkey> and then added to the current new stream
- when a subscription is removed it is immediately removed from the
subscriptions: HashSet<Pubkey>, but no stream is updated yet
- when an update is received for a subscription that is not in the
subscriptions: HashSet<Pubkey> it is ignored
- this means that the streams might hold more subscriptions than the
subscriptions: HashSet<Pubkey> at any given time
- however the overhead of ignored updates is preferable to the cost of updating the streams on
each unsubscribe, especially if that account is inside an old stream (a minimal bookkeeping
sketch follows this list)
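Below is a minimal sketch of this bookkeeping. It assumes a simplified Pubkey stand-in and a
hypothetical Stream handle; the real GRPC connection/filter types are not shown.

```rust
use std::collections::HashSet;

// Stand-in for solana_sdk::pubkey::Pubkey so the sketch is self-contained.
type Pubkey = [u8; 32];

/// Hypothetical handle to one GRPC account stream; in reality this would wrap
/// the connection plus the filter that is resent whenever it changes.
#[derive(Default)]
struct Stream {
    accounts: HashSet<Pubkey>,
}

#[derive(Default)]
struct SubscriptionTracker {
    /// Source of truth for what is currently subscribed.
    subscriptions: HashSet<Pubkey>,
    /// The current new (fluctuating) stream; old streams are omitted here.
    current_new: Stream,
}

impl SubscriptionTracker {
    fn subscribe(&mut self, pubkey: Pubkey) {
        // Added to the hashset and the current new stream immediately; only
        // the small filter of the current new stream needs to be resent.
        self.subscriptions.insert(pubkey);
        self.current_new.accounts.insert(pubkey);
    }

    fn unsubscribe(&mut self, pubkey: &Pubkey) {
        // Only the hashset is updated; the streams keep the stale entry until
        // the next optimization pass rebuilds them.
        self.subscriptions.remove(pubkey);
    }

    /// Updates for accounts that are no longer in the hashset are dropped.
    fn is_subscribed(&self, pubkey: &Pubkey) -> bool {
        self.subscriptions.contains(pubkey)
    }
}
```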
Generational Streams
MAX_SUBS_IN_OLD_OPTIMIZED = 2000
MAX_OLD_UNOPTIMIZED = 10
MAX_SUBS_IN_NEW = 200
- an old stream will hold up to MAX_SUBS_IN_OLD_OPTIMIZED subscriptions
- a new stream will hold up to MAX_SUBS_IN_NEW subscriptions
- a new stream becomes an old (unoptimized) stream when it exceeds MAX_SUBS_IN_NEW subscriptions
- there will be as many old streams as needed to hold all subscriptions, which means
a maximum of LRU_CACHE_SIZE / MAX_SUBS_IN_OLD_OPTIMIZED optimized old streams
- the number of new streams that became old but have not been optimized yet can be
arbitrary, but will be limited to MAX_OLD_UNOPTIMIZED so as not to create too many connections
- there will always be the current new stream on which we operate when adding subs
- streams are in either of these two vecs, optimized or unoptimized, and one extra stream
exists (the current new stream), as sketched below
- the GRPC laser client starts with just a single current new stream
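A sketch of the resulting layout, under the same assumptions as the block above (simplified
Pubkey, hypothetical Stream handle):

```rust
use std::collections::HashSet;

type Pubkey = [u8; 32]; // stand-in for solana_sdk::pubkey::Pubkey

const MAX_SUBS_IN_OLD_OPTIMIZED: usize = 2000;
const MAX_OLD_UNOPTIMIZED: usize = 10;
const MAX_SUBS_IN_NEW: usize = 200;

/// Hypothetical handle to one GRPC account stream and its filter.
struct Stream {
    accounts: HashSet<Pubkey>,
}

struct GenerationalStreams {
    /// Old generation: large streams that are only rebuilt during optimization.
    optimized: Vec<Stream>,
    /// New streams that filled up but have not been optimized yet
    /// (never more than MAX_OLD_UNOPTIMIZED of them).
    unoptimized: Vec<Stream>,
    /// The single fluctuating stream that receives fresh subscriptions.
    current_new: Stream,
}
```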
Adding a Subscription
When a subscription is added the following steps are performed:
- Add pubkey to the subscriptions hashset
- Add pubkey to the current new stream
- If the current new stream exceeds MAX_SUBS_IN_NEW then:
- Add current new stream to unoptimized old streams vec
- Create new current new stream
- If the number of unoptimized old streams exceeds MAX_OLD_UNOPTIMIZED then trigger
optimization (see the sketch below)
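Continuing the sketch above (it reuses GenerationalStreams, Stream and the constants from the
previous block), the add path could look roughly like this; trigger_optimization is only stubbed:

```rust
impl GenerationalStreams {
    /// Called after the pubkey has been inserted into the subscriptions hashset.
    fn add_subscription(&mut self, pubkey: Pubkey) {
        self.current_new.accounts.insert(pubkey);
        // here the current new stream's (small) filter would be resent

        if self.current_new.accounts.len() > MAX_SUBS_IN_NEW {
            // Roll the full new stream over into the unoptimized old streams
            // and start a fresh current new stream.
            let full = std::mem::replace(
                &mut self.current_new,
                Stream { accounts: HashSet::new() },
            );
            self.unoptimized.push(full);
        }

        if self.unoptimized.len() > MAX_OLD_UNOPTIMIZED {
            self.trigger_optimization();
        }
    }

    fn trigger_optimization(&mut self) {
        // see the optimization sketch at the end of this section
    }
}
```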
Stream Optimization
Stream optimization is triggered at an interval, e.g. every 5 minutes, but can also be triggered manually when the number of unoptimized old streams exceeds MAX_OLD_UNOPTIMIZED.
When optimization is triggered the following steps are performed:
- Create chunks of the subscriptions of size MAX_SUBS_IN_OLD_OPTIMIZED (see the sketch at the end of this section)
- Create a filter and initialize a new stream for each chunk
Now all subscriptions are covered by optimized old streams.
NOTE: this is similar to how we activate subscriptions now and thus part of the
implementation can be reused.
NOTE: while this optimization is running we add new subscriptions to the current new
generation stream and will NEVER do either of the following:
- trigger optimization again
- add new subscriptions to any of the unoptimized old streams
This means that our current new stream may go past the MAX_SUBS_IN_NEW
threshold and stay there for a while until the following happens:
- optimization completes
- we add another subscription to the current new stream
At this point the following is true:
- we only have old optimized streams and a single new stream
- all accounts that were unsubscribed before the optimization are now removed from the
optimized old streams
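A sketch of one optimization pass, again building on the blocks above. The real version would
build the filters and open the new streams asynchronously while fresh subscriptions keep going
to the current new stream; this synchronous version only illustrates the chunking and the
rebuild from the authoritative hashset.

```rust
impl GenerationalStreams {
    /// Rebuild the old generation from the subscriptions hashset. Accounts that
    /// were unsubscribed earlier simply don't make it into the new chunks, and
    /// the previous optimized/unoptimized streams are dropped (i.e. closed).
    fn optimize(&mut self, subscriptions: &HashSet<Pubkey>) {
        let pubkeys: Vec<Pubkey> = subscriptions.iter().copied().collect();

        self.optimized = pubkeys
            .chunks(MAX_SUBS_IN_OLD_OPTIMIZED)
            .map(|chunk| Stream {
                // here a GRPC filter would be built for this chunk and the
                // corresponding stream opened
                accounts: chunk.iter().copied().collect(),
            })
            .collect();

        self.unoptimized.clear();
        // The current new stream is left untouched; it keeps collecting fresh
        // subscriptions while this pass runs.
    }
}
```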