-
Notifications
You must be signed in to change notification settings - Fork 9
Track download size in ps_buckets
#160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
rkistner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with tracking this in the ps_buckets table.
Do we already have something similar to track the number of operations we have downloaded for the bucket? I know we have count_at_last / count_since_last, but I believe those are reset when the checkpoint completes, so not quite the same?
We should also keep in mind there are some differences in tracking it here versus the current diagnostics client logic:
- Diagnostics client tracks empty buckets using the checkpoint data, this only tracks buckets when it has data. Empty buckets can be useful to see to distinguish parameter query / subquery issues from data query issues.
- The diagnostics client keeps the bucket download history around when removing buckets, while the ps_buckets row can be deleted (and potentially re-created) when the bucket is removed, or when there is a checksum failure.
That's true, but those phantom buckets are essentially in-memory only right? They would also be tracked through diagnostics from this PR, so SDKs could react to that (and infer a default downloaded size of zero for buckets that exist in memory but not in
That is something worth remembering. I can see pros and cons for either approach (a growing download size without any notification when we've thrown a bucket away and are downloading it again is also not that helpful). |
I think that's fine. We should perhaps then just add a note somewhere in the diagnostics client on what it specifically means (i.e. it's the count from the server, not the actual count on the client).
I think it's semi-persisted in a localOnly table at the moment, but it's fine to have it in-memory only.
Yeah, I think that's fine. We can also keep track of the total downloaded data size for the current session in-memory on the client (i.e. count the bytes, no need to inspect the messages). That plus the new diagnostics events should be sufficient to identify cases of for example repeatedly re-downloading the same buckets. |
This adds a
download_sizecolumn tops_buckets. It is initialized to0, the sync client increments it whenever it receives adataline for a particular bucket.This also helps with #155: By tracking this in the Rust client, SDKs won't have to decode lines to correlate buckets with original message sizes.