Add support for batch warmup #438
base: main
Conversation
Force-pushed from 4115d63 to 086e581.
@aawsome, I have been looking forward to your feedback on this. This addresses an issue that I've been thinking about for over a year, and I hope that it helps gain new customers for rustic.
Hi @philipmw! First, thanks a lot for your proposal and deep apologies for letting you wait so long. I wanted to have a look very soon, but then this PR somehow went below my radar - sorry for that! I took a look at the code changes (but more a general one) and the implementation looks fine from my point of view. There are however some general points:

What do you think about these points?

Also note that there are rustfmt and clippy checks failing - we are quite strict about formatting and clippy-compliance, so these findings must also be fixed (but after discussing the general points..)

Hi @philipmw!
Hi @aawsome, now it's my turn to apologize for taking so long to reply to your feedback. Now that the holidays are over, I will try to reply and act faster on this, as I am still motivated to get it built. I was sitting with my computer already analyzing your feedback when you wrote the most recent update. Here are my responses to each point:
Rename sequential warmup to parallel: I think this is a good idea, although I am not certain that it would be backward compatible for all current users. Do you have any concerns here?
Can you clarify this feedback? What "args given for the command" are you referring to?
I agree, "anchor" is not intuitive. With this name, I was alluding to the HTML anchor element. Another parallel is Firefox's dynamic bookmarks feature: "Wherever the string %s appears in the bookmark's URL, it will be replaced with any words typed in the address bar after the bookmark's keyword and a space, properly URL-encoded, so they can be used as query string parameters to a search engine, for example." Perhaps another good name for this could be "substitute", "replace", "dynamic". "Variable" is also fine. Given this list, which name do you like most? (Naming is hard.) I agree that "argv" could be better renamed to "args". Happy to rename it.
You are suggesting that, instead of
Thanks. I didn't realize because the CI step didn't run automatically. Now that it ran for this PR, I will fix the issues.
I agree that
If
Sorry, for me it looked like you were omitting the given args, but https://github.com/rustic-rs/rustic_core/pull/438/files#diff-de508849190b7987f41c9d008e5e4bd90aad3c464a04bf5a73f661cf952b4a62R188 does include them. My fault.
Actually, I suggest to not use
There is some validation needed so that
The variable "%id" has made sense because it works so much like shell substitution, which CLI users are already familiar with. Pack IDs don't have spaces, so it works well. But once we're adding "%ids" and the "%path" / "%paths", it no longer resembles shell substitution. Now one percent-string is a variable, while another is actually just a directive, not a variable. Is your motivation to keep the number of command-line parameters small / keep commands short? Perhaps as an alternative, we could do this: if

To eliminate the type parameter, we could consider breaking backward compatibility and making "path" the default. What was the original motivation to make it just the pack ID? Do you think anyone would mind if we change the default? If we do, we could eliminate that parameter.
From a user's perspective, I think that having "%ids" be replaced by space-separated ids "1234.. 3423... 23423.." is what they'd expect for variable substitution.
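The substitution semantics discussed above can be sketched in a few lines. This is a hypothetical illustration, not rustic's actual implementation; the function name `expand_template` is invented for this sketch. The idea: a template containing "%ids" yields a single invocation carrying all pack IDs space-separated, while "%id" yields one invocation per pack ID.

```rust
// Hypothetical sketch of the "%id" vs "%ids" substitution semantics
// discussed above (not rustic's actual API).
fn expand_template(template: &str, ids: &[&str]) -> Vec<String> {
    // Check "%ids" first, since "%id" is a prefix of "%ids".
    if template.contains("%ids") {
        // One invocation carrying the whole batch, space-separated,
        // mirroring shell-style word splitting.
        vec![template.replace("%ids", &ids.join(" "))]
    } else {
        // One invocation per pack ID.
        ids.iter().map(|id| template.replace("%id", id)).collect()
    }
}

fn main() {
    let ids = ["1234", "3423", "23423"];
    assert_eq!(
        expand_template("warmup %ids", &ids),
        vec!["warmup 1234 3423 23423"]
    );
    assert_eq!(
        expand_template("warmup %id", &ids),
        vec!["warmup 1234", "warmup 3423", "warmup 23423"]
    );
    println!("ok");
}
```

Checking for "%ids" before "%id" matters because a naive "%id" replacement would otherwise corrupt a "%ids" template.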
This, but also to prevent users from needing to specify lots of things via CLI parameters (we already have tons of them.. ;-) ). In the
I must say I'd like using
We have the problem of the tree used to store pack ids: On a local dir the path looks like

On rereading your proposal, I realize I misunderstood what you were suggesting. Now I understand and have no concerns beyond what you already outlined with edge cases like

Let me take a crack at implementing it over this weekend.
This implements the idea in rustic-rs/rustic#1430 and the subsequent feedback in rustic-rs#438

To use this feature, I wrote a proof-of-concept *warmup-s3-archives* program: https://gitlab.com/philipmw/warmup-s3-archives

Changes:

* add `--warm-up-batch <N>` parameter
* add variables to `--warm-up-command` parameter to support singular and plural IDs and paths
* add a function to the backend interface that provides the S3 key (or other key usable by the warmup command) instead of pack ID.

Tested:

* unit tests;
* restoring data from Glacier Deep Archive spanning 3 packs, specifying a batch size of >=3 and argv mode
* killing and restarting restore; this works as long as the warmup program is idempotent (which works for S3)
* having the warmup command exit with an error code; rustic aborts the restore and prints a correct error message
* running multiple restore operations for different packs in parallel. The warmup program ignores notifications for packs that it does not recognize, leaving them in the queue and letting another warmup instance process them.

Known limitations and my thoughts for improvement opportunities:

* setup is non-trivial, between the AWS infrastructure and the warmup program configuration. It requires some AWS experience and cold storage motivation. But IMO all the complexity is specific to the domain; none is incidental.
* rustic does not pass the backend credentials to the warmup program. The warmup program is responsible for finding credentials on its own. Probably the best solution, at least for AWS, is for both rustic and the warmup program to use a common AWS credential provider.
* rustic's progress bar does not reflect warmup progress within a batch; only progress of entire batches. There is no protocol for communicating progress from a single invocation of the warmup command.
* rustic's warmup parameters could use a refactor as we discover and clarify cold storage backup scenarios. The distinction between `--warm-up-command` and `--warm-up-wait-command` seems too subtle. `--warm-up-wait` is too inflexible (since cold storage backends' estimates are measured in hours) and can be avoided entirely.
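The batching behaviour described in this change set can be sketched roughly as follows. This is an illustrative sketch under assumptions, not rustic's actual code; the function name `warm_up_in_batches` is invented here, and the warmup invocation is abstracted as a callback rather than spawning a real process. It shows pack IDs being split into batches of `--warm-up-batch <N>`, with the first failing invocation aborting the restore.

```rust
// Hypothetical sketch of batch warmup (not rustic's actual implementation):
// split pack IDs into batches of `batch_size`, run the warmup command once
// per batch, and abort on the first failure.
fn warm_up_in_batches<F>(
    pack_ids: &[&str],
    batch_size: usize,
    mut warm_up: F,
) -> Result<(), String>
where
    F: FnMut(&[&str]) -> Result<(), String>,
{
    // `chunks` panics on 0, so guard the batch size.
    for batch in pack_ids.chunks(batch_size.max(1)) {
        // A failing warmup invocation aborts the whole restore.
        warm_up(batch)?;
    }
    Ok(())
}

fn main() {
    let ids = ["id1", "id2", "id3"];
    let mut batch_sizes = Vec::new();
    warm_up_in_batches(&ids, 2, |batch| {
        batch_sizes.push(batch.len());
        Ok(())
    })
    .expect("warmup failed");
    // 3 packs with batch size 2 -> one batch of 2, then one batch of 1.
    assert_eq!(batch_sizes, vec![2, 1]);
}
```

This matches the tested scenario above: a restore spanning 3 packs with batch size 2 invokes the warmup command twice, once with two IDs and once with one.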
Force-pushed from cfaa0cc to 1e3947f.
Force-pushed from 1e3947f to 3b38173.
Force-pushed from 3b38173 to 904d7b0.
@aawsome, it is ready for your review. I implemented all the suggestions.
aawsome left a comment
Looks great already! I found some places where I think we can simplify the code without changing functionality, so don't be shocked by the number of comments ;-)
This implements the idea in rustic-rs/rustic#1430 and the subsequent feedback in rustic-rs#438

To use this feature, I wrote a proof-of-concept *warmup-s3-archives* program: https://gitlab.com/philipmw/warmup-s3-archives

Changes:

* add `--warm-up-batch <N>` parameter
* add variables to `--warm-up-command` parameter to support singular and plural IDs and paths
* add a function to the backend interface that provides the S3 key (or other key usable by the warmup command) instead of pack ID.

Tested:

* unit tests;
* invoking the warmup program with "%id", "%ids", "%pack", "%packs", and batch size of 2 for a total restore size of 3 packs, verifying that the warmup command is invoked either separately per ID/pack, or with two IDs/packs for the first invocation and one ID/pack for the second invocation.
* killing and restarting restore; this works as long as the warmup program is idempotent (which works for S3)
* having the warmup command exit with an error code; rustic aborts the restore and prints a correct error message
* running multiple restore operations for different packs in parallel. The warmup program ignores notifications for packs that it does not recognize, leaving them in the queue and letting another warmup instance process them.

Known limitations and my thoughts for improvement opportunities:

* setup is non-trivial, between the AWS infrastructure and the warmup program configuration. It requires some AWS experience and cold storage motivation. But IMO all the complexity is specific to the domain; none is incidental.
* rustic does not pass the backend credentials to the warmup program. The warmup program is responsible for finding credentials on its own. Probably the best solution, at least for AWS, is for both rustic and the warmup program to use a common AWS credential provider.
* rustic's progress bar does not reflect warmup progress within a batch; only progress of entire batches. There is no protocol for communicating progress from a single invocation of the warmup command.
* rustic's warmup parameters could use a refactor as we discover and clarify cold storage backup scenarios. The distinction between `--warm-up-command` and `--warm-up-wait-command` seems too subtle. `--warm-up-wait` is too inflexible (since cold storage backends' estimates are measured in hours) and can be avoided entirely.
Force-pushed from 904d7b0 to f7b82be.
@aawsome, I implemented all the changes. I also noticed that the CI tests are failing for macOS. For one of them, I made a fix, but about the other one, I am not certain. It may already be fixed with the latest changes, so let's see how the latest CI build runs. It seems that the CI won't run automatically until you approve it; is there any way to make it run automatically so that I get faster feedback from it?
Add support for batch warmup
This implements the idea in rustic-rs/rustic#1430
To use this feature, I wrote a proof-of-concept warmup-s3-archives program:
https://gitlab.com/philipmw/warmup-s3-archives
Changes:
* add `--warm-up-batch <N>` parameter
* add `--warm-up-pack-id-input <mode>` parameter
* add `--warm-up-input-type <type>` parameter
* add a function to the backend interface that provides the S3 key (or other key usable by the warmup command) instead of pack ID.

Tested:

* unit tests;
* restoring data from Glacier Deep Archive spanning 3 packs, specifying a batch size of >=3 and argv mode
* killing and restarting restore; this works as long as the warmup program is idempotent (which works for S3)
* having the warmup command exit with an error code; rustic aborts the restore and prints a correct error message
* running multiple restore operations for different packs in parallel. The warmup program ignores notifications for packs that it does not recognize, leaving them in the queue and letting another warmup instance process them.

Known limitations and my thoughts for improvement opportunities:

* setup is non-trivial, between the AWS infrastructure and the warmup program configuration. It requires some AWS experience and cold storage motivation. But IMO all the complexity is specific to the domain; none is incidental.
* rustic does not pass the backend credentials to the warmup program. The warmup program is responsible for finding credentials on its own. Probably the best solution, at least for AWS, is for both rustic and the warmup program to use a common AWS credential provider.
* rustic's progress bar does not reflect warmup progress within a batch; only progress of entire batches. There is no protocol for communicating progress from a single invocation of the warmup command.
* rustic's warmup parameters could use a refactor as we discover and clarify cold storage backup scenarios. The distinction between `--warm-up-command` and `--warm-up-wait-command` seems too subtle. `--warm-up-wait` is too inflexible (since cold storage backends' estimates are measured in hours) and can be avoided entirely.