
Conversation

@philipmw commented Nov 9, 2025

Add support for batch warmup

This implements the idea in rustic-rs/rustic#1430

To use this feature, I wrote a proof-of-concept warmup-s3-archives program:
https://gitlab.com/philipmw/warmup-s3-archives

Changes:

  • add --warm-up-batch <N> parameter
  • add --warm-up-pack-id-input <mode> parameter
  • add --warm-up-input-type <type> parameter
  • add a function to the backend interface that provides the S3 key (or other
    key usable by the warmup command) instead of the pack ID.

Tested:

  • unit tests;
  • restoring data from Glacier Deep Archive spanning 3 packs, specifying a
    batch size of >=3 and argv mode
  • killing and restarting restore; this works as long as the warmup program
    is idempotent (which works for S3)
  • having the warmup command exit with an error code; rustic aborts the restore
    and prints a correct error message
  • running multiple restore operations for different packs in parallel. The
    warmup program ignores notifications for packs that it does not recognize,
    leaving them in the queue and letting another warmup instance process them.

Known limitations and my thoughts for improvement opportunities:

  • setup is non-trivial, between the AWS infrastructure and the warmup program configuration.
    It requires some AWS experience and the motivation to use cold storage. But IMO
    all the complexity is specific to the domain; none is incidental.
  • rustic does not pass the backend credentials to the warmup program. The warmup
    program is responsible for finding credentials on its own.
    Probably the best solution, at least for AWS, is for both rustic and the
    warmup program to use a common AWS credential provider.
  • rustic's progress bar does not reflect warmup progress within a batch; only
    progress of entire batches. There is no protocol for communicating progress
    from a single invocation of the warmup command.
  • rustic's warmup parameters are growing in complexity and could use a refactor
    as we discover and clarify cold storage backup scenarios.
    The distinction between --warm-up-command and --warm-up-wait-command
    seems too subtle. --warm-up-wait is too inflexible (since cold storage
    backends' estimates are measured in hours) and can be avoided entirely.

philipmw force-pushed the batch-warm-up branch 2 times, most recently from 4115d63 to 086e581 on November 15, 2025
@philipmw (Author)

@aawsome , I have been looking forward to your feedback on this. This addresses an issue that I've been thinking about for over a year, and I hope that it helps gain new customers for rustic.

@aawsome (Member) commented Dec 21, 2025

Hi @philipmw!

First, thanks a lot for your proposal and deep apologies for letting you wait so long. I wanted to have a look very soon, but then this PR somehow went below my radar - sorry for that!

I took a look at the code changes (though more of a general one) and the implementation looks fine from my point of view. There are, however, some general points:

  • First I must say that the design to warm up only pack files was wrong (my fault) - as there are use cases (when repairing a hot/cold repo) where we may want to also warm up other files. I think we should extend the warmup. This can go in a future PR; however, I think we must keep an eye on ensuring this can be easily extended. For backend_path this should be the case, but the name pack_id should be id and then we'll need a way to transfer the type to the command....
  • As a batch is now added for "anchor" mode as well, I think this should not be implemented by waiting for each command to finish, but instead by spawning all commands of the batch and then waiting for all of them - making this also a parallel warm-up.
  • For "argv" mode, I find it irritating that the args given for the command are completely ignored. What about keeping them and just appending the args of what to warm up there? (Actually, that could also be the solution for how to give the type of the file: in the existing args there could be a %type or something like this to be replaced, as in "id" mode; just some thoughts I'm having while writing this...)
  • I must say I was personally a bit confused about the name "anchor" and would maybe call it "variable" or something like this. Do you have another suggestion? Also "argv" is quite technical, maybe "args" is a better name?
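The spawn-then-wait idea from the batch point above could be sketched as follows. This is a minimal illustration, not rustic_core code: `warm_up_batch` and the `program` parameter standing in for the configured warm-up command are hypothetical names.

```rust
use std::process::{Child, Command};

/// Sketch of a parallel batch warm-up: spawn one command per id, then wait
/// for all of them, instead of running each command to completion in turn.
/// `program` stands in for the configured warm-up command.
fn warm_up_batch(program: &str, ids: &[&str]) -> std::io::Result<()> {
    // Spawn all commands of the batch first...
    let children: Vec<Child> = ids
        .iter()
        .map(|id| Command::new(program).arg(id).spawn())
        .collect::<std::io::Result<Vec<Child>>>()?;
    // ...then wait for each one, failing if any command failed.
    for mut child in children {
        if !child.wait()?.success() {
            return Err(std::io::Error::new(
                std::io::ErrorKind::Other,
                "warm-up command failed",
            ));
        }
    }
    Ok(())
}
```

Because every child is spawned before the first `wait`, the batch warms up concurrently, while a non-zero exit status from any command still aborts the whole batch.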

What do you think about these points?

@aawsome (Member) commented Dec 21, 2025

Also note that there are rustfmt and clippy checks failing - we are quite strict about formatting and clippy-compliance, so these findings must also be fixed (but after discussing the general points..)

@aawsome (Member) commented Jan 20, 2026

Hi @philipmw!
Are you still interested in working on this? If not, I would try to adapt this in order to get it into rustic!
When thinking about it, I wondered if it would be easier to work with if we did not use argv vs. anchor mode and packid vs. path type, but just different variables to substitute for all cases. Something like id (anchor+packid), ids (argv+packid), path (anchor+path) and paths (argv+path) (+ additionally type, solving the mentioned problem about not only warming up packfiles). What do you think about it?
Sorry again for the long delay after your PR!

@philipmw (Author) commented Jan 20, 2026

Hi @aawsome, now it's my turn to apologize for taking so long to reply to your feedback. Now that the holidays are over, I will try to respond and act faster on this, as I am still motivated to get it built. I was sitting at my computer analyzing your feedback when you wrote the most recent update. Here are my responses to each point:

First I must say that the design to only warm-up pack files was wrong (from my side) - as there are use cases (when repairing a hot/cold repo) where we may want to also warm-up other files. I think we should extend the warmup. This can go in a future PR; however, I think we must keep an eye to ensure this can be easily extended. For backend_path this should be the case, but the name pack_id should be id and then we'll need a way to transfer the type to the command....

The suggested task is to rename pack_id to id. I have no concerns with this and am happy to implement it.

As a batch is now added for "anchor" mode as well, I think this should not be implemented by waiting for each command to finish, but instead spawn all commands of the batch and then wait for all of them - making this also a parallel warm-up.

Change sequential warmup to parallel. I think this is a good idea, although I am not certain that it would be backward compatible for all current users. Do you have any concerns here?

For "argv" mode, I find it irritating that the args given for the command are completely ignored. What about keeping them and just append the args of what to warm-up there? (actually that could be also the solution to how to give the type of the file: In the existing args there could be a %type or something like this to be replaced like in "id" mode; just some thoughts I'm having while writing this...)

Can you clarify this feedback? What "args given for the command" are you referring to?

I must say I was personally a bit confused about the name "anchor" and would maybe call it "variable" or something like this. Do you have another suggestion? Also "argv" is quite technical, maybe "args" is a better name?

I agree, "anchor" is not intuitive. With this name, I was alluding to the HTML anchor element.

Another parallel is Firefox's dynamic bookmarks feature: "Wherever the string %s appears in the bookmark's URL, it will be replaced with any words typed in the address bar after the bookmark's keyword and a space, properly URL-encoded, so they can be used as query string parameters to a search engine, for example."

Perhaps another good name for this could be "substitute", "replace", or "dynamic". "Variable" is also fine. Given this list, which name do you like most? (Naming is hard.)

I agree that "argv" could be better renamed to "args". Happy to rename it.

When thinking about it, I wondered if it was easier to work if we not use argv vs anchor mode and packid vs path type, but just different variables to substitute for all cases. Something like id (anchor+packid), ids (argv+packid), path (anchor+path) and paths (argv+packid) (+ additionally type solving the mentioned problem about not only warming-up packfiles). What do you think about it?

You are suggesting that, instead of --warm-up-pack-id-input and --warm-up-input-type parameters, each with two possible values, we create a more generic single parameter with four possible values, such as --warmup-mode. I don't have a strong opinion here; both ways make sense to me. Happy to implement it either way.

Also note that there are rustfmt and clippy checks failing - we are quite strict about formatting and clippy-compliance, so these findings must also be fixed (but after discussing the general points..)

Thanks. I didn't realize because the CI step didn't run automatically. Now that it ran for this PR, I will fix the issues.

@aawsome (Member) commented Jan 20, 2026

Suggested task is to rename pack_id to id. I have no concerns with this and happy to implement.

I agree that id is better suited.

Change sequential warmup to parallel. I think this is a good idea, although I am not certain that it would be backward compatible for all current users. Do you have any concerns here?

If warm-up-batch is not set, the behavior won't change. So it's just a new feature.

Can you clarify this feedback? What "args given for the command" are you referring to?

Sorry, for me it looked like you were omitting the given args, but https://github.com/rustic-rs/rustic_core/pull/438/files#diff-de508849190b7987f41c9d008e5e4bd90aad3c464a04bf5a73f661cf952b4a62R188 does include them. My fault.

You are suggesting that, instead of --warm-up-pack-id-input and --warm-up-input-type parameters, each with two possible values, we create a more generic single parameter with four possible values, such as --warmup-mode. I don't have a strong opinion here; both ways make sense to me. Happy to implement it either way.

Actually, I suggest not using --warm-up-pack-id-input and --warm-up-input-type at all, but only --warm-up-batch, and deciding based on the variables given in the command. Some examples:

  • warm-up-command = "echo %id" would spawn [BATCH_SIZE] echo commands, each with a single id
  • warm-up-command = "echo %ids" would run a single echo command with [BATCH_SIZE] ids given as args
  • warm-up-command = "echo %path" would spawn [BATCH_SIZE] echo commands, each with a single path
  • warm-up-command = "echo %paths" would run a single echo command with [BATCH_SIZE] paths given as args

There is some validation needed so that

  • warm-up-command = "echo %id %path" would spawn [BATCH_SIZE] echo commands, each with a single id and path
  • warm-up-command = "echo %ids %paths" would run a single echo command with [BATCH_SIZE] ids and [BATCH_SIZE] paths given as args (no idea if this makes sense...), but
  • warm-up-command = "echo %ids %path" would error out - not clear what to do here!
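The validation described in this list could be sketched like this. It is illustrative only (not the PR's actual code), implements just the mixing rule from this comment, and simplifies by also rejecting commands with no variable at all:

```rust
/// Sketch of the proposed validation: plural variables (%ids, %paths) expand
/// the whole batch into a single command, singular ones (%id, %path) spawn
/// one command per item, and mixing the two kinds is rejected.
#[derive(Debug, PartialEq)]
enum BatchMode {
    PerItem,    // e.g. "echo %id %path"
    WholeBatch, // e.g. "echo %ids %paths"
}

fn classify(command: &str) -> Result<BatchMode, String> {
    let plural = command.contains("%ids") || command.contains("%paths");
    // Strip plural variables first so "%ids" is not also counted as "%id".
    let stripped = command.replace("%ids", "").replace("%paths", "");
    let singular = stripped.contains("%id") || stripped.contains("%path");
    match (singular, plural) {
        (true, true) => Err(format!(
            "cannot mix singular and plural variables in {command:?}"
        )),
        (true, false) => Ok(BatchMode::PerItem),
        (false, true) => Ok(BatchMode::WholeBatch),
        (false, false) => Err(format!("no warm-up variable found in {command:?}")),
    }
}
```

For example, `classify("echo %id %path")` yields per-item mode, while `classify("echo %ids %path")` errors out, matching the "not clear what to do here" case above.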

@philipmw (Author)

  • warm-up-command = "echo %id %path" would spawn [BATCH_SIZE] echo commands, each with a single id and path
  • warm-up-command = "echo %ids %paths" would run a single echo command with [BATCH_SIZE] ids and [BATCH_SIZE] paths given as args (no idea if this makes sense...)

The variable "%id" has made sense because it works so much like shell substitution, which CLI users are already familiar with. Pack IDs don't have spaces, so it works well. But once we're adding "%ids" and the "%path" / "%paths", it no longer resembles shell substitution. Now one percent-string is a variable, while another is actually just a directive, not a variable.

Is your motivation to keep the number of command-line parameters small / keep commands short?

Perhaps as an alternative, we could do this: if %id is provided, then we infer "variable" mode. If %id is not provided, then we infer "args" mode.

To eliminate the type parameter, we could consider breaking backward compatibility and making "path" the default. What was the original motivation to make it just the pack ID? Do you think anyone would mind if we change the default? If we do, we could eliminate that parameter.

@aawsome (Member) commented Jan 23, 2026

The variable "%id" has made sense because it works so much like shell substitution, which CLI users are already familiar with. Pack IDs don't have spaces, so it works well. But once we're adding "%ids" and the "%path" / "%paths", it no longer resembles shell substitution. Now one percent-string is a variable, while another is actually just a directive, not a variable.

From a user's perspective, I think that having "%ids" replaced by space-separated ids "1234.. 3423... 23423.." is what they'd expect from variable substitution.
Yes, we transform this into multiple argv entries for the command called, but this is also what a shell would do if the ids were not surrounded by quotes.
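That expansion could look like this. `expand_args` is a hypothetical helper name, shown only to illustrate the shell-like word splitting of an unquoted variable:

```rust
/// Hypothetical helper showing how "%ids" could expand into one argv entry
/// per id, mimicking shell word splitting of an unquoted variable.
fn expand_args(template: &[&str], ids: &[&str]) -> Vec<String> {
    let mut argv = Vec::new();
    for arg in template {
        if *arg == "%ids" {
            // One argv entry per id, like an unquoted $ids in a shell.
            argv.extend(ids.iter().map(|id| id.to_string()));
        } else {
            argv.push(arg.to_string());
        }
    }
    argv
}
```

So a template like `["my_script.sh", "%ids"]` with two pack ids becomes a three-element argv, just as the shell would split the unquoted substitution.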

Is your motivation to keep the number of command-line parameters small / keep commands short?

This, but also to save users from needing to specify a lot of things via CLI parameters (we already have tons of them.. ;-) ). In the mount/webdav command we also work with variable substitution, and there some combinations would likewise not make sense, but IMO it is more powerful and explicit to be able to express everything in a single argument like
warm-up-command = "my_script.sh %type %ids".

Perhaps as an alternative, we could do this: if %id is provided, then we infer "variable" mode. If %id is not provided, then we infer "args" mode.

I must say I'd prefer using %ids, as it is explicit rather than implicit.

To eliminate the type parameter, we could consider breaking backward compatibility and making "path" the default. What was the original motivation to make it just the pack ID? Do you think anyone would mind if we change the default? If we do, we could eliminate that parameter.

We have the problem of the directory tree used to store packs: on a local dir the path looks like data/1b/1b4234..... However, when accessing this via URL it is often data/1b4234.... So, if we decide to provide only path or only type/id, I'd vote for type/id. But IMO having the choice is even better.
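The two layouts mentioned here can be illustrated side by side. Function names are made up for this sketch, not rustic_core API:

```rust
/// Local backends shard data files into a two-hex-digit subdirectory...
fn local_pack_path(id: &str) -> String {
    // Assumes `id` is a hex string of at least two characters.
    format!("data/{}/{}", &id[..2], id)
}

/// ...while URL/object-store access often uses a flat key.
fn flat_pack_key(id: &str) -> String {
    format!("data/{id}")
}
```

This is why a single "path" default is ambiguous: the same pack id maps to `data/1b/1b4234...` on a local backend but to `data/1b4234...` as an S3-style key.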

@philipmw (Author)

On rereading your proposal, I realize I misunderstood what you were suggesting. Now I understand and have no concerns beyond what you already outlined with edge cases like %ids %path.

Let me take a crack at implementing it over this weekend.

philipmw added a commit to philipmw/rustic_core that referenced this pull request Jan 26, 2026
This implements the idea in rustic-rs/rustic#1430
and the subsequent feedback in rustic-rs#438

To use this feature, I wrote a proof-of-concept *warmup-s3-archives* program:
https://gitlab.com/philipmw/warmup-s3-archives

Changes:

* add `--warm-up-batch <N>` parameter
* add variables to `--warm-up-command` parameter to support singular and plural IDs and paths
* add a function to the backend interface that provides the S3 key (or other
    key usable by the warmup command) instead of pack ID.

Tested:

* unit tests;
* restoring data from Glacier Deep Archive spanning 3 packs, specifying a
    batch size of >=3 and argv mode
* killing and restarting restore; this works as long as the warmup program
    is idempotent (which works for S3)
* having the warmup command exit with an error code; rustic aborts the restore
    and prints a correct error message
* running multiple restore operations for different packs in parallel. The
    warmup program ignores notifications for packs that it does not recognize,
    leaving them in the queue and letting another warmup instance process them.

Known limitations and my thoughts for improvement opportunities:

* setup is non-trivial, between the AWS infrastructure and the warmup program configuration.
    It requires some AWS experience and cold storage motivation. But IMO all the
    complexity is specific to the domain; none is incidental.
* rustic does not pass the backend credentials to the warmup program. The warmup
    program is responsible for finding credentials on its own.
    Probably the best solution, at least for AWS, is for both rustic and the
    warmup program to use a common AWS credential provider.
* rustic's progress bar does not reflect warmup progress within a batch; only
    progress of entire batches. There is no protocol for communicating progress
    from a single invocation of the warmup command.
* rustic's warmup parameters could use a refactor as we discover and clarify
    cold storage backup scenarios.
    The distinction between `--warm-up-command` and `--warm-up-wait-command`
    seems too subtle. `--warm-up-wait` is too inflexible (since cold storage
    backends' estimates are measured in hours) and can be avoided entirely.
philipmw added a commit to philipmw/rustic_core that referenced this pull request Jan 26, 2026
philipmw added a commit to philipmw/rustic_core that referenced this pull request Jan 26, 2026
@philipmw (Author)

@aawsome , it is ready for your review. I implemented all the suggestions.

@aawsome (Member) left a comment

Looks great already! I found some places where I think we can simplify the code without changing functionality, so don't be shocked by the number of comments ;-)

This implements the idea in rustic-rs/rustic#1430
and the subsequent feedback in rustic-rs#438

To use this feature, I wrote a proof-of-concept *warmup-s3-archives* program:
https://gitlab.com/philipmw/warmup-s3-archives

Changes:

* add `--warm-up-batch <N>` parameter
* add variables to `--warm-up-command` parameter to support singular and plural IDs and paths
* add a function to the backend interface that provides the S3 key (or other
    key usable by the warmup command) instead of pack ID.

Tested:

* unit tests;
* invoking the warmup program with "%id", "%ids", "%pack", "%packs", and batch
    size of 2 for a total restore size of 3 packs, verifying that the warmup
    command is invoked either separately per ID/pack or with two IDs/packs for
    the first invocation and with one ID/pack for the second invocation.
* killing and restarting restore; this works as long as the warmup program
    is idempotent (which works for S3)
* having the warmup command exit with an error code; rustic aborts the restore
    and prints a correct error message
* running multiple restore operations for different packs in parallel. The
    warmup program ignores notifications for packs that it does not recognize,
    leaving them in the queue and letting another warmup instance process them.

Known limitations and my thoughts for improvement opportunities:

* setup is non-trivial, between the AWS infrastructure and the warmup program configuration.
    It requires some AWS experience and cold storage motivation. But IMO all the
    complexity is specific to the domain; none is incidental.
* rustic does not pass the backend credentials to the warmup program. The warmup
    program is responsible for finding credentials on its own.
    Probably the best solution, at least for AWS, is for both rustic and the
    warmup program to use a common AWS credential provider.
* rustic's progress bar does not reflect warmup progress within a batch; only
    progress of entire batches. There is no protocol for communicating progress
    from a single invocation of the warmup command.
* rustic's warmup parameters could use a refactor as we discover and clarify
    cold storage backup scenarios.
    The distinction between `--warm-up-command` and `--warm-up-wait-command`
    seems too subtle. `--warm-up-wait` is too inflexible (since cold storage
    backends' estimates are measured in hours) and can be avoided entirely.
@philipmw (Author) commented Feb 2, 2026

@aawsome , I implemented all the changes.

I also noticed that the CI tests are failing for macOS. For one of them I made a fix, but for the other I am not certain. It may already be fixed with the latest changes, so let's see how the latest CI build runs.

It seems that the CI won't run automatically until you approve it; is there any way to make it run automatically so that I get faster feedback from it?
