Feature Request
NB: some aspects of this feature request affect multiple Pytroll packages. I've put this issue here because I think it affects trollflow2 the most. It's actually two feature requests, but they serve the same goal.
Is your feature request related to a problem? Please describe.
The way we produce images and deliver them to our client software is inefficient.
- In Pytroll, we wait until all parts of the input data are there (via geographic gatherer or segment gatherer), then send a message to trollflow2 to start processing. The images would be produced faster if we could start processing each segment as it comes in.
- In several parts of trollflow2, images are only finalised once all products have been produced. For example, the file publisher plugin (or any other plugin that comes after `save_datasets`) is only called after `save_datasets` has completed. If we use a staging zone, all files are moved from the staging zone to the output directory only when all images have been produced. When we produce many images, many minutes can pass between the finalisation of the first image and the last. In this case, it would be beneficial to announce each image (or move it to the output directory) as soon as it is finished.
Both of those aspects increase the latency of our image production. This may become more critical with FCI, due to its much larger image size compared to SEVIRI. It also means all our images are delivered to our client software at the same time, which leads to bottlenecks when the images are imported there; those bottlenecks would be avoided, or at least reduced, if the images were delivered one by one.
Describe the solution you'd like
I would like a solution that minimises the latency of image delivery to clients without sacrificing quality. How to achieve this is of secondary concern. Problem (1) is difficult to solve if we are doing any resampling. Problem (2) is easier to solve in theory, but may need significant changes in trollflow2. Instead of calling the plugins sequentially, trollflow2 might need to call the file publisher plugin many times in parallel, while communicating with the `save_datasets` plugin to learn when each file is ready. Alternatively, the `save_datasets` plugin could do the publishing itself (or move the final files, in the case of a staging zone or temporary files).
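To illustrate the second option, here is a minimal sketch of how `save_datasets` could hand each finished file to a concurrently running publisher instead of waiting for all products. This is not the trollflow2 API; the function names and the pretend filenames are purely illustrative, and a real implementation would send a posttroll message where this sketch appends to a list.

```python
# Hypothetical sketch: the saver pushes each finished file onto a queue,
# and a publisher worker announces files one by one instead of all at the end.
import queue
import threading

def save_datasets(products, done_queue):
    """Stand-in for the save_datasets plugin: signal each file as it finishes."""
    for product in products:
        filename = f"{product}.tif"  # pretend the image was written here
        done_queue.put(filename)     # announce immediately, not after the loop
    done_queue.put(None)             # sentinel: no more files coming

def publish_worker(done_queue, published):
    """Stand-in for the file publisher plugin, running concurrently."""
    while True:
        filename = done_queue.get()
        if filename is None:
            break
        published.append(filename)   # a real publisher would send a message here

done_queue = queue.Queue()
published = []
worker = threading.Thread(target=publish_worker, args=(done_queue, published))
worker.start()
save_datasets(["overview", "natural_color", "airmass"], done_queue)
worker.join()
print(published)
```

The same queue-plus-sentinel pattern would work for moving files out of a staging zone one by one.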
Describe any changes to existing user workflow
I think either problem should be solvable without breaking existing user workflows, although both are large enough that the amount of restructuring needed to solve them may lead to accidental breakage of backwards compatibility.
Additional context
For point (1), it might be possible to skip the segment gatherer and send each segment immediately to trollflow2, producing images in the native projection; at the end, these would all be read, resampled, and stored again. I think such a setup could be built without writing new code. For point (2), we could write directly to the final output directory, but because our file distribution software is external, it has no way (I think) to know when a file is finished. This leads to the file distribution software copying files before they are complete, in particular large files that take a while to produce.
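One common workaround for the partial-file problem, sketched below under the assumption that the distribution software only picks up files matching their final names: write each image under a temporary name in the same directory and rename it into place once complete. The rename is atomic on POSIX filesystems, so the external software never sees a half-written file. The function name here is illustrative, not part of trollflow2.

```python
# Hypothetical sketch: write to a temporary name, then rename atomically so
# external file distribution software never picks up a partially written file.
import os
import tempfile

def write_atomically(final_path, data):
    """Write data to final_path without it ever being visible half-written."""
    dirname = os.path.dirname(final_path) or "."
    # Create the temporary file in the same directory, so the final
    # os.replace is a same-filesystem rename and therefore atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, final_path)  # atomic rename into place
    except BaseException:
        os.remove(tmp_path)               # don't leave debris behind
        raise
```

This only helps if the distribution software ignores the `.tmp` files; if it copies everything regardless of name, a staging zone with per-file moves remains necessary.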