While including the SYNAPS-I segmentation in splash_flows is a major milestone, it's clear we need to strategize about how to scale this infrastructure to handle a potentially wider range of models and analysis workflows, depending on the user group or experiment.
Here is a list of suggestions/ideas for how to approach this (thank you Xiaoya):
- The SAM3 / DINOv3 steps are currently hardcoded in the flow, but the workflow may vary across different projects. For example, other projects may load/save U-Net features. Could we make this more flexible?
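One way to avoid hardcoding the SAM3/DINOv3 sequence would be a small step registry, where each project registers the ordered list of callables its pipeline runs. This is only a minimal sketch; the registry name, the step signatures, and the example steps are all hypothetical, not existing splash_flows APIs.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a project name to its ordered
# segmentation steps (each step: data -> data).
SEGMENTATION_STEPS: Dict[str, List[Callable]] = {}


def register_steps(project: str, steps: List[Callable]) -> None:
    """Register the segmentation steps a project's pipeline should run."""
    SEGMENTATION_STEPS[project] = steps


def run_segmentation(project: str, data):
    """Run each registered step in order, threading the data through.

    Projects with no registered steps pass the data through unchanged.
    """
    result = data
    for step in SEGMENTATION_STEPS.get(project, []):
        result = step(result)
    return result


# Illustrative registrations: SYNAPS chains SAM3 then DINOv3, while a
# hypothetical U-Net project would register a single step instead.
register_steps("synaps", [lambda d: d + ["sam3"], lambda d: d + ["dinov3"]])
register_steps("unet-project", [lambda d: d + ["unet"]])
```

The flow would then call `run_segmentation(project, data)` without knowing which models are involved.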
- It would be beneficial to move the segmentation and combination code into a separate segmentation.py file, as nersc.py is getting quite long. In segmentation.py, we could load different models depending on the project: for SYNAPS, we load SAM3 and DINOv3 and then combine them, while for Harry's project we might load a U-Net model instead.
I think we can split it up so the ALCF/NERSC flows live in their own subdirectories:

```
flows/bl832/
    nersc/
        __init__.py       # re-exports NERSCTomographyHPCController
        controller.py     # class definition + reconstruct, build_multi_resolution
        segmentation.py   # segmentation_sam3, segmentation_dino, combine_segmentations
        streaming.py      # streaming mixin stuff
    alcf/
        __init__.py
        controller.py
        segmentation.py
```
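Inside that segmentation.py, the per-project model selection could start as a simple lookup table. A minimal sketch, assuming hypothetical project keys and model names (the real mapping would come from the project configuration):

```python
from typing import Dict, List

# Hypothetical mapping of project -> models its segmentation pipeline
# loads; these entries are illustrative, not the real configuration.
PROJECT_MODELS: Dict[str, List[str]] = {
    "synaps": ["sam3", "dinov3"],  # segmented separately, then combined
    "harry": ["unet"],             # single U-Net pass
}


def models_for_project(project: str) -> List[str]:
    """Look up which models a project's segmentation pipeline loads."""
    try:
        return PROJECT_MODELS[project]
    except KeyError:
        raise ValueError(f"No segmentation models registered for {project!r}")
```

Failing loudly on an unknown project keeps misconfigured flows from silently running the wrong models.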
- We need to make model loading and inference more flexible to support different projects.
- The torchrun command may vary across models. Could we define a function to generate the command for different tasks?
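Such a function could assemble the torchrun invocation from a few parameters, with model-specific flags passed through as keyword arguments. A sketch under the assumption that per-model differences reduce to the script path plus extra CLI flags (the function name and defaults are hypothetical):

```python
import shlex


def build_torchrun_command(script: str, nproc_per_node: int = 4,
                           nnodes: int = 1, **script_args) -> str:
    """Assemble a torchrun invocation for a given inference script.

    Extra keyword arguments become --key=value flags for the script,
    with underscores converted to hyphens.
    """
    cmd = [
        "torchrun",
        f"--nnodes={nnodes}",
        f"--nproc_per_node={nproc_per_node}",
        script,
    ]
    for key, value in script_args.items():
        cmd.append(f"--{key.replace('_', '-')}={value}")
    # shlex.join quotes each token safely for shell submission
    return shlex.join(cmd)
```

Example: `build_torchrun_command("segment.py", model="sam3")` versus `build_torchrun_command("unet_infer.py", nproc_per_node=2)` would cover two models without duplicating the command template.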
- As we collaborate with Harry on a different project, I'm wondering whether there is a more general way to handle model loading across different models.
- MLflow might be a good option to consider in a future PR (though the inference code would likely need to be adapted accordingly). However, I'm not yet sure how to run a Slurm job while loading models from MLflow, and there may be some additional configuration required.
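One possible shape for the Slurm-plus-MLflow question: have the flow render a batch script that sets the tracking URI and loads the registered model inside the job itself. `mlflow.pyfunc.load_model` is the real MLflow API for loading a registered model, but everything else here (the function name, the URIs, the inline heredoc) is an assumption, not a tested setup:

```python
def render_mlflow_slurm_script(model_uri: str, tracking_uri: str) -> str:
    """Render a batch script that loads a registered model inside the job.

    model_uri and tracking_uri are placeholders, e.g.
    "models:/unet/Production" and the site's MLflow server address.
    """
    return "\n".join([
        "#!/bin/bash",
        "#SBATCH --nodes=1",
        # The job needs network access to the tracking server, which may
        # require extra configuration on compute nodes.
        f"export MLFLOW_TRACKING_URI={tracking_uri}",
        "python - <<'EOF'",
        "import mlflow.pyfunc",
        f"model = mlflow.pyfunc.load_model('{model_uri}')",
        "# ... run inference with the loaded model ...",
        "EOF",
    ])
```

Whether compute nodes can reach the tracking server (or whether artifacts need to be pre-staged) is exactly the open configuration question mentioned above.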
- I think it makes sense to define separate Flows for each use case (i.e., regular recon/multi-resolution, petiole segmentation, Harry's moon rock segmentation, etc.). But we should make sure to encapsulate each HPC job submission as a Task. This way, it would be easier to reuse and rearrange individual components of the pipeline for different Flows.
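The task/flow decomposition above can be sketched in plain Python. In the actual splash_flows code these would be Prefect `@task` functions composed by per-use-case `@flow` functions; the function names, arguments, and return values here are illustrative placeholders:

```python
# Each HPC job submission is its own small, reusable unit ("Task").
def submit_recon_job(dataset: str) -> str:
    """Submit a reconstruction job; returns a handle for the result."""
    return f"recon:{dataset}"


def submit_segmentation_job(recon_result: str, model: str) -> str:
    """Submit a segmentation job against an earlier reconstruction."""
    return f"seg[{model}]:{recon_result}"


# Use-case "Flows" then just compose the shared tasks in different orders.
def recon_only_flow(dataset: str) -> str:
    return submit_recon_job(dataset)


def petiole_segmentation_flow(dataset: str) -> str:
    """Hypothetical petiole flow: recon, then SAM3+DINOv3 segmentation."""
    recon = submit_recon_job(dataset)
    return submit_segmentation_job(recon, model="sam3+dinov3")
```

A new use case (e.g., a U-Net segmentation flow) would reuse the same submission tasks with a different model argument rather than duplicating the submission logic.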