[8.3.2] Scaling segmentation/AI for future models #130

@davramov

Description

While including the SYNAPS-I segmentation in splash_flows is a major milestone, it's clear we need a strategy for scaling this infrastructure to handle a potentially wider range of models and analysis workflows, depending on the user group or experiment.

Here is a list of suggestions/ideas for how to approach this (thank you, Xiaoya):

  • The SAM3 / DINOv3 steps are currently hardcoded in the flow, but the workflow may vary across projects. For example, other projects may need to load/save U-Net features instead. Could we make this more flexible?

  • It would be beneficial to move the segmentation and combination code into a separate segmentation.py file, as nersc.py is getting quite long. In segmentation.py, we could load different models depending on the project: for SYNAPS, we load SAM3 and DINOv3 and then combine them, while for Harry’s project we might load a U-Net model instead.

I think we can split it up so ALCF/NERSC flows live in their own subdirectories:

flows/bl832/
    nersc/
        __init__.py          # re-exports NERSCTomographyHPCController
        controller.py        # class definition + reconstruct, build_multi_resolution
        segmentation.py      # segmentation_sam3, segmentation_dino, combine_segmentations
        streaming.py         # streaming mixin stuff
    alcf/
        __init__.py
        controller.py
        segmentation.py
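The per-project loading idea could be sketched as a small registry inside the proposed segmentation.py. Everything below is illustrative: the project names, loader functions, and the registry itself are hypothetical, not existing splash_flows code.

```python
# Hypothetical sketch of per-project model loading for segmentation.py.
# Project names and checkpoint strings are placeholders, not real config.
from typing import Callable, Dict

MODEL_LOADERS: Dict[str, Callable[[], dict]] = {}


def register_models(project: str):
    """Decorator that maps a project name to its model-loading function."""
    def wrap(fn: Callable[[], dict]):
        MODEL_LOADERS[project] = fn
        return fn
    return wrap


@register_models("synaps")
def load_synaps_models() -> dict:
    # Would load the SAM3 and DINOv3 checkpoints here (placeholder).
    return {"sam3": "sam3-checkpoint", "dinov3": "dinov3-checkpoint"}


@register_models("moon_rock")
def load_moon_rock_models() -> dict:
    # Harry's project might load a U-Net instead (placeholder).
    return {"unet": "unet-checkpoint"}


def load_models_for(project: str) -> dict:
    """Look up and run the registered loader for a project."""
    try:
        return MODEL_LOADERS[project]()
    except KeyError:
        raise ValueError(f"No models registered for project {project!r}")
```

A flow would then only need the project name, and adding a new project means registering one loader rather than editing the flow body.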
  • We need to make model loading and inference more flexible to support different projects.

  • The torchrun command may vary across models. Could we define a function that generates the command for different tasks?

  • As we collaborate with Harry on a different project, I’m wondering whether there is a more general way to handle loading across different models.

  • MLflow might be a good option to consider in a future PR (though the inference code would likely need to be adapted accordingly). However, I’m not yet sure how to run a Slurm job while loading models from MLflow, and there may be some additional configuration required.

  • I think it makes sense to define separate Flows for each use case (e.g., regular recon/multiresolution, petiole segmentation, Harry's moon rock segmentation), but we should make sure to encapsulate each HPC job submission as a Task. That way, it would be easier to reuse and rearrange individual components of the pipeline across different Flows.
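For the varying torchrun invocations, one option is a small command-builder helper. This is a sketch only: the function name, flags shown, and script paths are assumptions, not actual splash_flows configuration.

```python
# Hypothetical helper that assembles a torchrun command per task.
# Flag values and the script path are illustrative placeholders.
from typing import List, Optional


def build_torchrun_command(
    script: str,
    nproc_per_node: int = 1,
    nnodes: int = 1,
    script_args: Optional[List[str]] = None,
) -> List[str]:
    """Build a torchrun invocation as an argument list for subprocess.run."""
    cmd = [
        "torchrun",
        f"--nproc_per_node={nproc_per_node}",
        f"--nnodes={nnodes}",
        script,
    ]
    cmd.extend(script_args or [])
    return cmd
```

Each model/task could then contribute just its script name and arguments, while node and GPU counts stay configurable per site (NERSC vs. ALCF).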
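The Task/Flow factoring could look roughly like the sketch below. In splash_flows these would presumably be Prefect @task functions composed by @flow functions; plain callables are used here so the factoring is visible on its own, and all function names and return values are hypothetical.

```python
# Sketch of encapsulating each HPC job submission as a reusable unit.
# In practice these would be Prefect Tasks composed into Flows; the
# job submissions here are placeholders that just return fake job ids.
def submit_recon_job(dataset: str) -> str:
    # Placeholder: submit a reconstruction Slurm job, return its id.
    return f"recon-job:{dataset}"


def submit_segmentation_job(dataset: str, model: str) -> str:
    # Placeholder: submit a segmentation job for the given model.
    return f"seg-job:{model}:{dataset}"


def petiole_segmentation_pipeline(dataset: str) -> list:
    # A "Flow" for the petiole use case: recon, then SAM3 segmentation.
    return [
        submit_recon_job(dataset),
        submit_segmentation_job(dataset, "sam3"),
    ]


def moon_rock_pipeline(dataset: str) -> list:
    # A different "Flow" reusing the same recon Task with a U-Net model.
    return [
        submit_recon_job(dataset),
        submit_segmentation_job(dataset, "unet"),
    ]
```

Because each submission is its own unit, new use cases become a matter of recomposing existing Tasks rather than duplicating flow logic.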
