Investigate Distributed Support #16

@tbirdso

Background

Running subimage registration tasks on a single workstation may take prohibitively long for massive, cloud-based image datasets. We would like to distribute registration tasks among a cluster of worker nodes so that they execute in parallel.

The itk_dreg framework is built with distributed registration in mind via streaming readers and dask.delayed tasks. However, output serialization is not fully supported in ITK v5.4rc2 or earlier.

ITK v5.4rc3 wheels will include support for unbuffered ITK images introduced in InsightSoftwareConsortium/ITK#4270. That support will allow us to serialize itk.Images describing oriented bounding boxes over which piecewise itk.Transform results are valid, which is required for distributed processing.
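
To illustrate why serialization matters here: Dask generally moves task inputs and results between worker processes using pickle-style serialization, so every result object has to survive a round trip. The sketch below uses only the standard library. `PairwiseResultStandIn` and `roundtrip` are hypothetical names made up for illustration, not part of itk_dreg or ITK; a real check would apply the same round trip to the actual result objects once ITK v5.4rc3 is installed.

```python
import pickle
from dataclasses import dataclass


@dataclass
class PairwiseResultStandIn:
    """Hypothetical stand-in for a per-block registration result.

    The real result would carry an itk.Transform and an itk.Image
    describing the domain over which that transform is valid.
    """

    transform_parameters: tuple
    domain_origin: tuple
    domain_size: tuple


def roundtrip(obj):
    """Serialize and deserialize an object, as happens between workers."""
    return pickle.loads(pickle.dumps(obj))


result = PairwiseResultStandIn(
    transform_parameters=(1.0, 0.0, 0.0, 1.0, 2.5, -1.0),
    domain_origin=(0.0, 0.0),
    domain_size=(64, 64),
)
restored = roundtrip(result)

# The restored object is a distinct copy that compares equal field by field.
assert restored == result
assert restored is not result
```

If a result type fails this kind of round trip, it cannot be returned from a task running in another process, which is the limitation described above for ITK v5.4rc2 and earlier.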

Steps to Investigate

When ITK v5.4rc3 is available on PyPI:

  1. Update pyproject.toml and CI workflows in itk-dreg to use the updated ITK version
  2. Run the localcluster and serialize_pairwise_result tests locally and verify that both tests pass
  3. Re-enable the localcluster and serialize_pairwise_result tests in CI and verify that automated tests pass

For further testing:

  1. Use dask.distributed.LocalCluster to mock a distributed cluster on your local system. Run serialized registration in an example notebook on a LocalCluster and verify that tasks are visible in the accompanying Dask dashboard.
  2. Set up access to a distributed cluster and test distributed registration on the cluster. (xref: Coiled, ACCESS)
  3. Explore Dask optimization to reduce task serialization requirements
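
As a starting point for the LocalCluster step, here is a minimal sketch that schedules placeholder tasks with dask.delayed and prints the dashboard address. `register_block` and `run_on_local_cluster` are hypothetical names standing in for real registration code, and the worker counts are arbitrary. It assumes `dask[distributed]` is installed.

```python
from dask import delayed
from dask.distributed import Client, LocalCluster


def register_block(block_index):
    """Hypothetical placeholder for one subimage registration task."""
    # A real task would read a subimage and run registration here.
    return {"block": block_index, "status": "done"}


def run_on_local_cluster(n_blocks=4, n_workers=2, processes=True):
    """Run placeholder tasks on a LocalCluster and gather their results.

    With processes=True each worker is a separate process, so task
    arguments and results must be serializable. processes=False keeps
    everything in one process, which skips that requirement.
    """
    with LocalCluster(
        n_workers=n_workers, threads_per_worker=1, processes=processes
    ) as cluster:
        with Client(cluster) as client:
            # Open this address in a browser to watch tasks execute.
            print("Dashboard:", client.dashboard_link)
            tasks = [delayed(register_block)(i) for i in range(n_blocks)]
            futures = client.compute(tasks)
            return client.gather(futures)
```

When running this as a script with `processes=True`, call `run_on_local_cluster()` from under an `if __name__ == "__main__":` guard, since worker processes may re-import the main module on some platforms. In a notebook the function can be called directly.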
