Skip to content

enh: enhance labelisation_seeded_watershed performance with shortcut (#260)#296

Open
lapertor wants to merge 1 commit into
higra:masterfrom
lapertor:enh/seeded-watershed-performance
Open

enh: enhance labelisation_seeded_watershed performance with shortcut (#260)#296
lapertor wants to merge 1 commit into
higra:masterfrom
lapertor:enh/seeded-watershed-performance

Conversation

@lapertor
Copy link
Copy Markdown
Contributor

Description

This PR addresses Issue #260 by implementing the first proposed optimization (early stopping) for labelisation_seeded_watershed.

Changes made

  • Added a counter num_background_components to track the number of vertices initially set to the background_label.
  • Decremented this counter each time an unseeded component is merged with a seeded one.
  • Added an early termination break in the main edge processing loop as soon as num_background_components == 0.
  • Wrapped the main loop in an if (num_background_components > 0) to handle the edge case where all vertices are already seeded or no background exists.

Partially fixes #260, let's explore idea 2 now (replacing the relabeling).

@PerretB
Copy link
Copy Markdown
Member

PerretB commented Mar 19, 2026

Thank you for addressing this. As this is a performance-related issue, can you also provide some benchmark to assess if the optimization really provides an improvement?

@lapertor
Copy link
Copy Markdown
Contributor Author

I ran instrumented benchmarks to isolate the performance of the main loop (excluding stable_arg_sort) on both master and the patched branch.

Setup:

  • Grid graph of shape (2000, 2000) with 4-adjacency (4M vertices, ~8M edges)
  • One seed per block, placed at the block center
  • Edge weights: 0 inside each block, 1 across block boundaries
  • Timing via std::chrono around the union-find loop + final relabelling only
  • 5 runs per configuration

Results (loop only, in ms):

Block shape Seeds Seed ratio master (ms) patched (ms) Gain
2000x2000 1 0.000025% ~851 ~936 -10%
100x100 400 0.01% ~854 ~896 -5%
10x10 40,000 1% ~868 ~865 ~0%
4x4 250,000 6.25% ~886 ~800 +9.7%
2x2 1,000,000 25% ~877 ~636 +27.5%
1x1 4,000,000 100% ~676 ~193 +71.5%

It shows clear gains when the seed ratio is high enough. For very low seed ratios, the overhead of the counter is negligible (within noise).
Note that in this test stable_arg_sort dominates total wall time (~10-11s), so the end-to-end improvement is modest in absolute terms, but the loop itself is faster for seed-dense inputs.

Benchmark script
import higra as hg
import numpy as np
import time


def make_block_seeds(shape, block_shape):
    h, w = shape
    bh, bw = block_shape
    seeds = np.zeros(h * w, dtype=np.int32)

    label = 1
    for bi in range(0, h, bh):
        for bj in range(0, w, bw):
            ci = min(bi + bh // 2, h - 1)
            cj = min(bj + bw // 2, w - 1)
            seeds[ci * w + cj] = label
            label += 1

    return seeds


def make_block_edge_weights(graph, shape, block_shape):
    h, w = shape
    bh, bw = block_shape

    sources, targets = graph.edge_list()
    weights = np.zeros(graph.num_edges(), dtype=np.float32)

    for i in range(graph.num_edges()):
        u = sources[i]
        v = targets[i]

        ui, uj = divmod(u, w)
        vi, vj = divmod(v, w)

        same_block = (ui // bh == vi // bh) and (uj // bw == vj // bw)
        weights[i] = 0.0 if same_block else 1.0

    return weights


def _benchmark(shape=(2000, 2000), block_shape=(100, 100), n_runs=5):
    graph = hg.get_4_adjacency_graph(shape)
    seeds = make_block_seeds(shape, block_shape)
    edge_weights = make_block_edge_weights(graph, shape, block_shape)

    print(f"Shape: {shape}")
    print(f"Block shape: {block_shape}")
    print(f"Num vertices: {graph.num_vertices()}")
    print(f"Num edges: {graph.num_edges()}")
    print(f"Num seeds: {np.count_nonzero(seeds)}")

    times = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        labels = hg.labelisation_seeded_watershed(graph, edge_weights, seeds)
        t1 = time.perf_counter()
        times.append(t1 - t0)

    print(f"Times: {[round(t, 4) for t in times]}")
    print(f"Mean: {np.mean(times):.4f} s")
    print(f"Std:  {np.std(times):.4f} s")
    print(f"Labels range: {labels.min()} to {labels.max()}")

def benchmark():
    shape = (2000, 2000)
    for block_shape in [(2000, 2000), (100, 100), (10, 10), (4, 4), (2, 2), (1, 1)]:
        print("=" * 60)
        _benchmark(shape=shape, block_shape=block_shape)
        print()

if __name__ == "__main__":
    benchmark()
Old implementation
/***************************************************************************
* Copyright ESIEE Paris (2018)                                             *
*                                                                          *
* Contributor(s) : Benjamin Perret                                         *
*                                                                          *
* Distributed under the terms of the CECILL-B License.                     *
*                                                                          *
* The full license is in the file LICENSE, distributed with this software. *
****************************************************************************/

#pragma once


#include "../graph.hpp"
#include "../structure/array.hpp"
#include "higra/structure/unionfind.hpp"
#include "higra/sorting.hpp"
#include <vector>
#include <stack>
#include <chrono>

namespace hg {

    /**
     * Linear time watershed cut algorithm.
     *
     * Jean Cousty, Gilles Bertrand, Laurent Najman, Michel Couprie. Watershed Cuts: Minimum Spanning
     * Forests and the Drop of Water Principle. IEEE Transactions on Pattern Analysis and Machine
     * Intelligence, Institute of Electrical and Electronics Engineers, 2009, 31 (8), pp.1362-1374.
     * @tparam graph_t
     * @tparam T
     * @param graph
     * @param xedge_weights
     * @return array of labels on graph vertices, numbered from 1 to n with n the number of minima
     */
    template<typename graph_t, typename T>
    auto
    labelisation_watershed(const graph_t &graph, const xt::xexpression<T> &xedge_weights) {
        HG_TRACE();
        auto &edge_weights = xedge_weights.derived_cast();
        hg_assert_edge_weights(graph, edge_weights);
        hg_assert_1d_array(edge_weights);

        using value_type = typename T::value_type;
        using vertex_t = typename graph_traits<graph_t>::vertex_descriptor;

        auto fminus = array_1d<value_type>::from_shape({graph.num_vertices()});

        for (auto v: vertex_iterator(graph)) {
            auto minValue = (std::numeric_limits<value_type>::max)();
            for (auto e: out_edge_iterator(v, graph)) {
                minValue = (std::min)(minValue, edge_weights(e));
            }
            fminus[v] = minValue;
        }


        auto no_label = (std::numeric_limits<index_t>::max)();
        auto labels = array_1d<index_t>::from_shape({graph.num_vertices()});
        std::fill(labels.begin(), labels.end(), no_label);

        auto notInL = array_1d<bool>::from_shape({graph.num_vertices()});
        std::fill(notInL.begin(), notInL.end(), true);

        std::vector<vertex_t> L;
        std::vector<vertex_t> LL;

        auto stream = [&L, &LL, &graph, &edge_weights, &fminus, &notInL, &labels, no_label](vertex_t x) {
            L.clear();
            LL.clear();
            L.push_back(x);
            LL.push_back(x);
            notInL[x] = false;

            while (!LL.empty()) {
                auto y = LL[LL.size() - 1];
                LL.pop_back();

                for (auto e: out_edge_iterator(y, graph)) {
                    auto adjacent_vertex = target(e, graph);
                    if (notInL[adjacent_vertex] && edge_weights(e) == fminus[y]) {
                        if (labels[adjacent_vertex] != no_label) {
                            return labels[adjacent_vertex];
                        } else if (fminus[adjacent_vertex] < fminus[y]) {
                            L.push_back(adjacent_vertex);
                            notInL[adjacent_vertex] = false;
                            LL.clear();
                            LL.push_back(adjacent_vertex);
                            break; // stop breadth_first
                        } else {
                            L.push_back(adjacent_vertex);
                            notInL[adjacent_vertex] = false;
                            LL.push_back(adjacent_vertex);
                        }
                    }
                }
            }
            return no_label;
        };

        index_t num_labs = 0;

        for (auto v: vertex_iterator(graph)) {
            if (labels[v] == no_label) {
                auto res = stream(v);
                if (res == no_label) {
                    num_labs++;
                    for (auto x: L) {
                        labels[x] = num_labs;
                        notInL[x] = true;
                    }
                } else {
                    for (auto x: L) {
                        labels[x] = res;
                        notInL[x] = true;
                    }
                }
            }
        }
        return labels;
    };


    template<typename graph_t, typename T1, typename T2>
    auto labelisation_seeded_watershed(
            const graph_t &graph,
            const xt::xexpression<T1> &xedge_weights,
            const xt::xexpression<T2> &xvertex_seeds,
            const typename T2::value_type background_label = 0) {
        HG_TRACE();
        auto &edge_weights = xedge_weights.derived_cast();
        auto &vertex_seeds = xvertex_seeds.derived_cast();
        hg_assert_edge_weights(graph, edge_weights);
        hg_assert_node_weights(graph, vertex_seeds);
        hg_assert_1d_array(edge_weights);
        hg_assert_1d_array(vertex_seeds);

        using label_type = typename T2::value_type;

        array_1d<index_t> sorted_edges_indices = stable_arg_sort(edge_weights);

        auto t0 = std::chrono::high_resolution_clock::now();

        index_t num_nodes = num_vertices(graph);
        index_t num_edges = sorted_edges_indices.size();

        union_find uf(num_nodes);

        array_1d<label_type> labels = vertex_seeds;

        for (index_t i = 0; i < num_edges; i++) {
            auto ei = sorted_edges_indices[i];
            auto e = edge_from_index(ei, graph);
            auto c1 = uf.find(source(e, graph));
            auto c2 = uf.find(target(e, graph));

            if (c1 != c2 && (labels(c1) == background_label || labels(c2) == background_label)) {
                if (labels(c1) == background_label) {
                    labels(c1) = labels(c2);
                } else {
                    labels(c2) = labels(c1);
                }
                uf.link(c1, c2);
            }

        }

        for (index_t i = 0; i < num_nodes; i++) {
            if (labels(i) == background_label) {
                labels(i) = labels(uf.find(i));
            }
        }

        auto t1 = std::chrono::high_resolution_clock::now();
        std::cerr << "watershed loop: "
                << std::chrono::duration<double, std::milli>(t1 - t0).count()
                << " ms" << std::endl;

        return labels;
    };

}
Output of tests on the old implementation
============================================================
Shape: (2000, 2000)
Block shape: (2000, 2000)
Num vertices: 4000000
Num edges: 7996000
Num seeds: 1
watershed loop: 855.673 ms
watershed loop: 846.526 ms
watershed loop: 849.356 ms
watershed loop: 856.115 ms
watershed loop: 845.752 ms
Times: [5.8268, 5.7668, 5.739, 5.7754, 5.7401]
Mean: 5.7696 s
Std:  0.0320 s
Labels range: 1 to 1

============================================================
Shape: (2000, 2000)
Block shape: (100, 100)
Num vertices: 4000000
Num edges: 7996000
Num seeds: 400
watershed loop: 852.523 ms
watershed loop: 850.293 ms
watershed loop: 850.901 ms
watershed loop: 852.464 ms
watershed loop: 865.418 ms
Times: [10.8303, 10.8492, 10.9356, 10.8443, 11.0812]
Mean: 10.9081 s
Std:  0.0941 s
Labels range: 1 to 400

============================================================
Shape: (2000, 2000)
Block shape: (10, 10)
Num vertices: 4000000
Num edges: 7996000
Num seeds: 40000
watershed loop: 883.138 ms
watershed loop: 869.168 ms
watershed loop: 870.525 ms
watershed loop: 858.007 ms
watershed loop: 856.922 ms
Times: [11.9661, 11.6858, 11.6835, 11.7077, 11.5412]
Mean: 11.7169 s
Std:  0.1379 s
Labels range: 1 to 40000

============================================================
Shape: (2000, 2000)
Block shape: (4, 4)
Num vertices: 4000000
Num edges: 7996000
Num seeds: 250000
watershed loop: 885.402 ms
watershed loop: 884.956 ms
watershed loop: 899.801 ms
watershed loop: 879.073 ms
watershed loop: 879.725 ms
Times: [11.6119, 11.711, 12.1034, 11.5515, 11.5471]
Mean: 11.7050 s
Std:  0.2078 s
Labels range: 1 to 250000

============================================================
Shape: (2000, 2000)
Block shape: (2, 2)
Num vertices: 4000000
Num edges: 7996000
Num seeds: 1000000
watershed loop: 882.144 ms
watershed loop: 878.473 ms
watershed loop: 874.782 ms
watershed loop: 876.145 ms
watershed loop: 871.374 ms
Times: [11.1941, 11.0869, 11.0762, 11.0111, 11.0052]
Mean: 11.0747 s
Std:  0.0682 s
Labels range: 1 to 1000000

============================================================
Shape: (2000, 2000)
Block shape: (1, 1)
Num vertices: 4000000
Num edges: 7996000
Num seeds: 4000000
watershed loop: 689.118 ms
watershed loop: 686.167 ms
watershed loop: 696.256 ms
watershed loop: 655.261 ms
watershed loop: 651.837 ms
Times: [5.7815, 5.7608, 5.7853, 5.6649, 5.4938]
Mean: 5.6973 s
Std:  0.1107 s
Labels range: 1 to 4000000
New implementation
/***************************************************************************
* Copyright ESIEE Paris (2018)                                             *
*                                                                          *
* Contributor(s) : Benjamin Perret                                         *
*                                                                          *
* Distributed under the terms of the CECILL-B License.                     *
*                                                                          *
* The full license is in the file LICENSE, distributed with this software. *
****************************************************************************/

#pragma once


#include "../graph.hpp"
#include "../structure/array.hpp"
#include "higra/structure/unionfind.hpp"
#include "higra/sorting.hpp"
#include <vector>
#include <stack>
#include <chrono>

namespace hg {

    /**
     * Linear time watershed cut algorithm.
     *
     * Jean Cousty, Gilles Bertrand, Laurent Najman, Michel Couprie. Watershed Cuts: Minimum Spanning
     * Forests and the Drop of Water Principle. IEEE Transactions on Pattern Analysis and Machine
     * Intelligence, Institute of Electrical and Electronics Engineers, 2009, 31 (8), pp.1362-1374.
     * @tparam graph_t
     * @tparam T
     * @param graph
     * @param xedge_weights
     * @return array of labels on graph vertices, numbered from 1 to n with n the number of minima
     */
    template<typename graph_t, typename T>
    auto
    labelisation_watershed(const graph_t &graph, const xt::xexpression<T> &xedge_weights) {
        HG_TRACE();
        auto &edge_weights = xedge_weights.derived_cast();
        hg_assert_edge_weights(graph, edge_weights);
        hg_assert_1d_array(edge_weights);

        using value_type = typename T::value_type;
        using vertex_t = typename graph_traits<graph_t>::vertex_descriptor;

        auto fminus = array_1d<value_type>::from_shape({graph.num_vertices()});

        for (auto v: vertex_iterator(graph)) {
            auto minValue = (std::numeric_limits<value_type>::max)();
            for (auto e: out_edge_iterator(v, graph)) {
                minValue = (std::min)(minValue, edge_weights(e));
            }
            fminus[v] = minValue;
        }


        auto no_label = (std::numeric_limits<index_t>::max)();
        auto labels = array_1d<index_t>::from_shape({graph.num_vertices()});
        std::fill(labels.begin(), labels.end(), no_label);

        auto notInL = array_1d<bool>::from_shape({graph.num_vertices()});
        std::fill(notInL.begin(), notInL.end(), true);

        std::vector<vertex_t> L;
        std::vector<vertex_t> LL;

        auto stream = [&L, &LL, &graph, &edge_weights, &fminus, &notInL, &labels, no_label](vertex_t x) {
            L.clear();
            LL.clear();
            L.push_back(x);
            LL.push_back(x);
            notInL[x] = false;

            while (!LL.empty()) {
                auto y = LL[LL.size() - 1];
                LL.pop_back();

                for (auto e: out_edge_iterator(y, graph)) {
                    auto adjacent_vertex = target(e, graph);
                    if (notInL[adjacent_vertex] && edge_weights(e) == fminus[y]) {
                        if (labels[adjacent_vertex] != no_label) {
                            return labels[adjacent_vertex];
                        } else if (fminus[adjacent_vertex] < fminus[y]) {
                            L.push_back(adjacent_vertex);
                            notInL[adjacent_vertex] = false;
                            LL.clear();
                            LL.push_back(adjacent_vertex);
                            break; // stop breadth_first
                        } else {
                            L.push_back(adjacent_vertex);
                            notInL[adjacent_vertex] = false;
                            LL.push_back(adjacent_vertex);
                        }
                    }
                }
            }
            return no_label;
        };

        index_t num_labs = 0;

        for (auto v: vertex_iterator(graph)) {
            if (labels[v] == no_label) {
                auto res = stream(v);
                if (res == no_label) {
                    num_labs++;
                    for (auto x: L) {
                        labels[x] = num_labs;
                        notInL[x] = true;
                    }
                } else {
                    for (auto x: L) {
                        labels[x] = res;
                        notInL[x] = true;
                    }
                }
            }
        }
        return labels;
    };


    template<typename graph_t, typename T1, typename T2>
    auto labelisation_seeded_watershed(
            const graph_t &graph,
            const xt::xexpression<T1> &xedge_weights,
            const xt::xexpression<T2> &xvertex_seeds,
            const typename T2::value_type background_label = 0) {
        HG_TRACE();
        auto &edge_weights = xedge_weights.derived_cast();
        auto &vertex_seeds = xvertex_seeds.derived_cast();
        hg_assert_edge_weights(graph, edge_weights);
        hg_assert_node_weights(graph, vertex_seeds);
        hg_assert_1d_array(edge_weights);
        hg_assert_1d_array(vertex_seeds);

        using label_type = typename T2::value_type;

        array_1d<index_t> sorted_edges_indices = stable_arg_sort(edge_weights);

        auto t0 = std::chrono::high_resolution_clock::now();

        index_t num_nodes = num_vertices(graph);
        index_t num_edges = sorted_edges_indices.size();

        union_find uf(num_nodes);

        array_1d<label_type> labels = vertex_seeds;
        index_t num_background_components = 0;
        for (index_t i = 0; i < num_nodes; i++) {
            if (labels(i) == background_label) {
                num_background_components++;
            }
        }

        if (num_background_components > 0) {
            for (index_t i = 0; i < num_edges; i++) {
                auto ei = sorted_edges_indices[i];
                auto e = edge_from_index(ei, graph);
                auto c1 = uf.find(source(e, graph));
                auto c2 = uf.find(target(e, graph));

                if (c1 != c2 && (labels(c1) == background_label || labels(c2) == background_label)) {
                    if (labels(c1) == background_label) {
                        labels(c1) = labels(c2);
                    } else {
                        labels(c2) = labels(c1);
                    }
                    uf.link(c1, c2);

                    num_background_components--;
                    if (num_background_components == 0) {
                        break;
                    }
                }
            }
        }

        for (index_t i = 0; i < num_nodes; i++) {
            if (labels(i) == background_label) {
                labels(i) = labels(uf.find(i));
            }
        }

        auto t1 = std::chrono::high_resolution_clock::now();
        std::cerr << "watershed loop: "
                << std::chrono::duration<double, std::milli>(t1 - t0).count()
                << " ms" << std::endl;

        return labels;
    };

}
Output of tests on the new implementation
============================================================
Shape: (2000, 2000)
Block shape: (2000, 2000)
Num vertices: 4000000
Num edges: 7996000
Num seeds: 1
watershed loop: 950.008 ms
watershed loop: 925.107 ms
watershed loop: 943.912 ms
watershed loop: 927.181 ms
watershed loop: 935.416 ms
Times: [5.9918, 6.0398, 6.014, 5.9141, 6.0508]
Mean: 6.0021 s
Std:  0.0485 s
Labels range: 1 to 1

============================================================
Shape: (2000, 2000)
Block shape: (100, 100)
Num vertices: 4000000
Num edges: 7996000
Num seeds: 400
watershed loop: 899.798 ms
watershed loop: 915.862 ms
watershed loop: 885.446 ms
watershed loop: 887.755 ms
watershed loop: 892.831 ms
Times: [11.119, 11.067, 10.8419, 10.7912, 10.8781]
Mean: 10.9394 s
Std:  0.1294 s
Labels range: 1 to 400

============================================================
Shape: (2000, 2000)
Block shape: (10, 10)
Num vertices: 4000000
Num edges: 7996000
Num seeds: 40000
watershed loop: 867.669 ms
watershed loop: 872.522 ms
watershed loop: 866.832 ms
watershed loop: 858.539 ms
watershed loop: 860.101 ms
Times: [12.0127, 11.9728, 11.8824, 11.7093, 11.6933]
Mean: 11.8541 s
Std:  0.1318 s
Labels range: 1 to 40000

============================================================
Shape: (2000, 2000)
Block shape: (4, 4)
Num vertices: 4000000
Num edges: 7996000
Num seeds: 250000
watershed loop: 799.887 ms
watershed loop: 799.381 ms
watershed loop: 793.636 ms
watershed loop: 797.365 ms
watershed loop: 809.87 ms
Times: [11.7416, 11.5014, 11.5901, 11.5639, 11.7484]
Mean: 11.6291 s
Std:  0.0990 s
Labels range: 1 to 250000

============================================================
Shape: (2000, 2000)
Block shape: (2, 2)
Num vertices: 4000000
Num edges: 7996000
Num seeds: 1000000
watershed loop: 645.401 ms
watershed loop: 639.748 ms
watershed loop: 629.793 ms
watershed loop: 632.526 ms
watershed loop: 630.233 ms
Times: [10.7289, 10.5895, 10.5117, 10.5368, 10.4405]
Mean: 10.5615 s
Std:  0.0965 s
Labels range: 1 to 1000000

============================================================
Shape: (2000, 2000)
Block shape: (1, 1)
Num vertices: 4000000
Num edges: 7996000
Num seeds: 4000000
watershed loop: 193.35 ms
watershed loop: 191.892 ms
watershed loop: 197.769 ms
watershed loop: 191.449 ms
watershed loop: 190.6 ms
Times: [5.056, 4.9982, 5.1023, 5.0678, 5.1104]
Mean: 5.0669 s
Std:  0.0400 s
Labels range: 1 to 4000000

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Seeded watershed performance

2 participants