PERF: Use tuned ImageRangeRegion copy in CastImageFilter by blowekamp · Pull Request #5670 · InsightSoftwareConsortium/ITK

blowekamp · 2025-12-03T11:59:10Z

Based on performance testing, across converting between Image of Vector and VectorImage, this loop was the best performing. Key features of the improved performane:

Uses NumericTraits::GetLength over Image::GetNumberOfComponents, the former may be constant, while the latter is virtual
Uses const InputPixelType & inputPixel
OutputPixelType value{ outputIt.Get() } initialized to a reference in the output image bufffer and does not perform memory allocation for variable length vectors.

PR Checklist

No API changes were made (or the changes have been approved)
No major design changes were made (or the changes have been approved)
Added test (or behavior not changed)
Updated API documentation (or API not changed)
Added license to new files (if any)
Added Python wrapping to new files (if any) as described in ITK Software Guide Section 9.5
Added ITK examples for all new major features (if any)

Refer to the ITK Software Guide for
further development details if necessary.

SimonRit

LGTM, just typos and a question.

SimonRit · 2025-12-03T12:03:26Z

Modules/Filtering/ImageFilterBase/include/itkCastImageFilter.hxx

  {
-    while (!inputIt.IsAtEndOfLine())
+    const InputPixelType & inputPixel = *inputIt;
+    OutputPixelType        outputPixel{ *outputIt };


One alternative to this is to replace this by

OutputPixelType outputPixel; outputPixel = *outputIt;

outside the while loop. Does it improve performance?

I am probably missing some larger context behavior, but why do we need a temporary local variable for outputPixel? It seems that line 164 could be removed, and line 159 could be an alias to the current memory.

OutputPixelType & outputPixel = *outputIt;

I encourage both of you to compile and try experiments with PR #5669

@SimonRit Outside the loop it is a little slower ~2x for vector pixel output. It has odd statements to force a copy.
@hjmjohnson That does not compile.

@hjmjohnson FYI, for VectorImage, *outputIt returns a proxy to the current output pixel. Hope that clarifies why OutputPixelType & outputPixel = *outputIt won't compile for VectorImage!

Modules/Filtering/ImageFilterBase/include/itkCastImageFilter.hxx

Based on performance testing, across converting between Image of Vector and VectorImage, this loop was the best performing. Key features of the improved performane: - Uses NumericTraits::GetLength over Image::GetNumberOfComponents, the former may be constant, while the latter is virtual - Uses const InputPixelType & inputPixel - OutputPixelType value{ outputIt.Get() } initialized to a reference in the output image bufffer and does not perform memory allocation for variable length vectors.

N-Dekker · 2025-12-03T16:37:59Z

Modules/Filtering/ImageFilterBase/include/itkCastImageFilter.hxx

@@ -139,29 +140,30 @@ CastImageFilter<TInputImage, TOutputImage>::DynamicThreadedGenerateDataDispatche

  this->CallCopyOutputRegionToInputRegion(inputRegionForThread, outputRegionForThread);


For my understanding, do inputRegionForThread and outputRegionForThread always have exactly the same number of pixels (as returned by ImageRegion::GetNumberOfPixels()), after calling CallCopyOutputRegionToInputRegion?

If the filter has some "radius" (frequent), then the input region is slightly larger (I believe). Each filter decides how to increase the region during propagation from output to input.

Looking at an older version, it looks like it assumes that the output region has the same size as the input region. Because it just checks inputIt.IsAtEnd() and inputIt.IsAtEndOfLine(), not outputIt.IsAtEnd() or outputIt.IsAtEndOfLine():

ITK/Modules/Filtering/ImageFilterBase/include/itkCastImageFilter.hxx

Lines 145 to 165 in 921c5ba

ImageScanlineConstIterator inputIt(inputPtr, inputRegionForThread);

ImageScanlineIterator outputIt(outputPtr, outputRegionForThread);

while (!inputIt.IsAtEnd())

{

while (!inputIt.IsAtEndOfLine())

{

const InputPixelType & inputPixel = inputIt.Get();

OutputPixelType value{ outputIt.Get() };

for (unsigned int k = 0; k < componentsPerPixel; ++k)

{

value[k] = static_cast<typename OutputPixelType::ValueType>(inputPixel[k]);

}

outputIt.Set(value);

++inputIt;

++outputIt;

}

inputIt.NextLine();

outputIt.NextLine();

}

That's OK to me, as long as that assumption is OK 😃 So then, in the new code, it's also OK to just check inputRange.end() and ignore outputRange.end().

Cast image filter does not need a larger input region, so it does not override the default CallCopyOutputRegionToInputRegion implementation, and has the same input and output regions.

N-Dekker · 2025-12-03T16:46:22Z

Modules/Filtering/ImageFilterBase/include/itkCastImageFilter.hxx

+  while (inputIt != inputEnd)
  {
-    while (!inputIt.IsAtEndOfLine())
+    const InputPixelType & inputPixel = *inputIt;


You can make it a beautiful range-based for loop, if you like 😃:

for (const InputPixelType & inputPixel : inputRange)

Then you may remove the variables inputIt and inputEnd.

How close are we to being able to do:

for ( auto &[inputPixel, outputPixel] : zip(inputRange, outputRange))

😈

I didn't try, but such a zip might be possible. Very cute 😸 However... zip(inputRange, outputRange) will check both inputRange.end() and outputRange.end(). It may not be necessary to check both, when we assume that input and output region have the same size anyway. So then, zip(inputRange, outputRange) may be unnecessary slow!

Update: The input and output region do indeed have the same number of pixels, as implied by the reply from @dzenanz at https://github.com/InsightSoftwareConsortium/ITK/pull/5670/files/2e8c70a4499a2e87c8e0f65d25a355baa82f3bb9#r2586937371 So then, it is indeed unnecessary to check both inputRange.end() and outputRange.end(). So in this case zip is not necessary. Just a range-based for loop over inputRange should be fine 😃

Using the range-based for loop in did not give the same performance is all cases in the test code. 🤷

Using the range-based for loop in did not give the same performance is all cases in the test code.

Interesting! Do you mean that my range-based for suggestion is in some cases significantly slower than your current PR? If so, do you have a clue why?

By the way, the range-based for loop can also be written as:

for (const InputPixelType & inputPixel : ImageRegionRange(*inputPtr, inputRegionForThread))

So then the inputRange variable may be removed as well!

It appears measurable slower in the performance testing code. I don't know why.

Modules/Filtering/ImageFilterBase/include/itkCastImageFilter.hxx

N-Dekker · 2025-12-04T15:30:29Z

Honestly I think we should still fix this:

ITK/Modules/Filtering/ImageFilterBase/include/itkCastImageFilter.hxx

Line 159 in 5032829

OutputPixelType outputPixel{ *outputIt };

For a regular image, OutputPixelType outputPixel{ *outputIt }; has undefined behavior, as far as I know.

github-actions bot added type:Performance Improvement in terms of compilation or execution time area:Filtering Issues affecting the Filtering module labels Dec 3, 2025

SimonRit approved these changes Dec 3, 2025

View reviewed changes

blowekamp force-pushed the cast_image_region_performance branch from 0b6156d to 2e8c70a Compare December 3, 2025 12:12

blowekamp mentioned this pull request Dec 3, 2025

Performance comparison of iterators and ranges to copy vector images #5669

Closed

7 tasks

N-Dekker reviewed Dec 3, 2025

View reviewed changes

dzenanz approved these changes Dec 3, 2025

View reviewed changes

Modules/Filtering/ImageFilterBase/include/itkCastImageFilter.hxx Show resolved Hide resolved

N-Dekker mentioned this pull request Dec 4, 2025

STYLE: Make NumericTraits::GetLength constexpr whenever possible #5671

Merged

blowekamp merged commit 5032829 into InsightSoftwareConsortium:main Dec 4, 2025
17 checks passed

		@@ -139,29 +140,30 @@ CastImageFilter<TInputImage, TOutputImage>::DynamicThreadedGenerateDataDispatche

		this->CallCopyOutputRegionToInputRegion(inputRegionForThread, outputRegionForThread);

	ImageScanlineConstIterator inputIt(inputPtr, inputRegionForThread);
	ImageScanlineIterator outputIt(outputPtr, outputRegionForThread);

	while (!inputIt.IsAtEnd())
	{
	while (!inputIt.IsAtEndOfLine())
	{
	const InputPixelType & inputPixel = inputIt.Get();
	OutputPixelType value{ outputIt.Get() };
	for (unsigned int k = 0; k < componentsPerPixel; ++k)
	{
	value[k] = static_cast<typename OutputPixelType::ValueType>(inputPixel[k]);
	}
	outputIt.Set(value);

	++inputIt;
	++outputIt;
	}
	inputIt.NextLine();
	outputIt.NextLine();
	}

Uh oh!

Conversation

blowekamp commented Dec 3, 2025

PR Checklist

Uh oh!

SimonRit left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

N-Dekker Dec 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

N-Dekker Dec 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

N-Dekker commented Dec 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

N-Dekker Dec 3, 2025 •

edited

Loading

N-Dekker Dec 4, 2025 •

edited

Loading