Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
142 changes: 142 additions & 0 deletions .claude/api_review_report.txt

Large diffs are not rendered by default.

91 changes: 91 additions & 0 deletions .claude/design_review_report.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,91 @@
---
Phase 4 — Design Review Report

Likely Design Issues

1. Module docstring lists functions from other packages (src/RegisterMismatchCuda.jl:26–35)
The top-level docstring advertises nanpad, highpass, correctbias!, truncatenoise!, aperture_grid, and allocate_mmarrays as if they live here. None of them are exported or defined in
this package — they come from companion packages. A first-time user reading the docs would search for these, get a MethodError, and be confused.

2. assertsamesize only checks 3 dimensions (src/RegisterMismatchCuda.jl:328–329)
size(A,1) == size(B,1) && size(A,2) == size(B,2) && size(A,3) == size(B,3)
For 1D arrays this silently checks only dimension 1 (which is always checked). For 4D+ arrays it ignores higher dimensions entirely. Should be size(A) == size(B).

3. fftfunc::Any / ifftfunc::Any fields in CMStorage (lines 91–92)
These fields sit on the hot path — every mismatch! call multiplies by them. Typing them Any suppresses specialization and forces runtime dispatch. They could be typed as
AbstractFFTs.Plan (or a concrete subtype), or made type parameters on CMStorage.

4. Active deprecations that have not been removed (lines 333–341)
There are three deprecated callsites:
- CuRCpair(realtype, realsize) → CuRCpair{T}(undef, realsize)
- CMStorage{T}(::UndefInitializer, WidthLike, DimsLike) → tuple form
- CMStorage(::Type{T}, ...) → CMStorage{T}(undef, ...)

These carry maintenance debt and emit runtime warnings to downstream users.

5. cucartesianindex only handles N = 1, 2, 3 (kernels.jl:1–6)
CMStorage{T,N} is parametric in N, so a user can create 4D storage. But cucartesianindex has no method for N ≥ 4 — any GPU kernel launch on 4D data would fail at runtime with an
obscure error. This is a latent type-system gap.

---
Design Questions

Q1 — Is CuRCpair genuinely public?
CuRCpair is exported but is purely a GPU r2c buffer layout detail. No user of mismatch or mismatch! ever needs to touch it directly. Was it exported to allow power users to build
custom storage types, or is it an accidental export? If the latter, removing it from the export list would be a minor breaking change.

Q2 — normalization as Symbol vs. dispatch type
Both high-level entry points take normalization=:intensity or :pixels, which routes to one of two completely different GPU kernels via a runtime if. An alternative design would use
singleton dispatch types (struct IntensityNorm end). The symbol approach matches the CPU counterpart in RegisterMismatchCommon — was keeping the API identical a deliberate choice?
Changing it would be breaking.

Q3 — display=false keyword as perpetual compat shim (lines 115–117)
The constructors accept a display keyword that is explicitly ignored "for compatibility with the CPU version." Is this expected to remain indefinitely, or is there a plan to drop it
once users have migrated? It currently silently swallows the argument rather than warning.

Q4 — NanCorrFFTs field names I0, I1, I2
Inside fillfixed!, I1 holds the image, I2 holds the elementwise square, and I0 holds the NaN mask (theta). The index-based names are opaque. Was there a reason not to name them
theta, image, image_sq? (This is internal-only so it's not a breaking change.)

Q5 — mismatch_apertures accepts kwargs... passed to CMStorage (lines 202–213)
The only recognized CMStorage keyword is display=false. Any other keyword silently disappears. Was this intended as a forward-compatibility shim, or should unknown keywords produce
an error?

---
Observations

- calculate_threads, kernel_conv_components!, kernel_calcNumDenom_intensity!, and kernel_calcNumDenom_pixels! are all accessed via RM. in the test suite but have no docstrings and
no public annotation. If they are intended for advanced users building custom kernels, they warrant docstrings. If they are truly internal, the test-qualified access could be
replaced.
- CMStorage defines eltype and ndims but not size. A user who wants to know what aperture size a given storage handles must access cms.aperture_width directly. A size(cms) method
returning the aperture size would be natural.
- mismatch_apertures (line 192) calls CuArray{T}(sdata(fixed)) — sdata is a SharedArrays function for unwrapping shared arrays. This suggests fixed was once expected to commonly be
a SharedArray. Is this still a realistic input type, or a historical artifact?
- CuRCpair.rng is typed NTuple{N} with no element type constraint. In practice it always holds UnitRange{Int} elements, but the loose typing means the compiler can't specialize on
it.
- No show methods for CMStorage, CuRCpair, or NanCorrFFTs. At the REPL these fall back to the generic mutable struct display, which prints all fields including raw GPU array
pointers — not very readable.

---
Overall Character

RegisterMismatchCuda is a tightly scoped, purpose-built GPU backend. Its design is largely coherent: the CMStorage / fillfixed! / mismatch! three-step pattern is clean, and the
allocating vs. in-place function pairs follow standard Julia idioms. The main tensions are (a) a module docstring that describes a richer API than the package actually owns,
creating a misleading first impression; (b) three active deprecations that should have been removed; and (c) fftfunc::Any on the hot path. If the author wanted to address the two
highest-leverage items, they would be: fix the module docstring (zero-cost, high user-facing impact) and remove the deprecations (the TODO: remove deprecations step in the
maintenance workflow will handle this).

---
Phase 5 — Values Clarification

Before writing the plan, I'd like to understand your intent on a few points:

1. Scope and audience: Is this package purely a GPU backend consumed by RegisterMismatch users, or do you intend it to be usable standalone? And is CuRCpair's export intentional —
are there users who construct them directly?
2. Central abstraction: CMStorage is clearly the centerpiece. Should it stay as-is (a flat mutable struct with all fields public), or would you prefer it to move toward a more
opaque type over time?
3. Composability / Base interfaces: How much do you want to invest in show, size(cms), etc.? Or is the REPL experience for these types not a priority?
4. Error model: The codebase mixes bare error() and ArgumentError. Should there be a consistent style?
5. normalization as Symbol: Would you consider moving to dispatch types, or does keeping it consistent with the CPU counterpart (RegisterMismatchCommon) take priority?

Take as long as you need — your answers will shape which chunks land in the plan and which get dropped.
15 changes: 15 additions & 0 deletions .claude/settings.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
{
"hooks": {
"PostToolUse": [
{
"matcher": "Edit|Write",
"hooks": [
{
"type": "command",
"command": "[[ \"$CLAUDE_TOOL_INPUT_FILE_PATH\" == *.jl ]] && julia --project -e 'using Runic; exit(Runic.main(ARGS))' -- --inplace \"$CLAUDE_TOOL_INPUT_FILE_PATH\" || true"
}
]
}
]
}
}
2 changes: 2 additions & 0 deletions .git-blame-ignore-revs
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
# Runic formatting
3628a61d6a30db8f0e4b05a1649a88389bfdebd1
12 changes: 10 additions & 2 deletions Project.toml
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
name = "RegisterMismatchCuda"
uuid = "ee0d8d85-fa18-576c-9601-66ebb12862d9"
version = "1.0.0"
authors = ["Tim Holy <tim.holy@gmail.com>"]
version = "0.3.4"

[deps]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
Expand All @@ -12,17 +12,25 @@ RegisterMismatchCommon = "abb2e897-52bf-5d28-a379-6ca321e3b878"
SharedArrays = "1a1011a3-84de-559e-8e89-a11a2f7dc383"

[compat]
Aqua = "0.8"
CUDA = "3,4,5"
CenterIndexedArrays = "0.2, 1"
ExplicitImports = "1"
ImageCore = "0.8.1, 0.9, 0.10"
ImageFiltering = "0.7"
Primes = "0.4, 0.5"
RegisterCore = "0.2, 1"
RegisterMismatchCommon = "0.2.2, 1"
SharedArrays = "1"
Test = "1"
julia = "1.10"

[extras]
Aqua = "4c88cf16-eb10-579e-8560-4a9242c79595"
CenterIndexedArrays = "46a7138f-0d70-54e1-8ada-fb8296f91f24"
ExplicitImports = "7d51a73a-1435-4ff3-83d9-f097790105c7"
ImageFiltering = "6a3955dd-da59-5b1f-98d4-e7296123deb5"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["Test", "CenterIndexedArrays", "ImageFiltering"]
test = ["Test", "Aqua", "CenterIndexedArrays", "ExplicitImports", "ImageFiltering"]
122 changes: 122 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,122 @@
# RegisterMismatchCuda

[![Build Status](https://github.com/HolyLab/RegisterMismatchCuda.jl/actions/workflows/CI.yml/badge.svg?branch=master)](https://github.com/HolyLab/RegisterMismatchCuda.jl/actions/workflows/CI.yml?query=branch%3Amaster)
[![Coverage](https://codecov.io/gh/HolyLab/RegisterMismatchCuda.jl/branch/master/graph/badge.svg)](https://codecov.io/gh/HolyLab/RegisterMismatchCuda.jl)

GPU-accelerated mismatch computation for image registration. This package is a
CUDA backend for the [Register](https://github.com/HolyLab) image-registration
ecosystem, implementing the same interface as
[RegisterMismatch](https://github.com/HolyLab/RegisterMismatch.jl) but running
on the GPU via [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl).

## What is "mismatch"?

Mismatch quantifies how well two images align at a given pixel shift. For a
shift **d**, it is the sum of squared differences between the fixed image and
the shifted moving image, divided by a normalization factor. The shift with the
smallest mismatch is the best-fit translation between the images.

Each result entry is stored as a `(numerator, denominator)` pair inside a
[`MismatchArray`](https://github.com/HolyLab/RegisterCore.jl). Use
`RegisterCore.separate` to extract the pair and
`RegisterCore.indmin_mismatch(mm, thresh)` to find the optimal shift while
ignoring poorly-normalized entries.

## Installation

This package lives in the [HolyLab registry](https://github.com/HolyLab/HolyLabRegistry).
Add the registry once, then install as usual:

```julia
using Pkg
pkg"registry add https://github.com/HolyLab/HolyLabRegistry.git"
Pkg.add("RegisterMismatchCuda")
```

A working CUDA installation and a compatible GPU are required at runtime.
See [CUDA.jl setup](https://cuda.juliagpu.org/stable/installation/overview/) for details.

## Usage

### Simple one-shot mismatch

Compare two images up to a given maximum shift. Host arrays are uploaded to the
GPU automatically:

```julia
using RegisterMismatchCuda, RegisterCore

fixed = rand(Float32, 256, 256)
moving = rand(Float32, 256, 256)
maxshift = (20, 20)

mm = mismatch(fixed, moving, maxshift) # returns a MismatchArray
best_shift = RegisterCore.indmin_mismatch(mm, 0.01)
```

`best_shift` is a `CartesianIndex` giving the translation (in pixels) from
`fixed` to `moving`, indexed relative to zero (e.g. `CartesianIndex(-3, 5)`
means the moving image is shifted 3 pixels up and 5 pixels to the right).

Pass `normalization=:pixels` (default is `:intensity`) to normalize by the
number of overlapping pixels rather than by image intensity.

### Repeated comparisons against a fixed image (CMStorage workflow)

When comparing many moving images against a single fixed image, pre-allocating
`CMStorage` avoids redundant GPU allocations and re-computing the fixed image's
FFTs:

```julia
using RegisterMismatchCuda, RegisterCore, CUDA

aperture_width = (256, 256)
maxshift = (20, 20)

cms = CMStorage{Float32}(undef, aperture_width, maxshift)

d_fixed = CuArray(rand(Float32, 256, 256))
d_moving = CuArray(rand(Float32, 256, 256))
mm = RegisterCore.MismatchArray(Float32, (2 .* maxshift .+ 1)...)

fillfixed!(cms, d_fixed) # compute fixed-image FFTs once

mismatch!(mm, cms, d_moving) # fast: only moving-image FFTs + correlation
best_shift = RegisterCore.indmin_mismatch(mm, 0.01)

# reuse cms for another moving image without touching d_fixed again
d_moving2 = CuArray(rand(Float32, 256, 256))
mismatch!(mm, cms, d_moving2)
```

### Localized (aperture) mismatch

For non-rigid or spatially varying registration, compute mismatch independently
inside a grid of localized apertures. Each aperture produces its own
`MismatchArray`:

```julia
using RegisterMismatchCuda, RegisterMismatchCommon, RegisterCore

fixed = rand(Float32, 512, 512)
moving = rand(Float32, 512, 512)

# 4×4 grid of apertures (centers and widths chosen automatically), max shift ±15
gridsize = (4, 4)
maxshift = (15, 15)

mms = mismatch_apertures(Float32, fixed, moving, gridsize, maxshift)

# mms is an Array{MismatchArray} with shape gridsize
# find the best shift at each aperture location:
shifts = map(mm -> RegisterCore.indmin_mismatch(mm, 0.01), mms)
```

For irregular aperture layouts, pass explicit aperture centers and an explicit
width instead:

```julia
aperture_centers = RegisterMismatchCommon.aperture_grid(size(fixed), gridsize)
aperture_width = RegisterMismatchCommon.default_aperture_width(fixed, gridsize)
mms = mismatch_apertures(Float32, fixed, moving, aperture_centers, aperture_width, maxshift)
```
Loading
Loading