Describe what you are looking for
Feature request: Specialized Normalize — image, video, volume
Why this matters
This may look like a very narrow use case, but normalize is the single most widely used transform in all of computer vision augmentations. Almost every training pipeline applies it (ImageNet mean/std, YOLO /255, min-max, etc.). So performance here has outsized impact.
Summary
We'd like NumKong to provide a small set of specialized normalize operations. Normalization (subtract mean / divide by std, or min-max scaling) is the most common preprocessing step in image and video pipelines.
- Work for image `(H, W, C)`, video `(N, H, W, C)` or `(T, H, W, C)`, and volume `(D, H, W, C)` or `(N, D, H, W, C)`.
- Support external mean/std or min/max (YOLO, ImageNet, Inception, etc.) or compute them from the data (per-image or per-channel).
- Performance: We’d love the implementation to be competitive with or faster than (1) direct NumPy, (2) OpenCV’s specialized functions where they exist, and (3) the “build LUT + apply LUT” approach. The section below gives reference code to compare against.
(uint8 → float32 LUT is a separate request; see ashvardanian/StringZilla#302.)
How we do it now (Albucore)
AlbumentationsX calls Albucore for normalize. Current behavior: https://github.com/albumentations-team/albucore/blob/main/albucore/functions.py#L336-L348
- Standard (external mean/std): We pass precomputed `mean_np` and `denominator` (i.e. `mean` and `1/(std * max_pixel_value)` in pixel space). Formula: `out = (img - mean_np) * denominator` → float32. For uint8 input, Albucore builds a float32 LUT of length 256 and applies it (e.g. via `cv2.LUT`) to get float32 output in one pass instead of converting to float and then doing the math.
- Per-image / min-max: `normalize_per_image(img, mode)` with modes like `"min_max"`, `"image"`, `"image_per_channel"`, `"min_max_per_channel"`: compute stats from the image, then apply.
So today we have: (a) direct float32 path (numpy-like or fused), (b) uint8 path via LUT. Beating or matching these would make NumKong a natural choice for us.
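As a concrete baseline, the two paths can be timed side by side. This is a rough sketch (array size, function names, and the pure-NumPy fancy-indexing LUT apply are our illustration, not Albucore's actual code):

```python
import timeit
import numpy as np

img = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
inv_std = 1.0 / np.array([0.229, 0.224, 0.225], dtype=np.float32)

def direct(img):
    # (a) convert to float32, then subtract/multiply
    return (img.astype(np.float32) / 255.0 - mean) * inv_std

def via_lut(img):
    # (b) 256-entry float32 LUT per channel, applied via fancy indexing
    luts = (np.arange(256, dtype=np.float32)[:, None] / 255.0 - mean) * inv_std  # (256, C)
    return luts[img, np.arange(img.shape[-1])]  # out[h, w, c] = luts[img[h, w, c], c]

assert np.allclose(direct(img), via_lut(img), atol=1e-6)
print("direct:", timeit.timeit(lambda: direct(img), number=10))
print("lut:   ", timeit.timeit(lambda: via_lut(img), number=10))
```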
Reference implementations to compare against
If you’re not deep in CV: the snippets below are reference implementations we use today (NumPy, OpenCV, LUT). Same input (e.g. uint8 image (H, W, 3) or float32), same output (float32). Timing these gives a baseline; we’d adopt NumKong if it’s competitive or faster.
1. Direct NumPy — standard (external mean/std)
```python
import numpy as np

# Input: (H, W, C); mean, std shape (C,) in [0, 1]; max_pixel_value = 255
def normalize_numpy_standard(img: np.ndarray, mean: np.ndarray, std: np.ndarray,
                             max_val: float = 255.0) -> np.ndarray:
    x = img.astype(np.float32) / max_val
    return (x - mean) / (std + 1e-7)
```
2. Direct NumPy — min-max (global, to [0,1])
```python
def normalize_numpy_minmax_global(img: np.ndarray) -> np.ndarray:
    img_f = img.astype(np.float32)
    mn, mx = img_f.min(), img_f.max()
    return (img_f - mn) / ((mx - mn) + 1e-7)
```
3. OpenCV — min-max (global)
OpenCV has a dedicated function for this; a natural comparison point.
```python
import cv2

def normalize_cv2_minmax(img: np.ndarray) -> np.ndarray:
    # img uint8 or float32; output float32 in [0, 1]
    return cv2.normalize(img, None, 0, 1, cv2.NORM_MINMAX, dtype=cv2.CV_32F)
```
4. OpenCV — per-channel mean/std (compute then apply)
No single cv2 call for “subtract mean, divide by std”; we use cv2.meanStdDev then NumPy.
```python
def normalize_cv2_meanstd_per_channel(img: np.ndarray) -> np.ndarray:
    # img float32 (H, W, C)
    mean, std = cv2.meanStdDev(img)  # (C, 1) each
    mean = mean.flatten()
    std = std.flatten()
    return (img - mean) / (std + 1e-7)
```
5. LUT path (uint8 → float32) — standard
How Albucore does it for uint8: build a float32 LUT, then one LUT apply per channel (or fused). Another useful baseline.
```python
import cv2
import numpy as np

def build_lut_standard(mean: float, inv_std: float) -> np.ndarray:
    # For one channel: lut[i] = (i/255 - mean) * inv_std, i.e. (i/255 - mean) / std
    lut = np.arange(256, dtype=np.float32) / 255.0
    return (lut - mean) * inv_std

def normalize_uint8_via_lut(img: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    # img uint8 (H, W, C); mean, std shape (C,)
    out = np.empty(img.shape, dtype=np.float32)
    for c in range(img.shape[-1]):
        lut = build_lut_standard(mean[c], 1.0 / (std[c] + 1e-7))
        # channel slices are non-contiguous; cv2.LUT needs a contiguous input
        out[..., c] = cv2.LUT(np.ascontiguousarray(img[..., c]), lut)
    return out
```
Operations we need
1. Standard normalization (mean + std)
- Per-channel: Compute mean and std per channel, then `out = (x - mean) / std` (mean/std shape `(C,)`).
- Global: Single scalar mean and std for all channels: `out = (x - mean) / std`.
External values: Caller may pass precomputed mean and std (e.g. ImageNet, YOLO, Inception — see External presets below).
cv2 analogue: None as a single call; we compute `(x - mean) / std` manually. `cv2.meanStdDev()` can compute per-channel mean/std.
NumPy equivalents:
```python
import numpy as np

# ---- Input: x shape (H, W, C), float32 ----

# Per-channel: compute mean/std from data (over spatial dims), then normalize
def standard_per_channel(x: np.ndarray, axis=(0, 1)) -> np.ndarray:
    mean = np.mean(x, axis=axis, keepdims=True)  # (1, 1, C)
    std = np.std(x, axis=axis, keepdims=True)    # (1, 1, C)
    return (x - mean) / (std + 1e-7)

# Per-channel: external mean/std (e.g. ImageNet in [0, 1]); mean, std shape (C,) or (1, 1, C)
def standard_external_per_channel(x: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    # mean, std e.g. (3,) for RGB; broadcast to (H, W, C)
    return (x - mean) / (std + 1e-7)

# Global: single scalar mean and std
def standard_global(x: np.ndarray, mean: float, std: float) -> np.ndarray:
    return (x - mean) / (std + 1e-7)

# Example: ImageNet-style (x in [0, 255], mean/std in [0, 1])
max_val = 255.0
mean_01 = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std_01 = np.array([0.229, 0.224, 0.225], dtype=np.float32)
# out = (x / max_val - mean_01) / std_01
```
For video `(N, H, W, C)`: same ops; per-channel mean/std would use `axis=(0, 1, 2)` to compute over N, H, W. For volume `(D, H, W, C)`: `axis=(0, 1, 2)` as well; for a batch of volumes `(N, D, H, W, C)`: `axis=(0, 1, 2, 3)`.
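Since the channel dimension is always last, one function can cover all of these layouts by reducing over every axis except the last. A sketch (the function name is ours, not an existing API):

```python
import numpy as np

# Per-channel standard normalization for any channels-last layout:
# (H, W, C), (N, H, W, C), (D, H, W, C), (N, D, H, W, C), ...
def standard_per_channel_any(x: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    axes = tuple(range(x.ndim - 1))          # reduce over everything but channels
    mean = x.mean(axis=axes, keepdims=True)  # shape (1, ..., 1, C)
    std = x.std(axis=axes, keepdims=True)
    return ((x - mean) / (std + eps)).astype(np.float32)

video = np.random.rand(8, 32, 32, 3).astype(np.float32)  # (N, H, W, C)
out = standard_per_channel_any(video)
# After normalization, each channel has mean ~0 and std ~1
print(out.mean(axis=(0, 1, 2)), out.std(axis=(0, 1, 2)))
```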
2. Min-max normalization
- Per-channel: Compute min and max per channel, then scale to a target range (e.g. [0, 1]).
- Global: Single min/max over all channels; same formula.
External values: Caller may pass precomputed min/max or only the output range `[out_min, out_max]` (we compute data min/max).
cv2 analogue: `cv2.normalize(src, dst, alpha, beta, cv2.NORM_MINMAX, dtype=cv2.CV_32F)` rescales so min→alpha, max→beta.
NumPy equivalents:
```python
import numpy as np

# ---- Input: x shape (H, W, C), float32 ----

# Per-channel: compute min/max from data, scale to [0, 1]
def minmax_per_channel(x: np.ndarray, axis=(0, 1), out_lo=0.0, out_hi=1.0, eps=1e-7) -> np.ndarray:
    x_min = np.min(x, axis=axis, keepdims=True)  # (1, 1, C)
    x_max = np.max(x, axis=axis, keepdims=True)  # (1, 1, C)
    scale = (x_max - x_min) + eps
    normalized = (x - x_min) / scale
    return normalized * (out_hi - out_lo) + out_lo

# Global: single min/max over the entire array
def minmax_global(x: np.ndarray, out_lo=0.0, out_hi=1.0, eps=1e-7) -> np.ndarray:
    x_min, x_max = np.min(x), np.max(x)
    scale = (x_max - x_min) + eps
    normalized = (x - x_min) / scale
    return normalized * (out_hi - out_lo) + out_lo

# External min/max: user provides the data range (e.g. 0 and 255 for uint8)
def minmax_external(x: np.ndarray, in_lo: float, in_hi: float, out_lo=0.0, out_hi=1.0) -> np.ndarray:
    scale = (in_hi - in_lo) + 1e-7
    return (x - in_lo) / scale * (out_hi - out_lo) + out_lo

# e.g. uint8 -> [0, 1]: minmax_external(x, 0, 255, 0, 1)
```
For video `(N, H, W, C)`: per-channel min/max over `axis=(0, 1, 2)`. For volume `(D, H, W, C)`: also `axis=(0, 1, 2)`; for a batch of volumes `(N, D, H, W, C)`: `axis=(0, 1, 2, 3)`.
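As with the standard path, min-max can be written once for all channels-last ranks. A sketch (function name is ours):

```python
import numpy as np

# Per-channel min-max to [out_lo, out_hi] for any channels-last layout
def minmax_per_channel_any(x: np.ndarray, out_lo=0.0, out_hi=1.0, eps=1e-7) -> np.ndarray:
    x = x.astype(np.float32)
    axes = tuple(range(x.ndim - 1))           # reduce over everything but channels
    lo = x.min(axis=axes, keepdims=True)
    hi = x.max(axis=axes, keepdims=True)
    return (x - lo) / ((hi - lo) + eps) * (out_hi - out_lo) + out_lo

# Deterministic volume (D, H, W, C) covering the full uint8 range 0..255
volume = np.arange(4 * 16 * 16, dtype=np.uint8).reshape(4, 16, 16, 1)
out = minmax_per_channel_any(volume)  # float32 in [0, 1]
```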
Input shapes
All operations should accept:
- Image: `(H, W, C)`
- Video / batch of images: `(N, H, W, C)` or `(T, H, W, C)`
- Volume: `(D, H, W, C)`
- Batch of volumes: `(N, D, H, W, C)`
So the “channel” dimension is always the last; spatial and batch/sequence dimensions can vary.
External presets (examples)
We need to support external mean/std or scale factors so users can plug in standard schemes without recomputing:
- YOLO: scale pixel range to [0, 1]: effectively `mean=(0, 0, 0)`, `std=(1, 1, 1)` after dividing by 255 (or `mean=0`, `std=255` in pixel space, then output in [0, 1]).
- ImageNet (standard): `mean=(0.485, 0.456, 0.406)`, `std=(0.229, 0.224, 0.225)` (in [0, 1]); in pixel space with `max_pixel_value=255`, mean and std are arrays of shape `(3,)`.
- Inception: often `mean=(0.5, 0.5, 0.5)`, scale to [-1, 1] or [0, 1] with a fixed scale.
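These presets reduce to a small lookup of constants. A sketch (the dict layout and key names are our illustration, not an existing NumKong or Albucore API; the numeric values are the standard published constants):

```python
import numpy as np

PRESETS = {
    "yolo": {"mean": np.zeros(3, np.float32),            # just x / 255
             "std": np.ones(3, np.float32),
             "max_pixel_value": 255.0},
    "imagenet": {"mean": np.array([0.485, 0.456, 0.406], np.float32),
                 "std": np.array([0.229, 0.224, 0.225], np.float32),
                 "max_pixel_value": 255.0},
    "inception": {"mean": np.full(3, 0.5, np.float32),   # maps [0, 1] to [-1, 1]
                  "std": np.full(3, 0.5, np.float32),
                  "max_pixel_value": 255.0},
}

def apply_preset(img: np.ndarray, name: str) -> np.ndarray:
    p = PRESETS[name]
    x = img.astype(np.float32) / p["max_pixel_value"]
    return (x - p["mean"]) / p["std"]

img = np.full((2, 2, 3), 255, dtype=np.uint8)
print(apply_preset(img, "inception"))  # all 1.0: (255/255 - 0.5) / 0.5
```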
So the API should accept:
- `mean`: scalar or array of shape `(C,)` (or broadcast-compatible).
- `std`: scalar or array of shape `(C,)` (or broadcast-compatible).
- `min` / `max`: optional; for the min-max path, scalar or per-channel.
- `max_pixel_value`: optional (e.g. 255 for uint8); used to convert from [0, 255] to [0, 1] before applying mean/std in normalized space, or to interpret mean/std given in [0, 1] space.
Same idea for divide (and subtract): allow caller to pass the exact values or arrays they want to subtract and divide by (e.g. from a config or from another framework’s normalization constants).
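To make the request concrete, here is one possible entry point covering both external constants and stats computed from the data. This is a hypothetical signature sketch, not NumKong's actual API:

```python
import numpy as np

def normalize(x, mean=None, std=None, max_pixel_value=1.0, per_channel=True, eps=1e-7):
    # Channels-last input of any rank: (H,W,C), (N,H,W,C), (D,H,W,C), (N,D,H,W,C)
    x = x.astype(np.float32) / max_pixel_value
    if mean is None or std is None:
        # Compute stats from the data: per-channel over all non-channel axes,
        # or globally over the whole array
        axes = tuple(range(x.ndim - 1)) if per_channel else None
        mean = x.mean(axis=axes, keepdims=per_channel)
        std = x.std(axis=axes, keepdims=per_channel)
    return (x - np.asarray(mean, np.float32)) / (np.asarray(std, np.float32) + eps)

img = np.random.randint(0, 256, (8, 8, 3), dtype=np.uint8)
# External constants (ImageNet-style) and computed stats through the same call:
out_ext = normalize(img, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225],
                    max_pixel_value=255.0)
out_data = normalize(img)  # per-channel stats computed from the image itself
```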
Requested from NumKong
We request these normalize ops from NumKong. AlbumentationsX would call NumKong for normalize (image, video, volume shapes).
Summary table
| Operation | Per-channel | Global | External mean/std | cv2 analogue |
| --- | --- | --- | --- | --- |
| Standard (mean/std) | Yes | Yes | Yes | No single call |
| Min-max | Yes | Yes | Yes (or from data) | `cv2.normalize` with `NORM_MINMAX` |
Input shapes: `(H, W, C)`, `(N, H, W, C)`, `(D, H, W, C)`, `(N, D, H, W, C)`. Output: float32. Presets: YOLO, ImageNet, Inception, or user-defined arrays.
Can you contribute to the implementation?
Is your feature request specific to a certain interface?
It applies to everything
Contact Details
No response
Is there an existing issue for this?
Code of Conduct