Stand up dual stable-ABI build harness and port nms as the first op by adabeyta · Pull Request #1 · adabeyta/vision

adabeyta · 2026-06-18T22:26:21Z

This PR stands up the stable-ABI build path and ports nms (CPU, CUDA, quantized-CPU, MPS) as the first op. The intent is to validate the harness/per-backend patterns on nms.

Structure: Dual Extension

torchvision._C: the existing extension, now built from all not-yet-ported sources.
torchvision._C_stable: the new extension, built only from migrated sources (currently nms).

Why two extensions?
For a per op rollout a separate compilation unit helps to scope the pin and keep the tree buildable at every step of an incremental port (torchcodec migrated in place because it converted everything in one shot). Once all ops are stable remove legacy unstable compilation.

Layout Changes
setup.py: splits sources between the two extensions.

STABLE_SOURCES + _stable()/_not_stable() helpers split the source globs, so migrated files compile into _C_stable and everything else stays in _C.
make_C_stable_extension(): builds the new target with the -DTORCH_TARGET_VERSION pin.
ensure_hipified() hoisted to a single idempotent step both extensions call (runs once/with no build-order dependency between them).

Registration: schema declaration/per-backend kernel binding.

Schema via STABLE_TORCH_LIBRARY_FRAGMENT, which coexists with the classic TORCH_LIBRARY_FRAGMENT still in _C both register into the same torchvision namespace across two .so.
Per-backend kernels bound with STABLE_TORCH_LIBRARY_IMPL + TORCH_BOX.
The nms() C++ entrypoint dispatches through torch_call_dispatcher.

StableABICompat.h: shared compat header one place for ATen ops not yet wrapped in stable ops.h (placeholder until upstream.

Wraps sort, index_select, and masked_select through the dispatcher escape hatch, so each kernel doesn't re-roll the boilerplate.
Mirrors torchcodec's header of the same name

Missing Pytorch NMS MPS Stable-ABI Port Support

MPS scalar arguments:

The nms schema takes a float iou_threshold, which the Metal kernel needs as a scalar argument (constant float & iou_threshold [[buffer(3)]]). With no scalar-arg setter in the shim, that float is currently carried as a 1-element float32 tensor bound via set_arg_tensor. shim_mps.h exposes only aoti_torch_mps_set_arg_tensor and aoti_torch_mps_set_arg_int

Missing:
- aoti_torch_mps_set_arg_double
- aoti_torch_mps_set_arg_float
- aoti_torch_mps_set_arg_bytes
- aoti_torch_mps_set_arg_scalar.
TODO: Workaround (for now) packs iou_threshold into a 1-element float32 tensor (new_empty + fill_) and bind it at buffer(3) via set_arg_tensor. The kernel reads the same 4 bytes as constant float&, so it's behavior-identical. File PR for missing mps_set_args.
Metal source isolation.
Made nms Metal source into a standalone nms_metal_shader.h. The shared mps_kernels.h can't be used under the pin because it pulls in ATen/native/mps/OperationUtils.h and instantiates at::native::mps::MetalShaderLibrary. Derives from same isolation rationale as the dual extension (legacy _C keeps using the full mps_kernels.h)

Stable-ABI audit (`torch-abi-audit`)

Ran torch-abi-audit against the built torchvision package to verify the migrated extension stays on the stable surface and never reaches into at:: / c10:: / torch::jit:: internals.

_C_stable.so audits as STABLE — 67 stable-shim symbols, 0 unstable. The legacy _C.so and the
unrelated image.so stay UNSTABLE, which is expected: only nms has migrated so far, so the
package-level verdict remains UNSTABLE until the remaining ops move into _C_stable (same
mid-migration shape as torchcodec).

Package: torchvision
  Torch ABI:   UNSTABLE
  CPython ABI: n/a
  Extensions:  0
  Bundled libs: 3
  -- bundled libs --
    [UNSTABLE] [not-abi3] _C.so         (stable_shim=0,  unstable=110)
    [STABLE  ] [not-abi3] _C_stable.so  (stable_shim=67, unstable=0)
    [UNSTABLE] [uses-private-api] image.so  (stable_shim=0,  unstable=36)

…face

NicolasHug

Thank a lot for the PR @adabeyta, I left a few minor suggestions and questions, but overall this looks great!

Please submit this PR and all following ones to the torchvision repo instead, we'll want to review PRs directly over there so that we can get signal from the CI jobs.

NicolasHug · 2026-06-22T08:12:41Z

+// stable surface (the same pattern used by meta-pytorch/torchcodec's
+// StableABICompat.h and facebookresearch/xformers' pt_stable_utils.h).


let's use a permalink here (and we can ommit the ref to xformers, the torchcodec one is enough)

Suggested change

// stable surface (the same pattern used by meta-pytorch/torchcodec's

// StableABICompat.h and facebookresearch/xformers' pt_stable_utils.h).

// stable surface (the same pattern used by

// https://github.com/meta-pytorch/torchcodec/blob/8bbce656797c4f2b00feb2784ffe76e408be1e4c/src/torchcodec/_core/StableABICompat.h

NicolasHug · 2026-06-22T08:13:23Z

+#include <torch/csrc/stable/library.h>
+#include <torch/csrc/stable/ops.h>
+#include <torch/csrc/stable/tensor.h>
+#include <torch/headeronly/core/Dispatch_v2.h>
+#include <torch/headeronly/core/ScalarType.h>
+#include <torch/headeronly/util/Exception.h>


Nit, I have a small pref for just relying on the fact that most of these headers are transitively included already via the ../StableABICompat.h include.

NicolasHug · 2026-06-22T08:18:03Z

-at::Tensor nms_kernel(
-    const at::Tensor& dets,
-    const at::Tensor& scores,
+// Wrap the launch so the THO_DISPATCH_V2 body has no unprotected (launch-config) commas.


Q - why is this needed, i.e. what happens if there are commas? Are the commas the ones in <<<blocks, threads, 0, stream>>> ?

NicolasHug · 2026-06-22T08:30:56Z

+template <typename T>
+struct AccType {
+  using type = T;
+};
+template <>
+struct AccType<torch::headeronly::Half> {
+  using type = float;
+};


On the above: this seems OK, but perhaps a bit brittle. We control the types that get through this via our dispatcher invocation, so indeed this should only ever see floats and half types, but it won't be correct in general - e.g. the acc type of BFloat16 should be float, while this will map it to BFloat16.

It's worth adding a comment that this helper is only correct for float, double and Half, which is OK since these are the only types we support, but it's not generally correct. Alternatively we could consider upstreaming it as a follow-up but this is optional!

NicolasHug · 2026-06-22T08:35:01Z

+#if !defined(MOBILE) && defined(_WIN32)
+void* PyInit__C_stable(void) {
+  return nullptr;
+}
+#endif


I assume we'll move this somewhere else (in a more genereal file) once we have migrated more stuff? Can you add a TODO(stable-abi) for that?

NicolasHug · 2026-06-22T08:38:51Z



+# Stable-ABI extension (migrated ops)
+_load_library("_C_stable")


We'll want _C_stable to be handled exactly like the _C one, so this should be part of the if _load_library("_C"): condition above as well:

if _load_library("_C") and _load_library("_C_stable"):

NicolasHug · 2026-06-22T08:40:04Z

+
+# Files migrated to the stable ABI: subtracted from make_C_extension()'s globs and
+# built into _C_stable instead. Add an op's files here as it migrates.
+# TODO: when all ops migrate, drop make_C_extension()/STABLE_SOURCES and glob _C_stable.


Nit: I liked that you used TODO(stable-abi) in other places, it makes it very easy to grep for all the ABI-related TODOs!

Suggested change

# TODO: when all ops migrate, drop make_C_extension()/STABLE_SOURCES and glob _C_stable.

# TODO(stable-abi): when all ops migrate, drop make_C_extension()/STABLE_SOURCES and glob _C_stable.

Also just noting that in theory we could just migrate _C iteratively and not have _C_stable.

_C would just have a mix of stable and non-stable ops. The main thing we'd miss is that we wouldn't be able to set TORCH_TARGET_VERSION until all the ops in that lib have been migrated. But I think having _C_stable is good, as we'll be able to remove all the boilerplate / duplicated code once the migration is complete.

NicolasHug · 2026-06-22T08:44:43Z

    )


+def get_stable_macros_and_flags():


maybe add a TODO to merge this into get_macros_and_flags() once the migration is done?

NicolasHug · 2026-06-22T08:51:09Z

+// The kernel bodies below are copied verbatim from mps_kernels.h; keep them in
+// sync if that file changes.


maybe this comment can be removed as it'll stop being relevant as soon as we finish the migration (and we definitely won't be modifying mps_kernels.h until the migration is complete!) ?

NicolasHug · 2026-06-22T08:53:00Z

+// plain string, handed to aoti_torch_mps_create_shader_library at runtime --
+// the same shape PyTorch's AOTInductor MPS backend emits.


@janeyx99 can you confirm this is the official way to support MPS through the stable ABI?

adabeyta added 2 commits June 17, 2026 20:25

Stand up the stable-ABI build harness and migrate nms as the first op

a74e311

Port the nms MPS kernel to the stable ABI via the AOTI shim_mps.h sur…

829e953

…face

NicolasHug reviewed Jun 22, 2026

View reviewed changes

adabeyta added 3 commits June 22, 2026 15:25

Address reviewer comments and cleanup.

476420e

Merge branch 'main' into stable-abi-standup

776e799

Add stableABI nms op in the CMake target.

8f2660f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stand up dual stable-ABI build harness and port nms as the first op#1

Stand up dual stable-ABI build harness and port nms as the first op#1
adabeyta wants to merge 5 commits into
mainfrom
stable-abi-standup

adabeyta commented Jun 18, 2026 •

edited

Loading

Uh oh!

NicolasHug left a comment

Uh oh!

NicolasHug Jun 22, 2026

Uh oh!

NicolasHug Jun 22, 2026

Uh oh!

NicolasHug Jun 22, 2026

Uh oh!

NicolasHug Jun 22, 2026

Uh oh!

NicolasHug Jun 22, 2026

Uh oh!

NicolasHug Jun 22, 2026

Uh oh!

NicolasHug Jun 22, 2026

Uh oh!

NicolasHug Jun 22, 2026

Uh oh!

NicolasHug Jun 22, 2026

Uh oh!

NicolasHug Jun 22, 2026

Uh oh!

NicolasHug Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		// stable surface (the same pattern used by meta-pytorch/torchcodec's
		// StableABICompat.h and facebookresearch/xformers' pt_stable_utils.h).



		# Stable-ABI extension (migrated ops)
		_load_library("_C_stable")

	# TODO: when all ops migrate, drop make_C_extension()/STABLE_SOURCES and glob _C_stable.
	# TODO(stable-abi): when all ops migrate, drop make_C_extension()/STABLE_SOURCES and glob _C_stable.

		// The kernel bodies below are copied verbatim from mps_kernels.h; keep them in
		// sync if that file changes.

		// plain string, handed to aoti_torch_mps_create_shader_library at runtime --
		// the same shape PyTorch's AOTInductor MPS backend emits.

		)


		def get_stable_macros_and_flags():

Conversation

adabeyta commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Structure: Dual Extension

Missing Pytorch NMS MPS Stable-ABI Port Support

Stable-ABI audit (torch-abi-audit)

Uh oh!

NicolasHug left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

adabeyta commented Jun 18, 2026 •

edited

Loading

Stable-ABI audit (`torch-abi-audit`)