Skip to content

Stand up dual stable-ABI build harness and port nms as the first op#1

Open
adabeyta wants to merge 5 commits into
mainfrom
stable-abi-standup
Open

Stand up dual stable-ABI build harness and port nms as the first op#1
adabeyta wants to merge 5 commits into
mainfrom
stable-abi-standup

Conversation

@adabeyta

@adabeyta adabeyta commented Jun 18, 2026

Copy link
Copy Markdown
Owner

This PR stands up the stable-ABI build path and ports nms (CPU, CUDA, quantized-CPU, MPS) as the first op. The intent is to validate the harness/per-backend patterns on nms.

Structure: Dual Extension

  • torchvision._C: the existing extension, now built from all not-yet-ported sources.
  • torchvision._C_stable: the new extension, built only from migrated sources (currently nms).

Why two extensions?
For a per op rollout a separate compilation unit helps to scope the pin and keep the tree buildable at every step of an incremental port (torchcodec migrated in place because it converted everything in one shot). Once all ops are stable remove legacy unstable compilation.

Layout Changes
setup.py: splits sources between the two extensions.

  • STABLE_SOURCES + _stable()/_not_stable() helpers split the source globs, so migrated files compile into _C_stable and everything else stays in _C.
  • make_C_stable_extension(): builds the new target with the -DTORCH_TARGET_VERSION pin.
  • ensure_hipified() hoisted to a single idempotent step both extensions call (runs once/with no build-order dependency between them).

Registration: schema declaration/per-backend kernel binding.

  • Schema via STABLE_TORCH_LIBRARY_FRAGMENT, which coexists with the classic TORCH_LIBRARY_FRAGMENT still in _C both register into the same torchvision namespace across two .so.
  • Per-backend kernels bound with STABLE_TORCH_LIBRARY_IMPL + TORCH_BOX.
  • The nms() C++ entrypoint dispatches through torch_call_dispatcher.

StableABICompat.h: shared compat header one place for ATen ops not yet wrapped in stable ops.h (placeholder until upstream.

  • Wraps sort, index_select, and masked_select through the dispatcher escape hatch, so each kernel doesn't re-roll the boilerplate.
  • Mirrors torchcodec's header of the same name

Missing Pytorch NMS MPS Stable-ABI Port Support

  1. MPS scalar arguments:

    The nms schema takes a float iou_threshold, which the Metal kernel needs as a scalar argument (constant float & iou_threshold [[buffer(3)]]). With no scalar-arg setter in the shim, that float is currently carried as a 1-element float32 tensor bound via set_arg_tensor. shim_mps.h exposes only aoti_torch_mps_set_arg_tensor and aoti_torch_mps_set_arg_int

    Missing:

    • aoti_torch_mps_set_arg_double
    • aoti_torch_mps_set_arg_float
    • aoti_torch_mps_set_arg_bytes
    • aoti_torch_mps_set_arg_scalar.

    TODO: Workaround (for now) packs iou_threshold into a 1-element float32 tensor (new_empty + fill_) and bind it at buffer(3) via set_arg_tensor. The kernel reads the same 4 bytes as constant float&, so it's behavior-identical. File PR for missing mps_set_args.

  2. Metal source isolation.
    Made nms Metal source into a standalone nms_metal_shader.h. The shared mps_kernels.h can't be used under the pin because it pulls in ATen/native/mps/OperationUtils.h and instantiates at::native::mps::MetalShaderLibrary. Derives from same isolation rationale as the dual extension (legacy _C keeps using the full mps_kernels.h)

Stable-ABI audit (torch-abi-audit)

Ran torch-abi-audit against the built torchvision package to verify the migrated extension stays on the stable surface and never reaches into at:: / c10:: / torch::jit:: internals.

_C_stable.so audits as STABLE — 67 stable-shim symbols, 0 unstable. The legacy _C.so and the
unrelated image.so stay UNSTABLE, which is expected: only nms has migrated so far, so the
package-level verdict remains UNSTABLE until the remaining ops move into _C_stable (same
mid-migration shape as torchcodec).

Package: torchvision
  Torch ABI:   UNSTABLE
  CPython ABI: n/a
  Extensions:  0
  Bundled libs: 3
  -- bundled libs --
    [UNSTABLE] [not-abi3] _C.so         (stable_shim=0,  unstable=110)
    [STABLE  ] [not-abi3] _C_stable.so  (stable_shim=67, unstable=0)
    [UNSTABLE] [uses-private-api] image.so  (stable_shim=0,  unstable=36)

@NicolasHug NicolasHug left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank a lot for the PR @adabeyta, I left a few minor suggestions and questions, but overall this looks great!

Please submit this PR and all following ones to the torchvision repo instead, we'll want to review PRs directly over there so that we can get signal from the CI jobs.

Comment thread torchvision/csrc/ops/StableABICompat.h Outdated
Comment on lines +23 to +24
// stable surface (the same pattern used by meta-pytorch/torchcodec's
// StableABICompat.h and facebookresearch/xformers' pt_stable_utils.h).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's use a permalink here (and we can ommit the ref to xformers, the torchcodec one is enough)

Suggested change
// stable surface (the same pattern used by meta-pytorch/torchcodec's
// StableABICompat.h and facebookresearch/xformers' pt_stable_utils.h).
// stable surface (the same pattern used by
// https://github.com/meta-pytorch/torchcodec/blob/8bbce656797c4f2b00feb2784ffe76e408be1e4c/src/torchcodec/_core/StableABICompat.h

Comment thread torchvision/csrc/ops/cpu/nms_kernel.cpp Outdated
Comment on lines +2 to +7
#include <torch/csrc/stable/library.h>
#include <torch/csrc/stable/ops.h>
#include <torch/csrc/stable/tensor.h>
#include <torch/headeronly/core/Dispatch_v2.h>
#include <torch/headeronly/core/ScalarType.h>
#include <torch/headeronly/util/Exception.h>

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, I have a small pref for just relying on the fact that most of these headers are transitively included already via the ../StableABICompat.h include.

Comment thread torchvision/csrc/ops/cuda/nms_kernel.cu Outdated
at::Tensor nms_kernel(
const at::Tensor& dets,
const at::Tensor& scores,
// Wrap the launch so the THO_DISPATCH_V2 body has no unprotected (launch-config) commas.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q - why is this needed, i.e. what happens if there are commas? Are the commas the ones in <<<blocks, threads, 0, stream>>> ?

Comment on lines +28 to +35
template <typename T>
struct AccType {
using type = T;
};
template <>
struct AccType<torch::headeronly::Half> {
using type = float;
};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the above: this seems OK, but perhaps a bit brittle. We control the types that get through this via our dispatcher invocation, so indeed this should only ever see floats and half types, but it won't be correct in general - e.g. the acc type of BFloat16 should be float, while this will map it to BFloat16.

It's worth adding a comment that this helper is only correct for float, double and Half, which is OK since these are the only types we support, but it's not generally correct. Alternatively we could consider upstreaming it as a follow-up but this is optional!

Comment thread torchvision/csrc/ops/nms.cpp Outdated
Comment on lines +11 to +15
#if !defined(MOBILE) && defined(_WIN32)
void* PyInit__C_stable(void) {
return nullptr;
}
#endif

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume we'll move this somewhere else (in a more genereal file) once we have migrated more stuff? Can you add a TODO(stable-abi) for that?

Comment thread torchvision/extension.py Outdated


# Stable-ABI extension (migrated ops)
_load_library("_C_stable")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll want _C_stable to be handled exactly like the _C one, so this should be part of the if _load_library("_C"): condition above as well:

if _load_library("_C") and _load_library("_C_stable"):

Comment thread setup.py Outdated

# Files migrated to the stable ABI: subtracted from make_C_extension()'s globs and
# built into _C_stable instead. Add an op's files here as it migrates.
# TODO: when all ops migrate, drop make_C_extension()/STABLE_SOURCES and glob _C_stable.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I liked that you used TODO(stable-abi) in other places, it makes it very easy to grep for all the ABI-related TODOs!

Suggested change
# TODO: when all ops migrate, drop make_C_extension()/STABLE_SOURCES and glob _C_stable.
# TODO(stable-abi): when all ops migrate, drop make_C_extension()/STABLE_SOURCES and glob _C_stable.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also just noting that in theory we could just migrate _C iteratively and not have _C_stable.

_C would just have a mix of stable and non-stable ops. The main thing we'd miss is that we wouldn't be able to set TORCH_TARGET_VERSION until all the ops in that lib have been migrated. But I think having _C_stable is good, as we'll be able to remove all the boilerplate / duplicated code once the migration is complete.

Comment thread setup.py
)


def get_stable_macros_and_flags():

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a TODO to merge this into get_macros_and_flags() once the migration is done?

Comment on lines +13 to +14
// The kernel bodies below are copied verbatim from mps_kernels.h; keep them in
// sync if that file changes.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this comment can be removed as it'll stop being relevant as soon as we finish the migration (and we definitely won't be modifying mps_kernels.h until the migration is complete!) ?

Comment on lines +10 to +11
// plain string, handed to aoti_torch_mps_create_shader_library at runtime --
// the same shape PyTorch's AOTInductor MPS backend emits.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janeyx99 can you confirm this is the official way to support MPS through the stable ABI?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants