Stand up dual stable-ABI build harness and port nms as the first op#1
Stand up dual stable-ABI build harness and port nms as the first op#1adabeyta wants to merge 5 commits into
Conversation
NicolasHug
left a comment
There was a problem hiding this comment.
Thank a lot for the PR @adabeyta, I left a few minor suggestions and questions, but overall this looks great!
Please submit this PR and all following ones to the torchvision repo instead, we'll want to review PRs directly over there so that we can get signal from the CI jobs.
| // stable surface (the same pattern used by meta-pytorch/torchcodec's | ||
| // StableABICompat.h and facebookresearch/xformers' pt_stable_utils.h). |
There was a problem hiding this comment.
let's use a permalink here (and we can ommit the ref to xformers, the torchcodec one is enough)
| // stable surface (the same pattern used by meta-pytorch/torchcodec's | |
| // StableABICompat.h and facebookresearch/xformers' pt_stable_utils.h). | |
| // stable surface (the same pattern used by | |
| // https://github.com/meta-pytorch/torchcodec/blob/8bbce656797c4f2b00feb2784ffe76e408be1e4c/src/torchcodec/_core/StableABICompat.h |
| #include <torch/csrc/stable/library.h> | ||
| #include <torch/csrc/stable/ops.h> | ||
| #include <torch/csrc/stable/tensor.h> | ||
| #include <torch/headeronly/core/Dispatch_v2.h> | ||
| #include <torch/headeronly/core/ScalarType.h> | ||
| #include <torch/headeronly/util/Exception.h> |
There was a problem hiding this comment.
Nit, I have a small pref for just relying on the fact that most of these headers are transitively included already via the ../StableABICompat.h include.
| at::Tensor nms_kernel( | ||
| const at::Tensor& dets, | ||
| const at::Tensor& scores, | ||
| // Wrap the launch so the THO_DISPATCH_V2 body has no unprotected (launch-config) commas. |
There was a problem hiding this comment.
Q - why is this needed, i.e. what happens if there are commas? Are the commas the ones in <<<blocks, threads, 0, stream>>> ?
| template <typename T> | ||
| struct AccType { | ||
| using type = T; | ||
| }; | ||
| template <> | ||
| struct AccType<torch::headeronly::Half> { | ||
| using type = float; | ||
| }; |
There was a problem hiding this comment.
On the above: this seems OK, but perhaps a bit brittle. We control the types that get through this via our dispatcher invocation, so indeed this should only ever see floats and half types, but it won't be correct in general - e.g. the acc type of BFloat16 should be float, while this will map it to BFloat16.
It's worth adding a comment that this helper is only correct for float, double and Half, which is OK since these are the only types we support, but it's not generally correct. Alternatively we could consider upstreaming it as a follow-up but this is optional!
| #if !defined(MOBILE) && defined(_WIN32) | ||
| void* PyInit__C_stable(void) { | ||
| return nullptr; | ||
| } | ||
| #endif |
There was a problem hiding this comment.
I assume we'll move this somewhere else (in a more genereal file) once we have migrated more stuff? Can you add a TODO(stable-abi) for that?
|
|
||
|
|
||
| # Stable-ABI extension (migrated ops) | ||
| _load_library("_C_stable") |
There was a problem hiding this comment.
We'll want _C_stable to be handled exactly like the _C one, so this should be part of the if _load_library("_C"): condition above as well:
if _load_library("_C") and _load_library("_C_stable"):
|
|
||
| # Files migrated to the stable ABI: subtracted from make_C_extension()'s globs and | ||
| # built into _C_stable instead. Add an op's files here as it migrates. | ||
| # TODO: when all ops migrate, drop make_C_extension()/STABLE_SOURCES and glob _C_stable. |
There was a problem hiding this comment.
Nit: I liked that you used TODO(stable-abi) in other places, it makes it very easy to grep for all the ABI-related TODOs!
| # TODO: when all ops migrate, drop make_C_extension()/STABLE_SOURCES and glob _C_stable. | |
| # TODO(stable-abi): when all ops migrate, drop make_C_extension()/STABLE_SOURCES and glob _C_stable. |
There was a problem hiding this comment.
Also just noting that in theory we could just migrate _C iteratively and not have _C_stable.
_C would just have a mix of stable and non-stable ops. The main thing we'd miss is that we wouldn't be able to set TORCH_TARGET_VERSION until all the ops in that lib have been migrated. But I think having _C_stable is good, as we'll be able to remove all the boilerplate / duplicated code once the migration is complete.
| ) | ||
|
|
||
|
|
||
| def get_stable_macros_and_flags(): |
There was a problem hiding this comment.
maybe add a TODO to merge this into get_macros_and_flags() once the migration is done?
| // The kernel bodies below are copied verbatim from mps_kernels.h; keep them in | ||
| // sync if that file changes. |
There was a problem hiding this comment.
maybe this comment can be removed as it'll stop being relevant as soon as we finish the migration (and we definitely won't be modifying mps_kernels.h until the migration is complete!) ?
| // plain string, handed to aoti_torch_mps_create_shader_library at runtime -- | ||
| // the same shape PyTorch's AOTInductor MPS backend emits. |
There was a problem hiding this comment.
@janeyx99 can you confirm this is the official way to support MPS through the stable ABI?
This PR stands up the stable-ABI build path and ports nms (CPU, CUDA, quantized-CPU, MPS) as the first op. The intent is to validate the harness/per-backend patterns on nms.
Structure: Dual Extension
torchvision._C: the existing extension, now built from all not-yet-ported sources.torchvision._C_stable: the new extension, built only from migrated sources (currently nms).Why two extensions?
For a per op rollout a separate compilation unit helps to scope the pin and keep the tree buildable at every step of an incremental port (torchcodec migrated in place because it converted everything in one shot). Once all ops are stable remove legacy unstable compilation.
Layout Changes
setup.py: splits sources between the two extensions.STABLE_SOURCES + _stable()/_not_stable()helpers split the source globs, so migrated files compile into _C_stable and everything else stays in _C.make_C_stable_extension(): builds the new target with the -DTORCH_TARGET_VERSION pin.ensure_hipified()hoisted to a single idempotent step both extensions call (runs once/with no build-order dependency between them).Registration: schema declaration/per-backend kernel binding.StableABICompat.h: shared compat header one place for ATen ops not yet wrapped in stable ops.h (placeholder until upstream.Missing Pytorch NMS MPS Stable-ABI Port Support
MPS scalar arguments:
The nms schema takes a float iou_threshold, which the Metal kernel needs as a scalar argument (constant float & iou_threshold [[buffer(3)]]). With no scalar-arg setter in the shim, that float is currently carried as a 1-element float32 tensor bound via set_arg_tensor. shim_mps.h exposes only aoti_torch_mps_set_arg_tensor and aoti_torch_mps_set_arg_int
Missing:
TODO: Workaround (for now) packs iou_threshold into a 1-element float32 tensor (new_empty + fill_) and bind it at buffer(3) via set_arg_tensor. The kernel reads the same 4 bytes as constant float&, so it's behavior-identical. File PR for missing mps_set_args.
Metal source isolation.
Made nms Metal source into a standalone nms_metal_shader.h. The shared mps_kernels.h can't be used under the pin because it pulls in ATen/native/mps/OperationUtils.h and instantiates at::native::mps::MetalShaderLibrary. Derives from same isolation rationale as the dual extension (legacy _C keeps using the full mps_kernels.h)
Stable-ABI audit (
torch-abi-audit)Ran
torch-abi-auditagainst the builttorchvisionpackage to verify the migrated extension stays on the stable surface and never reaches intoat::/c10::/torch::jit::internals._C_stable.soaudits asSTABLE— 67 stable-shim symbols, 0 unstable. The legacy_C.soand theunrelated
image.sostayUNSTABLE, which is expected: onlynmshas migrated so far, so thepackage-level verdict remains
UNSTABLEuntil the remaining ops move into_C_stable(samemid-migration shape as torchcodec).