.. _python_runtime:

Python vs C++ runtime
=====================

Torch-TensorRT uses a single module type, :class:`~torch_tensorrt.runtime.TorchTensorRTModule`,
to run TensorRT engines inside PyTorch. The **execution path** (which code actually drives
``execute_async``) is selected at runtime:

* **C++ path** — ``torch.classes.tensorrt.Engine`` and ``torch.ops.tensorrt.execute_engine``.
  Preferred for production when the Torch-TensorRT C++ extension is available: TorchScript-friendly,
  and integrates with the full C++ runtime stack.
* **Python path** — the internal ``PythonTRTEngine`` plus
  ``torch.ops.tensorrt.execute_engine_python``. Useful when the C++ extension is absent, or when
  you want easier Python-level debugging and instrumentation.

:class:`~torch_tensorrt.runtime.PythonTorchTensorRTModule` is a **thin subclass** of
``TorchTensorRTModule`` that **pins** the Python path (same constructor and behavior, but always
resolves to the Python engine). Prefer ``TorchTensorRTModule`` plus the global backend APIs below
when you do not need that pin.

----

When to use the Python path
---------------------------

Use :func:`~torch_tensorrt.runtime.set_runtime_backend` (typically as a context manager) when:

* The C++ Torch-TensorRT library is not installed (e.g. a minimal environment with only the Python pieces).
* You want Python-level hooks (e.g. :ref:`observer`) without relying on the C++ extension.
* You are debugging conversion or execution and want to break inside the Python TRT wrapper.

Prefer the C++ path when:

* You rely on the default Torch-TensorRT deployment story and maximum parity with TorchScript export.
* You use whole-graph CUDAGraph wrappers that assume the C++ runtime (see :ref:`cuda_graphs`).
3337
3438----
3539
Enabling the Python path
------------------------

**Process-wide default (context manager)**

.. code-block:: python

    import torch_tensorrt as tt

    with tt.runtime.set_runtime_backend("python"):
        trt_gm = tt.dynamo.compile(exported_program, arg_inputs=inputs)

**Via torch.compile** (same context manager around compile / first run)

.. code-block:: python

    import torch
    import torch_tensorrt as tt

    with tt.runtime.set_runtime_backend("python"):
        trt_model = torch.compile(model, backend="tensorrt", options={})

The context manager does **not** replace :class:`~torch_tensorrt.runtime.PythonTorchTensorRTModule`,
which always requests the Python path via a class-level pin.

----

Serialization
-------------

Module state records which backend was used (``runtime_backend`` in packed metadata). After load,
``TorchTensorRTModule`` reconstructs either the C++ engine or the Python engine wrapper
as appropriate. Some **export** workflows (e.g. certain ``ExportedProgram`` save paths) may still
assume a C++-only graph; validate your deployment path if you mix Python execution with AOT export.
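
The round trip can be sketched as follows. The ``runtime_backend`` field name comes from this
document; the helper names and dict layout are illustrative assumptions:

```python
def pack_metadata(backend: str) -> dict:
    # At save time: record which backend the module was using.
    return {"runtime_backend": backend}

def engine_class_for(metadata: dict) -> str:
    # At load time: pick the matching engine wrapper.
    if metadata.get("runtime_backend") == "python":
        return "PythonTRTEngine"            # Python engine wrapper
    return "torch.classes.tensorrt.Engine"  # C++ TorchBind engine
```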

----

Limitations
-----------

* **C++ deployment**: A module that executed on the Python path still needs TensorRT and the
  Torch-TensorRT Python pieces available in-process unless you recompile targeting the C++ path.
* **CUDAGraphs**: Whole-graph CUDAGraph wrappers may assume the C++ runtime for some configurations;
  see :ref:`cuda_graphs`.
* **Explicit allocator engines**: Engines with data-dependent outputs may set
  ``requires_output_allocator=True``; the unified module supports the output-allocator execution
  mode on the Python path. See :ref:`cuda_graphs` for interaction with CUDA graphs.
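
One plausible way to reconcile the last two points is a check at execution time. This sketch
assumes output-allocator execution and CUDA graph capture are mutually exclusive; verify that
assumption against :ref:`cuda_graphs` for your version.

```python
def choose_execution_mode(requires_output_allocator: bool,
                          cudagraphs_enabled: bool) -> str:
    """Illustrative sketch: pick how the engine is run."""
    if requires_output_allocator:
        # Data-dependent output shapes need TRT's output allocator, assumed
        # here to be incompatible with CUDA graph capture.
        if cudagraphs_enabled:
            raise RuntimeError(
                "output-allocator engines are incompatible with CUDA graphs"
            )
        return "output_allocator"
    return "standard"
```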

----

``PythonTorchTensorRTModule`` direct instantiation
--------------------------------------------------

You can instantiate :class:`~torch_tensorrt.runtime.PythonTorchTensorRTModule` from raw engine bytes
when you need a **guaranteed** Python execution path (e.g. integrating an engine built outside
Torch-TensorRT):

.. code-block:: python

    import torch

    from torch_tensorrt.dynamo.runtime import PythonTorchTensorRTModule
    from torch_tensorrt.dynamo._settings import CompilationSettings

    # Load raw engine bytes (e.g. from trtexec or torch_tensorrt.dynamo.convert_*)
    with open("model.engine", "rb") as f:
        engine_bytes = f.read()

    # Binding names are placeholders; they must match the engine's I/O tensors.
    module = PythonTorchTensorRTModule(
        serialized_engine=engine_bytes,
        input_binding_names=["input_0"],
        output_binding_names=["output_0"],
        name="external_engine",
        settings=CompilationSettings(),
    )

    output = module(torch.randn(1, 3, 224, 224).cuda())
**Constructor arguments** (same as ``TorchTensorRTModule``):

``serialized_engine`` (``bytes``)
    Raw serialized TRT engine.

``input_binding_names`` / ``output_binding_names`` (``List[str]``)
    Binding names in ``forward`` order.

``name`` (``str``, optional)
    Name for logging and serialization.

``settings`` (:class:`~torch_tensorrt.dynamo._settings.CompilationSettings`, optional)
    Device and runtime options (must match how the engine was built).

``weight_name_map`` (``dict``, optional)
    For refit workflows; see :func:`~torch_tensorrt.dynamo.refit_module_weights`.

``requires_output_allocator`` (``bool``, default ``False``)
    Set ``True`` for data-dependent-shape ops (``nonzero``, ``unique``, etc.) that need TRT's
    output allocator.

----

Runtime selection summary
-------------------------

* :func:`~torch_tensorrt.runtime.get_runtime_backend` / :func:`~torch_tensorrt.runtime.set_runtime_backend`
  — process default for newly created ``TorchTensorRTModule`` instances (unless a subclass pins a backend).
  Use ``set_runtime_backend`` as a context manager to scope C++ vs Python for compile and forward.
* If the C++ extension is **not** built, only the Python path is available.