
feat: add runtime cache API for TensorRT-RTX #4180

Open
tp5uiuc wants to merge 3 commits into pytorch:main from
tp5uiuc:feat/runtime-cache-rtx

Conversation

Contributor

@tp5uiuc tp5uiuc commented Apr 10, 2026

Description

Add runtime cache support for TensorRT-RTX JIT compilation results, replacing the timing cache, which is not used by RTX (no autotuning).

TensorRT-RTX uses JIT compilation at inference time. The runtime cache (IRuntimeCache) stores these compilation results so that kernels and execution graphs are not recompiled on subsequent runs. This is analogous to the timing cache but operates at inference time rather than build time.

Fixes #3817

Changes

  • Skip timing cache for RTX: Early return in _create_timing_cache() and _save_timing_cache() when ENABLED_FEATURES.tensorrt_rtx is True (timing cache is a no-op in TRT-RTX)
  • Add runtime_cache_path setting: New RUNTIME_CACHE_PATH default and runtime_cache_path field in CompilationSettings, threaded through all compile functions
  • Wire up IRuntimeCache in PythonTorchTensorRTModule: Create RuntimeConfig with runtime cache on engine setup, load from disk if available, save on module destruction
  • File locking: Uses filelock for concurrent access safety when multiple processes share the same cache file
  • Documentation: Updated docstrings, compilation settings RST, and engine cache tutorial with new "Runtime Cache (TensorRT-RTX)" section
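The load-if-available / save-on-teardown flow described above can be sketched in plain Python. This is a hypothetical stdlib-only sketch (helper names are invented); the actual PR serializes TensorRT's IRuntimeCache and uses `filelock` for cross-process safety, which this sketch approximates with an atomic rename:

```python
import os
import tempfile

def load_runtime_cache(path):
    """Return serialized cache bytes if present, else None (first run, cold JIT)."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        return None

def save_runtime_cache(path, blob):
    """Write cache bytes via a temp file + rename so a concurrent
    reader never observes a partially written cache file."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(blob)
        os.replace(tmp, path)  # atomic rename within the same filesystem
    except BaseException:
        os.unlink(tmp)
        raise
```

On the first run the load returns None and the engine JIT-compiles from scratch; on later runs the serialized cache is handed back to the runtime so kernels and execution graphs are reused.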

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that relevant reviewers are notified

@meta-cla meta-cla bot added the cla signed label Apr 10, 2026
@tp5uiuc tp5uiuc marked this pull request as draft April 10, 2026 20:18
@github-actions github-actions bot added the documentation, component: tests, component: conversion, component: core, component: api [Python], component: runtime, and component: dynamo labels Apr 10, 2026
@github-actions github-actions bot requested a review from cehongwang April 10, 2026 20:18
@github-actions github-actions bot added the component: build system Issues re: Build system label Apr 10, 2026
@tp5uiuc tp5uiuc marked this pull request as ready for review April 10, 2026 20:58
@cehongwang cehongwang requested review from lanluo-nvidia and removed request for cehongwang April 11, 2026 00:17
dryrun: bool = _defaults.DRYRUN,
hardware_compatible: bool = _defaults.HARDWARE_COMPATIBLE,
timing_cache_path: str = _defaults.TIMING_CACHE_PATH,
runtime_cache_path: str = _defaults.RUNTIME_CACHE_PATH,
Contributor Author

Runtime cache is a JIT-time API: it may not make much sense for cross_compile_for_windows and convert_exported_program_to_serialized_trt_engine. I have added it to the interface as a common API for entry points into Torch-TRT, but I can add it to unsupported_settings.

Collaborator

Agreed, it doesn't make sense for those AOT entry points since the cache is JIT-time only.
Let's add it to unsupported_settings for now; even if we want this feature in the future, we can add it back.
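The drop-and-warn behavior agreed on here could look roughly like the following. This is a hypothetical sketch (the list name and helper are invented, not the PR's actual code), showing a JIT-only option being stripped from an AOT entry point's options so the dataclass default fills in:

```python
import warnings

# Hypothetical list of settings that are JIT-time only and
# meaningless for AOT entry points such as cross_compile_for_windows.
UNSUPPORTED_SETTINGS = ("runtime_cache_path",)

def filter_compilation_options(options):
    """Return a copy of options with unsupported keys removed,
    emitting a warning for each key that was dropped."""
    filtered = dict(options)
    for key in UNSUPPORTED_SETTINGS:
        if key in filtered:
            warnings.warn(
                f"{key} is a JIT-time setting and is ignored by this "
                "entry point; the dataclass default will be used instead."
            )
            del filtered[key]
    return filtered
```

Dropping the key (rather than rejecting it) keeps a single common signature across entry points while making the mismatch visible to the user.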

Contributor Author


Great, thanks for the feedback Lan 🙏

Contributor Author


Added in 3893fa4, which emits a warning.

Collaborator

@lanluo-nvidia lanluo-nvidia left a comment


LGTM

Add runtime cache support for TensorRT-RTX JIT compilation results,
replacing the timing cache, which is not used by RTX (no autotuning).

Changes:
- Skip timing cache creation/saving for TensorRT-RTX in _TRTInterpreter
- Add RUNTIME_CACHE_PATH default and runtime_cache_path setting
- Wire up IRuntimeCache in PythonTorchTensorRTModule (setup, load, save)
- Persist runtime cache to disk with filelock for concurrent access safety
- Thread runtime_cache_path through all compile functions
- Add unit tests (12 tests) and E2E model tests (6 tests)
- Update docstrings and RST documentation

Fixes pytorch#3817

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tp5uiuc and others added 2 commits April 15, 2026 11:50
Version provided by upstream torch; no pin needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
runtime_cache_path is a JIT-time API for TensorRT-RTX that only applies
at inference time via PythonTorchTensorRTModule. Remove it from
compilation_options in cross_compile_for_windows and
convert_exported_program_to_serialized_trt_engine (with a warning),
letting the dataclass default fill in harmlessly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tp5uiuc tp5uiuc force-pushed the feat/runtime-cache-rtx branch from 3893fa4 to 3be6032 Compare April 15, 2026 18:51

Labels

  • backend: TensorRT-RTX
  • cla signed
  • component: api [Python] Issues re: Python API
  • component: build system Issues re: Build system
  • component: conversion Issues re: Conversion stage
  • component: core Issues re: The core compiler
  • component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths
  • component: runtime
  • component: tests Issues re: Tests
  • documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

🐛 [Bug] TensorRT-RTX: need to remove timing cache

3 participants