fix: propagate CuTe DSL runtime link requirements for static libraries by francismelon · Pull Request #103 · NVIDIA/TensorRT-Edge-LLM

francismelon · 2026-06-07T09:17:16Z

What does this PR do?

Type of change: Bug fix

Overview:
This PR fixes CuTe DSL link propagation for static library targets.

Before this change, cute_dsl_setup() did not propagate the full set of CuTe DSL runtime link requirements when the target type was STATIC_LIBRARY. As a result, downstream executables could fail at the final link step because they did not inherit every required link dependency and link option.

This change updates the static-library path to propagate:

- the CuTe DSL static archive
- the CuTe DSL cudart shim
- the CUDA driver library
- the _cudaLaunchKernelEx wrap linker option for CUDA < 12.8

The existing private-link behavior for non-static targets is preserved.
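
The static versus non-static behavior described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual contents of cmake/CuteDsl.cmake. The function name example_cute_dsl_link, the variables CUTE_DSL_ARCHIVE and CUTE_DSL_CUDART_SHIM, and the exact spelling of the wrap option are assumptions made for the example. CUDA::cuda_driver is the imported target provided by CMake's FindCUDAToolkit module.

```cmake
# Hypothetical sketch only; names below are illustrative assumptions.
# Assumes find_package(CUDAToolkit) has already run, and that the caller has
# set CUTE_DSL_ARCHIVE, CUTE_DSL_CUDART_SHIM and CUDA_CTK_VERSION.
function(example_cute_dsl_link target)
    get_target_property(target_type ${target} TYPE)

    # A static library has no final link step of its own, so its link
    # requirements must be visible to whichever target eventually links it.
    # Other target types keep the existing private-link behavior.
    if(target_type STREQUAL "STATIC_LIBRARY")
        set(visibility PUBLIC)
    else()
        set(visibility PRIVATE)
    endif()

    target_link_libraries(${target} ${visibility}
        ${CUTE_DSL_ARCHIVE}       # CuTe DSL static archive
        ${CUTE_DSL_CUDART_SHIM}   # CuTe DSL cudart shim
        CUDA::cuda_driver)        # CUDA driver library

    # The wrap option is only needed for CUDA toolkits older than 12.8.
    # The option spelling here is an assumption based on GNU ld's --wrap.
    if(CUDA_CTK_VERSION VERSION_LESS "12.8")
        target_link_options(${target} ${visibility}
            "LINKER:--wrap=_cudaLaunchKernelEx")
    endif()
endfunction()
```

Selecting the visibility keyword once keeps the dependency list identical for both paths, so only the propagation differs between static and non-static targets.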

Usage

This is a build-system fix only. No user-facing API or runtime usage change is introduced.

🚀 Pull Request Checklist

Thank you for contributing to TensorRT Edge-LLM! Before we review your pull request, please make sure the following items are complete.
Please also refer to Contributor guidelines for general guidelines.

✅ Pre-commit Checks

I have installed pre-commit by running pip install pre-commit.
I have installed the hooks with pre-commit install.
I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

🧪 Tests

Tests have been added or updated as needed.
All tests are passing.

📄 Documentation

Updated any necessary documentation

⚙️ Compatibility

The change is backward compatible

Additional Information

Reproduction environment

The issue was reproduced during cross-build configuration for Jetson Orin with:

-DCMAKE_BUILD_TYPE=Release
-DTRT_PACKAGE_DIR=/usr
-DCMAKE_TOOLCHAIN_FILE=cmake/aarch64_linux_toolchain.cmake
-DEMBEDDED_TARGET=jetson-orin
-DCUDA_CTK_VERSION=12.6
-DENABLE_CUTE_DSL=ALL

Failure before this fix

Before this change, the build failed at the final executable link step for llm_inference with unresolved CuTe DSL CUDA runtime symbols:

/usr/bin/ld: ../../../cpp/kernels/cuteDSLArtifact/aarch64/sm_87/libcutedsl_aarch64.a(CudaDialectRuntime.c.o): in function `_cudaLibraryLoadData':
(.text+0x0): undefined reference to `cudaLibraryLoadData'
/usr/bin/ld: ../../../cpp/kernels/cuteDSLArtifact/aarch64/sm_87/libcutedsl_aarch64.a(CudaDialectRuntime.c.o): in function `_cudaLibraryUnload':
(.text+0x8): undefined reference to `cudaLibraryUnload'
/usr/bin/ld: ../../../cpp/kernels/cuteDSLArtifact/aarch64/sm_87/libcutedsl_aarch64.a(CudaDialectRuntime.c.o): in function `_cudaLibraryGetKernel':
(.text+0x10): undefined reference to `cudaLibraryGetKernel'
/usr/bin/ld: ../../../cpp/kernels/cuteDSLArtifact/aarch64/sm_87/libcutedsl_aarch64.a(CudaDialectRuntime.c.o): in function `_cudaKernelSetAttributeForDevice':
(.text+0x20): undefined reference to `cudaKernelSetAttributeForDevice'
collect2: error: ld returned 1 exit status
make[2]: *** [examples/llm/CMakeFiles/llm_inference.dir/build.make:131: examples/llm/llm_inference] Error 1
make[1]: *** [CMakeFiles/Makefile2:389: examples/llm/CMakeFiles/llm_inference.dir/all] Error 2
make: *** [Makefile:91: all] Error 2

Validation after this fix

Validation performed locally:

pre-commit run --files cmake/CuteDsl.cmake
Configure succeeded with:
- cmake .. -DCMAKE_BUILD_TYPE=Release -DTRT_PACKAGE_DIR=/usr -DCMAKE_TOOLCHAIN_FILE=cmake/aarch64_linux_toolchain.cmake -DEMBEDDED_TARGET=jetson-orin -DCUDA_CTK_VERSION=12.6 -DENABLE_CUTE_DSL=ALL
Build succeeded
Runtime binary startup validation succeeded

Relevant successful outputs after this fix included:

[100%] Linking CXX shared library ../libNvInfer_edgellm_plugin.so
[100%] Built target NvInfer_edgellm_plugin

./examples/llm/llm_inference --help
Usage: ./examples/llm/llm_inference [--help] [--engineDir=<path to engine directory>] [--multimodalEngineDir=<path to multimodal engine directory>] [--inputFile=<path to input file>] [--outputFile=<path to output file>] [--dumpProfile] [--profileOutputFile=<path to profile output file>] [--warmup=<number>] [--debug] [--dumpOutput] [--batchSize=<number>] [--maxGenerateLength=<number>] [--specDecode] [--specDraftTopK=<number>] [--specDraftStep=<number>] [--specVerifySize=<number>]

Risk

Low. The change only adjusts link visibility and propagation for CuTe DSL-related dependencies and link options for static-library targets, while preserving the existing behavior for non-static targets.

Signed-off-by: francismelon <qq1650190803@gmail.com>

Commit 73919fa: fix: propagate CuTe DSL runtime link requirements for static libraries

francismelon requested a review from a team June 7, 2026 09:17

francismelon wants to merge 1 commit into NVIDIA:main from francismelon:fix/cutedsl-static-link-propagation
