Bug Description
17 tests fail across the dynamo test suites on all tested GPU architectures and CUDA 13.x versions. The failures group into five distinct categories, suggesting multiple independent regressions.
Environment
- GPUs: RTX 3070, B100-TS2, H100, A100
- Arch: x86_64
- CUDA: 13.2.0 / 13.1.1 / 13.0.2
- OS: Ubuntu 24.04
- cuDNN: 8.9.7.29
- TensorRT: 10.16.0.59
- Myelin: 2.17.78+7
- CASK: 5.16.17+1
- Python: 3.12
- Package: qa_tar_py3.12
Failure Categories
1. scaled_dot_product_attention — unexpected keyword argument use_fp32_acc (8 tests)
FAILED lowering/test_decompositions.py::TestLowering::test_lowering_scaled_dot_product_attention_0
FAILED lowering/test_decompositions.py::TestLowering::test_lowering_scaled_dot_product_attention_1
FAILED lowering/test_decompositions.py::TestLowering::test_lowering_scaled_dot_product_cudnn_attention_0
FAILED lowering/test_decompositions.py::TestLowering::test_lowering_scaled_dot_product_cudnn_attention_1
FAILED lowering/test_decompositions.py::TestLowering::test_lowering_scaled_dot_product_efficient_attention_0
FAILED lowering/test_decompositions.py::TestLowering::test_lowering_scaled_dot_product_efficient_attention_1
FAILED lowering/test_decompositions.py::TestLowering::test_lowering_scaled_dot_product_flash_attention_0
FAILED lowering/test_decompositions.py::TestLowering::test_lowering_scaled_dot_product_flash_attention_1
Error: TypeError: scaled_dot_product_attention() got an unexpected keyword argument 'use_fp32_acc'
The use_fp32_acc kwarg may have been removed or renamed in the current PyTorch/CUDA version.
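To confirm whether the kwarg is really gone from the installed build, the callable's signature can be inspected directly. The following is a minimal sketch using only the standard library; `sdpa_old` and `sdpa_new` are hypothetical stand-ins for illustration, not the real function signatures, and should be replaced with the actual callable the tests invoke.

```python
import inspect


def accepts_kwarg(func, name):
    """Return True if `func` can be called with keyword argument `name`."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    if param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs catch-all accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins (assumptions for illustration only).
def sdpa_old(query, key, value, *, use_fp32_acc=False):
    return query


def sdpa_new(query, key, value):
    return query


print(accepts_kwarg(sdpa_old, "use_fp32_acc"))
print(accepts_kwarg(sdpa_new, "use_fp32_acc"))
```

If the check returns False for the real callable, the failure is a signature mismatch between the test suite and the installed package rather than a numerical problem.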
2. scaled_dot_product_attention with dynamic shape — output mismatch (1 test)
FAILED lowering/test_decompositions.py::TestLowering::test_lowering_scaled_dot_product_attention_with_dynamic_shape_0
Error: AssertionError: Scaled_dot_product_attention_with_dynamic_shape TRT outputs don't match with the original model.
3. BERT base-uncased — accuracy regression and dtype issue (5 tests)
FAILED models/test_models.py::test_bert_base_uncased[dtype0] - Cosine sim: 0.5738 (threshold: 0.99)
FAILED models/test_models.py::test_bert_base_uncased[dtype1] - TypeError: Unsupported numpy dtype
FAILED models/test_models.py::test_bert_base_uncased[dtype2] - Cosine sim: 0.5090 (threshold: 0.99)
FAILED models/test_models.py::test_bert_base_uncased_cpu_offload - Cosine sim: 0.4028 (threshold: 0.99)
FAILED models/test_models_export.py::test_bert_base_uncased - Cosine sim: 0.4203 (threshold: 0.99)
Cosine similarity scores are far below the 0.99 threshold, indicating significant numerical divergence in the compiled BERT model.
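For reference, a cosine-similarity check of this kind can be sketched as below. This is an assumption about the general shape of the comparison, not the test suite's actual helper; the function names are hypothetical, and only the 0.99 threshold is taken from the failure output above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def outputs_match(reference, compiled, threshold=0.99):
    """Return True if the flattened outputs agree to within the threshold."""
    return cosine_similarity(reference, compiled) >= threshold


# Identical outputs pass; orthogonal outputs fail.
print(outputs_match([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(outputs_match([1.0, 0.0], [0.0, 1.0]))
```

Scores in the 0.40 to 0.57 range mean the compiled outputs point in a substantially different direction from the reference, which is more than accumulated rounding error would normally produce.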
4. AutomaticPlugin — array conversion error (2 tests)
FAILED automatic_plugin/test_automatic_plugin_with_attrs.py::TestAutomaticPlugin::test_scale_mul_plugin_float_0
FAILED automatic_plugin/test_automatic_plugin_with_attrs.py::TestAutomaticPlugin::test_scale_mul_plugin_float_1
Error: TypeError: only 0-dimensional arrays can be converted to Python scalars
5. Refit cumsum fallback — missing PyTorch segment (1 test)
FAILED models/test_model_refit.py::test_refit_cumsum_fallback
Error: AssertionError: False is not true : test_refit_cumsum_fallback test found 0 pytorch segments but expected 1
Reproducible Configurations
| GPU | CUDA | Test Suites Affected |
| RTX 3070/x86_64 | r13.0.2, r13.1.1, r13.2.0 | backend, conversion, models, partitioning, runtime |
| A100/x86_64 | r13.0.2, r13.1.1, r13.2.0 | backend, conversion, models, partitioning, runtime |
| H100/x86_64 | r13.1.1, r13.2.0 | runtime |
| B100-TS2/x86_64 | r13.0.2, r13.1.1, r13.2.0 | runtime |
Overall Test Results
17 failed, 2317 passed, 22 skipped, 2 xpassed, 4088 warnings in 3644.85s (1:00:44)
Steps to Reproduce
- Run on any of the listed GPUs with CUDA 13.x and the environment above
- Execute the dynamo test suites:
pytest lowering/test_decompositions.py
pytest models/test_models.py
pytest models/test_models_export.py
pytest models/test_model_refit.py
pytest automatic_plugin/test_automatic_plugin_with_attrs.py