[Bug] 17 dynamo test failures across all GPUs on CUDA 13.x — SDP use_fp32_acc, BERT accuracy, AutomaticPlugin, refit #4154

## Bug Description

17 tests fail across the dynamo test suites on all tested GPU architectures and CUDA 13.x versions. The failures group into five distinct categories, suggesting multiple independent regressions.

## Environment

- **GPUs:** RTX 3070, B100-TS2, H100, A100
- **Arch:** x86_64
- **CUDA:** 13.2.0 / 13.1.1 / 13.0.2
- **OS:** Ubuntu 24.04
- **cuDNN:** 8.9.7.29
- **TensorRT:** 10.16.0.59
- **Myelin:** 2.17.78+7
- **CASK:** 5.16.17+1
- **Python:** 3.12
- **Package:** qa_tar_py3.12

## Failure Categories

### 1. `scaled_dot_product_attention` — unexpected keyword argument `use_fp32_acc` (8 tests)

```
FAILED lowering/test_decompositions.py::TestLowering::test_lowering_scaled_dot_product_attention_0
FAILED lowering/test_decompositions.py::TestLowering::test_lowering_scaled_dot_product_attention_1
FAILED lowering/test_decompositions.py::TestLowering::test_lowering_scaled_dot_product_cudnn_attention_0
FAILED lowering/test_decompositions.py::TestLowering::test_lowering_scaled_dot_product_cudnn_attention_1
FAILED lowering/test_decompositions.py::TestLowering::test_lowering_scaled_dot_product_efficient_attention_0
FAILED lowering/test_decompositions.py::TestLowering::test_lowering_scaled_dot_product_efficient_attention_1
FAILED lowering/test_decompositions.py::TestLowering::test_lowering_scaled_dot_product_flash_attention_0
FAILED lowering/test_decompositions.py::TestLowering::test_lowering_scaled_dot_product_flash_attention_1
```

**Error:** `TypeError: scaled_dot_product_attention() got an unexpected keyword argument 'use_fp32_acc'`

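The tests pass `use_fp32_acc` to a `scaled_dot_product_attention` implementation whose signature does not accept it. The kwarg may have been removed or renamed in the PyTorch build used with CUDA 13.x.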
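
One way to confirm this kind of signature mismatch is to inspect the callable before passing the kwarg. The sketch below is illustrative only: `sdpa_stand_in` is a hypothetical stand-in, not the real function, and `accepts_kwarg` and `call_with_supported_kwargs` are helper names invented for this example.

```python
import inspect


def accepts_kwarg(func, name):
    """Return True if `func` can be called with keyword argument `name`."""
    for param in inspect.signature(func).parameters.values():
        # A **kwargs parameter accepts any keyword.
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            return True
        if param.name == name and param.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False


# Hypothetical stand-in without a `use_fp32_acc` parameter (assumption:
# this only mimics the shape of the failing call, not the real function).
def sdpa_stand_in(query, key, value, attn_mask=None, is_causal=False, scale=None):
    return query


def call_with_supported_kwargs(func, *args, **kwargs):
    """Call `func`, dropping keyword arguments its signature does not accept."""
    supported = {k: v for k, v in kwargs.items() if accepts_kwarg(func, k)}
    return func(*args, **supported)
```

Calling `sdpa_stand_in(q, k, v, use_fp32_acc=True)` directly raises the same kind of `TypeError` as in the report, while `accepts_kwarg(sdpa_stand_in, "use_fp32_acc")` returns `False`. Note that `inspect.signature` can raise `ValueError` for callables implemented in C, so this check is mainly useful for Python-level wrappers.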

### 2. `scaled_dot_product_attention` with dynamic shape — output mismatch (1 test)

```
FAILED lowering/test_decompositions.py::TestLowering::test_lowering_scaled_dot_product_attention_with_dynamic_shape_0
```

**Error:** `AssertionError: Scaled_dot_product_attention_with_dynamic_shape TRT outputs don't match with the original model.`

### 3. BERT base-uncased — accuracy regression and dtype issue (5 tests)

```
FAILED models/test_models.py::test_bert_base_uncased[dtype0] - Cosine sim: 0.5738 (threshold: 0.99)
FAILED models/test_models.py::test_bert_base_uncased[dtype1] - TypeError: Unsupported numpy dtype
FAILED models/test_models.py::test_bert_base_uncased[dtype2] - Cosine sim: 0.5090 (threshold: 0.99)
FAILED models/test_models.py::test_bert_base_uncased_cpu_offload - Cosine sim: 0.4028 (threshold: 0.99)
FAILED models/test_models_export.py::test_bert_base_uncased - Cosine sim: 0.4203 (threshold: 0.99)
```

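Four of the five tests report cosine similarities between 0.40 and 0.57 against a 0.99 threshold, indicating significant numerical divergence in the compiled BERT model. The `dtype1` variant fails differently, with `TypeError: Unsupported numpy dtype`, before any accuracy comparison is reported.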
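
For reference, the accuracy check behind these numbers can be sketched as below. This is a minimal pure-Python version that assumes the outputs have already been flattened to lists of floats. `cosine_similarity` and `outputs_match` are names chosen for this example, and the `>=` comparison is an assumption about how the threshold is applied.

```python
import math

COSINE_THRESHOLD = 0.99  # threshold quoted in the failure messages


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def outputs_match(reference, candidate, threshold=COSINE_THRESHOLD):
    """Return True if the two outputs are at least `threshold` similar."""
    return cosine_similarity(reference, candidate) >= threshold
```

A similarity near 0.5 corresponds to an angle of roughly 60 degrees between the two output vectors, which is far outside what rounding differences between precisions would normally produce.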

### 4. AutomaticPlugin — array conversion error (2 tests)

```
FAILED automatic_plugin/test_automatic_plugin_with_attrs.py::TestAutomaticPlugin::test_scale_mul_plugin_float_0
FAILED automatic_plugin/test_automatic_plugin_with_attrs.py::TestAutomaticPlugin::test_scale_mul_plugin_float_1
```

**Error:** `TypeError: only 0-dimensional arrays can be converted to Python scalars`

This message appears to come from NumPy, which raises it when an array with one or more dimensions is converted with `float()` or `int()`. The NumPy version in the test environment is therefore worth recording alongside the versions listed above.

### 5. Refit cumsum fallback — missing PyTorch segment (1 test)

```
FAILED models/test_model_refit.py::test_refit_cumsum_fallback
```

**Error:** `AssertionError: False is not true : test_refit_cumsum_fallback test found 0 pytorch segments but expected 1`

## Reproducible Configurations

| GPU | CUDA | Test Suites Affected |
|-----|------|---------------------|
| RTX 3070/x86_64 | r13.0.2, r13.1.1, r13.2.0 | backend, conversion, models, partitioning, runtime |
| A100/x86_64 | r13.0.2, r13.1.1, r13.2.0 | backend, conversion, models, partitioning, runtime |
| H100/x86_64 | r13.1.1, r13.2.0 | runtime |
| B100-TS2/x86_64 | r13.0.2, r13.1.1, r13.2.0 | runtime |

## Overall Test Results

```
17 failed, 2317 passed, 22 skipped, 2 xpassed, 4088 warnings in 3644.85s (1:00:44)
```
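
To cross-check the total against the per-category counts, the `FAILED` lines from the pytest report can be tallied per test file. The sketch below assumes the report text uses the same `FAILED path::test` format as the excerpts in this issue. `count_failures_by_file` is a helper name invented for this example.

```python
import re
from collections import Counter

# Matches lines such as:
#   FAILED models/test_models.py::test_bert_base_uncased[dtype0] - Cosine sim: ...
FAILED_LINE = re.compile(r"^FAILED\s+(?P<file>[^:\s]+)::(?P<test>\S+)")


def count_failures_by_file(report_text):
    """Count FAILED entries per test file in a pytest short summary."""
    counts = Counter()
    for line in report_text.splitlines():
        match = FAILED_LINE.match(line.strip())
        if match:
            counts[match.group("file")] += 1
    return counts


# Three lines copied from the excerpts above, used only as sample input.
SAMPLE = """\
FAILED models/test_models.py::test_bert_base_uncased[dtype0] - Cosine sim: 0.5738 (threshold: 0.99)
FAILED models/test_models.py::test_bert_base_uncased[dtype1] - TypeError: Unsupported numpy dtype
FAILED models/test_model_refit.py::test_refit_cumsum_fallback
"""

sample_counts = count_failures_by_file(SAMPLE)
```

Applied to the full report, the per-file counts should sum to 17, matching the category totals of 8, 1, 5, 2 and 1.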

## Steps to Reproduce

1. Run on any of the listed GPUs with CUDA 13.x and the environment above
2. Execute the dynamo test suites:
   ```
   pytest lowering/test_decompositions.py
   pytest models/test_models.py
   pytest models/test_models_export.py
   pytest models/test_model_refit.py
   pytest automatic_plugin/test_automatic_plugin_with_attrs.py
   ```
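
The commands above can also be driven from a single script, which makes it easier to rerun one category at a time. This is a sketch under assumptions: it expects to be run from the directory containing the dynamo test suites, and `build_pytest_command` and `run_tests` are names chosen for this example. `-rf` and `-k` are standard pytest options for the failure summary and keyword filtering.

```python
import subprocess
import sys

# Test files listed in the steps above, relative to the dynamo test directory.
TEST_FILES = [
    "lowering/test_decompositions.py",
    "models/test_models.py",
    "models/test_models_export.py",
    "models/test_model_refit.py",
    "automatic_plugin/test_automatic_plugin_with_attrs.py",
]


def build_pytest_command(test_files, keyword=None):
    """Build a pytest invocation that prints a short summary of failures."""
    command = [sys.executable, "-m", "pytest", "-rf", *test_files]
    if keyword:
        # -k restricts the run to tests whose names match the expression.
        command += ["-k", keyword]
    return command


def run_tests(test_files, keyword=None):
    """Run pytest and return its exit code (non-zero when tests fail)."""
    return subprocess.run(build_pytest_command(test_files, keyword), check=False).returncode
```

For example, `run_tests(TEST_FILES[:1], keyword="scaled_dot_product")` would rerun only the attention lowering tests from the first two categories.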
