
[Bug] MutableTorchTRTModule refit flag stuck at NEEDS_REFIT, never transitions to LIVE on B100/H100 with CUDA 13.x #4153

@apbose

Bug Description

test_resnet18_modify_attribute_no_refit fails because update_refit_condition() does not transition the RefitFlag from NEEDS_REFIT (2) to LIVE (4). The failure reproduces on every tested CUDA 13.x configuration (13.0.2, 13.1.1, and 13.2.0 on B100-TS2; 13.1.1 and 13.2.0 on H100) and surfaces through multiple dynamo test suites (backend, conversion, models, partitioning).

Environment

  • GPUs: B100-TS2, H100
  • Arch: x86_64
  • CUDA: 13.2.0 / 13.1.1 / 13.0.2
  • OS: Ubuntu 24.04
  • cuDNN: 8.9.7.29
  • TensorRT: 10.16.0.59
  • Myelin: 2.17.78+7
  • CASK: 5.16.17+1
  • Python: 3.12
  • Package: qa_tar_py3.12

Failing Test

FAILED runtime/test_mutable_torchtrt_module.py::test_resnet18_modify_attribute_no_refit
  AssertionError: <RefitFlag.NEEDS_REFIT: 2> != <RefitFlag.LIVE: 4> :
  update_refit_condition() failed to set the flag to LIVE.
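
For readers unfamiliar with the message format, the sketch below reproduces the shape of the failure text using Python's standard `unittest` assertions. It assumes the test uses a unittest-style `assertEqual(actual, expected, msg=...)` call, which is inferred only from the layout of the message; the `RefitFlag` enum here is a stand-in that contains just the two members and values visible in the output, and `failure_message` is a hypothetical helper, not part of Torch-TensorRT.

```python
import unittest
from enum import Enum


class RefitFlag(Enum):
    # Stand-in enum: only the members and values visible in the failure message.
    NEEDS_REFIT = 2
    LIVE = 4


def failure_message(actual, expected, msg):
    """Return the text unittest produces when assertEqual(actual, expected, msg) fails."""
    case = unittest.TestCase()
    try:
        case.assertEqual(actual, expected, msg=msg)
    except AssertionError as exc:
        return str(exc)
    return None  # the two values were equal, so nothing was raised


message = failure_message(
    RefitFlag.NEEDS_REFIT,  # what the flag actually was after the call
    RefitFlag.LIVE,         # what the test expected it to be
    "update_refit_condition() failed to set the flag to LIVE.",
)
print(message)
```

Read this way, the left-hand value is the state the module was actually in and the right-hand value is the state the test expected.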

Reproducible Configurations

| GPU | CUDA | Test Suites Affected | Result |
|---|---|---|---|
| B100-TS2/x86_64 | r13.0.2, r13.1.1, r13.2.0 | backend, conversion, models, partitioning | FAILED |
| H100/x86_64 | r13.1.1, r13.2.0 | backend, conversion, models, partitioning | FAILED |

All 20 logged test runs failed — no passing configuration observed.

Steps to Reproduce

  1. Run on B100-TS2 or H100 with CUDA 13.x and the environment above
  2. Execute: pytest runtime/test_mutable_torchtrt_module.py::test_resnet18_modify_attribute_no_refit

Expected Behavior

After modifying a model attribute without triggering a refit, update_refit_condition() should transition the RefitFlag from NEEDS_REFIT to LIVE, indicating the engine is still valid and does not require a full refit.
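
The expected transition can be illustrated with a small, self-contained toy model. This is only a sketch of the behavior described above, not the Torch-TensorRT implementation: `ToyMutableModule`, `set_weight`, the `"conv1.weight"` key, and the rule "compare current weights against the weights baked into the engine" are all assumptions made for illustration, and the `RefitFlag` enum is a stand-in with only the two members named in this report.

```python
from enum import Enum


class RefitFlag(Enum):
    # Stand-in enum: only the two states named in this report.
    NEEDS_REFIT = 2
    LIVE = 4


class ToyMutableModule:
    """Hypothetical stand-in for a module that tracks whether its engine is stale."""

    def __init__(self, weights):
        self.engine_weights = dict(weights)   # values baked into the (imaginary) engine
        self.current_weights = dict(weights)  # values the user sees and may overwrite
        self.flag = RefitFlag.LIVE

    def set_weight(self, name, value):
        # Any attribute write is treated pessimistically until it is checked.
        self.current_weights[name] = value
        self.flag = RefitFlag.NEEDS_REFIT

    def update_refit_condition(self):
        # Assumed rule: if nothing differs from the engine, no refit is needed.
        if self.current_weights == self.engine_weights:
            self.flag = RefitFlag.LIVE


module = ToyMutableModule({"conv1.weight": [0.1, 0.2]})

module.set_weight("conv1.weight", [0.1, 0.2])  # a write that changes nothing
module.update_refit_condition()
same_value_flag = module.flag                  # expected to be LIVE

module.set_weight("conv1.weight", [0.9, 0.9])  # a write that really changes a weight
module.update_refit_condition()
changed_value_flag = module.flag               # expected to stay NEEDS_REFIT
```

In this toy model, the failure reported here corresponds to the first case leaving the flag at NEEDS_REFIT even though the weights match the engine.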

Additional Context

The failure appears in all four dynamo test suite categories (backend, conversion, models, partitioning), suggesting the issue is in the core MutableTorchTRTModule runtime logic rather than in any specific converter or partitioning path.
