Nemotron3: Failed to load model with all configured dtypes. #147

@mcr-ksh

Description

This happens when trying the Nemotron nvfp4 quantized version. The same failure occurs with `--dtypes nvfp4` or `auto`.

Any hint how to do it? The same model runs fine on vLLM v0.14.1.

transformers==4.57.6
numpy==2.2

# heretic --dtypes nvfp4 --trust-remote-code true --model cybermotaz/nemotron3-nano-nvfp4-w4a16 
█░█░█▀▀░█▀▄░█▀▀░▀█▀░█░█▀▀  v1.1.0
█▀█░█▀▀░█▀▄░█▀▀░░█░░█░█░░
▀░▀░▀▀▀░▀░▀░▀▀▀░░▀░░▀░▀▀▀  https://github.com/p-e-w/heretic

Detected 1 CUDA device(s):
* GPU 0: NVIDIA Thor

Loading model cybermotaz/nemotron3-nano-nvfp4-w4a16...
* Trying dtype nvfp4... Failed (NemotronHForCausalLM.__init__() got an unexpected keyword argument 'dtype')
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /opt/local/miniconda3/envs/vllm/bin/heretic:10 in <module>                                       │
│                                                                                                  │
│    7 │   │   sys.argv[0] = sys.argv[0][:-11]                                                     │
│    8 │   elif sys.argv[0].endswith(".exe"):                                                      │
│    9 │   │   sys.argv[0] = sys.argv[0][:-4]                                                      │
│ ❱ 10 │   sys.exit(main())                                                                        │
│   11                                                                                             │
│                                                                                                  │
│ /usr/src/heretic/src/heretic/main.py:888 in main                                                 │
│                                                                                                  │
│   885 │   install()                                                                              │
│   886 │                                                                                          │
│   887 │   try:                                                                                   │
│ ❱ 888 │   │   run()                                                                              │
│   889 │   except BaseException as error:                                                         │
│   890 │   │   # Transformers appears to handle KeyboardInterrupt (or BaseException)              │
│   891 │   │   # internally in some places, which can re-raise a different error in the handler   │
│                                                                                                  │
│ /usr/src/heretic/src/heretic/main.py:307 in run                                                  │
│                                                                                                  │
│   304 │   │   elif choice is None or choice == "":                                               │
│   305 │   │   │   return                                                                         │
│   306 │                                                                                          │
│ ❱ 307 │   model = Model(settings)                                                                │
│   308 │   print()                                                                                │
│   309 │   print_memory_usage()                                                                   │
│   310                                                                                            │
│                                                                                                  │
│ /usr/src/heretic/src/heretic/model.py:145 in __init__                                            │
│                                                                                                  │
│   142 │   │   │   break                                                                          │
│   143 │   │                                                                                      │
│   144 │   │   if self.model is None:                                                             │
│ ❱ 145 │   │   │   raise Exception("Failed to load model with all configured dtypes.")            │
│   146 │   │                                                                                      │
│   147 │   │   self._apply_lora()                                                                 │
│   148                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
Exception: Failed to load model with all configured dtypes.
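The underlying failure, visible in the "Trying dtype nvfp4..." line, is a `TypeError`: the remote-code `NemotronHForCausalLM.__init__()` does not accept the `dtype` keyword that the loader passes to it. The mechanism can be reproduced in isolation with a hypothetical stub class (the stub is a simplification for illustration, not the real model code):

```python
class NemotronHForCausalLMStub:
    """Hypothetical stand-in for a trust_remote_code model class whose
    __init__ signature predates the `dtype` keyword argument."""

    def __init__(self, config):
        self.config = config


# Passing dtype= to a constructor that does not declare it raises
# the same kind of TypeError reported in the traceback above.
try:
    NemotronHForCausalLMStub(config={}, dtype="nvfp4")
except TypeError as error:
    print(error)  # ... got an unexpected keyword argument 'dtype'
```

If this is the cause, the model's custom code on the Hub would need to accept (or ignore) the extra keyword, or the loader would have to fall back to not forwarding `dtype` for remote-code models.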
