Loading fails when trying the Nemotron NVFP4-quantized version. The same issue occurs with `--dtypes nvfp4` and with `auto`. Any hint how to do this? The model runs fine on vLLM v0.14.1.

transformers==4.57.6
numpy==2.2
# heretic --dtypes nvfp4 --trust-remote-code true --model cybermotaz/nemotron3-nano-nvfp4-w4a16
█░█░█▀▀░█▀▄░█▀▀░▀█▀░█░█▀▀ v1.1.0
█▀█░█▀▀░█▀▄░█▀▀░░█░░█░█░░
▀░▀░▀▀▀░▀░▀░▀▀▀░░▀░░▀░▀▀▀ https://github.com/p-e-w/heretic
Detected 1 CUDA device(s):
* GPU 0: NVIDIA Thor
Loading model cybermotaz/nemotron3-nano-nvfp4-w4a16...
* Trying dtype nvfp4... Failed (NemotronHForCausalLM.__init__() got an unexpected keyword argument 'dtype')
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /opt/local/miniconda3/envs/vllm/bin/heretic:10 in <module> │
│ │
│ 7 │ │ sys.argv[0] = sys.argv[0][:-11] │
│ 8 │ elif sys.argv[0].endswith(".exe"): │
│ 9 │ │ sys.argv[0] = sys.argv[0][:-4] │
│ ❱ 10 │ sys.exit(main()) │
│ 11 │
│ │
│ /usr/src/heretic/src/heretic/main.py:888 in main │
│ │
│ 885 │ install() │
│ 886 │ │
│ 887 │ try: │
│ ❱ 888 │ │ run() │
│ 889 │ except BaseException as error: │
│ 890 │ │ # Transformers appears to handle KeyboardInterrupt (or BaseException) │
│ 891 │ │ # internally in some places, which can re-raise a different error in the handler │
│ │
│ /usr/src/heretic/src/heretic/main.py:307 in run │
│ │
│ 304 │ │ elif choice is None or choice == "": │
│ 305 │ │ │ return │
│ 306 │ │
│ ❱ 307 │ model = Model(settings) │
│ 308 │ print() │
│ 309 │ print_memory_usage() │
│ 310 │
│ │
│ /usr/src/heretic/src/heretic/model.py:145 in __init__ │
│ │
│ 142 │ │ │ break │
│ 143 │ │ │
│ 144 │ │ if self.model is None: │
│ ❱ 145 │ │ │ raise Exception("Failed to load model with all configured dtypes.") │
│ 146 │ │ │
│ 147 │ │ self._apply_lora() │
│ 148 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
Exception: Failed to load model with all configured dtypes.
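The traceback points at a keyword mismatch: transformers passes a `dtype` keyword down to the model constructor, but the model's remote code (`NemotronHForCausalLM.__init__`) does not accept it. A minimal sketch of how to check this, using a hypothetical `DummyModel` stand-in rather than the real remote class (which would require downloading the model):

```python
import inspect

# Hypothetical stand-in for the remote NemotronHForCausalLM.__init__,
# which (per the traceback above) does not accept a `dtype` keyword.
class DummyModel:
    def __init__(self, config, torch_dtype=None):
        self.config = config
        self.torch_dtype = torch_dtype

def accepts_kwarg(cls, name):
    """Return True if cls.__init__ takes the given keyword argument
    (either explicitly or via **kwargs)."""
    params = inspect.signature(cls.__init__).parameters
    return name in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )

print(accepts_kwarg(DummyModel, "dtype"))        # False -> matches the error
print(accepts_kwarg(DummyModel, "torch_dtype"))  # True
```

If the real remote class behaves like this sketch, the mismatch would explain why every configured dtype fails the same way, independent of the value passed.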