args=Namespace(ckpt_id='black-forest-labs/FLUX.1-dev', seed=0, disable_fa3=True, disable_fp8=False, disable_compile=False, disable_recompile_error=False, disable_hotswap=False, quantize_t5=True, offload=False, max_rank=128, out_dir=PosixPath('no_fa3'))
Loading pipeline components...: 0%| | 0/7 [00:00<?, ?it/s]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading checkpoint shards: 100%|██████████| 2/2 [00:04<00:00, 2.44s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:26<00:00, 8.74s/it]
Loading pipeline components...: 100%|██████████| 7/7 [00:31<00:00, 4.50s/it]
Loading repo_id='glif/l0w-r3z'
WARN Feature `utils/Perplexity` requires python GIL. Feature is currently skipped/disabled.
INFO ENV: Auto setting PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' for memory saving.
INFO ENV: Auto setting CUDA_DEVICE_ORDER=PCI_BUS_ID for correctness.
No LoRA keys associated to CLIPTextModel found with the prefix='text_encoder'. This is safe to ignore if LoRA state dict didn't originally have any CLIPTextModel related params. You can also try specifying `prefix=None` to resolve the warning. Otherwise, open an issue if you think it's unexpected: https://github.com/huggingface/diffusers/issues/new
No LoRA keys associated to CLIPTextModel found with the prefix='text_encoder'. This is safe to ignore if LoRA state dict didn't originally have any CLIPTextModel related params. You can also try specifying `prefix=None` to resolve the warning. Otherwise, open an issue if you think it's unexpected: https://github.com/huggingface/diffusers/issues/new
Token indices sequence length is longer than the specified maximum sequence length for this model (96 > 77). Running this sequence through the model will result in indexing errors
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['in the lower corner show a price of 1 5 cents and the date sep 2 0 2 4']
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['in the lower corner show a price of 1 5 cents and the date sep 2 0 2 4']
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['in the lower corner show a price of 1 5 cents and the date sep 2 0 2 4']
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['in the lower corner show a price of 1 5 cents and the date sep 2 0 2 4']
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['in the lower corner show a price of 1 5 cents and the date sep 2 0 2 4']
Loading repo_id='renderartist/retrocomicflux'
Benchmark completed in 351.19 seconds.
out_dict={'timings': [17.71, 17.722], 'time_mean': 17.715999603271484, 'time_var': 7.201245171017945e-05, 'img_paths': ['no_fa3/glif_l0w-r3z.png', 'no_fa3/renderartist_retrocomicflux.png']}
args=Namespace(ckpt_id='black-forest-labs/FLUX.1-dev', seed=0, disable_fa3=True, disable_fp8=False, disable_compile=True, disable_recompile_error=False, disable_hotswap=False, quantize_t5=True, offload=False, max_rank=128, out_dir=PosixPath('no_compile_fa3'))
Loading checkpoint shards: 100%|██████████| 3/3 [00:26<00:00, 8.76s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:04<00:00, 2.43s/it]
Loading pipeline components...: 29%|██▊ | 2/7 [00:31<01:08, 13.78s/it]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|██████████| 7/7 [00:31<00:00, 4.50s/it]
Loading repo_id='glif/l0w-r3z'
WARN Feature `utils/Perplexity` requires python GIL. Feature is currently skipped/disabled.
INFO ENV: Auto setting PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' for memory saving.
INFO ENV: Auto setting CUDA_DEVICE_ORDER=PCI_BUS_ID for correctness.
No LoRA keys associated to CLIPTextModel found with the prefix='text_encoder'. This is safe to ignore if LoRA state dict didn't originally have any CLIPTextModel related params. You can also try specifying `prefix=None` to resolve the warning. Otherwise, open an issue if you think it's unexpected: https://github.com/huggingface/diffusers/issues/new
No LoRA keys associated to CLIPTextModel found with the prefix='text_encoder'. This is safe to ignore if LoRA state dict didn't originally have any CLIPTextModel related params. You can also try specifying `prefix=None` to resolve the warning. Otherwise, open an issue if you think it's unexpected: https://github.com/huggingface/diffusers/issues/new
Token indices sequence length is longer than the specified maximum sequence length for this model (96 > 77). Running this sequence through the model will result in indexing errors
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['in the lower corner show a price of 1 5 cents and the date sep 2 0 2 4']
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['in the lower corner show a price of 1 5 cents and the date sep 2 0 2 4']
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['in the lower corner show a price of 1 5 cents and the date sep 2 0 2 4']
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['in the lower corner show a price of 1 5 cents and the date sep 2 0 2 4']
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['in the lower corner show a price of 1 5 cents and the date sep 2 0 2 4']
Loading repo_id='renderartist/retrocomicflux'
Benchmark completed in 237.86 seconds.
out_dict={'timings': [23.446, 23.306], 'time_mean': 23.375999450683594, 'time_var': 0.009799914434552193, 'img_paths': ['no_compile_fa3/glif_l0w-r3z.png', 'no_compile_fa3/renderartist_retrocomicflux.png']}
args=Namespace(ckpt_id='black-forest-labs/FLUX.1-dev', seed=0, disable_fa3=True, disable_fp8=True, disable_compile=False, disable_recompile_error=True, disable_hotswap=False, quantize_t5=False, offload=True, max_rank=128, out_dir=PosixPath('no_fa3_fp8_nf4'))
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 120.79it/s]
Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00, 88.45it/s]
Loading pipeline components...: 43%|████▎ | 3/7 [00:00<00:00, 26.85it/s]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|██████████| 7/7 [00:00<00:00, 24.32it/s]
Loading repo_id='glif/l0w-r3z'
WARN Feature `utils/Perplexity` requires python GIL. Feature is currently skipped/disabled.
INFO ENV: Auto setting PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' for memory saving.
INFO ENV: Auto setting CUDA_DEVICE_ORDER=PCI_BUS_ID for correctness.
No LoRA keys associated to CLIPTextModel found with the prefix='text_encoder'. This is safe to ignore if LoRA state dict didn't originally have any CLIPTextModel related params. You can also try specifying `prefix=None` to resolve the warning. Otherwise, open an issue if you think it's unexpected: https://github.com/huggingface/diffusers/issues/new
/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_dynamo/variables/functions.py:1263: UserWarning: Dynamo does not know how to trace the builtin `posix.putenv.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
Traceback (most recent call last):
File "/home/name/work/clones/lora-fast/run_benchmark.py", line 26, in <module>
out_dict = bench_manager.run_benchmark(LORA_MAPPINGS)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/clones/lora-fast/utils/benchmark_utils.py", line 108, in run_benchmark
image = self.run_inference(self.pipe, pipe_kwargs, args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/clones/lora-fast/utils/benchmark_utils.py", line 85, in run_inference
return pipe(**pipe_kwargs).images[0]
^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/forks/diffusers/src/diffusers/pipelines/flux/pipeline_flux.py", line 913, in __call__
noise_pred = self.transformer(
^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1749, in _wrapped_call_impl
return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 655, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/accelerate/hooks.py", line 170, in new_forward
args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/accelerate/hooks.py", line 718, in pre_forward
self.prev_module_hook.offload()
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/accelerate/hooks.py", line 719, in torch_dynamo_resume_in_pre_forward_at_718
clear_device_cache()
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/accelerate/hooks.py", line 720, in torch_dynamo_resume_in_pre_forward_at_719
module.to(self.execution_device)
File "/home/name/work/forks/diffusers/src/diffusers/models/modeling_utils.py", line 1383, in to
return super().to(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1355, in to
return self._apply(convert)
^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 915, in _apply
module._apply(fn)
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 915, in _apply
module._apply(fn)
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 915, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 942, in _apply
param_applied = fn(param)
^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1341, in convert
return t.to(
^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 90.00 MiB. GPU 0 has a total capacity of 23.55 GiB of which 45.75 MiB is free. Including non-PyTorch memory, this process has 23.48 GiB memory in use. Of the allocated memory 22.72 GiB is allocated by PyTorch, and 257.71 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
args=Namespace(ckpt_id='black-forest-labs/FLUX.1-dev', seed=0, disable_fa3=True, disable_fp8=True, disable_compile=False, disable_recompile_error=True, disable_hotswap=False, quantize_t5=True, offload=True, max_rank=128, out_dir=PosixPath('no_fa3_fp8'))
Loading checkpoint shards: 100%|██████████| 2/2 [00:04<00:00, 2.49s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00, 96.44it/s]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|██████████| 7/7 [00:05<00:00, 1.30it/s]
Loading repo_id='glif/l0w-r3z'
WARN Feature `utils/Perplexity` requires python GIL. Feature is currently skipped/disabled.
INFO ENV: Auto setting PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' for memory saving.
INFO ENV: Auto setting CUDA_DEVICE_ORDER=PCI_BUS_ID for correctness.
No LoRA keys associated to CLIPTextModel found with the prefix='text_encoder'. This is safe to ignore if LoRA state dict didn't originally have any CLIPTextModel related params. You can also try specifying `prefix=None` to resolve the warning. Otherwise, open an issue if you think it's unexpected: https://github.com/huggingface/diffusers/issues/new
/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_dynamo/variables/functions.py:1263: UserWarning: Dynamo does not know how to trace the builtin `<unknown module>.TensorBase._make_subclass.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_dynamo/variables/functions.py:1263: UserWarning: Dynamo does not know how to trace the builtin `posix.putenv.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
Traceback (most recent call last):
File "/home/name/work/clones/lora-fast/run_benchmark.py", line 26, in <module>
out_dict = bench_manager.run_benchmark(LORA_MAPPINGS)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/clones/lora-fast/utils/benchmark_utils.py", line 108, in run_benchmark
image = self.run_inference(self.pipe, pipe_kwargs, args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/clones/lora-fast/utils/benchmark_utils.py", line 85, in run_inference
return pipe(**pipe_kwargs).images[0]
^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/forks/diffusers/src/diffusers/pipelines/flux/pipeline_flux.py", line 913, in __call__
noise_pred = self.transformer(
^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1749, in _wrapped_call_impl
return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 655, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/accelerate/hooks.py", line 170, in new_forward
args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/accelerate/hooks.py", line 718, in pre_forward
self.prev_module_hook.offload()
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/accelerate/hooks.py", line 719, in torch_dynamo_resume_in_pre_forward_at_718
clear_device_cache()
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/accelerate/hooks.py", line 720, in torch_dynamo_resume_in_pre_forward_at_719
module.to(self.execution_device)
File "/home/name/work/forks/diffusers/src/diffusers/models/modeling_utils.py", line 1383, in to
return super().to(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1355, in to
return self._apply(convert)
^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 915, in _apply
module._apply(fn)
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 915, in _apply
module._apply(fn)
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 915, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 942, in _apply
param_applied = fn(param)
^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1341, in convert
return t.to(
^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 90.00 MiB. GPU 0 has a total capacity of 23.55 GiB of which 47.75 MiB is free. Including non-PyTorch memory, this process has 23.48 GiB memory in use. Of the allocated memory 22.72 GiB is allocated by PyTorch, and 256.21 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
args=Namespace(ckpt_id='black-forest-labs/FLUX.1-dev', seed=0, disable_fa3=True, disable_fp8=True, disable_compile=True, disable_recompile_error=True, disable_hotswap=False, quantize_t5=False, offload=True, max_rank=128, out_dir=PosixPath('no_fa3_fp8_nf4_compile'))
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 129.33it/s]
Loading pipeline components...: 71%|███████▏ | 5/7 [00:00<00:00, 48.06it/s]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00, 92.10it/s]
Loading pipeline components...: 100%|██████████| 7/7 [00:00<00:00, 24.94it/s]
Loading repo_id='glif/l0w-r3z'
WARN Feature `utils/Perplexity` requires python GIL. Feature is currently skipped/disabled.
INFO ENV: Auto setting PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' for memory saving.
INFO ENV: Auto setting CUDA_DEVICE_ORDER=PCI_BUS_ID for correctness.
No LoRA keys associated to CLIPTextModel found with the prefix='text_encoder'. This is safe to ignore if LoRA state dict didn't originally have any CLIPTextModel related params. You can also try specifying `prefix=None` to resolve the warning. Otherwise, open an issue if you think it's unexpected: https://github.com/huggingface/diffusers/issues/new
Traceback (most recent call last):
File "/home/name/work/clones/lora-fast/run_benchmark.py", line 26, in <module>
out_dict = bench_manager.run_benchmark(LORA_MAPPINGS)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/clones/lora-fast/utils/benchmark_utils.py", line 108, in run_benchmark
image = self.run_inference(self.pipe, pipe_kwargs, args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/clones/lora-fast/utils/benchmark_utils.py", line 85, in run_inference
return pipe(**pipe_kwargs).images[0]
^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/forks/diffusers/src/diffusers/pipelines/flux/pipeline_flux.py", line 913, in __call__
noise_pred = self.transformer(
^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/accelerate/hooks.py", line 170, in new_forward
args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/accelerate/hooks.py", line 720, in pre_forward
module.to(self.execution_device)
File "/home/name/work/forks/diffusers/src/diffusers/models/modeling_utils.py", line 1383, in to
return super().to(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1355, in to
return self._apply(convert)
^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 915, in _apply
module._apply(fn)
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 915, in _apply
module._apply(fn)
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 915, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 942, in _apply
param_applied = fn(param)
^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1341, in convert
return t.to(
^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 90.00 MiB. GPU 0 has a total capacity of 23.55 GiB of which 45.75 MiB is free. Including non-PyTorch memory, this process has 23.48 GiB memory in use. Of the allocated memory 22.72 GiB is allocated by PyTorch, and 257.71 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
args=Namespace(ckpt_id='black-forest-labs/FLUX.1-dev', seed=0, disable_fa3=True, disable_fp8=True, disable_compile=True, disable_recompile_error=True, disable_hotswap=False, quantize_t5=True, offload=True, max_rank=128, out_dir=PosixPath('no_fa3_fp8_compile'))
Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00, 87.39it/s]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading checkpoint shards: 100%|██████████| 2/2 [00:04<00:00, 2.50s/it]
Loading pipeline components...: 100%|██████████| 7/7 [00:05<00:00, 1.30it/s]
Loading repo_id='glif/l0w-r3z'
WARN Feature `utils/Perplexity` requires python GIL. Feature is currently skipped/disabled.
INFO ENV: Auto setting PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' for memory saving.
INFO ENV: Auto setting CUDA_DEVICE_ORDER=PCI_BUS_ID for correctness.
No LoRA keys associated to CLIPTextModel found with the prefix='text_encoder'. This is safe to ignore if LoRA state dict didn't originally have any CLIPTextModel related params. You can also try specifying `prefix=None` to resolve the warning. Otherwise, open an issue if you think it's unexpected: https://github.com/huggingface/diffusers/issues/new
Traceback (most recent call last):
File "/home/name/work/clones/lora-fast/run_benchmark.py", line 26, in <module>
out_dict = bench_manager.run_benchmark(LORA_MAPPINGS)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/clones/lora-fast/utils/benchmark_utils.py", line 108, in run_benchmark
image = self.run_inference(self.pipe, pipe_kwargs, args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/clones/lora-fast/utils/benchmark_utils.py", line 85, in run_inference
return pipe(**pipe_kwargs).images[0]
^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/forks/diffusers/src/diffusers/pipelines/flux/pipeline_flux.py", line 913, in __call__
noise_pred = self.transformer(
^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/accelerate/hooks.py", line 170, in new_forward
args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/accelerate/hooks.py", line 720, in pre_forward
module.to(self.execution_device)
File "/home/name/work/forks/diffusers/src/diffusers/models/modeling_utils.py", line 1383, in to
return super().to(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1355, in to
return self._apply(convert)
^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 915, in _apply
module._apply(fn)
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 915, in _apply
module._apply(fn)
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 915, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 942, in _apply
param_applied = fn(param)
^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1341, in convert
return t.to(
^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 90.00 MiB. GPU 0 has a total capacity of 23.55 GiB of which 47.75 MiB is free. Including non-PyTorch memory, this process has 23.48 GiB memory in use. Of the allocated memory 22.72 GiB is allocated by PyTorch, and 256.21 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
args=Namespace(ckpt_id='black-forest-labs/FLUX.1-dev', seed=0, disable_fa3=True, disable_fp8=True, disable_compile=False, disable_recompile_error=True, disable_hotswap=True, quantize_t5=False, offload=True, max_rank=128, out_dir=PosixPath('no_fa3_fp8_nf4_hotswap'))
Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00, 92.39it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 129.88it/s]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|██████████| 7/7 [00:00<00:00, 25.84it/s]
Loading repo_id='glif/l0w-r3z'
WARN Feature `utils/Perplexity` requires python GIL. Feature is currently skipped/disabled.
INFO ENV: Auto setting PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' for memory saving.
INFO ENV: Auto setting CUDA_DEVICE_ORDER=PCI_BUS_ID for correctness.
No LoRA keys associated to CLIPTextModel found with the prefix='text_encoder'. This is safe to ignore if LoRA state dict didn't originally have any CLIPTextModel related params. You can also try specifying `prefix=None` to resolve the warning. Otherwise, open an issue if you think it's unexpected: https://github.com/huggingface/diffusers/issues/new
/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_dynamo/variables/functions.py:1263: UserWarning: Dynamo does not know how to trace the builtin `posix.putenv.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
Traceback (most recent call last):
File "/home/name/work/clones/lora-fast/run_benchmark.py", line 26, in <module>
out_dict = bench_manager.run_benchmark(LORA_MAPPINGS)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/clones/lora-fast/utils/benchmark_utils.py", line 108, in run_benchmark
image = self.run_inference(self.pipe, pipe_kwargs, args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/clones/lora-fast/utils/benchmark_utils.py", line 85, in run_inference
return pipe(**pipe_kwargs).images[0]
^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/forks/diffusers/src/diffusers/pipelines/flux/pipeline_flux.py", line 913, in __call__
noise_pred = self.transformer(
^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1749, in _wrapped_call_impl
return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 655, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/accelerate/hooks.py", line 170, in new_forward
args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/accelerate/hooks.py", line 170, in torch_dynamo_resume_in_new_forward_at_170
args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1209, in forward
return compiled_fn(full_args)
^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 328, in runtime_wrapper
all_outs = call_func_at_runtime_with_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 126, in call_func_at_runtime_with_args
out = normalize_as_list(f(args))
^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 689, in inner_fn
outs = compiled_fn(args)
^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 495, in wrapper
return compiled_fn(runtime_args)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_inductor/output_code.py", line 460, in __call__
return self.current_callable(inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_inductor/utils.py", line 2404, in run
return model(new_inputs)
^^^^^^^^^^^^^^^^^
File "/tmp/torchinductor_name/ju/cjulupfc4jcwfrfojcurabtjn7mwycfwl354nfp5hsnbhvhoys3h.py", line 5892, in call
triton_poi_fused_mm_18.run(buf94, buf95, 196608, stream=stream0)
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 909, in run
self.autotune_to_one_config(*args, **kwargs)
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 763, in autotune_to_one_config
timings = self.benchmark_all_configs(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 738, in benchmark_all_configs
launcher: self.bench(launcher, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 616, in bench
return benchmarker.benchmark_gpu(kernel_call, rep=40)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_inductor/runtime/benchmarking.py", line 39, in wrapper
return fn(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_inductor/runtime/benchmarking.py", line 247, in benchmark_gpu
buffer = torch.empty(self.L2_cache_size // 4, dtype=torch.int, device="cuda")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 72.00 MiB. GPU 0 has a total capacity of 23.55 GiB of which 59.75 MiB is free. Including non-PyTorch memory, this process has 23.46 GiB memory in use. Of the allocated memory 22.94 GiB is allocated by PyTorch, and 17.31 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
args=Namespace(ckpt_id='black-forest-labs/FLUX.1-dev', seed=0, disable_fa3=True, disable_fp8=True, disable_compile=False, disable_recompile_error=True, disable_hotswap=True, quantize_t5=True, offload=True, max_rank=128, out_dir=PosixPath('no_fa3_fp8_hotswap'))
Loading checkpoint shards: 100%|██████████| 2/2 [00:05<00:00, 2.51s/it]
Loading pipeline components...: 14%|█▍ | 1/7 [00:05<00:31, 5.19s/it]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00, 91.64it/s]
Loading pipeline components...: 100%|██████████| 7/7 [00:05<00:00, 1.29it/s]
Loading repo_id='glif/l0w-r3z'
WARN Feature `utils/Perplexity` requires python GIL. Feature is currently skipped/disabled.
INFO ENV: Auto setting PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' for memory saving.
INFO ENV: Auto setting CUDA_DEVICE_ORDER=PCI_BUS_ID for correctness.
No LoRA keys associated to CLIPTextModel found with the prefix='text_encoder'. This is safe to ignore if LoRA state dict didn't originally have any CLIPTextModel related params. You can also try specifying `prefix=None` to resolve the warning. Otherwise, open an issue if you think it's unexpected: https://github.com/huggingface/diffusers/issues/new
/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_dynamo/variables/functions.py:1263: UserWarning: Dynamo does not know how to trace the builtin `<unknown module>.TensorBase._make_subclass.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_dynamo/variables/functions.py:1263: UserWarning: Dynamo does not know how to trace the builtin `posix.putenv.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
Traceback (most recent call last):
File "/home/name/work/clones/lora-fast/run_benchmark.py", line 26, in <module>
out_dict = bench_manager.run_benchmark(LORA_MAPPINGS)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/clones/lora-fast/utils/benchmark_utils.py", line 108, in run_benchmark
image = self.run_inference(self.pipe, pipe_kwargs, args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/clones/lora-fast/utils/benchmark_utils.py", line 85, in run_inference
return pipe(**pipe_kwargs).images[0]
^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/forks/diffusers/src/diffusers/pipelines/flux/pipeline_flux.py", line 913, in __call__
noise_pred = self.transformer(
^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1749, in _wrapped_call_impl
return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 655, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/accelerate/hooks.py", line 170, in new_forward
args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/accelerate/hooks.py", line 170, in torch_dynamo_resume_in_new_forward_at_170
args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1209, in forward
return compiled_fn(full_args)
^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 328, in runtime_wrapper
all_outs = call_func_at_runtime_with_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 126, in call_func_at_runtime_with_args
out = normalize_as_list(f(args))
^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 495, in wrapper
return compiled_fn(runtime_args)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_inductor/output_code.py", line 460, in __call__
return self.current_callable(inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_inductor/utils.py", line 2404, in run
return model(new_inputs)
^^^^^^^^^^^^^^^^^
File "/tmp/torchinductor_name/ju/cjulupfc4jcwfrfojcurabtjn7mwycfwl354nfp5hsnbhvhoys3h.py", line 5892, in call
triton_poi_fused_mm_18.run(buf94, buf95, 196608, stream=stream0)
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 909, in run
self.autotune_to_one_config(*args, **kwargs)
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 763, in autotune_to_one_config
timings = self.benchmark_all_configs(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 738, in benchmark_all_configs
launcher: self.bench(launcher, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 616, in bench
return benchmarker.benchmark_gpu(kernel_call, rep=40)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_inductor/runtime/benchmarking.py", line 39, in wrapper
return fn(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/_inductor/runtime/benchmarking.py", line 247, in benchmark_gpu
buffer = torch.empty(self.L2_cache_size // 4, dtype=torch.int, device="cuda")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 72.00 MiB. GPU 0 has a total capacity of 23.55 GiB of which 67.75 MiB is free. Including non-PyTorch memory, this process has 23.46 GiB memory in use. Of the allocated memory 22.95 GiB is allocated by PyTorch, and 6.31 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
args=Namespace(ckpt_id='black-forest-labs/FLUX.1-dev', seed=0, disable_fa3=True, disable_fp8=True, disable_compile=True, disable_recompile_error=True, disable_hotswap=True, quantize_t5=False, offload=True, max_rank=128, out_dir=PosixPath('no_fa3_fp8_nf4_hotswap_comp'))
Loading pipeline components...: 0%| | 0/7 [00:00<?, ?it/s]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00, 89.40it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 133.26it/s]
Loading pipeline components...: 100%|██████████| 7/7 [00:00<00:00, 25.31it/s]
Loading repo_id='glif/l0w-r3z'
WARN Feature `utils/Perplexity` requires python GIL. Feature is currently skipped/disabled.
INFO ENV: Auto setting PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' for memory saving.
INFO ENV: Auto setting CUDA_DEVICE_ORDER=PCI_BUS_ID for correctness.
No LoRA keys associated to CLIPTextModel found with the prefix='text_encoder'. This is safe to ignore if LoRA state dict didn't originally have any CLIPTextModel related params. You can also try specifying `prefix=None` to resolve the warning. Otherwise, open an issue if you think it's unexpected: https://github.com/huggingface/diffusers/issues/new
Traceback (most recent call last):
File "/home/name/work/clones/lora-fast/run_benchmark.py", line 26, in <module>
out_dict = bench_manager.run_benchmark(LORA_MAPPINGS)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/clones/lora-fast/utils/benchmark_utils.py", line 108, in run_benchmark
image = self.run_inference(self.pipe, pipe_kwargs, args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/clones/lora-fast/utils/benchmark_utils.py", line 85, in run_inference
return pipe(**pipe_kwargs).images[0]
^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/forks/diffusers/src/diffusers/pipelines/flux/pipeline_flux.py", line 913, in __call__
noise_pred = self.transformer(
^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/accelerate/hooks.py", line 175, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/forks/diffusers/src/diffusers/models/transformers/transformer_flux.py", line 490, in forward
encoder_hidden_states, hidden_states = block(
^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/forks/diffusers/src/diffusers/models/transformers/transformer_flux.py", line 151, in forward
attention_outputs = self.attn(
^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/forks/diffusers/src/diffusers/models/attention_processor.py", line 605, in forward
return self.processor(
^^^^^^^^^^^^^^^
File "/home/name/work/forks/diffusers/src/diffusers/models/attention_processor.py", line 2339, in __call__
query = apply_rotary_emb(query, image_rotary_emb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/forks/diffusers/src/diffusers/models/embeddings.py", line 1211, in apply_rotary_emb
out = (x.float() * cos + x_rotated.float() * sin).to(x.dtype)
~~~~~~~~~~~~~~~~~~^~~~~
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 54.00 MiB. GPU 0 has a total capacity of 23.55 GiB of which 41.75 MiB is free. Including non-PyTorch memory, this process has 23.48 GiB memory in use. Of the allocated memory 22.91 GiB is allocated by PyTorch, and 68.94 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
args=Namespace(ckpt_id='black-forest-labs/FLUX.1-dev', seed=0, disable_fa3=True, disable_fp8=True, disable_compile=True, disable_recompile_error=True, disable_hotswap=True, quantize_t5=True, offload=True, max_rank=128, out_dir=PosixPath('no_fa3_fp8_hotswap_comp'))
Loading checkpoint shards: 100%|██████████| 2/2 [00:04<00:00, 2.48s/it]
Loading pipeline components...: 43%|████▎ | 3/7 [00:05<00:06, 1.72s/it]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00, 92.49it/s]
Loading pipeline components...: 100%|██████████| 7/7 [00:05<00:00, 1.30it/s]
Loading repo_id='glif/l0w-r3z'
WARN Feature `utils/Perplexity` requires python GIL. Feature is currently skipped/disabled.
INFO ENV: Auto setting PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' for memory saving.
INFO ENV: Auto setting CUDA_DEVICE_ORDER=PCI_BUS_ID for correctness.
No LoRA keys associated to CLIPTextModel found with the prefix='text_encoder'. This is safe to ignore if LoRA state dict didn't originally have any CLIPTextModel related params. You can also try specifying `prefix=None` to resolve the warning. Otherwise, open an issue if you think it's unexpected: https://github.com/huggingface/diffusers/issues/new
Traceback (most recent call last):
File "/home/name/work/clones/lora-fast/run_benchmark.py", line 26, in <module>
out_dict = bench_manager.run_benchmark(LORA_MAPPINGS)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/clones/lora-fast/utils/benchmark_utils.py", line 108, in run_benchmark
image = self.run_inference(self.pipe, pipe_kwargs, args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/clones/lora-fast/utils/benchmark_utils.py", line 85, in run_inference
return pipe(**pipe_kwargs).images[0]
^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/forks/diffusers/src/diffusers/pipelines/flux/pipeline_flux.py", line 913, in __call__
noise_pred = self.transformer(
^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/accelerate/hooks.py", line 175, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/forks/diffusers/src/diffusers/models/transformers/transformer_flux.py", line 490, in forward
encoder_hidden_states, hidden_states = block(
^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/forks/diffusers/src/diffusers/models/transformers/transformer_flux.py", line 151, in forward
attention_outputs = self.attn(
^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/forks/diffusers/src/diffusers/models/attention_processor.py", line 605, in forward
return self.processor(
^^^^^^^^^^^^^^^
File "/home/name/work/forks/diffusers/src/diffusers/models/attention_processor.py", line 2339, in __call__
query = apply_rotary_emb(query, image_rotary_emb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/name/work/forks/diffusers/src/diffusers/models/embeddings.py", line 1211, in apply_rotary_emb
out = (x.float() * cos + x_rotated.float() * sin).to(x.dtype)
~~~~~~~~~~~~~~~~~~^~~~~
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 54.00 MiB. GPU 0 has a total capacity of 23.55 GiB of which 41.75 MiB is free. Including non-PyTorch memory, this process has 23.48 GiB memory in use. Of the allocated memory 22.91 GiB is allocated by PyTorch, and 68.94 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Unfortunately, only the first two experiments, `no_fa3` and `no_compile_fa3`, succeeded; all the other configurations ran out of memory. The common denominator is FP8: every failing run was launched with `disable_fp8=True`, so FP8 quantization of the transformer appears to be required for these experiments to fit on the 24 GB GPU.