
fix(state): make MLU backend part of the _prepare_backend elif chain (#4055) #4057

Open
Anai-Guo wants to merge 1 commit into huggingface:main from Anai-Guo:fix-mlu-backend-elif-chain

@Anai-Guo

What

In `PartialState._prepare_backend`, the MLU branch was guarded by a standalone `if`, immediately followed by a separate `if is_sdaa_available(): ... elif is_musa_available(): ...` chain:

```python
if is_mlu_available():
    backend = "cncl"
    distributed_type = DistributedType.MULTI_MLU
if is_sdaa_available():        # <- new chain, not part of the MLU check
    ...
elif is_musa_available():
    ...
elif torch.cuda.is_available():
    ...
```

Because the MLU check sits outside the mutually exclusive chain, the `cncl` / `MULTI_MLU` selection on an MLU host is not final: execution falls through into the second chain, where a later branch (e.g. `torch.cuda.is_available()`) can silently overwrite `backend` / `distributed_type`. Every other accelerator (SDAA, MUSA, NPU, HPU, CUDA, XPU, NEURON) already belongs to a single `if`/`elif` chain; MLU is the lone outlier.
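The fall-through can be reproduced without any accelerator hardware. The sketch below is a hypothetical, simplified model of the pre-fix branch layout: availability is passed in as plain booleans instead of calling `is_mlu_available()` / `torch.cuda.is_available()`, and each branch records only an accelerator label rather than the real `backend` / `distributed_type` values (only MLU's `cncl` / `MULTI_MLU` are taken from the code above).

```python
def pick_accelerator_buggy(mlu: bool, sdaa: bool, musa: bool, cuda: bool):
    """Hypothetical mirror of the pre-fix layout: a lone `if` plus a separate chain."""
    chosen = None
    if mlu:
        chosen = "MLU"      # stands in for backend="cncl", MULTI_MLU
    if sdaa:                # starts a new chain: still evaluated when mlu is True
        chosen = "SDAA"
    elif musa:
        chosen = "MUSA"
    elif cuda:
        chosen = "CUDA"     # overwrites the MLU choice on fall-through
    return chosen


# An MLU host on which a later check also happens to be true (hypothetical scenario):
print(pick_accelerator_buggy(mlu=True, sdaa=False, musa=False, cuda=True))   # CUDA
# An MLU host where no later check is true keeps the MLU selection:
print(pick_accelerator_buggy(mlu=True, sdaa=False, musa=False, cuda=False))  # MLU
```

The selection survives only by luck: whether MLU "wins" depends on every later condition being false, not on the structure of the code.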

Fix

Change the second `if is_sdaa_available()` to `elif`, so that all backends form a single mutually exclusive chain with MLU at its head. This matches the structure already used by `default_device`.

  • Non-MLU hosts: `is_mlu_available()` is `False`, so the `elif` is evaluated exactly as the old `if` was; there is no behavior change.
  • MLU hosts: the chain now short-circuits correctly instead of being able to fall through to CUDA.
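Both bullet points can be checked exhaustively on a simplified model. In this hypothetical sketch (plain booleans instead of the real availability helpers, labels instead of real backend values), a single `if`/`elif` chain headed by MLU should behave as "pick the first available accelerator in priority order". With `mlu=False` that reference reduces to the old SDAA/MUSA/CUDA chain, and with `mlu=True` it always yields MLU.

```python
from itertools import product

# Priority order of the single chain after the fix (simplified, hypothetical).
ORDER = ("MLU", "SDAA", "MUSA", "CUDA")


def pick_accelerator_fixed(mlu: bool, sdaa: bool, musa: bool, cuda: bool):
    """Hypothetical mirror of the post-fix layout: one if/elif chain headed by MLU."""
    chosen = None
    if mlu:
        chosen = "MLU"      # stands in for backend="cncl", MULTI_MLU
    elif sdaa:              # was a standalone `if` before the fix
        chosen = "SDAA"
    elif musa:
        chosen = "MUSA"
    elif cuda:
        chosen = "CUDA"
    return chosen


def first_available(flags):
    """Reference behavior: the first accelerator in ORDER whose flag is True."""
    for name, present in zip(ORDER, flags):
        if present:
            return name
    return None


# Check every combination of the four availability flags.
for flags in product([False, True], repeat=len(ORDER)):
    assert pick_accelerator_fixed(*flags) == first_available(flags), flags
print("all 16 flag combinations short-circuit correctly")
```

Comparing against a first-match reference, rather than spot-checking one host, covers the non-MLU case (unchanged behavior) and the MLU case (no fall-through) in the same loop.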

Closes #4055

🤖 Generated with Claude Code

`is_mlu_available()` was checked with a standalone `if` followed by a separate `if is_sdaa_available()` chain, so on an MLU host the cncl/MULTI_MLU selection could be silently overwritten by a later branch (e.g. CUDA) instead of short-circuiting. Switch the second `if` to `elif` so all accelerator backends form a single mutually exclusive chain, matching `default_device`.

Fixes huggingface#4055

Development

Successfully merging this pull request may close these issues.

cncl should at the same level in _prepare_backend in src/accelerate/state.py