Robust FP8 layer detection for ignore_layers (#1283) #1289
scopophobic wants to merge 2 commits into intel:main from
Conversation
Signed-off-by: Adithyan Madhu <adithyanworkmail@gmail.com>
[pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
The review comments below refer to this diff excerpt:

    logger.trace(f"Auto-detected FP8 layer to ignore : {n}")
    ...
    if ignore_layers:
        ignore_list = ignore_layers.replace(" ", "").split(",")
yiliu30 commented:
Hi @scopophobic, thanks for your interest in fixing this issue! I think there might be a bit of a misunderstanding.
We don’t want to skip all FP8 layers. The idea is that we start with an FP8 model and want to requantize it to another format, like W4A16. However, we don’t want certain layers—such as those inside the attention module—to be quantized to W4A16.
The fix in #1286 is aligned with what we're aiming for.
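(For illustration, a minimal sketch of the matching step described above, assuming `ignore_layers` is a comma-separated string of name fragments such as "self_attn,lm_head"; `collect_ignored_layers` and the plain substring match are stand-ins for this sketch, not auto-round's actual `get_fp_layer_names` implementation.)

```python
import torch.nn as nn

def collect_ignored_layers(model: nn.Module, ignore_layers: str) -> list[str]:
    """Return names of Linear layers whose qualified name contains any of the
    comma-separated patterns in `ignore_layers`."""
    patterns = [p for p in ignore_layers.replace(" ", "").split(",") if p]
    return [
        name
        for name, module in model.named_modules()
        if isinstance(module, nn.Linear) and any(p in name for p in patterns)
    ]

# Usage sketch: layers matched here keep their original precision while the
# rest of the model is requantized to W4A16.
# skipped = collect_ignored_layers(model, "self_attn,lm_head")
```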
yiliu30 added:
Hi @scopophobic, would you be interested in working on the remaining part of this issue? See #1283 (comment).
scopophobic replied:
Hi @yiliu30, thanks a lot for the clarification; that helped resolve a misunderstanding I had 👍
I now understand that the goal is not to skip all FP8 layers, but to start from an FP8 model and re-quantize it (e.g., to W4A16), while keeping specific submodules (like attention) from being quantized.
I’m definitely interested in working on the remaining part of #1283. My current thought is to make FP8 detection more robust by moving away from class-name checks (like "FP8Linear") and instead relying on explicit FP8 characteristics (e.g., presence of FP8 scale metadata used during dequantization). This would allow supporting multiple FP8 layer implementations without brittle heuristics.
Does this approach sound aligned with what you had in mind for this issue?
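(A minimal sketch of that idea, assuming a PyTorch build with float8 dtypes and that FP8 layers expose either an FP8 weight dtype or a dequantization scale stored next to the weight; `is_fp8_layer` and the `weight_scale`/`scale` attribute names are assumptions for illustration, not a fixed convention.)

```python
import torch
import torch.nn as nn

_FP8_DTYPES = {torch.float8_e4m3fn, torch.float8_e5m2}

def is_fp8_layer(module: nn.Module) -> bool:
    """Heuristic FP8 check based on tensor properties instead of class names."""
    weight = getattr(module, "weight", None)
    if not isinstance(weight, torch.Tensor):
        return False
    # Direct case: the weight is stored in an FP8 dtype.
    if weight.dtype in _FP8_DTYPES:
        return True
    # Some implementations keep a packed weight plus a separate dequantization
    # scale; treat the presence of such a scale as FP8 evidence (assumed names).
    return any(hasattr(module, attr) for attr in ("weight_scale", "scale"))

# Usage sketch: names of layers the requantization step could treat as FP8.
# fp8_layers = [name for name, mod in model.named_modules() if is_fp8_layer(mod)]
```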
FP8 layers were not detected by get_fp_layer_names, causing ignore_layers
to be ignored. This PR:
Signed-off-by: Adithyan Madhu <adithyanworkmail@gmail.com>