feat: add Qwen3.5 MoE hybrid layer support #187
Conversation
Qwen3.5 MoE uses GatedDeltaNet (linear attention) on some layers instead of standard self-attention, causing abliteration to fail because `self_attn.o_proj` doesn't exist on those layers.

Changes:
- Wrap the `self_attn.o_proj` access in `suppress(Exception)` and add `linear_attn.out_proj` as an alternative attention out-projection for GatedDeltaNet layers
- Scan all layers in `get_abliterable_components()` instead of only layer 0, since hybrid models have different components on different layers
- Derive LoRA `target_modules` from the actual `named_modules()` instead of splitting component keys, which fails when module names differ across layers (e.g. `o_proj` vs. `out_proj`)

Tested with Qwen3.5-397B-A17B (7/100 refusals, KL 0.2676).

Relates to p-e-w#43

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
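The per-layer fallback described in the PR can be sketched as follows. This is a hypothetical helper, not the PR's actual code: the function name `get_attention_out_projections` and the `model.model.layers` attribute path are illustrative assumptions about a Transformers-style decoder.

```python
from contextlib import suppress

def get_attention_out_projections(model):
    """Collect the attention out-projection from every decoder layer.

    Hybrid models such as Qwen3.5 MoE mix standard self-attention
    (self_attn.o_proj) with GatedDeltaNet linear attention
    (linear_attn.out_proj), so every layer must be probed rather
    than just layer 0.
    """
    projections = {}
    for index, layer in enumerate(model.model.layers):
        # Standard layers expose self_attn.o_proj; on GatedDeltaNet
        # layers the attribute access raises, which suppress() swallows
        # so we fall through to the linear_attn alternative.
        with suppress(Exception):
            projections[index] = layer.self_attn.o_proj
            continue
        with suppress(Exception):
            projections[index] = layer.linear_attn.out_proj
    return projections
```

The `suppress(Exception)` mirrors the PR description; a narrower `AttributeError` would also work and would hide fewer unrelated bugs.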
Summary of Changes

Hello @farolone, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces comprehensive support for hybrid model architectures, specifically addressing the Qwen3.5 MoE model. It refines the module identification and LoRA targeting mechanisms to correctly handle layers with different attention output projections and varying component structures, thereby enhancing the abliteration framework's compatibility and reliability for complex models.
Code Review
This pull request adds support for Qwen3.5 MoE hybrid layers by making the model component discovery more robust. The changes correctly handle models with different module types across layers, such as self_attn and linear_attn. The logic for identifying LoRA target modules and abliterable components has been updated to scan all layers instead of just the first one. The implementation is solid. I've added one suggestion to improve code readability.
Cool! Did you upload the abliterated model to HF?
I will have to delete it. In two days.
Pity you didn't upload the model card, I wanted to check whether the individual components were listed as expected. Do you by chance have the full parameters of the chosen trial?
Abliteration parameters (table truncated in transcript). Performance Metric: KL divergence
That can't be correct. Where is the component name for the last three parameters? |
Qwen3.5 MoE uses a hybrid architecture, needing some more care:

1. **Hybrid attention layers**: Some layers use standard self-attention (`self_attn.o_proj`) while others use GatedDeltaNet linear attention (`linear_attn.out_proj`). Needs the special handling introduced in this PR.
2. **Fused expert parameters**: Expert weights are stored as stacked `nn.Parameter` tensors rather than as `nn.Module` instances. Also handled.

Changes:
- Add fused expert detection (`get_fused_expert_params()`) and direct weight modification with snapshot/restore for model reset
- Add hybrid layer support: `linear_attn.out_proj` as alternative attention out-projection, `mlp.shared_expert.down_proj` for the shared expert
- Scan all layers in `get_abliterable_components()` instead of only layer 0, since hybrid models have different components per layer
- Derive LoRA `target_modules` from `named_modules()` instead of splitting component keys, which is more robust for hybrid architectures
- Pass `enable_thinking=False` to `apply_chat_template()` so models with thinking mode (e.g. Qwen3.5) produce prompts without `<think>` tags, matching standard deployment configurations. Falls back gracefully for tokenizers that don't support this kwarg.

Tested on Qwen3.5-35B-A3B (NVIDIA RTX PRO 6000 Blackwell):
- 200-trial Pareto optimization completes successfully
- Fused expert abliteration correctly modifies expert slices

Comparison with other PRs for Qwen3.5 MoE. This PR covers:
- Hybrid layer detection
- Shared expert support (missing in p-e-w#187)
- Fused expert abliteration (missing in both p-e-w#187 and p-e-w#193)
- Separate attention component keys (`o_proj` + `out_proj`) (missing in p-e-w#187)
- Setting `enable_thinking=False`, which is in neither p-e-w#193 nor p-e-w#187. Heretic currently injects `<think></think>`, but in the Qwen3.5 case those tokens never exist in the first place when thinking is disabled, so this approach should be slightly more correct.
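The graceful fallback for `enable_thinking` described above could look roughly like this. The wrapper name `render_prompt` is hypothetical; `apply_chat_template` is the standard Hugging Face tokenizer method, and the assumption (stated in the PR) is that tokenizers without thinking-mode support reject the extra kwarg:

```python
def render_prompt(tokenizer, messages):
    """Render a chat prompt, disabling thinking mode where supported.

    Qwen3.5-style chat templates accept enable_thinking=False to omit
    <think> tags; older tokenizers raise TypeError on the unknown
    kwarg, in which case we render the prompt normally.
    """
    try:
        return tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True,
            enable_thinking=False,
        )
    except TypeError:
        # Template/tokenizer doesn't know the kwarg; fall back.
        return tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True,
        )
```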
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Merged, thank you!
* feat: add Qwen3.5 MoE hybrid layer support

  Qwen3.5 MoE uses GatedDeltaNet (linear attention) on some layers instead of standard self-attention, causing abliteration to fail because `self_attn.o_proj` doesn't exist on those layers.

  Changes:
  - Wrap `self_attn.o_proj` in `suppress(Exception)` and add `linear_attn.out_proj` as alternative attention out-projection for GatedDeltaNet layers
  - Scan all layers in `get_abliterable_components()` instead of only layer 0, since hybrid models have different components on different layers
  - Derive LoRA `target_modules` from actual `named_modules()` instead of splitting component keys, which fails when module names differ across layers (e.g. `o_proj` vs. `out_proj`)

  Tested with Qwen3.5-397B-A17B (7/100 refusals, KL 0.2676).

  Relates to p-e-w#43

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Apply suggestion from @gemini-code-assist[bot]

  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Philipp Emanuel Weidmann <pew@worldwidemann.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Summary
Qwen3.5 MoE uses GatedDeltaNet (linear attention) on some layers instead of standard self-attention. This causes abliteration to fail because `self_attn.o_proj` doesn't exist on those layers.

- Wrap the `self_attn.o_proj` access in `suppress(Exception)` and add `linear_attn.out_proj` as alternative attention out-projection for GatedDeltaNet layers
- Scan all layers in `get_abliterable_components()` instead of only layer 0, since hybrid models have different components on different layers
- Derive LoRA `target_modules` from the actual `named_modules()` instead of splitting component keys, which fails when module names differ across layers (e.g. `o_proj` vs. `out_proj`)

Test results
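Deriving `target_modules` from the module tree rather than from component keys could be sketched as below. The function name and the suffix list are illustrative assumptions; the input is the `(name, module)` pairs that `model.named_modules()` yields in PyTorch:

```python
def derive_lora_target_modules(named_modules,
                               suffixes=("o_proj", "out_proj", "down_proj")):
    """Collect LoRA target module names by scanning the module tree.

    Splitting component keys breaks when module names differ across
    layers (e.g. "o_proj" vs "out_proj"); walking the actual
    named_modules() output and keeping matching leaf names does not.
    """
    targets = set()
    for name, _module in named_modules:
        # The leaf name is what PEFT matches target_modules against.
        leaf = name.rsplit(".", 1)[-1]
        if leaf in suffixes:
            targets.add(leaf)
    return sorted(targets)
```

A production version would additionally check that each matched module is a linear projection before targeting it.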
Successfully abliterated Qwen3.5-397B-A17B (7× H200 SXM):

direction_index=52.45, attn.o_proj.max_weight=1.34, max_weight_position=36.86, min_weight=0.66, min_weight_distance=15.21

The resulting model passes both censorship removal and quality preservation tests.
Relation to #43
This PR is complementary to #43 (hybrid layer support for LFM/Mamba/Conv). While #43 adds broad architecture support across `get_layer_matrices()`, this PR focuses specifically on Qwen3.5 MoE's GatedDeltaNet layers and fixes `get_layer_modules()` + LoRA targeting, which #43 does not touch.

🤖 Generated with Claude Code