feat: add Qwen3.5 MoE hybrid layer support #187

Merged
p-e-w merged 2 commits into p-e-w:master from farolone:feat/qwen3.5-moe-support
Mar 6, 2026
Conversation

@farolone
Contributor

Summary

Qwen3.5 MoE uses GatedDeltaNet (linear attention) on some layers instead of standard self-attention. This causes abliteration to fail because self_attn.o_proj doesn't exist on those layers.

  • Wrap self_attn.o_proj access in suppress(Exception) and add linear_attn.out_proj as an alternative attention out-projection for GatedDeltaNet layers
  • Scan all layers in get_abliterable_components() instead of only layer 0, since hybrid models have different components on different layers
  • Derive LoRA target_modules from actual named_modules() instead of splitting component keys, which fails when module names differ across layers (e.g. o_proj vs out_proj)
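
As a rough illustration of the first two bullets (a hedged sketch, not the actual heretic code: the real implementation operates on torch modules, while stand-in namespace objects are used here), scanning all layers with `suppress(Exception)` lets a hybrid model contribute whichever projection each layer actually has:

```python
# Sketch of scan-all-layers component discovery for a hybrid model.
# Layer objects are stand-ins; strings stand in for nn.Linear modules.
from contextlib import suppress
from types import SimpleNamespace

def _make_layer(modules):
    """Build a toy layer whose dotted paths mimic named submodules."""
    layer = SimpleNamespace()
    for path, mod in modules.items():
        parts = path.split(".")
        obj = layer
        for p in parts[:-1]:
            if not hasattr(obj, p):
                setattr(obj, p, SimpleNamespace())
            obj = getattr(obj, p)
        setattr(obj, parts[-1], mod)
    return layer

# A hybrid model: layer 0 uses standard attention, layer 1 uses
# GatedDeltaNet-style linear attention.
layers = [
    _make_layer({"self_attn.o_proj": "Linear", "mlp.down_proj": "Linear"}),
    _make_layer({"linear_attn.out_proj": "Linear", "mlp.down_proj": "Linear"}),
]

def get_abliterable_components(layers):
    """Scan *all* layers, tolerating modules that are absent on some."""
    components = {}
    for layer in layers:
        with suppress(Exception):
            components.setdefault("attn.o_proj", layer.self_attn.o_proj)
        with suppress(Exception):
            # GatedDeltaNet layers expose the out-projection here instead.
            components.setdefault("attn.o_proj", layer.linear_attn.out_proj)
        with suppress(Exception):
            components.setdefault("mlp.down_proj", layer.mlp.down_proj)
    return components

print(sorted(get_abliterable_components(layers)))
# ['attn.o_proj', 'mlp.down_proj']
```

Scanning only layer 0 would have missed `linear_attn.out_proj` entirely, which is why the fix matters for models whose layer types alternate.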

Test results

Successfully abliterated Qwen3.5-397B-A17B (7× H200 SXM):

  • Trial 104 selected from 200-trial Pareto front
  • Refusals: 7/100, KL divergence: 0.2676
  • Parameters: direction_index=52.45, attn.o_proj.max_weight=1.34, max_weight_position=36.86, min_weight=0.66, min_weight_distance=15.21

The resulting model passes both censorship removal and quality preservation tests.

Relation to #43

This PR is complementary to #43 (hybrid layer support for LFM/Mamba/Conv). While #43 adds broad architecture support across get_layer_matrices(), this PR focuses specifically on Qwen3.5 MoE's GatedDeltaNet layers and fixes get_layer_modules() + LoRA targeting which #43 does not touch.
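
The LoRA targeting change can be sketched as follows (a hedged illustration: the module names `o_proj`/`out_proj` match the PR, but the surrounding data is invented, with strings standing in for the torch modules that `named_modules()` would really yield). Deriving `target_modules` from leaf names across all layers naturally picks up both projection variants:

```python
# Stand-in for model.named_modules(): (dotted name, module type) pairs.
named_modules = [
    ("model.layers.0.self_attn.o_proj", "Linear"),
    ("model.layers.0.mlp.down_proj", "Linear"),
    ("model.layers.1.linear_attn.out_proj", "Linear"),
    ("model.layers.1.mlp.down_proj", "Linear"),
]

# Collect unique leaf names of linear modules across *all* layers,
# instead of splitting component keys derived from layer 0 only.
target_modules = sorted({name.rsplit(".", 1)[-1]
                         for name, kind in named_modules
                         if kind == "Linear"})
print(target_modules)  # ['down_proj', 'o_proj', 'out_proj']
```

Splitting component keys from a single layer would have produced only `o_proj` here, so LoRA adapters would silently skip the GatedDeltaNet layers' `out_proj`.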

🤖 Generated with Claude Code

Qwen3.5 MoE uses GatedDeltaNet (linear attention) on some layers instead
of standard self-attention, causing abliteration to fail because
self_attn.o_proj doesn't exist on those layers.

Changes:
- Wrap self_attn.o_proj in suppress(Exception) and add linear_attn.out_proj
  as alternative attention out-projection for GatedDeltaNet layers
- Scan all layers in get_abliterable_components() instead of only layer 0,
  since hybrid models have different components on different layers
- Derive LoRA target_modules from actual named_modules() instead of
  splitting component keys, which fails when module names differ across
  layers (e.g. "o_proj" vs "out_proj")

Tested with Qwen3.5-397B-A17B (7/100 refusals, KL 0.2676).

Relates to p-e-w#43

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@gemini-code-assist
Contributor

Summary of Changes

Hello @farolone, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces comprehensive support for hybrid model architectures, specifically addressing the Qwen3.5 MoE model. It refines the module identification and LoRA targeting mechanisms to correctly handle layers with different attention output projections and varying component structures, thereby enhancing the abliteration framework's compatibility and reliability for complex models.

Highlights

  • Hybrid Layer Support: Implemented support for Qwen3.5 MoE hybrid layers by gracefully handling both self_attn.o_proj and linear_attn.out_proj as alternative attention output projections.
  • Dynamic Component Identification: Updated get_abliterable_components() to scan all layers, ensuring accurate component identification in hybrid models where different layers may have varying structures.
  • Robust LoRA Targeting: Revised LoRA target_modules derivation to use actual named_modules() from all layers, providing robustness against differing module names (e.g., o_proj vs. out_proj) across layers.


Changelog
  • src/heretic/model.py
    • Refactored LoRA target_modules collection to iterate through all layers and dynamically identify unique leaf module names, improving robustness for hybrid models.
    • Added a fallback mechanism for attention output projection within get_layer_modules, allowing linear_attn.out_proj to be used when self_attn.o_proj is absent, specifically for GatedDeltaNet layers.
    • Modified get_abliterable_components to aggregate components from all model layers instead of just the first, supporting models with heterogeneous layer structures.
Activity
  • The author successfully tested the changes by abliterating Qwen3.5-397B-A17B, achieving specific refusal rates (7/100) and KL divergence (0.2676), confirming the model passes censorship removal and quality preservation tests.
  • This PR is noted as complementary to add support for hybrid layer models: conv, mamba #43, which focuses on broad architecture support for LFM/Mamba/Conv, while this PR specifically addresses Qwen3.5 MoE's GatedDeltaNet layers and LoRA targeting.
  • The code for this pull request was generated with Claude Code.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request adds support for Qwen3.5 MoE hybrid layers by making the model component discovery more robust. The changes correctly handle models with different module types across layers, such as self_attn and linear_attn. The logic for identifying LoRA target modules and abliterable components has been updated to scan all layers instead of just the first one. The implementation is solid. I've added one suggestion to improve code readability.

Comment thread src/heretic/model.py Outdated
@p-e-w
Owner

p-e-w commented Feb 22, 2026

Cool! Did you upload the abliterated model to HF?

@farolone
Contributor Author

I will have to delete it in two days:
matthiasstegner/Qwen3.5-397B-A17B-heretic

@p-e-w
Owner

p-e-w commented Feb 22, 2026

Pity you didn't upload the model card, I wanted to check whether the individual components were listed as expected.

Do you by chance have the full parameters of the chosen trial?

@farolone
Contributor Author

Abliteration parameters

┌────────────────────────┬───────┐
│ Parameter │ Value │
├────────────────────────┼───────┤
│ direction_index │ 52.45 │
├────────────────────────┼───────┤
│ attn.o_proj.max_weight │ 1.34 │
├────────────────────────┼───────┤
│ max_weight_position │ 36.86 │
├────────────────────────┼───────┤
│ min_weight │ 0.66 │
├────────────────────────┼───────┤
│ min_weight_distance │ 15.21 │
└────────────────────────┴───────┘

Performance

┌───────────────┬────────────┬───────────────────┐
│ Metric        │ This model │ Original model    │
├───────────────┼────────────┼───────────────────┤
│ KL divergence │ 0.2676     │ 0 (by definition) │
├───────────────┼────────────┼───────────────────┤
│ Refusals      │ 7/100      │ ~100/100          │
└───────────────┴────────────┴───────────────────┘

Original model: https://huggingface.co/Qwen/Qwen3.5-397B-A17B

@p-e-w
Owner

p-e-w commented Feb 22, 2026

That can't be correct. Where is the component name for the last three parameters?

Sehyo pushed a commit to Sehyo/heretic that referenced this pull request Mar 3, 2026
Qwen3.5 MoE uses a hybrid architecture that needs some extra care:

1. **Hybrid attention layers**: Some layers use standard self-attention
   (`self_attn.o_proj`) while others use GatedDeltaNet linear attention
   (`linear_attn.out_proj`). This needs the special handling introduced
   in this PR.

2. **Fused expert parameters**: Expert weights are stored as stacked
   `nn.Parameter` tensors rather than as `nn.Module` instances. Also
   handled here.
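
A minimal sketch of the snapshot/restore idea for fused expert weights (hedged: plain nested lists stand in for the stacked `nn.Parameter` tensors, and the class name is invented; the real code would clone and copy torch tensors):

```python
import copy

class FusedExperts:
    """Toy stand-in for a module holding stacked per-expert weights."""
    def __init__(self):
        # Conceptually shape (num_experts, out_features, in_features).
        self.down_proj_weight = [[1.0, 2.0], [3.0, 4.0]]

experts = FusedExperts()

# Snapshot taken before abliteration so the model can be reset per trial.
snapshot = copy.deepcopy(experts.down_proj_weight)

experts.down_proj_weight[0][0] = 0.0                # direct weight modification
experts.down_proj_weight = copy.deepcopy(snapshot)  # model reset restores it

print(experts.down_proj_weight[0][0])  # 1.0
```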

Changes:
- Add fused expert detection (`get_fused_expert_params()`) and direct
  weight modification with snapshot/restore for model reset
- Add hybrid layer support: `linear_attn.out_proj` as alternative
  attention out-projection, `mlp.shared_expert.down_proj` for shared
  expert
- Scan all layers in `get_abliterable_components()` instead of only
  layer 0, since hybrid models have different components per layer
- Derive LoRA `target_modules` from `named_modules()` instead of
  splitting component keys, which is more robust for hybrid
  architectures
- Pass `enable_thinking=False` to `apply_chat_template()` so models
  with thinking mode (e.g. Qwen3.5) produce prompts without `<think>`
  tags, matching standard deployment configurations. Falls back
  gracefully for tokenizers that don't support this kwarg.
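
The graceful fallback for tokenizers that do not accept `enable_thinking` presumably looks something like the following (a sketch under assumptions: the helper name and the dummy tokenizer are invented; the real call goes through a `transformers` tokenizer's `apply_chat_template`):

```python
def build_prompt(tokenizer, messages):
    """Prefer enable_thinking=False; fall back for older chat templates."""
    try:
        return tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True,
            enable_thinking=False,
        )
    except TypeError:
        # This tokenizer's apply_chat_template doesn't accept the kwarg.
        return tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True,
        )

# Dummy tokenizer whose apply_chat_template lacks the kwarg entirely,
# so the first call above raises TypeError and the fallback path runs.
class LegacyTokenizer:
    def apply_chat_template(self, messages, tokenize, add_generation_prompt):
        return "".join(m["content"] for m in messages)

prompt = build_prompt(LegacyTokenizer(), [{"role": "user", "content": "hi"}])
print(prompt)  # prints "hi"
```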

Tested on Qwen3.5-35B-A3B (NVIDIA RTX PRO 6000 Blackwell):
- 200-trial Pareto optimization completes successfully
- Fused expert abliteration correctly modifies expert slices

Comparison with other PRs for Qwen 3.5 MoE:
This PR covers:
- Hybrid layer detection
- Shared Expert (missing in p-e-w#187)
- Fused expert abliteration (missing in both p-e-w#187 and p-e-w#193)
- Separate attn component keys (o_proj + out_proj) (missing in p-e-w#187).
- I also set `enable_thinking=False`, which is not in p-e-w#193 or p-e-w#187. (Currently Heretic injects `<think></think>`, but in the Qwen3.5 case those tokens never exist in the first place when thinking is disabled, so this approach should be slightly more correct.)
Sehyo added a commit to Sehyo/heretic that referenced this pull request Mar 3, 2026
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@p-e-w p-e-w merged commit 5e3c04c into p-e-w:master Mar 6, 2026
0 of 4 checks passed
@p-e-w
Owner

p-e-w commented Mar 6, 2026

Merged, thank you!

MagicalAlchemist pushed a commit to MagicalAlchemist/heretic that referenced this pull request Mar 7, 2026
* feat: add Qwen3.5 MoE hybrid layer support

0xA50C1A1 pushed a commit to 0xA50C1A1/heretic that referenced this pull request Mar 25, 2026
* feat: add Qwen3.5 MoE hybrid layer support
