feat: add Qwen3.5 MoE hybrid layer support #187
Conversation
Qwen3.5 MoE uses GatedDeltaNet (linear attention) on some layers instead of standard self-attention, causing abliteration to fail because `self_attn.o_proj` doesn't exist on those layers.

Changes:
- Wrap the `self_attn.o_proj` access in `suppress(Exception)` and add `linear_attn.out_proj` as an alternative attention out-projection for GatedDeltaNet layers
- Scan all layers in `get_abliterable_components()` instead of only layer 0, since hybrid models have different components on different layers
- Derive LoRA `target_modules` from the actual `named_modules()` instead of splitting component keys, which fails when module names differ across layers (e.g. `o_proj` vs. `out_proj`)

Tested with Qwen3.5-397B-A17B (7/100 refusals, KL 0.2676).

Relates to p-e-w#43

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
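The per-layer fallback described in the PR can be sketched as follows. This is a hypothetical helper, not the PR's actual code: the function name `get_attention_out_projections` and the `model.model.layers` attribute path are illustrative assumptions about a Transformers-style decoder.

```python
from contextlib import suppress

def get_attention_out_projections(model):
    """Collect the attention out-projection from every decoder layer.

    Hybrid models such as Qwen3.5 MoE mix standard self-attention
    (self_attn.o_proj) with GatedDeltaNet linear attention
    (linear_attn.out_proj), so every layer must be probed rather
    than just layer 0.
    """
    projections = {}
    for index, layer in enumerate(model.model.layers):
        # Standard layers expose self_attn.o_proj; on GatedDeltaNet
        # layers the attribute access raises, which suppress() swallows
        # so we fall through to the linear_attn alternative.
        with suppress(Exception):
            projections[index] = layer.self_attn.o_proj
            continue
        with suppress(Exception):
            projections[index] = layer.linear_attn.out_proj
    return projections
```

The `suppress(Exception)` mirrors the PR description; a narrower `AttributeError` would also work and would hide fewer unrelated bugs.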
Summary of Changes

Hello @farolone, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces comprehensive support for hybrid model architectures, specifically addressing the Qwen3.5 MoE model. It refines the module identification and LoRA targeting mechanisms to correctly handle layers with different attention output projections and varying component structures, thereby enhancing the abliteration framework's compatibility and reliability for complex models.
Code Review
This pull request adds support for Qwen3.5 MoE hybrid layers by making the model component discovery more robust. The changes correctly handle models with different module types across layers, such as self_attn and linear_attn. The logic for identifying LoRA target modules and abliterable components has been updated to scan all layers instead of just the first one. The implementation is solid. I've added one suggestion to improve code readability.
Cool! Did you upload the abliterated model to HF?
I will have to delete it. In two days.
Pity you didn't upload the model card, I wanted to check whether the individual components were listed as expected. Do you by chance have the full parameters of the chosen trial?
Abliteration parameters (table truncated in transcript). Performance Metric: KL divergence
That can't be correct. Where is the component name for the last three parameters? |
Qwen3.5 MoE uses a hybrid architecture, needing some more care:

1. **Hybrid attention layers**: Some layers use standard self-attention (`self_attn.o_proj`) while others use GatedDeltaNet linear attention (`linear_attn.out_proj`). Needs the special handling introduced in this PR.
2. **Fused expert parameters**: Expert weights are stored as stacked `nn.Parameter` tensors rather than as `nn.Module` instances. Also handled.

Changes:
- Add fused expert detection (`get_fused_expert_params()`) and direct weight modification with snapshot/restore for model reset
- Add hybrid layer support: `linear_attn.out_proj` as alternative attention out-projection, `mlp.shared_expert.down_proj` for the shared expert
- Scan all layers in `get_abliterable_components()` instead of only layer 0, since hybrid models have different components per layer
- Derive LoRA `target_modules` from `named_modules()` instead of splitting component keys, which is more robust for hybrid architectures
- Pass `enable_thinking=False` to `apply_chat_template()` so models with thinking mode (e.g. Qwen3.5) produce prompts without `<think>` tags, matching standard deployment configurations. Falls back gracefully for tokenizers that don't support this kwarg.

Tested on Qwen3.5-35B-A3B (NVIDIA RTX PRO 6000 Blackwell):
- 200-trial Pareto optimization completes successfully
- Fused expert abliteration correctly modifies expert slices

Comparison with other PRs for Qwen3.5 MoE. This PR covers:
- Hybrid layer detection
- Shared expert support (missing in p-e-w#187)
- Fused expert abliteration (missing in both p-e-w#187 and p-e-w#193)
- Separate attention component keys (`o_proj` + `out_proj`) (missing in p-e-w#187)
- Setting `enable_thinking=False`, which is in neither p-e-w#193 nor p-e-w#187. Heretic currently injects `<think></think>`, but in the Qwen3.5 case those tokens never exist in the first place when thinking is disabled, so this approach should be slightly more correct.
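The graceful fallback for `enable_thinking` described above could look roughly like this. The wrapper name `render_prompt` is hypothetical; `apply_chat_template` is the standard Hugging Face tokenizer method, and the assumption (stated in the PR) is that tokenizers without thinking-mode support reject the extra kwarg:

```python
def render_prompt(tokenizer, messages):
    """Render a chat prompt, disabling thinking mode where supported.

    Qwen3.5-style chat templates accept enable_thinking=False to omit
    <think> tags; older tokenizers raise TypeError on the unknown
    kwarg, in which case we render the prompt normally.
    """
    try:
        return tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True,
            enable_thinking=False,
        )
    except TypeError:
        # Template/tokenizer doesn't know the kwarg; fall back.
        return tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True,
        )
```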
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Merged, thank you!
* feat: add Qwen3.5 MoE hybrid layer support

  Qwen3.5 MoE uses GatedDeltaNet (linear attention) on some layers instead of standard self-attention, causing abliteration to fail because `self_attn.o_proj` doesn't exist on those layers.

  Changes:
  - Wrap `self_attn.o_proj` in `suppress(Exception)` and add `linear_attn.out_proj` as alternative attention out-projection for GatedDeltaNet layers
  - Scan all layers in `get_abliterable_components()` instead of only layer 0, since hybrid models have different components on different layers
  - Derive LoRA `target_modules` from actual `named_modules()` instead of splitting component keys, which fails when module names differ across layers (e.g. `o_proj` vs. `out_proj`)

  Tested with Qwen3.5-397B-A17B (7/100 refusals, KL 0.2676).

  Relates to p-e-w#43

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Apply suggestion from @gemini-code-assist[bot]

  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Philipp Emanuel Weidmann <pew@worldwidemann.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Summary
Qwen3.5 MoE uses GatedDeltaNet (linear attention) on some layers instead of standard self-attention. This causes abliteration to fail because `self_attn.o_proj` doesn't exist on those layers.

- Wrap the `self_attn.o_proj` access in `suppress(Exception)` and add `linear_attn.out_proj` as alternative attention out-projection for GatedDeltaNet layers
- Scan all layers in `get_abliterable_components()` instead of only layer 0, since hybrid models have different components on different layers
- Derive LoRA `target_modules` from the actual `named_modules()` instead of splitting component keys, which fails when module names differ across layers (e.g. `o_proj` vs. `out_proj`)

Test results
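Deriving `target_modules` from the module tree rather than from component keys could be sketched as below. The function name and the suffix list are illustrative assumptions; the input is the `(name, module)` pairs that `model.named_modules()` yields in PyTorch:

```python
def derive_lora_target_modules(named_modules,
                               suffixes=("o_proj", "out_proj", "down_proj")):
    """Collect LoRA target module names by scanning the module tree.

    Splitting component keys breaks when module names differ across
    layers (e.g. "o_proj" vs "out_proj"); walking the actual
    named_modules() output and keeping matching leaf names does not.
    """
    targets = set()
    for name, _module in named_modules:
        # The leaf name is what PEFT matches target_modules against.
        leaf = name.rsplit(".", 1)[-1]
        if leaf in suffixes:
            targets.add(leaf)
    return sorted(targets)
```

A production version would additionally check that each matched module is a linear projection before targeting it.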
Successfully abliterated Qwen3.5-397B-A17B (7× H200 SXM):

direction_index=52.45, attn.o_proj.max_weight=1.34, max_weight_position=36.86, min_weight=0.66, min_weight_distance=15.21

The resulting model passes both censorship removal and quality preservation tests.
Relation to #43
This PR is complementary to #43 (hybrid layer support for LFM/Mamba/Conv). While #43 adds broad architecture support across `get_layer_matrices()`, this PR focuses specifically on Qwen3.5 MoE's GatedDeltaNet layers and fixes `get_layer_modules()` + LoRA targeting, which #43 does not touch.

🤖 Generated with Claude Code