Qwen3.5-9B is a hybrid architecture (Gated Delta Networks + full attention) that requires several adaptations:

- vLLM: use the dedicated qwen3_5 Docker image, the qwen3_coder tool call parser, and `--enforce-eager` (CUDA graph capture bug in the GDN causal conv1d layer)
- LoRA init: handle per-layer architecture differences; full_attention layers have q_proj doubled for the output gate (8192 vs 4096), while linear_attention (GDN) layers lack q/k/v/o_proj entirely
- Dependencies: bump `transformers>=5.0.0` and `huggingface_hub>=1.3.0` for qwen3_5 model type support
- Init container: install `--extra local` with CPU torch for LoRA weight creation
- `coerce_template_ids`: handle `BatchEncoding` (a Mapping subclass, not a plain dict)

Tested end-to-end: vLLM serves the model, the CLaaS API proxies with LoRA, and OpenClaw routes through successfully.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
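The per-layer differences described above can be sketched as a small shape map. The helper below is purely illustrative (not the CLaaS implementation); the module names, the 4096 hidden size, and the q_proj doubling follow the commit description:

```python
# Hypothetical sketch of per-layer LoRA target selection for a hybrid model.
# The q_proj doubling for the output gate and the absence of q/k/v/o_proj on
# GDN layers follow the commit message above; the helper itself is illustrative.
def lora_target_dims(
    layer_type: str, hidden: int = 4096, attn_output_gate: bool = True
) -> dict[str, int]:
    if layer_type != "full_attention":
        return {}  # GDN / linear_attention layers have no q/k/v/o_proj
    q_out = hidden * 2 if attn_output_gate else hidden  # output gate doubles q_proj
    return {"q_proj": q_out, "k_proj": hidden, "v_proj": hidden, "o_proj": hidden}

print(lora_target_dims("full_attention")["q_proj"])  # 8192
print(lora_target_dims("linear_attention"))          # {}
```

In a real checkpoint the k/v projection widths also depend on grouped-query attention settings, so treat the per-module sizes here as placeholders.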
Important: Review skipped. Auto incremental reviews are disabled on this repository; check the settings in the CodeRabbit UI or the ⚙️ Run configuration. Configuration used: Organization UI · Review profile: CHILL · Plan: Pro
📝 Walkthrough

This pull request upgrades the default model from Qwen3-8B to Qwen3.5-9B across the codebase, updates Docker vLLM configuration and dependencies, enhances LoRA training for multimodal models, and adds observability improvements.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 15ef5450ab
```typescript
if (lastAssistant) {
  const raw = JSON.stringify((lastAssistant as Record<string, unknown>).content);
  console.log("[claas-feedback] content type:", typeof (lastAssistant as Record<string, unknown>).content, Array.isArray((lastAssistant as Record<string, unknown>).content) ? "(array)" : "");
  console.log("[claas-feedback] preview:", raw.slice(0, 500));
```
Guard agent_end debug parsing when content is missing
If an assistant message arrives without a `content` field, `JSON.stringify(...)` yields `undefined`, so `raw.slice(...)` (and `raw.includes(...)`) throws and aborts the `agent_end` hook before `contextStore.set(...)` runs. In that case the feedback command loses the just-finished conversation context, so this path should skip debug parsing when `raw` is not a string.
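A minimal sketch of the suggested guard, using a hypothetical helper name rather than the plugin's actual hook code:

```typescript
// Hypothetical sketch: JSON.stringify returns undefined (not a string) when
// passed undefined, so guard before calling string methods on the result.
function previewContent(content: unknown): string | null {
  const raw = JSON.stringify(content);
  if (typeof raw !== "string") {
    return null; // skip debug parsing when content is missing
  }
  return raw.slice(0, 500);
}

console.log(previewContent(undefined));      // null
console.log(previewContent({ text: "hi" })); // {"text":"hi"}
```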
```typescript
console.log("[claas-feedback] content type:", typeof (lastAssistant as Record<string, unknown>).content, Array.isArray((lastAssistant as Record<string, unknown>).content) ? "(array)" : "");
console.log("[claas-feedback] preview:", raw.slice(0, 500));
console.log("[claas-feedback] has thinking:", raw.includes("think") || raw.includes("thinking"));
```
Gate assistant payload logging behind debug mode
These `console.log` statements always run, even when `CLAAS_FEEDBACK_DEBUG` is false, so every assistant turn writes raw content previews into service logs. That introduces unnecessary exposure of user/model text in production and increases log noise/cost; this should be routed through the existing `logDebug` gate (or removed after investigation).
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
claas/training/storage.py (1)
486-493: ⚠️ Potential issue | 🟠 Major: Breaking change, custom `target_modules` now strictly validated.

The new validation rejects any module name not in `dim_map`. Per `claas/core/types.py:259-285`, `LoraInitRequest.target_modules` accepts arbitrary strings. Callers or scripts using custom module names (e.g., `"embed_tokens"`, `"lm_head"`) will now receive a `ValueError`.

If custom modules should be allowed, consider skipping unknown modules with a warning instead of raising. If strict validation is intentional, document this breaking change.
♻️ Alternative: skip unknown modules with warning

```diff
-    unsupported_modules = sorted(set(target_modules) - set(dim_map))
-    if unsupported_modules:
-        raise ValueError(
-            "Unsupported target_modules: "
-            + ", ".join(unsupported_modules)
-            + ". Supported modules: "
-            + ", ".join(sorted(dim_map))
-        )
+    supported_modules = [m for m in target_modules if m in dim_map]
+    unsupported_modules = sorted(set(target_modules) - set(dim_map))
+    if unsupported_modules:
+        import warnings
+        warnings.warn(
+            f"Skipping unsupported target_modules: {', '.join(unsupported_modules)}. "
+            f"Supported: {', '.join(sorted(dim_map))}"
+        )
+    target_modules = supported_modules
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@claas/training/storage.py` around lines 486 - 493: the current strict validation builds unsupported_modules from target_modules minus dim_map and raises a ValueError; change this to be non-breaking by filtering out unknown modules and emitting a warning instead of raising. Compute the allowed set as the intersection of target_modules and dim_map, and if any unsupported_modules remain, call warnings.warn or the module logger with a clear message listing the skipped names, then proceed using the filtered list (replace use of unsupported_modules and the raise in the block where unsupported_modules is defined). Ensure downstream code that expects target_modules now uses the filtered/validated list.

claas/modal/worker.py (1)
31-31: ⚠️ Potential issue | 🔴 Critical: Update the Modal worker transformers pin to support Qwen3.5; it is currently pinned to `<5.0.0`, but Qwen3.5 requires transformers 5.x or later.

Qwen3.5 support was added only to Transformers 5.x (as of February 2026) and cannot work with transformers 4.x out of the box. The Modal worker's `transformers>=4.40.0,<5.0.0` constraint is incompatible with Qwen3.5, which the PR aims to support. Update the constraint to `transformers>=5.0.0` to resolve this conflict.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@claas/modal/worker.py` at line 31, replace the pinned dependency string "transformers>=4.40.0,<5.0.0" with "transformers>=5.0.0" in the Modal worker requirement list so the worker can load Qwen3.5; update the requirement literal wherever "transformers>=4.40.0,<5.0.0" appears (the dependency entry in the worker's requirements list) to the new "transformers>=5.0.0" spec.
🧹 Nitpick comments (1)
claas/training/storage.py (1)
506-509: Test coverage gap: hybrid model logic is untested.

Per `tests/test_storage.py:206-217`, the mock config has no `layer_types` field, so the new hybrid-model handling (skipping attention modules for non-full-attention layers, q_proj doubling for the output gate) is not exercised by tests. A miscalculation in tensor shapes or key naming would not be detected.

Consider adding a test case with a mock config that includes `layer_types` and `attn_output_gate`. Would you like me to generate a test case for hybrid model LoRA initialization?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@claas/training/storage.py` around lines 506 - 509, Add a unit test in tests/test_storage.py that provides a mock model config containing layer_types (with a mix of "full_attention" and non-full types) and attn_output_gate enabled, then run the LoRA initialization code paths that reference attn_modules, mod_name and layer_type in claas/training/storage.py to exercise the branch that skips attention modules for non-full-attention layers and the q_proj doubling behavior for output gate; assert that attention modules in non-full-attention layers are not modified/registered, that q_proj-related parameter keys are created with the expected doubled shapes/naming for gated attention (check names like q_proj and any gate-specific suffixes), and that no shape/key mismatches occur.
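A test along those lines could use a mock config with mixed layer types. Everything below is a hypothetical sketch: the mock config fields follow the review comment, but `build_lora_keys` is a stand-in for the shape logic under test, not the real `create_initial_lora` signature:

```python
# Hypothetical sketch of the suggested test: a mock config with mixed
# layer_types, asserting GDN layers get no attention LoRA keys and that
# full-attention layers get a doubled q_proj when attn_output_gate is set.
from types import SimpleNamespace

def build_lora_keys(config) -> dict[str, int]:
    """Stand-in for the shape logic under test (illustrative only)."""
    keys: dict[str, int] = {}
    for idx, layer_type in enumerate(config.layer_types):
        if layer_type != "full_attention":
            continue  # non-full-attention (GDN) layers are skipped entirely
        q_out = config.hidden_size * 2 if config.attn_output_gate else config.hidden_size
        keys[f"model.layers.{idx}.self_attn.q_proj"] = q_out
    return keys

config = SimpleNamespace(
    hidden_size=4096,
    attn_output_gate=True,
    layer_types=["linear_attention", "full_attention", "linear_attention"],
)
assert build_lora_keys(config) == {"model.layers.1.self_attn.q_proj": 8192}
```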
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b7e10d90-7147-42d5-aff5-07eed1431745
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (25)
- .claude/skills/setup-local/SKILL.md
- .claude/skills/setup-modal/SKILL.md
- README.md
- claas/core/config.py
- claas/core/configs/local.yaml
- claas/core/configs/modal.yaml
- claas/core/configs/tinker.yaml
- claas/core/types.py
- claas/eval/types.py
- claas/inference/helpers.py
- claas/modal/worker.py
- claas/training/storage.py
- docker/.env.local.example
- docker/Dockerfile.init
- docker/README.md
- docker/docker-compose.yml
- docker/scripts/init-stack.py
- docker/scripts/start_vllm.sh
- plugins/claas-feedback/index.ts
- pyproject.toml
- tests/integration/test_local_engine_integration.py
- tests/test_api.py
- tests/test_config.py
- tests/test_env_fallbacks.py
- tests/test_local_training_engine.py
```python
# while allowing gradients to propagate through A.
tensors: dict[str, torch.Tensor] = {}
for layer_idx in range(num_layers):
    layer_type = layer_types[layer_idx] if layer_types else "full_attention"
```
Potential IndexError if layer_types length doesn't match num_layers.
If a model's config has a `layer_types` list shorter than `num_hidden_layers`, this line will raise an `IndexError`. Consider adding a length validation, or an explicit bounds check with a fallback.
🛡️ Proposed defensive check

```diff
     tensors: dict[str, torch.Tensor] = {}
+    if layer_types and len(layer_types) != num_layers:
+        raise ValueError(
+            f"layer_types length ({len(layer_types)}) != num_hidden_layers ({num_layers})"
+        )
     for layer_idx in range(num_layers):
         layer_type = layer_types[layer_idx] if layer_types else "full_attention"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@claas/training/storage.py` at line 504, The current selection layer_type =
layer_types[layer_idx] if layer_types else "full_attention" can raise IndexError
when layer_types exists but its length doesn't cover layer_idx; update the logic
in the function where layer_types and layer_idx are used (referencing
layer_types, layer_idx, num_layers/num_hidden_layers) to defensively check
length—e.g. if layer_types and layer_idx < len(layer_types) then use
layer_types[layer_idx], else fall back to "full_attention" (or validate and
raise a clear error if mismatched lengths are unacceptable); you can also add an
explicit validation earlier that compares len(layer_types) to num_hidden_layers
and logs or raises a descriptive exception.
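The fallback variant suggested by the prompt can be sketched as a small helper (hypothetical name, not in the codebase):

```python
# Hypothetical helper: bounds-safe layer_type lookup with a default,
# avoiding IndexError when layer_types is shorter than num_layers.
def layer_type_at(layer_types, idx, default="full_attention"):
    if layer_types and idx < len(layer_types):
        return layer_types[idx]
    return default

print(layer_type_at(["linear_attention"], 0))  # linear_attention
print(layer_type_at(["linear_attention"], 2))  # full_attention (out of range)
print(layer_type_at(None, 0))                  # full_attention (no list at all)
```

Whether to silently fall back or raise on a length mismatch is a design choice; the proposed defensive check above takes the stricter route.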
```diff
 MODEL="${MODEL:-Qwen/Qwen3.5-9B}"
 HOST="${HOST:-127.0.0.1}"
 PORT="${PORT:-8000}"
 API_KEY="${API_KEY:-sk-local}"
-SERVED_MODEL_NAMES="${SERVED_MODEL_NAMES:-qwen3-8b}"
+SERVED_MODEL_NAMES="${SERVED_MODEL_NAMES:-qwen3.5-9b}"
 ENABLE_SLEEP_MODE="${ENABLE_SLEEP_MODE:-1}"
 ENABLE_AUTO_TOOL_CHOICE="${ENABLE_AUTO_TOOL_CHOICE:-1}"
-TOOL_CALL_PARSER="${TOOL_CALL_PARSER:-qwen3_xml}"
+TOOL_CALL_PARSER="${TOOL_CALL_PARSER:-qwen3_coder}"
```
🧩 Analysis chain
🌐 Web query:
vLLM qwen3_coder tool call parser
💡 Result:
In vLLM, the qwen3_coder tool-call parser is the built-in parser that extracts Qwen3-Coder’s XML-style tool calls from the model’s output and converts them into the OpenAI-compatible tool_calls structure. The implementation is Qwen3CoderToolParser in vllm/tool_parsers/qwen3coder_tool_parser.py. [2]
How to enable it (vLLM OpenAI server)
Use both flags:
`--enable-auto-tool-choice` and `--tool-call-parser qwen3_coder` [1]
Example:
```shell
vllm serve Qwen/Qwen3-Coder-... \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder
```

Custom / alternative parsers
vLLM also supports registering your own parser via --tool-parser-plugin, then selecting it by name with --tool-call-parser. [1]
Security note (important)
If you use --tool-call-parser qwen3_coder, upgrade vLLM to >= 0.10.1.1: the Qwen3-Coder tool parser had an RCE vulnerability (use of eval() during parameter parsing) affecting >=0.10.0 and <0.10.1.1, patched in 0.10.1.1. [3]
Sources: [1] [2] [3]
🏁 Script executed:

```shell
# Search for vLLM version specification in the repository
find . -type f \( -name "requirements*.txt" -o -name "pyproject.toml" -o -name "setup.py" -o -name "setup.cfg" -o -name "Dockerfile" -o -name "docker-compose*.yml" \) | head -20 | xargs grep -l "vllm\|Qwen" 2>/dev/null
```

Repository: kfallah/CLaaS
Length of output: 84

🏁 Script executed:

```shell
# Look for any other references to Qwen3.5 or tool parser configuration in the repo
rg "Qwen3\.?5|tool.?parser|qwen3" --type sh --type py --type yaml --max-count 20
```

Repository: kfallah/CLaaS
Length of output: 2620

🏁 Script executed:

```shell
# Check the start_vllm.sh file to understand the context better
cat -n docker/scripts/start_vllm.sh
```

Repository: kfallah/CLaaS
Length of output: 4357

🏁 Script executed:

```shell
# Check docker-compose.yml for vLLM version information
cat -n docker/docker-compose.yml | head -30
```

Repository: kfallah/CLaaS
Length of output: 1191

🏁 Script executed:

```shell
# Search for Qwen-Coder model references to see if there's a coder variant
rg "Coder|coder" --type sh --type py --type yaml -B2 -A2
```

Repository: kfallah/CLaaS
Length of output: 2326

🏁 Script executed:

```shell
# Check requirements or dependencies for vLLM version
rg "vllm" docker/scripts/ claas/ --type py --type txt --type yaml
```

Repository: kfallah/CLaaS
Length of output: 5589
Verify tool parser compatibility and vLLM version for security.
The model default is correctly set to Qwen3.5-9B, but qwen3_coder is designed specifically for Qwen3-Coder models, not the base Qwen3.5-9B. Confirm whether this model variant supports tool calling with the qwen3_coder parser or if a different parser should be used.
Additionally, the vLLM version should be explicitly pinned in requirements. The qwen3_coder parser had an RCE vulnerability (use of eval() in parameter parsing) affecting vLLM >= 0.10.0 and < 0.10.1.1—ensure the deployment uses >= 0.10.1.1 or later.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker/scripts/start_vllm.sh` around lines 6 - 13, The TOOL_CALL_PARSER
default (TOOL_CALL_PARSER) is set to qwen3_coder while MODEL defaults to
Qwen/Qwen3.5-9B—verify and change the parser to one compatible with the base
Qwen3.5-9B (or make TOOL_CALL_PARSER conditional on MODEL) so you aren't forcing
the Qwen3-Coder parser onto a non-coder model; also explicitly pin the vLLM
dependency in your requirements (or equivalent install manifest) to >=0.10.1.1
to avoid the known RCE in vLLM 0.10.0–0.10.1.1, and add a brief comment near the
TOOL_CALL_PARSER and MODEL declarations documenting the compatibility
requirement and the security-pinned vLLM version.
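Making the parser conditional on the model, as the prompt suggests, could look like the sketch below. The `qwen3_xml` default for non-coder models is an assumption carried over from the script's previous default and should be verified against vLLM's supported parser list:

```shell
# Hypothetical sketch: choose a tool-call parser default by model family.
# The qwen3_xml fallback for non-coder models is an assumption to verify.
MODEL="${MODEL:-Qwen/Qwen3.5-9B}"
case "$MODEL" in
  *Coder*) TOOL_CALL_PARSER="${TOOL_CALL_PARSER:-qwen3_coder}" ;;
  *)       TOOL_CALL_PARSER="${TOOL_CALL_PARSER:-qwen3_xml}" ;;
esac
echo "$TOOL_CALL_PARSER"
```

An explicit `TOOL_CALL_PARSER` environment variable still overrides the guess, which keeps the script's existing override behavior.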
```typescript
// Debug: inspect assistant message content shape for proxy-removal investigation
const lastAssistant = messages.slice().reverse().find((m: Record<string, unknown>) => m.role === "assistant");
if (lastAssistant) {
  const raw = JSON.stringify((lastAssistant as Record<string, unknown>).content);
  console.log("[claas-feedback] content type:", typeof (lastAssistant as Record<string, unknown>).content, Array.isArray((lastAssistant as Record<string, unknown>).content) ? "(array)" : "");
  console.log("[claas-feedback] preview:", raw.slice(0, 500));
  console.log("[claas-feedback] has thinking:", raw.includes("think") || raw.includes("thinking"));
}
```
Unconditional `console.log` bypasses the `debugEnabled` flag.

This debug block uses `console.log` directly, while the rest of the file uses `logDebug` (lines 78, 113, 154) to respect the `debugEnabled` configuration. This will emit logs in production regardless of the debug setting.

Since the comment indicates this is for "proxy-removal investigation", consider either:

- Removing this temporary debug code before merging, or
- Gating it behind the existing `debugEnabled` flag using `logDebug`.
🛠️ Option 2: Gate behind debugEnabled

```diff
   // Debug: inspect assistant message content shape for proxy-removal investigation
   const lastAssistant = messages.slice().reverse().find((m: Record<string, unknown>) => m.role === "assistant");
-  if (lastAssistant) {
+  if (debugEnabled && lastAssistant) {
     const raw = JSON.stringify((lastAssistant as Record<string, unknown>).content);
-    console.log("[claas-feedback] content type:", typeof (lastAssistant as Record<string, unknown>).content, Array.isArray((lastAssistant as Record<string, unknown>).content) ? "(array)" : "");
-    console.log("[claas-feedback] preview:", raw.slice(0, 500));
-    console.log("[claas-feedback] has thinking:", raw.includes("think") || raw.includes("thinking"));
+    logDebug(`[claas-feedback] content type: ${typeof (lastAssistant as Record<string, unknown>).content}${Array.isArray((lastAssistant as Record<string, unknown>).content) ? " (array)" : ""}`);
+    logDebug(`[claas-feedback] preview: ${raw.slice(0, 500)}`);
+    logDebug(`[claas-feedback] has thinking: ${raw.includes("think") || raw.includes("thinking")}`);
   }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```typescript
// Debug: inspect assistant message content shape for proxy-removal investigation
const lastAssistant = messages.slice().reverse().find((m: Record<string, unknown>) => m.role === "assistant");
if (debugEnabled && lastAssistant) {
  const raw = JSON.stringify((lastAssistant as Record<string, unknown>).content);
  logDebug(`[claas-feedback] content type: ${typeof (lastAssistant as Record<string, unknown>).content}${Array.isArray((lastAssistant as Record<string, unknown>).content) ? " (array)" : ""}`);
  logDebug(`[claas-feedback] preview: ${raw.slice(0, 500)}`);
  logDebug(`[claas-feedback] has thinking: ${raw.includes("think") || raw.includes("thinking")}`);
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/claas-feedback/index.ts` around lines 104 - 111, The temporary debug
block currently uses console.log unconditionally and should be removed or gated
by the existing debug flag; replace the direct console.log usage in the block
that finds lastAssistant (using messages.slice().reverse().find(...) and raw)
with calls to the module's logDebug helper and guard with the debugEnabled check
(the same pattern used at lines where logDebug is used) so the inspection of
lastAssistant.content (type, preview, has thinking) only emits when debugEnabled
is true.
```diff
 local = [
     "torch>=2.1.0",
-    "transformers==4.57.6",
+    "transformers>=5.0.0",
```
🧩 Analysis chain
🌐 Web query:
transformers PyPI release history 5.0.0 5.1.0 5.2.0 dates
💡 Result:
- transformers 5.0.0 — Released Jan 26, 2026 [1]
- transformers 5.1.0 — Released Feb 5, 2026 [2]
- transformers 5.2.0 — Released Feb 16, 2026 [3]
Sources: [1] (pypi.org) [2] (pypi.org) [3] (pypi.org)
Citations:
- 1: https://pypi.org/project/transformers/5.0.0/
- 2: https://pypi.org/project/transformers/5.1.0/
- 3: https://pypi.org/project/transformers/5.2.0/
🌐 Web query:
transformers Qwen3.5 support added when version
💡 Result:
Hugging Face Transformers added native Qwen3.5 support on 2026‑02‑09, and it’s available in the stable transformers release v5.2.0 (or newer). (huggingface.co)
🏁 Script executed:

```shell
# Check if pyproject.toml exists and what the current transformers requirement is
cat -n pyproject.toml | grep -A 5 -B 5 transformers
```

Repository: kfallah/CLaaS
Length of output: 285

🏁 Script executed:

```shell
# Search codebase for any references to qwen3.5 or qwen to understand if this is actually needed
rg -i "qwen" --type py
```

Repository: kfallah/CLaaS
Length of output: 2765
Raise the minimum transformers version to a Qwen3.5-capable release.
5.0.0 is too low for this codebase. transformers 5.0.0 and 5.1.0 were published on January 26, 2026 and February 5, 2026, but Hugging Face added Qwen3.5 support on February 9, 2026. A constrained environment can still resolve to a version in this range that lacks qwen3_5 support. Based on the release timeline, >=5.2.0,<6 is the appropriate minimum.
Suggested change

```diff
-    "transformers>=5.0.0",
+    "transformers>=5.2.0,<6",
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```toml
    "transformers>=5.2.0,<6",
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pyproject.toml` at line 37, The dependency declaration "transformers>=5.0.0"
is too low and may resolve to a release missing Qwen3.5 support; update the
dependency specification for the transformers package (the string
"transformers>=5.0.0" in pyproject.toml) to require a minimum that includes
Qwen3.5 support by changing it to a constrained range such as
"transformers>=5.2.0,<6" so environments cannot pull an incompatible 5.x
release.
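The intent of the `>=5.2.0,<6` constraint can also be enforced at runtime with a pure-stdlib check. This helper is hypothetical and not part of the PR; it only mirrors the major/minor bounds discussed above:

```python
# Hypothetical runtime guard mirroring the suggested constraint
# transformers>=5.2.0,<6 -- pure stdlib, no packaging dependency.
def in_supported_range(ver: str) -> bool:
    parts = ver.split(".")
    major, minor = int(parts[0]), int(parts[1])
    return (major, minor) >= (5, 2) and major < 6

print(in_supported_range("5.2.0"))  # True
print(in_supported_range("5.1.0"))  # False
print(in_supported_range("6.0.0"))  # False
```

A declarative pin in `pyproject.toml` is the real fix; a guard like this only turns a silent incompatibility into a clear startup error.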
Qwen3.5's Gated Delta Network layers require these CUDA kernels for correct forward pass computation. Without them, transformers falls back to a buggy torch implementation that causes illegal memory access errors during SDPO distillation training. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from 59674e0 to 04f74d6
Summary

- `coerce_template_ids` now handles `BatchEncoding` (a Mapping subclass, not a plain dict)
- `transformers>=5.0.0` and `huggingface_hub>=1.3.0` for qwen3_5 model type support
- `vllm/vllm-openai:qwen3_5` Docker image with `--enforce-eager` (CUDA graph capture bug in the GDN causal conv1d layer)

Changes across 26 files

- Core config (6 files): update default model ID in all configs, types, and defaults
- Docker (5 files): new vLLM image tag, tool call parser (`qwen3_coder`), `--enforce-eager`, init container with `--extra local` + CPU torch
- Training (1 file): `create_initial_lora` now reads `layer_types` and `attn_output_gate` from the model config to create correctly-shaped LoRA weights per layer type
- Inference (1 file): `coerce_template_ids` handles `BatchEncoding` via `__getitem__` + `"input_ids" in result` instead of `isinstance(result, dict)`
- Tests (5 files): update model references
- Docs (4 files): update README, docker README, setup skills
- Deps (2 files): transformers 5.x, huggingface_hub 1.3+, remove teacher extra (vllm conflicts with transformers 5.x)

Test plan

- `uv run ruff check` passes
- `uv run pytest tests/ -m "not integration"` — 114 passed

🤖 Generated with Claude Code
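The `BatchEncoding` coercion pattern from the inference change can be sketched with a plain `Mapping` stand-in. The helper name and shapes here are hypothetical; only the membership-plus-`__getitem__` pattern is taken from the PR description:

```python
# Hypothetical sketch of Mapping-tolerant input_ids coercion. BatchEncoding is
# a Mapping subclass, not a plain dict, so "input_ids" in result plus
# result["input_ids"] is used instead of isinstance(result, dict).
# UserDict stands in for BatchEncoding (also a Mapping, also not a dict).
from collections import UserDict

def coerce_ids(result):
    if "input_ids" in result:  # works for dict, BatchEncoding, or any Mapping
        return list(result["input_ids"])
    return list(result)

fake_batch = UserDict({"input_ids": [1, 2, 3]})
assert not isinstance(fake_batch, dict)  # an isinstance(result, dict) check would miss it
print(coerce_ids(fake_batch))            # [1, 2, 3]
print(coerce_ids({"input_ids": [4, 5]})) # [4, 5]
```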
Summary by CodeRabbit
New Features
Updates