
Migrate default model to Qwen3.5-9B#42

Open
kfallah wants to merge 2 commits into main from feat/qwen3.5-9b-migration

Conversation

kfallah (Owner) commented Mar 7, 2026

Summary

  • Migrate the entire CLaaS stack from Qwen3-8B to Qwen/Qwen3.5-9B (hybrid GDN + full attention architecture)
  • Fix LoRA initialization for hybrid models: per-layer awareness for different attention types, correct q_proj dimensions (doubled for output gate)
  • Fix coerce_template_ids to handle BatchEncoding (Mapping subclass, not plain dict)
  • Bump transformers>=5.0.0 and huggingface_hub>=1.3.0 for qwen3_5 model type support
  • Use dedicated vllm/vllm-openai:qwen3_5 Docker image with --enforce-eager (CUDA graph capture bug in GDN causal conv1d layer)

Changes across 26 files

Core config (6 files): Update default model ID in all configs, types, and defaults
Docker (5 files): New vLLM image tag, tool call parser (qwen3_coder), --enforce-eager, init container with --extra local + CPU torch
Training (1 file): create_initial_lora now reads layer_types and attn_output_gate from model config to create correctly-shaped LoRA weights per layer type
Inference (1 file): coerce_template_ids handles BatchEncoding via __getitem__ + "input_ids" in result instead of isinstance(result, dict)
Tests (5 files): Update model references
Docs (4 files): Update README, docker README, setup skills
Deps (2 files): transformers 5.x, huggingface_hub 1.3+, remove teacher extra (vllm conflicts with transformers 5.x)
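The BatchEncoding change can be sketched as follows. The helper name matches the PR description, but the body is an illustrative reconstruction under the stated assumption (BatchEncoding is a Mapping subclass, not a plain dict), not the actual implementation:

```python
def coerce_template_ids(result):
    # transformers' BatchEncoding subclasses Mapping, not dict, so an
    # isinstance(result, dict) check misses it. Duck-typing on __getitem__
    # plus "input_ids" key membership covers plain dicts and BatchEncoding
    # alike, while raw token-id lists fall through to the final branch.
    if hasattr(result, "__getitem__") and "input_ids" in result:
        return list(result["input_ids"])
    return list(result)
```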

Test plan

  • uv run ruff check passes
  • uv run pytest tests/ -m "not integration" — 114 passed
  • Full Docker stack tested end-to-end: vLLM → CLaaS API → OpenClaw with LoRA adapter
  • CI lint-and-test job

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Enhanced LoRA initialization with support for multimodal and complex model architectures
    • Improved input handling for inference operations
  • Updates

    • Default base model upgraded from Qwen3-8B to Qwen3.5-9B across all deployments
    • Tool Call Parser configuration updated
    • Dependencies updated: transformers (≥5.0.0) and huggingface_hub (≥1.3.0)
    • Docker configurations modernized for improved compatibility

Qwen3.5-9B is a hybrid architecture (Gated Delta Networks + full attention)
that requires several adaptations:

- vLLM: use dedicated qwen3_5 Docker image, qwen3_coder tool call parser,
  --enforce-eager (CUDA graph capture bug in GDN causal conv1d layer)
- LoRA init: handle per-layer architecture differences — full_attention
  layers have q_proj doubled for output gate (8192 vs 4096), linear_attention
  (GDN) layers lack q/k/v/o_proj entirely
- Dependencies: bump transformers>=5.0.0 and huggingface_hub>=1.3.0 for
  qwen3_5 model type support
- Init container: install --extra local with CPU torch for LoRA weight creation
- coerce_template_ids: handle BatchEncoding (Mapping subclass, not dict)

Tested end-to-end: vLLM serves model, CLaaS API proxies with LoRA,
OpenClaw routes through successfully.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
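The per-layer LoRA shape logic above can be illustrated with a small sketch. The function name and concrete dimensions here are assumptions drawn from the description (hidden size 4096, q_proj output doubled to 8192 when attn_output_gate is set); the real create_initial_lora reads these values from the model config:

```python
def lora_attn_shapes(layer_types, hidden_size=4096, rank=8, attn_output_gate=True):
    """Hypothetical sketch: per-layer LoRA A/B shapes for q_proj in a hybrid model."""
    shapes = {}
    for idx, layer_type in enumerate(layer_types):
        # linear_attention (GDN) layers have no q/k/v/o_proj, so no LoRA weights
        if layer_type != "full_attention":
            continue
        # the output gate doubles q_proj's output dimension (8192 vs 4096)
        q_out = hidden_size * (2 if attn_output_gate else 1)
        shapes[f"layers.{idx}.self_attn.q_proj.lora_A"] = (rank, hidden_size)
        shapes[f"layers.{idx}.self_attn.q_proj.lora_B"] = (q_out, rank)
    return shapes
```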
coderabbitai bot commented Mar 7, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4079414f-f066-4747-9fbf-4bd8c2a407b4

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

This pull request upgrades the default model from Qwen3-8B to Qwen3.5-9B across the codebase, updates Docker vLLM configuration and dependencies, enhances LoRA training for multimodal models, and adds observability improvements.

Changes

Model Version Upgrade (Qwen3-8B → Qwen3.5-9B)
Files: claas/core/config.py, claas/core/configs/local.yaml, claas/core/configs/modal.yaml, claas/core/configs/tinker.yaml, claas/core/types.py, claas/eval/types.py, claas/modal/worker.py, docker/.env.local.example, docker/scripts/init-stack.py, docker/scripts/start_vllm.sh
Summary: Updated the default base model identifier and allowed model lists across all configuration files, default value assignments, and environment fallbacks to reference the newer Qwen3.5-9B model variant.

Documentation & Setup Files
Files: .claude/skills/setup-local/SKILL.md, .claude/skills/setup-modal/SKILL.md, README.md, docker/README.md
Summary: Updated references to the base model from Qwen3-8B to Qwen3.5-9B in documentation, setup guides, and quick-start instructions, including vLLM startup command updates and environment variable documentation.

Docker Configuration
Files: docker/docker-compose.yml, docker/Dockerfile.init
Summary: Updated the vLLM service image tag to qwen3_5, changed the startup script reference, added the --enforce-eager flag, and updated served model names and the tool call parser. Modified the Dockerfile to include local dependencies and a CPU-only Torch installation.

LoRA Training Enhancement
Files: claas/training/storage.py
Summary: Extends base model dimension inference to support multimodal/text-config nesting, adds layer-type-aware attention handling with multipliers, introduces a separate dimension mapping for attention vs MLP modules (gate_proj, up_proj, down_proj), and enforces supported-module validation.

Dependency Updates
Files: pyproject.toml
Summary: Updated huggingface_hub from exact pin 0.36.2 to >=1.3.0, transformers from exact pin 4.57.6 to >=5.0.0, and removed the teacher optional-dependency group.

Inference & Observability
Files: claas/inference/helpers.py, plugins/claas-feedback/index.ts
Summary: Enhanced coerce_template_ids to support dict-like objects with __getitem__ (e.g., BatchEncoding). Added a debug logging block in the feedback plugin's agent_end handler to inspect and log assistant message content.

Test Updates
Files: tests/integration/test_local_engine_integration.py, tests/test_api.py, tests/test_config.py, tests/test_env_fallbacks.py, tests/test_local_training_engine.py
Summary: Updated test fixtures, environment setup, and assertions to reflect the new default base model identifier Qwen3.5-9B and corresponding allowed model lists.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Qwen hops from eight to point-five-nine,
With eager flags and deps divine!
LoRA learns to handle layers deep,
While Docker scripts make changes sweep. 🚀

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage (⚠️ Warning): docstring coverage is 35.29%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)

Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
Title Check (✅ Passed): the PR title clearly and concisely summarizes the main objective, migrating the default model from Qwen3-8B to Qwen3.5-9B, which is the primary change across all 26 modified files.



chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 15ef5450ab

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

```typescript
if (lastAssistant) {
  const raw = JSON.stringify((lastAssistant as Record<string, unknown>).content);
  console.log("[claas-feedback] content type:", typeof (lastAssistant as Record<string, unknown>).content, Array.isArray((lastAssistant as Record<string, unknown>).content) ? "(array)" : "");
  console.log("[claas-feedback] preview:", raw.slice(0, 500));
```

P1 Badge Guard agent_end debug parsing when content is missing

If an assistant message arrives without a content field, JSON.stringify(...) yields undefined, so raw.slice(...) (and raw.includes(...)) throws and aborts the agent_end hook before contextStore.set(...) runs. In that case the feedback command loses the just-finished conversation context, so this path should skip debug parsing when raw is not a string.
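A minimal sketch of the suggested guard; previewContent is a hypothetical helper for illustration, not code from the plugin:

```typescript
// JSON.stringify(undefined) returns undefined rather than a string, so any
// slice()/includes() call on the result would throw and abort the hook.
// Guarding on the result's type keeps the agent_end path alive.
function previewContent(content: unknown): string | null {
  const raw = JSON.stringify(content);
  if (typeof raw !== "string") {
    return null; // skip debug parsing when content is missing
  }
  return raw.slice(0, 500);
}
```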


Comment on lines +108 to +110

```typescript
console.log("[claas-feedback] content type:", typeof (lastAssistant as Record<string, unknown>).content, Array.isArray((lastAssistant as Record<string, unknown>).content) ? "(array)" : "");
console.log("[claas-feedback] preview:", raw.slice(0, 500));
console.log("[claas-feedback] has thinking:", raw.includes("think") || raw.includes("thinking"));
```

P2 Badge Gate assistant payload logging behind debug mode

These console.log statements always run, even when CLAAS_FEEDBACK_DEBUG is false, so every assistant turn writes raw content previews into service logs. That introduces unnecessary exposure of user/model text in production and increases log noise/cost; this should be routed through the existing logDebug gate (or removed after investigation).



coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
claas/training/storage.py (1)

486-493: ⚠️ Potential issue | 🟠 Major

Breaking change: custom target_modules now strictly validated.

The new validation rejects any module name not in dim_map. Per claas/core/types.py:259-285, LoraInitRequest.target_modules accepts arbitrary strings. Callers or scripts using custom module names (e.g., "embed_tokens", "lm_head") will now receive a ValueError.

If custom modules should be allowed, consider skipping unknown modules with a warning instead of raising. If strict validation is intentional, document this breaking change.

♻️ Alternative: skip unknown modules with warning
```diff
-    unsupported_modules = sorted(set(target_modules) - set(dim_map))
-    if unsupported_modules:
-        raise ValueError(
-            "Unsupported target_modules: "
-            + ", ".join(unsupported_modules)
-            + ". Supported modules: "
-            + ", ".join(sorted(dim_map))
-        )
+    supported_modules = [m for m in target_modules if m in dim_map]
+    unsupported_modules = sorted(set(target_modules) - set(dim_map))
+    if unsupported_modules:
+        import warnings
+        warnings.warn(
+            f"Skipping unsupported target_modules: {', '.join(unsupported_modules)}. "
+            f"Supported: {', '.join(sorted(dim_map))}"
+        )
+    target_modules = supported_modules
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/training/storage.py` around lines 486 - 493, The current strict
validation builds unsupported_modules from target_modules minus dim_map and
raises a ValueError; change this to be non-breaking by filtering out unknown
modules and emitting a warning instead of raising: compute the allowed set as
the intersection of target_modules and dim_map, if any unsupported_modules
remain call warnings.warn or the module logger with a clear message listing the
skipped names, and proceed using the filtered list (replace use of
unsupported_modules and the raise in the block where unsupported_modules is
defined). Ensure behavior of downstream code that expects target_modules now
uses the filtered/validated list.
claas/modal/worker.py (1)

31-31: ⚠️ Potential issue | 🔴 Critical

Update Modal worker transformers pin to support Qwen3.5: currently pinned to <5.0.0 but Qwen3.5 requires transformers 5.x or later.

Qwen3.5 support was added only to Transformers 5.x (as of February 2026) and cannot work with transformers 4.x out of the box. The Modal worker's transformers>=4.40.0,<5.0.0 constraint is incompatible with Qwen3.5, which the PR aims to support. Update the constraint to transformers>=5.0.0 to resolve this conflict.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/modal/worker.py` at line 31, Replace the pinned dependency string
"transformers>=4.40.0,<5.0.0" with "transformers>=5.0.0" in the Modal worker
requirement list so the worker can load Qwen3.5; update the requirement literal
wherever "transformers>=4.40.0,<5.0.0" appears (the dependency entry in the
worker's requirements list) to the new "transformers>=5.0.0" spec.
🧹 Nitpick comments (1)
claas/training/storage.py (1)

506-509: Test coverage gap: hybrid model logic is untested.

Per tests/test_storage.py:206-217, the mock config has no layer_types field, so the new hybrid-model handling (skipping attention modules for non-full-attention layers, q_proj doubling for output gate) is not exercised by tests. A miscalculation in tensor shapes or key naming would not be detected.

Consider adding a test case with a mock config that includes layer_types and attn_output_gate.

Would you like me to generate a test case for hybrid model LoRA initialization?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/training/storage.py` around lines 506 - 509, Add a unit test in
tests/test_storage.py that provides a mock model config containing layer_types
(with a mix of "full_attention" and non-full types) and attn_output_gate
enabled, then run the LoRA initialization code paths that reference
attn_modules, mod_name and layer_type in claas/training/storage.py to exercise
the branch that skips attention modules for non-full-attention layers and the
q_proj doubling behavior for output gate; assert that attention modules in
non-full-attention layers are not modified/registered, that q_proj-related
parameter keys are created with the expected doubled shapes/naming for gated
attention (check names like q_proj and any gate-specific suffixes), and that no
shape/key mismatches occur.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b7e10d90-7147-42d5-aff5-07eed1431745

📥 Commits

Reviewing files that changed from the base of the PR and between 838bf90 and 15ef545.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (25)
  • .claude/skills/setup-local/SKILL.md
  • .claude/skills/setup-modal/SKILL.md
  • README.md
  • claas/core/config.py
  • claas/core/configs/local.yaml
  • claas/core/configs/modal.yaml
  • claas/core/configs/tinker.yaml
  • claas/core/types.py
  • claas/eval/types.py
  • claas/inference/helpers.py
  • claas/modal/worker.py
  • claas/training/storage.py
  • docker/.env.local.example
  • docker/Dockerfile.init
  • docker/README.md
  • docker/docker-compose.yml
  • docker/scripts/init-stack.py
  • docker/scripts/start_vllm.sh
  • plugins/claas-feedback/index.ts
  • pyproject.toml
  • tests/integration/test_local_engine_integration.py
  • tests/test_api.py
  • tests/test_config.py
  • tests/test_env_fallbacks.py
  • tests/test_local_training_engine.py

```python
    # while allowing gradients to propagate through A.
    tensors: dict[str, torch.Tensor] = {}
    for layer_idx in range(num_layers):
        layer_type = layer_types[layer_idx] if layer_types else "full_attention"
```

⚠️ Potential issue | 🟡 Minor

Potential IndexError if layer_types length doesn't match num_layers.

If a model's config has a layer_types list with a different length than num_hidden_layers, this line will raise an IndexError. Consider adding a length validation or using get with a fallback.

🛡️ Proposed defensive check
```diff
     tensors: dict[str, torch.Tensor] = {}
+    if layer_types and len(layer_types) != num_layers:
+        raise ValueError(
+            f"layer_types length ({len(layer_types)}) != num_hidden_layers ({num_layers})"
+        )
     for layer_idx in range(num_layers):
         layer_type = layer_types[layer_idx] if layer_types else "full_attention"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/training/storage.py` at line 504, The current selection layer_type =
layer_types[layer_idx] if layer_types else "full_attention" can raise IndexError
when layer_types exists but its length doesn't cover layer_idx; update the logic
in the function where layer_types and layer_idx are used (referencing
layer_types, layer_idx, num_layers/num_hidden_layers) to defensively check
length—e.g. if layer_types and layer_idx < len(layer_types) then use
layer_types[layer_idx], else fall back to "full_attention" (or validate and
raise a clear error if mismatched lengths are unacceptable); you can also add an
explicit validation earlier that compares len(layer_types) to num_hidden_layers
and logs or raises a descriptive exception.

Comment on lines +6 to +13
```diff
 MODEL="${MODEL:-Qwen/Qwen3.5-9B}"
 HOST="${HOST:-127.0.0.1}"
 PORT="${PORT:-8000}"
 API_KEY="${API_KEY:-sk-local}"
-SERVED_MODEL_NAMES="${SERVED_MODEL_NAMES:-qwen3-8b}"
+SERVED_MODEL_NAMES="${SERVED_MODEL_NAMES:-qwen3.5-9b}"
 ENABLE_SLEEP_MODE="${ENABLE_SLEEP_MODE:-1}"
 ENABLE_AUTO_TOOL_CHOICE="${ENABLE_AUTO_TOOL_CHOICE:-1}"
-TOOL_CALL_PARSER="${TOOL_CALL_PARSER:-qwen3_xml}"
+TOOL_CALL_PARSER="${TOOL_CALL_PARSER:-qwen3_coder}"
```

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

vLLM qwen3_coder tool call parser

💡 Result:

In vLLM, the qwen3_coder tool-call parser is the built-in parser that extracts Qwen3-Coder’s XML-style tool calls from the model’s output and converts them into the OpenAI-compatible tool_calls structure. The implementation is Qwen3CoderToolParser in vllm/tool_parsers/qwen3coder_tool_parser.py. [2]

How to enable it (vLLM OpenAI server)

Use both flags:

  • --enable-auto-tool-choice
  • --tool-call-parser qwen3_coder [1]

Example:

```shell
vllm serve Qwen/Qwen3-Coder-... \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder
```

Custom / alternative parsers

vLLM also supports registering your own parser via --tool-parser-plugin, then selecting it by name with --tool-call-parser. [1]

Security note (important)

If you use --tool-call-parser qwen3_coder, upgrade vLLM to >= 0.10.1.1: the Qwen3-Coder tool parser had an RCE vulnerability (use of eval() during parameter parsing) affecting >=0.10.0 and <0.10.1.1, patched in 0.10.1.1. [3]

Sources: [1] [2] [3]


🏁 Script executed:

```shell
# Search for vLLM version specification in the repository
find . -type f \( -name "requirements*.txt" -o -name "pyproject.toml" -o -name "setup.py" -o -name "setup.cfg" -o -name "Dockerfile" -o -name "docker-compose*.yml" \) | head -20 | xargs grep -l "vllm\|Qwen" 2>/dev/null
```

Repository: kfallah/CLaaS

Length of output: 84


🏁 Script executed:

```shell
# Look for any other references to Qwen3.5 or tool parser configuration in the repo
rg "Qwen3\.?5|tool.?parser|qwen3" --type sh --type py --type yaml --max-count 20
```

Repository: kfallah/CLaaS

Length of output: 2620


🏁 Script executed:

```shell
# Check the start_vllm.sh file to understand the context better
cat -n docker/scripts/start_vllm.sh
```

Repository: kfallah/CLaaS

Length of output: 4357


🏁 Script executed:

```shell
# Check docker-compose.yml for vLLM version information
cat -n docker/docker-compose.yml | head -30
```

Repository: kfallah/CLaaS

Length of output: 1191


🏁 Script executed:

```shell
# Search for Qwen-Coder model references to see if there's a coder variant
rg "Coder|coder" --type sh --type py --type yaml -B2 -A2
```

Repository: kfallah/CLaaS

Length of output: 2326


🏁 Script executed:

```shell
# Check requirements or dependencies for vLLM version
rg "vllm" docker/scripts/ claas/ --type py --type txt --type yaml
```

Repository: kfallah/CLaaS

Length of output: 5589


Verify tool parser compatibility and vLLM version for security.

The model default is correctly set to Qwen3.5-9B, but qwen3_coder is designed specifically for Qwen3-Coder models, not the base Qwen3.5-9B. Confirm whether this model variant supports tool calling with the qwen3_coder parser or if a different parser should be used.

Additionally, the vLLM version should be explicitly pinned in requirements. The qwen3_coder parser had an RCE vulnerability (use of eval() in parameter parsing) affecting vLLM >= 0.10.0 and < 0.10.1.1—ensure the deployment uses >= 0.10.1.1 or later.
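One way to make the parser conditional on the model, as the comment suggests. The helper name and the qwen3_xml fallback are illustrative assumptions, not the repository's actual values:

```shell
# Hypothetical sketch: derive the tool-call parser from the model name instead
# of hard-coding qwen3_coder for every model. Parser names are vLLM built-ins.
pick_parser() {
  case "$1" in
    *Coder*) echo "qwen3_coder" ;;
    *) echo "qwen3_xml" ;;
  esac
}

MODEL="${MODEL:-Qwen/Qwen3.5-9B}"
TOOL_CALL_PARSER="${TOOL_CALL_PARSER:-$(pick_parser "$MODEL")}"
```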

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/scripts/start_vllm.sh` around lines 6 - 13, The TOOL_CALL_PARSER
default (TOOL_CALL_PARSER) is set to qwen3_coder while MODEL defaults to
Qwen/Qwen3.5-9B—verify and change the parser to one compatible with the base
Qwen3.5-9B (or make TOOL_CALL_PARSER conditional on MODEL) so you aren't forcing
the Qwen3-Coder parser onto a non-coder model; also explicitly pin the vLLM
dependency in your requirements (or equivalent install manifest) to >=0.10.1.1
to avoid the known RCE in vLLM 0.10.0–0.10.1.1, and add a brief comment near the
TOOL_CALL_PARSER and MODEL declarations documenting the compatibility
requirement and the security-pinned vLLM version.

Comment on lines +104 to +111
```typescript
// Debug: inspect assistant message content shape for proxy-removal investigation
const lastAssistant = messages.slice().reverse().find((m: Record<string, unknown>) => m.role === "assistant");
if (lastAssistant) {
  const raw = JSON.stringify((lastAssistant as Record<string, unknown>).content);
  console.log("[claas-feedback] content type:", typeof (lastAssistant as Record<string, unknown>).content, Array.isArray((lastAssistant as Record<string, unknown>).content) ? "(array)" : "");
  console.log("[claas-feedback] preview:", raw.slice(0, 500));
  console.log("[claas-feedback] has thinking:", raw.includes("think") || raw.includes("thinking"));
}
```

⚠️ Potential issue | 🟡 Minor

Unconditional console.log bypasses debugEnabled flag.

This debug block uses console.log directly while the rest of the file uses logDebug (lines 78, 113, 154) to respect the debugEnabled configuration. This will emit logs in production regardless of the debug setting.

Since the comment indicates this is for "proxy-removal investigation", consider either:

  1. Removing this temporary debug code before merging, or
  2. Gating it behind the existing debugEnabled flag using logDebug.
🛠️ Option 2: Gate behind debugEnabled
```diff
     // Debug: inspect assistant message content shape for proxy-removal investigation
     const lastAssistant = messages.slice().reverse().find((m: Record<string, unknown>) => m.role === "assistant");
-    if (lastAssistant) {
+    if (debugEnabled && lastAssistant) {
       const raw = JSON.stringify((lastAssistant as Record<string, unknown>).content);
-      console.log("[claas-feedback] content type:", typeof (lastAssistant as Record<string, unknown>).content, Array.isArray((lastAssistant as Record<string, unknown>).content) ? "(array)" : "");
-      console.log("[claas-feedback] preview:", raw.slice(0, 500));
-      console.log("[claas-feedback] has thinking:", raw.includes("think") || raw.includes("thinking"));
+      logDebug(`[claas-feedback] content type: ${typeof (lastAssistant as Record<string, unknown>).content}${Array.isArray((lastAssistant as Record<string, unknown>).content) ? " (array)" : ""}`);
+      logDebug(`[claas-feedback] preview: ${raw.slice(0, 500)}`);
+      logDebug(`[claas-feedback] has thinking: ${raw.includes("think") || raw.includes("thinking")}`);
     }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/claas-feedback/index.ts` around lines 104 - 111, The temporary debug
block currently uses console.log unconditionally and should be removed or gated
by the existing debug flag; replace the direct console.log usage in the block
that finds lastAssistant (using messages.slice().reverse().find(...) and raw)
with calls to the module's logDebug helper and guard with the debugEnabled check
(the same pattern used at lines where logDebug is used) so the inspection of
lastAssistant.content (type, preview, has thinking) only emits when debugEnabled
is true.

```diff
 local = [
     "torch>=2.1.0",
-    "transformers==4.57.6",
+    "transformers>=5.0.0",
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

transformers PyPI release history 5.0.0 5.1.0 5.2.0 dates

💡 Result:

  • transformers 5.0.0: released Jan 26, 2026 [1]
  • transformers 5.1.0: released Feb 5, 2026 [2]
  • transformers 5.2.0: released Feb 16, 2026 [3]

Sources: [1] (pypi.org) [2] (pypi.org) [3] (pypi.org)



🌐 Web query:

transformers Qwen3.5 support added when version

💡 Result:

Hugging Face Transformers added native Qwen3.5 support on 2026‑02‑09, and it’s available in the stable transformers release v5.2.0 (or newer). (huggingface.co)



🏁 Script executed:

```shell
# Check if pyproject.toml exists and what the current transformers requirement is
cat -n pyproject.toml | grep -A 5 -B 5 transformers
```

Repository: kfallah/CLaaS

Length of output: 285


🏁 Script executed:

```shell
# Search codebase for any references to qwen3.5 or qwen to understand if this is actually needed
rg -i "qwen" --type py
```

Repository: kfallah/CLaaS

Length of output: 2765


Raise the minimum transformers version to a Qwen3.5-capable release.

5.0.0 is too low for this codebase. transformers 5.0.0 and 5.1.0 were published on January 26, 2026 and February 5, 2026, but Hugging Face added Qwen3.5 support on February 9, 2026. A constrained environment can still resolve to a version in this range that lacks qwen3_5 support. Based on the release timeline, >=5.2.0,<6 is the appropriate minimum.

Suggested change
```diff
-    "transformers>=5.0.0",
+    "transformers>=5.2.0,<6",
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` at line 37, The dependency declaration "transformers>=5.0.0"
is too low and may resolve to a release missing Qwen3.5 support; update the
dependency specification for the transformers package (the string
"transformers>=5.0.0" in pyproject.toml) to require a minimum that includes
Qwen3.5 support by changing it to a constrained range such as
"transformers>=5.2.0,<6" so environments cannot pull an incompatible 5.x
release.

Qwen3.5's Gated Delta Network layers require these CUDA kernels for
correct forward pass computation. Without them, transformers falls back
to a buggy torch implementation that causes illegal memory access errors
during SDPO distillation training.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
kfallah force-pushed the feat/qwen3.5-9b-migration branch from 59674e0 to 04f74d6 (March 7, 2026 04:51)
