[Draft, no merge] MVP for vLLM Disagg #948

Open
chunfangamd wants to merge 20 commits into main from amd/vllm_disagg_mvp

Conversation

Collaborator

@chunfangamd chunfangamd commented Mar 26, 2026

We prototyped PD (Prefill/Decode) disaggregation on DeepSeek. So far, we've done:

  • Framework integration: successfully integrated vLLM's Prefill/Decode disaggregated architecture into the InferenceX framework.
  • MoRI/DeepEP integration on DPSK V3: for MoE All-to-All and for KV cache transfer.
  • Confirmed MoRI-IO outperforms the standard NixlConnector in TTFT (Time to First Token).
  • Stability & bug fixes:
      • Resolved hang issues at high concurrency (CONC512) by fixing KV cache leaks and optimizing the "Reaper" logic for memory block release.
      • Fixed hardware compatibility issues, including PCI topology failures on Broadcom switches and error handling for mlx5 NICs.
  • Cluster validation: verified the multi-node deployment recipe on the SA-9N (mia1) cluster.

co-authors:

chunfangamd and others added 20 commits March 11, 2026 16:21
Add multi-node vLLM PD disaggregation recipe using Nixl/RIXL KV transfer
and vllm-router, mirroring the existing SGLang disagg recipe structure.

- New benchmark config: dsr1-fp8-mi355x-vllm-disagg (1P2D, TP8)
- New utils: vllm_disagg_utils/ (job.slurm, server.sh, submit.sh, etc.)
- Runner: extend launch_mi355x-amds.sh for vllm-disagg framework
Extract hardcoded model configurations from server.sh bash maps and
job.slurm VALID_MODELS into a declarative models.yaml, mirroring the
SGLang disagg recipe pattern. Adding a new model now requires no script
changes.

Also:
- Consolidate UCX transport vars in job.slurm Docker env; remove
  duplicated setup_ucx_env() from server.sh
- Extract RDMA workarounds (ionic /31 route fix, Nixl UCX patch) into
  setup_rdma_env() helper
- Lower UCX_LOG_LEVEL from info to warn
- Add nicctl mount and QoS/DSCP auto-detection to env.sh
- Remove stale host libionic bind-mounts (driver now built into image)
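The declarative layout described above might look roughly like this. This is a sketch only — the key names below are assumptions for illustration, not the recipe's actual schema:

```yaml
# Hypothetical models.yaml sketch; key names are illustrative, not the
# actual schema introduced by this commit.
models:
  dsr1-fp8:
    model_path: deepseek-ai/DeepSeek-R1
    prefill_flags: "--tensor-parallel-size 8"
    decode_flags: "--tensor-parallel-size 8"
    engine_ready_timeout_s: 600
```

With this shape, adding a model is a pure data change — server.sh and job.slurm iterate over the entries instead of hardcoding bash maps.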
Adapt server.sh to vLLM v0.17.1 breaking changes:
- Use simplified kv-transfer-config (side channel via env vars instead
  of kv_ip/kv_port, add kv_load_failure_policy)
- Remove deprecated --disable-log-requests (disabled by default in v0.17)
- Route NIXL side channel through RDMA IP for correct fabric path
- Fix RIXL ucx_error_handling_mode patch for updated _api.py layout
bench.sh: replace `vllm bench serve` (log-only output) with the shared
run_benchmark_serving helper from benchmark_lib.sh, matching the SGLang
disagg pattern. This produces the .json result files that the multinode
CI workflow expects (benchmark-multinode-tmpl.yml → process_result.py).

server.sh: make the Nixl ucx_error_handling_mode=none runtime patch
conditional on Pensando ionic RDMA devices (IBDEVICES=*ionic*). On the
mia1 cluster (ConnectX/mlx5, IBDEVICES=rdma*), UCX handles error mode
natively and the patch is skipped.
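The conditional guard described above can be sketched as follows. The IBDEVICES values follow the commit message (ionic on the Pensando cluster, rdma* on mia1); the echo bodies stand in for the real patch/skip logic:

```shell
#!/bin/sh
# Sketch of the conditional Nixl patch guard; the echo lines are
# stand-ins for the actual patch/skip steps in server.sh.
IBDEVICES="${IBDEVICES:-rdma0,rdma1}"   # mia1-style ConnectX/mlx5 naming

case "$IBDEVICES" in
  *ionic*) echo "applying ucx_error_handling_mode=none patch" ;;
  *)       echo "skipping patch (UCX handles error mode natively)" ;;
esac
```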

Model-path resolution and IBDEVICES/UCX/QoS auto-detection were verified
to already work on mia1 — no changes needed.

Tested locally (Job 2802, 1P+2D, ISL/OSL=1024):
  conc  8 →  507 tok/s   conc 32 → 1778 tok/s
  conc 16 → 1004 tok/s   conc 64 → 2480 tok/s
All four .json result files produced; 100% external prefix cache hit rate.
Move the vllm-router from a dedicated proxy node onto the first prefill
node, mirroring SGLang's co-location pattern. This reduces the node count
from xP + yD + 1 to xP + yD (e.g., 3 nodes instead of 4 for 1P+2D).

- server.sh: NODE_RANK=0 now runs both vllm serve (prefill, port 2584)
  and vllm-router (port 30000); barrier waits on all nodes
- submit.sh / job.slurm: NUM_NODES = PREFILL_NODES + DECODE_NODES
- bench.sh: ROUTER_PORT default updated to 30000

Local 1P+2D benchmark (ISL/OSL=1024, DeepSeek-R1 FP8, MI355X):
  - Throughput: +1.6% to +8.4% across concurrency 8-64
  - Mean TTFT: -22% to -63% (prefill is local to router)
  - TPOT/ITL: unchanged (within noise)
  - 25% fewer nodes, no performance regression
Replace the custom Docker image (vllm_disagg_pd:latest) with the public
vllm/vllm-openai-rocm:v0.17.1 base image. Missing components (UCX, RIXL,
etcd, libionic1, vllm-router) are now installed at container start via
setup_deps.sh, which is sourced by server.sh.

This eliminates the need to build, host, and maintain a custom image —
CI nodes can pull directly from Docker Hub.

Changes:
- Add setup_deps.sh: idempotent installer for UCX (ROCm fork), RIXL,
  etcd, libionic1 (Pensando ionic), and vllm-router (NODE_RANK=0 only).
  Build steps run in subshells to avoid CWD pollution.
- server.sh: source setup_deps.sh before any other logic
- job.slurm: add --entrypoint "" to override the base image's vllm CLI
  entrypoint, allowing bash -lc to work correctly
- env.sh: update comment (paths now set by setup_deps.sh, not image ENV)
- amd-master.yaml: image changed to vllm/vllm-openai-rocm:v0.17.1

Tested locally (Job 2807, 3 nodes, ISL/OSL=1024):
  Setup overhead: ~2.5 min per node (all components built from source)
  Benchmark completed successfully across concurrency 8/16/32/64
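The idempotent-installer pattern described above can be sketched like this. Function and path names are illustrative, not the script's actual code; the point is the marker-file skip and the subshell that confines `cd`:

```shell
#!/bin/sh
# Sketch of setup_deps.sh's install pattern: skip work when a marker
# file exists, and run build steps in a subshell so `cd` cannot
# pollute the caller's working directory.
install_component() {
  name="$1"; marker="$2"; srcdir="$3"
  if [ -e "$marker" ]; then
    echo "[SETUP] $name already present"
    return 0
  fi
  (
    cd "$srcdir" || exit 1        # cd is confined to this subshell
    echo "[SETUP] building $name in $(pwd)"
    touch "$marker"               # stand-in for the real build/install
  )
}

mkdir -p /tmp/demo-src
rm -f /tmp/demo.marker
install_component demo /tmp/demo.marker /tmp/demo-src
pwd                               # unchanged: the subshell cd did not leak
```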
…ecode

Enable MoRI-based Expert Parallelism (--enable-expert-parallel
--all2all-backend mori) on decode workers for DeepSeek-R1-0528,
while keeping TP=8 to preserve KV cache transfer compatibility
with the prefill node via NixlConnector. This matches SGLang's
approach of TP=8 + EP within the TP group.

KV Transfer: RIXL/NixlConnector (unchanged)
MoE All-to-All: NCCL (default) -> MoRI-EP (--all2all-backend mori)

Changes:
- models.yaml: Add --enable-expert-parallel --all2all-backend mori
  to decode_flags; increase engine ready timeout to 1200s
- setup_deps.sh: Add MoRI install and vLLM v0.17.1 patches for
  MoRI-EP + FP8 compatibility (AITER assertion, defer_input_quant)
- server.sh: Support decode_env from models.yaml for decode-specific
  environment overrides
- dsr1_fp8_mi355x_vllm-disagg.sh: Pass NODELIST to submit.sh for
  Slurm node constraints
…roxy

Replace NixlConnector with MoRIIOConnector for KV cache transfer and
replace the Rust-based vllm-router with a MoRI-IO-aware Python proxy
that handles both HTTP routing and ZMQ-based RDMA endpoint discovery.

The key architectural change is that the proxy enriches each request's
kv_transfer_params with remote RDMA endpoint info (handshake_port,
notify_port, host, port) before dispatching, enabling concurrent
prefill+decode in WRITE mode — something vllm-router could not do
because it only understands HTTP, not the MoRI-IO registration protocol.

Changes:
- Add moriio_proxy.py: MoRI-IO-aware proxy with ZMQ service discovery,
  request enrichment, and /health endpoint (adapted from vLLM upstream
  moriio_toy_proxy_server.py)
- server.sh: switch --kv-transfer-config from NixlConnector to
  MoRIIOConnector with kv_connector_extra_config (proxy_ip,
  proxy_ping_port, http_port); launch proxy before prefill on NODE_RANK=0;
  set VLLM_DISABLE_REQUEST_ID_RANDOMIZATION=1 as workaround for v0.17.1
  completion-ID mismatch (upstream fix: vllm-project/vllm#34907)
- setup_deps.sh: replace vllm-router/Rust install with lightweight
  Python deps (quart, aiohttp, msgpack, pyzmq) for the proxy

Benchmark (Job 2853 vs 2818 NixlConnector baseline, ISL/OSL=1024):
  TTFT median:  -37% to -55% across C8–C64 (e.g. 384→241ms @C64)
  TTFT p99:     -63% at C64 (6622→2469ms)
  Throughput:   +8% at C64 (2634→2844 tok/s)
  TPOT:         unchanged (~22ms @C64)
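The enrichment step described above can be sketched as follows. Field names follow the commit message (handshake_port, notify_port, host, port); the registry contents and the round-robin selection are illustrative, not the proxy's actual code:

```python
import itertools

# Decode endpoints; populated via ZMQ service discovery in the real
# proxy, hardcoded here for illustration.
decode_instances = [
    {"host": "10.0.0.2", "port": 2584, "handshake_port": 5600, "notify_port": 5601},
    {"host": "10.0.0.3", "port": 2584, "handshake_port": 5600, "notify_port": 5601},
]
_rr = itertools.cycle(range(len(decode_instances)))

def enrich_request(req_data: dict) -> dict:
    """Attach a decode worker's RDMA endpoint info to the request's
    kv_transfer_params before dispatching to prefill."""
    decode = decode_instances[next(_rr)]  # simple round-robin pick
    req_data.setdefault("kv_transfer_params", {}).update(
        {
            "remote_host": decode["host"],
            "remote_port": decode["port"],
            "remote_handshake_port": decode["handshake_port"],
            "remote_notify_port": decode["notify_port"],
        }
    )
    return req_data

req = enrich_request({"prompt": "hi", "max_tokens": 16})
print(req["kv_transfer_params"]["remote_host"])  # → 10.0.0.2
```

Because the endpoint info rides along inside the request itself, prefill can start WRITE-mode transfers to the right decode worker without any extra coordination hop.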
Signed-off-by: Theresa Shan <theresa.shan@amd.com>
… still ran the first barrier; 2. kill and kill run only when DRY_RUN=0

Signed-off-by: Theresa Shan <theresa.shan@amd.com>
Enable READ-mode KV transfer (decode-initiated RDMA reads) with a
critical scheduler assertion fix, and add safety timeouts to prevent
indefinite hangs during RDMA transfers.

Changes:
- setup_deps.sh: Add patches — save_kv_layer/start_load_kv
  handshake timeouts (30s), RDMA transfer timeout (120s), deferred
  write task expiry (60s), write worker error handling, and scheduler
  assertion fix for READ-mode intermediate request states
- moriio_proxy.py: Add stream idle timeout (PROXY_STREAM_IDLE_TIMEOUT)
  to abort stalled decode streams, and proper response.release()
- submit.sh, job.slurm: Plumb PROXY_STREAM_IDLE_TIMEOUT and
  VLLM_MORIIO_CONNECTOR_READ_MODE env vars into Docker containers

Validated: 1k/1k full sweep (C8–C512), 100% success rate at all
concurrency levels, peak 8500 output tok/s at C512.
Port the vLLM disaggregated serving pipeline from the 4N cluster
(Pensando ionic NICs) to the 9N mia1 cluster (mlx5/rdma NICs).

Key changes:
- Fix C512 deadlock: apply ucx_error_handling_mode=none universally
  instead of only for ionic NICs. Under high concurrency, UCX's default
  UCP_ERR_HANDLING_MODE_PEER prevents RIXL RDMA READ retries from
  recovering after ibv_post_send queue exhaustion, causing prefill KV
  cache saturation and pipeline deadlock.
- Force-reinstall MoRI from b645fc8 to fix PCI topology assertion
  failure on nodes with Broadcom PEX890xx PCIe switches.
- Auto-detect Docker privilege (sudo vs non-sudo) for cross-cluster
  portability.
- Add SLURM_EXCLUDE_NODES support to skip nodes with broken Docker
  sockets.
- Increase VLLM_ENGINE_READY_TIMEOUT_S to 3600 to accommodate longer
  setup times (RIXL/MoRI source builds over NFS).
…rdening

Server-side: RIXL can lose `finished_sending` notifications under high
concurrency with ibv_post_send failures, permanently leaking prefill KV
blocks. Over multiple benchmark rounds (sweep), leaked blocks accumulate
and saturate the prefill KV cache, deadlocking C512.

- Fix finished_sending handler to unconditionally free KV blocks
  (the conditional status check had no recovery path, causing leaks)
- Add idle KV block reaper: detects engine idle >5s with finished
  requests still holding blocks, then force-frees them
- Add 10s cooldown between benchmark rounds for reaper activation

Client-side: SSE streaming loop did not break on the [DONE] sentinel,
causing the benchmark client to hang when the proxy held connections
open after request completion.

- Break SSE loop on [DONE] in completions and chat completions
- Share a single aiohttp.ClientSession across all requests (connection
  pooling via TCPConnector instead of per-request session creation)
- Add asyncio.wait_for timeout around asyncio.gather with proper task
  cancellation and partial result collection
- Reduce AIOHTTP_TIMEOUT from 6h to 30min
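The [DONE]-sentinel fix above can be sketched as follows. The byte stream is simulated with a list here; the real client reads it from an aiohttp response:

```python
# Sketch of the client-side SSE fix: break the read loop on the
# OpenAI-style "data: [DONE]" sentinel instead of waiting for the
# server (or proxy) to close the connection.
def consume_sse(lines):
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith(b"data: "):
            continue  # skip keep-alives and blank separators
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":  # sentinel: stream is complete
            break
        chunks.append(payload)
    return chunks

stream = [b'data: {"c":1}\n', b'data: {"c":2}\n', b"data: [DONE]\n"]
print(consume_sse(stream))  # → [b'{"c":1}', b'{"c":2}']
```

Without the break, a proxy that holds the connection open after completion leaves the client blocked on the next read — exactly the hang described above.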

Verified: sweep 1K/1K C128→C256→C512 all pass (Job 6222, 9N cluster).
…pletion

Background processes (proxy, prefill, decode, etcd) were started via
`cmd 2>&1 | tee logfile &`, causing bash $! to capture the PID of tee
rather than the actual process. `kill $pid` only killed tee, leaving the
real process running. The proxy kept port 30000 open, so decode nodes'
`sync.py wait` never detected shutdown and the Slurm job hung forever.

Additionally, etcd's stderr was not redirected, holding the Docker
container's main pipe open and preventing container exit even after
server.sh completed.

Changes:
- Redirect all background processes to log files instead of piping
  through tee, so $! captures the correct PID (matches SGLang pattern)
- Redirect etcd launcher's stderr to prevent pipe leak
- Add pkill fallback cleanup for proxy, vllm, and etcd processes
- Increase barrier grace period to handle node setup time variance
- Increase container creation barrier timeout from 300s to 600s
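The `$!` pitfall fixed above can be demonstrated in isolation. With `cmd 2>&1 | tee log &`, `$!` is tee's PID; redirecting to a file instead makes `$!` the PID of the command itself, so `kill` actually works (`sleep` stands in for the real server process):

```shell
#!/bin/sh
# Redirect to a log file instead of piping through tee, so $! captures
# the PID of the background process itself, not tee's.
sleep 30 > /tmp/pidfix.log 2>&1 &
pid=$!
kill "$pid"                 # terminates the real process
wait "$pid" 2>/dev/null
echo "killed $pid"
```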
Fix per-node Docker privilege detection in vLLM disagg job.slurm
Docker containers run as root, so __pycache__/*.pyc files created
during benchmark_serving.py import end up root-owned on the NFS
workspace. The CI runner cannot delete them, breaking checkout.

Set PYTHONPYCACHEPREFIX=/tmp/pycache in the Docker env so bytecache
stays inside the container. Remove the previous server.sh find-and-
delete workaround since the root cause is now addressed.
The idle KV block reaper only fired when both running=0 AND waiting=0.
Under 8K ISL at C64+, leaked blocks filled the prefill KV cache while
new requests queued in WAITING state — the non-empty wait queue
prevented the reaper from ever triggering, causing a permanent hang.

Remove the waiting-queue check so the reaper fires whenever no requests
are actively running, which is precisely when leaked blocks can be
safely reclaimed.

Verified with 8K/1K sweep (C32–C512) completing without hangs.
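The trigger-condition change can be sketched as a predicate. The names below are illustrative, not vLLM's actual scheduler fields:

```python
# Sketch of the reaper-trigger fix: fire whenever nothing is actively
# running and the engine has been idle past the threshold. The old
# condition also required waiting == 0, so a full wait queue blocked
# reclamation forever.
def reaper_should_fire(running: int, waiting: int, idle_s: float) -> bool:
    return running == 0 and idle_s > 5.0  # waiting is deliberately ignored

# Leaked blocks with a backed-up wait queue: reaper now fires.
print(reaper_should_fire(running=0, waiting=37, idle_s=6.0))  # → True
```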
…DECODE_EP,DECODE_DP_ATTN from amd-master.yaml config.

Signed-off-by: Theresa Shan <theresa.shan@amd.com>
@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

Comment on lines +249 to +253
req_data_to_prefill["kv_transfer_params"]["remote_dp_size"] = (
decode_instance_endpoint["dp_size"]
)
req_data_to_prefill["kv_transfer_params"]["remote_tp_size"] = (
decode_instance_endpoint["tp_size"]
Contributor


🔴 In handle_request(), req_data["max_tokens"] -= 1 is unconditional but the proxy registers both /v1/completions and /v1/chat/completions routes. Chat completion clients use max_completion_tokens instead of max_tokens, so any chat completions request without a max_tokens key raises a KeyError, returning a 500 error for every chat completions request routed through this proxy. The fix should mirror send_request_to_prefill(), which already handles both fields correctly.

Extended reasoning...

What the bug is and how it manifests

In moriio_proxy.py, the handle_request() function is registered for both /v1/completions and /v1/chat/completions routes. When forwarding a request to the decode worker, line 269 does:

req_data["max_tokens"] -= 1

This assumes max_tokens is always present in req_data. However, the OpenAI Chat Completions API uses max_completion_tokens as the field name, not max_tokens. A client calling /v1/chat/completions that omits max_tokens (the normal case, since it would instead set max_completion_tokens) will trigger a KeyError here.

The specific code path that triggers it

The route decorator:

@app.route("/v1/completions", methods=["POST"])
@app.route("/v1/chat/completions", methods=["POST"])
async def handle_request():

Directs both endpoint types to the same handler. Inside, before building kv_transfer_params for the decode worker, the code executes req_data["max_tokens"] -= 1 unconditionally. A chat completion payload like {"messages": [...], "max_completion_tokens": 512, "model": "..."} has no max_tokens key, causing a KeyError.

Why existing code does not prevent it

The try/except Exception block surrounding handle_request() catches the KeyError and returns a 500 Internal Server Error response — so the server stays alive, but every chat completions request fails. The bug is entirely silent from the server-operator perspective unless logs are checked.

What the impact would be

The /v1/chat/completions endpoint is advertised and registered but completely non-functional for any client using standard chat completions semantics (i.e., sending max_completion_tokens without also sending max_tokens). This is the standard OpenAI SDK behavior for chat completions.

How to fix it

The fix is to mirror what send_request_to_prefill() already does correctly:

# send_request_to_prefill — correct handling:
req_data_copy["max_tokens"] = 1
if "max_completion_tokens" in req_data_copy:
    req_data_copy["max_completion_tokens"] = 1

For the decode path, the equivalent fix would be:

if "max_tokens" in req_data:
    req_data["max_tokens"] -= 1
if "max_completion_tokens" in req_data:
    req_data["max_completion_tokens"] -= 1

Step-by-step proof

  1. Client sends POST /v1/chat/completions with body {"model": "...", "messages": [{"role": "user", "content": "hello"}], "max_completion_tokens": 100}
  2. handle_request() is invoked; req_data = {"model": ..., "messages": [...], "max_completion_tokens": 100}
  3. Execution reaches line 269: req_data["max_tokens"] -= 1
  4. "max_tokens" is not a key in req_data → KeyError: 'max_tokens'
  5. Outer except Exception as e catches it and returns (f"Internal Server Error: {e!s}", 500)
  6. Client receives a 500 error; all chat completions requests through the proxy fail

The fact that send_request_to_prefill() explicitly handles both max_tokens and max_completion_tokens confirms the author was aware of the dual field naming — the decode path was simply not updated consistently.
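A runnable version of the suggested decode-path fix (function name is illustrative; the field handling mirrors what send_request_to_prefill() already does):

```python
# Decrement whichever token field the request actually carries:
# completions use max_tokens, chat completions use max_completion_tokens.
def adjust_for_decode(req_data: dict) -> dict:
    if "max_tokens" in req_data:
        req_data["max_tokens"] -= 1
    if "max_completion_tokens" in req_data:
        req_data["max_completion_tokens"] -= 1
    return req_data

# A standard chat completions payload no longer raises KeyError:
chat_req = {"messages": [{"role": "user", "content": "hello"}],
            "max_completion_tokens": 100}
print(adjust_for_decode(chat_req)["max_completion_tokens"])  # → 99
```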

Comment on lines +838 to +843
export UCX_HOME="${UCX_HOME}"
export RIXL_HOME="${RIXL_HOME}"
export PATH="${UCX_HOME}/bin:/usr/local/bin/etcd:/root/.cargo/bin:${PATH}"
export LD_LIBRARY_PATH="${UCX_HOME}/lib:${RIXL_HOME}/lib:${RIXL_HOME}/lib/x86_64-linux-gnu:${LD_LIBRARY_PATH:-}"

_SETUP_END=$(date +%s)
Contributor


🟡 In setup_deps.sh, the if-branch that logs "All dependencies already present" prints the raw Unix epoch timestamp (~1748000000) as the wallclock time instead of the elapsed seconds. The else-branch correctly computes the elapsed time with $(( _SETUP_END - _SETUP_START ))s; the if-branch should do the same.

Extended reasoning...

What the bug is and how it manifests

At the very end of setup_deps.sh, the script records _SETUP_END=$(date +%s) and checks whether any components were newly installed. If nothing was installed (the common fast path on container re-use), it prints:

[SETUP] All dependencies already present (${_SETUP_END}s wallclock)

_SETUP_END is a Unix epoch timestamp — a large integer like 1748000000. The message therefore reads something like (1748000000s wallclock) instead of the intended elapsed time (e.g., (2s wallclock)).

The specific code path that triggers it

The final block of setup_deps.sh:

_SETUP_END=$(date +%s)
if [[ ${#_SETUP_INSTALLED[@]} -eq 0 ]]; then
    echo "[SETUP] All dependencies already present (${_SETUP_END}s wallclock)"  # BUG: should be elapsed
else
    echo "[SETUP] Installed: ${_SETUP_INSTALLED[*]} in $(( _SETUP_END - _SETUP_START ))s"
fi

The else branch correctly subtracts _SETUP_START from _SETUP_END; the if branch is missing that arithmetic.

Why existing code does not prevent it

There is no test for the log output and the script runs without error — the echo simply emits an incorrect number. The else branch was written correctly, suggesting the if branch was a copy-paste oversight.

Step-by-step proof

  1. Script starts at Unix epoch T=1748000000, so _SETUP_START=1748000000.
  2. All dependencies are already present; _SETUP_INSTALLED remains empty.
  3. Script reaches the final block 2 seconds later: _SETUP_END=1748000002.
  4. ${#_SETUP_INSTALLED[@]} -eq 0 is true, so the if branch executes.
  5. Actual output: [SETUP] All dependencies already present (1748000002s wallclock) — clearly wrong.
  6. Expected output: [SETUP] All dependencies already present (2s wallclock).

Impact

Purely cosmetic — no functional impact. However, the wallclock message is the primary timing signal operators use to verify the fast-path completed quickly. Displaying ~1.7 billion seconds makes the log useless for that purpose.

Fix

Replace ${_SETUP_END}s with $(( _SETUP_END - _SETUP_START ))s in the if-branch to match the else-branch.

Comment on lines +89 to +104
pass
elif (
data["type"] == "register"
and data["role"] == "P"
and data["request_address"] not in prefill_instances
):
with _list_lock:
_append_whole_dict_unique(prefill_instances, data)

elif (
data["type"] == "register"
and data["role"] == "D"
and data["request_address"] not in decode_instances
):
with _list_lock:
_append_whole_dict_unique(decode_instances, data)
Contributor


🟡 In _listen_for_register(), the guards data["request_address"] not in prefill_instances and data["request_address"] not in decode_instances are always True because they compare a string against a list of dicts — a string can never equal a dict in Python. The guards are dead code that provide no actual filtering. Fix by checking data["request_address"] not in [d["request_address"] for d in prefill_instances], or remove the redundant guards entirely since _append_whole_dict_unique() handles deduplication correctly.

Extended reasoning...

Bug: Type mismatch makes deduplication guards always True

data["request_address"] is a string (e.g., "http://192.168.1.1:2584"), while prefill_instances and decode_instances are declared as list[dict] (line ~35: prefill_instances: list[dict] = []). Python’s in operator for lists checks element equality via ==; since a string is never equal to a dict, data["request_address"] not in prefill_instances evaluates to True unconditionally, making the guard a no-op.

The same issue applies to both the prefill branch and the decode branch:

and data["request_address"] not in prefill_instances  # always True
...
and data["request_address"] not in decode_instances   # always True

The intent of the guard is clearly early-exit deduplication — skip calling _append_whole_dict_unique if an instance with the same address is already registered. However, because the type comparison is wrong, the guard never fires and _append_whole_dict_unique() is always called.

Functionally there is no impact because _append_whole_dict_unique() performs correct deduplication by comparing filtered dicts. As long as that function remains in place, no duplicates are registered. However, the guard is semantically broken dead code: a reader would reasonably assume it provides an O(n) fast path that skips the lock acquisition and full dict comparison, but it does not.

The risk materializes if _append_whole_dict_unique is ever removed or refactored away under the assumption the guard provides the deduplication — in that case, duplicate instances would be silently registered, causing requests to be routed to stale/duplicate endpoints.

Step-by-step proof:

  1. A worker calls into the ZMQ router with data = {"type": "register", "role": "P", "request_address": "http://10.0.0.1:2584", ...}
  2. prefill_instances at this point is [{"type": "register", "role": "P", "request_address": "http://10.0.0.1:2584", ...}] (previously registered)
  3. Python evaluates "http://10.0.0.1:2584" not in [{"type": "register", ...}]
  4. Python checks: "http://10.0.0.1:2584" == {"type": "register", ...} → False
  5. Therefore not in → True, and the branch body executes — calling _append_whole_dict_unique again for the duplicate
  6. _append_whole_dict_unique detects the duplicate and returns False without appending, saving the day

Fix: Replace the broken guard with a correct one:

and data["request_address"] not in [d["request_address"] for d in prefill_instances]

Or simply remove the guard entirely since _append_whole_dict_unique() already handles deduplication correctly under the lock.
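The type mismatch is easy to demonstrate in isolation:

```python
# A str never compares equal to a dict, so membership tests against a
# list of dicts are always True — the guard filters nothing.
prefill_instances = [{"role": "P", "request_address": "http://10.0.0.1:2584"}]
addr = "http://10.0.0.1:2584"

broken = addr not in prefill_instances                                 # always True
fixed = addr not in [d["request_address"] for d in prefill_instances]  # correctly False
print(broken, fixed)  # → True False
```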

@chunfangamd chunfangamd changed the title [Draft] MVP for vLLM Disagg [Draft, no merge] MVP for vLLM Disagg Mar 26, 2026
Comment on lines +2 to +11
# MoRI-IO proxy server for vLLM PD disaggregation.
#
# Based on vLLM's examples/online_serving/disaggregated_serving/moriio_toy_proxy_server.py
# with the following adaptations for production multi-node use:
# - Ports configurable via PROXY_HTTP_PORT / PROXY_PING_PORT env vars
# - /health endpoint for sync.py barrier readiness checks
# - Uses stdlib `re` instead of `regex` to avoid extra dep
#
# The proxy performs two roles that vllm-router cannot:
# 1. ZMQ service discovery — prefill/decode workers register their RDMA ports
Contributor


should we use vllm router instead here? that seems like the more production ready version than this toy proxy?
