…ods compile spec

The cross-method constant sharing code in CudaBackend::init() was running unconditionally for all multi-method models, corrupting weights for models like Parakeet where different methods have different sub-models (encoder, decoder, joint) that should NOT share constants.

This change:
- Adds a new `share_kv_cache_across_methods` compile spec that must be explicitly set to enable cross-method constant sharing
- Guards the sharing logic behind this compile spec (previously it ran for all models with the required AOTI APIs)
- Makes sharing failures return Error::Internal instead of just logging
- Adds generate_share_kv_cache_compile_spec() to the AotiBackend Python API
- Updates the Qwen3.5 MoE export to opt in to sharing for prefill/decode

Without this spec set, each method gets its own independent constants, fixing the Parakeet CUDA CI regression.
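The opt-in behavior described above can be sketched in plain Python. This is a minimal stand-in, not ExecuTorch's actual implementation: the `CompileSpec` dataclass mirrors the key/value pairs passed to a backend, and the exact key string and the `generate_share_kv_cache_compile_spec()` return value are assumptions based on the names in this PR.

```python
from dataclasses import dataclass

# Minimal stand-in for ExecuTorch's CompileSpec (a key/value pair handed to a backend).
@dataclass(frozen=True)
class CompileSpec:
    key: str
    value: bytes

# Hypothetical helper mirroring the generate_share_kv_cache_compile_spec()
# API added by this PR; the key string and encoded value are assumptions.
def generate_share_kv_cache_compile_spec() -> CompileSpec:
    return CompileSpec(key="share_kv_cache_across_methods", value=b"1")

def should_share_constants(specs: list[CompileSpec]) -> bool:
    # Sharing is opt-in: absent the spec, each method keeps its own constants.
    return any(
        s.key == "share_kv_cache_across_methods" and s.value == b"1"
        for s in specs
    )

# A model like Parakeet passes no sharing spec, so its encoder/decoder/joint
# methods each get independent constants; a prefill/decode LLM export opts in.
print(should_share_constants([]))                                        # False
print(should_share_constants([generate_share_kv_cache_compile_spec()]))  # True
```

The point of the sketch is the default: with no spec in the list, sharing never runs, which is exactly the behavior that fixes Parakeet.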
🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18864

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV — There is 1 currently active SEV. If your PR is affected, please view it below.

❌ 1 New Failure, 3 Unrelated Failures — as of commit c3456e6 with merge base 7e099b4:

NEW FAILURE — The following job has failed:
FLAKY — The following job failed but was likely due to flakiness present on trunk:
BROKEN TRUNK — The following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Currently we blindly share the KV cache across all prefill + decode methods, causing the Parakeet model to generate garbage output.
This PR adds a CUDA backend compile spec to control KV cache sharing across methods. The default is not to share.
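The other behavioral change in this PR is how failures are handled: sharing is only attempted when the spec is set, and a failure now surfaces as Error::Internal instead of a log line. A hedged Python sketch of that init-time guard, with illustrative names standing in for the C++ backend code:

```python
# Hypothetical sketch of the CudaBackend::init() guard described in this PR.
# All names here are illustrative stand-ins, not ExecuTorch's real C++ API.

class InternalError(Exception):
    """Stand-in for Error::Internal."""

def try_share_constants(method_names):
    # Placeholder for the AOTI cross-method constant-sharing call.
    if len(method_names) < 2:
        raise RuntimeError("need at least two methods to share constants")
    return {"shared_across": list(method_names)}

def backend_init(method_names, share_kv_cache_across_methods=False):
    if not share_kv_cache_across_methods:
        # Default: sharing is skipped and every method keeps its own constants.
        return {name: "own constants" for name in method_names}
    try:
        return try_share_constants(method_names)
    except RuntimeError as e:
        # Previously a sharing failure was only logged; now it is a hard error.
        raise InternalError(str(e)) from e
```

With the flag off (the default), `backend_init(["encoder", "decoder", "joint"])` never touches the sharing path, which is the Parakeet case; only an export that explicitly passes the spec, such as prefill/decode, takes the shared branch.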