
Share kv cache compile spec #18864

Merged
Gasoonjia merged 3 commits into `main` from `share-kv-cache-compile-spec`
Apr 14, 2026

@Gasoonjia Gasoonjia commented Apr 14, 2026

Currently, we blindly share the KV cache across all prefill and decode methods, which makes the Parakeet model generate garbage output.
This PR adds a CUDA backend compile spec that controls KV cache sharing across methods. Sharing is off by default.
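
A minimal sketch of what an opt-in compile spec of this kind could look like, assuming a simple key/value shape. The `CompileSpec` dataclass below is a local stand-in rather than the ExecuTorch class, and the byte encoding, the function body, and the `sharing_enabled` helper are assumptions for illustration. Only the names `share_kv_cache_across_methods` and `generate_share_kv_cache_compile_spec` come from this PR; the real signature may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CompileSpec:
    """Local stand-in for a backend compile spec: a key plus raw bytes."""
    key: str
    value: bytes


# Spec name taken from the PR description; the one-byte encoding is assumed.
SHARE_KV_CACHE_KEY = "share_kv_cache_across_methods"


def generate_share_kv_cache_compile_spec() -> CompileSpec:
    """Hypothetical body: emit the opt-in flag as a single truthy byte."""
    return CompileSpec(SHARE_KV_CACHE_KEY, bytes([1]))


def sharing_enabled(specs: List[CompileSpec]) -> bool:
    """Hypothetical reader: sharing stays off unless the spec is present."""
    for spec in specs:
        if spec.key == SHARE_KV_CACHE_KEY:
            return spec.value == bytes([1])
    return False
```

With this shape, a model that passes no spec keeps the default of not sharing, and only an export that explicitly appends the spec turns sharing on.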

…ods compile spec

The cross-method constant sharing code in CudaBackend::init() was running
unconditionally for all multi-method models, which corrupts weights for
models like Parakeet where different methods have different sub-models
(encoder, decoder, joint) that should NOT share constants.

This change:
- Adds a new `share_kv_cache_across_methods` compile spec that must be
  explicitly set to enable cross-method constant sharing
- Guards the sharing logic behind this compile spec (previously ran for
  all models with the required AOTI APIs)
- Makes sharing failures return Error::Internal instead of just logging
- Adds generate_share_kv_cache_compile_spec() to AotiBackend Python API
- Updates Qwen3.5 MoE export to opt in to sharing for prefill/decode

Without this spec set, each method gets its own independent constants,
fixing the Parakeet CUDA CI regression.
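
The guarded behavior described above can be modeled with a toy Python simulation. This is not the `CudaBackend::init()` code: `init_method`, `SharingError`, the dictionary-based pool, and the size-mismatch failure condition are all hypothetical, chosen only to show the control flow (independent constants by default, shared constants on opt-in, and a hard error instead of a log line when sharing fails).

```python
from typing import Dict, List


class SharingError(RuntimeError):
    """Stands in for returning Error::Internal from init()."""


def init_method(
    method_name: str,
    constants: Dict[str, List[int]],
    shared_pool: Dict[str, List[int]],
    share_across_methods: bool = False,
) -> Dict[str, List[int]]:
    """Bind constants for one method, sharing only when explicitly enabled."""
    if not share_across_methods:
        # Default path: every method gets its own independent copy.
        return {name: list(buf) for name, buf in constants.items()}

    bound: Dict[str, List[int]] = {}
    for name, buf in constants.items():
        # The first method to register a constant owns the shared buffer.
        existing = shared_pool.setdefault(name, buf)
        if len(existing) != len(buf):
            # Assumed failure condition: surface it instead of logging.
            raise SharingError(
                f"{method_name}: constant '{name}' does not match shared buffer"
            )
        bound[name] = existing
    return bound
```

In this model, Parakeet-style methods (encoder, decoder, joint) called with the default never touch the pool, while prefill and decode called with `share_across_methods=True` end up bound to the same buffer object.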

pytorch-bot bot commented Apr 14, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18864

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure, 3 Unrelated Failures

As of commit c3456e6 with merge base 7e099b4 (image):

NEW FAILURE - The following job has failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 14, 2026
@github-actions

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with `release notes:`. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@Gasoonjia Gasoonjia marked this pull request as ready for review April 14, 2026 07:45
@Gasoonjia Gasoonjia requested a review from lucylq as a code owner April 14, 2026 07:45
@Gasoonjia Gasoonjia merged commit 875f7c8 into main Apr 14, 2026
347 of 352 checks passed
@Gasoonjia Gasoonjia deleted the share-kv-cache-compile-spec branch April 14, 2026 16:32

Labels

ciflow/cuda CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
