Conversation
🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18870

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV — there is 1 currently active SEV. If your PR is affected, please view it below.

⏳ No Failures, 8 Pending — as of commit 325ece4 with merge base c09c713.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed 8cdffc6 to 536aea5 (Compare)
Pull request overview
This PR aligns Cadence mixed W8A32 quantization behavior with ExecuTorch by matching symmetric quant ranges and ensuring GRU bias quantization uses a shared scale/observer across bias terms.
Changes:
- Introduce a symmetric int8 quantization spec using [-127, 127] and apply it to mixed W8A32 patterns (linear/conv/GRU).
- Update the mixed W8A32 GRU path to use a single bias scale (shared observer) and update the custom op schema accordingly.
- Adjust GRU reference implementation, meta kernel shape inference, fusion pass wiring, and unit tests to reflect the updated bias scaling and output shaping.
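The symmetric range is the crux of the alignment: plain int8 spans [-128, 127], while clipping to [-127, 127] makes the grid symmetric around a fixed zero-point of 0, so the same value quantizes identically in both pipelines. A minimal sketch of that arithmetic (the helper names here are illustrative, not the actual quantizer API):

```python
def symmetric_int8_qparams(min_val: float, max_val: float, qmax: int = 127):
    """Derive scale/zero-point for a symmetric [-127, 127] int8 spec.

    Symmetric quantization fixes the zero-point at 0 and sizes the scale
    from the largest absolute value, so +x and -x map to +q and -q.
    """
    amax = max(abs(min_val), abs(max_val))
    scale = amax / qmax if amax > 0 else 1.0
    return scale, 0


def quantize(x: float, scale: float, zero_point: int,
             qmin: int = -127, qmax: int = 127) -> int:
    # Round to the nearest quantized level, then clamp to the symmetric range.
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


scale, zp = symmetric_int8_qparams(-0.5, 1.0)
```

With a symmetric spec, the extreme values of the observed range saturate at exactly ±127, never -128, which is what keeps the two quantizers bit-compatible.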
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| backends/cadence/aot/tests/test_ref_implementations.py | Updates GRU test inputs/signature usage and expected output shape for the new GRU behavior. |
| backends/cadence/aot/ref_implementations.py | Updates GRU ref impl to use a single bias scale and changes output shaping logic. |
| backends/cadence/aot/quantizer/quantizer.py | Adds [-127,127] symmetric qspec and switches mixed W8A32 quantizer to it. |
| backends/cadence/aot/quantizer/patterns.py | Makes conv/gru pattern metadata checks more robust; shares GRU bias observers via SharedQuantizationSpec. |
| backends/cadence/aot/quantizer/fusion_pass.py | Adjusts mixed W8A32 conv metadata propagation and updates GRU args to pass a single bias scale. |
| backends/cadence/aot/ops_registrations.py | Updates GRU op schema to a single bias scale and changes meta output shape inference. |
```python
batch_size = inputs.shape[0]
input_dim = inputs.shape[1]
hidden_dim = hidden.shape[-1]

new_hidden_expanded = new_hidden.unsqueeze(1).expand(batch_size, input_dim, hidden_dim)

return torch.stack([new_hidden_expanded, new_hidden_expanded], dim=0)
```
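Shape-wise, the snippet above lifts `new_hidden` from `[batch, hidden]` to `[batch, seq_len, hidden]` via `unsqueeze(1)` + `expand`, then stacks two copies (output and new hidden state) along a new leading axis. A torch-free sketch of the resulting shape, with made-up dimension sizes (the snippet calls the middle dimension `input_dim`, which for this layout is the sequence length):

```python
# Illustrative sizes; the real values come from the graph inputs.
batch_size, seq_len, hidden_dim = 2, 1, 4

# new_hidden: [batch, hidden] as nested lists
new_hidden = [[0.0] * hidden_dim for _ in range(batch_size)]

# unsqueeze(1) + expand -> [batch, seq_len, hidden]
new_hidden_expanded = [[row] * seq_len for row in new_hidden]

# stack along a new dim 0 -> [2, batch, seq_len, hidden]
stacked = [new_hidden_expanded, new_hidden_expanded]

shape = (len(stacked), len(stacked[0]), len(stacked[0][0]), len(stacked[0][0][0]))
```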
```python
seq_len = inputs.shape[1]
assert seq_len == 1
# inputs comes in shape [batch, seq_len, input_size]
# hidden comes in shape [batch, seq_len, hidden_size]
# weights_inputs comes in shape [3 * hidden_size, input_size]
# weights_hidden comes in shape [3 * hidden_size, hidden_size]
# output comes in empty with shape [2, batch, seq_len, hidden_size]
# The first dimension stacks the output and the new hidden state
return hidden.new_empty(
    (2, inputs.shape[0], inputs.shape[1], hidden.shape[-1]), dtype=torch.float32
)
```
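The meta kernel above only computes shapes (it never touches data), so the same inference can be checked in isolation. A small sketch with an illustrative helper name:

```python
def gru_meta_output_shape(inputs_shape, hidden_shape):
    # Stacks [output, new_hidden] along dim 0 -> [2, batch, seq_len, hidden_size]
    batch, seq_len = inputs_shape[0], inputs_shape[1]
    return (2, batch, seq_len, hidden_shape[-1])
```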
```diff
  expected_shape = (2, inputs.shape[0], inputs.shape[1], hidden.shape[-1])
  self.assertEqual(
      output.shape,
-     (2, *hidden.shape),
-     f"Output shape should match {(2, *hidden.shape)} in {name}",
+     expected_shape,
+     f"Output shape should match {expected_shape} in {name}",
```
```diff
  assert len(dequants_biases) == 2
  w_i_scale = dequants_weights[0].args[1]
  w_h_scale = dequants_weights[1].args[1]
- b_i_scale = dequants_biases[0].args[1]
- b_h_scale = dequants_biases[1].args[1]
+ b_scale = dequants_biases[0].args[1]
```
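With a shared observer tying the two bias terms together, both dequantize nodes are guaranteed to carry identical quantization parameters, so the fusion pass can read a single scale from either one. A stub-level sketch of that invariant (the `Node` class is a stand-in for FX graph nodes, not the real API):

```python
class Node:
    """Minimal stand-in for an FX dequantize node: args = (input, scale, ...)."""
    def __init__(self, *args):
        self.args = args


# Shared observer => both bias dequants report the same scale.
shared_scale = 0.02
dequants_biases = [Node("b_i_q", shared_scale), Node("b_h_q", shared_scale)]

assert len(dequants_biases) == 2
# Safe to take the first scale: the shared spec makes them equal by construction.
b_scale = dequants_biases[0].args[1]
```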
Summary:
Pull Request resolved: pytorch#16607

This diff fixes the Conv1d w8a32 operator by adding a transformation to the `val` attribute of the `other_inputs[0].meta` dictionary. Specifically, the `permute` operation is applied to the `original_val` tensor within the `fake_mode` context, and the resulting `transposed_val` is assigned to `transposed_inputs.meta["val"]`.

Differential Revision: D89863750
Reviewed By: mcremon-meta
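At the shape level, the permute described above just reorders the dimensions of the meta `val` tensor. A minimal illustration with arbitrary dimension sizes (this is not the actual pass code, which operates on fake tensors under `fake_mode`):

```python
def permute_shape(shape, dims):
    # Reorder dimensions the way torch.permute does, shapes only.
    return tuple(shape[d] for d in dims)


# e.g. NCL -> NLC for a conv1d input
transposed = permute_shape((4, 8, 16), (0, 2, 1))
```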
Summary:

# Context
This diff fixes the reference implementation of the w8a32 GRU operator and enhances the operator's pattern matching.

# Mitigation
The reference implementation now has the right output dimension, and pattern matching now uses a safer check for the operator parameters.

Differential Revision: D90437262
Reviewed By: hsharma35
Summary:

# Context
This diff aims at matching the inference accuracy on device using ExecuTorch.

# Summary
The quantizer of the C++ pipeline needs to be aligned with the quantizer of ExecuTorch. This involves matching the same quantization arithmetic.

Reviewed By: hsharma35
Differential Revision: D91777784
Force-pushed 536aea5 to 46ec199 (Compare)
Summary:
Pull Request resolved: pytorch#18870

# Context
This diff aims at matching the inference accuracy on device using ExecuTorch.

# Summary
The quantizer of the C++ pipeline needs to be aligned with the quantizer of ExecuTorch. This involves matching the same quantization arithmetic.

Reviewed By: hsharma35
Differential Revision: D91777784
Force-pushed 46ec199 to 325ece4 (Compare)