This document provides a quantization support matrix for the following frameworks:
| Framework | Backend Library | Symmetric Quantization | Asymmetric Quantization |
|---|---|---|---|
| TensorFlow | oneDNN | Activation (int8/uint8), Weight (int8) | - |
| PyTorch | FBGEMM | Activation (uint8), Weight (int8) | Activation (uint8) |
| PyTorch IPEX | oneDNN | Activation (int8/uint8), Weight (int8) | - |
| MXNet | oneDNN | Activation (int8/uint8), Weight (int8) | - |
| ONNX Runtime | MLAS | Weight (int8) | Activation (uint8) |
- TensorFlow (oneDNN)
  - Symmetric Quantization
    - int8: scale = 2 * max(abs(rmin), abs(rmax)) / (max(int8) - min(int8) - 1)
    - uint8: scale = max(rmin, rmax) / (max(uint8) - min(uint8))
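As a sketch of the oneDNN-style symmetric formulas above (hypothetical helper names, not oneDNN's actual API), given an observed floating-point range [rmin, rmax]:

```python
def onednn_symmetric_scale_int8(rmin: float, rmax: float) -> float:
    # int8: scale = 2 * max(|rmin|, |rmax|) / (max(int8) - min(int8) - 1)
    # With max(int8) = 127 and min(int8) = -128, the denominator is 254.
    return 2 * max(abs(rmin), abs(rmax)) / (127 - (-128) - 1)

def onednn_symmetric_scale_uint8(rmin: float, rmax: float) -> float:
    # uint8: scale = max(rmin, rmax) / (max(uint8) - min(uint8))
    # With max(uint8) = 255 and min(uint8) = 0, the denominator is 255.
    return max(rmin, rmax) / (255 - 0)
```

Note the int8 denominator is 254 rather than 255: dropping one step keeps the quantized range symmetric around zero (±127), which is what the `- 1` term in the formula encodes.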
- PyTorch (FBGEMM)
  - Symmetric Quantization
    - int8: scale = max(abs(rmin), abs(rmax)) / (float(max(int8) - min(int8)) / 2)
    - uint8: scale = max(abs(rmin), abs(rmax)) / (float(max(int8) - min(int8)) / 2)
  - Asymmetric Quantization
    - uint8: scale = (rmax - rmin) / (max(uint8) - min(uint8)); zero_point = min(uint8) - round(rmin / scale)
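The FBGEMM-style formulas above can be sketched the same way (hypothetical helper names, not FBGEMM's actual API). Note that per the formulas, the symmetric case uses the int8 range in the denominator for both int8 and uint8:

```python
def fbgemm_symmetric_scale(rmin: float, rmax: float) -> float:
    # scale = max(|rmin|, |rmax|) / (float(max(int8) - min(int8)) / 2)
    # Denominator: float(127 - (-128)) / 2 = 127.5, for both int8 and uint8.
    return max(abs(rmin), abs(rmax)) / (float(127 - (-128)) / 2)

def asymmetric_uint8_params(rmin: float, rmax: float) -> tuple:
    # scale = (rmax - rmin) / (max(uint8) - min(uint8))
    scale = (rmax - rmin) / (255 - 0)
    # zero_point = min(uint8) - round(rmin / scale)
    zero_point = 0 - round(rmin / scale)
    return scale, zero_point
```

Unlike the symmetric schemes, the asymmetric scheme maps the full [rmin, rmax] interval onto [0, 255] and records where real zero lands as `zero_point`, so ranges that are not centered on zero waste no quantization levels.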
- PyTorch IPEX (oneDNN)
  - Symmetric Quantization
    - int8: scale = 2 * max(abs(rmin), abs(rmax)) / (max(int8) - min(int8) - 1)
    - uint8: scale = max(rmin, rmax) / (max(uint8) - min(uint8))
- MXNet (oneDNN)
  - Symmetric Quantization
    - int8: scale = 2 * max(abs(rmin), abs(rmax)) / (max(int8) - min(int8) - 1)
    - uint8: scale = max(rmin, rmax) / (max(uint8) - min(uint8))
- ONNX Runtime (MLAS)
  - Symmetric Quantization
    - int8: scale = 2 * max(abs(rmin), abs(rmax)) / (max(int8) - min(int8) - 1)
  - Asymmetric Quantization
    - uint8: scale = (rmax - rmin) / (max(uint8) - min(uint8)); zero_point = min(uint8) - round(rmin / scale)
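Once scale and zero_point are chosen by any of the schemes above, per-tensor quantization and dequantization follow the same affine mapping. A minimal roundtrip sketch (hypothetical helpers, not any backend's actual kernel):

```python
def quantize_uint8(x: float, scale: float, zero_point: int) -> int:
    # Map a real value to the uint8 grid, then clamp to [0, 255].
    q = round(x / scale) + zero_point
    return min(255, max(0, q))

def dequantize_uint8(q: int, scale: float, zero_point: int) -> float:
    # Recover the (approximate) real value from its quantized code.
    return scale * (q - zero_point)
```

For symmetric int8 the same mapping applies with zero_point fixed at 0 and clamping to [-128, 127]; the roundtrip error is bounded by scale / 2 for values inside [rmin, rmax], while values outside the range saturate at the clamp.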