Add NoQEngine fallback for quantized ops on RISC-V #15

Open
XYenChi wants to merge 14 commits into RuyiAI-Stack:riscv from XYenChi:noq
XYenChi (Collaborator) commented May 8, 2026

Implement PackedLinearWeightNoQEngine and PackedConvWeightNoQEngine classes that dequantize inputs, run float computation, and requantize outputs. This provides a working fallback when no hardware-specific quantized engine (FBGEMM, QNNPACK, ONEDNN) is available.
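
The dequantize, compute, requantize flow described above can be illustrated with a minimal pure-Python sketch. This is not the code in this PR: the names `QTensor`, `quantize`, `dequantize`, and `fallback_linear` are hypothetical, the weight is assumed to be already unpacked to float rows, and an unsigned 8-bit range with per-tensor affine quantization is assumed.

```python
from dataclasses import dataclass
from typing import List

# Assumed unsigned 8-bit quantized range (hypothetical choice for this sketch).
QMIN, QMAX = 0, 255


@dataclass
class QTensor:
    """Hypothetical per-tensor affine quantized vector: real = (q - zero_point) * scale."""
    values: List[int]
    scale: float
    zero_point: int


def quantize(xs: List[float], scale: float, zero_point: int) -> QTensor:
    """Map floats to integers, clamping to [QMIN, QMAX]."""
    q = [min(QMAX, max(QMIN, round(x / scale) + zero_point)) for x in xs]
    return QTensor(q, scale, zero_point)


def dequantize(t: QTensor) -> List[float]:
    """Recover approximate float values from a quantized vector."""
    return [(q - t.zero_point) * t.scale for q in t.values]


def fallback_linear(x: QTensor, weight_rows: List[List[float]], bias: List[float],
                    out_scale: float, out_zero_point: int) -> QTensor:
    """Engine-free quantized linear: dequantize, float matmul, requantize."""
    xf = dequantize(x)                                   # 1. dequantize the input
    yf = [sum(w * v for w, v in zip(row, xf)) + b        # 2. plain float computation
          for row, b in zip(weight_rows, bias)]
    return quantize(yf, out_scale, out_zero_point)       # 3. requantize the output


x = quantize([1.0, 2.0], scale=0.1, zero_point=0)
out = fallback_linear(x, [[1.0, 1.0], [0.5, -0.5]], [0.0, 1.0],
                      out_scale=0.5, out_zero_point=2)
print(out.values)
```

The trade-off of such a fallback is that it gives up the speed of integer kernels in exchange for working on any platform with float arithmetic.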

XYenChi and others added 14 commits April 19, 2026 23:32
* Add RISC-V 64 BLOCK_LIST

* Skip long-running test cases
* Add riscv64 ci with PR
⭐ Run Main Diff base and head
Push to riscv
From https://github.com/RuyiAI-Stack/pytorch
 * branch              riscv      -> FETCH_HEAD
fatal: Not a valid object name origin/main
Error:   ❌  Failure - Main Diff base and head
Error: exit status 128
* mkldnn is unavailable on RISC-V

* Remove test_cpu_select_algorithm from block_list

* Fix block list format
bytes_to_scalar previously round-tripped raw bytes through Python
float/complex values (via ctypes) before constructing the tensor. This
loses NaN bit patterns on architectures (such as RISC-V) that
canonicalize NaNs in floating-point loads/conversions, causing
test_bytes_to_scalar_cpu_{float32,float64,complex64,complex128} to
fail with mismatched storage bytes.

Construct the scalar tensor by writing the raw bytes directly into its
untyped storage so all input bit patterns (including NaN payloads) are
preserved exactly.
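
As a rough illustration of the byte-preserving idea in this commit message, the sketch below copies raw bytes straight into typed storage without ever materializing a Python float. It uses the standard-library `array` module as a stand-in for the tensor's untyped storage; the helper name `bytes_to_float32_storage` is hypothetical and this is not the actual patch.

```python
import array


def bytes_to_float32_storage(raw: bytes) -> array.array:
    """Copy 4 raw bytes into float32 storage without a float conversion.

    Hypothetical helper: array.frombytes copies the bytes as-is, so NaN
    payloads are never routed through a floating-point load or conversion.
    """
    storage = array.array("f")
    if len(raw) != storage.itemsize:
        raise ValueError("expected exactly one float32 worth of bytes")
    storage.frombytes(raw)
    return storage


# Little-endian bytes of 0x7F800001, a signaling NaN with a non-default payload.
snan_bytes = bytes.fromhex("0100807f")
storage = bytes_to_float32_storage(snan_bytes)

# The stored bytes match the input exactly, payload included.
print(storage.tobytes() == snan_bytes)
```

By contrast, unpacking the bytes into a Python float and packing them again can alter the payload on hardware that canonicalizes NaNs, which is the failure mode the commit message describes.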
These cases are too slow on riscv64, so they are simply added here.

Drop test_torch from the list because it is a core test case.
Fix bytes_to_scalar for float/complex on RISC-V
Implement PackedLinearWeightNoQEngine and PackedConvWeightNoQEngine
classes that dequantize inputs, run float computation, and requantize
outputs. This provides a working fallback when no hardware-specific
quantized engine (FBGEMM, QNNPACK, ONEDNN) is available.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2 participants