Describe what you are looking for
LASX provides 256-bit SIMD with 32 vector registers, analogous to AVX2. Its standout strength is widening integer multiply-accumulate via even/odd split (xvmaddwev_h_b / xvmaddwod_h_b for i8->i16, then xvmaddwev_w_h / xvmaddwod_w_h into i32). No native FP16 or BF16 — only f32 and f64 hardware floats.
dot/ and dots/
The highest-value new kernels are nk_dot_i8_loongapx and nk_dot_u8_loongapx. The widening even/odd multiply-add pair processes 32 i8 elements per 256-bit iteration into i32 accumulators, matching AVX2 throughput. Sub-byte i4/u4 use xvandi_b + shift for nibble extraction, then the same widening chain.
nk_dot_f32_loongapx uses xvfmadd_s directly. nk_dot_bf16_loongapx needs manual upcast: unpack via xvilvl_h / xvilvh_h with zero, left-shift 16 via xvslli_w, reinterpret as f32. FP16 requires a more involved software conversion (rebias the exponent from 15 to 127, shift the mantissa into place, special-case subnormals and NaN/infinity). The e4m3/e5m2 float8 types need LUT-based conversion to f32, similar to Haswell. Batched dots/ variants replicate accumulators across output lanes with the same arithmetic.
Complex dot products (f32c, bf16c) use xvfmul_s + xvxor_v + xvfadd_s for the delayed sign-flip pattern.
spatial/ and spatials/
nk_euclidean_f32_loongapx uses xvfsub_s + xvfmul_s + xvfadd_s. The i8 Euclidean variant benefits most — subtract then widen-multiply the difference with itself through the even/odd chain. Cosine kernels run three accumulators (ab, a^2, b^2) in parallel; LASX's 32 registers handle this comfortably.
BF16/FP16 spatial kernels apply the same manual upcast as dot before the subtract-square-accumulate sequence. Batched spatials/ tiles fit well at 8 f32 lanes per accumulator, 4 accumulators per tile row.
set/ and sets/
nk_hamming_u1_loongapx and nk_jaccard_u1_loongapx use xvxor_v + xvpcnt_b + horizontal sum. Batched sets/ variants replicate accumulators with the same pattern.
Can you contribute to the implementation?
Is your feature request specific to a certain interface?
It applies to everything
Contact Details
No response
Is there an existing issue for this?
Code of Conduct