13 changes: 11 additions & 2 deletions zlib-rs/src/crc32.rs
@@ -6,6 +6,8 @@ use crate::CRC32_INITIAL_VALUE;
 pub(crate) mod acle;
 mod braid;
 mod combine;
+#[cfg(target_arch = "loongarch64")]
+mod loongarch;
 #[cfg(target_arch = "x86_64")]
 mod pclmulqdq;
 #[cfg(target_arch = "x86_64")]
@@ -81,8 +83,15 @@ impl Crc32Fold {
             return;
         }

-        // in this case the start value is ignored
-        self.value = braid::crc32_braid::<5>(self.value, src);
+        #[cfg(target_arch = "loongarch64")]
+        {
+            self.value = self::loongarch::crc32_loongarch64(self.value, src);
+        }
+        #[cfg(not(target_arch = "loongarch64"))]
+        {
+            // in this case the start value is ignored
+            self.value = braid::crc32_braid::<5>(self.value, src);
+        }
     }

     pub fn fold_copy(&mut self, dst: &mut [u8], src: &[u8]) {
118 changes: 118 additions & 0 deletions zlib-rs/src/crc32/loongarch.rs
@@ -0,0 +1,118 @@
pub fn crc32_loongarch64(crc: u32, buf: &[u8]) -> u32 {
    let mut c = !crc as i32;

    // SAFETY: [u8; 8] safely transmutes into i64.
    let (before, middle, after) = unsafe { buf.align_to::<i64>() };

    c = remainder(c, before);

    if middle.is_empty() && after.is_empty() {
        return !c as u32;
    }

    for d in middle {
        c = crc_w_d_w(*d, c);
    }

    c = remainder(c, after);

    !c as u32
}

#[inline]
fn remainder(mut c: i32, mut buf: &[u8]) -> i32 {
    if let [b0, b1, b2, b3, rest @ ..] = buf {
        c = crc_w_w_w(i32::from_le_bytes([*b0, *b1, *b2, *b3]), c);
        buf = rest;
    }

    if let [b0, b1, rest @ ..] = buf {
        c = crc_w_h_w(i16::from_le_bytes([*b0, *b1]), c);
        buf = rest;
    }

    if let [b0, rest @ ..] = buf {
        c = crc_w_b_w(*b0 as i8, c);
        buf = rest;
    }

    debug_assert!(buf.is_empty());

    c
}

crate::cfg_select! {
    miri => {
        use core::arch::loongarch64::{crc_w_b_w, crc_w_h_w, crc_w_w_w, crc_w_d_w};
    }
    _ => {
        use asm::{crc_w_b_w, crc_w_h_w, crc_w_w_w, crc_w_d_w};
    }
}

// FIXME: there are intrinsics for these in the standard library, but currently
// unstable behind the stdarch_loongarch feature
//
// CRC32 instructions are part of the basic integer operations and therefore
// always available.
mod asm {
    /// CRC32 single round checksum for bytes (8 bits).
    ///
    /// [Loongson's documentation](https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#crc-check-instructions)
    pub fn crc_w_b_w(data: i8, mut crc: i32) -> i32 {
        unsafe {
            core::arch::asm!(
                "crc.w.b.w {crc}, {data}, {crc}",
                crc = inout(reg) crc,
                data = in(reg) data,
                options(pure, nomem, nostack, preserves_flags)
            );
        }
        crc
    }
Comment on lines +56 to +72
Member

So here is an idea: rust-lang/miri#4899 added crc32 support for aarch64; you could add similar support for loongarch. A requirement for new shims is often that you actually check the results on real hardware, but that is something you can actually do.

With miri support we can actually test this in CI today if instead of asm you use the intrinsics here. We can add some __internal feature for it if needed.

Separately we could actually stabilize the crc intrinsics on loongarch if the target maintainer is OK with that. For signatures we follow the clang headers by default (but it's ultimately up to the target maintainer), hence the i32 instead of more accurate widths.
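
For readers following along, here is a minimal software sketch of what a shim or cross-check for the byte-wide instruction would have to compute. It assumes, based on the Loongson documentation linked in the diff, that `crc.w.b.w` folds the low 8 bits of the data operand into the running value using the reflected IEEE 802.3 polynomial (the one zlib uses). The names `crc_w_b_w_model` and `crc32_model` are hypothetical and are not part of this PR, Miri, or stdarch.

```rust
/// Reflected IEEE 802.3 CRC-32 polynomial (assumed to match `crc.w.b.w`).
const POLY: u32 = 0xEDB8_8320;

/// Hypothetical software model of one `crc.w.b.w` step: fold the low 8 bits
/// of `data` into `crc`, least-significant bit first.
fn crc_w_b_w_model(data: i8, crc: i32) -> i32 {
    let mut c = (crc as u32) ^ u32::from(data as u8);
    for _ in 0..8 {
        // `mask` is all ones when the low bit is set, zero otherwise.
        let mask = (c & 1).wrapping_neg();
        c = (c >> 1) ^ (POLY & mask);
    }
    c as i32
}

/// Byte-at-a-time CRC-32 with the same pre- and post-inversion that
/// `crc32_loongarch64` in this diff applies.
fn crc32_model(crc: u32, buf: &[u8]) -> u32 {
    let mut c = !crc as i32;
    for &b in buf {
        c = crc_w_b_w_model(b as i8, c);
    }
    !c as u32
}
```

Running the same inputs through a model like this and through the instruction on real hardware is one way to produce the evidence a new shim usually needs.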

Contributor Author

Sure, adding support for these in Miri sounds good. I'll try to put together a PR this evening.

> With miri support we can actually test this in CI today if instead of asm you use the intrinsics here. We can add some __internal feature for it if needed.

So you would be okay with requiring nightly behind a feature flag?

> Separately we could actually stabilize the crc intrinsics on loongarch if the target maintainer is OK with that. For signatures we follow the clang headers by default (but it's ultimately up to the target maintainer), hence the i32 instead of more accurate widths.

Funny enough, the clang headers do have different (in my view, better) signatures (see https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/larchintrin.h#L60) and use i8 and i16 where appropriate. GCC also seems to do this better (https://github.com/gcc-mirror/gcc/blob/master/gcc/config/loongarch/larchintrin.h#L152). I'll submit a patch to align the LLVM intrinsics with clang and GCC. Afterwards I think stabilization would be reasonable, though I've also noticed that LLVM completely prohibits using the CRC intrinsics on 32-bit, while Rust currently exposes them on both 32-bit and 64-bit, which should cause compile errors. I'm not sure whether that's a hardware limitation or not and what is actually correct in that case.

Member

> So you would be okay with requiring nightly behind a feature flag?

We already do for gzprintf, which will just become a default-enabled feature when our msrv reaches whatever version c-variadic will be stable in. We're conservative with bumping the msrv but c-variadic and custom allocators are enticing reasons to bump.

For now, given that I don't think there is a really urgent use case I'd say add it as an __internal_loongarch_crc32 feature, we can give it a proper name once we have a bit more clarity on the testing and stabilization situation.

> Funny enough, the clang headers do have different (in my view, better) signatures

cc @heiher on the crc questions

Contributor Author

Hmm, actually, now that I'm looking at it (after hitting operand-promotion crashes in LLVM when changing the LLVM intrinsics), the aarch64 intrinsics are also defined with i32 in LLVM. The stdarch wrappers around them simply perform the cast, similar to how clang does it. I guess this doesn't need changes in LLVM, only in stdarch.
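
The wrapper pattern described here can be sketched in portable Rust. This is only an illustration: `raw_crc_b` is a hypothetical software stand-in for an intrinsic whose data operand is declared as `i32`, and it assumes the operation only looks at the low 8 bits of that operand. Neither function is the actual stdarch or LLVM definition.

```rust
/// Reflected IEEE 802.3 CRC-32 polynomial.
const POLY: u32 = 0xEDB8_8320;

/// Hypothetical stand-in for an intrinsic declared with an `i32` data
/// operand. Only the low 8 bits of `data` take part in the computation.
fn raw_crc_b(data: i32, crc: i32) -> i32 {
    let mut c = (crc as u32) ^ (data as u32 & 0xFF);
    for _ in 0..8 {
        c = if c & 1 != 0 { (c >> 1) ^ POLY } else { c >> 1 };
    }
    c as i32
}

/// Narrow-typed wrapper: the public signature takes `i8`, and the body
/// simply widens the argument before calling the `i32`-typed operation.
fn crc_w_b_w(data: i8, crc: i32) -> i32 {
    // Sign extension fills the upper 24 bits, which `raw_crc_b` ignores.
    raw_crc_b(i32::from(data), crc)
}
```

Under that assumption the widening cast is harmless: a sign-extended `-1i8` and a plain `0xFF` give the same result.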

I just hope you don't blindly trust Miri for correctness of these operations -- I don't know enough about their intended behavior to make strong promises.^^

OTOH if you are testing both on real HW and Miri and they give the same result, that helps us ensure Miri is correct. :)
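
The cross-checking idea can be sketched without any LoongArch hardware: compute the checksum with two independent implementations and report the first input on which they disagree. In the sketch below both sides are software (a bitwise loop and a table lookup); in the setting discussed here, one side would be the instruction-backed path run on real hardware or under Miri. All names are hypothetical.

```rust
/// Reflected IEEE 802.3 CRC-32 polynomial.
const POLY: u32 = 0xEDB8_8320;

/// Reference implementation: one bit at a time.
fn crc32_bitwise(crc: u32, buf: &[u8]) -> u32 {
    let mut c = !crc;
    for &b in buf {
        c ^= u32::from(b);
        for _ in 0..8 {
            c = if c & 1 != 0 { (c >> 1) ^ POLY } else { c >> 1 };
        }
    }
    !c
}

/// Builds the classic 256-entry lookup table.
fn make_table() -> [u32; 256] {
    let mut table = [0u32; 256];
    for (i, slot) in table.iter_mut().enumerate() {
        let mut c = i as u32;
        for _ in 0..8 {
            c = if c & 1 != 0 { (c >> 1) ^ POLY } else { c >> 1 };
        }
        *slot = c;
    }
    table
}

/// Second implementation: one table lookup per byte.
fn crc32_table(crc: u32, buf: &[u8], table: &[u32; 256]) -> u32 {
    let mut c = !crc;
    for &b in buf {
        c = table[((c ^ u32::from(b)) & 0xFF) as usize] ^ (c >> 8);
    }
    !c
}

/// Compares both implementations on pseudo-random inputs of every length
/// up to `max_len` and returns the first length at which they differ.
fn first_mismatch(max_len: usize) -> Option<usize> {
    let table = make_table();
    let mut state = 0x2545_F491u32; // xorshift32 state, arbitrary nonzero seed
    let mut data = Vec::with_capacity(max_len);
    for len in 0..=max_len {
        // At this point `data` holds exactly `len` bytes.
        if crc32_bitwise(0, &data) != crc32_table(0, &data, &table) {
            return Some(len);
        }
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        data.push(state as u8);
    }
    None
}
```

Agreement between two implementations does not prove either one correct, which is why a known check value or a run on real hardware is still useful as an anchor.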

Member

Yeah, it's the same idea as avx512, which we occasionally run and fuzz on the actual hardware. The CI job also provides some protection against rust/stdarch/llvm doing something weird.


    /// CRC32 single round checksum for half words (16 bits).
    ///
    /// [Loongson's documentation](https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#crc-check-instructions)
    pub fn crc_w_h_w(data: i16, mut crc: i32) -> i32 {
        unsafe {
            core::arch::asm!(
                "crc.w.h.w {crc}, {data}, {crc}",
                crc = inout(reg) crc,
                data = in(reg) data,
                options(pure, nomem, nostack, preserves_flags)
            );
        }
        crc
    }

    /// CRC32 single round checksum for words (32 bits).
    ///
    /// [Loongson's documentation](https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#crc-check-instructions)
    pub fn crc_w_w_w(data: i32, mut crc: i32) -> i32 {
        unsafe {
            core::arch::asm!(
                "crc.w.w.w {crc}, {data}, {crc}",
                crc = inout(reg) crc,
                data = in(reg) data,
                options(pure, nomem, nostack, preserves_flags)
            );
        }
        crc
    }

    /// CRC32 single round checksum for double words (64 bits).
    ///
    /// [Loongson's documentation](https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#crc-check-instructions)
    pub fn crc_w_d_w(data: i64, mut crc: i32) -> i32 {
        unsafe {
            core::arch::asm!(
                "crc.w.d.w {crc}, {data}, {crc}",
                crc = inout(reg) crc,
                data = in(reg) data,
                options(pure, nomem, nostack, preserves_flags)
            );
        }
        crc
    }
}
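
For readers without LoongArch hardware, the head/middle/tail structure of `crc32_loongarch64` can be tried out with a portable sketch. Here `step` is a hypothetical software stand-in for the `crc.w.{b,h,w,d}.w` family, assumed to fold its bytes in little-endian order with the reflected IEEE 802.3 polynomial. One deliberate difference from the diff: the 4-byte case uses `while let` instead of `if let`, so the sketch stays correct even if `align_to` returns a prefix longer than 7 bytes, which its documentation permits.

```rust
/// Reflected IEEE 802.3 CRC-32 polynomial.
const POLY: u32 = 0xEDB8_8320;

/// Hypothetical software stand-in for the CRC instructions: folds `bytes`
/// into `c` in memory order.
fn step(mut c: u32, bytes: &[u8]) -> u32 {
    for &b in bytes {
        c ^= u32::from(b);
        for _ in 0..8 {
            c = if c & 1 != 0 { (c >> 1) ^ POLY } else { c >> 1 };
        }
    }
    c
}

/// Handles an unaligned head or tail as 4-byte, 2-byte and 1-byte pieces.
fn remainder(mut c: u32, mut buf: &[u8]) -> u32 {
    // `while let` (not `if let` as in the diff) tolerates long prefixes.
    while let [b0, b1, b2, b3, rest @ ..] = buf {
        c = step(c, &[*b0, *b1, *b2, *b3]);
        buf = rest;
    }
    if let [b0, b1, rest @ ..] = buf {
        c = step(c, &[*b0, *b1]);
        buf = rest;
    }
    if let [b0, rest @ ..] = buf {
        c = step(c, &[*b0]);
        buf = rest;
    }
    debug_assert!(buf.is_empty());
    c
}

/// Splits the input into an unaligned head, aligned 8-byte words, and a tail.
fn crc32_split(crc: u32, buf: &[u8]) -> u32 {
    let mut c = !crc;
    // SAFETY: every bit pattern is a valid `u64`, so reinterpreting the
    // aligned part of a byte slice as `u64` words is sound.
    let (before, middle, after) = unsafe { buf.align_to::<u64>() };
    c = remainder(c, before);
    for word in middle {
        // `to_ne_bytes` recovers the bytes in their original memory order.
        c = step(c, &word.to_ne_bytes());
    }
    c = remainder(c, after);
    !c
}
```

Checking `crc32_split` against a plain byte-wise loop over many start offsets and lengths exercises every combination of head, middle and tail sizes.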