On the vx4b architecture the arithmetic right shift is less trivial than it looks, because the rsa instruction does not accommodate shifts by more than the length of the word (in C, a shift count greater than or equal to the width of the type is undefined behaviour). PR #222 adds a new API called ashr32_sat, which tries to be the most generic arithmetic shift: it floors when shifting right and saturates when shifting left. The API is partially tested through the higher-level APIs that use it, but it needs tests of its own. There is also a need for s16 (and maybe s64) variants. The function would probably have to move into the scalar APIs, be renamed to something like s32_shr or s32_ashr, and be reimplemented fully in assembly.
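
For reference when writing the standalone tests, here is a minimal portable C sketch of the behaviour described above. It is not the code from PR #222. The name ashr32_sat_sketch, the signature, the sign convention (positive shr shifts right, negative shr shifts left) and saturation to the full INT32_MIN..INT32_MAX range are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical reference model, NOT the implementation from PR #222.
 * Assumed convention: shr > 0 shifts right (flooring),
 *                     shr < 0 shifts left (saturating). */
static int32_t ashr32_sat_sketch(int32_t x, int shr)
{
    if (shr >= 32) {
        /* Shifting right by the word width or more: floor gives -1 or 0. */
        return (x < 0) ? -1 : 0;
    }

    if (shr >= 0) {
        /* Right-shifting a negative signed value is implementation-defined
         * in C, so shift the non-negative complement and complement back.
         * This yields floor(x / 2^shr). */
        return (x >= 0) ? (x >> shr) : ~((~x) >> shr);
    }

    /* Left shift from here on (shr < 0). */
    if (x == 0) {
        return 0;
    }
    if (shr <= -31) {
        /* Any non-zero value shifted left by 31 or more leaves the int32
         * range (or lands exactly on INT32_MIN), so saturate by sign.
         * Comparing shr directly also avoids negating INT_MIN. */
        return (x > 0) ? INT32_MAX : INT32_MIN;
    }

    /* 1 <= -shr <= 30: do the shift as a 64-bit multiply, then clamp. */
    int64_t wide = (int64_t)x * ((int64_t)1 << -shr);
    if (wide > INT32_MAX) {
        return INT32_MAX;
    }
    if (wide < INT32_MIN) {
        return INT32_MIN;
    }
    return (int32_t)wide;
}
```

A model like this could serve as the expected-value oracle in a unit test that sweeps shift amounts well beyond ±32, which is exactly the range the rsa instruction does not cover on its own. If the real API saturates symmetrically or uses the opposite sign convention, the sketch would need adjusting to match.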