Extend support of batch_cast<...> to upcasting to a type twice as big#1184
Conversation
|
@AntoinePrv : this is just some mockup to check if we're good with the API |
9e19732 to
4d848a8
Compare
|
Also cc @JohanMabille & @DiamonDinoia for the API. I didn't want to support the generic conversion, say from uint8 to uint32 with larger arrays, let's start humble |
|
I would call the API On the implementation:
namespace {
template<class T> struct widen;
template<> struct widen<uint8_t> { using type = uint16_t; };
template<> struct widen<uint16_t> { using type = uint32_t; };
template<> struct widen<uint32_t> { using type = uint64_t; };
template<class T>
using widen_t_unsigned = typename widen<T>::type;
} // namespace
template<class T>
struct widen {
static_assert(std::is_integral<T>::value && !std::is_same<T,bool>::value,
"integral non-bool type required");
using U = typename std::make_unsigned<T>::type;
using UWide = detail::widen_t_unsigned<U>;
using type = typename std::conditional<
std::is_signed<T>::value,
typename std::make_signed<UWide>::type,
UWide
>::type;
};
template<class T>
using widen_t = typename widen<T>::type; |
4d848a8 to
affe742
Compare
|
@DiamonDinoia yep, widen looks like a good name, thanks! |
520e96d to
b76733d
Compare
| template <typename T> | ||
| struct widen : widen<typename std::make_unsigned<T>::type> | ||
| { | ||
| }; | ||
|
|
||
| template <> | ||
| struct widen<uint32_t> | ||
| { | ||
| using type = uint64_t; | ||
| }; | ||
| template <> | ||
| struct widen<uint16_t> | ||
| { | ||
| using type = uint32_t; | ||
| }; | ||
| template <> | ||
| struct widen<uint8_t> | ||
| { | ||
| using type = uint16_t; | ||
| }; | ||
| template <> | ||
| struct widen<float> | ||
| { | ||
| using type = double; | ||
| }; |
There was a problem hiding this comment.
I think a trait that convert a byte size to a uint/float type could be more generic:
template <>
struct sized_uint<4>
{
using type = uint32_t;
};
...And then use with batch<sized_uint_t<2*sizeof(T)>> (or an alias for that).
There was a problem hiding this comment.
I am not in favor of this. Then we need sized_int and sized_uint and the API will need an std::conditional<std::is_signed<T>, sized_int_t<2*sizeof(T)>, sized_uint_t<2*sizeof(T)>
AntoinePrv
left a comment
There was a problem hiding this comment.
Thanks for taking the time! Looking good
|
My only suggestion would be to merge masked_memory ops. Or merge lower/upper from it and use it here. |
Yup, I'll wait for it to be merged! |
471c67f to
a470cf8
Compare
| return { batch<T_out, A>::load_aligned(&out_buffer[0]), | ||
| batch<T_out, A>::load_aligned(&out_buffer[batch<T_out, A>::size]) }; |
There was a problem hiding this comment.
out_buffer
and out_buffer + batch<T_out, A>::size seems clearer to me
There was a problem hiding this comment.
(I mean doing pointer arithmetic directly instead of using the [] and & operators)
@DiamonDinoia : Are you fine if I split (haha) that part of your commit in a seperate one so that we don't block each other? |
6e570d1 to
f8136da
Compare
Sure, go ahead. I'll rebase once I have time |
It was relying on an intrinsic only available with avx512dq
f8136da to
6b93d0f
Compare
done (as in: I created a separate PR to extract that part of your patch, merged it and rebased my PR on top of it) |
6b93d0f to
b9c9e8d
Compare
Intel + common implementation + test + doc Fix #1179
b9c9e8d to
7ebdc5e
Compare
Fix #1179