Refactor arithmetic ops to use std::variant + std::visit #723
Conversation
Remove Arithmetic_internal.hpp/cpp and Ari_ii from linalg_internal_interface
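For context, a minimal, self-contained sketch of the std::variant + std::visit dispatch pattern the title refers to; the names and the set of element types here are illustrative only, not the actual Cytnx storage API:

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical stand-in for a type-erased buffer: a variant over pointers to
// the supported element types (the real Cytnx storage types are richer).
using ElemPtr = std::variant<float*, double*, std::int32_t*, std::int64_t*>;

// Add a scalar to every element; one std::visit resolves the runtime dtype,
// and inside the lambda the element type is an ordinary compile-time type.
template <typename Scalar>
void add_scalar(ElemPtr buf, std::size_t n, Scalar v) {
  std::visit(
    [&](auto* ptr) {
      using T = std::remove_pointer_t<decltype(ptr)>;
      for (std::size_t i = 0; i < n; ++i) ptr[i] += static_cast<T>(v);
    },
    buf);
}

int main() {
  std::vector<double> data{1.0, 2.0, 3.0};
  add_scalar(ElemPtr{data.data()}, data.size(), 1);  // dispatches on double*
  for (double x : data) std::cout << x << ' ';       // prints: 2 3 4
  std::cout << '\n';
}
```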
|
Not sure what happened in the GitHub Action: https://github.com/Cytnx-dev/Cytnx/actions/runs/19826283869/job/56800387981?pr=723 |
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##           master     #723      +/-   ##
==========================================
+ Coverage   32.35%   36.18%   +3.83%
==========================================
  Files         215      213       -2
  Lines       36363    30579    -5784
  Branches    14597    12655    -1942
==========================================
- Hits        11764    11065     -699
+ Misses      22659    17679    -4980
+ Partials     1940     1835     -105
```

☔ View full report in Codecov by Sentry.
|
|
This is looking good - you've removed about 20,000 lines of code in the last couple of weeks? 😁 There is still a lot of repetition with the

```cpp
template <typename T>
Tensor Add(const T &lc, const Tensor &Rt) {
  auto type = cy_typeid_v<T>;
  Storage Cnst(1, type);
  Cnst.at<T>(0) = lc;
  Tensor out;
  out._impl = Rt._impl->_clone_meta_only();
  out._impl->storage() =
    Storage(Rt._impl->storage().size(), std::min(type, Rt.dtype()), Rt.device());
  if (Rt.device() == Device.cpu) {
    std::visit(
      [&](auto *rptr) {
        using TR = std::remove_pointer_t<decltype(rptr)>;
        cytnx::linalg_internal::AddInternalImpl<type, TR>(
          out._impl->storage()._impl, Cnst._impl, Rt._impl->storage()._impl,
          Rt._impl->storage()._impl->size(), {}, {}, {});
      },
      Rt.ptr());
  } else {
#ifdef UNI_GPU
    checkCudaErrors(cudaSetDevice(Rt.device()));
    linalg_internal::lii.cuAri_ii[type][Rt.dtype()](
      out._impl->storage()._impl, Cnst._impl, Rt._impl->storage()._impl,
      Rt._impl->storage()._impl->size(), {}, {}, {}, 0);
#else
    cytnx_error_msg(true, "[Add] fatal error, the tensor is on GPU without CUDA support.%s",
                    "\n");
#endif
  }
}
```

Now the type of
Since
But

```cpp
cytnx::linalg_internal::AddInternalImpl<lhs_type, rhs_type, out_type>(lhs, rhs, out);
```

This involves either a triple dispatch of

Now, for the GPU version, I just noticed:

Cytnx/src/backend/linalg_internal_gpu/cuAdd_internal.cu, lines 5291 to 5300 in c5dd9c4

ARGH: the output type is hard-coded to be
```cpp
template <typename T>
Tensor Add(const T &lc, const Tensor &Rt) {
  // CPU path: use the templated AddInternalImpl directly, with the output type
  // determined by type_promote().
  if (Rt.device() == Device.cpu) {
    const auto lhs_type = cy_typeid_v<T>;
    const auto rhs_type = Rt.dtype();
    const auto out_type = cytnx::Scalar_init_interface::type_promote(lhs_type, rhs_type);
    // Construct constant tensor and output tensor with correct promoted dtype
    Tensor Cnst({1}, lhs_type);  // one element, rank 1 tensor
    Cnst.at<T>(0) = lc;
    Tensor out;
    out._impl = Rt._impl->_clone_meta_only();
    out._impl->storage() = Storage(Rt._impl->storage().size(), out_type, Rt.device());
    // Dispatch to fully-typed templated kernel
    cytnx::linalg_internal::AddInternalImpl(Cnst, Rt, out);
    return out;
  } else {
#ifdef UNI_GPU
    // GPU path: the existing cuAdd kernels assume the output dtype is
    // std::min(lhs_type, rhs_type), so we must match this until the kernels
    // are regenerated with correct type promotion semantics.
    const auto lhs_type = cy_typeid_v<T>;
    const auto rhs_type = Rt.dtype();
    const auto out_type = std::min(lhs_type, rhs_type);
    Storage Cnst(1, lhs_type);
    Cnst.at<T>(0) = lc;
    Tensor out;
    out._impl = Rt._impl->_clone_meta_only();
    out._impl->storage() = Storage(Rt._impl->storage().size(), out_type, Rt.device());
    checkCudaErrors(cudaSetDevice(Rt.device()));
    linalg_internal::lii.cuAri_ii[lhs_type][rhs_type](
      out._impl->storage()._impl, Cnst._impl, Rt._impl->storage()._impl,
      Rt._impl->storage()._impl->size(), {}, {}, {}, 0);
    return out;
#else
    cytnx_error_msg(true, "[Add] fatal error: GPU tensor without CUDA support.%s", "\n");
#endif
  }
}
```

There is nothing we can do about the GPU version; in order to call the existing

It is also obvious that there are no unit tests covering tricky cases involving type promotions and mixed signed/unsigned etc. Someone should add some, with annotations that some corner cases are currently expected to fail. For example, addition of a
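On the missing coverage: a rough sketch of the kind of promotion tests that could be added. This assumes a GoogleTest-based test target and guesses at the public API names (the umbrella header, zeros, linalg::Add, and the Type constants are assumptions); adjust to whatever the existing test suite actually uses.

```cpp
#include <gtest/gtest.h>
#include "cytnx.hpp"  // assumed umbrella header; adjust to the real include

using namespace cytnx;

// A scalar int added to a double tensor should produce a double tensor.
TEST(AddTypePromotion, IntScalarPlusDoubleTensor) {
  Tensor rhs = zeros({4}, Type.Double);
  Tensor out = linalg::Add(cytnx_int64(2), rhs);
  EXPECT_EQ(out.dtype(), Type.Double);
}

// Mixed signed/unsigned is one of the tricky corner cases mentioned above.
// Prefix with DISABLED_ (or use GTEST_SKIP) while it is expected to fail.
TEST(AddTypePromotion, DISABLED_SignedScalarPlusUnsignedTensor) {
  Tensor rhs = zeros({4}, Type.Uint64);
  Tensor out = linalg::Add(cytnx_int64(-1), rhs);
  // The expected dtype is whatever type_promote(Int64, Uint64) specifies;
  // the key point is that the result must not silently truncate to Uint64.
  EXPECT_EQ(out.dtype(), Type.Int64);
}
```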
|
For the HTTP error, it is fine to just rerun the failed test. |
Yeah, you’re right, it’s a critical problem. I will solve the other issues later. |
|
@codex review I think the best solution here is a larger patch that completely gets rid of the function dispatch tables and replaces them with templates. Then we are guaranteed that the result type of each operation agrees with the type_promote function (which could then be modified, potentially). This would also facilitate adding new types, e.g. low-precision GPU types, quad precision, etc. Doing it bit by bit is ultimately much more work. |
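One way to picture that direction: a minimal sketch, with made-up names (promote_t only stands in for the project's type_promote rules), of a single templated kernel replacing a per-[lhs][rhs] function-pointer table:

```cpp
#include <cstddef>
#include <type_traits>

// Stand-in promotion rule; in Cytnx this would be derived from type_promote()
// so the compile-time output type and the runtime rule can never diverge.
template <typename L, typename R>
using promote_t = std::common_type_t<L, R>;

// One fully typed kernel instead of a per-(lhs, rhs) entry in a dispatch table.
template <typename L, typename R, typename Out = promote_t<L, R>>
void add_kernel(const L* lhs, const R* rhs, Out* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    out[i] = static_cast<Out>(lhs[i]) + static_cast<Out>(rhs[i]);
}

// The runtime-typed entry point would std::visit over each operand's
// variant-of-pointers and instantiate add_kernel<L, R>. Supporting a new
// element type (low-precision GPU types, quad precision, ...) then means
// adding it to the variant, not filling in a new row and column of a table.
```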
|
Codex Review: Didn't find any major issues. Swish! |
|
Well, @codex wasn't much use here. Why didn't it at least say something about the comment I made earlier? |
|
To use Codex here, create an environment for this repo. |
|
Addressed Ian’s comment #2 in commit
What changed:
Validation run:
All listed tests pass. This moves Add in the template-driven direction and removes a large chunk of dispatch-like boilerplate. The remaining step for Ian’s full request is replacing the GPU |
|
@codex environment created |
Summary
Testing
|