
Refactor arithmetic ops to use std::variant + std::visit#723

Open
jysh1214 wants to merge 7 commits into master from dev/alexchiang/refactor-arithop

Conversation

@jysh1214
Collaborator

@jysh1214 jysh1214 commented Dec 1, 2025

Remove Arithmetic_internal.hpp/cpp and Ari_ii from linalg_internal_interface
@jysh1214 jysh1214 requested a review from IvanaGyro December 1, 2025 14:35
@jysh1214
Collaborator Author

jysh1214 commented Dec 1, 2025

Not sure what happened in the GitHub action:

Collecting package metadata (repodata.json): failed
CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://conda.anaconda.org/conda-forge/noarch/repodata.json>
Elapsed: -

An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.
'https//conda.anaconda.org/conda-forge/noarch'

https://github.com/Cytnx-dev/Cytnx/actions/runs/19826283869/job/56800387981?pr=723

@codecov

codecov bot commented Dec 1, 2025

Codecov Report

❌ Patch coverage is 15.72387% with 879 lines in your changes missing coverage. Please review.
✅ Project coverage is 36.18%. Comparing base (e87c3a2) to head (d60ea7f).
⚠️ Report is 32 commits behind head on master.

Files with missing lines Patch % Lines
src/linalg/Div.cpp 4.09% 232 Missing and 2 partials ⚠️
src/linalg/Sub.cpp 7.37% 221 Missing and 5 partials ⚠️
src/linalg/Mod.cpp 3.58% 188 Missing ⚠️
src/linalg/Cpr.cpp 0.00% 108 Missing ⚠️
src/linalg/Mul.cpp 25.18% 96 Missing and 5 partials ⚠️
src/linalg/Add.cpp 80.95% 3 Missing and 9 partials ⚠️
src/linalg/iArithmetic_visit.hpp 80.95% 6 Missing and 2 partials ⚠️
src/linalg/iDiv.cpp 66.66% 1 Missing ⚠️
src/linalg/iMul.cpp 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #723      +/-   ##
==========================================
+ Coverage   32.35%   36.18%   +3.83%     
==========================================
  Files         215      213       -2     
  Lines       36363    30579    -5784     
  Branches    14597    12655    -1942     
==========================================
- Hits        11764    11065     -699     
+ Misses      22659    17679    -4980     
+ Partials     1940     1835     -105     

☔ View full report in Codecov by Sentry.

@ianmccul
Collaborator

ianmccul commented Dec 1, 2025

This is looking good - you've removed about 20,000 lines of code in the last couple of weeks? 😁

There is still a lot of repetition with the Tensor Add<T>(const T &lc, const Tensor &Rt) templates in src/linalg/Add.cpp; is it possible to replace that with something like:

    template <typename T>
    Tensor Add(const T &lc, const Tensor &Rt) {
      auto type = cy_typeid_v<T>;
      Storage Cnst(1, type);
      Cnst.at<T>(0) = lc;
      Tensor out;
      out._impl = Rt._impl->_clone_meta_only();
      out._impl->storage() =
        Storage(Rt._impl->storage().size(), std::min(type, Rt.dtype()), Rt.device());

      if (Rt.device() == Device.cpu) {
        std::visit(
          [&](auto *rptr) {
            using TR = std::remove_pointer_t<decltype(rptr)>;
            cytnx::linalg_internal::AddInternalImpl<type, TR>(
              out._impl->storage()._impl, Cnst._impl, Rt._impl->storage()._impl,
              Rt._impl->storage()._impl->size(), {}, {}, {});
          },
          Rt.ptr());
      } else {
  #ifdef UNI_GPU
        checkCudaErrors(cudaSetDevice(Rt.device()));
        linalg_internal::lii.cuAri_ii[type][Rt.dtype()](
          out._impl->storage()._impl, Cnst._impl, Rt._impl->storage()._impl,
          Rt._impl->storage()._impl->size(), {}, {}, {}, 0);
  #else
        cytnx_error_msg(true, "[Add] fatal error, the tensor is on GPU without CUDA support.%s",
                        "\n");
  #endif
      }
      return out;
    }

Now the type of out looks very suspicious. The old code (equivalent to the std::min() I used above) is not identical to the result of the type_promote() function, which is most likely a bug¹. Although they are both broken (e.g. complex<float> + double would end up as complex<float> under either mechanism), it is absolutely essential that the type selected here matches the output type used in the linalg_internal function, since this is just assumed and there is no checking.

Since AddInternalImpl defines using TOut = cytnx::Scalar_init_interface::type_promote_t<TLin, TRin> the same function must be used here as well, otherwise there will be a mismatch. This would be detectable with mixed types involving unsigned, eg type_promote(uint32, int16) is int32, but min(uint32, int16) is uint32. So addition of Tensor of uint32 and int16 will incorrectly give a result tensor that is an unsigned type. (Note that int32 is also the wrong type here, since converting uint32 to int32 might overflow. It ought to be int64, but that is not fixable until all of the implicitly assumed return types are fixed...)
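The uint32/int16 pitfall can be reproduced with plain C++ arithmetic, since the language's usual arithmetic conversions behave exactly like the min() rule. A minimal sketch (the helper names here are just for illustration, not cytnx code):

```cpp
#include <cstdint>

// C++'s usual arithmetic conversions mirror the min() bug: uint32_t mixed
// with int16_t yields an unsigned result, so a negative operand wraps.
inline uint32_t mixed_add_wrapping(uint32_t u, int16_t s) {
  return u + s;  // int16_t promotes to int, then converts to unsigned int
}

// A range-safe promotion needs a signed type wide enough for both operand
// ranges, i.e. the int64 result argued for above.
inline int64_t mixed_add_promoted(uint32_t u, int16_t s) {
  return static_cast<int64_t>(u) + static_cast<int64_t>(s);
}
```

So adding 1u and int16_t(-2) wraps to UINT32_MAX under the min()-style rule, while the promoted version gives -1.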

But cytnx::linalg_internal::AddInternalImpl is now a template anyway, so why convert to a type-erased interface, only to convert back again? Better would be to construct the output tensor and pass that in to a typed interface:

cytnx::linalg_internal::AddInternalImpl<lhs_type, rhs_type, out_type>(lhs, rhs, out);

This involves either a triple dispatch of std::variant (OK, but lots of combinations!), or a double dispatch on lhs and rhs plus a runtime assert that out.dtype() matches type_promote(lhs, rhs). Or alternatively, move the implementation of AddInternalImpl into Add, since both functions are now fairly simple. (That is the better option I think, because then the output type is localized to the Add function and does not need to be kept in synchronization with anything else.)
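The variant dispatch plus runtime type check could look roughly like this toy sketch with only two element types (ElemPtr and add_dispatch are made-up names, and std::common_type_t stands in for cytnx's type_promote_t):

```cpp
#include <cstddef>
#include <type_traits>
#include <variant>

// Toy sketch only: two supported element types, illustrative names.
using ElemPtr = std::variant<int*, double*>;

// Visits all three variants together; returns false when the caller
// allocated `out` with a type that does not match the promoted type of
// lhs and rhs (the runtime contract check described above).
inline bool add_dispatch(ElemPtr out, ElemPtr lhs, ElemPtr rhs, std::size_t n) {
  return std::visit(
      [n](auto* o, auto* l, auto* r) {
        using TL = std::remove_pointer_t<decltype(l)>;
        using TR = std::remove_pointer_t<decltype(r)>;
        using TOut = std::remove_pointer_t<decltype(o)>;
        if (!std::is_same_v<TOut, std::common_type_t<TL, TR>>) return false;
        for (std::size_t i = 0; i < n; ++i) o[i] = static_cast<TOut>(l[i] + r[i]);
        return true;
      },
      out, lhs, rhs);
}
```

With cytnx's full dtype list the combination count grows quickly, which is the trade-off against folding AddInternalImpl into Add directly.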

Now, for the GPU version, I just noticed:

void cuAdd_internal_u32ti16(boost::intrusive_ptr<Storage_base> &out,
                            boost::intrusive_ptr<Storage_base> &Lin,
                            boost::intrusive_ptr<Storage_base> &Rin,
                            const unsigned long long &len,
                            const std::vector<cytnx_uint64> &shape,
                            const std::vector<cytnx_uint64> &invmapper_L,
                            const std::vector<cytnx_uint64> &invmapper_R) {
  cytnx_uint32 *_out = (cytnx_uint32 *)out->data();
  cytnx_uint32 *_Lin = (cytnx_uint32 *)Lin->data();
  cytnx_int16 *_Rin = (cytnx_int16 *)Rin->data();

ARGH: the output type is hard-coded to be uint32! This is different from what is given by type_promote(uint32, int16) and is just broken². So for the time being, I'd try

template <typename T>
Tensor Add(const T &lc, const Tensor &Rt) {
  // CPU path: use the templated AddInternalImpl directly, with the output type
  // determined by type_promote().
  if (Rt.device() == Device.cpu) {
    const auto lhs_type = cy_typeid_v<T>;
    const auto rhs_type = Rt.dtype();
    const auto out_type = cytnx::Scalar_init_interface::type_promote(lhs_type, rhs_type);

    // Construct constant tensor and output tensor with correct promoted dtype
    Tensor Cnst({1}, lhs_type);  // one element, rank 1 tensor
    Cnst.at<T>(0) = lc;
    Tensor out;
    out._impl = Rt._impl->_clone_meta_only();
    out._impl->storage() = Storage(Rt._impl->storage().size(), out_type, Rt.device());

    // Dispatch to fully-typed templated kernel
    cytnx::linalg_internal::AddInternalImpl(Cnst, Rt, out);

    return out;
  } else {
#ifdef UNI_GPU
    // GPU path: the existing cuAdd kernels assume the output dtype is
    // std::min(lhs_type, rhs_type), so we must match this until the kernels
    // are regenerated with correct type promotion semantics.
    const auto lhs_type = cy_typeid_v<T>;
    const auto rhs_type = Rt.dtype();
    const auto out_type = std::min(lhs_type, rhs_type);

    Storage Cnst(1, lhs_type);
    Cnst.at<T>(0) = lc;
    Tensor out;
    out._impl = Rt._impl->_clone_meta_only();
    out._impl->storage() = Storage(Rt._impl->storage().size(), out_type, Rt.device());

    checkCudaErrors(cudaSetDevice(Rt.device()));
    linalg_internal::lii.cuAri_ii[lhs_type][rhs_type](
      out._impl->storage()._impl, Cnst._impl, Rt._impl->storage()._impl,
      Rt._impl->storage()._impl->size(), {}, {}, {}, 0);

    return out;
#else
    cytnx_error_msg(true, "[Add] fatal error: GPU tensor without CUDA support.%s", "\n");
#endif
  }
}

There is nothing we can do about the GPU version; in order to call the existing linalg_internal::lii.cuAri_ii we must use the incorrect type given by min(lhs, rhs). Replacing the GPU linalg_internal::lii.cuAri_ii functions with type-safe templates should be done soon! And in the process, relying on Unified Memory to pass a single number into a CUDA kernel isn't great (paging a 4K block from the CPU to GPU) -- it would be much better to write a separate kernel to add a constant to a tensor, and pass the constant as a parameter to the kernel!
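A constant-add kernel that takes the scalar by value, instead of through a Unified Memory buffer, could be sketched roughly like this (illustrative only: cuAddConst_kernel and the launch configuration are not existing cytnx names):

```cuda
// Hypothetical kernel: the scalar travels in the kernel argument buffer,
// so no page migration is needed just to read one number.
template <typename TOut, typename TIn>
__global__ void cuAddConst_kernel(TOut* out, const TIn* in, TIn c,
                                  unsigned long long len) {
  unsigned long long i = blockIdx.x * (unsigned long long)blockDim.x + threadIdx.x;
  if (i < len) out[i] = static_cast<TOut>(in[i] + c);
}

// launch sketch:
// cuAddConst_kernel<<<(len + 255) / 256, 256>>>(out, in, c, len);
```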

It is also obvious that there are no unit tests covering tricky cases involving type promotions and mixed signed/unsigned etc. Someone should add some, with annotations marking which corner cases are currently expected to fail. For example, addition of a Tensor<uint32> and a Tensor<int16> will give Tensor<int32> if running on a CPU, but Tensor<uint32> if running on a GPU, whereas the correct result should be Tensor<int64>. Also, Tensor<complex<float>> + double should give Tensor<complex<double>> but currently gives Tensor<complex<float>>.
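A toy promotion rule pinning down just the cases named above could be a starting point for those tests (the DT enum and promote() here are illustrative only, not cytnx's Type API):

```cpp
// Illustrative-only sketch of the promotion semantics the tests should
// enforce; cytnx's real dtype system is richer than this.
enum class DT { Int16, Int32, Int64, Uint32, Float, Double, CplxFloat, CplxDouble };

// Handles only the two cases discussed above; everything else falls through.
inline DT promote(DT a, DT b) {
  auto pair_is = [&](DT x, DT y) { return (a == x && b == y) || (a == y && b == x); };
  // mixed signed/unsigned: next wider signed type that covers both ranges
  if (pair_is(DT::Uint32, DT::Int16)) return DT::Int64;
  // complex dominates, and its precision must cover the real operand
  if (pair_is(DT::CplxFloat, DT::Double)) return DT::CplxDouble;
  return a == b ? a : DT::Double;  // placeholder for the rest of the table
}
```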

Footnotes

  1. For some archaeology, my guess is that originally the code used PyTorch-style min(lhs, rhs) for type promotions, but PyTorch historically never had unsigned types (it does now, I think, though incompletely). When unsigned int was added, or it was realized that min(lhs, rhs) doesn't work properly with unsigned, the type_promote mechanism was added, but some of the kernels involving unsigned were never updated...

  2. Well, C/C++ is broken in the same way...

@IvanaGyro
Collaborator

> Not sure what happened in the GitHub action:

For HTTP error, it is fine to just rerun the failed test.

@jysh1214
Collaborator Author

jysh1214 commented Dec 6, 2025

> It is also obvious that there are no unit tests covering tricky cases involving type promotions and mixed signed/unsigned etc.

Yeah, you’re right, it’s a critical problem.
I’ve sent PR #730 for the mixed-type unit tests. Please help review it. Thx.

I will address the other issues later.

@ianmccul
Collaborator

@codex review

I think the best solution here is a larger patch that completely gets rid of the function dispatch tables and replaces it with templates. Then we are guaranteed that the result type of each operation agrees with the type_promote function (which could then be modified, potentially). This would also facilitate adding new types, eg low-precision GPU types, quad precision, etc. Doing it bit by bit is ultimately much more work.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Swish!


@ianmccul
Collaborator

Well, @codex wasn't much use here. Why didn't it at least say something about the comment I made earlier?

@chatgpt-codex-connector

To use Codex here, create an environment for this repo.

@yingjerkao
Collaborator

yingjerkao commented Mar 30, 2026

Addressed Ian’s comment #2 in commit bc6f6d92.

What changed:

  • Refactored src/linalg/Add.cpp to remove per-type scalar overload boilerplate and use a single templated scalar+tensor path with std::visit dispatch on CPU.
  • Added explicit template instantiations for supported scalar types instead of hand-written specializations.
  • Added a shared output-dtype selector in Add:
    • CPU path now uses Type.type_promote(lhs, rhs).
    • GPU path intentionally keeps legacy min(lhs, rhs) behavior to stay compatible with current cuAri_ii kernels.
  • Added regression tests in tests/linalg_test/linalg_test.cpp:
    • Tensor_Add_mixed_dtype_type_promote_cpu
    • Tensor_Add_scalar_mixed_dtype_type_promote_cpu
      These fail under old min(lhs, rhs) CPU allocation and pass with this patch.
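The shared output-dtype selector described above might look roughly like this (select_out_dtype and Device are illustrative stand-ins, and the promote callback stands in for Type.type_promote; dtype ids are plain unsigned here, where in cytnx's historical ordering a smaller id means a "larger" type, which is why the legacy GPU rule is std::min):

```cpp
#include <algorithm>

enum class Device { Cpu, Gpu };

// Hypothetical sketch, not the actual commit's code: CPU gets correct
// promotion semantics, GPU keeps the legacy min() rule so it stays
// compatible with the current cuAri_ii kernels.
inline unsigned select_out_dtype(unsigned lhs, unsigned rhs, Device dev,
                                 unsigned (*promote)(unsigned, unsigned)) {
  if (dev == Device::Cpu) return promote(lhs, rhs);  // correct semantics
  return std::min(lhs, rhs);  // legacy rule matching current GPU kernels
}
```

Centralizing the choice in one helper keeps the CPU/GPU divergence explicit and makes it easy to delete once the GPU kernels are regenerated.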

Validation run:

  • cmake --build --preset openblas-cpu --target test_main -- -j8
  • ./build/openblas-cpu/tests/test_main --gtest_filter=DenseUniTensorTest.Add_UT_UT:DenseUniTensorTest.Add_UT_Scalar:DenseUniTensorTest.operatorAdd_UT_Scalar:BlockUniTensorTest.Add:linalg_Test.Tensor_Add_mixed_dtype_type_promote_cpu:linalg_Test.Tensor_Add_scalar_mixed_dtype_type_promote_cpu

All listed tests pass.

This gives the template-driven direction on Add and removes a large chunk of dispatch-like boilerplate. The remaining step for Ian’s full request is replacing the GPU lii.cuAri_ii tables with typed kernels (so GPU can also use type_promote semantics).

@ianmccul
Collaborator

@codex environment created

@chatgpt-codex-connector

Summary

  • Added a new regression test Tensor_Add_scalar_object_mixed_dtype_type_promote_cpu to cover the Scalar overloads of linalg::Add (Scalar + Tensor and Tensor + Scalar) for mixed dtypes on CPU. The test validates both promoted dtype selection (Type.type_promote) and numeric results. tests/linalg_test/linalg_test.cpp L344-L356
  • This complements the existing tensor/tensor and primitive-scalar mixed-dtype checks and helps guard against future regressions in Add dtype promotion behavior. tests/linalg_test/linalg_test.cpp L321-L342

Testing

  • ⚠️ cmake --preset openblas-cpu (failed in this environment because Boost CMake package files were not found: BoostConfig.cmake / boost-config.cmake).
  • ⚠️ cmake --build --preset openblas-cpu --target test_main -- -j8 (could not run because configure step did not complete, so build/openblas-cpu was not created).


@yingjerkao yingjerkao requested a review from chiamin March 31, 2026 02:09