Skip to content

sanitize python more#634

Open
Yurlungur wants to merge 3 commits intomainfrom
jmm/sanitize-python-more
Open

sanitize python more#634
Yurlungur wants to merge 3 commits intomainfrom
jmm/sanitize-python-more

Conversation

@Yurlungur
Copy link
Copy Markdown
Collaborator

@Yurlungur Yurlungur commented Apr 12, 2026

PR Summary

This MR is a follow up of #630 and issue #628 . I found I was still seeing compilation errors on GPU builds with python active, and the reason was that the lambda was marked __host__ __device__ even though the execution space was host. This MR adds a constexpr branch to switch to a host-only lambda when appropriate.

PR Checklist

  • Adds a test for any bugs fixed. Adds tests for new features.
  • Format your changes by using the make format command after configuring with cmake.
  • Document any new features, update documentation for changes made.
  • Make sure the copyright notice on any files you modified is up to date.
  • After creating a pull request, note it in the CHANGELOG.md file.
  • LANL employees: make sure tests pass both on the github CI and on the Darwin CI
  • If ML was used, make sure to add a disclaimer at the top of a file indicating ML was used to assist in generating the file.
  • If Agentic AI was used, have the AI generate a "proposed changes" markdown file and store it in the plan_histories folder, with a filename the same as the MR number.

If preparing for a new release, in addition please check the following:

  • Update the version in cmake.
  • Move the changes in the CHANGELOG.md file under a new header for the new release, and reset the categories.
  • Maintainers: ensure spackages are up to date:
    • LANL-internal team, update XCAP spackages
    • Current maintainer of upstream spackages, submit MR to spack

@Yurlungur Yurlungur self-assigned this Apr 12, 2026
@Yurlungur Yurlungur added bug Something isn't working build Something to do with the build system clean-up Changes that don't affect code execution but make the code cleaner labels Apr 12, 2026
Copy link
Copy Markdown
Collaborator

@adamdempsey90 adamdempsey90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have to do this manual switch on every function that could be called from python?

@Yurlungur
Copy link
Copy Markdown
Collaborator Author

Do you have to do this manual switch on every function that could be called from python?

Every C++ function. The alternative would be to do a for loop in Python directly, which would be a bit of a hack, but maybe worth it. That would prevent the Python from doing efficient EOSPAC calls though.

@Yurlungur
Copy link
Copy Markdown
Collaborator Author

@adamdempsey90 do you object to this? I don't love it, but I do think it's the simplest/cleanest path to getting things working the way we need them to.

@adamdempsey90
Copy link
Copy Markdown
Collaborator

@adamdempsey90 do you object to this? I don't love it, but I do think it's the simplest/cleanest path to getting things working the way we need them to.

I don't know, I'm not a huge fan of having to write the same for loops twice. Do you understand the root cause of why the normal portable lambda can't be used?

@Yurlungur
Copy link
Copy Markdown
Collaborator Author

@adamdempsey90 do you object to this? I don't love it, but I do think it's the simplest/cleanest path to getting things working the way we need them to.

I don't know, I'm not a huge fan of having to write the same for loops twice.

Neither am I.

Do you understand the root cause of why the normal portable lambda can't be used?

Yes. Because it marks it as __host__ __device__. But if that lambda then calls the operator[] for, say, a numpy array, that is not kosher, because the numpy array doesn't have device markings.

@adamdempsey90
Copy link
Copy Markdown
Collaborator

@adamdempsey90 do you object to this? I don't love it, but I do think it's the simplest/cleanest path to getting things working the way we need them to.

I don't know, I'm not a huge fan of having to write the same for loops twice.

Neither am I.

Do you understand the root cause of why the normal portable lambda can't be used?

Yes. Because it marks it as __host__ __device__. But if that lambda then calls the operator[] for, say, a numpy array, that is not kosher, because the numpy array doesn't have device markings.

Maybe an alternative then is to have a small struct that wraps the numpy pointer (?) in an unmanaged host view?

@Yurlungur
Copy link
Copy Markdown
Collaborator Author

@adamdempsey90 do you object to this? I don't love it, but I do think it's the simplest/cleanest path to getting things working the way we need them to.

I don't know, I'm not a huge fan of having to write the same for loops twice.

Neither am I.

Do you understand the root cause of why the normal portable lambda can't be used?

Yes. Because it marks it as __host__ __device__. But if that lambda then calls the operator[] for, say, a numpy array, that is not kosher, because the numpy array doesn't have device markings.

Maybe an alternative then is to have a small struct that wraps the numpy pointer (?) in an unmanaged host view?

That imposes a requirement on stride, which we currently don't impose. Though perhaps one could be clever.

@adamdempsey90
Copy link
Copy Markdown
Collaborator

That imposes a requirement on stride, which we currently don't impose. Though perhaps one could be clever.

What do you mean? There's LayoutStride that can accept whatever striding you want

@adamdempsey90
Copy link
Copy Markdown
Collaborator

That imposes a requirement on stride, which we currently don't impose. Though perhaps one could be clever.

What do you mean? There's LayoutStride that can accept whatever striding you want

I should say, you just set the stride at runtime to whatever the numpy array has, but the View would be LayoutStride and Unmanaged

@Yurlungur
Copy link
Copy Markdown
Collaborator Author

That imposes a requirement on stride, which we currently don't impose. Though perhaps one could be clever.

What do you mean? There's LayoutStride that can accept whatever striding you want

I should say, you just set the stride at runtime to whatever the numpy array has, but the View would be LayoutStride and Unmanaged

We have to thread the numpy language for stride/layout into the Kokkos unmanaged view.

@Yurlungur
Copy link
Copy Markdown
Collaborator Author

That imposes a requirement on stride, which we currently don't impose. Though perhaps one could be clever.

What do you mean? There's LayoutStride that can accept whatever striding you want

I should say, you just set the stride at runtime to whatever the numpy array has, but the View would be LayoutStride and Unmanaged

Ah. messages crossing in flight. Yeah this is what I mean---the alternative is a bunch of additional helper code in the python module. Does numpy always provide contiguous memory? It never does some weird array of arrays thing?

@adamdempsey90
Copy link
Copy Markdown
Collaborator

That imposes a requirement on stride, which we currently don't impose. Though perhaps one could be clever.

What do you mean? There's LayoutStride that can accept whatever striding you want

I should say, you just set the stride at runtime to whatever the numpy array has, but the View would be LayoutStride and Unmanaged

Ah. messages crossing in flight. Yeah this is what I mean---the alternative is a bunch of additional helper code in the python module. Does numpy always provide contiguous memory? It never does some weird array of arrays thing?

From some consulting with GPT/Gemini and looking at https://pybind11.readthedocs.io/en/stable/advanced/pycpp/numpy.html, I think you should be able to write a very minimal array wrapper class that basically copies the strides/offsets/whatever else you need and the pointer and then provide PORTABLE_INLINE_FUNCTION operators that do the correct indexing.

If this works, it seems worth the effort now so that we don't have to always add double loops for every new function we want the python api to have access to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working build Something to do with the build system clean-up Changes that don't affect code execution but make the code cleaner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants