
Zsgpu #14

Merged: joshkamm merged 68 commits into main from zsgpu on Jul 16, 2025

@joshkamm (Member) commented Nov 5, 2024

Paul plans to work on this branch for some time and use it as the dependency for his replacement of FancyElectrons. It is being kept separate because Paul's changes will likely not remain compatible with FancyElectrons. Once the replacement is complete, the plan is to merge this PR and pin FancyElectrons to a commit from before this PR was merged. Ultimately, Paul plans to migrate all FancyElectrons usage in the group to his replacement.

I'm creating this PR well before it's ready to merge, as a space for notes and for monitoring progress.

  • Merge main and zsgpu into a temporary branch
  • Comment out all occurrences of async
  • Create a separate branch to pin FancyElectrons to before merging this branch (add a note in the FancyElectrons README)
  • Update the GitHub Actions CI to use one of the newer examples (update pixi, the GitHub Actions script, and maybe CMake)
  • Reapply SSTRF changes
  • Update version number

For Alex and Nate: do the examples currently in the repository look representative of what would be helpful for validating the accuracy of the code?

Vaibhav's testing:

  • RI = 3 doesn't work
  • Testing RI = 4

Closes #16
Closes #17

@joshkamm joshkamm self-assigned this Nov 5, 2024
Note that some USE_ACC guards, especially those controlling the multi-GPU code, are necessary. Without them, a non-GPU build behaves quite badly, so those remain.
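As an illustration of why some guards must stay, here is a minimal sketch of a USE_ACC-guarded device selection helper. The function names `num_gpus` and `bind_device` are hypothetical and not taken from the repository; only `acc_get_num_devices`, `acc_set_device_num`, and `acc_device_not_host` are standard OpenACC runtime API. In a CPU-only build the OpenACC runtime is never called and everything falls back to a single "device".

```cpp
#include <cassert>
#ifdef USE_ACC
#include <openacc.h>
#endif

// Hypothetical helper: number of GPUs the multi-GPU code should split work over.
int num_gpus() {
#ifdef USE_ACC
  int n = acc_get_num_devices(acc_device_not_host);
  return n > 0 ? n : 1;  // fall back to one partition if no GPU is visible
#else
  return 1;  // non-GPU build: never touch the OpenACC runtime
#endif
}

// Hypothetical helper: bind a rank (or worker) to one device, round-robin.
// Returns the device index chosen; always 0 in a CPU-only build.
int bind_device(int rank) {
  assert(rank >= 0);
  const int dev = rank % num_gpus();
#ifdef USE_ACC
  acc_set_device_num(dev, acc_device_not_host);
#endif
  return dev;
}
```

Removing the `#ifdef USE_ACC` around `acc_set_device_num` would make a CPU-only build depend on an OpenACC runtime it does not link against, which is one concrete way a non-GPU compile can go wrong.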
This is a significant update that attempted to fix the multi-GPU parallel operations. It did not succeed, but the async functionality may still be worth using later on.
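For context on what "async functionality" typically means in OpenACC, the sketch below splits an array into chunks, launches each chunk on its own async queue, and waits for all queues before the data region copies results back. The function `scale_chunks_async` and its parameters are hypothetical, not the code from this branch. Without an OpenACC compiler the pragmas are ignored and the loop runs serially, producing the same result.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: scale each chunk of v on a separate OpenACC async queue.
// nqueues must be positive.
void scale_chunks_async(std::vector<double>& v, double factor, int nqueues) {
  double* data = v.data();
  const std::size_t n = v.size();
  const std::size_t chunk = (n + nqueues - 1) / nqueues;
  #pragma acc data copy(data[0:n])
  {
    for (int q = 0; q < nqueues; ++q) {
      const std::size_t lo = static_cast<std::size_t>(q) * chunk;
      const std::size_t hi = lo + chunk < n ? lo + chunk : n;
      // Each chunk is queued on async queue q and may overlap with the others.
      #pragma acc parallel loop present(data[0:n]) async(q)
      for (std::size_t i = lo; i < hi; ++i) data[i] *= factor;
    }
    // Block until every queue has finished before data is copied back.
    #pragma acc wait
  }
}
```

The explicit `wait` is the part that is easy to get wrong: if it is missing, the end of the data region may copy results back while queued kernels are still running.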

Other changes from the past month, made while evolving the integral code for jellium and similar systems, are included here too.

@paulzim46 (Contributor)

It would be nice to fix multi-GPU operation with the latest NVIDIA compilers, but I suspect we have a compiler bug on our hands. Versions 24.9 and 24.11 do not work; 20.7 is fine.

@joshkamm (Member, Author)

Hmm @paulzim46, do you think it's worth trying to isolate it well enough to give the compiler team a lead, if it is a bug?

While working on the infrastructure, I've also found nvhpc to be the most difficult dependency to manage. Its licensing seems to make it difficult for package managers to redistribute.

I've noticed that GCC also supports OpenACC, which may be easier to manage from an infrastructure standpoint, but I don't know whether it is as reliable and performant in general, or how many subtle differences there are between the two.

@joshkamm joshkamm linked an issue Jan 6, 2025 that may be closed by this pull request
Vaibhav-Chemistry and others added 25 commits May 31, 2025 16:53
Also removed non-reference output files from the lih_VK1 example
…cmake

CMake configuration for installation and include directories
Helped with installing on Perlmutter
Update the example in the README because we deleted the one previously referred to
@joshkamm joshkamm marked this pull request as ready for review July 16, 2025 21:07
@joshkamm joshkamm merged commit fe697ef into main Jul 16, 2025
1 check passed
@joshkamm joshkamm deleted the zsgpu branch July 16, 2025 21:09
@joshkamm joshkamm restored the zsgpu branch February 25, 2026 20:32
@joshkamm joshkamm deleted the zsgpu branch February 25, 2026 20:32
Development

Successfully merging this pull request may close these issues:

  • Fix multiple gpu support
  • Remove USE_ACC

3 participants