[Stagging] Aurora tweaks 2 by tdavidcl · Pull Request #1713 · Shamrock-code/Shamrock

tdavidcl · 2026-03-15T23:21:59Z

No description provided.

github-actions · 2026-03-15T23:22:07Z

Thanks @tdavidcl for opening this PR!

You can do multiple things directly here:
1 - Comment `pre-commit.ci run` to run pre-commit checks.
2 - Comment `pre-commit.ci autofix` to apply fixes.
3 - Add the label `autofix.ci` to fix authorship & pre-commit for every commit made.
4 - Add the label `light-ci` to trigger only a reduced, faster version of the CI (the full one is required before merge).
5 - Add the label `trigger-ci` to create an empty commit that triggers the CI.

Once the workflow completes, a message will appear displaying information related to the run.

The PR is also automatically reviewed by Gemini. You can:
1 - Comment `/gemini review` to trigger a review
2 - Comment `/gemini summary` for a summary
3 - Tag `@gemini-code-assist` either in the PR or in review comments on files

gemini-code-assist · 2026-03-15T23:22:24Z

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces significant improvements to how system performance metrics, especially energy consumption and wall time, are collected, processed, and reported. It centralizes the logic for gathering and aggregating these metrics across MPI ranks, making the system more robust and easier to extend. Additionally, it enhances the output of a key benchmark to provide more comprehensive performance data and updates the Aurora build configuration for better debugging.

Highlights

System Metrics Refactoring: Refactored the system metrics collection, aggregation, and reporting logic into dedicated utility functions within the shamsys namespace. This centralizes the handling of energy consumption and wall time data across ranks.
Enhanced Benchmark Output: Updated the SPH weak scale benchmark script to output detailed system metrics, including calculated power consumption, in a structured dictionary format for easier analysis.
Aurora Configuration Updates: Modified the Aurora environment setup to enable debug symbols (-g) and CPPTRACE in CMake flags, improving crash reporting and debugging capabilities on the Aurora system.
Accurate Timing for System Metrics: Adjusted the order of system metrics and MPI timer calls in the SPH solver to ensure precise measurements, particularly by accounting for MPI barriers.
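A minimal sketch of why adding a wall time to the metrics snapshot helps: subtracting two snapshots then yields both the energy consumed and the elapsed time over the same window, so average power can be computed as ΔE / Δt without relying on a separately measured step time. The field names (`cpu_energy_j`, `gpu_energy_j`, `wall_time_s`) and the helper `average_power_w` are hypothetical; the struct below is a stand-in, not the actual `shamsys::SystemMetrics` definition.

```cpp
#include <cassert>

// Hypothetical stand-in for shamsys::SystemMetrics; real field names may differ.
struct SystemMetrics {
    double cpu_energy_j = 0.0; // accumulated CPU energy counter (joules)
    double gpu_energy_j = 0.0; // accumulated GPU energy counter (joules)
    double wall_time_s  = 0.0; // wall-clock timestamp of the snapshot (seconds)
};

// Difference of two snapshots: energy consumed and time elapsed between them.
SystemMetrics operator-(const SystemMetrics &end, const SystemMetrics &start) {
    return {end.cpu_energy_j - start.cpu_energy_j,
            end.gpu_energy_j - start.gpu_energy_j,
            end.wall_time_s - start.wall_time_s};
}

// Average power over the window, using the window's own duration.
double average_power_w(const SystemMetrics &delta) {
    assert(delta.wall_time_s > 0.0 && "empty measurement window");
    return (delta.cpu_energy_j + delta.gpu_energy_j) / delta.wall_time_s;
}
```

For example, snapshots `{100, 400, 10}` and `{150, 700, 12}` give a delta of 350 J over 2 s, i.e. an average of 175 W. Because energy and duration come from the same pair of snapshots, the power value is not skewed by any gap between the energy readout and a separately timed step.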

Changelog

env/machine/argonne/aurora/env_oneapi.sh
- Added a comment explaining the purpose of the -g flag for debug symbols on Aurora.
- Included the -g flag in CMAKE_CXX_FLAGS for debug symbol generation.
- Enabled SHAMROCK_USE_CPPTRACE for improved stack tracing.
examples/benchmarks/sph_weak_scale_test.py
- Introduced a dic_out dictionary to store and output system metrics.
- Modified average power calculation to use metrics_duration instead of step_time.
- Added system metrics and power values to the dic_out dictionary.
src/shammodels/common/src/timestep_report.cpp
- Replaced manual power gathering with shamsys::gather_rank_metrics.
- Replaced manual power aggregation and formatting with shamsys::aggregate_rank_metrics and shamsys::format_system_metrics.
- Updated table population to use the new formatted system metrics structures.
src/shammodels/sph/src/Solver.cpp
- Reordered system_metrics_start initialization to occur before other timers, with a comment explaining the need for barrier synchronization.
- Adjusted the timing of delta_mpi_timer calculation relative to mem_perf_infos_end and system_metrics_end.
src/shammodels/sph/src/SolverLog.cpp
- Replaced manual gathering and aggregation of system metrics with calls to shamsys::gather_rank_metrics and shamsys::aggregate_rank_metrics.
src/shamsys/include/shamsys/system_metrics.hpp
- Added wall_time member to the SystemMetrics struct.
- Modified get_system_metrics signature to accept an optional barrier parameter.
- Declared new functions: gather_rank_metrics, aggregate_rank_metrics, and format_system_metrics.
- Defined FormattedSystemMetrics struct for string-formatted metric output.
- Updated operator- for SystemMetrics to include wall_time subtraction.
src/shamsys/src/system_metrics.cpp
- Included new headers: shambase/stacktrace.hpp, shamalgs/collective/reduction.hpp, shamcomm/wrapper.hpp.
- Implemented get_system_metrics to record wall_time and optionally use MPI barriers.
- Implemented gather_rank_metrics to collect SystemMetrics from all MPI ranks.
- Implemented aggregate_rank_metrics to sum energy metrics and find the maximum wall time across ranks.
- Implemented format_system_metrics to convert SystemMetrics into human-readable strings, including power calculations.
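The cross-rank aggregation described above (energies summed, wall time reduced with a maximum, power derived when formatting) can be sketched as follows. To keep it runnable without MPI, a pre-filled `std::vector` stands in for the values that `shamsys::gather_rank_metrics` would collect. `RankMetrics`, `aggregate_metrics`, `FormattedMetrics` and `format_metrics` are hypothetical names, and the real units and string layout of `format_system_metrics` may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical per-rank metrics delta: energy over a window and its duration.
struct RankMetrics {
    double energy_j    = 0.0; // energy consumed by this rank during the window
    double wall_time_s = 0.0; // duration of the window as seen by this rank
};

// Combine values gathered from all ranks: energies add up, while the global
// duration is the slowest rank's wall time.
RankMetrics aggregate_metrics(const std::vector<RankMetrics> &per_rank) {
    assert(!per_rank.empty());
    RankMetrics total{0.0, 0.0};
    for (const RankMetrics &m : per_rank) {
        total.energy_j += m.energy_j;
        total.wall_time_s = std::max(total.wall_time_s, m.wall_time_s);
    }
    return total;
}

// Human-readable strings; power is derived from the aggregated values.
struct FormattedMetrics {
    std::string energy;
    std::string power;
};

FormattedMetrics format_metrics(const RankMetrics &total) {
    assert(total.wall_time_s > 0.0);
    std::ostringstream e, p;
    e << std::fixed << std::setprecision(1) << total.energy_j << " J";
    p << std::fixed << std::setprecision(1)
      << total.energy_j / total.wall_time_s << " W";
    return {e.str(), p.str()};
}
```

Taking the maximum wall time means the reported power is based on the slowest rank's duration, which is how long the step actually took for the whole job. For three ranks with 120 J, 100 J and 80 J over 1.5 s, 2.0 s and 1.8 s, this yields `300.0 J` and `150.0 W`.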

gemini-code-assist

Code Review

This pull request introduces several tweaks for the Aurora machine, focusing on improving metrics reporting and performance analysis. The changes include enabling debug symbols for better crash analysis, refactoring system metrics collection into a centralized module to reduce code duplication, and adding wall_time to metrics for more accurate power calculations. My review identifies a correctness bug in the new metrics formatting logic where energy units were incorrect, and also points out opportunities to further reduce code duplication in both C++ and Python code to improve maintainability.

src/shamsys/src/system_metrics.cpp

examples/benchmarks/sph_weak_scale_test.py

github-actions · 2026-03-22T23:22:11Z

Workflow report

workflow report corresponding to commit 9079339
Committer email is timothee.davidcleris@proton.me

Light CI is enabled. This only runs the basic tests, not the full test suite.
Merging a PR requires the job "on PR / all" to pass, which is disabled in this case.

Pre-commit check report

Pre-commit check: ✅

trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check for merge conflicts................................................Passed
check that executables have shebangs.....................................Passed
check that scripts with shebangs are executable..........................Passed
check for added large files..............................................Passed
check for case conflicts.................................................Passed
check for broken symlinks................................................Passed
check yaml...............................................................Passed
detect private key.......................................................Passed
No-tabs checker..........................................................Passed
Tabs remover.............................................................Passed
Validate GitHub Workflows................................................Passed
clang-format.............................................................Passed
ruff check...............................................................Passed
ruff format..............................................................Passed
Check doxygen headers....................................................Passed
Check license headers....................................................Passed
Check #pragma once.......................................................Passed
Check SYCL #include......................................................Passed
No ssh in git submodules remote..........................................Passed
No UTF-8 in files (except for authors)...................................Passed

Test pipeline can run.

Doxygen diff with `main`

Removed warnings : 9
New warnings : 9
Warnings count : 8366 → 8366 (0.0%)

Detailed changes :

- src/shamalgs/src/collective/sparse_exchange.cpp:113: warning: Member build_sparse_exchange_table(const std::vector< CommMessageInfo > &messages_send, size_t max_alloc_size) (function) of namespace shamalgs::collective is not documented.
- src/shamalgs/src/collective/sparse_exchange.cpp:113: warning: Member build_sparse_exchange_table(const std::vector< CommMessageInfo > &messages_send, size_t max_alloc_size) (function) of namespace shamalgs::collective is not documented.
+ src/shamalgs/src/collective/sparse_exchange.cpp:116: warning: Member build_sparse_exchange_table(const std::vector< CommMessageInfo > &messages_send, size_t max_alloc_size) (function) of namespace shamalgs::collective is not documented.
+ src/shamalgs/src/collective/sparse_exchange.cpp:116: warning: Member build_sparse_exchange_table(const std::vector< CommMessageInfo > &messages_send, size_t max_alloc_size) (function) of namespace shamalgs::collective is not documented.
- src/shamalgs/src/collective/sparse_exchange.cpp:247: warning: Member sparse_exchange(std::shared_ptr< sham::DeviceScheduler > dev_sched, const std::vector< const u8 * > &bytebuffer_send, const std::vector< u8 * > &bytebuffer_recv, const CommTable &comm_table) (function) of namespace shamalgs::collective is not documented.
+ src/shamalgs/src/collective/sparse_exchange.cpp:250: warning: Member sparse_exchange(std::shared_ptr< sham::DeviceScheduler > dev_sched, const std::vector< const u8 * > &bytebuffer_send, const std::vector< u8 * > &bytebuffer_recv, const CommTable &comm_table) (function) of namespace shamalgs::collective is not documented.
- src/shamalgs/src/collective/sparse_exchange.cpp:296: warning: Member sparse_exchange(std::shared_ptr< sham::DeviceScheduler > dev_sched, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, target > > > &bytebuffer_send, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, target > > > &bytebuffer_recv, const CommTable &comm_table) (function) of namespace shamalgs::collective is not documented.
- src/shamalgs/src/collective/sparse_exchange.cpp:296: warning: Member sparse_exchange(std::shared_ptr< sham::DeviceScheduler > dev_sched, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, target > > > &bytebuffer_send, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, target > > > &bytebuffer_recv, const CommTable &comm_table) (function) of namespace shamalgs::collective is not documented.
+ src/shamalgs/src/collective/sparse_exchange.cpp:299: warning: Member sparse_exchange(std::shared_ptr< sham::DeviceScheduler > dev_sched, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, target > > > &bytebuffer_send, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, target > > > &bytebuffer_recv, const CommTable &comm_table) (function) of namespace shamalgs::collective is not documented.
+ src/shamalgs/src/collective/sparse_exchange.cpp:299: warning: Member sparse_exchange(std::shared_ptr< sham::DeviceScheduler > dev_sched, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, target > > > &bytebuffer_send, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, target > > > &bytebuffer_recv, const CommTable &comm_table) (function) of namespace shamalgs::collective is not documented.
- src/shamalgs/src/collective/sparse_exchange.cpp:384: warning: Member sparse_exchange< sham::device >(std::shared_ptr< sham::DeviceScheduler > dev_sched, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, sham::device > > > &bytebuffer_send, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, sham::device > > > &bytebuffer_recv, const CommTable &comm_table) (function) of namespace shamalgs::collective is not documented.
+ src/shamalgs/src/collective/sparse_exchange.cpp:387: warning: Member sparse_exchange< sham::device >(std::shared_ptr< sham::DeviceScheduler > dev_sched, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, sham::device > > > &bytebuffer_send, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, sham::device > > > &bytebuffer_recv, const CommTable &comm_table) (function) of namespace shamalgs::collective is not documented.
- src/shamalgs/src/collective/sparse_exchange.cpp:390: warning: Member sparse_exchange< sham::host >(std::shared_ptr< sham::DeviceScheduler > dev_sched, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, sham::host > > > &bytebuffer_send, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, sham::host > > > &bytebuffer_recv, const CommTable &comm_table) (function) of namespace shamalgs::collective is not documented.
+ src/shamalgs/src/collective/sparse_exchange.cpp:393: warning: Member sparse_exchange< sham::host >(std::shared_ptr< sham::DeviceScheduler > dev_sched, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, sham::host > > > &bytebuffer_send, std::vector< std::unique_ptr< sham::DeviceBuffer< u8, sham::host > > > &bytebuffer_recv, const CommTable &comm_table) (function) of namespace shamalgs::collective is not documented.
- src/shamrock/include/shamrock/scheduler/SerialPatchTree.hpp:312: warning: Member dump_dat() (function) of class SerialPatchTree is not documented.
+ src/shamrock/include/shamrock/scheduler/SerialPatchTree.hpp:317: warning: Member dump_dat() (function) of class SerialPatchTree is not documented.
- src/shamrock/include/shamrock/scheduler/SerialPatchTree.hpp:335: warning: Member compute_patch_owner(sham::DeviceScheduler_ptr dev_sched, sham::DeviceBuffer< fp_prec_vec > &position_buffer, u32 len) (function) of class SerialPatchTree is not documented.
+ src/shamrock/include/shamrock/scheduler/SerialPatchTree.hpp:340: warning: Member compute_patch_owner(sham::DeviceScheduler_ptr dev_sched, sham::DeviceBuffer< fp_prec_vec > &position_buffer, u32 len) (function) of class SerialPatchTree is not documented.

tdavidcl added 6 commits March 15, 2026 13:08

cleanup

5d07c43

cleanup

de797a6

better ?

b161128

better reporting ?

58b4695

better reporting ?

ff5a902

streamline that shit

e0cd91d

tdavidcl added the draft and light-ci labels on Mar 15, 2026

gemini-code-assist bot reviewed Mar 15, 2026

src/shamsys/src/system_metrics.cpp

examples/benchmarks/sph_weak_scale_test.py

tdavidcl added 19 commits March 16, 2026 00:49

whoopsi

6bc2845

correct max mem usage

ee3632a

try to permute time mesure and power mesure

f815d5f

attempt

cdca66a

attempt

d811e5f

add duration

775d26a

Merge branch 'main' into aurora-test4

3944ef1

add support for linked geopm

1d53302

enable geopm on aurora

9011b2a

shut up warnings

9dc904d

Merge branch 'main' into aurora-test4

bd98637

Merge branch 'main' into aurora-test4

9534a22

better ?

dcd76b0

more debug infos

2ebb27a

dammit

8f7087b

dammit

36b2516

more steps

5e2d837

Merge branch 'main' into aurora-test4

8d69537

Merge branch 'main' into aurora-test4

0f0d14c

tdavidcl added 3 commits March 22, 2026 00:02

Merge branch 'main' into aurora-test4

c4217cd

better ?

e698051

more omp

9079339

tdavidcl wants to merge 28 commits into Shamrock-code:main from tdavidcl:aurora-test4