Muphys jax mlir opt#7
Open
dganellari wants to merge 170 commits into
Re-structuring of the (experiments and grids) serialized data generation and download_extraction --------- Co-authored-by: Mikael Simberg <mikael.simberg@iki.fi> Co-authored-by: Hannes Vogt <hannes@havogt.de>
Ported advection least-squares coefficients for both sphere and torus, and implemented `setup_program` throughout advection --------- Co-authored-by: Hannes Vogt <hannes@havogt.de> Co-authored-by: Rico Haeuselmann <r.haeuselmann@gmx.ch> Co-authored-by: Jacopo Canton <jacopo.canton@gmail.com>
…TOFFSET_DSL fields with torus grids (C2SM#1045) They were using the hardcoded config values for `thslp_zdiffu` and `thhgtd_zdiffu`. These values can now be customized in the `MetricsFieldsFactory`, and have been updated in the test definitions for GAUSS3D and WEISMAN_KLEMP. The xfails are removed from `test_metrics_factory.py`. The previously failing tests in `test_compute_diffusion_metrics.py` now pass. --------- Co-authored-by: Jacopo Canton <jacopo.canton@gmail.com>
…fix' into amd_profiling
Co-authored-by: Edoardo Paone <edoardo.paone@cscs.ch> Co-authored-by: Mikael Simberg <mikael.simberg@cscs.ch>
Since tests are currently serialized anyway, I think we don't benefit from running across the full node. This uses MPS to run all four ranks on a single GPU. This is based on C2SM#819.
…C2SM#1119) Combines C2SM#1118 and C2SM#1115, because both require `v3` serialized data:
- Move computation of `vertoffset_gradp` from `vertidx_gradp` to Python (index2offset)
- Move transpose (and decomposition) of `rbf_vec_coeff_v` and `rbf_vec_coeff_e` to Python

Additional:
- Add missing halo exchange to `compute_zdiff_gradp` (test failure triggered by new serialized data, most likely unrelated to the other changes)
- Refactor `kflip_wgtfacq` to a more general `flip` on fields
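To make the index-to-offset step above concrete, here is a minimal sketch. It assumes that an offset is simply the absolute vertical index minus the level `k` at which it is stored, and that indices are zero-based; the function body and the sample data are hypothetical and not taken from the icon4py implementation.

```python
def index2offset(vertidx):
    """Convert absolute level indices into offsets relative to each entry's own level.

    Assumption: vertidx[i][k] holds a zero-based absolute vertical index,
    and the offset is defined as that index minus k.
    """
    return [[idx - k for k, idx in enumerate(column)] for column in vertidx]


# Hypothetical sample data: two columns with three levels each.
vertidx_gradp = [[0, 2, 2], [1, 1, 3]]
vertoffset_gradp = index2offset(vertidx_gradp)
print(vertoffset_gradp)
```

Storing offsets rather than absolute indices lets a stencil address the neighbor level relative to the current `k`, which is why the conversion can be done once in Python when the static fields are built.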
- set default log level in py2fgen runtime to WARNING
- add `ICON4PY_WAIT_FOR_COMPILATION` option to wait until granule inits have finished JIT compiling
- remove an unused parameter
Reduce blanket type ignores at the price of adding a handful of specific ones. --------- Co-authored-by: Hannes Vogt <vogt@hey.com>
Make usage of `setup_graupel()` consistent across multiple places.
Adds gtfn_gpu backend to the distributed CI pipeline. dace_gpu is still left out because compilation takes too long. The base image is upgraded because it's possible, but not strictly necessary. The CPU-only version of the pipeline needed 25.04 (24.04 and 25.10 did not work for various reasons). However, since OpenMPI and libfabric are now built manually in the container, the base image version is less of a constraint. 24.04 doesn't have matching GCC/CUDA versions and 26.04 doesn't exist yet, but the pipeline should eventually use 26.04. OpenMPI and libfabric are built manually for slingshot support because getting the Ubuntu repository packages to work with GPU support did not seem possible/easy. The installation is based on https://github.com/eth-cscs/cray-network-stack. GHEX needs an upgrade, because there's a bug in how strides are calculated for GPU buffers. @philip-paul-mueller has already fixed this in ghex-org/GHEX#190 but we should wait for that to be merged (and probably test in icon-exclaim first). This also fixes a few cupy/numpy incompatibilities. `revert_repeated_index_to_invalid` was updated to only deal with numpy for now as the connectivities are always numpy arrays. `test_halo_exchange_for_sparse_field` is marked `embedded_only`. The non-MPI test was already marked embedded-only. This does not try to unify the default and distributed CI pipeline definitions. That should, however, be done sooner or later as well. --------- Co-authored-by: Jacopo Canton <jacopo.canton@gmail.com> Co-authored-by: Nicoletta Farabullini <41536517+nfarabullini@users.noreply.github.com>
Plan is to tag this version and then branch to make further v0.1.x releases from a branch with selected commits. Greenline work will continue in main, blueline will stay on `v0.1.x` until we feel comfortable to get to the next version with all changes from main.
[PR#980](C2SM#980) introduced streams into the halo exchanges. It also added `DEFAULT_STREAM`, which models the default stream and implements the [CUDA Stream Protocol](https://nvidia.github.io/cuda-python/cuda-core/latest/interoperability.html#cuda-stream-protocol). However, the original implementation identified itself as protocol version `1` instead of version `0`. A related bug in [GHEX](ghex-org/GHEX#202) hid this error. This PR fixes the Python implementation and also updates GHEX.
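For readers unfamiliar with the protocol, the sketch below shows what a conforming default-stream object could look like. In the CUDA Stream Protocol, `__cuda_stream__` returns a pair of the protocol version and the stream handle as an integer. The class name `DefaultStream` is a hypothetical stand-in and is not the icon4py class; using handle `0` for the default (NULL) stream is likewise an assumption made for illustration.

```python
class DefaultStream:
    """Hypothetical stand-in for a default-stream object (not the icon4py class)."""

    def __cuda_stream__(self) -> tuple[int, int]:
        # First element: the protocol version, which must be 0.
        # Second element: the stream handle as an integer address;
        # 0 is assumed here to denote the CUDA default (NULL) stream.
        return (0, 0)


DEFAULT_STREAM = DefaultStream()
version, handle = DEFAULT_STREAM.__cuda_stream__()
print(version, handle)
```

A consumer that checks the version field strictly would reject an object reporting `1`, which is the kind of mismatch the fix above addresses.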
The orchestration is neither used nor tested. Moreover, `orchestration.decorator` imports `mpi.MPI`, which calls `MPI_Init` (e.g. when generating bindings with py2fgen).
- delete tools/common (only py2fgen was left there, and it has its own `setup_logger`)
- default `setup_logger` level is WARNING
…1171) `test_diffusion.f90` and `test_dycore.f90` in `tools/tests/tools/py2fgen/fortran_samples/` are unused: they are only referenced by permanently-skipped tests that require connectivity data never passed from Fortran.
- **Deleted files:** `test_diffusion.f90` (384 lines), `test_dycore.f90` (851 lines)
- **Removed 4 skipped tests** from `test_cli.py`: `test_py2fgen_compilation_and_execution_{diffusion,diffusion_gpu,dycore,dycore_gpu}`
- **Kept:** `test_square.f90` and all active tests that use it

--------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: jcanton <5622559+jcanton@users.noreply.github.com>
The `graupel` SDFG looks like the following:

<img width="1889" height="1051" alt="image" src="https://github.com/user-attachments/assets/2c88af89-1b2d-40f5-9928-aa6e6698449b" />

In both maps there are outputs whose values are determined by if-statements that check whether one or more masks are active. If they are not, the outputs are set to the inputs without any change. Since we know that the inputs and outputs are the same pointers, we can improve this pattern by removing the copies in the false branches of the if-statements and replacing the intermediate temporary `AccessNode`s with the global `AccessNode`s that are used as outputs of the program. To be more specific, the `AccessNode`s where this is applied are:
- `q_in_2` -> `q_out_2`
- `q_in_3` -> `q_out_3`
- `q_in_4` -> `q_out_4`
- `q_in_5` -> `q_out_5`
- `te` -> `t_out`

This is the updated SDFG:

<img width="1766" height="1136" alt="image" src="https://github.com/user-attachments/assets/3827fe87-10f0-4c33-98e3-12e78d9bbfed" />

--------- Co-authored-by: Edoardo Paone <edoardo.paone@cscs.ch> Co-authored-by: Hannes Vogt <hannes.vogt@cscs.ch> Co-authored-by: Philip Mueller, CSCS <philip.mueller@cscs.ch>
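The idea behind the `graupel` change can be illustrated outside DaCe with plain Python. This is only a sketch of the pattern, not the SDFG transformation itself: the function names and sample data are hypothetical, and a list stands in for a buffer that is both input and output.

```python
def masked_update_with_copy(q, mask, compute):
    """Original pattern: write every element to a temporary, then copy back."""
    tmp = [0.0] * len(q)
    for i, active in enumerate(mask):
        if active:
            tmp[i] = compute(q[i])
        else:
            tmp[i] = q[i]  # redundant copy when input and output alias
    q[:] = tmp
    return q


def masked_update_in_place(q, mask, compute):
    """Improved pattern: write only where the mask is active, no temporary."""
    for i, active in enumerate(mask):
        if active:
            q[i] = compute(q[i])
    return q


mask = [True, False, True]
with_copy = masked_update_with_copy([1.0, 2.0, 3.0], mask, lambda x: 2.0 * x)
in_place = masked_update_in_place([1.0, 2.0, 3.0], mask, lambda x: 2.0 * x)
print(with_copy, in_place)
```

Both variants produce the same values, but the in-place one skips the temporary and the writes for inactive elements. This is only valid because the input and output are known to be the same memory.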
rayleigh_coeff divdamp_trans_start divdamp_trans_end and also remove nudging_decay_rate in DiffusionConfig
Co-authored-by: Edoardo Paone <edoardo.paone@cscs.ch>
- Removed duplicate `timeloop_diffusion_savepoint_exit` (driver) and `timeloop_diffusion_savepoint_exit_standalone` (standalone_driver) fixtures that were identical to the shared `savepoint_diffusion_exit` in `datatest.py`
- Added a small `linit` fixture alias in both driver and standalone_driver to bridge the parametrized `timeloop_diffusion_linit_exit` name to the `linit` name expected by the shared fixture
…re solver Profiled `vertically_implicit_solver_at_predictor_step` on MI300A (Beverin, gfx942). Individual kernels achieve 93% of HBM peak bandwidth. Enable `fuse_tasklets` for the solver stencil, giving ~7% improvement (0.82 ms -> 0.76 ms). Added per-kernel roofline script, C2E scatter analysis, and HIP/CUDA bandwidth benchmarks for cross-platform comparison. See `amd_scripts/PROFILING_RESULTS.md` for detailed findings.
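As a quick check of the quoted figure, the relative improvement can be recomputed from the two timings reported above. The variable names are illustrative only.

```python
before_ms = 0.82  # solver stencil runtime before enabling fuse_tasklets
after_ms = 0.76   # runtime after enabling fuse_tasklets

# Relative reduction in runtime, and the equivalent speedup factor.
improvement = (before_ms - after_ms) / before_ms
speedup = before_ms / after_ms

print(f"runtime reduction: {improvement:.1%}")
print(f"speedup factor: {speedup:.3f}")
```

The runtime reduction comes out at roughly 7.3%, consistent with the "~7%" stated in the commit message.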
`single_node_default` is ambiguous—it sits next to `single_node_reductions` in `definitions.py` but doesn't convey that it's an exchange runtime. Renamed to `single_node_exchange` to match its type (`SingleNodeExchange`) and mirror the naming of its sibling. Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: jcanton <5622559+jcanton@users.noreply.github.com>
The actual Fortran bindings of diffusion and dycore were part of the tools/py2fgen package. However, py2fgen is actually a standalone tool. We introduce a new package `icon4py.bindings`, which depends on py2fgen and the atmosphere packages that it generates bindings for. Longer term, it might be better to make the bindings part of their respective packages as optionals.
Amd profiling
Mandatory Tests Please make sure you run these tests via comment before you merge!
Optional Tests To run benchmarks you can use:
To run tests and benchmarks with the DaCe backend you can use:
To run test levels ignored by the default test suite (mostly simple datatest for static fields computations) you can use:
For more detailed information please look at CI in the EXCLAIM universe.
No description provided.