
Add EVLOSER interface#1

Draft
tamar-dewilde wants to merge 28 commits into resolve-develop from tamar/resolve-interface

Conversation

@tamar-dewilde tamar-dewilde commented Jun 18, 2026

Collaborator

Description

Add an experimental EVLOSER sparse linear solver interface to HiOp.

EVLOSER is added as a separate embedded backend next to the existing ReSolve path. ReSolve remains CUDA-only, while EVLOSER supports both CUDA and HIP and is gated by its own HIOP_USE_EVLOSER option. The existing sparse solver option path is extended so that EVLOSER can be selected with linear_solver_sparse=evloser.

The HIP path currently supports RF refactorization without iterative refinement. The CUDA path keeps the RF and iterative refinement structure.

@pelesh

Proposed changes

  • Add embedded EVLOSER backend files under src/LinAlg/EVLOSER.
  • Add CUDA and HIP GPU backend wrapper headers for EVLOSER.
  • Add hiopLinSolverSparseEVLOSER.hpp and hiopLinSolverSparseEVLOSER.cpp.
  • Register evloser as a valid sparse linear solver option.
  • Wire EVLOSER into sparse KKT solver selection and dual initialization.
  • Add HIOP_USE_EVLOSER as a separate build option, independent of HIOP_USE_RESOLVE.
  • Link EVLOSER separately from ReSolve so HIP-only builds do not require the CUDA-only ReSolve target; the existing ReSolve target and install rule remain gated on HIOP_USE_RESOLVE AND HIOP_USE_CUDA.
  • Add EVLOSER sparse driver options:
    • -evloser
    • -evloser_cuda_rf
    • -evloser_hip_rf
  • Add EVLOSER sparse driver tests:
    • EVLOSERKLUCPU
    • NlpSparse1_EVLOSER_CPU
    • NlpSparse2_EVLOSER_CPU
    • NlpSparseRaja2_3
    • NlpSparseRaja2_4
  • Build the RAJA sparse driver for CUDA and HIP EVLOSER configurations.
  • Disable iterative refinement on HIP builds since the current Krylov kernels are CUDA-only.
  • Add host-side CSR/CSC structural validation before KLU/RF setup and refactorization in the EVLOSER backend.
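
As an illustration of the host-side structural validation mentioned in the last item, here is a minimal sketch of a CSR check. The function name `csr_structure_is_valid`, its signature, and the requirement that column indices be strictly increasing within a row are assumptions made for this example; the checks actually performed in the EVLOSER backend may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not taken from the PR: returns true when the CSR index
// arrays describe a structurally valid n_rows x n_cols sparse matrix.
bool csr_structure_is_valid(int n_rows,
                            int n_cols,
                            const std::vector<int>& row_ptr,
                            const std::vector<int>& col_idx)
{
  if(n_rows < 0 || n_cols < 0) return false;

  // row_ptr must hold n_rows + 1 entries and start at zero.
  if(row_ptr.size() != static_cast<std::size_t>(n_rows) + 1) return false;
  if(row_ptr.front() != 0) return false;

  // The last row pointer must equal the number of stored nonzeros.
  const int nnz = static_cast<int>(col_idx.size());
  if(row_ptr.back() != nnz) return false;

  for(int i = 0; i < n_rows; ++i) {
    const int begin = row_ptr[i];
    const int end = row_ptr[i + 1];
    // Row pointers must be non-decreasing and stay inside col_idx.
    if(begin > end || end > nnz) return false;

    for(int k = begin; k < end; ++k) {
      // Every column index must address an existing column.
      if(col_idx[k] < 0 || col_idx[k] >= n_cols) return false;
      // Assumption of this sketch: indices are sorted with no duplicates.
      if(k > begin && col_idx[k] <= col_idx[k - 1]) return false;
    }
  }
  return true;
}
```

Running a check like this on the host before handing the arrays to KLU or the RF setup turns a malformed pattern into an early, explicit failure rather than an out-of-bounds access later on. A CSC check would be the same with the roles of rows and columns swapped.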

Additional cleanup in files touched by this branch:

  • Fix a pre-existing copy/paste bug in the sparse example drivers: use_ginkgo_cuda = false was assigned twice and use_ginkgo_hip was never initialized. The duplicate assignment now initializes use_ginkgo_hip instead. This is unrelated to EVLOSER and was found while editing the same flag-parsing code.
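
For context, the class of bug described above can be avoided by giving every flag a default at its declaration. The following is a minimal sketch, not the driver code from this branch: the `DriverFlags` struct and `parse_flags` function are hypothetical, and only the `-evloser`, `-evloser_cuda_rf`, and `-evloser_hip_rf` option strings come from this PR.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical flag container. Default member initializers mean no flag can be
// left uninitialized by a duplicated or missing assignment.
struct DriverFlags
{
  bool use_ginkgo_cuda = false;
  bool use_ginkgo_hip = false;
  bool use_evloser = false;
  bool use_evloser_cuda_rf = false;
  bool use_evloser_hip_rf = false;
};

// Hypothetical parser covering only the EVLOSER options listed in this PR.
// Arguments it does not recognize (problem size, other solver flags) are skipped.
DriverFlags parse_flags(int argc, const char* const* argv)
{
  DriverFlags flags;
  for(int i = 1; i < argc; ++i) {
    if(std::strcmp(argv[i], "-evloser") == 0) {
      flags.use_evloser = true;
    } else if(std::strcmp(argv[i], "-evloser_cuda_rf") == 0) {
      flags.use_evloser_cuda_rf = true;
    } else if(std::strcmp(argv[i], "-evloser_hip_rf") == 0) {
      flags.use_evloser_hip_rf = true;
    }
  }
  return flags;
}
```

With this layout, a flag that is never mentioned on the command line still has a well-defined value of false.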

Checklist

  • All tests pass (make test and make test_install per testing instructions). Code tested on
    • CPU backend
    • CUDA backend
    • HIP backend
  • I have manually run the non-experimental examples and verified that residuals are close to machine precision. (In your build directory run: ./examples/<your_example>.exe -h to get instructions on how to run examples). Code tested on:
    • CPU backend
    • CUDA backend
    • HIP backend
  • Code compiles cleanly with flags -Wall -Wpedantic -Wconversion -Wextra.
  • There are unit tests for the new code.
  • The new code is documented.
  • The feature branch is rebased with respect to the target branch.
  • I have updated CHANGELOG.md to reflect the changes in this PR. If this is a minor PR that is part of a larger fix already included in the file, state so.

Further comments

The HIP EVLOSER path currently disables iterative refinement because the existing iterative refinement implementation depends on CUDA-only Krylov kernels. The HIP RF path still registers, builds, and passes the sparse RAJA RF test on Frontier.
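
One way to express this kind of gating is a compile-time switch, sketched below. This is an illustration only: the helper names and the `RefinementConfig` struct are hypothetical, and using the `HIOP_USE_CUDA` preprocessor symbol as the switch is an assumption about how the build option is exposed to the sources.

```cpp
#include <cassert>

// Assumption: the CUDA build defines HIOP_USE_CUDA for the compiler.
// The Krylov kernels are CUDA-only, so refinement is available only there.
constexpr bool iterative_refinement_available()
{
#if defined(HIOP_USE_CUDA)
  return true;
#else
  return false;
#endif
}

// Hypothetical settings record for the refinement step.
struct RefinementConfig
{
  bool enabled;
  int max_iterations;
};

// Honors the request only when the backend can support it; otherwise the
// configuration falls back to refinement disabled with zero iterations.
constexpr RefinementConfig make_refinement_config(bool requested, int max_iterations)
{
  return (requested && iterative_refinement_available())
             ? RefinementConfig{true, max_iterations}
             : RefinementConfig{false, 0};
}
```

Keeping the decision in one function means a HIP build silently falls back to the RF path without refinement rather than failing to compile or link against missing kernels.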

SuiteSparse/KLU was supplied through the active Spack environment with spack -e. The HiOp Spack recipe was not changed.

The following environment-specific adjustments were used during validation and are not part of this PR:

  • On Frontier, FortranCInterface produced an empty FC_GLOBAL definition for HIP-compiled translation units that include hiop_blasdefs.hpp. A local force-include was used for validation. This should be addressed separately in the HIP build configuration.
  • For local CUDA validation, CoinHSL was rebuilt against the OpenBLAS and METIS installations from the active Spack environment.
  • The original ReSolve CUDA RF path was also built and run separately as a regression check. It completed successfully with the expected self-check result.

@tamar-dewilde tamar-dewilde changed the title from "resolve interface" to "Add EVLOSER interface" Jun 18, 2026
@tamar-dewilde tamar-dewilde force-pushed the tamar/resolve-interface branch 3 times, most recently from 248fd54 to e784fe8 on June 18, 2026 04:22
@tamar-dewilde

Collaborator Author

I ran a temporary EVLOSER HIP RF debug build before cleaning up the branch. The new driver path selected the HIP EVLOSER RF backend correctly:

[EVLOSER] entered NlpSparseRajaEx2Driver main argc=5
[EVLOSER] argv[0]=/autofs/nccs-svm1_home2/tamar/hiop_work/tmp/spack-stage/spack-stage-hiop-develop-lnqwlurftklhpua6lhjdryaxp2vlw5tk/spack-build-lnqwlur/src/Drivers/Sparse/NlpSparseRajaEx2.exe
[EVLOSER] argv[1]=500
[EVLOSER] argv[2]=-inertiafree
[EVLOSER] argv[3]=-selfcheck
[EVLOSER] argv[4]=-evloser_hip_rf
[EVLOSER] parsed flags resolve_glu=0 resolve_rf=0 evloser_cuda_rf=0 evloser_hip_rf=1 ginkgo=0 ginkgo_cuda=0 ginkgo_hip=0 inertia_free=1 self_check=1 n=500
[EVLOSER] driver selecting linear_solver_sparse=evloser use_evloser_cuda_rf=0 use_evloser_hip_rf=1
[EVLOSER] constructing HiOp EVLOSER sparse solver in /ccs/home/tamar/hiop_work/hiop-ornl-mirror/src/Optimization/hiopKKTLinSysSparse.cpp
[EVLOSER] lifecycle first_calls=1 matrix_changed=18 value_updates=17 factorizations=2 refactorizations=16 solves=17

This verified that the HIP path was being selected correctly and that the expected factorization/refactorization/solve lifecycle was reached during the debug run.
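
For anyone who wants to check such a lifecycle line mechanically, a small key=value parser is enough. The sketch below is not part of the branch and `parse_lifecycle` is a hypothetical name. In the debug run above, factorizations plus refactorizations equals matrix_changed (2 + 16 = 18); whether that holds in general for the backend is an assumption, not something this PR states.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: extracts integer key=value pairs from one log line.
// Tokens without '=' (such as the "[EVLOSER]" tag) are ignored.
std::map<std::string, long> parse_lifecycle(const std::string& line)
{
  std::map<std::string, long> counters;
  std::istringstream in(line);
  std::string token;
  while(in >> token) {
    const std::size_t eq = token.find('=');
    if(eq == std::string::npos || eq == 0 || eq + 1 == token.size()) {
      continue;
    }
    try {
      counters[token.substr(0, eq)] = std::stol(token.substr(eq + 1));
    } catch(const std::exception&) {
      // Skip values that are not integers.
    }
  }
  return counters;
}
```

A test script could then compare `factorizations + refactorizations` against `matrix_changed` and flag a run where the counters disagree.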

Note: the remaining red check appears to be in the Spack/mirror workflow rather than the source changes. I looked into it briefly, but I did not want to keep changing CI logic in this draft PR without guidance.

@pelesh pelesh changed the base branch from develop to resolve-develop June 19, 2026 14:50
@pelesh pelesh self-requested a review June 19, 2026 16:05
@tamar-dewilde tamar-dewilde force-pushed the tamar/resolve-interface branch from b874bed to 7621a05 on June 25, 2026 18:07
@tamar-dewilde tamar-dewilde force-pushed the tamar/resolve-interface branch from 7621a05 to 240786a on June 25, 2026 21:48
@tamar-dewilde

Collaborator Author

Embedded EVLOSER HIP / CUDA RF comparison

Both runs used:

NlpSparseRajaEx2 500 -inertiafree -selfcheck

| | HIP RF | CUDA RF |
|---|---|---|
| Platform | Frontier, gfx90a, ROCm 6.2.4 | Local, sm_75, CUDA 12.8 |
| Iterations | 17 | 17 |
| Final objective | 6.4322371e+01 | 6.4322371e+01 |
| Final inf_pr / inf_du | 2.830e-08 / 1.150e-11 | 1.776e-15 / 1.150e-11 |
| Self-check | Success, 6 digits | Success, 6 digits |
| Iterative refinement | Disabled | Enabled; 0 iterations used |
| Initial KLU factorization | Regularized and recovered | Regularized and recovered |
| Full CTest suite | 37/37 passed | 39/39 passed |

Both backends produced the same iteration count, objective, and six-digit self-check result. The initial KLU factorization required regularization and recovered on both backends, suggesting the behavior is not backend-specific.

The HIP inf_pr is higher (2.830e-08 versus 1.776e-15 on CUDA) but remains within the self-check tolerance.

The suite totals differ because tests are registered by configuration. All registered EVLOSER tests passed.

The existing ReSolve solver sources are unchanged. EVLOSER is gated by HIOP_USE_EVLOSER, while ReSolve remains gated by HIOP_USE_RESOLVE AND HIOP_USE_CUDA.

I also built a separate ReSolve-enabled configuration and manually ran -resolve_cuda_rf; it completed successfully and passed its self-check.
