Releases: magnetotellurics/ModEM
2023-11-19-rev755 - New default settings for SP/SP2 Solvers (BICG)
Note: This is a release on the classic branch (SVN stable branch). Although this revision comes after the 2022-07-07 release on the main branch (rev630–rev703), it does not include the GPU solvers.
This version is based on the latest stable code but includes IMPORTANT technical changes
relative to stable -rev 710, namely:
FWD,FWD_SP,FWD_SP2/EMsolve3D.f90 - NEW DEFAULTS; unless the solver is specified in the
forward solver configuration file, BICG is now the default for sparse-matrix formulations
SP & SP2. QMR is still the default for the traditional matrix-free formulation.
MPI/Main_MPI.f90 - incorporated all the deallocate statements from Naser's version
of the code to, hopefully, eliminate the memory leaks; also cleaned up the formatting
in some obvious places. Note that this version does not include the initialization
of grid elements l_E and S_F. This is by design - this is now correctly done in
modelOperator3D.f90 for all flavors of the code (FWD, FWD_SP, FWD_SP2, FWD_SPETSc2).
MPI/Declaration_MPI.f90 - added subroutine gather_runtime and Naser's solver_name,
period and solver_residual_vec variables.
3D_MT/Sub_MPI.f90 - added rFile_Model1D variable to allow for a secondary-field formulation
in the near future (this version does not yet fully implement it).
FWD,FWD_SP,FWD_SP2,FWD_SPETSc2/modelOperator3D.f90 - added initialization and deallocation
of metric elements (rvectors & cvectors in addition to arrays used previously by SP*).
These are used repeatedly in EMfieldInterp.f90 and ModelMap.f90 by all variants, and for other
purposes in FWD as well. Note that this duplicates the storage of 5 metric element arrays.
Currently unavoidable (to support spherical coords) unless the code is rewritten.
Previous versions used these arrays as well, but they were allocated in the wrong places.
SP_Topology/MetricElements*.f90 - added Vcell variable to support modelOperator3D.f90
metric element initialization (needed for ModelMap.f90 in SP* variants of the code).
FWD/modelParam/modelMap.inc - removed metric elements (V_C and V_E) initialization, now
correctly done in modelOperator3D.f90; incorporated Gary's changes that remove the ghost
nodes, which I no longer remember anything about and which messed up Jmult for Gary.
UTILS/elements.f90 - added the logic for boundary elements and, importantly, replaced
integral approximations with finite differences for the area and volume elements.
This has NOT been verified to be an improvement and needs careful testing.
This module is not used by the traditional 3D MT but it will be needed for the spherical coords
modeling, which is not yet part of this distribution since it requires additional development
before it can be released to the public.
--
Technical notes:
This version at present does NOT pass the symmetry tests, specifically Jmult. Matrix-free
is broken relative to how well it passed the test on 2022-11-17, and SP2 is substantially
broken just like it used to be. All versions compile and run, tested both in serial and
in parallel. The additional storage of length, area and volume elements adds just under 7%
to the memory usage of matrix-free and 10% to the sparse-matrix formulations, based on a
typical realistic parallel forward modeling example, because they are currently stored both
in matrix form and as vectors, for different purposes - this can be improved upon with a code rewrite.
A much more substantial memory reduction has been achieved by removing some of the memory
leaks from the MPI code, based on Naser Meqbel's and Paulo Werdt's analysis.
The SP2 BICG formulation is by far the most efficient at present. For a realistic MPI example
with an optimized Intel-compiled code, forward modeling completes in 14.46 min with SP2 BiCG
compared to 1 h 49 min with matrix-free QMR. However, the matrix-free code requires approximately
2.5 times less operational memory than SP2.
SP2 QMR runs 3.5-10 times SLOWER than SP2 BICG.
The matrix-free QMR works while the matrix-free BICG never converges and is truly broken.
All these observations equally apply to stable -rev 710, which this is based upon, so I'm
committing to the SVN with the intention of figuring out what has broken the symmetry
in a separate release sometime later - not the first priority right now.
Both the current stable version and this version pass the symmetry test perfectly using
the old (broken!) model and air layers in unit_testing. It will take
some time to understand why the new, corrected model and air layers - deeper lower boundary
and better resolution in the air - do not pass the Jmult symmetry test as perfectly.
After wasting a couple of days on this again, I will postpone it for now.
And of course, the latest fix introduced a stupid MPI bug, which is no longer present as of this commit. Forward modeling has been tested to run efficiently in parallel (FWD and SP2).
2022-07-07-rev703 - Corresponds to publication Dong et al., 2024
Note: This is a release on the main branch (not the SVN stable branch). It is not intended to be a robust release. Even though release 2023-11-19-rev755 comes after this release, it was based on the stable branch and does not include these changes.
Introducing GPU solvers. Please refer to https://github.com/dong-hao/ModEM-GPU for bug updates based on this revision. Please follow the main branch for further development of this functionality.
Dong, H., Sun, K., Egbert, G. D., Kelbert, A., & Meqbel, N. (2024). Hybrid CPU-GPU solution to regularized divergence-free curl-curl equations for electromagnetic inversion problems. Computers and Geosciences, 184, 105518, Elsevier. https://doi.org/10.1016/j.cageo.2024.105518
-rev630 (89c3215):
introducing the new GPU solvers -
this (hopefully) will not affect the behavior of the CPU solvers. For now I have only
implemented the GPU solver in the SP2 version - which is the most efficient CPU
version so far.
Depending on your hardware, the GPU vs CPU speed-up can be anywhere from a
few times (for old/weak GPUs) to tens of times (for professional ones).
The code has been tested on various computers, from my old laptop (bought in
2016) to a brand-new multi-GPU workstation, with GCC+gfortran 7/9 and CUDA 10/11.
Also note that this apparently only works for NVIDIA cards, as the
implementation is through CUDA. Other "universal" interfaces like Kokkos or
OpenCL may be preferred for other GPUs - however, those interfaces do not
yet provide the ability to implement kernel-level improvements.
Those who want to use the code are welcome to try - although one should bear
in mind that the GPU libs and CUDA are rather user-hostile to set up. See the
Makefile.gpu file for a basic example of how to compile the code. You have
been warned...
3D_MT/FWD_SP2/solver.f90:
see the cuBiCG/cuBiCGmix subroutines for the details of the new solvers.
The GPU version of the BiCG solver should yield results consistent with the CPU
BiCG. The default GPU solver is now the double-precision version.
3D_MT/FWD_SP2/EMsolve3D.f90:
added a method to enable using multiple CPUs with one GPU - to mitigate the
impact of the overheads from the CPU side. This is important for an efficient
hybrid CPU-GPU parallel infrastructure.
3D_MT/FWD_SP2/cudaFortMap.f90:
added this file for the translation of CUDA-C interfaces to Fortran. As we can
only write kernels in C, this is kind of inevitable.
3D_MT/FWD_SP2/kernel_c.cu:
added all custom CUDA-C kernel codes here. I never imagined myself writing C
again after all these years, hmmmmm.
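For readers unfamiliar with the pattern, here is a minimal sketch of the kind of Fortran-to-CUDA-C bridging that cudaFortMap.f90 performs; the wrapper name and arguments are hypothetical illustrations, not the actual ModEM interfaces:

! Sketch only: expose a C/CUDA wrapper to Fortran via ISO_C_BINDING
module cuda_bridge_sketch
  use, intrinsic :: iso_c_binding
  implicit none
  interface
    ! corresponds to a hypothetical C wrapper compiled by NVCC, e.g.
    ! extern "C" void axpy_gpu(int n, double a, const double *x, double *y);
    subroutine axpy_gpu(n, a, x, y) bind(C, name="axpy_gpu")
      import :: c_int, c_double
      integer(c_int), value         :: n
      real(c_double), value         :: a
      real(c_double), intent(in)    :: x(*)
      real(c_double), intent(inout) :: y(*)
    end subroutine axpy_gpu
  end interface
end module cuda_bridge_sketch

The actual kernels and their launch wrappers live in kernel_c.cu; the Fortran side only ever sees bind(C) interfaces of this form.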
-rev648 (b39f69c):
INV/NLCG.f90 INV/LBFGS.f90
A clean-up of the debug and stub code from my experiments with the
linesearch schemes and the LBFGS solver. Also removed LBFGSsolver2 for inversions in smooth model space.
Note that the default linesearch routine is "cubic" for NLCG and "wolfe2" for
LBFGS; wolfe2 uses only one forward and one adjoint calculation per
iteration, so it should be about one third cheaper per iteration than the "wolfe" one (2 forward and 1 transpose calculations).
One problem with wolfe2 is that the inexact line search (insufficient descent)
may cause the iteration to finish early (or lambda to be updated early).
If you are interested, you can also try the PR+ restart criterion
for NLCG (comment out the two PR lines and uncomment the PR+ ones).
From my personal experience, it is slightly faster than the
default "PR" criterion for NLCG; the two formulas are sketched after this revision's notes.
Mod3DMT.f90/Mod2DMT.f90
updated the corresponding interfaces (from LBFGSsolver2 to LBFGSsolver)
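For reference, a minimal sketch of the PR and PR+ restart criteria mentioned above, assuming the gradients are available as flat arrays (the module and variable names are hypothetical, not ModEM's):

module cg_beta_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Polak-Ribiere (PR) coefficient for the new NLCG search direction
  function beta_pr(g_new, g_old) result(beta)
    real(dp), intent(in) :: g_new(:), g_old(:)
    real(dp) :: beta
    beta = dot_product(g_new, g_new - g_old) / dot_product(g_old, g_old)
  end function beta_pr
  ! PR+ variant: truncation at zero acts as an automatic restart
  ! (the direction falls back to steepest descent whenever beta would be negative)
  function beta_pr_plus(g_new, g_old) result(beta)
    real(dp), intent(in) :: g_new(:), g_old(:)
    real(dp) :: beta
    beta = max(0.0_dp, beta_pr(g_new, g_old))
  end function beta_pr_plus
end module cg_beta_sketch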
-rev675 (d127dae):
A couple of important updates on the two-layered parallel structure (as discussed in the GPU paper).
MPI/Declaration_MPI.f90, Main_MPI.f90:
The most important feature is a topology-based parallel paradigm. We can now determine the number of GPU devices on a physical node, which makes it easier to use multiple GPUs across multiple machines.
The other feature is a node-based parallel method, controlled by a master switch (para_method) in Declaration_MPI.f90. The basic idea is to use the bandwidth of one entire server node to, well, accelerate one single FWD/ADJ task. It allows one to utilize tens of computational servers at once (with the PETSc routines, of course). That's probably the best way to get maximum throughput if one doesn't have any GPUs at hand.
That can, of course, single-handedly dry out your CPU-hour deposit - but I guess it is worth it if you can get an aggregate bandwidth of tens of terabytes per second and get the job done faster. Note that this is turned off by default (set para_method = 1 and re-compile to try it).
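As a generic illustration of node-aware topology detection (not the actual ModEM implementation; the GPU mapping at the end is only a comment), ranks can be grouped by physical node with a shared-memory communicator split:

program node_topology_sketch
  use mpi
  implicit none
  integer :: ierr, world_rank, node_comm, node_rank, node_size
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, world_rank, ierr)
  ! group together the ranks that share a physical node
  call MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, world_rank, &
                           MPI_INFO_NULL, node_comm, ierr)
  call MPI_Comm_rank(node_comm, node_rank, ierr)
  call MPI_Comm_size(node_comm, node_size, ierr)
  ! e.g. each node-local rank could then pick a GPU as mod(node_rank, n_devices)
  call MPI_Finalize(ierr)
end program node_topology_sketch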
3D_MT/FWD_SPETSc2/EMsolve3D.f90, modelOperator3D.f90:
I also tried to make the PETSc version compatible with the new configuration format. Not sure if it works correctly, though.
Makefiles:
updated the template makefiles (probably only used by myself for quickly switching between branches).
-rev699 (7ec8bc4):
3D_MT/FWD_SP2/EMsolve3D.f90 solver.f90
fixed a bug when using solvers other than BiCG with the GPU - as there aren't any other GPU solvers at all yet (TFQMR should not be very hard to implement, as we already have the CPU version)...
The code now gives a warning if QMR or TFQMR is selected and falls back to the CPU solvers. Also fixed a silly typo that caused the Fortran-C interoperability to fail.
TODO:
there is still no way to configure the GPU version with fmkmf.perl, as the code now involves a CUDA C++ compiler (NVCC) - for now this needs to be done by hand, i.e. the $CC and $C_FLAG thingy. See Makefile.gpu for an example.
-rev703 (0b9626d):
a quick update to set up the default GPU solver to use a
deterministic (but slightly slower) algorithm
3D_MT/FWD_SP2/solver.f90
use CUSPARSE_SPMV_CSR_ALG2 instead of ALG1
2022-11-17-rev679 - Updates for publications Dong & Egbert 2019 and Liu et al. 2024
SP/SP2 Solvers added
Dong, H., & Egbert, G. D. (2019). Divergence-free solutions to electromagnetic forward and adjoint problems: a regularization approach. Geophysical Journal International, 216(2), 906–918, OUP. https://doi.org/10.1093/gji/ggy462
Pushing many of the substantial new features to ModEM stable:
- SP/SP2 compile-time optional solvers added [Hao Dong]
In addition to the default matrix-free solution (MF), two sparse-matrix
solver schemes have been added: CC-DC (SP) and CCGD (SP2). These
typically require more memory but are also more efficient. SP2 is generally
more efficient than SP.
The traditional curl-curl formulation with divergence correction (CC-DC) is compared
with the curl-curl formulation with grad-div regularization (CCGD; Dong & Egbert,
2019), implemented in the sparse-matrix solver framework (SP2); the underlying curl-curl
system is sketched schematically after this feature list. CCGD is 85%
faster than the CC-DC approach for shorter periods, while the advantage increases
to 200% for longer periods. Run times for the inversion test are also reduced
by almost a factor of three, making 3D FD inversion significantly more
practical.
- Flexible air layers added [Anna Kelbert]
Historically, these were computed internally in the code. Now, specify options
in the FWD control file. New default: fixed height, 12 layers.
- General boundary conditions added [Anna Kelbert]
A boundary conditions file can now be externally supplied on the command line. The
format is sparse-matrix ASCII.
- Choice of the forward solver added [Hao Dong and Anna Kelbert]
Specify in the FWD control file. New default: BiCG. This is generally more
efficient than QMR.
- Choice of inversion algorithm added [Hao Dong and Gary Egbert]
Select on command line: NLCG, DCG, LBFGS.
Everything compiles and runs with ifort and gfortran. The matrix-free version
passes the symmetry tests; SP2 doesn't (to be fixed).
DEAR USERS: please run and test the various combinations and provide feedback
to the developers. Thank you!
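For orientation, both SP schemes discretize the same quasi-static curl-curl system, written here only schematically (signs, sources and boundary terms follow the code's own conventions):

\nabla \times \nabla \times \mathbf{E} + i\omega\mu_0\sigma\mathbf{E} = \mathbf{s}, \qquad \nabla \cdot (\sigma\mathbf{E}) = 0 \ \text{away from sources}

CC-DC (SP) enforces the divergence condition by applying correction steps periodically during the Krylov iterations, whereas CCGD (SP2) folds a grad-div regularization term into the operator itself (Dong & Egbert, 2019).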
Also includes:
- BUG FIX. Most embarrassing, but I missed one little but important location for the apparent resistivity and phase fixes (two releases ago). Corrected now.
- NEW FUNCTIONALITY. This version includes ModEM ADORA (with Arbitrary Data Orientation Angles), implemented by Liu Zhongyin as part of his Ph.D. project.
Liu, Z., Kelbert, A., & Chen, X. (2024). 3D magnetotelluric inversion with arbitrary data orientation angles. Computers and Geosciences, 188, 105596, Elsevier. https://doi.org/10.1016/j.cageo.2024.105596
ModEM ADORA allows the user to specify data orientations both in bulk (through the corresponding line in the header) and line-by-line, allowing for complete
generality. At the end of each data line, the azimuths can optionally be added, either for Hx only or for Hx, Hy, Ex, Ey. This allows for site-by-site orientation,
as well as (primarily for compatibility with historical data) frequency-by-frequency orientation.
Internally, the predicted responses are rotated to correspond to the desired azimuth angles; a sketch of this rotation follows below. This functionality has been tested to work in synthetic
and real inversion settings (i.e., all sensitivities are implemented for all flavors of MT responses, including the impedances, tippers, apparent resistivity
and phase, and the interstation TFs). Tilts are not currently supported, but the internal code structure supports complete generality - they would be easy
to implement. This specific version still needs to be run in all combinations again to ensure that no bugs have been introduced.
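As a rough illustration of the rotation step, here is a minimal sketch using the standard impedance rotation; the program and variable names are hypothetical, and the actual ModEM sign/angle conventions may differ:

program rotate_impedance_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), parameter :: pi = 3.141592653589793_dp, deg2rad = pi/180.0_dp
  complex(dp) :: Z(2,2), Zrot(2,2)
  real(dp)    :: RE(2,2), RH(2,2), theta_E, theta_H

  ! predicted impedance in the modeling (grid) frame: column-major example values
  Z = reshape([ (0.1_dp, 0.2_dp), (-1.0_dp, -2.0_dp), &
                (1.0_dp, 2.0_dp), (-0.1_dp, -0.2_dp) ], [2,2])
  theta_E = 30.0_dp*deg2rad   ! requested azimuth of the electric channels
  theta_H = 30.0_dp*deg2rad   ! requested azimuth of the magnetic channels

  ! 2-D rotation matrices (column-major fill: cos, -sin, sin, cos)
  RE = reshape([cos(theta_E), -sin(theta_E), sin(theta_E), cos(theta_E)], [2,2])
  RH = reshape([cos(theta_H), -sin(theta_H), sin(theta_H), cos(theta_H)], [2,2])

  ! E' = RE*E and H' = RH*H  =>  Z' = RE * Z * RH^T
  Zrot = matmul(RE, matmul(Z, transpose(RH)))
  print *, Zrot
end program rotate_impedance_sketch

Using separate angles for the E and H channels is what allows the Hx,Hy,Ex,Ey case described above; when only the Hx azimuth is given, both matrices use the same angle.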
2020-02-05-rev551 - Flexible Air Layers, Log 10 Conductivity
Flexible Air Layers, Unit Testing & Log 10 conductivity model
Release 1.2.0. This release includes a major code modification; however, the
updates that are of relevance to the user community are few, and are mostly
limited to the new flexible air layers option and the LOG10 electrical
resistivity input/output file option. See the User Guide for details. This release
should be completely backwards compatible. If anything does not work for you in
exactly the same configuration as before, we would like to know about it.
Bug fix. Removed the lines in exitSolver that caused the MPI code to keep
running long after the computations were completed, instead of exiting
gracefully. This was introduced (by me) in rev 535 about 15 months ago. I
must have been trying to do different things for different txTypes. That logic
was removed but this got left in there accidentally. It only caused problems
just before exiting, after the dictionaries are deallocated. Thanks to Ben
Murphy for pointing this out.
2014-07-24-rev485 - ModEM as Published in Kelbert et al. 2014
ModEM as published in Kelbert et al. 2014
Kelbert, A., Meqbel, N., Egbert, G. D., & Tandon, K. (2014). ModEM: A modular system for inversion of electromagnetic geophysical data. Computers and Geosciences, 66, 40–53, Elsevier. https://doi.org/10.1016/j.cageo.2014.01.010
This stable release implements all bug fixes and updates since Oct 10, 2013
(rev 374 - 2c72f27).
- We've listened to the feedback we've been getting about the desired support of
the gfortran compiler. This release implements all necessary gfortran-friendly
fixes. The default makefiles Makefile2d and Makefile3d are now configured with
gfortran. Other makefiles will no longer be supported; instead, configure files
will be maintained.
- We have added configure scripts to create makefile configurations with four
compilers: Portland Group, Intel, GFortran and G95. Three compiler options are
supported: debug, release, MPI. To use, run e.g. ./Configure.3D_MT.PGI Makefile MPI,
followed by make. Note: we have only tested the MPI configuration with the Portland
Group and Intel compilers so far.
- Important bug fixes are included for apparent resistivity and phase tensor
inversion configurations, thanks to Kristina Tietze.
- The computation of the full sensitivity matrix in parallel now works, thanks
to Naser Meqbel.
- The binary output of the full sensitivity matrix has been slightly modified to
include headers.
- The default maximum number of NLCG iterations is now 600.
- In 2D MT, the ocean can now be fixed; the Tzy data type has been added.
Multiple other rewrites and fixes should not directly affect the usage of the
program.
2011-12-29-rev374 - Initial ModEM Release; Egbert & Kelbert 2012
Initial ModEM Release
Egbert, G. D., & Kelbert, A. (2012). Computational recipes for electromagnetic inverse problems. Geophysical Journal International, 189(1), 251–267. https://doi.org/10.1111/j.1365-246X.2011.05347.x
Stability update before releasing the code to the academic community. Includes
updates to the user guide, and the paper currently in press in GJI. Also, fixes
2D MT inversion in MPI; changes the behaviour of apparent resistivity and phase
inversion (linear, NOT log apparent resistivities expected in the input file);
also includes minor corrections to the Matlab I/O scripts and some cosmetic
changes to the makefiles.
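For reference on the linear vs log point, the textbook definitions for an impedance element Z_ij in SI units are (this is the standard convention, not a statement about ModEM's internal unit handling):

\rho_{a,ij} = \frac{1}{\omega\mu_0}\,\lvert Z_{ij}\rvert^{2}, \qquad \phi_{ij} = \arctan\!\left(\frac{\operatorname{Im} Z_{ij}}{\operatorname{Re} Z_{ij}}\right)

i.e. the input file is expected to contain \rho_{a} itself rather than \log_{10}\rho_{a}.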