
MFI

Modern Fortran interfaces to BLAS and LAPACK

MFI provides generic, type-agnostic wrappers around BLAS and LAPACK routines. Instead of writing type-specific calls with dozens of arguments, you write one call that works for real(real32), real(real64), complex(real32), and complex(real64).

Example: $C = A \cdot B$

program main
    use mfi_blas, only: mfi_gemm
    implicit none
    real :: A(4,4), B(4,4), C(4,4)
    ! ... fill A and B ...
    call mfi_gemm(A, B, C)   ! That's it. No leading dims, no m/n/k, no alpha/beta.
end program

Quick Start

Recommended: Nix Flake (zero config)

git clone https://github.com/14NGiestas/mfi.git
cd mfi
nix develop                # CPU-only shell with gfortran, fpm, fypp, BLAS, LAPACK
nix develop .#gpu-modern   # alternative: shell with CUDA 12.3
nix develop .#gpu-legacy   # alternative: shell with CUDA 11.8
make              # generates .f90 from .fpp/.fypp templates
fpm test          # runs the test suite

Requires Nix with flakes enabled.

Manual setup

| Tool | Minimum version |
| --- | --- |
| fpm | ≥ 0.13.0 |
| fypp | any (`pip install fypp`) |
| Fortran compiler | gfortran 12+ (recommended) |

Install BLAS and LAPACK from your package manager:

| Distro | Package |
| --- | --- |
| Arch | openblas-lapack-static (AUR) |
| Ubuntu/Debian | libblas-dev liblapack-dev |
| Fedora | openblas-devel lapack-devel |

Build & Test

git clone https://github.com/14NGiestas/mfi.git
cd mfi
make              # generates .f90 from .fpp/.fypp templates
fpm test          # runs the test suite

Using MFI as a Dependency

Add to your project's fpm.toml:

# CPU-only (stable)
[dependencies]
mfi = { git = "https://github.com/14NGiestas/mfi.git", branch = "mfi-fpm" }

That's all; fpm handles the rest. No make step is needed in your own project.
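With the dependency in place, any program in your project can call the generic interfaces directly. A minimal sketch, assuming mfi_gemv defaults to y = A*x in the same spirit as the mfi_gemm example above:

```fortran
! Hypothetical app/main.f90 in a consumer project.
program consumer
    use mfi_blas, only: mfi_gemv
    implicit none
    real :: A(3,3), x(3), y(3)
    call random_number(A)
    call random_number(x)
    ! Generic matrix-vector product; assumed to default to y = A*x,
    ! analogous to mfi_gemm's C = A*B defaults.
    call mfi_gemv(A, x, y)
    print *, y
end program consumer
```

Build and run with `fpm run` as usual.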


GPU Acceleration with cuBLAS

MFI can transparently dispatch BLAS calls to cuBLAS when compiled with the cublas feature. The same mfi_gemm, mfi_trsm, etc. calls run on the GPU without code changes.

Try it in your browser: the repository's gpu_test.ipynb notebook opens in Google Colab.

Local build with cuBLAS

make
fpm build --profile cublas
fpm test --profile cublas

Runtime CPU / GPU switching

MFI uses lazy initialization, so no setup code is needed. When compiled with the cublas feature, GPU dispatching is controlled entirely by the MFI_USE_CUBLAS environment variable:

# CPU (default)
./build/app/app

# GPU
MFI_USE_CUBLAS=1 ./build/app/app

The same call mfi_gemm(A, B, C) runs on CPU or GPU without any code changes.

For OpenMP-parallel programs, also set OMP_NUM_THREADS to pre-allocate per-thread cuBLAS handles:

MFI_USE_CUBLAS=1 OMP_NUM_THREADS=8 ./build/app/app
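For example, an OpenMP loop over independent matrix products might look like the sketch below, assuming mfi_gemm is thread-safe under the per-thread handle scheme described above:

```fortran
! Hypothetical sketch: a batch of independent matrix products across threads.
program omp_batch
    use mfi_blas, only: mfi_gemm
    implicit none
    real :: A(64,64,8), B(64,64,8), C(64,64,8)
    integer :: i
    call random_number(A)
    call random_number(B)
    ! With MFI_USE_CUBLAS=1 and OMP_NUM_THREADS=8, each thread is assumed
    ! to use its own pre-allocated cuBLAS handle.
    !$omp parallel do
    do i = 1, 8
        call mfi_gemm(A(:,:,i), B(:,:,i), C(:,:,i))
    end do
end program omp_batch
```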

Manual CPU/GPU switching (advanced)

If you need fine-grained control within a single program (e.g., run most computations on GPU but force a specific call to CPU), use mfi_force_gpu / mfi_force_cpu:

call mfi_gemm(A, B, C)       ! CPU (default)

call mfi_force_gpu
call mfi_gemm(D, E, F)       ! GPU
call mfi_force_cpu

call mfi_gemm(G, H, I)       ! CPU again

Note: When compiled without the cublas feature, mfi_force_gpu and mfi_force_cpu are no-op stubs; your code compiles and runs normally on CPU without any #ifdef changes. Simply recompile with --profile cublas to activate GPU acceleration.

Clean shutdown (optional)

Call mfi_cublas_finalize() at program end to release GPU resources. The OS cleans up on exit anyway.
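A sketch of the shutdown pattern; the module that exports mfi_cublas_finalize is an assumption, so adjust the use statement to wherever your build exposes it:

```fortran
program tidy_exit
    use mfi_blas, only: mfi_gemm
    use mfi_cublas, only: mfi_cublas_finalize   ! module name is an assumption
    implicit none
    real :: A(4,4), B(4,4), C(4,4)
    call random_number(A)
    call random_number(B)
    call mfi_gemm(A, B, C)
    call mfi_cublas_finalize()   ! optional: release GPU resources explicitly
end program tidy_exit
```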


Troubleshooting

| Problem | Solution |
| --- | --- |
| CUBLAS_STATUS_NOT_INITIALIZED | cuBLAS handle not created. Set MFI_USE_CUBLAS=1 or call mfi_force_gpu before the first BLAS call. |
| cuda_runtime.h not found | CUDA Toolkit is not installed or not in your include path. See gpu_test.ipynb for a working Colab setup. |
| i?amin symbols missing | Your BLAS provider lacks extensions. Use the default profile (without MFI_LINK_EXTERNAL) or switch to OpenBLAS. |
| Tests fail on CPU build | Known pre-existing failures: cunmrq, sorg2r, sorgr2, cungr2, cung2r, sormrq, heevx (segfault). |

Interface Levels

MFI exposes four interface levels for BLAS, from bare-metal to fully modern:

| Level | Example | Arguments |
| --- | --- | --- |
| Raw F77 | `call cgemm('N','N', N, N, N, alpha, A, N, B, N, beta, C, N)` | 13 |
| Improved F77 | `call f77_gemm('N','N', N, N, N, alpha, A, N, B, N, beta, C, N)` | 13 (no c/d/s/z prefix) |
| MFI typed | `call mfi_sgemm(A, B, C)` | 3 (type-specific) |
| MFI generic | `call mfi_gemm(A, B, C)` | 3 (type-agnostic) |
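The four levels, side by side, for a single-precision complex product; this is a sketch, and the mfi_cgemm name for the complex-typed wrapper is an assumption modeled on mfi_sgemm in the table above:

```fortran
! Hypothetical comparison of the four interface levels for C = alpha*A*B + beta*C.
complex :: alpha, beta, A(N,N), B(N,N), C(N,N)
call cgemm('N', 'N', N, N, N, alpha, A, N, B, N, beta, C, N)     ! raw F77
call f77_gemm('N', 'N', N, N, N, alpha, A, N, B, N, beta, C, N)  ! no type prefix
call mfi_cgemm(A, B, C)  ! typed; defaults assumed for alpha, beta, transposition
call mfi_gemm(A, B, C)   ! generic; resolves the routine by argument kind
```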

For full API documentation, see the generated reference.


Supported Routines

BLAS

Level 1

| Status | Name | Description |
| --- | --- | --- |
| 👍 | asum | Sum of vector magnitudes |
| 👍 | axpy | Scalar-vector product |
| 👍 | copy | Copy vector |
| 👍 | dot | Dot product |
| 👍 | dotc | Dot product, conjugated |
| 👍 | dotu | Dot product, unconjugated |
| f77 | sdsdot | Extended-precision inner product |
| f77 | dsdot | Extended-precision inner product with double result |
| 👍 | nrm2 | Vector 2-norm (Euclidean norm) |
| 👍 | rot | Plane rotation |
| 👍 | rotg | Generate Givens rotation |
| 👍 | rotm | Modified Givens rotation |
| 👍 | rotmg | Generate modified Givens rotation |
| 👍 | scal | Vector-scalar product |
| 👍 | swap | Vector-vector swap |

Level 1 Extensions

| Status | Name | Description |
| --- | --- | --- |
| 👍 | iamax | Index of maximum-absolute-value element |
| 👍 | iamin | Index of minimum-absolute-value element |
| 👍 | lamch | Machine precision parameters |

Level 2

| Status | Name | Description |
| --- | --- | --- |
| 👍 | gbmv | Matrix-vector product (general band) |
| 👍 | gemv | Matrix-vector product (general) |
| 👍 | ger | Rank-1 update (general) |
| 👍 | gerc | Rank-1 update (general, conjugated) |
| 👍 | geru | Rank-1 update (general, unconjugated) |
| 👍 | hbmv | Matrix-vector product (Hermitian band) |
| 👍 | hemv | Matrix-vector product (Hermitian) |
| 👍 | her | Rank-1 update (Hermitian) |
| 👍 | her2 | Rank-2 update (Hermitian) |
| 👍 | hpmv | Matrix-vector product (Hermitian packed) |
| 👍 | hpr | Rank-1 update (Hermitian packed) |
| 👍 | hpr2 | Rank-2 update (Hermitian packed) |
| 👍 | sbmv | Matrix-vector product (symmetric band) |
| 👍 | spmv | Matrix-vector product (symmetric packed) |
| 👍 | spr | Rank-1 update (symmetric packed) |
| 👍 | spr2 | Rank-2 update (symmetric packed) |
| 👍 | symv | Matrix-vector product (symmetric) |
| 👍 | syr | Rank-1 update (symmetric) |
| 👍 | syr2 | Rank-2 update (symmetric) |
| 👍 | tbmv | Matrix-vector product (triangular band) |
| 👍 | tbsv | Solve (triangular band) |
| 👍 | tpmv | Matrix-vector product (triangular packed) |
| 👍 | tpsv | Solve (triangular packed) |
| 👍 | trmv | Matrix-vector product (triangular) |
| 👍 | trsv | Solve (triangular) |

Level 3

| Status | GPU | Name | Description |
| --- | --- | --- | --- |
| 👍 | ✅ | gemm | General matrix-matrix product |
| 👍 | ✅ | hemm | Hermitian × general matrix product |
| 👍 |  | herk | Hermitian rank-k update |
| 👍 |  | her2k | Hermitian rank-2k update |
| 👍 | ✅ | symm | Symmetric × general matrix product |
| 👍 |  | syrk | Symmetric rank-k update |
| 👍 |  | syr2k | Symmetric rank-2k update |
| 👍 | ✅ | trmm | Triangular × general matrix product |
| 👍 | ✅ | trsm | Solve with triangular matrix |

LAPACK

LAPACK coverage is growing; routines are implemented as needed.

Factorization and Solve

| Status | Name | Description |
| --- | --- | --- |
| 👍 | geqrf | QR factorization |
| 👍 | gerqf | RQ factorization |
| 👍 | getrf | LU factorization |
| 👍 | getri | Matrix inverse (from LU) |
| 👍 | getrs | Solve with LU-factored matrix |
| 👍 | hetrf | Bunch-Kaufman factorization (Hermitian) |
| 👍 | pocon | Condition number estimate (Cholesky) |
| 👍 | potrf | Cholesky factorization |
| 👍 | potri | Matrix inverse (from Cholesky) |
| 👍 | potrs | Solve with Cholesky-factored matrix |
| 👍 | sytrf | Bunch-Kaufman factorization (symmetric) |
| 👍 | trtrs | Solve with triangular matrix |

Orthogonal / Unitary Factors

| Status | Name | Description |
| --- | --- | --- |
| 👍 | orgqr | Generate Q from QR (real) |
| 👍 | orgrq | Generate Q from RQ (real) |
| 👍 | ormqr | Multiply by Q from QR (real) |
| f77 | ormrq | Multiply by Q from RQ (real) |
| 👍 | org2r | Generate Q from QR2 (real) |
| 👍 | orm2r | Multiply by Q from QR2 (real) |
| 👍 | orgr2 | Generate Q from RQ2 (real) |
| 👍 | ormr2 | Multiply by Q from RQ2 (real) |
| 👍 | ungqr | Generate Q from QR (complex) |
| 👍 | ungrq | Generate Q from RQ (complex) |
| 👍 | unmqr | Multiply by Q from QR (complex) |
| f77 | unmrq | Multiply by Q from RQ (complex) |
| 👍 | ung2r | Generate Q from QR2 (complex) |
| 👍 | unm2r | Multiply by Q from QR2 (complex) |
| 👍 | ungr2 | Generate Q from RQ2 (complex) |
| 👍 | unmr2 | Multiply by Q from RQ2 (complex) |

Eigenvalues and SVD

| Status | Name | Description |
| --- | --- | --- |
| 👍 | gesvd | Singular value decomposition |
| 👍 | heevd | Hermitian eigenvalues (divide & conquer) |
| 👍 | hegvd | Generalized Hermitian eigenproblem (divide & conquer) |
| 👍 | heevr | Hermitian eigenvalues (relatively robust) |
| f77 | heevx | Hermitian eigenvalues (expert) |

Least Squares

| Status | Name | Description |
| --- | --- | --- |
| f77 | gels | Least squares (QR/LQ) |
| f77 | gelst | Least squares (QR/LQ, T matrix) |
| f77 | gelss | Least squares (SVD, QR iteration) |
| f77 | gelsd | Least squares (SVD, divide & conquer) |
| f77 | gelsy | Least squares (complete orthogonal) |
| f77 | getsls | Least squares (tall-skinny QR/LQ) |
| f77 | gglse | Equality-constrained least squares |
| f77 | ggglm | Gauss-Markov linear model |

Auxiliary

| Name | Types | Description |
| --- | --- | --- |
| mfi_lartg | s, d, c, z | Generate plane rotation |

Continuous Integration

CI uses Nix flakes with magic-nix-cache-action for fast, reproducible builds.

| Event | Behavior |
| --- | --- |
| Push to main | Full test matrix + deploy to mfi-fpm |
| Push to impl/cublas | Full test matrix + deploy to mfi-cublas |
| PR to main | Full test matrix |
| Manual dispatch | Full test matrix |