MFI provides generic, type-agnostic wrappers around BLAS and LAPACK routines.
Instead of writing type-specific calls with dozens of arguments, you write one
call that works for real32, real64, complex(real32), and complex(real64).
program main
use mfi_blas, only: mfi_gemm
implicit none
real :: A(4,4), B(4,4), C(4,4)
! ... fill A and B ...
call mfi_gemm(A, B, C) ! That's it. No leading dims, no m/n/k, no alpha/beta.
end program
git clone https://github.com/14NGiestas/mfi.git
cd mfi
nix develop # cpu-only shell with gfortran, fpm, fypp, BLAS, LAPACK
nix develop .#gpu-modern # with CUDA 12.3
nix develop .#gpu-legacy # with CUDA 11.8
make # generates .f90 from .fpp/.fypp templates
fpm test # runs the test suite
Requires Nix with flakes enabled.
| Tool | Minimum version |
|---|---|
| fpm | ≥ 0.13.0 |
| fypp | any |
| Fortran compiler | gfortran 12+ (recommended) |
pip install fypp
Install BLAS and LAPACK from your package manager:
| Distro | Package |
|---|---|
| Arch | openblas-lapack-static (AUR) |
| Ubuntu/Debian | libblas-dev liblapack-dev |
| Fedora | openblas-devel lapack-devel |
git clone https://github.com/14NGiestas/mfi.git
cd mfi
make # generates .f90 from .fpp/.fypp templates
fpm test # runs the test suite
Add to your project's fpm.toml:
# CPU-only (stable)
[dependencies]
mfi = { git = "https://github.com/14NGiestas/mfi.git", branch = "mfi-fpm" }
That's all; fpm handles the rest. No make needed in your own project.
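As a rough illustration, a consumer program in app/main.f90 could look like the sketch below. It uses only the mfi_blas module and the mfi_gemm call shown earlier; the file location, the program name, and the comparison against matmul are choices made for this example. It also assumes that omitting alpha and beta yields the plain product A*B.

```fortran
! app/main.f90: minimal consumer of the fpm dependency above (sketch).
program mfi_demo
    use iso_fortran_env, only: real64
    use mfi_blas, only: mfi_gemm
    implicit none
    real(real64) :: A(2,2), B(2,2), C(2,2)

    A = reshape([1.0_real64, 2.0_real64, 3.0_real64, 4.0_real64], [2, 2])
    B = reshape([0.0_real64, 1.0_real64, 1.0_real64, 0.0_real64], [2, 2])
    C = 0.0_real64

    ! Same generic call as the real32 example, now with real64 arrays.
    call mfi_gemm(A, B, C)

    ! Assumption: with alpha and beta omitted, C holds the product A*B,
    ! so the difference from the intrinsic matmul should be near zero.
    print *, 'max |C - matmul(A,B)| =', maxval(abs(C - matmul(A, B)))
end program mfi_demo
```

Running `fpm run` in a project with the dependency declared should build MFI and this program together.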
MFI can transparently dispatch BLAS calls to cuBLAS when compiled with the
cublas feature. The same mfi_gemm, mfi_gemv, etc. calls run on the GPU
without code changes.
To try it in your browser, see the gpu_test.ipynb Colab notebook. To build and test locally:
make
fpm build --profile cublas
fpm test --profile cublas
MFI uses lazy initialization, so no setup code is needed. When compiled with the
cublas feature, GPU dispatching is controlled entirely by the
MFI_USE_CUBLAS environment variable:
# CPU (default)
./build/app/app
# GPU
MFI_USE_CUBLAS=1 ./build/app/app
The same call mfi_gemm(A, B, C) runs on CPU or GPU without any code changes.
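To make the idea of lazy, environment-driven dispatch concrete, here is a minimal sketch of how such a check could be written in standard Fortran. It is not MFI's implementation: the module demo_dispatch and the functions flag_enabled and use_gpu are hypothetical names, and treating only the value 1 as "enabled" is an assumption based on the command above.

```fortran
! Hypothetical sketch (not MFI's source): read the environment variable
! once, on first use, and cache the decision for later calls.
module demo_dispatch
    implicit none
    private
    public :: flag_enabled, use_gpu

    logical :: initialized = .false.
    logical :: gpu_enabled = .false.

contains

    ! Assumption: only the value "1" switches the GPU path on.
    pure logical function flag_enabled(value)
        character(len=*), intent(in) :: value
        flag_enabled = (trim(adjustl(value)) == '1')
    end function flag_enabled

    ! Lazy initialization: the environment is queried on the first call only.
    logical function use_gpu()
        character(len=16) :: value
        integer :: status
        if (.not. initialized) then
            call get_environment_variable('MFI_USE_CUBLAS', value, status=status)
            gpu_enabled = (status == 0) .and. flag_enabled(value)
            initialized = .true.
        end if
        use_gpu = gpu_enabled
    end function use_gpu

end module demo_dispatch

program demo
    use demo_dispatch, only: use_gpu
    implicit none
    if (use_gpu()) then
        print *, 'dispatching to the GPU backend'
    else
        print *, 'dispatching to the CPU backend'
    end if
end program demo
```

Because the decision is cached after the first query, later calls pay only for a logical test, which is one reason a library might prefer this pattern over an explicit setup routine.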
For OpenMP-parallel programs, also set OMP_NUM_THREADS to pre-allocate
per-thread cuBLAS handles:
MFI_USE_CUBLAS=1 OMP_NUM_THREADS=8 ./build/app/app
If you need fine-grained control within a single program (e.g., run most
computations on GPU but force a specific call to CPU), use
mfi_force_gpu / mfi_force_cpu:
call mfi_gemm(A, B, C) ! CPU (default)
call mfi_force_gpu
call mfi_gemm(D, E, F) ! GPU
call mfi_force_cpu
call mfi_gemm(G, H, I) ! CPU again
Note: When compiled without the cublas feature, mfi_force_gpu and mfi_force_cpu are no-op stubs, so your code compiles and runs normally on CPU without any #ifdef changes. Simply recompile with --profile cublas to activate GPU acceleration.
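The toggle-plus-stub behavior can be pictured with a small standalone sketch. Everything in it is hypothetical (the module demo_backend, the has_cublas constant, and the backend_name helper); MFI may well select its stubs at build time rather than through a constant like this.

```fortran
! Hypothetical sketch (not MFI's source): a runtime backend switch whose
! "force" routines have no effect when GPU support was not built in.
module demo_backend
    implicit none
    private
    public :: force_gpu, force_cpu, backend_name

    ! Stand-in for the build-time cublas feature; set to .false. to see
    ! force_gpu behave as a no-op.
    logical, parameter :: has_cublas = .true.

    logical :: gpu_active = .false.   ! CPU is the default

contains

    subroutine force_gpu()
        ! Without GPU support this leaves the CPU path selected.
        gpu_active = has_cublas
    end subroutine force_gpu

    subroutine force_cpu()
        gpu_active = .false.
    end subroutine force_cpu

    function backend_name() result(name)
        character(len=3) :: name
        name = merge('GPU', 'CPU', gpu_active)
    end function backend_name

end module demo_backend

program demo
    use demo_backend, only: force_gpu, force_cpu, backend_name
    implicit none
    print *, backend_name()   ! CPU (default)
    call force_gpu()
    print *, backend_name()   ! GPU when has_cublas is .true.
    call force_cpu()
    print *, backend_name()   ! CPU again
end program demo
```

The calling code is identical in both configurations, which is what lets the same source build with or without GPU support.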
Optionally, call mfi_cublas_finalize() at program end to release GPU resources explicitly.
The OS reclaims them on exit anyway.
| Problem | Solution |
|---|---|
| CUBLAS_STATUS_NOT_INITIALIZED | cuBLAS handle not created. Set MFI_USE_CUBLAS=1 or call mfi_force_gpu before the first BLAS call. |
| cuda_runtime.h not found | CUDA Toolkit is not installed or not in your include path. See gpu_test.ipynb for a working Colab setup. |
| i?amin symbols missing | Your BLAS provider lacks extensions. Use the default profile (without MFI_LINK_EXTERNAL) or switch to OpenBLAS. |
| Tests fail on CPU build | Known pre-existing failures: cunmrq, sorg2r, sorgr2, cungr2, cung2r, sormrq, heevx (segfault). |
MFI exposes four interface levels for BLAS, from bare-metal to fully modern:
| Level | Example | Arguments |
|---|---|---|
| Raw F77 | call cgemm('N','N', N, N, N, alpha, A, N, B, N, beta, C, N) | 13 |
| Improved F77 | call f77_gemm('N','N', N, N, N, alpha, A, N, B, N, beta, C, N) | 13 (no c/d/s/z prefix) |
| MFI typed | call mfi_sgemm(A, B, C) | 3 (type-specific) |
| MFI generic | call mfi_gemm(A, B, C) | 3 (type-agnostic) |
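The step from the typed level to the generic level relies on Fortran generic interfaces. The sketch below shows one way such a layer could be built; it is not MFI's source. The names demo_wrappers, demo_gemm, demo_sgemm, and demo_dgemm are hypothetical, the intrinsic matmul stands in for the underlying BLAS call so the example runs without any library, and the defaults alpha = 1 and beta = 0 are assumptions.

```fortran
! Hypothetical sketch (not MFI's source): one generic name backed by
! typed wrappers that supply the defaults a raw gemm call spells out.
module demo_wrappers
    use iso_fortran_env, only: real32, real64
    implicit none
    private
    public :: demo_gemm

    ! The compiler selects the specific procedure from the argument kinds.
    interface demo_gemm
        module procedure demo_sgemm, demo_dgemm
    end interface demo_gemm

contains

    subroutine demo_sgemm(a, b, c, alpha, beta)
        real(real32), intent(in)           :: a(:,:), b(:,:)
        real(real32), intent(inout)        :: c(:,:)
        real(real32), intent(in), optional :: alpha, beta
        real(real32) :: al, be
        al = 1.0_real32
        be = 0.0_real32
        if (present(alpha)) al = alpha
        if (present(beta))  be = beta
        ! A real wrapper would pass size(a,1), size(b,2), size(a,2) and the
        ! leading dimensions to sgemm; matmul stands in for that call here.
        if (be == 0.0_real32) then
            c = al * matmul(a, b)
        else
            c = al * matmul(a, b) + be * c
        end if
    end subroutine demo_sgemm

    subroutine demo_dgemm(a, b, c, alpha, beta)
        real(real64), intent(in)           :: a(:,:), b(:,:)
        real(real64), intent(inout)        :: c(:,:)
        real(real64), intent(in), optional :: alpha, beta
        real(real64) :: al, be
        al = 1.0_real64
        be = 0.0_real64
        if (present(alpha)) al = alpha
        if (present(beta))  be = beta
        if (be == 0.0_real64) then
            c = al * matmul(a, b)
        else
            c = al * matmul(a, b) + be * c
        end if
    end subroutine demo_dgemm

end module demo_wrappers

program demo
    use iso_fortran_env, only: real32, real64
    use demo_wrappers, only: demo_gemm
    implicit none
    real(real32) :: a4(2,2), b4(2,2), c4(2,2)
    real(real64) :: a8(2,2), b8(2,2), c8(2,2)

    a4 = reshape([1.0, 2.0, 3.0, 4.0], [2, 2])
    b4 = reshape([2.0, 0.0, 0.0, 2.0], [2, 2])   ! 2 * identity
    a8 = real(a4, real64)
    b8 = real(b4, real64)

    call demo_gemm(a4, b4, c4)                     ! resolves to demo_sgemm
    call demo_gemm(a8, b8, c8, alpha=0.5_real64)   ! resolves to demo_dgemm

    print *, c4   ! 2 * a4
    print *, c8   ! same values as a8
end program demo
```

Assumed-shape arguments are what remove m, n, k, and the leading dimensions from the call, and optional arguments are what remove alpha and beta; the generic interface then removes the type prefix.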
For full API documentation, see the generated reference.
Level 1 BLAS:
| Status | Name | Description |
|---|---|---|
| 👍 | asum | Sum of vector magnitudes |
| 👍 | axpy | Scaled vector addition (y := a*x + y) |
| 👍 | copy | Copy vector |
| 👍 | dot | Dot product |
| 👍 | dotc | Dot product conjugated |
| 👍 | dotu | Dot product unconjugated |
| f77 | sdsdot | Extended precision inner product |
| f77 | dsdot | Extended precision inner product with double result |
| 👍 | nrm2 | Vector 2-norm (Euclidean norm) |
| 👍 | rot | Plane rotation |
| 👍 | rotg | Generate Givens rotation |
| 👍 | rotm | Modified Givens rotation |
| 👍 | rotmg | Generate modified Givens rotation |
| 👍 | scal | Vector-scalar product |
| 👍 | swap | Vector-vector swap |
Utilities and extensions:
| Status | Name | Description |
|---|---|---|
| 👍 | iamax | Index of maximum absolute value element |
| 👍 | iamin | Index of minimum absolute value element |
| 👍 | lamch | Machine precision parameters |
Level 2 BLAS:
| Status | Name | Description |
|---|---|---|
| 👍 | gbmv | Matrix-vector product (general band) |
| 👍 | gemv | Matrix-vector product (general) |
| 👍 | ger | Rank-1 update (general) |
| 👍 | gerc | Rank-1 update (general, conjugated) |
| 👍 | geru | Rank-1 update (general, unconjugated) |
| 👍 | hbmv | Matrix-vector product (Hermitian band) |
| 👍 | hemv | Matrix-vector product (Hermitian) |
| 👍 | her | Rank-1 update (Hermitian) |
| 👍 | her2 | Rank-2 update (Hermitian) |
| 👍 | hpmv | Matrix-vector product (Hermitian packed) |
| 👍 | hpr | Rank-1 update (Hermitian packed) |
| 👍 | hpr2 | Rank-2 update (Hermitian packed) |
| 👍 | sbmv | Matrix-vector product (symmetric band) |
| 👍 | spmv | Matrix-vector product (symmetric packed) |
| 👍 | spr | Rank-1 update (symmetric packed) |
| 👍 | spr2 | Rank-2 update (symmetric packed) |
| 👍 | symv | Matrix-vector product (symmetric) |
| 👍 | syr | Rank-1 update (symmetric) |
| 👍 | syr2 | Rank-2 update (symmetric) |
| 👍 | tbmv | Matrix-vector product (triangular band) |
| 👍 | tbsv | Solve (triangular band) |
| 👍 | tpmv | Matrix-vector product (triangular packed) |
| 👍 | tpsv | Solve (triangular packed) |
| 👍 | trmv | Matrix-vector product (triangular) |
| 👍 | trsv | Solve (triangular) |
Level 3 BLAS:
| Status | GPU | Name | Description |
|---|---|---|---|
| 👍 | ✅ | gemm | General matrix-matrix product |
| 👍 | ✅ | hemm | Hermitian × general matrix product |
| 👍 | | herk | Hermitian rank-k update |
| 👍 | | her2k | Hermitian rank-2k update |
| 👍 | ✅ | symm | Symmetric × general matrix product |
| 👍 | | syrk | Symmetric rank-k update |
| 👍 | | syr2k | Symmetric rank-2k update |
| 👍 | ✅ | trmm | Triangular × general matrix product |
| 👍 | ✅ | trsm | Solve with triangular matrix |
LAPACK coverage is growing; routines are implemented as needed.
Factorizations and linear solves:
| Status | Name | Description |
|---|---|---|
| 👍 | geqrf | QR factorization |
| 👍 | gerqf | RQ factorization |
| 👍 | getrf | LU factorization |
| 👍 | getri | Matrix inverse (from LU) |
| 👍 | getrs | Solve with LU-factored matrix |
| 👍 | hetrf | Bunch-Kaufman factorization (Hermitian) |
| 👍 | pocon | Condition number estimate (Cholesky) |
| 👍 | potrf | Cholesky factorization |
| 👍 | potri | Matrix inverse (from Cholesky) |
| 👍 | potrs | Solve with Cholesky-factored matrix |
| 👍 | sytrf | Bunch-Kaufman factorization (symmetric) |
| 👍 | trtrs | Solve with triangular matrix |
Orthogonal and unitary factors:
| Status | Name | Description |
|---|---|---|
| 👍 | orgqr | Generate Q from QR (real) |
| 👍 | orgrq | Generate Q from RQ (real) |
| 👍 | ormqr | Multiply by Q from QR (real) |
| f77 | ormrq | Multiply by Q from RQ (real) |
| 👍 | org2r | Generate Q from QR2 (real) |
| 👍 | orm2r | Multiply by Q from QR2 (real) |
| 👍 | orgr2 | Generate Q from RQ2 (real) |
| 👍 | ormr2 | Multiply by Q from RQ2 (real) |
| 👍 | ungqr | Generate Q from QR (complex) |
| 👍 | ungrq | Generate Q from RQ (complex) |
| 👍 | unmqr | Multiply by Q from QR (complex) |
| f77 | unmrq | Multiply by Q from RQ (complex) |
| 👍 | ung2r | Generate Q from QR2 (complex) |
| 👍 | unm2r | Multiply by Q from QR2 (complex) |
| 👍 | ungr2 | Generate Q from RQ2 (complex) |
| 👍 | unmr2 | Multiply by Q from RQ2 (complex) |
Singular value and eigenvalue problems:
| Status | Name | Description |
|---|---|---|
| 👍 | gesvd | Singular value decomposition |
| 👍 | heevd | Hermitian eigenvalues (divide & conquer) |
| 👍 | hegvd | Generalized Hermitian eigenproblem (divide & conquer) |
| 👍 | heevr | Hermitian eigenvalues (relatively robust) |
| f77 | heevx | Hermitian eigenvalues (expert) |
Least squares:
| Status | Name | Description |
|---|---|---|
| f77 | gels | Least squares (QR/LQ) |
| f77 | gelst | Least squares (QR/LQ, T matrix) |
| f77 | gelss | Least squares (SVD, QR iteration) |
| f77 | gelsd | Least squares (SVD, divide & conquer) |
| f77 | gelsy | Least squares (complete orthogonal) |
| f77 | getsls | Least squares (tall-skinny QR/LQ) |
| f77 | gglse | Equality-constrained least squares |
| f77 | ggglm | Gauss-Markov linear model |
Auxiliary routines:
| Name | Types | Description |
|---|---|---|
| mfi_lartg | s, d, c, z | Generate plane rotation |
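For readers unfamiliar with plane rotations, the following standalone sketch shows what "generate plane rotation" means for real inputs. It does not call MFI or LAPACK: demo_rotation and demo_lartg are hypothetical names, and the routine is a simplified version whose scaling safeguards and sign convention may differ from LAPACK's lartg.

```fortran
! Hypothetical sketch (not MFI or LAPACK code): a simplified real-valued
! plane rotation. It finds c, s, r such that
!   [  c  s ] [ f ]   [ r ]
!   [ -s  c ] [ g ] = [ 0 ]
module demo_rotation
    use iso_fortran_env, only: real64
    implicit none
    private
    public :: demo_lartg

contains

    pure subroutine demo_lartg(f, g, c, s, r)
        real(real64), intent(in)  :: f, g
        real(real64), intent(out) :: c, s, r
        r = hypot(f, g)            ! sqrt(f**2 + g**2) without overflow
        if (r == 0.0_real64) then
            c = 1.0_real64         ! identity rotation for the zero vector
            s = 0.0_real64
        else
            c = f / r
            s = g / r
        end if
    end subroutine demo_lartg

end module demo_rotation

program demo
    use iso_fortran_env, only: real64
    use demo_rotation, only: demo_lartg
    implicit none
    real(real64) :: c, s, r

    call demo_lartg(3.0_real64, 4.0_real64, c, s, r)
    print *, 'c, s, r =', c, s, r
    ! The second component should vanish up to rounding.
    print *, 'rotated second component =', -s * 3.0_real64 + c * 4.0_real64
end program demo
```

Rotations of this kind are the building block for zeroing individual matrix entries, for example in QR updates.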
CI uses Nix flakes with magic-nix-cache-action for fast, reproducible builds.
| Event | Behavior |
|---|---|
| Push to main | Full test matrix + deploy to mfi-fpm |
| Push to impl/cublas | Full test matrix + deploy to mfi-cublas |
| PR to main | Full test matrix |
| Manual dispatch | Full test matrix |