
# WEKA VirtioFS Benchmark Suite

Benchmark automation and lab workbooks for evaluating WEKA storage via virtiofs on Ubuntu 24.04 KVM hypervisors.

## What's here

| File | Purpose |
| --- | --- |
| `bench_suite.sh` | End-to-end fio benchmark script — runs on the benchmark client |
| `checkpoint_bench.py` | PyTorch model checkpoint save/load benchmark (1/5/10 GB, CPU tensors) |
| `make_graphs.py` | Reads results, generates charts + WORKBOOK.md into `runs/` |
| `terraform/` | Terraform module + GCP environment for one-command lab provisioning |
| `scripts/` | Helper scripts (e.g. `create-weka-token-secret.sh`) |
| `agents/` | Agent coordination files — PLAN, STATUS, and per-role instructions for running benchmark sessions |
| `runs/` | Timestamped results archive — one directory per run |

## Run archive

Each entry in runs/ is a self-contained lab result:

```
runs/
  YYYY-MM-DD-<hostname>/
    results.json              — raw fio metrics (all 6 configurations)
    virtiofs_hypervisor.png   — UDP cached vs O_DIRECT hypervisor comparison
    virtiofs_tldr.png         — hypervisor reference + C/Rust virtiofsd VMs
    virtiofs_checkpoint.png   — PyTorch torch.save/load throughput by config + size
    WORKBOOK.md               — full writeup with tables, findings, charts
    bench.log                 — full fio + virtiofsd output from the client
    make.log                  — local orchestration log (terraform + scp output)
    results/
      checkpoint_hypervisor.json     — raw checkpoint timings, hypervisor direct
      checkpoint_c_virtiofsd.json    — raw checkpoint timings, C virtiofsd guest
      checkpoint_rust_virtiofsd.json — raw checkpoint timings, Rust virtiofsd guest
```

The live orchestration log during a run is `/tmp/virtiofs-bench-run.log` — tail it to monitor progress.
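A run's `results.json` can also be post-processed without `make_graphs.py`. A minimal sketch, assuming the file follows fio's `--output-format=json` schema (a top-level `jobs` list with per-job `read`/`write` dicts, bandwidth in KiB/s); the actual layout written by `bench_suite.sh` may differ:

```python
# Sketch: pull headline numbers out of a run's results.json.
# Assumes fio's --output-format=json layout; adjust if bench_suite.sh
# writes a different structure.
import json


def summarize(path):
    """Return (jobname, direction, MiB/s, IOPS) for every non-empty job side."""
    with open(path) as f:
        data = json.load(f)
    rows = []
    for job in data.get("jobs", []):
        for direction in ("read", "write"):
            d = job.get(direction, {})
            if d.get("bw", 0):  # fio reports bw in KiB/s
                rows.append((job["jobname"], direction, d["bw"] / 1024, d.get("iops", 0.0)))
    return rows
```

Point it at any archived run, e.g. `summarize("runs/2025-01-01-myhost/results.json")`, and print the tuples however you like.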

## Quickstart

Terraform provisions a full WEKA test environment (backends + client VM) with one command, enabling reproducible lab runs without manual setup.

Expected cost: ~$34/run (standard, n2-standard-32 client); ~$57/run (full with cold phases, n2-highmem-64).

Prerequisites: Terraform ≥ 1.3, `gcloud auth login && gcloud auth application-default login`, and access to the team-cst GCP project.

```sh
git clone git@github.com:weka/virtiofs-bench.git
cd virtiofs-bench
cp terraform/environments/gcp-lab/terraform.tfvars.example \
   terraform/environments/gcp-lab/terraform.tfvars
# Edit terraform.tfvars: set cluster_name (region/zone default to europe-west1-b)

# One-time token setup (recommended): store in Secret Manager; make picks it up automatically.
scripts/create-weka-token-secret.sh
# Fallback: set get_weka_io_token = "..." in terraform.tfvars instead.
```

```sh
make cluster-up    # ~15–20 min — 6 backends + client VM, /mnt/weka mounted
make prereqs       # fio, QEMU, Rust virtiofsd, Rocky 8, torch + checkpoint_bench.py (~15 min)
make bench         # fio suite + PyTorch checkpoint benchmark (~90 min)
make results       # copy results.json + checkpoint JSONs, generate graphs, commit + push

make cluster-down  # destroy all GCP resources when done
```

Or run the full pipeline in one shot:

```sh
make run           # cluster-up → prereqs → bench → results → cluster-down (auto-destroys)
```

To keep the cluster up after the run for inspection:

```sh
make all           # cluster-up → prereqs → bench → results (cluster stays up)
make cluster-down  # destroy when done
```

For full cold-phase runs (phases 6c/7c, requires 512 GB RAM):

```sh
make cluster-up BENCH_CLIENT=n2-highmem-64
make bench-full
```

Multi-cloud support (AWS, OCI) lives on the `multi-cloud` branch.

## Architecture

```
Hypervisor (Ubuntu 24.04)
  ├── WEKA client (DPDK, 8 cores) → WEKA backends
  └── virtiofsd (C or Rust) → vhost-user-fs socket
        └── Guest VM (Rocky 8)
              └── /mnt/weka (virtiofs mount, no WEKA agent in guest)
```

Hypervisor benchmarks run directly against /mnt/weka. VM benchmarks boot a Rocky 8 guest via QEMU, mount via virtiofs, and run fio inside the guest.
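The wiring above can be sketched as a launcher. This is illustrative only: the socket path, memory size, and guest image name are hypothetical, while the `cache=none` and `queue-size=1024` settings follow this README's own recommendations.

```python
# Sketch of the vhost-user-fs wiring. Socket path, memory size, and
# image name are illustrative guesses; cache/queue-size settings follow
# the recommendations in this README.
SOCK = "/tmp/virtiofsd.sock"  # hypothetical socket path

# C virtiofsd (the Rust virtiofsd spells the cache flag --cache=never)
virtiofsd_cmd = [
    "virtiofsd",
    f"--socket-path={SOCK}",
    "-o", "source=/mnt/weka",
    "-o", "cache=none",  # bypass the host page cache
]

qemu_cmd = [
    "qemu-system-x86_64",
    "-machine", "q35,accel=kvm",
    "-m", "16G",
    # vhost-user backends need shared guest memory
    "-object", "memory-backend-memfd,id=mem,size=16G,share=on",
    "-numa", "node,memdev=mem",
    "-chardev", f"socket,id=char0,path={SOCK}",
    # queue-size=1024: the default of 128 bottlenecks throughput
    "-device", "vhost-user-fs-pci,chardev=char0,tag=weka,queue-size=1024",
    "-drive", "file=rocky8.qcow2,if=virtio",  # hypothetical guest image
]

# Inside the guest, the export is mounted by its tag:
#   mount -t virtiofs weka /mnt/weka
```

`bench_suite.sh` handles the real launch; the sketch only shows which knobs matter.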

## Results overview

See OVERVIEW.md for the full recommendation, benchmark numbers, fleet orchestration patterns, and VM RAM sizing guidance.

Short version (C virtiofsd, cache=none + O_DIRECT, 6-backend GCP cluster, kernel 6.8.0):

| Configuration | Seq Write | Rand 4K Read | Latency |
| --- | ---: | ---: | ---: |
| Hypervisor O_DIRECT (reference) | 646 MiB/s | 8.4 KiOPS | 965 µs |
| C virtiofsd cache=auto | 2,287 MiB/s | 39.3 KiOPS | 78 µs |
| Rust virtiofsd cache=auto | 2,013 MiB/s | 39.7 KiOPS | 84 µs |
| C virtiofsd cache=none + O_DIRECT | 3,650 MiB/s | 56.9 KiOPS | 78 µs |
| Rust virtiofsd cache=none + O_DIRECT | 3,350 MiB/s | 53.5 KiOPS | 85 µs |

PyTorch checkpoint overhead (virtiofsd vs hypervisor direct): ~35–40% slower saves, ~38–47% slower loads. A 10 GB LLaMA-7B shard: ~22 s via virtiofsd vs ~14 s direct. At typical checkpoint cadence (every 3–40 min) this is <4% of training time.
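A quick back-of-envelope check on those figures, using only numbers from the table and paragraph above (the 10-minute cadence is just one point in the stated 3–40 min range):

```python
# Sanity-check the headline claims with the numbers quoted above.
hyp_direct = 646   # MiB/s, hypervisor O_DIRECT reference
c_odirect = 3650   # MiB/s, C virtiofsd cache=none + O_DIRECT
speedup = c_odirect / hyp_direct
print(f"guest O_DIRECT vs hypervisor reference: {speedup:.1f}x")  # 5.7x

# 10 GB checkpoint: ~22 s via virtiofsd, at an assumed 10-minute cadence
save_s, cadence_s = 22, 10 * 60
overhead = save_s / cadence_s
print(f"checkpoint stall: {overhead:.1%} of training time")  # 3.7%, i.e. <4%
```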

Key rules:

- Use `--cache=none` (C) / `--cache=never` (Rust) for AI training — bypasses the host page cache; reads go directly through the WEKA stack to the distributed NVMe backends
- Never use `--writeback` — confirmed 3.7× sequential write collapse
- Set `queue-size=1024` on the vhost-user-fs device — the default of 128 bottlenecks throughput
- C virtiofsd leads on kernel 6.8.0; both C and Rust are production-viable
