A compute-only Vulkan MPS (Multi-Process Service) daemon, inspired by NVIDIA's MPS. Multiple client processes share a single Vulkan device owned by a central daemon, and communicate with it over Unix domain sockets.
```sh
# Build
cmake -S . -B build
cmake --build build -j4

# Start the server
./build/vkmps-server

# Run the compute example
./build/simple_compute

# Run all tests
cd build && ctest -V
```

Requirements:

- Vulkan SDK: provides `VK_HEADER_VERSION` and `libvulkan`
- `glslc`: included with the Vulkan SDK; compiles GLSL to SPIR-V
- g++ ≥ 10 or clang ≥ 12 (C++17 required)
- Linux with a Vulkan-capable GPU (or the llvmpipe software rasterizer for development)
Build with CMake:

```sh
cmake -S . -B build
cmake --build build -j4
```

Targets:
| Target | Description |
|---|---|
| `vkmps-server` | The daemon binary |
| `vkmps-control` | CLI tool to query and control a running server |
| `simple_compute` | Single-client compute example (sync + async) |
| `multi_process` | Multi-client concurrent compute example (async) |
| `test_protocol` | Protocol serialization unit tests |
| `test_integration` | Mock-server IPC integration tests |
| `test_vulkan_integration` | Real-server Vulkan GPU integration tests |
On startup the server checks whether a socket file already exists at the configured path. If one does, it tries to connect to it: if a live server responds, the new instance exits with an error, which prevents duplicate daemons; if no server responds, the stale socket and lock file are removed before binding.
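The startup check can be sketched with plain POSIX sockets. This is a minimal sketch, not the server's code: `server_is_listening`, `check_existing_socket`, and `SocketCheck` are hypothetical names, and the sketch assumes that a failed `connect()` (typically `ECONNREFUSED` when nobody is listening) is what marks a socket file as stale.

```cpp
#include <cstring>
#include <string>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <unistd.h>

enum class SocketCheck { kNoFile, kRemovedStale, kServerAlive };

// Returns true if some process is accepting connections on the Unix stream socket at `path`.
bool server_is_listening(const std::string& path) {
    int fd = ::socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) return false;
    sockaddr_un addr{};
    addr.sun_family = AF_UNIX;
    std::strncpy(addr.sun_path, path.c_str(), sizeof(addr.sun_path) - 1);
    const bool alive =
        ::connect(fd, reinterpret_cast<const sockaddr*>(&addr), sizeof(addr)) == 0;
    ::close(fd);
    return alive;
}

// Refuse to start if a live server owns `path`; otherwise remove a stale socket
// file and its "<path>.lock" companion so that bind() can succeed.
SocketCheck check_existing_socket(const std::string& path) {
    struct stat st{};
    if (::stat(path.c_str(), &st) != 0) return SocketCheck::kNoFile;
    if (server_is_listening(path)) return SocketCheck::kServerAlive;
    ::unlink(path.c_str());
    ::unlink((path + ".lock").c_str());
    return SocketCheck::kRemovedStale;
}
```

Treating any failed `connect()` as "stale" keeps the sketch short; a stricter variant would only unlink on `ECONNREFUSED` and report other errors.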
A PID lock file (`<socket>.lock`) prevents a second server from starting on the same path.
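One common way to implement such a lock file is a non-blocking `flock()`, sketched below. Whether vkmps uses `flock()`, `fcntl()` record locks, or `O_EXCL` creation is an assumption here, and `acquire_pid_lock` is a hypothetical name.

```cpp
#include <fcntl.h>
#include <string>
#include <sys/file.h>
#include <unistd.h>

// Opens the lock file, takes an exclusive non-blocking flock() and writes our PID.
// Returns the fd (keep it open for the server's lifetime), or -1 if another
// process already holds the lock or the file cannot be written.
int acquire_pid_lock(const std::string& lock_path) {
    int fd = ::open(lock_path.c_str(), O_RDWR | O_CREAT | O_CLOEXEC, 0644);
    if (fd < 0) return -1;
    if (::flock(fd, LOCK_EX | LOCK_NB) != 0) {  // held elsewhere: do not touch contents
        ::close(fd);
        return -1;
    }
    const std::string pid = std::to_string(::getpid()) + "\n";
    if (::ftruncate(fd, 0) != 0 ||
        ::write(fd, pid.data(), pid.size()) != static_cast<ssize_t>(pid.size())) {
        ::close(fd);
        return -1;
    }
    return fd;
}
```

A useful property of `flock()` is that the kernel drops the lock when the holding process exits, even after a crash, so a leftover lock file does not by itself block a restart.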
SIGSEGV, SIGABRT, and SIGQUIT are handled by an async-signal-safe emergency cleanup routine that:

- Writes a diagnostic message to stderr
- Unlinks the Unix socket file
- Unlinks the PID lock file
- Calls `shm_unlink` on the ring-logger shared memory segment
- Exits with status `128 + signal`
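A sketch of such a handler, assuming the paths are copied into static buffers before the handlers are installed, since a signal handler must not allocate memory. `install_emergency_handlers`, `emergency_cleanup`, and the buffer names are hypothetical.

```cpp
#include <csignal>
#include <cstdio>
#include <cstring>
#include <initializer_list>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

namespace {
// Plain static buffers: the handler only reads them, never std::string.
char g_socket_path[108];
char g_lock_path[128];
char g_shm_name[64];

void emergency_cleanup(int sig) {
    static const char msg[] = "vkmps-server: fatal signal, running emergency cleanup\n";
    ssize_t rc = ::write(STDERR_FILENO, msg, sizeof(msg) - 1);  // write(2) is async-signal-safe
    (void)rc;
    ::unlink(g_socket_path);  // unlink(2) is async-signal-safe
    ::unlink(g_lock_path);
    // shm_unlink is not on POSIX's async-signal-safe list; glibc implements it
    // as an unlink() under /dev/shm, so it is used here as a best-effort step.
    ::shm_unlink(g_shm_name);
    ::_exit(128 + sig);  // _exit, not exit: no atexit handlers or stdio flushing
}
}  // namespace

void install_emergency_handlers(const char* socket_path, const char* shm_name) {
    std::strncpy(g_socket_path, socket_path, sizeof(g_socket_path) - 1);
    std::snprintf(g_lock_path, sizeof(g_lock_path), "%s.lock", socket_path);
    std::strncpy(g_shm_name, shm_name, sizeof(g_shm_name) - 1);

    struct sigaction sa{};
    sa.sa_handler = emergency_cleanup;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;  // a fault inside the handler falls back to the default action
    for (int sig : {SIGSEGV, SIGABRT, SIGQUIT}) ::sigaction(sig, &sa, nullptr);
}
```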
Normal shutdown via SIGINT/SIGTERM performs full graceful cleanup (close client sockets, stop threads, destroy Vulkan resources, unlink socket and lock file).
SIGPIPE is ignored, so writing to a client that has already disconnected fails with an `EPIPE` error instead of terminating the server.
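A minimal sketch of the graceful-shutdown side, assuming a flag that the server's loops poll before running the normal teardown; the server may equally use a self-pipe or `signalfd`, and `g_shutdown_requested` and `install_shutdown_handlers` are hypothetical names.

```cpp
#include <atomic>
#include <csignal>
#include <signal.h>

// A lock-free atomic is safe to store to from a signal handler.
std::atomic<bool> g_shutdown_requested{false};
static_assert(std::atomic<bool>::is_always_lock_free, "flag must be lock-free");

void on_shutdown_signal(int) { g_shutdown_requested.store(true); }

void install_shutdown_handlers() {
    struct sigaction sa{};
    sa.sa_handler = on_shutdown_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;  // no SA_RESTART: a blocking call in the signalled thread returns EINTR
    ::sigaction(SIGINT, &sa, nullptr);
    ::sigaction(SIGTERM, &sa, nullptr);

    // With SIGPIPE ignored, write()/send() to a client that hung up
    // fails with EPIPE instead of killing the daemon.
    struct sigaction ign{};
    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    ::sigaction(SIGPIPE, &ign, nullptr);
}
```

The main loop would then run `while (!g_shutdown_requested) { ... }` and, on exit, perform the cleanup listed above: close client sockets, stop threads, destroy Vulkan resources, and unlink the socket and lock file.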
```text
Usage: ./build/vkmps-server [options]

Options:
  -s, --socket PATH        Unix socket path (default: /tmp/vkmps.sock)
  -d, --device IDX         GPU device index (default: 0)
  -l, --log                Enable file logging (default: /tmp/vkmps.log)
  -L, --log-path PATH      Log file path (implies --log)
  -M, --scheduling-mode MODE
                           Scheduling mode: exclusive-throughput (default)
                           or cooperative-fairshare
  -q, --loop-slice-quota N
                           Yield threshold for EXCLUSIVE_THROUGHPUT (default: 100000)
  --instrument-submodular
                           Enable submodular yield-point selection in SPIR-V
                           instrumentation (selects loops that maximize
                           yield-per-byte under the loop-slice-quota budget)
  -h, --help               Show this help
```
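The `--instrument-submodular` description reads like budgeted coverage selection. The sketch below shows the standard cost-effective greedy for budgeted maximum coverage (a monotone submodular objective), including the usual comparison against the best single affordable item. The cost model (bytes of added yield-check code), the benefit model (covered "regions"), and the mapping of `--loop-slice-quota` onto `budget` are all assumptions; `CandidateLoop` and `select_yield_points` are hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical yield-point candidate; the real instrumentation's measures are not documented here.
struct CandidateLoop {
    int id;
    std::size_t cost;       // e.g. bytes of SPIR-V added by the yield check (assumed > 0)
    std::set<int> regions;  // long-running regions a yield here would break up
};

// Greedy by marginal gain per unit cost while the budget allows, then fall back
// to the best single affordable loop if it covers more (constant-factor guarantee).
std::vector<int> select_yield_points(const std::vector<CandidateLoop>& loops, std::size_t budget) {
    std::vector<bool> taken(loops.size(), false);
    std::set<int> covered;
    std::vector<int> chosen;
    std::size_t spent = 0;
    for (;;) {
        int best = -1;
        double best_ratio = 0.0;
        for (std::size_t i = 0; i < loops.size(); ++i) {
            // Zero-cost candidates are skipped to avoid dividing by zero.
            if (taken[i] || loops[i].cost == 0 || spent + loops[i].cost > budget) continue;
            std::size_t gain = 0;  // regions not yet covered by earlier picks
            for (int r : loops[i].regions) gain += covered.count(r) == 0 ? 1 : 0;
            const double ratio = static_cast<double>(gain) / static_cast<double>(loops[i].cost);
            if (gain > 0 && ratio > best_ratio) { best_ratio = ratio; best = static_cast<int>(i); }
        }
        if (best < 0) break;  // nothing affordable adds coverage
        taken[best] = true;
        spent += loops[best].cost;
        covered.insert(loops[best].regions.begin(), loops[best].regions.end());
        chosen.push_back(loops[best].id);
    }
    const CandidateLoop* single = nullptr;
    for (const auto& l : loops)
        if (l.cost > 0 && l.cost <= budget && (!single || l.regions.size() > single->regions.size()))
            single = &l;
    if (single && single->regions.size() > covered.size()) return {single->id};
    return chosen;
}
```

For example, with loops `{id 1, cost 10, {1,2,3}}`, `{id 2, cost 5, {3,4}}`, `{id 3, cost 20, {1..6}}` and a budget of 15, greedy picks loop 2 then loop 1; raising the budget to 20 makes the single loop 3 (six regions) the better answer.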
Enable file logging with `--log` (writes to `/tmp/vkmps.log` by default), or set a custom path with `--log-path /path/to/log`:

```sh
./build/vkmps-server --log -s /tmp/vkmps.sock
```

The log records timestamped entries for:
- Server lifecycle (start, stop, thread state)
- Vulkan device discovery (name, vendor, version, queues)
- Client connections and disconnections
- Resource operations (program registration, buffer allocation/free, data transfer)
- Dispatch submissions, group sizes, priority scheduling
- All failures and errors
Example log output:
```text
[2026-05-20 20:05:18.851] [INFO] vkmps-server logging started (PID=50236)
[2026-05-20 20:05:18.877] [INFO] Vulkan device found: llvmpipe ... Vulkan 1.4.318
[2026-05-20 20:05:18.891] [INFO] Created 1 compute queue(s): [q0@1.000000]
[2026-05-20 20:05:18.892] [INFO] Server listening on /tmp/vkmps.sock
[2026-05-20 20:05:19.123] [INFO] Client 1 connected (PID=12345, name=client1, priority=MEDIUM)
[2026-05-20 20:05:19.456] [INFO] Registered program 1: compute_multiply
[2026-05-20 20:05:19.789] [INFO] Allocated buffer 1 (4096 bytes) for client 1
[2026-05-20 20:05:20.012] [INFO] Submitted dispatch 1 (4x1x1) prio=REALTIME for client 1
[2026-05-20 20:05:20.234] [INFO] Client 1 disconnected
```
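The timestamp format in the example can be produced with `localtime_r` and `strftime`. This is a minimal sketch assuming local time and millisecond precision; `format_log_line` and `LogLevel` are hypothetical, and since only the `INFO` tag appears in the example output, `WARN` and `ERROR` are assumed names.

```cpp
#include <chrono>
#include <cstdio>
#include <ctime>
#include <string>

enum class LogLevel { Info, Warn, Error };

const char* level_tag(LogLevel level) {
    switch (level) {
        case LogLevel::Info: return "INFO";
        case LogLevel::Warn: return "WARN";
        case LogLevel::Error: return "ERROR";
    }
    return "?";
}

// Produces "[YYYY-MM-DD HH:MM:SS.mmm] [LEVEL] message" in local time.
std::string format_log_line(std::chrono::system_clock::time_point tp, LogLevel level,
                            const std::string& msg) {
    using namespace std::chrono;
    const long long ms = duration_cast<milliseconds>(tp.time_since_epoch()).count();
    const std::time_t secs = static_cast<std::time_t>(ms / 1000);
    std::tm local{};
    localtime_r(&secs, &local);  // POSIX, thread-safe variant of localtime()
    char stamp[32];
    std::strftime(stamp, sizeof(stamp), "%Y-%m-%d %H:%M:%S", &local);
    char prefix[64];
    std::snprintf(prefix, sizeof(prefix), "[%s.%03d] [%s] ", stamp,
                  static_cast<int>(ms % 1000), level_tag(level));
    return prefix + msg;
}
```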
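Synchronous dispatch:

```c
#include <vkmps/client.h>

// Connect to the daemon on its default socket path.
vkmps_client_t client = vkmps_connect("/tmp/vkmps.sock");
// Register a compute shader; `spirv` and `size` describe the SPIR-V binary.
vkmps_program_t prog = vkmps_register_program(client, "my_shader", spirv, size);
// Allocate a 4096-byte storage buffer and upload the input from `data`.
vkmps_buffer_t buf = vkmps_alloc_buffer(client, 4096, VKMPS_BUFFER_USAGE_STORAGE);
vkmps_write_buffer(client, buf, 0, 4096, data);
// Dispatch with (gx, gy, gz) = (64, 1, 1) and a 4-byte push constant `pc`,
// using `buf` as both the single input and the single output buffer.
vkmps_submission_t sub = vkmps_submit(client, prog, 64, 1, 1, &pc, 4,
                                      &buf, 1, &buf, 1);
vkmps_wait(client, sub, 0);
// Read the results back into `output`, then disconnect.
vkmps_read_buffer(client, buf, 0, 4096, output);
vkmps_disconnect(client);
```

Non-blocking dispatch with `vkmps_submit_nb`:

```c
// Submit returns immediately; the work runs in a background thread.
vkmps_async_handle_t* handle = vkmps_submit_nb(client, prog, 64, 1, 1,
                                               &pc, 4, &buf, 1, &buf, 1);

// Do other work while the dispatch is in flight...

// Block until the dispatch completes, then release the handle.
vkmps_submit_wait(handle, 0);
vkmps_submit_handle_free(handle);
```

`vkmps_submit_nb` uses `std::async` internally to perform the submit and wait on a background thread, so the calling thread can overlap its own work with GPU execution.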
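The `std::async` pattern described above can be sketched independently of the vkmps API. Here `AsyncHandle`, `submit_nb`, `submit_wait`, `submit_handle_free`, and the status codes are hypothetical stand-ins, and the blocking submit + wait is passed in as a callable. Treating a timeout of 0 as "wait forever" is an assumption of this sketch, not documented behaviour of `vkmps_submit_wait`.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <future>
#include <utility>

// Hypothetical status codes (the real API returns VKMPS_OK or an error).
enum SketchStatus { kSketchOk = 0, kSketchTimeout = 1 };

struct AsyncHandle {
    std::shared_future<int> result;  // shared_future so waiting may be repeated
};

// Runs the blocking submit + wait on a background thread via std::async.
AsyncHandle* submit_nb(std::function<int()> blocking_submit_and_wait) {
    return new AsyncHandle{
        std::async(std::launch::async, std::move(blocking_submit_and_wait)).share()};
}

// timeout_ns == 0 means "wait forever" in this sketch.
int submit_wait(AsyncHandle* handle, std::uint64_t timeout_ns) {
    if (timeout_ns != 0 &&
        handle->result.wait_for(std::chrono::nanoseconds(timeout_ns)) != std::future_status::ready)
        return kSketchTimeout;
    return handle->result.get();
}

// Destroying the last reference to a std::async shared state blocks until the
// task finishes, so freeing an in-flight handle waits for the dispatch.
void submit_handle_free(AsyncHandle* handle) { delete handle; }
```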
| Function | Description |
|---|---|
| `vkmps_submit_nb(client, prog, gx, gy, gz, pc, pc_size, in, ni, out, no)` | Non-blocking submit; returns a `vkmps_async_handle_t*` immediately |
| `vkmps_submit_wait(handle, timeout_ns)` | Blocks until the dispatch completes; returns `VKMPS_OK` or an error |
| `vkmps_submit_handle_free(handle)` | Releases the async handle |
To connect with an explicit scheduling priority:

```c
vkmps_client_t client = vkmps_connect_with_priority(path, VKMPS_PRIORITY_REALTIME);
```

Architecture:

```text
┌────────────┐  ┌────────────┐  ┌────────────┐
│  Client A  │  │  Client B  │  │  Client C  │
│ (process)  │  │ (process)  │  │ (process)  │
└─────┬──────┘  └─────┬──────┘  └─────┬──────┘
      │ Unix socket   │ Unix socket   │ Unix socket
      └──────┬────────┴────────┬──────┘
             │                 │
      ┌──────▼─────────────────▼──────┐
      │      vkmps-server daemon      │
      │  ┌─────────────────────────┐  │
      │  │    Submission Thread    │  │
      │  │   (VkQueue dispatch)    │  │
      │  ├─────────────────────────┤  │
      │  │ Accept + Client Threads │  │
      │  │   (protocol handling)   │  │
      │  └─────────────────────────┘  │
      │           VkDevice            │
      └───────────────────────────────┘
```
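The split between per-client protocol threads and a single submission thread suggests a shared request queue between them. The sketch below assumes strict priority ordering with FIFO order inside each level; `SubmissionQueue`, `DispatchRequest`, and `Priority` are hypothetical, and the real scheduler (including its cooperative-fairshare mode) may order work differently. Strict priority like this can starve LOW clients under sustained REALTIME load.

```cpp
#include <array>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <optional>

enum Priority { kLow = 0, kMedium = 1, kHigh = 2, kRealtime = 3 };

struct DispatchRequest {
    std::uint64_t id;
    std::uint32_t client;
    Priority priority;
};

// Client threads push() parsed requests; the submission thread pop()s them,
// always taking the highest non-empty priority level, FIFO within a level.
class SubmissionQueue {
public:
    void push(const DispatchRequest& r) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            levels_[r.priority].push_back(r);
        }
        cv_.notify_one();
    }

    // Blocks until a request is available. After shutdown(), queued requests
    // are still drained; nullopt is returned once shut down and empty.
    std::optional<DispatchRequest> pop() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stopped_ || !all_empty(); });
        for (int p = kRealtime; p >= kLow; --p) {
            auto& level = levels_[p];
            if (!level.empty()) {
                DispatchRequest r = level.front();
                level.pop_front();
                return r;
            }
        }
        return std::nullopt;
    }

    void shutdown() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopped_ = true;
        }
        cv_.notify_all();
    }

private:
    bool all_empty() const {
        for (const auto& level : levels_)
            if (!level.empty()) return false;
        return true;
    }
    std::mutex mu_;
    std::condition_variable cv_;
    std::array<std::deque<DispatchRequest>, 4> levels_;
    bool stopped_ = false;
};
```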
- Daemon owns the VkDevice — clients are thin IPC proxies over Unix domain sockets
- Binary protocol — fixed-header wire format, no serialization library dependency
- Priority scheduling — 4 client priorities (LOW/MEDIUM/HIGH/REALTIME) route to up to 4 compute queues with descending priorities
- VK_EXT_global_priority — when available, GPU-side priority queuing (REALTIME through LOW)
- VK_ARM_scheduling_controls — when detected on ARM Mali GPUs, enables shader core count control
- Cooperative scheduling — dispatch dimensions dynamically scaled based on submission backlog depth and client priority
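To illustrate a fixed-header binary format with no serialization library, here is a sketch using explicit little-endian encoding. The 16-byte layout, the field names, and the `VKMP` magic value are invented for illustration and are not the actual vkmps wire format; `encode_message` and `decode_message` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical 16-byte header.
struct MsgHeader {
    std::uint32_t magic = 0;
    std::uint16_t version = 0;
    std::uint16_t type = 0;         // message type (connect, alloc, submit, ...)
    std::uint32_t payload_len = 0;  // bytes that follow the header
    std::uint32_t request_id = 0;   // lets a client match replies to requests
};
struct Message { MsgHeader header; std::vector<std::uint8_t> payload; };

constexpr std::uint32_t kMagic = 0x504D4B56;  // bytes 'V','K','M','P' in little-endian order
constexpr std::uint16_t kVersion = 1;
constexpr std::size_t kHeaderSize = 16;

// Byte-by-byte encoding is independent of host endianness and struct padding.
template <typename T>
void put_le(std::vector<std::uint8_t>& out, T v) {
    for (std::size_t i = 0; i < sizeof(T); ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

template <typename T>
bool get_le(const std::uint8_t*& p, const std::uint8_t* end, T& v) {
    if (static_cast<std::size_t>(end - p) < sizeof(T)) return false;
    v = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) v = static_cast<T>(v | (static_cast<T>(p[i]) << (8 * i)));
    p += sizeof(T);
    return true;
}

std::vector<std::uint8_t> encode_message(std::uint16_t type, std::uint32_t request_id,
                                         const std::vector<std::uint8_t>& payload) {
    std::vector<std::uint8_t> out;
    out.reserve(kHeaderSize + payload.size());
    put_le(out, kMagic);
    put_le(out, kVersion);
    put_le(out, type);
    put_le(out, static_cast<std::uint32_t>(payload.size()));
    put_le(out, request_id);
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

// Expects exactly one message per buffer; rejects short buffers, a foreign
// magic or version, and payloads whose length disagrees with the header.
std::optional<Message> decode_message(const std::vector<std::uint8_t>& buf) {
    const std::uint8_t* p = buf.data();
    const std::uint8_t* end = p + buf.size();
    Message m;
    MsgHeader& h = m.header;
    if (!get_le(p, end, h.magic) || !get_le(p, end, h.version) || !get_le(p, end, h.type) ||
        !get_le(p, end, h.payload_len) || !get_le(p, end, h.request_id))
        return std::nullopt;
    if (h.magic != kMagic || h.version != kVersion) return std::nullopt;
    if (static_cast<std::size_t>(end - p) != h.payload_len) return std::nullopt;
    m.payload.assign(p, end);
    return m;
}
```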
Extensions are probed at runtime and enabled only when the underlying GPU supports them:
| Extension | Scope | Effect |
|---|---|---|
| `VK_EXT_global_priority` | Cross-vendor | Per-queue GPU priority (REALTIME/HIGH/MEDIUM/LOW) |
| `VK_ARM_scheduling_controls` | ARM Mali | Shader core count control for compute dispatches |
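A sketch of runtime probing plus one possible priority-to-queue routing rule. In a real server the extension list would come from `vkEnumerateDeviceExtensionProperties`; here it is a plain vector so the example runs without a Vulkan driver. The `GlobalPriority` values mirror `VkQueueGlobalPriorityEXT`; the clamping rule in `queue_for` and all function names are assumptions.

```cpp
#include <algorithm>
#include <string>
#include <vector>

enum class ClientPriority { Low = 0, Medium = 1, High = 2, Realtime = 3 };

// Same numeric values as VkQueueGlobalPriorityEXT in the Vulkan headers.
enum GlobalPriority : int {
    kGlobalPriorityLow = 128,
    kGlobalPriorityMedium = 256,
    kGlobalPriorityHigh = 512,
    kGlobalPriorityRealtime = 1024,
};

struct EnabledFeatures {
    bool global_priority = false;
    bool arm_scheduling_controls = false;
    std::vector<std::string> extensions;  // names to request at device creation
};

// Enable each optional extension only if the device reports it.
EnabledFeatures probe_extensions(const std::vector<std::string>& available) {
    auto has = [&](const char* name) {
        return std::find(available.begin(), available.end(), name) != available.end();
    };
    EnabledFeatures f;
    if (has("VK_EXT_global_priority")) {
        f.global_priority = true;
        f.extensions.push_back("VK_EXT_global_priority");
    }
    if (has("VK_ARM_scheduling_controls")) {
        f.arm_scheduling_controls = true;
        f.extensions.push_back("VK_ARM_scheduling_controls");
    }
    return f;
}

// Queue 0 is the highest-priority queue; with fewer than four queues,
// lower client priorities share the last one.
int queue_for(ClientPriority p, int queue_count) {
    const int ideal = 3 - static_cast<int>(p);  // Realtime -> 0, ..., Low -> 3
    return std::min(ideal, queue_count - 1);
}

// Global priority requested for queue i. Drivers may refuse elevated levels
// with VK_ERROR_NOT_PERMITTED_EXT, so a real implementation needs a fallback.
GlobalPriority global_priority_for_queue(int queue_index) {
    static const GlobalPriority kByQueue[] = {kGlobalPriorityRealtime, kGlobalPriorityHigh,
                                              kGlobalPriorityMedium, kGlobalPriorityLow};
    return kByQueue[std::clamp(queue_index, 0, 3)];
}
```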
```sh
cd build && ctest -V
```

| Test binary | Tests | What it covers |
|---|---|---|
| `test_protocol` | 31 | Writer/Reader, round-trip of every message type, edge cases, priority helpers |
| `test_integration` | 5 | Socketpair-based IPC: handshake, lifecycle, concurrent clients, priority round-trip, dispatch round-trip |
| `test_vulkan_integration` | 6 | Real Vulkan: buffer read/write, compute dispatch, sequential dispatches, concurrent clients, large dispatch, async dispatch |
The Vulkan integration tests require a Vulkan-capable GPU or driver (e.g., llvmpipe); they skip gracefully when no Vulkan device is available.
Boost Software License 1.0 — see LICENSE_1_0.txt.