Add --gpu flag for Metal/MLX inference in Linux containers#1314

Open
ilessiorobotflowlabs wants to merge 1 commit into apple:main from RobotFlow-Labs:feature/gpu-support

Conversation

@ilessiorobotflowlabs

Summary

This PR adds GPU acceleration support for Linux containers running on Apple Silicon. When --gpu is passed to container run, the runtime injects vsock environment variables into the guest VM, enabling Python code inside the container to access the host Metal GPU for ML inference at near-native speed.

76 lines changed. 2 files touched.

The problem

Apple containers run Linux in lightweight VMs on Apple Silicon, but there is currently no way to access the Metal GPU from inside those VMs. Metal and MLX cannot run in Linux guests: Apple's Virtualization framework does not expose the GPU to guests, and Metal has no Linux driver. This is a platform limitation, not something a container runtime can work around on its own.

Developers working with ML models in containers today have two options: CPU-only inference (~5% of native speed) or running everything on the host outside any container.

The approach

Rather than attempting GPU passthrough (which Apple's Virtualization framework does not support), this PR takes the same host-guest bridge approach that vminitd already uses for container management: vsock.

A host-side daemon (container-toolkit-mlx) runs with direct Metal/MLX access and serves inference requests over gRPC through the vsock channel. Code inside the container uses a lightweight Python client (pip install mlx-container) that proxies requests to the host GPU.

Container (Linux VM) --[gRPC over vsock]--> Host Daemon (MLX/Metal GPU)

This is conceptually similar to how NVIDIA's container toolkit brokers GPU access for Linux containers, adapted here to Apple Silicon's unified memory model. (NVIDIA's toolkit mounts host drivers and devices into the container; this design instead proxies inference requests over RPC.)
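The guest-to-host path above can be sketched with Python's built-in vsock support. This is a rough illustration only: the CID and port values come from this PR, but the helper names are hypothetical, and the real client runs gRPC over the stream rather than using it directly.

```python
import socket

# CID 2 is the well-known vsock address of the host; 2048 matches
# this PR's default --gpu-port. The function names below are
# illustrative, not the toolkit's real API.
HOST_CID = 2
DEFAULT_PORT = 2048

def vsock_address(cid=HOST_CID, port=DEFAULT_PORT):
    """Build the (cid, port) pair a guest uses to reach the host daemon."""
    return (cid, port)

def connect_to_host_gpu(port=DEFAULT_PORT):
    """Open a raw vsock stream to the host-side inference daemon.

    AF_VSOCK is only available inside a Linux guest; the real client
    would layer gRPC on top of this stream.
    """
    sock = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    sock.connect(vsock_address(port=port))
    return sock
```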

What this PR adds

Two files, 76 lines total:

Flags.swift -- new Flags.GPU struct:

  • --gpu flag to enable GPU access
  • --gpu-model <id> to pre-load a HuggingFace model on container start
  • --gpu-memory <gb> for per-container GPU memory budgets
  • --gpu-max-tokens <n> to cap inference request size
  • --gpu-port <port> for custom vsock port (default: 2048)

ContainerRun.swift -- GPU environment injection:

  • When --gpu is set, injects MLX_VSOCK_CID, MLX_VSOCK_PORT, MLX_GPU_ENABLED into the container environment
  • Optionally injects MLX_GPU_MODEL and MLX_GPU_MEMORY
  • Logs GPU configuration at info level
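On the guest side, client code only has to read the injected variables. A minimal sketch of that parsing, assuming the variable names from this PR; the truthy value of MLX_GPU_ENABLED and the helper name are assumptions, not the toolkit's actual behavior:

```python
import os
from typing import Optional

def gpu_config_from_env(env: Optional[dict] = None) -> Optional[dict]:
    """Parse the MLX_* variables injected by `container run --gpu`.

    Returns None when the container was started without --gpu.
    The exact value of MLX_GPU_ENABLED is an assumption here.
    """
    env = os.environ if env is None else env
    if not env.get("MLX_GPU_ENABLED"):
        return None
    return {
        "cid": int(env.get("MLX_VSOCK_CID", "2")),
        "port": int(env.get("MLX_VSOCK_PORT", "2048")),
        "model": env.get("MLX_GPU_MODEL"),       # optional, from --gpu-model
        "memory_gb": env.get("MLX_GPU_MEMORY"),  # optional, from --gpu-memory
    }
```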

Usage

# Run GPU-accelerated inference inside a Linux container
container run --gpu --gpu-model mlx-community/Llama-3.2-1B-4bit \
  ubuntu:latest python3 -c "
from mlx_container import generate, load_model
load_model('mlx-community/Llama-3.2-1B-4bit')
result = generate('Explain Apple Silicon', model='mlx-community/Llama-3.2-1B-4bit')
print(result.text)
print(f'{result.tokens_per_second:.0f} tok/s on host Metal GPU')
"

Performance

Tested on Apple M5, 24 GB unified memory:

Method                               Tokens/sec   Runs in container
This PR + container-toolkit-mlx      99 tok/s     Yes
Native MLX (macOS, no container)     ~103 tok/s   No
CPU fallback (no GPU)                ~5 tok/s     Yes

Roughly 96% of native Metal performance (99 vs ~103 tok/s). The remaining overhead comes from vsock serialization and the gRPC round trip.
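The overhead figure follows directly from the numbers in the table:

```python
native = 103.0   # tok/s, native MLX on macOS (approximate)
bridged = 99.0   # tok/s, through the vsock/gRPC bridge
cpu = 5.0        # tok/s, CPU-only fallback

relative = bridged / native       # 99 / 103 ≈ 0.96 of native throughput
speedup_over_cpu = bridged / cpu  # ≈ 20x faster than CPU-only inference
```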

The companion toolkit

This PR is the integration point. The heavy lifting lives in container-toolkit-mlx, an open-source toolkit that provides:

  • mlx-container-daemon -- host-side gRPC server with MLX model management
  • mlx-ctk -- CLI for GPU discovery, daemon lifecycle, CDI spec generation
  • mlx-cdi-hook -- OCI prestart hook for automatic daemon startup
  • mlx-container -- Python client library (OpenAI + Anthropic API compatible)
  • CDI v0.5.0 spec support (apple.com/gpu)
  • 259 tests (Swift + Python), security audited

The toolkit follows the same architectural patterns as this project: vsock for host-guest communication, gRPC for the wire protocol, Swift for the host-side components.

Why this belongs upstream

  1. The flags are inert without the toolkit -- if container-toolkit-mlx is not installed, --gpu simply injects environment variables that nothing reads. Zero risk to existing users.

  2. The vsock channel already exists -- this PR adds no new transport. It reuses the same vsock path that vminitd uses.

  3. Developers expect it -- GPU support is the most-requested feature for Apple containers. This gives them a path forward.

  4. 76 lines -- this is as minimal as a GPU integration can be. All complexity lives in the external toolkit.

Test plan

  • container run --gpu --help shows GPU flags
  • container run without --gpu behaves identically to before (no GPU env vars injected)
  • container run --gpu injects MLX_VSOCK_CID=2 and MLX_VSOCK_PORT=2048 into container env
  • container run --gpu --gpu-model X additionally injects MLX_GPU_MODEL=X
  • End-to-end inference from container at 99 tok/s verified on M5

Built by RobotFlow Labs | container-toolkit-mlx

Adds GPU acceleration support for Linux containers on Apple Silicon
through the MLX Container Toolkit. When --gpu is passed to
`container run`, the runtime injects vsock environment variables
into the guest VM, enabling code inside the container to access the
host's Metal GPU for ML inference.

Architecture:
  Container (Linux VM) --[gRPC over vsock]--> Host daemon (MLX/Metal)

The host-side daemon (mlx-container-daemon) manages model loading
and serves inference requests over the same vsock channel that
vminitd already uses for container management. No GPU drivers or
Metal frameworks are needed inside the Linux guest.

New flags on `container run`:
  --gpu                  Enable GPU access
  --gpu-model <id>       Pre-load a HuggingFace model
  --gpu-memory <gb>      GPU memory budget
  --gpu-max-tokens <n>   Max tokens per request
  --gpu-port <port>      vsock port (default: 2048)

Example:
  container run --gpu --gpu-model mlx-community/Llama-3.2-1B-4bit \
    ubuntu:latest python3 -c \
    "from mlx_container import generate; print(generate('Hello', model='mlx-community/Llama-3.2-1B-4bit').text)"

Requires: https://github.com/RobotFlow-Labs/container-toolkit-mlx

Signed-off-by: ilessio <ilessio@aiflowlabs.io>