feat: Apple Silicon (MPS) local generation support #99
Sergio Gil Jiménez (jimeneztion) wants to merge 3 commits into Lightricks:main from
Summary
Enables local AI video generation on Apple Silicon Macs. Previously, all macOS
users were forced to use the LTX cloud API regardless of hardware. With this
change, users with ≥15 GB of unified memory can run full local inference on
their GPU via Metal Performance Shaders (MPS).
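The hardware gate described above can be sketched as a small device-selection helper. This is an illustrative sketch, not the PR's code: `pick_device`, the 15 GB floor as a parameter, and the `psutil` memory check are all assumptions about how such a gate might look.

```python
import torch


def pick_device(min_unified_memory_gb: int = 15) -> torch.device:
    """Pick an inference device, gating MPS on unified memory.

    Hypothetical helper: the 15 GB floor mirrors the PR's stated
    requirement; psutil stands in for however the backend actually
    measures unified memory.
    """
    if torch.backends.mps.is_available() and torch.backends.mps.is_built():
        import psutil  # assumption: not necessarily what the backend uses

        total_gb = psutil.virtual_memory().total / 1024**3
        if total_gb >= min_unified_memory_gb:
            return torch.device("mps")
    if torch.cuda.is_available():
        return torch.device("cuda")
    # Fallback: cloud API or CPU path for under-provisioned machines
    return torch.device("cpu")
```

On a Mac below the memory floor this falls through to the non-MPS branches, matching the previous behaviour of routing macOS users to the cloud API.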
The work has three layers:
1. Low-level MPS compatibility patches (`backend/services/patches/`)

The upstream `ltx_pipelines` library is CUDA-first and makes several API calls that don't exist on MPS. We monkey-patch at server startup to fix them without forking upstream:
- `mps_gpu_model_fix.py`: replaces CUDA stream-based async layer streaming with a synchronous MPS-aware wrapper. On CUDA, layers are streamed to the GPU using async CUDA streams for overlap; on MPS we do it synchronously since Metal has no equivalent primitive.
- `mps_layer_streaming_fix.py`: skips `pin_memory()` when moving layers to MPS. Pinned host memory is CUDA-only; calling it on MPS silently corrupts tensors.
- `mps_vocoder_fix.py`: fixes a float32 dtype mismatch in the vocoder. MPS autocast doesn't support float32, so we temporarily cast the model weights instead of relying on autocast.
- `safetensors_loader_fix.py`: sets `non_blocking=False` and `copy=False` when moving memory-mapped tensors to MPS. Async transfers on mmap buffers can segfault on Metal.
- `ltx_text_encoder.py`: inlines device-aware memory cleanup to avoid unconditional `torch.cuda.synchronize()` calls.

2. Local generation policy (`backend/runtime_config/runtime_policy.py`)

All four pipelines (`fast`, `a2v`, `ic_lora`, `retake`) set `streaming_prefetch_count=1` on MPS (synchronous) instead of `2` (async with CUDA streams). This prevents OOM when streaming the transformer without CUDA stream overlap.
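The CUDA-vs-MPS transfer difference behind the streaming and loader patches can be condensed into one device-aware move. This is a minimal sketch under stated assumptions: `move_layer` is a hypothetical helper, not the PR's actual wrapper, but it shows why pinning and `non_blocking=True` are CUDA-only.

```python
import torch


def move_layer(layer: torch.nn.Module, device: torch.device) -> torch.nn.Module:
    """Move a layer to the target device, CUDA-async or MPS-sync.

    Sketch only. On CUDA, pinned (page-locked) host memory plus
    non_blocking=True lets the host-to-device copy overlap compute on a
    CUDA stream. MPS has neither pinned memory nor a stream primitive,
    so the MPS path is a plain synchronous .to(), consistent with
    streaming_prefetch_count=1.
    """
    if device.type == "cuda":
        for p in layer.parameters():
            # CUDA-only: replace each parameter with a pinned-host copy
            p.data = p.data.pin_memory()
        # Async copy; safe because the source buffers are pinned
        return layer.to(device, non_blocking=True)
    # MPS/CPU: synchronous move, no pinning, no async mmap transfer
    return layer.to(device, non_blocking=False)
```

The synchronous branch trades the overlap of compute and transfer for correctness: on Metal, skipping `pin_memory()` avoids the silent tensor corruption noted above, and the blocking copy avoids the mmap segfaults the safetensors patch works around.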
3. Warmup skip on MPS (`health_handler.py`, `pipelines_handler.py`)

A full inference warmup at startup takes several minutes on a cold Metal device and blocks all generation requests. On MPS we skip warmup: the pipeline loads, and the first real generation acts as the warmup. For CUDA, the warmup state machine is unchanged but now correctly manages WARMING → WARM transitions, so requests that arrive during warmup wait instead of failing.
How to test

- `pnpm dev` → `[t2v] Generation started (model=fast, ...)` with no CUDA errors
- `pnpm typecheck && pnpm backend:test`