
Commit e3a64e0

feat(diffusers): add Shutdown method to release GPU memory
Add Shutdown method to the diffusers backend that properly releases GPU memory when a model is unloaded. This enables dynamic model reloading with different configurations (e.g., switching LoRA adapters) without restarting the service.

The Shutdown method:

- Releases the pipeline, controlnet, and compel objects
- Clears CUDA cache with torch.cuda.empty_cache()
- Resets state flags (img2vid, txt2vid, ltx2_pipeline)

This works with LocalAI's existing /backend/shutdown API endpoint, which terminates the gRPC process. The explicit cleanup ensures GPU memory is properly released before process termination.

Tested with Qwen-Image (~95GB) on NVIDIA H20 GPUs.
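
For reference, unloading can then be triggered through that endpoint without restarting LocalAI. A minimal sketch, assuming LocalAI is listening on localhost:8080 and that /backend/shutdown accepts a JSON body with a "model" field (the model name below is hypothetical; check your LocalAI version's API docs):

import json
import urllib.request

# Ask LocalAI to shut down the backend serving the named model; this is what
# ultimately invokes the diffusers backend's Shutdown method over gRPC.
req = urllib.request.Request(
    "http://localhost:8080/backend/shutdown",
    data=json.dumps({"model": "qwen-image"}).encode("utf-8"),  # hypothetical model name
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode("utf-8"))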
1 parent 3f48145 commit e3a64e0

1 file changed

Lines changed: 41 additions & 0 deletions

backend/python/diffusers/backend.py

@@ -443,6 +443,47 @@ def _load_pipeline(self, request, modelFile, fromSingleFile, torchType, variant)
     def Health(self, request, context):
         return backend_pb2.Reply(message=bytes("OK", 'utf-8'))
 
+    def Shutdown(self, request, context):
+        """
+        Shutdown and release GPU memory for the loaded model.
+        This allows dynamic model reloading with different configurations (e.g., different LoRA adapters).
+        """
+        try:
+            print("Shutting down diffusers backend...", file=sys.stderr)
+
+            # Release pipeline
+            if hasattr(self, 'pipe') and self.pipe is not None:
+                del self.pipe
+                self.pipe = None
+
+            # Release controlnet
+            if hasattr(self, 'controlnet') and self.controlnet is not None:
+                del self.controlnet
+                self.controlnet = None
+
+            # Release compel
+            if hasattr(self, 'compel') and self.compel is not None:
+                del self.compel
+                self.compel = None
+
+            # Clear CUDA cache to release GPU memory
+            if torch.cuda.is_available():
+                torch.cuda.empty_cache()
+                torch.cuda.synchronize()
+                print("CUDA cache cleared", file=sys.stderr)
+
+            # Reset state flags
+            self.img2vid = False
+            self.txt2vid = False
+            self.ltx2_pipeline = False
+            self.options = {}
+
+            print("Diffusers backend shutdown complete", file=sys.stderr)
+            return backend_pb2.Result(message="Model unloaded successfully", success=True)
+        except Exception as err:
+            print(f"Error during shutdown: {err}", file=sys.stderr)
+            return backend_pb2.Result(success=False, message=f"Shutdown error: {err}")
+
     def LoadModel(self, request, context):
         try:
             print(f"Loading model {request.Model}...", file=sys.stderr)
