Merged
9 changes: 9 additions & 0 deletions README.md
@@ -61,6 +61,12 @@ http://127.0.0.1:8765/
pip install torch
pip install keep-gpu
```
- **Mac M series (M1/M2/M3/M4)**
```bash
pip install torch
pip install keep-gpu[macm]

```

Review comment (P2): Remove unsupported `macm` extra from install command

The new installation snippet tells users to run `pip install keep-gpu[macm]`, but this repository’s `pyproject.toml` only defines `dev` and `rocm` extras, so `macm` is not a real extra. Users following this path get an unknown-extra install warning and no guaranteed Mac-specific dependency set, which makes onboarding and support guidance unreliable.
Uses Metal Performance Shaders (MPS) backend on Apple Silicon.

Flags that matter:

@@ -132,6 +138,9 @@ with GlobalGPUController(gpu_ids=[0, 1], vram_to_keep="750MB", interval=90, busy
```
- Methods: `start_keep`, `stop_keep` (optional `job_id`, default stops all), `status` (optional `job_id`), `list_gpus` (basic info).
- Dashboard: `http://127.0.0.1:8765/`
- **Mac M series limitations:**
- GPU utilization monitoring is not available on macOS.
- `busy_threshold` parameter is accepted for API compatibility but has no effect.
- Minimal client config (stdio MCP):
```yaml
servers:
14 changes: 7 additions & 7 deletions docs/concepts/architecture.md
@@ -7,12 +7,13 @@ schedulers that the GPU is still busy, without burning a full training workload.
## Components

1. **CLI (Typer/Rich)** – Parses options, validates GPU IDs, and configures the logger.
2. **`GlobalGPUController`** – Detects the current platform (CUDA today) and
instantiates one single-GPU controller per selected device.
3. **`CudaGPUController`** – Owns the background thread, VRAM allocation, and small
matmul loops that tick every `interval` seconds.
2. **`GlobalGPUController`** – Detects the current platform (CUDA, ROCm,
or Mac M series) and instantiates one single-GPU controller per selected device.
3. **`CudaGPUController`** / **`RocmGPUController`** / **`MacMGPUController`** –
Platform-specific implementations for per-GPU keep-alive loops.
4. **GPU monitor (NVML/ROCm)** – Wraps `nvidia-ml-py` (the `pynvml` module) for CUDA
telemetry and optionally `rocm-smi` when installed by way of the `rocm` extra.
Mac M series does not support direct GPU utilization monitoring.
5. **Utilities** – `parse_size` turns strings like `1GiB` into bytes, while
`setup_logger` wires both console and file logging with optional colors.
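The `parse_size` behavior described here can be illustrated with a minimal standalone sketch (a hypothetical reimplementation — the real utility in `keep_gpu` may accept more suffixes and raise different errors):

```python
import re

# Hypothetical parse_size-style helper: both decimal (MB) and binary (MiB)
# suffixes are assumed to be supported, mirroring the "1GiB" example above.
_UNITS = {
    "B": 1,
    "KB": 10**3, "MB": 10**6, "GB": 10**9,
    "KIB": 2**10, "MIB": 2**20, "GIB": 2**30,
}


def parse_size(text: str) -> int:
    """Turn strings like '1GiB' or '750MB' into a byte count."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([A-Za-z]+)\s*", text)
    if not match:
        raise ValueError(f"unparseable size: {text!r}")
    value, unit = match.groups()
    try:
        factor = _UNITS[unit.upper()]
    except KeyError:
        raise ValueError(f"unknown unit in {text!r}") from None
    return int(float(value) * factor)


print(parse_size("1GiB"))   # 1073741824
print(parse_size("750MB"))  # 750000000
```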

@@ -53,6 +54,5 @@ Matrix multiplies:

## Platform detection

`get_platform()` inspects the system and currently only enables the CUDA path.
If you plan to contribute ROCm or CPU fallbacks, use this hook to branch into
new controller implementations without changing the CLI.
`get_platform()` inspects the system and enables the CUDA, ROCm, or Mac M series
(MPS) path. Detection order: CUDA → ROCm → Mac M → CPU fallback.
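The detection order reads naturally as a first-match scan over (platform, check) pairs — a sketch with stub checks standing in for the real probes (which inspect torch, `rocm-smi`, and the host OS):

```python
from enum import Enum
from typing import Callable, List, Set, Tuple


class ComputingPlatform(Enum):
    CPU = "cpu"
    CUDA = "cuda"
    ROCM = "rocm"
    MACM = "macm"


def make_checks(available: Set[str]) -> List[Tuple[ComputingPlatform, Callable[[], bool]]]:
    # Stub checks for illustration; the real ones probe torch/ROCm/macOS.
    return [
        (ComputingPlatform.CUDA, lambda: "cuda" in available),
        (ComputingPlatform.ROCM, lambda: "rocm" in available),
        (ComputingPlatform.MACM, lambda: "macm" in available),
        (ComputingPlatform.CPU, lambda: True),  # CPU fallback always passes
    ]


def get_platform(available: Set[str]) -> ComputingPlatform:
    """Return the first platform whose check passes (CUDA → ROCm → Mac M → CPU)."""
    for plat, check in make_checks(available):
        if check():
            return plat
    return ComputingPlatform.CPU


print(get_platform({"macm"}).value)  # macm
print(get_platform(set()).value)     # cpu
```

The ordering matters: an Apple Silicon host with a CUDA-capable eGPU setup (hypothetically) would still resolve to CUDA first, since earlier entries win.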
21 changes: 19 additions & 2 deletions docs/getting-started.md
@@ -11,8 +11,9 @@ understand the minimum knobs you need to keep a GPU occupied.

!!! info "Platforms"
CUDA is the primary path; ROCm is supported by way of the `rocm` extra
(requires a ROCm-enabled PyTorch build). CPU-only environments can import
the package but controllers will not start.
(requires a ROCm-enabled PyTorch build); Mac M series (M1/M2/M3/M4) is
supported by way of the `macm` extra using Metal Performance Shaders (MPS).
CPU-only environments can import the package but controllers will not start.

## Install

@@ -47,6 +48,15 @@ understand the minimum knobs you need to keep a GPU occupied.
pip install -e .[dev]
```

=== "Mac M series (M1/M2/M3/M4)"
```bash
pip install torch
pip install keep-gpu[macm]
```
MPS (Metal Performance Shaders) backend will be used automatically on
Apple Silicon Macs. Note: GPU utilization monitoring is not available
on macOS.

## Pick your interface

- **CLI** – fastest way to reserve GPUs from a shell; see [CLI Playbook](guides/cli.md).
@@ -60,6 +70,13 @@ understand the minimum knobs you need to keep a GPU occupied.
python -c "import torch; print(torch.cuda.device_count())"
```
A non-zero integer indicates CUDA is available.

On Mac M series:
```bash
python -c "import torch; print(torch.backends.mps.is_available())"
```
Should print `True` on Apple Silicon Macs.

2. Run the CLI in dry form (press `Ctrl+C` after a few seconds):
```bash
keep-gpu --interval 30 --vram 512MB
4 changes: 4 additions & 0 deletions pyproject.toml
@@ -49,6 +49,9 @@ dev = [
rocm = [
"rocm-smi",
]
macm = [
"psutil", # For system memory query on Apple Silicon
]

[project.urls]
bugs = "https://github.com/Wangmerlyn/KeepGPU/issues"
@@ -144,5 +147,6 @@ force-single-line = false
[tool.pytest.ini_options]
markers = [
"rocm: tests that require ROCm stack",
"macm: tests that require Apple Silicon with MPS",

Review comment (critical, Contributor):

The `macm` optional dependency is missing. The installation instructions in the documentation use `pip install keep-gpu[macm]`, which will fail without this definition. According to the PR description, this should also include the `psutil` dependency.

Please add the `macm` extra to `[project.optional-dependencies]`. For example:

[project.optional-dependencies]
dev = [...]
rocm = [...]
macm = [
  "psutil",
]

"large_memory: tests that use large VRAM",
]
19 changes: 19 additions & 0 deletions skills/gpu-keepalive-with-keepgpu/SKILL.md
@@ -73,6 +73,23 @@ For ROCm users from local checkout:
pip install -e ".[rocm]"
```

### Option D: Mac M series install

```bash
# Install PyTorch with MPS support
pip install torch

# Install KeepGPU with Mac M extras
pip install "keep_gpu[macm] @ git+https://github.com/Wangmerlyn/KeepGPU.git"
```

**Note for Mac M users:**
- Uses Metal Performance Shaders (MPS) backend automatically on Apple Silicon
- GPU utilization monitoring is **not available** on macOS (the system doesn't provide this API)
- The `--busy-threshold` parameter is accepted for API compatibility but has no effect
- Only device 0 is supported (MPS limitation)
- Memory allocation uses the unified memory architecture (shared with system RAM)
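Because GPU memory is unified with system RAM, a conservative keep-alive budget is a small fraction of total memory. A sketch of that sizing rule (the 10% default is an illustrative assumption, not a project recommendation; `psutil` — the `macm` extra's declared dependency — would supply the total in practice):

```python
def suggest_vram_bytes(total_bytes: int, fraction: float = 0.10) -> int:
    """Pick a keep-alive allocation as a fraction of unified memory.

    On Apple Silicon the GPU shares RAM with the system, so a large
    --vram value starves everything else. fraction=0.10 is an
    illustrative assumption, not a project recommendation.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be in (0, 1)")
    return int(total_bytes * fraction)


# e.g. on a 16 GiB machine; psutil.virtual_memory().total would supply this
print(suggest_vram_bytes(16 * 2**30))  # 1717986918
```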

Verify installation:

```bash
@@ -179,6 +196,8 @@ echo $! > keepgpu.pid
- Allocation failure / OOM: reduce `--vram` or free memory first.
- No utilization telemetry: ensure `nvidia-ml-py` works and `nvidia-smi` is available.
- No GPUs detected: verify drivers, CUDA/ROCm runtime, and `torch.cuda.device_count()`.
- Mac M series issues: ensure you are on macOS 12.3+ and that PyTorch is built with MPS support (`torch.backends.mps.is_available()` should return `True`).
- Memory allocation failures on Mac: the unified memory architecture behaves differently from discrete GPUs; try reducing `--vram` values.
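A guarded pre-flight check for the Mac items above might look like this (a sketch; it assumes nothing beyond a Python interpreter that may or may not have torch installed):

```python
import importlib.util


def mps_ready() -> str:
    """Report why MPS is (or is not) usable, without hard-failing."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch  # safe: existence verified above

    mps = getattr(torch.backends, "mps", None)
    if mps is None:
        return "this torch build has no MPS backend"
    if not mps.is_available():
        return "MPS present but unavailable (need macOS 12.3+ on Apple Silicon)"
    return "ok"


print(mps_ready())
```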

## Example

17 changes: 16 additions & 1 deletion src/keep_gpu/global_gpu_controller/global_gpu_controller.py
@@ -37,12 +37,27 @@ def __init__(
)

controller_cls = RocmGPUController
elif self.computing_platform == ComputingPlatform.MACM:
from keep_gpu.single_gpu_controller.macm_gpu_controller import (
MacMGPUController,
)

controller_cls = MacMGPUController
else:
raise NotImplementedError(
f"GlobalGPUController not implemented for platform {self.computing_platform}"
)

if gpu_ids is None:
if self.computing_platform == ComputingPlatform.MACM:
if gpu_ids is None:
self.gpu_ids = [0]
elif gpu_ids == [0]:
self.gpu_ids = gpu_ids
else:
raise ValueError(
f"MACM platform only supports gpu_ids=[0] or None, got {gpu_ids}"
)
elif gpu_ids is None:
self.gpu_ids = list(range(torch.cuda.device_count()))
else:
self.gpu_ids = gpu_ids
3 changes: 3 additions & 0 deletions src/keep_gpu/single_gpu_controller/__init__.py
@@ -0,0 +1,3 @@
from keep_gpu.single_gpu_controller.macm_gpu_controller import MacMGPUController

__all__ = ["MacMGPUController"]
165 changes: 165 additions & 0 deletions src/keep_gpu/single_gpu_controller/macm_gpu_controller.py
@@ -0,0 +1,165 @@
import gc
import threading
import time
from typing import Optional

import torch

from keep_gpu.single_gpu_controller.base_gpu_controller import BaseGPUController
from keep_gpu.utilities.logger import setup_logger
from keep_gpu.utilities.platform_manager import ComputingPlatform

logger = setup_logger(__name__)


class MacMGPUController(BaseGPUController):
def __init__(
self,
*,
rank: int = 0,
interval: float = 1.0,
vram_to_keep: str | int = "1000 MB",
busy_threshold: int = 10,
iterations: int = 5000,
):
super().__init__(vram_to_keep=vram_to_keep, interval=interval)
if rank != 0:
raise ValueError("MPS only supports device 0; rank must be 0")
if iterations <= 0:
raise ValueError("iterations must be positive")
if not torch.backends.mps.is_available():
raise RuntimeError("PyTorch MPS backend is not available")

self.rank = rank
self.device = torch.device("mps")
self.busy_threshold = busy_threshold
self.iterations = iterations
self.platform = ComputingPlatform.MACM

self._stop_evt: Optional[threading.Event] = None
self._thread: Optional[threading.Thread] = None
self._num_elements: Optional[int] = None

logger.debug(
"rank %s: busy_threshold=%s ignored on macOS MPS (API compatibility)",
self.rank,
self.busy_threshold,
)

def keep(self) -> None:
if self._thread and self._thread.is_alive():
logger.warning("rank %s: keep thread already running", self.rank)
return

self._num_elements = int(self.vram_to_keep)
Review comment (high, Contributor):

There appears to be a miscalculation in the number of tensor elements. `self.vram_to_keep` holds the requested memory in bytes, but it is being used directly as the number of elements for a `torch.float32` tensor. Since each float32 element occupies 4 bytes, this results in allocating 4 times the requested VRAM.

To allocate the correct amount of memory, you should divide the number of bytes by the size of the element type (4 for float32).

Suggested change:
- self._num_elements = int(self.vram_to_keep)
+ self._num_elements = int(self.vram_to_keep) // 4
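The byte-to-element arithmetic above can be sanity-checked in isolation (standalone sketch; `element_size=4` matches `torch.float32`):

```python
def bytes_to_elements(num_bytes: int, element_size: int = 4) -> int:
    """Convert a VRAM budget in bytes to a float32 element count.

    element_size=4 matches torch.float32; without this division the
    tensor would occupy element_size times the requested budget.
    """
    return num_bytes // element_size


# A 512 MB budget corresponds to this many float32 elements
print(bytes_to_elements(512 * 1024 * 1024))  # 134217728
```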

if self._num_elements <= 0:
raise ValueError("vram_to_keep must be positive")

self._stop_evt = threading.Event()
self._thread = threading.Thread(
target=self._keep_loop,
name=f"gpu-keeper-macm-{self.rank}",
daemon=True,
)
self._thread.start()
logger.info("rank %s: MPS keep thread started", self.rank)

def release(self) -> None:
if not (self._thread and self._thread.is_alive()):
logger.warning("rank %s: keep thread not running", self.rank)
return

stop_evt = self._stop_evt
if stop_evt is None:
logger.warning("rank %s: stop event missing; skipping release", self.rank)
return

stop_evt.set()
join_timeout = max(2.0, min(float(self.interval) + 2.0, 30.0))
self._thread.join(timeout=join_timeout)
if self._thread.is_alive():
logger.warning(
"rank %s: MPS keep thread did not stop within %.1fs",
self.rank,
join_timeout,
)
return

torch.mps.empty_cache()
gc.collect()
logger.info("rank %s: keep thread stopped & cache cleared", self.rank)

def __enter__(self):
self.keep()
return self

def __exit__(self, exc_type, exc, tb):
self.release()

def _keep_loop(self) -> None:
stop_evt = self._stop_evt
if stop_evt is None:
logger.error("rank %s: stop event not initialized", self.rank)
return

num_elements = self._num_elements if self._num_elements is not None else 0
if num_elements <= 0:
logger.error(
"rank %s: invalid vram_to_keep=%s", self.rank, self.vram_to_keep
)
return

tensor = None
while not stop_evt.is_set():
try:
tensor = torch.rand(
num_elements,
device=self.device,
dtype=torch.float32,
requires_grad=False,
)
break
except RuntimeError as exc:
logger.error("rank %s: failed to allocate tensor: %s", self.rank, exc)
torch.mps.empty_cache()
gc.collect()
if stop_evt.wait(self.interval):
return

if tensor is None:
logger.error("rank %s: failed to allocate tensor, exiting loop", self.rank)
return

while not stop_evt.is_set():
try:
self._run_batch(tensor)
if stop_evt.wait(self.interval):
break
except RuntimeError as exc:
if "out of memory" in str(exc).lower():
torch.mps.empty_cache()
gc.collect()
if stop_evt.wait(self.interval):
break
except Exception:
logger.exception("rank %s: unexpected error", self.rank)
if stop_evt.wait(self.interval):
break

@torch.no_grad()
def _run_batch(self, tensor: torch.Tensor) -> None:
stop_evt = self._stop_evt

tic = time.time()
for _ in range(self.iterations):
torch.relu_(tensor)
if stop_evt is not None and stop_evt.is_set():
break
torch.mps.synchronize()
toc = time.time()

logger.debug(
"rank %s: elementwise batch done - avg %.2f ms",
self.rank,
(toc - tic) * 1000 / max(1, self.iterations),
)
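The release-timeout clamp in the controller above (`max(2.0, min(float(self.interval) + 2.0, 30.0))`) can be illustrated standalone — at least 2 s, the keep interval plus 2 s of slack, never more than 30 s:

```python
def join_timeout(interval: float) -> float:
    """Clamp the thread-join timeout used when releasing the keeper:
    floor of 2 s, interval + 2 s of slack, ceiling of 30 s."""
    return max(2.0, min(float(interval) + 2.0, 30.0))


print(join_timeout(0.5))   # 2.5
print(join_timeout(90.0))  # 30.0
```

The ceiling keeps `release()` from blocking indefinitely on long intervals, while the floor gives short-interval loops time to observe the stop event.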
22 changes: 21 additions & 1 deletion src/keep_gpu/utilities/platform_manager.py
@@ -1,4 +1,6 @@
import os
import sys
import platform
from enum import Enum
from typing import Callable, List, Tuple

@@ -13,6 +15,7 @@ class ComputingPlatform(Enum):
CPU = "cpu"
CUDA = "cuda"
ROCM = "rocm"
MACM = "macm"


def _check_cuda():
@@ -60,9 +63,27 @@ def _check_cpu():
return True


def _check_macm():
"""Return True if running on Apple Silicon Mac (Mac M) with MPS support."""
try:
# macOS (darwin) on Apple Silicon (arm64) with PyTorch MPS backend available
if sys.platform != "darwin":
return False
if platform.machine() != "arm64":
return False
# PyTorch MPS availability
if torch.backends.mps.is_available():
return True
return False
except Exception as exc: # pragma: no cover - defensive
logger.debug("MACM detection failed: %s", exc)
return False


_PLATFORM_CHECKS: List[Tuple[ComputingPlatform, Callable[[], bool]]] = [
(ComputingPlatform.CUDA, _check_cuda),
(ComputingPlatform.ROCM, _check_rocm),
(ComputingPlatform.MACM, _check_macm),
(ComputingPlatform.CPU, _check_cpu),
]

@@ -105,5 +126,4 @@ def get_platform():


if __name__ == "__main__":

print("Current platform:", get_platform().value)