Merged
9 changes: 9 additions & 0 deletions README.md
@@ -61,6 +61,12 @@ http://127.0.0.1:8765/
pip install torch
pip install keep-gpu
```
- **Mac M series (M1/M2/M3/M4)**
```bash
pip install torch
pip install keep-gpu[macm]

```

Review comment (P2): Remove unsupported `macm` extra from install command

The new installation snippet tells users to run `pip install keep-gpu[macm]`, but this repository’s `pyproject.toml` only defines `dev` and `rocm` extras, so `macm` is not a real extra. Users following this path get an unknown-extra install warning and no guaranteed Mac-specific dependency set, which makes onboarding and support guidance unreliable.
Uses Metal Performance Shaders (MPS) backend on Apple Silicon.

Flags that matter:

@@ -132,6 +138,9 @@ with GlobalGPUController(gpu_ids=[0, 1], vram_to_keep="750MB", interval=90, busy
```
- Methods: `start_keep`, `stop_keep` (optional `job_id`, default stops all), `status` (optional `job_id`), `list_gpus` (basic info).
- Dashboard: `http://127.0.0.1:8765/`
- **Mac M series limitations:**
- GPU utilization monitoring is not available on macOS.
- `busy_threshold` parameter is accepted for API compatibility but has no effect.
- Minimal client config (stdio MCP):
```yaml
servers:
14 changes: 7 additions & 7 deletions docs/concepts/architecture.md
@@ -7,12 +7,13 @@ schedulers that the GPU is still busy, without burning a full training workload.
## Components

1. **CLI (Typer/Rich)** – Parses options, validates GPU IDs, and configures the logger.
2. **`GlobalGPUController`** – Detects the current platform (CUDA today) and
instantiates one single-GPU controller per selected device.
3. **`CudaGPUController`** – Owns the background thread, VRAM allocation, and small
matmul loops that tick every `interval` seconds.
2. **`GlobalGPUController`** – Detects the current platform (CUDA, ROCm,
or Mac M series) and instantiates one single-GPU controller per selected device.
3. **`CudaGPUController`** / **`RocmGPUController`** / **`MacMGPUController`** –
Platform-specific implementations for per-GPU keep-alive loops.
4. **GPU monitor (NVML/ROCm)** – Wraps `nvidia-ml-py` (the `pynvml` module) for CUDA
telemetry and optionally `rocm-smi` when installed by way of the `rocm` extra.
Mac M series does not support direct GPU utilization monitoring.
5. **Utilities** – `parse_size` turns strings like `1GiB` into bytes, while
`setup_logger` wires both console and file logging with optional colors.
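The `parse_size` behavior described here can be illustrated with a minimal standalone sketch (a hypothetical reimplementation — the real utility in `keep_gpu` may accept more suffixes and raise different errors):

```python
import re

# Hypothetical parse_size-style helper: both decimal (MB) and binary (MiB)
# suffixes are assumed to be supported, mirroring the "1GiB" example above.
_UNITS = {
    "B": 1,
    "KB": 10**3, "MB": 10**6, "GB": 10**9,
    "KIB": 2**10, "MIB": 2**20, "GIB": 2**30,
}


def parse_size(text: str) -> int:
    """Turn strings like '1GiB' or '750MB' into a byte count."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([A-Za-z]+)\s*", text)
    if not match:
        raise ValueError(f"unparseable size: {text!r}")
    value, unit = match.groups()
    try:
        factor = _UNITS[unit.upper()]
    except KeyError:
        raise ValueError(f"unknown unit in {text!r}") from None
    return int(float(value) * factor)


print(parse_size("1GiB"))   # 1073741824
print(parse_size("750MB"))  # 750000000
```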

@@ -53,6 +54,5 @@ Matrix multiplies:

## Platform detection

`get_platform()` inspects the system and currently only enables the CUDA path.
If you plan to contribute ROCm or CPU fallbacks, use this hook to branch into
new controller implementations without changing the CLI.
`get_platform()` inspects the system and enables the CUDA, ROCm, or Mac M series
(MPS) path. Detection order: CUDA → ROCm → Mac M → CPU fallback.
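The detection order reads naturally as a first-match scan over (platform, check) pairs — a sketch with stub checks standing in for the real probes (which inspect torch, `rocm-smi`, and the host OS):

```python
from enum import Enum
from typing import Callable, List, Set, Tuple


class ComputingPlatform(Enum):
    CPU = "cpu"
    CUDA = "cuda"
    ROCM = "rocm"
    MACM = "macm"


def make_checks(available: Set[str]) -> List[Tuple[ComputingPlatform, Callable[[], bool]]]:
    # Stub checks for illustration; the real ones probe torch/ROCm/macOS.
    return [
        (ComputingPlatform.CUDA, lambda: "cuda" in available),
        (ComputingPlatform.ROCM, lambda: "rocm" in available),
        (ComputingPlatform.MACM, lambda: "macm" in available),
        (ComputingPlatform.CPU, lambda: True),  # CPU fallback always passes
    ]


def get_platform(available: Set[str]) -> ComputingPlatform:
    """Return the first platform whose check passes (CUDA → ROCm → Mac M → CPU)."""
    for plat, check in make_checks(available):
        if check():
            return plat
    return ComputingPlatform.CPU


print(get_platform({"macm"}).value)  # macm
print(get_platform(set()).value)     # cpu
```

The ordering matters: an Apple Silicon host with a CUDA-capable eGPU setup (hypothetically) would still resolve to CUDA first, since earlier entries win.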
21 changes: 19 additions & 2 deletions docs/getting-started.md
@@ -11,8 +11,9 @@ understand the minimum knobs you need to keep a GPU occupied.

!!! info "Platforms"
CUDA is the primary path; ROCm is supported by way of the `rocm` extra
(requires a ROCm-enabled PyTorch build). CPU-only environments can import
the package but controllers will not start.
(requires a ROCm-enabled PyTorch build); Mac M series (M1/M2/M3/M4) is
supported by way of the `macm` extra using Metal Performance Shaders (MPS).
CPU-only environments can import the package but controllers will not start.

## Install

@@ -47,6 +48,15 @@ understand the minimum knobs you need to keep a GPU occupied.
pip install -e .[dev]
```

=== "Mac M series (M1/M2/M3/M4)"
```bash
pip install torch
pip install keep-gpu[macm]
```
MPS (Metal Performance Shaders) backend will be used automatically on
Apple Silicon Macs. Note: GPU utilization monitoring is not available
on macOS.

## Pick your interface

- **CLI** – fastest way to reserve GPUs from a shell; see [CLI Playbook](guides/cli.md).
@@ -60,6 +70,13 @@ understand the minimum knobs you need to keep a GPU occupied.
python -c "import torch; print(torch.cuda.device_count())"
```
A non-zero integer indicates CUDA is available.

On Mac M series:
```bash
python -c "import torch; print(torch.backends.mps.is_available())"
```
Should print `True` on Apple Silicon Macs.

2. Run the CLI in dry form (press `Ctrl+C` after a few seconds):
```bash
keep-gpu --interval 30 --vram 512MB
4 changes: 4 additions & 0 deletions pyproject.toml
@@ -49,6 +49,9 @@ dev = [
rocm = [
"rocm-smi",
]
macm = [
"psutil", # For system memory query on Apple Silicon
]

[project.urls]
bugs = "https://github.com/Wangmerlyn/KeepGPU/issues"
@@ -144,5 +147,6 @@ force-single-line = false
[tool.pytest.ini_options]
markers = [
"rocm: tests that require ROCm stack",
"macm: tests that require Apple Silicon with MPS",

Review comment (critical, Contributor):

The `macm` optional dependency is missing. The installation instructions in the documentation use `pip install keep-gpu[macm]`, which will fail without this definition. According to the PR description, this should also include the `psutil` dependency.

Please add the `macm` extra to `[project.optional-dependencies]`. For example:

[project.optional-dependencies]
dev = [...]
rocm = [...]
macm = [
  "psutil",
]

"large_memory: tests that use large VRAM",
]
19 changes: 19 additions & 0 deletions skills/gpu-keepalive-with-keepgpu/SKILL.md
@@ -73,6 +73,23 @@ For ROCm users from local checkout:
pip install -e ".[rocm]"
```

### Option D: Mac M series install

```bash
# Install PyTorch with MPS support
pip install torch

# Install KeepGPU with Mac M extras
pip install "keep_gpu[macm] @ git+https://github.com/Wangmerlyn/KeepGPU.git"
```

**Note for Mac M users:**
- Uses Metal Performance Shaders (MPS) backend automatically on Apple Silicon
- GPU utilization monitoring is **not available** on macOS (the system doesn't provide this API)
- The `--busy-threshold` parameter is accepted for API compatibility but has no effect
- Only device 0 is supported (MPS limitation)
- Memory allocation uses the unified memory architecture (shared with system RAM)
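Because GPU memory is unified with system RAM, a conservative keep-alive budget is a small fraction of total memory. A sketch of that sizing rule (the 10% default is an illustrative assumption, not a project recommendation; `psutil` — the `macm` extra's declared dependency — would supply the total in practice):

```python
def suggest_vram_bytes(total_bytes: int, fraction: float = 0.10) -> int:
    """Pick a keep-alive allocation as a fraction of unified memory.

    On Apple Silicon the GPU shares RAM with the system, so a large
    --vram value starves everything else. fraction=0.10 is an
    illustrative assumption, not a project recommendation.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be in (0, 1)")
    return int(total_bytes * fraction)


# e.g. on a 16 GiB machine; psutil.virtual_memory().total would supply this
print(suggest_vram_bytes(16 * 2**30))  # 1717986918
```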

Verify installation:

```bash
@@ -179,6 +196,8 @@ echo $! > keepgpu.pid
- Allocation failure / OOM: reduce `--vram` or free memory first.
- No utilization telemetry: ensure `nvidia-ml-py` works and `nvidia-smi` is available.
- No GPUs detected: verify drivers, CUDA/ROCm runtime, and `torch.cuda.device_count()`.
- Mac M series issues: ensure you are on macOS 12.3+ and that PyTorch is built with MPS support (`torch.backends.mps.is_available()` should return `True`).
- Memory allocation failures on Mac: the unified memory architecture behaves differently from discrete GPUs; try reducing `--vram` values.
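A guarded pre-flight check for the Mac items above might look like this (a sketch; it assumes nothing beyond a Python interpreter that may or may not have torch installed):

```python
import importlib.util


def mps_ready() -> str:
    """Report why MPS is (or is not) usable, without hard-failing."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch  # safe: existence verified above

    mps = getattr(torch.backends, "mps", None)
    if mps is None:
        return "this torch build has no MPS backend"
    if not mps.is_available():
        return "MPS present but unavailable (need macOS 12.3+ on Apple Silicon)"
    return "ok"


print(mps_ready())
```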

## Example

17 changes: 16 additions & 1 deletion src/keep_gpu/global_gpu_controller/global_gpu_controller.py
@@ -37,12 +37,27 @@ def __init__(
)

controller_cls = RocmGPUController
elif self.computing_platform == ComputingPlatform.MACM:
from keep_gpu.single_gpu_controller.macm_gpu_controller import (
MacMGPUController,
)

controller_cls = MacMGPUController
else:
raise NotImplementedError(
f"GlobalGPUController not implemented for platform {self.computing_platform}"
)

if gpu_ids is None:
if self.computing_platform == ComputingPlatform.MACM:
if gpu_ids is None:
self.gpu_ids = [0]
elif gpu_ids == [0]:
self.gpu_ids = gpu_ids
else:
raise ValueError(
f"MACM platform only supports gpu_ids=[0] or None, got {gpu_ids}"
)
elif gpu_ids is None:
self.gpu_ids = list(range(torch.cuda.device_count()))
else:
self.gpu_ids = gpu_ids
3 changes: 3 additions & 0 deletions src/keep_gpu/single_gpu_controller/__init__.py
@@ -0,0 +1,3 @@
from keep_gpu.single_gpu_controller.macm_gpu_controller import MacMGPUController

__all__ = ["MacMGPUController"]
165 changes: 165 additions & 0 deletions src/keep_gpu/single_gpu_controller/macm_gpu_controller.py
@@ -0,0 +1,165 @@
import gc
import threading
import time
from typing import Optional

import torch

from keep_gpu.single_gpu_controller.base_gpu_controller import BaseGPUController
from keep_gpu.utilities.logger import setup_logger
from keep_gpu.utilities.platform_manager import ComputingPlatform

logger = setup_logger(__name__)


class MacMGPUController(BaseGPUController):
def __init__(
self,
*,
rank: int = 0,
interval: float = 1.0,
vram_to_keep: str | int = "1000 MB",
busy_threshold: int = 10,
iterations: int = 5000,
):
super().__init__(vram_to_keep=vram_to_keep, interval=interval)
if rank != 0:
raise ValueError("MPS only supports device 0; rank must be 0")
if iterations <= 0:
raise ValueError("iterations must be positive")
if not torch.backends.mps.is_available():
raise RuntimeError("PyTorch MPS backend is not available")

self.rank = rank
self.device = torch.device("mps")
self.busy_threshold = busy_threshold
self.iterations = iterations
self.platform = ComputingPlatform.MACM

self._stop_evt: Optional[threading.Event] = None
self._thread: Optional[threading.Thread] = None
self._num_elements: Optional[int] = None

logger.debug(
"rank %s: busy_threshold=%s ignored on macOS MPS (API compatibility)",
self.rank,
self.busy_threshold,
)

def keep(self) -> None:
if self._thread and self._thread.is_alive():
logger.warning("rank %s: keep thread already running", self.rank)
return

self._num_elements = int(self.vram_to_keep)
Review comment (high, Contributor):

There appears to be a miscalculation in the number of tensor elements. `self.vram_to_keep` holds the requested memory in bytes, but it is being used directly as the number of elements for a `torch.float32` tensor. Since each float32 element occupies 4 bytes, this results in allocating 4 times the requested VRAM.

To allocate the correct amount of memory, you should divide the number of bytes by the size of the element type (4 for float32).

Suggested change:
- self._num_elements = int(self.vram_to_keep)
+ self._num_elements = int(self.vram_to_keep) // 4
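The byte-to-element arithmetic above can be sanity-checked in isolation (standalone sketch; `element_size=4` matches `torch.float32`):

```python
def bytes_to_elements(num_bytes: int, element_size: int = 4) -> int:
    """Convert a VRAM budget in bytes to a float32 element count.

    element_size=4 matches torch.float32; without this division the
    tensor would occupy element_size times the requested budget.
    """
    return num_bytes // element_size


# A 512 MB budget corresponds to this many float32 elements
print(bytes_to_elements(512 * 1024 * 1024))  # 134217728
```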

if self._num_elements <= 0:
raise ValueError("vram_to_keep must be positive")

self._stop_evt = threading.Event()
self._thread = threading.Thread(
target=self._keep_loop,
name=f"gpu-keeper-macm-{self.rank}",
daemon=True,
)
self._thread.start()
logger.info("rank %s: MPS keep thread started", self.rank)

def release(self) -> None:
if not (self._thread and self._thread.is_alive()):
logger.warning("rank %s: keep thread not running", self.rank)
return

stop_evt = self._stop_evt
if stop_evt is None:
logger.warning("rank %s: stop event missing; skipping release", self.rank)
return

stop_evt.set()
join_timeout = max(2.0, min(float(self.interval) + 2.0, 30.0))
self._thread.join(timeout=join_timeout)
if self._thread.is_alive():
logger.warning(
"rank %s: MPS keep thread did not stop within %.1fs",
self.rank,
join_timeout,
)
return

torch.mps.empty_cache()
gc.collect()
logger.info("rank %s: keep thread stopped & cache cleared", self.rank)

def __enter__(self):
self.keep()
return self

def __exit__(self, exc_type, exc, tb):
self.release()

def _keep_loop(self) -> None:
stop_evt = self._stop_evt
if stop_evt is None:
logger.error("rank %s: stop event not initialized", self.rank)
return

num_elements = self._num_elements if self._num_elements is not None else 0
if num_elements <= 0:
logger.error(
"rank %s: invalid vram_to_keep=%s", self.rank, self.vram_to_keep
)
return

tensor = None
while not stop_evt.is_set():
try:
tensor = torch.rand(
num_elements,
device=self.device,
dtype=torch.float32,
requires_grad=False,
)
break
except RuntimeError as exc:
logger.error("rank %s: failed to allocate tensor: %s", self.rank, exc)
torch.mps.empty_cache()
gc.collect()
if stop_evt.wait(self.interval):
return

if tensor is None:
logger.error("rank %s: failed to allocate tensor, exiting loop", self.rank)
return

while not stop_evt.is_set():
try:
self._run_batch(tensor)
if stop_evt.wait(self.interval):
break
except RuntimeError as exc:
if "out of memory" in str(exc).lower():
torch.mps.empty_cache()
gc.collect()
if stop_evt.wait(self.interval):
break
except Exception:
logger.exception("rank %s: unexpected error", self.rank)
if stop_evt.wait(self.interval):
break

@torch.no_grad()
def _run_batch(self, tensor: torch.Tensor) -> None:
stop_evt = self._stop_evt

tic = time.time()
for _ in range(self.iterations):
torch.relu_(tensor)
if stop_evt is not None and stop_evt.is_set():
break
torch.mps.synchronize()
toc = time.time()

logger.debug(
"rank %s: elementwise batch done - avg %.2f ms",
self.rank,
(toc - tic) * 1000 / max(1, self.iterations),
)
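The release-timeout clamp in the controller above (`max(2.0, min(float(self.interval) + 2.0, 30.0))`) can be illustrated standalone — at least 2 s, the keep interval plus 2 s of slack, never more than 30 s:

```python
def join_timeout(interval: float) -> float:
    """Clamp the thread-join timeout used when releasing the keeper:
    floor of 2 s, interval + 2 s of slack, ceiling of 30 s."""
    return max(2.0, min(float(interval) + 2.0, 30.0))


print(join_timeout(0.5))   # 2.5
print(join_timeout(90.0))  # 30.0
```

The ceiling keeps `release()` from blocking indefinitely on long intervals, while the floor gives short-interval loops time to observe the stop event.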
22 changes: 21 additions & 1 deletion src/keep_gpu/utilities/platform_manager.py
@@ -1,4 +1,6 @@
import os
import sys
import platform
from enum import Enum
from typing import Callable, List, Tuple

@@ -13,6 +15,7 @@ class ComputingPlatform(Enum):
CPU = "cpu"
CUDA = "cuda"
ROCM = "rocm"
MACM = "macm"


def _check_cuda():
@@ -60,9 +63,27 @@ def _check_cpu():
return True


def _check_macm():
"""Return True if running on Apple Silicon Mac (Mac M) with MPS support."""
try:
# macOS (darwin) on Apple Silicon (arm64) with PyTorch MPS backend available
if sys.platform != "darwin":
return False
if platform.machine() != "arm64":
return False
# PyTorch MPS availability
if torch.backends.mps.is_available():
return True
return False
except Exception as exc: # pragma: no cover - defensive
logger.debug("MACM detection failed: %s", exc)
return False


_PLATFORM_CHECKS: List[Tuple[ComputingPlatform, Callable[[], bool]]] = [
(ComputingPlatform.CUDA, _check_cuda),
(ComputingPlatform.ROCM, _check_rocm),
(ComputingPlatform.MACM, _check_macm),
(ComputingPlatform.CPU, _check_cpu),
]

@@ -105,5 +126,4 @@ def get_platform():


if __name__ == "__main__":

print("Current platform:", get_platform().value)