Merged
1 change: 1 addition & 0 deletions README.md
@@ -5,6 +5,7 @@
[![DOI](https://zenodo.org/badge/987167271.svg)](https://doi.org/10.5281/zenodo.17129114)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/Wangmerlyn/KeepGPU)
[![CodeRabbit Pull Request Reviews](https://img.shields.io/coderabbit/prs/github/Wangmerlyn/KeepGPU)](https://coderabbit.ai/dashboard/gh/Wangmerlyn/KeepGPU)
[![SkillCheck Passed](https://raw.githubusercontent.com/olgasafonova/skillcheck-free/main/skill-check/passed.svg)](https://github.com/olgasafonova/skillcheck-free)

**Keep GPU** keeps shared GPUs from being reclaimed while you prep data, debug, or coordinate multi-stage pipelines. It allocates just enough VRAM and issues lightweight CUDA work so schedulers observe an “active” device—without running a full training job.

22 changes: 22 additions & 0 deletions docs/skills/gpu-keepalive-with-keepgpu/skillcheck-free-report.md
@@ -0,0 +1,22 @@
# SkillCheck Free Report: gpu-keepalive-with-keepgpu

Validation method: applied the SkillCheck Free rule set from `skill-check/SKILL.md` in <https://github.com/olgasafonova/skillcheck-free> after updating the skill to focus on KeepGPU CLI usage and installation instructions.

## Summary

- Critical: 0
- Warnings: 0
- Suggestions: 0

## Strengths detected

- Description includes activation triggers (8.3)
- Description includes negative triggers (8.7)
- Skill includes example section (8.1)
- Skill documents error handling or limitations (8.2)
- Skill uses structured instructions (8.5)
- Skill documents prerequisites (8.6)
Comment on lines +13 to +18

Contributor

medium

The "Strengths detected" section lists rule numbers (e.g., 8.3) without explaining what each rule signifies. To keep the report self-contained for readers who may not have access to `skill-check/SKILL.md`, consider adding a concise description of each strength. This would make the report more useful as a standalone document.


## Checked files

- `skills/gpu-keepalive-with-keepgpu/SKILL.md`
156 changes: 156 additions & 0 deletions skills/gpu-keepalive-with-keepgpu/SKILL.md
@@ -0,0 +1,156 @@
---
name: gpu-keepalive-with-keepgpu
description: Install and operate the KeepGPU CLI to keep reserved GPUs active during data prep, debugging, and orchestration downtime. Use when users ask for keep-gpu command construction, tuning (--vram, --interval, --busy-threshold), installation from this repository, or runtime troubleshooting of keep-gpu sessions; do not use for repository development, code refactoring, or unrelated Python tooling.
---

# KeepGPU CLI Operator

Use this workflow to run `keep-gpu` safely and effectively.

## Prerequisites

- Confirm at least one GPU is visible (`python -c "import torch; print(torch.cuda.device_count())"`).
- Run commands in a shell where CUDA/ROCm drivers are already available.
- Use `Ctrl+C` to stop KeepGPU and release memory cleanly.
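
The visibility check above can also be done without PyTorch. The sketch below is illustrative and not part of KeepGPU: it assumes an NVIDIA system where `nvidia-smi -L` prints one `GPU N: ...` line per device (ROCm systems would use `rocm-smi` instead).

```bash
# Illustrative preflight helper (not shipped with KeepGPU): count devices by
# parsing `nvidia-smi -L`-style output, where each device appears as "GPU N: ...".
count_gpus() {
  grep -c '^GPU' || true   # prints the number of matching lines; 0 if none
}

# Real usage would be: nvidia-smi -L | count_gpus
# Demonstration with captured sample output:
printf 'GPU 0: NVIDIA A100 (UUID: GPU-xxxx)\nGPU 1: NVIDIA A100 (UUID: GPU-yyyy)\n' | count_gpus
# prints: 2
```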

## Install KeepGPU

Install PyTorch first for your platform, then install KeepGPU.

### Option A: Install from package index

```bash
# CUDA example (change cu121 to your CUDA version)
pip install --index-url https://download.pytorch.org/whl/cu121 torch
pip install keep-gpu
```

```bash
# ROCm example (change rocm6.1 to your ROCm version)
pip install --index-url https://download.pytorch.org/whl/rocm6.1 torch
pip install "keep-gpu[rocm]"
```

### Option B: Install directly from Git URL (no local clone)

Prefer this option when users only need the CLI and do not plan to edit the source. It avoids the overhead of creating and later cleaning up a local checkout.

```bash
pip install "git+https://github.com/Wangmerlyn/KeepGPU.git"
```

If SSH access is configured:

```bash
pip install "git+ssh://git@github.com/Wangmerlyn/KeepGPU.git"
```

ROCm variant from Git URL:

```bash
pip install "keep-gpu[rocm] @ git+https://github.com/Wangmerlyn/KeepGPU.git"
```

### Option C: Install from a local source checkout (explicit path)

Use this option only when users already have a local checkout or plan to edit source.

```bash
git clone https://github.com/Wangmerlyn/KeepGPU.git
cd KeepGPU
pip install -e .
```

If the checkout already exists somewhere else, install by absolute path:

```bash
pip install -e /absolute/path/to/KeepGPU
```

For ROCm users from local checkout:

```bash
pip install -e ".[rocm]"
```

Verify installation:

```bash
keep-gpu --help
```

## Command model

CLI options to tune:

- `--gpu-ids`: comma-separated IDs (`0`, `0,1`). If omitted, KeepGPU uses all visible GPUs.
- `--vram`: VRAM to hold per GPU (`512MB`, `1GiB`, or raw bytes).
- `--interval`: seconds between keep-alive cycles.
- `--busy-threshold` (alias `--util-threshold`): if GPU utilization exceeds this percentage, KeepGPU backs off.
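
The accepted `--vram` forms can be normalized to bytes with a small helper. This is a caller-side sketch, assuming decimal megabytes for `MB` and binary gibibytes for `GiB`; KeepGPU's own parser may interpret units differently.

```bash
# Hypothetical size normalizer (not KeepGPU internals): converts the
# documented --vram forms (512MB, 1GiB, raw bytes) to a byte count.
vram_to_bytes() {
  case "$1" in
    *GiB) echo $(( ${1%GiB} * 1024 * 1024 * 1024 )) ;;      # binary gibibytes
    *MB)  echo $(( ${1%MB} * 1000 * 1000 )) ;;               # decimal megabytes
    *[!0-9]*|'') echo "unrecognized size: $1" >&2; return 1 ;;
    *)    echo "$1" ;;                                       # already raw bytes
  esac
}

vram_to_bytes 1GiB    # prints: 1073741824
vram_to_bytes 512MB   # prints: 512000000
```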

Legacy compatibility:

- `--threshold` is deprecated but still accepted.
- A numeric value (e.g. `25`) maps to the busy threshold.
- A size string (e.g. `1GiB`) maps to VRAM.
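
The legacy mapping can be expressed as a small dispatch rule. The function below is illustrative, not KeepGPU's implementation:

```bash
# Sketch of the documented legacy mapping: a purely numeric --threshold is
# treated as the busy threshold, any other string as a VRAM size.
classify_threshold() {
  case "$1" in
    ''|*[!0-9]*) echo "vram=$1" ;;           # e.g. --threshold 1GiB
    *)           echo "busy-threshold=$1" ;; # e.g. --threshold 25
  esac
}

classify_threshold 25    # prints: busy-threshold=25
classify_threshold 1GiB  # prints: vram=1GiB
```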

## Agent workflow

1. Collect workload intent: target GPU IDs, expected hold duration, and whether node is shared.
2. Choose safe defaults when unspecified: `--vram 1GiB`, an `--interval` of 60 to 120 seconds, and `--busy-threshold 25` on shared nodes.
3. Build one concrete command.
4. Provide stop instruction (`Ctrl+C`) and a verification step.
5. If command fails, provide one minimal troubleshooting command at a time.
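
The steps above can be sketched as a command builder that falls back to the shared-node defaults from step 2. The helper name and structure are hypothetical; only the flag names come from the documented CLI:

```bash
# Illustrative command builder (not part of KeepGPU): applies the shared-node
# defaults (--vram 1GiB, --interval 120, --busy-threshold 25) when unset.
build_keepgpu_cmd() {
  gpu_ids="$1"               # e.g. "0" or "0,1"; empty selects all GPUs
  vram="${2:-1GiB}"
  interval="${3:-120}"
  busy="${4:-25}"
  cmd="keep-gpu"
  [ -n "$gpu_ids" ] && cmd="$cmd --gpu-ids $gpu_ids"
  echo "$cmd --vram $vram --interval $interval --busy-threshold $busy"
}

build_keepgpu_cmd 0
# prints: keep-gpu --gpu-ids 0 --vram 1GiB --interval 120 --busy-threshold 25
```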

## Command templates

Single GPU while preprocessing:

```bash
keep-gpu --gpu-ids 0 --vram 1GiB --interval 60 --busy-threshold 25
```

All visible GPUs with lighter load:

```bash
keep-gpu --vram 512MB --interval 180
```

Remote sessions (preferred: `tmux` for visibility and control):

```bash
tmux new -s keepgpu
keep-gpu --gpu-ids 0 --vram 1GiB --interval 300
# Detach with Ctrl+b then d; reattach with: tmux attach -t keepgpu
```

Fallback when `tmux` is unavailable:

```bash
nohup keep-gpu --gpu-ids 0 --vram 1GiB --interval 300 > keepgpu.log 2>&1 &
echo $! > keepgpu.pid
# Monitor: tail -f keepgpu.log
# Stop: kill "$(cat keepgpu.pid)"
```

## Troubleshooting

- Invalid `--gpu-ids`: ensure comma-separated integers only.
- Allocation failure / OOM: reduce `--vram` or free memory first.
- No utilization telemetry: ensure `nvidia-ml-py` works and `nvidia-smi` is available.
- No GPUs detected: verify drivers, CUDA/ROCm runtime, and `torch.cuda.device_count()`.
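
For the first item, a quick format check can catch malformed `--gpu-ids` values before launching anything. This validator is illustrative only:

```bash
# Accept only comma-separated non-negative integers, e.g. "0" or "0,1,3".
valid_gpu_ids() {
  case "$1" in
    ''|*[!0-9,]*|*,,*|,*|*,) return 1 ;;  # reject empties, stray chars, bad commas
    *) return 0 ;;
  esac
}

valid_gpu_ids "0,1"  && echo ok    # prints: ok
valid_gpu_ids "0, 1" || echo bad   # prints: bad (spaces are not allowed)
```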

## Example

User request: "Install KeepGPU from GitHub and keep GPU 0 alive while I preprocess."

Suggested response shape:

1. Install: `pip install "git+https://github.com/Wangmerlyn/KeepGPU.git"`
2. Run: `keep-gpu --gpu-ids 0 --vram 1GiB --interval 60 --busy-threshold 25`
3. Verify: check CLI logs for keep loop activity; stop with `Ctrl+C` when done.

## Limitations

- KeepGPU is not a scheduler; it only keeps GPUs you already have access to looking active.
- Effectiveness depends on cluster policy; some schedulers only count a device as active above a larger VRAM footprint or with shorter keep-alive intervals.
3 changes: 3 additions & 0 deletions skills/gpu-keepalive-with-keepgpu/agents/openai.yaml
@@ -0,0 +1,3 @@
display_name: KeepGPU CLI Operator
short_description: Install and operate keep-gpu CLI commands safely on shared GPU machines.
default_prompt: Use this skill to install KeepGPU and build run-ready keep-gpu commands with sensible defaults, verification steps, and troubleshooting.