From dec7a340035d8bc2c3be0ab1f6754d260b116bd4 Mon Sep 17 00:00:00 2001
From: Wang Siyuan <wsy0227@sjtu.edu.cn>
Date: Thu, 5 Mar 2026 16:18:55 +0800
Subject: [PATCH] chore(git): remove planning files from git tracking

---
 .gitignore                                    |  8 +++
 docs/plans/cli-service-dashboard/findings.md  | 72 -------------------
 docs/plans/cli-service-dashboard/progress.md  | 59 ---------------
 docs/plans/cli-service-dashboard/task_plan.md | 48 -------------
 4 files changed, 8 insertions(+), 179 deletions(-)
 delete mode 100644 docs/plans/cli-service-dashboard/findings.md
 delete mode 100644 docs/plans/cli-service-dashboard/progress.md
 delete mode 100644 docs/plans/cli-service-dashboard/task_plan.md

diff --git a/.gitignore b/.gitignore
index 68860dd..4c8a2ec 100644
--- a/.gitignore
+++ b/.gitignore
@@ -147,9 +147,17 @@ venv.bak/
 
 # mkdocs documentation
 /site
+
+# Planning files (created by planning-with-files skill)
 /task_plan.md
 /findings.md
 /progress.md
+task_plan.md
+findings.md
+progress.md
+**/task_plan.md
+**/findings.md
+**/progress.md
 
 # mypy
 .mypy_cache/
diff --git a/docs/plans/cli-service-dashboard/findings.md b/docs/plans/cli-service-dashboard/findings.md
deleted file mode 100644
index 872dcb7..0000000
--- a/docs/plans/cli-service-dashboard/findings.md
+++ /dev/null
@@ -1,72 +0,0 @@
-# Findings: Non-blocking CLI + Dashboard
-
-## Repository Observations
-
-- Blocking CLI entrypoint is `src/keep_gpu/cli.py` (`main` Typer command).
-- Existing remote/session model already exists in `src/keep_gpu/mcp/server.py` (`start_keep`, `stop_keep`, `status`, `list_gpus`).
-- Existing MCP HTTP mode only accepts JSON-RPC `POST` and does not expose REST or web UI.
-- Current tests for MCP are in `tests/mcp/test_server.py` and cover core session methods.
-
-## Design Direction (Frontend Skill)
-
-- Aesthetic direction: **retro-futuristic control room**.
-- Visual memory hook: glowing telemetry grid + layered glass panels + high-contrast amber/cyan instrumentation.
-- UX priority: one-screen control loop (start session, inspect health, stop session) for agent operators.
-
-## Implementation Notes
-
-- Keep backward compatibility: default `keep-gpu` remains blocking.
-- Non-blocking usage should be first-class through subcommands and local auto-start.
-- Dashboard should be served by the same local backend to avoid extra runtime coordination.
-
-## Implemented Architecture
-
-- `src/keep_gpu/cli.py` now exposes:
-  - blocking default callback (backward compatible),
-  - `serve`, `start`, `status`, `stop`, and `list-gpus` subcommands,
-  - local service auto-start in `start` by launching `python -m keep_gpu.mcp.server`.
-- `src/keep_gpu/mcp/server.py` now exposes REST routes and static UI serving:
-  - `GET /health`, `GET /api/gpus`, `GET /api/sessions`, `GET /api/sessions/{job_id}`
-  - `POST /api/sessions`, `DELETE /api/sessions`, `DELETE /api/sessions/{job_id}`
-  - dashboard assets on `/` with SPA fallback.
-- Dashboard source lives in `web/dashboard/` (React + Vite), with build output in
-  `src/keep_gpu/mcp/static/`.
-
-## Validation Findings
-
-- New CLI and server tests pass (`13 passed`).
-- Pre-commit hooks pass with no auto-fix needed after initial implementation.
-- `mkdocs build` succeeds; informational note reports plan/skills docs excluded from nav.
-
-## Iteration Findings (Post-UX Feedback)
-
-- Added explicit daemon lifecycle guidance in CLI output and docs.
-- Added `keep-gpu service-stop` command so users can terminate auto-started local daemon.
-- Updated `keep-gpu start` output to always print dashboard URL and follow-up stop commands.
-- Reworked dashboard visual language from neon/glassy to classic, restrained control-room style.
-- Expanded CLI tests to cover start-output hints and `service-stop` guardrails.
-
-## Iteration Findings (Bug Report Follow-up)
-
-- Root cause for noisy `keep-gpu stop --all` failure output:
-  - service commands still imported `torch` at module import time, causing unrelated NumPy warning noise,
-  - network timeout from `urlopen` was not normalized, so users could see raw traceback-style failures.
-- Fixes applied:
-  - moved `torch` import into blocking-only execution path,
-  - wrapped HTTP/timeout errors in friendly `RuntimeError` messages,
-  - extended command error handling to avoid traceback leaks to users,
-  - improved dashboard action state so releasing one session does not disable every release control globally.
-- Additional tests now assert timeout/error behavior for `keep-gpu stop --all` and HTTP wrapper timeout handling.
-
-## Root-Cause Fix Findings (Stop-All Reliability)
-
-- Reproduced that frequent stop timeouts are mostly service-side latency under release, not network transport.
-- Root causes identified:
-  - worker loops used `time.sleep(interval)`, which is non-interruptible and can delay shutdown for long intervals,
-  - stop RPC could block behind slow release execution,
-  - single-threaded HTTP server amplified perceived hangs under long requests.
-- Fixes implemented:
-  - replaced sleep points in CUDA/ROCm keep loops with `Event.wait(interval)` so stop signals interrupt immediately,
-  - added bounded release join timeout in CUDA/ROCm controllers to avoid indefinite blocking,
-  - made release-all server responses timeout-aware (`timed_out` reporting) with background continuation,
-  - switched HTTP server to threaded mode to prevent one slow request from blocking others.
diff --git a/docs/plans/cli-service-dashboard/progress.md b/docs/plans/cli-service-dashboard/progress.md
deleted file mode 100644
index 05c9a4f..0000000
--- a/docs/plans/cli-service-dashboard/progress.md
+++ /dev/null
@@ -1,59 +0,0 @@
-# Progress Log: Non-blocking CLI + Dashboard
-
-## Session 2026-03-04
-
-### Completed
-
-- Created implementation plan in `docs/plans/cli-service-dashboard/task_plan.md`.
-- Captured initial repository findings in `docs/plans/cli-service-dashboard/findings.md`.
-- Confirmed branch: `feat/cli-service-dashboard`.
-- Implemented CLI subcommands and service auto-start in `src/keep_gpu/cli.py`.
-- Extended server HTTP endpoints and static dashboard routing in `src/keep_gpu/mcp/server.py`.
-- Added React/Vite dashboard source under `web/dashboard/` and built static assets.
-- Added tests in `tests/mcp/test_http_api.py` and `tests/test_cli_service_commands.py`.
-- Updated README/docs/skill content for new service and dashboard workflow.
-- Addressed UX feedback:
-  - refined dashboard styling to a classic high-quality aesthetic,
-  - improved `keep-gpu start` messaging with dashboard URL and shutdown hints,
-  - added `keep-gpu service-stop` command for daemon shutdown.
-- Expanded tests for output/help lifecycle behavior.
-- Applied follow-up reliability fixes from real-user reproduction:
-  - moved `torch` import into blocking-only path,
-  - normalized service timeout/HTTP failures into user-friendly errors,
-  - fixed dashboard release-button state handling to avoid global lockouts.
-- Implemented root-cause shutdown fixes:
-  - converted CUDA/ROCm loop sleeps to interruptible waits,
-  - added bounded join behavior in controller `release()`,
-  - updated server stop behavior to surface timeout-aware payloads,
-  - moved HTTP server to threaded mode.
-
-### In Progress
-
-- PR is open; addressing review feedback and final merge readiness checks.
-
-### Next
-
-1. Finish open review comments and verify final CI pass.
-2. Merge PR after review completion.
-
-### Validation Results
-
-- `pytest tests/mcp/test_server.py tests/mcp/test_http_api.py tests/test_cli_thresholds.py tests/test_cli_service_commands.py` -> 13 passed.
-- `pre-commit run --all-files` -> all hooks passed.
-- `mkdocs build` -> success (non-blocking info about docs pages outside nav).
-
-### Validation Results (Follow-up)
-
-- `pytest tests/test_cli_service_commands.py tests/mcp/test_http_api.py tests/mcp/test_server.py tests/test_cli_thresholds.py` -> 18 passed.
-- `pre-commit run --all-files` -> all hooks passed.
-- `mkdocs build` -> success.
-
-### Validation Results (Root-Cause Fix)
-
-- `pytest tests/test_cli_service_commands.py tests/mcp/test_server.py tests/mcp/test_http_api.py tests/test_cli_thresholds.py tests/cuda_controller/test_keep_and_release.py tests/cuda_controller/test_throttle.py` -> 26 passed.
-- `pre-commit run --all-files` -> all hooks passed.
-- `mkdocs build` -> success.
-
-## Error Log (This Iteration)
-
-- `pre-commit run --all-files` reformatted `tests/test_cli_service_commands.py` by way of Black on first run; second run passed.
diff --git a/docs/plans/cli-service-dashboard/task_plan.md b/docs/plans/cli-service-dashboard/task_plan.md
deleted file mode 100644
index 749bba1..0000000
--- a/docs/plans/cli-service-dashboard/task_plan.md
+++ /dev/null
@@ -1,48 +0,0 @@
-# Task Plan: Non-blocking CLI + Local Service + Dashboard
-
-## Background
-
-`keep-gpu` currently runs as a blocking command. This works for manual shells but is awkward for LLM agent workflows that need to continue with follow-up commands.
-
-## Goal
-
-Add an agent-friendly, non-blocking CLI workflow while preserving the existing blocking behavior, then provide a production-grade web dashboard backed by the same local service.
-
-## Proposed Solution
-
-1. Keep existing `keep-gpu` (blocking) unchanged for compatibility.
-2. Extend CLI with service-driven subcommands: `serve`, `start`, `status`, `stop`, and `list-gpus`.
-3. Implement service auto-start in `start` when local service is unavailable.
-4. Extend HTTP server with REST endpoints and serve a React/Vite-built dashboard.
-5. Update docs and the KeepGPU skill to reflect the new preferred agent workflow.
-
-## Phases
-
-### Phase 1: Design and Scaffolding
-- [x] Define CLI subcommand UX and local service contract
-- [x] Add service client/autostart helpers
-- [x] Add REST handlers and static file serving skeleton
-- **Status:** complete
-
-### Phase 2: Dashboard Frontend (React/Vite)
-- [x] Add Vite app and distinctive UI design
-- [x] Implement live GPU/session views and session controls
-- [x] Build and bundle static assets for Python serving
-- **Status:** complete
-
-### Phase 3: Tests and Documentation
-- [x] Add/extend tests for new CLI and server behavior
-- [x] Update CLI/MCP docs and README
-- [x] Update `skills/gpu-keepalive-with-keepgpu/SKILL.md`
-- **Status:** complete
-
-### Phase 4: Validation
-- [x] Run targeted pytest suites
-- [x] Run `pre-commit run --all-files`
-- [x] Run `mkdocs build`
-- **Status:** complete
-
-## Risks / Open Questions
-
-1. How to keep auto-start robust without introducing zombie processes.
-2. How to package dashboard assets cleanly for source and installed usage.