add GKE TPU skills: exec-gke-tpu and profile-tpu-kernel #19
- exec-gke-tpu: provision GKE TPU workloads via xpk, sync code, run multi-process benchmarks on TPU pods (e.g. TPU v7x-8)
- profile-tpu-kernel: profile Pallas/JAX kernels with xprof LLO utilization (MXU, Vector ALU, etc.)
- Register gke-tpu plugin in marketplace.json

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Code Review
This pull request introduces the gke-tpu plugin, which provides skills for provisioning and executing workloads on GKE-based TPUs using xpk, as well as profiling Pallas/JAX kernels with xprof. The documentation covers prerequisites, cluster creation, multi-process execution, and TensorBoard integration for trace analysis. Feedback was provided to improve the reusability of the documentation by replacing hardcoded project IDs with placeholders, updating deprecated package references like msgpack-python, and adding explanatory comments for specific dependency version pins.
    # 4. Auth
    gcloud auth login
    gcloud config set project tpu-service-473302
Hardcoding the project ID tpu-service-473302 makes this skill less reusable for other users or projects. Consider using a placeholder like <YOUR_PROJECT_ID> or an environment variable to make it more flexible.
Suggested change:

    - gcloud config set project tpu-service-473302
    + gcloud config set project <YOUR_PROJECT_ID>
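The environment-variable option mentioned in this comment could look roughly like the sketch below. The names `GCP_PROJECT_ID` and `DRY_RUN` are hypothetical and chosen only for illustration; the reviewed skill defines neither. The function refuses to run when no project is configured, and the dry-run switch prints the command instead of calling `gcloud`.

```shell
# Sketch only: GCP_PROJECT_ID and DRY_RUN are hypothetical names,
# not variables defined by the reviewed SKILL.md.
set_project() {
  if [ -z "${GCP_PROJECT_ID:-}" ]; then
    echo "GCP_PROJECT_ID is not set; export it before running setup" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    echo "gcloud config set project $GCP_PROJECT_ID"
  else
    gcloud config set project "$GCP_PROJECT_ID"
  fi
}
```

With `export GCP_PROJECT_ID=my-project` in the shell profile, the setup step becomes `set_project`, and no project ID needs to appear in the document itself.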
    huggingface-hub safetensors transformers tiktoken \
    setproctitle psutil pandas httpx openai aiohttp \
    pybase64 partial_json_parser omegaconf \
    msgpack-python requests typing-extensions
    'tensorboard-plugin-profile>=2.22' \
    'xprof>=2.22' \
    'protobuf>=5,<7' \
    'setuptools<81'
Pinning setuptools to <81 is a workaround for the pkg_resources issue (as noted in troubleshooting). While necessary for now, this can lead to dependency conflicts. Consider adding a comment here explaining the reason for the pin, or explore a more robust solution that doesn't rely on pkg_resources or is compatible with newer setuptools versions.
Suggested change:

    - 'setuptools<81'
    + 'setuptools<81' # Pinned due to pkg_resources removal in setuptools >= 82
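A shell comment can only follow the last line of a backslash-continued command, so one alternative is to move the pins into a requirements file, where each pin can carry its own comment. The sketch below assumes a hypothetical file name `requirements-profile.txt` and a hypothetical `RUN_INSTALL` switch, and lists only the four specifiers visible in the hunk above; any packages earlier in the original command are not shown here.

```shell
# Sketch only: requirements-profile.txt and RUN_INSTALL are hypothetical names.
cat > requirements-profile.txt <<'EOF'
tensorboard-plugin-profile>=2.22
xprof>=2.22
protobuf>=5,<7
# Workaround for the pkg_resources issue noted in troubleshooting;
# revisit this pin once the dependency no longer imports pkg_resources.
setuptools<81
EOF

# Install only when explicitly requested, so the file can be reviewed first.
if [ "${RUN_INSTALL:-0}" = "1" ]; then
  pip install -r requirements-profile.txt
fi
```

Keeping the reason next to the pin makes it easier to remove later without searching the troubleshooting section.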
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/gke-tpu/skills/exec-gke-tpu/SKILL.md`:
- Around line 15-37: The macOS-specific prerequisites and PATH instructions in
SKILL.md (lines showing brew installs, /Users/... path, and pipx with Python
3.13) need to be scoped and supplemented: update the doc to explicitly label the
current commands as macOS/Homebrew instructions and add brief Linux alternatives
(apt/yum or curl/install steps for Google Cloud SDK, kubectl install, pipx
install command for system python, and an equivalent PATH example using $HOME
instead of /Users/$(whoami)). Also add a short note that Windows users should
follow Cloud SDK and kubectl Windows installers or WSL, and ensure the PATH
section references a cross-platform pattern (e.g., $HOME/.local/bin) rather than
an absolute macOS-only path.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2bbb198d-4715-47b4-9450-01d8dc09dcb5
📒 Files selected for processing (4)
- .claude-plugin/marketplace.json
- plugins/gke-tpu/.claude-plugin/plugin.json
- plugins/gke-tpu/skills/exec-gke-tpu/SKILL.md
- plugins/gke-tpu/skills/profile-tpu-kernel/SKILL.md
The following tools must be installed locally. Install via:

```bash
# 1. Google Cloud SDK
brew install --cask google-cloud-sdk

# 2. kubectl + auth plugin
gcloud components install kubectl gke-gcloud-auth-plugin beta --quiet

# 3. xpk (must use Python 3.13, NOT 3.14 which has argparse incompatibility)
brew install pipx
pipx install xpk --python python3.13

# 4. Auth
gcloud auth login
gcloud config set project tpu-service-473302
gcloud auth application-default login
```

**PATH setup** (needed in every shell/command):
```bash
export PATH="/Users/$(whoami)/.local/bin:/opt/homebrew/bin:/opt/homebrew/share/google-cloud-sdk/bin:/usr/bin:$PATH"
```
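Because the quoted section depends on several tools being reachable in every shell, a small preflight check can show which ones are missing before any cluster command runs. This is a minimal sketch, not part of the reviewed skill; the helper name `missing_tools` is hypothetical, and the tool list simply mirrors the commands installed above.

```shell
# Sketch only: missing_tools is a hypothetical helper name.
# Prints the names of the given commands that are not found on PATH.
missing_tools() {
  not_found=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      not_found="$not_found $tool"
    fi
  done
  # Drop the leading space before printing.
  echo "${not_found# }"
}

missing=$(missing_tools gcloud kubectl gke-gcloud-auth-plugin pipx xpk python3.13)
if [ -n "$missing" ]; then
  echo "Missing tools: $missing" >&2
fi
```

The check only reports; it does not install anything or change the environment.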
Clarify OS scope for setup commands (currently macOS-specific).
The prerequisite and PATH instructions are Homebrew/macOS-specific (brew, /Users/...) but the doc doesn’t explicitly scope this section to macOS or provide Linux alternatives. This can cause setup failures for non-macOS users.
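A cross-platform PATH setup along the lines this comment asks for might look like the sketch below. It assumes tools installed by pipx land in `$HOME/.local/bin` and reuses the Homebrew directories from the reviewed snippet only when running on macOS; the helper name `prepend_path` is hypothetical, and the directories should be adjusted to the actual install locations.

```shell
# Sketch only: prepend_path is a hypothetical helper; directories are assumptions.
# Adds a directory to the front of PATH if it exists and is not already listed.
prepend_path() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
  esac
  if [ -d "$1" ]; then
    PATH="$1:$PATH"
  fi
}

prepend_path "$HOME/.local/bin"   # pipx-installed tools such as xpk
if [ "$(uname -s)" = "Darwin" ]; then
  # Homebrew locations taken from the reviewed snippet (Apple Silicon layout).
  prepend_path "/opt/homebrew/bin"
  prepend_path "/opt/homebrew/share/google-cloud-sdk/bin"
fi
export PATH
```

Skipping directories that do not exist keeps the same snippet usable on Linux and on macOS without leaving dead entries in PATH.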
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 28-28: Fenced code blocks should have a language specified
(MD040, fenced-code-language)