
Use when creating new skills, editing existing skills, or verifying skills work before deployment
You MUST use this before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores user intent, requirements and design before implementation.
Manages GKE TPU clusters using xpk. Creates, deletes, and lists TPU node pool resources on Google Kubernetes Engine. Multi-user safe - always queries GKE for real-time cluster state.
Use when implementation is complete, all tests pass, and you need to decide how to integrate the work - guides completion of development work by presenting structured options for merge, PR, or cleanup
Manage GKE-based TPU workloads — create pods/jobs via kubectl, sync code, and run multi-process benchmarks. Use when the user wants to create/manage/run TPU workloads on GKE. Reads config from gke.toml in the current working directory.
Check and fix lint issues for changed Python files. Supports single commit, commit range, and unstaged/staged working tree changes. Use when the user wants to verify or fix lint compliance.
Use when receiving code review feedback, before implementing suggestions, especially if feedback seems unclear or technically questionable - requires technical rigor and verification, not performative agreement or blind implementation
Subagent prompt template for reviewing a Beaver design document (RFC) against five-dimension completeness, fact traceability (anti-hallucination), and internal consistency. Used by /beaver-design before pushing the Draft PR. Returns one of PASS / BLOCK with structured feedback.
Use when executing implementation plans with independent tasks in the current session
Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes
Use when implementing any feature or bugfix, before writing implementation code
Use when analyzing register-level pipeline scheduling for TPU v7x kernels. Trigger when the user asks about instruction-level pipeline analysis, VPR register pressure, data hazard detection (RAW/WAR/WAW), or optimal instruction ordering for TPU pipelines.
Use when starting feature work that needs isolation from current workspace or before executing implementation plans - creates isolated git worktrees with smart directory selection and safety verification
Use when starting any conversation - establishes how to find and use skills, requiring Skill tool invocation before ANY response including clarifying questions
Use when analyzing TPU/GPU profiling performance — XProf MCP tools for operator breakdowns, memory profiles, A/B comparisons; offline methodology for trace parsing, MFU calculation, HBM memory analysis, XLA flag auditing, communication bottleneck identification
Audit the decomposition of a size/L Beaver issue into sub-tasks. Checks coverage, atomicity (200 LOC limit), and test definitions. Trigger when the user wants to review task decomposition quality.
Create or claim a Beaver-tracked GitHub Issue with automatic status transitions and guardrail checks. Trigger when the user wants to create a GitHub issue, claim/start a task, or pick up work.
Generate a project health report with milestone progress, stale/overdue detection, blocking chains, and risk analysis. Trigger when the user asks about project status, progress, health, or risks.
Use when you have a spec or requirements for a multi-step task, before touching code
Use when analyzing theoretical TPU v7x performance for a mathematical formula or comparing kernel performance against theoretical bounds. Trigger when the user asks about TPU performance modeling, roofline analysis, data flow optimization, or tiling strategy.
Decompose a Beaver Goal into Tasks (size/L) or a Task into SubTasks (size/S), guided by a design doc. Trigger when the user wants to split, break down, or decompose an issue into sub-issues.
Use when you have a written implementation plan to execute in a separate session with review checkpoints
Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any success claims; evidence before assertions always
Use when facing 2+ independent tasks that can be worked on without shared state or sequential dependencies
Deploys a SkyPilot-managed TPU cluster on GKE. Automatically ensures the required node pool exists for the requested TPU type, creating one if necessary. Supports running multiple TPU types in parallel on the same GKE cluster.
Write and submit a design document for a Beaver size/L issue in status/design-pending. Trigger when the user wants to write a design doc, start design review, or work on a design-pending issue.
Internal engine for Beaver commands. DO NOT trigger directly. Provides state machine rules, guardrail checks, field operations, and project config reading used by beaver-create, beaver-design, beaver-decompose, beaver-dev, beaver-pr, beaver-tracker, beaver-focus, and beaver-setup.
Show your personal Beaver work status: today's tasks, pending reviews, blockers, and deadline (DDL) warnings with priority recommendations. Trigger when the user asks about their tasks, what to work on, or personal status.
Commit, push, and open a PR with Beaver compliance checks (LOC guard, label completeness, test evidence). Automatically transitions the linked Issue to review-needed. Trigger when the user wants to commit, push, or create a pull request.
Records the complete session content and logs it to a daily work directory with a dynamic filename based on the active CLI agent. Use this for automated progress tracking and documentation.
Executes Python scripts, tests, or benchmarks on a provisioned remote cluster (GPU or TPU) using SkyPilot. Use this skill when the user asks to run code on GPU, TPU, or any "remote" cluster.
Use when completing tasks, implementing major features, or before merging to verify work meets requirements
Mine local Claude/Codex session history to produce a structured work recap for the past 1-7 days, with optional sync to GitHub Issues. Trigger when the user asks to summarize their recent work, generate a daily/weekly report, or see what they solved, researched, reviewed, or were blocked on. Default range is 1 day.
Use when reading TPU pretraining profiles (xplane.pb, trace.json.gz) — describes the on-disk layout, the XSpace/XPlane/XLine/XEvent/XStat hierarchy, and provides reference scripts that future tpu-perf skills can read as schema documentation.
Use when analyzing TPU pretraining HBM occupancy from a profile directory — locates the static HBM peak (the same number TensorBoard's Memory Viewer shows), enumerates every buffer alive at the peak schedule moment with size / HLO instruction / opcode / op_name, and rolls the alive set up by opcode and op_name. Reads compile-time `*.hlo_proto.pb` (BufferAssignmentProto) as the primary source; runtime `*.xplane.pb` allocator events are a secondary, often-truncated signal.
Use when analyzing TPU pretraining compute efficiency from xplane.pb — produces source-line-aggregated HLO duration tables, layer-scoped breakdowns, non-compute (padding/cast/copy) audits, and v7x roofline shortfall vs theoretical peak. Reads schema documented by profile-anatomy.
---
name: comm-analysis
description: Use when analyzing communication on a TPU pretraining profile. Extracts every comm primitive (async + sync, TC + SparseCore), attributes axes via HLO replica_groups, computes per-row NCCL bus BW vs per-axis peak ICI BW (peak_link × k_torus_dims × directions_per_dim; TPUv7x: 200 GB/s bidir per link on a 3D torus; util% requires `--mesh-spec` with topology), and reports per-step compute/comm overlap. Builds on profile-anatomy.
---

# Communication Analysis

**