skills/autopilot/SKILL.md
Autonomous plan-to-code pipeline. Stress-tests the plan (using grill-me), fixes issues found, creates a branch, executes step by step with guardrails, runs code review and QA, then hands back a shippable branch. Zero human interaction until the final handoff. Use when the user says 'autopilot', 'just build it', 'auto execute', 'implement this plan', 'take it from here', or wants to go from plan to working code without babysitting.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add vltansky/skills autopilot
Plan in, shippable branch out. No hand-holding in between.
Autopilot takes a reviewed or draft plan and runs seven phases autonomously:
/ship-it
Every intermediate decision is auto-resolved. The user sees the finished result.
These auto-answer every question that would normally go to the user:
Autopilot runs fully autonomous with one exception:
If the roast phase produces a NOT_READY verdict (score below 60) with unresolved high-severity issues, stop and present the findings to the user. The plan needs human judgment before execution — autopilot is not the right tool for a fundamentally broken plan.
For READY or READY_WITH_RISKS verdicts: continue autonomously.
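The gate above can be sketched as a small shell function. This is an illustration only: `roast_gate` is a hypothetical name, and two behaviours are assumptions not stated in the text, namely that a NOT_READY verdict with zero unresolved high-severity issues continues, and that an unrecognized verdict halts.

```shell
# roast_gate VERDICT [UNRESOLVED_HIGH]
# Prints "continue" or "halt"; returns 0 to continue, 1 to stop and hand back to the user.
# Hypothetical helper, not part of grill-me or autopilot.
roast_gate() {
  verdict=$1
  unresolved_high=${2:-0}
  case "$verdict" in
    READY|READY_WITH_RISKS)
      echo "continue"; return 0 ;;
    NOT_READY)
      if [ "$unresolved_high" -gt 0 ]; then
        echo "halt"; return 1
      fi
      # Assumption: NOT_READY with nothing high-severity left unresolved may proceed.
      echo "continue"; return 0 ;;
    *)
      # Assumption: treat an unknown verdict conservatively.
      echo "halt"; return 1 ;;
  esac
}
```

A caller would typically write `roast_gate "$VERDICT" "$HIGH_COUNT" || exit 1` so the pipeline stops before any branch is created.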
Run git status and git diff to understand the current state.
If not already on a feature branch, create one:
# Detect git user prefix from git config or CLAUDE.md conventions
GIT_USER=$(git config user.name 2>/dev/null | tr '[:upper:]' '[:lower:]' | tr ' ' '-')
Derive a branch name from the plan topic: {user-prefix}/{plan-topic}.
Use the branch prefix convention from CLAUDE.md if one exists.
git checkout -b <branch-name>
If already on a feature branch (not main/master/develop), stay on it.
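One possible way to derive the `{user-prefix}/{plan-topic}` name and to decide whether to stay on the current branch is sketched below. The helper names `slugify`, `branch_name`, and `is_feature_branch` are hypothetical, and the slug rules (lowercase, runs of non-alphanumerics collapsed to a single hyphen) are an assumption; a CLAUDE.md convention would take precedence.

```shell
# slugify TEXT: lowercase, collapse runs of non-alphanumerics to "-", trim edge hyphens.
slugify() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

# branch_name USER TOPIC: build "{user-prefix}/{plan-topic}".
branch_name() {
  printf '%s/%s\n' "$(slugify "$1")" "$(slugify "$2")"
}

# is_feature_branch NAME: succeeds unless NAME is main, master, or develop.
is_feature_branch() {
  case "$1" in
    main|master|develop) return 1 ;;
    *) return 0 ;;
  esac
}
```

For example, `branch_name "Jane Doe" "Add OAuth Login!"` yields `jane-doe/add-oauth-login`, which can then be passed to `git checkout -b` when `is_feature_branch` fails for the current branch.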
Break the plan into discrete implementation steps. Each step should be:
List the steps and move on. Do not ask for confirmation.
Load the grill-me skill from disk:
GRILL_PATH=""
for p in ~/.claude/skills/grill-me/SKILL.md ~/.claude/skills/*/grill-me/SKILL.md; do
[ -f "$p" ] && GRILL_PATH="$p" && break
done
echo "GRILL_PATH=${GRILL_PATH:-not_found}"
If found: read the file and follow its full methodology. If not found: run a lightweight adversarial review yourself (premise challenge, assumptions, feasibility, edge cases — same dimensions, less ceremony).
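The same lookup recurs for every skill autopilot loads, so it can be factored into one function. A minimal sketch, assuming the two path patterns shown above; `find_skill` and the `SKILLS_ROOT` override are hypothetical names introduced here so the lookup can be pointed at a fixture directory.

```shell
# find_skill NAME: print the first matching SKILL.md path, or return 1 if none exists.
# SKILLS_ROOT is a hypothetical override; it defaults to ~/.claude/skills.
find_skill() {
  name=$1
  root=${SKILLS_ROOT:-$HOME/.claude/skills}
  for p in "$root/$name/SKILL.md" "$root"/*/"$name"/SKILL.md; do
    if [ -f "$p" ]; then
      printf '%s\n' "$p"
      return 0
    fi
  done
  return 1
}

GRILL_PATH=$(find_skill grill-me) || GRILL_PATH=""
echo "GRILL_PATH=${GRILL_PATH:-not_found}"
```

An unmatched glob stays as a literal pattern, which the `-f` test rejects, so no special handling is needed for an empty skills directory.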
Grill-me is interactive by design. In autopilot mode, every interactive point is auto-resolved:
After the roast completes, check the verdict:
Take every finding from Phase 1 and apply it to the plan:
After fixing, re-extract plan steps if the structure changed (new steps, removed steps, reordered dependencies).
Write the updated plan back to the plan file (or note the changes in the decision log if there is no plan file on disk).
Emit a short transition summary:
Phase 1-2 complete. Roast score: [N]/100. Fixed [X] high, [Y] medium issues. [Z] items noted as known risks. Proceeding to execution with [N] steps.
Implement the fixed plan. Use parallel subagents when possible.
TDD_PATH=""
for p in ~/.claude/skills/tdd/SKILL.md ~/.claude/skills/*/tdd/SKILL.md; do
[ -f "$p" ] && TDD_PATH="$p" && break
done
DEBUG_PATH=""
for p in ~/.claude/skills/debug/SKILL.md ~/.claude/skills/*/debug/SKILL.md; do
[ -f "$p" ] && DEBUG_PATH="$p" && break
done
echo "TDD_PATH=${TDD_PATH:-not_found}"
echo "DEBUG_PATH=${DEBUG_PATH:-not_found}"
If TDD skill found: read it. Workers will follow TDD discipline (test first, then implement). If not found: workers write tests after implementation as a fallback.
If debug skill found: read its instrumentation approach for Phase 3 step 0b below.
Before writing any feature code, instrument the codebase for observability during execution:
# Find the project's logging convention
grep -r "console\.\|logger\.\|logging\." --include="*.ts" --include="*.js" --include="*.py" -l . | head -5
// #region autopilot-debug
console.log('[autopilot] functionName entry', { arg1, arg2 });
// #endregion autopilot-debug
chore: add autopilot debug instrumentation
This instrumentation serves two purposes:
The instrumentation is removed in Phase 6 (cleanup) after everything passes.
Group plan steps into layers based on dependencies:
Layer 0: [steps with no dependencies — can all run in parallel]
Layer 1: [steps that depend on Layer 0]
Layer 2: [steps that depend on Layer 1]
...
If all steps are independent (no shared files, no import dependencies between them), they are all Layer 0 — maximum parallelism.
If the plan is small (3 or fewer steps) or all steps touch the same files: skip parallelism, execute sequentially on the current branch.
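The layering rule can be sketched with a small awk program wrapped in a shell function. The input format is an assumption made for this example: one step per line, the step name first, followed by the names of the steps it depends on. `plan_layers` is a hypothetical helper.

```shell
# plan_layers: read "step [dep ...]" lines on stdin, print "LAYER STEP" lines.
# A step lands in layer 0 if it has no dependencies, otherwise one layer
# after its deepest dependency. Exits 1 on a cycle or an undeclared dependency.
plan_layers() {
  awk '
    NF == 0 { next }                       # ignore blank lines
    {
      n++
      steps[n] = $1
      ndeps[$1] = NF - 1
      for (i = 2; i <= NF; i++) deps[$1, i - 1] = $i
    }
    END {
      assigned = 0
      for (layer = 0; assigned < n; layer++) {
        progress = 0
        for (s = 1; s <= n; s++) {
          name = steps[s]
          if (name in level) continue      # already placed in an earlier layer
          ready = 1
          for (d = 1; d <= ndeps[name]; d++) {
            dep = deps[name, d]
            # A dependency must sit in a strictly earlier layer.
            if (!(dep in level) || level[dep] >= layer) ready = 0
          }
          if (ready) {
            level[name] = layer
            assigned++
            progress = 1
            print layer, name
          }
        }
        if (!progress) {
          print "unresolvable or cyclic dependency" > "/dev/stderr"
          exit 1
        }
      }
    }
  '
}
```

Feeding it the four lines `a`, `b`, `c a b`, `d c` prints `0 a`, `0 b`, `1 c`, `2 d`: two parallel steps, then two sequential layers.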
For each layer, launch subagents in parallel. Each subagent gets:
Implement the assigned step using TDD:
1. Write a failing test that defines the expected behavior.
   Match existing test patterns in the project. Run it and verify that it fails.
   (If no test infrastructure exists for this area, skip to step 2 and note it.)
2. Write the minimum implementation to make the test pass.
3. Run guardrails: [type check command], [test command], [lint command]
4. If guardrails fail: read the error. Check autopilot-debug logs for context.
   Fix and re-run. Max 2 retries. If still failing after retries, use the
   debug skill's hypothesis approach: generate 3 hypotheses, investigate each.
5. Commit with a descriptive message (not "autopilot step N").
   Stage specific files only; never use `git add .` or `git add -A`.
6. Report: list files changed, tests written, guardrail results (pass/fail),
   and any issues you could not resolve.
Sequential fallback: If the host does not support subagents, or if all steps have dependencies, execute sequentially yourself — same guardrail gate after each step.
Layer transitions: Wait for all subagents in a layer to complete before starting the next layer. If a subagent in Layer N fails, assess whether Layer N+1 steps depend on it — if yes, execute those sequentially yourself with the fix. If no, continue the next layer in parallel.
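The guardrail gate that runs after each step, with its limit of two retries, might look like the sketch below. `run_guardrails` and `gate_step` are hypothetical names, and the commands passed in are placeholders for the project's own type check, test, and lint commands. The point where a fix would be applied between attempts is marked with a comment rather than implemented.

```shell
# run_guardrails CMD...: run each command string in order, stop at the first failure.
run_guardrails() {
  for cmd in "$@"; do
    if ! sh -c "$cmd" >/dev/null 2>&1; then
      echo "FAIL: $cmd"
      return 1
    fi
  done
  echo "PASS"
}

# gate_step CMD...: one initial attempt plus at most 2 retries.
# Returns 1 once retries are exhausted, which is the cue to switch to the
# debug skill's hypothesis approach.
gate_step() {
  attempt=0
  while ! run_guardrails "$@"; do
    attempt=$((attempt + 1))
    if [ "$attempt" -gt 2 ]; then
      return 1
    fi
    # A fix based on the reported failure would be applied here before re-running.
  done
}
```

A call such as `gate_step "npm run typecheck" "npm test" "npm run lint"` shows the intended shape; those three commands are examples only, not something the skill prescribes.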
As soon as a layer completes, kick off review on that layer's diff in the background while the next layer executes:
This means review runs concurrently with execution of later layers. For single-layer plans, review runs after execution (no pipelining benefit).
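The pipelining idea maps onto ordinary shell background jobs. In the sketch below, `execute_layer` and `review_layer` are hypothetical stand-ins that only write log files; a real run would dispatch subagents there. What it shows is the ordering: review of layer N starts in the background while layer N+1 executes, and `wait` collects all reviews at the end.

```shell
# Hypothetical stand-ins for subagent dispatch; they only record that they ran.
execute_layer() { echo "executed layer $1" >> "$2/execute.log"; }
review_layer()  { echo "reviewed layer $1" > "$2/review-$1.log"; }

# run_pipelined OUTDIR LAYER...: execute layers in order, reviewing each
# completed layer in the background while the next one executes.
run_pipelined() {
  out=$1
  shift
  prev=""
  for layer in "$@"; do
    if [ -n "$prev" ]; then
      review_layer "$prev" "$out" &
    fi
    execute_layer "$layer" "$out"
    prev=$layer
  done
  # The last layer has nothing to overlap with, so its review runs after execution.
  if [ -n "$prev" ]; then
    review_layer "$prev" "$out" &
  fi
  wait   # block until every background review has finished
}
```

With a single layer the loop never starts a background review during execution, which matches the note above that single-layer plans get no pipelining benefit.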
After all layers complete, run the full validation suite:
Fix any integration issues introduced by combining parallel work.
Collect findings from pipelined review subagents (launched during Phase 3). If no pipelined reviews ran (sequential execution or no subagent support), run the full review now.
Load the simplify skill from disk if available:
SIMPLIFY_PATH=""
for p in ~/.claude/skills/simplify/SKILL.md ~/.claude/skills/*/simplify/SKILL.md; do
[ -f "$p" ] && SIMPLIFY_PATH="$p" && break
done
echo "SIMPLIFY_PATH=${SIMPLIFY_PATH:-not_found}"
If found: read it and follow its methodology (3 parallel review agents: code reuse, code quality, efficiency). If not found: run a lightweight self-review yourself covering the same three dimensions on the branch diff.
Merge findings from all review sources (pipelined + final). Deduplicate. Then for each finding:
refactor: [description of cleanup]
Detect if this is a web app:
# Check for web indicators
HAS_WEB=false
[ -f "next.config.js" ] || [ -f "next.config.ts" ] || [ -f "next.config.mjs" ] && HAS_WEB=true
[ -f "vite.config.ts" ] || [ -f "vite.config.js" ] && HAS_WEB=true
[ -f "angular.json" ] && HAS_WEB=true
grep -q '"start"' package.json 2>/dev/null && grep -qE '"(react|vue|svelte|next|nuxt|angular)"' package.json 2>/dev/null && HAS_WEB=true
echo "HAS_WEB=$HAS_WEB"
If not a web app: skip Phase 5, proceed to handoff.
If web app: load the QA skill from disk if available:
QA_PATH=""
for p in ~/.claude/skills/qa/SKILL.md ~/.claude/skills/*/qa/SKILL.md; do
[ -f "$p" ] && QA_PATH="$p" && break
done
echo "QA_PATH=${QA_PATH:-not_found}"
If found: read it and follow its methodology in diff-aware mode — only test pages affected by the branch diff, not the full app.
If not found: skip QA. Do not attempt browser testing without the QA skill — it
requires agent-browser setup and structured methodology.
Remove debug instrumentation added in Phase 3 Step 0b.
# Find all autopilot-debug regions
grep -rn "autopilot-debug" --include="*.ts" --include="*.tsx" --include="*.js" --include="*.jsx" --include="*.py" .
Remove every #region autopilot-debug / #endregion autopilot-debug block and the
code between them. Verify no functional code was accidentally wrapped in a debug region.
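Because every instrumented block sits between the two region markers, a sed address range is one way to strip them. This is a sketch under the assumption that the markers appear exactly as written above and are never nested; `strip_debug_regions` is a hypothetical helper that prints the cleaned file to stdout so the result can be reviewed before anything is overwritten.

```shell
# strip_debug_regions FILE: print FILE without autopilot-debug regions.
# Deletes from each "#region autopilot-debug" line through the next
# "#endregion autopilot-debug" line, inclusive.
strip_debug_regions() {
  sed '/#region autopilot-debug/,/#endregion autopilot-debug/d' "$1"
}

# count_debug_markers FILE: number of lines still mentioning autopilot-debug.
count_debug_markers() {
  grep -c "autopilot-debug" "$1" || true
}
```

Writing to a temporary file and diffing against the original before replacing it makes it easier to confirm that no functional code was inside a region.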
After removal:
chore: remove autopilot debug instrumentation
Present the result to the user:
## Autopilot Complete
### Branch
`{branch-name}` — [N] commits ([M] execution, [K] review fixes, [J] QA fixes)
### Pipeline
| Phase | Duration | Result |
|-------|----------|--------|
| Roast | ~Xm | [N]/100 → [M]/100 after fixes |
| Execute | ~Xm | [N/M] steps, [K] parallel / [J] sequential |
| Review | ~Xm | [N] found, [M] fixed |
| QA | ~Xm | [N]/100 health (or skipped) |
### Roast Summary
- Fixed: [X] high, [Y] medium severity issues
- Known risks: [list]
### Execution Summary
- Layers: [N] ([M] steps ran in parallel, [K] sequential)
- Guardrail failures during execution: [N] (all resolved / [X] unresolved)
### Review Summary
- Issues: [N] (code reuse: [X], quality: [Y], efficiency: [Z])
- Fixed: [N], Skipped: [M] (false positives or not worth it)
### QA Summary (if run)
- Health score: [N]/100
- Issues found: [N], Fixed: [M], Deferred: [K]
- Screenshots: `.context/qa-reports/`
### Decision Log
| # | Phase | Decision | Principle | Rationale |
|---|-------|----------|-----------|-----------|
| 1 | Roast | ... | ... | ... |
### Final Guardrails
- Types: pass/fail
- Tests: pass/fail ([N] passed, [M] failed)
- Lint: pass/fail
- Build: pass/fail
### What I'd flag for human review
[Anything that was borderline, surprising, or where the decision principles
produced a defensible but debatable choice]
Suggest next step based on results:
/ship-it