
Handle feedback from PR reviewers. For each inline thread: reply on-thread and resolve after user approves. Use when developers left comments on your PR and you need to address them. Triggers on 'address comments', 'fix PR feedback', 'what did the reviewer say', 'handle review', 'resolve comments', 'comments from dev', 'dev feedback'.
Parallel work orchestration — decompose large changes into 5-30 worktree agents that each open a PR. Use when the user says 'batch', 'do this in parallel', 'split into PRs', 'bulk change', 'mass refactor', or wants a sweeping mechanical change across many files.
Weekly engineering retrospective. Analyzes commit history, work patterns, and code quality metrics with persistent trend tracking. Team-aware: per-person contributions with praise and growth areas. Use when asked to 'weekly retro', 'what did we ship', 'engineering retrospective', 'how was this week', or 'show me the commit stats'. Proactively suggest at the end of a work week or sprint.
Prepare a Hetzner Cloud VPS for secure Codex remote SSH access. Use when the user wants to create or configure a Hetzner server for Codex remote control, fix "No codex found in PATH" on a remote machine, install agent development tooling on a VPS, harden SSH access to a Hetzner server, or connect the server through Codex Settings, Connections, Add SSH.
This skill should be used when debugging frontend/UI bugs that need runtime evidence. USE THIS SKILL (instead of adding console.log) when you're about to say "add console.log and ask user to check", "open DevTools and tell me what you see", "reproduce the bug and share the output", "check the browser console". Triggers: "debug this", "fix this bug", "why isn't this working", "investigate this issue", "trace the problem", "figure out why X happens", "UI not updating", "state is wrong", "value is null/undefined", "click doesn't work", "modal not showing". Automates log collection server-side - you read logs directly, no user copy-paste needed.
Evaluate and optimize AGENTS.md/CLAUDE.md instruction files through A/B testing. Use when the user says "eval my agents.md", "test my claude.md", "optimize my instructions", "which rules matter", "trim my agents file", or wants to know which instructions in their global config actually change model behavior vs. are dead weight.
Research a technical topic and produce an RFC document backed by real code evidence from GitHub. Use when user says 'write an RFC', 'RFC research', 'create RFC for', 'technical proposal', 'design doc', 'investigate X', 'research X and write a proposal', 'architecture decision record', 'ADR', or needs a structured technical decision document with prior art analysis.
Autonomous plan-to-code pipeline. Stress-tests the plan (using grill-me), fixes issues found, creates a branch, executes step by step with guardrails, runs code review and QA, then hands back a shippable branch. Zero human interaction until the final handoff. Use when the user says 'autopilot', 'just build it', 'auto execute', 'implement this plan', 'take it from here', or wants to go from plan to working code without babysitting.
Designer's eye visual audit of a live site — finds typography issues, spacing violations, AI slop patterns, hierarchy problems, interaction state gaps — then fixes them with atomic commits and before/after screenshots. Use when asked to 'audit the design', 'visual QA', 'design polish', 'does this look good', 'check the UI', or 'fix the design'. Proactively suggest when the user mentions visual inconsistencies or wants to polish a live site before shipping.
You MUST use this before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores user intent, requirements and design before implementation.
Search previous AI chat conversations from Cursor IDE and Claude Code by content, affected file, or project. Use when the user asks about previous conversations, wants to find how they solved something before, or needs to recall past AI interactions.
Structured adversarial review that pushes back on a plan, challenges the premise, compares alternatives, and stress-tests the design until the main risks are explicit. Use when the user asks to "grill me", stress-test a plan, poke holes in an approach, challenge assumptions, pressure-test a design, or validate an early-stage idea before building ("I have an idea", "is this worth building", "grill me on this idea").
Systematically QA test a web application and fix bugs found. Runs browser-based testing, iteratively fixes bugs in source code, commits each fix atomically, and re-verifies. Use when asked to 'qa', 'QA', 'test this site', 'find bugs', 'test and fix', or 'fix what's broken'. Three tiers: Quick (critical/high only), Standard (+medium, default), Exhaustive (+cosmetic). Produces before/after health scores, fix evidence, and a ship-readiness summary. For report-only mode, use /qa-only.
Brutally honest AGENTS.md/CLAUDE.md review backed by real A/B test evidence. Not just opinions — actual proof that your rules are dead weight. Roasts instruction files for bloat, slop, and redundancy, then proves it by running evals. Use when user says "roast my agents.md", "roast my CLAUDE.md", "prove my rules are useless", "eval roast", or wants entertaining evidence-based feedback on their AI config files. Also triggers on "audit my instructions" or "are my rules helping".
Brutally honest code review with comedic flair. Roasts the sins, then redeems the sinner. Use when the user says "roast my code", "roast this", "tear this apart", "be brutal", "savage review", "destroy my code", "flame this", or wants entertaining but actionable code feedback. Also triggers on "what's wrong with this code" with a casual tone, "how bad is this", or "rate my code".
Create a GitHub PR with conventional format and AI session context. Use when user says 'create PR', 'open PR', 'submit changes', 'send to dev', 'ship it', or is done with their task.
Review changed code for reuse, quality, and efficiency, then fix any issues found. Use when the user says "simplify", "simplify this", "review changes", "clean up my code", "check for duplicates", "code reuse review", or wants a post-change quality sweep.
Test-driven development loop. Write failing test first, then implement to make it pass. Use when the user says 'tdd', 'test first', 'write the test first', 'failing test', 'red green refactor', or for any bug fix where the fix should be proven by a test. Also use when autopilot or other skills need test-first execution.
Summarize your GitHub activity from the last 24 hours across all repos. Use when user says "what did I do", "my activity", "standup", "recap", "summarize my day", "what-i-did", "git activity", "daily summary".
Systematic root-cause debugging for any bug — backend, frontend, logic, integration. Hypothesis-driven investigation with evidence collection. Use when the user says 'debug', 'investigate', 'why is this broken', 'root cause', 'trace this bug', 'figure out why', 'this doesn't work', or 'unexpected behavior'. For frontend-specific runtime debugging with a log server, use /debug-mode instead.