skills/grill-me/SKILL.md
Structured adversarial review that pushes back on a plan, challenges the premise, compares alternatives, and stress-tests the design until the main risks are explicit. Use when the user asks to "grill me", stress-test a plan, poke holes in an approach, challenge assumptions, pressure-test a design, or validate an early-stage idea before building ("I have an idea", "is this worth building", "grill me on this idea").
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf): npx skillsauth add vltansky/skills grill-me
Adversarial review for plans and designs. Find what breaks before implementation starts.
<HARD-GATE> Do NOT write code or begin implementation. The output is a stress-test report and readiness verdict, not an implementation. </HARD-GATE>
Subagents: three uses, all optional. If the host does not support subagents, the main agent handles everything directly.
Pre-scan (before grilling): Launch 1 background subagent to gather ammunition while the main agent reads the plan. The subagent searches for: existing code that overlaps with the plan, assumptions that contradict the codebase, and simpler alternatives already in the repo. If the host has code search tools that can search external repositories, also search for how other projects solve the same problem — real-world patterns and prior art make pushback concrete. Prefer octocode MCP tools (githubSearchCode, githubViewRepoStructure) when available; fall back to any GitHub search capability the host provides. Its brief feeds into the initial assessment.
During grilling: While the user answers the current question, launch a background subagent to prepare evidence for the next dimension. This includes targeted external research — use octocode MCP to search for how other projects handle the specific concern coming up next (e.g., "how do popular repos handle auth token refresh" before the Security dimension, "how did X library migrate from Y" before Feasibility). Concrete prior art mid-grill ("3 projects tried your approach and hit this wall") is stronger ammunition than generic pre-scan findings. Do not use subagents for scoring, user questions, or the blocking decision.
Outside voice (after verdict): Launch 1 subagent with fresh context. Give it the plan, a summary of the grill (questions asked, defenses given, unresolved items), and one instruction: "What did this review miss? What question should have been asked but wasn't?" Present findings. The user decides what to act on.
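The "During grilling" overlap can be sketched as below, assuming an asyncio-style host. `prepare_evidence` and `ask_user` are hypothetical placeholders, not real host or octocode MCP APIs. The sketch only shows the ordering: research for the next dimension starts before the agent waits on the user's answer.

```python
import asyncio


async def prepare_evidence(dimension: str) -> str:
    # Hypothetical stand-in for a background subagent doing targeted
    # prior-art research for the upcoming dimension.
    await asyncio.sleep(0.01)
    return f"prior-art notes for {dimension}"


async def ask_user(question: str) -> str:
    # Hypothetical stand-in for the host's ask-user tool; a real host
    # would block here until the user replies.
    await asyncio.sleep(0.05)
    return f"answer to: {question}"


async def grill_turn(question: str, next_dimension: str) -> tuple[str, str]:
    # Launch research for the next dimension first, so it runs while
    # the agent waits on the answer to the current question.
    evidence_task = asyncio.create_task(prepare_evidence(next_dimension))
    answer = await ask_user(question)
    evidence = await evidence_task
    return answer, evidence
```

Called as `asyncio.run(grill_turn("Who rotates the signing key?", "Security/Risk"))`, the evidence is usually ready by the time the answer arrives, so the next turn can open with it.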
Before challenging a plan, check what already exists:
Do not propose a custom solution until you have checked whether the runtime, framework, or repo already solves it. If first-principles reasoning contradicts conventional wisdom and the reasoning is strong, name it: Eureka: the usual approach is wrong here because ...
Before presenting anything to the user, gather context:
If there is no codebase (pure idea, early-stage plan): external research becomes more valuable — it may be the only evidence available. Shift weight toward Assumptions and Feasibility dimensions.
If a background subagent is available, delegate steps 3-6 to it while doing steps 1-2 yourself.
If the input is a raw idea rather than a formed plan — "I want to build X", "is this worth building", vague concept with no spec, doc, or structured approach — run this step to make the idea grillable. Skip if the user provided a plan, PRD, design doc, or structured proposal.
Ask these one at a time. Use the host's ask-user tool when available. Skip any question the user's initial prompt already answered.
If the user shows impatience ("just grill me", "skip the questions"): ask one more question (the most critical remaining one), then proceed. If they push back a second time, proceed immediately with what you have.
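The impatience rule is a small counter. A minimal sketch follows. The function name is hypothetical, and it assumes the remaining intake questions are already sorted most-critical first.

```python
def intake_questions_to_ask(remaining: list[str], impatience_signals: int) -> list[str]:
    """Return the idea-intake questions still worth asking.

    `remaining` is assumed (hypothetically) to be sorted most-critical first.
    """
    if impatience_signals <= 0:
        return remaining        # no pushback: keep asking, one at a time
    if impatience_signals == 1:
        return remaining[:1]    # first pushback: one more question, then proceed
    return []                   # second pushback: proceed immediately
```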
After this step, synthesize the answers into a working problem statement and proceed to the Initial Assessment. The answers feed directly into the Premise Challenge — they are the claims you will now stress-test.
Present a short assessment including pre-scan findings:
--- Stress-Test Assessment ---
Initial readiness: 58/100
Weakest areas: premise, feasibility, edge cases
Estimated questions: 8-14
Estimated time: 10-20 min
Pre-scan findings:
- [what the pre-scan found: overlap, broken assumptions, alternatives]
- [or: "No codebase context — heavier weight on assumptions and feasibility"]
Review order (weakest first):
Premise Challenge
Assumptions
Feasibility
Edge Cases
Security/Risk
Maintainability
Scope
Commands: done | skip | back | I don't know
The score is a calibrated estimate, not a measurement.
Premise Challenge (mandatory): before debating implementation, challenge whether the plan itself is the right move.
APPROACH A: Minimal path
Summary: ...
Effort: S/M/L
Risk: Low/Med/High
Pros: ...
Cons: ...
Reuses: ...
APPROACH B: Long-term path
Summary: ...
Effort: S/M/L
Risk: Low/Med/High
Pros: ...
Cons: ...
Reuses: ...
RECOMMENDATION: Choose [X] because ...
If the current plan is not the best path, say so directly and make the user defend it. Where useful, label each approach as tried-and-true, new-and-popular, or first-principles.
Run remaining dimensions weakest-first. For each:
Dimension: Feasibility
Score: 62 → 71
Issues: 1 high, 2 medium
Unresolved: 1
Next: Edge Cases
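One way to compute the review order is sketched below: Premise Challenge always comes first because it is mandatory, and the remaining dimensions are sorted weakest-first by current score. The function name and the score dictionary are illustrative assumptions. The skill does not say whether to re-sort after each re-score; an agent could simply call this again.

```python
PREMISE = "Premise Challenge"


def review_order(scores: dict[str, float]) -> list[str]:
    """Premise Challenge first, then remaining dimensions weakest-first.

    Ties keep the dict's insertion order because sorted() is stable.
    """
    remaining = sorted(
        (dim for dim in scores if dim != PREMISE),
        key=lambda dim: scores[dim],
    )
    return [PREMISE] + remaining
```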
If the user tries to exit early with score below 60:
Warning: readiness is below 60/100.
This plan is still likely to cause avoidable rework.
Stop anyway?
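The early-exit guard reduces to a threshold check. In this sketch (the function name is hypothetical), a score of exactly 60 does not trigger the warning, which matches "below 60".

```python
from typing import Optional

EXIT_THRESHOLD = 60  # readiness strictly below this triggers the warning


def early_exit_warning(score: int) -> Optional[str]:
    """Return the warning text if stopping now would be premature, else None."""
    if score >= EXIT_THRESHOLD:
        return None
    return (
        f"Warning: readiness is below {EXIT_THRESHOLD}/100.\n"
        "This plan is still likely to cause avoidable rework.\n"
        "Stop anyway?"
    )
```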
After all dimensions are scored, if the host supports subagents:
"Want an outside voice? A fresh reviewer can look for what this grill missed. Takes about a minute."
If yes: launch a subagent as described in the Subagents section. Present its findings under an "Outside Voice" header. The user decides what to act on.
If no, or if subagents are unavailable: skip and proceed to the report.
Produce the final report in chat. Persist to disk if possible (see Report section below).
After delivering the report, suggest the natural next step based on the verdict:
READY or READY_WITH_RISKS:
Offer one of two paths (pick the one that fits, or present both if ambiguous):
/rfc-research: use when the grill uncovered tradeoffs, competing approaches, or prior art worth formalizing. It fits technical decisions (architecture, library selection, system design) where a documented proposal will outlive this conversation.
NOT_READY: Do not suggest either path. The plan needs rework first; say what needs to change before it is worth formalizing or implementing.
For every question, use this structure. Only ask when the answer is both not obvious from context and important enough to change the recommendation or score.
RECOMMENDATION: Choose [X] because [one-line reason]
I don't know / skip / done / back
With each question, include a Concern level (high/medium/low) and a Confidence level (high/medium/low).
Use the host's ask-user tool when available. If not, present the same structure in chat and wait.
Batch questions (max 3) when they share the same dimension and premise. Prefer batching for premise challenge and low-medium severity probing. Prefer single questions for high-severity risks, vague answers, or controversial decisions.
If a background subagent can prepare evidence for the next batch, keep the current turn short and use its findings in the next turn.
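The batching rule can be sketched as a greedy pass over the queued questions. The `Question` fields are assumptions for illustration. The sketch only groups consecutive questions, and it does not model "vague answers", which depend on the user's previous reply.

```python
from dataclasses import dataclass

MAX_BATCH = 3


@dataclass
class Question:
    text: str
    dimension: str
    premise: str           # the claim the question probes
    severity: str          # "high" | "medium" | "low"
    controversial: bool = False


def must_ask_alone(q: Question) -> bool:
    # High-severity or controversial questions always get their own turn.
    return q.severity == "high" or q.controversial


def batch_questions(questions: list[Question]) -> list[list[Question]]:
    batches: list[list[Question]] = []
    for q in questions:
        last = batches[-1] if batches else None
        can_join = (
            last is not None
            and not must_ask_alone(q)
            and not must_ask_alone(last[0])
            and len(last) < MAX_BATCH
            and (last[0].dimension, last[0].premise) == (q.dimension, q.premise)
        )
        if can_join:
            last.append(q)
        else:
            batches.append([q])
    return batches
```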
Track readiness on a 100-point scale. Re-score after every dimension.
| Dimension | Weight |
|-----------|--------|
| Premise Challenge | 20 |
| Assumptions | 20 |
| Feasibility | 20 |
| Edge Cases | 15 |
| Security/Risk | 10 |
| Maintainability | 10 |
| Scope | 5 |
Do not inflate the score to be polite.
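The weight table gives each dimension's maximum points, not how those points are awarded. The sketch below assumes the agent estimates, for every dimension, the share (0.0 to 1.0) of its weight the plan currently earns, and refuses to score when any estimate is missing. The function name and the share-based model are assumptions, not part of the skill.

```python
WEIGHTS = {
    "Premise Challenge": 20,
    "Assumptions": 20,
    "Feasibility": 20,
    "Edge Cases": 15,
    "Security/Risk": 10,
    "Maintainability": 10,
    "Scope": 5,
}
assert sum(WEIGHTS.values()) == 100  # weights cover the full 100-point scale


def readiness(shares: dict[str, float]) -> int:
    """Weighted readiness on a 0-100 scale from per-dimension shares."""
    missing = WEIGHTS.keys() - shares.keys()
    if missing:
        raise ValueError(f"no estimate for: {sorted(missing)}")
    total = 0.0
    for dim, weight in WEIGHTS.items():
        share = shares[dim]
        if not 0.0 <= share <= 1.0:
            raise ValueError(f"{dim}: share {share} outside 0.0-1.0")
        total += weight * share
    return round(total)
```

For example, shares of 0.5, 0.75, 0.6, 0.4, 0.8, 0.7, and 1.0 (in table order) give 10 + 15 + 12 + 6 + 8 + 7 + 5 = 63.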
Report persistence: save to the first reasonable match:
- docs/grill-me/
- .ai/grill-me/
If none is appropriate, skip persistence and say so.
Filename: YYYY-MM-DD-<topic>-stress-test.md
# Stress-Test Report: <topic>
- Verdict: READY | READY_WITH_RISKS | NOT_READY
- Score: 78/100
- Questions asked: 11
- Dimensions covered: 7/7
- Chosen approach: Minimal path | Long-term path | Modified path
## Blast Radius
- Services: ...
- Teams: ...
- Data: ...
- Customers: ...
## Biggest Pushback
- ...
## High Severity
- [ ] ...
## Medium Severity
- [ ] ...
## Low Severity
- [ ] ...
## Unresolved
- [ ] ...
## Well-Defended
- [x] ...
## Outside Voice Findings
- ... (if run)
## Recommended Next Step
- ...
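The persistence rules above can be sketched as follows. Two readings are assumptions: "first reasonable match" is taken to mean "first candidate directory that already exists", and the topic is slugified by lowercasing and hyphenating. A `None` result means skip persistence and tell the user.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional

# Candidate directories, in priority order.
CANDIDATE_DIRS = ("docs/grill-me", ".ai/grill-me")


def slugify(topic: str) -> str:
    # Lowercase, collapse runs of non-alphanumerics into single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return slug or "untitled"


def report_filename(topic: str, on: Optional[date] = None) -> str:
    # YYYY-MM-DD-<topic>-stress-test.md
    day = on or date.today()
    return f"{day.isoformat()}-{slugify(topic)}-stress-test.md"


def report_path(repo_root: str, topic: str, on: Optional[date] = None) -> Optional[Path]:
    for rel in CANDIDATE_DIRS:
        directory = Path(repo_root) / rel
        if directory.is_dir():
            return directory / report_filename(topic, on)
    return None  # no reasonable location: skip persistence and say so
```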
Direct, skeptical, concrete. Push back on weak framing. No passive consultant tone. Name the flaw, why it matters, and what to do next. Prefer "I don't buy this yet because..." over soft hedging when the plan is weak.
The user should feel challenged, not stonewalled. Pressure-test the idea, then leave them with a sharper plan.