brutal-idea-eval/SKILL.md
Comprehensive IDEA evaluation combining Reality's Moat scar tissue framework, Revenue Reality Check, VC Power-Law filter, and structured deep research pipeline. Point it at an IDEA.md for end-to-end analysis: defensibility ratio, revenue viability, VC scalability, validated research, and synthesized verdict with pivot recommendations.
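Comprehensive IDEA evaluation engine. Evaluates a single IDEA.md end-to-end using the Reality's Moat scar tissue framework, the VC Power-Law filter, the Revenue Reality Check, and a structured deep research pipeline.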
Produces a validated brutal reality analysis, an optimistic opportunity lens, a structured deep research report, a final synthesized verdict, and a pivot recommendation.
Agent assumptions (applies to all agents and subagents):
When pointed at an IDEA.md, execute these phases in order:
- Phase 2: phase-2-moat.md
- Phase 3: phase-3-vc.md
- Phase 4: phase-4-revenue.md
- Phase 5: phase-5-verdict.md
- Phase 6: outline.yaml, fields.yaml, results/*.json, report.md
- Phase 7: verdict.md

Each phase builds on the previous. Do not skip phases.
Every phase writes its output to a file in the session directory and updates state.yaml before moving to the next phase. This enables resume from zero context at any point.
workspace/idea-eval/IE-<NNNN>-<slug>/
├── state.yaml # Phase completion tracking (source of truth for resume)
├── phase-2-moat.md # Phase 2 output
├── phase-3-vc.md # Phase 3 output
├── phase-4-revenue.md # Phase 4 output
├── phase-5-verdict.md # Phase 5 output (includes claims catalog)
├── outline.yaml # Phase 6 research outline
├── fields.yaml # Phase 6 field definitions
├── progress.yaml # Phase 6 research progress
├── results/ # Phase 6 research results (one JSON per item)
├── generate_report.py # Phase 6 report script
├── report.md # Phase 6 deep research report
└── verdict.md # Phase 7 final synthesis
session: "IE-<NNNN>-<slug>"
idea_source: "<absolute path to IDEA.md>"
created: "<YYYY-MM-DD>"
phases:
  phase_1_context: completed
  phase_2_moat: completed
  phase_3_vc: in_progress
  phase_4_revenue: pending
  phase_5_verdict: pending
  phase_6_research: pending
  phase_6_report: pending
  phase_7_synthesis: pending
Rules:
- Update state.yaml AFTER writing the phase output file, not before.
- A phase is completed only when its output file exists AND state.yaml says so.
- in_progress means the phase was started but not finished (crash recovery target).
- pending means not yet started.

Before starting a new session, check if the user provided a path to an existing session directory or if one can be auto-detected.
Auto-detection: Check for existing workspace/idea-eval/ directories. If the user provides a session path or says "resume," read state.yaml from that directory and skip to the Resume Logic section at the bottom of this document.
If no resume is detected, proceed with a fresh session.
Read IDEA.md from the project root or path provided by user.
Extract and catalog:
If IDEA.md is missing or empty, stop and ask the user to provide one.
date +%Y-%m-%d
Find the highest existing session number:
ls workspace/idea-eval/ 2>/dev/null | grep -oE '^IE-[0-9]{4}' | grep -oE '[0-9]{4}$' | sort -rn | head -1
New session number = highest + 1. If none exist, start at 0001.
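The numbering rule can be sketched in Python as follows. This is a minimal illustration, not part of the skill itself: the function name `next_session_number` is hypothetical, and it assumes session directories follow the IE-<NNNN>-<slug> naming shown above.

```python
import re


def next_session_number(dir_names):
    """Return the next zero-padded session number for a list of directory names."""
    numbers = []
    for name in dir_names:
        # Only count names that look like IE-<NNNN>-<slug>.
        match = re.match(r"IE-(\d{4})-", name)
        if match:
            numbers.append(int(match.group(1)))
    # With no existing sessions, max() falls back to 0, so the result is 0001.
    highest = max(numbers, default=0)
    return f"{highest + 1:04d}"
```

The pattern is anchored to the IE- prefix so that a four-digit run inside a slug (a year, for example) is not mistaken for a session number.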
Generate from the idea name:
Session directory: workspace/idea-eval/IE-<NNNN>-<slug>/
Create the directory immediately:
mkdir -p workspace/idea-eval/IE-<NNNN>-<slug>/results
Write initial state.yaml:
session: "IE-<NNNN>-<slug>"
idea_source: "<absolute path to IDEA.md>"
created: "<YYYY-MM-DD>"
phases:
  phase_1_context: completed
  phase_2_moat: pending
  phase_3_vc: pending
  phase_4_revenue: pending
  phase_5_verdict: pending
  phase_6_research: pending
  phase_6_report: pending
  phase_7_synthesis: pending
Phase 1 is marked completed because IDEA.md has been read and the session is initialized.
defensibility = scar_tissue / specifiable_code
Most ideas are majority specifiable code. Assume low ratio unless proven otherwise.
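As a worked illustration of the formula, the sketch below computes the ratio from two estimated shares. It assumes both piles are expressed in the same unit (for example, percentage points of the idea); the function name is hypothetical and the skill does not prescribe numeric thresholds.

```python
def defensibility_ratio(scar_tissue, specifiable_code):
    """Compute defensibility = scar_tissue / specifiable_code."""
    if specifiable_code <= 0:
        # An idea with no specifiable code is outside what this sketch handles.
        raise ValueError("specifiable_code must be positive")
    return scar_tissue / specifiable_code
```

For an idea estimated at 10% scar tissue and 90% specifiable code, `defensibility_ratio(10, 90)` is roughly 0.11, which is the kind of low ratio the text says to assume by default.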
Split the idea into two piles.
APIs, dashboards, auth, billing logic, SDKs, ML training loops, UI components, orchestration layers, data pipelines, CRUD systems.
If AI can reproduce it from a description, it belongs here. Be generous — most software is specifiable.
Knowledge earned only by acting in the system:
Ask: Could a well-funded competitor converge on this knowledge using public information? If yes, it is not scar tissue — it is a stockpile.
State the ratio honestly:
Run in order. Each is a gate.
Can equivalent knowledge be generated by watching data?
Key test: Does the company need to do something in the real world (submit, file, send, execute, process) to learn, or can it learn by watching data?
If the answer is "AI can learn this from data," the moat dissolves. Say so clearly.
Moats compound only in non-stationary systems. State the direction.
Does the system adapt against what you know?
Does operating for more customers generate combinatorial knowledge?
Crossing streams create compounding moats. Independent streams create stockpiles.
If a well-funded competitor started today with the best AI available, how long before they match your operational knowledge?
Be concrete about what the competitor would need to rebuild and how long it would take.
Is the system you are earning scar tissue in going to keep existing?
Blockbuster had decades of scar tissue from physical retail — none of it transferred to streaming. The knowledge was real but the system was replaced.
Ask: Is the world still changing within this system, or is it about to be replaced by a different system entirely?
If there is a plausible system replacement on the horizon, flag it. Scar tissue in a dying system is a liability, not an asset.
Write the complete Phase 2 analysis to phase-2-moat.md in the session directory. Include all five steps: ratio, survival questions, interaction effects, competitive compression, and system replacement risk.
Then update state.yaml: set phase_2_moat: completed.
After moat analysis, apply these filters. These determine whether the idea is investable at scale, separate from whether it is defensible.
Can this plausibly reach $500M+ ARR?
Evaluate:
If structurally capped (geography, regulation, niche TAM), state the ceiling clearly. A $50M ARR ceiling is still a real business — just not a venture-scale one.
Do customers have to pass through this layer? Or can they route around it?
Does replacing you require:
If switching is easy (export data, plug in competitor), control is weak. If switching requires re-earning scar tissue, control is strong.
Does distribution compound automatically? Or does growth require linear outbound effort forever?
Evaluate:
Distribution leverage often matters more than scar tissue. A strong moat with no distribution compounds slowly. A weak moat with strong distribution can reach revenue fast.
Write the complete Phase 3 analysis to phase-3-vc.md in the session directory. Include all three tests: ceiling, inevitable touchpoint, and distribution asymmetry.
Then update state.yaml: set phase_3_vc: completed.
This axis is separate from defensibility. A low-ratio idea that hits $10k MRR fast is a legitimate strategy: get revenue flowing, use it to fund the search for scar tissue, and build defensibility while customers are already paying. Do not use this phase to dismiss or filter out ideas — use it to add information.
Work backward from $10,000/month at three price points:
State which price point is most realistic for this idea and why.
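The backward calculation is simple division, rounded up. The sketch below shows it for three price points; the specific prices ($50, $500, $5,000 per month) are illustrative assumptions, not values taken from this document.

```python
import math

TARGET_MRR = 10_000  # the $10,000/month target from this phase

# Hypothetical monthly price points, chosen only to span low, mid, and high tiers.
EXAMPLE_PRICES = (50, 500, 5_000)


def customers_needed(monthly_price, target_mrr=TARGET_MRR):
    """Number of paying customers required to reach the target at a given price."""
    if monthly_price <= 0:
        raise ValueError("monthly_price must be positive")
    # Round up: a fraction of a customer does not pay.
    return math.ceil(target_mrr / monthly_price)


def revenue_table(prices=EXAMPLE_PRICES):
    """Map each price point to the customer count it implies."""
    return {price: customers_needed(price) for price in prices}
```

Under these assumed prices the table comes out to 200, 20, and 2 customers, which makes the trade-off between volume and deal size concrete.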
How long from "product is built" to "first dollar received"? Map the steps:
Build -> Find customer -> Get attention -> Demo/trial -> Close -> Payment received
If time-to-first-dollar exceeds 3 months, the 6-month target is at serious risk.
What is the realistic sales cycle at the identified price point? How many complete deal cycles fit in 6 months? Subtract onboarding time, pilot periods, and procurement delays. Does the math still work?
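One way to run the sales-cycle arithmetic is sketched below. It assumes the delays (onboarding, pilots, procurement) are subtracted once from the 6-month window, and that the window is about 182 days; both are interpretations, and the function name is hypothetical.

```python
def deal_cycles_in_window(cycle_days, overhead_days=0, window_days=182):
    """Count complete sequential deal cycles that fit in the window."""
    if cycle_days <= 0:
        raise ValueError("cycle_days must be positive")
    # Days left for selling after onboarding, pilot, and procurement delays.
    usable_days = max(window_days - overhead_days, 0)
    # Integer division: only complete cycles count.
    return usable_days // cycle_days
```

With a 45-day cycle and no delays, four cycles fit; with 60 days of delays, only two do. If delays recur per deal rather than once, add them to `cycle_days` instead.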
Which channel can the founder realistically use from day one?
State which channel is most realistic and what that implies for the ramp.
Three key signals:
If nobody is paying or solving manually, willingness-to-pay risk is high. State this clearly.
Which pattern describes the most likely revenue trajectory?
State the shape and whether it can reach $10k MRR within the 6-month window.
State the single biggest risk to hitting $10k MRR in 6 months.
If the idea is low-ratio on defensibility but scores well on revenue, call this out explicitly as a viable path. Revenue buys time and funding to discover where the scar tissue lives. The strategic question the founder must answer: "What operational knowledge will you accumulate while serving these customers that a competitor cannot?" If they have a credible answer, revenue-first is not a consolation prize — it is a strategy.
Write the complete Phase 4 analysis to phase-4-revenue.md in the session directory. Include all six sub-questions (R1-R6), the revenue verdict, and the revenue-first strategy note if applicable.
Then update state.yaml: set phase_4_revenue: completed.
Before launching deep research, synthesize what you know into an initial verdict. This frames the research questions.
**Ratio:** [High / Medium / Low] — X% specifiable, Y% scar tissue
**Volatility:** [Converging / Stable / Increasing / Extreme] — direction of system complexity
**Interventional:** [Yes / Partial / No] — does operating generate knowledge that watching cannot?
**Adversarial:** [Yes / No] — does the system fight back?
**Interaction effects:** [Strong / Weak / None] — do customer streams cross?
**Time to scar tissue:** [Immediate / Months / Years / Never] — how fast can you start accumulating?
**System replacement risk:** [Low / Medium / High] — could the whole domain get disrupted?
**VC ceiling:** [$XB / $XM / Capped] — maximum plausible ARR
**Inevitable touchpoint:** [Strong / Weak / None] — can customers route around you?
**Distribution asymmetry:** [Compounding / Linear / None] — does growth feed itself?
**$10k MRR in 6 months:** [Likely / Possible / Unlikely / Near-impossible] — [biggest risk or enabler]
Choose the most honest characterization:
List every factual claim from IDEA.md that the deep research phase should verify:
These become research items for Phase 6.
Write the complete Phase 5 analysis to phase-5-verdict.md in the session directory. Include:
Then update state.yaml: set phase_5_verdict: completed.
This phase validates claims from IDEA.md using the deep research engine. It runs fully autonomously with resume support.
On resume, re-read prior phase outputs to reconstruct context:
- IDEA.md from path stored in state.yaml → idea_source
- phase-5-verdict.md → get claims-to-validate list

The session directory already exists (created in Phase 1).
Update state.yaml: set phase_6_research: in_progress.
Using the claims catalog from Phase 5 (or phase-5-verdict.md on resume), generate:
Research objects derived from IDEA.md claims:
Each item includes:
- name: Item name
- category: Classification
- description: What this item validates and why it matters

Per category, define research fields:
- market_size, cagr, data_source
- incumbent_players, funding_raised, market_share
- switching_costs, integration_depth
- regulatory_barriers, compliance_timeline
- moat_indicators, compression_estimate
- uncertainty_flags

Proceed directly to web search supplement. Do not ask the user for confirmation.
Launch 1 web-search-agent (background) using the Task tool with model: sonnet and max_turns: 20.
Parameter Retrieval:
- {topic}: The idea name/domain from IDEA.md
- {YYYY-MM-DD}: Current date from Step 0.2
- {step1_output}: Complete output from Step 6.1 (items list + field framework)
- {time_range}: "Since 2024"

Hard Constraint: The following prompt must be strictly reproduced, only replacing variables in {xxx}. Do not modify structure or wording.
Prompt Template:
You are an elite internet researcher. Your task is to supplement an existing research framework with missing items and recommended fields.
## Research Methodology
Before searching, determine which search strategies apply to this topic. Use the appropriate strategies from the Search Strategy Reference below.
Get today's date first:
date +%Y-%m-%d
Generate 5-10 different search query variations to maximize coverage:
- Include technical terms, product names, and common variations
- Think of how different people might describe the same topic
- Use exact phrases in quotes for specific names
- Include version numbers and dates when relevant
## Information Gathering Standards
- Read beyond the first few results - valuable information is often buried
- Look for patterns across different sources
- Pay attention to dates to ensure relevance
- Note different approaches and their trade-offs
- Identify authoritative sources and experienced contributors
- Check for updated information or superseded approaches
- Verify across multiple sources when possible
## Task
Research topic: {topic}
Current date: {YYYY-MM-DD}
Based on the following initial framework, supplement latest items and recommended research fields.
## Existing Framework
{step1_output}
## Goals
1. Verify if existing items are missing important objects
2. Supplement items based on missing objects
3. Continue searching for {topic} related items within {time_range} and supplement
4. Supplement new fields
## Output Requirements
Return structured results directly (do not write files):
### Supplementary Items
- item_name: Brief explanation (why it should be added)
...
### Recommended Supplementary Fields
- field_name: Field description (why this dimension is needed)
...
### Sources
- [Source1](url1)
- [Source2](url2)
After the web search agent completes, merge findings with the initial framework:
After merging, immediately write:
outline.yaml:
topic: "<idea name>"
session: "IE-<NNNN>-<slug>"
created: "<YYYY-MM-DD>"
source: "IDEA.md"
items:
  - name: "<item name>"
    category: "<category>"
    description: "<description>"
  # ... more items
output_dir: "./results"
fields.yaml:
categories:
  <category_name>:
    fields:
      - name: "<field_name>"
        description: "<field description>"
        detail_level: "<brief|moderate|detailed>"
      # ... more fields
  # ... more categories
Read workspace/idea-eval/IE-<NNNN>-<slug>/outline.yaml to get items list.
Check for completed and in-progress results:
ls workspace/idea-eval/IE-<NNNN>-<slug>/results/*.json 2>/dev/null
ls workspace/idea-eval/IE-<NNNN>-<slug>/results/*.started 2>/dev/null
Determine item status:
- .json exists = completed (skip)
- .started exists but no .json = interrupted (re-research)

Calculate total remaining items. Display which will be researched vs skipped.
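The status rules can be expressed as a small file-existence check. This is a sketch under assumptions: the function name `classify_items` and the `not_started` label are hypothetical, and the slugs are taken to be the `<item_name_slug>` values used for result file names.

```python
from pathlib import Path


def classify_items(results_dir, slugs):
    """Classify each item slug by which marker files exist in results_dir."""
    results = Path(results_dir)
    status = {}
    for slug in slugs:
        if (results / f"{slug}.json").exists():
            # A result file means the item is done, even if a stale .started remains.
            status[slug] = "completed"
        elif (results / f"{slug}.started").exists():
            # Started but never finished: the agent was interrupted.
            status[slug] = "interrupted"
        else:
            status[slug] = "not_started"
    return status
```

Everything not marked `completed` is what remains to be researched, so the remaining count is the number of `interrupted` plus `not_started` entries.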
Launch remaining agents using the Task tool with model: sonnet, run_in_background: true, and max_turns: 25.
Batching strategy:
Each agent researches one item and outputs JSON.
Agent Prompt Template (per item):
Hard Constraint: The following prompt must be strictly reproduced, only replacing variables in {xxx}. Do not modify structure or wording.
You are an elite internet researcher specializing in finding relevant information across diverse online sources. Your expertise lies in creative search strategies, thorough investigation, and comprehensive compilation of findings.
## Progress Tracking
Before starting research, write a marker file to signal that this agent has started:
Write an empty file to {started_path}
After self-validation passes and the JSON result is confirmed correct, delete the marker file:
rm {started_path}
## Research Methodology
Get today's date first:
date +%Y-%m-%d
Generate 5-10 different search query variations to maximize coverage:
- Include technical terms, product names, and common variations
- Think of how different people might describe the same topic
- Use exact phrases in quotes for specific names
- Include version numbers and dates when relevant
### Search Strategy Reference
Use the following search strategies based on what is relevant to the research topic:
**General Web Strategy** (for broad information gathering):
Sources: Reddit, official documentation, blog posts, Hacker News, Dev.to, Medium, Discord, X/Twitter
- Look for official recommendations first
- Cross-reference with community consensus
- Find examples from production use
- Identify anti-patterns and common pitfalls
- Note evolving best practices
- Create structured comparisons with clear criteria
- Find real-world usage examples and case studies
- Look for performance benchmarks and user experiences
**Academic Papers Strategy** (for research, algorithms, scientific topics):
Sources: Google Scholar, arXiv, Hugging Face Papers, bioRxiv, ResearchGate, Semantic Scholar, ACM Digital Library, IEEE Xplore
- Use Google Scholar as primary source with advanced search operators
- Search by author names, paper titles, DOI numbers
- Include year ranges to find seminal works and recent publications
- Look for related papers and citation patterns
- Search for preprints on arXiv and bioRxiv
- Track citation networks to understand research evolution
## Information Gathering Standards
- Read beyond the first few results - valuable information is often buried
- Look for patterns in solutions across different sources
- Pay attention to dates to ensure relevance (note if information is outdated)
- Note different approaches and their trade-offs
- Identify authoritative sources and experienced contributors
- Verify information across multiple sources when possible
- Clearly indicate when information is speculative or unverified
## Task
Research {item_related_info}, output structured JSON to {output_path}
## Field Definitions
Read {fields_path} to get all field definitions
## Output Requirements
1. Output JSON according to fields defined in fields.yaml
2. Mark uncertain field values with [uncertain]
3. Add uncertain array at the end of JSON, listing all uncertain field names
4. All field values must be in English
## Self-Validation
After writing the JSON file, read it back and verify:
1. Every field defined in fields.yaml has a corresponding entry in the JSON
2. The JSON is valid (properly formatted)
3. All uncertain fields are listed in the uncertain array
If validation fails, fix the JSON and re-write it.
## Output Path
{output_path}
For each item:
- {item_related_info}: Item's complete YAML content (name + category + description)
- {output_path}: Absolute path to workspace/idea-eval/IE-<NNNN>-<slug>/results/<item_name_slug>.json
  - Slug: replace spaces with _, remove special characters
- {fields_path}: Absolute path to workspace/idea-eval/IE-<NNNN>-<slug>/fields.yaml
- {started_path}: Absolute path to workspace/idea-eval/IE-<NNNN>-<slug>/results/<item_name_slug>.started

Immediately after launching all agents, write progress.yaml:
status: in_progress
started: "<YYYY-MM-DD HH:MM>"
total_items: <N>
items:
  - name: "<Item Name>"
    slug: "<Item_Name>"
    status: pending
  # ... all items being researched
CRITICAL: Do NOT use TaskOutput to read agent results. Agent outputs are large and reading them into the orchestrator context will cause context window exhaustion. All results are persisted to disk as JSON — the orchestrator only needs to check file existence.
Polling loop — repeat until all items are resolved:
ls <session_path>/results/*.json 2>/dev/null | wc -l
ls <session_path>/results/*.started 2>/dev/null | wc -l
- completed = .json count, in_progress = .started without .json, remaining = total - completed
- If in_progress > 0, wait ~30 seconds (sleep 30), then poll again
- If in_progress == 0, exit loop

After loop completes:
- Update progress.yaml with final status per item

Read all completed JSON results and identify fields suitable for TOC display:
Auto-select the most informative summary fields for the TOC:
Generate generate_report.py in the session directory.
The script must handle:
1. JSON Structure Compatibility
Support two structures:
- Flat: {"name": "xxx", "release_date": "xxx"}
- Nested: {"basic_info": {"name": "xxx"}, "technical_features": {...}}

Field lookup order: Top level -> category mapping key -> traverse all nested dicts
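A possible shape for that lookup is sketched below. It is an illustration, not the actual generate_report.py: the names `lookup_field` and `category_key` are hypothetical, and "category mapping key" is interpreted here as the JSON key of the nested section that corresponds to the field's category.

```python
def lookup_field(record, field, category_key=None):
    """Find a field in a flat or nested result dict, or return None."""
    # 1. Top level (covers the flat structure).
    if field in record:
        return record[field]
    # 2. The section for the field's category, if one is known.
    if category_key is not None:
        section = record.get(category_key)
        if isinstance(section, dict) and field in section:
            return section[field]
    # 3. Fall back to a depth-first search through all nested dicts.
    return _search_nested(record, field)


def _search_nested(node, field):
    """Depth-first search for field in the dict values of node."""
    for value in node.values():
        if isinstance(value, dict):
            if field in value:
                return value[field]
            found = _search_nested(value, field)
            if found is not None:
                return found
    return None
```

For the nested example above, `lookup_field(result, "name", "basic_info")` resolves at step 2. A field that is present with a null value is indistinguishable from a missing one in this sketch.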
2. Complex Value Formatting
|<br> or blockquote
3. Extra Fields Collection
Collect fields in JSON but not in fields.yaml, put in "Other Info." Filter out internal fields (_source_file, uncertain).
4. Uncertain Value Skipping
Skip if:
- Field value contains [uncertain]
- Field name is listed in the uncertain array

5. Report Format
python workspace/idea-eval/IE-<NNNN>-<slug>/generate_report.py
After research agents complete: update state.yaml: set phase_6_research: completed.
After report is generated: update state.yaml: set phase_6_report: completed.
Update state.yaml: set phase_7_synthesis: in_progress.
On resume (zero context), re-read all prior outputs:
- IDEA.md from idea_source in state.yaml
- phase-2-moat.md: defensibility analysis
- phase-3-vc.md: VC filter results
- phase-4-revenue.md: revenue reality check
- phase-5-verdict.md: initial verdict and claims list
- report.md: deep research report (summary, not raw JSON)

Then read research results from workspace/idea-eval/IE-<NNNN>-<slug>/results/. Compare validated findings against IDEA.md claims. Note where claims were confirmed, corrected, or invalidated.
## [Idea Name]
**What it is:**
One sentence.
---
### Defensibility Analysis
**The Ratio:** X% scar tissue / Y% specifiable code
**Survival Questions:**
- Q1 (Interventional?): [Answer with specifics]
- Q2 (System changing?): [Answer with direction]
- Q3 (Adversarial?): [Answer with pattern name]
**Interaction Effects:** [How customer streams interact]
**Competitive Compression:** [How fast a funded competitor catches up]
**System Replacement Risk:** [Assessment]
---
### VC Filter
- **Ceiling:** [$XB / $XM / Capped] — [reasoning]
- **Inevitable Touchpoint:** [Strong / Weak / None] — [what makes switching hard or easy]
- **Distribution Asymmetry:** [Compounding / Linear / None] — [mechanism]
---
### Revenue Reality
- **Price tier:** [Low / Mid / High] — [amount] x [customers needed]
- **Time to first dollar:** [estimate]
- **Sales cycle fit:** [how many cycles in 6 months]
- **Distribution:** [most realistic channel]
- **Willingness to pay:** [must-have / nice-to-have / unproven]
- **Ramp shape:** [linear / step-function / exponential / front-loaded]
- **$10k MRR in 6 months:** [Likely / Possible / Unlikely / Near-impossible] — [biggest risk]
- **Revenue-first strategy:** [If applicable — how revenue buys time to find scar tissue]
---
### Deep Research Summary
- **Claims validated:** [list]
- **Claims corrected:** [list with corrections]
- **New risks discovered:** [list]
- **New opportunities discovered:** [list]
- **Areas of high uncertainty:** [list]
---
### Verdict Dashboard
**Ratio:** [High / Medium / Low] — X% specifiable, Y% scar tissue
**Volatility:** [Converging / Stable / Increasing / Extreme]
**Interventional:** [Yes / Partial / No]
**Adversarial:** [Yes / No]
**Interaction effects:** [Strong / Weak / None]
**Time to scar tissue:** [Immediate / Months / Years / Never]
**System replacement risk:** [Low / Medium / High]
**VC ceiling:** [$XB / $XM / Capped]
**Inevitable touchpoint:** [Strong / Weak / None]
**Distribution asymmetry:** [Compounding / Linear / None]
**$10k MRR in 6 months:** [Likely / Possible / Unlikely / Near-impossible]
### Final One-Line Verdict
[The most honest single-sentence characterization of this idea]
---
### The Pivot
[If the idea scores poorly on any axis, show the strongest alternative version.]
Consider how the pivot affects:
- **Moat:** Does the interventional version create deeper scar tissue?
- **Revenue speed:** Does it make $10k MRR easier or harder?
- **Distribution:** Does it unlock a better channel?
- **Compression:** Does it change the competitive timeline?
If the idea is observational, show the interventional version.
If the idea has weak distribution, show the version with stronger distribution.
If the idea has a low ceiling, show the version that unlocks a larger TAM.
Write the final synthesis to workspace/idea-eval/IE-<NNNN>-<slug>/verdict.md.
Update state.yaml: set phase_7_synthesis: completed.
Present:
## Idea Evaluation Complete: <idea name>
**Session**: workspace/idea-eval/IE-<NNNN>-<slug>/
### Output Files
- state.yaml — Phase completion tracking (resume source of truth)
- phase-2-moat.md — Defensibility analysis
- phase-3-vc.md — VC Power-Law filter
- phase-4-revenue.md — Revenue Reality Check
- phase-5-verdict.md — Initial verdict + claims catalog
- outline.yaml — Research outline and items list
- fields.yaml — Field definitions
- progress.yaml — Research execution progress
- results/ — JSON results per item (<count> files)
- generate_report.py — Report generation script
- report.md — Deep research report
- verdict.md — Final synthesized verdict
### Key Findings
- Defensibility: [one line]
- Revenue: [one line]
- VC scalability: [one line]
- Biggest risk: [one line]
- Recommended next step: [one line]
Reference these when you see them:
| Pattern | Example | Defensibility | Revenue |
|---------|---------|---------------|---------|
| The Stripe | Payments, ground station brokerage, claims processing | Durable compounding. Non-adversarial, interventional, crossing streams. | Slow ramp: enterprise sales cycles, compliance gates, long integration timelines. $10k MRR in 6 months unlikely without services wedge. |
| The Treadmill | Cybersecurity, trading alpha, SEO, ad fraud | Real scar tissue but adversarial. You run to stay in place. | Variable: cybersecurity sells on fear and closes fast; trading alpha monetizes immediately but is volatile; SEO/ad fraud often self-serve with fast ramp. |
| The Stockpile | Medical imaging datasets, content libraries, public market data | Observational. AI synthesizes equivalents. Moat dissolves. | Often fast: proven demand, self-serve, but race to bottom on price as competitors replicate. |
| The Experiment | Drug development, clinical trials, molecule screening | Path-dependent but rate-limited by experiment speed. Slow durable moat. | Very slow: regulatory timelines, long sales cycles, high ACV but few deals. $10k MRR in 6 months near-impossible without services revenue. |
| The CRM | Generic SaaS, dashboards, developer tools, auth libraries | Mostly specifiable code. Low ratio. Commoditized when building is free. | Often fast: proven demand, self-serve signup, low friction. But margins compress as competitors multiply. |
| The Blockbuster | Scar tissue in a system being replaced | Real knowledge in a dying domain. Liability, not asset. | May still be extractable short-term from slow-migrating incumbents. Declining market = declining revenue ceiling. |
Be honest, not reassuring. The goal is to save the founder time and money by identifying weak spots before they invest years. Frame honesty as respect.
Be specific, not generic. Never say "this could be commoditized." Say exactly what is specifiable, what is scar tissue, and how thin the scar tissue layer is.
Most ideas are low-ratio. That does not mean they cannot make money. Defensibility and revenue speed are orthogonal axes. A low-ratio business that hits $10k MRR fast is a legitimate strategy. Frame revenue-first paths as viable strategy, not consolation prize.
Scar tissue without distribution compounds slowly. A strong moat with no way to reach customers is an academic exercise.
Distribution without inevitability is fragile. Fast growth on a switchable product is a race you eventually lose.
Complexity is not a moat. A complex system that is fully specifiable is just expensive to build — and build cost is going to zero.
Ceiling matters. A $50M ARR business is real but not venture-scale. Be clear about which game the founder is playing.
Time to first intervention is critical. An idea where you can start accumulating scar tissue this week beats one that requires 12 months of approvals.
Founder-market fit matters. An idea with perfect scores but no connection to the founder's skills, network, or experience is worse than a slightly lower-scoring idea they can actually execute.
USD pricing. Always include USD alongside other currencies when discussing pricing or market sizes.
This skill supports resuming from zero context at any point. All state is on disk.
Any of these trigger resume mode:
Explicit: User args contain "resume" and a session path.
Example: /brutal-idea-eval resume workspace/idea-eval/IE-0001-ai-invoicing
Auto-detect: User invokes the skill and a session directory already exists for the same IDEA.md. Check workspace/idea-eval/ for directories whose state.yaml → idea_source matches the current IDEA.md path. If found and not completed, automatically resume from last incomplete phase. If completed, start a new session with the next number.
Read state.yaml from the session directory. This is the single source of truth.
Scan phases in order. Find the first phase that is NOT completed:
| state.yaml value | Resume action |
|---|---|
| phase_1_context: pending | Should not happen (state.yaml would not exist). Start fresh. |
| phase_2_moat: pending or in_progress | Read IDEA.md from idea_source. Run Phase 2 from start. |
| phase_3_vc: pending or in_progress | Read IDEA.md + phase-2-moat.md. Run Phase 3 from start. |
| phase_4_revenue: pending or in_progress | Read IDEA.md + phase-2-moat.md + phase-3-vc.md. Run Phase 4. |
| phase_5_verdict: pending or in_progress | Read IDEA.md + phases 2-4 files. Run Phase 5. |
| phase_6_research: pending | Read IDEA.md + phase-5-verdict.md. Run Phase 6 from start. |
| phase_6_research: in_progress | Read outline.yaml + fields.yaml. Check results/ for completed items. Resume research for remaining items only. |
| phase_6_research: completed, phase_6_report: pending or in_progress | Read outline.yaml + fields.yaml. Run report generation (Step 6.5+). |
| phase_6_report: completed, phase_7_synthesis: pending or in_progress | Read all phase files + report.md. Run Phase 7. |
| phase_7_synthesis: completed | Report "Session already complete." Present completion summary. |
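The scan described by the table reduces to finding the first phase whose value is not completed. A minimal sketch, assuming the `phases` mapping from state.yaml has already been parsed into a dict (YAML parsing is left out, and the function name is hypothetical):

```python
# Phase keys in execution order, as they appear in state.yaml.
PHASE_ORDER = [
    "phase_1_context",
    "phase_2_moat",
    "phase_3_vc",
    "phase_4_revenue",
    "phase_5_verdict",
    "phase_6_research",
    "phase_6_report",
    "phase_7_synthesis",
]


def first_incomplete_phase(phases):
    """Return the first phase key that is not completed, or None if all are."""
    for name in PHASE_ORDER:
        # A key missing from state.yaml is treated as pending (an assumption).
        if phases.get(name, "pending") != "completed":
            return name
    return None
```

A return value of None corresponds to the last table row: the session is already complete.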
For the resume point identified, read ONLY the files needed:
- state.yaml
- IDEA.md (from idea_source path)
- phase-2-moat.md
- phase-3-vc.md
- phase-4-revenue.md
- phase-5-verdict.md
- outline.yaml, fields.yaml, check results/*.json and results/*.started
- report.md (do NOT read raw result JSONs into context; use the report.md summary)

Jump to the identified phase and execute from there. All subsequent phases run normally, including their checkpoints and state updates.
If state.yaml is missing or corrupted, reconstruct state from filesystem:
phase_1_context: completed if the session directory exists (state.yaml is the file being rebuilt, so directory existence stands in as evidence that Phase 1 ran)
phase_2_moat: completed if phase-2-moat.md exists
phase_3_vc: completed if phase-3-vc.md exists
phase_4_revenue: completed if phase-4-revenue.md exists
phase_5_verdict: completed if phase-5-verdict.md exists
phase_6_research: completed if outline.yaml exists AND all items in outline.yaml have matching .json in results/
phase_6_report: completed if report.md exists
phase_7_synthesis: completed if verdict.md exists
Write a reconstructed state.yaml before proceeding.
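The reconstruction rules above can be sketched as a set of file-existence checks. This is an illustration under assumptions: the function name is hypothetical, the item slugs are passed in rather than parsed from outline.yaml, any phase that cannot be shown complete is marked pending, and serializing the result to state.yaml is left out.

```python
from pathlib import Path

# Phases whose completion is implied by a single output file.
PHASE_FILES = {
    "phase_2_moat": "phase-2-moat.md",
    "phase_3_vc": "phase-3-vc.md",
    "phase_4_revenue": "phase-4-revenue.md",
    "phase_5_verdict": "phase-5-verdict.md",
    "phase_6_report": "report.md",
    "phase_7_synthesis": "verdict.md",
}


def reconstruct_phases(session_dir, item_slugs):
    """Rebuild the phases mapping of state.yaml from files on disk."""
    session = Path(session_dir)

    def status(done):
        return "completed" if done else "pending"

    # Phase 1 is inferred from the session directory itself.
    phases = {"phase_1_context": status(session.is_dir())}
    for phase in ("phase_2_moat", "phase_3_vc", "phase_4_revenue", "phase_5_verdict"):
        phases[phase] = status((session / PHASE_FILES[phase]).exists())

    # Research is complete only if the outline exists and every item has a result.
    results = session / "results"
    all_results = all((results / f"{slug}.json").exists() for slug in item_slugs)
    phases["phase_6_research"] = status((session / "outline.yaml").exists() and all_results)

    for phase in ("phase_6_report", "phase_7_synthesis"):
        phases[phase] = status((session / PHASE_FILES[phase]).exists())
    return phases
```

The returned dict preserves phase order, so it can be scanned for the first incomplete phase in the same way as a state.yaml read from disk.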
You are not here to rubber-stamp ideas. You are not here to crush dreams. You are here to give the founder the clearest possible picture of what they are building — its strengths, its vulnerabilities, and its realistic path to revenue.
Every gap you miss is a blind spot that costs the founder months. Every false reassurance is a lie that costs them years.
Be direct. Be thorough. Be systematic. Be constructive.
Do not:
Do: