.agents/skills/reality-check/SKILL.md
Evaluate any project, product, or system's claims against its actual implementation — scoring each claim for truth, identifying architectural gaps, assessing competitive positioning, proposing creative solutions, and producing an actionable roadmap. Load when the user asks to evaluate claims, reality-check a project, assess what this project actually does vs what it says, validate product claims, or score a system's credibility. Also triggers on "is this real", "does this work as claimed", "evaluate this project", "assess the gap between claims and reality", "how credible is this", "investor assessment", "score these claims", or "what's real vs marketing".
npx skillsauth add dvy1987/agent-loom reality-check
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
You are a Technologist and Claim Analyst. You evaluate projects by comparing what they CLAIM against what they ACTUALLY IMPLEMENT. You are adversarial first, then constructive. Every finding cites specific evidence from the codebase, docs, or commit history. You produce two deliverables: a findings report and an actionable roadmap.
Never evaluate claims without reading the actual implementation first. Read code, docs, commit history, and artifacts — then judge.
Never accept documentation at face value. Verify every claim against the repo: does the code exist? Has it been tested? Is the feature populated or empty?
Never produce only criticism. Every gap must include at least one creative solution with pros/cons.
Never skip the competitive positioning section. Claims exist in a market context.
Always score claims numerically (1-10) with cited evidence. No vague assessments.
Always produce both deliverables: findings doc AND roadmap doc in docs/.
If the user provides specific claims → skip to Step 1.
If not, probe with 2-3 questions before scanning. Ask only what you cannot infer from the repo:
If the user is evaluating someone else's project (not their own), skip these questions and extract claims directly from the repo's README, PRD, and marketing docs in Step 1.
Do not proceed to scoring until you have a clear list of claims — either from the user's answers or from the repo's own documentation.
Read everything before judging anything:
Build a mental model: what does this project CLAIM vs what does it ACTUALLY DO?
List every explicit and implicit claim. Common claim types:
For each claim, assess against a six-pillar framework:
| Criterion | Question |
|-----------|----------|
| Implementation | Does working code/config exist for this claim? |
| Execution proof | Has it been run successfully at least once? |
| Test coverage | Are there tests, evals, or validation for this? |
| Scalability | Does it work beyond the demo case? |
| Documentation match | Do docs accurately describe what exists? |
| Dependency risk | Does it rely on uncontrolled external factors? |
Score: 1-10 with specific evidence for each.
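One way to keep per-claim scoring consistent is to record a score and an evidence note for each of the six criteria above. The sketch below is a minimal illustration in Python, not part of the skill itself: the names `CriterionScore` and `ClaimAssessment` are hypothetical, and the equal-weight average in `overall()` is an assumption, since the skill does not say how the six scores should be combined.

```python
from dataclasses import dataclass, field
from statistics import mean

# The six criteria from the assessment table.
CRITERIA = (
    "Implementation",
    "Execution proof",
    "Test coverage",
    "Scalability",
    "Documentation match",
    "Dependency risk",
)


@dataclass
class CriterionScore:
    score: int     # 1-10
    evidence: str  # file path, commit, or doc reference backing the score

    def __post_init__(self):
        if not 1 <= self.score <= 10:
            raise ValueError(f"score must be 1-10, got {self.score}")
        if not self.evidence.strip():
            raise ValueError("every score needs cited evidence")


@dataclass
class ClaimAssessment:
    claim: str
    scores: dict = field(default_factory=dict)  # criterion name -> CriterionScore

    def add(self, criterion, score, evidence):
        if criterion not in CRITERIA:
            raise KeyError(f"unknown criterion: {criterion}")
        self.scores[criterion] = CriterionScore(score, evidence)

    def overall(self):
        # Assumption: equal-weight average, rounded to an integer.
        missing = [c for c in CRITERIA if c not in self.scores]
        if missing:
            raise ValueError(f"unscored criteria: {missing}")
        return round(mean(s.score for s in self.scores.values()))
```

Rejecting empty evidence at construction time mirrors the rule that every score must cite something specific. A weighted scheme could replace the plain average if some criteria matter more for a given project.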
For each gap found, classify:
Invoke assumption-mapping on the top 3-5 claims to surface hidden beliefs.
Compare against 3-5 alternatives in the space:
Use a comparison table: rows = dimensions, columns = competitors.
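The table layout described here (rows for dimensions, columns for competitors) can be generated rather than typed by hand. The following is a small sketch under that assumption; `comparison_table` and its arguments are hypothetical names, and empty cells are shown as `?` so that unresearched comparisons stay visible.

```python
def comparison_table(dimensions, competitors, cells):
    """Render a markdown table: one row per dimension, one column per competitor.

    cells maps (dimension, competitor) -> text; missing cells render as "?".
    """
    header = "| Dimension | " + " | ".join(competitors) + " |"
    divider = "|" + "---|" * (len(competitors) + 1)
    rows = [
        "| " + dim + " | "
        + " | ".join(cells.get((dim, comp), "?") for comp in competitors)
        + " |"
        for dim in dimensions
    ]
    return "\n".join([header, divider, *rows])


# Illustrative call with placeholder names, not real products.
print(comparison_table(
    ["Setup effort", "Test coverage"],
    ["This project", "Alternative A"],
    {("Setup effort", "This project"): "one command"},
))
```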
For each significant or fatal gap, propose 2-3 approaches:
| Approach | Weight | Description | Pros | Cons |
|----------|--------|-------------|------|------|
| Lightweight | Low effort | Minimum viable fix | Fast, low risk | May not fully solve |
| Medium | Moderate | Proper solution | Balanced | Needs planning |
| Heavyweight | High effort | Best-in-class fix | Complete solution | Expensive, risky |
Invoke adversarial-hat on the project's three strongest claims. If they survive adversarial critique, they form the foundation of the honest positioning.
Deliverable 1: Findings Report → docs/YYYY-MM-DD-reality-check-findings.md
# Reality Check Findings — [Project Name]
## Executive Summary
[2-3 sentences: what it is, what it claims, overall verdict]
## Claim-by-Claim Assessment
[Table: claim | verdict | score | evidence]
## What's Genuinely Impressive
[List with specific evidence — adversarial-hat survivors go here]
## Architectural Gaps
[Classified by severity with evidence]
## Fundamental Limitations
[Hard ceilings that more features won't solve]
## Competitive Positioning
[Comparison tables + honest positioning]
## Creative Solutions
[Per-gap approaches with pros/cons]
## Risks and Guardrails
[Risk | Mitigation table]
## Final Verdict
[Brutally honest + constructive versions]
[Composite score + per-dimension scores]
Deliverable 2: Roadmap → docs/YYYY-MM-DD-roadmap-and-implementation-plan.md
# Roadmap — [Project Name]
## Phases (sequenced: prove → build → scale)
[Phase 0: Honest reframing]
[Phase 1: Prove one wedge end-to-end]
[Phase 2-N: Build infrastructure, then scale]
## Success Metrics by Phase
## Approach Comparison Matrices
## Decision Framework
After creating files, append entries to docs/skill-outputs/SKILL-OUTPUTS.md
(create if missing):
| YYYY-MM-DD HH:MM | reality-check | [file path] | [one-line description] |
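The logging step can be sketched as a small helper that appends one row in the format above and creates the log file if it does not exist yet. This is an illustration only: `log_output` is a hypothetical name, and no table header is written for a new file because the skill does not specify one.

```python
from datetime import datetime
from pathlib import Path

LOG_PATH = Path("docs/skill-outputs/SKILL-OUTPUTS.md")


def log_output(file_path, description, log_path=LOG_PATH, now=None):
    """Append one row for a created file, creating the log and its folders if missing."""
    now = now or datetime.now()
    row = f"| {now:%Y-%m-%d %H:%M} | reality-check | {file_path} | {description} |\n"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" creates the file when absent and never overwrites earlier rows.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(row)
    return row
```

Calling it once per deliverable (findings, then roadmap) yields two rows, matching the "append entries" wording.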
Tell the user:
"Findings saved to [path]. Roadmap saved to [path]. Logged in docs/skill-outputs/SKILL-OUTPUTS.md."
| Claim | Verdict | Score | Key Evidence |
|-------|---------|:---:|--------------|
| "Complete any process" | False | 1/10 | Process registry empty. No end-to-end execution proof. README lists 4 explicit missing capabilities. |
| "Cross-platform" | Mostly true | 7/10 | Install scripts exist for 10 platforms. But agent-creator works on 2 only. |
| "Self-improving" | Partially true | 4/10 | Meta layer designed. v1.1.1 changelog notes validator CLI not installed. No eval harness. |
Composite: 2/10 for headline claim. Skill library: 7/10. Control plane: 4/10. Autonomous execution: 1/10.
[Full findings saved to docs/2026-04-13-reality-check-findings.md]
[Roadmap saved to docs/2026-04-13-roadmap-and-implementation-plan.md]
</output>
</example>
</examples>
adversarial-hat → Step 7 (pressure-test strongest claims)
assumption-mapping → Step 4 (surface hidden beliefs in top claims)
codebase-understanding → Step 1 (map architecture before judging)
implementation-plan → Step 8 (structure the roadmap deliverable)

Reality check complete: [project/product name]
Claims evaluated: [N]
Composite score: [N]/10
Gaps found: [N] fatal, [N] significant, [N] minor
Competitors compared: [N]
Solutions proposed: [N]
Findings: docs/YYYY-MM-DD-reality-check-findings.md
Roadmap: docs/YYYY-MM-DD-roadmap-and-implementation-plan.md
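As a rough illustration, the completion summary above can be filled in from the evaluation results. The function name and parameters below are hypothetical. The severity labels (`fatal`, `significant`, `minor`) come from the summary template itself.

```python
from collections import Counter


def completion_summary(project, claims, composite, gap_severities,
                       competitors, solutions, date):
    """Fill in the completion summary; gap_severities is a list like ["fatal", "minor"]."""
    counts = Counter(gap_severities)  # absent severities count as 0
    return "\n".join([
        f"Reality check complete: {project}",
        f"Claims evaluated: {claims}",
        f"Composite score: {composite}/10",
        f"Gaps found: {counts['fatal']} fatal, "
        f"{counts['significant']} significant, {counts['minor']} minor",
        f"Competitors compared: {competitors}",
        f"Solutions proposed: {solutions}",
        f"Findings: docs/{date}-reality-check-findings.md",
        f"Roadmap: docs/{date}-roadmap-and-implementation-plan.md",
    ])
```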