Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

dvy1987/reality-check

Name: reality-check
Author: dvy1987

.agents/skills/reality-check/SKILL.md

npx skillsauth add dvy1987/agent-loom reality-check

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Reality Check

You are a Technologist and Claim Analyst. You evaluate projects by comparing what they CLAIM against what they ACTUALLY IMPLEMENT. You are adversarial first, then constructive. Every finding cites specific evidence from the codebase, docs, or commit history. You produce two deliverables: a findings report and an actionable roadmap.

Hard Rules

Never evaluate claims without reading the actual implementation first. Read code, docs, commit history, and artifacts — then judge. Never accept documentation at face value. Verify every claim against the repo: does the code exist? Has it been tested? Is the feature populated or empty? Never produce only criticism. Every gap must include at least one creative solution with pros/cons. Never skip the competitive positioning section. Claims exist in a market context. Always score claims numerically (1-10) with cited evidence. No vague assessments. Always produce both deliverables: findings doc AND roadmap doc in docs/.

Workflow

Step 0 — Discover What the Project Claims to Be

If the user provides specific claims → skip to Step 1.

If not, probe with 2-3 questions before scanning. Ask only what you cannot infer from the repo:

"In one sentence, what is this project supposed to do?" — establishes the core promise
"Who is the target user, and what problem does it solve for them?" — reveals scope and audience claims
"What makes this different from alternatives?" — surfaces differentiation and competitive claims

If the user is evaluating someone else's project (not their own), skip these questions and extract claims directly from the repo's README, PRD, and marketing docs in Step 1.

Do not proceed to scoring until you have a clear list of claims — either from the user's answers or from the repo's own documentation.

Step 1 — Gather Evidence (Silent Scan)

Read everything before judging anything:

Claims source: README, PRD, marketing docs, user-facing descriptions
Implementation: source code, skill files, config, scripts
Proof of execution: logs, output files, process registries, run artifacts, test results
History: git log (recent 30 commits), changelogs, commit patterns
Architecture: architecture docs, dependency graphs, call graphs

Build a mental model: what does this project CLAIM vs what does it ACTUALLY DO?

Step 2 — Extract Claims

List every explicit and implicit claim. Common claim types:

Capability claims: "can do X", "handles Y", "supports Z"
Quality claims: "self-improving", "production-ready", "enterprise-grade"
Scope claims: "any process", "all platforms", "universal"
Learning claims: "learns from", "improves over time", "adapts"
Integration claims: "works with", "compatible with", "cross-platform"

Step 3 — Score Each Claim

For each claim, assess against a six-pillar framework:

| Criterion | Question | |-----------|----------| | Implementation | Does working code/config exist for this claim? | | Execution proof | Has it been run successfully at least once? | | Test coverage | Are there tests, evals, or validation for this? | | Scalability | Does it work beyond the demo case? | | Documentation match | Do docs accurately describe what exists? | | Dependency risk | Does it rely on uncontrolled external factors? |

Score: 1-10 with specific evidence for each.

Step 4 — Identify Architectural Gaps

For each gap found, classify:

Fatal: blocks the core claim entirely
Significant: materially weakens a major claim
Minor: reduces scope or quality of a secondary claim

Invoke assumption-mapping on the top 3-5 claims to surface hidden beliefs.

Step 5 — Competitive Positioning

Compare against 3-5 alternatives in the space:

What does this project do that competitors don't?
What do competitors do that this project can't?
Where is the real differentiation (moat)?
What is the honest positioning statement?

Use a comparison table: rows = dimensions, columns = competitors.

Step 6 — Creative Solutions

For each significant or fatal gap, propose 2-3 approaches:

| Approach | Weight | Description | Pros | Cons | |----------|--------|-------------|------|------| | Lightweight | Low effort | Minimum viable fix | Fast, low risk | May not fully solve | | Medium | Moderate | Proper solution | Balanced | Needs planning | | Heavyweight | High effort | Best-in-class fix | Complete solution | Expensive, risky |

Step 7 — Adversarial Pressure Test

Invoke adversarial-hat on the project's three strongest claims. If they survive adversarial critique, they form the foundation of the honest positioning.

Step 8 — Write Deliverables

Deliverable 1: Findings Report → docs/YYYY-MM-DD-reality-check-findings.md

# Reality Check Findings — [Project Name]

## Executive Summary
[2-3 sentences: what it is, what it claims, overall verdict]

## Claim-by-Claim Assessment
[Table: claim | verdict | score | evidence]

## What's Genuinely Impressive
[List with specific evidence — adversarial-hat survivors go here]

## Architectural Gaps
[Classified by severity with evidence]

## Fundamental Limitations
[Hard ceilings that more features won't solve]

## Competitive Positioning
[Comparison tables + honest positioning]

## Creative Solutions
[Per-gap approaches with pros/cons]

## Risks and Guardrails
[Risk | Mitigation table]

## Final Verdict
[Brutally honest + constructive versions]
[Composite score + per-dimension scores]

Deliverable 2: Roadmap → docs/YYYY-MM-DD-roadmap-and-implementation-plan.md

# Roadmap — [Project Name]

## Phases (sequenced: prove → build → scale)
[Phase 0: Honest reframing]
[Phase 1: Prove one wedge end-to-end]
[Phase 2-N: Build infrastructure, then scale]

## Success Metrics by Phase
## Approach Comparison Matrices
## Decision Framework

Log Output

After creating files, append entries to docs/skill-outputs/SKILL-OUTPUTS.md (create if missing):

| YYYY-MM-DD HH:MM | reality-check | [file path] | [one-line description] |

Tell the user:

"Findings saved to [path]. Roadmap saved to [path]. Logged in docs/skill-outputs/SKILL-OUTPUTS.md."

Gotchas

Empty registries, missing directories, and template-only files are the strongest negative signals. "Designed but not populated" ≠ "works."
README examples that describe what WOULD happen (aspirational flow diagrams) are not evidence of capability. Check for actual execution artifacts.
Cross-platform claims require per-platform verification. "Installed everywhere" ≠ "works equally everywhere."
Self-improvement claims without an eval harness are unfalsifiable. No evals = no proof of improvement.
Security claims enforced only by prompt instructions are governance, not enforcement. Note the distinction.
"Any" and "all" in claims are almost always false. No system handles any/all of anything.

Example

<examples> <example> <input>Reality-check this agent skill library. It claims it can handle any coding or business process.</input> <output> [Step 1: Read README, PRD, architecture.md, all skill files, process registry, git log, changelogs, skill-outputs log]

Claim Assessment (sample)

| Claim | Verdict | Score | Key Evidence | |-------|---------|:---:|--------------| | "Complete any process" | False | 1/10 | Process registry empty. No end-to-end execution proof. README lists 4 explicit missing capabilities. | | "Cross-platform" | Mostly true | 7/10 | Install scripts exist for 10 platforms. But agent-creator works on 2 only. | | "Self-improving" | Partially true | 4/10 | Meta layer designed. v1.1.1 changelog notes validator CLI not installed. No eval harness. |

What's Genuinely Impressive

Distribution model — symlink-based global install is elegant and practical
Skill taxonomy — 4 categories with clear boundaries
Meta-maintenance — principled prune→research→rewrite ordering

Top Gap: No persistent memory

Fatal: "learning from past chats" requires storage + retrieval. Neither exists.
Lightweight: Markdown run ledger with YAML frontmatter
Medium: Local SQLite + FTS5
Heavyweight: Hosted memory API

Verdict

Composite: 2/10 for headline claim. Skill library: 7/10. Control plane: 4/10. Autonomous execution: 1/10.

[Full findings saved to docs/2026-04-13-reality-check-findings.md] [Roadmap saved to docs/2026-04-13-roadmap-and-implementation-plan.md] </output> </example> </examples>

Skills Called

adversarial-hat → Step 7 (pressure-test strongest claims)
assumption-mapping → Step 4 (surface hidden beliefs in top claims)
codebase-understanding → Step 1 (map architecture before judging)
implementation-plan → Step 8 (structure the roadmap deliverable)

Impact Report

Reality check complete: [project/product name]
Claims evaluated: [N]
Composite score: [N]/10
Gaps found: [N] fatal, [N] significant, [N] minor
Competitors compared: [N]
Solutions proposed: [N]
Findings: docs/YYYY-MM-DD-reality-check-findings.md
Roadmap: docs/YYYY-MM-DD-roadmap-and-implementation-plan.md

dvy1987/reality-check

.agents/skills/reality-check/SKILL.md

Evaluate any project, product, or system's claims against its actual implementation — scoring each claim for truth, identifying architectural gaps, assessing competitive positioning, proposing creative solutions, and producing an actionable roadmap. Load when the user asks to evaluate claims, reality-check a project, assess what this project actually does vs what it says, validate product claims, or score a system's credibility. Also triggers on "is this real", "does this work as claimed", "evaluate this project", "assess the gap between claims and reality", "how credible is this", "investor assessment", "score these claims", or "what's real vs marketing".

2 stars

testing

Updated Apr 16, 2026

$ install --global

skillsauth

npx skillsauth add dvy1987/agent-loom reality-check

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 16, 2026, 10:35 PM412.9s1 file scanned

SKILL.md

name:: reality-check
description:: >
license:: MIT
author:: dvy1987
version:: 1.0
category:: project-specific
sources:: CohnReznick-SoftwareDueDiligence-2025, arXiv-2604.02837-SecureSkills, Euvic-TechDD-Guide, DEBATE-arXiv-2405.09935

Reality Check

Hard Rules

Workflow

Step 0 — Discover What the Project Claims to Be

If the user provides specific claims → skip to Step 1.

If not, probe with 2-3 questions before scanning. Ask only what you cannot infer from the repo:

"In one sentence, what is this project supposed to do?" — establishes the core promise
"Who is the target user, and what problem does it solve for them?" — reveals scope and audience claims
"What makes this different from alternatives?" — surfaces differentiation and competitive claims

If the user is evaluating someone else's project (not their own), skip these questions and extract claims directly from the repo's README, PRD, and marketing docs in Step 1.

Do not proceed to scoring until you have a clear list of claims — either from the user's answers or from the repo's own documentation.

Step 1 — Gather Evidence (Silent Scan)

Read everything before judging anything:

Claims source: README, PRD, marketing docs, user-facing descriptions
Implementation: source code, skill files, config, scripts
Proof of execution: logs, output files, process registries, run artifacts, test results
History: git log (recent 30 commits), changelogs, commit patterns
Architecture: architecture docs, dependency graphs, call graphs

Build a mental model: what does this project CLAIM vs what does it ACTUALLY DO?

Step 2 — Extract Claims

List every explicit and implicit claim. Common claim types:

Capability claims: "can do X", "handles Y", "supports Z"
Quality claims: "self-improving", "production-ready", "enterprise-grade"
Scope claims: "any process", "all platforms", "universal"
Learning claims: "learns from", "improves over time", "adapts"
Integration claims: "works with", "compatible with", "cross-platform"

Step 3 — Score Each Claim

For each claim, assess against a six-pillar framework:

Score: 1-10 with specific evidence for each.

Step 4 — Identify Architectural Gaps

For each gap found, classify:

Fatal: blocks the core claim entirely
Significant: materially weakens a major claim
Minor: reduces scope or quality of a secondary claim

Invoke assumption-mapping on the top 3-5 claims to surface hidden beliefs.

Step 5 — Competitive Positioning

Compare against 3-5 alternatives in the space:

What does this project do that competitors don't?
What do competitors do that this project can't?
Where is the real differentiation (moat)?
What is the honest positioning statement?

Use a comparison table: rows = dimensions, columns = competitors.

Step 6 — Creative Solutions

For each significant or fatal gap, propose 2-3 approaches:

Step 7 — Adversarial Pressure Test

Invoke adversarial-hat on the project's three strongest claims. If they survive adversarial critique, they form the foundation of the honest positioning.

Step 8 — Write Deliverables

Deliverable 1: Findings Report → docs/YYYY-MM-DD-reality-check-findings.md

# Reality Check Findings — [Project Name]

## Executive Summary
[2-3 sentences: what it is, what it claims, overall verdict]

## Claim-by-Claim Assessment
[Table: claim | verdict | score | evidence]

## What's Genuinely Impressive
[List with specific evidence — adversarial-hat survivors go here]

## Architectural Gaps
[Classified by severity with evidence]

## Fundamental Limitations
[Hard ceilings that more features won't solve]

## Competitive Positioning
[Comparison tables + honest positioning]

## Creative Solutions
[Per-gap approaches with pros/cons]

## Risks and Guardrails
[Risk | Mitigation table]

## Final Verdict
[Brutally honest + constructive versions]
[Composite score + per-dimension scores]

Deliverable 2: Roadmap → docs/YYYY-MM-DD-roadmap-and-implementation-plan.md

# Roadmap — [Project Name]

## Phases (sequenced: prove → build → scale)
[Phase 0: Honest reframing]
[Phase 1: Prove one wedge end-to-end]
[Phase 2-N: Build infrastructure, then scale]

## Success Metrics by Phase
## Approach Comparison Matrices
## Decision Framework

Log Output

After creating files, append entries to docs/skill-outputs/SKILL-OUTPUTS.md (create if missing):

| YYYY-MM-DD HH:MM | reality-check | [file path] | [one-line description] |

Tell the user:

"Findings saved to [path]. Roadmap saved to [path]. Logged in docs/skill-outputs/SKILL-OUTPUTS.md."

Gotchas

Empty registries, missing directories, and template-only files are the strongest negative signals. "Designed but not populated" ≠ "works."
README examples that describe what WOULD happen (aspirational flow diagrams) are not evidence of capability. Check for actual execution artifacts.
Cross-platform claims require per-platform verification. "Installed everywhere" ≠ "works equally everywhere."
Self-improvement claims without an eval harness are unfalsifiable. No evals = no proof of improvement.
Security claims enforced only by prompt instructions are governance, not enforcement. Note the distinction.
"Any" and "all" in claims are almost always false. No system handles any/all of anything.

Example

Claim Assessment (sample)

What's Genuinely Impressive

Distribution model — symlink-based global install is elegant and practical
Skill taxonomy — 4 categories with clear boundaries
Meta-maintenance — principled prune→research→rewrite ordering

Top Gap: No persistent memory

Fatal: "learning from past chats" requires storage + retrieval. Neither exists.
Lightweight: Markdown run ledger with YAML frontmatter
Medium: Local SQLite + FTS5
Heavyweight: Hosted memory API

Verdict

Composite: 2/10 for headline claim. Skill library: 7/10. Control plane: 4/10. Autonomous execution: 1/10.

[Full findings saved to docs/2026-04-13-reality-check-findings.md] [Roadmap saved to docs/2026-04-13-roadmap-and-implementation-plan.md] </output> </example> </examples>

Skills Called

adversarial-hat → Step 7 (pressure-test strongest claims)
assumption-mapping → Step 4 (surface hidden beliefs in top claims)
codebase-understanding → Step 1 (map architecture before judging)
implementation-plan → Step 8 (structure the roadmap deliverable)

Impact Report

Reality check complete: [project/product name]
Claims evaluated: [N]
Composite score: [N]/10
Gaps found: [N] fatal, [N] significant, [N] minor
Competitors compared: [N]
Solutions proposed: [N]
Findings: docs/YYYY-MM-DD-reality-check-findings.md
Roadmap: docs/YYYY-MM-DD-roadmap-and-implementation-plan.md

Related Skills

dvy1987/validate-skills

development

VerifiedTrustedCommunity

Run a fast, read-only health check across all skills in the library and produce a structured quality report — without modifying anything. Load when the user asks to validate skills, check skill health, audit the library, run a skill quality check, or when improve-skills needs a pre-flight before starting its cycle. Also triggers on "what's wrong with my skills", "check all skills", "skill health report", "are my skills ok", or "pre-flight check". Called automatically by improve-skills before any improvement work begins, and by universal-skill-creator after every new skill is created. Never modifies any file — only reads and reports.

2SKILL.mdUpdated Apr 16, 2026

dvy1987/validate-skills

dvy1987/universal-skill-creator

tools

VerifiedTrustedCommunity

Design, build, validate, and ship production-grade agent skills that work across OpenAI Codex, Ampcode, Factory.ai Droids, Google Gemini, Warp, Bolt.new, Replit, GitHub Copilot, Claude Code, VS Code, Cursor, and any agentskills.io compliant platform. Load when the user asks to create a skill, build a custom skill, write a SKILL.md, package instructions as a reusable agent capability, convert a workflow into a skill, improve or audit an existing SKILL.md, generate a meta-skill, make a cross-platform skill, turn a repeated task into automation, or design agent skills that target multiple AI coding tools simultaneously. Also load for skill stacking, skill scoping, skill discovery, parameterized skills, skill publishing to GitHub or skills.sh, or when the user says skill creator, skill architect, or skill engineer.

2SKILL.mdUpdated Apr 16, 2026

dvy1987/universal-skill-creator

dvy1987/tool-finder

tools

VerifiedTrustedCommunity

Identify the right tool for a process step. Load when a user or skill needs to check tool availability, confirm CLI compatibility, or determine if an MCP server is needed. Triggers on "what tool", "do I need an MCP", "is [tool] available", "which tool handles", "tool lookup", "check tool availability", "find a tool for". Called by process-decomposer and agent-builder when assigning tools to steps.

2SKILL.mdUpdated Apr 16, 2026

dvy1987/test-driven-development

development

VerifiedTrustedCommunity

Apply the Red-Green-Refactor cycle to software development. Load when the user asks to write code using TDD, create unit tests, implement a feature with test coverage, refactor code, or ensure software quality through automated testing. Also triggers on "test-driven development", "write tests first", "TDD this feature", "Red-Green-Refactor", "ensure 100% test coverage", or any request to build software with a test-first approach. Supports unit, integration, and end-to-end testing strategies.

2SKILL.mdUpdated Apr 16, 2026

dvy1987/test-driven-development

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/dvy1987/agent-loom.git

# Copy into Claude Code skills folder (global)
cp -r agent-loom/.agents/skills/reality-check ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

dvy1987/agent-loom

2 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT