.claude/skills/ai-error-analysis-and-eval-design/SKILL.md
A systematic workflow to move AI products beyond "vibe checks" by identifying specific failure modes and building automated LLM judges. Use this when your AI outputs feel "janky," when you need a feedback signal for prompt engineering, or when monitoring production performance at scale.
npx skillsauth add samarv/Shanon ai-error-analysis-and-eval-design
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
To build great AI products, you must transition from subjective "vibe checks" to systematic measurement. This process identifies exactly where an LLM is failing and creates a feedback loop for continuous improvement.
Before automating, you must manually ground yourself in the data. Appoint one "Benevolent Dictator"—typically the Product Manager or domain expert—to define "good" taste.
Synthesize your mess of notes into actionable categories using an LLM.
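Once an LLM has assigned each note a category, ranking the categories by frequency shows which failure mode to tackle first. The following is a minimal sketch of that counting step only. The notes, the category names, and the `rank_failure_modes` helper are hypothetical and stand in for your own labeled data.

```python
from collections import Counter


def rank_failure_modes(labeled_notes):
    """Return (category, count, share) tuples, most frequent category first.

    labeled_notes is a list of (note_text, category) pairs. The category is
    assumed to have been assigned already, for example by an LLM.
    """
    counts = Counter(category for _, category in labeled_notes)
    total = sum(counts.values())
    return [(cat, n, n / total) for cat, n in counts.most_common()]


# Hypothetical review notes, invented purely for illustration.
labeled_notes = [
    ("Offered a tour slot that does not exist", "hallucinated availability"),
    ("Did not hand off when the user asked for a human", "missed handoff"),
    ("Described a feature the product lacks", "hallucinated availability"),
    ("Replied with markdown inside a text message", "formatting"),
]

ranked = rank_failure_modes(labeled_notes)
for category, n, share in ranked:
    print(f"{category}: {n} ({share:.0%})")
```

Sorting by count keeps attention on the few categories that account for most failures rather than on the most memorable single trace.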
For complex, subjective failures (like "human handoff quality"), create an automated evaluator.
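One way to structure such an evaluator is a prompt that checks a single failure mode and a strict parser for the reply. This is a sketch under stated assumptions, not the skill's own implementation: the binary PASS/FAIL output, the prompt wording, and the `call_llm` callable (any function that takes a prompt string and returns the model's text) are all hypothetical choices.

```python
from typing import Callable

# Hypothetical prompt wording; adapt it to the failure mode you are judging.
JUDGE_TEMPLATE = """You are evaluating one AI assistant conversation.
Failure mode to check: {failure_mode}
Definition: {definition}

Conversation:
{trace}

Answer with exactly one word: PASS or FAIL."""


def build_judge_prompt(failure_mode: str, definition: str, trace: str) -> str:
    """Fill the template for a single failure mode and a single trace."""
    return JUDGE_TEMPLATE.format(
        failure_mode=failure_mode, definition=definition, trace=trace
    )


def parse_verdict(raw: str) -> bool:
    """Return True for PASS and False for FAIL; reject anything else."""
    verdict = raw.strip().upper()
    if verdict not in {"PASS", "FAIL"}:
        raise ValueError(f"Unparseable judge output: {raw!r}")
    return verdict == "PASS"


def judge(
    trace: str,
    failure_mode: str,
    definition: str,
    call_llm: Callable[[str], str],
) -> bool:
    """Run the judge on one trace. call_llm is supplied by the caller."""
    prompt = build_judge_prompt(failure_mode, definition, trace)
    return parse_verdict(call_llm(prompt))
```

Passing the model call in as a function keeps the sketch independent of any particular provider and lets you test the parsing logic with a stub.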
Never ship an eval until you know the judge matches human judgment.
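A simple way to check this is to run the judge over traces that a human has already labeled and compare the two sets of verdicts. The sketch below assumes boolean labels where `True` means the trace passed. The `judge_agreement` helper and the sample labels are hypothetical.

```python
def judge_agreement(human, judge):
    """Compare judge verdicts with human labels (True = pass, False = fail).

    Returns overall agreement plus the true positive rate (judge says pass
    when the human says pass) and the true negative rate (judge says fail
    when the human says fail). A rate is None when its denominator is zero.
    """
    if len(human) != len(judge):
        raise ValueError("human and judge must have the same length")
    if not human:
        raise ValueError("need at least one labeled trace")

    pairs = list(zip(human, judge))
    tp = sum(1 for h, j in pairs if h and j)
    tn = sum(1 for h, j in pairs if not h and not j)
    fp = sum(1 for h, j in pairs if not h and j)
    fn = sum(1 for h, j in pairs if h and not j)

    return {
        "agreement": (tp + tn) / len(pairs),
        "tpr": tp / (tp + fn) if (tp + fn) else None,
        "tnr": tn / (tn + fp) if (tn + fp) else None,
    }


# Hypothetical labels for five traces, invented for illustration.
human_labels = [True, True, False, False, True]
judge_labels = [True, False, False, True, True]

result = judge_agreement(human_labels, judge_labels)
print(result)
```

Reporting the two rates separately matters because raw agreement can look high when failures are rare: a judge that always says "pass" agrees with humans on every passing trace while catching no failures at all.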
Example 1: Real Estate AI Assistant
Example 2: Customer Support Handoff