.claude/skills/ai-product-evaluation-design/SKILL.md
Transition from traditional PRDs to "Evals" (evaluations) to guide AI model behavior. Use this skill when launching new AI features, debugging unpredictable model outputs, or moving from a prompted prototype to a production-ready agent.
npx skillsauth add samarv/Shanon ai-product-evaluation-designInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
In the era of LLMs, product development moves from writing static specifications to defining "correctness" through Evals. Since models are stochastic, you cannot "fix a bug" with a single line of code; instead, you must "hill climb" toward better behavior by building robust datasets that measure model performance against your product goals.
Depending on the complexity of the feature, use one or more of these evaluation methods:
Best for extraction, tool-calling, or objective facts.
time must be 19:00.Best for tone, creativity, and visual design (like the "Canvas" layout).
Best for scaling quality checks without manual labor.
Build a spreadsheet with the following columns to define the model's target behavior:
For agentic features (like Canvas or Task execution), define the "Trigger Boundary":
When you optimize for one skill (e.g., "Being more concise"), you may accidentally "brain damage" another skill (e.g., "Formatting code correctly").
Example 1: Deterministic Eval for a "Tasks" Tool
{ "action": "set_reminder", "content": "Call Mom", "time": "12:00" }.Example 2: Preference Eval for Writing Style
documentation
Presentation creation, editing, and analysis. When Claude needs to work with presentations (.pptx files) for: (1) Creating new presentations, (2) Modifying or editing content, (3) Working with layouts, (4) Adding comments or speaker notes, or any other presentation tasks
development
A framework to identify and develop sustainable competitive advantages (Power) based on a company's lifecycle stage. Use this when drafting a product strategy, evaluating business model durability, or distinguishing between "operational excellence" and true competitive moats.
development
```yaml --- name: podcast-launch-and-growth-engine description: A framework for launching and scaling a podcast based on topic validation, ranking momentum, and lean production. Use this skill when starting a new content channel, choosing a niche, or designing a listener acquisition strategy. --- This framework leverages Chris Hutchins' "All the Hacks" methodology to move from an idea to the top 5% of active podcasts through strategic validation, momentum-based launching, and high-efficiency di
development
A high-bar framework for measuring and achieving product-market fit (PMF) before scaling. Use this when validating a new product line, deciding if a beta is ready for a general release, or diagnosing why a product isn't generating organic word-of-mouth growth.