skills/ux-evaluate/SKILL.md
Compares current design system against category benchmarks to produce a structured gap analysis. Classifies every design element as MUST keep (brand identity), SHOULD keep (working patterns), MAY change (style updates), or SHOULD improve (gaps vs. category). Use after ux-audit and ux-research complete, or when the user says "evaluate this design", "what should we change", "gap analysis", or "compare against competitors".
Compare what you have against what the category expects. Not to copy competitors — to make intentional decisions about what to keep, evolve, and improve.
Core Principle: A redesign without evaluation is a reskin. Changing colors and fonts without understanding what works and what doesn't produces a different-looking version of the same problems. Evaluation grounds the redesign in evidence: what's working (keep it), what's convention (match it), what's below standard (fix it).
Run this skill after both ux-audit (know what you have) and ux-research (know what the category does) have completed, and before ux-brief (translating the evaluation into design decisions).
Read both artifacts:
- `.sage/work/<feature>/current-design-system.md` (from ux-audit)
- `.sage/work/<feature>/category-benchmarks.md` (from ux-research)

If either is missing, the evaluation can't proceed. Run the missing skill first.
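The precondition above can be sketched as a small check. This is a minimal illustration, not part of the skill itself: the two file names and the `.sage/work` root come from the text, while the helper name `missing_artifacts` and the `PRODUCED_BY` mapping are hypothetical.

```python
from pathlib import Path

# File names are taken from the skill text; everything else here is illustrative.
PRODUCED_BY = {
    "current-design-system.md": "ux-audit",
    "category-benchmarks.md": "ux-research",
}


def missing_artifacts(feature: str, root: Path = Path(".sage/work")) -> list[str]:
    """Return the skills that still need to run before evaluation can start."""
    feature_dir = root / feature
    return [
        skill
        for filename, skill in PRODUCED_BY.items()
        if not (feature_dir / filename).is_file()
    ]
```

An empty list means both artifacts exist and the evaluation can proceed; otherwise the list names the skill or skills to run first.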
For each dimension (colors, typography, layout, components, information architecture), compare current state against category benchmarks and produce a classification:
## Colors
| Element | Current | Category Convention | Classification | Rationale |
|---------|---------|-------------------|----------------|-----------|
| Primary brand color | Orange #f97316 | Varies (each brand unique) | MUST KEEP | Brand identity — recognized by users |
| CTA color | Orange (same as brand) | Contrasting color from brand | SHOULD IMPROVE | CTA doesn't stand out from brand elements |
| Background | White #fff | White or light gray | SHOULD KEEP | Matches convention, no issue |
| Text color | #333 | #111-#333 range | SHOULD KEEP | Within convention range |
## Typography
| Element | Current | Category Convention | Classification | Rationale |
|---------|---------|-------------------|----------------|-----------|
| Hero heading size | ~28px mobile | 32-40px mobile | SHOULD IMPROVE | Below category standard, feels less impactful |
| Body text | 16px | 16-18px | SHOULD KEEP | Matches convention |
| Heading hierarchy | h1 → h3 (skips h2) | Sequential h1 → h2 → h3 | SHOULD IMPROVE | Accessibility violation, not just a preference |
## Layout
| Element | Current | Category Convention | Classification | Rationale |
|---------|---------|-------------------|----------------|-----------|
| Hero pattern | Image grid + tagline | Clear value prop + single CTA | SHOULD IMPROVE | Category leaders lead with outcome, not feature showcase |
| CTA placement | Below fold on mobile | Above fold | SHOULD IMPROVE | Users must scroll to take action |
| Section count | 7 sections | 4-6 sections | MAY CHANGE | Slightly long but not critical |
## Components
| Element | Current | Category Convention | Classification | Rationale |
|---------|---------|-------------------|----------------|-----------|
| Course cards | Image + title + description | Image + title + key metric + CTA | SHOULD IMPROVE | Missing differentiating metric (e.g., "95% AI accuracy") |
| Testimonials | Text carousel | Photo + name + score + quote | SHOULD IMPROVE | Photo + score increases credibility |
| Bee mascot | Present throughout | N/A (unique to Prep) | MUST KEEP | Brand differentiator, recognized by users |
## Information Architecture
| Element | Current | Category Convention | Classification | Rationale |
|---------|---------|-------------------|----------------|-----------|
| First screen content | Brand tagline + animated images | Value prop + CTA + social proof | SHOULD IMPROVE | Category leaders answer "what is this?" and "why should I care?" above fold |
| Social proof position | Bottom half of page | Top half, near hero | SHOULD IMPROVE | 100K+ students is a powerful signal — it's buried |
| Navigation | Full nav with dropdowns | Simplified nav + prominent CTA | MAY CHANGE | Current works, but heavy for a landing page |
Use these four categories consistently:
- **MUST KEEP**: Brand identity elements. Changing these confuses existing users.
- **SHOULD KEEP**: Patterns that work and match conventions.
- **MAY CHANGE**: Style elements where updating would freshen without breaking.
- **SHOULD IMPROVE**: Elements that fall below category standards or have measurable problems.
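One way to keep the four categories consistent is to treat them as a closed set and reject anything else. The sketch below assumes classifications are read as plain strings from the evaluation tables; the `Classification` enum and `parse_classification` helper are hypothetical names, not something the skill defines.

```python
from enum import Enum


class Classification(Enum):
    MUST_KEEP = "MUST KEEP"
    SHOULD_KEEP = "SHOULD KEEP"
    MAY_CHANGE = "MAY CHANGE"
    SHOULD_IMPROVE = "SHOULD IMPROVE"


def parse_classification(label: str) -> Classification:
    """Normalize case and whitespace, then reject labels outside the four categories."""
    normalized = " ".join(label.upper().split())
    try:
        return Classification(normalized)
    except ValueError:
        allowed = ", ".join(c.value for c in Classification)
        raise ValueError(
            f"Unknown classification {label!r}; expected one of: {allowed}"
        ) from None
```

Running every table cell through a check like this surfaces stray labels before they reach the final evaluation file.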
Rank SHOULD IMPROVE items by impact:
## Priority Improvements
1. [Highest impact]: [description] — [why: conversion / accessibility / performance]
2. [High impact]: [description] — [why]
3. [Medium impact]: [description] — [why]
...
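The ranking step can be sketched as a sort over scored items. The skill does not say how impact is quantified, so the 1 to 5 `impact` score below is an assumption assigned by the evaluator, and the `Improvement` type and `rank_improvements` function are hypothetical names. The output separator is simplified to a plain hyphen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Improvement:
    description: str
    why: str     # e.g. "conversion", "accessibility", "performance"
    impact: int  # hypothetical 1-5 score chosen by the evaluator


def rank_improvements(items: list[Improvement]) -> list[str]:
    """Return numbered lines, highest impact first; ties keep their input order."""
    ranked = sorted(items, key=lambda item: item.impact, reverse=True)
    return [
        f"{position}. {item.description} - {item.why}"
        for position, item in enumerate(ranked, start=1)
    ]
```

Because Python's sort is stable, two items with the same score stay in the order the evaluator listed them.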
Impact factors:
Save to .sage/work/<feature>/design-evaluation.md:
# Design Evaluation: [page/product]
**Based on:** current-design-system.md + category-benchmarks.md
**Date:** [timestamp]
## Summary
[2-3 sentences: overall assessment, how current design compares to category]
## Classification Table
| Dimension | MUST KEEP | SHOULD KEEP | MAY CHANGE | SHOULD IMPROVE |
|-----------|-----------|-------------|------------|----------------|
| Colors | [count] | [count] | [count] | [count] |
| Typography | [count] | [count] | [count] | [count] |
| Layout | [count] | [count] | [count] | [count] |
| Components | [count] | [count] | [count] | [count] |
| IA | [count] | [count] | [count] | [count] |
## Detailed Evaluation
[from Step 2 — all dimension tables]
## Priority Improvements
[from Step 4 — ranked list]
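The `[count]` cells in the Classification Table can be derived from the detailed tables rather than tallied by hand. The sketch below assumes each detailed row has already been reduced to a `(dimension, classification)` pair; the function name `classification_table` is hypothetical, while the dimension and category labels are copied from the template.

```python
from collections import Counter

CATEGORIES = ("MUST KEEP", "SHOULD KEEP", "MAY CHANGE", "SHOULD IMPROVE")
DIMENSIONS = ("Colors", "Typography", "Layout", "Components", "IA")


def classification_table(rows: list[tuple[str, str]]) -> str:
    """Render the summary table from (dimension, classification) pairs."""
    counts = Counter(rows)  # missing pairs count as 0
    lines = [
        "| Dimension | " + " | ".join(CATEGORIES) + " |",
        "|" + "---|" * (len(CATEGORIES) + 1),
    ]
    for dimension in DIMENSIONS:
        cells = " | ".join(str(counts[(dimension, cat)]) for cat in CATEGORIES)
        lines.append(f"| {dimension} | {cells} |")
    return "\n".join(lines)
```

Feeding in the four Colors rows from the example evaluation (one MUST KEEP, two SHOULD KEEP, one SHOULD IMPROVE) would yield the row `| Colors | 1 | 2 | 0 | 1 |`.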
Show to user: "Here's the evaluation. Before I create the design brief, I want to confirm: do you agree with the classifications? Anything I marked as MUST KEEP that you actually want to change, or vice versa?"
🔒 CHECKPOINT: This is where the user's input matters most. The classifications are proposals — the user decides.
MUST (violation = uninformed redesign):
SHOULD (violation = incomplete evaluation):
MAY (context-dependent):
Communication style: Analytical language. Justify classifications with evidence — benchmarks, conventions, principles. Evaluations should hold up if challenged by a stakeholder.
Good UX evaluation output:
Before presenting your output, check each quality criterion above. For each, confirm it's met or note what's missing. Present your findings AND your self-assessment:
"Self-review: [X/Y criteria met]. [Note any gaps and why they exist.]"