skills/evidence-review/SKILL.md
Default-to-rejection quality gate. Assumes NEEDS WORK until overwhelming evidence proves otherwise. Requires actual test output, screenshots, build logs — not claims. Triggers: "evidence review", "prove it works", "show me proof", "quality gate", "final review".
npx skillsauth add OmexIT/claude-skills-pack evidence-review
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Act as the final quality gate before shipping. I default to NEEDS WORK and require concrete evidence to approve. Claims without proof are automatic failures.
NEEDS WORK until proven otherwise.
This inverts the typical review pattern. Instead of "looks good unless I find problems," this skill assumes problems exist and requires evidence of quality.
Any of these → immediate NEEDS WORK rating:
Collect evidence inventory: For every requirement marked "done," demand the proof:

| Requirement | Evidence Type | Evidence Location | Verified? |
|---|---|---|---|
| FR-001: User can submit form | Screenshot of running app | e2e/screenshots/tc001.png | ✅/❌ |
| FR-002: API validates input | curl output showing 400 | test-report.log line 45 | ✅/❌ |
| FR-003: Data persists to DB | psql query result | db-verify.log | ✅/❌ |

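
The inventory step can be partly automated. The sketch below checks only that each listed evidence file exists and is non-empty; it does not judge whether the content is valid. The helper name `check_evidence` is hypothetical, and the paths are the example ones from the table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether an evidence file exists and is non-empty.
check_evidence() {
  local requirement="$1" path="$2"
  if [ -s "$path" ]; then
    printf '%s | %s | ✅\n' "$requirement" "$path"
  else
    printf '%s | %s | ❌\n' "$requirement" "$path"
    return 1
  fi
}

# Example paths from the inventory table; adjust to the project layout.
missing=0
check_evidence "FR-001" "e2e/screenshots/tc001.png" || missing=1
check_evidence "FR-003" "db-verify.log" || missing=1
[ "$missing" -eq 0 ] || echo "Rating: NEEDS WORK (evidence missing)"
```

A present file is only a precondition. The evidence still has to be re-verified by re-running the command that produced it.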
Verify evidence is real: Run the verification commands myself. Don't trust pasted output — re-execute:
# Re-run tests
mvn test 2>&1 | tail -20
# Re-check DB state
docker compose exec db psql -U postgres -d appdb -c "SELECT count(*) FROM ..."
# Re-take screenshot
npx playwright test --grep "TC-001" 2>&1
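
One caveat when re-executing: a pipeline such as `mvn test 2>&1 | tail -20` reports the exit status of `tail`, not of the test run, unless `pipefail` is set. A minimal sketch of a wrapper that stores the full output and the real exit code as evidence follows; the name `capture_evidence` and the log path in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command, save combined stdout/stderr to a log,
# append the command's exit code, and return that same code to the caller.
capture_evidence() {
  local log="$1"
  shift
  local status=0
  "$@" >"$log" 2>&1 || status=$?
  printf 'exit code: %s\n' "$status" >>"$log"
  return "$status"
}
```

Usage might look like `capture_evidence evidence/mvn-test.log mvn test`, followed by `tail -20 evidence/mvn-test.log` for a quick look at the summary.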
Check for missing coverage: Cross-reference the spec/PRD against what was tested:
Check for regressions: Compare current test results against previous baseline:
# Find previous test results
ls -t e2e/reports/verify-*.log | head -2
# Diff test counts
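
The test-count comparison could be sketched as below. It assumes the logs contain a Maven Surefire-style summary line (`Tests run: N, Failures: M, ...`); the actual format of `verify-*.log` is not specified here, so the patterns may need adjusting. All three function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: logs contain a summary line like
#   "Tests run: 12, Failures: 0, Errors: 0, Skipped: 0"
# Each helper prints the count from the last such line, or 0 if none is found.
tests_run() {
  grep -Eo 'Tests run: [0-9]+' "$1" | tail -n 1 | grep -Eo '[0-9]+' || echo 0
}

failures() {
  grep -Eo 'Failures: [0-9]+' "$1" | tail -n 1 | grep -Eo '[0-9]+' || echo 0
}

# Succeeds only if the current run has at least as many tests and
# no more failures than the baseline.
compare_runs() {
  local current="$1" baseline="$2"
  local cur_run base_run cur_fail base_fail
  cur_run=$(tests_run "$current")
  base_run=$(tests_run "$baseline")
  cur_fail=$(failures "$current")
  base_fail=$(failures "$baseline")
  echo "tests run: ${base_run} -> ${cur_run}, failures: ${base_fail} -> ${cur_fail}"
  [ "$cur_run" -ge "$base_run" ] && [ "$cur_fail" -le "$base_fail" ]
}
```

Since `ls -t` lists the newest file first, the two paths from `ls -t e2e/reports/verify-*.log | head -2` can be passed as current and baseline, in that order. A drop in the number of tests run is treated as a regression because deleted or skipped tests can hide failures.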
Check code quality signals:
@Suppress / // eslint-disable / // ignore: annotations? (suppressed warnings)
Rate the implementation:
| Type | What constitutes valid evidence | Invalid evidence |
|---|---|---|
| Test output | Actual stdout from test runner with pass/fail counts | "All tests pass" (text claim) |
| Screenshots | Playwright/emulator screenshot of running app | Figma mockup or design file |
| Build logs | Actual compiler/bundler output showing success | "Build works" (text claim) |
| API responses | curl output with status code and response body | API spec showing expected response |
| DB state | psql/mongosh query result showing actual rows | Schema diagram |
| Coverage report | Coverage tool output with percentages and uncovered lines | "We have good coverage" |
| Lint output | Linter stdout showing zero errors | "Code is clean" |

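
The "Test output" row can be given a rough first-pass filter. The sketch below treats a file as runner output only if it contains numeric pass/fail counts. The patterns cover Maven-style (`Tests run: N`) and Playwright/Jest-style (`N passed`, `N failed`) summaries as an assumption; other runners will need their own patterns. The function name `has_test_counts` is hypothetical.

```shell
#!/usr/bin/env bash
# Heuristic only: succeed if the file contains numeric pass/fail counts.
# A bare text claim such as "All tests pass" has no counts and is rejected.
has_test_counts() {
  grep -Eiq '[0-9]+ (passed|failed)|Tests run: [0-9]+' "$1"
}
```

A match does not prove the output is genuine or current, so it complements re-running the tests rather than replacing it.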
/verify-impl, /spec-to-impl, code-review:code-review (official plugin)
/finalize (if PASS or CONDITIONAL PASS)
/test-plan (defines what needs evidence), /security-review
After completing this skill, store reusable insights in memory:
produces:
  - type: "evidence-review"
    format: markdown
    path: "claudedocs/<feature>-evidence-review.md"
    sections: [evidence_inventory, verification_results, code_quality, rating, follow_ups]
    rating: "REJECT | NEEDS WORK | CONDITIONAL PASS | PASS"
handoff: "Write claudedocs/handoff-evidence-review-<timestamp>.yaml — suggest: finalize"