
OmexIT/evidence-review

Name: evidence-review
Author: OmexIT

skills/evidence-review/SKILL.md

```bash
npx skillsauth add OmexIT/claude-skills-pack evidence-review
```


Evidence review (default-to-rejection)

What I'll do

Act as the final quality gate before shipping. I default to NEEDS WORK and require concrete evidence to approve. Claims without proof are automatic failures.

Inputs I'll use (ask only if missing)

- What was implemented (feature, PR, or spec reference)
- Test plan or acceptance criteria (or handoff artifact from /verify-impl, /spec-to-impl)
- Access to the codebase to verify claims

Core philosophy

NEEDS WORK until proven otherwise.

This inverts the typical review pattern. Instead of "looks good unless I find problems," this skill assumes problems exist and requires evidence of quality.

Automatic FAIL triggers

Any of these → immediate NEEDS WORK rating:

❌ Zero issues reported (impossible for any real implementation)
❌ "Tests pass" without actual test runner output pasted
❌ "Build succeeds" without actual build log
❌ Perfect scores without supporting documentation
❌ Specs marked "implemented" without a verification command to prove it
❌ "Works on my machine" without CI or reproducible evidence
❌ Screenshots from a design tool, not from the running application

How I'll think about this

Collect evidence inventory: For every requirement marked "done," demand the proof:

| Requirement | Evidence Type | Evidence Location | Verified? |
|---|---|---|---|
| FR-001: User can submit form | Screenshot of running app | e2e/screenshots/tc001.png | ✅/❌ |
| FR-002: API validates input | curl output showing 400 | test-report.log line 45 | ✅/❌ |
| FR-003: Data persists to DB | psql query result | db-verify.log | ✅/❌ |
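
The inventory can be checked for dangling references before anything is re-run. Below is a minimal POSIX sh sketch. It assumes the inventory is saved as a markdown pipe table in the column order shown above, and that Evidence Location holds a relative path optionally followed by ` line N`. The function name `missing_evidence` and the file name `evidence-inventory.md` are hypothetical. The function prints a tab-separated `MISSING` row for each requirement whose evidence file does not exist.

```bash
# POSIX sh sketch: flag inventory rows whose evidence file is absent on disk.
# Assumes columns: | Requirement | Evidence Type | Evidence Location | Verified? |
missing_evidence() {
  inventory="$1"
  awk -F'|' '
    NF < 5 { next }                                         # not a table row
    $2 ~ /^ *Requirement *$/ || $2 ~ /^[ :-]*$/ { next }    # header or separator row
    {
      req = $2; loc = $4
      gsub(/^ +| +$/, "", req); gsub(/^ +| +$/, "", loc)
      sub(/ line [0-9]+$/, "", loc)                         # drop a trailing "line N" suffix
      print req "\t" loc
    }' "$inventory" |
  while IFS="$(printf '\t')" read -r req loc; do
    # Only existence is checked here; content still has to be re-verified.
    [ -e "$loc" ] || printf 'MISSING\t%s\t%s\n' "$req" "$loc"
  done
}
```

Running `missing_evidence evidence-inventory.md` from the repository root prints nothing when every listed file exists. It only catches the cheapest failure: an existing file still has to be opened and re-verified.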

Verify evidence is real: Run the verification commands myself. Don't trust pasted output — re-execute:

```bash
# Re-run tests
mvn test 2>&1 | tail -20
# Re-check DB state
docker compose exec db psql -U postgres -d appdb -c "SELECT count(*) FROM ..."
# Re-take screenshot
npx playwright test --grep "TC-001" 2>&1
```
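
Re-executed output only counts as evidence if it is kept verbatim and its exit status is recorded. In `mvn test 2>&1 | tail -20`, the pipeline's status is `tail`'s, so a failing build can still exit 0. Below is a minimal sketch using a hypothetical helper `run_evidence` and a hypothetical `evidence/` directory (override it with `EVIDENCE_DIR`). The helper runs the command without a pipe, saves combined stdout/stderr to a timestamped log, and appends the real exit code.

```bash
# POSIX sh sketch: re-run a verification command and keep its raw output as evidence.
# Usage: run_evidence <label> <command> [args...]
run_evidence() {
  label="$1"
  shift
  dir="${EVIDENCE_DIR:-evidence}"   # hypothetical default directory
  mkdir -p "$dir" || return 1
  log="$dir/$label-$(date -u +%Y%m%dT%H%M%SZ).log"
  # No pipe here, so $? is the command's own exit status.
  "$@" > "$log" 2>&1
  status=$?
  printf 'exit=%s\n' "$status" >> "$log"
  printf '%s\n' "$log"              # log path, ready for the Evidence Location column
  return "$status"
}
```

Examples are `run_evidence tests mvn test` and `run_evidence tc-001 npx playwright test --grep "TC-001"`. The function returns the wrapped command's status, so a failing run cannot be mistaken for a pass.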

Check for missing coverage: Cross-reference the spec/PRD against what was tested:
- Every P0 requirement: needs test evidence
- Every API endpoint: needs a request/response pair
- Every DB write: needs a row existence check
- Every UI flow: needs a screenshot or Playwright result
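
The cross-reference can be partly automated when requirements carry stable IDs. Below is a minimal sketch that assumes IDs follow the `FR-NNN` pattern from the inventory example. The names `uncovered_requirements`, `spec.md`, and `evidence-inventory.md` are hypothetical. The function prints each ID mentioned in the spec that never appears in the inventory.

```bash
# POSIX sh sketch: requirement IDs mentioned in the spec but absent from the inventory.
# Assumes IDs look like FR-001; change the pattern for other schemes.
uncovered_requirements() {
  spec="$1"
  inventory="$2"
  work="$(mktemp -d)" || return 1
  # -o prints each match; -w rejects matches embedded in longer words (e.g. NFR-001).
  grep -owE 'FR-[0-9]+' "$spec" | sort -u > "$work/spec"
  grep -owE 'FR-[0-9]+' "$inventory" | sort -u > "$work/inv"
  # comm -23 keeps lines found only in the first sorted file.
  comm -23 "$work/spec" "$work/inv"
  rm -rf "$work"
}
```

For example, `uncovered_requirements spec.md evidence-inventory.md`. This only finds requirements with no evidence row at all. Whether each P0 row carries the right kind of evidence still needs the per-category checks listed above.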

Check for regressions: Compare current test results against the previous baseline:

```bash
# Find previous test results
ls -t e2e/reports/verify-*.log | head -2
# Diff test counts
```
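
The regression block above stops at a `# Diff test counts` comment without a command. Below is a minimal sketch of that step. It assumes both logs contain Maven Surefire-style summary lines (`Tests run: N, Failures: N, Errors: N, Skipped: N`), and that the last such line is the aggregate, as in a single-module Maven build. The real verify-*.log format may differ, so treat the pattern as an assumption. The function names are hypothetical.

```bash
# POSIX sh sketch: compare the last Surefire-style summary line of two test logs.
# Assumes lines like "Tests run: 42, Failures: 0, Errors: 0, Skipped: 1".
summary_counts() {
  # Print "run failures errors skipped" from the last matching line of $1.
  grep -E 'Tests run: [0-9]+, Failures: [0-9]+, Errors: [0-9]+, Skipped: [0-9]+' "$1" |
    tail -n 1 |
    sed -E 's/.*Tests run: ([0-9]+), Failures: ([0-9]+), Errors: ([0-9]+), Skipped: ([0-9]+).*/\1 \2 \3 \4/'
}

diff_test_counts() {
  baseline="$1"
  current="$2"
  # Positional parameters become: baseline run/fail/err/skip, then current run/fail/err/skip.
  set -- $(summary_counts "$baseline") $(summary_counts "$current")
  if [ "$#" -ne 8 ]; then
    echo "could not find a summary line in both logs" >&2
    return 2
  fi
  printf 'run %+d, failures %+d, errors %+d, skipped %+d\n' \
    $(( $5 - $1 )) $(( $6 - $2 )) $(( $7 - $3 )) $(( $8 - $4 ))
  # Return 0 only if failures and errors did not grow and no fewer tests ran.
  [ "$6" -le "$2" ] && [ "$7" -le "$3" ] && [ "$5" -ge "$1" ]
}
```

With the two files from `ls -t e2e/reports/verify-*.log | head -2`, pass the second (older) one as the baseline: `diff_test_counts older.log newer.log`, where both names are placeholders. The function prints the deltas and returns 1 when failures or errors grew or fewer tests ran. It returns 2 when either log has no summary line.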

Check code quality signals:
- Are there new TODO/FIXME comments? (incomplete work)
- Are there commented-out code blocks? (uncertainty)
- Are there @Suppress / // eslint-disable / // ignore: annotations? (suppressed warnings)
- Are there duplicate patterns? (didn't reuse existing code)
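
The TODO/FIXME and suppression checks above can be run mechanically over only the lines a change adds, so pre-existing TODOs are not blamed on it. Below is a minimal sketch of a hypothetical `scan_added_markers` function that reads a unified diff on stdin. It does not detect commented-out code or duplicated patterns; those still need a human read. The `@Suppress` pattern also matches Java's `@SuppressWarnings`.

```bash
# POSIX sh sketch: count incomplete-work and suppression markers on added lines only.
# Reads a unified diff on stdin; prints both counts and returns 1 if any were added.
scan_added_markers() {
  # Keep "+" lines, dropping the "+++ b/path" file headers.
  added=$(grep -E '^[+]' | grep -vE '^[+][+][+] ' || true)
  # grep -c prints 0 on no match but exits 1, hence "|| true".
  todo=$(printf '%s\n' "$added" | grep -cE 'TODO|FIXME' || true)
  supp=$(printf '%s\n' "$added" | grep -cE '@Suppress|eslint-disable|// ignore:' || true)
  printf 'todo_fixme=%s suppressions=%s\n' "$todo" "$supp"
  [ "$todo" -eq 0 ] && [ "$supp" -eq 0 ]
}
```

For example, run `git diff -U0 main...HEAD | scan_added_markers`, where `main` is an assumed base branch. It prints both counts and returns non-zero if the change adds any marker.
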

Rate the implementation:
- REJECT: Critical requirements unverified, tests failing, evidence missing
- NEEDS WORK: Minor gaps in evidence, non-critical issues found
- CONDITIONAL PASS: All P0 evidence provided, P1/P2 gaps documented as follow-up
- PASS: All requirements have evidence, all tests pass, code quality clean
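
The four ratings can be read as a decision table in which the worst applicable rating wins. The sketch below is one hedged interpretation of the definitions above, not part of the skill. The function name and its four 0/1 flags are hypothetical inputs supplied by the reviewer. In keeping with the default-to-rejection stance, missing inputs yield NEEDS WORK.

```bash
# POSIX sh sketch: one reading of the rating definitions as a decision table.
# Hypothetical 0/1 flags supplied by the reviewer:
#   $1 p0_missing     some P0 requirement lacks verified evidence
#   $2 tests_failing  a re-run suite reported failures or errors
#   $3 open_issues    non-critical issues or evidence gaps with no follow-up
#   $4 followups      P1/P2 gaps exist and are documented as follow-ups
rate_implementation() {
  # Default to rejection: incomplete input cannot earn a pass.
  if [ "$#" -ne 4 ]; then
    echo "NEEDS WORK"
    return 1
  fi
  # Checks run from worst to best, so the first match wins.
  if [ "$1" -ne 0 ] || [ "$2" -ne 0 ]; then
    echo "REJECT"
  elif [ "$3" -ne 0 ]; then
    echo "NEEDS WORK"
  elif [ "$4" -ne 0 ]; then
    echo "CONDITIONAL PASS"
  else
    echo "PASS"
  fi
}
```

For example, `rate_implementation 0 0 0 1` prints CONDITIONAL PASS: all P0 evidence is in, tests pass, and the remaining P1/P2 gaps are logged as follow-ups.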

Evidence types accepted

| Type | What constitutes valid evidence | Invalid evidence |
|---|---|---|
| Test output | Actual stdout from test runner with pass/fail counts | "All tests pass" (text claim) |
| Screenshots | Playwright/emulator screenshot of running app | Figma mockup or design file |
| Build logs | Actual compiler/bundler output showing success | "Build works" (text claim) |
| API responses | curl output with status code and response body | API spec showing expected response |
| DB state | psql/mongosh query result showing actual rows | Schema diagram |
| Coverage report | Coverage tool output with percentages and uncovered lines | "We have good coverage" |
| Lint output | Linter stdout showing zero errors | "Code is clean" |

Anti-patterns to flag

⚠️ Accepting self-reported quality without verification
⚠️ "It works" without a reproducible verification command
⚠️ Reviewing only the code diff without running the application
⚠️ Skipping mobile/responsive verification for UI changes
⚠️ No regression check against previous test baseline
⚠️ Approving with known TODO comments in core logic

Quality bar

✅ Every P0 requirement has at least one piece of concrete evidence
✅ All evidence was verified (re-executed, not just trusted)
✅ Code quality scan found no incomplete work markers
✅ No duplicate patterns introduced (checked against existing codebase)
✅ Rating is one of: REJECT / NEEDS WORK / CONDITIONAL PASS / PASS
✅ Follow-up items documented for anything not covered

Workflow context

Typically follows: /verify-impl, /spec-to-impl, code-review:code-review (official plugin)
Feeds into: /finalize (if PASS or CONDITIONAL PASS)
Related: /test-plan (defines what needs evidence), /security-review

Learning & Memory

After completing this skill, store reusable insights in memory:

Evidence quality standards: What constitutes sufficient proof for different requirement types, and minimum evidence thresholds that caught real issues
Common proof gaps: Recurring areas where implementations lack verification, such as untested edge cases, missing integration evidence, and overlooked regression checks
Verification patterns: Effective re-execution commands, cross-referencing techniques, and evidence collection workflows that streamlined the review process

Output contract

```yaml
produces:
  - type: "evidence-review"
    format: markdown
    path: "claudedocs/<feature>-evidence-review.md"
    sections: [evidence_inventory, verification_results, code_quality, rating, follow_ups]
    rating: "REJECT | NEEDS WORK | CONDITIONAL PASS | PASS"
    handoff: "Write claudedocs/handoff-evidence-review-<timestamp>.yaml — suggest: finalize"
```
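
Below is a minimal sketch that scaffolds the report at the contract's path, with its five sections in contract order. The function name, the `feature` argument, and the heading wording are assumptions. The handoff YAML is left out because its fields are not specified here. The rating section starts at NEEDS WORK, matching the skill's default.

```bash
# POSIX sh sketch: scaffold claudedocs/<feature>-evidence-review.md.
# Section keys are taken from the output contract; heading wording is assumed.
scaffold_evidence_review() {
  feature="$1"
  if [ -z "$feature" ]; then
    echo "usage: scaffold_evidence_review <feature>" >&2
    return 2
  fi
  report="claudedocs/$feature-evidence-review.md"
  # Never clobber a review that already exists.
  if [ -e "$report" ]; then
    echo "exists: $report" >&2
    return 1
  fi
  mkdir -p claudedocs || return 1
  {
    printf '# Evidence review: %s\n\n' "$feature"
    for section in evidence_inventory verification_results code_quality rating follow_ups; do
      printf '## %s\n\n' "$section"
      # The rating starts at the skill's default and must be argued upward.
      if [ "$section" = rating ]; then
        printf 'NEEDS WORK\n\n'
      fi
    done
  } > "$report"
  printf '%s\n' "$report"
}
```

Run it from the repository root. For example, `scaffold_evidence_review checkout` (a hypothetical feature name) creates `claudedocs/checkout-evidence-review.md`. It refuses to overwrite an existing report.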


Default-to-rejection quality gate. Assumes NEEDS WORK until overwhelming evidence proves otherwise. Requires actual test output, screenshots, build logs — not claims. Triggers: "evidence review", "prove it works", "show me proof", "quality gate", "final review".

13 stars

development

Updated Apr 23, 2026


Install this skill globally with the npx skillsauth add command shown above. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Purpose | Status | Reported % |
|---|---|---|---|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 23, 2026, 8:05 PM · 208.4s · 1 file scanned

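SKILL.md frontmatter:

```yaml
---
name: evidence-review
description: >
  Default-to-rejection quality gate. Assumes NEEDS WORK until overwhelming evidence
  proves otherwise. Requires actual test output, screenshots, build logs, not claims.
  Triggers: "evidence review", "prove it works", "show me proof", "quality gate", "final review".
argument-hint: "[feature / PR / implementation to review]"
effort: high
---
```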




Install manually


```bash
# Clone the repo
git clone https://github.com/OmexIT/claude-skills-pack.git

# Copy into Claude Code skills folder (global)
cp -r claude-skills-pack/skills/evidence-review ~/.claude/skills/
```


Repository: OmexIT/claude-skills-pack (13 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT