Evidence-Based Verification Skill

Version: 1.0.0
Type: Quality Assurance
Auto-activate: Code review, task completion, production deployment

Overview

This skill teaches agents how to collect and verify evidence before marking tasks complete. Inspired by production-grade development practices, it ensures all claims are backed by executable proof: test results, coverage metrics, build success, and deployment verification.

Key Principle: Show, don't tell. No task is complete without verifiable evidence.

When to Use This Skill

Auto-Activate Triggers

Completing code implementation
Finishing code review
Marking tasks complete in Squad mode
Before agent handoff
Production deployment verification

Manual Activation

When the user requests "verify this works"
Before creating pull requests
During quality assurance reviews
When troubleshooting failures

Core Concepts

1. Evidence Types

Test Evidence

Exit code (must be 0 for success)
Test suite results (passed/failed/skipped)
Coverage percentage (if available)
Test duration

Build Evidence

Build exit code (0 = success)
Compilation errors/warnings
Build artifacts created
Build duration

Deployment Evidence

Deployment status (success/failed)
Environment deployed to
Health check results
Rollback capability verified

Code Quality Evidence

Linter results (errors/warnings)
Type checker results
Security scan results
Accessibility audit results

2. Evidence Collection Protocol

## Evidence Collection Steps

1. **Identify Verification Points**
   - What needs to be proven?
   - What could go wrong?
   - What does "complete" mean?

2. **Execute Verification**
   - Run tests
   - Run build
   - Run linters
   - Check deployments

3. **Capture Results**
   - Record exit codes
   - Save output snippets
   - Note timestamps
   - Document environment

4. **Store Evidence**
   - Add to shared context
   - Reference in task completion
   - Link to artifacts
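
The execute and capture steps above can be sketched in a few lines of Python. This is a minimal sketch, not part of the skill itself: the function name `run_check` and the field names of the returned dictionary are assumptions made for illustration, and the demo command is a stand-in for a real test or build command.

```python
import platform
import subprocess
import sys
from datetime import datetime, timezone


def run_check(command, snippet_lines=10):
    """Run one verification command and capture exit code, output, timestamp, environment.

    run_check and the dictionary keys are hypothetical names used for illustration.
    """
    started = datetime.now(timezone.utc)
    # check=False: a failing command is still evidence, so it must not raise.
    proc = subprocess.run(command, capture_output=True, text=True, check=False)
    finished = datetime.now(timezone.utc)
    output = (proc.stdout + proc.stderr).splitlines()
    return {
        "command": " ".join(command),
        "exit_code": proc.returncode,
        "passed": proc.returncode == 0,
        "duration_seconds": round((finished - started).total_seconds(), 3),
        "output_snippet": output[:snippet_lines],
        "timestamp": started.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "environment": f"Python {platform.python_version()}, {platform.system()}",
    }


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; swap in ["npm", "test"] or similar.
    evidence = run_check([sys.executable, "-c", "print('2 passed')"])
    print(evidence["exit_code"], evidence["output_snippet"])
```

A command that cannot be started at all (for example, a missing executable) raises `FileNotFoundError` here instead of producing an exit code; a caller would probably want to record that as failed evidence too.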

3. Verification Standards

Minimum Evidence Requirements:

✅ At least ONE verification type executed
✅ Exit code captured (0 = pass, non-zero = fail)
✅ Timestamp recorded
✅ Evidence stored in context

Production-Grade Requirements:

✅ Tests run with exit code 0
✅ Coverage ≥70% (or project standard)
✅ Build succeeds with exit code 0
✅ No critical linter errors
✅ Security scan passes

Evidence Collection Templates

Template 1: Test Evidence

Use this template when running tests:

## Test Evidence

**Command:** `npm test` (or equivalent)
**Exit Code:** 0 ✅ / non-zero ❌
**Duration:** X seconds
**Results:**
- Tests passed: X
- Tests failed: X
- Tests skipped: X
- Coverage: X%

**Output Snippet:**

[First 10 lines of test output]


**Timestamp:** YYYY-MM-DD HH:MM:SS
**Environment:** Node vX.X.X, OS, etc.

Template 2: Build Evidence

Use this template when building:

## Build Evidence

**Command:** `npm run build` (or equivalent)
**Exit Code:** 0 ✅ / non-zero ❌
**Duration:** X seconds
**Artifacts Created:**
- dist/bundle.js (XXX KB)
- dist/styles.css (XXX KB)

**Errors:** X
**Warnings:** X

**Output Snippet:**

[First 10 lines of build output]


**Timestamp:** YYYY-MM-DD HH:MM:SS

Template 3: Code Quality Evidence

Use this template for linting and type checking:

## Code Quality Evidence

**Linter:** ESLint / Ruff / etc.
**Command:** `npm run lint`
**Exit Code:** 0 ✅ / non-zero ❌
**Errors:** X
**Warnings:** X

**Type Checker:** TypeScript / mypy / etc.
**Command:** `npm run typecheck`
**Exit Code:** 0 ✅ / non-zero ❌
**Type Errors:** X

**Timestamp:** YYYY-MM-DD HH:MM:SS

Template 4: Combined Evidence Report

Use this comprehensive template for task completion:

## Task Completion Evidence

### Task: [Task description]
### Agent: [Agent name]
### Completed: YYYY-MM-DD HH:MM:SS

### Verification Results

| Check | Command | Exit Code | Result |
|-------|---------|-----------|--------|
| Tests | `npm test` | 0 | ✅ 45 passed, 0 failed |
| Build | `npm run build` | 0 | ✅ Bundle created (234 KB) |
| Linter | `npm run lint` | 0 | ✅ No errors, 2 warnings |
| Types | `npm run typecheck` | 0 | ✅ No type errors |

### Coverage
- Statements: 87%
- Branches: 82%
- Functions: 90%
- Lines: 86%

### Evidence Files
- Test output: `.claude/quality-gates/evidence/tests-2025-XX-XX.log`
- Build output: `.claude/quality-gates/evidence/build-2025-XX-XX.log`

### Conclusion
All verification checks passed. Task ready for review.
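
If check results are already available as data, the Verification Results table in Template 4 can be generated rather than typed by hand. The sketch below assumes each check is a `(name, command, exit_code, summary)` tuple; that shape and both function names are illustrative choices, not something the skill prescribes.

```python
def render_results_table(checks):
    """Render (name, command, exit_code, summary) tuples as the Template 4 table.

    The tuple layout is an assumption made for this sketch.
    """
    lines = [
        "| Check | Command | Exit Code | Result |",
        "|-------|---------|-----------|--------|",
    ]
    for name, command, exit_code, summary in checks:
        mark = "✅" if exit_code == 0 else "❌"
        lines.append(f"| {name} | `{command}` | {exit_code} | {mark} {summary} |")
    return "\n".join(lines)


def all_passed(checks):
    """True only when every check exited with code 0."""
    return all(exit_code == 0 for _, _, exit_code, _ in checks)


if __name__ == "__main__":
    rows = [
        ("Tests", "npm test", 0, "45 passed, 0 failed"),
        ("Build", "npm run build", 0, "Bundle created (234 KB)"),
    ]
    print(render_results_table(rows))
    print("Conclusion:", "ready for review" if all_passed(rows) else "not complete")
```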

Step-by-Step Workflows

Workflow 1: Code Implementation Verification

When: After writing code for a feature or bug fix

Steps:

Save all files - Ensure changes are written
Run tests
```
npm test
# or: pytest, cargo test, go test, etc.
```
- Capture exit code
- Note passed/failed counts
- Record coverage if available
Run build (if applicable)
```
npm run build
# or: cargo build, go build, etc.
```
- Capture exit code
- Note any errors/warnings
- Verify artifacts created

Run linter
```
npm run lint
# or: ruff check, cargo clippy, golangci-lint run
```
- Capture exit code
- Note errors/warnings

Run type checker (if applicable)
```
npm run typecheck
# or: mypy, tsc --noEmit
```
- Capture exit code
- Note type errors
Document evidence
- Use Template 4 (Combined Evidence Report)
- Add to shared context under quality_evidence
- Reference in task completion message
Mark task complete (only if all evidence passes)
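
The final rule of this workflow, marking the task complete only if all evidence passes, can be expressed as a small gate. This is a sketch under assumptions: `verify_implementation` is a hypothetical name, and the commands in the demo are stand-ins for the project's real test, build, lint, and type-check commands.

```python
import subprocess
import sys


def verify_implementation(steps):
    """Run each (name, command) step and report whether the task may be marked complete.

    verify_implementation is a hypothetical helper, not part of the skill.
    """
    results = []
    for name, command in steps:
        proc = subprocess.run(command, capture_output=True, text=True, check=False)
        results.append({"check": name, "exit_code": proc.returncode})
    # Every step runs even after a failure, so the report lists all problems at once.
    # An empty step list counts as "not complete": no evidence, no completion.
    complete = bool(results) and all(r["exit_code"] == 0 for r in results)
    return complete, results


if __name__ == "__main__":
    # Stand-in commands; replace with ["npm", "test"], ["npm", "run", "build"], etc.
    demo_steps = [
        ("tests", [sys.executable, "-c", "pass"]),
        ("lint", [sys.executable, "-c", "raise SystemExit(1)"]),
    ]
    complete, results = verify_implementation(demo_steps)
    print(complete, results)
```

Running every step instead of stopping at the first failure is a design choice: it costs more time but yields a complete evidence report in one pass.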

Workflow 2: Code Review Verification

When: Reviewing another agent's code or a user's PR

Steps:

Read the code changes
Verify tests exist
- Are there tests for new functionality?
- Do tests cover edge cases?
- Are existing tests updated?
Run tests
- Execute test suite
- Verify exit code 0
- Check coverage didn't decrease
Check build
- Ensure project still builds
- No new build errors
Verify code quality
- Run linter
- Run type checker
- Check for security issues
Document review evidence
- Use Template 3 (Code Quality Evidence)
- Note any issues found
- Add to context
Approve or request changes
- Approve only if all evidence passes
- If issues found, document them with evidence

Workflow 3: Production Deployment Verification

When: Deploying to production or staging

Steps:

Pre-deployment checks
- All tests pass (exit code 0)
- Build succeeds
- No critical linter errors
- Security scan passes
Execute deployment
- Run deployment command
- Capture output
Post-deployment checks
- Health check endpoint responds
- Application starts successfully
- No immediate errors in logs
- Smoke tests pass

Document deployment evidence

## Deployment Evidence

**Environment:** production
**Timestamp:** YYYY-MM-DD HH:MM:SS
**Version:** vX.X.X

**Pre-Deployment:**
- Tests: ✅ Exit 0
- Build: ✅ Exit 0
- Security: ✅ No critical issues

**Deployment:**
- Command: `kubectl apply -f deployment.yaml`
- Exit Code: 0 ✅

**Post-Deployment:**
- Health Check: ✅ 200 OK
- Smoke Tests: ✅ All passed
- Error Rate: <0.1%

Verify rollback capability
- Ensure previous version can be restored
- Document rollback procedure
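
For the post-deployment health check, a freshly deployed service often needs a few seconds before it answers, so a single request can give a false negative. The sketch below polls until it sees HTTP 200 or runs out of attempts. It is an assumption-laden illustration: the function names, the retry counts, and the idea that "healthy" means exactly status 200 are choices made here, not requirements stated by the skill.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout, ...


def wait_for_healthy(probe, attempts=5, delay_seconds=2.0):
    """Call probe() until it returns 200 or attempts run out.

    Returns (healthy, statuses_seen) so the full sequence can be kept as evidence.
    """
    seen = []
    for attempt in range(attempts):
        seen.append(probe())
        if seen[-1] == 200:
            return True, seen
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False, seen


if __name__ == "__main__":
    # Simulated probe so the sketch runs offline. A real check might look like
    # wait_for_healthy(lambda: http_status(health_url)) with the project's own URL.
    answers = iter([None, 503, 200])
    print(wait_for_healthy(lambda: next(answers), delay_seconds=0))
```

Passing the probe in as a callable keeps the retry logic independent of HTTP, so the same loop can wrap a smoke test or a log check.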

Evidence Storage

Where to Store Evidence

Shared Context (Primary)

{
  "quality_evidence": {
    "tests_run": true,
    "test_exit_code": 0,
    "coverage_percent": 87,
    "build_success": true,
    "build_exit_code": 0,
    "linter_errors": 0,
    "linter_warnings": 2,
    "timestamp": "2025-11-02T10:30:00Z"
  }
}
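
An entry with this shape can be assembled from raw results as sketched below. The field names come from the JSON above; the function name `build_quality_evidence` is hypothetical, and how the resulting dictionary is written into shared context depends on the host system, which this document does not specify.

```python
import json
from datetime import datetime, timezone


def build_quality_evidence(test_exit_code, build_exit_code, coverage_percent,
                           linter_errors, linter_warnings, now=None):
    """Assemble a quality_evidence entry using the field names shown above.

    build_quality_evidence is a hypothetical helper; pass a timezone-aware `now`.
    """
    now = now or datetime.now(timezone.utc)
    return {
        "quality_evidence": {
            "tests_run": True,
            "test_exit_code": test_exit_code,
            "coverage_percent": coverage_percent,
            # Derived from the exit code so the two fields cannot disagree.
            "build_success": build_exit_code == 0,
            "build_exit_code": build_exit_code,
            "linter_errors": linter_errors,
            "linter_warnings": linter_warnings,
            "timestamp": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
    }


if __name__ == "__main__":
    entry = build_quality_evidence(0, 0, 87, 0, 2)
    print(json.dumps(entry, indent=2))
```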

Evidence Files (Secondary)

.claude/quality-gates/evidence/ directory
One file per verification run
Format: {type}-{timestamp}.log
Example: tests-2025-11-02-103000.log
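
The naming scheme above maps directly onto a timestamp format string. The following sketch writes one log file per run; the helper names and the `root` parameter (the project directory) are assumptions for illustration.

```python
import tempfile
from datetime import datetime
from pathlib import Path

EVIDENCE_DIR = Path(".claude/quality-gates/evidence")


def evidence_path(kind, when, root=Path(".")):
    """Build {type}-{timestamp}.log inside the evidence directory."""
    return root / EVIDENCE_DIR / f"{kind}-{when.strftime('%Y-%m-%d-%H%M%S')}.log"


def write_evidence(kind, output, when, root=Path(".")):
    """Write one verification run's output to its own log file and return the path."""
    path = evidence_path(kind, when, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(output, encoding="utf-8")
    return path


if __name__ == "__main__":
    # A temporary directory stands in for the project root in this demo.
    with tempfile.TemporaryDirectory() as tmp:
        saved = write_evidence("tests", "3 passed\n", datetime(2025, 11, 2, 10, 30, 0), root=Path(tmp))
        print(saved.name)  # tests-2025-11-02-103000.log
```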

Task Completion Messages

Include evidence summary
Link to detailed evidence files
Example: "Task complete. Tests passed (exit 0, 87% coverage), build succeeded."

Quality Standards

Minimum Acceptable

✅ Tests executed with captured exit code
✅ Timestamp recorded
✅ Evidence stored in context

Production-Grade

✅ Tests pass (exit code 0)
✅ Coverage ≥70% (or project standard)
✅ Build succeeds (exit code 0)
✅ No critical linter errors
✅ Type checker passes
✅ Security scan shows no critical issues

Gold Standard

✅ All production-grade requirements
✅ Coverage ≥80%
✅ No linter warnings
✅ Performance benchmarks within thresholds
✅ Accessibility audit passes (WCAG 2.1 AA)
✅ Integration tests pass
✅ Deployment verification complete
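
The three tiers can be checked mechanically against an evidence dictionary. The sketch below is one possible reading of the lists above, with several assumptions: `type_errors`, `critical_security_issues`, and the four boolean flags in `GOLD_FLAGS` are hypothetical field names, any linter error is treated as critical, and "stored in context" is taken for granted because the dictionary is what would be stored.

```python
# Hypothetical boolean fields for the gold-tier checks; not defined by the skill.
GOLD_FLAGS = (
    "benchmarks_within_thresholds",
    "accessibility_audit_passed",
    "integration_tests_passed",
    "deployment_verified",
)


def classify_evidence(ev, coverage_floor=70, gold_coverage_floor=80):
    """Return "gold", "production", "minimum", or "insufficient" for an evidence dict."""
    has_minimum = ev.get("test_exit_code") is not None and bool(ev.get("timestamp"))
    if not has_minimum:
        return "insufficient"
    production = (
        ev["test_exit_code"] == 0
        and ev.get("coverage_percent", 0) >= coverage_floor
        and ev.get("build_exit_code") == 0
        and ev.get("linter_errors") == 0  # assumption: every linter error is critical
        and ev.get("type_errors") == 0
        and ev.get("critical_security_issues") == 0
    )
    if not production:
        return "minimum"
    gold = (
        ev.get("coverage_percent", 0) >= gold_coverage_floor
        and ev.get("linter_warnings") == 0
        and all(ev.get(flag) is True for flag in GOLD_FLAGS)
    )
    return "gold" if gold else "production"


if __name__ == "__main__":
    sample = {
        "test_exit_code": 0, "timestamp": "2025-11-02T10:30:00Z",
        "coverage_percent": 87, "build_exit_code": 0, "linter_errors": 0,
        "linter_warnings": 2, "type_errors": 0, "critical_security_issues": 0,
    }
    print(classify_evidence(sample))  # production
```

A missing field counts as a failed check, which matches the skill's principle that an unverified claim is not evidence. The `coverage_floor` parameter leaves room for the "or project standard" clause.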

Common Pitfalls

❌ Don't Skip Evidence Collection

Bad:

"I've implemented the login feature. It should work correctly."

Good:

"I've implemented the login feature. Evidence:
- Tests: Exit code 0, 12 tests passed, 0 failed
- Build: Exit code 0, no errors
- Coverage: 89%
Task complete with verification."

❌ Don't Fake Evidence

Bad:

"Tests passed" (without actually running them)

Good:

"Tests passed. Exit code: 0
Command: npm test
Output: Test Suites: 3 passed, 3 total
Timestamp: 2025-11-02 10:30:15"

❌ Don't Ignore Failed Evidence

Bad:

"Build failed with exit code 1, but the code looks correct so marking complete."

Good:

"Build failed with exit code 1. Errors:
- TypeError: Cannot read property 'id' of undefined (line 42)
Fixing the error now before marking complete."

❌ Don't Collect Evidence Only Once

Bad:

"Tests passed yesterday, so the code is still good."

Good:

"Re-running tests after today's changes.
New evidence: Exit code 0, 45 tests passed, coverage 87%"
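
Whether existing evidence is still valid can be checked by comparing its timestamp with the last change to the code. The sketch below assumes evidence timestamps use the `2025-11-02T10:30:00Z` form from the shared-context example and uses file modification times as the change signal; both function names are hypothetical, and a project might prefer commit timestamps instead.

```python
from datetime import datetime, timezone
from pathlib import Path


def latest_change(paths):
    """Return the most recent modification time among paths as an aware UTC datetime.

    Raises ValueError if paths is empty.
    """
    newest = max(Path(p).stat().st_mtime for p in paths)
    return datetime.fromtimestamp(newest, tz=timezone.utc)


def is_stale(evidence_timestamp, last_change):
    """Evidence is stale when the code changed after the evidence was collected."""
    collected = datetime.strptime(
        evidence_timestamp, "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return last_change > collected


if __name__ == "__main__":
    changed = datetime(2025, 11, 3, 9, 0, tzinfo=timezone.utc)
    print(is_stale("2025-11-02T10:30:00Z", changed))  # True
```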

Integration with Other Systems

Context System Integration

Evidence is automatically tracked in shared context:

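// Context structure includes (the interface name SharedContext is illustrative):
interface SharedContext {
  quality_evidence?: {
    tests_run: boolean;
    test_exit_code?: number;
    coverage_percent?: number;
    build_success?: boolean;
    build_exit_code?: number;
    linter_errors?: number;
    linter_warnings?: number;
    timestamp: string; // ISO 8601, e.g. "2025-11-02T10:30:00Z"
  };
}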

Quality Gates Integration

Evidence collection feeds into quality gates:

Quality gates check if evidence exists
Block task completion if evidence missing
Escalate if evidence shows failures
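
The three gate behaviors above amount to a small decision function over the shared context. This is a sketch only: the return labels `"block"`, `"escalate"`, and `"pass"` and the function name are assumptions, and the actual quality-gate implementation is not described in this document.

```python
def quality_gate(context):
    """Return "block", "escalate", or "pass" for a shared-context dict.

    The labels and this function are hypothetical illustrations of the rules above.
    """
    evidence = context.get("quality_evidence")
    # Missing evidence, or tests claimed without a captured exit code: block.
    if not evidence or evidence.get("test_exit_code") is None:
        return "block"
    failed = (
        evidence["test_exit_code"] != 0
        # The build is optional ("if applicable"), so only a recorded failure counts.
        or evidence.get("build_exit_code") not in (None, 0)
        or evidence.get("linter_errors", 0) > 0
    )
    return "escalate" if failed else "pass"


if __name__ == "__main__":
    print(quality_gate({}))  # block
    print(quality_gate({"quality_evidence": {"tests_run": True, "test_exit_code": 1}}))  # escalate
```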

Squad Mode Integration

In parallel execution:

Each agent collects evidence independently
Studio Coach validates evidence before sync
Blocked tasks don't waste parallel cycles

Quick Reference

Evidence Collection Checklist

Before marking task complete:

- [ ] Tests executed
- [ ] Test exit code captured (0 = pass)
- [ ] Build executed (if applicable)
- [ ] Build exit code captured (0 = pass)
- [ ] Code quality checks run (linter, types)
- [ ] Evidence documented with timestamp
- [ ] Evidence added to shared context
- [ ] Evidence summary in completion message

Common Commands by Language/Framework

JavaScript/TypeScript:

npm test                 # Run tests
npm run build           # Build project
npm run lint            # Run ESLint
npm run typecheck       # Run TypeScript compiler

Python:

pytest                  # Run tests
pytest --cov           # Run tests with coverage
ruff check .           # Run linter
mypy .                 # Run type checker

Rust:

cargo test             # Run tests
cargo build            # Build project
cargo clippy           # Run linter

Go:

go test ./...          # Run tests
go build               # Build project
golangci-lint run      # Run linter

Examples

See /skills/evidence-verification/examples/ for:

Sample evidence reports
Real-world verification scenarios
Integration examples

Version History

v1.0.0 - Initial release

Core evidence collection templates
Verification workflows
Quality standards
Integration with context system

Remember: Evidence-first development prevents hallucinations, ensures production quality, and builds confidence. When in doubt, collect more evidence, not less.

