skills/ariegoldkin/evidence-verification/SKILL.md
**Version:** 1.0.0
**Type:** Quality Assurance
**Auto-activate:** Code review, task completion, production deployment
This skill teaches agents how to collect and verify evidence before marking tasks complete. Inspired by production-grade development practices, it ensures all claims are backed by executable proof: test results, coverage metrics, build success, and deployment verification.
Key Principle: Show, don't tell. No task is complete without verifiable evidence.
Evidence types:
- Test Evidence
- Build Evidence
- Deployment Evidence
- Code Quality Evidence
## Evidence Collection Steps
1. **Identify Verification Points**
   - What needs to be proven?
   - What could go wrong?
   - What does "complete" mean?
2. **Execute Verification**
   - Run tests
   - Run build
   - Run linters
   - Check deployments
3. **Capture Results**
   - Record exit codes
   - Save output snippets
   - Note timestamps
   - Document environment
4. **Store Evidence**
   - Add to shared context
   - Reference in task completion
   - Link to artifacts
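As a rough illustration of steps 2 and 3, the sketch below runs one command and records its exit code, an output snippet, a timestamp, and the environment. The function name `collect_evidence` and the dictionary keys are hypothetical, chosen for this example rather than defined by the skill.

```python
import platform
import subprocess
import sys
from datetime import datetime, timezone


def collect_evidence(command: list[str], snippet_lines: int = 10) -> dict:
    """Run a command and capture the items listed under 'Capture Results'."""
    started = datetime.now(timezone.utc)
    proc = subprocess.run(command, capture_output=True, text=True)
    output = (proc.stdout + proc.stderr).splitlines()
    return {
        "command": " ".join(command),
        "exit_code": proc.returncode,           # 0 means the check passed
        "passed": proc.returncode == 0,
        "output_snippet": output[:snippet_lines],
        "timestamp": started.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "environment": f"Python {platform.python_version()}, {platform.system()}",
    }


if __name__ == "__main__":
    # Stand-in command for the demo; substitute the project's real test command.
    evidence = collect_evidence([sys.executable, "-c", "print('ok')"])
    print(evidence["exit_code"], evidence["output_snippet"])
```

The exit code comes from the process itself and is never inferred from the output text, which keeps the record tied to what actually ran.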
Minimum Evidence Requirements:
Production-Grade Requirements:
Use this template when running tests:
## Test Evidence
**Command:** `npm test` (or equivalent)
**Exit Code:** 0 ✅ / non-zero ❌
**Duration:** X seconds
**Results:**
- Tests passed: X
- Tests failed: X
- Tests skipped: X
- Coverage: X%
**Output Snippet:**
[First 10 lines of test output]
**Timestamp:** YYYY-MM-DD HH:MM:SS
**Environment:** Node vX.X.X, OS, etc.
Use this template when building:
## Build Evidence
**Command:** `npm run build` (or equivalent)
**Exit Code:** 0 ✅ / non-zero ❌
**Duration:** X seconds
**Artifacts Created:**
- dist/bundle.js (XXX KB)
- dist/styles.css (XXX KB)
**Errors:** X
**Warnings:** X
**Output Snippet:**
[First 10 lines of build output]
**Timestamp:** YYYY-MM-DD HH:MM:SS
Use this template for linting and type checking:
## Code Quality Evidence
**Linter:** ESLint / Ruff / etc.
**Command:** `npm run lint`
**Exit Code:** 0 ✅ / non-zero ❌
**Errors:** X
**Warnings:** X
**Type Checker:** TypeScript / mypy / etc.
**Command:** `npm run typecheck`
**Exit Code:** 0 ✅ / non-zero ❌
**Type Errors:** X
**Timestamp:** YYYY-MM-DD HH:MM:SS
Use this comprehensive template for task completion:
## Task Completion Evidence
### Task: [Task description]
### Agent: [Agent name]
### Completed: YYYY-MM-DD HH:MM:SS
### Verification Results
| Check | Command | Exit Code | Result |
|-------|---------|-----------|--------|
| Tests | `npm test` | 0 | ✅ 45 passed, 0 failed |
| Build | `npm run build` | 0 | ✅ Bundle created (234 KB) |
| Linter | `npm run lint` | 0 | ✅ No errors, 2 warnings |
| Types | `npm run typecheck` | 0 | ✅ No type errors |
### Coverage
- Statements: 87%
- Branches: 82%
- Functions: 90%
- Lines: 86%
### Evidence Files
- Test output: `.claude/quality-gates/evidence/tests-2025-XX-XX.log`
- Build output: `.claude/quality-gates/evidence/build-2025-XX-XX.log`
### Conclusion
All verification checks passed. Task ready for review.
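The Verification Results table in the template above can be generated from recorded results instead of typed by hand. This is a minimal sketch; the input keys (`check`, `command`, `exit_code`, `summary`) are assumptions made for this example.

```python
def render_results_table(results: list[dict]) -> str:
    """Build the Markdown verification table from a list of check results."""
    lines = [
        "| Check | Command | Exit Code | Result |",
        "|-------|---------|-----------|--------|",
    ]
    for r in results:
        # The mark is derived from the exit code, not from the summary text.
        mark = "✅" if r["exit_code"] == 0 else "❌"
        lines.append(
            f"| {r['check']} | `{r['command']}` | {r['exit_code']} | {mark} {r['summary']} |"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_results_table([
        {"check": "Tests", "command": "npm test", "exit_code": 0,
         "summary": "45 passed, 0 failed"},
    ]))
```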
When: After writing code for a feature or bug fix
Steps:
1. **Save all files** - Ensure changes are written
2. **Run tests**: `npm test` (or `pytest`, `cargo test`, `go test`, etc.)
3. **Run build** (if applicable): `npm run build` (or `cargo build`, `go build`, etc.)
4. **Run linter**: `npm run lint` (or `ruff check`, `cargo clippy`, `golangci-lint run`)
5. **Run type checker** (if applicable): `npm run typecheck` (or `mypy`, `tsc --noEmit`)
6. **Document evidence** in `quality_evidence`
7. **Mark task complete** (only if all evidence passes)
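The workflow above ends with a gate: the task is marked complete only if every check passes. One possible way to express that gate is sketched here, assuming hypothetical helper names `run_checks` and `can_mark_complete`.

```python
import subprocess
import sys


def run_checks(checks: dict[str, list[str]]) -> dict[str, int]:
    """Run each check in order and return its exit code, keyed by check name."""
    exit_codes = {}
    for name, command in checks.items():
        exit_codes[name] = subprocess.run(command, capture_output=True).returncode
    return exit_codes


def can_mark_complete(exit_codes: dict[str, int]) -> bool:
    """Allow completion only when at least one check ran and all exited with 0."""
    return bool(exit_codes) and all(code == 0 for code in exit_codes.values())


if __name__ == "__main__":
    # Stand-in commands for the demo; a real project would list
    # its own test, build, lint, and type-check commands here.
    codes = run_checks({
        "tests": [sys.executable, "-c", "pass"],
        "build": [sys.executable, "-c", "raise SystemExit(1)"],
    })
    print(codes, can_mark_complete(codes))
```

An empty result set is treated as "not complete", so skipping every check cannot pass the gate by accident.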
When: Reviewing another agent's code or user's PR
Steps:
1. Read the code changes
2. Verify tests exist
3. Run tests
4. Check build
5. Verify code quality
6. Document review evidence
7. Approve or request changes
When: Deploying to production or staging
Steps:
1. Pre-deployment checks
2. Execute deployment
3. Post-deployment checks
4. Document deployment evidence
## Deployment Evidence
**Environment:** production
**Timestamp:** YYYY-MM-DD HH:MM:SS
**Version:** vX.X.X
**Pre-Deployment:**
- Tests: ✅ Exit 0
- Build: ✅ Exit 0
- Security: ✅ No critical issues
**Deployment:**
- Command: `kubectl apply -f deployment.yaml`
- Exit Code: 0 ✅
**Post-Deployment:**
- Health Check: ✅ 200 OK
- Smoke Tests: ✅ All passed
- Error Rate: <0.1%
After documenting the deployment evidence, verify rollback capability.
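The deployment template above combines pre-deployment, deployment, and post-deployment results. A small check like the following could decide whether that evidence is acceptable. The field names are hypothetical; the 0.1% error-rate ceiling is taken from the template.

```python
def deployment_verified(evidence: dict, max_error_rate_percent: float = 0.1) -> bool:
    """Return True only if every stage of the deployment evidence passed."""
    pre_ok = all(code == 0 for code in evidence["pre_deployment_exit_codes"].values())
    deploy_ok = evidence["deploy_exit_code"] == 0
    post_ok = (
        evidence["health_status"] == 200
        and evidence["smoke_tests_passed"]
        and evidence["error_rate_percent"] < max_error_rate_percent
    )
    return pre_ok and deploy_ok and post_ok


if __name__ == "__main__":
    sample = {
        "pre_deployment_exit_codes": {"tests": 0, "build": 0},
        "deploy_exit_code": 0,
        "health_status": 200,
        "smoke_tests_passed": True,
        "error_rate_percent": 0.05,
    }
    print(deployment_verified(sample))
```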
Shared Context (Primary)
{
  "quality_evidence": {
    "tests_run": true,
    "test_exit_code": 0,
    "coverage_percent": 87,
    "build_success": true,
    "build_exit_code": 0,
    "linter_errors": 0,
    "linter_warnings": 2,
    "timestamp": "2025-11-02T10:30:00Z"
  }
}
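A record with the shape shown above could be assembled from captured exit codes as in this sketch. The function name and its parameters are assumptions for illustration; how the record is actually written into shared context is not specified here.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def build_quality_evidence(
    *,
    test_exit_code: int,
    build_exit_code: int,
    coverage_percent: float,
    linter_errors: int,
    linter_warnings: int,
    now: Optional[datetime] = None,
) -> dict:
    """Assemble a quality_evidence record; `now` is assumed to be in UTC."""
    now = now or datetime.now(timezone.utc)
    return {
        "quality_evidence": {
            "tests_run": True,
            "test_exit_code": test_exit_code,
            "coverage_percent": coverage_percent,
            "build_success": build_exit_code == 0,  # derived, never asserted by hand
            "build_exit_code": build_exit_code,
            "linter_errors": linter_errors,
            "linter_warnings": linter_warnings,
            "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
    }


if __name__ == "__main__":
    record = build_quality_evidence(
        test_exit_code=0, build_exit_code=0, coverage_percent=87,
        linter_errors=0, linter_warnings=2,
    )
    print(json.dumps(record, indent=2))
```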
Evidence Files (Secondary)
- `.claude/quality-gates/evidence/` directory
- File name pattern: `{type}-{timestamp}.log`
- Example: `tests-2025-11-02-103000.log`

Task Completion Messages
Minimum:
- ✅ Tests executed with captured exit code
- ✅ Timestamp recorded
- ✅ Evidence stored in context

Production-grade:
- ✅ Tests pass (exit code 0)
- ✅ Coverage ≥70% (or project standard)
- ✅ Build succeeds (exit code 0)
- ✅ No critical linter errors
- ✅ Type checker passes
- ✅ Security scan shows no critical issues

Beyond production-grade:
- ✅ All production-grade requirements
- ✅ Coverage ≥80%
- ✅ No linter warnings
- ✅ Performance benchmarks within thresholds
- ✅ Accessibility audit passes (WCAG 2.1 AA)
- ✅ Integration tests pass
- ✅ Deployment verification complete
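Some of the requirements above can be checked mechanically against a `quality_evidence` record. The sketch below covers only the subset that record can express (tests, coverage, build, linter errors). It simplifies by treating every linter error as critical, and the function name is hypothetical.

```python
def unmet_requirements(evidence: dict, min_coverage: float = 70) -> list[str]:
    """List the checked requirements that fail; an empty list means all of them pass."""
    failures = []
    if evidence.get("test_exit_code") != 0:
        failures.append("tests must pass (exit code 0)")
    if evidence.get("coverage_percent", 0) < min_coverage:
        failures.append(f"coverage must be >= {min_coverage}%")
    if evidence.get("build_exit_code") != 0:
        failures.append("build must succeed (exit code 0)")
    # Simplification: any linter error counts as critical here.
    if evidence.get("linter_errors", 0) > 0:
        failures.append("linter must report no errors")
    return failures


if __name__ == "__main__":
    evidence = {"test_exit_code": 0, "coverage_percent": 87,
                "build_exit_code": 0, "linter_errors": 0}
    print(unmet_requirements(evidence))                   # 70% threshold
    print(unmet_requirements(evidence, min_coverage=90))  # a stricter project standard
```

Missing fields count as failures, so absent evidence is never read as a pass.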
Bad:
"I've implemented the login feature. It should work correctly."
Good:
"I've implemented the login feature. Evidence:
- Tests: Exit code 0, 12 tests passed, 0 failed
- Build: Exit code 0, no errors
- Coverage: 89%
Task complete with verification."
Bad:
"Tests passed" (without actually running them)
Good:
"Tests passed. Exit code: 0
Command: npm test
Output: Test Suites: 3 passed, 3 total
Timestamp: 2025-11-02 10:30:15"
Bad:
"Build failed with exit code 1, but the code looks correct so marking complete."
Good:
"Build failed with exit code 1. Errors:
- TypeError: Cannot read property 'id' of undefined (line 42)
Fixing the error now before marking complete."
Bad:
"Tests passed yesterday, so the code is still good."
Good:
"Re-running tests after today's changes.
New evidence: Exit code 0, 45 tests passed, coverage 87%"
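The last pair above is about stale evidence: results collected before the most recent change prove nothing about the current code. A simple staleness check might look like this, assuming evidence timestamps use the `YYYY-MM-DDTHH:MM:SSZ` form from the shared-context example. The function name is hypothetical.

```python
from datetime import datetime, timezone


def evidence_is_stale(evidence_timestamp: str, last_change: datetime) -> bool:
    """Return True if the evidence was collected before the last code change."""
    collected = datetime.strptime(
        evidence_timestamp, "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return collected < last_change


if __name__ == "__main__":
    changed = datetime(2025, 11, 3, 9, 0, 0, tzinfo=timezone.utc)
    # Evidence from the previous day is stale relative to this change.
    print(evidence_is_stale("2025-11-02T10:30:00Z", changed))
```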
Evidence is automatically tracked in shared context:
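// Context structure includes:
// (`SharedContext` is an illustrative name; the source does not name this type.)
interface SharedContext {
  quality_evidence?: {
    tests_run: boolean;
    test_exit_code?: number;
    coverage_percent?: number;
    build_success?: boolean;
    build_exit_code?: number;  // appears in the JSON example above
    linter_errors?: number;
    linter_warnings?: number;  // appears in the JSON example above
    timestamp: string;         // e.g. "2025-11-02T10:30:00Z"
  };
}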
Evidence collection feeds into quality gates:
In parallel execution:
Before marking task complete:
- [ ] Tests executed
- [ ] Test exit code captured (0 = pass)
- [ ] Build executed (if applicable)
- [ ] Build exit code captured (0 = pass)
- [ ] Code quality checks run (linter, types)
- [ ] Evidence documented with timestamp
- [ ] Evidence added to shared context
- [ ] Evidence summary in completion message
JavaScript/TypeScript:
npm test # Run tests
npm run build # Build project
npm run lint # Run ESLint
npm run typecheck # Run TypeScript compiler
Python:
pytest # Run tests
pytest --cov # Run tests with coverage
ruff check . # Run linter
mypy . # Run type checker
Rust:
cargo test # Run tests
cargo build # Build project
cargo clippy # Run linter
Go:
go test ./... # Run tests
go build # Build project
golangci-lint run # Run linter
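The per-language commands above can be selected automatically by looking for a marker file in the project root. The marker files used here (`package.json`, `pyproject.toml`, `Cargo.toml`, `go.mod`) are an assumption of this sketch; a Python project without `pyproject.toml`, for instance, would not be detected.

```python
from pathlib import Path

# Marker file -> verification commands, taken from the lists above.
COMMANDS = {
    "package.json": ["npm test", "npm run build", "npm run lint", "npm run typecheck"],
    "pyproject.toml": ["pytest --cov", "ruff check .", "mypy ."],
    "Cargo.toml": ["cargo test", "cargo build", "cargo clippy"],
    "go.mod": ["go test ./...", "go build", "golangci-lint run"],
}


def commands_for(project_dir: str) -> list[str]:
    """Return the commands for the first marker file found, or an empty list."""
    root = Path(project_dir)
    for marker, commands in COMMANDS.items():
        if (root / marker).is_file():
            return commands
    return []


if __name__ == "__main__":
    print(commands_for("."))
```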
See /skills/evidence-verification/examples/ for:
v1.0.0 - Initial release
Remember: Evidence-first development prevents hallucinations, ensures production quality, and builds confidence. When in doubt, collect more evidence, not less.