plugins/autopilot/skills/proof-of-work/SKILL.md
Proof artifact generation patterns for task validation. Covers screenshots, test results, deployments, and confidence scoring.
npx skillsauth add madappgang/magus proof-of-work

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
plugin: autopilot
updated: 2026-01-20

Version: 0.1.0
Purpose: Generate validation artifacts for autonomous task completion
Status: Phase 1
Use this skill when you need to:
Proof-of-work is the mechanism that validates task completion. Every finished task must include verifiable artifacts that demonstrate the work was done correctly.
| Artifact | Required | Purpose |
|----------|----------|---------|
| Git diff | Yes | Show minimal, focused changes |
| Test results | Yes | All tests passing |
| Regression test | Yes | Specific test for the bug |
| Error log (before/after) | Optional | Visual evidence |

| Artifact | Required | Purpose |
|----------|----------|---------|
| Screenshots | Yes | Visual verification |
| Test results | Yes | Functionality works |
| Coverage report | Yes | >= 80% coverage |
| Build output | Yes | Builds successfully |
| Deployment URL | Optional | Live demo |

| Artifact | Required | Purpose |
|----------|----------|---------|
| Desktop screenshot | Yes | 1920x1080 view |
| Mobile screenshot | Yes | 375x667 view |
| Tablet screenshot | Yes | 768x1024 view |
| Accessibility score | Yes | >= 80 Lighthouse |
| Visual regression | Optional | BackstopJS diff |
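
The requirement tables above can be treated as data and checked mechanically. The following is a minimal sketch, not part of the skill itself: `ArtifactRequirement`, `bugFixRequirements`, and `missingRequired` are hypothetical names, and labeling the first table as the bug-fix case is an assumption, since the task-type label is not present in the source.

```typescript
// One row of a requirements table: artifact name plus whether it is mandatory.
interface ArtifactRequirement {
  artifact: string;
  required: boolean;
}

// Rows copied from the first table above. The constant name is an assumption:
// the source does not state which task type this table belongs to.
const bugFixRequirements: ArtifactRequirement[] = [
  { artifact: 'Git diff', required: true },
  { artifact: 'Test results', required: true },
  { artifact: 'Regression test', required: true },
  { artifact: 'Error log (before/after)', required: false },
];

// Returns the names of required artifacts that were not provided.
function missingRequired(
  requirements: ArtifactRequirement[],
  provided: string[],
): string[] {
  const have = new Set(provided);
  return requirements
    .filter((r) => r.required && !have.has(r.artifact))
    .map((r) => r.artifact);
}

const missing = missingRequired(bugFixRequirements, ['Git diff', 'Test results']);
console.log(missing); // [ 'Regression test' ]
```

Optional rows never appear in the result, so a task that omits the error log still passes this check.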
Playwright Pattern:
```typescript
import { chromium } from 'playwright';

// Captures full-page screenshots of `url` at desktop, mobile, and tablet sizes.
async function captureScreenshots(url: string, outputDir: string) {
  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext();
  const page = await context.newPage();

  // Desktop
  await page.setViewportSize({ width: 1920, height: 1080 });
  await page.goto(url);
  await page.waitForLoadState('networkidle');
  await page.screenshot({
    path: `${outputDir}/desktop.png`,
    fullPage: true,
  });

  // Mobile
  await page.setViewportSize({ width: 375, height: 667 });
  await page.goto(url);
  await page.waitForLoadState('networkidle');
  await page.screenshot({
    path: `${outputDir}/mobile.png`,
    fullPage: true,
  });

  // Tablet
  await page.setViewportSize({ width: 768, height: 1024 });
  await page.goto(url);
  await page.waitForLoadState('networkidle');
  await page.screenshot({
    path: `${outputDir}/tablet.png`,
    fullPage: true,
  });

  await browser.close();
}
```
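
The three capture steps differ only in viewport size and file name, so they can also be described as data. This is a sketch of that idea under stated assumptions: `ViewportSpec`, `VIEWPORTS`, and `screenshotPlan` are hypothetical names that do not appear in the skill, and the sizes are the ones used in the Playwright pattern.

```typescript
// A named viewport size to capture.
interface ViewportSpec {
  name: string;
  width: number;
  height: number;
}

// Sizes taken from the Playwright pattern; the constant itself is hypothetical.
const VIEWPORTS: ViewportSpec[] = [
  { name: 'desktop', width: 1920, height: 1080 },
  { name: 'mobile', width: 375, height: 667 },
  { name: 'tablet', width: 768, height: 1024 },
];

// Builds one capture job per viewport, with the output path for each screenshot.
function screenshotPlan(outputDir: string): Array<ViewportSpec & { path: string }> {
  return VIEWPORTS.map((v) => ({ ...v, path: `${outputDir}/${v.name}.png` }));
}

for (const job of screenshotPlan('proof')) {
  console.log(`${job.width}x${job.height} -> ${job.path}`);
}
```

Each job could then drive one `page.setViewportSize({ width, height })` and `page.screenshot({ path, fullPage: true })` pair inside a loop, which would replace the three repeated blocks.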
Algorithm:
```typescript
interface ProofArtifacts {
  testResults?: { passed: number; total: number };
  buildSuccessful?: boolean;
  lintErrors?: number;
  screenshots?: string[];
  testCoverage?: number;
  performanceScore?: number; // Collected, but not scored by this algorithm
}

// Returns a confidence score from 0 to 100 based on the artifacts provided.
function calculateConfidence(artifacts: ProofArtifacts): number {
  let score = 0;

  // Tests (40 points): all-or-nothing, every test must pass.
  // The total > 0 guard keeps an empty test run from earning full points.
  if (artifacts.testResults) {
    const { passed, total } = artifacts.testResults;
    if (total > 0 && passed === total) {
      score += 40;
    }
  }

  // Build (20 points)
  if (artifacts.buildSuccessful) {
    score += 20;
  }

  // Coverage (20 points): tiered; missing or zero coverage earns nothing.
  if (artifacts.testCoverage !== undefined && artifacts.testCoverage > 0) {
    if (artifacts.testCoverage >= 80) score += 20;
    else if (artifacts.testCoverage >= 60) score += 15;
    else if (artifacts.testCoverage >= 40) score += 10;
    else score += 5;
  }

  // Screenshots (10 points): full credit needs all three viewports.
  if (artifacts.screenshots) {
    if (artifacts.screenshots.length >= 3) score += 10;
    else if (artifacts.screenshots.length >= 1) score += 5;
  }

  // Lint (10 points): only when lint ran and reported zero errors.
  if (artifacts.lintErrors === 0) {
    score += 10;
  }

  return score;
}
```
| Confidence | Action |
|------------|--------|
| >= 95% | Auto-approve (In Review -> Done) |
| 80-94% | Manual review required |
| < 80% | Validation failed, iterate |
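
The thresholds in this table map directly onto a small decision function. The sketch below is illustrative only: `ReviewAction` and `decideAction` are hypothetical names, and the score is assumed to be the 0 to 100 value returned by `calculateConfidence`.

```typescript
// The three outcomes from the threshold table (names are hypothetical).
type ReviewAction = 'auto-approve' | 'manual-review' | 'iterate';

// Maps a 0-100 confidence score to the action in the threshold table.
function decideAction(confidence: number): ReviewAction {
  if (confidence >= 95) return 'auto-approve'; // In Review -> Done
  if (confidence >= 80) return 'manual-review'; // 80-94%
  return 'iterate'; // below 80%: validation failed
}

console.log(decideAction(100)); // auto-approve
console.log(decideAction(85)); // manual-review
console.log(decideAction(40)); // iterate
```

Checking the highest threshold first keeps each branch to a single comparison and makes the boundaries at 95 and 80 explicit.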
# Proof of Work
**Task**: {issue_id}
**Type**: {task_type}
**Confidence**: {score}%
## Test Results
- Total: {total}
- Passed: {passed}
- Failed: {failed}
- Coverage: {coverage}%
## Build
- Status: {status}
- Duration: {duration}
## Screenshots
- Desktop: proof/desktop.png
- Mobile: proof/mobile.png
- Tablet: proof/tablet.png
## Artifacts
- test-results.txt
- coverage.json
- build-output.txt
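
One way to fill in the template above is a plain string-rendering function. This is a sketch under stated assumptions: `ProofSummary` and `renderProof` are hypothetical names, the failed count is assumed to be `total - passed`, and the values in the usage call (including the issue ID and task type) are made up for illustration.

```typescript
// Values for the template placeholders (interface name is hypothetical).
interface ProofSummary {
  issueId: string;
  taskType: string;
  score: number;
  total: number;
  passed: number;
  coverage: number;
  buildStatus: string;
  buildDuration: string;
}

// Renders the proof-of-work summary as Markdown text.
function renderProof(s: ProofSummary): string {
  const failed = s.total - s.passed; // Assumption: every test either passed or failed
  return [
    '# Proof of Work',
    `**Task**: ${s.issueId}`,
    `**Type**: ${s.taskType}`,
    `**Confidence**: ${s.score}%`,
    '',
    '## Test Results',
    `- Total: ${s.total}`,
    `- Passed: ${s.passed}`,
    `- Failed: ${failed}`,
    `- Coverage: ${s.coverage}%`,
    '',
    '## Build',
    `- Status: ${s.buildStatus}`,
    `- Duration: ${s.buildDuration}`,
  ].join('\n');
}

// Illustrative values only.
const report = renderProof({
  issueId: 'EXAMPLE-1',
  taskType: 'bugfix',
  score: 100,
  total: 15,
  passed: 15,
  coverage: 85,
  buildStatus: 'success',
  buildDuration: '42s',
});
console.log(report);
```

The Screenshots and Artifacts sections are omitted here because they contain fixed paths rather than placeholders; they can be appended verbatim.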
```typescript
const proof = {
  testResults: { passed: 15, total: 15 },
  buildSuccessful: true,
  lintErrors: 0,
  screenshots: ['desktop.png', 'mobile.png', 'tablet.png'],
  testCoverage: 85,
};
const confidence = calculateConfidence(proof);
// 40 (tests) + 20 (build) + 20 (coverage) + 10 (screenshots) + 10 (lint) = 100%
```

```typescript
const proof = {
  testResults: { passed: 12, total: 15 }, // Some failing
  buildSuccessful: true,
  lintErrors: 2,
  screenshots: ['desktop.png'],
  testCoverage: 65,
};
const confidence = calculateConfidence(proof);
// 0 (tests fail) + 20 (build) + 15 (coverage) + 5 (1 screenshot) + 0 (lint errors) = 40%
// Result: Validation failed, must iterate
```