skills/visual-validation/SKILL.md
# Visual Validation: Autonomous Screenshot Verification
Install this skill globally with one command: `npx skillsauth add alinaqi/claude-bootstrap skills/visual-validation`. Works with Claude Code, Cursor, and Windsurf.
## Philosophy

Every UI change should be visually verified before it ships. Peekaboo captures pixel-accurate screenshots. The system compares the before and after captures and flags visual regressions. No manual "looks good to me": the machine verifies what the machine built.
## Autonomous Flow

```
static/* files modified (detected by auto-review-hook or E2E testkit)
↓
peekaboo image --mode screen → ~/.maggy/visual-verify/after-{ts}.png
↓
Compare with latest baseline (previous after-*.png)
↓
AI evaluation: Gemini Flash analyzes screenshot for visual regressions
↓
Report: CLEAN or REGRESSION_DETECTED with specific issues
```
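The "compare with latest baseline" step needs a way to pick the previous capture. A minimal sketch, assuming captures follow the `after-{ts}.png` naming shown above with a sortable timestamp such as `%Y%m%d-%H%M%S`. `latest_baseline` is a hypothetical helper name, not part of Peekaboo or Maggy:

```shell
# Hypothetical helper: print the newest after-*.png in a directory.
# An optional second argument names a file to skip (for example, the
# capture that was just taken), so the result is the previous baseline.
latest_baseline() {
  local dir="$1"
  local exclude="${2:-}"
  local newest=""
  local f
  for f in "$dir"/after-*.png; do
    [ -e "$f" ] || continue            # the glob matched nothing
    [ "$f" = "$exclude" ] && continue  # skip the fresh capture
    newest="$f"                        # matches expand in sorted order, so the last one is newest
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

The function returns a non-zero status when no baseline exists, which lets a caller treat the first capture as the baseline instead of reporting a regression.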
```shell
# Install
brew install steipete/tap/peekaboo

# Capture full system screenshot
peekaboo image --mode screen --retina --path /tmp/screen.png

# Capture specific window
peekaboo image --mode window --app "Google Chrome" --path /tmp/window.png

# Verify Maggy dashboard after code change
peekaboo image --mode window --app "Google Chrome" --path ~/.maggy/visual-verify/after-$(date -u +%Y%m%d-%H%M%S).png

# List captures to review before/after pairs (this lists files; it does not diff them)
ls -la ~/.maggy/visual-verify/after-*.png
```
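Listing the directory only shows which captures exist. As a cheap first pass before any AI evaluation, two captures can be checked for byte equality with the standard `cmp` utility. This is a sketch under that assumption: `compare_captures` and its status words are hypothetical, and a byte difference only means the files differ, not that a regression occurred.

```shell
# Hypothetical helper: byte-level comparison of two capture files.
# Prints IDENTICAL, CHANGED, or MISSING and returns 0, 1, or 2 accordingly.
compare_captures() {
  local before="$1"
  local after="$2"
  if [ ! -f "$before" ] || [ ! -f "$after" ]; then
    echo "MISSING"
    return 2
  fi
  if cmp -s "$before" "$after"; then
    echo "IDENTICAL"   # same bytes, nothing to evaluate
    return 0
  fi
  echo "CHANGED"       # files differ; hand off to visual or AI review
  return 1
}
```

Identical files can skip the evaluation step entirely; changed files still need a visual or AI review to decide between CLEAN and REGRESSION_DETECTED.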
After making any static/ file change, Claude should:

1. Check what changed: `git add maggy/maggy/static/ && git diff --stat HEAD`
2. Restart Maggy: `lsof -ti :8080 | xargs kill && python3 -m maggy.main &`
3. Wait for health: `curl -s http://localhost:8080/api/health`
4. Capture screenshot: `peekaboo image --mode screen --path ~/.maggy/visual-verify/after-$(date -u +%Y%m%d-%H%M%S).png`
5. Compare with latest baseline in `~/.maggy/visual-verify/`
6. AI evaluation: `~/bin/gemini-api --flash "Analyze this screenshot for visual regressions"`
7. Report findings
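The final report step can be turned into an exit status so that a hook or script can block on a regression. A minimal sketch, assuming the evaluator's text output contains one of the tokens `CLEAN` or `REGRESSION_DETECTED` named in the flow. The output format of `~/bin/gemini-api` is not documented here, and `gate_on_verdict` is a hypothetical helper:

```shell
# Hypothetical helper: map evaluator output to a verdict and an exit status.
# Assumption: the report text contains CLEAN or REGRESSION_DETECTED.
gate_on_verdict() {
  local report="$1"
  case "$report" in
    *REGRESSION_DETECTED*)
      echo "REGRESSION_DETECTED"
      return 1
      ;;
    *CLEAN*)
      echo "CLEAN"
      return 0
      ;;
    *)
      echo "UNKNOWN"   # treat unparseable output as a failure, not a pass
      return 2
      ;;
  esac
}
```

Checking for `REGRESSION_DETECTED` first means a report that mentions both tokens fails closed.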
```shell
# One-shot verification (-f makes curl fail on HTTP error responses, so a broken server stops the chain)
curl -sf http://localhost:8080/api/health && \
peekaboo image --mode screen --path ~/.maggy/visual-verify/verify.png && \
echo "✓ Dashboard verified — screenshot at ~/.maggy/visual-verify/verify.png"
```
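A single health check can fail right after a restart, before the server is listening. One way to handle that is a small retry loop around the probe. This is a sketch, not part of Maggy or Peekaboo: `wait_until` is a hypothetical helper, and the attempt count and delay in the usage comment are arbitrary.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  local attempts="$1"
  local delay="$2"
  local i=0
  shift 2
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0          # probe succeeded
    fi
    i=$((i + 1))
    sleep "$delay"      # back off before the next attempt
  done
  return 1              # never became healthy
}

# Example (values are illustrative):
# wait_until 30 1 curl -sf http://localhost:8080/api/health \
#   && peekaboo image --mode screen --path ~/.maggy/visual-verify/verify.png
```

Passing the probe as arguments keeps the helper independent of curl, so the same loop can wait on any readiness check.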
- `static/` changes in git diff
- `~/.maggy/visual-verify/`
- `POST /api/testkit/run`

| Scenario | Command |
|----------|---------|
| After static/ change | `peekaboo image --mode screen --path ~/.maggy/visual-verify/after.png` |
| Before committing UI change | Run verification flow above |
| Reviewing dashboard | Screenshot + AI analysis |
| Build-in-public post | Auto-captured by plugin |
| Cross-browser check | `peekaboo image --mode window --app "Safari"` |