/SKILL.md
Pure-discovery autonomous quality engine that reads ANY codebase, dynamically generates metrics from what it finds, auto-fixes what it can, creates GitHub issues for what it can't, and loops continuously until all auto-fixable metrics reach 10/10. No templates, no hardcoded categories — everything is discovered from the project itself. Each cycle re-reads the project, refreshes the rubric (adding new metrics, removing obsolete ones), and adapts to what the project looks like NOW. Works with any language, any framework, any project type. Supports goal-driven mode where you specify what you're building and the engine measures completeness + quality against that goal. Use when the user asks to "review everything", "score the app", "find all issues", "perfection engine", "run quality loop", "audit all metrics", "make everything 10/10", "score my project", "score my code", "rate my code", "find everything wrong", "quality score", "review my API", "audit my codebase", "what should I build next", "is my project complete", "continuous improvement", "portfolio review", "check my code quality", "how good is my code", or wants autonomous quality improvement.
npx skillsauth add adaptationio/perfection-engine perfection-engine
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Reads any codebase. Discovers what matters. Measures it. Fixes it. Loops until perfect.
No templates. No hardcoded categories. Everything generated from the project itself.
# THE ULTIMATE COMMAND — runs everything, makes the project the best it can be
/perfection-engine max # All features: score + fix + ideas + implement + loop forever
# Individual modes
/perfection-engine # Quality loop only (score + fix until 10/10)
/perfection-engine --goal "description" # Goal-driven: measure completeness against an end state
/perfection-engine --ideas # Quality loop + ideas discovery each cycle
/perfection-engine score # Score only (no fixes)
/perfection-engine score --changed-only # Score only files changed since last cycle
/perfection-engine validate # Validate scoring quality
/perfection-engine fix # Fix failing metrics only
/perfection-engine fix --dry-run # Show what fixes would be applied
/perfection-engine fix --quick-wins # Fix easiest high-impact items first
/perfection-engine ideas # Discover improvement opportunities
/perfection-engine ideas --implement # Discover AND auto-implement best ideas
/perfection-engine report # Display latest scorecard
/perfection-engine portfolio # Cross-project dashboard
/perfection-engine export # Export scores to CSV/JSON
/perfection-engine reset # Delete state, start fresh
/perfection-engine max runs EVERY feature in a single continuous loop:
DISCOVER → SCORE → VALIDATE → FIX → IDEAS → IMPLEMENT → LOOP
  ↑                                                      |
  └──────────────────────────────────────────────────────┘
What it does each cycle:
It combines:
The result: The engine doesn't just make your code better — it makes your PROJECT better. It finds what's missing, builds it, then makes sure it's high quality. It loops until every quality metric is 10/10 AND there are no more improvement ideas to implement.
Safety: All MAX mode safety rails apply — max 20 cycles, $50/cycle cost cap, simplicity gate, ratcheting, dev branch only, never deletes features.
READ the project → DISCOVER what matters → SCORE everything →
FIX what's broken → LOOP (re-read, re-discover, re-score, re-fix)
The engine has zero opinions about what your project should look like. It reads the code, understands what the project does, generates categories and metrics that make sense for THIS project, scores them, fixes what it can, and repeats.
Each cycle, the rubric refreshes — new metrics added for new code, obsolete metrics removed for deleted code, priorities adjusted based on current state.
The engine's intelligence lives here. Everything is generated dynamically.
mkdir -p .claude/skills/perfection-engine-state
- perfection-rubric.json exists → load it (continuing from previous cycle)
- current-cycle.json shows in-progress and started less than 2 hours ago → STOP with message "Another cycle is in progress (started {time}). Wait or run /perfection-engine reset." Locks older than 2 hours are considered stale and ignored.
- .perfection-engine.yml exists in project root → load user config overrides

{"cycle": N, "status": "in-progress|completed|paused|error", "started_at": "ISO", "completed_at": "ISO|null", "metrics_scored": 0, "metrics_fixed": 0}

Build a comprehensive understanding of the project by reading:
Skip binary files: Do not read images, fonts, compiled binaries, databases, or other non-text files during discovery. Detect by extension (.png, .jpg, .gif, .woff, .ttf, .ico, .sqlite, .wasm, etc.) or by failed text decoding.
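The two detection rules above (extension match, then failed text decoding) could be sketched in Python roughly as follows. The function name `is_binary`, the 4096-byte sample size, the NUL-byte shortcut, and the choice of UTF-8 are assumptions made for illustration; the extension set only contains the examples named in the text.

```python
import codecs
from pathlib import Path

# Extensions named in the text; illustrative, not exhaustive.
BINARY_EXTENSIONS = {".png", ".jpg", ".gif", ".woff", ".ttf", ".ico", ".sqlite", ".wasm"}


def is_binary(path: Path, sample_size: int = 4096) -> bool:
    """Return True if the file should be skipped during discovery."""
    if path.suffix.lower() in BINARY_EXTENSIONS:
        return True
    try:
        with path.open("rb") as handle:
            chunk = handle.read(sample_size)
    except OSError:
        return True  # assumption: unreadable files are skipped too
    if b"\x00" in chunk:
        return True  # NUL bytes are a common binary marker (heuristic)
    try:
        # final=False tolerates a multi-byte character cut off by the sample boundary.
        codecs.getincrementaldecoder("utf-8")().decode(chunk, final=False)
    except UnicodeDecodeError:
        return True
    return False
```

Checking the extension first avoids opening files that are obviously not text.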
For large projects (>200 source files): Sample strategically — read entry points, config files, key directories, and a representative sample of source files. Use subagents to read different areas in parallel.
Save understanding as project-profile.json:
{
"project_name": "detected from manifest",
"description": "LLM-generated 1-sentence summary of what this project does",
"languages": ["typescript", "python"],
"frameworks": ["react", "express"],
"features_detected": [
"user authentication",
"payment processing",
"real-time chat",
"admin dashboard"
],
"architecture": "monolith|microservices|serverless|static|library|cli",
"has_frontend": true,
"has_backend": true,
"has_tests": true,
"has_ci": true,
"has_deployment": true,
"urls": { "dev": "...", "prod": "..." },
"source_file_count": 150,
"test_runner": "jest|pytest|cargo-test|go-test|make|null",
"goal": "from --goal flag or .perfection-engine.yml or null"
}
If source_file_count == 0 → report "No source files found. Nothing to measure." and exit.
If source_file_count < 5 → generate a simplified rubric focused on project setup
(README quality, license, gitignore, basic linting, dependency management) with ~10 metrics.
Suggest the user re-run after the project has more substance.
This is the core innovation. The LLM reads the project and generates ALL categories and metrics dynamically. No templates, no pre-defined lists.
Rubric generation prompt:
You just read this project: {project_profile}
Source files sampled: {file_list_with_summaries}
Generate a comprehensive quality rubric for THIS specific project.
For each category you discover:
- Give it a short ID (2-5 chars) and descriptive name
- Explain why it matters for THIS project
- Generate 5-25 specific, measurable metrics
For each metric:
- Give it an ID ({category}-{NNN})
- Name it specifically (not generic — reference actual files/features)
- Describe exactly how to measure it (what tool, what command, what to look for)
- Classify the scoring method:
deterministic — run a command, check output (lint, grep, test, curl)
playwright — navigate a URL, interact with UI, verify state
llm_judged — read code/output, evaluate quality with structured rubric
statistical — run N samples, verify distribution
unit_test_proxy — run the project's own test suite
- Determine if it can be auto-fixed (true/false)
- Define what 10/10 looks like for this specific metric
Guidelines:
- Be SPECIFIC to this project. "Login page loads" not "Page loads"
- Reference actual file paths, function names, endpoints you found
- Cover ALL dimensions: does it work, is it secure, is it fast, is it accessible,
is it well-coded, is it well-tested, is it well-documented
- If a goal was provided, generate COMPLETENESS metrics:
does feature X exist, does flow Y work end-to-end
- Include metrics at every level: code, feature, flow, system
- Remove anything generic that doesn't apply
Return as JSON:
{
"categories": {
"category_id": {
"name": "Category Name",
"why": "Why this matters for this project",
"metrics": [
{
"id": "CAT-001",
"name": "Specific metric name",
"description": "How to measure this",
"type": "deterministic|playwright|llm_judged|statistical|unit_test_proxy",
"auto_fixable": true,
"what_10_looks_like": "Description of perfect score",
"score": null,
"incumbent_score": null,
"target": 10,
"scope": ["src/auth/*.js"],
"goal_driven": false,
"history": []
}
]
}
}
}
On subsequent cycles, the engine re-reads the project and re-generates the rubric. Then merges with the existing rubric:
Rubric cleanup rules:
max_metrics range is 1-500)
When a goal is provided, the LLM generates an additional layer of metrics:
Goal: "{goal_text}"
Current features detected: {features_detected}
What features/flows/capabilities are NEEDED to achieve this goal but
do NOT exist in the codebase? For each missing feature:
- Name it
- Describe what "complete" looks like
- Define 3-5 acceptance criteria
- Estimate complexity (small/medium/large)
Also: for features that PARTIALLY exist, what's missing?
These generate metrics scored using any of the five standard methods (usually playwright or deterministic), and are tagged with "goal_driven": true to distinguish them from structural metrics:
{
"id": "GOAL-001",
"name": "Payment flow exists and works end-to-end",
"type": "playwright",
"goal_driven": true,
"score": 0,
"what_10_looks_like": "User can complete checkout, payment processes, confirmation shows"
}
git add .claude/skills/perfection-engine-state/
git commit -m "perfection-engine: cycle {N} DISCOVER"
If no git repo, write state files without committing.
Score every metric in the rubric. The scoring methods are universal — they're tools, not opinions.
Before starting, estimate and display:
Project: {name}
Metrics discovered: {N} across {M} categories
Estimated scoring time: {T} minutes
Estimated API cost: ${C}
Read references/scoring-methods.md for detailed protocols.
| Method | What It Does | When Used |
|--------|-------------|-----------|
| deterministic | Run a command, check output | Lint, grep, test suites, curl, file checks |
| playwright | Navigate URL, interact with UI, verify state | Frontend flows, forms, auth gates |
| llm_judged | Read code/output, evaluate quality | Code review, prompt quality, documentation |
| statistical | Run N samples, verify distribution | Randomness, balance, probability systems |
| unit_test_proxy | Run the project's test suite | Map pass/fail ratio to 0-10 |
Scoring scale: 0 (broken) → 5 (adequate) → 10 (perfect). Full definitions in references/scoring-methods.md.
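As an illustration of the unit_test_proxy row, a pass/fail ratio might be mapped onto the 0-10 scale like this. The linear mapping and the name `ratio_to_score` are assumptions; the actual protocol is defined in references/scoring-methods.md and may differ.

```python
from typing import Optional


def ratio_to_score(passed: int, total: int) -> Optional[int]:
    """Map a test pass ratio to a 0-10 score (assumed linear mapping)."""
    if total <= 0:
        return None  # nothing ran, so the metric cannot be scored
    return round(10 * passed / total)
```

Returning `None` rather than 0 when no tests ran keeps "unmeasured" distinct from "broken".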
Group metrics by scoring method:
Process each batch in groups of 20, compact context between groups
Dependency awareness: If metric A depends on metric B (e.g., "login works" gates "dashboard loads"), and B failed → skip A, score as null with reason
Incremental mode (--changed-only): Use git diff to identify changed files,
only re-score metrics related to those files, carry forward unchanged scores
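One possible way to select the metrics affected by a change is to match each metric's `scope` patterns against the changed paths, for example the output of `git diff --name-only`. This is a minimal sketch: the helper name `metrics_to_rescore` and the use of shell-style glob matching are assumptions, not the engine's documented behavior.

```python
from fnmatch import fnmatchcase


def metrics_to_rescore(metrics, changed_files):
    """Return IDs of metrics whose scope patterns match any changed file."""
    affected = []
    for metric in metrics:
        patterns = metric.get("scope", [])
        if any(fnmatchcase(path, pattern)
               for path in changed_files
               for pattern in patterns):
            affected.append(metric["id"])
    return affected
```

Metrics not returned here would carry their previous scores forward unchanged.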
Before scoring, discover what tools and capabilities are available in this session. The engine adapts its scoring methods based on what it finds — never assume a tool exists, always check first.
Discovery checklist (run at start of each cycle):
MCP servers: Check what MCP tools are available (Playwright, Exa, Tavily, etc.). If Playwright MCP is connected → browser-based scoring is available. If no browser tools → skip all playwright-type metrics or fall back to curl/API checks.
Installed skills: Check what skills are loaded in this session.
If superpowers:writing-plans available → use for fix planning.
If e2e-playwright-core available → leverage its patterns.
If none → fall back to inline approaches.
Plugin tools: Check for any plugin-provided tools that could enhance scoring (web search, code analysis, database queries, etc.).
CLI tools on the system: Check what's installed locally.
Run which npm node python3 cargo go java rustc docker kubectl gh curl (or equivalent).
Only generate metrics that the local toolchain can actually measure.
Project's own tooling: Check package.json scripts, Makefile targets, CI workflows for project-specific commands that can be leveraged for scoring.
Save discovered capabilities to project-profile.json:
{
"capabilities": {
"browser": true,
"mcp_tools": ["playwright", "exa", "tavily"],
"skills": ["superpowers:writing-plans", "superpowers:executing-plans"],
"cli_tools": ["npm", "node", "python3", "docker", "kubectl", "gh"],
"project_scripts": ["test", "lint", "build", "typecheck"],
"can_deploy": true,
"can_create_issues": true
}
}
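The CLI part of the discovery checklist could be approximated with the standard library's `shutil.which`. In this sketch the candidate list mirrors the tools named above, and deriving `can_create_issues` from the presence of `gh` is an assumption made for illustration.

```python
import shutil

# Tools named in the discovery checklist above.
CANDIDATE_TOOLS = ["npm", "node", "python3", "cargo", "go", "java",
                   "rustc", "docker", "kubectl", "gh", "curl"]


def detect_cli_tools(candidates=CANDIDATE_TOOLS):
    """Return the subset of candidate tools found on PATH."""
    return [tool for tool in candidates if shutil.which(tool) is not None]


def build_cli_capabilities(candidates=CANDIDATE_TOOLS):
    """Build the CLI-related part of the capabilities object."""
    tools = detect_cli_tools(candidates)
    return {
        "cli_tools": tools,
        # Assumption: issue creation is treated as available only when gh is installed.
        "can_create_issues": "gh" in tools,
    }
```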
How capabilities affect metric generation:
- No gh CLI → no GitHub issues (log to markdown instead)

Use whatever is available. Adapt to whatever is missing. Never fail because a tool doesn't exist — degrade gracefully and note what was skipped and why.
When scoring will inform the FIX phase, the scores are recorded but detailed scoring
criteria are NOT exposed to fix agents. Fix agents see:
{metric_id, name, score, evidence, suggestion} — not weights or formulas.
This prevents gaming.
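A minimal sketch of that projection, assuming scored metrics are plain dictionaries: only the five listed fields are copied, so weights and formulas never reach the fix agent. The name `fix_agent_view` and the extra fields in the usage example are hypothetical.

```python
VISIBLE_FIELDS = ("metric_id", "name", "score", "evidence", "suggestion")


def fix_agent_view(scored_metric: dict) -> dict:
    """Copy only the fields a fix agent is allowed to see."""
    return {field: scored_metric.get(field) for field in VISIBLE_FIELDS}


# Usage: "weight" and "formula" are hypothetical internal fields.
full = {"metric_id": "AUTH-003", "name": "CSRF protection", "score": 2,
        "evidence": "no CSRF middleware found", "suggestion": "add middleware",
        "weight": 0.4, "formula": "internal"}
print(fix_agent_view(full))
```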
Scores move monotonically upward. Track incumbent_score per metric (best ever).
After a fix, if new_score <= incumbent_score → reject the fix immediately.
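The ratchet rule can be written as a small pure function. This is a sketch: the name `apply_ratchet` and the treatment of a missing incumbent (`None`) as "accept the first measurement" are assumptions.

```python
def apply_ratchet(incumbent_score, new_score):
    """Return (accepted, incumbent) after applying the ratchet rule."""
    if incumbent_score is not None and new_score <= incumbent_score:
        return False, incumbent_score  # reject: the fix did not beat the best score
    return True, new_score  # accept: new_score becomes the incumbent
```

Note that a fix that merely ties the incumbent is rejected, which matches the `<=` in the rule above.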
Write to docs/perfection-engine/cycle-{N}-scorecard.md:
Update docs/perfection-engine/leaderboard.md with best/failed fixes.
After scoring, validate the results before acting on them. This prevents fixing based on bad data.
Before scoring any metrics, verify the environment is stable:
- curl -s -o /dev/null -w "%{http_code}"
- npm test --bail (or equivalent)
- npm run build (or equivalent)

If a check fails, score the affected metrics as null with
null_reason: "environment_unstable"; do NOT score them as 0.

Run these checks on the scored rubric:
Contradiction detection: If metric A says "feature works" (score 10) but metric B says "feature's output is broken" (score 0), flag both for manual review. Detect by finding metric pairs in the same category where scores differ by >7 points and they reference the same file or feature.
Unmeasurable metric detection: If a metric scored null for 2+ consecutive cycles
and has never been successfully measured, it's likely unmeasurable. Remove it from the
rubric and log: "Removed {id}: never successfully measured after {N} attempts".
Coverage analysis: Map all source files to the metrics that reference them via
scope. If any source file has zero metrics → flag as uncovered. If >30% of files
are uncovered → generate additional metrics for those files in the next DISCOVER phase.
Score confidence check: For deterministic metrics, add a confidence level:
- high
- medium
- low

Low-confidence deterministic scores are treated like low-confidence LLM scores:
excluded from auto-fix targeting and flagged for review.

Flaky metric detection: If a metric's score has oscillated (e.g., 8 → 3 → 9 → 2)
across recent cycles, mark it as flaky: true. Flaky metrics get 3 measurements
(median taken) instead of 1. If still oscillating → flag for human review.
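The flaky-metric check could be sketched as below. The text does not define "oscillated" precisely, so the thresholds here (swings of at least 3 points, at least 2 direction reversals) are assumptions, as are the function names.

```python
from statistics import median


def is_flaky(history, min_swing=3, min_reversals=2):
    """Heuristic: large score swings that change direction repeatedly."""
    deltas = [b - a for a, b in zip(history, history[1:]) if abs(b - a) >= min_swing]
    reversals = sum(1 for d1, d2 in zip(deltas, deltas[1:]) if d1 * d2 < 0)
    return reversals >= min_reversals


def score_flaky(measure, samples=3):
    """Measure a flaky metric several times and keep the median."""
    return median(measure() for _ in range(samples))
```

With the example from the text, `is_flaky([8, 3, 9, 2])` is true, while a steadily improving history such as `[5, 6, 7, 8]` is not flagged.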
Append validation results to the scorecard:
## Scoring Validation
- Contradictions found: {N} (details below)
- Unmeasurable metrics removed: {N}
- Uncovered source files: {N} of {total} ({%})
- Low-confidence scores: {N} (excluded from auto-fix)
- Flaky metrics: {N} (using 3x median)
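The contradiction count in that report could come from a pairwise check like the following sketch. It assumes the input is the metric list of a single category and approximates "reference the same file or feature" by identical `scope` entries, which is a simplification; the name `find_contradictions` is hypothetical.

```python
from itertools import combinations


def find_contradictions(metrics, threshold=7):
    """Return ID pairs whose scores differ by more than threshold on shared scope."""
    flagged = []
    for a, b in combinations(metrics, 2):
        if a["score"] is None or b["score"] is None:
            continue  # unscored metrics cannot contradict anything
        shares_scope = bool(set(a.get("scope", [])) & set(b.get("scope", [])))
        if shares_scope and abs(a["score"] - b["score"]) > threshold:
            flagged.append((a["id"], b["id"]))
    return flagged
```

Flagged pairs go to manual review rather than being auto-fixed.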
Fix metrics scoring below target where auto_fixable is true.
Two modes:
Default (impact-priority): LLM looks at all failing metrics and orders by:
Quick-wins mode (--quick-wins): Sort by effort-adjusted impact:
priority = (target - score) × (1 / estimated_effort)
One-line boolean fixes before complex refactors.
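A worked example of the quick-wins formula, as a sketch. The `estimated_effort` field and its units are assumptions (any positive number, where larger means more work).

```python
def quick_win_priority(metric):
    """priority = (target - score) x (1 / estimated_effort)"""
    return (metric["target"] - metric["score"]) * (1 / metric["estimated_effort"])


def order_quick_wins(metrics):
    """Sort failing metrics so the cheapest large gains come first."""
    return sorted(metrics, key=quick_win_priority, reverse=True)


failing = [
    {"id": "A", "target": 10, "score": 8, "estimated_effort": 1},  # priority 2.0
    {"id": "B", "target": 10, "score": 2, "estimated_effort": 8},  # priority 1.0
    {"id": "C", "target": 10, "score": 5, "estimated_effort": 2},  # priority 2.5
]
print([m["id"] for m in order_quick_wins(failing)])  # ['C', 'A', 'B']
```

Metric B has the largest gap but sorts last because its effort is high.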
- fix-history.json — if same approach failed 2x → skip, create issue
- git stash of current working tree (safety net)
- superpowers:writing-plans if available, otherwise plan inline
- git diff HEAD~1)
- {metric_id}-cycle-{N}-before.png / {metric_id}-cycle-{N}-after.png
- scope overlaps with modified files). If any regressed → revert, create combined issue with evidence.
- fix-history.json — approach, files, lines, scores, evidence paths

After max attempts (default 2) → create GitHub issue with:
fix-history.json records every attempt:
{
"metric_id": "AUTH-003",
"approach": "Added CSRF middleware to Express",
"files_changed": ["backend/middleware/csrf.js"],
"lines_added": 25, "lines_removed": 0,
"score_before": 2, "score_after": 9,
"success": true, "cycle": 3
}
Before fixing, check: "What worked on similar metrics in past cycles?" This makes the engine smarter over time.
If any step fails with an exception:
null_reason: "fix_error: {message}"
/perfection-engine fix --dry-run: Generate fix plans without modifying code.
Output a report showing what would change, which files, estimated risk.
Requires: A git repository with a GitHub remote. If no GitHub remote is detected, skip issue management entirely and log unfixable metrics to docs/perfection-engine/unfixable-metrics.md instead.
.perfection-engine.yml
After each cycle, update docs/perfection-engine/leaderboard.md:
COMPLETE when ALL of:
NOT COMPLETE → compact context, commit state, start next cycle immediately.
Compact context means: discard raw tool outputs and file contents from memory, retain only: metric scores, category scores, fix history summary, current cycle number, and the rubric JSON. This frees context window for the next cycle's DISCOVER phase.
The engine does NOT pause between cycles. It:
/perfection-engine report
Display the latest cycle scorecard. If none exists, suggest running score first.
/perfection-engine reset
Delete .claude/skills/perfection-engine-state/ and docs/perfection-engine/.
Asks for confirmation. Next run starts completely fresh.
/perfection-engine portfolio
Scan the parent directory of the current project for sibling directories containing .claude/skills/perfection-engine-state/perfection-rubric.json. Trend = score delta from previous cycle, shown as arrow. If a project has no completed cycles, show --.
Generate cross-project dashboard:
| Project | Score | Trend | Weakest | Strongest |
|---------|-------|-------|---------|-----------|
/perfection-engine export
Export all scores to docs/perfection-engine/scores-export.csv (columns: metric_id, category, name, score, target, type, auto_fixable, cycle) and .json.
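The CSV export could be produced from the rubric JSON with the standard `csv` module, roughly as follows. This sketch assumes the rubric shape shown in the rubric generation prompt (`categories` → `metrics`), and passing the cycle number in as an argument is an assumption.

```python
import csv
import io

COLUMNS = ["metric_id", "category", "name", "score", "target",
           "type", "auto_fixable", "cycle"]


def rubric_to_csv(rubric, cycle):
    """Flatten the rubric into CSV text with the documented columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for category_id, category in rubric["categories"].items():
        for metric in category["metrics"]:
            writer.writerow({
                "metric_id": metric["id"],
                "category": category_id,
                "name": metric["name"],
                "score": metric["score"],  # None is written as an empty cell
                "target": metric["target"],
                "type": metric["type"],
                "auto_fixable": metric["auto_fixable"],
                "cycle": cycle,
            })
    return buffer.getvalue()
```

The returned text could then be written to docs/perfection-engine/scores-export.csv.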
The ideas mode goes beyond quality — it discovers what the project COULD be, not just
whether what exists is good. It uses parallel agents to research, analyze, and propose
improvements, new features, architectural changes, and strategic opportunities.
/perfection-engine ideas
Runs a multi-agent review of the project and generates an improvement report.
Phase 1: UNDERSTAND — Read the entire project (same as DISCOVER)
Phase 2: RESEARCH — Launch parallel agents to explore opportunities across 6 dimensions:
| Agent | What It Explores |
|-------|-----------------|
| Feature gaps | What features do similar projects have that this one doesn't? What would users expect? What's the next logical feature? |
| Architecture | Could the codebase be restructured for better maintainability? Are there patterns that should be refactored? Is the tech stack optimal? |
| Performance | What would make this significantly faster? CDN, caching, database optimization, lazy loading, edge computing? |
| User experience | What flows are confusing? What would make users happier? What's the onboarding experience like? |
| Developer experience | What would make contributing easier? Better docs, tests, CI, tooling, error messages? |
| Strategic | What's the competitive landscape? What trends in this domain should the project adopt? What's the biggest risk if nothing changes? |
Each agent reads the codebase, uses web search for research when helpful, and returns structured findings.
Phase 3: SYNTHESIZE — Combine all agent findings into a prioritized report:
# Improvement Report — {project_name}
## High-Impact Ideas (do these first)
| # | Idea | Category | Effort | Impact | Description |
|---|------|----------|--------|--------|-------------|
| 1 | Add real-time collaboration | Feature | Large | High | Users expect multiplayer editing... |
| 2 | Split monolithic router | Architecture | Medium | High | 17K-line file is unmaintainable... |
## Medium-Impact Ideas
...
## Exploratory Ideas (research further)
...
## Anti-Patterns Found
| Pattern | Where | Why It's Risky | Suggested Fix |
|---------|-------|---------------|---------------|
Save to docs/perfection-engine/ideas-{date}.md.
Phase 4: ISSUE CREATION — For each idea, create a GitHub issue:
Title: "💡 {idea_name}"
Labels: perfection-engine, idea, {category}, {effort}
Body: Description, rationale, suggested approach, estimated effort, impact assessment
/perfection-engine ideas --implement
Same as ideas but after generating the report, automatically implements the top 3
highest-impact ideas using the fix loop (plan → implement → deploy → validate).
Only implements ideas tagged as effort: small or effort: medium by default.
Large-effort ideas always create issues instead of auto-implementing.
Safety: Ideas mode never deletes existing features or makes breaking changes. It only ADDS new capabilities or IMPROVES existing ones. If an idea requires removing code, it creates an issue instead of auto-implementing.
You can run both together:
/perfection-engine --ideas # Full loop + ideas discovery each cycle
This adds a 5th phase to the loop:
DISCOVER → SCORE → FIX → IDEAS → LOOP
Each cycle: fix quality issues AND discover new improvements. The ideas feed into the next cycle's rubric — new features that get implemented become new metrics to score for quality.
Ideas are tracked in the per-project state:
{project}/.claude/skills/perfection-engine-state/
├── ideas-history.json # All ideas generated across cycles
├── ideas-implemented.json # Ideas that were auto-implemented
└── ...existing state files
/perfection-engine score --changed-only
Incremental scoring via git diff — only metrics affected by recent changes.
/perfection-engine --goal "description"
Goal-driven mode: generates completeness metrics in addition to quality metrics. Measures both "does it exist?" and "is it good?"
Optional .perfection-engine.yml in project root:
# All fields optional — engine auto-discovers everything
goal: "Build a weight loss telehealth platform" # Adds completeness metrics
scoring:
max_metrics: 300 # Default 300, hard cap 500
batch_size: 20 # Metrics per scoring batch
fix:
max_attempts: 2 # Per metric per cycle
timeout_seconds: 60 # Per fix attempt
strategy: "impact" # impact (default) or quick-wins
ratcheting: true # Scores only go up
simplicity_gate: true # Reject bloat fixes
github:
max_issues_per_cycle: 50
team_assignments: # Optional
security: "@security-team"
default: "@tech-lead"
safety:
max_cycles: 20
cost_threshold: 50 # USD per cycle, pause for confirmation
deploy_branch: "dev" # Auto-detected if not specified
exclude: # Skip specific things
files: ["vendor/", "dist/", "node_modules/"]
metrics: ["PERF-003"] # Skip specific metric IDs
{project}/.claude/skills/perfection-engine-state/
├── project-profile.json # What the engine understood about the project
├── perfection-rubric.json # Living rubric (regenerated each cycle)
├── cycle-history.json # Score history across all cycles
├── current-cycle.json # Lock + progress
├── fix-history.json # What fixes were tried, what worked/failed
└── leaderboard-data.json # Structured data for leaderboard.md
If no git repo → state files still written, just not committed.
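The lock stored in current-cycle.json (in-progress status plus a 2-hour staleness window) might be evaluated like this. It is a sketch under two assumptions: `started_at` is an ISO 8601 timestamp with an explicit UTC offset, and the helper name `is_cycle_locked` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(hours=2)


def is_cycle_locked(state, now=None):
    """True if another cycle is in progress and its lock is not stale."""
    if state.get("status") != "in-progress":
        return False
    now = now or datetime.now(timezone.utc)
    # Assumption: started_at carries an offset, e.g. "2025-01-01T11:00:00+00:00".
    started = datetime.fromisoformat(state["started_at"])
    return now - started < STALE_AFTER
```

A lock older than the window is simply ignored, so a crashed cycle does not block new runs indefinitely.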
The engine uses a Stop hook to keep looping without human intervention. When Claude finishes responding, the hook checks if a cycle is in-progress:
The hook is registered in hooks/hooks.json and fires on both Stop and SubagentStop
events. It reads current-cycle.json to determine state.
How to stop the engine manually: Set current-cycle.json status to "paused" or
"completed", or run /perfection-engine reset. The hook will then allow Claude to stop.
No hook installed? The engine still works — it just won't auto-continue between cycles. You'd need to manually say "continue" after each cycle completes.
| Rail | Default |
|------|---------|
| Max fix attempts per metric per cycle | 2 |
| Max GitHub issues per cycle | 50 |
| Max total cycles | 20 |
| Fix timeout | 60s |
| Cost threshold | $50/cycle → pause for confirmation |
| Max metrics | 300 (hard cap 500) |
| Deployment target | Dev branch only |
| Simplicity gate | Reject >50 lines for <1 point |
| Ratcheting | Scores only go up |
| Consecutive errors | 5 → pause |
| Concurrency lock | 2-hour staleness window |
| Bootstrap guard | 0 files → exit; <5 files → simplified rubric |
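As one example of how a rail might be enforced, the simplicity gate ("reject >50 lines for <1 point") can be expressed as a predicate over the fields recorded in fix-history.json. Counting only added lines, rather than net change, is an assumption; the detailed rule lives in references/fix-process.md.

```python
def passes_simplicity_gate(lines_added, score_before, score_after,
                           max_lines=50, min_gain=1):
    """Reject fixes that add many lines for less than one point of gain."""
    gain = score_after - score_before
    return not (lines_added > max_lines and gain < min_gain)
```

Large fixes remain acceptable when they buy a real improvement; only the combination of bulk and negligible gain is rejected.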
| Skill | Usage | Required? |
|-------|-------|-----------|
| superpowers:writing-plans | Create fix plans | Optional (fallback: inline) |
| superpowers:executing-plans | Implement fixes | Optional (fallback: inline) |
| Playwright MCP | Browser testing | Only when URLs available |
perfection-engine/
├── .claude-plugin/
│ └── plugin.json # Plugin manifest
└── skills/
└── perfection-engine/
├── SKILL.md # Main skill documentation
└── references/
├── config-schema.md # .perfection-engine.yml schema
├── fix-process.md # Fix workflow & crash recovery
└── scoring-methods.md # Scoring tools & protocols
| File | Purpose |
|------|---------|
| references/scoring-methods.md | How each scoring method works (tools, not content) |
| references/fix-process.md | Fix workflow, memory, crash recovery, simplicity |
| references/config-schema.md | .perfection-engine.yml full schema |