skills/skill-evolution/SKILL.md
Self-evolving skill system. Skills are scored after execution (0-100) on 5 dimensions. Score 90+ over 5 runs = crystallized (locked). Score below 30 = auto-repair attempted. Skills improve themselves through usage feedback.
Darwinian selection for skills. Skills that produce good outcomes are crystallized and protected. Skills that produce poor outcomes are repaired or archived. Every execution generates a score that drives the next generation of the skill.
Each skill execution is scored 0-100 on five dimensions:
| Dimension | Weight | What It Measures |
|-----------|--------|-----------------|
| Accuracy | 25% | Did the skill produce the correct result for the task? |
| Relevance | 20% | Was the skill content applicable to the actual use case? |
| Token Efficiency | 20% | Did the skill guide the agent without bloat or repetition? |
| User Satisfaction | 20% | Did the outcome meet or exceed user expectations? |
| Reusability | 15% | Could another agent use this skill in a similar situation? |
Composite score = weighted average of all five dimensions (0-100).
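As a worked example of the weighted average, the sketch below computes a composite from five dimension scores using the weights in the table. The names `WEIGHTS` and `composite_score` are illustrative and not part of the skill; the dictionary keys follow the `scores` object used in ~/.claude/skill-scores.jsonl.

```python
# Weights from the scoring table (they sum to 1.0).
WEIGHTS = {
    "accuracy": 0.25,
    "relevance": 0.20,
    "token_efficiency": 0.20,
    "user_satisfaction": 0.20,
    "reusability": 0.15,
}


def composite_score(scores: dict) -> float:
    """Weighted average of the five dimension scores, each in 0-100."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


example = {
    "accuracy": 95,
    "relevance": 90,
    "token_efficiency": 82,
    "user_satisfaction": 95,
    "reusability": 88,
}
print(round(composite_score(example), 2))  # 90.35
```

By hand: 23.75 + 18 + 16.4 + 19 + 13.2 = 90.35, which rounds to the 90.4 recorded for that run in the score log.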
90-100: Excellent -- candidate for crystallization
70-89: Good -- active skill, no action needed
50-69: Adequate -- flag for review after 3 more runs
30-49: Poor -- schedule auto-repair attempt
0-29: Critical -- immediate auto-repair or archive
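The bands above can be expressed as a simple threshold lookup. This is a minimal sketch: `score_band` is a hypothetical helper, and it assumes fractional composites fall into the band whose lower bound they have reached (so 89.9 is still "Good").

```python
def score_band(composite: float) -> str:
    """Map a composite score (0-100) to the band names listed above."""
    if not 0 <= composite <= 100:
        raise ValueError("composite must be between 0 and 100")
    if composite >= 90:
        return "Excellent"  # candidate for crystallization
    if composite >= 70:
        return "Good"       # active skill, no action needed
    if composite >= 50:
        return "Adequate"   # flag for review after 3 more runs
    if composite >= 30:
        return "Poor"       # schedule auto-repair attempt
    return "Critical"       # immediate auto-repair or archive
```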
    DRAFT             ACTIVE              CRYSTALLIZED        ARCHIVED
      |                 |                      |                 |
    New skill         In regular use      Proven stable       Deprecated/replaced
      |                 |                      |                 |
      +-- first run --> +-- score >90 -------->+                 |
                        |   for 5+ runs                          |
                        +-- score <30 --> auto-repair            |
                            (3 attempts)  fails 3x ------------->+
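The transitions in the diagram can be read as a small state machine. The sketch below is one possible encoding, not the skill's actual implementation: the `State` enum, the event names (`first_run`, `sustained_high`, `repair_failed`), and `next_state` are all hypothetical names introduced for illustration.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    ACTIVE = "active"
    CRYSTALLIZED = "crystallized"
    ARCHIVED = "archived"


MAX_REPAIR_ATTEMPTS = 3  # from the text: archive after 3 failed auto-repairs


def next_state(state: State, event: str, failed_repairs: int = 0) -> State:
    """Return the lifecycle state after `event` (event names are assumptions)."""
    if state is State.DRAFT and event == "first_run":
        return State.ACTIVE
    if state is State.ACTIVE and event == "sustained_high":
        # Composite of 90+ sustained over 5 or more runs.
        return State.CRYSTALLIZED
    if (state is State.ACTIVE and event == "repair_failed"
            and failed_repairs >= MAX_REPAIR_ATTEMPTS):
        return State.ARCHIVED
    return state  # any other combination leaves the state unchanged
```

Keeping the repair counter as an argument rather than hidden state makes each transition a pure function of its inputs, which is easy to replay from the score log.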
New skills enter as Draft. They receive no special protection and are evaluated critically on first use. A Draft skill that scores below 30 on its very first run is discarded rather than repaired.
Skills in regular use. Scores are tracked in ~/.claude/skill-scores.jsonl. No action unless scores trend below 30 or above 90 over a rolling window of 5 runs.
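A minimal sketch of the rolling-window check, assuming "trend" means the mean of the last 5 composite scores. `rolling_trend` and its return values are hypothetical. The document writes both "90+" and "above 90"; this sketch treats 90 as inclusive.

```python
from statistics import mean


def rolling_trend(composites, window=5, high=90.0, low=30.0):
    """Return 'crystallize', 'repair', or None for the most recent window."""
    if len(composites) < window:
        return None  # not enough runs yet to judge a trend
    avg = mean(composites[-window:])
    if avg >= high:
        return "crystallize"
    if avg < low:
        return "repair"
    return None  # scores in between require no action
```

For example, `rolling_trend([91, 92, 93, 90, 95])` returns `'crystallize'`, while a single outlier such as `[95, 95, 95, 95, 60]` pulls the mean to 88 and returns `None`.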
A skill that maintains an average composite score above 90 over 5 or more consecutive runs is crystallized:
- Tagged in git as skill/<name>/crystallized-v<N>
- Marked locked: true in the skill's frontmatter

A skill that fails auto-repair 3 times is archived:

- Moved to skills/_archived/<name>/
- Tagged in git as skill/<name>/archived
- Replaced via the catalyst agent if the capability is still needed

Append one record per execution to ~/.claude/skill-scores.jsonl:
{"skill":"experiment-loop","ts":"2026-04-07T10:00:00Z","session":"abc123","scores":{"accuracy":88,"relevance":92,"token_efficiency":75,"user_satisfaction":90,"reusability":85},"composite":86.2,"feedback":"Loop ran 4 iterations successfully, target nearly met"}
{"skill":"experiment-loop","ts":"2026-04-07T14:30:00Z","session":"def456","scores":{"accuracy":95,"relevance":90,"token_efficiency":82,"user_satisfaction":95,"reusability":88},"composite":90.4,"feedback":"Bundle size reduced 28%, target exceeded"}
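Appending a record in this shape might look like the following. This is a sketch under stated assumptions: `append_score_record` is a hypothetical helper, the composite is passed in already computed, and the timestamp format is copied from the examples above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_score_record(log_path, skill, session, scores, composite, feedback=""):
    """Append one JSON object per line (JSONL) and return the record written."""
    record = {
        "skill": skill,
        "ts": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "session": session,
        "scores": scores,
        "composite": composite,
        "feedback": feedback,
    }
    path = Path(log_path).expanduser()  # resolves a leading ~
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier runs intact; one compact object per line.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, separators=(",", ":")) + "\n")
    return record
```

Calling it with `"~/.claude/skill-scores.jsonl"` as `log_path` would add one line to the log described above; existing lines are never rewritten.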
# Average scores for a skill (last 10 runs); pass the skill name as $1
cat ~/.claude/skill-scores.jsonl | python3 -c "
import sys, json, statistics
skill = '$1'
# Parse each non-empty line once, then keep the last 10 records for this skill
records = [json.loads(l) for l in sys.stdin if l.strip()]
runs = [r for r in records if r.get('skill') == skill][-10:]
if runs:
    avg = statistics.mean(r['composite'] for r in runs)
    print(f'{skill}: {avg:.1f} avg over {len(runs)} runs')
"
When a skill reaches 90+ composite score over 5+ consecutive runs:
- Review ~/.claude/skill-scores.jsonl -- confirm no outliers inflating the average
- Add locked: true to the skill's frontmatter
- Create and push the git tag:

      git tag skill/<name>/crystallized-v1 -m "Crystallized: avg score 92.3 over 7 runs"
      git push origin skill/<name>/crystallized-v1

- Record the event in thoughts/SKILL-EVOLUTION.md

When a skill's composite score drops below 30:
- Read the skill's recent records in ~/.claude/skill-scores.jsonl to find the lowest-scoring dimensions

The catalyst agent rewrites the failing section(s) of the skill.

After repair, the skill is re-scored on a synthetic test case by the verifier agent.

After 3 failed auto-repairs:

- Record the archival in thoughts/SKILL-EVOLUTION.md
- Ask catalyst to draft a replacement from scratch

Append events to thoughts/SKILL-EVOLUTION.md:
## 2026-04-07
### skill: experiment-loop
- Status change: Active -> Crystallized
- Trigger: avg composite 91.2 over 6 consecutive runs
- Git tag: skill/experiment-loop/crystallized-v1
- Notable strength: Token Efficiency dimension consistently 85+
### skill: legacy-deploy-helper
- Status change: Active -> Auto-Repair (attempt 1/3)
- Trigger: composite 24 on last run
- Lowest dimension: Relevance (12) -- skill referenced outdated Heroku patterns
- Repair: catalyst rewrote "Deployment Targets" section with Vercel/Railway focus
- Post-repair synthetic score: 71 -- promoted back to Active
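The repair flow behind entries like the one above (rewrite, re-score on a synthetic case, archive after 3 failures) can be sketched as a bounded retry loop. Everything here is illustrative: `auto_repair`, the `repair` and `rescore` callables standing in for the catalyst and verifier agents, and especially `pass_threshold`. The document does not state the score needed for promotion back to Active, so 70 (the lower bound of the "Good" band) is only a placeholder.

```python
from typing import Callable, Tuple

MAX_ATTEMPTS = 3  # from the text: archive after 3 failed auto-repairs


def auto_repair(skill: str,
                repair: Callable[[str, int], None],
                rescore: Callable[[str], float],
                pass_threshold: float = 70.0) -> Tuple[str, int]:
    """Try up to MAX_ATTEMPTS repairs; return (final_status, attempts_used)."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        repair(skill, attempt)  # stand-in for the catalyst rewrite
        # Stand-in for the verifier's synthetic re-score.
        if rescore(skill) >= pass_threshold:
            return "active", attempt
    return "archived", MAX_ATTEMPTS
```

With a re-score of 71 on the first attempt, as in the legacy-deploy-helper entry, this returns `("active", 1)` under the assumed threshold.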
Skill evolution data feeds into canavar's cross-training pipeline:
- Crystallized skills are recorded in skill-matrix.json with trust: locked
- Archived skills are marked trust: deprecated -- agents stop referencing them
- Entries are written to error-ledger.jsonl with source: skill-evolution

# View crystallized skills
node ~/.claude/hooks/dist/canavar-cli.mjs leaderboard --filter crystallized
# View skills needing repair
cat ~/.claude/skill-scores.jsonl | python3 -c "
import sys, json
# Skip blank lines so a trailing newline does not break parsing
runs = [json.loads(l) for l in sys.stdin if l.strip()]
low = {r['skill'] for r in runs if r['composite'] < 30}
print('Skills needing repair:', low)
"
This skill activates automatically when:
- The verifier agent reports a skill-guided task as failed

Agents involved: catalyst (repair), verifier (validation), self-learner (feedback extraction), canavar (cross-training propagation).