Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

rubicanjr/skill-evolution

Name: skill-evolution
Author: rubicanjr

skills/skill-evolution/SKILL.md

npx skillsauth add rubicanjr/FinCognis skill-evolution

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Skill Evolution

Darwinian selection for skills. Skills that produce good outcomes are crystallized and protected. Skills that produce poor outcomes are repaired or archived. Every execution generates a score that drives the next generation of the skill.

The 5 Scoring Dimensions

Each skill execution is scored 0-100 on five dimensions:

| Dimension | Weight | What It Measures | |-----------|--------|-----------------| | Accuracy | 25% | Did the skill produce the correct result for the task? | | Relevance | 20% | Was the skill content applicable to the actual use case? | | Token Efficiency | 20% | Did the skill guide the agent without bloat or repetition? | | User Satisfaction | 20% | Did the outcome meet or exceed user expectations? | | Reusability | 15% | Could another agent use this skill in a similar situation? |

Composite score = weighted average of all five dimensions (0-100).

Scoring Rubric

90-100: Excellent -- candidate for crystallization
70-89:  Good -- active skill, no action needed
50-69:  Adequate -- flag for review after 3 more runs
30-49:  Poor -- schedule auto-repair attempt
0-29:   Critical -- immediate auto-repair or archive

Skill Lifecycle

DRAFT          ACTIVE         CRYSTALLIZED      ARCHIVED
  |               |                |                |
New skill   In regular use   Proven stable    Deprecated/replaced
  |               |                |                |
  +-- first run ->+-- score >90   ++-- score <30    |
                  |   for 5+ runs  |   (3 attempts)  |
                  +-- score <30 -->+ auto-repair      |
                  |   auto-repair  |   fails 3x -->--+
                  +-- score >90 -->+

Draft

New skills enter as Draft. They receive no special protection and are evaluated critically on first use. A Draft skill that scores below 30 on its very first run is discarded rather than repaired.

Active

Skills in regular use. Scores are tracked in ~/.claude/skill-scores.jsonl. No action unless scores trend below 30 or above 90 over a rolling window of 5 runs.

Crystallized

A skill that maintains an average composite score above 90 over 5 or more consecutive runs is crystallized:

Git tag applied: skill/<name>/crystallized-v<N>
Read-only flag added to frontmatter: locked: true
Skill is excluded from auto-repair
Changes require explicit human unlock + PR

Archived

A skill that fails auto-repair 3 times is archived:

Moved to skills/_archived/<name>/
Git tag applied: skill/<name>/archived
Replacement skill drafted by catalyst agent if the capability is still needed

Score Storage Format

Append one record per execution to ~/.claude/skill-scores.jsonl:

{"skill":"experiment-loop","ts":"2026-04-07T10:00:00Z","session":"abc123","scores":{"accuracy":88,"relevance":92,"token_efficiency":75,"user_satisfaction":90,"reusability":85},"composite":86.5,"feedback":"Loop ran 4 iterations successfully, target nearly met"}
{"skill":"experiment-loop","ts":"2026-04-07T14:30:00Z","session":"def456","scores":{"accuracy":95,"relevance":90,"token_efficiency":82,"user_satisfaction":95,"reusability":88},"composite":90.4,"feedback":"Bundle size reduced 28%, target exceeded"}

Score CLI (quick check)

# Average scores for a skill (last 10 runs)
cat ~/.claude/skill-scores.jsonl | python3 -c "
import sys, json, statistics
skill = '$1'
runs = [json.loads(l) for l in sys.stdin if json.loads(l).get('skill') == skill][-10:]
if runs:
    avg = statistics.mean(r['composite'] for r in runs)
    print(f'{skill}: {avg:.1f} avg over {len(runs)} runs')
"

Crystallization Protocol

When a skill reaches 90+ composite score over 5+ consecutive runs:

Verify scores in ~/.claude/skill-scores.jsonl -- confirm no outliers inflating the average
Add locked: true to the skill's frontmatter

Apply git tag:

git tag skill/<name>/crystallized-v1 -m "Crystallized: avg score 92.3 over 7 runs"
git push origin skill/<name>/crystallized-v1

Log the crystallization in thoughts/SKILL-EVOLUTION.md
Notify via canavar cross-training so all agents know this skill is stable

Auto-Repair Protocol

When a skill's composite score drops below 30:

Diagnosis

Identify the lowest-scoring dimension (the primary failure mode)
Read the last 3 session feedback notes from ~/.claude/skill-scores.jsonl
Summarize what went wrong (specific, not vague)

Repair

The catalyst agent rewrites the failing section(s) of the skill:

Only the sections relevant to the low-scoring dimension
Preserve all high-scoring sections unchanged
Add a concrete example for the repaired section

Validation

After repair, the skill is re-scored on a synthetic test case by the verifier agent:

Synthetic score must be 50+ to proceed to Active state
If synthetic score < 50, attempt 2 of 3 repairs begins

Escalation

After 3 failed auto-repairs:

Archive the skill
Alert via thoughts/SKILL-EVOLUTION.md
Spawn catalyst to draft a replacement from scratch

Evolution Log Format

Append events to thoughts/SKILL-EVOLUTION.md:

## 2026-04-07

### skill: experiment-loop
- Status change: Active -> Crystallized
- Trigger: avg composite 91.2 over 6 consecutive runs
- Git tag: skill/experiment-loop/crystallized-v1
- Notable strength: Token Efficiency dimension consistently 85+

### skill: legacy-deploy-helper
- Status change: Active -> Auto-Repair (attempt 1/3)
- Trigger: composite 24 on last run
- Lowest dimension: Relevance (12) -- skill referenced outdated Heroku patterns
- Repair: catalyst rewrote "Deployment Targets" section with Vercel/Railway focus
- Post-repair synthetic score: 71 -- promoted back to Active

Integration with Canavar Cross-Training

Skill evolution data feeds into canavar's cross-training pipeline:

A crystallized skill is injected into canavar's skill-matrix.json with trust: locked
An archived skill is marked trust: deprecated -- agents stop referencing it
Auto-repair failures are logged to error-ledger.jsonl with source: skill-evolution
The canavar leaderboard tracks which agents most frequently produce high-scoring skill executions

# View crystallized skills
node ~/.claude/hooks/dist/canavar-cli.mjs leaderboard --filter crystallized

# View skills needing repair
cat ~/.claude/skill-scores.jsonl | python3 -c "
import sys, json, collections
runs = [json.loads(l) for l in sys.stdin]
low = {r['skill'] for r in runs if r['composite'] < 30}
print('Skills needing repair:', low)
"

Activation

This skill activates automatically when:

A skill completes an execution (PostToolUse hook)
A skill is referenced in a session that ends with user dissatisfaction
The verifier agent reports a skill-guided task as failed

Agents involved: catalyst (repair), verifier (validation), self-learner (feedback extraction), canavar (cross-training propagation).

rubicanjr/skill-evolution

skills/skill-evolution/SKILL.md

Self-evolving skill system. Skills are scored after execution (0-100) on 5 dimensions. Score 90+ over 5 runs = crystallized (locked). Score below 30 = auto-repair attempted. Skills improve themselves through usage feedback.

data-ai

Updated Apr 24, 2026

$ install --global

skillsauth

npx skillsauth add rubicanjr/FinCognis skill-evolution

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 24, 2026, 6:53 AM239.0s1 file scanned

SKILL.md

name:: skill-evolution
description:: Self-evolving skill system. Skills are scored after execution (0-100) on 5 dimensions. Score 90+ over 5 runs = crystallized (locked). Score below 30 = auto-repair attempted. Skills improve themselves through usage feedback.

Skill Evolution

The 5 Scoring Dimensions

Each skill execution is scored 0-100 on five dimensions:

Composite score = weighted average of all five dimensions (0-100).

Scoring Rubric

90-100: Excellent -- candidate for crystallization
70-89:  Good -- active skill, no action needed
50-69:  Adequate -- flag for review after 3 more runs
30-49:  Poor -- schedule auto-repair attempt
0-29:   Critical -- immediate auto-repair or archive

Skill Lifecycle

DRAFT          ACTIVE         CRYSTALLIZED      ARCHIVED
  |               |                |                |
New skill   In regular use   Proven stable    Deprecated/replaced
  |               |                |                |
  +-- first run ->+-- score >90   ++-- score <30    |
                  |   for 5+ runs  |   (3 attempts)  |
                  +-- score <30 -->+ auto-repair      |
                  |   auto-repair  |   fails 3x -->--+
                  +-- score >90 -->+

Draft

New skills enter as Draft. They receive no special protection and are evaluated critically on first use. A Draft skill that scores below 30 on its very first run is discarded rather than repaired.

Active

Skills in regular use. Scores are tracked in ~/.claude/skill-scores.jsonl. No action unless scores trend below 30 or above 90 over a rolling window of 5 runs.

Crystallized

A skill that maintains an average composite score above 90 over 5 or more consecutive runs is crystallized:

Git tag applied: skill/<name>/crystallized-v<N>
Read-only flag added to frontmatter: locked: true
Skill is excluded from auto-repair
Changes require explicit human unlock + PR

Archived

A skill that fails auto-repair 3 times is archived:

Moved to skills/_archived/<name>/
Git tag applied: skill/<name>/archived
Replacement skill drafted by catalyst agent if the capability is still needed

Score Storage Format

Append one record per execution to ~/.claude/skill-scores.jsonl:

{"skill":"experiment-loop","ts":"2026-04-07T10:00:00Z","session":"abc123","scores":{"accuracy":88,"relevance":92,"token_efficiency":75,"user_satisfaction":90,"reusability":85},"composite":86.5,"feedback":"Loop ran 4 iterations successfully, target nearly met"}
{"skill":"experiment-loop","ts":"2026-04-07T14:30:00Z","session":"def456","scores":{"accuracy":95,"relevance":90,"token_efficiency":82,"user_satisfaction":95,"reusability":88},"composite":90.4,"feedback":"Bundle size reduced 28%, target exceeded"}

Score CLI (quick check)

# Average scores for a skill (last 10 runs)
cat ~/.claude/skill-scores.jsonl | python3 -c "
import sys, json, statistics
skill = '$1'
runs = [json.loads(l) for l in sys.stdin if json.loads(l).get('skill') == skill][-10:]
if runs:
    avg = statistics.mean(r['composite'] for r in runs)
    print(f'{skill}: {avg:.1f} avg over {len(runs)} runs')
"

Crystallization Protocol

When a skill reaches 90+ composite score over 5+ consecutive runs:

Verify scores in ~/.claude/skill-scores.jsonl -- confirm no outliers inflating the average
Add locked: true to the skill's frontmatter

Apply git tag:

git tag skill/<name>/crystallized-v1 -m "Crystallized: avg score 92.3 over 7 runs"
git push origin skill/<name>/crystallized-v1

Log the crystallization in thoughts/SKILL-EVOLUTION.md
Notify via canavar cross-training so all agents know this skill is stable

Auto-Repair Protocol

When a skill's composite score drops below 30:

Diagnosis

Identify the lowest-scoring dimension (the primary failure mode)
Read the last 3 session feedback notes from ~/.claude/skill-scores.jsonl
Summarize what went wrong (specific, not vague)

Repair

The catalyst agent rewrites the failing section(s) of the skill:

Only the sections relevant to the low-scoring dimension
Preserve all high-scoring sections unchanged
Add a concrete example for the repaired section

Validation

After repair, the skill is re-scored on a synthetic test case by the verifier agent:

Synthetic score must be 50+ to proceed to Active state
If synthetic score < 50, attempt 2 of 3 repairs begins

Escalation

After 3 failed auto-repairs:

Archive the skill
Alert via thoughts/SKILL-EVOLUTION.md
Spawn catalyst to draft a replacement from scratch

Evolution Log Format

Append events to thoughts/SKILL-EVOLUTION.md:

## 2026-04-07

### skill: experiment-loop
- Status change: Active -> Crystallized
- Trigger: avg composite 91.2 over 6 consecutive runs
- Git tag: skill/experiment-loop/crystallized-v1
- Notable strength: Token Efficiency dimension consistently 85+

### skill: legacy-deploy-helper
- Status change: Active -> Auto-Repair (attempt 1/3)
- Trigger: composite 24 on last run
- Lowest dimension: Relevance (12) -- skill referenced outdated Heroku patterns
- Repair: catalyst rewrote "Deployment Targets" section with Vercel/Railway focus
- Post-repair synthetic score: 71 -- promoted back to Active

Integration with Canavar Cross-Training

Skill evolution data feeds into canavar's cross-training pipeline:

A crystallized skill is injected into canavar's skill-matrix.json with trust: locked
An archived skill is marked trust: deprecated -- agents stop referencing it
Auto-repair failures are logged to error-ledger.jsonl with source: skill-evolution
The canavar leaderboard tracks which agents most frequently produce high-scoring skill executions

# View crystallized skills
node ~/.claude/hooks/dist/canavar-cli.mjs leaderboard --filter crystallized

# View skills needing repair
cat ~/.claude/skill-scores.jsonl | python3 -c "
import sys, json, collections
runs = [json.loads(l) for l in sys.stdin]
low = {r['skill'] for r in runs if r['composite'] < 30}
print('Skills needing repair:', low)
"

Activation

This skill activates automatically when:

A skill completes an execution (PostToolUse hook)
A skill is referenced in a session that ends with user dissatisfaction
The verifier agent reports a skill-guided task as failed

Agents involved: catalyst (repair), verifier (validation), self-learner (feedback extraction), canavar (cross-training propagation).

Related Skills

rubicanjr/workflow-router

development

VerifiedTrustedCommunity

Goal-based workflow orchestration - routes tasks to specialist agents based on user goals

SKILL.mdUpdated Apr 24, 2026

rubicanjr/workflow-router

rubicanjr/wiring

tools

VerifiedTrustedCommunity

Wiring Verification

SKILL.mdUpdated Apr 24, 2026

rubicanjr/websocket-patterns

development

VerifiedTrustedCommunity

Connection management, room patterns, reconnection strategies, message buffering, and binary protocol design.

SKILL.mdUpdated Apr 24, 2026

rubicanjr/websocket-patterns

rubicanjr/visual-verdict

development

VerifiedTrustedCommunity

Screenshot comparison QA for frontend development. Takes a screenshot of the current implementation, scores it across multiple visual dimensions, and returns a structured PASS/REVISE/FAIL verdict with concrete fixes. Use when implementing UI from a design reference or verifying visual correctness.

SKILL.mdUpdated Apr 24, 2026

rubicanjr/visual-verdict

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/rubicanjr/FinCognis.git

# Copy into Claude Code skills folder (global)
cp -r FinCognis/skills/skill-evolution ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

rubicanjr/FinCognis

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT