experimental/evolve-skills/SKILL.md
EXPERIMENTAL. Mine recent Claude Code transcripts for friction events, cluster them by active skill, propose patches for skills with 3+ friction events, validate each patch via headless replay, scrub the report through /publish-check, and present an EVOLUTION_REPORT.md for human review on a branch (never auto-merge). Use when asked to "evolve my skills", "audit skills against recent friction", "propose skill improvements from transcripts", "run the skill evolution pipeline", or as part of a weekly skill-quality cadence.
npx skillsauth add antjanus/skillbox evolve-skills
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill lives in skillbox/experimental/ and follows the experimental conventions:
- […] main).
- […] EVOLUTION_REPORT.md); never auto-merges.
- […] /publish-check before any output is surfaced.
- […] AskUserQuestion.

See experimental/README.md for the full set of conventions.
A weekly meta-loop: scan recent Claude Code transcripts, cluster friction events by which skill was active when they occurred, propose patches for the skills with the most friction, validate each patch via headless Claude replay of historical sessions, scrub the report for personal data, and present everything for human review on a branch.
Core principle: Skills should compound in capability over time, but only with human-validated changes. The pipeline finds candidates and presents evidence; the human decides what merges.
Always use when:
Useful for:
Six phases, each with a gate. Failing a gate halts the pipeline.
Create a branch in the skillbox repo:
git checkout -b experimental/evolve-skills/<YYYY-MM-DD-runid>
If the working tree is dirty, halt — experimental skills should not be run on a dirty tree.
Gate: On a clean experimental branch.
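The Phase 1 gate could be sketched in Python roughly as follows. This is an illustration under assumptions, not the skill's actual implementation: the helper names (`make_run_id`, `tree_is_dirty`, `create_run_branch`) are hypothetical, and the four-hex-character run id suffix is only inferred from the example branch names later in this document. The `git` commands themselves are standard.

```python
import datetime
import secrets
import subprocess


def make_run_id(today=None):
    """Build a <YYYY-MM-DD-runid> string; the 4-hex suffix is an assumed format."""
    today = today or datetime.date.today()
    return f"{today.isoformat()}-{secrets.token_hex(2)}"


def tree_is_dirty(repo="."):
    """True if `git status --porcelain` reports anything (untracked files included)."""
    out = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    return bool(out.strip())


def create_run_branch(repo=".", run_id=None):
    """Halt on a dirty tree, otherwise create and switch to the run branch."""
    if tree_is_dirty(repo):
        raise RuntimeError("working tree is dirty; halting before Phase 2")
    branch = f"experimental/evolve-skills/{run_id or make_run_id()}"
    subprocess.run(["git", "checkout", "-b", branch], cwd=repo, check=True)
    return branch
```

Raising instead of stashing or cleaning keeps the halt explicit, which matches the "halt" wording above.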
Read JSONL transcripts under ~/.claude/projects/ from the last --days (default: 7).
For each transcript, extract events that match a friction pattern (see reference/friction-patterns.md):
- […] request_interrupted_by_user events.
- […] AskUserQuestion checkpoint and was interrupted.

For each friction event, capture: (session_id, timestamp, friction_type, active_skill_or_none, surrounding_context).
Gate: At least one friction event found, or pipeline exits cleanly with "no friction to act on."
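A minimal sketch of the Phase 2 scan, assuming each transcript line is a JSON object with an optional `timestamp` field. The transcript schema is an assumption, the function names are hypothetical, and the substring match is deliberately crude. Only the `request_interrupted_by_user` marker comes from the text above; the full pattern list lives in reference/friction-patterns.md and is not reproduced here.

```python
import json
import time
from pathlib import Path

# Only this marker is named in the text; the "interruption" label is borrowed
# from the distribution example in the report format.
FRICTION_MARKERS = {"request_interrupted_by_user": "interruption"}


def recent_transcripts(root="~/.claude/projects", days=7, now=None):
    """Yield *.jsonl files under root modified within the last `days` days."""
    cutoff = (now or time.time()) - days * 86400
    for path in Path(root).expanduser().rglob("*.jsonl"):
        if path.stat().st_mtime >= cutoff:
            yield path


def extract_friction(lines, session_id):
    """Return friction events found in one transcript's lines.

    Malformed or non-object lines are skipped rather than treated as errors.
    """
    events = []
    for raw in lines:
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        for marker, friction_type in FRICTION_MARKERS.items():
            if marker in raw:  # crude substring match on the raw line
                events.append({
                    "session_id": session_id,
                    "timestamp": record.get("timestamp"),
                    "friction_type": friction_type,
                    "active_skill_or_none": None,  # resolved during clustering
                    "surrounding_context": "<redacted>",
                })
    return events
```

An empty result across all transcripts corresponds to the clean "no friction to act on" exit.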
Group friction events by the skill that was active (frontmatter context, recent skill invocation in transcript, or "no skill" if none was active).
For each skill, compute:
- friction_count
- friction_types_distribution
- representative_examples (top 3 most distinctive events, after redacting verbatim text)

Skills with friction_count < --min-friction (default: 3) are excluded from patch generation.
Gate: At least one skill meets the threshold (friction_count ≥ --min-friction); otherwise exit with "no skill warrants patches this week."
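The clustering and threshold step can be sketched as below. The function name is hypothetical, the event dictionaries are assumed to carry the fields captured in Phase 2, and "most distinctive" is not defined in this document, so the sketch simply takes the first three events as a placeholder.

```python
from collections import Counter, defaultdict


def cluster_by_skill(events, min_friction=3):
    """Group events by active skill and keep skills at or above the threshold."""
    grouped = defaultdict(list)
    for event in events:
        skill = event.get("active_skill_or_none")
        if skill is None:
            continue  # events with no active skill are excluded from patching
        grouped[skill].append(event)

    summary = {}
    for skill, items in grouped.items():
        if len(items) < min_friction:
            continue  # below threshold: no patch generation for this skill
        summary[skill] = {
            "friction_count": len(items),
            "friction_types_distribution": dict(
                Counter(e["friction_type"] for e in items)
            ),
            # Placeholder selection; a real ranking of distinctiveness is assumed elsewhere.
            "representative_examples": items[:3],
        }
    return summary
```

An empty return value corresponds to the "no skill warrants patches this week" exit.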
For each qualifying skill, dispatch a parallel Agent (subagent_type: general-purpose) with:
- […] SKILL.md (full content).
- […] SKILL.md, that would have prevented or reduced these friction events. Include rationale and a unified diff."

Each agent returns: {patch_diff, rationale, expected_friction_reduction}.
Gate: All agents return — either with a proposal or a "no patch warranted" justification.
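One way to picture the fan-out and its gate is a thread pool around a stand-in callable. `propose_patch` here is hypothetical and replaces the real Agent dispatch, and the `no_patch_reason` key is an assumed name for the "no patch warranted" justification.

```python
from concurrent.futures import ThreadPoolExecutor


def dispatch_patch_agents(skills, propose_patch, max_workers=4):
    """Run one proposal per skill in parallel and enforce the Phase 4 gate.

    `propose_patch(skill_name, summary)` must return a dict containing either
    "patch_diff" or "no_patch_reason".
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(propose_patch, name, summary)
            for name, summary in skills.items()
        }
        # result() blocks until the agent finishes and re-raises any exception,
        # so a crashed agent fails the gate instead of being silently dropped.
        results = {name: fut.result() for name, fut in futures.items()}

    for name, result in results.items():
        if "patch_diff" not in result and "no_patch_reason" not in result:
            raise ValueError(f"agent for {name} returned neither patch nor justification")
    return results
```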
Skipped if --skip-replay was passed (the "lite" version of the pipeline).
For each proposed patch:
- […] claude --headless (or equivalent CLI invocation) with the patched skill loaded.

Replay validation is non-deterministic: expect score variance from run to run. The skill records the validation as one signal among several, not as a gate.
See reference/replay-protocol.md for the exact replay invocation, scoring rubric, and known caveats.
Gate: Each patch has a validation result attached (or marked "skipped: --skip-replay" / "skipped: replay infrastructure unavailable").
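The actual scoring rubric is defined in reference/replay-protocol.md and is not reproduced here. As an illustration only, the sketch below assumes each replayed session gets one of three outcomes, weighted 1.0 (friction avoided), 0.5 (inconclusive), and 0.0 (not avoided). Those weights are an assumption; they happen to be consistent with the means shown in the example output later in this document.

```python
# Assumed outcome weights; the real rubric lives in reference/replay-protocol.md.
OUTCOME_SCORES = {"avoided": 1.0, "inconclusive": 0.5, "not_avoided": 0.0}


def summarize_replays(outcomes, skip_reason=None):
    """Return the validation record attached to one patch.

    `outcomes` is a list of outcome labels, one per replayed session.
    """
    if skip_reason:
        return {"status": f"skipped: {skip_reason}"}
    if not outcomes:
        return {"status": "skipped: replay infrastructure unavailable"}
    scores = [OUTCOME_SCORES[o] for o in outcomes]
    return {
        "status": "validated",
        "sessions": len(scores),
        "mean": round(sum(scores) / len(scores), 2),
    }
```

Because replay is non-deterministic, the mean is recorded as a signal for the reviewer rather than compared against a pass/fail cutoff.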
Assemble EVOLUTION_REPORT.md (see Output Format below).
Run /publish-check on the report. Friction context can include verbatim user messages, file paths, and skill content — all sources of leakable PII. The privacy scrub is non-optional.
If /publish-check returns BLOCK, halt and surface the BLOCKERS. The user fixes the report (or the underlying patches) and the pipeline resumes.
If /publish-check returns WARN, surface findings via AskUserQuestion for ack-and-continue or revise.
If /publish-check returns PASS, write the report to the branch and surface it for review.
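The three verdict branches above amount to a small dispatch. In this sketch `handle_publish_check` and the action names are hypothetical, and `ask_user` stands in for AskUserQuestion; it is assumed to return "continue" or "revise".

```python
def handle_publish_check(verdict, findings, ask_user):
    """Map a /publish-check verdict to the next pipeline action."""
    if verdict == "BLOCK":
        return {"action": "halt", "blockers": findings}
    if verdict == "WARN":
        choice = ask_user(findings)  # ack-and-continue or revise
        return {"action": "write_report" if choice == "continue" else "revise"}
    if verdict == "PASS":
        return {"action": "write_report"}
    # Fail closed: an unknown verdict must not let an unscrubbed report through.
    raise ValueError(f"unexpected verdict: {verdict!r}")
```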
Commit the report to the branch (not main):
git add EVOLUTION_REPORT.md
git commit -m "evolve-skills: report for <YYYY-MM-DD-runid>"
DO NOT commit the proposed patches. They live in runs/<runid>/proposed-patches/<skill>.diff and the user applies them manually after review (or via a follow-up /apply-patch flow that does not exist yet).
Gate: Report committed to branch, ready for review.
Single file: EVOLUTION_REPORT.md, committed to the experimental branch.
# Skill Evolution Report — <YYYY-MM-DD-runid>
## Run Configuration
- Days scanned: 7
- Transcripts processed: N
- Friction events found: M
- Skills exceeding threshold: K (threshold: 3)
- Replay validation: enabled / skipped
## Skill: <name>
### Friction summary
- Events: N
- Distribution: {interruption: 4, wrong-approach: 2, correction: 1}
### Representative examples
1. <session-id>, <timestamp>: <redacted summary>
2. ...
### Proposed patch
- Rationale: <agent's rationale>
- Expected friction reduction: <agent's claim>
```diff
<unified diff against SKILL.md>
```
...

## Usage

**Default (last 7 days, replay enabled):**
```bash
/evolve-skills
```

**Wider window:**
```bash
/evolve-skills --days=14
```

**Lite mode (skip replay validation):**
```bash
/evolve-skills --skip-replay
```

**Higher friction threshold (only well-established patterns):**
```bash
/evolve-skills --min-friction=5
```
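If a helper script were to parse these flags, the defaults described above could be encoded as in the sketch below. This is an assumption for illustration: the skill is invoked as a slash command, and nothing in this document says the flags are handled by `argparse` or by any script at all.

```python
import argparse


def parse_args(argv):
    """Parse the three documented flags with their documented defaults."""
    parser = argparse.ArgumentParser(prog="evolve-skills")
    parser.add_argument("--days", type=int, default=7,
                        help="transcript window in days")
    parser.add_argument("--min-friction", type=int, default=3,
                        help="minimum friction events before a skill gets a patch proposal")
    parser.add_argument("--skip-replay", action="store_true",
                        help="lite mode: skip replay validation")
    return parser.parse_args(argv)
```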
Input: /evolve-skills (default 7-day window)
Pipeline output:
Phase 1: Branch experimental/evolve-skills/2026-05-07-a3b9 created ✅
Phase 2: 47 transcripts scanned, 34 friction events found ✅
Phase 3: Clustered by skill:
- code-review: 8 events (above threshold)
- track-session: 4 events (above threshold)
- publish-check: 1 event (below threshold, excluded)
- 22 events with no active skill (excluded)
Phase 4: Dispatching 2 patch-proposal agents in parallel
- Agent A (code-review): patch proposed (12-line diff to "How It Works")
- Agent B (track-session): patch proposed (4-line diff to triggers)
Phase 5: Replay validation (3 sessions per patch)
- code-review patch: 2/3 friction avoided (mean 0.67)
- track-session patch: 1/3 friction avoided, 1/3 inconclusive (mean 0.50)
Phase 6: /publish-check on EVOLUTION_REPORT.md → PASS ✅
EVOLUTION_REPORT.md committed to branch.
Open: ~/projects/antjanus/skillbox/EVOLUTION_REPORT.md
The user reviews the report, approves the code-review patch, defers track-session for next week.
Input: /evolve-skills --min-friction=5
Pipeline output:
Phase 1: Branch experimental/evolve-skills/2026-05-07-c8f2 created ✅
Phase 2: 41 transcripts scanned, 19 friction events found ✅
Phase 3: Clustered by skill — max friction count: 3 (below threshold of 5)
Pipeline exits cleanly: "no skill warrants patches this week."
No EVOLUTION_REPORT.md generated. Branch experimental/evolve-skills/2026-05-07-c8f2 left empty (no commits).
The user can either lower --min-friction for a wider audit or accept that no patches are needed.
Input: /evolve-skills
Pipeline output:
Phase 1-5: completed ✅
Phase 6: /publish-check on EVOLUTION_REPORT.md → BLOCK
- skills/publish-check/representative-examples:1: "/Users/<USER>/projects/<repo>/..."
- 1 BLOCKER finding
Pipeline halted. Fix the report (or the underlying friction-extraction patterns
that included the unredacted path) and re-run Phase 6.
The user investigates, finds the friction-pattern extractor wasn't redacting absolute paths in representative_examples, fixes the extractor, re-runs.
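A fix of the kind described here might look like the sketch below, which replaces absolute home-directory paths with a placeholder before text reaches representative_examples. The function name, the `<PATH>` placeholder, and the regular expression are all assumptions; the pattern only covers macOS and Linux style home paths and is not a substitute for /publish-check.

```python
import re

# Matches absolute paths under /Users/ or /home/ up to the next whitespace or quote.
_ABS_HOME_PATH = re.compile(r"/(?:Users|home)/[^\s\"']+")


def redact_abs_paths(text):
    """Replace absolute home-directory paths with a <PATH> placeholder."""
    return _ABS_HOME_PATH.sub("<PATH>", text)
```

For example, `redact_abs_paths("opened /Users/alice/projects/repo/file.py today")` yields `"opened <PATH> today"`, while relative paths are left untouched.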
- […] runs/<runid>/proposed-patches/. Application is a separate human-driven step.
- […] /publish-check.
- […] --skip-replay for cheaper iterations.

This skill graduates from experimental/ to skills/ when:
Composes with:
- /publish-check: required for Phase 6 (privacy scrub).
- /rate-skill: orthogonal; rate the skill before applying any patch to track quality changes.

Does NOT compose with: