skills/skill-improver/SKILL.md
Runs on a schedule to mine recent Codex and Claude Code conversations across configured projects, find moments where things went off plan (the user had to steer, correct, abort, or re-explain), and propose targeted improvements to the specific skills that were in use at the time. Opens one pull request per run against the skills repo, with each proposed edit annotated with the concrete steering moment that motivated it. Also analyzes its own runs (the `skills` repo is one of the configured projects) so it iteratively improves itself. Use this skill when the user asks to "analyze recent conversations", "find what went wrong", "improve skills based on past runs", or sets up a scheduled run of skill-improver. Make sure to use this skill whenever the user mentions recursive skill improvement, post-mortem analysis of agent conversations, or automating skill quality based on real usage.
A scheduled audit loop. Each run pulls newly finished agent conversations across configured projects, identifies steering moments, attributes each one to a specific skill (or the orchestration around it), and ships a single PR with all proposed skill edits — with the why spelled out in both the PR and the conversation that triggered the run.
The goal is continuous, evidence-based skill improvement. Every time something goes wrong, the question is: what went wrong, and who should have caught it? If a skill could have prevented it, the skill changes.
- git and gh (GitHub CLI, authenticated): for branch and PR creation
- python3: runs the conversation puller and the brief discovery script
- Read access to ~/.codex/sessions/, ~/.codex/archived_sessions/, and ~/.claude/projects/
- A .briefs/ directory inside each configured project root (and its worktrees): this is where the brief skill writes its HTML artifacts

If any are missing, stop and ask.
config.json (next to this file) lists projects to analyze and the path patterns where their conversations live (main repo, codex worktrees, claude worktrees, cursor worktrees, gwt-worktrees). To add a project, append a new entry:
{
  "name": "my-new-project",
  "roots": [
    "/Users/nazar/LocalProjects/my-new-project",
    "/Users/nazar/.codex/worktrees/*/my-new-project",
    "/Users/nazar/.cursor/worktrees/my-new-project/*",
    "/Users/nazar/LocalProjects/my-new-project/.claude-worktrees/*",
    "/Users/nazar/gwt-worktrees/my-new-project/*"
  ]
}
The state/state.json file holds a per-project, per-source started_at cursor so each run only analyzes what's new since the last run. A separate state/briefs-state.json file holds the per-project mtime cursor for .briefs/*.html artifacts (see Step 3b) — kept apart because briefs are a different cadence and source from conversations and bundling them muddies both cursors.
The skills project is configured as one of the targets — this is intentional. The skill audits its own runs and proposes its own edits.
- Codex: ~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl (live) and ~/.codex/archived_sessions/*.jsonl. The first line is session_meta with cwd, id, timestamp, and git.branch. Codex spawns a fresh worktree per automation run under ~/.codex/worktrees/<hash>/<project>; these must be matched.
- Claude Code: ~/.claude/projects/<encoded-cwd>/ contains both:
  - <session-uuid>.jsonl: the full transcript (older format, still common)
  - sessions-index.json: newer index that survives even after the .jsonl is archived; has projectPath, firstPrompt, created, gitBranch

  Claude Code creates worktrees under <project>/.claude-worktrees/<name>, and users also use ~/gwt-worktrees/<project>/<name> or ~/.cursor/worktrees/<project>/<hash>. All show up under ~/.claude/projects/ with their cwd encoded into the dir name.

The path encoding in ~/.claude/projects/ is lossy (both / and . map to -), so don't try to derive cwd from the dir name. The cwd/projectPath field inside the file is authoritative; that's what the puller matches against config.json patterns.
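The two points above (the lossy directory encoding, and matching the recorded cwd against config.json patterns) can be sketched in a few lines of Python. This is a minimal illustration, not the puller's actual code: the function names `encode_cwd` and `match_project` are hypothetical, and it assumes the root patterns are shell-style globs in which `*` may also match `/`, as in Python's `fnmatch`.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional


def encode_cwd(cwd: str) -> str:
    """Mimic the lossy dir-name encoding described above: '/' and '.' both become '-'."""
    return cwd.replace("/", "-").replace(".", "-")


def match_project(cwd: str, projects: List[Dict]) -> Optional[str]:
    """Return the name of the first project whose root pattern matches cwd (hypothetical helper)."""
    for project in projects:
        for pattern in project["roots"]:
            # Assumption: patterns are fnmatch-style globs; '*' here also matches '/'.
            if fnmatchcase(cwd, pattern):
                return project["name"]
    return None


projects = [
    {
        "name": "my-new-project",
        "roots": [
            "/Users/nazar/LocalProjects/my-new-project",
            "/Users/nazar/.codex/worktrees/*/my-new-project",
        ],
    }
]

# Two different paths collapse to the same encoded name, so it cannot be decoded back.
collision = encode_cwd("/a/b.c") == encode_cwd("/a/b/c")
matched = match_project("/Users/nazar/.codex/worktrees/abc123/my-new-project", projects)
```

Matching on the recorded cwd rather than the directory name sidesteps the collision entirely, which is the design choice the paragraph above describes.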
Be precise about which is which — they're treated very differently:
- External meta-skills: .agents/skills/ (and .claude/skills/, which is just symlinks to .agents/). These come from upstream via npx skills update; skill-creator is the canonical example, but more may be added over time. They are the tools used to create, maintain, and improve the user's own skills. Never edit these. They are read-only from this skill's perspective; updates only ever come from npx skills update.
- User-owned skills: skills/ at the repo root (e.g. skills/ship/, skills/ticket/, skills/skill-improver/). These are what this skill exists to improve. All analysis-driven edits target files here.

Two pre-flight actions run before any analysis. Both happen in the main worktree of the skills repo (not a feature worktree).
0a. Sync main with origin/main. Every later step branches off main — the upstream refresh below, the feature branch in Step 5 — so stale local main ships stale work. Fast-forward only.
Guardrails:
- main, skip the sync and proceed to 0b using the current state. Note it in the final summary.
- If main has diverged from origin/main, report the divergence in the final summary and proceed to 0b.

0b. Refresh external meta-skills. The meta-skills under .agents/skills/ encode how skills in this repo should be written and operated. Refresh them so the rest of the run uses the latest guidance, and so any upstream improvements ship to the user promptly.
cd "$(git -C "$CLAUDE_PROJECT_DIR" rev-parse --show-toplevel)"
npx skills update
git status --short
Treatment:
- If git status shows changes (anywhere under .agents/skills/; it could be any external meta-skill, not just skill-creator), stage and keep them. They will ship in the same PR as any analysis-driven edits, in their own PR section.
- If npx skills update fails (network, auth, etc.), don't abort the run. Proceed with Steps 1+ using the currently-installed versions, and note the failure in the final summary.

SKILL_DIR="$CLAUDE_PROJECT_DIR/skills/skill-improver" # or the absolute path to this skill
python3 "$SKILL_DIR/scripts/list_conversations.py" --full-prompt > /tmp/skill-improver-batch.jsonl
The script reads state/state.json and emits only conversations newer than the last cursor, one JSON object per line, oldest first.
First-run behavior: when a project/source has no cursor yet, the script defaults to a 7-day lookback window instead of dumping all history. Override with --first-run-days N (use 0 to scan everything; only do that if the user explicitly asks). This means the very first invocation is safe to run unattended — you'll get a small, manageable batch from the last week.
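One way to read the cursor and first-run rules above is as a small precedence function. The sketch below is hypothetical: `effective_since` and its parameter names are not taken from scripts/list_conversations.py, and the real script may order these checks differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def effective_since(
    cursor: Optional[datetime],
    now: datetime,
    first_run_days: int = 7,
    since: Optional[datetime] = None,
) -> Optional[datetime]:
    """Return the lower bound for the scan, or None to scan all history."""
    if since is not None:
        # An explicit --since overrides both the cursor and the first-run window.
        return since
    if cursor is not None:
        # Normal case: only look at conversations newer than the saved cursor.
        return cursor
    if first_run_days == 0:
        # --first-run-days 0 means "scan everything".
        return None
    # First run with no cursor: bounded lookback (7 days by default).
    return now - timedelta(days=first_run_days)


now = datetime(2026, 5, 8, tzinfo=timezone.utc)
first_run = effective_since(cursor=None, now=now)
```

With no cursor and the default window, the scan here starts seven days before `now`, which is what keeps an unattended first run small.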
Useful flags:
- --since 2026-05-01: override both the cursor and first-run window
- --first-run-days 14: widen the first-run lookback (default 7)
- --project pixelle2: restrict to one project
- --source claude: restrict to one source
- --limit 50: keep batches manageable
- --update-state: advance the cursor after the run completes (only pass on success)

Length is not a useful filter: a two-turn exchange where the user's second message is "no, don't do that" is more valuable than a long conversation that went smoothly. Default to reading every conversation in the batch unless one of the cheap skip signals below clearly applies.
Skip only when there's nothing to learn from:
- Automation: ... first-prompt with a single agent reply that says "no work to do" or equivalent, then end.

Definitely keep (these are high-signal even when very short):
- A <turn_aborted> system event (user interrupted).
- A skill named in the prompt ($ship, $ticket, $ship NFL-4). Attribution is unambiguous.

When in doubt, keep it. Reading a conversation that turns out to be uninteresting costs little; missing a sharp correction in a short conversation costs the next user the same mistake.
For each conversation, read the full transcript and look for these signals. Read the full file — these signals don't appear in summaries.
| Signal | What it looks like |
|---|---|
| Corrective steering | "no", "don't", "stop", "actually", "wait", "instead", "you should have", "why did you", "that's wrong" |
| Re-explanation | User repeats the task with new framing because the agent misunderstood |
| Backtracking | Agent reverts a commit / undoes an edit at user request |
| Plan deviation | Agent skipped a step the skill mandates, or invented a step that wasn't required |
| Tool misuse | Agent used git reset --hard, force-pushed without asking, ran a destructive command the skill should have gated |
| Missing safeguard | User flagged something the skill should have caught (missing tests, skipped review, no migration check) |
| Frustration | Long-form complaint, "you keep doing X", or explicit "this skill needs to ..." feedback |
| Aborted turn | <turn_aborted> event |
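As an illustration of the "Corrective steering" row, the listed phrases can be turned into a cheap first-pass flag. This is only a sketch for surfacing candidate moments; it does not replace reading the full transcript, and the names `CORRECTIVE_PHRASES` and `looks_corrective` are hypothetical rather than part of this skill's scripts.

```python
import re

# Phrases copied from the "Corrective steering" row of the table above.
CORRECTIVE_PHRASES = [
    "no", "don't", "stop", "actually", "wait", "instead",
    "you should have", "why did you", "that's wrong",
]

# Match whole words or phrases only, so "no" does not fire inside "know" or "notation".
_CORRECTIVE_RE = re.compile(
    r"(?<![\w'])(?:" + "|".join(re.escape(p) for p in CORRECTIVE_PHRASES) + r")(?![\w'])",
    re.IGNORECASE,
)


def looks_corrective(user_message: str) -> bool:
    """Heuristic flag: True if the message contains any corrective phrase."""
    return _CORRECTIVE_RE.search(user_message) is not None


flagged = looks_corrective("No, don't do that")
```

A keyword hit is a prompt to read the surrounding turns, not a finding on its own; plenty of benign messages contain "actually" or "wait".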
For every tagged moment, record:
{
  "conversation_id": "...",
  "source": "codex|claude",
  "project": "...",
  "skill_in_use": "ship | ticket | ... | (none)",
  "timestamp": "...",
  "signal": "corrective | re-explanation | ...",
  "quote": "<verbatim user message — keep it tight>",
  "what_should_have_caught_it": "<one sentence — which skill/step/check>",
  "proposed_change": "<concrete edit, in skill-author voice>"
}
skill_in_use is usually the skill named in the first user prompt ($ship, $ticket, etc.). If no skill was invoked, attribute to (none) — those findings may become new skills or process improvements rather than edits to existing skills.
The brief skill produces visual HTML one-pagers in each project's .briefs/ directory. Those artifacts are direct, durable evidence of how well the skill is performing — much higher signal than reading transcripts about brief generation. This step audits each new brief against the skill's own self-check list and surfaces failures as findings, attributed to skill_in_use: brief.
python3 "$SKILL_DIR/scripts/list_briefs.py" > /tmp/skill-improver-briefs.jsonl
The script reads state/briefs-state.json and emits one JSON object per .briefs/*.html file that is newer than the per-project cursor, oldest first. First-run window defaults to 14 days (briefs are generated less often than conversations, so a wider window is fine).
Useful flags mirror the conversation puller: --since, --project, --mode draft|final, --limit N, --update-state (advance the cursor — only pass on success).
For each new brief, open the HTML file and check it against the rules in skills/brief/SKILL.md's "Step 6 — Save, open, and self-check" section. That self-check list is the source of truth for what a good brief looks like; this audit is just running it across recent real outputs. Concretely, look for:
- No <pre><code> blocks, no inline diff or source snippets.

Tag findings the same way as conversation findings, but with a brief-specific shape:
{
  "source": "brief-audit",
  "project": "...",
  "skill_in_use": "brief",
  "brief_path": "<absolute html path>",
  "mode": "draft | final",
  "mtime": "...",
  "rule_violated": "<short name of the self-check rule, e.g. 'PR stack buried'>",
  "evidence": "<what you observed — e.g. 'PR Stack appears as section 6, after Code Tour'>",
  "proposed_change": "<concrete edit to skills/brief/SKILL.md, in skill-author voice>"
}
The same clustering bar in Step 4 applies: a single off-template brief is noise; two or more independent instances of the same rule violation are the threshold for a skill edit. If the same violation shows up across multiple projects, that's especially strong signal that the rule isn't getting through and needs sharper wording.
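The clustering bar described above can be sketched as a grouping step over brief findings. The field names follow the finding shape shown earlier; the function name `clusters_meeting_bar` is hypothetical, and treating "independent instances" as distinct brief paths is an assumption made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def clusters_meeting_bar(findings: List[dict], min_instances: int = 2) -> Dict[Tuple[str, str], List[str]]:
    """Group findings by (skill, rule) and keep groups seen in at least min_instances briefs."""
    briefs_by_rule = defaultdict(set)
    for finding in findings:
        key = (finding["skill_in_use"], finding["rule_violated"])
        # Assumption: two findings are independent when they come from different briefs.
        briefs_by_rule[key].add(finding["brief_path"])
    return {
        key: sorted(paths)
        for key, paths in briefs_by_rule.items()
        if len(paths) >= min_instances
    }


findings = [
    {"skill_in_use": "brief", "rule_violated": "PR stack buried", "brief_path": "/p1/.briefs/a.html"},
    {"skill_in_use": "brief", "rule_violated": "PR stack buried", "brief_path": "/p2/.briefs/b.html"},
    {"skill_in_use": "brief", "rule_violated": "inline diff", "brief_path": "/p1/.briefs/a.html"},
]
clusters = clusters_meeting_bar(findings)
```

Here only "PR stack buried" clears the bar, because it shows up in two different briefs; the single "inline diff" instance is treated as noise.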
Group all findings by skill_in_use. For each skill:
- Read .agents/skills/skill-creator/SKILL.md. That's the canonical reference for how skills in this repo are written: explain the why, avoid stacks of MUST/NEVER, keep prompts lean, prefer reframing over heavy constraints, bundle scripts for repeated work. Every edit you propose should be one skill-creator would approve. If other meta-skills under .agents/skills/ are relevant to the kind of edit you're about to draft (e.g. evaluation, packaging), read those too.

A run produces analysis-driven edits or a cleanup pass, never both. Bundling the two muddies review: the reviewer can't easily tell whether a deletion is a deliberate trim or a regression in the new findings, and the cleanup gets rubber-stamped along with the findings instead of getting its own scrutiny.
So after Step 4, branch:
In cleanup mode:
- Look through skills/ and consider candidates that genuinely need simplification. A finding-free run does not mean a skill must be cleaned: most runs in cleanup mode should be no-ops, because most skills on most days are fine. Only count a skill as a candidate if you see real signals: SKILL.md past ~300 lines and growing across recent commits, repeated why explanations, dense MUST/NEVER stacks, multiple paragraphs added by previous skill-improver runs piling onto the same section, sections that no longer match the workflow. Vague "could be tighter" doesn't qualify; skill-cleaner itself will refuse to make changes if the target is already tight, and burning a PR on a no-op cleanup wastes the reviewer's time.

git log --format='%H %ad %s' --date=short -- skills/<name>/SKILL.md | grep -i 'cleanup' | head -1
If there's a hit, look at what's landed on that file since:
git log --oneline <last-cleanup-sha>..HEAD -- skills/<name>/SKILL.md
git diff --stat <last-cleanup-sha>..HEAD -- skills/<name>/SKILL.md
Skip the candidate if the last cleanup was recent (rule of thumb: within the last ~30 days or within the last 3 skill-improver runs that touched it) and the churn since is small (under ~50 lines added, or just trivial edits like typo fixes). The point is to stop the loop of re-cleaning the same skill every run: a skill that was just tightened needs time to accumulate real growth before another pass is justified.

- Read skills/skill-cleaner/SKILL.md and follow it on the chosen skill. It edits in place and returns a structured report. If skill-cleaner itself reports "no changes — skill is already tight", treat the run as a no-op and don't open a cleanup PR for it (this is the second safety net behind the recency filter: if the first didn't catch it, the cleaner's own judgment does).

When in doubt, lean toward no-op. A state-only PR (or no PR at all) is always preferable to a cleanup that wasn't justified: re-cleaning a fine skill churns the file, dilutes the signal of past cleanup commits, and trains the reviewer to ignore cleanup PRs.
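The recency filter from this step can be written down as a small predicate. This is a sketch under stated assumptions: the function name and parameters are hypothetical, the thresholds are the rules of thumb quoted above (roughly 30 days, 3 runs, 50 lines), and "within the last 3 runs" is read here as fewer than 3 skill-improver runs touching the file since the cleanup.

```python
from typing import Optional


def should_skip_cleanup(
    days_since_cleanup: Optional[int],
    runs_since_cleanup: int,
    lines_added: int,
    trivial_only: bool = False,
) -> bool:
    """True when the last cleanup was recent AND the churn since then is small."""
    if days_since_cleanup is None:
        # No earlier cleanup commit found, so the recency filter does not apply.
        return False
    recent = days_since_cleanup <= 30 or runs_since_cleanup < 3
    small_churn = lines_added < 50 or trivial_only
    return recent and small_churn


skip_recent_small = should_skip_cleanup(days_since_cleanup=10, runs_since_cleanup=1, lines_added=12)
```

Both conditions have to hold: a skill cleaned ten days ago that has since grown by 180 lines is still a candidate, and so is one with little churn whose last cleanup was months and many runs ago.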
REPO="$(git -C "${CLAUDE_PROJECT_DIR:-$PWD}" rev-parse --show-toplevel)"
cd "$REPO"
git checkout -b skill-improver/run-$(date -u +%Y%m%d-%H%M%S)
# Step 0's npx skills update changes (if any) are already in the working tree —
# they'll be included in the same commit. Now apply analysis-driven edits.
# apply edits to skills/<name>/SKILL.md ...
Edit the same checkout you branched in — this is a real trap. Resolve $REPO once (above) and prefix every file edit with it ($REPO/skills/<name>/SKILL.md), including the path passed to file-editing tools — not just cd. When this skill runs under a feature worktree (common — the harness spawns one, and CLAUDE_PROJECT_DIR is often unset, so the fallback resolves to that worktree), that worktree is the repo root. A past run branched correctly in the worktree but passed the main checkout's absolute path to its Edit calls, so the changes silently landed on main's working tree and the feature branch's diff came back empty. Guard against it: after the first edit, run git diff --stat and confirm it is non-empty before continuing. An empty diff means you edited the wrong checkout — never edit files under the main checkout path while working in a worktree.
Scope rules:
- Only edit files under skills/<name>/ (user-owned skills). Never edit anything under .agents/skills/ or .claude/skills/; those are upstream and only change via npx skills update. If a finding clearly points at an external meta-skill (e.g. skill-creator gave bad advice), record it in the PR's "Considered but not changed" section and tag it as upstream:<skill-name> so the user can decide whether to file an issue upstream.
- Limit edits to SKILL.md files unless a finding clearly justifies a script or reference file change.

Two cursors persist across runs on main and both live in tracked files:
- skills/skill-improver/state/state.json: conversation cursor (per-project, per-source started_at)
- skills/skill-improver/state/briefs-state.json: brief-artifact cursor (per-project last_mtime)

The only way to land them without pushing to main directly is to include them in the same PR as the rest of the run. So advance both before committing in Step 6; never leave them dirty in the working tree.
python3 "$SKILL_DIR/scripts/list_conversations.py" --update-state --from-batch /tmp/skill-improver-batch.jsonl > /dev/null
python3 "$SKILL_DIR/scripts/list_briefs.py" --update-state --from-batch /tmp/skill-improver-briefs.jsonl > /dev/null
Each script advances its own state file to the newest value per project seen in the Step 1 / Step 3b batch — that's what --from-batch enforces, pointing each script at the JSONL it emitted earlier. This matters because Step 5b runs much later than the pull: without --from-batch, --update-state re-scans live and moves the cursor past any conversation or brief that arrived in between, so it never gets analyzed. Pass the batch files and the cursor only ever moves past what you actually read. The resulting working-tree changes are part of the commit in Step 6.
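To make the --from-batch behavior concrete, here is a minimal sketch of deriving the new cursor values from the batch that was actually read, instead of from a fresh live scan. The record field names (`project`, `source`, `started_at`) and the function name `newest_cursors` are assumptions for illustration; the real batch schema is defined by scripts/list_conversations.py. It also assumes timestamps are ISO-8601 strings in one consistent format, so that string comparison is chronological.

```python
import json
from typing import Dict, Iterable, Tuple


def newest_cursors(batch_lines: Iterable[str]) -> Dict[Tuple[str, str], str]:
    """Return the newest started_at per (project, source) among the batch records."""
    newest: Dict[Tuple[str, str], str] = {}
    for line in batch_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL file
        record = json.loads(line)
        key = (record["project"], record["source"])
        started = record["started_at"]
        # Assumption: same-format ISO-8601 strings compare in chronological order.
        if key not in newest or started > newest[key]:
            newest[key] = started
    return newest


batch = [
    '{"project": "pixelle2", "source": "claude", "started_at": "2026-05-01T09:00:00Z"}',
    '{"project": "pixelle2", "source": "claude", "started_at": "2026-05-03T14:30:00Z"}',
    '{"project": "pixelle2", "source": "codex", "started_at": "2026-05-02T08:15:00Z"}',
]
cursors = newest_cursors(batch)
```

Because the result depends only on the batch, a conversation that started after the pull cannot push the cursor past itself; it simply shows up in the next run.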
If the run aborts before Step 6 (push rejected, gh error, etc.), discard both state changes so the next run re-analyzes the same batch:
git checkout -- skills/skill-improver/state/state.json skills/skill-improver/state/briefs-state.json
The body depends on which mode the run ended up in (see Step 4b). Pick the matching template.
Findings-mode PR (Step 4 produced edits):
git add -A
git commit -m "skill-improver: findings from run $(date -u +%Y-%m-%d)"
git push -u origin HEAD
gh pr create --title "skill-improver: $(date -u +%Y-%m-%d) findings" --body "$(cat <<'EOF'
## Summary
<one paragraph: how many conversations analyzed, how many briefs audited, how many findings, which skills touched, plus whether any external meta-skill was updated this run>
## External meta-skill updates
<only if Step 0 produced changes; one bullet per updated meta-skill>
- `.agents/skills/<name>/` — <brief summary of what changed upstream>
These changes come from `npx skills update` and are bundled here so the user has a single review surface. They are not analysis-driven edits.
## Changes from conversation analysis
For each skill edited based on conversation findings:
### skills/<name>/SKILL.md
**Why:** <pattern observed — how many times, across which projects>
**Evidence:**
- <conversation file path> — "<verbatim quote>"
- <...> — "<...>"
**Change:** <what the edit does and why it should prevent the pattern>
## Changes from brief audit
<only if Step 3b produced edits to skills/brief/SKILL.md>
### skills/brief/SKILL.md
**Why:** <self-check rule violated — how many briefs, across which projects/modes>
**Evidence:**
- <brief html path> (mode) — <what was observed>
- <...> — <...>
**Change:** <what the edit does and why it should make the rule land>
## Considered but not changed
<findings that didn't meet the bar for an edit — one bullet each, with reason. Include both conversation and brief findings here.>
## Skipped conversations
<count of skipped trivial/automation conversations>
## Cursors
Advanced conversation cursor for: <project[/source] list with new timestamps>
Advanced brief cursor for: <project list with new mtimes>
🤖 Generated by skill-improver
EOF
)"
Cleanup-mode PR (Step 4 produced no edits, Step 4b ran skill-cleaner):
git add -A
git commit -m "skill-improver: cleanup pass from run $(date -u +%Y-%m-%d)"
git push -u origin HEAD
gh pr create --title "skill-improver: $(date -u +%Y-%m-%d) cleanup of <skill-name>" --body "$(cat <<'EOF'
## Summary
No analysis-driven edits this run (<one sentence on why — empty batch / no patterns met the bar / all findings were noise>). Switched to cleanup mode and ran skill-cleaner on `skills/<name>/`.
## External meta-skill updates
<only if Step 0 produced changes; same format as findings PR>
## Cleanup pass: skills/<name>/SKILL.md
**Why this skill:** <one sentence — e.g. "470 lines, +180 from skill-improver runs over the last 3 months, dense MUST stacks in Step 3">
<paste the full report skill-cleaner returned>
## Cursors
Advanced conversation cursor for: <project[/source] list with new timestamps>
Advanced brief cursor for: <project list with new mtimes>
🤖 Generated by skill-improver
EOF
)"
Rules:
- Never auto-merge. gh pr merge --auto is tempting, resist it.
- Never push to main directly. The skill always opens a PR even for tiny edits.
- Open a PR when any of the following holds: (a) the run produced edits under skills/ (from conversation findings or brief audit), (b) Step 0's npx skills update produced changes under .agents/skills/, (c) Step 4b ran skill-cleaner and it made changes, or (d) Step 5b advanced either cursor. Any one is worth a PR; those changes still need a human to merge so the cursors land on main.

Post a summary in the conversation that triggered this run (or stdout if scheduled), with:
- Whether npx skills update changed any external meta-skill under .agents/skills/ (one line: which meta-skills, plus a sentence on what changed if non-trivial).

Keep it scannable. The PR body has the full evidence; the summary is the orientation.
The skills project is one of the configured targets, and skill-improver lives under skills/ (user-owned), so each run also analyzes conversations in the skills repo itself — including past skill-improver runs — and can propose edits to this SKILL.md. (The same does not apply to skill-creator and other meta-skills under .agents/skills/ — those are upstream and out of scope for analysis-driven edits.)
If a previous run missed a pattern, or wrote a finding-quote-edit that turned out to be wrong, the next run sees the steering in the follow-up conversation and proposes a fix here.
Common self-improvements to watch for:
- A conversation the puller missed → fix scripts/list_conversations.py (cite the missed conversation as evidence)
- A transcript format under ~/.claude/projects/ or ~/.codex/ that the puller can't read → extend the parser

When self-improving, the same Step 5-7 rules apply: PR, explain, do not auto-merge.
This skill is designed to be invoked by a cron job (Claude /schedule or a codex automation). The simplest setup is a daily run, but adjust to taste — more frequent runs mean smaller batches and faster feedback, less frequent means more context per finding. The script's 7-day first-run window means even an unattended first fire is bounded.
Entry point under the scheduler. The scheduled task says "Run the skill-improver skill," but this skill lives in the skills repo at skills/skill-improver/SKILL.md — it is not registered as an invocable Skill, so attempting to load it by name fails ("Unknown skill") in both Claude and codex. Don't treat that as a dead end: locate this file directly and follow it. If CLAUDE_PROJECT_DIR is unset (common under the scheduler), glob for skills/skill-improver/config.json under the known skills-repo roots in config.json to find the repo, then read the SKILL.md next to it. This has cost two consecutive runs a couple of wasted turns at startup.