skill/SKILL.md
Self-improving skills toolkit that watches real agent sessions, detects missed triggers, grades execution quality, and evolves skills through a package evaluation pipeline (replay, baseline, grading, unit tests, and post-deploy watch). Use when verifying skill packages, publishing improvements, evolving skill descriptions or routing tables, discovering reusable workflows, scaffolding new workflow skills, checking skill health, viewing the dashboard, ingesting sessions from other platforms, or running autonomous improvement loops. Make sure to use this skill whenever the user mentions skill improvement, skill performance, skill triggers, skill evolution, skill health, undertriggering, overtriggering, session grading, or wants to know how their skills are doing — even if they don't say "selftune" explicitly.
npx skillsauth add selftune-dev/selftune selftuneInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Observe real agent sessions, detect missed triggers, grade execution quality, evolve skills through package evaluation (replay, baseline, grading, body, unit tests, and post-deploy watch), and scaffold workflow skills from repeated telemetry patterns.
You are the operator. The user installed this skill so YOU can manage their skill health autonomously. They will say things like "set up selftune", "improve my skills", or "how are my skills doing?" — and you route to the correct workflow below. The user does not run CLI commands directly; you do.
If ~/.selftune/config.json does not exist, read workflows/Initialize.md
first. The CLI must be installed (selftune on PATH) before other commands
will work. Do not proceed with other commands until initialization is complete.
Default to this lifecycle unless the user explicitly asks for a low-level workflow:
status
selftune statusselftune create status --skill-path <path>verify
selftune verify --skill-path <path>publish
selftune publish --skill-path <path>publish usually means a validated Improve
action plus Watchimprove
selftune improve --skill <name> --skill-path <path>--scope auto choose bounded package search automatically when the
skill already has package evidence or a draft package manifest--scope description|routing|body|package when the measured gap is
already clear and you want to force the mutation surface--scope package when the problem spans routing and body together or
you want measured frontier comparison before deciding what to publish--dry-run when you want the winning package candidate promoted back
into the draft automaticallyrun
selftune runTreat eval generate, unit-test, replay, baseline, watch, and
body-specific evolution as advanced supporting workflows unless the user asks
for them directly or the default lifecycle fails.
selftune <command> [options]
Commands vary in output format:
selftune doctor and selftune watch emit structured JSON on stdout.selftune status, selftune last, selftune verify, selftune publish, and selftune improve print human-readable text when stdout is a TTY.selftune run / selftune orchestrate emit JSON on stdout and a human report on stderr.selftune sync --json enables structured JSON output.selftune dashboard starts a local SPA server — it does not emit data.For health remediation, prefer machine-readable guidance.next_command or
top-level next_command from selftune doctor output instead of inferring the
next step from prose.
Run selftune <command> --help for exact flags. Read
references/cli-quick-reference.md when you need the full flag reference.
When the user wants to improve a skill, default to this package evaluation pipeline before jumping straight to mutation. Each step builds measured evidence that the package is ready to publish:
draft — the package exists but is still incompleteverify_blocked — the draft is still in one of the concrete readiness states: needs_spec_validation, needs_package_resources, needs_evals, needs_unit_tests, needs_routing_replay, or needs_baselineverified — the trust gates pass and the skill is ready to shippublished — the skill was shipped successfullywatching — post-deploy monitoring is activeneeds_improvement — measured evidence shows trigger, routing, body, or value gapsunhealthy — hooks, telemetry, config, or selftune itself is brokenIf the user asks "how do I know this skill works?" or "can I trust this skill
yet?", start with this pipeline, then use selftune status, the dashboard, or
the skill report to explain what is still missing, whether the package is ready
to publish, or whether it is already being watched live.
| Trigger keywords | Workflow | File |
| -------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------- | --------------------------------- |
| create skill, new skill package, author skill, bootstrap skill, scaffold package, benchmark report, package report, publish report | Create | workflows/Create.md |
| verify skill, creator loop, can I trust this skill, how do I know this skill works, test this skill, ready to ship, ready to deploy | Verify | workflows/Verify.md |
| publish skill, ship skill, deploy skill, go live, release skill | Publish | workflows/Publish.md |
| search run, package frontier, candidate search, bounded package evolution, compare package candidates, optimize package, improve routing and body together, bounded evolution | SearchRun | workflows/SearchRun.md |
| grade, score, evaluate, assess session, auto-grade | Grade | workflows/Grade.md |
| evals, eval set, undertriggering, skill stats, eval generate | Evals | workflows/Evals.md |
| improve, optimize skills, make skills better, triggers, catch more queries, apply proposal, apply contributor proposal | Improve | workflows/Improve.md |
| evolve description, description-only evolution, improve trigger wording | Evolve | workflows/Evolve.md |
| evolve body, evolve routing, full body evolution, rewrite skill, teacher student | EvolveBody | workflows/EvolveBody.md |
| evolve rollback, undo, restore, revert evolution, go back, undo last change | Rollback | workflows/Rollback.md |
| watch, monitor, regression, post-deploy, keep an eye on | Watch | workflows/Watch.md |
| doctor, health, hooks, broken, diagnose, not working, something wrong | Doctor | workflows/Doctor.md |
| ingest, import, codex logs, opencode, openclaw, pi, wrap codex | Ingest | workflows/Ingest.md |
| replay, backfill, claude transcripts, historical sessions | Replay | workflows/Replay.md |
| contributions, sharing preferences, opt in/out creator sharing, approve/revoke contributions | Contributions | workflows/Contributions.md |
| creator contributions, selftune.contribute.json, enable/disable creator contribution | CreatorContributions | workflows/CreatorContributions.md |
| signals dashboard, contributor signals, signals page, community dashboard, community data, contributor stats, signal health, how are signals, how is community | SignalsDashboard | workflows/SignalsDashboard.md |
| contribute, share, export bundle, export data, anonymized, give back | Contribute | workflows/Contribute.md |
| init, setup, set up, bootstrap, first time, install, configure selftune, alpha, enroll | Initialize | workflows/Initialize.md |
| cron, schedule, automate evolution, run automatically | Cron | workflows/Cron.md |
| schedule, selftune schedule, launchd, systemd, crontab, automation setup | Schedule | workflows/Schedule.md |
| auto-activate, suggestions, activation rules, nag, why suggest | AutoActivation | workflows/AutoActivation.md |
| dashboard, visual, open dashboard, show dashboard, serve dashboard | Dashboard | workflows/Dashboard.md |
| evolution memory, session continuity, what happened last | EvolutionMemory | workflows/EvolutionMemory.md |
| grade baseline, baseline lift, adds value, skill value, no-skill comparison | Baseline | workflows/Baseline.md |
| eval unit-test, skill test, test skill, generate tests, run tests | UnitTest | workflows/UnitTest.md |
| eval composability, co-occurrence, skill conflicts, family overlap, sibling confusion | Composability | workflows/Composability.md |
| eval import, skillsbench, external evals, benchmark tasks | ImportSkillsBench | workflows/ImportSkillsBench.md |
| telemetry, analytics, disable analytics, opt out, tracking, privacy | Telemetry | workflows/Telemetry.md |
| orchestrate, autonomous, full loop, improve all skills, run selftune, run selftune loop, run with package search, automatic package improvement | Run | workflows/Run.md |
| sync, refresh, source truth, rescan sessions | Sync | workflows/Sync.md |
| badge, readme badge, skill badge, health badge | Badge | workflows/Badge.md |
| workflows, discover workflows, scaffold workflow skill, build skill from logs | Workflows | workflows/Workflows.md |
| alpha upload, upload data, send alpha data, manual upload | AlphaUpload | workflows/AlphaUpload.md |
| recover, rebuild sqlite, recover db, legacy backfill | Recover | workflows/Recover.md |
| quickstart, getting started, onboard, first time setup, new user | Quickstart | workflows/Quickstart.md |
| uninstall, remove selftune, clean up, teardown | Uninstall | workflows/Uninstall.md |
| repair, rebuild usage, fix skill usage, trustworthy usage | RepairSkillUsage | workflows/RepairSkillUsage.md |
| export canonical, canonical export, canonical telemetry, push payload | ExportCanonical | workflows/ExportCanonical.md |
| hook, run hook, invoke hook, manual hook, debug hook | Hook | workflows/Hook.md |
| codex/opencode/cline/pi hooks, platform hooks, non-claude hooks, multi-agent | PlatformHooks | workflows/PlatformHooks.md |
| registry, distribute, push/install/sync/rollback skill, team skills | Registry | workflows/Registry.md |
| export, dump, jsonl, export sqlite, debug export | Export | (direct: selftune export) |
| status, health summary, skill health, how are skills, run selftune | Status | (direct: selftune status) |
| last, last session, recent session, what happened | Last | (direct: selftune last) |
Workflows Grade, Improve, Watch, and Ingest also run autonomously via selftune orchestrate.
When package evaluation evidence exists, selftune orchestrate (aliased as selftune run)
can automatically select package-level bounded search instead of description-level evolve.
Before running mutating workflows (evolve, evolve-body, evals, baseline), consult
references/interactive-config.md for the pre-flight configuration pattern, model
tier reference, and quick-path rules.
selftune bundles focused agents in agents/. Read the relevant agent file and
follow its instructions — either inline or by spawning a subagent.
| Trigger keywords | Agent file | When to use |
| ---------------------------------------------------- | ------------------------------ | ------------------------------------------------------------ |
| diagnose, root cause, why failing, debug performance | agents/diagnosis-analyst.md | Recurring low grades or unclear failures after doctor/status |
| patterns, conflicts, cross-skill, overlap | agents/pattern-analyst.md | Skills overlap, misroute, or interfere |
| review evolution, check proposal, safe to deploy | agents/evolution-reviewer.md | Before deploying high-stakes or marginal evolutions |
| set up selftune, integrate, configure project | agents/integration-guide.md | Complex setup: monorepos, multi-skill, mixed-platform |
Load these when the user explicitly asks for a low-level step, when the primary lifecycle fails, or when debugging needs deeper evidence:
workflows/Evals.mdworkflows/UnitTest.mdworkflows/Baseline.mdworkflows/Replay.mdworkflows/Watch.mdworkflows/Evolve.mdworkflows/EvolveBody.mdworkflows/Composability.mdworkflows/ImportSkillsBench.mdThese should NOT trigger selftune — near-misses that share keywords:
The key distinction: selftune improves skills themselves (descriptions, triggers, execution quality). If the user wants to use a skill, route to that skill instead.
This skill ships a selftune.contribute.json config that enables creator-directed
signal relay. When users install selftune and approve contribution, anonymous
usage signals (trigger, grade, miss_category) are relayed to the selftune
creator to improve the skill for everyone.
creator_id in the bundled config is the selftune creator's cloud user UUID.selftune contributions approve selftune.workflows/CreatorContributions.md for creator-side setup.workflows/Contributions.md for end-user opt-in/opt-out.Routing keywords: creator contribution, selftune signals, dogfood relay, community contribution, signal sharing, opt in creator, creator UUID.
Load these on demand — do not read unless needed for the current task:
| Reference | When to read |
| ----------------------------------- | -------------------------------------------------------------------- |
| references/cli-quick-reference.md | Need exact CLI flags beyond --help |
| references/troubleshooting.md | Diagnosing common errors |
| references/examples.md | Need step-by-step scenario walkthroughs |
| references/creator-playbook.md | Publishing skills others install; before-ship vs after-ship pipeline |
| references/interactive-config.md | Before mutating workflows |
| references/grading-methodology.md | Grading sessions or interpreting grades |
| references/invocation-taxonomy.md | Analyzing trigger coverage |
| references/logs.md | Parsing or debugging log files |
| references/setup-patterns.md | Complex platform-specific setup |
| references/version-history.md | Checking what changed between versions |
| settings_snippet.json | During initialization |
development
Delivers instruction on UI/UX design principles for web applications including layout, typography, color theory, and responsive design.
data-ai
Helps users discover and install agent skills when they ask questions like "how do I do X", "find a skill for X", "is there a skill that can...", or express interest in extending capabilities.
testing
Facilitates creation of visual content using generative AI models including DALL-E, Midjourney prompts, and Stable Diffusion workflows.
testing
Helps deploy applications to production environments with safety checks, rollback support, and monitoring.