_internal-skills/tune/SKILL.md
Live RL tuner for skills. Watches skill invocations, reads user reaction, proposes targeted SKILL.md overlay edits, requires explicit approval, writes scorecards. The in-session half of the skill-RL loop (Path B). Triggers on: tune, sharpen, skill feedback, that was shit, that was great, make X better.
npx skillsauth add atrislabs/atris tuneInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Turn the user's reaction into a targeted SKILL.md overlay edit. Human-in-the-loop reinforcement learning for skills.
Base SKILL.md (read-only, shared)
+
Overlay.md (mutable, per-workspace)
=
Effective skill at invocation time
Path A (cron): /api/improve ticks weekly, objective reward harvester
Path B (live): THIS SKILL — user reaction = reward, edits overlay in-session
Same overlay.md, same scorecard — both paths compound.
| Signal | Reward | Source | |---|---|---| | Explicit positive | +1 | "good", "perfect", "keep it", "that nailed it" | | Implicit positive | +0.5 | user kept output unchanged, shipped it, moved on | | Neutral | 0 | no reaction, switched topic | | Implicit negative | −0.5 | user edited heavily, reran skill, tweaked params | | Explicit negative | −1 | "shit", "bad", "nope", "revert", "worse than before" |
Do not infer reward beyond these. If you are unsure, classify as neutral and log; do NOT propose an edit on neutral.
1. OBSERVE
- User just invoked skill <X> at turn T
- Skill produced output Y
- User reacted with R
2. CLASSIFY
- Map R to reward from the schema above
- If reward in {0}: log and stop. Do not propose.
3. DIAGNOSE
- If reward < 0: what specifically did the user reject?
(a phrase? a structural choice? a missing section? tone?)
- If reward > 0: what specifically did the user approve?
(same questions, positive framing)
- Write a one-sentence diagnosis.
4. PROPOSE
- Read /workspace/atris/skills/<X>/SKILL.md (BASE, read-only)
- Read /workspace/atris/skills/<X>/overlay.md (if exists, else empty)
- Propose ONE targeted edit to overlay.md:
• Max 5 lines of change
• Must be additive (append) or a bounded replace of existing overlay lines
• NEVER edits base SKILL.md
5. SHOW THE DIFF
- Display a unified diff of the proposed overlay change
- State the diagnosis and expected outcome in one sentence
6. REQUIRE APPROVAL
- Wait for an explicit "yes" / "apply" / "go" from the user.
- On anything else: log as rejected. Do not apply.
7. APPLY + LOG
- Write the approved change to /workspace/atris/skills/<X>/overlay.md
- Append one line to /workspace/atris/skills/<X>/scorecards.jsonl:
{"ts": "<iso>", "skill": "<X>", "reward": <n>, "reaction": "<short>",
"diagnosis": "<short>", "applied": true, "path": "B-live", "turn": <T>}
/api/improve or ask them to manually prune).scorecards.jsonl (append-only):
{"ts", "skill", "reward", "reaction", "diagnosis", "applied", "path": "B-live", "turn"}
Path A (/api/improve) also writes to the SAME scorecards.jsonl with:
{"ts", "skill", "reward", "path": "A-cron", "variant_hash", "kept": bool}
Weekly: /api/improve cron tick reads the last 7 days of scorecards.jsonl
to propose autonomous variants that address patterns Path B caught.
User: [runs aeo draft, gets output]
User: "this is too hedgy, it uses 'may' twice"
Tune skill:
1. Reward: −1 (explicit negative, specific complaint)
2. Diagnosis: AEO rule 9 (no hedging) is violated; overlay should strengthen it
3. Propose overlay.md addition:
```diff
--- overlay.md (before)
+++ overlay.md (after)
@@ +1,3 @@
+## Stricter no-hedging (learned 2026-04-17)
+Before returning, scan the body for: may, might, could, perhaps, possibly, potentially.
+If found, rewrite the sentence as declarative. No exceptions.
"Apply this?" User: "yes" [writes overlay.md + logs scorecard, reward=−1, applied=true]
### Example 2 — explicit positive on cold-email
User: "damn that cold email hook is sharp, keep doing that"
Tune skill:
Reward: +1
Diagnosis: the specific hook pattern worked (e.g., "opened with a stat from their 10-K")
Propose overlay.md addition:
+## Hook that worked (2026-04-17) +Prefer opening with a verifiable, recent stat from the prospect's own +public material (10-K, earnings call, press release) over generic praise.
"Apply this as a preferred pattern?" User: "yes" [writes overlay.md + logs scorecard, reward=+1, applied=true]
### Example 3 — neutral
User: [gets output, says nothing, moves on]
Tune skill: does NOT proactively run. Only triggers on user-expressed reaction. If somehow invoked: log neutral, propose nothing.
## Anti-patterns
- **Don't propose on neutral signal.** Silence is not −1 or +1.
- **Don't rewrite the base.** Base SKILL.md is shared across customers. Only overlay diverges.
- **Don't batch multiple changes into one tick.** One tick = one targeted edit. Larger refactors go through `/api/improve` in review mode.
- **Don't claim improvement you can't verify.** "Expected outcome" should be a testable prediction (e.g., "next draft will have zero 'may/might/could'"), not a vibe.
- **Don't run without the user's explicit approval.** Every application is gated. No "this seems good, I'll apply it."
## File layout (what exists where)
atris-cli/atris/skills/<name>/SKILL.md ← base, shared (genotype) /workspace/atris/skills/<name>/SKILL.md ← customer copy of base (synced) /workspace/atris/skills/<name>/overlay.md ← mutable, RL-tuned (phenotype) /workspace/atris/skills/<name>/scorecards.jsonl ← append-only reward log
**Invocation-time behavior:** skill callers (backend endpoints like `/aeo/draft`, or Claude Code skill system) read `SKILL.md` AND `overlay.md` and concatenate. If overlay.md is missing, treat as empty.
## Related
- Base pattern: `atris/skills/autoresearch/SKILL.md` (Karpathy keep/revert)
- Path A tick runner: `/api/improve` (`backend/routers/improve.py`)
- PII sanitizer for cross-customer diffs: `backend/rl/skill_extractor.py`
- Reward compiler: `backend/rl/rewards.py`
- Judge infra: `backend/rl/judges.py`
- Feature: `atris/features/skill-rl/idea.md`
## Rules (non-negotiable summary)
1. Never edit base SKILL.md. Overlay only.
2. Always show the diff before applying.
3. Require explicit user approval.
4. Cap a single tick at 5 lines of change.
5. Log every tick — approved, rejected, neutral.
6. Do not propose on neutral reactions.
7. Strip PII before any cross-customer record (use `skill_extractor`).
testing
Detects AI slop and fixes it, especially in memos, docs, READMEs, messages, PRDs, and other written output. Based on Wikipedia's AI Cleanup patterns plus memo-specific anti-slop rules. Triggers on "copy edit", "review writing", "humanize", "deslopper", "ai patterns", "make it sound human", "AI slop", "anti-slop", "memo".
tools
Use when an agent needs to inspect or send local macOS iMessage through Atris CLI. Triggers on iMessage, Messages.app, local text messages, chat.db, or texting someone from the user's Mac.
databases
Submit, list, resolve, close, or delete Atris customer feedback. Use when user types /feedback or asks to triage the feedback queue.
development
Fast research sweep — arxiv, semantic scholar, github, web. Finds papers, scores relevance, extracts actionable insights, stores to wiki. Triggers on: research search, find papers, latest research, arxiv, what's new in, sweep papers, research sweep.