skills/improve-agent/SKILL.md
Analyze past session files to find recurring AI agent issues and fix them via AGENTS.md updates, new skills, or code/infra changes. Use when asked to improve agent workflow, find recurring problems, optimize AGENTS.md, create skills from session patterns, or understand what went wrong across sessions.
Analyze past pi coding sessions to find recurring agent issues, then fix them by updating AGENTS.md, creating new skills, or improving code/infra.
Note: This skill is pi-specific. It reads pi session files from
~/.pi/agent/sessions/ and produces AGENTS.md entries, new skills, or infrastructure fixes.
Pi stores every session as a JSONL file in ~/.pi/agent/sessions/<mangled-cwd>/.
Each session captures tool calls (bash, read, edit, write), tool results
(with success/failure), user messages, assistant reasoning, and compaction
summaries. By analyzing patterns across sessions, we identify where the
agent repeatedly struggles and fix the root causes.
```bash
python3 {baseDir}/extract.py [options]
```
Auto-discovers the sessions directory from $PWD. Use --sessions-dir to override.
| Mode | What it extracts |
|------|------------------|
| --summary | Overview: session count, tool usage, failure count, abort count |
| --commands --stats | Most common bash commands (frequency table) |
| --reads --stats | Most read files |
| --failures --stats | Tool failures (isError=true) with triggering command context |
| --corrections | User corrections: aborted agent turns paired with next user message |
| --sequences | Narrative view: tool calls, user messages, failures in order |
| --sequences --match ERROR | Zoom into error sequences with surrounding context |
| --compactions | Session summaries: goals, progress, blockers, decisions |
| --context LINE | Full untruncated context around a specific line in a session file |

| Flag | Description |
|------|-------------|
| --match REGEX | Filter items by regex |
| --stats | Frequency table instead of raw output |
| --last N | Number of recent sessions (default: 10) |
| --top N | Items in frequency table (default: 30) |
| --before DATE | Only sessions before this date (ISO: 2026-03-01) |
| --after DATE | Only sessions on or after this date (ISO: 2026-03-01) |
| --include-heuristic | With --failures: also show pattern-matched output (noisy) |
| --sessions-dir PATH | Override auto-discovered sessions dir |
| --projects DIR [DIR ...] | Analyze sessions from multiple project directories |
| --session-file PATH | Session file path (required with --context) |
| --window N | Entries before/after --context line (default: 5) |
All output includes JSONL line references (L:NNN or session:LNNN)
and the full filepath to the session file (as a header per session,
or as a legend in stats mode). This lets you jump from any finding
directly to the raw data.
To drill into a specific event with the built-in context viewer:

```bash
python3 {baseDir}/extract.py --context 42 --session-file /path/to/session.jsonl
```

Or manually with sed plus jq or python3 -m json.tool:

```bash
sed -n '42p' /path/to/session.jsonl | python3 -m json.tool
```
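For readers who prefer to stay in Python, the idea of a context window around a line reference can be sketched as below. This is a minimal illustration, not the actual extract.py implementation; the function name `context_window` is hypothetical, and the default window of 5 only mirrors the documented `--window` default.

```python
import json


def context_window(lines, line_no, window=5):
    """Return (line_number, parsed_entry) pairs around a 1-based JSONL line.

    `lines` is any iterable of raw JSONL lines, for example an open file.
    """
    first = max(1, line_no - window)
    last = line_no + window
    entries = []
    for number, raw in enumerate(lines, start=1):
        if number < first:
            continue
        if number > last:
            break
        raw = raw.strip()
        if raw:  # skip blank lines instead of failing on them
            entries.append((number, json.loads(raw)))
    return entries
```

To use it, open a session file and pass the handle directly, e.g. `with open(path, encoding="utf-8") as fh: context_window(fh, 42)`.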
Follow these steps in order. Present findings to the user after each step.
```bash
python3 {baseDir}/extract.py --summary
```
Read the project's AGENTS.md if it exists. Understand what guidance the
agent already has.
Run the frequency analyses and check user corrections:
```bash
python3 {baseDir}/extract.py --commands --stats
python3 {baseDir}/extract.py --failures --stats
python3 {baseDir}/extract.py --reads --stats
python3 {baseDir}/extract.py --corrections
```
Look for:
- `make test | tail -5`, `make test | tail -10`, `make test | tail -20` (noisy output problem)

For the top patterns, use sequences to see what happened:
```bash
# See error narratives
python3 {baseDir}/extract.py --sequences --match "ERROR"

# Deep-dive into specific patterns
python3 {baseDir}/extract.py --commands --match "git add"
python3 {baseDir}/extract.py --failures --match "syntax|paren|not found"
```
The sequence view shows:
- USER messages: what the user asked for or complained about
- BASH/EDIT/READ/WRITE: what the agent did
- !! ERROR: where things went wrong (ground truth: non-zero exit / tool error)

Also check compaction summaries for session-level context:
```bash
python3 {baseDir}/extract.py --compactions
```
When a sweep surfaces something interesting at a specific line, use
--context to see the full untruncated picture — complete tool output,
full user messages, full assistant reasoning and thinking:
```bash
# The filepath is shown in every session header; copy it directly
python3 {baseDir}/extract.py --context 42 --session-file /path/to/session.jsonl

# Wider window for complex sequences
python3 {baseDir}/extract.py --context 42 --session-file /path/to/session.jsonl --window 10
```
This is the primary drill-down tool. Use it whenever a line number catches your attention in the sweep output.
--context covers most drill-down needs, but sometimes you need to ask
questions it can't answer — correlating events far apart in a session,
counting patterns across the whole file, or extracting specific fields.
For those, go straight to the JSONL with jq, grep, or python one-liners.
Session files live in ~/.pi/agent/sessions/<mangled-cwd>/. Each line is
a self-contained JSON object. Key fields:
```text
type:                   "message" | "compaction" | "session" | ...
message.role:           "user" | "assistant" | "toolResult"
message.content[].type: "text" | "toolCall"
message.content[].name: "bash" | "read" | "edit" | "write" | ...
message.isError:        true/false (on toolResult messages)
```
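The field layout above is enough to write small ad-hoc analyses. The following is a minimal sketch that tallies tool calls by name and counts failed tool results in one session. It relies only on the fields listed above; the function name `tally_session` is hypothetical, and the defensive checks reflect an assumption that some entries may lack a `message` or carry non-list content.

```python
import json
from collections import Counter


def tally_session(lines):
    """Count tool calls by name and toolResult errors in one session.

    `lines` is any iterable of raw JSONL lines, for example an open file.
    """
    tool_calls = Counter()
    errors = 0
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        entry = json.loads(raw)
        if entry.get("type") != "message":
            continue  # compaction, session, and other entry types
        message = entry.get("message") or {}
        role = message.get("role")
        if role == "assistant":
            content = message.get("content")
            # Assumption: content is a list of typed parts; skip anything else.
            for part in content if isinstance(content, list) else []:
                if isinstance(part, dict) and part.get("type") == "toolCall":
                    tool_calls[part.get("name", "?")] += 1
        elif role == "toolResult" and message.get("isError") is True:
            errors += 1
    return tool_calls, errors
```

Calling `tally_session(open(path, encoding="utf-8"))` returns a `Counter` of tool names and the number of failed results, which gives a quick cross-check against the `--summary` figures.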
Example investigations:
```bash
# Get full context around a suspicious line
# (replace <dir> and <file> with a real session path)
S=~/.pi/agent/sessions/<dir>/<file>.jsonl
sed -n '40,50p' "$S" | jq -r '.message.content[]?.text // empty' | head -40

# All user messages (complaints, corrections, instructions)
jq -r 'select(.type=="message") | select(.message.role=="user")
  | .message.content[]? | select(.type=="text") | .text' "$S"

# Full error output for a specific toolResult (not truncated)
sed -n '42p' "$S" | jq -r '.message.content[].text'

# All tool calls in order with their names (quick narrative)
jq -r 'select(.type=="message") | select(.message.role=="assistant")
  | .message.content[]? | select(.type=="toolCall")
  | "\(.name): \(.arguments | tostring | .[0:120])"' "$S"

# Count consecutive edits to the same file (struggle detector)
jq -r 'select(.type=="message") | select(.message.role=="assistant")
  | .message.content[]? | select(.type=="toolCall")
  | select(.name=="edit") | .arguments.path' "$S" \
  | uniq -c | sort -rn | head

# All toolResult errors (first 300 characters of each output)
jq -r 'select(.type=="message") | select(.message.role=="toolResult")
  | select(.message.isError==true)
  | "[\(.message.toolName)] \(.message.content[0].text[0:300])"' "$S"

# What did the assistant say right after an error? (reaction pattern)
# Use line numbers: if error is at L42, check L43
sed -n '43p' "$S" | jq -r '.message.content[]?
  | select(.type=="text") | .text[0:300]'

# Find retry/struggle loops: the same bash command issued back-to-back
# (uniq -c only counts adjacent repeats; @json keeps multi-line commands on one line)
jq -r 'select(.type=="message") | select(.message.role=="assistant")
  | .message.content[]? | select(.type=="toolCall")
  | select(.name=="bash") | .arguments.command | @json' "$S" \
  | uniq -c | sort -rn | head
```
Trust your judgment. If the extract.py output raises a question, answer it directly from the data. The JSONL has everything — full tool output, full user messages, full assistant reasoning. Don't stay at the summary level when the details matter.
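When a shell pipeline is not expressive enough, a short Python script can answer the same kind of question. The sketch below looks for the same bash command repeated several times within a span of JSONL lines, which `uniq -c` cannot detect when other entries sit between the repeats. The function names, the window of 10 lines, and the threshold of 3 repeats are illustrative assumptions; the field names follow the key-field list above.

```python
import json
from collections import defaultdict


def iter_bash_commands(lines):
    """Yield (line_number, command) for each bash toolCall in a session."""
    for number, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue
        entry = json.loads(raw)
        message = entry.get("message") or {}
        if entry.get("type") != "message" or message.get("role") != "assistant":
            continue
        content = message.get("content")
        for part in content if isinstance(content, list) else []:
            if not isinstance(part, dict):
                continue
            if part.get("type") == "toolCall" and part.get("name") == "bash":
                command = (part.get("arguments") or {}).get("command")
                if isinstance(command, str):
                    yield number, command


def find_retry_loops(lines, window=10, min_repeats=3):
    """Return {command: [line numbers]} for commands repeated in a tight span."""
    seen = defaultdict(list)
    for number, command in iter_bash_commands(lines):
        seen[command].append(number)  # line numbers arrive in ascending order
    loops = {}
    for command, numbers in seen.items():
        # Slide over the occurrences looking for min_repeats within `window` lines.
        for i in range(len(numbers) - min_repeats + 1):
            cluster = numbers[i:i + min_repeats]
            if cluster[-1] - cluster[0] <= window:
                loops[command] = numbers
                break
    return loops
```

The returned line numbers can be fed straight into `--context` to inspect what happened between the retries.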
For each issue found, assess:
Rank by frequency × sessions. Focus on the top issues.
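One way to make this ranking concrete is sketched below. It assumes that "frequency" means the total number of occurrences of an issue and "sessions" means the number of distinct sessions in which it appears; the document does not define either term more precisely. The function name and the input shape are hypothetical.

```python
from collections import defaultdict


def rank_issues(occurrences):
    """Rank issues by (total occurrences) x (distinct sessions affected).

    `occurrences` is an iterable of (issue_label, session_id) pairs,
    one pair per observed occurrence.
    Returns rows of (issue, occurrences, sessions, score), best first.
    """
    counts = defaultdict(int)
    sessions = defaultdict(set)
    for issue, session_id in occurrences:
        counts[issue] += 1
        sessions[issue].add(session_id)
    scored = [
        (issue, counts[issue], len(sessions[issue]), counts[issue] * len(sessions[issue]))
        for issue in counts
    ]
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scored, key=lambda row: (-row[3], row[0]))
```

Multiplying by the session count pushes down issues that occur many times in a single bad session and favors those that recur across sessions, which are the ones worth fixing at the root.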
For each issue, present to the user:
Ask two questions:
1. Is the tool/command/infrastructure itself broken or misleading? Fix it directly: Makefile target, helper script, git hook, config file, whatever it takes. The agent shouldn't need guidance to work around broken tooling.
2. Is it knowledge the agent needs? Two options, depending on scope:
Often the answer is both: fix the broken command AND document the correct usage. Present options to the user, wait for them to pick, then implement.
Verify the change works:
```bash
# Test that the updated AGENTS.md is loaded and understood
pi -p "Read AGENTS.md and confirm you see the new guidance about <topic>"

# Or test the specific behavior the new guidance should produce
pi -p "Show me how you would <thing the agent kept getting wrong>"
```
Then commit and move to the next issue.
When analysis reveals a recurring multi-step workflow — the agent writing the same helper scripts across sessions, following the same complex sequence of commands, or needing the same domain knowledge repeatedly — that's a skill, not an AGENTS.md entry.
Recognizing skill opportunities:
--projects to check)

Creating the skill:
Extract intent from session data. The sessions already show what the skill needs to do. Look at the successful command sequences, the scripts the agent wrote, and the user corrections that refined the approach.
Scaffold the SKILL.md. Use proper frontmatter:
```yaml
---
name: my-skill
description: What it does and when to trigger. Be specific about
  contexts; include phrases users would say. Err on the side of
  triggering too often rather than too rarely.
---
```
Write the workflow. Translate the successful patterns from session data into clear steps. Explain why each step matters — the agent is smart and responds better to reasoning than rigid instructions.
Bundle repeated scripts. If the agent kept writing the same helper
script across sessions, write it once and put it in the skill directory.
Reference it from SKILL.md with {baseDir}/scripts/helper.py.
Test it. Run a quick pi session to verify the skill triggers and the workflow produces good results:
```bash
pi -p "<prompt that should trigger the skill>"
```
Keep it lean. SKILL.md under 500 lines. If it grows beyond that,
split into a main SKILL.md and references/ directory with detailed
docs that get loaded on demand.
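The frontmatter and length guidance above can be checked mechanically before testing the skill. The sketch below is a deliberately crude check, not a YAML parser: it only looks for top-level `name` and `description` keys between the opening and closing `---` lines and compares the line count with the 500-line guideline. The function name `check_skill_md` is hypothetical.

```python
def check_skill_md(text, max_lines=500):
    """Return a list of problems found in a SKILL.md string (empty if none)."""
    problems = []
    lines = text.splitlines()
    if len(lines) >= max_lines:
        problems.append(f"{len(lines)} lines; keep SKILL.md under {max_lines}")
    if not lines or lines[0].strip() != "---":
        problems.append("frontmatter does not start on the first line")
        return problems
    # Find the line that closes the frontmatter block.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        problems.append("frontmatter is never closed with '---'")
        return problems
    # Only unindented "key: value" lines count as top-level keys.
    keys = {
        line.split(":", 1)[0]
        for line in lines[1:end]
        if ":" in line and not line[0].isspace()
    }
    for required in ("name", "description"):
        if required not in keys:
            problems.append(f"frontmatter is missing '{required}'")
    return problems
```

Running it on the file contents, e.g. `check_skill_md(open("SKILL.md", encoding="utf-8").read())`, yields an empty list when the scaffold looks sane.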
After implementing fixes, verify they had the intended effect in
subsequent sessions. Use --before/--after to compare windows:
```bash
# Check the pattern before the fix
python3 {baseDir}/extract.py --failures --match "the-pattern" --before 2026-03-01

# Check after the fix
python3 {baseDir}/extract.py --failures --match "the-pattern" --after 2026-03-01
```
If the pattern still appears at similar frequency, the fix didn't work. Investigate why — the root cause may be different from what you assumed.
This step is optional during the initial analysis but valuable as a follow-up in a later session.
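Raw counts from the two windows are only comparable if each window covers a similar number of sessions, so it can help to normalize before judging. The sketch below compares per-session rates; the hit and session counts would be read off the extract.py output by hand. The function name and the 0.5 threshold for calling a fix effective are arbitrary assumptions, not part of the tool.

```python
def compare_windows(before_hits, before_sessions, after_hits, after_sessions,
                    improved_ratio=0.5):
    """Compare per-session hit rates before and after a fix.

    Returns a dict with both rates and a coarse verdict.
    """
    if before_sessions <= 0 or after_sessions <= 0:
        raise ValueError("both windows need at least one session")
    before_rate = before_hits / before_sessions
    after_rate = after_hits / after_sessions
    if before_rate == 0:
        verdict = "no baseline"  # nothing to improve on
    elif after_rate <= before_rate * improved_ratio:
        verdict = "improved"
    elif after_rate > before_rate:
        verdict = "worse"
    else:
        verdict = "similar"  # the fix probably missed the root cause
    return {"before_rate": before_rate, "after_rate": after_rate, "verdict": verdict}
```

For example, 12 hits over 10 sessions before the fix and 2 hits over 10 sessions after it give rates of 1.2 and 0.2 per session, which this sketch labels as improved.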
When the user suspects patterns span multiple projects, or wants to identify cross-cutting skill opportunities:
```bash
# Analyze failures across two projects
python3 {baseDir}/extract.py --failures --stats \
  --projects ~/co/project-a ~/co/project-b

# Check corrections across projects
python3 {baseDir}/extract.py --corrections \
  --projects ~/co/project-a ~/co/project-b --last 5
```
Each project directory gets resolved to its pi sessions directory automatically. Output labels include the project name for context.
Cross-project patterns are strong signals for global skills (placed
in ~/.agents/skills/) or global AGENTS.md entries.
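Once each project's recurring patterns have been noted, finding the cross-cutting ones is a simple set operation. The sketch below assumes the patterns have already been normalized to short labels by hand; the function name, the input shape, and the default threshold of two projects are illustrative assumptions.

```python
from collections import defaultdict


def cross_project_patterns(patterns_by_project, min_projects=2):
    """Return {pattern: sorted project names} for patterns seen in several projects.

    `patterns_by_project` maps a project name to an iterable of pattern labels.
    """
    projects_for = defaultdict(set)
    for project, patterns in patterns_by_project.items():
        for pattern in patterns:
            projects_for[pattern].add(project)
    return {
        pattern: sorted(projects)
        for pattern, projects in sorted(projects_for.items())
        if len(projects) >= min_projects
    }
```

Patterns that survive this filter are the candidates for a global skill or a global AGENTS.md entry; those seen in a single project are better handled in that project's own AGENTS.md.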
$PWD, environment variables, or discovery snippets.