Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

axiomantic/analyzing-skill-usage

Name: analyzing-skill-usage
Author: axiomantic

skills/analyzing-skill-usage/SKILL.md

npx skillsauth add axiomantic/spellbook analyzing-skill-usage


Analyzing Skill Usage

<ROLE>Skill Performance Analyst. You parse session transcripts, extract skill usage events, score each invocation, and produce comparative metrics. Your analysis drives skill improvement decisions. Scores derive from observable events — never speculation.</ROLE>

<analysis>Before analysis: clarify session scope, skills of interest, and comparison criteria.</analysis>
<reflection>After analysis: summarize patterns observed, statistical confidence, and actionable findings.</reflection>

Invariant Principles

Evidence Over Intuition: Scores derive from observable session events, not speculation
Context Matters: Correction after skill completion differs from mid-workflow abandonment
Version Awareness: Track skill variants for A/B comparison when version markers present
Statistical Humility: Small sample sizes warrant tentative conclusions

Inputs / Outputs

| Input | Required | Description |
|-------|----------|-------------|
| session_paths | No | Specific sessions (defaults to recent project sessions) |
| skills | No | Filter to specific skills (defaults to all) |
| compare_versions | No | If true, group by version markers for A/B analysis |

| Output | Description |
|--------|-------------|
| skill_report | Per-skill metrics: invocations, completion rate, correction rate, avg tokens |
| weak_skills | Skills ranked by failure indicators |
| version_comparison | A/B results when versions detected |

Extraction Protocol

1. Load Sessions

from spellbook.sessions.parser import load_jsonl, list_sessions_with_samples
from spellbook.extractors.message_utils import get_tool_calls, get_content, get_role

Sessions at: ~/.claude/projects/<project-encoded>/*.jsonl
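The imports above come from spellbook, and their signatures are not shown here. As a rough stand-in, the following sketch enumerates and parses session files using only the standard library. The names `iter_session_files` and `read_jsonl` are hypothetical and are not part of spellbook.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_session_files(projects_root: Path) -> Iterator[Path]:
    """Yield every *.jsonl file exactly one directory below projects_root."""
    yield from sorted(projects_root.glob("*/*.jsonl"))


def read_jsonl(path: Path) -> list[dict]:
    """Parse one session file, skipping blank and malformed lines."""
    records = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                # Assumption: a bad line (e.g. a truncated final write) is skipped, not fatal.
                continue
    return records
```

A typical call would be `iter_session_files(Path.home() / ".claude" / "projects")`, followed by `read_jsonl` on each path it yields.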

2. Detect Skill Invocation Boundaries

Start Event: Tool call where name == "Skill"

for msg in messages:
    for call in get_tool_calls(msg):
        if call.get("name") == "Skill":
            skill_name = call["input"]["skill"]
            # Record: skill, timestamp, message index

End Event (first match): another Skill tool call (superseded), session end, or compact boundary (type == "system", subtype == "compact_boundary")
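The start and end rules above can be sketched as a single pass over the messages. This is a minimal illustration, not spellbook's implementation: `SkillWindow`, `find_skill_windows`, and the local `tool_calls` helper are hypothetical, and the helper assumes each message carries its tool calls in a `tool_calls` list, which may not match the real transcript layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SkillWindow:
    skill: str
    start: int       # index of the message holding the Skill tool call
    end: int         # index one past the last message in the window
    end_reason: str  # "superseded", "compact_boundary", or "session_end"


def tool_calls(msg: dict) -> list[dict]:
    """Hypothetical stand-in for get_tool_calls; assumes msg["tool_calls"] is a list."""
    return msg.get("tool_calls", [])


def is_compact_boundary(msg: dict) -> bool:
    return msg.get("type") == "system" and msg.get("subtype") == "compact_boundary"


def find_skill_windows(messages: list[dict]) -> list[SkillWindow]:
    windows: list[SkillWindow] = []
    open_skill: Optional[str] = None
    open_start = 0

    def close(end: int, reason: str) -> None:
        # Close the currently open window, if there is one.
        nonlocal open_skill
        if open_skill is not None:
            windows.append(SkillWindow(open_skill, open_start, end, reason))
            open_skill = None

    for index, msg in enumerate(messages):
        if is_compact_boundary(msg):
            close(index, "compact_boundary")
            continue
        for call in tool_calls(msg):
            if call.get("name") == "Skill":
                close(index, "superseded")  # a new Skill call ends the previous window
                open_skill = call.get("input", {}).get("skill", "<unknown>")
                open_start = index
    close(len(messages), "session_end")
    return windows
```

Recording `end_reason` on each window keeps the "superseded" versus "ran to completion" distinction available for the scoring step.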

3. Score Each Invocation

Success Signals (+1 each):

No user correction in skill window
Skill ran to natural completion (not superseded)
Artifact produced (Write/Edit tool after skill)
User continued to new topic

Failure Signals (-1 each):

User correction detected
Same skill re-invoked within 5 messages (retry)
Different skill invoked for what appears to be the same task
Skill abandoned mid-workflow (superseded without output)
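One way to turn the signal lists above into a number is to add one point per success signal and subtract one per failure signal. The sketch below assumes the signals have already been detected and are passed in as booleans. `InvocationSignals` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InvocationSignals:
    user_corrected: bool             # a correction pattern matched inside the skill window
    superseded: bool                 # another Skill call ended the window
    artifact_produced: bool          # a Write/Edit tool call followed the skill
    moved_to_new_topic: bool         # user continued to a new topic
    retried_within_5: bool           # same skill re-invoked within 5 messages
    different_skill_same_task: bool  # different skill invoked for apparently the same task


def score_invocation(s: InvocationSignals) -> int:
    """Sum +1 per success signal and -1 per failure signal; the result lies in [-4, 4]."""
    successes = [
        not s.user_corrected,
        not s.superseded,
        s.artifact_produced,
        s.moved_to_new_topic,
    ]
    failures = [
        s.user_corrected,
        s.retried_within_5,
        s.different_skill_same_task,
        s.superseded and not s.artifact_produced,  # abandoned mid-workflow
    ]
    return sum(successes) - sum(failures)
```

Under this reading a correction costs two points in total, because it removes a success signal and adds a failure signal.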

Correction Detection Patterns:

CORRECTION_PATTERNS = [
    r"\bno\b",                # word boundaries already exclude "not" and "none"
    r"\bstop\b",
    r"\bwrong\b",
    r"\bactually\b",
    r"\bdon'?t\b",
    r"\binstead\b",
    r"\bthat'?s not\b",
]
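A minimal sketch of applying these patterns to a user message follows. The list is repeated so the block runs on its own. Case-insensitive matching is an assumption, and `is_correction` and `find_corrections` are hypothetical helper names. Keyword matching of this kind will also fire on harmless phrases such as "no problem", so a match is best treated as a candidate signal, not as proof.

```python
import re

CORRECTION_PATTERNS = [
    r"\bno\b",          # the trailing \b already rules out "not"
    r"\bstop\b",
    r"\bwrong\b",
    r"\bactually\b",
    r"\bdon'?t\b",
    r"\binstead\b",
    r"\bthat'?s not\b",
]

# One combined, case-insensitive pattern (assumption: user text is free-form prose).
CORRECTION_RE = re.compile("|".join(CORRECTION_PATTERNS), re.IGNORECASE)


def is_correction(text: str) -> bool:
    """Return True when any correction pattern occurs in the text."""
    return CORRECTION_RE.search(text) is not None


def find_corrections(text: str) -> list[str]:
    """Return the distinct matched phrases, lowercased and sorted."""
    return sorted({match.group(0).lower() for match in CORRECTION_RE.finditer(text)})
```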

4. Aggregate Metrics

Per skill, produce:

{
    "skill": "develop",
    "version": "v1",             # or None when no version marker is detected
    "invocations": 15,
    "completions": 12,           # Ran to end without supersede
    "corrections": 3,            # User corrected during
    "retries": 1,                # Same skill re-invoked
    "avg_tokens": 4500,          # Tokens in skill window
    "completion_rate": 0.80,
    "correction_rate": 0.20,
    "score": 0.60,               # Composite score
}
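The aggregation can be sketched as a group-by over scored invocations. The input record keys (`completed`, `corrected`, `retried`, `tokens`) are hypothetical. The composite `score` is not defined in the text, so the formula used here, completion rate minus correction rate, is an assumption chosen only because it reproduces the 0.60 in the example above.

```python
from collections import defaultdict


def aggregate(invocations: list[dict]) -> list[dict]:
    """Group invocation records by (skill, version) and compute per-group metrics."""
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for inv in invocations:
        groups[(inv["skill"], inv.get("version"))].append(inv)

    report = []
    for (skill, version), items in sorted(
        groups.items(), key=lambda kv: (kv[0][0], kv[0][1] or "")
    ):
        n = len(items)
        completions = sum(1 for i in items if i["completed"])
        corrections = sum(1 for i in items if i["corrected"])
        retries = sum(1 for i in items if i["retried"])
        completion_rate = completions / n
        correction_rate = corrections / n
        report.append({
            "skill": skill,
            "version": version,
            "invocations": n,
            "completions": completions,
            "corrections": corrections,
            "retries": retries,
            "avg_tokens": round(sum(i["tokens"] for i in items) / n),
            "completion_rate": round(completion_rate, 2),
            "correction_rate": round(correction_rate, 2),
            # ASSUMPTION: composite score = completion_rate - correction_rate.
            "score": round(completion_rate - correction_rate, 2),
        })
    return report
```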

Analysis Modes

Mode 1: Identify Weak Skills

Rank all skills by composite failure score:

failure_score = (corrections + retries + abandonments) / invocations

Output format:

## Weak Skills Report
| Rank | Skill | Invocations | Failure Rate | Top Failure Mode |
|------|-------|-------------|--------------|------------------|
| 1 | gathering-requirements | 8 | 0.50 | User corrections |
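A sketch of the ranking and report rendering follows. It assumes each metrics record also carries an `abandonments` count, which the formula uses but the aggregate example does not list. The function names and the "Retries" and "Abandonments" labels are hypothetical. Because one invocation can trigger several failure signals, the score can exceed 1.0.

```python
def failure_score(m: dict) -> float:
    """(corrections + retries + abandonments) / invocations, or 0.0 when never invoked."""
    if m["invocations"] == 0:
        return 0.0
    return (m["corrections"] + m["retries"] + m["abandonments"]) / m["invocations"]


def top_failure_mode(m: dict) -> str:
    """Name the failure counter with the highest count (first one wins on ties)."""
    labels = {
        "corrections": "User corrections",
        "retries": "Retries",
        "abandonments": "Abandonments",
    }
    return labels[max(labels, key=lambda key: m[key])]


def weak_skills_report(metrics: list[dict]) -> str:
    """Render the report table with the worst skill ranked first."""
    ranked = sorted(metrics, key=failure_score, reverse=True)
    lines = [
        "## Weak Skills Report",
        "| Rank | Skill | Invocations | Failure Rate | Top Failure Mode |",
        "|------|-------|-------------|--------------|------------------|",
    ]
    for rank, m in enumerate(ranked, start=1):
        lines.append(
            f"| {rank} | {m['skill']} | {m['invocations']} | "
            f"{failure_score(m):.2f} | {top_failure_mode(m)} |"
        )
    return "\n".join(lines)
```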

Mode 2: A/B Testing Versions

When version markers are detected (e.g., skill:v2 or tagged in args):

## A/B Comparison: develop
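| Metric | v1 (n=10) | v2 (n=8) | Delta | Significant |
|--------|-----------|----------|-------|-------------|
| Completion Rate | 0.70 | 0.88 | +0.18 | No (p≈0.59, Fisher exact) |
| Correction Rate | 0.30 | 0.12 | -0.18 | No (p≈0.59, Fisher exact) |
| Avg Tokens | 5200 | 4100 | -1100 | Not tested (needs per-invocation variance) |

**Recommendation**: v2 trends better than v1 on every metric, but with n=10 and n=8 the differences are not statistically significant. Treat v2 as promising and collect more invocations before concluding.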
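The text does not say which significance test backs the "Significant" column. For rate metrics on samples this small, one reasonable choice is a two-sided Fisher exact test, sketched below with the standard library only. The function name is hypothetical. With counts as small as 7 of 10 versus 7 of 8 completions, the two-sided p-value comes out near 0.59, so a gap of that size would not count as significant on its own.

```python
from math import comb


def fisher_exact_two_sided(success_a: int, total_a: int,
                           success_b: int, total_b: int) -> float:
    """Two-sided Fisher exact p-value for a 2x2 table of successes and failures."""
    successes = success_a + success_b
    total = total_a + total_b
    denominator = comb(total, successes)

    def prob(k: int) -> float:
        # Hypergeometric probability that variant A holds exactly k of the successes.
        return comb(total_a, k) * comb(total_b, successes - k) / denominator

    observed = prob(success_a)
    lowest = max(0, successes - total_b)
    highest = min(total_a, successes)
    # Sum every table that is at least as unlikely as the observed one.
    return sum(
        p for p in (prob(k) for k in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
```

Token counts are a continuous metric and would need a different test, for example a permutation test over per-invocation values.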

Execution Steps

Enumerate sessions in target scope
Parse each session, extracting skill events
Score each invocation using signal detection
Aggregate by skill (and version if A/B)
Rank and report based on analysis mode
Surface actionable insights for skill improvement

Version Detection

Look for version markers: skill name suffix (develop:v2), args containing version ("--version v2", "[v2]"), or session date ranges.
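The first two marker styles can be sketched with regular expressions. This assumes versions always look like `v` followed by digits and that the skill arguments are available as a single string. Both are assumptions, and `detect_version` is a hypothetical name. Date-range detection is left out because it depends on when each variant was deployed.

```python
import re
from typing import Optional

# "develop:v2" -> skill "develop", version "v2"
SUFFIX_RE = re.compile(r"^(?P<skill>[^:]+):(?P<version>v\d+)$")

# "--version v2", "--version=v2", or "[v2]" anywhere in the args string
ARG_PATTERNS = [
    re.compile(r"--version[ =](v\d+)\b"),
    re.compile(r"\[(v\d+)\]"),
]


def detect_version(skill_name: str, args: str = "") -> tuple[str, Optional[str]]:
    """Return (base skill name, version marker or None)."""
    suffix = SUFFIX_RE.match(skill_name)
    if suffix:
        return suffix["skill"], suffix["version"]
    for pattern in ARG_PATTERNS:
        found = pattern.search(args)
        if found:
            return skill_name, found.group(1)
    return skill_name, None
```

A name with a colon whose suffix is not a version, such as a namespaced skill, is returned unchanged with no version.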

<CRITICAL>
When comparing versions, require:
- Minimum 5 invocations per variant
- Similar task complexity (manual review recommended)
- Same time period when possible (avoid confounds)
</CRITICAL>

<FORBIDDEN>
- Drawing conclusions from fewer than 5 invocations
- Ignoring context (correction after success ≠ failure)
- Conflating skill issues with user errors
- Reporting without confidence intervals on small samples
</FORBIDDEN>
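The confidence-interval requirement can be met for rate metrics with a Wilson score interval, which behaves better than the normal approximation on small samples. The text does not name a method, so this choice is an assumption, and `wilson_interval` is a hypothetical helper.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denominator = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denominator
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denominator
    return max(0.0, centre - half_width), min(1.0, centre + half_width)
```

For a completion rate of 7 out of 8, the interval runs from roughly 0.53 to 0.98, which shows how little a sample of that size pins down.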

Self-Check

[ ] Sessions loaded and parsed successfully
[ ] Skill invocation boundaries correctly identified
[ ] Correction patterns detected in user messages
[ ] Metrics aggregated per skill (and version if A/B)
[ ] Statistical caveats noted for small samples
[ ] Actionable recommendations provided

<FINAL_EMPHASIS>Skills improve through measurement. Extract events, score honestly, compare rigorously, recommend confidently.</FINAL_EMPHASIS>


Use when evaluating skill effectiveness or comparing skill versions. Triggers: 'how are skills performing', 'skill metrics', 'which skills fire correctly', 'skill invocation analysis', 'compare skill versions', 'analyze skill usage'. Also invoked by skill improvement workflows.

5 stars

research

Updated Apr 3, 2026


npx skillsauth add axiomantic/spellbook analyzing-skill-usage

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.


| Status | Scanner | Description | Percent |
|--------|---------|-------------|---------|
| Clean | Trivy | Container and dependency vulnerability scanner | 95% |
| Clean | Semgrep | Static code analysis for vulnerabilities | 95% |
| Clean | mcp-scan (Snyk) | Model Context Protocol security validation | 95% |
| Skipped | Snyk (dep) | Open source security scanning | 50% |
| Skipped | Socket.dev | Supply chain security analysis | 50% |
| Skipped | VirusTotal | Multi-engine malware detection | 50% |
| Skipped | CrowdStrike | Advanced threat intelligence | 50% |
| Skipped | OSV-Scanner | Open Source Vulnerability database check | 50% |
| Skipped | OWASP Dep-Check | | 50% |

Last scanned: Apr 20, 2026, 3:20 PM · 1.8s · 1 file scanned

SKILL.md

name: analyzing-skill-usage
description: Use when evaluating skill effectiveness or comparing skill versions. Triggers: 'how are skills performing', 'skill metrics', 'which skills fire correctly', 'skill invocation analysis', 'compare skill versions', 'analyze skill usage'. Also invoked by skill improvement workflows.


Related Skills

axiomantic/writing-skills

testing

Verified · Trusted · Community

Use when creating new skills, editing existing skills, or verifying skills work before deployment. Triggers: 'write a skill', 'new skill', 'create a skill', 'skill doesn't work', 'skill isn't firing', 'edit skill', 'skill quality'. NOT for: general prompt improvement (use instruction-engineering) or command creation (use writing-commands).

5 · SKILL.md · Updated Apr 3, 2026

axiomantic/writing-plans

development

Verified · Trusted · Community

Use when you have a spec, design doc, or requirements and need a detailed implementation plan before coding. Triggers: 'write a plan', 'create implementation plan', 'plan this out', 'break this down into steps', 'convert design to tasks', 'implementation order'. Also invoked by develop during planning. NOT for: reviewing existing plans (use reviewing-impl-plans).

5 · SKILL.md · Updated Apr 3, 2026

axiomantic/writing-commands

testing

Verified · Trusted · Community

Use when creating new commands, editing existing commands, or reviewing command quality. Triggers: 'write command', 'new command', 'create a command', 'review command', 'fix command', 'command doesn't work', 'add a slash command'. NOT for: skill creation (use writing-skills).

5 · SKILL.md · Updated Apr 3, 2026

axiomantic/verifying-hunches

development

Verified · Trusted · Community

Use when about to claim discovery during debugging. Triggers: "I found", "this is the issue", "I think I see", "looks like the problem", "that's why", "the bug is", "root cause", "culprit", "smoking gun", "aha", "got it", "here's what's happening", "the reason is", "causing the", "explains why", "mystery solved", "figured it out", "the fix is", "should fix", "this will fix". Also invoked by debugging, scientific-debugging, systematic-debugging before any root cause claim.

5 · SKILL.md · Updated Apr 3, 2026


Install manually


# Clone the repo
git clone https://github.com/axiomantic/spellbook.git

# Copy into Claude Code skills folder (global)
cp -r spellbook/skills/analyzing-skill-usage ~/.claude/skills/


Repository

axiomantic/spellbook

5 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT