oliver-kriska/skill-monitor

Name: skill-monitor
Author: oliver-kriska

.claude/skills/skill-monitor/SKILL.md

npx skillsauth add oliver-kriska/claude-elixir-phoenix skill-monitor


Skill Monitor

Closed-loop skill effectiveness monitoring. Reads session metrics, computes per-skill signals, identifies what's working and what needs improvement.

Inspired by the deploy-monitor-evaluate-improve feedback loop: skills get better over time instead of staying static.

Requirements

Requires .claude/session-metrics/metrics.jsonl from /session-scan. If no data: suggest running /session-scan first.

Usage

/skill-monitor                     # Dashboard: all skills
/skill-monitor --skill review      # Deep-dive on one skill
/skill-monitor --improve           # Generate improvement recommendations
/skill-monitor --window 30d        # Change comparison window (default: 7d)

What Main Context Does

Step 1: Parse Arguments

Extract from $ARGUMENTS:

--skill NAME: Focus on one skill (e.g., review, plan, investigate)
--improve: Spawn analysis agent for improvement recommendations
--window PERIOD: Comparison window (7d, 30d, all; default: 7d)
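The three flags above can be parsed with the standard library. This is a minimal sketch, not the skill's actual parser: the function name `parse_skill_monitor_args` is hypothetical, and it assumes `$ARGUMENTS` arrives as a single shell-style string.

```python
import argparse
import shlex

# Window values listed in Step 1; the default is 7d.
VALID_WINDOWS = ("7d", "30d", "all")


def parse_skill_monitor_args(arguments: str) -> argparse.Namespace:
    """Parse the raw $ARGUMENTS string into skill, improve, and window."""
    parser = argparse.ArgumentParser(prog="/skill-monitor", add_help=False)
    parser.add_argument("--skill", default=None, metavar="NAME")
    parser.add_argument("--improve", action="store_true")
    parser.add_argument("--window", default="7d", choices=VALID_WINDOWS)
    return parser.parse_args(shlex.split(arguments))


args = parse_skill_monitor_args("--skill review --window 30d")
print(args.skill, args.improve, args.window)  # review False 30d
```

Restricting `--window` to the listed choices means an unsupported period such as `14d` is rejected at parse time rather than silently treated as the default.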

Step 2: Load Metrics

Read .claude/session-metrics/metrics.jsonl. For each entry, extract the skill_effectiveness field (added by compute-metrics.py v2).

Filter by window period. Count sessions with and without skill usage.

If no skill_effectiveness data exists in metrics: "Metrics were computed before skill tracking was added. Run /session-scan --rescan to recompute."
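The loading step might look roughly like the sketch below. It assumes each line of metrics.jsonl is one JSON object per session and that sessions carry an ISO 8601 `timestamp` field; that field name, the shape of `skill_effectiveness` (a mapping keyed by skill name), and both function names are assumptions made for illustration, not part of the documented schema.

```python
import json
from datetime import datetime, timedelta, timezone

WINDOW_DAYS = {"7d": 7, "30d": 30, "all": None}


def load_sessions(lines, window="7d", now=None):
    """Return (sessions_in_window, tracking_present) from JSONL lines.

    Assumes an ISO 8601 "timestamp" field per entry (hypothetical name).
    tracking_present is False when no entry has skill_effectiveness at all,
    which is the case where a rescan should be suggested.
    """
    now = now or datetime.now(timezone.utc)
    days = WINDOW_DAYS[window]
    cutoff = None if days is None else now - timedelta(days=days)

    sessions = []
    tracking_present = False
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if "skill_effectiveness" in entry:
            tracking_present = True
        if cutoff is not None:
            # Accept a trailing "Z" on Python versions that do not parse it.
            stamp = datetime.fromisoformat(entry["timestamp"].replace("Z", "+00:00"))
            if stamp.tzinfo is None:
                stamp = stamp.replace(tzinfo=timezone.utc)
            if stamp < cutoff:
                continue
        sessions.append(entry)
    return sessions, tracking_present


def split_by_skill_usage(sessions):
    """Separate sessions that used at least one skill from those that used none."""
    with_skills = [s for s in sessions if s.get("skill_effectiveness")]
    without_skills = [s for s in sessions if not s.get("skill_effectiveness")]
    return with_skills, without_skills
```

In practice the lines would come from opening `.claude/session-metrics/metrics.jsonl` in read mode, which keeps the file untouched.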

Step 3: Compute Per-Skill Aggregates

For each skill found across all sessions, aggregate:

| Metric               | Computation                                     |
|----------------------|-------------------------------------------------|
| Total invocations    | Sum of invocation_count across sessions         |
| Sessions used in     | Count of sessions containing this skill         |
| Action rate          | Weighted avg of per-session action_rate         |
| Avg post-errors      | Weighted avg of avg_post_errors                 |
| Avg post-corrections | Weighted avg of avg_post_corrections            |
| Outcome distribution | Count of effective/friction/no_action/mixed     |
| Effectiveness score  | action_rate - (0.3 * avg_post_corrections)      |
| Adjusted score       | For analysis/check skills, use lower thresholds |
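As an illustration of the table above, the sketch below aggregates one skill's per-session records. The table does not say what the weighted averages are weighted by, so this assumes `invocation_count` is the weight; the `outcome` field name and the function name `aggregate_skill` are also hypothetical.

```python
from collections import Counter


def aggregate_skill(records):
    """Aggregate one skill's per-session records into dashboard metrics.

    Weighted averages use invocation_count as the weight (an assumption).
    """
    total = sum(r["invocation_count"] for r in records)

    def weighted(field):
        if total == 0:
            return 0.0
        return sum(r[field] * r["invocation_count"] for r in records) / total

    action_rate = weighted("action_rate")
    corrections = weighted("avg_post_corrections")
    return {
        "total_invocations": total,
        "sessions_used_in": len(records),
        "action_rate": action_rate,
        "avg_post_errors": weighted("avg_post_errors"),
        "avg_post_corrections": corrections,
        "outcomes": dict(Counter(r["outcome"] for r in records)),
        # Effectiveness score exactly as defined in the table.
        "effectiveness_score": action_rate - 0.3 * corrections,
    }


records = [
    {"invocation_count": 3, "action_rate": 1.0, "avg_post_errors": 0.0,
     "avg_post_corrections": 0.0, "outcome": "effective"},
    {"invocation_count": 1, "action_rate": 0.0, "avg_post_errors": 2.0,
     "avg_post_corrections": 1.0, "outcome": "friction"},
]
summary = aggregate_skill(records)
print(f"{summary['effectiveness_score']:.3f}")  # 0.675
```

With these two sessions the weighted action rate is 0.75 and the weighted corrections are 0.25, so the score is 0.75 - 0.3 * 0.25 = 0.675.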

Skill type weighting: Analysis and check skills (verify, triage, perf, boundaries, pr-review, audit) have low action rates BY DESIGN — their success is "found issues" or "confirmed things pass". Apply adjusted thresholds:

| Skill Type                                    | Flag Threshold | Expected Action Rate |
|-----------------------------------------------|----------------|----------------------|
| Execution (work, quick, full)                 | < 0.5          | > 0.7                |
| Analysis (perf, boundaries, audit, pr-review) | < 0.3          | 0.3-0.5              |
| Check (verify, triage)                        | < 0.1          | 0.0-0.3              |
| Knowledge (compound, learn, brief)            | < 0.5          | > 0.5                |

Also compute baseline friction (average friction of sessions without any skill usage) and skill friction (average friction of sessions with skill usage). Delta = skill_friction - baseline_friction. A negative delta means sessions that use skills show less friction than the baseline (good).
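The friction comparison is a difference of two means. A small sketch, assuming each session exposes a single numeric friction value (how that value is stored in metrics.jsonl is not specified here, and `friction_delta` is a hypothetical name):

```python
from statistics import mean


def friction_delta(with_skills, without_skills):
    """Return (baseline, skill, delta); values are None when a group is empty."""
    baseline = mean(without_skills) if without_skills else None
    skill = mean(with_skills) if with_skills else None
    delta = None if baseline is None or skill is None else skill - baseline
    return baseline, skill, delta


baseline, skill, delta = friction_delta([0.20, 0.16], [0.30, 0.34])
print(f"{baseline:.2f} {skill:.2f} {delta:+.2f}")  # 0.32 0.18 -0.14
```

Returning `None` for an empty group keeps the dashboard from reporting a delta when there is no baseline to compare against.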

Step 4: Display Dashboard

Dashboard mode (no --skill):

## Skill Effectiveness Dashboard (last {window})

Baseline friction (no skills): 0.32 | With skills: 0.18 | Delta: -0.14

| Skill           | Uses | Sessions | Action% | Errors | Corrections | Outcome    | Score |
|-----------------|------|----------|---------|--------|-------------|------------|-------|
| /phx:review     | 12   | 8        | 92%     | 0.5    | 0.1         | effective  | 0.89  |
| /phx:plan       | 9    | 7        | 100%    | 0.2    | 0.0         | effective  | 1.00  |
| /phx:investigate| 5    | 5        | 80%     | 1.2    | 0.4         | mixed      | 0.68  |

Skills needing attention: /phx:investigate (high post-errors)

Flag skills using type-adjusted thresholds (see weighting table above). Also flag if avg_post_corrections > 1 or outcome is predominantly "friction". When displaying flagged skills, note if the flag is "expected" for the skill type (e.g., verify at 0.24 is normal for a check skill).
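One way to read the flagging rules is sketched below. Several points are assumptions rather than documented behavior: the threshold is compared against the action rate, skills not named in the weighting table fall back to the execution threshold, and "predominantly friction" is taken to mean more than half of recorded outcomes. The names `SKILL_TYPES`, `skill_type`, and `flag_reasons` are hypothetical.

```python
# Flag thresholds by skill type, taken from the weighting table.
SKILL_TYPES = {
    "execution": ({"work", "quick", "full"}, 0.5),
    "analysis": ({"perf", "boundaries", "audit", "pr-review"}, 0.3),
    "check": ({"verify", "triage"}, 0.1),
    "knowledge": ({"compound", "learn", "brief"}, 0.5),
}


def skill_type(name):
    """Look up a skill's type; unknown names fall back to "execution" (assumption)."""
    for type_name, (members, _) in SKILL_TYPES.items():
        if name in members:
            return type_name
    return "execution"


def flag_reasons(name, action_rate, avg_post_corrections, outcomes):
    """Return the reasons a skill should be surfaced; empty list means no flag."""
    reasons = []
    kind = skill_type(name)
    threshold = SKILL_TYPES[kind][1]
    if action_rate < threshold:
        reasons.append(f"action rate {action_rate:.2f} below {kind} threshold {threshold}")
    if avg_post_corrections > 1:
        reasons.append("avg_post_corrections > 1")
    total = sum(outcomes.values())
    if total and outcomes.get("friction", 0) > total / 2:
        reasons.append("outcome predominantly friction")
    return reasons


print(flag_reasons("verify", 0.24, 0.0, {"no_action": 4}))  # []
print(len(flag_reasons("work", 0.24, 0.0, {"effective": 1})))  # 1
```

The same 0.24 action rate passes for a check skill and is flagged for an execution skill, which is the point of the type adjustment. The function only lists reasons, in keeping with surfacing data rather than judging it.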

Skill deep-dive (--skill NAME):

Show per-session breakdown for that skill, including session IDs, dates, and individual outcome signals. If session reports exist in .claude/session-analysis/, reference them.

Step 5: Improvement Mode (--improve)

Spawn skill-effectiveness-analyzer agent:

Agent(subagent_type="skill-effectiveness-analyzer", model="sonnet", prompt="""
Analyze skill effectiveness data and recommend improvements.

Metrics data: {aggregated_metrics_json}

Sessions with friction outcomes: {session_ids}

For each underperforming skill:
1. Identify failure patterns from outcome signals
2. Propose specific skill/agent changes
3. Suggest new Iron Laws if patterns are systematic

Write recommendations to: .claude/skill-metrics/recommendations-{date}.md
""")

Step 6: Write Output

Write aggregated metrics to .claude/skill-metrics/dashboard-{date}.json:

{
  "computed_at": "2026-03-03T14:00:00Z",
  "window": "7d",
  "baseline_friction": 0.32,
  "skill_friction": 0.18,
  "friction_delta": -0.14,
  "skills": { ... },
  "flagged_skills": ["investigate"]
}

Append-only: never modify previous dashboard files.
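The append-only rule can be enforced at the file-system level by opening the output in exclusive-create mode. This is a sketch under two assumptions: `{date}` is rendered as an ISO `YYYY-MM-DD` string, and `write_dashboard` is a hypothetical helper name.

```python
import json
from datetime import date
from pathlib import Path


def write_dashboard(dashboard, out_dir=".claude/skill-metrics", day=None):
    """Write dashboard-{date}.json, refusing to overwrite an existing file."""
    day = day or date.today()
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"dashboard-{day.isoformat()}.json"
    # Mode "x" raises FileExistsError instead of truncating a previous dashboard.
    with path.open("x", encoding="utf-8") as handle:
        json.dump(dashboard, handle, indent=2)
    return path
```

A second run on the same day raises `FileExistsError`, so the caller has to decide explicitly how to name the new file instead of silently replacing the earlier one.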

Iron Laws

NEVER modify metrics.jsonl — read-only from this skill
Baseline comparison is mandatory — raw numbers without baseline are meaningless
Flag, don't judge — surface data, let the human decide what to fix
Evidence tags on recommendations — every suggestion needs session citations

Integration

/session-scan → metrics.jsonl (with skill_effectiveness)
       ↓
/skill-monitor → dashboard + flagged skills
       ↓
/skill-monitor --improve → recommendations
       ↓
Developer updates skills/agents → deploy → repeat

References

references/effectiveness-metrics.md — Full metrics schema and evaluation criteria
references/improvement-template.md — Template for improvement recommendations

Analyze skill effectiveness across sessions. Computes per-skill metrics (action rate, friction, outcomes), identifies degrading skills, and generates improvement recommendations. Requires session-scan data in metrics.jsonl.

207 stars

data-ai

Updated Apr 7, 2026

npx skillsauth add oliver-kriska/claude-elixir-phoenix skill-monitor

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Status        | Scanner         | Description                                |
|---------------|-----------------|--------------------------------------------|
| Clean (95%)   | Trivy           | Container and dependency vulnerability scanner |
| Clean (95%)   | Semgrep         | Static code analysis for vulnerabilities   |
| Clean (95%)   | mcp-scan (Snyk) | Model Context Protocol security validation |
| Skipped (50%) | Snyk (dep)      | Open source security scanning              |
| Skipped (50%) | Socket.dev      | Supply chain security analysis             |
| Skipped (50%) | VirusTotal      | Multi-engine malware detection             |
| Skipped (50%) | CrowdStrike     | Advanced threat intelligence               |
| Skipped (50%) | OSV-Scanner     | Open Source Vulnerability database check   |
| Skipped (50%) | OWASP Dep-Check |                                            |

Last scanned: Apr 7, 2026, 8:07 PM · 5.4s · 3 files scanned

SKILL.md

name: skill-monitor
description: Analyze skill effectiveness across sessions. Computes per-skill metrics (action rate, friction, outcomes), identifies degrading skills, and generates improvement recommendations. Requires session-scan data in metrics.jsonl.
argument-hint: [--skill NAME] [--improve] [--window 7d|30d|all]
disable-model-invocation: true

Install manually

# Clone the repo
git clone https://github.com/oliver-kriska/claude-elixir-phoenix.git

# Copy into Claude Code skills folder (global)
cp -r claude-elixir-phoenix/.claude/skills/skill-monitor ~/.claude/skills/

Repository

oliver-kriska/claude-elixir-phoenix

207 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT