skills/skill-logger/SKILL.md
Logs and scores skill usage quality, tracking output effectiveness, user satisfaction signals, and improvement opportunities. Expert in skill analytics, quality metrics, feedback loops, and continuous improvement. Activate on "skill logging", "skill quality", "skill analytics", "skill scoring", "skill performance", "skill metrics", "track skill usage", "skill improvement". NOT for creating skills (use agent-creator), skill documentation (use skill-coach), or runtime debugging (use debugger skills).
npx skillsauth add curiositech/windags-skills skill-logger
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Track, measure, and improve skill quality through systematic logging and scoring.
Skill Category Analysis:
├─ Code Generation Skills
│ ├─ IF output type = "code" → Priority: syntax_correctness, test_pass_rate, user_edits
│ ├─ IF includes tool calls → Track: tool_success_rate, retry_count
│ └─ ELSE → Standard completion signals
│
├─ Analysis/Advisory Skills
│ ├─ IF output is recommendations → Priority: follow_up_rate, acceptance_rate
│ ├─ IF research-heavy → Track: source_quality, comprehensiveness
│ └─ ELSE → Focus on user_satisfaction, edit_ratio
│
├─ Creative/Content Skills
│ ├─ IF artistic output → Priority: user_acceptance, revision_requests
│ ├─ IF writing/documentation → Track: readability_score, user_edits
│ └─ ELSE → Standard quality metrics
│
└─ Meta/System Skills
  ├─ IF affects other skills → Priority: downstream_impact, system_health
  ├─ IF automation focused → Track: execution_success, error_recovery
  └─ ELSE → Completion rate, efficiency metrics
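The category tree above can be read as a lookup from a skill's profile to the metrics worth prioritizing. The following is a minimal sketch, not the skill's actual implementation: the `SkillProfile` class, its field names, and the `select_metrics` function are hypothetical, and it assumes the first matching branch in each category wins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillProfile:
    """Hypothetical description of a skill; field names are illustrative."""
    category: str                      # "code", "analysis", "creative", or "meta"
    output_type: str = ""              # e.g. "code", "recommendations", "artistic"
    uses_tools: bool = False
    research_heavy: bool = False
    affects_other_skills: bool = False
    automation_focused: bool = False


def select_metrics(p: SkillProfile) -> list[str]:
    """Return the metrics for the first branch of the tree that matches."""
    if p.category == "code":
        if p.output_type == "code":
            return ["syntax_correctness", "test_pass_rate", "user_edits"]
        if p.uses_tools:
            return ["tool_success_rate", "retry_count"]
        return ["standard_completion_signals"]
    if p.category == "analysis":
        if p.output_type == "recommendations":
            return ["follow_up_rate", "acceptance_rate"]
        if p.research_heavy:
            return ["source_quality", "comprehensiveness"]
        return ["user_satisfaction", "edit_ratio"]
    if p.category == "creative":
        if p.output_type == "artistic":
            return ["user_acceptance", "revision_requests"]
        if p.output_type in ("writing", "documentation"):
            return ["readability_score", "user_edits"]
        return ["standard_quality_metrics"]
    if p.category == "meta":
        if p.affects_other_skills:
            return ["downstream_impact", "system_health"]
        if p.automation_focused:
            return ["execution_success", "error_recovery"]
        return ["completion_rate", "efficiency_metrics"]
    raise ValueError(f"unknown skill category: {p.category!r}")


print(select_metrics(SkillProfile("analysis", research_heavy=True)))
```

Raising on an unknown category is a design choice for this sketch; a logger that must never fail could fall back to a default metric set instead.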
Signal Availability Decision Tree:
├─ Real-time signals available?
│ ├─ YES → Collect: completion_rate, token_efficiency, tool_success
│ └─ NO → Skip to delayed collection
│
├─ User interaction possible?
│ ├─ YES → Request: thumbs_up/down, edit_ratio measurement
│ └─ NO → Infer from: retry_requests, follow_up_questions
│
├─ Output testable?
│ ├─ Code → Run: syntax_check, basic_execution
│ ├─ Data → Validate: format_compliance, completeness
│ └─ Text → Check: length_appropriateness, structure
│
└─ Delayed validation available?
  ├─ YES → Schedule: outcome_tracking, revert_detection
  └─ NO → Mark as: immediate_signals_only
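One way to turn the signal availability tree into something executable is a function that answers its four questions in order and returns a collection plan. This is a sketch under assumptions: the function name, the argument names, and the plan keys are invented for illustration, while the signal names come from the tree.

```python
def plan_signal_collection(realtime: bool, interactive: bool,
                           output_kind: str, delayed: bool) -> dict[str, list[str]]:
    """Build a collection plan by walking the four questions of the tree."""
    # Validation checks per testable output kind; other kinds get no checks.
    checks = {
        "code": ["syntax_check", "basic_execution"],
        "data": ["format_compliance", "completeness"],
        "text": ["length_appropriateness", "structure"],
    }
    return {
        "realtime": (["completion_rate", "token_efficiency", "tool_success"]
                     if realtime else []),
        "feedback": (["thumbs_up_down", "edit_ratio"] if interactive
                     else ["retry_requests", "follow_up_questions"]),
        "validation": checks.get(output_kind, []),
        "delayed": (["outcome_tracking", "revert_detection"] if delayed
                    else ["immediate_signals_only"]),
    }


plan = plan_signal_collection(realtime=True, interactive=False,
                              output_kind="code", delayed=False)
print(plan["feedback"])
```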
Based on skill usage context:
IF high_stakes_usage (production, important decisions):
    → Use strict scoring: require 90+ for "good", weight errors heavily
ELIF experimental_usage (testing, learning):
    → Use lenient scoring: 70+ acceptable, focus on learning signals
ELIF routine_usage (daily workflow):
    → Use balanced scoring: standard thresholds, efficiency emphasis
ELSE (unknown context):
    → Default to balanced scoring with conservative error handling
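The context rules above could be captured as a small table of scoring profiles. In this sketch only the 90 and 70 thresholds come from the text; the balanced threshold of 80 and every `error_weight` value are assumptions chosen for illustration, as are the names `ScoringProfile`, `profile_for`, and `is_good`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringProfile:
    name: str
    good_threshold: int    # minimum score that counts as "good"
    error_weight: float    # multiplier applied to error penalties (assumed)


PROFILES = {
    "high_stakes": ScoringProfile("strict", 90, 2.0),
    "experimental": ScoringProfile("lenient", 70, 0.5),
    "routine": ScoringProfile("balanced", 80, 1.0),   # 80 is an assumed "standard" threshold
}

# Unknown context: balanced thresholds with a heavier, more conservative error weight.
DEFAULT_PROFILE = ScoringProfile("balanced", 80, 1.5)


def profile_for(context: str) -> ScoringProfile:
    """Pick the scoring profile for a usage context, falling back to the default."""
    return PROFILES.get(context, DEFAULT_PROFILE)


def is_good(score: float, context: str) -> bool:
    """Apply the context's threshold to a 0-100 quality score."""
    return score >= profile_for(context).good_threshold


print(is_good(85, "experimental"), is_good(85, "high_stakes"))
```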
Symptoms: Skill produces unnecessarily verbose outputs, token usage 2x+ expected baseline
Detection: IF tokens_output > baseline_tokens * 2.0 AND user_edit_ratio > 0.6
Diagnosis: Skill optimizing for completeness over conciseness
Fix: Add explicit length constraints, example-based training on concise outputs
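The detection rule above needs a baseline to compare against. A minimal sketch, assuming the baseline is the median output length of past invocations (the text does not say how the baseline is derived); both function names are hypothetical.

```python
from statistics import median


def baseline_tokens(history: list[int]) -> float:
    """Assumed baseline: the median output length of past invocations."""
    return median(history)


def is_over_verbose(tokens_output: int, baseline: float, user_edit_ratio: float) -> bool:
    """Flag output that is more than 2x the baseline AND heavily edited by the user."""
    return tokens_output > baseline * 2.0 and user_edit_ratio > 0.6


base = baseline_tokens([400, 500, 600])
print(is_over_verbose(1200, base, 0.7))
```

Requiring both conditions keeps long but well-received outputs from being flagged.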
Symptoms: Skill quality scores trending downward with no apparent cause
Detection: IF quality_trend_7d < -5 AND no_skill_changes AND stable_usage_pattern
Diagnosis: External factors (user expectations, data changes) affecting relative performance
Fix: Recalibrate baselines, investigate environmental changes, update scoring criteria
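The text does not define `quality_trend_7d`. The sketch below assumes it is the newest daily average minus the oldest one inside a 7-day window; a fitted slope would be an equally reasonable reading. Function names and the sample scores are illustrative.

```python
def quality_trend_7d(daily_avgs: list[float]) -> float:
    """Assumed definition: newest daily average minus the oldest, over the last 7 days."""
    window = daily_avgs[-7:]          # list is ordered oldest to newest
    if len(window) < 2:
        return 0.0                    # not enough data to speak of a trend
    return window[-1] - window[0]


def is_silent_drift(daily_avgs: list[float], skill_changed: bool, usage_stable: bool) -> bool:
    """Mirror the detection rule: falling scores, no skill changes, stable usage."""
    return quality_trend_7d(daily_avgs) < -5 and not skill_changed and usage_stable


scores = [84, 83, 81, 80, 79, 78, 77]
print(quality_trend_7d(scores), is_silent_drift(scores, skill_changed=False, usage_stable=True))
```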
Symptoms: Quality scores don't reflect actual user satisfaction, delayed negative signals
Detection: IF immediate_score > 80 AND delayed_satisfaction < 50 AND lag > 24hrs
Diagnosis: Relying too heavily on completion metrics vs. outcome metrics
Fix: Implement delayed signal collection, weight outcome signals higher, add follow-up tracking
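Weighting outcome signals higher can be sketched as a blend of the immediate score with a delayed satisfaction score once the latter arrives. The 0.6 outcome weight is an assumption for illustration, not a value from the text, and both function names are hypothetical.

```python
from typing import Optional


def has_signal_lag(immediate_score: float, delayed_satisfaction: float, lag_hours: float) -> bool:
    """Mirror the detection rule: looked good at first, judged poorly more than a day later."""
    return immediate_score > 80 and delayed_satisfaction < 50 and lag_hours > 24


def blended_score(immediate_score: float, delayed_satisfaction: Optional[float],
                  outcome_weight: float = 0.6) -> float:
    """Blend immediate and delayed scores; fall back to immediate until delayed data exists."""
    if delayed_satisfaction is None:
        return immediate_score
    return (1 - outcome_weight) * immediate_score + outcome_weight * delayed_satisfaction


print(has_signal_lag(90, 40, lag_hours=36), round(blended_score(90, 40), 1))
```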
Symptoms: Most skills scoring 85+ but users reporting quality issues
Detection: IF avg_skill_score > 85 AND user_complaints > baseline AND edit_ratio > 0.4
Diagnosis: Scoring criteria too lenient, missing key quality dimensions
Fix: Tighten scoring thresholds, add user satisfaction weight, implement comparative scoring
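Comparative scoring can be approximated by ranking a skill against its peers rather than reading its absolute score. The sketch below uses a simple percentile rank; the function names and the `complaint_baseline` parameter are hypothetical.

```python
def is_score_inflated(avg_skill_score: float, user_complaints: int,
                      complaint_baseline: int, edit_ratio: float) -> bool:
    """Mirror the detection rule: high scores alongside complaints and heavy editing."""
    return avg_skill_score > 85 and user_complaints > complaint_baseline and edit_ratio > 0.4


def percentile_rank(score: float, peer_scores: list[float]) -> float:
    """Percentage of peer scores strictly below `score` (0.0 when there are no peers)."""
    if not peer_scores:
        return 0.0
    below = sum(1 for s in peer_scores if s < score)
    return 100.0 * below / len(peer_scores)


# An absolute score of 86 looks fine, yet it beats only one of four peers.
print(percentile_rank(86, [85, 88, 90, 92]))
```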
Symptoms: Quality scores fluctuating wildly, no clear improvement signal
Detection: IF score_variance > 20 AND no_clear_trend AND random_pattern
Diagnosis: Collecting too many weak signals, insufficient signal filtering
Fix: Focus on top 3-5 signals per skill category, implement signal confidence weighting
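Signal confidence weighting might look like the following: keep only the most trusted signals and average them by confidence. This is a sketch; the `(value, confidence)` representation and the function name are assumptions, and how confidence is estimated is left open.

```python
def confidence_weighted_score(signals: dict[str, tuple[float, float]], top_k: int = 5) -> float:
    """Average the top_k most confident signals, weighting each by its confidence.

    `signals` maps a signal name to (value on a 0-100 scale, confidence in 0-1).
    """
    strongest = sorted(signals.values(), key=lambda vc: vc[1], reverse=True)[:top_k]
    total_confidence = sum(conf for _value, conf in strongest)
    if total_confidence == 0:
        return 0.0                    # no trustworthy signal to score from
    return sum(value * conf for value, conf in strongest) / total_confidence


signals = {
    "test_pass_rate": (80.0, 1.0),
    "user_edits": (40.0, 0.5),
    "follow_up_questions": (0.0, 0.1),   # weak signal, dropped when top_k=2
}
print(round(confidence_weighted_score(signals, top_k=2), 2))
```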
Scenario: The api-architect skill's quality score dropped from 89 to 67 over 3 days.
Step 1 - Detect Regression
-- Query recent performance, one row per day
SELECT DATE(timestamp) AS day, AVG(quality_score) AS avg_score, COUNT(*) AS uses
FROM skill_invocations
WHERE skill_name = 'api-architect'
  AND timestamp > datetime('now', '-7 days')
GROUP BY DATE(timestamp)
ORDER BY day DESC;
-- Result shows: 89→78→71→67 trend with stable usage (12-15 daily uses)
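To try the daily-average query end to end, it can be run against an in-memory SQLite database from Python. The table layout below is an assumption inferred from the column names used in the queries, and the inserted scores are made-up demo data, not real measurements.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

DAILY_AVG_SQL = """
SELECT DATE(timestamp) AS day, AVG(quality_score) AS avg_score, COUNT(*) AS uses
FROM skill_invocations
WHERE skill_name = ? AND timestamp > datetime('now', ?)
GROUP BY DATE(timestamp)
ORDER BY day DESC
"""


def daily_averages(conn, skill_name, days=7):
    """Return (day, avg_score, uses) rows for the last `days` days, newest first."""
    return conn.execute(DAILY_AVG_SQL, (skill_name, f"-{days} days")).fetchall()


def build_demo_db():
    """Create an in-memory database with illustrative rows (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE skill_invocations ("
        "skill_name TEXT, quality_score REAL, timestamp TEXT, errors_json TEXT)"
    )
    now = datetime.now(timezone.utc)
    # days_ago -> two demo scores per day; timestamps use SQLite's UTC text format.
    demo = {3: (88, 90), 2: (77, 79), 1: (70, 72), 0: (66, 68)}
    for days_ago, scores in demo.items():
        ts = (now - timedelta(days=days_ago)).strftime("%Y-%m-%d %H:%M:%S")
        for score in scores:
            conn.execute(
                "INSERT INTO skill_invocations VALUES (?, ?, ?, ?)",
                ("api-architect", score, ts, "[]"),
            )
    return conn


rows = daily_averages(build_demo_db(), "api-architect")
print([avg for _day, avg, _uses in rows])
```

Passing the skill name and the time window as bound parameters keeps the query reusable across skills and avoids string interpolation into SQL.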
Step 2 - Diagnose Cause
-- Check error patterns among recent low-scoring invocations
SELECT errors_json, COUNT(*) AS occurrences
FROM skill_invocations
WHERE skill_name = 'api-architect'
  AND quality_score < 70
  AND timestamp > datetime('now', '-3 days')
GROUP BY errors_json
ORDER BY occurrences DESC;
-- Result: 8 instances of "Missing import statements" error
-- This is NEW - wasn't appearing before
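Grouping on the raw `errors_json` text only merges rows whose error lists are identical. If the column holds a JSON array of message strings (an assumption; the text does not show its format), counting individual messages in Python gives a cleaner picture. The function name is hypothetical.

```python
import json
from collections import Counter


def count_error_messages(errors_json_rows: list[str]) -> Counter:
    """Count each error message across rows, assuming each row is a JSON array of strings."""
    counts = Counter()
    for raw in errors_json_rows:
        for message in json.loads(raw or "[]"):   # treat empty/NULL as "no errors"
            counts[message] += 1
    return counts


rows = [
    '["Missing import statements"]',
    '["Missing import statements", "Timeout"]',
    '[]',
]
print(count_error_messages(rows).most_common(1))
```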
Step 3 - Analyze Specific Failures
# Examine failed invocations
# (get_low_score_invocations is a helper assumed to exist; it is not defined here)
failed_cases = get_low_score_invocations('api-architect', days=3)
import_failures = []
for case in failed_cases:
    if 'import' in case.errors:
        # Pattern: Generated code missing required imports
        # User edit ratio: 0.7 (high - users adding imports)
        # Tool success rate: normal (tools work, output incomplete)
        import_failures.append(case)
Step 4 - Root Cause
The skill was recently updated to use a more concise code style, but the import detection logic wasn't updated to handle the new patterns.
Step 5 - Fix Implementation
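# Add import detection rule to skill
import re

IMPORT_PATTERNS = [
    r'from \w+',    # Python: from x import y
    r'import \w+',  # Python, Java, ES modules
    r'require\(',   # CommonJS
    r'#include',    # C / C++
]

def ensure_imports_present(code):
    """Return True if the code contains at least one import-like statement."""
    # Checking which external functions/classes are used, and automatically
    # adding the missing imports, is language-specific and not implemented here.
    return any(re.search(pattern, code) for pattern in IMPORT_PATTERNS)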
Step 6 - Validate Fix
Monitor for 48 hours.
Expert Insight: The novice approach would be to simply recalibrate the scoring threshold. The expert approach recognizes that a sudden quality drop with stable usage indicates a systematic issue that calls for root cause analysis and a targeted fix.
Do NOT use skill-logger for:
- Creating skills: use agent-creator instead
- Skill documentation: use skill-coach instead
- Runtime debugging: use debugger or error-analyst skills
- System logs: use devops-automator
- system-monitor skill
Delegate to: skill-coach, agent-creator, or devops-automator.