skills/automatic-stateful-prompt-improver/SKILL.md
Automatically intercepts and optimizes prompts using the prompt-learning MCP server. Learns from performance over time via embedding-indexed history. Uses APE, OPRO, and DSPy patterns. Activate on "optimize prompt", "improve this prompt", "prompt engineering", or ANY complex task request. Requires the prompt-learning MCP server. NOT for simple questions (just answer them), NOT for direct commands (just execute them), NOT for conversational responses (no optimization needed).
npx skillsauth add curiositech/windags-skills automatic-stateful-prompt-improver
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
PROMPT ASSESSMENT:
├── Simple question/command (what, when, how)
│   └── Skip optimization → Answer directly
├── Complex task (multi-step, reasoning, technical)
│   ├── Token budget < 1000
│   │   └── APE: 3-5 iterations
│   ├── Token budget 1000-5000
│   │   └── OPRO: 5-10 iterations
│   └── Token budget > 5000
│       └── DSPy compilation: 10-20 iterations
└── Reusable template/system prompt
    └── Full optimization with historical retrieval
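The assessment tree above can be read as a small routing function. The following is a minimal sketch, not the skill's actual implementation: the `Plan` type, the `assess_prompt` name, and the `kind` labels ("simple", "complex", "template") are hypothetical, and the boundary at exactly 5000 tokens is assumed to fall in the OPRO range.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Plan:
    # None means "skip optimization and answer directly".
    technique: Optional[str]
    # (min, max) iteration range, when the tree specifies one.
    iterations: Optional[Tuple[int, int]]


def assess_prompt(kind: str, token_budget: int = 0) -> Plan:
    """Map a prompt to an optimization plan, following the assessment tree.

    `kind` is a hypothetical label: "simple", "complex", or "template".
    """
    if kind == "simple":
        return Plan(None, None)
    if kind == "template":
        return Plan("full optimization with historical retrieval", None)
    if kind != "complex":
        raise ValueError(f"unknown prompt kind: {kind!r}")
    if token_budget < 1000:
        return Plan("APE", (3, 5))
    if token_budget <= 5000:  # assumption: 5000 itself counts as mid-range
        return Plan("OPRO", (5, 10))
    return Plan("DSPy", (10, 20))
```

For example, `assess_prompt("complex", 800)` yields an APE plan with a 3-5 iteration range, while `assess_prompt("simple")` yields a plan with no technique at all.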
OPTIMIZATION TECHNIQUE SELECTION:
├── Instruction rewriting needed
│   └── Use APE (Automatic Prompt Engineer)
├── Parameter tuning with constraints
│   └── Use OPRO (Optimization by PROmpting)
├── Complex pipeline with multiple modules
│   └── Use DSPy compilation patterns
└── Unknown/exploratory domain
    └── Hybrid APE→OPRO→DSPy cascade
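The technique selection above amounts to a lookup with a fallback. A minimal sketch follows; the key names in `TECHNIQUE_BY_NEED` and the `select_techniques` function are hypothetical labels chosen for illustration, and anything unrecognized is assumed to count as an unknown/exploratory domain.

```python
from typing import Dict, List

# Hypothetical labels for the three recognized needs in the tree.
TECHNIQUE_BY_NEED: Dict[str, List[str]] = {
    "instruction_rewriting": ["APE"],
    "constrained_parameter_tuning": ["OPRO"],
    "multi_module_pipeline": ["DSPy"],
}

# Fallback for unknown or exploratory domains: run the stages in order.
HYBRID_CASCADE: List[str] = ["APE", "OPRO", "DSPy"]


def select_techniques(need: str) -> List[str]:
    """Return the ordered list of techniques to apply for a given need."""
    # Copy the list so callers cannot mutate the shared defaults.
    return list(TECHNIQUE_BY_NEED.get(need, HYBRID_CASCADE))
```

Returning a list in every case keeps the caller simple: a single-technique run and the hybrid cascade are handled by the same loop.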
ITERATION CONTROL:
├── Improvement < 1% for 3 rounds → STOP
├── Quality score > 0.95 → STOP
├── Max iterations reached → STOP
├── User satisfaction confirmed → STOP
└── Continue → Next iteration
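The stop conditions above can be sketched as a single check run after each iteration. This is an illustrative sketch under stated assumptions: `stop_reason` and its parameters are hypothetical names, scores are assumed to lie on a 0-1 scale, and "improvement < 1%" is interpreted as an absolute gain below 0.01 between consecutive rounds (the source does not say whether the threshold is absolute or relative).

```python
from typing import List, Optional


def stop_reason(
    scores: List[float],
    max_iterations: int,
    user_satisfied: bool = False,
    min_gain: float = 0.01,   # assumption: absolute gain on a 0-1 scale
    patience: int = 3,        # rounds of low improvement before stopping
    target: float = 0.95,
) -> Optional[str]:
    """Return why optimization should stop, or None to continue.

    `scores` holds the quality score recorded after each completed iteration.
    """
    if user_satisfied:
        return "user satisfied"
    if scores and scores[-1] > target:
        return "quality target reached"
    if len(scores) >= max_iterations:
        return "max iterations reached"
    if len(scores) > patience:
        # Compare the last `patience` consecutive gains against the threshold.
        recent = scores[-(patience + 1):]
        gains = [later - earlier for earlier, later in zip(recent, recent[1:])]
        if all(gain < min_gain for gain in gains):
            return "improvement plateau"
    return None
```

The order of the checks is a design choice: user confirmation and the quality target are tested first so that a successful run is reported as a success and not as an exhausted budget.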
FEEDBACK INTEGRATION:
├── Task successful (user confirms/metrics good)
│   └── Record positive feedback + embed for retrieval
├── Task failed/poor quality
│   └── Record negative feedback + analyze failure mode
└── Unclear outcome
    └── Ask user for explicit feedback before recording
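The feedback branches above can be illustrated with an in-memory log. This sketch does not call the prompt-learning MCP server; `FeedbackLog`, `integrate`, and the record fields are hypothetical stand-ins, and embedding is represented only by a flag.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FeedbackLog:
    """Hypothetical in-memory stand-in for the skill's feedback store."""

    records: List[Dict[str, object]] = field(default_factory=list)

    def integrate(
        self, prompt: str, outcome: Optional[bool], failure_mode: str = ""
    ) -> str:
        """Record feedback, or signal that the user must be asked first.

        `outcome` is True (success), False (failure), or None (unclear).
        """
        if outcome is None:
            # Unclear outcome: nothing is recorded until the user confirms.
            return "ask_user"
        record: Dict[str, object] = {"prompt": prompt, "positive": outcome}
        if outcome:
            record["embed_for_retrieval"] = True
        else:
            record["failure_mode"] = failure_mode or "unanalyzed"
        self.records.append(record)
        return "recorded"
```

Keeping the unclear branch free of side effects means ambiguous outcomes never enter the retrieval history.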
Over-Optimization Spiral
Template Obsession
Historical Overfitting
Capability Misjudgment
Measurement Blindness
Original: "Make this code better"
def process_data(data):
    results = []
    for item in data:
        if item > 0:
            results.append(item * 2)
    return results
Decision Point Navigation:
What Novice Misses: Vague "make better" doesn't specify criteria
What Expert Catches: Need explicit dimensions (performance, style, edge cases)
Result: Clear analysis of list comprehension opportunity, edge case handling, type hints
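As an illustration of where such an analysis might lead, here is one possible revision of `process_data` that applies the three points named in the result (list comprehension, edge case handling, type hints). It is a sketch, not output captured from the skill; treating `None` as empty input is an assumption about the desired edge-case behavior.

```python
from typing import Iterable, List, Optional, Union

Number = Union[int, float]


def process_data(data: Optional[Iterable[Number]]) -> List[Number]:
    """Return each strictly positive item doubled, preserving input order."""
    if data is None:  # assumption: a missing input is treated as empty
        return []
    # Zero and negative values are filtered out, as in the original loop.
    return [item * 2 for item in data if item > 0]
```

Behavior for non-empty input matches the original loop; only the `None` case is new.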
Original: "Help me think through this decision"
Decision Point Navigation:
Optimized Template:
Decision Analysis Framework:
1. SITUATION: State the decision clearly with constraints
2. STAKEHOLDERS: List affected parties and their interests
3. OPTIONS: Generate 3-5 distinct alternatives
4. CRITERIA: Define success metrics and weighting
5. TRADE-OFFS: Analyze each option against criteria
6. RECOMMENDATION: Select best option with confidence level
Quality Gates Applied: Template completeness, reusability score, user satisfaction
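A reusable template like the one above can be stored once and rendered per request. The sketch below is illustrative: `FRAMEWORK_STEPS` copies the six steps from the template, while `render_decision_prompt` and the "Decision to analyze:" header line are hypothetical additions.

```python
from typing import List, Tuple

# The six steps of the Decision Analysis Framework, in order.
FRAMEWORK_STEPS: List[Tuple[str, str]] = [
    ("SITUATION", "State the decision clearly with constraints"),
    ("STAKEHOLDERS", "List affected parties and their interests"),
    ("OPTIONS", "Generate 3-5 distinct alternatives"),
    ("CRITERIA", "Define success metrics and weighting"),
    ("TRADE-OFFS", "Analyze each option against criteria"),
    ("RECOMMENDATION", "Select best option with confidence level"),
]


def render_decision_prompt(decision: str) -> str:
    """Prefix the user's decision and append the numbered framework."""
    lines = [f"Decision to analyze: {decision}", "", "Decision Analysis Framework:"]
    lines += [
        f"{number}. {name}: {instruction}"
        for number, (name, instruction) in enumerate(FRAMEWORK_STEPS, start=1)
    ]
    return "\n".join(lines)
```

Keeping the steps as data makes the template easy to reuse and to score for completeness, since a missing step is a missing list entry.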
Original: "Set up monitoring"
Decision Point Navigation:
Optimized Prompt: "Design monitoring setup by specifying: (1) Infrastructure scope (servers, containers, applications), (2) Key metrics (performance, availability, business), (3) Alert thresholds and escalation, (4) Technology stack constraints, (5) Budget/complexity limits. Provide implementation roadmap with priorities."
Before/After Trade-offs:
Pre-execution checklist before calling optimize_prompt:
Post-optimization validation:
Quality scoring rubric (0-100):
Do NOT use this skill for:
Delegate instead:
Gray areas requiring judgment: