skills/agentic-skill-discovery/SKILL.md
Automated discovery and matching of agent skills for dynamic task routing and capability assessment
`npx skillsauth add curiositech/windags-skills agentic-skill-discovery`
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Version: 1.0
Domain: Autonomous Learning Systems, AI Architecture, Reinforcement Learning
Autonomous skill discovery faces a chicken-and-egg problem: the system must learn what counts as success while also learning how to achieve it. It is both student and teacher, and the resulting circular dependency has to be broken by architecturally separating skill proposal from skill validation.
If task has ground-truth success criteria (e.g., navigation, manipulation with clear goals) → Use Single-Model Architecture with fast LLM evaluation → Cost: Low | Precision: ~75% | Use case: Predetermined tasks
If task requires open-ended skill discovery (e.g., creative tasks, exploration) → Use Dual-Process Architecture (System 1/System 2) → System 1: Fast LLM evaluation for training loops → System 2: Independent VLM validation for library admission → Cost: High | Precision: ~76% | Use case: Autonomous learning
If complex multi-step behavior is desired → Top-Down Quest Decomposition → Start with failed complex task → decompose into subtasks → learn missing prerequisites → Success rate: 43.75% vs 12.50% for bottom-up chaining
If building skill library from primitives → Bottom-Up Skill Chaining only if primitives are well-matched to end goals → Risk: Combinatorial explosion and missing "middle skills" → Fallback: Switch to top-down when composition fails
If performance plateaus with existing approach → Check for circular dependencies (same model proposing and evaluating) → If detected: Implement architectural separation → If not detected: Add RAG with successful skill patterns for environmental knowledge distillation
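The rules above can be read as a small routing function. The sketch below is one hypothetical encoding: `TaskProfile`, its field names, and `recommend` are illustrative names that do not come from the skill itself, and the returned strings only restate the recommendations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskProfile:
    """Hypothetical flags describing a task; one flag per decision rule."""
    has_ground_truth: bool = False
    open_ended: bool = False
    multi_step: bool = False
    building_from_primitives: bool = False
    plateaued: bool = False
    same_model_proposes_and_evaluates: bool = False


def recommend(task: TaskProfile) -> List[str]:
    """Return the recommendations whose conditions hold, in document order."""
    recs: List[str] = []
    if task.has_ground_truth:
        recs.append("single-model architecture")
    if task.open_ended:
        recs.append("dual-process architecture")
    if task.multi_step:
        recs.append("top-down quest decomposition")
    if task.building_from_primitives:
        recs.append("bottom-up skill chaining (fall back to top-down)")
    if task.plateaued:
        # The plateau rule branches on whether a circular dependency exists.
        if task.same_model_proposes_and_evaluates:
            recs.append("separate proposer and validator")
        else:
            recs.append("add RAG over successful skill patterns")
    return recs
```

Treating the rules as independent checks, rather than mutually exclusive branches, is an assumption; a task can match several of them at once.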
| Scenario | Fast Eval (System 1) | Slow Eval (System 2) | Library Admission Rule |
|----------|---------------------|----------------------|------------------------|
| Training loops | LLM-generated success functions | None | Not applicable |
| Skill validation | LLM evaluation | VLM verification | Require both to pass |
| Library contamination risk | High tolerance | Zero tolerance | System 2 override required |
| Cost constraints | Optimize for speed | Optimize for accuracy | Asymmetric cost acceptance |
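The "require both to pass" admission rule can be sketched as a two-stage gate. This is a minimal illustration, not the skill's actual implementation: `admit_to_library`, `fast_eval`, and `slow_eval` are hypothetical names, and both evaluators are assumed to be plain callables that return a boolean for a recorded trajectory.

```python
from typing import Any, Callable, Dict

Trajectory = Dict[str, Any]
Evaluator = Callable[[Trajectory], bool]


def admit_to_library(
    trajectory: Trajectory,
    fast_eval: Evaluator,
    slow_eval: Evaluator,
) -> bool:
    """Admit a skill only if both System 1 and System 2 accept it.

    The cheap check runs first, so the expensive validator is only
    invoked for candidates that already passed the fast evaluation.
    """
    if not fast_eval(trajectory):
        return False
    # System 2 has the final say: a fast-eval pass alone never admits a skill.
    return slow_eval(trajectory)
```

In practice `fast_eval` might wrap an LLM-generated success function and `slow_eval` an independent VLM check; running them in this order keeps the asymmetric cost confined to library admission.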
- Symptoms: High reported success rates but poor task completion; library growing rapidly.
- Detection rule: If the same model generates rewards AND evaluates success, you have a circular dependency.
- Root cause: An LLM acting as both player and referee creates evaluation bias.
- Fix: Implement architectural separation with an independent validator (e.g., a VLM for visual tasks).
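The detection rule can be checked mechanically against a pipeline configuration. The sketch below assumes a hypothetical `EvalConfig` with one model identifier per role. It interprets "evaluates success" as the final library-admission judgment, which is an assumption, and it also flags the related case described in this document where library admission reuses the training-loop evaluator.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EvalConfig:
    """Hypothetical description of which model fills which role."""
    reward_model: str          # generates reward / success functions
    training_evaluator: str    # fast evaluation inside training loops
    admission_validator: str   # decides what enters the skill library


def audit(config: EvalConfig) -> List[str]:
    """Return human-readable findings; an empty list means no issue found."""
    findings: List[str] = []
    if config.reward_model == config.admission_validator:
        findings.append(
            "circular dependency: reward model also judges library admission"
        )
    if config.training_evaluator == config.admission_validator:
        findings.append(
            "contamination risk: library admission reuses the training evaluator"
        )
    return findings
```

Comparing identifiers only catches the obvious case; two differently named deployments of the same underlying model would pass this check.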
- Symptoms: Large skill library but inability to solve complex tasks; skills don't compose effectively.
- Detection rule: If skill count grows but complex task success rate stays flat, you have the "middle skills" problem.
- Root cause: Bottom-up chaining can't discover skills between primitives and complex goals.
- Fix: Switch to top-down quest decomposition from desired complex behaviors.
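The detection rule compares two trends, so it can be monitored from per-checkpoint logs. This is a rough sketch: the function name and the `min_growth` and `flat_tolerance` thresholds are illustrative assumptions, not values from the source.

```python
from typing import Sequence


def middle_skills_suspected(
    skill_counts: Sequence[int],
    success_rates: Sequence[float],
    min_growth: int = 5,
    flat_tolerance: float = 0.02,
) -> bool:
    """Flag the case where the library grows but complex-task success does not.

    Both sequences hold one value per checkpoint, oldest first.
    """
    if len(skill_counts) != len(success_rates):
        raise ValueError("sequences must have the same length")
    if len(skill_counts) < 2:
        raise ValueError("need at least two checkpoints to compare")
    growth = skill_counts[-1] - skill_counts[0]
    improvement = success_rates[-1] - success_rates[0]
    # Library grew meaningfully, yet success improved by no more than the tolerance.
    return growth >= min_growth and improvement <= flat_tolerance
```

Comparing only the first and last checkpoint is deliberately crude; a smoothed trend would be more robust to noisy evaluation runs.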
- Symptoms: RAG retrieval happening but no performance improvement; treating retrieval as generic examples.
- Detection rule: If RAG doesn't improve constraint learning (e.g., physics understanding), it's just expensive prompting.
- Root cause: Missing the environmental knowledge distillation mechanism.
- Fix: Structure skill library as progressive model of environmental affordances, not just example collection.
- Symptoms: Performance degrades over time as library grows; false positives compound.
- Detection rule: If library admission uses same evaluation as training loops, contamination will cascade.
- Root cause: Bad skills enable bad compositions; library errors compound unlike training noise.
- Fix: Use expensive independent validation for library admission, cheap evaluation for training only.
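To see why library errors compound, consider a composed behavior that works only if every skill in its chain is valid. Under the simplifying assumption (not stated in the source) that each skill is valid independently with probability equal to the library's precision, a chain of k skills is fully valid with probability precision^k. The sketch below applies this to the 76% and 46% precision figures reported in this document's drawer example.

```python
def chain_validity(precision: float, chain_length: int) -> float:
    """Probability that every skill in a chain is valid, assuming independence."""
    if not 0.0 <= precision <= 1.0:
        raise ValueError("precision must be between 0 and 1")
    if chain_length < 0:
        raise ValueError("chain_length must be non-negative")
    return precision ** chain_length


for p in (0.76, 0.46):
    print(f"precision={p:.2f}: 3-skill chain valid with p={chain_validity(p, 3):.3f}")
# precision=0.76: 3-skill chain valid with p=0.439
# precision=0.46: 3-skill chain valid with p=0.097
```

Even under this idealized model, a precision gap at admission time widens with every additional skill in the chain, which is the argument for spending more on validation at the library boundary than inside training loops.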
- Symptoms: Skills feel disconnected from natural semantic boundaries; forced abstraction levels.
- Detection rule: If manually specifying skill granularity contradicts LLM proposals, check domain alignment.
- Root cause: LLMs encode semantic task boundaries from training data.
- Fix: Let system propose skills freely, observe discovered granularity, adjust only for domain-specific needs.
Scenario: Agent must learn to open various drawer types without predefined success criteria.
System 1 (Fast) Process:
`drawer_handle_grasped = gripper_distance < 0.02`
System 2 (Slow) Process:
Expert vs Novice:
Outcome: Library maintains 76% precision vs 46% without System 2
Scenario: "Prepare coffee" fails with bottom-up skill chaining despite having primitives like "grasp_cup", "pour_liquid"
Bottom-Up Failure Analysis:
Top-Down Decomposition:
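As a rough illustration of the top-down loop described in this document (start from the failed complex task, decompose it, learn missing prerequisites, retry), here is a minimal recursive sketch. Everything in it is hypothetical: `learn_top_down`, the `attempt` and `decompose` callables, and the toy plan, including the invented middle skill `position_cup_under_spout`. Only `grasp_cup` and `pour_liquid` come from the scenario.

```python
from typing import Callable, Dict, List, Set


def learn_top_down(
    task: str,
    library: Set[str],
    attempt: Callable[[str, Set[str]], bool],
    decompose: Callable[[str], List[str]],
    max_depth: int = 3,
) -> bool:
    """Try a task; on failure, learn its subtasks first, then retry."""
    if task in library:
        return True
    if attempt(task, library):
        library.add(task)
        return True
    if max_depth == 0:
        return False
    subtasks = decompose(task)
    if not subtasks:
        return False  # nothing left to break down
    for subtask in subtasks:
        if not learn_top_down(subtask, library, attempt, decompose, max_depth - 1):
            return False
    # Prerequisites are now in the library, so retry the original task.
    if attempt(task, library):
        library.add(task)
        return True
    return False


# Toy stand-ins for a learned policy and an LLM decomposer (assumptions).
PLAN: Dict[str, List[str]] = {
    "prepare_coffee": ["grasp_cup", "position_cup_under_spout", "pour_liquid"],
}


def toy_decompose(task: str) -> List[str]:
    return PLAN.get(task, [])


def toy_attempt(task: str, library: Set[str]) -> bool:
    # A task succeeds once all of its planned prerequisites are available.
    return all(step in library for step in PLAN.get(task, []))


library = {"grasp_cup", "pour_liquid"}
learned = learn_top_down("prepare_coffee", library, toy_attempt, toy_decompose)
```

In this toy run the missing middle skill is discovered only because the failed goal was decomposed; chaining the two existing primitives would never have proposed it.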
Trade-off Analysis:
When Top-Down Backfires:
Scenario: Agent must learn manipulation skills in new environment without explicit physics constraints
Without RAG (25% success rate):
`reward = distance_to_target`
With RAG (46% success rate):
`gripper_height > 0.05` constraint
`reward = distance_to_target if gripper_height > 0.05 else -10`
Expert vs Novice:
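The two reward expressions in this example can be compared side by side as functions. This is a sketch under stated assumptions: the function names are hypothetical, and the distance is negated so that getting closer yields a higher reward. That sign is an assumption about intent, since the source writes the term simply as `distance_to_target`.

```python
def reward_without_rag(distance_to_target: float) -> float:
    """Naive reward: only distance matters, no environmental constraint."""
    # Assumption: negate distance so that smaller distance means higher reward.
    return -distance_to_target


def reward_with_rag(
    distance_to_target: float,
    gripper_height: float,
    min_height: float = 0.05,   # constraint value from the example
    penalty: float = -10.0,     # penalty value from the example
) -> float:
    """Reward that also enforces the height constraint retrieved via RAG."""
    if gripper_height > min_height:
        return -distance_to_target
    # Constraint violated: return the fixed penalty regardless of distance.
    return penalty
```

For example, `reward_with_rag(0.3, gripper_height=0.02)` returns the penalty even though the gripper is fairly close to the target, which is the behavior the unconstrained reward cannot express.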
When RAG Backfires:
Do NOT use this skill for:
Delegate to these skills instead: