skills/wang-2023-voyager/SKILL.md
Mental models and decision frameworks for building autonomous agents that continuously learn, explore, and accumulate skills in open-ended environments without human supervision
npx skillsauth add curiositech/windags-skills wang-2023-voyager
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
IF proposed task requires skills/items agent doesn't have:
IF proposed task is similar to recently completed tasks:
IF task involves completely new domain (new biome/tool/mechanic):
IF current task has clear semantic match in library (similarity >0.85):
IF current task has partial matches (similarity 0.6-0.85):
IF current task has no good matches (similarity <0.6):
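The three retrieval thresholds above can be sketched as a small triage function. This is a minimal illustration, not the skill's actual implementation: the tier names ("clear", "partial", "none"), the function names, and the candidate shape `{ name, similarity }` are assumptions made for the example. Only the 0.85 and 0.6 cut-offs come from the rules above, and the sketch returns a label only, since the action for each tier is not given here.

```javascript
// Hypothetical tier names; only the 0.85 and 0.6 thresholds come from the rules above.
function classifyMatch(similarity) {
  if (similarity > 0.85) return "clear";    // clear semantic match
  if (similarity >= 0.6) return "partial";  // partial match (0.6-0.85)
  return "none";                            // no good match (<0.6)
}

// Pick the highest-similarity candidate and report which tier it falls in.
// `candidates` is assumed to be an array of { name, similarity } objects.
function triageRetrieval(candidates) {
  if (candidates.length === 0) return { tier: "none", best: null };
  const best = candidates.reduce((a, b) => (b.similarity > a.similarity ? b : a));
  return { tier: classifyMatch(best.similarity), best };
}
```

Called with candidates scored 0.72, 0.68, and 0.64, the best match lands in the partial tier.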
IF code has syntax/runtime errors:
IF code runs but fails verification:
IF code times out or loops infinitely:
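One way to route an attempt into the three cases above is to classify the execution result before deciding what feedback to send back. The sketch below assumes a hypothetical result object with `timedOut`, `error`, and `verified` fields, and the category strings are invented for illustration.

```javascript
// `result` is a hypothetical shape: { timedOut: boolean, error: string|null, verified: boolean }.
function classifyOutcome(result) {
  if (result.timedOut) return "timeout";                   // timed out or looped
  if (result.error) return "execution_error";              // syntax or runtime error
  if (!result.verified) return "verification_failure";     // ran, but the check failed
  return "success";
}
```

The timeout check comes first because a run that was cut off may also carry a secondary error message that would otherwise mask the real cause.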
Symptoms: Agent gets stuck proposing/failing same difficulty tasks repeatedly
Detection: Success rate flat for 10+ tasks, no new skills added to library
Fix: Force curriculum to propose easier tasks to rebuild confidence, or harder tasks to break through plateau
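The detection rule above (flat success rate over 10+ tasks with no new skills) could be checked roughly as follows. This sketch makes several assumptions: the task record shape `{ success, skillsAdded }` is hypothetical, "flat" is interpreted as the success rate of the last window differing from the previous window by at most a tolerance, and the 0.1 tolerance is an arbitrary placeholder.

```javascript
// `history` is assumed to be an array of { success: boolean, skillsAdded: number },
// oldest first. The window of 10 comes from the detection rule; tolerance is a guess.
function isPlateaued(history, window = 10, tolerance = 0.1) {
  if (history.length < 2 * window) return false; // not enough data to compare
  const recent = history.slice(-window);
  const previous = history.slice(-2 * window, -window);
  const rate = (tasks) => tasks.filter((t) => t.success).length / tasks.length;
  const flat = Math.abs(rate(recent) - rate(previous)) <= tolerance;
  const noNewSkills = recent.every((t) => t.skillsAdded === 0);
  return flat && noNewSkills;
}
```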
Symptoms: Code generation degrades as library grows, LLM context filled with irrelevant skills
Detection: Recent success rate declining despite library growth, retrieval returning low-similarity matches
Fix: Improve semantic indexing, add recency weighting to retrieval, compress old skills into documentation
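Recency weighting, one of the fixes listed above, can be sketched as a score that decays with time since a skill was last used. Everything here is an assumption for illustration: the exponential half-life decay, the one-hour default, the `lastUsedAt` timestamp field, and the cap of five results. The 0.6 similarity floor is borrowed from the retrieval thresholds earlier in this document.

```javascript
// `candidates` is assumed to be an array of { name, similarity, lastUsedAt } with
// `lastUsedAt` in milliseconds. Half-life decay is one possible weighting, not the only one.
function rankSkills(candidates, now, options = {}) {
  const { halfLifeMs = 60 * 60 * 1000, minSimilarity = 0.6, limit = 5 } = options;
  return candidates
    .filter((c) => c.similarity >= minSimilarity) // drop low-similarity noise
    .map((c) => ({
      ...c,
      // Score halves for every half-life elapsed since the skill was last used.
      score: c.similarity * Math.pow(0.5, (now - c.lastUsedAt) / halfLifeMs),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit); // keep the prompt context small
}
```

Filtering before ranking keeps irrelevant skills out of the LLM context even when they were used recently.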
Symptoms: Agent spends 4+ iterations on tasks that should succeed in 1-2 attempts
Detection: High iteration count with repeated similar errors, no progress between attempts
Fix: Better error categorization, early termination for unsolvable tasks, task decomposition
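Early termination can be sketched by normalizing error messages into a signature and stopping once the last few attempts all fail the same way. The normalization rules (lowercasing, replacing digits) and the function names are assumptions; the limit of 4 mirrors the "4+ iterations" symptom above.

```javascript
// Collapse superficial differences (case, numbers, spacing) so that
// "No coal in slot 1" and "no coal in slot 2" count as the same error.
function errorSignature(message) {
  return message.toLowerCase().replace(/\d+/g, "#").replace(/\s+/g, " ").trim();
}

// Returns true when at least `maxIterations` attempts have failed and the most
// recent `maxIterations` errors share one signature (no progress between attempts).
function shouldStopRetrying(errorMessages, maxIterations = 4) {
  if (errorMessages.length < maxIterations) return false;
  const recent = errorMessages.slice(-maxIterations).map(errorSignature);
  return recent.every((sig) => sig === recent[0]);
}
```

When this returns true, the fixes above suggest abandoning the task or decomposing it rather than regenerating code again.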
Symptoms: Library fills with hyper-specific skills that never get reused
Detection: Low skill reuse rate, many skills with usage_count=1
Fix: Encourage more general skill patterns, merge similar skills, add skill cleanup process
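The reuse-rate check above can be computed directly from per-skill usage counts. In this sketch the `usage_count` field name follows the detection rule; the `name` field and the returned object shape are hypothetical.

```javascript
// `skills` is assumed to be an array of { name, usage_count }.
function libraryReuseStats(skills) {
  const total = skills.length;
  const singleUse = skills.filter((s) => s.usage_count <= 1);
  return {
    total,
    // Candidates for merging or cleanup: skills used at most once.
    singleUseNames: singleUse.map((s) => s.name),
    // Fraction of skills that have been used more than once.
    reuseRate: total === 0 ? 0 : (total - singleUse.length) / total,
  };
}
```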
Symptoms: Agent claims success for tasks that obviously failed
Detection: Self-verification approval rate >90% but manual inspection shows failures
Fix: Add objective success criteria, cross-check with environment state, improve verification prompts
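Cross-checking a self-reported success against environment state might look like the sketch below. It assumes the success criterion can be expressed as required item counts and that inventory is available as a plain `{ itemName: count }` object. Both shapes, and the item name used in the usage note, are illustrative assumptions.

```javascript
// `inventory` and `required` are assumed to be plain objects mapping item name to count.
function verifyAgainstInventory(claimedSuccess, inventory, required) {
  const missing = Object.entries(required)
    .filter(([item, count]) => (inventory[item] ?? 0) < count)
    .map(([item]) => item);
  const objectiveSuccess = missing.length === 0;
  return {
    objectiveSuccess,
    missing,
    // True when the agent claimed success but the environment disagrees.
    falsePositive: claimedSuccess && !objectiveSuccess,
  };
}
```

For example, a claimed success for a smelting task with `required = { iron_ingot: 1 }` and no ingot in the inventory would be flagged as a false positive.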
Initial State: Agent has iron ore in inventory, knows location of coal
Step 1: Task Proposal
Step 2: Skill Retrieval
smeltCoal() (similarity 0.72)
lightFurnace(), collectFromFurnace() (similarity 0.68, 0.64)
Step 3: Code Generation (Iteration 1)
async function smeltIronOre(bot) {
  // Generated code tries to use furnace without coal
  await bot.pathfinder.goto(...furnaceLocation);
  await bot.clickWindow(bot.currentWindow.slots[0]); // Place iron ore
  // Missing: check for coal, light furnace
}
Step 4: Execution + Feedback
Step 5: Code Generation (Iteration 2)
async function smeltIronOre(bot) {
  await collectCoal(bot); // Reused skill
  await bot.pathfinder.goto(...furnaceLocation);
  await lightFurnace(bot); // Reused skill
  // Place iron ore and coal in correct slots
  await waitForSmelting(bot);
}
Step 6: Success + Library Addition
Expert vs Novice: Novice would retry without coal indefinitely; expert recognizes furnace lighting as prerequisite and reuses existing skills.
Initial State: Agent has basic building blocks, intermediate construction skills
Step 1: Curriculum Reasoning
Step 2: Skill Composition Pattern
digArea(), placeBlocks(), checkInventory()
Progressive Difficulty: Each castle component increases architectural complexity while reusing spatial reasoning skills.
Task Completion Criteria:
Skill Library Health:
System Progress Indicators:
Do NOT use VOYAGER for:
Delegate to other approaches:
deep-learning-training.md
swarm-intelligence.md
mcts-planning.md
formal-verification.md
interactive-learning.md
VOYAGER excels at open-ended single-agent learning in environments with rich feedback, executable actions, and compositional task structure. Stay within these boundaries for best results.