
Quickly understand an unfamiliar codebase or project by mapping its architecture, identifying key components and data flows, and surfacing complexity hotspots. Load when the user asks to understand a repo, explain how something works, map the architecture, onboard to a codebase, or explore how components connect. Also triggers on "walk me through this codebase", "how does this project work", "explain the architecture", "what does this repo do", "show me the structure", "onboard me", or any request to build a mental model of a codebase before making changes.
Gracefully retire a skill that is redundant, superseded, or no longer earning its place in the context window. Load when improve-skills finds a skill scoring 0-5/14 AND research confirms the domain is now handled natively by current models, when two skills have overlapping triggers and one subsumes the other, when the user asks to remove a skill, retire a skill, delete a skill, or clean up redundant skills, or when validate-skills flags a skill as a duplicate trigger risk. Handles removal cleanly: updates all callers, removes from AGENTS.md, updates README, and archives rather than deletes so the skill can be recovered if needed.
Create focused role prompts for agents in multi-agent topologies. Load when agent-builder needs role prompts for agents, or when a user asks to "create an agent prompt", "write a role prompt", "define agent identity", "write an agent role", "prompt for this agent". Scope: agent role prompts only (v1). System prompts, task prompts, and skill invocation prompts are future TODOs.
Flip a problem, goal, or decision 180 degrees to find what forward thinking misses. Asks what would guarantee failure, then works backward to what must be avoided or changed. Load when the user says "invert this", "flip this problem", "what would guarantee failure", "think backward", or when deep-thinking diagnoses an inversion frame. Two methods: Failure Inversion (ask what guarantees failure, avoid those things) and Opposite Goal (ask what the reverse goal looks like, check if we're accidentally pursuing it). Max 2 clarifying questions before inverting. Always returns forward actions. For broader analysis involving multiple frameworks, deep-thinking calls this.
Capture actionable learnings that emerge during conversation — when the agent or user discovers that a skill, a set of skills, or a process needs to be updated based on what's happening in the current chat. Sub-skill of the learn-from orchestrator. Load when the user says "we should update the skill for this", "this should be a skill rule", "add this as a gotcha", "the skill should know about this", "update the process for this", "learn from this", "remember this for next time", "this is important for the skill". Also triggers when the agent notices a skill's guidance was wrong or incomplete, a process step failed or was unnecessary, a new pattern emerged, a guardrail was missing, a workaround became a pattern, or a debugging session reveals a gap.
Write a Product Soul document — the strategic north star that sits above any PRD or feature spec. Captures the product's reason for existing across five lenses: user, business, strategy, product-market fit, and GTM distribution. Load when the user asks to write a product soul, product strategy doc, product north star, product positioning doc, product one-pager, or "why we exist" document. Also triggers on "write the soul of this product", "product strategy document", "what is this product really about", "capture the product vision", or when an agent needs strategic context before making product decisions. The output is docs/product-soul.md — a living document that brainstorming, prd-writing, and inversion can reference for grounding.
Evaluate any project, product, or system's claims against its actual implementation — scoring each claim for truth, identifying architectural gaps, assessing competitive positioning, proposing creative solutions, and producing an actionable roadmap. Load when the user asks to evaluate claims, reality-check a project, assess what this project actually does vs what it says, validate product claims, or score a system's credibility. Also triggers on "is this real", "does this work as claimed", "evaluate this project", "assess the gap between claims and reality", "how credible is this", "investor assessment", "score these claims", or "what's real vs marketing".
Find the right skill for a capability. Load when a user or skill needs to check if a skill exists for a given task, when process-decomposer assigns skills to steps, or when agent-builder checks skill availability. Triggers on "what skill does this need", "find a skill for", "is there a skill that", "which skill handles", "does a skill exist for", "skill lookup", "check skill library". Prevents skill sprawl by always checking existing skills before creating new ones. The gatekeeper for all skill creation.
Security audit orchestrator for agent skills — scans for prompt injection, data exfiltration, credential theft, supply chain risks, and instruction hierarchy violations before any skill is installed, created, improved, or read from a GitHub repo. Load when creating skills from external sources, when improve-skills reads from GitHub repos, when research-skill fetches community SKILL.md files, when a user installs a third-party skill, or when the user asks to audit skill security, scan for injection, check if a skill is safe, scan all skills, or run a security sweep. Orchestrates all secure-* skills in sequence. Content is SAFE only if ALL secure-* skills return SAFE. 36% of community skills contain flaws (Snyk ToxicSkills 2026). This skill is the first line of defense.
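The all-or-nothing rule above ("SAFE only if ALL secure-* skills return SAFE") can be sketched as a simple gate. This is a minimal illustration, not the skill's actual implementation: the `run_audit` function, the string verdicts, and the two toy checks are hypothetical, and treating an empty check list as UNSAFE (fail closed) is an assumption.

```python
from typing import Callable, Iterable

Verdict = str  # assumed to be either "SAFE" or "UNSAFE"


def run_audit(content: str, checks: Iterable[Callable[[str], Verdict]]) -> Verdict:
    """Run every check in sequence; the content is SAFE only if all checks say SAFE."""
    verdicts = [check(content) for check in checks]
    # Assumption: with no checks at all we fail closed rather than report SAFE.
    if not verdicts:
        return "UNSAFE"
    return "SAFE" if all(v == "SAFE" for v in verdicts) else "UNSAFE"


# Hypothetical stand-ins for individual secure-* skills.
def check_no_html_comments(content: str) -> Verdict:
    return "UNSAFE" if "<!--" in content else "SAFE"


def check_no_override_phrase(content: str) -> Verdict:
    return "UNSAFE" if "ignore previous instructions" in content.lower() else "SAFE"


CHECKS = [check_no_html_comments, check_no_override_phrase]
```

A single failing check is enough to block the content, which is the point of running the checks as a sequence rather than taking a majority vote.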
Think through the consequences of consequences: not just what happens immediately, but what happens next, and next after that, across time. Load when a decision looks obviously good or obviously bad at first glance, when the user is optimising for a short-term outcome that might create a long-term problem, when unintended consequences are a concern, or when deep-thinking diagnoses a second-order frame. Triggers on "what are the downstream effects", "what happens after that", "unintended consequences", "think ahead on this", "long-term vs short-term", or "and then what". Based on Howard Marks's second-level thinking and Farnam Street mental models. Most powerful for decisions with delayed consequences or systemic effects.
Internal skill. Called by setup-evaluation after a PASS. Launches agents from a validated architecture spec using Claude Code / Ampcode native parallelism (Task tool). Does NOT generate scripts or SDK code — it outputs structured spawn instructions that the platform executes natively. Never invoked directly by the user. Never launches without a setup-evaluation PASS.
Design state-of-the-art multi-agent systems, orchestration patterns, and wiring. Load when the user asks to build an agent system, design agent orchestration, choose between sequential/parallel/hierarchical workflows, or define how multiple agents should collaborate. Also triggers on "agent architecture", "multi-agent wiring", "agent orchestration pattern", "how to connect these agents", or any request to design the cognitive and communication structure of an AI system.
Apply validated research paper insights to the current project codebase — improving architecture, code patterns, testing strategies, documentation, or workflows based on empirical findings. Load when learn-from-paper routes insights to the current project, or when the user asks to apply paper findings to this project, improve this codebase with research, use this paper to improve my project, or apply research to my code. Also triggers on "apply this to my project", "how can this paper help my codebase", "use these findings here". Always called AFTER learn-from-paper has completed credibility and security checks — never ingests papers directly.
Surface every assumption embedded in a plan, strategy, or document, assess how critical and how validated each one is, and identify which ones to test first. Load when the user asks to map assumptions, surface hidden beliefs, find what must be true for this to work, run an assumption audit, or when deep-thinking diagnoses an assumption frame. Also triggers on "what are we assuming", "what must be true for this to work", or "find the untested beliefs". Based on David Bland and Alex Osterwalder's assumption mapping method from Testing Business Ideas.
Turn a rough idea into a fully approved design before any code is written. Load when the user wants to brainstorm, explore ideas, design a feature, think through approaches, plan a new capability, or figure out what to build. Also triggers on "let's think through", "help me design", "explore options", "what's the best approach for", "I have an idea for", "before we build", or any request to plan or spec something before implementation. Enforces a hard gate: no code, no implementation until user approves a design.
Review code changes for correctness, completeness, bugs, edge cases, and quality. Load when the user explicitly asks to review code, check a PR, review a diff, audit recent changes, or verify an implementation matches requirements. Also triggers on "review this code", "check this PR", "review my changes", "code review", "did this implement correctly", "audit this diff", or any explicit request for a formal code review. Do NOT load for "review changes for context" or "review what happened" — those are requests to read code, not to perform a formal review.
Compress an oversized SKILL.md to under 200 lines without losing effectiveness. Load when a skill exceeds 200 lines, when AGENTS.md triggers compression after a skill edit, or when the user asks to compress, shrink, slim down, or optimize a skill. Applies to all skills including meta skills — the 200-line rule has no exceptions. Preserves hard gates, gotchas, output format, routing triggers, and at least one example. When genuinely CORE content cannot be compressed away, invokes split-skill instead of degrading the skill.
Repair and verify cross-references between SKILL.md files after a skill is created, renamed, removed, or restructured. Ensures every skill that calls another skill references the correct name, and every skill that is called has accurate "Called by" context. Load after universal-skill-creator creates a skill, after improve-skills completes a cycle, after a skill is renamed or removed, or when the user asks to fix cross-references, sync skill links, repair broken skill references, or update skill cross-links.
Decompose an unknown quantity into 3-5 estimable factors and produce a defensible order-of-magnitude answer without needing precise data. Load when the user needs to size something without data — market size, resource requirements, effort estimates, user numbers, costs — or when a decision is blocked by "we don't know the numbers". Also triggers on "ballpark this", "rough estimate", "how big is this market", "how long would this take", "how many users", or when deep-thinking diagnoses a sizing/estimation frame. The goal is not precision — it is a defensible answer that enables a decision to be made. Based on Enrico Fermi's estimation method.
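The decomposition described above can be sketched as a product of bounded factors. This is a minimal illustration under stated assumptions: the function names are hypothetical, each factor is given as a (low, high) range, and the geometric mean of the two bounds is used as the point estimate, which is a common Fermi-style convention rather than something the skill prescribes.

```python
import math


def fermi_estimate(factors: dict[str, tuple[float, float]]) -> tuple[float, float, float]:
    """Multiply 3-5 factors given as (low, high) ranges.

    Returns (low, point, high), where point is the geometric mean of the bounds.
    """
    if not 3 <= len(factors) <= 5:
        raise ValueError("expected 3-5 estimable factors")
    low = math.prod(lo for lo, _ in factors.values())
    high = math.prod(hi for _, hi in factors.values())
    point = math.sqrt(low * high)
    return low, point, high


def order_of_magnitude(value: float) -> int:
    """Nearest power of ten for a positive value."""
    return round(math.log10(value))
```

For usage, pass placeholder ranges such as `{"people": (10, 1000), "rate": (1, 100), "share": (1, 1)}`; the numbers are whatever the user can defend, and the output is only as good as the widest range.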
Generate user-facing or internal release notes and changelogs. Load when the user prepares a release, tags a version, or wants to summarize recent progress. Also triggers on "write a changelog", "prepare release notes", "what's new in this version", "summarize my commits", or "create a release summary". Auto-triggered by library-skill after major repo changes (new skills added, skills renamed, structure changes). Applicable to any project, including this skill library itself.
Create a detailed, step-by-step implementation plan for a feature or project. Load when the user asks to plan a feature, create a technical roadmap, break down a PRD into tasks, design an implementation strategy, or sequence engineering work. Also triggers on "how should we build this", "implementation plan for", "technical breakdown", "task list for", or any request to turn a high-level requirement into a concrete execution plan. Supports phased rollouts, architecture-first, and MVP-focused planning.
Audit, improve, and compress every skill in the repo using live research. Load when the user asks to improve skills, audit the skill library, upgrade existing skills, refresh with new research, do a skill health check, or says "improve all skills", "update the skill library", "skill audit", or "run an improvement pass". Applies live domain research, fixes structural gaps, checks for skill linking opportunities, then rewrites and resizes each skill. All skills are in scope including meta skills.
Extract actionable insights from blog posts, web articles, and practitioner content — assess credibility, run security checks, and either improve existing skills or apply to the current project. Load when the user asks to learn from an article, extract insights from a blog post, apply a practitioner's findings, or process engineering blog content. Also triggers on "learn from this article", "learn from this blog post", "extract insights from this post", "what can we learn from this article", "apply this article", or when the user links to a blog, Medium, Substack, dev.to, or engineering blog post.
Extract actionable skill knowledge from academic papers and research articles, assess credibility, run security checks, and either improve existing skills or create new ones. Load when the user asks to learn from a paper, extract insights from a research paper, turn a paper into a skill, apply paper findings to skills, read this paper and improve my skills, or process a research article. Also triggers on "skill from paper", "learn from this paper", "paper to skill", "extract from this research", "apply this paper", or when the user uploads or links to an academic PDF or paper URL.
Automatically maintain skill library consistency whenever a structural change occurs — new skill added, skill renamed, skill deprecated, call graph rewired, or category changed. Updates SKILL-INDEX.md, AGENTS.md, README.md, skill graph, docs/prd/PRD.md, and docs/architecture.md. Load when universal-skill-creator finishes creating a skill, split-skill extracts a child, deprecate-skill retires a skill, improve-skills makes a structural change, or the user manually renames or restructures skills. Also triggers on "update the skill index", "sync skill references", "refresh the skill graph", "fix broken skill cross-references", or "update docs after skill change".
Apply John Boyd's OODA loop to navigate fast-moving, uncertain, or competitive situations: Observe what is actually happening, Orient through mental models and context, Decide on the clearest path, Act quickly, then loop again. Load when the situation is changing faster than the current plan, when a competitive response is needed, when the user is stuck in analysis paralysis in a dynamic environment, when shipping under uncertainty, or when deep-thinking diagnoses a fast-moving / competitive frame. Triggers on "what should we do right now", "the situation is changing", "how do we respond to this", "competitive response", "we need to move fast", or "we're stuck deciding". Adapted for product and business contexts (OODA Canvas 2026).
Extract actionable patterns, architecture decisions, code conventions, and skill-relevant insights from GitHub/GitLab repositories. Assess repo credibility, run the full security scan pipeline, and apply findings to existing skills, new skills, or the current project. Load when the user asks to learn from a repo, extract patterns from a codebase, study a repository, or analyze a repo for reusable techniques. Also triggers on "learn from this repo", "learn from this repository", "what can we learn from this codebase", "extract patterns from this repo", "study this repo".
Turn a problem, edit request, bug report, or feature idea into three deliverables: a mini-spec (docs/specs/), a detailed implementation-ready plan (docs/plans/), and a TODO.md with agent-pickable tasks and milestones. Load when the user describes a problem and wants planning artifacts, says "plan this change", "spec this out", "create a TODO", "write a plan for this", "problem to plan", "break this into tasks for agents", "I want to change X — plan it", or when process-decomposer routes here after determining the user needs planning deliverables. Also triggers on "create tasks from this problem", "make this actionable", or "turn this into a plan agents can execute".
Run a pre-mortem — imagine the project has already failed one year from now and work backward to find the root causes before they happen. Load when the user asks for a pre-mortem, wants to imagine failure before committing, asks "what could go wrong before we start", "assume this fails — why", or when deep-thinking diagnoses a pre-mortem frame. Based on Gary Klein's prospective hindsight method, which surfaces failure causes more effectively than forward risk analysis. Most useful right before a major commitment.
Package and publish a skill to the skills.sh community registry, a public GitHub repo, or both. Load when the user asks to publish a skill, share a skill publicly, submit a skill to the registry, release a skill, or when universal-skill-creator offers publishing as a final step after creation. Also triggers on "push this skill to skills.sh", "make this skill public", "share this skill", or "contribute this skill to the community". Validates quality before publishing, packages correctly (zip for multi-file, .md for atomic), writes a README if missing, and publishes via npx skills CLI.
Content sanitization and hidden-content detection for agent skill security. Scans markdown, HTML, and text for visually hidden but agent-readable attacks: CSS-hidden text (display:none, color:white, font-size:0, opacity:0), HTML comments with instructions, collapsible details sections, zero-width unicode, homoglyphs, misleading links, and inline HTML in markdown. Enforces mandatory sanitization before external content enters agent context. Load as part of the secure-* sequence during any repo scan or skill audit. Also load for sanitize content, check hidden text, scan markdown attacks, strip HTML, detect invisible instructions, check zero-width chars. Core principle: visibility does not equal influence — hidden content is more dangerous than visible content because agents process it but humans cannot see it.
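A few of the hidden-content classes listed above can be detected with plain regular expressions. The sketch below is illustrative only: `scan_hidden_content` and the category labels are hypothetical names, the patterns cover only zero-width characters, HTML comments, some CSS hiding rules, and `<details>` blocks, and a real scanner would also need homoglyph and misleading-link checks.

```python
import re

# Zero-width and invisible code points: ZWSP, ZWNJ, ZWJ, word joiner, BOM.
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# The lookahead keeps values such as "opacity: 0.5" or "font-size: 0.9em" from matching.
HIDDEN_CSS = re.compile(
    r"display\s*:\s*none"
    r"|font-size\s*:\s*0(?![.\d])"
    r"|opacity\s*:\s*0(?![.\d])"
    r"|color\s*:\s*white",
    re.IGNORECASE,
)
DETAILS_TAG = re.compile(r"<details\b", re.IGNORECASE)

PATTERNS = {
    "zero-width": ZERO_WIDTH,
    "html-comment": HTML_COMMENT,
    "hidden-css": HIDDEN_CSS,
    "details-section": DETAILS_TAG,
}


def scan_hidden_content(text: str) -> list[str]:
    """Return the labels of every hidden-content pattern found in text."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(text)]
```

An empty result means only that these specific patterns were not found; it is not a SAFE verdict on its own.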
Security checks for repository ingestion — scans repos for poisoned examples, dependency and supply-chain attacks, file/path traversal, format-based attacks, and enforces quarantine-before-commit. Load as part of the secure-* skill sequence whenever an agent reads, ingests, or learns from a GitHub repository. Also load when the user asks to check a repo for poisoned code, scan dependencies, verify supply chain safety, check for path traversal, scan repo files for attacks, or audit a repo before ingestion. Covers Issues 3, 4, 7, 8 from the agent security threat model: poisoned training data, dependency attacks, file/path attacks, and format-based attacks.
Runtime security for agent skills — prevents state corruption, skill overwrite attacks, denial of service, and enforces provenance tracking and no-go repo management. Load as part of the secure-* skill sequence whenever an agent processes external content or writes to the skill store. Also load when the user asks to check for state corruption, prevent skill overwrite, manage no-go repos, check provenance, audit runtime security, detect DoS patterns, or protect the skill store. Covers Issues 6, 9, 10 from the agent security threat model: instruction hierarchy enforcement, state corruption and skill overwrite, and denial of service prevention.
Validate process decomposition and architecture design quality before execution begins. Load when the setup-evaluator agent fires (automatic for agent-chain tasks), or when user says "evaluate this setup", "check the decomposition", "validate the architecture", "is this plan sound", "review the agent design". Catches structural errors, missing knowledge, unrealistic step ordering, and topology mismatches. Does NOT modify — only evaluates.
Break a stuck or complex problem into the smallest sub-question that, if answered, unlocks the next step, then answer it and repeat until the path forward is clear. Load when a problem feels genuinely stuck, when reasoning keeps circling, when the user needs to think through something deeply before acting, or when deep-thinking diagnoses a Socratic frame. Also triggers on "help me think through this", "I keep going in circles", "what is the real question here", "break this down for me". Based on the recursive Socratic questioning method (EMNLP 2023), which is reported to outperform CoT and Tree-of-Thought on complex reasoning tasks.
Reduce an oversized SKILL.md by first checking if an existing skill already covers the excess sub-workflow (link rather than create), then splitting into a new child skill only if no existing skill fits. Load when a skill exceeds 200 lines and compress-skill determines excess content is genuinely CORE, when the same sub-workflow appears in multiple skills, or when universal-skill-creator or improve-skills identifies a coherent sub-capability. Also triggers on "split this skill", "extract a sub-skill", "this skill is doing too much", or "make this skill reusable". Always checks for an existing home before creating a new skill.
Audit the project's technical health and identify "high-interest" debt. Load when the user asks to check code quality, find TODOs, assess project health, or prepare for a refactoring sprint. Also triggers on "technical debt audit", "where is the code messy", "assess project health", "find my hacks", or "identify tech debt". Essential for maintaining velocity in growing projects.
Run a fast, read-only health check across all skills in the library and produce a structured quality report — without modifying anything. Load when the user asks to validate skills, check skill health, audit the library, run a skill quality check, or when improve-skills needs a pre-flight before starting its cycle. Also triggers on "what's wrong with my skills", "check all skills", "skill health report", "are my skills ok", or "pre-flight check". Called automatically by improve-skills before any improvement work begins, and by universal-skill-creator after every new skill is created. Never modifies any file — only reads and reports.
Design, build, validate, and ship production-grade agent skills that work across OpenAI Codex, Ampcode, Factory.ai Droids, Google Gemini, Warp, Bolt.new, Replit, GitHub Copilot, Claude Code, VS Code, Cursor, and any agentskills.io compliant platform. Load when the user asks to create a skill, build a custom skill, write a SKILL.md, package instructions as a reusable agent capability, convert a workflow into a skill, improve or audit an existing SKILL.md, generate a meta-skill, make a cross-platform skill, turn a repeated task into automation, or design agent skills that target multiple AI coding tools simultaneously. Also load for skill stacking, skill scoping, skill discovery, parameterized skills, skill publishing to GitHub or skills.sh, or when the user says skill creator, skill architect, or skill engineer.
Identify the right tool for a process step. Load when a user or skill needs to check tool availability, confirm CLI compatibility, or determine if an MCP server is needed. Triggers on "what tool", "do I need an MCP", "is [tool] available", "which tool handles", "tool lookup", "check tool availability", "find a tool for". Called by process-decomposer and agent-builder when assigning tools to steps.
Apply the Red-Green-Refactor cycle to software development. Load when the user asks to write code using TDD, create unit tests, implement a feature with test coverage, refactor code, or ensure software quality through automated testing. Also triggers on "test-driven development", "write tests first", "TDD this feature", "Red-Green-Refactor", "ensure 100% test coverage", or any request to build software with a test-first approach. Supports unit, integration, and end-to-end testing strategies.
Research a skill domain before building or improving a skill. Searches academic papers, practitioner blogs, and GitHub skill repos in parallel to find current best practices, domain gotchas, and existing skill patterns. Called by universal-skill-creator and improve-skills before writing any skill. Also load directly when the user asks to research a domain for a skill, find existing skills on a topic, discover best practices for a skill, or check what research exists before building an agent skill.
Generate a tailored AGENTS.md for any new project by interviewing the user about their skill gaps, project goals, and tech context. Load when the user asks to set up a project, initialize agents, create an AGENTS.md, bootstrap a repo, onboard agents to a codebase, or says "set up this project for agents". Also triggers on "write an AGENTS.md for this project", "configure agents for my repo", "project bootstrap", "agent onboarding", or when the user starts a new project and needs agent-ready configuration. Re-run when new context arrives (PRD written, stack changes, team changes) to update the AGENTS.md.
Route user requests to the right skill, decompose complex work into parallel subagents, and manage project phase transitions. Load when the user asks "what should I do next", "which skill should I use", "orchestrate this", "run the full workflow", "split this into parallel tasks", or when a complex request spans multiple skills. Also triggers on "coordinate agents", "parallel execution", "task decomposition", "agent workflow", "what phase am I in", or when the user gives a broad instruction that requires multiple skills in sequence. This is the project's brain — it decides what runs, when, and whether to parallelise.
Decompose tasks into structured, outcome-defined process entries with complexity triage. Load when user says "decompose this", "break this down", "what steps do I need", "plan this out", "what's the process for", "how do I approach this", or when any complex task needs structured execution planning. Includes conversational problem understanding (Step 0) before triage. Routes to `problem-to-plan` when the user needs planning deliverables (spec + plan + TODO). Does NOT replace brainstorming — brainstorming is design approval (upstream), this is execution planning (downstream).
Run a structured discovery interview and produce a complete, developer-ready Product Requirements Document. Load when the user asks to write a PRD, create product requirements, document a feature, write a spec, define acceptance criteria, capture user stories, or turn a rough idea into a formal requirements document. Also triggers on "document this feature", "write requirements for", "create a one-pager", "turn this into a spec", "I need a PRD for", or any request to produce a structured product document for stakeholder alignment or engineering handoff. Supports Full PRD, Lean PRD, and One-Pager formats.
Orchestrator for the learn-from suite — auto-detects source type (academic paper, GitHub repo, blog/web article, or in-conversation learnings) and routes to the correct sub-skill for credibility check, security scan, insight extraction, and application. Load when the user says "learn from", "learn from this", "extract insights from", "apply learnings from", "what can we learn from", or provides a URL, file path, or pasted content that should be ingested as knowledge. Single entry point for all learning workflows.
Strip a problem to its irreducible fundamental truths and rebuild the solution from the ground up — free from analogy, convention, and inherited assumptions. Load when the user feels constrained by how something has always been done, when existing solutions feel expensive or inefficient for no good reason, when the user asks to think from first principles, challenge the fundamentals, or rebuild this from scratch. Also triggers on "why does it have to work this way", "what are the actual constraints here", "ignore what everyone else does", or when deep-thinking diagnoses a convention-break frame. Based on Aristotle's first principles method, popularised by Musk and Feynman. Produces genuinely novel solutions by eliminating convention.
Orchestrate one or more thinking frameworks to work through any problem, decision, document, or idea rigorously. Diagnoses which frameworks fit — inversion, pre-mortem, assumption-mapping, socratic, adversarial-hat — then guides the user through them in the right sequence. Load when the user asks for deep thinking, says "help me think through this properly", "apply your best thinking frameworks", "I need to think carefully before deciding", or "what thinking tools should I use here". Also the entry point for any complex problem where the right framework is unclear. Covers product decisions, engineering tradeoffs, personal decisions, strategy, creative challenges — any domain.
Fix broken or failing functionality through structured reproduction, root-cause analysis, minimal fix, and verification. Load when the user asks to fix a bug, debug an error, resolve an issue, or work on a Linear ticket. Also triggers on "this is broken", "fix this bug", "why is this failing", "debug this", "resolve this error", "what went wrong", or any request to diagnose and fix a problem.
Capture the "why" behind technical choices to prevent architectural drift. Load when the user makes a major technical decision, chooses a library/framework, defines a system boundary, or changes a core architectural pattern. Also triggers on "record a decision", "write an ADR", "why did we do this", "document this architectural choice", or "architectural decision record". Essential for long-term project maintainability and agent alignment.
Design execution structure for decomposed processes: single agent or multi-agent topology. Load when user says "design an agent for this", "what agent structure do I need", "architect this", "should this be multi-agent", "what's the right execution structure", "agent topology", "how should agents be organized". Takes process-decomposer output as primary input. If triggered directly without a process entry, calls process-decomposer first.
Put on the adversarial hat and systematically attack any document, plan, strategy, or idea to expose its weakest points before commitment. Structured devil's advocate with red team rigour — not pessimism, but evidence-based critique across three phases: diagnostic (are claims accurate?), creative (is the problem artificially constrained?), challenge (are solutions robust?). Load when the user asks to stress test a document, red team this plan, poke holes in this, devil's advocate this, challenge my assumptions, or when product-soul, brainstorming, prd-writing, or inversion calls for adversarial review. Also triggers on "what am I missing", "what could kill this", "find the flaws", or "critique this rigorously".
Critically audit agent skills and remove content that is outdated, disproven, model-specific, or based on poorly cited sources. Load when improve-skills runs its per-skill cycle, when the user asks to prune skills, remove outdated techniques, check if skills are still valid, verify citations in skills, audit skill sources, or update skills for a new model release. Also triggers on "are these skills still valid", "check for obsolete techniques", "verify skill citations", or "update skills for GPT-5/Claude 4/Gemini 2". Runs before split-skill and compress-skill — removing bad content first means the remaining content is worth preserving.