
A standard Spec-Kitty workflow routine.
A standard Spec-Kitty workflow routine.
A standard Spec-Kitty workflow routine.
Audit RLM cache coverage - compare manifest against filesystem
Stateless evaluation engine that scores and gates skill improvement iterations using headless Python evaluation scripts. Use when the user says "evaluate this skill", "run autoresearch loop on", "optimize this skill", "run the eval loop", or when another agent proposes a change to an existing skill and needs empirical validation before applying it. Supports autonomous loop mode for iterative improvement and single-shot QA mode for validating one specific proposed change. Requires Python 3.8+ and a git repository.
A standard Spec-Kitty workflow routine.
A standard Spec-Kitty workflow routine.
A standard Spec-Kitty workflow routine.
A standard Spec-Kitty workflow routine.
A standard Spec-Kitty workflow routine.
Gemini CLI sub-agent system for persona-based analysis. Use when piping large contexts to Google Gemini models for security audits, architecture reviews, QA analysis, or any specialized analysis requiring a fresh model context.
Tiered memory system for cognitive continuity across agent sessions. Manages hot cache (session context loaded at boot) and deep storage (loaded on demand). Use when: (1) starting a session and loading context, (2) deciding what to remember vs forget, (3) promoting/demoting knowledge between tiers, (4) user says 'remember this' or asks about project history.
A standard Spec-Kitty workflow routine.
Interactive RLM cache initialization. Use when: setting up a new project's semantic cache for the first time, or adding a new cache profile. Walks the user through folder selection, extension config, manifest creation, and first distillation pass.
3-Phase Knowledge Search strategy for the RLM Factory ecosystem. Auto-invoked when tasks involve finding code, documentation, or architecture context in the repository. Enforces the optimal search order: RLM Summary Scan (O(1)) -> Vector DB Semantic Search -> Grep/Exact Match. Never skip phases.
Audit Vector DB coverage -- compares the live filesystem manifest against the ChromaDB index to identify coverage gaps.
Ingests repository files into the ChromaDB vector store. Builds or updates the vector index from a manifest or directory scan using ingest.py. Use when new files need to be indexed or the vector store is out of date. <example> user: "Index these new plugin files into the vector database" assistant: "I'll use vector-db-ingest to add them to the vector store." </example> <example> user: "The vector store is missing recent files -- update it" assistant: "I'll use vector-db-ingest to re-index the changes." </example>
Distills uncached files into the Recursive Language Model (RLM) Summary cache Ledger. You (the agent) ARE the distillation engine. Read each file deeply, write a high-quality 1-sentence summary, and inject it via inject_summary.py. The goal is to read each full file once and produce a great summary once, so you do not have to reread the file every time you need to know what a script does or what a file contains. In most cases the RLM summary should be sufficient. Use when files are missing from the ledger and need to be summarized. <example> user: "Summarize these new plugin files into the RLM ledger" assistant: "I'll use rlm-distill-agent to read and summarize each file into the cache." </example> <example> user: "The RLM ledger is missing 40 files -- fill the gaps" assistant: "I'll use rlm-distill-agent to process the missing files." </example>
Bootstraps a skill evaluation lab repo for an autoresearch improvement run. Trigger with "set up an eval lab", "bootstrap the eval repo", "prepare the test repo for skill evaluation", "create an eval environment for this skill", "set up the lab space for this skill", or when starting a new skill optimization run that needs a standalone test environment.
Removes stale and orphaned entries from the RLM Summary Ledger. Use after files are deleted, renamed, or moved to keep the ledger in sync with the filesystem. <example> user: "Clean up the RLM cache after I renamed some files" assistant: "I'll use rlm-cleanup-agent to remove stale entries from the ledger." </example> <example> user: "The RLM ledger has entries for files that no longer exist" assistant: "I'll run rlm-cleanup-agent to prune orphaned entries." </example>
Knowledge Curator agent skill for the RLM Factory. Auto-invoked when tasks involve distilling code summaries, querying the semantic ledger, auditing cache coverage, or maintaining RLM hygiene. Supports both Ollama-based batch distillation and agent-powered direct summarization. V2 enforces Concurrency Safety constraints.
Verifies that os-architect actually causes evolution — not just words. Dispatches os-architect in single-shot simulation mode for a given test scenario, then checks for real artifact presence (new files, HANDOFF_BLOCK, plan files). Reports PASS / FAIL with grep evidence. Accumulates results into a test report. Use after any changes to os-architect, os-evolution-planner, or improvement-intake-agent.
A standard Spec-Kitty workflow routine.
Presents layout options to the SME in plain language before any prototype construction begins. Invoked after the Discovery Plan is approved to confirm visual structure and direction. Trigger phrases: "what should it look like", "show me some layout options", "let me see the design options before we build". Also invoked by prototype-builder after plan approval.
A system architecture scaffolder designed to synthesize visual discoveries and interactive Q&A responses into comprehensive C4 and TOGAF specs.
A package builder skill that compiles specs/ documents into standard spec-kits and scaffolds the clean target codebase sandbox.
A standard Spec-Kitty workflow routine.
A visual & functional crawler operation utilizing Chrome DevTools Protocol (CDP) or Playwright/Puppeteer to audit prototype UI/UX and behavior.
Codifies the plan-and-delegate workflow for evolving plugins, skills, and agents. Given a target (plugin/skill/agent name) and an evolution goal, this skill first brainstorms 2-3 approach options using the cheapest available model, presents them for selection, then writes a structured task plan and Copilot CLI delegation prompt for the chosen approach. Called by os-architect for Path B (update) and Path C (create) executions. Can also be invoked standalone.
Removes stale and orphaned chunks from the ChromaDB vector store for files that have been deleted or renamed. Use after files are removed or moved to keep the vector index in sync with the filesystem. <example> user: "Clean up the vector store after I deleted some files" assistant: "I'll use vector-db-cleanup to remove orphaned chunks." </example> <example> user: "The vector database has chunks for files that no longer exist" assistant: "I'll run vector-db-cleanup to prune them." </example>
A standard Spec-Kitty workflow routine.
Builds a prototype component by component, self-reviewing each component against the Discovery Plan before moving to the next. Invoked by prototype-builder after the layout direction is confirmed. Trigger phrases: "build the prototype", "let's build it", "start building". Also invoked by prototype-builder-agent after visual-companion confirms layout.
SME-facing orchestrator for the Business Exploration Loop. Supports 4 session types (greenfield, brownfield, discovery-only, spike) with adaptive phase selection. Manages state via exploration-dashboard.md, enforces phase gates, and routes to child skills in sequence. Phases can be skipped based on session type. Single canonical entry point — invoke at the start of any exploration session or to resume an in-progress session. Trigger phrases: "start an exploration", "let's explore this idea", "resume my exploration", "where did we leave off", "start discovery".
Extracts pure, framework-free, IO-free domain models and deterministic business rules from a rapid prototype with strict preservation vs replacement classification and purity audit enforcement.
Top-level orchestration skill coordinating the entire 7-step surgical vibe-to-enterprise reengineering pipeline with automated safety and economic optimization controls.
Builds an executable safety net of characterization tests by integrating browser flow recording, API payload snapshotting, DOM state captures, network traces, and mock fixture generation.
Installs plugin components (skills, commands/workflows, rules, hooks, MCP) into the .agents/ central store and symlinks them to agent environments that require it (.claude/). DEFAULT method: run plugin_add.py (the primary TUI and headless orchestrator), which internally delegates to plugin_installer.py (the execution engine) for the actual OS-level installation. Trigger when a user says "install plugin", "deploy plugin", "add plugin", "install from GitHub", or "sync plugin to agents".
Progressively migrates legacy prototype routes and features to a clean architecture layer slice-by-slice, verifying them against characterization tests, running purity/drift checks, and executing completion certifications.
End-to-end pipeline that transforms an undocumented vibe-coded prototype into a Spec Kit-compatible specification package and a Superpowers-ready implementation handoff, including characterization tests, domain extraction, architecture decisions, task breakdown, TDD strategy, worktree guidance, and certification reports.
A standard Spec-Kitty workflow routine.
A standard Spec-Kitty workflow routine.
Guides a Subject Matter Expert through a structured discovery session to create and approve a Discovery Plan before any building begins. This is the HARD-GATE brainstorming skill — no prototype can be built until the SME explicitly approves the plan. Trigger phrases: "start a discovery session", "let's plan this out", "help me figure out what we're building", "I have an idea I want to explore", "let's start from scratch"
Pattern 5: Concurrent Event-Driven Multi-Agent Loop. Coordinates multiple Claude sessions as OS threads sharing a common event bus and memory address space. Every loop cycle is a full improvement cycle: execute, eval against benchmark (KEEP/DISCARD), emit friction events during work, close with post_run_metrics, agent self-assessment survey saved to retrospectives, memory persistence, and Triple-Loop Retrospective trigger if friction threshold crossed. Four coordination topologies: turn-signal, fan-out, request-reply, triple-loop (Pattern D).
Trigger with "show me the improvement chart", "how are we improving", "progress report", "graph the eval scores", "show cycle of improvement", "what's the trend", "are we getting better". Produces a visual/text summary of how the agentic loop is improving across cycles. Do NOT use this to run the learning loop or evaluate a specific skill change.
Trigger: "set up agentic OS", "initialize agent harness", "init my project for AI agents", "where do I put CLAUDE.md", "create my agent environment", "set up persistent memory". Guides users through an interview to understand their use case, then scaffolds the right Agentic OS structure. Use even when the user just asks WHERE to put files.
Audit file path references in plugins and skills. Trigger with "audit path references", "check file references", "find broken references", "path reference audit", "verify paths", or when you need to validate that all ./references in code actually exist in the skill/plugin. Three-phase audit: (1) SCAN all files for references, (2) VERIFY each exists, (3) REPORT issues. Generates inventory.json for reuse across multiple checks.
Interactively select and uninstall agent plugins and skills from the local .agents/ environment.
Converts a document (.txt, .md, .pdf, .docx) into an RSVP (Rapid Serial Visual Presentation) token stream using the Spritz ORP formula. Invoked when a user wants to speed-read a file, generate a token stream at a target WPM, or prepare a Spritz/RSVP reading session.
Trigger with "run self-audit", "test the analyzer", "regression test the plugin analyzer", "audit the agent-scaffolders", or "verify the analyzer works correctly". Runs the analyze-plugin skill against the agent-scaffolders itself and its test fixtures as a regression smoke test. Use this after making changes to the analyzer to verify nothing broke.
Create, audit, repair, and document cross-platform symlinks that work correctly on both Windows and macOS/Linux. Use this skill whenever the user mentions symlinks, symbolic links, junction points, .gitconfig symlinks, broken links after git pull, cross-platform path issues, or needs help with ln -s equivalents on Windows. Also trigger when the user reports that files are missing or wrong after switching between Mac and Windows machines using Git. This skill solves the common problem where symlinks committed on macOS show up as plain text files on Windows (and vice versa) because of Git's core.symlinks setting or missing Developer Mode / elevated permissions. **IMPORTANT FOR WINDOWS USERS:** Developer Mode must be enabled before creating symlinks. Without it, Git will check out symlinks as plain-text files or hardlinks, breaking cross-platform workflows.
Audit a file for TODO comments, pending work items, or technical debt markers. Useful for checking code readiness before a commit or reviewing task status. Trigger with "check for todos", "audit for debt", "list pending work", or "scan for TODOs".
Standard operating procedures for the Spec Kitty agentic workflow (Plan -> Implement -> Review -> Merge).
Triggers the L5 Red Team Sub-Agent to rigorously audit a plugin against the 39-point L4 pattern matrix.
Deploy a skill as an Azure AI Foundry hosted agent
(Industry standard: Meta-Learning System / Automated Autoresearch) Primary Use Case: Continuous, self-improving orchestration of an agentic system over multiple sessions. Use when: building a continuous improvement layer that autonomously identifies workflow friction, postulates hypotheses, and tests improved instructions/coding skills against an objective headless benchmark before merging and persisting.
Systematically analyze agent plugins and skills to extract design patterns, architectural decisions, and reusable techniques. Trigger with "analyze this plugin", "mine patterns from", "review plugin structure", "extract learnings from", "what patterns does this plugin use", "check if this plugin is well-structured", "validate plugin compliance", or when examining any plugin or skill collection to understand its design. Use this skill even when the user just says "look at this plugin" or "tell me how this is structured."
(Industry standard: Review and Critique Pattern) Primary Use Case: Iterative generation paired with adversarial review, continuing until an 'Approved' verdict is reached. Orchestrated adversarial review loop. Use when: research, designs, architectures, or decisions need to be reviewed by red team agents (human, browser, or CLI). Iterates in rounds of research → bundle → review → feedback until approved.
Scaffold a deterministic GitHub Actions CI/CD workflow
Add an MCP server integration to a plugin
This skill should be used when the user asks to "create a slash command", "add a command", "write a custom command", "define a command with arguments", "create a command that runs bash", "add a /command to my plugin", "use $ARGUMENTS in a command", "set up argument-hint", "create a workflow command", "interactive command", or needs guidance on slash command structure, YAML frontmatter fields, file references, bash execution, command organization, or command best practices. Use this skill whenever Claude Code slash commands are mentioned even without the word "command" -- e.g. "I want a shortcut that reviews PRs" or "automate my deploy workflow" should trigger this. Do NOT use this for hooks (use create-hook), skills (use create-skill), or agents (use create-sub-agent).
Convert raw plugin analysis results into actionable improvement recommendations for agent-scaffolders. Trigger with "synthesize learnings", "generate improvement recommendations", "what should we improve in our scaffolders", "update our meta-skills based on these findings", or after completing a plugin analysis.
Reduces Claude Code context bloat across three dimensions: (1) duplicate skill deduplication — clears .claude/ copies since Claude Code already reads from plugins/ directly; (2) CLAUDE.md optimization — rewrites to under ~80 lines, keeping only rules that directly change Claude behaviour; (3) session token efficiency — guidance on cheap subagent delegation, context compounding across turns, and session hygiene. Trigger with "optimize claude context", "reduce context bloat", "deduplicate skills", "trim CLAUDE.md", "fix my context usage", "why are my skills loading twice", "how do I reduce token usage", or "clean up .claude directory".
Interactively prepares a targeted Red Team Review package. It conducts a brief discovery interview to determine the threat model, generates a strict security auditor prompt, compiles a manifest of relevant project files, and bundles them into a single Markdown artifact or ZIP archive ready for an external LLM (like Grok, ChatGPT, or Gemini) or a human reviewer.
Orchestrates the full prototype build cycle for a Subject Matter Expert. Coordinates layout confirmation and component building — it does not build components directly. Acts as the single entry point for all prototype-related requests. Trigger phrases: "build a prototype", "create a working prototype", "show me a working version", "prototype to clarify scope", "build an exploratory prototype"
Runs a semantic health check over the Obsidian LLM wiki using the cheapest available LLM CLI. Finds inconsistencies, missing concepts, stale articles, connection candidates, and new article suggestions. Writes a structured report to meta/lint-report.md. Use when the wiki is large enough to have quality drift, or as a periodic maintenance step.
Synchronizes the agent environments. Looks at the inventory (plugin-sources.json) and reinstalls all skills and plugins from the sources indicated. Also cleans up any orphaned artifacts from removed plugins. Trigger when the user asks to "sync plugins", "update all plugins", "refresh environment", or "run the sync script".
Identify underspecified areas in the current feature spec by asking up
Session manager for RSVP speed reading. Orchestrates reading sessions with pause, resume, speed adjustment, and comprehension check-ins. Invoke after generating an RSVP token stream with the rsvp-reading skill.
Scaffold an advanced stateful agent skill with L4 patterns
This skill should be used when the user wants to "humanize this", "make this sound less AI", "rewrite this naturally", "remove AI patterns", "make this more conversational", "this sounds robotic", "edit for voice or tone", or pastes text that reads like LLM output (parallel structure in threes, em dashes, semicolons, hedging phrases, hollow affirmations). Also trigger for "make this sound like me", "clean up this draft", or "rewrite this for LinkedIn/email/Slack". Use this skill even for a single sentence if the user's intent is to make writing feel more human. Do NOT use for pure grammar correction or style guide work unrelated to humanizing AI patterns.
Interactively initializes the Vector DB plugin. Guided discovery asks which folders to index, confirms the manifest, then scaffolds vector_profiles.json for high-performance In-Process or Native Server connections. Mandatory first step before ingestion or search.
This skill should be used when the user asks to "audit a plugin", "validate my plugin", "check plugin structure", "verify plugin is correct", "validate .claude-plugin/plugin.json", "check if my plugin is compliant", "review plugin components", or mentions plugin validation or structure compliance. Also trigger proactively after the user creates or modifies any plugin component (commands, agents, skills, hooks, .claude-plugin/plugin.json). Use this skill even when the user says "check my work" or "make sure this is right" in a plugin context. Do NOT use this for auditing individual skills only (use skill-reviewer for that).
Activate when the user wants to install, deploy, test, or materialize an APM package into runtime directories such as .agents/, .github/, .claude/, .cursor/, .gemini/, .codex/, .opencode/, or .windsurf/. Use after creating or converting an APM package.
Activate when the user wants to create a new APM-native package from scratch for reusable agent skills, agents, commands, hooks, MCP configuration, prompts, or governance-managed agent assets. Do not use this for existing plugin migration; use convert-plugin-to-apm instead.
Scaffold a complete Claude Code plugin from scratch
Activate when the user wants to compile an APM package into top-level context documents such as AGENTS.md, CLAUDE.md, or GEMINI.md, especially for Codex, Gemini, OpenCode, or agents-protocol style hosts. Do not use when the user only needs per-skill installation; use install-apm-package instead.
Automatically updates the plugin/skill/agent counts in README.md based on the current plugins/ directory.
Design and scaffold a Claude Code sub-agent
Coding conventions enforcement agent. Auto-invoked when writing new code, reviewing code quality, adding headers, or checking documentation compliance across Python, TypeScript/JavaScript, and C#/.NET.
Task management agent. Auto-invoked for task creation, status tracking, and kanban board operations using Markdown files across lane directories. V2 enforces Kanban Sovereignty constraints preventing manual task file edits.
Transforms raw source files registered in wiki_sources.json into Karpathy-style LLM wiki nodes inside the wiki-root. Generates concept pages, cluster pages, index, and table of contents. Use when building or rebuilding the wiki from raw content.
Discovers and persists the user's available AI environments (Claude, Copilot CLI, Gemini CLI, Cursor, etc.) to context/memory/environment.md. Run once after OS setup or whenever the environment changes. os-architect and os-evolution-planner read this file to select the right delegation backend and cheapest brainstorm model automatically. Invoked by os-architect on first run if environment.md is absent.
Interactively creates technical bundles of code, design, and documentation for external review or context sharing. It conducts a brief discovery phase to confirm the targets and format, presents a plan, and then packages multiple project files into a single Markdown file or a portable `.zip` archive.
Generate Mermaid flowcharts documenting business processes, state machines, and workflow logic from session captures. Use when you need to map multi-step processes, approval flows, user journeys, or decision trees during exploration. Trigger with "map this workflow", "create a process diagram", "flowchart the business process", "document this workflow", or "visualize the state machine".
Semantic search skill for retrieving code and documentation from the ChromaDB vector store. Use when you need concept-based search across the repository (Phase 2 of the 3-phase search protocol). V2 includes L4/L5 retrieval constraints.
Start the Native Python ChromaDB background server. Use when semantic search returns connection refused on port 8110, or when the user wants to enable concurrent agent read/writes.
Derives, groups, and refines user stories from exploration work, prototype behavior, and business context, with prioritization for the first implementation slice. Supports standard "As a / I want / So that" format and Gherkin "Given / When / Then" Acceptance Criteria format. Trigger with "generate user stories", "write acceptance criteria", "create Gherkin scenarios", "derive stories from requirements", or "create a backlog".
Self-healing and self-evolving pattern for agents operating against external systems (CDP automation, DOM-dependent tooling, web APIs). Classifies failures into three tiers — Gap / Failure / Regression — applies repo-profile-gated edits with appropriate autonomy, verifies the fix, and updates domain reference files ("The Map, not the Diary"). Invoke whenever a tool call or subprocess returns a failure that a patched helper could fix.
Scaffolds the filesystem structure for a new agent skill: creates the directory layout, writes a starter SKILL.md, generates evals/evals.json, references/, scripts/, and assets/ as needed, and runs a discovery interview to capture name, purpose, and trigger phrases before writing any files. Use when the user says "create a new skill", "scaffold a skill", "generate a skill", "new skill setup", or "make a skill directory". Does not handle content improvement for existing skills — that is handled by os-skill-improvement.
Activate when the user wants to add APM governance, lockfile/audit readiness, or multi-runtime package management to an existing Claude/Copilot/agent plugin, or explicitly convert a plugin into an APM-native package.
ADR management skill. Auto-invoked for generating architecture decisions, documenting design rationale, and maintaining the decision record log. Uses native read/write tools to scaffold and update ADR markdown files.
(Industry standard: Parallel Agent) Primary Use Case: Work that can be partitioned into independent sub-tasks running concurrently across multiple agents. Parallel multi-agent execution pattern. Use when: work can be partitioned into independent tasks that N agents can execute simultaneously across worktrees. Includes routing (sequential vs parallel), merge verification, and correction loops.
Interactive skill to scaffold and optimize the .agents/ directory for any project that is setting up an Antigravity configuration. Sets up .gemini/GEMINI.md, skills/, prompts/, and config.json using best practices. Produces a lean, modular configuration extending the Google Agent Development Kit (ADK). Trigger with "set up antigravity", "scaffold .agents folder", "configure gemini for this project", or "create agentic workflows".
Captures and refines business requirements, including functional requirements, non-functional requirements, business rules, constraints, assumptions, and success measures. Produces structured BRD-style documents with [CONFIRMED] and [UNCONFIRMED] confidence markers. Trigger with "capture requirements", "generate a BRD", "document business rules", "list the constraints", or "create a requirements document".
Claude CLI sub-agent system for persona-based analysis. Use when piping large contexts to Anthropic models for security audits, architecture reviews, QA analysis, or any specialized analysis requiring a fresh model context.
Trigger with "evaluate autoresearch fit", "score this skill for karpathy loop", "is this a good autoresearch candidate", "assess autoresearch viability for", "which skills are best for autonomous loop optimization", "score skills for 3-file architecture", or when the user wants to determine if a skill is a good candidate for applying the Karpathy autoresearch autonomous optimization loop pattern.
Interactive skill to scaffold and optimize the .claude/ directory for any project. Sets up CLAUDE.md, .claude/rules/, .claude/settings.json with best practices, and optional hooks. Produces a lean, modular configuration that avoids monolithic context bloat. Trigger with "set up claude", "optimize my CLAUDE.md", "scaffold .claude folder", "configure claude for this project", or "create claude settings".
Translate .mmd diagram files into PNG images with configurable resolution (retina/HQ/scale), supporting rasterization (raster, rasterize, rasterization). V2 includes L5 Delegated Constraint Verification via verify_png for strict binary linting and Puppeteer-based rendering.
Copilot CLI sub-agent system for dispatching tasks and persona-based analysis to GitHub Copilot models. Use for task delegation (agent reads/writes files directly), security audits, architecture reviews, or any work requiring a fresh model context.
Scaffold a GitHub agentic workflow from an existing skill
Scaffold an agent skill with Docker runtime support
Design and scaffold an event-driven Claude Code hook
Python dependency and environment management for multi-service or monorepo Python backends. Use when: (1) adding, upgrading, or removing a Python package, (2) responding to Dependabot or security vulnerability alerts (GHSA/CVE), (3) creating a new service that needs its own requirements files, (4) debugging pip install failures or Docker build issues related to dependencies, (5) reviewing or auditing the dependency tree, (6) running pip-compile. Enforces the pip-compile locked-file workflow and tiered dependency hierarchy.
(Industry standard: Sequential Agent / Agent as a Tool) Primary Use Case: Delegating a well-defined task to a worker agent, verifying its execution, and repeating if necessary. Inner/outer agent delegation pattern. Use when: work needs to be delegated from a strategic controller (Outer Loop) to a tactical executor (Inner Loop) via strategy packets, with verification and correction loops.
Provides information about how to create, structure, install, and audit Agent Skills, Plugins, Antigravity Workflows, and Sub-agents. Trigger this when specifications, rules, or best practices for the ecosystem are required.
Provides active execution protocols to rigorously audit how code, directory structures, and agent actions comply with the authoritative ecosystem specs. Trigger when validating new skills, plugins, or workflows.
Interactive co-authoring skill for the narrow end of the exploration funnel. Synthesizes session briefs, BRDs, story sets, and prototype notes into a structured handoff package targeted at the correct downstream consumer (e.g., formal software specs, strategic roadmaps, or process documentation).
Evaluates and improves the exploration-cycle skills, prompts, routing, and artifact quality using baseline-first, one-hypothesis iteration loops with keep-discard decisions and experiment ledgers.
Interactive co-authoring skill for the wide end of the exploration funnel. Captures and refines the core intent, whether the outcome is a software app, a business process improvement, research analysis, or strategic roadmap. Guides users through gathering context, iteratively drafting the brief, and testing for blind spots.
Fixes broken path references in plugin skill and agent files to ensure portability across installed environments. Use when you see "plugins/" paths in SKILL.md or agent files, need to standardize path references after installing a skill, want to audit and fix cross-plugin path dependencies, run a portability audit on a repository, neutralize hardcoded machine paths like /Users/, find Python scripts using PROJECT_ROOT or Path() to reach into plugins/<name>/ at runtime, or are preparing plugin files for distribution via uvx or bootstrap.py. Also handles evolving a skill in-session while tracking quality scores with the eval runner to continuously improve skill routing accuracy.
Initialize HuggingFace integration - validates .env variables, tests API connectivity, and ensures the dataset repository structure exists. Use when onboarding a new project to HuggingFace or when credentials change.
Performs an uncompromising L5 Enterprise Red Team Audit on a given plugin against the 39-point architectural maturity matrix. Trigger when the user requests a security audit, red team assessment, structural compliance review, or maturity gap analysis of any agent plugin or skill directory.
(Industry standard: Loop Agent / Single Agent) Primary Use Case: Self-contained research, content generation, and exploration where no inner delegation is required. Self-directed research and knowledge capture loop. Use when: starting a session (Orientation), performing research (Synthesis), or closing a session (Seal, Persist, Retrospective). Ensures knowledge survives across isolated agent sessions.
Specialized Quality Assurance Operator for documentation link integrity and scans. Handles automated link validation and auditing, and fixes or repairs broken documentation links and docs paths across repositories, with guidance on when to commit changes.
This skill should be used when the user wants to "create a marketplace", "setup a marketplace catalog", "scaffold marketplace.json", "initialize a plugin registry", or "configure a Gemini CLI extension". Use this even if they just mention "setting up a marketplace".
Trigger with "mine plugins", "analyze plugin collection", "run the full analysis pipeline", "inventory and analyze all plugins", "mine patterns from this directory", or when you want to run the complete virtuous cycle: inventory, analyze, extract patterns, synthesize recommendations, and deliver a structured report. Use this even if the user just says "analyze everything in this folder".
Trigger with "mine this skill", "analyze this skill", "run targeted skill analysis", "extract patterns from this skill", or when you want focused analysis on a single Agent Skill directory without processing an entire plugin. Use this when the user points to a specific skill folder or says "look at this skill".
Read and manipulate Obsidian Bases (.base) files - YAML-based database views that render as tables, cards, and grids inside the vault. Use when reading, appending rows, or updating cells in a Base file.
Programmatically create and manipulate Obsidian Canvas (.canvas) files using JSON Canvas Spec 1.0. Enables agents to generate visual flowcharts, architecture diagrams, and planning boards. Use when creating or editing visual canvas files.
Semantic link traversal for Obsidian Vaults. Builds an in-memory graph index from wikilinks and provides instant forward-link, backlink, and multi-degree connection queries. Use when exploring note relationships or finding orphaned notes.
Initialize and onboard a new project repository as an Obsidian Vault. Covers prerequisite installation, vault configuration, exclusion filters, and validation. Use when setting up Obsidian for the first time in a project.
Core markdown syntax skill for Obsidian. Enforces strict parsing and authoring of Obsidian proprietary syntax (Wikilinks, Blocks, Headings, Aliases, Embeds, Callouts). Use when reading, writing, or validating Obsidian-flavored markdown.
Progressive-disclosure query against the Obsidian LLM wiki. Returns RLM summary first, expands to bullets, then full wiki node on demand. Use when looking up concepts, searching the wiki, or getting instant context from the knowledge graph.
Distills wiki source files into the RLM summary layer (summary.md, bullets.md, deep.md) using the cheapest available LLM CLI. Routes to Copilot gpt-5-mini first, then Claude Haiku, then Gemini Flash. Never uses Ollama. Use when wiki nodes need RLM summaries generated or refreshed.
Safe Create/Read/Update/Delete operations for Obsidian Vault notes. Implements atomic writes, advisory locking, concurrent edit detection, and lossless YAML frontmatter handling. Use when reading, writing, updating, or appending to any vault note.
Audits and rewrites AI agent instruction files (CLAUDE.md, GEMINI.md, .github/copilot-instructions.md) in any repo. Strips stale or foreign content, applies Karpathy's four behavioral principles, ensures platform-specific sections, and makes each file authoritative rather than a copy of another. Trigger when the user says "optimize my CLAUDE.md", "audit agent instructions", "improve my CLAUDE.md", "apply Karpathy principles to my agent files", "clean up my copilot instructions", "review my GEMINI.md", or "update my AI instruction files".
(Industry standard: Routing Agent / Orchestrator Pattern) Primary Use Case: Analyzing an ambiguous trigger and routing it to one of the specific specialized implementations. Routes triggers to the appropriate agent-loop pattern. Use when: assessing a task, research need, or work assignment and deciding whether to run a simple learning loop, red team review, dual-loop delegation, or parallel swarm. Manages shared closure (seal, persist, retrospective, self-improvement).
Safely removes all agent lock files from the context/.locks/ directory to resolve deadlocks caused by crashed agents leaving stale locks behind. Use when the user says "/os-clean-locks", "clear all locks", "reset agent locks", or when an agent is deadlocked and cannot acquire a lock because a previous agent crashed and left a stale lock behind in context/.locks/. Verifies lock existence, discovers and removes stale lock directories, updates OS state via kernel.py, and emits event bus notifications. Requires Python 3.8+ standard library only.
Reviews a completed os-eval-runner lab run and backports approved changes to master plugin sources. Trigger with "backport the eval results", "review the lab run", "apply eval improvements to master", "check what the eval agent changed".
Maintains a persistent, folder-based log of all agentic-os experiment runs. Each run writes one dated file to context/experiment-log/ and updates index.md. Supports five source types: verifier (qualitative), tester (qualitative), orchestrator (numeric), planner (qualitative), survey (mixed). Handles both numeric results (eval scores, KEEP/DISCARD, delta) and qualitative results (PASS/FAIL/PARTIAL, gap analysis). Use after any experiment run to persist findings before temp/ is cleared.
Trigger with "explain agentic os", "how do I set up a persistent agent environment", "what is the CLAUDE.md hierarchy", "explain the context folder structure", "how does session memory work", "what is soul.md or user.md", "explain auto-memory or MEMORY.md", "what is a loop scheduler or heartbeat", or when the user asks for the canonical guide.
Trigger with "remember this", "update memory", "what should we record from this session", "capture learnings", "write a session log", or when closing a session. Guides agents on managing memory hygiene across sessions, deciding what to write to dated memory logs, what to promote to long-term memory.md, and when to archive. <example> User: I'm done for the day, can you write up a session log? Agent: <Bash> python context/kernel.py emit_event --agent os-memory-manager --type intent --action promote_memory python context/kernel.py state_update active_agent os-memory-manager </Bash> </example> <example> User: That's all, logging off now. Agent: <Bash> python context/kernel.py acquire_lock memory </Bash> </example> <example> User: How does the memory system work? Agent: <Read> ./references/architecture/context-folder-patterns.md </Read> </example>
Tiered memory system for cognitive continuity across agent sessions. Manages hot cache (session context loaded at boot) and deep storage (loaded on demand). Use when: (1) starting a session and loading context, (2) deciding what to remember vs forget, (3) promoting/demoting knowledge between tiers, (4) user says 'remember this' or asks about project history.
Full-cycle install or update of the Spec-Kitty framework - upgrades the CLI, refreshes templates, syncs the plugin, reconciles custom knowledge, and bridges to agent environments. Custom skill (not from upstream spec-kitty).
Upload primitives for HuggingFace Soul persistence - file, folder, snapshot, JSONL append, and dataset card management with exponential backoff. Use when persisting agent learnings, snapshots, or semantic caches to HuggingFace.
SME-facing front-door skill for Agentic OS ecosystem evolution. Invokes the os-architect interview flow: classifies intent, audits existing capabilities, proposes evolution path (orchestrate / update / create), and dispatches work. Use when evolving plugins, skills, or agents — whether applying a new pattern, setting up an improvement lab, filling a capability gap, or coordinating multiple loops.
Performs an uncompromising L5 Enterprise Red Team Audit on a given plugin against the 39-point architectural maturity matrix. Trigger when the user requests a security audit, red team assessment, structural compliance review, or maturity gap analysis of any agent plugin or skill directory.
Fixes broken path references in plugin skill and agent files to ensure portability across installed environments. Use when you see "plugins/" paths in SKILL.md or agent files, need to standardize path references after installing a skill, want to audit and fix cross-plugin path dependencies, run a portability audit on a repository, neutralize hardcoded machine paths like /Users/, find Python scripts using PROJECT_ROOT or Path() to reach into plugins/<name>/ at runtime, or are preparing plugin files for distribution via uvx or bootstrap.py. Also handles evolving a skill in-session while tracking quality scores with the eval runner to continuously improve skill routing accuracy.
Continuously improves an existing agent skill based on eval results using the RED-GREEN-REFACTOR cycle. Apply when a skill's routing accuracy is low, trigger descriptions need sharpening, or os-eval-runner scores are below target. The cycle: (1) run a RED baseline to observe the failure mode, (2) apply a focused patch and verify with os-eval-runner (GREEN), (3) refactor to close loopholes until the score meets the threshold. Integrates with os-eval-runner as the objective eval gate. NOT for scaffolding new skills; use create-skill (agent-scaffolders) for that.
This skill should be used when the user asks to "audit a plugin", "validate my plugin", "check plugin structure", "verify plugin is correct", "validate .claude-plugin/plugin.json", "check if my plugin is compliant", "review plugin components", or mentions plugin validation or structure compliance. Also trigger proactively after the user creates or modifies any plugin component (commands, agents, skills, hooks, .claude-plugin/plugin.json). Use this skill even when the user says "check my work" or "make sure this is right" in a plugin context. Do NOT use this for auditing individual skills only (use skill-reviewer for that).
Triggers the L5 Red Team Sub-Agent to rigorously audit a plugin against the 39-point L4 pattern matrix.
Trigger with "run self-audit", "test the analyzer", "regression test the plugin analyzer", "audit the agent-plugin-analyzer", or "verify the analyzer works correctly". Runs the analyze-plugin skill against the agent-plugin-analyzer itself and its test fixtures as a regression smoke test. Use this after making changes to the analyzer to verify nothing broke.
Systematically analyze agent plugins and skills to extract design patterns, architectural decisions, and reusable techniques. Trigger with "analyze this plugin", "mine patterns from", "review plugin structure", "extract learnings from", "what patterns does this plugin use", "check if this plugin is well-structured", "validate plugin compliance", or when examining any plugin or skill collection to understand its design. Use this skill even when the user just says "look at this plugin" or "tell me how this is structured."
Use when implementation is complete, all tests pass, and you need to decide how to integrate the work - guides completion of development work by presenting structured options for merge, PR, or cleanup
This skill does things.
Convert raw plugin analysis results into actionable improvement recommendations for agent-scaffolders and agent-skill-open-specifications. Trigger with "synthesize learnings", "generate improvement recommendations", "what should we improve in our scaffolders", "update our meta-skills based on these findings", or after completing a plugin analysis.
Trigger with "mine plugins", "analyze plugin collection", "run the full analysis pipeline", "inventory and analyze all plugins", "mine patterns from this directory", or when you want to run the complete virtuous cycle: inventory, analyze, extract patterns, synthesize recommendations, and deliver a structured report. Use this even if the user just says "analyze everything in this folder".
Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any success claims; evidence before assertions always
Use when starting development tasks that need isolation from the current workspace or before executing implementation plans - creates independent git worktrees with smart directory selection. Good for working on several tasks in parallel or when you need to add a worktree.
Trigger with "evaluate autoresearch fit", "score this skill for karpathy loop", "is this a good autoresearch candidate", "assess autoresearch viability for", "which skills are best for autonomous loop optimization", "score skills for 3-file architecture", or when the user wants to determine if a skill is a good candidate for applying the Karpathy autoresearch autonomous optimization loop pattern.
Provides active execution protocols to rigorously audit how code, directory structures, and agent actions comply with the authoritative ecosystem specs. Trigger when validating new skills, plugins, or workflows.
A deliberately broken skill that should trigger multiple anti-pattern detections during analysis.
Provides information about how to create, structure, install, and audit Agent Skills, Plugins, Antigravity Workflows, and Sub-agents. Trigger this when specifications, rules, or best practices for the ecosystem are required.
This skill does things.
Audit file path references in plugins and skills. Trigger with "audit path references", "check file references", "find broken references", "path reference audit", "verify paths", or when you need to validate that all ./references in code actually exist in the skill/plugin. Three-phase audit: (1) SCAN all files for references, (2) VERIFY each exists, (3) REPORT issues. Generates inventory.json for reuse across multiple checks.
A deliberately broken skill that should trigger multiple anti-pattern detections during analysis.
Use when demonstrating a correctly structured agent skill. Trigger when the user asks to "show a well-formed skill", "give me a skill template", or "what does a compliant SKILL.md look like". Also triggers for regression testing: this fixture MUST score maturity >= L2 with zero Critical or Error findings.
Moved: this skill has been promoted to the canonical plugin location `plugins/agent-scaffolders/skills/eval-autoresearch-fit/SKILL.md`. The data files (ranked skills JSON, opportunities report) remain here under `assets/resources/` as the canonical ecosystem data store. The skill definition and scripts now live in the plugin.
../../../create-stateful-skill/SKILL.md
Use when demonstrating a correctly structured agent skill. Trigger when the user asks to "show a well-formed skill", "give me a skill template", or "what does a compliant SKILL.md look like". Also triggers for regression testing: this fixture MUST score maturity >= L2 with zero Critical or Error findings.
Trigger with "mine this skill", "analyze this skill", "run targeted skill analysis", "extract patterns from this skill", or when you want focused analysis on a single Agent Skill directory without processing an entire plugin. Use this when the user points to a specific skill folder or says "look at this skill".
Use when implementing any feature or bugfix, before writing implementation code
Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes
Use when completing tasks, implementing major features, or before merging to verify work meets requirements