
Write and maintain an implementation diary capturing what changed, why, what worked, what failed (with exact errors and commands), what was tricky, and how to review and validate. Activates proactively during non-trivial implementation work (new features, bug fixes, refactors, research spikes). Does not activate for trivial tasks like one-line fixes, config tweaks, or quick questions.
Address code review feedback by walking through comments one at a time with the user. Use when the user has received code review comments — on a GitHub PR, in a document in the repo, or directly in conversation — and wants to work through them methodically. Also trigger when the user mentions "address review", "review comments", "PR feedback", or wants to respond to code review feedback.
Autonomous experiment loop that iteratively improves a measurable metric. Given a goal, a verify command, and an optional guard, the agent branches, makes one change, measures the result, and keeps or discards the experiment -- repeating indefinitely. Use this skill when the user wants to optimize something measurable through automated experimentation, autonomous improvement loops, or when they mention "autoresearch". Works for any domain with a quantifiable metric (code performance, ML training, build size, test scores, content quality metrics, etc.).
Guide for posting content to the Bluesky social network using the bsky terminal app. This skill should be used proactively when working in public repositories and there is interesting, shareable content (new features, insights, achievements, or announcements worth sharing with the community). Use it when asked to post to Bluesky, or when content seems worth sharing publicly.
Guide for collaborating on GitHub projects. This skill should be used when contributing to projects, creating PRs, reviewing code, or managing issues on GitHub.
Guide for building interactive web UIs with Datastar and gomponents-datastar. Use this skill when adding frontend interactivity to Go web applications with Datastar attributes.
Guide for recording significant architectural and design decisions in docs/decisions.md. Use this skill when clearly significant architectural decisions are made (database choices, frameworks, core design patterns) or when explicitly asked to document a decision. Be conservative - only suggest for major decisions, not minor implementation details.
For when you're asked to write a design doc or specification, especially after a brainstorm or feature design session.
Autonomous project gardening. Scans for maintenance issues (starting with documentation), picks one, fixes it in a worktree, self-reviews with competing agents, and opens a PR. Use when the user wants to tidy up the project, fix stale docs, or generally tend the codebase. Invoke with /garden.
# Observable Plot Skill Observable Plot is a JavaScript library for exploratory data visualization. It's built on D3 and provides a concise, declarative API for creating charts. ## Installation ```bash npm install @observablehq/plot ``` Or via CDN: ```html <script type="module"> import * as Plot from "https://cdn.jsdelivr.net/npm/@observablehq/[email protected]/+esm"; </script> ``` ## Core Concepts ### Plot.plot(options) The main function that renders a visualization. Returns an SVG or HTML figure
Guide for using git according to my preferences. Use it when you're asked to commit something.
Guide for generating and editing images using generative AI with the nanobanana CLI
Guide for using git worktrees to parallelize development with coding agents. Use this skill when the user requests to work in a new worktree or wants to work on a separate feature in isolation (e.g., "Work in a new worktree", "Create a worktree for feature X").
Guide for creating and working with marimo notebooks, the reactive Python notebook that stores as pure .py files. This skill should be used when creating, editing, running, or deploying marimo notebooks.
Guide for saving a web page for offline use using the monolith CLI. Use this when instructed to save a web page.
Guide for working with SQL queries, in particular for SQLite. Use this skill when writing SQL queries, analyzing database schemas, designing migrations, or working with SQLite-related code.
Guide for making code reviews. Use this when asked to make code reviews, or ask to use it before committing changes.
Guide for how to develop Go apps and modules/libraries. Always use this skill when reading or writing Go code.
Guide for working with gomponents, a pure Go HTML component library. Use this skill when reading or writing gomponents code, or when building HTML views in Go applications.
Guide for how to brainstorm an idea and turn it into a fully formed design.
Use this skill when crafting, reviewing, or improving prompts for LLM pipelines — including task prompts, system prompts, and LLM-as-Judge prompts. Triggers include: requests to write or refine a prompt, diagnose why an LLM produces inconsistent or incorrect outputs, bridge the gap between intent and model behavior, reduce ambiguity in instructions, add few-shot examples, structure complex prompts, or improve output formatting. Also use when the user needs help distinguishing specification failures (unclear instructions) from generalization failures (model limitations), or when iterating on prompts based on observed failure modes. Do NOT use for general coding tasks, document creation, or non-LLM writing.
Generate a custom trace annotation web app for open coding during LLM error analysis. Use when the user wants to review LLM traces, annotate failures with freeform comments, and do first-pass qualitative labeling (open coding). Also use when the user mentions "annotate traces", "trace review tool", "open coding tool", "label traces", "build an annotation interface", "review LLM outputs", or wants to manually inspect pipeline traces before building a failure taxonomy. This skill produces a tailored Python web application using FastHTML, TailwindCSS, and HTMX.
Build, validate, and deploy LLM-as-Judge evaluators for automated quality assessment of LLM pipeline outputs. Use this skill whenever the user wants to: create an automated evaluator for subjective or nuanced failure modes, write a judge prompt for Pass/Fail assessment, split labeled data for judge development, measure judge alignment (TPR/TNR), estimate true success rates with bias correction, or set up CI evaluation pipelines. Also trigger when the user mentions "judge prompt", "automated eval", "LLM evaluator", "grading prompt", "alignment metrics", "true positive rate", or wants to move from manual trace review to automated evaluation. This skill covers the full lifecycle: prompt design → data splitting → iterative refinement → success rate estimation.
Build a structured taxonomy of failure modes from open-coded trace annotations. Use this skill whenever the user has freeform annotations from reviewing LLM traces and wants to cluster them into a coherent, non-overlapping set of binary failure categories (axial coding). Also use when the user mentions "failure modes", "error taxonomy", "axial coding", "cluster annotations", "categorize errors", "failure analysis", or wants to go from raw observation notes to structured evaluation criteria. This skill covers the full pipeline: grouping open codes, defining failure modes, re-labeling traces, and quantifying error rates.