maragudk

24 verified skills · 832 total stars

diary

Write and maintain an implementation diary capturing what changed, why, what worked, what failed (with exact errors and commands), what was tricky, and how to review and validate. Activates proactively during non-trivial implementation work (new features, bug fixes, refactors, research spikes). Does not activate for trivial tasks like one-line fixes, config tweaks, or quick questions.

development · 40 stars

address-code-review

Address code review feedback by walking through comments one at a time with the user. Use when the user has received code review comments — on a GitHub PR, in a document in the repo, or directly in conversation — and wants to work through them methodically. Also trigger when the user mentions "address review", "review comments", "PR feedback", or wants to respond to code review feedback.

development · 40 stars

autoresearch

Autonomous experiment loop that iteratively improves a measurable metric. Given a goal, a verify command, and an optional guard, the agent branches, makes one change, measures the result, and keeps or discards the experiment, repeating indefinitely. Use this skill when the user wants to optimize something measurable through automated experimentation, autonomous improvement loops, or when they mention "autoresearch". Works for any domain with a quantifiable metric (code performance, ML training, build size, test scores, content quality metrics, etc.).

development · 40 stars
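The keep-or-discard loop described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the skill's implementation: the verify command and the guard are modeled as Python callables (`verify`, `guard`), `propose` is a hypothetical stand-in for the agent making one change, a "branch" is represented by experimenting on a candidate copy that is simply dropped when discarded, higher metric values are assumed to be better, and the loop is bounded by `max_iterations` instead of running indefinitely.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple, TypeVar

S = TypeVar("S")


@dataclass
class Experiment:
    iteration: int
    metric: float
    kept: bool


def autoresearch_loop(
    state: S,
    propose: Callable[[S, int], S],  # hypothetical: returns a copy of state with one change
    verify: Callable[[S], float],  # stand-in for the verify command; higher is better (assumption)
    guard: Optional[Callable[[S], bool]] = None,  # optional check a change must pass to be kept
    max_iterations: int = 10,  # bounded here; the described loop repeats indefinitely
) -> Tuple[S, float, List[Experiment]]:
    best_metric = verify(state)  # baseline measurement before any experiment
    history: List[Experiment] = []
    for i in range(max_iterations):
        candidate = propose(state, i)  # "branch": experiment on a candidate, not on state
        metric = verify(candidate)
        passes_guard = guard is None or guard(candidate)
        kept = passes_guard and metric > best_metric  # keep only strict improvements
        if kept:
            state, best_metric = candidate, metric  # "merge" the experiment
        history.append(Experiment(iteration=i, metric=metric, kept=kept))
    return state, best_metric, history


# Toy usage: move an integer toward 7; the guard rejects negative values.
deltas = [3, -1, 2, 5, -2, 1]
final, best, history = autoresearch_loop(
    0,
    propose=lambda x, i: x + deltas[i % len(deltas)],
    verify=lambda x: -(x - 7) ** 2,
    guard=lambda x: x >= 0,
    max_iterations=6,
)
print(final, best, [e.kept for e in history])
# prints: 6 -1 [True, False, True, False, False, True]
```

Keeping only strict improvements means a change that merely ties the current best is discarded, which keeps the accepted history monotonic; a real setup might also record discarded experiments' diffs for later review.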

bluesky

Guide for posting content to the Bluesky social network using the bsky terminal app. This skill should be used proactively when working in public repositories and there is interesting, shareable content (new features, insights, achievements, or announcements worth sharing with the community). Use it when asked to post to Bluesky, or when content seems worth sharing publicly.

testing · 40 stars

collaboration

Guide for collaborating on GitHub projects. This skill should be used when contributing to projects, creating PRs, reviewing code, or managing issues on GitHub.

development · 40 stars

datastar

Guide for building interactive web UIs with Datastar and gomponents-datastar. Use this skill when adding frontend interactivity to Go web applications with Datastar attributes.

development · 40 stars

decisions

Guide for recording significant architectural and design decisions in docs/decisions.md. Use this skill when clearly significant architectural decisions are made (database choices, frameworks, core design patterns) or when explicitly asked to document a decision. Be conservative: only suggest it for major decisions, not minor implementation details.

development · 40 stars

design-doc

Use this when you're asked to write a design doc or specification, especially after a brainstorm or feature design session.

testing · 40 stars

garden

Autonomous project gardening. Scans for maintenance issues (starting with documentation), picks one, fixes it in a worktree, self-reviews with competing agents, and opens a PR. Use when the user wants to tidy up the project, fix stale docs, or generally tend the codebase. Invoke with /garden.

development · 40 stars

observable-plot

# Observable Plot Skill

Observable Plot is a JavaScript library for exploratory data visualization. It's built on D3 and provides a concise, declarative API for creating charts.

## Installation

```bash
npm install @observablehq/plot
```

Or via CDN:

```html
<script type="module">
  import * as Plot from "https://cdn.jsdelivr.net/npm/@observablehq/[email protected]/+esm";
</script>
```

## Core Concepts

### Plot.plot(options)

The main function that renders a visualization. Returns an SVG or HTML figure

development · 40 stars

git

Guide for using git according to my preferences. Use it when you're asked to commit something.

documentation · 40 stars

nanobanana

Guide for generating and editing images using generative AI with the nanobanana CLI.

tools · 40 stars

worktrees

Guide for using git worktrees to parallelize development with coding agents. Use this skill when the user requests to work in a new worktree or wants to work on a separate feature in isolation (e.g., "Work in a new worktree", "Create a worktree for feature X").

development · 40 stars

marimo

Guide for creating and working with marimo notebooks, the reactive Python notebooks that are stored as pure .py files. This skill should be used when creating, editing, running, or deploying marimo notebooks.

development · 40 stars

save-web-page

Guide for saving a web page for offline use using the monolith CLI. Use this when instructed to save a web page.

tools · 40 stars

sql

Guide for working with SQL queries, in particular for SQLite. Use this skill when writing SQL queries, analyzing database schemas, designing migrations, or working with SQLite-related code.

development · 40 stars

code-review

Guide for doing code reviews. Use this when asked to review code, or when asked to use it before committing changes.

development · 40 stars

go

Guide for how to develop Go apps and modules/libraries. Always use this skill when reading or writing Go code.

development · 40 stars

gomponents

Guide for working with gomponents, a pure Go HTML component library. Use this skill when reading or writing gomponents code, or when building HTML views in Go applications.

development · 40 stars

brainstorm

Guide for how to brainstorm an idea and turn it into a fully formed design.

documentation · 40 stars

prompt-engineering

Use this skill when crafting, reviewing, or improving prompts for LLM pipelines — including task prompts, system prompts, and LLM-as-Judge prompts. Triggers include: requests to write or refine a prompt, diagnose why an LLM produces inconsistent or incorrect outputs, bridge the gap between intent and model behavior, reduce ambiguity in instructions, add few-shot examples, structure complex prompts, or improve output formatting. Also use when the user needs help distinguishing specification failures (unclear instructions) from generalization failures (model limitations), or when iterating on prompts based on observed failure modes. Do NOT use for general coding tasks, document creation, or non-LLM writing.

development · 8 stars

trace-annotation-tool

Generate a custom trace annotation web app for open coding during LLM error analysis. Use when the user wants to review LLM traces, annotate failures with freeform comments, and do first-pass qualitative labeling (open coding). Also use when the user mentions "annotate traces", "trace review tool", "open coding tool", "label traces", "build an annotation interface", "review LLM outputs", or wants to manually inspect pipeline traces before building a failure taxonomy. This skill produces a tailored Python web application using FastHTML, TailwindCSS, and HTMX.

tools · 8 stars

llm-as-a-judge

Build, validate, and deploy LLM-as-Judge evaluators for automated quality assessment of LLM pipeline outputs. Use this skill whenever the user wants to: create an automated evaluator for subjective or nuanced failure modes, write a judge prompt for Pass/Fail assessment, split labeled data for judge development, measure judge alignment (TPR/TNR), estimate true success rates with bias correction, or set up CI evaluation pipelines. Also trigger when the user mentions "judge prompt", "automated eval", "LLM evaluator", "grading prompt", "alignment metrics", "true positive rate", or wants to move from manual trace review to automated evaluation. This skill covers the full lifecycle: prompt design → data splitting → iterative refinement → success rate estimation.

development · 8 stars
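For the alignment and success-rate steps, one common approach is sketched below in Python. It is an assumption, not necessarily what the skill prescribes: labels are booleans with Pass treated as the positive class (True = Pass), TPR and TNR are measured against human labels on a held-out set, and the judge's raw pass rate on unlabeled traces is corrected with the Rogan-Gladen estimator θ = (p_obs + TNR - 1) / (TPR + TNR - 1). The function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class JudgeAlignment:
    tpr: float  # share of human-labeled Pass traces the judge also marks Pass
    tnr: float  # share of human-labeled Fail traces the judge also marks Fail


def measure_alignment(human: Sequence[bool], judge: Sequence[bool]) -> JudgeAlignment:
    """Compare judge verdicts with human labels (True = Pass, False = Fail)."""
    if len(human) != len(judge):
        raise ValueError("human and judge labels must have the same length")
    pairs = list(zip(human, judge))
    tp = sum(h and j for h, j in pairs)
    fn = sum(h and not j for h, j in pairs)
    tn = sum((not h) and (not j) for h, j in pairs)
    fp = sum((not h) and j for h, j in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both Pass and Fail examples in the human labels")
    return JudgeAlignment(tpr=tp / (tp + fn), tnr=tn / (tn + fp))


def corrected_success_rate(observed_pass_rate: float, a: JudgeAlignment) -> float:
    """Rogan-Gladen style correction of the judge's raw pass rate.

    observed = TPR * theta + (1 - TNR) * (1 - theta), solved for theta.
    """
    denom = a.tpr + a.tnr - 1
    if denom <= 0:
        raise ValueError("judge is no better than chance; correction is undefined")
    theta = (observed_pass_rate + a.tnr - 1) / denom
    return min(1.0, max(0.0, theta))  # clip to a valid proportion


# Illustrative labels only: 10 human Pass and 10 human Fail traces.
human = [True] * 10 + [False] * 10
judge = [True] * 9 + [False] + [False] * 8 + [True] * 2
a = measure_alignment(human, judge)  # TPR = 0.9, TNR = 0.8
theta = corrected_success_rate(0.7, a)  # about 0.714 (= 0.5 / 0.7)
print(a, theta)
```

The correction only makes sense when TPR + TNR > 1; a production setup would usually also report a confidence interval (for example, by bootstrapping the labeled set), which this sketch omits.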

failure-taxonomy

Build a structured taxonomy of failure modes from open-coded trace annotations. Use this skill whenever the user has freeform annotations from reviewing LLM traces and wants to cluster them into a coherent, non-overlapping set of binary failure categories (axial coding). Also use when the user mentions "failure modes", "error taxonomy", "axial coding", "cluster annotations", "categorize errors", "failure analysis", or wants to go from raw observation notes to structured evaluation criteria. This skill covers the full pipeline: grouping open codes, defining failure modes, re-labeling traces, and quantifying error rates.

development · 8 stars
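The re-labeling and quantification steps can be illustrated with a small Python sketch. Everything in it is hypothetical: the `codebook` mapping from open codes to failure modes, the mode names, and the sample annotations are made up for illustration, and the skill itself may structure this differently. Each trace ends up with one binary label per failure mode, and a mode's error rate is the fraction of traces that exhibit it.

```python
from typing import Dict, List, Sequence


def relabel(
    open_codes_per_trace: Sequence[Sequence[str]],
    codebook: Dict[str, str],  # hypothetical: open code -> failure mode, built during axial coding
    modes: Sequence[str],
) -> List[Dict[str, bool]]:
    """Turn freeform open codes into one binary label per failure mode per trace."""
    labels = []
    for codes in open_codes_per_trace:
        mapped = {codebook[c] for c in codes if c in codebook}  # unmapped codes are ignored
        labels.append({mode: mode in mapped for mode in modes})
    return labels


def failure_rates(labels: Sequence[Dict[str, bool]], modes: Sequence[str]) -> Dict[str, float]:
    """Fraction of traces exhibiting each failure mode, most frequent first."""
    n = len(labels)
    if n == 0:
        raise ValueError("no traces to quantify")
    rates = {mode: sum(trace[mode] for trace in labels) / n for mode in modes}
    return dict(sorted(rates.items(), key=lambda kv: (-kv[1], kv[0])))


# Hypothetical failure modes, codebook, and open-coded traces.
modes = ["missing_citation", "wrong_tone"]
codebook = {
    "no source given": "missing_citation",
    "cites nothing": "missing_citation",
    "tone too casual": "wrong_tone",
    "slang in reply": "wrong_tone",
}
traces = [
    ["no source given", "tone too casual"],
    ["cites nothing"],
    [],
    ["slang in reply", "no source given"],
]
print(failure_rates(relabel(traces, codebook, modes), modes))
# prints: {'missing_citation': 0.75, 'wrong_tone': 0.5}
```

Because each mode is a separate binary label, one trace can count toward several modes, so the rates need not sum to 1.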
