shipshitdev/spec-first

Name: spec-first
Author: shipshitdev

bundles/ai-agents/skills/spec-first/SKILL.md

Spec-First Development

Contract

Inputs:

User request describing a feature, project, or non-trivial implementation task.

Outputs:

Stage A framing with 3 approaches and tradeoffs.
Draft spec-[feature-name].md content for .agents/memory/.
Draft todo.md checklist with per-step verification commands.

Creates/Modifies:

.agents/memory/spec-[feature-name].md (spec artifact).
.agents/memory/decisions-[feature-name].md (decision log).
GitHub Issue (checklist body for active todo tracking).

External Side Effects:

Creates GitHub Issues via the gh CLI when creating todo tracking.

Confirmation Required:

Before creating GitHub Issues.
Before proceeding from Stage C to Stage D (execution).

Delegates To:

prd-task-creator for PRD-style issue creation.
executing-plans for Stage D autonomous execution.
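The side-effect and confirmation rules in the contract above can be sketched as a small guard. This is only an illustration: `buildIssueCommand` is a hypothetical helper, not part of the skill, and it assumes the issue is created with `gh issue create` using its `--title` and `--body` flags.

```typescript
// Hypothetical helper: the skill only states that issues are created via the
// gh CLI and that confirmation is required first.
function buildIssueCommand(
  title: string,
  body: string,
  confirmed: boolean,
): string[] | null {
  // No external side effect without explicit user confirmation.
  if (!confirmed) return null;
  // Returned as an argv array so the caller can run it without shell quoting.
  return ["gh", "issue", "create", "--title", title, "--body", body];
}

// Usage: without confirmation nothing is produced.
const pending = buildIssueCommand("user-auth", "- [ ] Add scaffolding", false);
const approved = buildIssueCommand("user-auth", "- [ ] Add scaffolding", true);
console.log(pending === null, approved?.length);
```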

A structured workflow for LLM-assisted coding that delays implementation until decisions are explicit.

When This Activates

"Build X" or "Create Y" (new features/projects)
"Implement..." (non-trivial functionality)
"Add a feature that..." (multi-step work)
Any request that touches 3+ files or has unclear requirements

When to Skip

Single-file changes under 50 lines
Typo fixes, log additions, config tweaks
User explicitly says "just do it" or "quick fix"
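One way to read the activation and skip rules together is as a single decision function. The sketch below is an assumption about how the rules combine: the `RequestEstimate` shape is hypothetical, and the skill does not say what to do for cases that match neither list (for example a two-file change), so the fallback here is a guess.

```typescript
// Hypothetical shape: the skill does not define a data model for requests.
interface RequestEstimate {
  filesTouched: number; // estimated number of files the change affects
  linesChanged: number; // estimated size of the change
  requirementsClear: boolean; // false when scope or acceptance criteria are vague
  userWantsQuickFix: boolean; // user said "just do it" or "quick fix"
}

function shouldUseSpecFirst(r: RequestEstimate): boolean {
  // An explicit opt-out from the user wins over everything else.
  if (r.userWantsQuickFix) return false;
  // 3+ files or unclear requirements activates the workflow.
  if (r.filesTouched >= 3 || !r.requirementsClear) return true;
  // Single-file changes under 50 lines are skipped.
  if (r.filesTouched === 1 && r.linesChanged < 50) return false;
  // Assumption: anything in between defaults to the workflow.
  return true;
}
```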

Core Principles

Delay implementation until tradeoffs are explicit — Use conversation to clarify constraints, compare options, surface risks. Only then write code.
Treat the model like a junior engineer with infinite typing speed — Provide structure: clear interfaces, small tasks, explicit acceptance criteria. Code is cheap; understanding and correctness are scarce.
Specs beat prompts — For anything non-trivial, create a durable artifact (spec file) that can be re-fed, diffed, and reused across sessions.
Generated code is disposable; tests are not — Assume rewrites. Design for easy replacement: small modules, minimal coupling, clean seams, strong tests.
The model is over-confident; reality is the judge — Everything important gets verified by execution: tests, linters, typecheckers, reproducible builds.

The 6-Stage Workflow

Stage A: Frame the Problem (conversation mode)

Goal: Decide before you implement.

Prompts that work:

"List 3 viable approaches. Compare on: complexity, failure modes, testability, future change, time to first demo."
"What assumptions are you making? Which ones are risky?"
"Propose a minimal version that can be deleted later without regret."

Output: Decision notes in .agents/memory/decisions-[feature-name].md

Stage B: Write spec.md (freeze decisions)

Goal: Turn decisions into unambiguous requirements.

File: .agents/memory/spec-[feature-name].md

# [Feature Name] Spec

## Purpose
One paragraph: what this is for.

## Non-Goals
Explicitly state what you are NOT building.

## Interfaces
Inputs/outputs, data types, file formats, API endpoints, CLI commands.

## Key Decisions
Libraries, architecture, persistence choices, constraints.

## Edge Cases and Failure Modes
Timeouts, retries, partial failures, invalid input, concurrency, idempotency.

## Acceptance Criteria
Bullet list of EARS statements (`WHEN`/`WHILE`/`WHERE`/`IF … THE SYSTEM SHALL …`,
or a bare `THE SYSTEM SHALL …`) — testable, pass/fail, no judgement.
Avoid "should be fast." Prefer: "WHEN given 1k items THE SYSTEM SHALL process them under 2s on M1 Mac."

## Test Plan
Unit/integration boundaries, fixtures, golden files, what must be mocked.
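The acceptance-criteria rule in the template above lends itself to a mechanical shape check. The following is a minimal sketch, assuming each criterion is a single line using the keywords listed in the template; `isEarsStatement` is a hypothetical name, and the check covers only the sentence shape, not whether the criterion is actually testable.

```typescript
// Keywords taken from the acceptance-criteria template: an optional
// WHEN/WHILE/WHERE/IF clause followed by "THE SYSTEM SHALL <response>".
const EARS_PATTERN =
  /^(?:(?:WHEN|WHILE|WHERE|IF)\s+.+?\s+)?THE SYSTEM SHALL\s+\S.*$/;

function isEarsStatement(line: string): boolean {
  return EARS_PATTERN.test(line.trim());
}

// Usage: filter a list of draft criteria down to the ones that need rewriting.
const drafts = [
  "WHEN given 1k items THE SYSTEM SHALL process them under 2s on M1 Mac.",
  "THE SYSTEM SHALL reject empty input.",
  "Should be fast.",
];
const needsRewrite = drafts.filter((d) => !isEarsStatement(d));
console.log(needsRewrite); // only "Should be fast." remains
```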

Stage C: Generate todo.md (planning mode)

Goal: Stepwise checklist where each step has a verification command.

Tracking: a GitHub Issue per feature — the checklist below is the issue body.

# [Feature Name] TODO

- [ ] Add project scaffolding (build/run/test commands)
  Verify: `bun run build && bun run test`

- [ ] Implement module X with interface Y
  Verify: `bun run test -- --grep "module X"`

- [ ] Add tests for edge cases A/B/C
  Verify: `bun run test -- --grep "edge cases"`

- [ ] Wire integration
  Verify: `bun run integration`

- [ ] Add docs
  Verify: `bun run docs && open docs/index.html`

Each item must be independently checkable. This prevents "looks right" progress.
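The "independently checkable" rule can be enforced by parsing the checklist and flagging any item that has no verification command. This is a sketch only: `parseTodo` and `TodoStep` are hypothetical names, and it assumes the exact `- [ ]` and ``Verify: `...` `` line shapes used in the template above.

```typescript
interface TodoStep {
  title: string;
  done: boolean;
  verify: string | null; // null when the item has no Verify line
}

function parseTodo(markdown: string): TodoStep[] {
  const steps: TodoStep[] = [];
  let current: TodoStep | null = null;
  for (const raw of markdown.split("\n")) {
    const line = raw.trim();
    const item = /^- \[( |x)\] (.+)$/.exec(line);
    if (item) {
      current = { title: item[2] ?? "", done: item[1] === "x", verify: null };
      steps.push(current);
      continue;
    }
    // A Verify line attaches to the most recent checklist item.
    const verify = /^Verify: `(.+)`$/.exec(line);
    if (verify && current) current.verify = verify[1] ?? null;
  }
  return steps;
}

// Items without a verification command are the ones to fix before Stage D.
function unverifiable(steps: TodoStep[]): string[] {
  return steps.filter((s) => s.verify === null).map((s) => s.title);
}
```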

Stage D: Execute Changes (implementation mode)

Goal: Small diffs, frequent verification, controlled context.

Rules:

One logical change per step
Keep focus on one interface at a time
After each change: run verification command, paste actual output back
Commit early and often

For large codebases:

Provide only relevant files plus spec/todo
If summarizing repo, do it once and keep as reusable artifact
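The Stage D rules amount to a loop: make one change, run its verification command, and record what actually came back. A minimal sketch follows, with several assumptions: `executeSteps`, `Runner`, and the report shape are hypothetical, the command runner is injected rather than tied to any particular process API, and stopping at the first failure is a choice made here, not something the skill prescribes.

```typescript
interface Step {
  title: string;
  verify: string; // the verification command for this step
}

interface CommandResult {
  exitCode: number;
  output: string;
}

// Injected so the sketch makes no assumption about how commands are executed.
type Runner = (command: string) => CommandResult;

interface StepReport {
  title: string;
  passed: boolean;
  output: string; // the actual output, kept so it can be pasted back
}

function executeSteps(steps: Step[], run: Runner): StepReport[] {
  const reports: StepReport[] = [];
  for (const step of steps) {
    const result = run(step.verify);
    const passed = result.exitCode === 0;
    reports.push({ title: step.title, passed, output: result.output });
    // Assumption: stop at the first failing verification instead of piling
    // further changes on top of a broken step.
    if (!passed) break;
  }
  return reports;
}
```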

Stage E: Verify and Review (adversarial mode)

Goal: Force the model to try to break its own work.

Prompts:

"Act as a hostile reviewer. Find correctness bugs, not style nits. List concrete failing scenarios."
"Given these acceptance criteria, which are not actually satisfied? Be specific."
"Propose 5 tests that would fail if the implementation is wrong."

Stage F: Decide What Lasts

Goal: Keep the system easy to delete and rewrite.

Heuristics:

Keep "policy" (business rules) separate from "mechanism" (I/O, DB, HTTP)
Prefer shallow abstractions that can be removed without cascade
Invest in tests and fixtures more than clever architecture
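The policy/mechanism split can be illustrated with a deliberately small example. Everything in it is made up for illustration (the free-shipping rule, the 5000-cent threshold, and the `OrderStore` interface); the point is only that the rule is a pure function and the I/O boundary is a shallow interface that can be replaced without touching the rule.

```typescript
// Policy: a pure business rule with no I/O. The threshold is a made-up example.
function isEligibleForFreeShipping(orderTotalCents: number): boolean {
  return orderTotalCents >= 5000;
}

// Mechanism: the storage boundary, kept as a shallow interface so the backing
// implementation (DB, HTTP, in-memory) can be swapped or deleted.
interface OrderStore {
  getTotalCents(orderId: string): number;
}

// Thin glue combining the two; this is the only code that knows about both.
function shippingLabel(orderId: string, store: OrderStore): string {
  return isEligibleForFreeShipping(store.getTotalCents(orderId))
    ? "free"
    : "standard";
}

// Usage: tests exercise the policy through an in-memory store, no DB needed.
const inMemoryStore: OrderStore = {
  getTotalCents: (id) => (id === "big" ? 7500 : 1200),
};
console.log(shippingLabel("big", inMemoryStore), shippingLabel("small", inMemoryStore));
```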

The Three-Artifact Convention

Durable spec + decisions live in .agents/memory/ (not project root); the stepwise todo is tracked as a GitHub Issue:

.agents/memory/
├── spec-[feature-name].md       # what/why/constraints
└── decisions-[feature-name].md  # tradeoffs, rejected options, assumptions

GitHub Issue (one per feature)   # steps + verification commands (checklist body)

Naming: Use the feature/task name (e.g., user-auth, api-refactor) as the filename suffix and the issue title.
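The naming rule can be captured in one function that derives all three artifact locations from a feature name. This is a sketch: `artifactPaths` is a hypothetical helper, and the kebab-case check is an assumption based on the examples (`user-auth`, `api-refactor`), not a rule the skill states.

```typescript
interface ArtifactPaths {
  spec: string;
  decisions: string;
  issueTitle: string;
}

function artifactPaths(featureName: string): ArtifactPaths {
  // Assumption: feature names are kebab-case, like the examples in the text.
  if (!/^[a-z0-9]+(?:-[a-z0-9]+)*$/.test(featureName)) {
    throw new Error(`expected a kebab-case feature name, got "${featureName}"`);
  }
  return {
    spec: `.agents/memory/spec-${featureName}.md`,
    decisions: `.agents/memory/decisions-${featureName}.md`,
    issueTitle: featureName,
  };
}

console.log(artifactPaths("user-auth").spec); // .agents/memory/spec-user-auth.md
```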

Why memory/ + Issues:

Keeps project root clean
Durable spec/decisions stay in .agents/memory/ (the source of truth)
Active todos live in GitHub Issues, where work is tracked
Works with prd-task-creator and executing-plans skills
Persists across sessions

Agent Readiness Checklist (IMPACT)

Before running autonomous/agentic execution, verify:

| Dimension | Question | If No... |
|-----------|----------|----------|
| Intent | Do you have acceptance criteria and a test harness? | Don't run agent |
| Memory | Do you have durable artifacts (spec/todo) so it can resume? | It will thrash |
| Planning | Can it produce/update a plan with checkpoints? | It will improvise badly |
| Authority | Is what it can do restricted (edit, test, commit)? | Too risky |
| Control Flow | Does it decide next step based on tool output? | It's just generating blobs |
| Tools | Does it have minimum necessary tooling and nothing extra? | Attack surface too large |

Approve at meaningful checkpoints (end of todo item, after test suite passes), not every micro-step.
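The readiness checklist can be treated as a gate that reports which dimensions are not yet satisfied. In the sketch below the dimension names and consequences are copied from the checklist; the `readinessGaps` function and the yes/no answer map are hypothetical.

```typescript
type Dimension =
  | "Intent"
  | "Memory"
  | "Planning"
  | "Authority"
  | "Control Flow"
  | "Tools";

// Consequences copied from the "If No..." column of the checklist.
const CONSEQUENCES: Record<Dimension, string> = {
  Intent: "Don't run agent",
  Memory: "It will thrash",
  Planning: "It will improvise badly",
  Authority: "Too risky",
  "Control Flow": "It's just generating blobs",
  Tools: "Attack surface too large",
};

// Returns one line per dimension answered "no"; an empty list means ready.
function readinessGaps(answers: Record<Dimension, boolean>): string[] {
  return (Object.keys(CONSEQUENCES) as Dimension[])
    .filter((d) => !answers[d])
    .map((d) => `${d}: ${CONSEQUENCES[d]}`);
}
```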

Prompt Patterns

Authoritarian (for correctness):

Edit these files: [paths]
Interface: [exact signatures]
Acceptance criteria: [list]
Required tests: [list]
Don't change anything else.

Options and tradeoffs (for design):

Give me 3 options and a recommendation.
Make the recommendation conditional on constraints A/B/C.

Context discipline (for large codebases):

Only use the files I provided.
If you need more context, ask for a specific file and explain why.

Make it provable:

Add a test that fails on the buggy version and passes on the correct one.
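What "provable" means in practice: the same test is run against the buggy and the corrected implementation, and only the corrected one passes. The bug below (an off-by-one in an inclusive range sum) and all function names are invented for illustration.

```typescript
// Hypothetical buggy version: the loop excludes the upper bound.
function sumRangeBuggy(from: number, to: number): number {
  let total = 0;
  for (let i = from; i < to; i++) total += i;
  return total;
}

// Corrected version: the range is inclusive on both ends.
function sumRangeFixed(from: number, to: number): number {
  let total = 0;
  for (let i = from; i <= to; i++) total += i;
  return total;
}

// The regression test, parameterized over the implementation under test.
function inclusiveRangeTest(
  sumRange: (from: number, to: number) => number,
): boolean {
  return sumRange(1, 3) === 6; // 1 + 2 + 3
}

console.log(inclusiveRangeTest(sumRangeBuggy)); // false
console.log(inclusiveRangeTest(sumRangeFixed)); // true
```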

Output Format

When this skill activates, produce:

SPEC-FIRST WORKFLOW

STAGE A - FRAMING:
[3 approaches with tradeoffs]
[Recommendation]

STAGE B - SPEC:
[Draft spec.md content]

STAGE C - TODO:
[Draft todo.md with verification commands]

Ready to proceed to Stage D (execution)?

Enforces a spec → plan → execute → verify loop before writing code, preventing "looks right" failures. Activates on "build X", "implement...", "add a feature that...", or any multi-file/unclear-requirements request. Creates spec.md, todo.md, and decisions.md as durable artifacts.

30 stars

development

Updated Jun 24, 2026

Install this skill globally with one command:

npx skillsauth add shipshitdev/library spec-first

Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | % |
|---------|-------------|--------|---|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Jun 24, 2026, 5:20 AM (148.5s, 2 files scanned)

SKILL.md

name: spec-first
description: Enforces a spec → plan → execute → verify loop before writing code, preventing "looks right" failures. Activates on "build X", "implement...", "add a feature that...", or any multi-file/unclear-requirements request. Creates spec.md, todo.md, and decisions.md as durable artifacts.
version: 1.1.0
tags: specification, planning, execution, ears

Related Skills

shipshitdev/typescript-refactor

development

TypeScript refactoring and modernization guidelines from a principal specialist perspective. This skill should be used when refactoring, reviewing, or modernizing TypeScript code to ensure type safety, compiler performance, and idiomatic patterns. Triggers on tasks involving TypeScript type architecture, narrowing, generics, error handling, or migration to modern TypeScript features.

30 stars · SKILL.md · Updated Jul 14, 2026

shipshitdev/typescript-expert

tools

Resolves TypeScript and JavaScript problems across type-level programming, performance, monorepo management, migration, and modern tooling. Invoke when diagnosing "type instantiation excessively deep" errors, migrating JS to TS, configuring strict tsconfig, debugging module resolution, or choosing between Biome/ESLint/Turborepo/Nx.

30 stars · SKILL.md · Updated Jul 14, 2026

shipshitdev/turborepo

tools

Turborepo monorepo build system guidance. Triggers on: `turbo.json`, task pipelines, `dependsOn`, caching, remote cache, the `turbo` CLI, `--filter`, `--affected`, CI optimization, environment variables, internal packages, monorepo structure, and package boundaries. Use when the user configures tasks or workflows, creates packages, sets up a monorepo, shares code between apps, runs changed packages, debugs cache behavior, or works in an `apps/` plus `packages/` workspace.

30 stars · SKILL.md · Updated Jul 14, 2026

shipshitdev/tailwind

tools

Provides Tailwind CSS v4 performance optimization and best practices guidelines. Triggers when writing, reviewing, or refactoring Tailwind CSS v4 code; when working with Tailwind configuration, @theme directive, utility classes, responsive design, dark mode, container queries, or CSS generation optimization.

30 stars · SKILL.md · Updated Jul 14, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Install manually

# Clone the repo
git clone https://github.com/shipshitdev/library.git

# Copy into Claude Code skills folder (global)
cp -r library/bundles/ai-agents/skills/spec-first ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

shipshitdev/library

30 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT