skills/high-quality-vibe-coding/SKILL.md
Patterns for vibe coding that produce production-quality output instead of garbage. Covers prompt engineering for AI coding, verification without reading every line, accept/reject/refine decision framework, architectural guardrails, context management (CLAUDE.md, .cursor/rules), TDD-driven vibe coding, screenshot-driven development, diff review discipline, and the anti-patterns that destroy quality. Activate on: "vibe coding", "AI coding quality", "Claude Code workflow", "AI-assisted development", "how to use AI for coding", "coding with AI", "agentic engineering workflow", "AI code quality", "vibe code without garbage", "production quality AI code", "CLAUDE.md setup", "cursorrules setup", "pre-commit hooks for AI", "TDD with AI", "test-driven vibe coding". NOT for: multi-person collaborative coding sessions (use cooperative-vibe-coding), building AI/LLM applications (use ai-engineer), prompt engineering for non-coding LLM use cases (use prompt-engineer).
npx skillsauth add curiositech/windags-skills high-quality-vibe-coding
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
"There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists." — Andrej Karpathy, February 2025
Karpathy coined the term for throwaway weekend projects. One year later he declared it passé: LLM agents had become the default professional workflow. He now calls it agentic engineering: you orchestrate agents who write code while you provide oversight, architecture, and judgment.
The problem is that most people still vibe code the original way: Accept All, ignore diffs, copy-paste errors back in, hope for the best. The result is garbage. Studies report that AI co-authored code has 2.74x higher security vulnerability rates and 75% more misconfigurations, and that experienced developers are 19% slower when they use AI tools without discipline (while believing they are 24% faster).
This skill is about the discipline layer that makes vibe coding produce production-quality output. You are the architect. The AI is the builder.
cooperative-vibe-coding, ai-engineer, prompt-engineer, git-best-practices, vitest-testing-patterns
"AI should help us produce better code. Code that is better tested, better documented, code with better commit messages, and code that has been more thoroughly reviewed." - Simon Willison, Agentic Engineering Patterns, 2026
How you describe what you want determines what you get.
[CONTEXT] + [CONSTRAINT] + [EXAMPLE] + [TASK] + [VERIFICATION]
Bad: Build a user authentication system
Good:
Implement JWT auth middleware for our Express app.
Context: TypeScript strict mode, Express 5, Drizzle ORM.
Existing patterns in src/middleware/auth.ts. RS256 with env var keys.
Constraints: Handle expired tokens (401). Attach typed user to Request
(see src/types/express.d.ts). Use jose, no other auth libraries.
Write the middleware, then 3 tests:
1. Valid token -> req.user populated
2. Expired token -> 401 "Token expired"
3. Missing token -> 401 "No token provided"
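A prompt like the one above can also be assembled programmatically, so that no part of the formula is skipped. The following is a minimal sketch, not part of the skill itself: `PromptParts` and `buildPrompt` are hypothetical names, and the section order follows the "Good" example (task first) rather than the literal order of the formula.

```typescript
// Hypothetical helper: assembles the prompt parts in a fixed order.
interface PromptParts {
  task: string;
  context: string;
  constraints: string[];
  example?: string; // optional snippet from the codebase to imitate
  verification: string[]; // tests or checks the AI must also produce
}

function buildPrompt(parts: PromptParts): string {
  const sections: string[] = [
    parts.task,
    `Context: ${parts.context}`,
    `Constraints:\n${parts.constraints.map((c) => `- ${c}`).join("\n")}`,
  ];
  if (parts.example !== undefined) {
    sections.push(`Example to follow:\n${parts.example}`);
  }
  sections.push(
    `Verification:\n${parts.verification.map((v, i) => `${i + 1}. ${v}`).join("\n")}`,
  );
  return sections.join("\n\n");
}

const prompt = buildPrompt({
  task: "Implement JWT auth middleware for our Express app.",
  context: "TypeScript strict mode, Express 5, Drizzle ORM.",
  constraints: ["Handle expired tokens (401)", "Use jose, no other auth libraries"],
  verification: [
    "Valid token -> req.user populated",
    'Expired token -> 401 "Token expired"',
    'Missing token -> 401 "No token provided"',
  ],
});
console.log(prompt);
```

Making `verification` a required field is the design choice that matters here: a prompt without a stated check cannot be built at all.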
LLMs are exceptional mimics. Show an example from your codebase instead of describing the pattern in prose:
Here's how we write route handlers:
// src/routes/health.ts
export const healthRoute = createRoute({
method: 'GET',
path: '/health',
handler: async (req, res) => res.json({ status: 'ok', timestamp: Date.now() }),
schema: { response: healthResponseSchema },
});
Create a similar route for GET /api/users/:id using Drizzle patterns in src/db/queries/.
| Size | Result | Use When |
|------|--------|----------|
| 1-2 sentences | Ambiguous, many assumptions | Trivial tasks (rename, format) |
| 3-8 sentences | Focused, high quality | Most tasks (functions, components) |
| 1-2 paragraphs | Good if structured | Complex features with clear scope |
| Full page+ | Diminishing returns, contradictions | Almost never; break into smaller tasks |
"LLMs perform best when given focused prompts: implementing one function, fixing one bug, or adding one feature at a time." — Addy Osmani, 2026
Before any complex task, force a planning phase. In Claude Code:
Think through the architecture for adding WebSocket support.
Do NOT write any code yet. Give me: files to change, new files needed,
public API signatures, edge cases, test plan.
I will review your plan before we implement.
In Cursor: Shift+Tab twice for Plan Mode (prevents file modifications).
This eliminates the most common vibe coding failure: the AI building the wrong thing very quickly and confidently.
How do you verify AI output without reading every line? Build systems that verify for you.
| Layer | Catches | Setup | Run Time |
|-------|---------|-------|----------|
| Type checker (tsc --strict) | Wrong types, null errors | 10 min | 2-10 sec |
| Linter (ESLint, Biome) | Style violations, common bugs | 15 min | 1-5 sec |
| Formatter (Prettier, Biome) | Inconsistent formatting | 5 min | 1-3 sec |
| Unit tests (Vitest) | Logic errors, regressions | Ongoing | 5-30 sec |
| Visual inspection | UI bugs, layout issues | 0 | 10-30 sec |
| Diff review (human) | Design flaws, security issues | 0 | 2-10 min |
Minimum viable verification (catches 80% of AI mistakes in 15 seconds):
tsc --noEmit && eslint . && vitest run
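The `&&` chain above stops at the first failing command. The same fail-fast behaviour can be sketched as a small runner that also reports which layer failed. This is an illustrative sketch only: `Check`, `Runner`, and `runChecks` are hypothetical names, and the runner is injected so the example has no dependencies. In a real script it might wrap Node's `child_process.spawnSync`.

```typescript
// One verification layer: a label plus the shell command to run.
interface Check {
  name: string;
  command: string;
}

// Returns the exit code of a command. Injected (assumption) so the sketch
// stays dependency-free and can be exercised with a fake.
type Runner = (command: string) => number;

interface CheckReport {
  passed: boolean;
  failedAt: string | null; // name of the first failing check, if any
  ran: string[]; // checks that actually executed
}

// Runs checks in order and stops at the first non-zero exit code,
// mirroring `a && b && c` in the shell.
function runChecks(checks: Check[], run: Runner): CheckReport {
  const ran: string[] = [];
  for (const check of checks) {
    ran.push(check.name);
    if (run(check.command) !== 0) {
      return { passed: false, failedAt: check.name, ran };
    }
  }
  return { passed: true, failedAt: null, ran };
}

const checks: Check[] = [
  { name: "types", command: "tsc --noEmit" },
  { name: "lint", command: "eslint ." },
  { name: "tests", command: "vitest run" },
];

// Fake runner for illustration: pretend the linter fails.
const fakeRunner: Runner = (command) => (command.startsWith("eslint") ? 1 : 0);
const report = runChecks(checks, fakeRunner);
console.log(report); // failedAt is "lint"; "tests" never ran
```

Knowing the failing layer by name makes it easier to feed a precise error back to the AI instead of a whole terminal dump.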
TDD is the single most effective technique for high-quality AI coding. Tests are specifications — when you write the test first, you define exactly what "correct" means. The AI cannot cheat by writing tests that verify broken behavior.
"Automated tests are no longer optional when working with coding agents. The old excuses — that they're time consuming and expensive — no longer hold when an agent can knock them into shape in just a few minutes." — Simon Willison, 2026
The workflow:
Critical rule: If the AI writes both test AND implementation, review the tests with extra scrutiny. AI tends to write tests that assert "function returns whatever it returns" rather than "function returns what it should return."
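The difference between a tautological test and a specification test is easiest to see side by side. The sketch below uses a hypothetical `slugify` function and a tiny hand-rolled `expectEqual` helper so it runs without a test framework. In a real project the same assertions would live in a Vitest file.

```typescript
// Hypothetical function under test.
function slugify(title: string): string {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // any run of non-alphanumerics becomes one hyphen
    .replace(/^-+|-+$/g, ""); // strip leading and trailing hyphens
}

// Minimal assertion helper (assumption: stands in for a test framework).
function expectEqual<T>(actual: T, expected: T, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${String(expected)}, got ${String(actual)}`);
  }
}

// Tautological: the expected value is computed by the code under test,
// so this passes even if slugify is completely broken.
expectEqual(slugify("Hello World"), slugify("Hello World"), "tautology");

// Specification: expected values are written by hand, ideally before the
// implementation exists, and define what "correct" means.
expectEqual(slugify("Hello World"), "hello-world", "spaces become hyphens");
expectEqual(slugify("  Vibe Coding!  "), "vibe-coding", "trims and strips punctuation");
expectEqual(slugify("A -- B"), "a-b", "collapses separators");
```

When reviewing AI-written tests, the check is whether each expected value could have been written by someone who had never seen the implementation.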
Visual verification beats reading CSS diffs:
ALWAYS READ: Security code, public API surfaces, config changes, deleted code
SPOT-CHECK: Implementation logic, error handling, resource management
TRUST IF TESTS PASS: Formatting, import order, variable renames, boilerplate
Every piece of AI-generated code requires a decision.
AI generates code
-> Does it type-check? NO -> REJECT (feed error back)
-> Do tests pass? NO -> REJECT (feed failure back)
-> Does it do what I asked? NO -> REFINE (clarify requirements)
-> Is the approach sound? NO -> REFINE (suggest better approach)
-> Is it maintainable? NO -> REFINE ("simplify — a junior dev should grok this")
-> ACCEPT
REJECT when: Approach is fundamentally wrong. Unwanted dependency added. Files modified outside task scope. Hallucinated API. More complex than the problem warrants.
REFINE when: Approach is right but details are wrong. Missing error handling. Naming inconsistent. Test coverage incomplete. Ask the AI to justify or simplify.
ACCEPT when: Types, lint, and tests pass. You understand the architecture (not every line). Consistent with codebase patterns. Spot-check reveals nothing concerning.
Healthy accept rate: 60-80%. Above 90% means insufficient review. Below 50% means your prompts need work.
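The decision flow and the accept-rate thresholds above can be written down as two small functions. This is a sketch under stated assumptions: the field names in `ReviewResult` are hypothetical, and the "borderline" label for rates between the stated bands (50-60% and 80-90%) is an addition, since the text does not classify them.

```typescript
type Decision = "REJECT" | "REFINE" | "ACCEPT";

// Answers to the questions in the decision flow, in order.
interface ReviewResult {
  typeChecks: boolean;
  testsPass: boolean;
  doesWhatWasAsked: boolean;
  approachSound: boolean;
  maintainable: boolean;
}

// Mechanical failures reject; judgment failures refine; otherwise accept.
function decide(r: ReviewResult): Decision {
  if (!r.typeChecks || !r.testsPass) return "REJECT";
  if (!r.doesWhatWasAsked || !r.approachSound || !r.maintainable) return "REFINE";
  return "ACCEPT";
}

type RateSignal = "insufficient review" | "healthy" | "prompts need work" | "borderline";

// Classifies a session's accept rate against the thresholds in the text.
function acceptRateSignal(accepted: number, total: number): RateSignal {
  if (total <= 0) throw new RangeError("total must be positive");
  const rate = accepted / total;
  if (rate > 0.9) return "insufficient review";
  if (rate < 0.5) return "prompts need work";
  if (rate >= 0.6 && rate <= 0.8) return "healthy";
  return "borderline"; // assumption: the text leaves these bands unclassified
}

const allGood: ReviewResult = {
  typeChecks: true,
  testsPass: true,
  doesWhatWasAsked: true,
  approachSound: true,
  maintainable: true,
};
console.log(decide(allGood)); // ACCEPT
console.log(decide({ ...allGood, maintainable: false })); // REFINE
console.log(acceptRateSignal(7, 10)); // healthy
```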
Reset when the AI has gone 3+ iterations without converging, the approach is fundamentally wrong, or you have lost track of what the code does.
Iterate when the approach is correct but details are off, test failures are specific, and the AI is converging toward correct.
Heuristic: If your third feedback message is longer than your original prompt, reset. You have more context now — a fresh prompt will be better than continued patching.
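The reset rules can be made concrete with a small predicate. This is a rough sketch: `SessionState` and `shouldReset` are hypothetical names, message length is measured in characters (an assumption, since the text does not say how to compare lengths), and "lost track of what the code does" is left to human judgment.

```typescript
interface SessionState {
  originalPrompt: string;
  feedbackMessages: string[]; // follow-up messages, oldest first
  iterationsWithoutProgress: number; // rounds where the AI did not converge
}

// True when either stated trigger fires: 3+ non-converging iterations,
// or a third feedback message longer than the original prompt.
function shouldReset(s: SessionState): boolean {
  if (s.iterationsWithoutProgress >= 3) return true;
  const third = s.feedbackMessages[2];
  return third !== undefined && third.length > s.originalPrompt.length;
}

const session: SessionState = {
  originalPrompt: "Add pagination to GET /api/users.",
  feedbackMessages: [
    "Use cursor-based pagination.",
    "The cursor should be opaque.",
    "No, encode the cursor as base64 of the last id and createdAt, and also handle the empty page case.",
  ],
  iterationsWithoutProgress: 1,
};
console.log(shouldReset(session)); // true: third message outgrew the prompt
```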
Constraints that make it hard for AI to create a mess, even when you are not watching.
{
  "compilerOptions": {
    "strict": true,
    "noUncheckedIndexedAccess": true,
    "noImplicitReturns": true,
    "noFallthroughCasesInSwitch": true,
    "exactOptionalPropertyTypes": true
  }
}
strict: true is the single highest-ROI guardrail. AI generates code assuming values
exist that might be null, returns wrong types, accesses optional properties without
checks. The type checker catches all of this at compile time.
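To make the effect of these flags concrete, here is a short sketch of code written to satisfy them. The `User` type and both functions are hypothetical. The comments describe what the compiler would reject under the tsconfig above.

```typescript
interface User {
  id: string;
  name: string;
}

// With "noUncheckedIndexedAccess": true, users[0] has type User | undefined,
// so writing users[0].name directly is a compile error until the
// undefined case is handled.
function firstUserName(users: User[]): string {
  const first = users[0];
  if (first === undefined) {
    return "(no users)";
  }
  return first.name;
}

// With "noImplicitReturns": true, dropping the final return below is a
// compile error, because one code path would end without returning a value.
function statusLabel(code: number): string {
  if (code >= 500) return "server error";
  if (code >= 400) return "client error";
  return "ok";
}

console.log(firstUserName([])); // (no users)
console.log(firstUserName([{ id: "u1", name: "Ada" }])); // Ada
console.log(statusLabel(404)); // client error
```

AI-generated code frequently takes the shortcut of indexing without a check, which is why these flags pay off most on generated code.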
npm install -D husky lint-staged && npx husky init
// package.json
{
  "lint-staged": {
    "*.{ts,tsx}": ["biome check --write", "tsc-files --noEmit"],
    "*.{json,md}": ["biome format --write"]
  }
}
Every commit — human or AI — passes through formatting, linting, and type checking.
// .claude/settings.json
{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Write|Edit",
      "hooks": [{ "type": "command", "command": "npx biome check --write $CLAUDE_FILE_PATH" }]
    }],
    "PreToolUse": [{
      "matcher": "Bash\\(rm|Bash\\(git push --force|Bash\\(git reset --hard",
      "hooks": [{ "type": "command", "command": "echo 'BLOCKED: Destructive op' && exit 2" }]
    }]
  }
}
Auto-formats every file Claude writes. Blocks destructive operations (exit 2 = block).
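If hook matchers in your Claude Code version match only on the tool name, an alternative is to match `Bash` and let a small script inspect the command itself. The sketch below shows only the decision logic. It assumes the hook receives a JSON payload on stdin with `tool_name` and `tool_input.command` fields, so check those names against the hooks documentation before relying on them. `HookInput`, `DESTRUCTIVE_PATTERNS`, and `exitCodeFor` are hypothetical names.

```typescript
// Assumed shape of the PreToolUse payload for Bash calls (verify in the docs).
interface HookInput {
  tool_name?: string;
  tool_input?: { command?: string };
}

// Same three cases the matcher above targets. Deliberately broad:
// any rm, any forced push (including --force-with-lease), any hard reset.
const DESTRUCTIVE_PATTERNS: RegExp[] = [
  /\brm\s/,
  /\bgit\s+push\b.*--force\b/,
  /\bgit\s+reset\s+--hard\b/,
];

// Returns 2 (block) for destructive Bash commands, 0 (allow) otherwise.
function exitCodeFor(rawStdin: string): number {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawStdin);
  } catch {
    return 0; // unparseable input: do not block
  }
  if (typeof parsed !== "object" || parsed === null) return 0;
  const input = parsed as HookInput;
  if (input.tool_name !== "Bash") return 0;
  const command = input.tool_input?.command ?? "";
  return DESTRUCTIVE_PATTERNS.some((p) => p.test(command)) ? 2 : 0;
}

// Wiring (not executed here): read all of stdin, call exitCodeFor, write a
// short reason to stderr when blocking, then exit with the returned code.
const sample = JSON.stringify({
  tool_name: "Bash",
  tool_input: { command: "git reset --hard HEAD~1" },
});
console.log(exitCodeFor(sample)); // 2
```

Keeping the decision in a pure function means the blocking rules can be unit-tested like any other code, instead of being discovered the first time a destructive command slips through.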
AI coding quality is directly proportional to the context you provide.
Keep under 200 lines. For each line ask: "Would removing this cause the AI to make a mistake?" If not, cut it.
# CLAUDE.md
## Project Overview
[1-2 sentences: what it does, tech stack]
## Commands
pnpm dev / pnpm test / pnpm typecheck / pnpm lint
## Code Conventions
- TypeScript strict, no `any`
- Functional components with hooks
- Result<T, E> for expected errors, try/catch for unexpected
- camelCase vars/functions, PascalCase types/components
## Testing
- Colocated: foo.ts -> foo.test.ts
- describe blocks by function, prefer integration over mocks
## Patterns to Follow
- [Reference file for route handlers]
- [Reference file for components]
## Do NOT
- Add dependencies without asking
- Modify tsconfig.json or biome.json
- Use console.log — use src/lib/logger.ts
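The convention "Result<T, E> for expected errors, try/catch for unexpected" in the template above is easier for an AI to follow when the repository contains a reference implementation. A minimal sketch of what such a reference might look like follows. The `Result` shape, `ParsePortError`, and `parsePort` are illustrative assumptions, not definitions from this skill.

```typescript
// Expected failures are values, not exceptions.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

type ParsePortError = "empty" | "not-a-number" | "out-of-range";

// Bad user input is an expected error, so it is returned rather than thrown.
function parsePort(raw: string): Result<number, ParsePortError> {
  const trimmed = raw.trim();
  if (trimmed === "") return { ok: false, error: "empty" };
  if (!/^\d+$/.test(trimmed)) return { ok: false, error: "not-a-number" };
  const port = Number(trimmed);
  if (port < 1 || port > 65535) return { ok: false, error: "out-of-range" };
  return { ok: true, value: port };
}

// The caller must check `ok` before touching `value`; the compiler
// enforces this through the discriminated union.
const result = parsePort("8080");
if (result.ok) {
  console.log(`Listening on ${result.value}`);
} else {
  console.log(`Invalid port: ${result.error}`);
}
```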
.cursorrules is deprecated. Use individual .mdc files with frontmatter:
---
description: TypeScript coding standards
globs: ["src/**/*.ts", "src/**/*.tsx"]
alwaysApply: true
---
# TypeScript Standards
- strict: true, never use `any`
- Branded types for IDs: `type UserId = string & { __brand: 'UserId' }`
- Discriminated unions for errors, try/catch for unexpected only
- One component per file, named exports, colocated tests
Principles: One concern per file. Small and actionable. Concrete code samples over abstract descriptions. Explicit globs to scope rules.
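The branded-ID rule in the example above is the kind of convention that benefits from a concrete sample. The sketch below expands the one-line `UserId` type into a usable pattern. `OrderId`, the constructor functions, and `describeUser` are hypothetical additions for illustration.

```typescript
// The brand exists only at the type level; at runtime these are plain strings.
type UserId = string & { __brand: "UserId" };
type OrderId = string & { __brand: "OrderId" };

// Constructors are the only places that cast, so validation lives here.
function toUserId(raw: string): UserId {
  if (raw.trim() === "") throw new Error("UserId must be non-empty");
  return raw as UserId;
}

function toOrderId(raw: string): OrderId {
  if (raw.trim() === "") throw new Error("OrderId must be non-empty");
  return raw as OrderId;
}

function describeUser(id: UserId): string {
  return `user:${id}`;
}

const userId = toUserId("u_123");
const orderId = toOrderId("o_9");
console.log(describeUser(userId)); // user:u_123

// Neither of these compiles, which is the point of the brand:
// describeUser(orderId);
// describeUser("u_123");
console.log(typeof orderId); // string
```

Mixing up two string IDs is a common class of AI-generated bug, and the brand turns it into a compile error rather than a runtime surprise.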
Well-structured projects produce better AI output without any CLAUDE.md:
src/
  routes/users.ts        # AI infers: HTTP route handlers
  routes/users.test.ts   # AI infers: colocated tests
  db/schema.ts           # AI infers: database schema
  db/queries/            # AI infers: query functions
  lib/logger.ts          # AI infers: logging utility
  types/index.ts         # AI infers: shared types
Name clearly. Group related things. Colocate tests. The AI learns from your code.
Prompt -> Generate (~10s) -> Verify: types+tests+visual (~15s) -> Accept/Feedback (~5s)
Quality degrades as conversation grows. At ~70% context, precision drops. At 85%+, hallucinations increase.
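The two thresholds above can be turned into a simple signal for when to start a fresh session. This is a sketch only: `contextHealth` is a hypothetical helper, the token counts are assumed to come from whatever usage indicator your tool exposes, and the cut-offs are the approximate figures from the text rather than measured limits.

```typescript
type ContextHealth = "ok" | "precision-dropping" | "hallucination-risk";

// Maps context-window usage onto the rough bands described in the text.
function contextHealth(usedTokens: number, windowTokens: number): ContextHealth {
  if (windowTokens <= 0) throw new RangeError("windowTokens must be positive");
  const ratio = usedTokens / windowTokens;
  if (ratio >= 0.85) return "hallucination-risk";
  if (ratio >= 0.7) return "precision-dropping";
  return "ok";
}

console.log(contextHealth(50_000, 200_000)); // ok
console.log(contextHealth(150_000, 200_000)); // precision-dropping
console.log(contextHealth(180_000, 200_000)); // hallucination-risk
```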
/clear in Claude Code between unrelated tasks
.cursor/rules/ with scoped .mdc files
Whatever your language's equivalent of "strict types + linter + formatter + tests" is, set it up. Python: mypy --strict + ruff + pytest. Go: the compiler + golangci-lint + go test. These four pillars are universal.
Accepting every AI output without verifying. AI generates plausible code with subtle logic errors, security holes, and wrong assumptions that accumulate silently. Fix: Minimum viable verification: type check + lint + test. Fifteen seconds.
One prompt asking for an entire feature with auth, DB, API, frontend, and tests. LLMs lose coherence on long outputs — the 50th function is dramatically worse than the 5th. Fix: 5-10 focused prompts, each independently verifiable.
Pasting errors to AI with "fix this," no context. Each surface-level patch introduces new issues. After 5 rounds the code is a patchwork of bandaids. Fix: Understand the root cause. Tell AI: "Error is X, cause is Y, fix by Z."
JavaScript (not TypeScript), no tests, any everywhere. Without guardrails there is no
automated way to verify correctness.
Fix: 30 min setup: tsconfig strict + Biome + Vitest. Pays back on every AI interaction.
No project conventions documented. AI reinvents patterns on every prompt — callbacks Monday, promises Tuesday, async/await Wednesday. Fix: Write a CLAUDE.md. Create one reference implementation per pattern.
"Build me a full-stack app with auth, payments, and a dashboard." The AI makes hundreds of architectural decisions you did not review. Fix: Delegate implementation, not architecture. You decide schema, auth approach, API design. AI implements within your architecture.
TypeScript and linter warnings ignored because "it works." AI-generated code with warnings is statistically more likely to have logic errors in the same area. Fix: Treat warnings as errors in CI. Clean them as they appear.
Before the session:
During the session:
After the session:
// TODO: fix later hacks - file tickets for follow-ups
"If an LLM wrote the code for you, and you then reviewed it, tested it thoroughly and made sure you could explain how it works to someone else - that's not vibe coding, it's software development." - Simon Willison, 2025
The goal is not to avoid AI. It is to use AI so effectively that the code it produces is better than what you would have written by hand — better tested, more consistent, produced in a fraction of the time. That requires discipline. This skill is that discipline.