.cursor/skills/ai-regression-testing/SKILL.md
Regression testing strategies for AI-assisted development. Sandbox-mode API testing without database dependencies, automated bug-check workflows, and patterns to catch AI blind spots where the same model writes and reviews code.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add LUAgam/stage-harness ai-regression-testing
Testing patterns specifically designed for AI-assisted development, where the same model writes code and reviews it — creating systematic blind spots that only automated tests can catch.
Use this skill when running /bug-check or similar review commands after code changes.
When an AI writes code and then reviews its own work, it carries the same assumptions into both steps. This creates a predictable failure pattern:
AI writes fix → AI reviews fix → AI says "looks correct" → Bug still exists
Real-world example (observed in production):
Fix 1: Added notification_settings to API response
→ Forgot to add it to the SELECT query
→ AI reviewed and missed it (same blind spot)
Fix 2: Added it to SELECT query
→ TypeScript build error (column not in generated types)
→ AI reviewed Fix 1 but didn't catch the SELECT issue
Fix 3: Changed to SELECT *
→ Fixed production path, forgot sandbox path
→ AI reviewed and missed it AGAIN (4th occurrence)
Fix 4: Test caught it instantly on first run ✅
The pattern: sandbox/production path inconsistency was the most frequent AI-introduced regression in this project.
Most projects with AI-friendly architecture have a sandbox/mock mode. This is the key to fast, DB-free API testing.
// vitest.config.ts
import { defineConfig } from "vitest/config";
import path from "path";
export default defineConfig({
  test: {
    environment: "node",
    globals: true,
    include: ["__tests__/**/*.test.ts"],
    setupFiles: ["__tests__/setup.ts"],
  },
  resolve: {
    alias: {
      "@": path.resolve(__dirname, "."),
    },
  },
});
// __tests__/setup.ts
// Force sandbox mode — no database needed
process.env.SANDBOX_MODE = "true";
process.env.NEXT_PUBLIC_SUPABASE_URL = "";
process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY = "";
// __tests__/helpers.ts
import { NextRequest } from "next/server";
export function createTestRequest(
  url: string,
  options?: {
    method?: string;
    body?: Record<string, unknown>;
    headers?: Record<string, string>;
    sandboxUserId?: string;
  },
): NextRequest {
  const { method = "GET", body, headers = {}, sandboxUserId } = options || {};
  const fullUrl = url.startsWith("http") ? url : `http://localhost:3000${url}`;
  const reqHeaders: Record<string, string> = { ...headers };

  if (sandboxUserId) {
    reqHeaders["x-sandbox-user-id"] = sandboxUserId;
  }

  const init: { method: string; headers: Record<string, string>; body?: string } = {
    method,
    headers: reqHeaders,
  };

  if (body) {
    init.body = JSON.stringify(body);
    reqHeaders["content-type"] = "application/json";
  }

  return new NextRequest(fullUrl, init);
}

export async function parseResponse(response: Response) {
  const json = await response.json();
  return { status: response.status, json };
}
The key principle: write tests for bugs that were found, not for code that works.
// __tests__/api/user/profile.test.ts
import { describe, it, expect } from "vitest";
import { createTestRequest, parseResponse } from "../../helpers";
import { GET, PATCH } from "@/app/api/user/profile/route";
// Define the contract — what fields MUST be in the response
const REQUIRED_FIELDS = [
  "id",
  "email",
  "full_name",
  "phone",
  "role",
  "created_at",
  "avatar_url",
  "notification_settings", // ← Added after bug found it missing
];

describe("GET /api/user/profile", () => {
  it("returns all required fields", async () => {
    const req = createTestRequest("/api/user/profile");
    const res = await GET(req);
    const { status, json } = await parseResponse(res);

    expect(status).toBe(200);
    for (const field of REQUIRED_FIELDS) {
      expect(json.data).toHaveProperty(field);
    }
  });

  // Regression test: this exact bug was introduced by AI 4 times
  it("notification_settings is not undefined (BUG-R1 regression)", async () => {
    const req = createTestRequest("/api/user/profile");
    const res = await GET(req);
    const { json } = await parseResponse(res);

    expect("notification_settings" in json.data).toBe(true);
    const ns = json.data.notification_settings;
    expect(ns === null || typeof ns === "object").toBe(true);
  });
});
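The loop above stops at the first missing field, so a run with several omissions reports them one at a time. A small helper that collects every missing field in one pass can make the failure message more useful. This is only a sketch: `findMissingFields` and the sample object are hypothetical and not part of the original skill.

```typescript
// Hypothetical helper (not part of the original skill): returns every required
// field that is absent or undefined, instead of failing on the first one.
function findMissingFields(
  data: Record<string, unknown>,
  required: readonly string[],
): string[] {
  return required.filter((field) => !(field in data) || data[field] === undefined);
}

// A null value counts as present; only a missing key or undefined is reported.
const sample = { id: "u1", email: "a@example.com", notification_settings: null };
const missing = findMissingFields(sample, ["id", "email", "phone", "notification_settings"]);
console.log(missing); // [ 'phone' ]
```

In a Vitest test this could be used as `expect(findMissingFields(json.data, REQUIRED_FIELDS)).toEqual([])`, which prints the full list of missing fields on failure.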
The most common AI regression: fixing production path but forgetting sandbox path (or vice versa).
// Test that sandbox responses match the expected contract
describe("GET /api/user/messages (conversation list)", () => {
  it("includes partner_name in sandbox mode", async () => {
    const req = createTestRequest("/api/user/messages", {
      sandboxUserId: "user-001",
    });
    const res = await GET(req);
    const { json } = await parseResponse(res);

    // This caught a bug where partner_name was added
    // to production path but not sandbox path.
    // Note: the check passes vacuously if the sandbox returns no conversations.
    if (json.data.length > 0) {
      for (const conv of json.data) {
        expect("partner_name" in conv).toBe(true);
      }
    }
  });
});
<!-- .claude/commands/bug-check.md -->
# Bug Check
## Step 1: Automated Tests (mandatory, cannot skip)
Run these commands FIRST before any code review:
npm run test # Vitest test suite
npm run build # TypeScript type check + build
- If tests fail → report as highest priority bug
- If build fails → report type errors as highest priority
- Only proceed to Step 2 if both pass
## Step 2: Code Review (AI review)
1. Sandbox / production path consistency
2. API response shape matches frontend expectations
3. SELECT clause completeness
4. Error handling with rollback
5. Optimistic update race conditions
## Step 3: For each bug fixed, propose a regression test
User: "Run a bug check" (or "/bug-check")
  │
  ├─ Step 1: npm run test
  │   ├─ FAIL → Bug found mechanically (no AI judgment needed)
  │   └─ PASS → Continue
  │
  ├─ Step 2: npm run build
  │   ├─ FAIL → Type error found mechanically
  │   └─ PASS → Continue
  │
  ├─ Step 3: AI code review (with known blind spots in mind)
  │   └─ Findings reported
  │
  └─ Step 4: For each fix, write a regression test
      └─ Next bug-check catches if fix breaks
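The mechanical part of this workflow (run the tests, then the build, and stop at the first failure) can be sketched as a small function. The names `Step`, `GateResult`, and `runGate` are hypothetical, and the command runner is injected so the ordering logic can be checked without spawning real processes.

```typescript
// Sketch of the mechanical gate: run steps in order, stop at the first failure.
// `Step`, `GateResult`, and `runGate` are hypothetical names for illustration.
type Step = { name: string; command: string };
type GateResult = { passed: boolean; failedStep?: string; executed: string[] };

function runGate(steps: Step[], run: (command: string) => number): GateResult {
  const executed: string[] = [];
  for (const step of steps) {
    executed.push(step.name);
    // A non-zero exit code ends the gate before any later step runs.
    if (run(step.command) !== 0) {
      return { passed: false, failedStep: step.name, executed };
    }
  }
  return { passed: true, executed };
}

const steps: Step[] = [
  { name: "test", command: "npm run test" },
  { name: "build", command: "npm run build" },
];

// Fake runner for illustration: pretend the test suite fails.
const result = runGate(steps, (command) => (command === "npm run test" ? 1 : 0));
console.log(result.failedStep, result.executed);
```

In a real script, `run` could wrap Node's `child_process.spawnSync` and return its exit status; that wiring is left out here and is an assumption about how the project would invoke the commands.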
Pattern: Sandbox/production mismatch
Frequency: Most common (observed in 3 out of 4 regressions)
// ❌ AI adds field to production path only
if (isSandboxMode()) {
  return { data: { id, email, name } }; // Missing new field
}
// Production path
return { data: { id, email, name, notification_settings } };

// ✅ Both paths must return the same shape
if (isSandboxMode()) {
  return { data: { id, email, name, notification_settings: null } };
}
return { data: { id, email, name, notification_settings } };
Test to catch it:
it("sandbox and production return same fields", async () => {
  // In test env, sandbox mode is forced ON
  const res = await GET(createTestRequest("/api/user/profile"));
  const { json } = await parseResponse(res);

  for (const field of REQUIRED_FIELDS) {
    expect(json.data).toHaveProperty(field);
  }
});
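The test above only exercises the sandbox path, because sandbox mode is forced on in the test environment. One way to keep the production path in step is to route both branches through a single typed builder, so that adding a field to the type becomes a compile error wherever it is not supplied. The sketch below assumes this approach; `ProfileData`, `buildProfileResponse`, `getProfile`, and the sample values are hypothetical.

```typescript
// Sketch: one response type and one builder shared by both paths.
// All names and sample values here are hypothetical.
type NotificationSettings = Record<string, boolean>;

interface ProfileData {
  id: string;
  email: string;
  name: string;
  notification_settings: NotificationSettings | null;
}

function buildProfileResponse(profile: ProfileData): { data: ProfileData } {
  return { data: { ...profile } };
}

function getProfile(isSandbox: boolean): { data: ProfileData } {
  if (isSandbox) {
    // Omitting notification_settings here would fail type checking.
    return buildProfileResponse({
      id: "user-001",
      email: "sandbox@example.com",
      name: "Sandbox User",
      notification_settings: null,
    });
  }
  return buildProfileResponse({
    id: "user-123",
    email: "real@example.com",
    name: "Real User",
    notification_settings: { email: true },
  });
}

// Both branches expose the same set of keys.
const sandboxKeys = Object.keys(getProfile(true).data).sort();
const productionKeys = Object.keys(getProfile(false).data).sort();
console.log(sandboxKeys.join(",") === productionKeys.join(",")); // true
```

The type check catches the mismatch at build time, which fits the workflow's rule that `npm run build` runs before any AI review.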
Pattern: SELECT clause omission
Frequency: Common with Supabase/Prisma when adding new columns
// ❌ New column added to response but not to SELECT
const { data } = await supabase
  .from("users")
  .select("id, email, name") // notification_settings not here
  .single();
return { data: { ...data, notification_settings: data.notification_settings } };
// → notification_settings is always undefined

// ✅ Use SELECT * or explicitly include new columns
const { data } = await supabase
  .from("users")
  .select("*")
  .single();
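If `SELECT *` is not desirable, an alternative is to keep one column list and derive the SELECT string from it, so the query and the test contract cannot drift apart. This is a sketch under that assumption; `PROFILE_COLUMNS` and `selectClause` are hypothetical names.

```typescript
// Sketch: a single column list drives both the SELECT clause and the test
// contract. `PROFILE_COLUMNS` and `selectClause` are hypothetical names.
const PROFILE_COLUMNS = ["id", "email", "name", "notification_settings"] as const;

function selectClause(columns: readonly string[]): string {
  return columns.join(", ");
}

// The resulting string would be passed to the query builder's select call.
const clause = selectClause(PROFILE_COLUMNS);
console.log(clause); // id, email, name, notification_settings
```

A test can then iterate over the same `PROFILE_COLUMNS` constant when asserting response fields. Note that passing a computed string to a typed query builder may weaken its column type inference, so this trades some static typing for a single source of truth.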
Pattern: Error state leakage
Frequency: Moderate, when adding error handling to existing components
// ❌ Error state set but old data not cleared
catch (err) {
  setError("Failed to load");
  // reservations still shows data from previous tab!
}

// ✅ Clear related state on error
catch (err) {
  setReservations([]); // Clear stale data
  setError("Failed to load");
}
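To assert state cleanup on error without rendering a component, the data and error fields can be updated through one pure reducer. The sketch below assumes that refactor; `ReservationState`, `ReservationAction`, and `reduceReservations` are hypothetical names.

```typescript
// Sketch: data and error live in one state object and change together, so the
// error transition cannot leave stale rows behind. All names are hypothetical.
type Reservation = { id: string };
type ReservationState = { reservations: Reservation[]; error: string | null };
type ReservationAction =
  | { type: "loaded"; reservations: Reservation[] }
  | { type: "failed"; message: string };

function reduceReservations(
  _state: ReservationState,
  action: ReservationAction,
): ReservationState {
  switch (action.type) {
    case "loaded":
      return { reservations: action.reservations, error: null };
    case "failed":
      // Clear data from the previous view along with setting the error.
      return { reservations: [], error: action.message };
  }
}

const before: ReservationState = { reservations: [{ id: "r1" }], error: null };
const after = reduceReservations(before, { type: "failed", message: "Failed to load" });
console.log(after.reservations.length, after.error); // 0 Failed to load
```

Because the reducer is a plain function, the regression test is a one-line assertion on its return value.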
Pattern: Missing rollback
// ❌ No rollback on failure
const handleRemove = async (id: string) => {
  setItems(prev => prev.filter(i => i.id !== id));
  await fetch(`/api/items/${id}`, { method: "DELETE" });
  // If API fails, item is gone from UI but still in DB
};

// ✅ Capture previous state and rollback on failure
const handleRemove = async (id: string) => {
  const prevItems = [...items];
  setItems(prev => prev.filter(i => i.id !== id));
  try {
    const res = await fetch(`/api/items/${id}`, { method: "DELETE" });
    if (!res.ok) throw new Error("API error");
  } catch {
    setItems(prevItems); // Rollback
    alert("Failed to delete");
  }
};
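The rollback can be tested directly if the optimistic-remove logic is pulled out of the component. The sketch below assumes that extraction; `removeOptimistically` and its parameters are hypothetical, and `deleteItem` stands in for the `fetch` call.

```typescript
// Sketch: optimistic removal with rollback as a standalone function.
// `removeOptimistically` and its parameters are hypothetical names.
type Item = { id: string };

async function removeOptimistically(
  items: Item[],
  id: string,
  setItems: (next: Item[]) => void,
  deleteItem: (id: string) => Promise<boolean>,
): Promise<boolean> {
  const previous = [...items];
  setItems(items.filter((item) => item.id !== id)); // optimistic update
  try {
    const ok = await deleteItem(id);
    if (!ok) throw new Error("API error");
    return true;
  } catch {
    setItems(previous); // rollback to the captured state
    return false;
  }
}

// Simulate a failing API and record every state the UI would have seen.
const history: Item[][] = [];
const done = removeOptimistically(
  [{ id: "a" }, { id: "b" }],
  "a",
  (next) => history.push(next),
  async () => false,
);
done.then((succeeded) => console.log(succeeded, history.length));
```

With a failing `deleteItem`, `history` records two states: the optimistic list without the item, then the restored original list.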
Don't aim for 100% coverage. Instead:
Bug found in /api/user/profile → Write test for profile API
Bug found in /api/user/messages → Write test for messages API
Bug found in /api/user/favorites → Write test for favorites API
No bug in /api/user/notifications → Don't write test (yet)
Why this works with AI development:
| AI Regression Pattern | Test Strategy | Priority |
|---|---|---|
| Sandbox/production mismatch | Assert same response shape in sandbox mode | 🔴 High |
| SELECT clause omission | Assert all required fields in response | 🔴 High |
| Error state leakage | Assert state cleanup on error | 🟡 Medium |
| Missing rollback | Assert state restored on API failure | 🟡 Medium |
| Type cast masking null | Assert field is not undefined | 🟡 Medium |
DO:
DON'T: