skills/dev-ai-based-test/SKILL.md
AI-based testing via subagent + a per-task test-flow skill. Use when the user wants to verify something that mechanical assertions can't fully capture — image recognition, visual size/position comparison, animation smoothness, multi-step manual flows that need AI judgment. Triggers: 'AI-based test', 'AI test', 'visual verify', 'image recognition test', 'manual operation test', 'human-eye check', 'verify visually', 'compare screenshots', 'looks the same', 'looks correct'. The skill's job is to (1) author a focused test-flow skill that captures the exact procedure + verdict criteria, then (2) dispatch a verification subagent via the Agent tool that loads BOTH the test-flow skill AND a browser-driving skill (/verify-ui primary, /headless-browser fallback) so the subagent has clear context and consistent verdicts. NEVER uses `claude -p` — subagent dispatch goes through the Agent tool exclusively.
AI-based testing for things that can't be cleanly asserted mechanically: image recognition, visual size / position parity, animation correctness, and multi-step manual flows where a human eye would catch the bug but `assertEqual` won't.
The deliverable is not just "run a test" — it's a reusable, focused test-flow skill that captures the test procedure with clear context, plus a dispatched verification subagent that loads that skill alongside a browser-driving skill.
Skip this skill when a mechanical check is enough:

- Deterministic computed-style or CSS / layout checks: use `/verify-ui` directly, no subagent needed.
- Checks a regular e2e test can assert: write a `.spec.ts` in `e2e/` and run it via `pnpm exec playwright test`. No AI judgment required.

**Never `claude -p`.** The subagent dispatch in this skill uses the Agent tool (the same tool the main agent uses to spawn `subagent_type: general-purpose`, `Plan`, `Explore`, etc.). Never `claude -p`, and never a subprocess shell invocation. The reasons matter:

- `claude -p` produces stdout text that the parent has to re-parse and interpret.
- `claude -p` starts a fresh process that may not see the project skills the parent can.
- `claude -p` is opaque: if it stalls or fails, the parent gets no clean error signal.

If you find yourself reaching for `claude -p` for a subagent dispatch, stop and use the Agent tool instead.
The skill has two halves: author the test-flow skill, then dispatch the verification subagent.
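A test-flow skill is a small, focused skill at `$HOME/.claude/skills/test-flow-<topic>/SKILL.md` (or project-local `.claude/skills/test-flow-<topic>/SKILL.md`) that captures:

- **Scenario**: the exact steps to drive the UI.
- **Measurements**: what to read and how to read it.
- **Verdict**: the pass/fail criteria, including the tolerance.
- **Output schema**: the fields the subagent must return.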
The skill is per-task, not per-app. A single project will accumulate multiple test-flow skills as different tests are needed.
Authoring conventions:

- Name: `test-flow-<short-topic-slug>` (e.g. `test-flow-composer-image-same-size`, `test-flow-animation-frame-pacing`).
- Output fields: name every field the subagent must return (e.g. `pgenImageWidth`, `composerImageWidth`, `ratio`, `verdict`, `summary`).

Use the `skill-creator` skill's `init_skill.py` to scaffold the new test-flow skill, then write its body. Format it with `pnpm dlx @takazudo/mdx-formatter --write <path-to-SKILL.md>`.
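The naming and location rules could be captured in a small helper so the parent always derives the same skill name and path for a given topic. This is a minimal TypeScript sketch: `toSlug`, `testFlowSkillName`, and `testFlowSkillPath` are hypothetical names, and the slug rule (lowercase, each run of non-alphanumerics collapsed into one hyphen) is an assumption, since the skill only gives example slugs.

```typescript
// Hypothetical helpers: the slug rule below is an assumption, since the skill
// only shows example names such as test-flow-composer-image-same-size.
type Scope = "personal" | "project";

function toSlug(topic: string): string {
  return topic
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse each run of non-alphanumerics into one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
}

function testFlowSkillName(topic: string): string {
  return `test-flow-${toSlug(topic)}`;
}

// Personal skills live under $HOME; project skills under the repo's .claude/ dir.
function testFlowSkillPath(topic: string, scope: Scope, home: string): string {
  const name = testFlowSkillName(topic);
  return scope === "personal"
    ? `${home}/.claude/skills/${name}/SKILL.md`
    : `.claude/skills/${name}/SKILL.md`;
}

console.log(testFlowSkillName("Composer image same size"));
// prints "test-flow-composer-image-same-size"
console.log(testFlowSkillPath("Animation frame pacing", "project", "/home/me"));
// prints ".claude/skills/test-flow-animation-frame-pacing/SKILL.md"
```

The home directory is passed in explicitly (for example the value of `$HOME`) so the helper stays free of environment access and is easy to test.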
After the test-flow skill is written, dispatch a subagent via the Agent tool:
```ts
Agent({
  subagent_type: "general-purpose", // browser-driving + structured output, no specialty needed
  description: "<short description>",
  prompt: `<self-contained brief — see template below>`,
})
```
The subagent's prompt must include:

- An instruction to load `/test-flow-<topic>` (the just-authored skill) AND a browser-driving skill: `/verify-ui` for computed-style / screenshot comparison, OR `/headless-browser` for multi-step interactive flows.

Prompt template:

```markdown
You are a verification subagent. Produce a structured verdict using the test-flow skill below.

## Goal

{one-sentence verdict goal, e.g. "Determine whether the composer-side image visually matches the pgen-side image at default landing viewport."}

## Skills to load

- /test-flow-<topic>: the test procedure and verdict criteria. Read this first.
- /verify-ui: primary browser-driving skill (computed-styles + screenshots).
- /headless-browser: fallback if /verify-ui doesn't fit the task shape.

## Inputs

- Preview URL: <resolved URL, passed from the parent>
- Fixture: <path or asset reference>
- Viewport: <e.g. 1440x900>
- Any other per-run knobs the test-flow skill expects

## Output contract

Return a structured result message containing exactly these fields:

{ <list each field from the test-flow skill's output schema> }

Plus a `summary` field with a one-line human-readable verdict.

## Don'ts

- Don't improvise the test procedure: follow /test-flow-<topic> exactly.
- Don't change the verdict tolerance: it's locked in /test-flow-<topic>.
- Don't post anywhere: return the result to me; I (the parent agent) handle posting.
```
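Because the brief has to be self-contained, the parent may find it easier to fill the template programmatically than by hand. A rough sketch, assuming a hypothetical `VerificationBrief` input shape; the output mirrors the template's sections and would be passed as the `prompt` argument of the Agent call.

```typescript
// Hypothetical input shape for one dispatch; field names are illustrative.
interface VerificationBrief {
  topic: string; // the <topic> part of /test-flow-<topic>
  goal: string; // one-sentence verdict goal
  previewUrl: string; // resolved by the parent before dispatch
  fixture: string; // path or asset reference
  viewport: string; // e.g. "1440x900"
  outputFields: string[]; // copied from the test-flow skill's output schema
}

// Fills the prompt template section by section, so the brief is self-contained.
function buildVerificationPrompt(b: VerificationBrief): string {
  const skill = `/test-flow-${b.topic}`;
  return [
    "You are a verification subagent. Produce a structured verdict using the test-flow skill below.",
    "## Goal",
    b.goal,
    "## Skills to load",
    [
      `- ${skill}: the test procedure and verdict criteria. Read this first.`,
      "- /verify-ui: primary browser-driving skill (computed-styles + screenshots).",
      "- /headless-browser: fallback if /verify-ui doesn't fit the task shape.",
    ].join("\n"),
    "## Inputs",
    [`- Preview URL: ${b.previewUrl}`, `- Fixture: ${b.fixture}`, `- Viewport: ${b.viewport}`].join("\n"),
    "## Output contract",
    "Return a structured result message containing exactly these fields:",
    `{ ${b.outputFields.join(", ")} }`,
    "Plus a `summary` field with a one-line human-readable verdict.",
    "## Don'ts",
    [
      `- Don't improvise the test procedure: follow ${skill} exactly.`,
      `- Don't change the verdict tolerance: it's locked in ${skill}.`,
      "- Don't post anywhere: return the result to me; I (the parent agent) handle posting.",
    ].join("\n"),
  ].join("\n\n");
}
```

Generating the brief this way keeps the skill name consistent across the "Skills to load" and "Don'ts" sections, which is where hand-edited prompts tend to drift.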
The parent agent receives the structured result and decides what to do with it: post to a PR comment, write to an evidence file, gate a workflow step, etc. The test-flow skill stays on disk for reuse — next time the same test class is needed, the existing skill is invoked without re-authoring.
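One way the parent might check the returned message before acting on it, assuming the subagent returns its structured result as a JSON object in the message text (an assumption; the skill only requires a structured result). `parseResult` and `shouldProceed` are hypothetical names, and gating a workflow step on PASS is just one possible policy.

```typescript
// Outcome of checking the subagent's reply; a hypothetical shape.
type ParseOutcome =
  | { ok: true; result: Record<string, unknown> }
  | { ok: false; error: string };

// Assumes the reply text is a JSON object; `required` lists the
// test-specific fields from the test-flow skill's output schema.
function parseResult(text: string, required: string[]): ParseOutcome {
  let data: unknown;
  try {
    data = JSON.parse(text);
  } catch {
    return { ok: false, error: "result is not valid JSON" };
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return { ok: false, error: "result is not a JSON object" };
  }
  const obj = data as Record<string, unknown>;
  // verdict and summary are always required, on top of the test-specific fields.
  const missing = [...required, "verdict", "summary"].filter((f) => !(f in obj));
  if (missing.length > 0) {
    return { ok: false, error: `missing fields: ${missing.join(", ")}` };
  }
  if (obj.verdict !== "PASS" && obj.verdict !== "FAIL") {
    return { ok: false, error: `unexpected verdict: ${String(obj.verdict)}` };
  }
  return { ok: true, result: obj };
}

// One possible gate: only a well-formed PASS lets the workflow step continue.
function shouldProceed(outcome: ParseOutcome): boolean {
  return outcome.ok && outcome.result.verdict === "PASS";
}
```

Rejecting an incomplete result up front keeps a malformed reply from being posted to a PR as if it were a verdict.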
| Skill | Best for | When to fall back |
|---|---|---|
| /verify-ui | Deterministic computed-style checks; pure pgen-vs-composer parity; CSS / layout assertions | Cannot drive multi-step UI flows beyond single-page reads |
| /headless-browser | Multi-step interactive flows (drag-drop a file, click → screenshot → click → screenshot); element bounding-rect reads via Playwright CLI | Slightly heavier; only use when /verify-ui can't reach the test surface |
The test-flow skill should name BOTH so the subagent can pick one based on the task shape. If `/verify-ui` returns "cannot perform this flow", the subagent switches to `/headless-browser` on its own, without re-prompting the parent.
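The primary/fallback rule can be sketched as follows. `TaskShape`, `Runner`, and `runWithFallback` are hypothetical; the runner stands in for whatever actually drives the browser skill and is assumed to report `cannot-perform` when a skill can't reach the test surface. The returned `toolUsed` lines up with the field of the same name in the example output schema.

```typescript
type BrowserSkill = "verify-ui" | "headless-browser";

// Hypothetical description of the test surface.
interface TaskShape {
  multiStep: boolean; // e.g. drag-drop a file, then click and screenshot repeatedly
}

// Stand-in for whatever actually drives a browser skill (an assumption):
// it either finishes or reports that the skill cannot perform this flow.
type RunOutcome = { status: "done"; result: string } | { status: "cannot-perform" };
type Runner = (skill: BrowserSkill) => RunOutcome;

// Per the table: multi-step interactive flows go to /headless-browser, the rest to /verify-ui.
function primarySkill(shape: TaskShape): BrowserSkill {
  return shape.multiStep ? "headless-browser" : "verify-ui";
}

function runWithFallback(
  shape: TaskShape,
  run: Runner,
): { toolUsed: BrowserSkill; result: string } {
  const first = primarySkill(shape);
  const outcome = run(first);
  if (outcome.status === "done") return { toolUsed: first, result: outcome.result };
  // Switch to /headless-browser without going back to the parent.
  if (first === "verify-ui") {
    const retry = run("headless-browser");
    if (retry.status === "done") return { toolUsed: "headless-browser", result: retry.result };
  }
  throw new Error("no browser-driving skill could perform this flow");
}
```

Recording which skill actually ran makes it visible in the verdict when a check that was expected to be a simple `/verify-ui` read needed the heavier fallback.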
A test-flow skill is not a one-shot scaffold for a single PR. It's a permanent artifact that captures "how to verify this class of behavior in this codebase." When a similar test is needed later (a regression check, or repeated verification across PRs), invoke the same test-flow skill; the AI subagent gets the same context and produces consistent verdicts.
Signs that you're using this pattern correctly:

- Test-flow skills live in `.claude/skills/` (project-scope, shared with the team), not just `$HOME/.claude/skills/` (personal-only).

Example test-flow skill:

```markdown
---
name: test-flow-composer-image-same-size
description: Verify the composer-side image visually matches the pgen-side image at default landing viewport. Use when /dev-ai-based-test dispatches a subagent for issue #1678 / composer-image-same-size verification.
---

# Test flow: composer image same size as pgen

## Scenario

1. Open <preview URL from inputs> at viewport 1440x900.
2. Click the first template card.
3. Click "Start cropping the pattern".
4. Drop `packages/pattern-gen-viewer/e2e/fixtures/red-100-fits-composition.png` on the pgen canvas-layer.
5. Capture screenshot A (pgen with image visible).
6. Click "Commit selection and open Composer".
7. Wait for composer mount (composer-art-canvas visible).
8. Capture screenshot B (composer with image visible).

## Measurements

- pgen image width (CSS px): read via `__pgenLayerState.getSelectedLayerTransform()` + pgen canvas CSS scale.
- composer image width (CSS px): read via `__composerTest.getState()` + cameraZoom + composer canvas CSS rect.
- ratio = composer / pgen.

## Verdict

PASS if ratio ∈ [0.95, 1.05] (±5%). FAIL otherwise.

## Output schema

{
  pgenImageWidth: number,
  composerImageWidth: number,
  ratio: number,
  delta: number,
  verdict: "PASS" | "FAIL",
  summary: string,
  pgenScreenshot: string (path),
  composerScreenshot: string (path),
  toolUsed: "verify-ui" | "headless-browser"
}
```
The example shows the shape; the verification subagent reads this and follows the procedure verbatim.
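The example's measurement-to-verdict step reduces to a ratio check. A minimal sketch of that arithmetic, assuming the two CSS-pixel widths have already been measured; `judgeSameSize` is a hypothetical name, and because the example never defines `delta`, this sketch assumes `delta = ratio - 1`.

```typescript
type Verdict = "PASS" | "FAIL";

interface SizeVerdict {
  pgenImageWidth: number;
  composerImageWidth: number;
  ratio: number;
  delta: number;
  verdict: Verdict;
  summary: string;
}

// Bounds from the example's Verdict section: ratio in [0.95, 1.05] (±5%).
const MIN_RATIO = 0.95;
const MAX_RATIO = 1.05;

// Hypothetical helper; widths are CSS px measured as described under Measurements.
function judgeSameSize(pgenImageWidth: number, composerImageWidth: number): SizeVerdict {
  if (!(pgenImageWidth > 0)) throw new Error("pgen image width must be a positive number");
  const ratio = composerImageWidth / pgenImageWidth; // ratio = composer / pgen
  const delta = ratio - 1; // assumption: the example does not define delta
  const verdict: Verdict = ratio >= MIN_RATIO && ratio <= MAX_RATIO ? "PASS" : "FAIL";
  const summary = `${verdict}: composer/pgen width ratio ${ratio.toFixed(3)} (allowed ${MIN_RATIO} to ${MAX_RATIO})`;
  return { pgenImageWidth, composerImageWidth, ratio, delta, verdict, summary };
}
```

For instance, widths of 400 and 410 CSS px give a ratio of 1.025, inside [0.95, 1.05], so the verdict is PASS; 400 and 300 give 0.75 and FAIL.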