seed-skills/playwright-test-step/SKILL.md
Teaches the agent to structure tests with test.step, attach evidence and annotations via test.info, use soft assertions, and produce readable, debuggable Playwright HTML reports.
npx skillsauth add PramodDutta/qaskills Playwright test.step & Reporting
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill makes the agent produce tests that explain themselves in the report. Instead of a flat wall of actions, the agent groups logical phases with test.step, attaches screenshots/JSON/diffs via testInfo.attach, records annotations for traceability, and uses expect.soft to collect multiple failures in one run. When a test fails in CI, a human should understand what happened from the HTML report alone.
Use this skill when writing non-trivial flows, when a test is hard to debug from its output, or when the user mentions reports, steps, annotations, attachments, or issue traceability.
- Group phases with test.step. Steps appear as a collapsible tree in the HTML report with timings, turning a failure into "which step failed" instead of "which line number."
- Attach evidence instead of logging it. console.log output is not tied to a step; testInfo.attach puts screenshots, JSON, and text into the report next to the step.
- Use expect.soft to gather multiple defects in one execution. A failed soft assertion still marks the test as failed but does not stop it, so end critical paths with a hard assertion or expect.poll to keep later steps from running against a broken state.
- Link tests to tickets with an issue annotation, and use test.info().annotations to surface skips/known-issues in the report.

Steps created with test.step nest and report their own duration. Return values from a step to chain them.
import { test, expect } from '@playwright/test';

test('checkout flow', async ({ page }) => {
  await test.step('Sign in', async () => {
    await page.goto('https://shop.example.com/login');
    await page.getByLabel('Email').fill('[email protected]');
    await page.getByLabel('Password').fill('Secret123!');
    await page.getByRole('button', { name: 'Sign in' }).click();
    await expect(page.getByText('Welcome back')).toBeVisible();
  });

  // The value returned by the step body becomes the result of test.step.
  const orderId = await test.step('Place order for Pro plan', async () => {
    await page.getByRole('link', { name: 'Pro plan' }).click();
    await page.getByRole('button', { name: 'Buy now' }).click();
    await page.getByRole('button', { name: 'Confirm purchase' }).click();
    const id = await page.getByTestId('order-id').textContent();
    return id!.trim();
  });

  await test.step(`Verify order ${orderId} in history`, async () => {
    await page.goto('https://shop.example.com/orders');
    await expect(page.getByText(orderId)).toBeVisible();
  });
});
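The example above chains steps through a return value but does not show nesting. The sketch below illustrates both in isolation. It takes the step function as a parameter so the control flow can be exercised without a browser: `StepFn`, `placeOrder`, `recordingStep`, the step titles, and the `ord-123` id are all hypothetical names introduced for illustration, not part of Playwright. In a real test you would pass a thin wrapper such as `(title, body) => test.step(title, body)`.

```typescript
// Hypothetical type mirroring the (title, body) shape of test.step.
type StepFn = <T>(title: string, body: () => Promise<T>) => Promise<T>;

// Illustrative flow: an outer step wraps two inner steps and returns a value.
async function placeOrder(step: StepFn, plan: string): Promise<string> {
  return step(`Place order for ${plan}`, async () => {
    await step('Select plan', async () => {
      // Page interactions for choosing the plan would go here.
    });
    const orderId = await step('Confirm purchase', async () => {
      // A real test would read the id from the page; this value is a stand-in.
      return 'ord-123';
    });
    return orderId; // the outer step resolves to the inner step's value
  });
}

// Minimal stand-in for test.step that records the nesting path of each step.
function recordingStep(log: string[]): StepFn {
  const stack: string[] = [];
  return async <T>(title: string, body: () => Promise<T>): Promise<T> => {
    stack.push(title);
    log.push(stack.join(' > '));
    try {
      return await body();
    } finally {
      stack.pop();
    }
  };
}
```

With the recording stand-in, the log holds one entry per step with its parent path, which is roughly the tree the HTML report renders.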
Attachments added with testInfo.attach render inline in the HTML report. Attach a screenshot, the API response, or a computed diff at the moment it matters.
test('attaches evidence to the report', async ({ page }, testInfo) => {
  await page.goto('https://example.com/dashboard');

  await test.step('Capture dashboard state', async () => {
    // Screenshot attachment (shown inline in the report).
    await testInfo.attach('dashboard.png', {
      body: await page.screenshot({ fullPage: true }),
      contentType: 'image/png',
    });

    // JSON attachment: the raw data behind the screen.
    const widgets = await page.evaluate(() => (window as any).__WIDGETS__ ?? []);
    await testInfo.attach('widgets.json', {
      body: JSON.stringify(widgets, null, 2),
      contentType: 'application/json',
    });

    // Plain-text attachment for a human-readable note.
    await testInfo.attach('environment.txt', {
      body: `Project: ${testInfo.project.name}\nBase URL: ${page.url()}`,
      contentType: 'text/plain',
    });
  });
});
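The text mentions attaching a computed diff, which the example above does not cover. One possible shape for that is sketched below, assuming a shallow field-by-field comparison is enough. `FieldDiff`, `diffFields`, and the `profile-diff.json` name are hypothetical helpers introduced here, not Playwright APIs; only `testInfo.attach` in the trailing comment comes from the library.

```typescript
// One differing field between the expected and actual objects.
interface FieldDiff {
  key: string;
  expected: unknown;
  actual: unknown;
}

// Shallow comparison: reports keys whose JSON-serialized values differ.
function diffFields(
  expected: Record<string, unknown>,
  actual: Record<string, unknown>,
): FieldDiff[] {
  const keys = Array.from(
    new Set([...Object.keys(expected), ...Object.keys(actual)]),
  ).sort();
  return keys
    .filter((key) => JSON.stringify(expected[key]) !== JSON.stringify(actual[key]))
    .map((key) => ({ key, expected: expected[key], actual: actual[key] }));
}

// Inside a test, the result could be attached like this:
//   await testInfo.attach('profile-diff.json', {
//     body: JSON.stringify(diffFields(expectedProfile, actualProfile), null, 2),
//     contentType: 'application/json',
//   });
```

Attaching only the differing fields keeps the report readable when the full payload is large.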
expect.soft records the failure and keeps going, so the report shows every problem at once. A failed soft assertion still marks the test as failed; add expect(test.info().errors).toHaveLength(0) or another hard check wherever the test should stop instead of continuing.
test('validates a form with soft assertions', async ({ page }) => {
  await page.goto('https://example.com/profile');

  await test.step('Verify all profile fields at once', async () => {
    await expect.soft(page.getByLabel('Display name')).toHaveValue('Ada Lovelace');
    await expect.soft(page.getByLabel('Email')).toHaveValue('[email protected]');
    await expect.soft(page.getByLabel('Timezone')).toHaveValue('UTC');
    await expect.soft(page.getByRole('img', { name: 'Avatar' })).toBeVisible();
  });

  // Hard gate: stop here if any soft assertion failed, so nothing after this
  // point runs against a profile that is already known to be wrong.
  expect(test.info().errors).toHaveLength(0);
});
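When several soft assertions fail, a compact summary can make the report easier to scan. The sketch below is one way to build such a summary from the error list. `ErrorLike` and `summarizeErrors` are hypothetical names; the only assumption about Playwright is that entries in `testInfo.errors` carry an optional `message` string.

```typescript
// Subset of the error shape exposed on testInfo.errors (assumed: optional message).
interface ErrorLike {
  message?: string;
}

// Builds a numbered, one-line-per-failure summary suitable for a text attachment.
function summarizeErrors(errors: ErrorLike[]): string {
  if (errors.length === 0) return 'No failures recorded.';
  return errors
    .map((error, index) => {
      // Keep only the first line; assertion messages are often multi-line.
      const firstLine = (error.message ?? 'unknown error').split('\n')[0];
      return `${index + 1}. ${firstLine}`;
    })
    .join('\n');
}

// Inside a test, before the hard gate:
//   await testInfo.attach('soft-failures.txt', {
//     body: summarizeErrors(testInfo.errors),
//     contentType: 'text/plain',
//   });
```

Attaching the summary before the hard gate means it is recorded even though the gate then stops the test.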
Annotations attach metadata to a test; they show up in the report and JSON output. Use them to link issues and to document why something is skipped.
test('payment retries on gateway 503', async ({ page }) => {
  test.info().annotations.push(
    { type: 'issue', description: 'https://github.com/acme/app/issues/4821' },
    { type: 'suite', description: 'payments-regression' },
  );
  await page.goto('https://example.com/pay');
  await expect(page.getByRole('button', { name: 'Pay' })).toBeEnabled();
});

test('legacy export still works', async ({ page }) => {
  // The reason string is shown next to the skipped test in the report.
  test.skip(process.env.LEGACY !== 'on', 'Legacy export disabled in this env');
  test.info().annotations.push({ type: 'known-issue', description: 'Slow > 5s, tracked in #5102' });
  await page.goto('https://example.com/export');
  await expect(page.getByText('Export ready')).toBeVisible({ timeout: 30_000 });
});
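Hand-writing issue URLs in every test invites typos. A small builder, sketched below, keeps the annotation shape consistent. `Annotation`, `ISSUE_BASE_URL`, and `issueAnnotation` are hypothetical names, and the tracker URL is simply reused from the example above as an assumption about where tickets live.

```typescript
// Shape of an annotation entry: a type plus an optional description.
interface Annotation {
  type: string;
  description?: string;
}

// Assumption: tickets live under this tracker URL (reused from the example above).
const ISSUE_BASE_URL = 'https://github.com/acme/app/issues';

// Builds an 'issue' annotation that links a test to a ticket number.
function issueAnnotation(id: number): Annotation {
  return { type: 'issue', description: `${ISSUE_BASE_URL}/${id}` };
}

// Inside a test:
//   test.info().annotations.push(issueAnnotation(4821));
```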
A boxed step collapses its internals in the report; the error is reported at the step call site, not buried in shared helper code.
import { test, expect, type Page } from '@playwright/test';

// Shared helper: the whole login sequence shows up as one boxed step.
async function loginAs(page: Page, email: string, password: string) {
  await test.step(
    `Log in as ${email}`,
    async () => {
      await page.goto('https://example.com/login');
      await page.getByLabel('Email').fill(email);
      await page.getByLabel('Password').fill(password);
      await page.getByRole('button', { name: 'Sign in' }).click();
      await expect(page.getByTestId('user-menu')).toBeVisible();
    },
    { box: true }, // failure surfaces at this step, internals collapsed
  );
}

test('admin can open settings', async ({ page }) => {
  await loginAs(page, '[email protected]', 'Admin123!');
  await page.getByRole('link', { name: 'Settings' }).click();
  await expect(page.getByRole('heading', { name: 'Settings' })).toBeVisible();
});
Wire up reporters in playwright.config.ts. The HTML reporter consumes the steps, attachments, and annotations above; JSON/JUnit feed CI dashboards.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  reporter: [
    ['html', { open: 'never', outputFolder: 'playwright-report' }],
    ['json', { outputFile: 'test-results/results.json' }],
    ['junit', { outputFile: 'test-results/junit.xml' }],
    ['list'],
  ],
  use: {
    // Auto-capture trace, screenshot, and video for failing tests; they attach to the report.
    trace: 'retain-on-failure',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
});
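Local runs rarely need the JSON and JUnit outputs that CI dashboards consume. One way to switch between the two sets is sketched below. `ReporterEntry` and `reportersFor` are hypothetical names, and the split between local and CI reporters is an assumption about a typical setup, not a Playwright default; the reporter names and options are the ones used in the config above.

```typescript
// A reporter entry as used in the config above: a name, optionally with options.
type ReporterEntry = [string] | [string, Record<string, unknown>];

// Picks a reporter list depending on whether the run happens on CI.
function reportersFor(isCI: boolean): ReporterEntry[] {
  const html: ReporterEntry = ['html', { open: 'never', outputFolder: 'playwright-report' }];
  if (!isCI) {
    // Local runs: readable terminal output plus the HTML report.
    return [['list'], html];
  }
  // CI runs: machine-readable outputs for dashboards, plus the HTML report.
  return [
    html,
    ['json', { outputFile: 'test-results/results.json' }],
    ['junit', { outputFile: 'test-results/junit.xml' }],
  ];
}

// In playwright.config.ts (assuming the CI environment variable is set on CI):
//   reporter: reportersFor(!!process.env.CI),
```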
- Wrap each logical phase in test.step with a business-readable, imperative name; the report becomes a living test plan.
- Attach evidence at the moment it matters with testInfo.attach.
- Set trace: 'retain-on-failure' and screenshot: 'only-on-failure' so failing tests carry full debugging context automatically.
- Use expect.soft for grouped, independent checks, then add a hard expect(test.info().errors).toHaveLength(0) gate before any step that depends on them.
- Use issue/known-issue annotations to keep traceability between tests and tickets visible in the report.
- Avoid console.log for evidence. Log output is not tied to a step in the HTML report. Use testInfo.attach.
- Avoid expect.soft everywhere with no hard gate. The test is still reported as failed, but it keeps running past the first broken assumption and buries the root cause under follow-on failures. Gate critical paths with a hard check.
- Avoid test.skip() with no reason; the report shows a skip with no explanation. Always pass a reason string.