.claude/skills/e2e-testing/SKILL.md
Playwright visual regression testing for the assistant Flutter SPA. Covers test structure, screenshot baselines, cross-platform diff tolerance, CI workflow with inline diff comments, and baseline management. Use when adding screens, changing layouts, or debugging visual test failures.
The assistant frontend is a Flutter web app embedded in the Rust binary. Playwright captures screenshots of the compiled Flutter SPA at three viewport sizes (desktop, tablet, mobile) and compares them against committed baselines.
The binary serves the Flutter web build at `/`. The `webServer` config in
`playwright.config.ts` builds the binary (which embeds `flutter build web`)
and starts it with `--auth-token test-token --listen 127.0.0.1:8787`.
```
crates/web-ui/e2e/
  playwright.config.ts            # Config: viewports, server, reporters
  package.json                    # Dependencies (playwright, @playwright/test)
  tests/
    visual-regression.spec.ts     # All visual tests
  screenshots/                    # Committed baselines (PNG)
    tests/visual-regression.spec.ts/
      login-desktop-chrome.png
      login-tablet-chrome.png
      login-mobile-chrome.png
      ...
  test-results/                   # Generated on failure (diff, actual, expected PNGs)
  playwright-report/              # HTML report (gitignored)
```
```bash
# From the e2e directory
cd crates/web-ui/e2e

# Run all visual tests (starts server automatically via webServer config)
npx playwright test

# Update baselines after intentional changes
npx playwright test --update-snapshots
# or use the npm script:
npm run test:update

# Run a single test
npx playwright test -g "traces page"

# Run only desktop viewport
npx playwright test --project=desktop-chrome

# Show HTML report after failure
npx playwright show-report
```
`npx playwright test` builds and starts the web-ui binary automatically through the
`webServer` config in `playwright.config.ts`, using `--auth-token test-token --listen 127.0.0.1:8787`.
Set `E2E_BASE_URL` to skip the auto-start and use an already running server instead.
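The `E2E_BASE_URL` switch can be expressed as a small function whose result is passed to `defineConfig`. This is a sketch under assumptions, not the repo's actual config: the helper name `chooseServer`, the `cargo run -p assistant-cli` command string, and `reuseExistingServer: !env.CI` are all hypothetical. `command`, `url`, `timeout` (in milliseconds) and `reuseExistingServer` are standard Playwright `webServer` options.

```typescript
// Subset of Playwright's webServer options.
interface WebServerOptions {
  command: string;
  url: string;
  timeout: number; // milliseconds
  reuseExistingServer: boolean;
}

interface ServerChoice {
  baseURL: string;
  webServer?: WebServerOptions;
}

const LOCAL_URL = "http://127.0.0.1:8787";

// Hypothetical helper: use an already running server when E2E_BASE_URL is set,
// otherwise let Playwright build and start the binary itself.
function chooseServer(env: Record<string, string | undefined>): ServerChoice {
  const external = env.E2E_BASE_URL;
  if (external) {
    // No webServer entry, so Playwright does not start anything.
    return { baseURL: external };
  }
  return {
    baseURL: LOCAL_URL,
    webServer: {
      // ASSUMPTION: the real command in playwright.config.ts may differ.
      command:
        "cargo run -p assistant-cli -- --auth-token test-token --listen 127.0.0.1:8787",
      url: LOCAL_URL,
      timeout: 120_000, // 120 seconds, enough for the first flutter build web
      reuseExistingServer: !env.CI, // ASSUMPTION: reuse locally, never on CI
    },
  };
}
```

In a config this would be wired up as `const server = chooseServer(process.env)` followed by `defineConfig({ use: { baseURL: server.baseURL }, webServer: server.webServer })`. Returning no `webServer` at all, rather than a disabled one, keeps the "use a running server" path free of any build step.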
```typescript
import { test, expect, Page } from "@playwright/test";

const AUTH_TOKEN = "test-token";
const MAX_DIFF_RATIO = 0.03; // 3% tolerance for cross-platform fonts
const CSS_SETTLE_MS = 300; // Wait for CSS transitions

// Authenticate via the login form
async function login(page: Page) {
  await page.goto("/login");
  await page.fill('input[name="token"]', AUTH_TOKEN);
  await page.click('button[type="submit"]');
  await page.waitForURL((url) => !url.pathname.includes("/login"));
}

// Navigate and wait for network idle + CSS settle
async function navigateAndSettle(page: Page, path: string) {
  await page.goto(path, { waitUntil: "networkidle" });
  await page.waitForTimeout(CSS_SETTLE_MS);
}
```
```typescript
test("my new page (empty state)", async ({ page }) => {
  await navigateAndSettle(page, "/my-page");
  await expect(page).toHaveScreenshot("my-page-empty.png", {
    fullPage: true,
    maxDiffPixelRatio: MAX_DIFF_RATIO,
  });
});
```
Then generate baselines:

```bash
npx playwright test --update-snapshots -g "my new page"
```

This creates three files in `screenshots/` (one per project/viewport).
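The three baseline names follow the pattern visible in the directory tree (`login.png` becomes `login-desktop-chrome.png`, `login-tablet-chrome.png`, `login-mobile-chrome.png`). Assuming that pattern holds, a hypothetical helper like the one below lists the files a new `toHaveScreenshot` call expects, which makes it easy to check that all of them were generated and committed. The directory and project names come from this document; the helpers themselves are not part of the repo.

```typescript
// Project names from playwright.config.ts (see the viewport table).
const PROJECTS = ["desktop-chrome", "tablet-chrome", "mobile-chrome"] as const;

// Baseline directory as shown in the tree, relative to crates/web-ui/e2e/.
const SNAPSHOT_DIR = "screenshots/tests/visual-regression.spec.ts";

// Hypothetical helper: "my-page-empty.png" ->
// screenshots/tests/visual-regression.spec.ts/my-page-empty-<project>.png
function expectedBaselines(screenshotName: string): string[] {
  const dot = screenshotName.lastIndexOf(".");
  const stem = dot === -1 ? screenshotName : screenshotName.slice(0, dot);
  const ext = dot === -1 ? ".png" : screenshotName.slice(dot);
  return PROJECTS.map((project) => `${SNAPSHOT_DIR}/${stem}-${project}${ext}`);
}

// Returns the expected baselines that are absent from `committed`.
function missingBaselines(screenshotName: string, committed: Iterable<string>): string[] {
  const have = new Set(committed);
  return expectedBaselines(screenshotName).filter((path) => !have.has(path));
}
```

Fed with the output of something like `git ls-files screenshots`, `missingBaselines("my-page-empty.png", files)` returns an empty array once all three viewports have a committed baseline.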
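Pages behind authentication need `login()` first. Group them in a `test.describe` block with a `test.beforeEach` hook that calls `login(page)`:

```typescript
test.describe("Authenticated pages", () => {
  test.beforeEach(async ({ page }) => {
    await login(page);
  });

  test("page name", async ({ page }) => {
    // Navigate, settle, and take the screenshot as in the example above.
  });
});
```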
Font rendering differs between macOS (local dev) and Linux (CI). The
`maxDiffPixelRatio: 0.03` setting allows up to 3% pixel differences,
which absorbs font hinting/anti-aliasing variance while still catching
layout regressions (moved elements, missing sections, broken styles).
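To see what 3% means in absolute terms: `maxDiffPixelRatio` is a fraction of the screenshot's total pixel count, so a 1280x900 desktop viewport (1,152,000 pixels) tolerates about 34,560 differing pixels. The sketch below models that budget with hypothetical helper names. It simplifies Playwright's comparator, which also applies a per-pixel color `threshold` before a pixel counts as different, so treat it as an approximation.

```typescript
// Approximate pixel budget implied by maxDiffPixelRatio (a ratio of total pixels).
function diffBudget(width: number, height: number, ratio: number): number {
  return width * height * ratio;
}

// Simplified pass/fail check: passes when the number of differing pixels does
// not exceed the budget. Playwright's per-pixel color threshold is omitted here.
function withinTolerance(
  diffPixels: number,
  width: number,
  height: number,
  ratio = 0.03,
): boolean {
  return diffPixels <= diffBudget(width, height, ratio);
}
```

Because `fullPage: true` screenshots grow with the page, the budget grows too: a full-page capture 1280 pixels wide and 2,400 tall allows about 92,160 differing pixels, so a small regression on a long page has more room to slip under the ratio than the same regression on a short one.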
When to adjust:
Three projects in playwright.config.ts match the app's responsive breakpoints:
| Project | Viewport | App Layout |
| ---------------- | -------- | ----------------------------- |
| desktop-chrome | 1280x900 | Icon rail + top bar |
| tablet-chrome | 768x1024 | Hamburger + drawer |
| mobile-chrome | Pixel 7 | Bottom tabs + stacked content |
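A `projects` block matching the table could look like the following. This is a sketch, not a copy of the repo's `playwright.config.ts`: `devices["Desktop Chrome"]` and `devices["Pixel 7"]` are Playwright's built-in device descriptors, while basing the tablet project on the desktop descriptor with an overridden viewport is an assumption.

```typescript
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  projects: [
    {
      // Icon rail + top bar layout
      name: "desktop-chrome",
      use: { ...devices["Desktop Chrome"], viewport: { width: 1280, height: 900 } },
    },
    {
      // Hamburger + drawer layout
      // ASSUMPTION: desktop Chrome with a tablet-sized viewport
      name: "tablet-chrome",
      use: { ...devices["Desktop Chrome"], viewport: { width: 768, height: 1024 } },
    },
    {
      // Bottom tabs + stacked content; the Pixel 7 descriptor supplies the
      // viewport, device scale factor, user agent, and touch support.
      name: "mobile-chrome",
      use: { ...devices["Pixel 7"] },
    },
  ],
});
```

Given the naming pattern in the `screenshots/` tree, the project `name` also ends up in every baseline filename, so renaming a project orphans its existing baselines.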
The `visual-regression` job in `.github/workflows/ci.yml`:

1. Sets up Flutter (`subosito/flutter-action@v2`, Flutter 3.x stable) and runs `flutter pub get` in `app/`.
2. Builds the binary (`cargo build -p assistant-cli`), which embeds the Flutter web app via `build.rs`.
3. Runs `npx playwright test`.
4. On failure, pushes the diff images to a `visual-diffs/pr-N` branch and posts an inline comment with embedded image comparisons.

When visual tests fail on a PR, the bot posts a comment with:
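As an illustration only, here is one hypothetical way such a comment body could be assembled from the failure artifacts. `groupArtifacts` and `buildDiffComment` are invented names; the `-expected.png`/`-actual.png`/`-diff.png` suffixes are assumed from Playwright's usual `test-results/` output, and the `raw.githubusercontent.com` URL scheme assumes the images are served straight from the `visual-diffs/pr-N` branch.

```typescript
// One failed screenshot comparison: the three PNGs written to test-results/.
interface DiffSet {
  name: string;
  expected: string;
  actual: string;
  diff: string;
}

// Group artifact paths around their "<base>-diff.png" file.
// ASSUMPTION: siblings are named <base>-expected.png and <base>-actual.png.
function groupArtifacts(paths: string[]): DiffSet[] {
  const sets: DiffSet[] = [];
  for (const path of paths) {
    const match = /^(.*)-diff\.png$/.exec(path);
    if (!match) continue; // expected/actual files are derived from the diff path
    const base = match[1];
    sets.push({
      name: base.split("/").pop() ?? base,
      expected: `${base}-expected.png`,
      actual: `${base}-actual.png`,
      diff: path,
    });
  }
  return sets;
}

// Hypothetical comment builder: one Expected/Actual/Diff table per failure,
// with images embedded from the visual-diffs/pr-N branch.
function buildDiffComment(repo: string, pr: number, paths: string[]): string {
  const branch = `visual-diffs/pr-${pr}`;
  const img = (path: string) =>
    `![](https://raw.githubusercontent.com/${repo}/${branch}/${path})`;
  const lines: string[] = ["### Visual regression failures", ""];
  for (const set of groupArtifacts(paths)) {
    lines.push(
      `#### ${set.name}`,
      "",
      "| Expected | Actual | Diff |",
      "| --- | --- | --- |",
      `| ${img(set.expected)} | ${img(set.actual)} | ${img(set.diff)} |`,
      "",
    );
  }
  return lines.join("\n");
}
```

Keying each group on the diff file means a stray `-actual.png` without a matching diff never produces an empty table row.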
Review the diff images to decide whether to fix the code or accept the change by running `npm run test:update` and committing the new baselines.

After intentional visual changes:
```bash
cd crates/web-ui/e2e
npx playwright test --update-snapshots
```
Commit the updated PNGs in the same commit as the code change.
Font rendering differences. The 3% tolerance should absorb this. If not:
The `webServer.timeout` is 120 seconds. The first build is slow because
`cargo build` triggers `flutter build web --release` (via `build.rs`).
If it still times out:
- Run `cargo build -p assistant-cli` from the repo root first.
- Make sure Flutter is on `$PATH` (run `flutter doctor`).
- Set `E2E_BASE_URL` to point to a manually started server.

Viewport size is fixed per project, so if the screenshot dimensions change, the
page content height most likely changed: `fullPage: true` captures the full
scrollable height, so adding content to a page changes the baseline.
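When a baseline and a new screenshot differ in size, the dimensions alone usually say which case applies. A hypothetical triage helper (not part of the repo) could read the width and height of the expected and actual PNGs and classify the change as follows.

```typescript
interface Size {
  width: number;
  height: number;
}

type SizeChange = "same" | "content-height" | "viewport";

// Hypothetical triage helper for screenshot size mismatches.
// Width is fixed by the project (viewport and device scale factor), so a width
// change points at the project configuration. With fullPage: true, a
// height-only change means the page's scrollable content grew or shrank.
function classifySizeChange(expected: Size, actual: Size): SizeChange {
  if (expected.width !== actual.width) return "viewport";
  if (expected.height !== actual.height) return "content-height";
  return "same";
}
```

A `"content-height"` result on an intentional change is the normal case for updating baselines; a `"viewport"` result suggests checking `playwright.config.ts` before regenerating anything.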