plugins/github-copilot-modernization/skills/runtime-validation/SKILL.md
Runtime validation for migrated applications. Covers testing strategy (planning phase) and test execution (validation phase): startup verification, integration testing, and end-to-end flow validation.
Use when:
1. Designing the test strategy during the planning phase (teamlead reads Part 1)
2. Verifying that a migrated app starts and runs correctly
3. Writing or executing integration / E2E tests (tester reads Part 2)
4. Choosing test tooling and environment setup
5. Producing structured test evidence and verdicts
Triggers: "runtime validation", "testing strategy", "test strategy", "test design", "verify the app", "integration test", "e2e test", "end-to-end test", "smoke test", "startup check", "write tests", "run tests", "test the migration", "playwright", "testcontainers", "test plan", "runtime gate", "testing phase", "validation phase"
This skill serves two phases of the pipeline:
Audience: planning roles producing the pre-implementation plan. Tester reads the resulting testing strategy during validation and executes it. Output: a "Testing Strategy" section in the planning artifact, produced before writing test code.
Not every project needs every tier. Pick the tiers that match the app's architecture:
| Tier | Config | DB | Security | Speed | When to use |
|------|--------|----|----------|-------|-------------|
| Unit | Mocked deps | None | None | 1ms | Always: implementer writes and runs. Test infrastructure is designed here (see §1.4 item 10) |
| Slice | Partial app context | In-memory | Mocked | 100ms | Framework-supported isolation (e.g., @WebMvcTest, partial DI) |
| Integration | Full app context + Docker-based infra | Real DB | Real | 3s | Default for most apps; catches 80% of bugs |
| E2E (API) | Running app (random port or external) | Real | Real | 5s | API-only apps, or API layer of mixed apps |
| E2E (Browser) | Playwright against running app | Real | Real | 10s | Any app with rendered HTML; only tool that sees what users see |
Key insight: Integration tier (full app context + Docker-based real infrastructure) is the sweet spot for most migrations. Don't skip it by jumping straight to browser E2E.
| App Type | Primary Tool | Reference |
|---|---|---|
| Server-rendered Java (JSP, Thymeleaf) with minimal JS | Playwright (if Node available) + framework integration tests + Docker-based infra | playwright.md |
| REST API only (no browser UI) | Framework integration tests + Docker-based infra | rest-assured.md |
| Client-side SPA (React, Angular, Vue) | Playwright (TypeScript) | playwright.md |
| Server-rendered with significant client-side JS | Playwright + framework integration tests for API layer | Both |
| Messaging (Kafka / Service Bus) | Docker-based infra (Testcontainers for Java, docker-compose for others) + framework test runner | testcontainers.md |
| Mixed (API + SPA browser UI) | Playwright for UI + framework integration tests for API | Both |
| .NET (ASP.NET Core) | WebApplicationFactory<T> + HttpClient | (built-in) |
| .NET (WinForms/WPF — no web UI) | xUnit + in-process testing | (no browser E2E needed) |
Non-Java frameworks — equivalent tools:
| Framework | Integration Tool (like MockMvc) | Browser E2E | Startup Check |
|-----------|-------------------------------|-------------|---------------|
| .NET (ASP.NET Core) | WebApplicationFactory<T> + TestServer | Playwright | dotnet run + health check |
| Node.js (Express/Nest) | supertest (calls app instance directly) | Playwright | npm start + health check |
| Python (Django) | django.test.Client | Playwright | manage.py runserver |
| Python (FastAPI) | TestClient (httpx-based) | Playwright | uvicorn + health check |
| Go | httptest.NewServer + http.Client | Playwright | compile + run + health check |
| Ruby (Rails) | ActionDispatch::IntegrationTest | Playwright | rails server |
The pattern is universal: each framework has a built-in "test the handler without a real HTTP server" tool for the integration tier, and Playwright covers the browser tier across all of them.
When is Playwright REQUIRED vs RECOMMENDED?
@SpringBootTest + @AutoConfigureMockMvc covers 80% of flows (cross-controller, forward/redirect, security). Gap: cannot verify actual view rendering. Document this gap but do not skip testing.
Spring MVC apps with HTML views: use MockMvc, not TestRestTemplate. @SpringBootTest + @AutoConfigureMockMvc loads all controllers, security, and exception handlers, and can assert view names, model attributes, and forwarded URLs. TestRestTemplate cannot inspect these: a @ControllerAdvice that catches exceptions and returns 200 plus an error view makes TestRestTemplate see every request as "successful". For apps with HTML views, always prefer MockMvc.
Spring REST API apps (JSON only): TestRestTemplate is fine. API responses use HTTP status codes directly (200/4xx/5xx), no view resolution to inspect. TestRestTemplate handles cookies/sessions/redirects naturally and is simpler to use.
⚠️ MANDATORY: Verify tool availability before selecting a branch. Do NOT assume answers — run the following commands in the terminal and use the actual output to choose the correct branch:
# Unix/macOS: verify the CLI exists AND the daemon/runtime is functional
which docker && docker info > /dev/null 2>&1 && echo "Docker: available" || echo "Docker: NOT available"
which node && node --version && echo "Node.js: available" || echo "Node.js: NOT available"
# Windows (PowerShell)
Get-Command docker -ErrorAction SilentlyContinue | ForEach-Object { docker info >$null 2>&1; if ($?) { "Docker: available" } else { "Docker: NOT available (daemon not responding)" } }
Get-Command node -ErrorAction SilentlyContinue | ForEach-Object { node --version; if ($?) { "Node.js: available" } else { "Node.js: NOT available" } }
If a command succeeds, take the YES branch. If it fails (command not found or daemon not responding), take the NO branch. ⚠️ `docker --version` is NOT sufficient: it succeeds even when the daemon is stopped. Use `docker info` instead. Skipping this verification and defaulting to a fallback branch is a defect.
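The same check can be scripted. The following is a minimal sketch, not part of the skill itself: the function names `probe_capabilities` and `_run_ok` and the 30-second time limit are illustrative assumptions. It treats Docker as available only when the CLI exists and `docker info` exits successfully, and it probes Node.js separately.

```python
import shutil
import subprocess
from typing import Callable, Sequence


def _run_ok(cmd: Sequence[str]) -> bool:
    """Return True when the command exits with status 0 within the time limit."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,  # assumed limit; adjust for slow daemons
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def probe_capabilities(
    which: Callable[[str], object] = shutil.which,
    run_ok: Callable[[Sequence[str]], bool] = _run_ok,
) -> dict:
    """Probe Docker and Node.js independently; neither result affects the other."""
    # Docker counts as available only if the CLI exists AND `docker info` succeeds,
    # because `docker --version` passes even when the daemon is stopped.
    docker = which("docker") is not None and run_ok(["docker", "info"])
    node = which("node") is not None and run_ok(["node", "--version"])
    return {"docker": docker, "node": node}
```

Calling `probe_capabilities()` with no arguments runs the real commands. The `which` and `run_ok` parameters exist only so the logic can be exercised without Docker or Node.js installed.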
⚠️ The two trees below are INDEPENDENT capability axes. Evaluate BOTH and combine the results. Docker availability determines the infrastructure tier (Docker-based real infra vs embedded/in-memory alternatives). Node.js availability determines the browser-E2E tier (Playwright vs framework-native HTTP testing). Losing one capability does NOT affect the other axis.
Has Docker?
├── YES
│ ├── Java/Spring → Testcontainers (DB, MQ, caches)
│ ├── .NET / Node.js / Python / Go / Ruby → docker-compose or docker run (DB, MQ, caches)
│ ├── Full stack with docker-compose? → docker-compose up + test against it
│ └── Otherwise → start app + Docker-managed deps
│
└── NO
    ├── Java/Spring → H2 in-memory DB + embedded alternatives for other infra (e.g., embedded Kafka, local mock services); see §A.1 for H2 SQL compatibility risks
    ├── .NET → SQLite in-memory
    └── Node.js / Python / Go / Ruby → SQLite/in-memory store + mock services
Has Node.js?
├── YES + App renders HTML (JSP, Thymeleaf, Razor, templates, SPA)
│ ├── `npx playwright install --with-deps chromium` succeeds?
│ │ ├── YES → Playwright for browser E2E (catches view-layer bugs invisible to framework-native HTTP testing)
│ │ └── NO → Framework-native HTTP testing fallback (document gap: browser rendering unverified; record exact install error)
├── YES + API-only (no HTML)
│ └── Framework-native HTTP client (no Playwright needed)
└── NO
    ├── Java with HTML views → framework integration test (e.g., MockMvc: covers cross-controller flows, forward/redirect, security; only gap is actual view rendering)
    ├── API-only (any language) → framework-native HTTP testing
    └── Non-Java with HTML views → document gap, prioritize getting Node.js
Beyond Docker and Node.js, verify these additional prerequisites when they apply:
| Capability | Prerequisite | Verify Command | If Missing |
|-----------|-------------|----------------|------------|
| Docker-based infra | Docker daemon functional | docker info > /dev/null 2>&1 | Embedded/in-memory alternatives (H2/SQLite for DB, embedded broker for MQ, local mocks) |
| Playwright | Node.js + browser binaries | npx playwright install --with-deps chromium | MockMvc / TestRestTemplate |
| Build tools | Maven or Gradle on PATH | mvn --version or gradle --version | Cannot proceed — escalate |
| Headless browser | Playwright headless mode | Set headless: true in Playwright config | Always use headless in CI |
| Network (external) | Outbound HTTPS for image pulls / browser downloads | curl -s --max-time 5 https://registry.npmjs.org | Use pre-cached binaries or mock external services |
These are secondary checks — run them after the Docker/Node.js primary axes. If any fails, document it as a constraint that may force fallback on that specific axis.
Before finalising the testing strategy, check whether legacy E2E or integration tests are available. They may come from two sources — check both:
Source 1 — User-provided tests The user may directly supply test files or paste test code in their request. These take priority over anything discovered on disk. Accept them as-is and skip the file-scan for the journeys they already cover.
Source 2 — Tests discovered in the source project Scan the source project for existing test files. This is the fallback when the user provides nothing.
Inventory procedure:
- legacyTestAssets with source user-provided and skip to step 3
- src/test/, e2e/, cypress/, tests/)

| Framework / Pattern | Classification | Migration Target |
|---|---|---|
| AssertJ + Selenium / HtmlUnit | Legacy Java browser E2E | Rewrite to Playwright |
| RestAssured / MockMvc (@SpringBootTest) | Java integration / API tests | Keep as-is if stack unchanged; rewrite if REST contract changes |
| Cypress | JS/TS browser E2E | Evaluate rewrite to Playwright (see §1.3.3) |
| Playwright (existing) | Modern browser E2E | Reuse with minimal edits (see §1.3.3) |
| JUnit / pytest / xUnit (unit tests only) | Unit tests | Out of scope — implementer's job |
- legacyTestAssets, including source (user-provided or discovered)

Output rule: If legacy E2E tests are found, the Testing Strategy MUST specify one of:
Apply the appropriate strategy based on test classification:
Case A — Rewrite (e.g., AssertJ/Selenium → Playwright)
When the tech stack changes significantly (e.g., server-rendered JSP → Angular SPA), legacy Java browser tests cannot be run against the new frontend. They must be rewritten:
- *.spec.ts Playwright tests in the target project's e2e/ or tests/ directory
- legacyTestMappingTable: old class → new spec file, journey parity confirmed

Case B — Reuse with minimal edits (existing Playwright suite)
When the source already has Playwright tests and the same journeys are still valid after migration:
| test file | reason changed | lines changed |

Recording in Testing Strategy:
### Legacy Test Assets
- legacyTestAssets: [e.g., "AssertJ/Selenium suite — 12 test classes in src/test/java/e2e/"]
- source: user-provided | discovered
- migrationDecision: rewrite | reuse-with-edits | discard
- migrationRationale: [reason]
- legacyTestMappingTable: [old class → new spec, or N/A if discard]
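The classification table earlier in this section can be sketched as an ordered lookup. This is only an illustration: the marker substrings (for example `org.openqa.selenium` or `@playwright/test`) are heuristic assumptions about what a test file typically imports, the function name `classify_test_source` is hypothetical, and treating every unmatched file as a unit test is a simplification.

```python
from typing import List, Tuple

# (marker substring, classification, migration target), checked in order.
# Markers are heuristics chosen for this sketch, not an exhaustive detector.
RULES: List[Tuple[str, str, str]] = [
    ("@playwright/test", "Modern browser E2E", "Reuse with minimal edits"),
    ("org.openqa.selenium", "Legacy Java browser E2E", "Rewrite to Playwright"),
    ("htmlunit", "Legacy Java browser E2E", "Rewrite to Playwright"),
    ("cy.visit(", "JS/TS browser E2E", "Evaluate rewrite to Playwright"),
    ("io.restassured", "Java integration / API tests",
     "Keep as-is if stack unchanged; rewrite if REST contract changes"),
    ("mockmvc", "Java integration / API tests",
     "Keep as-is if stack unchanged; rewrite if REST contract changes"),
]

# Assumption: anything without a known marker is treated as a unit test.
DEFAULT: Tuple[str, str] = ("Unit tests", "Out of scope")


def classify_test_source(source_text: str) -> Tuple[str, str]:
    """Return (classification, migration target) for the text of one test file."""
    lowered = source_text.lower()
    for marker, classification, target in RULES:
        if marker.lower() in lowered:
            return classification, target
    return DEFAULT
```

Browser-level markers are listed before API-level ones so that a file using both Selenium and MockMvc is classified by its browser dependency, which is the one that forces a rewrite.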
The implementation plan MUST include a "Testing Strategy" section containing:
- infra-tier: primary tool (e.g., Testcontainers for Java, docker-compose for others) → fallback (e.g., H2, SQLite, embedded broker, local mock). Prerequisite: Docker
- browser-tier: primary tool (e.g., Playwright) → fallback (e.g., framework-native HTTP testing). Prerequisite: Node.js + npx playwright install

| Project Type | Primary Validation Stack |
|-------------|--------------------------|
| Server-rendered web app (any language) | Playwright + framework integration tests + Docker-based infra |
| SPA (React/Angular/Vue) | Playwright |
| Pure REST API (any language) | Framework integration tests + Docker-based infra |
| Mixed app (API + browser UI) | Playwright + framework integration tests |
| Messaging / event-driven | Docker-based infra + framework test runner |
⚠️ Fallbacks are per-capability axis, NOT per-stack. Docker and Node.js are independent axes. Losing one capability does NOT affect the other. Apply each row independently based on which specific capability is actually unavailable.
| Capability Unavailable | What Changes | What Stays Unchanged | Gap Introduced |
|----------------------|--------------|----------------------|----------------|
| Docker only (Node.js available) | Docker-based infra → embedded/in-memory alternatives (H2/SQLite for DB, embedded broker for MQ, local mock for services) | Playwright stays — start app with embedded infra, run Playwright against it | Infrastructure fidelity loss (see §1.3.2 for DB-specific risks) |
| Node.js only (Docker available) | Playwright → framework-native HTTP testing | Docker-based infra stays — real DB/MQ/caches via Docker | Cannot verify browser rendering, static assets, browser behavior |
| Node.js available but Playwright install fails | Playwright → framework-native HTTP testing | Docker-based infra stays | Same as above; record exact npx playwright install error |
| Docker + Node.js both unavailable | Docker-based infra → embedded/in-memory, Playwright → framework-native HTTP testing | Framework HTTP tests + embedded infra (maximum degradation) | Both infrastructure fidelity and browser rendering gaps |
Key insight: Playwright requires Node.js + a running HTTP server. It does NOT require Docker. The server can run with embedded/in-memory infrastructure (H2, embedded brokers, etc.) when Docker is unavailable. Therefore "no Docker" NEVER justifies dropping Playwright.
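The per-capability fallback rule can be expressed as two independent decisions. The sketch below is illustrative only: `TierPlan`, `select_tiers`, and the gap labels are hypothetical names, and the tier strings loosely mirror the verdict block used later in this document.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TierPlan:
    infra_tier: str
    browser_tier: str
    known_gaps: Tuple[str, ...]


def select_tiers(docker: bool, node: bool, playwright_installs: bool,
                 renders_html: bool) -> TierPlan:
    """Choose each tier from its own capability; the axes never influence each other."""
    gaps = []

    # Axis 1: the infrastructure tier depends on Docker only.
    if docker:
        infra = "PRIMARY(Docker-based)"
    else:
        infra = "FALLBACK(embedded/in-memory)"
        gaps.append("infrastructure fidelity")

    # Axis 2: the browser tier depends on Node.js and the Playwright install only.
    if not renders_html:
        browser = "SKIPPED"  # API-only app: no browser E2E needed
    elif node and playwright_installs:
        browser = "PRIMARY(Playwright)"
    else:
        browser = "FALLBACK(framework-native HTTP testing)"
        gaps.append("browser rendering unverified")

    return TierPlan(infra, browser, tuple(gaps))
```

For example, `select_tiers(docker=False, node=True, playwright_installs=True, renders_html=True)` keeps Playwright as the browser tier while downgrading only the infrastructure tier.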
Execution rules — per-capability independence:
- npx playwright install --with-deps chromium for browser E2E, set up Docker-based infra for the infra tier). Skipping Playwright because Docker is unavailable, or skipping Docker-based infra because Node.js is unavailable, is NOT acceptable: these are independent axes.
- docker info fails → embedded/in-memory alternatives for infra, but does NOT justify dropping Playwright). The tester MUST record per-axis: (a) the exact command attempted, (b) the exact error output, (c) why it cannot be resolved within the task, (d) which specific capability axis is being downgraded.
- [notify:coordinator] with the per-capability blocker evidence. The coordinator decides whether to accept the fallback or remediate.
- knownGaps for that specific capability axis in the verdict block and final report.

Example:
## Testing Strategy
### Critical Journeys
1. [Primary user flow — e.g., login → access protected resource → verify]
2. [Core business operation — e.g., create entity → persist → retrieve → verify]
3. [Administrative flow — e.g., manage resources → CRUD operations]
4. [Auth failure path — e.g., invalid credentials → expected error response]
5. [Data integrity — e.g., create → update → delete → verify gone]
### Tooling
- Infra tier: real infrastructure via Testcontainers (prerequisite: Docker); fallback: embedded/in-memory alternatives (gap: fidelity loss, see §1.3.2 for DB-specific risks)
- HTTP/API tier: framework-native integration test client (e.g., MockMvc for Spring, supertest for Node.js, WebApplicationFactory for .NET)
- Browser tier: Playwright (prerequisite: Node.js + `npx playwright install`); fallback: framework-native HTTP testing (gap: actual view rendering, static assets, browser behavior)
- ⚠️ Infra tier and Browser tier are INDEPENDENT — losing Docker does NOT affect Playwright; losing Node.js does NOT affect Testcontainers
### Test Data
- Seed in test setup (before each test) with unique identifiers (UUID) to avoid interference
- Use auto-rollback or cleanup in teardown
- Do NOT assume pre-existing data
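The test-data rules above (seed per test, unique identifiers, cleanup in teardown) might look like the following in practice. This is a sketch under stated assumptions: it uses an in-memory SQLite database from the Python standard library purely for illustration, and the `customers` table and helper names are hypothetical.

```python
import sqlite3
import uuid
from contextlib import contextmanager


def make_test_db() -> sqlite3.Connection:
    """Create an empty in-memory database; no pre-existing data is assumed."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT PRIMARY KEY)")
    return conn


@contextmanager
def seeded_customer(conn: sqlite3.Connection):
    """Insert one uniquely named row for a test and remove it afterwards."""
    # A UUID suffix keeps tests from interfering with each other's data.
    name = f"customer-{uuid.uuid4()}"
    conn.execute("INSERT INTO customers (name) VALUES (?)", (name,))
    conn.commit()
    try:
        yield name
    finally:
        # Teardown: delete only the row this test created.
        conn.execute("DELETE FROM customers WHERE name = ?", (name,))
        conn.commit()
```

A test wraps its body in `with seeded_customer(conn) as name:` so the row exists only for the duration of that test.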
Audience: tester agent during the validation phase. Input: testing strategy from implementation plan (Part 1 output). Output: test code + evidence + verdict.
Before any test runs, verify the project compiles and the application starts.
| Indicator | Tech Stack | Default Start Command |
|-----------|------------|-----------------------|
| pom.xml with spring-boot plugin | Spring Boot (Maven) | mvn spring-boot:run |
| build.gradle with spring-boot plugin | Spring Boot (Gradle) | gradle bootRun |
| package.json with start script | Node.js | npm start |
| *.csproj with ASP.NET | .NET | dotnet run --project <web-project> |
If a runtime-validation-config.yaml exists in the project root, use it to override auto-detection:
startup:
command: "mvn spring-boot:run -Dspring-boot.run.profiles=test"
readinessUrl: "http://localhost:8080/actuator/health"
timeoutSeconds: 120
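The auto-detection table above can be approximated with a few file checks. This is a rough sketch: the substring checks (`spring-boot-maven-plugin`, `org.springframework.boot`, `Microsoft.NET.Sdk.Web`) are heuristic assumptions about how those projects are usually declared, `detect_start_command` is a hypothetical name, and the `runtime-validation-config.yaml` override is deliberately not handled here.

```python
import json
from pathlib import Path
from typing import Optional


def _contains(path: Path, needle: str) -> bool:
    """Return True if the file exists and its text contains the given substring."""
    return path.is_file() and needle in path.read_text(encoding="utf-8", errors="ignore")


def detect_start_command(project_dir: str) -> Optional[str]:
    """Guess the default start command from indicator files, or None if unknown."""
    root = Path(project_dir)

    if _contains(root / "pom.xml", "spring-boot-maven-plugin"):
        return "mvn spring-boot:run"
    if _contains(root / "build.gradle", "org.springframework.boot"):
        return "gradle bootRun"

    package_json = root / "package.json"
    if package_json.is_file():
        try:
            scripts = json.loads(package_json.read_text(encoding="utf-8")).get("scripts", {})
        except (json.JSONDecodeError, AttributeError):
            scripts = {}
        if "start" in scripts:
            return "npm start"

    # ASP.NET Core web projects normally use the Microsoft.NET.Sdk.Web SDK.
    for csproj in sorted(root.rglob("*.csproj")):
        if _contains(csproj, "Microsoft.NET.Sdk.Web"):
            return f"dotnet run --project {csproj.relative_to(root)}"

    return None
```

Returning `None` rather than guessing leaves room for the caller to escalate when no indicator file matches.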
- /actuator/health, /health, /healthz. Poll until 2xx.
- Started, Listening on port, Application is running.
- [notify:coordinator]

Fail-fast: If startup fails, skip all subsequent steps.
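Polling a readiness URL until it returns 2xx, with a hard timeout, can be sketched as below. The function names, the one-second polling interval, and the injectable `get_status`, `sleep`, and `clock` parameters are assumptions made for this illustration. The 120-second default mirrors the `timeoutSeconds` value in the sample config.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status of a GET request, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, or timeout


def wait_until_ready(
    url: str,
    timeout_seconds: float = 120.0,
    interval_seconds: float = 1.0,
    get_status: Callable[[str], Optional[int]] = http_status,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll the URL until it returns 2xx; return False once the deadline passes."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status(url)
        if status is not None and 200 <= status < 300:
            return True
        if clock() >= deadline:
            return False  # fail fast: the caller should skip all later steps
        sleep(interval_seconds)
```

A caller would typically run `wait_until_ready("http://localhost:8080/actuator/health")` right after launching the start command and abort the run when it returns `False`.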
⚠️ MANDATORY: You MUST produce new runnable test code. Running existing mvn test and curling endpoints is NOT sufficient.
Design authority: The planning-phase testing strategy is the source of truth and is a binding contract, not guidance.
- primaryValidationStack. Do NOT substitute a simpler tool because it "already works" (e.g., using H2 when Testcontainers PostgreSQL was specified).
- npx playwright install and write E2E tests, even if Docker is unavailable for the DB tier).
- fallbackMatrix entry for a given capability when that specific capability's prerequisite is actually unavailable after a real attempt, not because another capability failed, and not because setup seems complex.
- knownGaps AND escalate via [notify:coordinator] with the exact error that blocked that specific capability.

Before writing new tests, check the testing strategy for legacyTestAssets and migrationDecision. Legacy tests may have been user-provided (supplied directly in the request) or discovered on disk. Handle them the same way regardless of source.
Case A — migrationDecision: rewrite (e.g., AssertJ/Selenium → Playwright)
- *.spec.ts Playwright test that covers the same journey
- e2e/ or tests/ directory

| Legacy class | New Playwright spec | Journey covered |
Case B — migrationDecision: reuse-with-edits (existing Playwright suite)
- npx playwright test

| Test file | Change reason | Lines changed |
Case C — No legacy tests / migrationDecision: discard
Proceed directly to writing new tests from the testing strategy's critical journeys.
- @BeforeEach or test setup
- @AfterEach or use @Transactional rollback
- mvn test + curling endpoints → this checks existing tests, not new validation

When tests fail:
- [notify:role]. Do NOT modify production code.
- [notify:coordinator]

Every test run MUST end with this verdict block:
environment:
docker: AVAILABLE|UNAVAILABLE — <`docker info` output or error>
node: AVAILABLE|UNAVAILABLE — <`node --version` output or error>
playwright: AVAILABLE|UNAVAILABLE — <`npx playwright install` output or error>
infra-tier: PRIMARY(Docker-based)|FALLBACK(embedded/in-memory) — <reason if fallback>
browser-tier: PRIMARY(Playwright)|FALLBACK(MockMvc)|SKIPPED — <reason if not primary>
startup: PASS|FAIL — <start command>, <readiness signal>, <startup time>
integration: PASS|FAIL|UNVERIFIED — <scope>, <gaps>
e2e: PASS|FAIL|PARTIAL|UNVERIFIED — <flows tested>, <boundaries exercised>, <gaps>
overall: PASS|FAIL|NEEDS_SIGNOFF — <reason>
Also produce runtime-validation-report.md:
# Runtime Validation Report
**Generated**: [ISO 8601 timestamp]
**Target**: [project path]
## Summary
| Step | Status | Details |
|------|--------|---------|
| Startup | ✓ PASS | Started in 8.3s, /actuator/health → 200 |
| Integration Tests | ✓ PASS | 3 test files, 12 tests, all green |
| E2E Tests | N/A | No browser UI — skipped per testing strategy |
**Overall**: PASS
## Legacy Test Migration (if applicable)
| Decision | Source | Details |
|---|---|---|
| rewrite / reuse-with-edits / discard | [legacy framework] | [summary] |
### Test Mapping Table (Case A — rewrite)
| Legacy class | New Playwright spec | Journey covered |
|---|---|---|
### Change Log (Case B — reuse-with-edits)
| Test file | Change reason | Lines changed |
|---|---|---|
## Test Evidence
- Surefire reports: target/surefire-reports/*.xml
- [or] Playwright report: playwright-report/index.html
## Issues Found (if any)
| # | Severity | Description | Escalated To |
|---|----------|-------------|--------------|
After runtime validation completes, the final planning/quality reviewer must verify per-capability conformance between the executed test process and the approved testing strategy:
- docker info failed → H2 acceptable for DB tier, but does NOT justify dropping Playwright if Node.js was available)

A runtime-validation result is not complete if tests ran but the evidence does not show per-capability conformance to the approved testing strategy.
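A reviewer could mechanise part of this conformance check. The sketch below assumes the verdict block has already been parsed into a dictionary keyed by the field names shown in the verdict template, with each value reduced to its leading status token. The parsing step, the function name, and the violation messages are all hypothetical.

```python
from typing import Dict, List


def find_conformance_violations(verdict: Dict[str, str]) -> List[str]:
    """Flag fallbacks that were taken although the matching capability was available."""
    violations = []

    docker_ok = verdict["docker"] == "AVAILABLE"
    browser_ok = verdict["node"] == "AVAILABLE" and verdict["playwright"] == "AVAILABLE"

    # Each axis is judged only against its own capability.
    if docker_ok and verdict["infra-tier"].startswith("FALLBACK"):
        violations.append("infra-tier fell back although Docker was available")
    if browser_ok and verdict["browser-tier"].startswith("FALLBACK"):
        violations.append("browser-tier fell back although Node.js and Playwright were available")

    return violations
```

An empty list means no unjustified fallback was detected. It does not prove that the evidence for a justified fallback (command attempted, error output) was actually recorded.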
When H2 replaces PostgreSQL/MySQL as a fallback (Docker unavailable), document these risks in knownGaps:
- jsonb, hstore, inet, cidr, native uuid, array types (text[], int[])
- ON CONFLICT ... DO UPDATE (upsert), RETURNING clause
- string_agg, array_agg, FILTER clause may differ or be absent
- pgcrypto, pg_trgm, ltree, uuid-ossp

Mitigation during planning (Part 1): Scan src/main/resources/**/*.sql and repository interfaces for PostgreSQL-specific syntax. If found, mark Docker as REQUIRED (not just preferred) for the DB tier. If not found, H2 fallback is acceptable.
Mitigation during execution (Part 2): If H2 is used and the app starts successfully, proceed with testing — the tests still have value even if DB fidelity is reduced. If H2 startup fails due to unsupported SQL, document the specific incompatibility and escalate as a Docker-required blocker.
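The planning-phase scan for PostgreSQL-specific syntax could start from a handful of regular expressions. This is a minimal sketch, not a SQL parser: the pattern set covers only a subset of the risks listed above, it does not strip SQL comments or string literals, and the names `POSTGRES_ONLY_PATTERNS`, `scan_sql_text`, and `docker_required` are hypothetical.

```python
import re
from pathlib import Path
from typing import List

# Label -> regex, matched against lower-cased SQL text.
POSTGRES_ONLY_PATTERNS = {
    "jsonb type": r"\bjsonb\b",
    "hstore type": r"\bhstore\b",
    "array type": r"\b(?:text|int)\[\]",
    "upsert": r"\bon\s+conflict\b",
    "RETURNING clause": r"\breturning\b",
    "aggregate": r"\b(?:string_agg|array_agg)\s*\(",
    "extension": r"\b(?:pgcrypto|pg_trgm|ltree|uuid-ossp)\b",
}


def scan_sql_text(sql: str) -> List[str]:
    """Return the sorted labels of PostgreSQL-specific constructs found in the text."""
    lowered = sql.lower()
    return sorted(
        label
        for label, pattern in POSTGRES_ONLY_PATTERNS.items()
        if re.search(pattern, lowered)
    )


def docker_required(sql_dir: str) -> bool:
    """True if any *.sql file under the directory uses a flagged construct."""
    return any(
        scan_sql_text(path.read_text(encoding="utf-8", errors="ignore"))
        for path in Path(sql_dir).rglob("*.sql")
    )
```

A hit should be read as a prompt to inspect the statement, since a keyword inside a comment or string literal will also match.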