skills/quality-assurance/SKILL.md
End-to-end quality assurance and secure coding for any software repo: code review, test strategy, bug triage, debugging, flaky-test repair, coverage improvement, coverage assessment, coverage regression, identify low-coverage files, coverage artifacts, make coverage, pnpm test:coverage, coverage thresholds failing, propose high-impact tests, coverage gaps, confirm before writing tests, suite architecture, CI/CD quality gates, security best practices, security audits, and AppSec threat modeling for frontend, backend, or full-stack systems. Use when reviewing PRs, receiving review feedback, writing or repairing tests, debugging failing or flaky suites, proving browser behavior, hardening frontend or backend CI, improving release confidence with reliable verification, defining a testing strategy for a new project, security review, security audit, security scan, security report, write a security report, security vulnerability, secure by default, secure coding, write secure code, audit codebase for vulnerabilities, security best practices review, find vulnerabilities, passive security review, active security audit, threat model, threat modeling, AppSec threat model, security threat model, threat model a codebase, threat model a repository, enumerate threats, abuse paths, trust boundaries, attacker model, attack surface analysis, threat enumeration, STRIDE threat model, PASTA threat model, STRIDE PASTA, security guidance for React, security guidance for Django, security guidance for FastAPI, security guidance for Flask, security guidance for Express, security guidance for Next.js, security guidance for Vue, security guidance for jQuery, security guidance for Go, python web security, javascript web security, typescript web security, golang backend security, XSS prevention, SQL injection prevention, CSRF protection, CSP Content Security Policy, secure token storage, avoid prototype pollution, safe HTML rendering, OWASP, cloud security, infrastructure security, IAM security, AWS security, cloud infrastructure security, cloud deployment security, CI/CD security, pipeline security, secrets rotation, secrets management, VPC security, network security, CloudWatch logging, cloud secrets manager, Cloudflare WAF, CDN security, DDoS protection, backup disaster recovery, OIDC federation CI, supply chain security, cloud misconfiguration, S3 bucket security, RDS security, Terraform security, rate limiting, dependency security, npm audit, file upload validation, input validation schema, blockchain security, Solana security, Web3 security, wallet signature verification, wallet ownership verification, transaction verification, Solana wallet, on-chain security, crypto security, NFT security, DeFi security, smart contract interaction security, scan Claude Code configuration, agent configuration security scan, AgentShield, ecc-agentshield, agentshield scan, scan .claude directory, CLAUDE.md security audit, settings.json security, MCP server security scan, hooks security audit, agent definition security, claude code config vulnerability, prompt injection CLAUDE.md, hook command injection, MCP supply chain risk, overly permissive allow list, bash wildcard permission, hardcoded secrets in config, claude code security, agent config hardening, secure claude code setup, initialize secure config, agentshield init, agentshield fix, agentshield opus deep analysis, agentshield github action, claude code settings audit, designing a test pyramid, applying test pyramid ratio 70 20 10, defining unit integration E2E test distribution, AI testing pyramid, AI application test pyramid, LLM testing pyramid, AI test strategy, AI eval pyramid, offline AI evaluations, LLM evals, AI evals, agent workflow testing, red team AI testing, human QA AI, LLM-as-judge, PromptFoo evals, Ragas evals, DeepEval evals, RAG testing pyramid, AI agent test layers, deterministic unit tests AI, AI contracts testing, AI integration testing, test pyramid for LLM apps, test pyramid for agent systems, eval framework selection, AI quality assurance, before refactoring to build a safety net, setting up CI CD automated tests, quality issues bugs happen frequently, how should we test, test strategy for, write tests for, test plan, what tests do we need, testing approach, test strategy document output, coverage goals and tools selection, fixing lint errors or formatting issues, running pre-commit checks, using yarn prettier or yarn linc, adding new React error messages, seeing 'unknown error code' warnings, writing Django or DRF tests, generating Factory Boy fixtures, debugging pytest failures in a Django project, testing Django models, serializers, or API views, working with React feature flags, understanding @gate pragmas, debugging channel-specific test failures, adding new flags to ReactFeatureFlags, generating test planning documentation from PRDs or implementation plans, applying ISTQB test design techniques, applying ISO 25010 quality characteristics, creating GitHub test issue templates, defining quality gates and entry/exit criteria, estimating test tasks, structuring test strategies for GitHub project management, writing TypeScript Playwright E2E tests, implementing page object model, reusing auth state across tests, managing test data with API helpers, configuring Playwright for CI, writing smoke tests or critical-path regression tests, debugging flaky E2E tests, cross-browser E2E testing with Chromium Firefox or WebKit, writing Cypress E2E tests, Cypress custom commands, cy.intercept network mocking, Cypress session auth reuse, visual regression testing with toHaveScreenshot, parallel test sharding, accessibility testing with axe-core, Playwright network interception with page.route, test.step structured reporting, E2E-first testing philosophy, preferring E2E tests over unit tests, avoiding React component tests in isolation, minimizing mocking, 3-mock rule, MSW integration tests, Vitest unit tests for pure functions, co-located test files, createTestingAccount utility, addAccountBalance utility, role-based selectors, accessible selectors in Playwright, test ID selectors, avoid CSS selectors in E2E tests, Playwright installation npm init playwright, Playwright configuration webServer, beforeEach afterEach beforeAll afterAll test hooks, serial test mode test.describe.configure, Playwright form interactions fill check uncheck selectOption setInputFiles, Playwright keyboard interactions press type dblclick dragAndDrop, Playwright assertions toBeVisible toBeHidden toHaveText toHaveCount toHaveCSS, soft assertions expect.soft, toPass poll assertion, custom fixtures test.extend, multiple user roles fixture adminPage userPage, test tags filtering tag @smoke @regression grep, video recording retain-on-failure, browser console capture page.on console pageerror, popup handling context waitForEvent, conditional optional elements count isVisible, data-driven tests for loop parametrized, Playwright Docker mcr.microsoft.com playwright, generating Playwright tests from a scenario, Playwright MCP test generation, generate test with Playwright MCP, walk scenario live before writing code, emit test after MCP steps, save and execute generated Playwright test, iterate until Playwright test passes, pytest fixtures, pytest parametrize, pytest markers, pytest-mock mocker, pytest-asyncio async tests, pytest-cov coverage, pytest-xdist parallel pytest, FastAPI testing TestClient AsyncClient httpx, pytest conftest, fixture scope function class module session, autouse fixture, arrange act assert, pytest.raises, pytest.approx, pytest.mark.skip pytest.mark.xfail, pytest.ini addopts testpaths, writing Python unit tests, writing Python integration tests, testing FastAPI endpoints, async Python testing, SQLAlchemy test database override, dependency override FastAPI test, aiosqlite test database, pytest best practices, pytest anti-patterns, pytest common pitfalls, pytest coverage annotate, cov_annotate, coverage annotate report, lines missing coverage, increase coverage to 100%, exclamation mark uncovered lines, iterative coverage improvement, TDD test-driven development, red green refactor, write failing test first, TDD iron laws, no production code without failing test, observe the red, unittest.mock patch Mock mock_open autospec PropertyMock, testing file operations tmp_path tmpdir tempfile, mocking async functions assert_awaited_once, coverage target 80 percent critical path 100 percent coverage, quick reference pytest cheatsheet, monkeypatch setenv delenv setattr, MagicMock magic methods, freezegun freeze_time time travel time freezing, property-based testing hypothesis given strategies, SQLAlchemy in-memory test database db_session, testing retry behavior side_effect sequence, concurrent async asyncio.gather, CI CD GitHub Actions Python pytest workflow, pyproject.toml coverage omit exclude_lines, test naming convention test unit scenario expected outcome, testing error paths failure cases, parametrized fixture fixture params request.param fixture parametrization backend variant environment parametrize fixture, test organization directory structure tests unit integration e2e conftest shared fixtures, python testing patterns, Mastra smoke test, create-mastra, Mastra Studio, Mastra Studio smoke test, smoke test Mastra, create mastra project, mastra dev server, mastra agents page, mastra tools page, mastra workflows page, mastra scorers page, mastra observability, mastra network mode, mastra agent network, plannerNetwork, mastra localhost 4111, mastra LLM provider, mastra openai anthropic groq google cerebras mistral, mastra environment variables, mastra browser testing, mastra studio routes, smoke testing a web app, smoke test checklist browser automation, Jest unit tests, Vitest unit tests, describe beforeEach afterEach jest.fn mockResolvedValue, Supertest API testing integration tests, API endpoint testing Node.js, k6 load testing, k6 stress test, k6 spike test, k6 thresholds, Artillery performance testing, load test ramp up ramp down, p95 response time SLA, performance testing stages, security requirement extraction, threat model to requirements, STRIDE requirements, security user stories, security acceptance criteria, compliance requirement mapping, security traceability matrix, threat to requirement mapping, STRIDE PASTA security requirements, PCI-DSS requirements, HIPAA requirements, GDPR requirements, OWASP ASVS requirements, security requirements from threat model, derive security requirements, security requirement priority, security testing, authentication security tests, authorization IDOR privilege escalation, input validation SQL injection XSS, security headers CSP HSTS X-Frame-Options, rate limiting brute force, OWASP security test checklist, test report template, defect report, severity CRITICAL HIGH MEDIUM LOW, coverage analysis gaps recommendations, QA methodology, exploratory testing charter, usability testing, accessibility testing WCAG 2.1 AA, localization testing RTL UTF-8, compatibility matrix browser OS, pairwise testing, risk-based testing, defect management 5 whys root cause analysis, quality metrics DRE defect leakage MTTR, quality dashboard, continuous testing shift-left, feedback cycle targets, quality gates production release, automation framework, Screenplay pattern Actor Task, keyword-driven testing, model-based testing state machine, self-healing locators multi-strategy finder, error recovery smart retry exponential backoff, parallel distributed execution Playwright sharding, test data factories Faker UserFactory, team enablement training code review checklist, automation ROI calculation break-even, custom reporter metrics, prefer integration tests over unit tests, mock external services use real fixtures, minimize edge case testing, always add regression tests for bugs, cover every user entry point, tests validate before manual QA, testing philosophy testing principles, pnpm test pnpm test:run, co-locate test files star.test.ts, test isolation temp directory cleanup afterEach, pure function tests no setup, sanitized fixtures PII scrubbing, happy path integration test entry point coverage, regression test for bug fail before fix pass after, public interface not internal methods, real world fixtures not mocks for data, test submission checklist, boundary conditions, boundary value analysis, edge case testing, limit values, integer overflow underflow, MIN_VALUE MAX_VALUE, null empty whitespace string testing, collection empty single many, floating point precision tolerance, NaN Infinity special values, array index out of bounds, date time leap year boundaries, parameterized boundary tests, JUnit 5 boundary testing, ParameterizedTest ValueSource CsvSource, Math.addExact Math.subtractExact, isCloseTo within tolerance, pytest.approx sys.maxsize, toBeCloseTo Number.MAX_SAFE_INTEGER, it.each boundary cases, vitest vitest.config.ts vite testing framework, vi.fn vi.mock vi.spyOn vitest mocks, vi.useFakeTimers vi.setSystemTime vitest timers, vitest globals jsdom happy-dom environment, vitest coverage v8 istanbul thresholds, vitest sharding shard CI parallel, expectTypeOf type testing test-d.ts, vitest browser mode playwright chromium, vitest fixtures test.extend custom fixtures, vitest snapshots toMatchSnapshot toMatchInlineSnapshot, vitest workspace projects monorepo, vitest benchmarks bench describe, vitest watch mode HMR, vitest run coverage, vitest concurrent sequential test.concurrent, defineConfig mergeConfig vitest config, vitest setup files setupFiles globalSetup, vitest pool threads forks vmThreads, vitest aroundEach aroundAll around hooks, vi.hoisted hoisted variables vitest, vi.stubGlobal vi.stubEnv vitest globals env, vi.mockObject mock object vitest, vitest soft assertions expect.soft, vitest poll assertion expect.poll, vitest filter tags testNamePattern changed, vitest related files imports, vitest typecheck type check, NestJS E2E testing, NestJS jest e2e, nestjs test e2e, jest-e2e.config.ts, e2e-spec.ts, test/e2e directory, GWT pattern, Given-When-Then test, real infrastructure testing, Docker E2E testing, docker-compose E2E, docker-compose testing infrastructure, Kafka E2E testing, Kafka Jest testing, KafkaTestHelper, Redpanda testing, Kafka consumer E2E, Kafka producer E2E, kafka fromBeginning pre-subscription, waitForMessages smart polling, consumer group isolation, PostgreSQL E2E testing, PostgresTestHelper, MongoDB E2E testing, MongoDbTestHelper, Redis E2E testing, RedisTestHelper, MSW external API mock E2E, Nock API mock, supertest NestJS, NestJS test helper, NestJS app bootstrap test, connectMicroservice test, inheritAppConfig, runInBand sequential jest, jest maxWorkers 1, jest forceExit detectOpenHandles, E2E session temp file, E2E_SESSION log file redirect, fix one test at a time E2E, debug E2E test, review E2E test, setup E2E test, write E2E test, run E2E test, optimize E2E test, e2e test flaky NestJS, e2e connection error database Kafka Redis, e2e timeout async, e2e race condition, e2e state leakage, e2e test isolation beforeEach cleanup, release readiness review, release candidate review, release gate, ship or block, green light to ship, blocked release, release diff review, release diff audit, release review checklist, breaking changes detection, regression detection, release risk assessment, find latest release tag, BASE_TAG TARGET diff, git diff release, version bump check, changelog audit, migration path audit, release blocking triggers, release polish, release notes audit, release readiness checklist, pre-release review, pre-release audit, validate release, release review workflow, release call, release gate policy, deterministic gate, ship block call, openai-agents-python release, release candidate diff, release sign-off, release readiness gate, make format make lint make typecheck make tests, makefile verification stack, parallel make steps, pre-commit make workflow, code-change-verification, run verification stack, make sync uv dev requirements, fail-fast parallel make, heartbeat make steps, make verify run, autonomous penetration testing, pentest web application, shannon keygraph, npx shannon, run a pentest, active security testing, DAST dynamic application security testing, proof of concept exploit, PoC exploit, vulnerability confirmation, confirmed exploit, exploitable vulnerability, security assessment report, JWT algorithm confusion, JWT none algorithm, RS256 HS256 confusion, KID header injection, NoSQL injection MongoDB, mass assignment vulnerability, SSRF cloud metadata, SSRF auth header forwarding, command injection OS, horizontal privilege escalation, vertical privilege escalation, 41 checkpoint security test, pentest staging environment, autonomous pentest, automated pentest, AI pentesting, security assessment, penetration testing report, vulnerability exploitation, exploit validation, pentest framework, authorized testing, QA a web app, test this site, find bugs and fix, test and fix bugs, live web app QA, browser QA testing, QA report, health score, issue taxonomy, QA health score, quick QA, standard QA, exhaustive QA, diff-aware QA, QA tier, QA mode, QA phases, fix loop QA, before after screenshot, ship readiness, does this work, feature ready for testing, QA this branch, QA this PR, QA this feature, visual bug, functional bug, UX bug, console error QA, broken link QA, accessibility bug, content bug, performance bug, per-page exploration, QA checklist, framework detection QA, Next.js QA, Rails QA, WordPress QA, SPA QA, QA regression, WTF likelihood, verified best-effort reverted deferred, commit per fix, minimal fix QA, regression test QA, qa-report-template, issue-taxonomy, systematic debugging, root cause investigation, investigate this error, root cause analysis, debug report, hypothesis testing, scope lock, pattern analysis bug, investigate broken functionality, debug this, fix this bug, why is this broken, phase investigation, race condition debug, null propagation debug, state corruption debug, integration failure debug, configuration drift debug, stale cache debug, 3-strike debugging, blast radius fix, fix root cause not symptom, no fixes without root cause, iron law debugging, reproduce before repair, bug investigation phases, structured debug report, DONE DONE_WITH_CONCERNS BLOCKED status, CSO review, chief security officer audit, comprehensive security audit, infrastructure-first security audit, secrets archaeology, git history credential scan, find leaked credentials in git history, CI/CD pipeline security audit, LLM AI security audit, skill supply chain security scan, attack surface census, webhook signature verification audit, infrastructure shadow surface, false positive confidence gate, security posture report, 15-phase security audit, secrets in git history, dependency supply chain audit, OWASP Top 10 assessment, STRIDE per component threat model, data classification security, active verification security findings, SOC 2 compliance, SOC 2 Type II, PCI-DSS compliance, HIPAA compliance, GDPR compliance, compliance checker, compliance scan, compliance audit, compliance report, compliance framework, compliance validation, compliance score, compliance gap, run compliance check, check compliance, verify compliance, compliance controls, SOC 2 controls, HIPAA safeguards, GDPR principles, PCI cardholder data, CVE triage, CVSS scoring, CVSS score, CVSS v3.1, vulnerability management, vulnerability lifecycle, vulnerability assessment, vulnerability report, vulnerability remediation, dependency CVE, npm vulnerability, Python vulnerability, go vulnerability, vulnerability scanner, vulnerability assessor, scan for vulnerabilities, automated vulnerability scan, risk score dependency, emergency patch, patch priority, vulnerability SLA, security scanner script, scan codebase security, automated code security scan, detect hardcoded secrets, detect injection vulnerabilities, path traversal detection, security scan exit code, zero trust architecture, defense in depth, design secure architecture, secure architecture design, security incident response, handle security incident, incident response plan, incident response playbook, P1 P2 P3 P4 incident severity, breach containment, eradication recovery post-mortem, FIDO2 WebAuthn hardware keys, mTLS mutual TLS, OAuth 2.0 PKCE, JWT short expiry refresh token, authentication pattern selection, AES-256-GCM encryption, Argon2id password hashing, bcrypt hashing, HMAC-SHA256, key management key rotation, envelope encryption DEK KEK, vulnerability severity matrix impact exploitability, security code review checklist, secure vs insecure patterns, SQL injection parameterized query, Ed25519 digital signature, X25519 key exchange, PBKDF2 scrypt KDF.
npx skillsauth add alvarovillalbaa/agent-suite quality-assuranceInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Quality assurance is a delivery system, not a phase. Reconstruct intended behavior, choose the cheapest evidence that can prove or falsify it, then wire the same verification into repeatable local and CI workflows.
In command examples below, <skill-dir> means the installed quality-assurance skill directory and <repo-root> means the target repository root.
A minimal Jest unit test illustrating the key patterns this skill enforces:
// ✅ Good: meaningful description, specific assertion, isolated dependency
describe('calculateDiscount', () => {
it('applies 10% discount for premium users', () => {
const result = calculateDiscount({ price: 100, userTier: 'premium' });
expect(result).toBe(90); // specific outcome, not just truthy
});
it('throws on negative price', () => {
expect(() => calculateDiscount({ price: -1, userTier: 'standard' }))
.toThrow('Price must be non-negative');
});
});
Apply the same structure for pytest (def test_…, assert result == expected) and other frameworks.
MUST DO
it('…') descriptions that read as plain-English specificationsexpect(result).toBe(90)), not just truthinessMUST NOT
python <skill-dir>/scripts/qa-scan.py <repo-root> when the bundled scanner is available; otherwise perform the same stack and CI inventory manually.Makefile, package.json, pyproject.toml, tox.ini, noxfile.py, justfile, Taskfile.yml, or CI config before inventing new ones.Use scripts/qa-scan.py. It detects likely languages, frameworks, test runners, linters, and CI providers, then suggests which references to load and which commands probably matter.
Read references/code-review.md for:
gh pr checkout) and preflight verification (npm run preflight)Read references/test-strategy.md for:
Use this section when the system under test includes LLMs, AI agents, RAG pipelines, or any component whose output is nondeterministic.
Read references/ai-evals.md for:
data_source_config + testing_criteria → eval run → report_url resultsDecision rule: if a system prompt, tool call, agent handoff, RAG retrieval, or LLM output is part of the feature under test, add an eval for it. Deterministic code paths still get unit/integration tests.
Use this workflow when coverage drops below targets, coverage thresholds fail in CI, or the user requests stronger test coverage.
Workflow (always confirm with the user before writing any code):
make coverage (Python) or pnpm test:coverage / npm run test:coverage (Node.js) at the repo root. Avoid watch flags. Keep prior artifacts only when comparing trends.coverage report -m for file-level totals.coverage.xml for tooling or spreadsheets.uv run coverage html (Python) or equivalent to generate htmlcov/index.html for interactive drill-down.Design principles for high-impact tests:
For pytest coverage tooling (
--cov,--cov-report, annotate workflow for reaching 100%) seereferences/pytest.md. For coverage targets by area and the coverage lagging-indicator rule seereferences/test-strategy.md.
Read references/backend-testing.md for:
Database strategy by test layer:
| Layer | Database | Rationale | |-------|----------|-----------| | Unit | Mock / in-memory | Fast, isolated, no side effects | | Contract | Test database (seeded) | Verify schema and interface expectations against real SQL | | Integration | Test database (seeded, reset per test) | Full code path with real queries; never production | | E2E | Docker-based real infrastructure | Same engine and behavior as production | | Production | Never used in tests | Protect real data; use anonymized fixtures instead |
Always reset or truncate the test database in
beforeEach/afterEach. Never share test database state across test runs in CI.
Read references/pytest.md when writing Python tests with pytest. Covers:
pytest, pytest-cov, pytest-asyncio, pytest-mock, pytest-xdist)pytest.raises, pytest.approxfunction/class/module/session), setup/teardown with yield, fixture dependencies@pytest.mark.parametrize, explicit IDs, indirect parametrization through fixtures, fixture-level params + request.param for environment/variant testingskip, skipif, xfail, slow, unit, integration, smoke), pytest.ini marker registrationTestClient, async AsyncClient with httpx, dependency override for database testingmocker.patch, mocker.patch.object, side_effect, mocker.spy, mocker.AsyncMock--cov, --cov-report=term-missing/html/xml/annotate, --cov-fail-under, annotate report workflow for reaching 100% coverage (lines prefixed ! = not covered), pytest.ini coverage configurationpytest-asyncio: @pytest.mark.asyncio, async fixtures, async session scopepytest-xdist: -n auto, --dist=loadfiletests/test_unit/, tests/test_integration/, tests/test_e2e/ layouttest_<unit>_<scenario>_<expected>)For Django/DRF tests (Factory Boy, model/serializer/view tests) see the section below.
Read references/django-drf-testing.md for:
unit, integration, e2e, smoke) and run commandsconftest.py fixture setupRead references/test-planning.md when you need to:
Read references/frontend-testing.md for:
Read references/frontend-mocking.md for:
Read references/frontend-async-testing.md for:
Read references/frontend-patterns.md for:
Read references/frontend-testing-best-practices.md when the repo follows an E2E-first strategy or when deciding what type of test to write. Covers:
e2e/tests/ layout, createTestingAccount / addAccountBalance helpers, beforeEach isolation, waitForURL patternsdata-testid; never CSS class or generated ID selectorsfoo.test.ts, Vitest, describe groups, test.each for parameterized cases, edge casesThis is an opinionated philosophy. If the repo uses a different strategy (e.g., Testing Library component tests or a traditional pyramid), use
references/frontend-testing.mdinstead.
Read references/react-testing-strategies.md for React and Next.js specific testing guidance:
jest.config.js — higher for services/payment/ and services/auth/*.test.tsx vs. separate __tests__/ directory*.test.ts / *.spec.ts / *.integration.test.ts / *.a11y.test.ts--shard=N/M)Read references/test-automation-patterns.md for reusable React test patterns:
LoginPage class with typed locators and action methods; Component Object Model for React Testing LibraryOrderBuilder), Faker-based factories for realistic datatest.extend<AuthFixtures> with authenticatedPage / adminPage / testUser fixtures; skip UI login via APIbeforeAll/afterEach/afterAll for MSW server, window.matchMedia and IntersectionObserver mocksrest.get/post handlers, per-test overrides with server.use(), setupServer in msw/nodejest.mock(), jest.requireActual() for partial mocks, mocking next/routerrenderWithProviders wrapping QueryClientProvider + AuthProvider + ThemeProviderfindBy* vs waitFor, waitForElementToBeRemoved, debounce testing with fake timerswaitForResponse, waitForURL, auto-retrying assertionsgetByRole > getByLabelText > getByPlaceholderText > getByTestIdThree automation scripts for React/Next.js projects (in scripts/):
scripts/test_suite_generator.py — Scans React/TypeScript components and generates Jest + React Testing Library test stubs:
# Scan and generate test stubs for all components
python <skill-dir>/scripts/test_suite_generator.py src/components/ --output __tests__/
# Include accessibility tests
python <skill-dir>/scripts/test_suite_generator.py src/ --output __tests__/ --include-a11y
# Scan only (no output, just report what's untested)
python <skill-dir>/scripts/test_suite_generator.py src/components/ --scan-only
scripts/coverage_analyzer.py — Parses Jest/Istanbul coverage reports and surfaces gaps with recommendations:
# Analyze coverage report with threshold
python <skill-dir>/scripts/coverage_analyzer.py coverage/coverage-final.json --threshold 80
# Strict mode — exit 1 if below threshold
python <skill-dir>/scripts/coverage_analyzer.py coverage/ --threshold 80 --strict
# Generate HTML report
python <skill-dir>/scripts/coverage_analyzer.py coverage/ --format html --output report.html
# Identify critical uncovered paths
python <skill-dir>/scripts/coverage_analyzer.py coverage/ --critical-paths
# Compare to previous coverage snapshot
python <skill-dir>/scripts/coverage_analyzer.py coverage/ --compare previous-coverage.json
scripts/e2e_test_scaffolder.py — Scans Next.js App Router and generates Playwright test files with Page Object Models:
# Scaffold E2E tests for Next.js App Router
python <skill-dir>/scripts/e2e_test_scaffolder.py src/app/ --output e2e/
# Include Page Object Model classes
python <skill-dir>/scripts/e2e_test_scaffolder.py src/app/ --output e2e/ --include-pom
# Scaffold for specific routes only
python <skill-dir>/scripts/e2e_test_scaffolder.py src/app/ --routes "/login,/dashboard,/checkout"
Read references/browser-playwright.md for one-off Python Playwright automation (debugging, recon, scraping, screenshots):
scripts/with_server.py — start one or more servers, run your script, then clean up automaticallyexamples/ — runnable scripts for element discovery, static HTML, and console log captureFor TypeScript Playwright E2E test suites (page objects, auth reuse, CI) see the section below.
Use this workflow when the user says "QA this app", "test this site", "find bugs and fix", "test and fix", or "does this work?" It is distinct from writing automated tests — it drives a real browser, finds actual bugs in a running app, scores overall health, fixes issues, and produces a structured report.
Three coverage tiers:
Diff-aware mode (automatic on feature branches): Run git diff main...HEAD --name-only to identify changed files, map them to affected pages/routes, test only those pages first, then check for regressions on adjacent pages. If no pages are obvious from the diff, fall back to Quick mode on the homepage.
Workflow phases:
__next → Next.js, csrf-token → Rails, wp-content → WordPress), create output dir, copy templates/qa-report-template.md.links, check console errors on landing.fix(qa): ISSUE-NNN — description) → before/after screenshots → classify as verified/best-effort/reverted. Write a regression test for each verified fix with // Regression: ISSUE-NNN — {what broke} comment. Stop and ask if WTF-likelihood > 20% (multiple reverts, fixes touching >3 files).templates/qa-report-template.md; include health score delta, fix table, before/after evidence, and a one-line PR summary ("QA found N issues, fixed M, health score X → Y").Health score rubric (0–100 per category, weighted average):
| Category | Weight | Scoring | |----------|--------|---------| | Console | 15% | 0 errors → 100; 1–3 → 70; 4–10 → 40; 10+ → 10 | | Links | 10% | Start 100; -15 per broken link (min 0) | | Functional | 20% | Start 100; Critical: -25; High: -15; Medium: -8; Low: -3 | | UX | 15% | Start 100; Critical: -25; High: -15; Medium: -8; Low: -3 | | Accessibility | 15% | Start 100; Critical: -25; High: -15; Medium: -8; Low: -3 | | Visual | 10% | Start 100; Critical: -25; High: -15; Medium: -8; Low: -3 | | Performance | 10% | Start 100; Critical: -25; High: -15; Medium: -8; Low: -3 | | Content | 5% | Start 100; Critical: -25; High: -15; Medium: -8; Low: -3 |
Bug classification: Read references/issue-taxonomy.md for severity definitions (critical/high/medium/low) and the 7 category taxonomy (visual, functional, UX, content, performance, console, accessibility), plus a per-page exploration checklist.
Report template: Use templates/qa-report-template.md for structured QA reports with health scores, issue tables, before/after evidence, and ship readiness summary.
Read references/e2e-playwright-ts.md for:
e2e/ directory layout (fixtures, pages, tests, utils)data-testid > role > label > placeholder > text)waitForTimeout)storageState (authenticate once, reuse across all tests)finallyplaywright.config.ts production setup: CI-aware retries, multi-browser projects, dev server autostartscripts/run-e2e-with-report.sh — local report generation with configurable project and output dirpage.route() — stub responses, modify payloads, wait for responsestoHaveScreenshot — snapshot workflow and update strategy--shard=N/M — CI split strategies and slow test tagging@axe-core/playwright — WCAG violation detection in automated teststest.step structured reporting — labeled steps in trace viewer and HTML reportSee examples/e2e/ for annotated runnable templates:
page-object-template.ts — base + concrete page object with locators and actione2e-test-template.ts — full test suite covering success, validation error, duplicate, and cancel pathsplaywright-config-example.ts — production Playwright config with auth, multi-browser, and CI settingsUse this workflow when the user provides a scenario and you need to generate a passing Playwright test from it using the Playwright MCP tools.
Hard rules — follow in order, no exceptions:
test() blocks, no locators.@playwright/test that accurately reflects what happened in the live session.tests/ directory (or the established E2E test dir).npx playwright test <file>.Why this order matters: Generating test code before observing live behavior produces fragile selectors, wrong URLs, and mismatched assertions. The MCP session is the source of truth.
Read references/testing-principles.md when the repo follows an opinionated testing philosophy that emphasizes integration tests over unit tests, minimizing edge cases, regression coverage for every bug, and entry-point coverage before manual QA. Covers:
*.test.ts co-located with source, temp-dir isolation with beforeEach/afterEach cleanup, pure function test patternspnpm test (watch mode), pnpm test:run (once)Read references/tdd-iron-laws.md when practicing test-driven development or enforcing a test-first workflow. Covers:
Use this alongside
references/testing-anti-patterns.md. TDD prevents most anti-patterns from forming.
Read references/testable-code-design.md when designing code for testability, improving test naming, or establishing quality metrics:
mockDb / mockEmail without hitting real servicesdate: Date--inspect-brk, RTL screen.debug() / logTestingPlaygroundURL(), Playwright --debug / page.pause()Read the references/vitest/ files when writing unit or integration tests for Vite-based projects. Load only what's relevant to your task:
| Task | File | | ---- | ---- | | Setting up vitest.config.ts, CLI, projects | references/vitest/config.md | | Writing describe/it blocks, hooks, fixtures | references/vitest/test-api.md | | Mocking modules, timers, dates, spies | references/vitest/mocking.md | | Assertions, snapshots, coverage, filtering | references/vitest/utilities.md | | Environments, type testing, browser mode | references/vitest/advanced.md |
Quick start: npm i -D vitest → vitest.config.ts with defineConfig({ test: { globals: true } }) → import { describe, expect, it, vi } from 'vitest'.
Key Vitest-specific patterns:
vi.fn() / vi.mock() / vi.spyOn() — Vitest's native mock API (not jest.*)vi.useFakeTimers() / vi.setSystemTime() — timer and date controltest.concurrent / test.sequential — fine-grained parallelism controlvitest --shard=1/3 — CI sharding without extra toolingexpectTypeOf / .test-d.ts files — TypeScript type-level testsDO NOT load all vitest files at once. Load only the file that matches your current task.
Read references/unit-testing.md for concise Jest and Vitest patterns:
describe/it/beforeEach/afterEach structure for TypeScript servicesjest.fn(), mockResolvedValue, mockRejectedValue — mock functions and async returnsjest.mock('./module') — module-level mockingjest.spyOn — non-destructive method spyingMock, AsyncMock)For Python-specific pytest depth (fixtures, parametrize, markers, coverage) see
references/pytest.md.
Read references/boundary-conditions.md when writing tests that must cover limit values, edge inputs, or overflow scenarios. Covers:
@ParameterizedTest + @ValueSource/@CsvSource, Math.addExact for overflow detection, isCloseTo for float precisionit.each for parameterized boundaries, toBeCloseTo for float, Number.MAX_SAFE_INTEGER edge@pytest.mark.parametrize for multi-value boundaries, pytest.approx for float, sys.maxsize numeric limitsint/JS Number), NaN detection, DST-aware date boundariesRead references/integration-testing.md for Node.js API integration test patterns:
request(app).post().send().expect() — end-to-end HTTP assertionsbeforeAll and protected-route testingbeforeEach for isolationAsyncClient patterns for FastAPI APIsFor Django/DRF-specific integration testing (Factory Boy, viewsets, serializers) see
references/django-drf-testing.md.
Read references/typescript-e2e/common/rules.md first — it covers mandatory GWT pattern, timeout config, test isolation requirements, and the critical output-to-temp-file pattern that prevents context bloat.
Use the appropriate workflow for your task:
| Workflow | When to Use | File | |----------|-------------|------| | Setup | Initialize E2E infrastructure (Docker, Jest config, test helpers) for a NestJS project | references/workflows/typescript-e2e/setup/workflow.md | | Writing | Create new E2E tests with GWT pattern | references/workflows/typescript-e2e/writing/workflow.md | | Review | Audit existing tests for quality, isolation, GWT compliance | references/workflows/typescript-e2e/review/workflow.md | | Running | Execute tests, start/verify Docker infrastructure | references/workflows/typescript-e2e/running/workflow.md | | Debugging | Fix failing or flaky tests one at a time | references/workflows/typescript-e2e/debugging/workflow.md | | Optimizing | Improve test suite performance and startup time | references/workflows/typescript-e2e/optimize/workflow.md |
Knowledge base by technology (load only what the task needs):
references/typescript-e2e/common/ — GWT rules, NestJS setup, debugging, best practices, examplesreferences/typescript-e2e/kafka/ — KafkaTestHelper, pre-subscription isolation, smart polling, Docker, performancereferences/typescript-e2e/postgres/ — PostgresTestHelper, cleanup patterns, transaction examplesreferences/typescript-e2e/mongodb/ — MongoDbTestHelper, Docker setup, document/aggregation examplesreferences/typescript-e2e/redis/ — RedisTestHelper, Docker setup, TTL/pub-sub patternsreferences/typescript-e2e/api/ — Supertest helpers, auth setup, MSW/Nock external API mockingCore principles enforced by all workflows:
// GIVEN:, // WHEN:, // THEN: comments with one action per WHEN--runInBand; parallel E2E causes Kafka/DB state collisions/tmp/e2e-${E2E_SESSION}-*.log; never dump raw output to consoleRead references/e2e-testing.md for a concise E2E strategy reference:
beforeEach, clean up in afterEachFor deep Playwright TypeScript guidance (page objects, auth reuse, CI, flaky test repair) see
references/e2e-playwright-ts.md.
Read references/performance-testing.md when writing or running load tests. Covers:
options.stages, thresholds, http.get, check, sleepsetup() for token exchange, data param for test functionhttp_req_duration, http_req_failed, http_reqs, percentile syntaxUse when the user asks for a "security audit", "CSO review", "comprehensive security audit", "infrastructure security scan", "secrets archaeology", "CI/CD security audit", "LLM security audit", "attack surface census", or "check for leaked credentials in git history". This is a full-spectrum 15-phase audit that produces a structured Security Posture Report with findings, exploit scenarios, severity ratings, and remediation plans.
This skill audits and reports — it never modifies code.
Two audit modes:
--comprehensive): 2/10 confidence gate — surfaces tentative findings; mark them TENTATIVE.Scope flags (mutually exclusive; combine --diff with any):
--infra — infrastructure only (Phases 0–6, 12–14)--code — code only (Phases 0–1, 7, 9–11, 12–14)--skills — AI skill supply chain only (Phases 0, 8, 12–14)--supply-chain — dependency audit only (Phases 0, 3, 12–14)--owasp — OWASP Top 10 only (Phases 0, 9, 12–14)--diff — restrict scanning to files changed on the current branch vs base--scope <domain> — focused audit on a specific domainIf multiple scope flags are passed, error immediately: "Scope flags are mutually exclusive. Pick one, or run without flags for a full audit."
Use the Grep tool for all code searches — the bash patterns in phase descriptions show WHAT to search for, not HOW. Never use raw bash grep.
Before hunting bugs, build an explicit mental model of the codebase.
Stack detection: Check package.json, pyproject.toml, go.mod, Cargo.toml, Gemfile, pom.xml, composer.json for language. Check framework imports for Next.js, Express, FastAPI, Django, Flask, Rails, Spring Boot, Gin, etc.
Soft gate: Stack detection determines scan PRIORITY, not SCOPE. After targeted scanning for detected stacks, run a catch-all pass with high-signal patterns (SQL injection, command injection, hardcoded secrets, SSRF) across ALL file types.
Mental model: Read CLAUDE.md, README, and key config files. Map components, connections, trust boundaries, and data flow. Identify where user input enters, where it exits, what transformations occur. Express as a brief architecture summary. This is a reasoning phase — the output is understanding, not findings.
Map what an attacker sees — code surface and infrastructure surface.
Code surface: Use Grep to find endpoints, auth boundaries, external integrations, file upload paths, admin routes, webhook handlers, background jobs, WebSocket channels. Count each category.
Infrastructure surface: Check for CI/CD workflow files (.github/workflows/, .gitlab-ci.yml), Dockerfiles, docker-compose.yml, IaC files (*.tf, *.tfvars, kustomization.yaml), .env files.
Output:
ATTACK SURFACE MAP
══════════════════
CODE SURFACE
Public endpoints: N (unauthenticated)
Authenticated: N (require login)
Admin-only: N (require elevated privileges)
File upload points: N
External integrations: N
Background jobs: N (async attack surface)
WebSocket channels: N
INFRASTRUCTURE SURFACE
CI/CD workflows: N
Webhook receivers: N
Container configs: N
IaC configs: N
Secret management: [env vars | KMS | vault | unknown]
Scan git history for leaked credentials, check tracked .env files, find CI configs with inline secrets.
Git history patterns to search:
AKIA (AWS access keys), sk- (OpenAI), ghp_/gho_/github_pat_ (GitHub tokens), xoxb-/xoxp-/xapp- (Slack tokens)password, secret, token, api_key in .env, .yml, .json, .conf files.env files tracked by git: Check whether any .env files (excluding .example/.sample/.template) are tracked. Verify .env is in .gitignore.
CI configs with inline secrets: Check whether workflow files reference secrets via ${{ secrets.* }} or secret stores. Inline values are findings.
Severity: CRITICAL for active secret patterns in git history (valid prefix + correct format). HIGH for .env tracked by git or CI configs with inline credentials. MEDIUM for suspicious .env.example values with real-looking data.
FP rules: Placeholders ("your_", "changeme", "TODO") excluded. Test fixtures excluded unless the same value appears in non-test code. Rotated secrets still flagged — they were exposed. .env.local in .gitignore is expected behavior.
Diff mode: Restrict to commits on the current branch only.
Goes beyond npm audit — checks actual supply chain risk.
Detection: Identify package manager from package.json, Gemfile, requirements.txt/pyproject.toml, Cargo.toml, or go.mod.
Vulnerability scan: Run the appropriate tool (npm audit, bundle audit, pip-audit, cargo audit, govulncheck). If not installed, note as "SKIPPED — tool not installed" and continue.
Install scripts in production deps (Node.js): Check production dependencies for preinstall, postinstall, or install scripts — a supply chain attack vector.
Lockfile integrity: Verify lockfiles exist AND are tracked by git.
Severity: CRITICAL for known CVEs (high/critical) in direct deps. HIGH for install scripts in prod deps or missing lockfile. MEDIUM for abandoned packages, medium CVEs, or lockfile not tracked.
FP rules: devDependency CVEs are MEDIUM max. node-gyp/cmake install scripts are expected (MEDIUM not HIGH). Missing lockfile for library repos (not apps) is NOT a finding.
GitHub Actions: For each workflow file, check for:
uses: lines missing @[40-char-sha]pull_request_target — dangerous when it also checks out PR code (fork PRs get write access)${{ github.event.* }} in run: stepsSeverity: CRITICAL for pull_request_target + checkout of PR code, or script injection via event body in run: steps. HIGH for unpinned third-party actions or secrets as unmasked env vars. MEDIUM for missing CODEOWNERS on workflow files.
FP rules: First-party actions/* unpinned = MEDIUM not HIGH. pull_request_target without PR ref checkout is safe. Secrets in with: blocks (not env:/run:) are handled by runtime.
Dockerfiles: Check for missing USER directive (runs as root), secrets passed as ARG, .env files copied into images, unexplained exposed ports.
Config files with prod credentials: Grep for database connection strings (postgres://, mysql://, mongodb://, redis://) with embedded credentials in committed config files — excluding localhost/127.0.0.1/test hosts.
IaC (Terraform/K8s): Check for "*" in IAM actions/resources, hardcoded secrets in .tf/.tfvars, privileged containers, hostNetwork, hostPID.
Severity: CRITICAL for prod DB URLs with credentials in committed config, "*" IAM on sensitive resources, or secrets baked into Docker images. HIGH for root containers in prod, staging with prod DB access, or privileged K8s. MEDIUM for missing USER directive or unexplained exposed ports.
FP rules: docker-compose.yml for local dev with localhost credentials is NOT a finding. Terraform "*" in read-only data sources excluded. K8s manifests in test//dev//local/ with localhost networking excluded.
Webhook routes: Grep for webhook/hook/callback route patterns. For each file, check whether it also contains signature verification (signature, hmac, verify, digest, x-hub-signature, stripe-signature, svix-signature). Files with webhook routes but NO verification are findings.
TLS verification disabled: Grep for verify.*false, VERIFY_NONE, InsecureSkipVerify, NODE_TLS_REJECT_UNAUTHORIZED.*0.
OAuth scope analysis: Check for overly broad OAuth scopes.
Verification approach: Trace handler code to determine if signature verification exists anywhere in the middleware chain. Do NOT make actual HTTP requests to webhook endpoints.
Severity: CRITICAL for webhooks with no signature verification. HIGH for TLS verification disabled in prod code or overly broad OAuth scopes. MEDIUM for undocumented outbound data flows to third parties.
FP rules: TLS disabled in test code excluded. Internal service-to-service webhooks on private networks = MEDIUM max. Webhook endpoints behind an API gateway that handles signature verification upstream are NOT findings — but require code evidence.
Use Grep to search for these patterns:
system: or prompt constructiondangerouslySetInnerHTML, v-html, innerHTML, .html(), raw() rendering LLM responses directlytool_choice, function_call, tools=, functions= without input validationsk- patterns, hardcoded key assignmentseval(), exec(), Function(), new Function processing AI responsesKey checks:
Severity: CRITICAL for user input in system prompts, unsanitized LLM output rendered as HTML, or eval of LLM output. HIGH for missing tool call validation or exposed AI API keys. MEDIUM for unbounded LLM calls or RAG without input validation.
FP rules: User content in the user-message position of an AI conversation is NOT prompt injection. Only flag when user content enters system prompts, tool schemas, or function-calling contexts.
Scan installed AI coding agent skill files for malicious patterns. Published skills can contain network exfiltration, credential access, or prompt injection.
Repo-local skills: Scan .claude/skills/ or equivalent for:
curl, wget, fetch, http, exfiltrat — network exfiltrationANTHROPIC_API_KEY, OPENAI_API_KEY, env., process.env — credential accessIGNORE PREVIOUS, system override, disregard, forget your instructions — prompt injectionGlobal skills: Ask the user for permission before scanning globally installed skills outside the repo (reads files outside the project scope).
Severity: CRITICAL for credential exfiltration attempts or prompt injection in skill files. HIGH for suspicious network calls or overly broad tool permissions. MEDIUM for skills from unverified sources without review.
Note: Skill/agent definition files are NOT documentation — they are executable prompt code that controls AI agent behavior. Never exclude Phase 8 findings under a "documentation files excluded" rule.
Targeted analysis for each category. Scope file extensions to detected stacks from Phase 0.
skip_before_action, skip_authorization, public), direct object reference patterns (params[:id], req.params.id), horizontal/vertical privilege escalationsystem(), exec(), spawn(), popen), template injection (eval(), html_safe, raw()), LLM prompt injection (see Phase 7)For each major component identified in Phase 0, evaluate:
COMPONENT: [Name]
Spoofing: Can an attacker impersonate a user/service?
Tampering: Can data be modified in transit/at rest?
Repudiation: Can actions be denied? Is there an audit trail?
Information Disclosure: Can sensitive data leak?
Denial of Service: Can the component be overwhelmed?
Elevation of Privilege: Can a user gain unauthorized access?
For a deeper AppSec threat model with abuse paths, attacker modeling, and Mermaid diagrams, see the Threat modeling section below.
Classify all data the application handles:
DATA CLASSIFICATION
═══════════════════
RESTRICTED (breach = legal liability):
- Passwords/credentials: [where stored, how protected]
- Payment data: [where stored, PCI compliance status]
- PII: [what types, where stored, retention policy]
CONFIDENTIAL (breach = business damage):
- API keys: [where stored, rotation policy]
- Business logic: [trade secrets in code?]
- User behavior data: [analytics, tracking]
INTERNAL (breach = embarrassment):
- System logs: [what they contain, who can access]
- Configuration: [what's exposed in error messages]
PUBLIC:
- Marketing content, documentation, public APIs
Before producing findings, apply these gates.
Confidence gates:
--comprehensive): 2/10 — flag anything that MIGHT be real; mark these TENTATIVE.Hard exclusions (automatically discard):
pull_request_target, script injection, secrets exposure) under this rule.*.md documentation files — EXCEPTION: skill/agent definition files are executable prompt code; Phase 8 findings MUST NOT be excluded under this rule.Dockerfile.dev or Dockerfile.local unless referenced in prod deploy configs.Precedents:
dangerouslySetInnerHTML, v-html).docker-compose.yml for local dev are NOT findings; in production Dockerfiles/K8s they ARE.pull_request_target without PR ref checkout is safe.Active Verification: For each finding that survives the confidence gate, attempt to PROVE it where safe:
pull_request_target actually checks out PR code.Mark each surviving finding as VERIFIED, UNVERIFIED, or TENTATIVE.
Variant analysis: When a finding is VERIFIED, grep the entire codebase for the same vulnerability pattern. Report variants as separate findings linked to the original: "Variant of Finding #N".
Parallel verification: Use the Agent tool to launch independent verification sub-tasks for each candidate finding. The verifier receives only the file path and line number (not the initial scan's reasoning) plus the FP filtering rules. Discard findings where the verifier scores below threshold.
Exploit scenario requirement: Every finding MUST include a concrete exploit scenario — a step-by-step attack path an attacker would follow. "This pattern is insecure" is not a finding.
Findings table:
SECURITY FINDINGS
═════════════════
# Sev Conf Status Category Finding Phase File:Line
─ ──── ──── ────── ──────── ─────── ───── ─────────
1 CRIT 9/10 VERIFIED Secrets AWS key in git history P2 .env:3
2 CRIT 9/10 VERIFIED CI/CD pull_request_target + checkout P4 .github/ci.yml:12
3 HIGH 8/10 VERIFIED Supply Chain postinstall in prod dep P3 package.json
4 HIGH 9/10 UNVERIFIED Integrations Webhook w/o signature verify P6 api/webhooks.ts:24
Per-finding format:
## Finding N: [Title] — [File:Line]
* **Severity:** CRITICAL | HIGH | MEDIUM
* **Confidence:** N/10
* **Status:** VERIFIED | UNVERIFIED | TENTATIVE
* **Phase:** N — [Phase Name]
* **Category:** [Secrets | Supply Chain | CI/CD | Infrastructure | Integrations | LLM Security | Skill Supply Chain | OWASP A01-A10]
* **Description:** [What's wrong]
* **Exploit scenario:** [Step-by-step attack path]
* **Impact:** [What an attacker gains]
* **Recommendation:** [Specific fix with example]
Incident Response Playbook (for leaked secrets):
git filter-repo or BFG Repo-CleanerProtection file check: If the project has no .gitleaks.toml or .secretlintrc, recommend creating one.
Write findings to security-reports/{date}-{HHMMSS}.json:
{
"version": "2.0.0",
"date": "ISO-8601-datetime",
"mode": "daily | comprehensive",
"scope": "full | infra | code | skills | supply-chain | owasp",
"diff_mode": false,
"phases_run": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
"attack_surface": {
"code": { "public_endpoints": 0, "authenticated": 0, "admin": 0, "uploads": 0, "integrations": 0, "background_jobs": 0, "websockets": 0 },
"infrastructure": { "ci_workflows": 0, "webhook_receivers": 0, "container_configs": 0, "iac_configs": 0, "secret_management": "unknown" }
},
"findings": [{
"id": 1,
"severity": "CRITICAL",
"confidence": 9,
"status": "VERIFIED",
"phase": 2,
"phase_name": "Secrets Archaeology",
"category": "Secrets",
"title": "...",
"file": "...",
"line": 0,
"description": "...",
"exploit_scenario": "...",
"impact": "...",
"recommendation": "..."
}],
"filter_stats": { "candidates_scanned": 0, "hard_exclusion_filtered": 0, "confidence_gate_filtered": 0, "reported": 0 },
"totals": { "critical": 0, "high": 0, "medium": 0, "tentative": 0 }
}
Add security-reports/ to .gitignore if not already present — security reports should stay local.
Disclaimer: This AI-assisted scan catches common vulnerability patterns — it is not comprehensive, not guaranteed, and not a replacement for a professional security audit. For production systems handling sensitive data, payments, or PII, engage a qualified security firm. Use this as a first pass to catch low-hanging fruit between professional audits — not as your only defense.
Read references/security-testing.md when adding security tests or hardening an API. Covers:
x-content-type-options, x-frame-options, strict-transport-securityalg: "none" and RS256→HS256 token forgery tests$ne, $regex, $where operator bypass testsUse when the user wants to run an active pentest against a staging or sandbox web application, confirm that identified vulnerabilities are actually exploitable with proof-of-concept evidence, or generate a security assessment report from confirmed exploits. Do not use against production systems or targets without explicit written authorization.
Read references/pentesting-shannon.md for:
npx @keygraph/shannon setup, start, logs, workspaces, stop)Use when performing authorized, hands-on offensive security testing without automated tooling, designing a pentest engagement, or reviewing an existing security assessment report. Requires written authorization from the system owner — unauthorized testing is illegal under the CFAA and equivalent laws.
Read references/attack-patterns.md for safe, non-destructive test payloads organized by vulnerability class:
Read references/owasp-top-10-checklist.md for per-category OWASP Top 10 (2021) coverage including test procedures, bad/good code examples, remediation steps, and CVSS scoring guidance across all 10 categories (A01–A10).
For automated OWASP checklist generation and regex-based source code scanning (13 built-in rules: SQL injection, XSS, command injection, hardcoded secrets, unsafe deserialization, SSRF, path traversal, JWT with hardcoded secret), run:
python <skill-dir>/scripts/vulnerability_scanner.py --target web --scope full [--code /path/to/repo]
For multi-ecosystem dependency auditing (npm, pip, Go, Ruby) against a built-in CVE database with risky pattern detection:
python <skill-dir>/scripts/dependency_auditor.py --file package.json [--severity high] [--json]
To generate a structured pentest report from a JSON findings file:
python <skill-dir>/scripts/pentest_report_generator.py findings.json [--format markdown]
# Use --sample to generate a template findings JSON
Read references/responsible-disclosure.md for:
Anti-patterns for manual pentesting:
Use when the user asks to scan, audit, or harden their Claude Code configuration (.claude/ directory), or when setting up a new project, onboarding to a repo with existing Claude Code configs, or running periodic security hygiene checks.
This is distinct from application security (security-testing.md, security-best-practices/) — it targets the AI coding assistant's own configuration surface: CLAUDE.md, settings.json, MCP servers, hooks, and agent definitions.
| File | Checks |
|------|--------|
| CLAUDE.md | Hardcoded secrets, auto-run instructions, prompt injection patterns |
| settings.json | Overly permissive allow lists, missing deny lists, dangerous bypass flags |
| mcp.json | Risky MCP servers, hardcoded env secrets, npx supply chain risks |
| hooks/ | Command injection via interpolation, data exfiltration, silent error suppression |
| agents/*.md | Unrestricted tool access, prompt injection surface, missing model specs |
# Check if installed
npx ecc-agentshield --version
# Install globally (recommended)
npm install -g ecc-agentshield
# Basic scan (current project)
npx ecc-agentshield scan
# Scan a specific path
npx ecc-agentshield scan --path /path/to/.claude
# Filter by minimum severity
npx ecc-agentshield scan --min-severity medium
# Output formats
npx ecc-agentshield scan --format json # CI/CD integration
npx ecc-agentshield scan --format markdown # documentation
npx ecc-agentshield scan --format html > security-report.html
# Apply auto-fixes (only fixes marked as auto-fixable)
npx ecc-agentshield scan --fix
# Deep analysis via adversarial three-agent pipeline (requires ANTHROPIC_API_KEY)
npx ecc-agentshield scan --opus --stream
The --fix flag replaces hardcoded secrets with env variable references and tightens wildcard permissions. It never modifies manual-only suggestions.
The --opus --stream mode runs three agents: Attacker (red team), Defender (blue team), and Auditor (synthesizes both).
npx ecc-agentshield init
Creates settings.json with scoped permissions and deny list, a CLAUDE.md with security best practices, and a mcp.json placeholder.
- uses: affaan-m/agentshield@v1
with:
path: '.'
min-severity: 'medium'
fail-on-findings: true
| Grade | Score | Meaning | |-------|-------|---------| | A | 90–100 | Secure configuration | | B | 75–89 | Minor issues | | C | 60–74 | Needs attention | | D | 40–59 | Significant risks | | F | 0–39 | Critical vulnerabilities |
Fix immediately (Critical):
Bash(*) in the allow list (unrestricted shell access)${file} interpolationFix before production (High):
CLAUDE.md (prompt injection vector)Recommended (Medium):
2>/dev/null, || true)PreToolUse security hooksnpx -y auto-install in MCP server configsAwareness (Info):
Use when the user asks to threat model a codebase or path, enumerate threats or abuse paths, or perform AppSec threat modeling. Do not trigger for general architecture summaries, code review, or non-security design work.
Read references/threat-modeling/prompt-template.md for:
intended_usage, deployment_model, data_sensitivity, internet_exposure, authn_authz_expectations)Read references/threat-modeling/threat-modeling-guide.md for:
Read references/threat-modeling/security-controls-and-assets.md for:
Workflow (8 steps):
<repo-or-dir-name>-threat-model.md.Risk prioritization guidance:
After producing a threat model, use
references/security-requirements.mdto translate findings into sprint-ready user stories, acceptance criteria, and traceability matrices.
Read references/security-requirements.md when translating threat analysis into actionable development artifacts. Covers:
SecurityRequirement, calculate_priority) as a reference implementation for automationUse this section when: a threat model exists (STRIDE, PASTA, or informal) and you need to derive sprint-ready requirements, user stories, acceptance criteria, or security test cases from it. This is distinct from security-testing.md (which covers verifying existing security behavior) and security-best-practices/ (which covers writing secure code).
Read references/security-architecture-patterns.md when designing a secure system from scratch, evaluating an existing architecture against Zero Trust or defense-in-depth principles, or choosing authentication and API security patterns. Covers:
Use alongside references/threat-modeling/ (identify threats first) and references/cryptography-implementation.md (choose the right algorithms).
Read references/cryptography-implementation.md when selecting cryptographic algorithms, implementing encryption/hashing, managing keys, or auditing cryptographic code for weaknesses. Covers:
Two Python scripts are bundled in scripts/ for automated threat analysis and secret detection:
| Script | Purpose | Key options |
|--------|---------|-------------|
| scripts/threat_modeler.py | STRIDE threat analysis with DREAD risk scoring; maps components to threat databases; human-readable and JSON output | --component "User Auth", --interactive, --json, --list-threats |
| scripts/secret_scanner.py | Detect hardcoded secrets and credentials across 24+ patterns (AWS, GitHub, JWT, Stripe, RSA, etc.); severity-classified findings | --path <dir>, --min-severity critical, --json, --list-patterns |
| scripts/security_scanner.py | Scan source files for 19+ vulnerability patterns: hardcoded secrets, SQL injection, XSS, command injection, path traversal; false-positive filtering for test/comment contexts | --path <dir>, --min-severity high, --json, --output report.json |
| scripts/vulnerability_assessor.py | Scan npm, Python, and Go dependency manifests for known CVEs; 0–100 risk scoring weighted by CVSS; compares installed versions against CVE database | --path <dir>, --severity critical, --json, --verbose |
| scripts/compliance_checker.py | Validate a codebase against SOC 2, PCI-DSS, HIPAA, and GDPR controls; framework-specific pattern matching across JS/TS/Python/Go/Java; scores 0–100 with category breakdown | --path <dir>, --framework soc2, --json, --verbose, --output report.json |
Run python scripts/threat_modeler.py --interactive for a guided threat modeling session, or python scripts/secret_scanner.py --path <repo-root> as part of a pre-commit or CI secret scan.
These scripts complement the CSO audit workflow (Phase 2 Secrets Archaeology, Phase 10 STRIDE Threat Model) — use them for automated first-pass analysis before applying expert judgment.
Use when the user asks to check compliance, validate against SOC 2 / PCI-DSS / HIPAA / GDPR, run a compliance audit, generate a compliance report, or identify compliance gaps.
Read references/compliance-requirements.md for:
Run python <skill-dir>/scripts/compliance_checker.py <repo-root> for automated compliance scoring:
# Scan against all frameworks (default)
python <skill-dir>/scripts/compliance_checker.py <repo-root>
# Target a specific framework
python <skill-dir>/scripts/compliance_checker.py <repo-root> --framework soc2
python <skill-dir>/scripts/compliance_checker.py <repo-root> --framework pci_dss
python <skill-dir>/scripts/compliance_checker.py <repo-root> --framework hipaa
python <skill-dir>/scripts/compliance_checker.py <repo-root> --framework gdpr
# Export results as JSON
python <skill-dir>/scripts/compliance_checker.py <repo-root> --json --output compliance-report.json
# Verbose mode (show which controls passed/failed)
python <skill-dir>/scripts/compliance_checker.py <repo-root> --verbose
Score categories: COMPLIANT (90–100), PARTIALLY_COMPLIANT (70–89), NON_COMPLIANT (50–69), CRITICAL_GAPS (<50). The checker scans .js, .ts, .py, .go, .java, .md, and config files. Use findings as a first-pass gap analysis — not a replacement for a formal audit.
The compliance checker complements Phase 11 (Data Classification) and Phase 9 (OWASP A01-A10) in the CSO audit workflow. After running it, use
references/security-requirements.mdto convert compliance gaps into sprint-ready user stories with PCI-DSS, HIPAA, GDPR, or OWASP ASVS control IDs.
Use when the user asks to assess dependency risk, triage a CVE, set patch priorities, or establish a vulnerability management process.
Read references/vulnerability-management.md for:
.github/workflows), manual review commands for Node.js (npm audit), Python (pip-audit), Go (govulncheck), and container images (trivy)Run python <skill-dir>/scripts/vulnerability_assessor.py <repo-root> for automated dependency CVE scanning:
# Scan all supported ecosystems (npm, Python, Go)
python <skill-dir>/scripts/vulnerability_assessor.py <repo-root>
# Filter by minimum severity
python <skill-dir>/scripts/vulnerability_assessor.py <repo-root> --severity critical
# Export JSON report
python <skill-dir>/scripts/vulnerability_assessor.py <repo-root> --json --output vuln-report.json
# Verbose output (show all CVE details)
python <skill-dir>/scripts/vulnerability_assessor.py <repo-root> --verbose
The assessor produces a 0–100 risk score weighted by CVSS severity. Use alongside Phase 3 (Dependency Supply Chain) in the CSO audit — the assessor provides CVSS-weighted scoring while Phase 3 focuses on supply chain risks (install scripts, lockfile integrity, abandoned packages).
Read references/security-standards.md for language-agnostic secure coding patterns, particularly when working in Python (Flask) or when no framework-specific security-best-practices/ file exists. Covers:
Prefer the framework-specific files in
references/security-best-practices/when a matching file exists (e.g.,python-fastapi-web-server-security.md). Usereferences/security-standards.mdfor framework-agnostic patterns or as a supplement.
Read the framework-specific files in references/security-best-practices/ when asked to write secure code, perform a security review, or audit a codebase for vulnerabilities. This is distinct from security-testing.md (which covers writing tests that verify security behavior) — this section is about coding securely and detecting vulnerabilities while working.
Step 1 — Identify all languages and frameworks in scope. Check both frontend and backend. Inspect package.json, pyproject.toml, framework imports, and directory layout.
Step 2 — Load matching reference files. Filename format: <language>-<framework>-<stack>-security.md. Also check <language>-general-<stack>-security.md for framework-agnostic guidance. For web apps with both frontend and backend, read refs for both layers.
| Stack | File |
|-------|------|
| General (any stack) | general-web-application-security.md |
| React (TypeScript) | javascript-typescript-react-web-frontend-security.md |
| Next.js | javascript-typescript-nextjs-web-server-security.md |
| Vue (TypeScript) | javascript-typescript-vue-web-frontend-security.md |
| jQuery | javascript-jquery-web-frontend-security.md |
| General JS frontend | javascript-general-web-frontend-security.md |
| Express | javascript-express-web-server-security.md |
| Django | python-django-web-server-security.md |
| FastAPI | python-fastapi-web-server-security.md |
| Flask | python-flask-web-server-security.md |
| Go (backend) | golang-general-backend-security.md |
| Cloud / Infrastructure | cloud-infrastructure-security.md |
Step 3 — If no matching file exists, apply known security best practices for that language/framework. If generating a report, tell the user that no concrete guidance file is available but still surface any clearly critical findings.
Operating modes:
security_best_practices_report.md (or a user-specified path). Format: short executive summary → findings by severity (CRITICAL → HIGH → MEDIUM → LOW) → each finding with a numeric ID, file path + line numbers, impact statement, and fix. After writing the report, summarize findings to the user.Fixes:
General security advice (applies to all languages):
secure on cookies only when TLS is confirmed to be active (use an env flag if needed).Overrides: If project docs or user instructions require bypassing a security best practice, comply without argument. You may note the override and suggest documenting it in the project so future readers understand why the best practice is not followed.
Structured step-by-step workflows for common security engineering tasks. Use these when you need a procedural checklist rather than the deep-dive references above.
Layer 1: PERIMETER — WAF, DDoS mitigation, DNS filtering, rate limiting
Layer 2: NETWORK — Segmentation, IDS/IPS, monitoring, VPN, mTLS
Layer 3: HOST — Endpoint protection, hardening, patching, logging
Layer 4: APPLICATION — Input validation, authentication, secure coding, SAST
Layer 5: DATA — Encryption at rest/transit, access controls, DLP, backup
| Use Case | Pattern | |----------|---------| | Web application | OAuth 2.0 + PKCE with OIDC | | API authentication | JWT (short expiry) + refresh tokens | | Service-to-service | mTLS with certificate rotation | | CLI / Automation | API keys with IP allowlisting | | High security | FIDO2/WebAuthn hardware keys |
See references/security-architecture-patterns.md for implementation patterns.
| Impact \ Exploitability | Easy | Moderate | Difficult | |-------------------------|------|----------|-----------| | Critical | Critical | Critical | High | | High | Critical | High | Medium | | Medium | High | Medium | Low | | Low | Medium | Low | Low |
For OWASP Top 10 guidance and automated scanning, see scripts/vulnerability_assessor.py and the CSO audit workflow above.
Security Code Review Checklist:
| Category | Check | Risk | |----------|-------|------| | Input Validation | All user input validated and sanitized | Injection | | Output Encoding | Context-appropriate encoding applied | XSS | | Authentication | Passwords hashed with Argon2/bcrypt | Credential theft | | Session | Secure cookie flags (HttpOnly, Secure, SameSite) | Session hijacking | | Authorization | Server-side permission checks on all endpoints | Privilege escalation | | SQL | Parameterized queries used exclusively | SQL injection | | File Access | Path traversal sequences rejected | Path traversal | | Secrets | No hardcoded credentials or keys | Information disclosure | | Dependencies | Known vulnerable packages updated | Supply chain | | Logging | Sensitive data not logged | Information disclosure |
Secure vs. insecure patterns quick reference:
| Pattern | Issue | Secure Alternative |
|---------|-------|-------------------|
| SQL string formatting | SQL injection | Parameterized queries |
| Shell command building | Command injection | subprocess with argument list, no shell |
| Path concatenation | Path traversal | Validate and canonicalize paths |
| MD5/SHA1 for passwords | Weak hashing | Argon2id or bcrypt |
| Math.random() for tokens | Predictable values | crypto.getRandomValues / secrets.token_hex |
See references/security-testing.md and references/security-best-practices/ for detailed patterns.
Incident severity and escalation:
| Level | Response Time | Escalation | |-------|---------------|------------| | P1 — Critical (active breach/exfiltration) | Immediate | CISO, Legal, Executive | | P2 — High (confirmed, contained) | 1 hour | Security Lead, IT Director | | P3 — Medium (potential, under investigation) | 4 hours | Security Team | | P4 — Low (suspicious, low impact) | 24 hours | On-call engineer |
IR phases checklist:
| Phase | Actions | |-------|---------| | Identification | Validate alert, assess scope, determine severity | | Containment | Isolate systems, preserve evidence, block access | | Eradication | Remove threat, patch vulnerabilities, reset credentials | | Recovery | Restore services, verify integrity, increase monitoring | | Lessons Learned | Document timeline, identify gaps, update procedures |
For leaked secrets specifically, see the Incident Response Playbook in the CSO audit section above (Phase 13).
Read references/test-reports.md when creating test plans or summarizing results. Covers:
Read references/qa-methodology.md for manual testing, quality advocacy, and QA planning. Covers:
Read references/automation-frameworks.md when designing or scaling a test automation framework. Covers:
data-testid → semantic → CSS)fullyParallel, workers, sharding across GitHub Actions matrixUserFactory.create() with Faker, seed via API in beforeEachMetricsReporter for sending test metrics to monitoringRead references/mastra-smoke-test.md when you need to:
create-mastra@<tag> and smoke test the Mastra StudioplannerNetwork with sub-agents + Memory) for Network Mode testingQuick decision: use this section when the task involves spinning up a fresh Mastra project and running a browser-driven smoke test checklist against the Studio at http://localhost:4111.
Read references/e2e-cypress.md when working with a Cypress-based E2E suite or adding Cypress to a project. Covers:
cypress.config.ts setup and configurationcy.login, cy.createUser, cy.dataCy)cy.intercept — stub responses, modify replies, assert on request payloads, simulate slow networkcy.sessionbeforeEach, clean up in afterEach.cy.ts)cy.pause(), screenshots)Use when running tests for the React codebase. Supports source, www, stable, and experimental channels.
Channel mapping:
| Channel | Command |
|---------|---------|
| (default / source) | yarn test --silent --no-watchman <pattern> |
| experimental | yarn test -r=experimental --silent --no-watchman <pattern> |
| stable | yarn test-stable --silent --no-watchman <pattern> |
| classic (legacy www) | yarn test-classic --silent --no-watchman <pattern> |
| www (modern, __VARIANT__=true) | yarn test-www --silent --no-watchman <pattern> |
| www variant false | yarn test-www --variant=false --silent --no-watchman <pattern> |
Hard rules:
--silent — limits output to failures only.--no-watchman — watchman causes failures in sandboxed environments.Common mistakes:
www variants when testing __VARIANT__ flags — test www AND www variant false.@gate pragma — see the React feature flags section below.Read references/react-feature-flags.md for:
ReactFeatureFlags.js and its channel forks)@gate pragma (test-level gating — skip when flag is off)gate() inline (assertion-level — both paths run, behavior differs)__VARIANT__ flags and testing both true/false variantsWhen adding new error messages to React or encountering "unknown error code" warnings:
yarn extract-errors.Read references/debugging.md for:
Use when asked to "debug this", "fix this bug", "why is this broken", "investigate this error", or "root cause analysis". Also trigger proactively when the user reports errors, unexpected behavior, or is troubleshooting why something stopped working.
Iron Law: no fixes without root cause investigation first. Fixing symptoms creates whack-a-mole debugging. Find the root cause, then fix it.
Gather context before forming any hypothesis.
git log --oneline -20 -- <affected-files>
Was this working before? A regression means the root cause is in the diff.Output: "Root cause hypothesis: ..." — a specific, testable claim about what is wrong and why.
After forming a hypothesis, identify the narrowest directory containing the affected files and restrict edits to it. Tell the user the debug scope. This prevents changes to unrelated code during the investigation.
Check if the bug matches a known pattern:
| Pattern | Signature | Where to look | |---------|-----------|---------------| | Race condition | Intermittent, timing-dependent | Concurrent access to shared state | | Null propagation | NoMethodError, TypeError | Missing guards on optional values | | State corruption | Inconsistent data, partial updates | Transactions, callbacks, hooks | | Integration failure | Timeout, unexpected response | External API calls, service boundaries | | Configuration drift | Works locally, fails in staging/prod | Env vars, feature flags, DB state | | Stale cache | Shows old data, fixes on cache clear | Redis, CDN, browser cache |
Also check git log for prior fixes in the same area — recurring bugs in the same files are an architectural smell, not a coincidence.
External search: If the bug doesn't match a known pattern, search for "{framework} {generic error type}". Sanitize first — strip hostnames, IPs, file paths, SQL fragments, customer data. Search the error category, not the raw message.
Before writing any fix, verify the hypothesis.
Red flags — slow down if you see these:
Once root cause is confirmed:
Reproduce the original bug scenario and confirm it's fixed. This is not optional.
Emit a structured debug report:
DEBUG REPORT
════════════════════════════════════════
Symptom: [what the user observed]
Root cause: [what was actually wrong]
Fix: [what was changed, with file:line references]
Evidence: [test output, reproduction attempt showing fix works]
Regression test: [file:line of the new test]
Related: [prior bugs in same area, architectural notes]
Status: DONE | DONE_WITH_CONCERNS | BLOCKED
════════════════════════════════════════
Status definitions:
Important rules:
Read references/ci-cd.md for:
Bootstrapping a CI pipeline from scratch: If the repo has no CI workflow yet, use the agentic-development skill's stack-detection + pipeline-generation scripts (stack_detector.py → pipeline_generator.py) to generate a baseline CI YAML (lint/test/build). Then return here to wire coverage thresholds, quality gates, and branch protection rules on top of that baseline.
Read references/suite-architecture.md for:
Use when you have lint errors, formatting issues, or need to pass CI before committing.
yarn prettier to fix formatting (formats changed files only — not all files).yarn linc to check for remaining lint issues (these will fail CI if not addressed).Common mistakes:
yarn prettier only formats changed files.linc errors — they are not warnings; they will block CI.For the full pre-commit workflow (format → lint → parallel type-check + tests), read references/verification.md.
Use when the repo uses a Makefile with make format, make lint, make typecheck, and make tests targets and you need to confirm a change is complete before claiming it is fixed, merged, or ready.
Quick start:
bash <skill-dir>/scripts/make-verify-run.shpowershell -ExecutionPolicy Bypass -File <skill-dir>/scripts/make-verify-run.ps1The scripts run make format first, then run make lint, make typecheck, and make tests in parallel with fail-fast semantics. They emit periodic heartbeat updates every 10 seconds while the parallel steps are still running (override with CODE_CHANGE_VERIFICATION_HEARTBEAT_SECONDS).
Manual workflow (when running steps individually):
make sync first to install dev requirements via uv.make format from the repository root.make lint, make typecheck, and make tests — you may parallelize these three, but stop all remaining steps as soon as any one fails.When to use:
Fail-fast semantics: when a parallel step fails, the scripts kill all sibling step process groups immediately and print the last 80 lines of the failing step's log to stderr. Fix the reported issue, then rerun.
Resources:
scripts/make-verify-run.sh — bash entry point (macOS/Linux); requires perl, python3, python, or uv for process group management.scripts/make-verify-run.ps1 — PowerShell entry point (Windows); uses Start-Process + taskkill /T for process tree cleanup.Use when the user asks to validate a release candidate, audit a diff before shipping, call a ship/block gate, or review what changed since the last release tag.
This is distinct from references/verification.md (which covers verifying that a single fix or feature is complete) — this section is about auditing the full BASE_TAG...TARGET diff for breaking changes, regressions, improvement opportunities, and release risks across an entire release boundary.
# 1. Sync tags and pick base
BASE_TAG="$(scripts/find_latest_release_tag.sh origin 'v*')"
# 2. Fetch latest tip of main
git fetch origin main --prune
TARGET="$(git rev-parse origin/main)"
# 3. Snapshot scope
git diff --stat "${BASE_TAG}"..."${TARGET}"
git diff --dirstat=files,0 "${BASE_TAG}"..."${TARGET}"
git log --oneline --reverse "${BASE_TAG}".."${TARGET}"
git diff --name-status "${BASE_TAG}"..."${TARGET}"
references/release-review-checklist.md to spot breaking changes, regressions, and improvement opportunities.'*.*.*').origin/main tip) has already passed CI unless the user says otherwise.--stat, --dirstat, and --name-status outputs to spot hot directories and file types.git diff --word-diff BASE...TARGET -- <path>.references/release-review-checklist.md (breaking changes, regression clues, improvement opportunities).Evidence, Impact, and Action.BLOCKED release call when they are already covered by an appropriate version bump and migration/upgrade notes; only block when the bump is missing/mismatched (e.g., patch bump for a breaking change) or when the change introduces unresolved risk.BASE_TAG and TARGET explicitly.BLOCKED.BLOCKED):
| Level | Meaning | |-------|---------| | 🟢 LOW | Low blast radius or clearly covered behavior; no release gate impact | | 🟡 MODERATE | Plausible user-facing regression signal; needs validation but not a confirmed blocker | | 🔴 HIGH | Confirmed or strongly evidenced release-blocking issue |
### Release readiness review (<tag> -> TARGET <ref>)
### Diff
https://github.com/<owner>/<repo>/compare/<tag>...<target-commit>
### Release call:
**<🟢 GREEN LIGHT TO SHIP | 🔴 BLOCKED>** <one-line rationale>
### Scope summary:
- <N files changed (+A/-D); key areas touched: ...>
### Risk assessment (ordered by impact):
1) **<Finding title>**
- Risk: **<🟢 LOW | 🟡 MODERATE | 🔴 HIGH>**. <Impact in one sentence.>
- Evidence: <specific diff/test/commit signal>
- Files: <path(s)>
- Action: <concrete command/task with pass criteria>
### Unblock checklist (required when BLOCKED):
1. [ ] <concrete check/fix>
- Exit criteria: <what must be true to unblock>
### Notes:
- <working tree status, tag/target assumptions, or re-run guidance>
If the gate is not blocked, omit the Unblock checklist section. Every finding must have Evidence, Impact, and Action.
Read references/release-review-checklist.md for the full set of signals: breaking change indicators, regression risk clues, improvement opportunities, and the gate decision matrix.
Resources:
scripts/find_latest_release_tag.sh — fetches remote tags and returns the newest match for a pattern (default v*).references/release-review-checklist.md — detailed signals and commands for spotting breaking changes, regressions, and release polish gaps.Read references/verification.md before saying something is fixed, asking for merge, or treating a release as ready. It includes:
Read references/testing-anti-patterns.md when writing mocks, reviewing test code, or adding test utilities. Covers: testing mock behavior instead of real behavior, test-only methods in production classes, mocking without understanding dependency chains, incomplete mocks, and tests written as afterthoughts.
Read references/anti-patterns.md for fast smell detection across review, testing, debugging, and CI.
When creating test plans or QA reports, provide:
references/test-reports.mdscripts/qa-scan.py: detect stack, runners, CI providers, and likely QA commands.scripts/qa-check.sh: run lint, type, and test commands across common Python, JS, Ruby, and Go repos.scripts/coverage-report.sh: run coverage with configurable thresholds across common runners.scripts/with_server.py: start one or more local servers, wait for readiness, run a command, then shut down. Use for Python Playwright-based webapp testing.scripts/run-e2e-with-report.sh: run TypeScript Playwright E2E tests and generate an HTML report. Supports --project, --output-dir, and --headed flags.scripts/find_latest_release_tag.sh: fetch remote tags and return the newest tag matching a pattern (default v*). Used for release readiness reviews to determine BASE_TAG.Testing philosophy quick-reference — if the repo uses the integration-first, entry-point-coverage philosophy, always read
references/testing-principles.mdfirst. Its principles (prefer integration over unit, minimize edge cases, regression for every bug, cover every entry point, mock externals with sanitized fixtures) take precedence over the general QA defaults in this skill.
agentic-development when repo orientation, architecture choice, or the code-change path itself is the bottleneck — or when a new CI pipeline needs to be generated from stack signals (stack_detector.py + pipeline_generator.py).gh-fix-ci when GitHub Actions failures need log retrieval and implementation.Do not stop on "likely fixed". Stop on reproduced failure, root-cause explanation, regression protection, fresh verification output, and a clear statement of residual risk if verification is partial.
development
Use for frontend engineering work such as components, routes, state management, accessibility, performance, design-system integration, and browser-facing debugging or refactors.
development
This skill should be used when the user asks to write, update, review, scaffold, move, remove, or continuously improve documentation for code, folders, services, repos, workflows, architectural decisions, or operational processes. Trigger for inline docs, `README.md`, `ARCHITECTURE.md`, `TESTS.md`, `SETUP.md`, `RUNBOOK.md`, `CHANGELOG.md`, `SECURITY.md`, `OVERVIEW.md`, `FAQ.md`, `DECISIONS.md`, `DEPENDENCIES.md`, `AGENTS.md`, `PLAN.md`, `SPEC.md`, `SOUL.md`, `PRINCIPLES.md`, `DESIGN.md`, `logs/`, `lessons/`, `items/`, `fixes/`, `audits/`, `raw/`, `plans/`, `specs/`, `sources/`, `lib/`, `references/`, `cookbook/`, `knowledge/`, `runbooks/`, `research/`, `official-documentation/`, `context/`, MDX docs, JSDoc/TSDoc, docstrings, ADRs, post-mortems, migration guides, documentation cleanups, and documentation-impact reviews.
tools
Cross-cloud CLI-first cloud operations for AWS, Azure, and GCP. Use when the assistant needs to identify which cloud provider or multi-cloud estate a repo uses, deploy new resources or services, wire automatic deployments, inventory and optimize infrastructure, or diagnose and repair cloud failures entirely from the terminal, with explicit approval gates for high-cost, destructive, identity-sensitive, or hard-to-reverse changes. Covers AWS Amplify full-stack projects, serverless workloads (Lambda, API Gateway, Step Functions, SAM, CDK), and the full AWS database portfolio (RDS, Aurora, Aurora DSQL, DynamoDB, ElastiCache), as well as deep Azure references for diagnostics, storage, compute, compliance, identity, Foundry, and cross-cloud migrations.
development
Use for backend engineering work such as APIs, services, data models, persistence, queues, caching, auth, background jobs, and server-side debugging or refactors.