plugins/test-evaluator/skills/test-evaluator/SKILL.md
Evaluate the quality and efficacy of existing tests by reviewing test code against source code. Use when the user asks to review tests, validate test quality, audit test suites, check test efficacy, or assess whether tests are testing things properly. Prioritizes real interactions over mocking and simulation.
npx skillsauth add mattermost/mattermost-ai-marketplace test-quality-reviewerInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Review existing tests for efficacy, correctness, and coverage gaps. The goal is not to run tests — it is to evaluate whether the tests are actually testing what they claim to test, and whether they do so in a meaningful way.
A good test exercises real behavior. A bad test creates an elaborate illusion of safety. When evaluating tests, apply these principles in priority order:
When the user points you at tests to review, follow these steps:
Read all test files the user specifies. For each test case, note:
Identify the production code that the tests target. Read it thoroughly. You need to understand the real behavior to judge whether the tests cover it.
For every test case, assess:
| Criterion | What to look for | |-----------|-----------------| | Efficacy | Does the test exercise real behavior or just verify mock wiring? | | Assertion quality | Are assertions on meaningful outcomes or implementation details? | | Mock abuse | Is anything mocked that could reasonably be used directly? Are internal interfaces mocked? | | Brittleness | Will the test break on harmless refactors but survive actual bugs? | | Clarity | Can you tell what breaks if this test fails? | | Redundancy | Does this test duplicate coverage from another test without adding value? |
Compare the source code against the full test suite. Look for:
Output the report using the format defined below.
Produce a markdown table with one row per test case:
| Test Name | What It Tests | Verdict | Issues / Suggestions |
|-----------|--------------|---------|---------------------|
| TestCreateUser | Inserts a user row and verifies DB state | Good | — |
| TestHandleWebhook | Mocks the HTTP client and checks mock was called | Remove or rewrite | Mocks the entire HTTP layer; never validates actual webhook processing. Rewrite to send a real HTTP request to a test server. |
| TestParseConfig | Parses valid and invalid YAML | Good, improve | Add cases for missing required fields and malformed YAML. |
Verdict values:
After the table, add a section listing recommended new tests:
## Coverage Gaps
### Missing test coverage
| Source Code Area | What Needs Testing | Suggested Approach |
|-----------------|-------------------|-------------------|
| `server/api.go:HandleDelete` | No tests for the delete endpoint | Integration test: call the endpoint with a real HTTP request, verify the resource is removed from the DB |
| `webapp/src/components/Modal.tsx` | Modal close behavior untested | E2E test: open modal, click close button, assert modal is no longer in DOM |
Apply these heuristics based on what you observe in the test code:
httptest.NewServer), use real database connections (even SQLite in-memory), call exported functions directly.io.Reader, pass a strings.NewReader — don't create a MockReader.page.route() and only verify mocked responses rendered. The point of E2E is end-to-end.db.Insert was called. That tests nothing about whether the insert works or the data is correct.Flag any of these immediately:
| Anti-Pattern | Why It's Bad | What To Recommend Instead |
|-------------|-------------|--------------------------|
| Mocking your own interfaces | Tests the mock, not the code | Use the real implementation or a lightweight fake (e.g. in-memory DB) |
| assert(mockFn).toHaveBeenCalledWith(...) as the only assertion | Verifies wiring, not behavior | Assert on the actual outcome (DB state, HTTP response, DOM output) |
| page.route('**/*', ...) in E2E tests | Defeats the purpose of E2E | Let real requests flow; mock only true external services if needed |
| Snapshot tests as sole coverage | Snapshots are approvals, not assertions | Add explicit assertions for key behaviors; use snapshots only for regression on stable output |
| Testing private/internal methods directly | Couples tests to implementation | Test through the public API |
| Giant setup / teardown with no assertions | Complexity without value | Simplify setup; each test should have clear, specific assertions |
| Tests that pass when the feature is deleted | Zero coupling to real behavior | Rewrite to depend on actual feature code |
tools
Analyze a GitHub pull request for risk level and generate concrete QA recommendations. Accepts a PR URL or "owner/repo#number" reference. Uses `gh` CLI to fetch the diff and metadata, computes blast radius, scores six risk dimensions, and returns a structured JSON risk assessment. Use when the user invokes /qa-analysis:qa-analysis with a GitHub PR URL or reference, or asks for a PR risk assessment, QA recommendations, or "what should I test?" for a given pull request.
tools
Add an MCP (Model Context Protocol) server to a Mattermost plugin so the Agents plugin can call its tools. Use when implementing cross-plugin MCP, exposing AI tools from a Mattermost plugin to the Agents plugin, or wiring up the `pluginmcp` helper from mattermost-plugin-agents.
tools
Create a new Mattermost plugin from the starter template in the current directory. Use when creating a new plugin from scratch, scaffolding a Mattermost plugin, or bootstrapping a plugin project.
development
Orchestrates test-driven fixes for Mattermost security tickets (Jira/Atlassian) with a Staff Security Engineer mindset: failing secure-behavior tests first, then implementation, then security review and edge-case loops, then opening a non-draft PR that follows `.github/PULL_REQUEST_TEMPLATE.md` when present, with a vague public description (no exploit detail). Use when the user invokes /security-fix:security-fix with a mattermost.atlassian.net browse URL, MM-* security work, backend permission or authorization bugs, or asks for this security TDD workflow.