skills/cmd-pr-test-plan/SKILL.md
Generate manual test plans for PR changes — focused on hands-on verification a developer would do, not unit-test edge cases
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

`npx skillsauth add olshansk/agent-skills cmd-pr-test-plan`
Generate a manual test plan for the changes in the current branch. The plan should focus on what a developer/reviewer needs to manually verify — real user flows, integration behavior, and observable outcomes. Leave input validation, error branches, and edge cases to unit tests.
Default (no scope specified): diff the current branch against the repo's base branch.
Detect the base branch in order — stop at the first success:
1. `gh repo view --json defaultBranchRef -q '.defaultBranchRef.name' 2>/dev/null`
2. `git remote show origin 2>/dev/null | grep "HEAD branch" | cut -d: -f2 | xargs`
3. `git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@'`

Do not assume main or master. If all methods fail, ask the user.
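The detection order above can be wrapped in one function. This is a minimal sketch, not the skill's own implementation: `detect_base_branch` is a hypothetical name, and the guard against a parenthesised placeholder from `git remote show` is an added assumption.

```shell
# Print the repo's default branch, trying each detection method in order.
# detect_base_branch is a hypothetical helper name, not part of git or gh.
detect_base_branch() (
  # 1. GitHub CLI (comes back empty if gh is missing or the repo is not on GitHub)
  branch="$(gh repo view --json defaultBranchRef -q '.defaultBranchRef.name' 2>/dev/null || true)"

  # 2. Ask the remote directly
  if [ -z "$branch" ]; then
    branch="$(git remote show origin 2>/dev/null | grep "HEAD branch" | cut -d: -f2 | xargs || true)"
    # Assumption: git can report a placeholder such as "(unknown)" here; treat it as a miss.
    case "$branch" in "("*) branch="" ;; esac
  fi

  # 3. Local symbolic ref
  if [ -z "$branch" ]; then
    branch="$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@' || true)"
  fi

  # Every method came back empty: return non-zero so the caller can ask the user.
  [ -n "$branch" ] || return 1
  printf '%s\n' "$branch"
)
```

A caller might write `BASE_BRANCH="$(detect_base_branch)" || echo "Could not detect the base branch; please specify it."`.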
Once resolved, run:
git diff <base>...HEAD -- ":(exclude)*.lock" ":(exclude)package-lock.json" ":(exclude)pnpm-lock.yaml" ":(exclude)package.json"
If the user specifies a scope, use the corresponding command instead:
| Scope | Command | What it covers |
|---|---|---|
| unstaged | git diff HEAD -- <excludes> | All uncommitted changes (staged + unstaged) |
| last commit / last 1 commit | git diff HEAD~1...HEAD -- <excludes> | Changes in the most recent commit |
| last N commits | git diff HEAD~N...HEAD -- <excludes> | Changes in the last N commits |
| entire repo | git ls-files \| grep -vE "\.(lock\|snap)$\|package-lock\.json\|pnpm-lock\.yaml" | All tracked source files; no diff — generate a full application test plan covering all major flows and integration points |
For all diff commands, apply: -- ":(exclude)*.lock" ":(exclude)package-lock.json" ":(exclude)pnpm-lock.yaml" ":(exclude)package.json"
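The scope table can be read as a mapping from a scope phrase to a revision argument. The sketch below assumes two hypothetical helpers, `scope_range` and `scoped_diff`; the "entire repo" scope is deliberately rejected because it lists files instead of diffing.

```shell
# scope_range is a hypothetical helper: it maps a scope phrase from the table
# to the revision argument passed to `git diff`.
# $1 = scope phrase ("" for the default scope), $2 = base branch (default scope only)
scope_range() (
  scope="$1"
  base="${2:-}"
  case "$scope" in
    "")
      [ -n "$base" ] || return 1          # default scope needs a resolved base branch
      printf '%s...HEAD\n' "$base" ;;
    unstaged)
      printf 'HEAD\n' ;;
    "last commit")
      printf 'HEAD~1...HEAD\n' ;;
    "last "*" commit"|"last "*" commits")
      n="${scope#last }"                  # "3 commits"
      n="${n% commit*}"                   # "3"
      case "$n" in ""|*[!0-9]*) return 1 ;; esac
      printf 'HEAD~%s...HEAD\n' "$n" ;;
    *)
      return 1 ;;                         # includes "entire repo", which has no diff
  esac
)

# scoped_diff (also hypothetical) runs the diff with the lockfile excludes applied.
scoped_diff() (
  range="$(scope_range "$1" "${2:-}")" || return 1
  git diff "$range" -- ":(exclude)*.lock" ":(exclude)package-lock.json" \
    ":(exclude)pnpm-lock.yaml" ":(exclude)package.json"
)
```

For example, `scoped_diff "last 3 commits"` diffs `HEAD~3...HEAD`, and `scoped_diff "" "$BASE_BRANCH"` covers the default branch scope.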
Use Determine Scope above to resolve the scope and get the diff (or file list for entire repo).
Run all of these and capture the results:
git diff $BASE_BRANCH...HEAD --name-only
git diff $BASE_BRANCH...HEAD --stat
git log $BASE_BRANCH..HEAD --oneline
For non-branch scopes (unstaged, last N commits), adapt the commands accordingly — replace $BASE_BRANCH...HEAD with the scoped range (e.g. HEAD~N...HEAD), and replace git log with git log HEAD~N..HEAD --oneline.
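Note that the diff commands use a three-dot range while `git log` uses two dots. A small sketch of adapting one range to all three commands follows; `log_range` and `gather_context` are hypothetical helper names.

```shell
# log_range is a hypothetical helper: it turns a three-dot diff range (A...B)
# into the two-dot range (A..B) used by git log.
log_range() (
  case "$1" in
    *...*) printf '%s..%s\n' "${1%%...*}" "${1#*...}" ;;
    *)     return 1 ;;   # e.g. plain HEAD for the unstaged scope: no commits to list
  esac
)

# gather_context (also hypothetical) runs the three context commands for one range.
gather_context() (
  range="$1"
  git diff "$range" --name-only
  git diff "$range" --stat
  if commits="$(log_range "$range")"; then
    git log "$commits" --oneline
  fi
)
```

Calling `gather_context "HEAD~3...HEAD"` would run the two diffs on that range and `git log HEAD~3..HEAD --oneline`.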
Check what's available in the project so you can reference real commands (not generic guesses):
- Makefile targets: `make help 2>/dev/null || grep -E '^[a-z_-]+:.*##' Makefile makefiles/*.mk 2>/dev/null`
- Package manager: pyproject.toml (uv/pip), package.json (npm/pnpm), Cargo.toml (cargo), go.mod (go)
- Test framework: pytest.ini, pyproject.toml [tool.pytest], jest.config.*, .mocharc.*
- Project docs: AGENTS.md, CLAUDE.md, CONTRIBUTING.md, or README.md for project-specific test/build instructions
- CI config: .github/workflows/, Makefile, or Taskfile.yml for existing test commands

Prefer project Makefile targets and documented commands over raw tool invocations. If the project has make test_unit, use that instead of uv run pytest tests/unit/.
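The manifest check described above could be scripted roughly as follows. `detect_tooling` is a hypothetical helper, and the file-to-tool pairs are taken only from the manifests named in the text.

```shell
# detect_tooling is a hypothetical helper: it prints one line per build or
# package manifest found in the given directory (default: current directory).
detect_tooling() (
  dir="${1:-.}"
  for pair in "Makefile:make" "pyproject.toml:uv/pip" "package.json:npm/pnpm" \
              "Cargo.toml:cargo" "go.mod:go"; do
    file="${pair%%:*}"    # text before the first colon
    tool="${pair#*:}"     # text after the first colon
    if [ -f "$dir/$file" ]; then
      printf '%s (%s)\n' "$tool" "$file"
    fi
  done
)
```

If `make` shows up in the output, that is the cue to look for documented targets before falling back to raw tool invocations.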
Group changed files into categories. Common categories (adapt based on actual changes):
Present the detected categories to the user with a summary of what changed in each. Ask them to confirm or adjust before generating the full plan.
Example confirmation format:
I found 3 change areas in this branch:
1. CLI agent mode -- new --agent flag on setup command (cli/commands/setup.py, cli/cli.py)
2. Skills restructuring -- SKILL.md rewrite, new reference docs, deleted shell scripts
3. Test fixes -- E2E test stability improvements (4 test files)
Should I generate the test plan for all 3, or would you like to adjust?
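A first-pass grouping can be done by path pattern before refining it by hand. The categories and patterns below are illustrative assumptions only, and `categorize_file` and `categorize_stdin` are hypothetical helpers; real categories should come from the actual changes.

```shell
# categorize_file is a hypothetical helper with illustrative categories;
# adapt the patterns to the project at hand.
categorize_file() (
  case "$1" in
    tests/*|*/tests/*|*_test.*|*.test.*) echo "tests" ;;
    *.md|docs/*)                          echo "docs" ;;
    .github/*|Makefile|*.mk)              echo "ci-build" ;;
    *)                                    echo "source" ;;
  esac
)

# Read changed paths on stdin and print "category<TAB>path", sorted by category.
categorize_stdin() (
  while IFS= read -r path; do
    printf '%s\t%s\n' "$(categorize_file "$path")" "$path"
  done | sort
)
```

Typical use would be `git diff "$BASE_BRANCH...HEAD" --name-only | categorize_stdin`, followed by a manual pass to merge or rename groups before asking the user to confirm.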
For each confirmed category, generate a test section following these rules:
Tag every test step with one of these emojis in the step title:
| Emoji | Meaning | When to use |
|-------|---------|-------------|
| 🔴 | Critical | Core functionality; if this fails, the feature is broken |
| 🟢 | Expected | Standard behavior that should work; moderate confidence but worth verifying |
| 🔵 | Nice-to-have | Polish, UX, non-blocking; skip if short on time |
Example: 1a. 🔴 **Pre-register a profile end-to-end**
---) between them

`GROVE_API_URL=http://localhost:8000 make test_e2e_suite`

DO include (things you must verify manually):
DO NOT include — leave these for unit tests:
For any changed file that has multi-phase logic (e.g., validate → external I/O → DB write), add a "Failure-path" column to the test coverage table:
| File Changed | Unit | Integration | E2E | Failure-path |
|------------------------|------|-------------|-----|--------------|
| routes/account.py | ❌ | ❌ | ✅ | ❌ |
| db/display_names.py | ✅ | ❌ | ❌ | ⏭️ |
The "Failure-path" column asks: does at least one test inject a failure into a non-first phase of multi-phase work?
- For a route that runs verify token → DB read → external API call → DB write, a failure-path test mocks the external API to raise and asserts the route logs context and returns a typed error (not a silent 500).
- Use ⏭️ only for files with no exception-bearing branches (pure utilities, type definitions).
- ❌ is the default until a test exists.

Always end with a "Quick Smoke Test" section -- the 2-3 commands a reviewer would run if they only have 60 seconds. Tag each with the appropriate emoji.
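To make the failure-path column concrete, here is a small shell analogue of the four-phase flow described above. Everything in it is hypothetical: the phase stubs, the `update_account` function, and the exit codes that stand in for typed errors. A real project would write this test in its own test framework.

```shell
# Hypothetical four-phase operation. Each phase is a stub standing in for real work.
verify_token() { return 0; }
db_read()      { return 0; }
call_api()     { return 0; }
db_write()     { echo "written" >> "$STATE_FILE"; }

# Run the phases in order. Each failure logs which phase broke and returns a
# distinct exit code (the shell analogue of a typed error).
update_account() {
  verify_token || { echo "update_account: verify_token failed" >&2; return 10; }
  db_read      || { echo "update_account: db_read failed" >&2; return 11; }
  call_api     || { echo "update_account: call_api failed" >&2; return 12; }
  db_write     || { echo "update_account: db_write failed" >&2; return 13; }
}

# Failure-path test: inject a failure into the third phase, then check that the
# error names that phase and that the later DB write never ran.
# The body runs in a subshell so the failing stub does not leak out.
failure_path_test() (
  STATE_FILE="$(mktemp)"
  call_api() { return 1; }                 # stub the external call so it fails
  status=0
  message="$(update_account 2>&1)" || status=$?
  [ "$status" -eq 12 ] || return 1
  [ "$message" = "update_account: call_api failed" ] || return 1
  [ ! -s "$STATE_FILE" ] || return 1       # nothing was written
)
```

A file whose tests include something like `failure_path_test` would earn ✅ in the Failure-path column; a file with only the happy path stays ❌.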
Write the plan to TEST_PLAN.md in the repo root.

Terminal summary format:
Wrote TEST_PLAN.md with 4 sections:
1. CLI Agent Mode -- 4 test steps (🔴×2, 🟢×1, 🔵×1)
2. Skills Restructuring -- 3 test steps (🔴×1, 🟢×2)
3. Automated Tests -- 2 test steps (🟢×2)
4. Quick Smoke Test -- 3 commands
Run `cat TEST_PLAN.md` to view the full plan.
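The per-section counts in the summary can be derived from the written plan. The sketch below assumes a hypothetical `tag_summary` helper and assumes each tag emoji appears only in step titles, since it counts lines containing the emoji.

```shell
# tag_summary is a hypothetical helper: it counts lines containing each
# priority emoji and prints them in the "🔴×2, 🟢×1" style used above.
# Assumption: tags appear only in step titles, one per line.
tag_summary() (
  file="$1"
  out=""
  for tag in "🔴" "🟢" "🔵"; do
    # grep -c exits non-zero on a zero count, so keep the pipeline from failing
    n="$(grep -cF -- "$tag" "$file" || true)"
    if [ "${n:-0}" -gt 0 ]; then
      out="${out:+$out, }${tag}×${n}"
    fi
  done
  printf '%s\n' "$out"
)
```

Running it per section (for example on a section extracted to a temporary file) gives the parenthesised counts for each summary line.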
Follow the same style used in cmd-pr-description: