skills/gh-actions-wisdom/SKILL.md
GitHub Actions workflow best practices and pitfalls reference. Use when: (1) Writing or reviewing .yml workflows, (2) Setting up CI/CD pipelines, (3) Debugging slow, expensive, or stuck workflow runs, (4) User says 'gh actions', 'github actions', 'workflow best practices', (5) Before creating or modifying any .github/workflows/ file. Keywords: GitHub Actions, CI/CD, workflow, timeout, concurrency, security, caching.
Reference best practices before writing or reviewing any GitHub Actions workflow.
Load topic-specific references as needed from references/.
Several rules below depend on whether your jobs run on ephemeral cloud runners (GitHub-hosted ubuntu-latest, RunsOn, BuildJet, Namespace, etc. — fresh VM per job, wiped between runs) or persistent self-hosted runners (long-lived machines with state that carries across runs). Advice that is correct in one context can be a hard-to-debug bug in the other.
| Concern | Ephemeral cloud runners | Persistent self-hosted runners |
| ---------------------------------- | ---------------------------------------------------- | ----------------------------------------------------------- |
| actions/cache for build tools | Use it — disk is wiped between runs | Avoid — local disk is already the cache |
| set-safe-directory: false | Don't set — containers need the default | Set it — avoids ~/.gitconfig pollution |
| Manual workspace cleanup steps | Not needed — fresh VM each run | Often needed — workspace persists |
| chown workspace at job end | Not needed — VM is destroyed | Sometimes needed for next-run access |
| detect-runner fallback pattern | Obsolete — the cloud runner IS the runner | Useful when mixing self-hosted + GitHub-hosted |
Migration warning. When moving a workflow from self-hosted to ephemeral (or vice versa), audit every step and option that was added "for the runner". Leftover self-hosted-isms on a cloud runner produce mysterious failures: pnpm: command not found (no setup step because pnpm was preinstalled), Cache not found between jobs (cache backend differs), fatal: detected dubious ownership (because set-safe-directory: false is now actively wrong), etc. Specific rules below are gated by runner context where it matters.
### 1. Always set timeout-minutes
The default timeout is 360 minutes (6 hours), so a stuck job can silently burn runner minutes for hours before GitHub kills it.
```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 15 # ALWAYS set this
```
Recommended values:

| Job type         | timeout-minutes |
| ---------------- | --------------- |
| Lint / typecheck | 5-10            |
| Unit tests       | 10-15           |
| Build            | 15-30           |
| E2E tests        | 30-60           |
| Docker build     | 15-30           |
| Deploy           | 10-15           |
| Notification     | 5               |
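To show how the table maps onto a real workflow, here is a minimal sketch. The job names and make targets are hypothetical placeholders. It also uses step-level timeout-minutes, which GitHub Actions supports in addition to the job-level setting, for the case where one long step needs a tighter bound than the whole job.

```yaml
# Hypothetical workflow: job names and make targets are placeholders.
jobs:
  lint:
    runs-on: ubuntu-latest
    timeout-minutes: 10 # lint / typecheck: 5-10
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      - run: make lint
  e2e:
    runs-on: ubuntu-latest
    timeout-minutes: 45 # E2E tests: 30-60
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      - name: Run E2E suite
        timeout-minutes: 30 # a single step can have its own, tighter limit
        run: make e2e
```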
### 2. Use concurrency groups
Prevent redundant runs and protect production deploys.
```yaml
# PR checks: cancel previous runs on new push
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```

```yaml
# Production deploy: never cancel in-progress
concurrency:
  group: deploy-production
  cancel-in-progress: false
```
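Both patterns can live in one workflow. The sketch below assumes a single workflow that handles PRs and deploys pushes to main; the deploy step itself is a placeholder. cancel-in-progress accepts an expression, so cancellation can be limited to pull_request events. The deploy job then gets its own job-level group that is never cancelled.

```yaml
# Workflow-level group: newer pushes cancel older runs, but only for PRs.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}

jobs:
  deploy:
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    timeout-minutes: 15
    # Job-level group: deploys queue behind each other and are never cancelled mid-flight.
    concurrency:
      group: deploy-production
      cancel-in-progress: false
    steps:
      - run: echo "deploy here" # placeholder for the real deploy command
```

GitHub keeps at most one pending run per concurrency group. Even with cancel-in-progress: false, a queued (not yet started) deploy can therefore be superseded by a newer one. A deploy that is already running is left alone.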
### 3. Declare permissions explicitly
Never rely on the default GITHUB_TOKEN permissions. Declare them explicitly per workflow or per job.
```yaml
permissions:
  contents: read

jobs:
  deploy:
    permissions:
      contents: read
      deployments: write
```
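A stricter variant starts from zero. Setting permissions: {} at the workflow level removes every GITHUB_TOKEN scope, so each job has to opt in to exactly what it needs. This is a sketch under stated assumptions: the release job assumes the workflow is triggered by pushing a v* tag, and it uses the gh CLI that ships on GitHub-hosted runners.

```yaml
permissions: {} # no GITHUB_TOKEN scopes unless a job asks for them

jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    permissions:
      contents: read # enough for actions/checkout
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
  release:
    if: startsWith(github.ref, 'refs/tags/v')
    runs-on: ubuntu-latest
    timeout-minutes: 10
    permissions:
      contents: write # creating a GitHub Release needs write access to contents
    steps:
      - run: gh release create "$GITHUB_REF_NAME" --generate-notes
        env:
          GH_TOKEN: ${{ github.token }}
          GH_REPO: ${{ github.repository }} # no checkout, so tell gh which repo to use
```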
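### 4. Pin actions to a full commit SHA
Tags are mutable. In the March 2025 tj-actions/changed-files supply chain attack (CVE-2025-30066), the attacker rewrote the action's existing version tags to point at a malicious commit. This exposed the 23,000+ repositories that referenced the action by tag.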
```yaml
# Bad - tag can be rewritten
- uses: actions/checkout@v4

# Good - immutable SHA
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
```
Caveat: Some repos (e.g., pnpm/action-setup) have force-pushed, invalidating previously pinned SHAs. If CI fails with Unable to resolve action ... unable to find version, look up the current SHA via gh api repos/OWNER/REPO/git/ref/tags/vX.Y.Z. See references/security.md for the full diagnostic procedure.
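SHA pins go stale unless something bumps them. One common option, offered here as a suggestion rather than part of the original rule, is Dependabot's github-actions ecosystem. It opens PRs that update SHA-pinned uses: lines along with their trailing version comments. The weekly interval below is an arbitrary choice.

```yaml
# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/" # "/" makes Dependabot scan .github/workflows
    schedule:
      interval: "weekly"
```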
### 5. Don't cache package managers in setup-node
Do not use cache: 'pnpm' (or cache: 'npm', cache: 'yarn') in actions/setup-node. Restoring from the GitHub Actions cache is often slower than a fresh pnpm install from the npm registry's CDN. The CDN is highly optimized for package downloads, while GitHub's cache API adds significant overhead for large stores (especially 1 GB+). In benchmarking, a direct install from the CDN consistently beat cache restore + install.
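```yaml
# BAD - cache restore adds overhead, slower than fresh install
- uses: actions/setup-node@v4
  with:
    node-version-file: .node-version
    cache: pnpm # REMOVE THIS

# GOOD - just install directly
# pnpm must be on PATH before `pnpm install`; cloud runners may not ship it.
# pnpm/action-setup reads the version from package.json's packageManager field.
- uses: pnpm/action-setup@v4
- uses: actions/setup-node@v4
  with:
    node-version-file: .node-version
- run: pnpm install
```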
This is especially true for self-hosted runners where the pnpm store is already local — caching to GitHub's remote cache and restoring it is pointless overhead.
### 6. set-safe-directory: leave default on ephemeral runners, set false on self-hosted
actions/checkout defaults set-safe-directory to true. With that default, every run adds the workspace path to safe.directory via git config --global --add.
Ephemeral cloud runners — leave the default (true). Each run is a fresh VM, so there is no gitconfig to pollute. The default is also required for container jobs whose UID differs from the host runner user; without it, git inside the container errors with fatal: detected dubious ownership when it tries to operate on the mounted workspace.
```yaml
# GOOD on ephemeral runners: let checkout do its default thing
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
```
Persistent self-hosted runners — set it to false. Otherwise ~/.gitconfig accumulates a duplicate safe.directory entry on every run, polluting the shared gitconfig across every repo on that machine.
```yaml
# GOOD on self-hosted runners: prevent gitconfig pollution
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
  with:
    set-safe-directory: false
```
When migrating self-hosted → ephemeral, forgetting to remove set-safe-directory: false is a common gotcha. Non-container jobs may still work (the runner user owns the workspace), but the moment a job runs in a container, git inside hits dubious-ownership and fails with confusing errors.
Container jobs have a second trap. actions/checkout (a JavaScript action running on Node) writes safe.directory to the gitconfig under the Node action's HOME (/root/.gitconfig inside many containers). Shell run: steps inside the container use a different HOME (/github/home). They read a different gitconfig and never see that entry. Lifecycle scripts that call git then fail with fatal: detected dubious ownership. One example chain is pnpm install → prepare → lefthook install → git.
For container jobs, add a manual step before checkout that writes safe.directory to the shell-side gitconfig:
```yaml
test:
  runs-on: ubuntu-latest
  container:
    image: foo:bar
  steps:
    - name: Mark workspace as safe for git
      run: git config --global --add safe.directory "$GITHUB_WORKSPACE"
    - uses: actions/checkout@v4
    # ... rest of the job
```
This is orthogonal to the set-safe-directory option — it covers shell-step git invocations, which checkout's option doesn't reliably reach in container jobs. Plain (non-container) jobs do not need it.
### 7. actions/cache for build tools: yes on ephemeral, no on self-hosted
Persistent self-hosted runners: build tool caches (Cargo, Go modules, Gradle, etc.) already persist on the runner's local disk. Adding actions/cache restores them from GitHub's remote cache API on every run. Whenever the key changes, it also uploads them again. The result duplicates what is already on disk and wastes cache storage.
```yaml
# BAD on self-hosted: round-trips the local cache through the remote cache API
- uses: actions/cache@v4
  with:
    path: ~/.cargo/registry
    key: cargo-${{ hashFiles('Cargo.lock') }}

# GOOD on self-hosted: just use the local disk cache directly
# (no actions/cache step needed)
```
Ephemeral cloud runners — disk is wiped between runs, so actions/cache is essential to avoid re-downloading the dependency tree from scratch every time. Use it for ~/.cargo/registry, ~/.gradle/caches, the Go module cache, etc.
```yaml
# GOOD on ephemeral runners: survives across runs
- uses: actions/cache@v4
  with:
    path: ~/.cargo/registry
    key: cargo-${{ runner.os }}-${{ hashFiles('Cargo.lock') }}
```
Note: rule 5 ("Don't cache package managers in setup-node") still applies on both runner types — that rule is about npm package downloads where the CDN is faster than cache restore. Rule 7 is about general build-tool caches.
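The same ephemeral pattern applied to Gradle might look like the sketch below. The paths and hash inputs follow the widely used Gradle recipe for actions/cache, but adjust them to your build layout. restore-keys adds a prefix fallback. When a build file changes and the exact key misses, the job still starts from the most recent cache for that OS instead of an empty one.

```yaml
# GOOD on ephemeral runners: Gradle dependency + wrapper cache with a fallback
- uses: actions/cache@v4
  with:
    path: |
      ~/.gradle/caches
      ~/.gradle/wrapper
    key: gradle-${{ runner.os }}-${{ hashFiles('**/*.gradle*', '**/gradle-wrapper.properties') }}
    restore-keys: |
      gradle-${{ runner.os }}-
```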
### 8. curl | sh installers: use prebuilt-binary actions
Installer scripts like curl https://.../init.sh | sh (wasm-pack, rustup, many language toolchains) download with no retry logic. A single transient 5xx from the redirect target (e.g., a GitHub release asset) kills the entire workflow. Seen in the wild: rustwasm.github.io → github.com/rustwasm/wasm-pack/releases/... returning 504 mid-deploy.
```yaml
# BAD: one curl, no retry, fails on any 5xx
- name: Install wasm-pack
  run: curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh

# GOOD: prebuilt binary from GitHub releases, with retries + runner caching
- uses: taiki-e/install-action@v2
  with:
    tool: wasm-pack
```
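taiki-e/install-action covers most Rust/Go/Node tools (wasm-pack, cargo-nextest, just, mdbook, etc.). For tools it doesn't cover, cache a pinned-version binary with actions/cache, or add curl's built-in retries: curl --retry 5 --retry-all-errors --retry-delay 5. With curl | sh, those flags only protect the download of the installer script itself. Requests the script makes afterwards still have no retry, so wrap the whole command in a retry loop as well.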
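A sketch of the retry-loop fallback, reusing the wasm-pack installer URL from the BAD example. The attempt count and sleep interval are arbitrary, and the sketch assumes the installer is safe to re-run after a failed attempt. shell: bash matters here. An unspecified shell runs bash -e without pipefail, so a failed curl piped into sh can still exit 0. An explicit shell: bash adds -o pipefail.

```yaml
- name: Install wasm-pack (retry fallback)
  shell: bash # bash --noprofile --norc -eo pipefail: a failed curl fails the pipeline
  run: |
    for attempt in 1 2 3; do
      # curl retries the script download; the loop also retries the script's own downloads
      if curl --retry 5 --retry-all-errors --retry-delay 5 -sSf \
          https://rustwasm.github.io/wasm-pack/installer/init.sh | sh; then
        exit 0
      fi
      echo "install attempt $attempt failed; retrying in 10s" >&2
      sleep 10
    done
    echo "wasm-pack install failed after 3 attempts" >&2
    exit 1
```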
### 9. Pass build output between jobs with actions/cache, not artifacts
upload-artifact/download-artifact storage counts toward shared org storage (often limited). For passing build output between jobs in the same workflow, use actions/cache instead. It has a separate 10 GB per-repo limit.
```yaml
# BAD: artifacts accumulate in shared org storage
- uses: actions/upload-artifact@v4
  with:
    name: build-output
    path: dist/
    retention-days: 1

# GOOD: cache uses separate per-repo quota
- uses: actions/cache/save@v4
  with:
    path: dist/
    key: build-${{ github.run_id }}

# In the downstream job:
- uses: actions/cache/restore@v4
  with:
    path: dist/
    key: build-${{ github.run_id }}
```
If the build and deploy steps can run on the same runner, merging them into a single job is even simpler.
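Wired into two jobs, the cache handoff might look like this sketch (make build is a hypothetical target that writes dist/). needs: makes the deploy job wait until the save step has run. fail-on-cache-miss, an input of actions/cache/restore, makes a missing entry fail the job loudly instead of deploying an empty dist/. The key uses run_id alone, with no run attempt suffix. That way, re-running only the failed deploy job still finds the cache saved by the original build.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 20
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      - run: make build # hypothetical target that writes dist/
      - uses: actions/cache/save@v4
        with:
          path: dist/
          key: build-${{ github.run_id }}
  deploy:
    needs: build # guarantees the save step finished first
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - uses: actions/cache/restore@v4
        with:
          path: dist/
          key: build-${{ github.run_id }}
          fail-on-cache-miss: true # stop here rather than deploy an empty dist/
      - run: ls dist/ # placeholder for the real deploy command
```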
Caveat for cloud runners that proxy the cache layer (e.g., RunsOn with extras=s3-cache+magic-cache) — the runner injects a sidecar at ACTIONS_RESULTS_URL that intercepts both v2 cache and v4 artifact API calls. The sidecar speaks the cache protocol but does not always speak the v4 artifact protocol. With magic-cache enabled, actions/upload-artifact@v4 may fail with Unexpected token '...' is not valid JSON because the sidecar returns plain-text errors for artifact endpoints.
If you hit this, the symptoms vary by transport:
- actions/cache: Cache not found even when the upstream job successfully saved.
- actions/upload-artifact@v4: Unexpected token '...' is not valid JSON.

For the artifact path, disable the proxy extras (drop magic-cache from the runs-on label) so v4 artifact calls reach api.github.com directly.

When in doubt on a cloud runner that proxies caching, prefer upload-artifact/download-artifact over actions/cache and disable any cache-proxy extras. Artifacts go straight to the GitHub API, which is reachable from any container or instance.
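The artifact-based fallback for the same build → deploy handoff could be sketched as follows (make build is again a hypothetical placeholder). retention-days: 1 keeps the org-storage cost short-lived. if-no-files-found: error fails the producer job when nothing matches path, instead of uploading an empty artifact.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 20
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      - run: make build # hypothetical target that writes dist/
      - uses: actions/upload-artifact@v4
        with:
          name: build-output
          path: dist/
          retention-days: 1 # limit how long it occupies shared storage
          if-no-files-found: error # fail if dist/ produced nothing
  deploy:
    needs: build
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: build-output
          path: dist/
      - run: ls dist/ # placeholder for the real deploy command
```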
For detailed guidance, read the appropriate reference file:
- Security (references/security.md): pull_request_target, script injection, secrets, OIDC

Never debug CI issues by pushing and waiting. Each CI run costs time (10-15 min per cycle) and runner minutes. Always verify locally first:
```bash
# Run the same checks CI runs, locally
pnpm check   # typecheck + lint + format
pnpm build   # production build
pnpm test    # unit tests

# Only after ALL pass locally:
git push

# Then monitor (a Claude Code skill command, not a shell command):
/watch-ci
```
The workflow: fix locally → verify locally → push once → /watch-ci. If CI fails after local verification, it's either an environment difference (Node version, missing env vars) or a path/dependency issue specific to CI — much easier to diagnose than a code bug.
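Local verification is most trustworthy when the CI job runs literally the same commands. The sketch below assumes check, build and test scripts exist in package.json and that the Node version lives in .node-version, as in the rule 5 example. It pins Node from the same file you use locally, which removes one common source of "passes locally, fails in CI".

```yaml
jobs:
  check:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      - uses: pnpm/action-setup@v4 # version from package.json's packageManager field
      - uses: actions/setup-node@v4
        with:
          node-version-file: .node-version # same Node version as local
      - run: pnpm install
      - run: pnpm check # identical to the local commands above
      - run: pnpm build
      - run: pnpm test
```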
When reviewing or writing a workflow, verify:
- Every job sets timeout-minutes
- A concurrency group is set with appropriate cancel-in-progress
- permissions are declared (least privilege)
- pull_request_target is NOT used with PR code checkout
- No untrusted input (e.g. ${{ github.event.* }} expressions) is interpolated directly into run: blocks
- No blanket secrets: inherit
- No cache: parameter in setup-node (fresh install from CDN is faster; see rule 5)
- set-safe-directory on actions/checkout matches the runner type: default on ephemeral, set-safe-directory: false on self-hosted only (see rule 6)
- actions/cache for build tools matches the runner type: used on ephemeral, NOT on self-hosted (see rule 7)
- No curl | sh installers; use taiki-e/install-action or similar with retries (see rule 8)
- Build output passes between jobs via actions/cache, not upload-artifact, to avoid org storage limits. Switch to artifacts when a cloud runner's cache-proxy sidecar (e.g. RunsOn magic-cache) breaks v4 caching from container jobs (see rule 9)