skills/babysit-pr/SKILL.md
Create, monitor, and shepherd a GitHub pull request end-to-end: respect repo PR templates, watch CI and reviews, fix branch-related failures, address valid comments, and validate post-merge deployment when possible.
npx skillsauth add ckorhonen/claude-skills babysit-prInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Own the pull request from creation through post-merge validation with as little manual babysitting as possible.
Terminal outcomes:
Do not stop at "PR created" or "CI is green once." Keep following the PR until one of the terminal outcomes is true.
These gotchas appear during real PR automation and require proactive handling:
Watcher Script Exits Silently (Template Resolution)
The watcher can exit unexpectedly if PR template resolution fails during preflight. Root cause: resolve_pr_template.py may fail to parse custom template syntax, or template file is unreadable. Mitigation: always run the resolver in --json mode first to validate template before starting the watcher. If resolution fails, fall back to the bundled default template at assets/default_pr_template.md.
Multiple Watchers on Same PR (Duplicate Comments)
Running multiple gh_pr_watch.py instances for the same PR causes duplicate comment processing and can result in duplicate bot comments or conflicting retry actions. This typically happens when the watcher is restarted without properly stopping the previous session. Mitigation: use one watcher session per PR. Verify no orphaned watcher processes exist before starting: ps aux | grep gh_pr_watch. Kill any stale sessions before resuming.
CI Retry Logic Triggers on Non-Flaky Failures (Quota Waste)
The retry logic may consume the CI retry budget on failures that are not actually transient (e.g., legitimate test failures from branch-specific issues). This wastes quota and can block progress on legitimate issues. Mitigation: always inspect CI logs with gh run view <run-id> --log-failed before retrying. Confirm the failure is truly flaky or unrelated to branch changes. Apply the heuristics in references/heuristics.md strictly before calling --retry-failed-now.
Review Score Extraction Fails with Markdown Formatting
When reviewers use markdown formatting in their score phrases (e.g., **4/5** stars or `4 out of 5`), the review score extraction regex may fail to match. This results in the watcher missing explicit scores and potentially merging PRs prematurely despite unfinished reviewer feedback. Mitigation: the score extractor looks for patterns like 5/5 and 5 out of 5. If a reviewer uses markdown around the score, manually extract the numeric rating and document it in the PR thread before proceeding.
See references in each workflow section below for prevention strategies.
gh auth status works.Use the resolver script first:
python3 skills/babysit-pr/scripts/resolve_pr_template.py --json
⚠️ Related gotcha: See "Watcher Script Exits Silently (Template Resolution)" under Common Failure Modes. Always validate template resolution before starting the watcher.
Rules:
assets/default_pr_template.md.gh pr create --body-file or gh pr edit --body-file with a prepared body that mirrors the resolved template.Helpful commands:
gh pr create --base <default-branch> --title "<title>" --body-file <body-file>
gh pr edit <pr-number> --title "<title>" --body-file <body-file>
python3 skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --watch
The watcher emits snapshots and recommended actions. Start with:
python3 skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --watch
⚠️ Related gotchas: See Common Failure Modes for "Multiple Watchers on Same PR" and "Watcher Script Exits Silently." Ensure no stale watcher processes are running before starting, and verify template resolution completes cleanly.
Key actions:
process_review_comment: inspect the new comment or review and decide whether it is valid, relevant, and actionable on the current branch.improve_review_score: a trusted reviewer gave an explicit score below the maximum; do not treat the work as complete yet.diagnose_ci_failure: inspect failed logs and decide whether the failure is branch-related.retry_failed_checks: rerun failed checks only when the failure looks flaky or unrelated.stop_ready_to_merge: CI is green, no outstanding trusted review items remain, and mergeability is clean.stop_pr_closed: the PR merged or closed; if merged, transition to deployment monitoring.stop_exhausted_retries: flaky retry budget is exhausted; ask the user for help with evidence.Use references/heuristics.md for the classification checklist.
⚠️ Related gotcha: See "CI Retry Logic Triggers on Non-Flaky Failures" under Common Failure Modes. Always inspect logs before retrying to avoid wasting quota on legitimate failures.
Default commands:
gh run view <run-id> --json jobs,name,workflowName,conclusion,status,url,headSha
gh run view <run-id> --log-failed
python3 skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --retry-failed-now
Rules:
Apply feedback from trusted humans and approved bots when it is technically correct, actionable, and consistent with user intent.
When new feedback appears:
Ignore or escalate feedback when it is:
The watcher now extracts explicit score phrases such as 5/5 and 5 out of 5 from trusted comments and reviews.
⚠️ Related gotcha: See "Review Score Extraction Fails with Markdown Formatting" under Common Failure Modes. If reviewers use markdown around scores, manually extract ratings before proceeding.
Rules:
After merge, start the deployment watcher:
python3 skills/babysit-pr/scripts/gh_deploy_watch.py --pr auto --watch
The deployment watcher uses GitHub deployment records and deployment-like workflow runs tied to the merge commit. It stops with one of these actions:
wait_deployment: a deployment signal exists and is still running.wait_for_deployment_signal: the PR merged recently and the grace window has not expired yet.alert_deployment_failure: a deployment signal failed; inspect logs and decide whether to ship a fix.validate_production: deployment signals look successful; run the narrowest useful production validation.no_deployment_signal: no machine-readable deployment signal was found after the grace window.stop_pr_not_merged: the PR is not merged yet.Use references/deployment-monitoring.md for provider-specific follow-up, smoke tests, logs, and metrics guidance.
references/heuristics.mdreferences/github-api-notes.mdreferences/deployment-monitoring.mddocumentation
Create or expand an Idea.md / IDEA.md file from a rough description, existing repo, conversation history, notes, or other early-stage product inputs. Use when the user asks to "write an Idea.md", "turn this into an idea file", "capture this product idea", "expand this concept", or wants a repo-grounded concept brief before validation, PRD, or implementation work.
development
Write structured implementation plans from specs or requirements before touching code. Use when given a spec, requirements doc, or feature description, when user says "plan this out", "write a plan for", "how should we implement", or before starting any multi-step coding task.
testing
Expert guidance for video editing with ffmpeg, encoding best practices, and quality optimization. Use when working with video files, transcoding, remuxing, encoding settings, color spaces, or troubleshooting video quality issues.
development
Opinionated constraints for building better interfaces with agents. Use when building UI components, implementing animations, designing layouts, reviewing frontend accessibility, or working with Tailwind CSS, motion/react, or accessible primitives like Radix/Base UI.