kramme-cc-workflow/skills/kramme:launch:rollout/SKILL.md
Execute a post-merge launch with staged rollout, numeric decision thresholds, and rollback triggers. Sequence — staging → prod (flag OFF) → team enable → 5% canary → 25→50→100% gradual → full rollout + 1-week monitor + flag cleanup. Use after merging a user-facing change that needs safe rollout. Complements kramme:pr:finalize (pre-merge readiness) with post-merge verification, canary gates, and rollback paths. Not for PR creation, CI debugging, or pre-merge checks.
npx skillsauth add abildtoft/kramme-cc-workflow kramme:launch:rolloutInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Execute a staged, reversible, observable post-merge launch. The goal is not just to deploy — it's to deploy safely, with monitoring in place, a rollback plan ready, and numeric gates that turn "is this healthy?" into a falsifiable question. Every launch should be reversible, observable, and incremental.
This skill owns the post-merge half of the shipping lifecycle. It begins at "the PR has landed on main" and ends at "the flag has been cleaned up and the feature is permanent."
kramme:pr:finalize. Covers code-review verdict, UX review, QA run, description generation. Ends at a READY verdict.The handoff between the two is the merge commit. If you are still running finalize, you are not yet in rollout territory.
kramme:pr:fix-ci.kramme:pr:create.kramme:pr:finalize.kramme:pr:resolve-review.The markers below anchor this skill's output. Emit them explicitly; do not bury them in prose.
STACK DETECTED: <feature-flag platform, deploy target, monitoring tools>
State the rollout stack before step 1. Example: STACK DETECTED: LaunchDarkly flags, Vercel deploy, Datadog metrics + Sentry errors. If any component is missing (no flag platform, no error reporting), that's a MISSING REQUIREMENT.
UNVERIFIED: <claim about baseline, capacity, or behavior that has no source>
Flag any numeric or behavioral claim the user has not confirmed. "Baseline error rate is ~0.5%" is UNVERIFIED until you can point to a dashboard or a log query.
NOTICED BUT NOT TOUCHING: <what you saw>
Why skipping: <out-of-scope for this rollout>
Use when you spot an unrelated infra problem during rollout (a dashboard gap, an alert with a wrong threshold, a runbook that's stale). Log it and move on. Do not silently fix adjacent issues mid-rollout — silent fixes during a live canary are unreviewable and expand the blast radius.
MISSING REQUIREMENT: <which pre-launch checklist item is missing>
Plan: <how to resolve before advancing>
Emit when any pre-launch checklist item is unchecked and unowned. Rollout does not advance past the pre-flight gate with a missing requirement.
CONFUSION: <ambiguous signal>
Use when a metric could have multiple causes. Example: CONFUSION: P95 latency 30% above baseline at 5% canary — could be cold-start cache warm-up or real regression. Forces an investigation rather than a snap judgment.
Before step 1, confirm every box in references/pre-launch-checklist.md is checked or explicitly deferred with an owner. The checklist has six sections: Code Quality, Security, Performance, Accessibility, Infrastructure, Documentation.
If any box is unchecked:
MISSING REQUIREMENT and halt the rollout.Do not proceed past this gate on the assumption that "we can fix it during canary." Problems that ship do not un-ship mid-rollout.
The sequence is six staged gates. Each gate has a monitoring window and a set of thresholds that must pass before advancing. Do not compress the sequence to save time — the windows are the whole point.
The launch ticket referenced throughout is wherever this rollout is tracked — your team's Linear/Jira/GitHub issue for the release. If none exists, create a LAUNCH.md at the repo root and use it as the ticket. The sequence, the thresholds table, and the rollback plan all get written there. Archive or delete LAUNCH.md as part of the final flag-cleanup gate once the rollout completes — it is a working artifact, not permanent documentation.
Re-entry: if this skill is re-invoked mid-rollout, read the launch ticket first and resume at the gate it records as current — do not restart from staging or re-run monitoring windows for gates that already passed.
1. DEPLOY to staging
└── Full test suite passes in the staging environment.
└── Manual smoke test of the critical user flow.
└── Verify the feature flag is reachable and togglable from staging's flag UI.
2. DEPLOY to production (feature flag OFF)
└── Verify the deploy succeeded (health check returns 200).
└── Check error monitoring — no new error types attributed to the deploy.
└── Confirm the feature flag is present and defaults to OFF for all users.
└── Window: immediate — if the deploy itself is unhealthy, stop here.
3. ENABLE for team (flag ON for internal users only)
└── Team members use the feature in production.
└── Gather qualitative feedback: does it feel right? any rough edges?
└── Monitor error rates and latency against the team cohort.
└── Window: 24 hours minimum.
4. CANARY rollout (flag ON for 5% of users)
└── Monitor error rates, latency, client errors, and business metrics.
└── Compare canary cohort vs. control cohort (not against the full population).
└── Apply the Rollout Decision Thresholds table (see next section).
└── Window: 24–48 hours. Advance only if all thresholds pass.
5. GRADUAL increase (25% → 50% → 100%)
└── Same monitoring at each step.
└── Ability to roll back to the previous percentage at any point.
└── Window: 12–24 hours per step, depending on traffic volume.
6. FULL rollout (flag ON for all users)
└── Monitor for 1 week.
└── Remove the feature flag and the off-state code path (see references/feature-flag-rules.md).
└── Window: 1 week active monitoring, then flag cleanup within 2 weeks.
Write the sequence into the launch ticket verbatim so the on-call can see which gate is current.
The numeric gates for advance / hold / rollback decisions. Reproduce the table in the launch ticket so the decision rule is visible, not tribal.
| Metric | Advance (green) | Hold and investigate (yellow) | Roll back (red) | | --- | --- | --- | --- | | Error rate | Within 10% of baseline | 10–100% above baseline | >2× baseline | | P95 latency | Within 20% of baseline | 20–50% above baseline | >50% above baseline | | Client JS errors | No new error types | New errors at <0.1% of sessions | New errors at >0.1% of sessions | | Business metrics | Neutral or positive | Decline <5% (may be noise) | Decline >5% |
How to read the table:
UNVERIFIED applies if you don't have a real baseline — do not advance past 5% on a guessed baseline.
Full rationale (why these numbers, how to interpret edge cases) lives in references/rollout-thresholds.md.
Independent of the thresholds table. Roll back immediately, without waiting for the monitoring window to expire, if any of the following occur:
Pre-agree these triggers with the on-call before step 1. STACK DETECTED should include the exact runbook step for each trigger.
Within the first hour after the flag flips on (whether at team enable, canary, or full rollout), complete every item:
Complete all six. Do not skip any "because it always works" — the one time it doesn't is launch day.
Post-100%, the flag has one remaining job: to be removed. See references/feature-flag-rules.md for the full rules. The short version:
Removing a flag means removing the check, the off-state code path, the flag-service definition, the CI test matrix entry, and any runbook references. A half-cleaned flag is worse than the original.
Every rollout needs a documented rollback plan before step 1. Fill this in for the launch ticket:
## Rollback Plan for [Feature/Release]
### Trigger Conditions
- Error rate > 2× baseline.
- P95 latency > [Xms — specific to this feature].
- User reports of [specific expected failure mode].
- [Any feature-specific trigger — e.g., "checkout conversion drops >5%"].
### Rollback Steps
1. Reverse the change, whichever is faster:
- **Flag flip** — disable the feature flag in the flag UI (expected time: < 1 minute), or
- **Redeploy** — `git revert <commit> && git push` to ship the previous version (expected time: < 5 minutes).
2. Verify the rollback: health check returns 200, error rate returns to baseline.
3. Communicate: notify the team channel and on-call that a rollback occurred.
4. Open a postmortem ticket within 24 hours.
### Database Considerations
- Migration [X] rollback: [specific command / procedure, or "N/A — schema is backward-compatible"].
- Data inserted by the new feature: [preserved / cleaned up / quarantined].
### Time-to-Rollback
- Feature flag: < 1 minute.
- Redeploy previous version: < 5 minutes.
- Database rollback (if needed): < 15 minutes.
A rollback plan that does not fit in the launch ticket is too vague to execute under pressure.
End the rollout session with a structured summary that an on-call can scan in 30 seconds:
CHANGES MADE
- Flag <name> rolled to <percentage>% at <timestamp>.
- Monitoring windows completed: <which gates passed>.
- Feature flag cleanup: <scheduled for DATE | completed | deferred — reason>.
THINGS I DIDN'T TOUCH
- <adjacent infra / monitoring / runbook items noticed but not modified>.
POTENTIAL CONCERNS
- <any yellow metric observations>.
- <any UNVERIFIED assumptions not yet closed>.
- <any CONFUSION entries not yet resolved>.
This template replaces handwavy "launch looks good" posts. It documents what happened, what was observed but not changed, and what remains open.
kramme:pr:finalize produces the READY verdict that gates merge. This skill picks up at the merge commit.kramme:launch:monitor (post-launch canary surveillance via browser MCP) and kramme:launch:rollback (execute a rollback when thresholds are breached) are deferred until demand appears. Both would extend this skill, not replace it.The lies engineers tell themselves to skip rollout discipline. Each one has a correct response.
| Rationalization | Reality | | --- | --- | | "It works in staging, it'll work in production." | Production has different data, traffic, and edge cases. Staging validates that the code runs; production validates that the code works. | | "It's a small change, skip the canary." | Small changes break big things. The canary costs one day; a bad full-rollout costs a week. | | "We don't need a feature flag for this." | Every non-trivial change benefits from a kill switch. Flags are the cheapest insurance in the stack. | | "Monitoring is overhead." | Not having monitoring means discovering problems from user complaints. That's a worse kind of overhead — and you cannot debug what you cannot see, so add it before launch, not later. | | "Latency is only 30% above baseline — probably fine." | 30% is yellow. The table says hold and investigate. "Probably fine" is not a decision rule. | | "Rolling back is admitting failure." | Rolling back is responsible engineering. Shipping a broken feature is the failure. | | "We'll clean up the flag later." | Later is never. Schedule the cleanup ticket before starting the rollout. | | "The on-call will watch it." | The on-call does not know this feature. You do. Be there for the first-hour verification. | | "It's Friday afternoon, let's ship it." | No. Ship Monday–Thursday, during working hours, with the team available. |
If you notice any of these during rollout planning or execution, stop:
UNVERIFIED applies and has not been resolved).MISSING REQUIREMENT from the pre-launch checklist has not been resolved.Any single red flag above is grounds to halt. Two or more is grounds to cancel the rollout and restart from pre-flight.
Before declaring the rollout complete, self-check every item:
Before step 1:
STACK DETECTED line is emitted and names flag platform, deploy target, monitoring tools.At each gate (team, 5%, 25%, 50%, 100%):
MISSING REQUIREMENT has emerged.After full rollout:
CHANGES MADE / THINGS I DIDN'T TOUCH / POTENTIAL CONCERNS).UNVERIFIED has been closed or explicitly left open with an owner.CONFUSION has been resolved.If any answer is no, the rollout is not done. Close the gap before calling it shipped.
development
Runs kramme:pr:code-review as a closeout review loop for local or PR branch changes before commit, ship, or final response. Use when the user asks for autoreview, second-model review, or a final code-review pass after non-trivial edits. Not for UX, visual, accessibility, or product review.
development
Guides topic-level understanding verification for a PR, branch, feature, document, spec, design decision, bug fix, or other concrete subject. Use when the user asks to confirm, quiz, drill, teach-and-check, or verify that they understand a topic. Maintains a topic-specific checklist artifact and requires demonstrated understanding before marking the topic complete. Not for ordinary explanations without verification, end-of-session summaries, or code/test correctness checks.
testing
Design a CI/CD pipeline with quality gates, a <10-minute budget, feature-flag lifecycle, and an exit checklist. Use when adding a new CI pipeline, changing gate configuration, or planning a rollout for a new service. Complementary to kramme:pr:fix-ci (which fixes failures in an existing pipeline). Covers gate ordering, secrets storage, branch protection, rollback mechanism, and staged-rollout guardrails — not a rollout-execution runbook.
tools
--- name: kramme:visual:demo-reel description: Capture local demo evidence for observable product behavior: screenshots, before/after image sets, browser reels, terminal recordings, and short GIF/video proof. Use when shipping UI changes, CLI features, or any change where PR reviewers would benefit from visual or behavioral evidence. argument-hint: "[what to capture] [--url <url>|auto] [--tier static|before-after|browser-reel|terminal-recording]" disable-model-invocation: true user-invocable: tr