skills/saas-launch-checklist/SKILL.md
Pre-launch verification across infrastructure, security, legal, payment, email, analytics, and performance. Day-1 monitoring, rollback plan, incident response skeleton, and post-launch week-1 checklist.
A structured checklist for shipping a SaaS product to production with confidence. Every item exists because someone skipped it and regretted it.
- `npm audit` / `pip-audit` / `trivy` shows zero critical CVEs
- `/terms`, acceptance recorded at signup
- `/privacy`, covers data collection, retention, and third parties
- quarantine or reject, verified with mail-tester.com
- `/unsubscribe` endpoint works, preference center available
- `Cache-Control` and `ETag` headers

Build this dashboard before launch. Every panel answers a specific question.
+------------------------------------------------------+
|                SaaS Launch Dashboard                 |
+---------------------------+--------------------------+
| Request Rate (req/s)      | Error Rate (%)           |
| Normal: 10-50/s           | Target: < 1%             |
| Alert: > 200/s            | Alert: > 2%              |
+---------------------------+--------------------------+
| p95 Latency (ms)          | Active Users (real-time) |
| Target: < 200ms           | Shows WebSocket/polling  |
| Alert: > 500ms            | count                    |
+---------------------------+--------------------------+
| Signup Rate (/hr)         | Payment Success Rate (%) |
| Compare to projection     | Target: > 95%            |
| Alert: 0 for 30min        | Alert: < 90%             |
+---------------------------+--------------------------+
| CPU / Memory              | Database Connections     |
| Alert: > 80%              | Alert: > 80% pool        |
+---------------------------+--------------------------+
# monitoring/launch-dashboard.yml
panels:
  - title: Request Rate
    query: sum(rate(http_requests_total[5m]))
    alert_threshold: 200
  - title: Error Rate
    query: |
      sum(rate(http_requests_total{status_code=~"5.."}[5m]))
      / sum(rate(http_requests_total[5m])) * 100
    alert_threshold: 2
  - title: p95 Latency
    query: |
      histogram_quantile(0.95,
        sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
    alert_threshold: 0.5
  - title: Signup Rate
    query: sum(increase(user_signups_total[1h]))
    alert_threshold: 0 # alert if zero for 30 min
  - title: Payment Success Rate
    query: |
      sum(rate(payment_completed_total[5m]))
      / sum(rate(payment_attempted_total[5m])) * 100
    alert_threshold: 90 # alert if below
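The config above stores a single `alert_threshold` per panel, but the direction of the comparison lives only in comments ("alert if below"). A minimal TypeScript sketch of making that direction explicit follows. The `direction` field and the function names are assumptions introduced for illustration and are not part of the dashboard config. The Signup Rate panel is left out because its rule ("zero for 30 min") depends on duration rather than a single sample.

```typescript
// Direction matters: some panels alert when the value rises above the
// threshold, others when it falls below it.
type Direction = "above" | "below";

interface PanelRule {
  title: string;
  threshold: number;
  direction: Direction; // hypothetical field, not present in the YAML
}

// Thresholds copied from the dashboard config above.
const PANEL_RULES: PanelRule[] = [
  { title: "Request Rate", threshold: 200, direction: "above" },
  { title: "Error Rate", threshold: 2, direction: "above" },
  { title: "p95 Latency", threshold: 0.5, direction: "above" },
  { title: "Payment Success Rate", threshold: 90, direction: "below" },
];

// A value exactly at the threshold does not count as a breach.
function isBreached(rule: PanelRule, value: number): boolean {
  return rule.direction === "above"
    ? value > rule.threshold
    : value < rule.threshold;
}

// Returns the titles of all panels whose current sample breaches its rule.
// Panels with no sample in `samples` are skipped.
function breachedPanels(samples: Record<string, number>): string[] {
  return PANEL_RULES.filter((rule) => {
    const value = samples[rule.title];
    return value !== undefined && isBreached(rule, value);
  }).map((rule) => rule.title);
}
```

Calling `breachedPanels` with an error rate of 3.5 and a payment success rate of 85 would flag both panels, one for being too high and one for being too low.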
Complete this before you deploy. If you cannot fill every field, you are not ready to launch.
# Rollback Plan: [Product Name] v[X.Y.Z]
## Decision Criteria
Trigger rollback if ANY of these occur within 60 minutes of deploy:
- [ ] Error rate exceeds 5%
- [ ] p95 latency exceeds 2 seconds for 5 consecutive minutes
- [ ] Payment processing fails for 3+ consecutive attempts
- [ ] Data integrity issue detected (mismatched records)
## Rollback Steps
1. Set feature flag `launch_v1` to OFF (immediate, <30 seconds)
2. Revert DNS/load balancer to previous deployment
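3. Run: `kubectl rollout undo deployment/api --to-revision=PREV` (PREV is the revision number listed by `kubectl rollout history deployment/api`)
   OR: `docker compose -f docker-compose.prod.yml up -d --force-recreate` (after pinning the image tag back to the previous release)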
4. Verify health check returns 200 on previous version
5. Notify #incidents channel: "Rollback executed, investigating"
## Data Migration Rollback
- [ ] Database migration has a DOWN migration
- [ ] Tested: `npm run migrate:down` or `python manage.py migrate APP PREVIOUS`
- [ ] New columns are NULLABLE (old code ignores them)
- [ ] No destructive changes (column drops, renames) in this release
## Communication
- Engineering: #incidents Slack channel
- Support: Pre-drafted message in support tool
- Customers: Status page update (only if downtime > 5 min)
## Owner
- Rollback decision: [On-call engineer name]
- Execution: [DevOps engineer name]
- Communication: [Support lead name]
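The decision criteria in the rollback plan above are mechanical enough to encode, which removes debate during an incident. The following is a sketch only, assuming latency is sampled once per minute. The `DeploySnapshot` shape and its field names are hypothetical and would need to be mapped to whatever your monitoring actually exposes.

```typescript
// Hypothetical snapshot of post-deploy health; field names are assumptions.
interface DeploySnapshot {
  minutesSinceDeploy: number;
  errorRatePercent: number;
  // p95 latency in seconds, one sample per minute, most recent last.
  p95LatencySecondsPerMinute: number[];
  consecutivePaymentFailures: number;
  dataIntegrityIssueDetected: boolean;
}

// Returns every rollback trigger that currently applies.
// An empty array means no automatic rollback trigger has fired.
function rollbackReasons(s: DeploySnapshot): string[] {
  // The plan only covers the first 60 minutes after deploy.
  if (s.minutesSinceDeploy > 60) return [];

  const reasons: string[] = [];
  if (s.errorRatePercent > 5) {
    reasons.push("Error rate exceeds 5%");
  }
  const lastFive = s.p95LatencySecondsPerMinute.slice(-5);
  if (lastFive.length === 5 && lastFive.every((v) => v > 2)) {
    reasons.push("p95 latency exceeds 2 seconds for 5 consecutive minutes");
  }
  if (s.consecutivePaymentFailures >= 3) {
    reasons.push("Payment processing failed for 3+ consecutive attempts");
  }
  if (s.dataIntegrityIssueDetected) {
    reasons.push("Data integrity issue detected");
  }
  return reasons;
}
```

Returning the list of reasons rather than a boolean gives the rollback owner something concrete to paste into the #incidents notification.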
# Launch Day Incident Playbook
## Severity Levels
| Level | Definition | Response | Example |
|-------|-----------------------------|-----------|----------------------------|
| SEV-1 | Service down, data loss | 5 min | Database unreachable |
| SEV-2 | Major feature broken | 15 min | Payments failing |
| SEV-3 | Minor feature broken | 1 hour | Email delivery delayed |
| SEV-4 | Cosmetic issue | Next day | Alignment bug on Safari |
## On-Call Roster (Launch Day)
| Role | Primary | Secondary | Contact |
|--------------------|----------------|----------------|----------------|
| Incident Commander | [Name] | [Name] | [Phone/Slack] |
| Backend Engineer | [Name] | [Name] | [Phone/Slack] |
| Frontend Engineer | [Name] | [Name] | [Phone/Slack] |
| DevOps/SRE | [Name] | [Name] | [Phone/Slack] |
## Response Flow
1. DETECT - Alert fires or user reports issue
2. TRIAGE - Assign severity, open incident channel (#inc-YYYYMMDD-NN)
3. CONTAIN - Feature flag OFF, rollback, or scale up
4. FIX - Root cause fix, deploy to staging, verify
5. DEPLOY - Push fix to production with monitoring
6. REVIEW - Post-incident review within 48 hours
## Pre-Written Status Page Messages
- Investigating: "We are aware of an issue affecting [X] and are investigating."
- Identified: "The issue has been identified. A fix is being deployed."
- Resolved: "The issue has been resolved. All systems are operational."
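Two pieces of the playbook above are easy to automate during triage: the `#inc-YYYYMMDD-NN` channel name and the response deadline implied by the severity table. The sketch below is illustrative. The function names are hypothetical, dates are formatted in UTC by assumption, and "Next day" for SEV-4 is approximated as 24 hours.

```typescript
type Severity = "SEV-1" | "SEV-2" | "SEV-3" | "SEV-4";

// Response targets in minutes, taken from the severity table.
// SEV-4 ("Next day") is approximated as 24 hours in this sketch.
const RESPONSE_MINUTES: Record<Severity, number> = {
  "SEV-1": 5,
  "SEV-2": 15,
  "SEV-3": 60,
  "SEV-4": 24 * 60,
};

// Builds a channel name in the #inc-YYYYMMDD-NN format (UTC date,
// two-digit sequence number for incidents opened on the same day).
function incidentChannelName(openedAt: Date, sequence: number): string {
  const year = openedAt.getUTCFullYear();
  const month = String(openedAt.getUTCMonth() + 1).padStart(2, "0");
  const day = String(openedAt.getUTCDate()).padStart(2, "0");
  const nn = String(sequence).padStart(2, "0");
  return `#inc-${year}${month}${day}-${nn}`;
}

// Returns the time by which someone must have responded.
function responseDeadline(severity: Severity, openedAt: Date): Date {
  return new Date(openedAt.getTime() + RESPONSE_MINUTES[severity] * 60000);
}
```

For example, the first incident opened on 6 January 2025 would get the channel `#inc-20250106-01`, and a SEV-2 opened at that moment would be due a response 15 minutes later.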
Monday 9:00 AM:
- Mass email to 50,000 person waitlist
- ProductHunt launch post goes live
- Hacker News submission
- All social media posts scheduled simultaneously
- Full feature set available to everyone
Monday 9:15 AM:
- 3,000 concurrent users hit the app
- Database connections exhausted
- Payment webhook queue backs up
- Support inbox: 200 tickets in 30 minutes
- Error rate: 15%
- Team panics
Monday 10:00 AM:
- Emergency rollback
- Status page: "We are experiencing issues"
- ProductHunt comments: "This doesn't work"
- First impression destroyed
// Feature flag configuration for phased rollout
const LAUNCH_PHASES = {
  phase1: {
    name: 'Team and Friends',
    startDate: '2025-01-06',
    criteria: { userList: 'internal-testers' }, // 50 users
    goal: 'Find critical bugs before anyone else sees them',
  },
  phase2: {
    name: 'Beta Waitlist (10%)',
    startDate: '2025-01-08',
    criteria: { percentage: 10 }, // 500 users
    goal: 'Validate onboarding flow and payment',
  },
  phase3: {
    name: 'Beta Waitlist (50%)',
    startDate: '2025-01-10',
    criteria: { percentage: 50 }, // 2,500 users
    goal: 'Load test with real traffic patterns',
  },
  phase4: {
    name: 'Full Waitlist + Public',
    startDate: '2025-01-13',
    criteria: { percentage: 100 }, // everyone
    goal: 'General availability',
  },
} as const
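The phase configuration above says who should be included but not how a given user is matched against `percentage`. One common approach, sketched here as an assumption rather than the method this checklist prescribes, is to hash the user ID into a stable bucket from 0 to 99. Because a user's bucket never changes, anyone admitted at 10% stays admitted at 50% and 100%. All names below are hypothetical, and `userList` is modeled as an array of IDs rather than the list name used in the config.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units. Deterministic, so a
// given user ID always maps to the same value.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned 32-bit
}

// Stable bucket in the range 0..99 for a user.
function rolloutBucket(userId: string): number {
  return fnv1a(userId) % 100;
}

// Simplified criteria: an explicit allow-list or a percentage.
type RolloutCriteria =
  | { userList: readonly string[] }
  | { percentage: number };

// True if the user is inside the current phase.
function isEnabledFor(userId: string, criteria: RolloutCriteria): boolean {
  if ("userList" in criteria) {
    return criteria.userList.includes(userId);
  }
  return rolloutBucket(userId) < criteria.percentage;
}
```

The actual share of users admitted at each percentage depends on how evenly the IDs hash, so it is worth checking the distribution against real IDs before relying on the user counts in the phase comments.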
Week 1 Monday: 50 internal users -> Find showstoppers
Week 1 Wednesday: 500 beta users -> Validate payment flow
Week 1 Friday: 2,500 beta users -> Verify infrastructure holds
Week 2 Monday: Full public launch -> Confidence backed by data
Each phase gate:
[x] Error rate < 1%
[x] p95 latency < 300ms
[x] Payment success > 98%
[x] NPS from phase users > 30
[x] Zero data integrity issues
-> Only then proceed to next phase
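The phase gate above can be expressed as a small check that runs before each rollout step. This is a sketch using the thresholds listed in the gate. The `PhaseMetrics` shape and function names are hypothetical and assume the metrics have already been aggregated for the current phase.

```typescript
// Hypothetical aggregate of metrics collected during one phase.
interface PhaseMetrics {
  errorRatePercent: number;
  p95LatencyMs: number;
  paymentSuccessPercent: number;
  nps: number;
  dataIntegrityIssues: number;
}

// Returns the label of every gate check that failed.
function failedGateChecks(m: PhaseMetrics): string[] {
  const checks: Array<[string, boolean]> = [
    ["Error rate < 1%", m.errorRatePercent < 1],
    ["p95 latency < 300ms", m.p95LatencyMs < 300],
    ["Payment success > 98%", m.paymentSuccessPercent > 98],
    ["NPS from phase users > 30", m.nps > 30],
    ["Zero data integrity issues", m.dataIntegrityIssues === 0],
  ];
  return checks.filter(([, passed]) => !passed).map(([label]) => label);
}

// Proceed to the next phase only if every check passes.
function canProceedToNextPhase(m: PhaseMetrics): boolean {
  return failedGateChecks(m).length === 0;
}
```

Listing the failed checks, rather than returning only a boolean, tells the team exactly what to fix before the next phase opens.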
| Dimension        | Big Bang              | Phased Rollout           |
|------------------|-----------------------|--------------------------|
| Risk             | All-or-nothing        | Contained per phase      |
| Feedback         | Overwhelming, chaotic | Manageable, actionable   |
| Infrastructure   | Guess and hope        | Scale based on real data |
| Recovery         | Public failure        | Private fix              |
| First impression | One shot              | Refined over 4 attempts  |
| Team stress      | Maximum               | Distributed              |
Key principle: Launch is not a single moment; it is a process. Every item you skip is a bet that nothing will go wrong in that area. The checklist exists to make the boring stuff automatic so you can focus on users.