database-reliability/SKILL.md
Database reliability engineering: SLOs for databases, operational runbooks, change management, capacity planning, backup verification, incident response, and monitoring strategies for production MySQL. Use when setting up production database...
npx skillsauth add peterbamuhigire/skills-web-dev database-reliability

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
database-reliability or would be better handled by a more specific companion skill.

SKILL.md first, then load only the referenced deep-dive files that are necessary for the task.

Databases are not special snowflakes. Treat every node as cattle: replaceable, automated, and monitored. The DBRE role is engineering, not firefighting. Toil (manual, repetitive, automatable work that scales linearly with growth) is the enemy; eliminate it.
| Tier | SLO | Downtime/Year | Downtime/Week |
|------|-----|---------------|---------------|
| Standard | 99.9% | 8.76 hours | 10.08 minutes |
| High | 99.95% | 4.38 hours | 5.04 minutes |
| Critical | 99.99% | 52.6 minutes | 1.01 minutes |
Sample SLO: 99.9% availability averaged over one week; no single incident longer than 10.08 minutes; downtime is declared when more than 5% of users are affected; one annual 4-hour maintenance window is allowed (2 weeks' notice, affecting <10% of users).
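The downtime figures for each tier follow directly from the SLO percentage. The sketch below reproduces that arithmetic in Python; the function name `allowed_downtime_seconds` and the 365-day year are assumptions made for illustration, not part of the skill.

```python
# Hypothetical helper: converts an availability SLO into a downtime allowance.
WEEK = 7 * 24 * 3600    # seconds in one week (604,800)
YEAR = 365 * 24 * 3600  # seconds in a 365-day year (assumption: no leap day)


def allowed_downtime_seconds(slo_percent: float, period_seconds: float) -> float:
    """Seconds of downtime a period can absorb while still meeting the SLO."""
    return period_seconds * (100.0 - slo_percent) / 100.0


if __name__ == "__main__":
    for tier, slo in [("Standard", 99.9), ("High", 99.95), ("Critical", 99.99)]:
        per_year_hours = allowed_downtime_seconds(slo, YEAR) / 3600
        per_week_minutes = allowed_downtime_seconds(slo, WEEK) / 60
        print(f"{tier:9s} {slo}%  {per_year_hours:.2f} h/year  {per_week_minutes:.2f} min/week")
```

For the Standard tier this gives roughly 604.8 seconds per week, which is the 10.08 minutes used in the sample SLO.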
Never use averages — they are lossy and hide tail latency. Use percentiles over 1-minute windows at 99% of requests.
p50 < 5ms (pk lookups) | p95 < 50ms (indexed queries)
p99 < 200ms (joins/agg) | max 500ms (circuit breaker)
Replication lag: <5s normal | alert 10s | critical 30s
Connections: alert at 80% of max_connections | keep >20% headroom
Slow query rate: <1% of total queries
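To illustrate why averages hide tail latency, here is a minimal Python sketch that computes nearest-rank percentiles for one 1-minute window and compares them with the p50/p95/p99 targets above. The helper names, the `TARGETS_MS` mapping, and the sample data are hypothetical; a real setup would read latencies from the monitoring system.

```python
import math

# Latency targets in milliseconds, taken from the list above (percentile -> limit).
TARGETS_MS = {50: 5, 95: 50, 99: 200}


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples in window")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_window(latencies_ms):
    """Return {percentile: (observed_ms, target_ms, within_target)} for one window."""
    report = {}
    for pct, target in TARGETS_MS.items():
        observed = percentile(latencies_ms, pct)
        report[pct] = (observed, target, observed < target)
    return report


# Illustrative window: 97 fast requests and 3 very slow ones.
window = [2.0] * 97 + [900.0] * 3
mean_ms = sum(window) / len(window)  # about 28.9 ms, which looks healthy
report = check_window(window)        # p99 is 900 ms, which breaches the 200 ms target
```

In this made-up window the mean stays under 30 ms while the 99th percentile is 900 ms, so only the percentile check surfaces the slow requests.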
Track weekly. 30% consumed by Tuesday → create ticket. 70% consumed with 3+ days left → freeze non-critical deployments. 99.9% SLO = 604.8 seconds budget per week.
-- Percentage of the weekly 604.8 s error budget (99.9% SLO) consumed so far
SELECT ROUND(COALESCE(SUM(duration_seconds), 0) / 604.8 * 100, 1) AS budget_pct_used
FROM downtime_log
WHERE week_start = CURDATE() - INTERVAL WEEKDAY(CURDATE()) DAY;
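The weekly budget policy can also be expressed as a small decision function. This is a sketch under stated assumptions: `budget_action` is a hypothetical name, "by Tuesday" is read as Monday or Tuesday, and "3+ days left" is read as at least three full days remaining after today. Weekdays use 0 for Monday, matching MySQL `WEEKDAY()` and Python `date.weekday()`.

```python
WEEK_SECONDS = 7 * 24 * 3600  # 604,800 seconds


def weekly_budget_seconds(slo_percent: float) -> float:
    """Error budget for one week, e.g. about 604.8 s for a 99.9% SLO."""
    return WEEK_SECONDS * (100.0 - slo_percent) / 100.0


def budget_action(downtime_seconds: float, weekday: int, slo_percent: float = 99.9) -> str:
    """Map budget consumption to the action named in the policy.

    weekday: 0 = Monday ... 6 = Sunday.
    Returns "freeze", "ticket", or "none" (no action defined by the policy).
    """
    used = downtime_seconds / weekly_budget_seconds(slo_percent)
    days_left = 6 - weekday  # assumption: full days remaining after today
    if used >= 0.70 and days_left >= 3:
        return "freeze"   # freeze non-critical deployments
    if used >= 0.30 and weekday <= 1:
        return "ticket"   # create a ticket
    return "none"
```

For example, 200 seconds of downtime on a Tuesday (about 33% of the budget) maps to `"ticket"`, while 450 seconds on a Wednesday (about 74%) maps to `"freeze"`.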
Extended guidance for database-reliability was moved to references/skill-deep-dive.md to keep this entrypoint compact and fast to load.
Use that deep dive for:
2. Toil Reduction: What to Automate
3. Change Management for Databases
4. Backup Verification Runbook
5. Monitoring Pyramid
6. Alert Fatigue Prevention
7. Capacity Planning
8. Incident Response Runbook
9. Connection Exhaustion Response
10. Replication Failure Recovery
11. Security Incident Response
12. Planned Maintenance Checklist
13. Chaos Engineering for Databases