skills/infrastructure-performance/monitoring-alerting-commerce/SKILL.md
Track store health in real time with dashboards for checkout success rate, payment failures, cart errors, and custom SLO alerting
npx skillsauth add finsilabs/awesome-ecommerce-skills monitoring-alerting-commerceInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Generic infrastructure monitoring (CPU, memory, error rate) is insufficient for e-commerce — you need commerce-domain metrics: checkout funnel conversion rates, payment success/failure breakdown by gateway and card type, cart abandonment rates, and inventory out-of-stock events. This skill covers setting up monitoring across different platforms and instrumenting custom storefronts with OpenTelemetry, building dashboards for commerce KPIs, and setting up alerts that fire before revenue impact becomes visible in sales reports.
| Platform | Built-In Monitoring | What You Can Add | |----------|-------------------|-----------------| | Shopify | Shopify admin shows orders, conversion rates, and a performance report with Core Web Vitals | Install Lucky Orange or Microsoft Clarity for session recordings and funnel drop-off; connect Shopify to Google Analytics 4 for detailed checkout funnel tracking | | WooCommerce | WooCommerce analytics dashboard shows orders and revenue; no performance monitoring built in | Install MonsterInsights (GA4 integration), WooFunnels (checkout funnel tracking), and set up Uptime Robot (free tier: 50 monitors) for availability alerting | | BigCommerce | BigCommerce Analytics shows orders, conversion rate, and abandoned carts | Connect BigCommerce to GA4 via native integration; use Lucky Orange for heatmaps and session recordings on checkout pages | | Custom / Headless | Nothing — you build it | Instrument with OpenTelemetry, ship metrics to Grafana Cloud (free tier: 10K series) or Datadog; build checkout funnel dashboards with PromQL; see implementation below |
Set up GA4 checkout funnel tracking:
begin_checkout eventadd_shipping_info eventadd_payment_info eventpurchase eventMonitor your store's Core Web Vitals:
fetchpriority="high"Set up availability and error alerting:
Connect WooCommerce to GA4:
Track checkout funnel drop-off:
Set up uptime and error alerting:
/wp-admin/admin-ajax.php (WooCommerce uses this heavily)Monitor server health:
For custom storefronts, implement a full observability stack: OpenTelemetry for instrumentation, Prometheus or Grafana Cloud for metrics storage, and Grafana for dashboards.
Define commerce SLOs before building dashboards — these become your alert thresholds:
| Metric | Target | Alert threshold | |--------|--------|----------------| | Payment success rate | 95% | < 90% for 2 min | | Checkout P99 latency | < 3000ms | > 5000ms for 3 min | | Checkout availability | 99.9% | Zero starts for 2 min | | Catalog page P95 | < 1000ms | > 2000ms for 5 min |
Instrument your storefront with OpenTelemetry:
npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node \
@opentelemetry/exporter-metrics-otlp-http @opentelemetry/exporter-trace-otlp-http \
@opentelemetry/sdk-metrics
// instrumentation.ts — import before any other module
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
const sdk = new NodeSDK({
serviceName: 'commerce-storefront',
traceExporter: new OTLPTraceExporter({ url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT }),
metricReader: new PeriodicExportingMetricReader({
exporter: new OTLPMetricExporter({ url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT }),
exportIntervalMillis: 15000,
}),
instrumentations: [getNodeAutoInstrumentations({
'@opentelemetry/instrumentation-http': { enabled: true },
'@opentelemetry/instrumentation-pg': { enabled: true },
'@opentelemetry/instrumentation-ioredis': { enabled: true },
})],
});
sdk.start();
Track checkout funnel metrics:
// lib/metrics/checkout-metrics.ts
import { metrics } from '@opentelemetry/api';
const meter = metrics.getMeter('commerce-checkout');
const checkoutStarted = meter.createCounter('checkout.started');
const checkoutCompleted = meter.createCounter('checkout.completed');
const paymentAttempts = meter.createCounter('payment.attempts');
const paymentSuccesses = meter.createCounter('payment.successes');
const paymentFailures = meter.createCounter('payment.failures');
const orderCreationDuration = meter.createHistogram('order.creation.duration_ms', {
boundaries: [100, 250, 500, 1000, 2000, 5000, 10000],
});
export const checkoutMetrics = {
recordCheckoutStart(channel: string) {
checkoutStarted.add(1, { channel });
},
recordCheckoutComplete(channel: string, paymentMethod: string) {
checkoutCompleted.add(1, { channel, payment_method: paymentMethod });
},
recordPaymentAttempt(gateway: string, method: string) {
paymentAttempts.add(1, { gateway, method });
},
recordPaymentSuccess(gateway: string, method: string) {
paymentSuccesses.add(1, { gateway, method });
},
recordPaymentFailure(gateway: string, declineCode: string) {
paymentFailures.add(1, { gateway, decline_code: declineCode });
},
recordOrderCreation(durationMs: number, channel: string) {
orderCreationDuration.record(durationMs, { channel });
},
};
PromQL queries for Grafana dashboard panels:
# Payment success rate (gauge panel — target 95%)
rate(payment_successes_total[5m])
/
(rate(payment_successes_total[5m]) + rate(payment_failures_total[5m]))
# Checkout P99 latency
histogram_quantile(0.99,
sum(rate(order_creation_duration_ms_bucket[5m])) by (le, channel)
)
# Checkout funnel conversion rate
rate(checkout_completed_total[1h]) / rate(checkout_started_total[1h])
# Payment failures by decline code (top 5)
topk(5, sum(rate(payment_failures_total[5m])) by (decline_code))
Prometheus alerting rules:
# prometheus/commerce-alerts.yaml
groups:
- name: commerce-critical
rules:
- alert: PaymentSuccessRateLow
expr: |
(
rate(payment_successes_total[5m]) /
(rate(payment_successes_total[5m]) + rate(payment_failures_total[5m]))
) < 0.90
for: 2m
labels:
severity: critical
annotations:
summary: "Payment success rate below 90%"
description: "Rate is {{ $value | humanizePercentage }}. Check Stripe status page and recent deployments."
runbook: "https://wiki.mystore.com/runbooks/payment-failures"
- alert: CheckoutHighLatency
expr: |
histogram_quantile(0.99,
sum(rate(order_creation_duration_ms_bucket[5m])) by (le)
) > 5000
for: 3m
labels:
severity: warning
annotations:
summary: "Checkout P99 latency above 5 seconds"
description: "P99 is {{ $value }}ms. Check database slow query log and Redis connection pool."
- alert: CheckoutServiceDown
expr: rate(checkout_started_total[5m]) == 0
for: 2m
labels:
severity: critical
annotations:
summary: "No checkouts being started — possible service outage"
Add Real User Monitoring for Core Web Vitals:
// lib/rum.ts — initialize in root layout
import { onCLS, onINP, onLCP, onFCP, onTTFB } from 'web-vitals';
function sendVital(metric: any) {
navigator.sendBeacon?.('/api/rum/vitals', JSON.stringify({
name: metric.name,
value: metric.value,
rating: metric.rating, // 'good', 'needs-improvement', 'poor'
page: window.location.pathname,
}));
}
export function initRUM() {
onCLS(sendVital);
onINP(sendVital);
onLCP(sendVital);
onFCP(sendVital);
onTTFB(sendVital);
}
insufficient_funds is different from do_not_honor (possible fraud or issuer outage); they require completely different responsesrunbook URL pointing to a page describing how to diagnose and resolve that specific conditionfor duration to avoid alert flapping — a 30-second latency spike is normal; alerts with for: 2m only fire if the condition is sustained| Problem | Solution |
|---------|----------|
| Too many alerts, low signal-to-noise | Start with 3–5 high-value alerts (payment failures, checkout latency, service down); add more only after validating each fires at the right threshold |
| Metrics not recorded when errors occur | Instrument metrics before and after error-prone operations; a failed payment should still increment payment.failures even if it throws an exception |
| Dashboard looks healthy but revenue is down | Add business metrics (orders per minute, revenue per hour) alongside technical metrics; technical SLOs can be met while UX issues suppress conversion |
| RUM data skewed by bots | Filter RUM events by user agent; bot traffic distorts Core Web Vitals and can hide real user performance regressions |
| Shopify GA4 funnel shows no data | Verify that the GA4 Measurement ID in Online Store → Preferences matches your GA4 property; check the GA4 DebugView to confirm events are firing |
tools
Let shoppers save products to a wishlist, share it with friends, and get notified when saved items come back in stock or drop in price
development
Build a themeable storefront with design tokens and CSS custom properties that supports white-labeling, multi-brand variants, and dark mode
development
Speed up product discovery with instant search suggestions, fuzzy typo matching, and category-aware results powered by Algolia or Elasticsearch
development
Build a mobile-first storefront with thumb-friendly navigation, sticky add-to-cart buttons, and touch-optimized components for high mobile conversion