Operational Patterns

Patterns and recommendations for security, observability, CI/CD, database migrations, and environment management. Use when generating the Security Architecture, Observability, and DevOps Blueprint deliverables.

Security Architecture

Auth Strategy Selection

| Project Type | Recommended Auth | Rationale |
|-------------|-----------------|-----------|
| Simple app, few roles | Clerk or Supabase Auth | Managed service, minimal setup, built-in UI components |
| Multi-tenant SaaS | Auth0 or Clerk (organizations) | Organization-level isolation, role management, SSO |
| API-only / B2B | API keys + JWT | Simple machine-to-machine auth |
| Enterprise / compliance-heavy | Auth0 or Keycloak (self-hosted) | Fine-grained control, audit logs, compliance certifications |
| Mobile app | Firebase Auth or Clerk | Native SDK support, social login, biometrics |

API Security Checklist

Every REST API service should implement these protections:

| Protection | Implementation | Priority |
|-----------|---------------|----------|
| Rate limiting | express-rate-limit, Upstash Ratelimit, or API gateway rate limits | Must-have |
| Input validation | Zod schemas on all request bodies and query params. Reject unknown fields. | Must-have |
| CORS | Allow only specific, listed origins. Never use `*` in production. | Must-have |
| Security headers | helmet middleware for security headers (CSP, X-Frame-Options, HSTS) | Must-have |
| SQL/NoSQL injection | Parameterized queries only. Never interpolate user input into queries. | Must-have |
| XSS prevention | Sanitize HTML output. Use frameworks with built-in escaping (React, Next.js). | Must-have |
| CSRF protection | SameSite cookies + CSRF tokens for cookie-based auth. Not needed for bearer-only APIs. | Conditional |
| Request size limits | Limit body size (e.g. 1 MB default, higher for file uploads) | Should-have |
| Authentication | Verify the JWT or session on every protected route. Apply as middleware, not per route. | Must-have |
| Authorization | Check user roles/permissions after authentication, in separate middleware. | Must-have |
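
To make the rate limiting row concrete, here is a minimal fixed-window limiter sketch using only the Python standard library. It is an illustration of the idea, not how express-rate-limit or Upstash Ratelimit work internally; the class name, the `client_key` parameter, and the injectable `clock` are all assumptions made for this example.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client in each `window_seconds` window."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        # client key -> (window start time, requests counted in that window)
        self._windows: Dict[str, Tuple[float, int]] = {}

    def allow(self, client_key: str) -> bool:
        now = self.clock()
        start, count = self._windows.get(client_key, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # window expired: start a fresh one
        if count >= self.limit:
            self._windows[client_key] = (start, count)
            return False  # the caller would typically respond with HTTP 429
        self._windows[client_key] = (start, count + 1)
        return True
```

In a real service the counters would usually live in a shared store such as Redis so that every instance sees the same counts; the in-memory dictionary here only works for a single process.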

Data Protection

| Concern | Recommendation |
|---------|---------------|
| Encryption at rest | Use the database provider's built-in encryption. Supabase and MongoDB Atlas encrypt by default; on RDS, confirm that storage encryption is enabled when the instance is created. |
| Encryption in transit | TLS 1.3 on all endpoints. Enforce HTTPS redirects. Internal service-to-service traffic can use HTTP if it stays within a VPC. |
| PII handling | Identify PII fields (email, name, phone, address, IP). Log them only when necessary. Mask them in non-prod environments. |
| Secrets management | Never hardcode secrets. Use platform env vars (Vercel, Railway) or a secrets manager (Doppler, AWS SSM). Rotate API keys periodically. |
| Data retention | Define retention periods per data type. Implement soft deletes for user data. Support data export/deletion for GDPR. |
| Backups | Automated daily database backups with point-in-time recovery. Test restores quarterly. |
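
The PII handling row can be sketched as a small masking helper applied before records are logged or copied into non-prod environments. This is one possible approach under stated assumptions: the field names, the `[REDACTED]` placeholder, and the choice to keep the first character of an email address are illustrative, not requirements from this guide.

```python
import re

# Keep the first character of the local part and the full domain.
EMAIL_RE = re.compile(
    r"([A-Za-z0-9._%+-])[A-Za-z0-9._%+-]*@([A-Za-z0-9.-]+\.[A-Za-z]{2,})"
)

# Hypothetical field names; adjust to the project's own schema.
PII_FIELDS = frozenset({"name", "phone", "address", "ip"})


def mask_email(text: str) -> str:
    """Replace everything after the first character of the local part with ***."""
    return EMAIL_RE.sub(r"\1***@\2", text)


def mask_record(record: dict) -> dict:
    """Return a copy of `record` with PII fields masked."""
    masked = {}
    for key, value in record.items():
        if key == "email" and isinstance(value, str):
            masked[key] = mask_email(value)
        elif key in PII_FIELDS:
            masked[key] = "[REDACTED]"
        else:
            masked[key] = value
    return masked
```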

OWASP Top 10 Quick Reference

When assessing security, check against these common threats:

1. Broken Access Control: Ensure authorization checks on every endpoint, not just authentication
2. Cryptographic Failures: Don't store passwords in plaintext (use bcrypt/argon2 via auth provider), don't log tokens
3. Injection: Parameterized queries, input validation, no eval/exec of user input
4. Insecure Design: Rate limit auth endpoints, implement account lockout, use CAPTCHA on public forms
5. Security Misconfiguration: Remove default credentials, disable debug mode in production, review CORS
6. Vulnerable Components: Keep dependencies updated, run npm audit / pip-audit in CI
7. Authentication Failures: Use a managed auth provider, implement MFA for admin roles
8. Data Integrity Failures: Verify webhook signatures, validate JWT signatures, use CSP headers
9. Logging Failures: Log auth events, access denials, and input validation failures
10. SSRF: Validate and allowlist URLs before making server-side requests
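
As an illustration of the "verify webhook signatures" item, the sketch below checks an HMAC-SHA256 signature with a constant-time comparison. It assumes the sender signs the raw request body with a shared secret and sends the hex digest; real providers differ in header names, encodings, and timestamp handling, so check the provider's documentation before relying on this shape.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, payload: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw payload."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, payload: bytes, received_signature: str) -> bool:
    """Return True only if the signature matches the payload."""
    expected = sign_payload(secret, payload)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), received_signature.encode())
```

Verification should run against the exact bytes received, before any JSON parsing or re-serialization, because re-encoding can change the payload and invalidate the signature.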

Observability

Observability Stack Recommendations

| Project Size | Logging | Tracing | Metrics | Alerting | Monthly Cost |
|-------------|---------|---------|---------|----------|-------------|
| MVP / startup | Axiom (free tier) or console + Vercel logs | Not needed yet | Vercel/Railway built-in | Sentry (free tier) | $0 |
| Growing (1K-10K users) | Axiom or Betterstack | Sentry performance | PostHog + Sentry | Sentry + Slack webhooks | $20-50/mo |
| Production (10K+ users) | Datadog or Grafana Cloud | OpenTelemetry → Jaeger/Datadog | Prometheus + Grafana or Datadog | PagerDuty + Datadog | $100-500/mo |
| Enterprise | Datadog or Splunk | Datadog APM or Honeycomb | Datadog or custom Prometheus | PagerDuty + Datadog | $500+/mo |

Structured Logging

All services should use structured JSON logging:

{
  "level": "info",
  "timestamp": "2026-02-07T10:30:00.000Z",
  "service": "api-server",
  "requestId": "req_abc123",
  "userId": "usr_xyz",
  "action": "create_order",
  "duration_ms": 145,
  "message": "Order created successfully"
}

Logging rules:

- Use log levels consistently: error (broken), warn (degraded), info (business events), debug (dev only)
- Include requestId for request correlation across services
- Include userId for audit trail (mask in logs if compliance requires)
- Never log: passwords, tokens, full credit card numbers, API keys
- Always log: auth failures, permission denials, input validation errors, external API errors
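
The rules above can be sketched with Python's standard `logging` module and a JSON formatter that redacts sensitive keys. This is a minimal example under assumptions: passing per-request fields through `extra={"context": {...}}` is a convention chosen for this sketch, and the `SENSITIVE_KEYS` names are hypothetical.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical key names that must never reach the log output.
SENSITIVE_KEYS = {"password", "token", "api_key", "card_number"}


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        level = record.levelname.lower()
        entry = {
            "level": "warn" if level == "warning" else level,
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(timespec="milliseconds"),
            "service": self.service,
            "message": record.getMessage(),
        }
        # Per-request fields (requestId, userId, ...) arrive via `extra`.
        for key, value in getattr(record, "context", {}).items():
            entry[key] = "[REDACTED]" if key in SENSITIVE_KEYS else value
        return json.dumps(entry)


def build_logger(service: str) -> logging.Logger:
    logger = logging.getLogger(service)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter(service))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A call such as `build_logger("api-server").info("Order created successfully", extra={"context": {"requestId": "req_abc123", "duration_ms": 145}})` would then emit a single JSON line with those fields merged in.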

Key Metrics to Track

| Category | Metric | Alert Threshold |
|----------|--------|----------------|
| Availability | Uptime percentage | < 99.5% over 24h |
| Latency | Request duration p50, p95, p99 | p99 > 2s |
| Error rate | 5xx errors / total requests | > 1% over 5 minutes |
| Throughput | Requests per second | Unusual spike or drop (>3x baseline) |
| Queue depth | Jobs waiting in queue | > 1000 for > 5 minutes |
| Database | Connection pool usage, query duration | Pool > 80%, queries > 500ms |
| AI/LLM | Token usage, response time, failure rate | Failure rate > 5%, response > 30s |
| Business | Signups, conversions, active users | Unusual drops (context-dependent) |
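
To show how two of these thresholds could be evaluated, the sketch below computes a nearest-rank percentile over a window of request durations and checks the p99 latency and 5xx error rate rows. In practice a metrics backend (Prometheus, Datadog) would do this; the function names and alert labels here are assumptions for illustration.

```python
import math
from typing import List, Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate_alerts(durations_ms: Sequence[float], status_codes: Sequence[int]) -> List[str]:
    """Return the names of the thresholds breached in this window of requests."""
    alerts = []
    if percentile(durations_ms, 99) > 2000:  # p99 > 2s
        alerts.append("latency_p99")
    error_rate = sum(1 for code in status_codes if code >= 500) / len(status_codes)
    if error_rate > 0.01:  # more than 1% 5xx responses
        alerts.append("error_rate")
    return alerts
```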

Health Check Pattern

Every service exposes /health with tiered checks:

{
  "status": "healthy",
  "service": "api-server",
  "version": "1.2.3",
  "uptime_seconds": 86400,
  "checks": {
    "database": { "status": "healthy", "latency_ms": 5 },
    "redis": { "status": "healthy", "latency_ms": 2 },
    "external_api": { "status": "degraded", "latency_ms": 1500, "note": "slow but responding" }
  }
}

- /health: quick liveness check (returns 200 if the process is running)
- /health/ready: readiness check (returns 200 only if all dependencies are reachable)
- Both endpoints are used by load balancers, container orchestrators, and monitoring
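
The liveness/readiness split can be sketched framework-independently as two functions that return an HTTP status and a response body. This is a sketch under assumptions: the probe callables, the 1000 ms "degraded" cutoff, and the choice to return 200 for a degraded but reachable dependency are illustrative decisions, not part of the pattern as stated.

```python
import time
from typing import Callable, Dict, Tuple


def run_check(probe: Callable[[], None], slow_after_ms: float = 1000.0) -> dict:
    """Run one dependency probe; a raised exception means the dependency is unreachable."""
    started = time.perf_counter()
    try:
        probe()
    except Exception as exc:
        return {"status": "unhealthy", "error": str(exc)}
    latency_ms = (time.perf_counter() - started) * 1000
    status = "degraded" if latency_ms > slow_after_ms else "healthy"
    return {"status": status, "latency_ms": round(latency_ms)}


def liveness() -> Tuple[int, dict]:
    """/health: succeeds whenever the process can answer at all."""
    return 200, {"status": "healthy"}


def readiness(probes: Dict[str, Callable[[], None]]) -> Tuple[int, dict]:
    """/health/ready: 503 if any dependency is unreachable, otherwise 200."""
    checks = {name: run_check(probe) for name, probe in probes.items()}
    statuses = {check["status"] for check in checks.values()}
    if "unhealthy" in statuses:
        return 503, {"status": "unhealthy", "checks": checks}
    overall = "degraded" if "degraded" in statuses else "healthy"
    return 200, {"status": overall, "checks": checks}
```

Each probe would wrap a cheap operation such as `SELECT 1` against the database or a `PING` against Redis, ideally with a short timeout so the health endpoint itself stays fast.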

CI/CD Pipeline

Pipeline Templates by Provider

GitHub Actions (recommended for most projects):

Stages: lint → test → build → security → deploy

Triggers:
  - Push to main → deploy to production
  - Push to develop → deploy to staging
  - Pull request → run lint + test only
  - Manual dispatch → deploy to any environment
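
The trigger rules above amount to a small decision table. The sketch below expresses them as a Python function purely to make the mapping explicit; an actual pipeline would encode this in the workflow file. The event names `push`, `pull_request`, and `workflow_dispatch` are GitHub Actions event names, while the function name, the stage labels, and limiting manual targets to staging and production are assumptions for this example.

```python
from typing import List, Optional, Tuple

CHECKS = ["lint", "test"]
FULL = CHECKS + ["build", "deploy"]


def plan_pipeline(event: str, branch: str = "",
                  target: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Return (stages to run, environment to deploy to or None)."""
    if event == "pull_request":
        return CHECKS, None  # validate only, never deploy
    if event == "push" and branch == "main":
        return FULL, "production"
    if event == "push" and branch == "develop":
        return FULL, "staging"
    if event == "workflow_dispatch" and target in {"staging", "production"}:
        return FULL, target  # manual deploy to the chosen environment
    return [], None  # anything else does not trigger the pipeline
```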

Pipeline stages:

| Stage | What It Does | Tools |
|-------|-------------|-------|
| Lint | Code style, formatting, type checking | ESLint, Prettier, tsc --noEmit (Node.js); Ruff, mypy (Python) |
| Test | Unit tests, integration tests | Jest, Vitest, pytest |
| Build | Compile, bundle, Docker image | tsc, next build, docker build |
| Security | Dependency audit, secret scanning | npm audit, pip-audit, Trivy, GitGuardian |
| Deploy | Push to hosting provider | Vercel CLI, Railway CLI, AWS CDK, Docker push |

Branch Strategy Selection

| Team Size | Recommended Strategy | Workflow |
|-----------|---------------------|----------|
| Solo / 1-2 devs | github-flow | main + feature branches. Merge via PR. Deploy on merge to main. |
| 3-5 devs | github-flow | Same, but require PR reviews. Use a staging environment for pre-prod testing. |
| 6-10 devs | gitflow or trunk-based | Gitflow if you need scheduled releases. Trunk-based if you deploy continuously. |
| 11+ devs | trunk-based with feature flags | Short-lived branches (<1 day). Feature flags for incomplete features. |

Environment Promotion

Feature branch → PR review → merge to develop → auto-deploy to staging →
manual promote to production (merge develop → main) → auto-deploy to production

For simpler projects:

Feature branch → PR review → merge to main → auto-deploy to production

Database Migrations

Migration Tool Selection

| Stack | Recommended Tool | Alternatives |
|-------|-----------------|-------------|
| Node.js + PostgreSQL | Prisma Migrate | Knex, TypeORM, Drizzle Kit |
| Node.js + MongoDB | Mongoose (schema-on-read) | migrate-mongo |
| Python + PostgreSQL | Alembic | Django migrations, SQLAlchemy-migrate |
| Python + MongoDB | No formal migrations needed | mongomock for testing |

Migration Strategy

| Concern | Recommendation |
|---------|---------------|
| Versioning | Sequential numbered migrations (001_create_users.sql, 002_add_orders.sql). Never edit applied migrations. |
| Rollback | Every migration has an up and a down. Test rollbacks before deploying. |
| CI integration | Run pending migrations automatically in CI before tests. Run in staging before production. |
| Zero-downtime | Avoid breaking changes in one step. Add column → backfill → make required → remove old. |
| Seed data | Dev seeds: faker/factory data for local development. Staging seeds: anonymized subset of production data. |
| Production | Run migrations before deploying new code. Use advisory locks to prevent concurrent migrations. |
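
The versioning rule (sequential migrations, each applied once, never edited) can be sketched as a tiny runner. This example uses an in-memory SQLite database from the standard library so it runs anywhere; it is not how Prisma Migrate or Alembic work internally, and the `schema_migrations` table name and the inline migration list are assumptions. Advisory locks are a PostgreSQL feature and are not shown.

```python
import sqlite3
from typing import List, Sequence, Tuple

# (version, SQL) pairs; in a real project these would be files on disk.
MIGRATIONS: List[Tuple[str, str]] = [
    ("001_create_users", "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"),
    ("002_add_orders", "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL)"),
]


def apply_pending(conn: sqlite3.Connection,
                  migrations: Sequence[Tuple[str, str]] = MIGRATIONS) -> List[str]:
    """Apply migrations not yet recorded, in version order; return the versions that ran."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (version TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    ran = []
    for version, sql in sorted(migrations):
        if version in applied:
            continue  # already applied: never re-run or edit
        conn.execute(sql)
        conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
        conn.commit()  # record the version as soon as its migration succeeds
        ran.append(version)
    return ran
```

Running `apply_pending` twice is safe: the second call finds every version in `schema_migrations` and does nothing, which is the property that makes it reasonable to run migrations automatically in CI and on deploy.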

Common Migration Patterns

| Pattern | When | Example |
|---------|------|---------|
| Add nullable column | Safe, no downtime | ALTER TABLE users ADD COLUMN phone TEXT; |
| Rename column | Requires migration in 2 steps | Step 1: Add new column + backfill. Step 2: Drop old column. |
| Add index | Can lock table on large datasets | Use CREATE INDEX CONCURRENTLY on PostgreSQL |
| Change column type | Risky; may lose data | Create new column, migrate data, drop old column |
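
The two-step rename can be walked through on an in-memory SQLite database. The table and column names (`users`, `fullname`, `display_name`) are hypothetical, and step 2 is shown only as a SQL string because it belongs in a later release, after all application code has stopped reading the old column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
conn.executemany("INSERT INTO users (fullname) VALUES (?)", [("Ada",), ("Linus",)])

# Step 1 (expand): add the new nullable column, then backfill it from the old one.
# Old code keeps working because `fullname` is still present.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
conn.execute("UPDATE users SET display_name = fullname WHERE display_name IS NULL")
conn.commit()

# Step 2 (contract): run in a later migration, once nothing reads `fullname`.
DROP_OLD_COLUMN = "ALTER TABLE users DROP COLUMN fullname"

rows = conn.execute("SELECT fullname, display_name FROM users ORDER BY id").fetchall()
```

Between the two steps the application typically writes to both columns, so rows created after the backfill do not end up with an empty `display_name`.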

Environment Strategy

Environment Definitions

| Environment | Purpose | Data | Access | Deploy Trigger |
|------------|---------|------|--------|---------------|
| Local | Developer machine | Seed data / Docker Compose | Developer only | Manual |
| Development | Shared dev environment | Seed data | Dev team | Push to develop |
| Staging | Pre-production testing | Anonymized prod data or rich seeds | Dev team + QA | Push to staging or manual promote |
| Production | Live users | Real data | Restricted access | Push to main or manual promote |

Config Management

Environment variable categories:

| Category | Examples | Where Stored |
|----------|---------|-------------|
| Service config | PORT, NODE_ENV, LOG_LEVEL | .env file (local), platform env vars (deployed) |
| Database | DATABASE_URL, REDIS_URL | Platform env vars, secrets manager |
| Third-party API keys | STRIPE_SECRET_KEY, SENDGRID_API_KEY | Secrets manager (Doppler, AWS SSM) |
| Feature flags | ENABLE_AI_AGENT, ENABLE_BETA_FEATURES | Feature flag service or env vars |
| Internal service URLs | API_SERVER_URL, AGENT_SERVICE_URL | Platform env vars, service discovery |

Config validation:

- Validate all environment variables on service startup using Zod, envalid, or pydantic-settings
- Fail fast with clear error messages if required vars are missing
- Log which environment the service is running in (but never log secret values)
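
A fail-fast startup check can be sketched with only the standard library, as a stand-in for what Zod, envalid, or pydantic-settings provide with richer features. The `Settings` fields and the choice of required variables are assumptions drawn from the example names in the table above.

```python
from dataclasses import dataclass
from typing import Mapping


class ConfigError(Exception):
    """Raised at startup when the environment is incomplete or malformed."""


@dataclass(frozen=True)
class Settings:
    node_env: str
    port: int
    database_url: str
    log_level: str = "info"


REQUIRED = ["NODE_ENV", "PORT", "DATABASE_URL"]


def load_settings(env: Mapping[str, str]) -> Settings:
    """Validate the environment once; raise ConfigError naming every missing variable."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        # Report variable names only, never values, so secrets cannot leak into logs.
        raise ConfigError("Missing required environment variables: " + ", ".join(missing))
    try:
        port = int(env["PORT"])
    except ValueError:
        raise ConfigError("PORT must be an integer") from None
    return Settings(
        node_env=env["NODE_ENV"],
        port=port,
        database_url=env["DATABASE_URL"],
        log_level=env.get("LOG_LEVEL", "info"),
    )
```

At process start this would be called as `load_settings(os.environ)`, letting the service exit immediately with a readable message instead of failing later on the first database call.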

Feature Flags

| Approach | When to Use | Tool |
|----------|------------|------|
| Environment variables | Simple on/off for 1-2 features | ENABLE_FEATURE_X=true |
| Config file | Multiple flags, no runtime changes needed | features.json loaded on startup |
| Feature flag service | Runtime toggling, gradual rollouts, A/B testing | LaunchDarkly ($10/mo), Unleash (open source), PostHog (free tier) |
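
The first and last rows can be illustrated together: a boolean flag read from an environment variable, and a deterministic percentage rollout of the kind a flag service performs. This is a simplified sketch, not the algorithm used by LaunchDarkly, Unleash, or PostHog; the accepted truthy strings and the hash-based bucketing are assumptions.

```python
import hashlib
from typing import Mapping

TRUTHY = {"1", "true", "yes", "on"}


def flag_enabled(env: Mapping[str, str], name: str, default: bool = False) -> bool:
    """Read an on/off flag such as ENABLE_FEATURE_X=true from the environment."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    """Place each user in a stable bucket 0-99; enable the flag for buckets below `percent`."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent
```

Because the bucket depends only on the flag name and user ID, a user who sees the feature at 30% keeps seeing it as the rollout widens to 60% or 100%.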

Choosing What to Include

Not every project needs all operational patterns. Use this guide:

| Project Stage | Include | Skip |
|-------------|---------|------|
| MVP / proof of concept | Basic auth, console logging, simple CI (lint + test + deploy), env vars | Tracing, alerting, feature flags, multi-environment |
| Early startup (pre-product-market fit) | Managed auth, structured logging, Sentry, GitHub Actions CI/CD, staging env | APM, custom metrics, PagerDuty, complex migration strategy |
| Growing product (1K+ users) | All security checklist items, observability stack, full CI/CD pipeline, migration tooling, staging + production | Enterprise compliance, self-hosted tooling |
| Production / enterprise | Everything above + compliance audits, APM, distributed tracing, PagerDuty, feature flags, multi-region | Nothing; you need it all |

When generating blueprints, match the depth to the project's stage and complexity. Don't overwhelm an MVP with enterprise patterns.
