skills/data-quality-guardian/SKILL.md
Great Expectations, dbt tests, anomaly detection, and data contracts for data quality. Activate on: data quality, data validation, Great Expectations, data contract, anomaly detection, SLA, freshness check, schema validation. NOT for: dbt model structure (use dbt-analytics-engineer), schema evolution (use schema-evolution-manager).
Install this skill globally with one command: `npx skillsauth add curiositech/windags-skills data-quality-guardian`. Works with Claude Code, Cursor, and Windsurf.
Implement comprehensive data quality frameworks with automated validation, anomaly detection, data contracts, and SLA monitoring.
Activate on: "data quality", "data validation", "Great Expectations", "data contract", "anomaly detection", "SLA", "freshness check", "schema validation", "data observability", "Soda", "elementary"
NOT for: dbt project layout → dbt-analytics-engineer | Schema evolution strategy → schema-evolution-manager | Data migration validation → data-migration-specialist
| Domain | Technologies |
|--------|--------------|
| Testing | dbt tests, Great Expectations 1.x, Soda Core 3.x |
| Observability | elementary (dbt), Monte Carlo, Anomalo |
| Contracts | Soda data contracts, dbt model contracts, Protobuf schemas |
| Anomaly Detection | elementary anomaly monitors, custom z-score, Prophet |
| Alerting | Slack/PagerDuty integration, SLA miss alerts |
Raw Data (Landing)
↓
Layer 1: Schema Validation
- Column types match contract
- No unexpected NULLs in required fields
- Row count within expected range
↓
Layer 2: Business Rule Validation
- Referential integrity (FK relationships hold)
- accepted_values constraints
- Custom SQL assertions (e.g., revenue >= 0)
↓
Layer 3: Statistical Anomaly Detection
- Row count deviation from 7-day rolling average
- Null rate spike detection
- Distribution shift (KL divergence)
↓
Layer 4: Freshness & SLA Monitoring
- Source loaded within 2 hours
- Downstream models built within 4 hours
- Dashboard data <6 hours old
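Two of the checks in the layers above, distribution shift (Layer 3) and freshness (Layer 4), can be sketched in plain Python. This is a minimal illustration, not the implementation of any tool listed earlier: the function names `kl_divergence` and `is_fresh`, the `eps` smoothing constant, and the example counts are all assumptions made for the sketch.

```python
import math
from datetime import datetime, timedelta, timezone


def kl_divergence(p_counts, q_counts, eps=1e-9):
    """KL(P || Q) for two discrete distributions given as aligned lists of counts.

    p_counts is the current batch, q_counts the baseline. eps is an assumed
    floor that keeps the ratio finite when a baseline bucket is empty.
    """
    p_total, q_total = sum(p_counts), sum(q_counts)
    kl = 0.0
    for p_count, q_count in zip(p_counts, q_counts):
        p_prob = p_count / p_total
        q_prob = max(q_count / q_total, eps)
        if p_prob > 0:  # a zero-probability bucket contributes nothing
            kl += p_prob * math.log(p_prob / q_prob)
    return kl


def is_fresh(last_loaded_at, max_age, now=None):
    """True when the most recent load happened within max_age of now."""
    now = now or datetime.now(timezone.utc)
    return now - last_loaded_at <= max_age


# Example: share of payment statuses today vs. a baseline (made-up counts).
baseline = [50, 50]
today = [90, 10]
shift = kl_divergence(today, baseline)

# Example: the "source loaded within 2 hours" rule from Layer 4.
check_time = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
loaded_at = datetime(2024, 1, 1, 10, 30, tzinfo=timezone.utc)
source_ok = is_fresh(loaded_at, timedelta(hours=2), now=check_time)
```

An alert threshold for the divergence value is not given above and would need to be tuned per column against historical batches.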
# contracts/payments-contract.yml
contract:
  name: stripe_payments
  owner: payments-team
  version: "2.0"
  sla:
    freshness: 2h          # must be updated within 2 hours
    volume_min: 1000       # at least 1000 rows per day
    volume_max: 500000     # no more than 500k (anomaly if exceeded)
  schema:
    - name: payment_id
      type: string
      required: true
      unique: true
    - name: amount_cents
      type: integer
      required: true
      checks:
        - type: range
          min: 0
          max: 10000000    # $100k max
    - name: status
      type: string
      required: true
      checks:
        - type: accepted_values
          values: [succeeded, failed, pending, refunded]
    - name: created_at
      type: timestamp
      required: true
      checks:
        - type: not_in_future
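To make the contract's checks concrete, here is a rough sketch of how they might be enforced in plain Python. It does not parse the YAML or use any contract tool; the constants are copied by hand from the contract, and the function names `validate_row` and `validate_batch` are hypothetical.

```python
from datetime import datetime, timezone

# Values copied by hand from the stripe_payments contract.
REQUIRED_FIELDS = ("payment_id", "amount_cents", "status", "created_at")
ACCEPTED_STATUSES = {"succeeded", "failed", "pending", "refunded"}
MAX_AMOUNT_CENTS = 10_000_000  # $100k max
VOLUME_MIN, VOLUME_MAX = 1_000, 500_000


def validate_row(row, now=None):
    """Return a list of contract violations for one row (empty if valid)."""
    now = now or datetime.now(timezone.utc)
    errors = []
    for field in REQUIRED_FIELDS:
        if row.get(field) is None:
            errors.append(f"{field}: required field is missing")
    amount = row.get("amount_cents")
    if amount is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(amount, bool) or not isinstance(amount, int):
            errors.append("amount_cents: expected integer")
        elif not 0 <= amount <= MAX_AMOUNT_CENTS:
            errors.append(f"amount_cents: {amount} outside [0, {MAX_AMOUNT_CENTS}]")
    status = row.get("status")
    if status is not None and status not in ACCEPTED_STATUSES:
        errors.append(f"status: unexpected value {status!r}")
    created_at = row.get("created_at")
    if created_at is not None and created_at > now:
        errors.append("created_at: timestamp is in the future")
    return errors


def validate_batch(rows, now=None):
    """Row-level checks plus payment_id uniqueness and daily volume bounds."""
    errors = []
    if not VOLUME_MIN <= len(rows) <= VOLUME_MAX:
        errors.append(f"volume: {len(rows)} rows outside [{VOLUME_MIN}, {VOLUME_MAX}]")
    seen = set()
    for i, row in enumerate(rows):
        errors.extend(f"row {i}: {e}" for e in validate_row(row, now))
        payment_id = row.get("payment_id")
        if payment_id is not None and payment_id in seen:
            errors.append(f"row {i}: payment_id: duplicate value {payment_id!r}")
        seen.add(payment_id)
    return errors
```

Returning a list of violations rather than raising on the first one lets a pipeline report every broken rule in a batch at once, which is usually more useful for the owning team named in the contract.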
# dbt model with elementary anomaly monitors
version: 2
models:
  - name: fct_orders
    tests:
      - elementary.volume_anomalies:
          timestamp_column: created_at
          where: "created_at > dateadd(day, -30, current_date())"
          sensitivity: 3  # z-score threshold
      - elementary.column_anomalies:
          column_name: total_amount
          where: "created_at > dateadd(day, -30, current_date())"
      - elementary.freshness_anomalies:
          timestamp_column: _loaded_at
          sensitivity: 2
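The `sensitivity` values above are described as z-score thresholds. The following sketch shows what such a threshold means for a daily row count. It is a simplified illustration and is not elementary's actual algorithm; the function name `is_volume_anomaly` and the sample history are assumptions.

```python
from statistics import mean, stdev


def is_volume_anomaly(history, today, sensitivity=3.0):
    """Flag today's row count when its z-score against history exceeds sensitivity.

    history is a list of recent daily row counts (for example the last 7 days).
    """
    if len(history) < 2:
        return False  # not enough data to estimate the spread
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    if sigma == 0:
        return today != mu  # any change from a perfectly flat history is unusual
    return abs(today - mu) / sigma > sensitivity


# Seven days of made-up row counts for fct_orders.
last_7_days = [1000, 1020, 980, 1010, 990, 1005, 995]

normal_day = is_volume_anomaly(last_7_days, 1005)                    # small deviation
outage_day = is_volume_anomaly(last_7_days, 400)                     # large drop
borderline_strict = is_volume_anomaly(last_7_days, 1035, sensitivity=2)
borderline_loose = is_volume_anomaly(last_7_days, 1035, sensitivity=3)
```

A lower sensitivity flags smaller deviations, so the same borderline count can alert at 2 and pass at 3. That trade-off is why the freshness monitor above can reasonably run with a tighter threshold than the volume monitor.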
unique, not_null, accepted_values, relationships