.claude/skills/ts-data-validator/SKILL.md
Validate data quality in CSV, JSON, and database exports by checking for missing values, type mismatches, duplicates, outliers, and schema violations. Use when building ETL pipelines, auditing data imports, checking data freshness, or ensuring data contracts between teams. Trigger words: data quality, validation, null values, duplicates, schema check, data contract, ETL, pipeline, data drift.
Perform comprehensive data quality checks on datasets — validate schemas, detect anomalies, find duplicates, and enforce data contracts. Essential for ETL pipelines where bad data silently corrupts downstream analytics and dashboards.
Before validating, understand the data:
Present as a data profile summary:
Dataset Profile: orders_export.csv
Rows: 142,847 | Columns: 12
| Column | Type | Nulls | Unique | Sample Values |
|---------------|---------|--------|---------|------------------------|
| order_id | string | 0% | 142,847 | ORD-20260217-001 |
| customer_id | integer | 0.3% | 28,491 | 10042, 10043 |
| amount | float | 0% | 8,234 | 29.99, 149.00 |
| created_at | date | 0% | 89,112 | 2026-02-17T14:23:01Z |
| status | string | 0% | 5 | completed, pending |
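A profile like the one above could be produced along these lines. This is a minimal sketch using only the Python standard library; the helper names `infer_type` and `profile_csv` and the inline sample rows are illustrative assumptions, not part of the skill itself, and the type inference is deliberately naive (integer, float, or string).

```python
import csv
import io


def infer_type(values):
    """Guess a column type from its non-empty string values (naive)."""
    def all_parse(cast):
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if not values:
        return "empty"
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    return "string"


def profile_csv(text, sample_size=2):
    """Return one dict per column: type, null %, unique count, sample values."""
    rows = list(csv.DictReader(io.StringIO(text)))
    profile = []
    for column in (rows[0].keys() if rows else []):
        raw = [row[column] for row in rows]
        present = [v for v in raw if v not in ("", None)]
        profile.append({
            "column": column,
            "type": infer_type(present),
            "null_pct": round(100 * (len(raw) - len(present)) / len(raw), 1),
            "unique": len(set(present)),
            "samples": present[:sample_size],
        })
    return profile


# Hypothetical three-row extract, shaped like the orders export above.
SAMPLE = """order_id,customer_id,amount,status
ORD-20260217-001,10042,29.99,completed
ORD-20260217-002,,149.00,pending
ORD-20260217-003,10043,29.99,completed
"""

for column_profile in profile_csv(SAMPLE):
    print(column_profile)
```

For a real export, the same function could be fed the file contents read from disk; very large files would call for a streaming variant rather than loading every row into memory.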
Apply these checks systematically:
- **Completeness**: Are required fields populated?
- **Uniqueness**: Are IDs actually unique?
- **Type consistency**: Do values match expected types?
- **Range validity**: Are values within expected bounds?
- **Referential integrity**: Do foreign keys match?
- **Freshness**: Is the data up to date?
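Each of the six checks can be expressed as a small function over rows. The sketch below assumes rows are already parsed into dictionaries with typed values; the function names, the sample rows, and the 24-hour freshness window are hypothetical choices for illustration. The $10,000 upper bound mirrors the outlier threshold used in the report example.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def completeness(rows, field, max_null_pct=0.0):
    """Required fields populated: null share must not exceed the threshold."""
    nulls = sum(1 for r in rows if r.get(field) in (None, ""))
    return 100 * nulls / len(rows) <= max_null_pct


def uniqueness(rows, field):
    """IDs actually unique: return the values that occur more than once."""
    counts = Counter(r[field] for r in rows)
    return sorted(v for v, n in counts.items() if n > 1)


def type_consistency(rows, field, expected):
    """Values match the expected type: return the offending values."""
    return [r[field] for r in rows if not isinstance(r[field], expected)]


def range_validity(rows, field, low, high):
    """Values within bounds: return the values outside [low, high]."""
    return [r[field] for r in rows if not low <= r[field] <= high]


def referential_integrity(rows, field, known_keys):
    """Foreign keys match: return keys with no parent record."""
    return sorted({r[field] for r in rows} - set(known_keys))


def freshness(rows, field, now, max_age):
    """Data up to date: the newest timestamp must be within max_age of now."""
    return now - max(r[field] for r in rows) <= max_age


# Hypothetical rows; the third one duplicates an ID and breaks two other rules.
NOW = datetime(2026, 2, 17, 14, 30, tzinfo=timezone.utc)
ROWS = [
    {"order_id": "ORD-1", "customer_id": 10042, "amount": 29.99,
     "created_at": NOW - timedelta(hours=2)},
    {"order_id": "ORD-2", "customer_id": 10043, "amount": 149.00,
     "created_at": NOW - timedelta(hours=1)},
    {"order_id": "ORD-2", "customer_id": 99999, "amount": -5.00,
     "created_at": NOW - timedelta(minutes=5)},
]

print(completeness(ROWS, "customer_id"))                           # True
print(uniqueness(ROWS, "order_id"))                                # ['ORD-2']
print(type_consistency(ROWS, "amount", float))                     # []
print(range_validity(ROWS, "amount", 0, 10_000))                   # [-5.0]
print(referential_integrity(ROWS, "customer_id", {10042, 10043}))  # [99999]
print(freshness(ROWS, "created_at", NOW, timedelta(hours=24)))     # True
```

Returning the offending values rather than a bare pass/fail keeps the sample evidence available for the validation report.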
Structure results as a validation report:
## Data Validation Report
Dataset: orders_export.csv | Checked: 2026-02-17 14:30 UTC
### ❌ Failed Checks (3)
1. **Duplicate order_id** — 23 duplicate IDs found (0.016%)
Sample: ORD-20260215-4421 appears 3 times
2. **Null customer_email** — 2.1% null (threshold: 1%)
Spike on 2026-02-15 (bulk import batch)
3. **Future dates in created_at** — 7 rows have dates in 2027
### ⚠️ Warnings (2)
1. **Amount outliers** — 4 orders exceed $10,000 (review manually)
2. **Status enum drift** — New value "on_hold" not in expected set
### ✅ Passed Checks (8)
- order_id populated: PASS (0% null)
- Amount non-negative: PASS
- created_at lower bound: PASS (no dates earlier than expected; future dates are reported under Failed Checks)
...
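One way to assemble that layout from check results is sketched below. The result shape (`name`, `status`, `detail`) and the function `render_report` are assumptions made for this example; the separator between a check name and its detail is simplified to a colon.

```python
def render_report(dataset, checked_at, results):
    """Group check results by status and render them in the report layout."""
    sections = [
        ("failed", "### ❌ Failed Checks"),
        ("warning", "### ⚠️ Warnings"),
        ("passed", "### ✅ Passed Checks"),
    ]
    lines = [
        "## Data Validation Report",
        f"Dataset: {dataset} | Checked: {checked_at}",
    ]
    for status, heading in sections:
        matching = [r for r in results if r["status"] == status]
        lines.append(f"{heading} ({len(matching)})")
        for number, result in enumerate(matching, start=1):
            if status == "passed":
                # Passed checks are a flat bullet list with no detail.
                lines.append(f"- {result['name']}: PASS")
            else:
                # Failures and warnings are numbered and carry evidence.
                lines.append(f"{number}. **{result['name']}**: {result['detail']}")
    return "\n".join(lines)


# Hypothetical results echoing a few rows of the report above.
RESULTS = [
    {"name": "Duplicate order_id", "status": "failed",
     "detail": "23 duplicate IDs found (0.016%)"},
    {"name": "Amount outliers", "status": "warning",
     "detail": "4 orders exceed $10,000 (review manually)"},
    {"name": "Amount non-negative", "status": "passed", "detail": ""},
]

print(render_report("orders_export.csv", "2026-02-17 14:30 UTC", RESULTS))
```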
For each failed check, recommend a concrete action:
Input: "Validate this customer export before importing to our new CRM"
Output:
## Validation Report: customers_export.csv
Rows: 8,234 | Columns: 9
### ❌ Failed (2)
1. **Email format invalid** — 142 rows (1.7%) have malformed emails
Pattern: missing @ symbol, spaces in domain
Find affected rows: SELECT * FROM data WHERE email NOT LIKE '%_@_%.__%' (catches a missing @ or dot; spaces in the domain need a separate check)
2. **Phone duplicates** — 34 phone numbers shared by multiple customers
Likely data entry errors or shared business lines
Fix: Review and merge duplicate customer records
### ✅ Passed (6)
- customer_id unique: PASS
- Required fields (name, email): PASS (99.8%)
- Country codes valid ISO-3166: PASS
- Created dates in range: PASS
- No PII in notes field: PASS
- UTF-8 encoding clean: PASS
Recommendation: Fix the 142 invalid emails before import.
The phone duplicates are non-blocking but should be reviewed.
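The two failed checks in this example could be implemented roughly as follows. The regular expression is a deliberately loose assumption (it only requires `local@domain.tld` with no whitespace, not full RFC compliance), and the field names `customer_id`, `email`, and `phone` are hypothetical column names for the export.

```python
import re
from collections import defaultdict

# Loose pattern: one "@", a dot in the domain, a TLD of 2+ chars, no spaces.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]{2,}")


def invalid_emails(customers):
    """Return the customers whose email is missing or malformed."""
    return [c for c in customers if not EMAIL_RE.fullmatch(c.get("email") or "")]


def shared_phones(customers):
    """Map each phone number to the customer IDs sharing it (two or more)."""
    owners = defaultdict(list)
    for customer in customers:
        if customer.get("phone"):
            owners[customer["phone"]].append(customer["customer_id"])
    return {phone: ids for phone, ids in owners.items() if len(ids) > 1}


# Hypothetical records covering both failure patterns from the report.
CUSTOMERS = [
    {"customer_id": 1, "email": "ana@example.com", "phone": "555-0100"},
    {"customer_id": 2, "email": "bob.example.com", "phone": "555-0100"},   # no @
    {"customer_id": 3, "email": "eve@exa mple.com", "phone": "555-0199"},  # space
]

print([c["customer_id"] for c in invalid_emails(CUSTOMERS)])  # [2, 3]
print(shared_phones(CUSTOMERS))                               # {'555-0100': [1, 2]}
```

Shared phone numbers are returned with their owners instead of being dropped automatically, since the report treats them as a review item rather than a blocking failure.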
Input: "Check if the analytics events from our tracking pipeline match the expected schema"
Output:
## Schema Contract Validation: events_2026_02_17.json
Events: 1.2M | Schema version: v3.2
### ❌ Contract Violations (1)
1. **Missing required field: session_id**
- 12,847 events (1.07%) missing session_id
- All from source: mobile-ios-sdk v4.1.0
- Started: 2026-02-16 09:00 UTC (correlates with SDK release)
### ⚠️ Warnings (1)
1. **New field detected: device_model**
- Not in schema v3.2 — likely added in SDK update
- 100% populated on mobile events, absent on web
### ✅ Contract Compliance: 98.93%
Action: Pin mobile-ios-sdk to v4.0.x or update schema to v3.3
with session_id as optional for mobile sources.
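A contract check of this kind can be sketched as below. The `REQUIRED` and `OPTIONAL` field sets are hypothetical stand-ins for schema v3.2, whose actual definition is not shown here; only `session_id` and `device_model` come from the example itself.

```python
from collections import Counter

# Hypothetical stand-in for schema v3.2; the field names are illustrative only.
REQUIRED = {"event_id", "session_id", "timestamp"}
OPTIONAL = {"source", "sdk_version"}


def validate_contract(events, required=REQUIRED, optional=OPTIONAL):
    """Count events missing required fields and fields outside the schema."""
    missing_by_source = Counter()
    unknown_fields = Counter()
    violations = 0
    for event in events:
        fields = set(event)
        missing = required - fields
        if missing:
            violations += 1
            for name in missing:
                # Group by source so a bad SDK release stands out.
                missing_by_source[(name, event.get("source", "unknown"))] += 1
        unknown_fields.update(fields - required - optional)
    compliance = 100 * (len(events) - violations) / len(events)
    return {
        "compliance_pct": round(compliance, 2),
        "missing_by_source": dict(missing_by_source),
        "unknown_fields": dict(unknown_fields),
    }


# Hypothetical events: one mobile event lacks session_id, two carry a new field.
EVENTS = [
    {"event_id": "e1", "session_id": "s1", "timestamp": "2026-02-17T09:00:00Z",
     "source": "web"},
    {"event_id": "e2", "session_id": "s2", "timestamp": "2026-02-17T09:00:01Z",
     "source": "web"},
    {"event_id": "e3", "session_id": "s3", "timestamp": "2026-02-17T09:00:02Z",
     "source": "mobile-ios-sdk", "device_model": "iPhone"},
    {"event_id": "e4", "timestamp": "2026-02-17T09:00:03Z",
     "source": "mobile-ios-sdk", "device_model": "iPhone"},
]

print(validate_contract(EVENTS))
```

Unknown fields are counted separately from violations, which matches the report's treatment of `device_model` as a warning rather than a contract failure.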