src/orchestrator/skills/data-engineering/SKILL.md
Transforms, validates, and loads data in ETL pipelines. Use when building scrapers, validating NDJSON feeds, or importing data into CMS/DB targets.
npx skillsauth add etylsarin/opencastle data-engineeringInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Generic pipeline patterns. For project-specific sources and full schema references see REFERENCE.md.
Launch a headless browser cluster (Puppeteer Cluster / Playwright) with retryLimit: 3, retryDelay: 5000, timeout: 30000, args: ['--no-sandbox', '--disable-setuid-sandbox'].
One record per line. Schema:
| Field | Type | Notes |
|-------|------|-------|
| name | Required | Preserve original encoding |
| lat/lng | Required | GPS coordinates |
| address | Required | Full text |
| source | Required | e.g. google-maps |
| sourceId | Required | Source-unique ID |
| category | Required | Domain category |
| rating, reviewCount, phone, website, openingHours, photos, priceLevel | Optional | — |
--dry-run to collect a sample (50–200 records).
validate-ndjson.js example).
ndjson-filter to isolate failing records and inspect source HTML.createOrReplace disabled; check counts and duplicates.
node ./scripts/scrape-to-ndjson.js --out=data.ndjson --pages=100
node ./scripts/validate-ndjson.js data.ndjson
node ./scripts/dry-import.js data.ndjson --target=staging
node ./scripts/import.js data.ndjson --target=production
const fs = require('fs'), rl = require('readline'), { z } = require('zod');
const schema = z.object({ name: z.string(), source: z.string(), sourceId: z.string() });
const iface = rl.createInterface({ input: fs.createReadStream(process.argv[2]) });
let line = 0, errors = 0;
for await (const l of iface) { line++; try { schema.parse(JSON.parse(l)); } catch(e) { console.error(`Line ${line}:`, e.message); errors++; } }
if (errors) { console.error(`${errors} errors`); process.exit(2); }
console.log('OK');
Full scraper and extended validator: see REFERENCE.md.
development
Defines 10 sequential validation gates: secret scanning, lint/test/build checks, blast radius analysis, dependency auditing, browser testing, cache management, regression checks, and smoke tests. Use when running pre-deploy validation or CI checks, CI/CD pipelines, deployment pipeline validation, pre-merge checks, continuous integration, or pull request validation.
development
Generates test plans, writes unit/integration/E2E test files, identifies coverage gaps, and flags common testing anti-patterns. Use when writing tests, creating test suites, planning test strategies, mocking dependencies, measuring code coverage, or test planning.
development
Provides model routing rules, validates delegation prerequisites, supplies cost tracking templates, and defines dead-letter queue formats for Team Lead orchestration. Load when assigning tasks to agents, choosing model tiers, starting a delegation session, running a multi-agent workflow, delegating work, choosing which model to use, or assigning tasks.
testing
Saves and restores session state including task progress, file changes, and delegation history. Use when saving progress, resuming interrupted work, picking up where you left off, or checkpointing current work.