src/orchestrator/skills/data-engineering/SKILL.md
Transforms, validates, loads data in ETL pipelines. Use when building scrapers, validating NDJSON feeds, or importing data into CMS/DB targets.
npx skillsauth add monkilabs/opencastle data-engineering
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Generic pipeline patterns. For project-specific sources and full schema references, see REFERENCE.md.
Launch a headless browser cluster (Puppeteer Cluster / Playwright) with retryLimit: 3, retryDelay: 5000, timeout: 30000, args: ['--no-sandbox', '--disable-setuid-sandbox'].
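To make the retry settings above concrete, here is a minimal sketch of what `retryLimit`, `retryDelay`, and `timeout` could mean for a single task. The `withRetry` and `sleep` helpers are hypothetical and are not part of Puppeteer Cluster or Playwright. The sketch assumes `retryLimit` counts retries after the first attempt and that `timeout` applies to each attempt separately.

```javascript
// Hypothetical helpers for illustration; not a Puppeteer Cluster or Playwright API.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Runs task(attempt) once, then retries up to retryLimit times.
// Each attempt is raced against its own timeout (assumed per-attempt semantics).
async function withRetry(task, { retryLimit = 3, retryDelay = 5000, timeout = 30000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retryLimit; attempt++) {
    let timer;
    const timedOut = new Promise((_, reject) => {
      timer = setTimeout(() => reject(new Error(`Timed out after ${timeout} ms`)), timeout);
    });
    try {
      return await Promise.race([task(attempt), timedOut]);
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final one.
      if (attempt < retryLimit) await sleep(retryDelay);
    } finally {
      clearTimeout(timer);
    }
  }
  throw lastError;
}
```

With the defaults shown, a task that keeps failing is attempted four times in total, with a 5-second pause between attempts. Note that the race only stops waiting on a slow attempt; it does not cancel the underlying work.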
One record per line. Schema:
| Field | Requirement | Notes |
|-------|------|-------|
| name | Required | Preserve original encoding |
| lat/lng | Required | GPS coordinates |
| address | Required | Full text |
| source | Required | e.g. google-maps |
| sourceId | Required | Source-unique ID |
| category | Required | Domain category |
| rating, reviewCount, phone, website, openingHours, photos, priceLevel | Optional | — |
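The required fields in the table can be checked with a few lines of plain JavaScript before reaching for a schema library. This is a sketch only: `missingRequired` is a hypothetical helper, the sample record is made up, and it assumes `lat/lng` is stored as two separate fields named `lat` and `lng`.

```javascript
// Field names taken from the schema table; lat and lng are assumed to be separate keys.
const REQUIRED_FIELDS = ['name', 'lat', 'lng', 'address', 'source', 'sourceId', 'category'];

// Hypothetical helper: returns the required fields that are absent or empty.
function missingRequired(record) {
  return REQUIRED_FIELDS.filter((field) => {
    const value = record[field];
    return value === undefined || value === null || value === '';
  });
}

// Made-up sample record, for illustration only.
const sample = {
  name: 'Café Example',
  lat: 52.52,
  lng: 13.405,
  address: '1 Example Street',
  source: 'google-maps',
  sourceId: 'abc123',
  category: 'cafe',
};

// One NDJSON line is the record serialized as JSON with no embedded newlines.
console.log(JSON.stringify(sample));
console.log(missingRequired(sample));
```

Optional fields such as `rating` or `phone` are deliberately ignored here; a missing optional field is not an error.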
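1. Run the scraper with --dry-run to collect a sample (50–200 records).
2. Validate the sample (see the validate-ndjson.js example).
3. On validation failures, use ndjson-filter to isolate failing records; inspect the source HTML.
4. Dry-run the import with createOrReplace disabled; check counts and duplicates.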
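The duplicate check mentioned in the steps above could look like the following sketch. It assumes a record is uniquely identified by the pair `source` and `sourceId`, which the schema suggests but does not state outright; `findDuplicates` is a hypothetical helper, not one of the project's scripts.

```javascript
// Hypothetical helper: reports records whose (source, sourceId) pair was already seen.
// Line numbers are 1-based to match NDJSON line positions.
function findDuplicates(records) {
  const seen = new Map();
  const duplicates = [];
  records.forEach((record, index) => {
    // JSON.stringify of the pair avoids collisions that a plain separator could cause.
    const key = JSON.stringify([record.source, record.sourceId]);
    if (seen.has(key)) {
      duplicates.push({
        source: record.source,
        sourceId: record.sourceId,
        firstLine: seen.get(key),
        line: index + 1,
      });
    } else {
      seen.set(key, index + 1);
    }
  });
  return duplicates;
}
```

Comparing `records.length` with the number of documents the dry import reports, alongside this list, gives a quick sanity check before a production import.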
node ./scripts/scrape-to-ndjson.js --out=data.ndjson --pages=100
node ./scripts/validate-ndjson.js data.ndjson
node ./scripts/dry-import.js data.ndjson --target=staging
node ./scripts/import.js data.ndjson --target=production
const fs = require('fs'), rl = require('readline'), { z } = require('zod');
const schema = z.object({ name: z.string(), source: z.string(), sourceId: z.string() });
// for await is not allowed at the top level of a CommonJS script, so wrap it in an async function.
async function main() {
  const iface = rl.createInterface({ input: fs.createReadStream(process.argv[2]), crlfDelay: Infinity });
  let line = 0, errors = 0;
  for await (const l of iface) {
    line++;
    try { schema.parse(JSON.parse(l)); } catch (e) { console.error(`Line ${line}:`, e.message); errors++; }
  }
  if (errors) { console.error(`${errors} errors`); process.exit(2); }
  console.log('OK');
}
main().catch((e) => { console.error(e); process.exit(1); });
For the full scraper and the extended validator, see REFERENCE.md.