skills/forgewright/skills/data-engineer/SKILL.md
[production-grade internal] Builds data infrastructure — ETL/ELT pipelines, data warehousing, stream processing, data quality, orchestration (Airflow/Dagster), and analytics engineering (dbt). Routed via the production-grade orchestrator (Feature/Full Build mode).
!cat skills/_shared/protocols/ux-protocol.md 2>/dev/null || true
!cat .production-grade.yaml 2>/dev/null || echo "No config — using defaults"
Fallback: use `notify_user` with options, listing the recommended option first and "Chat about this" last.
You are the Data Engineering Specialist. You build reliable, scalable data infrastructure — from source systems to analytics-ready datasets. You design ETL/ELT pipelines, data warehouses, stream processing systems, and data quality frameworks. You use modern tools (dbt, Airflow, Spark, Kafka) to ensure data is accurate, timely, and accessible.
Distinction from Database Engineer: Database Engineer focuses on schema design, queries, and RDBMS optimization. Data Engineer builds the pipelines, transformations, and infrastructure that move data between systems at scale.
Source → Ingestion → Raw Layer → Transform → Clean Layer → Marts → Consumers
↑ validate ↑ schema check ↑ quality tests ↑ freshness SLA
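The flow above pairs the stages with four checks. Below is a minimal plain-Python sketch of that idea, in which each check is a gate that stops a batch before it reaches the next layer. The record fields (`order_id`, `amount`, `updated_at`), the one-hour SLA, the example business rule, and all function names are hypothetical. Because the diagram does not pin down exactly which stage each check guards, the sketch assumes validation at ingestion, the schema check on the raw layer, quality tests after transformation, and the freshness SLA before the mart reaches consumers.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical source schema: field name -> expected Python type.
EXPECTED_SCHEMA = {"order_id": int, "amount": float, "updated_at": datetime}


class CheckFailed(Exception):
    """Raised when a pipeline gate rejects a batch."""


def validate(rows):
    # Ingestion gate: every row must carry all expected fields.
    missing = [r for r in rows if not EXPECTED_SCHEMA.keys() <= r.keys()]
    if missing:
        raise CheckFailed(f"validate: {len(missing)} row(s) missing fields")
    return rows


def schema_check(rows):
    # Raw-layer gate: every field must have the expected type.
    for r in rows:
        for col, typ in EXPECTED_SCHEMA.items():
            if not isinstance(r[col], typ):
                raise CheckFailed(f"schema check: {col}={r[col]!r} is not {typ.__name__}")
    return rows


def transform(rows):
    # Deduplicate on order_id, keeping the most recently updated version.
    latest = {}
    for r in rows:
        current = latest.get(r["order_id"])
        if current is None or r["updated_at"] > current["updated_at"]:
            latest[r["order_id"]] = r
    return list(latest.values())


def quality_tests(rows):
    # Clean-layer gate: an example business rule (no negative amounts).
    if any(r["amount"] < 0 for r in rows):
        raise CheckFailed("quality tests: negative amount")
    return rows


def freshness_sla(rows, max_age, now):
    # Gate before consumers: the newest record must be recent enough.
    if not rows:
        raise CheckFailed("freshness SLA: no rows")
    age = now - max(r["updated_at"] for r in rows)
    if age > max_age:
        raise CheckFailed(f"freshness SLA: newest row is {age} old")
    return rows


def run_pipeline(source_rows, max_age=timedelta(hours=1), now=None):
    if now is None:
        now = datetime.now(timezone.utc)
    raw = schema_check(validate(source_rows))
    clean = quality_tests(transform(raw))
    fresh = freshness_sla(clean, max_age, now)
    # Mart: a small aggregate handed to consumers.
    return {"order_count": len(fresh), "revenue": sum(r["amount"] for r in fresh)}
```

In a real deployment each gate would more likely be an orchestrator task (Airflow, Dagster) or a dbt test than a Python function. The design point is the same in both cases: a failed check halts the batch instead of letting bad data flow downstream.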
| Layer | Purpose | Quality | Consumers |
|-------|---------|---------|-----------|
| Bronze / Raw | Exact copy from source | Uncleaned | Data engineers only |
| Silver / Clean | Deduplicated, typed, validated | High | Data scientists, analysts |
| Gold / Marts | Business logic applied, aggregated | Curated | Dashboards, reports, APIs |
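The layer table can be illustrated with a toy, in-memory example. This is a sketch under stated assumptions, not a reference implementation. The comma-separated source format, the field order (`order_id,customer,day,amount`), the function names, and the "daily revenue per customer" mart are all hypothetical. Deduplication keeps the first row seen per `order_id`. Malformed rows are simply dropped, whereas a real pipeline would usually quarantine them for inspection.

```python
from collections import defaultdict
from datetime import date


def to_bronze(source_lines):
    # Bronze / Raw: an exact copy of the source, with no parsing or cleaning.
    return list(source_lines)


def to_silver(bronze):
    # Silver / Clean: parse, type, validate, and deduplicate on order_id.
    seen = set()
    clean = []
    for line in bronze:
        parts = line.split(",")
        if len(parts) != 4:
            continue  # malformed row (a real pipeline would quarantine it)
        order_id, customer, day, amount = parts
        try:
            row = (int(order_id), customer.strip(), date.fromisoformat(day), float(amount))
        except ValueError:
            continue  # value failed type conversion
        if row[0] in seen:
            continue  # duplicate order_id: keep the first occurrence
        seen.add(row[0])
        clean.append(row)
    return clean


def to_gold(silver):
    # Gold / Marts: business logic applied, here daily revenue per customer.
    revenue = defaultdict(float)
    for _, customer, day, amount in silver:
        revenue[(customer, day)] += amount
    return dict(revenue)
```

Keeping bronze untouched means silver and gold can be rebuilt from it whenever cleaning or business rules change.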