.agents/starflow/skills/starflow-transform-design/SKILL.md
Design SQL transformations for data pipelines with quality checks and dependency management. Use when the user says "design transforms" or "create SQL transformations".
npx skillsauth add starlake-ai/starlake-skills starflow-transform-designInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Guides the design of SQL transformations that turn raw/curated data into business-ready datasets. Produces Starlake-compatible .sql files and their companion .sl.yml configuration files, including write strategies, sink configuration, and data quality expectations.
Role Guidance: Act as a Data Engineer with expertise in SQL analytics, data modeling, and Starlake's transform engine.
Design Rationale: Transformations are where business logic lives. They must be pure SQL (or Python for complex cases), testable locally on DuckDB, and deployable to any target engine via Starlake's SQL transpiler. Dependencies are inferred from SQL table references.
{planning_artifacts}/data-architecture-*.md{implementation_artifacts}/pipeline-spec-*.md{planning_artifacts}/schema-design-*.mdList all transformations needed:
| Task Name | Source Tables | Target Table | Write Strategy | Schedule |
|-----------|-------------|--------------|----------------|----------|
| e.g., orders_daily_agg | sales.orders, ref.products | analytics.orders_daily | OVERWRITE_BY_PARTITION | Daily |
For each transformation, write the SQL:
-- Task: orders_daily_agg
-- Dependencies: sales.orders, ref.products (auto-inferred)
SELECT
DATE(o.order_date) AS order_date,
p.category,
COUNT(*) AS order_count,
SUM(o.amount) AS total_amount,
AVG(o.amount) AS avg_amount
FROM sales.orders o
JOIN ref.products p ON o.product_id = p.product_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '1 day'
GROUP BY DATE(o.order_date), p.category
Guidelines:
{domain}.{table} notationSL_THIS to reference the current task's output table in expectationsFor each task, create the .sl.yml:
version: 1
transform:
name: "orders_daily_agg"
writeStrategy:
type: "OVERWRITE_BY_PARTITION"
key: ["order_date"]
sink:
partition: ["order_date"]
clustering: ["category"]
connectionRef: "warehouse"
expectations:
- query: "SELECT COUNT(*) > 0 FROM SL_THIS"
name: "not_empty"
failSeverity: "ERROR"
- query: "SELECT COUNT(*) = 0 FROM SL_THIS WHERE total_amount < 0"
name: "no_negative_amounts"
failSeverity: "ERROR"
Document the transformation DAG:
--recursive executionGenerate:
{implementation_artifacts}/transform-design-{{domain}}.md{implementation_artifacts}/transforms/{domain}/transform skill for SQL/Python transformation execution optionsexpectations skill for transform-level data quality checksconfig skill for available SQL functions and type referenceComplete SQL transformations with Starlake task configurations, quality expectations, and a documented dependency graph — ready for implementation and testing.
devops
Plan and track sprint progress for data pipeline implementation. Use when the user says "sprint planning" or "plan data sprint".
testing
Analyze data sources in depth: schema, quality, volume, and extraction strategy. Use when the user says "analyze data source" or "profile this data source".
data-ai
Design Starlake-compatible table schemas with types, constraints, privacy, and expectations. Use when the user says "design schema" or "create table definition".
devops
Platform Engineer agent — manages infrastructure, orchestration, and deployment for data pipelines. Use when the user says "platform-engineer" or "talk to the platform-engineer".