.agents/starflow/skills/starflow-create-pipeline-spec/SKILL.md
Create a complete pipeline specification covering extract, load, transform, and orchestrate. Use when the user says "create pipeline spec" or "design a data pipeline".
Creates a comprehensive pipeline specification that covers the full ETL/ELT lifecycle: extraction from sources, loading into the target system, SQL transformations, and orchestration scheduling. The spec produces implementation-ready Starlake configuration files.
Role Guidance: Act as a Data Architect who translates business requirements into detailed pipeline specifications using Starlake's declarative configuration model.
Design Rationale: A pipeline spec is the bridge between architecture decisions and implementation. It defines exactly what data moves where, how it's transformed, and when it runs. Starlake's declarative model means the spec IS the implementation — YAML + SQL, not code.
- `{planning_artifacts}/domain-discovery-*.md`
- `{planning_artifacts}/data-architecture-*.md`
- `{planning_artifacts}/source-analysis-*.md`

For database sources, define JDBC extraction config:
```yaml
version: 1
extract:
  connectionRef: "source_db"
  jdbcSchemas:
    - schema: "public"
      tables:
        - name: "orders"
          columns: ["*"]
          fetchSize: 10000
          partitionColumn: "order_id"
          numPartitions: 4
```
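To reason about what `partitionColumn` and `numPartitions` imply for the source database, it can help to see how a numeric key range might be divided into parallel reads. The following is a minimal sketch, assuming an even split over an inclusive `order_id` range; the function name `partition_ranges` and the bounds are hypothetical, and Starlake's actual partitioning logic may differ.

```python
def partition_ranges(lower: int, upper: int, num_partitions: int) -> list[tuple[int, int]]:
    """Split the inclusive range [lower, upper] into contiguous inclusive sub-ranges.

    Illustrative only: this is an assumed even split, not Starlake's implementation.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    if upper < lower:
        raise ValueError("upper must be >= lower")

    total = upper - lower + 1
    # Never create more partitions than there are key values.
    parts = min(num_partitions, total)
    base, extra = divmod(total, parts)

    ranges = []
    start = lower
    for i in range(parts):
        # The first `extra` partitions take one additional key each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


if __name__ == "__main__":
    # Hypothetical bounds for order_id; in practice they come from MIN/MAX on the source table.
    for lo, hi in partition_ranges(1, 1_000_000, 4):
        print(f"SELECT * FROM public.orders WHERE order_id BETWEEN {lo} AND {hi}")
```

Each range corresponds to one concurrent query against the source, so `numPartitions` is worth sizing against what the source database can tolerate.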
For file sources, document:
For each table in the pipeline, define:
For each transformation task:
Document the transformation DAG (dependency graph).
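One way to sanity-check a documented dependency graph is to derive an execution order from it and confirm there are no cycles. The sketch below uses Python's standard `graphlib` module; the task names are hypothetical placeholders, not tasks defined by this skill.

```python
from graphlib import TopologicalSorter

# Hypothetical transformation tasks: each key depends on the tasks in its value set.
dependencies = {
    "staging.orders_clean": set(),
    "staging.customers_clean": set(),
    "marts.orders_enriched": {"staging.orders_clean", "staging.customers_clean"},
    "marts.daily_revenue": {"marts.orders_enriched"},
}


def execution_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid run order; raises graphlib.CycleError if the graph has a cycle."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for position, task in enumerate(execution_order(dependencies), start=1):
        print(position, task)
```

Any task listed before its dependencies, or a `CycleError`, indicates the documented DAG needs revisiting before it is handed to orchestration.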
Define the DAG configuration:
```yaml
version: 1
dag:
  comment: "Daily orders pipeline"
  template: "dag_template.py.j2"
  filename: "dag_orders_daily"
  schedule: "0 6 * * *"  # Daily at 6 AM
  options:
    catchup: false
    dagrun_timeout: 7200
```
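The schedule and timeout values are easy to misread in a spec review, so a small check can make them explicit. This is a rough sketch: it only labels the five fields of a standard cron expression, and it assumes `dagrun_timeout` is expressed in seconds, which the configuration above does not state.

```python
from datetime import timedelta

# Field order of a standard five-field cron expression.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expr: str) -> dict[str, str]:
    """Label each field of a five-field cron expression (no validation of field values)."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 cron fields, got {len(parts)}: {expr!r}")
    return dict(zip(CRON_FIELDS, parts))


schedule = describe_cron("0 6 * * *")
# Assumption: the timeout value is in seconds.
timeout = timedelta(seconds=7200)

if __name__ == "__main__":
    print(schedule)
    print("timeout:", timeout)
```

Under that assumption, the run is allowed two hours, which should be compared against the expected duration of the slowest extract and transform steps.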
Specify:
Document connection configs per environment:
```yaml
# env.sl.yml (base)
connections:
  source_db:
    type: "jdbc"
    options:
      url: "jdbc:postgresql://localhost:5432/source"
      driver: "org.postgresql.Driver"
  warehouse:
    type: "duckdb"
    options:
      path: "./data/warehouse.db"
```
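When documenting per-environment connections, it can be useful to write down only what differs from the base and check the effective result. The sketch below models that as a recursive dictionary merge; the `deep_merge` helper, the `prod_override` values, and the host name are all hypothetical, and this is not a description of how Starlake itself resolves environment files.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where values in `override` replace or extend those in `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested mappings key by key instead of replacing them wholesale.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Base connections, mirroring the env.sl.yml example above.
base = {
    "connections": {
        "source_db": {
            "type": "jdbc",
            "options": {
                "url": "jdbc:postgresql://localhost:5432/source",
                "driver": "org.postgresql.Driver",
            },
        },
        "warehouse": {"type": "duckdb", "options": {"path": "./data/warehouse.db"}},
    }
}

# Hypothetical production override: only the source URL changes.
prod_override = {
    "connections": {
        "source_db": {"options": {"url": "jdbc:postgresql://prod-db.example.internal:5432/source"}}
    }
}

prod = deep_merge(base, prod_override)

if __name__ == "__main__":
    print(prod["connections"]["source_db"]["options"])
```

Listing overrides this way keeps the spec short and makes it obvious which settings are shared across environments.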
Generate:
- `{implementation_artifacts}/pipeline-spec-{{pipeline_name}}.md`
- `{implementation_artifacts}/starlake-config/`

Related skills:

- load skill for detailed write strategy options and file format support
- transform skill for transformation task configuration
- extract skill for extraction method reference
- dag-generate skill for orchestration template options
- connection skill for connection configuration patterns

A complete, implementation-ready pipeline specification with Starlake YAML configurations for extract, load, transform, and orchestrate, ready for the Data Engineer to implement.