.agents/starflow/skills/starflow-dev-pipeline/SKILL.md
Implement a data pipeline from a pipeline specification, generating Starlake configuration files. Use when the user says "implement pipeline" or "dev this pipeline".
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add starlake-ai/starlake-skills starflow-dev-pipeline
Implements a data pipeline by generating all necessary Starlake configuration files (YAML + SQL) from a pipeline specification. Sets up the Starlake metadata directory structure, creates load schemas, transform queries, extraction configs, and DAG definitions. Validates the implementation locally with DuckDB.
Role Guidance: Act as a Data Engineer building production-grade data pipelines using Starlake's declarative configuration.
Design Rationale: Pipeline implementation with Starlake means creating YAML + SQL files — not writing application code. The framework handles execution mechanics. Focus is on correct configuration, comprehensive schemas, and testable SQL.
- {implementation_artifacts}/pipeline-spec-*.md
- {planning_artifacts}/data-architecture-*.md
- {planning_artifacts}/schema-design-*.md
- {implementation_artifacts}/transform-design-*.md
- {implementation_artifacts}/orchestration-design-*.md

If the Starlake project doesn't exist yet:
starlake bootstrap
This creates the base metadata/ directory structure. Then configure:
- metadata/application.sl.yml: global settings and connections
- metadata/env.sl.yml: base environment variables
- metadata/types/: custom type definitions

For each table in the pipeline:
- metadata/load/{domain}/
- metadata/load/{domain}/_config.sl.yml
- metadata/load/{domain}/{table}.sl.yml
- starlake validate

For each transformation task:
- metadata/transform/{domain}/
- metadata/transform/{domain}/{task}.sql
- metadata/transform/{domain}/{task}.sl.yml
- metadata/expectations/{domain}.j2

For JDBC sources:
- metadata/extract/{source}.sl.yml
- starlake extract-data
- starlake infer-schema

- metadata/dags/{dag_name}.sl.yml
- starlake dag-generate

Run the full pipeline locally with DuckDB:
# Stage incoming files
starlake stage
# Load data
starlake load
# Run transforms (with dependencies)
starlake transform --name {domain}.{task} --recursive
# Check lineage
starlake lineage --task {domain}.{task}
# Validate all configs
starlake validate
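The local run above can be wrapped in a small fail-fast script so that a failing step stops the sequence before later steps run. This is a minimal sketch, not part of the skill itself: the function name `run_local_pipeline` and the `STARLAKE_BIN` variable are hypothetical, and the Starlake subcommands and flags are copied from the commands above.

```shell
#!/usr/bin/env bash
# Sketch: run the local validation steps in order and stop at the first failure.
# run_local_pipeline and STARLAKE_BIN are hypothetical names, not Starlake features.
set -euo pipefail

# Allow overriding the executable, e.g. to point at a specific install.
STARLAKE_BIN="${STARLAKE_BIN:-starlake}"

run_local_pipeline() {
  local domain="$1" task="$2"
  # Explicit "|| return 1" keeps fail-fast behaviour even when the function
  # is called inside an "if", where "set -e" is ignored.
  "$STARLAKE_BIN" stage || return 1
  "$STARLAKE_BIN" load || return 1
  "$STARLAKE_BIN" transform --name "${domain}.${task}" --recursive || return 1
  "$STARLAKE_BIN" lineage --task "${domain}.${task}" || return 1
  "$STARLAKE_BIN" validate || return 1
}
```

After sourcing the script, a call such as `run_local_pipeline sales daily_revenue` (placeholder domain and task names) would run the five steps for `sales.daily_revenue` and return a non-zero status as soon as one of them fails.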
Generate implementation summary to {implementation_artifacts}/pipeline-impl-{{pipeline_name}}.md covering:
- bootstrap skill to initialize a new Starlake project
- load skill for write strategy and sink configuration details
- transform skill for SQL transformation execution options
- extract-schema skill for JDBC schema extraction
- extract-data skill for data extraction to files
- dag-generate skill for DAG generation options and templates
- validate skill to check all configuration files
- config skill for environment variables and connection setup

A fully implemented, locally validated Starlake pipeline with all YAML configuration, SQL transforms, and orchestration DAGs, ready for deployment.