
starlake-ai/starflow-create-pipeline-spec

Name: starflow-create-pipeline-spec
Author: starlake-ai

.agents/starflow/skills/starflow-create-pipeline-spec/SKILL.md

npx skillsauth add starlake-ai/starlake-skills starflow-create-pipeline-spec


Pipeline Specification

Overview

Creates a comprehensive pipeline specification that covers the full ETL/ELT lifecycle: extraction from sources, loading into the target system, SQL transformations, and orchestration scheduling. The spec produces implementation-ready Starlake configuration files.

Role Guidance: Act as a Data Architect who translates business requirements into detailed pipeline specifications using Starlake's declarative configuration model.

Design Rationale: A pipeline spec is the bridge between architecture decisions and implementation. It defines exactly what data moves where, how it is transformed, and when it runs. Because Starlake's model is declarative, the spec is the implementation: YAML and SQL rather than application code.

Steps

Step 1: Pipeline Context

Load available planning artifacts:
- {planning_artifacts}/domain-discovery-*.md
- {planning_artifacts}/data-architecture-*.md
- {planning_artifacts}/source-analysis-*.md

Ask the user:
- Which domain/pipeline to specify?
- What is the business objective of this pipeline?
- What is the target SLA (freshness requirement)?
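A minimal sketch of the artifact lookup in this step, assuming the planning artifacts live in a single local directory. The function names are hypothetical and not part of the skill; only the three glob patterns come from the list above.

```python
from pathlib import Path

# Glob patterns copied from Step 1; the directory is supplied by the caller.
ARTIFACT_PATTERNS = (
    "domain-discovery-*.md",
    "data-architecture-*.md",
    "source-analysis-*.md",
)


def find_planning_artifacts(planning_dir):
    """Return {pattern: sorted list of matching paths}; unmatched patterns map to []."""
    root = Path(planning_dir)
    return {pattern: sorted(root.glob(pattern)) for pattern in ARTIFACT_PATTERNS}


def missing_artifact_kinds(found):
    """List the patterns that matched nothing, so the user can be asked about them."""
    return [pattern for pattern, paths in found.items() if not paths]
```

Reporting the missing kinds up front makes it clear which questions have to be answered by the user rather than by an existing artifact.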

Step 2: Extract Specification

For database sources, define JDBC extraction config:

version: 1
extract:
  connectionRef: "source_db"
  jdbcSchemas:
    - schema: "public"
      tables:
        - name: "orders"
          columns: ["*"]
          fetchSize: 10000
          partitionColumn: "order_id"
          numPartitions: 4
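To illustrate what partitionColumn and numPartitions are for, the sketch below splits an integer key range into contiguous chunks that could be read in parallel. This is the general idea behind partitioned JDBC reads, not a description of how Starlake computes its splits; the function names and the generated SQL are assumptions made for illustration.

```python
def partition_ranges(lower, upper, num_partitions):
    """Split the inclusive integer range [lower, upper] into contiguous chunks."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if upper < lower:
        raise ValueError("upper must be >= lower")
    total = upper - lower + 1
    base, extra = divmod(total, num_partitions)
    ranges = []
    start = lower
    for i in range(num_partitions):
        # The first `extra` chunks take one additional value each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # fewer values than partitions
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def partition_queries(table, column, lower, upper, num_partitions):
    """Build one illustrative SELECT per chunk (hypothetical query shape)."""
    return [
        f"SELECT * FROM {table} WHERE {column} BETWEEN {lo} AND {hi}"
        for lo, hi in partition_ranges(lower, upper, num_partitions)
    ]
```

A numeric, evenly distributed column such as order_id suits this kind of split; a skewed column would leave some partitions doing most of the work.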

For file sources, document:

- File location and naming pattern
- File format and delimiters
- Arrival schedule
- File sensor or ACK file strategy
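One possible ACK file strategy, sketched under the assumption that the producer writes a companion file named after the data file plus an ".ack" suffix once the upload is complete. The suffix, the glob, and the function name are hypothetical; the spec should record whatever convention the source system really uses.

```python
from pathlib import Path


def ready_files(landing_dir, data_glob="orders_*.csv", ack_suffix=".ack"):
    """Return data files whose companion ACK file exists, in name order."""
    ready = []
    for data_file in sorted(Path(landing_dir).glob(data_glob)):
        # e.g. orders_20260416.csv is ready once orders_20260416.csv.ack exists
        ack_file = data_file.with_name(data_file.name + ack_suffix)
        if ack_file.exists():
            ready.append(data_file)
    return ready
```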

Step 3: Load Specification

For each table in the pipeline, define:

- Domain assignment: Which Starlake domain
- File pattern: Regex to match incoming files
- Write strategy: APPEND, OVERWRITE, UPSERT_BY_KEY, SCD2, etc.
- Schema: Attributes with types, constraints, privacy
- Expectations: Data quality checks at load time
- Sink configuration: Partitioning, clustering, connection
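File patterns are easy to get subtly wrong, so it can help to check them against sample filenames before they go into the spec. The sketch below assumes a pattern must match the whole filename and that each file should map to exactly one table; the table names and patterns are made up for illustration.

```python
import re

# Hypothetical table-to-pattern mapping for a pipeline spec.
TABLE_PATTERNS = {
    "orders": r"orders_\d{8}\.csv",
    "order_lines": r"order_lines_\d{8}\.csv",
}


def table_for_file(filename, patterns=TABLE_PATTERNS):
    """Return the table whose pattern fully matches filename, or None."""
    matches = [
        table for table, pattern in patterns.items()
        if re.fullmatch(pattern, filename)
    ]
    if len(matches) > 1:
        raise ValueError(f"{filename} is ambiguous: matches {matches}")
    return matches[0] if matches else None
```

Raising on ambiguous matches surfaces overlapping patterns at spec time instead of at load time.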

Step 4: Transform Specification

For each transformation task:

- Input tables: Source tables referenced in SQL
- Output table: Target table name and write strategy
- SQL logic: The transformation query (referencing source tables directly)
- Expectations: Post-transform quality checks
- Dependencies: Automatically inferred from SQL, but document explicitly
- Recursive execution: Whether upstream tasks should run first

Document the transformation DAG (dependency graph).
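Once the dependencies are written down explicitly, the DAG can be checked for cycles and turned into an execution order. The sketch below uses Python's standard graphlib module; the task names are hypothetical, and the mapping is assumed to go from each task to the set of tasks it depends on.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies):
    """Return tasks ordered so that every task follows its upstream tasks.

    dependencies maps each task to the set of tasks it depends on.
    Raises ValueError if the documented dependencies contain a cycle.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        raise ValueError(f"dependency cycle detected: {exc.args[1]}") from exc


# Hypothetical transformation DAG for an orders pipeline.
EXAMPLE_DAG = {
    "kpi.daily_revenue": {"sales.orders_clean"},
    "sales.orders_clean": {"raw.orders"},
}
```

Calling execution_order(EXAMPLE_DAG) places raw.orders first and kpi.daily_revenue last, which is the order the spec's dependency section should reflect.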

Step 5: Orchestration Specification

Define the DAG configuration:

version: 1
dag:
  comment: "Daily orders pipeline"
  template: "dag_template.py.j2"
  filename: "dag_orders_daily"
  schedule: "0 6 * * *"   # Daily at 6 AM
  options:
    catchup: false
    dagrun_timeout: 7200

Specify:

- Schedule (cron expression)
- Dependencies between DAGs
- File sensor triggers (if event-driven)
- Retry and timeout policies
- Alert channels on failure
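When comparing a schedule against the SLA from Step 1, it helps to compute when the next run would start. The sketch below handles only simple daily expressions of the form "M H * * *", such as "0 6 * * *"; it is a deliberately limited illustration, not a general cron parser, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def next_daily_run(cron, now):
    """Return the next run time strictly after `now` for an 'M H * * *' expression."""
    fields = cron.split()
    if len(fields) != 5:
        raise ValueError("expected five cron fields")
    minute, hour, day_of_month, month, day_of_week = fields
    if (day_of_month, month, day_of_week) != ("*", "*", "*"):
        raise ValueError("only daily 'M H * * *' expressions are handled here")
    if not (minute.isdigit() and hour.isdigit()):
        raise ValueError("minute and hour must be plain numbers")
    m, h = int(minute), int(hour)
    if not (0 <= m <= 59 and 0 <= h <= 23):
        raise ValueError("minute or hour out of range")
    candidate = now.replace(hour=h, minute=m, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has already passed
    return candidate
```

If the data must be fresh by a given time, the gap between this start time and that deadline is the budget for the dagrun_timeout and any retries.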

Step 6: Environment Configuration

Document connection configs per environment:

# env.sl.yml (base)
connections:
  source_db:
    type: "jdbc"
    options:
      url: "jdbc:postgresql://localhost:5432/source"
      driver: "org.postgresql.Driver"
  warehouse:
    type: "duckdb"
    options:
      path: "./data/warehouse.db"
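One way to document per-environment differences is as a small overlay on the base configuration. The sketch below shows that idea with plain dictionaries and a recursive merge; it does not describe how Starlake itself resolves environment files, and the production URL is a placeholder invented for the example.

```python
import copy


def deep_merge(base, override):
    """Return a new dict where override values replace or extend base values."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Base values mirror the example above.
BASE = {
    "connections": {
        "source_db": {
            "type": "jdbc",
            "options": {
                "url": "jdbc:postgresql://localhost:5432/source",
                "driver": "org.postgresql.Driver",
            },
        },
        "warehouse": {"type": "duckdb", "options": {"path": "./data/warehouse.db"}},
    }
}

# Hypothetical production overlay: only the values that differ are listed.
PROD_OVERRIDE = {
    "connections": {
        "source_db": {
            "options": {"url": "jdbc:postgresql://prod-db.example.internal:5432/source"}
        }
    }
}

PROD = deep_merge(BASE, PROD_OVERRIDE)
```

Listing only the differing values per environment keeps the spec short and makes unintended drift between environments easier to spot.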

Step 7: Output Generation

Generate:

- Pipeline specification to {implementation_artifacts}/pipeline-spec-{{pipeline_name}}.md
- Starlake configuration files to {implementation_artifacts}/starlake-config/
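The two output locations can be derived from the pipeline name as sketched below. Normalizing the name into a lowercase, hyphen-separated slug is an assumption made here to keep filenames predictable; the skill itself does not state how the name should be formatted.

```python
import re
from pathlib import Path


def output_paths(implementation_dir, pipeline_name):
    """Return (spec_file, config_dir) for a pipeline under implementation_dir."""
    # Assumed naming rule: lowercase, with runs of other characters collapsed to '-'.
    slug = re.sub(r"[^a-z0-9]+", "-", pipeline_name.lower()).strip("-")
    if not slug:
        raise ValueError("pipeline_name must contain letters or digits")
    root = Path(implementation_dir)
    return root / f"pipeline-spec-{slug}.md", root / "starlake-config"
```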

Related Starlake Skills

- Use the load skill for detailed write strategy options and file format support
- Use the transform skill for transformation task configuration
- Use the extract skill for extraction method reference
- Use the dag-generate skill for orchestration template options
- Use the connection skill for connection configuration patterns

Outcome

A complete, implementation-ready pipeline specification with Starlake YAML configurations for extraction, loading, transformation, and orchestration, ready for the Data Engineer to implement.


Create a complete pipeline specification covering extract, load, transform, and orchestrate. Use when the user says "create pipeline spec" or "design a data pipeline".

Stars: 1
Category: testing
Updated: Apr 16, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add starlake-ai/starlake-skills starflow-create-pipeline-spec

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status.

- Trivy, container and dependency vulnerability scanner: Clean (95%)
- Semgrep, static code analysis for vulnerabilities: Clean (95%)
- mcp-scan (Snyk), Model Context Protocol security validation: Clean (95%)
- Snyk (dep), open source security scanning: Skipped (50%)
- Socket.dev, supply chain security analysis: Skipped (50%)
- VirusTotal, multi-engine malware detection: Skipped (50%)
- CrowdStrike, advanced threat intelligence: Skipped (50%)
- OSV-Scanner, Open Source Vulnerability database check: Skipped (50%)
- OWASP Dep-Check: Skipped (50%)

Last scanned: Apr 16, 2026, 3:37 AM (7.2s, 1 file scanned)


Related Skills

starlake-ai/starflow-transform-design (development)
Design SQL transformations for data pipelines with quality checks and dependency management. Use when the user says "design transforms" or "create SQL transformations".
1 file (SKILL.md). Updated Apr 16, 2026.

starlake-ai/starflow-sprint-planning (devops)
Plan and track sprint progress for data pipeline implementation. Use when the user says "sprint planning" or "plan data sprint".
1 file (SKILL.md). Updated Apr 16, 2026.

starlake-ai/starflow-source-analysis (testing)
Analyze data sources in depth: schema, quality, volume, and extraction strategy. Use when the user says "analyze data source" or "profile this data source".
1 file (SKILL.md). Updated Apr 16, 2026.

starlake-ai/starflow-schema-design (data-ai)
Design Starlake-compatible table schemas with types, constraints, privacy, and expectations. Use when the user says "design schema" or "create table definition".
1 file (SKILL.md). Updated Apr 16, 2026.


Install manually


# Clone the repo
git clone https://github.com/starlake-ai/starlake-skills.git

# Copy into the Claude Code skills folder (global), creating it first if needed
mkdir -p ~/.claude/skills
cp -r starlake-skills/.agents/starflow/skills/starflow-create-pipeline-spec ~/.claude/skills/


Repository: starlake-ai/starlake-skills (1 star)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT