.agents/starflow/skills/starflow-schema-design/SKILL.md
Design Starlake-compatible table schemas with types, constraints, privacy, and expectations. Use when the user says "design schema" or "create table definition".
npx skillsauth add starlake-ai/starlake-skills starflow-schema-design
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Guides the design of Starlake-compatible table schemas including attribute definitions, custom types with regex validation, privacy annotations, and data quality expectations. Outputs ready-to-use .sl.yml configuration files for Starlake load operations.
Role Guidance: Act as a Data Architect with deep knowledge of Starlake's schema definition format and data typing system.
Design Rationale: Schemas are the contract between data producers and consumers. Starlake enforces schemas at load time, rejecting records that don't conform. Well-designed schemas prevent bad data from entering the pipeline.
{planning_artifacts}/source-analysis-*.md if available.
{planning_artifacts}/data-architecture-*.md if available.

Define project-specific types in metadata/types/custom.sl.yml:
version: 1
types:
  - name: "email"
    pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
    primitiveType: "string"
  - name: "phone"
    pattern: "^\\+?[0-9\\s\\-\\.\\(\\)]{7,20}$"
    primitiveType: "string"
  - name: "sku"
    pattern: "^[A-Z]{2,4}-[0-9]{4,8}$"
    primitiveType: "string"
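Before committing custom types, it can help to try the patterns against a few sample values. The following is a minimal Python sketch, not part of Starlake: the `conforms` helper and the sample values are hypothetical, and it assumes Python's `re` engine treats these particular patterns the same way Starlake's regex engine does, which holds for the simple constructs used here but is not guaranteed in general.

```python
import re

# Patterns copied from custom.sl.yml above. The YAML double-quoted form
# ("\\.") is reduced to the plain regex form ("\.") in Python raw strings.
CUSTOM_TYPES = {
    "email": r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$",
    "phone": r"^\+?[0-9\s\-\.\(\)]{7,20}$",
    "sku": r"^[A-Z]{2,4}-[0-9]{4,8}$",
}


def conforms(type_name: str, value: str) -> bool:
    """Hypothetical helper: True if the whole value matches the type's pattern."""
    return re.fullmatch(CUSTOM_TYPES[type_name], value) is not None


if __name__ == "__main__":
    samples = [
        ("email", "jane.doe@example.com"),
        ("email", "jane@localhost"),      # no top-level domain
        ("phone", "+1 (555) 010-9999"),
        ("phone", "12345"),               # fewer than 7 characters
        ("sku", "AB-1234"),
        ("sku", "ab-1234"),               # lowercase prefix
    ]
    for type_name, value in samples:
        print(f"{type_name:5} {value!r:26} {conforms(type_name, value)}")
```

Values that fail here are the kind of records a load would be expected to reject, so a quick pass like this is a cheap way to catch a pattern that is too strict or too loose.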
For each table, create the .sl.yml file with:
Example structure:
version: 1
table:
  name: "customers"
  pattern: "customers.*\\.csv"
  writeStrategy:
    type: "UPSERT_BY_KEY_AND_TIMESTAMP"
    key: ["customer_id"]
    timestamp: "updated_at"
  attributes:
    - name: "customer_id"
      type: "long"
      required: true
    - name: "email"
      type: "email"
      required: true
      privacy: "SHA256"
    - name: "name"
      type: "string"
      required: true
    - name: "created_at"
      type: "timestamp"
      required: true
    - name: "updated_at"
      type: "timestamp"
      required: true
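To make the intent of the write strategy in this example concrete, here is a rough Python sketch of upserting by key and timestamp. It is an illustration under stated assumptions, not Starlake's implementation: the function name is hypothetical, rows are plain dictionaries, and an incoming row is assumed to replace an existing one only when its `updated_at` is strictly newer. How Starlake treats equal timestamps is not covered by the source.

```python
from datetime import datetime


def upsert_by_key_and_timestamp(existing, incoming,
                                key="customer_id", timestamp="updated_at"):
    """Hypothetical sketch: merge incoming rows into existing rows.

    A row is inserted when its key is new, and replaces the stored row
    only when its timestamp is strictly newer (an assumption).
    """
    merged = {row[key]: row for row in existing}
    for row in incoming:
        current = merged.get(row[key])
        if current is None or row[timestamp] > current[timestamp]:
            merged[row[key]] = row
    return list(merged.values())


if __name__ == "__main__":
    existing = [
        {"customer_id": 1, "name": "Ann", "updated_at": datetime(2024, 1, 1)},
        {"customer_id": 2, "name": "Bob", "updated_at": datetime(2024, 1, 1)},
    ]
    incoming = [
        {"customer_id": 1, "name": "Ann B.", "updated_at": datetime(2024, 2, 1)},
        {"customer_id": 2, "name": "Robert", "updated_at": datetime(2023, 12, 1)},
        {"customer_id": 3, "name": "Cy", "updated_at": datetime(2024, 2, 1)},
    ]
    for row in upsert_by_key_and_timestamp(existing, incoming):
        print(row["customer_id"], row["name"])
```

In this sketch customer 1 is updated, the stale record for customer 2 is ignored, and customer 3 is inserted, which is why both `key` and `timestamp` must name required, reliably populated attributes.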
Define data quality expectations as Jinja2 macros:
{# Reusable macro for non-null check #}
{% macro not_null(column) %}
  SELECT COUNT(*) = 0 FROM SL_THIS WHERE {{ column }} IS NULL
{% endmacro %}

{# Domain-specific check #}
{% macro valid_email(column) %}
  SELECT COUNT(*) = 0 FROM SL_THIS
  WHERE {{ column }} NOT LIKE '%@%.%'
{% endmacro %}
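The queries inside these macros can be tried locally before wiring them into a load. The sketch below is an assumption-laden stand-in: it uses Python's built-in `sqlite3` rather than the target warehouse, replaces Jinja2 rendering with plain string formatting, and creates a throwaway table literally named `SL_THIS`. The `check` helper is hypothetical.

```python
import sqlite3

# The SQL bodies of the two macros above, with {{ column }} replaced by a
# Python format placeholder (a stand-in for Jinja2 rendering).
NOT_NULL = "SELECT COUNT(*) = 0 FROM SL_THIS WHERE {column} IS NULL"
VALID_EMAIL = "SELECT COUNT(*) = 0 FROM SL_THIS WHERE {column} NOT LIKE '%@%.%'"


def check(template: str, column: str, values) -> bool:
    """Hypothetical helper: run one expectation against a single-column table.

    The column name is interpolated directly, so it must come from a
    trusted source such as the schema file.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(f'CREATE TABLE SL_THIS ("{column}" TEXT)')
        conn.executemany("INSERT INTO SL_THIS VALUES (?)", [(v,) for v in values])
        (passed,) = conn.execute(template.format(column=column)).fetchone()
        return bool(passed)
    finally:
        conn.close()


if __name__ == "__main__":
    print(check(NOT_NULL, "email", ["a@b.com", None]))       # NULL present
    print(check(VALID_EMAIL, "email", ["a@b.com", "nope"]))  # malformed value
    print(check(VALID_EMAIL, "email", ["a@b.com", None]))    # NULL only
```

One thing this makes visible: under standard SQL semantics `NULL NOT LIKE '...'` evaluates to NULL, so the `WHERE` clause skips NULL rows and `valid_email` alone does not flag them. Pairing it with `not_null` on the same column covers that case.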
Create the domain _config.sl.yml:
version: 1
load:
  metadata:
    directory: "{incoming_dir}/{domain_name}"
    multiline: false
    encoding: "UTF-8"
    withHeader: true
    separator: ","
    quote: "\""
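As a rough illustration of what the parsing options in this domain config describe, the Python sketch below reads a small CSV payload using the same encoding, separator, quote character, and header setting. It relies on the standard `csv` module and is only an approximation of how a loader might interpret these fields. The `parse_file` helper and the sample data are hypothetical, and `directory` and `multiline` are not modeled.

```python
import csv
import io

# Mirrors the parsing-related keys of the _config.sl.yml example above.
LOAD_METADATA = {
    "encoding": "UTF-8",
    "withHeader": True,
    "separator": ",",
    "quote": '"',
}


def parse_file(raw: bytes, metadata=LOAD_METADATA):
    """Hypothetical helper: decode and split a delimited file.

    Returns a list of dicts when withHeader is true, else a list of lists.
    """
    text = raw.decode(metadata["encoding"])
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=metadata["separator"],
        quotechar=metadata["quote"],
    )
    rows = list(reader)
    if not metadata["withHeader"] or not rows:
        return rows
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


if __name__ == "__main__":
    sample = b'customer_id,name\n1,"Doe, Jane"\n'
    print(parse_file(sample))
```

The quoted field shows why `quote` matters: without it, the comma inside the name would be read as a separator and the record would have three columns instead of two.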
Generate:
{planning_artifacts}/schema-design-{{domain_name}}.md
.sl.yml files to {implementation_artifacts}/schemas/

config skill for the complete attribute types catalog (string, int, long, date, timestamp, etc.)
load skill for write strategy reference (APPEND, OVERWRITE, SCD2, UPSERT, etc.)
infer-schema skill to auto-infer schemas from existing data files
expectations skill for Jinja2 macro syntax when defining quality checks

Complete Starlake-compatible schema definitions with custom types, privacy annotations, and expectations, ready to be placed in the metadata/load/ directory.