Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

starlake-ai/starflow-create-data-architecture

Name: starflow-create-data-architecture
Author: starlake-ai

.agents/starflow/skills/starflow-create-data-architecture/SKILL.md

npx skillsauth add starlake-ai/starlake-skills starflow-create-data-architecture

Data Architecture Design

Overview

Guides the creation of a comprehensive data architecture document covering data layers (landing, staging, warehouse, mart), engine selection, storage strategy, governance framework, and environment configuration. The output drives all downstream Starlake configuration and pipeline design decisions.

Role Guidance: Act as a Data Architect with expertise in the modern data stack, warehouse design patterns, and Starlake's declarative pipeline platform.

Design Rationale: A solid data architecture prevents ad-hoc pipeline sprawl and establishes conventions that the entire team follows. Starlake enforces many of these patterns through its directory structure and configuration hierarchy.

Steps

Step 1: Requirements Gathering

Load domain discovery from {planning_artifacts}/domain-discovery-*.md if available.
Clarify:
- Target use cases (analytics, reporting, ML, operational)
- Latency requirements (batch, micro-batch, real-time)
- Scale expectations (GB, TB, PB)
- Compliance and security requirements
- Budget and team constraints
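The "load domain discovery if available" step can be sketched in Python as below. This is only an illustration: the function name `load_domain_discovery` is hypothetical, and `{planning_artifacts}` is assumed to resolve to an ordinary directory path.

```python
from __future__ import annotations

from pathlib import Path


def load_domain_discovery(planning_artifacts: str) -> dict[str, str]:
    """Return {file name: file text} for every domain-discovery-*.md file.

    Hypothetical helper: a missing folder is treated as "no discovery
    available" rather than as an error, matching the "if available" wording.
    """
    root = Path(planning_artifacts)
    if not root.is_dir():
        return {}
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(root.glob("domain-discovery-*.md"))
    }
```

An empty result simply means the requirements questions above have to be answered from scratch.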

Step 2: Layer Design

Define the data layers and their purpose:

| Layer | Starlake Stage | Purpose | Write Strategy |
|-------|---------------|---------|----------------|
| Landing | incoming/pending | Raw data as-is from source | N/A (file staging) |
| Bronze / Raw | accepted | Validated, typed, privacy-applied | APPEND or OVERWRITE |
| Silver / Curated | transform (business) | Cleaned, deduplicated, conformed | UPSERT_BY_KEY or SCD2 |
| Gold / Mart | transform (business) | Business-ready aggregations | OVERWRITE or OVERWRITE_BY_PARTITION |
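The layer table can also be kept as data so that a proposed write strategy is checked against it during design review. The sketch below is one possible encoding; the `Layer` class and `is_listed_strategy` function are hypothetical names, and treating the listed strategies as the only allowed ones is an assumption, since the table may only be recommending them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    starlake_stage: str
    purpose: str
    write_strategies: tuple[str, ...]  # empty tuple: file staging, no write strategy


# Rows copied from the layer table above.
LAYERS = (
    Layer("Landing", "incoming/pending", "Raw data as-is from source", ()),
    Layer("Bronze / Raw", "accepted", "Validated, typed, privacy-applied",
          ("APPEND", "OVERWRITE")),
    Layer("Silver / Curated", "transform (business)", "Cleaned, deduplicated, conformed",
          ("UPSERT_BY_KEY", "SCD2")),
    Layer("Gold / Mart", "transform (business)", "Business-ready aggregations",
          ("OVERWRITE", "OVERWRITE_BY_PARTITION")),
)


def is_listed_strategy(layer_name: str, strategy: str) -> bool:
    """True if the table lists `strategy` for the layer called `layer_name`."""
    for layer in LAYERS:
        if layer.name == layer_name:
            return strategy in layer.write_strategies
    raise KeyError(f"unknown layer: {layer_name}")
```

For example, `is_listed_strategy("Silver / Curated", "SCD2")` is true, while any strategy proposed for the Landing layer is rejected because that layer only stages files.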

Step 3: Engine & Storage Selection

- Development engine: DuckDB (local, fast iteration)
- Production engine(s): BigQuery / Snowflake / Databricks / PostgreSQL
- File storage: Local filesystem (dev), Cloud storage (prod: GCS, S3, ADLS)
- File format: Parquet (default), JSON (nested/semi-structured), CSV (legacy compatibility)
- Document connection configurations per environment.
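One way to document the engine and storage choice per environment is to record each decision as a small record and render a table for the architecture document. The sketch below assumes nothing about Starlake itself; `EnvironmentChoice` and `render_choices` are hypothetical names, and the BigQuery/GCS pairing in the usage example is just one pick from the options listed above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EnvironmentChoice:
    environment: str
    engine: str
    storage: str
    file_format: str = "Parquet"  # the default format named in this step


def render_choices(choices: list[EnvironmentChoice]) -> str:
    """Render the selections as a pipe table for the architecture document."""
    lines = [
        "| Environment | Engine | Storage | File format |",
        "|---|---|---|---|",
    ]
    for c in choices:
        lines.append(f"| {c.environment} | {c.engine} | {c.storage} | {c.file_format} |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_choices([
        EnvironmentChoice("dev", "DuckDB", "Local filesystem"),
        EnvironmentChoice("prod", "BigQuery", "GCS"),  # illustrative pick only
    ]))
```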

Step 4: Starlake Project Structure

Define the metadata directory structure:

metadata/
  application.sl.yml       # Global config, connections, defaults
  env.sl.yml               # Base environment variables
  env.PROD.sl.yml           # Production overrides
  types/
    default.sl.yml          # Built-in types
    custom.sl.yml           # Project-specific types (regex patterns)
  load/
    {domain}/
      _config.sl.yml        # Domain defaults (incoming dir, connection)
      {table}.sl.yml        # Per-table schema and load config
  transform/
    {domain}/
      {task}.sl.yml          # Transform config (write strategy, sink)
      {task}.sql             # SQL transformation
  extract/
    {source}.sl.yml          # JDBC/API extraction config
  dags/
    {schedule}.sl.yml        # Orchestration DAG definitions
  expectations/
    {domain}.j2              # Reusable data quality macros
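The layout above can be scaffolded with a few lines of Python, which is handy for reviewing the structure before any real configuration exists. This is a sketch under stated assumptions: `scaffold_metadata` is a hypothetical helper, the files it creates are empty placeholders rather than valid Starlake configuration, and the argument values in the usage example are invented.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Relative paths copied from the tree above; {placeholders} are filled per project.
LAYOUT = (
    "application.sl.yml",
    "env.sl.yml",
    "env.PROD.sl.yml",
    "types/default.sl.yml",
    "types/custom.sl.yml",
    "load/{domain}/_config.sl.yml",
    "load/{domain}/{table}.sl.yml",
    "transform/{domain}/{task}.sl.yml",
    "transform/{domain}/{task}.sql",
    "extract/{source}.sl.yml",
    "dags/{schedule}.sl.yml",
    "expectations/{domain}.j2",
)


def scaffold_metadata(root: str, **names: str) -> list[Path]:
    """Create empty placeholder files under <root>/metadata and return their paths."""
    created = []
    for template in LAYOUT:
        path = Path(root) / "metadata" / template.format(**names)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for p in scaffold_metadata(tmp, domain="sales", table="orders",
                                   task="daily_revenue", source="crm",
                                   schedule="nightly"):
            print(p.relative_to(tmp))
```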

Step 5: Governance Framework

- Data classification (public, internal, confidential, restricted)
- Privacy strategy: column-level annotations (HIDE, SHA256, MD5, AES)
- Access control: Starlake ACL policies and IAM integration
- Data retention policies per layer
- Lineage tracking strategy
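To make the link between classification and column-level privacy concrete, the sketch below maps each classification to a default annotation and applies it to a row. It illustrates the idea only and is not Starlake's implementation: the mapping in `DEFAULT_PRIVACY` is a hypothetical policy, HIDE is assumed to blank the value, and AES is left out because it needs key management that this step does not describe.

```python
from __future__ import annotations

import hashlib

# Hypothetical policy: which annotation each classification gets by default.
DEFAULT_PRIVACY = {
    "public": None,
    "internal": None,
    "confidential": "SHA256",
    "restricted": "HIDE",
}


def apply_privacy(value: str, strategy: str | None) -> str:
    """Apply one annotation to one value (illustrative semantics only)."""
    if strategy is None:
        return value
    if strategy == "HIDE":
        return ""  # assumption: hidden values are blanked
    if strategy == "SHA256":
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    if strategy == "MD5":
        return hashlib.md5(value.encode("utf-8")).hexdigest()
    raise ValueError(f"unsupported strategy in this sketch: {strategy}")


def protect_row(row: dict[str, str], classification: dict[str, str]) -> dict[str, str]:
    """Return a copy of `row` with each column treated per its classification."""
    return {
        column: apply_privacy(value, DEFAULT_PRIVACY[classification[column]])
        for column, value in row.items()
    }
```

Hashing keeps a column usable as a join key while hiding the raw value, which is one reason to reserve HIDE for the most sensitive classification.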

Step 6: Environment Strategy

- Development: DuckDB + local filesystem
- Staging: Target engine + cloud storage (subset of data)
- Production: Target engine + cloud storage (full data)
- Environment switching via SL_ENV and env.{ENV}.sl.yml files
- Connection references (connectionRef) switchable per environment
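The environment switch can be pictured as "base file first, then the file named after SL_ENV on top". The sketch below models that layering with plain dictionaries. It is an assumption about precedence made for illustration, not a description of Starlake's loader, and the function names and variable values are hypothetical.

```python
from __future__ import annotations

import os


def env_file_chain(sl_env: str | None) -> list[str]:
    """Files to read, lowest precedence first, for a given SL_ENV value."""
    chain = ["env.sl.yml"]
    if sl_env:
        chain.append(f"env.{sl_env}.sl.yml")
    return chain


def merge_env(base: dict[str, str], override: dict[str, str]) -> dict[str, str]:
    """Environment-specific values replace base values key by key."""
    return {**base, **override}


if __name__ == "__main__":
    active = os.environ.get("SL_ENV")  # e.g. "PROD"; unset means base only
    print(env_file_chain(active))
    # Hypothetical variables: the same connectionRef name points elsewhere in PROD.
    print(merge_env({"connectionRef": "duckdb_local"}, {"connectionRef": "bigquery_prod"}))
```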

Step 7: Output Generation

Generate the data architecture document and save to {planning_artifacts}/data-architecture-{{project_name}}.md.

Outcome

A comprehensive data architecture document covering layers, engines, Starlake project structure, governance, and environment strategy — ready to guide all pipeline implementation.

Design the overall data architecture including layers, storage, engines, and governance. Use when the user says "create data architecture" or "design the data platform".

1 star

development

Updated Apr 16, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add starlake-ai/starlake-skills starflow-create-data-architecture

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

- Trivy (container and dependency vulnerability scanner): Clean, 95%
- Semgrep (static code analysis for vulnerabilities): Clean, 95%
- mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
- Snyk (dep) (open source security scanning): Skipped, 50%
- Socket.dev (supply chain security analysis): Skipped, 50%
- VirusTotal (multi-engine malware detection): Skipped, 50%
- CrowdStrike (advanced threat intelligence): Skipped, 50%
- OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
- OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 16, 2026, 3:37 AM (8.0s, 1 file scanned)

Related Skills

starlake-ai/starflow-transform-design (development; Verified, Trusted, Community)
Design SQL transformations for data pipelines with quality checks and dependency management. Use when the user says "design transforms" or "create SQL transformations".
1 SKILL.md, updated Apr 16, 2026

starlake-ai/starflow-sprint-planning (devops; Verified, Trusted, Community)
Plan and track sprint progress for data pipeline implementation. Use when the user says "sprint planning" or "plan data sprint".
1 SKILL.md, updated Apr 16, 2026

starlake-ai/starflow-source-analysis (testing; Verified, Trusted, Community)
Analyze data sources in depth: schema, quality, volume, and extraction strategy. Use when the user says "analyze data source" or "profile this data source".
1 SKILL.md, updated Apr 16, 2026

starlake-ai/starflow-schema-design (data-ai; Verified, Trusted, Community)
Design Starlake-compatible table schemas with types, constraints, privacy, and expectations. Use when the user says "design schema" or "create table definition".
1 SKILL.md, updated Apr 16, 2026

Install manually

# Clone the repo
git clone https://github.com/starlake-ai/starlake-skills.git

# Copy into Claude Code skills folder (global); create the folder first if it is missing
mkdir -p ~/.claude/skills
cp -r starlake-skills/.agents/starflow/skills/starflow-create-data-architecture ~/.claude/skills/

Repository

starlake-ai/starlake-skills

1 star

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT