
starlake-ai/starflow-source-analysis

Name: starflow-source-analysis
Author: starlake-ai

.agents/starflow/skills/starflow-source-analysis/SKILL.md

npx skillsauth add starlake-ai/starlake-skills starflow-source-analysis


Data Source Analysis

Overview

Performs a deep analysis of a specific data source, profiling its schema, data quality characteristics, volume patterns, and extraction requirements. Produces a source analysis document that feeds directly into pipeline specification and Starlake load configuration.

Role Guidance: Act as a Data Analyst with expertise in data profiling and source system analysis.

Design Rationale: Each data source has unique characteristics that determine how it should be extracted, validated, and loaded. Understanding these upfront prevents pipeline failures and data quality issues in production.

Steps

Step 1: Source Identification

Ask the user to identify the source to analyze:
- Source name and type (database table, file, API, stream)
- Connection details or sample data
- Business context: what does this data represent?
If a domain discovery document exists at {planning_artifacts}/domain-discovery-*.md, load it for context.
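
As a minimal sketch of the lookup described above, the helper below globs for domain discovery documents and concatenates their text. The function names and the idea of passing the resolved `{planning_artifacts}` folder as a plain string are assumptions made for illustration; they are not part of the skill itself.

```python
from pathlib import Path


def find_domain_discovery_docs(planning_artifacts: str) -> list[Path]:
    """Return domain-discovery-*.md files in the planning folder, sorted by name."""
    root = Path(planning_artifacts)
    if not root.is_dir():
        # No planning folder yet: nothing to load, which is not an error.
        return []
    return sorted(root.glob("domain-discovery-*.md"))


def load_domain_context(planning_artifacts: str) -> str:
    """Concatenate every discovery document found, or return an empty string."""
    docs = find_domain_discovery_docs(planning_artifacts)
    return "\n\n".join(doc.read_text(encoding="utf-8") for doc in docs)
```

An empty result simply means the analysis proceeds with only the context the user supplied.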

Step 2: Schema Analysis

For each attribute/column, document:

| Field | Description |
|-------|-------------|
| Name | Column/field name |
| Type | Data type (maps to Starlake types: string, integer, long, double, decimal, boolean, date, timestamp, bytes) |
| Nullable | Whether NULLs are allowed |
| Primary Key | Part of unique identifier |
| Foreign Key | References to other tables/sources |
| Pattern | Regex pattern for validation (Starlake custom types) |
| Privacy | Privacy classification (PII, sensitive, public) and recommended transform (HIDE, SHA256, MD5, AES) |
| Sample values | Representative examples |
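
One way to hold these fields while profiling is a small record type plus a type-guessing helper. The sketch below is illustrative only: `ColumnProfile` and `infer_starlake_type` are hypothetical names, the helper only looks at string samples in ISO formats, and the 32-bit cutoff between `integer` and `long` is an assumption rather than something the skill specifies.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import Callable, Optional

# Type names taken from the Type row of the table above.
STARLAKE_TYPES = {"string", "integer", "long", "double", "decimal",
                  "boolean", "date", "timestamp", "bytes"}


@dataclass
class ColumnProfile:
    """One documented column; fields mirror the rows of the table above."""
    name: str
    type: str
    nullable: bool = True
    primary_key: bool = False
    foreign_key: Optional[str] = None
    pattern: Optional[str] = None
    privacy: Optional[str] = None
    sample_values: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.type not in STARLAKE_TYPES:
            raise ValueError(f"unknown type: {self.type}")


def _parses(parser: Callable[[str], object], value: str) -> bool:
    """True when parser(value) succeeds without raising ValueError."""
    try:
        parser(value)
    except ValueError:
        return False
    return True


def infer_starlake_type(samples: list[Optional[str]]) -> str:
    """Guess a type from string samples, falling back to 'string'."""
    values = [s.strip() for s in samples if s is not None and s.strip()]
    if not values:
        return "string"
    if all(v.lower() in {"true", "false"} for v in values):
        return "boolean"
    if all(_parses(int, v) for v in values):
        # Assumption: values outside the signed 32-bit range need 'long'.
        fits_32_bits = all(-2**31 <= int(v) < 2**31 for v in values)
        return "integer" if fits_32_bits else "long"
    if all(_parses(float, v) for v in values):
        return "double"
    if all(_parses(date.fromisoformat, v) for v in values):
        return "date"
    if all(_parses(datetime.fromisoformat, v) for v in values):
        return "timestamp"
    return "string"
```

A guessed type is only a starting point; the analyst should still confirm it against the source system's declared schema.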

Step 3: Quality Profiling

Assess data quality dimensions:

- Completeness: NULL rates per column
- Uniqueness: Duplicate detection on key columns
- Validity: Values matching expected patterns/ranges
- Consistency: Cross-column consistency rules
- Timeliness: Data freshness and update patterns

Recommend Starlake expectations (Jinja2 macros) for each quality check.
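
The first three dimensions can be measured on a sample before any expectation is written. The following is a rough sketch over rows held as plain dictionaries; the function names are hypothetical, and translating the resulting numbers into Starlake expectation macros is deliberately left out because the macro names depend on the project.

```python
import re
from collections import Counter
from typing import Any

Row = dict[str, Any]


def null_rate(rows: list[Row], column: str) -> float:
    """Completeness: share of rows where the column is missing or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def duplicate_keys(rows: list[Row], key_columns: list[str]) -> dict[tuple, int]:
    """Uniqueness: key tuples that occur more than once, with their counts."""
    counts = Counter(tuple(row.get(c) for c in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


def validity_rate(rows: list[Row], column: str, pattern: str) -> float:
    """Validity: share of non-null values that fully match the regex."""
    regex = re.compile(pattern)
    values = [row[column] for row in rows if row.get(column) is not None]
    if not values:
        return 1.0
    matching = sum(1 for value in values if regex.fullmatch(str(value)))
    return matching / len(values)


sample = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "not-an-email"},
    {"id": 3, "email": "b@example.com"},
]
print(null_rate(sample, "email"))       # 0.25
print(duplicate_keys(sample, ["id"]))   # {(2,): 2}
```

NULL values are excluded from the validity rate so that completeness and validity are reported as separate findings.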

Step 4: Volume & Pattern Analysis

- Row count (current and growth trend)
- Record size (average, max)
- Update pattern: append-only, full refresh, incremental (CDC), SCD Type 2
- Recommended Starlake write strategy: APPEND, OVERWRITE, UPSERT_BY_KEY, UPSERT_BY_KEY_AND_TIMESTAMP, SCD2, DELETE_THEN_INSERT, OVERWRITE_BY_PARTITION, ADAPTATIVE
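
To make the link between update pattern and write strategy concrete, here is one possible heuristic. The mapping is an assumption made for illustration, not a rule stated by the skill or by Starlake: it covers only the four update patterns listed above, and the function name and flags are hypothetical.

```python
def recommend_write_strategy(update_pattern: str,
                             has_key: bool = False,
                             has_timestamp: bool = False) -> str:
    """Suggest a write strategy name for one of the listed update patterns."""
    pattern = update_pattern.strip().lower()
    if pattern == "append-only":
        return "APPEND"
    if pattern == "full refresh":
        return "OVERWRITE"
    if pattern == "scd type 2":
        return "SCD2"
    if pattern == "incremental":
        # Assumption: a key plus a change timestamp lets the newest row win.
        if has_key and has_timestamp:
            return "UPSERT_BY_KEY_AND_TIMESTAMP"
        if has_key:
            return "UPSERT_BY_KEY"
        # Assumption: without a key, existing rows cannot be matched,
        # so new rows can only be appended.
        return "APPEND"
    raise ValueError(f"unknown update pattern: {update_pattern!r}")


print(recommend_write_strategy("incremental", has_key=True))  # UPSERT_BY_KEY
```

Strategies such as DELETE_THEN_INSERT, OVERWRITE_BY_PARTITION, and ADAPTATIVE are left to the analyst's judgment because choosing them needs more context than the update pattern alone.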

Step 5: Extraction Strategy

- Recommended extraction method (full, incremental, CDC)
- Extraction frequency
- Partitioning strategy (if applicable)
- File format recommendation (Parquet preferred for columnar analytics, JSON for nested structures)
- Error handling: what happens with rejected records?
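
The error-handling question is easier to answer with a concrete shape for rejected records. The sketch below separates accepted records from rejected ones and keeps the reasons for each rejection; all names (`split_records`, `require`, `non_negative`) are hypothetical, and this is not how Starlake itself handles rejects.

```python
from typing import Any, Callable, Optional

Record = dict[str, Any]
# A validator returns None when the record passes, or a reason string.
Validator = Callable[[Record], Optional[str]]


def split_records(records: list[Record],
                  validators: list[Validator]) -> tuple[list[Record], list[dict]]:
    """Return (accepted, rejected); each rejected entry keeps its reasons."""
    accepted: list[Record] = []
    rejected: list[dict] = []
    for record in records:
        reasons = [msg for check in validators
                   if (msg := check(record)) is not None]
        if reasons:
            rejected.append({"record": record, "reasons": reasons})
        else:
            accepted.append(record)
    return accepted, rejected


def require(column: str) -> Validator:
    """Validator that rejects records where the column is missing or None."""
    def check(record: Record) -> Optional[str]:
        return None if record.get(column) is not None else f"{column} is missing"
    return check


def non_negative(column: str) -> Validator:
    """Validator that rejects records with a negative value in the column."""
    def check(record: Record) -> Optional[str]:
        value = record.get(column)
        if value is not None and value < 0:
            return f"{column} is negative"
        return None
    return check
```

Keeping the reasons next to each rejected record makes it possible to state in the analysis document whether rejects are quarantined, retried, or dropped.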

Step 6: Output Generation

Generate the source analysis document and save to {planning_artifacts}/source-analysis-{{source_name}}.md.
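
A small sketch of this step, assuming the placeholders have already been resolved to a folder path and a source name. The slug rule (lowercase, runs of other characters collapsed to hyphens) and the `##` section headings are assumptions chosen for the example, not a format the skill prescribes.

```python
import re
from pathlib import Path


def source_analysis_path(planning_artifacts: str, source_name: str) -> Path:
    """Build <planning_artifacts>/source-analysis-<slug>.md for a source."""
    # Assumption: the source name is slugified so it is safe as a file name.
    slug = re.sub(r"[^A-Za-z0-9_-]+", "-", source_name.strip()).strip("-").lower()
    if not slug:
        raise ValueError("source_name must contain at least one usable character")
    return Path(planning_artifacts) / f"source-analysis-{slug}.md"


def write_source_analysis(planning_artifacts: str, source_name: str,
                          sections: dict[str, str]) -> Path:
    """Write one heading per section and return the path that was written."""
    path = source_analysis_path(planning_artifacts, source_name)
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"# Source Analysis: {source_name}", ""]
    for title, body in sections.items():
        lines += [f"## {title}", "", body, ""]
    path.write_text("\n".join(lines), encoding="utf-8")
    return path
```

Calling `write_source_analysis("planning", "Sales Orders", {"Schema": "..."})` would write `planning/source-analysis-sales-orders.md`.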

Outcome

A detailed source analysis document with schema definition, quality profile, volume characteristics, and extraction strategy — ready for pipeline specification and Starlake YAML configuration generation.


Analyze data sources in depth: schema, quality, volume, and extraction strategy. Use when the user says "analyze data source" or "profile this data source".

1 star

testing

Updated Apr 16, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add starlake-ai/starlake-skills starflow-source-analysis

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Reported % |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 16, 2026, 3:37 AM (5.9s, 1 file scanned)

SKILL.md

name: starflow-source-analysis
description: Analyze data sources in depth: schema, quality, volume, and extraction strategy. Use when the user says "analyze data source" or "profile this data source".


Related Skills

starlake-ai/starflow-transform-design
development
Design SQL transformations for data pipelines with quality checks and dependency management. Use when the user says "design transforms" or "create SQL transformations".
1 · SKILL.md · Updated Apr 16, 2026

starlake-ai/starflow-sprint-planning
devops
Plan and track sprint progress for data pipeline implementation. Use when the user says "sprint planning" or "plan data sprint".
1 · SKILL.md · Updated Apr 16, 2026

starlake-ai/starflow-schema-design
data-ai
Design Starlake-compatible table schemas with types, constraints, privacy, and expectations. Use when the user says "design schema" or "create table definition".
1 · SKILL.md · Updated Apr 16, 2026

starlake-ai/starflow-platform-engineer
devops
Platform Engineer agent: manages infrastructure, orchestration, and deployment for data pipelines. Use when the user says "platform-engineer" or "talk to the platform-engineer".
1 · SKILL.md · Updated Apr 16, 2026


Install manually


# Clone the repo
git clone https://github.com/starlake-ai/starlake-skills.git

# Copy into Claude Code skills folder (global)
cp -r starlake-skills/.agents/starflow/skills/starflow-source-analysis ~/.claude/skills/


Repository

starlake-ai/starlake-skills

1 star

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT