
starlake-ai/extract

Name: extract
Author: starlake-ai

.agents/skills/extract/SKILL.md

npx skillsauth add starlake-ai/starlake-skills extract


Extract Skill

Combines schema extraction and data extraction in a single command: it first extracts the database schema metadata into Starlake YAML files, then extracts the table data into files. It is a convenience command that runs extract-schema followed by extract-data.

Usage

starlake extract [options]

Options

Accepts all options of extract-schema and extract-data.

Schema Extraction Options

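--config <value>: Name of the extract configuration (metadata/extract/{name}.sl.yml) that lists the database tables and connection
--outputDir <value>: Directory where the YAML files are written
--tables <value>: Database tables to extract, as a comma-separated list of schema.table names
--connectionRef <value>: JDBC connection reference (a connection defined in application.sl.yml)
--all: Extract all schemas and tables
--external: Output YAML files to the external folder
--parallelism <value>: Parallelism level
--snakecase: Apply snake_case to column names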
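The exact renaming rules behind --snakecase are not documented here. The following Python sketch shows one common interpretation, assumed for illustration: split on camelCase and acronym boundaries, turn spaces and hyphens into underscores, then lowercase. The to_snake_case function is hypothetical and not part of Starlake.

```python
import re

def to_snake_case(name: str) -> str:
    """Hypothetical snake_case conversion; Starlake's actual rules may differ."""
    # "orderId" -> "order_Id": underscore between a lowercase letter/digit and an uppercase letter
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # "HTTPStatus" -> "HTTP_Status": split an acronym run from the capitalized word after it
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", s)
    # Spaces and hyphens become underscores; repeated underscores collapse to one
    s = re.sub(r"[\s\-]+", "_", s)
    s = re.sub(r"_+", "_", s)
    return s.lower()

for column in ["orderId", "CustomerName", "HTTPStatusCode", "created at", "unit-price"]:
    print(column, "->", to_snake_case(column))
```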

Data Extraction Options

--limit <value>: Maximum number of records to extract
--numPartitions <value>: Number of partitions used to read a table in parallel
--ignoreExtractionFailure: Continue when a table fails to extract
--clean: Clean target files before extraction
--incremental: Export only new data since the last extraction
--includeSchemas <value>: Schemas (domains) to include
--excludeSchemas <value>: Schemas (domains) to exclude
--includeTables <value>: Tables to include
--excludeTables <value>: Tables to exclude
--reportFormat <value>: Report output format: console, json, or html
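The include and exclude options narrow the set of schemas and tables that get extracted. This section does not say how Starlake combines them. The Python sketch below makes three assumptions: an empty include list means "keep everything", an exclusion overrides an inclusion, and names are compared case-insensitively. The select_tables helper is hypothetical.

```python
def select_tables(tables, include_schemas=(), exclude_schemas=(),
                  include_tables=(), exclude_tables=()):
    """Return the (schema, table) pairs that survive the filters.

    Assumed semantics (not confirmed Starlake behavior): empty include
    lists keep everything, exclusions win, matching ignores case.
    """
    def norm(values):
        return {v.lower() for v in values}

    inc_s, exc_s = norm(include_schemas), norm(exclude_schemas)
    inc_t, exc_t = norm(include_tables), norm(exclude_tables)
    selected = []
    for schema, table in tables:
        s, t = schema.lower(), table.lower()
        if (inc_s and s not in inc_s) or s in exc_s:
            continue  # schema filtered out
        if (inc_t and t not in inc_t) or t in exc_t:
            continue  # table filtered out
        selected.append((schema, table))
    return selected

# Hypothetical source catalog
catalog = [("sales", "orders"), ("sales", "customers"), ("hr", "employees")]
print(select_tables(catalog, include_schemas=["sales"], exclude_tables=["customers"]))
# [('sales', 'orders')]
```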

Configuration Context

Extract commands use a configuration file (metadata/extract/{name}.sl.yml) to define which schemas and tables to extract:

# metadata/extract/externals.sl.yml
version: 1
extract:
  connectionRef: "duckdb"
  jdbcSchemas:
    - schema: "starbake"
      tables:
        - name: "*"              # "*" to extract all tables
      tableTypes:
        - "TABLE"
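The name: "*" entry selects every table in the schema whose type is listed under tableTypes. The Python sketch below shows one way such a tables list could be resolved against a catalog of (name, type) pairs. It assumes "*" is the only wildcard and that literal names match case-insensitively. The catalog contents and the resolve_tables function are hypothetical.

```python
def resolve_tables(requested, table_types, catalog):
    """Resolve a jdbcSchemas `tables` list against a schema catalog.

    requested:   names from the config, e.g. ["*"] or ["orders"]
    table_types: allowed types from `tableTypes`, e.g. ["TABLE"]
    catalog:     (table_name, table_type) pairs found in the schema
    Assumption: "*" selects all tables; other names match case-insensitively.
    """
    allowed = {t.upper() for t in table_types}
    candidates = [name for name, kind in catalog if kind.upper() in allowed]
    if "*" in requested:
        return candidates
    wanted = {r.lower() for r in requested}
    return [name for name in candidates if name.lower() in wanted]

# Hypothetical contents of the "starbake" schema
catalog = [("orders", "TABLE"), ("products", "TABLE"), ("order_summary", "VIEW")]
print(resolve_tables(["*"], ["TABLE"], catalog))
# ['orders', 'products']
```

In this sketch, a view is picked up only when "VIEW" is also listed under tableTypes, as in the advanced configuration below.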

Advanced Extract Configuration

# metadata/extract/source_db.sl.yml
version: 1
extract:
  connectionRef: "source_postgres"
  jdbcSchemas:
    - schema: "sales"
      tableTypes:
        - "TABLE"
        - "VIEW"
      tables:
        - name: "orders"
          fullExport: false          # Incremental extraction
          partitionColumn: "id"      # Column for parallel extraction
          numPartitions: 4           # Parallelism level
          timestamp: "updated_at"    # Incremental tracking column
          fetchSize: 1000            # JDBC fetch size
        - name: "customers"
          fullExport: true
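With partitionColumn and numPartitions, a table can be read by several parallel queries, each covering one slice of the partition column. The Python sketch below illustrates the common JDBC technique of splitting [lower, upper] into equal-width strides. The first slice also catches NULLs and the last slice is open-ended, similar to Spark's JDBC data source. It is an assumption that Starlake computes its bounds this way, and the partition_predicates helper is hypothetical.

```python
def partition_predicates(column, lower, upper, num_partitions):
    """Build one WHERE clause per partition over the integer range [lower, upper].

    The first clause also catches values below the range and NULLs; the last
    clause is open-ended, so every row falls into exactly one partition.
    """
    if num_partitions <= 1 or upper <= lower:
        return ["1=1"]  # a single query reads the whole table
    # Integer stride boundaries between consecutive partitions
    bounds = [lower + i * (upper - lower) // num_partitions
              for i in range(1, num_partitions)]
    predicates = [f"{column} < {bounds[0]} OR {column} IS NULL"]
    for lo, hi in zip(bounds, bounds[1:]):
        predicates.append(f"{column} >= {lo} AND {column} < {hi}")
    predicates.append(f"{column} >= {bounds[-1]}")
    return predicates

# Hypothetical MIN(id) = 1 and MAX(id) = 1000 for sales.orders, numPartitions: 4
for predicate in partition_predicates("id", 1, 1000, 4):
    print(f"SELECT * FROM sales.orders WHERE {predicate}")
```

Equal-width strides assume the partition column is roughly uniformly distributed. A skewed id column produces unbalanced partitions, even though each row is still read exactly once.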

Connection Configuration

The connection referenced in the extract config must be defined in application.sl.yml:

# metadata/application.sl.yml
version: 1
application:
  connections:
    source_postgres:
      type: jdbc
      options:
        url: "jdbc:postgresql://{{PG_HOST}}:{{PG_PORT}}/{{PG_DB}}"
        driver: "org.postgresql.Driver"
        user: "{{DATABASE_USER}}"
        password: "{{DATABASE_PASSWORD}}"
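Values such as {{PG_HOST}} and {{DATABASE_PASSWORD}} are placeholders, so credentials stay out of the YAML file. This section does not say where Starlake resolves them from. The Python sketch below assumes they come from a plain mapping such as os.environ and raises an error on any undefined name. The render function is hypothetical.

```python
import os
import re

# Matches {{NAME}}, tolerating spaces inside the braces
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def render(template: str, variables=None) -> str:
    """Replace {{NAME}} placeholders with values from `variables` (default: os.environ)."""
    variables = os.environ if variables is None else variables

    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"undefined variable: {name}")
        return variables[name]

    return PLACEHOLDER.sub(substitute, template)

url = "jdbc:postgresql://{{PG_HOST}}:{{PG_PORT}}/{{PG_DB}}"
print(render(url, {"PG_HOST": "localhost", "PG_PORT": "5432", "PG_DB": "sales"}))
# jdbc:postgresql://localhost:5432/sales
```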

OpenAPI Extract Configuration

Extract schemas from OpenAPI/Swagger specifications:

# metadata/extract/api.sl.yml
version: 1
extract:
  openAPI:
    basePath: /api/v2

    domains:
      - name: customers_api

        # Schema filtering (regex)
        schemas:
          exclude:
            - "Model\\.Common\\.Id"
            - "Internal\\..*"

        # Route selection
        routes:
          - paths:
              include:
                - "/users"
                - "/orders"
                - "/products"
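In this configuration, schemas.exclude holds regular expressions matched against schema names, and routes[].paths.include lists the API paths to keep. Inside a double-quoted YAML string, "\\." parses to the regex \. (a literal dot). The Python sketch below makes two assumptions: an exclude pattern must match the whole schema name, and include entries are exact paths. Starlake may apply different matching rules, and the keep_schema and keep_path functions are hypothetical.

```python
import re

# Patterns as they look after YAML parsing ("\\." in YAML -> "\." in the regex)
SCHEMA_EXCLUDES = [r"Model\.Common\.Id", r"Internal\..*"]
PATH_INCLUDES = ["/users", "/orders", "/products"]

def keep_schema(name, excludes=SCHEMA_EXCLUDES):
    """Keep a schema unless an exclude pattern matches its full name (assumed semantics)."""
    return not any(re.fullmatch(pattern, name) for pattern in excludes)

def keep_path(path, includes=PATH_INCLUDES):
    """Keep a route only if it is listed exactly (assumed semantics)."""
    return path in includes

schemas = ["Model.Common.Id", "Model.Customer", "Internal.AuditLog"]
print([s for s in schemas if keep_schema(s)])
# ['Model.Customer']
print([p for p in ["/users", "/health", "/orders"] if keep_path(p)])
# ['/users', '/orders']
```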

Freshness Monitoring

Track data freshness with timestamp columns after extraction:

# Check freshness for specific tables
starlake freshness --tables dataset1.table1,dataset2.table2 --persist true

Monitoring table: SL_LAST_EXPORT in the audit schema.
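A freshness check compares the last time a table was exported or loaded with the current time. The Python sketch below shows that comparison with hypothetical warn and error thresholds. The threshold names, the FRESH/WARN/ERROR labels, and the freshness_status function are illustrative assumptions, not the documented behavior of starlake freshness.

```python
from datetime import datetime, timedelta, timezone

def freshness_status(last_export, now, warn_after, error_after):
    """Classify a table by the age of its last export (hypothetical thresholds)."""
    age = now - last_export
    if age >= error_after:
        return "ERROR"
    if age >= warn_after:
        return "WARN"
    return "FRESH"

now = datetime(2026, 4, 16, 12, 0, tzinfo=timezone.utc)
last = now - timedelta(hours=30)  # e.g. the table's last export time
print(freshness_status(last, now, warn_after=timedelta(hours=24), error_after=timedelta(hours=48)))
# WARN
```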

Examples

Extract Schema and Data

starlake extract --config externals --outputDir metadata/load

Extract with Incremental Mode

starlake extract --config source_db --outputDir /tmp/output --incremental
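In incremental mode, only rows whose tracking column (timestamp: "updated_at" in the advanced configuration) is newer than the previous extraction are exported. The new high-water mark is then kept for the next run. The Python sketch below shows this high-water-mark pattern on in-memory rows. Where Starlake stores the previous value (presumably the SL_LAST_EXPORT audit table) and the exact comparison it uses are assumptions. The incremental_batch helper is hypothetical.

```python
from datetime import datetime

def incremental_batch(rows, column, last_value):
    """Return rows newer than `last_value` on `column`, plus the new high-water mark.

    last_value=None means there was no previous extraction, so every row is exported.
    """
    batch = [r for r in rows if last_value is None or r[column] > last_value]
    new_mark = max((r[column] for r in batch), default=last_value)
    return batch, new_mark

orders = [
    {"id": 1, "updated_at": datetime(2026, 4, 14, 9, 0)},
    {"id": 2, "updated_at": datetime(2026, 4, 15, 17, 30)},
    {"id": 3, "updated_at": datetime(2026, 4, 16, 8, 15)},
]

# First run: no stored mark, so everything is exported
batch, mark = incremental_batch(orders, "updated_at", None)
print([r["id"] for r in batch], mark)
# Later run: only rows updated after the stored mark
batch, mark = incremental_batch(orders, "updated_at", datetime(2026, 4, 15, 0, 0))
print([r["id"] for r in batch], mark)
```

A strict > avoids re-exporting the boundary row. However, it can miss rows committed later with the same timestamp. Using >= with deduplication downstream is the safer variant.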

Extract Specific Tables

starlake extract --config source_db --tables sales.orders,sales.customers
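The --tables value is a comma-separated list of schema.table names. Below is a small Python sketch of how such a value can be split and validated. The parse_tables helper is hypothetical, and this section does not say whether Starlake also accepts bare table names without a schema.

```python
def parse_tables(value: str):
    """Split "sales.orders,sales.customers" into (schema, table) pairs."""
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing or doubled commas
        schema, sep, table = item.partition(".")
        if not sep or not schema or not table:
            raise ValueError(f"expected schema.table, got {item!r}")
        pairs.append((schema, table))
    return pairs

print(parse_tables("sales.orders,sales.customers"))
# [('sales', 'orders'), ('sales', 'customers')]
```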

Related Skills

extract-schema - Extract schema only
extract-data - Extract data only
extract-script - Generate extraction scripts from templates
freshness - Check data freshness
load - Load extracted data into the warehouse


Description: Extract both schema and data from a JDBC source
Stars: 1
Category: data-ai
Updated: Apr 16, 2026


Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy - Container and dependency vulnerability scanner: Clean (95%)
Semgrep - Static code analysis for vulnerabilities: Clean (95%)
mcp-scan (Snyk) - Model Context Protocol security validation: Clean (95%)
Snyk (dep) - Open source security scanning: Skipped (50%)
Socket.dev - Supply chain security analysis: Skipped (50%)
VirusTotal - Multi-engine malware detection: Skipped (50%)
CrowdStrike - Advanced threat intelligence: Skipped (50%)
OSV-Scanner - Open Source Vulnerability database check: Skipped (50%)
OWASP Dep-Check: Skipped (50%)

Last scanned: Apr 16, 2026, 3:36 AM (11.8 s, 1 file scanned)




Install manually


# Clone the repo
git clone https://github.com/starlake-ai/starlake-skills.git

# Copy into Claude Code skills folder (global)
cp -r starlake-skills/.agents/skills/extract ~/.claude/skills/


Repository: starlake-ai/starlake-skills (1 star)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT