starlake-ai/autoload

Name: autoload
Author: starlake-ai

.agents/skills/autoload/SKILL.md

npx skillsauth add starlake-ai/starlake-skills autoload

AutoLoad Skill

Watches the incoming directory, automatically infers schemas for new data files, generates the corresponding YAML table definitions, and loads the data into the data warehouse. This is the quickest way to load data because it combines schema inference and loading in a single step.

Usage

starlake autoload [options]

Options

--domains <value>: Comma-separated list of domains to watch (default: all)
--tables <value>: Comma-separated list of tables to watch (default: all)
--clean: Overwrite existing mapping/schema files before starting
--accessToken <value>: Access token for authentication (e.g. GCP)
--scheduledDate <value>: Scheduled date for the job, format: yyyy-MM-dd'T'HH:mm:ss.SSSZ
--options k1=v1,k2=v2: Substitution arguments passed to the watch job
--reportFormat <value>: Report output format: console, json, or html
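
The `--scheduledDate` format appears to follow Java date-pattern conventions, where `SSS` is milliseconds and `Z` is a numeric zone offset such as `+0000`. As a minimal sketch, assuming a POSIX-style `date` command and that a fixed `.000` millisecond field is acceptable, a matching timestamp could be produced as follows. The variable name `scheduled` and the particular combination of flags are illustrative only.

```shell
# Build a UTC timestamp shaped like yyyy-MM-dd'T'HH:mm:ss.SSSZ.
# Assumption: milliseconds are hard-coded to 000 because portable `date`
# has no millisecond field.
scheduled="$(date -u +"%Y-%m-%dT%H:%M:%S.000%z")"

# Print the command instead of running it, so the sketch works without Starlake installed.
echo "starlake autoload --domains starbake --scheduledDate $scheduled --reportFormat json"
```

The printed line can be copied into a terminal or a scheduler entry once the timestamp looks right.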

How It Works

Scans the incoming directory for new data files
Infers the schema from file contents (column names, types)
Generates _config.sl.yml and {table}.sl.yml files in metadata/load/
Loads the data into the data warehouse using the inferred schema

The incoming directory is defined in application.sl.yml or env.sl.yml:

# metadata/env.sl.yml
version: 1
env:
  incoming_path: "{{SL_ROOT}}/datasets/incoming"
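
For the steps above to do anything, the incoming directory needs at least one data file. The following is a minimal sketch, assuming files for a domain sit in a subfolder named after it (as the generated `directory: "{{incoming_path}}/starbake"` setting on this page suggests) and that Starlake reads the project root from the `SL_ROOT` environment variable. The temporary project root and the sample record are made up for illustration.

```shell
# Hypothetical project root for the demo; a real project already has its own SL_ROOT.
SL_ROOT="$(mktemp -d)"
incoming="$SL_ROOT/datasets/incoming/starbake"
mkdir -p "$incoming"

# Drop one JSON file holding an array of flat records into the domain folder.
cat > "$incoming/orders_20260416.json" <<'EOF'
[
  {"customer_id": 1, "order_id": 100, "status": "shipped", "timestamp": "2026-04-16T03:36:00Z"}
]
EOF

# Show what autoload would find, then print the command to run next.
ls "$incoming"
echo "next: SL_ROOT=$SL_ROOT starlake autoload --domains starbake"
```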

Configuration Context

AutoLoad creates table definitions like the following in metadata/load/{domain}/:

# Auto-generated: metadata/load/starbake/_config.sl.yml
version: 1
load:
  name: "starbake"
  metadata:
    directory: "{{incoming_path}}/starbake"

# Auto-generated: metadata/load/starbake/orders.sl.yml
version: 1
table:
  name: "orders"
  pattern: "orders_.*.json"
  attributes:
    - name: "customer_id"
      type: "long"
    - name: "order_id"
      type: "long"
    - name: "status"
      type: "string"
    - name: "timestamp"
      type: "iso_date_time"
  metadata:
    format: "JSON_FLAT"
    encoding: "UTF-8"
    array: true
    writeStrategy:
      type: "APPEND"
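
The `pattern` value is a regular expression, so the unescaped `.` before `json` matches any character, not only a literal dot. One quick way to check which file names a pattern accepts is sketched below with `grep -E`, assuming the whole file name has to match. The helper name `matches` and the sample file names are invented for this example.

```shell
pattern='orders_.*.json'

# Report whether a file name matches the pattern from start to end.
matches() {
  if printf '%s\n' "$1" | grep -Eq "^(${pattern})\$"; then
    echo "$1: match"
  else
    echo "$1: no match"
  fi
}

matches orders_20260416.json   # expected file name
matches orders_2026xjson       # also accepted, because "." matches "x"
matches customers_1.json       # rejected, wrong prefix
```

Writing the pattern as `orders_.*\.json` would reject the second name.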

Load Strategies

The loadStrategyClass in application.sl.yml controls how files are ordered for processing during autoload:

Standard Strategies

| Strategy Class | Description | Ordering |
|---|---|---|
| ai.starlake.job.load.IngestionTimeStrategy | Load by file modification time | Oldest first |
| ai.starlake.job.load.IngestionNameStrategy | Load by lexicographical filename order | Alphabetical |

Configuration:

# metadata/application.sl.yml
application:
  loadStrategyClass: "ai.starlake.job.load.IngestionNameStrategy"
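
The two orderings only differ when a file with a later name arrives first. Here is a rough illustration of that case using plain shell tools instead of Starlake itself; the directory, file names, and timestamps are invented.

```shell
demo="$(mktemp -d)"

# b_orders.json is older than a_orders.json (touch -t takes [[CC]YY]MMDDhhmm).
touch -t 202604160100 "$demo/b_orders.json"
touch -t 202604160200 "$demo/a_orders.json"

# Oldest modification time first, mirroring the description of IngestionTimeStrategy.
by_time="$(ls -tr "$demo" | paste -s -d ' ' -)"

# Lexicographical file-name order, mirroring the description of IngestionNameStrategy.
by_name="$(ls "$demo" | LC_ALL=C sort | paste -s -d ' ' -)"

echo "by time: $by_time"
echo "by name: $by_name"
```

Name-based ordering is predictable when file names carry a sortable date or sequence number, while time-based ordering follows arrival order regardless of naming.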

Custom Load Strategy

Implement the ai.starlake.job.load.LoadStrategy interface:

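package com.mycompany.starlake

import ai.starlake.job.load.LoadStrategy
import ai.starlake.storage.StorageHandler
import com.typesafe.scalalogging.StrictLogging
import org.apache.hadoop.fs.Path
import java.time.LocalDateTime
// FileInfo must also be in scope; import it from the Starlake package that defines it.

object CustomLoadStrategy extends LoadStrategy with StrictLogging {
  // Returns the files found under `path`, in the order they should be loaded.
  def list(
    storageHandler: StorageHandler,
    path: Path,
    extension: String = "",
    since: LocalDateTime = LocalDateTime.MIN,
    recursive: Boolean
  ): List[FileInfo] = {
    // Custom file ordering logic goes here
    ???
  }
}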

Then point loadStrategyClass at the custom class:

# metadata/application.sl.yml
application:
  loadStrategyClass: "com.mycompany.starlake.CustomLoadStrategy"

Examples

AutoLoad All Incoming Data

starlake autoload

AutoLoad Specific Domain

starlake autoload --domains starbake

AutoLoad Specific Tables

starlake autoload --domains starbake --tables orders,products

AutoLoad with Clean (Re-infer Schemas)

Overwrite existing schema files and re-infer from data:

starlake autoload --clean

AutoLoad with JSON Report

starlake autoload --reportFormat json

Related Skills

load - Load data with pre-defined schemas
infer-schema - Infer schema for a single file
stage - Move files from landing to pending area
config - Configuration reference

Description: Automatically infer schemas and load data from the incoming directory
Stars: 1
Category: data-ai
Updated: Apr 16, 2026

Install this skill globally with one command:

npx skillsauth add starlake-ai/starlake-skills autoload

Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. The others were skipped, did not run, or reported a non-clean status; review each entry below.

Trivy: Container and dependency vulnerability scanner. Clean (95%)
Semgrep: Static code analysis for vulnerabilities. Clean (95%)
mcp-scan (Snyk): Model Context Protocol security validation. Clean (95%)
Snyk (dep): Open source security scanning. Skipped (50%)
Socket.dev: Supply chain security analysis. Skipped (50%)
VirusTotal: Multi-engine malware detection. Skipped (50%)
CrowdStrike: Advanced threat intelligence. Skipped (50%)
OSV-Scanner: Open Source Vulnerability database check. Skipped (50%)
OWASP Dep-Check: Skipped (50%)

Last scanned: Apr 16, 2026, 3:36 AM · 18.3s · 1 file scanned

Related Skills

starlake-ai/starflow-transform-design

development

Verified · Trusted · Community

Design SQL transformations for data pipelines with quality checks and dependency management. Use when the user says "design transforms" or "create SQL transformations".

1 · SKILL.md · Updated Apr 16, 2026

starlake-ai/starflow-sprint-planning

devops

Verified · Trusted · Community

Plan and track sprint progress for data pipeline implementation. Use when the user says "sprint planning" or "plan data sprint".

1 · SKILL.md · Updated Apr 16, 2026

starlake-ai/starflow-source-analysis

testing

Verified · Trusted · Community

Analyze data sources in depth: schema, quality, volume, and extraction strategy. Use when the user says "analyze data source" or "profile this data source".

1 · SKILL.md · Updated Apr 16, 2026

starlake-ai/starflow-schema-design

data-ai

Verified · Trusted · Community

Design Starlake-compatible table schemas with types, constraints, privacy, and expectations. Use when the user says "design schema" or "create table definition".

1 · SKILL.md · Updated Apr 16, 2026

Install manually

# Clone the repo
git clone https://github.com/starlake-ai/starlake-skills.git

# Create the Claude Code skills folder (global) if needed, then copy the skill into it
mkdir -p ~/.claude/skills
cp -r starlake-skills/.agents/skills/autoload ~/.claude/skills/

Repository: starlake-ai/starlake-skills (1 star)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT