.agents/skills/autoload/SKILL.md
Automatically infer schemas and load data from the incoming directory
npx skillsauth add starlake-ai/starlake-skills autoload
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Watches the incoming directory, automatically infers schemas for new data files, generates the corresponding YAML table definitions, and loads the data into the data warehouse. Because it combines schema inference and loading in a single step, this is the quickest way to get data loaded.
starlake autoload [options]
- --domains <value>: Comma-separated list of domains to watch (default: all)
- --tables <value>: Comma-separated list of tables to watch (default: all)
- --clean: Overwrite existing mapping/schema files before starting
- --accessToken <value>: Access token for authentication (e.g. GCP)
- --scheduledDate <value>: Scheduled date for the job, format: yyyy-MM-dd'T'HH:mm:ss.SSSZ
- --options k1=v1,k2=v2: Substitution arguments passed to the watch job
- --reportFormat <value>: Report output format: console, json, or html

_config.sl.yml and {table}.sl.yml files in metadata/load/

The incoming directory is defined in application.sl.yml or env.sl.yml:
# metadata/env.sl.yml
version: 1
env:
  incoming_path: "{{SL_ROOT}}/datasets/incoming"
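To illustrate how a placeholder such as {{SL_ROOT}} expands into a concrete path, here is a minimal sketch of mustache-style substitution. It is not Starlake's own implementation: the `PlaceholderSketch` object, the `resolve` helper, and the `/opt/starlake` root are hypothetical names chosen for the example.

```scala
import scala.util.matching.Regex

object PlaceholderSketch {
  // Matches {{NAME}} and captures NAME (letters, digits, underscore).
  private val placeholder: Regex = """\{\{\s*(\w+)\s*\}\}""".r

  // Replaces each {{NAME}} with its value from vars.
  // Unknown names are left untouched so that problems stay visible.
  def resolve(template: String, vars: Map[String, String]): String =
    placeholder.replaceAllIn(
      template,
      m => Regex.quoteReplacement(vars.getOrElse(m.group(1), m.matched))
    )

  def main(args: Array[String]): Unit = {
    // Hypothetical project root, for illustration only.
    val vars = Map("SL_ROOT" -> "/opt/starlake")
    println(resolve("{{SL_ROOT}}/datasets/incoming", vars))
  }
}
```

With the hypothetical root above, the incoming directory would resolve to /opt/starlake/datasets/incoming.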
AutoLoad creates table definitions like the following in metadata/load/{domain}/:
# Auto-generated: metadata/load/starbake/_config.sl.yml
version: 1
load:
  name: "starbake"
  metadata:
    directory: "{{incoming_path}}/starbake"

# Auto-generated: metadata/load/starbake/orders.sl.yml
version: 1
table:
  name: "orders"
  pattern: "orders_.*.json"
  attributes:
    - name: "customer_id"
      type: "long"
    - name: "order_id"
      type: "long"
    - name: "status"
      type: "string"
    - name: "timestamp"
      type: "iso_date_time"
  metadata:
    format: "JSON_FLAT"
    encoding: "UTF-8"
    array: true
    writeStrategy:
      type: "APPEND"
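The pattern field in the generated table definition is a regular expression over file names. The sketch below shows which names the pattern above would accept, assuming Java regex semantics and a whole-name match; both are assumptions about how the pattern is applied, and `PatternSketch` and `matchesOrders` are hypothetical names.

```scala
import java.util.regex.Pattern

object PatternSketch {
  // Pattern copied from the generated orders.sl.yml above.
  val ordersPattern: Pattern = Pattern.compile("orders_.*.json")

  // Assumption: the pattern must match the whole file name, not just a prefix.
  def matchesOrders(fileName: String): Boolean =
    ordersPattern.matcher(fileName).matches()

  def main(args: Array[String]): Unit = {
    val candidates = List(
      "orders_20240115.json",
      "orders_1.json",
      "products_1.json",
      "orders_1.csv"
    )
    candidates.foreach(name => println(s"$name -> ${matchesOrders(name)}"))
  }
}
```

Note that the dot before json is unescaped, so it matches any character. If stricter matching is wanted, the pattern can be edited to "orders_.*\\.json" in the YAML file.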
The loadStrategyClass in application.sl.yml controls how files are ordered for processing during autoload:
| Strategy Class | Description | Ordering |
|---|---|---|
| ai.starlake.job.load.IngestionTimeStrategy | Load by file modification time | Oldest first |
| ai.starlake.job.load.IngestionNameStrategy | Load by lexicographical filename order | Alphabetical |
Configuration:
# metadata/application.sl.yml
application:
  loadStrategyClass: "ai.starlake.job.load.IngestionNameStrategy"
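The difference between the two built-in orderings in the table above can be sketched with plain Scala collections. This is an illustration of the ordering rules only, not the Starlake classes themselves: `IncomingFile` is a hypothetical stand-in for whatever file descriptor Starlake uses.

```scala
import java.time.Instant

object OrderingSketch {
  // Hypothetical stand-in for a listed file; not Starlake's own type.
  final case class IncomingFile(name: String, modifiedAt: Instant)

  // Oldest modification time first, as described for IngestionTimeStrategy.
  def byModificationTime(files: List[IncomingFile]): List[IncomingFile] =
    files.sortBy(_.modifiedAt.toEpochMilli)

  // Lexicographical file name order, as described for IngestionNameStrategy.
  def byName(files: List[IncomingFile]): List[IncomingFile] =
    files.sortBy(_.name)

  def main(args: Array[String]): Unit = {
    val files = List(
      IncomingFile("orders_b.json", Instant.parse("2024-01-01T00:00:00Z")),
      IncomingFile("orders_a.json", Instant.parse("2024-01-02T00:00:00Z"))
    )
    println(byModificationTime(files).map(_.name)) // orders_b.json comes first
    println(byName(files).map(_.name))             // orders_a.json comes first
  }
}
```

Name-based ordering is the more predictable choice when file names carry a sortable date or sequence number, since modification times can change when files are copied.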
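To use a custom ordering, implement the ai.starlake.job.load.LoadStrategy interface:
package com.mycompany.starlake

import ai.starlake.job.load.LoadStrategy
import ai.starlake.storage.StorageHandler
import com.typesafe.scalalogging.StrictLogging
import org.apache.hadoop.fs.Path

import java.time.LocalDateTime
// FileInfo must also be imported from the Starlake package that
// defines it in the version you build against.

object CustomLoadStrategy extends LoadStrategy with StrictLogging {
  // Returns the files found under `path`, in the order they should be loaded.
  def list(
    storageHandler: StorageHandler,
    path: Path,
    extension: String = "",
    since: LocalDateTime = LocalDateTime.MIN,
    recursive: Boolean
  ): List[FileInfo] = {
    // Custom file ordering logic
    ???
  }
}
Then reference the class in application.sl.yml:
application:
  loadStrategyClass: "com.mycompany.starlake.CustomLoadStrategy"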
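Load all domains and tables:
starlake autoload
Load a single domain:
starlake autoload --domains starbake
Load specific tables within a domain:
starlake autoload --domains starbake --tables orders,products
Overwrite existing schema files and re-infer from data:
starlake autoload --clean
Produce the report in JSON format:
starlake autoload --reportFormat json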
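The --scheduledDate option expects a value in the format yyyy-MM-dd'T'HH:mm:ss.SSSZ. A minimal sketch of producing such a value follows, assuming the pattern carries Java date-format semantics (which the pattern letters suggest but the text above does not state). `ScheduledDateSketch` and `render` are hypothetical names, and the date is an arbitrary example.

```scala
import java.time.{OffsetDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

object ScheduledDateSketch {
  // Pattern taken from the --scheduledDate option description.
  val formatter: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ")

  // Renders a date-time with offset in the expected textual form.
  def render(dateTime: OffsetDateTime): String = dateTime.format(formatter)

  def main(args: Array[String]): Unit = {
    // Arbitrary example date: 15 January 2024, 08:30 UTC.
    val example = OffsetDateTime.of(2024, 1, 15, 8, 30, 0, 0, ZoneOffset.UTC)
    println(s"starlake autoload --scheduledDate ${render(example)}")
  }
}
```

Under that assumption, the example date renders as 2024-01-15T08:30:00.000+0000, with the offset written without a colon.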