.agents/skills/infer-schema/SKILL.md
Infer a Starlake schema from a data file
npx skillsauth add starlake-ai/starlake-skills infer-schema
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Analyzes a data file (CSV, JSON, XML, Parquet) and infers the Starlake table schema, generating the corresponding YAML configuration file with detected column names, types, and format metadata.
starlake infer-schema [options]
- --input <value>: Path to the data file or directory to analyze (required)
- --domain <value>: Domain name for the generated schema (e.g. starbake)
- --table <value>: Table name for the generated schema (e.g. orders)
- --outputDir <value>: Output directory for the YAML file (default: metadata/load)
- --write <value>: Write mode: OVERWRITE or APPEND
- --format <value>: Force input file format: DSV, JSON, JSON_FLAT, JSON_ARRAY, XML, PARQUET
- --rowTag <value>: Row tag for XML files (e.g. record)
- --variant: Infer schema as a single variant attribute (schema-on-read)
- --clean: Delete previous YAML file before writing
- --encoding <value>: Input file encoding (default: UTF-8)
- --from-json-schema: Input file is a JSON Schema file (not data)
- --reportFormat <value>: Report output format: console, json, or html

The command generates a table YAML file like:
# Generated: metadata/load/starbake/orders.sl.yml
version: 1
table:
  name: "orders"
  pattern: "orders_.*.json"
  attributes:
    - name: "customer_id"
      type: "long"
      sample: "9"
    - name: "order_id"
      type: "long"
      sample: "99"
    - name: "status"
      type: "string"
      sample: "Pending"
    - name: "timestamp"
      type: "iso_date_time"
      sample: "2024-03-01T09:01:12.529Z"
  metadata:
    format: "JSON_FLAT"
    encoding: "UTF-8"
    array: true
    writeStrategy:
      type: "APPEND"
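
To review what was inferred, it can help to list each attribute with its detected type. The sketch below is one possible way to do that with plain awk. It is a line-based scan, not a real YAML parser, and it assumes the generated file keeps the `- name:` / `type:` layout of the example above. The `list_attributes` helper and the temporary file are hypothetical and are not part of Starlake.

```shell
# Recreate a shortened copy of the example schema in a temp file.
# In practice, point SCHEMA at the generated file, e.g.
# metadata/load/starbake/orders.sl.yml
SCHEMA="$(mktemp)"
cat > "$SCHEMA" <<'EOF'
version: 1
table:
  name: "orders"
  pattern: "orders_.*.json"
  attributes:
    - name: "customer_id"
      type: "long"
      sample: "9"
    - name: "timestamp"
      type: "iso_date_time"
      sample: "2024-03-01T09:01:12.529Z"
  metadata:
    format: "JSON_FLAT"
EOF

# Hypothetical helper: print "<attribute><TAB><type>" for each attribute.
# Only lines starting with "- name:" open an attribute, so the table-level
# "name:" and any other "type:" keys are ignored.
list_attributes() {
  awk '
    /^ *- name:/              { gsub(/"/, "", $3); name = $3; next }
    /^ *type:/ && name != ""  { gsub(/"/, "", $2); print name "\t" $2; name = "" }
  ' "$1"
}

list_attributes "$SCHEMA"
```

For the shortened sample this prints two tab-separated rows, one for `customer_id` (`long`) and one for `timestamp` (`iso_date_time`). For anything beyond a quick check, a proper YAML parser is the safer choice.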
starlake infer-schema --domain starbake --table order_lines --input /data/order-lines_20240301.csv --format DSV
starlake infer-schema --domain starbake --table orders --input /data/orders_20240301.json
starlake infer-schema --domain starbake --table products --input /data/products.xml --rowTag record
starlake infer-schema --domain starbake --table events --input /data/events.parquet
starlake infer-schema --domain starbake --table orders --input /schemas/orders.json --from-json-schema
starlake infer-schema --domain starbake --table orders --input /data/orders.json --clean
starlake infer-schema --domain starbake --table events --input /data/events.json --variant
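
When several files need a schema, the commands above can be generated rather than typed by hand. The following is a minimal sketch that only prints the commands and does not run them. The naming rule in `table_name_for` (drop the extension and a trailing `_YYYYMMDD`, turn `-` into `_`) is an assumption chosen to match the file names used in the examples, not something Starlake does itself. Both function names are hypothetical, and only the documented `--domain`, `--table`, and `--input` options are used.

```shell
# Hypothetical helper: derive a table name from a data file path.
# Assumed rule: strip the extension, strip a trailing _YYYYMMDD, map "-" to "_".
table_name_for() {
  basename "$1" | sed -E 's/\.[^.]*$//; s/_[0-9]{8}$//; s/-/_/g'
}

# Hypothetical helper: print the infer-schema command for one file.
# $1 = domain, $2 = input file
infer_cmd() {
  printf 'starlake infer-schema --domain %s --table %s --input %s\n' \
    "$1" "$(table_name_for "$2")" "$2"
}

# Print one command per file; review the output before running anything.
for f in /data/order-lines_20240301.csv /data/orders_20240301.json; do
  infer_cmd starbake "$f"
done
```

Printing first keeps the derived table names easy to review. Options such as `--format DSV` or `--clean` can be appended to the printed command where a file needs them.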