.agents/skills/extract-schema/SKILL.md
Extract database schemas into Starlake YAML configuration files
npx skillsauth add starlake-ai/starlake-skills extract-schemaInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Connects to a JDBC database and extracts table schemas (column names, types, constraints) into Starlake YAML configuration files. This is the first step when reverse-engineering an existing database into a Starlake project.
starlake extract-schema [options]
--config <value>: Extract configuration name (references a file in metadata/extract/)--outputDir <value>: Where to output the generated YML files--tables <value>: Specific database tables to extract--connectionRef <value>: JDBC connection reference defined in application.sl.yml--all: Extract all schemas and tables to the external folder--external: Output YML files in the metadata/external/ folder--parallelism <value>: Parallelism level for extraction (default: available CPU cores)--snakecase: Apply snake_case transformation to column names--reportFormat <value>: Report output format: console, json, or htmlmetadata/extract/{name}.sl.yml)# metadata/extract/externals.sl.yml
version: 1
extract:
connectionRef: "duckdb"
jdbcSchemas:
- schema: "starbake"
tables:
- name: "*" # Extract all tables
tableTypes:
- "TABLE" # TABLE, VIEW, SYSTEM TABLE, GLOBAL TEMPORARY
The connection must be defined in application.sl.yml:
# metadata/application.sl.yml
version: 1
application:
connections:
duckdb:
type: "jdbc"
options:
url: "jdbc:duckdb:{{SL_ROOT}}/datasets/duckdb.db"
driver: "org.duckdb.DuckDBDriver"
source_postgres:
type: jdbc
options:
url: "jdbc:postgresql://{{PG_HOST}}:5432/{{PG_DB}}"
driver: "org.postgresql.Driver"
user: "{{DATABASE_USER}}"
password: "{{DATABASE_PASSWORD}}"
# metadata/extract/source_db.sl.yml
version: 1
extract:
connectionRef: "source_postgres"
jdbcSchemas:
- schema: "sales"
# Custom remarks queries (for databases like DB2)
columnRemarks: "SELECT COLUMN_NAME, COLUMN_TEXT FROM SYSCAT.COLUMNS WHERE ..."
tableRemarks: "SELECT TABLE_TEXT FROM SYSCAT.TABLES WHERE ..."
tableTypes:
- "TABLE"
- "VIEW"
tables:
- name: "*" # Or specific table name or pattern
fullExport: true # false for incremental
# Incremental configuration
partitionColumn: "id" # For parallel extraction
numPartitions: 4 # Parallelism level
timestamp: "updated_at" # For incremental tracking
# JDBC tuning
fetchSize: 1000 # JDBC fetch size
# Custom query (overrides table name)
sql: "SELECT * FROM orders WHERE region = 'EMEA'"
# Column selection (optional)
columns:
- "order_id"
- "customer_id"
- "order_date"
- "total_amount"
The command generates table YAML files like:
# Generated: metadata/load/starbake/orders.sl.yml
version: 1
table:
name: "orders"
pattern: "orders_.*.json"
attributes:
- name: "order_id"
type: "long"
required: true
- name: "customer_id"
type: "long"
- name: "status"
type: "string"
- name: "timestamp"
type: "timestamp"
starlake extract-schema --config externals --outputDir metadata/load
starlake extract-schema --connectionRef source_postgres --outputDir metadata/load
starlake extract-schema --config externals --tables starbake.orders,starbake.customers
starlake extract-schema --config externals --all --external
starlake extract-schema --config externals --outputDir metadata/load --snakecase
starlake extract-schema --config externals --outputDir metadata/load --parallelism 8
development
Design SQL transformations for data pipelines with quality checks and dependency management. Use when the user says "design transforms" or "create SQL transformations".
devops
Plan and track sprint progress for data pipeline implementation. Use when the user says "sprint planning" or "plan data sprint".
testing
Analyze data sources in depth: schema, quality, volume, and extraction strategy. Use when the user says "analyze data source" or "profile this data source".
data-ai
Design Starlake-compatible table schemas with types, constraints, privacy, and expectations. Use when the user says "design schema" or "create table definition".