.agents/skills/extract/SKILL.md
Extract both schema and data from a JDBC source
Combines schema extraction and data extraction in a single command. First extracts the database schema metadata into Starlake YAML files, then extracts the actual data into files. This is a convenience command that runs extract-schema followed by extract-data.
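Because the command is described as extract-schema followed by extract-data, the equivalent two-step invocation can be sketched in Python. This is only an illustration, not Starlake's implementation: the helper name `two_step_commands` is hypothetical, and the sketch assumes both sub-commands accept the `--config` option listed for `extract`.

```python
def two_step_commands(config: str) -> list[list[str]]:
    """Return argv lists for the two commands that `starlake extract` combines.

    Assumption: both sub-commands accept `--config`; check `--help` for each.
    """
    return [
        ["starlake", "extract-schema", "--config", config],  # schema metadata first
        ["starlake", "extract-data", "--config", config],    # then the data itself
    ]


for argv in two_step_commands("externals"):
    print(" ".join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` so that the data step only runs if the schema step succeeded.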
starlake extract [options]
Combines all options from extract-schema and extract-data.
--config <value>: Database tables & connection info
--outputDir <value>: Where to output YML files
--tables <value>: Database tables to extract
--connectionRef <value>: JDBC connection reference
--all: Extract all schemas and tables
--external: Output YML files to the external folder
--parallelism <value>: Parallelism level
--snakecase: Apply snake_case to column names
--limit <value>: Limit number of records
--numPartitions <value>: Partition parallelism
--ignoreExtractionFailure: Continue on extraction failure
--clean: Clean target files before extraction
--incremental: Export only new data since last extraction
--includeSchemas <value>: Domains to include
--excludeSchemas <value>: Domains to exclude
--includeTables <value>: Tables to include
--excludeTables <value>: Tables to exclude
--reportFormat <value>: Report output format: console, json, or html

Extract commands use a configuration file (metadata/extract/{name}.sl.yml) to define which schemas and tables to extract:
# metadata/extract/externals.sl.yml
version: 1
extract:
  connectionRef: "duckdb"
  jdbcSchemas:
    - schema: "starbake"
      tables:
        - name: "*" # "*" to extract all tables
      tableTypes:
        - "TABLE"
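The comment at the top of this example gives the file's location, which follows the metadata/extract/{name}.sl.yml pattern. A small Python sketch of that naming rule, assuming `--config` is given a bare name such as `externals` (the helper `extract_config_path` and the default `metadata` root are illustrative, not part of Starlake):

```python
from pathlib import PurePosixPath


def extract_config_path(name: str, metadata_dir: str = "metadata") -> PurePosixPath:
    """Map a config name to metadata/extract/{name}.sl.yml (assumed layout)."""
    return PurePosixPath(metadata_dir) / "extract" / f"{name}.sl.yml"


print(extract_config_path("externals"))
```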
# metadata/extract/source_db.sl.yml
version: 1
extract:
  connectionRef: "source_postgres"
  jdbcSchemas:
    - schema: "sales"
      tableTypes:
        - "TABLE"
        - "VIEW"
      tables:
        - name: "orders"
          fullExport: false       # Incremental extraction
          partitionColumn: "id"   # Column for parallel extraction
          numPartitions: 4        # Parallelism level
          timestamp: "updated_at" # Incremental tracking column
          fetchSize: 1000         # JDBC fetch size
        - name: "customers"
          fullExport: true
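To make the roles of partitionColumn, numPartitions and timestamp concrete, here is one common way such settings are used: the key range is split into contiguous slices that can be read in parallel, and an incremental run adds a filter on the tracking column. This is a sketch under stated assumptions, not Starlake's actual query generation; the function names, the key range 1 to 10 and the last-export value are all hypothetical.

```python
from typing import Optional


def partition_ranges(low: int, high: int, num_partitions: int) -> list[tuple[int, int]]:
    """Split the inclusive range [low, high] into contiguous inclusive slices."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    total = high - low + 1
    if total <= 0:
        return []
    n = min(num_partitions, total)   # never create empty slices
    base, extra = divmod(total, n)   # the first `extra` slices get one more key
    ranges, start = [], low
    for i in range(n):
        end = start + base + (1 if i < extra else 0) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def partition_queries(table: str, partition_column: str,
                      ranges: list[tuple[int, int]],
                      timestamp_column: Optional[str] = None,
                      last_export: Optional[str] = None) -> list[str]:
    """Build one SELECT per slice; string formatting is for illustration only."""
    queries = []
    for start, end in ranges:
        where = f"{partition_column} BETWEEN {start} AND {end}"
        if timestamp_column and last_export:
            # Incremental run: keep only rows changed since the last export.
            where += f" AND {timestamp_column} > '{last_export}'"
        queries.append(f"SELECT * FROM {table} WHERE {where}")
    return queries


ranges = partition_ranges(1, 10, 4)
for query in partition_queries("sales.orders", "id", ranges,
                               "updated_at", "2024-01-01 00:00:00"):
    print(query)
```

With fullExport: true, as for customers, the timestamp filter would simply be omitted and every row read on each run.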
The connection referenced in the extract config must be defined in application.sl.yml:
# metadata/application.sl.yml
version: 1
application:
  connections:
    source_postgres:
      type: jdbc
      options:
        url: "jdbc:postgresql://{{PG_HOST}}:{{PG_PORT}}/{{PG_DB}}"
        driver: "org.postgresql.Driver"
        user: "{{DATABASE_USER}}"
        password: "{{DATABASE_PASSWORD}}"
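The {{NAME}} placeholders in the connection options are filled in from variables when the configuration is loaded. The Python sketch below shows the general idea of such substitution so the resulting JDBC URL is easy to picture; it is not Starlake's own resolver, and the `render` helper and the sample values (`localhost`, `5432`, `shop`) are made up for illustration.

```python
import re

# Matches {{NAME}}, allowing optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict[str, str]) -> str:
    """Replace every {{NAME}} with variables[NAME]; fail loudly if one is missing."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"undefined variable: {name}")
        return variables[name]

    return PLACEHOLDER.sub(replace, template)


url = "jdbc:postgresql://{{PG_HOST}}:{{PG_PORT}}/{{PG_DB}}"
print(render(url, {"PG_HOST": "localhost", "PG_PORT": "5432", "PG_DB": "shop"}))
```

Passing `dict(os.environ)` as `variables` gives a quick way to check that every placeholder has a value before running an extraction.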
Extract schemas from OpenAPI/Swagger specifications:
# metadata/extract/api.sl.yml
version: 1
extract:
  openAPI:
    basePath: /api/v2
    domains:
      - name: customers_api
        # Schema filtering (regex)
        schemas:
          exclude:
            - "Model\\.Common\\.Id"
            - "Internal\\..*"
        # Route selection
        routes:
          - paths:
              include:
                - "/users"
                - "/orders"
                - "/products"
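The exclude entries are regular expressions, and the doubled backslashes are YAML escaping: once parsed, the patterns are `Model\.Common\.Id` and `Internal\..*`, where `\.` matches a literal dot. The Python sketch below shows how such an exclude list would filter schema names. It assumes whole-name matching, which the source does not specify, and the schema names and the `filter_schemas` helper are invented for the example.

```python
import re


def filter_schemas(names: list[str], exclude_patterns: list[str]) -> list[str]:
    """Keep the names that match none of the exclude regexes (whole-name match)."""
    compiled = [re.compile(pattern) for pattern in exclude_patterns]
    return [name for name in names
            if not any(rx.fullmatch(name) for rx in compiled)]


# The patterns as they look after YAML unescaping.
exclude = [r"Model\.Common\.Id", r"Internal\..*"]
names = ["Model.Common.Id", "Internal.Audit", "Customer", "Order"]
print(filter_schemas(names, exclude))
```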
Track data freshness with timestamp columns after extraction:
# Check freshness for specific tables
starlake freshness --tables dataset1.table1,dataset2.table2 --persist true
The monitoring table is SL_LAST_EXPORT, in the audit schema.
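As a rough picture of what a timestamp-based freshness check decides, the Python sketch below compares a table's most recent timestamp with an allowed age. It is a generic illustration rather than the logic of `starlake freshness`; the function name, the one-day threshold and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta


def freshness_status(last_update: datetime, now: datetime, max_age: timedelta) -> str:
    """Return "stale" when the newest row is older than max_age, else "fresh"."""
    return "stale" if now - last_update > max_age else "fresh"


now = datetime(2024, 1, 3, 12, 0)
print(freshness_status(datetime(2024, 1, 3, 6, 0), now, timedelta(days=1)))
print(freshness_status(datetime(2024, 1, 1, 6, 0), now, timedelta(days=1)))
```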
starlake extract --config externals --outputDir metadata/load
starlake extract --config source_db --outputDir /tmp/output --incremental
starlake extract --config source_db --tables sales.orders,sales.customers
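The last command passes `--tables` as a comma-separated list of schema.table names. When wrapping the CLI in a script, it can help to validate that value first; the Python sketch below groups the entries by schema. The `parse_tables` helper is hypothetical and only mirrors the format seen in the example, not Starlake's own argument parsing.

```python
def parse_tables(value: str) -> dict[str, list[str]]:
    """Group a "schema.table,schema.table" string by schema, preserving order."""
    result: dict[str, list[str]] = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        schema, sep, table = item.partition(".")
        if not sep or not schema or not table:
            raise ValueError(f"expected schema.table, got {item!r}")
        result.setdefault(schema, []).append(table)
    return result


print(parse_tables("sales.orders,sales.customers"))
```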