.agents/skills/extract-data/SKILL.md
Extract data from database tables to CSV/Parquet files
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add starlake-ai/starlake-skills extract-data
Extracts data from database tables into local files (CSV, Parquet). Supports full and incremental extraction, parallel processing, and schema/table filtering.
starlake extract-data [options]
Options:
--config <value>: Extract configuration name (required); refers to a file in metadata/extract/
--outputDir <value>: Directory where the data files are written (required)
--limit <value>: Maximum number of records extracted per table
--numPartitions <value>: Parallelism level for partitioned table extraction
--parallelism <value>: Parallelism level of the overall extraction process
--ignoreExtractionFailure: Continue extraction even if individual tables fail
--clean: Delete all existing files for a table before extracting it
--incremental: Export only data added since the last extraction (uses timestamp tracking)
--ifExtractedBefore <value>: Only extract if the last extraction happened before this datetime
--includeSchemas <value>: Comma-separated list of schemas/domains to include
--excludeSchemas <value>: Comma-separated list of schemas/domains to exclude
--includeTables <value>: Comma-separated list of tables to include
--excludeTables <value>: Comma-separated list of tables to exclude
--reportFormat <value>: Report output format: console, json, or html
Example extract configuration:
# metadata/extract/externals.sl.yml
version: 1
extract:
  connectionRef: "source_postgres"
  jdbcSchemas:
    - schema: "sales"
      tables:
        - name: "orders"
          fullExport: false
          partitionColumn: "id"
          numPartitions: 4
          timestamp: "updated_at"
          fetchSize: 1000
        - name: "customers"
          fullExport: true
        - name: "*" # All remaining tables
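The partitionColumn and numPartitions settings on the orders table describe a range-partitioned read: the range of id values is split into numPartitions slices that can be fetched in parallel. The Python sketch below illustrates that idea only. The function name partition_predicates and the assumed id bounds (1 to 10) are hypothetical, and Starlake's actual splitting logic may differ.

```python
from __future__ import annotations


def partition_predicates(column: str, lower: int, upper: int, num_partitions: int) -> list[str]:
    """Split the closed range [lower, upper] of an integer column into range predicates.

    Hypothetical illustration: each predicate would drive one query that runs in parallel.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if upper < lower:
        raise ValueError("upper must not be smaller than lower")
    span = upper - lower + 1
    # Never create more partitions than there are distinct values in the range.
    num_partitions = min(num_partitions, span)
    step, remainder = divmod(span, num_partitions)
    predicates = []
    start = lower
    for i in range(num_partitions):
        # Give the first `remainder` partitions one extra value so sizes differ by at most 1.
        size = step + (1 if i < remainder else 0)
        end = start + size - 1
        predicates.append(f"{column} >= {start} AND {column} <= {end}")
        start = end + 1
    return predicates


if __name__ == "__main__":
    # Assume ids 1..10 exist and numPartitions is 4, as in the orders entry above.
    for predicate in partition_predicates("id", 1, 10, 4):
        print(predicate)
```

With these inputs the four predicates cover ids 1-3, 4-6, 7-8 and 9-10. The closed ranges are a simplification: Spark's JDBC reader, for instance, leaves the first and last partitions open-ended so that rows outside the sampled bounds are still read.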
Basic extraction using the externals configuration (metadata/extract/externals.sl.yml):
starlake extract-data --config externals --outputDir /tmp/output
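If you drive extractions from a Python script, for example a scheduler task, a thin wrapper can assemble the command from the options listed above. This is a minimal sketch and not part of Starlake: build_extract_command and run_extract are hypothetical helpers, only flags documented on this page are emitted, and running the command assumes the starlake executable is on your PATH.

```python
from __future__ import annotations

import subprocess


def build_extract_command(
    config: str,
    output_dir: str,
    *,
    incremental: bool = False,
    limit: int | None = None,
    parallelism: int | None = None,
    include_schemas: list[str] | None = None,
    exclude_tables: list[str] | None = None,
    ignore_failure: bool = False,
    clean: bool = False,
) -> list[str]:
    """Assemble a `starlake extract-data` argument list (hypothetical helper)."""
    cmd = ["starlake", "extract-data", "--config", config, "--outputDir", output_dir]
    if incremental:
        cmd.append("--incremental")
    if limit is not None:
        cmd += ["--limit", str(limit)]
    if parallelism is not None:
        cmd += ["--parallelism", str(parallelism)]
    if include_schemas:
        # The CLI takes a single comma-separated value, e.g. "sales,hr".
        cmd += ["--includeSchemas", ",".join(include_schemas)]
    if exclude_tables:
        cmd += ["--excludeTables", ",".join(exclude_tables)]
    if ignore_failure:
        cmd.append("--ignoreExtractionFailure")
    if clean:
        cmd.append("--clean")
    return cmd


def run_extract(cmd: list[str]) -> None:
    # check=True raises CalledProcessError if starlake exits with a non-zero status.
    subprocess.run(cmd, check=True)
```

For example, run_extract(build_extract_command("externals", "/tmp/output", incremental=True)) runs the incremental command shown next. Passing an argument list rather than a single string avoids shell quoting problems.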
Extract only rows added/updated since last extraction:
starlake extract-data --config externals --outputDir /tmp/output --incremental
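Conceptually, timestamp-based incremental extraction keeps a high-water mark per table. Each run reads only the rows whose timestamp column (updated_at for orders in the configuration above) is later than the value recorded by the previous run, then stores the new maximum. The sketch below models that idea with an in-memory SQLite table. The table contents, the state dictionary, and extract_incremental are hypothetical, and the sketch does not reproduce how Starlake stores its own extraction state.

```python
import sqlite3


def extract_incremental(conn, state):
    """Return orders rows changed since the last run and advance the watermark.

    `state` is a plain dict standing in for persisted extraction state (hypothetical).
    """
    last = state.get("orders")
    if last is None:
        # First run: no watermark yet, so everything is extracted.
        query, params = "SELECT id, updated_at FROM orders ORDER BY updated_at", ()
    else:
        query = "SELECT id, updated_at FROM orders WHERE updated_at > ? ORDER BY updated_at"
        params = (last,)
    rows = conn.execute(query, params).fetchall()
    if rows:
        # Rows are ordered by updated_at, so the last one holds the new watermark.
        state["orders"] = rows[-1][1]
    return rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # ISO-8601 text timestamps sort correctly as strings.
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, updated_at TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, "2024-01-01 10:00:00"), (2, "2024-01-02 09:30:00")],
    )
    state = {}
    print(extract_incremental(conn, state))  # first run: both rows
    conn.execute("INSERT INTO orders VALUES (3, '2024-01-03 08:00:00')")
    print(extract_incremental(conn, state))  # second run: only id 3
```

The strict "greater than" comparison is a simplification: a row that arrives later with a timestamp equal to the stored watermark would be skipped.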
Limit extraction to 1000 records per table:
starlake extract-data --config externals --outputDir /tmp/output --limit 1000
Extract only the sales and hr schemas:
starlake extract-data --config externals --outputDir /tmp/output --includeSchemas sales,hr
Skip the audit_log and temp_data tables:
starlake extract-data --config externals --outputDir /tmp/output --excludeTables audit_log,temp_data
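The include and exclude options each take a comma-separated list. One plausible way to combine them is to apply the include list first (when it is given) and then drop anything on the exclude list. The Python sketch below shows that interpretation. The precedence rule, the case-insensitive matching, and the names parse_list and select_tables are assumptions for illustration, not documented Starlake behavior.

```python
from __future__ import annotations


def parse_list(value: str | None) -> set[str]:
    """Turn a comma-separated option value such as "audit_log,temp_data" into a set."""
    if not value:
        return set()
    return {item.strip().lower() for item in value.split(",") if item.strip()}


def select_tables(tables: list[str], include: str | None = None, exclude: str | None = None) -> list[str]:
    """Filter table names with include/exclude lists (assumed semantics, see above)."""
    included = parse_list(include)
    excluded = parse_list(exclude)
    selected = []
    for name in tables:
        key = name.lower()
        # An empty include list means "keep everything".
        if included and key not in included:
            continue
        # Exclusion wins over inclusion in this sketch.
        if key in excluded:
            continue
        selected.append(name)
    return selected


if __name__ == "__main__":
    all_tables = ["orders", "customers", "audit_log", "temp_data"]
    print(select_tables(all_tables, exclude="audit_log,temp_data"))
```

Run against the four sample tables, the exclude list from the command above leaves orders and customers.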
Extract up to 8 tables at a time and continue if individual tables fail:
starlake extract-data --config externals --outputDir /tmp/output --parallelism 8 --ignoreExtractionFailure
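--parallelism bounds how much of the extraction runs concurrently, and --ignoreExtractionFailure lets the run continue when a single table fails. The sketch below models the two options with concurrent.futures.ThreadPoolExecutor. extract_table is a hypothetical stand-in for a per-table extraction, and the error handling illustrates the documented flag rather than reproducing Starlake's code.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor, as_completed


def extract_table(name: str) -> str:
    """Hypothetical per-table extraction; fails for one table to show error handling."""
    if name == "broken_table":
        raise RuntimeError(f"cannot read {name}")
    return f"{name}: ok"


def extract_all(tables: list[str], parallelism: int, ignore_failure: bool) -> tuple[list[str], dict[str, str]]:
    """Run extractions with at most `parallelism` worker threads.

    Returns (successes, failures). Without ignore_failure, the first failure observed
    is re-raised once the executor has waited for already-submitted tasks.
    """
    successes: list[str] = []
    failures: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = {pool.submit(extract_table, t): t for t in tables}
        for future in as_completed(futures):
            table = futures[future]
            try:
                successes.append(future.result())
            except Exception as exc:
                if not ignore_failure:
                    raise
                # Record the failure and keep going with the remaining tables.
                failures[table] = str(exc)
    return sorted(successes), failures
```

Collecting failures in a dictionary keeps the run going while still leaving a per-table error list that can be reported at the end.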
Delete each table's existing files before extracting it:
starlake extract-data --config externals --outputDir /tmp/output --clean
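--clean removes a table's previously extracted files before that table is extracted again, so stale files from earlier runs do not mix with new output. The sketch below assumes a hypothetical layout with one directory per table (outputDir/schema/table). The function name clean_table_dir, the layout, and the file name part-0001.csv are illustrative only; Starlake's actual output layout may differ.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def clean_table_dir(output_dir: str, schema: str, table: str) -> Path:
    """Delete and recreate the (assumed) output directory for one table."""
    table_dir = Path(output_dir) / schema / table
    if table_dir.exists():
        # Remove every file left over from previous extractions of this table.
        shutil.rmtree(table_dir)
    table_dir.mkdir(parents=True)
    return table_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        d = clean_table_dir(tmp, "sales", "orders")
        (d / "part-0001.csv").write_text("stale data")
        clean_table_dir(tmp, "sales", "orders")
        print(list(d.iterdir()))  # the stale file is gone
```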