kumewata/databricks

Name: databricks
Author: kumewata

config/agents/skills/databricks/SKILL.md

npx skillsauth add kumewata/dotfiles databricks

Databricks Expert Engineer Skill

This skill provides a comprehensive guide for Databricks development.

1. Databricks CLI Usage

1.1. About warehouse_id

Find a Serverless SQL Warehouse and use its ID as warehouse_id.
Note: the databricks CLI does not read warehouse_id from config files, so include it explicitly in the JSON body of every request.
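
One way to find a candidate is to list the warehouses with the CLI and filter the JSON output. The sketch below assumes that `databricks warehouses list --output json` returns a list of objects with `id`, `state`, and `enable_serverless_compute` fields; the helper names are hypothetical and not part of the skill.

```python
import json
import subprocess


def list_warehouses(profile="DEFAULT"):
    """Return the warehouses visible to the profile as parsed JSON.

    Assumption: the CLI prints a JSON list when called with --output json.
    """
    completed = subprocess.run(
        ["databricks", "warehouses", "list", "--profile", profile, "--output", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)


def pick_serverless_warehouse(warehouses):
    """Pick one serverless warehouse ID, preferring one that is already running."""
    serverless = [w for w in warehouses if w.get("enable_serverless_compute")]
    if not serverless:
        raise LookupError("no serverless SQL warehouse found")
    running = [w for w in serverless if w.get("state") == "RUNNING"]
    # Fall back to any serverless warehouse if none is running yet.
    return (running or serverless)[0]["id"]
```

Preferring a running warehouse avoids startup latency on the first query; any serverless warehouse satisfies the rule above.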

1.2. Authentication

When the profile sets auth_type=databricks-cli, run user-to-machine (U2M) authentication first:
```
databricks auth login --host https://xxx.cloud.databricks.com --profile PROFILE_NAME
```

Check authentication status:
```
databricks auth profiles
```

1.3. Basic Usage

# Execute query
databricks api post /api/2.0/sql/statements --profile "DEFAULT" --json '{
  "warehouse_id": "xxxxxxxxxx",
  "catalog": "catalog_name",
  "schema": "schema_name",
  "statement": "select * from table_name limit 10"
}'

# Get results (statement_id is returned from execution)
databricks api get /api/2.0/sql/statements/{statement_id} --profile "DEFAULT"
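
When these calls are scripted, building the JSON body with a serializer avoids shell-quoting mistakes in the statement text. A minimal sketch, assuming the same endpoint and flags as the commands above; `build_statement_argv` and `submit_statement` are hypothetical helper names.

```python
import json
import subprocess


def build_statement_argv(warehouse_id, statement, catalog=None, schema=None, profile="DEFAULT"):
    """Build the argument list for `databricks api post /api/2.0/sql/statements`."""
    payload = {"warehouse_id": warehouse_id, "statement": statement}
    if catalog:
        payload["catalog"] = catalog
    if schema:
        payload["schema"] = schema
    return [
        "databricks", "api", "post", "/api/2.0/sql/statements",
        "--profile", profile,
        "--json", json.dumps(payload),
    ]


def submit_statement(argv):
    """Run the CLI and return the statement_id from its JSON response."""
    completed = subprocess.run(argv, check=True, capture_output=True, text=True)
    return json.loads(completed.stdout)["statement_id"]
```

Passing the arguments as a list (no shell) means quotes inside the SQL text need no extra escaping.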

1.4. Command Tips

Query execution flow:
- post submits the query and returns a statement_id
- get retrieves the results; wait until status.state is SUCCEEDED
- For long-running queries, sleep between get calls and retry

Error handling:
- state: CLOSED: The results were fetched too late and are no longer available. Fetch them sooner after the query finishes
- state: FAILED: SQL error. Check the message under status.error
- state: RUNNING: Still executing. Wait and retry get
- Timeout: For large datasets, verify the query with a LIMIT first

Reading results:
- result.data_array: Actual data (2D array)
- manifest.schema.columns: Column names and type info
- manifest.total_row_count: Total row count (reported even when the query uses LIMIT)
- status.state: Query execution state
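
The flow above can be sketched as a small polling helper plus a function that pairs column names with rows. This is an illustration under assumptions: `fetch` is a hypothetical callable that wraps the `databricks api get` call and returns the parsed JSON response, and the response is assumed to carry `status`, `manifest`, and `result` objects.

```python
import time

# States after which further polling is pointless.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELED", "CLOSED"}


def wait_for_statement(fetch, statement_id, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch(statement_id) until the statement reaches a terminal state."""
    for attempt in range(max_attempts):
        response = fetch(statement_id)
        if response["status"]["state"] in TERMINAL_STATES:
            return response
        if attempt < max_attempts - 1:
            sleep(interval)  # still PENDING or RUNNING: wait, then retry
    raise TimeoutError(f"statement {statement_id} not finished after {max_attempts} attempts")


def rows_as_dicts(response):
    """Pair manifest.schema.columns with result.data_array, one dict per row."""
    status = response["status"]
    if status["state"] != "SUCCEEDED":
        message = status.get("error", {}).get("message", "no error message")
        raise RuntimeError(f"statement ended in state {status['state']}: {message}")
    names = [column["name"] for column in response["manifest"]["schema"]["columns"]]
    rows = response.get("result", {}).get("data_array") or []
    return [dict(zip(names, row)) for row in rows]
```

Injecting `fetch` and `sleep` keeps the loop testable without a workspace; in real use `fetch` would shell out to the CLI.
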
Parameterized queries

databricks api post /api/2.0/sql/statements --profile "DEFAULT" --json '{
  "warehouse_id": "xxxxxxxxxx",
  "statement": "select * from table_name where date >= :start_date",
  "parameters": [{"name": "start_date", "value": "2025-01-01", "type": "DATE"}]
}'
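
The `parameters` array can also be generated from ordinary values. The sketch below is one possible mapping from Python types to SQL type names; the mapping and the helper name `to_parameters` are assumptions chosen for illustration, and values are sent as strings as in the example above.

```python
import datetime


def _sql_type_and_text(value):
    """Map a Python value to (SQL type name, string value). Assumed mapping."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "BOOLEAN", "true" if value else "false"
    if isinstance(value, int):
        return "BIGINT", str(value)
    if isinstance(value, float):
        return "DOUBLE", repr(value)
    if isinstance(value, datetime.datetime):  # check datetime before date
        return "TIMESTAMP", value.isoformat(sep=" ")
    if isinstance(value, datetime.date):
        return "DATE", value.isoformat()
    if isinstance(value, str):
        return "STRING", value
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


def to_parameters(values):
    """Turn {"name": value} into the list used in the "parameters" field."""
    parameters = []
    for name, value in values.items():
        type_name, text = _sql_type_and_text(value)
        parameters.append({"name": name, "value": text, "type": type_name})
    return parameters
```

Named markers such as `:start_date` keep user-supplied values out of the SQL text itself, which is the main reason to prefer them over string formatting.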

2. Well-Architected Lakehouse Framework

The framework consists of seven pillars:

2.1. Data and AI Governance

Policies and practices to securely manage data and AI assets. Minimize data copies with a unified governance solution.

2.2. Interoperability and Usability

Consistent user experience and seamless integration with external systems.

2.3. Operational Excellence

Processes supporting continuous production operations.

2.4. Security, Privacy, and Compliance

Implement safeguards against threats.

2.5. Reliability

Ensure disaster recovery capabilities.

2.6. Performance Efficiency

Adaptability to workload changes.

2.7. Cost Optimization

Cost management to maximize value delivery.

3. Unity Catalog

3.1. Basic Concepts

"Define once, secure everywhere" approach
Unified access control policies across multiple workspaces
ANSI SQL compliant permission management

3.2. Object Model

3-level namespace: catalog.schema.table

Catalog layer: Data isolation unit (by department, etc.)
Schema layer: Logical group containing tables, views, volumes
Object layer: Tables, views, volumes, functions, models
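
When object names are assembled in code, each of the three levels can be quoted separately so that names containing hyphens or other special characters stay valid. A minimal sketch; `qualified_name` is a hypothetical helper, and backtick quoting with doubled embedded backticks follows the identifier style used in the GRANT examples in this skill.

```python
def quote_identifier(name):
    """Wrap an identifier in backticks, doubling any backtick inside it."""
    return "`" + name.replace("`", "``") + "`"


def qualified_name(catalog, schema, obj):
    """Build a three-level name: catalog.schema.object."""
    return ".".join(quote_identifier(part) for part in (catalog, schema, obj))


print(qualified_name("main", "default", "daily-sales"))
# `main`.`default`.`daily-sales`
```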

3.3. Permission Management

Users cannot access data by default
Explicit permission grants required
Permissions inherit from parent to child (catalog -> schema -> table)

-- Check permissions
SHOW GRANTS ON SCHEMA main.default;

-- Grant permissions
GRANT CREATE TABLE ON SCHEMA main.default TO `finance-team`;

-- Revoke permissions
REVOKE CREATE TABLE ON SCHEMA main.default FROM `finance-team`;
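
Because nothing is accessible by default, reading a single table takes a grant at each level: USE CATALOG on the catalog, USE SCHEMA on the schema, and SELECT on the table. The sketch below generates those three statements; the helper name `read_grants` and the table name in the usage line are hypothetical.

```python
def _quote(name):
    """Backtick-quote an identifier, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def read_grants(catalog, schema, table, principal):
    """Return the GRANT statements needed for read access to one table."""
    cat = _quote(catalog)
    sch = f"{cat}.{_quote(schema)}"
    tbl = f"{sch}.{_quote(table)}"
    who = _quote(principal)
    return [
        f"GRANT USE CATALOG ON CATALOG {cat} TO {who};",
        f"GRANT USE SCHEMA ON SCHEMA {sch} TO {who};",
        f"GRANT SELECT ON TABLE {tbl} TO {who};",
    ]


for statement in read_grants("main", "default", "sales", "finance-team"):
    print(statement)
```

Granting SELECT on the schema or catalog instead would, through inheritance, cover every current and future table beneath it, so the narrower table-level grant is the safer default.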

3.4. Best Practices

Managed tables/volumes recommended (Delta Lake format, full lifecycle management)
Catalog isolation across workspaces possible
Independent managed storage location per catalog recommended

4. Data Engineering

4.1. Lakeflow Solution

Unifies data ingestion, transformation, and orchestration.

Lakeflow Connect: Simplifies data ingestion
Lakeflow Spark Declarative Pipelines (SDP): Declarative pipeline framework
Lakeflow Jobs: Workflow automation

4.2. Delta Lake

Parquet data files with file-based transaction log
ACID transactions
Time travel functionality
Optimizations: liquid clustering, data skipping, file layout optimization, vacuum

4.3. Lakeflow Jobs

Task types:

Notebook tasks
Pipeline tasks
Python script tasks

Triggers:

Time-based (e.g., daily at 2 AM)
Event-based (on new data arrival)
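
As an illustration of the two trigger styles, the sketch below builds the settings fragments a job definition might carry: a Quartz cron schedule for the time-based case and a file-arrival trigger for the event-based case. The field names are assumed to match the Jobs API job settings, and the storage URL in the usage line is a made-up placeholder.

```python
def daily_schedule(hour, minute=0, timezone_id="UTC"):
    """Time-based trigger: run once a day at hour:minute in the given time zone."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    return {
        "schedule": {
            # Quartz fields: second minute hour day-of-month month day-of-week
            "quartz_cron_expression": f"0 {minute} {hour} * * ?",
            "timezone_id": timezone_id,
            "pause_status": "UNPAUSED",
        }
    }


def file_arrival_trigger(url):
    """Event-based trigger: run when new files arrive under the given location."""
    return {"trigger": {"file_arrival": {"url": url}, "pause_status": "UNPAUSED"}}


print(daily_schedule(2))  # daily at 2 AM UTC
print(file_arrival_trigger("s3://example-bucket/landing/"))  # hypothetical location
```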

Limits:

Workspace: Max 2000 concurrent task executions
Saved jobs: Max 12000
Tasks per job: Max 1000

5. Machine Learning Infrastructure

5.1. MLflow

Core tool for experiment tracking and model management
Dedicated features for GenAI

5.2. Feature Store

Feature management system
Automatic data pipelines and feature discovery

5.3. Model Serving

Deploy custom models and LLMs as REST endpoints
Auto-scaling and GPU support

6. Security

6.1. Authentication and Access Control

SSO configuration
Multi-factor authentication
Access control lists

6.2. Network Security

Private connectivity
Serverless egress control
Firewall settings
VPC management

6.3. Data Encryption

Encryption at rest and in transit
Customer-managed keys
Inter-cluster communication encryption
Automatic credential masking

7. SQL Warehouse

7.1. Serverless SQL Warehouse Benefits

Instant and elastic compute
Auto-scaling
Minimal management (Databricks handles capacity)
Low total cost of ownership

8. Schema Discovery and Validation

8.1. Pre-Query Validation Rule

YOU MUST: Run DESCRIBE before executing SELECT on unfamiliar tables
YOU MUST: Verify exact column names and case before writing queries

-- Check table columns first
DESCRIBE TABLE catalog.schema.table_name;

-- Then write your query using verified column names
SELECT column_name FROM catalog.schema.table_name;
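
The rule can be enforced mechanically: take the column names returned by DESCRIBE and resolve each requested column against them before building the SELECT. A minimal sketch; `resolve_columns` is a hypothetical helper, and the described names are assumed to have been read from the first column of the DESCRIBE result.

```python
def resolve_columns(described, requested):
    """Map requested column names to the exact spelling reported by DESCRIBE.

    Matching is case-insensitive; unknown columns raise KeyError before any
    query is sent.
    """
    by_lower = {name.lower(): name for name in described}
    resolved = []
    for name in requested:
        exact = by_lower.get(name.lower())
        if exact is None:
            raise KeyError(f"column {name!r} not found; available: {sorted(described)}")
        resolved.append(exact)
    return resolved


described = ["OrderId", "created_at", "Amount"]
print(resolve_columns(described, ["orderid", "AMOUNT"]))
# ['OrderId', 'Amount']
```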

8.2. Schema Discovery Commands

-- Basic column info
DESCRIBE TABLE catalog.schema.table_name;

-- Extended info (types, nullability, comments)
DESCRIBE EXTENDED catalog.schema.table_name;

-- List tables in schema
SHOW TABLES IN catalog.schema;

-- Table properties and metadata
DESCRIBE DETAIL catalog.schema.table_name;

8.3. Common Gotchas

| Issue               | Cause                          | Prevention                    |
| ------------------- | ------------------------------ | ----------------------------- |
| Column name case    | Databricks preserves case      | Use DESCRIBE before query     |
| Data type mismatch  | Implicit conversion fails      | Check column types explicitly |
| NULL handling       | Unexpected NULL in aggregation | Use COALESCE or filter NULLs  |
| Timestamp precision | TIMESTAMP vs TIMESTAMP_NTZ     | Verify type before comparison |
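
The NULL-handling row is easiest to see with numbers. SQL aggregates such as AVG skip NULLs, while wrapping the column in COALESCE(x, 0) counts them as zeros, so the two give different answers. The sketch below mimics both behaviours in plain Python, with None standing in for NULL; it illustrates the arithmetic only and does not query Databricks.

```python
def avg_skipping_nulls(values):
    """Mimic AVG(x): NULLs are ignored entirely."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def avg_with_coalesce(values, default=0):
    """Mimic AVG(COALESCE(x, default)): NULLs are replaced before averaging."""
    filled = [default if v is None else v for v in values]
    return sum(filled) / len(filled) if filled else None


values = [10, None, 20]
print(avg_skipping_nulls(values))  # 15.0
print(avg_with_coalesce(values))   # 10.0
```

Which result is correct depends on whether a missing value means "unknown" or "zero", so the choice between filtering and COALESCE should be made deliberately.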

8.4. Knowledge Accumulation

When encountering schema-related issues, update this skill with:

Universal patterns (case sensitivity, type coercion rules)
Common column naming conventions in Unity Catalog
Databricks-specific SQL behaviors

NOTE: Do not include project-specific table names or business logic. Keep entries generalizable across environments.

9. Reference Links

Official docs: https://docs.databricks.com/
Unity Catalog: https://docs.databricks.com/en/data-governance/unity-catalog/
Lakeflow Jobs: https://docs.databricks.com/en/jobs/
MLflow: https://docs.databricks.com/en/mlflow/
Delta Lake: https://docs.databricks.com/en/delta/
Security: https://docs.databricks.com/en/security/

Databricks Expert Engineer Skill - Comprehensive guide for data engineering, machine learning infrastructure, and permission design

Use when:
- Running databricks CLI commands (auth, api)
- Executing SQL queries via Databricks SQL Warehouse
- Working with Unity Catalog permissions
- Managing Lakeflow Jobs or Delta Lake

tools

Updated Apr 15, 2026

npx skillsauth add kumewata/dotfiles databricks

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

- Trivy (container and dependency vulnerability scanner): Clean, 95%
- Semgrep (static code analysis for vulnerabilities): Clean, 95%
- mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
- Snyk (dep) (open source security scanning): Skipped, 50%
- Socket.dev (supply chain security analysis): Skipped, 50%
- VirusTotal (multi-engine malware detection): Skipped, 50%
- CrowdStrike (advanced threat intelligence): Skipped, 50%
- OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
- OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 15, 2026, 1:10 PM (3.8s, 1 file scanned)

Install manually

# Clone the repo
git clone https://github.com/kumewata/dotfiles.git

# Copy into the Claude Code skills folder (global), creating it if needed
mkdir -p ~/.claude/skills
cp -r dotfiles/config/agents/skills/databricks ~/.claude/skills/

Repository: kumewata/dotfiles

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT