skills/codex/databricks-app-python/SKILL.md
Build Python-based Databricks applications using frameworks like Dash, Streamlit, Flask, or other Python web frameworks.
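Invoke when user requests:

- "Dash app" or "Dash application"
- "Streamlit app" or "Streamlit application"
- "Python web app" for Dat…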
Do NOT invoke if user specifies: APX, React, Node.js, or other non-Python frameworks.
Ask user which framework to use if not specified:
| Aspect            | Dash                               | Streamlit                         |
| ----------------- | ---------------------------------- | --------------------------------- |
| Development Speed | Moderate (more boilerplate)        | Fast (script-based)               |
| Learning Curve    | Steeper (callbacks, components)    | Gentle (Pythonic, intuitive)      |
| Layout Control    | High (Bootstrap grid, custom CSS)  | Medium (columns, containers)      |
| Styling           | Extensive (Bootstrap themes, CSS)  | Limited (custom CSS via markdown) |
| Callbacks         | Explicit (Input/Output decorators) | Automatic (reruns on interaction) |
| State Management  | Manual (via callbacks)             | Built-in (st.session_state)       |
| Performance       | Better for complex interactions    | Slower (full page reruns)         |
| Best For          | Production dashboards, BI tools    | Prototypes, data science demos    |
| Multi-page Apps   | Better routing support             | Simpler but less flexible         |
| Data Science Fit  | Good (requires more setup)         | Excellent (notebook-like)         |
| Code Complexity   | ~600 lines for full app            | ~400 lines for full app           |
Choose Dash when:
Choose Streamlit when:
For framework-specific details, see:
- python --version (3.9+)
- uv package manager: uv --version
- DATABRICKS_WAREHOUSE_ID (required for SQL backend)

All Python Databricks apps follow this pattern:
app-directory/
├── models.py # Pydantic data models
├── backend_mock.py # Mock backend with sample data
├── backend_real.py # Real Databricks backend
├── {framework}_app.py # Main application (dash_app.py, streamlit_app.py, etc.)
├── setup_database.py # Database initialization
├── requirements.txt # Python dependencies
├── app.yaml # Databricks Apps configuration
├── .env # Environment configuration
└── README.md # Documentation
Dash (dash_app.py):
dash>=2.14.0
dash-bootstrap-components>=1.5.0
pandas>=2.0.0
plotly>=5.17.0
pydantic>=2.0.0
python-dotenv>=1.0.0
databricks-sdk>=0.12.0
databricks-sql-connector>=3.0.0
Streamlit (streamlit_app.py):
streamlit>=1.28.0
pandas>=2.0.0
plotly>=5.17.0
pydantic>=2.0.0
python-dotenv>=1.0.0
databricks-sdk>=0.12.0
databricks-sql-connector>=3.0.0
Key Difference: Dash requires dash-bootstrap-components; Streamlit needs no additional
UI libraries.
# Standard environment variables
USE_MOCK_BACKEND=true|false # Toggle backend mode
DATABRICKS_WAREHOUSE_ID=... # SQL Warehouse ID (required)
DATABRICKS_CATALOG=main # Unity Catalog
DATABRICKS_SCHEMA=app_schema # Schema name
DATABRICKS_APP_PORT=8080 # Application port
DEBUG=false # Debug mode
# Note: No DATABRICKS_TOKEN needed when using SDK Config
# Authentication handled automatically via:
# - Databricks CLI profile (local development)
# - Service principal (Databricks Apps)
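The variables above can be read into one typed object at startup so the rest of the app never touches `os.environ` directly. The following is a minimal sketch, not part of the original skill: `AppSettings` and `load_settings` are hypothetical names, and the rule that the warehouse ID is only enforced when the mock backend is off is an assumption based on the "required for SQL backend" note.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AppSettings:
    """Typed view of the standard environment variables (hypothetical helper)."""
    use_mock_backend: bool
    warehouse_id: Optional[str]
    catalog: str
    schema: str
    port: int
    debug: bool


def _as_bool(value: str) -> bool:
    # Only the literal string "true" (any case) counts as enabled.
    return value.strip().lower() == "true"


def load_settings(env: Mapping[str, str] = os.environ) -> AppSettings:
    """Build settings from a mapping, defaulting to the process environment."""
    use_mock = _as_bool(env.get("USE_MOCK_BACKEND", "true"))
    warehouse_id = env.get("DATABRICKS_WAREHOUSE_ID") or None
    # Assumption: the warehouse ID is only needed when the real backend is used.
    if not use_mock and not warehouse_id:
        raise ValueError("DATABRICKS_WAREHOUSE_ID is required when USE_MOCK_BACKEND=false")
    return AppSettings(
        use_mock_backend=use_mock,
        warehouse_id=warehouse_id,
        catalog=env.get("DATABRICKS_CATALOG", "main"),
        schema=env.get("DATABRICKS_SCHEMA", "app_schema"),
        port=int(env.get("DATABRICKS_APP_PORT", "8080")),
        debug=_as_bool(env.get("DEBUG", "false")),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to exercise in tests without modifying the real environment.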
import os

USE_MOCK = os.getenv("USE_MOCK_BACKEND", "true").lower() == "true"

if USE_MOCK:
    from backend_mock import MockBackend
    backend = MockBackend()
else:
    from backend_real import RealBackend
    backend = RealBackend()
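The toggle only works if both backends expose the same methods. One way to make that contract explicit is a `typing.Protocol`; this is a sketch under assumptions, since the original skill does not define a shared interface. `Backend`, `InMemoryBackend`, and `require_backend` are hypothetical names, and only `get_entities` is included because it is the one method both backends in this document are shown to share.

```python
from typing import Any, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class Backend(Protocol):
    """Structural interface that the mock and real backends are assumed to share."""

    def get_entities(self, filter_criteria: Optional[dict] = None) -> List[Any]:
        ...


class InMemoryBackend:
    """Stand-in used only to demonstrate the check; not part of the skill."""

    def get_entities(self, filter_criteria: Optional[dict] = None) -> List[Any]:
        return []


def require_backend(candidate: object) -> Backend:
    """Fail fast at startup if the selected backend lacks the expected method."""
    # Note: a runtime_checkable isinstance check verifies method presence only,
    # not the signature or return type.
    if not isinstance(candidate, Backend):
        raise TypeError(f"{type(candidate).__name__} does not implement get_entities")
    return candidate


backend = require_backend(InMemoryBackend())
```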
from pydantic import BaseModel, ConfigDict, Field, field_validator
from decimal import Decimal
from datetime import datetime
from enum import Enum
from typing import List, Optional


class StatusEnum(str, Enum):
    """Status enumeration"""
    ACTIVE = "active"
    INACTIVE = "inactive"


class Entity(BaseModel):
    """Main entity model"""
    # Pydantic v2 configuration; replaces the deprecated inner `class Config`
    model_config = ConfigDict(
        json_schema_extra={
            "example": {
                "id": "ENT-001",
                "name": "Example Entity",
                "status": "active",
                "amount": "99.99"
            }
        }
    )

    id: str = Field(..., description="Unique identifier")
    name: str = Field(..., description="Entity name")
    created_at: datetime = Field(default_factory=datetime.utcnow)
    status: StatusEnum = Field(default=StatusEnum.ACTIVE)
    amount: Decimal = Field(..., description="Monetary amount", gt=0)

    @field_validator('amount', mode='before')
    @classmethod
    def validate_amount(cls, v):
        """Ensure amount is a valid Decimal"""
        # Going through str() avoids binary float artifacts such as 0.1 + 0.2
        if isinstance(v, (int, float, str)):
            return Decimal(str(v))
        return v
from decimal import Decimal  # needed for the sample amounts below
from typing import List, Optional
from models import Entity


class MockBackend:
    """Mock backend with sample data"""

    def __init__(self):
        self.entities = self._generate_entities()

    def _generate_entities(self) -> List[Entity]:
        """Generate sample data"""
        return [
            Entity(id="ENT-001", name="Entity 1", amount=Decimal("100.00")),
            Entity(id="ENT-002", name="Entity 2", amount=Decimal("200.00")),
        ]

    def get_entities(self, filter_criteria: Optional[dict] = None) -> List[Entity]:
        """Get entities with optional filtering"""
        results = self.entities
        if filter_criteria:
            # Apply filters
            if filter_criteria.get("status"):
                results = [e for e in results if e.status == filter_criteria["status"]]
        return results

    def get_entity(self, entity_id: str) -> Optional[Entity]:
        """Get specific entity"""
        for entity in self.entities:
            if entity.id == entity_id:
                return entity
        return None

    def get_statistics(self) -> dict:
        """Get aggregated statistics"""
        return {
            "total_count": len(self.entities),
            "total_amount": float(sum(e.amount for e in self.entities))
        }
Important: For SQL Warehouse connection examples, see the Databricks Apps Cookbook:
import os
from databricks import sql
from databricks.sdk import WorkspaceClient
from databricks.sdk.core import Config
from typing import List, Optional
from models import Entity


class RealBackend:
    """Real backend using Databricks SQL with SDK Config authentication"""

    def __init__(self, catalog: Optional[str] = None, schema: Optional[str] = None):
        self.catalog = catalog or os.getenv("DATABRICKS_CATALOG", "main")
        self.schema = schema or os.getenv("DATABRICKS_SCHEMA", "app_schema")
        self.warehouse_id = os.getenv("DATABRICKS_WAREHOUSE_ID")
        if not self.warehouse_id:
            raise ValueError("DATABRICKS_WAREHOUSE_ID required")
        self.config = Config()  # Automatically handles authentication
        self._connection = None

    def _get_connection(self):
        """Get or create database connection using SDK Config"""
        if self._connection is None:
            self._connection = sql.connect(
                server_hostname=self.config.host,
                http_path=f"/sql/1.0/warehouses/{self.warehouse_id}",
                credentials_provider=lambda: self.config.authenticate
            )
        return self._connection

    def _execute_query(self, query: str, params: Optional[dict] = None) -> List[dict]:
        """Execute SQL query and return results"""
        connection = self._get_connection()
        cursor = connection.cursor()
        try:
            cursor.execute(query, params or {})
            # Statements such as CREATE TABLE may not produce a result set
            if cursor.description is None:
                return []
            columns = [desc[0] for desc in cursor.description]
            return [dict(zip(columns, row)) for row in cursor.fetchall()]
        finally:
            cursor.close()

    def get_entities(self, filter_criteria: Optional[dict] = None) -> List[Entity]:
        """Get entities with optional filtering"""
        query = f"""
            SELECT * FROM {self.catalog}.{self.schema}.entities
            WHERE 1=1
        """
        params = {}
        if filter_criteria and filter_criteria.get("status"):
            # Named parameter marker keeps user input out of the SQL text
            query += " AND status = :status"
            params["status"] = filter_criteria["status"]
        query += " ORDER BY created_at DESC"
        results = self._execute_query(query, params)
        return [Entity(**row) for row in results]

    def initialize_schema(self):
        """Initialize database schema"""
        self._execute_query(f"""
            CREATE TABLE IF NOT EXISTS {self.catalog}.{self.schema}.entities (
                id STRING NOT NULL,
                name STRING NOT NULL,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP(),
                status STRING NOT NULL,
                amount DECIMAL(10, 2) NOT NULL,
                PRIMARY KEY (id)
            )
        """)

    def close(self):
        """Close database connection"""
        if self._connection:
            self._connection.close()
            self._connection = None
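The setup script that follows calls `backend.insert_entity(entity)`, but no such method appears in the `RealBackend` excerpt above. The sketch below shows one possible way to build the statement for it, assuming the same named `:param` markers used in `get_entities` and the column names from `initialize_schema`. `build_insert` is a hypothetical helper, not part of the original skill.

```python
from decimal import Decimal
from typing import Any, Dict, Tuple

# Column order mirrors the CREATE TABLE statement in initialize_schema.
ENTITY_COLUMNS = ("id", "name", "created_at", "status", "amount")


def build_insert(catalog: str, schema: str, entity: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return an INSERT statement with named markers plus its parameter dict."""
    missing = [c for c in ENTITY_COLUMNS if c not in entity]
    if missing:
        raise ValueError(f"Entity is missing fields: {', '.join(missing)}")
    columns = ", ".join(ENTITY_COLUMNS)
    markers = ", ".join(f":{c}" for c in ENTITY_COLUMNS)
    statement = f"INSERT INTO {catalog}.{schema}.entities ({columns}) VALUES ({markers})"
    # Only the known columns are passed on, so extra keys are ignored.
    params = {c: entity[c] for c in ENTITY_COLUMNS}
    return statement, params


statement, params = build_insert(
    "main",
    "app_schema",
    {
        "id": "ENT-001",
        "name": "Entity 1",
        "created_at": "2024-01-01 00:00:00",
        "status": "active",
        "amount": Decimal("100.00"),
    },
)
```

An `insert_entity` method could then pass this pair to the backend's query helper. Whether the connector accepts each value type as-is, for example an enum member for `status`, would need to be checked against the connector version in use.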
"""Database setup script"""
import os
import argparse
from dotenv import load_dotenv
from backend_mock import MockBackend
from backend_real import RealBackend


def setup_database(seed_data: bool = False):
    """Initialize database and optionally seed data"""
    load_dotenv()

    # Verify environment. Host and token are not checked here because
    # authentication is handled by SDK Config (CLI profile or service principal).
    required_vars = ["DATABRICKS_WAREHOUSE_ID"]
    missing = [v for v in required_vars if not os.getenv(v)]
    if missing:
        print(f"Missing: {', '.join(missing)}")
        return 1

    # Initialize backend
    backend = RealBackend()
    try:
        backend.initialize_schema()

        # Seed if requested
        if seed_data:
            mock = MockBackend()
            # Copy data from mock to real backend.
            # Note: insert_entity must be implemented on RealBackend;
            # it is not part of the RealBackend excerpt shown earlier.
            for entity in mock.entities:
                backend.insert_entity(entity)
    finally:
        # Release the connection even if schema setup or seeding fails
        backend.close()
    return 0


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", action="store_true")
    args = parser.parse_args()
    raise SystemExit(setup_database(seed_data=args.seed))
- json_schema_extra examples
- .env files for configuration
- .env.example template
- --seed

# Always convert to Decimal for monetary values
@field_validator('price', 'total', mode='before')
@classmethod
def validate_decimal(cls, v):
    if isinstance(v, (int, float, str)):
        return Decimal(str(v))
    return v

# Consistent date formatting
order.order_date.strftime("%Y-%m-%d %H:%M")
order.created_at.isoformat()

# Map status to visual indicators
STATUS_COLORS = {
    Status.ACTIVE: "#2CA02C",   # Green
    Status.PENDING: "#FF7F0E",  # Orange
    Status.FAILED: "#D62728",   # Red
}

# Reusable filter criteria model
class FilterCriteria(BaseModel):
    status: Optional[Status] = None
    date_from: Optional[datetime] = None
    date_to: Optional[datetime] = None
    search: Optional[str] = None
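A filter model like this is typically turned into a parameterized WHERE clause before it reaches the SQL backend. The following is a rough sketch of that step, not the skill's own implementation: `build_where` is a hypothetical helper, it takes a plain mapping (for a Pydantic v2 model, `criteria.model_dump(exclude_none=True)` would produce one), and the `created_at` and `name` column names are assumptions borrowed from the entities table.

```python
from datetime import datetime
from typing import Any, Dict, List, Mapping, Tuple


def build_where(criteria: Mapping[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Translate optional filter fields into SQL conditions and named parameters."""
    clauses: List[str] = []
    params: Dict[str, Any] = {}

    status = criteria.get("status")
    if status is not None:
        clauses.append("status = :status")
        # Enum members carry their raw value in .value; plain strings pass through.
        params["status"] = getattr(status, "value", status)

    if criteria.get("date_from") is not None:
        clauses.append("created_at >= :date_from")
        params["date_from"] = criteria["date_from"]

    if criteria.get("date_to") is not None:
        clauses.append("created_at <= :date_to")
        params["date_to"] = criteria["date_to"]

    search = criteria.get("search")
    if search:
        # Case-insensitive substring match on the assumed name column.
        clauses.append("LOWER(name) LIKE :search")
        params["search"] = f"%{search.lower()}%"

    # "1=1" keeps the caller's "WHERE ..." valid when no filter is set.
    return (" AND ".join(clauses) if clauses else "1=1"), params


where_sql, where_params = build_where(
    {"status": "active", "date_from": datetime(2024, 1, 1), "search": "Ent"}
)
```

Keeping the values in the parameter dict, rather than formatting them into the SQL string, matches the `:status` marker style used by the real backend.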
First Step: Check Application Logs
# Always check logs first when troubleshooting
databricks apps logs <app-name> --profile <profile-name>
# Examples:
databricks apps logs order-management-dash-dev -p DEFAULT
databricks apps logs order-management-streamlit-dev -p DEFAULT
Logs reveal:
Connection Issues
- databricks auth profiles
- DATABRICKS_WAREHOUSE_ID exists and is accessible
- databricks warehouses get <warehouse-id>
- databricks apps logs <app-name>

Data Type Errors
Performance Issues
IMPORTANT: Before deploying, ask the user which deployment method they prefer:
- databricks apps commands

Example: "Would you like to deploy using Databricks CLI or Databricks Asset Bundles (DABs)?"
Prerequisites:
Steps:
For Dash apps:
command:
  - 'python'
  - 'dash_app.py'
env:
  - name: USE_MOCK_BACKEND
    value: 'false'
  - name: DATABRICKS_WAREHOUSE_ID
    value: 'your-warehouse-id'
  - name: DATABRICKS_CATALOG
    value: 'main'
  - name: DATABRICKS_SCHEMA
    value: 'app_schema'
  - name: DATABRICKS_APP_PORT
    value: '8080'
  - name: DEBUG
    value: 'false'
For Streamlit apps:
command:
  - 'streamlit'
  - 'run'
  - 'streamlit_app.py'
  - '--server.port'
  - '8080'
  - '--server.address'
  - '0.0.0.0'
env:
  - name: USE_MOCK_BACKEND
    value: 'false'
  - name: DATABRICKS_WAREHOUSE_ID
    value: 'your-warehouse-id'
  - name: DATABRICKS_CATALOG
    value: 'main'
  - name: DATABRICKS_SCHEMA
    value: 'app_schema'
Note: Streamlit is started with the streamlit run command, while Dash is started with python.
Streamlit doesn't need the DATABRICKS_APP_PORT env var because the port is passed in the command.
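Because the Dash configuration passes the port through `DATABRICKS_APP_PORT` rather than on the command line, the Dash entry point has to read it itself. A minimal sketch of that lookup follows; `resolve_port` is a hypothetical helper and the 8080 fallback simply mirrors the value used in the configuration above.

```python
import os
from typing import Mapping


def resolve_port(env: Mapping[str, str] = os.environ, default: int = 8080) -> int:
    """Read the listening port from DATABRICKS_APP_PORT, with a fallback."""
    raw = env.get("DATABRICKS_APP_PORT", "").strip()
    if not raw:
        return default
    port = int(raw)  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"DATABRICKS_APP_PORT out of range: {port}")
    return port


port = resolve_port({"DATABRICKS_APP_PORT": "8080"})
```

In `dash_app.py` the result would then be handed to the server start call, for example `app.run(host="0.0.0.0", port=resolve_port())`, assuming a Dash version that provides `app.run`.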
# Run setup script locally (requires profile configured)
python setup_database.py --seed
databricks apps create <app-name> --profile <profile-name>
databricks workspace mkdirs /Workspace/Users/<user>/apps/<app-name> --profile <profile-name>
databricks workspace import-dir . /Workspace/Users/<user>/apps/<app-name> --profile <profile-name>
databricks apps deploy <app-name> \
--source-code-path /Workspace/Users/<user>/apps/<app-name> \
--profile <profile-name>
databricks apps get <app-name> --profile <profile-name>
Redeployment:
# Update workspace files
databricks workspace delete /Workspace/Users/<user>/apps/<app-name> --recursive --profile <profile-name>
databricks workspace mkdirs /Workspace/Users/<user>/apps/<app-name> --profile <profile-name>
databricks workspace import-dir . /Workspace/Users/<user>/apps/<app-name> --profile <profile-name>
# Redeploy
databricks apps deploy <app-name> \
--source-code-path /Workspace/Users/<user>/apps/<app-name> \
--profile <profile-name>
Prerequisites:
Advantages:
Recommended Workflow: CLI First, Then DABs
Deploy app using CLI first (see Option 1 above)
Generate bundle configuration from existing app
# This creates resources/*.app.yml and downloads source to src/app/
databricks bundle generate app \
--existing-app-name <app-name> \
--key <resource_key> \
--profile <profile-name>
# Example:
databricks bundle generate app \
--existing-app-name order-management-dash \
--key order_management_dash \
--profile DEFAULT
What gets generated:
- resources/<resource_key>.app.yml - Minimal app resource definition
- src/app/ - All app source files including app.yaml with env vars
- databricks.yml updated with bundle structure

Edit databricks.yml:
bundle:
  name: <app-name>

include:
  - resources/*.yml

variables:
  warehouse_id:
    default: 'your-warehouse-id'
  catalog:
    default: 'main'
  schema:
    default: 'app_schema'

targets:
  dev:
    default: true
    mode: development
    workspace:
      profile: <profile-name>
    variables:
      warehouse_id: 'dev-warehouse-id'
      schema: 'app_schema_dev'
  prod:
    mode: production
    workspace:
      profile: <profile-name>
    variables:
      warehouse_id: 'prod-warehouse-id'
      schema: 'app_schema_prod'
Edit resources/<resource_key>.app.yml:
resources:
  apps:
    <resource_key>:
      name: <app-name>-${bundle.target} # Environment-specific naming
      description: 'Python ${framework} application'
      source_code_path: ../src/app # Or .. if source in project root
Important: Environment variables are in src/app/app.yaml, NOT in databricks.yml:
command:
  - 'python'
  - 'dash_app.py'
env:
  - name: USE_MOCK_BACKEND
    value: 'false'
  - name: DATABRICKS_WAREHOUSE_ID
    value: 'your-warehouse-id'
  - name: DATABRICKS_CATALOG
    value: 'main'
  - name: DATABRICKS_SCHEMA
    value: 'app_schema'
# Validate configuration
databricks bundle validate -t dev
# Deploy to dev (creates/updates resource)
databricks bundle deploy -t dev
# Start the app (required after deployment)
databricks bundle run <resource_key> -t dev
# For production
databricks bundle deploy -t prod
databricks bundle run <resource_key> -t prod
Key Differences from Other Resources:
- app.yaml (source dir), NOT databricks.yml
- databricks bundle run to start the app after deployment

For complete DABs guidance, use the asset-bundles skill.
Verify deployment
Configure permissions
Set up monitoring and view logs
View application logs:
# View logs for your deployed app
databricks apps logs <app-name> --profile <profile-name>
# Examples:
databricks apps logs order-management-dash-dev --profile DEFAULT
databricks apps logs order-management-streamlit-dev --profile DEFAULT
What logs show:
- [SYSTEM] - Deployment status, file updates, dependency installation
- [APP] - Application output (print statements, framework messages)

Useful for debugging:
Additional monitoring:
Documentation
For framework-specific implementation details: