skills/codex/databricks-lakebase-provisioned/SKILL.md
# Lakebase Provisioned

Patterns and best practices for using Lakebase Provisioned (Databricks managed PostgreSQL) for OLTP workloads.

## When to Use

Use this skill when:

- Building applications that need a PostgreSQL database for transactional workloads
- Adding persistent state to Databricks Apps
- Implementing reverse ETL from Delta Lake to an operational database
- Storing chat/agent m…
Lakebase Provisioned is Databricks' managed PostgreSQL database service for OLTP (Online Transaction Processing) workloads. It provides a fully managed PostgreSQL-compatible database that integrates with Unity Catalog and supports OAuth token-based authentication.
| Feature              | Description                                          |
| -------------------- | ---------------------------------------------------- |
| Managed PostgreSQL   | Fully managed instances with automatic provisioning  |
| OAuth Authentication | Token-based auth via Databricks SDK (1-hour expiry)  |
| Unity Catalog        | Register databases for governance                    |
| Reverse ETL          | Sync data from Delta tables to PostgreSQL            |
| Apps Integration     | First-class support in Databricks Apps               |
Available Regions (AWS): us-east-1, us-east-2, us-west-2, eu-central-1, eu-west-1, ap-south-1, ap-southeast-1, ap-southeast-2
Create and connect to a Lakebase Provisioned instance:
```python
from databricks.sdk import WorkspaceClient

# Initialize client
w = WorkspaceClient()

# Create a database instance
instance = w.database.create_database_instance(
    name="my-lakebase-instance",
    capacity="SMALL",  # SMALL, MEDIUM, LARGE
    stopped=False
)
print(f"Instance created: {instance.name}")
print(f"DNS endpoint: {instance.read_write_dns}")
```
```python
from databricks.sdk import WorkspaceClient
import uuid

w = WorkspaceClient()

# Generate OAuth token for database connection
cred = w.database.generate_database_credential(
    request_id=str(uuid.uuid4()),
    instance_names=["my-lakebase-instance"]
)
token = cred.token  # Use this as password in connection string
```
```python
import psycopg
from databricks.sdk import WorkspaceClient
import uuid

# Get instance details
w = WorkspaceClient()
instance = w.database.get_database_instance(name="my-lakebase-instance")

# Generate token
cred = w.database.generate_database_credential(
    request_id=str(uuid.uuid4()),
    instance_names=["my-lakebase-instance"]
)

# Connect using psycopg3; the OAuth token is used as the password
conn_string = (
    f"host={instance.read_write_dns} dbname=postgres "
    f"user={w.current_user.me().user_name} password={cred.token} sslmode=require"
)
with psycopg.connect(conn_string) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT version()")
        print(cur.fetchone())
```
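The connection string above is assembled with plain string interpolation, which breaks if a value contains a space, a single quote, or a backslash. As a minimal sketch, the helper below applies libpq's keyword/value quoting rules by hand. The names `quote_conninfo_value` and `build_conninfo` are hypothetical and not part of any library; psycopg 3 also ships `psycopg.conninfo.make_conninfo`, and `psycopg.connect` accepts keyword arguments, either of which may be preferable in practice.

```python
def quote_conninfo_value(value: str) -> str:
    """Quote one value for a libpq keyword/value connection string.

    libpq requires single quotes around empty values and values containing
    whitespace; single quotes and backslashes are escaped with a backslash.
    """
    needs_quotes = value == "" or any(
        ch.isspace() or ch in "'\\" for ch in value
    )
    if not needs_quotes:
        return value
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_conninfo(**params: str) -> str:
    """Join keyword arguments into a single conninfo string (hypothetical helper)."""
    return " ".join(
        f"{key}={quote_conninfo_value(str(val))}" for key, val in params.items()
    )


if __name__ == "__main__":
    # Placeholder values only; a real host and token come from the SDK calls above.
    print(build_conninfo(host="example-host", dbname="postgres", sslmode="require"))
```

Passing the instance DNS name, user name, and token through such a helper keeps the string well-formed regardless of what characters the values contain.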
Tokens expire after 1 hour, so long-running applications must refresh them:
```python
import asyncio
import os
import uuid

from sqlalchemy import event
from sqlalchemy.ext.asyncio import AsyncEngine, AsyncSession, create_async_engine
from sqlalchemy.orm import sessionmaker
from databricks.sdk import WorkspaceClient

# Token refresh state
_current_token = None
_token_refresh_task = None
TOKEN_REFRESH_INTERVAL = 50 * 60  # 50 minutes (before 1-hour expiry)


def _generate_token(instance_name: str) -> str:
    """Generate fresh OAuth token."""
    w = WorkspaceClient()
    cred = w.database.generate_database_credential(
        request_id=str(uuid.uuid4()),
        instance_names=[instance_name]
    )
    return cred.token


async def _token_refresh_loop(instance_name: str):
    """Background task to refresh token every 50 minutes.

    Must be scheduled from a running event loop, for example with
    asyncio.create_task(_token_refresh_loop(instance_name)).
    """
    global _current_token
    while True:
        await asyncio.sleep(TOKEN_REFRESH_INTERVAL)
        # The SDK call is blocking, so run it in a worker thread
        _current_token = await asyncio.to_thread(_generate_token, instance_name)


def init_database(instance_name: str, database_name: str, username: str) -> AsyncEngine:
    """Initialize database with OAuth token injection."""
    global _current_token
    w = WorkspaceClient()
    instance = w.database.get_database_instance(name=instance_name)

    # Generate initial token
    _current_token = _generate_token(instance_name)

    # Build URL (password injected via do_connect)
    url = f"postgresql+psycopg://{username}@{instance.read_write_dns}:5432/{database_name}"
    engine = create_async_engine(
        url,
        pool_size=5,
        max_overflow=10,
        pool_recycle=3600,
        connect_args={"sslmode": "require"}
    )

    # Inject token on each connection
    @event.listens_for(engine.sync_engine, "do_connect")
    def provide_token(dialect, conn_rec, cargs, cparams):
        cparams["password"] = _current_token

    return engine
```
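An alternative to a background refresh loop is to refresh lazily, only when a new connection asks for a token and the cached one is older than the refresh interval. The sketch below illustrates that idea under stated assumptions: `TokenCache` is a hypothetical class, and the `fetch` callable stands in for a function such as `_generate_token` bound to an instance name. The clock is injectable so the expiry logic can be exercised without waiting.

```python
import threading
import time
from typing import Callable, Optional


class TokenCache:
    """Cache a token and re-fetch it once it is older than max_age_seconds.

    Hypothetical helper: `fetch` is any zero-argument callable returning a
    fresh token string; `clock` returns the current time in seconds.
    """

    def __init__(
        self,
        fetch: Callable[[], str],
        max_age_seconds: float = 50 * 60,  # refresh before the 1-hour expiry
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._fetched_at = 0.0
        self._lock = threading.Lock()  # pools may open connections from several threads

    def get(self) -> str:
        """Return the cached token, fetching a new one if missing or stale."""
        with self._lock:
            now = self._clock()
            if self._token is None or now - self._fetched_at >= self._max_age:
                self._token = self._fetch()
                self._fetched_at = now
            return self._token


if __name__ == "__main__":
    cache = TokenCache(lambda: "placeholder-token")
    print(cache.get())
```

A `do_connect` listener could then set `cparams["password"] = cache.get()`. The trade-off is that the blocking fetch happens on the connecting thread when the token is stale, whereas the background loop keeps that cost off the connection path.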
For Databricks Apps, use environment variables for configuration:
```python
# Environment variables set by Databricks Apps:
# - LAKEBASE_INSTANCE_NAME: Instance name
# - LAKEBASE_DATABASE_NAME: Database name
# - LAKEBASE_USERNAME: Username (optional, defaults to service principal)
import os


def is_lakebase_configured() -> bool:
    """Check if Lakebase is configured for this app."""
    return bool(
        os.environ.get("LAKEBASE_PG_URL") or
        (os.environ.get("LAKEBASE_INSTANCE_NAME") and
         os.environ.get("LAKEBASE_DATABASE_NAME"))
    )
```
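Beyond checking whether the variables exist, an app usually needs their values in one place. The following is a minimal sketch that reads the three variables listed above into a small dataclass. `LakebaseConfig` and `load_lakebase_config` are hypothetical names, and the `LAKEBASE_PG_URL` case is deliberately left out because its format is not described here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LakebaseConfig:
    """Connection settings read from the app environment (hypothetical type)."""
    instance_name: str
    database_name: str
    username: Optional[str] = None  # None means "use the default identity"


def load_lakebase_config(
    env: Mapping[str, str] = os.environ,
) -> Optional[LakebaseConfig]:
    """Return the config, or None if a required variable is missing or empty."""
    instance = env.get("LAKEBASE_INSTANCE_NAME")
    database = env.get("LAKEBASE_DATABASE_NAME")
    if not (instance and database):
        return None
    return LakebaseConfig(
        instance_name=instance,
        database_name=database,
        username=env.get("LAKEBASE_USERNAME") or None,
    )


if __name__ == "__main__":
    config = load_lakebase_config()
    print(config if config else "Lakebase is not configured")
```

Accepting the environment as a mapping argument keeps the function easy to test with a plain dictionary.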
Add Lakebase as an app resource via CLI:
```bash
databricks apps add-resource $APP_NAME \
    --resource-type database \
    --resource-name lakebase \
    --database-instance my-lakebase-instance
```
```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Register database in Unity Catalog
w.database.register_database_instance(
    name="my-lakebase-instance",
    catalog="my_catalog",
    schema="my_schema"
)
```
Declare Lakebase as a model resource for automatic credential provisioning:
```python
import mlflow
from mlflow.models.resources import DatabricksLakebase

resources = [
    DatabricksLakebase(database_instance_name="my-lakebase-instance"),
]

# When logging model (`model` is your LangChain model object, defined elsewhere)
mlflow.langchain.log_model(
    model,
    artifact_path="model",
    resources=resources,
    pip_requirements=["databricks-langchain[memory]"]
)
```
```bash
# Create instance
databricks database create-database-instance \
    --name my-lakebase-instance \
    --capacity SMALL

# Get instance details
databricks database get-database-instance --name my-lakebase-instance

# Generate credentials
databricks database generate-database-credential \
    --request-id $(uuidgen) \
    --json '{"instance_names": ["my-lakebase-instance"]}'

# List instances
databricks database list-database-instances

# Stop instance (saves cost)
databricks database stop-database-instance --name my-lakebase-instance

# Start instance
databricks database start-database-instance --name my-lakebase-instance
```
| Issue | Solution |
| ----------------------------------- | ------------------------------------------------------------------------ |
| Token expired during long query | Implement token refresh loop (see Pattern 3); tokens expire after 1 hour |
| DNS resolution fails on macOS | Use dig command to resolve hostname, pass hostaddr to psycopg |
| Connection refused | Ensure instance is not stopped; check instance.state |
| Permission denied | User must be granted access to the Lakebase instance |
| SSL required error | Always use sslmode=require in connection string |
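The macOS DNS row in the table above can be sketched roughly as follows: resolve the hostname with `dig +short`, pick the first line that is an IP address, and pass it as `hostaddr` while keeping `host` so TLS still sees the original name. The function names are hypothetical, and the sketch assumes `dig` is on the PATH.

```python
import ipaddress
import subprocess
from typing import Dict, Optional


def first_ip(dig_output: str) -> Optional[str]:
    """Return the first line of `dig +short` output that parses as an IP address.

    `dig +short` may print CNAME targets before the address records, so
    non-IP lines are skipped.
    """
    for line in dig_output.splitlines():
        candidate = line.strip()
        try:
            ipaddress.ip_address(candidate)
        except ValueError:
            continue
        return candidate
    return None


def resolve_with_dig(hostname: str) -> Optional[str]:
    """Resolve a hostname by shelling out to dig (assumes dig is installed)."""
    result = subprocess.run(
        ["dig", "+short", hostname],
        capture_output=True,
        text=True,
        check=True,
    )
    return first_ip(result.stdout)


def connect_kwargs(hostname: str, ip: Optional[str], **extra: str) -> Dict[str, str]:
    """Build connection keyword arguments, adding hostaddr only when an IP is known."""
    kwargs: Dict[str, str] = {"host": hostname, **extra}
    if ip:
        kwargs["hostaddr"] = ip
    return kwargs


if __name__ == "__main__":
    # Placeholder dig output; a real run would call resolve_with_dig(hostname).
    print(connect_kwargs("example-host", first_ip("10.0.0.5\n"), sslmode="require"))
```

The resulting dictionary can be passed as keyword arguments to `psycopg.connect`, together with `dbname`, `user`, and `password`.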
- … `hostaddr` parameter for DNS workaround)
- … `postgresql+psycopg` driver
- `%pip install -U "databricks-sdk>=0.81.0" "psycopg[binary]>=3.0" sqlalchemy`
- … `databricks-langchain[memory]`, which includes Lakebase support.