skills/codex/databricks-jobs/SKILL.md
<!-- AUTO-GENERATED by export-skills.py — DO NOT EDIT -->
---
name: databricks-jobs
---

# Databricks Lakeflow Jobs

## Overview
Databricks Jobs orchestrate data workflows with multi-task DAGs, flexible triggers, and comprehensive monitoring. Jobs support diverse task types and can be managed via Python SDK, CLI, or Asset Bundles.
## Reference Files

| Use Case                                                | Reference File              |
| ------------------------------------------------------- | --------------------------- |
| Configure task types (notebook, Python, SQL, dbt, etc.) | task-types.md               |
| Set up triggers and schedules                           | triggers-schedules.md       |
| Configure notifications and health monitoring           | notifications-monitoring.md |
| Complete working examples                               | examples.md                 |
Create a job with the Python SDK (the user folder in the notebook path is a placeholder):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import Task, NotebookTask, Source

w = WorkspaceClient()

job = w.jobs.create(
    name="my-etl-job",
    tasks=[
        Task(
            task_key="extract",
            notebook_task=NotebookTask(
                notebook_path="/Workspace/Users/user@example.com/extract",
                source=Source.WORKSPACE
            )
        )
    ]
)
print(f"Created job: {job.job_id}")
```
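Create the same job with the CLI (the user folder in the notebook path is a placeholder):

```bash
databricks jobs create --json '{
  "name": "my-etl-job",
  "tasks": [{
    "task_key": "extract",
    "notebook_task": {
      "notebook_path": "/Workspace/Users/user@example.com/extract",
      "source": "WORKSPACE"
    }
  }]
}'
```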
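The JSON passed to the CLI can also be assembled in code before it is handed over. A minimal sketch, assuming a hypothetical helper `notebook_job_payload` (not part of any Databricks library) that mirrors the single-task definition above; the user folder in the path is a placeholder:

```python
import json


def notebook_job_payload(name: str, task_key: str, notebook_path: str) -> dict:
    """Build a single-task job definition shaped like the CLI JSON above.

    Hypothetical helper for illustration; it only assembles a dict.
    """
    return {
        "name": name,
        "tasks": [
            {
                "task_key": task_key,
                "notebook_task": {
                    "notebook_path": notebook_path,
                    "source": "WORKSPACE",
                },
            }
        ],
    }


payload = notebook_job_payload(
    name="my-etl-job",
    task_key="extract",
    notebook_path="/Workspace/Users/user@example.com/extract",  # placeholder path
)

# The serialized string is what would follow `--json` on the command line.
payload_json = json.dumps(payload, indent=2)
print(payload_json)
```

Building the payload as a dict keeps the definition easy to diff and reuse across environments.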
Define the job in an Asset Bundle:

```yaml
# resources/jobs.yml
resources:
  jobs:
    my_etl_job:
      name: '[${bundle.target}] My ETL Job'
      tasks:
        - task_key: extract
          notebook_task:
            notebook_path: ../src/notebooks/extract.py
```
Jobs support DAG-based task dependencies:

```yaml
tasks:
  - task_key: extract
    notebook_task:
      notebook_path: ../src/extract.py

  - task_key: transform
    depends_on:
      - task_key: extract
    notebook_task:
      notebook_path: ../src/transform.py

  - task_key: load
    depends_on:
      - task_key: transform
    run_if: ALL_SUCCESS # Only run if all dependencies succeed
    notebook_task:
      notebook_path: ../src/load.py
```
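The `run_if` field on the `load` task above decides whether a task runs once its dependencies have finished. A simplified sketch of that decision for the six conditions this section describes, assuming each dependency ends as either `success` or `failed`. The function `should_run` is hypothetical and does not model skipped or cancelled dependencies:

```python
def should_run(run_if: str, upstream: list[str]) -> bool:
    """Decide whether a task runs, given its dependencies' outcomes.

    `upstream` holds one entry per dependency, each "success" or "failed".
    Simplified model: skipped or cancelled dependencies are not covered.
    """
    succeeded = [outcome == "success" for outcome in upstream]
    failed = [outcome == "failed" for outcome in upstream]
    rules = {
        "ALL_SUCCESS": all(succeeded),
        "ALL_DONE": True,  # every dependency has finished, whatever the result
        "AT_LEAST_ONE_SUCCESS": any(succeeded),
        "NONE_FAILED": not any(failed),
        "ALL_FAILED": all(failed),
        "AT_LEAST_ONE_FAILED": any(failed),
    }
    if run_if not in rules:
        raise ValueError(f"unknown run_if condition: {run_if}")
    return rules[run_if]


# One dependency succeeded and one failed.
outcomes = ["success", "failed"]
for condition in ("ALL_SUCCESS", "ALL_DONE", "AT_LEAST_ONE_FAILED"):
    print(condition, should_run(condition, outcomes))
```

With mixed outcomes, a cleanup task would typically use `ALL_DONE`, while an alerting task would use `AT_LEAST_ONE_FAILED`.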
`run_if` conditions:

- `ALL_SUCCESS` (default) - Run when all dependencies succeed
- `ALL_DONE` - Run when all dependencies complete (success or failure)
- `AT_LEAST_ONE_SUCCESS` - Run when at least one dependency succeeds
- `NONE_FAILED` - Run when no dependencies failed
- `ALL_FAILED` - Run when all dependencies failed
- `AT_LEAST_ONE_FAILED` - Run when at least one dependency failed

| Task Type           | Use Case                  | Reference                                                          |
| ------------------- | ------------------------- | ------------------------------------------------------------------ |
| notebook_task | Run notebooks | task-types.md#notebook-task |
| spark_python_task | Run Python scripts | task-types.md#spark-python-task |
| python_wheel_task | Run Python wheels | task-types.md#python-wheel-task |
| sql_task | Run SQL queries/files | task-types.md#sql-task |
| dbt_task | Run dbt projects | task-types.md#dbt-task |
| pipeline_task | Trigger DLT/SDP pipelines | task-types.md#pipeline-task |
| spark_jar_task | Run Spark JARs | task-types.md#spark-jar-task |
| run_job_task | Trigger other jobs | task-types.md#run-job-task |
| for_each_task | Loop over inputs | task-types.md#for-each-task |
| Trigger Type | Use Case | Reference |
| ---------------------- | --------------------- | ---------------------------------------------------------------------------------------- |
| schedule | Cron-based scheduling | triggers-schedules.md#cron-schedule |
| trigger.periodic | Interval-based | triggers-schedules.md#periodic-trigger |
| trigger.file_arrival | File arrival events | triggers-schedules.md#file-arrival-trigger |
| trigger.table_update | Table change events | triggers-schedules.md#table-update-trigger |
| continuous | Always-running jobs | triggers-schedules.md#continuous-jobs |
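For the cron-based `schedule` trigger, a small sketch of building the schedule block in code. It assumes the Jobs API field names `quartz_cron_expression`, `timezone_id`, and `pause_status`, and that the expression uses Quartz syntax with six or seven fields. The helper `cron_schedule` is hypothetical, and triggers-schedules.md remains the authoritative reference for the exact shape:

```python
def cron_schedule(quartz_cron_expression: str,
                  timezone_id: str = "UTC",
                  paused: bool = False) -> dict:
    """Build a schedule block for a job definition (illustrative helper).

    Quartz expressions have 6 fields (seconds first), or 7 with a year.
    Only the field count is checked here, not the field contents.
    """
    fields = quartz_cron_expression.split()
    if len(fields) not in (6, 7):
        raise ValueError(
            f"expected 6 or 7 Quartz fields, got {len(fields)}: "
            f"{quartz_cron_expression!r}"
        )
    return {
        "quartz_cron_expression": quartz_cron_expression,
        "timezone_id": timezone_id,
        "pause_status": "PAUSED" if paused else "UNPAUSED",
    }


# Every day at 02:00 in the given timezone.
schedule = cron_schedule("0 0 2 * * ?", timezone_id="America/New_York")
print(schedule)
```

A five-field Unix-style expression such as `0 2 * * *` is rejected by this check, which catches one common reason a schedule never fires.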
Define reusable cluster configurations:

```yaml
job_clusters:
  - job_cluster_key: shared_cluster
    new_cluster:
      spark_version: '15.4.x-scala2.12'
      node_type_id: 'i3.xlarge'
      num_workers: 2
      spark_conf:
        spark.speculation: 'true'

tasks:
  - task_key: my_task
    job_cluster_key: shared_cluster
    notebook_task:
      notebook_path: ../src/notebook.py
```
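Use `autoscale` instead of a fixed `num_workers` to let the cluster scale between bounds:

```yaml
new_cluster:
  spark_version: '15.4.x-scala2.12'
  node_type_id: 'i3.xlarge'
  autoscale:
    min_workers: 2
    max_workers: 8
```

Run a task on an existing cluster:

```yaml
tasks:
  - task_key: my_task
    existing_cluster_id: '0123-456789-abcdef12'
    notebook_task:
      notebook_path: ../src/notebook.py
```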
For notebook and Python tasks, omit cluster configuration to use serverless:

```yaml
tasks:
  - task_key: serverless_task
    notebook_task:
      notebook_path: ../src/notebook.py
    # No cluster config = serverless
```
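The compute a task ends up on follows from which cluster key, if any, it carries. A minimal sketch of that rule over a task written as a plain dict. The helper `compute_mode` is hypothetical, and treating only `notebook_task` and `spark_python_task` as serverless-capable is an assumption taken from the wording above; other task types may qualify as well:

```python
# Keys that pin a task to classic compute, as used in the examples above.
CLUSTER_KEYS = ("job_cluster_key", "new_cluster", "existing_cluster_id")

# Assumption: the task types this section names as serverless-capable.
SERVERLESS_TASK_TYPES = ("notebook_task", "spark_python_task")


def compute_mode(task: dict) -> str:
    """Return which compute setting applies to a task definition."""
    for key in CLUSTER_KEYS:
        if key in task:
            return key
    if any(task_type in task for task_type in SERVERLESS_TASK_TYPES):
        return "serverless"
    return "unknown"


shared = {
    "task_key": "my_task",
    "job_cluster_key": "shared_cluster",
    "notebook_task": {"notebook_path": "../src/notebook.py"},
}
serverless = {
    "task_key": "serverless_task",
    "notebook_task": {"notebook_path": "../src/notebook.py"},
}

print(compute_mode(shared))      # job_cluster_key
print(compute_mode(serverless))  # serverless
```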
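Define job-level parameters:

```yaml
parameters:
  - name: env
    default: 'dev'
  - name: date
    default: '{{start_date}}' # Dynamic value reference
```

Read them in a notebook:

```python
# In notebook
dbutils.widgets.get("env")
dbutils.widgets.get("date")
```

Pass task-level parameters, optionally referencing job parameters:

```yaml
tasks:
  - task_key: my_task
    notebook_task:
      notebook_path: ../src/notebook.py
      base_parameters:
        env: '{{job.parameters.env}}'
        custom_param: 'value'
```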
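To make the `{{job.parameters.<name>}}` reference concrete, here is a small sketch of the substitution it implies. It is an illustration only: Databricks resolves these references itself at run time, and the function `resolve_base_parameters` is hypothetical:

```python
import re

# Matches references of the form {{job.parameters.<name>}}.
_JOB_PARAM_REF = re.compile(r"\{\{\s*job\.parameters\.([A-Za-z0-9_\-]+)\s*\}\}")


def resolve_base_parameters(base_parameters: dict, job_parameters: dict) -> dict:
    """Replace job-parameter references in task parameters with their values."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in job_parameters:
            raise KeyError(f"job parameter not defined: {name}")
        return str(job_parameters[name])

    return {
        key: _JOB_PARAM_REF.sub(substitute, value)
        for key, value in base_parameters.items()
    }


resolved = resolve_base_parameters(
    {"env": "{{job.parameters.env}}", "custom_param": "value"},
    {"env": "dev"},
)
print(resolved)  # {'env': 'dev', 'custom_param': 'value'}
```

Literal values such as `custom_param` pass through unchanged, and a reference to an undefined job parameter fails loudly instead of yielding an empty string.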
Common operations with the Python SDK:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# List jobs
jobs = w.jobs.list()

# Get job details
job = w.jobs.get(job_id=12345)

# Run job now
run = w.jobs.run_now(job_id=12345)

# Run with parameters
run = w.jobs.run_now(
    job_id=12345,
    job_parameters={"env": "prod", "date": "2024-01-15"}
)

# Cancel run
w.jobs.cancel_run(run_id=run.run_id)

# Delete job
w.jobs.delete(job_id=12345)
```
Common operations with the CLI:

```bash
# List jobs
databricks jobs list

# Get job details
databricks jobs get 12345

# Run job
databricks jobs run-now 12345

# Run with parameters
databricks jobs run-now 12345 --job-params '{"env": "prod"}'

# Cancel run
databricks jobs cancel-run 67890

# Delete job
databricks jobs delete 12345
```
Common operations with Asset Bundles:

```bash
# Validate configuration
databricks bundle validate

# Deploy job
databricks bundle deploy

# Run job
databricks bundle run my_job_resource_key

# Deploy to specific target
databricks bundle deploy -t prod

# Destroy resources
databricks bundle destroy
```
Grant job permissions in an Asset Bundle:

```yaml
resources:
  jobs:
    my_job:
      name: 'My Job'
      permissions:
        - level: CAN_VIEW
          group_name: 'data-analysts'
        - level: CAN_MANAGE_RUN
          group_name: 'data-engineers'
        - level: CAN_MANAGE
          user_name: 'user@example.com' # placeholder user
```
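The levels assigned above differ in what they allow, with each level covering more than the previous one. A minimal sketch that maps each level to the actions this section attributes to it. The action names (`view`, `trigger`, `cancel`, `edit`, `delete`) are illustrative labels rather than API values, and `allowed_actions` is a hypothetical helper:

```python
# Actions per level, following the level descriptions in this section.
LEVEL_ACTIONS = {
    "CAN_VIEW": {"view"},
    "CAN_MANAGE_RUN": {"view", "trigger", "cancel"},
    "CAN_MANAGE": {"view", "trigger", "cancel", "edit", "delete"},
}


def allowed_actions(permissions: list[dict], principal: str) -> set[str]:
    """Collect the actions granted to a user or group name."""
    actions: set[str] = set()
    for grant in permissions:
        holder = grant.get("group_name") or grant.get("user_name")
        if holder == principal:
            actions |= LEVEL_ACTIONS[grant["level"]]
    return actions


permissions = [
    {"level": "CAN_VIEW", "group_name": "data-analysts"},
    {"level": "CAN_MANAGE_RUN", "group_name": "data-engineers"},
]

print(sorted(allowed_actions(permissions, "data-engineers")))
# ['cancel', 'trigger', 'view']
```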
Permission levels:

- `CAN_VIEW` - View job and run history
- `CAN_MANAGE_RUN` - View, trigger, and cancel runs
- `CAN_MANAGE` - Full control including edit and delete

| Issue                               | Solution                                                       |
| ----------------------------------- | -------------------------------------------------------------- |
| Job cluster startup slow | Use job clusters with job_cluster_key for reuse across tasks |
| Task dependencies not working | Verify task_key references match exactly in depends_on |
| Schedule not triggering | Check pause_status: UNPAUSED and valid timezone |
| File arrival not detecting | Ensure path has proper permissions and uses cloud storage URL |
| Table update trigger missing events | Verify Unity Catalog table and proper grants |
| Parameter not accessible | Use dbutils.widgets.get() in notebooks |
| "admins" group error | Cannot modify admins permissions on jobs |
| Serverless task fails | Ensure task type supports serverless (notebook, Python) |
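The "task dependencies not working" row above usually comes down to a `depends_on` entry whose `task_key` does not match any task exactly. A minimal sketch of a pre-deployment check using the standard-library `graphlib` module. The function `execution_order` is hypothetical and works on tasks written as plain dicts:

```python
from graphlib import TopologicalSorter


def execution_order(tasks: list[dict]) -> list[str]:
    """Validate depends_on references and return one valid run order.

    Raises ValueError for a reference to an unknown task_key. A dependency
    cycle raises graphlib.CycleError, which is a subclass of ValueError.
    """
    known = {task["task_key"] for task in tasks}
    graph: dict[str, list[str]] = {}
    for task in tasks:
        deps = [dep["task_key"] for dep in task.get("depends_on", [])]
        missing = [dep for dep in deps if dep not in known]
        if missing:
            raise ValueError(
                f"task {task['task_key']!r} depends on unknown task(s): {missing}"
            )
        graph[task["task_key"]] = deps
    # static_order yields each task only after all of its dependencies.
    return list(TopologicalSorter(graph).static_order())


tasks = [
    {"task_key": "load", "depends_on": [{"task_key": "transform"}]},
    {"task_key": "extract"},
    {"task_key": "transform", "depends_on": [{"task_key": "extract"}]},
]

print(execution_order(tasks))  # ['extract', 'transform', 'load']
```

Matching is case-sensitive, so a reference to `Extract` would be reported as unknown even though `extract` exists.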