skills/docker-databricks-lab-ops/SKILL.md
Start and verify the local Docker CDC lab (dvdrental), run the PostgreSQL load generators, reset Databricks tables, trigger Databricks notebook jobs through databricks.sdk, and check whether Bronze/Silver notebooks completed successfully. Use when Codex needs to bring up the repo's local infrastructure, generate CDC traffic, reset or clean Databricks tables, execute Databricks jobs, poll run status, inspect failures, or validate notebook outputs for this lab.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add alexeyban/databricks-lab docker-databricks-lab-ops
Use this skill for the operational loop of this repository: bring up Docker services, generate source-table mutations, reset Databricks tables, execute a Databricks notebook or job, and verify whether the notebook run finished successfully.
Prefer the bundled scripts over rewriting shell commands. They encode the repository-specific paths and the expected sequence.
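- Confirm that docker-compose.yml, postgres-connector.json, and the target notebook paths exist.
- Start the stack with scripts/start_stack.sh.
- The stack is brought up with docker compose up -d from the repository root.
- Verify the services with docker compose ps or service-specific health checks.
- If Databricks must reach the local Kafka broker, run scripts/prepare_ngrok_kafka.py and then start Compose with the discovered KAFKA_EXTERNAL_HOST and KAFKA_EXTERNAL_PORT.
- Register the Debezium connector with scripts/register_connector.sh.
- Generate CDC traffic with scripts/run_generators.sh.
- Use a finite ITERATIONS value; avoid indefinite generators unless the user asked for sustained load.

Example: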
skills/docker-databricks-lab-ops/scripts/run_generators.sh 20 40
This runs 20 film mutations and 40 rental/payment mutations.
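The docker compose ps check from the startup steps can also be scripted. The following is a minimal sketch, not part of the bundled scripts: it assumes Compose v2, whose `ps --format json` output is either a JSON array or one JSON object per line depending on the release, and it assumes the `Service`, `State`, and `Health` keys of that output. The function names are hypothetical.

```python
import json
import subprocess


def parse_compose_ps(output: str) -> list[dict]:
    """Parse `docker compose ps --format json` output.

    Accepts both shapes: a single JSON array, or one JSON object per line.
    """
    text = output.strip()
    if not text:
        return []
    if text.startswith("["):
        return json.loads(text)
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def unhealthy_services(containers: list[dict]) -> list[str]:
    """Return services that are not running or whose health check is not passing."""
    bad = []
    for c in containers:
        running = c.get("State") == "running"
        # Health is empty when the service defines no healthcheck;
        # "starting" is treated as not ready yet.
        healthy = c.get("Health", "") in ("", "healthy")
        if not (running and healthy):
            bad.append(c.get("Service", c.get("Name", "<unknown>")))
    return bad


def check_stack(repo_root: str = ".") -> list[str]:
    """Run `docker compose ps` in the repository root and list problem services."""
    result = subprocess.run(
        ["docker", "compose", "ps", "--all", "--format", "json"],
        cwd=repo_root,
        capture_output=True,
        text=True,
        check=True,
    )
    return unhealthy_services(parse_compose_ps(result.stdout))
```

An empty list from `check_stack()` suggests every service is up; any returned names are candidates for `docker compose logs <service>`.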
- Use scripts/reset_databricks_tables.py to drop all Bronze/Silver/Gold tables and clear streaming checkpoints before a fresh load.
- Pass --cluster-id, or the script will submit via git source.
- Add --dry-run to preview what would be dropped without dropping.

python3 skills/docker-databricks-lab-ops/scripts/reset_databricks_tables.py \
--cluster-id <cluster-id> \
--catalog workspace
# preview only:
python3 skills/docker-databricks-lab-ops/scripts/reset_databricks_tables.py \
--cluster-id <cluster-id> \
--dry-run
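To illustrate what a dry-run mode of this kind typically does, here is a small sketch. It is not the code of reset_databricks_tables.py, which is not shown here. The schema names (`bronze`, `silver`), the table names, and the `execute` callable are all hypothetical, and checkpoint cleanup is left out.

```python
def drop_statements(catalog: str, tables_by_schema: dict[str, list[str]]) -> list[str]:
    """Build one DROP TABLE statement per table, fully qualified by catalog and schema."""
    statements = []
    for schema, tables in tables_by_schema.items():
        for table in tables:
            statements.append(f"DROP TABLE IF EXISTS {catalog}.{schema}.{table}")
    return statements


def reset_tables(catalog, tables_by_schema, execute, dry_run=True):
    """Print the statements in dry-run mode; otherwise pass each one to `execute`.

    `execute` is any callable that runs a single SQL statement
    (for example spark.sql inside a notebook). Hypothetical interface.
    """
    statements = drop_statements(catalog, tables_by_schema)
    for statement in statements:
        if dry_run:
            print(f"[dry-run] {statement}")
        else:
            execute(statement)
    return statements
```

Defaulting `dry_run` to True keeps an accidental call harmless; dropping tables then requires an explicit `dry_run=False`.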
- Run notebooks and jobs with scripts/run_databricks_notebook.py.
- Pass either --job-id to run an existing Databricks job, or --notebook-path and --cluster-id to submit a one-off notebook run.
- Pass --notebook-param KAFKA_BOOTSTRAP=<ngrok-host:port> so the current tunnel endpoint is used at run time instead of a stale static value.
- The script reads DATABRICKS_HOST and DATABRICKS_TOKEN from the environment.

Examples:
python3 skills/docker-databricks-lab-ops/scripts/run_databricks_notebook.py \
--job-id 123 \
--notebook-param KAFKA_BOOTSTRAP=0.tcp.eu.ngrok.io:12345
python3 skills/docker-databricks-lab-ops/scripts/run_databricks_notebook.py \
--notebook-path /Workspace/agent/notebook \
--cluster-id 0123-456abc-cluster
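For orientation, triggering an existing job through databricks.sdk and waiting for it can be sketched as below. This is an assumption about the general shape of such a call, not the contents of run_databricks_notebook.py. The helper names are hypothetical, and the SDK import is deferred so the parameter parser works without the SDK installed.

```python
def parse_notebook_params(pairs: list[str]) -> dict[str, str]:
    """Turn repeated KEY=VALUE arguments into a dict of notebook parameters."""
    params = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        params[key] = value
    return params


def run_job_and_wait(job_id: int, notebook_params: dict[str, str],
                     timeout_minutes: int = 60) -> bool:
    """Trigger a job run, block until it terminates, and report success."""
    # Imported lazily so parse_notebook_params has no SDK dependency.
    from datetime import timedelta

    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service.jobs import RunResultState

    # With no arguments, the client picks up DATABRICKS_HOST and
    # DATABRICKS_TOKEN from the environment.
    client = WorkspaceClient()
    run = client.jobs.run_now(
        job_id=job_id, notebook_params=notebook_params
    ).result(timeout=timedelta(minutes=timeout_minutes))
    print(f"run_id={run.run_id} result_state={run.state.result_state}")
    # A terminated run can still have failed, so check the result state.
    return run.state.result_state == RunResultState.SUCCESS
```

A call such as `run_job_and_wait(123, parse_notebook_params(["KAFKA_BOOTSTRAP=0.tcp.eu.ngrok.io:12345"]))` mirrors the first command-line example above.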
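- Treat the run as successful only when the result state is SUCCESS.
- Keep the run_id so a failed run can be inspected.
- For an end-to-end check, use scripts/smoke_test_notebooks.py.
- Add --reset --cluster-id <id> to drop all tables and checkpoints before the smoke run.

# Full smoke test with table reset: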
python3 skills/docker-databricks-lab-ops/scripts/smoke_test_notebooks.py \
--job-id 123 \
--reset \
--cluster-id <cluster-id>
# Smoke test without reset (append to existing data):
python3 skills/docker-databricks-lab-ops/scripts/smoke_test_notebooks.py \
--job-id 123
- scripts/start_stack.sh: start Docker Compose services for the lab
- scripts/prepare_ngrok_kafka.py: discover or start an ngrok TCP tunnel for Kafka and print the current public bootstrap
- scripts/register_connector.sh: register the Debezium connector from postgres-connector.json
- scripts/run_generators.sh: run film and rental/payment generators
- scripts/reset_databricks_tables.py: drop all Bronze/Silver/Gold Delta tables and clear streaming checkpoints
- scripts/run_databricks_notebook.py: launch or submit a Databricks run and poll to completion
- scripts/smoke_test_notebooks.py: run the end-to-end smoke test with dynamic ngrok bootstrap handling
- scripts/migrate_and_run.py: full migration script that drops legacy tables, updates the Databricks job via API, resets dvdrental tables, starts Docker, the connector, and the generators, and triggers the job end-to-end
- references/repo-workflow.md: repo-specific execution order, assumptions, and parameters