plugins/adf-master/skills/databricks-2025/SKILL.md
ADF + Databricks 2025 integration patterns. PROACTIVELY activate for: (1) Databricks Job activity in ADF, (2) DatabricksJob (preview) vs DatabricksNotebook activity, (3) ServiceNow V2 connector, (4) ADF managed identity authentication for Databricks, (5) Databricks serverless linked services, (6) Snowflake V2 connector, (7) Databricks job parameters and outputs, (8) MFA enforcement and authentication updates, (9) Unity Catalog integration, (10) Delta Live Tables orchestration from ADF. Provides: Databricks linked service templates (PAT, MSI, serverless), DatabricksJob activity examples, parameter passing recipes, and authentication migration guidance.
CRITICAL UPDATE (2025): The Databricks Job activity is now the ONLY recommended method for orchestrating Databricks in ADF. Microsoft strongly recommends migrating from legacy Notebook, Python, and JAR activities.
- Activity type: `DatabricksJob` (NOT `DatabricksSparkJob` or `DatabricksNotebook`)
- Parameters property: `jobParameters` (NOT `parameters`)
- Managed identity authentication (`"authentication": "MSI"`) recommended

| Feature | Notebook Activity (Legacy) | Job Activity (2025) |
|---------|---------------------------|---------------------|
| Compute | Must configure cluster in linked service | Serverless by default |
| Workflow tasks | Single notebook | Multi-task DAGs (notebook, Python, SQL, DLT) |
| Retry | ADF-level only | Job-level + task-level |
| Repair runs | Not supported | Rerun failed tasks only |
| Git integration | Limited | Full Databricks Git support + DABs |
| Lineage | None | Built-in data lineage |
| If/Else logic | Must use ADF control flow | Native If/Else task types |
For complete JSON examples of Job activity, linked service, and pipeline configurations, see references/databricks-job-examples.md.
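One way to audit existing pipelines against the naming rules above is a small script. This is a minimal Python sketch, not an official tool. It scans a pipeline definition for legacy Databricks activity types and for `DatabricksJob` activities that still pass `parameters`. It assumes the usual exported pipeline layout (`properties.activities`, each activity carrying `type` and `typeProperties`). The helper name and the sample pipeline are hypothetical, and activities nested inside control-flow containers are not traversed.

```python
# Legacy ADF activity types that the guidance above says to migrate away from.
LEGACY_TYPES = {"DatabricksNotebook", "DatabricksSparkPython", "DatabricksSparkJar"}


def audit_databricks_activities(pipeline: dict) -> list[str]:
    """Return human-readable findings for one pipeline definition (hypothetical helper)."""
    findings = []
    activities = pipeline.get("properties", {}).get("activities", [])
    for activity in activities:
        name = activity.get("name", "<unnamed>")
        kind = activity.get("type")
        props = activity.get("typeProperties", {})
        if kind in LEGACY_TYPES:
            findings.append(f"{name}: legacy type {kind}, migrate to DatabricksJob")
        elif kind == "DatabricksJob" and "parameters" in props and "jobParameters" not in props:
            findings.append(f"{name}: uses 'parameters', expected 'jobParameters'")
    return findings


# Made-up pipeline: one legacy activity, one misconfigured Job activity, one correct one.
sample_pipeline = {
    "name": "pl_demo",
    "properties": {
        "activities": [
            {"name": "RunNotebook", "type": "DatabricksNotebook", "typeProperties": {}},
            {"name": "RunJob", "type": "DatabricksJob",
             "typeProperties": {"parameters": {"run_date": "2025-01-01"}}},
            {"name": "RunJobOk", "type": "DatabricksJob",
             "typeProperties": {"jobParameters": {"run_date": "2025-01-01"}}},
        ]
    },
}

for finding in audit_databricks_activities(sample_pipeline):
    print(finding)
```

Only the first two sample activities are reported. The third already uses `DatabricksJob` with `jobParameters` and passes the check.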
ServiceNow V1 connector is at End of Support. Migrate to V2 immediately.
| Feature | V1 | V2 |
|---------|----|----|
| Linked service type | ServiceNow | ServiceNowV2 |
| Source type | ServiceNowSource | ServiceNowV2Source |
| Query builder | Custom | Aligns with ServiceNow condition builder |
| Performance | Standard | Enhanced extraction |
| OData support | No | Yes |
Migration steps: Update linked service type to ServiceNowV2, update source type to ServiceNowV2Source, test queries in ServiceNow UI condition builder, adjust timeouts.
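The first two migration steps are mechanical renames, so they can be scripted. The Python sketch below walks a JSON-like definition and swaps the V1 type names for the V2 names from the table above. It is an illustration only: the helper name and sample definitions are hypothetical. It does not rewrite queries or adjust timeouts, and any other properties that differ between V1 and V2 still need manual review.

```python
# V1 -> V2 type names taken from the comparison table above.
TYPE_RENAMES = {"ServiceNow": "ServiceNowV2", "ServiceNowSource": "ServiceNowV2Source"}


def rename_servicenow_types(node):
    """Return a new JSON-like structure with V1 ServiceNow type names replaced."""
    if isinstance(node, dict):
        renamed = {}
        for key, value in node.items():
            if key == "type" and isinstance(value, str):
                # Only string values under a "type" key are candidates for renaming.
                renamed[key] = TYPE_RENAMES.get(value, value)
            else:
                renamed[key] = rename_servicenow_types(value)
        return renamed
    if isinstance(node, list):
        return [rename_servicenow_types(item) for item in node]
    return node  # scalars are returned unchanged


# Made-up V1 definitions for illustration.
linked_service_v1 = {"name": "ls_servicenow",
                     "properties": {"type": "ServiceNow", "typeProperties": {}}}
copy_source_v1 = {"source": {"type": "ServiceNowSource"}}

print(rename_servicenow_types(linked_service_v1)["properties"]["type"])
print(rename_servicenow_types(copy_source_v1)["source"]["type"])
```

The function builds new dictionaries and lists rather than mutating its input, so the original V1 definition stays available for comparison or rollback.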
PostgreSQL connector: improved performance with 2025 SSL enhancements (`enableSsl: true`, `sslMode: "Require"`).
Snowflake V2 connector: improved performance with KeyPair authentication support and Key Vault secret integration.
New managed identity support for Azure Table Storage and Azure Files connectors (system-assigned and user-assigned).
Spark 3.3 now powers Mapping Data Flows with 30% faster processing, Adaptive Query Execution (AQE), dynamic partition pruning, improved caching, and better column statistics.
Git integration now supports on-premises Azure DevOps Server 2022 via the hostName property.
For complete JSON examples of all connectors, see references/connector-examples.md.
| Scenario | Recommendation |
|----------|---------------|
| Single ADF, simple setup | System-assigned |
| Multiple data factories | User-assigned (shared identity) |
| Complex multi-environment | User-assigned |
| Granular permission control | User-assigned |
| Identity lifecycle independence | User-assigned |
Use ADF's centralized Credentials feature to consolidate Microsoft Entra ID-based credentials across multiple linked services.
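To illustrate consolidation, the Python sketch below generates two Blob Storage linked services that reference the same credential object. The property names (`serviceEndpoint`, `credential`, `CredentialReference`) reflect how the Blob Storage connector is generally documented, but treat them as assumptions and check them against references/connector-examples.md. All names and storage accounts are placeholders.

```python
import json


def blob_linked_service_with_credential(name: str, account: str, credential_name: str) -> dict:
    """Build a linked service definition that points at a shared ADF credential.

    Hypothetical helper; property names are assumptions to verify before use.
    """
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobStorage",
            "typeProperties": {
                "serviceEndpoint": f"https://{account}.blob.core.windows.net/",
                "credential": {
                    "referenceName": credential_name,
                    "type": "CredentialReference",
                },
            },
        },
    }


# Two linked services reuse one credential, so the identity is managed in one place.
raw = blob_linked_service_with_credential("ls_blob_raw", "rawstore", "cred_shared_uami")
curated = blob_linked_service_with_credential("ls_blob_curated", "curatedstore", "cred_shared_uami")

print(json.dumps(raw, indent=2))
```

Because both definitions carry only a reference, swapping the underlying identity means updating the credential once rather than editing each linked service.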
Azure MFA is mandatory for all interactive user logins. Impact on ADF:
| Resource | Source Role | Sink Role |
|----------|-----------|-----------|
| Storage Blob | Storage Blob Data Reader | Storage Blob Data Contributor |
| SQL Database | db_datareader | db_datareader + db_datawriter |
| Key Vault | Get secrets only | Get secrets only |
For complete managed identity JSON examples, see references/connector-examples.md.
Use Databricks Job Activity (MANDATORY) -- Stop using Notebook, Python, JAR activities. Define workflows in Databricks workspace with serverless compute.
Managed Identity Authentication (MANDATORY) -- Use managed identities for ALL Azure resources. Leverage Credentials feature for consolidation. MFA-compliant since October 2025.
Monitor Job Execution -- Track Databricks Job run IDs from ADF output, log parameters for auditability, set up alerts for failures, leverage built-in lineage.
Optimize Spark 3.3 (Data Flows) -- Enable AQE, use 4-8 partitions per core, broadcast joins for small dimensions, dynamic partition pruning.
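The "4-8 partitions per core" guideline is simple arithmetic. The Python sketch below turns a total core count into a target partition range. The function name is hypothetical, and the range is only a starting point to tune against actual data volume and skew.

```python
def partition_range(total_cores: int, low: int = 4, high: int = 8) -> tuple[int, int]:
    """Return (min, max) partition counts for the 4-8 partitions-per-core guideline."""
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    return total_cores * low, total_cores * high


# Example: a data flow running on 16 cores in total.
print(partition_range(16))  # (64, 128)
```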
- `references/databricks-job-examples.md` - Complete JSON for Job activity, linked services, pipeline, and Databricks workspace job definition
- `references/connector-examples.md` - Complete JSON for ServiceNow V2, PostgreSQL, Snowflake, Azure Storage MI, Mapping Data Flows, and Azure DevOps Server