skills/spark-authoring-cli/SKILL.md
Develop Microsoft Fabric Spark/data engineering workflows and write code in Fabric Notebook cells with intelligent routing to specialized resources. Provides workspace/lakehouse management, notebook code authoring (PySpark, Scala, SparkR, SQL), and routes to: data engineering patterns, development workflow, or infrastructure orchestration. Use when the user wants to: (1) manage Fabric workspaces and resources, (2) write or debug code in notebook cells, (3) use notebookutils, (4) develop notebooks and PySpark applications, (5) design data pipelines, (6) provision infrastructure as code. Triggers: "develop notebook", "data engineering", "workspace setup", "pipeline design", "infrastructure provisioning", "Delta Lake patterns", "Spark development", "lakehouse configuration", "write notebook code", "notebookutils", "notebook cell", "PySpark notebook", "%%sql cell", "%%configure", "fabric notebook", "run notebook", "notebook deployment".
Update Check: ONCE PER SESSION (mandatory)
The first time this skill is used in a session, run the `check-updates` skill before proceeding.
- GitHub Copilot CLI / VS Code: invoke the `check-updates` skill.
- Claude Code / Cowork / Cursor / Windsurf / Codex: compare local vs remote package.json version.
- Skip if the check was already performed earlier in this session.
CRITICAL NOTES
- To find workspace details (including the ID) from a workspace name: list all workspaces, then filter the result with JMESPath.
- To find item details (including the ID) from a workspace ID, item type, and item name: list all items of that type in that workspace, then filter the result with JMESPath.
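The lookup described in the two notes above can be sketched in Python. This is a minimal sketch, assuming the list responses wrap results in a top-level `value` array whose entries carry `displayName`, `type`, and `id`; the helper names and sample IDs are hypothetical. The first helper only builds the JMESPath expression one might pass to `az rest --query`; the second applies the same filter to an already parsed response.

```python
from typing import Optional


def jmespath_id_by_name(display_name: str, item_type: Optional[str] = None) -> str:
    """Build a JMESPath expression that selects the ID of the first match.

    Assumes the list response wraps results in a top-level `value` array.
    """
    # JMESPath raw string literals use single quotes; escape embedded ones.
    safe_name = display_name.replace("'", "\\'")
    condition = f"displayName=='{safe_name}'"
    if item_type is not None:
        condition += f" && type=='{item_type}'"
    return f"value[?{condition}].id | [0]"


def find_id_by_name(response: dict, display_name: str,
                    item_type: Optional[str] = None) -> Optional[str]:
    """Plain-Python equivalent of the expression above, for a parsed response."""
    for entry in response.get("value", []):
        if entry.get("displayName") != display_name:
            continue
        if item_type is not None and entry.get("type") != item_type:
            continue
        return entry.get("id")
    return None


# Hypothetical sample of a list-workspaces response.
sample = {
    "value": [
        {"id": "11111111-1111-1111-1111-111111111111", "displayName": "DataEng-Dev", "type": "Workspace"},
        {"id": "22222222-2222-2222-2222-222222222222", "displayName": "DataEng-Prod", "type": "Workspace"},
    ]
}

print(jmespath_id_by_name("DataEng-Dev"))
print(find_id_by_name(sample, "DataEng-Dev"))
```

The ` | [0]` suffix stops the projection and picks the first match, so the query yields a single ID rather than a one-element list.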
This skill covers two complementary areas: (1) managing Fabric Spark artifacts via REST APIs (workspaces, lakehouses, notebooks, jobs, pipelines) and (2) writing code inside Fabric Notebook cells (PySpark, Scala, SparkR, SQL with correct lakehouse access, notebookutils, and Spark configuration). For notebook code authoring fundamentals and shared modules, MUST see SPARK-NOTEBOOK-AUTHORING-CORE.md.
| Task | Reference | Notes |
|---|---|---|
| RULES — Read these first, follow them always | SKILL.md § RULES | MUST read — 4 rules for this skill |
| Finding Workspaces and Items in Fabric | COMMON-CLI.md § Finding Workspaces and Items in Fabric | Mandatory — READ link first [needed for finding workspace id by its name or item id by its name, item type, and workspace id] |
| Fabric Topology & Key Concepts | COMMON-CORE.md § Fabric Topology & Key Concepts ||
| Environment URLs | COMMON-CORE.md § Environment URLs ||
| Authentication & Token Acquisition | COMMON-CORE.md § Authentication & Token Acquisition | Wrong audience = 401; read before any auth issue |
| Core Control-Plane REST APIs | COMMON-CORE.md § Core Control-Plane REST APIs ||
| Pagination | COMMON-CORE.md § Pagination ||
| Long-Running Operations (LRO) | COMMON-CORE.md § Long-Running Operations (LRO) ||
| Rate Limiting & Throttling | COMMON-CORE.md § Rate Limiting & Throttling ||
| OneLake Data Access | COMMON-CORE.md § OneLake Data Access | Requires storage.azure.com token, not Fabric token |
| Definition Envelope | ITEM-DEFINITIONS-CORE.md § Definition Envelope | Definition payload structure |
| Per-Item-Type Definitions | ITEM-DEFINITIONS-CORE.md § Per-Item-Type Definitions | Support matrix, decoded content, part paths — REST specs, CLI recipes |
| Job Execution | COMMON-CORE.md § Job Execution ||
| Capacity Management | COMMON-CORE.md § Capacity Management ||
| Gotchas & Troubleshooting | COMMON-CORE.md § Gotchas & Troubleshooting ||
| Best Practices | COMMON-CORE.md § Best Practices ||
| Tool Selection Rationale | COMMON-CLI.md § Tool Selection Rationale ||
| Authentication Recipes | COMMON-CLI.md § Authentication Recipes | az login flows and token acquisition |
| Fabric Control-Plane API via az rest | COMMON-CLI.md § Fabric Control-Plane API via az rest | Always pass --resource https://api.fabric.microsoft.com or az rest fails |
| Pagination Pattern | COMMON-CLI.md § Pagination Pattern ||
| Long-Running Operations (LRO) Pattern | COMMON-CLI.md § Long-Running Operations (LRO) Pattern ||
| OneLake Data Access via curl | COMMON-CLI.md § OneLake Data Access via curl | Use curl not az rest (different token audience) |
| SQL / TDS Data-Plane Access | COMMON-CLI.md § SQL / TDS Data-Plane Access ||
| Job Execution (CLI) | COMMON-CLI.md § Job Execution ||
| Job Scheduling | COMMON-CLI.md § Job Scheduling | URL is /jobs/{jobType}/schedules; endDateTime required |
| OneLake Shortcuts | COMMON-CLI.md § OneLake Shortcuts ||
| Capacity Management (CLI) | COMMON-CLI.md § Capacity Management ||
| Composite Recipes | COMMON-CLI.md § Composite Recipes ||
| Gotchas & Troubleshooting (CLI-Specific) | COMMON-CLI.md § Gotchas & Troubleshooting (CLI-Specific) | az rest audience, shell escaping, token expiry |
| Quick Reference: az rest Template | COMMON-CLI.md § Quick Reference: az rest Template ||
| Quick Reference: Token Audience / CLI Tool Matrix | COMMON-CLI.md § Quick Reference: Token Audience ↔ CLI Tool Matrix | Which --resource + tool for each service |
| Relationship to SPARK-CONSUMPTION-CORE.md | SPARK-AUTHORING-CORE.md § Relationship to SPARK-CONSUMPTION-CORE.md ||
| Data Engineering Authoring Capability Matrix | SPARK-AUTHORING-CORE.md § Data Engineering Authoring Capability Matrix ||
| Lakehouse Management | SPARK-AUTHORING-CORE.md § Lakehouse Management ||
| Notebook Management | SPARK-AUTHORING-CORE.md § Notebook Management ||
| Notebook Execution & Job Management | SPARK-AUTHORING-CORE.md § Notebook Execution & Job Management ||
| CI/CD & Automation Patterns | SPARK-AUTHORING-CORE.md § CI/CD & Automation Patterns ||
| Infrastructure-as-Code | SPARK-AUTHORING-CORE.md § Infrastructure-as-Code ||
| Performance Optimization & Resource Management | SPARK-AUTHORING-CORE.md § Performance Optimization & Resource Management ||
| Authoring Gotchas and Troubleshooting | SPARK-AUTHORING-CORE.md § Authoring Gotchas and Troubleshooting ||
| Quick Reference: Authoring Decision Guide | SPARK-AUTHORING-CORE.md § Quick Reference: Authoring Decision Guide ||
| Recommended Patterns (Data Engineering) | data-engineering-patterns.md § Recommended patterns ||
| Data Ingestion Principles | data-engineering-patterns.md § Data Ingestion Principles ||
| Transformation Patterns | data-engineering-patterns.md § Transformation Patterns ||
| Delta Lake Best Practices | data-engineering-patterns.md § Delta Lake Best Practices ||
| Quality Assurance Strategies | data-engineering-patterns.md § Quality Assurance Strategies ||
| Recommended Patterns (Development Workflow) | development-workflow.md § Recommended patterns ||
| Notebook Lifecycle | development-workflow.md § Notebook Lifecycle ||
| Parameterization Patterns | development-workflow.md § Parameterization Patterns ||
| Variable Library (notebook + pipeline usage) | development-workflow.md § Method 4: Variable Library | getLibrary() + dot notation in notebooks; libraryVariables + @pipeline().libraryVariables in pipelines |
| Variable Library Definition | ITEM-DEFINITIONS-CORE.md § VariableLibrary | Definition parts, decoded content, types, pipeline mappings, gotchas |
| Local Testing Strategy | development-workflow.md § Local Testing Strategy ||
| Debugging Patterns | development-workflow.md § Debugging Patterns ||
| Recommended Patterns (Infrastructure) | infrastructure-orchestration.md § Recommended patterns ||
| Workspace Provisioning Principles | infrastructure-orchestration.md § Workspace Provisioning Principles ||
| Lakehouse Configuration Guidance | infrastructure-orchestration.md § Lakehouse Configuration Guidance ||
| Pipeline Design Patterns | infrastructure-orchestration.md § Pipeline Design Patterns ||
| CI/CD Integration Strategy | infrastructure-orchestration.md § CI/CD Integration Strategy ||
| Notebook API — Which Endpoint to Use | notebook-api-operations.md § Quick Decision | Start here for remote notebook edits — getDefinition vs updateDefinition |
| Notebook Modification Workflow | notebook-api-operations.md § Workflow | Five-step flow: retrieve, decode, modify, encode, upload |
| Notebook API Error Reference | notebook-api-operations.md § Error Reference | 411, 400 (updateMetadata), 401, 403 explained |
| Notebook API Gotchas | notebook-api-operations.md § Gotchas | /result suffix, empty body, \n per-line rule, format=ipynb |
| Default Lakehouse Binding | notebook-api-operations.md § Default Lakehouse Binding | .ipynb metadata vs .py # METADATA block; discover IDs dynamically |
| Public URL Data Ingestion | notebook-api-operations.md § Public URL Data Ingestion | Use real source URL, stage into Files/, then read with Spark |
| getDefinition (read notebook content) | notebook-api-operations.md § Step 1 — Retrieve Notebook Content | LRO flow, ?format=ipynb, empty body (--body '{}') requirement |
| Decode Base64 Notebook Payload | notebook-api-operations.md § Step 2 — Decode the Notebook Content | Extract payload, base64 decode, ipynb JSON structure |
| Modify Notebook Cells | notebook-api-operations.md § Step 3 — Modify the Notebook Content | Find cell, insert/replace lines, \n per-line rule |
| updateDefinition (write notebook content) | notebook-api-operations.md § Step 4 — Re-encode and Upload | Re-encode, upload, LRO poll, updateMetadata flag pitfall |
| Verify Notebook Update (Optional) | notebook-api-operations.md § Step 5 — Verify the Update | Skip unless you suspect a silent failure — Succeeded from updateDefinition is sufficient (see Rule 2) |
| Notebook API End-to-End Script | notebook-api-operations.md § Complete End-to-End Script | Full bash: get → decode → modify → encode → update → verify |
| Quick Start Examples | SKILL.md § Quick Start Examples | Minimal examples for common operations |
| — Notebook Code Authoring (shared modules) — | | |
| Notebook Authoring Core | SPARK-NOTEBOOK-AUTHORING-CORE.md | READ FIRST for notebook code tasks — fundamentals, code gen approach, module index |
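The decode, modify, and re-encode steps of the notebook modification workflow listed in the table above can be sketched in Python. This is a minimal sketch under assumptions: it assumes the `getDefinition` result carries `definition.parts[]` entries with a `path` and a base64 `payload`, that the notebook part's path ends in `.ipynb`, and that the upload body uses `payloadType: InlineBase64`. The helper names, the default part path, and the sample notebook are hypothetical; notebook-api-operations.md remains the reference for the real flow.

```python
import base64
import json


def decode_notebook(definition: dict) -> dict:
    """Return the ipynb JSON from the first part whose path ends in .ipynb."""
    for part in definition["definition"]["parts"]:
        if part["path"].endswith(".ipynb"):
            return json.loads(base64.b64decode(part["payload"]).decode("utf-8"))
    raise ValueError("no .ipynb part found in definition")


def to_source_lines(code: str) -> list:
    """Split code into ipynb source lines, each ending with a newline.

    Without the trailing newline on every line, adjacent lines merge into one.
    """
    return [line + "\n" for line in code.splitlines()]


def append_code_cell(notebook: dict, code: str) -> dict:
    """Append a code cell to the notebook and return the notebook."""
    notebook["cells"].append({
        "cell_type": "code",
        "metadata": {},
        "outputs": [],
        "execution_count": None,
        "source": to_source_lines(code),
    })
    return notebook


def encode_notebook(notebook: dict, path: str = "notebook-content.ipynb") -> dict:
    """Build an updateDefinition-style body (the shape is an assumption)."""
    payload = base64.b64encode(json.dumps(notebook).encode("utf-8")).decode("ascii")
    return {"definition": {"format": "ipynb", "parts": [
        {"path": path, "payload": payload, "payloadType": "InlineBase64"}
    ]}}


# Round trip on a hypothetical, empty notebook.
empty = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
definition = encode_notebook(empty)
notebook = append_code_cell(
    decode_notebook(definition),
    "df = spark.read.table('bronze.events')\ndf.show()",
)
updated = encode_notebook(notebook)
print(decode_notebook(updated)["cells"][0]["source"])
```

Keeping `to_source_lines` as its own helper makes the per-line `\n` rule hard to forget when inserting or replacing lines in an existing cell.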
- [...] `\n` to prevent code merging
- [...] `name`, `driverMemory`, `driverCores`, `executorMemory`, `executorCores`. Do NOT wrap in `{"payload": ...}` or send only `{"kind": "pyspark"}`; that causes HTTP 500. Use valid memory values (`28g`, `56g`, `112g`, `224g`). See the Create Lakehouse Livy Session example below and SPARK-CONSUMPTION-CORE.md.
- [...] `useStarterPool: true`
- [...] `useWorkspacePool: true`
- [...] `creationPayload.enableSchemas: true` for better table organization
- [...] `RunNotebook`) which creates a Notebook Spark session internally. See SPARK-AUTHORING-CORE.md § Notebook Execution & Job Management

Rule 1: Validate prerequisites before operations. Verify that the workspace has a capacity assigned (see COMMON-CORE.md Create Workspace and Capacity Management) and that resource IDs exist before attempting operations.
Rule 2: Trust `updateDefinition` success. A `Succeeded` poll result from `updateDefinition` is sufficient confirmation that content and lakehouse bindings persisted. Do NOT call `getDefinition` after every upload; it is an async LRO that adds significant latency. Only use `getDefinition` for its intended purpose: reading current notebook content before making modifications.
Rule 3: Prevent duplicate jobs and monitor execution properly. Before submitting a new notebook run, ALWAYS check for recent job instances first (last 5 minutes). If a recent job exists, monitor it instead of creating a duplicate. After submission, capture the job instance ID immediately and poll its status; never retry the POST. See SPARK-AUTHORING-CORE.md Job Monitoring for patterns.
Rule 4: For notebook code authoring, MUST follow SPARK-NOTEBOOK-AUTHORING-CORE.md. When writing code inside notebook cells, MUST read SPARK-NOTEBOOK-AUTHORING-CORE.md first. It defines the code generation approach, rules, and a Module Index linking to detailed guides (lakehouse paths, connections, context, orchestration, etc.). Use the Spark-specific resources in this skill (data-engineering-patterns.md, development-workflow.md) for Spark-only implementation details.
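The duplicate check in Rule 3 can be sketched as a small Python helper. This is a sketch under assumptions: it works on job instances already fetched from the job instances list, and assumes each entry carries an `id` and an ISO 8601 `startTimeUtc` with at most microsecond precision. The field names, the helper names, and the sample data are assumptions to verify against SPARK-AUTHORING-CORE.md.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp, treating a trailing 'Z' or no offset as UTC."""
    parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def find_recent_job(instances: list, now: datetime,
                    window_minutes: int = 5) -> Optional[dict]:
    """Return the newest instance started within the window, or None."""
    cutoff = now - timedelta(minutes=window_minutes)
    recent = [
        job for job in instances
        if job.get("startTimeUtc") and parse_utc(job["startTimeUtc"]) >= cutoff
    ]
    return max(recent, key=lambda job: parse_utc(job["startTimeUtc"]), default=None)


# Hypothetical job instances for one notebook.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
instances = [
    {"id": "job-old", "startTimeUtc": "2024-01-01T11:30:00Z"},
    {"id": "job-new", "startTimeUtc": "2024-01-01T11:58:00Z"},
]

existing = find_recent_job(instances, now)
# Monitor the existing run if there is one; otherwise submit exactly once.
print("monitor " + existing["id"] if existing else "submit a new run")
```

Passing `now` in explicitly keeps the helper deterministic; in real use it would be `datetime.now(timezone.utc)`.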
For detailed patterns, authentication, and comprehensive API usage, see:
- [...] `az rest` usage, environment detection, token acquisition

Below are minimal quick-start examples. Always reference the COMMON-* files for production use.
# See COMMON-CORE.md Environment URLs and SPARK-AUTHORING-CORE.md for full patterns
cat > /tmp/body.json << 'EOF'
{"displayName": "DataEng-Dev"}
EOF
workspace_id=$(az rest --method post --resource "https://api.fabric.microsoft.com" \
--url "https://api.fabric.microsoft.com/v1/workspaces" \
--body @/tmp/body.json --query "id" --output tsv)
cat > /tmp/body.json << 'EOF'
{"displayName": "DevLakehouse", "type": "Lakehouse", "creationPayload": {"enableSchemas": true}}
EOF
lakehouse_id=$(az rest --method post --resource "https://api.fabric.microsoft.com" \
--url "https://api.fabric.microsoft.com/v1/workspaces/$workspace_id/items" \
--body @/tmp/body.json --query "id" --output tsv)
# See SPARK-AUTHORING-CORE.md Lakehouse Schema Organization for table organization patterns
# Create schemas for medallion architecture
spark.sql("CREATE SCHEMA IF NOT EXISTS bronze")
spark.sql("CREATE SCHEMA IF NOT EXISTS silver")
spark.sql("CREATE SCHEMA IF NOT EXISTS gold")
# See SPARK-CONSUMPTION-CORE.md for Lakehouse Livy session configuration and management
# IMPORTANT: Body MUST be flat JSON with memory/cores — do NOT wrap in {"payload": ...}
cat > /tmp/body.json << 'EOF'
{"name": "dev-session", "driverMemory": "56g", "driverCores": 8, "executorMemory": "56g", "executorCores": 8, "conf": {"spark.dynamicAllocation.enabled": "true", "spark.fabric.pool.name": "Starter Pool"}}
EOF
az rest --method post --resource "https://api.fabric.microsoft.com" \
--url "https://api.fabric.microsoft.com/v1/workspaces/$workspace_id/lakehouses/$lakehouse_id/livyapi/versions/2023-12-01/sessions" \
--body @/tmp/body.json
Lakehouse Livy Session Body — Common Mistakes
- ❌ `{"payload": {"kind": "pyspark"}}` → HTTP 500 (wrong wrapper, missing required fields)
- ❌ `{"kind": "pyspark"}` → HTTP 500 (missing `driverMemory`, `executorMemory`, etc.)
- ✅ Flat JSON with `name`, `driverMemory`, `driverCores`, `executorMemory`, `executorCores` (and optionally `conf` with Starter Pool)
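The checks above can be expressed as a small pre-flight validator run before the POST. This is a sketch, assuming the required fields and valid memory values are exactly those listed in this skill; the function name is hypothetical, and the service may enforce further constraints that this does not cover.

```python
REQUIRED_FIELDS = ("name", "driverMemory", "driverCores", "executorMemory", "executorCores")
VALID_MEMORY = {"28g", "56g", "112g", "224g"}


def validate_livy_body(body: dict) -> list:
    """Return a list of problems; an empty list means the body looks well-formed."""
    problems = []
    if "payload" in body:
        problems.append('body is wrapped in {"payload": ...}; send flat JSON instead')
    for field in REQUIRED_FIELDS:
        if field not in body:
            problems.append(f"missing required field: {field}")
    for field in ("driverMemory", "executorMemory"):
        if field in body and body[field] not in VALID_MEMORY:
            problems.append(f"{field}={body[field]!r} is not one of {sorted(VALID_MEMORY)}")
    return problems


good = {"name": "dev-session", "driverMemory": "56g", "driverCores": 8,
        "executorMemory": "56g", "executorCores": 8}

print(validate_livy_body(good))
for problem in validate_livy_body({"payload": {"kind": "pyspark"}}):
    print(problem)
```

Returning every problem at once, rather than raising on the first, makes it easier to fix a body in one pass.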
For detailed workload-specific configurations, see data-engineering-patterns.md Delta Lake Best Practices.
Quick reference:
# Write-heavy (Bronze): Disable V-Order, enable autoCompact
# Balanced (Silver): Enable V-Order, adaptive execution
# Read-heavy (Gold): Vectorized reads, optimal parallelism
# See data-engineering-patterns.md for complete config tables
Focus: Essential CLI patterns for Spark/data engineering development and notebook code authoring, with intelligent routing to specialized resources. For comprehensive patterns, always reference COMMON-* files and resource documents.