Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

microsoft/hdinsight-migration

Name: hdinsight-migration
Author: microsoft

skills/hdinsight-migration/SKILL.md

npx skillsauth add microsoft/skills-for-fabric hdinsight-migration

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Update Check — ONCE PER SESSION (mandatory) The first time this skill is used in a session, run the check-updates skill before proceeding.

GitHub Copilot CLI / VS Code: invoke the check-updates skill.

Claude Code / Cowork / Cursor / Windsurf / Codex: compare local vs remote package.json version.

Skip if the check was already performed earlier in this session.

CRITICAL NOTES

To find workspace details (including its ID) from a workspace name: list all workspaces, then use JMESPath filtering

To find item details (including its ID) from workspace ID, item type, and item name: list all items of that type in that workspace, then use JMESPath filtering

HDInsight has no mssparkutils or dbutils equivalent — notebookutils is net-new capability being introduced

HiveContext and SQLContext are legacy Spark 1.x/2.x APIs — Fabric uses Spark 3.x SparkSession exclusively

wasb:// paths are deprecated and require a Storage Account key or SAS — replace with OneLake shortcuts

HDInsight → Microsoft Fabric Migration

Prerequisite Knowledge

Read these companion documents before executing migration tasks:

COMMON-CORE.md — Fabric REST API patterns, authentication, token audiences, item discovery
COMMON-CLI.md — az rest, az login, token acquisition, Fabric REST via CLI
SPARK-AUTHORING-CORE.md — Notebook deployment, lakehouse creation, Spark job execution

For notebook and Lakehouse creation, see spark-authoring-cli. For Fabric Warehouse DDL/DML authoring, see sqldw-authoring-cli.

| Topic | Reference | |---|---| | Migration Workload Map | § Migration Workload Map | | SparkSession & Context API Changes | § SparkSession API Changes | | WASB / ABFS → OneLake Path Migration | path-migration.md | | Hive DDL → Delta Lake / Lakehouse Schemas | hive-to-delta.md | | Oozie → Fabric Pipelines | § Oozie → Fabric Pipelines | | Introducing notebookutils | § Introducing notebookutils | | Before/After Code Patterns | code-patterns.md | | Spark Configuration Differences | § Spark Configuration Differences | | Must / Prefer / Avoid | § Must / Prefer / Avoid | | Authentication & Token Acquisition | COMMON-CORE.md § Authentication | | Lakehouse Management | SPARK-AUTHORING-CORE.md § Lakehouse Management |

Migration Workload Map

| HDInsight Component | Fabric Target | Notes | |---|---|---| | Spark cluster (notebooks, scripts) | Fabric Spark (Lakehouse / Notebooks / SJD) | No persistent cluster — Starter Pool or Custom Pool provides on-demand Spark | | Hive / HiveServer2 | Lakehouse SQL Endpoint + Lakehouse schemas | Delta Lake replaces Hive metastore; schemas provide namespace equivalent | | HBase | Fabric Warehouse or Azure Cosmos DB (separate from Fabric) | HBase has no direct Fabric equivalent — assess workload access patterns | | Oozie workflows | Fabric Data Pipelines | Map Oozie actions to Fabric activities; see § Oozie → Fabric Pipelines | | YARN Resource Manager | Fabric Spark monitoring (Spark UI, Monitoring Hub) | No YARN — Fabric manages compute automatically | | Ambari | Fabric Monitoring Hub + Admin Portal | Cluster health, capacity, and job monitoring | | WASB / ABFS storage | OneLake Shortcuts → abfss://[email protected]/ | See path-migration.md | | Ranger policies | Fabric workspace roles + OneLake data access roles | Map Ranger row/column filters to Lakehouse row-level security | | Livy REST server | Fabric Livy API | Compatible endpoint — see SPARK-AUTHORING-CORE.md |

SparkSession & Context API Changes

HDInsight Spark clusters often use legacy Spark 1.x / 2.x API styles. Replace all of these with the unified SparkSession:

| Legacy HDInsight Pattern | Fabric Spark 3.x Replacement | |---|---| | from pyspark import SparkContext; sc = SparkContext() | Not needed — sc = spark.sparkContext (pre-instantiated) | | from pyspark.sql import HiveContext; hc = HiveContext(sc) | Not needed — spark session has Hive-compatible SQL support via Delta schemas | | from pyspark.sql import SQLContext; sqlc = SQLContext(sc) | Not needed — use spark.sql(...) directly | | SparkSession.builder.enableHiveSupport().getOrCreate() | Not needed in Fabric — spark is pre-built and available | | sc.textFile("wasb://[email protected]/path") | spark.read.text("abfss://[email protected]/lh.Lakehouse/Files/path") | | sqlContext.sql("CREATE TABLE ... STORED AS ORC") | See hive-to-delta.md for Delta DDL equivalent |

In Fabric notebooks, spark (SparkSession) and sc (SparkContext) are pre-instantiated — do not call SparkContext() or SparkSession.builder...getOrCreate() at the top of migrated notebooks.

Oozie → Fabric Pipelines

Map Oozie workflow actions to Fabric Data Pipeline activities:

| Oozie Action Type | Fabric Pipeline Activity | Notes | |---|---|---| | <spark> action | Notebook activity or Spark Job Definition activity | Pass parameters via notebook cell parameters or SJD arguments | | <hive> action | Script activity (SQL) against Lakehouse SQL Endpoint | Convert HiveQL to Spark SQL or Delta SQL | | <shell> action | Azure Function activity or Web activity | Shell scripts must be refactored; no direct shell execution in Fabric Pipelines | | <java> action | Azure Batch activity (external) or refactor to PySpark | Java MapReduce jobs must be rewritten | | <sqoop> action | Copy Data activity (Fabric Data Factory connector) | Sqoop import/export maps to Fabric Copy Data with JDBC source/sink | | <coordinator> (time-based schedule) | Pipeline schedule trigger | Set recurrence in pipeline trigger; supports cron-like expressions | | <coordinator> (data-triggered) | Storage Event trigger | Trigger on OneLake file arrival |

Delegate to spark-authoring-cli for notebook and SJD creation after mapping pipeline activities.

Introducing `notebookutils`

HDInsight Spark had no built-in utility framework equivalent to mssparkutils or dbutils. When migrating to Fabric, introduce notebookutils for common operations:

| Operation | Old HDInsight Approach | notebookutils Equivalent | |---|---|---| | List files | dbutils (N/A) / HDFS CLI | notebookutils.fs.ls("abfss://...") | | Copy file | HDFS API / shutil | notebookutils.fs.cp(src, dest) | | Read secret | Azure Key Vault REST call | notebookutils.credentials.getSecret(keyVaultUrl, secretName) | | Get notebook context | Not available | notebookutils.runtime.context — returns workspace ID, notebook ID, etc. | | Run child notebook | Not available | notebookutils.notebook.run("notebook_name", timeout, {"param": "value"}) | | Exit notebook with value | sys.exit() | notebookutils.notebook.exit("value") | | Mount storage | WASB config in spark-defaults.conf | OneLake Shortcut (no runtime mount needed) |

Spark Configuration Differences

| HDInsight Concept | Fabric Spark Equivalent | Migration Action | |---|---|---| | spark-defaults.conf (cluster-wide) | Fabric Spark Workspace Settings + Environment item | Move config properties to Environment or use %%configure in notebooks | | %%configure magic | %%configure magic — identical | No change needed | | YARN queue / resource allocation | Fabric Spark pool node size and autoscale settings | Map queue SLAs to Custom Pool configuration | | Ambari service configs (HDFS, YARN tuning) | Not applicable — Fabric manages infrastructure | Remove; focus on application-level Spark configs | | HDI Spark version (e.g., Spark 2.4) | Fabric Runtime 1.3 = Spark 3.5 (latest) | Test for deprecated API removals (e.g., HiveContext, RDD-style ML) | | Conda environment / bootstrap.sh | Fabric Environment item with custom libraries | Recreate conda/pip dependencies in a Fabric Environment | | hive-site.xml (metastore connection) | Not needed — Delta Lake IS the metastore in Fabric | Remove metastore config; use Lakehouse schemas for namespace organization |

Must / Prefer / Avoid

MUST DO

Replace all wasb:// / wasbs:// paths with OneLake abfss:// paths or OneLake Shortcuts — wasb:// requires storage account keys which are not the Fabric-preferred auth model
Replace HiveContext, SQLContext, and standalone SparkContext() — use the pre-instantiated spark session in Fabric notebooks
Migrate Hive DDL (STORED AS ORC, LOCATION, TBLPROPERTIES) to Delta Lake DDL — see hive-to-delta.md
Introduce notebookutils for file system operations, secret retrieval, and child notebook orchestration where HDInsight used custom scripts or direct API calls
Replace Oozie XML workflows with Fabric Data Pipelines — see § Oozie → Fabric Pipelines
Align library management to Fabric Environments — remove bootstrap.sh, conda envs, and runtime %pip install patterns for production workloads

PREFER

OneLake Shortcuts over copying data — mount existing ADLS Gen2 containers as shortcuts to avoid re-ingestion during migration
Delta Lake for all tables migrated from Hive ORC/Parquet — ACID guarantees, time travel, and schema enforcement improve data quality
Fabric Starter Pool for initial migration validation — no pool configuration overhead, fast session startup
Lakehouse schemas (database namespaces) for organizing migrated Hive databases — one schema per Hive database within a single Lakehouse
Medallion architecture for restructuring migrated data layers during migration — align Bronze/Silver/Gold with raw Hive → validated Delta → serving Gold patterns

AVOID

Do not use SparkContext() or HiveContext() constructors in Fabric notebooks — they conflict with the pre-instantiated spark session and will raise errors
Do not use hive-site.xml or external Hive metastore configuration — Fabric's Delta Lake-backed Lakehouse IS the metastore
Do not assume YARN queue mappings translate to Fabric pools — re-design resource allocation based on Fabric Spark pool SLAs
Do not attempt to run Oozie shell actions or Java MapReduce jobs directly in Fabric — these must be refactored (see § Oozie → Fabric Pipelines)
Do not use %sh magic for file system operations in production notebooks — use notebookutils.fs.* for portability and OneLake token-based auth

Examples

See code-patterns.md for full before/after examples. Key quick references:

Legacy context → Fabric pre-instantiated session

# HDInsight (remove entirely)
from pyspark.sql import HiveContext
hc = HiveContext(sc)

# Fabric — use pre-instantiated spark directly
df = spark.sql("SELECT * FROM sales.fact_orders")

WASB path → OneLake path (after shortcut creation)

# HDInsight
df = spark.read.parquet("wasb://[email protected]/orders/")

# Fabric
df = spark.read.parquet("Files/raw/orders/")

Hive DDL → Delta DDL

-- HDInsight
CREATE TABLE sales_db.fact_orders (...) STORED AS ORC LOCATION 'wasb://...';

-- Fabric
CREATE SCHEMA IF NOT EXISTS sales_db;
CREATE TABLE sales_db.fact_orders (...) USING DELTA;

microsoft/hdinsight-migration

skills/hdinsight-migration/SKILL.md

Port Azure HDInsight Spark clusters and Hive workloads to Microsoft Fabric. Removes legacy HiveContext and standalone SparkContext constructors, replacing them with the pre-instantiated SparkSession. Converts WASB and ABFS storage paths to OneLake abfss URLs via Shortcuts. Transforms Hive DDL (STORED AS ORC, external tables) to Delta Lake schemas inside Fabric Lakehouse. Maps Oozie workflow actions — spark, hive, shell, sqoop, coordinator — to Fabric Pipeline activities and schedule triggers. Introduces notebookutils for file and credential operations previously handled via subprocess or HDFS client calls. Use when the user wants to: (1) retire an HDInsight cluster and move to Fabric, (2) convert WASB paths or Hive DDL, (3) replace Oozie coordinators with Fabric Pipelines. Triggers: "migrate from hdinsight", "hdi to fabric", "hivecontext sparksession fabric", "wasb to onelake", "hive ddl to delta", "oozie to fabric pipelines", "hive metastore lakehouse", "hdinsight spark migration".

338 stars

tools

Updated May 11, 2026

$ install --global

skillsauth

npx skillsauth add microsoft/skills-for-fabric hdinsight-migration

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: May 11, 2026, 6:07 AM103.0s4 files scanned

SKILL.md

name:: hdinsight-migration
description:: >
Triggers:: migrate from hdinsight", "hdi to fabric", "hivecontext sparksession fabric",

Update Check — ONCE PER SESSION (mandatory) The first time this skill is used in a session, run the check-updates skill before proceeding.

GitHub Copilot CLI / VS Code: invoke the check-updates skill.

Claude Code / Cowork / Cursor / Windsurf / Codex: compare local vs remote package.json version.

Skip if the check was already performed earlier in this session.

CRITICAL NOTES

To find workspace details (including its ID) from a workspace name: list all workspaces, then use JMESPath filtering

To find item details (including its ID) from workspace ID, item type, and item name: list all items of that type in that workspace, then use JMESPath filtering

HDInsight has no mssparkutils or dbutils equivalent — notebookutils is net-new capability being introduced

HiveContext and SQLContext are legacy Spark 1.x/2.x APIs — Fabric uses Spark 3.x SparkSession exclusively

wasb:// paths are deprecated and require a Storage Account key or SAS — replace with OneLake shortcuts

HDInsight → Microsoft Fabric Migration

Prerequisite Knowledge

Read these companion documents before executing migration tasks:

COMMON-CORE.md — Fabric REST API patterns, authentication, token audiences, item discovery
COMMON-CLI.md — az rest, az login, token acquisition, Fabric REST via CLI
SPARK-AUTHORING-CORE.md — Notebook deployment, lakehouse creation, Spark job execution

For notebook and Lakehouse creation, see spark-authoring-cli. For Fabric Warehouse DDL/DML authoring, see sqldw-authoring-cli.

Migration Workload Map

SparkSession & Context API Changes

HDInsight Spark clusters often use legacy Spark 1.x / 2.x API styles. Replace all of these with the unified SparkSession:

In Fabric notebooks, spark (SparkSession) and sc (SparkContext) are pre-instantiated — do not call SparkContext() or SparkSession.builder...getOrCreate() at the top of migrated notebooks.

Oozie → Fabric Pipelines

Map Oozie workflow actions to Fabric Data Pipeline activities:

Delegate to spark-authoring-cli for notebook and SJD creation after mapping pipeline activities.

Introducing `notebookutils`

HDInsight Spark had no built-in utility framework equivalent to mssparkutils or dbutils. When migrating to Fabric, introduce notebookutils for common operations:

Spark Configuration Differences

Must / Prefer / Avoid

MUST DO

Replace all wasb:// / wasbs:// paths with OneLake abfss:// paths or OneLake Shortcuts — wasb:// requires storage account keys which are not the Fabric-preferred auth model
Replace HiveContext, SQLContext, and standalone SparkContext() — use the pre-instantiated spark session in Fabric notebooks
Migrate Hive DDL (STORED AS ORC, LOCATION, TBLPROPERTIES) to Delta Lake DDL — see hive-to-delta.md
Introduce notebookutils for file system operations, secret retrieval, and child notebook orchestration where HDInsight used custom scripts or direct API calls
Replace Oozie XML workflows with Fabric Data Pipelines — see § Oozie → Fabric Pipelines
Align library management to Fabric Environments — remove bootstrap.sh, conda envs, and runtime %pip install patterns for production workloads

PREFER

OneLake Shortcuts over copying data — mount existing ADLS Gen2 containers as shortcuts to avoid re-ingestion during migration
Delta Lake for all tables migrated from Hive ORC/Parquet — ACID guarantees, time travel, and schema enforcement improve data quality
Fabric Starter Pool for initial migration validation — no pool configuration overhead, fast session startup
Lakehouse schemas (database namespaces) for organizing migrated Hive databases — one schema per Hive database within a single Lakehouse
Medallion architecture for restructuring migrated data layers during migration — align Bronze/Silver/Gold with raw Hive → validated Delta → serving Gold patterns

AVOID

Do not use SparkContext() or HiveContext() constructors in Fabric notebooks — they conflict with the pre-instantiated spark session and will raise errors
Do not use hive-site.xml or external Hive metastore configuration — Fabric's Delta Lake-backed Lakehouse IS the metastore
Do not assume YARN queue mappings translate to Fabric pools — re-design resource allocation based on Fabric Spark pool SLAs
Do not attempt to run Oozie shell actions or Java MapReduce jobs directly in Fabric — these must be refactored (see § Oozie → Fabric Pipelines)
Do not use %sh magic for file system operations in production notebooks — use notebookutils.fs.* for portability and OneLake token-based auth

Examples

See code-patterns.md for full before/after examples. Key quick references:

Legacy context → Fabric pre-instantiated session

# HDInsight (remove entirely)
from pyspark.sql import HiveContext
hc = HiveContext(sc)

# Fabric — use pre-instantiated spark directly
df = spark.sql("SELECT * FROM sales.fact_orders")

WASB path → OneLake path (after shortcut creation)

# HDInsight
df = spark.read.parquet("wasb://[email protected]/orders/")

# Fabric
df = spark.read.parquet("Files/raw/orders/")

Hive DDL → Delta DDL

-- HDInsight
CREATE TABLE sales_db.fact_orders (...) STORED AS ORC LOCATION 'wasb://...';

-- Fabric
CREATE SCHEMA IF NOT EXISTS sales_db;
CREATE TABLE sales_db.fact_orders (...) USING DELTA;

Related Skills

microsoft/azmon-mirroredcatalogs-operations-cli

tools

VerifiedTrustedCommunity

Onboard Log Analytics, Application Insights, and Azure Monitor telemetry into Microsoft Fabric as a Mirrored Catalog, then turn it into business-impact insights by correlating telemetry with business data via Eventhouse shortcuts, verified schemas, and ready-to-use Operations Agent instructions. Triggers: onboard Log Analytics telemetry, connect Application Insights to a Mirrored Catalog, correlate App Insights telemetry with business data, build an Operations Agent for business-impact alerting, determine if availability or latency impacted bookings orders or revenue.

877SKILL.mdUpdated Jul 26, 2026

microsoft/azmon-mirroredcatalogs-operations-cli

microsoft/azmon-mirroredcatalogs-operations-cli

tools

VerifiedTrustedCommunity

877SKILL.mdUpdated Jul 26, 2026

microsoft/azmon-mirroredcatalogs-operations-cli

microsoft/azmon-mirroredcatalogs-operations-cli

tools

VerifiedTrustedCommunity

877SKILL.mdUpdated Jul 26, 2026

microsoft/azmon-mirroredcatalogs-operations-cli

microsoft/mlv-operations-cli

tools

VerifiedTrustedCommunity

Manage refresh schedules and job execution for an EXISTING Microsoft Fabric Materialized Lake View (MLV) via REST APIs: create, update, and delete refresh schedules (interval-based: hourly, daily, weekly), trigger on-demand refreshes, monitor job status, and cancel running jobs. Uses human-in-the-loop confirmations for safety. This skill does NOT author or create the MLV definition: writing the CREATE MATERIALIZED LAKE VIEW / CREATE OR REPLACE SQL is `spark-authoring-cli`, not this skill. Note: MLV discovery (list MLVs, lineage, data quality) requires UI as REST APIs are not yet available. Triggers: "schedule MLV refresh", "manage MLV refresh", "MLV refresh schedule", "schedule materialized lake view refresh", "automate MLV refresh", "trigger MLV refresh", "monitor MLV refresh", "MLV job status", "cancel MLV refresh", "refresh schedule", "MLV automation", "refresh my materialized views"

877SKILL.mdUpdated Jun 26, 2026

microsoft/mlv-operations-cli

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/microsoft/skills-for-fabric.git

# Copy into Claude Code skills folder (global)
cp -r skills-for-fabric/skills/hdinsight-migration ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

microsoft/skills-for-fabric

338 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT

Adoption

microsoft/hdinsight-migration

$ install --global

Security Scan Results

SKILL.md

HDInsight → Microsoft Fabric Migration

Prerequisite Knowledge

Table of Contents

Migration Workload Map

SparkSession & Context API Changes

Oozie → Fabric Pipelines

Introducing notebookutils

Spark Configuration Differences

Must / Prefer / Avoid

MUST DO

PREFER

AVOID

Examples

Related Skills

microsoft/azmon-mirroredcatalogs-operations-cli

microsoft/azmon-mirroredcatalogs-operations-cli

microsoft/azmon-mirroredcatalogs-operations-cli

microsoft/mlv-operations-cli

microsoft/hdinsight-migration

$ install --global

Security Scan Results

SKILL.md

HDInsight → Microsoft Fabric Migration

Prerequisite Knowledge

Table of Contents

Migration Workload Map

SparkSession & Context API Changes

Oozie → Fabric Pipelines

Introducing notebookutils

Spark Configuration Differences

Must / Prefer / Avoid

MUST DO

PREFER

AVOID

Examples

Related Skills

microsoft/azmon-mirroredcatalogs-operations-cli

microsoft/azmon-mirroredcatalogs-operations-cli

microsoft/azmon-mirroredcatalogs-operations-cli

microsoft/mlv-operations-cli

Introducing `notebookutils`

Introducing `notebookutils`