Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

openhands/spark-version-upgrade

Name: spark-version-upgrade
Author: openhands

skills/spark-version-upgrade/SKILL.md

npx skillsauth add openhands/extensions spark-version-upgrade

Upgrade Apache Spark applications between major versions with a structured, phase-by-phase workflow.

When to Use

Migrating from Spark 2.x → 3.x or Spark 3.x → 4.x
Updating PySpark, Spark SQL, or Structured Streaming applications
Resolving deprecation warnings before a Spark version bump

Workflow Overview

1. Inventory & Impact Analysis: scan the codebase and assess scope
2. Build File Updates: bump Spark/Scala/Java dependencies
3. API Migration: replace deprecated and removed APIs
4. Configuration Migration: update Spark config properties
5. SQL & DataFrame Migration: fix query-level breaking changes
6. Test Validation: compile, run tests, verify results

Phase 1: Inventory & Impact Analysis

Before changing any code, assess what needs to change. Read the official Apache Spark migration guide for the target version — it documents every API removal, config rename, and behavioral change per release: https://spark.apache.org/docs/latest/migration-guide.html

Checklist

[ ] Read the migration guide section for the target Spark version
[ ] Identify current Spark version (check pom.xml, build.sbt, build.gradle, or requirements.txt)
[ ] Identify target Spark version
[ ] Find Spark API usage to check against the migration guide: grep -rnE 'import org\.apache\.spark|from pyspark|import pyspark' --include='*.scala' --include='*.java' --include='*.py' .
[ ] List all Spark config properties: grep -rn 'spark\.' --include='*.conf' --include='*.properties' --include='*.scala' --include='*.java' --include='*.py' . | grep -v 'test'
[ ] On Windows PowerShell, use Get-ChildItem -Recurse -Include *.scala,*.java,*.py | Select-String -Pattern 'import org\.apache\.spark|from pyspark|import pyspark' and adjust the extensions/pattern for config searches.
[ ] Check for custom SparkSession or SparkContext extensions
[ ] Identify connector dependencies (Hive, Kafka, Cassandra, Delta, Iceberg)
[ ] Document findings in spark_upgrade_impact.md

Output

spark_upgrade_impact.md   # Summary of affected files, APIs, and configs
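The inventory searches above can also be scripted so the findings are easy to paste into spark_upgrade_impact.md. The following is a minimal sketch, not part of the skill itself: the function name `scan_tree` and both regular expressions are illustrative assumptions, and the config search is narrowed to quoted keys in source files and leading keys in `.conf`/`.properties` files.

```python
import re
from pathlib import Path

# File extensions taken from the grep commands in the checklist.
SOURCE_SUFFIXES = {".scala", ".java", ".py"}
CONFIG_SUFFIXES = {".conf", ".properties"}
ALL_SUFFIXES = SOURCE_SUFFIXES | CONFIG_SUFFIXES

# Scala/Java use `import org.apache.spark...`; PySpark uses `import pyspark` / `from pyspark`.
IMPORT_RE = re.compile(r"^\s*(import\s+org\.apache\.spark|from\s+pyspark|import\s+pyspark)")
# In source files, only count keys that appear as quoted string literals.
SOURCE_KEY_RE = re.compile(r"""["'](spark\.[A-Za-z][\w.]*)["']""")
# In config files, the key is the first token on the line.
CONF_KEY_RE = re.compile(r"^\s*(spark\.[A-Za-z][\w.]*)")


def scan_tree(root):
    """Return (import_hits, config_keys) for the files under `root`.

    import_hits: list of (relative path, line number, stripped line)
    config_keys: dict of {config key: sorted list of relative paths}
    """
    root = Path(root)
    import_hits = []
    config_keys = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in ALL_SUFFIXES:
            continue
        rel = path.relative_to(root).as_posix()
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if path.suffix in SOURCE_SUFFIXES:
                if IMPORT_RE.match(line):
                    import_hits.append((rel, lineno, line.strip()))
                keys = SOURCE_KEY_RE.findall(line)
            else:
                match = CONF_KEY_RE.match(line)
                keys = [match.group(1)] if match else []
            for key in keys:
                config_keys.setdefault(key, set()).add(rel)
    return import_hits, {k: sorted(v) for k, v in sorted(config_keys.items())}
```

Calling `scan_tree(".")` from the project root returns both lists; keys assembled at runtime (string concatenation, constants) will not be found this way and still need a manual pass.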

Phase 2: Build File Updates

Update dependency versions, then resolve dependency conflicts before attempting a full compile.

Maven (`pom.xml`)

<!-- Update Spark version property -->
<spark.version>3.5.1</spark.version>    <!-- or 4.0.0 -->
<scala.version>2.13.12</scala.version>  <!-- Spark 3.x: 2.12/2.13; Spark 4.x: 2.13 -->

<!-- Update artifact IDs if Scala cross-version changed -->
<artifactId>spark-core_2.13</artifactId>
<artifactId>spark-sql_2.13</artifactId>

SBT (`build.sbt`)

val sparkVersion = "3.5.1" // or "4.0.0"
scalaVersion := "2.13.12"

libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVersion
libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion

Gradle (`build.gradle`)

ext {
    sparkVersion = '3.5.1' // or '4.0.0'
}
dependencies {
    implementation "org.apache.spark:spark-core_2.13:${sparkVersion}"
    implementation "org.apache.spark:spark-sql_2.13:${sparkVersion}"
}

PySpark (`requirements.txt` / `pyproject.toml`)

pyspark==3.5.1   # or 4.0.0
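For projects with several requirements files, the pin can be rewritten programmatically. This is a small sketch under stated assumptions: `bump_pyspark_pin` is a hypothetical helper name, it only handles exact `==` pins in requirements.txt syntax, and it leaves `pyproject.toml` untouched.

```python
import re

# Matches a line such as "pyspark==3.4.1" or "pyspark[sql]==3.4.1  # comment".
# Packages that merely start with "pyspark" (e.g. "pyspark-stubs") do not match.
PIN_RE = re.compile(r"^(\s*pyspark(?:\[[^\]]*\])?\s*==\s*)([^\s#;]+)(.*)$", re.IGNORECASE)


def bump_pyspark_pin(requirements_text, target_version):
    """Return the requirements text with every exact pyspark pin set to target_version."""
    out = []
    for line in requirements_text.splitlines():
        match = PIN_RE.match(line)
        if match:
            # Keep the original prefix (name, extras, "==") and any trailing comment.
            line = f"{match.group(1)}{target_version}{match.group(3)}"
        out.append(line)
    trailing_newline = "\n" if requirements_text.endswith("\n") else ""
    return "\n".join(out) + trailing_newline
```

Range specifiers such as `pyspark>=3.3,<3.6` are deliberately left alone, since choosing a new range is a judgment call rather than a mechanical edit.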

Checklist

[ ] Update Spark version in build file
[ ] Update Scala version if crossing 2.12→2.13 boundary
[ ] Update Java source/target level if required (Spark 4.x requires Java 17+)
[ ] Update connector library versions to match new Spark version
[ ] Resolve dependency conflicts (mvn dependency:tree / sbt dependencyTree)
[ ] Confirm project compiles (errors at this stage are expected — they guide Phase 3)
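The version constraints mentioned in this phase can be checked before touching the build. The sketch below encodes only what this guide states (Spark 3.x with Scala 2.12 or 2.13, Spark 4.x with Scala 2.13 and Java 17+); the table and the names `check_build` and `scala_binary` are illustrative assumptions, and individual minor releases may be stricter.

```python
# Constraints as stated in this guide; extend the tables for your own targets.
SCALA_BY_SPARK_MAJOR = {3: {"2.12", "2.13"}, 4: {"2.13"}}
MIN_JAVA_BY_SPARK_MAJOR = {4: 17}  # only the Spark 4.x requirement is stated here


def scala_binary(scala_version):
    """Reduce a full Scala version to its binary version, e.g. '2.13.12' to '2.13'.

    This is the suffix used in artifact IDs such as spark-core_2.13.
    """
    return ".".join(scala_version.split(".")[:2])


def check_build(spark_version, scala_version, java_major):
    """Return a list of problems; an empty list means no conflict known to this table."""
    problems = []
    spark_major = int(spark_version.split(".")[0])

    allowed_scala = SCALA_BY_SPARK_MAJOR.get(spark_major)
    if allowed_scala is not None and scala_binary(scala_version) not in allowed_scala:
        problems.append(
            f"Scala {scala_binary(scala_version)} is not listed for Spark {spark_major}.x "
            f"(expected one of {sorted(allowed_scala)})"
        )

    min_java = MIN_JAVA_BY_SPARK_MAJOR.get(spark_major)
    if min_java is not None and java_major < min_java:
        problems.append(f"Spark {spark_major}.x requires Java {min_java}+, found Java {java_major}")

    return problems
```

For example, `check_build("4.0.0", "2.12.18", 11)` reports both a Scala and a Java problem, while `check_build("3.5.1", "2.13.12", 11)` reports none.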

Phase 3: API Migration

Replace removed and deprecated APIs. Work through compiler errors systematically.

Common Patterns

Consult the official Apache Spark migration guide for the complete list of changes for each version: https://spark.apache.org/docs/latest/migration-guide.html

SparkSession Creation (2.x → 3.x)

// BEFORE (Spark 1.x/2.x)
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// AFTER (Spark 2.x+/3.x)
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config(conf)
  .enableHiveSupport() // if needed
  .getOrCreate()
val sc = spark.sparkContext // reuse the session's context instead of creating a second one

RDD to DataFrame (2.x → 3.x)

// BEFORE
rdd.toDF()  // implicit from SQLContext

// AFTER
import spark.implicits._
rdd.toDF()  // implicit from SparkSession

Accumulator API (2.x → 3.x)

// BEFORE
val acc = sc.accumulator(0)

// AFTER
val acc = sc.longAccumulator("name")

Checklist

[ ] Replace SQLContext / HiveContext with SparkSession
[ ] Replace deprecated Accumulator with AccumulatorV2
[ ] Update DataFrame → Dataset[Row] where needed
[ ] Replace removed RDD.mapPartitionsWithContext with mapPartitions
[ ] Fix SparkConf deprecated setters
[ ] Update custom UserDefinedFunction registration
[ ] Migrate Experimental / DeveloperApi usages that were removed
[ ] Verify all compilation errors from Phase 2 are resolved
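Compiler errors catch removed APIs in Scala and Java, but a quick text scan helps size the work before the first build. The following is a rough sketch: the pattern table covers only the calls named in this phase, the helper name `find_legacy_calls` is hypothetical, and the patterns target Scala/Java syntax (they will not match PySpark code).

```python
import re

# Patterns drawn from the examples and checklist in this phase; not an exhaustive list.
LEGACY_PATTERNS = {
    "SQLContext/HiveContext construction": re.compile(r"\bnew\s+(SQLContext|HiveContext)\s*\("),
    "legacy accumulator": re.compile(r"\.accumulator\s*\("),
    "mapPartitionsWithContext": re.compile(r"\.mapPartitionsWithContext\b"),
}


def find_legacy_calls(source):
    """Return (line number, label, stripped line) for each legacy pattern found in `source`."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in LEGACY_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label, line.strip()))
    return findings
```

Because the accumulator pattern requires a dot directly before `accumulator(`, already-migrated calls such as `sc.longAccumulator("name")` are not flagged.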

Phase 4: Configuration Migration

Spark renames and removes configuration properties between versions. The official migration guide documents every renamed and removed property per release: https://spark.apache.org/docs/latest/migration-guide.html

Checklist

[ ] Rename deprecated config keys (e.g., spark.shuffle.file.buffer.kb → spark.shuffle.file.buffer)
[ ] Update removed configs to their replacements
[ ] Review spark-defaults.conf, application code, and submit scripts
[ ] Check for hardcoded config values in test fixtures
[ ] Verify SparkSession.builder().config(...) calls use current property names
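Key renames in `spark-defaults.conf` style files are mechanical enough to script. This sketch assumes a hypothetical helper, `migrate_conf`, and a rename table that contains only the pair named in the checklist; further pairs would come from the migration guide. Values are left untouched, so confirm the unit the new key expects.

```python
import re

# Only the rename named in the checklist; add further pairs from the migration guide.
RENAMED_KEYS = {"spark.shuffle.file.buffer.kb": "spark.shuffle.file.buffer"}

# A config line starts with optional whitespace, then the key, then the rest of the line.
KEY_LINE_RE = re.compile(r"^(\s*)(spark\.[\w.]+)(.*)$")


def migrate_conf(text, renamed=RENAMED_KEYS):
    """Rename deprecated keys in spark-defaults.conf style text.

    Returns (new_text, changes), where changes lists (line number, old key, new key).
    Comment lines are not modified because they do not start with a key.
    """
    changes = []
    out = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = KEY_LINE_RE.match(line)
        if match and match.group(2) in renamed:
            old_key = match.group(2)
            new_key = renamed[old_key]
            changes.append((lineno, old_key, new_key))
            line = f"{match.group(1)}{new_key}{match.group(3)}"
        out.append(line)
    return "\n".join(out), changes
```

The returned `changes` list doubles as an audit trail for the impact document; keys set in application code or submit scripts still need the separate review listed above.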

Phase 5: SQL & DataFrame Migration

Spark SQL behavior changes between versions can silently alter query results.

Key Breaking Changes (2.x → 3.x)

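CAST only fails on overflow or invalid input when spark.sql.ansi.enabled=true (off by default in 3.x); table inserts, however, follow ANSI store assignment by default in 3.0 (spark.sql.storeAssignmentPolicy), so unsafe conversions such as string to int are rejected
SELECT without a FROM clause (for example, SELECT 1) remains valid in 3.x and needs no rewrite
Column resolution order changed in subqueries
spark.sql.legacy.timeParserPolicy controls date/time parsing behavior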

Key Breaking Changes (3.x → 4.x)

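ANSI mode is on by default (spark.sql.ansi.enabled=true)
Stricter type coercion in comparisons under ANSI mode
spark.sql.legacy.* flags are not guaranteed to survive a major release; confirm that each flag you rely on still exists in the target version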

Checklist

[ ] Audit SQL strings and DataFrame expressions for changed behavior
[ ] Add explicit CAST where implicit coercion relied on legacy behavior
[ ] Update date/time format patterns to match new parser
[ ] Test SQL queries with representative data and compare output to pre-upgrade baseline
[ ] Set spark.sql.legacy.* flags temporarily if needed for phased migration
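Comparing query output against a pre-upgrade baseline is easier when row order is ignored, since Spark does not guarantee ordering without an explicit ORDER BY. The sketch below assumes the rows have already been collected into plain Python sequences (for example from exported CSV files); the helper name `diff_rows` is illustrative.

```python
from collections import Counter


def diff_rows(baseline, upgraded):
    """Compare two result sets as multisets so that row order does not matter.

    Returns (missing, unexpected): rows found only in the baseline, and rows
    found only in the upgraded output. Duplicate rows are counted, not collapsed.
    """
    before = Counter(map(tuple, baseline))
    after = Counter(map(tuple, upgraded))
    # Counter subtraction keeps only positive counts, i.e. the surplus on each side.
    missing = sorted((before - after).elements(), key=repr)
    unexpected = sorted((after - before).elements(), key=repr)
    return missing, unexpected
```

A changed value shows up as one row in each list, which makes silent behavior changes such as a shifted date or a NULL replaced by an error value easy to spot.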

Phase 6: Test Validation

Checklist

[ ] All code compiles without errors
[ ] All existing unit tests pass
[ ] All existing integration tests pass
[ ] Run Spark jobs locally with sample data and compare output to pre-upgrade baseline
[ ] No deprecation warnings remain (or are documented with a migration timeline)
[ ] Update CI/CD pipeline to use new Spark version
[ ] Document any spark.sql.legacy.* flags that are set temporarily
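The last checklist item can be backed by a simple listing of every spark.sql.legacy.* key that appears in code or config. This is a minimal sketch: `list_legacy_flags` is a hypothetical helper that takes file contents as a dict, so reading the files from disk is left to the caller.

```python
import re

# A concrete flag name: the prefix followed by at least one name character.
# The literal wildcard "spark.sql.legacy.*" used in prose does not match.
LEGACY_FLAG_RE = re.compile(r"spark\.sql\.legacy\.[A-Za-z0-9_.]*[A-Za-z0-9_]")


def list_legacy_flags(files):
    """Map each spark.sql.legacy.* key to the sorted names of the files that mention it.

    `files` is a dict of {file name: file contents}.
    """
    found = {}
    for name, text in files.items():
        for flag in LEGACY_FLAG_RE.findall(text):
            found.setdefault(flag, set()).add(name)
    return {flag: sorted(names) for flag, names in sorted(found.items())}
```

The resulting mapping can be copied into spark_upgrade_impact.md alongside the planned removal date for each flag.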

Done When

✓ Project compiles against target Spark version
✓ All tests pass
✓ No removed APIs remain in code
✓ Configuration properties are current
✓ SQL queries produce correct results
✓ Upgrade impact documented in spark_upgrade_impact.md

Upgrade Apache Spark applications between major versions (2.x→3.x, 3.x→4.x). Covers build files, deprecated APIs, configuration changes, SQL/DataFrame updates, and test validation.

118 stars

development

Updated Jul 2, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add openhands/extensions spark-version-upgrade

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: Container and dependency vulnerability scanner. Clean (95%)
Semgrep: Static code analysis for vulnerabilities. Clean (95%)
mcp-scan (Snyk): Model Context Protocol security validation. Clean (95%)
Snyk (dep): Open source security scanning. Skipped (50%)
Socket.dev: Supply chain security analysis. Skipped (50%)
VirusTotal: Multi-engine malware detection. Skipped (50%)
CrowdStrike: Advanced threat intelligence. Skipped (50%)
OSV-Scanner: Open Source Vulnerability database check. Skipped (50%)
OWASP Dep-Check: Skipped (50%)

Last scanned: Jul 2, 2026, 6:31 AM (149.8s, 2 files scanned)

SKILL.md

name: spark-version-upgrade
description: Upgrade Apache Spark applications between major versions (2.x→3.x, 3.x→4.x). Covers build files, deprecated APIs, configuration changes, SQL/DataFrame updates, and test validation.
license: MIT
compatibility: Requires Java 8+/11+/17+, Scala 2.12/2.13, Maven/Gradle/SBT, Apache Spark

Related Skills

openhands/iterate (tools)
Verified, Trusted, Community
Iterate on a GitHub pull request: drive it through CI, code review, and QA until it is merge-ready. Poll verification layers with `gh` CLI, diagnose and fix CI failures, address review feedback, retry flaky checks, push fixes, and repeat. The agent is the orchestration loop.
125, SKILL.md, Updated Apr 18, 2026

openhands/technical-writing (development)
Verified, Trusted, Community
Guides technical explanations toward flowing, direct, conversational prose. This skill should be used for engineering chat, design discussion, architecture analysis, code-review explanations, and technical recommendations that should be concise without becoming fragmented or vague.
124, SKILL.md, Updated Jul 13, 2026

openhands/jira-issue-to-pr (tools)
Verified, Trusted, Community
This skill should be used when the user asks to "set up a Jira automation to create pull requests", "poll Jira for create-pr issues", "automatically create GitHub PRs from Jira tickets", "deploy a Jira issue-to-PR automation", "create a Jira to GitHub PR workflow", or mentions automating GitHub PR creation from a Jira label. Deploys a cron-based OpenHands automation that watches a Jira Cloud project for issues labeled with a configurable label (default: "create-pr") and spawns an agent conversation to create a GitHub pull request for each new issue found. The target GitHub repository is read from the body of the Jira ticket - no repo parameter is required at deploy time.
123, SKILL.md, Updated Jul 7, 2026

openhands/openhands-automation (tools)
Verified, Trusted, Community
This skill should be used when the user asks to "create an automation", "schedule a task", "set up a cron job", "webhook integration", "event-triggered automation", or mentions automations, scheduled tasks, cron scheduling, or webhook events in OpenHands Cloud.
123, SKILL.md, Updated Apr 24, 2026

Install manually

# Clone the repo
git clone https://github.com/openhands/extensions.git

# Copy into Claude Code skills folder (global)
cp -r extensions/skills/spark-version-upgrade ~/.claude/skills/

Repository

openhands/extensions

118 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT
