Upgrade Apache Spark applications between major versions (2.x→3.x, 3.x→4.x). Covers build files, deprecated APIs, configuration changes, SQL/DataFrame updates, and test validation.
Upgrade Apache Spark applications between major versions with a structured, phase-by-phase workflow.
Before changing any code, assess what needs to change. Read the official Apache Spark migration guide for the target version — it documents every API removal, config rename, and behavioral change per release: https://spark.apache.org/docs/latest/migration-guide.html
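- Identify the current Spark version (check `pom.xml`, `build.sbt`, `build.gradle`, or `requirements.txt`)
- List all Spark API usages, including PySpark imports: `grep -rnE 'import org\.apache\.spark|from pyspark|import pyspark' --include='*.scala' --include='*.java' --include='*.py' .`
- Find Spark configuration properties: `grep -rn 'spark\.' --include='*.conf' --include='*.properties' --include='*.scala' --include='*.java' --include='*.py' . | grep -v 'test'`
- Identify custom `SparkSession` or `SparkContext` extensions
- Record the findings in `spark_upgrade_impact.md`

Output:

```
spark_upgrade_impact.md   # Summary of affected files, APIs, and configs
```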
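The two `grep` passes above can also be folded into one report generator that writes `spark_upgrade_impact.md`. The following is a minimal sketch that uses only the Scala 2.13 standard library. `ImpactScan`, `scanLines`, `scanTree` and `toMarkdown` are hypothetical names. The patterns mirror the `grep` expressions above, with one deliberate narrowing: in source files only *quoted* `spark.*` keys are counted, so method calls such as `spark.sql(...)` are not reported as config properties.

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

// Hypothetical report generator mirroring the grep passes: records Spark/PySpark
// imports and spark.* property keys, one Finding per match per line.
object ImpactScan {
  final case class Finding(file: String, line: Int, kind: String, text: String)

  private val SourceExts = Seq(".scala", ".java", ".py", ".conf", ".properties")
  private val ImportPattern = """import org\.apache\.spark|from pyspark|import pyspark""".r
  // In code, only quoted keys such as "spark.sql.shuffle.partitions" count.
  private val QuotedKey = "\"(spark\\.[A-Za-z0-9_.-]+)\"".r
  // In .conf/.properties files, a key at the start of a line counts.
  private val ConfLineKey = """^\s*(spark\.[A-Za-z0-9_.-]+)""".r

  /** Scans one file's lines; `file` is only used for reporting. */
  def scanLines(file: String, lines: Seq[String]): Seq[Finding] = {
    val isConfFile = file.endsWith(".conf") || file.endsWith(".properties")
    lines.zipWithIndex.flatMap { case (text, idx) =>
      val lineNo = idx + 1
      val imports =
        ImportPattern.findFirstIn(text).map(_ => Finding(file, lineNo, "import", text.trim)).toList
      val quoted =
        QuotedKey.findAllMatchIn(text).map(m => Finding(file, lineNo, "config", m.group(1))).toList
      val confKeys =
        if (isConfFile)
          ConfLineKey.findFirstMatchIn(text).map(m => Finding(file, lineNo, "config", m.group(1))).toList
        else List.empty[Finding]
      imports ++ quoted ++ confKeys
    }
  }

  /** Walks `root` and scans every file with a relevant extension. */
  def scanTree(root: Path): Seq[Finding] = {
    val stream = Files.walk(root)
    try {
      stream.iterator().asScala
        .filter(p => Files.isRegularFile(p) && SourceExts.exists(ext => p.toString.endsWith(ext)))
        .toList
        .flatMap(p => scanLines(root.relativize(p).toString, Files.readAllLines(p).asScala.toSeq))
    } finally stream.close()
  }

  /** Renders findings grouped by file, as Markdown. */
  def toMarkdown(findings: Seq[Finding]): String = {
    val sections = findings.groupBy(_.file).toSeq.sortBy(_._1).map { case (file, fs) =>
      val items = fs.sortBy(_.line).map(f => s"- L${f.line} [${f.kind}] `${f.text}`")
      (s"## $file" +: items).mkString("\n")
    }
    ("# Spark upgrade impact" +: sections).mkString("\n\n") + "\n"
  }
}
```

A typical call would be `ImpactScan.toMarkdown(ImpactScan.scanTree(Paths.get(".")))`, with the resulting string written to `spark_upgrade_impact.md`. The `grep -v 'test'` filter is intentionally left out, so decide separately whether test sources belong in the report.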
Update dependency versions and resolve any compilation errors.
Maven (`pom.xml`):

```xml
<!-- Update Spark version property -->
<spark.version>3.5.1</spark.version> <!-- or 4.0.0 -->
<scala.version>2.13.12</scala.version> <!-- Spark 3.0-3.1: 2.12; Spark 3.2-3.5: 2.12/2.13; Spark 4.x: 2.13 -->

<!-- Update artifact IDs if Scala cross-version changed -->
<artifactId>spark-core_2.13</artifactId>
<artifactId>spark-sql_2.13</artifactId>
```
sbt (`build.sbt`):

```scala
val sparkVersion = "3.5.1" // or "4.0.0"

scalaVersion := "2.13.12"

libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVersion
libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion
```
Gradle (`build.gradle`):

```groovy
ext {
    sparkVersion = '3.5.1' // or '4.0.0'
}

dependencies {
    implementation "org.apache.spark:spark-core_2.13:${sparkVersion}"
    implementation "org.apache.spark:spark-sql_2.13:${sparkVersion}"
}
```
PySpark (`requirements.txt` / `pyproject.toml`):

```text
pyspark==3.5.1  # or 4.0.0
```
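In the JVM builds above, the `_2.12` / `_2.13` artifact suffix must match two things: the build's Scala version, and a Scala version the target Spark release is actually published for. The following is a minimal sketch of a check you could run before compiling. It assumes these Spark/Scala pairs: Spark 3.0 and 3.1 on Scala 2.12 only, Scala 2.13 artifacts from Spark 3.2 onward, and Spark 4.x on Scala 2.13 only. `ScalaCompat` and its methods are hypothetical names and are not part of Maven, sbt, or Gradle.

```scala
// Hypothetical helper: derives Scala binary suffixes and checks them against
// the Scala versions a Spark release line was published for.
object ScalaCompat {
  /** "2.13.12" -> "2.13" */
  def binaryVersion(scalaVersion: String): String =
    scalaVersion.split('.').take(2).mkString(".")

  /** ("spark-sql", "2.13.12") -> "spark-sql_2.13" */
  def artifactFor(module: String, scalaVersion: String): String =
    s"${module}_${binaryVersion(scalaVersion)}"

  /** Scala binary versions assumed to be published for a Spark version, if a rule is recorded. */
  def supportedScala(sparkVersion: String): Option[Set[String]] = {
    val parts = sparkVersion.split('.').map(_.takeWhile(_.isDigit))
    val major = parts(0).toInt
    val minor = if (parts.length > 1 && parts(1).nonEmpty) parts(1).toInt else 0
    (major, minor) match {
      case (3, m) if m < 2 => Some(Set("2.12"))         // 3.0, 3.1: Scala 2.12 only
      case (3, _)          => Some(Set("2.12", "2.13")) // 2.13 builds from 3.2 onward
      case (4, _)          => Some(Set("2.13"))         // 4.x: Scala 2.13 only
      case _               => None
    }
  }

  /** Returns an error message, or None when the combination looks valid. */
  def check(sparkVersion: String, scalaVersion: String): Option[String] = {
    val bin = binaryVersion(scalaVersion)
    supportedScala(sparkVersion) match {
      case None => Some(s"No rule recorded for Spark $sparkVersion")
      case Some(ok) if !ok.contains(bin) =>
        Some(s"Spark $sparkVersion is not published for Scala $bin (expected ${ok.toSeq.sorted.mkString(" or ")})")
      case Some(_) => None
    }
  }

  /** Spark artifact IDs whose _2.xx suffix disagrees with the build's Scala version. */
  def mismatchedArtifacts(artifactIds: Seq[String], scalaVersion: String): Seq[String] = {
    val expected = "_" + binaryVersion(scalaVersion)
    val suffix = """_2\.\d+$""".r
    artifactIds.filter(id => id.startsWith("spark-") && suffix.findFirstIn(id).exists(_ != expected))
  }
}
```

For example, `ScalaCompat.check("4.0.0", "2.12.18")` returns an error message. `mismatchedArtifacts` catches a leftover `spark-core_2.12` in a build that has already moved to Scala 2.13.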
- Inspect the dependency tree and resolve version conflicts (`mvn dependency:tree` / `sbt dependencyTree`)

Replace removed and deprecated APIs. Work through compiler errors systematically.
Consult the official Apache Spark migration guide for the complete list of changes for each version: https://spark.apache.org/docs/latest/migration-guide.html
Entry point:

```scala
// BEFORE (Spark 1.x/2.x)
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// AFTER (Spark 2.x+/3.x)
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config(conf)
  .enableHiveSupport() // if needed
  .getOrCreate()
val sc = spark.sparkContext
```

Implicits:

```scala
// BEFORE
import sqlContext.implicits._
rdd.toDF() // implicit from SQLContext

// AFTER
import spark.implicits._
rdd.toDF() // implicit from SparkSession
```

Accumulators:

```scala
// BEFORE (Accumulator v1 API, removed in Spark 3.0)
val acc = sc.accumulator(0)

// AFTER (AccumulatorV2)
val acc = sc.longAccumulator("name")
```
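The compiler will surface most of the pre-upgrade APIs shown above. A quick text scan of Scala/Java sources can still size the work before the first build. The following is a minimal sketch. The rule table is a hypothetical, deliberately small selection based on the before/after pairs in this section and is not an exhaustive list from the migration guide. Do not apply it to PySpark code, where `sc.accumulator(...)` is still the regular API.

```scala
import scala.util.matching.Regex

// Hypothetical pre-compilation scan for Scala/Java sources: flags lines that
// still use APIs replaced in the before/after examples. Illustrative rules only.
object RemovedApiScan {
  final case class Rule(pattern: Regex, advice: String)
  final case class Hit(line: Int, advice: String, text: String)

  val rules: Seq[Rule] = Seq(
    Rule("""new\s+SQLContext\s*\(""".r, "use SparkSession.builder().getOrCreate()"),
    Rule("""new\s+HiveContext\s*\(""".r, "use SparkSession.builder().enableHiveSupport()"),
    // Matches sc.accumulator(...) but not sc.longAccumulator(...).
    Rule("""\.accumulator\s*\(""".r, "use sc.longAccumulator or another AccumulatorV2"),
    Rule("""\bmapPartitionsWithContext\b""".r, "use mapPartitions")
  )

  /** Returns one Hit per (line, rule) match, with 1-based line numbers. */
  def scan(lines: Seq[String]): Seq[Hit] =
    for {
      (text, idx) <- lines.zipWithIndex
      rule <- rules
      if rule.pattern.findFirstIn(text).isDefined
    } yield Hit(idx + 1, rule.advice, text.trim)
}
```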
- Replace `SQLContext` / `HiveContext` with `SparkSession`
- Replace `Accumulator` with `AccumulatorV2`
- Update `DataFrame` → `Dataset[Row]` where needed
- Replace `RDD.mapPartitionsWithContext` with `mapPartitions`
- Update deprecated `SparkConf` setters
- Update `UserDefinedFunction` registration
- Replace `Experimental` / `DeveloperApi` usages that were removed

Spark renames and removes configuration properties between versions. The official migration guide documents every renamed and removed property per release: https://spark.apache.org/docs/latest/migration-guide.html
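For `spark-defaults.conf`-style files, where each line holds one `key value` or `key=value` pair, renames are mechanical. The following is a minimal sketch. The rename map contains only the `spark.shuffle.file.buffer.kb` example from this section's checklist, so fill it in from the migration guide for your version pair. `ConfigRename` is a hypothetical name. Comment lines, values, and unlisted keys pass through unchanged.

```scala
// Hypothetical helper: rewrites renamed keys in spark-defaults.conf-style text,
// leaving comments, values and unknown keys untouched.
object ConfigRename {
  // Example pair only; extend from the migration guide for your versions.
  val renames: Map[String, String] = Map(
    "spark.shuffle.file.buffer.kb" -> "spark.shuffle.file.buffer"
  )

  // Leading whitespace, the key, then the rest of the line (separator + value).
  private val KeyLine = """^(\s*)(spark\.[A-Za-z0-9_.-]+)(.*)$""".r

  def renameLine(line: String): String = line match {
    case KeyLine(indent, key, rest) if renames.contains(key) => indent + renames(key) + rest
    case other => other
  }

  /** Applies renameLine to every line, preserving a trailing newline. */
  def renameAll(text: String): String =
    text.split("\n", -1).map(renameLine).mkString("\n")
}
```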
- Update renamed properties (e.g. `spark.shuffle.file.buffer.kb` → `spark.shuffle.file.buffer`)
- Apply the updates in `spark-defaults.conf`, application code, and submit scripts
- Verify that `SparkSession.builder().config(...)` calls use current property names

Spark SQL behavior changes between versions can silently alter query results.
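Spark 3.x:

- With `spark.sql.ansi.enabled=true` (off by default in 3.x), invalid or overflowing `CAST`s to integer raise errors instead of silently returning `NULL` or an overflowed value; enable it if you need the stricter behavior
- `spark.sql.legacy.timeParserPolicy` controls date/time parsing behavior

Spark 4.x:

- ANSI mode is enabled by default (`spark.sql.ansi.enabled=true`)
- Some `spark.sql.legacy.*` flags have been removed

Checklist:

- Add explicit `CAST`s where implicit coercion relied on legacy behavior
- Set `spark.sql.legacy.*` flags temporarily if needed for a phased migration
- Remove any `spark.sql.legacy.*` flags that were set temporarily

The upgrade is complete when:

✓ Project compiles against target Spark version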
✓ All tests pass
✓ No removed APIs remain in code
✓ Configuration properties are current
✓ SQL queries produce correct results
✓ Upgrade impact documented in spark_upgrade_impact.md
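SQL behavior changes can alter results without raising any error. One way to back the "SQL queries produce correct results" criterion above is to run the same query on the old and new Spark versions and compare the rows. The following is a minimal sketch. It assumes the rows have already been collected and rendered as strings on each version (for instance from each `Row`'s `toSeq`), with nulls written as `"NULL"`. Rows are compared as multisets because Spark does not guarantee row order without an `ORDER BY`. `ResultDiff` is a hypothetical name.

```scala
// Hypothetical helper: compares two query results as multisets of rows,
// so duplicates are counted and row order is ignored.
object ResultDiff {
  type Row = Seq[String]

  final case class Diff(onlyBefore: Map[Row, Int], onlyAfter: Map[Row, Int]) {
    def isEmpty: Boolean = onlyBefore.isEmpty && onlyAfter.isEmpty
  }

  private def counts(rows: Seq[Row]): Map[Row, Int] =
    rows.groupBy(identity).map { case (row, group) => row -> group.size }

  def diff(before: Seq[Row], after: Seq[Row]): Diff = {
    val b = counts(before)
    val a = counts(after)
    // Keep only the surplus occurrences of each row on each side.
    val onlyBefore = b.map { case (row, n) => row -> (n - a.getOrElse(row, 0)) }.filter(_._2 > 0)
    val onlyAfter = a.map { case (row, n) => row -> (n - b.getOrElse(row, 0)) }.filter(_._2 > 0)
    Diff(onlyBefore, onlyAfter)
  }
}
```

A non-empty `Diff` does not always mean a regression. It marks queries to review against the behavior changes listed above, for example a value that now comes back as `NULL` or raises an error under ANSI mode.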