skills/spark-version-upgrade/SKILL.md
Upgrade Apache Spark applications between major versions (2.x→3.x, 3.x→4.x). Covers build files, deprecated APIs, configuration changes, SQL/DataFrame updates, and test validation.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add openhands/extensions spark-version-upgrade
Upgrade Apache Spark applications between major versions with a structured, phase-by-phase workflow.
Before changing any code, assess what needs to change. Read the official Apache Spark migration guide for the target version — it documents every API removal, config rename, and behavioral change per release: https://spark.apache.org/docs/latest/migration-guide.html
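- Identify the current Spark version from the build file (pom.xml, build.sbt, build.gradle, or requirements.txt).
- Search for Spark imports, including PySpark imports in Python files: `grep -rnE 'import org\.apache\.spark|(from|import) pyspark' --include='*.scala' --include='*.java' --include='*.py' .`
- Search for Spark configuration keys outside test code: `grep -rn 'spark\.' --include='*.conf' --include='*.properties' --include='*.scala' --include='*.java' --include='*.py' . | grep -v 'test'`
- Check for custom SparkSession or SparkContext extensions.
- Record the findings in spark_upgrade_impact.md, a summary of affected files, APIs, and configs.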
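The inventory step above can also be scripted. The following is a minimal sketch, assuming a stdlib-only Python helper is acceptable in the project; the function names `count_spark_imports` and `render_impact_summary` are hypothetical and not part of this skill.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: only these extensions are scanned, mirroring the grep commands.
SOURCE_SUFFIXES = {".scala", ".java", ".py"}
# Matches JVM-style and PySpark-style Spark import lines.
IMPORT_PATTERN = re.compile(
    r"^\s*(?:import\s+org\.apache\.spark|from\s+pyspark|import\s+pyspark)"
)


def count_spark_imports(root):
    """Return a Counter mapping relative file path -> number of Spark import lines."""
    counts = Counter()
    root = Path(root)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="ignore")
            hits = sum(1 for line in text.splitlines() if IMPORT_PATTERN.match(line))
            if hits:
                counts[path.relative_to(root).as_posix()] = hits
    return counts


def render_impact_summary(counts):
    """Render the counts as plain text suitable for spark_upgrade_impact.md."""
    lines = ["Spark upgrade impact summary", ""]
    for file_name, hits in sorted(counts.items()):
        lines.append(f"{file_name}: {hits} Spark import line(s)")
    return "\n".join(lines) + "\n"
```

Calling `render_impact_summary(count_spark_imports("."))` from the project root and writing the result to spark_upgrade_impact.md gives a starting point; the file would still need the config and API notes added by hand.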
Update dependency versions and resolve compilation errors.
pom.xml:
<!-- Update Spark version property -->
<spark.version>3.5.1</spark.version> <!-- or 4.0.0 -->
<scala.version>2.13.12</scala.version> <!-- Spark 3.x: 2.12/2.13; Spark 4.x: 2.13 -->
<!-- Update artifact IDs if Scala cross-version changed -->
<artifactId>spark-core_2.13</artifactId>
<artifactId>spark-sql_2.13</artifactId>
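A mismatch between the Scala version property and the `_2.xx` artifact suffix is easy to miss when editing a pom.xml by hand. The sketch below is one way to flag it, assuming versions are written literally (not through `${...}` properties) and that a plain-text regex scan is good enough; `scala_suffix_mismatches` is a hypothetical helper name.

```python
import re


def scala_suffix_mismatches(pom_text):
    """Return Spark artifactIds whose _2.xx suffix disagrees with <scala.version>."""
    version_match = re.search(
        r"<scala\.version>(\d+\.\d+)\.\d+</scala\.version>", pom_text
    )
    if version_match is None:
        # No literal scala.version property found, so nothing to compare against.
        return []
    binary_version = version_match.group(1)  # e.g. "2.13" from "2.13.12"
    artifacts = re.findall(
        r"<artifactId>(spark-[\w-]+?_(\d+\.\d+))</artifactId>", pom_text
    )
    return [name for name, suffix in artifacts if suffix != binary_version]
```

An empty list means every Spark artifact found uses the same Scala binary version as the property.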
build.sbt:
val sparkVersion = "3.5.1" // or "4.0.0"
scalaVersion := "2.13.12"
libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVersion
libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion
build.gradle:
ext {
    sparkVersion = '3.5.1' // or '4.0.0'
}
dependencies {
    implementation "org.apache.spark:spark-core_2.13:${sparkVersion}"
    implementation "org.apache.spark:spark-sql_2.13:${sparkVersion}"
}
requirements.txt / pyproject.toml:
pyspark==3.5.1 # or 4.0.0
Check the resolved dependency tree for conflicts (mvn dependency:tree / sbt dependencyTree).
Replace removed and deprecated APIs. Work through compiler errors systematically.
Consult the official Apache Spark migration guide for the complete list of changes for each version: https://spark.apache.org/docs/latest/migration-guide.html
// BEFORE (Spark 1.x/2.x)
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
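// AFTER (Spark 2.x+/3.x)
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder()
  .config(conf)
  .enableHiveSupport() // if needed
  .getOrCreate()
val sc = spark.sparkContext // the SparkContext is still available through the session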
// BEFORE
import sqlContext.implicits._
rdd.toDF() // implicit from SQLContext
// AFTER
import spark.implicits._
rdd.toDF() // implicit from SparkSession
// BEFORE
val acc = sc.accumulator(0)
// AFTER
val acc = sc.longAccumulator("name")
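To find remaining occurrences of the BEFORE patterns, a small scanner can complement the compiler errors. This is a sketch only: the pattern list is illustrative, drawn from the constructs named in this document, and `find_legacy_calls` is a hypothetical helper name.

```python
import re

# Illustrative, not exhaustive: extend from the migration guide for the target version.
LEGACY_PATTERNS = {
    "SQLContext constructor": re.compile(r"new\s+SQLContext\s*\("),
    "HiveContext constructor": re.compile(r"new\s+HiveContext\s*\("),
    "legacy accumulator": re.compile(r"\.accumulator\s*\("),
}


def find_legacy_calls(source_text):
    """Return (line_number, label) pairs for each legacy pattern found in the text."""
    findings = []
    for number, line in enumerate(source_text.splitlines(), start=1):
        for label, pattern in LEGACY_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings
```

The match is purely textual, so commented-out code and string literals will also be reported; treat the output as a list of places to review.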
- Replace SQLContext / HiveContext with SparkSession
- Replace Accumulator with AccumulatorV2
- Change DataFrame → Dataset[Row] where needed
- Replace RDD.mapPartitionsWithContext with mapPartitions
- Update SparkConf deprecated setters
- Update UserDefinedFunction registration
- Replace usages of Experimental / DeveloperApi APIs that were removed
Spark renames and removes configuration properties between versions. The official migration guide documents every renamed and removed property per release: https://spark.apache.org/docs/latest/migration-guide.html
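Property renames in a spark-defaults.conf style file can be applied mechanically once the mapping is known. The sketch below assumes one property per line with the key first; the mapping contains only the single rename this document names and should be extended from the migration guide. `RENAMED_PROPERTIES` and `rename_properties` are hypothetical names.

```python
import re

# Assumption: only the rename named in this document; add others from the migration guide.
RENAMED_PROPERTIES = {
    "spark.shuffle.file.buffer.kb": "spark.shuffle.file.buffer",
}


def rename_properties(conf_text, renames=RENAMED_PROPERTIES):
    """Return conf_text with each old key at the start of a line replaced by its new name."""
    out = []
    for line in conf_text.splitlines():
        # Group 2 is the property key; comment lines starting with '#' do not match.
        match = re.match(r"^(\s*)([A-Za-z0-9_.]+)(.*)$", line)
        if match and match.group(2) in renames:
            line = match.group(1) + renames[match.group(2)] + match.group(3)
        out.append(line)
    return "\n".join(out)
```

Only the key is rewritten. Values are left untouched, so any property whose unit or format changed along with its name still needs a manual check.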
- Rename deprecated properties (e.g., spark.shuffle.file.buffer.kb → spark.shuffle.file.buffer)
- Update spark-defaults.conf, application code, and submit scripts
- Verify that SparkSession.builder().config(...) calls use current property names
Spark SQL behavior changes between versions can silently alter query results.
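Because such changes do not raise errors, one way to catch them is to run the same query on the old and new versions and compare the collected rows. The following is a minimal sketch, assuming both results fit in memory and have been collected as sequences of hashable row values; `diff_result_sets` is a hypothetical helper name.

```python
from collections import Counter


def diff_result_sets(old_rows, new_rows):
    """Compare two query results as multisets of rows, ignoring row order."""
    old_counts = Counter(map(tuple, old_rows))
    new_counts = Counter(map(tuple, new_rows))
    return {
        # Rows (with multiplicity) that the old run produced and the new run did not.
        "only_in_old": sorted((old_counts - new_counts).elements(), key=repr),
        # Rows (with multiplicity) that appear only in the new run.
        "only_in_new": sorted((new_counts - old_counts).elements(), key=repr),
    }
```

Two empty lists mean the results match as multisets. Duplicate rows are counted, so a row that appears twice in one run and once in the other is reported.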
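Spark 2.x → 3.x:
- With ANSI mode on, an out-of-range or malformed CAST to integer raises an error instead of silently truncating or returning NULL; set spark.sql.ansi.enabled as needed
- Review SELECT statements without a FROM clause (such as SELECT 1) and confirm they behave as expected on the target version
- spark.sql.legacy.timeParserPolicy controls date/time parsing behavior
Spark 3.x → 4.x:
- ANSI mode is enabled by default (spark.sql.ansi.enabled=true)
- Several spark.sql.legacy.* flags removed
For either upgrade path:
- Add explicit CAST where implicit coercion relied on legacy behavior
- Set spark.sql.legacy.* flags temporarily if needed for phased migration
- Document any spark.sql.legacy.* flags that are set temporarily
✓ Project compiles against target Spark version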
✓ All tests pass
✓ No removed APIs remain in code
✓ Configuration properties are current
✓ SQL queries produce correct results
✓ Upgrade impact documented in spark_upgrade_impact.md