skills/data/spark-basics/SKILL.md
PySpark fundamentals for distributed data processing.
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .appName("ETL Job") \
    .config("spark.sql.adaptive.enabled", "true") \
    .getOrCreate()
# CSV
df = spark.read.csv("s3://bucket/data.csv", header=True, inferSchema=True)
# Parquet
df = spark.read.parquet("s3://bucket/data/")
# JSON
df = spark.read.json("s3://bucket/data.json")
# Delta Lake
df = spark.read.format("delta").load("s3://bucket/delta/")
from pyspark.sql import functions as F
# Select and rename
df = df.select(
    F.col("id").alias("user_id"),
    F.col("name"),
    F.col("created_at").cast("timestamp")
)
# Filter
df = df.filter(F.col("status") == "active")
# Aggregate
summary = df.groupBy("category").agg(
    F.count("*").alias("count"),
    F.sum("amount").alias("total"),
    F.avg("amount").alias("average")
)
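# Join (orders and customers are DataFrames loaded as above; a left join keeps every row of orders)
result = orders.join(customers, on="customer_id", how="left")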
# Window functions
from pyspark.sql.window import Window
window = Window.partitionBy("user_id").orderBy("created_at")
df = df.withColumn("row_num", F.row_number().over(window))
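To make the window example above concrete, here is a plain-Python sketch of what `row_number()` over a window partitioned by `user_id` and ordered by `created_at` computes. It does not use Spark; the helper name `add_row_numbers` and the sample `events` rows are hypothetical, chosen only for illustration. Unlike Spark, this sketch also returns rows in a fixed order and breaks ties deterministically.

```python
from itertools import groupby
from operator import itemgetter


def add_row_numbers(rows):
    """Emulate row_number() over partitionBy("user_id").orderBy("created_at")."""
    # Sort by the partition key first, then by the ordering key.
    ordered = sorted(rows, key=itemgetter("user_id", "created_at"))
    numbered = []
    for _, group in groupby(ordered, key=itemgetter("user_id")):
        # Numbering restarts at 1 inside every partition.
        for row_num, row in enumerate(group, start=1):
            numbered.append({**row, "row_num": row_num})
    return numbered


# Hypothetical sample data: two events for user 1, one for user 2.
events = [
    {"user_id": 2, "created_at": "2024-01-03"},
    {"user_id": 1, "created_at": "2024-01-02"},
    {"user_id": 1, "created_at": "2024-01-01"},
]
numbered = add_row_numbers(events)
```

A common use of this pattern is keeping only the first or latest row per key by filtering on `row_num == 1` afterwards.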
# Parquet with partitions
df.write \
    .partitionBy("year", "month") \
    .mode("overwrite") \
    .parquet("s3://bucket/output/")
# Delta Lake
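# Valid save modes are append, overwrite, error, and ignore; "merge" is not one of them.
# Upserts go through the DeltaTable merge API rather than a save mode.
df.write \
    .format("delta") \
    .mode("append") \
    .save("s3://bucket/delta/")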
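The partitioned Parquet write above lays files out in Hive-style directories, one level per partition column. The following plain-Python sketch shows the directory a row would land in; `partition_dir` is a hypothetical helper written for illustration, not a Spark API, and the year and month values are made up.

```python
def partition_dir(base_path, partition_values):
    """Build the Hive-style directory for one combination of partition values."""
    # Each partition column becomes a "column=value" path segment, in order.
    parts = [f"{col}={val}" for col, val in partition_values.items()]
    return base_path.rstrip("/") + "/" + "/".join(parts) + "/"


# Hypothetical values for the "year" and "month" partition columns.
path = partition_dir("s3://bucket/output/", {"year": 2024, "month": 1})
print(path)
```

Because the partition values live in the path, a later read that filters on `year` or `month` can skip whole directories instead of scanning every file.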
Use cache() for DataFrames that are reused across several actions.
Avoid collect() on large data, since it pulls every row onto the driver.