.claude/skills/spark-best-practices/SKILL.md
Apache Spark best practices for PySpark and Scala distributed data processing
npx skillsauth add baekenough/oh-my-customcode spark-best-practices
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
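- Use `broadcast(small_df)` for small-large table joins
- Spark broadcasts tables below a size threshold automatically (`spark.sql.autoBroadcastJoinThreshold`)
- Use `coalesce()` to reduce partitions without shuffle
- Use `repartition()` only when necessary (causes shuffle)
- Cache DataFrames that are reused with `df.cache()` or `df.persist()`
- Release cached data with `df.unpersist()` when it is no longer needed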