Optimize Spark data processing pipelines

spark-engineer · skill · setup · L39,726
Jeffallan/claude-skills
What it does

Optimize distributed Spark data pipelines

Best for

Large-scale ETL, DataFrame transformations, partitioning, shuffle optimization, or Spark SQL queries

Outputs
  • JSON response object
  • Spark DataFrame
Requires
  • SQL
  • Apache Spark
Preconditions

Use the DataFrame API rather than RDDs for structured data processing, and define explicit schemas for production pipelines.
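
The two preconditions above can be sketched in PySpark roughly as follows. This is a minimal illustration and is not taken from the skill itself: the `ORDERS_SCHEMA` columns and the helper names `load_orders`, `total_per_customer`, and `demo` are hypothetical. The schema is written as a data type string, which `createDataFrame` accepts; a `StructType` would serve equally well.

```python
# Hypothetical schema for illustration only; the column names are assumptions.
# An explicit schema avoids relying on inference, which can vary between runs.
ORDERS_SCHEMA = "order_id: bigint, customer: string, amount: double"


def load_orders(spark, rows):
    """Build a DataFrame from rows using the explicit schema above."""
    return spark.createDataFrame(rows, schema=ORDERS_SCHEMA)


def total_per_customer(orders):
    """Aggregate with the DataFrame API instead of RDD map/reduceByKey."""
    return orders.groupBy("customer").sum("amount")


def demo():
    """Run the sketch against a local Spark session (requires pyspark)."""
    # Imported lazily so the helpers above stay importable without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("explicit-schema-sketch")
        .getOrCreate()
    )
    rows = [(1, "alice", 10.0), (2, "bob", 5.5), (3, "alice", 2.5)]
    total_per_customer(load_orders(spark, rows)).show()
    spark.stop()
```

Calling `demo()` on a machine with PySpark installed prints the per-customer totals. Keeping the aggregation in the DataFrame API lets Spark's query optimizer plan it, which hand-written RDD transformations do not benefit from.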

Failure modes

("overwrite")

Trust signals
  • Source: Jeffallan/claude-skills
  • Includes worked examples
  • Has validation checklist