Optimize Spark data processing pipelines
spark-engineer · skill · setup L3 · ★9,726
Jeffallan/claude-skills
What it does
Optimize distributed Spark data pipelines
Best for
Large-scale ETL, DataFrame transformations, partitioning, shuffle optimization, or Spark SQL queries
Outputs
- JSON response object
- Spark DataFrame
Requires
- SQL
- Apache Spark
Preconditions
Use the DataFrame API rather than RDDs for structured data processing; define explicit schemas for production pipelines
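A minimal PySpark sketch of the two preconditions above: it reads data with an explicit schema instead of relying on inference, and it aggregates through the DataFrame API rather than RDD operations. The column names, the `EVENT_SCHEMA` constant, and the helpers `read_events`, `total_amount_by_type`, and `run_example` are hypothetical names chosen for illustration; they are not taken from the skill itself.

```python
# Hypothetical columns and types, chosen only for illustration.
# PySpark accepts a DDL-formatted string wherever a schema is expected.
EVENT_SCHEMA = "user_id BIGINT, event_type STRING, amount DOUBLE"


def read_events(spark, path):
    """Read JSON with an explicit schema so Spark does not infer one."""
    return spark.read.schema(EVENT_SCHEMA).json(path)


def total_amount_by_type(events):
    """Sum `amount` per `event_type` using the DataFrame API."""
    return events.groupBy("event_type").agg({"amount": "sum"})


def run_example():
    """Build a tiny in-memory DataFrame and print the aggregated result."""
    # Imported here so the module still loads where PySpark is not installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("explicit-schema-sketch")
        .getOrCreate()
    )
    rows = [(1, "click", 0.5), (2, "purchase", 20.0), (1, "purchase", 5.0)]
    events = spark.createDataFrame(rows, schema=EVENT_SCHEMA)
    total_amount_by_type(events).show()
    spark.stop()
```

Calling `run_example()` on a machine with PySpark and a Java runtime available starts a local session and prints the per-type totals. In a real pipeline, `read_events` would be pointed at the actual input path.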
Failure modes
("overwrite")
Trust signals
- Source: Jeffallan/claude-skills
- Includes worked examples
- Has validation checklist