How to set the number of partitions/nodes when importing data into Spark
Answer #1

By default, Spark SQL shuffles data into 200 partitions. You can change this with a set command in the SQL context:

sqlContext.sql("set spark.sql.shuffle.partitions=10");

However, you need to set it with caution based on your data characteristics.
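As a minimal sketch of the effect (assuming Spark 1.x with a SQLContext; the app name and sample data are illustrative), lowering the property changes how many partitions a shuffle such as a join produces:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Illustrative local setup, assuming Spark 1.x
val conf = new SparkConf().setAppName("shuffle-partitions-demo").setMaster("local[*]")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Lower the shuffle partition count from the default 200 to 10
sqlContext.sql("set spark.sql.shuffle.partitions=10")

val left  = sc.parallelize(Seq((1, "a"), (2, "b"))).toDF("id", "l")
val right = sc.parallelize(Seq((1, "x"), (2, "y"))).toDF("id", "r")

// The join triggers a shuffle; its output now has 10 partitions instead of 200
val joined = left.join(right, "id")
println(joined.rdd.getNumPartitions)  // 10

Note that this property only governs shuffles; it does not change how many partitions the data had when it was first read in.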
You can also call repartition() on a DataFrame to set the number of partitions. The spark.sql.shuffle.partitions property can likewise be set after creating the HiveContext, or passed to spark-submit:
spark-submit .... --conf spark.sql.shuffle.partitions=100
or
dataframe.repartition(100)
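As a quick sketch of the repartition() route (reusing the sc and sqlContext from the sketch above; the column name is illustrative), the DataFrame's underlying RDD reports the new partition count:

import sqlContext.implicits._

// repartition returns a new DataFrame redistributed across 100 partitions
val df = sc.parallelize(1 to 1000).toDF("value")
val repartitioned = df.repartition(100)
println(repartitioned.rdd.getNumPartitions)  // 100

Keep in mind that repartition() performs a full shuffle of the data, so it is worth doing only when the new layout pays for itself in subsequent stages.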