PySpark Resource Configuration
In Python, we would like to specify the resources a Spark job uses just as we can in Scala, for example:
spark-submit \
  --principal $principal \
  --keytab $keytab \
  --name Test \
  --master yarn --deploy-mode cluster \
  --num-executors 10 \
  --executor-cores 4 \
  --executor-memory 16G \
  --driver-memory 16G \
  --conf spark.locality.wait=10 \
  --conf spark.serializer="org.apache.spark.serializer.KryoSerializer" \
  --conf spark.streaming.backpressure.enabled=true \
  --conf spark.task.maxFailures=8 \
  --conf spark.driver.maxResultSize=8G \
  --conf spark.default.parallelism=500 \
  --conf spark.sql.shuffle.partitions=300 \
  --conf spark.sql.autoBroadcastJoinThreshold=-1 \
  --conf spark.sql.broadcastTimeout=3000 \
  --conf spark.yarn.submit.waitAppCompletion=true \
  --conf spark.yarn.report.interval=6000 \
  --conf spark.driver.extraClassPath=$localroot/config \
  --conf spark.executor.userClassPathFirst=true \
  --conf spark.hbase.obtainToken.enabled=true \
  --conf spark.yarn.security.credentials.hbase.enabled=true \
  --conf spark.executor.extraJavaOptions="${executorJavaOpts:34}" \
  --conf spark.yarn.cluster.driver.extraJavaOptions="${driverJavaOpts:45}" \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.yarn.dist.innerfiles=$SPARK_HOME/conf/log4j-executor.properties,$SPARK_HOME/conf/jaas-zk.conf,$SPARK_HOME/conf/carbon.properties,$SPARK_HOME/conf/jets3t.properties,$SPARK_HOME/conf/topology.properties,$SPARK_HOME/conf/mapred-site.xml \
  --files $localroot/config/logback.xml,$localroot/config/hbase-site.xml,$localroot/config/kdc.conf,${uploadFiles} \
  --class com.huawei.rcm.newsfeed.boxrcm.nearby.CityInfoParse \
  --jars $localroot/lib/bcprov-ext-jdk15on-1.68.jar,$localroot/lib/CryptoUtil-1.1.5.304.jar,$localroot/lib/commons-pool2-2.8.1.jar,$localroot/lib/jedis-2.9.0.jar,$localroot/lib/com.huawei.dcs.dcsdk.core-1.6.18.101.jar,$localroot/lib/com.huawei.dcs.dcsdk.support.onejar-1.6.18.101.jar,$localroot/lib/gpaas-middleware-common-2.2.5.101.jar \
  $app_jarpath \
  $arg1 \
  $arg2 \
  $arg3

(Note: ${executorJavaOpts:34} and ${driverJavaOpts:45} are bash substring expansions; each expands its variable with the first 34 or 45 characters removed, stripping a fixed-length prefix built earlier in the script.)
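As an aside, only part of this has to happen at submit time. Here is a minimal sketch (not from the original post, and assuming a freshly created session): runtime tunables such as spark.sql.shuffle.partitions or the serializer can also be set in code when the SparkSession is built, whereas the hard resources (--num-executors, --executor-memory, --driver-memory, ...) must be fixed by spark-submit in yarn-cluster mode, because the JVMs are already sized before any user code runs.

from pyspark.sql import SparkSession

# Runtime tunables can be set while building the session; hard resource
# flags cannot be changed here in yarn-cluster mode.
spark = (
    SparkSession.builder
    .appName("Test")
    .config("spark.sql.shuffle.partitions", "300")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)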
It is hard, however, to find how to specify the same resources for a PySpark task. The method is in fact very similar:
keytab_path=/home/testuser/wdbkeytab/user.keytab
anaconda_archive=hdfs://teset/anaconda3.tar.gz#anaconda_pack
application_name=task_id_$1_$2
pyspark_python="./anaconda_pack/anaconda3/bin/python"

spark-submit --master yarn --deploy-mode cluster --name $application_name \
  --driver-cores 2 \
  --driver-memory 64G \
  --queue $queue \
  --num-executors 50 \
  --executor-memory 3g \
  --executor-cores 2 \
  --principal $principal \
  --keytab $keytab_path \
  --archives $anaconda_archive \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=$pyspark_python \
  --conf spark.executorEnv.PYSPARK_PYTHON=$pyspark_python \
  --conf spark.driver.maxResultSize=10G \
  --conf spark.default.parallelism=1000 \
  --conf spark.speculation=true \
  --conf spark.speculation.interval=60000 \
  --conf spark.speculation.quantile=0.85 \
  --conf spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive=true \
  --conf spark.security.credentials.hbase.enabled=true \
  --conf spark.hadoop.validateOutputSpecs=false \
  --conf spark.yarn.user.classpath.first=true \
  --conf spark.executor.memoryOverhead=40960 \
  --conf spark.yarn.am.waitTime=1000s \
  --py-files test.py \
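The post does not show what test.py contains, so the following is only a hedged sketch of a possible driver script: it does nothing except print which Python interpreter it is running under and echo a few of the settings passed above, which is a quick way to confirm that --archives and the PYSPARK_PYTHON variables took effect. The conf keys are standard Spark property names; "<unset>" is just a placeholder default.

import sys
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
conf = spark.sparkContext.getConf()

# Should point inside the unpacked anaconda_pack archive on YARN.
print("python executable:", sys.executable)

# Echo a few of the resource settings supplied via spark-submit.
for key in ("spark.executor.memory",
            "spark.executor.cores",
            "spark.executor.memoryOverhead",
            "spark.default.parallelism"):
    print(key, "=", conf.get(key, "<unset>"))

spark.stop()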
Summary

A PySpark job takes the same resource options through spark-submit as a Scala job does; the PySpark-specific additions are --archives to ship a packaged Python environment and the spark.yarn.appMasterEnv.PYSPARK_PYTHON / spark.executorEnv.PYSPARK_PYTHON settings that point the driver and executors at it. Hopefully this helps you solve the problem you ran into.