Python in worker has different version 3.7 than that in driver 3.6
Environment:

| Component | Version |
| --- | --- |
| Ubuntu | 20.04 |
| Spark | 3.0.0-preview2-bin-hadoop3.2 |
The full error output is as follows:
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 43, 192.168.0.103, executor 1): org.apache.spark.api.python.PythonException: Traceback (most recent call last):File "/home/appleyuchi/bigdata/spark-3.0.0-preview2-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/worker.py", line 469, in main("%d.%d" % sys.version_info[:2], version)) Exception: Python in worker has different version 3.7 than that in driver 3.6, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:484)at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:619)at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:602)at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:437)at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)at scala.collection.Iterator.foreach(Iterator.scala:941)at scala.collection.Iterator.foreach$(Iterator.scala:941)at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)at scala.collection.TraversableOnce.to(TraversableOnce.scala:315)at scala.collection.TraversableOnce.to$(TraversableOnce.scala:313)at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:307)at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:307)at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28)at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:294)at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:288)at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1004)at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2156)at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)at org.apache.spark.scheduler.Task.run(Task.scala:127)at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:441)at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:444)at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)at java.lang.Thread.run(Thread.java:748)Driver stacktrace:at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:1989)at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:1977)at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:1976)at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1976)at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:956)at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:956)at scala.Option.foreach(Option.scala:407)at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:956)at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2206)at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2155)at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2144)at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:758)at org.apache.spark.SparkContext.runJob(SparkContext.scala:2116)at org.apache.spark.SparkContext.runJob(SparkContext.scala:2137)at org.apache.spark.SparkContext.runJob(SparkContext.scala:2156)at org.apache.spark.SparkContext.runJob(SparkContext.scala:2181)at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1004)at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)at org.apache.spark.rdd.RDD.withScope(RDD.scala:388)at org.apache.spark.rdd.RDD.collect(RDD.scala:1003)at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:168)at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)at java.lang.reflect.Method.invoke(Method.java:498)at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)at py4j.Gateway.invoke(Gateway.java:282)at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)at py4j.commands.CallCommand.execute(CallCommand.java:79)at py4j.GatewayConnection.run(GatewayConnection.java:238)at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):File "/home/appleyuchi/bigdata/spark-3.0.0-preview2-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/worker.py", line 469, in main("%d.%d" % sys.version_info[:2], version)) Exception: Python in worker has different version 3.7 than that in driver 3.6, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:484)at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:619)at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:602)at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:437)at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)at scala.collection.Iterator.foreach(Iterator.scala:941)at scala.collection.Iterator.foreach$(Iterator.scala:941)at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)at scala.collection.TraversableOnce.to(TraversableOnce.scala:315)at scala.collection.TraversableOnce.to$(TraversableOnce.scala:313)at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:307)at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:307)at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28)at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:294)at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:288)at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1004)at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2156)at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)at org.apache.spark.scheduler.Task.run(Task.scala:127)at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:441)at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:444)at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)... 1 more

This is where things get complicated: there are three different Pythons on these machines.
- the system Python that comes with the OS (root's Python)
- Anaconda's Python
- the Python that ships with Spark
So which of these is the Python referred to in the log above?
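One way to narrow this down is to print the version of each candidate interpreter directly. This is only a sketch: the first two paths are common defaults and may differ on your machine, while the third is the conda environment used later in this post.

/usr/bin/python3 --version                        # system Python installed with the OS
~/anaconda3/bin/python3 --version                 # Anaconda's default Python
~/anaconda3/envs/Python3.6/bin/python3 --version  # the conda environment referenced below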
From what I have observed, PySpark uses them as follows (a quick check is sketched after this list):
- the master uses the default Python of the Anaconda virtual environment;
- the slaves use the system Python.
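A quick way to check this assumption is to compare the interpreter the master resolves with the one a slave resolves over a non-interactive ssh session, which normally does not activate the conda environment (slave-node is a placeholder hostname):

which python3 && python3 --version                     # on the master, with the conda env on PATH
ssh slave-node 'which python3 && python3 --version'    # on a slave, usually the system Python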
The shell settings were restored to the following:
export PYSPARK_PYTHON=~/anaconda3/envs/Python3.6/bin/python3
export PYSPARK_DRIVER_PYTHON=~/anaconda3/envs/Python3.6/bin/python3
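These exports only affect processes launched from a shell that has sourced them. A possible alternative, sketched here with the same assumed path, is to put the worker-side setting into $SPARK_HOME/conf/spark-env.sh on every node so the standalone workers pick it up when they start:

echo 'export PYSPARK_PYTHON=~/anaconda3/envs/Python3.6/bin/python3' >> "$SPARK_HOME/conf/spark-env.sh"
# copy the file to each worker (placeholder hostname) and restart the workers
scp "$SPARK_HOME/conf/spark-env.sh" slave-node:"$SPARK_HOME/conf/"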
With these settings, however, Jupyter Notebook can no longer be used.
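The likely reason is that launching pyspark through Jupyter usually relies on pointing the driver variable at Jupyter itself, roughly like this (these two lines are an assumption about the earlier notebook setup, not something shown in this post):

export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook

Once PYSPARK_DRIVER_PYTHON points at a plain python3 as above, pyspark starts an ordinary interactive shell instead of opening a notebook.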