Problem encountered and background
A Spark program written in PyCharm on Windows fails with the following error:
Source code
from pyspark import SparkConf, SparkContext

# Run Spark locally in a single JVM
conf = SparkConf().setAppName("WordCount").setMaster("local")
sc = SparkContext(conf=conf)

# Word count: split each line on spaces, emit (word, 1), then sum per word
inputFile = "./word.txt"
textFile = sc.textFile(inputFile)
wordCount = textFile.flatMap(lambda line: line.split(" ")) \
                    .map(lambda word: (word, 1)) \
                    .reduceByKey(lambda a, b: a + b)
wordCount.foreach(print)
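As the traceback below shows, foreach ends up calling collect internally, so the failure is not specific to printing from the workers. For reference, a driver-side equivalent (a sketch, assuming the same ./word.txt) that prints from this process instead:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("WordCount").setMaster("local")
sc = SparkContext(conf=conf)

counts = (
    sc.textFile("./word.txt")
      .flatMap(lambda line: line.split(" "))
      .map(lambda word: (word, 1))
      .reduceByKey(lambda a, b: a + b)
      .collect()  # bring the results back to the driver
)
for word, count in counts:
    print(word, count)

sc.stop()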
Run output and error
22/11/09 16:23:07 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
Traceback (most recent call last):
File "D:/Code/PythonCode/testSpark/TEST/big_word_process.py", line 8, in
wordCount.foreach(print)
File "D:\myjava\spark\spark-3.3.1-bin-hadoop3\python\lib\pyspark.zip\pyspark\rdd.py", line 1163, in foreach
File "D:\myjava\spark\spark-3.3.1-bin-hadoop3\python\lib\pyspark.zip\pyspark\rdd.py", line 1521, in count
File "D:\myjava\spark\spark-3.3.1-bin-hadoop3\python\lib\pyspark.zip\pyspark\rdd.py", line 1509, in sum
File "D:\myjava\spark\spark-3.3.1-bin-hadoop3\python\lib\pyspark.zip\pyspark\rdd.py", line 1336, in fold
File "D:\myjava\spark\spark-3.3.1-bin-hadoop3\python\lib\pyspark.zip\pyspark\rdd.py", line 1197, in collect
File "D:\myjava\spark\spark-3.3.1-bin-hadoop3\python\lib\py4j-0.10.9.5-src.zip\py4j\java_gateway.py", line 1322, in call
File "D:\myjava\spark\spark-3.3.1-bin-hadoop3\python\lib\py4j-0.10.9.5-src.zip\py4j\protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (LAPTOP-DVJ0R5NO executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "D:\myjava\spark\spark-3.3.1-bin-hadoop3\python\lib\pyspark.zip\pyspark\worker.py", line 668, in main
File "D:\myjava\spark\spark-3.3.1-bin-hadoop3\python\lib\pyspark.zip\pyspark\worker.py", line 85, in read_command
File "D:\myjava\spark\spark-3.3.1-bin-hadoop3\python\lib\pyspark.zip\pyspark\serializers.py", line 173, in _read_with_length
return self.loads(obj)
File "D:\myjava\spark\spark-3.3.1-bin-hadoop3\python\lib\pyspark.zip\pyspark\serializers.py", line 452, in loads
return pickle.loads(obj, encoding=encoding)
File "D:\myjava\spark\spark-3.3.1-bin-hadoop3\python\lib\pyspark.zip\pyspark\cloudpickle\cloudpickle.py", line 590, in _create_parametrized_type_hint
return origin[args]
File "D:\python366\lib\typing.py", line 682, in inner
return func(*args, **kwds)
File "D:\python366\lib\typing.py", line 1131, in getitem
_check_generic(self, params)
File "D:\python366\lib\site-packages\typing_extensions.py", line 113, in _check_generic
raise TypeError(f"Too {'many' if alen > elen else 'few'} parameters for {cls};"
TypeError: Too many parameters for typing.Iterable; actual 2, expected 1
My reasoning and what I have tried
The code and data themselves should be fine: the program is a textbook word count, and the worker dies inside pickle.loads while deserializing the task, before any of my lambdas run.
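What it fails on is rebuilding a typing.Iterable type hint, which makes me suspect an interpreter mismatch rather than the program: the D:\python366 path in the traceback looks like Python 3.6, while Spark 3.3.x lists Python 3.7+ as the minimum supported version. A minimal check of which interpreters Spark picks up (sys.version plus the standard PYSPARK_PYTHON / PYSPARK_DRIVER_PYTHON environment variables):

import os
import sys

# Interpreter running this script (the Spark driver)
print("driver python:", sys.version)

# Interpreter(s) PySpark launches for workers / the driver, if overridden
print("PYSPARK_PYTHON =", os.environ.get("PYSPARK_PYTHON"))
print("PYSPARK_DRIVER_PYTHON =", os.environ.get("PYSPARK_DRIVER_PYTHON"))

If these point at Python 3.6, switching the project interpreter to Python 3.7 or newer (and pointing PYSPARK_PYTHON at it) seems like the first thing to rule out.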