The difference between map and flatMap
map vs flatMap in Spark
September 24, 2014 | Big Data | example, spark
Earlier posts with Spark examples used RDD.flatMap(). This post looks at the differences between RDD.map() and RDD.flatMap().
map and flatMap are similar in that both take each element of the input RDD (here, a line) and apply a function to it. They differ in what that function returns: the function passed to map returns exactly one element per input, while the function passed to flatMap can return a list of elements (0 or more) as an iterator.
In addition, the output of flatMap is flattened. Although the function passed to flatMap returns a list of elements, flatMap returns an RDD that contains the individual elements of all those lists, not the lists themselves.
That may sound a bit confusing, so consider the snippet below. It applies both map and flatMap to the same input lines and writes the results to HDFS, in the wordsWithMap and wordsWithFlatMap folders.
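    from pyspark import SparkContext

    # Connect to the Spark master; "Map" is the application name.
    sc = SparkContext("spark://bigdata-vm:7077", "Map")
    lines = sc.parallelize(["hello world", "hi"])

    # map keeps one output element (a list of words) per input line;
    # flatMap flattens those lists into individual words.
    # coalesce(1) reduces each RDD to a single partition, so each folder gets one output file.
    wordsWithMap = lines.map(lambda line: line.split(" ")).coalesce(1)
    wordsWithFlatMap = lines.flatMap(lambda line: line.split(" ")).coalesce(1)

    wordsWithMap.saveAsTextFile("hdfs://localhost:9000/user/bigdatavm/wordsWithMap")
    wordsWithFlatMap.saveAsTextFile("hdfs://localhost:9000/user/bigdatavm/wordsWithFlatMap")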
The output of the map function in HDFS
The output of the flatMap function in HDFS
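The shape of the two results can also be reproduced without a cluster. The following is a minimal plain-Python sketch of the semantics only, not of Spark itself: ordinary lists stand in for RDDs, and the helper name split_words is an assumption introduced for illustration.

```python
from itertools import chain

# Same input as the Spark snippet, held in a plain list instead of an RDD.
lines = ["hello world", "hi"]


def split_words(line):
    """Hypothetical helper: split a line into words on single spaces."""
    return line.split(" ")


# map semantics: exactly one output element per input element.
# Each output element here is itself a list of words.
mapped = [split_words(line) for line in lines]

# flatMap semantics: apply the same function, then flatten the
# per-line lists into one sequence of individual words.
flat_mapped = list(chain.from_iterable(split_words(line) for line in lines))

print(mapped)       # [['hello', 'world'], ['hi']]
print(flat_mapped)  # ['hello', 'world', 'hi']
```

With map, the two input lines yield two elements, each a list. With flatMap, the same two lines yield three elements, each a single word.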
Conclusion
The function passed to map returns a single element, while the function passed to flatMap returns a list of elements (0 or more). In addition, the output of flatMap is flattened.
For word count, where each input line is split into multiple words, flatMap is the right choice. Likewise, for the weather data set, the extractData method validates each record and may or may not return a value. flatMap can be used there as well.
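The "might or might not return a value" case can be sketched as a function that returns either an empty list or a one-element list. This is a plain-Python illustration under stated assumptions: the "station,temperature" record format and the body of extract_data are hypothetical, since the original extractData method and the weather data layout are not shown here.

```python
from itertools import chain


def extract_data(record):
    """Hypothetical validator: return [(station, temperature)] for a
    well-formed "station,temperature" record, or [] for anything else."""
    fields = record.split(",")
    if len(fields) != 2:
        return []  # wrong number of fields: contribute nothing
    station, raw_temp = fields
    try:
        return [(station, float(raw_temp))]
    except ValueError:
        return []  # temperature is not a number: contribute nothing


# Made-up sample records, two valid and two invalid.
records = ["S1,21.5", "garbage", "S2,n/a", "S3,-4.0"]

# flatMap semantics in plain Python: empty lists vanish when flattened,
# so invalid records are dropped without a separate filter step.
valid = list(chain.from_iterable(extract_data(r) for r in records))

print(valid)  # [('S1', 21.5), ('S3', -4.0)]
```

On an actual RDD of records, the same function could be passed as rdd.flatMap(extract_data). Using map instead would keep one element per record, including the empty lists for invalid ones.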
Reposted from: https://my.oschina.net/fayebrooke/blog/689731