Spark _12_ Per-site, per-region visit counts, sorted from largest to smallest
This post works through a small Spark exercise: for each site in a page-view log, compute the visit count of each region, sorted from largest to smallest. It is shared here as a reference.
Code:
package ddd.henu.pvuv

import org.apache.spark.{SparkConf, SparkContext}

import scala.collection.mutable
import scala.collection.mutable.ListBuffer

object RegionScala {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    conf.setMaster("local")
    conf.setAppName("test")
    val sc = new SparkContext(conf)
    val lines = sc.textFile("./data/pvuvdata")

    // Per-site, per-region visit counts
    // (field 5 of each tab-separated line is the site, field 1 is the region)
    val site_local = lines.map(line => {
      (line.split("\t")(5), line.split("\t")(1))
    })
    val site_localIterable = site_local.groupByKey()

    val result = site_localIterable.map(one => {
      val localMap = mutable.Map[String, Int]()
      val site = one._1
      val localIter = one._2.iterator
      // Count how many times each region appears for this site
      while (localIter.hasNext) {
        val local = localIter.next()
        if (localMap.contains(local)) {
          val value = localMap.get(local).get
          localMap.put(local, value + 1)
        } else {
          localMap.put(local, 1)
        }
      }
      // sortBy on the count is ascending, so the first three entries below
      // are the three smallest counts for the site (see the sample output)
      val tuples: List[(String, Int)] = localMap.toList.sortBy(one => {
        one._2
      })
      // Keep at most three (region, count) pairs per site
      if (tuples.size > 3) {
        val returnList = new ListBuffer[(String, Int)]()
        for (i <- 0 to 2) {
          returnList.append(tuples(i))
        }
        (site, returnList)
      } else {
        (site, tuples)
      }
    })
    result.foreach(println)

    /**
      * (www.suning.com,ListBuffer((海南,509), (甘肅,512), (浙江,514)))
      * (www.gome.com.cn,ListBuffer((浙江,489), (貴州,501), (云南,504)))
      * (www.jd.com,ListBuffer((青海,492), (浙江,499), (甘肅,510)))
      * (www.dangdang.com,ListBuffer((河北,490), (云南,520), (黑龍江,521)))
      * (www.taobao.com,ListBuffer((臺灣,493), (青海,496), (浙江,504)))
      * (www.mi.com,ListBuffer((安徽,486), (河北,489), (浙江,496)))
      * (www.baidu.com,ListBuffer((遼寧,509), (天津,510), (甘肅,511)))
      */
  }
}
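One detail worth noting: sortBy(one => one._2) orders the list in ascending order, and the loop then keeps the first three entries, so what gets printed are the three regions with the smallest visit counts per site (the counts in the sample output increase from left to right). If what you want is the ordering stated in the title, largest to smallest, a more compact variant could look like the sketch below. It keeps only the assumptions visible in the original code (tab-separated input at ./data/pvuvdata, region in field 1, site in field 5); the object name RegionTopN and the reduceByKey-based structure are an illustrative rewrite, not the original author's code.

package ddd.henu.pvuv

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: per-site top-3 regions by visit count, in descending order.
object RegionTopN {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("regionTopN")
    val sc = new SparkContext(conf)

    val lines = sc.textFile("./data/pvuvdata")

    val top3 = lines
      .map(line => {
        val fields = line.split("\t")
        ((fields(5), fields(1)), 1)                  // ((site, region), 1)
      })
      .reduceByKey(_ + _)                            // ((site, region), visit count)
      .map { case ((site, region), count) => (site, (region, count)) }
      .groupByKey()                                  // (site, all (region, count) pairs)
      .mapValues(_.toList.sortBy(-_._2).take(3))     // three largest counts, descending

    top3.foreach(println)

    sc.stop()
  }
}

Counting with reduceByKey before grouping pre-aggregates each (site, region) pair on the map side, so the shuffle moves one count per combination instead of one record per page view; the final sort then runs on a short in-memory list per site.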
Summary
The above is the complete content for this Spark _12_ exercise (per-site, per-region visit counts, sorted from largest to smallest); hopefully it helps you solve the problem you ran into.