Python + Hadoop (Example)
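How can Python connect to Hadoop and use Hadoop's resources? This article walks through a simple example.

I. Python map/reduce code

This assumes you are already reasonably familiar with Hadoop. We need a mapper and a reducer; the code for each follows.

1. mapper.py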
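#!/usr/bin/env python
import sys

# Emit "<word>\t1" for every whitespace-separated token read from stdin.
for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        # The parenthesized form of print works under both Python 2 and Python 3.
        print('%s\t%s' % (word, 1))

2. reducer.py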
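#!/usr/bin/env python
import sys

current_word = None
current_count = 0

# Input arrives sorted by key, so all lines for one word are adjacent.
for line in sys.stdin:
    words = line.strip()
    word, count = words.split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        # Skip lines whose count is not a number.
        continue
    if current_word == word:
        current_count += count
    else:
        # A new word starts: flush the total for the previous one.
        if current_word:
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word

# Flush the last word (skipped when there was no valid input at all).
if current_word:
    print('%s\t%s' % (current_word, current_count))

With both scripts in place, test them locally: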
[qiu.li@l-tdata5.tkt.cn6 /export/python]$ echo "I like python hadoop , hadoop very good" | ./mapper.py | sort -k 1,1 | ./reducer.py
,	1
good	1
hadoop	2
I	1
like	1
python	1
very	1

II. Uploading files

The local test looks fine, so we are halfway there. Next, upload a few files to Hadoop for further testing. I found some files online and downloaded them with these commands:

wget http://www.gutenberg.org/ebooks/20417.txt.utf-8
wget http://www.gutenberg.org/files/5000/5000-8.txt
wget http://www.gutenberg.org/ebooks/4300.txt.utf-8

Check the downloaded files:
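[qiu.li@l-tdata5.tkt.cn6 /export/python]$ ls
20417.txt.utf-8  4300.txt.utf-8  5000-8.txt  mapper.py  reducer.py  run.sh

Upload the files to Hadoop with the following command:

hadoop dfs -put ./*.txt /user/ticketdev/tmp

(Hadoop is already configured, and the target directory already exists. Note that ./*.txt matches only names ending in .txt; of the files listed above, that is 5000-8.txt, so the two .txt.utf-8 downloads would need renaming or a broader pattern to be uploaded by this command.)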
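As a rough mental model of what the streaming job does with the two scripts, the sketch below reproduces the map, sort, and reduce stages in a single Python process. It is only an illustration, not how Hadoop is implemented: the function names (map_words, reduce_counts, simulate_streaming) are made up for this example, and the in-memory sort merely stands in for Hadoop's shuffle phase, under the assumption that all input fits in memory.

```python
from itertools import groupby


def map_words(lines):
    """Mimic mapper.py: emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1


def reduce_counts(sorted_pairs):
    """Mimic reducer.py: sum the counts of consecutive pairs that share a key."""
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def simulate_streaming(lines):
    """Map, sort by key (a stand-in for the shuffle), then reduce."""
    pairs = sorted(map_words(lines), key=lambda pair: pair[0])
    return list(reduce_counts(pairs))


if __name__ == "__main__":
    # Hypothetical sample input, the same sentence used in the local shell test.
    sample = ["I like python hadoop , hadoop very good"]
    for word, count in simulate_streaming(sample):
        print("%s\t%s" % (word, count))
```

The counts agree with the shell test in section I, although the line order may differ because the shell's sort follows the locale's collation rules while Python compares code points. The reducer relies on identical keys being adjacent, which is why the sort has to happen between the two stages.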
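Create run.sh:

hadoop jar $STREAM \
-files ./mapper.py,./reducer.py \
-mapper ./mapper.py \
-reducer ./reducer.py \
-input /user/ticketdev/tmp/*.txt \
-output /user/ticketdev/tmp/output

($STREAM is not defined in the original; it presumably holds the path to the Hadoop Streaming jar.)

Check the results: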
[qiu.li@l-tdata5.tkt.cn6 /export/python]$ hadoop dfs -cat /user/ticketdev/tmp/output/part-00000 | sort -nk 2 | tail
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
it	2387
which	2387
that	2668
a	3797
is	4097
to	5079
in	5226
and	7611
of	10388
the	20583

III. References:
http://www.cnblogs.com/wing1995/p/hadoop.html?utm_source=tuicool&utm_medium=referral