Setting Up a Hadoop Pseudo-Distributed Cluster
After combining several online guides and repeatedly correcting the problems that came up while starting and running Hadoop, I finally got a pseudo-distributed installation of Hadoop 2.5.2 working and ran the wordcount example. Part of what makes installing Hadoop hard is that although installation guides abound, very few can be followed end to end, and the configuration differs greatly between Hadoop versions. Hadoop 2.5.2 was released a couple of days ago, but its configuration is no different from 2.5.0 and 2.5.1.
System environment: Ubuntu 12.04 LTS x86_32
Create the hadoop group, add a hadoop user to it, and switch to that user:

sudo addgroup hadoop
sudo useradd -g hadoop hadoop
su hadoop
Generate an SSH key (created at /home/hadoop/.ssh/id_rsa) and authorize it for local login:

ssh-keygen -t rsa -P ""
cat /home/hadoop/.ssh/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys
Check whether IPv6 is disabled (1 = disabled, 0 = enabled, the default):

cat /proc/sys/net/ipv6/conf/all/disable_ipv6
Edit /etc/sysctl.conf and add three lines to disable IPv6:

sudo vim /etc/sysctl.conf

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
Set the Java environment variables in /etc/profile:

export JAVA_HOME=/software/devsoftware/jdk1.7.0_55
export PATH=$JAVA_HOME/bin:$PATH
Set the Hadoop environment variables in /etc/profile:

export HADOOP_HOME=/home/hadoop/hadoop-2.5.2
export PATH=$HADOOP_HOME/bin:$PATH

Reload the profile:

source /etc/profile

Append the JDK path to /home/hadoop/hadoop-2.5.2/etc/hadoop/hadoop-env.sh:

export JAVA_HOME=/software/devsoftware/jdk1.7.0_55
core-site.xml:

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <!-- this directory must be created by hand -->
    <value>/home/hadoop/data/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <!-- file system properties -->
  <property>
    <name>fs.defaultFS</name>
    <!-- the HDFS service address; it must be a hostname, not an IP or localhost -->
    <value>hdfs://hostname:9000</value>
  </property>
  <property>
    <!-- use Hadoop's bundled native (.so) libraries -->
    <name>hadoop.native.lib</name>
    <value>true</value>
    <description>Should native hadoop libraries, if present, be used.</description>
  </property>
</configuration>

Create mapred-site.xml from the bundled template:

cp mapred-site.xml.template mapred-site.xml
Then configure it as follows:
mapred-site.xml:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <!-- the value is "yarn", all lowercase -->
    <value>yarn</value>
  </property>
</configuration>
yarn-site.xml (property names are case-sensitive and must be all lowercase; replace hostname with your machine's actual hostname):

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <!-- mapreduce_shuffle, not mapreduce.shuffle -->
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address</name>
    <value>hostname:18040</value>
  </property>
  <property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hostname:18030</value>
  </property>
  <property>
    <description>The address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hostname:18088</value>
  </property>
  <property>
    <description>The address of the resource tracker interface.</description>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hostname:8025</value>
  </property>
</configuration>

hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <!-- create this directory by hand -->
    <value>/home/hadoop/data/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <!-- create this directory by hand -->
    <value>/home/hadoop/data/hdfs/data</value>
  </property>
  <property>
    <!-- HDFS replication factor -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Format the NameNode, then start all daemons:

hadoop namenode -format
/home/hadoop/hadoop-2.5.2/sbin/start-all.sh

jps output after a successful start:

10682 DataNode
10463 NameNode
11229 ResourceManager
24647 Jps
11040 SecondaryNameNode
11455 NodeManager

Ports in use (netstat -anp | grep java):

tcp        0      0 0.0.0.0:8042            0.0.0.0:*               LISTEN      11455/java
tcp        0      0 0.0.0.0:50090           0.0.0.0:*               LISTEN      11040/java
tcp        0      0 0.0.0.0:50070           0.0.0.0:*               LISTEN      10463/java
tcp        0      0 0.0.0.0:8088            0.0.0.0:*               LISTEN      11229/java
tcp        0      0 0.0.0.0:34456           0.0.0.0:*               LISTEN      11455/java
tcp        0      0 0.0.0.0:13562           0.0.0.0:*               LISTEN      11455/java
tcp        0      0 0.0.0.0:50010           0.0.0.0:*               LISTEN      10682/java
tcp        0      0 0.0.0.0:50075           0.0.0.0:*               LISTEN      10682/java
tcp        0      0 0.0.0.0:8030            0.0.0.0:*               LISTEN      11229/java
tcp        0      0 0.0.0.0:8031            0.0.0.0:*               LISTEN      11229/java
tcp        0      0 0.0.0.0:8032            0.0.0.0:*               LISTEN      11229/java
tcp        0      0 0.0.0.0:8033            0.0.0.0:*               LISTEN      11229/java
tcp        0      0 0.0.0.0:50020           0.0.0.0:*               LISTEN      10682/java
tcp        0      0 0.0.0.0:8040            0.0.0.0:*               LISTEN      11455/java

Web interfaces:

http://hostname:50070   (NameNode / HDFS status)
http://hostname:8088    (ResourceManager / MapReduce jobs)
Create a local input file, create an HDFS directory, upload the file, and run WordCount:

echo "My first hadoop example. Hello Hadoop in input. " > /home/hadoop/input
hadoop fs -mkdir /user/hadooper
hadoop fs -put /home/hadoop/input /user/hadooper
hadoop jar /home/hadoop/hadoop-2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar wordcount /user/hadooper/input /user/hadooper/output

Job output:

hadoop@hostname:~/hadoop-2.5.2/share/hadoop/mapreduce$ hadoop jar /home/hadoop/hadoop-2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar wordcount /user/hadooper/input /user/hadooper/output
14/11/23 19:45:04 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/11/23 19:45:05 INFO input.FileInputFormat: Total input paths to process : 1
14/11/23 19:45:05 INFO mapreduce.JobSubmitter: number of splits:1
14/11/23 19:45:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1416742510596_0001
14/11/23 19:45:06 INFO impl.YarnClientImpl: Submitted application application_1416742510596_0001
14/11/23 19:45:07 INFO mapreduce.Job: The url to track the job: http://hostname:8088/proxy/application_1416742510596_0001/
14/11/23 19:45:07 INFO mapreduce.Job: Running job: job_1416742510596_0001
14/11/23 19:45:18 INFO mapreduce.Job: Job job_1416742510596_0001 running in uber mode : false
14/11/23 19:45:18 INFO mapreduce.Job:  map 0% reduce 0%
14/11/23 19:45:26 INFO mapreduce.Job:  map 100% reduce 0%
14/11/23 19:45:36 INFO mapreduce.Job:  map 100% reduce 100%
14/11/23 19:45:37 INFO mapreduce.Job: Job job_1416742510596_0001 completed successfully
14/11/23 19:45:37 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=102
        FILE: Number of bytes written=195793
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=168
        HDFS: Number of bytes written=64
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=5994
        Total time spent by all reduces in occupied slots (ms)=6925
        Total time spent by all map tasks (ms)=5994
        Total time spent by all reduce tasks (ms)=6925
        Total vcore-seconds taken by all map tasks=5994
        Total vcore-seconds taken by all reduce tasks=6925
        Total megabyte-seconds taken by all map tasks=6137856
        Total megabyte-seconds taken by all reduce tasks=7091200
    Map-Reduce Framework
        Map input records=1
        Map output records=8
        Map output bytes=80
        Map output materialized bytes=102
        Input split bytes=119
        Combine input records=8
        Combine output records=8
        Reduce input groups=8
        Reduce shuffle bytes=102
        Reduce input records=8
        Reduce output records=8
        Spilled Records=16
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=101
        CPU time spent (ms)=2640
        Physical memory (bytes) snapshot=422895616
        Virtual memory (bytes) snapshot=2055233536
        Total committed heap usage (bytes)=308281344
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=49
    File Output Format Counters
        Bytes Written=64

hadoop@hostname:~/hadoop-2.5.2/share/hadoop/mapreduce$ hadoop fs -cat /user/hadooper/output/part-r-00000
Hadoop  1
Hello   1
My      1
example.    1
first   1
hadoop  1
in      1
input.  1
Run the bundled pi example:

hadoop jar /home/hadoop/hadoop-2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar pi 10 10

If the NameNode is stuck in safe mode, leave it with:

hadoop dfsadmin -safemode leave
1. Create the user group and user

- Create a group: add a group named hadoop to the system.
- Create a user: add a user named hadoop to that group.
- Log in as the hadoop user.
2. Set up passwordless SSH login

- Generate an SSH key with ssh-keygen; the key is created at /home/hadoop/.ssh/id_rsa.
- Append the contents of /home/hadoop/.ssh/id_rsa.pub to /home/hadoop/.ssh/authorized_keys and save.
- Run ssh localhost to verify that login now works without a password.
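If passwordless login still prompts for a password, the usual culprit is file permissions: sshd silently ignores authorized_keys when the file or the .ssh directory is group- or world-accessible. Here is a minimal sketch of the authorize-and-lock-down steps, run against a throwaway directory so a real ~/.ssh is never touched; the key material is a hypothetical stand-in, not a real key.

```shell
# Sketch only: a temp dir stands in for /home/hadoop/.ssh.
SSH_DIR=$(mktemp -d)
# Hypothetical stand-in for the real id_rsa.pub content:
printf 'ssh-rsa AAAAB3NzaC1yc2E... hadoop@hostname\n' > "$SSH_DIR/id_rsa.pub"
# Same append step as in the article:
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
# Lock down permissions the way sshd expects:
chmod 700 "$SSH_DIR"
chmod 600 "$SSH_DIR/authorized_keys"
stat -c '%a' "$SSH_DIR/authorized_keys"   # prints 600
```

On a real host the same chmod commands apply to /home/hadoop/.ssh and its authorized_keys file.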
3. Disable IPv6

- Check whether IPv6 is currently disabled: 1 means disabled, 0 means enabled (the default is 0).
- Edit /etc/sysctl.conf and add the three lines shown earlier to disable IPv6.
- Reboot the machine and check again that IPv6 is disabled.
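A full reboot is not strictly required: running `sudo sysctl -p` reloads /etc/sysctl.conf immediately. The check itself can also be written defensively, since the proc file does not exist on kernels built without IPv6; this is a small sketch, not part of the original steps:

```shell
# Check the IPv6 flag, tolerating kernels without IPv6 support.
f=/proc/sys/net/ipv6/conf/all/disable_ipv6
if [ -r "$f" ]; then
  cat "$f"            # 1 = IPv6 disabled, 0 = still enabled
else
  echo "IPv6 not compiled into this kernel"
fi
```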
4. Install and configure the JDK

- Edit /etc/profile and set the Java-related environment variables.
5. Install and configure Hadoop 2.5.2

- Edit /etc/profile and set the Hadoop-related environment variables.
- Run source /etc/profile to make the variables take effect.
- Point Hadoop at the JDK by appending one line to /home/hadoop/hadoop-2.5.2/etc/hadoop/hadoop-env.sh.
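Whether the profile edits took effect can be checked mechanically. The sketch below simulates the two export lines with the paths used in this article (adjust them to your install) and verifies that the Hadoop bin directory ended up on PATH:

```shell
# Simulated profile entries (same values as the /etc/profile edits above):
JAVA_HOME=/software/devsoftware/jdk1.7.0_55
HADOOP_HOME=/home/hadoop/hadoop-2.5.2
PATH="$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH"
# Substring matching on ":$PATH:" avoids false positives from
# similarly named directories:
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "hadoop on PATH" ;;
  *)                      echo "hadoop missing from PATH" ;;
esac
```

In a real shell, `which hadoop` after `source /etc/profile` performs the same check.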
6. Hadoop 2.5.2 configuration files

Hadoop 2.5.2 has four configuration files to set up, all located under /home/hadoop/hadoop-2.5.2/etc/hadoop:

- core-site.xml
- yarn-site.xml
- mapred-site.xml
- hdfs-site.xml

Some of these settings require directories to be created by hand, and some require the machine's hostname. The hostname cannot be an IP address or localhost, and it must be mapped in /etc/hosts. One more note: several guides suggest binding 127.0.0.1 to only the one hostname that Hadoop uses and commenting out the rest. I have not tested whether this actually matters, but followed that advice and kept only one hostname.
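For reference, the /etc/hosts mapping described above would look roughly like the fragment below; 192.168.1.100 and hostname are placeholders for your machine's actual LAN IP and hostname:

```
127.0.0.1       localhost
192.168.1.100   hostname
```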
?
6.1 core-site.xml配置
?
?
Xml代碼???
6.2 mapred-site.xml

mapred-site.xml does not exist by default; copy it from mapred-site.xml.template with cp, then set mapreduce.framework.name to yarn as shown above.

6.3 yarn-site.xml

The full listing appears above. Note that the aux-service value is mapreduce_shuffle, not mapreduce.shuffle.
6.4 hdfs-site.xml

The name and data directories it points to must be created by hand; the full listing appears above.
7. Initialize and start Hadoop

- Format the Hadoop NameNode. Watch the log output: if it contains "Storage directory /home/hadoop/data/hdfs/name has been successfully formatted", the format succeeded.
- Start Hadoop with start-all.sh.
- Check the daemons with the JDK's jps tool; output like that shown above means the installation succeeded.
- Use netstat -anp | grep java to see which ports Hadoop is listening on.
- Browse NameNode and DataNode information, including HDFS status, at http://hostname:50070.
- Browse the ResourceManager at http://hostname:8088 to watch MapReduce jobs run.
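The jps check can be scripted. The sketch below runs against the captured output shown earlier rather than a live cluster; on a real machine you would replace the string with `$(jps)`:

```shell
# Captured jps output from the article (stand-in for a live `jps` call):
jps_out="10682 DataNode
10463 NameNode
11229 ResourceManager
11040 SecondaryNameNode
11455 NodeManager"
# Every daemon of a pseudo-distributed 2.5.x install must be present:
for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
  if echo "$jps_out" | grep -q "$d"; then
    echo "$d ok"
  else
    echo "$d MISSING"
  fi
done
```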
8. Run the bundled WordCount example

- Create a local file whose words will be counted.
- Create an HDFS input directory for the file.
- Upload the file to the HDFS input directory.
- Run the WordCount example that ships with Hadoop.
- Watch the MapReduce progress output.
- Inspect the result with hadoop fs -cat; the full run is shown in the listing above.
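The job's result can be sanity-checked locally, since the same count falls out of a plain shell pipeline. LC_ALL=C pins the sort order so the words come out in the same order as the part-r-00000 listing above (uppercase before lowercase):

```shell
# Local word count over the same input line the article feeds to HDFS:
echo "My first hadoop example. Hello Hadoop in input. " \
  | tr -s ' ' '\n' \
  | grep -v '^$' \
  | LC_ALL=C sort \
  | uniq -c
```

This prints the same eight words, each with a count of 1, matching the MapReduce output.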
9. Run the bundled PI example

Run the pi example shown above; with 10 maps and 10 samples per map, the result is 3.200000000000000000.
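The pi example estimates π by sampling points in the unit square and counting how many land inside the quarter circle; with only 10 maps × 10 samples the estimate is as coarse as the 3.2 above. The bundled example actually uses a quasi-random Halton sequence; the awk sketch below uses plain pseudo-random sampling just to illustrate the idea:

```shell
awk 'BEGIN {
  srand(1)                       # fixed seed; the exact estimate varies by awk build
  n = 100000; hits = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x*x + y*y <= 1) hits++   # point landed inside the quarter circle
  }
  printf "%.3f\n", 4 * hits / n  # approaches 3.14159... as n grows
}'
```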
10. Common Hadoop problems

1. If Hadoop exits abnormally, the NameNode enters Safe Mode after the restart and jobs cannot be submitted. Fix: take the NameNode out of safe mode with hadoop dfsadmin -safemode leave, as shown above.