Hadoop Learning Notes 2
1. Getting Started
HDFS: storage
MapReduce: computation (Spark and Flink are alternative compute engines)
YARN: resource and job scheduling
Pseudo-distributed deployment
Requirements: environment setup, configuration files, passwordless SSH, startup (see the sketch below)
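A sketch of the typical startup sequence once those prerequisites are met (run from the Hadoop install directory; format only on the very first start):
  $ bin/hdfs namenode -format   # first start only: initializes the NameNode metadata
  $ sbin/start-dfs.sh           # starts NameNode, DataNode, SecondaryNameNode
  $ sbin/start-yarn.sh          # starts ResourceManager and NodeManager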
The jps command:
[hadoop@hadoop002 ~]$ jps
28288 NameNode           # NN
27120 Jps
28410 DataNode           # DN
28575 SecondaryNameNode  # SNN
2. MapReduce Job on YARN
[hadoop@hadoop002 hadoop]$ cp mapred-site.xml.template mapred-site.xml
[hadoop@hadoop002 hadoop]$
Configure parameters as follows:
etc/hadoop/mapred-site.xml:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
etc/hadoop/yarn-site.xml:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Start ResourceManager daemon and NodeManager daemon:
  $ sbin/start-yarn.sh
Open the ResourceManager web UI: http://47.75.249.8:8088/
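If YARN came up correctly, jps now shows the two YARN daemons alongside the HDFS ones (the pids below are made up for illustration):
[hadoop@hadoop002 hadoop]$ jps
28288 NameNode
28410 DataNode
28575 SecondaryNameNode
29056 ResourceManager
29160 NodeManager
29423 Jps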
3. Running an MR Job
Linux: local filesystem commands such as mkdir and ls
HDFS: distributed filesystem
Format the NameNode once before first use: hdfs namenode -format
HDFS shell commands mirror their Linux counterparts: hdfs dfs -<command>, as shown below
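A few illustrative pairs (paths are examples):
  hdfs dfs -ls /           # Linux: ls /
  hdfs dfs -mkdir /tmp/d   # Linux: mkdir /tmp/d
  hdfs dfs -cat /tmp/f     # Linux: cat /tmp/f
  hdfs dfs -rm /tmp/f      # Linux: rm /tmp/f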
Make the HDFS directories required to execute MapReduce jobs:
  $ bin/hdfs dfs -mkdir /user
  $ bin/hdfs dfs -mkdir /user/<username>
Copy the input files into the distributed filesystem:
  $ bin/hdfs dfs -put etc/hadoop input
Run some of the examples provided:
  $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar grep input output 'dfs[a-z.]+'
Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
  $ bin/hdfs dfs -get output output
  $ cat output/*
or
View the output files on the distributed filesystem:
  $ bin/hdfs dfs -cat output/*
-------------------------------------------------
bin/hdfs dfs -mkdir /user/hadoop/input
bin/hdfs dfs -put etc/hadoop/core-site.xml /user/hadoop/input
bin/hadoop jar \
share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar \
grep \
/user/hadoop/input \
/user/hadoop/output \
'fs[a-z.]+'
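To inspect this job's result (part-r-00000 is the default MapReduce output file name; list the directory first if unsure):
  bin/hdfs dfs -ls /user/hadoop/output
  bin/hdfs dfs -cat /user/hadoop/output/part-r-00000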
4. Making All Three HDFS Daemons Start Bound to hadoop002
Which host each daemon binds to is controlled by:
NN:  the fs.defaultFS parameter in core-site.xml
DN:  the slaves file
SNN: the hdfs-site.xml properties below (sketches of the NN and DN settings follow after them)
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop002:50090</value>
</property>
<property>
    <name>dfs.namenode.secondary.https-address</name>
    <value>hadoop002:50091</value>
</property>
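For completeness, minimal sketches of the NN and DN settings (the 9000 port is an assumption; use whatever your deployment configured):
etc/hadoop/core-site.xml (controls where the NN starts):
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop002:9000</value>
</property>
etc/hadoop/slaves (controls where DNs start, one hostname per line):
hadoop002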
5. jps
[hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ jps
16188 DataNode
16379 SecondaryNameNode
16566 Jps
16094 NameNode
[hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$
5.1 Location (jps ships with the JDK)
[hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ which jps
/usr/java/jdk1.7.0_80/bin/jps
[hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$
5.2 Running jps as other users
[root@hadoop002 ~]# jps
16188 -- process information unavailable
16607 Jps
16379 -- process information unavailable
16094 -- process information unavailable
[root@hadoop002 ~]#
[root@hadoop002 ~]# useradd jepson
[root@hadoop002 ~]# su - jepson
[jepson@hadoop002 ~]$ jps
16664 Jps
[jepson@hadoop002 ~]$
Case 1: "process information unavailable", but the process is actually still running:
[root@hadoop002 ~]# kill -9 16094
[root@hadoop002 ~]#
[root@hadoop002 ~]# jps
16188 -- process information unavailable
16379 -- process information unavailable
16702 Jps
16094 -- process information unavailable
[root@hadoop002 ~]#
[root@hadoop002 ~]# ps -ef|grep 16094
root     16722 16590  0 22:19 pts/4    00:00:00 grep 16094
[root@hadoop002 ~]#
Case 2: "process information unavailable", and the process really is gone:
The correct way to handle "process information unavailable" (a helper sketch follows this list):
1. Find the process id (pid).
2. Check whether the pid still exists: ps -ef | grep pid
3. If it exists, step 2 also tells you which user is running the process;
   su - that user and investigate from there.
   Note: if you delete the /tmp/hsperfdata_${user}/<pid> file, the process
   keeps running, but jps stops showing it, and every script that depends
   on jps output will break.
4. If it does not exist, clean up the leftover entry:
   rm -f /tmp/hsperfdata_${user}/<pid>
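A small helper sketch automating steps 2-4 (the script name is hypothetical; it assumes the /tmp/hsperfdata_<user>/<pid> layout described above):
#!/bin/bash
# clean_hsperfdata.sh: for each pid file under /tmp/hsperfdata_<user>/,
# keep it if the process is alive, remove it only if the process is gone.
for f in /tmp/hsperfdata_*/[0-9]*; do
    [ -e "$f" ] || continue                 # glob matched nothing
    pid=$(basename "$f")
    owner=$(basename "$(dirname "$f")")     # e.g. hsperfdata_hadoop
    if ps -p "$pid" > /dev/null 2>&1; then
        echo "pid $pid ($owner) is alive, keeping $f"
    else
        echo "pid $pid ($owner) is gone, removing stale $f"
        rm -f "$f"
    fi
done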
Reference: http://blog.itpub.net/30089851/viewspace-1994344/
6. Supplementary Commands
ssh root@<ip> -p 22          # log in over SSH on port 22
ssh root@47.75.249.8 date    # run one command remotely without an interactive shell
rz / sz                      # upload/download between a Windows client and Linux (lrzsz)
How do you transfer files between two Linux machines?
hadoop000 --> hadoop002:
[ruoze@hadoop000 ~]$ scp test.log root@47.75.249.8:/tmp/
This copies a file from the current Linux machine to the remote one.
hadoop000 <-- hadoop002:
[ruoze@hadoop002 ~]$ scp test.log root@hadoop000:/tmp/
But hadoop002 is a production machine that you may not be allowed to log in to,
so pull the file from hadoop000 instead:
scp root@47.75.249.8:/tmp/test.log /tmp/rz.log
However, in production you will never be given the root password,
which is why you set up SSH trust between machines.
Passwordless SSH trust among multiple machines:
http://blog.itpub.net/30089851/viewspace-1992210/
Pitfalls:
  - transfer the .pub file with scp
  - put the IPs and hostnames of all machines in /etc/hosts
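A minimal sketch of one-way trust (A = hadoop000 logs in to B = hadoop002 without a password; hostnames assumed from this note, run as the user that needs access; ssh-copy-id does the same in one step):
On A:
  ssh-keygen -t rsa                      # accept the defaults, empty passphrase
  scp ~/.ssh/id_rsa.pub hadoop002:/tmp/  # type B's password this one time
On B:
  mkdir -p ~/.ssh && chmod 700 ~/.ssh
  cat /tmp/id_rsa.pub >> ~/.ssh/authorized_keys
  chmod 600 ~/.ssh/authorized_keys
Back on A:
  ssh hadoop002 date                     # should now work without a password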
--------------------------------------------
Homework:
1. YARN pseudo-distributed deployment: +1 blog post
2. MR job: +1 blog post
3. HDFS daemon startup on hadoop002: +1 blog post
4. Write up jps as 1 blog post
5. Install one more VM;
   SSH trust among multiple machines: 1 blog post
6. Extra exercise:
   rm -rf ~/.ssh
   For machine A to access machine B without a password, whose .pub file gets copied to whom?
Reprinted from: https://my.oschina.net/u/3862440/blog/2246054