Hadoop Shell Commands and WordCount
Preface
In the previous two chapters we introduced the three ways of installing Hadoop (Standalone mode / Pseudo-Distributed mode / Cluster mode). In this chapter we look at how to perform some basic operations with the HDFS shell commands. The official documentation is the Hadoop Shell Commands guide (reference [1]).
Main Content
Prerequisites
A Hadoop cluster has been installed and started. From the NameNode web UI you can see the directory tree of the HDFS file system.
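Before running any HDFS commands it helps to confirm the daemons are actually up. A minimal check, assuming a single-node setup like the one installed in the previous chapters:

```bash
# List the running Java daemons; a healthy single-node setup typically shows
# NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager
jps

# The NameNode web UI (port 50070 by default in Hadoop 2.x) shows the HDFS directory tree,
# e.g. http://localhost:50070
```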
Basic Operations
For any file system, the most frequently used operations are create, delete, read, and update, together with the permission system; these are the basic commands for working with HDFS as well. They are listed below (a short end-to-end sketch follows the list):
- ls: list a directory
- put: upload a file
  The default block size is 128 MB; a file larger than that is split into two or more blocks.
- cat: view the content of a file
- get: download a file
- mkdir: create a directory
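The following is a minimal walk-through of these five commands. The paths and the local file sample.txt are hypothetical; adjust them to your own setup.

```bash
# List the HDFS root directory
hadoop fs -ls /

# Create a directory (-p also creates missing parent directories)
hadoop fs -mkdir -p /demo/input

# Upload a local file (sample.txt is a hypothetical local file)
hadoop fs -put sample.txt /demo/input/

# Print the file content, then download it back to the local file system
hadoop fs -cat /demo/input/sample.txt
hadoop fs -get /demo/input/sample.txt ./sample_copy.txt

# For a file larger than the 128 MB block size, fsck shows it split into several blocks
hdfs fsck /demo/input/sample.txt -files -blocks
```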
Others
All HDFS shell commands can be listed by simply running hadoop fs with no arguments:
localhost:mapreduce Sean$ hadoop fs
Usage: hadoop fs [generic options]
    [-appendToFile <localsrc> ... <dst>]
    [-cat [-ignoreCrc] <src> ...]
    [-checksum <src> ...]
    [-chgrp [-R] GROUP PATH...]
    [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
    [-chown [-R] [OWNER][:[GROUP]] PATH...]
    [-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
    [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
    [-count [-q] [-h] <path> ...]
    [-cp [-f] [-p | -p[topax]] <src> ... <dst>]
    [-createSnapshot <snapshotDir> [<snapshotName>]]
    [-deleteSnapshot <snapshotDir> <snapshotName>]
    [-df [-h] [<path> ...]]
    [-du [-s] [-h] <path> ...]
    [-expunge]
    [-find <path> ... <expression> ...]
    [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
    [-getfacl [-R] <path>]
    [-getfattr [-R] {-n name | -d} [-e en] <path>]
    [-getmerge [-nl] <src> <localdst>]
    [-help [cmd ...]]
    [-ls [-d] [-h] [-R] [<path> ...]]
    [-mkdir [-p] <path> ...]
    [-moveFromLocal <localsrc> ... <dst>]
    [-moveToLocal <src> <localdst>]
    [-mv <src> ... <dst>]
    [-put [-f] [-p] [-l] <localsrc> ... <dst>]
    [-renameSnapshot <snapshotDir> <oldName> <newName>]
    [-rm [-f] [-r|-R] [-skipTrash] <src> ...]
    [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
    [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
    [-setfattr {-n name [-v value] | -x name} <path>]
    [-setrep [-R] [-w] <rep> <path> ...]
    [-stat [format] <path> ...]
    [-tail [-f] <file>]
    [-test -[defsz] <path>]
    [-text [-ignoreCrc] <src> ...]
    [-touchz <path> ...]
    [-truncate [-w] <length> <path> ...]
    [-usage [cmd ...]]

Generic options supported are
    -conf <configuration file>                      specify an application configuration file
    -D <property=value>                             use value for given property
    -fs <local|namenode:port>                       specify a namenode
    -jt <local|resourcemanager:port>                specify a ResourceManager
    -files <comma separated list of files>          specify comma separated files to be copied to the map reduce cluster
    -libjars <comma separated list of jars>         specify comma separated jar files to include in the classpath.
    -archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
    bin/hadoop command [genericOptions] [commandOptions]
- help: prints the usage manual for a command.
- mkdir: creates a directory. hadoop fs -mkdir -p /abc/acc
- moveFromLocal / moveToLocal: moveFromLocal moves a local file to HDFS (the local source file is deleted), e.g. hadoop fs -moveFromLocal abc.txt /; moveToLocal moves a file from HDFS to the local file system (the HDFS source file is deleted), e.g. hadoop fs -moveToLocal /abc.txt .
- appendToFile: appends the content of a local file to a file on HDFS. hadoop fs -appendToFile abc.txt /hello2019.txt
- cat: prints a file. hadoop fs -cat /hello2019.sh. If the content is long, use hadoop fs -cat /hello2019.sh | more or hadoop fs -tail /hello2019.sh.
- tail: prints the end of a file. hadoop fs -tail /hello2019.sh
- text: prints the content of a file in text form. hadoop fs -text /hello2019.sh
- chgrp / chmod / chown: chgrp changes a file's group; chmod changes its permissions; chown changes its owner and group (examples are in the sketch after this list).
- copyFromLocal / copyToLocal: copy from the local file system to HDFS; copy from HDFS to the local file system.
- cp: copy within HDFS. hadoop fs -cp /hello2019.sh /a/hello2019.sh
- mv: move within HDFS. hadoop fs -mv /hello2019.sh /a/
- get: download a file to the local file system, similar to copyToLocal. hadoop fs -get /hello.sh
- getmerge: merge and download multiple files. hadoop fs -getmerge /wordcount/output/* hellomerge.sh
- put: upload a local file to HDFS, similar to copyFromLocal. hadoop fs -put hello2019.sh /
- rm: delete. hadoop fs -rm -r /hello2019.sh
- rmdir: delete an empty directory. hadoop fs -rmdir /abbc
- df: show the free and used space of the file system. hadoop fs -df -h /
- du: show the size of a directory. hadoop fs -du -s -h /abc/d
- count: count the files and directories under a path. hadoop fs -count /aaa/
- setrep: set the replication factor of a file. If there are only 3 DataNodes but the factor is set to 10, 10 replicas are not actually created; the value is recorded in the NameNode metadata, while the real number of replicas is capped by the number of DataNodes (see the sketch after this list).
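As referenced above, here is a sketch of the permission and replication commands. The file /hello2019.txt, the user sean, and the group hadoop are assumptions for illustration only.

```bash
# Change group, permissions, and owner of a file on HDFS
hadoop fs -chgrp hadoop /hello2019.txt
hadoop fs -chmod 644 /hello2019.txt
hadoop fs -chmod -R 755 /abc          # -R applies the change recursively
hadoop fs -chown sean:hadoop /hello2019.txt

# Set the replication factor to 2 and wait (-w) until the change has propagated
hadoop fs -setrep -w 2 /hello2019.txt

# Requesting more replicas than there are DataNodes only updates the NameNode metadata;
# the actual replica count stays capped at the number of DataNodes
hadoop fs -setrep 10 /hello2019.txt
hadoop fs -stat "replication=%r" /hello2019.txt
```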
WordCount
localhost:mapreduce Sean$ hadoop jar hadoop-mapreduce-examples-2.7.5.jar wordcount /wordcount/input/ /wordcount/output
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
19/03/30 16:43:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/03/30 16:43:31 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/03/30 16:43:32 INFO input.FileInputFormat: Total input paths to process : 1
19/03/30 16:43:32 INFO mapreduce.JobSubmitter: number of splits:1
19/03/30 16:43:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1553933297569_0001
19/03/30 16:43:33 INFO impl.YarnClientImpl: Submitted application application_1553933297569_0001
19/03/30 16:43:33 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1553933297569_0001/
19/03/30 16:43:33 INFO mapreduce.Job: Running job: job_1553933297569_0001
19/03/30 16:43:43 INFO mapreduce.Job: Job job_1553933297569_0001 running in uber mode : false
19/03/30 16:43:43 INFO mapreduce.Job:  map 0% reduce 0%
19/03/30 16:43:48 INFO mapreduce.Job:  map 100% reduce 0%
19/03/30 16:43:54 INFO mapreduce.Job:  map 100% reduce 100%
19/03/30 16:43:54 INFO mapreduce.Job: Job job_1553933297569_0001 completed successfully
19/03/30 16:43:54 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=74
        FILE: Number of bytes written=243693
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=157
        HDFS: Number of bytes written=44
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=3271
        Total time spent by all reduces in occupied slots (ms)=3441
        Total time spent by all map tasks (ms)=3271
        Total time spent by all reduce tasks (ms)=3441
        Total vcore-milliseconds taken by all map tasks=3271
        Total vcore-milliseconds taken by all reduce tasks=3441
        Total megabyte-milliseconds taken by all map tasks=3349504
        Total megabyte-milliseconds taken by all reduce tasks=3523584
    Map-Reduce Framework
        Map input records=7
        Map output records=8
        Map output bytes=74
        Map output materialized bytes=74
        Input split bytes=115
        Combine input records=8
        Combine output records=6
        Reduce input groups=6
        Reduce shuffle bytes=74
        Reduce input records=6
        Reduce output records=6
        Spilled Records=12
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=123
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=306184192
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=42
    File Output Format Counters
        Bytes Written=44
The output directory must not already exist, otherwise the job will refuse to run; delete it before re-running. After this test, the output directory contains two files: _SUCCESS and part-r-00000. The first is an empty success marker, the second holds the actual result.
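For reference, a minimal end-to-end run looks roughly like this; words.txt is a hypothetical local input file.

```bash
# Prepare the input directory and upload a text file
hadoop fs -mkdir -p /wordcount/input
hadoop fs -put words.txt /wordcount/input/

# The output directory must not exist yet; remove it if it is left over from an earlier run
hadoop fs -rm -r /wordcount/output

# Run the bundled WordCount example
hadoop jar hadoop-mapreduce-examples-2.7.5.jar wordcount /wordcount/input/ /wordcount/output

# _SUCCESS is an empty marker file; part-r-00000 contains lines of "word<TAB>count"
hadoop fs -ls /wordcount/output
hadoop fs -cat /wordcount/output/part-r-00000
```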
The examples jar in this directory contains many other test programs that are worth exploring on your own.
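Running the examples jar without arguments should print the list of bundled example programs (wordcount, grep, pi, and so on):

```bash
# Lists the valid example program names bundled in the jar
hadoop jar hadoop-mapreduce-examples-2.7.5.jar
```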
Basic Principle (Overview)
Files stored in HDFS are ultimately stored on the local disks of the nodes; HDFS is simply a distributed file system layered on top of them. Let's look at the file we stored earlier.
Note: on my machine the DataNode and the NameNode are installed together, so the directory tree under the data directory looks like this:
localhost:tmp Sean$ tree .
.
├── dfs
│   ├── data
│   │   ├── current
│   │   │   ├── BP-586017156-127.0.0.1-1553485799471
│   │   │   │   ├── current
│   │   │   │   │   ├── VERSION
│   │   │   │   │   ├── dfsUsed
│   │   │   │   │   ├── finalized
│   │   │   │   │   │   └── subdir0
│   │   │   │   │   │       └── subdir0
│   │   │   │   │   │           ├── blk_1073741825
│   │   │   │   │   │           └── blk_1073741825_1001.meta
│   │   │   │   │   └── rbw
│   │   │   │   ├── scanner.cursor
│   │   │   │   └── tmp
│   │   │   └── VERSION
│   │   └── in_use.lock
│   ├── name
│   │   ├── current
│   │   │   ├── VERSION
│   │   │   ├── edits_0000000000000000001-0000000000000000118
│   │   │   ├── edits_inprogress_0000000000000001233
│   │   │   ├── fsimage_0000000000000001230
│   │   │   ├── fsimage_0000000000000001230.md5
│   │   │   ├── fsimage_0000000000000001232
│   │   │   ├── fsimage_0000000000000001232.md5
│   │   │   └── seen_txid
│   │   └── in_use.lock
│   └── namesecondary
│       ├── current
│       │   ├── VERSION
│       │   ├── edits_0000000000000000001-0000000000000000118
│       │   ├── edits_0000000000000000119-0000000000000000943
│       │   ├── edits_0000000000000001231-0000000000000001232
│       │   ├── fsimage_0000000000000001230
│       │   ├── fsimage_0000000000000001230.md5
│       │   ├── fsimage_0000000000000001232
│       │   └── fsimage_0000000000000001232.md5
│       └── in_use.lock
└── nm-local-dir
    ├── filecache
    ├── nmPrivate
    └── usercache

- NameNode: manages the file system metadata
- DataNode: stores and manages the file blocks (data transfer over Socket / Netty)
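To see that a block really is just a local file, you can inspect the DataNode's data directory directly. A sketch, run from the hadoop.tmp.dir shown in the tree above (the block pool ID and block number will differ on your machine):

```bash
# Blocks live as ordinary files under the DataNode data directory
find dfs/data/current -name 'blk_*' ! -name '*.meta'

# For a small text file, the block file contains exactly the bytes that were uploaded
cat dfs/data/current/BP-586017156-127.0.0.1-1553485799471/current/finalized/subdir0/subdir0/blk_1073741825
```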
HDFS Commands
- hdfs dfsadmin -report: show the cluster status
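A few related health checks (standard HDFS tooling, shown here as a brief sketch):

```bash
# Cluster-wide capacity, remaining space, and per-DataNode details
hdfs dfsadmin -report

# The NameNode stays in safe mode until enough blocks have been reported
hdfs dfsadmin -safemode get

# File system health check, including missing, corrupt, or under-replicated blocks
hdfs fsck /
```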
Reference
[1] Hadoop Shell Commands (official documentation)
[2] Introduction to the hadoop and hdfs commands in Hadoop
[3] Hadoop Study Notes 4: Common HDFS Commands