Part 1: HDFS User Guide

Published: 2023/11/27


1. Notable HDFS features

Distributed storage: Hadoop, including HDFS, is well suited for distributed storage and distributed processing using commodity hardware. It is fault tolerant, scalable, and extremely simple to expand. MapReduce, well known for its simplicity and applicability to a large set of distributed applications, is an integral part of Hadoop.
Sensible defaults: HDFS is highly configurable, with a default configuration well suited for many installations. Most of the time, configuration needs to be tuned only for very large clusters.
Portability: Hadoop is written in Java and is supported on all major platforms.
Shell-like interface: Hadoop supports shell-like commands to interact with HDFS directly.
Built-in web servers: The NameNode and DataNodes have built-in web servers that make it easy to check the current status of the cluster.
New features and improvements are regularly implemented in HDFS. The following is a subset of useful features in HDFS:
- File permissions and authentication.
- Rack awareness: to take a node's physical location into account while scheduling tasks and allocating storage.
- Safemode: an administrative mode for maintenance.
- fsck: a utility to diagnose health of the file system, to find missing files or blocks.
- fetchdt: a utility to fetch DelegationToken and store it in a file on the local system.
- Balancer: tool to balance the cluster when the data is unevenly distributed among DataNodes.
- Upgrade and rollback: after a software upgrade, it is possible to rollback to HDFS' state before the upgrade in case of unexpected problems.
- Secondary NameNode: performs periodic checkpoints of the namespace and helps keep the size of file containing log of HDFS modifications within certain limits at the NameNode.
- Checkpoint node: performs periodic checkpoints of the namespace and helps minimize the size of the log stored at the NameNode containing changes to the HDFS. Replaces the role previously filled by the Secondary NameNode, though is not yet battle hardened. The NameNode allows multiple Checkpoint nodes simultaneously, as long as there are no Backup nodes registered with the system.
- Backup node: An extension to the Checkpoint node. In addition to checkpointing it also receives a stream of edits from the NameNode and maintains its own in-memory copy of the namespace, which is always in sync with the active NameNode namespace state. Only one Backup node may be registered with the NameNode at once.
  Source: http://hadoop.apache.org/docs/r2.6.4/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html

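2. Web UI

The NameNode web UI listens on port 50070 by default (in Hadoop 2.x, the version referenced here).

3. Basic HDFS administration commands

Usage: bin/hdfs dfsadmin -<option>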

-report: reports basic statistics of HDFS. Some of this information is also available on the NameNode front page.
-safemode: though usually not required, an administrator can manually enter or leave Safemode.
-finalizeUpgrade: removes the previous backup of the cluster made during the last upgrade.
-refreshNodes: updates the NameNode with the set of DataNodes allowed to connect to it. The NameNode re-reads the DataNode hostnames from the files defined by dfs.hosts and dfs.hosts.exclude. Hosts listed in dfs.hosts are the DataNodes that are part of the cluster; if dfs.hosts has any entries, only those hosts may register with the NameNode. Entries in dfs.hosts.exclude are DataNodes that need to be decommissioned. A DataNode completes decommissioning once all of its replicas have been replicated to other DataNodes. Decommissioned nodes are not shut down automatically and are not chosen as targets for new replicas.
-printTopology: prints the topology of the cluster as a tree of racks and the DataNodes attached to those racks, as viewed by the NameNode.
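The options above can be wrapped in a small script when they need to be run repeatedly. The following is only a sketch: the helper names `dfsadmin_command` and `run_dfsadmin` are hypothetical, the default path `bin/hdfs` assumes the script is started from the Hadoop install directory, and the `-safemode` actions listed (`enter`, `leave`, `get`, `wait`) are the ones the dfsadmin tool accepts but are not spelled out in the text above.

```python
import shutil
import subprocess

# Options taken from the list above; "-safemode" additionally needs an action.
DFSADMIN_OPTIONS = {
    "-report", "-safemode", "-finalizeUpgrade", "-refreshNodes", "-printTopology",
}
SAFEMODE_ACTIONS = {"enter", "leave", "get", "wait"}


def dfsadmin_command(option, *args, hdfs_bin="bin/hdfs"):
    """Build the argv list for one `hdfs dfsadmin` call (hypothetical helper)."""
    if option not in DFSADMIN_OPTIONS:
        raise ValueError(f"unsupported dfsadmin option: {option}")
    if option == "-safemode" and (len(args) != 1 or args[0] not in SAFEMODE_ACTIONS):
        raise ValueError("-safemode needs exactly one of: enter, leave, get, wait")
    return [hdfs_bin, "dfsadmin", option, *args]


def run_dfsadmin(option, *args, hdfs_bin="bin/hdfs"):
    """Run the command and return its stdout; needs a reachable cluster."""
    cmd = dfsadmin_command(option, *args, hdfs_bin=hdfs_bin)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found; run from the Hadoop install directory")
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


if __name__ == "__main__":
    # Only prints the command line; call run_dfsadmin() on a real cluster.
    print(" ".join(dfsadmin_command("-safemode", "get")))
```

Keeping command construction separate from execution makes it possible to check the generated command line on a machine that has no Hadoop installation.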

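4. Secondary NameNode

The NameNode appends file system modifications to an edit log on its local file system. When the NameNode starts, it reads the HDFS state from the image file (fsimage), applies the edits from the log to that image, and then opens a new edit log to receive further changes. Because the NameNode merges the image and the edit log only at startup, the edit log can grow very large, and the next restart then takes a long time because there is so much to merge.

The Secondary NameNode periodically merges the image and the edit log and keeps the edit log size within a limit. It usually runs on a different machine from the primary NameNode, but that machine should be provisioned like the NameNode's.

The checkpoint process on the Secondary NameNode is controlled by two parameters: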

dfs.namenode.checkpoint.period, set to 1 hour by default, specifies the maximum delay between two consecutive checkpoints, and
dfs.namenode.checkpoint.txns, set to 1 million by default, defines the number of uncheckpointed transactions on the NameNode which will force an urgent checkpoint, even if the checkpoint period has not been reached.
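How the two parameters interact can be sketched as a simple "whichever comes first" rule. This is an illustration of the rule as described above, not the actual Hadoop implementation; the class name `CheckpointPolicy` and its method are hypothetical, and only the two default values come from the text.

```python
from dataclasses import dataclass


@dataclass
class CheckpointPolicy:
    """Illustrative model of the two checkpoint triggers (hypothetical class)."""

    period_seconds: int = 3600   # dfs.namenode.checkpoint.period default: 1 hour
    txns: int = 1_000_000        # dfs.namenode.checkpoint.txns default: 1 million

    def should_checkpoint(self, seconds_since_last: int, uncheckpointed_txns: int) -> bool:
        # A checkpoint is due when either limit is reached, whichever comes first.
        return (
            seconds_since_last >= self.period_seconds
            or uncheckpointed_txns >= self.txns
        )


if __name__ == "__main__":
    policy = CheckpointPolicy()
    # 10 minutes elapsed, but 1 million edits already accumulated.
    print(policy.should_checkpoint(seconds_since_last=600, uncheckpointed_txns=1_000_000))
    # 10 minutes elapsed and only a handful of edits.
    print(policy.should_checkpoint(seconds_since_last=600, uncheckpointed_txns=10))
```

On a write-heavy cluster the transaction limit is usually the one that fires, so the period acts as an upper bound on checkpoint age rather than a fixed schedule.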

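To restate the two parameters: dfs.namenode.checkpoint.period is the maximum interval between two consecutive checkpoints, and dfs.namenode.checkpoint.txns is the number of uncheckpointed transactions that triggers a checkpoint even if that interval has not yet elapsed (1 million by default). For example, if 1 million modifications accumulate within 10 minutes, a checkpoint runs after 10 minutes rather than after 1 hour.

5. Checkpoint node

The Checkpoint node is very similar to the Secondary NameNode. It downloads the HDFS image and edit log from the NameNode, merges them locally, and uploads the merged image back to the running NameNode. Its address is configured with:

dfs.namenode.backup.address: the node's address
dfs.namenode.backup.http-address: the address and port of its HTTP server

dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns control its checkpoints in the same way. In practice, the Checkpoint node and the Secondary NameNode do the same job under different names.

6. Backup node

The Backup node provides the same checkpointing function as the Checkpoint node. In addition, it receives namespace changes from the NameNode in real time and applies them to its own in-memory copy. (The NameNode itself does not merge the edit log into the image while running; it does so only on restart.) The Backup node is therefore an up-to-date, real-time copy of the NameNode's namespace. Currently a cluster can have only one Backup node; support for several may come later. Once a Backup node is registered, no Checkpoint node can register with the cluster. The Backup node uses the same configuration as the Checkpoint node (dfs.namenode.backup.address and dfs.namenode.backup.http-address) and is started with bin/hdfs namenode -backup.

7. Import checkpoint

If the image file and the edit log are both lost, the latest checkpoint can be imported into the NameNode. This takes three steps:

- Set dfs.namenode.name.dir to the NameNode metadata directory. The directory should be empty, because the NameNode refuses to import over an existing valid image.
- Set dfs.namenode.checkpoint.dir to the directory that holds the checkpoint image.
- Start the NameNode with the -importCheckpoint option.

8. Balancer

Data in HDFS is not necessarily spread evenly across the cluster. When placing new blocks, the NameNode weighs considerations such as the following: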

- Policy to keep one of the replicas of a block on the same node as the node that is writing the block.
- Need to spread different replicas of a block across the racks so that the cluster can survive the loss of a whole rack.
- One of the replicas is usually placed on the same rack as the node writing to the file so that cross-rack network I/O is reduced.
- Spread HDFS data uniformly across the DataNodes in the cluster.
Source: http://hadoop.apache.org/docs/r2.6.4/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Balancer
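Because these placement goals compete, DataNode utilization drifts apart over time, which is what the balancer corrects. The sketch below shows one way to reason about "balanced": a node counts as balanced when its utilization is within a threshold of the cluster-wide utilization. The 10 percent default mirrors the balancer's -threshold default, which is not stated in the text above; the function name and the sample figures are hypothetical.

```python
def classify_datanodes(nodes, threshold_pct=10.0):
    """Classify DataNodes relative to the cluster-wide utilization.

    nodes maps a node name to (used_bytes, capacity_bytes).
    Returns (cluster_utilization_pct, {name: "over" | "under" | "balanced"}).
    Hypothetical helper for illustration; not the balancer's own code.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    if total_capacity <= 0:
        raise ValueError("cluster capacity must be positive")

    cluster_pct = 100.0 * total_used / total_capacity
    status = {}
    for name, (used, capacity) in nodes.items():
        node_pct = 100.0 * used / capacity
        if node_pct > cluster_pct + threshold_pct:
            status[name] = "over"        # candidate source of block moves
        elif node_pct < cluster_pct - threshold_pct:
            status[name] = "under"       # candidate target of block moves
        else:
            status[name] = "balanced"
    return cluster_pct, status


if __name__ == "__main__":
    sample = {"dn1": (90, 100), "dn2": (50, 100), "dn3": (10, 100)}
    print(classify_datanodes(sample))
```

A smaller threshold gives a more even cluster at the cost of more block movement, so the value is a trade-off between balance and network load.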

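9. Rack awareness

Omitted.

10. Safemode

When the cluster restarts, the NameNode loads the image and the edit log and then waits for the DataNodes to report their blocks, so it does not open the cluster right away. During this time the NameNode is in safemode and the cluster is read-only. Once the DataNodes have reported their blocks, the NameNode leaves safemode automatically and the cluster becomes writable. Safemode can also be entered or left manually.

11. fsck

The fsck command checks files and their blocks for inconsistencies. Unlike a traditional fsck, it does not correct the errors it finds, and by default it skips files that are currently open. fsck is not a Hadoop shell command, but it can be run as bin/hdfs fsck.

12. fetchdt

HDFS provides the fetchdt command to fetch a Delegation Token and store it in a file on the local file system. The token can later be used by a non-secure client to access a secure server (the NameNode, for example). Details omitted.

13. Recovery mode

If the only available copy of the NameNode metadata is damaged, recovery mode may still recover part of the data. Start the NameNode with namenode -recover; it then prompts interactively at the command line for how to proceed. With the -force option it does not prompt and always selects the first choice.

14. Upgrade and rollback

Omitted.

15. File permissions and security

HDFS file permissions are similar to those on Linux. The user who starts the NameNode is treated as the HDFS superuser.

16. Scalability

HDFS supports clusters with thousands of nodes. Each cluster has a single NameNode, so the NameNode's memory becomes the limit on cluster size.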
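Returning to the safemode behavior in section 10, the automatic exit can be pictured as a threshold check on reported blocks. This is a simplified sketch under stated assumptions: the 0.999 default corresponds to dfs.namenode.safemode.threshold-pct, which the text above does not mention, the real NameNode also applies further conditions (such as an extension period) that are ignored here, and the function name is hypothetical.

```python
def can_leave_safemode(reported_blocks, total_blocks, threshold_pct=0.999):
    """Return True when enough blocks have been reported to leave safemode.

    Simplified model: compares the fraction of reported blocks against
    threshold_pct (assumed default of dfs.namenode.safemode.threshold-pct).
    """
    if reported_blocks < 0 or total_blocks < 0:
        raise ValueError("block counts must be non-negative")
    if total_blocks == 0:
        # An empty namespace has nothing to wait for.
        return True
    return reported_blocks / total_blocks >= threshold_pct


if __name__ == "__main__":
    # DataNodes are still reporting: stay in safemode.
    print(can_leave_safemode(reported_blocks=998, total_blocks=1000))
    # Every block has been reported: safe to open the cluster.
    print(can_leave_safemode(reported_blocks=1000, total_blocks=1000))
```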


Reposted from: https://www.cnblogs.com/skyrim/p/7455503.html
