Part 1: HDFS User Guide
1. Standout features of HDFS
- Hadoop, including HDFS, is well suited for distributed storage and distributed processing using commodity hardware. It is fault tolerant, scalable, and extremely simple to expand. MapReduce, well known for its simplicity and applicability for a large set of distributed applications, is an integral part of Hadoop. (Distributed storage.)
- HDFS is highly configurable with a default configuration well suited for many installations. Most of the time, configuration needs to be tuned only for very large clusters. (Sensible configuration.)
- Hadoop is written in Java and is supported on all major platforms. (Platform portability.)
- Hadoop supports shell-like commands to interact with HDFS directly. (Shell-like operation.)
- The NameNode and DataNodes have built-in web servers that make it easy to check the current status of the cluster. (Built-in web services, convenient for inspecting the cluster.)
- New features and improvements are regularly implemented in HDFS. The following is a subset of useful features in HDFS:
  - File permissions and authentication. (File permission checks.)
  - Rack awareness: to take a node's physical location into account while scheduling tasks and allocating storage.
  - Safemode: an administrative mode for maintenance. (Safemode, used for operations work.)
  - fsck: a utility to diagnose the health of the file system, to find missing files or blocks. (A file system checker that finds lost files or blocks.)
  - fetchdt: a utility to fetch a DelegationToken and store it in a file on the local system.
  - Balancer: a tool to balance the cluster when the data is unevenly distributed among DataNodes.
  - Upgrade and rollback: after a software upgrade, it is possible to roll back to the state HDFS was in before the upgrade in case of unexpected problems.
  - Secondary NameNode: performs periodic checkpoints of the namespace and helps keep the size of the file containing the log of HDFS modifications within certain limits at the NameNode.
  - Checkpoint node: performs periodic checkpoints of the namespace and helps minimize the size of the log stored at the NameNode containing changes to HDFS. Replaces the role previously filled by the Secondary NameNode, though it is not yet battle hardened. The NameNode allows multiple Checkpoint nodes simultaneously, as long as there are no Backup nodes registered with the system.
  - Backup node: an extension to the Checkpoint node. In addition to checkpointing, it also receives a stream of edits from the NameNode and maintains its own in-memory copy of the namespace, which is always in sync with the active NameNode namespace state. Only one Backup node may be registered with the NameNode at once.
Source: http://hadoop.apache.org/docs/r2.6.4/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html
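As a small illustration of the fsck utility listed above, the following Python sketch builds an `hdfs fsck` command line and runs it only when an `hdfs` client is found on the PATH. The helper names (`fsck_command`, `run_fsck`) and the path `/user/demo` are made up for this example; the `-files`, `-blocks` and `-locations` options are the ones shown in the fsck usage string, where they appear nested in that order.

```python
import shutil
import subprocess


def fsck_command(path="/", files=False, blocks=False, locations=False):
    """Build the argv for `hdfs fsck`; nothing is executed here."""
    argv = ["hdfs", "fsck", path]
    # The usage string nests these options: -files [-blocks [-locations]],
    # so asking for a deeper level also adds the outer ones.
    if files or blocks or locations:
        argv.append("-files")
    if blocks or locations:
        argv.append("-blocks")
    if locations:
        argv.append("-locations")
    return argv


def run_fsck(path="/"):
    """Run fsck if an hdfs client is installed; return its output or None."""
    if shutil.which("hdfs") is None:
        return None  # no Hadoop client on this machine
    result = subprocess.run(
        fsck_command(path, blocks=True),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


# /user/demo is a hypothetical path used only for illustration.
print(fsck_command("/user/demo", locations=True))
```

Keeping the command construction separate from its execution makes it possible to inspect or log the exact command before it touches a real cluster.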
- -report: reports basic statistics of HDFS. Some of this information is also available on the NameNode front page. (Status report.)
- -safemode: though usually not required, an administrator can manually enter or leave Safemode. (Turns Safemode on.)
- -finalizeUpgrade: removes the previous backup of the cluster made during the last upgrade. (Deletes the backup from the last cluster upgrade.)
- -refreshNodes: updates the NameNode with the set of DataNodes allowed to connect to it. The NameNode re-reads the DataNode hostnames in the files defined by dfs.hosts and dfs.hosts.exclude. Hosts defined in dfs.hosts are the DataNodes that are part of the cluster. If there are entries in dfs.hosts, only the hosts in it are allowed to register with the NameNode. Entries in dfs.hosts.exclude are DataNodes that need to be decommissioned. DataNodes complete decommissioning when all the replicas from them are replicated to other DataNodes. Decommissioned nodes are not automatically shut down and are not chosen for writing new replicas.
- -printTopology: prints the topology of the cluster. Displays a tree of racks and the DataNodes attached to the racks as viewed by the NameNode. (Prints the topology.)
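The options above are arguments to the `hdfs dfsadmin` command. The sketch below maps them to command lines and walks through the decommissioning flow described for -refreshNodes: add a host to the exclude file, then ask the NameNode to re-read it. The function names, the action keys and the one-host-per-line file layout are assumptions made for this example; the location of the exclude file is whatever dfs.hosts.exclude points to on a given cluster.

```python
from pathlib import Path

# Illustrative action keys mapped to the dfsadmin options listed above.
DFSADMIN_OPTIONS = {
    "report": ["-report"],
    "safemode_get": ["-safemode", "get"],
    "safemode_enter": ["-safemode", "enter"],
    "safemode_leave": ["-safemode", "leave"],
    "finalize_upgrade": ["-finalizeUpgrade"],
    "refresh_nodes": ["-refreshNodes"],
    "print_topology": ["-printTopology"],
}


def dfsadmin_command(action):
    """Return the argv for one dfsadmin action; nothing is executed."""
    if action not in DFSADMIN_OPTIONS:
        raise ValueError(f"unknown dfsadmin action: {action}")
    return ["hdfs", "dfsadmin", *DFSADMIN_OPTIONS[action]]


def stage_decommission(exclude_file, hostname):
    """Add hostname to the exclude file and return the follow-up command.

    Assumes the exclude file holds one hostname per line. The caller is
    expected to run the returned command so the NameNode re-reads the file.
    """
    path = Path(exclude_file)
    hosts = path.read_text().split() if path.exists() else []
    if hostname not in hosts:
        hosts.append(hostname)
    path.write_text("\n".join(hosts) + "\n")
    return dfsadmin_command("refresh_nodes")
```

For instance, `stage_decommission("/tmp/dfs.exclude", "dn3.example.com")` (both values hypothetical) writes the host into the file and returns `["hdfs", "dfsadmin", "-refreshNodes"]`, which can then be passed to `subprocess.run` on a machine with a configured Hadoop client.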
- dfs.namenode.checkpoint.period, set to 1 hour by default, specifies the maximum delay between two consecutive checkpoints, and
- dfs.namenode.checkpoint.txns, set to 1 million by default, defines the number of uncheckpointed transactions on the NameNode which will force an urgent checkpoint, even if the checkpoint period has not been reached.
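These two parameters control when a checkpoint starts: either enough time has passed or enough transactions have accumulated. The following Python sketch expresses that rule with the default values quoted above. It is a simplified model for illustration, not the NameNode's actual code, and `should_checkpoint` is a name invented here.

```python
# Defaults quoted above.
CHECKPOINT_PERIOD_SECONDS = 3600   # dfs.namenode.checkpoint.period (1 hour)
CHECKPOINT_TXNS = 1_000_000        # dfs.namenode.checkpoint.txns


def should_checkpoint(seconds_since_last, uncheckpointed_txns,
                      period=CHECKPOINT_PERIOD_SECONDS,
                      txns=CHECKPOINT_TXNS):
    """Return True when either trigger condition is met.

    A checkpoint is due once the period has elapsed, or earlier if the
    number of uncheckpointed transactions reaches the txns limit.
    """
    period_elapsed = seconds_since_last >= period
    too_many_txns = uncheckpointed_txns >= txns
    return period_elapsed or too_many_txns


# Ten minutes after the last checkpoint with few edits: not yet due.
print(should_checkpoint(600, 25_000))
# Ten minutes in, but a burst of edits hit the transaction limit: due now.
print(should_checkpoint(600, 1_000_000))
```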
- Policy to keep one of the replicas of a block on the same node as the node that is writing the block. (Keeps one copy of the data on the node doing the writing.)
- Need to spread different replicas of a block across the racks so that the cluster can survive the loss of a whole rack. (Data spread across racks tolerates the loss of an entire rack.)
- One of the replicas is usually placed on the same rack as the node writing to the file so that cross-rack network I/O is reduced.
- Spread HDFS data uniformly across the DataNodes in the cluster.
Source: http://hadoop.apache.org/docs/r2.6.4/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Balancer
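The considerations listed above pull in different directions, so it can help to check a concrete placement against each of them. The Python sketch below does that for one block and also measures how evenly data is spread across DataNodes. It is a toy model: the function names, node names and rack labels are all invented for illustration, and the real placement and balancing logic in HDFS is more involved.

```python
def check_placement(writer, replicas, rack_of):
    """Report which placement considerations a set of replicas satisfies.

    writer   -- name of the node writing the block
    replicas -- names of the nodes holding a replica
    rack_of  -- mapping from node name to rack label
    """
    racks = {rack_of[node] for node in replicas}
    return {
        # One replica on the same node as the writer.
        "replica_on_writer_node": writer in replicas,
        # Replicas on more than one rack, so losing a rack loses no data.
        "spans_multiple_racks": len(racks) >= 2,
        # One replica on the writer's rack, to limit cross-rack traffic.
        "replica_on_writer_rack": rack_of[writer] in racks,
    }


def utilization_spread(used, capacity):
    """Return the gap between the fullest and emptiest DataNode (0.0 to 1.0).

    A value near 0 means data is spread uniformly across the nodes.
    """
    ratios = [used[node] / capacity[node] for node in capacity]
    return max(ratios) - min(ratios)


# Hypothetical three-node, two-rack cluster.
rack_of = {"dn1": "/rack-a", "dn2": "/rack-a", "dn3": "/rack-b"}
print(check_placement("dn1", ["dn1", "dn3"], rack_of))
print(utilization_spread({"dn1": 50, "dn2": 25, "dn3": 25},
                         {"dn1": 100, "dn2": 100, "dn3": 100}))
```

A large spread alongside otherwise valid placements is the situation in which running the Balancer is useful.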
Reposted from: https://www.cnblogs.com/skyrim/p/7455503.html