Calling HDFS from Hadoop Programs
This series introduces the Hadoop family of products. The commonly used projects include Hadoop, Hive, Pig, HBase, Sqoop, Mahout, Zookeeper, Avro, Ambari, and Chukwa; newer additions include YARN, Hcatalog, Oozie, Cassandra, Hama, Whirr, Flume, Bigtop, Crunch, Hue, and others.
Since 2011, China has entered a booming era of big data, with the Hadoop family of software dominating the big data processing landscape. Open-source projects and commercial vendors alike have aligned their data software with Hadoop, and Hadoop has grown from a niche, elite field into the de facto standard for big data development. On top of the original Hadoop technology, a whole family of products has emerged, continuously innovating around the concept of "big data" and driving technical progress.
As developers in the IT industry, we should keep pace, seize the opportunity, and rise together with Hadoop!
About the author:
- Zhang Dan (Conan), programmer: Java, R, PHP, Javascript
- weibo: @Conan_Z
- blog: http://blog.fens.me
- email: bsspirit@gmail.com
Please credit the source when reposting:
http://blog.fens.me/hadoop-hdfs-api/
Preface
HDFS, the Hadoop Distributed File System, is one of the core components of Hadoop. To run a distributed MapReduce algorithm, the data must first be placed on HDFS, so being able to operate on HDFS is essential. The Hadoop command line provides a complete set of commands that are as convenient to use as ordinary Linux commands.
Sometimes, however, we need to access HDFS directly from a program, and we can do that through the HDFS API.
Table of Contents

1. System Environment
2. ls
3. mkdir
4. rmr
5. copyFromLocal
6. cat
7. copyToLocal
8. Creating a New File and Writing Content
1. System Environment
Hadoop cluster environment:
- Linux Ubuntu 64bit Server 12.04.2 LTS
- Java 1.6.0_29
- Hadoop 1.1.2
How to set up a Hadoop cluster? See the article: Installing Historical Versions of Hadoop.
Development environment:
- Win7 64bit
- Java 1.6.0_45
- Maven 3
- Hadoop 1.1.2
- Eclipse Juno Service Release 2
How to set up a Hadoop development environment on Win7 with Maven? See the article: Building a Hadoop Project with Maven.
Note: hadoop-core-1.1.2.jar has been recompiled to fix the problem of calling Hadoop remotely from Windows; see the article: Installing Historical Versions of Hadoop.
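For reference, the Maven dependency for the hadoop-core artifact might look like the following sketch. Since this setup uses a locally recompiled hadoop-core-1.1.2.jar, the rebuilt jar would have to be installed into your local Maven repository under these coordinates:

```xml
<!-- Hadoop 1.x core library; swap in the locally rebuilt jar as noted above -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.1.2</version>
</dependency>
```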
Hadoop command line: java FsShell

```
~ hadoop fs

Usage: java FsShell
           [-ls <path>]
           [-lsr <path>]
           [-du <path>]
           [-dus <path>]
           [-count[-q] <path>]
           [-mv <src> <dst>]
           [-cp <src> <dst>]
           [-rm [-skipTrash] <path>]
           [-rmr [-skipTrash] <path>]
           [-expunge]
           [-put <localsrc> ... <dst>]
           [-copyFromLocal <localsrc> ... <dst>]
           [-moveFromLocal <localsrc> ... <dst>]
           [-get [-ignoreCrc] [-crc] <src> <localdst>]
           [-getmerge <src> <localdst> [addnl]]
           [-cat <src>]
           [-text <src>]
           [-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>]
           [-moveToLocal [-crc] <src> <localdst>]
           [-mkdir <path>]
           [-setrep [-R] [-w] <rep> <path/file>]
           [-touchz <path>]
           [-test -[ezd] <path>]
           [-stat [format] <path>]
           [-tail [-f] <file>]
           [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
           [-chown [-R] [OWNER][:[GROUP]] PATH...]
           [-chgrp [-R] GROUP PATH...]
           [-help [cmd]]
```

The usage above lists some 30 commands; below I implement only a subset of them through the HDFS API.
Create a new file, HdfsDAO.java, to wrap the HDFS API calls.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobConf;

public class HdfsDAO {

    // HDFS access address
    private static final String HDFS = "hdfs://192.168.1.210:9000/";

    // HDFS path
    private String hdfsPath;

    // Hadoop configuration
    private Configuration conf;

    public HdfsDAO(Configuration conf) {
        this(HDFS, conf);
    }

    public HdfsDAO(String hdfs, Configuration conf) {
        this.hdfsPath = hdfs;
        this.conf = conf;
    }

    // Entry point
    public static void main(String[] args) throws IOException {
        JobConf conf = config();
        HdfsDAO hdfs = new HdfsDAO(conf);
        hdfs.mkdirs("/tmp/new/two");
        hdfs.ls("/tmp/new");
    }

    // Load the Hadoop configuration files
    public static JobConf config() {
        JobConf conf = new JobConf(HdfsDAO.class);
        conf.setJobName("HdfsDAO");
        conf.addResource("classpath:/hadoop/core-site.xml");
        conf.addResource("classpath:/hadoop/hdfs-site.xml");
        conf.addResource("classpath:/hadoop/mapred-site.xml");
        return conf;
    }

    // API implementations
    public void cat(String remoteFile) throws IOException { ... }

    public void mkdirs(String folder) throws IOException { ... }

    ...
}
```

2. ls
Description: list the files in a directory.
The corresponding Hadoop command:
```
~ hadoop fs -ls /

Found 3 items
drwxr-xr-x   - conan         supergroup          0 2013-10-03 05:03 /home
drwxr-xr-x   - Administrator supergroup          0 2013-10-03 13:49 /tmp
drwxr-xr-x   - conan         supergroup          0 2013-10-03 09:11 /user
```

Java program:
```java
public void ls(String folder) throws IOException {
    Path path = new Path(folder);
    FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
    FileStatus[] list = fs.listStatus(path);
    System.out.println("ls: " + folder);
    System.out.println("==========================================================");
    for (FileStatus f : list) {
        System.out.printf("name: %s, folder: %s, size: %d\n", f.getPath(), f.isDir(), f.getLen());
    }
    System.out.println("==========================================================");
    fs.close();
}

public static void main(String[] args) throws IOException {
    JobConf conf = config();
    HdfsDAO hdfs = new HdfsDAO(conf);
    hdfs.ls("/");
}
```

Console output:
```
ls: /
==========================================================
name: hdfs://192.168.1.210:9000/home, folder: true, size: 0
name: hdfs://192.168.1.210:9000/tmp, folder: true, size: 0
name: hdfs://192.168.1.210:9000/user, folder: true, size: 0
==========================================================
```

3. mkdir
Description: create a directory; nested (multi-level) directories can be created in a single call.
The corresponding Hadoop command:
```
~ hadoop fs -mkdir /tmp/new/one
~ hadoop fs -ls /tmp/new

Found 1 items
drwxr-xr-x   - conan supergroup          0 2013-10-03 15:35 /tmp/new/one
```

Java program:
```java
public void mkdirs(String folder) throws IOException {
    Path path = new Path(folder);
    FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
    if (!fs.exists(path)) {
        fs.mkdirs(path);
        System.out.println("Create: " + folder);
    }
    fs.close();
}

public static void main(String[] args) throws IOException {
    JobConf conf = config();
    HdfsDAO hdfs = new HdfsDAO(conf);
    hdfs.mkdirs("/tmp/new/two");
    hdfs.ls("/tmp/new");
}
```

Console output:
```
Create: /tmp/new/two
ls: /tmp/new
==========================================================
name: hdfs://192.168.1.210:9000/tmp/new/one, folder: true, size: 0
name: hdfs://192.168.1.210:9000/tmp/new/two, folder: true, size: 0
==========================================================
```

4. rmr
Description: delete a directory or file.
The corresponding Hadoop command:
```
~ hadoop fs -rmr /tmp/new/one
Deleted hdfs://master:9000/tmp/new/one

~ hadoop fs -ls /tmp/new
Found 1 items
drwxr-xr-x   - Administrator supergroup          0 2013-10-03 15:38 /tmp/new/two
```

Java program:
```java
public void rmr(String folder) throws IOException {
    Path path = new Path(folder);
    FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
    // deleteOnExit marks the path for deletion when the FileSystem is closed,
    // which happens at fs.close() below
    fs.deleteOnExit(path);
    System.out.println("Delete: " + folder);
    fs.close();
}

public static void main(String[] args) throws IOException {
    JobConf conf = config();
    HdfsDAO hdfs = new HdfsDAO(conf);
    hdfs.rmr("/tmp/new/two");
    hdfs.ls("/tmp/new");
}
```

Console output:
```
Delete: /tmp/new/two
ls: /tmp/new
==========================================================
==========================================================
```

5. copyFromLocal
Description: copy a file from the local filesystem to HDFS.
The corresponding Hadoop command:
```
~ hadoop fs -copyFromLocal /home/conan/datafiles/item.csv /tmp/new/

~ hadoop fs -ls /tmp/new/
Found 1 items
-rw-r--r--   1 conan supergroup        210 2013-10-03 16:07 /tmp/new/item.csv
```

Java program:
```java
public void copyFile(String local, String remote) throws IOException {
    FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
    fs.copyFromLocalFile(new Path(local), new Path(remote));
    System.out.println("copy from: " + local + " to " + remote);
    fs.close();
}

public static void main(String[] args) throws IOException {
    JobConf conf = config();
    HdfsDAO hdfs = new HdfsDAO(conf);
    hdfs.copyFile("datafile/randomData.csv", "/tmp/new");
    hdfs.ls("/tmp/new");
}
```

Console output:
```
copy from: datafile/randomData.csv to /tmp/new
ls: /tmp/new
==========================================================
name: hdfs://192.168.1.210:9000/tmp/new/item.csv, folder: false, size: 210
name: hdfs://192.168.1.210:9000/tmp/new/randomData.csv, folder: false, size: 36655
==========================================================
```

6. cat
Description: print the contents of a file.
The corresponding Hadoop command:
```
~ hadoop fs -cat /tmp/new/item.csv

1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
```

Java program:
```java
public void cat(String remoteFile) throws IOException {
    Path path = new Path(remoteFile);
    FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
    FSDataInputStream fsdis = null;
    System.out.println("cat: " + remoteFile);
    try {
        fsdis = fs.open(path);
        IOUtils.copyBytes(fsdis, System.out, 4096, false);
    } finally {
        IOUtils.closeStream(fsdis);
        fs.close();
    }
}

public static void main(String[] args) throws IOException {
    JobConf conf = config();
    HdfsDAO hdfs = new HdfsDAO(conf);
    hdfs.cat("/tmp/new/item.csv");
}
```

Console output:
```
cat: /tmp/new/item.csv
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
```

7. copyToLocal
Description: copy a file from HDFS to the local filesystem.
The corresponding Hadoop command:
```
~ hadoop fs -copyToLocal /tmp/new/item.csv /home/conan/datafiles/tmp/

~ ls -l /home/conan/datafiles/tmp/
-rw-rw-r-- 1 conan conan 210 Oct  3 16:16 item.csv
```

Java program:
```java
public void download(String remote, String local) throws IOException {
    Path path = new Path(remote);
    FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
    fs.copyToLocalFile(path, new Path(local));
    System.out.println("download: from " + remote + " to " + local);
    fs.close();
}

public static void main(String[] args) throws IOException {
    JobConf conf = config();
    HdfsDAO hdfs = new HdfsDAO(conf);
    hdfs.download("/tmp/new/item.csv", "datafile/download");
    File f = new File("datafile/download/item.csv");
    System.out.println(f.getAbsolutePath());
}
```

Console output:
```
2013-10-12 17:17:32 org.apache.hadoop.util.NativeCodeLoader
WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
download: from /tmp/new/item.csv to datafile/download
D:\workspace\java\myMahout\datafile\download\item.csv
```

8. Creating a New File and Writing Content
Description: create a new file and write content into it.
- touchz: creates a new empty file, or updates the timestamp of an existing file.
- There is no shell command for writing the content directly.
The corresponding Hadoop command:
```
~ hadoop fs -touchz /tmp/new/empty
~ hadoop fs -ls /tmp/new

Found 3 items
-rw-r--r--   1 conan         supergroup          0 2013-10-03 16:24 /tmp/new/empty
-rw-r--r--   1 conan         supergroup        210 2013-10-03 16:07 /tmp/new/item.csv
-rw-r--r--   3 Administrator supergroup      36655 2013-10-03 16:09 /tmp/new/randomData.csv

~ hadoop fs -cat /tmp/new/empty
```

Java program:
```java
public void createFile(String file, String content) throws IOException {
    FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
    byte[] buff = content.getBytes();
    FSDataOutputStream os = null;
    try {
        os = fs.create(new Path(file));
        os.write(buff, 0, buff.length);
        System.out.println("Create: " + file);
    } finally {
        if (os != null)
            os.close();
    }
    fs.close();
}

public static void main(String[] args) throws IOException {
    JobConf conf = config();
    HdfsDAO hdfs = new HdfsDAO(conf);
    hdfs.createFile("/tmp/new/text", "Hello world!!");
    hdfs.cat("/tmp/new/text");
}
```

Console output:
```
Create: /tmp/new/text
cat: /tmp/new/text
Hello world!!
```

The complete source file: HdfsDAO.java
https://github.com/bsspirit/maven_mahout_template/blob/mahout-0.8/src/main/java/org/conan/mymahout/hdfs/HdfsDAO.java

Please credit the source when reposting:
http://blog.fens.me/hadoop-hdfs-api/