
Implementing HDFS Operations with the Java API (Part 2): Operation Functions

Published: 2025/4/5


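Note: this article uses the Java API, from the IDEA IDE, to create directories, create files, upload and download files, view files, delete files, and edit files on HDFS. All of the code below lives in the my.hdfs package.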

Creating a directory

Create the directory /hdfstest under the root of the HDFS file system.
To verify the result after running the program:

$ hadoop fs -ls -R /

package my.hdfs;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Creates the directory /hdfstest under the HDFS root. */
public class MakeDir {
    public static void main(String[] args) throws IOException, URISyntaxException {
        // Create the Configuration object
        Configuration conf = new Configuration();
        // HDFS address; this setup uses port 9000
        String hdfsPath = "hdfs://localhost:9000";
        // Get the FileSystem object
        FileSystem hdfs = FileSystem.get(new URI(hdfsPath), conf);
        String newDir = "/hdfstest";
        // mkdirs returns true on success
        boolean result = hdfs.mkdirs(new Path(newDir));
        if (result) {
            System.out.println("Success!");
        } else {
            System.out.println("Failed!");
        }
    }
}

Iterating over a directory to get file information

Given a directory, iterate over the entries under it and print the information for each one (permissions, owner, group, and path).

package my.hdfs;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class IteratorListFiles {
    public static void main(String[] args) throws IOException {
        FileSystem hdfs = FileSystem.get(URI.create("hdfs://localhost:9000/"), new Configuration());
        String watchHDFS = "/";
        iteratorListFile(hdfs, new Path(watchHDFS));
    }

    /** Prints the direct children of path; it does not descend into subdirectories. */
    public static void iteratorListFile(FileSystem hdfs, Path path) throws IOException {
        FileStatus[] files = hdfs.listStatus(path);
        for (FileStatus file : files) {
            if (file.isDirectory()) {
                System.out.println("dir:" + file.getPermission() + " " + file.getOwner() + " " + file.getGroup() + " " + file.getPath());
            } else if (file.isFile()) {
                System.out.println("file:" + file.getPermission() + " " + file.getOwner() + " " + file.getGroup() + " " + file.getPath());
            }
        }
    }
}

Operating on files

Create, rename, and delete a file, and check whether a file exists.

package my.hdfs;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Creates, renames, and deletes a file, checks existence, and lists the files in a directory. */
public class OperateFile {
    public static void main(String[] args) throws IOException, URISyntaxException {
        // Create the Configuration object
        Configuration conf = new Configuration();
        // HDFS address; this setup uses port 9000
        String hdfsPath = "hdfs://localhost:9000";
        // Get the FileSystem object
        FileSystem hdfs = FileSystem.get(new URI(hdfsPath), conf);

        // 1. List all files under /hdfstest
        String watchHDFS = "/hdfstest";
        FileStatus[] files = hdfs.listStatus(new Path(watchHDFS));
        for (FileStatus file : files) {
            if (file.isFile()) {
                System.out.println("file:" + file.getPermission() + " " + file.getOwner() + " " + file.getGroup() + " " + file.getPath());
            }
        }
        System.out.println("----------Operate file----------");

        // 2. Create a file. Missing parent directories are created automatically.
        String filePath = "/hdfstest/src_data";
        FSDataOutputStream create = hdfs.create(new Path(filePath));
        System.out.println("create file successfully!" + create);
        // Close the stream before renaming the file
        create.close();

        // 3. Rename with hdfs.rename(): src_data -> dst_data
        Path src = new Path("/hdfstest/src_data");
        Path dst = new Path("/hdfstest/dst_data");
        System.out.println("rename file successfully?" + hdfs.rename(src, dst));
        // Check with hdfs.exists() whether /hdfstest/src_data is still there
        System.out.println("after rename, does src_data still exist?" + hdfs.exists(src));

        // 4. Delete dst_data with hdfs.delete()
        hdfs.delete(dst, true);
        // Check whether /hdfstest/dst_data is still there
        System.out.println("after delete(), does dst_data still exist?" + hdfs.exists(dst));
        System.out.println("----------Operate file----------");

        // 5. List /hdfstest again; the array from step 1 predates the changes above
        files = hdfs.listStatus(new Path(watchHDFS));
        for (FileStatus file : files) {
            if (file.isFile()) {
                System.out.println("file:" + file.getPermission() + " " + file.getOwner() + " " + file.getGroup() + " " + file.getPath());
            }
        }
    }
}

Writing content to a file

Create the file /hdfstest/writefile in HDFS and write the content hello world hello data! into it.

To verify the result after running the program:

$ hadoop fs -ls -R /hdfstest
$ hadoop fs -cat /hdfstest/writefile

API implementation:

package my.hdfs;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Creates /hdfstest/writefile in HDFS and writes "hello world hello data!" into it. */
public class WriteFile {
    public static void main(String[] args) throws IOException, URISyntaxException {
        // FileSystem.get(Configuration conf) picks the file system from the configuration
        // (conf/core-site.xml); if none is specified it returns the local file system.
        // FileSystem.get(URI uri, Configuration conf) picks the file system from the URI and conf.
        // If the HDFS URI is mistyped (for example "hdfs//localhost:9000", missing the colon),
        // the call falls back to the local file system, starting from the location of the
        // current Java project.
        FileSystem hdfs = FileSystem.get(new URI("hdfs://localhost:9000"), new Configuration());
        String filePath = "/hdfstest/writefile";
        FSDataOutputStream create = hdfs.create(new Path(filePath));
        System.out.println("Step 1 Finish!");

        String sayHi = "hello world hello data!";
        // Use an explicit charset instead of the platform default
        byte[] buff = sayHi.getBytes(StandardCharsets.UTF_8);
        create.write(buff, 0, buff.length);
        create.close();
        System.out.println("Step 2 Finish!");
    }
}
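The comment in WriteFile about a mistyped URI can be checked without a cluster. The following is a minimal sketch using only java.net.URI from the JDK; the class name UriSchemeCheck and the helper looksLikeHdfsUri are hypothetical names introduced for illustration and are not part of the Hadoop API. It shows that a string missing the colon parses as a plain relative path with no scheme, which is consistent with the fallback to the local file system described above.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Sketch: inspect a file system URI with plain java.net.URI before handing it to Hadoop. */
class UriSchemeCheck {

    /** Returns true only when the text parses with the "hdfs" scheme and a host. */
    static boolean looksLikeHdfsUri(String text) {
        try {
            URI uri = new URI(text);
            return "hdfs".equals(uri.getScheme()) && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) throws URISyntaxException {
        // Well-formed: scheme, host and port are all recognized.
        URI good = new URI("hdfs://localhost:9000");
        System.out.println(good.getScheme() + " " + good.getHost() + " " + good.getPort());

        // Missing colon: the whole string is parsed as a relative path, so there is no scheme.
        URI bad = new URI("hdfs//localhost:9000");
        System.out.println(bad.getScheme() + " " + bad.getPath());

        System.out.println(looksLikeHdfsUri("hdfs://localhost:9000"));
        System.out.println(looksLikeHdfsUri("hdfs//localhost:9000"));
    }
}
```

A check like this could be run before calling FileSystem.get, so that a typo fails fast instead of silently writing to the local disk.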

Merging file contents

First go to the local directory /data/hadoop and delete everything in it (the program below requires /data/hadoop to contain only files, no subdirectories).

$ cd /data/hadoop
$ sudo rm -r /data/hadoop/*

Then create two files in that directory, named file1 and file2.

$ touch file1
$ touch file2

Write the following content into file1 and file2:

$ echo "hello file1" > file1
$ echo "hello file2" > file2

In the my.hdfs package, create the class PutMerge, which uploads all files in the local Linux directory /data/hadoop/ to HDFS and merges them into the single file /hdfstest/mergefile.

To verify the result after running the program:

$ hadoop fs -ls /hdfstest

API implementation:

package my.hdfs;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

/** Uploads all files under /data/hadoop to HDFS and merges their contents into one file. */
public class PutMerge {
    public static void main(String[] args) throws IOException, URISyntaxException {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(new URI("hdfs://localhost:9000"), conf);
        FileSystem local = FileSystem.getLocal(conf);
        String from_LinuxDir = "/data/hadoop/";
        String to_HDFS = "/hdfstest/mergefile";

        FileStatus[] inputFiles = local.listStatus(new Path(from_LinuxDir));
        FSDataOutputStream out = hdfs.create(new Path(to_HDFS));
        for (FileStatus file : inputFiles) {
            FSDataInputStream in = local.open(file.getPath());
            byte[] buffer = new byte[256];
            int bytesRead = 0;
            // Copy the current local file into the merged HDFS file
            while ((bytesRead = in.read(buffer)) > 0) {
                out.write(buffer, 0, bytesRead);
            }
            in.close();
        }
        // Close the output stream so the merged data is flushed to HDFS
        out.close();
        System.out.println("Finish!");
    }
}
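The copy loop in PutMerge can be tried without a running cluster. The sketch below repeats the same read/write loop on the local file system with JDK classes only, as a stand-in for the Hadoop streams; LocalMerge, merge, and demo are hypothetical names, and the temporary directory replaces /data/hadoop. It uses try-with-resources so that both streams are closed even if a read fails, and it sorts the inputs by name, which is an added assumption to make the merge order predictable.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch: the PutMerge copy loop on the local file system, using only JDK classes. */
class LocalMerge {

    /** Appends every regular file in sourceDir (sorted by name) to target. */
    static void merge(Path sourceDir, Path target) throws IOException {
        List<Path> inputs;
        try (Stream<Path> entries = Files.list(sourceDir)) {
            inputs = entries.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        // try-with-resources closes the output even if a read fails part-way
        try (OutputStream out = Files.newOutputStream(target)) {
            byte[] buffer = new byte[256];
            for (Path input : inputs) {
                try (InputStream in = Files.newInputStream(input)) {
                    int bytesRead;
                    // read returns -1 at end of stream
                    while ((bytesRead = in.read(buffer)) != -1) {
                        out.write(buffer, 0, bytesRead);
                    }
                }
            }
        }
    }

    /** Builds two small files in a temporary directory, merges them, and returns the result. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("merge-demo");
            Files.write(dir.resolve("file1"), "hello file1\n".getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("file2"), "hello file2\n".getBytes(StandardCharsets.UTF_8));
            // Keep the target outside the source directory so it is never one of the inputs
            Path target = Files.createTempFile("mergefile", ".txt");
            merge(dir, target);
            return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

The same try-with-resources pattern should carry over to the Hadoop version, since FSDataInputStream and FSDataOutputStream are ordinary closeable Java streams.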

Uploading and downloading files

Downloading an HDFS file to the local file system

Create the directory copytolocal under /data/hadoop/.

$ mkdir /data/hadoop/copytolocal

In the my.hdfs package, create the class CopyToLocalFile, which downloads the HDFS file /hdfstest/sample_data to the local directory /data/hadoop/copytolocal. The file must already exist on HDFS; the upload step later in this section creates it.

To verify the result after running the program:

$ cd /data/hadoop/copytolocal
$ ls

API implementation:

package my.hdfs;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Downloads a file from HDFS to the local Linux file system. */
public class CopyToLocalFile {
    public static void main(String[] args) throws URISyntaxException, IOException {
        Configuration conf = new Configuration();
        String hdfsPath = "hdfs://localhost:9000";
        FileSystem hdfs = FileSystem.get(new URI(hdfsPath), conf);
        String from_HDFS = "/hdfstest/sample_data";
        String to_Local = "/data/hadoop/copytolocal";
        // First argument false: keep the source file on HDFS after copying
        hdfs.copyToLocalFile(false, new Path(from_HDFS), new Path(to_Local));
        System.out.println("Finish!");
    }
}

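/hdfstest/sample_data is equivalent to hdfs://localhost:9000/hdfstest/sample_data.

Without the leading /, hdfstest/sample_data is a relative path. It is resolved against the file system's working directory, which on HDFS defaults to the user's home directory (/user/<username>).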
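The difference between the two forms can be illustrated with java.net.URI from the JDK, used here as a stand-in for Hadoop's own path resolution. This is a sketch under assumptions: the working directory /user/alice/ and the user name alice are hypothetical, and PathResolution is not a Hadoop class. Hadoop resolves a relative Path against the file system's working directory, which on HDFS is the user's home directory unless it has been changed.

```java
import java.net.URI;

/** Sketch: absolute vs. relative paths, modelled with java.net.URI instead of Hadoop's Path. */
class PathResolution {

    /** Resolves a path string against a working-directory URI (which should end with "/"). */
    static URI resolve(URI workingDir, String path) {
        return workingDir.resolve(path);
    }

    public static void main(String[] args) {
        // Hypothetical working directory for a user named "alice"
        URI workingDir = URI.create("hdfs://localhost:9000/user/alice/");

        // Leading slash: the path replaces the working directory entirely
        System.out.println(resolve(workingDir, "/hdfstest/sample_data"));

        // No leading slash: the path is appended to the working directory
        System.out.println(resolve(workingDir, "hdfstest/sample_data"));
    }
}
```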

If the file cannot be copied into /data/hadoop/copytolocal, the likely cause is insufficient permissions. Run the following in a terminal: chmod -R 777 /data/hadoop/copytolocal

Uploading a local file to HDFS

Create the file sample_data under /data/hadoop.

$ cd /data/hadoop
$ touch sample_data

Write hello world into sample_data.

$ echo "hello world" > sample_data

In the my.hdfs package, create the class CopyFromLocalFile, which uploads the local Linux file /data/hadoop/sample_data to the /hdfstest directory on HDFS.

To verify the result after running the program:

$ hadoop fs -ls -R /

API implementation:

package my.hdfs;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Uploads a local file to HDFS. */
public class CopyFromLocalFile {
    public static void main(String[] args) throws IOException, URISyntaxException {
        Configuration conf = new Configuration();
        String hdfsPath = "hdfs://localhost:9000";
        FileSystem hdfs = FileSystem.get(new URI(hdfsPath), conf);
        String from_Linux = "/data/hadoop/sample_data";
        String to_HDFS = "/hdfstest";
        hdfs.copyFromLocalFile(new Path(from_Linux), new Path(to_HDFS));
        System.out.println("Finish!");
    }
}

File block information

In the my.hdfs package, create the class LocateFile, which prints the block information of the HDFS file /hdfstest/sample_data.

package my.hdfs;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocateFile {
    public static void main(String[] args) throws URISyntaxException, IOException {
        FileSystem hdfs = FileSystem.get(new URI("hdfs://localhost:9000"), new Configuration());
        Path file = new Path("/hdfstest/sample_data");
        FileStatus fileStatus = hdfs.getFileStatus(file);
        // Ask for the locations of every block in the range [0, file length)
        BlockLocation[] location = hdfs.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
        for (BlockLocation block : location) {
            // Each block is reported once per host that stores a replica
            String[] hosts = block.getHosts();
            for (String host : hosts) {
                System.out.println("block:" + block + " host:" + host);
            }
        }
    }
}
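As a rough guide to how many blocks LocateFile should report, the count for a non-empty file is its length divided by the block size, rounded up. The sketch below is plain arithmetic with no Hadoop dependency; BlockCount and blocksFor are hypothetical names, and the 128 MB block size is an assumption (it is the usual dfs.blocksize default in Hadoop 2 and later). The block size actually used for a given file can be read from FileStatus.getBlockSize().

```java
/** Sketch: estimate how many HDFS blocks a file of a given length occupies. */
class BlockCount {

    /** Assumed block size of 128 MB; the real value comes from the cluster configuration. */
    static final long ASSUMED_BLOCK_SIZE = 128L * 1024 * 1024;

    /** Returns fileLength / blockSize, rounded up. */
    static long blocksFor(long fileLength, long blockSize) {
        if (fileLength < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("length must be >= 0 and block size > 0");
        }
        return fileLength / blockSize + (fileLength % blockSize == 0 ? 0 : 1);
    }

    public static void main(String[] args) {
        // A 12-byte file such as sample_data fits in a single block
        System.out.println(blocksFor(12, ASSUMED_BLOCK_SIZE));
        // One byte past the block size spills into a second block
        System.out.println(blocksFor(ASSUMED_BLOCK_SIZE + 1, ASSUMED_BLOCK_SIZE));
    }
}
```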

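For how to obtain the FileSystem object, see: Implementing HDFS Operations with the Java API (Part 1): Obtaining the FileSystem

For common problems when implementing HDFS operations with the Java API, see: Implementing HDFS Operations with the Java API (Part 3): Troubleshooting Summary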
