What Hadoop's TextInputFormat Does, and How to Implement a Custom One
Code first. Without further ado, here is the source of TextInputFormat:
```java
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.mapred;

import java.io.*;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.*;

import com.google.common.base.Charsets;

/**
 * An {@link InputFormat} for plain text files.  Files are broken into lines.
 * Either linefeed or carriage-return are used to signal end of line.  Keys are
 * the position in the file, and values are the line of text.
 */
@InterfaceAudience.Public
@InterfaceStability.Stable
public class TextInputFormat extends FileInputFormat<LongWritable, Text>
  implements JobConfigurable {

  private CompressionCodecFactory compressionCodecs = null;

  public void configure(JobConf conf) {
    compressionCodecs = new CompressionCodecFactory(conf);
  }

  protected boolean isSplitable(FileSystem fs, Path file) {
    final CompressionCodec codec = compressionCodecs.getCodec(file);
    if (null == codec) {
      return true;
    }
    return codec instanceof SplittableCompressionCodec;
  }

  public RecordReader<LongWritable, Text> getRecordReader(
                                          InputSplit genericSplit, JobConf job,
                                          Reporter reporter)
    throws IOException {
    reporter.setStatus(genericSplit.toString());
    String delimiter = job.get("textinputformat.record.delimiter");
    byte[] recordDelimiterBytes = null;
    if (null != delimiter) {
      recordDelimiterBytes = delimiter.getBytes(Charsets.UTF_8);
    }
    return new LineRecordReader(job, (FileSplit) genericSplit,
                                recordDelimiterBytes);
  }
}
```

First, look at the class Javadoc:
This is an {@link InputFormat} for plain text files. Files are broken into lines, and either a linefeed or a carriage return signals the end of a line. The key is the line's byte offset within the file, and the value is the line of text. More generally, an InputFormat describes the format of a job's input data.
TextInputFormat overrides its parent class FileInputFormat's isSplitable method (a file is splittable only if it is uncompressed or uses a SplittableCompressionCodec) and implements getRecordReader, which returns a LineRecordReader.
If a custom record delimiter is set via the "textinputformat.record.delimiter" property, it is converted to bytes using the Charsets.UTF_8 charset before being handed to the LineRecordReader.
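To make that concrete, here is a minimal pure-Java sketch (no Hadoop dependency; `DelimiterSplitDemo` and its `split` method are made-up names for illustration) of what the reader conceptually does with a custom delimiter: encode it as UTF-8 bytes and cut the input stream into records wherever those bytes occur.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class DelimiterSplitDemo {

    // Split the input on the delimiter's UTF-8 bytes, returning records in order.
    // This mimics, in spirit, what LineRecordReader does when
    // "textinputformat.record.delimiter" is set.
    static List<String> split(String input, String delimiter) {
        byte[] data = input.getBytes(StandardCharsets.UTF_8);
        byte[] delim = delimiter.getBytes(StandardCharsets.UTF_8);
        List<String> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i + delim.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < delim.length; j++) {
                if (data[i + j] != delim[j]) { match = false; break; }
            }
            if (match) {
                // Emit the record between the previous delimiter and this one.
                records.add(new String(data, start, i - start, StandardCharsets.UTF_8));
                start = i + delim.length;
                i += delim.length - 1;  // skip past the delimiter bytes
            }
        }
        // Trailing record after the last delimiter (or the whole input if none).
        records.add(new String(data, start, data.length - start, StandardCharsets.UTF_8));
        return records;
    }

    public static void main(String[] args) {
        // With delimiter "##", "a##b##c" yields three records.
        System.out.println(split("a##b##c", "##"));  // prints [a, b, c]
    }
}
```

In the real class, the record's key would additionally carry the byte offset at which each record starts.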
There is not much else to the class itself: to customize input handling, you follow the same pattern, subclassing FileInputFormat and overriding these same two methods.
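As a sketch of that pattern (all names below are hypothetical and invented for illustration; compiling it requires the Hadoop client libraries on the classpath, and it reuses only constructors shown in the source above), a custom format might hard-code a "##" record delimiter and disable splitting so each file is read by a single mapper:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

/**
 * Hypothetical custom input format: like TextInputFormat, but the record
 * delimiter is fixed to "##" and files are never split. Only the two usual
 * extension points are overridden.
 */
public class DoubleHashInputFormat extends FileInputFormat<LongWritable, Text> {

  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;  // process each file as a whole, in one mapper
  }

  @Override
  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    reporter.setStatus(split.toString());
    // Same delimiter mechanism as TextInputFormat, but hard-coded here.
    byte[] delimiter = "##".getBytes(StandardCharsets.UTF_8);
    return new LineRecordReader(job, (FileSplit) split, delimiter);
  }
}
```

A job would then select it with `job.setInputFormat(DoubleHashInputFormat.class)`.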
Summary

TextInputFormat is Hadoop's InputFormat for plain text: it emits (byte offset, line) pairs, supports a configurable record delimiter encoded as UTF-8, and allows splitting only for uncompressed files or splittable codecs. A custom input format follows the same pattern by overriding isSplitable and getRecordReader.