What Hadoop's TextInputFormat Does, and How to Implement a Custom One
Code first. Without further ado, here is the source of TextInputFormat:
```java
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.mapred;

import java.io.*;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.*;

import com.google.common.base.Charsets;

/**
 * An {@link InputFormat} for plain text files.  Files are broken into lines.
 * Either linefeed or carriage-return are used to signal end of line.  Keys are
 * the position in the file, and values are the line of text.
 */
@InterfaceAudience.Public
@InterfaceStability.Stable
public class TextInputFormat extends FileInputFormat<LongWritable, Text>
  implements JobConfigurable {

  private CompressionCodecFactory compressionCodecs = null;

  public void configure(JobConf conf) {
    compressionCodecs = new CompressionCodecFactory(conf);
  }

  protected boolean isSplitable(FileSystem fs, Path file) {
    final CompressionCodec codec = compressionCodecs.getCodec(file);
    if (null == codec) {
      return true;
    }
    return codec instanceof SplittableCompressionCodec;
  }

  public RecordReader<LongWritable, Text> getRecordReader(
                                          InputSplit genericSplit, JobConf job,
                                          Reporter reporter)
    throws IOException {
    reporter.setStatus(genericSplit.toString());
    String delimiter = job.get("textinputformat.record.delimiter");
    byte[] recordDelimiterBytes = null;
    if (null != delimiter) {
      recordDelimiterBytes = delimiter.getBytes(Charsets.UTF_8);
    }
    return new LineRecordReader(job, (FileSplit) genericSplit,
                                recordDelimiterBytes);
  }
}
```

First, look at the class Javadoc:
This is an {@link InputFormat} for plain text files. Files are broken into lines, and either a linefeed or a carriage return signals the end of a line. The key is the line's byte offset within the file, and the value is the line of text. More generally, an InputFormat describes the format of a job's input data.
TextInputFormat overrides its parent class FileInputFormat's isSplitable method (a file is splittable only if it is uncompressed or uses a SplittableCompressionCodec) and implements getRecordReader, which returns a LineRecordReader.
If a custom record delimiter is set via the "textinputformat.record.delimiter" property, it is converted to bytes using the Charsets.UTF_8 charset before being handed to the LineRecordReader.
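To make that concrete, here is a minimal pure-Java sketch (no Hadoop dependency; `DelimiterSplitDemo` and its `split` method are made-up names for illustration) of what the reader conceptually does with a custom delimiter: encode it as UTF-8 bytes and cut the input stream into records wherever those bytes occur.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class DelimiterSplitDemo {

    // Split the input on the delimiter's UTF-8 bytes, returning records in order.
    // This mimics, in spirit, what LineRecordReader does when
    // "textinputformat.record.delimiter" is set.
    static List<String> split(String input, String delimiter) {
        byte[] data = input.getBytes(StandardCharsets.UTF_8);
        byte[] delim = delimiter.getBytes(StandardCharsets.UTF_8);
        List<String> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i + delim.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < delim.length; j++) {
                if (data[i + j] != delim[j]) { match = false; break; }
            }
            if (match) {
                // Emit the record between the previous delimiter and this one.
                records.add(new String(data, start, i - start, StandardCharsets.UTF_8));
                start = i + delim.length;
                i += delim.length - 1;  // skip past the delimiter bytes
            }
        }
        // Trailing record after the last delimiter (or the whole input if none).
        records.add(new String(data, start, data.length - start, StandardCharsets.UTF_8));
        return records;
    }

    public static void main(String[] args) {
        // With delimiter "##", "a##b##c" yields three records.
        System.out.println(split("a##b##c", "##"));  // prints [a, b, c]
    }
}
```

In the real class, the record's key would additionally carry the byte offset at which each record starts.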
There is not much else to the class itself: to customize input handling, you follow the same pattern, subclassing FileInputFormat and overriding these same two methods.
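As a sketch of that pattern (all names below are hypothetical and invented for illustration; compiling it requires the Hadoop client libraries on the classpath, and it reuses only constructors shown in the source above), a custom format might hard-code a "##" record delimiter and disable splitting so each file is read by a single mapper:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

/**
 * Hypothetical custom input format: like TextInputFormat, but the record
 * delimiter is fixed to "##" and files are never split. Only the two usual
 * extension points are overridden.
 */
public class DoubleHashInputFormat extends FileInputFormat<LongWritable, Text> {

  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;  // process each file as a whole, in one mapper
  }

  @Override
  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    reporter.setStatus(split.toString());
    // Same delimiter mechanism as TextInputFormat, but hard-coded here.
    byte[] delimiter = "##".getBytes(StandardCharsets.UTF_8);
    return new LineRecordReader(job, (FileSplit) split, delimiter);
  }
}
```

A job would then select it with `job.setInputFormat(DoubleHashInputFormat.class)`.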
Summary

TextInputFormat is Hadoop's InputFormat for plain text: it emits (byte offset, line) pairs, supports a configurable record delimiter encoded as UTF-8, and allows splitting only for uncompressed files or splittable codecs. A custom input format follows the same pattern by overriding isSplitable and getRecordReader.