Running a MapReduce Program with Hadoop + Eclipse
Previously, we set up a MapReduce development environment based on Hadoop + Eclipse by installing the Hadoop plugin in Eclipse. Now we will run a MapReduce program in this Hadoop + Eclipse environment.
1. Create a New MapReduce Project
Choose [File] -> [New] -> [Project], select [Map/Reduce Project], click Next, set the project name to WordCount, and finish.
Under the WordCount project, create a new class named WordCount; its content is the WordCount.java program (reproduced below).
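This is the classic WordCount from the Hadoop examples; in fact, the word-count results at the end of this post are exactly the tokens of this program. Here it is for reference, with whitespace normalized and a few comments added by me:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Mapper: splits each input line into tokens and emits (token, 1).
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts for each token.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}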
2. Set Up the Input File on HDFS
hadoop fs -mkdir input
hadoop fs -copyFromLocal WordCount.txt input
I copied the contents of the WordCount.java source file into WordCount.txt and uploaded it to input as the program's input.
3. Configure the Run Arguments in Eclipse
對(duì)本項(xiàng)目右鍵->【run】->【Run Configurations】,單擊中間的Arguments,并設(shè)置輸入輸出參數(shù)。在Program arguments欄中輸入:
In this article, master resolves to localhost, so master can be replaced with localhost.
Click [Run], or right-click the project -> [Run As] -> [Run on Hadoop], to run the MapReduce program.
4. View the Results
After clicking [Run], the Eclipse console window shows the status of the running MapReduce job:
15/11/23 17:47:06 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/11/23 17:47:07 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
15/11/23 17:47:07 INFO input.FileInputFormat: Total input paths to process : 1
15/11/23 17:47:07 INFO mapred.JobClient: Running job: job_local_0001
15/11/23 17:47:07 INFO input.FileInputFormat: Total input paths to process : 1
15/11/23 17:47:07 INFO mapred.MapTask: io.sort.mb = 100
15/11/23 17:47:09 INFO mapred.JobClient:  map 0% reduce 0%
15/11/23 17:47:11 INFO mapred.MapTask: data buffer = 79691776/99614720
15/11/23 17:47:11 INFO mapred.MapTask: record buffer = 262144/327680
15/11/23 17:47:11 INFO mapred.MapTask: Starting flush of map output
15/11/23 17:47:12 INFO mapred.MapTask: Finished spill 0
15/11/23 17:47:12 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
15/11/23 17:47:12 INFO mapred.LocalJobRunner:
15/11/23 17:47:12 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
15/11/23 17:47:12 INFO mapred.LocalJobRunner:
15/11/23 17:47:12 INFO mapred.Merger: Merging 1 sorted segments
15/11/23 17:47:12 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 2723 bytes
15/11/23 17:47:12 INFO mapred.LocalJobRunner:
15/11/23 17:47:12 INFO mapred.JobClient:  map 100% reduce 0%
15/11/23 17:47:13 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
15/11/23 17:47:13 INFO mapred.LocalJobRunner:
15/11/23 17:47:13 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
15/11/23 17:47:13 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://master:9000/user/abc/output4
15/11/23 17:47:13 INFO mapred.LocalJobRunner: reduce > reduce
15/11/23 17:47:13 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
15/11/23 17:47:13 INFO mapred.JobClient:  map 100% reduce 100%
15/11/23 17:47:13 INFO mapred.JobClient: Job complete: job_local_0001
15/11/23 17:47:13 INFO mapred.JobClient: Counters: 14
15/11/23 17:47:13 INFO mapred.JobClient:   FileSystemCounters
15/11/23 17:47:13 INFO mapred.JobClient:     FILE_BYTES_READ=35973
15/11/23 17:47:13 INFO mapred.JobClient:     HDFS_BYTES_READ=4938
15/11/23 17:47:13 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=72594
15/11/23 17:47:13 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2266
15/11/23 17:47:13 INFO mapred.JobClient:   Map-Reduce Framework
15/11/23 17:47:13 INFO mapred.JobClient:     Reduce input groups=0
15/11/23 17:47:13 INFO mapred.JobClient:     Combine output records=114
15/11/23 17:47:13 INFO mapred.JobClient:     Map input records=119
15/11/23 17:47:13 INFO mapred.JobClient:     Reduce shuffle bytes=0
15/11/23 17:47:13 INFO mapred.JobClient:     Reduce output records=0
15/11/23 17:47:13 INFO mapred.JobClient:     Spilled Records=228
15/11/23 17:47:13 INFO mapred.JobClient:     Map output bytes=3100
15/11/23 17:47:13 INFO mapred.JobClient:     Combine input records=176
15/11/23 17:47:13 INFO mapred.JobClient:     Map output records=176
15/11/23 17:47:13 INFO mapred.JobClient:     Reduce input records=114
Under DFS Locations in the top left of Eclipse, you can browse the input and output; the file part-r-00000 under output is the result file produced by the WordCount program.
In my case there are multiple output directories because I ran the job several times.
You can also view the results from the terminal:
hadoop fs -cat output/*
Alternatively, you can view the results through the web interface by opening master:50070 in a browser.
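If you would rather read the output from Java instead of the shell, here is a minimal sketch using the HDFS FileSystem API, equivalent to hadoop fs -cat output/*. The class name CatOutput is mine, and the namenode address hdfs://master:9000 is taken from the job log above; adjust both to your environment:

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CatOutput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed namenode address, taken from the job log above.
        conf.set("fs.default.name", "hdfs://master:9000");
        FileSystem fs = FileSystem.get(conf);
        // "output" is resolved relative to the user's HDFS home directory,
        // just like in the hadoop fs -cat command.
        for (FileStatus status : fs.listStatus(new Path("output"))) {
            if (status.isDir()) {
                continue; // skip subdirectories such as _logs
            }
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(fs.open(status.getPath())));
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // each line is a "word<TAB>count" pair
            }
            reader.close();
        }
        fs.close();
    }
}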
This run of the WordCount program produced the following results:
!=	1
"word	1
(IntWritable	1
(itr.hasMoreTokens())	1
(otherArgs.length	1
+=	1
0	1
0;	1
1);	1
2)	1
:	2
<in>	1
<out>");	1
=	8
?	1
Configuration();	1
Context	1
Exception	1
GenericOptionsParser(conf,	1
IOException,	2
IntSumReducer	1
IntWritable	2
IntWritable();	1
IntWritable(1);	1
IntWritable>{	1
InterruptedException	2
Iterable<IntWritable>	1
Job(conf,	1
Mapper<Object,	1
Path(otherArgs[0]));	1
Path(otherArgs[1]));	1
Reducer<Text,IntWritable,Text,IntWritable>	1
StringTokenizer(value.toString());	1
Text	2
Text();	1
Text,	2
TokenizerMapper	1
WordCount	1
args)	1
args).getRemainingArgs();	1
class	3
conf	1
context)	2
count");	1
extends	2
final	1
import	12
itr	1
java.io.IOException;	1
java.util.StringTokenizer;	1
job	1
key,	2
main(String[]	1
map(Object	1
new	9
one	1
one);	1
org.apache.hadoop.conf.Configuration;	1
org.apache.hadoop.fs.Path;	1
org.apache.hadoop.io.IntWritable;	1
org.apache.hadoop.io.Text;	1
org.apache.hadoop.mapreduce.Job;	1
org.apache.hadoop.mapreduce.Mapper;	1
org.apache.hadoop.mapreduce.Reducer;	1
org.apache.hadoop.mapreduce.lib.input.FileInputFormat;	1
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;	1
org.apache.hadoop.util.GenericOptionsParser;	1
otherArgs	1
public	4
reduce(Text	1
result	1
result);	1
static	4
sum	1
throws	3
val	1
val.get();	1
value,	1
values)	1
values,Context	1
void	3
word	1
wordcount	1
{	8
}	4
Configuration	1
FileInputFormat.addInputPath(job,	1
FileOutputFormat.setOutputPath(job,	1
Job	1
String[]	1
System.exit(job.waitForCompletion(true)	1
if	1
job.setCombinerClass(IntSumReducer.class);	1
job.setJarByClass(WordCount.class);	1
job.setMapperClass(TokenizerMapper.class);	1
job.setOutputKeyClass(Text.class);	1
job.setOutputValueClass(IntWritable.class);	1
job.setReducerClass(IntSumReducer.class);	1
private	3
public	2
}	3
StringTokenizer	1
System.err.println("Usage:	1
System.exit(2);	1
context.write(key,	1
for	1
int	1
result.set(sum);	1
}	1
sum	1
while	1
}	1
context.write(word,	1
word.set(itr.nextToken());	1