A Simple MapReduce Project: Counting Word Occurrences in a File

Published: 2025/3/21

The task is to count how many times each word occurs in a file. The exercise is shown in the figure below.

1. Create the input file tast that the words are read from, with the following content:

hadoop core map reduce hiv hbase Hbase pig hadoop mapreduce MapReduce Hadoop Hbase spark

2. The processing flow is shown in the diagram below:

As the diagram shows, only the Mapping and Reducing steps have to be written by hand; every other step is handled by the Map/Reduce framework.
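To make the division of labor concrete, here is a minimal sketch in plain Java, without Hadoop, that imitates the flow on a single line of text. The class name WordCountFlowSketch, its method names, and the sample line are hypothetical and chosen only for illustration; the shuffle method stands in for the grouping by key that the framework performs between the two hand-written steps.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountFlowSketch {

    // Mapping (hand-written in the real job): emit a (word, 1) pair per token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffling (done by the framework in the real job): collect all values
    // that share a key; the TreeMap keeps the keys in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reducing (hand-written in the real job): sum the values of one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three steps in order and returns word -> count.
    static Map<String, Integer> run(String line) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(map(line)).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample line, not the contents of the tast file.
        System.out.println(run("pig hadoop pig")); // prints {hadoop=1, pig=2}
    }
}
```

In the real job the pairs travel between machines and the grouping happens on the cluster, so this only mirrors the logic, not the distribution.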

So we start by writing the code for the Mapping step.

Create WcMapper.java

package com.all58.mr;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WcMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Reused for every emitted pair: the constant count 1 and the current word.
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    /**
     * Each call to map receives one line of data from the split.
     * key: the position (offset) of that line within the file
     * value: the content of that line
     */
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // Split the line on whitespace and emit (word, 1) for each token.
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one); // map output
        }
    }
}
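The map method relies on StringTokenizer with its default delimiters, so it helps to see exactly which tokens that produces. The following is a small sketch in plain Java; the class name TokenizeSketch and the sample line are hypothetical and are not part of the project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizeSketch {

    // Splits a line the way the map method does: on the default
    // StringTokenizer delimiters (space, tab, newline, carriage return, form feed).
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(itr.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample line: double space, a tab, and a trailing comma.
        System.out.println(tokens("Hbase  hbase\tpig, pig"));
    }
}
```

For this sample the method returns four tokens: Hbase, hbase, "pig," and pig. Runs of whitespace are collapsed, but case and punctuation are preserved, which is why words that differ only in capitalization are counted as separate keys.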

Create WcReducer.java

package com.all58.mr;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WcReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Reused to hold the total for the current key.
    private IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> iter,
            Context context) throws IOException, InterruptedException {
        // Add up all the counts emitted for this word.
        int sum = 0;
        for (IntWritable value : iter) {
            sum += value.get();
        }
        result.set(sum);
        context.write(key, result); // reduce output: (word, total)
    }
}

That completes the computation logic. Next, write the program that runs the Job.

Create JobRun.java

package com.all58.mr;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JobRun {

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Address of the JobTracker that the job is submitted to.
        conf.set("mapred.job.tracker", "node1:9001");
        try {
            Job job = new Job(conf);
            job.setJarByClass(JobRun.class);
            job.setMapperClass(WcMapper.class);
            job.setReducerClass(WcReducer.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);
            //job.setNumReduceTasks(1); // set the number of reduce tasks
            // Directory or file that holds the MapReduce input data
            FileInputFormat.addInputPath(job, new Path("/opt/hadoop-1.2/mapred/xiaoming"));
            // Directory for the output data written once the job has run
            FileOutputFormat.setOutputPath(job, new Path("/opt/hadoop-1.2/mapred/xiaoming/output"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Running the job

1. Export the project from Eclipse as a jar named wc.jar and upload it to the node1 server with scp.

2. On the node1 server, change to ~/hadoop-1.2.1/bin and run the command:

./hadoop jar ~/wc.jar com.all58.mr.JobRun

When the job finishes, the output looks like the figure below.

Open Eclipse to view the result.

Contents of part-r-00000:

Hadoop 1

Hbase 2

MapReduce 1

core 1

hadoop 1

hbase 1

hiv 1

map 1

mapreduce 1

pig 1

reduce 1

spark 1

hadoop 1
