MapReduce Programming Practice: Aggregating Attributes of Objects
Table of Contents
- 1. Generating the Data
- 2. Writing the Entity Class
- 3. The Mapper Class
- 4. The Reducer Class
- 5. The Driver Class
- 6. Running the Job
Reference book: 《Hadoop大数据原理与应用》 (Principles and Applications of Hadoop Big Data)
Related article: MapReduce Programming Practice
1. Generating the Data
Supermarket consumer data, one record per purchase: id, timestamp, amount spent, member/non-member (会员/非会员).
Generate synthetic data with Python:
```python
import random
import time

consumer_type = ['会员', '非会员']
vip, novip = 0, 0
vipValue, novipValue = 0, 0

with open("consumer.txt", 'w', encoding='utf-8') as f:
    for i in range(1000):
        random.seed(time.time() + i)
        id = random.randint(0, 10000)
        t = time.strftime('%Y-%m-%d %H:%M', time.localtime(time.time()))
        value = random.randint(1, 500)
        type = consumer_type[random.randint(0, 1)]
        # One tab-separated record: id, time, amount, membership type
        f.write(str(id) + '\t' + t + '\t' + str(value) + '\t' + type + '\n')
        if type == consumer_type[0]:
            vip += 1
            vipValue += value
        else:
            novip += 1
            novipValue += value

print(consumer_type[0] + ": 人数 " + str(vip) + ", 总金额: " + str(vipValue) + ", 平均金额: " + str(vipValue / vip))
print(consumer_type[1] + ": 人数 " + str(novip) + ", 总金额: " + str(novipValue) + ", 平均金额: " + str(novipValue / novip))
```
```
[dnn@master HDFS_example]$ python test.py
会员: 人数 510, 总金额: 128744, 平均金额: 252.439215686
非会员: 人数 490, 总金额: 123249, 平均金额: 251.528571429
```
2. Writing the Entity Class
```java
package com.michael.mapreduce;

import org.apache.hadoop.io.Writable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class Consumer implements Writable {
    private String id;
    private int money;
    private int vip;

    public Consumer() {}

    public Consumer(String id, int money, int vip) {
        this.id = id;
        this.money = money;
        this.vip = vip;
    }

    public int getMoney() { return money; }

    // Fixed: the original setter took no parameter, so the assignment was a no-op
    public void setMoney(int money) { this.money = money; }

    public String getId() { return id; }

    public void setId(String id) { this.id = id; }

    public int getVip() { return vip; }

    public void setVip(int vip) { this.vip = vip; }

    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeUTF(id);
        dataOutput.writeInt(money);
        dataOutput.writeInt(vip);
    }

    @Override
    public void readFields(DataInput dataInput) throws IOException {
        // Field order must match write() exactly
        this.id = dataInput.readUTF();
        this.money = dataInput.readInt();
        this.vip = dataInput.readInt();
    }

    @Override
    public String toString() {
        return this.id + "\t" + this.money + "\t" + this.vip;
    }
}
```
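Because `readFields` must consume bytes in exactly the order `write` produced them, the field order in the two methods has to match. The round trip can be sketched in Python with `struct`; the 2-byte-length-plus-UTF-8 layout below only approximates Java's `writeUTF` (which actually uses modified UTF-8), so treat it as an illustration of the idea, not Hadoop's exact wire format:

```python
import struct

def write_consumer(id_, money, vip):
    """Serialize fields in a fixed order, like Consumer.write(DataOutput)."""
    id_bytes = id_.encode('utf-8')
    # 2-byte big-endian length prefix + bytes (approximates writeUTF),
    # then two 4-byte big-endian ints (like writeInt)
    return struct.pack('>H', len(id_bytes)) + id_bytes + struct.pack('>ii', money, vip)

def read_consumer(buf):
    """Deserialize in the same field order, like Consumer.readFields(DataInput)."""
    (n,) = struct.unpack_from('>H', buf, 0)
    id_ = buf[2:2 + n].decode('utf-8')
    money, vip = struct.unpack_from('>ii', buf, 2 + n)
    return id_, money, vip

buf = write_consumer('4217', 385, 1)
assert read_consumer(buf) == ('4217', 385, 1)
```

Swapping the order of `readUTF`/`readInt` calls relative to the writes would silently corrupt every field, which is the main pitfall this symmetry guards against.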
3. The Mapper Class
```java
package com.michael.mapreduce;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class ConsumerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] fields = line.split("\t");
        String id = fields[0];
        int money = Integer.parseInt(fields[2]);
        String vip = fields[3];
        // Emit (membership type, amount) so the reducer can average per group
        context.write(new Text(vip), new IntWritable(money));
    }
}
```
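The mapper's per-line logic can be checked outside Hadoop. A minimal Python sketch of the same splitting and field extraction (the sample record is made up, in the same `id\ttime\tamount\ttype` layout the generator writes):

```python
def consumer_map(line):
    """Mirror ConsumerMapper.map: split a TSV record, emit (membership, amount)."""
    fields = line.split('\t')
    id_ = fields[0]         # parsed but unused by this job, as in the Java code
    money = int(fields[2])  # amount spent
    vip = fields[3]         # 会员 / 非会员
    return (vip, money)

print(consumer_map('4217\t2021-03-01 10:15\t385\t会员'))  # → ('会员', 385)
```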
4. The Reducer Class
```java
package com.michael.mapreduce;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class ConsumerReducer extends Reducer<Text, IntWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int count = 0;
        long sum = 0;
        for (IntWritable v : values) {
            count++;
            sum += v.get();
        }
        long avg = sum / count;  // long division truncates the average
        context.write(key, new LongWritable(avg));
    }
}
```
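The reduce step is just a count-and-sum over the amounts grouped under one key. A Python sketch of the same logic; note that Java's `sum / count` on `long` values truncates, which `//` mirrors here (the sample values are invented):

```python
def consumer_reduce(key, values):
    """Mirror ConsumerReducer.reduce: count, sum, and integer-average the amounts."""
    count = 0
    total = 0
    for v in values:
        count += 1
        total += v
    return (key, total // count)  # // truncates, like Java long division

print(consumer_reduce('会员', [100, 200, 350]))  # → ('会员', 216)
```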
5. The Driver Class
```java
package com.michael.mapreduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ConsumerDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(ConsumerDriver.class);
        job.setMapperClass(ConsumerMapper.class);
        job.setReducerClass(ConsumerReducer.class);
        // The map output value (IntWritable) differs from the final
        // output value (LongWritable), so both pairs must be set
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
```
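Before packaging a jar, it can help to dry-run the whole map, shuffle, and reduce flow locally. This Python sketch only imitates what the driver wires together (Hadoop's shuffle grouping is emulated with a dict); the sample records are invented:

```python
from collections import defaultdict

def run_job(lines):
    """Locally simulate map -> shuffle -> reduce over TSV consumer records."""
    # Map: one (membership, amount) pair per record
    pairs = []
    for line in lines:
        fields = line.split('\t')
        pairs.append((fields[3], int(fields[2])))
    # Shuffle: group the amounts by membership key
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    # Reduce: truncated integer average per key
    return {k: sum(vs) // len(vs) for k, vs in groups.items()}

records = [
    '1\t2021-03-01 10:15\t300\t会员',
    '2\t2021-03-01 10:16\t205\t会员',
    '3\t2021-03-01 10:17\t120\t非会员',
]
print(run_job(records))  # → {'会员': 252, '非会员': 120}
```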
6. Running the Job
```
[dnn@master Desktop]$ hadoop dfs -copyFromLocal /home/dnn/eclipse-workspace/HDFS_example/consumer.txt /InputDataTest
```
- Export the jar and run it from the bash command line:
```
hadoop jar /home/dnn/eclipse-workspace/HDFS_example/consumer_avg.jar com.michael.mapreduce.ConsumerDriver /InputDataTest/consumer.txt /OutputTest2
```
```
[dnn@master Desktop]$ hadoop fs -cat /OutputTest2/part-r-00000
会员	252
非会员	251
```
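The Python generator reported averages of roughly 252.44 and 251.53, while the job prints 252 and 251, because the reducer's `sum / count` is Java long division and truncates. The totals and counts printed by the generator reproduce the job output exactly (figures taken from the runs above):

```python
vip_total, vip_count = 128744, 510      # 会员, from the generator's output
novip_total, novip_count = 123249, 490  # 非会员, from the generator's output

print(vip_total // vip_count)      # → 252, matches the 会员 row
print(novip_total // novip_count)  # → 251, matches the 非会员 row
```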