Hadoop Distributed Cache
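The Hadoop distributed cache lets every task of a MapReduce job read the same shared configuration file. The file is first placed in HDFS; when the job is configured to use it, the file is downloaded to the local disk while the job runs. The configuration looks like this: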
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// The class name is assumed; the original post shows only main().
public class OBDDataSelectDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException,
            InterruptedException, URISyntaxException {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://192.168.1.45:9000");
        FileSystem fs = FileSystem.get(conf);
        // Recursively delete an old directory.
        // Note: this path differs from the output path set further down.
        fs.delete(new Path("CASICJNJP/gongda/Test_gd20140104"), true);

        conf.set("mapred.job.tracker", "192.168.1.45:9001");
        conf.set("mapred.jar", "/home/hadoop/workspace/jar/OBDDataSelectWithImeiTxt.jar");
        Job job = new Job(conf, "myTaxiAnalyze");

        // Enable symlinks to cached files in the task working directory.
        DistributedCache.createSymlink(job.getConfiguration());
        // Register the HDFS file as a distributed cache file.
        DistributedCache.addCacheFile(
                new URI("/user/hadoop/CASICJNJP/DistributeFiles/imei.txt"),
                job.getConfiguration());

        job.setMapperClass(OBDDataSelectMaper.class);
        job.setReducerClass(OBDDataSelectReducer.class);
        //job.setNumReduceTasks(10);
        //job.setCombinerClass(IntSumReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path("/user/hadoop/CASICJNJP/SortedData/20140104"));
        FileOutputFormat.setOutputPath(job, new Path("CASICJNJP/gongda/SelectedData"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
In the code above, the DistributedCache.addCacheFile call registers the HDFS file /user/hadoop/CASICJNJP/DistributeFiles/imei.txt as a distributed cache file.
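In DistributedCache, the part of a cache URI after `#` names the symlink created in the task's working directory. The URI in the driver above carries no fragment. The sketch below uses only java.net.URI to show how the path and fragment of such a URI separate. The class CacheUriSketch, the method linkName, and the `#imei` suffix are hypothetical, and falling back to the file name is an assumption of this sketch, not a statement about what Hadoop does.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Illustrative helper; not part of Hadoop. */
class CacheUriSketch {

    /** Returns the URI fragment if present, otherwise the last path segment. */
    static String linkName(URI uri) {
        String fragment = uri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        // Fallback chosen for this sketch only.
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) throws URISyntaxException {
        String base = "/user/hadoop/CASICJNJP/DistributeFiles/imei.txt";
        URI plain = new URI(base);
        URI named = new URI(base + "#imei"); // "#imei" is a hypothetical link name

        System.out.println(plain.getPath() + " -> " + linkName(plain));
        System.out.println(named.getPath() + " -> " + linkName(named));
    }
}
```

With a fragment in place, a task could open the cached file by that short relative name instead of looking up its local path.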
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class OBDDataSelectMaper extends Mapper<Object, Text, Text, Text> {
    String[] strs;
    String[] ImeiTimes;
    String timei;
    String time;
    // IDs loaded from the cached file. Integer assumes every ID fits in an int;
    // a full 15-digit IMEI would need Long.
    private List<Integer> ImeiList = new ArrayList<Integer>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        try {
            // Local paths of the files registered with DistributedCache.addCacheFile.
            Path[] cacheFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
            if (cacheFiles != null && cacheFiles.length > 0) {
                String line;
                BufferedReader br = new BufferedReader(new FileReader(cacheFiles[0].toString()));
                try {
                    // The first line is read and discarded (presumably a header row);
                    // remove this call if imei.txt has no header.
                    line = br.readLine();
                    while ((line = br.readLine()) != null) {
                        ImeiList.add(Integer.parseInt(line));
                    }
                } finally {
                    br.close();
                }
            }
        } catch (IOException e) {
            System.err.println("Exception reading DistributedCache: " + e);
        }
    }

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        try {
            // Records are tab-separated; the first field has the form <imei>_<time>.
            strs = value.toString().split("\t");
            ImeiTimes = strs[0].split("_");
            timei = ImeiTimes[0];
            // Emit only records whose ID appears in the cached list.
            if (ImeiList.contains(Integer.parseInt(timei))) {
                context.write(new Text(strs[0]), value);
            }
        } catch (Exception ex) {
            // Malformed records are skipped silently.
        }
    }
}
In the mapper above, the setup method loads the distributed cache file before any records are processed.
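The load-once, filter-per-record logic can be tried without a cluster. The following is a minimal sketch under stated assumptions, not the mapper itself: the ID file holds one number per line with no header row, record keys have the form `<id>_<time>` followed by a tab, and the class ImeiFilterSketch and its methods loadIds and accept are hypothetical names. It also swaps the ArrayList for a HashSet, so each lookup no longer scans the whole list, and parses IDs as Long.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hadoop-free sketch of the mapper's load-once, filter-per-record logic. */
class ImeiFilterSketch {

    /** Reads one numeric ID per line; blank lines are ignored. */
    static Set<Long> loadIds(Reader source) throws IOException {
        Set<Long> ids = new HashSet<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    ids.add(Long.parseLong(line));
                }
            }
        }
        return ids;
    }

    /** Returns true when the record key "<id>_<time>" starts with a known ID. */
    static boolean accept(String record, Set<Long> ids) {
        String key = record.split("\t", 2)[0];
        String idPart = key.split("_", 2)[0];
        try {
            return ids.contains(Long.parseLong(idPart));
        } catch (NumberFormatException e) {
            return false; // malformed key: drop the record
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the cached file contents.
        Set<Long> ids = loadIds(new StringReader("1001\n1002\n"));
        List<String> kept = new ArrayList<>();
        for (String rec : new String[] {"1001_0800\ta", "2000_0801\tb", "bad\tc"}) {
            if (accept(rec, ids)) {
                kept.add(rec);
            }
        }
        System.out.println(kept.size() + " record(s) kept");
    }
}
```

In a real mapper, loadIds would be called once from setup with a reader over the local cache file, and accept would guard the context.write call in map.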
Reposted from: https://www.cnblogs.com/mfryf/p/5360306.html