MapReduce Multi-Table Join

Published: 2025/3/20


Multi-Table Join

A multi-table join is similar to a single-table join: it also processes the raw data in order to extract the information of interest. The example follows.

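1 Example description

The input consists of two files. One represents the factory table, with a factory name column and an address ID column. The other represents the address table, with an address name column and an address ID column. The task is to find the correspondence between factory names and address names in the input data and to output a "factory name, address name" table.

Sample input is shown below.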

1) factory:

factoryname                   addressID
Beijing Red Star              1
Shenzhen Thunder              3
Guangzhou Honda               2
Beijing Rising                1
Guangzhou Development Bank    2
Tencent                       3
Back of Beijing               1

2) address:

addressID    addressname
1            Beijing
2            Guangzhou
3            Shenzhen
4            Xian
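The sample rows mix multi-word factory names with a numeric ID, so splitting a line on a single space does not yield exactly two fields. The following is a minimal sketch, not the original author's code, of one way to tell the two tables apart: treat the numeric token as the join key and everything else as the name. The class `LineTagger` and the method `tag` are hypothetical names introduced for illustration, and the sketch assumes that names never begin or end with a token that starts with a digit.

```java
class LineTagger {
    /**
     * Returns {joinKey, taggedValue} for a data line, or null for header,
     * blank, or unrecognized lines. Tag "1" marks the factory (left) table
     * and tag "2" marks the address (right) table.
     */
    static String[] tag(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.contains("factoryname")
                || trimmed.contains("addressID")) {
            return null;
        }
        String[] tokens = trimmed.split("\\s+");
        if (tokens.length < 2) {
            return null;
        }
        String first = tokens[0];
        String last = tokens[tokens.length - 1];
        if (Character.isDigit(last.charAt(0))) {
            // Factory row: the name comes first, the address ID is the last token.
            String name = trimmed.substring(0, trimmed.length() - last.length()).trim();
            return new String[] {last, "1+" + name};
        }
        if (Character.isDigit(first.charAt(0))) {
            // Address row: the address ID is the first token, the name follows.
            String name = trimmed.substring(first.length()).trim();
            return new String[] {first, "2+" + name};
        }
        return null;
    }

    public static void main(String[] args) {
        String[] factoryRow = tag("Beijing Red Star              1");
        String[] addressRow = tag("3            Shenzhen");
        System.out.println(factoryRow[0] + " -> " + factoryRow[1]);
        System.out.println(addressRow[0] + " -> " + addressRow[1]);
    }
}
```

Keeping the tag as a one-character prefix followed by "+" means the reducer can recover the table from the first character and the name from the substring starting at index 2.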

Sample output is shown below.

factoryname                   addressname
Back of Beijing               Beijing
Beijing Red Star              Beijing
Beijing Rising                Beijing
Guangzhou Development Bank    Guangzhou
Guangzhou Honda               Guangzhou
Shenzhen Thunder              Shenzhen
Tencent                       Shenzhen

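2 Design approach

Multi-table join and single-table join are similar, and both resemble a natural join in a database. Compared with a single-table join, the left table, the right table, and the join column are more clearly defined in a multi-table join, so the same approach applies. The map phase identifies which table an input line belongs to, splits the line, stores the join column value in the key, stores the other column together with a left/right table tag in the value, and emits the pair. The reduce phase receives all values that share a join key, parses each value, separates the left-table and right-table contents according to the tag, computes their Cartesian product, and writes the result directly.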
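The reduce-side logic described above can be tried without a Hadoop cluster. The following is a minimal in-memory sketch under stated assumptions: a `TreeMap` stands in for the shuffle that groups values by join key, and the tagged values use the "1+name" / "2+name" format. The class `ReduceSideJoinSketch` and the method `join` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReduceSideJoinSketch {
    /** Joins tagged {key, value} pairs and returns "factory<TAB>address" rows. */
    static List<String> join(List<String[]> mapOutput) {
        // Stand-in for the shuffle: group values by join key, keys in sorted order.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : mapOutput) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }

        List<String> rows = new ArrayList<>();
        for (List<String> values : grouped.values()) {
            // Separate left-table and right-table values by their tag.
            List<String> factories = new ArrayList<>();
            List<String> addresses = new ArrayList<>();
            for (String v : values) {
                if (v.startsWith("1+")) {
                    factories.add(v.substring(2));
                } else if (v.startsWith("2+")) {
                    addresses.add(v.substring(2));
                }
            }
            // Cartesian product; a key with values from only one table yields nothing.
            for (String f : factories) {
                for (String a : addresses) {
                    rows.add(f + "\t" + a);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> mapOutput = Arrays.asList(
                new String[] {"1", "1+Beijing Red Star"},
                new String[] {"3", "1+Tencent"},
                new String[] {"1", "2+Beijing"},
                new String[] {"3", "2+Shenzhen"},
                new String[] {"4", "2+Xian"});
        join(mapOutput).forEach(System.out::println);
    }
}
```

In this run, key 4 (Xian) has no factory rows, so it contributes no output, which matches the inner-join behaviour of the sample output.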

For a detailed analysis of this example, refer to the single-table join example. The code is given below.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MTJoin {
    // Guards the one-time header row; this is only reliable with a single reducer.
    public static int time = 0;

    public static class Map extends Mapper<Object, Text, Text, Text> {

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString().trim();
            // Skip blank lines and the header line of either table.
            if (line.isEmpty() || line.contains("factoryname")
                    || line.contains("addressID")) {
                return;
            }

            // Split on runs of whitespace, since factory names may contain spaces.
            String[] split = line.split("\\s+");
            if (split.length < 2) {
                return;
            }
            String first = split[0];
            String last = split[split.length - 1];

            String mapkey;
            String mapvalue;
            String relationType;
            if (last.charAt(0) >= '0' && last.charAt(0) <= '9') {
                // Factory (left) table: the name is followed by the address ID.
                mapkey = last;
                mapvalue = line.substring(0, line.length() - last.length()).trim();
                relationType = "1";
            } else if (first.charAt(0) >= '0' && first.charAt(0) <= '9') {
                // Address (right) table: the address ID is followed by the name.
                mapkey = first;
                mapvalue = line.substring(first.length()).trim();
                relationType = "2";
            } else {
                // The line belongs to neither table; ignore it.
                return;
            }

            // Key: join column. Value: table tag, "+", then the other column.
            context.write(new Text(mapkey), new Text(relationType + "+" + mapvalue));
        }
    }

    public static class Reduce extends Reducer<Text, Text, Text, Text> {

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Write the header row before the first group of results.
            if (0 == time) {
                context.write(new Text("factoryname"), new Text("addressname"));
                time++;
            }

            // Lists instead of fixed-size arrays, so any number of rows per key works.
            List<String> factory = new ArrayList<String>();
            List<String> address = new ArrayList<String>();

            for (Text value : values) {
                String record = value.toString();
                if (record.length() < 2) {
                    continue;
                }

                char relationType = record.charAt(0);
                if ('1' == relationType) {
                    // left table
                    factory.add(record.substring(2));
                } else if ('2' == relationType) {
                    // right table
                    address.add(record.substring(2));
                }
            }

            // Cartesian product of left and right rows that share this address ID.
            for (String factoryName : factory) {
                for (String addressName : address) {
                    context.write(new Text(factoryName), new Text(addressName));
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "MTJoin");
        job.setJarByClass(MTJoin.class);

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        // The static header guard above assumes exactly one reduce task.
        job.setNumReduceTasks(1);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Reposted from: https://www.cnblogs.com/liutoutou/p/3481903.html
