Hbase中的列式表映射到hive的外表
在做數據ETL中,可能原始數據在列式存儲Hbase中,這個時候,如果我們想清洗數據,可以考慮把Hbase表映射為Hive的外表,然后使用Hive的HQL來清除處理數據,具體過程參考下面例子:
步驟
1,創建Hbase表?
2,映射Hive表
步驟一
說明:cf 列簇名,只put了少量測試列?
create ‘cofeed_info’,{NAME => ‘cf’, REPLICATION_SCOPE => 1}?
put ‘cofeed_info’, ‘100001’, ‘cf:id’, ‘101’?
put ‘cofeed_info’, ‘100001’, ‘cf:title’, ‘這是測試用的數據’?
put ‘cofeed_info’, ‘100001’, ‘cf:insert_time’, ‘45679848161564’
步驟二
說明:雖然很多列在Hbase表中當前并沒有,不過沒關系,:key表示rowkey?
CREATE EXTERNAL TABLE cofeed_info?
(?
rowkey string,?
id string,?
title string,?
tourl string,?
content string,?
data_provider string,?
b_class string,?
b_catogory string,?
source string,?
insert_time timestamp,?
dt string?
) STORED BY ‘org.apache.hadoop.hive.hbase.HBaseStorageHandler’ WITH?
SERDEPROPERTIES (“hbase.columns.mapping”=?
“:key,?
cf:id,?
cf:title,?
cf:tourl,?
cf:content,?
cf:data_provider,?
cf:b_class,?
cf:b_catogory,?
cf:source,?
cf:insert_time,?
cf:dt”) TBLPROPERTIES (“hbase.table.name” = “cofeed_info”);
結果
hive> desc cofeed_info;?
OK?
rowkey string from deserializer?
id string from deserializer?
title string from deserializer?
tourl string from deserializer?
content string from deserializer?
data_provider string from deserializer?
b_class string from deserializer?
b_catogory string from deserializer?
source string from deserializer?
insert_time timestamp from deserializer?
dt string from deserializer?
說明:Hbase中沒有的列簇為NULL了.?
hive> select * from cofeed_info;?
OK?
100001 101 這是測試用的數據 NULL NULL NULL NULL NULL NULL NULL NULL
總結
以上是生活随笔為你收集整理的Hbase中的列式表映射到hive的外表的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: Elasticsearch新增一个字段并
- 下一篇: 用户画像2种数据存储的方式