sqoop导出 建表_hive中orc表sqoop导出到mysql
環境:
sqoop版本:1.4.6
hive版本:2.1.1
hadoop版本:2.7.3
起因:
orc表導出到mysql有時碰到空字符串的問題一直沒有時間整理,這次測試一下
結論:
hive中為string類型的空字符串‘’數據導出到mysql中bigint或者int類型字段時會報錯。
空值null沒問題。
過程:
1、創建臨時庫分區orc表
CREATE TABLE temp.js_pianyuan_orc(
dates string,
city_id string,
starttime string,
endtime string,
programid string,
device_id string,
ip string)
PARTITIONED BY (
curdate string)
stored as orc;
2、創建mysql測試表
注意hive中的表字段和mysql中的表字段名字必須一致
CREATE TABLE js_pianyuan(
dates varchar(64) DEFAULT NULL,
city_id varchar(64) DEFAULT NULL,
starttime varchar(64) DEFAULT NULL,
endtime varchar(64) DEFAULT NULL,
programid varchar(64) DEFAULT NULL,
device_id varchar(64) DEFAULT NULL,
ip varchar(64) DEFAULT NULL)
ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;
3、數據檢查
select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (dates is null or trim(dates) = '');
select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (city_id is null or trim(city_id) = '');
select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (starttime is null or trim(starttime) = '');
5040
select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (endtime is null or trim(endtime) = '');
5040
select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (programid is null or trim(programid) = '');
5144
select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (device_id is null or trim(device_id) = '');
select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (ip is null or trim(ip) = '');
最后確認starttime、endtime和programid 有空字符串數據
4、22號日期原數據測試導出
insert overwrite table temp.js_pianyuan_orc partition(curdate)
select * from dm_bas.js_pianyuan where curdate = '20180122';
sqoop export \
--hcatalog-database temp \
--hcatalog-table js_pianyuan_orc \
--hcatalog-partition-keys curdate \
--hcatalog-partition-values 20180122 \
--connect jdbc:mysql://ip:3306/test \
--username username \
--password passwd --m 10\
--table js_pianyuan
結果沒有異常
18/01/24 10:34:58 INFO mapreduce.ExportJobBase: Transferred 178.8203 MB in 1,341.2754 seconds (136.5208 KB/sec)
18/01/24 10:34:58 INFO mapreduce.ExportJobBase: Exported 15558677 records.
5、空字符串改成字符串null測試導出
22號原始數據插入測試表23號
insert overwrite table temp.js_pianyuan_orc partition(curdate='20180123')
select
case when trim(dates) = '' then 'null' else dates end,
case when trim(city_id) = '' then 'null' else city_id end,
case when trim(starttime) = '' then 'null' else starttime end,
case when trim(endtime) = '' then 'null' else endtime end,
case when trim(programid) = '' then 'null' else programid end,
case when trim(device_id) = '' then 'null' else device_id end,
case when trim(ip) = '' then 'null' else ip end
from dm_bas.js_pianyuan where curdate = '20180122';
結果沒有異常
18/01/24 10:48:23 INFO mapreduce.ExportJobBase: Transferred 178.8589 MB in 714.5818 seconds (256.3058 KB/sec)
18/01/24 10:48:23 INFO mapreduce.ExportJobBase: Exported 15558677 records.
6、空字符串改成null測試
22號原始數據插入測試表24號
insert overwrite table temp.js_pianyuan_orc partition(curdate='20180124')
select
case when trim(dates) = '' then null else dates end,
case when trim(city_id) = '' then null else city_id end,
case when trim(starttime) = '' then null else starttime end,
case when trim(endtime) = '' then null else endtime end,
case when trim(programid) = '' then null else programid end,
case when trim(device_id) = '' then null else device_id end,
case when trim(ip) = '' then null else ip end
from dm_bas.js_pianyuan where curdate = '20180122';
結果沒有異常
18/01/24 11:25:53 INFO mapreduce.ExportJobBase: Transferred 178.8346 MB in 1,275.8381 seconds (143.5344 KB/sec)
18/01/24 11:25:53 INFO mapreduce.ExportJobBase: Exported 15558677 records.
7、更改mysql字段類型為bigint測試導出22號含有空字符串原數據
alter table js_pianyuan change programid `programid` bigint(64);
空字符串異常
結果:
hive中的string類型字段有空字符串數據的導出到mysql端varchar字段類型沒問題,但是導出mysql端int或者bigint類型會報異常。這種數據要單獨處理。
8、更改mysql字段類型為bigint測試導出24號含有空值null的原數據
18/01/24 12:35:00 INFO mapreduce.ExportJobBase: Transferred 178.8346 MB in 1,100.6016 seconds (166.3878 KB/sec)
18/01/24 12:35:00 INFO mapreduce.ExportJobBase: Exported 15558677 records.
結果沒有異常。
總結
以上是生活随笔為你收集整理的sqoop导出 建表_hive中orc表sqoop导出到mysql的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: MPLS基本原理讲解
- 下一篇: 万兆以太网是什么?如何测试维护?TFN