
Data Conversion Between DBMS, HDFS, and Hive with Sqoop

Published: 2024/9/27 · Programming Q&A · 豆豆

Compiled by 生活随笔: this article covers moving data between a DBMS, HDFS, and Hive with Sqoop, shared here for reference.

1 Importing and Exporting Data with Sqoop

Put all data tables from the B-line data center into the xxxxx database.

1.1 Import the region code table

bin/sqoop import --connect 'jdbc:mysql://xxx.xxx.xxx.142:3306/db1?useSSL=false' --username root --password 123456 --target-dir /xxxx/xxxx/sys_area --table tb_sys_area -m 1;
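Because the JDBC URL contains `?`, which the shell treats as a glob character, it is safest to quote it. The Python sketch below is only an illustration (not part of Sqoop): a hypothetical helper, `sqoop_import_args`, builds the same `sqoop import` call as an argument list using only the flags from this article, then prints a correctly quoted shell form.

```python
import shlex

def sqoop_import_args(jdbc_url, username, password, table, target_dir,
                      mappers=1, fields_terminated_by=None):
    """Build an argv list for `sqoop import` (hypothetical helper)."""
    args = [
        "bin/sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password", password,
        "--target-dir", target_dir,
        "--table", table,
        "-m", str(mappers),
    ]
    if fields_terminated_by is not None:
        # Passed through verbatim, e.g. the two characters backslash + t
        args += ["--fields-terminated-by", fields_terminated_by]
    return args

args = sqoop_import_args(
    "jdbc:mysql://xxx.xxx.xxx.142:3306/db1?useSSL=false",
    "root", "123456", "tb_sys_area", "/xxxx/xxxx/sys_area",
)
# shlex.join single-quotes any argument containing shell-special characters such as '?'
print(shlex.join(args))
```

Passing the list straight to `subprocess.run(args, check=True)` bypasses the shell entirely, so no quoting is needed there. Sqoop also accepts `-P` (interactive prompt) or `--password-file` instead of a plaintext `--password`.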

After a successful import, the output file is:

/bplan/data-center/sys_area/part-m-00000
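Each line of `part-m-00000` is one MySQL row, with fields joined by the delimiter (',' by default) and SQL NULLs written as the text `null` (Sqoop's default when `--null-string`/`--null-non-string` are not given). A minimal Python reader for such a file, assuming no field itself contains the delimiter; the function name and the sample lines are made up for illustration:

```python
def parse_sqoop_text(lines, columns, delimiter=",", null_token="null"):
    """Turn Sqoop text-import lines into dicts keyed by column name."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(fields)}: {line!r}")
        # Map Sqoop's null placeholder back to Python None
        rows.append({c: (None if v == null_token else v) for c, v in zip(columns, fields)})
    return rows

# Hypothetical sample lines shaped like a 3-column table
sample = ["1,100000,A\n", "2,110000,null\n"]
for row in parse_sqoop_text(sample, ["id", "code", "name"]):
    print(row)
```

To check the real file, copy it out with `hdfs dfs -get` and pass `open("part-m-00000", encoding="utf-8")` as `lines`.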


1.2 Import the industry base data

bin/sqoop import --connect 'jdbc:mysql://xxx.xxx.xxx.142:3306/db1?useSSL=false' --username root --password 123456 --target-dir /bplan/data-center/sys_industry --table tb_sys_industry -m 1;

1.3 Special note

When data is imported as above, fields are separated by ',' by default. To use a custom delimiter, add --fields-terminated-by '\t', for example:

bin/sqoop import --connect 'jdbc:mysql://xxx.xxx.xxx.142:3306/db1?useSSL=false' --username root --password 123456 --target-dir /xxxx/xxxx/sys_industry_1 --table tb_sys_industry -m 1 --fields-terminated-by '\t';
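Why a tab can be safer than the default comma: Sqoop's default text output neither quotes nor escapes fields, so if a text column (say, a full name) contains a comma, a comma-delimited line splits into too many fields, while the tab-delimited version stays intact (assuming the data contains no tabs). The row below is hypothetical.

```python
def to_line(values, delimiter):
    # Mimics a plain text import: fields joined with no quoting or escaping
    return delimiter.join(values)

row = ["7", "ABC", "Foo, Bar"]  # hypothetical row; the third field contains a comma

comma_line = to_line(row, ",")
tab_line = to_line(row, "\t")

print(len(comma_line.split(",")))  # 4: the comma inside "Foo, Bar" adds a bogus field
print(len(tab_line.split("\t")))   # 3: one field per column
```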

After a successful import, the output file is:

/bplan/data-center/sys_industry/part-m-00000


1.4 Load the region and industry data into Hive

[root@bigdata2 hive-2.3.2]# cd $HIVE_HOME
[root@bigdata2 hive-2.3.2]# bin/hive

Create a new database, data_center:

create database data_center;

Switch to the data_center database:

use data_center;

Create the Hive region table:

CREATE TABLE IF NOT EXISTS tb_sys_area (
  id int comment 'primary key id',
  code string comment 'code',
  name string comment 'region name',
  parent_code int comment 'parent region code',
  short_name string comment 'region short name',
  level_type smallint comment 'region level',
  city_code string comment 'city code',
  zip_code string comment 'postal code',
  merger_name string comment 'full region name',
  pinyin string comment 'region pinyin',
  pingan_area_name string comment 'Ping An Bank region name'
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

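Load the data from HDFS into Hive (LOAD DATA INPATH moves the file from its HDFS location into the table's warehouse directory rather than copying it):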

load data inpath '/bplan/data-center/sys_area/part-m-00000' into table data_center.tb_sys_area ;
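After the load, Hive reads each line by splitting on ',' and casting each field to its column type. The Python function below is a simplified model of that behavior for a few tb_sys_area columns, assuming Hive's default text SerDe (it is not Hive code, and it ignores range limits of smaller integer types): missing trailing fields become NULL, extra fields are ignored, an unparsable number becomes NULL, and only `\N` (Hive's default NULL marker) is NULL for a string column. That last point is why Sqoop's `null` text arrives in Hive as the literal string 'null'; the Sqoop user guide suggests `--null-string '\\N' --null-non-string '\\N'` for Hive-bound imports.

```python
HIVE_INT_TYPES = {"tinyint", "smallint", "int", "bigint"}

# Subset of the tb_sys_area columns defined above
SCHEMA = [("id", "int"), ("code", "string"), ("name", "string"), ("parent_code", "int")]

def read_hive_text_row(line, schema, delimiter=","):
    """Simplified model of reading one delimited text line into typed columns."""
    fields = line.rstrip("\n").split(delimiter)
    row = {}
    for i, (name, col_type) in enumerate(schema):
        if i >= len(fields) or fields[i] == "\\N":
            row[name] = None          # missing trailing field or \N -> NULL
        elif col_type in HIVE_INT_TYPES:
            try:
                row[name] = int(fields[i])
            except ValueError:
                row[name] = None      # not a number (e.g. Sqoop's "null") -> NULL
        else:
            row[name] = fields[i]     # strings kept verbatim, even the text "null"
    return row                        # fields beyond the schema are ignored

print(read_hive_text_row("1,100000,null,abc", SCHEMA))
print(read_hive_text_row("2,110000", SCHEMA))
```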

1.5 Import data directly into Hive with Sqoop

Create tb_sys_industry in Hive:

CREATE TABLE IF NOT EXISTS tb_sys_industry (
  id int comment 'primary key id',
  category_id string comment 'category id',
  parent_category_id string comment 'parent category id',
  root_category_id int comment 'root category id',
  category_name string comment 'category name',
  weixin_category_id smallint comment 'WeChat category id',
  merger_name string comment 'full name',
  mybank_category_id string comment 'MYbank category id'
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

If you import directly from MySQL into Hive with --hive-import, this step can be skipped: Sqoop creates the Hive table if it does not already exist.

Run the Sqoop import:

# cd $SQOOP_HOME
# bin/sqoop import --connect jdbc:mysql://xxx.xxx.xxx.142:3306/db1 --username root --password 123456 --table tb_sys_industry --fields-terminated-by ',' --delete-target-dir --num-mappers 1 --hive-import --hive-database data_center --hive-table tb_sys_industry;
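With --hive-import, Sqoop first writes the data to HDFS and then loads it into the Hive table, so the files end up under the table's warehouse directory; that is the --export-dir used in 1.6. The hypothetical Python helpers below assemble the command above and compute that directory, assuming Hive's default warehouse location /user/hive/warehouse (configurable through hive.metastore.warehouse.dir).

```python
import shlex

DEFAULT_WAREHOUSE = "/user/hive/warehouse"  # Hive's default hive.metastore.warehouse.dir

def sqoop_hive_import_args(jdbc_url, username, password, table,
                           hive_database, hive_table, delimiter=","):
    """Build the argv for a one-mapper `sqoop import --hive-import` (hypothetical helper)."""
    return [
        "bin/sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password", password,
        "--table", table,
        "--fields-terminated-by", delimiter,
        "--delete-target-dir",          # lets the job be re-run
        "--num-mappers", "1",
        "--hive-import",
        "--hive-database", hive_database,
        "--hive-table", hive_table,
    ]

def hive_table_dir(database, table, warehouse=DEFAULT_WAREHOUSE):
    # Managed tables in a non-default database live under <warehouse>/<db>.db/<table>
    return f"{warehouse}/{database}.db/{table}"

args = sqoop_hive_import_args("jdbc:mysql://xxx.xxx.xxx.142:3306/db1", "root", "123456",
                              "tb_sys_industry", "data_center", "tb_sys_industry")
print(shlex.join(args))
print(hive_table_dir("data_center", "tb_sys_industry"))
```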

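Note: this step may fail with the following error:
ERROR tool.ImportTool: Import failed: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
Fix: Sqoop needs a Hive library on its classpath. Copy hive-common-2.3.3.jar from hive/lib into Sqoop's lib directory (use the hive-common jar that matches your Hive version).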
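The copy can be scripted. The Python sketch below globs `hive-common-*.jar` in `$HIVE_HOME/lib` and copies it into `$SQOOP_HOME/lib`; the environment variable names match the `cd $HIVE_HOME` / `cd $SQOOP_HOME` commands above, while the function name is hypothetical. The demo at the bottom runs against throwaway temporary directories rather than a real installation.

```python
import glob
import os
import shutil
import tempfile

def copy_hive_common_jar(hive_home, sqoop_home):
    """Copy hive-common-*.jar from Hive's lib dir into Sqoop's lib dir; return the new paths."""
    jars = sorted(glob.glob(os.path.join(hive_home, "lib", "hive-common-*.jar")))
    if not jars:
        raise FileNotFoundError(f"no hive-common-*.jar under {hive_home}/lib")
    copied = []
    for jar in jars:
        dest = os.path.join(sqoop_home, "lib", os.path.basename(jar))
        shutil.copy2(jar, dest)  # keeps file metadata such as mtime
        copied.append(dest)
    return copied

# Real use: copy_hive_common_jar(os.environ["HIVE_HOME"], os.environ["SQOOP_HOME"])
# Demo with fake directories so nothing on the system is touched:
with tempfile.TemporaryDirectory() as hive, tempfile.TemporaryDirectory() as sqoop:
    os.makedirs(os.path.join(hive, "lib"))
    os.makedirs(os.path.join(sqoop, "lib"))
    open(os.path.join(hive, "lib", "hive-common-2.3.3.jar"), "wb").close()
    print(copy_hive_common_jar(hive, sqoop))
```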

1.6 Export data from Hive to MySQL

Create the target table in MySQL:

CREATE TABLE `tb_sys_industry_1` (
  `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `category_id` VARCHAR(50) NOT NULL COMMENT 'category id',
  `parent_category_id` VARCHAR(50) DEFAULT NULL COMMENT 'parent category id',
  `root_category_id` VARCHAR(50) NOT NULL COMMENT 'root category id',
  `category_name` VARCHAR(100) NOT NULL COMMENT 'category name',
  `weixin_category_id` VARCHAR(50) DEFAULT NULL COMMENT 'WeChat category id',
  `merger_name` VARCHAR(100) DEFAULT NULL,
  `mybank_category_id` VARCHAR(50) DEFAULT NULL COMMENT 'MYbank category id',
  PRIMARY KEY (`id`)
) ENGINE=INNODB AUTO_INCREMENT=157 DEFAULT CHARSET=utf8mb4 COMMENT='industry info';

Run the export with Sqoop:

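# bin/sqoop export --connect jdbc:mysql://xxx.xxx.xxx.142:3306/db1 --username root --password 123456 --table tb_sys_industry_1 --export-dir /user/hive/warehouse/data_center.db/tb_sys_industry/part-m-00000

The data here was imported with --fields-terminated-by ',', which is also Sqoop's default, so no extra option is needed. If the delimiters differ, append --input-fields-terminated-by with the delimiter the Hive data actually uses, e.g. --input-fields-terminated-by '\t'.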

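Note:
Hive's default field delimiter is '\001', while Sqoop's default delimiter is ','.
--input-fields-terminated-by: the field delimiter of the Hive/HDFS data being read when it is exported to external storage (sqoop export).
--fields-terminated-by: the field delimiter written into Hive/HDFS when data is imported from external storage (sqoop import).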
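Concretely, a Hive table created without a ROW FORMAT clause separates fields with the control character \001 (Ctrl-A), so exporting it means passing --input-fields-terminated-by '\001' (the Sqoop user guide documents \0ooo octal escapes). The Python sketch below, using a hypothetical row, shows how such a line splits under the wrong and the right delimiter, plus a hypothetical `convert_delimiter` helper for re-joining it with a printable one.

```python
HIVE_DEFAULT_DELIM = "\x01"  # Hive's default field delimiter: octal \001, Ctrl-A

def convert_delimiter(line, src=HIVE_DEFAULT_DELIM, dst=","):
    """Re-join one text row with a different field delimiter (no quoting)."""
    return dst.join(line.rstrip("\n").split(src))

row = HIVE_DEFAULT_DELIM.join(["1", "C01", "Retail"])  # hypothetical Hive row

print(row.split(","))                 # wrong delimiter: the whole line is one field
print(row.split(HIVE_DEFAULT_DELIM))  # matching delimiter: three fields
print(convert_delimiter(row))         # 1,C01,Retail
```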

1.7 Hive data backup

Enter the Hive CLI:

use nginx_log;

Back up the table the same way you would in MySQL (CREATE TABLE ... AS SELECT):

create table nginx_log_info_20180724 as select * from nginx_log_info;
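The backup table name simply appends the date (yyyyMMdd) to the source table name. A small, hypothetical Python helper that generates the CTAS statement above for any given date:

```python
from datetime import date

def backup_ctas(table, day):
    """Return (backup table name, CTAS statement) for a dated backup."""
    backup = f"{table}_{day:%Y%m%d}"  # e.g. nginx_log_info_20180724
    return backup, f"create table {backup} as select * from {table};"

name, sql = backup_ctas("nginx_log_info", date(2018, 7, 24))
print(sql)
```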

Back up Hive table data to local disk. Example:

insert overwrite local directory '/home/bigdata_bak/nginx_log/nginx_log_info_20180724' ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE select * from nginx_log_info;

1.8 Load the disk data back into Hive

Create the table. If nginx_log_info_20180724 still exists from the CTAS backup in 1.7, CREATE TABLE IF NOT EXISTS does nothing and that table keeps Hive's default '\001' delimiter, so drop it (or use a different name) before loading the '|'-delimited files:

CREATE TABLE IF NOT EXISTS nginx_log_info_20180724 (
  id bigint comment 'primary key id',
  product_name string comment 'business line',
  remote_addr string comment 'remote server ip',
  access_time int comment 'access time, format: yyyyMMdd',
  access_timestamp double comment 'timestamp',
  time_zone string comment 'time zone',
  request_type string comment 'request type',
  request_url string comment 'request url',
  request_protocol string comment 'request protocol',
  status smallint comment 'request status',
  body_bytes_sent int comment 'size of content sent',
  request_body string comment 'request body',
  http_referer string comment 'http referer page',
  http_user_agent string comment 'http_user_agent',
  os_name string comment 'operating system name',
  os string comment 'operating system',
  browser_name string comment 'browser name',
  browser_version string comment 'browser version',
  device_type string comment 'device type',
  browser string comment 'browser',
  access_tool string comment 'type',
  http_x_forwarded_for string comment 'http_x_forwarded_for',
  request_time double comment 'request response time'
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE;

Load the backup:

LOAD DATA LOCAL INPATH '/home/bigdata_bak/nginx_log/nginx_log_info_20180724' OVERWRITE INTO TABLE nginx_log_info_20180724;
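Before running LOAD DATA it can be worth checking that every line of the backup has exactly 23 '|'-separated fields, one per column of the table above; a '|' inside a free-text column such as request_body or http_user_agent would otherwise shift every later column. A hedged Python sketch that reports offending line numbers (one record per newline, matching LINES TERMINATED BY '\n'); the function name and sample lines are synthetic.

```python
EXPECTED_FIELDS = 23  # number of columns in nginx_log_info_20180724

def bad_lines(lines, delimiter="|", expected=EXPECTED_FIELDS):
    """Return (line_number, field_count) for every line with the wrong field count."""
    problems = []
    for n, line in enumerate(lines, start=1):
        count = len(line.rstrip("\n").split(delimiter))
        if count != expected:
            problems.append((n, count))
    return problems

good = "|".join(["x"] * EXPECTED_FIELDS) + "\n"
bad = "|".join(["x"] * (EXPECTED_FIELDS + 1)) + "\n"  # e.g. a stray '|' in request_body
print(bad_lines([good, bad, good]))  # [(2, 24)]
```

To scan the real backup, iterate over the files Hive wrote into the backup directory (typically named like 000000_0) with `open(path, encoding="utf-8")`.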

Clear the table data:

insert overwrite table nginx_log_info_20180724 select * from nginx_log_info_20180724 where 1=0;
