mysql数据首次导入hive_sqoop1.4.7环境搭建及mysql数据导入导出到hive
sqoop文檔:http://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html#_prerequisites
在hive創(chuàng)建表和導入數(shù)據時必須添加分隔符,否則數(shù)據導出時會報錯
1.下載安裝
[root@node1 ~]# wget http://mirrors.shu.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
[root@node1 ~]# tar xvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /opt/
[root@node1 ~]# cd /opt/
[root@node1 opt]# mv sqoop-1.4.7.bin__hadoop-2.6.0/ sqoop-1.4.7
[root@node1 opt]# vim /etc/profile
export SQOOP_HOME=/opt/sqoop-1.4.7
export HADOOP_HOME=/opt/hadoop-2.8.5
export HADOOP_CLASSPATH=/opt/hive-2.3.4/lib/*
export HCAT_HOME=/opt/sqoop-1.4.7/testdata/hcatalog
export ACCUMULO_HOME=/opt/sqoop-1.4.7/src/java/org/apache/sqoop/accumulo
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$SQOOP_HOME/bin
[root@node1 opt]# source? /etc/profile
[root@node1 opt]# sqoop help? ? ? ? ? ? ?--幫助信息
[root@node1 opt]# sqoop import --help? ? --參數(shù)幫助信息
2.修改yarn配置文件
[root@node1 ~]# vim /opt/hadoop-2.8.5/etc/hadoop/yarn-site.xml
yarn.nodemanager.resource.memory-mb
2048
yarn.nodemanager.resource.cpu-vcores
2
[root@node1 ~]# scp /opt/hadoop-2.8.5/etc/hadoop/yarn-site.xml node2:/opt/hadoop-2.8.5/etc/hadoop/? ? ?--將配置文件復制到各節(jié)點
yarn-site.xml? ? ? ? ? ? ? ? ? ?100% 1414? ?804.3KB/s? ?00:00
[root@node1 ~]# scp /opt/hive-2.3.4/conf/hive-site.xml /opt/sqoop-1.4.7/conf/? ? ?--hive的配置文件也要放在sqoop下面,因為sqoop要調用hive
[root@node1 ~]# stop-all.sh
[root@node1 ~]# start-all.sh
3.將mysql數(shù)據導入到HDFS
參數(shù)解釋:
--append? ? ? ? ? 追加數(shù)據
--as-textfile? ? ?導入后形成文本文件
--columns? ? ? ? ?只導入哪些字段
--delete-target-dir? ? --如果導入的目錄存在先刪除再導入
--fetch-size ? ? ? ?--每次讀多少數(shù)據
-m? ? ? ? ? ? ? ? ? ? ?--起多少任務
-e? ? ? ? ? ? ? ? ? ? ? --查詢語句(select)
--table ? ?--表名
--target-dir dir? ? ? ? ? ? ? --指定HDFS目錄
--warehouse-dir dir? ? ? --導入的表將在此目錄之下(表名與目錄名一至)
--where where clause --where條件
-z? ? ? ? ? ? ? ? ? ? ? ? --數(shù)據壓縮
--direct? ? ? ? ? ? ? ?--繞過mysql數(shù)據庫,直接導入(憂化參數(shù))
[root@node1 ~]# sqoop import --connect jdbc:mysql://172.16.9.100/hive --username hive --password system --table TBL_PRIVS? --target-dir /user/sqoop --direct -m 1 --fields-terminated-by '\t'
[root@node1 ~]# hdfs dfs -ls /user/sqoop? ? ? ?--查看導入的目錄
Found 2 items
-rw-r--r--? ?3 root supergroup? ? ? ? ? 0 2019-03-19 12:43 /user/sqoop/_SUCCESS
-rw-r--r--? ?3 root supergroup? ? ? ? 176 2019-03-19 12:43 /user/sqoop/part-m-00000
[root@node1 ~]# hdfs dfs -cat /user/sqoop/part-m-00000? ? ? ?--查看導入的數(shù)據
6,1552878877,1,root,USER,root,USER,INSERT,6
7,1552878877,1,root,USER,root,USER,SELECT,6
8,1552878877,1,root,USER,root,USER,UPDATE,6
9,1552878877,1,root,USER,root,USER,DELETE,6
[root@node1 ~]#
4.將mysql數(shù)據導入到hive中
參數(shù)詳解:
--hive-home dir? ? ? ? ? ?指定hive目錄
--hive-import? ? ? ? ? ? ? ?導入到hive
--hive-database? ? ? ? ? ?導入指定的庫
--hive-overwrite? ? ? ? ? ?覆蓋到hive
--create-hive-table? ? ? 在hive中創(chuàng)建表
--hive-table table-name? ? ? ? ?指定hive表名
--hive-partition-value? v? ? ? ? ?hive分區(qū)
[root@node1 ~]# sqoop import --connect jdbc:mysql://172.16.9.100/hive --username hive --password system --table TBL_PRIVS --target-dir /user/tmp --hive-import --hive-table tt -m 1 --create-hive-table --delete-target-dir --direct --fields-terminated-by '\t'
[root@node1 conf]# hive
Logging initialized using configuration in jar:file:/opt/hive-2.3.4/lib/hive-common-2.3.4.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show tables;
OK
tt
Time taken: 11.464 seconds, Fetched: 1 row(s)
hive> select * from tt;
OK
615528788771rootUSERrootUSERINSERT6
715528788771rootUSERrootUSERSELECT6
815528788771rootUSERrootUSERUPDATE6
915528788771rootUSERrootUSERDELETE6
Time taken: 3.978 seconds, Fetched: 4 row(s)
hive>
5.將mysql數(shù)據導入到hive指定的庫中
[root@node1 ~]# sqoop import --connect jdbc:mysql://172.16.9.100/hive --username hive --password system --table TABLE_PARAMS --hive-import --hive-table tt1 -m 1 --create-hive-table --hive-database tong --direct --fields-terminated-by '\t'
[root@node1 conf]# hive
Logging initialized using configuration in jar:file:/opt/hive-2.3.4/lib/hive-common-2.3.4.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> use tong;
OK
Time taken: 14.34 seconds
hive> show tables;
OK
tt1
Time taken: 0.374 seconds, Fetched: 1 row(s)
hive> select * from tt1;
OK
6numFiles1
6numRows0
6rawDataSize0
6totalSize8
6transient_lastDdlTime1552878901
11commentImported by sqoop on 2019/03/19 15:36:21
11numFiles1
11numRows0
11rawDataSize0
11totalSize176
11transient_lastDdlTime1552981011
16commentImported by sqoop on 2019/03/19 16:04:22
16numFiles1
16numRows0
16rawDataSize0
16totalSize239
16transient_lastDdlTime1552982688
Time taken: 3.004 seconds, Fetched: 17 row(s)
hive>
6.將HDFS的數(shù)據導入到mysql中
[root@node1 ~]# hdfs dfs -cat /user/tmp/part-m-00000
12
34
56
[root@node1 ~]# sqoop export --connect jdbc:mysql://172.16.9.100/tong --username tong --password system --export-dir /user/tmp/part-m-00000 --table t1 --direct --fields-terminated-by '\t'
[root@node1 ~]# mysql -u root -psystem
Welcome to the MariaDB monitor.? Commands end with ; or \g.
Your MySQL connection id is 1006876
Server version: 5.6.35 MySQL Community Server (GPL)
Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [(none)]> use tong
MySQL [tong]> select * from t1;
+------+------+
| a? ? | b? ? |
+------+------+
|? ? 3 |? ? 4 |
|? ? 5 |? ? 6 |
|? ? 1 |? ? 2 |
+------+------+
3 rows in set (0.00 sec)
MySQL [tong]>
報錯信息:(卡在Running job不動,不向下執(zhí)行)
19/03/19 11:20:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552965562217_0001
19/03/19 11:20:10 INFO impl.YarnClientImpl: Submitted application application_1552965562217_0001
19/03/19 11:20:10 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1552965562217_0001/
19/03/19 11:20:10 INFO mapreduce.Job: Running job: job_1552965562217_0001
解決方法:
[root@node1 ~]# vim /opt/hadoop-2.8.5/etc/hadoop/yarn-site.xml? ?--限制內存,cpu的資源,并將配置文件同步到其它node,重啟hadoop服務
yarn.nodemanager.resource.memory-mb
2048
yarn.nodemanager.resource.cpu-vcores
2
[root@node1 ~]#
報錯信息:(mysql導入到hive中)
19/03/19 14:34:25 INFO hive.HiveImport: Loading uploaded data into Hive
19/03/19 14:34:25 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
19/03/19 14:34:25 ERROR tool.ImportTool: Import failed: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:50)
at org.apache.sqoop.hive.HiveImport.getHiveArgs(HiveImport.java:392)
at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:379)
解決方法:
[root@node1 ~]# vim /etc/profile? ? --添加lib變量
export HADOOP_CLASSPATH=/opt/hive-2.3.4/lib/*
[root@node1 ~]# source /etc/profile
報錯信息:(是因為sqoop和hive的jackson包沖突)
19/03/19 15:32:11 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
19/03/19 15:32:11 INFO ql.Driver: Executing command(queryId=root_20190319153153_63feddd9-a2c8-4217-97d4-23dd9840a54b): CREATE TABLE `tt` ( `TBL_GRANT_ID` BIGINT, `CREATE_TIME` INT,
`GRANT_OPTION` INT, `GRANTOR` STRING, `GRANTOR_TYPE` STRING, `PRINCIPAL_NAME` STRING, `PRINCIPAL_TYPE` STRING, `TBL_PRIV` STRING, `TBL_ID` BIGINT) COMMENT 'Imported by sqoop on 2019/03/19
15:31:49' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE
19/03/19 15:32:11 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
19/03/19 15:32:12 ERROR exec.DDLTask: java.lang.NoSuchMethodError: com.fasterxml.jackson.databind.ObjectMapper.readerFor(Ljava/lang/Class;)Lcom/fasterxml/jackson/databind/ObjectReader;
at org.apache.hadoop.hive.common.StatsSetupConst$ColumnStatsAccurate.(StatsSetupConst.java:165)
at org.apache.hadoop.hive.common.StatsSetupConst.parseStatsAcc(StatsSetupConst.java:297)
at org.apache.hadoop.hive.common.StatsSetupConst.setBasicStatsState(StatsSetupConst.java:230)
at org.apache.hadoop.hive.common.StatsSetupConst.setBasicStatsStateForCreateTable(StatsSetupConst.java:292)
解決方法:
[root@node1 ~]# mv /opt/sqoop-1.4.7/lib/jackson-* /home/
[root@node1 ~]# cp -a /opt/hive-2.3.4/lib/jackson-* /opt/sqoop-1.4.7/lib/
報錯信息:
19/03/19 18:38:40 INFO metastore.HiveMetaStore: 0: Done cleaning up thread local RawStore
19/03/19 18:38:40 INFO HiveMetaStore.audit: ugi=rootip=unknown-ip-addrcmd=Done cleaning up thread local RawStore
19/03/19 18:38:40 ERROR tool.ImportTool: Import failed: java.io.IOException: Hive CliDriver exited with status=1
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:355)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
解決方法:
create table t1(a int,b int) row format delimited fields terminated by '\t';? ? ? --創(chuàng)建表時必須加分隔符
sqoop import --connect jdbc:mysql://172.16.9.100/hive --username hive --password system --table TBL_PRIVS? --target-dir /user/sqoop --direct -m 1 --fields-terminated-by '\t'
總結
以上是生活随笔為你收集整理的mysql数据首次导入hive_sqoop1.4.7环境搭建及mysql数据导入导出到hive的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: mysql 远程攻击_gopher 协议
- 下一篇: mysql php页面流量统计_PHP学