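Hive installation and configuration, MySQL installation, creating tables and partitions in Hive, altering tables, using Hive Beeline, four ways to import data into Hive, and running Hive SQL from Java code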

Published: 2024/9/27 | Category: Database

1. Upload the tar package
Here I uploaded apache-hive-1.2.1-bin.tar.gz.

2. Extract it

mkdir -p /home/tuzq/software/hive/

tar -zxvf apache-hive-1.2.1-bin.tar.gz -C /home/tuzq/software/hive/

3. Install the MySQL database (switch to the root user). MySQL can be installed on any node, as long as that node can reach the Hadoop cluster.

For MySQL installation, see: http://blog.csdn.net/tototuzuoquan/article/details/52711808

The MySQL steps below are for reference only; each MySQL version has its own installation procedure.

rpm -qa | grep mysql
rpm -e mysql-libs-5.1.66-2.el6_3.i686 --nodeps
rpm -ivh MySQL-server-5.1.73-1.glibc23.i386.rpm
rpm -ivh MySQL-client-5.1.73-1.glibc23.i386.rpm
Change the MySQL password:
/usr/bin/mysql_secure_installation
(Note: remove the anonymous users and allow remote connections.)
Log in to MySQL:
mysql -u root -p

4. Configure Hive

(a) Set the HIVE_HOME environment variable

vim /etc/profile

export JAVA_HOME=/usr/local/jdk1.8.0_73
export HADOOP_HOME=/home/tuzq/software/hadoop-2.8.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HIVE_HOME=/home/tuzq/software/hive/apache-hive-1.2.1-bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin

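source /etc/profile

Set the same environment variables on every other node of the Hadoop cluster. My cluster consists of hadoop1, hadoop2, hadoop3, hadoop4 and hadoop5, and all of them use the variables above.

[root@hadoop1 conf]# cd $HIVE_HOME/conf

[root@hadoop1 conf]# mv hive-env.sh.template hive-env.sh

[root@hadoop1 conf]# vi $HIVE_HOME/conf/hive-env.sh

Set HADOOP_HOME in this file, and add HIVE_CONF_DIR. The path below comes from a hive-2.3.2 installation; point it at your own $HIVE_HOME/conf directory:

export HIVE_CONF_DIR=/home/bigdata/installed/hive-2.3.2/conf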

(b) Configure the metastore database

In the $HIVE_HOME/conf directory, create a hive-site.xml file:

vi hive-site.xml

Add the following content (inside XML, each & in the JDBC URL must be written as &amp;):

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop10:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=utf-8</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
    <description>password to use against metastore database</description>
  </property>
</configuration>
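The JDBC URL in this file carries several query parameters joined by &, and an ampersand has to be escaped when it is placed inside an XML value. The following is a minimal Java sketch of that escaping step. The class and method names (MetastoreUrl, xmlEscape) are made up for illustration and are not part of Hive.

```java
// Hypothetical helper (not part of Hive): escapes a JDBC URL for use in hive-site.xml.
class MetastoreUrl {
    /** Escapes the five characters that are special in XML text and attributes. */
    static String xmlEscape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Same URL as in the configuration above, written with raw ampersands.
        String url = "jdbc:mysql://hadoop10:3306/hive?createDatabaseIfNotExist=true"
                + "&useUnicode=true&characterEncoding=utf-8";
        System.out.println("<value>" + xmlEscape(url) + "</value>");
    }
}
```

Running it prints a value element that can be pasted into hive-site.xml as-is.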

5. After Hive and MySQL are installed, copy the MySQL connector JAR into the $HIVE_HOME/lib directory

If you run into a permission error, grant privileges in MySQL (run this on the machine where MySQL is installed):
mysql -uroot -p
# Run the statements below. *.* means all tables in all databases; % means any IP address or host may connect.
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'root' WITH GRANT OPTION;
FLUSH PRIVILEGES;

6. With Hadoop 2.6.4 there is a JLine version conflict: copy jline-2.12.jar from Hive's lib directory to replace the older JAR in Hadoop:

/home/hadoop/app/hadoop-2.6.4/share/hadoop/yarn/lib/jline-0.9.94.jar

With hadoop-2.8.0, I found no jline-2.12.jar under /home/tuzq/software/hadoop-2.8.0/share/hadoop/yarn/lib.

To check which JLine version ships with Hive:

[root@hadoop1 lib]# cd $HIVE_HOME/lib
[root@hadoop1 lib]# ls jline-2.12.jar
jline-2.12.jar
[root@hadoop1 lib]#

Copy the configured Hive directory to the same location on hadoop2, hadoop3, hadoop4 and hadoop5:

scp -r /home/tuzq/software/hive* root@hadoop2:/home/tuzq/software/

scp -r /home/tuzq/software/hive* root@hadoop3:/home/tuzq/software/

scp -r /home/tuzq/software/hive* root@hadoop4:/home/tuzq/software/

scp -r /home/tuzq/software/hive* root@hadoop5:/home/tuzq/software/

Initialize the Hive metastore schema with schematool

cd /home/bigdata/installed/hive-2.3.2/bin

Then run:

[bigdata@bigdata1 bin]$ ./schematool -dbType mysql -initSchema

Fixing garbled characters in Hive comments:

The metastore's database-level and table-level character set is latin1, so non-ASCII comments come out garbled. It is enough to change the character set of the columns that store comments from latin1 to utf8. Comments are used in three places: tables, partitions, and views. The hive-site.xml above already sets useUnicode=true and characterEncoding=utf-8 in the connection URL; the remaining change is made in the database:

(1) In the metastore database, run the following 5 SQL statements.

① Change the column comments and table comments:
alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
alter table TABLE_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
② Change the partition comments:
alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8;
③ Change the index comments:
alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
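Why a latin1 column damages Chinese comments can be shown with a small round trip through a character set. The sketch below is only an illustration: it uses Java's ISO-8859-1 as a stand-in for MySQL's latin1 (the two are close but not identical), and the class name CommentCharsetDemo is made up.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: ISO-8859-1 stands in for MySQL's latin1 column charset.
class CommentCharsetDemo {
    /** Encodes text into the given charset and decodes it again, like a store-then-read cycle. */
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String comment = "\u8ba2\u5355\u7f16\u53f7"; // a four-character Chinese comment
        // Characters that latin1 cannot represent are replaced by '?' when encoding.
        System.out.println("latin1: " + roundTrip(comment, StandardCharsets.ISO_8859_1));
        // UTF-8 can represent them, so the text survives unchanged.
        System.out.println("utf8 intact: " + roundTrip(comment, StandardCharsets.UTF_8).equals(comment));
    }
}
```

The first line prints only question marks, which is the same symptom seen in desc output before the columns are switched to utf8.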

Start Hive
bin/hive

After it starts, check MySQL to see the result, for example the hive database that holds the metastore tables.

Hive can also be accessed through Beeline:

Beeline works together with HiveServer2 and supports both embedded and remote modes.

Start HiveServer2 with ./bin/hiveserver2, or in command form:

hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10001

The port at the end can be changed; HiveServer2's default port is 10000. To exit Beeline, type !quit
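Since the port is configurable, any client has to build its connection string from the host and port actually in use. A minimal Java sketch of that, assuming the jdbc:hive2://host:port/database form used elsewhere in this article; the class name HiveServer2Url is hypothetical:

```java
// Hypothetical helper: builds a HiveServer2 JDBC URL from its parts.
class HiveServer2Url {
    /** HiveServer2's default Thrift port, as stated above. */
    static final int DEFAULT_PORT = 10000;

    static String build(String host, int port, String database) {
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("invalid port: " + port);
        }
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Default port, then the custom port passed via --hiveconf above.
        System.out.println(build("hadoop1", DEFAULT_PORT, "default"));
        System.out.println(build("hadoop1", 10001, "default"));
    }
}
```

The same string works both for Beeline's !connect command and for DriverManager.getConnection in Java.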

[root@hadoop1 apache-hive-1.2.1-bin]# bin/beeline
Beeline version 1.2.1 by Apache Hive
beeline> !connect jdbc:hive2://hadoop1:10000
Connecting to jdbc:hive2://hadoop1:10000
Enter username for jdbc:hive2://hadoop1:10000:
Enter password for jdbc:hive2://hadoop1:10000:
Connected to: Apache Hive (version 1.2.1)
Driver: Hive JDBC (version 1.2.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hadoop1:10000> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| default        |
| mydb           |
| userdb         |
| userdb2        |
| userdb3        |
+----------------+--+
5 rows selected (0.709 seconds)
0: jdbc:hive2://hadoop1:10000>

Start HiveServer2 and the Hive metastore

[root@bigdata1 hive-2.3.2]# cd $HIVE_HOME/bin

Start the metastore: nohup hive --service metastore &

Start HiveServer2: nohup hive --service hiveserver2 &
----------------------------------------------------------------------------------------------------

List the databases:

[root@hadoop1 apache-hive-1.2.1-bin]# bin/hive

Logging initialized using configuration in jar:file:/home/tuzq/software/hive/apache-hive-1.2.1-bin/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 0.98 seconds, Fetched: 1 row(s)
hive> create database db1; # create a database
OK
Time taken: 0.239 seconds
hive> show databases; # list all databases
OK
db1
default
Time taken: 0.015 seconds, Fetched: 2 row(s)
hive>

Then check the result on HDFS at http://hadoop1:50070/

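6. Create a table (a managed table by default)

hive> use db1;

hive> create table trade_detail(id bigint, account string, income double, expense double, time string)
    > row format delimited fields terminated by '\t';

Check the result on HDFS.

An example aggregation query (it assumes a table named beauties, which this article does not create):

select nation, avg(size) from beauties group by nation order by avg(size);

Note: if a query returns NULL values, the delimiters declared for the table probably do not match the data file. Check the fields terminated by clause, and add lines terminated by '\n' when creating the table.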
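The NULL symptom is easier to understand with a simplified model of how a text row is read. The Java sketch below is not Hive's actual reader; it only imitates the idea that a line is split on the declared field delimiter and that a missing or unparsable field becomes NULL. The class name, the sample line, and the fixed column types (int, string, int, string) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Simplified model (not Hive code) of reading one delimited text row.
class DelimitedRowDemo {
    /** Splits a line on the delimiter and reads the columns as int, string, int, string. */
    static List<Object> parse(String line, String delimiter) {
        String[] fields = line.split(Pattern.quote(delimiter), -1);
        boolean[] isInt = {true, false, true, false};
        List<Object> row = new ArrayList<>();
        for (int i = 0; i < isInt.length; i++) {
            if (i >= fields.length) {          // column missing in the line -> NULL
                row.add(null);
            } else if (!isInt[i]) {            // string column: taken as-is
                row.add(fields[i]);
            } else {
                try {
                    row.add(Integer.parseInt(fields[i]));
                } catch (NumberFormatException e) {
                    row.add(null);             // not a number -> NULL
                }
            }
        }
        return row;
    }

    public static void main(String[] args) {
        String line = "1 wyp 25 13188888888888";     // fields separated by spaces
        System.out.println(parse(line, " "));        // delimiter matches the file
        System.out.println(parse(line, "\t"));       // delimiter does not match
    }
}
```

With the matching delimiter the four values come back; with the wrong one the whole line lands in the first column, fails to parse as a number, and every column ends up NULL.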

If drop table if exists table_name fails to drop a table, replace the mysql-connector-java JAR in $HIVE_HOME/lib. For example, I use mysql-5.7.15-linux-glibc2.5-x86_64.tar.gz; I started with mysql-connector-java-5.1.7.jar and, after switching to mysql-connector-java-5.1.38.jar, drop table worked.

Viewing and dropping databases

The following drops a database with CASCADE, which means all tables in the database are dropped before the database itself:

hive> show databases;
OK
default
mydb
userdb
userdb2
userdb3
Time taken: 0.962 seconds, Fetched: 5 row(s)
hive> drop database IF EXISTS userdb3 CASCADE;
OK
Time taken: 0.203 seconds

hive> show databases;
OK
default
mydb
userdb
userdb2
Time taken: 0.014 seconds, Fetched: 4 row(s)
hive>

Altering a table

The ALTER TABLE statement is used to modify a table in Hive.

Syntax:

ALTER TABLE name RENAME TO new_name
ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])
ALTER TABLE name CHANGE column_name new_name new_type
ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...])

Hive has no ALTER TABLE ... DROP COLUMN statement; to remove a column, use REPLACE COLUMNS and list only the columns to keep.

hive> show tables;
OK
testhivedrivertable
Time taken: 0.029 seconds, Fetched: 1 row(s)
hive> ALTER TABLE testhivedrivertable RENAME To testHive;
OK
Time taken: 0.345 seconds
hive> show tables;
OK
testhive
Time taken: 0.031 seconds, Fetched: 1 row(s)

Change a column's type:

hive> desc testhive;
OK
key                 int
value               string
Time taken: 0.161 seconds, Fetched: 2 row(s)

hive> ALTER TABLE testhive CHANGE value value Double;
OK
Time taken: 0.251 seconds
hive> desc testhive;
OK
key                 int
value               double
Time taken: 0.126 seconds, Fetched: 2 row(s)
hive>

Add a column to the table:

hive> ALTER TABLE testhive ADD COLUMNS (dept STRING COMMENT 'Department name');
OK
Time taken: 0.219 seconds
hive> desc testhive;
OK
key                 int
value               double
dept                string              Department name
Time taken: 0.09 seconds, Fetched: 3 row(s)
hive>

Dropping a table

The syntax for dropping a table is:

hive> DROP TABLE IF EXISTS testhive;

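Creating partitions

Hive organizes tables into partitions: a way of dividing a table into related parts based on the values of partition columns such as date, city, and department. With partitions, it is easy to query just a portion of the data.

Tables or partitions can be further subdivided into buckets, which give the data extra structure that can be used for more efficient queries. Bucketing is based on the value of a hash function applied to one or more columns of the table.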
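The idea behind bucketing can be sketched in a few lines of Java: hash the bucketing column, then take the result modulo the number of buckets. This is only an illustration of the principle. The hash function Hive really applies depends on the column type and the Hive version; here Java's Integer.hashCode stands in for it, and the class name, the ids, and the bucket count of 4 are made up.

```java
// Illustration of hash bucketing; Integer.hashCode is a stand-in for Hive's own hash.
class BucketDemo {
    /** Maps a hash value to one of numBuckets buckets, keeping the result non-negative. */
    static int bucketFor(int hash, int numBuckets) {
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 4;                       // hypothetical bucket count
        int[] employeeIds = {1201, 1202, 1203};   // hypothetical values of the bucketing column
        for (int id : employeeIds) {
            System.out.println(id + " -> bucket " + bucketFor(Integer.hashCode(id), numBuckets));
        }
    }
}
```

Rows with the same value of the bucketing column always land in the same bucket, which is what makes sampling and some joins cheaper.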

The syntax for adding a partition is:

ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location1']
    [PARTITION partition_spec [LOCATION 'location2'] ...];
partition_spec:
    (p_column = p_col_value, p_column = p_col_value, ...)

First, as preparation, create a table:

CREATE TABLE IF NOT EXISTS employee (eid int, name String,
    destination String)
    partitioned by (salary String)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    LINES TERMINATED BY '\n'
    STORED AS TEXTFILE;

hive> desc employee;

After these steps the table has a partition column, salary.

Load data:

[root@hadoop1 hivedata]# cat /home/tuzq/software/hivedata/sample.txt
1201 pal 45000 Technical manager
1202 Manisha 45000 Proof reader
[root@hadoop1 hivedata]#

Load the data above into a partition:

LOAD DATA LOCAL INPATH '/home/tuzq/software/hivedata/sample.txt' INTO TABLE employee PARTITION(salary = '45000');

Note the PARTITION(salary = '45000') clause: it loads the data into the salary=45000 partition.

The result on HDFS can be seen at:

http://hadoop1:50070/explorer.html#/user/hive/warehouse/userdb.db/employee

Next, add another partition value to the table:

ALTER TABLE employee ADD PARTITION (salary ='40000') location '/40000/part40000';

Because a location is given, this partition's data lives under /40000/part40000 on HDFS instead of under the table's directory:

http://hadoop1:50070/explorer.html#/40000/part40000

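Creating a table with two partition columns:

create table table_name (id int, content string) partitioned by (dt string, hour string);

This table is partitioned by day and hour; dt and hour appear as two additional columns in the table schema. On disk, data is placed first in a dt directory and then in an hour subdirectory.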
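The directory layout for a multi-level partition can be sketched as simple path building: one column=value directory per partition column, in the order the columns were declared. In the Java sketch below the class name, the table directory, and the dt and hour values are hypothetical examples.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of how a partition spec maps to a directory under the table's directory.
class PartitionPathDemo {
    /** Appends one column=value directory per partition column, in declaration order. */
    static String partitionDir(String tableDir, Map<String, String> spec) {
        StringBuilder path = new StringBuilder(tableDir);
        for (Map.Entry<String, String> e : spec.entrySet()) {
            path.append('/').append(e.getKey()).append('=').append(e.getValue());
        }
        return path.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so dt comes before hour.
        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("dt", "2017-06-11");   // hypothetical partition values
        spec.put("hour", "08");
        System.out.println(partitionDir("/user/hive/warehouse/table_name", spec));
    }
}
```

A query that filters on dt and hour only needs to read the files under the matching directory.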

Show the partitions of a table:

hive> show partitions employee;
OK
salary=40000
salary=45000
Time taken: 0.088 seconds, Fetched: 2 row(s)
hive>

Another example:

Create a partitioned table
hive> create table td_part(id bigint, account string, income double, expenses double, time string)
    > partitioned by (logdate string) row format delimited fields terminated by '\t';
OK
Time taken: 0.114 seconds
hive> show tables;
OK
td_part
trade_detail
Time taken: 0.021 seconds, Fetched: 2 row(s)
hive>

Create an external table
create external table td_ext(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t' location '/td_ext';

7. Create a partitioned table
Regular tables versus partitioned tables: use a partitioned table when large amounts of data are added continuously.
hive> create table book(id bigint,name string) partitioned by (pubdate string) row format delimited fields terminated by '\t';
OK
Time taken: 0.108 seconds
hive> show tables;
OK
book
td_part
trade_detail
Time taken: 0.02 seconds, Fetched: 3 row(s)
hive>

Load data into a partitioned table
load data local inpath './book.txt' overwrite into table book partition (pubdate='2010-08-22');

load data local inpath '/root/data.am' into table beauty partition (nation="USA");

Creating views and indexes

Views in Hive are used the same way as views in SQL; the view is a standard RDBMS concept. Hive views are read-only: they can be queried, but they cannot be the target of LOAD or INSERT.

The syntax for creating a view is:

CREATE VIEW [IF NOT EXISTS] view_name [(column_name [COMMENT column_comment], ...) ]
[COMMENT table_comment]
AS SELECT ...

hive> desc employee;
OK
eid                 int
name                string
destination         string
salary              string

# Partition Information
# col_name            data_type           comment

salary              string
Time taken: 0.08 seconds, Fetched: 9 row(s)
hive> create VIEW emp_45000 AS
    > SELECT * FROM employee
    > WHERE salary = 45000;

To drop a view:

hive> DROP VIEW emp_45000;

Creating an index (indexes were removed in Hive 3.0, so this applies to older releases such as the 1.2.1 used here):

The syntax for creating an index is:

CREATE INDEX index_name
ON TABLE base_table_name (col_name, ...)
AS 'index.handler.class.name'
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, ...)]
[IN TABLE index_table_name]
[PARTITIONED BY (col_name, ...)]
[
   [ ROW FORMAT ...] STORED AS ...
   | STORED BY ...
]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]

Four ways to import data into Hive

Hive has several common ways to import data; four are covered here:

(1) Import data from the local file system into a Hive table.

(2) Import data from HDFS into a Hive table.

(3) Query data from another table and insert it into a Hive table.

(4) When creating a table, populate it with records queried from another table.

I. Importing data from the local file system into a Hive table

First create the table in Hive:

hive> create table wyp(id int,name string,age int,tel string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' lines terminated by '\n' STORED AS TEXTFILE;

Pay attention to the delimiter in ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '. If it does not match the data file, the load still succeeds, but select returns NULL.

The table is simple, with only four columns. The local file system contains /home/tuzq/software/hive/apache-hive-1.2.1-bin/wyp.txt with the following content:
[root@hadoop1 apache-hive-1.2.1-bin]# pwd
/home/tuzq/software/hive/apache-hive-1.2.1-bin
[root@hadoop1 apache-hive-1.2.1-bin]# cat wyp.txt
1 wyp 25 13188888888888
2 test 30 13888888888888
3 zs 34 899314121
[root@hadoop1 apache-hive-1.2.1-bin]#

The columns in wyp.txt are separated by spaces. The following statement loads the file into the wyp table:

hive> load data local inpath '/home/tuzq/software/hive/apache-hive-1.2.1-bin/wyp.txt' into table wyp;
Loading data to table default.wyp
Table default.wyp stats: [numFiles=1, totalSize=67]
OK
Time taken: 0.35 seconds
hive> select * from wyp;
OK
1 wyp 25 13188888888888
2 test 30 13888888888888
3 zs 34 899314121
Time taken: 0.086 seconds, Fetched: 3 row(s)
hive>

This loads the content of wyp.txt into the wyp table. You can check the table's data directory at http://hadoop1:50070/explorer.html#/user/hive/warehouse/db1.db

II. Importing data from HDFS into a Hive table

When data is loaded from the local file system, it is first copied to a temporary directory on HDFS (typically under the uploading user's HDFS directory), and then moved (moved, not copied) from that temporary directory into the table's data directory. Hive therefore also supports moving data directly from an HDFS path into a table's data directory. Suppose the file /add.txt exists; the steps are:

[root@hadoop1 apache-hive-1.2.1-bin]# ls
add.txt  bin  book.txt  conf  examples  hcatalog  lib  LICENSE  NOTICE  README.txt  RELEASE_NOTES.txt  scripts  wyp.txt
[root@hadoop1 apache-hive-1.2.1-bin]# vim add.txt
[root@hadoop1 apache-hive-1.2.1-bin]# hdfs dfs -put add.txt /
[root@hadoop1 apache-hive-1.2.1-bin]# hdfs dfs -ls /
Found 4 items
-rw-r--r--   3 root supergroup         67 2017-06-11 11:34 /add.txt
-rw-r--r--   3 root supergroup       3719 2017-06-10 12:11 /kms.sh
drwx-wx-wx   - root supergroup          0 2017-06-10 22:06 /tmp
drwxr-xr-x   - root supergroup          0 2017-06-10 22:27 /user
[root@hadoop1 apache-hive-1.2.1-bin]# hdfs dfs -cat /add.txt
4 wyp 25 13188888888888
5 test 30 13888888888888
6 zs 34 899314121
[root@hadoop1 apache-hive-1.2.1-bin]#

This is the data to insert. The file is stored on HDFS at /add.txt (unlike section I, where the file was on the local file system and the load statement included the keyword local). The following commands import its content into the Hive table:

hive> select * from wyp;
OK
1 wyp 25 13188888888888
2 test 30 13888888888888
3 zs 34 899314121
Time taken: 0.086 seconds, Fetched: 3 row(s)
hive> load data inpath '/add.txt' into table wyp;
Loading data to table default.wyp
Table default.wyp stats: [numFiles=2, totalSize=134]
OK
Time taken: 0.283 seconds
hive> select * from wyp;
OK
4 wyp 25 13188888888888
5 test 30 13888888888888
6 zs 34 899314121
1 wyp 25 13188888888888
2 test 30 13888888888888
3 zs 34 899314121
Time taken: 0.076 seconds, Fetched: 6 row(s)
hive>

The output shows that the data was indeed imported into the wyp table. Note that load data inpath '/add.txt' into table wyp; does not contain the word local; that is the difference from section I.
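The practical difference between the two load forms is what happens to the source file: a load with local copies it, while a load from an HDFS path moves it. The Java sketch below imitates this with ordinary files in a temporary directory. It does not talk to Hive or HDFS, and the class name, method names, and file contents are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Imitation with plain local files; no Hive or HDFS involved.
class LoadSemanticsDemo {
    /** Mimics LOAD DATA LOCAL INPATH: the source file is copied and stays where it was. */
    static void loadLocal(Path source, Path tableDir) throws IOException {
        Files.copy(source, tableDir.resolve(source.getFileName()));
    }

    /** Mimics LOAD DATA INPATH: the source file is moved, so it disappears from its old path. */
    static void loadFromHdfs(Path source, Path tableDir) throws IOException {
        Files.move(source, tableDir.resolve(source.getFileName()));
    }

    /** Returns {local source still exists, HDFS-style source still exists}. */
    static boolean[] demo() {
        try {
            Path work = Files.createTempDirectory("load-demo");
            Path tableDir = Files.createDirectory(work.resolve("wyp"));
            Path localFile = Files.write(work.resolve("wyp.txt"), "1 wyp 25\n".getBytes());
            Path hdfsFile = Files.write(work.resolve("add.txt"), "4 wyp 25\n".getBytes());
            loadLocal(localFile, tableDir);
            loadFromHdfs(hdfsFile, tableDir);
            return new boolean[] {Files.exists(localFile), Files.exists(hdfsFile)};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("local source still exists: " + result[0]);
        System.out.println("moved source still exists: " + result[1]);
    }
}
```

In the same way, after load data inpath '/add.txt' the file is no longer at /add.txt on HDFS, whereas the local wyp.txt remains in place after a load with local.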

III. Querying data from another table and inserting it into a Hive table

Suppose Hive has a table test, created as follows:

hive> create table test(
    > id int, name string
    > ,tel string)
    > partitioned by
    > (age int)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE;
Time taken: 0.261 seconds

It is broadly similar to the wyp table, except that test uses age as its partition column. A brief explanation of partitions:

Partitions: in Hive, each partition of a table corresponds to a directory under the table's directory, and all of a partition's data is stored in that directory. For example, if the wyp table had two partition columns, dt and city, the partition dt=20131218, city=BJ would map to /user/hive/warehouse/wyp/dt=20131218/city=BJ, and all data belonging to that partition would be stored there.

The following statement inserts the result of a query on wyp into test:

hive> insert into table test
    > partition (age='25')
    > select id, name, tel
    > from wyp;
#####################################################################
           A batch of MapReduce job information is printed here; omitted
#####################################################################
Total MapReduce CPU Time Spent: 1 seconds 310 msec
Time taken: 19.125 seconds
hive> select * from test;
5       wyp1    131212121212    25
6       wyp2    134535353535    25
7       wyp3    132453535353    25
8       wyp4    154243434355    25
1       wyp     13188888888888  25
2       test    13888888888888  25
3       zs      899314121       25
Time taken: 0.126 seconds, Fetched: 7 row(s)

A note: Hive versions before 0.14 do not support the traditional relational-database form insert into table ... values (field1, field2); Hive 0.14 and later, including the 1.2.1 used here, do support INSERT ... VALUES.

The output shows that the rows queried from wyp were inserted into test. If the target table (test) had no partition column, the partition (age='25') clause could be dropped. The partition can also be chosen dynamically from a value in the select list:

hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> insert into table test
    > partition (age)
    > select id, name,
    > tel, age
    > from wyp;
#####################################################################
           A batch of MapReduce job information is printed here; omitted
#####################################################################
Total MapReduce CPU Time Spent: 1 seconds 510 msec
Time taken: 17.712 seconds
hive> select * from test;
5       wyp1    131212121212    23
6       wyp2    134535353535    24
7       wyp3    132453535353    25
1       wyp     13188888888888  25
8       wyp4    154243434355    26
2       test    13888888888888  30
3       zs      899314121       34
Time taken: 0.399 seconds, Fetched: 7 row(s)

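This is called a dynamic partition insert. By default Hive runs dynamic partitioning in strict mode, which requires at least one static partition column, so hive.exec.dynamic.partition.mode must be set to nonstrict before every partition column can be dynamic. Hive also supports insert overwrite: as the name suggests, once the statement finishes, the existing data in the target data directory is overwritten, whereas insert into appends. Note the difference between the two. For example: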
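What a dynamic partition insert does with each row can be modelled as grouping by the trailing column of the select list: that value decides which partition the row goes to and is not stored in the row itself. The Java sketch below is a plain in-memory model, not Hive code; the class name is made up and the rows mirror the wyp sample data (id, name, tel, age).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory model of routing rows to partitions by the last selected column.
class DynamicPartitionDemo {
    /** Groups rows by their last column; the partition value is dropped from the stored row. */
    static Map<String, List<String[]>> route(List<String[]> rows) {
        Map<String, List<String[]>> partitions = new TreeMap<>();
        for (String[] row : rows) {
            String partition = "age=" + row[row.length - 1];
            partitions.computeIfAbsent(partition, k -> new ArrayList<>())
                      .add(Arrays.copyOf(row, row.length - 1));
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "wyp", "13188888888888", "25"},
                new String[] {"2", "test", "13888888888888", "30"},
                new String[] {"3", "zs", "899314121", "34"});
        for (Map.Entry<String, List<String[]>> e : route(rows).entrySet()) {
            System.out.println(e.getKey() + " holds " + e.getValue().size() + " row(s)");
        }
    }
}
```

This is also why the partition column has to come last in the select list of a dynamic partition insert.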

hive> insert overwrite table test
    > PARTITION (age)
    > select id, name, tel, age
    > from wyp;

Even better, Hive supports multi-table insert. In Hive the insert statement can be turned around so that from comes first; it behaves the same as when from comes last:

hive> show create table test3;
CREATE  TABLE test3(
  id int,
  name string)
Time taken: 0.277 seconds, Fetched: 18 row(s)
hive> from wyp
    > insert into table test
    > partition(age)
    > select id, name, tel, age
    > insert into table test3
    > select id, name
    > where age>25;
hive> select * from test3;
8       wyp4
2       test
3       zs
Time taken: 4.308 seconds, Fetched: 3 row(s)

Several insert clauses can be used in the same query. The benefit is that the source table is scanned only once to produce multiple separate outputs.
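The single-scan benefit can be pictured as one loop over the source that feeds two destinations at once, instead of two loops that each read the whole source. The Java sketch below is an in-memory analogy only; the class name is made up, and the rows (id, name, age) follow the wyp sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// In-memory analogy of a multi-insert: one pass over the source, two outputs.
class MultiInsertDemo {
    /** rows hold {id, name, age}; returns [all rows, id and name of rows with age > 25]. */
    static List<List<String>> scanOnce(List<String[]> rows) {
        List<String> allRows = new ArrayList<>();
        List<String> olderThan25 = new ArrayList<>();
        for (String[] row : rows) {                       // the only scan of the source
            allRows.add(String.join("\t", row));          // first insert clause: no filter
            if (Integer.parseInt(row[2]) > 25) {          // second insert clause: where age > 25
                olderThan25.add(row[0] + "\t" + row[1]);
            }
        }
        return Arrays.asList(allRows, olderThan25);
    }

    public static void main(String[] args) {
        List<String[]> wyp = Arrays.asList(
                new String[] {"1", "wyp", "25"},
                new String[] {"2", "test", "30"},
                new String[] {"3", "zs", "34"});
        List<List<String>> outputs = scanOnce(wyp);
        System.out.println("first output rows: " + outputs.get(0).size());
        System.out.println("second output rows: " + outputs.get(1).size());
    }
}
```

Each insert clause keeps its own select list and its own optional where filter, while the rows of the source are read only once.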

IV. Populating a table from a query when it is created

In practice a query may return too many rows to display on the console. In that case it is convenient to store the query result directly in a new table; this is called CTAS (create table ... as select):

hive> create table test4
    > as
    > select id, name, tel
    > from wyp;
hive> select * from test4;
5       wyp1    131212121212
6       wyp2    134535353535
7       wyp3    132453535353
8       wyp4    154243434355
1       wyp     13188888888888
2       test    13888888888888
3       zs      899314121
Time taken: 0.089 seconds, Fetched: 7 row(s)

The data is now in test4. CTAS is atomic: if the select query fails for any reason, the new table is not created.

Calling Hive remotely from Java

To connect to Hive remotely from Java, HiveServer2 must be started first. (Note: org.apache.hive.jdbc.HiveDriver is provided by hive-jdbc-1.2.1.jar.)

package hive;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveCreateDb {
    /*
     * For the older HiveServer (version 1), use this driver instead:
     * private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";
     */

    /* Driver for HiveServer2 */
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws Exception {
        Class.forName(driverName);

        /* HiveServer (version 1) JDBC URLs start with jdbc:hive:// */
        // Connection con = DriverManager.getConnection("jdbc:hive://hadoop1:10000/default", "", "");

        /* HiveServer2 JDBC URLs start with jdbc:hive2:// */
        Connection con = DriverManager.getConnection("jdbc:hive2://hadoop1:10000/default", "", "");
        Statement stmt = con.createStatement();

        // Uncomment the next line if the userdb database does not exist yet.
        // stmt.execute("CREATE DATABASE userdb");

        // Setting a configuration parameter
        // boolean resHivePropertyTest = stmt.execute("SET tez.runtime.io.sort.mb = 128");
        // This setting expects Tez to be installed on the cluster; remove it to keep the default engine.
        boolean resHivePropertyTest = stmt.execute("set hive.execution.engine=tez");
        System.out.println(resHivePropertyTest);

        stmt.execute("USE userdb");

        String tableName = "testHiveDriverTable";
        // Statements that return no result set go through execute(), not executeQuery().
        stmt.execute("drop table if exists " + tableName);
        stmt.execute("create table " + tableName + " (key int, value string)");

        // show tables
        String sql = "show tables '" + tableName + "'";
        System.out.println("Running: " + sql);
        ResultSet res = stmt.executeQuery(sql);
        if (res.next()) {
            System.out.println(res.getString(1));
        }

        // describe table
        sql = "describe " + tableName;
        System.out.println("Running: " + sql);
        res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(res.getString(1) + "\t" + res.getString(2));
        }

        // load data into table
        // NOTE: filepath has to be local to the hive server
        // NOTE: /tmp/a.txt is a ctrl-A separated file with two fields per line
        String filepath = "/tmp/a.txt";
        sql = "load data local inpath '" + filepath + "' into table " + tableName;
        System.out.println("Running: " + sql);
        stmt.execute(sql);

        // select * query
        sql = "select * from " + tableName;
        System.out.println("Running: " + sql);
        res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(String.valueOf(res.getInt(1)) + "\t" + res.getString(2));
        }

        // regular hive query
        sql = "select count(1) from " + tableName;
        System.out.println("Running: " + sql);
        res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(res.getString(1));
        }

        res.close();
        stmt.close();
        con.close();
    }
}
