Setting Up a Hive Cluster on CentOS 7
Table of Contents
- 1. Download Hive
- 2. Install the MySQL database
- 3. Modify the configuration files
- 3.1 The hive-site.xml configuration file
- 3.2 The hive-env.sh configuration file
- 4. Add the MySQL driver to Hive
- 5. Add the hive command to the environment variables
- 6. Initialize Hive
- 7. Start the Metastore service
- 8. Testing
- 8.1 Create the HDFS directories
- 8.2 Copy Hive to the other machines in the cluster
- 8.3 Launch test
- 8.4 Remote Hive access test
1. Download Hive
This walkthrough uses apache-hive-2.1.1. Download Hive locally and extract it as needed. Download address: http://archive.apache.org/dist/hive/
The extracted path is /usr/local/hive/apache-hive-2.1.1, which is the path used throughout the rest of this post.
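A sketch of the download and extraction steps; the exact archive name and the rename are assumptions made so the result matches that path:

```bash
cd /usr/local/hive
wget http://archive.apache.org/dist/hive/hive-2.1.1/apache-hive-2.1.1-bin.tar.gz
tar -zxvf apache-hive-2.1.1-bin.tar.gz
mv apache-hive-2.1.1-bin apache-hive-2.1.1   # match the path used below
```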
2. Install the MySQL database
Hive can be set up in three ways:
- Embedded Derby: with Derby storage, running Hive creates a derby file and a metastore_db directory in the current working directory. The drawback of this mode is that only one Hive client at a time can use the database in a given directory.
- Local mode: this mode requires a MySQL server running locally, configured as described below. (For both MySQL-based modes, copy the MySQL JAR into the $HIVE_HOME/lib directory.)
- Multi-user mode: this mode requires a MySQL server running on a remote host, and the metastore service must be started on the Hive server.
At bottom, the three modes differ only in where the metadata is stored. This post uses the multi-user mode.
For installing MySQL, see "Installing mysql-5.7.24 on CentOS 7".
3. Modify the configuration files
3.1 The hive-site.xml configuration file
First go to the directory below and edit the hive-site.xml file; create it if it does not exist.

```bash
[root@hadoop-master conf]# vi /usr/local/hive/apache-hive-2.1.1/conf/hive-site.xml
```

The contents of hive-site.xml:
```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <!-- Metastore database connection URL -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop-master:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <!-- Database driver; the MySQL driver is used here -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <!-- Database username; fill in your own -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  <!-- Database password; fill in your own -->
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>true</value>
  </property>
  <property>
    <name>datanucleus.autoCreateTables</name>
    <value>true</value>
  </property>
  <property>
    <name>datanucleus.autoCreateColumns</name>
    <value>true</value>
  </property>
  <!-- Location of the Hive warehouse on HDFS -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <!-- Temporary local directory for added resources -->
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/usr/local/hive/tmp/resources</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>
  <!-- Before Hive 0.9, hive.exec.dynamic.partition had to be set to true; since 0.9 it defaults to true -->
  <property>
    <name>hive.exec.dynamic.partition</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.exec.dynamic.partition.mode</name>
    <value>nonstrict</value>
  </property>
  <!-- Log locations -->
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/usr/local/hive/tmp/hiveJobsLog</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <!-- Note: this repeats hive.downloaded.resources.dir from above; the later value takes effect -->
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/usr/local/hive/tmp/resourcesLog</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>
  <property>
    <name>hive.querylog.location</name>
    <value>/usr/local/hive/tmp/hiveRunLog</value>
    <description>Location of Hive run time structured log file</description>
  </property>
  <property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/usr/local/hive/tmp/opertitionLog</value>
    <description>Top level directory where operation tmp are stored if logging functionality is enabled</description>
  </property>
  <!-- HWI (Hive Web Interface) settings -->
  <property>
    <name>hive.hwi.war.file</name>
    <value>/usr/local/hive/apache-hive-2.1.1/lib/hive-hwi-2.1.1.jar</value>
    <description>This sets the path to the HWI war file, relative to ${HIVE_HOME}.</description>
  </property>
  <property>
    <name>hive.hwi.listen.host</name>
    <value>hadoop-master</value>
    <description>This is the host address the Hive Web Interface will listen on</description>
  </property>
  <property>
    <name>hive.hwi.listen.port</name>
    <value>9999</value>
    <description>This is the port the Hive Web Interface will listen on</description>
  </property>
  <!-- HiveServer2 no longer needs hive.metastore.local: if hive.metastore.uris is empty, the metastore
       is local; otherwise it is remote, and only hive.metastore.uris needs to be configured -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop-master:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>hadoop-master</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>
  <property>
    <name>hive.server2.thrift.http.port</name>
    <value>10001</value>
  </property>
  <property>
    <name>hive.server2.thrift.http.path</name>
    <value>cliservice</value>
  </property>
  <!-- HiveServer2 web UI -->
  <property>
    <name>hive.server2.webui.host</name>
    <value>hadoop-master</value>
  </property>
  <property>
    <name>hive.server2.webui.port</name>
    <value>10002</value>
  </property>
  <property>
    <name>hive.scratch.dir.permission</name>
    <value>755</value>
  </property>
  <!-- When using Hive on Spark, omitting this setting can cause an out-of-memory error -->
  <property>
    <name>spark.driver.extraJavaOptions</name>
    <value>-XX:PermSize=128M -XX:MaxPermSize=512M</value>
  </property>
  <!-- Authentication; set to NONE here so no verification is required, for testing only -->
  <property>
    <name>hive.server2.authentication</name>
    <value>NONE</value>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>Enforce metastore schema version consistency. True: Verify that version information stored in the metastore is compatible with the Hive jars. Also disable automatic schema migration; users are required to manually migrate the schema after a Hive upgrade, which ensures proper metastore schema migration. (Default) False: Warn if the version information stored in the metastore doesn't match the Hive jars.</description>
  </property>
</configuration>
```

Remember to create the local directories referenced in the config:
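They can be created in one shot; the directory names are taken from the config above (note that `opertitionLog` keeps the spelling used in hive-site.xml):

```bash
mkdir -p /usr/local/hive/tmp/{hiveJobsLog,hiveRunLog,opertitionLog,resources,resourcesLog}
```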
```bash
[root@hadoop-master tmp]# pwd
/usr/local/hive/tmp
[root@hadoop-master tmp]# ll
total 40
drwxr-xr-x. 2 root root 6 Feb 21 22:47 hiveJobsLog
drwxr-xr-x. 2 root root 6 Feb 21 21:20 hiveRunLog
drwxr-xr-x. 2 root root 6 Feb 21 21:20 opertitionLog
drwxr-xr-x. 2 root root 6 Feb 21 21:19 resources
drwxr-xr-x. 2 root root 6 Feb 21 21:20 resourcesLog
```
3.2 The hive-env.sh configuration file

First, go to Hive's conf directory:
```bash
cd /usr/local/hive/apache-hive-2.1.1/conf
```

Then copy the needed template files, dropping the .template suffix:
```bash
cp hive-env.sh.template hive-env.sh
cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties
cp hive-log4j2.properties.template hive-log4j2.properties
```

The conf directory should then contain the copied files.
Append the following to the end of hive-env.sh.
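A plausible version of those lines, assuming the Hadoop and Hive paths used throughout this post (adjust to your own layout):

```bash
# Point Hive at the Hadoop installation and at its own conf and lib directories
export HADOOP_HOME=/usr/local/hadoop/apps/hadoop-2.7.3
export HIVE_CONF_DIR=/usr/local/hive/apache-hive-2.1.1/conf
export HIVE_AUX_JARS_PATH=/usr/local/hive/apache-hive-2.1.1/lib
```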
4. Add the MySQL driver to Hive
Place mysql-connector-java-5.1.30.jar in the following directory:

```bash
[root@hadoop-master lib]# pwd
/usr/local/hive/apache-hive-2.1.1/lib
```

and grant it permissions with chmod 777 mysql-connector-java-5.1.30.jar.
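Putting both steps together; the source path of the downloaded connector JAR below is hypothetical:

```bash
# copy the MySQL JDBC driver into Hive's lib directory, then grant permissions
cp /root/mysql-connector-java-5.1.30.jar /usr/local/hive/apache-hive-2.1.1/lib/
chmod 777 /usr/local/hive/apache-hive-2.1.1/lib/mysql-connector-java-5.1.30.jar
```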
5. Add the hive command to the environment variables
```bash
[root@hadoop-master bin]# vi /etc/profile
```

Add HIVE_HOME to the environment variables:
```bash
# Java environment variables
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_261
export CLASSPATH=.:${JAVA_HOME}/jre/lib/rt.jar:${JAVA_HOME}/lib/dt.jar:${JAVA_HOME}/lib/tools.jar
export PATH=$PATH:${JAVA_HOME}/bin

# Hadoop environment variables
export HADOOP_HOME=/usr/local/hadoop/apps/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Hive environment variables
export HIVE_HOME=/usr/local/hive/apache-hive-2.1.1
export PATH=$PATH:$HIVE_HOME/bin:$HIVE_HOME/sbin
```

After adding these, reload the profile:
```bash
[root@hadoop-master bin]# source /etc/profile
```

6. Initialize Hive
Choose either MySQL or Derby as the metastore database.
Note: first check whether MySQL contains leftover Hive metadata; if so, delete it before initializing.
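For example, assuming the metastore database is named hive, matching the ConnectionURL in hive-site.xml above:

```bash
# drops any leftover metastore database; credentials match hive-site.xml
mysql -uroot -proot -e "DROP DATABASE IF EXISTS hive;"
```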
Initialize the schema with schematool, where mysql means that MySQL is used as the database storing the Hive metadata.
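The standard invocation for MySQL is:

```bash
schematool -dbType mysql -initSchema   # MySQL as the metastore database
```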
If you are not using MySQL as the metastore database, run instead:

```bash
schematool -dbType derby -initSchema   # Derby as the metastore database
```

7. Start the Metastore service
Before running Hive, start the metastore service first, or Hive will fail with an error. Clients connect to the metastore service, and the metastore in turn connects to the MySQL database to read and write metadata. With the metastore service in place, multiple clients can connect concurrently, and they do not need the MySQL username and password; they only need to reach the metastore service.
```bash
hive --service metastore &
```

Then start the hiveserver2 service so that Hive can be accessed remotely:
```bash
hive --service hiveserver2 &
```

Remote access address:
```
jdbc:hive2://hadoop-master:10000
```

You can also check it from the HiveServer2 web UI:
```
http://hadoop-master:10002/
```
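To confirm that both services came up, you can check that the metastore, Thrift, and web UI ports are listening; a quick sketch, assuming net-tools is installed:

```bash
netstat -tlnp | grep -E '9083|10000|10002'
```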
8. Testing

8.1 Create the HDFS directories
First, if Hadoop is not yet running, make sure it starts up normally. Then create Hive's warehouse directory on HDFS:
```bash
hdfs dfs -mkdir -p /user/hive/warehouse
```
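The Hive getting-started guide additionally suggests creating /tmp on HDFS and making both directories group-writable; an optional sketch:

```bash
hdfs dfs -mkdir -p /tmp
hdfs dfs -chmod g+w /tmp /user/hive/warehouse
```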
8.2 Copy Hive to the other machines in the cluster

Copy the Hive installation from hadoop-master to the other machines:
```bash
scp -r /usr/local/hive/* hadoop-slave3:/usr/local/hive/
scp -r /usr/local/hive/* hadoop-slave2:/usr/local/hive/
scp -r /usr/local/hive/* hadoop-slave1:/usr/local/hive/
```

Also remember to update the configuration file /etc/profile on each of those machines:
```bash
# Java environment variables
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_261
export CLASSPATH=.:${JAVA_HOME}/jre/lib/rt.jar:${JAVA_HOME}/lib/dt.jar:${JAVA_HOME}/lib/tools.jar
export PATH=$PATH:${JAVA_HOME}/bin

# Hadoop environment variables
export HADOOP_HOME=/usr/local/hadoop/apps/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Hive environment variables
export HIVE_HOME=/usr/local/hive/apache-hive-2.1.1
export PATH=$PATH:$HIVE_HOME/bin:$HIVE_HOME/sbin
```

Then reload the environment variables:
```bash
source /etc/profile
```

8.3 Launch test
On hadoop-slave1, simply run hive, then perform the following test:
```
[root@hadoop-slave1 hive]# hive
which: no hbase in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/jdk/jdk1.8.0_261/bin:/usr/local/hadoop/apps/hadoop-2.7.3/bin:/usr/local/hadoop/apps/hadoop-2.7.3/sbin::/usr/local/hive/apache-hive-2.1.1/bin:/usr/local/hive/apache-hive-2.1.1/sbin::/root/bin:/usr/local/jdk/jdk1.8.0_261/bin:/usr/local/hadoop/apps/hadoop-2.7.3/bin:/usr/local/hadoop/apps/hadoop-2.7.3/sbin:/usr/local/hive/apache-hive-2.1.1/bin:/usr/local/hive/apache-hive-2.1.1/sbin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/apache-hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/apps/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in file:/usr/local/hive/apache-hive-2.1.1/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> create database hive_test;
OK
Time taken: 0.29 seconds
hive> show databases;
OK
default
hive_test
Time taken: 0.026 seconds, Fetched: 2 row(s)
hive> use hive_test;
OK
Time taken: 0.042 seconds
hive> create table book (id bigint, name string) row format delimited fields terminated by '\t';
OK
Time taken: 0.408 seconds
hive> show tables;
OK
book
Time taken: 0.027 seconds, Fetched: 1 row(s)
hive> select * from book;
OK
Time taken: 1.111 seconds
hive>
```
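The select above returns no rows because book is empty. To exercise the tab-delimited row format, a small sketch (the file path and its contents are made up):

```bash
# create a tab-separated data file and load it into the table
printf '1\tHadoop\n2\tHive\n' > /tmp/book.txt
hive -e "load data local inpath '/tmp/book.txt' into table hive_test.book;"
hive -e "select * from hive_test.book;"
```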
8.4 Remote Hive access test

Connect using beeline:
```
[root@hadoop-master hive]# beeline
which: no hbase in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/jdk/jdk1.8.0_261/bin:/usr/local/hadoop/apps/hadoop-2.7.3/bin:/usr/local/hadoop/apps/hadoop-2.7.3/sbin:/usr/local/hive/apache-hive-2.1.1/bin:/usr/local/hive/apache-hive-2.1.1/sbin:/usr/local/git/git-2.9.5/bin:/usr/local/go/bin:/root/bin)
Beeline version 2.1.1 by Apache Hive
beeline> !connect jdbc:hive2://hadoop-master:10000
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/apache-hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/apps/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://hadoop-master:10000
Enter username for jdbc:hive2://hadoop-master:10000: root
Enter password for jdbc:hive2://hadoop-master:10000: ****
Connected to: Apache Hive (version 2.1.1)
Driver: Hive JDBC (version 2.1.1)
21/06/27 11:26:43 [main]: WARN jdbc.HiveConnection: Request to set autoCommit to false; Hive does not support autoCommit=false.
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hadoop-master:10000> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| default        |
| hive_test      |
+----------------+--+
2 rows selected (0.385 seconds)
0: jdbc:hive2://hadoop-master:10000>
```
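Beeline can also connect non-interactively, which is convenient for scripting; a sketch using standard beeline flags (credentials match the ones used above):

```bash
beeline -u "jdbc:hive2://hadoop-master:10000" -n root -p root -e "show databases;"
```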