Big Data Spark 2021 (31): Spark On Hive
Contents
Spark On Hive
Integrating Hive in spark-sql
Integrating Hive in Spark code
Spark On Hive
Historically, the Spark SQL module grew out of the Apache Hive framework. Its lineage is: Hive (MapReduce) -> Shark (Hive on Spark) -> Spark SQL (SchemaRDD -> DataFrame -> Dataset). As a result, Spark SQL integrates with Hive seamlessly and can load data from Hive tables for analysis.
http://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html
Integrating Hive in spark-sql
At its core, integrating Spark SQL with Hive means reading Hive's metadata through the MetaStore, so all that is needed here is to start the Hive MetaStore service:
nohup /export/server/hive/bin/hive --service metastore &
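Because the MetaStore is started in the background, it can be useful to confirm that it is accepting connections before pointing Spark at it. The following is a minimal sketch, not part of the original setup: the object name MetastoreCheck and the helper isPortOpen are hypothetical, and the host node3 and port 9083 are assumed from the hive.metastore.uris value used in this article. It only checks that the TCP port is open; it does not speak the Thrift protocol.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

// Hypothetical helper (not from the original article): checks whether a TCP
// port accepts connections. It does not verify that the service behind the
// port is actually a Hive MetaStore.
object MetastoreCheck {

  /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
  def isPortOpen(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      // Covers connection refused, timeouts and unresolvable host names.
      case _: IOException => false
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumption: node3:9083 matches thrift://node3:9083 in hive-site.xml.
    val reachable = isPortOpen("node3", 9083)
    println(s"Hive MetaStore port reachable: $reachable")
  }
}
```

If the check reports false, the nohup output of the MetaStore process is the first place to look.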
Write the configuration file hive-site.xml and place it in the $SPARK_HOME/conf directory on node1:
cd /export/server/spark/conf/
vim hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <property>
    <name>hive.metastore.local</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://node3:9083</value>
  </property>
</configuration>
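You can also distribute hive-site.xml to the conf directory of every Spark installation in the cluster; an application started on any machine can then access the Hive table data.
Working with Hive from spark-sql: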
/export/server/spark/bin/spark-sql --master local[2] --conf spark.sql.shuffle.partitions=2
show databases;
show tables;
CREATE TABLE person (id int, name string, age int) row format delimited fields terminated by ' ';
LOAD DATA LOCAL INPATH 'file:///root/person.txt' INTO TABLE person;
show tables;
select * from person;
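The clause "row format delimited fields terminated by ' '" means each line of person.txt is expected to hold three space-separated fields that map to id, name and age. The sketch below illustrates that mapping in plain Scala. It is only an illustration under stated assumptions: the object PersonLineDemo, the case class Person, the helper parseLine and the sample rows are all hypothetical, since the article does not show the contents of person.txt. Hive itself is more lenient (it typically yields NULL for a field it cannot parse), whereas this sketch rejects the whole line.

```scala
import scala.util.Try

// Hypothetical illustration of how a space-delimited line maps to the
// (id int, name string, age int) columns of the person table.
object PersonLineDemo {

  final case class Person(id: Int, name: String, age: Int)

  /** Parses "id name age"; returns None when the line does not fit the schema. */
  def parseLine(line: String): Option[Person] =
    line.split(" ") match {
      case Array(id, name, age) =>
        // toInt throws on non-numeric input; Try turns that into None.
        Try(Person(id.toInt, name, age.toInt)).toOption
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    // Made-up sample rows; the real person.txt is not shown in the article.
    val sample = Seq("1 zhangsan 18", "2 lisi 25", "bad line")
    sample.foreach(line => println(s"'$line' -> ${parseLine(line)}"))
  }
}
```

A name that itself contains a space would break this layout, which is one reason to pick a delimiter that cannot occur in the data.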
Integrating Hive in Spark code
To develop an application in IDEA that integrates Hive and reads table data for analysis, set the Hive MetaStore server address and enable Hive support when building the SparkSession. First add the Maven dependencies:
<!-- Spark SQL + Hive dependencies -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive_2.11</artifactId>
  <version>${spark.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive-thriftserver_2.11</artifactId>
  <version>${spark.version}</version>
</dependency>
The example code is as follows:
package cn.it.sql

import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

/**
 * Spark SQL integration with Hive
 */
object SparkSQLHive {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder()
      .appName(this.getClass.getSimpleName.stripSuffix("$"))
      .master("local[*]")
      .config("spark.sql.shuffle.partitions", "4")
      .config("spark.sql.warehouse.dir", "hdfs://node1:8020/user/hive/warehouse")
      .config("hive.metastore.uris", "thrift://node3:9083")
      .enableHiveSupport() // enable Hive support (metastore access and Hive syntax)
      .getOrCreate()
    val sc: SparkContext = spark.sparkContext
    sc.setLogLevel("WARN")
    import spark.implicits._
    import org.apache.spark.sql.functions._

    // List the existing tables
    spark.sql("show tables").show()
    // Create a table
    spark.sql("CREATE TABLE person2 (id int, name string, age int) row format delimited fields terminated by ' '")
    // Load data from a local file
    spark.sql("LOAD DATA LOCAL INPATH 'file:///D:/person.txt' INTO TABLE person2")
    // List the tables again
    spark.sql("show tables").show()
    // Query the data
    spark.sql("select * from person2").show()
  }
}