慕课网 Spark SQL Log Analysis - 4. A Smooth Transition from Hive to Spark SQL
4.1 SQLContext/HiveContext/SparkSession
1. SQLContext
Docs for the old version: spark.apache.org/docs/1.6.1/
- SQLContext example file (a sketch is given after this list):
- Packaging:
- Submit the Spark application to the environment and run it. Docs: spark.apache.org/docs/1.6.1/…
- Script submission: wrap the command above in a shell script and give it execute permission to run it.
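The example file itself is not reproduced in these notes. As a rough sketch (the object name, jar name, and input path below are illustrative assumptions, not the course's exact code), a minimal SQLContext application that reads a JSON file could look like this:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Minimal SQLContext example (Spark 1.6 style): read a JSON file whose path
// is passed as the first argument and print its schema and rows.
object SQLContextApp {
  def main(args: Array[String]): Unit = {
    val path = args(0)

    val sparkConf = new SparkConf()
    val sc = new SparkContext(sparkConf)
    val sqlContext = new SQLContext(sc)

    val people = sqlContext.read.format("json").load(path)
    people.printSchema()
    people.show()

    sc.stop()
  }
}
```

After packaging the project into a jar, such an application would typically be submitted with something along the lines of `spark-submit --class SQLContextApp --master local[2] sql-1.0.jar file:///path/to/people.json` (the jar name and file path are placeholders), and that command is what the shell script mentioned above would wrap.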
2. HiveContext usage
To use a HiveContext, you do not need to have an existing Hive setup
The code is similar to the code above; just replace SQLContext with HiveContext. When running it, though, you need to pass the MySQL driver to the classpath via --jars.
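For reference, a minimal HiveContext sketch under the same assumptions (the table name `emp` is illustrative; this is not the course's exact code):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Same structure as the SQLContext example, but HiveContext can read tables
// registered in the Hive metastore.
object HiveContextApp {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf()
    val sc = new SparkContext(sparkConf)
    val hiveContext = new HiveContext(sc)

    // Query a Hive table (name assumed for illustration) and show its rows.
    hiveContext.table("emp").show()

    sc.stop()
  }
}
```

When submitting, pass the MySQL driver jar via `--jars /path/to/mysql-connector-java.jar` so the Hive metastore (backed by MySQL) can be reached; the exact path depends on your environment.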
3. SparkSession
```scala
import org.apache.spark.sql.SparkSession

// SparkSession is the unified entry point in Spark 2.x, replacing SQLContext/HiveContext.
def main(args: Array[String]): Unit = {
  val path = args(0)

  val spark = SparkSession
    .builder()
    .appName("SQLContextApp")
    .config("spark.driver.bindAddress", "127.0.0.1")
    .master("local[2]")
    .getOrCreate()

  // Read a JSON file and print its schema and rows
  val people = spark.read.format("json").load(path)
  people.printSchema()
  people.show()

  spark.stop()
}
```

4.2 Using spark-shell/spark-sql
Analyze the execution plan to understand the Spark SQL architecture:
```sql
create table t(key string, value string);

explain extended
select a.key * (2+3), b.value
from t a join t b
on a.key = b.key and a.key > 3;
```

```
# parsed into a logical plan
== Parsed Logical Plan ==
# unresolvedalias: not fully resolved yet
'Project [unresolvedalias(('a.key * (2 + 3)), None), 'b.value]        # the two selected columns
+- 'Join Inner, (('a.key = 'b.key) && ('a.key > 3))                   # the conditions after ON
   :- 'SubqueryAlias a
   :  +- 'UnresolvedRelation `t`
   +- 'SubqueryAlias b
      +- 'UnresolvedRelation `t`

# analysis (needs to talk to the underlying metastore)
== Analyzed Logical Plan ==
(CAST(key AS DOUBLE) * CAST((2 + 3) AS DOUBLE)): double, value: string
# a.key and (2 + 3) are each cast to double
Project [(cast(key#8 as double) * cast((2 + 3) as double)) AS (CAST(key AS DOUBLE) * CAST((2 + 3) AS DOUBLE))#12, value#11]   # the two selected columns
+- Join Inner, ((key#8 = key#10) && (cast(key#8 as int) > 3))
   :- SubqueryAlias a
   :  +- SubqueryAlias t                                              # resolved to a concrete table in the metastore
   :     +- CatalogRelation `default`.`t`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#8, value#9]
   +- SubqueryAlias b
      +- SubqueryAlias t
         +- CatalogRelation `default`.`t`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#10, value#11]

# optimization
== Optimized Logical Plan ==
Project [(cast(key#8 as double) * 5.0) AS (CAST(key AS DOUBLE) * CAST((2 + 3) AS DOUBLE))#12, value#11]
+- Join Inner, (key#8 = key#10)
   :- Project [key#8]
   :  +- Filter (isnotnull(key#8) && (cast(key#8 as int) > 3))        # a.key > 3 is pushed down so filtering happens first
   :     +- CatalogRelation `default`.`t`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#8, value#9]
   +- Filter (isnotnull(key#10) && (cast(key#10 as int) > 3))
      +- CatalogRelation `default`.`t`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#10, value#11]

# physical plan
== Physical Plan ==
*Project [(cast(key#8 as double) * 5.0) AS (CAST(key AS DOUBLE) * CAST((2 + 3) AS DOUBLE))#12, value#11]
+- *SortMergeJoin [key#8], [key#10], Inner
   :- *Sort [key#8 ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(key#8, 200)
   :     +- *Filter (isnotnull(key#8) && (cast(key#8 as int) > 3))
   :        +- HiveTableScan [key#8], CatalogRelation `default`.`t`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#8, value#9]
   +- *Sort [key#10 ASC NULLS FIRST], false, 0
      +- Exchange hashpartitioning(key#10, 200)
         +- *Filter (isnotnull(key#10) && (cast(key#10 as int) > 3))
            +- HiveTableScan [key#10, value#11], CatalogRelation `default`.`t`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#10, value#11]
```

4.3 Using thriftserver/beeline
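The commands for starting the thriftserver and connecting with beeline are not captured in these notes; a minimal sketch, assuming the scripts shipped under `$SPARK_HOME`, the default port 10000, and an environment-specific MySQL driver path:

```bash
# Start the Thrift JDBC/ODBC server; it accepts the same options as spark-submit.
# The MySQL driver path below is an assumption; adjust it to your environment.
$SPARK_HOME/sbin/start-thriftserver.sh \
  --master local[2] \
  --jars /path/to/mysql-connector-java.jar

# Connect with beeline (default port 10000; the user name is illustrative).
$SPARK_HOME/bin/beeline -u jdbc:hive2://localhost:10000 -n gaowenfeng
```

Once queries have run against the thriftserver, the pages below are available in the application's web UI on port 4040.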
- http://localhost:4040/sqlserver/ : this page lists the SQL statements that have been executed and lets you inspect their execution plans
- http://localhost:4040/SQL/execution/ : shows detailed information about each SQL execution
3. Differences between thriftserver and spark-shell/spark-sql: spark-shell and spark-sql each launch their own Spark application, whereas the thriftserver is a single long-running Spark application that multiple clients (beeline, JDBC programs) connect to and share.
4.4 Programmatic access via JDBC
1. Add the Maven dependency
```xml
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>1.2.1.spark2</version>
</dependency>
```

2. Write code to access the thriftserver
Note: when developing against JDBC, make sure the thriftserver is started first.
```scala
import java.sql.DriverManager

def main(args: Array[String]): Unit = {
  // Load the Hive JDBC driver and connect to the thriftserver (default port 10000)
  Class.forName("org.apache.hive.jdbc.HiveDriver")
  val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000", "gaowenfeng", "")

  val pstmt = conn.prepareStatement("select * from emp")
  val rs = pstmt.executeQuery()

  while (rs.next()) {
    println(rs.getInt("id") + "\t" + rs.getString("name"))
  }

  rs.close()
  pstmt.close()
  conn.close()
}
```