Shark Cluster Setup and Configuration

I. A Brief Introduction to Shark
Shark is a SQL query engine built on top of Spark and Hive. The architecture diagram and performance benchmark charts from the official site are shown below. (P.S. I also ran a performance test of my own; see the Shark performance test report.)

Two dependent components are involved: one is Apache Spark, the other is AMPLab's Hive 0.11.

Pay attention to version selection here; be sure to use the officially recommended combination:

Spark 0.9.1 + AMPLab Hive 0.11 + Shark 0.9.1

Be sure to build them yourself so they fit your own cluster.
II. Setting Up the Shark Cluster
1. Set up the Spark cluster. For this, see: Spark Cluster Setup.
2. Build AMPLab's Hive 0.11: go into its root directory and simply run ant package.
3. Build Shark. This step is the same as building Spark; just make sure it is compatible with your HDFS version. Edit the Hadoop version in SharkBuild.scala under the project directory, then run sbt/sbt assembly. (A combined sketch of steps 2 and 3 follows below.)
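As a rough shell sketch of steps 2 and 3: the repository URL, the branch name, and the DEFAULT_HADOOP_VERSION variable name in project/SharkBuild.scala are assumptions based on the AMPLab releases of that era, so verify them against the copies you actually check out. The Hadoop version used here matches the assembly jar name that appears later in this post.

#!/bin/bash
# Step 2: build AMPLab's Hive 0.11 (repo URL and branch are assumptions)
git clone https://github.com/amplab/hive.git
cd hive
git checkout shark-0.11        # assumed branch name; check the repo's branches
ant package
cd ..

# Step 3: build Shark against the Hadoop version of your HDFS
cd shark
# DEFAULT_HADOOP_VERSION is the assumed variable name; confirm in project/SharkBuild.scala
sed -i 's/DEFAULT_HADOOP_VERSION = ".*"/DEFAULT_HADOOP_VERSION = "0.20.2-cdh3u5"/' project/SharkBuild.scala
sbt/sbt assembly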
III. Starting Spark + Shark
First, start Spark. The Spark configuration files need to be modified; in spark-env.sh set:

HADOOP_CONF_DIR=/home/hadoop/src/hadoop/conf
SPARK_CLASSPATH=/home/hadoop/src/hadoop/lib/:/app/hadoop/shengli/sharklib/*
SPARK_LOCAL_DIRS=/app/hadoop/shengli/spark/data
SPARK_MASTER_IP=10.1.8.210
SPARK_MASTER_WEBUI_PORT=7078

Next, configure Spark's spark-defaults.conf:

spark.master                     spark://10.1.8.210:7077
spark.executor.memory            32g
spark.shuffle.spill              true
java.library.path                /usr/local/lib
spark.shuffle.consolidateFiles   true
# spark.eventLog.enabled         true
# spark.eventLog.dir             hdfs://namenode:8021/directory
# spark.serializer               org.apache.spark.serializer.KryoSerializer
Finally, start the cluster with sbin/start-all.sh. The Spark cluster configuration is now complete.
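A quick way to bring the cluster up and confirm the daemons are running (paths and the web UI port follow the configuration above; the jps process names are the standard ones for a Spark 0.9 standalone cluster):

cd /app/hadoop/shengli/spark
sbin/start-all.sh
# The master host should now show a Master process, each worker a Worker process
jps | grep -E 'Master|Worker'
# The standalone master web UI is then at http://10.1.8.210:7078 (SPARK_MASTER_WEBUI_PORT above)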
Shark has a set of dependency jars; we copy them all into one directory:

#!/bin/bash
for jar in `find /home/hadoop/shengli/shark/lib -name '*jar'`; do
  cp $jar /home/hadoop/shengli/sharklib/
done
for jar in `find /home/hadoop/shengli/shark/lib_managed/jars -name '*jar'`; do
  cp $jar /home/hadoop/shengli/sharklib/
done
for jar in `find /home/hadoop/shengli/shark/lib_managed/bundles -name '*jar'`; do
  cp $jar /home/hadoop/shengli/sharklib/
done

Then configure Shark in shark/conf/shark-env.sh:
# format as the JVM's -Xmx option, e.g. 300m or 1g.
export JAVA_HOME=/usr/java/jdk1.7.0_25

# (Required) Set the master program's memory
#export SHARK_MASTER_MEM=1g

# (Optional) Specify the location of Hive's configuration directory. By default,
# Shark run scripts will point it to $SHARK_HOME/conf
#export HIVE_CONF_DIR=""
export HADOOP_HOME=/home/hadoop/src/hadoop

# For running Shark in distributed mode, set the following:
export SHARK_MASTER_MEM=1g
export HADOOP_HOME=$HADOOP_HOME
export SPARK_HOME=/app/hadoop/shengli/spark
export SPARK_MASTER_IP=10.1.8.210
export MASTER=spark://10.1.8.210:7077

# Only required if using Mesos:
#export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so

# Only required if run shark with spark on yarn
#export SHARK_EXEC_MODE=yarn
#export SPARK_ASSEMBLY_JAR=
#export SHARK_ASSEMBLY_JAR=

# (Optional) Extra classpath
#export SPARK_LIBRARY_PATH=""

# Java options
# On EC2, change the local.dir to /mnt/tmp

# (Optional) Tachyon Related Configuration
#export TACHYON_MASTER=""              # e.g. "localhost:19998"
#export TACHYON_WAREHOUSE_PATH=/sharktables   # Could be any valid path name

#export HIVE_HOME=/home/hadoop/shengli/hive/build/dest
export HIVE_CONF_DIR=/app/hadoop/shengli/hive/conf
export CLASSPATH=$CLASSPATH:/home/hadoop/src/hadoop/lib:/home/hadoop/src/hadoop/lib/native:/app/hadoop/shengli/sharklib/*

export SCALA_HOME=/app/hadoop/shengli/scala-2.10.3

#export SPARK_LIBRARY_PATH=/home/hadoop/src/hadoop/lib/native/Linux-amd64-64
#export LD_LIBRARY_PATH=/home/hadoop/src/hadoop/lib/native/Linux-amd64-64

# spark conf copied here
SPARK_JAVA_OPTS=" -Dspark.cores.max=8 -Dspark.local.dir=/app/hadoop/shengli/spark/data -Dspark.deploy.defaultCores=2 -Dspark.executor.memory=24g -Dspark.shuffle.spill=true -Djava.library.path=/usr/local/lib "
SPARK_JAVA_OPTS+="-Xmx4g -Xms4g -verbose:gc -XX:-PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCompressedOops "
export SPARK_JAVA_OPTS

Next comes the cluster side of Shark: distribute the built Spark, Shark, and Hive to every node, and keep them synchronized with rsync.
rsync --update -pav --progress /app/hadoop/shengli/spark/ root@10.1.8.211:/app/hadoop/shengli/spark/
......
rsync --update -pav --progress /app/hadoop/shengli/shark/ root@10.1.8.211:/app/hadoop/shengli/shark/
......
rsync --update -pav --progress /app/hadoop/shengli/hive/ root@10.1.8.211:/app/hadoop/shengli/hive/
......
rsync --update -pav --progress /app/hadoop/shengli/sharklib/ root@10.1.8.211:/app/hadoop/shengli/sharklib/
......
rsync --update -pav --progress /usr/java/jdk1.7.0_25/ root@10.1.8.211:/usr/java/jdk1.7.0_25/
......

These commands are repeated for every node. Once everything is distributed, start Shark; the cluster status can then be viewed on the web UI (the web UI port configured above is 7078). A small distribution helper is sketched below.
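To avoid retyping the same rsync calls per node, a loop over the worker list can drive them all; using Spark's conf/slaves file as the host list is only an assumption for illustration, so substitute however you track your nodes.

#!/bin/bash
# Sync the built Spark, Shark, Hive, the collected jars, and the JDK to every worker
for host in $(cat /app/hadoop/shengli/spark/conf/slaves); do
  for dir in /app/hadoop/shengli/spark /app/hadoop/shengli/shark \
             /app/hadoop/shengli/hive /app/hadoop/shengli/sharklib /usr/java/jdk1.7.0_25; do
    rsync --update -pav --progress "$dir/" "root@$host:$dir/"
  done
done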
Go into SHARK_HOME/bin:
drwxr-xr-x  4 hadoop games 4.0K Jun 12 10:01 .
drwxr-xr-x 13 hadoop games 4.0K Jun 16 16:59 ..
-rwxr-xr-x  1 hadoop games  882 Apr 10 19:18 beeline
drwxr-xr-x  2 hadoop games 4.0K Jun 12 10:01 dev
drwxr-xr-x  2 hadoop games 4.0K Jun 12 10:01 ext
-rwxr-xr-x  1 hadoop games 1.4K Apr 10 19:18 shark
-rwxr-xr-x  1 hadoop games  730 Apr 10 19:18 shark-shell
-rwxr-xr-x  1 hadoop games  840 Apr 10 19:18 shark-withdebug
-rwxr-xr-x  1 hadoop games  838 Apr 10 19:18 shark-withinfo

Here, shark launches the Shark CLI directly.
shark-shell is similar to spark-shell.
shark-withdebug runs with log4j at DEBUG level, which is handy for troubleshooting errors and understanding execution.
shark-withinfo is the same idea, but with INFO-level logging.
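As a quick sanity check of any of these entry points, a single statement can be run from the shell. The -e option comes from the Hive CLI that Shark's CLI is based on; I believe it is passed through, but treat its availability as an assumption and confirm against the script's usage output.

cd /app/hadoop/shengli/shark
# Run one statement with INFO-level logging to exercise the whole stack (-e assumed to be supported)
bin/shark-withinfo -e "show tables;"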
Shark also provides a shark server mode, which lets multiple clients share one application and its cached RDDs.
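Before clients can attach, the shared server has to be started on the master. As far as I recall, the Shark run scripts expose this as a sharkserver service, but treat the exact invocation as an assumption and check the Shark documentation for your release.

# On the master (10.1.8.210), start a long-running Shark server on port 7100
# (some versions may expect the port as "-p 7100" instead of a positional argument)
bin/shark --service sharkserver 7100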
A client then connects to the server like this:

bin/shark -h 10.1.8.210 -p 7100
-h 10.1.8.210 -p 7100
Starting the Shark Command Line Client
Logging initialized using configuration in jar:file:/app/hadoop/shengli/sharklib/hive-common-0.11.0-shark-0.9.1.jar!/hive-log4j.properties
Hive history file=/tmp/root/hive_job_log_root_25876@wh-8-210_201406171640_1172020906.txt
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hadoop/shengli/sharklib/slf4j-log4j12-1.7.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/shengli/sharklib/shark-assembly-0.9.1-hadoop0.20.2-cdh3u5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/shengli/shark/lib_managed/jars/org.slf4j/slf4j-log4j12/slf4j-log4j12-1.7.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2.870: [GC 262208K->21869K(1004928K), 0.0274310 secs]
[10.1.8.210:7100] shark>

Now multiple clients can connect to this port:

bin/shark -h 10.1.8.210 -p 7100
-h 10.1.8.210 -p 7100
Starting the Shark Command Line Client
Logging initialized using configuration in jar:file:/app/hadoop/shengli/sharklib/hive-common-0.11.0-shark-0.9.1.jar!/hive-log4j.properties
Hive history file=/tmp/hadoop/hive_job_log_hadoop_28486@wh-8-210_201406171719_457245737.txt
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hadoop/shengli/sharklib/slf4j-log4j12-1.7.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/shengli/sharklib/shark-assembly-0.9.1-hadoop0.20.2-cdh3u5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/shengli/shark/lib_managed/jars/org.slf4j/slf4j-log4j12/slf4j-log4j12-1.7.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
show ta3.050: [GC 262208K->22324K(1004928K), 0.0240010 secs] ble
[10.1.8.210:7100] shark> show tables;
Time taken (including network latency): 0.072 seconds
At this point, Shark is up and running.
IV. Testing

Let's run a simple test to see whether everything works, by processing a 21 GB file.
[hadoop@wh-8-210 shark]$ hadoop dfs -ls /user/hive/warehouse/log/
Found 1 items
-rw-r--r--   3 hadoop supergroup 22499035249 2014-06-16 18:32 /user/hive/warehouse/log/21gfile

create table log (
  c1 string, c2 string, c3 string, c4 string, c5 string, c6 string, c7 string,
  c8 string, c9 string, c10 string, c11 string, c12 string, c13 string
)
row format delimited
fields terminated by '\t'
stored as textfile;
load data inpath '/user/hive/warehouse/log/21gfile' into table log;
Count the log table:

[10.1.8.210:7100] shark> select count(1) from log
                       > ;
171802086
Time taken (including network latency): 33.753 seconds

This took about 33 seconds.
Now load the entire log table into memory and count log_cached:
CREATE TABLE log_cached TBLPROPERTIES ("shark.cache" = "true") AS SELECT * from log;
Time taken (including network latency): 481.96 seconds

shark> select count(1) from log_cached;
171802086
Time taken (including network latency): 6.051 seconds

This took about 6 seconds, at least a 5x speedup.
The storage status of the executors and tasks, as well as the Storage view, can be checked in the Spark web UI.
This completes the Shark cluster setup and a simple round of testing.
Later I will write up the common problems encountered while setting up this environment, along with more detailed Shark test results.
Note: this is an original article; when reposting, please credit the source: http://blog.csdn.net/oopsoom/article/details/30513929
-EOF-