Spark Learning 7: Setting Up a Local Spark Build Environment in IDEA and Running on a Cluster
More code: https://github.com/xubo245/SparkLearning
Environment:
Local: Windows 7 64-bit + IDEA 15.0.4 + Scala 2.10.5
Cluster: Ubuntu + Spark 1.5.2
1. Install Scala 2.10.5 and configure its environment variables. You also need JDK 1.7, again with its environment variables set. There are plenty of tutorials for this, so the details are omitted here; a quick sanity check of the setup is sketched below.
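Before moving on, it can help to confirm which JDK and Scala the tools actually pick up. A minimal sketch (the EnvCheck object is my addition, not from the original post):

```scala
// EnvCheck.scala -- verify the JDK and Scala install before touching Spark.
object EnvCheck {
  def main(args: Array[String]): Unit = {
    println("java.version  = " + System.getProperty("java.version"))   // expect 1.7.x
    println("scala.version = " + scala.util.Properties.versionString)  // expect 2.10.5
    println("JAVA_HOME     = " + sys.env.getOrElse("JAVA_HOME", "(not set)"))
    println("SCALA_HOME    = " + sys.env.getOrElse("SCALA_HOME", "(not set)"))
  }
}
```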
2. Install IDEA 15.0.4 locally:
https://www.jetbrains.com/idea/download/#section=windows
3. Install the Scala plugin:
http://plugins.jetbrains.com/plugin/?idea&id=1347
Searching for "scala" directly under File -> Settings -> Plugins in IDEA 15.0.4 finds nothing (probably a network issue). Instead, download the plugin from the URL above and drop it into the plugins folder of the IDEA installation directory. After restarting IDEA the Scala plugin shows up, but it was still missing when creating a new project.
So I removed it, added http://www.jetbrains.net/confluence/display/SCA/Scala+Plugin+for+IntelliJ+IDEA as a plugin source under Settings -> Plugins, and a search under "Install JetBrains plugin" was then able to install Scala plugin 2.2.0.
Since Spark 1.5.2 uses Scala 2.10, and spark-assembly-1.5.2-hadoop2.6.0.jar is likewise built for Scala 2.10, go to the directory the plugin was just installed into, C:\Users\xubo\.IdeaIC15\config\plugins (the default plugin location for my IDEA), rename the existing scala folder to scala2 as a backup, and unpack the Scala 2.10 version downloaded from http://plugins.jetbrains.com/plugin/?idea&id=1347 into that directory. A quick runtime check for the version match is sketched after this step.
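Once a project compiles, one way to confirm the versions agree is to print them side by side; if the Scala binary version does not match the assembly jar, you typically see NoSuchMethodError at runtime. A minimal sketch (the VersionCheck object is mine, not from the original):

```scala
// VersionCheck.scala -- the Scala binary version must match what the
// spark-assembly jar was built against (2.10 for Spark 1.5.2).
object VersionCheck {
  def main(args: Array[String]): Unit = {
    println("Scala version: " + scala.util.Properties.versionNumberString) // expect 2.10.x
    println("Spark version: " + org.apache.spark.SPARK_VERSION)            // expect 1.5.2
  }
}
```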
4. Restart IDEA. You can now create a new Scala project, add spark-assembly-1.5.2-hadoop2.6.0.jar as a dependency, and compile Spark programs locally.
You also need the hadoop-2.6.0 runtime files installed and the environment variables configured, plus the correct supporting binaries (the original sentence breaks off here; on Windows this usually means winutils.exe — see the workaround after the run output below).
Example: SparkPi.scala, copied from the Spark source with setMaster added:
```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// scalastyle:off println
package scalaTest

import scala.math.random

import org.apache.spark._

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark Pi").setMaster("local")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 2
    println("slices:\n" + slices)
    println("args.length:\n" + args.length)
    val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
    val count = spark.parallelize(1 until n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}
// scalastyle:on println
```

Local run output:

```
D:\1win7\java\jdk\bin\java -Didea.launcher.port=7534 "-Didea.launcher.bin.path=D:\1win7\idea\IntelliJ IDEA Community Edition 15.0.4\bin" -Dfile.encoding=UTF-8 -classpath "D:\1win7\java\jdk\jre\lib\charsets.jar;D:\1win7\java\jdk\jre\lib\deploy.jar;D:\1win7\java\jdk\jre\lib\ext\access-bridge-64.jar;D:\1win7\java\jdk\jre\lib\ext\dnsns.jar;D:\1win7\java\jdk\jre\lib\ext\jaccess.jar;D:\1win7\java\jdk\jre\lib\ext\localedata.jar;D:\1win7\java\jdk\jre\lib\ext\sunec.jar;D:\1win7\java\jdk\jre\lib\ext\sunjce_provider.jar;D:\1win7\java\jdk\jre\lib\ext\sunmscapi.jar;D:\1win7\java\jdk\jre\lib\ext\zipfs.jar;D:\1win7\java\jdk\jre\lib\javaws.jar;D:\1win7\java\jdk\jre\lib\jce.jar;D:\1win7\java\jdk\jre\lib\jfr.jar;D:\1win7\java\jdk\jre\lib\jfxrt.jar;D:\1win7\java\jdk\jre\lib\jsse.jar;D:\1win7\java\jdk\jre\lib\management-agent.jar;D:\1win7\java\jdk\jre\lib\plugin.jar;D:\1win7\java\jdk\jre\lib\resources.jar;D:\1win7\java\jdk\jre\lib\rt.jar;D:\1win7\scala;D:\1win7\scala\lib;D:\all\idea\scala2\out\production\scala2;G:\149\spark-assembly-1.5.2-hadoop2.6.0.jar;D:\1win7\scala\lib\scala-actors-migration.jar;D:\1win7\scala\lib\scala-actors.jar;D:\1win7\scala\lib\scala-library.jar;D:\1win7\scala\lib\scala-reflect.jar;D:\1win7\scala\lib\scala-swing.jar;D:\1win7\idea\IntelliJ IDEA Community Edition 15.0.4\lib\idea_rt.jar" com.intellij.rt.execution.application.AppMain scalaTest.SparkPi
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/03/03 17:19:19 INFO SparkContext: Running Spark version 1.5.2
16/03/03 17:19:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/03 17:19:21 INFO SecurityManager: Changing view acls to: xubo
16/03/03 17:19:21 INFO SecurityManager: Changing modify acls to: xubo
16/03/03 17:19:21 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xubo); users with modify permissions: Set(xubo)
16/03/03 17:19:22 INFO Slf4jLogger: Slf4jLogger started
16/03/03 17:19:22 INFO Remoting: Starting remoting
16/03/03 17:19:22 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@202.38.84.241:52826]
16/03/03 17:19:22 INFO Utils: Successfully started service 'sparkDriver' on port 52826.
16/03/03 17:19:22 INFO SparkEnv: Registering MapOutputTracker
16/03/03 17:19:22 INFO SparkEnv: Registering BlockManagerMaster
16/03/03 17:19:22 INFO DiskBlockManager: Created local directory at C:\Users\xubo\AppData\Local\Temp\blockmgr-193ae298-f771-488a-92ee-60c4e94ca9d1
16/03/03 17:19:22 INFO MemoryStore: MemoryStore started with capacity 730.6 MB
16/03/03 17:19:22 INFO HttpFileServer: HTTP File server directory is C:\Users\xubo\AppData\Local\Temp\spark-4b618306-ea29-4c02-a891-754af4d84648\httpd-0a2aa0cd-b7f2-453b-983c-482852013882
16/03/03 17:19:22 INFO HttpServer: Starting HTTP Server
16/03/03 17:19:22 INFO Utils: Successfully started service 'HTTP file server' on port 52827.
16/03/03 17:19:22 INFO SparkEnv: Registering OutputCommitCoordinator
16/03/03 17:19:22 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/03/03 17:19:22 INFO SparkUI: Started SparkUI at http://202.38.84.241:4040
16/03/03 17:19:23 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
16/03/03 17:19:23 INFO Executor: Starting executor ID driver on host localhost
16/03/03 17:19:23 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 52834.
16/03/03 17:19:23 INFO NettyBlockTransferService: Server created on 52834
16/03/03 17:19:23 INFO BlockManagerMaster: Trying to register BlockManager
16/03/03 17:19:23 INFO BlockManagerMasterEndpoint: Registering block manager localhost:52834 with 730.6 MB RAM, BlockManagerId(driver, localhost, 52834)
16/03/03 17:19:23 INFO BlockManagerMaster: Registered BlockManager
slices:
2
args.length:
0
16/03/03 17:19:24 INFO SparkContext: Starting job: main at NativeMethodAccessorImpl.java:-2
16/03/03 17:19:24 INFO DAGScheduler: Got job 0 (main at NativeMethodAccessorImpl.java:-2) with 2 output partitions
16/03/03 17:19:24 INFO DAGScheduler: Final stage: ResultStage 0(main at NativeMethodAccessorImpl.java:-2)
16/03/03 17:19:24 INFO DAGScheduler: Parents of final stage: List()
16/03/03 17:19:24 INFO DAGScheduler: Missing parents: List()
16/03/03 17:19:24 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at main at NativeMethodAccessorImpl.java:-2), which has no missing parents
16/03/03 17:19:24 INFO MemoryStore: ensureFreeSpace(1856) called with curMem=0, maxMem=766075207
16/03/03 17:19:24 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1856.0 B, free 730.6 MB)
16/03/03 17:19:24 INFO MemoryStore: ensureFreeSpace(1198) called with curMem=1856, maxMem=766075207
16/03/03 17:19:24 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1198.0 B, free 730.6 MB)
16/03/03 17:19:24 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:52834 (size: 1198.0 B, free: 730.6 MB)
16/03/03 17:19:24 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:861
16/03/03 17:19:24 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at main at NativeMethodAccessorImpl.java:-2)
16/03/03 17:19:24 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/03/03 17:19:24 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 2085 bytes)
16/03/03 17:19:24 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
16/03/03 17:19:24 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1031 bytes result sent to driver
16/03/03 17:19:24 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 2085 bytes)
16/03/03 17:19:24 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
16/03/03 17:19:24 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 190 ms on localhost (1/2)
16/03/03 17:19:24 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 1031 bytes result sent to driver
16/03/03 17:19:24 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 33 ms on localhost (2/2)
16/03/03 17:19:24 INFO DAGScheduler: ResultStage 0 (main at NativeMethodAccessorImpl.java:-2) finished in 0.230 s
16/03/03 17:19:24 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/03/03 17:19:24 INFO DAGScheduler: Job 0 finished: main at NativeMethodAccessorImpl.java:-2, took 0.545201 s
Pi is roughly 3.14548
16/03/03 17:19:24 INFO SparkUI: Stopped Spark web UI at http://202.38.84.241:4040
16/03/03 17:19:24 INFO DAGScheduler: Stopping DAGScheduler
16/03/03 17:19:24 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/03/03 17:19:24 INFO MemoryStore: MemoryStore cleared
16/03/03 17:19:24 INFO BlockManager: BlockManager stopped
16/03/03 17:19:24 INFO BlockManagerMaster: BlockManagerMaster stopped
16/03/03 17:19:24 INFO SparkContext: Successfully stopped SparkContext
16/03/03 17:19:24 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/03/03 17:19:24 INFO ShutdownHookManager: Shutdown hook called
16/03/03 17:19:24 INFO ShutdownHookManager: Deleting directory C:\Users\xubo\AppData\Local\Temp\spark-4b618306-ea29-4c02-a891-754af4d84648

Process finished with exit code 0
```
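The WARN NativeCodeLoader line above is harmless for a local run. If, however, a local job on Windows fails complaining that it cannot locate winutils.exe, pointing hadoop.home.dir at the hadoop-2.6.0 runtime directory from step 4 before creating the SparkContext usually helps. A minimal sketch, assuming the files were unpacked to D:\hadoop-2.6.0 (a hypothetical path; adjust to your layout):

```scala
// Must run before `new SparkContext(conf)`; the directory is assumed to
// contain bin\winutils.exe from the hadoop-2.6.0 runtime files.
System.setProperty("hadoop.home.dir", "D:\\hadoop-2.6.0")
```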
5. Package the code into a jar and upload it to the cluster; for reference, see the book "Spark大數據應用", p. 123.
Roughly: File -> Project Structure -> Artifacts, then JAR -> From modules with dependencies...
Select the main class. You can remove the Scala and Spark jars from the artifact, otherwise it will be very large. Finally, choose Build -> Build Artifacts in IDEA to generate the jar, copy it to the cluster, and run it there. (An sbt alternative is sketched below.)
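If you prefer building outside IDEA, an sbt build file achieves the same thing; marking the Spark dependency as "provided" keeps the Spark and Scala classes out of the jar, matching the advice above. A minimal sketch (sbt is not covered in the original post; names and versions follow this setup):

```scala
// build.sbt -- a hypothetical sbt equivalent of the IDEA artifact setup.
name := "scala2"

version := "1.0"

scalaVersion := "2.10.5"

// "provided" excludes Spark (and its transitive jars) from the packaged jar,
// since the cluster already supplies these classes at runtime.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2" % "provided"
```

Running `sbt package` then produces a small jar under target/scala-2.10/ that can be submitted exactly like the one built by IDEA.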
Run script:

```bash
#!/usr/bin/env bash
spark-submit --name SparkPi \
  --class scalaTest.SparkPi \
  --master spark://219.219.220.149:7077 \
  --executor-memory 512M \
  --total-executor-cores 22 scala2.jar
```

Location: /home/hadoop/cloud/testByXubo/spark/backupSuccess/ideaSparkPi/1
Execution result:

```
hadoop@Master:~/cloud/testByXubo/spark/backupSuccess/ideaSparkPi/1$ ./submitJob.sh
slices:
2
args.length:
0
Pi is roughly 3.14344
```
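A note on the accuracy: with the default slices = 2, the job samples n = min(100000 * 2, Int.MaxValue) = 200,000 random points in the unit square, and the fraction landing inside the quarter circle, scaled by 4, gives the estimate 4 * count / n ≈ 3.14344 here. Passing an argument after scala2.jar in the script (e.g. 100) raises slices and therefore n, tightening the Monte Carlo estimate at the cost of more tasks.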
Summary
With a Scala 2.10 plugin and spark-assembly-1.5.2-hadoop2.6.0.jar on the classpath, Spark programs can be written and debugged locally in IDEA on Windows, then packaged into a jar and submitted to the Ubuntu/Spark 1.5.2 cluster with spark-submit; the local and cluster runs above both print a Pi estimate near 3.14.