Building a Hadoop Project with Maven
This post belongs to the Hadoop family series, which introduces the Hadoop family of products. Frequently covered projects include Hadoop, Hive, Pig, HBase, Sqoop, Mahout, Zookeeper, Avro, Ambari, and Chukwa; newer additions include YARN, Hcatalog, Oozie, Cassandra, Hama, Whirr, Flume, Bigtop, Crunch, Hue, and others.

Since 2011, China has entered a turbulent era of big data, and the family of software represented by Hadoop has come to dominate the big-data processing landscape. Nearly every data product, open source and commercial alike, has aligned itself with Hadoop, which has grown from a niche technology for a privileged few into the standard for big-data development. On top of the original Hadoop core, the Hadoop family of products keeps innovating around the concept of "big data" and pushing the technology forward.

As developers in the IT industry, we should keep pace, seize the opportunity, and rise together with Hadoop!
About the author:

- Zhang Dan (Conan), programmer: Java, R, PHP, Javascript
- weibo: @Conan_Z
- blog: http://blog.fens.me
- email: bsspirit@gmail.com
Please credit the source when reposting:
http://blog.fens.me/hadoop-maven-eclipse/
Preface

Hadoop's MapReduce environment is a complex programming environment, so we want to simplify the process of building a MapReduce project as much as possible. Maven is an excellent automated build tool that frees us from complicated environment configuration and standardizes the development process. So before writing any MapReduce code, let's take a moment to sharpen the knife! Of course, besides Maven there are other options, such as Gradle (recommended) and Ivy.

Several upcoming articles on MapReduce development will all rely on the Maven-based MapReduce environment built in this post.
Table of Contents

1. Maven Introduction
2. Installing Maven (Windows)
3. The Hadoop Development Environment
4. Building the Hadoop Environment with Maven
5. Developing a MapReduce Program
6. Uploading the Template Project to GitHub
1. Maven Introduction

Apache Maven is a project management and build automation tool for Java, provided by the Apache Software Foundation. Built around the concept of a Project Object Model (POM), Maven manages a project's build, reporting, and documentation from one central piece of information. Once a subproject of the Jakarta project, it is now an independent Apache project.

The Maven developers state on their website that Maven's goal is to make builds easier: it ties together the different stages of development, such as compiling, packaging, testing, and releasing, and produces consistent, high-quality project information so that team members receive timely feedback. Maven effectively supports test-first development and continuous integration, embodying a software development philosophy of encouraging communication and prompt feedback. If Ant's notion of reuse is built on copy and paste, Maven achieves genuine reuse of build logic through its plugin mechanism.
2. Installing Maven (Windows)

Download Maven: http://maven.apache.org/download.cgi

Download the latest xxx-bin.zip file and unpack it on Windows to D:\toolkit\maven3.

Then add the maven/bin directory to the PATH environment variable:
Next, open a command prompt and type mvn. Run without a goal it fails, but the output shows that the mvn command itself works:
```
~ C:\Users\Administrator>mvn
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 0.086s
[INFO] Finished at: Mon Sep 30 18:26:58 CST 2013
[INFO] Final Memory: 2M/179M
[INFO] ------------------------------------------------------------------------
[ERROR] No goals have been specified for this build. You must specify a valid lifecycle phase or a goal in the format <plugin-prefix>:<goal> or <plugin-group-id>:<plugin-artifact-id>[:<plugin-version>]:<goal>. Available lifecycle phases are: validate, initialize, generate-sources, process-sources, generate-resources, process-resources, compile, process-classes, generate-test-sources, process-test-sources, generate-test-resources, process-test-resources, test-compile, process-test-classes, test, prepare-package, package, pre-integration-test, integration-test, post-integration-test, verify, install, deploy, pre-clean, clean, post-clean, pre-site, site, post-site, site-deploy. -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/NoGoalSpecifiedException
```
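A quieter smoke test, by the way, is mvn -version (or mvn -v), a standard Maven flag that simply prints the Maven, Java, and OS versions without attempting a build:

```
~ C:\Users\Administrator>mvn -version
```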
Next, install the Maven plugin for Eclipse, Maven Integration for Eclipse, and configure it in Eclipse's preferences.
3. The Hadoop Development Environment

As the figure above shows, we can develop either on Windows or on Linux, running Hadoop locally or calling it remotely; in every case the standard tools are Maven and Eclipse.

Hadoop cluster environment:
- Linux: Ubuntu 12.04.2 LTS 64bit Server
- Java: 1.6.0_29
- Hadoop: hadoop-1.0.3, single node, IP: 192.168.1.210
4. Building the Hadoop Environment with Maven

- 1. Create a standard Java project with Maven
- 2. Import the project into Eclipse
- 3. Add the Hadoop dependency by editing pom.xml
- 4. Download the dependencies
- 5. Download the Hadoop configuration files from the cluster
- 6. Configure the local hosts file

1). Create a standard Java project with Maven
```
~ D:\workspace\java>mvn archetype:generate -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=org.conan.myhadoop.mr -DartifactId=myHadoop -DpackageName=org.conan.myhadoop.mr -Dversion=1.0-SNAPSHOT -DinteractiveMode=false
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Maven Stub Project (No POM) 1
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] >>> maven-archetype-plugin:2.2:generate (default-cli) @ standalone-pom >>>
[INFO]
[INFO] <<< maven-archetype-plugin:2.2:generate (default-cli) @ standalone-pom <<<
[INFO]
[INFO] --- maven-archetype-plugin:2.2:generate (default-cli) @ standalone-pom ---
[INFO] Generating project in Batch mode
[INFO] No archetype defined. Using maven-archetype-quickstart (org.apache.maven.archetypes:maven-archetype-quickstart:1.0)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/archetypes/maven-archetype-quickstart/1.0/maven-archetype-quickstart-1.0.jar
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/archetypes/maven-archetype-quickstart/1.0/maven-archetype-quickstart-1.0.jar (5 KB at 4.3 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/archetypes/maven-archetype-quickstart/1.0/maven-archetype-quickstart-1.0.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/archetypes/maven-archetype-quickstart/1.0/maven-archetype-quickstart-1.0.pom (703 B at 1.6 KB/sec)
[INFO] ----------------------------------------------------------------------------
[INFO] Using following parameters for creating project from Old (1.x) Archetype: maven-archetype-quickstart:1.0
[INFO] ----------------------------------------------------------------------------
[INFO] Parameter: groupId, Value: org.conan.myhadoop.mr
[INFO] Parameter: packageName, Value: org.conan.myhadoop.mr
[INFO] Parameter: package, Value: org.conan.myhadoop.mr
[INFO] Parameter: artifactId, Value: myHadoop
[INFO] Parameter: basedir, Value: D:\workspace\java
[INFO] Parameter: version, Value: 1.0-SNAPSHOT
[INFO] project created from Old (1.x) Archetype in dir: D:\workspace\java\myHadoop
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 8.896s
[INFO] Finished at: Sun Sep 29 20:57:07 CST 2013
[INFO] Final Memory: 9M/179M
[INFO] ------------------------------------------------------------------------
```

Enter the project directory and run the mvn build:
```
~ D:\workspace\java>cd myHadoop

~ D:\workspace\java\myHadoop>mvn clean install
[INFO]
[INFO] --- maven-jar-plugin:2.3.2:jar (default-jar) @ myHadoop ---
[INFO] Building jar: D:\workspace\java\myHadoop\target\myHadoop-1.0-SNAPSHOT.jar
[INFO]
[INFO] --- maven-install-plugin:2.3.1:install (default-install) @ myHadoop ---
[INFO] Installing D:\workspace\java\myHadoop\target\myHadoop-1.0-SNAPSHOT.jar to C:\Users\Administrator\.m2\repository\org\conan\myhadoop\mr\myHadoop\1.0-SNAPSHOT\myHadoop-1.0-SNAPSHOT.jar
[INFO] Installing D:\workspace\java\myHadoop\pom.xml to C:\Users\Administrator\.m2\repository\org\conan\myhadoop\mr\myHadoop\1.0-SNAPSHOT\myHadoop-1.0-SNAPSHOT.pom
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4.348s
[INFO] Finished at: Sun Sep 29 20:58:43 CST 2013
[INFO] Final Memory: 11M/179M
[INFO] ------------------------------------------------------------------------
```

2). Import the project into Eclipse
We have now created a basic Maven project; next, import it into Eclipse. The Maven plugin from the previous step should already be installed by this point.
3). Add the Hadoop dependency

Here I use Hadoop version 1.0.3. Edit the file pom.xml:
```
~ vi pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.conan.myhadoop.mr</groupId>
    <artifactId>myHadoop</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <name>myHadoop</name>
    <url>http://maven.apache.org</url>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>1.0.3</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.4</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>
```

4). Download the dependencies
Download the dependencies:

```
~ mvn clean install
```

Then refresh the project in Eclipse:
The project's dependencies are loaded automatically under the Maven library path.
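To see exactly which artifacts were resolved (hadoop-core pulls in a fair number of transitive dependencies), the standard Maven dependency plugin can print the resolved tree:

```
~ mvn dependency:tree
```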
5). Download the Hadoop configuration files from the cluster
- core-site.xml
- hdfs-site.xml
- mapred-site.xml
Contents of core-site.xml:
```
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/conan/hadoop/tmp</value>
    </property>
    <property>
        <name>io.sort.mb</name>
        <value>256</value>
    </property>
</configuration>
```

Contents of hdfs-site.xml:
```
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>dfs.data.dir</name>
        <value>/home/conan/hadoop/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>
```

Contents of mapred-site.xml:
```
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>hdfs://master:9001</value>
    </property>
</configuration>
```

Save these three files under the src/main/resources/hadoop directory.
Then delete the auto-generated files App.java and AppTest.java.
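After these steps the project layout should look roughly like this (a sketch reconstructed from the steps above):

```
myHadoop/
├── pom.xml
└── src/
    ├── main/
    │   ├── java/org/conan/myhadoop/mr/
    │   └── resources/hadoop/
    │       ├── core-site.xml
    │       ├── hdfs-site.xml
    │       └── mapred-site.xml
    └── test/
        └── java/org/conan/myhadoop/mr/
```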
6). Configure the local hosts file, adding a domain entry for master

```
~ vi c:/Windows/System32/drivers/etc/hosts

192.168.1.210 master
```
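To verify that the new hosts entry resolves, an optional quick check from the command prompt:

```
~ ping master
```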
5. Developing a MapReduce Program

Let's write a simple MapReduce program that implements the wordcount function.

Create a new Java file: WordCount.java
```java
package org.conan.myhadoop.mr;

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {

    public static class WordCountMapper extends MapReduceBase implements Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            // Split each input line into whitespace-separated tokens and emit (token, 1)
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class WordCountReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            // Sum all counts collected for the same word
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            result.set(sum);
            output.collect(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        String input = "hdfs://192.168.1.210:9000/user/hdfs/o_t_account";
        String output = "hdfs://192.168.1.210:9000/user/hdfs/o_t_account/result";

        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("WordCount");
        conf.addResource("classpath:/hadoop/core-site.xml");
        conf.addResource("classpath:/hadoop/hdfs-site.xml");
        conf.addResource("classpath:/hadoop/mapred-site.xml");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(WordCountMapper.class);
        // The reducer doubles as a combiner, which is safe because summing is associative
        conf.setCombinerClass(WordCountReducer.class);
        conf.setReducerClass(WordCountReducer.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(input));
        FileOutputFormat.setOutputPath(conf, new Path(output));

        JobClient.runJob(conf);
        System.exit(0);
    }
}
```

Run WordCount as a Java application.
The console reports an error (the 警告/嚴重/信息 labels are the WARNING/SEVERE/INFO levels of a Chinese-locale JVM):

```
2013-9-30 19:25:02 org.apache.hadoop.util.NativeCodeLoader
警告: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-9-30 19:25:02 org.apache.hadoop.security.UserGroupInformation doAs
嚴重: PriviledgedActionException as:Administrator cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Administrator\mapred\staging\Administrator1702422322\.staging to 0700
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Administrator\mapred\staging\Administrator1702422322\.staging to 0700
	at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
	at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
	at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
	at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
	at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
	at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261)
	at org.conan.myhadoop.mr.WordCount.main(WordCount.java:78)
```

This error is specific to developing on Windows: it is a file permission problem. The same code runs normally on Linux.
The workaround is to modify the Hadoop source file /hadoop-1.0.3/src/core/org/apache/hadoop/fs/FileUtil.java:

Comment out lines 688-692, then recompile the source and rebuild the hadoop jar.
```
685  private static void checkReturnValue(boolean rv, File p,
686                                        FsPermission permission
687                                        ) throws IOException {
688    /*if (!rv) {
689      throw new IOException("Failed to set permissions of path: " + p +
690                            " to " +
691                            String.format("%04o", permission.toShort()));
692    }*/
693  }
```

I rebuilt hadoop-core-1.0.3.jar this way myself and put it under lib.
We also need to replace the Hadoop library in the local Maven repository:
```
~ cp lib/hadoop-core-1.0.3.jar C:\Users\Administrator\.m2\repository\org\apache\hadoop\hadoop-core\1.0.3\hadoop-core-1.0.3.jar
```
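Copying over the cached artifact works; a more conventional alternative, sketched here with the standard maven-install-plugin and the same coordinates used in pom.xml, is to let Maven install the patched jar itself:

```
~ mvn install:install-file -DgroupId=org.apache.hadoop -DartifactId=hadoop-core -Dversion=1.0.3 -Dpackaging=jar -Dfile=lib/hadoop-core-1.0.3.jar
```

Either way, with the patched jar in place, start the Java application again. This time the console output is: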
```
2013-9-30 19:50:49 org.apache.hadoop.util.NativeCodeLoader
警告: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-9-30 19:50:49 org.apache.hadoop.mapred.JobClient copyAndConfigureFiles
警告: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2013-9-30 19:50:49 org.apache.hadoop.mapred.JobClient copyAndConfigureFiles
警告: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
2013-9-30 19:50:49 org.apache.hadoop.io.compress.snappy.LoadSnappy
警告: Snappy native library not loaded
2013-9-30 19:50:49 org.apache.hadoop.mapred.FileInputFormat listStatus
信息: Total input paths to process : 4
2013-9-30 19:50:50 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
信息: Running job: job_local_0001
2013-9-30 19:50:50 org.apache.hadoop.mapred.Task initialize
信息: Using ResourceCalculatorPlugin : null
2013-9-30 19:50:50 org.apache.hadoop.mapred.MapTask runOldMapper
信息: numReduceTasks: 1
2013-9-30 19:50:50 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
信息: io.sort.mb = 100
2013-9-30 19:50:50 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
信息: data buffer = 79691776/99614720
2013-9-30 19:50:50 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
信息: record buffer = 262144/327680
2013-9-30 19:50:50 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
信息: Starting flush of map output
2013-9-30 19:50:50 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
信息: Finished spill 0
2013-9-30 19:50:50 org.apache.hadoop.mapred.Task done
信息: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
2013-9-30 19:50:51 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
信息: map 0% reduce 0%
2013-9-30 19:50:53 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
信息: hdfs://192.168.1.210:9000/user/hdfs/o_t_account/part-m-00003:0+119
2013-9-30 19:50:53 org.apache.hadoop.mapred.Task sendDone
信息: Task 'attempt_local_0001_m_000000_0' done.
2013-9-30 19:50:53 org.apache.hadoop.mapred.Task initialize
信息: Using ResourceCalculatorPlugin : null
2013-9-30 19:50:53 org.apache.hadoop.mapred.MapTask runOldMapper
信息: numReduceTasks: 1
2013-9-30 19:50:53 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
信息: io.sort.mb = 100
2013-9-30 19:50:53 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
信息: data buffer = 79691776/99614720
2013-9-30 19:50:53 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
信息: record buffer = 262144/327680
2013-9-30 19:50:53 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
信息: Starting flush of map output
2013-9-30 19:50:53 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
信息: Finished spill 0
2013-9-30 19:50:53 org.apache.hadoop.mapred.Task done
信息: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
2013-9-30 19:50:54 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
信息: map 100% reduce 0%
2013-9-30 19:50:56 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
信息: hdfs://192.168.1.210:9000/user/hdfs/o_t_account/part-m-00000:0+113
2013-9-30 19:50:56 org.apache.hadoop.mapred.Task sendDone
信息: Task 'attempt_local_0001_m_000001_0' done.
2013-9-30 19:50:56 org.apache.hadoop.mapred.Task initialize
信息: Using ResourceCalculatorPlugin : null
2013-9-30 19:50:56 org.apache.hadoop.mapred.MapTask runOldMapper
信息: numReduceTasks: 1
2013-9-30 19:50:56 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
信息: io.sort.mb = 100
2013-9-30 19:50:56 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
信息: data buffer = 79691776/99614720
2013-9-30 19:50:56 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
信息: record buffer = 262144/327680
2013-9-30 19:50:56 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
信息: Starting flush of map output
2013-9-30 19:50:56 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
信息: Finished spill 0
2013-9-30 19:50:56 org.apache.hadoop.mapred.Task done
信息: Task:attempt_local_0001_m_000002_0 is done. And is in the process of commiting
2013-9-30 19:50:59 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
信息: hdfs://192.168.1.210:9000/user/hdfs/o_t_account/part-m-00001:0+110
2013-9-30 19:50:59 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
信息: hdfs://192.168.1.210:9000/user/hdfs/o_t_account/part-m-00001:0+110
2013-9-30 19:50:59 org.apache.hadoop.mapred.Task sendDone
信息: Task 'attempt_local_0001_m_000002_0' done.
2013-9-30 19:50:59 org.apache.hadoop.mapred.Task initialize
信息: Using ResourceCalculatorPlugin : null
2013-9-30 19:50:59 org.apache.hadoop.mapred.MapTask runOldMapper
信息: numReduceTasks: 1
2013-9-30 19:50:59 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
信息: io.sort.mb = 100
2013-9-30 19:50:59 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
信息: data buffer = 79691776/99614720
2013-9-30 19:50:59 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
信息: record buffer = 262144/327680
2013-9-30 19:50:59 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
信息: Starting flush of map output
2013-9-30 19:50:59 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
信息: Finished spill 0
2013-9-30 19:50:59 org.apache.hadoop.mapred.Task done
信息: Task:attempt_local_0001_m_000003_0 is done. And is in the process of commiting
2013-9-30 19:51:02 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
信息: hdfs://192.168.1.210:9000/user/hdfs/o_t_account/part-m-00002:0+79
2013-9-30 19:51:02 org.apache.hadoop.mapred.Task sendDone
信息: Task 'attempt_local_0001_m_000003_0' done.
2013-9-30 19:51:02 org.apache.hadoop.mapred.Task initialize
信息: Using ResourceCalculatorPlugin : null
2013-9-30 19:51:02 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
信息:
2013-9-30 19:51:02 org.apache.hadoop.mapred.Merger$MergeQueue merge
信息: Merging 4 sorted segments
2013-9-30 19:51:02 org.apache.hadoop.mapred.Merger$MergeQueue merge
信息: Down to the last merge-pass, with 4 segments left of total size: 442 bytes
2013-9-30 19:51:02 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
信息:
2013-9-30 19:51:02 org.apache.hadoop.mapred.Task done
信息: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
2013-9-30 19:51:02 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
信息:
2013-9-30 19:51:02 org.apache.hadoop.mapred.Task commit
信息: Task attempt_local_0001_r_000000_0 is allowed to commit now
2013-9-30 19:51:02 org.apache.hadoop.mapred.FileOutputCommitter commitTask
信息: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://192.168.1.210:9000/user/hdfs/o_t_account/result
2013-9-30 19:51:05 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
信息: reduce > reduce
2013-9-30 19:51:05 org.apache.hadoop.mapred.Task sendDone
信息: Task 'attempt_local_0001_r_000000_0' done.
2013-9-30 19:51:06 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
信息: map 100% reduce 100%
2013-9-30 19:51:06 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
信息: Job complete: job_local_0001
2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log
信息: Counters: 20
2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log
信息:   File Input Format Counters
2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log
信息:     Bytes Read=421
2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log
信息:   File Output Format Counters
2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log
信息:     Bytes Written=348
2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log
信息:   FileSystemCounters
2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log
信息:     FILE_BYTES_READ=7377
2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log
信息:     HDFS_BYTES_READ=1535
2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log
信息:     FILE_BYTES_WRITTEN=209510
2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log
信息:     HDFS_BYTES_WRITTEN=348
2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log
信息:   Map-Reduce Framework
2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log
信息:     Map output materialized bytes=458
2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log
信息:     Map input records=11
2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log
信息:     Reduce shuffle bytes=0
2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log
信息:     Spilled Records=30
2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log
信息:     Map output bytes=509
2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log
信息:     Total committed heap usage (bytes)=1838546944
2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log
信息:     Map input bytes=421
2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log
信息:     SPLIT_RAW_BYTES=452
2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log
信息:     Combine input records=22
2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log
信息:     Reduce input records=15
2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log
信息:     Reduce input groups=13
2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log
信息:     Combine output records=15
2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log
信息:     Reduce output records=13
2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log
信息:     Map output records=22
```

The wordcount program ran successfully. We can now inspect the output with the hadoop command:
```
~ hadoop fs -ls hdfs://192.168.1.210:9000/user/hdfs/o_t_account/result

Found 2 items
-rw-r--r--   3 Administrator supergroup          0 2013-09-30 19:51 /user/hdfs/o_t_account/result/_SUCCESS
-rw-r--r--   3 Administrator supergroup        348 2013-09-30 19:51 /user/hdfs/o_t_account/result/part-00000

~ hadoop fs -cat hdfs://192.168.1.210:9000/user/hdfs/o_t_account/result/part-00000

1,abc@163.com,2013-04-22        1
10,ade121@sohu.com,2013-04-23   1
11,addde@sohu.com,2013-04-23    1
17:21:24.0      5
2,dedac@163.com,2013-04-22      1
20:21:39.0      6
3,qq8fed@163.com,2013-04-22     1
4,qw1@163.com,2013-04-22        1
5,af3d@163.com,2013-04-22       1
6,ab34@163.com,2013-04-22       1
7,q8d1@gmail.com,2013-04-23     1
8,conan@gmail.com,2013-04-23    1
9,adeg@sohu.com,2013-04-23      1
```

With this, we have completed development on Win7: Maven builds the Hadoop dependency environment, we develop the MapReduce program in Eclipse, and then run it as a Java application. Hadoop automatically packages our MR program into a jar, uploads it to the remote Hadoop environment to run, and returns the logs to the Eclipse console.
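For completeness, the result can also be read back from Java through the HDFS API rather than the hadoop CLI. The following is a hypothetical helper that is not part of the original project; the class name ResultReader and the hard-coded path are only for illustration:

```java
package org.conan.myhadoop.mr;

import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ResultReader {

    public static void main(String[] args) throws Exception {
        // The reducer's output file, as listed by 'hadoop fs -ls' above
        String uri = "hdfs://192.168.1.210:9000/user/hdfs/o_t_account/result/part-00000";

        // Connect to the remote HDFS and stream the file to stdout
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        InputStream in = null;
        try {
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
```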
6. Uploading the Template Project to GitHub
https://github.com/bsspirit/maven_hadoop_template
You can clone this project and use it as a starting point for your own development:

```
~ git clone https://github.com/bsspirit/maven_hadoop_template.git
```

That completes the first step; next we will move on to hands-on MapReduce development.
Please credit the source when reposting:
http://blog.fens.me/hadoop-maven-eclipse/