Hadoop Detailed Configuration
Contents
Chapter 1 Overview
1.1 What is Hadoop?
1.2 Why choose the CDH release?
1.3 Cluster environment
1.4 Network diagram
Chapter 2 Installing the Hadoop environment
2.1 Installation packages
2.2 Default user and group: root:root
2.3 Uninstall the bundled JDK
2.4 Install and configure the JDK
2.5 Configure /etc/hosts
2.6 Configure passwordless SSH login
2.7 Handle the firewall
2.8 Upload hadoop-2.0.0-cdh4.2.0.tar.gz to /opt and unpack it
2.9 Edit core-site.xml
2.10 Edit hdfs-site.xml
2.11 Edit the slaves file
2.12 Edit mapred-site.xml
2.13 Edit yarn-site.xml
2.14 Edit .bashrc
2.15 Copy /opt/hadoop from master01 to the other machines
2.16 Format the NameNode before the first start
2.17 Start HDFS on master01
2.18 Start MapReduce and the history server on master01
2.19 View MapReduce on master01
2.20 View the slave01 and slave02 nodes
2.21 Check the cluster processes on each machine
2.22 Stop the services
Chapter 3 ZooKeeper installation
3.1 Installation package
3.2 Unpack
3.3 Edit zoo.cfg
3.4 Set environment variables
3.5 Create the data directory and the myid file
3.6 Copy the files to the other machines
3.7 Start
3.8 Verify
3.9 Stop the service
3.10 References
Chapter 4 Hive installation
4.1 Installation packages
4.2 Prepare the machine
4.3 MySQL access
4.4 Configure hive-site.xml to store the metastore in MySQL
4.5 Unpack mysql-connector-java-5.1.18.tar.gz
4.6 MySQL operations
4.7 View the logs
4.8 Loading local data into Hive
Chapter 5 Hive + Thrift + PHP integration
5.1 Installation package
5.2 Write the code
5.3 Start hiveserver
5.4 Check the default port 10000
5.5 Test
5.6 Errors and fixes
Chapter 6 Sqoop installation and use
6.1 Installation package
6.2 Prerequisites
6.3 Install
6.4 Place the MySQL driver
6.5 Edit the configure-sqoop file
6.6 Add the path to PATH
6.7 Usage tests
6.8 Errors and fixes
6.9 References
Chapter 1 Overview
1.1 What is Hadoop?
Hadoop is a distributed computing framework developed under the Apache Foundation. It lets users write distributed programs without understanding the low-level details of distribution, harnessing the power of a cluster for high-speed computation and storage. Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant, designed to run on low-cost hardware, and provides high-throughput access to application data, which makes it well suited to applications with very large data sets. HDFS relaxes some POSIX requirements to allow streaming access to file system data.
1.2 Why choose the CDH release?
- CDH is based on stable Apache Hadoop releases with the latest bug-fix and feature patches applied. Cloudera ships quarterly update versions and annual releases, moving faster than upstream Apache, and in practice CDH has proven very stable without introducing new problems.
- Cloudera's installation and upgrade documentation is thorough, saving time otherwise spent searching.
- CDH can be installed four ways: Yum/Apt packages, tarballs, RPM packages, or Cloudera Manager.
- You get the latest features and bug fixes, and installation and maintenance are straightforward, saving operations time.
1.3 Cluster environment
[root@master01 ~]# lsb_release -a
LSB Version:    :base-4.0-ia32:base-4.0-noarch:core-4.0-ia32:core-4.0-noarch:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-ia32:printing-4.0-noarch
Distributor ID: CentOS
Description:    CentOS release 6.4 (Final)
Release:        6.4
Codename:       Final
1.4 Network diagram
(network topology diagram omitted)
Chapter 2 Installing the Hadoop environment
2.1 Installation packages
jdk-7-linux-i586.rpm   [77.2M]
hadoop-2.0.0-cdh4.2.0  [129M]
Download URLs:
http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
http://archive.cloudera.com/cdh4/cdh/4/
2.2 Default user and group: root:root
2.3 Uninstall the bundled JDK
[root@master01 local]# rpm -qa | grep jdk
java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.i686
yum -y remove java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.i686
yum -y remove java-1.6.0-openjdk-1.6.0.0-1.50.1.11.5.el6_3.i686
2.4 Install and configure the JDK
[root@master01 local]# rpm -ivh jdk-7-linux-i586.rpm
Preparing...                ########################################### [100%]
   1:jdk                    ########################################### [100%]
Note:
The JAVA_HOME environment settings are listed in section 2.14 below and go in the ~/.bashrc file.
Also note: production machines are usually 64-bit; download and install the corresponding 64-bit JDK package.
2.5 Configure /etc/hosts
vi /etc/hosts
192.168.2.18   master01
192.168.2.19   master02
192.168.2.163  slave01
192.168.2.38   slave02
192.168.2.212  slave03
Note: the other machines must be updated as well, e.g.:
rsync -vzrtopgu --progress /etc/hosts 192.168.2.38:/etc/hosts
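To push the file to every node in one step, a small loop helps (a sketch; the host list follows this cluster's /etc/hosts):
for h in master02 slave01 slave02 slave03; do
    rsync -vzrtopgu --progress /etc/hosts ${h}:/etc/hosts    # sync hosts file to each node
done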
2.6 Configure passwordless SSH login
ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave01
ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave02
Note:
master01 itself needs this as well:
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
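A quick way to confirm the passwordless login works before continuing (hostnames taken from section 2.5):
for h in master01 slave01 slave02; do
    ssh $h hostname    # each hostname should print without a password prompt
done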
2.7 Handle the firewall
service iptables stop
Explanation:
If you would rather not disable the firewall, configure the iptables rules on slave01, slave02 and the other machines so that the datanodes can reach the namenode and every node can reach the others:
vi /etc/sysconfig/iptables
Add:
-I INPUT -s 192.168.2.18 -j ACCEPT
-I INPUT -s 192.168.2.38 -j ACCEPT
-I INPUT -s 192.168.2.87 -j ACCEPT
Also open ports 8088 and 50070 on master01 so the namenode and MapReduce web UIs are reachable, as sketched below.
(Figures 1 and 2: firewall configuration screenshots omitted)
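A sketch of the two rules for /etc/sysconfig/iptables on master01 (an assumption based on the ports named above; tighten source addresses to your own policy):
-A INPUT -p tcp --dport 50070 -j ACCEPT    # NameNode web UI
-A INPUT -p tcp --dport 8088 -j ACCEPT     # ResourceManager (MapReduce) web UI
Then reload with: service iptables restart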
2.8 Upload hadoop-2.0.0-cdh4.2.0.tar.gz to /opt and unpack it
tar xzvf hadoop-2.0.0-cdh4.2.0.tar.gz
mv hadoop-2.0.0-cdh4.2.0 hadoop
cd hadoop/etc/hadoop/
Then edit hadoop-env.sh: it contains a commented-out JAVA_HOME line; uncomment it and point it at your JDK installation directory, e.g.:
export JAVA_HOME=/usr/java/jdk1.7.0
2.9 Edit core-site.xml
vi core-site.xml
<configuration>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master01</value>
</property>
<property>
  <name>fs.trash.interval</name>
  <value>10080</value>
</property>
<property>
  <name>fs.trash.checkpoint.interval</name>
  <value>10080</value>
</property>
</configuration>
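fs.trash.interval is in minutes, so 10080 keeps deleted files for 7 days. With it enabled, hadoop fs -rm moves files into a per-user trash directory instead of deleting them outright; for example (paths are illustrative):
hadoop fs -rm /tmp/demo.txt                   # moved into the trash rather than deleted
hadoop fs -ls /user/root/.Trash/Current/tmp   # the file can be recovered from here
(use hadoop fs -rm -skipTrash <path> to bypass the trash entirely)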
2.10 Edit hdfs-site.xml
vi hdfs-site.xml
<configuration>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/data/hadoop-${user.name}</value>
</property>
<property>
  <name>dfs.namenode.http-address</name>
  <value>master01:50070</value>
</property>
<property>
  <name>dfs.secondary.http.address</name>
  <value>master02:50090</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
</configuration>
2.11 Edit the slaves file
vi slaves
slave01
slave02
2.12 Edit mapred-site.xml
cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
<configuration>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>master01:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>master01:19888</value>
</property>
</configuration>
2.13 Edit yarn-site.xml
vi yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master01:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>master01:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>master01:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>master01:8033</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>master01:8088</value>
</property>
<property>
  <description>Classpath for typical applications.</description>
  <name>yarn.application.classpath</name>
  <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,
    $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
    $YARN_HOME/share/hadoop/yarn/*,$YARN_HOME/share/hadoop/yarn/lib/*,
    $YARN_HOME/share/hadoop/mapreduce/*,$YARN_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce.shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/opt/data/yarn/local</value>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/opt/data/yarn/logs</value>
</property>
<property>
  <description>Where to aggregate logs</description>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/opt/data/yarn/logs</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.staging-dir</name>
  <value>/user</value>
</property>
</configuration>
2.14 Edit .bashrc
cd ~
vi .bashrc
#export LANG=zh_CN.utf8
export JAVA_HOME=/usr/java/jdk1.7.0
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=./:$JAVA_HOME/lib:$JRE_HOME/lib:$JRE_HOME/lib/tools.jar
export HADOOP_HOME=/opt/hadoop
export HIVE_HOME=/opt/hive
export HBASE_HOME=/opt/hbase
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin
source .bashrc
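After sourcing, a quick sanity check (expected versions follow the packages in 2.1; note that only sbin/, not bin/, is on the PATH above, so the hadoop command needs its full path):
java -version                      # should report 1.7.0
/opt/hadoop/bin/hadoop version     # should report 2.0.0-cdh4.2.0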
2.15 Copy /opt/hadoop from master01 to the other machines
rsync -vzrtopgu --progress hadoop slave01:/opt/
rsync -vzrtopgu --progress hadoop slave02:/opt/
or
rsync -vzrtopgu --progress hadoop 192.168.2.38:/opt/
rsync -vzrtopgu --progress hadoop 192.168.2.163:/opt/
rsync option reference:
-v, --verbose    verbose output
-z, --compress   compress file data during transfer
-r, --recursive  recurse into directories
-t, --times      preserve modification times
-o, --owner      preserve owner
-p, --perms      preserve permissions
-g, --group      preserve group
-u, --update     skip files that are newer on the receiver (do not overwrite newer files)
2.16 Format the NameNode before the first start
/opt/hadoop/bin/hadoop namenode -format
Note:
Do this only once. Formatting wipes the HDFS metadata, so repeat it only if a configuration change requires a fresh file system.
2.17 Start HDFS on master01
/opt/hadoop/sbin/start-dfs.sh
2.18 Start MapReduce and the history server on master01
/opt/hadoop/sbin/start-yarn.sh
/opt/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
2.19 View MapReduce on master01
http://192.168.2.18:8088/cluster
2.20 View the slave01 and slave02 nodes
http://192.168.2.163:8042/node
2.21 Check the cluster processes on each machine
[root@master01 ~]# jps
5389 NameNode
5980 Jps
5710 ResourceManager
7032 JobHistoryServer
[root@slave01 ~]# jps
3187 Jps
3124 SecondaryNameNode
[root@slave02 ~]# jps
3187 Jps
3124 DataNode
5711 NodeManager
2.22 Stop the services
/opt/hadoop/sbin/stop-all.sh
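stop-all.sh is deprecated in Hadoop 2; stopping the services one by one, in the reverse of the start order, does the same job:
/opt/hadoop/sbin/mr-jobhistory-daemon.sh stop historyserver
/opt/hadoop/sbin/stop-yarn.sh
/opt/hadoop/sbin/stop-dfs.sh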
Chapter 3 ZooKeeper installation
3.1 Installation package
zookeeper-3.4.5-cdh4.2.0.tar.gz
3.2 Unpack
tar xzvf zookeeper-3.4.5-cdh4.2.0.tar.gz
mv zookeeper-3.4.5-cdh4.2.0 zookeeper
3.3 Edit zoo.cfg
cd conf/
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/opt/zookeeper/data
#dataLogDir=/opt/zookeeper/log
# the port at which the clients will connect
clientPort=2181
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=master01:2888:3888
server.2=master02:2888:3888
server.3=slave01:2888:3888
server.4=slave02:2888:3888
3.4 Set environment variables
vi ~/.bashrc
export ZOOKEEPER_HOME=/opt/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
3.5 Create the data directory and the myid file
mkdir /opt/zookeeper/data
cd /opt/zookeeper/data
touch myid
vi myid
Write the digit 1 on the first machine, 2 on the second, and so on (a scripted version follows section 3.6).
3.6 Copy the files to the other machines
rsync -vzrtopgu --progress zookeeper master02:/opt/
rsync -vzrtopgu --progress zookeeper slave01:/opt/
rsync -vzrtopgu --progress zookeeper slave02:/opt/
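Note that the copy above also replicates master01's myid (containing 1) to every machine, so the ids on the other nodes must be corrected to match the server.N lines in zoo.cfg. A scripted sketch (host-to-id mapping taken from section 3.3):
echo 1 > /opt/zookeeper/data/myid                  # on master01 itself
ssh master02 'echo 2 > /opt/zookeeper/data/myid'
ssh slave01  'echo 3 > /opt/zookeeper/data/myid'
ssh slave02  'echo 4 > /opt/zookeeper/data/myid'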
3.7 Start
sh /opt/zookeeper/bin/zkServer.sh start
[root@master01 zookeeper]# jps
3459 JobHistoryServer
6259 Jps
2906 NameNode
3171 ResourceManager
6075 QuorumPeerMain
3.8 Verify
/opt/zookeeper/bin/zkCli.sh -server master01:2181
or
sh /opt/zookeeper/bin/zkServer.sh status
3.9 Stop the service
sh /opt/zookeeper/bin/zkServer.sh stop
3.10 References
http://archive.cloudera.com/cdh4/cdh/4/zookeeper-3.4.5-cdh4.2.0/
Chapter 4 Hive installation
4.1 Installation packages
hive-0.10.0-cdh4.2.0   [43.2M]
mysql-connector-java-5.1.18.tar.gz   [3.65M]
4.2 Prepare the machine
slave03 hosts hive + thrift + sqoop and is dedicated to data analysis.
4.3 MySQL access
Before integrating with MySQL, make sure the machines involved can reach the MySQL server:
GRANT select, insert, update, delete ON *.* TO 'hadoop'@'slave01' IDENTIFIED BY 'hadoop';
(issue the same GRANT for each host that needs access)
flush privileges;
To inspect or undo grants:
show grants for 'hive'@'slave03';
revoke all on *.* from 'hadoop'@'slave01';
drop user 'hive'@'slave03';
Note:
In this test environment slave03 doubles as the MySQL server; in production, use a dedicated MySQL machine.
4.4 Configure hive-site.xml to store the metastore in MySQL
cd /opt/hive
vi hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://slave03:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hadoop</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hadoop</value>
  <description>password to use against metastore database</description>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>master01:8031</value>
</property>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/opt/data/warehouse-${user.name}</value>
  <description>location of default database for the warehouse</description>
</property>
<property>
  <name>hive.exec.scratchdir</name>
  <value>/opt/data/hive-${user.name}</value>
  <description>Scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>/opt/data/querylog-${user.name}</value>
  <description>Location of Hive run time structured log file</description>
</property>
<property>
  <name>hive.support.concurrency</name>
  <description>Enable Hive's Table Lock Manager Service</description>
  <value>false</value>
</property>
<property>
  <name>hive.hwi.listen.host</name>
  <value>master01</value>
  <description>This is the host address the Hive Web Interface will listen on</description>
</property>
<property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
  <description>This is the port the Hive Web Interface will listen on</description>
</property>
<property>
  <name>hive.hwi.war.file</name>
  <value>lib/hive-hwi-0.10.0-cdh4.2.0.war</value>
  <description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>
</configuration>
4.5 Unpack mysql-connector-java-5.1.18.tar.gz
tar xzvf mysql-connector-java-5.1.18.tar.gz
mv mysql-connector-java-5.1.18/mysql-connector-java-5.1.18-bin.jar /opt/hive/lib
4.6 MySQL operations
create database hive;
alter database hive character set latin1;
Note:
Without the latin1 character set, Hive's metastore schema creation fails with:
Specified key was too long; max key length is 767 bytes
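Once Hive first connects (for example after the CREATE TABLE in 4.8), the metastore schema should appear in MySQL. A quick check, using the credentials from hive-site.xml:
mysql -h slave03 -u hadoop -phadoop -e 'USE hive; SHOW TABLES;'
Expect metastore tables such as DBS, TBLS and COLUMNS_V2.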
4.7 View the logs
tail /tmp/root/hive.log
4.8 Loading local data into Hive
1) CREATE TABLE mytest2(num INT, name STRING) COMMENT 'only a test' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
2) LOAD DATA LOCAL INPATH '/var/22.txt' INTO TABLE mytest2;
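For an end-to-end test, the input just needs to be tab-separated text matching the two columns; the lines below reproduce the /var/22.txt sample shown in the next chapter:
printf '1\tjj\n2\tkk\n' > /var/22.txt    # two tab-separated rows
hive -e "SELECT * FROM mytest2;"         # after the LOAD DATA, both rows should come back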
Chapter 5 Hive + Thrift + PHP integration
5.1 Installation package
Thrift.zip   [71.7K]   Download URL: http://download.csdn.net/detail/jiedushi/3409880
PHP installation is omitted here.
5.2 Write the code
vi test.php
<?php
    $GLOBALS['THRIFT_ROOT'] = '/home/wwwroot/Thrift/';
    require_once $GLOBALS['THRIFT_ROOT'] . 'packages/hive_service/ThriftHive.php';
    require_once $GLOBALS['THRIFT_ROOT'] . 'transport/TSocket.php';
    require_once $GLOBALS['THRIFT_ROOT'] . 'protocol/TBinaryProtocol.php';

    $transport = new TSocket('slave03', 10000);
    $protocol = new TBinaryProtocol($transport);
    $client = new ThriftHiveClient($protocol);
    $transport->open();

    #$client->execute('add jar /opt/hive/lib/hive-contrib-0.10.0-cdh4.2.0.jar');
    $client->execute("LOAD DATA LOCAL INPATH '/var/22.txt' INTO TABLE mytest2");
    $client->execute("SELECT COUNT(1) FROM mytest2");
    var_dump($client->fetchAll());
    $transport->close();
?>
Note:
/var/22.txt contains:
1       jj
2       kk
matching the LOAD DATA example in section 4.8.
5.3 Start hiveserver
/opt/hive/bin/hive --service hiveserver >/dev/null 2>/dev/null &
5.4 Check the default port 10000
netstat -lntp | grep 10000
5.5 Test
php test.php
5.6 Errors and fixes
- Warning: stream_set_timeout(): supplied argument is not a valid stream resource in /home/wwwroot/Thrift/transport/TSocket.php on line 213
Fix by adjusting disable_functions in php.ini:
disable_functions = passthru,exec,system,chroot,scandir,chgrp,chown,shell_exec,proc_get_status,ini_alter,ini_restore,dl,openlog,syslog,readlink,symlink,popen
Chapter 6 Sqoop installation and use
6.1 Installation package
sqoop-1.4.2-cdh4.2.0.tar.gz   [6M]
6.2 Prerequisites
Hadoop is configured as described in Chapter 2, with HADOOP_HOME already set.
6.3 Install
cd /opt/
tar xzvf sqoop-1.4.2-cdh4.2.0.tar.gz
mv sqoop-1.4.2-cdh4.2.0 sqoop
6.4 Place the MySQL driver
Copy mysql-connector-java-5.1.18-bin.jar into /opt/sqoop/lib.
6.5 Edit the configure-sqoop file
vi /opt/sqoop/bin/configure-sqoop
Since HBase is not installed, comment out:
#if [ ! -d "${HBASE_HOME}" ]; then
#  echo "Warning: $HBASE_HOME does not exist! HBase imports will fail."
#  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
#fi
6.6 Add the path to PATH
vi ~/.bashrc
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin:$ANT_HOME/bin:/opt/sqoop/bin
6.7 Usage tests
- List all databases on the MySQL server:
sqoop list-databases --connect jdbc:mysql://slave03:3306/ --username hadoop --password hadoop
- List tables:
sqoop list-tables --connect jdbc:mysql://slave03/ggg --username hadoop --password hadoop
- Copy a relational table's structure into Hive:
sqoop create-hive-table --connect jdbc:mysql://master01:3306/ggg --table hheccc_area --username hadoop --password hadoop --hive-table ggg_hheccc_area
- Import relational data into Hive:
sqoop import --connect jdbc:mysql://slave03/ggg --username hadoop --password hadoop --table sp_log_fee --hive-import --hive-table hive_log_fee --split-by id -m 4
Reference examples:
A plain import:
import \
       --append \
       --connect $DS_BJ_HOTBACKUP_URL \
       --username $DS_BJ_HOTBACKUP_USER \
       --password $DS_BJ_HOTBACKUP_PWD \
       --table 'seven_book_sync' \
       --where "create_date >= '${par_31days}' and create_date < '${end_date}'" \
       --hive-import \
       --hive-drop-import-delims \
       --hive-table ${hive_table} \
       --m 1
(--hive-table accepts dotted schema.table names; using a timestamp as the incremental condition works best.)
A parallel import:
sqoop import --append --connect $CONNECTURL --username $ORACLENAME --password $ORACLEPASSWORD --target-dir $hdfsPath --m 12 --split-by CLIENTIP --table $oralceTableName --columns $columns --fields-terminated-by '\001' --where "data_desc='2011-02-26'"
An incremental import:
sqoop import --connect jdbc:mysql://master01:3306/ggg --username hadoop --password hadoop --table hheccc_area --columns "id,name,reid,disorder" --direct --hive-import --hive-table hheccc_area --incremental append --check-column id --last-value 0
sqoop job --exec area_import
The reference commands above were collected from the web; in my testing they did not work, and they are kept for reference only.
- Export a Hive table's data to MySQL:
sqoop export --connect jdbc:mysql://master01:3306/ggg --username hadoop --password hadoop --table mytest2 --export-dir /opt/data/warehouse-root/ggg_hheccc_area
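One caveat on the export: tables written by Hive with default settings separate fields with \001 rather than tabs, in which case sqoop must be told the input delimiter; a sketch (not verified on this cluster):
sqoop export --connect jdbc:mysql://master01:3306/ggg --username hadoop --password hadoop --table mytest2 --export-dir /opt/data/warehouse-root/ggg_hheccc_area --input-fields-terminated-by '\001'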
Note:
Partitioned tables are stored under paths like /user/hive/warehouse/uv/dt=2011-08-03.
6.8 Errors and fixes
- Encountered IOException running import job: org.apache.hadoop.fs.FileAlreadyExistsException: Output directory hdfs://master01/user/root/hheccc_area already exists
Fix: remove the stale output directory:
/opt/hadoop/bin/hadoop fs -rm -r /user/root/hheccc_area
6.9 References
http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html
http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html