Pitfalls encountered when deploying a Hadoop cluster offline with Ambari
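1. The remote copy of the HDP components was incomplete, so rpm packages were missing when the clients were installed. Copying the missing packages over by hand fixed it.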
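One way to catch an incomplete copy before the install fails is to compare the rpm file lists of the source and the copied repository. The following is only a sketch: the function name missing_rpms and both directory arguments are hypothetical, and it assumes both trees are reachable from the same machine (for example via a mount).

```bash
#!/bin/bash
# Sketch: print every .rpm found under the source tree that has no
# counterpart at the same relative path under the copied tree.
# missing_rpms and its two arguments are illustrative names, not part of HDP.
missing_rpms() {
  local src="$1" dst="$2"
  (cd "$src" && find . -name '*.rpm' | sort) | while IFS= read -r f; do
    # Report files that exist in the source but not in the copy.
    [ -e "$dst/$f" ] || printf '%s\n' "$f"
  done
}

# Example (paths are placeholders):
# missing_rpms /mnt/hdp-source /var/www/html/hdp
```

An empty result means every rpm in the source tree also exists in the copy; it does not verify file contents.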
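2. Installing HAWQ: startup failed with an error about passwordless ssh to the HAWQ hosts, and the HAWQ master was asked for a password when copying files to the other hosts. There were two causes. First, the permissions for root's passwordless ssh login were wrong; the correct settings are chmod 700 /root/.ssh and chmod 600 /root/.ssh/authorized_keys. Second, the keys for gpadmin had not been exchanged: su gpadmin, create a hawq_host file under /home/gpadmin containing the hostname of each node, and run hawq ssh-exkeys -f hawq_host. The log showed that a hostname could not be reached during the RSA key exchange; after correcting /etc/hosts and resetting the hostname, the exchange succeeded.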
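The permission fix from item 2 can be wrapped in a small helper that also prints the resulting modes for verification. This is a sketch under assumptions: the function name fix_ssh_perms is hypothetical, and stat -c assumes GNU coreutils, as found on CentOS.

```bash
#!/bin/bash
# Sketch: apply the ssh permissions described above and show the result.
# fix_ssh_perms is an illustrative name; $1 is the user's home directory.
fix_ssh_perms() {
  local dir="$1/.ssh"
  chmod 700 "$dir" && chmod 600 "$dir/authorized_keys"
  # Print "<mode> <path>" for both entries (GNU stat syntax).
  stat -c '%a %n' "$dir" "$dir/authorized_keys"
}

# Example: fix_ssh_perms /root
```

Run it on every host that must accept the passwordless login, not only on the master.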
3. How to uninstall services after an installation fails midway
Uninstalling a single service
stop:
curl -s -u admin:admin -H "X-Requested-By: Ambari" -X PUT -d '{"RequestInfo":{"context":"Stop Service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' http://AMBARI-HOST:8080/api/v1/clusters/CLUSTER_NAME/services/SERVICE_NAME
delete:
curl -s -u admin:admin -H "X-Requested-By: Ambari" -X DELETE http://AMBARI-HOST:8080/api/v1/clusters/CLUSTER_NAME/services/SERVICE_NAME
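When several services have to be removed, the two calls above can be wrapped in functions so that the host, credentials, cluster and service names are passed once. This is a sketch only: the function names and the AMBARI_URL and AMBARI_AUTH variables are hypothetical, and the defaults are the same placeholders used in the commands above.

```bash
#!/bin/bash
# Sketch: thin wrappers around the stop and delete requests shown above.
# AMBARI_URL, AMBARI_AUTH and the function names are illustrative.
AMBARI_URL="${AMBARI_URL:-http://AMBARI-HOST:8080}"
AMBARI_AUTH="${AMBARI_AUTH:-admin:admin}"

# Build the service resource URL. $1 = cluster name, $2 = service name.
service_url() {
  printf '%s/api/v1/clusters/%s/services/%s\n' "$AMBARI_URL" "$1" "$2"
}

# Ask Ambari to move the service to the INSTALLED (stopped) state.
stop_service() {
  curl -s -u "$AMBARI_AUTH" -H 'X-Requested-By: Ambari' -X PUT \
    -d '{"RequestInfo":{"context":"Stop Service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
    "$(service_url "$1" "$2")"
}

# Delete the service definition from the cluster.
delete_service() {
  curl -s -u "$AMBARI_AUTH" -H 'X-Requested-By: Ambari' -X DELETE \
    "$(service_url "$1" "$2")"
}

# Example (names are placeholders):
# stop_service CLUSTER_NAME SERVICE_NAME
# delete_service CLUSTER_NAME SERVICE_NAME
```

As far as I know the stop request is handled asynchronously, so the delete may be rejected until the service has actually stopped; waiting and retrying is a reasonable approach.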
Uninstalling the whole cluster (Ambari and Hadoop)
Run the following script:
#!/bin/bash
ambari-server stop
ambari-server reset
ambari-agent stop
service mysqld stop
service postgresql stop
python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py
yum remove ambari\* hadoop hdfs bigtop-jsvc bigtop-tomcat hbase\* hadoop\* hdp-select ranger\* zookeeper\* postgresql-libs postgresql postgresql-server
yum remove mysql mysql-server mysql-libs mysql-connector-java
rm -rf /opt/hadoop
rm -rf /opt/app/hadoop
rm -rf /opt/app/ambari-metrics-collector
rm -rf /opt/kafka-logs
rm -rf /usr/hdp
rm -rf /usr/hadoop
rm -rf /usr/kafka-logs
rm -rf /usr/lib/ambari*
rm -rf /usr/lib/hadoop
rm -rf /usr/lib/nagios
rm -rf /usr/lib/ams-hbase
rm -rf /var/nagios
rm -rf /var/kafka-logs
rm -rf /var/lib/ambari*
rm -rf /var/lib/flume
rm -rf /var/lib/ganglia*
rm -rf /var/lib/hadoop*
rm -rf /var/lib/hdfs
rm -rf /var/lib/hive
rm -rf /var/lib/atlas
rm -rf /var/lib/mysql
rm -rf /var/lib/pgsql
rm -rf /var/run/hadoop /var/run/hbase /var/run/zookeeper /var/run/flume /var/run/webhcat /var/run/hadoop-yarn /var/run/hadoop-mapreduce
rm -rf /var/run/accumulo
rm -rf /var/run/ambari*
rm -rf /var/run/atlas
rm -rf /var/run/nagios
rm -rf /var/run/spark
rm -rf /var/log/hbase /var/log/hive /var/log/zookeeper /var/log/flume /var/log/hadoop-yarn /var/log/hadoop-mapreduce
rm -rf /var/log/accumulo
rm -rf /var/log/ambari*
rm -rf /var/log/atlas
rm -rf /var/log/nagios
rm -rf /var/log/spark
rm -rf /var/log/hadoop
rm -rf /tmp/ambari-qa
rm -rf /etc/ambari*
rm -rf /etc/ams-hbase
rm -rf /etc/flume
rm -rf /etc/ganglia
rm -rf /etc/hadoop*
rm -rf /etc/hbase
rm -rf /etc/hive*
rm -rf /etc/nagios
rm -rf /etc/phoenix
rm -rf /etc/pig
rm -rf /etc/tez
rm -rf /etc/zookeeper
rm -rf /etc/accumulo
rm -rf /etc/atlas
rm -rf /etc/spark
rm -rf /etc/mahout
rm -rf /home/accumulo /home/ams /home/atlas /home/mahout /home/nagios /home/spark
rm -rf /etc/yum.repos.d/ambari.repo /etc/yum.repos.d/HDP-2.3.0.0.repo /etc/yum.repos.d/HDP-UTILS.repo /etc/yum.repos.d/HDP.repo
yum clean all
ps -elf | grep java
In addition: the leftover service users still have to be removed with userdel.
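The userdel step could look like the sketch below. The user list is an assumption derived from the /home directories the script removes (accumulo, ams, atlas, mahout, nagios, spark); the actual accounts on a host may differ. The function name remove_users and the DRY_RUN variable are hypothetical, and the default is a dry run that only prints what would be removed.

```bash
#!/bin/bash
# Sketch: remove leftover service accounts if they exist.
# remove_users and DRY_RUN are illustrative names; DRY_RUN defaults to 1 (print only).
remove_users() {
  local u
  for u in "$@"; do
    # Skip names that are not present on this host.
    if id "$u" >/dev/null 2>&1; then
      if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would remove: $u"
      else
        userdel -r "$u"
      fi
    fi
  done
}

# Example (user names are assumptions; check /etc/passwd first):
# remove_users accumulo ams atlas mahout nagios spark
# DRY_RUN=0 remove_users accumulo ams atlas mahout nagios spark
```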
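4. After uninstalling all services, yum stopped working. The cause turned out to be Python components that were removed during the uninstall.
Run whereis python to find the remaining interpreter, then open /usr/bin/yum (vi /usr/bin/yum) and change the Python path on its first line to match.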
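The edit to /usr/bin/yum can also be done non-interactively. The sketch below is an assumption-laden illustration: the function name fix_shebang is hypothetical, sed -i assumes GNU sed, and the interpreter path in the example must be replaced by whatever whereis python reports on the host.

```bash
#!/bin/bash
# Sketch: rewrite the interpreter line of a script, keeping a backup copy.
# fix_shebang is an illustrative name. $1 = script file, $2 = interpreter path.
fix_shebang() {
  cp -p "$1" "$1.bak"
  # Replace only line 1, and only if it is a "#!" line.
  sed -i '1s|^#!.*|#!'"$2"'|' "$1"
}

# Example (the interpreter path is a placeholder):
# fix_shebang /usr/bin/yum /usr/bin/python2.6
```

The backup at /usr/bin/yum.bak makes it easy to restore the original line if the chosen interpreter turns out to be wrong.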
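5. Installing the metrics-monitor client failed with the error: require python-2.6.6-64 while installed python-2.6.6-66
Copy the matching python, python-devel and python-libs rpms out of the Packages directory of the mounted ISO image, remove the installed python-2.6.6-66 with rpm -e --nodeps python, then install python 2.6.6-64. The error goes away.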
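6. The ams service could not be stopped: its process could not be killed and userdel could not delete the user. Rebooting the machine resolved it.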
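7. The DataNode and ZooKeeper died shortly after starting. The logs showed the error Address already in use. Check the log of the affected component under /var/log/..... to find the port, locate the process holding it with netstat -anp | grep port_name, kill that process, and restart the service.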
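The netstat lookup in item 7 can be narrowed to the PID of the listener itself, which avoids matching unrelated lines that merely contain the port number. This is a sketch: the function name pids_on_port is hypothetical, and it assumes the usual Linux netstat -anp column layout (local address in column 4, state in column 6, PID/program in column 7).

```bash
#!/bin/bash
# Sketch: read "netstat -anp" output on stdin and print the PID of every
# process listening on the given TCP port.
# pids_on_port is an illustrative name. $1 = port number.
pids_on_port() {
  awk -v port="$1" '
    $6 == "LISTEN" && $4 ~ (":" port "$") {
      split($7, a, "/")   # column 7 looks like "1234/java"
      print a[1]
    }'
}

# Example (2181 is the default ZooKeeper client port; run as root to see PIDs):
# netstat -anp | pids_on_port 2181
# kill "$(netstat -anp | pids_on_port 2181)"
```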
8. The HAWQ master failed to start. After running sysctl -p it started normally.