How to recover a failed segment in Greenplum
Reposted from li0924's blog, "How to recover a failed segment in Greenplum".
【Preface】
The segment probing and failover mechanism
The GP Master first probes the Primary's status; if the Primary is unreachable, it then probes the Mirror. There are four possible Primary/Mirror state combinations:
1. Primary up, Mirror up. The GP Master probes the Primary successfully, returns directly, and moves on to probe the next segment;
2. Primary up, Mirror down. After probing the Primary successfully, the GP Master learns from the status the Primary reports that the Mirror is down (when the Mirror dies, the Primary detects it and puts itself into ChangeTracking mode). The Master updates its metadata and moves on to the next segment;
3. Primary down, Mirror up. After the probe of the Primary fails, the GP Master probes the Mirror and finds it alive. It updates the Master's metadata and has the Mirror take over the Primary role (failover), then moves on to the next segment;
4. Primary down, Mirror down. After the probe of the Primary fails, the GP Master probes the Mirror, which is down as well. The Master retries up to the maximum retry count, then ends probing of this segment without updating the Master's metadata, and moves on to the next segment.
Cases 2-4 above require running gprecoverseg to recover the affected segment.
A segment instance that has failed is simply skipped and ignored at startup:
[gpadmin@mdw ~]$ gpstart
20160718:18:43:27:002949 gpstart:mdw:gpadmin-[INFO]:-Starting gpstart with args:
20160718:18:43:27:002949 gpstart:mdw:gpadmin-[INFO]:-Gathering information and validating the environment...
20160718:18:43:27:002949 gpstart:mdw:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.8.1 build 1'
20160718:18:43:28:002949 gpstart:mdw:gpadmin-[INFO]:-Greenplum Catalog Version: '201310150'
20160718:18:43:28:002949 gpstart:mdw:gpadmin-[INFO]:-Starting Master instance in admin mode
20160718:18:43:30:002949 gpstart:mdw:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20160718:18:43:30:002949 gpstart:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20160718:18:43:30:002949 gpstart:mdw:gpadmin-[INFO]:-Setting new master era
20160718:18:43:30:002949 gpstart:mdw:gpadmin-[INFO]:-Master Started...
20160718:18:43:30:002949 gpstart:mdw:gpadmin-[INFO]:-Shutting down master
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw2 directory /home/gpadmin/gpdata/gpdatam/gpseg0 <<<<<
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:---------------------------
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:-Master instance parameters
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:---------------------------
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:-Database         = template1
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:-Master Port      = 1921
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:-Master directory = /home/gpadmin/gpdata/pgmaster/gpseg-1
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:-Timeout          = 600 seconds
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:-Master standby   = Off
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:---------------------------------------
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:-Segment instances that will be started
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:---------------------------------------
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:-   Host   Datadir                               Port    Role
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:-   sdw1   /home/gpadmin/gpdata/gpdatap/gpseg0   40000   Primary
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:-   sdw2   /home/gpadmin/gpdata/gpdatap/gpseg1   40000   Primary
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:-   sdw1   /home/gpadmin/gpdata/gpdatam/gpseg1   50000   Mirror

Continue with Greenplum instance startup Yy|Nn (default=N):
> y
20160718:18:43:34:002949 gpstart:mdw:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
...........
20160718:18:43:45:002949 gpstart:mdw:gpadmin-[INFO]:-Process results...
20160718:18:43:45:002949 gpstart:mdw:gpadmin-[INFO]:-----------------------------------------------------
20160718:18:43:45:002949 gpstart:mdw:gpadmin-[INFO]:-   Successful segment starts                                            = 3
20160718:18:43:45:002949 gpstart:mdw:gpadmin-[INFO]:-   Failed segment starts                                                = 0
20160718:18:43:45:002949 gpstart:mdw:gpadmin-[WARNING]:-Skipped segment starts (segments are marked down in configuration)   = 1   <<<<<<<<
20160718:18:43:45:002949 gpstart:mdw:gpadmin-[INFO]:-----------------------------------------------------
20160718:18:43:45:002949 gpstart:mdw:gpadmin-[INFO]:-
20160718:18:43:46:002949 gpstart:mdw:gpadmin-[INFO]:-Successfully started 3 of 3 segment instances, skipped 1 other segments
20160718:18:43:46:002949 gpstart:mdw:gpadmin-[INFO]:-----------------------------------------------------
20160718:18:43:46:002949 gpstart:mdw:gpadmin-[WARNING]:-****************************************************************************
20160718:18:43:46:002949 gpstart:mdw:gpadmin-[WARNING]:-There are 1 segment(s) marked down in the database
20160718:18:43:46:002949 gpstart:mdw:gpadmin-[WARNING]:-To recover from this current state, review usage of the gprecoverseg
20160718:18:43:46:002949 gpstart:mdw:gpadmin-[WARNING]:-management utility which will recover failed segment instance databases.
20160718:18:43:46:002949 gpstart:mdw:gpadmin-[WARNING]:-****************************************************************************
20160718:18:43:46:002949 gpstart:mdw:gpadmin-[INFO]:-Starting Master instance mdw directory /home/gpadmin/gpdata/pgmaster/gpseg-1
20160718:18:43:48:002949 gpstart:mdw:gpadmin-[INFO]:-Command pg_ctl reports Master mdw instance active
20160718:18:43:49:002949 gpstart:mdw:gpadmin-[INFO]:-No standby master configured. skipping...
20160718:18:43:49:002949 gpstart:mdw:gpadmin-[WARNING]:-Number of segments not attempted to start: 1
20160718:18:43:49:002949 gpstart:mdw:gpadmin-[INFO]:-Check status of database with gpstate utility

Check the startup status of the database's mirror segments with gpstate:
[gpadmin@mdw ~]$ gpstate -m
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args: -m
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.8.1 build 1'
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.8.1 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Apr 20 2016 08:08:56'
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[INFO]:--Current GPDB mirror list and status
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[INFO]:--Type = Spread
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[INFO]:-   Mirror   Datadir                               Port    Status    Data Status
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[WARNING]:-sdw2   /home/gpadmin/gpdata/gpdatam/gpseg0   50000   Failed                   <<<<<<<<
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[INFO]:-   sdw1   /home/gpadmin/gpdata/gpdatam/gpseg1   50000   Passive   Synchronized
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[WARNING]:-1 segment(s) configured as mirror(s) have failed

The warning line "[WARNING]:-sdw2 /home/gpadmin/gpdata/gpdatam/gpseg0 50000 Failed" shows at a glance which mirror has failed.
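Besides gpstate, the same information can be read straight from the master catalog table gp_segment_configuration, where a down segment has status 'd'. A minimal sketch (the command is echoed here instead of executed so it is safe to read anywhere; on the master, run the psql call directly as gpadmin):

```shell
#!/bin/sh
# List segment instances currently marked down ('d') in the master catalog.
QUERY="SELECT dbid, content, role, preferred_role, status
       FROM gp_segment_configuration
       WHERE status = 'd';"
# Echoed rather than executed so this sketch needs no live cluster:
echo psql -d template1 -c "$QUERY"
```

In the situation from the logs above, running the query for real would return one row for the failed mirror on sdw2.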
How do we recover this mirror segment? A failed primary segment is recovered the same way.
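In outline, recovery is a short command sequence, sketched below. The commands are echoed rather than executed so the sketch stays inert outside a real cluster; on the master, run them directly as gpadmin (gprecoverseg and gpstate are the real utilities, but the exact output depends on your cluster):

```shell
#!/bin/sh
# Sketch of the standard recovery flow for a segment marked down.
run() { echo "$@"; }   # print instead of execute, to keep the sketch inert

run gprecoverseg -a    # incremental recovery of all down segments (-a: no prompt)
run gpstate -m         # repeat until Data Status shows Synchronized
run gprecoverseg -r    # then swap segments back to their preferred roles
```

The middle step matters: rebalancing with -r should only be done after the recovered segments have finished resynchronizing.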
Do you also want to swap the primary and mirror roles back? At this point the mirror and primary are the reverse of their preferred roles.
If you want to swap them back, use the command below; it stops the affected segments while it does so.
用于修復Segment的是gprecoverseg。使用方式比較簡單,有限的幾個主要參數(shù)如下:
-i :主要參數(shù),用于指定一個配置文件,該配置文件描述了需要修復的Segment和修復后的目的位置。
-F :可選項,指定后,gprecoverseg會將”-i”中指定的或標記”d”的實例刪除,并從活著的Mirror復制一個完整一份到目標位置。
-r :當FTS發(fā)現(xiàn)有Primary宕機并進行主備切換,在gprecoverseg修復后,擔當Primary的Mirror角色并不會立即切換回來,就會導致部分主機上活躍的Segment過多從而引起性能瓶頸。因此需要恢復Segment原先的角色,稱為re-balance。
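For "-i", gprecoverseg can generate a template configuration file with "gprecoverseg -o <file>", which you edit and pass back via "-i". Roughly, on this 4.3-era release the file starts with a filespaceOrder header and then has one line per failed segment; the sketch below uses the failed mirror from the logs above, but check the generated template for the exact format on your version:

```
filespaceOrder=
sdw2:50000:/home/gpadmin/gpdata/gpdatam/gpseg0
```

A line may optionally be extended with a second host:port:directory triple when the segment should be recovered to a new location rather than in place.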