RHCS + GNBD实现基于multipath上的GFS文件系统
The Red Hat Cluster Suite and storage suite (RHCS) provide components for building clusters to different requirements, including HA clusters, load-balancing (LVS) clusters, and storage cluster components. The best-known piece of the storage suite is the GFS file system, which provides safe and efficient access to shared storage in HA and LVS cluster environments; GFS is currently in wide use in clusters.
Beyond GFS, RHCS also ships something called GNBD, short for Global Network Block Device. This software provides a mechanism for accessing block devices, that is, storage devices, over Ethernet. GNBD is usually deployed on several servers that already have the GFS modules installed; depending on configuration, a machine running gnbd can act either as a gnbd client or as a gnbd server. The difference lies in the operations performed on it: a GNBD server exports one of its local devices over GNBD, effectively sharing it over the network, while other hosts can read and write the device exported by the gnbd server as if it were local.
From that description, gnbd clearly resembles an iSCSI target/initiator pair. That is true, but gnbd differs from iSCSI in a few ways:
First, alongside the iSCSI-like functionality, gnbd has a built-in fence capability: while block devices are being accessed over the network, it can cut the link between a host and its storage much as a power fence would. This is of course not a power fence; it fences by interrupting access to the storage.
Second, gnbd can be combined with device-mapper-multipath to provide multipath access and link redundancy for a GFS file system. In such a setup a gnbd client can reach the GFS through two gnbd servers; if one server drops off, access to the GFS continues over the network as long as the other server is alive. The same can be achieved with an iSCSI configuration, but gnbd is better known for it, because its developers explicitly documented gnbd's multipath support in the official documentation without ever mentioning that iSCSI can do multipath too.
Finally, gnbd's performance is much worse than iSCSI's, to the point that whether GNBD stays in RHCS in future RHEL releases is an open question. What is certain is that GNBD development has already stopped.
So why write a document about operating on a product that is clearly past its prime? The reasons are simple:
Despite GNBD's many shortcomings, RHEL did not ship an iSCSI target implementation until version 5u2, and then only as a technology preview, in the form of tgtd and tgtadm; before that there was no officially blessed iSCSI target at all. So if a customer needs an official solution, GNBD is the only choice despite its poor performance. It can also be combined with device-mapper-multipath for multipathing, which makes it a stopgap for companies on a tight budget. That made it worth experimenting with and writing up some notes.
While experimenting with multipath on device-mapper, though, I discovered just how unpleasant this thing really is. Because it is an end-of-life product, few people pay attention to it, maintainers are scarce, and documentation is scarcer still. The snags I hit during the experiments mostly ended up as unsolved mysteries; even where a problem was worked around, the root cause was often never found. So, a warning to anyone still determined, or brave enough, to use gnbd: read along by all means, but do not take this into production, or I guarantee you will run into ever more colorful problems. For anyone who insists on playing with it anyway, RHEL will offer better tools in RHCS.
That said, throughout the experiment I owe thanks to jerry and to bmarzins from the Czech Republic for their technical support and generous help. bmarzins in particular, despite a six-hour time difference, gave very concrete and insightful advice on my questions; several times, when I was close to despair, his replies let me see light again. Even where the problems were not fundamentally solved, his hints provided important leads that kept the experiment going and brought it to a reasonably satisfying conclusion, once again demonstrating his depth and rigor as a senior Red Hat storage engineer.
To do both of them justice, I have written the experiment up. What follows is a complete record of GFS access over multipath implemented with GNBD.
As is my habit with these documents, first the test topology and some notes:
As the figure below shows, the setup has five servers, all running RHEL5u2. One of them is an iSCSI target built by installing and configuring scsi-target; two are iSCSI initiators acting as clients of the target and, at the same time, as gnbd servers; the remaining two servers act as gnbd clients. Each gnbd client has two links, each through its own dedicated vmnet, to the corresponding gnbd servers, giving a multipath physical layout. Although both links are Ethernet, they actually carry SCSI traffic. Blue marks the storage link to gnbdserver1, green the storage link to gnbdserver2, and black the iSCSI storage link from the initiators to the scsi target. Finally, since all the gnbd servers and gnbd clients join the cluster, a dedicated heartbeat link is needed: the red 192.168.10.0/24 network.
The whole layout is built on VMware 6.0. Adding NICs in VMware and choosing which vmnet each interface bridges to should be clear from the figure, so I will not walk through it step by step.
The plan is this: first, the two iSCSI initiators access the shared storage on the iSCSI target so that it appears as local storage. Next, gnbd is deployed on those two initiators; loading the gnbd.ko module and starting the gnbd server service turns them into gnbd servers, and each exports the storage it imported from the iSCSI target so that the gnbd clients can import it. Then multipath is configured on the gnbd clients for multipath access. Finally, everything is tested.
The intended result: while gnbdclient1 is writing data to the GFS, if either storage link is cut, gnbd's built-in fence mechanism together with a physical fence can fence the corresponding gnbd server out of the cluster; once that succeeds, the data keeps flowing over the other storage link. In other words, data transfer fails over from the dead link to the healthy one.
Throughout the experiment the gnbd clients and gnbd servers are all nodes of an HA cluster; that is, before configuring multipath, a four-node cluster must be built.
[Figure: GNBD_ARC_MOD3.PNG (58.85 KB) 2008-10-24 09:58 — test topology]
Now the configuration begins:
First, the configuration on the iscsitarget server. Its basic state and configuration:
Address information:
[root@localhost ~]# ifconfig | grep inet
          inet addr:172.16.1.10  Bcast:172.16.255.255  Mask:255.255.0.0
          inet6 addr: fe80::20c:29ff:feb6:9605/64 Scope:Link
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
An additional 8 GB disk has been added to this host to be shared:
[root@localhost ~]# fdisk -l
Disk /dev/sda: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14         796     6289447+  83  Linux
/dev/sda3             797         861      522112+  82  Linux swap / Solaris
Disk /dev/sdb: 8589 MB, 8589934592 bytes
64 heads, 32 sectors/track, 8192 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Next come the installed packages, most importantly scsi-target-utils, which provides the iSCSI target daemon tgtd and the administration tool tgtadm:
[root@localhost ~]# rpm -qa | grep scsi
scsi-target-utils-0.0-0.20070620snap.el5
iscsi-initiator-utils-6.2.0.868-0.7.el5
[root@localhost ~]# rpm -ql scsi-target-utils
/etc/rc.d/init.d/tgtd
/usr/sbin/tgtadm
/usr/sbin/tgtd
Then start the scsi target service:
[root@localhost ~]# service tgtd start
[root@localhost ~]# chkconfig --level 345 tgtd on
and write the device-export rules into /etc/rc.local. Not elegant, but this product is still a technology preview, so it will have to do:
[root@localhost ~]# cat /etc/rc.local
/usr/sbin/tgtadm --lld iscsi --op new --mode target --tid 1 -T iqn.2008-09.gnbd.storage.sdb
/usr/sbin/tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/sdb
/usr/sbin/tgtadm --lld iscsi --op bind --mode target --tid 1 -I 172.16.0.0/16
Then restart the tgtd service:
[root@localhost ~]# service tgtd restart
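For reference, the three tgtadm calls above follow a fixed pattern: create the target, attach the backing device as LUN 1, then bind an initiator ACL. The sketch below is a dry-run helper (build_target_cmds is hypothetical, not part of scsi-target-utils) that only prints the command lines, so they can be reviewed before being written into /etc/rc.local:

```shell
#!/bin/sh
# Dry-run sketch: print the tgtadm command lines for one exported LUN.
# build_target_cmds TID IQN BACKING_DEV ACL   (hypothetical helper)
build_target_cmds() {
    tid=$1 iqn=$2 dev=$3 acl=$4
    echo "/usr/sbin/tgtadm --lld iscsi --op new --mode target --tid $tid -T $iqn"
    echo "/usr/sbin/tgtadm --lld iscsi --op new --mode logicalunit --tid $tid --lun 1 -b $dev"
    echo "/usr/sbin/tgtadm --lld iscsi --op bind --mode target --tid $tid -I $acl"
}

# The values used in this setup:
build_target_cmds 1 iqn.2008-09.gnbd.storage.sdb /dev/sdb 172.16.0.0/16
```

Redirecting the output into /etc/rc.local, after a review, reproduces the file shown above.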
That basically completes the configuration on the iscsi-target. Next comes the configuration on the iscsi-initiators:
First the basic network information:
[root@gnbdserver1 ~]# hostname
gnbdserver1
[root@gnbdserver1 ~]# ifconfig | grep inet
          inet addr:192.168.10.13  Bcast:192.168.10.255  Mask:255.255.255.0    eth0
          inet6 addr: fe80::20c:29ff:fe82:237a/64 Scope:Link
          inet addr:172.16.1.11  Bcast:172.16.1.255  Mask:255.255.255.0        eth1
          inet6 addr: fe80::20c:29ff:fe82:2384/64 Scope:Link
          inet addr:192.168.2.12  Bcast:192.168.2.255  Mask:255.255.255.0      eth2
          inet6 addr: fe80::20c:29ff:fe82:238e/64 Scope:Link
Then configure the iscsi-initiator. Make sure the iscsi-initiator-utils package is installed, then run:
[root@gnbdserver1 ~]# service iscsi restart
[root@gnbdserver1 ~]# chkconfig --level 345 iscsi on
[root@gnbdserver1 ~]# iscsiadm --mode discovery --type sendtargets --portal 172.16.1.10
Edit the file /etc/iscsi/iscsid.conf and change the following two parameters to:
node.session.timeo.replacement_timeout = 12
node.session.initial_login_retry_max = 1
The documentation explains these two parameters as follows:
To specify the length of time to wait for session re-establishment before failing SCSI commands back to the application when running the Linux SCSI Layer error handler, edit the first line. The value is in seconds and the default is 120 seconds.
To specify the number of times iscsiadm should retry a login to the target on the first login, modify the second line. The default is 4. Valid values are any integer value. This only affects the initial login. Setting it to a high value can slow down the iscsi service startup. Setting it to a low value can cause a session not to get logged into if there are disruptions during startup or if the network is not ready at that time.
My real motive for changing these two values was to make the underlying iscsi driver react faster when it detects a failed link: in an earlier plain multipath-over-iscsi test I found that failover of the storage path on the iscsi network took as long as a minute, and such a long timeout may cause problems when combined with gnbd. Indeed, according to other engineers, failover on fibre-channel storage controllers or FC switches never takes anywhere near that long. In practice, though, my change did not seem to help much; in a non-cluster multipath setup it actually made path switching slower. So I think these values can just as well be left alone.
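If you do want to apply the two settings, they can be patched in with sed instead of being edited by hand. A minimal sketch against a local sample file; on a real initiator, point SRC at /etc/iscsi/iscsid.conf (that path is an assumption based on the iscsi-initiator-utils layout) and GNU sed's -i option is assumed:

```shell
#!/bin/sh
# Sketch: patch the two iscsi session parameters in a copy of iscsid.conf.
SRC=./iscsid.conf.sample

# Sample of the two stock lines; on a real host this file already exists.
cat > "$SRC" <<'EOF'
node.session.timeo.replacement_timeout = 120
node.session.initial_login_retry_max = 4
EOF

# Rewrite whatever values are present with the ones used in this experiment.
sed -i \
    -e 's/^node\.session\.timeo\.replacement_timeout = .*/node.session.timeo.replacement_timeout = 12/' \
    -e 's/^node\.session\.initial_login_retry_max = .*/node.session.initial_login_retry_max = 1/' \
    "$SRC"

cat "$SRC"
```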
Next, confirm the connection to the iscsi storage with the iscsiadm command and restart the service:
[root@gnbdserver1 ~]# service iscsi restart
[root@gnbdserver1 ~]# iscsiadm --mode node
172.16.1.10:3260,1 iqn.2008-09.gnbd.storage.sdb
[root@gnbdserver1 ~]# fdisk -l
Disk /dev/sda: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14         796     6289447+  83  Linux
/dev/sda3             797         861      522112+  82  Linux swap / Solaris
Disk /dev/sdb: 8589 MB, 8589934592 bytes
64 heads, 32 sectors/track, 8192 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1        3816     3907568   83  Linux
That completes the configuration on gnbdserver1. Apart from the host-specific basics, the iscsi configuration on gnbdserver2 is identical, so only gnbdserver2's basic information is listed here:
[root@gnbdserver2 ~]# hostname
gnbdserver2
[root@gnbdserver2 ~]# ifconfig | grep inet
          inet addr:192.168.10.14  Bcast:192.168.10.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8e:c855/64 Scope:Link
          inet addr:172.16.1.12  Bcast:172.16.1.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8e:c85f/64 Scope:Link
          inet addr:192.168.3.12  Bcast:192.168.3.255  Mask:255.255.255.0
Next, set up the basics on gnbdclient1 and gnbdclient2 in preparation for joining the cluster:
[root@gnbdclient1 ~]# hostname
gnbdclient1
[root@gnbdclient1 ~]# ifconfig | grep inet
          inet addr:192.168.10.11  Bcast:192.168.10.255  Mask:255.255.255.0    eth0
          inet6 addr: fe80::20c:29ff:fe6e:4d82/64 Scope:Link
          inet addr:192.168.3.13  Bcast:192.168.3.255  Mask:255.255.255.0      eth1
          inet6 addr: fe80::20c:29ff:fe6e:4d8c/64 Scope:Link
          inet addr:192.168.2.11  Bcast:192.168.2.255  Mask:255.255.255.0      eth2
          inet6 addr: fe80::20c:29ff:fe6e:4d96/64 Scope:Link
[root@gnbdclient1 ~]# cat /etc/hosts
127.0.0.1       localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
192.168.10.13   gnbdserver1
192.168.10.14   gnbdserver2
192.168.10.11   gnbdclient1
192.168.10.12   gnbdclient2
Note: this file must be synchronized to all the other gnbdservers and gnbdclients.
Then make sure the following packages are installed:
cman-2.0.84-2.el5
rgmanager-2.0.38-2.el5
system-config-cluster-1.0.52-1.1
kmod-gnbd-0.1.4-12.el5
gnbd-1.1.5-1.el5
gfs-utils-0.1.17-1.el5
kmod-gfs2-1.92-1.1.el5
gfs2-utils-0.1.44-1.el5
kmod-gfs-0.1.23-5.el5
These packages must likewise be installed on all the other gnbdservers and gnbdclients.
Once that is done, reboot both gnbdclients and both gnbdservers so that the gnbd and gfs modules get loaded. Before rebooting everything, device-mapper-multipath can be installed on the gnbdclients and its service started:
[root@gnbdclient1 ~]# service multipathd restart
[root@gnbdclient1 ~]# chkconfig --level 345 multipathd on
[root@gnbdclient2 ~]# service multipathd restart
[root@gnbdclient2 ~]# chkconfig --level 345 multipathd on
This way the multipath module is loaded automatically when the clients boot.
After the reboots it is time to configure the cluster. On any of the gnbdservers or gnbdclients, enter the graphical environment and run system-config-cluster to open the cluster configuration tool. The basic steps are no different from deploying an ordinary cluster, except that I used manual_fence here, since I genuinely have no physical fence device, and created separate failover domains for the gnbdservers and the gnbdclients. On the choice of fence method for GNBD servers, the official documentation explains:
GNBD server nodes must be fenced using a fencing method that physically removes the nodes from the network. To physically remove a GNBD server node, you can use any fencing device except the following: fence_brocade fence agent, fence_vixel fence agent, fence_mcdata fence agent, fence_sanbox2 fence agent, fence_scsi fence agent. In addition, you cannot use the GNBD fencing device (fence_gnbd fence agent) to fence a GNBD server node.
I did not add any services, because the goal of this experiment is to test the gnbd and multipath functionality. Once done, synchronize the configuration file to the other hosts. Here is mine:
[root@gnbdclient1 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="6" name="gnbd_cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="gnbdserver1" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="mfence" nodename="gnbdserver1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="gnbdserver2" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="mfence" nodename="gnbdserver2"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="gnbdclient1" nodeid="3" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="mfence" nodename="gnbdclient1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="gnbdclient2" nodeid="4" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="mfence" nodename="gnbdclient2"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices>
                <fencedevice agent="fence_manual" name="mfence"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="server" ordered="1" restricted="1">
                                <failoverdomainnode name="gnbdserver1" priority="1"/>
                                <failoverdomainnode name="gnbdserver2" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="client" ordered="1" restricted="1">
                                <failoverdomainnode name="gnbdclient1" priority="1"/>
                                <failoverdomainnode name="gnbdclient2" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources/>
        </rm>
</cluster>
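Before copying cluster.conf around, a quick mechanical sanity check is worthwhile; the node count and the cluster name are the two things most often wrong after hand edits. A grep-based sketch, run here against a trimmed sample of the file above (on a real node, point CONF at /etc/cluster/cluster.conf):

```shell
#!/bin/sh
# Sketch: minimal sanity checks on cluster.conf before it is propagated.
CONF=./cluster.conf.sample

# Trimmed sample of the configuration shown in the text.
cat > "$CONF" <<'EOF'
<cluster config_version="6" name="gnbd_cluster">
  <clusternode name="gnbdserver1" nodeid="1" votes="1"/>
  <clusternode name="gnbdserver2" nodeid="2" votes="1"/>
  <clusternode name="gnbdclient1" nodeid="3" votes="1"/>
  <clusternode name="gnbdclient2" nodeid="4" votes="1"/>
</cluster>
EOF

nodes=$(grep -c '<clusternode ' "$CONF")
echo "clusternode entries: $nodes"   # this setup expects 4
grep -q 'name="gnbd_cluster"' "$CONF" && echo "cluster name ok"
```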
Then enable the cluster services on all gnbdclients and gnbdservers and reboot all systems:
[root@gnbdclient1 ~]# chkconfig --level 345 cman on
[root@gnbdclient1 ~]# chkconfig --level 345 rgmanager on
[root@gnbdclient1 ~]# chkconfig --level 345 gfs on
Finally, reboot all systems once more.
Once the systems are up, if the configuration and the topology are correct, the cluster will already be running. Next, create the gfs file system on the gnbdservers. Note that here the GFS file system must be created on a whole partition, not on LVM: building GFS on LVM leads to errors later when the device is exported with gnbd. It follows that the clvmd service does not need to be started either.
Here are the steps for creating the gfs file system; it can be created on any one of the gnbdservers:
First use fdisk to lay the shared disk out as one single partition, then run the gfs_mkfs command:
[root@gnbdserver1 ~]# gfs_mkfs -t gnbd_cluster:gfs1 -p lock_dlm -j 2 /dev/sdb1
There is no need to run gfs_mkfs again on the other node.
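The -t argument is the lock table name, and its first component must match the name attribute of the <cluster> element in cluster.conf exactly, otherwise the mount will be refused. A sketch that derives the lock table from the configuration file (shown against a one-line sample of the file; on a real node use /etc/cluster/cluster.conf):

```shell
#!/bin/sh
# Sketch: derive the gfs_mkfs lock table (-t) from cluster.conf's name attribute.
CONF=./cluster.head.sample
echo '<cluster config_version="6" name="gnbd_cluster">' > "$CONF"

# Pull the name attribute out of the <cluster> element.
cluster=$(sed -n 's/.*<cluster [^>]*name="\([^"]*\)".*/\1/p' "$CONF")
locktable="$cluster:gfs1"

echo "$locktable"
echo "gfs_mkfs -t $locktable -p lock_dlm -j 2 /dev/sdb1"
```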
Next, gnbd can be deployed on the gnbdservers and gnbdclients.
First, the configuration steps on gnbdserver1:
After the system reboot, load the gnbd.ko module with modprobe and start the gnbd service by hand; these steps can be written into /etc/rc.local:
[root@gnbdserver1 ~]# cat /etc/rc.local
/sbin/rmmod pcspkr
/sbin/modprobe gnbd
/sbin/gnbd_serv
/sbin/gnbd_export -d /dev/sdb1 -e gfs-1 -U -t 5
Then execute the file:
[root@gnbdserver1 ~]# /etc/rc.local
Once that completes, check:
[root@gnbdserver1 ~]# gnbd_export -l
Server[1] : gfs-1
--------------------------
      file : /dev/sdb1
   sectors : 7815136
  readonly : no
    cached : no
   timeout : 5
       uid : GNBD-1-16465616462656166313a3100000000000000000000000000
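The sectors field counts 512-byte sectors, which is why multipath will later report this device as 3.7G:

```shell
#!/bin/sh
# Convert the gnbd_export sector count into bytes and MiB.
sectors=7815136
bytes=$((sectors * 512))
mib=$((bytes / 1024 / 1024))
echo "$sectors sectors = $bytes bytes = $mib MiB"   # 3815 MiB, i.e. roughly 3.7G
```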
Part of the official documentation's description of this operation:
# gnbd_export Command to create, export and manage GNBDs on a GNBD server.
Several points also need attention when using gnbd for multipath:
For GNBD with device-mapper multipath, do not specify Linux page caching (the -c option of
the gnbd_export command). All GNBDs that are part of a logical volume must run with caching
disabled. Data corruption occurs if the GNBDs are run with caching enabled.
-U Command
Gets the UID command. The UID command is a command the gnbd_export command will run to get a Universal Identifier for the exported device. The UID is necessary to use device-mapper multipath with GNBD. The command must use the full path of any executable that you wish to run. A command can contain the %M, %m or %n escape sequences. %M will be expanded to the major number of the exported device, %m will be expanded to the minor number of the exported device, and %n will be expanded to the sysfs name for the device. If no command is given, GNBD will use the default command /usr/sbin/gnbd_get_uid. This command will work for most SCSI devices.
1. A GNBD server node must have local access to all storage devices needed to mount a GFS
file system. The GNBD server node must not import (gnbd_import command) other GNBD
devices to run the file system.
2. The GNBD server must export all the GNBDs in uncached mode, and it must export the raw
devices, not logical volume devices.
3. GFS must be run on top of a logical volume device, not raw devices.
You may need to increase the timeout period on the exported GNBDs to accommodate reduced performance. The need to increase the timeout period depends on the quality of the hardware.
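The rules quoted above can be checked mechanically before an export command line goes into rc.local. A sketch (check_export is a hypothetical helper, not a gnbd tool) that rejects cached exports and insists on a UID:

```shell
#!/bin/sh
# Sketch: lint a planned gnbd_export command line against the multipath rules:
# caching must stay off (no -c), a UID is required (-U), a timeout (-t) is wise.
check_export() {
    cmd="$1"
    case " $cmd " in *" -c "*) echo "FAIL: -c (caching) set, data corruption risk"; return 1;; esac
    case " $cmd " in *" -U"*) ;; *) echo "FAIL: no -U, multipath needs a UID"; return 1;; esac
    case " $cmd " in *" -t "*) ;; *) echo "WARN: no -t timeout set";; esac
    echo "OK: $cmd"
}

check_export "/sbin/gnbd_export -d /dev/sdb1 -e gfs-1 -U -t 5"
```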
The configuration steps on gnbdserver2 are the same as on gnbdserver1; only the file contents differ slightly, so I give just the file and skip the rest:
[root@gnbdserver2 ~]# cat /etc/rc.local
/sbin/rmmod pcspkr
/sbin/modprobe gnbd
/sbin/gnbd_serv
/sbin/gnbd_export -d /dev/sdb1 -e gfs-2 -U -t 5
Next, the configuration on gnbdclient1:
As on the gnbdservers, I put the required commands into /etc/rc.local:
[root@gnbdclient1 ~]# cat /etc/rc.local
/sbin/rmmod pcspkr
/sbin/modprobe gnbd
/sbin/gnbd_import -i 192.168.2.12
/sbin/gnbd_import -i 192.168.3.12
and execute the script:
[root@gnbdclient1 ~]# /etc/rc.local
Check the result:
[root@gnbdclient1 ~]# gnbd_import -l
Device name : gfs-1
----------------------
    Minor # : 0
 sysfs name : /block/gnbd0
     Server : 192.168.2.12
       Port : 14567
      State : Open Connected Clear
   Readonly : No
    Sectors : 7815136
Device name : gfs-2
----------------------
    Minor # : 1
 sysfs name : /block/gnbd1
     Server : 192.168.3.12
       Port : 14567
      State : Open Connected Clear
   Readonly : No
    Sectors : 7815136
[root@gnbdclient1 ~]# gnbd_monitor
device #   timeout   state
       1         5   normal
       0         5   normal
The configuration and checks on gnbdclient2 are exactly the same, so I will not repeat them.
At this point the gnbd configuration on both the servers and the clients is complete.
Last comes deploying multipath on the gnbdclients. The multipath packages were already installed earlier, so go straight to the configuration file. Besides editing the blacklist, change the original defaults section as follows:
[root@gnbdclient1 ~]# cat /etc/multipath.conf
blacklist {
        devnode "sda"
}
defaults {
        udev_dir                /dev
        polling_interval        5
        selector                "round-robin 0"
        path_grouping_policy    failover
        prio_callout            none
        path_checker            readsector0
        rr_min_io               1000
        rr_weight               uniform
}
multipaths {
        multipath {
                wwid                    GNBD-1-16465616462656166313a3100000000000000000000000000
                alias                   yellow
                path_grouping_policy    failover
                path_checker            readsector0
                path_selector           "round-robin 0"
                failback                manual
                rr_weight               priorities
                no_path_retry           5
        }
}
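The wwid in the multipaths section must be exactly the uid string reported by gnbd_export -l on the servers; a mismatch silently leaves the paths out of the map. A sketch of the comparison, using the values from the listings above as sample text:

```shell
#!/bin/sh
# Sketch: confirm the multipath.conf wwid matches the gnbd_export uid.
# Both lines are taken from the listings in the text; on a live client,
# feed in `gnbd_export -l` output from a server and the local multipath.conf.
uid_line="uid : GNBD-1-16465616462656166313a3100000000000000000000000000"
conf_line="wwid GNBD-1-16465616462656166313a3100000000000000000000000000"

uid=${uid_line##* }     # keep the last whitespace-separated field
wwid=${conf_line##* }

if [ "$uid" = "$wwid" ]; then
    echo "wwid matches gnbd uid: $uid"
else
    echo "MISMATCH: uid=$uid wwid=$wwid"
fi
```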
This is an active/standby multipath configuration worked out with jerry's help. Save it, then enable the service:
[root@gnbdclient1 ~]# service multipathd restart
[root@gnbdclient1 ~]# chkconfig --level 345 multipathd on
[root@gnbdclient1 ~]# multipath -ll
yellow (GNBD-1-16465616462656166313a3100000000000000000000000000) dm-0 GNBD,GNBD
[size=3.7G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=1][active]
 \_ #:#:#:# gnbd0 252:0 [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ #:#:#:# gnbd1 252:1 [active][ready]
Now synchronize multipath.conf to gnbdclient2 and enable the service there with the same steps; with that, the whole test setup is complete. Time to run the tests as planned:
To guarantee a clean environment, first reboot all the nodes and empty the logs on every node:
# > /var/log/messages
Then mount the gfs file system on gnbdclient1:
[root@gnbdclient1 ~]# mount /dev/mpath/yellow /mnt/
[root@gnbdclient1 ~]# df -TH /mnt
Filesystem    Type     Size   Used  Avail Use% Mounted on
/dev/dm-0      gfs     3.8G   889M   2.9G  24% /mnt
[root@gnbdclient1 ~]# tail -f /var/log/messages
Oct 23 13:59:46 gnbdclient1 kernel: GFS 0.1.23-5.el5 (built Apr 30 2008 16:56:42) installed
Oct 23 13:59:46 gnbdclient1 kernel: Trying to join cluster "lock_dlm", "gnbd_cluster:gfs1"
Oct 23 13:59:46 gnbdclient1 kernel: Joined cluster. Now mounting FS...
Oct 23 13:59:46 gnbdclient1 kernel: GFS: fsid=gnbd_cluster:gfs1.0: jid=0: Trying to acquire journal lock...
Oct 23 13:59:46 gnbdclient1 kernel: GFS: fsid=gnbd_cluster:gfs1.0: jid=0: Looking at journal...
Oct 23 13:59:47 gnbdclient1 kernel: GFS: fsid=gnbd_cluster:gfs1.0: jid=0: Done
Oct 23 13:59:47 gnbdclient1 kernel: GFS: fsid=gnbd_cluster:gfs1.0: jid=1: Trying to acquire journal lock...
Oct 23 13:59:47 gnbdclient1 kernel: GFS: fsid=gnbd_cluster:gfs1.0: jid=1: Looking at journal...
Oct 23 13:59:47 gnbdclient1 kernel: GFS: fsid=gnbd_cluster:gfs1.0: jid=1: Done
Mount the same file system on gnbdclient2 as well:
[root@gnbdclient2 ~]# mount /dev/mpath/yellow /mnt/
and monitor /mnt on gnbdclient2:
[root@gnbdclient2 ~]# watch -n1 "ls -l /mnt"
Then dd data into the GFS file system from gnbdclient1:
[root@gnbdclient1 mnt]# dd if=/dev/zero of=test.img
While this command runs, the virtual machine network interfaces show the data flowing over vmnet3; judging from the topology diagram, that means the traffic is going through gnbdserver1. I then physically disconnected, on the gnbdclient1 side, the link between gnbdclient1 and gnbdserver1. The data flow stalls temporarily.
gnbdserver1 then logs the following errors:
[root@gnbdserver1 ~]# cat /var/log/messages
Oct 23 14:05:26 gnbdserver1 openais[4857]: [CMAN ] cman killed by node 3 because we were killed by cman_tool or other application
Oct 23 14:05:26 gnbdserver1 gfs_controld[4885]: groupd_dispatch error -1 errno 11
Oct 23 14:05:26 gnbdserver1 gfs_controld[4885]: groupd connection died
Oct 23 14:05:26 gnbdserver1 gfs_controld[4885]: cluster is down, exiting
Oct 23 14:05:26 gnbdserver1 dlm_controld[4879]: cluster is down, exiting
Oct 23 14:05:26 gnbdserver1 kernel: dlm: closing connection to node 4
Oct 23 14:05:26 gnbdserver1 kernel: dlm: closing connection to node 3
Oct 23 14:05:26 gnbdserver1 kernel: dlm: closing connection to node 2
Oct 23 14:05:26 gnbdserver1 kernel: dlm: closing connection to node 1
Oct 23 14:05:29 gnbdserver1 gnbd_clusterd[5672]: ERROR [gnbd_clusterd.c:72] Bad poll result 0x11 from cluster
Oct 23 14:05:30 gnbdserver1 fenced[4873]: cluster is down, exiting
Oct 23 14:05:47 gnbdserver1 ccsd[4817]: Unable to connect to cluster infrastructure after 30 seconds.
while gnbdserver2 decides that gnbdserver1 is to be fenced out of the cluster:
[root@gnbdserver2 ~]# cat /var/log/messages
Oct 23 14:09:40 gnbdserver2 openais[4832]: [TOTEM] entering GATHER state from 12.
Oct 23 14:09:44 gnbdserver2 openais[4832]: [TOTEM] entering GATHER state from 11.
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] Saving state aru 78 high seq received 78
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] Storing new sequence id for ring 460
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] entering COMMIT state.
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] entering RECOVERY state.
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] position [0] member 192.168.10.11:
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] previous ring seq 1116 rep 192.168.10.11
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] aru 78 high delivered 78 received flag 1
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] position [1] member 192.168.10.12:
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] previous ring seq 1116 rep 192.168.10.11
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] aru 78 high delivered 78 received flag 1
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] position [2] member 192.168.10.14:
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] previous ring seq 1116 rep 192.168.10.11
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] aru 78 high delivered 78 received flag 1
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] Did not need to originate any messages in recovery.
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ] New Configuration:
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ]       r(0) ip(192.168.10.11)
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ]       r(0) ip(192.168.10.12)
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ]       r(0) ip(192.168.10.14)
Oct 23 14:09:45 gnbdserver2 kernel: dlm: closing connection to node 1
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ] Members Left:
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ]       r(0) ip(192.168.10.13)
Oct 23 14:09:45 gnbdserver2 fenced[4848]: gnbdserver1 not a cluster member after 0 sec post_fail_delay
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ] Members Joined:
Oct 23 14:09:45 gnbdserver2 fenced[4848]: fencing node "gnbdserver1"
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ] New Configuration:
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ]       r(0) ip(192.168.10.11)
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ]       r(0) ip(192.168.10.12)
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ]       r(0) ip(192.168.10.14)
Oct 23 14:09:45 gnbdserver2 fence_manual: Node gnbdserver1 needs to be reset before recovery can procede.  Waiting for gnbdserver1 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n gnbdserver1)
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ] Members Left:
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ] Members Joined:
Oct 23 14:09:45 gnbdserver2 openais[4832]: [SYNC ] This node is within the primary component and will provide service.
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] entering OPERATIONAL state.
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ] got nodejoin message 192.168.10.11
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ] got nodejoin message 192.168.10.12
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ] got nodejoin message 192.168.10.14
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CPG  ] got joinlist message from node 2
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CPG  ] got joinlist message from node 3
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CPG  ] got joinlist message from node 4
Now, following the prompt, I run the command on gnbdserver2 to acknowledge the fencing of gnbdserver1:
[root@gnbdserver2 ~]# fence_ack_manual -n gnbdserver1
Warning:  If the node "gnbdserver1" has not been manually fenced
(i.e. power cycled or disconnected from shared storage devices)
the GFS file system may become corrupted and all its data
unrecoverable!  Please verify that the node shown above has
been reset or disconnected from storage.
Are you certain you want to continue? [yN] y
Done
As soon as the command succeeds, the data flow switches over to gnbdserver2. If I now interrupt the dd on gnbdclient1, it completes cleanly:
[root@gnbdclient1 mnt]# dd if=/dev/zero of=test.img
163457+0 records in
163457+0 records out
83689984 bytes (84 MB) copied, 4.27649 seconds, 19.6 MB/s
[root@gnbdclient1 mnt]# dd if=/dev/zero of=test.img
1653921+0 records in
1653921+0 records out
846807552 bytes (847 MB) copied, 158.782 seconds, 5.3 MB/s
Monitoring gnbdserver2's subsequent log at this point shows:
Oct 23 14:10:57 gnbdserver2 fenced[4848]: fence "gnbdserver1" success
and the complete log on gnbdclient1:
Oct 23 14:03:28 gnbdclient1 kernel: eth2: link down
Oct 23 14:03:38 gnbdclient1 kernel: gnbd (pid 5361: gnbd_recvd) got signal
Oct 23 14:03:38 gnbdclient1 kernel: gnbd0: Receive control failed (result -4)
Oct 23 14:03:38 gnbdclient1 kernel: gnbd0: shutting down socket
Oct 23 14:03:38 gnbdclient1 kernel: exiting GNBD_DO_IT ioctl
Oct 23 14:03:38 gnbdclient1 gnbd_recvd[5361]: client lost connection with 192.168.2.12 : Interrupted system call
Oct 23 14:03:38 gnbdclient1 gnbd_recvd[5361]: reconnecting
Oct 23 14:03:54 gnbdclient1 openais[4711]: [TOTEM] The token was lost in the OPERATIONAL state.
Oct 23 14:03:54 gnbdclient1 openais[4711]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Oct 23 14:03:54 gnbdclient1 openais[4711]: [TOTEM] Transmit multicast socket send buffer size (219136 bytes).
Oct 23 14:03:54 gnbdclient1 openais[4711]: [TOTEM] entering GATHER state from 2.
Oct 23 14:03:58 gnbdclient1 openais[4711]: [TOTEM] entering GATHER state from 0.
Oct 23 14:03:58 gnbdclient1 openais[4711]: [TOTEM] Creating commit token because I am the rep.
Oct 23 14:03:58 gnbdclient1 openais[4711]: [TOTEM] Saving state aru 78 high seq received 78
Oct 23 14:03:58 gnbdclient1 openais[4711]: [TOTEM] Storing new sequence id for ring 460
Oct 23 14:03:58 gnbdclient1 openais[4711]: [TOTEM] entering COMMIT state.
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] entering RECOVERY state.
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] position [0] member 192.168.10.11:
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] previous ring seq 1116 rep 192.168.10.11
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] aru 78 high delivered 78 received flag 1
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] position [1] member 192.168.10.12:
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] previous ring seq 1116 rep 192.168.10.11
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] aru 78 high delivered 78 received flag 1
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] position [2] member 192.168.10.14:
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] previous ring seq 1116 rep 192.168.10.11
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] aru 78 high delivered 78 received flag 1
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] Did not need to originate any messages in recovery.
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] Sending initial ORF token
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ] New Configuration:
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ]       r(0) ip(192.168.10.11)
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ]       r(0) ip(192.168.10.12)
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ]       r(0) ip(192.168.10.14)
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ] Members Left:
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ]       r(0) ip(192.168.10.13)
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ] Members Joined:
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ] New Configuration:
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ]       r(0) ip(192.168.10.11)
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ]       r(0) ip(192.168.10.12)
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ]       r(0) ip(192.168.10.14)
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ] Members Left:
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ] Members Joined:
Oct 23 14:03:59 gnbdclient1 openais[4711]: [SYNC ] This node is within the primary component and will provide service.
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] entering OPERATIONAL state.
Oct 23 14:03:59 gnbdclient1 kernel: dlm: closing connection to node 1
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ] got nodejoin message 192.168.10.11
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ] got nodejoin message 192.168.10.12
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ] got nodejoin message 192.168.10.14
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CPG  ] got joinlist message from node 2
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CPG  ] got joinlist message from node 3
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CPG  ] got joinlist message from node 4
Oct 23 14:03:59 gnbdclient1 fenced[4727]: fencing deferred to gnbdserver2
Oct 23 14:04:26 gnbdclient1 gnbd_recvd[5361]: ERROR [gnbd_recvd.c:213] cannot connect to server 192.168.2.12 (-1) : No route to host
Oct 23 14:04:26 gnbdclient1 gnbd_recvd[5361]: reconnecting
Oct 23 14:04:34 gnbdclient1 gnbd_recvd[5361]: ERROR [gnbd_recvd.c:213] cannot connect to server 192.168.2.12 (-1) : No route to host
Oct 23 14:04:34 gnbdclient1 gnbd_recvd[5361]: reconnecting
Oct 23 14:04:35 gnbdclient1 multipathd: gnbd0: directio checker reports path is down
Oct 23 14:04:35 gnbdclient1 multipathd: checker failed path 252:0 in map yellow
Oct 23 14:04:35 gnbdclient1 multipathd: yellow: remaining active paths: 1
Oct 23 14:04:35 gnbdclient1 multipathd: dm-0: add map (uevent)
Oct 23 14:04:35 gnbdclient1 multipathd: dm-0: devmap already registered
Oct 23 14:04:35 gnbdclient1 kernel: device-mapper: multipath: Failing path 252:0.
Oct 23 14:04:41 gnbdclient1 multipathd: gnbd0: directio checker reports path is down
Oct 23 14:04:42 gnbdclient1 gnbd_recvd[5361]: ERROR [gnbd_recvd.c:213] cannot connect to server 192.168.2.12 (-1) : No route to host
Oct 23 14:04:42 gnbdclient1 gnbd_recvd[5361]: reconnecting
Oct 23 14:04:47 gnbdclient1 multipathd: gnbd0: directio checker reports path is down
Oct 23 14:04:50 gnbdclient1 gnbd_recvd[5361]: ERROR [gnbd_recvd.c:213] cannot connect to server 192.168.2.12 (-1) : No route to host
...
Oct 23 14:04:58 gnbdclient1 gnbd_recvd[5361]: reconnecting
Oct 23 14:04:59 gnbdclient1 multipathd: gnbd0: directio checker reports path is down
Oct 23 14:05:05 gnbdclient1 multipathd: gnbd0: directio checker reports path is down
Oct 23 14:05:06 gnbdclient1 gnbd_recvd[5361]: ERROR [gnbd_recvd.c:213] cannot connect to server 192.168.2.12 (-1) : No route to host
Oct 23 14:05:06 gnbdclient1 gnbd_recvd[5361]: reconnecting
Oct 23 14:05:11 gnbdclient1 multipathd: gnbd0: directio checker reports path is down
Oct 23 14:05:14 gnbdclient1 kernel: gnbd_monitor 16518 called gnbd_end_request with an error
Oct 23 14:05:14 gnbdclient1 kernel: end_request: I/O error, dev gnbd0, sector 2759848
(... repeated I/O error messages omitted ...)
Oct 23 14:05:14 gnbdclient1 kernel: gnbd_monitor 16518 called gnbd_end_request with an error
Oct 23 14:05:14 gnbdclient1 kernel: end_request: I/O error, dev gnbd0, sector 3732744
Oct 23 14:05:16 gnbdclient1 multipathd: gnbd0: directio checker reports path is down
Oct 23 14:05:21 gnbdclient1 kernel: multipathd 4563 called gnbd_end_request with an error
Oct 23 14:05:21 gnbdclient1 kernel: end_request: I/O error, dev gnbd0, sector 0
(... repeated I/O error messages omitted ...)
Oct 23 14:05:46 gnbdclient1 kernel: end_request: I/O error, dev gnbd0, sector 0
Oct 23 14:05:46 gnbdclient1 multipathd: gnbd0: directio checker reports path is down
Oct 23 14:05:51 gnbdclient1 kernel: multipathd 4563 called gnbd_end_request with an error
Oct 23 14:05:51 gnbdclient1 kernel: end_request: I/O error, dev gnbd0, sector 0
Oct 23 14:05:51 gnbdclient1 multipathd: gnbd0: directio checker reports path is down
Meanwhile, the log on gnbdclient2:
Oct 23 13:56:25 gnbdclient2 kernel: gnbd (pid 5576: gnbd_recvd) got signal
Oct 23 13:56:25 gnbdclient2 kernel: gnbd0: Receive control failed (result -4)
Oct 23 13:56:25 gnbdclient2 kernel: gnbd0: shutting down socket
Oct 23 13:56:25 gnbdclient2 kernel: exiting GNBD_DO_IT ioctl
Oct 23 13:56:28 gnbdclient2 kernel: ls 15997 called gnbd_end_request with an error
Oct 23 13:56:28 gnbdclient2 kernel: end_request: I/O error, dev gnbd0, sector 208
Oct 23 13:56:28 gnbdclient2 kernel: device-mapper: multipath: Failing path 252:0.
Oct 23 13:56:28 gnbdclient2 multipathd: dm-0: add map (uevent)
Oct 23 13:56:28 gnbdclient2 multipathd: dm-0: devmap already registered
Oct 23 13:56:28 gnbdclient2 multipathd: 252:0: mark as failed
Oct 23 13:56:28 gnbdclient2 multipathd: yellow: remaining active paths: 1
(... repeated messages omitted ...)
Oct 23 13:57:05 gnbdclient2 kernel: end_request: I/O error, dev gnbd0, sector 0
Oct 23 13:57:05 gnbdclient2 multipathd: gnbd0: directio checker reports path is down
Oct 23 13:57:05 gnbdclient2 gnbd_recvd[16027]: gnbd_recvd started
Oct 23 13:57:05 gnbdclient2 kernel: resending requests
Oct 23 13:57:10 gnbdclient2 multipathd: gnbd0: directio checker reports path is up
Oct 23 13:57:10 gnbdclient2 multipathd: 252:0: reinstated
Oct 23 13:57:10 gnbdclient2 multipathd: yellow: remaining active paths: 2
Oct 23 13:57:10 gnbdclient2 multipathd: dm-0: add map (uevent)
Oct 23 13:57:10 gnbdclient2 multipathd: dm-0: devmap already registered
This confirms the link-failure test: gnbd triggers fencing to evict the failed gnbdserver from the cluster, and multipath then keeps working normally.
The second part of the test simulates shutting down an entire gnbdserver.
I hardly need to paste any output; you can probably guess the result. Repeating the previous test, while dd is writing data to the GFS filesystem I power off gnbdserver1, the server behind vmnet3 that is currently carrying the traffic. The dd stalls, and within about 20 seconds the log on gnbdserver2 shows that gnbdserver1 is to be fenced out of the cluster. I then keep waiting, and only about a minute after the link went down do I run fence_ack_manual -n gnbdserver1 on gnbdserver2. Once the fence succeeds, all subsequent traffic flows through vmnet4 and gnbdserver2.
The whole experiment is a success!
The biggest problem in this experiment was getting multipath and the gnbd client to cooperate. The symptom I hit most often was that after the failed gnbdserver was fenced, the path would not fail over to the other server as required; instead error messages were reported continuously, and observation on GFS confirmed that no new data was being written. At that point, killing the dd process or cd'ing into /mnt both left processes stuck in the Z state, and multipath could not do its job.
From what I actually observed, when one gnbdserver requests that the other be fenced, acknowledging the fence immediately makes the fence succeed but leaves multipath unable to fail the path over correctly. I attribute this to the behavior of iSCSI as the shared storage: with multipath configured, its own path switching is already fairly slow, and as mar put it, multipath can only switch paths after the path has gone through the full normal, timeout, failed, restartable cycle. If the other node is killed immediately after the fence request is issued, the underlying iSCSI layer has not yet completed one normal-to-restartable cycle, so multipath cannot take over the failed link.
The resulting rule of thumb is simple: when I cut a link in a test, I start timing. A dozen or so seconds later one of the gnbdservers issues the request to fence the failed node; I then wait at least 30 seconds beyond that request, in practice a minute or more, before running fence_ack_manual. Verified in practice, this greatly improves the success rate of multipath's path switching.
One more point worth noting: on each gnbdclient, after every boot, check the gnbd_import output regardless of whether it looks normal, and import again any link that was not imported correctly. Since the imports usually live in a startup script, the script contents need little attention; what does matter is to restart multipathd manually once, or to move multipathd to a late position in the boot order, so that multipathd is started only after all shared links have been imported correctly. This too improves the success rate of multipathd's path switching.
GNBD - no multipath support for it now
I finished reading part I, excellent! Exactly what I am looking for.
There is something wrong here:
"the developers specifically mention gnbd's multipath capability in the official documentation, while never mentioning that iscsi can also do multipath"
Changes to GFS 6.1 for RHEL4 U2

   This release supports iSCSI and multipath iSCSI. That is, device mapper
   multipath (dm-multipath) can use iSCSI.

   This release prevents the activation of snapshots in a clustered volume
   group.

Important Notes

   Multipath GNBD is not available with this and previous releases of
   Red Hat GFS 6.1. That is, device mapper multipath (dm-multipath)
   cannot use GNBD. GNBD without multipath *is* available.
So, to do justice to these two experts, I have written the experiment up as a document. What follows is a complete record of GFS access over multipath implemented with GNBD:
Following my usual habit, first the topology of the experiment and some notes:
As shown in the figure below, the setup has five servers, all running RHEL5u2. One of them is configured with scsi-target to act as the iscsitarget server; two act as iscsiinitiator clients and at the same time as gnbdservers; the remaining two serve as gnbdclients. Each gnbdclient has two links, each through its own vmnet, to the corresponding gnbdserver, giving a physically multipathed layout; both links are Ethernet, but they carry SCSI data. Blue marks the storage links to gnbdserver1, green the storage links to gnbdserver2, and black the iscsi storage link from the iscsiinitiators to the scsitarget. Finally, since all the gnbdservers and gnbdclients must be in the cluster, a dedicated heartbeat link is needed: the red 192.168.10.0/24 network.
The whole layout is built on VMware 6.0. Adding NICs in the VMware virtual machines and choosing which vmnet each one bridges to should be clear from the figure, so I will not go through it step by step.
The plan is this: first give the two iscsiinitiators access to the shared storage on the iscsitarget so that it appears as local storage; then deploy gnbd on those two machines, load the gnbd.ko module, and start the gnbdserver service, turning them into gnbdservers; next have the two gnbdservers export the storage they attached from the iscsitarget so that the gnbdclients can import it; finally, configure multipath on the gnbdclients for multipathed access, and test.
The intended effect: while gnbdclient1 is writing data to GFS, if either storage link is cut, gnbd's built-in fencing combined with physical fencing can fence the corresponding gnbdserver out of the cluster, after which data keeps flowing over the other storage link; in other words, transfers fail over from the dead link to the remaining healthy one.
Throughout the experiment, every gnbdclient and gnbdserver is a node of the HA cluster; in other words, a four-node cluster must be built before multipath is configured.
[Figure GNBD_ARC_MOD3.PNG (58.85 KB, 2008-10-24 09:58): experiment topology]
First, the configuration on the iscsitarget. The basic information and configuration of this server include:
Address information:
[root@localhost ~]# ifconfig | grep inet
          inet addr:172.16.1.10  Bcast:172.16.255.255  Mask:255.255.0.0
          inet6 addr: fe80::20c:29ff:feb6:9605/64 Scope:Link
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
An extra 8 GB disk is added to this host to serve as the shared storage:
[root@localhost ~]# fdisk -l
Disk /dev/sda: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14         796     6289447+  83  Linux
/dev/sda3             797         861      522112+  82  Linux swap / Solaris
Disk /dev/sdb: 8589 MB, 8589934592 bytes
64 heads, 32 sectors/track, 8192 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Next, the installed packages; note scsi-target-utils in particular, which provides the iSCSI target daemon tgtd and its admin tool tgtadm:
[root@localhost ~]# rpm -qa | grep scsi
scsi-target-utils-0.0-0.20070620snap.el5
iscsi-initiator-utils-6.2.0.868-0.7.el5
[root@localhost ~]# rpm -ql scsi-target-utils
/etc/rc.d/init.d/tgtd
/usr/sbin/tgtadm
/usr/sbin/tgtd
Then start the scsi target service:
[root@localhost ~]# service tgtd start
[root@localhost ~]# chkconfig --level 345 tgtd on
And write the share definitions into /etc/rc.local. There is no way around this: the product is still a preview release, so this will have to do:
[root@localhost ~]# cat /etc/rc.local
/usr/sbin/tgtadm --lld iscsi --op new --mode target --tid 1 -T iqn.2008-09.gnbd.storage.sdb
/usr/sbin/tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/sdb
/usr/sbin/tgtadm --lld iscsi --op bind --mode target --tid 1 -I 172.16.0.0/16
Then restart the tgtd service:
[root@localhost ~]# service tgtd restart
That completes the basic configuration on the iscsi-target. Next, the configuration on the iscsi-initiator side:
First, the basic network information:
[root@gnbdserver1 ~]# hostname
gnbdserver1
[root@gnbdserver1 ~]# ifconfig | grep inet
          inet addr:192.168.10.13  Bcast:192.168.10.255  Mask:255.255.255.0    (eth0)
          inet6 addr: fe80::20c:29ff:fe82:237a/64 Scope:Link
          inet addr:172.16.1.11  Bcast:172.16.1.255  Mask:255.255.255.0       (eth1)
          inet6 addr: fe80::20c:29ff:fe82:2384/64 Scope:Link
          inet addr:192.168.2.12  Bcast:192.168.2.255  Mask:255.255.255.0     (eth2)
          inet6 addr: fe80::20c:29ff:fe82:238e/64 Scope:Link
Then configure the iscsi-initiator. Make sure the iscsi-initiator-utils package is installed, then run the following:
[root@gnbdserver1 ~]# service iscsi restart
[root@gnbdserver1 ~]# chkconfig --level 345 iscsi on
[root@gnbdserver1 ~]# iscsiadm --mode discovery --type sendtargets --portal 172.16.1.10
Edit the file /etc/iscsi/iscsid.conf and change the following two parameters to:
node.session.timeo.replacement_timeout = 12
node.session.initial_login_retry_max = 1
The documentation's explanation of these two parameters:
To specify the length of time to wait for session re-establishment before failing SCSI commands back to the application when running the Linux SCSI Layer error handler, edit the line.
The value is in seconds and the default is 120 seconds.
To specify the number of times iscsiadm should retry a login to the target when we first login, modify the following line. The default is 4. Valid values are any integer value. This only affects the initial login. Setting it to a high value can slow down the iscsi service startup. Setting it to a low value can cause a session to not get logged into, if there are disruptions during startup or if the network is not ready at that time.
My real purpose in changing these two values was to make the underlying iSCSI driver react faster once it detects a dead link: in an ordinary (non-cluster) multipath-over-iSCSI test I had found that path failover on the iSCSI network was far too slow, taking up to a minute, and that delay can cause problems in combination with gnbd. In fact, according to other engineers, failover on fibre-channel storage controllers or FC switches takes nowhere near that long. That said, my changes did not seem to help much; in a non-cluster multipath setup, changing the values actually made path switching even slower. So I would say these values can just as well be left alone.
Next, confirm the connection to the iscsi storage with the iscsiadm command, and restart the service:
[root@gnbdserver1 ~]# service iscsi restart
[root@gnbdserver1 ~]# iscsiadm --mode node
172.16.1.10:3260,1 iqn.2008-09.gnbd.storage.sdb
[root@gnbdserver1 ~]# fdisk -l
Disk /dev/sda: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14         796     6289447+  83  Linux
/dev/sda3             797         861      522112+  82  Linux swap / Solaris
Disk /dev/sdb: 8589 MB, 8589934592 bytes
64 heads, 32 sectors/track, 8192 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1        3816     3907568   83  Linux
That completes the configuration on gnbdserver1. Apart from its basic information, the iscsi configuration on gnbdserver2 is identical, so only the basic information of gnbdserver2 is listed here:
[root@gnbdserver2 ~]# hostname
gnbdserver2
[root@gnbdserver2 ~]# ifconfig | grep inet
          inet addr:192.168.10.14  Bcast:192.168.10.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8e:c855/64 Scope:Link
          inet addr:172.16.1.12  Bcast:172.16.1.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8e:c85f/64 Scope:Link
          inet addr:192.168.3.12  Bcast:192.168.3.255  Mask:255.255.255.0
Next, set up the basic information on gnbdclient1 and gnbdclient2 in preparation for joining the cluster:
[root@gnbdclient1 ~]# hostname
gnbdclient1
[root@gnbdclient1 ~]# ifconfig | grep inet
          inet addr:192.168.10.11  Bcast:192.168.10.255  Mask:255.255.255.0    (eth0)
          inet6 addr: fe80::20c:29ff:fe6e:4d82/64 Scope:Link
          inet addr:192.168.3.13  Bcast:192.168.3.255  Mask:255.255.255.0     (eth1)
          inet6 addr: fe80::20c:29ff:fe6e:4d8c/64 Scope:Link
          inet addr:192.168.2.11  Bcast:192.168.2.255  Mask:255.255.255.0     (eth2)
          inet6 addr: fe80::20c:29ff:fe6e:4d96/64 Scope:Link
[root@gnbdclient1 ~]# cat /etc/hosts
127.0.0.1               localhost.localdomain localhost
::1                     localhost6.localdomain6 localhost6
192.168.10.13   gnbdserver1
192.168.10.14   gnbdserver2
192.168.10.11   gnbdclient1
192.168.10.12   gnbdclient2
Note: this file must be synchronized to all the other gnbdservers and gnbdclients.
Then make sure the following packages are installed:
cman-2.0.84-2.el5
rgmanager-2.0.38-2.el5
system-config-cluster-1.0.52-1.1
kmod-gnbd-0.1.4-12.el5
gnbd-1.1.5-1.el5
gfs-utils-0.1.17-1.el5
kmod-gfs2-1.92-1.1.el5
gfs2-utils-0.1.44-1.el5
kmod-gfs-0.1.23-5.el5
The packages above must also be installed on all the other gnbdservers and gnbdclients.
Once that is done, reboot both gnbdclients and both gnbdservers so that the gnbd and gfs modules get loaded. Before rebooting all the systems, device-mapper-multipath can already be installed on the gnbdclients and the service started:
[root@gnbdclient1 ~]# service multipathd restart
[root@gnbdclient1 ~]# chkconfig --level 345 multipathd on
[root@gnbdclient2 ~]# service multipathd restart
[root@gnbdclient2 ~]# chkconfig --level 345 multipathd on
That way the multipath module is loaded automatically when the clients boot.
With the reboots done, the cluster can be configured. On any of the gnbdservers or gnbdclients, enter the graphical environment and run system-config-cluster to open the cluster configuration tool. The basic steps are no different from deploying an ordinary cluster, except that I used manual_fence, since I really have no physical fence device, and I created one failover domain each for the gnbdservers and the gnbdclients. As to why manual_fence is acceptable, the official documentation has the relevant explanation:
GNBD server nodes must be fenced using a fencing method that physically removes the nodes
from the network. To physically remove a GNBD server node, you can use any fencing device:
except the following: fence_brocade fence agent, fence_vixel fence agent, fence_mcdata
fence agent, fence_sanbox2 fence agent, fence_scsi fence agent. In addition, you cannot use
the GNBD fencing device (fence_gnbd fence agent) to fence a GNBD server node.
I added no services, since the purpose of this experiment is to test the gnbd and multipath functionality. Once finished, synchronize the configuration file to the other hosts. Here is my configuration file:
[root@gnbdclient1 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="6" name="gnbd_cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="gnbdserver1" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="mfence" nodename="gnbdserver1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="gnbdserver2" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="mfence" nodename="gnbdserver2"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="gnbdclient1" nodeid="3" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="mfence" nodename="gnbdclient1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="gnbdclient2" nodeid="4" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="mfence" nodename="gnbdclient2"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices>
                <fencedevice agent="fence_manual" name="mfence"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="server" ordered="1" restricted="1">
                                <failoverdomainnode name="gnbdserver1" priority="1"/>
                                <failoverdomainnode name="gnbdserver2" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="client" ordered="1" restricted="1">
                                <failoverdomainnode name="gnbdclient1" priority="1"/>
                                <failoverdomainnode name="gnbdclient2" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources/>
        </rm>
</cluster>
Then enable the cluster services on all the gnbdclients and gnbdservers and reboot all the systems:
[root@gnbdclient1 ~]# chkconfig --level 345 cman on
[root@gnbdclient1 ~]# chkconfig --level 345 rgmanager on
[root@gnbdclient1 ~]# chkconfig --level 345 gfs on
Finally, reboot all systems once more.
After the systems come up, if neither the configuration nor the topology has an error, the cluster is already running. Next, create the GFS filesystem on the gnbdservers. Note that the GFS filesystem must be created on the whole raw partition, not on LVM: building GFS on LVM causes errors later during gnbd export. Consequently there is no need to start the clvmd service either.
Here are the steps for creating the GFS filesystem; it can be done on either gnbdserver:
First use fdisk to make the shared disk one single partition, then run gfs_mkfs:
[root@gnbdserver1 ~]# gfs_mkfs -t gnbd_cluster:gfs1 -p lock_dlm -j 2 /dev/sdb1
There is no need to run gfs_mkfs again on the other node.
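The lock table passed with -t must have the form clustername:fsname, where the cluster name matches the name attribute of the cluster element in cluster.conf (gnbd_cluster here); a mismatch makes the later mount fail. As a sketch, the name can be pulled out of the config file so it cannot be mistyped (the sample-file fallback is only there so the fragment runs outside a cluster node):

```shell
#!/bin/sh
# Build the gfs_mkfs lock table (<clustername>:<fsname>) from cluster.conf,
# so the cluster name cannot drift from the <cluster name="..."> attribute.
# CONF falls back to a captured sample line so this sketch runs anywhere;
# on a real node point it at /etc/cluster/cluster.conf.
CONF=${CONF:-/tmp/cluster.conf.sample}
[ -f "$CONF" ] || printf '<cluster config_version="6" name="gnbd_cluster">\n' > "$CONF"
cluster=$(sed -n 's/.*<cluster [^>]*name="\([^"]*\)".*/\1/p' "$CONF" | head -n 1)
echo "gfs_mkfs -t ${cluster}:gfs1 -p lock_dlm -j 2 /dev/sdb1"
```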
Next, deploy gnbd on the gnbdservers and gnbdclients.
First, the configuration steps on gnbdserver1:
After the system reboots, the gnbd.ko module must be loaded with modprobe and the gnbd server daemon started by hand; these steps can go into /etc/rc.local:
[root@gnbdserver1 ~]# cat /etc/rc.local
/sbin/rmmod pcspkr
/sbin/modprobe gnbd
/sbin/gnbd_serv
/sbin/gnbd_export -d /dev/sdb1 -e gfs-1 -U -t 5
Then execute the file:
[root@gnbdserver1 ~]# /etc/rc.local
When it has run, check the result:
[root@gnbdserver1 ~]# gnbd_export -l
Server[1] : gfs-1
--------------------------
      file : /dev/sdb1
   sectors : 7815136
  readonly : no
    cached : no
   timeout : 5
       uid : GNBD-1-16465616462656166313a3100000000000000000000000000
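The sectors figure can be cross-checked against the earlier fdisk output: fdisk counts /dev/sdb1 in 1 KiB blocks while gnbd_export reports 512-byte sectors, so the two should relate as sectors = blocks * 2:

```shell
# fdisk reported /dev/sdb1 as 3907568 blocks of 1 KiB each; gnbd_export
# reports 512-byte sectors, so the export size should be exactly twice that.
blocks=3907568
sectors=$((blocks * 2))
echo "$sectors"   # prints 7815136, matching the "sectors : 7815136" line above
```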
The official documentation's brief description of this operation:
# gnbd_export Command to create, export and manage GNBDs on a GNBD server.
另外在使用gnbd實現(xiàn)multipath的時候有幾點需要注意:
For GNBD with device-mapper multipath, do not specify Linux page caching (the -c option of
the gnbd_export command). All GNBDs that are part of a logical volume must run with caching
disabled. Data corruption occurs if the GNBDs are run with caching enabled.
-U Command
Gets the UID command. The UID command is a command the gnbd_export command will run to get a Universal Identifier for the exported device. The UID is necessary to use device-mapper multipath with GNBD. The command must use the full path of any executable that you wish to run. A command can contain the %M, %m or %n escape sequences. %M will be expanded to the major number of the exported device, %m will be expanded to the minor number of the exported device, and %n will be expanded to the sysfs name for the device. If no command is given, GNBD will use the default command /usr/sbin/gnbd_get_uid. This command will work for most SCSI devices.
1. A GNBD server node must have local access to all storage devices needed to mount a GFS
file system. The GNBD server node must not import (gnbd_import command) other GNBD
devices to run the file system.
2. The GNBD server must export all the GNBDs in uncached mode, and it must export the raw
devices, not logical volume devices.
3. GFS must be run on top of a logical volume device, not raw devices.
You may need to increase the timeout period on the exported GNBDs to accommodate reduced performance. The need to increase the timeout period depends on the quality of the hardware.
The configuration steps on gnbdserver2 are the same as on gnbdserver1; only the contents of the configuration file differ slightly, so I give just the file and skip the rest.
[root@gnbdserver2 ~]# cat /etc/rc.local
/sbin/rmmod pcspkr
/sbin/modprobe gnbd
/sbin/gnbd_serv
/sbin/gnbd_export -d /dev/sdb1 -e gfs-2 -U -t 5
Next, the configuration on gnbdclient1:
As on the gnbdservers, I put the required commands into /etc/rc.local:
[root@gnbdclient1 ~]# cat /etc/rc.local
/sbin/rmmod pcspkr
/sbin/modprobe gnbd
/sbin/gnbd_import -i 192.168.2.12
/sbin/gnbd_import -i 192.168.3.12
And run the script:
[root@gnbdclient1 ~]# /etc/rc.local
Check the result:
[root@gnbdclient1 ~]# gnbd_import -l
Device name : gfs-1
----------------------
    Minor # : 0
 sysfs name : /block/gnbd0
     Server : 192.168.2.12
       Port : 14567
      State : Open Connected Clear
   Readonly : No
    Sectors : 7815136
Device name : gfs-2
----------------------
    Minor # : 1
 sysfs name : /block/gnbd1
     Server : 192.168.3.12
       Port : 14567
      State : Open Connected Clear
   Readonly : No
    Sectors : 7815136
[root@gnbdclient1 ~]# gnbd_monitor
device #   timeout   state
       1         5   normal
       0         5   normal
The configuration and checks on gnbdclient2 are exactly the same, so I will not repeat them.
At this point the gnbd configuration on both the servers and the clients is complete.
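Since multipathd should only start scanning after both gnbd paths have been imported, a more defensive version of the client's /etc/rc.local fragment may be worth sketching (an assumption-laden sketch: the server addresses are the ones used in this setup, and the grep pattern matches the "Server : <ip>" lines printed by gnbd_import -l):

```shell
# /etc/rc.local fragment (sketch): re-import any path that is missing, then
# (re)start multipathd last so it sees both gnbd devices when it scans.
/sbin/modprobe gnbd
for srv in 192.168.2.12 192.168.3.12; do
    gnbd_import -l 2>/dev/null | grep -q "Server : ${srv}" || gnbd_import -i "${srv}"
done
service multipathd restart
```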
Finally, deploy multipath on the gnbdclients. The multipath packages were installed earlier, so go straight to the configuration file. Besides adjusting the blacklist, change the original defaults section as follows:
[root@gnbdclient1 ~]# cat /etc/multipath.conf
blacklist {
        devnode "sda"
}
defaults {
        udev_dir                /dev
        polling_interval        5
        selector                "round-robin 0"
        path_grouping_policy    failover
        prio_callout            none
        path_checker            readsector0
        rr_min_io               1000
        rr_weight               uniform
}
multipaths {
        multipath {
                wwid                    GNBD-1-16465616462656166313a3100000000000000000000000000
                alias                   yellow
                path_grouping_policy    failover
                path_checker            readsector0
                path_selector           "round-robin 0"
                failback                manual
                rr_weight               priorities
                no_path_retry           5
        }
}
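The wwid above is not arbitrary: it must be exactly the uid string that gnbd_export -l printed on the server. Because both exports wrap the same iSCSI LUN, gnbd_get_uid should return the same UID on both servers (server2's value is assumed identical here; the fact that multipath later builds a single map out of gnbd0 and gnbd1 implies it is), and that shared identifier is what lets dm-multipath fold the two imports into the one map yellow. A minimal consistency check over the captured strings:

```shell
#!/bin/sh
# Compare the uid reported by gnbd_export -l on each server. If they
# differed, dm-multipath would build two single-path maps instead of one
# two-path map. uid_server2 is assumed equal to the captured server1 value.
uid_server1="GNBD-1-16465616462656166313a3100000000000000000000000000"
uid_server2="GNBD-1-16465616462656166313a3100000000000000000000000000"
if [ "$uid_server1" = "$uid_server2" ]; then
    echo "uids match: use this string as the multipath wwid"
else
    echo "uid mismatch: check the -U setting on both exports"
fi
```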
This is an active/standby multipath configuration worked out with jerry's help. Save it and enable the service:
[root@gnbdclient1 ~]# service multipathd restart
[root@gnbdclient1 ~]# chkconfig --level 345 multipathd on
[root@gnbdclient1 ~]# multipath -ll
yellow (GNBD-1-16465616462656166313a3100000000000000000000000000) dm-0 GNBD,GNBD
[size=3.7G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=1][active]
 \_ #:#:#:# gnbd0 252:0 [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ #:#:#:# gnbd1 252:1 [active][ready]
Now copy multipath.conf to gnbdclient2 and enable the service there with the same steps; that completes the whole test rig. Next, run the tests as planned:
To guarantee a clean environment, first reboot all nodes and empty the logs on every node:
# > /var/log/messages
Then mount the GFS filesystem on gnbdclient1:
[root@gnbdclient1 ~]# mount /dev/mpath/yellow /mnt/
[root@gnbdclient1 ~]# df -TH /mnt
Filesystem    Type     Size   Used  Avail Use% Mounted on
/dev/dm-0      gfs     3.8G   889M   2.9G  24% /mnt
[root@gnbdclient1 ~]# tail -f /var/log/messages
Oct 23 13:59:46 gnbdclient1 kernel: GFS 0.1.23-5.el5 (built Apr 30 2008 16:56:42) installed
Oct 23 13:59:46 gnbdclient1 kernel: Trying to join cluster "lock_dlm", "gnbd_cluster:gfs1"
Oct 23 13:59:46 gnbdclient1 kernel: Joined cluster. Now mounting FS...
Oct 23 13:59:46 gnbdclient1 kernel: GFS: fsid=gnbd_cluster:gfs1.0: jid=0: Trying to acquire journal lock...
Oct 23 13:59:46 gnbdclient1 kernel: GFS: fsid=gnbd_cluster:gfs1.0: jid=0: Looking at journal...
Oct 23 13:59:47 gnbdclient1 kernel: GFS: fsid=gnbd_cluster:gfs1.0: jid=0: Done
Oct 23 13:59:47 gnbdclient1 kernel: GFS: fsid=gnbd_cluster:gfs1.0: jid=1: Trying to acquire journal lock...
Oct 23 13:59:47 gnbdclient1 kernel: GFS: fsid=gnbd_cluster:gfs1.0: jid=1: Looking at journal...
Oct 23 13:59:47 gnbdclient1 kernel: GFS: fsid=gnbd_cluster:gfs1.0: jid=1: Done
Mount the same filesystem on gnbdclient2 as well:
[root@gnbdclient2 ~]# mount /dev/mpath/yellow /mnt/
Watch /mnt on gnbdclient2:
[root@gnbdclient2 ~]# watch -n1 "ls -l /mnt"
Then dd data into the GFS filesystem from gnbdclient1:
[root@gnbdclient1 mnt]# dd if=/dev/zero of=test.img
While the command runs, the virtual machine network interfaces show the data flowing over vmnet3; judging by the topology, it is going through gnbdserver1. I then physically disconnect, on gnbdclient1, the link between gnbdclient1 and gnbdserver1. The data flow stalls briefly.
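An unbounded dd like the one above keeps writing until the 3.7 GB volume fills up; for repeated failover runs a bounded variant is safer. A sketch that prints the command and its total size (the mount point matches the one used above; bs and count are arbitrary test parameters, not values from the original run):

```shell
# Cap the test write at 512 MiB so a stalled failover cannot fill the GFS
# volume between runs. bs and count are arbitrary test parameters.
bs=1048576    # 1 MiB
count=512
echo "dd if=/dev/zero of=/mnt/test.img bs=${bs} count=${count}"
echo "bytes to be written: $((bs * count))"
```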
The following errors then appear on gnbdserver1:
[root@gnbdserver1 ~]# cat /var/log/messages
Oct 23 14:05:26 gnbdserver1 openais[4857]: [CMAN ] cman killed by node 3 because we were killed by cman_tool or other application
Oct 23 14:05:26 gnbdserver1 gfs_controld[4885]: groupd_dispatch error -1 errno 11
Oct 23 14:05:26 gnbdserver1 gfs_controld[4885]: groupd connection died
Oct 23 14:05:26 gnbdserver1 gfs_controld[4885]: cluster is down, exiting
Oct 23 14:05:26 gnbdserver1 dlm_controld[4879]: cluster is down, exiting
Oct 23 14:05:26 gnbdserver1 kernel: dlm: closing connection to node 4
Oct 23 14:05:26 gnbdserver1 kernel: dlm: closing connection to node 3
Oct 23 14:05:26 gnbdserver1 kernel: dlm: closing connection to node 2
Oct 23 14:05:26 gnbdserver1 kernel: dlm: closing connection to node 1
Oct 23 14:05:29 gnbdserver1 gnbd_clusterd[5672]: ERROR [gnbd_clusterd.c:72] Bad poll result 0x11 from cluster
Oct 23 14:05:30 gnbdserver1 fenced[4873]: cluster is down, exiting
Oct 23 14:05:47 gnbdserver1 ccsd[4817]: Unable to connect to cluster infrastructure after 30 seconds.
Meanwhile gnbdserver2 decides that gnbdserver1 is to be fenced out of the cluster:
[root@gnbdserver2 ~]# cat /var/log/messages
Oct 23 14:09:40 gnbdserver2 openais[4832]: [TOTEM] entering GATHER state from 12.
Oct 23 14:09:44 gnbdserver2 openais[4832]: [TOTEM] entering GATHER state from 11.
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] Saving state aru 78 high seq received 78
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] Storing new sequence id for ring 460
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] entering COMMIT state.
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] entering RECOVERY state.
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] position [0] member 192.168.10.11:
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] previous ring seq 1116 rep 192.168.10.11
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] aru 78 high delivered 78 received flag 1
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] position [1] member 192.168.10.12:
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] previous ring seq 1116 rep 192.168.10.11
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] aru 78 high delivered 78 received flag 1
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] position [2] member 192.168.10.14:
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] previous ring seq 1116 rep 192.168.10.11
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] aru 78 high delivered 78 received flag 1
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] Did not need to originate any messages in recovery.
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ] New Configuration:
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ]      r(0) ip(192.168.10.11)
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ]      r(0) ip(192.168.10.12)
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ]      r(0) ip(192.168.10.14)
Oct 23 14:09:45 gnbdserver2 kernel: dlm: closing connection to node 1
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ] Members Left:
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ]      r(0) ip(192.168.10.13)
Oct 23 14:09:45 gnbdserver2 fenced[4848]: gnbdserver1 not a cluster member after 0 sec post_fail_delay
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ] Members Joined:
Oct 23 14:09:45 gnbdserver2 fenced[4848]: fencing node "gnbdserver1"
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ] New Configuration:
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ]      r(0) ip(192.168.10.11)
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ]      r(0) ip(192.168.10.12)
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ]      r(0) ip(192.168.10.14)
Oct 23 14:09:45 gnbdserver2 fence_manual: Node gnbdserver1 needs to be reset before recovery can procede.  Waiting for gnbdserver1 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n gnbdserver1)
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ] Members Left:
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ] Members Joined:
Oct 23 14:09:45 gnbdserver2 openais[4832]: [SYNC ] This node is within the primary component and will provide service.
Oct 23 14:09:45 gnbdserver2 openais[4832]: [TOTEM] entering OPERATIONAL state.
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ] got nodejoin message 192.168.10.11
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ] got nodejoin message 192.168.10.12
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CLM  ] got nodejoin message 192.168.10.14
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CPG  ] got joinlist message from node 2
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CPG  ] got joinlist message from node 3
Oct 23 14:09:45 gnbdserver2 openais[4832]: [CPG  ] got joinlist message from node 4
Now, following the prompt, I run the command on gnbdserver2 to acknowledge that gnbdserver1 has been fenced:
[root@gnbdserver2 ~]# fence_ack_manual -n gnbdserver1
Warning:  If the node "gnbdserver1" has not been manually fenced
(i.e. power cycled or disconnected from shared storage devices)
the GFS file system may become corrupted and all its data
unrecoverable!  Please verify that the node shown above has
been reset or disconnected from storage.
Are you certain you want to continue? [yN] y
Done
As soon as the command succeeds, the data flow switches to gnbdserver2. Terminating the dd process on gnbdclient1 now completes cleanly:
[root@gnbdclient1 mnt]# dd if=/dev/zero of=test.img
163457+0 records in
163457+0 records out
83689984 bytes (84 MB) copied, 4.27649 seconds, 19.6 MB/s
[root@gnbdclient1 mnt]# dd if=/dev/zero of=test.img
1653921+0 records in
1653921+0 records out
846807552 bytes (847 MB) copied, 158.782 seconds, 5.3 MB/s
The subsequent log on gnbdserver2 at this point:
Oct 23 14:10:57 gnbdserver2 fenced[4848]: fence "gnbdserver1" success
And the complete log on gnbdclient1:
Oct 23 14:03:28 gnbdclient1 kernel: eth2: link down
Oct 23 14:03:38 gnbdclient1 kernel: gnbd (pid 5361: gnbd_recvd) got signal
Oct 23 14:03:38 gnbdclient1 kernel: gnbd0: Receive control failed (result -4)
Oct 23 14:03:38 gnbdclient1 kernel: gnbd0: shutting down socket
Oct 23 14:03:38 gnbdclient1 kernel: exiting GNBD_DO_IT ioctl
Oct 23 14:03:38 gnbdclient1 gnbd_recvd[5361]: client lost connection with 192.168.2.12 : Interrupted system call
Oct 23 14:03:38 gnbdclient1 gnbd_recvd[5361]: reconnecting
Oct 23 14:03:54 gnbdclient1 openais[4711]: [TOTEM] The token was lost in the OPERATIONAL state.
Oct 23 14:03:54 gnbdclient1 openais[4711]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Oct 23 14:03:54 gnbdclient1 openais[4711]: [TOTEM] Transmit multicast socket send buffer size (219136 bytes).
Oct 23 14:03:54 gnbdclient1 openais[4711]: [TOTEM] entering GATHER state from 2.
Oct 23 14:03:58 gnbdclient1 openais[4711]: [TOTEM] entering GATHER state from 0.
Oct 23 14:03:58 gnbdclient1 openais[4711]: [TOTEM] Creating commit token because I am the rep.
Oct 23 14:03:58 gnbdclient1 openais[4711]: [TOTEM] Saving state aru 78 high seq received 78
Oct 23 14:03:58 gnbdclient1 openais[4711]: [TOTEM] Storing new sequence id for ring 460
Oct 23 14:03:58 gnbdclient1 openais[4711]: [TOTEM] entering COMMIT state.
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] entering RECOVERY state.
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] position [0] member 192.168.10.11:
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] previous ring seq 1116 rep 192.168.10.11
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] aru 78 high delivered 78 received flag 1
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] position [1] member 192.168.10.12:
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] previous ring seq 1116 rep 192.168.10.11
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] aru 78 high delivered 78 received flag 1
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] position [2] member 192.168.10.14:
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] previous ring seq 1116 rep 192.168.10.11
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] aru 78 high delivered 78 received flag 1
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] Did not need to originate any messages in recovery.
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] Sending initial ORF token
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ] New Configuration:
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ]         r(0) ip(192.168.10.11)
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ]         r(0) ip(192.168.10.12)
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ]         r(0) ip(192.168.10.14)
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ] Members Left:
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ]         r(0) ip(192.168.10.13)
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ] Members Joined:
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ] New Configuration:
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ]         r(0) ip(192.168.10.11)
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ]         r(0) ip(192.168.10.12)
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ]         r(0) ip(192.168.10.14)
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ] Members Left:
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ] Members Joined:
Oct 23 14:03:59 gnbdclient1 openais[4711]: [SYNC ] This node is within the primary component and will provide service.
Oct 23 14:03:59 gnbdclient1 openais[4711]: [TOTEM] entering OPERATIONAL state.
Oct 23 14:03:59 gnbdclient1 kernel: dlm: closing connection to node 1
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ] got nodejoin message 192.168.10.11
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ] got nodejoin message 192.168.10.12
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CLM  ] got nodejoin message 192.168.10.14
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CPG  ] got joinlist message from node 2
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CPG  ] got joinlist message from node 3
Oct 23 14:03:59 gnbdclient1 openais[4711]: [CPG  ] got joinlist message from node 4
Oct 23 14:03:59 gnbdclient1 fenced[4727]: fencing deferred to gnbdserver2
Oct 23 14:04:26 gnbdclient1 gnbd_recvd[5361]: ERROR [gnbd_recvd.c:213] cannot connect to server 192.168.2.12 (-1) : No route to host
Oct 23 14:04:26 gnbdclient1 gnbd_recvd[5361]: reconnecting
Oct 23 14:04:34 gnbdclient1 gnbd_recvd[5361]: ERROR [gnbd_recvd.c:213] cannot connect to server 192.168.2.12 (-1) : No route to host
Oct 23 14:04:34 gnbdclient1 gnbd_recvd[5361]: reconnecting
Oct 23 14:04:35 gnbdclient1 multipathd: gnbd0: directio checker reports path is down
Oct 23 14:04:35 gnbdclient1 multipathd: checker failed path 252:0 in map yellow
Oct 23 14:04:35 gnbdclient1 multipathd: yellow: remaining active paths: 1
Oct 23 14:04:35 gnbdclient1 multipathd: dm-0: add map (uevent)
Oct 23 14:04:35 gnbdclient1 multipathd: dm-0: devmap already registered
Oct 23 14:04:35 gnbdclient1 kernel: device-mapper: multipath: Failing path 252:0.
Oct 23 14:04:41 gnbdclient1 multipathd: gnbd0: directio checker reports path is down
Oct 23 14:04:42 gnbdclient1 gnbd_recvd[5361]: ERROR [gnbd_recvd.c:213] cannot connect to server 192.168.2.12 (-1) : No route to host
Oct 23 14:04:42 gnbdclient1 gnbd_recvd[5361]: reconnecting
Oct 23 14:04:47 gnbdclient1 multipathd: gnbd0: directio checker reports path is down
Oct 23 14:04:50 gnbdclient1 gnbd_recvd[5361]: ERROR [gnbd_recvd.c:213] cannot connect to server 192.168.2.12 (-1) : No route to host
[...similar messages omitted...]
Oct 23 14:04:58 gnbdclient1 gnbd_recvd[5361]: reconnecting
Oct 23 14:04:59 gnbdclient1 multipathd: gnbd0: directio checker reports path is down
Oct 23 14:05:05 gnbdclient1 multipathd: gnbd0: directio checker reports path is down
Oct 23 14:05:06 gnbdclient1 gnbd_recvd[5361]: ERROR [gnbd_recvd.c:213] cannot connect to server 192.168.2.12 (-1) : No route to host
Oct 23 14:05:06 gnbdclient1 gnbd_recvd[5361]: reconnecting
Oct 23 14:05:11 gnbdclient1 multipathd: gnbd0: directio checker reports path is down
Oct 23 14:05:14 gnbdclient1 kernel: gnbd_monitor 16518 called gnbd_end_request with an error
Oct 23 14:05:14 gnbdclient1 kernel: end_request: I/O error, dev gnbd0, sector 2759848
[...similar messages omitted...]
Oct 23 14:05:14 gnbdclient1 kernel: gnbd_monitor 16518 called gnbd_end_request with an error
Oct 23 14:05:14 gnbdclient1 kernel: end_request: I/O error, dev gnbd0, sector 3732744
Oct 23 14:05:16 gnbdclient1 multipathd: gnbd0: directio checker reports path is down
Oct 23 14:05:21 gnbdclient1 kernel: multipathd 4563 called gnbd_end_request with an error
Oct 23 14:05:21 gnbdclient1 kernel: end_request: I/O error, dev gnbd0, sector 0
[...similar messages omitted...]
Oct 23 14:05:46 gnbdclient1 kernel: end_request: I/O error, dev gnbd0, sector 0
Oct 23 14:05:46 gnbdclient1 multipathd: gnbd0: directio checker reports path is down
Oct 23 14:05:51 gnbdclient1 kernel: multipathd 4563 called gnbd_end_request with an error
Oct 23 14:05:51 gnbdclient1 kernel: end_request: I/O error, dev gnbd0, sector 0
Oct 23 14:05:51 gnbdclient1 multipathd: gnbd0: directio checker reports path is down
And the log on gnbdclient2:
Oct 23 13:56:25 gnbdclient2 kernel: gnbd (pid 5576: gnbd_recvd) got signal
Oct 23 13:56:25 gnbdclient2 kernel: gnbd0: Receive control failed (result -4)
Oct 23 13:56:25 gnbdclient2 kernel: gnbd0: shutting down socket
Oct 23 13:56:25 gnbdclient2 kernel: exiting GNBD_DO_IT ioctl
Oct 23 13:56:28 gnbdclient2 kernel: ls 15997 called gnbd_end_request with an error
Oct 23 13:56:28 gnbdclient2 kernel: end_request: I/O error, dev gnbd0, sector 208
Oct 23 13:56:28 gnbdclient2 kernel: device-mapper: multipath: Failing path 252:0.
Oct 23 13:56:28 gnbdclient2 multipathd: dm-0: add map (uevent)
Oct 23 13:56:28 gnbdclient2 multipathd: dm-0: devmap already registered
Oct 23 13:56:28 gnbdclient2 multipathd: 252:0: mark as failed
Oct 23 13:56:28 gnbdclient2 multipathd: yellow: remaining active paths: 1
[...similar messages omitted...]
Oct 23 13:57:05 gnbdclient2 kernel: end_request: I/O error, dev gnbd0, sector 0
Oct 23 13:57:05 gnbdclient2 multipathd: gnbd0: directio checker reports path is down
Oct 23 13:57:05 gnbdclient2 gnbd_recvd[16027]: gnbd_recvd started
Oct 23 13:57:05 gnbdclient2 kernel: resending requests
Oct 23 13:57:10 gnbdclient2 multipathd: gnbd0: directio checker reports path is up
Oct 23 13:57:10 gnbdclient2 multipathd: 252:0: reinstated
Oct 23 13:57:10 gnbdclient2 multipathd: yellow: remaining active paths: 2
Oct 23 13:57:10 gnbdclient2 multipathd: dm-0: add map (uevent)
Oct 23 13:57:10 gnbdclient2 multipathd: dm-0: devmap already registered
This shows that in the link-failure test, GNBD triggers fencing to evict the failed gnbdserver from the cluster, after which multipath keeps working normally.
The second part of the test simulates shutting down an entire gnbdserver.
I hardly need to paste any output; the result is easy to guess. Repeating the previous test, while dd is writing data to the GFS file system I cut the power to gnbdserver1, the node attached to vmnet3 that is carrying the data. The dd operation stalls, and within 20 seconds the log on gnbdserver2 shows that gnbdserver1 is to be fenced out of the cluster. I then keep waiting: only after about a minute, counted from the moment the link went down, do I run fence_ack_manual -n gnbdserver1 on gnbdserver2. Once the fence succeeds, all subsequent data flows through vmnet4 and gnbdserver2.
At this point the whole experiment succeeds.
The biggest problem in this experiment was really getting multipath and the gnbdclient to cooperate. The symptom I ran into most often: after the failed gnbdserver had been fenced, the path would not fail over to the other server as intended but kept reporting errors, and watching the GFS side confirmed that no new data was being written. At that point, whether I killed the dd process or just cd'd into /mnt to look around, the process would end up stuck in the Z state, and multipath never did its job.
From what I actually observed: when one gnbdserver issues a request to fence the other, fencing the designated node immediately makes the fence succeed, but multipath then fails to fail over the path correctly. I put this down to the same behavior seen when iSCSI is used as shared storage: with multipath configured, the switchover itself is fairly slow. As mar put it earlier, a path has to walk the whole sequence from normal to timeout to failed to restartable before multipath can switch it. If the other node is killed the instant the fence request goes out, the underlying iSCSI-style layer has not yet completed a single normal-to-restartable cycle, so at that moment multipath cannot take over the failed link.
因此總結(jié)出來的方法也很簡單,我在斷掉某個鏈路做測試的時候開始計時,在十數(shù)秒之后其中一個gnbdserver會發(fā)出fence失效節(jié)點的請求,在該請求發(fā)出大概30s以上的時間,我可能會選擇一分鐘或更長時間才執(zhí)行fence_ack_manual,這樣經(jīng)過實際驗證大大提高了multipath切換鏈路的成功率。
One more thing worth noting, on the gnbdclient side: after every boot, whether or not the gnbd_import list looks right, check it once and import again any link that was not imported correctly. Since these imports usually live in a startup script, the content itself needs little attention; what does need attention is to restart multipathd once by hand afterwards, or to move multipathd late enough in the boot order, so that multipathd only (re)starts after all shared links have been imported correctly. This, too, keeps multipathd's path switchover reliable.

GNBD - no multipath support for it now
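The boot-time ordering in that note can be sketched as an rc.local-style fragment (the server addresses in the example call are assumptions from this test setup; the gnbd_import and service calls are guarded so the sketch degrades to log messages on a machine without GNBD installed):

```shell
#!/bin/sh
# reimport_gnbd SERVER...: re-import every GNBD export (re-importing an
# already-imported device is harmless), then restart multipathd LAST so
# it only scans paths after all imports are in place.
reimport_gnbd() {
    for s in "$@"; do
        if command -v gnbd_import >/dev/null 2>&1; then
            gnbd_import -i "$s"
        else
            echo "skip $s (gnbd_import not installed)"
        fi
    done
    # multipathd must come up after the imports, never before
    if command -v service >/dev/null 2>&1; then
        service multipathd restart
    fi
    return 0
}

# In /etc/rc.local one would call, for example:
# reimport_gnbd 192.168.2.11 192.168.2.12
```

The point of the ordering is only that multipathd enumerates paths when it starts; any startup mechanism that runs it after the imports achieves the same thing.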
I finished reading part I, excellent! Exactly what I am looking for.
There is something wrong here, though:
"the developers specifically mentioned GNBD's multipath capability in the official documentation, while not mentioning that iSCSI can also do multipath"
Changes to GFS 6.1 for RHEL4 U2
   This release supports iSCSI and multipath iSCSI. That is, device mapper
   multipath (dm-multipath) can use iSCSI.

   This release prevents the activation of snapshots in a clustered volume
   group.

Important Notes

   Multipath GNBD is not available with this and previous releases of
   Red Hat GFS 6.1. That is, device mapper multipath (dm-multipath)
   cannot use GNBD. GNBD without multipath *is* available.