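LVS (DR) + Keepalived High Availability + Zabbix Split-Brain Monitoring
1 LVS (DR)
In the DR model, the VIP must be configured on every host. There are three ways to resolve the resulting address conflict:
(1) Bind the VIP statically on the front-end gateway
(2) Use arptables on each RS
(3) Modify kernel parameters on each RS to limit the level of ARP responses and announcements
Limiting the response level: arp_ignore
- 0: default; respond using any address configured on any local interface
- 1: respond only if the requested target IP is configured on the interface that received the request
Limiting the announcement level: arp_announce
- 0: default; announce all information of all local interfaces on every interface's network
- 1: try to avoid announcing interface information to networks that are not directly connected
- 2: always avoid announcing interface information to networks other than the interface's own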
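The two kernel parameters above can also be expressed as sysctl keys. The following is a minimal sketch, assuming the VIP lives on lo as in the scripts later in this section; the function name render_arp_sysctl and the file name 90-lvs-dr.conf are hypothetical and not part of the original setup.

```shell
#!/bin/bash
# Sketch: emit the sysctl lines that restrict ARP behaviour on a RealServer.
# render_arp_sysctl and 90-lvs-dr.conf are hypothetical names.

# Print arp_ignore=1 / arp_announce=2 for "all" and for the given interface.
render_arp_sysctl() {
    local iface="${1:-lo}"
    local scope
    for scope in all "$iface"; do
        echo "net.ipv4.conf.${scope}.arp_ignore = 1"
        echo "net.ipv4.conf.${scope}.arp_announce = 2"
    done
}

# Only prints the settings. To persist them as root, one could run:
#   render_arp_sysctl lo > /etc/sysctl.d/90-lvs-dr.conf && sysctl --system
render_arp_sysctl lo
```

Writing the values to a sysctl file keeps them across reboots, whereas echoing into /proc/sys only lasts until the next restart.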
Configuration notes
The Director server uses a bridged network with two IPs: a VIP and a DIP
The web servers connect to the Director on the same network segment as the DIP
Every web server is configured with the VIP
Every web server can answer client requests directly
The gateway of a web server must not point to the DIP
LVS (DR) architecture diagram
Router setup

# IP forwarding must be enabled; with a real router in production this step is not needed
[root@Router ~]# grep net.ipv4.ip_forward /etc/sysctl.conf
net.ipv4.ip_forward = 1
[root@Router ~]# sysctl -p
net.ipv4.ip_forward = 1
[root@Router ~]#

LVS setup
# Script to run on the LVS host
# Note: if the VIP is configured on the lo interface, a 32-bit netmask must be used
# If the VIP is bound to eth0, other netmasks can be used
[root@LVS ~]# cat Set_Lvs.sh
#!/bin/bash
#
VIP="192.168.60.60"
PORT="80"
RS1="192.168.60.100"
RS2="192.168.60.200"
NET_INTERFACE="lo:1"
NETMASK=32
MODE="-g"          # -g selects direct routing (DR)
SCHEDULER="wrr"
# Address currently bound to the VIP interface (empty if it is not configured)
Lo_Addr=$(ifconfig ${NET_INTERFACE} 2> /dev/null | awk '/inet/{print $2}')

# Install ipvsadm if it is missing
rpm -q ipvsadm &> /dev/null || yum -y install ipvsadm &> /dev/null

case $1 in
start|START|up|UP)
    ifconfig ${NET_INTERFACE} ${VIP}/${NETMASK}
    iptables -F    # flushes every rule in the filter table
    ipvsadm -A -t ${VIP}:${PORT} -s ${SCHEDULER}
    ipvsadm -a -t ${VIP}:${PORT} -r ${RS1} ${MODE} -w 1
    ipvsadm -a -t ${VIP}:${PORT} -r ${RS2} ${MODE} -w 1
    echo -e "\033[1;33mThe LVS Server is Ready!\033[0m"
    ;;
stop|STOP|down|DOWN)
    if [[ "$VIP" == "$Lo_Addr" ]]; then
        ifconfig ${NET_INTERFACE} down
        ipvsadm -C
        echo -e "\033[1;31mThe LVS Server is Canceled!\033[0m"
    else
        echo -e "\033[1;31mvip:$VIP address not exist,don't stop!\033[0m"
        exit 1
    fi
    ;;
*)
    echo -e "\033[1;32mUsage: $(basename $0) start|START|up|UP|stop|STOP|down|DOWN\033[0m"
    exit 1
    ;;
esac
[root@LVS ~]#

RealServer setup
Note: this script must be run on both RealServers

root@RS2:~# cat Set_RealServer.sh
#!/bin/bash
#
VIP="192.168.60.60"
NET_INTERFACE="lo:1"
NETMASK=32
SET_ARP="/proc/sys/net/ipv4/conf"
# Address currently bound to the VIP interface (empty if it is not configured)
Lo_Addr=$(ifconfig ${NET_INTERFACE} 2> /dev/null | awk '/inet/{print $2}')

case $1 in
start|START|up|UP)
    ifconfig ${NET_INTERFACE} ${VIP}/${NETMASK}
    # Restrict ARP replies and announcements so the RS does not answer for the VIP
    echo 1 > ${SET_ARP}/all/arp_ignore
    echo 2 > ${SET_ARP}/all/arp_announce
    echo 1 > ${SET_ARP}/lo/arp_ignore
    echo 2 > ${SET_ARP}/lo/arp_announce
    echo -e "\033[1;33mThe RealServer is Ready!\033[0m"
    ;;
stop|STOP|down|DOWN)
    # Restore the kernel defaults
    echo 0 > ${SET_ARP}/all/arp_ignore
    echo 0 > ${SET_ARP}/all/arp_announce
    echo 0 > ${SET_ARP}/lo/arp_ignore
    echo 0 > ${SET_ARP}/lo/arp_announce
    if [[ "$VIP" == "$Lo_Addr" ]]; then
        ifconfig ${NET_INTERFACE} down
        echo -e "\033[1;31mThe RealServer is Canceled!\033[0m"
    else
        echo -e "\033[1;31mvip:$VIP address not exist,don't stop!\033[0m"
        exit 1
    fi
    ;;
*)
    echo -e "\033[1;32mUsage: $(basename $0) start|START|up|UP|stop|STOP|down|DOWN\033[0m"
    exit 1
    ;;
esac
root@RS2:~#

Install and configure nginx on RealServer1
root@RS1:~# apt install nginx
root@RS1:~# cat /etc/nginx/conf.d/pc.conf
server {
    listen 80;
    server_name localhost;
    location / {
        root /data/nginx/pc;
        index index.html;
    }
}
root@RS1:~# mkdir -p /data/nginx/pc
root@RS1:~# echo "<h1>RS1 192.168.60.100</h1>" > /data/nginx/pc/index.html
root@RS1:~# cat /data/nginx/pc/index.html
<h1>RS1 192.168.60.100</h1>
root@RS1:~# systemctl enable --now nginx

Install and configure nginx on RealServer2

root@RS2:~# apt install nginx
root@RS2:~# cat /etc/nginx/conf.d/pc.conf
server {
    listen 80;
    server_name localhost;
    location / {
        root /data/nginx/pc;
        index index.html;
    }
}
root@RS2:~# mkdir -p /data/nginx/pc
root@RS2:~# echo "<h1>RS2 192.168.60.200</h1>" > /data/nginx/pc/index.html
root@RS2:~# cat /data/nginx/pc/index.html
<h1>RS2 192.168.60.200</h1>
root@RS2:~# systemctl enable --now nginx

LVS-DR mode forwards a request by re-encapsulating it with a new MAC header and leaves the source and destination IP addresses untouched, so the RealServer sees the real IP of the client
# Client IP
[root@client ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:50:56:82:e0:18 brd ff:ff:ff:ff:ff:ff
    inet 172.18.8.17/16 brd 172.18.255.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe82:e018/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
[root@client ~]# curl 192.168.60.60
<h1>RS2 192.168.60.200</h1>
[root@client ~]# curl 192.168.60.60
<h1>RS1 192.168.60.100</h1>
[root@client ~]# curl 192.168.60.60
<h1>RS2 192.168.60.200</h1>
[root@client ~]# curl 192.168.60.60
<h1>RS1 192.168.60.100</h1>
[root@client ~]#

# View the request log on the RealServer
root@RS2:/etc/nginx# tail -fn5 /var/log/nginx/access.log
172.18.8.17 - - [06/Oct/2022:10:06:28 +0000] "GET / HTTP/1.1" 200 28 "-" "curl/7.29.0"
172.18.8.17 - - [06/Oct/2022:10:06:57 +0000] "GET / HTTP/1.1" 200 28 "-" "curl/7.29.0"
172.18.8.17 - - [06/Oct/2022:10:09:16 +0000] "GET / HTTP/1.1" 200 28 "-" "curl/7.29.0"
172.18.8.17 - - [06/Oct/2022:10:09:18 +0000] "GET / HTTP/1.1" 200 28 "-" "curl/7.29.0"
172.18.8.17 - - [06/Oct/2022:10:17:04 +0000] "GET / HTTP/1.1" 200 28 "-" "curl/7.29.0"

We capture packets with tcpdump and read them in Wireshark to follow the three-way handshake in LVS-DR

# Capture on the RealServer
root@RS2:~# tcpdump -i eth1 -nn port 80 and host 172.18.8.17 -w lvs_dr.pcap
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
^C10 packets captured
10 packets received by filter
0 packets dropped by kernel
root@RS2:~# du -sh lvs_dr.pcap
4.0K lvs_dr.pcap

Import lvs_dr.pcap into Wireshark
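The figure above shows the handshake between the Client and the RealServer. You may ask: both LVS and the RealServer have the VIP configured, so how do we know the connection is established with the RealServer and not with LVS?
The reason is that LVS acts like a router here and does not take part in the handshake
As shown:
The IP plus the MAC address confirms that the handshake peer is the RealServer, not LVS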
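The same IP plus MAC check can be done without Wireshark by printing link-level headers with tcpdump -e. The following is a rough sketch, assuming the usual text layout of tcpdump -e -nn output; the helper name reply_src_macs is hypothetical. It lists the source MAC of every frame sent from VIP:PORT, which can then be compared with the MAC of the RealServer interface shown by ip a.

```shell
#!/bin/bash
# Sketch: list the source MACs of packets sent from VIP:PORT in `tcpdump -e -nn` text output.
# Assumed line layout: TIME SRC_MAC > DST_MAC, ethertype ... : SRC_IP.PORT > DST_IP.PORT: ...
# reply_src_macs is a hypothetical helper name.

reply_src_macs() {
    local vip="$1" port="$2"
    awk -v src="${vip}.${port}" '
        {
            # Look for the "SRC_IP.PORT >" pair; field 2 is the source MAC of that frame.
            for (i = 3; i < NF; i++) {
                if ($i == src && $(i + 1) == ">") {
                    print tolower($2)
                    break
                }
            }
        }
    ' | sort -u
}

# Usage on the capture host, reading the file written by tcpdump -w:
#   tcpdump -e -nn -r lvs_dr.pcap | reply_src_macs 192.168.60.60 80
```

If the only MAC printed belongs to the RealServer, the replies were sent by the RealServer itself and did not pass back through LVS.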
Viewing network connections
We can generate a large file with dd and download it from the client with wget to confirm that the Client is connected directly to the RealServer
RealServer1 (192.168.60.100)

root@RS1:~# dd if=/dev/zero of=/data/nginx/pc/testfile bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.217004 s, 483 MB/s
root@RS1:~# ll /data/nginx/pc/
total 102412
drwxr-xr-x 2 root root      4096 Oct 13 19:26 ./
drwxr-xr-x 4 root root      4096 Jun 19  2021 ../
-rw-r--r-- 1 root root        28 Jun 19  2021 index.html
-rw-r--r-- 1 root root 104857600 Oct 13 19:26 testfile
root@RS1:~#

Client (172.18.8.17)

[root@client ~]# wget --limit-rate 10k http://192.168.60.60/testfile
--2021-07-01 18:40:54--  http://192.168.60.60/testfile
Connecting to 192.168.60.60:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 104857600 (100M) [application/octet-stream]
Saving to: "testfile.1"
36% [======================>                ] 38,035,456  10.0KB/s  eta 1h 48m

# Check from a second client terminal
[root@client ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:50:56:82:e0:18 brd ff:ff:ff:ff:ff:ff
    inet 172.18.8.17/16 brd 172.18.255.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe82:e018/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
[root@client ~]# ss -ant
State   Recv-Q  Send-Q  Local Address:Port   Peer Address:Port
LISTEN  0       128     *:22                 *:*
LISTEN  0       100     127.0.0.1:25         *:*
ESTAB   0       0       172.18.8.17:22       172.18.60.171:49776
ESTAB   252040  0       172.18.8.17:38434    192.168.60.60:80     # connection between CIP and VIP
LISTEN  0       128     [::]:22              [::]:*
LISTEN  0       100     [::1]:25             [::]:*
[root@client ~]#

Network connections on RealServer1 (192.168.60.100)

root@RS1:~# ss -nt
State     Recv-Q  Send-Q  Local Address:Port     Peer Address:Port
ESTAB     0       0       192.168.60.100:22      172.18.60.171:50015
ESTAB     0       232104  192.168.60.60:80       172.18.8.17:38434    # connection between VIP and CIP
SYN-SENT  0       1       192.168.60.100:42240   223.5.5.5:53
root@RS1:~#

The request was not scheduled to RealServer2 (192.168.60.200), so it has no connection with the Client

root@RS2:~# ss -nt
State     Recv-Q  Send-Q  Local Address:Port     Peer Address:Port
ESTAB     0       0       192.168.60.200:22      172.18.60.171:50238
SYN-SENT  0       1       192.168.60.200:43766   180.76.76.76:53
root@RS2:~#

2 LVS (DR) + Keepalived
In the architecture above, the business servers (RealServers) are load balanced and highly available
That is, once RS1 goes down, LVS no longer schedules requests to it (IPVS itself does not health-check real servers; removing a failed RS relies on a health checker such as the Keepalived checks configured later in this section)

# Stop the Nginx service on RS1
root@RS1:~# systemctl stop nginx
root@RS1:~# ps -ef|grep nginx
root 15893 15514 0 16:39 pts/0 00:00:00 grep --color=auto nginx
root@RS1:~#

# Access test from the client
[root@client ~]# curl 192.168.60.60
<h1>RS2 192.168.60.200</h1>
[root@client ~]# curl 192.168.60.60
<h1>RS2 192.168.60.200</h1>
[root@client ~]# curl 192.168.60.60
<h1>RS2 192.168.60.200</h1>
[root@client ~]#

After Nginx on RS1 is started again, requests are distributed round-robin once more

# Start the Nginx service on RS1
root@RS1:~# systemctl start nginx
root@RS1:~# ps -ef|grep nginx |grep -v grep
root     15908     1 0 16:41 ? 00:00:00 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
www-data 15910 15908 0 16:41 ? 00:00:00 nginx: worker process
www-data 15912 15908 0 16:41 ? 00:00:00 nginx: worker process
www-data 15914 15908 0 16:41 ? 00:00:00 nginx: worker process
www-data 15916 15908 0 16:41 ? 00:00:00 nginx: worker process
root@RS1:~#

# Access test from the client
[root@client ~]# curl 192.168.60.60
<h1>RS1 192.168.60.100</h1>
[root@client ~]# curl 192.168.60.60
<h1>RS2 192.168.60.200</h1>
[root@client ~]# curl 192.168.60.60
<h1>RS1 192.168.60.100</h1>
[root@client ~]# curl 192.168.60.60
<h1>RS2 192.168.60.200</h1>
[root@client ~]#

But what happens if the LVS machine itself goes down?
Obviously, the service becomes completely unreachable
So is there a way to remove LVS as a single point of failure?
The answer is to introduce Keepalived for high availability
The architecture is as follows
Keepalived setup
We deploy one more LVS server and run Keepalived together with LVS on each of them

# LVS-1 configuration
[root@LVS-1 ~]# dnf -y install keepalived
[root@LVS-1 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
    notification_email {
        root@localhost
    }
    notification_email_from keepalived@localhost
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id lvs1
    vrrp_mcast_group4 224.0.100.10
}

vrrp_instance VI_1 {
    state MASTER
    interface eth1
    virtual_router_id 66
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 123456
    }
    virtual_ipaddress {
        #192.168.60.60 dev lo label lo:1
        192.168.60.60/24 dev eth1 label eth1:1
    }
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
    notify_fault "/etc/keepalived/notify.sh fault"
}

virtual_server 192.168.60.60 80 {
    delay_loop 3
    lb_algo rr
    lb_kind DR
    protocol TCP
    sorry_server 127.0.0.1 80
    real_server 192.168.60.100 80 {
        weight 1
        HTTP_GET {
            url {
                path /
                status_code 200
            }
            connect_timeout 1
            nb_get_retry 3
            delay_before_retry 1
        }
    }
    real_server 192.168.60.200 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 5
            nb_get_retry 3
            delay_before_retry 3
            connect_port 80
        }
    }
}
[root@LVS-1 ~]# systemctl start keepalived

# LVS-2 configuration
[root@LVS-2 ~]# dnf -y install keepalived
[root@LVS-2 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
    notification_email {
        root@localhost
    }
    notification_email_from keepalived@localhost
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id lvs2
    vrrp_mcast_group4 224.0.100.10
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth1
    virtual_router_id 66
    priority 80
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 123456
    }
    virtual_ipaddress {
        192.168.60.60/24 dev eth1 label eth1:1
    }
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
    notify_fault "/etc/keepalived/notify.sh fault"
}

virtual_server 192.168.60.60 80 {
    delay_loop 3
    lb_algo rr
    lb_kind DR
    protocol TCP
    sorry_server 127.0.0.1 80
    real_server 192.168.60.100 80 {
        weight 1
        HTTP_GET {
            url {
                path /
                status_code 200
            }
            connect_timeout 1
            nb_get_retry 3
            delay_before_retry 1
        }
    }
    real_server 192.168.60.200 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 5
            nb_get_retry 3
            delay_before_retry 3
            connect_port 80
        }
    }
}
[root@LVS-2 ~]# systemctl start keepalived

With LVS + Keepalived configured, the RealServers still need the following setup if it has not been done already
RealServer setup
Note: this script must be run on both RealServers

root@RS2:~# cat Set_RealServer.sh
#!/bin/bash
#
VIP="192.168.60.60"
NET_INTERFACE="lo:1"
NETMASK=32
SET_ARP="/proc/sys/net/ipv4/conf"
# Address currently bound to the VIP interface (empty if it is not configured)
Lo_Addr=$(ifconfig ${NET_INTERFACE} 2> /dev/null | awk '/inet/{print $2}')

case $1 in
start|START|up|UP)
    ifconfig ${NET_INTERFACE} ${VIP}/${NETMASK}
    # Restrict ARP replies and announcements so the RS does not answer for the VIP
    echo 1 > ${SET_ARP}/all/arp_ignore
    echo 2 > ${SET_ARP}/all/arp_announce
    echo 1 > ${SET_ARP}/lo/arp_ignore
    echo 2 > ${SET_ARP}/lo/arp_announce
    echo -e "\033[1;33mThe RealServer is Ready!\033[0m"
    ;;
stop|STOP|down|DOWN)
    # Restore the kernel defaults
    echo 0 > ${SET_ARP}/all/arp_ignore
    echo 0 > ${SET_ARP}/all/arp_announce
    echo 0 > ${SET_ARP}/lo/arp_ignore
    echo 0 > ${SET_ARP}/lo/arp_announce
    if [[ "$VIP" == "$Lo_Addr" ]]; then
        ifconfig ${NET_INTERFACE} down
        echo -e "\033[1;31mThe RealServer is Canceled!\033[0m"
    else
        echo -e "\033[1;31mvip:$VIP address not exist,don't stop!\033[0m"
        exit 1
    fi
    ;;
*)
    echo -e "\033[1;32mUsage: $(basename $0) start|START|up|UP|stop|STOP|down|DOWN\033[0m"
    exit 1
    ;;
esac
root@RS2:~#

The setup is now complete
LVS-1 is the master (its priority of 100 is higher than that of LVS-2), so we log in to LVS-1 to check the VIP binding and the lvs-dr rules

# Check that the VIP 192.168.60.60 is present
[root@LVS-1 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 00:50:56:82:98:c3 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:a3:e8:d2 brd ff:ff:ff:ff:ff:ff
    inet 192.168.60.80/24 brd 192.168.60.255 scope global noprefixroute eth1
       valid_lft forever preferred_lft forever
    inet 192.168.60.60/32 scope global eth1:1
       valid_lft forever preferred_lft forever
    inet6 fe80::6c06:303c:a126:d35b/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
[root@LVS-1 ~]#

# Check the lvs-dr rules
[root@LVS-1 ~]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.60.60:80 rr
  -> 192.168.60.100:80            Route   1      0          0
  -> 192.168.60.200:80            Route   1      0          0
[root@LVS-1 ~]#

At this point LVS-2 does not hold the VIP; it has the lvs-dr rules, but they do not take effect

[root@LVS-2 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 00:50:56:a3:07:c1 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:a3:4c:74 brd ff:ff:ff:ff:ff:ff
    inet 192.168.60.88/24 brd 192.168.60.255 scope global noprefixroute eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::5745:92ae:e725:b669/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
[root@LVS-2 ~]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.60.60:80 rr
  -> 192.168.60.100:80            Route   1      0          0
  -> 192.168.60.200:80            Route   1      0          0
[root@LVS-2 ~]#

If LVS-1 fails, the VIP floats to LVS-2 and the lvs-dr rules there take effect
# On LVS-1
[root@LVS-1 ~]# systemctl stop keepalived
[root@LVS-1 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 00:50:56:82:98:c3 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:a3:e8:d2 brd ff:ff:ff:ff:ff:ff
    inet 192.168.60.80/24 brd 192.168.60.255 scope global noprefixroute eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::6c06:303c:a126:d35b/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
[root@LVS-1 ~]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
[root@LVS-1 ~]#

# Check LVS-2 (confirm that the VIP 192.168.60.60 is now present)
[root@LVS-2 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 00:50:56:a3:07:c1 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:a3:4c:74 brd ff:ff:ff:ff:ff:ff
    inet 192.168.60.88/24 brd 192.168.60.255 scope global noprefixroute eth1
       valid_lft forever preferred_lft forever
    inet 192.168.60.60/24 scope global secondary eth1:1
       valid_lft forever preferred_lft forever
    inet6 fe80::5745:92ae:e725:b669/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
[root@LVS-2 ~]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.60.60:80 rr
  -> 192.168.60.100:80            Route   1      0          0
  -> 192.168.60.200:80            Route   1      0          0
[root@LVS-2 ~]#

Keepalived is preemptive by default: once the LVS-1 server recovers and its Keepalived service is started, it takes the VIP back, and service is again provided by LVS-1
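If taking the VIP back on recovery is not wanted, Keepalived offers the nopreempt keyword inside vrrp_instance, and its documentation states that the initial state must then be BACKUP on the node using it. The configuration in this article does not use nopreempt, so the following is only a sketch of a sanity check one might run before enabling it; the function name check_nopreempt is hypothetical, and the check looks at the file as a whole rather than at individual vrrp_instance blocks.

```shell
#!/bin/bash
# Sketch: sanity-check a keepalived.conf that uses nopreempt.
# check_nopreempt is a hypothetical helper; it does not parse individual vrrp_instance blocks.

check_nopreempt() {
    local conf="$1"
    # Nothing to verify when nopreempt is not used at all.
    grep -Eq '^[[:space:]]*nopreempt([[:space:]]|$)' "$conf" || return 0
    # nopreempt requires the initial state to be BACKUP.
    if grep -Eq '^[[:space:]]*state[[:space:]]+MASTER' "$conf"; then
        echo "nopreempt is set but state is MASTER in $conf" >&2
        return 1
    fi
    return 0
}

# Usage:
#   check_nopreempt /etc/keepalived/keepalived.conf && systemctl restart keepalived
```

With both nodes starting as BACKUP, the election is still decided by priority, but a recovered node with the higher priority no longer takes the VIP away from the current master.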
3 Monitoring Keepalived split-brain with Zabbix
After running Keepalived in production for a while, we hit a problem: the VIP was present on both Keepalived nodes at the same time, the so-called "split-brain" problem. After troubleshooting it, we realised that finding such problems passively is a poor approach, so management asked us to detect them proactively and as early as possible, to avoid serious damage to the production service. To that end we introduced Zabbix monitoring
3.1 Installing Zabbix Server
Note: Zabbix Server is deployed on the same machine as the Router, IP: 172.18.8.18
Steps for installing Zabbix and configuring alerts
3.2 Installing Zabbix Agent
Deploy zabbix_agent on both LVS + Keepalived machines
# The agent was enabled when zabbix_server was compiled,
# so the agent built on the zabbix_server host can simply be copied over and used.
# Alternatively, install it with yum
# or compile zabbix_agentd separately
[root@LVS-1 ~]# mkdir -p /apps
[root@LVS-1 ~]# cd /apps/
[root@LVS-1 apps]# scp -r 172.18.8.18:/apps/zabbix ./
root@172.18.8.18's password:
zabbix_agentd         100% 2265KB  46.4MB/s 00:00
zabbix_server         100%   14MB  57.1MB/s 00:00
zabbix_server.conf    100%   24KB 245.3KB/s 00:00
zabbix_agentd.conf    100%   15KB   3.9MB/s 00:00
zabbix_get            100% 1090KB   9.0MB/s 00:00
zabbix_sender         100% 1127KB  75.2MB/s 00:00
zabbix_js             100% 4056KB  92.3MB/s 00:00
zabbix_get.1          100% 4929     1.0MB/s 00:00
zabbix_sender.1       100%   14KB   4.1MB/s 00:00
zabbix_agentd.8       100% 3927    46.7KB/s 00:00
zabbix_server.8       100% 3775     2.8MB/s 00:00
zabbix_server.log     100%  365KB  11.2MB/s 00:00
zabbix_agentd.log     100%  212KB  10.0MB/s 00:00
zabbix_server.pid     100%    5     4.0KB/s 00:00
zabbix_agentd.pid     100%    5     4.9KB/s 00:00
[root@LVS-1 apps]# scp 172.18.8.18:/usr/lib/systemd/system/zabbix_agent.service /usr/lib/systemd/system/
root@172.18.8.18's password:
zabbix_agent.service  100%  391    23.0KB/s 00:00
[root@LVS-1 apps]#

Starting zabbix_agent

[root@LVS-1 ~]# groupadd --system zabbix
[root@LVS-1 ~]# useradd --system -g zabbix -d /usr/lib/zabbix -s /sbin/nologin -c "Zabbix Monitoring System" zabbix
[root@LVS-1 ~]# chown -R zabbix:zabbix /apps/zabbix/
[root@LVS-1 ~]# hostName=$(hostname -I|awk '{print $1}')
[root@LVS-1 ~]# sed -i '/^Hostname=Zabbix Agent/c Hostname='${hostName}'' /apps/zabbix/etc/zabbix_agentd.conf
[root@LVS-1 ~]# sed -i '/^Server=127.0.0.1/c Server=172.18.8.18,192.168.60.1' /apps/zabbix/etc/zabbix_agentd.conf
[root@LVS-1 ~]# sed -i '/^ServerActive=127.0.0.1/c ServerActive=172.18.8.18,192.168.60.1' /apps/zabbix/etc/zabbix_agentd.conf
[root@LVS-1 ~]# systemctl daemon-reload
[root@LVS-1 ~]# systemctl start zabbix_agent
[root@LVS-1 ~]# ps -ef|grep zabbix|grep -v grep
zabbix 6625    1 0 23:03 ? 00:00:00 /apps/zabbix/sbin/zabbix_agentd -c /apps/zabbix/etc/zabbix_agentd.conf
zabbix 6626 6625 0 23:03 ? 00:00:00 /apps/zabbix/sbin/zabbix_agentd: collector [idle 1 sec]
zabbix 6627 6625 0 23:03 ? 00:00:00 /apps/zabbix/sbin/zabbix_agentd: listener #1 [waiting for connection]
zabbix 6628 6625 0 23:03 ? 00:00:00 /apps/zabbix/sbin/zabbix_agentd: listener #2 [waiting for connection]
zabbix 6629 6625 0 23:03 ? 00:00:00 /apps/zabbix/sbin/zabbix_agentd: listener #3 [waiting for connection]
zabbix 6630 6625 0 23:03 ? 00:00:00 /apps/zabbix/sbin/zabbix_agentd: active checks #1 [idle 1 sec]
[root@LVS-1 ~]#

3.3 Adding the agent to Zabbix Server
Add the hosts
Add the monitoring template
3.4 Adding the item and alert settings in Zabbix
Custom alert key

[root@Router ~]# cat /apps/zabbix/etc/zabbix_agentd.conf.d/check_keepalived.conf
UserParameter=check_keepalived[*],/bin/bash /apps/zabbix/etc/zabbix_agentd.conf.d/check_vip.sh

Custom monitoring script
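The VIP can appear on the backup (LVS-2) in two situations
First: the master (LVS-1) has really gone down or its service has failed, so the VIP floated to the backup (LVS-2)
Second: a Keepalived misconfiguration, a firewall (iptables) rule or a similar cause has broken the heartbeat link between master and backup, that is, Keepalived has a "split-brain"
Some methods found online declare a "split-brain" as soon as the VIP shows up on the backup, which easily produces false alarms
The only case we need to monitor is the second one. We therefore deploy the script on a neutral third machine and use the arping command: if the VIP resolves to two MAC addresses, a "split-brain" is confirmed.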
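The decision rule described above (more than one MAC answering for the VIP means split-brain) can be isolated from the arping call itself, which makes it easy to try against saved output. The following is a minimal sketch, assuming the "Unicast reply from IP [MAC]" line format printed by iputils arping; the function names count_reply_macs and split_brain_flag are hypothetical.

```shell
#!/bin/bash
# Sketch: decide split-brain from arping output read on stdin.
# Assumes iputils-style lines such as: Unicast reply from 192.168.60.60 [00:50:56:A3:E8:D2]  0.7ms
# count_reply_macs and split_brain_flag are hypothetical helper names.

# Print the number of distinct MAC addresses found in "reply from" lines.
count_reply_macs() {
    awk -F'[][]' '/reply from/ { print toupper($2) }' | sort -u | awk 'END { print NR }'
}

# Print 1 when more than one MAC answered for the VIP, otherwise 0.
split_brain_flag() {
    local n
    n="$(count_reply_macs)"
    if [ "$n" -gt 1 ]; then
        echo 1
    else
        echo 0
    fi
}

# Usage on the neutral host (interface name and probe count are assumptions):
#   arping -c 3 -I eth0 192.168.60.60 | split_brain_flag
```

Counting distinct MACs instead of reply lines means that several probes answered by the same node are not mistaken for a split-brain.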
[root@Router ~]# cat /apps/zabbix/etc/zabbix_agentd.conf.d/check_vip.sh
#!/bin/bash
#
vip="192.168.60.60"
net="eth0"
# Count the distinct MAC addresses that answer an ARP request for the VIP
checkMac=$(arping -c 1 -I $net $vip | awk -F"[][]" '/Unicast/{print $2}' | sort -u | wc -l)
if [[ $checkMac -gt 1 ]]
then
    result=1    # more than one MAC holds the VIP: split-brain
else
    result=0
fi
echo $result
[root@Router ~]#

Add the item
Alert threshold settings
3.5 Simulating a Keepalived "split-brain"
On the backup (LVS-2), add a firewall rule that drops packets from the master (LVS-1). The VIP (192.168.60.60) then also appears on LVS-2, producing a "split-brain"

[root@LVS-2 ~]# iptables -t filter -A INPUT -s 192.168.60.80 -j DROP
[root@LVS-2 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 00:50:56:a3:07:c1 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:a3:4c:74 brd ff:ff:ff:ff:ff:ff
    inet 192.168.60.88/24 brd 192.168.60.255 scope global noprefixroute eth1
       valid_lft forever preferred_lft forever
    inet 192.168.60.60/24 scope global secondary eth1:1    # the VIP (192.168.60.60) is now on LVS-2 as well: split-brain
       valid_lft forever preferred_lft forever
    inet6 fe80::5745:92ae:e725:b669/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

At this point a Keepalived "split-brain" alert email arrives
After receiving the email, we clear the iptables rules again, and the VIP (192.168.60.60) disappears from LVS-2

[root@LVS-2 ~]# iptables -F
[root@LVS-2 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 00:50:56:a3:07:c1 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:a3:4c:74 brd ff:ff:ff:ff:ff:ff
    inet 192.168.60.88/24 brd 192.168.60.255 scope global noprefixroute eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::5745:92ae:e725:b669/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
[root@LVS-2 ~]#

At this point a recovery email arrives
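The simulation above drops all traffic from LVS-1 and later flushes the whole filter table. A narrower variant, shown here only as a sketch, blocks just the VRRP advertisements (IP protocol 112) from the peer and removes that single rule afterwards. The helper name vrrp_block_cmd is hypothetical, and it only prints the command so it can be reviewed before being run as root.

```shell
#!/bin/bash
# Sketch: build the iptables command that blocks or unblocks VRRP (IP protocol 112) from a peer.
# vrrp_block_cmd is a hypothetical helper; it prints the command instead of executing it.

vrrp_block_cmd() {
    local action="$1" peer="$2" flag
    case "$action" in
        add) flag="-I" ;;   # insert the DROP rule at the top of INPUT
        del) flag="-D" ;;   # delete exactly that rule again
        *)
            echo "usage: vrrp_block_cmd add|del PEER_IP" >&2
            return 1
            ;;
    esac
    echo "iptables ${flag} INPUT -s ${peer} -p 112 -j DROP"
}

# Usage on LVS-2 (192.168.60.80 is LVS-1 in this article):
#   $(vrrp_block_cmd add 192.168.60.80)    # start the simulated split-brain
#   $(vrrp_block_cmd del 192.168.60.80)    # end it without touching other rules
```

Deleting the single rule leaves any other firewall rules on the host in place, which iptables -F would remove.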