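RabbitMQ Split-Brain
You are welcome to support the author's new books, 《深入理解Kafka:核心設計與實踐原理》 and 《RabbitMQ實戰指南》, and to follow the author's WeChat public account: 朱小廝的博客.
Original article: https://honeypps.com/mq/rabbitmq-network-partition-1/
RabbitMQ 3.4.x can exhibit false-positive network partition detection (which can, in a sense, be called split-brain). This article reproduces the behavior experimentally, in the hope of saving readers some trouble.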
Preview
Two threads on the rabbitmq-users group (access may require a VPN):
https://groups.google.com/forum/#!topic/rabbitmq-users/dt8VFhMb2zM
https://groups.google.com/forum/#!topic/rabbitmq-users/06OQkYtLJd8
describe the split-brain behavior.
The first thread describes it as follows:
Hey Folk,
i just set up a rabbitmq cluster:
Three Nodes: Node A | Node B | Node C
All three nodes see each other (same erlang-cookie, mode: pause_minority).
rabbitmqctl cluster_status => shows status of all nodes on every instance.
Every queue is mirrored to the other nodes.
If i shutdown Node B, the following is happening:
 * Node A realizes Node B is offline.
 * Node A asks Node C for Node B status.
 * Node C answers: "I still have connection to Node B."
 * Node A shuts down itself.
 * Node C realizes some seconds later, that the connection to Node B is no more possible.
From three Nodes only one is left in case of an unexpected outage.
I would like to realize a setup where Node A and C keep the connection even if Node B goes offline. Is there any way to do this?
Michael Klishin (the second-largest contributor to rabbitmq-server) replied:
A known issue which is partially resolve in 3.4.x releases. 26474 can be related.
(From the RabbitMQ 3.4.2 release notes: "26474 prevent false positive detection of partial partitions (since 3.4.0)". In other words, false-positive network partition detection.)
Simon MacMullen (also a rabbitmq-server contributor):
So this is caused by the new partial partition detection in 3.4.x. It looks like it is too sensitive - C should only reply "yes" if it has positive confirmation that it can still talk to B, not if the connection just hasn't failed yet. This will be fixed in 3.4.2.
Hypothesis
From this we can hypothesize that RabbitMQ 3.4.0 has false-positive network partition detection and that RabbitMQ 3.4.2 fixes the bug.
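The difference Simon MacMullen describes can be illustrated with a toy model. This is only a sketch of the two reply policies, not RabbitMQ's implementation: the `PeerView` class, its fields, and the function names are hypothetical, and "positive confirmation" is modeled here as a successful explicit round trip to B.

```python
from dataclasses import dataclass


@dataclass
class PeerView:
    """What node C currently knows about its link to node B (hypothetical model)."""
    connection_marked_down: bool  # has C's own failure detection fired yet?
    ping_succeeded: bool          # did an explicit round trip to B just succeed?


def reply_too_sensitive(view: PeerView) -> bool:
    # Policy described for the buggy releases: answer "yes, I can still
    # reach B" as long as the connection has not been marked as failed.
    return not view.connection_marked_down


def reply_with_confirmation(view: PeerView) -> bool:
    # Policy described as the fix: answer "yes" only when there is
    # positive confirmation that B is reachable.
    return view.ping_succeeded


def a_suspects_partial_partition(c_says_b_reachable: bool) -> bool:
    # A has already lost B. If C claims it can still reach B, A concludes
    # that the partition is partial and that A itself is on the wrong side.
    return c_says_b_reachable


# B has just gone away; C has not noticed yet and has no fresh confirmation.
just_after_failure = PeerView(connection_marked_down=False, ping_succeeded=False)
false_positive = a_suspects_partial_partition(reply_too_sensitive(just_after_failure))
after_fix = a_suspects_partial_partition(reply_with_confirmation(just_after_failure))
```

In the window right after B disappears, the first policy makes A suspect a partial partition (`false_positive` is `True`), while the second does not (`after_fix` is `False`).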
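Method: run the same experiment on RabbitMQ 3.4.0, 3.4.1, 3.4.2, and 3.6.0. For each version, configure three nodes A, B, and C as one cluster, then stop networking on C and check whether A and B report a false network partition.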
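Each experiment comes down to comparing the running_nodes and partitions entries of the cluster_status output across nodes. A small helper for pulling those two fields out of the Erlang-term output format used in this article might look like the following. It is a regex-based sketch rather than a full Erlang term parser; the name `parse_cluster_status` and the shortened node names in `SAMPLE` are illustrative.

```python
import re

NODE_RE = re.compile(r"'([^']+)'")
RUNNING_RE = re.compile(r"\{running_nodes,\[(.*?)\]\}", re.S)
PARTITION_ENTRY_RE = re.compile(r"\{'([^']+)',\[([^\]]*)\]\}")


def parse_cluster_status(text):
    """Return (running_nodes, partitions) from cluster_status text.

    partitions maps a node name to the nodes it reports being partitioned from.
    """
    running_match = RUNNING_RE.search(text)
    running_nodes = NODE_RE.findall(running_match.group(1)) if running_match else []

    partitions = {}
    start = text.find("{partitions,")
    if start != -1:
        # Only look at the text after the partitions key, so that entries
        # from other sections are never mistaken for partition entries.
        for node, peers in PARTITION_ENTRY_RE.findall(text[start:]):
            partitions[node] = NODE_RE.findall(peers)
    return running_nodes, partitions


# Output in the same shape as the listings in this article (node names shortened).
SAMPLE = """Cluster status of node 'rabbit@a' ...
[{nodes,[{disc,['rabbit@c','rabbit@b','rabbit@a']}]},
 {running_nodes,['rabbit@a']},
 {cluster_name,<<"rabbit@a">>},
 {partitions,[{'rabbit@a',['rabbit@b']}]}]"""

running, partitions = parse_cluster_status(SAMPLE)
```

A non-empty `partitions` result while the third node is still offline is the false positive this article is looking for.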
Verification
Experiment 1
RabbitMQ version: 3.4.0
RabbitMQ node configuration
Three nodes, A, B, and C:
A: rabbit@zhuzhonghua2-fqawb
B: rabbit@hiddenzhu-8drdc
C: rabbit@hidden-local
B join_cluster A; C join_cluster A
Check cluster_status (rabbitmqctl cluster_status):
Cluster status of node 'rabbit@zhuzhonghua2-fqawb' ...
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc','rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc','rabbit@zhuzhonghua2-fqawb']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[]}]
On node C, run service network stop.
Check cluster_status on node A.
Check cluster_status on node A again:
Cluster status of node 'rabbit@zhuzhonghua2-fqawb' ...
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc','rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@zhuzhonghua2-fqawb']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@zhuzhonghua2-fqawb',['rabbit@hiddenzhu-8drdc']}]}]
Check cluster_status on node B:
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc','rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hiddenzhu-8drdc']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@hiddenzhu-8drdc',['rabbit@zhuzhonghua2-fqawb']}]}]
Conclusion: A and B each report a partition from the other while C is still offline. A genuine network partition can only be detected once connectivity is restored, so this is a false positive.
On node C, run service network start.
Check cluster_status on node A.
Check cluster_status on node B:
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc','rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hiddenzhu-8drdc']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@hiddenzhu-8drdc',['rabbit@zhuzhonghua2-fqawb']}]}]
Check cluster_status on node C:
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc','rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hidden-local']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@hidden-local',['rabbit@zhuzhonghua2-fqawb']}]}]
Experiment 2
RabbitMQ version: 3.4.1
Node configuration as above (B join_cluster A, C join_cluster A)
Check the node status.
On node C, run service network stop.
Check cluster_status on node A.
Check cluster_status on node B:
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc','rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hiddenzhu-8drdc']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@hiddenzhu-8drdc',['rabbit@zhuzhonghua2-fqawb']}]}]
Conclusion: reproduced.
On node C, run service network start.
Check cluster_status on node A.
Check cluster_status on node B:
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc','rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hiddenzhu-8drdc']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@hiddenzhu-8drdc',['rabbit@zhuzhonghua2-fqawb']}]}]
Check cluster_status on node C:
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc','rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hidden-local']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@hidden-local',['rabbit@zhuzhonghua2-fqawb']}]}]
Experiment 3
RabbitMQ version: 3.4.2 (version 3.6.0 behaves the same)
Node configuration as above (B join_cluster A, C join_cluster A)
Check the node status.
On node C, run service network stop.
Check cluster_status on node A.
Check cluster_status on node B:
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc','rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@zhuzhonghua2-fqawb','rabbit@hiddenzhu-8drdc']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[]}]
Conclusion: not reproduced.
On node C, run service network start.
Check cluster_status on node A.
Check cluster_status on node B:
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc','rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@zhuzhonghua2-fqawb','rabbit@hiddenzhu-8drdc']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@zhuzhonghua2-fqawb',['rabbit@hidden-local']}]}]
Check cluster_status on node C:
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc','rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hidden-local']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@hidden-local',['rabbit@zhuzhonghua2-fqawb']}]}]
Conclusion
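The version hypothesis is essentially confirmed. To avoid false-positive network partition detection, readers running RabbitMQ are advised to upgrade and to avoid versions 3.4.0 and 3.4.1.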
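As a quick guard, a deployment script could refuse the two releases in which the experiments above reproduced the problem. The sketch below only encodes this article's findings; the names `AFFECTED_VERSIONS` and `is_affected` are illustrative, and versions not tested here (anything other than 3.4.0, 3.4.1, 3.4.2, and 3.6.0) are simply reported as not affected.

```python
# Versions in which this article's experiments reproduced the false positive.
AFFECTED_VERSIONS = {(3, 4, 0), (3, 4, 1)}


def parse_version(version):
    """Turn a string such as '3.4.1' into a tuple such as (3, 4, 1)."""
    return tuple(int(part) for part in version.strip().split(".")[:3])


def is_affected(version):
    """True if the version is one of the releases that reproduced the bug."""
    return parse_version(version) in AFFECTED_VERSIONS
```

For example, `is_affected("3.4.1")` is `True`, while `is_affected("3.4.2")` and `is_affected("3.6.0")` are `False`.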
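Network Partitions
An article on network partitions (RabbitMQ 網絡分區問題) introduces them as follows:
RabbitMQ clusters do not tolerate network partitions very well. Frequent partitions cause problems, the most obvious being split-brain.
The official documentation puts it this way:
RabbitMQ clusters do not tolerate network partitions well. If you are thinking of clustering across a WAN, don't. You should use federation or the shovel instead.
From this we can see that clustering should not be used across a WAN; federation or shovel should be used instead.
Even on a LAN, however, network partitions cannot be ruled out entirely: a failure in network equipment (a relay device or a network card, for example) can also cause one.
Network partition detected
Mnesia reports that this RabbitMQ cluster has experienced a network partition. This is a dangerous situation. RabbitMQ clusters should not be installed on networks which can experience partitions.
When a network partition occurs, the nodes in each partition assume that every node outside their own partition is down, and operations on queues, exchanges, and bindings take effect only within the current partition. With RabbitMQ's default configuration, the cluster does not repair the effects of the partition and recover on its own, even after the network is restored. RabbitMQ (3.1+) detects network partitions automatically and provides a configuration option for handling them:
[
 {rabbit,[{tcp_listeners,[5672]},
          {cluster_partition_handling, ignore}]}
].
RabbitMQ provides four settings for this option; for details, see http://blog.csdn.net/u013256816/article/details/73757884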
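One of these settings, pause_minority, is the mode used in the forum thread quoted earlier. As documented by RabbitMQ, a node in this mode pauses itself when it determines that it is in a minority, meaning it can see half or fewer of the cluster's nodes. A minimal sketch of that rule follows; the function name and parameters are illustrative, and this is not RabbitMQ's own code.

```python
def should_pause(visible_nodes, total_nodes):
    """Apply the pause_minority rule.

    visible_nodes counts the node itself plus every peer it can still reach.
    The node pauses when it sees half or fewer of all cluster nodes.
    """
    if not 1 <= visible_nodes <= total_nodes:
        raise ValueError("visible_nodes must be between 1 and total_nodes")
    return visible_nodes * 2 <= total_nodes


# Three-node cluster, B really is down: A and C still see each other.
healthy_majority = should_pause(visible_nodes=2, total_nodes=3)

# Same cluster, but A wrongly believes it has been cut off from everyone.
isolated_node = should_pause(visible_nodes=1, total_nodes=3)
```

This is why the false positive is costly in the forum scenario: with B down, A and C should keep running as a two-of-three majority, but a node that wrongly considers itself isolated pauses itself.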
References:
● RabbitMQ official documentation
● Network partitions
● Split-brain problem