k8s Series 08: PureLB as a Load Balancer
This post deploys PureLB v0.6.1 as the LoadBalancer for a vanilla Kubernetes cluster, covering two deployment approaches: PureLB's Layer2 mode and its ECMP mode. PureLB's ECMP mode works with several routing protocols; BGP is used here because it is the one most commonly seen with Kubernetes. Since BGP itself is a fairly complex subject, only a minimal BGP configuration is involved.
The Kubernetes cluster used in this article is v1.23.6, deployed on CentOS 7 with docker and the cilium CNI. I have written earlier posts on Kubernetes basics and cluster setup; readers who need that background can refer to them.
1. How It Works
PureLB works much like the other bare-metal load balancers (MetalLB, OpenELB) and can also be roughly divided into a Layer2 mode and a BGP mode, but both of PureLB's modes differ significantly from their MetalLB/OpenELB counterparts.
More simply, PureLB either uses the LoadBalancing functionality provided natively by k8s and/or combines k8s LoadBalancing with the routers Equal Cost Multipath (ECMP) load-balancing.
- In MetalLB/OpenELB, "BGP mode" means running the BGP protocol to achieve ECMP and therefore high availability. Because MetalLB/OpenELB support only BGP as a routing protocol, the mode is called BGP mode (or, equivalently, ECMP mode).
- PureLB adds a new virtual network interface on each Kubernetes node, which lets us see the cluster's LoadBalancer VIPs directly through the Linux network stack. Because it relies on the Linux network stack, PureLB can achieve ECMP with any routing protocol (BGP, OSPF, and so on), so this mode is better described as ECMP mode rather than merely BGP mode.
- The Layer2 mode of MetalLB/OpenELB attracts all traffic for every VIP to a single node via ARP/NDP, so all traffic flows through that one node: a textbook case of putting all the eggs in one basket.
- PureLB's Layer2 mode is also different from MetalLB/OpenELB: it elects a node per VIP, so multiple VIPs are spread across different nodes in the cluster. Traffic is balanced across the nodes as much as possible, the eggs end up in several baskets, and a severe single point of failure is avoided.
Explaining how PureLB works is fairly simple; take a look at the official architecture diagram:
Instead of thinking of PureLB as advertising services, think of PureLB as attracting packets to allocated addresses with KubeProxy forwarding those packets within the cluster via the Container Network Interface Network (POD Network) between nodes.
- Allocator: watches the API for LoadBalancer-type Services and is responsible for allocating IP addresses.
- LBnodeagent: deployed as a DaemonSet on every node that can expose Services and attract traffic; it watches for Service changes and is responsible for adding the VIP to a local physical or virtual interface.
- KubeProxy: a built-in Kubernetes component rather than part of PureLB, but PureLB depends on it to work. Once a request for a VIP reaches a specific node, kube-proxy is responsible for forwarding it to the corresponding pod.
Unlike MetalLB and OpenELB, PureLB does not need to send GARP/GNDP packets itself; what it does instead is add the IP address to a network interface on the Kubernetes node. Concretely:

- If the allocated address belongs to a network that is local to the node (the Layer2 case), lbnodeagent adds it to the node's physical interface (eth0 in this article) on the single node elected via memberlist, and the Linux network stack itself answers ARP for that address.
- If the allocated address is not part of any local network (the routed case), lbnodeagent adds it to the kube-lb0 virtual interface on every node, where routing software such as bird can pick it up and advertise it to the upstream routers.
From this logic it is not hard to see that PureLB's design reuses existing infrastructure wherever possible. On the one hand this keeps the development effort down and avoids reinventing the wheel; on the other hand it gives users more integration choices and lowers the barrier to entry.
2. Layer2 Mode
2.1 Preparation
Before deploying PureLB we need a little preparation, mainly a port check and the ARP parameter change.
- PureLB uses CRDs, so a vanilla Kubernetes cluster needs to be at least v1.15 to support them.

- PureLB also uses Memberlist for leader election, so make sure port 7934 (both TCP and UDP) is not occupied, otherwise split-brain can occur. (A firewalld example is given at the end of this subsection.)

  PureLB uses a library called Memberlist to provide local network address failover faster than standard k8s timeouts would require. If you plan to use local network address and have applied firewalls to your nodes, it is necessary to add a rule to allow the memberlist election to occur. The port used by Memberlist in PureLB is Port 7934 UDP/TCP, memberlist uses both TCP and UDP, open both.

- Adjust the ARP parameters: as with the other open-source LoadBalancers, set kube-proxy's strictARP: true.

  Once strictARP is enabled in the cluster's IPVS configuration, kube-proxy stops answering ARP requests for addresses on interfaces other than kube-ipvs0.

  Enabling strict ARP amounts to setting arp_ignore to 1 and arp_announce to 2, the same idea as the real-server configuration in LVS DR mode; see the explanation in my earlier article.
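For reference only, the kernel parameters that strict ARP corresponds to look roughly like this; kube-proxy applies the equivalent settings itself once strictARP is enabled, so these commands are illustration rather than something to run by hand:

# what strictARP boils down to at the sysctl level (illustration only)
sysctl -w net.ipv4.conf.all.arp_ignore=1      # reply to ARP only when the target IP is configured on the incoming interface
sysctl -w net.ipv4.conf.all.arp_announce=2    # always pick the most appropriate local source address for ARP announcements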
# Check the current strictARP setting in kube-proxy
$ kubectl get configmap -n kube-system kube-proxy -o yaml | grep strictARP
      strictARP: false

# Option 1: edit the configmap by hand and set strictARP to true
$ kubectl edit configmap -n kube-system kube-proxy
configmap/kube-proxy edited

# Option 2: patch it with sed, previewing the diff first
$ kubectl get configmap kube-proxy -n kube-system -o yaml | sed -e "s/strictARP: false/strictARP: true/" | kubectl diff -f - -n kube-system

# Apply once the diff looks correct
$ kubectl get configmap kube-proxy -n kube-system -o yaml | sed -e "s/strictARP: false/strictARP: true/" | kubectl apply -f - -n kube-system

# Restart kube-proxy so the change takes effect
$ kubectl rollout restart ds kube-proxy -n kube-system

# Confirm the setting
$ kubectl get configmap -n kube-system kube-proxy -o yaml | grep strictARP
      strictARP: true
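Before moving on: if firewalld is enabled on the nodes (common on CentOS 7), also open the memberlist port mentioned earlier. An example assuming firewalld is in use:

# run on every node: allow PureLB's memberlist election traffic
firewall-cmd --permanent --add-port=7934/tcp
firewall-cmd --permanent --add-port=7934/udp
firewall-cmd --reload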
2.2 Deploying PureLB
As usual we deploy with the manifest file; the project also provides other installation methods such as helm.
$ wget https://gitlab.com/api/v4/projects/purelb%2Fpurelb/packages/generic/manifest/0.0.1/purelb-complete.yaml
$ kubectl apply -f purelb/purelb-complete.yaml
namespace/purelb created
customresourcedefinition.apiextensions.k8s.io/lbnodeagents.purelb.io created
customresourcedefinition.apiextensions.k8s.io/servicegroups.purelb.io created
serviceaccount/allocator created
serviceaccount/lbnodeagent created
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
podsecuritypolicy.policy/allocator created
podsecuritypolicy.policy/lbnodeagent created
role.rbac.authorization.k8s.io/pod-lister created
clusterrole.rbac.authorization.k8s.io/purelb:allocator created
clusterrole.rbac.authorization.k8s.io/purelb:lbnodeagent created
rolebinding.rbac.authorization.k8s.io/pod-lister created
clusterrolebinding.rbac.authorization.k8s.io/purelb:allocator created
clusterrolebinding.rbac.authorization.k8s.io/purelb:lbnodeagent created
deployment.apps/allocator created
daemonset.apps/lbnodeagent created
error: unable to recognize "purelb/purelb-complete.yaml": no matches for kind "LBNodeAgent" in version "purelb.io/v1"

$ kubectl apply -f purelb/purelb-complete.yaml
namespace/purelb unchanged
customresourcedefinition.apiextensions.k8s.io/lbnodeagents.purelb.io configured
customresourcedefinition.apiextensions.k8s.io/servicegroups.purelb.io configured
serviceaccount/allocator unchanged
serviceaccount/lbnodeagent unchanged
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
podsecuritypolicy.policy/allocator configured
podsecuritypolicy.policy/lbnodeagent configured
role.rbac.authorization.k8s.io/pod-lister unchanged
clusterrole.rbac.authorization.k8s.io/purelb:allocator unchanged
clusterrole.rbac.authorization.k8s.io/purelb:lbnodeagent unchanged
rolebinding.rbac.authorization.k8s.io/pod-lister unchanged
clusterrolebinding.rbac.authorization.k8s.io/purelb:allocator unchanged
clusterrolebinding.rbac.authorization.k8s.io/purelb:lbnodeagent unchanged
deployment.apps/allocator unchanged
daemonset.apps/lbnodeagent unchanged
lbnodeagent.purelb.io/default created

Note that the first apply of the manifest can fail, as the official documentation explains:
Please note that due to Kubernetes’ eventually-consistent architecture the first application of this manifest can fail. This happens because the manifest both defines a Custom Resource Definition and creates a resource using that definition. If this happens then apply the manifest again and it should succeed because Kubernetes will have processed the definition in the mean time.
Check the deployed components:
$ kubectl get pods -n purelb -o wide
NAME                         READY   STATUS    RESTARTS   AGE   IP             NODE                                       NOMINATED NODE   READINESS GATES
allocator-5bf9ddbf9b-p976d   1/1     Running   0          2m    10.0.2.140     tiny-cilium-worker-188-12.k8s.tcinternal   <none>           <none>
lbnodeagent-df2hn            1/1     Running   0          2m    10.31.188.12   tiny-cilium-worker-188-12.k8s.tcinternal   <none>           <none>
lbnodeagent-jxn9h            1/1     Running   0          2m    10.31.188.1    tiny-cilium-master-188-1.k8s.tcinternal    <none>           <none>
lbnodeagent-xn8dz            1/1     Running   0          2m    10.31.188.11   tiny-cilium-worker-188-11.k8s.tcinternal   <none>           <none>

$ kubectl get deploy -n purelb
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
allocator   1/1     1            1           10m

[root@tiny-cilium-master-188-1 purelb]# kubectl get ds -n purelb
NAME          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
lbnodeagent   3         3         3       3            3           kubernetes.io/os=linux   10m

$ kubectl get crd | grep purelb
lbnodeagents.purelb.io     2022-05-20T06:42:01Z
servicegroups.purelb.io    2022-05-20T06:42:01Z

$ kubectl get --namespace=purelb servicegroups.purelb.io
No resources found in purelb namespace.

$ kubectl get --namespace=purelb lbnodeagent.purelb.io
NAME      AGE
default   55m

Unlike MetalLB/OpenELB, PureLB uses its own dedicated virtual interface, kube-lb0, rather than the default kube-ipvs0 interface:
$ ip addr show kube-lb0
15: kube-lb0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 12:27:b1:48:4e:3a brd ff:ff:ff:ff:ff:ff
    inet6 fe80::1027:b1ff:fe48:4e3a/64 scope link
       valid_lft forever preferred_lft forever

2.3 Configuring PureLB
As we saw during deployment, PureLB creates two CRDs: lbnodeagents.purelb.io and servicegroups.purelb.io.
$ kubectl api-resources --api-group=purelb.io
NAME            SHORTNAMES   APIVERSION     NAMESPACED   KIND
lbnodeagents    lbna,lbnas   purelb.io/v1   true         LBNodeAgent
servicegroups   sg,sgs       purelb.io/v1   true         ServiceGroup

2.3.1 lbnodeagents.purelb.io
An lbnodeagent named default is created out of the box; let's look at its configuration fields.
$ kubectl describe --namespace=purelb lbnodeagent.purelb.io/default
Name:         default
Namespace:    purelb
Labels:       <none>
Annotations:  <none>
API Version:  purelb.io/v1
Kind:         LBNodeAgent
Metadata:
  Creation Timestamp:  2022-05-20T06:42:23Z
  Generation:          1
  Managed Fields:
    API Version:  purelb.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:local:
          .:
          f:extlbint:
          f:localint:
    Manager:         kubectl-client-side-apply
    Operation:       Update
    Time:            2022-05-20T06:42:23Z
  Resource Version:  1765489
  UID:               59f0ad8c-1024-4432-8f95-9ad574b28fff
Spec:
  Local:
    Extlbint:  kube-lb0
    Localint:  default
Events:        <none>

Note the Extlbint and Localint fields under Spec.Local:
- Extlbint specifies the name of the virtual interface PureLB uses, kube-lb0 by default. If you change it to a custom name, remember to update the bird configuration accordingly.
- Localint specifies the physical interface used for actual traffic. By default it is matched with a regular expression, but it can also be set explicitly; on single-NIC nodes there is usually no need to change it.
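If you do want to change these two fields, the place to do it is the LBNodeAgent resource itself. Below is a minimal sketch based on the fields shown in the describe output above; the interface names are placeholders for illustration, and in practice you would edit or re-apply the existing default resource:

apiVersion: purelb.io/v1
kind: LBNodeAgent
metadata:
  name: default
  namespace: purelb
spec:
  local:
    extlbint: kube-lb1     # custom virtual interface name; keep the bird configuration in sync
    localint: ens192       # pin the physical interface explicitly instead of relying on the default regex match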
2.3.2 servicegroups.purelb.io
No servicegroup is created by default, so we have to configure one by hand. Note that PureLB also supports IPv6 and the configuration is identical to IPv4; there is no need for a v6pool here, so only an IPv4 pool is configured (an IPv6 sketch is included at the end of this subsection for reference).
apiVersion: purelb.io/v1
kind: ServiceGroup
metadata:
  name: layer2-ippool
  namespace: purelb
spec:
  local:
    v4pool:
      subnet: '10.31.188.64/26'
      pool: '10.31.188.64-10.31.188.126'
      aggregation: /32

Then we apply it and check:
$ kubectl apply -f purelb-ipam.yaml
servicegroup.purelb.io/layer2-ippool created

$ kubectl get sg -n purelb
NAME            AGE
layer2-ippool   50s

$ kubectl describe sg -n purelb
Name:         layer2-ippool
Namespace:    purelb
Labels:       <none>
Annotations:  <none>
API Version:  purelb.io/v1
Kind:         ServiceGroup
Metadata:
  Creation Timestamp:  2022-05-20T07:58:32Z
  Generation:          1
  Managed Fields:
    API Version:  purelb.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:local:
          .:
          f:v4pool:
            .:
            f:aggregation:
            f:pool:
            f:subnet:
    Manager:         kubectl-client-side-apply
    Operation:       Update
    Time:            2022-05-20T07:58:32Z
  Resource Version:  1774182
  UID:               92422ea9-231d-4280-a8b5-ec6c61605dd9
Spec:
  Local:
    v4pool:
      Aggregation:  /32
      Pool:         10.31.188.64-10.31.188.126
      Subnet:       10.31.188.64/26
Events:
  Type    Reason  Age    From              Message
  ----    ------  ----   ----              -------
  Normal  Parsed  4m13s  purelb-allocator  ServiceGroup parsed successfully
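As noted above, an IPv6 pool is configured the same way. It is not needed in this article, but for reference a sketch might look like the following, assuming the v6pool block mirrors v4pool's subnet/pool/aggregation fields as the PureLB docs suggest (the name and addresses are made up):

apiVersion: purelb.io/v1
kind: ServiceGroup
metadata:
  name: layer2-ippool-v6        # hypothetical name
  namespace: purelb
spec:
  local:
    v6pool:                     # assumption: same shape as v4pool
      subnet: 'fd00:10:31::/64'
      pool: 'fd00:10:31::100-fd00:10:31::1ff'
      aggregation: /128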
2.4 Deploying a Service

Some of PureLB's CRD-driven features are enabled by adding annotations to the Service by hand. Here we only need the purelb.io/service-group annotation to pick the IP pool:
annotations:
  purelb.io/service-group: layer2-ippool

The complete manifest for the test services is as follows:
apiVersion: v1
kind: Namespace
metadata:
  name: nginx-quic

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-lb
  namespace: nginx-quic
spec:
  selector:
    matchLabels:
      app: nginx-lb
  replicas: 4
  template:
    metadata:
      labels:
        app: nginx-lb
    spec:
      containers:
      - name: nginx-lb
        image: tinychen777/nginx-quic:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80

---

apiVersion: v1
kind: Service
metadata:
  annotations:
    purelb.io/service-group: layer2-ippool
  name: nginx-lb-service
  namespace: nginx-quic
spec:
  allocateLoadBalancerNodePorts: false
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  selector:
    app: nginx-lb
  ports:
  - protocol: TCP
    port: 80          # match for service access port
    targetPort: 80    # match for pod access port
  type: LoadBalancer

---

apiVersion: v1
kind: Service
metadata:
  annotations:
    purelb.io/service-group: layer2-ippool
  name: nginx-lb2-service
  namespace: nginx-quic
spec:
  allocateLoadBalancerNodePorts: false
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  selector:
    app: nginx-lb
  ports:
  - protocol: TCP
    port: 80          # match for service access port
    targetPort: 80    # match for pod access port
  type: LoadBalancer

---

apiVersion: v1
kind: Service
metadata:
  annotations:
    purelb.io/service-group: layer2-ippool
  name: nginx-lb3-service
  namespace: nginx-quic
spec:
  allocateLoadBalancerNodePorts: false
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  selector:
    app: nginx-lb
  ports:
  - protocol: TCP
    port: 80          # match for service access port
    targetPort: 80    # match for pod access port
  type: LoadBalancer

Once everything looks right we apply it, which creates namespace/nginx-quic, deployment.apps/nginx-lb, service/nginx-lb-service, service/nginx-lb2-service and service/nginx-lb3-service:
$ kubectl apply -f nginx-quic-lb.yaml
namespace/nginx-quic unchanged
deployment.apps/nginx-lb created
service/nginx-lb-service created
service/nginx-lb2-service created
service/nginx-lb3-service created

$ kubectl get svc -n nginx-quic
NAME                TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)   AGE
nginx-lb-service    LoadBalancer   10.188.54.81    10.31.188.64   80/TCP    101s
nginx-lb2-service   LoadBalancer   10.188.34.171   10.31.188.65   80/TCP    101s
nginx-lb3-service   LoadBalancer   10.188.6.24     10.31.188.66   80/TCP    101s

The Service events tell us which node each VIP landed on:
$ kubectl describe service nginx-lb-service -n nginx-quic
Name:                     nginx-lb-service
Namespace:                nginx-quic
Labels:                   <none>
Annotations:              purelb.io/allocated-by: PureLB
                          purelb.io/allocated-from: layer2-ippool
                          purelb.io/announcing-IPv4: tiny-cilium-worker-188-11.k8s.tcinternal,eth0
                          purelb.io/service-group: layer2-ippool
Selector:                 app=nginx-lb
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.188.54.81
IPs:                      10.188.54.81
LoadBalancer Ingress:     10.31.188.64
Port:                     <unset>  80/TCP
TargetPort:               80/TCP
Endpoints:                10.0.1.45:80,10.0.1.49:80,10.0.2.181:80 + 1 more...
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason           Age                   From                Message
  ----    ------           ----                  ----                -------
  Normal  AddressAssigned  3m12s                 purelb-allocator    Assigned {Ingress:[{IP:10.31.188.64 Hostname: Ports:[]}]} from pool layer2-ippool
  Normal  AnnouncingLocal  3m8s (x7 over 3m12s)  purelb-lbnodeagent  Node tiny-cilium-worker-188-11.k8s.tcinternal announcing 10.31.188.64 on interface eth0

$ kubectl describe service nginx-lb2-service -n nginx-quic
Name:                     nginx-lb2-service
Namespace:                nginx-quic
Labels:                   <none>
Annotations:              purelb.io/allocated-by: PureLB
                          purelb.io/allocated-from: layer2-ippool
                          purelb.io/announcing-IPv4: tiny-cilium-master-188-1.k8s.tcinternal,eth0
                          purelb.io/service-group: layer2-ippool
Selector:                 app=nginx-lb
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.188.34.171
IPs:                      10.188.34.171
LoadBalancer Ingress:     10.31.188.65
Port:                     <unset>  80/TCP
TargetPort:               80/TCP
Endpoints:                10.0.1.45:80,10.0.1.49:80,10.0.2.181:80 + 1 more...
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason           Age                    From                Message
  ----    ------           ----                   ----                -------
  Normal  AddressAssigned  4m20s                  purelb-allocator    Assigned {Ingress:[{IP:10.31.188.65 Hostname: Ports:[]}]} from pool layer2-ippool
  Normal  AnnouncingLocal  4m17s (x5 over 4m20s)  purelb-lbnodeagent  Node tiny-cilium-master-188-1.k8s.tcinternal announcing 10.31.188.65 on interface eth0

$ kubectl describe service nginx-lb3-service -n nginx-quic
Name:                     nginx-lb3-service
Namespace:                nginx-quic
Labels:                   <none>
Annotations:              purelb.io/allocated-by: PureLB
                          purelb.io/allocated-from: layer2-ippool
                          purelb.io/announcing-IPv4: tiny-cilium-worker-188-11.k8s.tcinternal,eth0
                          purelb.io/service-group: layer2-ippool
Selector:                 app=nginx-lb
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.188.6.24
IPs:                      10.188.6.24
LoadBalancer Ingress:     10.31.188.66
Port:                     <unset>  80/TCP
TargetPort:               80/TCP
Endpoints:                10.0.1.45:80,10.0.1.49:80,10.0.2.181:80 + 1 more...
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason           Age                    From                Message
  ----    ------           ----                   ----                -------
  Normal  AddressAssigned  4m33s                  purelb-allocator    Assigned {Ingress:[{IP:10.31.188.66 Hostname: Ports:[]}]} from pool layer2-ippool
  Normal  AnnouncingLocal  4m29s (x6 over 4m33s)  purelb-lbnodeagent  Node tiny-cilium-worker-188-11.k8s.tcinternal announcing 10.31.188.66 on interface eth0

From another machine on the same LAN we can see that the MAC addresses of the three VIPs are not all the same, which matches the events above:
$ ip neigh | grep 10.31.188.6
10.31.188.65 dev eth0 lladdr 52:54:00:69:0a:ab REACHABLE
10.31.188.64 dev eth0 lladdr 52:54:00:3c:88:cb REACHABLE
10.31.188.66 dev eth0 lladdr 52:54:00:3c:88:cb REACHABLE

Looking at the addresses on the nodes themselves: besides kube-ipvs0, which carries every VIP on every node just as with the other tools, the biggest difference from MetalLB/OpenELB is that PureLB also puts each Service's VIP precisely on the physical interface of the node that announces it.
$ ansible cilium -m command -a "ip addr show eth0"
10.31.188.11 | CHANGED | rc=0 >>
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:3c:88:cb brd ff:ff:ff:ff:ff:ff
    inet 10.31.188.11/16 brd 10.31.255.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet 10.31.188.64/16 brd 10.31.255.255 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet 10.31.188.66/16 brd 10.31.255.255 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe3c:88cb/64 scope link
       valid_lft forever preferred_lft forever
10.31.188.12 | CHANGED | rc=0 >>
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:32:a7:42 brd ff:ff:ff:ff:ff:ff
    inet 10.31.188.12/16 brd 10.31.255.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe32:a742/64 scope link
       valid_lft forever preferred_lft forever
10.31.188.1 | CHANGED | rc=0 >>
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:69:0a:ab brd ff:ff:ff:ff:ff:ff
    inet 10.31.188.1/16 brd 10.31.255.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet 10.31.188.65/16 brd 10.31.255.255 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe69:aab/64 scope link
       valid_lft forever preferred_lft forever

2.5 Specifying a VIP
Likewise, if a specific address is required we can set the spec.loadBalancerIP field to choose the VIP:
apiVersion: v1
kind: Service
metadata:
  annotations:
    purelb.io/service-group: layer2-ippool
  name: nginx-lb4-service
  namespace: nginx-quic
spec:
  allocateLoadBalancerNodePorts: false
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  selector:
    app: nginx-lb
  ports:
  - protocol: TCP
    port: 80          # match for service access port
    targetPort: 80    # match for pod access port
  type: LoadBalancer
  loadBalancerIP: 10.31.188.100

2.6 About NodePort
PureLB works with the allocateLoadBalancerNodePorts Service field, so setting allocateLoadBalancerNodePorts: false disables the automatic NodePort allocation for a LoadBalancer Service.
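A quick way to confirm, using the nginx-lb-service created above: list the Service's ports and check that no nodePort was assigned (the jsonpath output should be empty, and the PORT(S) column shows plain 80/TCP with no ":<nodePort>" suffix).

$ kubectl get svc nginx-lb-service -n nginx-quic -o jsonpath='{.spec.ports[*].nodePort}'

$ kubectl get svc nginx-lb-service -n nginx-quic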
3. ECMP Mode
Because PureLB uses the Linux network stack, there are far more choices for implementing ECMP. Here we follow the official approach and use BGP together with bird. The addressing plan for this section is as follows:
| Address / Subnet | Role |
| --- | --- |
| 10.31.188.1 | tiny-cilium-master-188-1.k8s.tcinternal |
| 10.31.188.11 | tiny-cilium-worker-188-11.k8s.tcinternal |
| 10.31.188.12 | tiny-cilium-worker-188-12.k8s.tcinternal |
| 10.188.0.0/18 | serviceSubnet |
| 10.31.254.251 | BGP-Router (frr) |
| 10.189.0.0/16 | PureLB-BGP-IPpool |
PureLB's ASN is 64515 and the router's ASN is 64512.
3.1 Preparation
First pull the official bird_router repository (hosted on GitLab) to the local machine; the only two files actually needed for deployment are bird-cm.yml and bird.yml.
$ git clone https://gitlab.com/purelb/bird_router.git
$ ls bird*yml
bird-cm.yml  bird.yml

Next we make a few changes. Start with the configmap file bird-cm.yml, where we mainly need to modify the description, as and neighbor fields:
- description: a description of the router we establish the BGP session with; I usually name it after the IP address with dashes instead of dots.
- as: our own ASN.
- neighbor: the IP address of the router we establish the BGP session with.
- namespace: upstream defaults to a dedicated router namespace; here we change it to purelb so everything lives in one place.
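To make those fields concrete, here is a minimal, hedged sketch of a bird 2.x BGP stanza using this article's values. It only illustrates where description, as and neighbor end up; it is not the literal content of the upstream bird-cm.yml, which contains additional protocols and filters:

protocol bgp uplink {                  # "uplink" is just a placeholder name
  description "10-31-254-251";         # free-form text describing the peer router
  local as 64515;                      # our own ASN (the PureLB side)
  neighbor 10.31.254.251 as 64512;     # the peer router's address and ASN
  ipv4 {
    import none;                       # we only advertise; nothing is learned here
    export all;                        # placeholder filter; upstream restricts this to the kube-lb0 routes
  };
}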
Next is bird's DaemonSet manifest bird.yml. These changes are optional and you can adapt them to your own needs:
- namespace: same as above, changed from the default router namespace to purelb.
- imagePullPolicy: upstream defaults to Always; here we change it to IfNotPresent.
3.2 Deploying bird
Deployment is straightforward: just apply the two files above. Since we changed the namespace to purelb, the step that creates the router namespace can be skipped.
# Create the router namespace (skipped here, since we use the purelb namespace)
$ kubectl create namespace router

# Apply the edited configmap
$ kubectl apply -f bird-cm.yml

# Deploy the Bird Router
$ kubectl apply -f bird.yml

Now check the deployment status:
$ kubectl get ds -n purelb
NAME          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
bird          2         2         2       0            2           <none>                   27m
lbnodeagent   3         3         3       3            3           kubernetes.io/os=linux   42h

$ kubectl get cm -n purelb
NAME               DATA   AGE
bird-cm            1      28m
kube-root-ca.crt   1      42h

$ kubectl get pods -n purelb
NAME                         READY   STATUS    RESTARTS   AGE
allocator-5bf9ddbf9b-p976d   1/1     Running   0          42h
bird-4qtrm                   1/1     Running   0          16s
bird-z9cq2                   1/1     Running   0          49s
lbnodeagent-df2hn            1/1     Running   0          42h
lbnodeagent-jxn9h            1/1     Running   0          42h
lbnodeagent-xn8dz            1/1     Running   0          42h

By default bird is not scheduled onto the master node. This keeps the master out of the ECMP load balancing, reducing the network traffic it handles and improving its stability.
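This behaviour normally comes from the control-plane taint rather than from anything PureLB-specific: a DaemonSet pod only lands on a tainted master if it tolerates the taint, and the bird DaemonSet does not. For illustration, if you did want bird on the master, the toleration to add to bird.yml would look roughly like this (assuming the usual node-role.kubernetes.io/master:NoSchedule taint):

# excerpt of bird.yml's pod template (sketch; only add this if bird should run on the master)
spec:
  template:
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule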
3.3 Configuring the Router
On the router side we again use frr:
root@tiny-openwrt-plus:~# cat /etc/frr/frr.conf
frr version 8.2.2
frr defaults traditional
hostname tiny-openwrt-plus
log file /home/frr/frr.log
log syslog
password zebra
!
router bgp 64512
 bgp router-id 10.31.254.251
 no bgp ebgp-requires-policy
 !
 neighbor 10.31.188.11 remote-as 64515
 neighbor 10.31.188.11 description 10-31-188-11
 neighbor 10.31.188.12 remote-as 64515
 neighbor 10.31.188.12 description 10-31-188-12
 !
 !
 address-family ipv4 unicast
  !
  maximum-paths 3
 exit-address-family
exit
!
access-list vty seq 5 permit 127.0.0.0/8
access-list vty seq 10 deny any
!
line vty
 access-class vty
exit
!

After the configuration is in place, restart the service and check the BGP state on the router side. Seeing the sessions with the two worker nodes established confirms the configuration is correct:
tiny-openwrt-plus# show ip bgp summary

IPv4 Unicast Summary (VRF default):
BGP router identifier 10.31.254.251, local AS number 64512 vrf-id 0

Neighbor        V    AS      MsgRcvd   MsgSent   TblVer   InQ   OutQ   Up/Down    State/PfxRcd   PfxSnt   Desc
10.31.188.11    4    64515   3         4         0        0     0      00:00:13   0              3        10-31-188-11
10.31.188.12    4    64515   3         4         0        0     0      00:00:13   0              3        10-31-188-12

3.4 Creating a ServiceGroup
We also need to create a ServiceGroup for BGP mode to manage the BGP address pool. It is advisable to use a subnet different from the one the Kubernetes nodes sit on:
apiVersion: purelb.io/v1
kind: ServiceGroup
metadata:
  name: bgp-ippool
  namespace: purelb
spec:
  local:
    v4pool:
      subnet: '10.189.0.0/16'
      pool: '10.189.0.0-10.189.255.254'
      aggregation: /32

Apply it and check:
$ kubectl apply -f purelb-sg-bgp.yaml
servicegroup.purelb.io/bgp-ippool created

$ kubectl get sg -n purelb
NAME            AGE
bgp-ippool      7s
layer2-ippool   41h

3.5 Deploying Test Services
Here we reuse the nginx-lb Deployment created earlier and simply add two new Services for testing:
apiVersion: v1
kind: Service
metadata:
  annotations:
    purelb.io/service-group: bgp-ippool
  name: nginx-lb5-service
  namespace: nginx-quic
spec:
  allocateLoadBalancerNodePorts: false
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  selector:
    app: nginx-lb
  ports:
  - protocol: TCP
    port: 80          # match for service access port
    targetPort: 80    # match for pod access port
  type: LoadBalancer

---

apiVersion: v1
kind: Service
metadata:
  annotations:
    purelb.io/service-group: bgp-ippool
  name: nginx-lb6-service
  namespace: nginx-quic
spec:
  allocateLoadBalancerNodePorts: false
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  selector:
    app: nginx-lb
  ports:
  - protocol: TCP
    port: 80          # match for service access port
    targetPort: 80    # match for pod access port
  type: LoadBalancer
  loadBalancerIP: 10.189.100.100

Now check the deployment state:
$ kubectl get svc -n nginx-quic
NAME                TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)   AGE
nginx-lb-service    LoadBalancer   10.188.54.81    10.31.188.64     80/TCP    40h
nginx-lb2-service   LoadBalancer   10.188.34.171   10.31.188.65     80/TCP    40h
nginx-lb3-service   LoadBalancer   10.188.6.24     10.31.188.66     80/TCP    40h
nginx-lb4-service   LoadBalancer   10.188.50.164   10.31.188.100    80/TCP    40h
nginx-lb5-service   LoadBalancer   10.188.7.75     10.189.0.0       80/TCP    11s
nginx-lb6-service   LoadBalancer   10.188.27.208   10.189.100.100   80/TCP    11s

Then test with curl:
[root@tiny-centos7-100-2 ~]# curl 10.189.100.100
10.0.1.47:57768
[root@tiny-centos7-100-2 ~]# curl 10.189.100.100
10.0.1.47:57770
[root@tiny-centos7-100-2 ~]# curl 10.189.100.100
10.31.188.11:47439
[root@tiny-centos7-100-2 ~]# curl 10.189.100.100
10.31.188.11:33964
[root@tiny-centos7-100-2 ~]# curl 10.189.100.100
10.0.1.47:57776
[root@tiny-centos7-100-2 ~]# curl 10.189.100.100
10.0.1.47:57778

[root@tiny-centos7-100-2 ~]# curl 10.189.0.0
10.31.188.12:53078
[root@tiny-centos7-100-2 ~]# curl 10.189.0.0
10.0.2.151:59660
[root@tiny-centos7-100-2 ~]# curl 10.189.0.0
10.0.2.151:59662
[root@tiny-centos7-100-2 ~]# curl 10.189.0.0
10.31.188.12:21972
[root@tiny-centos7-100-2 ~]# curl 10.189.0.0
10.31.188.12:28855
[root@tiny-centos7-100-2 ~]# curl 10.189.0.0
10.0.2.151:59668

Now look at the addresses on the kube-lb0 interface: every node carries both BGP-mode LoadBalancer IPs:
[tinychen /root/ansible]# ansible cilium -m command -a "ip addr show kube-lb0"
10.31.188.11 | CHANGED | rc=0 >>
19: kube-lb0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether d6:65:b8:31:18:ce brd ff:ff:ff:ff:ff:ff
    inet 10.189.0.0/32 scope global kube-lb0
       valid_lft forever preferred_lft forever
    inet 10.189.100.100/32 scope global kube-lb0
       valid_lft forever preferred_lft forever
    inet6 fe80::d465:b8ff:fe31:18ce/64 scope link
       valid_lft forever preferred_lft forever
10.31.188.12 | CHANGED | rc=0 >>
21: kube-lb0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether aa:10:d5:cd:2b:98 brd ff:ff:ff:ff:ff:ff
    inet 10.189.0.0/32 scope global kube-lb0
       valid_lft forever preferred_lft forever
    inet 10.189.100.100/32 scope global kube-lb0
       valid_lft forever preferred_lft forever
    inet6 fe80::a810:d5ff:fecd:2b98/64 scope link
       valid_lft forever preferred_lft forever
10.31.188.1 | CHANGED | rc=0 >>
15: kube-lb0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 12:27:b1:48:4e:3a brd ff:ff:ff:ff:ff:ff
    inet 10.189.0.0/32 scope global kube-lb0
       valid_lft forever preferred_lft forever
    inet 10.189.100.100/32 scope global kube-lb0
       valid_lft forever preferred_lft forever
    inet6 fe80::1027:b1ff:fe48:4e3a/64 scope link
       valid_lft forever preferred_lft forever

Finally, the routing table on the router confirms that ECMP is in effect:
tiny-openwrt-plus# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

K>* 0.0.0.0/0 [0/0] via 10.31.254.254, eth0, 00:08:51
C>* 10.31.0.0/16 is directly connected, eth0, 00:08:51
B>* 10.189.0.0/32 [20/0] via 10.31.188.11, eth0, weight 1, 00:00:19
  *                      via 10.31.188.12, eth0, weight 1, 00:00:19
B>* 10.189.100.100/32 [20/0] via 10.31.188.11, eth0, weight 1, 00:00:19
  *                          via 10.31.188.12, eth0, weight 1, 00:00:19

4. Summary
PureLB differs considerably from MetalLB and OpenELB discussed earlier, even though all three share the same two broad working modes, Layer2 and BGP. As usual, let's look at the pros and cons of each mode before summarizing PureLB itself.
4.1 Pros and Cons of Layer2 Mode
Pros:
- Broad applicability: unlike BGP mode it needs no BGP-capable router and works in almost any network environment, with cloud-provider networks being the usual exception.
- VIPs are spread across multiple nodes, solving the single-node traffic bottleneck of MetalLB's and OpenELB's Layer2 modes.
- It uses the Linux network stack, so tools like iproute2 can show directly which node currently holds a VIP.
Cons:
- When the node holding a VIP goes down, failover takes a comparatively long time (the docs do not say how long). Like MetalLB, PureLB uses memberlist for leader election (and argues this is the better approach), but re-election after a node failure is still slower than the VRRP protocol used by traditional keepalived (typically around 1s).
Mitigations:
- Use BGP mode if your environment allows it.
- Create several Services (and thus several external VIPs) for one workload; since PureLB spreads the VIPs across nodes, this provides a degree of high availability.
- If neither BGP mode nor Layer2 mode is acceptable, none of the three mainstream open-source load balancers will fit: all of them offer only these two modes, built on similar principles and with the same trade-offs.
4.2 Pros and Cons of ECMP Mode
The pros and cons of ECMP mode are almost the mirror image of Layer2 mode.
Pros:
- No single point of failure: with ECMP enabled, every node in the cluster receives traffic and takes part in load balancing and forwarding.
- Thanks to the Linux network stack, standard routing protocols can be implemented with routing software such as bird, quagga or frr.
Cons:
- Demanding prerequisites: a router with the right capabilities is required and the configuration is more complex.
- ECMP failover is not particularly graceful, and how serious this is depends on the ECMP algorithm in use. When cluster nodes change and BGP sessions flap, all connections are rehashed (using 3-tuple or 5-tuple hashing), which can disrupt some services.
The hash used by routers is usually not stable, so whenever the size of the backend set changes (for example when a node's BGP session goes down), existing connections are effectively rehashed at random. Most existing connections suddenly end up on a different backend, one that has no relation to the previous backend and knows nothing about the connection's state.
Mitigations:
PureLB's documentation only briefly touches on the issues of using routing protocols:
Depending on the router and its configuration, load balancing techniques will vary however they are all generally based upon a 4 tuple hash of sourceIP, sourcePort, destinationIP, destinationPort. The router will also have a limit to the number of ECMP paths that can be used, in modern TOR switches, this can be set to a size larger than a /24 subnet, however in old routers, the count can be less than 10. This needs to be considered in the infrastructure design and PureLB combined with routing software can help create a design that avoids this limitation. Another important consideration can be how the router load balancer cache is populated and updated when paths are removed, again modern devices provide better behavior.
Since both rely on ECMP, the material MetalLB publishes applies here as well; the mitigations MetalLB suggests are listed below for reference:
- Use a more stable ECMP algorithm, such as "resilient ECMP" or "resilient LAG", to reduce the impact of backend changes on existing connections.
- Pin the service to a specific set of nodes to limit the potential impact.
- Make changes during low-traffic periods.
- Split the service across two Services with different LoadBalancer IPs and use DNS to shift traffic between them.
- Add transparent, user-invisible retry logic on the client side.
- Put an ingress layer behind the LoadBalancer for more graceful failover (though not every service can sit behind an ingress).
- Accept reality... (Accept that there will be occasional bursts of reset connections. For low-availability internal services, this may be acceptable as-is.)
4.3 Pros and Cons of PureLB
Here I try to summarize some objective facts; whether each counts as a pro or a con may vary from person to person:
- PureLB uses CRDs for a more capable IPAM and is the only one of the three that supports an external IPAM.
- PureLB integrates better with the Linux network stack (LoadBalancer VIPs can be inspected with iproute2 and similar tools).
- PureLB can implement ECMP with any routing protocol (BGP, OSPF, and so on).
- PureLB is easier to integrate with CNIs that themselves use BGP.
- PureLB's community is less active than MetalLB's or OpenELB's and it has not joined the CNCF; the docs only note that the CNCF provides a Slack channel for users ("The CNCF have generously provided the PureLB community a Slack Channel in the Kubernetes workspace.").
- PureLB's documentation is fairly complete, though it still has a few small gaps.
- PureLB's Layer2 mode has no single-node traffic bottleneck.
All in all, PureLB is a very solid cloud-native load balancer. Its design clearly draws on predecessors such as MetalLB and improves on them in several respects. The one shortcoming is the low community activity, which raises some concern about the project's future. If I had to pick one of the three for Layer2 mode, PureLB would be my first choice; for BGP mode, weigh the decision together with your CNI component and IPAM setup.