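Kubernetes Troubleshooting
1. Viewing system Events

    kubectl describe pod <PodName> --namespace=<NAMESPACE>

This command shows the Pod's configuration as defined at creation time, its status, and its most recent Events, which are useful for troubleshooting. For example, when a Pod is stuck in Pending, the Events usually reveal why. The common causes are:
- No Node is available for scheduling.
- Resource quota management is enabled and the Pod's target node happens to have no available resources.
- The image is still being downloaded (the pull is taking too long) or the download has failed.
kubectl describe can also inspect other k8s objects: Node, RC, Service, Namespace, and Secret.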
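As a rough sketch of the workflow above, the following shell snippet picks out Pending pods from `kubectl get pods` output and prints the matching describe command for each. The helper name `pending_pods` and the sample pod names are made up for illustration, and the snippet assumes the default column layout (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Sample `kubectl get pods` output; these pod names are hypothetical.
sample_output='NAME         READY   STATUS    RESTARTS   AGE
demo-pod-a   1/1     Running   0          2d
demo-pod-b   0/1     Pending   0          7m'

# Read `kubectl get pods` output on stdin and print the names of Pending pods.
# Assumes STATUS is the third column (no --all-namespaces, which adds a column).
pending_pods() {
  awk 'NR > 1 && $3 == "Pending" { print $1 }'
}

# Print (rather than run) a describe command for each Pending pod.
printf '%s\n' "$sample_output" | pending_pods | while read -r pod; do
  echo "kubectl describe pod $pod --namespace=<NAMESPACE>"
done
```

Against a live cluster, the sample text would be replaced by something like `kubectl get pods --namespace=<NAMESPACE> | pending_pods`.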
1.1. Pod
The events below were produced by a container whose startup command is non-blocking: the container exited right after starting and was restarted repeatedly by k8s.

    kubectl describe pod <PodName> --namespace=<NAMESPACE>

    Events:
      FirstSeen  LastSeen  Count  From                   SubobjectPath        Reason     Message
      ─────────  ────────  ─────  ────                   ─────────────        ──────     ───────
      7m         7m        1      {scheduler }                                Scheduled  Successfully assigned yangsc-1-0-0-index0 to 10.8.216.19
      7m         7m        1      {kubelet 10.8.216.19}  containers{infra}    Pulled     Container image "gcr.io/kube-system/pause:0.8.0" already present on machine
      7m         7m        1      {kubelet 10.8.216.19}  containers{infra}    Created    Created with docker id 84f133c324d0
      7m         7m        1      {kubelet 10.8.216.19}  containers{infra}    Started    Started with docker id 84f133c324d0
      7m         7m        1      {kubelet 10.8.216.19}  containers{yangsc0}  Started    Started with docker id 3f9f82abb145
      7m         7m        1      {kubelet 10.8.216.19}  containers{yangsc0}  Created    Created with docker id 3f9f82abb145
      7m         7m        1      {kubelet 10.8.216.19}  containers{yangsc0}  Created    Created with docker id fb112e4002f4
      7m         7m        1      {kubelet 10.8.216.19}  containers{yangsc0}  Started    Started with docker id fb112e4002f4
      6m         6m        1      {kubelet 10.8.216.19}  containers{yangsc0}  Created    Created with docker id 613b119d4474
      6m         6m        1      {kubelet 10.8.216.19}  containers{yangsc0}  Started    Started with docker id 613b119d4474
      6m         6m        1      {kubelet 10.8.216.19}  containers{yangsc0}  Created    Created with docker id 25cb68d1fd3d
      6m         6m        1      {kubelet 10.8.216.19}  containers{yangsc0}  Started    Started with docker id 25cb68d1fd3d
      5m         5m        1      {kubelet 10.8.216.19}  containers{yangsc0}  Started    Started with docker id 7d9ee8610b28
      5m         5m        1      {kubelet 10.8.216.19}  containers{yangsc0}  Created    Created with docker id 7d9ee8610b28
      3m         3m        1      {kubelet 10.8.216.19}  containers{yangsc0}  Started    Started with docker id 88b9e8d582dd
      3m         3m        1      {kubelet 10.8.216.19}  containers{yangsc0}  Created    Created with docker id 88b9e8d582dd
      7m         1m        7      {kubelet 10.8.216.19}  containers{yangsc0}  Pulling    Pulling image "gcr.io/test/tcp-hello:1.0.0"
      1m         1m        1      {kubelet 10.8.216.19}  containers{yangsc0}  Started    Started with docker id 089abff050e7
      1m         1m        1      {kubelet 10.8.216.19}  containers{yangsc0}  Created    Created with docker id 089abff050e7
      7m         1m        7      {kubelet 10.8.216.19}  containers{yangsc0}  Pulled     Successfully pulled image "gcr.io/test/tcp-hello:1.0.0"
      6m         7s        34     {kubelet 10.8.216.19}  containers{yangsc0}  Backoff    Back-off restarting failed docker container

1.2. NODE

    [root@FC-43745A-10 ~]# kubectl describe node 10.8.216.20
    Name:               10.8.216.20
    Labels:             kubernetes.io/hostname=10.8.216.20,namespace/bcs-cc=true,namespace/myview=true
    CreationTimestamp:  Mon, 17 Apr 2017 11:32:52 +0800
    Phase:
    Conditions:
      Type       Status  LastHeartbeatTime                LastTransitionTime               Reason                    Message
      ────       ──────  ─────────────────                ──────────────────               ──────                    ───────
      Ready      True    Fri, 18 Aug 2017 09:38:33 +0800  Tue, 02 May 2017 17:40:58 +0800  KubeletReady              kubelet is posting ready status
      OutOfDisk  False   Fri, 18 Aug 2017 09:38:33 +0800  Mon, 17 Apr 2017 11:31:27 +0800  KubeletHasSufficientDisk  kubelet has sufficient disk space available
    Addresses:  10.8.216.20,10.8.216.20
    Capacity:
      cpu:     32
      memory:  67323039744
      pods:    40
    System Info:
      Machine ID:                 723bafc7f6764022972b3eae1ce6b198
      System UUID:                4C4C4544-0042-4210-8044-C3C04F595631
      Boot ID:                    da01f2e3-987a-425a-9ca7-1caaec35d1e5
      Kernel Version:             3.10.0-327.28.3.el7.x86_64
      OS Image:                   CentOS Linux 7 (Core)
      Container Runtime Version:  docker://1.13.1
      Kubelet Version:            v1.1.1-xxx2-13.1+79c90c68bfb72f-dirty
      Kube-Proxy Version:         v1.1.1-xxx2-13.1+79c90c68bfb72f-dirty
    ExternalID:  10.8.216.20
    Non-terminated Pods:  (6 in total)
      Namespace  Name                             CPU Requests  CPU Limits  Memory Requests  Memory Limits
      ─────────  ────                             ────────────  ──────────  ───────────────  ─────────────
      bcs-cc     bcs-cc-api-0-0-1364-index0       1 (3%)        1 (3%)      4294967296 (6%)  4294967296 (6%)
      bcs-cc     bcs-cc-api-0-0-1444-index0       1 (3%)        1 (3%)      4294967296 (6%)  4294967296 (6%)
      fw         fw-demo2-0-0-1519-index0         1 (3%)        1 (3%)      4294967296 (6%)  4294967296 (6%)
      myview     myview-api-0-0-1362-index0       1 (3%)        1 (3%)      4294967296 (6%)  4294967296 (6%)
      myview     myview-api-0-0-1442-index0       1 (3%)        1 (3%)      4294967296 (6%)  4294967296 (6%)
      qa-ts-dna  ts-dna-console3-0-0-1434-index0  1 (3%)        1 (3%)      4294967296 (6%)  4294967296 (6%)
    Allocated resources:
      (Total limits may be over 100%, i.e., overcommitted. More info: http://releases.k8s.io/HEAD/docs/user-guide/compute-resources.md)
      CPU Requests  CPU Limits  Memory Requests    Memory Limits
      ────────────  ──────────  ───────────────    ─────────────
      6 (18%)       6 (18%)     25769803776 (38%)  25769803776 (38%)
    No events.

1.3. RC

    [root@FC-43745A-10 ~]# kubectl describe rc mytest-1-0-0 --namespace=test
    Name:         mytest-1-0-0
    Namespace:    test
    Image(s):     gcr.io/test/mywebcalculator:1.0.1
    Selector:     app=mytest,appVersion=1.0.0
    Labels:       app=mytest,appVersion=1.0.0,env=ts,zone=inner
    Replicas:     1 current / 1 desired
    Pods Status:  1 Running / 0 Waiting / 0 Succeeded / 0 Failed
    No volumes.
    Events:
      FirstSeen  LastSeen  Count  From                       SubobjectPath  Reason            Message
      ─────────  ────────  ─────  ────                       ─────────────  ──────            ───────
      20h        19h       9      {replication-controller }                 FailedCreate      Error creating: Pod "mytest-1-0-0-index0" is forbidden: limited to 10 pods
      20h        17h       7      {replication-controller }                 FailedCreate      Error creating: pods "mytest-1-0-0-index0" already exists
      20h        17h       4      {replication-controller }                 SuccessfulCreate  Created pod: mytest-1-0-0-index0

1.4. NAMESPACE

    [root@FC-43745A-10 ~]# kubectl describe namespace test
    Name:    test
    Labels:  <none>
    Status:  Active

    Resource Quotas
      Resource                Used        Hard
      ---                     ---         ---
      cpu                     5           20
      memory                  1342177280  53687091200
      persistentvolumeclaims  0           10
      pods                    4           10
      replicationcontrollers  8           20
      resourcequotas          1           1
      secrets                 3           10
      services                8           20

    No resource limits.

1.5. Service

    [root@FC-43745A-10 ~]# kubectl describe service xxx-containers-1-1-0 --namespace=test
    Name:              xxx-containers-1-1-0
    Namespace:         test
    Labels:            app=xxx-containers,appVersion=1.1.0,env=ts,zone=inner
    Selector:          app=xxx-containers,appVersion=1.1.0
    Type:              ClusterIP
    IP:                10.254.46.42
    Port:              port-dna-tcp-35913  35913/TCP
    Endpoints:         10.0.92.17:35913
    Port:              port-l7-tcp-8080  8080/TCP
    Endpoints:         10.0.92.17:8080
    Session Affinity:  None
    No events.

2. Viewing container logs
1) View the logs of a given pod

    kubectl logs <pod_name>
    kubectl logs -f <pod_name>    # follow the log, similar to tail -f

2) View the logs of the previous (terminated) container instance in a pod

    kubectl logs -p <pod_name>

3) View the logs of a specific container in a pod

    kubectl logs <pod_name> -c <container_name>

4) kubectl logs --help

    [root@node5 ~]# kubectl logs --help
    Print the logs for a container in a pod. If the pod has only one container, the container name is optional.

    Usage:
      kubectl logs [-f] [-p] POD [-c CONTAINER] [flags]

    Aliases:
      logs, log

    Examples:
      # Return snapshot logs from pod nginx with only one container
      $ kubectl logs nginx

      # Return snapshot of previous terminated ruby container logs from pod web-1
      $ kubectl logs -p -c ruby web-1

      # Begin streaming the logs of the ruby container in pod web-1
      $ kubectl logs -f -c ruby web-1

      # Display only the most recent 20 lines of output in pod nginx
      $ kubectl logs --tail=20 nginx

      # Show all logs from pod nginx written in the last hour
      $ kubectl logs --since=1h nginx

3. Viewing k8s service logs
3.1. journalctl
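On Linux systems where systemd manages the Kubernetes services, the journal captures each service's output, so the service logs can be viewed with systemctl status <service_name> or journalctl -u <service_name> -f.
The Kubernetes components include:
| Component | Related issues |
| --- | --- |
| kube-apiserver | |
| kube-controller-manager | Pod scaling or RC related |
| kube-scheduler | Pod scaling or RC related |
| kubelet | Pod lifecycle: creation, stopping, and so on |
| etcd | |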
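The journalctl approach can be applied to each component in turn. The sketch below only prints the commands rather than running them, so it is safe to try anywhere. It assumes the systemd unit names match the component names, which depends on how the cluster was installed, and `recent_log_cmd` is a hypothetical helper name.

```shell
# Assumption: systemd units are named after the components; adjust to your install.
components="kube-apiserver kube-controller-manager kube-scheduler kubelet etcd"

# Print the journalctl command that shows the last N lines (default 100) of a unit.
recent_log_cmd() {
  unit="$1"
  lines="${2:-100}"
  printf 'journalctl -u %s -n %s --no-pager\n' "$unit" "$lines"
}

# Emit one command per component; pipe the result to `sh` to actually run them.
for c in $components; do
  recent_log_cmd "$c" 50
done
```

Printing first makes it easy to review or copy a single command, for example the kubelet one when a Pod fails to start on a particular node.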
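3.2. Log files
Logs can also be written to, and read from, a designated log directory by using the following startup flags:
- --logtostderr=false: do not write logs to stderr
- --log-dir=/var/log/kubernetes: the directory where log files are stored
- --alsologtostderr=false: when set to true, logs are written to stderr as well as to files
- --v=0: the glog verbosity level
- --vmodule=gfs=2,test=4: glog per-module verbosity levels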
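As an illustration, the flags above could be collected into one variable and appended to a component's startup command. The variable names `LOG_DIR` and `KUBE_LOG_ARGS` are hypothetical, and where such a variable is consumed depends on how the services are launched in a given installation. More recent Kubernetes releases have deprecated some of these glog-style flags, so it is worth checking which ones your version accepts.

```shell
# Hypothetical variable names; the flag values are the ones listed above.
LOG_DIR=/var/log/kubernetes
KUBE_LOG_ARGS="--logtostderr=false --log-dir=${LOG_DIR} --alsologtostderr=false --v=0"

# Show the assembled arguments; a startup script would append them to the
# component command line, e.g. `kubelet $KUBE_LOG_ARGS ...`.
echo "$KUBE_LOG_ARGS"
```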
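4. Common problems
4.1. Pod stays in Pending
Run kubectl describe pod <pod_name> --namespace=<NAMESPACE> to view the Pod's events. Typical causes:
- The image is being downloaded but cannot be pulled, or the pull takes too long (this is usually the cause).
- No Node is available for scheduling.
- Resource quota management is enabled and the Pod's target node happens to have no available resources.
Solution: address whichever of the causes above the events point to.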
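4.2. Pod keeps restarting after creation
In the output of kubectl get pods, the Pod's status alternates between Running and other states, and the RESTARTS count keeps increasing.
The usual cause is that the container's startup command is not a blocking command, so the container exits immediately after it starts.
Non-blocking commands:
- The command specified by CMD is itself non-blocking.
- The service is configured to start in the background.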
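To spot the symptom described above, one option is to filter `kubectl get pods` output by the RESTARTS column. This is only a sketch: `restarting_pods` is a made-up helper, the sample pod names are invented, the default threshold of 5 is arbitrary, and the default column layout (RESTARTS as the fourth column) is assumed.

```shell
# Read `kubectl get pods` output on stdin and print "name restarts" for pods
# whose RESTARTS count is at or above the threshold in $1 (default 5).
restarting_pods() {
  awk -v min="${1:-5}" 'NR > 1 && $4 + 0 >= min + 0 { print $1, $4 }'
}

# Sample output with hypothetical pod names.
sample_output='NAME       READY   STATUS             RESTARTS   AGE
demo-api   1/1     Running            0          3h
demo-job   0/1     CrashLoopBackOff   34         6m'

printf '%s\n' "$sample_output" | restarting_pods 5
```

Against a live cluster this would be `kubectl get pods --namespace=<NAMESPACE> | restarting_pods 5`, after which kubectl describe pod and kubectl logs -p can be used on each pod that is listed.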
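Solution:
1) Change the command to a blocking one that runs in the foreground, for example: zkServer.sh start-foreground
2) In the startup script of a Java program, remove the nohup and the & from nohup xxx &. For example, change

    nohup $JAVA_HOME/bin/java $JAVA_OPTS -cp $CLASSPATH com.cnc.open.processor.Main &

to:

    $JAVA_HOME/bin/java $JAVA_OPTS -cp $CLASSPATH com.cnc.open.processor.Main

This article draws on 《Kubernetes權威指南》 (Kubernetes: The Definitive Guide).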