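Kubernetes Series (6): A Collection of Everyday Troubleshooting Cases
This article is meant for colleagues on the business middle-platform team to use in day-to-day troubleshooting; platform-level experts can safely skip it.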
Problem 1: Access to a service in the K8S cluster fails?
curl: (60) Peer's Certificate issuer is not recognized.
curl performs SSL certificate verification by default, using a "bundle"
of Certificate Authority (CA) public keys (CA certs). If the default
bundle file isn't adequate, you can specify an alternate file
using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
the bundle, the certificate verification probably failed due to a
problem with the certificate (it might be expired, or the name might
not match the domain name in the URL).
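If you'd like to turn off curl's verification of the certificate, use
the -k (or --insecure) option.
Cause: The certificate is not recognized, for example because it is a custom (self-signed) certificate or has expired.
Solution: Update the certificate.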
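Problem 2: Access to a service in the K8S cluster fails?
curl: (7) Failed connect to 10.103.22.158:3000; Connection refused
Cause: The port mapping is wrong. The application itself is running normally, but it cannot be reached through the Service.
Solution: Delete the svc and map the port again.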
kubectl delete svc nginx-deployment
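The delete-and-remap step can be sketched as a small shell function. This is a minimal sketch, assuming the workload is a Deployment that shares the name nginx-deployment with its Service; the function name reexpose_service and the port values in the usage note are illustrative and not taken from the original setup.

```shell
# Recreate a Service so that its port maps to the port the container
# actually listens on.
# $1 = Deployment (and Service) name, $2 = Service port, $3 = container port
reexpose_service() {
  name="$1"; port="$2"; target_port="$3"
  # Remove the Service that carries the wrong mapping.
  kubectl delete svc "$name" || return 1
  # Expose the Deployment again with an explicit target port.
  kubectl expose deployment "$name" --port="$port" --target-port="$target_port"
}
```

For example, reexpose_service nginx-deployment 3000 80 would recreate the Service on port 3000 and forward to container port 80; substitute the port your container really listens on.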
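Problem 3: Exposing a service in the K8S cluster fails?
Error from server (AlreadyExists): services "nginx-deployment" already exists
Cause: A Service has already been exposed for this workload.
Solution: Delete the svc and map the port again.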
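Problem 4: The service provided by the K8S cluster cannot be reached from outside the cluster?
Cause: The Service type is ClusterIP, so the service is not exposed outside the cluster.
Solution: Change the Service type to NodePort; the service can then be reached through any node of the K8S cluster.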
kubectl edit svc nginx-deployment
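As a non-interactive alternative to kubectl edit, the type can also be changed with kubectl patch. A minimal sketch; the function name make_nodeport is hypothetical.

```shell
# Switch an existing Service to type NodePort without opening an editor,
# then print the Service so the allocated node port can be read off.
# $1 = Service name
make_nodeport() {
  name="$1"
  kubectl patch svc "$name" -p '{"spec":{"type":"NodePort"}}' || return 1
  kubectl get svc "$name"
}
```

Calling make_nodeport nginx-deployment prints the Service afterwards; the PORT(S) column then shows the node port that was allocated.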
Problem 5: Pod status is ErrImagePull?
readiness-httpget-pod 0/1 ErrImagePull 0 10s
Cause: The image cannot be pulled.
Warning Failed 59m (x4 over 61m) kubelet, k8s-node01 Error: ErrImagePull
Solution: Switch to an image that can be pulled.
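Before swapping the image it helps to confirm which pods are affected. The filter below is a minimal sketch that reads the output of kubectl get pods from standard input; the function name image_pull_failures is hypothetical, and ImagePullBackOff is included on the assumption that you also want pods the kubelet is still retrying.

```shell
# Read `kubectl get pods` output on stdin and print the name and status
# of every pod that is stuck in an image-pull error state.
image_pull_failures() {
  awk '$3 == "ErrImagePull" || $3 == "ImagePullBackOff" { print $1, $3 }'
}
```

A typical use would be kubectl get pods | image_pull_failures, followed by kubectl describe pod on each reported name to read the pull error in the events.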
Problem 6: After creating a pod with init containers, its status is abnormal?
NAME READY STATUS RESTARTS AGE
myapp-pod 0/1 Init:0/2 0 20s
Cause: The logs show that the pod is stuck initializing; the pod details then show that the pod fails to start because the init containers have not finished running.
Error from server (BadRequest): container "myapp-container" in pod "myapp-pod" is waiting to start: PodInitializing
waiting for myservice
Server: 10.96.0.10
Address: 10.96.0.10:53
** server can't find myservice.default.svc.cluster.local: NXDOMAIN
*** Can't find myservice.svc.cluster.local: No answer
*** Can't find myservice.cluster.local: No answer
*** Can't find myservice.default.svc.cluster.local: No answer
*** Can't find myservice.svc.cluster.local: No answer
*** Can't find myservice.cluster.local: No answer
Solution: Create the corresponding Service. Its name is then registered with the cluster's CoreDNS, so the domain name that the pod's init container looks up can be resolved.
kubectl apply -f myservice.yaml
NAME READY STATUS RESTARTS AGE
myapp-pod 0/1 Init:1/2 0 27m
myapp-pod 0/1 PodInitializing 0 28m
myapp-pod 1/1 Running 0 28m
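The contents of myservice.yaml are not shown in this article. A minimal sketch of what such a manifest could look like follows, written out by a hypothetical helper function write_myservice_manifest; the port numbers are placeholder assumptions and should match your real backend.

```shell
# Write a minimal Service manifest named "myservice" to the file given as $1.
# Only the name matters for the DNS lookup done by the init container;
# the ports below are illustrative assumptions.
write_myservice_manifest() {
  cat > "$1" <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: myservice
spec:
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376
EOF
}
```

After write_myservice_manifest myservice.yaml and kubectl apply -f myservice.yaml, the init container that waits for myservice should be able to resolve the name and exit.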
Problem 7: A pod with a probe has status CrashLoopBackOff?
readiness-httpget-pod 0/1 CrashLoopBackOff 1 13s
readiness-httpget-pod 0/1 Completed 2 20s
readiness-httpget-pod 0/1 CrashLoopBackOff 2 31s
readiness-httpget-pod 0/1 Completed 3 42s
readiness-httpget-pod 0/1 CrashLoopBackOff 3 53s
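Cause: An image problem keeps the container from staying up after each restart. The Completed status shows that the container's main process exits on its own every time, so the kubelet keeps restarting it.
Solution: Use a different image.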
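To see why the container keeps exiting before changing the image, the logs of the previous container instance and the pod events are usually the first things to check. A minimal sketch; the function name inspect_crash is hypothetical.

```shell
# Show why a restarting container last exited.
# $1 = pod name
inspect_crash() {
  pod="$1"
  # Logs of the most recently terminated container instance.
  kubectl logs "$pod" --previous
  # Events and last container state, including the exit code.
  kubectl describe pod "$pod"
}
```

For the pod above this would be inspect_crash readiness-httpget-pod.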
Problem 8: Pod creation fails?
readiness-httpget-pod 0/1 Pending 0 0s
readiness-httpget-pod 0/1 Pending 0 0s
readiness-httpget-pod 0/1 ContainerCreating 0 0s
readiness-httpget-pod 0/1 Error 0 2s
readiness-httpget-pod 0/1 Error 1 3s
readiness-httpget-pod 0/1 CrashLoopBackOff 1 4s
readiness-httpget-pod 0/1 Error 2 15s
readiness-httpget-pod 0/1 CrashLoopBackOff 2 26s
readiness-httpget-pod 0/1 Error 3 37s
readiness-httpget-pod 0/1 CrashLoopBackOff 3 52s
readiness-httpget-pod 0/1 Error 4 82s
Cause: An image problem prevents the container from starting. The logs show the application exiting with a TypeError because the URL it passes to the MongoDB driver is undefined.
[root@k8s-master01 ~]# kubectl logs readiness-httpget-pod
url.js:106
throw new errors.TypeError('ERR_INVALID_ARG_TYPE', 'url', 'string', url);
^
TypeError [ERR_INVALID_ARG_TYPE]: The "url" argument must be of type string. Received type undefined
at Url.parse (url.js:106:11)
at Object.urlParse [as parse] (url.js:100:13)
at module.exports (/myapp/node_modules/mongodb/lib/url_parser.js:17:23)
at connect (/myapp/node_modules/mongodb/lib/mongo_client.js:159:16)
at t (/myapp/node_modules/mongodb/lib/mongo_client.js:110:3)
at Object.<anonymous> (/myapp/app.js:12:13)
at Module._compile (module.js:641:30)
at Object.Module._extensions..js (module.js:652:10)
at Module.load (module.js:560:32)
at tryModuleLoad (module.js:503:12)
at Function.Module._load (module.js:495:3)
at Function.Module.runMain (module.js:682:10)
at startup (bootstrap_node.js:191:16)
at bootstrap_node.js:613:3
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 58m (x5 over 59m) kubelet, k8s-node01 Container image "/library/myapp:v1" already present on machine
Normal Created 58m (x5 over 59m) kubelet, k8s-node01 Created container readiness-httpget-container
Normal Started 58m (x5 over 59m) kubelet, k8s-node01 Started container readiness-httpget-container
Warning BackOff 57m (x10 over 59m) kubelet, k8s-node01 Back-off restarting failed container
Normal Scheduled 3m35s default-scheduler Successfully assigned default/readiness-httpget-pod to k8s-node01
Solution: Use a different image.
Problem 9: The pod never becomes ready?
readiness-httpget-pod 0/1 Running 0 116s
Cause: The readiness check against the pod fails because the resource it requests cannot be found.
Error from server (NotFound): pods "pod" not found
2021/06/11 07:10:14 [error] 30#30: *1 open() "/usr/share/nginx/html/index1.html" failed (2: No such file or directory), client: 10.244.2.1, server: localhost, request: "GET /index1.html HTTP/1.1", host: "10.244.2.25:80"
10.244.2.1 - - [11/Jun/2021:07:10:14 +0000] "GET /index1.html HTTP/1.1" 404 153 "-" "kube-probe/1.15" "-"
10.244.2.1 - - [11/Jun/2021:07:10:17 +0000] "GET /index1.html HTTP/1.1" 404 153 "-" "kube-probe/1.15" "-"
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 64m kubelet, k8s-node01 Container image "/library/nginx" already present on machine
Normal Created 64m kubelet, k8s-node01 Created container readiness-httpget-container
Normal Started 64m kubelet, k8s-node01 Started container readiness-httpget-container
Warning Unhealthy 59m (x101 over 64m) kubelet, k8s-node01 Readiness probe failed: HTTP probe failed with statuscode: 404
Normal Scheduled 8m16s default-scheduler Successfully assigned default/readiness-httpget-pod to k8s-node01
Solution: Enter the container and create the resource that the probe defined in the yaml file requests (here, /usr/share/nginx/html/index1.html).
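Creating the missing file does not require an interactive shell. A minimal sketch using kubectl exec; the function name create_probe_target is hypothetical, and the file content "ok" is an arbitrary placeholder.

```shell
# Create the file that the readiness probe requests inside the running
# container.
# $1 = pod name, $2 = absolute path of the file to create
create_probe_target() {
  pod="$1"; path="$2"
  # The path is passed as a positional argument to the inner shell so
  # that it does not need to be quoted into the command string.
  kubectl exec "$pod" -- sh -c 'echo ok > "$1"' sh "$path"
}
```

For the pod above: create_probe_target readiness-httpget-pod /usr/share/nginx/html/index1.html. Note that a file created this way lives in the container's writable layer and is lost when the container is recreated, so a lasting fix belongs in the image or in the probe path of the manifest.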
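Problem 10: Pod creation fails?
error: error validating "l": error validating data: ValidationError(Pod.spec.imagePullSecrets[0]): invalid type for io.k8s.api.core.v1.LocalObjectReference: got "string", expected "map"; if you choose to ignore these errors, turn validation off with --validate=false
Cause: The yml file content is wrong: Chinese (full-width) characters were used, so the imagePullSecrets entry is parsed as a plain string instead of a map.
Solution: Correct the myregistrykey entry.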
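Problem 11: The status of the add-on pod kube-flannel-ds-amd64-ndsf7 is Init:0/1?
Troubleshooting: kubectl -n kube-system describe pod kube-flannel-ds-amd64-ndsf7 # show the pod's description
Cause: Node k8s-slave1 failed to pull the image.
Solution: Log in to k8s-slave1, restart the docker service, and pull the image manually.
Then, on the k8s-master node, reinstall the add-on.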
kubectl create -l;kubectl get nodes
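Problem 12: A service created in K8S has status ErrImagePull?
Troubleshooting: kubectl describe pod test-nginx
Cause: The image name used for the pull is wrong.
Solution: Delete the failed pod, then run it again with the correct image name.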
kubectl delete pod test-nginx;kubectl run test-nginx --image=10.0.0.81:5000/nginx:alpine
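Problem 13: Cannot enter the specified container?
Error from server (BadRequest): container volume-test-container is not valid for pod volume-test-pod
Cause: The containers field is duplicated in the yml file, so the pod does not contain that container.
Solution: Remove the extra containers field from the yml file and recreate the pod.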
Problem 14: Creating a PV fails?
persistentvolume/nfspv1 unchanged
persistentvolume/nfspv01 created
Error from server (Invalid): error when applying patch:
{"metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"
{\"apiVersion\":\"v1\",\"kind\":\"PersistentVolume\",\"metadata\":{\"annotations\":{},\"name\":\"nfspv01\"},\"spec\":{\"accessModes\": [\"ReadWriteOnce\"],\"capacity\":{\"storage\":\"5Gi\"},\"nfs\":
{\"path\":\"/nfs2\",\"server\":\"192.168.66.100\"},\"persistentVolumeReclaimPolicy\":\"Retain\",\"storageClassName\":\"nfs\"}}\n"}},"spec": {"nfs":{"path":"/nfs2"}}}
to:
Resource: "/v1, Resource=persistentvolumes", GroupVersionKind: "/v1, Kind=PersistentVolume"
Name: "nfspv01", Namespace: ""
Object: &{map["apiVersion":"v1" "kind":"PersistentVolume" "metadata":map["annotations":map["kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"v1\",\"kind\":\"PersistentVolume\",\"metadata\":{\"annotations\":{},\"name\":\"nfspv01\"},\"spec\": {\"accessModes\":[\"ReadWriteOnce\"],\"capacity\":{\"storage\":\"5Gi\"},\"nfs\":
{\"path\":\"/nfs1\",\"server\":\"192.168.66.100\"},\"persistentVolumeReclaimPolicy\":\"Retain\",\"storageClassName\":\"nfs\"}}\n"] "creationTimestamp":"2021-06-25T01:54:24Z" "finalizers":["kubernetes.io/pv-protection"] "name":"nfspv01" "resourceVersion":"325674" "selfLink":"/api/v1/persistentvolumes/nfspv01" "uid":"89cb1d15-8012-47f0-aee6-6507bb624387"] "spec":map["accessModes": ["ReadWriteOnce"] "capacity":map["storage":"5Gi"] "nfs":map["path":"/nfs1" "server":"192.168.66.100"] "persistentVolumeReclaimPolicy":"Retain" "storageClassName":"nfs" "volumeMode":"Filesystem"] "status":map["phase":"Available"]]} for: "PV.yml": PersistentVolume "nfspv01" is invalid: spec.persistentvolumesource: Forbidden: is immutable after creation
Cause: The PV name is duplicated. The file defines nfspv01 twice with different NFS paths (/nfs1 and /nfs2), so the second definition is treated as a change to the existing PV, whose source is immutable after creation.
Solution: Change the name field of the PV.
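Problem 15: The pod cannot mount a PVC?
Cause: The pod cannot mount the PVC.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 60s default-scheduler pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
The accessModes requested do not match those of the available PVs, so the PVC cannot be bound. Only a PV larger than 1G with accessModes RWO can be used, so only one pod is created successfully; the second pod stays Pending, and because the pods are created in order, the third pod is never created.
Solution: Change the accessModes in the yml file or the accessModes of the PV.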
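To compare what the claim asks for with what is on offer, it helps to list the PVs that are still unbound. A minimal sketch, assuming the default column order of kubectl get pv (NAME, CAPACITY, ACCESS MODES, RECLAIM POLICY, STATUS, ...); the function name available_pvs is hypothetical.

```shell
# Read `kubectl get pv --no-headers` output on stdin and print the name,
# capacity and access modes of every PV that is still Available.
available_pvs() {
  awk '$5 == "Available" { print $1, $2, $3 }'
}
```

A typical use would be kubectl get pv --no-headers | available_pvs, with the result checked against the accessModes and storage request shown by kubectl get pvc.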
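Problem 16: After a pod uses a PV, its contents cannot be accessed?
Cause: The nfs volume contains no files, or the permissions are wrong.
Solution: Create the files in the nfs volume and grant the required permissions.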
Problem 17: Viewing node status fails?
Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)
Cause: There is no heapster service in the cluster.