A brief record of some K8s gotchas and bugs
kubectl
kubectl rollout history
Before v1.26, if kubectl rollout history was given an -o output parameter such as -o yaml or -o json, it printed the wrong revision's content.
Related issue: https://github.com/kubernetes/kubectl/issues/598#issuecomment-1230824762
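A quick illustration of the affected invocation; the deployment name web and the revision number here are placeholders. On kubectl older than v1.26, the second command could print a manifest that does not match the requested revision:

 # List the rollout revisions of a Deployment:
 kubectl rollout history deployment/web
 # Pre-v1.26, adding -o yaml (or -o json) could return the wrong revision:
 kubectl rollout history deployment/web --revision=2 -o yaml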
kubectl apply may have bugs or unexpected behavior in specific situations
Background: kubectl apply involves a diff/merge computation; see How apply calculates differences and merges changes (https://kubernetes.io/docs/tasks/manage-kubernetes-objects/declarative-config/#how-apply-calculates-differences-and-merges-changes).
For example, with kubectl 1.18, kubectl apply on hostAliases could append entries instead of replacing them; see 在使用kubectl_apply操作hostalias产生的非预期行为.
There was also a case where modifying a probe configuration via apply misbehaved; that too was most likely related to apply's merge computation (seen only on 1.18; I no longer remember how to reproduce it, to be filled in if the chance arises).
A source-code walkthrough of the full kubectl apply execution path (in Chinese): https://juejin.cn/post/6968106028642598949
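The merge behind apply uses the kubectl.kubernetes.io/last-applied-configuration annotation as the "last applied" input of a three-way diff, which is where list fields such as hostAliases can end up merged instead of replaced. A minimal sketch for inspecting that state (the deployment name web and the file name deploy.yaml are placeholders):

 # apply records what it last applied in an annotation on the object:
 kubectl apply -f deploy.yaml
 # Inspect the recorded last-applied configuration that the three-way
 # merge compares against the live object and the new local file:
 kubectl get deployment web \
   -o jsonpath='{.metadata.annotations.kubectl\.kubernetes\.io/last-applied-configuration}'
 # Preview what apply would change before actually applying:
 kubectl diff -f deploy.yaml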
kubelet
Serial image pulls in kubelet before 1.27
https://kubernetes.io/docs/concepts/containers/images/#serial-and-parallel-image-pulls
By default, kubelet pulls images serially. In other words, kubelet sends only one image pull request to the image service at a time. Other image pull requests have to wait until the one being processed is complete.
Before v1.27, the kubelet on a Kubernetes node pulled container images serially, one at a time. When pulling images over the public internet, this could leave other images stuck in the pulling state indefinitely; v1.27 added support for parallel image pulls.
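On v1.27+, parallel pulls are configured through KubeletConfiguration. A quick way to check a node, assuming the common default config path /var/lib/kubelet/config.yaml (adjust for your environment; the maxParallelImagePulls value below is only illustrative):

 # Check the node's kubelet config for the image-pull settings:
 grep -E 'serializeImagePulls|maxParallelImagePulls' /var/lib/kubelet/config.yaml
 # To enable parallel pulls on v1.27+, set in KubeletConfiguration:
 #   serializeImagePulls: false
 #   maxParallelImagePulls: 5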
kubelet keeps logging large volumes of 'Path "/var/lib/kubelet/pods/${pod_ID}/volumes" does not exist' errors
The related issue attributes the cause to abnormal runc cgroup GC.
Issue:
https://github.com/kubernetes/kubernetes/issues/112124
At the bottom of the issue there is a cgroup cleanup script, but the logic for deriving KUBE_POD_IDS has to be adapted to your environment, and even after that, rmdir on the cgroup directory can fail with a 'Device or resource busy' error.
Follow-up related issue:
https://github.com/kubernetes/kubernetes/issues/112151#issuecomment-1285261341
That issue identifies disk IO as the trigger.
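A rough inspection sketch in the spirit of that cleanup script, assuming the systemd cgroup driver and the kubepods.slice layout (paths and the pod-UID matching are environment-specific and must be adapted; this only inspects, it does not delete):

 # Pod UIDs the kubelet still tracks on this node:
 ls /var/lib/kubelet/pods/
 # Pod cgroup directories left on disk (note UIDs use '_' instead of '-'):
 find /sys/fs/cgroup/kubepods.slice -maxdepth 2 -type d -name '*pod*'
 # A cgroup directory with no matching pod UID is a GC leftover; rmdir
 # only succeeds on an empty leaf cgroup with no attached processes,
 # otherwise it fails with 'Device or resource busy'.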
kubelet logs 'vol_data.json: no such file or directory' errors
Sample error log:
operationExecutor.UnmountVolume failed
failed to open volume data file [/var/lib/kubelet/pods/${pod_id}/volumes/kubernetes.io~csi/${pvc_id}/vol_data.json]: open /var/lib/kubelet/pods/${pod_id}/volumes/kubernetes.io~csi/${pvc_id}/vol_data.json: no such file or directory
Issue:
https://github.com/kubernetes/kubernetes/issues/85280
The issue creator mentions:
When there is something wrong to execute os.Remove(volPath), volume path is left on node. However, mount path and vol_data.json is deleted.
In practice, manually unmounting the leftover mount and then restarting the kubelet clears the error.
Another issue comment (https://github.com/kubernetes/kubernetes/issues/116847#issuecomment-1721540974) mentions:
Alright one last update. If anyone is running into problems like these, make sure your CSI driver implements NodeUnpublish correctly at very minimum and idempotent. This issue imo is almost entirely caused by problematic CSI driver implementations.
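A sketch of the manual remediation described above; ${pod_id} and ${pvc_id} are placeholders taken from the error message, and you should verify what is actually mounted before unmounting:

 # Locate the leftover CSI volume directory from the error message:
 ls /var/lib/kubelet/pods/${pod_id}/volumes/kubernetes.io~csi/${pvc_id}/
 # Check whether its mount point is still mounted:
 findmnt | grep ${pvc_id}
 # Unmount it manually, then restart the kubelet to clear the error:
 umount /var/lib/kubelet/pods/${pod_id}/volumes/kubernetes.io~csi/${pvc_id}/mount
 systemctl restart kubelet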