Notes on latency jitter and intermittent connection drops caused by iptables INVALID DROP rules under K8s
Revision as of 12:57, 28 July 2023 (Fri)
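Environment prerequisites
- kube-proxy is present and running in iptables mode
- Optional condition: calico CNI
Possible triggers & resulting symptoms
- overlay Pods communicating with services outside the cluster
- underlay-to-overlay traffic (outbound via the overlay, return via the underlay, which produces asymmetric routing)
- conntrack saturation? (unconfirmed)
The result is occasional latency spikes or occasional connection drops.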
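To judge whether the conntrack saturation trigger is plausible on a node, the table usage can be compared with its limit. The following is a minimal sketch, not part of the original note: the helper name `conntrack_usage_pct` is hypothetical, and the two `/proc/sys/net/netfilter/` files in the usage comment are only present when the nf_conntrack kernel module is loaded.

```shell
#!/bin/sh
# Sketch: conntrack table usage as an integer percentage.
# $1 = current number of tracked entries, $2 = table maximum.
conntrack_usage_pct() {
  echo $(( $1 * 100 / $2 ))
}

# Hypothetical usage on a node (assumes nf_conntrack is loaded):
#   conntrack_usage_pct "$(cat /proc/sys/net/netfilter/nf_conntrack_count)" \
#                       "$(cat /proc/sys/net/netfilter/nf_conntrack_max)"
```

A value that stays close to 100 would support the saturation hypothesis; a low value suggests looking at the other triggers first.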
In the filter-table KUBE-FORWARD chain maintained by kube-proxy there is a rule: -A KUBE-FORWARD -m conntrack --ctstate INVALID -j DROP
[root@gzu-prd ~]# iptables -L KUBE-FORWARD --line -nv
Chain KUBE-FORWARD (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 DROP all -- * * 0.0.0.0/0 0.0.0.0/0 ctstate INVALID
2 4 240 ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding rules */ mark match 0x4000/0x4000
3 11412 33M ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
4 0 0 ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED
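This rule causes any traffic that connection tracking marks as INVALID to be dropped, and at present the behavior cannot be disabled through configuration (short of changing the code and recompiling).
The connection-tracking state of TCP flows can be inspected with conntrack -L or cat /proc/net/nf_conntrack (for example, entries flagged [UNREPLIED]).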
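As a rough way to quantify what those tables show, the entries flagged [UNREPLIED] can be counted. This is a sketch only: the helper name `count_unreplied_tcp` is hypothetical, it assumes the protocol name appears in the first field (`conntrack -L`) or the third field (`/proc/net/nf_conntrack`), and a high count is just a hint, not proof that packets are being classified as INVALID.

```shell
#!/bin/sh
# Sketch: count TCP conntrack entries flagged [UNREPLIED].
# Reads conntrack -L or /proc/net/nf_conntrack style lines on stdin.
count_unreplied_tcp() {
  awk '($1 == "tcp" || $3 == "tcp") && /\[UNREPLIED\]/ { n++ } END { print n + 0 }'
}

# Hypothetical usage on a node (needs root):
#   count_unreplied_tcp < /proc/net/nf_conntrack
```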
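kube-proxy bluntly flushes and rewrites its iptables rules whenever endpoints change, so simply inserting an ACCEPT rule into KUBE-FORWARD does not work: the rule would be wiped out on the next sync.
Likewise, in the filter-table chains maintained by calico, nearly every cali-fw-cali**** chain also contains the rule -m conntrack --ctstate INVALID -j DROP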
[root@gzu-prd ~]# iptables-save -t filter|grep INVALID
-A cali-fw-cali02fca994756 -m comment --comment "cali:Zgj-5PhkyRyRGc5v" -m conntrack --ctstate INVALID -j DROP
-A cali-fw-cali091fd1acd82 -m comment --comment "cali:vySNraYuHVkcwzZC" -m conntrack --ctstate INVALID -j DROP
-A cali-fw-cali0945b5ec7e6 -m comment --comment "cali:YpO6T4K2fN2biMqp" -m conntrack --ctstate INVALID -j DROP
-A cali-fw-cali09725d6075c -m comment --comment "cali:3Q23jKsPGkXWWHjs" -m conntrack --ctstate INVALID -j DROP
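In calico's case, however, this behavior can be turned off with the FELIX_DISABLECONNTRACKINVALIDCHECK environment variable.
To tell whether a node is actually affected, the iptables hit counters are one way to observe it: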
iptables -w 3 -L --line -nv|grep DROP|sort -rn -k 2|head -n 10
[root@gzu-prd ~]# iptables -w 3 -L --line -nv|grep DROP|sort -rn -k 2|head -n 10
2 19020 773K DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:kRQn4VHUEHOpigCm */ ctstate INVALID
2 15617 937K DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:DTf_pGZFWLZaqlg8 */ ctstate INVALID
2 7068 283K DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:HGKygSKf4SfkbRyf */ ctstate INVALID
2 3845 154K DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:t5nJs-UfMTVjRtBI */ ctstate INVALID
2 2312 139K DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:h3VJGUlERuK34Tcz */ ctstate INVALID
2 2115 110K DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:dTQ4mHZc378Z1e33 */ ctstate INVALID
2 1828 110K DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:kp1Tzme9aWaPgdKP */ ctstate INVALID
2 1556 62240 DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:VaeGtNK_681jKlg9 */ ctstate INVALID
2 1330 69160 DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:meQqPUz96UN62T8l */ ctstate INVALID
2 1025 53300 DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:mIIn1Wh34t2SZwbR */ ctstate INVALID
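For trending or alerting, the per-rule counters can be collapsed into a single number. A minimal sketch, assuming the column layout shown above (num, pkts, bytes, target, ...); the helper name `sum_invalid_drops` is hypothetical. Adding `-x` to the iptables call is suggested so that large counters are printed exactly instead of being abbreviated with K/M suffixes, which the sum would misread.

```shell
#!/bin/sh
# Sketch: total the packet counters of every "ctstate INVALID" DROP rule.
# Reads `iptables -L -nvx --line-numbers` output on stdin.
# Field 2 is the packet counter, field 4 is the rule target.
sum_invalid_drops() {
  awk '$4 == "DROP" && /ctstate INVALID/ { total += $2 } END { print total + 0 }'
}

# Hypothetical usage on a node (needs root):
#   iptables -w 3 -L -nvx --line-numbers | sum_invalid_drops
```

Sampling this value twice, some seconds apart, shows whether INVALID drops are still occurring or whether the counters are only historical.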
If you want to avoid this without changing kube-proxy or calico-node parameters, a quick-and-dirty option is to deploy a DaemonSet in the cluster:
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: iptables-conntrack-hacker
  namespace: kube-system
  labels:
    app: iptables-conntrack
spec:
  selector:
    matchLabels:
      app: iptables-conntrack-hacker
  template:
    metadata:
      name: iptables-conntrack-hacker
      labels:
        app: iptables-conntrack-hacker
    spec:
      volumes:
        - name: lib-modules
          hostPath:
            path: /lib/modules
            type: ''
        - name: xtables-lock
          hostPath:
            path: /run/xtables.lock
            type: ''
      containers:
        - name: iptables-conntrack-hacker
          image: 'your-registry-address/kube-system/kube-proxy:v1.18.20'
          command:
            - /bin/sh
            - '-c'
            - |
              export TZ=Asia/Shanghai;
              echo "$(date) postStart ...";
              iptables -C FORWARD -w 10 -m conntrack -m comment --comment "To avoid invalid tcp traffic dropped by kubelet" --ctstate INVALID -j ACCEPT || \
              iptables -I FORWARD -w 10 -m conntrack -m comment --comment "To avoid invalid tcp traffic dropped by kubelet" --ctstate INVALID -j ACCEPT && echo "Add iptables rules ...";
              iptables -w 10 -L FORWARD --line -nv|grep INV;
              tail -f /dev/stdout;
          resources:
            limits:
              cpu: 250m
              memory: 256Mi
            requests:
              cpu: 1m
              memory: 1Mi
          volumeMounts:
            - name: lib-modules
              mountPath: /lib/modules
            - name: xtables-lock
              mountPath: /run/xtables.lock
          lifecycle:
            postStart:
              exec:
                command:
                  - /bin/sh
                  - '-c'
                  - |
                    sleep 10;
            preStop:
              exec:
                command:
                  - /bin/sh
                  - '-c'
                  - |
                    export TZ=Asia/Shanghai;
                    echo "$(date) preStop delete iptables rules ..." > /proc/1/fd/1 2>&1;
                    iptables -D FORWARD -w 10 -m conntrack -m comment --comment "To avoid invalid tcp traffic dropped by kubelet" --ctstate INVALID -j ACCEPT > /proc/1/fd/1 2>&1;
                    sleep 10;
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true
            runAsUser: 0
      restartPolicy: Always
      terminationGracePeriodSeconds: 5
      dnsPolicy: ClusterFirstWithHostNet
      hostNetwork: true
      securityContext: {}
      schedulerName: default-scheduler
      tolerations:
        - key: CriticalAddonsOnly
          operator: Exists
        - operator: Exists
          effect: NoExecute
        - operator: Exists
          effect: NoSchedule
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 50%
  revisionHistoryLimit: 2
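This DaemonSet touches the host's iptables only at startup, bluntly inserting a single INVALID ACCEPT rule.
If you have the means, change it to an infinite loop that checks every 10 to 30 seconds whether the ACCEPT rule still exists in iptables and inserts it again if it does not.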
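The periodic check could look roughly like the sketch below, which would replace the one-shot commands in the container's startup script. The function names and the 15-second default interval are assumptions for illustration; the rule specification and comment string are kept identical to the DaemonSet above so that `-C` matches what `-I` inserts.

```shell
#!/bin/sh
# Sketch: keep the INVALID ACCEPT rule present in FORWARD.
# The comment string must be identical for -C and -I, or the check never matches.
RULE_COMMENT="To avoid invalid tcp traffic dropped by kubelet"

# Insert the rule at the top of FORWARD only when it is missing.
ensure_invalid_accept() {
  if iptables -w 10 -C FORWARD -m conntrack --ctstate INVALID \
       -m comment --comment "$RULE_COMMENT" -j ACCEPT 2>/dev/null; then
    return 0
  fi
  iptables -w 10 -I FORWARD -m conntrack --ctstate INVALID \
    -m comment --comment "$RULE_COMMENT" -j ACCEPT && echo "$(date) rule inserted"
}

# Re-check forever; $1 is the interval in seconds (hypothetical default: 15).
watch_invalid_accept() {
  while true; do
    ensure_invalid_accept
    sleep "${1:-15}"
  done
}
```

Calling `watch_invalid_accept 20` as the last command of the container script would also keep the container alive, so the trailing `tail -f` would no longer be needed.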
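Note that this DaemonSet has one more prerequisite: if the overlay CNI is calico, confirm that calico-node operates on iptables in append mode.
Set the FELIX_CHAININSERTMODE environment variable to Append; otherwise the jump to the cali-FORWARD chain is inserted at the very top of the FORWARD chain, ahead of the INVALID ACCEPT rule, which makes that rule ineffective.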
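Whether the ordering is right on a given node can be checked from the rule listing. A minimal sketch, assuming `iptables -S FORWARD` style input; the helper name `invalid_accept_is_first` is hypothetical. It succeeds only when an INVALID ACCEPT rule appears before the first jump to cali-FORWARD or KUBE-FORWARD.

```shell
#!/bin/sh
# Sketch: verify rule order in the FORWARD chain.
# Reads `iptables -S FORWARD` output on stdin.
# Exit status 0: an INVALID ACCEPT rule precedes the first jump to
# cali-FORWARD / KUBE-FORWARD (or no such jump exists). Otherwise 1.
invalid_accept_is_first() {
  awk '
    /-j (cali-FORWARD|KUBE-FORWARD)( |$)/ && !jump { jump = NR }
    /--ctstate INVALID/ && /-j ACCEPT( |$)/ && !accept { accept = NR }
    END { exit !(accept && (!jump || accept < jump)) }
  '
}

# Hypothetical usage on a node (needs root):
#   iptables -w 3 -S FORWARD | invalid_accept_is_first && echo "order OK"
```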
Related
kube-proxy(v1.18.20) code: https://github.com/kubernetes/kubernetes/blob/1f3e19b7beb1cc0110255668c4238ed63dadb7ad/pkg/proxy/iptables/proxier.go#L1503-L1511
calico v3.16 config(FELIX_DISABLECONNTRACKINVALIDCHECK): https://docs.tigera.io/archive/v3.16/reference/felix/configuration