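Notes on latency spikes and intermittent connection drops caused by the iptables INVALID DROP rule in K8s
Environment prerequisites
- kube-proxy is deployed and running in iptables mode
- Optional: Calico as the CNI
Possible triggers and resulting symptoms
- Overlay pods communicating with services outside the cluster
- Traffic between the underlay and overlay networks (outbound via the overlay, return via the underlay, which results in asymmetric routing)
- Possibly conntrack saturation (the conntrack table is full)
Any of these can produce occasional latency spikes or intermittent connection drops.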
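To judge whether conntrack saturation is a plausible trigger on a given node, the current table usage can be compared with its limit. The sketch below is only an illustration: the function name `conntrack_usage_pct` is made up for this note, and it assumes the standard `/proc/sys/net/netfilter/nf_conntrack_count` and `nf_conntrack_max` files are present on the node.

```shell
#!/bin/sh
# Print conntrack table usage as an integer percentage.
# Both arguments are optional and exist mainly so the function can be
# pointed at test files instead of the real /proc entries.
conntrack_usage_pct() {
    count_file="${1:-/proc/sys/net/netfilter/nf_conntrack_count}"
    max_file="${2:-/proc/sys/net/netfilter/nf_conntrack_max}"
    count=$(cat "$count_file") || return 1
    max=$(cat "$max_file") || return 1
    # Guard against division by zero.
    [ "$max" -gt 0 ] || return 1
    echo $(( count * 100 / max ))
}
```

Running `conntrack_usage_pct` on a node prints a number between 0 and 100. A value that stays close to 100 makes saturation a candidate worth checking further; a low value points to the other triggers listed above.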
In the KUBE-FORWARD chain of the filter table, which kube-proxy maintains, there is a rule -A KUBE-FORWARD -m conntrack --ctstate INVALID -j DROP
[root@gzu-prd ~]# iptables -L KUBE-FORWARD --line -nv
Chain KUBE-FORWARD (1 references)
num   pkts bytes target     prot opt in     out     source               destination
1        0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate INVALID
2        4   240 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes forwarding rules */ mark match 0x4000/0x4000
3    11412   33M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
4        0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED
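This rule causes any traffic that connection tracking marks as INVALID to be dropped. In this kube-proxy version (v1.18.20) the behavior cannot be disabled through configuration; the only way is to change the code and recompile.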
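Whether this rule is actually dropping packets can be read from its packet counter. The following is a minimal sketch, not part of the original setup: the helper name `invalid_drop_pkts` is hypothetical, and it assumes the input is `iptables -nvx -L KUBE-FORWARD` output without `--line`, so that the first column is the packet count.

```shell
#!/bin/sh
# Read `iptables -nvx -L <chain>` output on stdin and print the total packet
# count of DROP rules that match ctstate INVALID.
# Column layout assumed: pkts bytes target prot opt in out source destination ...
invalid_drop_pkts() {
    awk '$3 == "DROP" && /ctstate INVALID/ { sum += $1 } END { print sum + 0 }'
}
```

On a node this could be used as `iptables -w 10 -nvx -L KUBE-FORWARD | invalid_drop_pkts`. Sampling it twice, a few minutes apart, shows whether the counter is growing while the latency spikes or drops occur.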
The connection tracking state of TCP flows can be inspected with conntrack -L or cat /proc/net/nf_conntrack (look for flags such as [UNREPLIED]).
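For a quick overview instead of reading the raw table, the entries can be summarized. This is a rough sketch under assumptions: the function names are invented for this note, and the field positions assume the usual `/proc/net/nf_conntrack` layout, where a TCP entry starts with `ipv4 2 tcp 6 <timeout> <STATE> ...` (the file may be absent on kernels built without the conntrack procfs interface).

```shell
#!/bin/sh
# Count TCP conntrack entries per TCP state (field 6 of each tcp line).
# The optional argument is the file to read; it defaults to the live table.
tcp_state_counts() {
    awk '$3 == "tcp" { n[$6]++ } END { for (s in n) print s, n[s] }' \
        "${1:-/proc/net/nf_conntrack}" | sort
}

# Count entries of any protocol that carry the [UNREPLIED] flag.
unreplied_count() {
    grep -cF '[UNREPLIED]' "${1:-/proc/net/nf_conntrack}"
}
```

`tcp_state_counts` prints one line per state with its count, and `unreplied_count` prints a single number. A large or growing number of [UNREPLIED] entries towards one destination is a hint that replies are not coming back on the tracked path, which fits the asymmetric routing case.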
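Whenever endpoints change, kube-proxy bluntly flushes and rewrites its iptables rules, so simply inserting an ACCEPT rule into KUBE-FORWARD does not work: the rule disappears on the next flush.
Likewise, in the filter-table chains that Calico maintains, nearly every cali-fw-cali**** chain also contains a rule -m conntrack --ctstate INVALID -j DROP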
[root@gzu-prd ~]# iptables-save -t filter|grep INVALID
-A cali-fw-cali02fca994756 -m comment --comment "cali:Zgj-5PhkyRyRGc5v" -m conntrack --ctstate INVALID -j DROP
-A cali-fw-cali091fd1acd82 -m comment --comment "cali:vySNraYuHVkcwzZC" -m conntrack --ctstate INVALID -j DROP
-A cali-fw-cali0945b5ec7e6 -m comment --comment "cali:YpO6T4K2fN2biMqp" -m conntrack --ctstate INVALID -j DROP
-A cali-fw-cali09725d6075c -m comment --comment "cali:3Q23jKsPGkXWWHjs" -m conntrack --ctstate INVALID -j DROP
In Calico's case, however, this behavior can be turned off with the FELIX_DISABLECONNTRACKINVALIDCHECK environment variable.
To avoid the problem without changing kube-proxy or calico-node parameters, a quick-and-dirty option is to deploy a DaemonSet in the cluster:
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: iptables-conntrack-hacker
  namespace: kube-system
  labels:
    app: iptables-conntrack
spec:
  selector:
    matchLabels:
      app: iptables-conntrack-hacker
  template:
    metadata:
      name: iptables-conntrack-hacker
      labels:
        app: iptables-conntrack-hacker
    spec:
      volumes:
        - name: lib-modules
          hostPath:
            path: /lib/modules
            type: ''
        - name: xtables-lock
          hostPath:
            path: /run/xtables.lock
            type: ''
      containers:
        - name: iptables-conntrack-hacker
          image: 'your-registry-address/kube-system/kube-proxy:v1.18.20'
          command:
            - /bin/sh
            - '-c'
            - |
              export TZ=Asia/Shanghai;
              echo "$(date) postStart ...";
              iptables -C FORWARD -w 10 -m conntrack -m comment --comment "To avoid invalid tcp traffic dropped by kubelet" --ctstate INVALID -j ACCEPT || \
              iptables -I FORWARD -w 10 -m conntrack -m comment --comment "To avoid invalid tcp traffic dropped by kubelet" --ctstate INVALID -j ACCEPT && echo "Add iptables rules ...";
              iptables -w 10 -L FORWARD --line -nv|grep INV;
              tail -f /dev/stdout;
          resources:
            limits:
              cpu: 250m
              memory: 256Mi
            requests:
              cpu: 1m
              memory: 1Mi
          volumeMounts:
            - name: lib-modules
              mountPath: /lib/modules
            - name: xtables-lock
              mountPath: /run/xtables.lock
          lifecycle:
            postStart:
              exec:
                command:
                  - /bin/sh
                  - '-c'
                  - |
                    sleep 10;
            preStop:
              exec:
                command:
                  - /bin/sh
                  - '-c'
                  - |
                    export TZ=Asia/Shanghai;
                    echo "$(date) preStop delete iptables rules ..." > /proc/1/fd/1 2>&1;
                    iptables -D FORWARD -w 10 -m conntrack -m comment --comment "To avoid invalid tcp traffic dropped by kubelet" --ctstate INVALID -j ACCEPT > /proc/1/fd/1 2>&1;
                    sleep 10;
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true
            runAsUser: 0
      restartPolicy: Always
      terminationGracePeriodSeconds: 5
      dnsPolicy: ClusterFirstWithHostNet
      hostNetwork: true
      securityContext: {}
      schedulerName: default-scheduler
      tolerations:
        - key: CriticalAddonsOnly
          operator: Exists
        - operator: Exists
          effect: NoExecute
        - operator: Exists
          effect: NoSchedule
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 50%
  revisionHistoryLimit: 2
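This DaemonSet touches the host's iptables only once, at startup, when it bluntly inserts an INVALID ACCEPT rule at the top of the FORWARD chain.
If you have the means, change it into an infinite loop that checks every 10 to 30 seconds whether the ACCEPT rule still exists and re-inserts it if it does not.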
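A minimal sketch of such a loop follows. It is one possible shape, not a tested replacement for the container command: the function names, the `IPTABLES` override and the 15-second default interval are choices made for this note. The rule specification and comment string are kept identical to the DaemonSet's, so the existing preStop `iptables -D` still matches the rule.

```shell
#!/bin/sh
# Assumed to run as root in the host network namespace, like the DaemonSet.
# IPTABLES can be overridden (for example with a stub) for testing.
IPTABLES="${IPTABLES:-iptables}"
RULE_COMMENT="To avoid invalid tcp traffic dropped by kubelet"

# Insert the INVALID ACCEPT rule at the top of FORWARD only when the
# check (-C) reports that it is missing.
ensure_invalid_accept() {
    if ! "$IPTABLES" -w 10 -C FORWARD -m conntrack -m comment \
            --comment "$RULE_COMMENT" --ctstate INVALID -j ACCEPT 2>/dev/null; then
        "$IPTABLES" -w 10 -I FORWARD -m conntrack -m comment \
            --comment "$RULE_COMMENT" --ctstate INVALID -j ACCEPT &&
            echo "$(date) inserted INVALID ACCEPT rule"
    fi
}

# Re-check forever. $1 is the interval in seconds (default 15, within the
# 10 to 30 second range suggested above).
watch_invalid_accept() {
    interval="${1:-15}"
    while true; do
        ensure_invalid_accept
        sleep "$interval"
    done
}
```

In the container command, `watch_invalid_accept 15` would take the place of the one-shot insert and the trailing `tail -f /dev/stdout`, since the loop itself keeps the container running.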
Related
kube-proxy(v1.18.20) code: https://github.com/kubernetes/kubernetes/blob/1f3e19b7beb1cc0110255668c4238ed63dadb7ad/pkg/proxy/iptables/proxier.go#L1503-L1511
calico v3.16 config(FELIX_DISABLECONNTRACKINVALIDCHECK): https://docs.tigera.io/archive/v3.16/reference/felix/configuration