Etcd command notes

From 三线的随记
Revision as of 19:08, 17 November 2023 (Fri) by Admin

Switching the etcdctl API version

with docker exec

docker exec -e "ETCDCTL_API=2" etcd_container_name_or_id etcdctl --help

env

export ETCDCTL_API=2
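The API version can also be set for a single command by prefixing the variable, which leaves the shell environment unchanged. Below is a minimal sketch, assuming bash. The wrapper names `etcdctl2` and `etcdctl3` are hypothetical helpers, not part of etcd:

```shell
# Hypothetical convenience wrappers: pin the etcdctl API version for
# one invocation only; ETCDCTL_API is not left set in the calling shell.
etcdctl2() { ETCDCTL_API=2 etcdctl "$@"; }
etcdctl3() { ETCDCTL_API=3 etcdctl "$@"; }

# Usage:
#   etcdctl2 cluster-health
#   etcdctl3 endpoint health -w table
```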

Troubleshooting

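  • If etcd was started from a Kubernetes manifest, the endpoints can only be specified through the environment variable. Alternatively, unset the environment variable "ETCDCTL_ENDPOINTS" in the bash session first and then pass the endpoints as a command-line flag. If both are set, etcdctl reports a configuration conflict: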
2021-10-19 03:25:28.520086 C | pkg/flags: conflicting environment variable "ETCDCTL_ENDPOINTS" is shadowed by corresponding command-line flag (either unset environment variable or disable flag)
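For the second approach, here is a minimal sketch that runs etcdctl with the endpoints passed as a flag. The conflicting variable is removed for that one command only, because `env -u` drops it from the child process's environment. The helper name `etcdctl_with_endpoints` is hypothetical:

```shell
# Hypothetical helper: run etcdctl with --endpoints while making sure
# ETCDCTL_ENDPOINTS is absent from its environment, so pkg/flags does
# not report the "conflicting environment variable" error.
# $1 = comma-separated endpoint list; remaining args go to etcdctl.
etcdctl_with_endpoints() {
  local endpoints="$1"
  shift
  env -u ETCDCTL_ENDPOINTS etcdctl --endpoints="$endpoints" "$@"
}

# Usage:
#   etcdctl_with_endpoints https://127.0.0.1:2379 endpoint health
```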

etcdctl v3 API

The environment variables available to etcdctl v3 are documented at https://github.com/etcd-io/etcd/blob/main/etcdctl/README.md

Commonly used ones include:

export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379,https://10.255.251.102:2379,https://10.255.251.103:2379
export ETCDCTL_CACERT=/etc/kubernetes/ssl/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/ssl/etcd/peer.crt
export ETCDCTL_KEY=/etc/kubernetes/ssl/etcd/peer.key
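Each of these variables mirrors a command-line flag: ETCDCTL_ENDPOINTS maps to --endpoints, ETCDCTL_CACERT to --cacert, ETCDCTL_CERT to --cert, and ETCDCTL_KEY to --key. That is why setting both the variable and its flag triggers the conflict described under Troubleshooting. Below is a sketch that prints the flag form of whatever is currently exported, for example when switching from variables to flags. The function name is hypothetical and it only covers the four variables listed here:

```shell
# Hypothetical helper: print the etcdctl v3 flags equivalent to the
# ETCDCTL_* variables currently exported (only the four listed above).
# Naming rule: ETCDCTL_<NAME> -> --<name>, lowercased, "_" -> "-".
etcdctl_env_as_flags() {
  local var val name
  for var in ENDPOINTS CACERT CERT KEY; do
    val=$(printenv "ETCDCTL_$var") || continue   # skip unset variables
    name=$(printf '%s' "$var" | tr 'A-Z_' 'a-z-')
    printf -- '--%s=%s\n' "$name" "$val"
  done
}
```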
etcd member list (v3 API)
  • etcdctl member list
[root@test kubelet]# docker exec -e "ETCDCTL_ENDPOINTS=https://192.168.150.12:12379,https://192.168.150.13:12379,https://192.168.150.14:12379" `docker ps | awk '/etcd /{print $1}'` etcdctl member list -w table
+------------------+---------+-------------------------+-----------------------------+------------------------------+------------+
|        ID        | STATUS  |          NAME           |         PEER ADDRS          |         CLIENT ADDRS         | IS LEARNER |
+------------------+---------+-------------------------+-----------------------------+------------------------------+------------+
| 8a8a69237e3e00ef | started | dce-etcd-192.168.150.12 | http://192.168.150.12:12380 | https://192.168.150.12:12379 |      false |
| d58cc05313738455 | started | dce-etcd-192.168.150.13 | http://192.168.150.13:12380 | https://192.168.150.13:12379 |      false |
| ed2566e796a749a6 | started | dce-etcd-192.168.150.14 | http://192.168.150.14:12380 | https://192.168.150.14:12379 |      false |
+------------------+---------+-------------------------+-----------------------------+------------------------------+------------+
etcd endpoint health (v3 API)
  • etcdctl endpoint health
docker exec -e "ETCDCTL_API=3" -e "ETCDCTL_ENDPOINTS=https://192.168.155.22:12379,https://192.168.155.23:12379,https://192.168.155.24:12379" etcd_container_name_or_id etcdctl endpoint health -w table

+------------------------------+--------+--------------+-------+
|           ENDPOINT           | HEALTH |     TOOK     | ERROR |
+------------------------------+--------+--------------+-------+
| https://192.168.155.23:12379 |   true |    14.2308ms |       |
| https://192.168.155.22:12379 |   true |  14.572283ms |       |
| https://192.168.155.24:12379 |   true | 351.572429ms |       |
+------------------------------+--------+--------------+-------+
etcd endpoint status (v3 API)
  • etcdctl endpoint status
docker exec -e "ETCDCTL_API=3" -e "ETCDCTL_ENDPOINTS=https://192.168.155.22:12379,https://192.168.155.23:12379,https://192.168.155.24:12379" etcd_container_name_or_id etcdctl endpoint status -w table

+------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|           ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.155.22:12379 | 5710b6824446f271 |   3.4.1 |   35 MB |     false |      false |        52 |    7925759 |            7925759 |        |
| https://192.168.155.23:12379 | 2a8509b66bfae6b6 |   3.4.1 |   35 MB |      true |      false |        52 |    7925759 |            7925759 |        |
| https://192.168.155.24:12379 |  72f4884011f8a2b |   3.4.1 |   35 MB |     false |      false |        52 |    7925760 |            7925760 |        |
+------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
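When the table is long, the leader can be picked out with awk. Below is a minimal sketch that reads `endpoint status -w table` output on stdin and prints the endpoint whose IS LEADER column is true. It assumes the column order shown above, with ENDPOINT first and IS LEADER fifth. The function name is hypothetical:

```shell
# Hypothetical helper: print the leader endpoint from
# `etcdctl endpoint status -w table` output read on stdin.
# With -F'|', $2 is ENDPOINT and $6 is IS LEADER (field 1 is the
# empty text before the first "|"); "+----" border lines have no "|".
etcd_leader_from_table() {
  awk -F'|' '
    { gsub(/ /, "", $2); gsub(/ /, "", $6) }   # strip padding
    $6 == "true" { print $2 }
  '
}

# Usage:
#   etcdctl endpoint status -w table | etcd_leader_from_table
```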

etcd cluster-health (v2 API)

The environment variables available to etcdctl v2 are documented at https://github.com/etcd-io/etcd/blob/main/etcdctl/READMEv2.md

Commonly used environment variables for etcdctl v2:

ETCDCTL_ENDPOINT 
ETCDCTL_CA_FILE
ETCDCTL_KEY_FILE
ETCDCTL_CERT_FILE
  • etcdctl cluster-health
docker exec -e "ETCDCTL_API=2" -e "ETCDCTL_ENDPOINTS=https://192.168.155.22:12379,https://192.168.155.23:12379,https://192.168.155.24:12379" etcd_container_name_or_id etcdctl cluster-health
member 72f4884011f8a2b is healthy: got healthy result from https://192.168.155.24:12379
member 2a8509b66bfae6b6 is healthy: got healthy result from https://192.168.155.23:12379
member 5710b6824446f271 is healthy: got healthy result from https://192.168.155.22:12379
cluster is healthy
docker exec -it -e ETCDCTL_API=2 `docker ps | awk '/etcd /{print $1}'` etcdctl cluster-health
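For scripts, the v2 output can be reduced to a pass/fail check. Below is a sketch that reads `cluster-health` output on stdin and succeeds only when the last line is `cluster is healthy`, as in the sample above. It relies on that output text, not on any documented exit code, and the function name is hypothetical:

```shell
# Hypothetical helper: return 0 if the etcdctl v2 `cluster-health`
# output on stdin ends with "cluster is healthy", 1 otherwise.
# Any "member ..." line not reporting "is healthy:" goes to stderr.
etcd_v2_cluster_ok() {
  awk '
    /^member / && !/ is healthy:/ { print > "/dev/stderr" }
    { last = $0 }
    END { exit !(last == "cluster is healthy") }   # 0 when healthy
  '
}

# Usage:
#   etcdctl cluster-health | etcd_v2_cluster_ok && echo OK
```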

Disk/network performance requirements

official documentation

etcd is very sensitive to disk write latency. Typically 50 sequential IOPS (e.g., a 7200 RPM disk) is required. For heavily loaded clusters, 500 sequential IOPS (e.g., a typical local SSD or a high performance virtualized block device) is recommended. Note that most cloud providers publish concurrent IOPS rather than sequential IOPS; the published concurrent IOPS can be 10x greater than the sequential IOPS. To measure actual sequential IOPS, we suggest using a disk benchmarking tool such as diskbench or fio.

OpenShift Container Platform: recommended etcd practices

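In terms of latency, etcd should run on a block device that can sequentially write 8000-byte blocks at no less than 50 IOPS. That means a latency of at most 20 ms per write, keeping in mind that etcd uses fdatasync to synchronize each write to the WAL. For heavily loaded clusters, sequential 500 IOPS of 8000 bytes (2 ms per write) is recommended. To measure these numbers, you can use a benchmarking tool such as fio.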
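Below is a minimal sketch of such a measurement with fio. It approximates etcd's WAL pattern: sequential writes, each followed by fdatasync. The parameters are assumptions. The 8000-byte block size comes from the guidance above, the 22 MiB total size is arbitrary, and the target directory should be on the same disk as the etcd data directory. The helper name is hypothetical:

```shell
# Hypothetical helper: benchmark sequential write + fdatasync latency
# with fio in a scratch directory on the disk under test, then clean up.
# $1 = a directory on the same disk as the etcd data dir.
etcd_disk_bench() {
  if [ -z "$1" ]; then
    echo "usage: etcd_disk_bench <dir-on-etcd-disk>" >&2
    return 2
  fi
  local dir="$1/fio-etcd-test"
  mkdir -p "$dir" || return
  fio --name=etcd-wal-sim --directory="$dir" \
      --rw=write --ioengine=sync --fdatasync=1 \
      --bs=8000 --size=22m
  local rc=$?            # keep fio's exit status
  rm -rf "$dir"          # remove the scratch file fio created
  return "$rc"
}
```

Recent fio versions print a separate fsync/fdatasync latency section with percentiles. Those figures are the ones to compare against the 20 ms / 2 ms per-write numbers above.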

Jianshu (简书) - etcd performance testing and tuning


An in-depth analysis of hard-disk performance, part 2

IO latency and Queue Depth / Queue Length

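IO latency is the total time from when the controller issues an IO command until that IO completes. An unwritten industry rule used to be that IO performance is acceptable to applications as long as IO latency stays within 20 ms. Above 20 ms, application performance suffers significantly. (The JMF602's small-file random-write IOPS is in the single digits, which is why it feels so sluggish.)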

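By this reckoning, the minimum IOPS a storage device must deliver is 1 s / 20 ms = 50 IOPS, so a mere 50 IOPS satisfies the requirement. A single mechanical hard disk typically delivers around 80 IOPS (7200 RPM), and SSDs go far beyond that. Large storage arrays run N IO channels in parallel and can easily reach hundreds of thousands or even millions of IOPS.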

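However, a storage device should not always be held only to the minimum standard. When few IOs arrive, IO latency is also low. Take an Intel X25-M Gen2 34nm 80GB SSD: even at an average latency of 0.1 ms, each IO channel delivers IOPS = 1000 ms / 0.1 ms = 10000, yet the manufacturer rates the drive at 35000 read IOPS. This introduces another concept: Queue Depth (also called queue length).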
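The arithmetic in the two paragraphs above assumes a single IO in flight per channel, so the channel's IOPS is simply 1 s divided by the per-IO latency. Below is a small awk-based sketch; the function name is hypothetical:

```shell
# Hypothetical helper: IOPS of a single IO channel (one IO in flight),
# i.e. 1000 ms divided by the per-IO latency given in milliseconds.
iops_single_channel() {
  awk -v lat_ms="$1" 'BEGIN { printf "%g\n", 1000 / lat_ms }'
}

iops_single_channel 20    # 50    (the 20 ms acceptability threshold)
iops_single_channel 0.1   # 10000 (the X25-M example at 0.1 ms)
```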

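The controller does not send commands to the storage device one at a time but in batches. The target device executes the IOs in bulk and then returns the data and results to the controller. As long as the device's appetite and digestive capacity are large enough, processing one command takes almost the same time as processing several at once while the IO load is light. The maximum number of commands the controller sends in one batch is set by the controller's Queue Depth. (Good SSD controllers generally support a queue depth of 32.)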

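Given any two of queue depth, IOPS and IO latency, the third can be derived with the formula IOPS = (queue depth) / (IO latency). In practice, IO latency rises as queue depth increases, and the two reinforce each other. As the number of IOs grows, the device therefore quickly reaches its maximum IOPS capacity. From that point on, IO latency climbs steeply while IOPS grows only slowly. (Indigestion.)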
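The relation IOPS = queue depth / IO latency can be rearranged to solve for whichever quantity is unknown. Below is a sketch with latency expressed in milliseconds, hence the factor of 1000. The function names and example values are illustrative assumptions. The formula only describes the regime before latency starts rising with queue depth, as the paragraph above explains:

```shell
# Hypothetical helpers for IOPS = queue_depth / latency.
# Latency is given in milliseconds, so 1 s = 1000 ms.

# IOPS from queue depth ($1) and latency in ms ($2)
iops_from() { awk -v qd="$1" -v ms="$2" 'BEGIN { printf "%g\n", qd * 1000 / ms }'; }

# Queue depth (average IOs in flight) from IOPS ($1) and latency in ms ($2)
qd_from() { awk -v iops="$1" -v ms="$2" 'BEGIN { printf "%g\n", iops * ms / 1000 }'; }

# Latency in ms from IOPS ($1) and queue depth ($2)
latency_ms_from() { awk -v iops="$1" -v qd="$2" 'BEGIN { printf "%g\n", qd * 1000 / iops }'; }

iops_from 32 1            # 32000
qd_from 35000 0.1         # 3.5
latency_ms_from 500 1     # 2
```

For example, if latency really stayed at 0.1 ms (a hypothetical assumption), reaching the X25-M's rated 35000 IOPS would take about 3.5 IOs in flight on average. Likewise, 500 IOPS at queue depth 1 corresponds to the 2 ms per write quoted in the etcd guidance.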