Etcd command notes: difference between revisions
Revision as of 19:08, 17 November 2023 (Friday)
Switching the ETCDCTL API version
with docker exec
docker exec -e "ETCDCTL_API=2" etcd_container_name_or_id etcdctl --help
env
export ETCDCTL_API=2
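Troubleshooting
- If etcd is started from a k8s manifest, the endpoints can only be specified through the environment variable. Alternatively, unset the environment variable "ETCDCTL_ENDPOINTS" in the bash session and then pass the endpoints as a command-line flag. Otherwise etcdctl reports a configuration conflict: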
2021-10-19 03:25:28.520086 C | pkg/flags: conflicting environment variable "ETCDCTL_ENDPOINTS" is shadowed by corresponding command-line flag (either unset environment variable or disable flag)
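The second workaround can be sketched as a small wrapper. This is a minimal sketch, not the only way to do it: the function name `etcdctl_flag_endpoints` is hypothetical, and it assumes `etcdctl` is on the PATH of the shell where it runs (for example, a shell inside the etcd container).

```shell
# Hypothetical helper: run etcdctl with endpoints passed as a flag.
# The variable is unset only inside the subshell, so the caller's
# environment is left untouched.
etcdctl_flag_endpoints() {
    endpoints=$1
    shift
    (
        unset ETCDCTL_ENDPOINTS
        etcdctl --endpoints="$endpoints" "$@"
    )
}

# Example call (the address is a placeholder):
#   etcdctl_flag_endpoints https://127.0.0.1:2379 member list
```

Using a subshell keeps the manifest-provided value available to any later command that still relies on it.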
etcdctl v3 api
The environment variables available to etcdctl v3 are listed in the documentation: https://github.com/etcd-io/etcd/blob/main/etcdctl/README.md
Commonly used ones include:
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379,https://10.255.251.102:2379,https://10.255.251.103:2379
export ETCDCTL_CACERT=/etc/kubernetes/ssl/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/ssl/etcd/peer.crt
export ETCDCTL_KEY=/etc/kubernetes/ssl/etcd/peer.key
etcd member list (v3 API)
- etcdctl member list
[root@test kubelet]# docker exec -e "ETCDCTL_ENDPOINTS=https://192.168.150.12:12379,https://192.168.150.13:12379,https://192.168.150.14:12379" `docker ps | awk '/etcd /{print $1}'` etcdctl member list -w table
+------------------+---------+-------------------------+-----------------------------+------------------------------+------------+
|        ID        | STATUS  |          NAME           |         PEER ADDRS          |         CLIENT ADDRS         | IS LEARNER |
+------------------+---------+-------------------------+-----------------------------+------------------------------+------------+
| 8a8a69237e3e00ef | started | dce-etcd-192.168.150.12 | http://192.168.150.12:12380 | https://192.168.150.12:12379 |      false |
| d58cc05313738455 | started | dce-etcd-192.168.150.13 | http://192.168.150.13:12380 | https://192.168.150.13:12379 |      false |
| ed2566e796a749a6 | started | dce-etcd-192.168.150.14 | http://192.168.150.14:12380 | https://192.168.150.14:12379 |      false |
+------------------+---------+-------------------------+-----------------------------+------------------------------+------------+
etcd endpoint health (v3 API)
- etcdctl endpoint health
docker exec -e "ETCDCTL_API=3" -e "ETCDCTL_ENDPOINTS=https://192.168.155.22:12379,https://192.168.155.23:12379,https://192.168.155.24:12379" etcd_container_name_or_id etcdctl endpoint health -w table
+------------------------------+--------+--------------+-------+
|           ENDPOINT           | HEALTH |     TOOK     | ERROR |
+------------------------------+--------+--------------+-------+
| https://192.168.155.23:12379 |   true |    14.2308ms |       |
| https://192.168.155.22:12379 |   true |  14.572283ms |       |
| https://192.168.155.24:12379 |   true | 351.572429ms |       |
+------------------------------+--------+--------------+-------+
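When the health table is long, it can help to print only the endpoints that are not healthy. The filter below is a rough sketch that assumes the table layout shown above (ENDPOINT in the first column, HEALTH in the second); the function name `unhealthy_endpoints` is made up for illustration.

```shell
# Hypothetical helper: read `etcdctl endpoint health -w table` output on
# stdin and print every endpoint whose HEALTH column is not "true".
unhealthy_endpoints() {
    awk -F'|' '
        $2 ~ /:\/\// {                      # data rows contain a URL
            gsub(/ /, "", $2)               # strip table padding
            gsub(/ /, "", $3)
            if ($3 != "true") print $2
        }'
}

# Example call:
#   docker exec ... etcdctl endpoint health -w table | unhealthy_endpoints
```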
etcd endpoint status (v3 API)
- etcdctl endpoint status
docker exec -e "ETCDCTL_API=3" -e "ETCDCTL_ENDPOINTS=https://192.168.155.22:12379,https://192.168.155.23:12379,https://192.168.155.24:12379" etcd_container_name_or_id etcdctl endpoint status -w table
+------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|           ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.155.22:12379 | 5710b6824446f271 |   3.4.1 |   35 MB |     false |      false |        52 |    7925759 |            7925759 |        |
| https://192.168.155.23:12379 | 2a8509b66bfae6b6 |   3.4.1 |   35 MB |      true |      false |        52 |    7925759 |            7925759 |        |
| https://192.168.155.24:12379 |  72f4884011f8a2b |   3.4.1 |   35 MB |     false |      false |        52 |    7925760 |            7925760 |        |
+------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
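A quick sanity check on this output is that exactly one endpoint reports IS LEADER as true. The following is a sketch under the assumption that the column order matches the table above (IS LEADER is the fifth column); `count_leaders` is a hypothetical name.

```shell
# Hypothetical helper: read `etcdctl endpoint status -w table` output on
# stdin and print how many endpoints report themselves as leader.
count_leaders() {
    # With "|" as separator, $2 is ENDPOINT and $6 is IS LEADER.
    awk -F'|' '$2 ~ /:\/\// && $6 ~ /true/ { n++ } END { print n + 0 }'
}

# Example call:
#   docker exec ... etcdctl endpoint status -w table | count_leaders
```

A result other than 1 across all queried endpoints would be worth investigating further.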
etcd cluster-health (v2 API)
The environment variables available to etcdctl v2 are listed in the documentation: https://github.com/etcd-io/etcd/blob/main/etcdctl/READMEv2.md
Commonly used etcdctl v2 environment variables:
ETCDCTL_ENDPOINT
ETCDCTL_CA_FILE
ETCDCTL_KEY_FILE
ETCDCTL_CERT_FILE
- etcdctl cluster-health
docker exec -e "ETCDCTL_API=2" -e "ETCDCTL_ENDPOINTS=https://192.168.155.22:12379,https://192.168.155.23:12379,https://192.168.155.24:12379" etcd_container_name_or_id etcdctl cluster-health
member 72f4884011f8a2b is healthy: got healthy result from https://192.168.155.24:12379
member 2a8509b66bfae6b6 is healthy: got healthy result from https://192.168.155.23:12379
member 5710b6824446f271 is healthy: got healthy result from https://192.168.155.22:12379
cluster is healthy
docker exec -it -e ETCDCTL_API=2 `docker ps | awk '/etcd /{print $1}'` etcdctl cluster-health
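Disk/network performance requirements
Official documentation: https://etcd.io/docs/v3.5/op-guide/hardware/
etcd is very sensitive to disk write latency. Typically 50 sequential IOPS (e.g., a 7200 RPM disk) is required. For heavily loaded clusters, 500 sequential IOPS (e.g., a typical local SSD or a high performance virtualized block device) is recommended. Note that most cloud providers publish concurrent IOPS rather than sequential IOPS; the published concurrent IOPS can be 10x greater than the sequential IOPS. To measure actual sequential IOPS, we suggest using a disk benchmarking tool such as diskbench or fio.
OpenShift Container Platform, recommended etcd practices: https://access.redhat.com/documentation/zh-cn/openshift_container_platform/4.10/html-single/scalability_and_performance/index#recommended-etcd-practices_recommended-host-practices
In terms of latency, etcd should run on a block device that can write 8000-byte blocks sequentially at a minimum of 50 IOPS. That corresponds to a latency of 20 ms per write, bearing in mind that etcd uses fdatasync to synchronize each write to the WAL. For heavily loaded clusters, 500 sequential IOPS of 8000 bytes (2 ms) is recommended. To measure these numbers, you can use a benchmarking tool such as fio.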
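Both quotes point to fio for measuring this. The wrapper below is a minimal sketch of such a run: it issues sequential writes and calls fdatasync after each one, using the 8000-byte block size from the quote above. The function name `run_wal_bench`, the 22m total size, and the job name `etcd-wal-test` are assumptions chosen for illustration, not values from the sources.

```shell
# Hypothetical helper: sequential-write benchmark with an fdatasync after
# every write, roughly imitating how etcd syncs its WAL.
#   $1  directory on the disk under test (must already exist)
#   $2  block size in bytes (defaults to 8000, as in the quote above)
run_wal_bench() {
    dir=$1
    bs=${2:-8000}
    fio --name=etcd-wal-test \
        --rw=write --ioengine=sync --fdatasync=1 \
        --directory="$dir" --size=22m --bs="$bs"
}

# Example call (the path is a placeholder):
#   run_wal_bench /var/lib/etcd-bench
```

The sync latency figures in fio's report can then be compared with the 20 ms (50 IOPS) and 2 ms (500 IOPS) targets. Run it against a scratch directory on the same device as the etcd data, not against the live data directory.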
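Jianshu, etcd performance testing and tuning: https://www.jianshu.com/p/f31ef5e7bdd0
An in-depth analysis of hard disk performance, part 2: https://bbs.pceva.com.cn/thread-24244-1-1.html
I/O latency and Queue Depth / Queue Length
I/O latency is the total time from the moment the controller issues an I/O command until the I/O completes. An old unwritten industry rule holds that I/O performance is acceptable to applications as long as I/O latency stays within 20 ms; above 20 ms, application performance suffers noticeably. (The JMF602 manages only single-digit IOPS on small-file random writes, which is why it feels sluggish.)
By that measure, the minimum IOPS a storage device must deliver is 1 s / 20 ms = 50 IOPS, so a mere 50 IOPS meets the requirement. A single 7200 RPM mechanical disk typically delivers around 80 IOPS. SSDs go far beyond that, and large storage systems that run N I/O channels in parallel can reach hundreds of thousands or even millions of IOPS.
But storage devices should not always be held to the minimum standard. When few I/Os arrive, I/O latency is also small. Take an Intel X25-M Gen2 34nm 80G SSD: even with an average latency of 0.1 ms, the IOPS per I/O channel is 1000 / 0.1 = 10000, yet the vendor rates this SSD at 35000 read IOPS. This leads to another concept: Queue Depth (also called queue length).
The controller does not send commands to the storage device one at a time but in batches; the target device executes the I/Os as a batch and then returns the data and results to the controller. As long as the device has enough capacity and throughput, then under light I/O load, handling one command takes almost the same time as handling several at once. The maximum number of commands the controller issues in a batch is set by the controller's Queue Depth. (Good SSD controllers generally support a queue depth of 32.)
Given any two of queue depth, IOPS, and I/O latency, the third can be derived: IOPS = (queue depth) / (I/O latency). In practice, I/O latency also rises as queue depth increases; the two reinforce each other. So as the number of I/Os grows, the device quickly reaches its maximum IOPS capacity, at which point I/O latency climbs steeply while IOPS increases only slowly. (The device gets indigestion.)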
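The relation IOPS = queue depth / I/O latency is easy to check against the numbers quoted above. The helper below is a small illustrative sketch (the name `iops_from_latency` is hypothetical); it takes latency in milliseconds and uses awk for the floating-point division.

```shell
# Hypothetical helper: IOPS = queue depth / I/O latency.
#   $1  queue depth
#   $2  I/O latency in milliseconds
iops_from_latency() {
    awk -v depth="$1" -v lat_ms="$2" \
        'BEGIN { printf "%.0f\n", depth * 1000 / lat_ms }'
}

iops_from_latency 1 20     # 20 ms at queue depth 1  -> 50
iops_from_latency 1 2      # 2 ms at queue depth 1   -> 500
iops_from_latency 1 0.1    # 0.1 ms at queue depth 1 -> 10000
```

The first two results match the 50 and 500 IOPS figures in the etcd and OpenShift guidance, and the third matches the per-channel figure for the SSD example. Under the same formula, the vendor's 35000 IOPS rating at 0.1 ms would correspond to an effective queue depth of 3.5, which is why a single-queue figure understates what the device can do.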