Miscellaneous notes on Elasticsearch APIs
Health
/_cat/health
/_cluster/health
Indices health
View index status filtered by health
/_cat/indices?help
/_cat/indices?health=red&v&s=store.size:desc,index
/_cat/indices?health=yellow&v&s=store.size:desc,index
/_cat/indices?health=green&v&s=store.size:desc,index
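When there are many indices, it can be easier to fetch the list once and group it by health in a script. The following is a minimal sketch, assuming the rows come from GET /_cat/indices?format=json&h=health,index; the sample rows and the function name are made up for illustration.

```python
from collections import defaultdict


def group_by_health(rows):
    """Group index names by the value of their health column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["health"]].append(row["index"])
    # Sort names so the output is stable regardless of input order.
    return {health: sorted(names) for health, names in groups.items()}


# Hypothetical rows shaped like the JSON output of the cat indices API.
sample_rows = [
    {"health": "green", "index": "logs-000002"},
    {"health": "red", "index": "logs-000003"},
    {"health": "green", "index": "logs-000001"},
]

print(group_by_health(sample_rows))
```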
Nodes
/_cat/nodes?v
Show per-node disk usage, shard counts, and so on
/_cat/allocation?v
/_cat/nodeattrs
Get master node
/_cat/master?v
Useful for locating a shard's state and finding out why a shard failed
/_cat/shards/index_name-*?v&s=state,index&h=index,shard,prirep,state,docs,store,ip,node,unassigned.reason
/_cluster/allocation/explain
Shards
Rough overview of shards, especially which node each shard is on, plus its size and state
GET /_cat/shards
GET /_cat/shards?index=index_name
GET /_cat/shards?index=index_na*
Find out why shard allocation failed
/_cat/shards/index_name-*?v&s=state,index&h=index,shard,prirep,state,docs,store,ip,node,unassigned.reason
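To summarize the unassigned.reason column instead of reading it row by row, the same request can be made with format=json and counted in a script. This is a sketch under the assumption that each JSON row uses the column names from the h= parameter as keys; the sample rows are invented.

```python
from collections import Counter


def unassigned_reasons(rows):
    """Count unassigned shards per reason; rows without a reason count as UNKNOWN."""
    return Counter(
        row.get("unassigned.reason") or "UNKNOWN"
        for row in rows
        if row.get("state") == "UNASSIGNED"
    )


# Hypothetical rows shaped like GET /_cat/shards?format=json&h=index,shard,state,unassigned.reason
sample_shards = [
    {"index": "logs-000001", "shard": "0", "state": "STARTED", "unassigned.reason": None},
    {"index": "logs-000002", "shard": "0", "state": "UNASSIGNED", "unassigned.reason": "NODE_LEFT"},
    {"index": "logs-000002", "shard": "1", "state": "UNASSIGNED", "unassigned.reason": "ALLOCATION_FAILED"},
    {"index": "logs-000003", "shard": "0", "state": "UNASSIGNED", "unassigned.reason": "NODE_LEFT"},
]

for reason, count in unassigned_reasons(sample_shards).most_common():
    print(reason, count)
```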
Recovery API
Returns information about ongoing and completed shard recoveries, similar to the index recovery API.
For data streams, the API returns information about the stream’s backing indices.
It shows the shards that are currently relocating, along with each shard's progress as a percentage
GET /_cat/recovery?active_only=true&s=index&v
Adds a data stream or index to an alias, and sets the write index or data stream for the alias.
If the alias doesn’t exist, the add action creates it.
POST /_aliases
{
  "actions": [
    { "add": { "index": "es-k8s-logs-000020", "alias": "es-k8s-logs-alias", "is_write_index": true } }
  ]
}
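An alias can have only one write index, so moving the write flag from one index to another is usually done with two add actions in a single _aliases request. The sketch below only builds that request body; the function name and the old index name are hypothetical.

```python
import json


def switch_write_index(alias, old_index, new_index):
    """Build an _aliases body that moves the write flag from old_index to new_index."""
    return {
        "actions": [
            {"add": {"index": old_index, "alias": alias, "is_write_index": False}},
            {"add": {"index": new_index, "alias": alias, "is_write_index": True}},
        ]
    }


# "es-k8s-logs-000019" is an assumed previous write index, used only for illustration.
alias_body = switch_write_index("es-k8s-logs-alias", "es-k8s-logs-000019", "es-k8s-logs-000020")
print(json.dumps(alias_body, indent=2))
```

The printed JSON can be pasted as the body of POST /_aliases.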
https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-thread-pool.html
/_cluster/settings?pretty&include_defaults=true | grep processors
Get maximum number of threads info
curl "127.1:9200/_cat/thread_pool?v&h=ip,node_name,id,name,max,size,queue_size,queue,active,rejected&pretty"
Templates
/_cat/templates?v
⚠️ /_template/${template_name} is the legacy index template API. Legacy index templates are deprecated and are being replaced by the composable templates introduced in Elasticsearch 7.8.
Newer versions use /_index_template instead.
GET/PUT /_template/${template_name}
Use a template to change the replica settings of all indices (legacy index template)
Multiple index templates can potentially match an index. In that case, both the settings and mappings are merged into the final configuration of the index.
The order of the merging can be controlled using the order parameter, with lower orders being applied first and higher orders overriding them.
In legacy ES templates, order ranges from 0 to 2^31-1 (0 to 2147483647).
PUT /_template/${template_name}
{
  "order": 2147483647,
  "index_patterns": [ "*" ],
  "settings": { "index": { "number_of_replicas": "0" } }
}
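The merge rule for legacy templates (lower order applied first, higher order overriding) can be illustrated with a small simulation. This is a simplified model, not how Elasticsearch implements it: it ignores index_patterns matching and mappings, and the second sample template is invented.

```python
def deep_merge(base, override):
    """Return a new dict where values from override win, merging nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def effective_settings(templates):
    """Apply template settings from lowest to highest order (missing order treated as 0)."""
    result = {}
    for template in sorted(templates, key=lambda t: t.get("order", 0)):
        result = deep_merge(result, template.get("settings", {}))
    return result


sample_templates = [
    {"order": 2147483647, "settings": {"index": {"number_of_replicas": "0"}}},
    # Hypothetical lower-order template that would otherwise set one replica.
    {"order": 0, "settings": {"index": {"number_of_replicas": "1", "number_of_shards": "3"}}},
]

print(effective_settings(sample_templates))
```

Because the catch-all template uses the highest possible order, its number_of_replicas wins while other settings from lower-order templates are kept.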
Use jq to bulk-modify the lifecycle settings of ES index templates
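ILM (index lifecycle management)
As the name suggests, ILM manages the lifecycle of indices. It can also be used to build a hot-warm-cold architecture for an ES cluster.
The actions available in each phase are listed in the documentation.
Annoyingly, ILM currently has no endpoint like _cat/templates that lists only the names of the ILM policies configured in the cluster. You can only fetch the full definitions of all policies at once (as a workaround, use the collapsible JSON preview in the browser's F12 developer tools).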
GET /_ilm/policy
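Since the response of GET /_ilm/policy is a JSON object keyed by policy name, the names alone can be pulled out with a few lines of script. A minimal sketch, assuming the response has already been fetched and parsed; the sample response is abbreviated and its values are made up.

```python
def ilm_policy_names(response):
    """Return the policy names from a parsed GET /_ilm/policy response."""
    return sorted(response)


# Abbreviated, hypothetical response: top-level keys are the policy names.
sample_response = {
    "ilm-30d-delete": {"version": 1, "policy": {"phases": {"delete": {"min_age": "30d"}}}},
    "ilm-7d-delete": {"version": 3, "policy": {"phases": {"delete": {"min_age": "7d"}}}},
}

for name in ilm_policy_names(sample_response):
    print(name)
```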
Get a specific ILM policy's definition
GET /_ilm/policy/${ilm_name}
PUT /_ilm/policy/ilm-30d-delete
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "30d",
        "actions": { "delete": { "delete_searchable_snapshot": true } }
      }
    }
  }
}
Get an index's current ILM status
GET /${index_name}/_ilm/explain
Move an index to an ILM step (manually trigger execution of an ILM action)
https://www.elastic.co/guide/en/elasticsearch/reference/current/ilm-move-to-step.html
POST _ilm/move/my-index-000001
{
  "current_step": { "phase": "new", "action": "complete", "name": "complete" },
  "next_step": { "phase": "warm", "action": "forcemerge", "name": "forcemerge" }
}
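Manually create an index that is managed by a template and ILM (including daily rollover)
This covers indices that would normally be created by the template and ILM and whose name contains a date (a dynamic index name).
⚠️ Do not create such an index with a plain PUT /index-name-2022.10.23-000022. If you do, then when rollover fires (for example with rollover max_age: 1d), the new index still carries the date from the original creation (such as index-name-2022.10.23-000023) rather than the date on which the rollover happened.
You can confirm this by checking the index.provided_name property in GET /index-name/_settings.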
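Solution:
PUT %3Cindex-name-%7Bnow%2Fd%7D-000099%3E
In Kibana Dev Tools, do not URL-decode this. The request has to be sent percent-encoded exactly like this (decoded, it reads <index-name-{now/d}-000099>).
PS: if you are worried about creating the wrong name, run GET %3Cindex-name-%7Bnow%2Fd%7D-000099%3E/_settings to preview the resolved index name.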
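The percent-encoded form does not have to be typed by hand. As a small sketch, Python's urllib.parse.quote with an empty safe set produces the same string as the request above; the helper name is hypothetical.

```python
from urllib.parse import quote, unquote


def encode_date_math_name(name):
    """Percent-encode a date math index name, including the '/' inside {now/d}."""
    return quote(name, safe="")


encoded_name = encode_date_math_name("<index-name-{now/d}-000099>")
print(encoded_name)            # %3Cindex-name-%7Bnow%2Fd%7D-000099%3E
print(unquote(encoded_name))   # <index-name-{now/d}-000099>
```

Passing safe="" matters because quote leaves "/" unescaped by default, which would break the request path.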
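As for the trailing sequence number: regardless of the previous index's name, this number is always 6 characters and zero-padded. Even if the manually created index ends in -00001, after rollover the suffix becomes -000002.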
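The numbering rule can be sketched as follows. This only models the suffix behaviour described here (increment, then zero-pad to 6 digits) for a static index name; it does not model date math resolution, and the function name is made up.

```python
import re


def next_rollover_name(index_name):
    """Increment the trailing number of an index name and zero-pad it to 6 digits."""
    match = re.fullmatch(r"(.*-)(\d+)", index_name)
    if match is None:
        raise ValueError(f"index name does not end in -<number>: {index_name!r}")
    prefix, number = match.groups()
    return f"{prefix}{int(number) + 1:06d}"


print(next_rollover_name("index-name-00001"))               # index-name-000002
print(next_rollover_name("index-name-2022.10.23-000022"))   # index-name-2022.10.23-000023
```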
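Troubleshooting: changing an existing index to reference a different ILM policy, or editing the phase definitions in the policy it references, can sometimes break ILM for that index.
ILM processing for the index may then fail (I don't remember how the alert surfaces). If I recall correctly, GET {index_name}/_ilm/explain shows the error information, including the phase in which it is stuck.
In that case, fix it by manually moving the index to another ILM step, for example: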
POST _ilm/move/insight-es-k8s-logs-dce5-aliyun-default-prd-2024.01.16
{
  "current_step": { "phase": "hot", "action": "rollover", "name": "ERROR" },
  "next_step": { "phase": "cold" }
}
Alternatively, try retrying through the POST {index_name}/_ilm/retry endpoint.
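The current_step of a move request has to match what the explain API reports for the index. The sketch below derives the request body from one per-index entry of the explain response, assuming that entry exposes phase, action and step fields; the sample entry and the function name are made up.

```python
import json


def build_move_request(explain_entry, next_phase):
    """Build an _ilm/move body from one index entry of GET <index>/_ilm/explain."""
    current_step = {
        "phase": explain_entry["phase"],
        "action": explain_entry["action"],
        "name": explain_entry["step"],
    }
    return {"current_step": current_step, "next_step": {"phase": next_phase}}


# Hypothetical entry, as found under "indices" -> "<index name>" in the explain response.
sample_entry = {
    "index": "my-index-000001",
    "managed": True,
    "phase": "hot",
    "action": "rollover",
    "step": "ERROR",
}

move_body = build_move_request(sample_entry, "cold")
print(json.dumps(move_body, indent=2))
```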
/_cluster/settings?include_defaults=true&pretty
/_cluster/settings?include_defaults=true
Wildcard expressions or all indices are not allowed
Allow deleting indices by wildcard
PUT /_cluster/settings
{
  "persistent": { "action": { "destructive_requires_name": "false" } }
}
primaries recovery settings
Control the concurrency of shard recovery and relocation. A null value resets the setting to its default.
PUT /_cluster/settings
{ "transient": { "cluster": { "routing": { "allocation": { "node_initial_primaries_recoveries": 10, "node_concurrent_incoming_recoveries": null, "node_concurrent_outgoing_recoveries": null, "node_concurrent_recoveries": 20 } } } } }
PUT /_cluster/settings
{ "transient": { "cluster": { "routing": { "allocation": { "node_initial_primaries_recoveries": null, "node_concurrent_incoming_recoveries": null, "node_concurrent_recoveries": null } } } } }
PUT /_cluster/settings
{ "persistent": { "cluster": { "routing": { "allocation": { "node_initial_primaries_recoveries": 30, "node_concurrent_incoming_recoveries": null, "node_concurrent_recoveries": 10 } } } } }
cluster.routing.allocation.node_concurrent_recoveries: A shortcut to set both cluster.routing.allocation.node_concurrent_incoming_recoveries and cluster.routing.allocation.node_concurrent_outgoing_recoveries.
PUT /_cluster/settings
{ "persistent": { "cluster": { "routing": { "allocation": { "node_concurrent_recoveries": 8 } } } } }
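The request bodies for these settings differ only in scope and values, so they can be generated. A minimal sketch using the flat dotted key form of the setting names; the helper name is hypothetical, and None is serialized as null, which resets a setting to its default.

```python
import json

ALLOCATION_PREFIX = "cluster.routing.allocation."


def recovery_settings(scope="transient", **values):
    """Build a /_cluster/settings body for cluster.routing.allocation.* settings."""
    if scope not in ("transient", "persistent"):
        raise ValueError(f"unknown scope: {scope!r}")
    return {scope: {ALLOCATION_PREFIX + name: value for name, value in values.items()}}


# Raise concurrency during recovery.
raise_body = recovery_settings(node_initial_primaries_recoveries=10, node_concurrent_recoveries=20)
# Reset to the default afterwards (None becomes null in JSON).
reset_body = recovery_settings(node_concurrent_recoveries=None)

print(json.dumps(raise_body))
print(json.dumps(reset_body))
```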
recovery.max_bytes_per_sec: change the recovery transfer rate limit used during relocation
Raising this value can effectively shorten the time ES takes to relocate indices.
indices.recovery.max_bytes_per_sec: Limits total inbound and outbound recovery traffic for each node. Applies to both peer recoveries as well as snapshot recoveries (i.e., restores from a snapshot). Defaults to 40mb unless the node is a dedicated cold or frozen node, in which case the default relates to the total memory available to the node.
Index settings
Modify the number of replicas in bulk
Set the number of replicas for one index or for many
PUT /index_name*/_settings
{ "index": { "number_of_replicas": 1 } }
Search Documents
match_all search
GET /sw_segment-20230914/_search
{ "query": { "match_all": {} }, "size": 1 }
Match search on a single field, with sorting
GET /sw_segment-20230914/_search
{
  "query": { "match": { "segment_id": "b7bb26fae59e4f45b101346cb83ff796.69.16946808855979526" } },
  "sort": [ { "start_time": { "order": "desc" } } ],
  "size": 1
}
Elastic Cloud on Kubernetes (ECK / Elastic operator)
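For an Elasticsearch cluster managed by the ECK operator, changing cluster.routing.allocation.exclude settings requires first adding the annotation 'eck.k8s.elastic.co/managed=false' to the Elasticsearch resource. Otherwise the operator reverts the setting shortly after you apply it.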
Other
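For an ES cluster with a large number of indices that has just restarted (10,000 to 20,000 primary shards)
Speed up cluster recovery
Watch the node resource dashboards for CPU pressure and CPU I/O wait.
Use the update cluster settings API to dynamically raise node_initial_primaries_recoveries (defaults to 4) and node_concurrent_recoveries (a shortcut to set both cluster.routing.allocation.node_concurrent_incoming_recoveries and cluster.routing.allocation.node_concurrent_outgoing_recoveries; defaults to 2) as appropriate.
Query cluster settings with include_defaults=true and filter the output to find the current values.
To shorten the time from red to yellow: increase the number of index replicas and raise node_initial_primaries_recoveries.
To shorten the time from yellow to green: raise node_concurrent_recoveries.
Call /_cluster/allocation/explain to find out what is blocking the cluster from reaching green (or yellow).
ES nodes evicted by k8s during cluster recovery because of node memory pressure (node was low on resources: memory.)
Reduce the JVM heap settings and avoid overcommitting (keep requests and limits as close as possible, or raise requests).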
Error
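Cluster shard count reached the maximum
When the cluster shard count reaches the maximum, a log entry like the following appears, but the cluster health status does not change.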
2022-11-10T10:26:03.643184618Z org.elasticsearch.common.ValidationException: Validation Failed: 1: this action would add [3] shards, but this cluster currently has [1999]/[2000] maximum normal shards open;
Solution:
Adjust the index ILM policy, or raise the cluster's max_shards_per_node setting.
Transient setting:
curl -H "content-type: application/json" -X PUT "127.0.0.1:9200/_cluster/settings" -d '{"transient": {"cluster.max_shards_per_node": "5000"}}'
Persistent setting:
curl -H "content-type: application/json" -X PUT "127.0.0.1:9200/_cluster/settings" -d '{"persistent": {"cluster.max_shards_per_node": "5000"}}'
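To see how close the cluster is to the limit, the numbers can be extracted from the validation message shown in the log above. The regular expression is written against that one sample message, so treat it as an assumption about the message format rather than a stable contract. The cluster-wide limit is cluster.max_shards_per_node multiplied by the number of data nodes, so a limit of 2000 would be consistent with the default of 1000 on two data nodes, though that is an inference.

```python
import re

SHARD_LIMIT_PATTERN = re.compile(
    r"would add \[(\d+)\] shards, but this cluster currently has \[(\d+)\]/\[(\d+)\] maximum"
)


def parse_shard_limit_error(message):
    """Extract shard counts from a 'maximum normal shards open' validation error."""
    match = SHARD_LIMIT_PATTERN.search(message)
    if match is None:
        return None
    adding, current, maximum = (int(group) for group in match.groups())
    return {
        "adding": adding,
        "current": current,
        "maximum": maximum,
        "headroom": maximum - current,
    }


log_line = (
    "org.elasticsearch.common.ValidationException: Validation Failed: 1: this action "
    "would add [3] shards, but this cluster currently has [1999]/[2000] maximum normal shards open;"
)
print(parse_shard_limit_error(log_line))
```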