Miscellaneous notes on Elasticsearch APIs

Health

/_cat/health

/_cluster/health

Indices health

View index status filtered by health

/_cat/indices?help
/_cat/indices?health=red&v&s=store.size:desc,index

/_cat/indices?health=yellow&v&s=store.size:desc,index

/_cat/indices?health=green&v&s=store.size:desc,index

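Note: when using the * wildcard to search or operate on indices, hidden data streams and hidden indices are not matched by default. To include them, pass the expand_wildcards=all parameter.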

-> https://www.elastic.co/guide/en/elasticsearch/reference/7.16/multi-index.html#hidden
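A minimal sketch of how the wildcard note applies to the _cat/indices requests above. The helper name cat_indices_path and its arguments are hypothetical; it only builds the request path and sends nothing to a cluster.

```python
from urllib.parse import urlencode


def cat_indices_path(pattern="*", health=None, include_hidden=False):
    """Build a /_cat/indices request path (hypothetical helper, no network I/O)."""
    params = {"v": "true", "s": "store.size:desc,index"}
    if health:
        params["health"] = health
    if include_hidden:
        # Without this, wildcard patterns skip hidden indices and data streams.
        params["expand_wildcards"] = "all"
    # Keep ':' and ',' readable, as in the hand-written requests above.
    return f"/_cat/indices/{pattern}?{urlencode(params, safe=':,')}"


if __name__ == "__main__":
    print(cat_indices_path(health="red", include_hidden=True))
```

The printed path can be pasted after GET in Kibana Dev Tools or appended to the cluster address in a curl call.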

Nodes

/_cat/nodes?v

View per-node disk usage, shard count, and so on

/_cat/allocation?v

/_cat/nodeattrs

Get master node

/_cat/master?v

Cluster allocation explain related

Useful for locating a shard's state and finding out why it failed

/_cat/shards/index_name-*?v&s=state,index&h=index,shard,prirep,state,docs,store,ip,node,unassigned.reason

/_cluster/allocation/explain

Shards

Get a rough overview of shards, especially which node each shard lives on and its size/state

GET /_cat/shards

GET /_cat/shards?index=index_name

GET /_cat/shards?index=index_na*

Find out why shard allocation failed

/_cat/shards/index_name-*?v&s=state,index&h=index,shard,prirep,state,docs,store,ip,node,unassigned.reason
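When many shards are unassigned, it can help to group them by reason. The sketch below assumes the same request was sent with format=json added, so that each row is a dict keyed by column name; the function name and the sample rows are made up for illustration.

```python
from collections import defaultdict


def unassigned_by_reason(rows):
    """Group UNASSIGNED shards from _cat/shards?format=json rows by unassigned.reason."""
    grouped = defaultdict(list)
    for row in rows:
        if row.get("state") != "UNASSIGNED":
            continue
        reason = row.get("unassigned.reason") or "UNKNOWN"
        grouped[reason].append(f'{row["index"]}[{row["shard"]}][{row["prirep"]}]')
    return dict(grouped)


if __name__ == "__main__":
    # Illustrative rows only, not output from a real cluster.
    sample = [
        {"index": "logs-1", "shard": "0", "prirep": "p", "state": "STARTED", "unassigned.reason": None},
        {"index": "logs-1", "shard": "0", "prirep": "r", "state": "UNASSIGNED", "unassigned.reason": "NODE_LEFT"},
        {"index": "logs-2", "shard": "1", "prirep": "r", "state": "UNASSIGNED", "unassigned.reason": "NODE_LEFT"},
    ]
    for reason, shards in unassigned_by_reason(sample).items():
        print(reason, shards)
```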

Recovery API

Returns information about ongoing and completed shard recoveries, similar to the index recovery API.

For data streams, the API returns information about the stream’s backing indices.

Shows the shards that are currently relocating, along with each shard's progress percentage

GET /_cat/recovery?active_only=true&s=index&v

Adds a data stream or index to an alias, and sets the write index or data stream for the alias.

If the alias doesn’t exist, the add action creates it.

POST /_aliases 
{
  "actions": [
      {
            "add": {
               "index": "es-k8s-logs-000020",    
               "alias": "es-k8s-logs-alias",    
               "is_write_index": true 
          }
      }
    ]
}

Thread pool related

https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-thread-pool.html

/_cluster/settings?pretty&include_defaults=true | grep processors

Get maximum number of threads info

curl "127.1:9200/_cat/thread_pool?v&h=ip,node_name,id,name,max,size,queue_size,queue,active,rejected&pretty"

Tasks

Overview

GET /_cat/tasks?v&s=action,node,start_time

Task details

GET /_tasks

GET /_tasks?group_by=parents&actions=*forcemerge*&detailed=true

Templates

/_cat/templates?v

⚠️ /_template/${template_name} manages legacy index templates, which are deprecated and will be replaced by the composable templates introduced in Elasticsearch 7.8.

Newer versions use /_index_template instead

GET/PUT /_template/${template_name}

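One clear difference between composable index templates (/_index_template) and legacy templates: legacy templates that match the same index are merged, with higher order values overriding lower ones, whereas for composable templates only the matching template with the highest priority takes effect. https://www.elastic.co/guide/en/elasticsearch/reference/7.17/index-templates.html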

If a new data stream or index matches more than one index template, the index template with the highest priority is used.

Use template to change the replicas settings of all indexes (Legacy index template)

Multiple index templates can potentially match an index, in this case, both the settings and mappings are merged into the final configuration of the index.

The order of the merging can be controlled using the order parameter, with lower order being applied first, and higher orders overriding them.

In legacy ES templates, order ranges from 0 to 2^31-1 (0 to 2147483647)

PUT /_template/${template_name}
{
    "order": 2147483647,
    "index_patterns": [
        "*"
    ],
    "settings": {
        "index": {
            "number_of_replicas": "0"
        }
    }
}
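The difference between the two template systems can be sketched with a toy model. It assumes every template in the list already matches the index, looks only at settings, and ignores mappings, aliases, and component templates; the function names are hypothetical and this is not Elasticsearch's own implementation.

```python
def deep_merge(base, override):
    """Return a copy of base with override merged in recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_legacy(templates):
    """Legacy templates: merge all matches, lower order first, higher order overrides."""
    settings = {}
    for template in sorted(templates, key=lambda t: t.get("order", 0)):
        settings = deep_merge(settings, template.get("settings", {}))
    return settings


def resolve_composable(templates):
    """Composable templates: only the match with the highest priority is used."""
    winner = max(templates, key=lambda t: t.get("priority", 0))
    return winner.get("settings", {})


if __name__ == "__main__":
    base = {"index": {"number_of_replicas": "1", "refresh_interval": "30s"}}
    override = {"index": {"number_of_replicas": "0"}}
    print(resolve_legacy([{"order": 0, "settings": base},
                          {"order": 2147483647, "settings": override}]))
    print(resolve_composable([{"priority": 100, "settings": base},
                              {"priority": 200, "settings": override}]))
```

In this model the legacy result keeps refresh_interval from the lower-order template, while the composable result contains only what the winning template defines. That is why a catch-all "*" template with the maximum order works for legacy templates but has no direct equivalent with /_index_template.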

Use jq to modify the lifecycle settings of ES index templates in bulk

ILM (index lifecycle management)

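As the name suggests, ILM manages the lifecycle of indices; it can also be used to build a hot-warm-cold architecture for an ES cluster.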

The actions available in each phase are listed in the official ILM documentation

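Annoyingly, ILM currently has no endpoint like _cat/templates that lists only the names of the ILM policies configured in the cluster; you can only fetch the full definitions of all policies at once (as a workaround, use the collapsible JSON preview in the browser's F12 developer tools)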

GET /_ilm/policy

Get a specific ILM policy definition

GET /_ilm/policy/${ilm_name}

PUT /_ilm/policy/ilm-30d-delete
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {
            "delete_searchable_snapshot": true
          }
        }
      }
    }
  }
}

Get an index's current ILM status

GET /${index_name}/_ilm/explain

GET /${index_name}/_ilm/explain?human

Move an index to a specific ILM step (manually trigger an ILM action)

https://www.elastic.co/guide/en/elasticsearch/reference/current/ilm-move-to-step.html

POST _ilm/move/my-index-000001
{
  "current_step": { 
    "phase": "new",
    "action": "complete",
    "name": "complete"
  },
  "next_step": { 
    "phase": "warm",
    "action": "forcemerge", 
    "name": "forcemerge" 
  }
}

Manually create an index that is normally managed by a template and ILM, with a date in the index name (dynamic index name, rolled over daily)

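⚠️ Do not create such an index with a plain PUT /index-name-2022.10.23-000022. If you do, then when rollover fires (for example rollover max_age: 1d), the new index keeps the date from when the original index was created rather than the date on which the rollover happens (for example index-name-2022.10.23-000023).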

You can spot this by checking the index.provided_name property in GET /index-name/_settings

Solution:

PUT %3Cindex-name-%7Bnow%2Fd%7D-000099%3E

Note: do not URL-decode this in Kibana Dev Tools; the name really has to be sent URL-encoded like this (decoded, it is <index-name-{now/d}-000099>)

PS: if you are worried about creating the wrong name, use GET %3Cindex-name-%7Bnow%2Fd%7D-000099%3E/_settings to preview the resolved index name

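As for the trailing sequence number: regardless of the previous index's name, the number is always 6 characters, zero-padded. Even if the manually created index ends in -00001, the suffix becomes -000002 after rollover.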
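The URL encoding and the suffix rule can both be checked offline. A minimal sketch, with hypothetical helper names; next_rollover_name only models the counter for a plain (non date-math) name and does not re-resolve dates the way Elasticsearch does for a date-math provided_name.

```python
import re
from urllib.parse import quote


def encode_date_math_name(name):
    """Percent-encode a date-math index name so it can be used in a request path."""
    return quote(name, safe="")


def next_rollover_name(name):
    """Increment the trailing counter and zero-pad it to 6 digits."""
    match = re.search(r"^(.*-)(\d+)$", name)
    if match is None:
        raise ValueError(f"{name!r} does not end with -<number>")
    prefix, counter = match.groups()
    return f"{prefix}{int(counter) + 1:06d}"


if __name__ == "__main__":
    print(encode_date_math_name("<index-name-{now/d}-000099>"))
    print(next_rollover_name("index-name-2022.10.23-00001"))
```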

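Troubleshooting: changing the ILM policy that an existing index references, or changing the phase definitions inside that policy, can break ILM for the index.

ILM processing for the index may then fail (I don't remember how this surfaces as an alert). If I remember correctly, GET {index_name}/_ilm/explain shows the error information and the phase in which the index is stuck.

In that case, fix it by manually moving the index to another ILM phase, for example: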

POST _ilm/move/insight-es-k8s-logs-dce5-aliyun-default-prd-2024.01.16
{
  "current_step": { 
    "phase": "hot",
    "action": "rollover",
    "name": "ERROR"
  },
  "next_step": { 
    "phase": "cold"
  }
}

Or try retrying via the POST {index_name}/_ilm/retry endpoint
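To find which indices need this treatment, the _ilm/explain response can be filtered for indices sitting in the ERROR step. A minimal sketch: the function name is hypothetical and the sample response is trimmed and made up, using only the indices, managed, phase, step, and failed_step fields.

```python
def stuck_indices(explain_response):
    """Return (index, phase, failed_step) for managed indices in the ERROR step."""
    stuck = []
    for name, info in explain_response.get("indices", {}).items():
        if info.get("managed") and info.get("step") == "ERROR":
            stuck.append((name, info.get("phase"), info.get("failed_step")))
    return sorted(stuck)


if __name__ == "__main__":
    # Trimmed, illustrative response from GET /logs-*/_ilm/explain.
    sample = {
        "indices": {
            "logs-000001": {"managed": True, "phase": "hot", "action": "rollover",
                            "step": "ERROR", "failed_step": "check-rollover-ready"},
            "logs-000002": {"managed": True, "phase": "hot", "action": "rollover",
                            "step": "check-rollover-ready"},
            "plain-index": {"managed": False},
        }
    }
    for name, phase, failed_step in stuck_indices(sample):
        print(f"{name}: stuck in phase {phase}, failed step {failed_step}")
```

Each reported index is a candidate for _ilm/retry first, and for _ilm/move only if the retry keeps failing.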

Cluster settings

/_cluster/settings?include_defaults=true&pretty

/_cluster/settings?include_defaults=true

Wildcard expressions or all indices are not allowed

Allow deleting indices with wildcard expressions

PUT /_cluster/settings
{
  "persistent": {
    "action": {
      "destructive_requires_name": "false"
    }
  }
}

primaries recovery settings

Control the concurrency of shard recovery and relocation

{
    "transient": {
        "cluster": {
            "routing": {
                "allocation": {
                    "node_initial_primaries_recoveries": 10,
                    "node_concurrent_incoming_recoveries": null,
                    "node_concurrent_outgoing_recoveries": null,
                    "node_concurrent_recoveries": 20
                }
            }
        }
    }
}

{
    "transient": {
        "cluster": {
            "routing": {
                "allocation": {
                    "node_initial_primaries_recoveries": null,
                    "node_concurrent_incoming_recoveries": null,
                    "node_concurrent_recoveries": null
                }
            }
        }
    }
}

{
    "persistent": {
        "cluster": {
            "routing": {
                "allocation": {
                    "node_initial_primaries_recoveries": 30,
                    "node_concurrent_incoming_recoveries": null,
                    "node_concurrent_recoveries": 10
                }
            }
        }
    }
}

cluster.routing.allocation.node_concurrent_recoveries: A shortcut to set both cluster.routing.allocation.node_concurrent_incoming_recoveries and cluster.routing.allocation.node_concurrent_outgoing_recoveries.

# PUT /_cluster/settings
{
    "persistent": {
        "cluster": {
            "routing": {
                "allocation": {
                    "node_concurrent_recoveries": 8
                }
            }
        }
    }
}
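The nested bodies above are equivalent to the dotted setting names used in the documentation, and a null value resets a setting to its default. A small sketch that flattens a nested body makes the mapping explicit; the helper name flatten_settings is hypothetical.

```python
def flatten_settings(settings, prefix=""):
    """Flatten a nested settings dict into dotted keys."""
    flat = {}
    for key, value in settings.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_settings(value, path))
        else:
            flat[path] = value  # None corresponds to JSON null (reset to default)
    return flat


if __name__ == "__main__":
    body = {
        "persistent": {
            "cluster": {"routing": {"allocation": {"node_concurrent_recoveries": 8}}}
        }
    }
    for scope, nested in body.items():
        print(scope, flatten_settings(nested))
```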

recovery.max_bytes_per_sec: change the data transfer rate limit used during recovery and relocation

Raising this value can noticeably shorten the time ES takes to relocate indices

indices.recovery.max_bytes_per_sec: Limits total inbound and outbound recovery traffic for each node. Applies to both peer recoveries as well as snapshot recoveries (i.e., restores from a snapshot). Defaults to 40mb unless the node is a dedicated cold or frozen node, in which case the default relates to the total memory available to the node.
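A rough way to reason about this limit is a best-case transfer time: shard size divided by the per-node rate. The sketch below makes several assumptions that are not from the notes: "mb" is treated as 1024^2 bytes, concurrent recoveries are assumed to share the node's limit evenly, the 50 GiB shard and the "100mb" value are examples, and disk, CPU, and network bottlenecks are ignored. The helper names are hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def recovery_rate_body(rate, persistent=False):
    """Build a PUT /_cluster/settings body for indices.recovery.max_bytes_per_sec."""
    scope = "persistent" if persistent else "transient"
    return {scope: {"indices.recovery.max_bytes_per_sec": rate}}


def min_transfer_seconds(shard_bytes, rate_bytes_per_sec, concurrent_recoveries=1):
    """Best-case seconds to move one shard when the node limit is shared evenly."""
    per_recovery_rate = rate_bytes_per_sec / concurrent_recoveries
    return shard_bytes / per_recovery_rate


if __name__ == "__main__":
    print(recovery_rate_body("100mb"))
    print(min_transfer_seconds(50 * GIB, 40 * MIB))   # default limit
    print(min_transfer_seconds(50 * GIB, 100 * MIB))  # raised limit
```

Under these assumptions a 50 GiB shard needs at least 1280 seconds at the default 40mb and 512 seconds at 100mb, which is the kind of gain the note above refers to.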

Index settings

Set the number of replicas for a single index or for many indices in bulk

PUT /index_name*/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}

Search Documents

Specify _source to keep only certain fields in the results

match_all search

GET /sw_segment-20230914/_search
{
  "query": {
    "match_all": {}
  },
  "size": 1
}

Match query on a single field, with sorting (match)

GET /sw_segment-20230914/_search
{
  "query": {
    "match": {
      "segment_id": "b7bb26fae59e4f45b101346cb83ff796.69.16946808855979526"
    }
  },
  "sort": [
    {
      "start_time": {
        "order": "desc"
      }
    }
  ],
  "size": 1
}

Aggregate on a field's values (find out which distinct values the field has)

GET /.monitoring-es-7-2024.12.13/_search
{
  "size": 0,
  "aggs": {
    "unique_types": {
      "terms": {
        "field": "type",
        "size": 10
      }
    }
  }
}

Count a field's values (find the field's top 10 values)

(top_10_endpoints is a user-chosen aggregation name)

GET sw_segment-20251021/_search
{
  "size": 0, 
  "aggs": {
    "top_10_endpoints": {
      "terms": {
        "field": "endpoint_id",
        "size": 10,
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}
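The aggregation name chosen in the request is also the key under which the buckets come back. A minimal sketch of building the body and reading the result; the helper names and the sample response are illustrative only.

```python
def terms_agg_body(agg_name, field, size=10):
    """Build a search body that returns only a terms aggregation, most frequent first."""
    return {
        "size": 0,  # no hits, aggregation results only
        "aggs": {
            agg_name: {
                "terms": {"field": field, "size": size, "order": {"_count": "desc"}}
            }
        },
    }


def top_terms(response, agg_name):
    """Extract (value, doc_count) pairs from a terms aggregation response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


if __name__ == "__main__":
    print(terms_agg_body("top_10_endpoints", "endpoint_id"))
    # Made-up response fragment for illustration.
    sample = {"aggregations": {"top_10_endpoints": {"buckets": [
        {"key": "endpoint-a", "doc_count": 120},
        {"key": "endpoint-b", "doc_count": 45},
    ]}}}
    print(top_terms(sample, "top_10_endpoints"))
```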

Elastic Cloud on Kubernetes (ECK / Elastic operator)

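For an Elasticsearch cluster managed by the ECK operator, changing cluster.routing.allocation.exclude settings requires first adding the annotation 'eck.k8s.elastic.co/managed=false' to the elasticsearch resource; otherwise the setting is reverted shortly after you apply it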

Other

For a freshly restarted ES cluster with a large number of indices (10,000 to 20,000 primary shards)

Speeding up cluster recovery

Watch the ES node resource monitoring graphs for CPU pressure and CPU IO wait

Use the update cluster settings API to raise, as appropriate, node_initial_primaries_recoveries (defaults to 4) and node_concurrent_recoveries (a shortcut to set both cluster.routing.allocation.node_concurrent_incoming_recoveries and cluster.routing.allocation.node_concurrent_outgoing_recoveries; defaults to 2)

Check the current values with the cluster settings API plus include_defaults=true

To reduce the time from red to yellow: increase the number of index replicas and raise node_initial_primaries_recoveries

To reduce the time from yellow to green: raise node_concurrent_recoveries

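Query /_cluster/allocation/explain to find out what is blocking the cluster from reaching green (or yellow)

ES pods evicted by k8s during cluster recovery because of node memory pressure (node was low on resources: memory.)

Reduce the JVM heap setting and avoid overcommitting (keep requests and limits as close as possible, or raise requests)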

Error

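Cluster shard count reached the maximum

When the cluster hits the shard limit, a log entry like the following appears, but the cluster health status does not change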

2022-11-10T10:26:03.643184618Z org.elasticsearch.common.ValidationException: Validation Failed: 1: this action would add [3] shards, but this cluster currently has [1999]/[2000] maximum normal shards open;

Fix:

Adjust the index ILM policy, or raise the cluster's cluster.max_shards_per_node setting

Transient setting:

curl -H "content-type: application/json" -X PUT "127.0.0.1:9200/_cluster/settings" -d '{"transient": {"cluster.max_shards_per_node": "5000"}}'

Persistent setting:

curl -H "content-type: application/json" -X PUT "127.0.0.1:9200/_cluster/settings" -d '{"persistent": {"cluster.max_shards_per_node": "5000"}}'
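The arithmetic behind the log message can be sketched as follows. The cluster-wide limit is cluster.max_shards_per_node multiplied by the number of data nodes; the [2000] in the log is consistent with the default of 1000 on 2 data nodes, which is an assumption about that cluster rather than something the log states. The function name is hypothetical.

```python
def shard_budget(max_shards_per_node, data_nodes, open_shards, shards_to_add):
    """Return (cluster_limit, allowed) for an action that adds shards."""
    cluster_limit = max_shards_per_node * data_nodes
    allowed = open_shards + shards_to_add <= cluster_limit
    return cluster_limit, allowed


if __name__ == "__main__":
    # Values from the log line: 1999 open shards, action adds 3.
    print(shard_budget(1000, 2, 1999, 3))  # default limit, assumed 2 data nodes
    print(shard_budget(5000, 2, 1999, 3))  # after raising the setting to 5000
```

Raising the setting removes the validation error, but each shard still costs heap and cluster state, so trimming shards via ILM is usually the more durable fix.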
