
Official detailed explanation of the yellow health status of an Elasticsearch (ES) production cluster, ...

Updated: 2023-08-12 02:56:49


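Introduction

When an Elasticsearch (ES) cluster's status turns yellow, cerebro displays the reason. If you use another tool, you can check the cluster status through the health check API, GET /_cluster/health.

Calling the health check API GET /_cluster/health returns the following information: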

{
  "cluster_name":"troll*",
  "status":"yellow",
  "timed_out":false,
  "number_of_nodes":***,
  "number_of_data_nodes":***,
  "active_primary_shards":***,
  "active_shards":***,
  "relocating_shards":***,
  "initializing_shards":***,
  "unassigned_shards":***, // note this value
  "delayed_unassigned_shards":***,
  "number_of_pending_tasks":***,
  "number_of_in_flight_fetch":***,
  "task_max_waiting_in_queue_millis":***,
  "active_shards_percent_as_number":***
}
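For scripted checks, the same call can be made with Python's standard library. The following is a minimal sketch, assuming an unsecured cluster reachable at http://localhost:9200 (an assumed address, not taken from this article); fetch_health and describe_health are illustrative names.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url="http://localhost:9200"):
    """Call GET /_cluster/health and return the parsed JSON body.

    base_url is an assumption; point it at your own cluster.
    """
    with urlopen(f"{base_url}/_cluster/health", timeout=30) as resp:
        return json.load(resp)


def describe_health(health):
    """Summarise the fields that matter when the status is not green."""
    status = health["status"]
    unassigned = health["unassigned_shards"]
    if status == "green":
        return "green: all shards are assigned"
    return f"{status}: {unassigned} unassigned shard(s)"


# Example with a hand-written response (values are made up for illustration).
sample = {"status": "yellow", "unassigned_shards": 3}
print(describe_health(sample))
```

Against a live cluster, describe_health(fetch_health()) would produce the same kind of summary from the real response.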

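The fields returned by the Elasticsearch health check endpoint /_cluster/health are explained below:

Response body

cluster_name

(string) The name of the cluster.

status

(string) The health status of the cluster, based on the state of its primary and replica shards. The statuses are:

– green

All shards are assigned.

– yellow

All primary shards are assigned, but one or more replica shards are unassigned. If a node in the cluster fails, some data could be unavailable until that node is repaired.

– red

One or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup while primary shards are being assigned.

timed_out

(Boolean) If false, the response was returned within the period specified by the timeout parameter (30s by default).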

number_of_nodes

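(integer) The number of nodes in the cluster.

number_of_data_nodes

(integer) The number of nodes that are dedicated data nodes.

active_primary_shards

(integer) The number of active primary shards.

active_shards

(integer) The total number of active primary and replica shards.

relocating_shards

(integer) The number of shards that are being relocated.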

initializing_shards

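(integer) The number of shards that are being initialized.

unassigned_shards

(integer) The number of shards that are not allocated.

delayed_unassigned_shards

(integer) The number of shards whose allocation has been delayed by the timeout settings.

number_of_pending_tasks

(integer) The number of cluster-level changes that have not yet been executed.

number_of_in_flight_fetch

(integer) The number of unfinished fetches.

task_max_waiting_in_queue_millis

(integer) The time, in milliseconds, that the earliest initiated task has been waiting to be performed.

active_shards_percent_as_number

(float) The ratio of active shards in the cluster, expressed as a percentage.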
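The green, yellow and red definitions above can be restated as a small decision rule. The sketch below only mirrors those definitions for illustration; it is not Elasticsearch's own implementation, and the shard dictionaries with "primary" and "assigned" keys are a made-up representation.

```python
def cluster_status(shards):
    """Derive green/yellow/red from a list of shard descriptions.

    Each shard is a dict with a boolean "primary" and a boolean "assigned"
    (a hypothetical structure used only for this illustration).
    """
    if any(s["primary"] and not s["assigned"] for s in shards):
        return "red"      # at least one primary shard is unassigned
    if any(not s["assigned"] for s in shards):
        return "yellow"   # all primaries assigned, some replica is not
    return "green"        # every shard is assigned


shards = [
    {"primary": True, "assigned": True},
    {"primary": False, "assigned": False},  # a replica that lost its node
]
print(cluster_status(shards))  # yellow
```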

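Problem analysis

Check the cluster status

# Check cluster health

GET /_cluster/health

Look at how the cluster's shards are allocated, paying particular attention to unassigned_shards, which here counts the replicas that could not be allocated.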

{
  "cluster_name" : "*******",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : *******,
  "number_of_data_nodes" : *******,
  "active_primary_shards" : *******,
  "active_shards" : *******,
  "relocating_shards" : *******,
  "initializing_shards" : *******,
  "unassigned_shards" : *******,
  "delayed_unassigned_shards" : *******,
  "number_of_pending_tasks" : *******,
  "number_of_in_flight_fetch" : *******,
  "task_max_waiting_in_queue_millis" : *******,
  "active_shards_percent_as_number" : *******
}

Find the problem index

# List the indices

GET _cat/indices

Find the abnormal index in the response

yellow open <index name> ***** ***** ***** ***** ***** ***** *****
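With many indices, scanning the listing by eye gets tedious. The cat APIs accept a format=json parameter, which returns one object per index keyed by column name; the sketch below filters such a list. The sample rows and index names are hypothetical.

```python
def non_green_indices(rows):
    """Return (index, health) pairs for every index that is not green.

    rows is the parsed body of GET /_cat/indices?format=json.
    """
    return [(r["index"], r["health"]) for r in rows if r["health"] != "green"]


# Hand-written sample rows; a real response carries more columns per index.
rows = [
    {"health": "green", "status": "open", "index": "logs-a"},
    {"health": "yellow", "status": "open", "index": "logs-b"},
]
print(non_green_indices(rows))
```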

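View the detailed error information

# Explain why the shard is unassigned

GET /_cluster/allocation/explain

The response explains why the shard is unassigned. Here the reported reasons are: unassigned, node_left, "the shard cannot be allocated to the same node on which a copy of the shard already exists", and "cannot allocate because allocation is not permitted to any of the nodes". In this case a node left the cluster, so the replica could not be copied to another node.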

{
  "index" : "",
  "shard" : "",
  "primary" : "",
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2020-05-15T06:12:23.967Z",
    "details" : "node_left [KyZROB7BSASwY0i3r7q3nw]",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "FkwTKuMISlG88uNtelHQbQ",
      "node_name" : "es7_01",
      "transport_address" : "172.21.0.6:9300",
      "node_attributes" : {
        "ml.machine_memory" : "12566077440",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[]][0], node[FkwTKuMISlG88uNtelHQbQ], [P], s[STARTED], a[id=l_k948LiTcSqjhp8PRKqVQ]]"
        }
      ]
    },
    {
      "node_id" : "mjNvBmkASwq0Dx6W5028Uw",
      "node_name" : "es7_03",
      "transport_address" : "172.21.。:9300",
      "node_attributes" : {
        "ml.machine_memory" : "12566077440",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[******]][0], node[mjNvBmkASwq0Dx6W5028Uw], [R], s[STARTED], a[id=lS8fqbDoRA-ju6QW5psnjA]]"
        }
      ]
    }
  ]
}
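The explain response is deeply nested, so it can help to flatten it into one row per node and decider. The following is a rough sketch that relies only on the field names visible in the response above; summarize_explain is an illustrative helper name, and the sample dictionary is a trimmed, hand-written copy of that response.

```python
def summarize_explain(explain):
    """Flatten node_allocation_decisions into (node, decider, decision) rows."""
    rows = []
    for node in explain.get("node_allocation_decisions", []):
        for d in node.get("deciders", []):
            rows.append((node["node_name"], d["decider"], d["decision"]))
    return rows


# Trimmed sample shaped like the allocation explain response.
explain = {
    "can_allocate": "no",
    "node_allocation_decisions": [
        {"node_name": "es7_01", "node_decision": "no",
         "deciders": [{"decider": "same_shard", "decision": "NO"}]},
        {"node_name": "es7_03", "node_decision": "no",
         "deciders": [{"decider": "same_shard", "decision": "NO"}]},
    ],
}
for row in summarize_explain(explain):
    print(row)
```

Seeing same_shard reported for every remaining node suggests that each of them already holds a copy of the shard, which matches the node_left reason in unassigned_info.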

Solution

Step 1: Find the abnormal index in the Elasticsearch cluster

# List index information and find the abnormal index
GET /_cat/indices?v
# health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
# green open ** D90ToWRGTpyeJAIy2ZVCvw *** *** *** *** *** ***
# yellow open ** hXI3lFOlSVi6gnqREZzEwQ *** *** *** *** *** ***
# green open .kibana_task_manager_1 akJZg3QkRta-oGH8BEfhXA *** *** *** *** *** ***
# green open .apm-agent-configuration f5ftL0VISRm36KXnN3QtPQ *** *** *** *** *** ***
# green open .kibana_1 d5k_3pOkRSe95Cf-dMo0SQ *** *** *** *** *** ***

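The listing above shows that the index in the second row is abnormal: its health is yellow. A yellow Elasticsearch health status means that all primary shards are assigned but one or more replica shards are not. If a node in the cluster fails, some data could be unavailable until that node is repaired. In this case, resetting the number of replicas for the index resolves the problem.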
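The replica count is changed through the update index settings API, PUT /<index>/_settings, with index.number_of_replicas in the body. The sketch below only builds that request with Python's standard library and does not send it; the host http://localhost:9200, the index name my-index and the target value 1 are assumptions for illustration, not values from this article.

```python
import json
from urllib.request import Request


def build_replica_update(base_url, index, replicas):
    """Build (but do not send) PUT /<index>/_settings for number_of_replicas."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return Request(
        f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host, index name and replica count.
req = build_replica_update("http://localhost:9200", "my-index", 1)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing req to urllib.request.urlopen would actually apply the change, so inspect the printed request before doing that on a production cluster.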

Step 2: Review the ES cluster health information and the settings of the yellow index

Check the health of the ES cluster with GET /_cluster/health

The response is as follows:

{
  "cluster_name":"troll*",
  "status":"yellow",
  "timed_out":false,
  "number_of_nodes":***,
  "number_of_data_nodes":***,
  "active_primary_shards":***,
  "active_shards":***,
  "relocating_shards":***,
  "initializing_shards":***,
  "unassigned_shards":***, // note this value
  "delayed_unassigned_shards":***,
  "number_of_pending_tasks":***,
  "number_of_in_flight_fetch":***,
  "task_max_waiting_in_queue_millis":***,
  "active_shards_percent_as_number":***
}

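Comparing the response with the official field descriptions (see the Introduction above) shows that some replica shards have not been allocated.

View the settings of the yellow index

# View index settings

GET /***/_settings

The response is as follows: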

{
  "***" : {
    "settings" : {
      "index" : {
        "creation_date" : "***",
        "number_of_shards" : "***",
        "number_of_replicas" : "***", // note the replica count here
        "uuid" : "hXI3lFOlSVi6gnqREZzEwQ",
        "version" : {
          "created" : "***"
        },
        "provided_name" : "***"
      }
    }
  }
}
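The number_of_replicas value can be compared with number_of_data_nodes from the health response. Because the same_shard decider refuses to place two copies of a shard on one node, at most number_of_data_nodes - 1 replicas per primary can be allocated. The sketch below encodes only that rule and assumes no other allocation constraints (disk watermarks, allocation filtering and so on); unallocatable_replicas is an illustrative name and the sample numbers are made up.

```python
def unallocatable_replicas(number_of_replicas, number_of_data_nodes):
    """How many replicas per primary can never be placed.

    Each copy of a shard (primary or replica) needs its own data node, so at
    most number_of_data_nodes - 1 replicas per primary can be allocated.
    """
    placeable = max(number_of_data_nodes - 1, 0)
    return max(number_of_replicas - placeable, 0)


# Settings responses report number_of_replicas as a string, hence int().
settings = {"index": {"number_of_replicas": "3"}}
replicas = int(settings["index"]["number_of_replicas"])
print(unallocatable_replicas(replicas, number_of_data_nodes=2))  # 2
```

A non-zero result means the index will stay yellow until data nodes are added or number_of_replicas is lowered.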

Step 3: Analyze and resolve the problem

Suppose number_of_replicas is 3, which here means that 3 replica shards are unassigned. The analysis depends on the situation:

