Error

The cluster status is red. Checking the status shows that two indices have unassigned shards.

curl -X GET "127.0.0.1:29200/_cluster/health?pretty"
curl -X GET "127.0.0.1:29200/_cat/shards?v"
curl -X GET "http://127.0.0.1:29200/_cat/shards/alarm_log_message_index?v"
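On a cluster with many shards, the full _cat/shards output can be long. The following is a minimal sketch that keeps only the unassigned rows and adds the unassigned.reason column. The ES_URL variable and the two function names are hypothetical helpers introduced for illustration; the address is the one used throughout this post.

```shell
#!/usr/bin/env bash
# ES_URL is an assumption; it defaults to the address used in this post.
ES_URL="${ES_URL:-http://127.0.0.1:29200}"

# Keep only rows whose 4th column (state) is UNASSIGNED.
filter_unassigned() {
  awk '$4 == "UNASSIGNED"'
}

# Print index, shard number, primary/replica flag, state and unassigned reason
# for every shard that is currently unassigned.
list_unassigned_shards() {
  curl -s "${ES_URL}/_cat/shards?h=index,shard,prirep,state,unassigned.reason" \
    | filter_unassigned
}

# Usage (needs a reachable cluster):
#   list_unassigned_shards
```

In the prirep column, p marks a primary shard and r marks a replica, so the output shows directly whether both copies of a shard are missing.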

Check why the shards are unassigned: "failed shard on node [FM1JOz5XRsGWNj5k0hKHSw]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[alarm_log_message_index][0]: obtaining shard lock for [starting shard] timed out after [5000ms], lock already held for [closing shard] with age [56399ms]]; "

curl -X GET "127.0.0.1:29200/_cluster/allocation/explain?pretty"
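When called without a request body, the allocation explain API chooses an unassigned shard on its own. Since two indices are affected here, it can help to ask about one specific shard. The sketch below passes index, shard and primary in the request body; ES_URL and the function names are hypothetical helpers, not part of Elasticsearch.

```shell
#!/usr/bin/env bash
# ES_URL is an assumption; it defaults to the address used in this post.
ES_URL="${ES_URL:-http://127.0.0.1:29200}"

# Build the JSON body for the allocation explain API.
# $1 = index name, $2 = shard number, $3 = true for the primary, false for a replica
explain_body() {
  printf '{"index":"%s","shard":%s,"primary":%s}' "$1" "$2" "$3"
}

# Ask the cluster why one specific shard copy is unassigned.
explain_shard() {
  curl -s -X GET "${ES_URL}/_cluster/allocation/explain?pretty" \
    -H "Content-Type: application/json" \
    -d "$(explain_body "$@")"
}

# Usage (needs a reachable cluster):
#   explain_shard alarm_log_message_index 0 true    # primary
#   explain_shard alarm_log_message_index 0 false   # replica
```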

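Solution

Neither the primary shard nor the replica shard is assigned, so first force-allocate the primary shard to a healthy node, then check whether the replica shard is allocated automatically. If it is not, allocate it manually.

Check the node status: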

curl "http://127.0.0.1:29200/_cat/nodes?v&h=name,disk.used_percent,heap.percent,role"

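Force-allocate the primary shard (allocate_stale_primary with accept_data_loss set to true can lose any data that the chosen shard copy does not contain):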

curl -X POST "http://127.0.0.1:29200/_cluster/reroute" -H "Content-Type: application/json" -d'
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "alarm_log_message_index",
        "shard": 0,
        "node": "BdC8aXFKRjS7w8-Z4uQwNw",
        "accept_data_loss": true
      }
    }
  ]
}'

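The cluster status is now yellow.

The primary shard has been allocated, but the replica shard was not allocated automatically.

Manually allocate the replica shard: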

curl -X POST "http://127.0.0.1:29200/_cluster/reroute" -H "Content-Type: application/json" -d'
{
  "commands": [
    {
      "allocate_replica": {
        "index": "alarm_log_message_index",
        "shard": 0,
        "node": "FpL89aSJTeWu6DfuWAK6YQ"
      }
    }
  ]
}'

The request returned an error. Check the reason:

curl -X GET "127.0.0.1:29200/_cluster/allocation/explain?pretty"
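Another way to see why a reroute command is rejected is to simulate it first. The reroute API accepts dry_run=true (calculate the result without applying it) and explain=true (include the reason why each command can or cannot be executed). The following is only a sketch: ES_URL and the function names are hypothetical helpers, and the node name in the usage line is the one from this post.

```shell
#!/usr/bin/env bash
# ES_URL is an assumption; it defaults to the address used in this post.
ES_URL="${ES_URL:-http://127.0.0.1:29200}"

# Build the reroute body for a single allocate_replica command.
# $1 = index name, $2 = shard number, $3 = target node name
allocate_replica_body() {
  printf '{"commands":[{"allocate_replica":{"index":"%s","shard":%s,"node":"%s"}}]}' \
    "$1" "$2" "$3"
}

# Simulate the allocation and print the explanation without changing the cluster.
dry_run_allocate_replica() {
  curl -s -X POST "${ES_URL}/_cluster/reroute?dry_run=true&explain=true&pretty" \
    -H "Content-Type: application/json" \
    -d "$(allocate_replica_body "$@")"
}

# Usage (needs a reachable cluster):
#   dry_run_allocate_replica alarm_log_message_index 0 FpL89aSJTeWu6DfuWAK6YQ
```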

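The explanation shows that disk space needs to be freed. After cleaning up the disk, run the manual allocation again. The ?retry_failed=true parameter must be added; otherwise the allocation still fails.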
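To see which nodes are short on disk before and after the cleanup, the _cat/allocation API reports disk usage per node. The sketch below lists nodes at or above a given usage percentage. ES_URL and the function names are hypothetical helpers, and the threshold of 85 in the usage line assumes the default low disk watermark; a cluster with customized watermarks would need a different value.

```shell
#!/usr/bin/env bash
# ES_URL is an assumption; it defaults to the address used in this post.
ES_URL="${ES_URL:-http://127.0.0.1:29200}"

# Print the names of nodes whose disk usage is at or above a threshold.
# $1 = threshold in percent; stdin rows have the form "<node> <disk.percent>"
nodes_over() {
  awk -v limit="$1" '$2 + 0 >= limit { print $1 }'
}

# Query per-node disk usage and keep only the nodes over the threshold.
full_nodes() {
  curl -s "${ES_URL}/_cat/allocation?h=node,disk.percent" | nodes_over "$1"
}

# Usage (needs a reachable cluster):
#   full_nodes 85
```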

curl -X POST "http://127.0.0.1:29200/_cluster/reroute?retry_failed=true" -H "Content-Type: application/json" -d'
{
  "commands": [
    {
      "allocate_replica": {
        "index": "alarm_log_message_index",
        "shard": 0,
        "node": "FM1JOz5XRsGWNj5k0hKHSw"
      }
    }
  ]
}'

Then wait for the shard to finish initializing.

The cluster status becomes green.
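Rather than polling the health endpoint by hand, the cluster health API can block until a target status is reached by using the wait_for_status and timeout parameters. The same call works for the yellow check after the primary allocation and for the final green check. This is a minimal sketch; ES_URL and the function names are hypothetical helpers.

```shell
#!/usr/bin/env bash
# ES_URL is an assumption; it defaults to the address used in this post.
ES_URL="${ES_URL:-http://127.0.0.1:29200}"

# Extract the value of the "status" field from a _cluster/health JSON response on stdin.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Block until the cluster reaches the requested status or the timeout elapses,
# then print the status the cluster had when the call returned.
# $1 = green or yellow, $2 = timeout such as 60s
wait_for_status() {
  curl -s "${ES_URL}/_cluster/health?wait_for_status=$1&timeout=$2" | health_status
}

# Usage (needs a reachable cluster):
#   wait_for_status green 60s
```

If the printed status is not the requested one, the timeout elapsed first and the shard is likely still initializing.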
