Error
The cluster status is red. Checking the shards shows that two indices have unassigned shards.
curl -X GET "http://127.0.0.1:29200/_cluster/health?pretty"
curl -X GET "http://127.0.0.1:29200/_cat/shards?v"
curl -X GET "http://127.0.0.1:29200/_cat/shards/alarm_log_message_index?v"
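On a cluster with many shards, the full _cat/shards listing is long. A minimal sketch for narrowing it down to the unassigned rows, assuming the columns are requested in the order index, shard, prirep, state, unassigned.reason (the helper name filter_unassigned is made up for this note):

```shell
# Keep only rows whose 4th column (state) is UNASSIGNED and print
# index, shard number, primary/replica flag and the unassigned reason.
# Expects input columns in the order: index shard prirep state unassigned.reason
filter_unassigned() {
  awk '$4 == "UNASSIGNED" { print $1, $2, $3, $5 }'
}

# Usage against the cluster from this note (no "v", so there is no header row):
#   curl -s "http://127.0.0.1:29200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | filter_unassigned
```

Each output line is one unassigned shard copy, with p marking a primary and r a replica.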
Check why the shards are unassigned. The reported reason is: "failed shard on node [FM1JOz5XRsGWNj5k0hKHSw]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[alarm_log_message_index][0]: obtaining shard lock for [starting shard] timed out after [5000ms], lock already held for [closing shard] with age [56399ms]]; "
curl -X GET "http://127.0.0.1:29200/_cluster/allocation/explain?pretty"
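Without a request body, the allocation explain API picks an unassigned shard on its own. Because two indices are affected here, it may be useful to ask about one specific shard. A small sketch, assuming the request body fields index, shard and primary of the explain API (the helper name explain_body is hypothetical):

```shell
# Build the JSON body for an allocation explain request.
# $1 = index name, $2 = shard number, $3 = true for the primary, false for a replica
explain_body() {
  printf '{"index":"%s","shard":%s,"primary":%s}' "$1" "$2" "$3"
}

# Usage:
#   curl -s -X POST "http://127.0.0.1:29200/_cluster/allocation/explain?pretty" \
#     -H "Content-Type: application/json" \
#     -d "$(explain_body alarm_log_message_index 0 true)"
```

Passing false as the third argument asks about the replica copy instead.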
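Solution
Neither the primary shard nor the replica shard is assigned, so first force-allocate the primary to a healthy node, then check whether the replica is allocated automatically. If it is not, allocate it manually.
Check the node status: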
curl "http://127.0.0.1:29200/_cat/nodes?v&h=name,disk.used_percent,heap.percent,role"
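To pick a healthy target node, it can help to flag nodes that are already short on disk. The sketch below filters the same four columns and uses 85 percent as the default threshold, which matches Elasticsearch's default low disk watermark; the helper name nodes_over_disk_limit is an assumption of this note, and a cluster with customized watermarks would need a different value.

```shell
# Print name and disk usage of nodes at or above a disk-usage threshold.
# $1 = threshold in percent (defaults to 85)
# Expects input columns in the order: name disk.used_percent heap.percent role
nodes_over_disk_limit() {
  awk -v limit="${1:-85}" '$2 + 0 >= limit + 0 { print $1, $2 }'
}

# Usage (no "v", so there is no header row to skip):
#   curl -s "http://127.0.0.1:29200/_cat/nodes?h=name,disk.used_percent,heap.percent,role" | nodes_over_disk_limit 85
```

Nodes that appear in the output are poor candidates for receiving a shard until space is freed.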
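Force-allocate the primary shard. allocate_stale_primary promotes the possibly stale copy held by the given node, which is why accept_data_loss must be set to true: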
curl -X POST "http://127.0.0.1:29200/_cluster/reroute" -H "Content-Type: application/json" -d'
{
"commands": [
{
"allocate_stale_primary": {
"index": "alarm_log_message_index",
"shard": 0,
"node": "BdC8aXFKRjS7w8-Z4uQwNw",
"accept_data_loss": true
}
}
]
}'
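The cluster status is now yellow.
The primary shard is assigned, but the replica shard was not allocated automatically.
Allocate the replica shard manually: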
curl -X POST "http://127.0.0.1:29200/_cluster/reroute" -H "Content-Type: application/json" -d'
{
"commands": [
{
"allocate_replica": {
"index": "alarm_log_message_index",
"shard": 0,
"node": "FpL89aSJTeWu6DfuWAK6YQ"
}
}
]
}'
This request fails. Check the reason:
curl -X GET "http://127.0.0.1:29200/_cluster/allocation/explain?pretty"
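The disk needs to be cleaned up. After freeing space, allocate the replica manually again, this time with the ?retry_failed=true parameter; otherwise the allocation still fails. Elasticsearch stops retrying a shard after it has failed allocation index.allocation.max_retries times (5 by default), and retry_failed=true asks for one more round of attempts.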
curl -X POST "http://127.0.0.1:29200/_cluster/reroute?retry_failed=true" -H "Content-Type: application/json" -d'
{
"commands": [
{
"allocate_replica": {
"index": "alarm_log_message_index",
"shard": 0,
"node": "FM1JOz5XRsGWNj5k0hKHSw"
}
}
]
}'
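The allocate_replica request is sent twice in this note, with only the target node changing. A small sketch of a helper that builds the request body, so the node can be swapped without retyping the JSON (the helper name allocate_replica_body is made up here, not part of any tool):

```shell
# Build the reroute body for a single allocate_replica command.
# $1 = index name, $2 = shard number, $3 = target node
allocate_replica_body() {
  printf '{"commands":[{"allocate_replica":{"index":"%s","shard":%s,"node":"%s"}}]}' "$1" "$2" "$3"
}

# Usage:
#   curl -s -X POST "http://127.0.0.1:29200/_cluster/reroute?retry_failed=true" \
#     -H "Content-Type: application/json" \
#     -d "$(allocate_replica_body alarm_log_message_index 0 FM1JOz5XRsGWNj5k0hKHSw)"
```

The values are inserted as-is, so this only suits index and node names that need no JSON escaping.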
Wait for the shard to finish initializing.
The cluster status then changes to green.
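Instead of polling the health endpoint by hand, the cluster health API can block until a target status is reached, using its wait_for_status and timeout parameters. A minimal sketch that builds such a URL (the helper name health_wait_url and the 60s default are choices made for this note):

```shell
# Build a cluster health URL that blocks until the cluster reaches a status.
# $1 = base URL, $2 = status to wait for (green, yellow or red),
# $3 = how long to wait before giving up (defaults to 60s)
health_wait_url() {
  printf '%s/_cluster/health?wait_for_status=%s&timeout=%s&pretty' "$1" "$2" "${3:-60s}"
}

# Usage:
#   curl -s "$(health_wait_url http://127.0.0.1:29200 green 120s)"
```

If the status is not reached in time, the response reports timed_out as true, so the field is worth checking before assuming the cluster is green.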