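Background
After a node lost power and rebooted and Longhorn storage had recovered, the mongodb pod reported that its PVC was still mounted on another node. I deleted the pod, attached the volume manually, and started the pod again, but it stayed stuck in PodInitializing. The pod events showed the following error: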
Warning FailedMount 74s (x13 over 11m) kubelet MountVolume.MountDevice failed for volume "pvc-4e5da2e1-1afc-4d6f-af1e-e4c3b92c72b6" : rpc error: code = Internal desc = 'fsck' found errors on device /dev/longhorn/pvc-4e5da2e1-1afc-4d6f-af1e-e4c3b92c72b6 but could not correct them: fsck from util-linux 2.34
/dev/longhorn/pvc-4e5da2e1-1afc-4d6f-af1e-e4c3b92c72b6 contains a file system with errors, check forced.
/dev/longhorn/pvc-4e5da2e1-1afc-4d6f-af1e-e4c3b92c72b6: Directory inode 2, block #0, offset 0: directory has no checksum.
FIXED.
/dev/longhorn/pvc-4e5da2e1-1afc-4d6f-af1e-e4c3b92c72b6: Directory inode 2, block #0, offset 0: directory corrupted
/dev/longhorn/pvc-4e5da2e1-1afc-4d6f-af1e-e4c3b92c72b6: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
(i.e., without -a or -p options)
My suspicion is that the manual attach caused the directory corruption.
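Before repairing anything it helps to know exactly which device the kubelet is complaining about. The following is a minimal sketch that pulls the first /dev/longhorn/pvc-... path out of an event message. The function name extract_longhorn_device is hypothetical, and the pattern assumes the volume name has the lowercase hex-and-dash form seen in the log above.

```shell
#!/bin/sh
# Print the first /dev/longhorn/pvc-... device path found in the text on stdin.
# Assumption: volume names look like pvc-<hex digits and dashes>, as in the log above.
extract_longhorn_device() {
  grep -o '/dev/longhorn/pvc-[0-9a-f-]*' | sort -u | head -n 1
}

# Example (pod name is a placeholder, not taken from the original):
#   kubectl describe pod <pod-name> | extract_longhorn_device
```

Piping the output of kubectl describe pod through the function gives the device path to look for on the node in the later steps.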
Solution
Reference, from the official Longhorn knowledge base:
https://longhorn.io/kb/troubleshooting-volume-filesystem-corruption/
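The steps are:
- Scale down the affected pod
- Log in to the Longhorn UI and attach the affected volume to any node
- Log in to that node
- Find the affected PVC under /dev/longhorn/
- Run fsck to repair the filesystem
- Detach the volume from the Longhorn UI
- Scale the affected pod back up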
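The first and last steps above can be sketched with kubectl scale. This is only an illustration: the namespace default and the workload statefulset/mongodb are assumptions (the original does not say how mongodb is deployed), and the run and scale_workload helpers are hypothetical. The sketch defaults to a dry run that prints the command rather than executing it.

```shell
#!/bin/sh
# Hypothetical names: adjust to the real namespace and workload.
NAMESPACE="${NAMESPACE:-default}"
WORKLOAD="${WORKLOAD:-statefulset/mongodb}"
# DRY_RUN=1 (the default here) only prints the command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Scale the workload to the given replica count (0 stops the pod).
scale_workload() {
  run kubectl -n "$NAMESPACE" scale "$WORKLOAD" --replicas="$1"
}
```

Call scale_workload 0 before attaching the volume in the Longhorn UI, and scale_workload 1 after detaching it. Set DRY_RUN=0 to actually run the commands.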
Install e2fsck
wget https://distfiles.macports.org/e2fsprogs/e2fsprogs-1.45.6.tar.gz
tar -xvf e2fsprogs-1.45.6.tar.gz
cd e2fsprogs-1.45.6/
./configure
make
cd e2fsck/
cp e2fsck /usr/sbin
Run fsck to repair the directory
Check the filesystem type of the mounted PVCs
df -T | grep pvc
fsck.ext4 /dev/longhorn/pvc-34c9682a-689f-4c73-845b-5a02093900d4
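A slightly more cautious variant of the repair step is sketched below: it confirms the filesystem type with blkid (which also works when the device is not mounted, unlike df -T), runs a read-only check first, and only then repairs. The helper names longhorn_device and check_then_repair are hypothetical, and answering every repair prompt with -y is an assumption that goes beyond the original, which ran fsck.ext4 interactively. The volume must be attached but not mounted while this runs.

```shell
#!/bin/sh
# Build the block device path for a Longhorn volume name.
longhorn_device() {
  printf '/dev/longhorn/%s\n' "$1"
}

# Check first without changing anything, then repair only if errors were found.
check_then_repair() {
  dev=$(longhorn_device "$1")
  # blkid prints the filesystem type even when the device is not mounted.
  fstype=$(blkid -o value -s TYPE "$dev")
  if [ "$fstype" != "ext4" ]; then
    echo "unexpected filesystem type: $fstype" >&2
    return 1
  fi
  # -f forces a check, -n opens read-only and answers "no" to every prompt.
  if fsck.ext4 -fn "$dev"; then
    echo "no errors found on $dev"
  else
    # -y answers "yes" to every repair prompt.
    # Exit status 1 from this call means errors were found and corrected.
    fsck.ext4 -fy "$dev"
  fi
}
```

For example, check_then_repair pvc-34c9682a-689f-4c73-845b-5a02093900d4, run as root on the node the volume is attached to.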
After detaching the volume, the pod started successfully.