Problem
In production, the kube-state-metrics service kept restarting. Its logs showed a nil pointer panic:
I1215 07:03:08.230700 1 builder.go:146] Active collectors: certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments
E1215 07:03:08.237888 1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 60 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x13c4de0, 0x2178810)
/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20191111054156-6eb29fdf75dc/pkg/util/runtime/runtime.go:74 +0xa3
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20191111054156-6eb29fdf75dc/pkg/util/runtime/runtime.go:48 +0x82
panic(0x13c4de0, 0x2178810)
/usr/local/go/src/runtime/panic.go:679 +0x1b2
k8s.io/apimachinery/pkg/api/resource.(*Quantity).AsInt64(...)
/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20191111054156-6eb29fdf75dc/pkg/api/resource/quantity.go:448
k8s.io/kube-state-metrics/internal/store.glob..func80(0xc000277000, 0xc000047760)
/go/src/k8s.io/kube-state-metrics/internal/store/hpa.go:250 +0x176
k8s.io/kube-state-metrics/internal/store.wrapHPAFunc.func1(0x153f360, 0xc000277000, 0xc000305c50)
/go/src/k8s.io/kube-state-metrics/internal/store/hpa.go:290 +0x5c
k8s.io/kube-state-metrics/pkg/metric.(*FamilyGenerator).Generate(...)
/go/src/k8s.io/kube-state-metrics/pkg/metric/generator.go:39
k8s.io/kube-state-metrics/pkg/metric.ComposeMetricGenFuncs.func1(0x153f360, 0xc000277000, 0x7fda3d2fa730, 0xc000277000, 0x0)
/go/src/k8s.io/kube-state-metrics/pkg/metric/generator.go:78 +0x122
k8s.io/kube-state-metrics/pkg/metrics_store.(*MetricsStore).Add(0xc000428640, 0x153f360, 0xc000277000, 0x0, 0x0)
/go/src/k8s.io/kube-state-metrics/pkg/metrics_store/metrics_store.go:76 +0xf6
k8s.io/kube-state-metrics/pkg/metrics_store.(*MetricsStore).Replace(0xc000428640, 0xc000436360, 0x9, 0x9, 0xc0003f9000, 0xa, 0x21a23c6, 0x219a560)
/go/src/k8s.io/kube-state-metrics/pkg/metrics_store/metrics_store.go:138 +0xa4
k8s.io/client-go/tools/cache.(*Reflector).syncWith(0xc00043e540, 0xc0004362d0, 0x9, 0x9, 0xc0003f9000, 0xa, 0x0, 0xc00015c000)
/go/pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:354 +0xf8
k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch.func1(0xc00043e540, 0xc0003be080, 0xc0003fe3c0, 0xc0003b9bb8, 0x0, 0x0)
/go/pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:250 +0x8fa
k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch(0xc00043e540, 0xc0003fe3c0, 0x0, 0x0)
/go/pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:257 +0x1a9
k8s.io/client-go/tools/cache.(*Reflector).Run.func1()
/go/pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:155 +0x33
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc000357f78)
/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20191111054156-6eb29fdf75dc/pkg/util/wait/wait.go:152 +0x5e
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0003b9f78, 0x3b9aca00, 0x0, 0x1, 0xc0003fe3c0)
/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20191111054156-6eb29fdf75dc/pkg/util/wait/wait.go:153 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(...)
/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20191111054156-6eb29fdf75dc/pkg/util/wait/wait.go:88
k8s.io/client-go/tools/cache.(*Reflector).Run(0xc00043e540, 0xc0003fe3c0)
/go/pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:154 +0x16b
created by k8s.io/kube-state-metrics/internal/store.(*Builder).reflectorPerNamespace
/go/src/k8s.io/kube-state-metrics/internal/store/builder.go:337 +0x265
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x125fae6]
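Solution
The error log shows that kube-state-metrics hit a nil pointer dereference while processing HPA objects.
Why was there a nil pointer?
The previous article defined a metric, kafka_consumer_message_accumulation, to report the message backlog of a specific Kafka ConsumerGroup.
The backlog can be 0, and when it is, Alibaba Cloud returns no data for that group. kube-state-metrics then panicked with a nil pointer dereference when it processed this null value.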
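To make the failure mode concrete, here is a minimal Go sketch of the same pattern: a method called through a pointer that turns out to be nil, and a guard that avoids the panic. The types quantity and metricTarget and the function safeTargetValue are hypothetical stand-ins invented for illustration; this is not the actual kube-state-metrics or apimachinery code.

```go
package main

import "fmt"

// quantity is a hypothetical stand-in for a numeric value held by pointer.
type quantity struct {
	value int64
}

// asInt64 reads a field through the receiver, so calling it on a nil
// *quantity panics with a nil pointer dereference.
func (q *quantity) asInt64() int64 {
	return q.value
}

// metricTarget is a hypothetical stand-in for an HPA metric target whose
// value is optional and may therefore be nil.
type metricTarget struct {
	value *quantity
}

// safeTargetValue returns the target value and true, or 0 and false when
// the value is missing, instead of dereferencing a nil pointer.
func safeTargetValue(t metricTarget) (int64, bool) {
	if t.value == nil {
		return 0, false
	}
	return t.value.asInt64(), true
}

func main() {
	withValue := metricTarget{value: &quantity{value: 100}}
	missing := metricTarget{}

	fmt.Println(safeTargetValue(withValue))
	fmt.Println(safeTargetValue(missing))
}
```

Without the nil check, the second call would panic in the same way as the trace above, assuming the missing value reaches the collector as a nil pointer.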
There are two ways to fix this:
Method 1
Give kafka_consumer_message_accumulation a default value of 0, so that a backlog of 0 is reported as 0 rather than null.
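One possible shape for such a default is sketched below in Go. It assumes the monitoring response can be viewed as a map from consumer group to backlog, with groups that have no backlog simply absent. The lagByGroup type, the accumulationOrZero function, and the group names are all hypothetical; the real change depends on how the metric from the previous article is produced.

```go
package main

import "fmt"

// lagByGroup is a hypothetical view of the monitoring response:
// consumer groups with no backlog are simply absent from the map.
type lagByGroup map[string]int64

// accumulationOrZero returns the reported backlog for group, or an
// explicit 0 when the response contains no entry for that group.
func accumulationOrZero(resp lagByGroup, group string) int64 {
	lag, ok := resp[group]
	if !ok {
		// No data returned for this group: treat it as an empty backlog.
		return 0
	}
	return lag
}

func main() {
	resp := lagByGroup{"orders-consumer": 42}

	fmt.Println(accumulationOrZero(resp, "orders-consumer"))
	fmt.Println(accumulationOrZero(resp, "billing-consumer"))
}
```

The point of the explicit branch is that the metric is always emitted with a number, so downstream consumers never see a missing value.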
Method 2
The kube-state-metrics startup arguments can specify which collectors are enabled, so the HPA collector can simply be disabled.
spec:
  containers:
  - name: kube-state-metrics
    args:
    # Add the following argument to disable the HPA collector
    - --collectors=configmaps,cronjobs,daemonsets,deployments,endpoints,ingresses,jobs,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments,certificatesigningrequests
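For a quick recovery, apply method 2 first, then change the code to implement method 1.
After the pod was restarted, the service recovered.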