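Background
If Prometheus is deployed through the Prometheus Operator, prometheus.yaml is mounted from an emptyDir volume and is read-only, and attempts to change it to read-write fail. Instead, put the extra configuration in a separate prometheus-additional.yaml file, which has the same effect as editing the Prometheus configuration file.
Steps
- Look up the blackbox-exporter Service (k is an alias for kubectl here). In my cluster the port is 19115.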
k get svc -n thanos | grep blackbox-exporter
- Create the prometheus-additional.yaml file.
- job_name: 'blackbox'                 # job name
  metrics_path: /probe                 # path the metrics are scraped from
  params:
    module: [http_2xx]                 # module name defined in blackbox_exporter
  static_configs:
    - targets:
        - https://wghdr.top            # URL to monitor
  relabel_configs:
    - source_labels: [__address__]     # address of the current target
      target_label: __param_target     # __param_ is the prefix for URL parameters and target is the parameter name, so this copies __address__ into __param_target
    - source_labels: [__param_target]
      target_label: instance           # copy the value of __param_target into the instance label
    - target_label: __address__
      replacement: kube-prometheus-blackbox-exporter:19115   # address of blackbox_exporter
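To make the relabeling easier to picture: after the three rules run, Prometheus scrapes the exporter, not the website, and passes the original target as a query parameter. The sketch below only builds that URL as a string. `probe_url` is a hypothetical helper written for this post, and unlike Prometheus it does not URL-encode the target.

```shell
#!/bin/sh
# Hypothetical helper: print the URL that the scrape effectively hits after
# relabeling (__address__ -> exporter address, __param_target -> original target).
probe_url() {
  exporter="$1"   # e.g. kube-prometheus-blackbox-exporter:19115
  module="$2"     # e.g. http_2xx
  target="$3"     # e.g. https://wghdr.top
  printf 'http://%s/probe?module=%s&target=%s\n' "$exporter" "$module" "$target"
}

probe_url kube-prometheus-blackbox-exporter:19115 http_2xx https://wghdr.top
```

Requesting that URL with curl from a pod in the same namespace should return the probe metrics, which is a quick way to check the exporter before involving Prometheus at all.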
- Create a Secret from the prometheus-additional.yaml file.
k create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml --dry-run=client -o yaml > additional-scrape-configs.yaml
k apply -f additional-scrape-configs.yaml -n thanos
k get secret additional-scrape-configs -n thanos -o yaml
- Inspect and edit the Prometheus custom resource.
k get Prometheus kube-prometheus-prometheus -n thanos -o yaml
k edit Prometheus kube-prometheus-prometheus -n thanos
Add the following under spec:
  additionalScrapeConfigs:
    key: prometheus-additional.yaml
    name: additional-scrape-configs
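If you would rather not edit the resource interactively, the same field can probably be set with a merge patch. This is a sketch, not something taken from the steps above: it assumes the resource, namespace, Secret name, and key used in this post, and `APPLY` is a hypothetical switch so that the script only prints the command unless you opt in.

```shell
#!/bin/sh
# Merge patch that sets spec.additionalScrapeConfigs on the Prometheus resource.
# The Secret name and key are the ones created in the previous step.
PATCH='{"spec":{"additionalScrapeConfigs":{"name":"additional-scrape-configs","key":"prometheus-additional.yaml"}}}'

# APPLY is a hypothetical switch for this sketch: print the command by default,
# and run it against the cluster only when APPLY=1 is set.
if [ "${APPLY:-0}" = "1" ]; then
  kubectl patch prometheus kube-prometheus-prometheus -n thanos --type merge -p "$PATCH"
else
  echo "kubectl patch prometheus kube-prometheus-prometheus -n thanos --type merge -p '$PATCH'"
fi
```

A merge patch is used because custom resources do not support the default strategic merge patch type.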
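- Check the targets on the Prometheus web page.
Note: if the target never shows up, try deleting the kube-prometheus-operator pod so that it restarts. If the target is still missing after the restart, check the kube-prometheus-operator logs.
Then check that the blackbox job appears in the Prometheus configuration.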
- Import dashboard template 14603 into Grafana.
- Configure a PrometheusRule.
cat web-promerule.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: kube-prometheus-prometheus
    role: alert-rules
  name: web-status-alert
  namespace: thanos
spec:
  groups:
    - name: webstatus-alert
      rules:
        - alert: curlHttpStatus
          # the job label must match the job_name of the scrape config ('blackbox' above)
          expr: probe_http_status_code{job="blackbox"} >= 400
          for: 1m
          labels:
            severity: red
          annotations:
            summary: 'Web endpoint returned an error status code (>= 400)'
            description: '{{$labels.instance}} is unreachable, please check it promptly. Current status code: {{$value}}'
    - name: web-ssl_expiry
      rules:
        - alert: Ssl Cert Will Expire in 30 days
          expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Domain certificate is about to expire (instance {{ $labels.instance }})"
            description: "Domain certificate expires in less than 30 days \n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
k apply -f web-promerule.yaml
k get PrometheusRule -n thanos -o yaml
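The threshold in the certificate rule is simply 30 days expressed in seconds, compared against the difference between the certificate's expiry timestamp and the current time. A small sketch of that arithmetic follows. `days_left` is a hypothetical helper, and the timestamps are made-up values rather than output from a real probe.

```shell
#!/bin/sh
# Hypothetical helper: whole days between "now" and a certificate's expiry,
# both given as Unix timestamps (the unit of probe_ssl_earliest_cert_expiry).
days_left() {
  expiry="$1"
  now="$2"
  echo $(( (expiry - now) / 86400 ))
}

THRESHOLD=$((86400 * 30))   # 2592000 seconds, the right-hand side of the rule
echo "threshold: $THRESHOLD seconds"

# Made-up example: a certificate that expires 10 days after an arbitrary "now".
now=1700000000
expiry=$((now + 86400 * 10))
echo "days left: $(days_left "$expiry" "$now")"
```

With these example values the remaining time is below the threshold, so the rule's expression would be true and the alert would fire once the condition has held for 5 minutes.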
- Check the rules on the Prometheus web page.