prometheus-operator Installation and Deployment
We are running an Alibaba Cloud managed K8S cluster, version 1.21, with version 0.9 of the operator stack. If you also use Alibaba Cloud managed ACK, open a ticket in advance to enable authorization management; otherwise the RoleBinding will not be found during installation.
Reference documentation:
/blog/prometheus-operator-manual/
1. Overview
1.1 Methods for deploying Prometheus monitoring in k8s
There are generally three ways to deploy Prometheus monitoring in k8s:
Manual deployment via YAML
Operator deployment
Deployment via Helm chart (see the sketch below)
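This article follows the Operator route; for comparison, the Helm route boils down to a couple of commands. This is only a sketch: the community chart kube-prometheus-stack is assumed, and the release name monitoring is arbitrary.
# helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
# helm repo update
# helm install monitoring prometheus-community/kube-prometheus-stack -n monitoring --create-namespace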
1.2 What is Prometheus Operator
Prometheus Operator is essentially a set of user-defined CRD resources plus a Controller implementation. The Operator watches these custom resources for changes and, based on their definitions, automates the management of the Prometheus Server itself and of its configuration. Below is the Prometheus Operator architecture diagram:
[Prometheus Operator architecture diagram]
Before configuring prometheus-operator to monitor the JVM, we have to understand the four CRD components of prometheus-operator. These four CRDs do the following:
Prometheus: a Prometheus Server cluster that the Operator deploys according to what is described in a custom resource of kind: Prometheus. You can think of this custom resource as a StatefulSet-like resource specialized for managing Prometheus Server.
ServiceMonitor: a Kubernetes custom resource (a CRD, just like kind: Prometheus) that describes the target list for a Prometheus Server. The Operator watches this resource for changes, dynamically updates the Prometheus Server's scrape targets, and has Prometheus Server reload its configuration (Prometheus exposes a reload HTTP endpoint, /-/reload). The resource mainly uses a selector to pick the endpoints of matching Services by their labels, and Prometheus Server pulls the metrics through those Services; the metrics must be served at an HTTP URL in the Prometheus metrics format. A ServiceMonitor can also define the metrics URL of its targets.
Alertmanager: Prometheus Operator does not only manage and deploy Prometheus Server; it also covers Alertmanager, likewise described by a kind: Alertmanager custom resource from which the Operator deploys the Alertmanager cluster.
PrometheusRule: with natively managed Prometheus, we have to create the alerting rule files by hand and load them declaratively in the Prometheus configuration. Under the Prometheus Operator model, alerting rules also become resources created declaratively through the Kubernetes API. Once a rule is created, you associate it with a Prometheus instance via ruleSelector label matching, just as with ServiceMonitor.
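To make the ServiceMonitor label-selection mechanics concrete, here is a minimal hypothetical sketch; the name my-app, the default namespace, and the http-metrics port name are illustrative assumptions, not part of the kube-prometheus manifests:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app                 # hypothetical name
  namespace: monitoring
  labels:
    k8s-app: my-app
spec:
  selector:
    matchLabels:
      app: my-app              # must match the labels on the target Service
  namespaceSelector:
    matchNames:
    - default                  # namespace the target Service lives in
  endpoints:
  - port: http-metrics         # name of the Service port to scrape
    path: /metrics             # URL path serving Prometheus-format metrics
    interval: 30s
Prometheus Server then resolves the selected Service's endpoints and scrapes each backing Pod over HTTP.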
2. Installation and Deployment
1. Download the deployment package
wget https://github.com/prometheus-operator/kube-prometheus/archive/v0.7.0.zip
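After downloading, unpack the archive. The extracted directory name kube-prometheus-0.7.0 follows GitHub's archive naming convention (an assumption worth checking with ls); renaming it to kube-prometheus matches the paths used below:
# unzip v0.7.0.zip
# mv kube-prometheus-0.7.0 kube-prometheus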
2. Modify the files
For kubelet, the metrics port 10250 is https and 10255 is http.
For kube-scheduler, the metrics port 10259 is https and 10251 is http.
For kube-controller-manager, the metrics port 10257 is https and 10252 is http.
kubernetes-serviceMonitorKubeScheduler.yaml
kubernetes-serviceMonitorKubeControllerManager.yaml
kubernetes-serviceMonitorKubelet.yaml
These YAML files scrape over the https port (10250) by default, so we need to change the port to http-metrics and likewise change the scheme to http.
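After the edit, the endpoints fragment of each of the three ServiceMonitor files should look roughly like this (a sketch, not the complete manifest; the original port name https-metrics is from upstream kube-prometheus and may differ in your copy):
spec:
  endpoints:
  - port: http-metrics   # was https-metrics
    scheme: http         # was https
    interval: 30s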
3. Deploy
# cd kube-prometheus/manifests/setup
# kubectl apply -f .
# cd kube-prometheus/manifests/
# kubectl apply -f .
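The stack lands in the monitoring namespace; a quick sanity check after applying (illustrative commands, not from the original walkthrough):
# kubectl get pods -n monitoring
# kubectl get servicemonitors -n monitoring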
Create an ingress for prometheus, grafana, and alertmanager:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: prometheus-alertmanager-grafana-ingress
  namespace: monitoring
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: 'true'
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
    nginx.ingress.kubernetes.io/connection-proxy-header: "keep-alive"
    nginx.ingress.kubernetes.io/proxy-http-version: "1.1"
    nginx.ingress.kubernetes.io/proxy-body-size: 80m
spec:
  tls:
  - hosts:
    - ''
    secretName: xxx-com-secret
  - hosts:
    - ''
    secretName: xxx-com-secret
  - hosts:
    - ''
    secretName: xxx-com-secret
  rules:
  - host:
    http:
      paths:
      - path: /
        backend:
          serviceName: prometheus-k8s
          servicePort: 9090
  - host:
    http:
      paths:
      - path: /
        backend:
          serviceName: grafana
          servicePort: 3000
  - host:
    http:
      paths:
      - path: /
        backend:
          serviceName: alertmanager-main
          servicePort: 9093
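Save the manifest under any filename (monitoring-ingress.yaml below is arbitrary), apply it, and confirm the ingress exists:
# kubectl apply -f monitoring-ingress.yaml
# kubectl get ingress -n monitoring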
Fixing the Watchdog, ControllerManager, and Scheduler monitoring issues
Watchdog is a normal alert. Its purpose is this: if Alertmanager or Prometheus itself goes down, no alerts can be sent at all, so people usually either set up a second monitoring system to watch Prometheus, or define an alert notification that fires continuously; the day that notification stops arriving, the monitoring itself has a problem. prometheus operator already accounts for this and ships a Watchdog alert as a self-check.
If you need to turn it off, delete or comment out the Watchdog block in prometheus-rules.yaml:
...
- name: general.rules
  rules:
  - alert: TargetDown
    annotations:
      message: 'xxx'
    expr: 100 * (count(up == 0) BY (job, namespace, service) / count(up) BY (job, namespace, service)) > 10
    for: 10m
    labels:
      severity: warning
#  - alert: Watchdog
#    annotations:
#      message: |
#        This is an alert meant to ensure that the entire alerting pipeline is functional.
#        This alert is always firing, therefore it should always be firing in Alertmanager
#        and always fire against a receiver. There are integrations with various notification
#        mechanisms that send a notification when this alert is not firing. For example the
#        "DeadMansSnitch" integration in PagerDuty.
#    expr: vector(1)
#    labels:
#      severity: none
The corresponding Watchdog ServiceMonitor can also be deleted.
Fixing KubeControllerManagerDown and KubeSchedulerDown
The cause is the following content in prometheus-serviceMonitorKubeControllerManager.yaml, while a default cluster installation creates no svc for the system kube-controller-manager component:
selector:
  matchLabels:
    k8s-app: kube-controller-manager
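You can confirm the Service is indeed absent on a default installation (an illustrative check; it should return no resources until we create the Service below):
# kubectl get svc -n kube-system -l k8s-app=kube-controller-manager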
Change kube-controller-manager's listen address:
# vim /etc/kubernetes/manifests/kube-controller-manager.yaml
...
spec:
  containers:
  - command:
    - kube-controller-manager
    - --allocate-node-cidrs=true
    - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --bind-address=0.0.0.0   # listen on all interfaces so the metrics port is reachable
# netstat -lntup|grep kube-contro
tcp6 0 0 :::10257 :::* LISTEN 38818/kube-controll
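If the scheduler's metrics endpoint is likewise only bound to localhost, the same kind of edit applies to its static pod manifest (a sketch assuming a kubeadm-style layout; verify with netstat as above):
# vim /etc/kubernetes/manifests/kube-scheduler.yaml
...
    - --bind-address=0.0.0.0   # expose the metrics port beyond 127.0.0.1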
Create prometheus-kube-controller-manager-service.yaml and prometheus-kube-scheduler-service.yaml so that the ServiceMonitors have endpoints to watch.
# cat prometheus-kube-controller-manager-service.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
spec:
  selector:
    component: kube-controller-manager
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP
# cat prometheus-kube-scheduler-service.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  selector:
    component: kube-scheduler
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP
# 10251 is the port serving the kube-scheduler component's metrics; 10252 is the port serving kube-controller-manager's monitoring data.
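Apply both Service manifests so the ServiceMonitors can pick them up:
# kubectl apply -f prometheus-kube-controller-manager-service.yaml
# kubectl apply -f prometheus-kube-scheduler-service.yaml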
In the labels and selector sections above, the labels section must stay consistent with the selector of the corresponding ServiceMonitor object. The selector here is component=kube-scheduler. Why that label? We can describe the kube-scheduler Pod to see:
# kubectl describe pod kube-scheduler-k8s-master -n kube-system
Name: kube-scheduler-k8s-master
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: k8s-master/10.6.76.25
Start Time: Thu, 29 Aug 2019 09:21:01 +0800
Labels: component=kube-scheduler
tier=control-plane
# kubectl describe pod kube-controller-manager-k8s-master -n kube-system
Name: kube-controller-manager-k8s-master
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: k8s-master/10.6.76.25
Start Time: Thu, 29 Aug 2019 09:21:01 +0800
Labels: component=kube-controller-manager
tier=control-plane
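Once the Services exist, the kube-scheduler and kube-controller-manager targets should show up under Status -> Targets in the Prometheus UI. If the ingress is not set up yet, a port-forward works for a quick look (illustrative command; prometheus-k8s is the Service created by kube-prometheus):
# kubectl port-forward -n monitoring svc/prometheus-k8s 9090:9090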
Access Prometheus, Grafana, and Alertmanager in the browser via the ingress.
Reference: