Prometheus Alertmanager: Configuring WeCom (WeChat Work) Alerts
I. Install Alertmanager
1 Download
The release tarball is downloaded directly from the official release page.
wget https://github.com/prometheus/alertmanager/releases/download/v0.24.0/alertmanager-0.24.0.linux-amd64.tar.gz
2 Install Alertmanager and create the alertmanager user
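The commands for this step are not listed, so the following is only a sketch of what it might look like. It assumes the tarball from step 1 is in the current directory and that the install prefix is /usr/local/alertmanager, which is the path the unit file in step 3 uses. The `useradd` flags, the `template` directory, and the `run` helper are assumptions for illustration. The script defaults to a dry run that only prints each command.

```shell
#!/bin/sh
# Sketch only: the prefix matches the unit file in step 3; useradd flags are an assumption.
set -eu

VERSION="0.24.0"
TARBALL="alertmanager-${VERSION}.linux-amd64.tar.gz"
PREFIX="/usr/local/alertmanager"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands, 0 = execute them

# run: print the command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

run tar -xzf "$TARBALL" -C /usr/local
run mv "/usr/local/alertmanager-${VERSION}.linux-amd64" "$PREFIX"
run mkdir -p "$PREFIX/data" "$PREFIX/template"
run useradd --system --no-create-home --shell /sbin/nologin alertmanager
run chown -R alertmanager:alertmanager "$PREFIX"
```

Run it once as-is to review the printed commands, then again as root with `DRY_RUN=0` to apply them.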
3 Enable start on boot
cat /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=alertmanager
Documentation=https://github.com/prometheus/alertmanager
After=network.target

[Service]
Type=simple
User=alertmanager
ExecStart=/usr/local/alertmanager/alertmanager --storage.path=/usr/local/alertmanager/data --config.file=/usr/local/alertmanager/alertmanager.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target

4 Configure Alertmanager
Note: in production, set repeat_interval to a longer value that suits your environment, so that the same unresolved alert is not sent over and over.
cat /usr/local/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m          # an alert is treated as resolved if it has not been updated for 5 minutes
  wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'   # WeCom API endpoint, no need to change
templates:                     # alert templates
  - './template/*.tmpl'
route:                         # alert routing policy
  group_by: ['alertname']      # labels used for grouping
  group_wait: 10s              # wait 10s after an alert fires so alerts in the same group are sent together
  group_interval: 10s          # wait before notifying about new alerts added to an already notified group
  repeat_interval: 1m          # interval before an unresolved alert is sent again; 1 minute here for testing only
  receiver: 'wechat'           # default receiver
receivers:
  - name: 'wechat'
    wechat_configs:
      - send_resolved: true
        agent_id: ''           # agentId of the self-built app
        to_party: ''           # ID of the department that receives the alerts
        api_secret: ''         # secret of the self-built app
        corp_id: ''            # enterprise ID

agent_id and api_secret can be found in the WeCom admin console; corp_id is listed under the enterprise information page.
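Before restarting anything, it can help to confirm that corp_id and api_secret are accepted by WeCom. The sketch below builds the access-token URL of the WeCom `gettoken` endpoint under the same base address as `wechat_api_url`. The two values and the `gettoken_url` helper are hypothetical placeholders, and the real request is left commented out because it needs network access.

```shell
# Hypothetical placeholders: replace with your own corp_id and api_secret.
CORP_ID="your_corp_id"
API_SECRET="your_api_secret"

# gettoken_url: build the WeCom access-token URL from a corp ID and a secret.
gettoken_url() {
  printf 'https://qyapi.weixin.qq.com/cgi-bin/gettoken?corpid=%s&corpsecret=%s\n' "$1" "$2"
}

gettoken_url "$CORP_ID" "$API_SECRET"

# To query the API for real (needs network access), uncomment:
# curl -s "$(gettoken_url "$CORP_ID" "$API_SECRET")"
```

A JSON response with an `errcode` of 0 indicates that the pair is valid; any other code points to a wrong ID or secret.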
5 Add the Alertmanager address to prometheus.yml
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "localhost:9093"

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules/*_alert.yml"
  # - "second_rules.yml"

# Scrape Alertmanager's own metrics.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "alertmanager"
    static_configs:
      - targets: ['localhost:9093']

6 Configure the alert template
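The template in this step shifts every timestamp with `.Add 28800e9` before formatting it. Alertmanager timestamps are in UTC and Go durations are counted in nanoseconds, so the constant appears to be an 8-hour offset for UTC+8. A quick check of that arithmetic, where the variable names are only illustrative:

```shell
# 8 hours expressed in nanoseconds, the unit Go's time.Duration uses.
HOURS=8
OFFSET_NS=$((HOURS * 3600 * 1000000000))
echo "$OFFSET_NS"   # prints 28800000000000, i.e. 28800e9
```

For a different time zone, the same calculation with another value of `HOURS` gives the constant to use in the template.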
cat ./template/wechat.tmpl
{{ define "wechat.default.message" }}
{{- if gt (len .Alerts.Firing) 0 -}}
{{- range $index, $alert := .Alerts -}}
{{- if eq $index 0 }}
========= Monitoring Alert =========
Status: {{ .Status }}
Severity: {{ .Labels.severity }}
Alert type: {{ $alert.Labels.alertname }}
Affected host: {{ $alert.Labels.instance }} {{ $alert.Labels.pod }}
Summary: {{ $alert.Annotations.summary }}
Details: {{ $alert.Annotations.message }}{{ $alert.Annotations.description}};
Trigger value: {{ .Annotations.value }}
Started at: {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
========= = end = =========
{{- end }}
{{- end }}
{{- end }}
{{- if gt (len .Alerts.Resolved) 0 -}}
{{- range $index, $alert := .Alerts -}}
{{- if eq $index 0 }}
========= Recovered =========
Alert type: {{ .Labels.alertname }}
Status: {{ .Status }}
Summary: {{ $alert.Annotations.summary }}
Details: {{ $alert.Annotations.message }}{{ $alert.Annotations.description}};
Started at: {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
Recovered at: {{ ($alert.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
{{- if gt (len $alert.Labels.instance) 0 }}
Instance: {{ $alert.Labels.instance }}
{{- end }}
========= = end = =========
{{- end }}
{{- end }}
{{- end }}
{{- end }}

7 Configure the alert rules
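The Network rule in this step divides the summed receive rate (bytes per second) by 100 and compares the result with 102400, while its description speaks of "100M". The sketch below works out which byte rate the expression actually fires at, so the threshold can be compared with the intended limit. The variable names are illustrative only.

```shell
THRESHOLD=102400   # right-hand side of the rule's comparison
DIVISOR=100        # the rule divides the byte rate by this value

BYTES_PER_SEC=$((THRESHOLD * DIVISOR))
MBIT_PER_SEC=$((BYTES_PER_SEC * 8 / 1000000))   # integer division, rounds down
MIB_PER_SEC=$((BYTES_PER_SEC / 1024 / 1024))    # integer division, rounds down

echo "fires above ${BYTES_PER_SEC} bytes/s (~${MBIT_PER_SEC} Mbit/s, ~${MIB_PER_SEC} MiB/s)"
```

If the intent is a limit of 100 Mbit/s or 100 MiB/s, the divisor or the threshold may need adjusting for your environment.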
cat rules/host_alert.yml
groups:
- name: node-alert
  rules:
  - alert: NodeDown
    expr: up{job="nodes"} == 0
    for: 30s
    labels:
      severity: critical   # the template reads this label as .Labels.severity
    annotations:
      summary: "{{ $labels.job }} {{ $labels.instance }}: server down"
      description: "{{ $labels.job }} {{ $labels.instance }}: server unreachable for more than 30s"
      value: "{{ $value }}"
  - alert: NodeCpuHigh
    expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100) > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: High CPU Usage Detected"
      description: "{{ $labels.job }} {{ $labels.instance }}: CPU usage is {{ $value }}, above 80%"
      value: "{{ $value }}"
  - alert: NodeFilesystemUsage
    expr: 100 - (node_filesystem_free_bytes{fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100) > 80
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.job }} {{ $labels.instance }}: {{ $labels.mountpoint }} partition usage is too high"
      description: "{{ $labels.job }} {{ $labels.instance }}: {{ $labels.mountpoint }} partition usage is above 80% (current value: {{ $value }})"
  - alert: NodeMemoryHigh
    expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 90
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.job }} {{ $labels.instance }}: memory usage is too high"
      description: "{{ $labels.job }} {{ $labels.instance }}: memory usage is above 90% (current: {{ $value }}%)"
  - alert: NodeIO
    expr: (avg(irate(node_disk_io_time_seconds_total[1m])) by (instance) * 100) > 60
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.job }} {{ $labels.instance }}: disk IO utilization is too high"
      description: "{{ $labels.job }} {{ $labels.instance }}: disk IO utilization is above 60% (current: {{ $value }})"
  - alert: Network
    expr: ((sum(rate(node_network_receive_bytes_total{device!~'tap.*|veth.*|br.*|docker.*|virbr*|lo*|ens*'}[5m])) by (instance)) / 100) > 102400
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.job }} {{ $labels.instance }}: inbound network bandwidth is too high"
      description: "{{ $labels.job }} {{ $labels.instance }}: inbound network bandwidth has been above 100M for 2 minutes. RX bandwidth usage: {{ $value }}"

8 After configuring, check the configuration files with the following commands
./promtool check config prometheus.yml
Checking prometheus.yml
  SUCCESS: 3 rule files found

Checking rules/blackbox_exporter_alert.yml
  SUCCESS: 1 rules found

Checking rules/check_ssl_alert.yml
  SUCCESS: 1 rules found

Checking rules/host_alert.yml
  SUCCESS: 6 rules found

./amtool check-config alertmanager.yml
Checking 'alertmanager.yml'  SUCCESS
Found:
 - global config
 - route
 - 0 inhibit rules
 - 1 receivers
 - 1 templates
  SUCCESS

9 Restart alertmanager and prometheus to apply the changes
systemctl restart alertmanager
systemctl restart prometheus
10 Finally, add the IP address of the Alertmanager host as a trusted enterprise IP in the self-built app's settings. Otherwise no alert messages will be delivered.
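Once the trusted IP is set, delivery can be checked without waiting for a real failure by posting an alert to Alertmanager by hand. The sketch below prints a JSON payload for the `/api/v2/alerts` endpoint. The alert name, labels, and the `test_alert_payload` helper are hypothetical, and the `curl` call is commented out because it needs a running Alertmanager at the address from step 5.

```shell
AM_URL="http://localhost:9093"   # Alertmanager address from step 5

# test_alert_payload: JSON body containing a single hypothetical alert.
test_alert_payload() {
  cat <<'EOF'
[{"labels":{"alertname":"ManualTest","severity":"warning","instance":"test-host"},
  "annotations":{"summary":"manual test alert","description":"sent by hand to verify WeCom delivery"}}]
EOF
}

test_alert_payload

# To actually send it (needs a running Alertmanager), uncomment:
# test_alert_payload | curl -s -X POST -H 'Content-Type: application/json' -d @- "$AM_URL/api/v2/alerts"
```

With the configuration from step 4, the message should reach WeCom after the `group_wait` delay of about 10 seconds.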