Prometheus + Grafana + Alertmanager: Usage Summary
Prometheus is an open-source monitoring and alerting system with a built-in time-series database. Grafana is commonly used alongside it to visualize the collected data.
1. Monitoring System Architecture
1.1 Core Components
Prometheus Server: scrapes and stores time-series data, and also provides querying and alert rule configuration management.
Exporters: data collectors, for example node_exporter for machine metrics and the MongoDB exporter for MongoDB metrics.
Alertmanager: manages alert notifications.
Grafana: renders monitoring data as charts and dashboards.
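To make the exporter role concrete, the following is a minimal sketch of a process that exposes a value over HTTP in the Prometheus text exposition format, so that Prometheus Server can pull it. It is not how node_exporter is implemented; the class name ToyExporter, port 9101, and the metric name are assumptions made up for this illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy exporter: serves gauge values in the Prometheus text exposition format.
class ToyExporter {

    // Renders each entry as a "# TYPE <name> gauge" line followed by "<name> <value>".
    static String render(Map<String, Double> gauges) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Double> e : gauges.entrySet()) {
            sb.append("# TYPE ").append(e.getKey()).append(" gauge\n");
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Port 9101 and the metric name are hypothetical choices for this sketch.
        HttpServer server = HttpServer.create(new InetSocketAddress(9101), 0);
        server.createContext("/metrics", exchange -> {
            Map<String, Double> gauges = new LinkedHashMap<>();
            gauges.put("toy_jvm_free_memory_bytes", (double) Runtime.getRuntime().freeMemory());
            byte[] body = render(gauges).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "text/plain; version=0.0.4");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        // Once started, a Prometheus scrape job could target localhost:9101/metrics.
        server.start();
    }
}
```

The point of the sketch is the direction of data flow: the exporter only publishes its current values, and Prometheus decides when to collect them.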
2. Installing the Base Components
Because this setup is for learning and experimentation, the environment is installed quickly with Docker.
2.1 Installing Node Exporter
docker-compose-node-export.yml
version: '3'
services:
  node-exporter:
    image: prom/node-exporter
    container_name: node-exporter
    hostname: node-exporter
    restart: always
    ports:
      - "9100:9100"
2.2 Installing Alertmanager
docker-compose-alertmanager.yml
version: '3'
services:
  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    hostname: alertmanager
    restart: always
    volumes:
      - /data/docker_file/monitor/conf/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"
alertmanager.yml
global:
  smtp_smarthost: 'smtp.qq.com:25'          # QQ Mail SMTP server
  smtp_from: '793272861@qq.com'             # sender address
  smtp_auth_username: '793272861@qq.com'    # SMTP username, i.e. your own mailbox
  smtp_auth_password: '****************'    # SMTP password for the sender mailbox
  smtp_require_tls: false                   # do not require TLS
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10m
  receiver: live-monitoring
receivers:
  - name: 'live-monitoring'
    email_configs:
      - to: '793272861@qq.com'              # recipient address
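One way to check the mail settings above without waiting for a real alert is to push a test alert into Alertmanager by hand. The sketch below posts a single alert to Alertmanager's v2 HTTP API (POST /api/v2/alerts). The class name, the alert name ManualTestAlert, and the severity label are assumptions for illustration; the URL assumes the port mapping 9093:9093 used in this section.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sends one hand-made alert to a locally running Alertmanager.
class AlertmanagerSmokeTest {

    // Builds a one-element JSON array in the shape accepted by POST /api/v2/alerts.
    // The inputs are assumed to contain no characters that need JSON escaping.
    static String alertJson(String alertName, String summary) {
        return "[{\"labels\":{\"alertname\":\"" + alertName + "\",\"severity\":\"test\"},"
                + "\"annotations\":{\"summary\":\"" + summary + "\"}}]";
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("http://localhost:9093/api/v2/alerts"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        alertJson("ManualTestAlert", "sent by hand")))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Alertmanager answered HTTP " + response.statusCode());
    }
}
```

Because the route groups by alertname with group_wait: 10s, the notification for a new group should be dispatched roughly ten seconds after the alert arrives, provided the SMTP settings are accepted by the mail server.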
2.3 Installing Prometheus
docker-compose-prometheus.yml
version: '3'
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    hostname: prometheus
    restart: always
    volumes:
      - /data/docker_file/prometheus/data:/prometheus
      - /data/docker_file/prometheus/conf/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# Scrape configurations: Prometheus itself plus the node exporter.
# Each job is polled periodically to pull monitoring data.
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']
  - job_name: 'node-exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['node-exporter:9100']
Prometheus service discovery mechanism
Automatic service discovery can be implemented with Consul.
Access: http://localhost:9090/
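Besides opening the web UI, the scrape jobs can be checked programmatically. The sketch below runs the instant query up against the Prometheus HTTP API (GET /api/v1/query); a value of 1 for a target means the last scrape succeeded. The class and method names are assumptions for illustration, and the base URL assumes the 9090:9090 port mapping from this section.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Runs an instant PromQL query against a local Prometheus server.
class PrometheusQuery {

    // Builds the instant-query URI; the PromQL expression is URL-encoded
    // because selectors contain characters such as braces and quotes.
    static URI instantQueryUri(String baseUrl, String promQl) {
        return URI.create(baseUrl + "/api/v1/query?query="
                + URLEncoder.encode(promQl, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = HttpRequest
                .newBuilder(instantQueryUri("http://localhost:9090", "up"))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // The body is JSON; each result carries the job and instance labels
        // together with the current sample value.
        System.out.println(response.body());
    }
}
```

To narrow the check to a single job, a selector such as up{job="node-exporter"} can be passed instead of up.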
2.4 Installing Grafana
docker-compose-grafana.yml
version: '3'
services:
  grafana:
    image: grafana/grafana
    container_name: grafana
    hostname: grafana
    restart: always
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - /data/docker_file/grafana/data:/var/lib/grafana
      - /data/docker_file/grafana/log:/var/log/grafana
    ports:
      - "3000:3000"
Add a data source (Prometheus)
Access: http://localhost:3000/ (default username: admin, password: admin)
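The data source can be added through the Grafana UI, or scripted. As a rough sketch, the code below creates a Prometheus data source through Grafana's HTTP API (POST /api/datasources) using basic authentication. The class name and the data source name are assumptions; the Prometheus URL http://prometheus:9090 assumes Grafana and Prometheus share a Docker network in which the prometheus hostname resolves, as in the combined Compose file in section 2.5.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Registers Prometheus as a Grafana data source over the HTTP API.
class GrafanaDataSource {

    // Builds the value of an HTTP Basic Authorization header.
    static String basicAuth(String user, String password) {
        String raw = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    // Minimal request body; "proxy" means Grafana's backend queries Prometheus.
    // The inputs are assumed to contain no characters that need JSON escaping.
    static String dataSourceJson(String name, String prometheusUrl) {
        return "{\"name\":\"" + name + "\",\"type\":\"prometheus\",\"url\":\""
                + prometheusUrl + "\",\"access\":\"proxy\"}";
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("http://localhost:3000/api/datasources"))
                .header("Content-Type", "application/json")
                .header("Authorization", basicAuth("admin", "admin"))
                .POST(HttpRequest.BodyPublishers.ofString(
                        dataSourceJson("Prometheus", "http://prometheus:9090")))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```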
2.5 Combined Docker Compose Script
Running all four services from a single file places them on the same monitor network, so the hostnames used in prometheus.yml (alertmanager, node-exporter, prometheus) resolve between containers.
version: '3'
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    hostname: prometheus
    restart: always
    volumes:
      - /data/docker_file/prometheus/data:/prometheus
      - /data/docker_file/prometheus/conf/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    networks:
      - monitor
  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    hostname: alertmanager
    restart: always
    volumes:
      - /data/docker_file/monitor/conf/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"
    networks:
      - monitor
  grafana:
    image: grafana/grafana
    container_name: grafana
    hostname: grafana
    restart: always
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - /data/docker_file/grafana/data:/var/lib/grafana
      - /data/docker_file/grafana/log:/var/log/grafana
    ports:
      - "3000:3000"
    networks:
      - monitor
  node-exporter:
    image: prom/node-exporter
    container_name: node-exporter
    hostname: node-exporter
    restart: always
    ports:
      - "9100:9100"
    networks:
      - monitor
networks:
  monitor:
    driver: bridge
3. Configuring Grafana Dashboards
Grafana pulls data from Prometheus using PromQL queries and renders it in panels. A Grafana dashboard is made up of individual panels.
3.1 Downloading Grafana Dashboard Files
Ready-made Grafana dashboard files can be downloaded from the official website and imported into your Grafana instance for immediate use.
Recommended Grafana dashboards
JVM (Micrometer)
Spring Boot 2.1 Statistics
Basic host monitoring (CPU, memory, disk, network)
Node Exporter for Prometheus Dashboard CN
Druid Connection Pool Dashboard
Importing a Grafana dashboard
3.2 Adding and Modifying Grafana Panels (Extension)
The stock Spring Boot 2.1 Statistics dashboard has no panels for outbound third-party requests. Using it as an example, we add a Client Request Count panel and a Client Response Time panel for third-party requests.
Client Request Count
irate(http_client_requests_seconds_count{instance="$instance", application="$application", uri!~".*actuator.*"}[5m])
Note: the Meter in the application must be named http.client.requests.
Client Response Time
irate(http_client_requests_seconds_sum{instance="$instance", application="$application",uri!~".*actuator.*"}[5m]) / irate(http_client_requests_seconds_count{instance="$instance", application="$application",uri!~".*actuator.*"}[5m])
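The Client Response Time expression divides the per-second growth of the total request time by the per-second growth of the request count, which yields the average duration per request. The following worked example reproduces that arithmetic for two consecutive samples. The sample values and the class name are invented for illustration, and counter resets, which irate() handles, are ignored here.

```java
// Reproduces the arithmetic behind irate(sum) / irate(count) for two samples.
class IrateExample {

    // Per-second increase between the two most recent samples of a counter,
    // which is the quantity irate() evaluates.
    static double perSecondRate(double previous, double latest, double secondsBetween) {
        return (latest - previous) / secondsBetween;
    }

    // Average seconds per request: growth of _seconds_sum divided by growth of _seconds_count.
    static double averageSeconds(double sumPrev, double sumLatest,
                                 double countPrev, double countLatest,
                                 double secondsBetween) {
        double sumRate = perSecondRate(sumPrev, sumLatest, secondsBetween);
        double countRate = perSecondRate(countPrev, countLatest, secondsBetween);
        return sumRate / countRate;
    }

    public static void main(String[] args) {
        // Hypothetical samples taken 15 s apart:
        // the request count grew from 100 to 130, the total time from 20.0 s to 26.0 s.
        System.out.println("requests per second: " + perSecondRate(100, 130, 15));
        System.out.println("average seconds per request: "
                + averageSeconds(20.0, 26.0, 100, 130, 15));
    }
}
```

With these numbers the request rate is 30 / 15 = 2 requests per second, and the average response time is (6.0 / 15) / 2 = 0.2 seconds. When no requests arrive between the two samples, the division is 0 / 0 and the panel shows no meaningful value for that interval.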
4. Integrating Micrometer with Spring Boot
Metrics
Micrometer provides vendor-neutral interfaces for timers, gauges, counters, distribution summaries, and long task timers. It has a dimensional data model: when paired with a dimensional monitoring system, a specific named metric can be accessed efficiently and then drilled into across its dimensions.
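The dimensional data model can be illustrated without any library: a metric name plus a set of tags identifies one time series, and queries can aggregate across whichever tag they care about. The toy class below is only a sketch of that idea using the JDK; it is not Micrometer's API or implementation, and the class and method names are made up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Toy dimensional counter store: metric name -> (tag set -> counter).
class DimensionalCounters {

    // Each distinct tag set under a metric name is its own time series.
    private final Map<String, Map<Map<String, String>, LongAdder>> series =
            new ConcurrentHashMap<>();

    // Increments the series identified by the name and the exact tag set.
    // Map equality ignores insertion order, so tag order does not matter.
    void increment(String name, Map<String, String> tags) {
        series.computeIfAbsent(name, n -> new ConcurrentHashMap<>())
                .computeIfAbsent(Map.copyOf(tags), t -> new LongAdder())
                .increment();
    }

    // Drills down along one dimension: sums every series of the metric
    // whose tags contain tagKey=tagValue, regardless of the other tags.
    long sumWhere(String name, String tagKey, String tagValue) {
        return series.getOrDefault(name, Map.of()).entrySet().stream()
                .filter(e -> tagValue.equals(e.getKey().get(tagKey)))
                .mapToLong(e -> e.getValue().sum())
                .sum();
    }

    public static void main(String[] args) {
        DimensionalCounters counters = new DimensionalCounters();
        counters.increment("http.client.requests", Map.of("method", "GET", "status", "200"));
        counters.increment("http.client.requests", Map.of("method", "GET", "status", "500"));
        counters.increment("http.client.requests", Map.of("method", "POST", "status", "200"));
        System.out.println("GET requests: "
                + counters.sumWhere("http.client.requests", "method", "GET"));
    }
}
```

In Prometheus the same drill-down is expressed with label selectors, for example filtering http_client_requests_seconds_count by method or status.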
4.1 Adding Dependencies
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>${micrometer.version}</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
4.2 Enabling the Prometheus Endpoint
spring:
  application:
    name: spring-boot-node
management:
  metrics:
    # 1. Add global tags, which can later be used as variables when querying the data
    tags:
      application: ${spring.application.name}
  endpoints:
    web:
      exposure:
        # 2. Expose the prometheus endpoint
        include: 'health,prometheus'
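After the application starts with this configuration, the endpoint can be inspected directly to confirm that metrics are exposed and carry the application tag. The sketch below fetches /actuator/prometheus and prints the samples of one metric. The port 8080 (Spring Boot's default) and the class name are assumptions; jvm_memory_used_bytes is one of the JVM metrics Micrometer normally publishes, but any exposed metric name can be substituted.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.stream.Collectors;

// Fetches the Actuator Prometheus endpoint and filters samples by metric name prefix.
class ActuatorScrapeCheck {

    // Keeps sample lines that start with the prefix and skips
    // the "# HELP" and "# TYPE" comment lines of the exposition format.
    static List<String> samplesWithPrefix(String exposition, String prefix) {
        return exposition.lines()
                .filter(line -> !line.startsWith("#"))
                .filter(line -> line.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws Exception {
        // Port 8080 is an assumption; adjust it to the application's server.port.
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("http://localhost:8080/actuator/prometheus"))
                .GET()
                .build();
        String body = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
        samplesWithPrefix(body, "jvm_memory_used_bytes").forEach(System.out::println);
    }
}
```

Each printed line should include application="spring-boot-node" among its labels, which is the global tag that the dashboards use as the $application variable.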
4.3 Monitoring Third-Party Requests
OkHttpMetricsEventListener makes it straightforward to monitor requests made through an OkHttp client.
Configuring the OkHttp client event listener
// meterRegistry and metricsProperties are assumed to be injected fields of this configuration class.
@Bean("okHttpClient")
public OkHttpClient okHttpClient(ConnectionPool connectionPool) {
    return new OkHttpClient().newBuilder().connectionPool(connectionPool)
            .connectTimeout(5, TimeUnit.SECONDS)
            .readTimeout(10, TimeUnit.SECONDS)
            .eventListener(eventListener())
            .build();
}
/**
 * Event listener: OkHttpMetricsEventListener.
 * metricsProperties.getWeb().getClient().getRequestsMetricName() returns 'http.client.requests',
 * which is used as the meter name.
 * @return the listener that records OkHttp request metrics
 */
private EventListener eventListener() {
    return OkHttpMetricsEventListener.builder(
            meterRegistry, metricsProperties.getWeb().getClient().getRequestsMetricName())
            .build();
}
How it works: OkHttpMetricsEventListener.java (excerpt)
public class OkHttpMetricsEventListener extends EventListener {
    /**
     * Header name for URI patterns which will be used for tag values.
     */
    public static final String URI_PATTERN = "URI_PATTERN";
    @Override
    public void callFailed(Call call, IOException e) {
        CallState state = callState.remove(call);
        if (state != null) {
            state.exception = e;
            // The request has finished (with a failure): record the monitoring data
            time(state);
        }
    }
    @Override
    public void responseHeadersEnd(Call call, Response response) {
        CallState state = callState.remove(call);
        if (state != null) {
            state.response = response;
            // The request has finished: record the monitoring data
            time(state);
        }
    }
    private void time(CallState state) {
        String uri = state.response == null ? "UNKNOWN" :
                (state.response.code() == 404 || state.response.code() == 301 ? "NOT_FOUND" : urlMapper.apply(state.request));
        // Define tags (dimensions) that can be used as labels and variables in Prometheus and Grafana
        Iterable<Tag> tags = Tags.concat(extraTags, Tags.of(
                "method", state.request != null ? state.request.method() : "UNKNOWN",
                "uri", uri,
                "status", getStatusMessage(state.response, state.exception),
                "host", state.request != null ? state.request.url().host() : "UNKNOWN"
        ));
        // Register the timer and record the elapsed time; Prometheus can then pull the data
        // from the /actuator/prometheus endpoint exposed by Spring Boot Actuator
        Timer.builder(this.requestsMetricName)
                .tags(tags)
                .description("Timer of OkHttp operation")
                .register(registry)
                .record(registry.config().clock().monotonicTime() - state.startTime, TimeUnit.NANOSECONDS);
    }
}
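The pattern in the excerpt is: keep per-call state keyed by the call, and when either the success or the failure callback fires, remove that state and record the elapsed time exactly once. The toy class below sketches the same pattern with the JDK only. It is not the Micrometer implementation; the class name, the injected clock, and the assumption that the start time is stored when the call begins (the excerpt does not show that part) are all made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Toy version of the "store start time, remove on completion, record once" pattern.
class CallTimer {

    private final Map<Object, Long> startTimes = new ConcurrentHashMap<>();
    private final LongSupplier nanoClock;
    private long completedCalls;
    private long totalNanos;

    // The clock is injected so that the timing logic can be exercised deterministically.
    CallTimer(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    // Remembers when the call started, keyed by the call object itself.
    void callStart(Object call) {
        startTimes.put(call, nanoClock.getAsLong());
    }

    // Invoked from both the success and the failure path. remove() returns null
    // for an unknown or already completed call, so each call is recorded at most once.
    synchronized void callEnd(Object call) {
        Long start = startTimes.remove(call);
        if (start != null) {
            completedCalls++;
            totalNanos += nanoClock.getAsLong() - start;
        }
    }

    synchronized long completedCalls() {
        return completedCalls;
    }

    synchronized long totalNanos() {
        return totalNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        CallTimer timer = new CallTimer(System::nanoTime);
        Object call = new Object();
        timer.callStart(call);
        Thread.sleep(20); // stands in for the duration of a request
        timer.callEnd(call);
        System.out.println(timer.completedCalls() + " call(s), "
                + timer.totalNanos() + " ns in total");
    }
}
```

The two accumulated values correspond to the _count and _sum series that a timer exposes to Prometheus, which is why the Client Response Time panel can derive an average from them.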
4.4 Spring Boot Integration Example
Spring Boot Node