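Why Prometheus + Grafana?
Prometheus is a CNCF graduated project that collects metrics using a pull model: it scrapes HTTP endpoints exposed by its targets at a fixed interval. Combined with Grafana's visualization capabilities, it has become the de facto standard for cloud-native monitoring.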
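
To make the pull model concrete, here is a minimal sketch that uses only Node's standard library: the application renders its counters in the Prometheus text exposition format whenever a scraper requests `/metrics`. The names `renderCounter`, `createMetricsServer`, and `demo_requests_total` are illustrative assumptions and do not come from any library.

```javascript
const http = require('node:http');

// Escape a label value as the text exposition format requires.
function escapeLabelValue(value) {
  return String(value)
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n');
}

// Render one counter family as exposition text.
// `samples` is a list of { labels, value } objects (hypothetical shape).
function renderCounter(name, help, samples) {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`];
  for (const { labels, value } of samples) {
    const pairs = Object.entries(labels)
      .map(([key, val]) => `${key}="${escapeLabelValue(val)}"`)
      .join(',');
    lines.push(pairs ? `${name}{${pairs}} ${value}` : `${name} ${value}`);
  }
  return lines.join('\n') + '\n';
}

// In the pull model the application only answers GET /metrics;
// the scraper decides when to collect.
function createMetricsServer(getSamples) {
  return http.createServer((req, res) => {
    if (req.url !== '/metrics') {
      res.writeHead(404);
      res.end();
      return;
    }
    res.writeHead(200, { 'Content-Type': 'text/plain; version=0.0.4' });
    res.end(renderCounter('demo_requests_total', 'Demo request counter', getSamples()));
  });
}
```

Calling `createMetricsServer(() => samples).listen(8080)` would expose the endpoint. In practice a client library such as prom-client handles this rendering for you.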
1. Quick deployment
# docker-compose.yml
version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus:/etc/prometheus
      - prometheus_data:/prometheus
    command: --config.file=/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
  node-exporter:
    image: prom/node-exporter:latest
    ports:
      - "9100:9100"
volumes:
  prometheus_data:
  grafana_data:
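
Once the stack is up, it helps to confirm that each service answers before moving on. The sketch below probes Prometheus (`/-/healthy`), Grafana (`/api/health`), and node-exporter (`/metrics`). It assumes Node 18+ for the global `fetch`, and that the services are reachable on `localhost` through the port mappings above. `probe` and `checkStack` are hypothetical helper names.

```javascript
// Health endpoints for the three services in the compose file.
// Host and ports assume the port mappings shown above.
const HEALTH_CHECKS = [
  { name: 'prometheus', url: 'http://localhost:9090/-/healthy' },
  { name: 'grafana', url: 'http://localhost:3000/api/health' },
  { name: 'node-exporter', url: 'http://localhost:9100/metrics' },
];

// Probe one endpoint; resolves to { name, ok, detail } and never throws.
// `fetchImpl` is injectable so the logic can be exercised without a network.
async function probe({ name, url }, fetchImpl = fetch, timeoutMs = 3000) {
  try {
    const res = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
    return { name, ok: res.ok, detail: `HTTP ${res.status}` };
  } catch (err) {
    return { name, ok: false, detail: err.message };
  }
}

// Probe all services in parallel.
async function checkStack(fetchImpl = fetch) {
  return Promise.all(HEALTH_CHECKS.map((check) => probe(check, fetchImpl)));
}
```

Running `checkStack().then(console.table)` prints one row per service. A failed probe is reported in the `detail` column rather than thrown, so one unreachable container does not hide the status of the others.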
2. Prometheus configuration
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']
rule_files:
  - "alerts/*.yml"
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']
  - job_name: 'myapp'
    static_configs:
      - targets: ['app:8080']
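
To check that Prometheus has picked up these scrape jobs, you can query its HTTP API at `/api/v1/targets`. The following is a rough sketch, assuming Node 18+ (global `fetch`) and Prometheus reachable at `http://localhost:9090`. `summarizeTargets` and `listTargets` are hypothetical helper names.

```javascript
// Reduce a /api/v1/targets response to one entry per active target.
function summarizeTargets(body) {
  if (body.status !== 'success') {
    throw new Error(`unexpected API status: ${body.status}`);
  }
  return body.data.activeTargets.map((target) => ({
    job: target.labels.job,
    instance: target.labels.instance,
    up: target.health === 'up',
    lastError: target.lastError || null,
  }));
}

// Fetch and summarize; baseUrl assumes the 9090 port mapping from step 1.
async function listTargets(baseUrl = 'http://localhost:9090') {
  const res = await fetch(`${baseUrl}/api/v1/targets?state=active`);
  if (!res.ok) {
    throw new Error(`Prometheus returned HTTP ${res.status}`);
  }
  return summarizeTargets(await res.json());
}
```

With the compose file from step 1, the `myapp` target will likely show as down until a service named `app` is added to the same Docker network. The same applies to `alertmanager:9093`, which is referenced here but not defined in that compose file.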
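3. Application instrumentation
// Prometheus metrics for a Node.js (Express) application
const express = require('express');
const prometheus = require('prom-client');

const app = express();

const httpRequestsTotal = new prometheus.Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status'],
});

const httpDuration = new prometheus.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5],
});

// Middleware: count and time every request
app.use((req, res, next) => {
  const end = httpDuration.startTimer();
  res.on('finish', () => {
    // Use the matched route pattern (e.g. /users/:id) rather than the raw
    // path, so that IDs in URLs do not create unbounded label values.
    const route = req.route ? String(req.route.path) : 'unmatched';
    httpRequestsTotal.inc({ method: req.method, route, status: res.statusCode });
    end({ method: req.method, route });
  });
  next();
});

// Expose the metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', prometheus.register.contentType);
  res.end(await prometheus.register.metrics());
});

// Port 8080 matches the 'myapp' scrape target in prometheus.yml
app.listen(8080);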
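
The `buckets` array above defines upper bounds, and Prometheus histograms are cumulative: each observation is counted in every bucket whose bound is greater than or equal to the observed value, plus an implicit `+Inf` bucket. The plain-JavaScript sketch below illustrates that counting rule. It is not prom-client's implementation, and `CumulativeHistogram` is a hypothetical name.

```javascript
// Illustrative cumulative histogram; not how prom-client is implemented.
class CumulativeHistogram {
  constructor(upperBounds) {
    this.upperBounds = [...upperBounds].sort((a, b) => a - b);
    this.counts = new Array(this.upperBounds.length).fill(0);
    this.count = 0; // total observations, also the +Inf bucket
    this.sum = 0;   // sum of all observed values
  }

  // Record one observation (for example, a request duration in seconds).
  observe(value) {
    this.upperBounds.forEach((bound, i) => {
      if (value <= bound) this.counts[i] += 1;
    });
    this.count += 1;
    this.sum += value;
  }

  // Return buckets in the order they would be exposed, ending with +Inf.
  snapshot() {
    const buckets = this.upperBounds.map((bound, i) => ({
      le: String(bound),
      count: this.counts[i],
    }));
    buckets.push({ le: '+Inf', count: this.count });
    return buckets;
  }
}

const histogram = new CumulativeHistogram([0.01, 0.05, 0.1, 0.5, 1, 2, 5]);
[0.03, 0.3, 3].forEach((seconds) => histogram.observe(seconds));
console.log(histogram.snapshot());
```

Because the buckets are cumulative, a request slower than the largest bound (5 s here) only lands in `+Inf`. It can be worth choosing bounds that bracket your latency targets so that quantile estimates stay meaningful.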
4. Alert rules
groups:
  - name: node_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage > 80%"
      - alert: DiskSpaceLow
        expr: node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100 < 10
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Less than 10% disk space remaining"
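
The `HighCPUUsage` expression can be checked by hand. The rate of the idle counter is the fraction of each second a core spent idle; averaging that across cores and subtracting from 100 gives the busy percentage. The sketch below works through the arithmetic with made-up sample values. `simpleRate` is a deliberately simplified stand-in for PromQL's `rate()`, which also handles counter resets and extrapolation.

```javascript
// Simplified rate(): per-second increase between two samples.
// Real PromQL rate() also handles counter resets and extrapolation.
function simpleRate(first, last) {
  return (last.value - first.value) / (last.timestamp - first.timestamp);
}

// 100 - avg(rate(idle)) * 100, given [first, last] idle samples per core.
function cpuUsagePercent(idleSamplesPerCore) {
  const idleRates = idleSamplesPerCore.map(([first, last]) => simpleRate(first, last));
  const avgIdle = idleRates.reduce((sum, rate) => sum + rate, 0) / idleRates.length;
  return 100 - avgIdle * 100;
}

// Made-up samples: two cores over a 300 s window (timestamps in seconds,
// values in idle CPU seconds).
const idleSamples = [
  [{ timestamp: 0, value: 1000 }, { timestamp: 300, value: 1030 }], // 10% idle
  [{ timestamp: 0, value: 2000 }, { timestamp: 300, value: 2060 }], // 20% idle
];

const usage = cpuUsagePercent(idleSamples);
console.log(`${usage.toFixed(1)}% busy, threshold exceeded: ${usage > 80}`);
```

Here the two cores average 15% idle, so usage is about 85% and the condition holds. The alert itself would only fire after the condition has stayed true for the full `for: 5m` duration.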
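5. Importing Grafana dashboards
Recommended dashboard IDs: Node Exporter Full (1860), Docker Monitoring (193), NGINX (11190). Importing a dashboard by its ID in Grafana gives you a ready-made, production-quality monitoring view.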
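Summary
Prometheus + Grafana is a cost-effective open-source monitoring stack. Working through deployment, application instrumentation, and alerting gives you a solid foundation for improving system observability.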