go-Prometheus: A Monitoring Powerhouse for the Cloud-Native Era

icy 02-26

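What is Prometheus?

Prometheus is an open-source systems monitoring and alerting toolkit. It was originally built at SoundCloud and is now a graduated project of the Cloud Native Computing Foundation (CNCF). It is written in Go and is known for its multi-dimensional data model, its flexible query language (PromQL), and its efficient time series database.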

Core Features

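1. Multi-dimensional data model

Prometheus identifies each time series by a metric name plus a set of key-value pairs called labels. For example, http_requests_total{method="GET", endpoint="/"} and http_requests_total{method="POST", endpoint="/"} are two separate series. Because any label can be used to select or group series, filtering, aggregating, and querying the data is very flexible.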
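Here is a minimal sketch of this idea in plain Go, using only the standard library. A series is identified by its metric name plus its complete label set, and a selector made of equality matchers picks out the series whose labels agree with it. The helpers seriesID and matches are hypothetical illustrations, not part of client_golang or Prometheus itself. The output format mimics how series are written in PromQL.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesID builds a canonical identifier from a metric name and its labels.
// Label names are sorted so the same label set always yields the same ID,
// regardless of Go's random map iteration order.
func seriesID(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return name + "{" + strings.Join(parts, ",") + "}"
}

// matches reports whether a series' labels satisfy every equality matcher,
// similar in spirit to a PromQL selector such as {method="GET"}.
// A missing label reads as "", as it does in PromQL.
func matches(labels, selector map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	get := map[string]string{"method": "GET", "endpoint": "/"}
	post := map[string]string{"method": "POST", "endpoint": "/"}

	// Same metric name, different label values: two distinct series.
	fmt.Println(seriesID("http_requests_total", get))  // http_requests_total{endpoint="/",method="GET"}
	fmt.Println(seriesID("http_requests_total", post)) // http_requests_total{endpoint="/",method="POST"}

	// Filtering by one label dimension.
	sel := map[string]string{"method": "GET"}
	fmt.Println(matches(get, sel), matches(post, sel)) // true false
}
```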

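2. Powerful query language (PromQL)

PromQL lets users select and aggregate collected metrics in real time. It supports instant queries, which are evaluated at a single point in time, and range queries, which are evaluated at regular steps over a time window. It also provides arithmetic, comparison, and aggregation operators, along with functions such as rate() and histogram_quantile().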

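3. Efficient time series database

Prometheus's built-in time series database (TSDB) is optimized for monitoring workloads. Samples arrive in time order and are stored in compressed chunks on local disk, which keeps both storage and retrieval efficient.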

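4. Flexible collection

Prometheus primarily uses a pull model. It scrapes metrics over HTTP from targets, which it finds through static configuration or service discovery (for example Kubernetes, Consul, or DNS). Short-lived batch jobs cannot be scraped reliably, so they can push their metrics to the Pushgateway instead, and Prometheus then scrapes the Pushgateway.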

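5. Comprehensive alerting

Prometheus evaluates alerting rules and sends the resulting alerts to Alertmanager. Alertmanager is a separate component of the Prometheus project. It handles deduplication, grouping, routing, silencing, and inhibition of notifications.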
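Grouping is controlled by the group_by setting of an Alertmanager route. Alerts that share the same values for the listed labels are batched into one notification. This is a simplified sketch of that bucketing step only, with no timers, routing tree, silences, or inhibition. The Alert type and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Alert is a simplified alert that carries only its label set.
type Alert struct {
	Labels map[string]string
}

// groupKey joins the values of the group_by labels, so alerts that agree on
// those labels land in the same notification group.
func groupKey(a Alert, groupBy []string) string {
	parts := make([]string, len(groupBy))
	for i, l := range groupBy {
		parts[i] = l + "=" + a.Labels[l]
	}
	return strings.Join(parts, ",")
}

// groupAlerts buckets alerts by group key, mimicking which alerts a route
// with the given group_by labels would notify about together.
func groupAlerts(alerts []Alert, groupBy []string) map[string][]Alert {
	groups := map[string][]Alert{}
	for _, a := range alerts {
		k := groupKey(a, groupBy)
		groups[k] = append(groups[k], a)
	}
	return groups
}

func main() {
	alerts := []Alert{
		{Labels: map[string]string{"alertname": "ServiceDown", "job": "api", "instance": "a:8080"}},
		{Labels: map[string]string{"alertname": "ServiceDown", "job": "api", "instance": "b:8080"}},
		{Labels: map[string]string{"alertname": "HighErrorRate", "job": "api", "instance": "a:8080"}},
	}
	groups := groupAlerts(alerts, []string{"alertname", "job"})

	// Print groups in a stable order.
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s: %d alert(s)\n", k, len(groups[k]))
	}
}
```

Both ServiceDown alerts end up in one group, so a single notification covers the two instances.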

Installation and Deployment

Quick start with Docker

text

docker run -p 9090:9090 prom/prometheus

Binary installation

text

# Download a release (v2.45.0 here; check the GitHub releases page for the newest version)
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz

# Extract, then start Prometheus with the sample prometheus.yml shipped in the archive
tar xvfz prometheus-2.45.0.linux-amd64.tar.gz
cd prometheus-2.45.0.linux-amd64
./prometheus --config.file=prometheus.yml

Configuration Example

Basic configuration file (prometheus.yml)

text

global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  
  # Assumes node_exporter is running locally on its default port, 9100
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']

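Practical Examples

1. Monitoring Go application metrics

main.go: instrumenting an application with the Prometheus Go client library (client_golang)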

text

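package main

import (
    "log"
    "net/http"
    "strconv"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
    // Counts handled requests by method, endpoint and HTTP status code.
    requestsTotal = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "Total number of HTTP requests",
        },
        []string{"method", "endpoint", "status"},
    )

    // Records request latency in seconds, using the client's default buckets.
    requestDuration = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "http_request_duration_seconds",
            Help:    "Duration of HTTP requests",
            Buckets: prometheus.DefBuckets,
        },
        []string{"method", "endpoint"},
    )
)

func init() {
    // Register with the default registry, which promhttp.Handler() serves.
    prometheus.MustRegister(requestsTotal)
    prometheus.MustRegister(requestDuration)
}

func main() {
    // Expose the registered metrics for Prometheus to scrape.
    http.Handle("/metrics", promhttp.Handler())
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        // "/" matches every unregistered path, so use a fixed endpoint label
        // instead of r.URL.Path to keep label cardinality bounded.
        const endpoint = "/"
        timer := prometheus.NewTimer(requestDuration.WithLabelValues(r.Method, endpoint))
        defer timer.ObserveDuration()

        w.Write([]byte("Hello, Prometheus!"))
        // This handler always succeeds, so the status label is always "200".
        requestsTotal.WithLabelValues(r.Method, endpoint, strconv.Itoa(http.StatusOK)).Inc()
    })

    log.Fatal(http.ListenAndServe(":8080", nil))
}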

2. Querying data with PromQL

text

# Current value of every http_requests_total series
http_requests_total

# Per-second request rate over the last 5 minutes, summed per endpoint
sum by (endpoint) (rate(http_requests_total[5m]))

# 95th-percentile request latency, estimated from histogram buckets
histogram_quantile(0.95, 
    sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
)

# CPU usage percentage per instance (requires node_exporter)
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage percentage (MemAvailable counts reclaimable page cache as available; MemFree does not)
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100

3. Alerting rule configuration

alerts.yml (load it by listing it under rule_files in prometheus.yml)

text

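groups:
  - name: example
    rules:
      # Fires when more than 5% of an instance's requests return HTTP 5xx for 10 minutes
      - alert: HighErrorRate
        expr: |
          sum by (instance) (rate(http_requests_total{status=~"5.."}[5m]))
            / sum by (instance) (rate(http_requests_total[5m])) > 0.05
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "High error rate on {{ $labels.instance }}"
          description: "5xx error ratio is {{ $value | humanizePercentage }}"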
      
      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service {{ $labels.job }} is down"

Ecosystem Integration

1. Integrating with Grafana

text

# Grafana data source provisioning file (e.g. under provisioning/datasources/): registers Prometheus as a data source
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090

2. Kubernetes monitoring

text

# Deploy Prometheus with the Prometheus Operator (its CRDs must already be installed)
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
spec:
  serviceAccountName: prometheus
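  # An empty selector ({}) matches every ServiceMonitor/PodMonitor; by default only those in this object's namespace are considered
  serviceMonitorSelector: {}
  podMonitorSelector: {}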
  resources:
    requests:
      memory: 400Mi