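# A Step-by-Step Tutorial: Building a Complete Prometheus + Grafana Monitoring System from Scratch (Linux/MySQL/Redis/Windows)

## Enterprise Monitoring in Practice: Building a Full-Stack Prometheus + Grafana Monitoring Solution from Scratch

A monitoring system is the eyes of an operations engineer, and the Prometheus + Grafana combination has become a de facto standard in modern monitoring. This open-source stack covers monitoring needs from infrastructure up to application services, and its flexible architecture scales as the business grows. This article works through a hands-on project, from architecture design to rollout, to build a production-grade monitoring system that covers Linux hosts, MySQL, Redis, and Windows servers.

## 1. Monitoring System Architecture and Core Components

Before installing anything, it helps to understand the overall architecture. Prometheus collects data in pull mode: each exporter exposes its metrics over an HTTP endpoint, Prometheus scrapes those endpoints, and Grafana handles visualization. This loose coupling is what makes the system easy to extend.

Core components compared:

| Component | Role | Protocol | Data flow |
| --- | --- | --- | --- |
| Prometheus Server | Collects and stores monitoring data | HTTP | Actively pulls data from exporters |
| Node Exporter | Collects host metrics | HTTP | Exposes system metrics for Prometheus to scrape |
| MySQLd Exporter | Collects MySQL metrics | HTTP | Exposes database performance metrics |
| Redis Exporter | Collects Redis metrics | HTTP | Exposes cache service metrics |
| windows_exporter (formerly WMI Exporter) | Collects Windows host metrics | HTTP | Exposes Windows system metrics |
| Grafana | Visualization | HTTP | Queries Prometheus and displays the data |

The advantages of this architecture:

- Modular design: each component has a single responsibility, so failures stay isolated.
- Low intrusiveness: metrics are collected over standard HTTP, with no changes to the monitored applications.
- Elastic scaling: adding a monitoring target only requires deploying the matching exporter and updating the configuration.

Tip: in production, deploy Prometheus Server and Grafana on separate servers so that resource contention does not affect the stability of the monitoring system.

## 2. Base Environment Deployment and Configuration

### 2.1 Installing Prometheus Server

Using CentOS 7 as the example, deploy the Prometheus server as follows:

```bash
# Create a dedicated user and directories
sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus

# Download and unpack the binary release
wget https://github.com/prometheus/prometheus/releases/download/v2.37.0/prometheus-2.37.0.linux-amd64.tar.gz
tar xvf prometheus-2.37.0.linux-amd64.tar.gz
cd prometheus-2.37.0.linux-amd64/

# Install the core files
sudo cp prometheus promtool /usr/local/bin/
sudo cp -r consoles console_libraries /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus

# Verify the version
prometheus --version
```

Create the systemd unit file `/etc/systemd/system/prometheus.service`:

```ini
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --web.listen-address=0.0.0.0:9090
Restart=always

[Install]
WantedBy=multi-user.target
```

### 2.2 Initial Configuration File

Edit the `/etc/prometheus/prometheus.yml` configuration file:

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - 'alert.rules'

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
    metrics_path: '/metrics'
```

Start the service and verify it:

```bash
sudo systemctl daemon-reload
sudo systemctl start prometheus
sudo systemctl enable prometheus

# Check that the service is serving metrics
curl -v http://localhost:9090/metrics
```

## 3. Integrating Monitoring Targets Across Platforms

### 3.1 Linux Host Monitoring

Node Exporter is the standard way to monitor Linux hosts. Install it as follows:

```bash
wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
tar xvf node_exporter-1.3.1.linux-amd64.tar.gz
sudo mv node_exporter-1.3.1.linux-amd64/node_exporter /usr/local/bin/

# Create the systemd service (the quoted 'EOF' keeps the content literal)
sudo tee /etc/systemd/system/node_exporter.service > /dev/null <<'EOF'
[Unit]
Description=Node Exporter
After=network.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter \
    --collector.systemd \
    --collector.systemd.unit-include="(docker|sshd|nginx).service"
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# Create the service user and start the service
sudo useradd -rs /bin/false node_exporter
sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl enable node_exporter
```

Note: since node_exporter 1.0 the systemd unit filter is `--collector.systemd.unit-include`; the older `--collector.systemd.unit-whitelist` name is deprecated.

Add the targets under `scrape_configs` in the Prometheus configuration:

```yaml
  - job_name: 'node'
    static_configs:
      - targets: ['192.168.1.10:9100', '192.168.1.11:9100']
        labels:
          env: 'production'
          role: 'web-server'
```

### 3.2 MySQL Monitoring

MySQL monitoring needs a dedicated account first:

```sql
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'StrongPassword' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
```

Install MySQL Exporter:

```bash
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.14.0/mysqld_exporter-0.14.0.linux-amd64.tar.gz
tar xvf mysqld_exporter-0.14.0.linux-amd64.tar.gz
sudo mv mysqld_exporter-0.14.0.linux-amd64/mysqld_exporter /usr/local/bin/

# Create the client configuration file
sudo tee /etc/.my.cnf > /dev/null <<'EOF'
[client]
user=exporter
password=StrongPassword
host=localhost
port=3306
EOF

# Create the systemd service
sudo tee /etc/systemd/system/mysql_exporter.service > /dev/null <<'EOF'
[Unit]
Description=MySQL Exporter
After=network.target

[Service]
User=mysql_exporter
Group=mysql_exporter
Type=simple
ExecStart=/usr/local/bin/mysqld_exporter \
    --config.my-cnf=/etc/.my.cnf \
    --collect.global_status \
    --collect.info_schema.innodb_metrics \
    --collect.auto_increment.columns \
    --collect.info_schema.processlist \
    --collect.binlog_size \
    --collect.info_schema.tablestats \
    --collect.global_variables \
    --collect.info_schema.query_response_time \
    --collect.info_schema.userstats \
    --collect.info_schema.tables \
    --collect.perf_schema.tablelocks \
    --collect.perf_schema.file_events \
    --collect.perf_schema.eventswaits \
    --collect.perf_schema.indexiowaits \
    --collect.perf_schema.tableiowaits \
    --collect.slave_status \
    --web.listen-address=0.0.0.0:9104
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# Create the service user and start the service
sudo useradd -rs /bin/false mysql_exporter
sudo systemctl daemon-reload
sudo systemctl start mysql_exporter
sudo systemctl enable mysql_exporter
```

Add to the Prometheus configuration:

```yaml
  - job_name: 'mysql'
    static_configs:
      - targets: ['192.168.1.20:9104']
        labels:
          env: 'production'
          role: 'primary-db'
```

### 3.3 Redis Monitoring

Redis monitoring is comparatively simple. Install Redis Exporter:

```bash
wget https://github.com/oliver006/redis_exporter/releases/download/v1.45.0/redis_exporter-v1.45.0.linux-amd64.tar.gz
tar xvf redis_exporter-v1.45.0.linux-amd64.tar.gz
sudo mv redis_exporter-v1.45.0.linux-amd64/redis_exporter /usr/local/bin/

# Create the systemd service
sudo tee /etc/systemd/system/redis_exporter.service > /dev/null <<'EOF'
[Unit]
Description=Redis Exporter
After=network.target

[Service]
User=redis_exporter
Group=redis_exporter
Type=simple
ExecStart=/usr/local/bin/redis_exporter \
    --redis.addr=redis://localhost:6379 \
    --web.listen-address=0.0.0.0:9121
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# Create the service user and start the service
sudo useradd -rs /bin/false redis_exporter
sudo systemctl daemon-reload
sudo systemctl start redis_exporter
sudo systemctl enable redis_exporter
```

Add to the Prometheus configuration:

```yaml
  - job_name: 'redis'
    static_configs:
      - targets: ['192.168.1.30:9121']
        labels:
          env: 'production'
          role: 'cache'
```

### 3.4 Windows Host Monitoring

Windows monitoring uses windows_exporter (formerly WMI Exporter). Download the MSI installer:

https://github.com/prometheus-community/windows_exporter/releases/download/v0.21.0/windows_exporter-0.21.0-amd64.msi

The collectors to enable can be chosen at install time:

```bat
msiexec /i windows_exporter-0.21.0-amd64.msi ENABLED_COLLECTORS=cpu,cs,logical_disk,net,os,service,system,textfile LISTEN_PORT=9182
```

Add to the Prometheus configuration:

```yaml
  - job_name: 'windows'
    static_configs:
      - targets: ['192.168.1.40:9182']
        labels:
          env: 'production'
          role: 'file-server'
```

## 4. Grafana Visualization

### 4.1 Installing and Configuring Grafana

```bash
sudo yum install -y https://dl.grafana.com/oss/release/grafana-9.1.5-1.x86_64.rpm
sudo systemctl daemon-reload
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
```

Open `http://<server-IP>:3000`; the default credentials are admin/admin.

Add the Prometheus data source:

1. In the left menu, choose Configuration > Data Sources.
2. Click Add data source and select Prometheus.
3. Set the URL to `http://localhost:9090`.
4. Click Save & Test to verify the connection.

### 4.2 Importing and Customizing Dashboards

Recommended dashboard IDs:

| Monitored object | Dashboard ID | Highlights |
| --- | --- | --- |
| Linux hosts | 8919 | Comprehensive system metrics |
| MySQL | 7362 | Includes query performance analysis |
| Redis | 11835 | Detailed cache metrics |
| Windows | 10467 | System service monitoring |

To import a dashboard:

1. In the left menu, choose Create > Import.
2. Enter the dashboard ID or upload a JSON file.
3. Select the matching Prometheus data source.
4. Click Import.

Tips for custom dashboards:

- Use variables to switch environments: create an `$env` variable and filter in PromQL with `{env=~"$env"}`.
- Set threshold alerts: configure Alert rules in the panel editor.
- Add annotations: use a Text panel to document what each metric means.

## 5. Production Optimization Practices

### 5.1 Performance Tuning

Prometheus server tuning. Retention, WAL compression, and query limits are command-line flags rather than `prometheus.yml` keys (Prometheus 2.37 rejects them in the configuration file), so append them to `ExecStart` in `/etc/systemd/system/prometheus.service`:

```ini
    --storage.tsdb.retention.time=15d \
    --storage.tsdb.wal-compression \
    --storage.tsdb.max-block-chunk-segment-size=512MB \
    --query.lookback-delta=5m \
    --query.timeout=2m
```

Resource allocation reference:

| Number of targets | CPU cores | Memory | Disk space |
| --- | --- | --- | --- |
| <100 nodes | 2 | 4GB | 50GB |
| 100-500 nodes | 4 | 8GB | 200GB |
| >500 nodes | 8 | 16GB | 1TB |

### 5.2 High Availability

For business-critical monitoring, deploy Prometheus in a highly available setup:

- Active-active Prometheus: run two independent Prometheus instances that scrape the same targets.
- Remote storage: configure VictoriaMetrics or Thanos for long-term storage.
- Service discovery: use Consul or Kubernetes service discovery to manage targets dynamically.

Example remote storage configuration:

```yaml
# /etc/prometheus/prometheus.yml
remote_write:
  - url: "http://victoriametrics:8428/api/v1/write"
    queue_config:
      capacity: 10000
      max_shards: 200
      min_shards: 1
      max_samples_per_send: 1000
```

### 5.3 Troubleshooting Common Problems

Metric scraping fails:

- Check the exporter service status: `systemctl status <exporter>`
- Verify port connectivity: `telnet IP PORT`
- Check the Prometheus logs: `journalctl -u prometheus -f`

Grafana shows no data:

- Confirm the data source is configured correctly.
- Check the selected time range.
- Validate the PromQL query syntax.

Resource usage is too high:

- Adjust the scrape frequency: increase `scrape_interval` where appropriate.
- Drop unnecessary metrics: pass `--no-collector.<name>` in the exporter's startup flags.
- Tune the TSDB: adjust the `--storage.tsdb.*` flags.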
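
When deciding which collectors to disable in order to reduce unnecessary metrics, it can help to see which metric groups an exporter exposes the most series for. The following is a minimal sketch rather than a definitive tool. It assumes `bash`, `awk`, and `sort` are available, and that the input is the Prometheus text exposition format returned by an exporter's `/metrics` endpoint. The function name `count_series_by_prefix` is hypothetical, and grouping by the first two underscore-separated name parts is only a rough heuristic.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Prometheus or any exporter):
# reads text exposition format on stdin and prints "<count> <prefix>"
# lines, largest count first. The prefix is the metric name up to its
# second underscore-separated part, e.g. node_cpu for node_cpu_seconds_total.
count_series_by_prefix() {
  awk '
    /^#/ || NF == 0 { next }          # skip HELP/TYPE comments and blank lines
    {
      name = $1
      sub(/\{.*/, "", name)           # drop the label set, keep the metric name
      n = split(name, parts, "_")
      prefix = (n >= 2) ? (parts[1] "_" parts[2]) : name
      counts[prefix]++
    }
    END { for (p in counts) print counts[p], p }
  ' | sort -k1,1nr -k2,2
}
```

A typical invocation, using one of the example node targets from this article, would be `curl -s http://192.168.1.10:9100/metrics | count_series_by_prefix | head`. Prefixes with large counts are candidates for a closer look before turning off the corresponding collector.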