CANN - Ascend NPU - Inference Service Monitoring - How to Monitor NPU Status in Real Time
Published: 2026/5/23 23:18:09
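Once an inference service is in production, you need real-time visibility into NPU utilization, device memory, temperature, and power draw. CANN offers two monitoring options: the `npu-smi` command-line tool and a Python API. This post explains how to use both.

npu-smi command-line monitoring

`npu-smi` is the diagnostic tool bundled with the Ascend/CANN software stack:

```bash
# Show the status of all NPUs
npu-smi info

# Show detailed status for one NPU
npu-smi info -i 0

# Live monitoring, refreshed every second
watch -n 1 npu-smi info

# Show NPU temperature
npu-smi info -t temp

# Show NPU power draw
npu-smi info -t power
```

In the npu-smi reference, `-t` queries such as `temp` and `power` are usually combined with a card ID (`-i 0`). Some types also need a chip ID (`-c 0`). Check the documentation for your driver version.

Example output:

```text
NPU ID: 0
Chip ID: 0
Product Name: Atlas 800I A2
AI Core Utilization: 72%      ← compute utilization
Memory Utilization: 85%       ← memory utilization
Temperature: 65°C             ← temperature
Power: 300W / 400W            ← current power / TDP
Memory Used: 48GB / 64GB      ← memory in use
```

Python API monitoring

Inside the inference service, collect monitoring data with the Python API:

```python
import threading
import time

import torch_npu


class NPUMonitor:
    def __init__(self, device_id=0):
        self.device_id = device_id

    def collect(self):
        # memory_allocated / memory_reserved are standard torch_npu.npu calls.
        # utilization / temperature / power: confirm these exist in your
        # torch_npu version; otherwise read those values via npu-smi.
        return {
            "utilization": torch_npu.npu.utilization(self.device_id),
            "memory_allocated": torch_npu.npu.memory_allocated(self.device_id),
            "memory_reserved": torch_npu.npu.memory_reserved(self.device_id),
            "temperature": torch_npu.npu.temperature(self.device_id),
            "power": torch_npu.npu.power(self.device_id),
        }


# Collect periodically in a background thread
monitor = NPUMonitor(device_id=0)


def monitor_loop():
    while True:
        stats = monitor.collect()
        print(f"[{time.strftime('%H:%M:%S')}] {stats}")
        time.sleep(5)


threading.Thread(target=monitor_loop, daemon=True).start()
```

Prometheus integration

For production services, use Prometheus + Grafana:

```python
import threading
import time

from prometheus_client import Gauge, start_http_server

# Define the metrics (one time series per "device" label value)
npu_util = Gauge("npu_utilization_percent", "NPU Utilization", ["device"])
npu_mem = Gauge("npu_memory_used_bytes", "NPU Memory Used", ["device"])
npu_temp = Gauge("npu_temperature_celsius", "NPU Temperature", ["device"])
npu_power = Gauge("npu_power_watts", "NPU Power", ["device"])


# Update the metrics periodically (NPUMonitor is the class defined above)
def update_metrics():
    monitor = NPUMonitor(device_id=0)
    while True:
        stats = monitor.collect()
        npu_util.labels(device="npu:0").set(stats["utilization"])
        npu_mem.labels(device="npu:0").set(stats["memory_allocated"])
        npu_temp.labels(device="npu:0").set(stats["temperature"])
        npu_power.labels(device="npu:0").set(stats["power"])
        time.sleep(5)


threading.Thread(target=update_metrics, daemon=True).start()

# Start the Prometheus HTTP endpoint (metrics served on port 8000)
start_http_server(8000)
```

`start_http_server` serves metrics from a background daemon thread. The main process (your inference server) must keep running, or the exporter exits with it.

Grafana panel configuration

The layout below is a simplified outline of the four panels. It is not Grafana's actual dashboard JSON schema:

```yaml
panels:
  - title: NPU Utilization
    expr: npu_utilization_percent{device="npu:0"}
    type: graph
  - title: NPU Memory
    expr: npu_memory_used_bytes{device="npu:0"} / 1024 / 1024 / 1024  # bytes to GB
    type: gauge
  - title: NPU Temperature
    expr: npu_temperature_celsius{device="npu:0"}
    type: gauge
  - title: NPU Power
    expr: npu_power_watts{device="npu:0"}
    type: gauge
```

Alerting rules

Key alerting rules:

```yaml
# Prometheus alerting rules
groups:
  - name: npu_alerts
    rules:
      - alert: NPUUtilizationLow
        expr: npu_utilization_percent{device="npu:0"} < 20
        for: 5m
        annotations:
          summary: "NPU utilization below 20%; throughput may suffer"
      - alert: NPUMemoryHigh
        expr: npu_memory_used_bytes / npu_memory_total_bytes > 0.95
        for: 1m
        annotations:
          summary: "NPU memory usage above 95%; OOM is likely"
      - alert: NPUTemperatureHigh
        expr: npu_temperature_celsius > 80
        for: 2m
        annotations:
          summary: "NPU temperature above 80°C; check cooling"
      - alert: NPUPowerHigh
        expr: npu_power_watts / npu_power_max_watts > 0.95
        for: 1m
        annotations:
          summary: "NPU power close to TDP; the chip may throttle"
```

Two of these rules divide by `npu_memory_total_bytes` and `npu_power_max_watts`, but the exporter above does not publish either metric. Export them too, with the same `device` label, for example as gauges set to the card's memory capacity and TDP. Otherwise the division returns no data, and `NPUMemoryHigh` and `NPUPowerHigh` never fire.

Also note that `npu_memory_used_bytes` is fed from `memory_allocated`. That value counts only memory held by tensors in this process, so it can sit well below the usage `npu-smi` reports.

Log integration

Write NPU monitoring data into the inference log:

```python
import logging
import threading
import time

logger = logging.getLogger("inference")


class LoggingMonitor:
    def __init__(self, interval=60):
        self.interval = interval  # seconds between log lines
        self.monitor = NPUMonitor(device_id=0)

    def start(self):
        def loop():
            while True:
                stats = self.monitor.collect()
                logger.info(
                    f"NPU Stats: util={stats['utilization']:.1f}% "
                    f"mem={stats['memory_allocated'] / 1024**3:.1f}GB "
                    f"temp={stats['temperature']}°C "
                    f"power={stats['power']:.0f}W"
                )
                time.sleep(self.interval)

        threading.Thread(target=loop, daemon=True).start()
```

Troubleshooting

Problem 1: Low NPU utilization (< 30%)

Possible causes:
- Batch size too small. For example, M = 1 in the decode stage with a single request, which turns each matmul into a matrix-vector product.
- Operators are not fused.
- GE graph compilation is not taking effect.
- Data preprocessing runs on the CPU, so the NPU sits idle waiting for data.

To investigate, check AI Core Utilization in `npu-smi info`, and use the Profiler to inspect kernel times.

Problem 2: Memory leak

```python
import torch_npu

# Print memory usage periodically
print(f"Allocated: {torch_npu.npu.memory_allocated() / 1024**3:.1f}GB")
print(f"Reserved: {torch_npu.npu.memory_reserved() / 1024**3:.1f}GB")
```

If Allocated keeps growing while Reserved stays flat, tensors are not being freed. Something, such as a list, a cache or a global, is probably still holding references to them.

Problem 3: Temperature alerts

Check:
- Whether the chassis airflow is unobstructed.
- NPU fan speed: `npu-smi info -t fan`.
- Whether the machine-room air conditioning is working.

Monitoring is required work before an inference service goes live. Use npu-smi for quick diagnosis, the Python API for fine-grained collection inside the service, and Prometheus + Grafana for long-term monitoring and alerting. You need all three layers.

Repository: https://atomgit.com/cann/torch_npu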
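The `npu-smi` example output in the command-line section is easier to use in scripts once it is parsed into numbers. Below is a minimal sketch that assumes the simplified `Key: Value` layout of that sample. The real `npu-smi info` output is typically a box-drawn table and will not match these patterns. `parse_npu_smi_sample` is a hypothetical helper, a starting point rather than a drop-in parser.

```python
import re

# Patterns for the simplified "Key: Value" sample (an assumption: real
# `npu-smi info` output is usually a table and will not match these).
_SINGLE = re.compile(r"^(\d+(?:\.\d+)?)\s*(%|°C|W|GB)?$")
_PAIR = re.compile(r"^(\d+(?:\.\d+)?)\s*(?:W|GB)?\s*/\s*(\d+(?:\.\d+)?)\s*(?:W|GB)?$")


def parse_npu_smi_sample(text: str) -> dict:
    """Hypothetical helper: parse 'Key: Value' lines into numbers where possible."""
    result = {}
    for raw in text.splitlines():
        line = raw.split("←")[0].strip()  # drop the "← ..." annotations
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if m := _PAIR.match(value):
            result[key] = (float(m.group(1)), float(m.group(2)))  # (current, max)
        elif m := _SINGLE.match(value):
            result[key] = float(m.group(1))  # unit suffix dropped
        else:
            result[key] = value  # non-numeric, e.g. the product name
    return result


# A trimmed copy of the sample output from this article
SAMPLE = """\
NPU ID: 0
Chip ID: 0
Product Name: Atlas 800I A2
AI Core Utilization: 72%      ← compute utilization
Temperature: 65°C             ← temperature
Power: 300W / 400W            ← current power / TDP
Memory Used: 48GB / 64GB      ← memory in use
"""

stats = parse_npu_smi_sample(SAMPLE)
used, total = stats["Memory Used"]
print(stats["AI Core Utilization"], stats["Power"], f"{used / total:.0%}")
# 72.0 (300.0, 400.0) 75%
```

Pairs such as `Power` and `Memory Used` come back as `(current, max)` tuples, so ratios like power-to-TDP are one division away. That is the same ratio the `NPUPowerHigh` rule checks.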
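The `for:` clause in the alerting rules keeps a short spike from triggering an alert. The expression has to stay true for the whole duration before the alert fires. The sketch below is a simplified model of that pending-then-firing behavior, not Prometheus code. `ThresholdAlert` is a hypothetical name, and the model ignores evaluation intervals and label handling.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ThresholdAlert:
    """Simplified model of a Prometheus alert rule with a `for:` duration."""

    threshold: float
    for_seconds: float
    above: bool = True  # True: breach when value > threshold; False: value < threshold
    _pending_since: Optional[float] = field(default=None, init=False, repr=False)

    def evaluate(self, value: float, now: float) -> str:
        breached = value > self.threshold if self.above else value < self.threshold
        if not breached:
            self._pending_since = None  # one healthy evaluation resets the clock
            return "inactive"
        if self._pending_since is None:
            self._pending_since = now  # first breach: the alert becomes pending
        if now - self._pending_since >= self.for_seconds:
            return "firing"
        return "pending"


# Mirrors NPUTemperatureHigh: temperature > 80 for 2m (120 s)
temp_alert = ThresholdAlert(threshold=80, for_seconds=120)
samples = [(0, 79), (30, 82), (60, 85), (90, 84), (150, 83), (180, 78)]  # (seconds, °C)
states = [temp_alert.evaluate(temp, t) for t, temp in samples]
print(states)
# ['inactive', 'pending', 'pending', 'pending', 'firing', 'inactive']
```

The low-utilization rule maps to `ThresholdAlert(threshold=20, for_seconds=300, above=False)`. A single healthy sample resets the timer, so a temperature oscillating around 80°C can stay pending indefinitely. Shortening the `for:` window or smoothing the signal (for example with `avg_over_time`) trades noise against detection delay.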
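The memory-leak check in Problem 2 depends on noticing that Allocated keeps growing. One way to automate it is to fit a least-squares trend line over the last few `memory_allocated` samples and flag a sustained positive slope. This is a sketch, not a torch_npu feature. `AllocationTrend`, the window size and the 64 MiB-per-sample threshold are illustrative assumptions to tune for your workload. The example uses synthetic numbers instead of real `torch_npu.npu.memory_allocated()` readings.

```python
from collections import deque

GiB = 1024**3


class AllocationTrend:
    """Hypothetical leak detector: least-squares slope over recent samples."""

    def __init__(self, window: int = 12, min_bytes_per_sample: float = 64 * 1024**2):
        if window < 2:
            raise ValueError("window must hold at least 2 samples to fit a slope")
        self.samples = deque(maxlen=window)
        self.min_slope = min_bytes_per_sample

    def add(self, allocated_bytes: float) -> bool:
        """Record one sample; return True when growth looks like a leak."""
        self.samples.append(allocated_bytes)
        n = len(self.samples)
        if n < self.samples.maxlen:
            return False  # not enough history yet
        mean_x = (n - 1) / 2
        mean_y = sum(self.samples) / n
        cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(self.samples))
        var = sum((i - mean_x) ** 2 for i in range(n))
        slope = cov / var  # bytes gained per sampling interval
        return slope > self.min_slope


# Synthetic readings; in a service you would pass torch_npu.npu.memory_allocated()
trend = AllocationTrend(window=4, min_bytes_per_sample=0.1 * GiB)
steady = [trend.add(10 * GiB) for _ in range(4)]
growing = [trend.add((10 + 0.5 * k) * GiB) for k in range(1, 5)]
print(steady, growing)
# [False, False, False, False] [True, True, True, True]
```

Take samples at a consistent point in the request cycle, for example after each batch completes, so normal per-request fluctuation does not show up as trend. Growth that appears only while traffic is rising reflects load rather than a leak, so compare against request volume before concluding that references are being retained.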