# CANN/PTO-ISA: Advanced Debugging Tools

pto-isa: Parallel Tile Operation (PTO) is a virtual instruction set architecture designed by Ascend CANN, focusing on tile-level operations. This repository offers high-performance, cross-platform tile operations across Ascend platforms.

Project address: https://gitcode.com/cann/pto-isa

Author's note: parts of this article were generated with AI assistance (AIGC) and are for reference only.

## mssanitizer memory checking

mssanitizer is the memory-safety checking tool for the Ascend platform. It detects out-of-bounds accesses, misaligned accesses, and similar problems in communication operators.

### Build integration

Add the sanitizer compile options in CMakeLists.txt:

```cmake
option(ENABLE_SANITIZER "Enable mssanitizer for memory checking" OFF)
if(ENABLE_SANITIZER)
    target_compile_options(comm_kernel PRIVATE -fsanitize=memory)
    target_link_options(comm_kernel PRIVATE -fsanitize=memory)
    target_compile_options(compute_kernel PRIVATE -fsanitize=memory)
    target_link_options(compute_kernel PRIVATE -fsanitize=memory)
endif()
```

### Usage

```bash
# Build the sanitizer-enabled version
cmake .. -DENABLE_SANITIZER=ON
make -j

# Run; mssanitizer checks automatically
mssanitizer --tool=memcheck mpirun -np 8 ./my_operator
```

### Detectable issues

| Issue type | Description | Typical scenario in communication operators |
| --- | --- | --- |
| GM out-of-bounds read | Reads a GM address outside the allocated range | Boundary miscalculated when splitting data into tiles |
| GM out-of-bounds write | Writes a GM address outside the allocated range | Remote address offset calculation overflows |
| UB out-of-bounds | Access exceeds the UB capacity | Tile size set too large |
| Misaligned access | Alignment requirement not met | Signal address not 4-byte aligned |

### Reading the output

```text
[mssanitizer] ERROR: out-of-bounds access at GM address 0x12345678
  in kernel CommKernelEntry at comm_kernel.cpp:142
  allocated at main.cpp:89 with size 65536
  accessed offset: 65540 (4 bytes beyond allocation)
```

Key things to look for:

- out-of-bounds: check the tile boundaries and the remote address calculation.
- use-after-free: check buffer lifetimes.
- uninitialized: check whether the signal matrix was zeroed.

## Debugging with environment variables

```bash
# HCCL debugging
export HCCL_LOG_LEVEL=DEBUG   # HCCL log level
export HCCL_BUFFSIZE=1024     # Communication buffer size (MB)

# ACL error checking
export ACL_ERROR_ABORT=1      # Abort immediately on an ACL error
```

## Shrinking the problem size

```cpp
#define DEBUG_MODE
#ifdef DEBUG_MODE
static constexpr uint32_t G_ORIG_M = 128;
static constexpr uint32_t G_ORIG_N = 256;
static constexpr int COMPUTE_BLOCK_NUM = 2;
static constexpr int COMM_BLOCK_NUM = 2;
#endif
```

## Host-side performance timing

```cpp
aclrtEvent startEvent, endEvent;
aclrtCreateEvent(&startEvent);
aclrtCreateEvent(&endEvent);

aclrtRecordEvent(startEvent, stream);
launchKernel(..., stream);
aclrtRecordEvent(endEvent, stream);
aclrtSynchronizeStream(stream);

float elapsed_ms;
aclrtEventElapsedTime(&elapsed_ms, startEvent, endEvent);
printf("Kernel time: %.3f ms\n", elapsed_ms);
```

## Warmup + repeated measurement

```cpp
// Warmup: exclude first-run overhead
for (int i = 0; i < WARMUP_ITERS; i++) {
    ClearSignals();
    LaunchKernel(...);
    aclrtSynchronizeStream(stream);
}

// Actual measurement
float total_ms = 0;
for (int i = 0; i < MEASURE_ITERS; i++) {
    ClearSignals();
    aclrtRecordEvent(start, stream);
    LaunchKernel(...);
    aclrtRecordEvent(end, stream);
    aclrtSynchronizeStream(stream);
    float ms;
    aclrtEventElapsedTime(&ms, start, end);
    total_ms += ms;
}
printf("Average: %.3f ms\n", total_ms / MEASURE_ITERS);
```

## msprof hardware profiling

For pipeline-level analysis on the device side:

```bash
# Collect the kernel execution timeline
msprof --output=./prof_data --application="mpirun -np 8 ./my_operator"

# Export the analysis results
msprof --export=timeline --output=./prof_data
```

The exported timeline shows MTE2/MTE3/Cube/Vec pipeline utilization, which helps locate gaps in the overlap between communication and computation.
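The warmup and measurement loop above reports only the mean, which a single slow iteration can skew. As a rough sketch, the per-iteration `ms` values could be collected and reduced to mean, minimum, and median on the host. `TimingSummary` and `SummarizeTimings` are hypothetical names introduced here for illustration; they are not part of ACL or of the pto-isa repository.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Summary of per-iteration kernel times, in milliseconds.
struct TimingSummary {
    float mean_ms;
    float min_ms;
    float median_ms;
};

// Hypothetical helper (not an ACL API): reduces the `ms` values gathered in
// the measurement loop to mean, minimum, and median.
// The vector is taken by value so the caller's sample order is untouched.
inline TimingSummary SummarizeTimings(std::vector<float> samples_ms) {
    assert(!samples_ms.empty() && "need at least one measured iteration");
    std::sort(samples_ms.begin(), samples_ms.end());

    float total = 0.0f;
    for (float ms : samples_ms) {
        total += ms;
    }

    const std::size_t n = samples_ms.size();
    // Odd count: middle element. Even count: average of the two middle ones.
    const float median = (n % 2 == 1)
        ? samples_ms[n / 2]
        : 0.5f * (samples_ms[n / 2 - 1] + samples_ms[n / 2]);

    return {total / static_cast<float>(n), samples_ms.front(), median};
}
```

To use it, push each `ms` into a `std::vector<float>` inside the measurement loop instead of accumulating `total_ms`, then call `SummarizeTimings` once after the loop. A median well below the mean usually points to a few outlier iterations rather than a uniformly slow kernel.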
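Several of the sanitizer findings described earlier (tile boundary errors, remote offset overflow, signal addresses that are not 4-byte aligned) can be caught on the host before launch. The following is a minimal sketch under that assumption; `ValidTileExtent`, `InBounds`, and `IsAligned4` are hypothetical helpers for illustration, not PTO or mssanitizer APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of valid elements in tile `tile_idx` when a
// dimension of `total` elements is split into tiles of `tile_size`.
// The last tile may be partial; tiles starting past the end yield 0.
inline uint32_t ValidTileExtent(uint32_t total, uint32_t tile_size,
                                uint32_t tile_idx) {
    assert(tile_size > 0);
    // Widen to 64 bits so tile_idx * tile_size cannot wrap around.
    const uint64_t start = static_cast<uint64_t>(tile_idx) * tile_size;
    if (start >= total) {
        return 0;
    }
    const uint64_t remaining = total - start;
    return static_cast<uint32_t>(remaining < tile_size ? remaining : tile_size);
}

// Hypothetical helper: true if the byte range
// [offset_bytes, offset_bytes + access_bytes) lies inside an allocation of
// alloc_bytes. Written with a subtraction to avoid overflow in the sum.
inline bool InBounds(uint64_t offset_bytes, uint64_t access_bytes,
                     uint64_t alloc_bytes) {
    return offset_bytes <= alloc_bytes &&
           access_bytes <= alloc_bytes - offset_bytes;
}

// Hypothetical helper: true if the address is 4-byte aligned, the
// requirement the table above lists for signal addresses.
inline bool IsAligned4(uint64_t addr) {
    return (addr & 0x3u) == 0;
}
```

With the values from the sample log, `InBounds(65540, 4, 65536)` is false, matching the reported out-of-bounds access. Checks like these only cover host-computed offsets; addresses computed inside the kernel still need the sanitizer run.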