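# TileOneBlkColumnBroadcastMul

catlass is CANN's operator template library. It provides high-performance template samples for matrix multiplication and related fused operators on NPUs. Project address: https://gitcode.com/cann/catlass

## Function description

`TileOneBlkColumnBroadcastMul` implements column-broadcast multiplication in the epilogue stage. A column vector of shape (m, 1) is broadcast within a block to (m, n) and then multiplied with the input. The broadcast granularity is one block (`BYTE_PER_BLK` bytes): each element of the column is broadcast to one complete block.

## Applicable scope

- Architecture: all architectures (no architecture-specific specialization)
- Style: non-TLA

## Template prototype

    template <
        class ArchTag_,      // architecture tag
        class ComputeType_,  // compute data type
        class TileShape_     // tile shape
    >
    struct TileOneBlkColumnBroadcastMul;

## Template parameters

| Parameter | Description |
| --- | --- |
| `ArchTag_` | Architecture tag |
| `ComputeType_` | `Gemm::GemmType<ElementCompute, RowMajor>` |
| `TileShape_` | Tile shape, `Shape<ROW, COLUMN>` |

## Call interface

    void operator()(
        AscendC::LocalTensor<ElementCompute> const &ubOut,
        AscendC::LocalTensor<ElementCompute> const &ubIn0,
        AscendC::LocalTensor<ElementCompute> const &ubIn1  // shape (m, eleNumPerBlk)
    );

The column broadcast is implemented with `AscendC::Mul`, using `BinaryRepeatParams` with `src1RepStride = 0` and `src1BlkStride = 1`.

## Call example

    #include "catlass/epilogue/tile/tile_broadcast_mul.hpp"

    using namespace Catlass::Epilogue::Tile;

    // Compute type and tile shape for a 128 x 256 half-precision tile
    using ComputeType = Gemm::GemmType<half, layout::RowMajor>;
    using TileShape = Shape<128, 256>;
    using ColumnBroadcastMul = TileOneBlkColumnBroadcastMul<Arch::AtlasA2, ComputeType, TileShape>;

    // ubIn1 holds the column operand in (m, eleNumPerBlk) layout
    AscendC::LocalTensor<half> ubOut, ubIn0, ubIn1;
    ColumnBroadcastMul op;
    op(ubOut, ubIn0, ubIn1);

Author's statement: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.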
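To make the one-block broadcast concrete, the following is a minimal host-side sketch in plain C++. It is not catlass or AscendC code. The helper names `ExpandColumnToOneBlk` and `OneBlkColumnBroadcastMul`, the 32-byte block size, the figure of 8 blocks per repeat, and the traversal order of the output and first input are all assumptions made for illustration. Only the two `src1` stride values come from the text above. `float` stands in for `half` because standard C++ has no portable half type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference model only: these names are hypothetical, not catlass/AscendC APIs.
constexpr std::size_t kBytePerBlk = 32;    // assumed value of BYTE_PER_BLK
constexpr std::size_t kBlksPerRepeat = 8;  // assumed blocks handled per repeat

// Expand an (m, 1) column into the (m, eleNumPerBlk) layout described for
// ubIn1: each column element fills one whole block.
template <typename T>
std::vector<T> ExpandColumnToOneBlk(const std::vector<T>& column) {
    constexpr std::size_t eleNumPerBlk = kBytePerBlk / sizeof(T);
    std::vector<T> out(column.size() * eleNumPerBlk);
    for (std::size_t r = 0; r < column.size(); ++r) {
        for (std::size_t e = 0; e < eleNumPerBlk; ++e) {
            out[r * eleNumPerBlk + e] = column[r];
        }
    }
    return out;
}

// Computes out[r][c] = in0[r][c] * in1[r][c % eleNumPerBlk] for row-major
// tiles, walking the data as a strided repeat might (assumed traversal).
template <typename T>
std::vector<T> OneBlkColumnBroadcastMul(const std::vector<T>& in0,
                                        const std::vector<T>& in1,
                                        std::size_t rows, std::size_t cols) {
    constexpr std::size_t eleNumPerBlk = kBytePerBlk / sizeof(T);
    assert(cols % eleNumPerBlk == 0);
    assert(in0.size() == rows * cols);
    assert(in1.size() == rows * eleNumPerBlk);
    const std::size_t blkNumPerRow = cols / eleNumPerBlk;

    std::vector<T> out(rows * cols);
    for (std::size_t rowBase = 0; rowBase < rows; rowBase += kBlksPerRepeat) {
        // One repeat per block column of the tile.
        for (std::size_t rep = 0; rep < blkNumPerRow; ++rep) {
            for (std::size_t blk = 0;
                 blk < kBlksPerRepeat && rowBase + blk < rows; ++blk) {
                const std::size_t row = rowBase + blk;
                // src1BlkStride = 1: consecutive blocks read consecutive rows of in1.
                // src1RepStride = 0: the offset does not depend on `rep`, so the
                // same blocks are reused for every block column.
                const std::size_t src1 = row * eleNumPerBlk;
                const std::size_t data = row * cols + rep * eleNumPerBlk;
                for (std::size_t e = 0; e < eleNumPerBlk; ++e) {
                    out[data + e] = in0[data + e] * in1[src1 + e];
                }
            }
        }
    }
    return out;
}
```

Under these assumptions, calling `OneBlkColumnBroadcastMul(in0, ExpandColumnToOneBlk(column), m, n)` scales every element of row `r` of `in0` by `column[r]`. Because the second operand is already stored one block per row, no per-element broadcast is needed inside the multiply loop.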
CANN/catlass column-broadcast multiplication API
Published: 2026/5/30 21:53:54