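# CATLASS GM-to-UB TLA Copy Component: CopyGm2UbTla

catlass is the CANN operator template library. It provides template samples of high-performance matrix multiplication and related fused operators on the NPU. Project address: https://gitcode.com/cann/catlass

## Functional Description

`CopyGm2UbTla` performs TLA-style data movement from GM (global memory) to UB (Unified Buffer) in the epilogue stage. Operands are wrapped in `tla::Tensor`, and SFINAE selects the copy strategy automatically from the source and destination layouts.

| Item | Description |
| --- | --- |
| Scope | AtlasA2: RowMajor; Ascend950: VectorLayout / RowMajor |
| Style | TLA (`tla::Tensor` interface) |
| Difference from `CopyGm2Ub` | TLA style: template parameters are deduced with `decltype` |

## Template Prototype

```cpp
template <class ArchTag, class TensorSrc, class TensorDst, class Enable = void>
struct CopyGm2UbTla;
```

## Partial Specializations

| Architecture | SFINAE condition | Copy method |
| --- | --- | --- |
| AtlasA2 | `isRowMajorSrc && isRowMajorDst` | `DataCopyPad` with `DataCopyPadExtParams` |
| Ascend950 | `isVectorSrc && isVectorDst` | `DataCopyPad` (single-block copy) |
| Ascend950 | `isRowMajorSrc && isRowMajorDst` | `DataCopyPad` with `DataCopyPadExtParams` |

## Call Interface

```cpp
template <class TensorDst, class TensorSrc>
void operator()(TensorDst const &dstTensor, TensorSrc const &srcTensor);
```

### Parameters

| Parameter | Description |
| --- | --- |
| `dstTensor` | Destination TLA Tensor (UB, VECCALC) |
| `srcTensor` | Source TLA Tensor (GM) |

## Call Example

```cpp
#include "catlass/epilogue/tile/copy_gm_to_ub_tla.hpp"

using namespace Catlass;                  // brings Arch:: and layout:: into scope
using namespace Catlass::Epilogue::Tile;  // brings CopyGm2UbTla into scope

// Describe a 128 x 256 row-major half tile on both sides
auto srcLayout = tla::MakeLayout<half, layout::RowMajor>(128, 256);
auto dstLayout = tla::MakeLayout<half, layout::RowMajor>(128, 256);

// srcData must be bound to a GM buffer and dstData to a UB (VECCALC) buffer before the copy
AscendC::GlobalTensor<half> srcData;
AscendC::LocalTensor<half> dstData;

// Wrap the raw tensors as TLA tensors tagged with their memory position
auto srcTensor = tla::MakeTensor(srcData, srcLayout, Arch::PositionGM{});
auto dstTensor = tla::MakeTensor(dstData, dstLayout, Arch::PositionUB{});

// decltype supplies the tensor types; SFINAE selects the AtlasA2 RowMajor specialization
CopyGm2UbTla<Arch::AtlasA2, decltype(srcTensor), decltype(dstTensor)> copyOp;
copyOp(dstTensor, srcTensor);
```

Creation notice: parts of this article were generated with AI assistance (AIGC) and are for reference only.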