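# CATLASS: CopyGm2Ub / TileCopyTla (GM → UB)

[Free download link] catlass: this project is the operator template library of CANN, providing templates and samples for high-performance matrix multiplication and related fused operators on NPU. Project address: https://gitcode.com/cann/catlass

## Overview

The GM→UB copy module moves data from GM (Global Memory) to UB (Unified Buffer). Use `CopyGm2Ub` for one-dimensional vector data described by `VectorLayout`, and `TileCopyTla` for two-dimensional matrix data described by `RowMajor`.

Limitation: only the AtlasA2 architecture (`CATLASS_ARCH` 2201) is supported.

## API list

| API | Style | Supported hardware | Layout | Description |
| --- | --- | --- | --- | --- |
| `CopyGm2Ub` | Non-TLA | AtlasA2 | `VectorLayout` | GM 1-D vector → UB |
| `TileCopyTla` | TLA | AtlasA2 | `RowMajor` | GM RowMajor → UB RowMajor |

## Usage examples

### Non-TLA

```cpp
#include "catlass/gemm/tile/copy_gm_to_ub.hpp"

// Copy operator for a 1-D half-precision vector on AtlasA2.
using CopyOp = CopyGm2Ub<Arch::AtlasA2, Gemm::GemmType<half, layout::VectorLayout>>;

// Source and destination both describe `len` elements.
auto layoutSrc = layout::VectorLayout(len);
auto layoutDst = layout::VectorLayout(len);

// dstUB and srcGm are the UB and GM tensors prepared by the caller.
CopyOp copyOp;
copyOp(dstUB, srcGm, layoutDst, layoutSrc);
```

### TLA

```cpp
#include "catlass/gemm/tile/copy_gm_to_ub.hpp"
#include "tla/tensor.hpp"

// Both sides use an M x K RowMajor layout of half elements.
auto srcLayout = tla::MakeLayout<half, layout::RowMajor>(M, K);
auto dstLayout = tla::MakeLayout<half, layout::RowMajor>(M, K);

// Tag each tensor with the memory position it lives in (GM or UB).
auto srcTensor = tla::MakeTensor(srcGm, srcLayout, Arch::PositionGM{});
auto dstTensor = tla::MakeTensor(dstUB, dstLayout, Arch::PositionUB{});

// The copy operator is specialized on the two tensor types.
TileCopyTla<Arch::AtlasA2, decltype(srcTensor), decltype(dstTensor)> copyOp;
copyOp(dstTensor, srcTensor);
```

## Template selection guide

| Scenario | Recommended style |
| --- | --- |
| 1-D vector copy (Bias/Scale) | `CopyGm2Ub` (non-TLA) |
| 2-D matrix copy | `TileCopyTla` (TLA) |

Authoring statement: parts of this article were generated with AI assistance (AIGC) and are for reference only.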
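To make the difference between the two copy shapes concrete, here is a minimal host-side sketch in plain C++. It is not CATLASS code and does not run on the NPU: `VectorLayoutModel`, `RowMajorModel`, `CopyVectorModel`, and `CopyRowMajorModel` are hypothetical stand-ins invented for illustration, and the per-row `stride` field is an assumption about how a RowMajor tile is commonly described, not something taken from the CATLASS headers. The argument order (destination first, then source, then destination layout, then source layout) follows the non-TLA usage example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-ins; these are NOT CATLASS types.
struct VectorLayoutModel {
    std::size_t len;     // number of elements in the 1-D vector
};

struct RowMajorModel {
    std::size_t rows;    // M
    std::size_t cols;    // K
    std::size_t stride;  // elements between the starts of consecutive rows (assumed)
};

// 1-D case: contiguous copy of `len` elements, e.g. a Bias or Scale vector.
template <typename T>
void CopyVectorModel(T* dst, const T* src,
                     VectorLayoutModel layoutDst, VectorLayoutModel layoutSrc)
{
    assert(layoutDst.len == layoutSrc.len);
    for (std::size_t i = 0; i < layoutSrc.len; ++i) {
        dst[i] = src[i];
    }
}

// 2-D case: row-by-row copy; source and destination may use different strides,
// for example a tile cut out of a larger matrix copied into a compact buffer.
template <typename T>
void CopyRowMajorModel(T* dst, const T* src,
                       RowMajorModel layoutDst, RowMajorModel layoutSrc)
{
    assert(layoutDst.rows == layoutSrc.rows && layoutDst.cols == layoutSrc.cols);
    for (std::size_t r = 0; r < layoutSrc.rows; ++r) {
        for (std::size_t c = 0; c < layoutSrc.cols; ++c) {
            dst[r * layoutDst.stride + c] = src[r * layoutSrc.stride + c];
        }
    }
}
```

In this model the 1-D path needs only an element count, while the 2-D path must track the shape and a row stride on each side, which is the kind of bookkeeping a layout-carrying tensor type takes over from the caller.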