# CANN/catlass sparse GEMM copy template: TileCopySparseTla

catlass is CANN's operator template library. It provides samples of high-performance matrix multiplication and related fused-operator templates on NPUs. Project: https://gitcode.com/cann/catlass

Code location

[TOC]

## Overview

`TileCopySparseTla` is a TLA copy (data-movement) template dedicated to sparse GEMM. Its key difference from `TileCopyTla` is that the source/destination tensors may use a different element type from the matrix data, for example the `int32_t` index data of matrix B. Its structure is similar to `TileCopyTla`: the matching specialization is selected automatically via SFINAE.

**Restriction:** AtlasA2 architecture only (`CATLASS_ARCH == 2201`).

## Base template declaration

```cpp
template <
    class ArchTag,       // Architecture tag
    class TensorSrc,     // Source tensor type
    class TensorDst,     // Destination tensor type
    class Enable = void  // SFINAE enable
>
struct TileCopySparseTla {
    static_assert(DEPENDENT_FALSE<ArchTag>, "Unsupported TileCopySparseTla, can not find the specialization.");
};
```

## Partial specializations (all AtlasA2)

| Direction | Description | Implementation | API doc |
|---|---|---|---|
| GM→L1 (A) | Sparse A matrix, GM RowMajor → L1 RowMajor | atlasa2/copy_gm_to_l1.hpp | copy_gm_to_l1 |
| GM→L1 (B) | Sparse B matrix, GM ColumnMajor → L1 ColumnMajor | atlasa2/copy_gm_to_l1.hpp | copy_gm_to_l1 |
| L1→L0A | Sparse A matrix, L1 → L0A (zZ layout) | atlasa2/copy_l1_to_l0a.hpp | copy_l1_to_l0a |

## Call interface

```cpp
template <class TensorDst, class TensorSrc>
void operator()(
    TensorDst const &dstTensor,
    TensorSrc const &srcTensor
);
```

## Example

```cpp
#include "catlass/gemm/tile/tile_copy_tla.hpp"
#include "catlass/gemm/tile/copy_gm_to_l1.hpp"
#include "tla/tensor.hpp"

using namespace Catlass;  // brings Arch:: and layout:: into scope
using namespace Catlass::Gemm::Tile;
using namespace tla;

// Copy the B-matrix index data (int32_t) from GM to L1.
// K, N, idxGm (GM address) and idxL1 (L1 buffer) are assumed to be defined by the caller.
auto idxGmLayout = tla::MakeLayout<int32_t, layout::ColumnMajor>(K, N);
auto idxL1Layout = tla::MakeLayout<int32_t, layout::ColumnMajor>(K, N);
auto idxGmTensor = tla::MakeTensor(idxGm, idxGmLayout, Arch::PositionGM{});
auto idxL1Tensor = tla::MakeTensor(idxL1, idxL1Layout, Arch::PositionL1{});

TileCopySparseTla<Arch::AtlasA2, decltype(idxGmTensor), decltype(idxL1Tensor)> copyOp;
copyOp(idxL1Tensor, idxGmTensor);  // destination first, then source
```
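The SFINAE-based dispatch described above can be illustrated with a self-contained C++17 sketch that does not depend on catlass. Everything in it is hypothetical: `AtlasA2`, the position/layout tags, `MockTensor`, `CopyPath` and `TileCopySparseTlaSketch` are stand-ins, and it assumes, for illustration only, that the specializations are chosen by the tensors' position and layout independently of their element type. Instead of moving data, `operator()` reports which specialization was selected.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for catlass architecture, position and layout tags.
struct AtlasA2 {};
struct PositionGM {};
struct PositionL1 {};
struct PositionL0A {};
struct RowMajor {};
struct ColumnMajor {};
struct zZ {};

// Hypothetical minimal tensor: only carries element, layout and position types.
template <class Element_, class Layout_, class Position_>
struct MockTensor {
    using Element = Element_;
    using Layout = Layout_;
    using Position = Position_;
};

template <class> inline constexpr bool DEPENDENT_FALSE = false;

// Primary template: instantiated only when no specialization matches,
// in which case compilation fails with this message.
template <class ArchTag, class TensorSrc, class TensorDst, class Enable = void>
struct TileCopySparseTlaSketch {
    static_assert(DEPENDENT_FALSE<ArchTag>,
                  "Unsupported TileCopySparseTla, can not find the specialization.");
};

// Mutually exclusive selection conditions (assumed: position + layout only).
template <class S, class D>
inline constexpr bool kIsGmToL1A =
    std::is_same_v<typename S::Position, PositionGM> && std::is_same_v<typename D::Position, PositionL1> &&
    std::is_same_v<typename S::Layout, RowMajor> && std::is_same_v<typename D::Layout, RowMajor>;
template <class S, class D>
inline constexpr bool kIsGmToL1B =
    std::is_same_v<typename S::Position, PositionGM> && std::is_same_v<typename D::Position, PositionL1> &&
    std::is_same_v<typename S::Layout, ColumnMajor> && std::is_same_v<typename D::Layout, ColumnMajor>;
template <class S, class D>
inline constexpr bool kIsL1ToL0A =
    std::is_same_v<typename S::Position, PositionL1> && std::is_same_v<typename D::Position, PositionL0A> &&
    std::is_same_v<typename D::Layout, zZ>;

enum class CopyPath { GmToL1A, GmToL1B, L1ToL0A };

// Shared body: a real copy operator would return void and issue the transfer;
// this sketch only returns the selected path.
template <CopyPath P>
struct SketchCopyImpl {
    static constexpr CopyPath path = P;
    template <class TensorDst, class TensorSrc>
    constexpr CopyPath operator()(TensorDst const &, TensorSrc const &) const { return path; }
};

// Partial specializations: enable_if_t removes each one unless its condition holds.
template <class S, class D>
struct TileCopySparseTlaSketch<AtlasA2, S, D, std::enable_if_t<kIsGmToL1A<S, D>>>
    : SketchCopyImpl<CopyPath::GmToL1A> {};
template <class S, class D>
struct TileCopySparseTlaSketch<AtlasA2, S, D, std::enable_if_t<kIsGmToL1B<S, D>>>
    : SketchCopyImpl<CopyPath::GmToL1B> {};
template <class S, class D>
struct TileCopySparseTlaSketch<AtlasA2, S, D, std::enable_if_t<kIsL1ToL0A<S, D>>>
    : SketchCopyImpl<CopyPath::L1ToL0A> {};
```

Under these assumptions, `TileCopySparseTlaSketch<AtlasA2, MockTensor<std::int32_t, ColumnMajor, PositionGM>, MockTensor<std::int32_t, ColumnMajor, PositionL1>>` resolves to the GM→L1 (B) path, so `int32_t` index data is routed like any ColumnMajor B tile. A combination that matches no specialization falls through to the primary template and is rejected at compile time.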