CANN/asc-devkit RpSort16 Sorting API
Published: 2026/5/20 11:59:47
RpSort16

asc-devkit: this project is the operator development language that CANN provides for Ascend AI processors. It natively supports the C and C++ standards and consists mainly of a class library and a language extension layer, offering multi-level APIs for operator development across a range of scenarios. Project address: https://gitcode.com/cann/asc-devkit

Product Support

| Product | Supported |
| Ascend 950PR / Ascend 950DT | x |
| Atlas A3 training series products / Atlas A3 inference series products | x |
| Atlas A2 training series products / Atlas A2 inference series products | x |
| Atlas 200I/500 A2 inference products | x |
| Atlas inference series products, AI Core | √ |
| Atlas inference series products, Vector Core | x |
| Atlas training series products | √ |

Function Description

Sorts Region Proposals by their score field, placing higher scores first. Each iteration sorts 16 Region Proposals.

Function Prototype

    template <typename T>
    __aicore__ inline void RpSort16(const LocalTensor<T>& dst, const LocalTensor<T>& src, const int32_t repeatTime)

Parameters

Table 1 Template parameters

| Parameter | Description |
| T | Operand data type. Atlas training series products support half. Atlas inference series products (AI Core) support half/float. |

Table 2 Parameters

| Parameter | Input/Output | Meaning |
| dst | Output | Destination operand; stores the sorted Region Proposals. The type is LocalTensor, and the supported TPosition values are VECIN/VECCALC/VECOUT. The start address of the LocalTensor must be 32-byte aligned. |
| src | Input | Source operand; stores the unsorted Region Proposals. The type is LocalTensor, and the supported TPosition values are VECIN/VECCALC/VECOUT. The start address of the LocalTensor must be 32-byte aligned. |
| repeatTime | Input | Number of repeat iterations, of type int32_t. Each iteration sorts 16 Region Proposals. Value range: repeatTime ∈ [0, 255]. |

Constraints

- The user must ensure that src and dst hold more Region Proposals than the call actually requires; otherwise a tensor out-of-bounds error occurs.
- When proposal[i] and proposal[j] have the same score and i > j, proposal[j] is selected first and placed ahead.
- For operand address alignment requirements, see the general address alignment constraints.

Usage Example

For a complete example, see the RegionProposalSort sample.

API usage sample

    // ProposalConcat merges consecutive elements into the corresponding position of each Region Proposal.
    // repeatTime = 2: two iterations of 16 Region Proposals each (32 in total); mode = 4: start position 4.
    AscendC::ProposalConcat(dstLocal, srcLocal, 2, 4);
    AscendC::RpSort16(dstLocal, dstLocal, 2);

Example result

Input data (srcLocal):

    [ -1.624 -42.3 -54.12 91.25 -99.4 36.72 67.44 -66.3
      -52.53 3.377 -62.47 -15.85 -31.47 3.143 58.47 -83.75
      21.58 63.47 7.234 35.16 -39.72 37.8 73.06 -98.7
      44.1 -77.2 67.2 19.62 -87.9 -14.875 15.86 -77.75 ]

dstLocal after ProposalConcat (repeat = 2, 32 elements computed; mode = 4, start position 4), shown with one Region Proposal per row:

    [ 0. 0. 0. 0. -1.624 0. 0. 0.
      0. 0. 0. 0. -42.3 0. 0. 0.
      0. 0. 0. 0. -54.12 0. 0. 0.
      0. 0. 0. 0. 91.25 0. 0. 0.
      0. 0. 0. 0. -99.4 0. 0. 0.
      0. 0. 0. 0. 36.72 0. 0. 0.
      0. 0. 0. 0. 67.44 0. 0. 0.
      0. 0. 0. 0. -66.3 0. 0. 0.
      0. 0. 0. 0. -52.53 0. 0. 0.
      0. 0. 0. 0. 3.377 0. 0. 0.
      0. 0. 0. 0. -62.47 0. 0. 0.
      0. 0. 0. 0. -15.85 0. 0. 0.
      0. 0. 0. 0. -31.47 0. 0. 0.
      0. 0. 0. 0. 3.143 0. 0. 0.
      0. 0. 0. 0. 58.47 0. 0. 0.
      0. 0. 0. 0. -83.75 0. 0. 0.
      0. 0. 0. 0. 21.58 0. 0. 0.
      0. 0. 0. 0. 63.47 0. 0. 0.
      0. 0. 0. 0. 7.234 0. 0. 0.
      0. 0. 0. 0. 35.16 0. 0. 0.
      0. 0. 0. 0. -39.72 0. 0. 0.
      0. 0. 0. 0. 37.8 0. 0. 0.
      0. 0. 0. 0. 73.06 0. 0. 0.
      0. 0. 0. 0. -98.7 0. 0. 0.
      0. 0. 0. 0. 44.1 0. 0. 0.
      0. 0. 0. 0. -77.2 0. 0. 0.
      0. 0. 0. 0. 67.2 0. 0. 0.
      0. 0. 0. 0. 19.62 0. 0. 0.
      0. 0. 0. 0. -87.9 0. 0. 0.
      0. 0. 0. 0. -14.875 0. 0. 0.
      0. 0. 0. 0. 15.86 0. 0. 0.
      0. 0. 0. 0. -77.75 0. 0. 0. ]

Output data (dst_gm), one Region Proposal per row. Each group of 16 Region Proposals is sorted independently:

    [ 0. 0. 0. 0. 91.25 0. 0. 0.
      0. 0. 0. 0. 67.44 0. 0. 0.
      0. 0. 0. 0. 58.47 0. 0. 0.
      0. 0. 0. 0. 36.72 0. 0. 0.
      0. 0. 0. 0. 3.377 0. 0. 0.
      0. 0. 0. 0. 3.143 0. 0. 0.
      0. 0. 0. 0. -1.624 0. 0. 0.
      0. 0. 0. 0. -15.85 0. 0. 0.
      0. 0. 0. 0. -31.47 0. 0. 0.
      0. 0. 0. 0. -42.3 0. 0. 0.
      0. 0. 0. 0. -52.53 0. 0. 0.
      0. 0. 0. 0. -54.12 0. 0. 0.
      0. 0. 0. 0. -62.47 0. 0. 0.
      0. 0. 0. 0. -66.3 0. 0. 0.
      0. 0. 0. 0. -83.75 0. 0. 0.
      0. 0. 0. 0. -99.4 0. 0. 0.
      0. 0. 0. 0. 73.06 0. 0. 0.
      0. 0. 0. 0. 67.2 0. 0. 0.
      0. 0. 0. 0. 63.47 0. 0. 0.
      0. 0. 0. 0. 44.1 0. 0. 0.
      0. 0. 0. 0. 37.8 0. 0. 0.
      0. 0. 0. 0. 35.16 0. 0. 0.
      0. 0. 0. 0. 21.58 0. 0. 0.
      0. 0. 0. 0. 19.62 0. 0. 0.
      0. 0. 0. 0. 15.86 0. 0. 0.
      0. 0. 0. 0. 7.234 0. 0. 0.
      0. 0. 0. 0. -14.875 0. 0. 0.
      0. 0. 0. 0. -39.72 0. 0. 0.
      0. 0. 0. 0. -77.2 0. 0. 0.
      0. 0. 0. 0. -77.75 0. 0. 0.
      0. 0. 0. 0. -87.9 0. 0. 0.
      0. 0. 0. 0. -98.7 0. 0. 0. ]

Authoring statement: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.
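To make the sorting behavior in the example result easier to check off-device, the following is a minimal host-side sketch in plain C++. It is not the Ascend C implementation and uses no AscendC headers. The names ConcatScores, RpSort16Reference, and SortedScores are hypothetical helpers introduced only for illustration. The sketch assumes the layout visible in the example data (8 elements per Region Proposal, score at position 4), uses float in place of half, and assumes that proposals with equal scores keep their original index order, which is one reading of the tie-breaking constraint.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference model; none of these names belong to the Ascend C API.
constexpr std::size_t kElemsPerProposal = 8;     // layout seen in the example data
constexpr std::size_t kScoreIndex = 4;           // score position seen in the example data
constexpr std::size_t kProposalsPerRepeat = 16;  // 16 Region Proposals sorted per repeat

using Proposal = std::array<float, kElemsPerProposal>;

// Models the example's ProposalConcat step: one zero-filled proposal per score.
std::vector<float> ConcatScores(const std::vector<float>& scores) {
    std::vector<float> proposals(scores.size() * kElemsPerProposal, 0.0f);
    for (std::size_t i = 0; i < scores.size(); ++i) {
        proposals[i * kElemsPerProposal + kScoreIndex] = scores[i];
    }
    return proposals;
}

// Sorts each consecutive group of 16 proposals by descending score.
// Groups are independent: nothing is merged across repeats.
std::vector<float> RpSort16Reference(const std::vector<float>& src, int repeatTime) {
    constexpr std::size_t kGroupElems = kProposalsPerRepeat * kElemsPerProposal;
    assert(repeatTime >= 0 && repeatTime <= 255);
    assert(src.size() >= static_cast<std::size_t>(repeatTime) * kGroupElems);

    std::vector<float> dst(src);
    for (int r = 0; r < repeatTime; ++r) {
        float* base = dst.data() + static_cast<std::size_t>(r) * kGroupElems;

        // Copy the 16 proposals of this repeat into a sortable array.
        std::array<Proposal, kProposalsPerRepeat> group;
        for (std::size_t p = 0; p < kProposalsPerRepeat; ++p) {
            std::copy_n(base + p * kElemsPerProposal, kElemsPerProposal, group[p].begin());
        }

        // stable_sort keeps equal-score proposals in index order (assumed tie rule).
        std::stable_sort(group.begin(), group.end(),
                         [](const Proposal& a, const Proposal& b) {
                             return a[kScoreIndex] > b[kScoreIndex];
                         });

        // Write the sorted proposals back in place.
        for (std::size_t p = 0; p < kProposalsPerRepeat; ++p) {
            std::copy(group[p].begin(), group[p].end(), base + p * kElemsPerProposal);
        }
    }
    return dst;
}

// Convenience helper: returns only the score field of each sorted proposal.
std::vector<float> SortedScores(const std::vector<float>& scores, int repeatTime) {
    const std::vector<float> sorted = RpSort16Reference(ConcatScores(scores), repeatTime);
    std::vector<float> out(scores.size());
    for (std::size_t i = 0; i < scores.size(); ++i) {
        out[i] = sorted[i * kElemsPerProposal + kScoreIndex];
    }
    return out;
}
```

Calling SortedScores with the 32 srcLocal values from the example and repeatTime set to 2 gives two separate descending runs of 16 scores, the first starting at 91.25 and the second at 73.06, matching the output data listed in the example result. On the device, alignment, TPosition, and the half data type still apply; this model deliberately ignores them.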