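# Semantics-Guided LiDAR Voxel Masking in 3 Steps: Smarter MAE Pre-training (with Code)

> Positioning: CSDN hands-on code | Reproducing the core modules of semantics-guided LiDAR voxel masking
>
> Key takeaways: semantics-guided masking in 3 steps, auxiliary semantic supervision, complete PyTorch code

## Preface

In multimodal MAE pre-training, uniform random masking ignores semantic importance: every voxel is equally likely to be masked, whatever class it belongs to. This post implements semantics-guided LiDAR voxel masking and auxiliary semantic supervision in 3 steps.

## Step 1: Analyze per-class semantic importance

```python
import torch


def analyze_class_importance(voxel_labels, voxel_recon_before, voxel_recon_after, classes):
    """Quantify how important each semantic class is for reconstruction.

    The more reconstruction degrades after a class is masked,
    the more important that class is.
    """
    importance = {}
    for cls_name, cls_id in classes.items():
        # Select the voxels that belong to this class
        cls_mask = (voxel_labels == cls_id)
        if cls_mask.sum() == 0:
            continue

        # Reconstruction metric before and after masking.
        # Note: compute_chamfer is not defined in this post.
        chamfer_before = compute_chamfer(voxel_recon_before[cls_mask])
        chamfer_after = compute_chamfer(voxel_recon_after[cls_mask])
        degradation = chamfer_after - chamfer_before

        importance[cls_name] = degradation
        print(f"{cls_name}: degradation {degradation:.4f}")

    # Sort classes by degradation, largest first
    ranked = sorted(importance.items(), key=lambda x: x[1], reverse=True)
    return ranked
```

## Step 2: Semantics-guided masking strategy

```python
def semantics_guided_masking(voxel_labels, mask_ratio=0.70):
    """Redistribute the masking ratio according to semantic importance."""
    # Per-class weights (from Table II of the paper).
    # A larger weight gives a higher masking probability.
    weights = {
        'car': 0.75, 'pedestrian': 0.75, 'construction_vehicle': 0.75,
        'motorcycle': 0.95, 'truck': 0.95, 'bus': 0.95,
        'traffic_cone': 0.95, 'barrier': 0.95,
        'trailer': 1.05, 'bicycle': 1.05,
        'background': 1.20,
    }

    N = len(voxel_labels)
    n_mask = int(N * mask_ratio)

    # Masking probability of each voxel.
    # Note: get_class_name (label id -> class name) is not defined in this post.
    mask_prob = torch.tensor([
        weights.get(get_class_name(l.item()), 1.0) for l in voxel_labels
    ])
    mask_prob = mask_prob / mask_prob.sum() * n_mask

    # Sample the mask from those probabilities
    mask = torch.bernoulli(mask_prob.clamp(max=1.0)).bool()

    # Adjust to hit the target masking ratio exactly.
    # Counts are converted to Python ints before being passed to randperm.
    n_selected = int(mask.sum())
    if n_selected > n_mask:
        excess = n_selected - n_mask
        idx = torch.where(mask)[0][torch.randperm(n_selected)[:excess]]
        mask[idx] = False
    elif n_selected < n_mask:
        deficit = n_mask - n_selected
        idx = torch.where(~mask)[0][torch.randperm(N - n_selected)[:deficit]]
        mask[idx] = True

    return mask
```

## Step 3: Auxiliary semantic supervision loss

```python
class SemanticSupervisionLoss(torch.nn.Module):
    """Per-point semantic supervision on the decoder side."""

    def __init__(self, feat_dim=128, num_classes=16):
        super().__init__()
        # Input is the voxel feature plus a 3-D local offset
        self.head = torch.nn.Sequential(
            torch.nn.Linear(feat_dim + 3, 64),
            torch.nn.ReLU(),
            torch.nn.Linear(64, num_classes),
        )
        self.ce = torch.nn.CrossEntropyLoss()

    def forward(self, voxel_feat, point_offset, sem_labels):
        # Concatenate voxel features + local offsets
        z = torch.cat([voxel_feat, point_offset], dim=-1)
        logits = self.head(z)
        return self.ce(logits, sem_labels)
```

## Expected results

| Method | mAP | NDS |
| --- | --- | --- |
| Uniform random masking | 24.72% | 31.41% |
| Semantics-guided masking | 26.21% | 33.07% |
| Auxiliary semantic supervision | 26.11% | 34.63% |

For the complete code and a detailed walkthrough, see the main article: "Semantics-Guided Masked Pre-training: LiDAR Voxel Semantic Masking, Auxiliary Semantic Supervision, nuScenes 3D BEV Detection, NDS Up 3.22%".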
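
To see what the normalization in Step 2 (`mask_prob / mask_prob.sum() * n_mask`) implies in practice, the following minimal sketch computes the per-class masking probability for a toy scene. The weights are copied from Step 2; the voxel counts and the helper name `expected_mask_ratio` are hypothetical and only serve the illustration.

```python
# Per-class masking weights copied from Step 2 (a subset, for brevity).
weights = {"car": 0.75, "truck": 0.95, "background": 1.20}

# Hypothetical voxel counts per class; real scenes will differ.
counts = {"car": 100, "truck": 50, "background": 850}


def expected_mask_ratio(weights, counts, mask_ratio=0.70):
    """Per-class masking probability implied by Step 2's normalization.

    Hypothetical helper: it mirrors p_i = w_i / sum_j(w_j) * n_mask,
    where the sum runs over all voxels, followed by the clamp to 1.0.
    """
    n_total = sum(counts.values())
    n_mask = int(n_total * mask_ratio)
    # Sum of weights over every voxel, grouped by class
    weight_sum = sum(weights[c] * counts[c] for c in counts)
    return {c: min(weights[c] / weight_sum * n_mask, 1.0) for c in counts}


ratios = expected_mask_ratio(weights, counts)
for name, p in ratios.items():
    print(f"{name}: {p:.3f}")
```

With these toy counts, a car voxel is masked with probability of about 0.46, a truck voxel about 0.58, and a background voxel about 0.74, while the expected total stays at the 70% target. Because the Bernoulli draw only matches the target in expectation, Step 2 then adds or removes voxels uniformly at random to hit the exact count, which slightly dilutes the class bias.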