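Object Tracking in Practice: Principles and Implementation of SORT, DeepSORT and ByteTrack

1. Introduction

Multi-object tracking (MOT) is one of the core tasks in computer vision. In autonomous driving, video surveillance, sports analytics and similar applications, every target must keep one unique identity across consecutive frames.

Core challenge: the detector outputs the target boxes in each frame, but it has no idea that car A in frame 1 and car A in frame 2 are the same car.

Evolution of the techniques: SORT (2016, Kalman filter) → DeepSORT (2017, adds appearance features) → ByteTrack (2021, makes use of low-score detections) → BoT-SORT (2022, adds camera motion compensation).

2. SORT (Simple Online and Realtime Tracking)

2.1 Core pipeline

Input at frame t: detections D_t = {d_1, d_2, ...}. Tracks carried over from frame t-1: T_{t-1} = {t_1, t_2, ...}.

1. Predict: use the Kalman filter to predict where each track will be in frame t.
2. Match: use the Hungarian algorithm to assign detections to predicted boxes, with IoU as the similarity.
3. Update: correct each matched track with its detection.
4. Create: start a new track for each unmatched detection.
5. Delete: remove tracks that have gone unmatched for too long.

2.2 Kalman filter

Each track's state is the 7-dimensional vector [cx, cy, s, r, vx, vy, vs]. It holds the box center, area and aspect ratio, plus the velocities of the center and the area. The aspect ratio is treated as constant, and the detector observes only the first four components.

```python
import numpy as np
from filterpy.kalman import KalmanFilter


class KalmanBoxTracker:
    """Bounding-box tracker based on a Kalman filter."""
    count = 0  # global ID counter shared by all trackers

    def __init__(self, bbox):
        """
        bbox: [x1, y1, x2, y2], converted internally to [cx, cy, s, r]
        cx, cy: box center; s: area; r: aspect ratio (w / h)
        """
        self.kf = KalmanFilter(dim_x=7, dim_z=4)
        # State transition matrix F: constant-velocity model for cx, cy and s
        self.kf.F = np.array([
            [1, 0, 0, 0, 1, 0, 0],
            [0, 1, 0, 0, 0, 1, 0],
            [0, 0, 1, 0, 0, 0, 1],
            [0, 0, 0, 1, 0, 0, 0],
            [0, 0, 0, 0, 1, 0, 0],
            [0, 0, 0, 0, 0, 1, 0],
            [0, 0, 0, 0, 0, 0, 1],
        ])
        # Observation matrix H: only [cx, cy, s, r] are measured
        self.kf.H = np.array([
            [1, 0, 0, 0, 0, 0, 0],
            [0, 1, 0, 0, 0, 0, 0],
            [0, 0, 1, 0, 0, 0, 0],
            [0, 0, 0, 1, 0, 0, 0],
        ])
        # Noise
        self.kf.R[2:, 2:] *= 10.    # area and aspect ratio are noisier measurements
        self.kf.P[4:, 4:] *= 1000.  # velocities are unobserved: high initial uncertainty
        self.kf.P *= 10.
        self.kf.Q[-1, -1] *= 0.01   # small process noise on the velocities
        self.kf.Q[4:, 4:] *= 0.01
        # Initial state
        self.kf.x[:4] = self._bbox_to_z(bbox)
        self.time_since_update = 0
        self.id = KalmanBoxTracker.count
        KalmanBoxTracker.count += 1
        self.history = []
        self.hits = 0
        self.hit_streak = 0
        self.age = 0

    def _bbox_to_z(self, bbox):
        """[x1, y1, x2, y2] → [cx, cy, s, r] as a (4, 1) column vector."""
        w = bbox[2] - bbox[0]
        h = bbox[3] - bbox[1]
        cx = bbox[0] + w / 2.
        cy = bbox[1] + h / 2.
        s = w * h
        r = w / float(h)
        return np.array([[cx], [cy], [s], [r]])

    def _z_to_bbox(self, z):
        """[cx, cy, s, r] → [x1, y1, x2, y2] as a (1, 4) array."""
        w = np.sqrt(z[2] * z[3])
        h = z[2] / w
        return np.array([z[0] - w / 2., z[1] - h / 2.,
                         z[0] + w / 2., z[1] + h / 2.]).reshape((1, 4))

    def predict(self):
        """Predict the box in the next frame."""
        # Keep the predicted area from going negative (otherwise sqrt yields NaN)
        if self.kf.x[6] + self.kf.x[2] <= 0:
            self.kf.x[6] *= 0.0
        self.kf.predict()
        self.age += 1
        if self.time_since_update > 0:
            self.hit_streak = 0
        self.time_since_update += 1
        return self._z_to_bbox(self.kf.x)

    def update(self, bbox):
        """Correct the state with a matched detection."""
        self.time_since_update = 0
        self.hits += 1
        self.hit_streak += 1
        self.kf.update(self._bbox_to_z(bbox))

    def get_state(self):
        return self._z_to_bbox(self.kf.x)
```

2.3 SORT main loop

```python
from scipy.optimize import linear_sum_assignment


class SORT:
    def __init__(self, max_age=5, min_hits=3, iou_threshold=0.3):
        self.max_age = max_age            # frames a track may go unmatched before deletion
        self.min_hits = min_hits          # updates needed before a track is reported
        self.iou_threshold = iou_threshold
        self.trackers = []
        self.frame_count = 0

    def update(self, detections):
        """
        detections: (N, 5) array of [x1, y1, x2, y2, score]
        returns: (M, 5) array of [x1, y1, x2, y2, track_id]
        """
        self.frame_count += 1
        # Predict existing tracks
        predicted = []
        for trk in self.trackers:
            pred = trk.predict()
            predicted.append(pred[0])
        predicted = np.array(predicted) if predicted else np.empty((0, 4))

        # Matching: IoU + Hungarian algorithm
        if len(predicted) > 0 and len(detections) > 0:
            iou_matrix = self._iou_batch(detections[:, :4], predicted)
            row_idx, col_idx = linear_sum_assignment(-iou_matrix)  # maximize total IoU
            # Filter out pairs with low IoU
            matched = []
            unmatched_dets = list(range(len(detections)))
            unmatched_trks = list(range(len(predicted)))
            for r, c in zip(row_idx, col_idx):
                if iou_matrix[r, c] >= self.iou_threshold:
                    matched.append((r, c))
                    unmatched_dets.remove(r)
                    unmatched_trks.remove(c)
        else:
            matched = []
            unmatched_dets = list(range(len(detections)))
            unmatched_trks = list(range(len(predicted)))

        # Update matched tracks
        for d, t in matched:
            self.trackers[t].update(detections[d, :4])
        # Create new tracks
        for d in unmatched_dets:
            trk = KalmanBoxTracker(detections[d, :4])
            self.trackers.append(trk)
        # Delete stale tracks
        self.trackers = [t for t in self.trackers if t.time_since_update <= self.max_age]

        # Output confirmed tracks
        results = []
        for trk in self.trackers:
            if trk.hits >= self.min_hits:
                bbox = trk.get_state()[0]
                results.append([*bbox, trk.id])
        return np.array(results) if results else np.empty((0, 5))

    def _iou_batch(self, bb_dets, bb_trks):
        """Compute the IoU matrix (rows: detections, columns: tracks)."""
        def box_iou(a, b):
            x1 = max(a[0], b[0])
            y1 = max(a[1], b[1])
            x2 = min(a[2], b[2])
            y2 = min(a[3], b[3])
            inter = max(0, x2 - x1) * max(0, y2 - y1)
            area_a = (a[2] - a[0]) * (a[3] - a[1])
            area_b = (b[2] - b[0]) * (b[3] - b[1])
            return inter / (area_a + area_b - inter + 1e-6)

        iou = np.zeros((len(bb_dets), len(bb_trks)))
        for d in range(len(bb_dets)):
            for t in range(len(bb_trks)):
                iou[d, t] = box_iou(bb_dets[d], bb_trks[t])
        return iou
```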
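The double Python loop in `_iou_batch` makes N × M interpreter-level calls per frame. The following is a minimal sketch that assumes NumPy broadcasting is acceptable. It computes the same IoU matrix in one vectorized expression, and it packages the gated Hungarian step of `SORT.update` as a standalone function. The names `iou_matrix` and `associate` are hypothetical and are not part of the original code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou_matrix(dets, trks):
    """IoU between every detection (rows) and every predicted track box (columns).

    Both inputs are (K, 4) arrays of [x1, y1, x2, y2]; broadcasting replaces
    the double Python loop in _iou_batch.
    """
    d, t = dets[:, None, :], trks[None, :, :]  # shapes (N, 1, 4) and (1, M, 4)
    w = np.clip(np.minimum(d[..., 2], t[..., 2]) - np.maximum(d[..., 0], t[..., 0]), 0, None)
    h = np.clip(np.minimum(d[..., 3], t[..., 3]) - np.maximum(d[..., 1], t[..., 1]), 0, None)
    inter = w * h
    area_d = (d[..., 2] - d[..., 0]) * (d[..., 3] - d[..., 1])
    area_t = (t[..., 2] - t[..., 0]) * (t[..., 3] - t[..., 1])
    return inter / (area_d + area_t - inter + 1e-6)


def associate(dets, trks, iou_threshold=0.3):
    """Gated Hungarian assignment, returning (matches, unmatched_dets, unmatched_trks)."""
    if len(dets) == 0 or len(trks) == 0:
        return [], list(range(len(dets))), list(range(len(trks)))
    iou = iou_matrix(dets, trks)
    rows, cols = linear_sum_assignment(-iou)  # negate to maximize total IoU
    # The assignment may pair boxes that barely overlap, so gate afterwards
    matches = [(int(r), int(c)) for r, c in zip(rows, cols) if iou[r, c] >= iou_threshold]
    matched_d = {r for r, _ in matches}
    matched_t = {c for _, c in matches}
    unmatched_dets = [i for i in range(len(dets)) if i not in matched_d]
    unmatched_trks = [j for j in range(len(trks)) if j not in matched_t]
    return matches, unmatched_dets, unmatched_trks


# Usage: two predicted tracks and three detections in the new frame
trks = np.array([[0, 0, 10, 10], [20, 20, 30, 30]], dtype=float)
dets = np.array([[21, 21, 31, 31], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
matches, new_dets, lost_trks = associate(dets, trks)
print(matches, new_dets, lost_trks)  # → [(0, 1), (1, 0)] [2] []
```

Both matched pairs overlap with IoU 81/119 ≈ 0.68, which clears the 0.3 gate. The distant detection 2 would start a new track. The gate stays outside the assignment because `linear_sum_assignment` produces a full assignment and can pair boxes with zero overlap.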
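3. DeepSORT

3.1 What it improves

DeepSORT adds appearance features on top of SORT:

- Cost matrix: cost = λ × motion cost + (1 - λ) × appearance cost. In the paper the motion term is the Mahalanobis distance between the Kalman prediction and the detection, not IoU. The authors report that λ = 0 is a reasonable choice under substantial camera motion, in which case the Mahalanobis distance serves only as a gate that rules out implausible pairs. IoU is used in a final matching stage for the leftovers.
- Appearance features: a CNN extracts a 128-dimensional feature vector for each detection.
- Feature gallery: each track keeps a queue of its most recent features (the last 100 frames).
- Matching metric: cosine distance.

3.2 Appearance feature extraction

The sketch below uses a ResNet-50 backbone. The original DeepSORT network is a much smaller wide residual network trained on a person re-identification dataset.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights


class FeatureExtractor(nn.Module):
    """Appearance feature extractor."""

    def __init__(self, feature_dim=128):
        super().__init__()
        # ImageNet-pretrained backbone (torchvision >= 0.13 weights API);
        # drop the classification layer, keep everything up to global pooling
        backbone = resnet50(weights=ResNet50_Weights.DEFAULT)
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        # Projection to the embedding size; it must be trained on a
        # re-identification dataset before the features are discriminative
        self.fc = nn.Linear(2048, feature_dim)

    def forward(self, images):
        """
        images: (B, 3, 128, 64) cropped target images
        returns: (B, 128) L2-normalized features
        """
        feat = self.features(images).flatten(1)
        feat = self.fc(feat)
        feat = nn.functional.normalize(feat, dim=1)
        return feat
```

3.3 Matching cascade

Tracks that were matched more recently get to pick detections first. The cascade visits tracks in increasing order of `time_since_update`. As a result, a track that has been lost for many frames, and whose Kalman uncertainty has therefore grown, cannot take a detection away from a track that was seen in the previous frame.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


class DeepSORT:
    # Track (a Kalman track that also stores a feature gallery), _iou_match and
    # _get_results are helpers the article does not show.
    def __init__(self, max_age=70, nn_budget=100):
        self.max_age = max_age
        self.nn_budget = nn_budget  # max features kept per track
        self.tracks = []
        self.feature_extractor = FeatureExtractor()

    def update(self, detections, features):
        """
        detections: (N, 5) array of [x1, y1, x2, y2, score]
        features: (N, 128) NumPy array of L2-normalized appearance features
        """
        # 1. Predict
        for track in self.tracks:
            track.predict()
        # 2. Matching cascade: tracks updated most recently are matched first
        matched, unmatched_dets, unmatched_trks = self._cascade_match(detections, features)
        # 3. IoU matching for whatever the cascade left over
        #    (indices stay global so they can be used directly below)
        if len(unmatched_dets) > 0 and len(unmatched_trks) > 0:
            iou_matched, unmatched_dets, unmatched_trks = self._iou_match(
                detections, unmatched_dets, unmatched_trks)
            matched.extend(iou_matched)
        # 4. Update / create / delete
        for d, t in matched:
            self.tracks[t].update(detections[d], features[d])
        for d in unmatched_dets:
            self.tracks.append(Track(detections[d], features[d]))
        self.tracks = [t for t in self.tracks if t.time_since_update <= self.max_age]
        return self._get_results()

    def _cosine_distance(self, det_feats, track_galleries):
        """Smallest cosine distance between each detection and each track's gallery."""
        cost = np.zeros((len(det_feats), len(track_galleries)))
        for j, gallery in enumerate(track_galleries):
            gallery = np.asarray(gallery)  # (K, 128), L2-normalized
            cost[:, j] = (1.0 - det_feats @ gallery.T).min(axis=1)
        return cost

    def _cascade_match(self, detections, features, max_distance=0.7):
        """Matching cascade."""
        matched = []
        unmatched_dets = list(range(len(detections)))
        for age in range(self.max_age + 1):
            if not unmatched_dets:
                break
            tracks_of_age = [i for i, t in enumerate(self.tracks) if t.time_since_update == age]
            if not tracks_of_age:
                continue
            # Cost matrix: rows = still-unmatched detections, columns = tracks of this age
            cost_matrix = self._cosine_distance(
                features[unmatched_dets], [self.tracks[t].features for t in tracks_of_age])
            row_idx, col_idx = linear_sum_assignment(cost_matrix)
            taken = set()
            for r, c in zip(row_idx, col_idx):
                if cost_matrix[r, c] < max_distance:  # cosine-distance threshold
                    matched.append((unmatched_dets[r], tracks_of_age[c]))
                    taken.add(unmatched_dets[r])
            # Remove matched detections only after the loop so that row indices stay valid
            unmatched_dets = [d for d in unmatched_dets if d not in taken]
        matched_trks = {t for _, t in matched}
        unmatched_trks = [i for i in range(len(self.tracks)) if i not in matched_trks]
        return matched, unmatched_dets, unmatched_trks
```

4. ByteTrack

4.1 Core idea

ByteTrack's key insight is that low-score detection boxes are useful too. A partly occluded or blurred object often still produces a box, only with low confidence. Discarding that box breaks the track, whereas low-score background false positives rarely overlap an existing track.

Traditional approach:
- High-score detections (≥ 0.6) → matched to tracks
- Low-score detections (< 0.6) → discarded outright

ByteTrack:
- Round 1: high-score detections ↔ existing tracks
- Round 2: low-score detections ↔ tracks still unmatched after round 1; low-score boxes that remain unmatched are discarded as background
- Round 3: unmatched high-score detections → new tracks

4.2 Implementation

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


class ByteTrack:
    # Track (a Kalman track with predict/update/time_since_update), _compute_iou
    # (IoU matrix of shape (len(tracks), len(detections))) and _get_results are
    # helpers the article does not show.
    def __init__(self, high_thresh=0.6, low_thresh=0.1, max_age=30):
        self.high_thresh = high_thresh
        self.low_thresh = low_thresh
        self.max_age = max_age
        self.tracks = []
        self.track_id = 0

    def update(self, detections):
        """
        detections: (N, 6) array of [x1, y1, x2, y2, score, class]
        """
        # Split into high-score and low-score detections
        scores = detections[:, 4]
        high_dets = detections[scores >= self.high_thresh]
        low_dets = detections[(scores >= self.low_thresh) & (scores < self.high_thresh)]
        # Predict
        for track in self.tracks:
            track.predict()
        # Round 1: high-score detections ↔ all tracks
        matched1, unmatched_tracks, unmatched_high = self._match(self.tracks, high_dets, thresh=0.3)
        # Round 2: low-score detections ↔ tracks left over from round 1
        remaining_tracks = [self.tracks[i] for i in unmatched_tracks]
        matched2, still_unmatched, unmatched_low = self._match(remaining_tracks, low_dets, thresh=0.5)
        # Update matched tracks (matched2 indexes into remaining_tracks)
        for t_idx, d_idx in matched1:
            self.tracks[t_idx].update(high_dets[d_idx])
        for t_idx, d_idx in matched2:
            remaining_tracks[t_idx].update(low_dets[d_idx])
        # Unmatched low-score detections are dropped as likely background.
        # Create new tracks from high-score detections only
        for d_idx in unmatched_high:
            self.tracks.append(Track(high_dets[d_idx], self.track_id))
            self.track_id += 1
        # Delete stale tracks
        self.tracks = [t for t in self.tracks if t.time_since_update <= self.max_age]
        return self._get_results()

    def _match(self, tracks, detections, thresh):
        """IoU + Hungarian matching; thresh is the minimum IoU for a valid pair."""
        if not tracks or len(detections) == 0:
            return [], list(range(len(tracks))), list(range(len(detections)))
        # IoU matrix: rows = tracks, columns = detections
        iou_matrix = self._compute_iou(tracks, detections)
        row_idx, col_idx = linear_sum_assignment(-iou_matrix)
        matched = []
        matched_tracks, matched_dets = set(), set()
        for r, c in zip(row_idx, col_idx):
            if iou_matrix[r, c] >= thresh:
                matched.append((r, c))
                matched_tracks.add(r)
                matched_dets.add(c)
        unmatched_tracks = [i for i in range(len(tracks)) if i not in matched_tracks]
        unmatched_dets = [i for i in range(len(detections)) if i not in matched_dets]
        return matched, unmatched_tracks, unmatched_dets
```

5. Algorithm comparison

| Algorithm | Matching strategy | Appearance features | Speed | MOTA |
|---|---|---|---|---|
| SORT | IoU | None | 200 FPS | 59.8 |
| DeepSORT | IoU + appearance | ResNet | 40 FPS | 61.4 |
| ByteTrack | Two-round IoU | None | 150 FPS | 80.3 |
| BoT-SORT | IoU + appearance + camera motion compensation | ResNet | 35 FPS | 81.2 |

These figures come from different papers, benchmarks and detectors. For example, the DeepSORT paper reports its results on MOT16, while ByteTrack's 80.3 is on MOT17. FPS also depends on the hardware and on whether detection time is counted. Read the table as a rough trend rather than as a head-to-head comparison.

6. Summary

The core of object tracking is associating targets across frames:

- SORT: the simplest option, a Kalman filter plus IoU matching.
- DeepSORT: adds appearance features, so targets can be re-identified after occlusion.
- ByteTrack: uses low-score detections to recover partly occluded objects, which greatly reduces missed detections.

Practical advice: choose ByteTrack when speed matters most, and BoT-SORT when accuracy matters most.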
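The `ByteTrack` class in section 4.2 relies on helpers (`Track`, `_compute_iou`, `_get_results`) that the article does not show. The following is a self-contained sketch of just the BYTE association step. It assumes the predicted track boxes are already available as an array, so the Kalman prediction is omitted. `iou_matrix`, `gated_match` and `byte_associate` are hypothetical names. The IoU gates 0.3 and 0.5 mirror the values used in section 4.2.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou_matrix(a, b):
    """IoU between boxes a (N, 4) and b (M, 4), both in [x1, y1, x2, y2] format."""
    a, b = a[:, None, :], b[None, :, :]
    w = np.clip(np.minimum(a[..., 2], b[..., 2]) - np.maximum(a[..., 0], b[..., 0]), 0, None)
    h = np.clip(np.minimum(a[..., 3], b[..., 3]) - np.maximum(a[..., 1], b[..., 1]), 0, None)
    inter = w * h
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area_a + area_b - inter + 1e-6)


def gated_match(track_boxes, det_boxes, min_iou):
    """Hungarian matching on IoU; pairs below min_iou are rejected."""
    n_t, n_d = len(track_boxes), len(det_boxes)
    if n_t == 0 or n_d == 0:
        return [], list(range(n_t)), list(range(n_d))
    iou = iou_matrix(track_boxes, det_boxes)
    rows, cols = linear_sum_assignment(-iou)
    matches = [(int(r), int(c)) for r, c in zip(rows, cols) if iou[r, c] >= min_iou]
    used_t = {t for t, _ in matches}
    used_d = {d for _, d in matches}
    return (matches,
            [t for t in range(n_t) if t not in used_t],
            [d for d in range(n_d) if d not in used_d])


def byte_associate(track_boxes, dets, high_thresh=0.6, low_thresh=0.1):
    """Two-round association in the style of ByteTrack.

    track_boxes: (T, 4) predicted boxes; dets: (N, 5) [x1, y1, x2, y2, score].
    Returns (matches, new_track_dets, unmatched_tracks), all as indices into
    track_boxes and dets.
    """
    scores = dets[:, 4]
    high_idx = np.flatnonzero(scores >= high_thresh)
    low_idx = np.flatnonzero((scores >= low_thresh) & (scores < high_thresh))
    # Round 1: high-score detections vs all tracks
    m1, trk_left, high_left = gated_match(track_boxes, dets[high_idx, :4], 0.3)
    matches = [(t, int(high_idx[d])) for t, d in m1]
    # Round 2: low-score detections vs tracks left over from round 1
    left_boxes = track_boxes[np.array(trk_left, dtype=int)]
    m2, trk_left2, _ = gated_match(left_boxes, dets[low_idx, :4], 0.5)
    matches += [(trk_left[t], int(low_idx[d])) for t, d in m2]
    unmatched_tracks = [trk_left[t] for t in trk_left2]
    # Round 3: only unmatched high-score detections start new tracks
    new_track_dets = [int(high_idx[d]) for d in high_left]
    return matches, new_track_dets, unmatched_tracks


# Usage: three predicted tracks; track 1 is partly occluded, so its detection scores low
trks = np.array([[0, 0, 10, 10], [20, 20, 30, 30], [40, 40, 50, 50]], dtype=float)
dets = np.array([
    [1, 1, 11, 11, 0.9],       # high score, overlaps track 0
    [21, 21, 31, 31, 0.3],     # low score, overlaps track 1
    [70, 70, 80, 80, 0.8],     # high score, no track nearby
    [90, 90, 100, 100, 0.05],  # below low_thresh, ignored
])
matches, new_dets, lost = byte_associate(trks, dets)
print(matches, new_dets, lost)  # → [(0, 0), (1, 1)] [2] [2]
```

In this example, only the low-score detection 1 keeps track 1 alive, through round 2. If low-score boxes were discarded as in the traditional pipeline, track 1 would end up unmatched alongside track 2.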
Published: 2026/6/21 22:33:50