-
Abstract: To address the difficulties of complex image backgrounds and large scale differences among targets in sea-surface ship multi-object tracking, an improved CSTrack-based ship multi-object tracking algorithm is proposed. First, because CSTrack decouples the neck features by brute force, which causes the loss of target features, an improved cross-correlation decoupling network combining the Res2net module, RES_CCN, is proposed, so that finer-grained features are obtained after decoupling. Second, to improve tracking performance on multiple ship classes, a decoupled detection-head design is adopted to predict target class, confidence, and position separately. Finally, ablation experiments on the MOT16 dataset verify the effectiveness of the proposed modules; when tested on the Singapore Maritime Dataset (SMD), the proposed algorithm improves multiple object tracking accuracy (MOTA) by 8.4 percentage points and the identification F1 score (IDF1) by 3.1 percentage points over the original CSTrack, outperforming ByteTrack and other algorithms. The proposed algorithm offers high tracking accuracy and a low false-detection rate and is well suited to sea-surface ship multi-object tracking.
-
Overview: Ship multi-object tracking is an important application scenario in the field of multi-object tracking (MOT) and can be widely applied in both military and civilian fields. The objective of MOT is to locate multiple ship targets, maintain a unique identity (ID) for each target, and record its continuous trajectory. The difficulty of MOT lies in false positives, false negatives, ID switches, and the uncertainty of the number of targets.

In the CSTrack multi-object tracking algorithm, the feature maps produced by the neck of the network are decoupled into two different feature vectors, which serve as the inputs of the object detection and re-identification (ReID) branches, respectively, to ease the competition between the two tasks and improve tracking performance. However, this brute-force decoupling causes the loss of target features, which degrades tracking performance under occlusion and for small or densely packed targets. To solve this issue, an improved cross-correlation network (CCN) named RES_CCN, which extracts fine-grained features, is proposed in this paper. The network is composed of an improved Res2net module, coordinate attention, and the CCN, and is inserted between the neck and head of the network, so that finer-grained features are obtained by enlarging the receptive field and adding hierarchical residual connections inside the residual unit before feature decoupling. To meet the requirements of multi-class ship tracking and improve detection performance, a decoupled detection-head design is adopted to predict target class, confidence, and position separately; binary cross-entropy is used as the classification loss and added to the total loss.

Finally, ablation results on the MOT16 dataset show that the proposed algorithm improves multiple object tracking accuracy (MOTA) by 4.6 percentage points and the identification F1 score (IDF1) by 3.4 percentage points over the original algorithm. When tested on the Singapore Maritime Dataset (SMD), the proposed algorithm improves MOTA by 8.4 percentage points and IDF1 by 3.1 percentage points over the original CSTrack, outperforming ByteTrack and other algorithms. Qualitative results show that the proposed algorithm can effectively detect small targets and maintain target IDs in sea-surface scenarios. The proposed algorithm achieves high tracking accuracy with a low false-detection rate and is suitable for ship multi-object tracking in sea-surface scenarios.
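To make the pre-decoupling design concrete, below is a minimal PyTorch sketch of a RES_CCN-style block, assuming a 256-channel neck feature, a 4-scale Res2net-style unit, and plain 1×1 projections standing in for the decoupling step; the module names and all hyperparameters here are illustrative, not the paper's exact implementation, and the real network additionally applies coordinate attention and cross-correlation weighting.

```python
# Illustrative sketch only; names and hyperparameters are assumptions,
# not CSTrack's or RES_CCN's exact code.
import torch
import torch.nn as nn

class Res2NetUnit(nn.Module):
    """Hierarchical residual unit: channels are split into `scales` groups,
    and each group's 3x3 conv also receives the previous group's output,
    enlarging the receptive field to yield finer-grained features."""
    def __init__(self, channels: int, scales: int = 4):
        super().__init__()
        assert channels % scales == 0
        self.scales = scales
        width = channels // scales
        # one 3x3 conv per group; the first group is passed through unchanged
        self.convs = nn.ModuleList(
            nn.Conv2d(width, width, 3, padding=1) for _ in range(scales - 1)
        )
        self.fuse = nn.Conv2d(channels, channels, 1)  # re-mix the groups

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        groups = torch.chunk(x, self.scales, dim=1)
        outs, prev = [groups[0]], groups[0]
        for conv, g in zip(self.convs, groups[1:]):
            prev = conv(g + prev)          # hierarchical residual connection
            outs.append(prev)
        return x + self.fuse(torch.cat(outs, dim=1))

class DecoupleBlock(nn.Module):
    """Refine the neck feature first, then split it into task-specific
    detection and ReID maps, instead of decoupling the raw neck output."""
    def __init__(self, channels: int = 256):
        super().__init__()
        self.refine = Res2NetUnit(channels)
        self.det_proj = nn.Conv2d(channels, channels, 1)   # detection branch
        self.reid_proj = nn.Conv2d(channels, channels, 1)  # ReID branch

    def forward(self, x: torch.Tensor):
        x = self.refine(x)
        return self.det_proj(x), self.reid_proj(x)

feat = torch.randn(1, 256, 76, 136)        # e.g. a 608x1088 frame at stride 8
det_feat, reid_feat = DecoupleBlock()(feat)
```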
-
Table 1. Video-sequence parameters of the adjusted SMD dataset
| SMD video sequence | Frames | Ferry | Vessel-ship | Speed-boat | Boat | Kayak | Sail-boat | Original split | Adjusted split |
|---|---|---|---|---|---|---|---|---|---|
| MVI_1448 | 600 | - | 3210 | 1410 | - | - | - | Test | - |
| MVI_1474 | 445 | 890 | 3560 | - | - | - | - | Test | - |
| MVI_1484 | 600 | 600 | 1200 | - | - | - | - | Test | - |
| MVI_1486 | 600 | 1023 | 4200 | - | - | - | - | Test | - |
| MVI_1582 | 540 | 540 | 5400 | - | - | - | - | Test | - |
| MVI_1612 | 261 | 165 | 2349 | - | - | - | - | Test | - |
| MVI_1626 | 556 | - | 2775 | - | - | - | - | Test | - |
| MVI_1627 | 600 | - | 4200 | - | - | - | - | Test | - |
| MVI_1640 | 310 | - | 1677 | 274 | - | - | - | Test | - |
| MVI_0797 | 600 | - | 767 | - | - | - | - | Test | - |
| MVI_1587 | 600 | - | 7800 | - | 600 | - | - | Train | Test |
| MVI_1592 | 491 | 491 | 2347 | - | - | 791 | - | Train | Test |
| MVI_1452 | 340 | - | 1360 | - | - | - | 340 | Train | Test |
| MVI_1469 | 600 | - | 3600 | 941 | - | - | - | Validation | Train |
| MVI_1578 | 505 | - | 3535 | - | - | - | - | Validation | Train |
| MVI_0790 | 600 | - | 70 | - | 140 | - | - | Validation | Train |
| MVI_0799 | 600 | - | 390 | 170 | - | - | - | - | Train |
Table 2. Influence of different modules on tracking performance on the MOT16 dataset
| Model | MOTA↑ | IDF1↑ | FP↓ | FN↓ | MT↑ | ML↓ | IDS↓ |
|---|---|---|---|---|---|---|---|
| Baseline | 79.4 | 77.9 | 6235 | 15584 | 354 | 29 | 876 |
| Baseline+Res2net* | 82.8 | 79.7 | 4714 | 13966 | 390 | 21 | 616 |
| Baseline+CA | 82.4 | 78.3 | 4776 | 14022 | 377 | 21 | 642 |
| Baseline+decoupled detection head | 82.7 | 79.2 | 4628 | 14318 | 375 | 28 | 571 |
| Baseline* | 82.2 | 75.4 | 4927 | 13801 | 389 | 23 | 875 |
| Baseline*+Res2net | 83.2 | 75.8 | 4459 | 13350 | 398 | 22 | 758 |
| Baseline*+Res2net* | 83.1 | 80.8 | 4413 | 13720 | 385 | 23 | 536 |
| Baseline*+Res2net*+CA (Baseline*+RES_CCN) | 83.4 | 81.9 | 4335 | 13434 | 393 | 18 | 571 |
| Baseline*+Res2net*+CA+decoupled detection head | 84.0 | 81.3 | 4000 | 13107 | 400 | 20 | 480 |
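The decoupled detection head ablated above predicts class, confidence, and box position with separate branches. The following is a hedged PyTorch sketch in the spirit of a YOLOX-style decoupled head; the 256-channel input and the six SMD ship classes are assumptions for illustration, and the paper's exact layer layout may differ. The class logits are intended for a per-class binary cross-entropy loss (e.g. nn.BCEWithLogitsLoss), as described in the overview.

```python
# Illustrative sketch; channel widths and class count are assumptions.
import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    def __init__(self, in_channels: int = 256, num_classes: int = 6):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, in_channels, 1)
        # classification branch: per-class logits for a BCE loss
        self.cls_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(in_channels, num_classes, 1),
        )
        # regression trunk shared by the box and confidence predictions
        self.reg_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.SiLU(),
        )
        self.box_pred = nn.Conv2d(in_channels, 4, 1)  # (x, y, w, h)
        self.obj_pred = nn.Conv2d(in_channels, 1, 1)  # confidence logit

    def forward(self, feat: torch.Tensor):
        x = self.stem(feat)
        reg = self.reg_branch(x)
        # raw logits; apply sigmoid / BCEWithLogitsLoss outside the head
        return self.cls_branch(x), self.obj_pred(reg), self.box_pred(reg)

cls_logits, obj_logit, box = DecoupledHead()(torch.randn(1, 256, 76, 136))
```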
Table 3. Influence of different attention mechanisms on tracking performance
| Model | MOTA↑ | IDF1↑ | FP↓ | FN↓ | MT↑ | ML↓ | IDS↓ |
|---|---|---|---|---|---|---|---|
| SE | 83.0 | 78.6 | 4624 | 13557 | 394 | 18 | 589 |
| CBAM | 83.6 | 80.8 | 4229 | 13402 | 391 | 20 | 491 |
| ECA | 80.5 | 79.3 | 3316 | 17806 | 351 | 29 | 489 |
| CA | 84.0 | 81.3 | 4000 | 13107 | 400 | 20 | 480 |
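Coordinate attention (CA), which performs best above, factorizes spatial pooling into height and width directions so that channel attention retains positional information. Below is a compact PyTorch sketch of the CA module following Hou et al. [29]; the reduction ratio of 32 matches the original paper, while other details are simplified.

```python
import torch
import torch.nn as nn

class CoordAtt(nn.Module):
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool along width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool along height
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        x_h = self.pool_h(x)                       # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)   # (B, C, W, 1)
        # jointly encode both directions, then split back
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return x * a_h * a_w   # direction-aware attention weights

out = CoordAtt(256)(torch.randn(2, 256, 76, 136))
```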
Table 4. Influence of ReID weight parameters on tracking performance
| ReID weight | MOTA↑ | IDF1↑ | FP↓ | FN↓ | MT↑ | ML↓ | IDS↓ |
|---|---|---|---|---|---|---|---|
| 4×10⁻² | 80.1 | 83.0 | 4685 | 14488 | 374 | 28 | 576 |
| 4×10⁻³ | 81.1 | 82.6 | 4416 | 13687 | 388 | 23 | 530 |
| 4×10⁻⁴ | 84.0 | 81.3 | 4000 | 13107 | 400 | 20 | 480 |
| 4×10⁻⁵ | 83.5 | 80.2 | 4319 | 13379 | 396 | 22 | 530 |
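The ReID weight above scales the ReID loss against the detection terms in the total objective. A minimal sketch of such a weighted multi-task loss is given below, assuming a simple linear combination with the best weight from the table (4×10⁻⁴) as the default and the per-class BCE term from the decoupled head included; CSTrack's actual weighting scheme (e.g. learnable task-uncertainty weights) may differ.

```python
# Sketch under the assumption of a plain linear combination of loss terms.
import torch
import torch.nn.functional as F

def total_loss(l_box: torch.Tensor, l_obj: torch.Tensor,
               l_cls: torch.Tensor, l_reid: torch.Tensor,
               reid_weight: float = 4e-4) -> torch.Tensor:
    # detection terms: box regression + confidence + per-class BCE,
    # plus the down-weighted ReID embedding loss
    return l_box + l_obj + l_cls + reid_weight * l_reid

# illustrative per-class BCE term for 8 candidates over 6 ship classes
cls_logits = torch.randn(8, 6)
cls_targets = torch.zeros(8, 6).scatter_(1, torch.randint(0, 6, (8, 1)), 1.0)
l_cls = F.binary_cross_entropy_with_logits(cls_logits, cls_targets)
```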
Table 5. Comparison of tracking performance between the proposed method and other state-of-the-art methods on the SMD dataset
| Algorithm | MOTA↑ | IDF1↑ | FP↓ | FN↓ | MT↑ | ML↓ | IDS↓ |
|---|---|---|---|---|---|---|---|
| DeepSORT | 31.1 | 62.3 | 21678 | 11082 | 69 | 25 | 224 |
| StrongSORT | 42.1 | 65.0 | 13264 | 17233 | 63 | 21 | 224 |
| ByteTrack | 44.8 | 67.3 | 9387 | 17003 | 57 | 26 | 49 |
| CSTrack | 38.5 | 62.6 | 9760 | 19617 | 48 | 33 | 109 |
| Ours | 46.9 | 65.7 | 6658 | 16565 | 43 | 23 | 172 |
-
[1] Ciaparrone G, Sánchez F L, Tabik S, et al. Deep learning in video multi-object tracking: a survey[J]. Neurocomputing, 2020, 381: 61−88. doi: 10.1016/j.neucom.2019.11.023
[2] Wu H, Nie J H, Zhang Z W, et al. Deep learning-based visual multiple object tracking: a review[J]. Comput Sci, 2023, 50(4): 77−87. doi: 10.11896/jsjkx.220300173
[3] Wang G A, Song M L, Hwang J N. Recent advances in embedding methods for multi-object tracking: a survey[Z]. arXiv: 2205.10766, 2022. https://doi.org/10.48550/arXiv.2205.10766.
[4] Xiao T, Li S, Wang B C, et al. Joint detection and identification feature learning for person search[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017: 3376–3385. https://doi.org/10.1109/CVPR.2017.360.
[5] Wojke N, Bewley A, Paulus D. Simple online and realtime tracking with a deep association metric[C]//Proceedings of 2017 IEEE International Conference on Image Processing, Beijing, 2017: 3645–3649. https://doi.org/10.1109/ICIP.2017.8296962.
[6] Du Y H, Zhao Z C, Song Y, et al. StrongSORT: make DeepSORT great again[Z]. arXiv: 2202.13514, 2023. https://doi.org/10.48550/arXiv.2202.13514.
[7] Zhang Y F, Sun P Z, Jiang Y, et al. ByteTrack: multi-object tracking by associating every detection box[C]//Proceedings of the 17th European Conference on Computer Vision, Tel Aviv, 2022: 1–21. https://doi.org/10.1007/978-3-031-20047-2_1.
[8] Zhang Y F, Wang C Y, Wang X G, et al. FairMOT: on the fairness of detection and re-identification in multiple object tracking[J]. Int J Comput Vis, 2021, 129(11): 3069−3087. doi: 10.1007/s11263-021-01513-4
[9] Liang C, Zhang Z P, Zhou X, et al. Rethinking the competition between detection and ReID in multiobject tracking[J]. IEEE Trans Image Process, 2022, 31: 3182−3196. doi: 10.1109/TIP.2022.3165376
[10] Prasad D K, Rajan D, Rachmawati L, et al. Video processing from electro-optical sensors for object detection and tracking in a maritime environment: a survey[J]. IEEE Trans Intell Transp Syst, 2017, 18(8): 1993−2016. doi: 10.1109/TITS.2016.2634580
[11] Milan A, Leal-Taixé L, Reid I, et al. MOT16: a benchmark for multi-object tracking[Z]. arXiv: 1603.00831, 2016. https://doi.org/10.48550/arXiv.1603.00831.
[12] Wu J L, Cao J L, Song L C, et al. Track to detect and segment: an online multi-object tracker[C]//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, 2021: 12347–12356. https://doi.org/10.1109/CVPR46437.2021.01217.
[13] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Trans Pattern Anal Mach Intell, 2017, 39(6): 1137−1149. doi: 10.1109/TPAMI.2016.2577031
[14] Wang Z D, Zheng L, Liu Y X, et al. Towards real-time multi-object tracking[C]//Proceedings of the 16th European Conference on Computer Vision, Glasgow, 2020: 107–122. https://doi.org/10.1007/978-3-030-58621-8_7.
[15] Yu E, Li Z L, Han S D, et al. RelationTrack: relation-aware multiple object tracking with decoupled representation[J]. IEEE Trans Multimedia, 2022, 25: 2686−2697. doi: 10.1109/TMM.2022.3150169
[16] Wan X Y, Zhou S P, Wang J J, et al. Multiple object tracking by trajectory map regression with temporal priors embedding[C]//Proceedings of the 29th ACM International Conference on Multimedia, 2021: 1377–1386. https://doi.org/10.1145/3474085.3475304.
[17] Meng F J, Wang X Q, Wang D, et al. Spatial–semantic and temporal attention mechanism-based online multi-object tracking[J]. Sensors, 2020, 20(6): 1653. doi: 10.3390/s20061653
[18] Guo S, Wang J Y, Wang X C, et al. Online multiple object tracking with cross-task synergy[C]//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, 2021: 8132–8141. https://doi.org/10.1109/CVPR46437.2021.00804.
[19] Bloisi D D, Iocchi L, Pennisi A, et al. ARGOS-Venice boat classification[C]//Proceedings of the 2015 12th IEEE International Conference on Advanced Video and Signal Based Surveillance, Karlsruhe, 2015: 1–6. https://doi.org/10.1109/AVSS.2015.7301727.
[20] Shao Z F, Wu W J, Wang Z Y, et al. SeaShips: a large-scale precisely annotated dataset for ship detection[J]. IEEE Trans Multimedia, 2018, 20(10): 2593−2604. doi: 10.1109/TMM.2018.2865686
[21] Ribeiro R, Cruz G, Matos J, et al. A data set for airborne maritime surveillance environments[J]. IEEE Trans Circuits Syst Video Technol, 2019, 29(9): 2720−2732. doi: 10.1109/TCSVT.2017.2775524
[22] Xu A L, Du D, Wang H H, et al. Optical ship target detection method combining hierarchical search and visual residual network[J]. Opto-Electron Eng, 2021, 48(4): 200249. doi: 10.12086/oee.2021.200249
[23] Yu G L, Sang J G, Li J R. Ship real-time target tracking and recognition technology based on improved convolutional neural network[J]. Ship Sci Technol, 2022, 44(21): 152−155. doi: 10.3404/j.issn.1672-7649.2022.21.031
[24] Li G Y, Qiao Y L. A ship target detection and tracking algorithm based on graph matching[J]. J Phys Conf Ser, 2021, 1873: 012056. doi: 10.1088/1742-6596/1873/1/012056
[25] Zhou Y D. Research on ship multiple object tracking in remote sensing image based on deep learning[D]. Xi’an: Xidian University, 2021. https://doi.org/10.27389/d.cnki.gxadu.2021.000391.
[26] Chen Q L. Research on automatic annotation and multi-target tracking algorithm for ship video target detection[D]. Hangzhou: Hangzhou Dianzi University, 2021. https://doi.org/10.27075/d.cnki.ghzdc.2021.000349.
[27] Chen X, Peng D L, Gu Y. Real-time object detection for UAV images based on improved YOLOv5s[J]. Opto-Electron Eng, 2022, 49(3): 210372. doi: 10.12086/oee.2022.210372
[28] Gao S H, Cheng M M, Zhao K, et al. Res2Net: a new multi-scale backbone architecture[J]. IEEE Trans Pattern Anal Mach Intell, 2021, 43(2): 652−662. doi: 10.1109/TPAMI.2019.2938758
[29] Hou Q B, Zhou D Q, Feng J S. Coordinate attention for efficient mobile network design[C]//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, 2021: 13708–13717. https://doi.org/10.1109/CVPR46437.2021.01350.
[30] Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C]//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018: 7132–7141. https://doi.org/10.1109/CVPR.2018.00745.
[31] Woo S, Park J, Lee J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the 15th European Conference on Computer Vision, Munich, 2018: 3–19. https://doi.org/10.1007/978-3-030-01234-2_1.
[32] Wang Q L, Wu B G, Zhu P F, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, 2020: 11531–11539. https://doi.org/10.1109/CVPR42600.2020.01155.
[33] Li C Y, Li L L, Jiang H L, et al. YOLOv6: a single-stage object detection framework for industrial applications[Z]. arXiv: 2209.02976, 2022. https://doi.org/10.48550/arXiv.2209.02976.
[34] Ge Z, Liu S T, Wang F, et al. YOLOX: exceeding YOLO series in 2021[Z]. arXiv: 2107.08430, 2021. https://doi.org/10.48550/arXiv.2107.08430.
[35] Moosbauer S, König D, Jäkel J, et al. A benchmark for deep learning based object detection in maritime environments[C]//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, 2019: 916–925. https://doi.org/10.1109/CVPRW.2019.00121.
-