Improved YOLOv7 algorithm for target detection in complex environments from UAV perspective

Citation: Zhang R M, Xiao Y F, Jia Z N, et al. Improved YOLOv7 algorithm for target detection in complex environments from UAV perspective[J]. Opto-Electron Eng, 2024, 51(5): 240051. doi: 10.12086/oee.2024.240051


  • Fund projects: Open Fund of the Anhui Simulation Design and Modern Manufacturing Engineering Technology Research Centre (SGCZXZD2101); Construction of a UAV Safety Knowledge Base Based on Knowledge Graph (FZ2021KF10)
  • *Corresponding author: Song Weiwei, swwahjzu11@163.com
  • CLC number: TP391

  • Abstract: UAV aerial images are easily degraded by harsh environments, leaving targets poorly distinguishable, occluded by obstacles, or with severe feature loss. To address this, an improved YOLOv7 algorithm for target detection in complex environments from the UAV perspective (SSG-YOLOv7) is proposed. First, images drawn from the VisDrone2019 and RSOD datasets are used to simulate five environments, expanding the VisDrone dataset to 12803 images and the RSOD dataset to 1320 images. Second, anchor box sizes better suited to the datasets are obtained by clustering. Next, the 3D parameter-free attention mechanism SimAM is introduced into the backbone and the feature extraction module to strengthen the model's learning ability. The feature extraction module SPPCSPC is then restructured to fuse the information extracted by pooling channels of different sizes, and the lightweight convolution module GhostConv is introduced, improving detection accuracy for dense multi-scale targets without increasing the model's parameter count. Finally, Soft NMS is used to optimize anchor box confidences, reducing the algorithm's false-detection and missed-detection rates. Experiments show that SSG-YOLOv7 performs well in complex-environment detection tasks: VisDrone_mAP@0.5 and RSOD_mAP@0.5 are 10.45% and 2.67% higher than YOLOv7, respectively.
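
As context for the SimAM module named in the abstract: SimAM scores each activation by an energy function of its squared deviation from the channel mean and gates the feature map with a sigmoid, adding no learnable parameters. The sketch below is a minimal NumPy rendition of the published formulation, not the paper's YOLOv7 integration; the regularizer `lam` uses the commonly quoted default of 1e-4.

```python
import numpy as np

def simam(x, lam=1e-4):
    """SimAM on a (C, H, W) feature map: score each activation by its
    squared deviation from the channel mean (an inverse 'energy'),
    then gate the map with a sigmoid. No learnable parameters."""
    _, h, w = x.shape
    n = h * w - 1
    mu = x.mean(axis=(1, 2), keepdims=True)
    d = (x - mu) ** 2                           # per-activation deviation
    v = d.sum(axis=(1, 2), keepdims=True) / n   # channel variance estimate
    e_inv = d / (4 * (v + lam)) + 0.5
    return x * (1.0 / (1.0 + np.exp(-e_inv)))   # sigmoid gating

feat = np.abs(np.random.default_rng(1).normal(size=(8, 16, 16))).astype(np.float32)
out = simam(feat)
```

Because `e_inv` is at least 0.5, the gate lies in (0.62, 1): activations far from the channel mean are attenuated least, which is the "important neurons stand out" intuition behind SimAM.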

  • Overview: Using low-cost unmanned aerial vehicle (UAV) photography technology combined with deep learning can create significant value in various fields. Targets captured from a UAV perspective often exhibit drastic scale variations, uneven distribution, and susceptibility to obstruction by obstacles. Moreover, UAVs typically fly at low altitudes and high speeds during the capture process, which can result in low-resolution aerial images affected by weather conditions or the drone's own vibrations. Maintaining high detection accuracy in such complex environments is a crucial challenge in UAV-based target detection tasks. Therefore, this paper proposes a new target detection algorithm, SSG-YOLOv7, based on YOLOv7. Firstly, the algorithm utilizes the K-means++ clustering algorithm to generate four different-scale anchor boxes suitable for the target dataset, effectively addressing the issue of large-scale variations in targets from the UAV perspective. Next, by introducing the SimAM attention mechanism into the neck network and feature extraction module, the model's detection accuracy is improved without increasing the model's parameter count. Subsequently, the pooling layers at different scales of the feature extraction module are fused to enable the model to learn richer target feature information in complex environments. Additionally, GhostConv is used to replace traditional convolutional modules to reduce the parameter count of the feature extraction module. Finally, Soft NMS is employed to reduce the false detection and missed detection rates of small-scale targets during the detection process, thereby enhancing target detection effectiveness from the UAV perspective. In the experimental process, the original VisDrone dataset and RSOD dataset are simulated under five complex environments using transformation functions from the Imgaug library. SSG-YOLOv7 is validated against the original algorithm. 
Compared to the original algorithm, the proposed algorithm improves the average precision (mAP@0.5) of the model by 10.45% in the VisDrone dataset and by 2.67% in the RSOD dataset, while reducing the model's parameter count by 24.2%. This effectively demonstrates that SSG-YOLOv7 is better suited for target detection tasks in complex environments from the UAV perspective. Additionally, the experiment compares the detection accuracy of YOLOv7 and SSG-YOLOv7 before and after data augmentation on both datasets. In the VisDrone dataset, YOLOv7 improves by 4.13%, while SSG-YOLOv7 improves by 8.71%. In the RSOD dataset, YOLOv7 improves by 3.59%, while SSG-YOLOv7 improves by 4.45%. This effectively proves that SSG-YOLOv7 can learn more target features from samples in complex environments, accurately locate the targets, and is suitable for multi-target detection tasks in complex environments from the UAV perspective.
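
The five-environment simulation described above relies on transformation functions from the Imgaug library. As a rough, library-free illustration of the idea, the NumPy sketch below fakes two such conditions, low light and haze; the function names and parameter values are illustrative stand-ins, not the paper's actual Imgaug pipeline.

```python
import numpy as np

def low_light(img, gain=0.4):
    """Darken the image by scaling intensities (stand-in for a night scene)."""
    return np.clip(img.astype(np.float32) * gain, 0, 255).astype(np.uint8)

def haze(img, strength=0.5):
    """Blend the image toward white to mimic fog/haze."""
    out = (1 - strength) * img.astype(np.float32) + strength * 255.0
    return np.clip(out, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)  # dummy aerial frame
dark, foggy = low_light(img), haze(img)
```

Augmenting the training set this way exposes the detector to degraded inputs while keeping the original box annotations valid, since neither transform moves any pixels.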

    Figure 1.  SimAM attention module

    Figure 2.  SSG-YOLOv7 overall structure

    Figure 3.  GhostConv structure

    Figure 4.  Comparison of detection results with (a) NMS and (b) Soft NMS
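
Figure 4 contrasts NMS with Soft NMS: rather than deleting boxes that overlap a higher-scoring one, Soft NMS decays their confidence, which preserves true detections in dense scenes. Below is a minimal NumPy sketch of the Gaussian-decay variant; the `sigma` and `score_thresh` defaults are illustrative, as the paper does not state its settings.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, format [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=1e-3):
    """Gaussian Soft NMS: decay the scores of overlapping boxes
    instead of suppressing them outright."""
    scores = scores.astype(np.float32).copy()
    keep, idxs = [], list(range(len(boxes)))
    while idxs:
        best = max(idxs, key=lambda i: scores[i])   # highest remaining score
        keep.append(best)
        idxs.remove(best)
        if idxs:
            rest = np.array(idxs)
            overlap = iou(boxes[best], boxes[rest])
            scores[rest] *= np.exp(-(overlap ** 2) / sigma)  # Gaussian penalty
            idxs = [int(i) for i in rest if scores[i] > score_thresh]
    return keep, scores

# Toy example: two heavily overlapping boxes plus one far away.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=np.float32)
confs = np.array([0.9, 0.8, 0.7], dtype=np.float32)
keep, new_scores = soft_nms(boxes, confs)
```

In the toy example all three boxes survive, but the second box's score is pushed down by its overlap with the first, while the distant box keeps its score untouched.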

    Figure 5.  Comparison of data augmentation effects on the two datasets

    Figure 6.  Visual comparison of YOLOv7 and SSG-YOLOv7 detection results

    Table 1.  Anchor box sizes generated by K-means++

    Feature map size | Receptive field | Anchor boxes
    20×20 | Big | [33,49], [63,73]
    40×40 | Medium | [14,35], [27,23]
    80×80 | Small | [20,8,8,15,14]
    160×160 | Tiny | [2,5], [4,11]
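
Table 1's anchors come from clustering the ground-truth box sizes of the dataset. The sketch below shows the idea on toy data; for brevity it uses greedy farthest-point seeding (a deterministic cousin of the k-means++ seeding the paper uses, which samples new centers with probability proportional to squared distance) and plain Euclidean distance, whereas anchor clustering often uses a 1−IoU distance instead.

```python
import numpy as np

def anchor_kmeans(wh, k, iters=20):
    """Cluster (w, h) box sizes into k anchor shapes.
    Seeding: start from the first box, then repeatedly add the box
    farthest from all current centers (deterministic farthest-point
    variant of k-means++ seeding), then refine with Lloyd iterations."""
    centers = [wh[0].astype(np.float32)]
    for _ in range(k - 1):
        d2 = np.min([((wh - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(wh[int(np.argmax(d2))].astype(np.float32))
    centers = np.stack(centers)
    for _ in range(iters):  # standard Lloyd refinement
        labels = np.argmin(((wh[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = wh[labels == j].mean(axis=0)
    return np.round(centers).astype(int)

# Toy data: a cluster of small boxes and a cluster of large boxes.
wh = np.array([[8, 9], [10, 11], [9, 10], [48, 50], [52, 49], [50, 51]],
              dtype=np.float32)
anchors = anchor_kmeans(wh, k=2)
```

On real data, `wh` would hold every labeled box's width and height, and `k` would be the total number of anchors across the four detection scales.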

    Table 2.  Comparison of parameter counts and GFLOPs between the SPPCSPC and SG-SPPCSPC modules

    Module | Parameters/M | GFLOPs
    SPPCSPC | 12.8 | 16.2
    SG-SPPCSPC | 3.69 | 4.9
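
Table 2's parameter drop is the point of GhostConv: only c_out/s output channels are produced by an ordinary convolution, and the remaining "ghost" channels are generated from them by cheap depthwise filters. The arithmetic below illustrates the roughly 1/s saving with made-up layer sizes; k=1, d=5, s=2 follow GhostNet's published defaults, not the paper's exact SG-SPPCSPC configuration.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def ghost_conv_params(c_in, c_out, k=1, d=5, s=2):
    """GhostConv: a primary conv makes c_out/s 'intrinsic' maps, then
    cheap d x d depthwise convs generate the (s-1)/s 'ghost' maps."""
    m = c_out // s                      # intrinsic channels
    primary = k * k * c_in * m          # ordinary convolution
    cheap = d * d * m * (s - 1)         # depthwise, one filter per ghost map
    return primary + cheap

std = conv_params(512, 512, k=1)        # 262144 weights
ghost = ghost_conv_params(512, 512)     # 137472 weights, about 0.52x
```

With s=2 the cost approaches half of a standard convolution as channel counts grow, which is consistent in spirit with the module-level reduction reported in Table 2.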

    Table 3.  Results of ablation experiments

    Model | K-means++ | SimAM | SG-SPPCSPC | Soft NMS | Vis_mAP@0.5/% | RSOD_mAP@0.5/% | Parameters/M | FPS | GFLOPs
    A | | | | | 40.89 | 95.60 | 37.6 | 82 | 106.5
    B | ✓ | | | | 44.15 | 96.91 | 37.6 | 82 | 106.5
    C | ✓ | ✓ | | | 46.40 | 97.22 | 37.6 | 87 | 107.2
    D | ✓ | ✓ | ✓ | | 48.61 | 97.91 | 28.5 | 93 | 95.9
    E | ✓ | ✓ | ✓ | ✓ | 51.34 (+10.45) | 98.27 (+2.67) | 28.5 | 93 | 95.9

    Table 4.  Comparison of mAP (%) before and after data augmentation

    Model | Original VisDrone | Augmented VisDrone | Original RSOD | Augmented RSOD
    YOLOv7 | 36.76 | 40.89 | 92.01 | 95.60
    SSG-YOLOv7 | 42.63 | 51.34 | 93.82 | 98.27

    Table 5.  Comparison of experimental results

    Method | VisDrone_mAP@0.5/% | VisDrone_mAP@0.5:0.95/% | RSOD_mAP@0.5/% | RSOD_mAP@0.5:0.95/% | FPS | Parameters/M
    Faster R-CNN[6] | 20.0 | 8.91 | 85.6 | 54.1 | 43 | 137.10
    SSD[23] | 10.2 | 5.1 | 87.4 | 52.6 | 249 | 26.29
    YOLOv5s | 27.4 | 15.6 | 94.0 | 59.5 | 126 | 7.28
    YOLOv5m | 32.0 | 18.8 | 95.2 | 66.4 | 98 | 21.38
    YOLOv5l | 36.5 | 21.5 | 95.1 | 68.3 | 75 | 47.10
    YOLOv7[13] | 40.8 | 24.0 | 95.6 | 69.6 | 82 | 37.62
    YOLOv8s | 43.1 | 25.0 | 94.1 | 63.0 | 160 | 11.17
    YOLOv8m | 39.6 | 22.8 | 94.1 | 68.7 | 122 | 25.90
    YOLOv8l | 43.7 | 25.1 | 96.0 | 68.9 | 98 | 43.69
    Ours (SSG-YOLOv7) | 51.3 | 29.2 | 98.3 | 70.0 | 93 | 28.49
  • References

    [1] Chen X, Peng D L, Gu Y. Real-time object detection for UAV images based on improved YOLOv5s[J]. Opto-Electron Eng, 2022, 49(3): 210372. doi: 10.12086/oee.2022.210372
    [2] Yang S, Wang J, Hu L, et al. Research on occluded object detection by improved RetinaNet[J]. Comput Eng Appl, 2022, 58(11): 209−214. doi: 10.3778/j.issn.1002-8331.2107-0277
    [3] Zhan W, Sun C F, Wang M C, et al. An improved Yolov5 real-time detection method for small objects captured by UAV[J]. Soft Comput, 2022, 26(6): 361−373. doi: 10.1007/s00500-021-06407-8
    [4] Liu W, Quijano K, Crawford M M. YOLOv5-tassel: detecting tassels in RGB UAV imagery with improved YOLOv5 based on transfer learning[J]. IEEE J Sel Top Appl Earth Obs Remote Sens, 2022, 15: 8085−8094. doi: 10.1109/JSTARS.2022.3206399
    [5] Purkait P, Zhao C, Zach C. SPP-Net: deep absolute pose regression with synthetic views[Z]. arXiv: 1712.03452, 2017. https://doi.org/10.48550/arXiv.1712.03452
    [6] Girshick R. Fast R-CNN[C]//Proceedings of 2015 IEEE International Conference on Computer Vision, 2015: 1440–1448. https://doi.org/10.1109/ICCV.2015.169
    [7] Uijlings J R R, van de Sande K E A, Gevers T, et al. Selective search for object recognition[J]. Int J Comput Vis, 2013, 104(2): 154−171. doi: 10.1007/s11263-013-0620-5
    [8] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Trans Pattern Anal Mach Intell, 2017, 39(6): 1137−1149. doi: 10.1109/TPAMI.2016.2577031
    [9] Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016. https://doi.org/10.1109/CVPR.2016.91
    [10] Yin R H, Zhao W, Fan X D, et al. AF-SSD: an accurate and fast single shot detector for high spatial remote sensing imagery[J]. Sensors, 2020, 20(22): 6530. doi: 10.3390/s20226530
    [11] Qi X M, Chai R, Gao Y M. Algorithm of reconstructed SPPCSPC and optimized downsampling for small object detection[J]. Comput Eng Appl, 2023, 59(20): 158−166. doi: 10.3778/j.issn.1002-8331.2305-0004
    [12] Shang J C, Wang J S, Liu S B, et al. Small target detection algorithm for UAV aerial photography based on improved YOLOv5s[J]. Electronics, 2023, 12(11): 2434. doi: 10.3390/electronics12112434
    [13] Wang C Y, Bochkovskiy A, Liao H Y M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 7464–7475. https://doi.org/10.1109/CVPR52729.2023.00721
    [14] Tang F, Yang F, Tian X Q. Long-distance person detection based on YOLOv7[J]. Electronics, 2023, 12(6): 1502. doi: 10.3390/electronics12061502
    [15] Huang T Y, Cheng M, Yang Y L, et al. Tiny object detection based on YOLOv5[C]//Proceedings of the 2022 5th International Conference on Image and Graphics Processing, 2022: 45–50. https://doi.org/10.1145/3512388.3512395
    [16] Ismkhan H. I-k-means-+: an iterative clustering algorithm based on an enhanced version of the k-means[J]. Pattern Recognit, 2018, 79: 402−413. doi: 10.1016/j.patcog.2018.02.015
    [17] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017: 6000–6010. https://doi.org/10.5555/3295222.3295349
    [18] Yang L X, Zhang R Y, Li L D, et al. SimAM: a simple, parameter-free attention module for convolutional neural networks[C]//Proceedings of the 38th International Conference on Machine Learning, 2021: 11863–11874.
    [19] Lin T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017: 936–944. https://doi.org/10.1109/CVPR.2017.106
    [20] Han K, Wang Y H, Tian Q, et al. GhostNet: more features from cheap operations[C]//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 1577–1586. https://doi.org/10.1109/CVPR42600.2020.00165
    [21] Bodla N, Singh B, Chellappa R, et al. Soft-NMS - improving object detection with one line of code[C]//Proceedings of 2017 IEEE International Conference on Computer Vision, 2017: 5562–5570. https://doi.org/10.1109/ICCV.2017.593
    [22] Du D W, Zhu P F, Wen L Y, et al. VisDrone-DET2019: the vision meets drone object detection in image challenge results[C]//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshop, 2019: 213–226. https://doi.org/10.1109/ICCVW.2019.00030
    [23] Liu W, Anguelov D, Erhan D, et al. SSD: single shot MultiBox detector[C]//Proceedings of the 14th European Conference on Computer Vision, 2016: 21–37. https://doi.org/10.1007/978-3-319-46448-0_2

Publication history
Received: 2024-03-06
Revised: 2024-04-21
Accepted: 2024-04-24
Published: 2024-05-25
