Improved YOLOv7 algorithm for target detection in complex environments from UAV perspective
Abstract
To address the challenges faced by unmanned aerial vehicles (UAVs) when shooting in adverse conditions, such as poor image recognizability, occlusion by obstacles, and severe feature loss, a novel algorithm named SSG-YOLOv7 is proposed to enhance object detection from the UAV perspective in complex environments. First, images extracted from the VisDrone2019 and RSOD datasets were used to simulate five different environments, expanding the VisDrone dataset to 12,803 images and the RSOD dataset to 1,320 images. Anchor box sizes better suited to the datasets were then obtained by clustering. Next, the 3D parameter-free attention mechanism SimAM was integrated into the backbone network and the feature extraction module to strengthen the model's learning capability. Furthermore, the feature extraction module SPPCSPC was restructured to fuse information extracted by pooling channels of different sizes, and the lightweight convolution module GhostConv was introduced, improving the precision of dense multi-scale object detection without increasing the model's parameter count. Finally, Soft NMS was employed to optimize the confidence of anchor boxes, reducing false and missed detections. Experimental results demonstrate that SSG-YOLOv7 delivers superior detection performance in complex environments, improving VisDrone_mAP@0.5 and RSOD_mAP@0.5 by 10.45% and 2.67%, respectively, over YOLOv7.
Key words:
- UAV /
- complex environment /
- YOLOv7 /
- SimAM attention mechanism /
- SPPCSPC /
- data augmentation
Overview
Low-cost unmanned aerial vehicle (UAV) photography combined with deep learning can create significant value in many fields. However, targets captured from a UAV perspective often exhibit drastic scale variations, uneven distribution, and susceptibility to occlusion by obstacles. Moreover, UAVs typically fly at low altitude and high speed during capture, so aerial images can be low-resolution and degraded by weather conditions or the drone's own vibration. Maintaining high detection accuracy in such complex environments is a crucial challenge in UAV-based target detection. This paper therefore proposes a new target detection algorithm, SSG-YOLOv7, based on YOLOv7.

First, the algorithm uses the K-means++ clustering algorithm to generate anchor boxes at four scales suited to the target dataset, addressing the large scale variations of targets seen from the UAV perspective. Next, the SimAM attention mechanism is introduced into the neck network and the feature extraction module, improving detection accuracy without increasing the model's parameter count. The pooling layers of the feature extraction module at different scales are then fused so that the model can learn richer target feature information in complex environments, and GhostConv replaces the traditional convolution modules to reduce the parameter count of the feature extraction module. Finally, Soft NMS is employed to reduce the false and missed detection rates for small-scale targets, enhancing target detection effectiveness from the UAV perspective.

In the experiments, the original VisDrone and RSOD datasets are augmented under five simulated complex environments using transformation functions from the Imgaug library (a minimal sketch of this simulation step is given after this overview), and SSG-YOLOv7 is validated against the original algorithm. Compared with YOLOv7, the proposed algorithm improves mAP@0.5 by 10.45% on the VisDrone dataset and by 2.67% on the RSOD dataset while reducing the model's parameter count by 24.2%, demonstrating that SSG-YOLOv7 is better suited to target detection tasks in complex environments from the UAV perspective. The experiments also compare the detection accuracy of YOLOv7 and SSG-YOLOv7 before and after data augmentation on both datasets: on VisDrone, YOLOv7 improves by 4.13% and SSG-YOLOv7 by 8.71%; on RSOD, YOLOv7 improves by 3.59% and SSG-YOLOv7 by 4.45%. This shows that SSG-YOLOv7 learns more target features from samples in complex environments, locates targets accurately, and is well suited to multi-target detection tasks in complex environments from the UAV perspective.
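As a concrete illustration of the augmentation step, the sketch below derives degraded copies of an aerial image with Imgaug transformations. The five conditions shown (fog, rain, snow, motion blur, low light) and the file names are assumptions for illustration only; the paper does not enumerate the exact five environments it simulates.

```python
# Minimal sketch of simulating complex environments with the Imgaug library.
# The five conditions below are illustrative assumptions, not the paper's list.
import imgaug.augmenters as iaa
import imageio.v2 as imageio

environments = {
    "fog": iaa.Fog(),
    "rain": iaa.Rain(),
    "snow": iaa.Snowflakes(),
    "motion_blur": iaa.MotionBlur(k=9),  # stands in for UAV vibration blur
    "low_light": iaa.Multiply(0.4),      # darkens the whole image
}

image = imageio.imread("visdrone_sample.jpg")  # hypothetical input file
for name, aug in environments.items():
    simulated = aug(image=image)
    imageio.imwrite(f"visdrone_sample_{name}.jpg", simulated)
```

Bounding-box labels are unchanged by these pixel-level degradations, so each simulated image can reuse the annotations of its source image.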
Table 1. Anchor box sizes generated by K-means++

| Feature map size | Receptive field | Anchor boxes |
|---|---|---|
| 20×20 | Big | [33,49], [63,73] |
| 40×40 | Medium | [14,35], [27,23] |
| 80×80 | Small | [20,8,8,15,14] |
| 160×160 | Tiny | [2,5], [4,11] |
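A minimal sketch of how anchor sizes such as those in Table 1 can be obtained; scikit-learn's KMeans with k-means++ initialization stands in for the authors' implementation, and the choice of eight clusters (two per detection scale, following the table) is an assumption.

```python
# Cluster labeled box sizes into anchors with k-means++ initialization.
import numpy as np
from sklearn.cluster import KMeans

def cluster_anchors(wh: np.ndarray, n_anchors: int = 8) -> np.ndarray:
    """Cluster an (N, 2) array of box (width, height) pairs into anchor sizes."""
    km = KMeans(n_clusters=n_anchors, init="k-means++", n_init=10, random_state=0)
    km.fit(wh)
    anchors = km.cluster_centers_
    # Sort by area so anchors can be assigned to heads from Tiny up to Big.
    return anchors[np.argsort(anchors.prod(axis=1))]

# Usage: wh = np.array([[w1, h1], [w2, h2], ...]) parsed from the dataset labels,
# with the resulting centers rounded to integer anchor sizes.
```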
Table 2. Comparison of parameter count and GFLOPs between the SPPCSPC and SG-SPPCSPC modules
| Module | Parameters/M | GFLOPs |
|---|---|---|
| SPPCSPC | 12.8 | 16.2 |
| SG-SPPCSPC | 3.69 | 4.9 |
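For reference, a GhostConv-style block (ref. [20]), of the kind used to slim SPPCSPC down to SG-SPPCSPC, can be sketched in PyTorch as follows. The channel split, kernel sizes, and activation are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch of a GhostConv block: half the output channels come from a standard
# convolution, the other half from a cheap depthwise convolution on them.
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    def __init__(self, c_in: int, c_out: int, k: int = 1, s: int = 1):
        super().__init__()
        c_hidden = c_out // 2
        # Primary convolution produces half of the output channels.
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_hidden, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_hidden),
            nn.SiLU(),
        )
        # Cheap operation: a depthwise conv generates the "ghost" feature maps.
        self.cheap = nn.Sequential(
            nn.Conv2d(c_hidden, c_hidden, 5, 1, 2, groups=c_hidden, bias=False),
            nn.BatchNorm2d(c_hidden),
            nn.SiLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)
```

Because the depthwise branch costs far less than a full convolution, replacing standard convolutions this way is what drives the parameter and GFLOPs reduction reported in Table 2.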
Table 3. Results of ablation experiments
| Model | K-means++ | SimAM | SG-SPPCSPC | Soft NMS | Vis_mAP@0.5/% | RSOD_mAP@0.5/% | Parameters/M | FPS | GFLOPs |
|---|---|---|---|---|---|---|---|---|---|
| A | | | | | 40.89 | 95.60 | 37.6 | 82 | 106.5 |
| B | √ | | | | 44.15 | 96.91 | 37.6 | 82 | 106.5 |
| C | √ | √ | | | 46.40 | 97.22 | 37.6 | 87 | 107.2 |
| D | √ | √ | √ | | 48.61 | 97.91 | 28.5 | 93 | 95.9 |
| E | √ | √ | √ | √ | 51.34 (+10.45) | 98.27 (+2.67) | 28.5 | 93 | 95.9 |
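The SimAM component introduced in rows C to E of the ablation is parameter-free, which is why the parameter count does not grow between rows B and C. A minimal sketch of SimAM (ref. [18]) is shown below; the regularization constant is an illustrative default.

```python
# Sketch of the parameter-free SimAM attention: each neuron is reweighted by
# an energy term computed from channel-wise statistics, adding no parameters.
import torch
import torch.nn as nn

class SimAM(nn.Module):
    def __init__(self, e_lambda: float = 1e-4):
        super().__init__()
        self.e_lambda = e_lambda  # regularizer; illustrative default

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        n = h * w - 1
        # Squared deviation of every position from its channel mean.
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        v = d.sum(dim=(2, 3), keepdim=True) / n
        # Inverse energy: more distinctive neurons receive larger weights.
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5
        return x * torch.sigmoid(e_inv)
```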
Table 4. Comparison of mAP (%) before and after data augmentation
| Model | Original VisDrone | Augmented VisDrone | Original RSOD | Augmented RSOD |
|---|---|---|---|---|
| YOLOv7 | 36.76 | 40.89 | 92.01 | 95.60 |
| SSG-YOLOv7 | 42.63 | 51.34 | 93.82 | 98.27 |
Table 5. Comparison of experimental results
| Method | VisDrone_mAP@0.5/% | VisDrone_mAP@0.5:0.95/% | RSOD_mAP@0.5/% | RSOD_mAP@0.5:0.95/% | FPS | Parameters/M |
|---|---|---|---|---|---|---|
| Faster R-CNN[6] | 20.0 | 8.91 | 85.6 | 54.1 | 43 | 137.10 |
| SSD[23] | 10.2 | 5.1 | 87.4 | 52.6 | 249 | 26.29 |
| YOLOv5s | 27.4 | 15.6 | 94.0 | 59.5 | 126 | 7.28 |
| YOLOv5m | 32.0 | 18.8 | 95.2 | 66.4 | 98 | 21.38 |
| YOLOv5l | 36.5 | 21.5 | 95.1 | 68.3 | 75 | 47.10 |
| YOLOv7[13] | 40.8 | 24.0 | 95.6 | 69.6 | 82 | 37.62 |
| YOLOv8s | 43.1 | 25.0 | 94.1 | 63.0 | 160 | 11.17 |
| YOLOv8m | 39.6 | 22.8 | 94.1 | 68.7 | 122 | 25.90 |
| YOLOv8l | 43.7 | 25.1 | 96.0 | 68.9 | 98 | 43.69 |
| Ours (SSG-YOLOv7) | 51.3 | 29.2 | 98.3 | 70.0 | 93 | 28.49 |
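The final component of the method, Soft NMS (ref. [21]), decays the confidence of overlapping boxes instead of discarding them, which helps retain true positives among dense small targets. A minimal NumPy sketch with a Gaussian penalty follows; sigma and the score threshold are illustrative values.

```python
# Sketch of Soft-NMS with a Gaussian penalty on overlapping box scores.
import numpy as np

def box_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def soft_nms(boxes: np.ndarray, scores: np.ndarray,
             sigma: float = 0.5, score_thresh: float = 0.001) -> list:
    """Return indices of kept boxes; overlapping scores are decayed, not removed."""
    idxs = list(range(len(scores)))
    scores = scores.copy()
    keep = []
    while idxs:
        best = max(idxs, key=lambda i: scores[i])
        keep.append(best)
        idxs.remove(best)
        for i in idxs:
            iou = box_iou(boxes[best], boxes[i])
            scores[i] *= np.exp(-(iou ** 2) / sigma)  # Gaussian decay
        idxs = [i for i in idxs if scores[i] > score_thresh]
    return keep
```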
References
[1] Chen X, Peng D L, Gu Y. Real-time object detection for UAV images based on improved YOLOv5s[J]. Opto-Electron Eng, 2022, 49(3): 210372. doi: 10.12086/oee.2022.210372
[2] Yang S, Wang J, Hu L, et al. Research on occluded object detection by improved RetinaNet[J]. Comput Eng Appl, 2022, 58(11): 209−214. doi: 10.3778/j.issn.1002-8331.2107-0277
[3] Zhan W, Sun C F, Wang M C, et al. An improved Yolov5 real-time detection method for small objects captured by UAV[J]. Soft Comput, 2022, 26(6): 361−373. doi: 10.1007/s00500-021-06407-8
[4] Liu W, Quijano K, Crawford M M. YOLOv5-tassel: detecting tassels in RGB UAV imagery with improved YOLOv5 based on transfer learning[J]. IEEE J Sel Top Appl Earth Obs Remote Sens, 2022, 15: 8085−8094. doi: 10.1109/JSTARS.2022.3206399
[5] Purkait P, Zhao C, Zach C. SPP-Net: deep absolute pose regression with synthetic views[Z]. arXiv: 1712.03452, 2017. https://doi.org/10.48550/arXiv.1712.03452.
[6] Girshick R. Fast R-CNN[C]//Proceedings of 2015 IEEE International Conference on Computer Vision, 2015: 1440–1448. https://doi.org/10.1109/ICCV.2015.169.
[7] Uijlings J R R, van de Sande K E A, Gevers T, et al. Selective search for object recognition[J]. Int J Comput Vis, 2013, 104(2): 154−171. doi: 10.1007/s11263-013-0620-5
[8] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Trans Pattern Anal Mach Intell, 2017, 39(6): 1137−1149. doi: 10.1109/TPAMI.2016.2577031
[9] Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016. https://doi.org/10.1109/CVPR.2016.91.
[10] Yin R H, Zhao W, Fan X D, et al. AF-SSD: an accurate and fast single shot detector for high spatial remote sensing imagery[J]. Sensors, 2020, 20(22): 6530. doi: 10.3390/s20226530
[11] Qi X M, Chai R, Gao Y M. Algorithm of reconstructed SPPCSPC and optimized downsampling for small object detection[J]. Comput Eng Appl, 2023, 59(20): 158−166. doi: 10.3778/j.issn.1002-8331.2305-0004
[12] Shang J C, Wang J S, Liu S B, et al. Small target detection algorithm for UAV aerial photography based on improved YOLOv5s[J]. Electronics, 2023, 12(11): 2434. doi: 10.3390/electronics12112434
[13] Wang C Y, Bochkovskiy A, Liao H Y M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 7464–7475. https://doi.org/10.1109/CVPR52729.2023.00721.
[14] Tang F, Yang F, Tian X Q. Long-distance person detection based on YOLOv7[J]. Electronics, 2023, 12(6): 1502. doi: 10.3390/electronics12061502
[15] Huang T Y, Cheng M, Yang Y L, et al. Tiny object detection based on YOLOv5[C]//Proceedings of the 2022 5th International Conference on Image and Graphics Processing, 2022: 45–50. https://doi.org/10.1145/3512388.3512395.
[16] Ismkhan H. I-k-means-+: an iterative clustering algorithm based on an enhanced version of the k-means[J]. Pattern Recognit, 2018, 79: 402−413. doi: 10.1016/j.patcog.2018.02.015
[17] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017: 6000–6010. https://doi.org/10.5555/3295222.3295349.
[18] Yang L X, Zhang R Y, Li L D, et al. SimAM: a simple, parameter-free attention module for convolutional neural networks[C]//Proceedings of the 38th International Conference on Machine Learning, 2021: 11863–11874.
[19] Lin T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017: 936–944. https://doi.org/10.1109/CVPR.2017.106.
[20] Han K, Wang Y H, Tian Q, et al. GhostNet: more features from cheap operations[C]//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 1577–1586. https://doi.org/10.1109/CVPR42600.2020.00165.
[21] Bodla N, Singh B, Chellappa R, et al. Soft-NMS - improving object detection with one line of code[C]//Proceedings of 2017 IEEE International Conference on Computer Vision, 2017: 5562–5570. https://doi.org/10.1109/ICCV.2017.593.
[22] Du D W, Zhu P F, Wen L Y, et al. VisDrone-DET2019: the vision meets drone object detection in image challenge results[C]//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshop, 2019: 213–226. https://doi.org/10.1109/ICCVW.2019.00030.
[23] Liu W, Anguelov D, Erhan D, et al. SSD: single shot MultiBox detector[C]//Proceedings of the 14th European Conference on Computer Vision, 2016: 21-37. https://doi.org/10.1007/978-3-319-46448-0_2.