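Remote sensing image detection algorithm integrating visual center mechanism and parallel patch perception

Liang Liming (梁礼明); Chen Kangquan (陈康泉); Wang Chengbin (王成斌); Feng Yao (冯耀); Long Pengwei (龙鹏威)

doi:10.12086/oee.2024.240099

- School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
Funding:
National Natural Science Foundation of China (51365017, 61463018); Natural Science Foundation of Jiangxi Province (20192BAB205084); Science and Technology Research Youth Project of the Jiangxi Provincial Department of Education (GJJ2200848)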

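About the authors:
Liang Liming (born 1967), male, master's degree, professor and master's supervisor. His main research interests are machine learning, medical imaging, and system modeling. He has published more than one hundred academic papers, of which more than twenty are indexed by SCI, EI, or ISTP. He holds six Chinese invention patents (as first inventor) and has published one graduate textbook. E-mail: lianglm67@163.com

Chen Kangquan (born 1995), male, master's student. His main research interests are machine learning, pattern recognition, and image processing. E-mail: 1136344152@qq.com

**Corresponding author:** Chen Kangquan, 1136344152@qq.com

CLC number: TP391

Received: 2024-05-01

Revised: 2024-07-10

Accepted: 2024-07-10

Published: 2024-08-20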


Abstract

To address complex background interference, multi-scale differences among targets, and the difficulty of extracting small targets in remote sensing images, this paper proposes a remote sensing image detection algorithm, based on the YOLOv7-tiny model, that integrates a visual center mechanism and parallel patch perception. First, the algorithm introduces an explicit visual center mechanism to establish long-distance dependencies between pixels, enriching the overall semantic information of the image and improving the extraction of target textures. Second, it improves the parallel patch perception module by adjusting the feature extraction receptive field to adapt to different target scales. Third, it designs a multi-scale feature fusion module that fuses multi-layer features efficiently and improves the model's inference speed. On the public RSOD dataset, the proposed algorithm improves precision, recall, and mean average precision over YOLOv7-tiny by 1.5, 2.4, and 2.4 percentage points, respectively. Validation on the NWPU VHR-10 and DOTA datasets indicates that the algorithm also generalizes well, and comparison with other algorithms further demonstrates its performance advantage.
Keywords: remote sensing images; object detection; YOLOv7-tiny; explicit visual center mechanism; parallel patch perception

Overview

Remote sensing images pose three challenges for object detection: complex background interference, multi-scale variation among targets, and the difficulty of extracting small targets. To address them, this paper proposes a remote sensing image detection algorithm based on the YOLOv7-tiny model that integrates a visual center mechanism and parallel patch perception. The algorithm makes three main contributions.

First, it introduces an explicit visual center mechanism. A lightweight multi-layer perceptron establishes long-distance dependencies between pixels and enriches the overall semantic information of the image, including scene structure and contextual detail. At the same time, a trainable visual center aggregates local region information within a layer to capture locally representative features, which further improves the extraction of target textures. Together, these components capture the global features of targets and improve the recognition of target textures and shapes during detection.

Second, the algorithm improves the parallel patch perception module by dynamically adjusting the feature extraction receptive field to suit different target scales. Targets in remote sensing images often differ widely in scale and appear against complex backgrounds, and traditional methods may struggle to distinguish these differences or may ignore them. Adjusting the receptive field lets the algorithm perceive targets of different scales while maintaining high accuracy and a low error rate against complex backgrounds.

Third, the algorithm designs a multi-scale feature fusion module that efficiently integrates multi-level, multi-scale features. The module captures diverse representations of targets and increases inference speed while meeting high-precision detection requirements, which improves the algorithm's effectiveness in static image detection tasks.

On the RSOD dataset, precision, recall, and mean average precision improve over YOLOv7-tiny by 1.5, 2.4, and 2.4 percentage points, respectively. In generalization tests on the NWPU VHR-10 and DOTA datasets, mean average precision increases by 3.0 and 1.3 percentage points, respectively, over the baseline model. These results indicate that the algorithm performs well not only on RSOD but also on datasets covering different object types and scenes. Comparison with other detection algorithms further supports its performance advantage.
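
For reference, the three reported metrics are usually defined as follows in object detection. This block gives the standard definitions and is an assumption about how the paper computes them; the full text may state them differently. Here $TP$, $FP$, and $FN$ are the numbers of true positives, false positives, and false negatives, $P(R)$ is the precision-recall curve of one class, and $N$ is the number of classes. For mAP@0.5, a prediction counts as a true positive when its intersection over union with a ground-truth box is at least 0.5.

```latex
\begin{align}
P &= \frac{TP}{TP + FP}, &
R &= \frac{TP}{TP + FN}, \\
AP &= \int_0^1 P(R)\,\mathrm{d}R, &
mAP &= \frac{1}{N}\sum_{i=1}^{N} AP_i .
\end{align}
```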

Figure 1. Remote sensing image detection model integrating visual center mechanism and parallel patch perception

Figure 2. Explicit visual center mechanism

Figure 3. Parallel multi-branch feature extraction module

Figure 4. Large selective kernel module

Figure 5. Multi-scale feature fusion module

Figure 6. Remote sensing target detection results of different algorithms

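Table 1. Parameter settings

| Parameter | Value |
| --- | --- |
| Input image resolution | 640×640 |
| Initial learning rate | 0.01 |
| Momentum | 0.937 |
| Weight decay coefficient | 0.0005 |
| Training epochs | 300 |
| Batch size | 16 |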
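
The settings in Table 1 can be collected into a single configuration object. The sketch below is illustrative only: the `TrainConfig` class, its field names, and the `steps_per_epoch` helper are hypothetical and not taken from the authors' code. Only the values come from Table 1.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters from Table 1 (class and field names are hypothetical)."""
    image_size: tuple = (640, 640)  # input image resolution, pixels
    initial_lr: float = 0.01        # initial learning rate
    momentum: float = 0.937         # momentum parameter
    weight_decay: float = 0.0005    # weight decay coefficient
    epochs: int = 300               # training epochs
    batch_size: int = 16            # images per batch


def steps_per_epoch(num_images: int, cfg: TrainConfig) -> int:
    """Optimizer steps per epoch, counting a final partial batch."""
    return -(-num_images // cfg.batch_size)  # ceiling division


cfg = TrainConfig()
print(asdict(cfg))
```

With a batch size of 16, a hypothetical training split of 100 images would take `steps_per_epoch(100, cfg)` steps per epoch; the actual split sizes are not given in this section.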

Table 2. Comparison of different attention mechanisms

| Attention | Params/M | FPS | mAP@0.5/% |
| --- | --- | --- | --- |
| CBAM | 11.3 | 92 | 96.1 |
| SE | 11.1 | 90 | 93.4 |
| CA | 11.1 | 92 | 96.2 |
| EMA | 11.1 | 93 | 95.6 |
| LSK | 11.5 | 92 | 96.5 |

Table 3. Effectiveness of decomposing a large kernel into two depth-wise separable convolutions

| (k₁, d₁) | (k₂, d₂) | RF | FPS | mAP@0.5/% |
| --- | --- | --- | --- | --- |
| (3, 1) | (5, 2) | 11 | 104 | 92.0 |
| (5, 1) | (7, 3) | 23 | 105 | 95.0 |
| (7, 1) | (9, 4) | 39 | 88 | 94.7 |
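
The RF column of Table 3 is consistent with the usual receptive-field recurrence for stacked stride-1 convolutions, RF₁ = k₁ and RFᵢ = dᵢ(kᵢ − 1) + RFᵢ₋₁, if each pair (k, d) is read as kernel size and dilation. That reading is an assumption, since this section does not define the symbols. The Python sketch below recomputes the column under it; the function name is illustrative.

```python
def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    `layers` is a sequence of (kernel_size, dilation) pairs applied in order.
    Each layer widens the receptive field by dilation * (kernel_size - 1).
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += dilation * (kernel_size - 1)
    return rf


# (k1, d1), (k2, d2) settings from Table 3
table3_settings = [((3, 1), (5, 2)), ((5, 1), (7, 3)), ((7, 1), (9, 4))]
for first, second in table3_settings:
    print(first, second, receptive_field([first, second]))
```

The three settings give receptive fields of 11, 23, and 39, matching the table. The middle setting has the highest mAP@0.5 (95.0%) and the highest FPS (105) of the three.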

Table 4. Comparison between MFFM and ELAN

| Module | Params/M | FPS | mAP@0.5/% |
| --- | --- | --- | --- |
| ELAN | 6.0 | 88 | 94.6 |
| MFFM | 4.8 | 126 | 94.2 |
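
The trade-off in Table 4 is easier to see as relative changes. The following sketch only re-expresses the two table rows; the dictionary keys and the helper name are illustrative choices.

```python
def relative_change(baseline, new):
    """Relative change from baseline to new, in percent."""
    return (new - baseline) / baseline * 100.0


elan = {"params_m": 6.0, "fps": 88, "map50": 94.6}   # Table 4, ELAN row
mffm = {"params_m": 4.8, "fps": 126, "map50": 94.2}  # Table 4, MFFM row

params_change = relative_change(elan["params_m"], mffm["params_m"])
fps_change = relative_change(elan["fps"], mffm["fps"])
map_delta = mffm["map50"] - elan["map50"]  # difference in percentage points

print(f"parameters: {params_change:+.1f}%")
print(f"FPS: {fps_change:+.1f}%")
print(f"mAP@0.5: {map_delta:+.1f} points")
```

Read this way, MFFM uses about 20% fewer parameters and runs about 43% faster than ELAN, at a cost of 0.4 percentage points of mAP@0.5.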

Table 5. Ablation experiment results

| Model | Precision P/% | Recall R/% | AP aircraft/% | AP oil tank/% | AP overpass/% | AP playground/% | mAP@0.5/% |
| --- | --- | --- | --- | --- | --- | --- | --- |
| M1 | 90.3 | 93.1 | 97.9 | 97.8 | 85.0 | 97.7 | 94.6 |
| M2 | 92.6 | 91.2 | 97.7 | 98.5 | 88.5 | 98.4 | 95.8 |
| M3 | 92.0 | 95.2 | 97.8 | 98.8 | 91.4 | 99.5 | 96.9 |
| M4 | 91.8 | 95.5 | 97.7 | 98.6 | 93.0 | 98.8 | 97.0 |
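
As a consistency check, the mAP@0.5 column of Table 5 can be recomputed as the unweighted mean of the four per-class AP values, and the gains quoted in the abstract can be recovered as the difference between M4 and M1. This section does not define M1 to M4; the sketch assumes M1 is the YOLOv7-tiny baseline and M4 the full model, because their rows match the corresponding rows of Table 6. The names `TABLE5`, `mean_ap`, and `gain` are illustrative.

```python
# Rows of Table 5: precision, recall, per-class AP (aircraft, oil tank,
# overpass, playground), and the reported mAP@0.5, all in percent.
TABLE5 = {
    "M1": {"P": 90.3, "R": 93.1, "AP": [97.9, 97.8, 85.0, 97.7], "mAP": 94.6},
    "M2": {"P": 92.6, "R": 91.2, "AP": [97.7, 98.5, 88.5, 98.4], "mAP": 95.8},
    "M3": {"P": 92.0, "R": 95.2, "AP": [97.8, 98.8, 91.4, 99.5], "mAP": 96.9},
    "M4": {"P": 91.8, "R": 95.5, "AP": [97.7, 98.6, 93.0, 98.8], "mAP": 97.0},
}


def mean_ap(per_class_ap):
    """Unweighted mean of the per-class average precisions."""
    return sum(per_class_ap) / len(per_class_ap)


def gain(metric, baseline="M1", model="M4"):
    """Difference in a reported metric, in percentage points."""
    return round(TABLE5[model][metric] - TABLE5[baseline][metric], 1)


for name, row in TABLE5.items():
    # Recomputed mean next to the reported value for comparison.
    print(name, round(mean_ap(row["AP"]), 1), row["mAP"])

print({metric: gain(metric) for metric in ("P", "R", "mAP")})
```

Under these assumptions the recomputed means agree with the reported mAP@0.5 values to one decimal place, and the M4 minus M1 differences are 1.5, 2.4, and 2.4 points for precision, recall, and mAP@0.5.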

Table 6. Comparison of detection results from different algorithms

| Model | Params/M | FPS | AP aircraft/% | AP oil tank/% | AP overpass/% | AP playground/% | mAP@0.5/% |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Faster R-CNN | 72.0 | 10 | 71.0 | 98.0 | 85.0 | 100.0 | 88.5 |
| SSD | 24.4 | 43 | 79.0 | 98.0 | 73.0 | 100.0 | 87.5 |
| YOLOv3-tiny | 12.1 | 104 | 94.2 | 96.4 | 76.9 | 98.5 | 91.5 |
| YOLOv4-tiny | 6.1 | 50 | 70.7 | 97.3 | 61.7 | 99.1 | 82.4 |
| YOLOv5s | 9.1 | 90 | 97.4 | 97.8 | 87.4 | 99.3 | 95.5 |
| YOLOv5m | 25.0 | 56 | 97.0 | 96.8 | 89.4 | 99.2 | 95.6 |
| YOLOv7-tiny | 6.0 | 88 | 97.9 | 97.8 | 85.0 | 97.7 | 94.6 |
| YOLOv8s | 11.1 | 97 | 97.6 | 97.2 | 82.8 | 99.4 | 94.3 |
| YOLOv8m | 25.8 | 53 | 97.2 | 98.1 | 84.3 | 99.5 | 94.8 |
| Ours | 11.5 | 85 | 97.7 | 98.6 | 93.0 | 98.8 | 97.0 |
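
One way to read the speed and accuracy columns of Table 6 together is to list the models that no other model beats on both FPS and mAP@0.5 at once. This is an illustrative reading of the table rather than an analysis from the paper; it ignores parameter count, and the function name is hypothetical.

```python
# (model, FPS, mAP@0.5 in percent) taken from Table 6
TABLE6 = [
    ("Faster R-CNN", 10, 88.5),
    ("SSD", 43, 87.5),
    ("YOLOv3-tiny", 104, 91.5),
    ("YOLOv4-tiny", 50, 82.4),
    ("YOLOv5s", 90, 95.5),
    ("YOLOv5m", 56, 95.6),
    ("YOLOv7-tiny", 88, 94.6),
    ("YOLOv8s", 97, 94.3),
    ("YOLOv8m", 53, 94.8),
    ("Ours", 85, 97.0),
]


def non_dominated(models):
    """Names of models not beaten on both FPS and mAP by any other model."""
    front = []
    for name, fps, m_ap in models:
        dominated = any(
            other_fps >= fps and other_map >= m_ap
            and (other_fps > fps or other_map > m_ap)
            for other, other_fps, other_map in models
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


best = max(TABLE6, key=lambda row: row[2])
print("highest mAP@0.5:", best[0], best[2])
print("non-dominated on (FPS, mAP@0.5):", non_dominated(TABLE6))
```

On these two columns the proposed model has the highest mAP@0.5 and is not dominated by any faster model.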

Table 7. Comparison of detection results on the NWPU VHR-10 dataset

| Model | Precision P/% | Recall R/% | Params/M | FPS | mAP@0.5/% |
| --- | --- | --- | --- | --- | --- |
| YOLOv7-tiny | 88.7 | 88.4 | 6.0 | 83 | 90.7 |
| Ours | 92.5 | 87.6 | 11.5 | 79 | 93.7 |

Table 8. Comparison of detection results on the DOTA dataset

| Model | Precision P/% | Recall R/% | Params/M | FPS | mAP@0.5/% |
| --- | --- | --- | --- | --- | --- |
| YOLOv7-tiny | 78.2 | 70.4 | 6.0 | 82 | 74.7 |
| Ours | 80.0 | 71.2 | 11.5 | 77 | 76.0 |

