Remote sensing image detection algorithm integrating visual center mechanism and parallel patch perception

Citation: Liang L M, Chen K Q, Wang C B, et al. Remote sensing image detection algorithm integrating visual center mechanism and parallel patch perception[J]. Opto-Electron Eng, 2024, 51(7): 240099. doi: 10.12086/oee.2024.240099

  • Fund Project: Supported by the National Natural Science Foundation of China (51365017, 61463018), the Natural Science Foundation of Jiangxi Province (20192BAB205084), and the Science and Technology Research Youth Project of the Jiangxi Provincial Department of Education (GJJ2200848)
  • Corresponding author: Chen Kangquan, 1136344152@qq.com
  • CLC number: TP391
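  • Abstract: To address complex background interference, multi-scale variation among targets, and the difficulty of extracting tiny targets in remote sensing images, this paper proposes a remote sensing image detection algorithm, based on the YOLOv7-tiny model, that integrates a visual center mechanism and parallel patch perception. First, an explicit visual center mechanism is introduced to build long-range dependencies between pixels, enriching the overall semantic information of the image while improving the extraction of target texture. Second, the parallel patch perception module is improved so that the receptive field used for feature extraction adapts to different target scales. Third, a multi-scale feature fusion module is designed to fuse multi-layer features efficiently and raise inference speed. In experiments on the public RSOD dataset, the precision, recall, and mean average precision (mAP) of the proposed algorithm improve by 1.5%, 2.4%, and 2.4%, respectively, over YOLOv7-tiny. Generalization is further verified on the NWPU VHR-10 and DOTA datasets, where the results indicate strong generalization performance. Comparison with other algorithms further demonstrates the advantage of the proposed algorithm.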

  • Overview: Remote sensing images pose three difficulties for object detection: complex background interference, large scale variation among targets, and small targets that are hard to extract. To address them, this paper proposes a remote sensing image detection algorithm, based on the YOLOv7-tiny model, that integrates a visual center mechanism and parallel patch perception. The algorithm makes three main contributions. First, it introduces an explicit visual center mechanism. A lightweight multi-layer perceptron establishes long-range dependencies between pixels and enriches the overall semantic information of the image, including scene structure and contextual detail, while a trainable visual center aggregates local regional information within a layer to obtain locally representative features, which improves the extraction of target texture. Together these capture the global features of a target and strengthen the recognition of its texture and shape during detection. Second, the algorithm improves the parallel patch perception module by dynamically adjusting the receptive field used for feature extraction, so that it adapts to different target scales, captures feature information at multiple scales, and copes with varied backgrounds. Targets in remote sensing images often differ widely in scale and sit in complex environments, and traditional methods may fail to account for these differences. With a dynamically adjusted receptive field, the algorithm perceives targets of different scales while maintaining high accuracy and a low error rate against complex backgrounds.
Third, the algorithm designs a multi-scale feature fusion module that efficiently integrates multi-level, multi-scale feature information, capturing diverse representations of targets and increasing inference speed while meeting high-precision detection requirements. This fusion improves the algorithm's effectiveness on static image detection tasks. Experimental results on the RSOD dataset show that precision, recall, and mean average precision (mAP) improve by 1.5, 2.4, and 2.4 percentage points, respectively, over YOLOv7-tiny. Generalization experiments on the NWPU VHR-10 and DOTA datasets show mAP gains of 3.0 and 1.3 percentage points, respectively, over the same baseline. These results indicate that the algorithm performs well not only on RSOD but also on datasets covering diverse target types and scenes, demonstrating robust generalization. Comparison with other detection algorithms further confirms the advantage of the proposed algorithm.

  • Figure 1.  Remote sensing image detection model integrating visual center mechanism and parallel patch perception

    Figure 2.  Explicit visual center mechanism

    Figure 3.  Parallel multi-branch feature extraction module

    Figure 4.  Large selective kernel module

    Figure 5.  Multi-scale feature fusion module

    Figure 6.  Remote sensing target detection results of different algorithms

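    Table 1.  Parameter settings

    | Parameter | Value |
    | --- | --- |
    | Input image resolution | 640×640 |
    | Initial learning rate | 0.01 |
    | Momentum | 0.937 |
    | Weight decay | 0.0005 |
    | Training epochs | 300 |
    | Batch size | 16 |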
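The settings in Table 1 can be collected into a single configuration object. The following is a minimal sketch, assuming a plain Python dataclass; the class name `TrainConfig` and its field names are hypothetical and are not taken from the paper or from the YOLOv7 code base.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainConfig:
    """Training settings copied from Table 1 (field names are illustrative)."""

    img_size: int = 640            # input resolution, 640 x 640 pixels
    initial_lr: float = 0.01       # initial learning rate
    momentum: float = 0.937        # momentum parameter
    weight_decay: float = 0.0005   # weight decay coefficient
    epochs: int = 300              # number of training epochs
    batch_size: int = 16           # images per batch

    def __post_init__(self):
        # Basic sanity checks so that a mistyped override fails early.
        if min(self.img_size, self.epochs, self.batch_size) <= 0:
            raise ValueError("img_size, epochs and batch_size must be positive")
        if self.initial_lr <= 0 or self.weight_decay < 0:
            raise ValueError("initial_lr must be > 0 and weight_decay >= 0")
        if not 0.0 <= self.momentum < 1.0:
            raise ValueError("momentum must lie in [0, 1)")


cfg = TrainConfig()
print(asdict(cfg))
```

Keeping the defaults equal to the table values means an experiment that overrides one setting, for example `TrainConfig(batch_size=8)`, documents its deviation from the reported setup explicitly.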

    Table 2.  Comparison of different attention mechanisms

    | Attention | Params/M | FPS | mAP@0.5/% |
    | --- | --- | --- | --- |
    | CBAM | 11.3 | 92 | 96.1 |
    | SE | 11.1 | 90 | 93.4 |
    | CA | 11.1 | 92 | 96.2 |
    | EMA | 11.1 | 93 | 95.6 |
    | LSK | 11.5 | 92 | 96.5 |

    Table 3.  Effectiveness of decomposing a large kernel into a sequence of two depth-wise separable convolutions

    | (k1, d1) | (k2, d2) | RF | FPS | mAP@0.5/% |
    | --- | --- | --- | --- | --- |
    | (3, 1) | (5, 2) | 11 | 104 | 92.0 |
    | (5, 1) | (7, 3) | 23 | 105 | 95.0 |
    | (7, 1) | (9, 4) | 39 | 88 | 94.7 |
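The RF column of Table 3 is consistent with the standard receptive-field recurrence for stacked stride-1 convolutions, in which a layer with kernel size k and dilation d widens the field by d(k - 1). The sketch below assumes that (k, d) denotes kernel size and dilation and that both convolutions use stride 1; the function name `receptive_field` is illustrative.

```python
def receptive_field(layers):
    """Receptive field of a stack of stride-1 convolutions.

    layers: iterable of (kernel_size, dilation) pairs, applied in order.
    """
    rf = 1  # a single pixel before any convolution is applied
    for kernel, dilation in layers:
        # Each stride-1 layer extends the field by dilation * (kernel - 1).
        rf += dilation * (kernel - 1)
    return rf


# (k1, d1), (k2, d2) pairs from the three rows of Table 3
table3_rows = [
    ((3, 1), (5, 2)),
    ((5, 1), (7, 3)),
    ((7, 1), (9, 4)),
]

for first, second in table3_rows:
    print(first, second, receptive_field([first, second]))
```

Under these assumptions the three rows give 11, 23, and 39, matching the RF values listed in the table.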

    Table 4.  Comparison between MFFM and ELAN

    | Module | Params/M | FPS | mAP@0.5/% |
    | --- | --- | --- | --- |
    | ELAN | 6.0 | 88 | 94.6 |
    | MFFM | 4.8 | 126 | 94.2 |

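    Table 5.  Ablation experiment results

    | Model | Precision P/% | Recall R/% | AP/% (aircraft) | AP/% (oil tank) | AP/% (overpass) | AP/% (playground) | mAP@0.5/% |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | M1 | 90.3 | 93.1 | 97.9 | 97.8 | 85.0 | 97.7 | 94.6 |
    | M2 | 92.6 | 91.2 | 97.7 | 98.5 | 88.5 | 98.4 | 95.8 |
    | M3 | 92.0 | 95.2 | 97.8 | 98.8 | 91.4 | 99.5 | 96.9 |
    | M4 | 91.8 | 95.5 | 97.7 | 98.6 | 93.0 | 98.8 | 97.0 |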
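The mAP@0.5 column of Table 5 is the mean of the four per-class AP values. The sketch below recomputes it from the table as a consistency check; the dictionary layout and the function name `mean_average_precision` are illustrative, and a small tolerance is used because the per-class values are themselves rounded to one decimal place.

```python
def mean_average_precision(per_class_ap):
    """Mean of the per-class average precision values (in percent)."""
    return sum(per_class_ap.values()) / len(per_class_ap)


# Per-class AP (%) for the four RSOD classes, copied from Table 5
table5_ap = {
    "M1": {"aircraft": 97.9, "oil tank": 97.8, "overpass": 85.0, "playground": 97.7},
    "M2": {"aircraft": 97.7, "oil tank": 98.5, "overpass": 88.5, "playground": 98.4},
    "M3": {"aircraft": 97.8, "oil tank": 98.8, "overpass": 91.4, "playground": 99.5},
    "M4": {"aircraft": 97.7, "oil tank": 98.6, "overpass": 93.0, "playground": 98.8},
}

# mAP@0.5 (%) as reported in the last column of Table 5
reported_map = {"M1": 94.6, "M2": 95.8, "M3": 96.9, "M4": 97.0}

for model, aps in table5_ap.items():
    computed = mean_average_precision(aps)
    # Agreement is expected only up to rounding of the published values.
    agrees = abs(computed - reported_map[model]) < 0.1
    print(f"{model}: computed {computed:.3f}, reported {reported_map[model]}, agrees: {agrees}")
```

The same check separates the classes that drive the gain: between M1 and M4 the overpass AP rises from 85.0 to 93.0, while the aircraft AP is essentially unchanged.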

    Table 6.  Comparison of detection results from different algorithms

    | Model | Params/M | FPS | AP/% (aircraft) | AP/% (oil tank) | AP/% (overpass) | AP/% (playground) | mAP@0.5/% |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | Faster R-CNN | 72.0 | 10 | 71.0 | 98.0 | 85.0 | 100.0 | 88.5 |
    | SSD | 24.4 | 43 | 79.0 | 98.0 | 73.0 | 100.0 | 87.5 |
    | YOLOv3-tiny | 12.1 | 104 | 94.2 | 96.4 | 76.9 | 98.5 | 91.5 |
    | YOLOv4-tiny | 6.1 | 50 | 70.7 | 97.3 | 61.7 | 99.1 | 82.4 |
    | YOLOv5s | 9.1 | 90 | 97.4 | 97.8 | 87.4 | 99.3 | 95.5 |
    | YOLOv5m | 25.0 | 56 | 97.0 | 96.8 | 89.4 | 99.2 | 95.6 |
    | YOLOv7-tiny | 6.0 | 88 | 97.9 | 97.8 | 85.0 | 97.7 | 94.6 |
    | YOLOv8s | 11.1 | 97 | 97.6 | 97.2 | 82.8 | 99.4 | 94.3 |
    | YOLOv8m | 25.8 | 53 | 97.2 | 98.1 | 84.3 | 99.5 | 94.8 |
    | Ours | 11.5 | 85 | 97.7 | 98.6 | 93.0 | 98.8 | 97.0 |

    Table 7.  Comparison of detection results on the NWPU VHR-10 dataset

    | Model | Precision P/% | Recall R/% | Params/M | FPS | mAP@0.5/% |
    | --- | --- | --- | --- | --- | --- |
    | YOLOv7-tiny | 88.7 | 88.4 | 6.0 | 83 | 90.7 |
    | Ours | 92.5 | 87.6 | 11.5 | 79 | 93.7 |

    Table 8.  Comparison of detection results on the DOTA dataset

    | Model | Precision P/% | Recall R/% | Params/M | FPS | mAP@0.5/% |
    | --- | --- | --- | --- | --- | --- |
    | YOLOv7-tiny | 78.2 | 70.4 | 6.0 | 82 | 74.7 |
    | Ours | 80.0 | 71.2 | 11.5 | 77 | 76.0 |
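The gains quoted in the overview can be reproduced from Tables 5, 7, and 8 as absolute differences in percentage points. The sketch below assumes that row M1 of Table 5 is the YOLOv7-tiny baseline and row M4 is the full model, which is suggested by their per-class AP values matching the YOLOv7-tiny and Ours rows of Table 6; the data layout and the function name `gains` are illustrative.

```python
# Precision P, recall R and mAP@0.5 (all in %) for the baseline and the proposed model
results = {
    "RSOD": {
        "baseline": {"P": 90.3, "R": 93.1, "mAP": 94.6},  # Table 5, row M1 (assumed baseline)
        "ours": {"P": 91.8, "R": 95.5, "mAP": 97.0},      # Table 5, row M4 (assumed full model)
    },
    "NWPU VHR-10": {
        "baseline": {"P": 88.7, "R": 88.4, "mAP": 90.7},  # Table 7
        "ours": {"P": 92.5, "R": 87.6, "mAP": 93.7},
    },
    "DOTA": {
        "baseline": {"P": 78.2, "R": 70.4, "mAP": 74.7},  # Table 8
        "ours": {"P": 80.0, "R": 71.2, "mAP": 76.0},
    },
}


def gains(dataset):
    """Absolute change (percentage points) of each metric over the baseline."""
    base = results[dataset]["baseline"]
    ours = results[dataset]["ours"]
    # Round to one decimal to match the precision of the published tables.
    return {metric: round(ours[metric] - base[metric], 1) for metric in base}


for name in results:
    print(name, gains(name))
```

Under this reading the differences are 1.5, 2.4, and 2.4 points on RSOD, and the mAP gains are 3.0 and 1.3 points on NWPU VHR-10 and DOTA. The same calculation shows that recall on NWPU VHR-10 is 0.8 points lower than the baseline, so the mAP gain on that dataset comes with a small recall cost.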

Publication history
Received:  2024-05-01
Revised:  2024-07-10
Accepted:  2024-07-10
Published:  2024-08-20
