Small object detection based on multi-scale feature fusion using remote sensing images

Citation: Ma L, Gou Y T, Lei T, et al. Small object detection based on multi-scale feature fusion using remote sensing images[J]. Opto-Electron Eng, 2022, 49(4): 210363. doi: 10.12086/oee.2022.210363

Small object detection based on multi-scale feature fusion using remote sensing images

    Corresponding author: Lei Tao, taoleiyan@ioe.ac.cn
  • CLC number: TP751
  • Abstract: This paper proposes a robust small object detection method for remote sensing images based on multi-scale feature fusion. Commonly used feature extraction networks carry a huge number of parameters, excessive downsampling can cause small objects to vanish, and models pretrained on natural images may suffer a feature gap when transferred directly to remote sensing images. Therefore, according to the size distribution of all objects in the dataset (i.e., prior knowledge), we first propose a lightweight feature extraction module based on a dynamic selection mechanism, which allows each neuron to adaptively allocate the receptive field size used for detection according to the object scale, and enables the model to be trained quickly from scratch. Second, features at different scales carry different amounts of information with different emphases, so we propose an FPN (feature pyramid networks) module based on adaptive feature weighted fusion, which uses grouped convolution to divide all feature channels into mutually independent groups, improving the accuracy of the image feature representation. In addition, deep learning is data-driven, and remote sensing small object datasets are scarce; we therefore built a remote sensing small object dataset of aircraft and processed the plane and small-vehicle objects in the DOTA dataset so that their size distribution meets the requirements of the small object detection task. Experimental results show that, compared with most mainstream detection methods, our method achieves better results on both DOTA and the self-built dataset.

  • Overview: In recent years, with the continuous development of remote sensing optical technology, the acquisition of large numbers of high-resolution remote sensing images has advanced environmental monitoring, animal protection, and national defense applications. Among the many visual tasks on remote sensing images, aircraft detection is of great significance for both civil and defense purposes, so research on small object detection in remote sensing imagery is important. Object detection methods based on deep learning have achieved excellent results on large and medium objects, but their performance on remote sensing small objects remains poor. The main reasons are the following: 1) the models are huge and real-time performance is poor; 2) remote sensing images are complicated and the object scale distribution is wide; 3) remote sensing small object detection datasets are extremely scarce.

    To solve these problems, this paper proposes a robust small object detection method based on multi-scale feature fusion for remote sensing images. The main work is as follows. First, because an image is downsampled and convolved many times after being fed into common backbone networks (such as ResNet and VGG-16), the features of small objects are severely weakened, which hurts the final detection accuracy. To this end, according to the distribution of all object sizes in the dataset (i.e., prior knowledge), we propose a lightweight feature extraction module based on a dynamic selection mechanism, which allows each neuron to adaptively allocate the receptive field size used for detection and to control the number of downsampling steps according to the scale of the objects. Second, although FPN is widely used to mitigate missed detections of small objects, features at different scales usually carry different amounts of information with different emphases. Therefore, we propose an FPN module based on adaptive feature weighted fusion, which uses grouped convolution to divide all feature channels into mutually independent groups, further improving the accuracy of the image feature representation. Third, to address the lack of remote sensing small object datasets, we built a remote sensing small object dataset of aircraft and processed the plane and small-vehicle objects in the DOTA-1.5 dataset so that their size distribution meets the requirements of small object detection. Finally, experimental results on DOTA and the self-built dataset show that our method achieves the best results compared with mainstream detection methods.
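The adaptive feature weighted fusion described above can be sketched in a few lines. This is a minimal NumPy illustration under our own assumptions — the softmax group weighting, the group count of 3, and the fusion factor value 0.87 are illustrative choices, not the paper's exact formulation:

```python
import numpy as np

def grouped_channel_weights(feat, groups=3):
    """Per-group weights via global average pooling plus a softmax over
    groups -- a rough sketch of the grouped-convolution weighting idea."""
    c, h, w = feat.shape
    assert c % groups == 0
    g = feat.reshape(groups, c // groups, h, w)
    desc = g.mean(axis=(1, 2, 3))          # one descriptor per group
    e = np.exp(desc - desc.max())
    w_grp = e / e.sum()                    # softmax over the groups
    return np.repeat(w_grp, c // groups)   # expand back to per-channel

def fuse(shallow, deep_up, alpha=0.87, groups=3):
    """Weighted FPN fusion: scale the upsampled deep feature by the
    fusion factor alpha, add the lateral (shallow) feature, then
    reweight the result group by group."""
    fused = shallow + alpha * deep_up
    w = grouped_channel_weights(fused, groups)
    return fused * w[:, None, None]

rng = np.random.default_rng(0)
p_shallow = rng.standard_normal((6, 4, 4))   # toy 6-channel, 4x4 maps
p_deep = rng.standard_normal((6, 4, 4))
out = fuse(p_shallow, p_deep)
print(out.shape)                             # (6, 4, 4)
```

Group-wise weighting couples the channels inside a group while leaving the groups independent of one another, which is the property the adaptive fusion module exploits to sharpen the feature representation.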

    Figure 1.  Complex background in remote sensing images

    Figure 2.  Network framework

    Figure 3.  Network structure

    Figure 4.  Feature weighting method based on grouped convolution

    Figure 5.  (a) Schematic diagram of the receptive field of a convolutional network; (b) Object classification strategy based on the receptive field

    Figure 6.  Object scale distribution of the datasets

    Figure 7.  Samples of plane and small-vehicle images from the DOTA dataset used in the experiments. (a) Training set; (b) Testing set

    Figure 8.  Object cut-and-paste flow diagram

    Figure 9.  Loss curve of the network trained on the DOTA plane training set

    Figure 10.  Loss curve of the network trained on the DOTA small-vehicle training set

    Figure 11.  Partial plane detection results

    Figure 12.  Partial small-vehicle detection results

    Figure 13.  Model convergence under different initial values of the fusion factor

    Table 1.  Parameters of different networks

    | Model     | Parameters/M |
    |-----------|--------------|
    | VGG16     | 138          |
    | ResNet50  | 25.6         |
    | ResNet101 | 44.6         |
    | Ours      | 0.49         |

    Table 2.  Receptive field of each feature map and the corresponding object size parameters

    | Pyramid levels | Level | Object size | Downsampling | Receptive field | RF stride | RF / object size |
    |----------------|-------|-------------|--------------|-----------------|-----------|------------------|
    | Two            | 1     | 6~25        | 4            | 55              | 4         | 3.5              |
    | Two            | 2     | 25~50       | 8            | 95              | 8         | 2.5              |
    | Three          | 1     | 6~10        | 2            | 23              | 2         | 2.9              |
    | Three          | 2     | 10~20       | 4            | 47              | 4         | 3.1              |
    | Three          | 3     | 20~50       | 8            | 79              | 8         | 2.3              |
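The receptive-field figures in Table 2 follow the standard recurrence for a chain of convolutions: the field grows by (kernel − 1) times the cumulative stride at each layer. A small sketch; the example layer stack is hypothetical (the paper's exact backbone layout is not reproduced here), though it happens to yield the first three-level row of Table 2:

```python
def receptive_field(layers):
    """Receptive field size and stride ("jump") after a chain of
    (kernel_size, stride) conv layers, via the standard recurrence."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump   # growth scaled by the cumulative stride
        jump *= s              # cumulative stride so far
    return rf, jump

# hypothetical stack: one stride-2 3x3 conv followed by five 3x3 convs
rf, stride = receptive_field([(3, 2)] + [(3, 1)] * 5)
print(rf, stride)            # 23 2
# ratio of receptive field to the mean object size of the 6~10 bin
print(round(rf / 8, 1))      # 2.9
```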

    Table 3.  Detection results of different feature fusion schemes

    | Pyramid levels | Basic unit | FPN | mAP  | Precision | Recall |
    |----------------|------------|-----|------|-----------|--------|
    | Two            | B_11       | –   | 86.8 | 76.8      | 88.9   |
    | Two            | B_11       | ✓   | 87.2 | 82.6      | 88.8   |
    | Three          | B_10       | –   | 87.4 | 47.4      | 91.8   |
    | Three          | B_10       | ✓   | 88.5 | 83.8      | 90.4   |

    Table 4.  DOTA plane dataset test results under different network configurations

    | B_13 | FPN | Groups (3) | Per-channel | Const. fusion factor [0.71, 0.87] | mAP  | Precision | Recall |
    |------|-----|------------|-------------|-----------------------------------|------|-----------|--------|
    | ✓    | –   | –          | –           | –                                 | 80.5 | 63.4      | 82.8   |
    | ✓    | ✓   | –          | –           | –                                 | 82.0 | 81.4      | 85.1   |
    | ✓    | ✓   | –          | ✓           | –                                 | 82.3 | 85.1      | 84.5   |
    | ✓    | ✓   | ✓          | –           | –                                 | 83.6 | 85.5      | 87.0   |
    | ✓    | ✓   | –          | –           | ✓                                 | 82.5 | 82.3      | 85.6   |

    Table 5.  DOTA small-vehicle dataset test results under different network configurations

    | B_12 | FPN | Groups (3) | Per-channel | Const. fusion factor [0.63, 1.28] | mAP  | Precision | Recall |
    |------|-----|------------|-------------|-----------------------------------|------|-----------|--------|
    | ✓    | –   | –          | –           | –                                 | 63.7 | 56.8      | 73.9   |
    | ✓    | ✓   | –          | –           | –                                 | 65.9 | 86.1      | 68.5   |
    | ✓    | ✓   | –          | ✓           | –                                 | 66.3 | 83.3      | 68.9   |
    | ✓    | ✓   | ✓          | –           | –                                 | 68.7 | 86.4      | 71.7   |
    | ✓    | ✓   | –          | –           | ✓                                 | 64.4 | 84.0      | 67.3   |

    Table 6.  Test results on the self-built dataset under different network configurations

    | B_10 | FPN | Groups (3) | Per-channel | Const. fusion factor [1.08, 1.05] | mAP              | Precision        | Recall           |
    |------|-----|------------|-------------|-----------------------------------|------------------|------------------|------------------|
    | ✓    | –   | –          | –           | –                                 | 89.9             | 44.2             | 93.7             |
    | ✓    | ✓   | –          | –           | –                                 | 90.2             | 83.6             | 91.4             |
    | ✓    | ✓   | –          | ✓           | –                                 | 90.6             | 84.8             | 92.0             |
    | ✓    | ✓   | ✓          | –           | –                                 | 91.0             | 87.7             | 92.4             |
    | ✓    | ✓   | –          | –           | ✓                                 | did not converge | did not converge | did not converge |

    Table 7.  Statistics of the number of objects at each scale in the datasets

    | Dataset                           | Scale       | Object count | Const. fusion factor (S_{i+1}/S_i) |
    |-----------------------------------|-------------|--------------|------------------------------------|
    | DOTA plane training set           | S1: [6-12]  | 5503         | S2/S1 = 0.87                       |
    |                                   | S2: [12-30] | 4807         |                                    |
    |                                   | S3: [30-70] | 3428         | S3/S2 = 0.71                       |
    | DOTA small-vehicle training set   | S1: [6-15]  | 59875        | S2/S1 = 1.28                       |
    |                                   | S2: [15-25] | 76615        |                                    |
    |                                   | S3: [25-60] | 48203        | S3/S2 = 0.63                       |
    | Self-built training set           | S1: [6-10]  | 4963         | S2/S1 = 1.05                       |
    |                                   | S2: [10-20] | 5196         |                                    |
    |                                   | S3: [20-50] | 5625         | S3/S2 = 1.08                       |
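The constant fusion factors in Table 7 are just the ratios of adjacent-scale object counts. A one-line sanity check (the helper name is ours):

```python
def constant_fusion_factors(counts):
    """Constant fusion factors between adjacent pyramid levels,
    defined as the object-count ratio S_{i+1} / S_i (cf. Table 7)."""
    return [round(counts[i + 1] / counts[i], 2)
            for i in range(len(counts) - 1)]

# object counts S1..S3 for the DOTA plane training set (Table 7)
print(constant_fusion_factors([5503, 4807, 3428]))     # [0.87, 0.71]
# DOTA small-vehicle training set
print(constant_fusion_factors([59875, 76615, 48203]))  # [1.28, 0.63]
```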

    Table 8.  Influence of the initial value of the fusion factor on detection performance

    | Fusion factor initial value | mAP  |
    |-----------------------------|------|
    | 1                           | 83.6 |
    | Random initialization       | 80.7 |

    Table 9.  Influence of CBAM and the adaptive fusion module on detection performance

    | Model + dataset                                   | mAP  | Precision | Recall | Inference speed/(s/image) |
    |---------------------------------------------------|------|-----------|--------|---------------------------|
    | B_10+FPN+CBAM (self-built dataset)                | 90.5 | 83.8      | 90.6   | 0.036                     |
    | B_10+FPN+adaptive fusion (self-built dataset)     | 91.0 | 87.7      | 92.4   | 0.027                     |
    | B_13+FPN+CBAM (DOTA plane dataset)                | 83.0 | 82.6      | 85.8   | 0.048                     |
    | B_13+FPN+adaptive fusion (DOTA plane dataset)     | 83.6 | 85.5      | 87.0   | 0.037                     |
    | B_12+FPN+CBAM (DOTA small-vehicle dataset)        | 67.6 | 83.0      | 71.1   | 0.043                     |
    | B_12+FPN+adaptive fusion (DOTA small-vehicle dataset) | 68.7 | 83.3  | 71.7   | 0.034                     |

    Table 10.  Comparison of the detection performance of different methods

    | Method       | DOTA plane (mAP) | DOTA small-vehicle (mAP) | Self-built dataset (mAP) |
    |--------------|------------------|--------------------------|--------------------------|
    | SSD          | 63.4             | 43.3                     | 64.4                     |
    | RetinaNet    | 55.2             | 45.1                     | 62.7                     |
    | Yolov3-tiny  | 70.8             | 58.3                     | 74.3                     |
    | Faster R-CNN | 73.0             | 59.0                     | 88.6                     |
    | Ours         | 83.6             | 68.7                     | 91.0                     |

    Table 11.  Performance of the FPN module based on adaptive feature weighted fusion on Faster R-CNN

    | Backbone + dataset                                | mAP  |
    |---------------------------------------------------|------|
    | ResNet50+FPN (self-built dataset)                 | 88.6 |
    | ResNet50+adaptive fusion (self-built dataset)     | 89.7 |
    | ResNet50+FPN (DOTA plane dataset)                 | 73.0 |
    | ResNet50+adaptive fusion (DOTA plane dataset)     | 73.8 |
    | ResNet50+FPN (DOTA small-vehicle dataset)         | 59.0 |
    | ResNet50+adaptive fusion (DOTA small-vehicle dataset) | 63.2 |

Publication history
  • Received: 2021-11-15
  • Revised: 2022-01-06
  • Published: 2022-04-25