Small object detection based on multi-scale feature fusion using remote sensing images

Citation: Ma L, Gou Y T, Lei T, et al. Small object detection based on multi-scale feature fusion using remote sensing images[J]. Opto-Electron Eng, 2022, 49(4): 210363. doi: 10.12086/oee.2022.210363

Small object detection based on multi-scale feature fusion using remote sensing images

    Corresponding author: Lei Tao, taoleiyan@ioe.ac.cn
  • CLC number: TP751
  • Abstract: This paper proposes a robust small object detection method for remote sensing images based on multi-scale feature fusion. Commonly used feature extraction networks carry a huge number of parameters, excessive downsampling can cause small objects to vanish, and models pretrained on natural images may suffer a feature gap when transferred directly to remote sensing images. Therefore, according to the size distribution of all objects in the dataset (i.e., prior knowledge), we first propose a lightweight feature extraction module based on a dynamic selection mechanism, which allows each neuron to adaptively allocate the receptive field size used for detection according to the object scale, and enables the model to be trained quickly from scratch. Second, features at different scales carry different amounts of information with different emphases, so we propose an FPN (feature pyramid networks) module based on adaptive feature weighted fusion, which uses grouped convolution to divide all feature channels into mutually independent groups, improving the accuracy of the image feature representation. In addition, deep learning is data-driven, and remote sensing small object datasets are scarce; we therefore built a remote sensing small object dataset of aircraft and processed the plane and small-vehicle objects in the DOTA dataset so that their size distribution meets the requirements of the small object detection task. Experimental results show that, compared with most mainstream detection methods, our method achieves better results on both DOTA and the self-built dataset.

  • Overview: In recent years, with the continuous development of remote sensing optical technology, the acquisition of large numbers of high-resolution remote sensing images has advanced environmental monitoring, animal protection, and national defense applications. Among the many visual tasks on remote sensing images, aircraft detection is of great significance for both civil and defense purposes, so research on small object detection in remote sensing imagery is important. Object detection methods based on deep learning have achieved excellent results on large and medium objects, but their performance on remote sensing small objects remains poor. The main reasons are the following: 1) the models are huge and real-time performance is poor; 2) remote sensing images are complicated and the object scale distribution is wide; 3) remote sensing small object detection datasets are extremely scarce.

    To solve these problems, this paper proposes a robust small object detection method based on multi-scale feature fusion for remote sensing images. The main work is as follows. First, because an image is downsampled and convolved many times after being fed into common backbone networks (such as ResNet and VGG-16), the features of small objects are severely weakened, which hurts the final detection accuracy. To this end, according to the distribution of all object sizes in the dataset (i.e., prior knowledge), we propose a lightweight feature extraction module based on a dynamic selection mechanism, which allows each neuron to adaptively allocate the receptive field size used for detection and to control the number of downsampling steps according to the scale of the objects. Second, although FPN is widely used to mitigate missed detections of small objects, features at different scales usually carry different amounts of information with different emphases. Therefore, we propose an FPN module based on adaptive feature weighted fusion, which uses grouped convolution to divide all feature channels into mutually independent groups, further improving the accuracy of the image feature representation. Third, to address the lack of remote sensing small object datasets, we built a remote sensing small object dataset of aircraft and processed the plane and small-vehicle objects in the DOTA-1.5 dataset so that their size distribution meets the requirements of small object detection. Finally, experimental results on DOTA and the self-built dataset show that our method achieves the best results compared with mainstream detection methods.
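The adaptive feature weighted fusion described above can be sketched in a few lines. This is a minimal NumPy illustration under our own assumptions — the softmax group weighting, the group count of 3, and the fusion factor value 0.87 are illustrative choices, not the paper's exact formulation:

```python
import numpy as np

def grouped_channel_weights(feat, groups=3):
    """Per-group weights via global average pooling plus a softmax over
    groups -- a rough sketch of the grouped-convolution weighting idea."""
    c, h, w = feat.shape
    assert c % groups == 0
    g = feat.reshape(groups, c // groups, h, w)
    desc = g.mean(axis=(1, 2, 3))          # one descriptor per group
    e = np.exp(desc - desc.max())
    w_grp = e / e.sum()                    # softmax over the groups
    return np.repeat(w_grp, c // groups)   # expand back to per-channel

def fuse(shallow, deep_up, alpha=0.87, groups=3):
    """Weighted FPN fusion: scale the upsampled deep feature by the
    fusion factor alpha, add the lateral (shallow) feature, then
    reweight the result group by group."""
    fused = shallow + alpha * deep_up
    w = grouped_channel_weights(fused, groups)
    return fused * w[:, None, None]

rng = np.random.default_rng(0)
p_shallow = rng.standard_normal((6, 4, 4))   # toy 6-channel, 4x4 maps
p_deep = rng.standard_normal((6, 4, 4))
out = fuse(p_shallow, p_deep)
print(out.shape)                             # (6, 4, 4)
```

Group-wise weighting couples the channels inside a group while leaving the groups independent of one another, which is the property the adaptive fusion module exploits to sharpen the feature representation.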

    Figure 1.  Complex background in remote sensing images

    Figure 2.  Network framework

    Figure 3.  Network structure

    Figure 4.  Feature weighting method based on grouped convolution

    Figure 5.  (a) Schematic diagram of the receptive field of a convolutional network; (b) Object classification strategy based on the receptive field

    Figure 6.  Object scale distribution of the datasets

    Figure 7.  Samples of plane and small-vehicle images from the DOTA dataset used in the experiments. (a) Training set; (b) Testing set

    Figure 8.  Object cut-and-paste flow diagram

    Figure 9.  Loss curve of the network trained on the DOTA plane training set

    Figure 10.  Loss curve of the network trained on the DOTA small-vehicle training set

    Figure 11.  Partial plane detection results

    Figure 12.  Partial small-vehicle detection results

    Figure 13.  Model convergence under different initial values of the fusion factor

    Table 1.  Parameters of different networks

    | Model     | Parameters/M |
    |-----------|--------------|
    | VGG16     | 138          |
    | ResNet50  | 25.6         |
    | ResNet101 | 44.6         |
    | Ours      | 0.49         |

    Table 2.  Receptive field of each feature map and the corresponding object size parameters

    | Pyramid levels | Level | Object size | Downsampling | Receptive field | RF stride | RF / object size |
    |----------------|-------|-------------|--------------|-----------------|-----------|------------------|
    | Two            | 1     | 6~25        | 4            | 55              | 4         | 3.5              |
    | Two            | 2     | 25~50       | 8            | 95              | 8         | 2.5              |
    | Three          | 1     | 6~10        | 2            | 23              | 2         | 2.9              |
    | Three          | 2     | 10~20       | 4            | 47              | 4         | 3.1              |
    | Three          | 3     | 20~50       | 8            | 79              | 8         | 2.3              |
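The receptive-field figures in Table 2 follow the standard recurrence for a chain of convolutions: the field grows by (kernel − 1) times the cumulative stride at each layer. A small sketch; the example layer stack is hypothetical (the paper's exact backbone layout is not reproduced here), though it happens to yield the first three-level row of Table 2:

```python
def receptive_field(layers):
    """Receptive field size and stride ("jump") after a chain of
    (kernel_size, stride) conv layers, via the standard recurrence."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump   # growth scaled by the cumulative stride
        jump *= s              # cumulative stride so far
    return rf, jump

# hypothetical stack: one stride-2 3x3 conv followed by five 3x3 convs
rf, stride = receptive_field([(3, 2)] + [(3, 1)] * 5)
print(rf, stride)            # 23 2
# ratio of receptive field to the mean object size of the 6~10 bin
print(round(rf / 8, 1))      # 2.9
```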

    Table 3.  Detection results of different feature fusion schemes

    | Pyramid levels | Basic unit | FPN | mAP  | Precision | Recall |
    |----------------|------------|-----|------|-----------|--------|
    | Two            | B_11       | –   | 86.8 | 76.8      | 88.9   |
    | Two            | B_11       | ✓   | 87.2 | 82.6      | 88.8   |
    | Three          | B_10       | –   | 87.4 | 47.4      | 91.8   |
    | Three          | B_10       | ✓   | 88.5 | 83.8      | 90.4   |

    Table 4.  DOTA plane dataset test results under different network configurations

    | B_13 | FPN | Groups (3) | Per-channel | Const. fusion factor [0.71, 0.87] | mAP  | Precision | Recall |
    |------|-----|------------|-------------|-----------------------------------|------|-----------|--------|
    | ✓    | –   | –          | –           | –                                 | 80.5 | 63.4      | 82.8   |
    | ✓    | ✓   | –          | –           | –                                 | 82.0 | 81.4      | 85.1   |
    | ✓    | ✓   | –          | ✓           | –                                 | 82.3 | 85.1      | 84.5   |
    | ✓    | ✓   | ✓          | –           | –                                 | 83.6 | 85.5      | 87.0   |
    | ✓    | ✓   | –          | –           | ✓                                 | 82.5 | 82.3      | 85.6   |

    Table 5.  DOTA small-vehicle dataset test results under different network configurations

    | B_12 | FPN | Groups (3) | Per-channel | Const. fusion factor [0.63, 1.28] | mAP  | Precision | Recall |
    |------|-----|------------|-------------|-----------------------------------|------|-----------|--------|
    | ✓    | –   | –          | –           | –                                 | 63.7 | 56.8      | 73.9   |
    | ✓    | ✓   | –          | –           | –                                 | 65.9 | 86.1      | 68.5   |
    | ✓    | ✓   | –          | ✓           | –                                 | 66.3 | 83.3      | 68.9   |
    | ✓    | ✓   | ✓          | –           | –                                 | 68.7 | 86.4      | 71.7   |
    | ✓    | ✓   | –          | –           | ✓                                 | 64.4 | 84.0      | 67.3   |

    Table 6.  Test results on the self-built dataset under different network configurations

    | B_10 | FPN | Groups (3) | Per-channel | Const. fusion factor [1.08, 1.05] | mAP              | Precision        | Recall           |
    |------|-----|------------|-------------|-----------------------------------|------------------|------------------|------------------|
    | ✓    | –   | –          | –           | –                                 | 89.9             | 44.2             | 93.7             |
    | ✓    | ✓   | –          | –           | –                                 | 90.2             | 83.6             | 91.4             |
    | ✓    | ✓   | –          | ✓           | –                                 | 90.6             | 84.8             | 92.0             |
    | ✓    | ✓   | ✓          | –           | –                                 | 91.0             | 87.7             | 92.4             |
    | ✓    | ✓   | –          | –           | ✓                                 | did not converge | did not converge | did not converge |

    Table 7.  Statistics of the number of objects at each scale in the datasets

    | Dataset                           | Scale       | Object count | Const. fusion factor (S_{i+1}/S_i) |
    |-----------------------------------|-------------|--------------|------------------------------------|
    | DOTA plane training set           | S1: [6-12]  | 5503         | S2/S1 = 0.87                       |
    |                                   | S2: [12-30] | 4807         |                                    |
    |                                   | S3: [30-70] | 3428         | S3/S2 = 0.71                       |
    | DOTA small-vehicle training set   | S1: [6-15]  | 59875        | S2/S1 = 1.28                       |
    |                                   | S2: [15-25] | 76615        |                                    |
    |                                   | S3: [25-60] | 48203        | S3/S2 = 0.63                       |
    | Self-built training set           | S1: [6-10]  | 4963         | S2/S1 = 1.05                       |
    |                                   | S2: [10-20] | 5196         |                                    |
    |                                   | S3: [20-50] | 5625         | S3/S2 = 1.08                       |
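The constant fusion factors in Table 7 are just the ratios of adjacent-scale object counts. A one-line sanity check (the helper name is ours):

```python
def constant_fusion_factors(counts):
    """Constant fusion factors between adjacent pyramid levels,
    defined as the object-count ratio S_{i+1} / S_i (cf. Table 7)."""
    return [round(counts[i + 1] / counts[i], 2)
            for i in range(len(counts) - 1)]

# object counts S1..S3 for the DOTA plane training set (Table 7)
print(constant_fusion_factors([5503, 4807, 3428]))     # [0.87, 0.71]
# DOTA small-vehicle training set
print(constant_fusion_factors([59875, 76615, 48203]))  # [1.28, 0.63]
```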

    Table 8.  Influence of the initial value of the fusion factor on detection performance

    | Fusion factor initial value | mAP  |
    |-----------------------------|------|
    | 1                           | 83.6 |
    | Random initialization       | 80.7 |

    Table 9.  Influence of CBAM and the adaptive fusion module on detection performance

    | Model + dataset                                   | mAP  | Precision | Recall | Inference speed/(s/image) |
    |---------------------------------------------------|------|-----------|--------|---------------------------|
    | B_10+FPN+CBAM (self-built dataset)                | 90.5 | 83.8      | 90.6   | 0.036                     |
    | B_10+FPN+adaptive fusion (self-built dataset)     | 91.0 | 87.7      | 92.4   | 0.027                     |
    | B_13+FPN+CBAM (DOTA plane dataset)                | 83.0 | 82.6      | 85.8   | 0.048                     |
    | B_13+FPN+adaptive fusion (DOTA plane dataset)     | 83.6 | 85.5      | 87.0   | 0.037                     |
    | B_12+FPN+CBAM (DOTA small-vehicle dataset)        | 67.6 | 83.0      | 71.1   | 0.043                     |
    | B_12+FPN+adaptive fusion (DOTA small-vehicle dataset) | 68.7 | 83.3  | 71.7   | 0.034                     |

    Table 10.  Comparison of the detection performance of different methods

    | Method       | DOTA plane (mAP) | DOTA small-vehicle (mAP) | Self-built dataset (mAP) |
    |--------------|------------------|--------------------------|--------------------------|
    | SSD          | 63.4             | 43.3                     | 64.4                     |
    | RetinaNet    | 55.2             | 45.1                     | 62.7                     |
    | Yolov3-tiny  | 70.8             | 58.3                     | 74.3                     |
    | Faster R-CNN | 73.0             | 59.0                     | 88.6                     |
    | Ours         | 83.6             | 68.7                     | 91.0                     |

    Table 11.  Performance of the FPN module based on adaptive feature weighted fusion on Faster R-CNN

    | Backbone + dataset                                | mAP  |
    |---------------------------------------------------|------|
    | ResNet50+FPN (self-built dataset)                 | 88.6 |
    | ResNet50+adaptive fusion (self-built dataset)     | 89.7 |
    | ResNet50+FPN (DOTA plane dataset)                 | 73.0 |
    | ResNet50+adaptive fusion (DOTA plane dataset)     | 73.8 |
    | ResNet50+FPN (DOTA small-vehicle dataset)         | 59.0 |
    | ResNet50+adaptive fusion (DOTA small-vehicle dataset) | 63.2 |

Publication history
  • Received: 2021-11-15
  • Revised: 2022-01-06
  • Published: 2022-04-25