A multi-target semantic segmentation method for millimetre-wave SAR images based on a dual-branch multi-scale fusion network
-
Keywords:
- millimetre-wave synthetic aperture radar
- contraband detection
- deep learning
- semantic segmentation
- dual-branch multi-scale fusion network
Abstract: There are several major challenges in the detection and identification of contraband in millimetre-wave synthetic aperture radar (SAR) security imaging: small target sizes, partially occluded targets, and overlap between multiple targets, all of which hinder the accurate identification of contraband. To address these problems, a contraband detection method based on a dual-branch multi-scale fusion network (DBMFnet) is proposed. The overall architecture of the DBMFnet follows the encoder-decoder framework. In the encoder stage, a dual-branch parallel feature extraction network (DBPFEN) is proposed to enhance feature extraction. In the decoder stage, a multi-scale fusion module (MSFM) is proposed to enhance the detection ability for targets. The experimental results show that the proposed method outperforms existing semantic segmentation methods in mean intersection over union (mIoU) and reduces the incidence of missed and false detections of targets.
-
Overview: With the advancement of millimetre-wave technology, millimetre-wave security inspection systems have reached a high level of maturity. Compared with traditional security inspection technologies such as X-ray, infrared, and metal detectors, millimetre-wave security imaging not only enables the detection of metallic objects hidden under fabrics, but also identifies dangerous items such as plastic firearms, knives, and explosives. Crucially, millimetre waves are non-ionizing and do not harm the human body. Millimetre-wave security inspection therefore yields precise image information and significantly reduces false alarms, which is why millimetre-wave imaging equipment is extensively employed in human-body security inspection.
There are several major challenges in the detection and identification of contraband in millimetre-wave synthetic aperture radar (SAR) security imaging: small target sizes, partially occluded targets, and overlap between multiple targets, all of which hinder the accurate identification of contraband. To address these problems, a contraband detection method based on a dual-branch multi-scale fusion network (DBMFnet) is proposed. The overall architecture of the DBMFnet follows the encoder-decoder framework. In the encoder stage, a dual-branch parallel feature extraction network (DBPFEN) is proposed to enhance feature extraction. During feature extraction, one branch preserves the high resolution while the other extracts rich semantic information through multiple downsampling operations. Bilateral connections are established between the high-resolution and low-resolution branches to enable repeated feature exchange, so that the feature maps of the two branches are fused across different scales. This combination of rich semantic information and fine-grained detail improves the detection of small and mutually interfering targets in the images. In the decoder stage, a multi-scale fusion module (MSFM) is proposed to enhance the detection ability for targets. The module is built around the feature alignment module (FAM), which merges multiple low-resolution feature maps into the high-resolution map. The FAM is inspired by the optical flow used for motion alignment between adjacent video frames: feature maps $F_h$ and $F_l$ of different resolutions are taken as input and projected to the same number of channels by separate 1×1 convolutional layers; the low-resolution map $F_l$ is then upsampled by bilinear interpolation and concatenated with the high-resolution map $F_h$.
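As a rough illustration, the following PyTorch-style sketch shows one way the FAM described above could be realised. The module name, the flow-prediction convolution, the warping step, and the final fusion are assumptions inspired by the optical-flow alignment mentioned in the text, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureAlignmentModule(nn.Module):
    """Sketch of the FAM: align a low-resolution map F_l to a high-resolution
    map F_h before fusion (flow prediction and warping are assumed details)."""

    def __init__(self, ch_high, ch_low, ch_out):
        super().__init__()
        # 1x1 convolutions project both inputs to the same channel count
        self.proj_h = nn.Conv2d(ch_high, ch_out, kernel_size=1)
        self.proj_l = nn.Conv2d(ch_low, ch_out, kernel_size=1)
        # Assumed: a 3x3 convolution predicts a 2-channel flow field from the
        # concatenated features, in the spirit of optical-flow alignment
        self.flow = nn.Conv2d(2 * ch_out, 2, kernel_size=3, padding=1)

    def forward(self, f_h, f_l):
        f_h = self.proj_h(f_h)
        # Bilinear up-sampling brings F_l to the spatial size of F_h
        f_l = F.interpolate(self.proj_l(f_l), size=f_h.shape[-2:],
                            mode="bilinear", align_corners=False)
        flow = self.flow(torch.cat([f_h, f_l], dim=1))
        f_l = self._warp(f_l, flow)
        # Concatenate the aligned low-resolution features with F_h;
        # a later convolution in the decoder is assumed to fuse them
        return torch.cat([f_h, f_l], dim=1)

    @staticmethod
    def _warp(x, flow):
        # Shift a normalised sampling grid by the predicted flow and resample
        n, _, h, w = x.shape
        ys, xs = torch.meshgrid(
            torch.linspace(-1.0, 1.0, h, device=x.device),
            torch.linspace(-1.0, 1.0, w, device=x.device), indexing="ij")
        grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
        offset = flow.permute(0, 2, 3, 1) * 2.0 / torch.tensor(
            [w, h], dtype=x.dtype, device=x.device)
        return F.grid_sample(x, grid + offset, mode="bilinear",
                             align_corners=False)
```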
The experimental results show that, on the HM-SAR dataset, the proposed model improves mIoU by 2.54% over the best-performing existing semantic segmentation model. The ablation study shows that the proposed MSFM effectively improves the mIoU value.
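For reference, the mIoU reported in Tables 2 and 3 is the mean of the per-class intersection over union. A minimal sketch of how it can be computed from predicted and ground-truth label maps is shown below; the class count and label convention are illustrative, not taken from the paper.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Compute per-class IoU and their mean (mIoU) from integer label maps.

    pred and target have the same shape and values in [0, num_classes);
    the background is treated as an ordinary class.
    """
    # Confusion matrix: rows = ground truth, columns = prediction
    conf = np.bincount(num_classes * target.reshape(-1) + pred.reshape(-1),
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    iou = intersection / np.maximum(union, 1)  # avoid division by zero
    return iou, iou.mean()


# Illustrative use with 5 classes (background, hammer, wrench, pistol, knife)
if __name__ == "__main__":
    pred = np.random.randint(0, 5, size=(512, 512))
    target = np.random.randint(0, 5, size=(512, 512))
    per_class_iou, miou = mean_iou(pred, target, num_classes=5)
    print(per_class_iou, miou)
```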
-
Figure 6. Test results of each model. Each row shows the results for the same test image, and each column the results of the same model. Black denotes the background, red the hammer, green the wrench, yellow the pistol, and blue the knife.
Table 1. Architectures of DBFEN
Stage | Output | DBFEN
Conv1 | 256×256 | 3×3, 64, stride 2
Conv2 | 128×128 | 3×3, 64, stride 2
Conv3 | 64×64 | [3×3, 64; 3×3, 128] × 2
Conv4 | 64×64 | [3×3, 128; 3×3, 128] × 2
Conv5 | 32×32 | [3×3, 128; 3×3, 256] × 2
Conv6 | 64×64 | [3×3, 128; 3×3, 128] × 2
Conv7 | 16×16 | [3×3, 256; 3×3, 512] × 2
Conv8 | 64×64 | [3×3, 128; 3×3, 256] × 2
Conv9 | 8×8 | [3×3, 512; 3×3, 1024] × 2
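To make the stage layout of Table 1 easier to read, here is a rough PyTorch sketch of the two branches it implies: Conv4/Conv6/Conv8 keep the 64×64 resolution while Conv5/Conv7/Conv9 keep halving it. The input size (single-channel 512×512), stride placement, BatchNorm/ReLU usage, the omission of the ×2 block repetition, and the omission of the bilateral cross-branch fusion are all simplifying assumptions, not the authors' implementation.

```python
import torch.nn as nn


def conv_block(c_in, c_mid, c_out, stride=1):
    # Assumed realisation of one "[3x3, a; 3x3, b] x 2" entry (single pass only)
    return nn.Sequential(
        nn.Conv2d(c_in, c_mid, 3, stride=stride, padding=1),
        nn.BatchNorm2d(c_mid), nn.ReLU(inplace=True),
        nn.Conv2d(c_mid, c_out, 3, padding=1),
        nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
    )


class DualBranchStem(nn.Module):
    """Sketch of the stage layout in Table 1 (512x512, 1-channel input assumed).

    The high-resolution branch (Conv4, Conv6, Conv8) stays at 64x64, while the
    low-resolution branch (Conv5, Conv7, Conv9) downsamples to 32x32, 16x16
    and 8x8; the repeated cross-branch feature exchange is omitted here.
    """

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 64, 3, stride=2, padding=1)   # -> 256x256
        self.conv2 = nn.Conv2d(64, 64, 3, stride=2, padding=1)  # -> 128x128
        self.conv3 = conv_block(64, 64, 128, stride=2)          # -> 64x64
        # High-resolution branch
        self.conv4 = conv_block(128, 128, 128)                  # 64x64
        self.conv6 = conv_block(128, 128, 128)                  # 64x64
        self.conv8 = conv_block(128, 128, 256)                  # 64x64
        # Low-resolution branch
        self.conv5 = conv_block(128, 128, 256, stride=2)        # -> 32x32
        self.conv7 = conv_block(256, 256, 512, stride=2)        # -> 16x16
        self.conv9 = conv_block(512, 512, 1024, stride=2)       # -> 8x8

    def forward(self, x):
        x = self.conv3(self.conv2(self.conv1(x)))
        high = self.conv8(self.conv6(self.conv4(x)))
        low = self.conv9(self.conv7(self.conv5(x)))
        return high, low
```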
Table 2. Comparisons of the segmentation performance of each model in the HM-SAR dataset
Network model | MPA/% | mIoU/% | F1/%
U-net | 80.29 | 70.35 | 81.87
Pspnet | 82.98 | 72.32 | 83.28
FCN-8s | 81.29 | 72.11 | 83.11
Deeplabv3+ | 81.05 | 70.58 | 82.00
HRnet-v2 | 82.33 | 72.90 | 83.69
DBMFnet (ours) | 85.01 | 75.44 | 85.21
Table 3. Comparisons of the objects segmentation performance of each model in the HM-SAR dataset
Class | U-net Pre/IoU | Pspnet Pre/IoU | Deeplabv3+ Pre/IoU | HRnet-v2 Pre/IoU | FCN-8s Pre/IoU | DBMFnet (ours) Pre/IoU
Hammer | 80.74/61.98 | 76.49/63.70 | 80.15/63.99 | 79.93/67.35 | 79.16/65.17 | 81.91/69.33
Wrench | 82.66/66.78 | 82.88/71.84 | 80.61/66.57 | 78.80/66.15 | 84.04/69.56 | 84.22/75.24
Pistol | 75.63/63.77 | 77.30/64.21 | 75.45/62.65 | 85.71/69.47 | 81.07/65.81 | 87.89/70.56
Knife | 78.59/59.40 | 81.36/62.01 | 78.82/59.84 | 81.67/61.68 | 80.06/60.16 | 82.55/66.15
Table 4. Calculation complexity and inference speed of each model
Network model | Params/M | GFLOPs | Speed/(f/s)
U-net | 24.89 | 452.31 | 32
Pspnet | 46.70 | 118.43 | 33.5
FCN-8s | 32.95 | 277.74 | 16
Deeplabv3+ | 54.71 | 166.87 | 21
HRnet | 29.55 | 80.18 | 11.5
DBMFnet (ours) | 19.54 | 47.36 | 26
Table 5. Comparisons of models using different decoder modules
Network model | mIoU/% | Params/M | GFLOPs
Baseline | 72.61 | 23.15 | 38.78
Deeplabv3+ (FCM) | 70.58 | 54.71 | 166.87
FCN-8s (FDM) | 72.11 | 32.95 | 277.74
Baseline+FCM | 74.10 | 22.44 | 100.80
Baseline+FDM | 73.16 | 21.65 | 45.27
Baseline+MSFM | 75.44 | 23.06 | 47.86