A multi-target semantic segmentation method for millimetre wave SAR images based on a dual-branch multi-scale fusion network

Citation: Ding J H, Yuan M H. A multi-target semantic segmentation method for millimetre wave SAR images based on a dual-branch multi-scale fusion network[J]. Opto-Electron Eng, 2023, 50(12): 230242. doi: 10.12086/oee.2023.230242




  • Fund Project: Project supported by National Natural Science Foundation of China (61601291), and Shanghai Committee of Science and Technology (14dz1206602)
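  • In the detection and recognition of contraband in millimeter-wave synthetic aperture radar (SAR) security images, complications such as very small targets, partially occluded targets, and overlap between multiple targets hinder accurate identification of contraband. To address these problems, a contraband detection method based on a dual-branch multi-scale fusion network (DBMFnet) is proposed. The network uses an encoder-decoder structure: in the encoder stage, a dual-branch parallel feature extraction network (DBPFEN) is proposed to enhance feature extraction; in the decoder stage, a multi-scale fusion module (MSFM) is proposed to improve target detection. Experimental results show that the proposed method achieves a higher mean intersection over union (mIoU) than the existing semantic segmentation methods it is compared against, and reduces both missed detections and false detections.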

  • Overview: With advances in millimeter-wave technology, millimeter-wave security inspection systems have matured considerably. Compared with traditional security inspection technologies such as X-ray, infrared, and metal detectors, millimeter-wave security imaging can detect not only metallic objects hidden under clothing but also dangerous items such as plastic firearms, knives, and explosives. Moreover, millimeter waves are non-ionizing and do not harm the human body. Millimeter-wave inspection provides precise image information and substantially reduces false alarms, so millimeter-wave imaging equipment is widely used for security screening of people.

    Contraband detection and identification in millimeter-wave synthetic aperture radar (SAR) security imaging face several major challenges: small targets, partially occluded targets, and overlap between multiple targets all hinder accurate identification of contraband. To address these problems, a contraband detection method based on a dual-branch multi-scale fusion network (DBMFnet) is proposed. DBMFnet follows an encoder-decoder architecture. In the encoder stage, a dual-branch parallel feature extraction network (DBPFEN) is proposed to enhance feature extraction: one branch preserves high resolution, while the other extracts rich semantic information through repeated downsampling. Bilateral connections between the high-resolution and low-resolution branches allow features to be exchanged repeatedly, so that high-resolution feature maps are merged into the low-resolution branch at different scales. Combining rich semantic information with fine-grained detail in this way improves the detection of small targets and of targets affected by interference.

    In the decoder stage, a multi-scale fusion module (MSFM) is proposed to enhance target detection. The MSFM is built around a feature alignment module (FAM), which merges multiple low-resolution feature maps into the high-resolution map. The FAM is inspired by the use of optical flow for motion alignment between adjacent video frames. It takes two feature maps of different resolutions, $F_h$ and $F_l$, as input and projects each to the same number of channels with a 1×1 convolutional layer. The low-resolution map $F_l$ is then upsampled by bilinear interpolation and concatenated with the high-resolution map $F_h$.
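The two mechanisms described above, the bilateral exchange between encoder branches and the FAM-style fusion in the decoder, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The average pooling that stands in for a strided convolution, the additive merge in the exchange step, the corner-aligned bilinear sampling, the toy tensor sizes, and the random 1×1 weights are all assumptions; the hypothetical helpers `bilateral_exchange` and `fam_fuse` only follow what the text states (repeated two-way feature exchange; 1×1 projection of $F_h$ and $F_l$, bilinear upsampling of $F_l$, concatenation). The optical-flow-inspired alignment that motivates the FAM is not modelled.

```python
import numpy as np


def conv1x1(x, weight):
    """1x1 convolution as a per-pixel channel projection.

    x: (C_in, H, W) feature map; weight: (C_out, C_in). Bias is omitted.
    """
    return np.einsum("oc,chw->ohw", weight, x)


def bilinear_upsample(x, out_h, out_w):
    """Bilinearly resize a (C, H, W) map to (C, out_h, out_w).

    Uses corner-aligned sampling (an assumption; the source does not say).
    """
    _, h, w = x.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]  # vertical weights, shape (1, out_h, 1)
    wx = (xs - x0)[None, None, :]  # horizontal weights, shape (1, 1, out_w)
    top = x[:, y0][:, :, x0] * (1 - wx) + x[:, y0][:, :, x1] * wx
    bottom = x[:, y1][:, :, x0] * (1 - wx) + x[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bottom * wy


def avg_pool(x, k):
    """k x k average pooling with stride k (stand-in for a strided conv)."""
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))


def bilateral_exchange(high, low, w_hl, w_lh):
    """One exchange step between the two encoder branches (hypothetical form).

    high: (C_h, H, W); low: (C_l, H/k, W/k).
    w_hl: (C_l, C_h) projects high -> low; w_lh: (C_h, C_l) projects low -> high.
    """
    k = high.shape[1] // low.shape[1]
    # High-resolution detail is pooled down and added into the low-res branch.
    new_low = low + conv1x1(avg_pool(high, k), w_hl)
    # Low-resolution semantics are upsampled and added into the high-res branch.
    up = bilinear_upsample(conv1x1(low, w_lh), high.shape[1], high.shape[2])
    return high + up, new_low


def fam_fuse(f_h, f_l, w_h, w_l):
    """FAM-style fusion: 1x1 projections, bilinear upsampling of F_l, concat."""
    p_h = conv1x1(f_h, w_h)
    p_l = bilinear_upsample(conv1x1(f_l, w_l), f_h.shape[1], f_h.shape[2])
    return np.concatenate([p_h, p_l], axis=0)


rng = np.random.default_rng(0)
high = rng.standard_normal((8, 16, 16))  # toy high-resolution branch
low = rng.standard_normal((16, 8, 8))    # toy low-resolution branch
high, low = bilateral_exchange(high, low,
                               rng.standard_normal((16, 8)),
                               rng.standard_normal((8, 16)))
fused = fam_fuse(high, low, rng.standard_normal((4, 8)), rng.standard_normal((4, 16)))
print(fused.shape)  # (8, 16, 16)
```

Concatenation doubles the channel count (4 + 4 here); a real decoder would typically follow it with further convolutions, which this sketch omits.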

    The experimental results show that, on the HM-SAR dataset, the proposed model improves mIoU by 2.54 percentage points over the best-performing existing semantic segmentation model (HRnet-v2). The ablation experiment shows that the proposed MSFM effectively improves mIoU, raising it from 72.61% for the baseline to 75.44%.
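The 2.54 figure can be checked against Table 2: it is the absolute gap between DBMFnet's 75.44% and the runner-up HRnet-v2's 72.90%, so it is measured in percentage points. A small Python check over the Table 2 values; the relative figure it also returns (about 3.5% of HRnet-v2's score) is plain arithmetic added here, not a number reported in the source.

```python
# mIoU (%) of each model on HM-SAR, copied from Table 2.
MIOU = {
    "U-net": 70.35,
    "Pspnet": 72.32,
    "FCN-8s": 72.11,
    "Deeplabv3+": 70.58,
    "HRnet-v2": 72.90,
    "DBMFnet": 75.44,
}


def margin_over_best(scores, ours="DBMFnet"):
    """Return the strongest competitor, the absolute gap, and the relative gap."""
    rivals = {name: v for name, v in scores.items() if name != ours}
    best = max(rivals, key=rivals.get)
    gap = scores[ours] - rivals[best]      # percentage points
    relative = 100 * gap / rivals[best]    # percent of the rival's score
    return best, round(gap, 2), round(relative, 2)


print(margin_over_best(MIOU))  # ('HRnet-v2', 2.54, 3.48)
```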

  • Figure 1.  DBMFnet network structure diagram

    Figure 2.  Feature fusion process

    Figure 3.  Different feature fusion methods. (a) FCM; (b) FDM; (c) MSFM

    Figure 4.  HM-SAR security images. (a) Back-view scan of a human body; (b) front-view scan of a human body

    Figure 5.  DBMFnet heat map

    Figure 6.  Test results of each model. Each row shows the results for the same image, and each column shows the results of the same model. Black denotes the background, red the hammer, green the wrench, yellow the pistol, and blue the knife.

    Figure 7.  Baseline model

    Table 1.  Architecture of the dual-branch feature extraction network (DBFEN)

    Stage   Output    DBFEN
    Conv1   256×256   3×3, 64, stride 2
    Conv2   128×128   3×3, 64, stride 2
    Conv3   64×64     (3×3, 64; 3×3, 128) × 2
    Conv4   64×64     (3×3, 128; 3×3, 128) × 2
    Conv5   32×32     (3×3, 128; 3×3, 256) × 2
    Conv6   64×64     (3×3, 128; 3×3, 128) × 2
    Conv7   16×16     (3×3, 256; 3×3, 512) × 2
    Conv8   64×64     (3×3, 128; 3×3, 256) × 2
    Conv9   8×8       (3×3, 512; 3×3, 1024) × 2
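Table 1 does not state the input size or which stages belong to which branch. Under two assumptions, a 512×512 input (consistent with Conv1's 256×256 output from a 3×3, stride-2 convolution with padding 1) and a hypothetical split of Conv4/Conv6/Conv8 into the high-resolution branch and Conv5/Conv7/Conv9 into the low-resolution branch (inferred only from the output sizes), the resolutions match the overview: one branch stays at a fixed resolution while the other is repeatedly downsampled. A Python check of that reading:

```python
INPUT = 512  # assumed input side length: Conv1 (3x3, stride 2) outputs 256x256


def conv_out(size, kernel=3, stride=2, padding=1):
    """Standard convolution output size; padding=1 is an assumption."""
    return (size + 2 * padding - kernel) // stride + 1


# Output side lengths from Table 1.
OUTPUT = {"Conv1": 256, "Conv2": 128, "Conv3": 64, "Conv4": 64, "Conv5": 32,
          "Conv6": 64, "Conv7": 16, "Conv8": 64, "Conv9": 8}

# Hypothetical branch assignment, inferred from the output sizes only.
HIGH_BRANCH = ["Conv4", "Conv6", "Conv8"]
LOW_BRANCH = ["Conv5", "Conv7", "Conv9"]

assert conv_out(INPUT) == OUTPUT["Conv1"]
assert conv_out(OUTPUT["Conv1"]) == OUTPUT["Conv2"]

for hi, lo in zip(HIGH_BRANCH, LOW_BRANCH):
    print(f"{hi}: 1/{INPUT // OUTPUT[hi]} of input   "
          f"{lo}: 1/{INPUT // OUTPUT[lo]} of input")
```

Under these assumptions the high-resolution stages stay at 1/8 of the input while the low-resolution stages fall to 1/16, 1/32 and 1/64.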

    Table 2.  Segmentation performance of each model on the HM-SAR dataset

    Network model     MPA/%   mIoU/%   F1/%
    U-net             80.29   70.35    81.87
    Pspnet            82.98   72.32    83.28
    FCN-8s            81.29   72.11    83.11
    Deeplabv3+        81.05   70.58    82.00
    HRnet-v2          82.33   72.90    83.69
    DBMFnet (ours)    85.01   75.44    85.21
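The source does not define MPA, mIoU, F1, or the per-class Pre used in Table 3. Assuming the usual definitions computed from a pixel-level confusion matrix (MPA as the class-averaged pixel accuracy, mIoU as the class-averaged TP/(TP+FP+FN), F1 as the class-averaged harmonic mean of precision and recall, and Pre as per-class precision), they can be sketched in Python as follows. Whether the background class enters the averages is not stated, so the sketch simply averages over every row of the matrix it is given; the toy matrix is illustrative, not data from the paper.

```python
from statistics import fmean


def segmentation_metrics(cm):
    """Per-class and mean metrics from a confusion matrix.

    cm[i][j] counts pixels whose true class is i and predicted class is j.
    """
    n = len(cm)
    precision, pixel_acc, iou, f1 = [], [], [], []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # pixels of class k missed
        fp = sum(cm[i][k] for i in range(n)) - tp   # other classes labelled k
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0      # per-class pixel accuracy
        precision.append(p)
        pixel_acc.append(r)
        iou.append(tp / (tp + fp + fn) if tp + fp + fn else 0.0)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "Pre": precision,
        "IoU": iou,
        "MPA": fmean(pixel_acc),
        "mIoU": fmean(iou),
        "F1": fmean(f1),
    }


# Toy 2-class example (not data from the paper).
m = segmentation_metrics([[3, 1], [2, 4]])
print(round(m["MPA"], 4), round(m["mIoU"], 4), round(m["F1"], 4))  # 0.7083 0.5357 0.697
```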

    Table 3.  Per-class segmentation performance of each model on the HM-SAR dataset

    Class     U-net          Pspnet         Deeplabv3+     HRnet-v2       FCN-8s         DBMFnet (ours)
              Pre/%  IoU/%   Pre/%  IoU/%   Pre/%  IoU/%   Pre/%  IoU/%   Pre/%  IoU/%   Pre/%  IoU/%
    Hammer    80.74  61.98   76.49  63.7    80.15  63.99   79.93  67.35   79.16  65.17   81.91  69.33
    Wrench    82.66  66.78   82.88  71.84   80.61  66.57   78.80  66.15   84.04  69.56   84.22  75.24
    Pistol    75.63  63.77   77.3   64.21   75.45  62.65   85.71  69.47   81.07  65.81   87.89  70.56
    Knife     78.59  59.4    81.36  62.01   78.82  59.84   81.67  61.68   80.06  60.16   82.55  66.15

    Table 4.  Computational complexity and inference speed of each model

    Network model     Params/M   GFLOPs   Speed/(frame/s)
    U-net             24.89      452.31   32
    Pspnet            46.7       118.43   33.5
    FCN-8s            32.95      277.74   16
    Deeplabv3+        54.71      166.87   21
    HRnet             29.55      80.18    11.5
    DBMFnet (ours)    19.54      47.36    26

    Table 5.  Comparisons of models using different decoder modules

    Network model       mIoU/%   Params/M   GFLOPs
    Baseline            72.61    23.15      38.78
    Deeplabv3+(FCM)     70.58    54.71      166.87
    FCN-8s(FDM)         72.11    32.95      277.74
    Baseline+FCM        74.1     22.44      100.8
    Baseline+FDM        73.16    21.65      45.27
    Baseline+MSFM       75.44    23.06      47.86
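The baseline-derived rows of Table 5 are easiest to compare as differences from the baseline. A Python sketch that subtracts the baseline row (values copied from the table; the helper `deltas` is hypothetical) shows the trade-off: MSFM gives the largest mIoU gain of the three decoders (+2.83 points) for about 9 extra GFLOPs, whereas FCM costs about 62 extra GFLOPs for +1.49 points.

```python
# (mIoU %, Params M, GFLOPs) for the baseline-derived rows of Table 5.
ABLATION = {
    "Baseline": (72.61, 23.15, 38.78),
    "Baseline+FCM": (74.1, 22.44, 100.8),
    "Baseline+FDM": (73.16, 21.65, 45.27),
    "Baseline+MSFM": (75.44, 23.06, 47.86),
}


def deltas(table, ref="Baseline"):
    """Change in each column relative to the reference row, rounded to 0.01."""
    base = table[ref]
    return {
        name: tuple(round(v - b, 2) for v, b in zip(row, base))
        for name, row in table.items()
        if name != ref
    }


for name, (d_miou, d_params, d_gflops) in deltas(ABLATION).items():
    print(f"{name}: mIoU {d_miou:+.2f} pp, Params {d_params:+.2f} M, "
          f"GFLOPs {d_gflops:+.2f}")
```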


Publication history
Received: 2023-09-28
Revised: 2023-11-30
Accepted: 2023-11-30
Published: 2024-01-19
