Real-time semantic segmentation algorithm based on BiLevelNet

Wu M J, Zhang Y A, Lin S L, et al. Real-time semantic segmentation algorithm based on BiLevelNet[J]. Opto-Electron Eng, 2024, 51(5): 240030. doi: 10.12086/oee.2024.240030


  • *Corresponding author: Lin Jianpu, ljp@fzu.edu.cn
  • CLC number: TP394.1; TH691.9

Real-time semantic segmentation algorithm based on BiLevelNet

  • Fund Project: Supported by the National Key R&D Program of China (2023YFB3609400), the Natural Science Foundation of Fujian Province (2020J01468), and the Youth Science Foundation of the National Natural Science Foundation of China (62101132)
  • Abstract: To address the difficulty of deploying semantic segmentation networks with large parameter counts on memory-constrained edge devices, this paper proposes a lightweight real-time semantic segmentation algorithm based on BiLevelNet. First, dilated convolutions are used to enlarge the receptive field, combined with a feature-reuse strategy to strengthen the network's region awareness. Next, a two-stage PBRA attention mechanism is embedded to build dependencies between distant but related objects, enhancing the network's global perception. Finally, a FADE operator incorporating shallow features is introduced to improve image upsampling. Experimental results show that, with a 512×1024 input resolution, the network achieves a mean intersection over union (mIoU) of 75.1% on the Cityscapes dataset at 121 frames per second, with a model size of only 0.7 M. With a 360×480 input resolution, it achieves 68.2% mIoU on the CamVid dataset. Compared with other current real-time semantic segmentation methods, the network strikes a balance between speed and accuracy and meets the real-time requirements of autonomous driving scenarios.

  • Overview: To address the challenge that the large parameter counts of semantic segmentation networks complicate deployment on memory-constrained edge devices, a lightweight real-time semantic segmentation algorithm based on BiLevelNet is proposed. First, dilated convolutions are used to broaden the receptive field, combined with a feature-reuse strategy to strengthen the network's region awareness. Next, a two-stage PBRA (Partial Bi-Level Routing Attention) mechanism builds dependencies between distant related objects, enhancing the network's ability to perceive global context. Finally, the FADE operator is introduced to merge shallow features, improving the quality of image upsampling.
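    The receptive-field gain from dilation can be made concrete with the standard recurrence for stacked convolutions. This is an illustrative sketch, not the paper's code; the layer specs below are hypothetical, loosely echoing the dilation rates 2/4/8 appearing in Table 1.

    ```python
    # Receptive-field growth of stacked 3x3 convolutions, with and without dilation.

    def receptive_field(layers):
        """layers: list of (kernel, stride, dilation) tuples, in order.
        Returns the receptive field (in input pixels) after the last layer."""
        rf, jump = 1, 1               # jump = cumulative stride between samples
        for k, s, d in layers:
            k_eff = d * (k - 1) + 1   # dilation enlarges the effective kernel
            rf += (k_eff - 1) * jump
            jump *= s
        return rf

    # Three plain 3x3 convs vs. three 3x3 convs with dilations 2, 4, 8:
    plain   = receptive_field([(3, 1, 1)] * 3)
    dilated = receptive_field([(3, 1, 2), (3, 1, 4), (3, 1, 8)])
    print(plain, dilated)  # → 7 29
    ```

    With identical parameter counts, the dilated stack covers a 29-pixel context versus 7, which is why dilation is a cheap way to widen region awareness.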

    The AFR module depicted in Fig. 4 presents several hierarchical feature maps together with their characteristics and roles. It clarifies the distinctions and connections between the input feature map, the local feature map obtained through 3×3 depthwise convolution, and the contextual feature map obtained through dilated convolution, and shows how these features are effectively fused in the final feature map, which exhibits strong activation over both local and global context. In addition, a gradually decreasing channel reduction factor is employed, as detailed in Table 3. By tuning this factor, it is observed that with a reduction factor of r = 1/4, the PBRA module improves mIoU by 1.5% and speed by 12 FPS compared with BRA.
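    The "Partial" in PBRA, together with the reduction factor r in Table 3, suggests that the expensive attention is computed on only a fraction of the channels while the rest pass through unchanged. The sketch below is an assumption about that mechanism, not the paper's implementation; `partial_apply` and its toy scalar "channels" are illustrative only.

    ```python
    # Sketch of a partial-channel pathway: apply an expensive operator
    # (e.g. bi-level routing attention) to r*C channels, identity on the rest.

    def partial_apply(channels, r, expensive_op):
        """channels: list of per-channel features; r: fraction routed
        through expensive_op; output keeps the same channel count."""
        k = max(1, int(len(channels) * r))
        attended  = [expensive_op(c) for c in channels[:k]]  # r*C channels
        untouched = channels[k:]                             # identity path
        return attended + untouched

    # Toy example: 8 "channels" (scalars), r = 1/4, op doubles the value.
    out = partial_apply([1, 2, 3, 4, 5, 6, 7, 8], 0.25, lambda c: 2 * c)
    print(out)  # → [2, 4, 3, 4, 5, 6, 7, 8]  (only the first 2 channels changed)
    ```

    Shrinking r cuts the attention cost roughly in proportion, which is consistent with the FPS gains reported in Table 3 as r decreases from 1 to 1/4.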

    Moreover, discontinuities and missing pixels appear in segmentation results when bilinear interpolation is used for upsampling. Inspection of the deep feature maps before bilinear upsampling reveals that the features for roads and sidewalks are similar, which can lead to misclassification. To counteract this, shallow features that preserve edge information are merged into the FADE upsampling process, improving edge segmentation. This effectively compensates for the loss of spatial information and yields smoother, better-defined segmentation edges.
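    The idea of merging edge-preserving shallow features into upsampling can be sketched as a gated blend of an upsampled deep feature with a shallow one. This 1-D toy is only a conceptual sketch: the hand-set `gate` stands in for FADE's learned, content-aware kernels, which the paper does not reduce to a fixed formula here.

    ```python
    # 1-D sketch: blend upsampled deep semantics with shallow edge detail.

    def upsample_nearest(x, factor=2):
        return [v for v in x for _ in range(factor)]

    def gated_fuse(deep, shallow, gate):
        """Per-position blend: gate→1 trusts the shallow feature (edges),
        gate→0 trusts the upsampled deep feature (semantics)."""
        up = upsample_nearest(deep)
        assert len(up) == len(shallow) == len(gate)
        return [g * s + (1 - g) * d for g, s, d in zip(gate, shallow, up)]

    deep    = [0.0, 1.0]                    # coarse semantic signal
    shallow = [0.0, 0.2, 0.8, 1.0]          # fine signal with a sharp edge
    gate    = [0.0, 1.0, 1.0, 0.0]          # trust shallow near the edge
    print(gated_fuse(deep, shallow, gate))  # → [0.0, 0.2, 0.8, 1.0]
    ```

    Plain upsampling of `deep` would produce the blocky [0.0, 0.0, 1.0, 1.0]; the gated fusion recovers the shallow signal's smooth edge transition, which is the effect Figs. 10–12 illustrate.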

    Experimental results show that, at an input resolution of 512×1024, the network attains a mean Intersection over Union (mIoU) of 75.1% on the Cityscapes dataset at 121 frames per second, with a model size of only 0.7 M. At an input resolution of 360×480, it achieves 68.2% mIoU on the CamVid dataset. Compared with other real-time semantic segmentation methods, the network maintains a good balance between speed and accuracy, satisfying the real-time requirements of applications such as autonomous driving.

  • Figure 1.  Network framework of BiLevelNet

    Figure 2.  Comparison of different feature extraction modules

    Figure 3.  AFR-S module and AFR module

    Figure 4.  Schematic diagram of region perception and feature reuse

    Figure 5.  BRA module

    Figure 6.  PBRA (Partial Bi-Level Routing Attention) module

    Figure 7.  Decoder module

    Figure 8.  Sample distribution of the Cityscapes and CamVid datasets

    Figure 9.  Comparison of segmentation results with and without the PBRA module

    Figure 10.  Comparison of segmentation results using bilinear interpolation

    Figure 11.  Feature map before bilinear interpolation and the shallow feature map

    Figure 12.  FADE upsampling segmentation results

    Figure 13.  Visualization results of networks on the Cityscapes dataset

    Figure 14.  Visualization results of networks on the CamVid dataset

    Table 1.  Network framework of BiLevelNet

    Stage   | Operator | Mode      | Output size
    Stage 1 | 3×3 Conv | Stride 2  | 32×256×512
            | 3×3 Conv | Stride 1  | 32×256×512
            | 3×3 Conv | Stride 1  | 32×256×512
    Stage 2 | AFR-S    |           | 64×128×256
            | 2×AFR    | Dilated 2 | 64×128×256
    Stage 3 | AFR-S    |           | 128×64×128
            | 4×AFR    | Dilated 4 | 128×64×128
            | 5×AFR    | Dilated 8 | 128×64×128
    Decoder | DAF      |           | 32×256×512
            | 1×1 Conv | Stride 1  | 19×256×512
            | Bilinear |           | 19×512×1024

    Table 2.  Performance comparison of different feature extraction modules on the Cityscapes dataset

    Module | Params/M | FLOPs/G | FPS | mIoU/%
    SSnbt  | 0.83     | 11.61   | 132 | 67.1
    DAB    | 0.75     | 10.78   | 140 | 71.8
    AFR    | 0.68     | 9.64    | 128 | 75.5

    Table 3.  Experimental results of different reduction factors on the Cityscapes validation set

    Ratio | Params/M | FLOPs/G | FPS | mIoU/%
    0     | 0.67     | 9.59    | 135 | 74.0
    1     | 0.74     | 10.25   | 116 | 74.2
    1/2   | 0.69     | 9.75    | 120 | 75.0
    1/4   | 0.68     | 9.64    | 128 | 75.5
    1/8   | 0.67     | 9.61    | 130 | 75.1
    1/16  | 0.67     | 9.6     | 131 | 74.1

    Table 4.  Ablation results of the FADE upsampling operator on the Cityscapes validation set

    Upsampling | Params/M | FLOPs/G | FPS | mIoU/%
    Bilinear   | 0.68     | 9.64    | 128 | 75.5
    FADE       | 0.7      | 10.4    | 121 | 75.9

    Table 5.  Performance comparison of different models on the Cityscapes dataset

    Algorithm  | Size     | Params/M | FLOPs/G | FPS  | mIoU/%
    ENet       | 512×1024 | 0.36     | 4.35    | 42   | 58.3
    ERFNet     | 512×1024 | 2.10     | 26.8    | 59   | 68.0
    LEDNet     | 512×1024 | 0.94     | 11.5    | 71   | 69.2
    DABNet     | 512×1024 | 0.76     | -       | 104  | 70.1
    ELANet[29] | 512×1024 | 0.67     | 9.7     | 93   | 74.7
    RELAXNet   | 512×1024 | 1.90     | 22.84   | 64   | 74.8
    DALNet[30] | 512×1024 | 0.48     | -       | 74   | 71.1
    BiseNet-v2 | 512×1024 | 3.40     | 21.2    | 156  | 72.6
    MIFNet[31] | 512×1024 | 0.82     | 12.03   | 74   | 73.1
    Ref. [32]  | 512×1024 | 6.22     | 12.51   | 54.7 | 74.2
    Ours       | 512×1024 | 0.70     | 10.4    | 121  | 75.1

    Table 6.  Per-class IoU (%) on the Cityscapes dataset

    Class | ERFNet | DABNet | LEDNet | FDDWNet | Ours
    Roa   | 97.9   | 96.8   | 97.1   | 98.0    | 98.0
    Sid   | 82.1   | 78.5   | 78.6   | 82.4    | 82.2
    Bui   | 90.7   | 90.9   | 90.4   | 91.1    | 91.8
    Wal   | 45.2   | 45.3   | 46.5   | 52.5    | 54.8
    Fen   | 50.4   | 50.1   | 48.1   | 51.2    | 56.5
    Pol   | 59.0   | 59.1   | 60.9   | 59.9    | 63.2
    Tli   | 62.6   | 65.2   | 60.4   | 64.4    | 68.4
    TSi   | 68.4   | 70.7   | 71.1   | 68.9    | 72.1
    Veg   | 91.9   | 92.5   | 91.2   | 92.5    | 92.8
    Ter   | 69.4   | 68.1   | 60.0   | 70.3    | 70.5
    Sky   | 94.2   | 94.6   | 93.2   | 94.4    | 94.5
    Ped   | 78.5   | 80.5   | 74.3   | 80.8    | 82.3
    Rid   | 59.8   | 58.5   | 51.8   | 59.8    | 65.2
    Car   | 93.4   | 92.7   | 92.3   | 94.0    | 94.3
    Tru   | 52.5   | 52.7   | 61.0   | 56.5    | 59.2
    Bus   | 60.8   | 67.2   | 72.4   | 68.9    | 78.5
    Tra   | 53.7   | 50.9   | 51.0   | 48.6    | 73.9
    Mot   | 49.9   | 50.4   | 43.3   | 55.7    | 57.9
    Bic   | 64.2   | 65.7   | 70.2   | 67.7    | 70.2

    Table 7.  Performance comparison on the CamVid dataset

    Algorithm  | Size    | Pretrain | Params/M | mIoU/%
    ENet       | 360×480 | N        | 0.36     | 51.3
    CGNet      | 360×480 | N        | 0.5      | 64.7
    DALNet     | 360×480 | N        | 0.47     | 66.1
    LEDNet     | 360×480 | N        | 0.94     | 66.6
    DABNet     | 360×480 | N        | 0.76     | 66.4
    MIFNet     | 360×480 | N        | 0.81     | 67.7
    ELANet     | 360×480 | N        | 0.67     | 67.9
    BiseNet-v2 | 360×480 | Y        | 5.8      | 68.7
    Ours       | 360×480 | N        | 0.7      | 68.2
  • [1]

    Li L H, Qian B, Lian J, et al. Traffic scene segmentation based on RGB-D image and deep learning[J]. IEEE Trans Intell Transp Syst, 2017, 19(5): 1664−1669. doi: 10.1109/TITS.2017.2724138

    [2]

    Liang L M, Lu B H, Long P W, et al. Adaptive feature fusion cascade transformer retinal vessel segmentation algorithm[J]. Opto-Electron Eng, 2023, 50(10): 230161. doi: 10.12086/oee.2023.230161

    [3]

    Min F, Peng W M, Kuang Y G, et al. A remote sensing ground object segmentation algorithm based on non-subsampled contourlet transform[J]. Electron Opt Control, 2023, 30(11): 49−55. doi: 10.3969/j.issn.1671-637X.2023.11.008

    [4]

    Zhao H S, Shi J P, Qi X J, et al. Pyramid scene parsing network[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017: 2881–2890. https://doi.org/10.1109/CVPR.2017.660.

    [5]

    Zhang W B, Qu J, Wang W, et al. An improved Deeplab v3+ image semantic segmentation algorithm incorporating multi-scale features[J]. Electron Opt Control, 2022, 29(11): 12−16,30. doi: 10.3969/j.issn.1671-637X.2022.11.003

    [6]

    Howard A, Sandler M, Chen B, et al. Searching for MobileNetV3[C]//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision, 2019: 1314–1324. https://doi.org/10.1109/ICCV.2019.00140.

    [7]

    Cordts M, Omran M, Ramos S, et al. The cityscapes dataset for semantic urban scene understanding[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016: 3213–3223. https://doi.org/10.1109/CVPR.2016.350.

    [8]

    Brostow G J, Fauqueur J, Cipolla R. Semantic object classes in video: a high-definition ground truth database[J]. Pattern Recognit Lett, 2009, 30(2): 88−97. doi: 10.1016/j.patrec.2008.04.005

    [9]

    Yu C Q, Gao C X, Wang J B, et al. BiSeNet V2: bilateral network with guided aggregation for real-time semantic segmentation[J]. Int J Comput Vis, 2021, 129(11): 3051−3068. doi: 10.1007/s11263-021-01515-2

    [10]

    Zhuang M X, Zhong X Y, Gu D B, et al. LRDNet: a lightweight and efficient network with refined dual attention decorder for real-time semantic segmentation[J]. Neurocomputing, 2021, 459: 349−360. doi: 10.1016/j.neucom.2021.07.019

    [11]

    Romera E, Álvarez J M, Bergasa L M, et al. ERFNet: efficient residual factorized ConvNet for real-time semantic segmentation[J]. IEEE Trans Intell Transp Syst, 2018, 19(1): 263−272. doi: 10.1109/TITS.2017.2750080

    [12]

    Liu J, Zhou Q, Qiang Y, et al. FDDWNet: a lightweight convolutional neural network for real-time semantic segmentation[C]//Proceedings of the ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020: 2373–2377. https://doi.org/10.1109/ICASSP40776.2020.9053838.

    [13]

    Liu J, Xu X Q, Shi Y Q, et al. RELAXNet: residual efficient learning and attention expected fusion network for real-time semantic segmentation[J]. Neurocomputing, 2022, 474: 115−127. doi: 10.1016/j.neucom.2021.12.003

    [14]

    Lin S L, Peng X L, Lin J P, et al. Object detection of steel surface defect based on multi-scale enhanced feature fusion[J]. Opt Precision Eng, 2024, 32(7): 1076−1086. doi: 10.37188/OPE.20243207.1075

    [15]

    Wang Y, Zhou Q, Liu J, et al. Lednet: a lightweight encoder-decoder network for real-time semantic segmentation[C]//Proceedings of 2019 IEEE International Conference on Image Processing, 2019: 1860–1864. https://doi.org/10.1109/ICIP.2019.8803154.

    [16]

    Wei H R, Liu X, Xu S C, et al. DWRSeg: dilation-wise residual network for real-time semantic segmentation[Z]. arXiv: 2212.01173, 2023. https://arxiv.org/abs/2212.01173v1.

    [17]

    Chen J R, Kao S H, He H, et al. Run, don't walk: chasing higher FLOPS for faster neural networks[C]//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 12021–12031. https://doi.org/10.1109/CVPR52729.2023.01157.

    [18]

    Ma N N, Zhang X Y, Zheng H T, et al. ShuffleNet V2: practical guidelines for efficient CNN architecture design[C]//Proceedings of the 15th European Conference on Computer Vision, 2018: 116–131. https://doi.org/10.1007/978-3-030-01264-9_8.

    [19]

    Woo S, Park J, Lee J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the 15th European Conference on Computer Vision, 2018: 3–19. https://doi.org/10.1007/978-3-030-01234-2_1.

    [20]

    Zhang C, Huang Y P, Guo Z Y, et al. Real-time lane detection method based on semantic segmentation[J]. Opto-Electron Eng, 2022, 49(5): 210378. doi: 10.12086/oee.2022.210378

    [21]

    Huang Z L, Wang X G, Huang L C, et al. CCNet: criss-cross attention for semantic segmentation[C]//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision, 2019: 603–612. https://doi.org/10.1109/ICCV.2019.00069.

    [22]

    Wu G, Ge Y, Chu J, et al. Cascade pooling self-attention research for remote sensing image retrieval[J]. Opto-Electron Eng, 2022, 49(12): 220029. doi: 10.12086/oee.2022.220029

    [23]

    Xia Z F, Pan X R, Song S J, et al. Vision transformer with deformable attention[C]//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 4794–4803. https://doi.org/10.1109/CVPR52688.2022.00475.

    [24]

    Zhu L, Wang X J, Ke Z H, et al. BiFormer: vision transformer with Bi-level routing attention[C]//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 10323–10333. https://doi.org/10.1109/CVPR52729.2023.00995.

    [25]

    Wang J Q, Chen K, Xu R, et al. CARAFE: content-aware ReAssembly of FEatures[C]//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision, 2019: 3007–3016. https://doi.org/10.1109/ICCV.2019.00310.

    [26]

    Liu C J, Qiao Z, Yan H W, et al. Semantic segmentation network for remote sensing image based on multi-scale mutual attention[J]. J Zhejiang Univ (Eng Sci), 2023, 57(7): 1335−1344. doi: 10.3785/j.issn.1008-973X.2023.07.008

    [27]

    Lu H, Liu W Z, Fu H T, et al. FADE: fusing the assets of decoder and encoder for task-agnostic upsampling[C]//Proceedings of the 17th European Conference on Computer Vision, 2022: 231–247. https://doi.org/10.1007/978-3-031-19812-0_14.

    [28]

    Li H C, Xiong P F, Fan H Q, et al. DFANet: deep feature aggregation for real-time semantic segmentation[C]//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 9522–9531. https://doi.org/10.1109/CVPR.2019.00975.

    [29]

    Yi Q M, Dai G S, Shi M, et al. ELANet: effective lightweight attention-guided network for real-time semantic segmentation[J]. Neural Process Lett, 2023, 55(5): 6425−6442. doi: 10.1007/s11063-023-11145-z

    [30]

    Shi M, Shen J L, Yi Q M, et al. Rapid and ultra-lightweight semantic segmentation in urban traffic scene[J]. J Front Comput Sci Technol, 2022, 16(10): 2377−2386. doi: 10.3778/j.issn.1673-9418.2203015

    [31]

    Yi Q M, Zhang W T, Shi M, et al. Semantic segmentation for road scene based on multiscale feature fusion[J]. Laser Optoelectron Prog, 2023, 60(12): 1210006. doi: 10.3788/LOP220914

    [32]

    Lan J P, Dong F L, Yang Y H, et al. Real-time image semantic segmentation network algorithm based on improved STDC-Seg[J]. Transducer Microsyst Technol, 2023, 42(11): 110−113,118. doi: 10.13873/J.1000-9787(2023)11-0110-04

Publication history
Received:  2024-01-30
Revised:  2024-03-13
Accepted:  2024-03-13
Published:  2024-05-25
