Infrared-visible person re-identification based on multi feature aggregation

Citation: Zheng H J, Ge B, Xia C X, et al. Infrared-visible person re-identification based on multi feature aggregation[J]. Opto-Electron Eng, 2023, 50(7): 230136. doi: 10.12086/oee.2023.230136


  • *Corresponding author: Ge Bin, bge@aust.edu.cn
  • CLC number: TP391


  • Fund project: Supported by the National Natural Science Foundation of China (6210071479, 62102003), the National Science and Technology Major Project (2020YFB1314103), the Natural Science Foundation of Anhui Province (2108085QF258), and the Anhui Postdoctoral Fund (2022B623)
  • Abstract: Infrared-visible person re-identification has wide applications in video surveillance, intelligent transportation, security, and related fields, but the discrepancy between the two image modalities poses a major challenge. Existing methods focus mainly on mitigating the modality gap to obtain more discriminative features, yet they ignore the relationship between adjacent-level features and the influence of multi-scale information on the global feature. This paper therefore proposes an infrared-visible person re-identification method based on multi-feature aggregation (MFANet) to address these shortcomings. First, adjacent-level features are fused in the feature extraction stage, and the injection of low-level information is guided so as to strengthen the high-level features and make them more robust. Then, multi-scale features from different receptive fields are aggregated to obtain rich contextual information. Finally, the multi-scale features serve as guidance to enhance the features and obtain more discriminative representations. Experiments on the SYSU-MM01 and RegDB datasets demonstrate the effectiveness of the proposed method; on SYSU-MM01, the Rank-1 accuracy reaches 71.77% under the most difficult all-search single-shot mode.
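
To make the first step concrete, the sketch below shows one plausible form of adjacent-level fusion in PyTorch: the low-level map is pooled to the high-level resolution, aligned in channels, and used to gate the high-level features through a residual path. This is a minimal illustration under assumed shapes and layer choices (the class name, channel widths, and sigmoid gate are ours), not the paper's released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdjacentLevelFusion(nn.Module):
    """Fuse a low-level feature map into the adjacent high-level one.

    Hypothetical sketch; the paper's module may differ in detail.
    """
    def __init__(self, low_channels: int, high_channels: int):
        super().__init__()
        # Project low-level features to the high-level channel width.
        self.align = nn.Sequential(
            nn.Conv2d(low_channels, high_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(high_channels),
        )

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Low-level maps are spatially larger; pool them to the high-level size.
        low = F.adaptive_avg_pool2d(low, high.shape[-2:])
        # Let low-level detail gate the high-level semantics, keeping a
        # residual path so the original high-level features survive.
        return high + high * torch.sigmoid(self.align(low))

# Example with ResNet-50-like stage outputs (batch of 2):
fuse = AdjacentLevelFusion(low_channels=512, high_channels=1024)
f_low = torch.randn(2, 512, 24, 9)    # stage-2 feature map
f_high = torch.randn(2, 1024, 12, 5)  # stage-3 feature map
print(fuse(f_low, f_high).shape)      # torch.Size([2, 1024, 12, 5])
```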

  • Overview: Infrared-visible person re-identification is a prominent research topic in computer vision, touching on multi-modal perception technology, the core challenges of person re-identification, practical application demands, and the development of datasets and evaluation metrics. With the emergence of multi-modal perception technology, the primary objective of infrared-visible person re-identification is to fuse information from different modalities effectively so as to improve accuracy and robustness. Person re-identification already faces variations in viewpoint, pose, occlusion, and lighting conditions; as a cross-modal task, infrared-visible re-identification adds the modality gap on top of these. The technology holds broad prospects for applications in video surveillance, security, intelligent transportation, and related fields, and is particularly well suited to person re-identification in low-light or nighttime environments. The development of dedicated datasets and evaluation metrics has driven continuous innovation in algorithms and systems, and accuracy has steadily improved. However, the discrepancy between image modalities remains a major challenge. Existing methods focus mainly on mitigating this modality gap to obtain more discriminative features, but they ignore the relationship between adjacent-level features and the influence of multi-scale information on the global feature. Here, an infrared-visible person re-identification method based on multi-feature aggregation (MFANet) is proposed to address these shortcomings. Firstly, adjacent-level features are fused in the feature extraction stage, and the injection of low-level information is guided to strengthen the high-level features and make them more robust. Then, multi-scale features from different receptive fields are aggregated to obtain rich contextual information. Finally, the multi-scale features are used as guidance to enhance the features and obtain more discriminative representations. Experimental results on the SYSU-MM01 and RegDB datasets show the effectiveness of the proposed method: on SYSU-MM01, the Rank-1 accuracy reaches 71.77% in the all-search single-shot mode and 78.24% in the indoor-search single-shot mode.
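
The second and third steps (multi-scale aggregation, then multi-scale guidance) can be sketched together: parallel convolutions collect context under different receptive fields, and the pooled context re-weights the input channel-wise. Again a hedged sketch, not the paper's code: the class name and gating design are our assumptions, and we read the receptive-field settings as kernel sizes, defaulting to the {1, 3} pair that Table 4 finds best.

```python
import torch
import torch.nn as nn

class MultiScaleGuide(nn.Module):
    """Aggregate multi-scale context and use it as guidance (sketch only)."""
    def __init__(self, channels: int, kernel_sizes=(1, 3)):
        super().__init__()
        # One convolution branch per receptive field.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, bias=False)
            for k in kernel_sizes
        )
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),           # squeeze to global context
            nn.Conv2d(channels, channels, 1),  # per-channel guidance weights
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sum the contexts gathered under the different receptive fields ...
        context = sum(branch(x) for branch in self.branches)
        # ... and let them guide a channel-wise re-weighting of the input.
        return x + x * self.gate(context)

x = torch.randn(2, 2048, 12, 5)
print(MultiScaleGuide(2048)(x).shape)  # torch.Size([2, 2048, 12, 5])
```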

  • Figure 1.  Structure of MFANet

    Figure 2.  Adjacent-level feature aggregation module

    Figure 3.  Multi-scale aggregation module

    Figure 4.  Multi-scale feature aggregation module

    Figure 5.  Intra-class and inter-class distances and feature distributions

    Figure 6.  Heat maps under different receptive fields

    Figure 7.  Visualized ranking results on SYSU-MM01
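
Heat maps such as those in Figure 6 are commonly produced with Grad-CAM, which the reference list includes as [28]. Below is a minimal, self-contained sketch of the technique; the ResNet-50 backbone, random input, target score, and input size are all stand-in assumptions, not the paper's actual visualization pipeline.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

model = resnet50(weights=None).eval()
store = {}

def hook(module, inputs, output):
    output.retain_grad()   # keep the gradient of this non-leaf activation
    store["fmap"] = output

model.layer4.register_forward_hook(hook)   # last conv stage

x = torch.randn(1, 3, 288, 144)   # Re-ID-style person crop (random stand-in)
model(x).max().backward()         # back-propagate the top logit

fmap = store["fmap"]
weights = fmap.grad.mean(dim=(2, 3), keepdim=True)  # channel importance
cam = F.relu((weights * fmap).sum(dim=1))           # Grad-CAM heat map
cam = cam / (cam.max() + 1e-8)                      # normalize to [0, 1]
print(cam.shape)                                    # torch.Size([1, 9, 5])
```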

    Table 1.  Comparison results on the SYSU-MM01 dataset

    Method           |              All search              |            Indoor search
                     | rank-1 rank-10 rank-20  mAP    mINP  | rank-1 rank-10 rank-20  mAP    mINP
    One-stream[14]   | 12.04   49.68   66.74  13.67    -    | 16.94   63.55   82.10  22.95    -
    Two-stream[14]   | 11.65   47.99   65.50  12.85    -    | 15.60   61.18   81.02  21.49    -
    Zero-Padding[14] | 14.80   54.12   71.33  15.59    -    | 20.58   68.38   85.79  26.92    -
    HCML[18]         | 14.32   53.16   69.17  16.16    -    | 24.52   73.25   86.73  30.08    -
    BDTR[17]         | 27.32   66.96   81.07  27.32    -    | 31.92   77.18   89.28  41.86    -
    D2RL[4]          | 28.90   70.60   82.40  29.20    -    |   -       -       -      -      -
    AlignGAN[19]     | 42.03   85.25   93.73  41.48    -    | 45.86   90.17   95.39  55.18    -
    AGW[16]          | 47.58   84.45   92.11  47.69  35.30  | 54.29   91.14   95.99  63.02  59.23
    Xmodal[20]       | 49.92   89.79   95.96  50.73    -    |   -       -       -      -      -
    DDAG[8]          | 53.61   89.17   95.30  52.02  39.62  | 58.37   91.92   97.42  65.44  62.61
    CM-NAS[21]       | 62.04   92.92   97.31  60.00    -    | 67.03   97.02   99.34  72.97    -
    CAJ[9]           | 68.23   95.59   98.49  65.32  53.61  | 74.01   97.79   99.67  78.52  76.79
    MPANet[6]        | 70.07   95.39   98.39  67.07    -    | 76.35   97.56   99.48  80.16    -
    PIC[23]          | 57.50     -       -    55.10    -    | 60.40     -       -    67.70    -
    DART[24]         | 68.72   96.39   98.96  66.29  53.26  | 72.52   97.84   99.46  78.17  74.94
    SPOT[10]         | 65.34   92.73   97.04  62.25  48.86  | 69.42   96.22   99.12  74.63  70.48
    DML[7]           | 58.40   91.20   95.80  56.10    -    | 62.40   95.20   98.70  69.50    -
    PMT[12]          | 67.53   95.36   98.64  64.98  51.86  | 71.66   96.73   99.25  76.52  72.74
    SFANet[25]       | 65.74   92.98   97.05  60.83    -    | 71.60   96.60   99.45  80.05    -
    SIDA[26]         | 68.36   95.91   98.56  64.19    -    | 73.28   97.35   99.52  77.49    -
    MTMFE[27]        | 69.47   96.42   99.11  66.41    -    | 72.56   96.98   99.20  76.58    -
    Ours             | 71.77   96.15   98.70  68.43  55.21  | 78.24   98.23   99.49  81.90  78.44
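
For readers unfamiliar with the metrics in Tables 1-4, the following NumPy sketch computes Rank-1, average precision (AP), and the inverse negative penalty (INP) for a single query under the usual definitions; mAP and mINP average these over all queries, with mINP as introduced alongside AGW [16]. The dataset-specific protocol details (camera filtering, repeated gallery sampling) are omitted, so treat this as illustrative rather than the official evaluation code.

```python
import numpy as np

def evaluate_query(sim, gallery_ids, query_id):
    """Rank-1 / AP / INP for one query that has at least one true match."""
    order = np.argsort(-sim)                            # sort gallery by similarity
    hits = np.where(gallery_ids[order] == query_id)[0]  # 0-based ranks of true matches
    rank1 = float(hits[0] == 0)
    # AP: precision at each correct match, averaged over the matches.
    ap = float(np.mean((np.arange(len(hits)) + 1) / (hits + 1)))
    # INP: fraction of matches retrieved by the hardest (last) match's rank.
    inp = len(hits) / (hits[-1] + 1)
    return rank1, ap, inp

sim = np.array([0.9, 0.2, 0.8, 0.1])         # similarities to 4 gallery images
ids = np.array([7, 3, 7, 5])                 # gallery identities
print(evaluate_query(sim, ids, query_id=7))  # (1.0, 1.0, 1.0)
```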

    Table 2.  Comparison results on the RegDB dataset

    Method           |          Visible to infrared         |          Infrared to visible
                     | rank-1 rank-10 rank-20  mAP    mINP  | rank-1 rank-10 rank-20  mAP    mINP
    Zero-Padding[14] | 17.75   34.21   44.35  18.90    -    | 16.63   34.68   44.25  17.82    -
    HCML[18]         | 24.44   47.53   56.78  20.08    -    | 21.70   45.02   55.58  22.24    -
    BDTR[17]         | 33.56   58.61   67.43  32.76    -    | 32.92   58.46   68.43  31.96    -
    D2RL[4]          | 43.40   66.10   76.30  44.10    -    |   -       -       -      -      -
    AGW[16]          | 70.05   86.21   91.55  66.37  50.19  | 70.49   87.21   91.84  65.90  51.24
    Xmodal[20]       | 62.21   83.13   91.72  60.18    -    |   -       -       -      -      -
    DDAG[8]          | 69.34   85.77   89.98  63.19  49.24  | 64.77   83.85   88.90  58.54  48.62
    CM-NAS[21]       | 84.54   95.18   97.85  80.32    -    | 82.56   94.52   97.37  78.31    -
    MCLNet[22]       | 80.31   92.70   96.03  73.07    -    | 75.93   90.93   94.59  69.49    -
    CAJ[9]           | 84.72   95.17   97.38  78.70  65.33  | 84.09   94.79   97.11  77.25  61.56
    PIC[23]          | 83.60     -       -    79.60    -    | 79.50     -       -    77.40    -
    DART[24]         | 78.23     -       -    67.04  48.36  | 75.04     -       -    64.38  43.32
    SPOT[10]         | 80.35   93.48   96.44  72.46  56.19  | 79.37   92.79   96.01  72.26  56.06
    DML[7]           | 77.60     -       -    84.30    -    | 77.00     -       -    83.60    -
    PMT[12]          | 84.83     -       -    76.55    -    | 84.16     -       -    75.13    -
    SFANet[25]       | 76.31   91.02   94.27  68.00    -    | 70.15   85.24   89.27  63.77    -
    SIDA[26]         | 81.73     -     96.55  75.07    -    | 79.71     -     95.47  72.60    -
    MTMFE[27]        | 85.04   94.38   97.22  82.52    -    | 81.11   92.35   96.19  79.59    -
    Ours             | 85.38   95.39   97.54  79.49  65.72  | 84.58   95.27   97.23  78.02  62.22

    Table 3.  Ablation study of four different settings on the SYSU-MM01 dataset

    AFAM MSAM |              All search              |            Indoor search
              | rank-1 rank-10 rank-20  mAP    mINP  | rank-1 rank-10 rank-20  mAP    mINP
      -    -  | 68.23   95.59   98.49  65.32  51.90  | 74.01   97.79   99.67  78.52  74.78
      ✓    -  | 69.30   95.69   98.41  65.95  52.14  | 75.27   97.84   99.48  79.51  75.79
      -    ✓  | 70.89   95.88   98.52  67.61  54.30  | 77.69   97.43   99.25  81.09  77.48
      ✓    ✓  | 71.77   96.15   98.70  68.43  55.21  | 78.24   98.23   99.49  81.90  78.44

    Table 4.  Receptive field analysis of the multi-scale feature aggregation module

    Receptive fields |              All search              |            Indoor search
                     | rank-1 rank-10 rank-20  mAP    mINP  | rank-1 rank-10 rank-20  mAP    mINP
    1,3,5,7          | 69.41   95.82   98.54  66.27  52.96  | 75.34   97.96   99.66  79.98  76.53
    1,2,3,4          | 70.17   95.51   98.51  67.14  54.10  | 76.44   98.09   99.60  80.65  77.22
    1,3,5            | 70.50   95.77   98.54  67.11  53.68  | 76.66   97.81   99.51  80.54  76.99
    1,2,3            | 70.53   95.86   98.67  67.21  53.77  | 76.59   97.88   99.37  80.68  77.22
    1,3              | 71.77   96.15   98.70  68.43  55.21  | 78.24   98.23   99.49  81.90  78.44
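
Table 4 also trades accuracy against branch count: every extra receptive field adds a parallel convolution. The hypothetical helper below (assuming the settings denote kernel sizes and 256-channel features) merely counts the parameters such branches would add per setting, showing that the best-performing {1, 3} pair is also the cheapest.

```python
import torch.nn as nn

def branch_params(channels: int, kernel_sizes) -> int:
    """Parameters of parallel conv branches, one per receptive field."""
    branches = nn.ModuleList(
        nn.Conv2d(channels, channels, k, padding=k // 2, bias=False)
        for k in kernel_sizes
    )
    return sum(p.numel() for p in branches.parameters())

# The settings from Table 4, largest to smallest:
for ks in [(1, 3, 5, 7), (1, 2, 3, 4), (1, 3, 5), (1, 2, 3), (1, 3)]:
    print(ks, f"{branch_params(256, ks) / 1e6:.2f}M params")
```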
  • [1] Liu L, Li X, Lei X M. A person re-identification method with multi-scale and multi-feature fusion[J]. J Comput-Aided Des Comput Graphics, 2022, 34(12): 1868−1876. doi: 10.3724/SP.J.1089.2022.19218
    [2] Shi Y X, Zhou Y. Person re-identification based on stepped feature space segmentation and local attention mechanism[J]. J Electron Inf Technol, 2022, 44(1): 195−202. doi: 10.11999/JEIT201006
    [3] Wang S, Ji P, Zhang Y Z, et al. Adaptive receptive network for person re-identification[J]. Control Decis, 2022, 37(1): 119−126. doi: 10.13195/j.kzyjc.2020.0505
    [4] Wang Z X, Wang Z, Zheng Y Q, et al. Learning to reduce dual-level discrepancy for infrared-visible person re-identification[C]//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 618–626. https://doi.org/10.1109/CVPR.2019.00071
    [5] Zhong X, Lu T Y, Huang W X, et al. Grayscale enhancement colorization network for visible-infrared person re-identification[J]. IEEE Trans Circ Syst Video Technol, 2022, 32(3): 1418−1430. doi: 10.1109/TCSVT.2021.3072171
    [6] Wu Q, Dai P Y, Chen J, et al. Discover cross-modality nuances for visible-infrared person re-identification[C]//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 4330–4339. https://doi.org/10.1109/CVPR46437.2021.00431
    [7] Zhang D M, Zhang Z Z, Ju Y, et al. Dual mutual learning for cross-modality person re-identification[J]. IEEE Trans Circ Syst Video Technol, 2022, 32(8): 5361−5373. doi: 10.1109/TCSVT.2022.3144775
    [8] Ye M, Shen J B, Crandall D J, et al. Dynamic dual-attentive aggregation learning for visible-infrared person re-identification[C]//Proceedings of the 16th European Conference on Computer Vision, 2020: 229–247. https://doi.org/10.1007/978-3-030-58520-4_14
    [9] Ye M, Ruan W J, Du B, et al. Channel augmented joint learning for visible-infrared recognition[C]//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision, 2021: 13567–13576. https://doi.org/10.1109/ICCV48922.2021.01331
    [10] Chen C Q, Ye M, Qi M B, et al. Structure-aware positional transformer for visible-infrared person re-identification[J]. IEEE Trans Image Process, 2022, 31: 2352−2364. doi: 10.1109/TIP.2022.3141868
    [11] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017: 6000–6010.
    [12] Lu H, Zou X Z, Zhang P P. Learning progressive modality-shared transformers for effective visible-infrared person re-identification[C]//Proceedings of the 37th AAAI Conference on Artificial Intelligence, 2023: 1835–1843. https://doi.org/10.1609/aaai.v37i2.25273
    [13] Lin B B, Zhang S L, Yu X. Gait recognition via effective global-local feature representation and local temporal aggregation[C]//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision, 2021: 14648–14656. https://doi.org/10.1109/ICCV48922.2021.01438
    [14] Wu A C, Zheng W S, Yu H X, et al. RGB-infrared cross-modality person re-identification[C]//Proceedings of 2017 IEEE International Conference on Computer Vision, 2017: 5380–5389. https://doi.org/10.1109/ICCV.2017.575
    [15] Nguyen D T, Hong H G, Kim K W, et al. Person recognition system based on a combination of body images from visible light and thermal cameras[J]. Sensors, 2017, 17(3): 605. doi: 10.3390/s17030605
    [16] Ye M, Shen J B, Lin G J, et al. Deep learning for person re-identification: a survey and outlook[J]. IEEE Trans Pattern Anal Mach Intell, 2022, 44(6): 2872−2893. doi: 10.1109/TPAMI.2021.3054775
    [17] Ye M, Lan X Y, Wang Z, et al. Bi-directional center-constrained top-ranking for visible thermal person re-identification[J]. IEEE Trans Inf Forensics Secur, 2020, 15: 407−419. doi: 10.1109/TIFS.2019.2921454
    [18] Ye M, Lan X Y, Li J W, et al. Hierarchical discriminative learning for visible thermal person re-identification[C]//Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018: 919. https://doi.org/10.1609/aaai.v32i1.12293
    [19] Wang G A, Zhang T Z, Cheng J, et al. RGB-infrared cross-modality person re-identification via joint pixel and feature alignment[C]//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision, 2019: 3623–3632. https://doi.org/10.1109/ICCV.2019.00372
    [20] Li D G, Wei X, Hong X P, et al. Infrared-visible cross-modal person re-identification with an X modality[C]//Proceedings of the 34th AAAI Conference on Artificial Intelligence, 2020: 4610–4617. https://doi.org/10.1609/aaai.v34i04.5891
    [21] Fu C Y, Hu Y B, Wu X, et al. CM-NAS: cross-modality neural architecture search for visible-infrared person re-identification[C]//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision, 2021: 11823–11832. https://doi.org/10.1109/ICCV48922.2021.01161
    [22] Hao X, Zhao S Y, Ye M, et al. Cross-modality person re-identification via modality confusion and center aggregation[C]//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision, 2021: 16403–16412. https://doi.org/10.1109/ICCV48922.2021.01609
    [23] Zheng X T, Chen X M, Lu X Q. Visible-infrared person re-identification via partially interactive collaboration[J]. IEEE Trans Image Process, 2022, 31: 6951−6963. doi: 10.1109/TIP.2022.3217697
    [24] Yang M X, Huang Z Y, Hu P, et al. Learning with twin noisy labels for visible-infrared person re-identification[C]//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 14308–14317. https://doi.org/10.1109/CVPR52688.2022.01391
    [25] Liu H J, Ma S, Xia D X, et al. SFANet: a spectrum-aware feature augmentation network for visible-infrared person reidentification[J]. IEEE Trans Neural Netw Learn Syst, 2023, 34(4): 1958−1971. doi: 10.1109/TNNLS.2021.3105702
    [26] Gong J H, Zhao S Y, Lam K M, et al. Spectrum-irrelevant fine-grained representation for visible–infrared person re-identification[J]. Comput Vis Image Underst, 2023, 232: 103703. doi: 10.1016/j.cviu.2023.103703
    [27] Huang N C, Liu J N, Luo Y J, et al. Exploring modality-shared appearance features and modality-invariant relation features for cross-modality person re-identification[J]. Pattern Recogn, 2023, 135: 109145. doi: 10.1016/j.patcog.2022.109145
    [28] Selvaraju R R, Cogswell M, Das A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[J]. Int J Comput Vis, 2020, 128(2): 336−359. doi: 10.1007/s11263-019-01228-7

Article history
Received: 2023-06-15
Revised: 2023-08-10
Accepted: 2023-08-11
Published: 2023-08-20
