Infrared-visible person re-identification based on multi feature aggregation

Citation: Zheng H J, Ge B, Xia C X, et al. Infrared-visible person re-identification based on multi feature aggregation[J]. Opto-Electron Eng, 2023, 50(7): 230136. doi: 10.12086/oee.2023.230136


  • *Corresponding author: Ge Bin, bge@aust.edu.cn
  • CLC number: TP391


  • Fund project: Supported by the National Natural Science Foundation of China (6210071479, 62102003), the National Science and Technology Major Project (2020YFB1314103), the Natural Science Foundation of Anhui Province (2108085QF258), and the Anhui Postdoctoral Fund (2022B623)
  • Abstract: Infrared-visible person re-identification has wide applications in video surveillance, intelligent transportation, security, and related fields, but the discrepancy between the two image modalities poses a major challenge. Existing methods focus mainly on mitigating the modality gap to obtain more discriminative features, yet they ignore the relationship between adjacent-level features and the influence of multi-scale information on the global feature. This paper therefore proposes an infrared-visible person re-identification method based on multi-feature aggregation (MFANet) to address these shortcomings. First, adjacent-level features are fused in the feature extraction stage, and the injection of low-level information is guided so as to strengthen the high-level features and make them more robust. Then, multi-scale features from different receptive fields are aggregated to obtain rich contextual information. Finally, the multi-scale features serve as guidance to enhance the features and obtain more discriminative representations. Experiments on the SYSU-MM01 and RegDB datasets demonstrate the effectiveness of the proposed method; on SYSU-MM01, the Rank-1 accuracy reaches 71.77% under the most difficult all-search single-shot mode.
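
To make the first step concrete, the sketch below shows one plausible form of adjacent-level fusion in PyTorch: the low-level map is pooled to the high-level resolution, aligned in channels, and used to gate the high-level features through a residual path. This is a minimal illustration under assumed shapes and layer choices (the class name, channel widths, and sigmoid gate are ours), not the paper's released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdjacentLevelFusion(nn.Module):
    """Fuse a low-level feature map into the adjacent high-level one.

    Hypothetical sketch; the paper's module may differ in detail.
    """
    def __init__(self, low_channels: int, high_channels: int):
        super().__init__()
        # Project low-level features to the high-level channel width.
        self.align = nn.Sequential(
            nn.Conv2d(low_channels, high_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(high_channels),
        )

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Low-level maps are spatially larger; pool them to the high-level size.
        low = F.adaptive_avg_pool2d(low, high.shape[-2:])
        # Let low-level detail gate the high-level semantics, keeping a
        # residual path so the original high-level features survive.
        return high + high * torch.sigmoid(self.align(low))

# Example with ResNet-50-like stage outputs (batch of 2):
fuse = AdjacentLevelFusion(low_channels=512, high_channels=1024)
f_low = torch.randn(2, 512, 24, 9)    # stage-2 feature map
f_high = torch.randn(2, 1024, 12, 5)  # stage-3 feature map
print(fuse(f_low, f_high).shape)      # torch.Size([2, 1024, 12, 5])
```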

  • Overview: Infrared-visible person re-identification is a prominent research topic in computer vision, touching on multi-modal perception technology, the core challenges of person re-identification, practical application demands, and the development of datasets and evaluation metrics. With the emergence of multi-modal perception technology, the primary objective of infrared-visible person re-identification is to fuse information from different modalities effectively so as to improve accuracy and robustness. Person re-identification already faces variations in viewpoint, pose, occlusion, and lighting conditions; as a cross-modal task, infrared-visible re-identification adds the modality gap on top of these. The technology holds broad prospects for applications in video surveillance, security, intelligent transportation, and related fields, and is particularly well suited to person re-identification in low-light or nighttime environments. The development of dedicated datasets and evaluation metrics has driven continuous innovation in algorithms and systems, and accuracy has steadily improved. However, the discrepancy between image modalities remains a major challenge. Existing methods focus mainly on mitigating this modality gap to obtain more discriminative features, but they ignore the relationship between adjacent-level features and the influence of multi-scale information on the global feature. Here, an infrared-visible person re-identification method based on multi-feature aggregation (MFANet) is proposed to address these shortcomings. Firstly, adjacent-level features are fused in the feature extraction stage, and the injection of low-level information is guided to strengthen the high-level features and make them more robust. Then, multi-scale features from different receptive fields are aggregated to obtain rich contextual information. Finally, the multi-scale features are used as guidance to enhance the features and obtain more discriminative representations. Experimental results on the SYSU-MM01 and RegDB datasets show the effectiveness of the proposed method: on SYSU-MM01, the Rank-1 accuracy reaches 71.77% in the all-search single-shot mode and 78.24% in the indoor-search single-shot mode.
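
The second and third steps (multi-scale aggregation, then multi-scale guidance) can be sketched together: parallel convolutions collect context under different receptive fields, and the pooled context re-weights the input channel-wise. Again a hedged sketch, not the paper's code: the class name and gating design are our assumptions, and we read the receptive-field settings as kernel sizes, defaulting to the {1, 3} pair that Table 4 finds best.

```python
import torch
import torch.nn as nn

class MultiScaleGuide(nn.Module):
    """Aggregate multi-scale context and use it as guidance (sketch only)."""
    def __init__(self, channels: int, kernel_sizes=(1, 3)):
        super().__init__()
        # One convolution branch per receptive field.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, bias=False)
            for k in kernel_sizes
        )
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),           # squeeze to global context
            nn.Conv2d(channels, channels, 1),  # per-channel guidance weights
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sum the contexts gathered under the different receptive fields ...
        context = sum(branch(x) for branch in self.branches)
        # ... and let them guide a channel-wise re-weighting of the input.
        return x + x * self.gate(context)

x = torch.randn(2, 2048, 12, 5)
print(MultiScaleGuide(2048)(x).shape)  # torch.Size([2, 2048, 12, 5])
```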

  • Figure 1.  Structure of MFANet

    Figure 2.  Adjacent-level feature aggregation module

    Figure 3.  Multi-scale aggregation module

    Figure 4.  Multi-scale feature aggregation module

    Figure 5.  Intra-class and inter-class distances and feature distributions

    Figure 6.  Heat maps under different receptive fields

    Figure 7.  Visualized ranking results on SYSU-MM01
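
Heat maps such as those in Figure 6 are commonly produced with Grad-CAM, which the reference list includes as [28]. Below is a minimal, self-contained sketch of the technique; the ResNet-50 backbone, random input, target score, and input size are all stand-in assumptions, not the paper's actual visualization pipeline.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

model = resnet50(weights=None).eval()
store = {}

def hook(module, inputs, output):
    output.retain_grad()   # keep the gradient of this non-leaf activation
    store["fmap"] = output

model.layer4.register_forward_hook(hook)   # last conv stage

x = torch.randn(1, 3, 288, 144)   # Re-ID-style person crop (random stand-in)
model(x).max().backward()         # back-propagate the top logit

fmap = store["fmap"]
weights = fmap.grad.mean(dim=(2, 3), keepdim=True)  # channel importance
cam = F.relu((weights * fmap).sum(dim=1))           # Grad-CAM heat map
cam = cam / (cam.max() + 1e-8)                      # normalize to [0, 1]
print(cam.shape)                                    # torch.Size([1, 9, 5])
```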

    Table 1.  Comparison results on the SYSU-MM01 dataset

    Method           |              All search              |            Indoor search
                     | rank-1 rank-10 rank-20  mAP    mINP  | rank-1 rank-10 rank-20  mAP    mINP
    One-stream[14]   | 12.04   49.68   66.74  13.67    -    | 16.94   63.55   82.10  22.95    -
    Two-stream[14]   | 11.65   47.99   65.50  12.85    -    | 15.60   61.18   81.02  21.49    -
    Zero-Padding[14] | 14.80   54.12   71.33  15.59    -    | 20.58   68.38   85.79  26.92    -
    HCML[18]         | 14.32   53.16   69.17  16.16    -    | 24.52   73.25   86.73  30.08    -
    BDTR[17]         | 27.32   66.96   81.07  27.32    -    | 31.92   77.18   89.28  41.86    -
    D2RL[4]          | 28.90   70.60   82.40  29.20    -    |   -       -       -      -      -
    AlignGAN[19]     | 42.03   85.25   93.73  41.48    -    | 45.86   90.17   95.39  55.18    -
    AGW[16]          | 47.58   84.45   92.11  47.69  35.30  | 54.29   91.14   95.99  63.02  59.23
    Xmodal[20]       | 49.92   89.79   95.96  50.73    -    |   -       -       -      -      -
    DDAG[8]          | 53.61   89.17   95.30  52.02  39.62  | 58.37   91.92   97.42  65.44  62.61
    CM-NAS[21]       | 62.04   92.92   97.31  60.00    -    | 67.03   97.02   99.34  72.97    -
    CAJ[9]           | 68.23   95.59   98.49  65.32  53.61  | 74.01   97.79   99.67  78.52  76.79
    MPANet[6]        | 70.07   95.39   98.39  67.07    -    | 76.35   97.56   99.48  80.16    -
    PIC[23]          | 57.50     -       -    55.10    -    | 60.40     -       -    67.70    -
    DART[24]         | 68.72   96.39   98.96  66.29  53.26  | 72.52   97.84   99.46  78.17  74.94
    SPOT[10]         | 65.34   92.73   97.04  62.25  48.86  | 69.42   96.22   99.12  74.63  70.48
    DML[7]           | 58.40   91.20   95.80  56.10    -    | 62.40   95.20   98.70  69.50    -
    PMT[12]          | 67.53   95.36   98.64  64.98  51.86  | 71.66   96.73   99.25  76.52  72.74
    SFANet[25]       | 65.74   92.98   97.05  60.83    -    | 71.60   96.60   99.45  80.05    -
    SIDA[26]         | 68.36   95.91   98.56  64.19    -    | 73.28   97.35   99.52  77.49    -
    MTMFE[27]        | 69.47   96.42   99.11  66.41    -    | 72.56   96.98   99.20  76.58    -
    Ours             | 71.77   96.15   98.70  68.43  55.21  | 78.24   98.23   99.49  81.90  78.44
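
For readers unfamiliar with the metrics in Tables 1-4, the following NumPy sketch computes Rank-1, average precision (AP), and the inverse negative penalty (INP) for a single query under the usual definitions; mAP and mINP average these over all queries, with mINP as introduced alongside AGW [16]. The dataset-specific protocol details (camera filtering, repeated gallery sampling) are omitted, so treat this as illustrative rather than the official evaluation code.

```python
import numpy as np

def evaluate_query(sim, gallery_ids, query_id):
    """Rank-1 / AP / INP for one query that has at least one true match."""
    order = np.argsort(-sim)                            # sort gallery by similarity
    hits = np.where(gallery_ids[order] == query_id)[0]  # 0-based ranks of true matches
    rank1 = float(hits[0] == 0)
    # AP: precision at each correct match, averaged over the matches.
    ap = float(np.mean((np.arange(len(hits)) + 1) / (hits + 1)))
    # INP: fraction of matches retrieved by the hardest (last) match's rank.
    inp = len(hits) / (hits[-1] + 1)
    return rank1, ap, inp

sim = np.array([0.9, 0.2, 0.8, 0.1])         # similarities to 4 gallery images
ids = np.array([7, 3, 7, 5])                 # gallery identities
print(evaluate_query(sim, ids, query_id=7))  # (1.0, 1.0, 1.0)
```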

    Table 2.  Comparison results on the RegDB dataset

    Method           |          Visible to infrared         |          Infrared to visible
                     | rank-1 rank-10 rank-20  mAP    mINP  | rank-1 rank-10 rank-20  mAP    mINP
    Zero-Padding[14] | 17.75   34.21   44.35  18.90    -    | 16.63   34.68   44.25  17.82    -
    HCML[18]         | 24.44   47.53   56.78  20.08    -    | 21.70   45.02   55.58  22.24    -
    BDTR[17]         | 33.56   58.61   67.43  32.76    -    | 32.92   58.46   68.43  31.96    -
    D2RL[4]          | 43.40   66.10   76.30  44.10    -    |   -       -       -      -      -
    AGW[16]          | 70.05   86.21   91.55  66.37  50.19  | 70.49   87.21   91.84  65.90  51.24
    Xmodal[20]       | 62.21   83.13   91.72  60.18    -    |   -       -       -      -      -
    DDAG[8]          | 69.34   85.77   89.98  63.19  49.24  | 64.77   83.85   88.90  58.54  48.62
    CM-NAS[21]       | 84.54   95.18   97.85  80.32    -    | 82.56   94.52   97.37  78.31    -
    MCLNet[22]       | 80.31   92.70   96.03  73.07    -    | 75.93   90.93   94.59  69.49    -
    CAJ[9]           | 84.72   95.17   97.38  78.70  65.33  | 84.09   94.79   97.11  77.25  61.56
    PIC[23]          | 83.60     -       -    79.60    -    | 79.50     -       -    77.40    -
    DART[24]         | 78.23     -       -    67.04  48.36  | 75.04     -       -    64.38  43.32
    SPOT[10]         | 80.35   93.48   96.44  72.46  56.19  | 79.37   92.79   96.01  72.26  56.06
    DML[7]           | 77.60     -       -    84.30    -    | 77.00     -       -    83.60    -
    PMT[12]          | 84.83     -       -    76.55    -    | 84.16     -       -    75.13    -
    SFANet[25]       | 76.31   91.02   94.27  68.00    -    | 70.15   85.24   89.27  63.77    -
    SIDA[26]         | 81.73     -     96.55  75.07    -    | 79.71     -     95.47  72.60    -
    MTMFE[27]        | 85.04   94.38   97.22  82.52    -    | 81.11   92.35   96.19  79.59    -
    Ours             | 85.38   95.39   97.54  79.49  65.72  | 84.58   95.27   97.23  78.02  62.22

    Table 3.  Ablation study of four different settings on the SYSU-MM01 dataset

    AFAM MSAM |              All search              |            Indoor search
              | rank-1 rank-10 rank-20  mAP    mINP  | rank-1 rank-10 rank-20  mAP    mINP
      -    -  | 68.23   95.59   98.49  65.32  51.90  | 74.01   97.79   99.67  78.52  74.78
      ✓    -  | 69.30   95.69   98.41  65.95  52.14  | 75.27   97.84   99.48  79.51  75.79
      -    ✓  | 70.89   95.88   98.52  67.61  54.30  | 77.69   97.43   99.25  81.09  77.48
      ✓    ✓  | 71.77   96.15   98.70  68.43  55.21  | 78.24   98.23   99.49  81.90  78.44

    Table 4.  Receptive field analysis of the multi-scale feature aggregation module

    Receptive fields |              All search              |            Indoor search
                     | rank-1 rank-10 rank-20  mAP    mINP  | rank-1 rank-10 rank-20  mAP    mINP
    1,3,5,7          | 69.41   95.82   98.54  66.27  52.96  | 75.34   97.96   99.66  79.98  76.53
    1,2,3,4          | 70.17   95.51   98.51  67.14  54.10  | 76.44   98.09   99.60  80.65  77.22
    1,3,5            | 70.50   95.77   98.54  67.11  53.68  | 76.66   97.81   99.51  80.54  76.99
    1,2,3            | 70.53   95.86   98.67  67.21  53.77  | 76.59   97.88   99.37  80.68  77.22
    1,3              | 71.77   96.15   98.70  68.43  55.21  | 78.24   98.23   99.49  81.90  78.44
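
Table 4 also trades accuracy against branch count: every extra receptive field adds a parallel convolution. The hypothetical helper below (assuming the settings denote kernel sizes and 256-channel features) merely counts the parameters such branches would add per setting, showing that the best-performing {1, 3} pair is also the cheapest.

```python
import torch.nn as nn

def branch_params(channels: int, kernel_sizes) -> int:
    """Parameters of parallel conv branches, one per receptive field."""
    branches = nn.ModuleList(
        nn.Conv2d(channels, channels, k, padding=k // 2, bias=False)
        for k in kernel_sizes
    )
    return sum(p.numel() for p in branches.parameters())

# The settings from Table 4, largest to smallest:
for ks in [(1, 3, 5, 7), (1, 2, 3, 4), (1, 3, 5), (1, 2, 3), (1, 3)]:
    print(ks, f"{branch_params(256, ks) / 1e6:.2f}M params")
```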
  • [1] Liu L, Li X, Lei X M. A person re-identification method with multi-scale and multi-feature fusion[J]. J Comput-Aided Des Comput Graphics, 2022, 34(12): 1868−1876. doi: 10.3724/SP.J.1089.2022.19218
    [2] Shi Y X, Zhou Y. Person re-identification based on stepped feature space segmentation and local attention mechanism[J]. J Electron Inf Technol, 2022, 44(1): 195−202. doi: 10.11999/JEIT201006
    [3] Wang S, Ji P, Zhang Y Z, et al. Adaptive receptive network for person re-identification[J]. Control Decis, 2022, 37(1): 119−126. doi: 10.13195/j.kzyjc.2020.0505
    [4] Wang Z X, Wang Z, Zheng Y Q, et al. Learning to reduce dual-level discrepancy for infrared-visible person re-identification[C]//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 618–626. https://doi.org/10.1109/CVPR.2019.00071
    [5] Zhong X, Lu T Y, Huang W X, et al. Grayscale enhancement colorization network for visible-infrared person re-identification[J]. IEEE Trans Circ Syst Video Technol, 2022, 32(3): 1418−1430. doi: 10.1109/TCSVT.2021.3072171
    [6] Wu Q, Dai P Y, Chen J, et al. Discover cross-modality nuances for visible-infrared person re-identification[C]//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 4330–4339. https://doi.org/10.1109/CVPR46437.2021.00431
    [7] Zhang D M, Zhang Z Z, Ju Y, et al. Dual mutual learning for cross-modality person re-identification[J]. IEEE Trans Circ Syst Video Technol, 2022, 32(8): 5361−5373. doi: 10.1109/TCSVT.2022.3144775
    [8] Ye M, Shen J B, Crandall D J, et al. Dynamic dual-attentive aggregation learning for visible-infrared person re-identification[C]//Proceedings of the 16th European Conference on Computer Vision, 2020: 229–247. https://doi.org/10.1007/978-3-030-58520-4_14
    [9] Ye M, Ruan W J, Du B, et al. Channel augmented joint learning for visible-infrared recognition[C]//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision, 2021: 13567–13576. https://doi.org/10.1109/ICCV48922.2021.01331
    [10] Chen C Q, Ye M, Qi M B, et al. Structure-aware positional transformer for visible-infrared person re-identification[J]. IEEE Trans Image Process, 2022, 31: 2352−2364. doi: 10.1109/TIP.2022.3141868
    [11] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017: 6000–6010.
    [12] Lu H, Zou X Z, Zhang P P. Learning progressive modality-shared transformers for effective visible-infrared person re-identification[C]//Proceedings of the 37th AAAI Conference on Artificial Intelligence, 2023: 1835–1843. https://doi.org/10.1609/aaai.v37i2.25273
    [13] Lin B B, Zhang S L, Yu X. Gait recognition via effective global-local feature representation and local temporal aggregation[C]//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision, 2021: 14648–14656. https://doi.org/10.1109/ICCV48922.2021.01438
    [14] Wu A C, Zheng W S, Yu H X, et al. RGB-infrared cross-modality person re-identification[C]//Proceedings of 2017 IEEE International Conference on Computer Vision, 2017: 5380–5389. https://doi.org/10.1109/ICCV.2017.575
    [15] Nguyen D T, Hong H G, Kim K W, et al. Person recognition system based on a combination of body images from visible light and thermal cameras[J]. Sensors, 2017, 17(3): 605. doi: 10.3390/s17030605
    [16] Ye M, Shen J B, Lin G J, et al. Deep learning for person re-identification: a survey and outlook[J]. IEEE Trans Pattern Anal Mach Intell, 2022, 44(6): 2872−2893. doi: 10.1109/TPAMI.2021.3054775
    [17] Ye M, Lan X Y, Wang Z, et al. Bi-directional center-constrained top-ranking for visible thermal person re-identification[J]. IEEE Trans Inf Forensics Secur, 2020, 15: 407−419. doi: 10.1109/TIFS.2019.2921454
    [18] Ye M, Lan X Y, Li J W, et al. Hierarchical discriminative learning for visible thermal person re-identification[C]//Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018: 919. https://doi.org/10.1609/aaai.v32i1.12293
    [19] Wang G A, Zhang T Z, Cheng J, et al. RGB-infrared cross-modality person re-identification via joint pixel and feature alignment[C]//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision, 2019: 3623–3632. https://doi.org/10.1109/ICCV.2019.00372
    [20] Li D G, Wei X, Hong X P, et al. Infrared-visible cross-modal person re-identification with an X modality[C]//Proceedings of the 34th AAAI Conference on Artificial Intelligence, 2020: 4610–4617. https://doi.org/10.1609/aaai.v34i04.5891
    [21] Fu C Y, Hu Y B, Wu X, et al. CM-NAS: cross-modality neural architecture search for visible-infrared person re-identification[C]//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision, 2021: 11823–11832. https://doi.org/10.1109/ICCV48922.2021.01161
    [22] Hao X, Zhao S Y, Ye M, et al. Cross-modality person re-identification via modality confusion and center aggregation[C]//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision, 2021: 16403–16412. https://doi.org/10.1109/ICCV48922.2021.01609
    [23] Zheng X T, Chen X M, Lu X Q. Visible-infrared person re-identification via partially interactive collaboration[J]. IEEE Trans Image Process, 2022, 31: 6951−6963. doi: 10.1109/TIP.2022.3217697
    [24] Yang M X, Huang Z Y, Hu P, et al. Learning with twin noisy labels for visible-infrared person re-identification[C]//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 14308–14317. https://doi.org/10.1109/CVPR52688.2022.01391
    [25] Liu H J, Ma S, Xia D X, et al. SFANet: a spectrum-aware feature augmentation network for visible-infrared person reidentification[J]. IEEE Trans Neural Netw Learn Syst, 2023, 34(4): 1958−1971. doi: 10.1109/TNNLS.2021.3105702
    [26] Gong J H, Zhao S Y, Lam K M, et al. Spectrum-irrelevant fine-grained representation for visible–infrared person re-identification[J]. Comput Vis Image Underst, 2023, 232: 103703. doi: 10.1016/j.cviu.2023.103703
    [27] Huang N C, Liu J N, Luo Y J, et al. Exploring modality-shared appearance features and modality-invariant relation features for cross-modality person re-identification[J]. Pattern Recogn, 2023, 135: 109145. doi: 10.1016/j.patcog.2022.109145
    [28] Selvaraju R R, Cogswell M, Das A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[J]. Int J Comput Vis, 2020, 128(2): 336−359. doi: 10.1007/s11263-019-01228-7

Article history
Received: 2023-06-15
Revised: 2023-08-10
Accepted: 2023-08-11
Published: 2023-08-20
