
Xue L X, Zhu Z F, Wang R G, et al. Person re-identification by multi-division attention[J]. Opto-Electron Eng, 2020, 47(11): 190628. doi: 10.12086/oee.2020.190628
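    *Corresponding author: Yang Juan (born 1983), female, Ph.D., lecturer and master's supervisor. Her research covers video information processing, video big-data processing, deep learning, and the theory and application of binary neural networks. E-mail: yangjuan6985@163.com
  • CLC number: TP391.4; TP301.6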

Person re-identification by multi-division attention

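  • Abstract: Person re-identification is a challenging and practically significant task in computer vision with broad application prospects. Background clutter, arbitrarily varying pedestrian poses, and uncontrollable camera angles all pose substantial obstacles to person re-identification research. To extract more discriminative person features, this paper proposes a network architecture based on multi-division attention. The network learns robust and discriminative person feature representations simultaneously from the global image and from different local images, which effectively improves recognition performance on the person re-identification task. In addition, a dual attention network, composed of spatial attention and channel attention, is designed in the local branches to optimize the extraction of local features. Experimental results show that the network achieves a mean average precision (mAP) of 82.94%, 72.17%, and 71.76% on the Market-1501, DukeMTMC-reID, and CUHK03 datasets, respectively.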

  • Overview: With the spread of surveillance cameras in public areas, person re-identification has become increasingly important and is now a core technology in video content retrieval, video surveillance, and intelligent security. In real application scenarios, however, camera viewing angles, complex lighting changes, varying pedestrian poses, occlusion, clothing, and background clutter cause images of the same person to differ significantly across cameras, which poses a great challenge for person re-identification research. In this paper we therefore propose a method based on deep convolutional networks that combines global and local person features with attention mechanisms. First, unlike traditional methods, we use a ResNet50 network to extract initial person image features with stronger discriminative ability. Then, following the inherent structure of the human body, the image is divided into several horizontal stripes, which are fed into local branches with built-in attention mechanisms to extract local attention features. At the same time, the whole image is fed into the global branch to extract global features. Finally, the global features and the local attention features are fused to compute the loss function. To better extract local features, we design two local branches that split the person image into different numbers of local regions. As the number of regions increases, the network learns more detailed and discriminative local features in each region and, combined with the attention mechanism, can filter out much of the irrelevant information in local images. The proposed attention mechanism makes the network focus on the regions that matter for identification, so that in the output attention features the person regions usually respond more strongly than non-target regions. The attention networks we design comprise a spatial attention network and a channel attention network, which complement each other to learn the optimal attention features and thereby extract more discriminative local features. Experimental results show that the proposed method effectively improves person re-identification performance.
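
The pipeline described in the overview (horizontal bands or stripes, a spatial and a channel attention weighting per stripe, and fusion with a global feature) can be illustrated with a minimal pure-Python sketch. It is not the authors' SANet/CANet: the two gates below are parameter-free stand-ins with no learned weights, the stripe counts `(2, 3)` are an assumption (the text only says the two local branches use different numbers of regions), and fusion by concatenation is likewise assumed. All function names are hypothetical.

```python
import math


def split_into_stripes(feature_map, num_stripes):
    """Split a C x H x W feature map (nested lists) into equal horizontal stripes."""
    height = len(feature_map[0])
    if height % num_stripes != 0:
        raise ValueError("height must be divisible by num_stripes")
    rows = height // num_stripes
    return [
        [channel[s * rows:(s + 1) * rows] for channel in feature_map]
        for s in range(num_stripes)
    ]


def channel_gate(stripe):
    """Stand-in for channel attention: sigmoid of each channel's mean response."""
    gates = []
    for channel in stripe:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        gates.append(1.0 / (1.0 + math.exp(-mean)))
    return gates


def spatial_gate(stripe):
    """Stand-in for spatial attention: softmax over positions of the channel-mean map."""
    num_channels = len(stripe)
    rows, cols = len(stripe[0]), len(stripe[0][0])
    mean_map = [
        [sum(stripe[c][r][w] for c in range(num_channels)) / num_channels
         for w in range(cols)]
        for r in range(rows)
    ]
    peak = max(max(row) for row in mean_map)  # subtract the max for numerical stability
    exps = [[math.exp(v - peak) for v in row] for row in mean_map]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def attended_stripe_feature(stripe):
    """Spatially weighted sum per channel, rescaled by the channel gate."""
    c_gate = channel_gate(stripe)
    s_gate = spatial_gate(stripe)
    feature = []
    for c, channel in enumerate(stripe):
        pooled = sum(
            channel[r][w] * s_gate[r][w]
            for r in range(len(channel))
            for w in range(len(channel[0]))
        )
        feature.append(c_gate[c] * pooled)
    return feature


def global_feature(feature_map):
    """Global average pooling: one value per channel."""
    return [
        sum(v for row in channel for v in row) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def fused_descriptor(feature_map, stripe_counts=(2, 3)):
    """Concatenate the global feature with the attended features of every stripe."""
    descriptor = global_feature(feature_map)
    for num_stripes in stripe_counts:
        for stripe in split_into_stripes(feature_map, num_stripes):
            descriptor.extend(attended_stripe_feature(stripe))
    return descriptor
```

For a feature map with C channels, the descriptor has C × (1 + 2 + 3) entries under the assumed stripe counts. In a trained network the gates would be produced by learned layers rather than computed directly from the activations.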

  • Figure 1.  Overview of the proposed MDA network for person re-identification

    Figure 2.  Detailed structure of the SANet subnetwork

    Figure 3.  Detailed structure of the CANet subnetwork

    Figure 4.  Top-10 ranking lists for some query images

    Figure 5.  Comparison of different branch combinations

    Figure 6.  Evaluation of how DLA enhances person re-identification

    Table 1.  Backbone network structure

    Layer name | Shared | Patch size | Output size
    Input data | - | - | 384×128, 3
    Backbone / Conv2d | Yes | 7×7, 64 | 192×64, 64
    Backbone / BN | Yes | 64 | 192×64, 64
    Backbone / Max pool | Yes | 3×3, 64 | 96×32, 64
    Backbone / Conv2_x | Yes | $\left[\begin{array}{c} 1 \times 1, 64 \\ 3 \times 3, 64 \\ 1 \times 1, 256 \end{array}\right] \times 3$ | 96×32, 256
    Backbone / Conv3_x | Yes | $\left[\begin{array}{c} 1 \times 1, 128 \\ 3 \times 3, 128 \\ 1 \times 1, 512 \end{array}\right] \times 4$ | 48×16, 512
    Backbone / Conv4_1 | Yes | $\left[\begin{array}{c} 1 \times 1, 256 \\ 3 \times 3, 256 \\ 1 \times 1, 1024 \end{array}\right]$ | 24×8, 1024
    G_conv / Conv5_x | No | $\left[\begin{array}{c} 1 \times 1, 512 \\ 3 \times 3, 512 \\ 1 \times 1, 2048 \end{array}\right] \times 3$ | 12×4, 2048
    P_conv i (i∈[1, 2]) / Conv5_x | No | $\left[\begin{array}{c} 1 \times 1, 512 \\ 3 \times 3, 512 \\ 1 \times 1, 2048 \end{array}\right] \times 3$ | 24×8, 2048
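
The output sizes in Table 1 can be checked with a little stride arithmetic. The sketch below is only a consistency check: the table does not list strides, so the values used here are the standard ResNet-50 strides for the shared stages, and the stride of 1 for the P_conv branches is inferred from their 24×8 output rather than stated in the source. The names `SHARED_STAGES`, `downsample`, and `trace_sizes` are hypothetical.

```python
def downsample(size, stride):
    """Spatial size after a stage with the given stride (ceiling division)."""
    height, width = size
    return (-(-height // stride), -(-width // stride))


# (stage name, stride). Assumed strides: standard ResNet-50 layout.
SHARED_STAGES = [
    ("Conv2d", 2),
    ("Max pool", 2),
    ("Conv2_x", 1),
    ("Conv3_x", 2),
    ("Conv4_1", 2),
]


def trace_sizes(input_size=(384, 128)):
    """Return the spatial output size of every stage, keyed by stage name."""
    sizes = {}
    size = input_size
    for name, stride in SHARED_STAGES:
        size = downsample(size, stride)
        sizes[name] = size
    # The global branch keeps the usual stride-2 Conv5_x.
    sizes["G_conv"] = downsample(size, 2)
    # Inferred from the table: the local branches do not downsample in Conv5_x.
    sizes["P_conv"] = downsample(size, 1)
    return sizes
```

Under these assumptions the local branches keep a 24×8 map, twice the resolution of the global branch in each dimension, which leaves more rows to divide into horizontal stripes.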

    Table 2.  Comparison of results on Market-1501

    Methods | Rank-1/% | mAP/%
    PCB+RPP[18] | 93.80 | 81.60
    Spindle[16] | 76.90 | 64.67
    PDC[22] | 84.14 | 63.41
    Part-Aligned[24] | 81.00 | 63.40
    AlignedReID[31] | 91.80 | 79.30
    APR[30] | 87.04 | 66.89
    HA-CNN[25] | 91.20 | 75.70
    Hydraplus-net[26] | 91.80 | -
    DuATM[32] | 91.40 | 76.60
    Ours | 94.03 | 82.94
    Ours(RK) | 94.98 | 90.27
    "RK" denotes results obtained with the re-ranking[29] operation applied.

    Table 3.  Comparison of results on DukeMTMC-ReID

    Methods | Rank-1/% | mAP/%
    PAN[19] | 71.59 | 51.51
    PCB+RPP[18] | 83.30 | 69.20
    APR[30] | 73.92 | 55.56
    HA-CNN[25] | 80.50 | 63.80
    DuATM[32] | 81.16 | 67.73
    Ours | 84.68 | 72.17

    Table 4.  Comparison of results on CUHK03

    Methods | CUHK03-Labeled Rank-1/% | CUHK03-Labeled mAP/% | CUHK03-Detected Rank-1/% | CUHK03-Detected mAP/%
    PAN[19] | 36.86 | 35.03 | 36.29 | 34.00
    PCB+RPP[18] | - | - | 63.70 | 57.50
    PDC[22] | 88.70 | - | 78.29 | -
    PAN[19] | 36.30 | 34.0 | 36.90 | 35.0
    Part-Aligned[24] | 68.90 | - | 65.64 | -
    HA-CNN[25] | 44.40 | 41.0 | 41.70 | 38.60
    Ours | 75.36 | 71.76 | 73.53 | 65.91
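
The two metrics reported in Tables 2 to 4, Rank-1 accuracy and mean average precision (mAP), can be computed from ranked gallery lists as sketched below. This is a generic illustration, not the authors' evaluation code: it assumes each query has already been reduced to a list of booleans in ranked order (True where the gallery image shows the same identity) and that any protocol-specific gallery filtering has been done beforehand. The function names are hypothetical.

```python
def average_precision(ranked_matches):
    """Average precision for one query, given match flags in ranked order."""
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / hits if hits else 0.0


def rank1_and_map(all_ranked_matches):
    """Return (Rank-1 accuracy, mAP) over a list of per-query match lists."""
    num_queries = len(all_ranked_matches)
    rank1 = sum(1 for m in all_ranked_matches if m and m[0]) / num_queries
    mean_ap = sum(average_precision(m) for m in all_ranked_matches) / num_queries
    return rank1, mean_ap
```

Rank-1 only looks at the top result of each query, while mAP rewards placing every correct gallery image near the top, which is why re-ranking can raise mAP much more than Rank-1, as in the "Ours(RK)" row of Table 2.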
Received:  2019-10-17
Revised:  2020-03-10
Published:  2020-11-15


