
Wang Ronggui, Yao Xuchen, Yang Juan, et al. Deep transfer learning for fine-grained categorization on micro datasets[J]. Opto-Electronic Engineering, 2019, 46(6): 180416. doi: 10.12086/oee.2019.180416
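    *Corresponding author: Yang Juan (born 1983), female, PhD, lecturer. Her research covers video information processing, video big data processing technology, deep learning, and the theory and application of binary neural networks. E-mail: yangjuan6985@163.com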
  • CLC number: TP18

Deep transfer learning for fine-grained categorization on micro datasets

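  • Abstract: Existing fine-grained categorization models use not only the image category labels but also a large amount of manually annotated extra information. To address this problem, this paper proposes a deep transfer learning model that effectively transfers the image features learned on large-scale labelled fine-grained datasets to micro fine-grained datasets. First, a cohesion domain is used to quantitatively compute the degree of correlation between the tasks of the two domains. Then, the transferable features suitable for the target domain are selected according to this correlation. Finally, the perspective-class labels of the fine-grained datasets are used for auxiliary learning, and all attributes are learned jointly to obtain more feature representations. Experiments show that the proposed method not only achieves high accuracy but also effectively reduces model training time, and they support the conclusion that inter-domain feature transfer can accelerate network learning and optimization.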

  • Overview: Fine-grained categorization is challenging because inter-class variance is small and intra-class variance is large. Moreover, labelling requires domain expertise, which makes fine-grained labelled data much more expensive to acquire. Existing models predominantly require extra information, such as bounding boxes and part annotations, in addition to the image category labels, and producing it involves heavy manual labor. To solve this problem, we propose a novel deep transfer learning model that transfers the representations learned from large-scale labelled fine-grained datasets to micro fine-grained datasets. Because a deep network is a unified training and prediction framework that combines multi-level feature extractors and recognizers, end-to-end processing is particularly important, and our model is designed to take full advantage of the end-to-end processing ability of the convolutional neural network itself. Feature transfer learning can use existing data to rapidly construct the corresponding network parameters for new data through end-to-end training. It assumes that the source domain and the target domain share some common cross-features, so that data from each domain can be transformed into the same feature space for subsequent learning. We present a novel discriminative training method for learning a similarity measure, which introduces a cohesion domain to quantify the correlation between the two domains. First, we introduce a cohesion domain to measure the degree of correlation between the source domain and the target domain. Second, we select the transferable features that are suitable for the target domain based on this correlation. Finally, we make the most of perspective-class labels for auxiliary learning and learn all the attributes jointly to extract more feature representations.
Our model aims to make joint adjustments from end to end. We expect to explore abundant source-domain attributes through cross-domain learning and to capture more complex cross-domain knowledge by embedding cross-dataset information, so as to minimize the original loss functions of the learning tasks in the two domains as far as possible. To address the inter-domain transfer of the network, we freeze part of the network layers to extract relatively well-defined representations of labelled fine-grained samples for transfer to the target domain, since feature learning can collect hierarchical information that is not affected by the training data. In this way, the highly non-convex model optimization problem is simplified and can also be adjusted from a more local perspective. Subsequent incremental learning can then limit the switching task to its own domain, which also helps multi-task parallel training share the representations learned from different tasks. The experiments show that our model not only achieves high categorization accuracy but also reduces training time effectively, and they support the conclusion that inter-domain feature transfer can accelerate learning and optimization.
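The two mechanisms named in the overview, freezing part of the layers and learning the category task jointly with an auxiliary perspective-class task, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the weighting factor `aux_weight`, and the simple weighted sum of two cross-entropy terms are assumptions made only for illustration, and the overview does not state how the two losses are actually combined.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability assigned to the correct label."""
    return -math.log(softmax(logits)[label])


def joint_loss(category_logits, category_label,
               view_logits, view_label, aux_weight=0.5):
    """Main category loss plus a weighted auxiliary perspective-class loss.

    aux_weight is a hypothetical hyperparameter, not taken from the paper.
    """
    main = cross_entropy(category_logits, category_label)
    aux = cross_entropy(view_logits, view_label)
    return main + aux_weight * aux


def trainable_flags(num_layers, frozen_upto):
    """Return one flag per layer: False for frozen layers 1..frozen_upto
    (which keep the transferred source-domain weights), True otherwise."""
    return [layer > frozen_upto for layer in range(1, num_layers + 1)]
```

For example, `trainable_flags(8, 5)` marks the first five layers of a hypothetical eight-layer network as frozen and leaves the remaining three trainable, while `joint_loss` gives the scalar that an optimizer would minimize over the trainable layers.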

  • Figure 1.  Overall view of the network architecture

    Figure 2.  Micro network in the cohesion domain

    Figure 3.  Perspective-class labels retrieved from four different viewpoints for the three fine-grained datasets

    Figure 4.  Time performance on the source-domain dataset Stanford Cars with and without our method

    Figure 5.  Time performance on the source-domain dataset Stanford Dogs with and without our method

    Figure 6.  Time performance on the source-domain dataset CUB-200-2011 with and without our method

    Table 1.  Comparison of categorization results with a single task and with an added auxiliary task

    Dataset                 Single-task   Auxiliary-task
    BMVC[15]                97.4          97.7
    BMW-10[16]              80            80.32
    Oxford-IIIT Pet[17]     83.6          84.2
    Birds[18]               99.8          99.8

    Table 2.  Comparison of results when each candidate of the voting mechanism performs transfer learning independently as a single individual

    Dataset                 Frozen 1-4   Frozen 1-5   Frozen 1-6
    BMVC[15]                97.4         97.7         97.2
    BMW-10[16]              76.7         80.23        74
    Oxford-IIIT Pet[17]     83.7         84.2         83.55
    Birds[18]               99.8         99.8         99.6
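
Table 2 treats the networks with different frozen ranges as candidates of a voting mechanism. How the votes are combined is not described here, so the following is only a minimal sketch that assumes plain majority voting over the candidates' predicted labels. The function name `majority_vote` and the tie-breaking rule (the earliest candidate among the tied labels wins) are hypothetical.

```python
from collections import Counter


def majority_vote(candidate_predictions):
    """Return the label predicted by the most candidates.

    candidate_predictions holds one predicted label per candidate network,
    for example the networks with layers 1-4, 1-5 and 1-6 frozen.
    Ties are broken in favor of the earliest candidate (an assumption).
    """
    if not candidate_predictions:
        raise ValueError("at least one candidate prediction is required")
    counts = Counter(candidate_predictions)
    best = max(counts.values())
    for label in candidate_predictions:
        if counts[label] == best:
            return label
```

With three candidates, a label predicted by any two of them is selected. When all three disagree, this sketch falls back to the first candidate's prediction.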

    Table 3.  Comparison of categorization results with advanced methods on the target-domain micro fine-grained datasets

    Method          BMVC[15]   Method          BMW-10[16]   Method            Oxford-IIIT Pet[17]   Method         Birds[18]
    PHOW[2]         89.0       KDES[19]        46.5         GMP+XColor[20]    56.8                  CoCount[21]    55.22
    BoT[22]         96.6       BB[23]          58.7         DDTF[24]          57.5                  GP[25]         58.06
    LLC[31]         84.5       LLC[31]         52.8         Zernike+SCC[26]   59.5                  Low-rank[27]   74.5
    StructDPM[15]   93.5       structDPM[15]   29.1         BW-FMP[28]        69.6                  SNAK[29]       81.33
    BB-3D-G[16]     94.5       BB-3D-G[16]     66.1         MsML+[30]         81.18                 MEF-PB[18]     92.33
    AlexNet         94.95      AlexNet         57.2         AlexNet           82.5                  AlexNet        98.67
    SqueezeNet      96.65      SqueezeNet      74           SqueezeNet        83.38                 SqueezeNet     99.8
    Ours            97.7       Ours            80.23        Ours              84.2                  Ours           99.8

    Table 4.  Categorization results on Oxford-IIIT Pet[17] with the cat and dog data evaluated separately

    Method             Oxford-IIIT Pet-dog   Oxford-IIIT Pet-cat
    AlexNet            59.45                 40.3
    Ours-AlexNet       59.75                 43.6
    SqueezeNet         59.50                 44.4
    Ours-SqueezeNet    59.95                 44.7
