Few-shot image classification via multi-scale attention and domain adaptation

Citation: Chen L, Zhang J L, Peng H, et al. Few-shot image classification via multi-scale attention and domain adaptation[J]. Opto-Electron Eng, 2023, 50(4): 220232. doi: 10.12086/oee.2023.220232


  • Fund project: Young Scientists Fund of the National Natural Science Foundation of China (62101529)
  • *Corresponding author: Zhang Jianlin, jlin_zh@163.com
  • CLC number: TP391.4

  • Abstract: Few-shot image classification is a key problem in computer vision. To address the inaccurate class prototypes and poor generalization of metric-learning methods under few-shot conditions, this paper takes the following measures to improve few-shot classification accuracy. First, to alleviate sample scarcity, a masked autoencoder is used to augment the images and increase sample complexity. Second, a multi-scale attention module is designed to highlight class-relevant features and reduce the bias of class prototypes. Third, a domain adaptation module with a margin loss function is proposed to improve the representation ability of the embedding function, represent novel-class samples accurately, and strengthen the generalization ability of the model. Experiments on several public datasets show that the proposed method effectively improves few-shot image classification accuracy.
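
    As a concrete illustration of the masked-autoencoder augmentation step, a minimal sketch is given below, assuming PyTorch and 84×84 inputs; `mae_model` is a hypothetical pretrained masked autoencoder in the spirit of ref. [18] that reconstructs the hidden patches, and the paper's actual interface may differ:

```python
import torch

def mae_augment(images, mae_model, mask_ratio=0.75, patch=14):
    """Sketch of MAE-based augmentation: hide random patches and let a
    pretrained masked autoencoder (`mae_model`, hypothetical here) fill
    them in, producing extra training samples."""
    b, c, h, w = images.shape                  # e.g. [b, 3, 84, 84]
    gh, gw = h // patch, w // patch            # 6x6 patch grid for 84x84
    keep = torch.rand(b, gh, gw) > mask_ratio  # True = patch stays visible
    mask = keep.repeat_interleave(patch, dim=1).repeat_interleave(patch, dim=2)
    masked = images * mask.unsqueeze(1)        # zero out the hidden patches
    with torch.no_grad():                      # augmentation, not training
        return mae_model(masked)               # reconstructed images
```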

  • Overview: In recent years, deep learning has been highly successful in computer vision, but it is often hampered when training data are limited. Few-shot learning is proposed to tackle this problem: by exploiting prior knowledge, a model can generalize to new tasks that contain only a small number of samples. Metric learning is a common approach to few-shot classification; it maps samples into a feature space, computes a prototype for each class from the support samples, and classifies query samples with a similarity metric function. With so few samples per class, however, the computed prototypes are inaccurate and the generalization ability of the model is poor.
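
    As a reference point for this metric-learning pipeline, the sketch below (assuming PyTorch; `embed` stands in for the embedding network) computes each class prototype as the mean of its support embeddings and assigns queries to the nearest prototype, in the style of prototypical networks [13]:

```python
import torch

def classify_episode(embed, support, support_labels, query, n_way):
    """Metric-learning episode: prototypes are the mean embeddings of each
    class's support samples; queries go to the nearest prototype."""
    z_s = embed(support)                                  # [n_way*k_shot, d]
    z_q = embed(query)                                    # [n_query, d]
    prototypes = torch.stack([z_s[support_labels == c].mean(dim=0)
                              for c in range(n_way)])     # [n_way, d]
    dists = torch.cdist(z_q, prototypes)                  # [n_query, n_way]
    return (-dists).softmax(dim=-1)                       # class probabilities
```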

    To improve few-shot classification performance, we present a general and flexible method named the Multi-Scale Attention and Domain Adaptation network (MADA). First, to tackle the problem of limited samples, a masked autoencoder is used for image augmentation; it can be inserted into a few-shot classification pipeline as a plug-and-play module. Second, a multi-scale attention module adapts the feature vectors extracted by the embedding function to the current classification task. The multi-scale attention mechanism strengthens discriminative image regions by attending to related samples in both the base and novel classes, which makes the prototypes more accurate; the embedding function thereby attends to task-specific features. Third, a domain adaptation module addresses the domain shift caused by the difference between the data distributions of the two domains. It consists of a metric module and a margin loss function. The margin loss pushes different prototypes away from each other in the feature space, and sufficient margin space in the feature space improves the generalization performance of the method; a sketch of this idea follows below. The experimental results show that the classification accuracy of the proposed method is 67.45% for 5-way 1-shot and 82.77% for 5-way 5-shot on the miniImageNet dataset, and 70.67% for 5-way 1-shot and 85.10% for 5-way 5-shot on the tieredImageNet dataset, better than most previous methods. After dimensionality reduction and visualization of the features with t-SNE, it can be concluded that domain shift is alleviated and the prototypes are more accurate. Feature representations enhanced by the multi-scale attention module are more discriminative for the target classification task, and the domain adaptation module improves the generalization ability of the model.
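
    The margin-loss idea can be sketched as a hinge penalty on prototype pairs that sit closer than a chosen margin; this is an illustrative form under an assumed Euclidean distance, not the paper's exact loss:

```python
import torch

def prototype_margin_loss(prototypes, margin=1.0):
    """Hinge penalty on prototype pairs closer than `margin`, pushing class
    prototypes apart in feature space (illustrative, Euclidean distance)."""
    n = prototypes.size(0)
    pair_dists = torch.cdist(prototypes, prototypes)           # [n, n]
    upper = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
    return (margin - pair_dists[upper]).clamp(min=0).mean()    # hinge
```

    In training, such a term would be added to the usual episode classification loss, so that prototypes are simultaneously discriminative for the current task and well separated in the feature space.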

  • Figure 1.  Diagram of a 5-way 1-shot task

    Figure 2.  Framework of the few-shot image classification model with multi-scale attention and domain adaptation

    Figure 3.  Image restoration by the masked autoencoder

    Figure 4.  Structure of the multi-scale attention module

    Figure 5.  Operating principle of the domain adaptation module. (a) Model training stage; (b) Model testing stage

    Figure 6.  Model convergence on the miniImageNet dataset. (a) 5-way 1-shot; (b) 5-way 5-shot

    Figure 7.  Model loss on the miniImageNet dataset. (a) 5-way 1-shot; (b) 5-way 5-shot

    Figure 8.  Visualization of convolutional neural network features

    Figure 9.  Visualization of image features from five classes of miniImageNet. (a) Baseline method; (b) MADA method

    Figure 10.  Attention heat maps

    Figure 11.  Performance analysis of the domain adaptation module. (a) 5-way 1-shot; (b) 5-way 5-shot

    Table 1.  Structure of the backbone networks

    Layer           Output size    ResNet-12         Conv4
    Conv layer 1    42×42          [3×3, 64] × 3     [3×3, 64]
    Conv layer 2    21×21          [3×3, 160] × 3    [3×3, 64]
    Conv layer 3    10×10          [3×3, 320] × 3    [3×3, 64]
    Conv layer 4    5×5            [3×3, 640] × 3    [3×3, 64]
    Pooling         1×1            5×5 pool          5×5 pool
    Parameter size                 50 MB             0.46 MB
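
    For reference, the Conv4 column of Table 1 matches the standard four-block few-shot backbone. A sketch is given below, assuming PyTorch and the usual BatchNorm/ReLU/2×2 max-pooling per block (details Table 1 leaves unstated; the final pooling type is likewise an assumption):

```python
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # One stage: 3x3 conv -> BatchNorm -> ReLU -> 2x2 max-pool
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class Conv4(nn.Module):
    """Four [3x3, 64] stages plus a final 5x5 pool, as in the Conv4
    column of Table 1 (average pooling assumed for the last stage)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(3, 64), conv_block(64, 64),
            conv_block(64, 64), conv_block(64, 64),
        )
        self.pool = nn.AvgPool2d(5)            # 5x5 -> 1x1, per Table 1

    def forward(self, x):                      # x: [batch, 3, 84, 84]
        return self.pool(self.features(x)).flatten(1)  # [batch, 64]
```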

    Table 2.  Few-shot classification accuracies (%) with 95% confidence intervals on the miniImageNet dataset (10,000 episodes)

    Model                Backbone     5-way 1-shot    5-way 5-shot
    Matching Net [12]    Conv4        43.56±0.84      55.31±0.37
    Proto Net [13]       Conv4        49.42±0.78      68.20±0.66
    Relation Net [21]    Conv4        50.44±0.82      65.32±0.70
    MAML [9]             Conv4        48.70±1.84      63.11±0.92
    DN4 [14]             Conv4        51.24±0.74      71.02±0.64
    DSN [23]             Conv4        51.78±0.96      68.99±0.69
    BOIL [25]            Conv4        49.61±0.16      66.45±0.37
    MADA (ours)          Conv4        55.27±0.20      72.12±0.16
    Matching Net [12]    ResNet-12    65.64±0.20      78.73±0.15
    Proto Net [13]       ResNet-12    60.37±0.83      78.02±0.75
    DN4 [14]             ResNet-12    54.37±0.36      74.44±0.29
    DSN [23]             ResNet-12    62.64±0.66      78.73±0.45
    SNAIL [22]           ResNet-12    55.71±0.99      68.88±0.92
    CTM [24]             ResNet-12    64.12±0.82      80.51±0.14
    MADA (ours)          ResNet-12    67.45±0.20      82.77±0.13

    Table 3.  Few-shot classification accuracies (%) with 95% confidence intervals on the tieredImageNet dataset (10,000 episodes)

    Model                Backbone     5-way 1-shot    5-way 5-shot
    Matching Net [12]    ResNet-12    68.50±0.92      80.60±0.71
    Proto Net [13]       ResNet-12    65.65±0.92      83.40±0.65
    MetaOpt Net [26]     ResNet-12    65.99±0.72      81.56±0.53
    TPN [27]             ResNet-12    59.91±0.94      73.30±0.75
    CTM [24]             ResNet-12    68.41±0.39      84.28±1.74
    LEO [10]             ResNet-12    66.63±0.05      81.44±0.09
    MADA (ours)          ResNet-12    70.67±0.22      85.10±0.15

    Table 4.  Few-shot classification accuracies (%) with 95% confidence intervals on the CUB dataset (10,000 episodes)

    Model                Backbone     5-way 1-shot    5-way 5-shot
    Matching Net [12]    Conv4        60.52±0.88      75.29±0.75
    Proto Net [13]       Conv4        50.46±0.88      76.39±0.64
    Relation Net [21]    Conv4        61.10±0.79      76.11±0.69
    MAML [9]             Conv4        54.73±0.97      75.75±0.76
    Baseline++ [28]      Conv4        60.53±0.83      79.34±0.61
    DN4 [14]             Conv4        66.63±0.05      81.44±0.09
    MADA (ours)          Conv4        62.12±0.24      77.63±0.17

    Table 5.  Ablation study of few-shot classification accuracies (%) with 95% confidence intervals on the miniImageNet dataset (10,000 episodes)

    Network     MA    DA    DE    5-way 1-shot    5-way 5-shot
    Baseline    ×     ×     ×     60.37±0.83      78.02±0.75
    MA          ✓     ×     ×     65.84±0.23      81.94±0.34
    MADA        ✓     ✓     ×     67.21±0.18      82.41±0.48
    MADA+       ✓     ✓     ✓     67.45±0.20      82.77±0.13
  • [1]

    He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770–778. https://doi.org/10.1109/CVPR.2016.90.

    [2]

    Devlin J, Chang M W, Lee K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019: 4171–4186. https://doi.org/10.18653/v1/N19-1423.

    [3]

    赵春梅, 陈忠碧, 张建林. 基于深度学习的飞机目标跟踪应用研究[J]. 光电工程, 2019, 46(9): 180261. doi: 10.12086/oee.2019.180261

    Zhao C M, Chen Z B, Zhang J L. Application of aircraft target tracking based on deep learning[J]. Opto-Electron Eng, 2019, 46(9): 180261. doi: 10.12086/oee.2019.180261

    [4]

    石超, 陈恩庆, 齐林. 红外视频中的舰船检测[J]. 光电工程, 2018, 45(6): 170748. doi: 10.12086/oee.2018.170748

    Shi C, Chen E Q, Qi L. Ship detection from infrared video[J]. Opto-Electron Eng, 2018, 45(6): 170748. doi: 10.12086/oee.2018.170748

    [5]

    Tan J R, Wang C B, Li B Y, et al. Equalization loss for long-tailed object recognition[C]//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 11659–11668. https://doi.org/10.1109/CVPR42600.2020.01168.

    [6]

    Li F F, Fergus R, Perona P. A Bayesian approach to unsupervised one-shot learning of object categories[C]//Proceedings of the Ninth IEEE International Conference on Computer Vision, 2003: 1134−1141. https://doi.org/10.1109/ICCV.2003.1238476.

    [7]

    Mehrotra A, Dukkipati A. Generative adversarial residual pairwise networks for one shot learning[Z]. arXiv: 1703.08033, 2017. https://doi.org/10.48550/arXiv.1703.08033.

    [8]

    Chen Z T, Fu Y W, Zhang Y D, et al. Multi-level semantic feature augmentation for one-shot learning[J]. IEEE Trans Image Process, 2019, 28(9): 4594−4605. doi: 10.1109/TIP.2019.2910052

    [9]

    Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks[C]//Proceedings of the 34th International Conference on Machine Learning, 2017: 1126–1135. https://doi.org/10.5555/3305381.3305498.

    [10]

    Rusu A A, Rao D, Sygnowski J, et al. Meta-learning with latent embedding optimization[C]//Proceedings of the 7th International Conference on Learning Representations, 2019.

    [11]

    Frikha A, Krompaß D, Köpken H G, et al. Few-shot one-class classification via meta-learning[J]. Proc AAAI Conf Artif Intell, 2021, 35(8): 7448−7456. doi: 10.1609/aaai.v35i8.16913

    [12]

    Vinyals O, Blundell C, Lillicrap T, et al. Matching networks for one shot learning[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems, 2016: 3637–3645. https://doi.org/10.5555/3157382.3157504.

    [13]

    Snell J, Swersky K, Zemel R. Prototypical networks for few-shot learning[C]//Proceedings of the 31st Conference on Neural Information Processing Systems, 2017: 4077–4087.

    [14]

    Li W B, Wang L, Huo J L, et al. Revisiting local descriptor based image-to-class measure for few-shot learning[C]//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 7253−7260. https://doi.org/10.1109/CVPR.2019.00743.

    [15]

    Luo X, Wei L H, Wen L J, et al. Rectifying the shortcut learning of background for few-shot learning[C]//Proceedings of the 35th Conference on Neural Information Processing Systems, 2021.

    [16]

    Chen Y H, Li W, Sakaridis C, et al. Domain adaptive faster R-CNN for object detection in the wild[C]//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 3339−3348. https://doi.org/10.1109/CVPR.2018.00352.

    [17]

    Gong R, Li W, Chen Y H, et al. DLOW: domain flow for adaptation and generalization[C]//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 2472–2481. https://doi.org/10.1109/CVPR.2019.00258.

    [18]

    He K M, Chen X L, Xie S N, et al. Masked autoencoders are scalable vision learners[C]//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 15979–15988. https://doi.org/10.1109/CVPR52688.2022.01553.

    [19]

    Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017: 6000–6010. https://doi.org/10.5555/3295222.3295349.

    [20]

    Ren M Y, Triantafillou E, Ravi S, et al. Meta-learning for semi-supervised few-shot classification[C]//Proceedings of the 6th International Conference on Learning Representations, 2018.

    [21]

    Sung F, Yang Y X, Zhang L, et al. Learning to compare: relation network for few-shot learning[C]//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 1199–1208. https://doi.org/10.1109/CVPR.2018.00131.

    [22]

    Mishra N, Rohaninejad M, Chen X, et al. A simple neural attentive meta-learner[C]//Proceedings of the 6th International Conference on Learning Representations, 2018.

    [23]

    Simon C, Koniusz P, Nock R, et al. Adaptive subspaces for few-shot learning[C]//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 4135–4144. https://doi.org/10.1109/CVPR42600.2020.00419.

    [24]

    Li H Y, Eigen D, Dodge S, et al. Finding task-relevant features for few-shot learning by category traversal[C]//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 1–10. https://doi.org/10.1109/CVPR.2019.00009.

    [25]

    Oh J, Yoo H, Kim C H, et al. BOIL: towards representation change for few-shot learning[C]//Proceedings of the 9th International Conference on Learning Representations, 2021.

    [26]

    Lee K, Maji S, Ravichandran A, et al. Meta-learning with differentiable convex optimization[C]//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 10649–10657. https://doi.org/10.1109/CVPR.2019.01091.

    [27]

    Liu Y B, Lee J, Park M, et al. Learning to propagate labels: Transductive propagation network for few-shot learning[C]//Proceedings of the 7th International Conference on Learning Representations, 2019.

    [28]

    Chen W Y, Liu Y C, Kira Z, et al. A closer look at few-shot classification[C]//Proceedings of the 7th International Conference on Learning Representations, 2019.

    [29]

    陈旭, 彭冬亮, 谷雨. 基于改进YOLOv5s的无人机图像实时目标检测[J]. 光电工程, 2022, 49(3): 210372. doi: 10.12086/oee.2022.210372

    Chen X, Peng D L, Gu Y. Real-time object detection for UAV images based on improved YOLOv5s[J]. Opto-Electron Eng, 2022, 49(3): 210372. doi: 10.12086/oee.2022.210372

    [30]

    李珣, 李林鹏, Lazovik A, 等. 基于改进双流卷积递归神经网络的RGB-D物体识别方法[J]. 光电工程, 2021, 48(2): 200069. doi: 10.12086/oee.2021.200069

    Li X, Li L P, Lazovik A, et al. RGB-D object recognition algorithm based on improved double stream convolution recursive neural network[J]. Opto-Electron Eng, 2021, 48(2): 200069. doi: 10.12086/oee.2021.200069

    [31]

    曹志, 尚丽丹, 尹东. 一种车辆识别代号检测和识别的弱监督学习方法[J]. 光电工程, 2021, 48(2): 200270. doi: 10.12086/oee.2021.200270

    Cao Z, Shang L D, Yin D. A weakly supervised learning method for vehicle identification code detection and recognition[J]. Opto-Electron Eng, 2021, 48(2): 200270. doi: 10.12086/oee.2021.200270

    [32]

    van der Maaten L, Hinton G. Visualizing data using t-SNE[J]. J Mach Learn Res, 2008, 9(86): 2579−2605.

    [33]

    唐彪, 金炜, 李纲, 等. 结合稀疏表示和子空间投影的云图检索[J]. 光电工程, 2019, 46(10): 180627. doi: 10.12086/oee.2019.180627

    Tang B, Jin W, Li G, et al. The cloud retrieval of combining sparse representation with subspace projection[J]. Opto-Electron Eng, 2019, 46(10): 180627. doi: 10.12086/oee.2019.180627

Publication history
Received: 2022-09-22
Revised: 2022-12-26
Accepted: 2022-12-29
Published: 2023-04-25
