Few-shot object detection via online inferential calibration

Citation: Peng H, Wang W Q, Chen L, et al. Few-shot object detection via online inferential calibration[J]. Opto-Electron Eng, 2023, 50(1): 220180. doi: 10.12086/oee.2023.220180

  • Fund project: National Natural Science Foundation of China, Young Scientists Fund (62101529)
  • *Corresponding author: 彭先蓉, peng_xr@ioe.ac.cn
  • CLC number: TP391.4
  • Abstract: To address model overfitting, false detections, and missed detections under few-sample conditions, this paper proposes a few-shot object detection framework with online inferential calibration (FSOIC) based on the two-stage fine-tuning approach (TFA). The framework designs a novel Attention-FPN network that selectively fuses features by modeling the dependencies between feature channels and, combined with a hierarchical freezing learning mechanism, guides the RPN module to extract correct novel-class foreground objects. An online calibration module is also constructed that encodes the samples via instance segmentation and re-weights the scores of the many candidate objects, correcting falsely detected and missed predictions. Experiments show that on VOC Novel Set 1 the proposed algorithm improves the average nAP50 of the five tasks by 10.16% and outperforms current mainstream algorithms.
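
The abstract attributes part of the gains to channel-wise attention inside the Attention-FPN. As a rough, non-authoritative illustration of "modeling the dependencies between feature channels to selectively fuse features", the PyTorch sketch below applies an SE-style channel attention gate to one FPN lateral connection. The module layout, the reduction ratio, and the names ChannelAttention and AttentionFPNLevel are assumptions made for illustration, not the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """SE-style channel attention: squeeze (global pool) -> excite (MLP) -> rescale.
    A sketch of modeling channel dependencies; reduction=16 is an assumed value."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: B x C x 1 x 1
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                              # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                   # re-weight channels before fusion

class AttentionFPNLevel(nn.Module):
    """One FPN level with channel attention on the lateral connection (placement assumed)."""
    def __init__(self, in_channels: int, out_channels: int = 256):
        super().__init__()
        self.lateral = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.attn = ChannelAttention(out_channels)
        self.output = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, c_feat: torch.Tensor, top_down: torch.Tensor) -> torch.Tensor:
        lateral = self.attn(self.lateral(c_feat))      # selectively weighted lateral features
        up = F.interpolate(top_down, size=lateral.shape[-2:], mode="nearest")
        return self.output(lateral + up)               # fuse with the upsampled coarser level
```

The sigmoid gate produces one weight per channel, so the fusion step can emphasize informative channels and suppress the rest, which is the sense in which the fusion is "selective".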

  • Figure 1.  Faster R-CNN network architecture

    Figure 2.  FSOIC network architecture

    Figure 3.  Detection results based on TFA

    Figure 4.  Attention-FPN network architecture

    Figure 5.  Channel attention module

    Figure 6.  FSOIC algorithm class template generation module

    Figure 7.  Feature metric space

    Figure 8.  Performance comparison of the detection results

    Figure 9.  Detection results under occlusion conditions in the 10 shot task

    Figure 10.  10 shot task detection results. (a) Detection results of the Faster R-CNN network based on TFA; (b) detection results of the Faster R-CNN network using the online inference calibration module; (c) detection results of the Faster R-CNN network using the online inference calibration module and adding the Attention-FPN network

    Table 1.  Hierarchical freezing mechanism (a code sketch of this schedule follows the table)

    Shot | Backbone | Regressor | Classifier | Attention-FPN | RPN | ROI
    1  | ××××
    2  | ×××
    3  | ×
    5  |
    10 |
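
As a minimal sketch of the hierarchical freezing in Table 1, the snippet below disables gradients for a shot-dependent subset of detector modules. Because the table's check marks did not survive extraction, the specific module-to-shot assignment in FROZEN_MODULES_BY_SHOT is an assumption (only the number of frozen modules per shot follows the table), and the attribute names are hypothetical names for the children of a Faster R-CNN-style detector.

```python
import torch.nn as nn

# Hypothetical shot -> frozen-module mapping. The per-shot counts (4, 3, 1, 0, 0) follow
# Table 1; which specific modules are frozen at each level is an assumption.
FROZEN_MODULES_BY_SHOT = {
    1:  ["backbone", "attention_fpn", "rpn", "roi_head"],
    2:  ["backbone", "attention_fpn", "rpn"],
    3:  ["backbone"],
    5:  [],
    10: [],
}

def apply_hierarchical_freezing(detector: nn.Module, shots: int) -> None:
    """Freeze the listed child modules for this shot count; leave the others trainable."""
    frozen = set(FROZEN_MODULES_BY_SHOT.get(shots, []))
    for name, module in detector.named_children():
        trainable = name not in frozen
        for param in module.parameters():
            param.requires_grad = trainable
```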

    Table 2.  Experimental settings of the datasets (a learning-rate schedule sketch follows the table)

    Dataset | Shot | Number of categories | Initial learning rate | Batch_size | LR decay ratio | Number of LR decays | Iterations
    VOC  | 1  | 20 | 0.001 | 16 | 0.1 | 1 | 6000
    VOC  | 2  | 20 | 0.001 | 16 | 0.1 | 1 | 7000
    VOC  | 3  | 20 | 0.001 | 16 | 0.1 | 2 | 8000
    VOC  | 5  | 20 | 0.001 | 16 | 0.5 | 2 | 9000
    VOC  | 10 | 20 | 0.001 | 16 | 0.5 | 2 | 13000
    COCO | 10 | 80 | 0.001 | 16 | 0.3 | 1 | 30000
    COCO | 30 | 80 | 0.001 | 16 | 0.3 | 1 | 40000
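
Read as a step-decay schedule, a row of Table 2 maps naturally onto a PyTorch MultiStepLR setup; the sketch below uses the VOC 1-shot row. Spreading the decays evenly over training (one decay at iteration 3000 here) is an assumption, since the table only records how many decays occur, and the bare Linear module only stands in for the detector.

```python
import torch

# VOC 1-shot settings transcribed from Table 2.
cfg = {
    "initial_lr": 0.001,
    "batch_size": 16,
    "lr_decay_ratio": 0.1,   # each decay multiplies the learning rate by this factor
    "num_decays": 1,
    "iterations": 6000,
}

model = torch.nn.Linear(8, 8)   # stand-in for the detector
optimizer = torch.optim.SGD(model.parameters(), lr=cfg["initial_lr"], momentum=0.9)

# Place the decays evenly across training (assumed): 1 decay over 6000 iters -> [3000].
milestones = [cfg["iterations"] * (i + 1) // (cfg["num_decays"] + 1)
              for i in range(cfg["num_decays"])]
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=milestones,
                                                 gamma=cfg["lr_decay_ratio"])

for iteration in range(cfg["iterations"]):
    # forward/backward on a batch of cfg["batch_size"] images would go here
    optimizer.step()
    scheduler.step()
```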

    Table 3.  Performance analysis and comparison of few-shot object detection algorithms on the VOC novel class splits

    Method | Year | Novel Set 1 (1/2/3/5/10 shot) | Novel Set 2 (1/2/3/5/10 shot) | Novel Set 3 (1/2/3/5/10 shot)
    LSTD[26] | AAAI 18 | 8.2 / 1.0 / 12.4 / 29.1 / 38.5 | 11.4 / 3.8 / 5.0 / 15.7 / 31.0 | 12.6 / 8.5 / 15.0 / 27.3 / 36.3
    MetaDet[40] | ICCV 19 | 18.9 / 20.6 / 30.2 / 36.8 / 49.6 | 21.8 / 23.1 / 27.8 / 31.7 / 43.0 | 20.6 / 23.9 / 29.4 / 43.9 / 44.1
    Meta R-CNN[15] | ICCV 19 | 19.9 / 25.5 / 35.0 / 45.7 / 51.5 | 10.4 / 19.4 / 29.6 / 34.8 / 45.4 | 14.3 / 18.2 / 27.5 / 41.2 / 48.1
    RepMet[28] | CVPR 19 | 26.1 / 32.9 / 34.4 / 38.6 / 41.3 | 17.2 / 22.1 / 23.4 / 28.3 / 35.8 | 27.5 / 31.1 / 31.5 / 34.4 / 37.2
    FSRW[37] | ICCV 19 | 14.8 / 15.5 / 26.7 / 33.9 / 47.2 | 15.7 / 15.3 / 22.7 / 30.1 / 40.5 | 21.3 / 25.6 / 28.4 / 42.8 / 45.9
    FSDetView[42] | ECCV 20 | 24.2 / 35.3 / 42.2 / 49.1 / 57.4 | 21.6 / 24.6 / 31.9 / 37.0 / 45.7 | 21.2 / 30.0 / 37.2 / 43.8 / 49.6
    TFA w/cos[44] | ICML 20 | 39.8 / 36.1 / 44.7 / 55.7 / 56.0 | 23.5 / 26.9 / 34.1 / 35.1 / 39.1 | 30.8 / 34.8 / 42.8 / 49.5 / 49.8
    MPSR[51] | ECCV 20 | 41.7 / - / 51.4 / 55.2 / 61.8 | 24.4 / - / 39.2 / 39.9 / 47.8 | 35.6 / - / 42.3 / 48.0 / 49.7
    TFA w/cos+Halluc[18] | CVPR 21 | 45.1 / 44.0 / 44.7 / 55.0 / 55.9 | 23.2 / 27.5 / 35.1 / 34.9 / 39.0 | 30.5 / 35.1 / 41.4 / 49.0 / 49.3
    TIP[41] | CVPR 21 | 27.7 / 36.5 / 43.3 / 50.2 / 59.6 | 22.7 / 30.1 / 33.8 / 40.9 / 46.9 | 21.7 / 30.6 / 38.1 / 44.5 / 50.9
    FSCE[25] | CVPR 21 | 44.2 / 43.8 / 51.4 / 61.9 / 63.4 | 27.3 / 29.5 / 43.5 / 44.2 / 50.2 | 37.2 / 41.9 / 47.5 / 54.6 / 58.5
    Retentive R-CNN[45] | CVPR 21 | 42.4 / 45.8 / 45.9 / 53.7 / 56.1 | 21.7 / 27.8 / 35.2 / 37.0 / 40.3 | 30.2 / 37.6 / 43.0 / 49.7 / 50.1
    Meta-DETR[38] | IEEE 22 | 35.1 / 49.0 / 53.2 / 57.4 / 62.0 | 27.9 / 32.3 / 38.4 / 43.2 / 51.8 | 34.9 / 41.8 / 47.1 / 54.1 / 58.2
    AGCM[33] | IEEE 22 | 40.3 / - / - / 58.5 / 59.9 | 27.5 / - / - / 49.3 / 50.6 | 42.1 / - / - / 54.2 / 58.2
    FSOIC (Ours) |  | 46.6 / 53.4 / 56.6 / 62.0 / 64.5 | 25.7 / 30.5 / 43.8 / 45.9 / 53.3 | 42.4 / 44.9 / 49.5 / 56.6 / 58.8

    Table 4.  Performance analysis and comparison of few-shot object detection algorithms on the COCO dataset

    Method | Year | Novel AP (10 shot) | Novel AP (30 shot)
    LSTD[26] | AAAI 18 | 3.2 | 6.7
    FSRW[37] | ICCV 19 | 5.6 | 9.1
    MPSR[51] | ECCV 20 | 9.8 | 14.1
    TFA w/cos[44] | ICML 20 | 10.0 | 13.7
    Retentive R-CNN[45] | CVPR 21 | 10.5 | 13.8
    FSCE[25] | CVPR 21 | 11.9 | 16.4
    FSOIC (Ours) |  | 12.7 | 16.7

    Table 5.  Comparison of ablation experiment performance (a sketch of the online calibration re-weighting follows the table)

    Method | FPN+4*ROI / Finetune RPN / Online calibration / Attention of channel | Novel Set 1 nAP50 (1 / 3 / 10 shot)
    TFA w/cos[44] | - / - / - / - | 39.8 / 44.7 / 56.0
    FSOIC (Ours) | × × × | 43.6 / 52.2 / 62.5
    FSOIC (Ours) | × × | 44.1 / 53.0 / 63.2
    FSOIC (Ours) | × | 45.7 / 54.2 / 64.2
    FSOIC (Ours) | × | 46.2 / 54.9 / 62.8
    FSOIC (Ours) | × | 44.7 / 54.0 / 61.7
    FSOIC (Ours) |  | 46.6 / 56.6 / 64.5
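
The "Online calibration" component in Table 5 corresponds to the module described in the abstract: support samples are encoded into class templates (Figure 6) and the scores of candidate boxes are re-weighted in a feature metric space (Figure 7). The sketch below shows one plausible form of such re-weighting, blending classifier scores with cosine similarity to per-class templates; the prototype-style template averaging, the blending weight alpha, and the use of cosine similarity are assumptions, not the paper's published formulation.

```python
import torch
import torch.nn.functional as F

def build_class_templates(support_feats: torch.Tensor,   # (M, D) support embeddings
                          support_labels: torch.Tensor,  # (M,) class index per support sample
                          num_classes: int) -> torch.Tensor:
    """Average the support embeddings per class into one template (prototype-style, assumed)."""
    templates = torch.zeros(num_classes, support_feats.shape[1])
    for c in range(num_classes):
        templates[c] = support_feats[support_labels == c].mean(dim=0)
    return templates

def calibrate_scores(roi_feats: torch.Tensor,        # (N, D) candidate RoI embeddings
                     cls_scores: torch.Tensor,       # (N, C) classifier scores, assumed in [0, 1]
                     class_templates: torch.Tensor,  # (C, D) one template embedding per class
                     alpha: float = 0.5) -> torch.Tensor:  # blending weight (assumed value)
    """Re-weight detector scores with template similarity: a sketch of online calibration."""
    roi_n = F.normalize(roi_feats, dim=1)
    tpl_n = F.normalize(class_templates, dim=1)
    sim = roi_n @ tpl_n.t()            # (N, C) cosine similarity in [-1, 1]
    sim = (sim + 1.0) / 2.0            # rescale to [0, 1] to match the score range
    return alpha * cls_scores + (1.0 - alpha) * sim

# Tiny usage example: 2 classes with 5 support samples each, 5 candidates, 128-d embeddings.
templates = build_class_templates(torch.randn(10, 128), torch.arange(10) % 2, num_classes=2)
calibrated = calibrate_scores(torch.randn(5, 128), torch.rand(5, 2), templates)
print(calibrated.shape)  # torch.Size([5, 2])
```

In this form, boxes far from every template are scored down and boxes close to a template are scored up, which matches the behavior the abstract describes as correcting false and missed detections.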
  • [1]

    Chen X, Peng D L, Gu Y. Real-time object detection for UAV images based on improved YOLOv5s[J]. Opto-Electron Eng, 2022, 49(3): 210372. doi: 10.12086/oee.2022.210372

    [2]

    Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014: 580−587. https://doi.org/10.1109/CVPR.2014.81.

    [3]

    He K M, Zhang X Y, Ren S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Trans Pattern Anal Mach Intell, 2015, 37(9): 1904−1916. doi: 10.1109/TPAMI.2015.2389824

    [4]

    Girshick R. Fast R-CNN[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015: 1440–1448. https://doi.org/10.1109/ICCV.2015.169.

    [5]

    Ren S Q, He K M, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems, 2015: 91–99.

    [6]

    Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779–788. https://doi.org/10.1109/CVPR.2016.91.

    [7]

    Liu W, Anguelov D, Erhan D, et al. SSD: single shot MultiBox detector[C]//14th European Conference on Computer Vision, 2016: 21–37. https://doi.org/10.1007/978-3-319-46448-0_2.

    [8]

    Redmon J, Farhadi A. YOLO9000: better, faster, stronger[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017: 6517–6525. https://doi.org/10.1109/CVPR.2017.690.

    [9]

    Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision, 2017: 2999−3007. https://doi.org/10.1109/ICCV.2017.324.

    [10]

    Redmon J, Farhadi A. YOLOv3: an incremental improvement[Z]. arXiv: 1804.02767, 2018. https://arxiv.org/abs/1804.02767.

    [11]

    Bochkovskiy A, Wang C Y, Liao H Y M. YOLOv4: optimal speed and accuracy of object detection[Z]. arXiv: 2004.10934, 2020. https://arxiv.org/abs/2004.10934.

    [12]

    Ma L, Gou Y T, Lei T, et al. Small object detection based on multi-scale feature fusion using remote sensing images[J]. Opto-Electron Eng, 2022, 49(4): 210363. doi: 10.12086/oee.2022.210363

    [13]

    Bennequin E. Meta-learning algorithms for few-shot computer vision[Z]. arXiv: 1909.13579, 2019. https://arxiv.org/abs/1909.13579.

    [14]

    Behl H S, Baydin A G, Torr P H S. Alpha MAML: adaptive model-agnostic meta-learning[Z]. arXiv: 1905.07435, 2019. https://arxiv.org/abs/1905.07435.

    [15]

    Yan X P, Chen Z L, Xu A N, et al. Meta R-CNN: towards general solver for instance-level low-shot learning[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019: 9576–9585. https://doi.org/10.1109/ICCV.2019.00967.

    [16]

    Wang Y Q, Yao Q M. Few-shot learning: a survey[Z]. arXiv: 1904.05046v1, 2019. https://arxiv.org/abs/1904.05046v1.

    [17]

    Duan Y, Andrychowicz M, Stadie B, et al. One-shot imitation learning[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017: 1087–1098.

    [18]

    Zhang W L, Wang Y X. Hallucination improves few-shot object detection[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 13003–13012. https://doi.org/10.1109/CVPR46437.2021.01281.

    [19]

    Zhu J Y, Park T, Isola P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]//2017 IEEE International Conference on Computer Vision, 2017: 2242–2251. https://doi.org/10.1109/ICCV.2017.244.

    [20]

    Goodfellow I J, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems, 2014: 2672–2680.

    [21]

    Li K, Zhang Y L, Li K P, et al. Adversarial feature hallucination networks for few-shot learning[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 13467–13476. https://doi.org/10.1109/CVPR42600.2020.01348.

    [22]

    Hui B Y, Zhu P F, Hu Q H, et al. Self-attention relation network for few-shot learning[C]//2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 2019: 198–203. https://doi.org/10.1109/ICMEW.2019.00041.

    [23]

    Hao F S, Cheng J, Wang L, et al. Instance-level embedding adaptation for few-shot learning[J]. IEEE Access, 2019, 7: 100501−100511. doi: 10.1109/ACCESS.2019.2906665

    [24]

    Schönfeld E, Ebrahimi S, Sinha S, et al. Generalized zero- and few-shot learning via aligned variational autoencoders[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019: 8239–8247. https://doi.org/10.1109/CVPR.2019.00844.

    [25]

    Sun B, Li B H, Cai S C, et al. FSCE: few-shot object detection via contrastive proposal encoding[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 7348–7358. https://doi.org/10.1109/CVPR46437.2021.00727.

    [26]

    Chen H, Wang Y L, Wang G Y, et al. LSTD: a low-shot transfer detector for object detection[C]//Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, 2018: 346.

    [27]

    Hu H Z, Bai S, Li A X, et al. Dense relation distillation with context-aware aggregation for few-shot object detection[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 10180–10189. https://doi.org/10.1109/CVPR46437.2021.01005.

    [28]

    Karlinsky L, Shtok J, Harary S, et al. RepMet: representative-based metric learning for classification and few-shot object detection[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 5197–5206. https://doi.org/10.1109/CVPR.2019.00534.

    [29]

    Jiang W, Huang K, Geng J, et al. Multi-scale metric learning for few-shot learning[J]. IEEE Trans Circuits Syst Video Technol, 2021, 31(3): 1091−1102. doi: 10.1109/TCSVT.2020.2995754

    [30]

    Sung F, Yang Y X, Zhang L, et al. Learning to compare: relation network for few-shot learning[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 1199–1208. https://doi.org/10.1109/CVPR.2018.00131.

    [31]

    Tao X Y, Hong X P, Chang X Y, et al. Few-shot class-incremental learning[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 12180–12189.

    [32]

    Wang Y, Wu X M, Li Q M, et al. Large margin few-shot learning[Z]. arXiv: 1807.02872, 2018. https://doi.org/10.48550/arXiv.1807.02872.

    [33]

    Agarwal A, Majee A, Subramanian A, et al. Attention guided cosine margin to overcome class-imbalance in few-shot road object detection[C]//2022 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), 2022: 221–230. https://doi.org/10.1109/WACVW54805.2022.00028.

    [34]

    Nichol A, Achiam J, Schulman J. On first-order meta-learning algorithms[Z]. arXiv: 1803.02999, 2018. https://arxiv.org/abs/1803.02999.

    [35]

    Li Z G, Zhou F W, Chen F, et al. Meta-SGD: learning to learn quickly for few-shot learning[Z]. arXiv: 1707.09835, 2017. https://arxiv.org/abs/1707.09835.

    [36]

    Ravi S, Larochelle H. Optimization as a model for few-shot learning[C]//5th International Conference on Learning Representations, 2017.

    [37]

    Kang B Y, Liu Z, Wang X, et al. Few-shot object detection via feature reweighting[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019: 8419–8428. https://doi.org/10.1109/ICCV.2019.00851.

    [38]

    Zhang G J, Luo Z P, Cui K W, et al. Meta-DETR: image-level few-shot detection with inter-class correlation exploitation[J]. IEEE Trans Pattern Anal Mach Intell, 2022. https://doi.org/10.1109/TPAMI.2022.3195735.

    [39]

    Ma W, Yu J, Wang X, et al. Garbage detection and classification method based on improved Faster R-CNN[J]. Comput Eng, 2021, 47(8): 294−300. doi: 10.19678/j.issn.1000-3428.0058258

    [40]

    Wang Y X, Ramanan D, Hebert M. Meta-learning to detect rare objects[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019: 9924–9933. https://doi.org/10.1109/ICCV.2019.01002.

    [41]

    Li A X, Li Z G. Transformation invariant few-shot object detection[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 3093–3101. https://doi.org/10.1109/CVPR46437.2021.00311.

    [42]

    Xiao Y, Marlet R. Few-shot object detection and viewpoint estimation for objects in the wild[C]//16th European Conference on Computer Vision, 2020: 192−210. https://doi.org/10.1007/978-3-030-58520-4_12.

    [43]

    Lin T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017: 936−944. https://doi.org/10.1109/CVPR.2017.106.

    [44]

    Wang X, Huang T, Gonzalez J, et al. Frustratingly simple few-shot object detection[C]//Proceedings of the 37th International Conference on Machine Learning, 2020: 9919–9928.

    [45]

    Fan Z B, Ma Y C, Li Z M, et al. Generalized few-shot object detection without forgetting[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 4525−4534. https://doi.org/10.1109/CVPR46437.2021.00450.

    [46]

    Bertinetto L, Valmadre J, Henriques J F, et al. Fully-convolutional siamese networks for object tracking[C]//14th European Conference on Computer Vision, 2016: 850–865. https://doi.org/10.1007/978-3-319-48881-3_56.

    [47]

    Li B, Yan J J, Wu W, et al. High performance visual tracking with Siamese region proposal network[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 8971−8980. https://doi.org/10.1109/CVPR.2018.00935.

    [48]

    Zhu Z, Wang Q, Li B, et al. Distractor-aware Siamese networks for visual object tracking[C]//Proceedings of the 15th European Conference on Computer Vision, 2018: 103–119. https://doi.org/10.1007/978-3-030-01240-3_7.

    [49]

    Zhao C M, Chen Z B, Zhang J L. Research on target tracking based on convolutional networks[J]. Opto-Electron Eng, 2020, 47(1): 180668. doi: 10.12086/oee.2020.180668

    [50]

    Zhao C M, Chen Z B, Zhang J L. Application of aircraft target tracking based on deep learning[J]. Opto-Electron Eng, 2019, 46(9): 180261. doi: 10.12086/oee.2019.180261

    [51]

    Wu J X, Liu S T, Huang D, et al. Multi-scale positive sample refinement for few-shot object detection[C]//Proceedings of the 16th European Conference on Computer Vision, 2020: 456–472. https://doi.org/10.1007/978-3-030-58517-4_27.

Publication history
Received:  2022-07-27
Revised:  2022-12-22
Accepted:  2022-12-29
Published:  2023-01-25
