Sparse feature image classification network with spatial position correction

Citation: Jiang W T, Chen C, Zhang S C. Sparse feature image classification network with spatial position correction[J]. Opto-Electron Eng, 2024, 51(5): 240050. doi: 10.12086/oee.2024.240050

  • Fund project: National Natural Science Foundation of China (61172144); Natural Science Foundation of Liaoning Province (20170540426); Key Fund of the Department of Education of Liaoning Province (LJYL049)
  • *Corresponding author: Chen Chen, 867428188@qq.com
  • CLC number: TP391.4

  • Abstract: To sparsify semantics, strengthen attention to key features, enhance the correlation between spatial positions and local features, and constrain the spatial positions of features, this paper proposes a sparse feature image classification network with spatial position correction (SSCNet). The network is built on the ResNet-34 residual backbone. First, a sparse semantic enhanced feature module (SSEF) is proposed; it fuses depthwise separable convolution (DSC) with squeeze-and-excitation (SE), enhancing feature extraction while sparsifying semantics and preserving the integrity of spatial information. Then, a spatial position correction symmetric attention mechanism (SPCS) is proposed; SPCS adds symmetric global coordinate attention at specific positions in the network, strengthening the spatial relationships between features and constraining and correcting their spatial positions, thereby enhancing the network's perception of global detail features. Finally, an average pooling residual module (APM) is proposed and applied to every residual branch of the network, enabling the network to capture global feature information more effectively, enhancing the translation invariance of features, delaying overfitting, and improving generalization. On multiple datasets, SSCNet improves classification accuracy to varying degrees over other high-performance networks, showing that it extracts local detail information better while taking global information into account, with high classification accuracy and strong generalization performance.
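The paper does not include source code. Purely as an illustration of the kind of fusion the abstract describes, the following PyTorch sketch pairs a depthwise separable convolution with an SE gate; the layer sizes, ordering, and names are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SSEFSketch(nn.Module):
    """Illustrative sketch of an SSEF-style block: a depthwise separable
    convolution (DSC) followed by a squeeze-and-excitation (SE) gate.
    Sizes and ordering are assumptions, not the authors' code."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Depthwise separable convolution: per-channel 3x3 + 1x1 pointwise
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1,
                                   groups=channels, bias=False)
        self.pointwise = nn.Conv2d(channels, channels, 1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        # Squeeze-and-excitation: global pooling -> bottleneck -> sigmoid
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = torch.relu(self.bn(self.pointwise(self.depthwise(x))))
        return y * self.se(y)  # channel-wise reweighting
```

The depthwise/pointwise split is what keeps the block sparse in parameters, while the SE branch re-weights channels so salient features are emphasized.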

  • Overview: To sparsify semantics and strengthen attention to key features, enhance the correlation between spatial positions and local features, and constrain the spatial positions of features, this paper proposes a sparse feature image classification network with spatial position correction (SSCNet). Firstly, a sparse semantic enhanced feature (SSEF) module is proposed, which fuses depthwise separable convolution (DSC) with squeeze-and-excitation (SE) to enhance feature extraction while maintaining the integrity of spatial information. Then, the spatial position correction symmetric attention mechanism (SPCS) is proposed. SPCS adds symmetric global coordinate attention at specific positions in the network, which strengthens the spatial relationships between features, constrains and corrects their spatial positions, and enhances the network's perception of global detail features. Finally, the average pooling residual module (APM) is proposed and applied to each residual branch of the network, enabling the network to capture global feature information more effectively, enhance the translation invariance of features, delay overfitting, and improve generalization. The experiments use the CIFAR-10, CIFAR-100, SVHN, Imagenette, and Imagewoof datasets. CIFAR-10 contains 60,000 color images of 32×32 pixels in 10 categories and is commonly used to test and compare image classification algorithms. CIFAR-100 is more challenging and evaluates models on finer-grained classification tasks. SVHN contains real-world street-view house-number images taken from Google Street View, used to recognize the digits on house signs; it is divided into training, test, and extra training sets, and each image may contain one or more digits. Imagenette and Imagewoof are small-scale subsets extracted and adjusted from ImageNet. SSCNet is compared with 12 other network models on these 5 datasets, and its classification accuracy reaches 96.72%, 80.63%, 97.43%, 88.75%, and 82.09% on CIFAR-10, CIFAR-100, SVHN, Imagenette, and Imagewoof, respectively. Compared with the other methods, SSCNet better extracts local detail information while taking global information into account, and it achieves higher classification accuracy and strong generalization performance.
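For reference, the three 32×32 datasets are available directly through torchvision, while Imagenette and Imagewoof are published as ImageNet-style folder trees (for example via https://github.com/fastai/imagenette); the local paths below are placeholders, not paths from the paper.

```python
import torchvision
import torchvision.transforms as T

# 32x32 datasets used in the paper; torchvision downloads them directly.
tfm = T.Compose([T.ToTensor(),
                 T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
cifar10 = torchvision.datasets.CIFAR10("data", train=True, download=True, transform=tfm)
cifar100 = torchvision.datasets.CIFAR100("data", train=True, download=True, transform=tfm)
svhn = torchvision.datasets.SVHN("data", split="train", download=True, transform=tfm)

# Imagenette/Imagewoof are distributed as ImageNet-style folders;
# ImageFolder loads them (directory path assumed).
imagenette = torchvision.datasets.ImageFolder(
    "data/imagenette2/train",
    transform=T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()]))
```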

  • Figure 1.  SSCNet network structure

    Figure 2.  Comparison of convolution operations before and after modifying the first-layer convolution kernel size
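Figure 2 concerns adapting the first-layer kernel to small inputs. ResNet-34's stock stem (a 7×7, stride-2 convolution followed by max pooling) discards most of a 32×32 image's resolution; a common adaptation, shown here only as an assumed example (Figure 9 reports how the kernel-size choice affects accuracy), is:

```python
import torch.nn as nn
from torchvision.models import resnet34

# Assumed adaptation for 32x32 inputs: swap the stock 7x7/stride-2 stem
# (designed for 224x224 ImageNet images) for a 3x3/stride-1 convolution
# and drop the initial max pooling so early feature maps keep resolution.
net = resnet34(num_classes=10)
net.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
net.maxpool = nn.Identity()
```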

    Figure 3.  SSEF module

    Figure 4.  Spatial position correction symmetric attention (SPCS)

    Figure 5.  Coordinate attention structure
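Figure 5 corresponds to the coordinate attention of Hou et al. [17], which factorizes global pooling into two direction-aware poolings so each attention map retains position along one spatial axis. A sketch of that published design follows (the symmetric and global variants in Figures 4 and 6 build on it):

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Sketch of coordinate attention (Hou et al., CVPR 2021, ref. [17])."""
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # Direction-aware pooling: (n,c,h,1) and (n,c,1,w) -> (n,c,w,1)
        xh = x.mean(dim=3, keepdim=True)
        xw = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)
        y = self.act(self.bn(self.conv1(torch.cat([xh, xw], dim=2))))
        yh, yw = torch.split(y, [h, w], dim=2)
        ah = torch.sigmoid(self.conv_h(yh))                      # (n,c,h,1)
        aw = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))  # (n,c,1,w)
        return x * ah * aw  # position-aware channel reweighting
```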

    Figure 6.  Global coordinate attention structure

    Figure 7.  Comparison of three residual blocks. (a) Basic residual block; (b) Residual block with dimensionality reduction; (c) APM (average pooling residual module)
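Figure 7(c) replaces the strided projection of the standard downsampling block with average pooling in the shortcut. A minimal sketch of such a block, assuming a ResNet-D-style placement of the pooling (our reading of the figure, not the authors' code):

```python
import torch
import torch.nn as nn

class APMBlockSketch(nn.Module):
    """Assumed reading of Fig. 7(c): a downsampling residual block whose
    shortcut average-pools before the 1x1 projection, so no activations
    are discarded by a strided convolution."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # Average pooling keeps a smoothed summary of every input location,
        # which is what gives the block its translation robustness.
        self.shortcut = nn.Sequential(
            nn.AvgPool2d(stride, stride),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.body(x) + self.shortcut(x))
```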

    Figure 8.  Arrangements of the positions and numbers of APM-Blocks and SSEF modules

    Figure 9.  Influence of different convolutional kernel sizes on classification accuracy

    Figure 10.  Influence of different learning rates on classification accuracy

    Figure 11.  Comparison of feature maps before and after the SPCS module

    Figure 12.  Classification accuracy of each network on different datasets. (a) CIFAR-10; (b) CIFAR-100; (c) SVHN; (d) Imagenette

    Figure 13.  Visualization images of heat maps for different networks
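The paper does not state which visualization method produced Figure 13; Grad-CAM-style maps are a common choice, and a minimal sketch of that general technique (the model, layer, and input below are placeholders) is:

```python
import torch
from torchvision.models import resnet34

# Grad-CAM-style heat map: weight the last conv stage's feature maps by
# their pooled gradients w.r.t. the top class score.
model = resnet34(num_classes=10).eval()
feats, grads = {}, {}
layer = model.layer4  # last convolutional stage

layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

x = torch.randn(1, 3, 32, 32)          # placeholder input image
model(x).max().backward()              # backprop the top class score

w = grads["a"].mean(dim=(2, 3), keepdim=True)  # per-channel weights
cam = torch.relu((w * feats["a"]).sum(dim=1))  # raw heat map
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0,1]
```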

    Table 1.  Accuracy under different N values

    N        24     32     64     128
    ACC/%    72.19  76.53  78.91  77.78

    Table 2.  Experimental datasets

    Dataset     Size     Classes  Trainset  Testset
    CIFAR-10    32×32    10       50000     10000
    CIFAR-100   32×32    100      50000     10000
    SVHN        32×32    10       73257     26032
    Imagenette  224×224  10       9469      3925
    Imagewoof   224×224  10       9025      3929

    Table 3.  Influence of different positions and numbers of SSEF modules and APM-Blocks on classification accuracy

    Group  CIFAR-10/%  CIFAR-100/%  SVHN/%  Imagenette/%  Imagewoof/%
    A      94.79       77.22        95.63   86.76         80.56
    B      95.19       77.86        96.37   87.17         81.17
    C      95.13       77.63        96.25   87.09         80.86
    D      95.27       78.03        96.35   87.12         81.06
    E      95.89       78.79        96.97   87.79         81.35
    F      95.76       78.57        96.56   87.64         81.29
    G      95.69       78.63        96.89   87.72         81.41
    H      96.72       80.63        97.43   88.75         82.09
    I      96.37       80.12        97.26   88.39         81.71
    J      96.13       79.81        97.19   88.27         81.63
    K      96.46       80.52        97.31   88.51         81.76

    Table 4.  Impact of the SSEF module on parameters and accuracy

    Module  Params  ACC/%  Loss
    S1      45696   78.12  0.82
    S2      45696   78.53  0.79
    S3      49920   78.19  0.81
    SSEF    13504   78.96  0.76
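The parameter drop in Table 4 is consistent with the depthwise separable factorization inside SSEF. A rough back-of-the-envelope check with illustrative channel counts (not the paper's exact configuration):

```python
import torch.nn as nn

def n_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

# A plain 3x3 convolution vs. its depthwise separable factorization,
# both mapping 64 -> 64 channels (illustrative sizes only).
plain = nn.Conv2d(64, 64, 3, padding=1, bias=False)
dsc = nn.Sequential(
    nn.Conv2d(64, 64, 3, padding=1, groups=64, bias=False),  # depthwise
    nn.Conv2d(64, 64, 1, bias=False),                        # pointwise
)
print(n_params(plain), n_params(dsc))  # 36864 vs 4672, roughly an 8x saving
```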

    Table 5.  Impact of parameter reduction on computational efficiency across different networks

    Network           Speed/(f/s)  Params/M  FLOPs/G  ACC/%
    Multi-ResNet[22]  1.62         51.23     37.93    78.68
    ResNet-PSE[16]    2.23         40.56     27.56    72.81
    ResNeXt-PSE[16]   2.07         47.29     31.34    77.32
    SSLLNet[23]       2.57         31.57     20.86    79.23
    ATONet[24]        2.88         30.12     16.91    78.54
    SSCNet            3.05         21.36     11.71    80.63

    Table 6.  Ablation experiments of the modules on different datasets

    Group  SPCS  SSEF  APM  ACC1/%  ACC2/%  ACC3/%  ACC4/%  Speed/(f/s)  FLOPs/G
    1      -     √     √    90.23   69.63   92.89   83.67   2.85         13.93
    2      √     -     √    92.36   75.51   94.17   85.23   2.32         26.67
    3      √     √     -    95.67   77.34   96.56   87.09   2.96         12.36
    4      √     √     √    96.72   80.63   97.43   88.75   3.05         11.17

    Table 7.  Different metrics for each network on the five datasets

    Network            CIFAR-10/%  CIFAR-100/%  SVHN/%  Imagenette/%  Imagewoof/%  Speed/(f/s)  Params/M  FLOPs/G
    ResNet-34[5]       87.82       68.92        91.39   84.91         78.86        3.02         21.32     11.63
    HO-ResNet[10]      96.32       77.12        95.69   86.23         79.64        1.93         50.26     35.69
    CAPRDenseNet[25]   94.24       78.84        94.95   87.56         80.79        2.86         25.51     17.73
    MobileNet-LAM[18]  89.37       68.09        -       -             -            -            -         -
    Multi-ResNet[22]   94.56       78.68        94.58   87.69         81.21        1.62         51.23     37.93
    Couplformer[26]    93.54       73.92        94.26   85.13         79.08        2.73         27.63     14.29
    ResNet-PSE[16]     92.89       72.81        96.14   85.09         79.13        2.23         40.56     27.56
    ResNeXt-PSE[16]    93.92       77.32        96.54   86.27         80.66        2.07         47.29     31.34
    ATONet[24]         94.51       78.54        95.21   86.67         80.19        2.88         30.12     16.91
    QKFormer[27]       96.18       80.26        97.13   88.32         81.65        2.36         35.62     26.39
    TLENet[28]         95.46       78.42        96.83   87.62         80.57        2.19         46.67     30.57
    SSLLNet[23]        95.51       79.23        96.91   87.93         80.89        2.57         31.57     20.86
    SSCNet             96.72       80.63        97.43   88.75         82.09        3.05         21.36     11.71
  • [1] Yang H, Li J. Label contrastive learning for image classification[J]. Soft Comput, 2023, 27(18): 13477−13486. doi: 10.1007/s00500-022-07808-z
    [2] Szegedy C, Liu W, Jia Y Q, et al. Going deeper with convolutions[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, 2015: 1–9. doi: 10.1109/CVPR.2015.7298594
    [3] Lecun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[J]. Proc IEEE, 1998, 86(11): 2278−2324. doi: 10.1109/5.726791
    [4] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[J]. Commun ACM, 2017, 60(6): 84−90. doi: 10.1145/3065386
    [5] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016: 770–778. doi: 10.1109/CVPR.2016.90
    [6] Xu S J, Jing Y, Li H T, et al. Progressive multi-granularity ResNet vehicle recognition network[J]. Opto-Electron Eng, 2023, 50(7): 230052. doi: 10.12086/oee.2023.230052
    [7] Wang J, Yang Q P, Yang S Q, et al. Dual-path processing network for high-resolution salient object detection[J]. Appl Intell, 2022, 52(10): 12034−12048. doi: 10.1007/s10489-021-02971-6
    [8] Xue T, Hong Y. IX-ResNet: fragmented multi-scale feature fusion for image classification[J]. Multimed Tools Appl, 2021, 80(18): 27855−27865. doi: 10.1007/s11042-021-10893-1
    [9] Jiang Z W, Ma Z J, Wang Y N, et al. Aggregated decentralized down-sampling-based ResNet for smart healthcare systems[J]. Neural Comput Appl, 2023, 35(20): 14653−14665. doi: 10.1007/s00521-021-06234-w
    [10] Luo Z B, Sun Z T, Zhou W L, et al. Rethinking ResNets: improved stacking strategies with high-order schemes for image classification[J]. Complex Intell Syst, 2022, 8(4): 3395−3407. doi: 10.1007/S40747-022-00671-3
    [11] Jafar A, Lee M. High-speed hyperparameter optimization for deep ResNet models in image recognition[J]. Cluster Comput, 2023, 26(5): 2605−2613. doi: 10.1007/s10586-021-03284-6
    [12] Chen L, Zhang J L, Peng H, et al. Few-shot image classification via multi-scale attention and domain adaptation[J]. Opto-Electron Eng, 2023, 50(4): 220232. doi: 10.12086/oee.2023.220232
    [13] Liang L M, Jin J X, Feng Y, et al. Retinal lesions graded algorithm that integrates coordinate perception and hybrid extraction[J]. Opto-Electron Eng, 2024, 51(1): 230276. doi: 10.12086/oee.2024.230276
    [14] Ye Y C, Chen Y. Single image rain removal based on cross scale attention fusion[J]. Opto-Electron Eng, 2023, 50(10): 230191. doi: 10.12086/oee.2023.230191
    [15] Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C]//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018: 7132–7141. doi: 10.1109/CVPR.2018.00745
    [16] Ying Y, Zhang N B, Shan P, et al. PSigmoid: improving squeeze-and-excitation block with parametric sigmoid[J]. Appl Intell, 2021, 51(10): 7427−7439. doi: 10.1007/s10489-021-02247-z
    [17] Hou Q B, Zhou D Q, Feng J S. Coordinate attention for efficient mobile network design[C]//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, 2021: 13708–13717. doi: 10.1109/CVPR46437.2021.01350
    [18] Ji Q W, Yu B, Yang Z W, et al. LAM: lightweight attention module[C]//15th International Conference on Knowledge Science, Engineering and Management, Singapore, 2022: 485–497. doi: 10.1007/978-3-031-10986-7_39
    [19] Zhong H M, Han T T, Xia W, et al. Research on real-time teachers' facial expression recognition based on YOLOv5 and attention mechanisms[J]. EURASIP J Adv Signal Process, 2023, 2023(1): 55. doi: 10.1186/s13634-023-01019-w
    [20] Qi F, Wang Y L, Tang Z. Lightweight plant disease classification combining GrabCut algorithm, new coordinate attention, and channel pruning[J]. Neural Process Lett, 2022, 54(6): 5317−5331. doi: 10.1007/s11063-022-10863-0
    [21] Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]//32nd International Conference on Machine Learning, Lille, 2015: 448–456
    [22] Abdi M, Nahavandi S. Multi-residual networks: improving the speed and accuracy of residual networks[Z]. arXiv: 1609.05672, 2016. doi: 10.48550/arXiv.1609.05672
    [23] Ma C X, Wu J B, Si C Y, et al. Scaling supervised local learning with augmented auxiliary networks[Z]. arXiv: 2402.17318, 2024. doi: 10.48550/arXiv.2402.17318
    [24] Wu X D, Gao S Q, Zhang Z Y, et al. Auto-train-once: controller network guided automatic network pruning from scratch[Z]. arXiv: 2403.14729, 2024. doi: 10.48550/arXiv.2403.14729
    [25] Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017: 2261–2269. doi: 10.1109/CVPR.2017.243
    [26] Lan H, Wang X H, Shen H, et al. Couplformer: rethinking vision transformer with coupling attention[C]//Proceedings of 2023 IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, 2023: 6464–6473. doi: 10.1109/WACV56688.2023.00641
    [27] Zhou C L, Zhang H, Zhou Z K, et al. QKFormer: hierarchical spiking transformer using Q-K attention[Z]. arXiv: 2403.16552, 2024. doi: 10.48550/arXiv.2403.16552
    [28] Shin H, Choi D W. Teacher as a lenient expert: teacher-agnostic data-free knowledge distillation[C]//Proceedings of the 38th AAAI Conference on Artificial Intelligence, Vancouver, 2024: 14991–14999. doi: 10.1609/aaai.v38i13.29420
    [29] Tan M X, Le Q. EfficientNet: rethinking model scaling for convolutional neural networks[C]//36th International Conference on Machine Learning, Long Beach, 2019: 6105–6114

Publication history
Received: 2024-03-06
Revised: 2024-04-23
Accepted: 2024-04-24
Published: 2024-05-25
