Smartphone image quality assessment method based on Swin-AK Transformer

Citation: Hou G P, Dong W, Lu L K, et al. Smartphone image quality assessment method based on Swin-AK Transformer[J]. Opto-Electron Eng, 2025, 52(1): 240264. doi: 10.12086/oee.2025.240264

  • *Corresponding author: Dong Wu, dongwu@bigc.edu.cn
  • CLC number: TP391.4

  • CSTR: 32245.14.oee.2025.240264

  • Fund Project: Project supported by the Important Project of Digital Education Research of Beijing (BDEC2022619027), the 2023 Project Proposal of Beijing Higher Education Association (MS2023168), the Research Project of Beijing Institute of Graphic Communication (Ec202303, Ea202301, E6202405), the Disciplinary Construction and Postgraduate Education Project of Beijing Institute of Graphic Communication (21090323009, 21090224002, 21090124013), the Classification Development of Beijing Municipal Universities-Construction Project of Emerging Interdisciplinary Platform for Publishing at Beijing Institute of Graphic Communication-Key Technology Research and Development Platform for Digital Inkjet Printing Technology and Multifunctional Rotary Offset Press (04190123001/003), the Open Foundation of the State Key Laboratory of Networking and Switching Technology (Beijing University of Posts and Telecommunications) (SKLNST-2023-1-12), and the "Artificial Intelligence Plus" Course Construction Project of Beijing Institute of Graphic Communication
  • Abstract: This paper proposes a smartphone image quality assessment method that combines manual (handcrafted) features with a Swin-AK Transformer (Swin Transformer based on alterable kernel convolution) through dual cross-attention fusion. First, manual features that influence image quality are extracted; these features capture subtle visual variations in an image. Second, the Swin-AK Transformer is proposed, which strengthens the model's ability to extract and process local information. In addition, a dual cross-attention fusion module is designed that combines spatial attention and channel attention mechanisms to fuse the manual features with the deep features, yielding more accurate image quality prediction. Experimental results show that the Pearson linear correlation coefficients reach 0.932 and 0.885, and the Spearman rank-order correlation coefficients reach 0.929 and 0.858, on the SPAQ and LIVE-C datasets, respectively. These results demonstrate that the proposed method can effectively predict the quality of smartphone-captured images.
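
    The exact manual feature set is not spelled out here beyond color, contrast, and texture cues (see Fig. 2 and Refs. [14-17], which cover HSV statistics, Laplacian-based measures, the Sobel operator, and LBP descriptors). As a rough illustration only, the sketch below computes a few common quality-related descriptors of this kind; the function name and the specific feature choices are assumptions, not the paper's implementation.

```python
import cv2
import numpy as np

def manual_features(img_bgr: np.ndarray) -> np.ndarray:
    """Illustrative brightness / colorfulness / sharpness / contrast cues.

    A sketch of the kind of handcrafted descriptors the paper describes,
    not a reproduction of its actual feature extractor.
    """
    # Brightness: mean of the V channel in HSV space (cf. Ref. [14]).
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    brightness = hsv[..., 2].mean() / 255.0

    # Colorfulness: Hasler-Susstrunk statistic on the rg/yb opponent axes.
    b, g, r = [c.astype(np.float64) for c in cv2.split(img_bgr)]
    rg, yb = r - g, 0.5 * (r + g) - b
    colorfulness = np.hypot(rg.std(), yb.std()) + 0.3 * np.hypot(rg.mean(), yb.mean())

    # Sharpness: variance of the Laplacian response (cf. Ref. [15]).
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()

    # Contrast: RMS contrast of the grayscale image.
    contrast = gray.std() / 255.0

    return np.array([brightness, colorfulness, sharpness, contrast])
```

    In the described pipeline, features of this kind are then passed through ResNet50 to map the low-level cues to higher-level representations before fusion with the deep branch.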

  • Overview: With the extensive use of smartphones, users' expectations for smartphone image quality have risen significantly. However, due to camera hardware limitations, smartphones are often constrained in light capture, especially in complex or low-light scenes, which can degrade image quality. Existing no-reference image quality assessment (IQA) algorithms frequently show limitations when handling smartphone-captured images, motivating the development of a more accurate quality evaluation method. This study proposes an approach based on manual features and a Swin-AK Transformer with dual cross-attention fusion, designed to assess smartphone image quality with greater precision. First, manual features affecting image quality are extracted, guided by the human visual system, enabling the capture of subtle visual variations such as color, contrast, and texture and enhancing the model's sensitivity to image quality. To further improve discriminative power, ResNet50 is introduced after manual feature extraction to establish a nonlinear mapping between manual features and image quality; this step transforms the initial low-level features into more representative high-level features, allowing a more comprehensive expression of image content. Subsequently, the study introduces the Swin-AK Transformer, which uses a self-attention mechanism to capture local image features, thereby enhancing the model's capability to recognize and process local information in smartphone images. This method adapts well to the characteristics of smartphone images and handles intricate details robustly. Additionally, a dual cross-attention fusion module is designed to integrate manual and deep features efficiently. The module combines spatial and channel attention mechanisms: spatial attention helps the model focus on key areas within the image, while channel attention optimizes feature representation by adjusting the weight of each channel. As a result, the fused features reflect both global image information and local detail variations, aligning well with the human visual system's natural perception of image quality. Experiments were conducted on two public datasets, SPAQ and LIVE-C, to evaluate the proposed model. The results demonstrate superior performance in image quality prediction, with Pearson linear correlation coefficients (PLCC) of 0.932 and 0.885 and Spearman rank-order correlation coefficients (SROCC) of 0.929 and 0.858 on the SPAQ and LIVE-C datasets, respectively. These outcomes validate the method's effectiveness for smartphone image quality assessment, showing improved sensitivity to quality changes along with excellent accuracy and robustness.
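
    The dual cross-attention fusion module (Figs. 8-10) is described above only at the block-diagram level. The PyTorch sketch below shows one plausible reading of it, assuming both branches have been projected to feature maps of the same shape; the class names, the cross-wiring (each branch gated by attention computed from the other branch), and the sum-based merge are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (cf. Fig. 9)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # (B, C, 1, 1)
            nn.Conv2d(channels, channels // reduction, 1), # squeeze
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), # excite
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.mlp(x)  # per-channel weights, shape (B, C, 1, 1)

class SpatialAttention(nn.Module):
    """Spatial attention over pooled channel statistics (cf. Fig. 10)."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        # Stack channel-wise mean and max maps, then learn a saliency map.
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)
        return torch.sigmoid(self.conv(pooled))  # shape (B, 1, H, W)

class DualCrossAttentionFusion(nn.Module):
    """Hypothetical cross fusion: each branch is reweighted by attention
    computed from the other branch, then the two are summed."""
    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, deep_feat, manual_feat):
        # Deep features gated by channel statistics of the manual branch.
        deep_out = deep_feat * self.ca(manual_feat)
        # Manual features gated by spatial saliency of the deep branch.
        manual_out = manual_feat * self.sa(deep_feat)
        return deep_out + manual_out

# Example: fuse 768-channel deep and manual feature maps of size 7x7.
fuse = DualCrossAttentionFusion(768)
out = fuse(torch.randn(2, 768, 7, 7), torch.randn(2, 768, 7, 7))
```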

  • Figure 1.  Overall structure diagram of the proposed method

    Figure 2.  Diagram of manual feature extraction

    Figure 3.  ResNet50 architecture diagram

    Figure 4.  Diagram of the sliding window operation in Swin Transformer

    Figure 5.  Swin-AK Transformer architecture diagram

    Figure 6.  Swin-AK blocks architecture diagram

    Figure 7.  AKConv architecture diagram

    Figure 8.  Structure diagram of the dual cross-attention fusion module

    Figure 9.  Structure diagram of the channel attention module

    Figure 10.  Structure diagram of the spatial attention module

    Figure 11.  Scatter plots of image attribute scores versus overall subjective quality scores in the SPAQ dataset. (a) Brightness; (b) Colorfulness; (c) Sharpness

    Figure 12.  Scatter plot on the LIVE-C dataset

    Figure 13.  Scatter plot on the SPAQ dataset

    Figure 14.  Comparison of attention heatmaps between Swin Transformer and Swin-AK Transformer

    Figure 15.  MOS values of images in the SPAQ dataset and the quality prediction values of the proposed method

    Figure 16.  MOS values of images in the LIVE-C dataset and the quality prediction values of the proposed method

    Table 1.  Comparison of the proposed method with other methods on the SPAQ dataset

    Method            PLCC     SROCC
    BLINDS-II [21]    0.539    0.478
    DIIVINE [22]      0.603    0.596
    BRISQUE [23]      0.817    0.828
    CORNIA [24]       0.724    0.709
    IL-NIQE [25]      0.704    0.695
    HOSA [26]         0.824    0.817
    DIQaM-NR [27]     0.836    0.824
    WaDIQaM-NR [27]   0.843    0.821
    TS-CNN [28]       0.811    0.801
    TReS [29]         0.911    0.902
    DB-CNN [30]       0.913    0.909
    HyperIQA [31]     0.919    0.916
    CaHDC [32]        0.841    0.833
    ResNet50 [7]      0.909    0.908
    MT-A [7]          0.916    0.916
    MUSIQ [2]         0.921    0.917
    DACNN [33]        0.921    0.915
    Re-IQA [34]       0.925    0.918
    DEIQT [35]        0.923    0.919
    LoDa [36]         0.928    0.925
    Ours              0.932    0.929

    Table 2.  Comparison of the proposed method with other methods on the LIVE-C dataset

    Method            PLCC     SROCC
    BLINDS-II [21]    0.497    0.456
    DIIVINE [22]      0.557    0.513
    BRISQUE [23]      0.637    0.616
    CORNIA [24]       0.659    0.617
    IL-NIQE [25]      0.516    0.539
    HOSA [26]         0.691    0.674
    DIQaM-NR [27]     0.645    0.633
    WaDIQaM-NR [27]   0.692    0.669
    TS-CNN [28]       0.667    0.655
    TReS [29]         0.877    0.846
    DB-CNN [30]       0.859    0.852
    HyperIQA [31]     0.870    0.855
    CaHDC [32]        0.738    0.734
    MUSIQ [2]         0.875    0.862
    DACNN [33]        0.882    0.861
    Re-IQA [34]       0.854    0.840
    Ours              0.885    0.865

    Table 3.  Results of the ablation experiment

    Model                                    PLCC     SROCC
    Swin Transformer                         0.921    0.918
    Swin-AK Transformer                      0.923    0.920
    Manual features + Swin Transformer       0.924    0.922
    Manual features + Swin-AK Transformer    0.929    0.925
    Ours                                     0.932    0.929
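
    PLCC and SROCC, reported throughout Tables 1-3, measure linear and rank (monotonic) agreement, respectively, between predicted scores and subjective mean opinion scores (MOS). Below is a minimal sketch of how they are computed with scipy; the toy score arrays are made up for illustration, and note that some IQA protocols additionally fit a nonlinear logistic mapping before computing PLCC.

```python
import numpy as np
from scipy import stats

def plcc_srocc(mos: np.ndarray, pred: np.ndarray):
    """PLCC: Pearson linear correlation between MOS and predictions.
    SROCC: Spearman rank-order correlation (monotonic agreement)."""
    plcc = stats.pearsonr(mos, pred)[0]
    srocc = stats.spearmanr(mos, pred)[0]
    return plcc, srocc

# Toy example (values are illustrative, not from the paper):
mos = np.array([3.2, 4.5, 2.1, 3.8, 4.9])
pred = np.array([3.0, 4.4, 2.5, 3.6, 4.8])
print(plcc_srocc(mos, pred))  # both close to 1.0 for well-correlated scores
```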
  • [1] Yan J B, Fang Y M, Liu X L. The review of distortion-related image quality assessment[J]. J Image Graphics, 2022, 27(5): 1430−1466. doi: 10.11834/jig.210790
    [2] Ke J J, Wang Q F, Wang Y L, et al. MUSIQ: multi-scale image quality transformer[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021: 5128–5137. doi: 10.1109/ICCV48922.2021.00510
    [3] Varga D. No-reference image quality assessment using the statistics of global and local image features[J]. Electronics, 2023, 12(7): 1615. doi: 10.3390/electronics12071615
    [4] Jain P, Shikkenawis G, Mitra S K. Natural scene statistics and CNN based parallel network for image quality assessment[C]//2021 IEEE International Conference on Image Processing (ICIP), 2021: 1394–1398. doi: 10.1109/ICIP42928.2021.9506404
    [5] Shao X, Liu M Q, Li Z H, et al. CPDINet: blind image quality assessment via a content perception and distortion inference network[J]. IET Image Processing, 2022, 16(7): 1973−1987. doi: 10.1049/ipr2.12463
    [6] Zhao K, Yuan K, Sun M, et al. Quality-aware pretrained models for blind image quality assessment[C]//Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023: 22302–22313. doi: 10.1109/CVPR52729.2023.02136
    [7] Fang Y M, Zhu H W, Zeng Y, et al. Perceptual quality assessment of smartphone photography[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020: 3674–3683. doi: 10.1109/CVPR42600.2020.00373
    [8] Yuan Z F, Qi Y, Hu M H, et al. Opinion-unaware no-reference image quality assessment of smartphone camera images based on aesthetics and human perception[C]//2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 2020: 1–6. doi: 10.1109/ICMEW46912.2020.9106048
    [9] Zhou Y W, Wang Y L, Kong Y Y, et al. Multi-indicator image quality assessment of smartphone camera based on human subjective behavior and perception[C]//2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 2020: 1–6. doi: 10.1109/ICMEW46912.2020.9105971
    [10] Huang C H, Wu J L. Multi-task deep CNN model for no-reference image quality assessment on smartphone camera photos[Z]. arXiv: 2008.11961, 2020. https://arxiv.org/abs/2008.11961
    [11] Yao C, Lu Y R, Liu H, et al. Convolutional neural networks based on residual block for no-reference image quality assessment of smartphone camera images[C]//Proceedings of the 2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 2020: 1–6. doi: 10.1109/ICMEW46912.2020.9106034
    [12] Wang B, Bai Y Q, Zhu Z J, et al. No-reference light field image quality assessment based on joint spatial-angular information[J]. Opto-Electron Eng, 2024, 51(9): 240139. doi: 10.12086/oee.2024.240139
    [13] Chen S, Wen Y X, An H M. Non-uniform illumination speckle image correction based on multi-scale bilateral filtering Retinex[J/OL]. Laser Technology, 1–16 [2025-01-18]. http://kns.cnki.net/kcms/detail/51.1125.TN.20240116.1129.004.html
    [14] Liu J, Tang J L, Lin B, et al. Rust spot image recognition of coatings based on HSV and shape feature[J]. China Surf Eng, 2023, 36(4): 217−228. doi: 10.11933/j.issn.1007-9289.20221008001
    [15] Liu Q G, Liu P, Wang Y H, et al. Semi-parametric decolorization with Laplacian-based perceptual quality metric[J]. IEEE Trans Circuits Syst Video Technol, 2017, 27(9): 1856−1868. doi: 10.1109/TCSVT.2016.2555779
    [16] Ma C H, Hu W H, Zhong H C, et al. SAR image structure optimization method using Sobel operator fusion[J]. J Detect Control, 2024, 46(2): 119−124.
    [17] Luo X Y, Hu Z, Tang W C, et al. Species identification of ore particles combined with Fourier and LBP descriptors[J]. Transducer Microsyst Technol, 2023, 42(11): 147−150. doi: 10.13873/J.1000-9787(2023)11-0147-04
    [18] Liu Z, Lin Y T, Cao Y, et al. Swin Transformer: hierarchical vision transformer using shifted windows[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021: 9992–10002. doi: 10.1109/ICCV48922.2021.00986
    [19] Wang G P, Li X, Jia X F, et al. STransMNet: a stereo matching method with Swin Transformer fusion[J]. Opto-Electron Eng, 2023, 50(4): 220246. doi: 10.12086/oee.2023.220246
    [20] Ghadiyaram D, Bovik A C. Massive online crowdsourced study of subjective and objective picture quality[J]. IEEE Trans Image Process, 2016, 25(1): 372−387. doi: 10.1109/TIP.2015.2500021
    [21] Saad M A, Bovik A C, Charrier C. Blind image quality assessment: a natural scene statistics approach in the DCT domain[J]. IEEE Trans Image Process, 2012, 21(8): 3339−3352. doi: 10.1109/TIP.2012.2191563
    [22] Moorthy A K, Bovik A C. Blind image quality assessment: from natural scene statistics to perceptual quality[J]. IEEE Trans Image Process, 2011, 20(12): 3350−3364. doi: 10.1109/TIP.2011.2147325
    [23] Mittal A, Moorthy A K, Bovik A C. No-reference image quality assessment in the spatial domain[J]. IEEE Trans Image Process, 2012, 21(12): 4695−4708. doi: 10.1109/TIP.2012.2214050
    [24] Ye P, Kumar J, Kang L, et al. Unsupervised feature learning framework for no-reference image quality assessment[C]//Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012: 1098–1105. doi: 10.1109/CVPR.2012.6247789
    [25] Zhang L, Zhang L, Bovik A C. A feature-enriched completely blind image quality evaluator[J]. IEEE Trans Image Process, 2015, 24(8): 2579−2591. doi: 10.1109/TIP.2015.2426416
    [26] Xu J T, Ye P, Li Q H, et al. Blind image quality assessment based on high order statistics aggregation[J]. IEEE Trans Image Process, 2016, 25(9): 4444−4457. doi: 10.1109/TIP.2016.2585880
    [27] Bosse S, Maniry D, Müller K R, et al. Deep neural networks for no-reference and full-reference image quality assessment[J]. IEEE Trans Image Process, 2018, 27(1): 206−219. doi: 10.1109/TIP.2017.2760518
    [28] Yan Q S, Gong D, Zhang Y N. Two-stream convolutional networks for blind image quality assessment[J]. IEEE Trans Image Process, 2019, 28(5): 2200−2211. doi: 10.1109/TIP.2018.2883741
    [29] Golestaneh S A, Dadsetan S, Kitani K M. No-reference image quality assessment via transformers, relative ranking, and self-consistency[C]//Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022: 3989–3999. doi: 10.1109/WACV51458.2022.00404
    [30] Zhang W X, Ma K D, Yan J, et al. Blind image quality assessment using a deep bilinear convolutional neural network[J]. IEEE Trans Circuits Syst Video Technol, 2020, 30(1): 36−47. doi: 10.1109/TCSVT.2018.2886771
    [31] Su S L, Yan Q S, Zhu Y, et al. Blindly assess image quality in the wild guided by a self-adaptive hyper network[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020: 3664–3673. doi: 10.1109/CVPR42600.2020.00372
    [32] Wu J J, Ma J P, Liang F H, et al. End-to-end blind image quality prediction with cascaded deep neural network[J]. IEEE Trans Image Process, 2020, 29: 7414−7426. doi: 10.1109/TIP.2020.3002478
    [33] Pan Z Q, Zhang H, Lei J J, et al. DACNN: blind image quality assessment via a distortion-aware convolutional neural network[J]. IEEE Trans Circuits Syst Video Technol, 2022, 32(11): 7518−7531. doi: 10.1109/TCSVT.2022.3188991
    [34] Saha A, Mishra S, Bovik A C. Re-IQA: unsupervised learning for image quality assessment in the wild[C]//Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023: 5846–5855. doi: 10.1109/CVPR52729.2023.00566
    [35] Qin G Y, Hu R Z, Liu Y T, et al. Data-efficient image quality assessment with attention-panel decoder[C]//Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023: 2091–2100. doi: 10.1609/aaai.v37i2.25302
    [36] Xu K M, Liao L, Xiao J, et al. Boosting image quality assessment through efficient transformer adaptation with local feature enhancement[C]//Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024: 2662–2672. doi: 10.1109/CVPR52733.2024.00257

Publication history
Received:  2024-11-11
Revised:  2024-12-23
Accepted:  2024-12-23
Published:  2025-01-25
