NIR-VIS face image translation method with dual contrastive learning framework

Citation: Sun R, Shan X Q, Sun Q J, et al. NIR-VIS face image translation method with dual contrastive learning framework[J]. Opto-Electron Eng, 2022, 49(4): 210317. doi: 10.12086/oee.2022.210317


  • Fund project: National Natural Science Foundation of China (61471154, 61876057) and the Key Research Plan of Anhui Province - Strengthening Police with Science and Technology (202004d07020012)
  • Corresponding author: Shan Xiaoquan, 2334321350@qq.com
  • CLC number: TP391

  • Abstract: With the widespread deployment of visible-infrared dual-mode cameras in video surveillance, cross-modal face recognition has become a research hotspot in computer vision. Translating near-infrared (NIR) face images into visible (VIS) face images is a key problem in cross-modal face recognition and is of great value in criminal investigation and security applications. To address the problems that facial contours are easily distorted and skin tones are rendered unrealistically when colorizing NIR face images, this paper proposes a NIR-VIS face image translation method under a dual contrastive learning framework. The method builds a generator network based on the StyleGAN2 architecture and embeds it into the dual contrastive learning framework, using bidirectional contrastive learning to mine fine-grained representations of face images. In addition, a facial edge enhancement loss is designed, which uses facial edge information extracted from the source-domain images to further strengthen the facial details of the generated face images and improve their visual quality. Experiments on the NIR-VIS Sx1 and NIR-VIS Sx2 datasets show that, compared with recent mainstream methods, the VIS face images generated by our method are closer to the real images and better restore facial edge details and skin tone information.
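
    The "bidirectional contrastive learning" mentioned above is, in frameworks of this family (CUT [23], DCLGAN [24]), typically a patch-wise InfoNCE objective applied once per translation direction with separate embedding networks. The following is a minimal sketch of such a patch-wise loss, not the paper's exact implementation; the function name, shapes, and temperature value are illustrative assumptions.

    ```python
    import torch
    import torch.nn.functional as F

    def patch_nce_loss(feat_q, feat_k, tau=0.07):
        """Patch-wise InfoNCE loss.

        feat_q: (N, C) embeddings of N patches from the generated image.
        feat_k: (N, C) embeddings of the patches at the same spatial
                locations in the input image; the matching patch is the
                positive, all other patches act as negatives.
        """
        feat_q = F.normalize(feat_q, dim=1)
        feat_k = F.normalize(feat_k, dim=1)
        logits = feat_q @ feat_k.t() / tau          # (N, N) similarities
        targets = torch.arange(feat_q.size(0), device=feat_q.device)
        return F.cross_entropy(logits, targets)     # positives on the diagonal
    ```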

  • Overview: Near-infrared (NIR) image sensors are widely used because they are robust to changes in natural light and work under various lighting conditions. In criminal investigation and security applications, however, NIR face images are usually not used directly for face retrieval and recognition, because the single-channel images acquired by NIR sensors lack the natural colors of the original scene. Converting NIR face images into VIS face images and restoring their color information can therefore improve both the subjective visual quality of the images and cross-modal recognition performance, and provides technical support for building round-the-clock video surveillance systems. NIR face images also differ from other NIR images: if facial contours or skin tones are distorted during colorization, the visual quality of the generated face images suffers greatly, so the algorithm must be designed to preserve detailed information during colorization. We propose a NIR-VIS face image translation method under a dual contrastive learning framework. The method builds on the dual contrastive learning network and uses patch-level contrastive learning to improve the quality of the generated images in local regions. Meanwhile, since the StyleGAN2 network can extract deeper features of face images than ResNet, we construct a generator based on the StyleGAN2 structure and embed it into the dual contrastive learning network in place of the original ResNet generator, further improving the quality of the generated faces. In addition, to counter the blurred outer contours and missing edges that are characteristic of NIR-domain images, we design a facial edge enhancement loss that uses facial edge information extracted from the source-domain images to further strengthen the facial details of the generated face images. Experiments on two public datasets show that the results generated by our method are significantly better than those of recent mainstream methods: the VIS face images generated by our method are closer to the real images and retain more facial edge detail and skin tone information.
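
    As a concrete illustration of the facial edge enhancement loss described above, the sketch below assumes it penalizes the L1 difference between Sobel edge maps (ref. [27]) of the source NIR face and the generated VIS face inside the cropped facial region of Fig. 4; the names `facial_edge_loss` and `face_mask` and the RGB-to-grayscale reduction are assumptions for illustration, not the paper's exact formulation.

    ```python
    import torch
    import torch.nn.functional as F

    # 3x3 Sobel kernels for horizontal and vertical gradients.
    SOBEL_X = torch.tensor([[-1., 0., 1.],
                            [-2., 0., 2.],
                            [-1., 0., 1.]]).view(1, 1, 3, 3)
    SOBEL_Y = SOBEL_X.transpose(2, 3)  # transposed kernel responds to vertical gradients

    def sobel_edges(gray):
        """Edge magnitude of a (B, 1, H, W) grayscale batch."""
        gx = F.conv2d(gray, SOBEL_X.to(gray.device), padding=1)
        gy = F.conv2d(gray, SOBEL_Y.to(gray.device), padding=1)
        return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

    def facial_edge_loss(fake_vis, src_nir, face_mask):
        """L1 distance between the edge maps of the generated VIS face
        and the source NIR face, restricted to the facial region."""
        fake_gray = fake_vis.mean(dim=1, keepdim=True)  # RGB -> single channel
        return F.l1_loss(sobel_edges(fake_gray) * face_mask,
                         sobel_edges(src_nir) * face_mask)
    ```

    During training this term would be weighted by $\lambda_{\rm FEE}$ and added to the overall objective (cf. Fig. 8 and the FEE ablation in Table 4).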

  • Figure 1.  Comparison of the VIS images generated from the NIR domain by several algorithms (first row) with the real VIS images (last row)

    Figure 2.  The structure diagram of the proposed method.

    Figure 3.  The structure diagram of the generator in the proposed method

    Figure 4.  Facial regions cropped and edges extracted from face images under NIR and VIS conditions, respectively

    Figure 5.  Comparative experimental results on the two datasets.

    Figure 6.  Results of the ablation experiments on two datasets.

    Figure 7.  Comparison of edge images obtained with each edge extraction method.

    Figure 8.  The effect of different values of $\lambda_{\rm FEE}$ on the performance of our method on the NIR-VIS Sx1 dataset
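
    Judging from the ablation entries in Table 4 (GAN, IDT, PMC, FEE), the full training objective plausibly combines an adversarial term, an identity term, a patch-wise contrastive term, and the facial edge enhancement term in a weighted sum such as the one below, with $\lambda_{\rm FEE}$ the weight swept in Fig. 8; the remaining weights and exact terms are not given in this excerpt, so this is an inferred sketch rather than the paper's formula:

    $$ \mathcal{L}_{\rm total} = \mathcal{L}_{\rm GAN} + \lambda_{\rm IDT}\,\mathcal{L}_{\rm IDT} + \lambda_{\rm PMC}\,\mathcal{L}_{\rm PMC} + \lambda_{\rm FEE}\,\mathcal{L}_{\rm FEE} $$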

    Table 1.  Performance comparison of image translation networks on the NIR-VIS Sx1 dataset

    Method      Mean SSIM    Mean PSNR/dB
    CycleGAN    0.7433       29.0987
    CSGAN       0.7964       29.9471
    CDGAN       0.7636       29.4922
    UNIT        0.7935       29.8568
    Pix2pixHD   0.8023       31.6584
    Ours        0.8096       31.0976

    Table 2.  Performance comparison of image translation networks on the NIR-VIS Sx2 dataset

    Method      Mean SSIM    Mean PSNR/dB
    CycleGAN    0.6317       28.7974
    CSGAN       0.6891       28.8176
    CDGAN       0.5283       28.1679
    UNIT        0.6986       29.0634
    Pix2pixHD   0.7894       30.5449
    Ours        0.8135       31.2393

    Table 3.  Comparison of FID and average per-image test time of each image translation network on different datasets

    Method      FID (NIR-VIS Sx1)   FID (NIR-VIS Sx2)   Time/s
    CycleGAN    142.2574            171.3596            0.181
    CSGAN       70.2146             102.6718            0.344
    CDGAN       123.7183            212.4299            0.098
    UNIT        74.8315             95.7638             0.358
    Pix2pixHD   67.1044             106.3615            0.079
    Ours        58.5286             46.9364             0.337

    Table 4.  Performance comparison of ablation methods on the NIR-VIS Sx1 dataset

    Method               Mean SSIM    Mean PSNR/dB
    Baseline             0.5279       28.3419
    Ours w/o StyleGAN2   0.5293       28.4381
    Ours w/o GAN         0.3617       11.5007
    Ours w/o IDT         0.6864       29.2308
    Ours w/o PMC         0.6359       28.6156
    Ours w/o FEE         0.7982       30.2057
    Ours                 0.8096       31.0976

    Table 5.  Performance comparison of applying the Prewitt operator and the Sobel operator respectively on the NIR-VIS Sx1 dataset

    Method           Mean SSIM    Mean PSNR/dB
    Ours (Prewitt)   0.7924       30.2815
    Ours (Sobel)     0.8096       31.0976
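
    For reference, the Prewitt and Sobel operators compared in Table 5 (and Fig. 7) differ only in the weight given to the central row and column; Sobel's doubled center weight yields smoother gradient estimates, which is consistent with its better scores above:

    $$ G_x^{\rm Prewitt} = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}, \qquad G_x^{\rm Sobel} = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} $$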
  • [1] Dutta A K. Imaging beyond human vision[C]//2014 8th International Conference on Electrical and Computer Engineering (ICECE), 2014: 224–229.

    [2] Cao Z C, Schmid N A, Bourlai T. Composite multilobe descriptors for cross-spectral recognition of full and partial face[J]. Opt Eng, 2016, 55(8): 083107. doi: 10.1117/1.OE.55.8.083107

    [3] Sun Y, Wang X G, Tang X O. Deep learning face representation from predicting 10,000 classes[C]//IEEE Conference on Computer Vision & Pattern Recognition, 2014: 1891–1898.

    [4] He R, Wu X, Sun Z N, et al. Wasserstein CNN: learning invariant features for NIR-VIS face recognition[J]. IEEE Trans Pattern Anal Mach Intell, 2019, 41(7): 1761−1773. doi: 10.1109/TPAMI.2018.2842770

    [5] Hu S W, Short N, Riggan B S, et al. Heterogeneous face recognition: recent advances in infrared-to-visible matching[C]//2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), 2017: 883–890.

    [6] Mori A, Wada T. Part based regression with dimensionality reduction for colorizing monochrome face images[C]//2013 2nd IAPR Asian Conference on Pattern Recognition, 2013: 506–510.

    [7] Cheng Z Z, Yang Q X, Sheng B. Deep colorization[C]//Proceedings of the IEEE International Conference on Computer Vision, 2015: 415–423.

    [8] Limmer M, Lensch H P A. Infrared colorization using deep convolutional neural networks[C]//2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), 2016: 61–68.

    [9] Larsson G, Maire M, Shakhnarovich G. Learning representations for automatic colorization[C]//14th European Conference on Computer Vision, 2016: 577–593.

    [10] Limmer M, Lensch H P A. Improved IR-colorization using adversarial training and estuary networks[C]//British Machine Vision Conference, 2017.

    [11] Suárez P L, Sappa A D, Vintimilla B X, et al. Near InfraRed imagery colorization[C]//Proceedings of 2018 25th IEEE International Conference on Image Processing (ICIP), 2018: 2237–2241.

    [12] Goodfellow I J, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems, 2014, 2: 2672–2680.

    [13] Liu M Y, Breuel T, Kautz J. Unsupervised image-to-image translation networks[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017: 700–708.

    [14] Huang X, Liu M Y, Belongie S, et al. Multimodal unsupervised image-to-image translation[C]//Proceedings of the 15th European Conference on Computer Vision, 2018: 179–196.

    [15] Isola P, Zhu J Y, Zhou T H, et al. Image-to-image translation with conditional adversarial networks[C]//IEEE Conference on Computer Vision & Pattern Recognition, 2017: 5967–5976.

    [16] Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation[C]//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015: 234–241.

    [17] Wang T C, Liu M Y, Zhu J Y, et al. High-resolution image synthesis and semantic manipulation with conditional GANs[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 8798–8807.

    [18] Zhu J Y, Park T, Isola P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]//Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017: 2242–2251.

    [19] Wang H J, Zhang H J, Yu L, et al. Facial feature embedded CycleGAN for VIS-NIR translation[C]//ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020: 1903–1907.

    [20] Dou H, Chen C, Hu X Y, et al. Asymmetric CycleGAN for unpaired NIR-to-RGB face image translation[C]//ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019: 1757–1761.

    [21] Babu K K, Dubey S R. CSGAN: cyclic-synthesized generative adversarial networks for image-to-image transformation[J]. Expert Syst Appl, 2021, 169: 114431. doi: 10.1016/j.eswa.2020.114431

    [22] Babu K K, Dubey S R. CDGAN: cyclic discriminative generative adversarial networks for image-to-image transformation[J]. J Vis Commun Image Represent, 2022, 82: 103382. doi: 10.1016/j.jvcir.2021.103382

    [23] Park T, Efros A A, Zhang R, et al. Contrastive learning for unpaired image-to-image translation[C]//Proceedings of the 16th European Conference on Computer Vision, 2020: 319–345.

    [24] Han J L, Shoeiby M, Petersson L, et al. Dual contrastive learning for unsupervised image-to-image translation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2021: 746–755.

    [25] Karras T, Laine S, Aittala M, et al. Analyzing and improving the image quality of StyleGAN[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020: 8107–8116.

    [26] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 770–778.

    [27] Gao W S, Zhang X G, Yang L, et al. An improved Sobel edge detection[C]//Proceedings of the 3rd IEEE International Conference on Computer Science & Information Technology, 2010: 67–71.

    [28] Karras T, Laine S, Aila T. A style-based generator architecture for generative adversarial networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019: 4396–4405.

    [29] Li S Z, Yi D, Lei Z, et al. The CASIA NIR-VIS 2.0 face database[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2013: 348–353.

    [30] Sheikh H R, Sabir M F, Bovik A C. A statistical evaluation of recent full reference image quality assessment algorithms[J]. IEEE Trans Image Process, 2006, 15(11): 3440−3451. doi: 10.1109/TIP.2006.881959

    [31] Ma J Y, Yu W, Liang P W, et al. FusionGAN: a generative adversarial network for infrared and visible image fusion[J]. Inf Fusion, 2019, 48: 11−26. doi: 10.1016/j.inffus.2018.09.004

    [32] Heusel M, Ramsauer H, Unterthiner T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017: 6629–6640.


Publication history
  Received: 2021-09-30
  Revised: 2022-01-20
  Available online: 2022-04-20
  Published: 2022-04-25
