Abstract
Compared with high-quality visible-light images, thermal images tend to produce a higher false-alarm rate in pedestrian detection tasks. The main reason is that thermal images, limited by imaging resolution and spectral characteristics, lack clear texture features, and some samples have poor feature quality that interferes with network learning. We propose a thermal pedestrian detection algorithm based on a multi-task learning framework, which makes the following improvements on top of a multi-scale detection framework. First, a saliency detection task is introduced as an auxiliary branch alongside the object detection network to form a multi-task learning framework, which indirectly strengthens the detector's attention to strongly salient regions and their edge information through joint learning. Second, the learning weight of noisy samples is suppressed by introducing each sample's saliency strength into the classification loss function. Detection results on the public KAIST dataset confirm that, compared with the baseline RetinaNet, our algorithm reduces the log-average miss rate (MR-2) by 4.43%.
Key words: thermal pedestrian detection / multi-task learning / saliency detection
Overview
In recent years, pedestrian detection based on visible-light images has developed rapidly. However, interference from lighting, smoke, and occlusion makes it difficult to achieve robust around-the-clock detection with these images alone. Thermal images, on the other hand, sense the thermal radiation emitted by targets in a specified wavelength band; they are highly resistant to ambient lighting and other interference, and are widely used in security and transportation. At present, detection performance on thermal images still needs improvement, as it suffers from the poor quality of thermal imagery and from the interference of noisy samples with network learning.
To improve the performance of thermal pedestrian detection, we first introduce saliency maps as supervised information and adopt a multi-task learning framework, in which the main network performs pedestrian detection and an auxiliary network performs saliency detection. By sharing the feature extraction modules of the two tasks, the network gains saliency detection capability while being guided to focus on salient regions. To find the most reasonable design for the auxiliary network, we test four variants, ranging from independent-learning to guided-attention models. Second, by visualizing the pedestrian samples, we find that noisy samples exhibit weaker saliency responses in thermal images. We therefore introduce the saliency strength of each sample into the classification loss function through a hand-designed mapping function to relieve the interference of noisy samples with network learning. Specifically, we adopt a suitably transformed sigmoid function as the mapping function, which maps a sample's salient-area percentage to a saliency score. Finally, we introduce the saliency score into the focal loss and design the Smooth Focal Loss, which, under reasonable settings, reduces the loss contribution of low-saliency samples.
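The full paper gives the exact form of the mapping function and loss; the sketch below is only one plausible reading of the parameters in Table 4 (S_low, w, b, α): a sigmoid that maps the salient-area percentage p ∈ [0, 1] to a score s(p) = w·σ(α·p) + b, so that s(0) = b + w/2 = S_low and s(p) saturates near b + w = 1, together with a focal loss whose positive-sample terms are scaled by that score. The function names and the focal-loss hyper-parameters (γ = 2, α_t = 0.25, mean reduction) are illustrative assumptions, not the paper's verbatim implementation.

```python
import torch

def saliency_score(p, w=0.6, b=0.4, alpha=14.0):
    """Map a sample's salient-area percentage p (tensor in [0, 1]) to a score.

    A guessed reconstruction from Table 4: s(0) = b + w/2 = S_low, and
    s(p) -> b + w = 1.0 as p grows, with alpha controlling the steepness.
    """
    return w * torch.sigmoid(alpha * p) + b

def smooth_focal_loss(logits, targets, score, gamma=2.0, alpha_t=0.25):
    """Focal loss with per-sample saliency weighting (illustrative sketch).

    logits:  (N,) raw classification scores, as in RetinaNet's binary head.
    targets: (N,) float tensor of 0/1 ground-truth labels.
    score:   (N,) saliency scores from saliency_score(); low-saliency
             (noisy) positives are down-weighted, negatives are untouched.
    """
    p = torch.sigmoid(logits)
    pt = targets * p + (1 - targets) * (1 - p)               # prob. of true class
    at = targets * alpha_t + (1 - targets) * (1 - alpha_t)   # class-balance factor
    fl = -at * (1 - pt) ** gamma * torch.log(pt.clamp(min=1e-6))
    weight = targets * score + (1 - targets)                 # scale positives only
    return (weight * fl).mean()
```

Under this reading, the best setting in Table 4 (S_low = 0.7, i.e., w = 0.6, b = 0.4, with α = 14) scales the loss of a positive sample with no salient pixels by 0.7, while a strongly salient sample keeps a weight close to 1.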
Extensive experiments on the KAIST thermal images support the following conclusions. First, compared with the other auxiliary frameworks, our cascaded model with its independent design achieves impressive performance. Second, compared with RetinaNet, we decrease the log-average miss rate by 4.43%, which is competitive among popular thermal pedestrian detection methods. Finally, since our method is a training strategy, it adds no computational cost at inference. Although the effectiveness of the method has been demonstrated, its hyper-parameters still need to be set manually; enabling the network to adapt to various detection conditions will be our next research direction.
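For reference, MR-2 denotes the log-average miss rate used by the Caltech/KAIST pedestrian benchmarks: the miss rate is averaged (geometrically) over nine false-positives-per-image (FPPI) points evenly spaced in log space between 10^-2 and 10^0. A minimal sketch of the metric follows, assuming the miss-rate/FPPI curve has already been computed from the detector's outputs:

```python
import numpy as np

def log_average_miss_rate(fppi, miss_rate):
    """Log-average miss rate (MR-2) per the Caltech/KAIST protocol.

    fppi, miss_rate: 1-D arrays (sorted by increasing fppi) tracing the
    detector's miss rate against false positives per image.
    """
    refs = np.logspace(-2.0, 0.0, 9)  # nine points from 1e-2 to 1e0
    mrs = []
    for ref in refs:
        idx = np.where(fppi <= ref)[0]
        # Step interpolation: miss rate at the largest fppi not above ref;
        # if the curve never reaches this fppi, fall back to the first point.
        mrs.append(miss_rate[idx[-1]] if idx.size else miss_rate[0])
    # Geometric mean, conventionally reported as a percentage.
    return float(np.exp(np.log(np.maximum(mrs, 1e-10)).mean()) * 100)
```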
Table 1. Quantitative analysis of R3Net saliency detection results

| Method | F_β-score | MAE |
|---|---|---|
| R3Net | 0.6875 | 0.0045 |

Table 2. Independent-learning framework performance test

| Algorithm | MR-2/% (all) | mAP/% (all) | MR-2/% (day) | mAP/% (day) | MR-2/% (night) | mAP/% (night) |
|---|---|---|---|---|---|---|
| RetinaNet (baseline) | 24.68 | 83.20 | 29.41 | 80.45 | 15.41 | 89.11 |
| PAR | 25.16 | 84.00 | 30.35 | 80.80 | 13.36 | 91.43 |
| CAS | 22.15 | 86.39 | 27.77 | 82.75 | 9.63 | 94.16 |

Table 3. Guided-attention framework performance test

| Algorithm | MR-2/% (all) | mAP/% (all) | MR-2/% (day) | mAP/% (day) | MR-2/% (night) | mAP/% (night) |
|---|---|---|---|---|---|---|
| RetinaNet (baseline) | 24.68 | 83.20 | 29.41 | 80.45 | 15.41 | 89.11 |
| Guided (a) | 30.01 | 79.96 | 34.68 | 76.86 | 19.40 | 87.23 |
| Guided (b) | 21.94 | 85.74 | 26.97 | 82.72 | 11.86 | 92.38 |

Table 4. Comparative detection performance under different parameter settings

| S_low | w | b | α | MR-2/% (all) | mAP/% (all) | MR-2/% (day) | mAP/% (day) | MR-2/% (night) | mAP/% (night) |
|---|---|---|---|---|---|---|---|---|---|
| 0.75 | 0.5 | 0.5 | 16 | 20.63 | 86.50 | 25.15 | 83.55 | 10.63 | 93.19 |
| 0.75 | 0.5 | 0.5 | 14 | 21.88 | 85.57 | 26.66 | 82.80 | 12.67 | 91.53 |
| 0.75 | 0.5 | 0.5 | 12 | 22.37 | 85.30 | 27.64 | 81.97 | 11.25 | 92.58 |
| 0.7 | 0.6 | 0.4 | 16 | 22.72 | 85.36 | 28.51 | 81.86 | 10.84 | 92.78 |
| 0.7 | 0.6 | 0.4 | 14 | 20.25 | 86.13 | 25.18 | 82.81 | 9.57 | 93.57 |
| 0.7 | 0.6 | 0.4 | 12 | 21.76 | 85.82 | 26.36 | 82.90 | 11.46 | 92.39 |
| 0.65 | 0.7 | 0.3 | 16 | 21.13 | 85.89 | 26.06 | 82.83 | 10.37 | 92.80 |
| 0.65 | 0.7 | 0.3 | 14 | 22.35 | 84.58 | 27.04 | 81.62 | 12.52 | 91.30 |
| 0.65 | 0.7 | 0.3 | 12 | 21.05 | 86.15 | 24.99 | 83.63 | 12.18 | 91.97 |

Table 5. Performance comparison of thermal pedestrian detection methods on KAIST, where +SM denotes introducing saliency maps following Ref. [13]

| Detectors | MR-2-all (%) | MR-2-day (%) | MR-2-night (%) |
|---|---|---|---|
| Faster RCNN-T [13] | 47.59 | 50.13 | 40.93 |
| Faster RCNN+SM [13] | — | 30.4 | 21 |
| Bottom up [25] | 35.2 | 40 | 20.5 |
| TC-thermal [14] | 28.53 | 36.59 | 11.03 |
| TC-Det [14] | 27.11 | 34.81 | 10.31 |
| RetinaNet (baseline) | 24.68 | 29.41 | 15.41 |
| RetinaNet+SM | 23.47 | 30.30 | 9.85 |
| Ours (CAS) | 22.15 | 27.77 | 9.63 |
| Ours (CAS+Smooth FL) | 20.25 | 25.18 | 9.57 |
References
[1] Zhang L L, Lin L, Liang X D, et al. Is faster R-CNN doing well for pedestrian detection?[C]//Proceedings of the 14th European Conference on Computer Vision, 2016: 443–457.
[2] Li J N, Liang X D, Shen S M, et al. Scale-aware fast R-CNN for pedestrian detection[J]. IEEE Trans Multimed, 2018, 20(4): 985–996. https://ieeexplore.ieee.org/document/8060595
[3] Zhang B H, Zhu S Y, Lv X Q, et al. Soft multilabel learning and deep feature fusion for unsupervised person re-identification[J]. Opto-Electron Eng, 2020, 47(12): 190636. doi: 10.12086/oee.2020.190636
[4] Zhang X Y, Zhang B H, Lv X Q, et al. The joint discriminative and generative learning for person re-identification of deep dual attention[J]. Opto-Electron Eng, 2021, 48(5): 200388. doi: 10.12086/oee.2021.200388
[5] Hwang S, Park J, Kim N, et al. Multispectral pedestrian detection: Benchmark dataset and baseline[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015: 1037–1045.
[6] Liu J J, Zhang S T, Wang S, et al. Multispectral deep neural networks for pedestrian detection[Z]. arXiv preprint arXiv: 1611.02644, 2016.
[7] Wang R G, Wang J, Yang J, et al. Feature pyramid random fusion network for visible-infrared modality person re-identification[J]. Opto-Electron Eng, 2020, 47(12): 190669. doi: 10.12086/oee.2020.190669
[8] Zhang R Z, Zhang J L, Qi X P, et al. Infrared target detection and recognition in complex scene[J]. Opto-Electron Eng, 2020, 47(10): 200314. doi: 10.12086/oee.2020.200314
[9] Ren S, He K, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Trans Pattern Anal Mach Intell, 2017, 39(6): 1137–1149. https://ieeexplore.ieee.org/document/7485869
[10] Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 779–788.
[11] John V, Mita S, Liu Z, et al. Pedestrian detection in thermal images using adaptive fuzzy C-means clustering and convolutional neural networks[C]//2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015: 246–249.
[12] Devaguptapu C, Akolekar N, Sharma M M, et al. Borrow from anywhere: pseudo multi-modal object detection in thermal imagery[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019: 1029–1038.
[13] Ghose D, Desai S M, Bhattacharya S, et al. Pedestrian detection in thermal images using saliency maps[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019: 988–997.
[14] Kieu M, Bagdanov A D, Bertini M, et al. Task-conditioned domain adaptation for pedestrian detection in thermal imagery[C]//Proceedings of the 16th European Conference on Computer Vision, 2020: 546–562.
[15] Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), 2017: 2999–3007.
[16] Deng Z J, Hu X W, Zhu L, et al. R3Net: recurrent residual refinement network for saliency detection[C]//Proceedings of the 27th International Joint Conference on Artificial Intelligence, 2018: 684–690.
[17] Koch C, Ullman S. Shifts in selective visual attention: towards the underlying neural circuitry[J]. Hum Neurobiol, 1985, 4(4): 219–227. https://www.cin.ufpe.br/~fsq/Artigos/200.pdf
[18] Hou X D, Zhang L Q. Saliency detection: a spectral residual approach[C]//2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007: 1–8.
[19] Montabone S, Soto A. Human detection using a mobile platform and novel features derived from a visual saliency mechanism[J]. Image Vis Comput, 2010, 28(3): 391–402. doi: 10.1016/j.imavis.2009.06.006
[20] Liu N, Han J W, Yang M H. PiCANet: learning pixel-wise contextual attention for saliency detection[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 3089–3098.
[21] Li C Y, Song D, Tong R F, et al. Illumination-aware faster R-CNN for robust multispectral pedestrian detection[J]. Pattern Recognit, 2019, 85: 161–171. doi: 10.1016/j.patcog.2018.08.005
[22] Li C Y, Song D, Tong R F, et al. Multispectral pedestrian detection via simultaneous detection and segmentation[Z]. arXiv preprint arXiv: 1808.04818, 2018.
[23] Guo T T, Huynh C P, Solh M. Domain-adaptive pedestrian detection in thermal images[C]//2019 IEEE International Conference on Image Processing (ICIP), 2019: 1660–1664.