Multi-occluded pedestrian real-time detection algorithm based on preprocessing R-FCN

Liu Hui; Peng Li; Wen Jiwei

doi:10.12086/oee.2019.180606

- Engineering Research Center of Internet of Things Technology Applications of the Ministry of Education, School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China

Fund project: Supported by the Education Ministry and China Mobile Science Research Foundation (MCM20182019)

About the authors:
Liu Hui (1992-), male, master's student, mainly engaged in research on pattern recognition. E-mail: 1391570995@qq.com

**Corresponding author:** Wen Jiwei (1981-), male, Ph.D., associate professor, mainly engaged in research on control theory. E-mail: wjw8143@aliyun.com

CLC number: TP391.4

Received Date 21 November 2018

Revised Date 10 January 2019

Published Date 30 September 2019

Abstract

One of the main challenges for driver assistance systems is detecting multi-occluded pedestrians in real time in complex scenes, in order to reduce traffic accidents. To improve the detection accuracy and speed of such systems, we propose a real-time multi-occluded pedestrian detection algorithm based on an improved region-based fully convolutional network (R-FCN). Building on R-FCN, a region of interest (RoI) Align layer is introduced to resolve the misalignment between the feature map and the RoIs of the original image. A separable convolution layer is modified to reduce the dimensions of the R-FCN position-sensitive score maps and thereby improve detection speed. To handle pedestrian occlusion, a multi-scale context algorithm is proposed that adopts a local competition mechanism for adaptive context scale selection. Because occluded body parts have low visibility, deformable RoI pooling layers are introduced to enlarge the pooled area over body parts. Finally, to reduce redundant pedestrian information in video sequences, the sequence non-maximum suppression (Seq-NMS) algorithm replaces the traditional non-maximum suppression (NMS) algorithm. Experiments on the Caltech and ETH benchmark datasets show a low detection error: the accuracy of the algorithm exceeds that of the detection algorithms currently reported on these datasets, and it works particularly well for occluded pedestrians.

Keywords:
- multi-occluded pedestrian
- separable convolution layer
- multi-scale context
- deformable RoI pooling layer

Overview

Pedestrian detection is an active research topic in pattern recognition and machine learning, and it is widely used in video surveillance, intelligent driving, and robot navigation. Automating it by computer reduces the human workload to some extent. With the development of deep learning theory, convolutional neural networks have achieved remarkable results in pedestrian detection by improving candidate-region generation strategies and by optimizing network structures and training methods. Unlike generic object detection, pedestrian detection deals with a moving, non-rigid target whose appearance changes with occlusion and height. Methods based on feature extraction cannot meet industrial requirements, so we adopt a method based on convolutional neural networks to achieve higher accuracy and real-time detection of multi-occluded pedestrians. The main task of pedestrian detection is to locate the position coordinates of pedestrians accurately in different scenes and to output the detection accuracy of the system. However, the complexity of the surrounding environment (multiple occlusions, weak illumination, and so on) poses a major challenge to the accuracy of a pedestrian detection system. Compared with non-occluded pedestrians, multi-occluded pedestrians lose detection information more easily, so their detection scores fall below the threshold and the pedestrians are missed.

To improve the detection accuracy and speed for multi-occluded pedestrians in complex scenes, we propose a fast deformable fully convolutional pedestrian detection network (called Fast D-FCN). Building on R-FCN, we introduce an RoI Align layer to resolve the misalignment between the feature map and the RoIs of the original image. To improve detection speed, we modify a separable convolution to reduce the dimensions of the position-sensitive score maps and place it on the feature extraction layers of ResNet-50. For multi-occluded pedestrians, we propose a multi-scale context algorithm in res5a of ResNet-50, which adopts a local competition mechanism for adaptive context scale selection. To handle the low visibility of occluded body parts, we introduce deformable RoI pooling layers in res5b of ResNet-50 to enlarge the pooled area of the body model. The res5c layer then outputs a channel feature vector of fixed dimension, the classification probability from the classification layer, and the bounding-box information from the regression layer. Finally, to reduce redundant information in video sequences, we replace the traditional NMS algorithm with the Seq-NMS algorithm.

Experiments show that on the Caltech dataset the detection error for partial occlusion and heavy occlusion decreases by 0.55% and 12.77%, respectively, compared with F-DNN. On the ETH dataset, our algorithm is more accurate than the other detection algorithms compared, and it works particularly well for multi-occluded pedestrians.
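The misalignment that the RoI Align layer addresses can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy 4×4 feature map, a feature stride of 16, and two hypothetical helpers, `bilinear_sample` and `quantized_sample`. It compares reading a feature at a fractional coordinate by truncation, as quantized RoI pooling does, with bilinear interpolation, the sampling rule used by RoI Align.

```python
import math


def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2-D feature map at fractional (y, x).

    `feature` is a list of rows; (y, x) must lie inside the map.
    Hypothetical helper, for illustration only.
    """
    height, width = len(feature), len(feature[0])
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    # Clamp the far neighbours so border coordinates stay in range.
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * feature[y0][x0] + wx * feature[y0][x1]
    bottom = (1 - wx) * feature[y1][x0] + wx * feature[y1][x1]
    return (1 - wy) * top + wy * bottom


def quantized_sample(feature, y, x):
    """Truncate the coordinate to the integer grid (quantized RoI pooling)."""
    return feature[int(math.floor(y))][int(math.floor(x))]


# Toy feature map whose value grows linearly: f[i][j] = 10 * i + j.
feature_map = [[10 * i + j for j in range(4)] for i in range(4)]

# Assumed stride of 16: image coordinates (24, 36) map to (1.5, 2.25).
y, x = 24 / 16, 36 / 16
aligned = bilinear_sample(feature_map, y, x)
quantized = quantized_sample(feature_map, y, x)
print(aligned, quantized)
```

On this linear map the interpolated value matches the underlying function at the fractional location, while truncation returns the value of a neighbouring cell. That offset is the kind of RoI misalignment the overview refers to.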


Figure 1. Schematic of the overall network structure

Figure 2. Schematic of the R-FCN structure

Figure 3. Separable convolution performing the k×1 and 1×k convolutions in sequence
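As a rough illustration of why replacing a k×k kernel with a k×1 convolution followed by a 1×k convolution is cheaper, the sketch below counts the weights in both cases. The channel counts and kernel sizes are illustrative assumptions, not values taken from the paper, and both function names are hypothetical.

```python
def full_conv_weights(k, c_in, c_out):
    """Number of weights in a single k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_weights(k, c_in, c_mid, c_out):
    """Number of weights in a k x 1 convolution followed by a 1 x k one (bias ignored)."""
    return k * c_in * c_mid + k * c_mid * c_out


# Assumed example: 256 channels at the input, in between, and at the output.
for k in (3, 7, 15):
    full = full_conv_weights(k, 256, 256)
    separable = separable_conv_weights(k, 256, 256, 256)
    print(k, full, separable, full / separable)
```

With equal channel counts the saving is a factor of k/2, so the factorization pays off most for large kernels.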

Figure 4. Illustration of 3×3 deformable RoI pooling

Figure 5. Comparison of results on the Caltech benchmark. (a) Partial occlusion; (b) Heavy occlusion

Figure 6. Results on the ETH benchmark

Figure 7. Detection results of the algorithm

Table 1. ResNet-50 network parameters

Layer	Output size	K	Output channels
Image	224×224
Conv1	112×112	3×3	256
MaxPool	56×56	3×3	256
Stage2	28×28 28×28		512
Stage3	14×14 14×14		1024
Stage4	7×7 7×7		2048
FC	1×1		1000
Comp*			98 M
(Comp* denotes the model complexity; K denotes the convolution kernel size.)

Table 2. Comparison of miss rate (MR) and detection speed

Algorithm	Fast D-FCN	SSD	R-FCN
Test size	640×480	512×512	640×480
Base model	ResNet-50	ResNet-50	ResNet-50
Part-occlusion (MR)/%	14.86	20.49	16.09
Heavy-occlusion (MR)/%	42.36	57.64	55.81
Speed/(f/s)	48.71	35.42	11.24
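The gaps in Table 2 are easier to compare as differences and ratios. The short script below only re-expresses the numbers in the table. The dictionary layout and variable names are illustrative choices, not part of the paper.

```python
# Values copied from Table 2: miss rate (MR, %) and speed (frames per second).
results = {
    "Fast D-FCN": {"part_mr": 14.86, "heavy_mr": 42.36, "fps": 48.71},
    "SSD": {"part_mr": 20.49, "heavy_mr": 57.64, "fps": 35.42},
    "R-FCN": {"part_mr": 16.09, "heavy_mr": 55.81, "fps": 11.24},
}

ours = results["Fast D-FCN"]
comparison = {}
for name in ("SSD", "R-FCN"):
    other = results[name]
    comparison[name] = {
        # Positive values mean Fast D-FCN misses fewer pedestrians.
        "part_mr_drop": round(other["part_mr"] - ours["part_mr"], 2),
        "heavy_mr_drop": round(other["heavy_mr"] - ours["heavy_mr"], 2),
        # Ratio of frame rates; above 1 means Fast D-FCN is faster.
        "speedup": round(ours["fps"] / other["fps"], 2),
    }
    print(name, comparison[name])
```

Relative to the R-FCN baseline, Fast D-FCN lowers the heavy-occlusion miss rate by 13.45 percentage points and runs about 4.33 times faster. Relative to SSD, the heavy-occlusion gap is 15.28 points at about 1.38 times the speed.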

