An end-to-end neural network for mobile phone detection in driving scenarios

Citation: Dai T, Zhang K, Yin D. An end-to-end neural network for mobile phone detection in driving scenarios[J]. Opto-Electron Eng, 2021, 48(4): 200325. doi: 10.12086/oee.2021.200325


  • Fund project: 2018 Anhui Key Research and Development Plan Project (1804a09020049)
  • *Corresponding author: Yin Dong (1965-), male, associate professor, mainly engaged in image processing research. E-mail: yindong@ustc.edu.cn
  • Chinese Library Classification: TP181; TP391.41

  • Abstract: Real-time detection of small objects has long been a difficult problem in image processing. Building on deep-learning object detection, this paper proposes an end-to-end neural network for small-object detection, such as mobile phones, in complex driving scenes. First, by improving the YOLOv4 algorithm, an end-to-end small-object detection network (OMPDNet) is designed to extract image features. Second, based on K-means, a clustering algorithm K-means-Precise is designed whose cluster centers fit the sample distribution more closely; it generates anchors suited to small-object data and thereby improves the efficiency of the network model. Finally, a dataset is constructed with supervised and weakly supervised methods, and negative samples are added to it for training. Experiments in complex driving scenes show that the proposed OMPDNet not only effectively detects drivers using mobile phones while driving, but also outperforms current popular algorithms in both accuracy and real-time performance on small-object detection.

  • Overview: Real-time detection of small objects has long been a difficult problem in image processing: small objects have low resolution and are hard to detect, which often leads to missed and false detections. In this paper, based on deep-learning object detection, an end-to-end neural network is proposed for small-object detection, such as mobile phones, in complex driving scenes. First, to maintain high accuracy while ensuring real-time performance, the YOLOv4 algorithm is improved and an end-to-end small-object detection network (OMPDNet) is designed to extract image features. Second, since appropriately sized anchors improve the convergence speed and accuracy of the model, a clustering algorithm K-means-Precise is presented, based on K-means, whose cluster centers better fit the distribution of the sample data; it generates anchors suited to small-object data and thus improves the efficiency of the network model. Finally, to make up for the lack of public datasets for this specific driving scenario, a dataset (OMPD Dataset) is built with supervised and weakly supervised methods from in-car monitoring camera videos, a small number of public datasets, and internet pictures. Moreover, to address the imbalance between positive and negative samples, negative samples are added to the training set. Experimental results on the OMPD Dataset show that K-means-Precise slightly improves the accuracy of the model and, more importantly, converges five epochs earlier. The overall detection performance, evaluated by precision, recall, and mean average precision, reaches 89.7%, 96.1%, and 89.4% respectively, at a speed of 72.4 frames per second.
    These results show that in complex driving scenes the proposed OMPDNet not only effectively detects drivers using mobile phones while driving, but also has advantages in accuracy and real-time performance over current popular small-object detection algorithms. Real-time performance matters especially in practical engineering applications: recognizing phone use while driving can reduce traffic accidents and benefit traffic management departments. The proposed method is not limited to mobile phone detection and can be extended to other small-object detection problems in deep learning. In future work, we will continue to improve the algorithm and its generalization performance.
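As background for the anchor-generation step discussed above: YOLO-family detectors commonly derive anchor priors by clustering the (width, height) of ground-truth boxes under a 1−IoU distance. The sketch below shows that standard baseline only, not the paper's K-means-Precise (which additionally prunes outlier samples); all function and parameter names are illustrative.

```python
import numpy as np

def iou_wh(boxes, anchors):
    # IoU between boxes and anchors compared by (w, h) only,
    # as if all were aligned at a common top-left corner.
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
             np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union

def anchor_kmeans(boxes, k, iters=100):
    # K-means over (w, h) pairs with distance d = 1 - IoU, the usual
    # way YOLO-style detectors pick anchor priors from training boxes.
    boxes = np.asarray(boxes, dtype=float)
    anchors = boxes[:k].copy()          # deterministic init for the sketch
    for _ in range(iters):
        labels = (1.0 - iou_wh(boxes, anchors)).argmin(axis=1)
        new = np.array([boxes[labels == j].mean(axis=0) if (labels == j).any()
                        else anchors[j] for j in range(k)])
        if np.allclose(new, anchors):   # assignments stable -> converged
            break
        anchors = new
    return anchors
```

Clustering on 1−IoU rather than Euclidean distance keeps large boxes from dominating the objective, which is why it is the common choice for anchor selection.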

  • Figure 1.  Basic two-stage process

    Figure 2.  Basic one-stage process (end to end)

    Figure 3.  OMPDNet network architecture

    Figure 4.  Clustering of data and hard-example data

    Figure 5.  The OMPD Dataset

    Figure 6.  Detection results in complex driving scenarios

    Figure 7.  Abnormal detection results

    Input: dataset {x1, x2, …, xN}, number of clusters k, variance threshold λ
    Output: cluster assignments q(xi) ∈ {1, 2, …, k}
    Initialize centroids {c1, c2, …, ck}
    Repeat
        Repeat
            for i = 1, 2, …, N do
                q(xi) ← argmin_j ‖xi − cj‖
            end for
            for j = 1, 2, …, k do
                cj ← mean{xi | q(xi) = j}
            end for
        Until centroids do not change
        for j = 1, 2, …, k do
            compute the mean mj and variance σj of cluster j
            if σj > λ then
                remove the outlying samples of cluster j
            end if
        end for
        {x1, x2, …, xN} ← {x1, x2, …, xN′}, N ← N′
    Until dataset and centroids do not change
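A minimal NumPy sketch of one plausible reading of this pseudocode: alternate standard K-means with a pruning pass that, in any cluster whose mean squared deviation exceeds λ, removes the samples lying farther than λ (in squared distance) from the centroid. The pseudocode leaves the exact pruning rule ambiguous ("remove the sample"), so that rule, along with the function and parameter names, is an assumption of this sketch.

```python
import numpy as np

def kmeans_precise(data, k, lam, max_rounds=10):
    # Sketch of K-means-Precise: repeat {K-means to convergence, then
    # prune outliers from high-variance clusters} until nothing changes.
    x = np.asarray(data, dtype=float)
    centroids = x[:k].copy()            # deterministic init: first k samples
    for _ in range(max_rounds):
        # (a) standard K-means: assign each sample, then recompute means
        for _ in range(300):
            d = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            new_c = np.array([x[labels == j].mean(axis=0) if (labels == j).any()
                              else centroids[j] for j in range(k)])
            if np.allclose(new_c, centroids):
                break
            centroids = new_c
        # (b) pruning pass: drop far-out samples in high-variance clusters
        keep = np.ones(len(x), dtype=bool)
        for j in range(k):
            idx = np.flatnonzero(labels == j)
            if idx.size == 0:
                continue
            dist2 = ((x[idx] - centroids[j]) ** 2).sum(axis=1)
            if dist2.mean() > lam:               # cluster variance exceeds λ
                keep[idx[dist2 > lam]] = False   # remove the outlying samples
        if keep.all():                           # dataset unchanged -> done
            break
        x = x[keep]
    return centroids, x
```

After pruning, the centroids are refit to the remaining samples, so they track the bulk of the data rather than being dragged toward outliers, which is the stated motivation for K-means-Precise.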

    Table 1.  Experimental results of K-means-Precise

    Method                      P/%   R/%   mAP/%  Convergence
    OMPDNet                     84.5  94.2  82.4   64 epochs
    OMPDNet + K-means-Precise   85.7  94.3  83.2   59 epochs

    Table 2.  Experimental results of negative-sample training

    Method P/% R/% mAP/%
    OMPDNet 84.5 94.2 82.4
    OMPDNet+negative sample training 89.2 94.1 86.3

    Table 3.  Performance comparison of five algorithms

    Method P/% R/% mAP/% Speed/(f/s)
    Faster R-CNN 85.4 83.5 78.6 23.2
    SSD 78.6 75.9 75.8 44.5
    YOLOv3 82.3 79.4 80.1 52.4
    YOLOv4 89.8 84.2 83.6 56.8
    Ours 89.7 96.1 89.4 72.4

Publication history
Received:  2020-09-02
Revised:  2020-12-21
Published:  2021-04-15
