Fast Object Detection Using Dimensional Based Features for Public Street Environments
Abstract
1. Introduction
- low computational requirements, so the algorithm can run on inexpensive, low-performance, low-energy devices (e.g., Raspberry Pi, Orange Pi and the like)
- less than 5% false positive error rate
- less than 30% false negative error rate
- detection of object types which are in need of illumination and expected in the street environment; the system uses information about the object type to adjust the illuminated area, e.g., to provide a longer visibility range for vehicles or to dim the light according to the object class
- ability to work on low-resolution (down to 320 × 240 pixels) grayscale frames captured by an inexpensive camera with infrared illumination
2. Related Work and State of the Art
3. Method Description
- Camera calibration yields the preliminary data: the intrinsic and extrinsic camera parameters as well as the lens distortion coefficients. The calibration can be performed with well-known methods, such as the algorithm by Roger Y. Tsai [28].
- The pre-processing block receives images/video captured by a camera mounted on a static object and directed at the control area. Adaptive histogram equalization is applied to the input images to improve the results of the subsequent background subtraction in nighttime scenes; the contrast-improvement technique and its parameters can be adjusted for the concrete deployment scenario. Lens distortion can be corrected either at this stage, over the entire image, or at the feature-extraction stage for the pixels of interest only, which improves processing speed. The correction uses the coefficients obtained during camera calibration.
- The pre-processed image is fed to a background subtraction method for object segmentation, yielding a binary mask of moving objects. Background subtraction is chosen for its relatively low computational requirements [29]. The method has limitations, such as the inability to segment static objects, which, however, do not contradict the imposed requirements. Background subtraction methods build an average background model that is regularly updated to adapt to gradual changes of lighting and scene conditions [30]. Commonly used methods include GMG, KNN, MOG and MOG2 [31,32,33,34,35], the latter two being based on Gaussian Mixture Models. The MOG2 implementation provided by the OpenCV library was used in the software implementation of the proposed detection method.
- At the filtering and transformation stage, filtering is performed via a morphological opening of the binary image, i.e., an erosion followed by a dilation with the same kernel. The erosion removes noise caused by insignificant fluctuations and lighting changes, while the dilation restores the area of the ROIs to compensate for the erosion shrinking [36]. An additional dilation can be performed to rejoin sub-regions that were split during segmentation due to insufficient illumination. Primary features, such as the contour, the bounding rectangle and the image coordinates of each ROI, are extracted via standard OpenCV methods; the contour area is computed through Green's theorem [37]. Objects intersecting the frame borders are ignored in the following stages because their geometrical form, the defining criterion of detection, is unreliable.
- Information about the camera obtained during calibration allows transforming the ROI's primary parameters into the features estimated width, estimated height and contour area, which represent the object's geometrical form in the real world (3D scene) independently of object distance and scene parameters. The feature extraction stage is explained in detail in Section 4.
- Finally, a trained logistic regression classifier decides whether the candidate belongs to a specific group based on the obtained features. The classification stage is discussed in Section 5. A consolidated code sketch of the stages described in this list is given below.
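The stages above map naturally onto OpenCV primitives. The following is a minimal sketch of the pre-processing, segmentation and filtering stages; the CLAHE settings, kernel size and MOG2 history are illustrative assumptions, not values from the paper:

```python
import cv2

# Illustrative parameters; the paper does not fix concrete values.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=False)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))

def extract_primary_features(frame_gray):
    """Return (contour, bounding box, contour area) of each moving ROI."""
    # Pre-processing: adaptive histogram equalization for nighttime scenes.
    equalized = clahe.apply(frame_gray)
    # Segmentation: MOG2 background subtraction yields a binary mask.
    mask = subtractor.apply(equalized)
    # Filtering: morphological opening (erosion, then dilation with the
    # same kernel) removes noise; extra dilation rejoins split sub-regions.
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.dilate(mask, kernel, iterations=2)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    h_img, w_img = mask.shape
    features = []
    for cnt in contours:
        x, y, w, h = cv2.boundingRect(cnt)
        # Skip ROIs intersecting the frame borders: their geometrical
        # form, the defining criterion of detection, is misleading.
        if x == 0 or y == 0 or x + w >= w_img or y + h >= h_img:
            continue
        features.append((cnt, (x, y, w, h), cv2.contourArea(cnt)))
    return features
```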
4. Proposed Feature Extraction Method
4.1. Object Distance Estimation
- an object intersects the ground surface in the real world in at least one point, and
- the lowest object point in the image lies on the ground surface in the real world
- known intrinsic (internal) camera parameters such as equivalent focal length, physical dimensions of the sensor and image resolution
- known extrinsic (external) parameters which include tilt angle and real-world coordinates of the camera relative to the horizontal surface
- known distortion coefficients of the camera lenses
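Assuming the standard Brown–Conrady lens model (the formulation used, e.g., by OpenCV), the distorted coordinates are obtained as (a hedged reconstruction, not the paper's verbatim equation):

$$
\begin{aligned}
u_d &= u\left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) + 2 p_1 u v + p_2\left(r^2 + 2 u^2\right),\\
v_d &= v\left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) + p_1\left(r^2 + 2 v^2\right) + 2 p_2 u v,
\end{aligned}
\qquad r^2 = u^2 + v^2,
$$

with $(u, v)$ the undistorted point in normalized coordinates; conversion to pixel coordinates follows via the intrinsic matrix.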
where
- $u_d$, $v_d$ — coordinates of the distorted point in the image plane along the horizontal and vertical axes, in pixels
- $k_1$–$k_3$ — radial distortion coefficients
- $p_1$, $p_2$ — tangential distortion coefficients
where
- $v_b$ — bottom coordinate of the object in the image plane, in pixels
- $h_s$ — height of the camera sensor, in mm
- $V$ — total number of pixels along the v-axis (the vertical resolution of the image)
where
- $f$ — focal length of the camera, in mm
- $\alpha$ — tilt angle of the camera with respect to the ground, in degrees
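Combining these quantities with the camera mounting height gives the ground distance to the object's lowest image point. The following sketch is our reconstruction under the flat-ground assumption stated above; the symbol names follow the variable lists, and `h_cam` (camera height above the ground, known from the extrinsic calibration) is our addition:

```python
import math

def object_distance(v_b, V, h_s, f, alpha_deg, h_cam):
    """Estimate ground distance (m) to the object's lowest image point.

    v_b       bottom coordinate of the object along the v-axis, pixels
    V         total number of pixels along the v-axis
    h_s       sensor height, mm
    f         focal length, mm
    alpha_deg camera tilt with respect to the ground, degrees
    h_cam     camera mounting height above the ground, m
    """
    # Offset of the bottom point from the image center on the sensor,
    # in mm (positive below the optical axis).
    y_s = (v_b - V / 2.0) * h_s / V
    # Angle between the optical axis and the ray through the bottom point.
    beta = math.atan2(y_s, f)
    # The ray meets the ground at the total depression angle alpha + beta.
    return h_cam / math.tan(math.radians(alpha_deg) + beta)
```

For example, with the Scene A setup of Section 6 (camera at 3 m, 13° tilt, 424 × 240 frames, 3.6 mm sensor height, 3.67 mm focal length), `object_distance(v_b=200, V=240, h_s=3.6, f=3.67, alpha_deg=13, h_cam=3.0)` yields roughly 5 m.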
4.2. Object Height Estimation
where
- $v_{low}$, $v_{high}$ — the lowest and the highest object coordinates along the v-axis, in pixels
4.3. Object Width Estimation
where
- $u$, $v$ — coordinates of the projected point in the image plane, in pixels
- $K$ — the intrinsic camera matrix; it represents the relationship between camera coordinates and image-plane coordinates and includes the focal lengths in pixels and the central coordinates of the image plane in pixels
- $[R\,|\,t]$ — the extrinsic camera matrix; it describes the camera location and orientation in the world and includes the matrix $R$ of rotation about the X axis by the angle $\alpha$ and the translation vector $t = (t_x, t_y, t_z)^T$. The values $t_x$, $t_y$, $t_z$ can be equal to zero if the camera translation is incorporated into the world coordinates
- $W$ — real-world coordinates of a point, in meters, expressed in homogeneous form
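These quantities combine into the standard pinhole projection [38], presumably the equation this variable list annotates:

$$
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \, [R \,|\, t] \, W .
$$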
where
- $K^{-1}$ — the inverted intrinsic camera matrix
- $[R\,|\,t]^{-1}$ — the inverted extrinsic camera matrix
- $s$ — a scaling factor, equal to the distance to the object in camera coordinates, in meters
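Solving the projection for $W$ gives $W = [R\,|\,t]^{-1}\left(s\,K^{-1}\,[u, v, 1]^T\right)$. A minimal numpy sketch of this back-projection, under our simplifying assumptions of rotation about the X axis only and zero translation:

```python
import numpy as np

def backproject(u, v, s, K, alpha_deg):
    """Map an image point (u, v) at camera-space distance s (m) back to
    real-world coordinates, assuming rotation about the X axis only."""
    a = np.radians(alpha_deg)
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0, np.cos(a), -np.sin(a)],
                  [0.0, np.sin(a), np.cos(a)]])
    # Camera ray through the pixel, scaled to the known distance s.
    p_cam = s * np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Undo the camera rotation (translation assumed zero here).
    return np.linalg.inv(R) @ p_cam

# The estimated object width follows by back-projecting the left and
# right edges of the bounding rectangle at the same distance s and
# taking the difference of the resulting X coordinates.
```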
5. Classification
5.1. Features
where
- $h_{bb}$, $w_{bb}$ — height and width of the object's bounding rectangle in the image plane, in pixels
- $S$ — the contour area of the object, in pixels
5.2. Object Classes
5.3. Data Acquisition
5.4. Classifier Model
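As Section 3 states, the classifier is a trained logistic regression over the extracted dimensional features. The trained coefficients and exact feature scaling are not reproduced here, so the following scikit-learn sketch, with invented toy samples, illustrates only the shape of such a model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Each row: [estimated width (m), estimated height (m), contour area];
# labels follow the object classes of Section 5.2. Values are toy data.
X_train = np.array([
    [0.50, 1.70, 0.60],   # pedestrian
    [2.10, 1.80, 2.80],   # group
    [0.55, 1.80, 0.75],   # cyclist
    [1.70, 1.60, 2.20],   # vehicle
    [0.10, 0.20, 0.01],   # noise
])
y_train = ["pedestrian", "group", "cyclist", "vehicle", "noise"]

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

print(clf.predict([[0.45, 1.75, 0.55]]))  # likely -> ['pedestrian']
```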
6. Evaluation Results
6.1. Testbed
- A: A nighttime parking area with moving objects at distances up to 25 m. Each frame contains up to three pedestrians, two groups and one cyclist, with and without a flashlight. The web camera's IR-cut filter has been removed, and the camera is equipped with an external IR emitter.
- B: Another nighttime parking area with up to six pedestrians, three groups, two cyclists (with and without a flashlight) and one vehicle present at distances up to 25 m. The camera, with its IR-cut filter removed, is equipped with an external IR emitter.
- C: A nighttime city street with up to four pedestrians, one group, two cyclists with flashlights and one vehicle at distances up to 50 m. The camera has embedded IR illumination.
- D: A daytime city street with moving objects at distances up to 50 m, in particular up to six pedestrians, three groups, three cyclists and two cars.
6.2. Classifier Precision
7. Conclusions
8. Future Work
8.1. Dynamic Error Compensation
where
- $A$ — the event that the object is detected in more than half of the frames
- $p$ — the true-positive rate for object detection in a single frame
- $q$ — the false-negative rate for object detection in a single frame
- $n$ — the number of frames in the detection sequence
- $k$ — the least number of frames in which the object is expected to be detected
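With these definitions, the probability of event $A$ follows the cumulative binomial distribution; our reading of the intended formula (with $q = 1 - p$) is:

$$
P(A) = \sum_{i=k}^{n} \binom{n}{i} \, p^{i} \, q^{\,n-i},
\qquad k = \left\lfloor n/2 \right\rfloor + 1 .
$$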
8.2. Tracking System
8.3. Segmentation Improvement
8.4. Simulation Framework
Author Contributions
Funding
Conflicts of Interest
Abbreviations
CNN | Convolutional Neural Network |
RCNN | Region-Based CNN |
ML | Machine Learning |
HOG | Histograms of Oriented Gradients |
SoC | System on Chip |
PIR | Passive Infra-Red |
GMM | Gaussian Mixture Model |
KNN | K-Nearest-Neighbor |
MOG | Mixture of Gaussians |
YOLO | You Only Look Once |
BS | Background Subtraction |
GPU | Graphics Processing Unit |
US | Ultra-Sonic |
RW | Radio Wave |
IoT | Internet of Things |
IR | Infrared |
FPS | Frames Per Second |
mAP | mean Average Precision |
References
- Kim, C.; Lee, J.; Han, T.; Kim, Y.M. A hybrid framework combining background subtraction and deep neural networks for rapid person detection. J. Big Data 2018, 5, 22.
- Siemens, E. Method for lighting e.g., road, involves switching on the lamp on detection of movement of person, and sending message to neighboring lamps through communication unit. German Patent DE102010049121 A1, 2012.
- Maolanon, P.; Sukvichai, K. Development of a Wearable Household Objects Finder and Localizer Device Using CNNs on Raspberry Pi 3. In Proceedings of the 2018 IEEE International WIE Conference on Electrical and Computer Engineering (WIECON-ECE), Chonburi, Thailand, 14–16 December 2018; pp. 25–28.
- Yang, L.W.; Su, C.Y. Low-Cost CNN Design for Intelligent Surveillance System. In Proceedings of the 2018 International Conference on System Science and Engineering (ICSSE), New Taipei, Taiwan, 28–30 June 2018; pp. 1–4.
- Speed and Speed Management; Technical Report; European Commission, Directorate General for Transport: Brussels, Belgium, 2018.
- La Center Road Standards Ordinance. Chapter 12.10. Public and Private Road Standards; Standard; La Center, WA, USA, 2009.
- Code of Practice (Part-1); Technical Report; Institute of Urban Transport: Delhi, India, 2012.
- Lim, T.S.; Loh, W.Y.; Shih, Y.S. A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-Three Old and New Classification Algorithms. Mach. Learn. 2000, 40, 203–228.
- Yavari, E.; Jou, H.; Lubecke, V.; Boric-Lubecke, O. Doppler radar sensor for occupancy monitoring. In Proceedings of the 2013 IEEE Topical Conference on Power Amplifiers for Wireless and Radio Applications, Santa Clara, CA, USA, 20 January 2013; pp. 145–147.
- Teixeira, T. A Survey of Human-Sensing: Methods for Detecting Presence, Count, Location, Track and Identity. ACM Comput. Surv. 2010, 5, 59–69.
- Panasonic Corporation. PIR Motion Sensor. EKMB/EKMC Series; Panasonic Corporation: Kadoma, Japan, 2016.
- Digital Security Controls (DSC). DSC PIR Motion Detector. Digital Bravo 3. 2005. Available online: https://objects.eanixter.com/PD487897.PDF (accessed on 1 December 2019).
- Canali, C.; De Cicco, G.; Morten, B.; Prudenziati, M.; Taroni, A. A Temperature Compensated Ultrasonic Sensor Operating in Air for Distance and Proximity Measurements. IEEE Trans. Ind. Electron. 1982, IE-29, 336–341.
- Mainetti, L.; Patrono, L.; Sergi, I. A survey on indoor positioning systems. In Proceedings of the 2014 22nd International Conference on Software, Telecommunications and Computer Networks (SoftCOM), Split, Croatia, 17–19 September 2014; pp. 111–120.
- Matveev, I.; Siemens, E.; Dugaev, D.A.; Yurchenko, A. Development of the Detection Module for a Smart Lighting System. In Proceedings of the 5th International Conference on Applied Innovations in IT (ICAIIT), Koethen, Germany, 16 March 2017; Volume 5, pp. 87–94.
- Pfeifer, T.; Elias, D. Commercial Hybrid IR/RF Local Positioning System. In Proceedings of Kommunikation in Verteilten Systemen (KiVS), Leipzig, Germany, 26–28 February 2003; pp. 1–9.
- Bai, Y.W.; Cheng, C.C.; Xie, Z.L. Use of ultrasonic signal coding and PIR sensors to enhance the sensing reliability of an embedded surveillance system. In Proceedings of the 2013 IEEE International Systems Conference (SysCon), Orlando, FL, USA, 15–18 April 2013; pp. 287–291.
- Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), Kauai, HI, USA, 8–14 December 2001; Volume 1, pp. I-511–I-518.
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893.
- Zou, Z.; Shi, Z.; Guo, Y.; Ye, J. Object Detection in 20 Years: A Survey. arXiv 2019, arXiv:1905.05055.
- Sager, D.H. Pedestrian Detection in Low Resolution Videos Using a Multi-Frame Hog-Based Detector. Int. Res. J. Comput. Sci. 2019, 6, 17.
- Van de Sande, K.E.A.; Uijlings, J.R.R.; Gevers, T.; Smeulders, A.W.M. Segmentation as selective search for object recognition. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 1879–1886.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Fernández-Caballero, A.; López, M.; Serrano-Cuerda, J. Thermal-Infrared Pedestrian ROI Extraction through Thermal and Motion Information Fusion. Sensors 2014, 14, 6666–6676.
- Bertozzi, M.; Broggi, A.; Fascioli, A.; Graf, T.; Meinecke, M. Pedestrian Detection for Driver Assistance Using Multiresolution Infrared Vision. IEEE Trans. Veh. Technol. 2004, 53, 1666–1678.
- Jeon, E.S.; Choi, J.S.; Lee, J.H.; Shin, K.Y.; Kim, Y.G.; Le, T.T.; Park, K.R. Human Detection Based on the Generation of a Background Image by Using a Far-Infrared Light Camera. Sensors 2015, 15, 6763–6788.
- Zhao, X.; He, Z.; Zhang, S.; Liang, D. Robust pedestrian detection in thermal infrared imagery using a shape distribution histogram feature and modified sparse representation classification. Pattern Recognit. 2015, 48, 1947–1960.
- Tsai, R. A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE J. Robot. Autom. 1987, 3, 323–344.
- Hardas, A.; Bade, D.; Wali, V. Moving Object Detection using Background Subtraction, Shadow Removal and Post Processing. Int. J. Comput. Appl. 2015, 975, 8887.
- Piccardi, M. Background subtraction techniques: A review. In Proceedings of the 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No. 04CH37583), The Hague, The Netherlands, 10–13 October 2004; Volume 4, pp. 3099–3104.
- Trnovszký, T.; Sýkora, P.; Hudec, R. Comparison of Background Subtraction Methods on Near Infra-Red Spectrum Video Sequences. Procedia Eng. 2017, 192, 887–892.
- Godbehere, A.B.; Matsukawa, A.; Goldberg, K. Visual tracking of human visitors under variable-lighting conditions for a responsive audio art installation. In Proceedings of the 2012 American Control Conference (ACC), Montreal, QC, Canada, 27–29 June 2012; pp. 4305–4312.
- Zivkovic, Z.; van der Heijden, F. Efficient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recognit. Lett. 2006, 27, 773–780.
- KaewTraKulPong, P.; Bowden, R. An Improved Adaptive Background Mixture Model for Real-time Tracking with Shadow Detection. In Video-Based Surveillance Systems: Computer Vision and Distributed Processing; Remagnino, P., Jones, G.A., Paragios, N., Regazzoni, C.S., Eds.; Springer: Boston, MA, USA, 2002; pp. 135–144.
- Zivkovic, Z. Improved adaptive Gaussian mixture model for background subtraction. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), Cambridge, UK, 26 August 2004; Volume 2, pp. 28–31.
- Abbott, R.G.; Williams, L.R. Multiple target tracking with lazy background subtraction and connected components analysis. Mach. Vis. Appl. 2009, 20, 93–101.
- Strang, G.; Herman, E.J. Green's Theorem. In Calculus; OpenStax: Houston, TX, USA, 2016; Volume 3, Chapter 6.
- Sturm, P. Pinhole Camera Model. In Computer Vision: A Reference Guide; Ikeuchi, K., Ed.; Springer: Boston, MA, USA, 2014; pp. 610–613.
- Ho, N.H.; Truong, P.H.; Jeong, G.M. Step-Detection and Adaptive Step-Length Estimation for Pedestrian Dead-Reckoning at Various Walking Speeds Using a Smartphone. Sensors 2016, 16, 1423.
- Buchmüller, S.; Weidmann, U. Parameters of Pedestrians, Pedestrian Traffic and Walking Facilities; ETH Zurich: Zürich, Switzerland, 2006.
- Collision Mitigation System: Pedestrian Test Target, Final Design Report; Technical Report; Mechanical Engineering Department, California Polytechnic State University: San Luis Obispo, CA, USA, 2017.
- Blocken, B.; Druenen, T.V.; Toparlar, Y.; Malizia, F.; Mannion, P.; Andrianne, T.; Marchal, T.; Maas, G.J.; Diepens, J. Aerodynamic drag in cycling pelotons: New insights by CFD simulation and wind tunnel testing. J. Wind Eng. Ind. Aerodyn. 2018, 179, 319–337.
- Fintelman, D.; Hemida, H.; Sterling, M.; Li, F.X. CFD simulations of the flow around a cyclist subjected to crosswinds. J. Wind Eng. Ind. Aerodyn. 2015, 144, 31–41.
- Nikouei, S.Y.; Chen, Y.; Song, S.; Xu, R.; Choi, B.Y.; Faughnan, T.R. Real-Time Human Detection as an Edge Service Enabled by a Lightweight CNN. In Proceedings of the 2018 IEEE International Conference on Edge Computing (EDGE), San Francisco, CA, USA, 2–7 July 2018; pp. 125–129.
- Durr, O.; Pauchard, Y.; Browarnik, D.; Axthelm, R.; Loeser, M. Deep Learning on a Raspberry Pi for Real Time Face Recognition. In Eurographics (Posters); The Eurographics Association: Geneva, Switzerland, 2015; pp. 11–12.
- Vikram, K.; Padmavathi, S. Facial parts detection using Viola Jones algorithm. In Proceedings of the 2017 4th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 6–7 January 2017; pp. 1–4.
- Noman, M.; Yousaf, M.H.; Velastin, S.A. An Optimized and Fast Scheme for Real-Time Human Detection Using Raspberry Pi. In Proceedings of the 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Gold Coast, Australia, 30 November–2 December 2016; pp. 1–7.
| Object Class | Pedestrian | Group | Cyclist | Vehicle |
|---|---|---|---|---|
| Width, m | 0.3–0.64 | 1.2–3 | 0.3–0.64 | 1.4–1.8 |
| Height, m | 1.15–2 | 1.5–2 | 1.5–1.9 | 1.4–2 |
| Depth (step length), m | 0.3–0.79 | 0.3–0.79 | 1.5–1.7 | 3.7–4.6 |
| Scene | A | B | C | D |
|---|---|---|---|---|
| Time of day | Night | Night | Night | Day |
| Description | Parking 1 | Parking 2 | City street | City street |
| Camera | Logitech C920 HD | Microsoft HD-3000 | Gadinan Full HD IP cam | Gadinan Full HD IP cam |
| Mounting height, m | 3 | 3.1 | 5 | 5 |
| Tilt angle, ° | 13 | 22 | 17 | 17 |
| Resolution | 424 × 240 | 320 × 240 | 424 × 240 | 424 × 240 |
| Focal length, mm | 3.67 | 0.73 | 3.6 | 3.6 |
| Sensor dim., mm | 4.8 × 3.6 | 0.7 × 0.52 | 3.45 × 1.94 | 3.45 × 1.94 |
| Object distance, m | up to 25 | up to 25 | up to 50 | up to 50 |
| Noise samples | 4575 | 13,724 | 2474 | 2258 |
| Pedestrians | 1364 | 6705 | 326 | 1990 |
| Groups | 108 | 473 | 218 | 928 |
| Cyclists | 335 | 134 | 183 | 730 |
| Vehicles | — | 26 | 238 | 398 |
| Board | FPS (min) | FPS (mean) | FPS (max) | CPU | SoC |
|---|---|---|---|---|---|
| BeagleBone Black | 4 | 5 | 7 | Cortex-A8 1 GHz | AM3358 |
| Orange Pi PC | 10 | 13 | 15 | 4 × Cortex-A7 1.4 GHz | Allwinner H3 |
| Orange Pi Zero H5 | 25 | 28 | 35 | 4 × Cortex-A53 1.4 GHz | Allwinner H5 |
| Raspberry Pi 3 Model B+ | 30 | 35 | 40 | 4 × Cortex-A53 1.4 GHz | Broadcom BCM2837B0 |
| Algorithm | Our Method | HOG | Viola-Jones | SSD GoogleNet | L-CNN | YOLO |
|---|---|---|---|---|---|---|
| FPS | 35 | 0.3 | 1.82 | 0.39 | 0.39 | 1.25 |
| CPU, % | 25 | 93 | 76.9 | 84.7 | 75.7 | — |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).