Geometric Recognition of Moving Objects in Monocular Rotating Imagery Using Faster R-CNN
Abstract
1. Introduction
2. Methodology
2.1. Camera Motion Rectification
2.2. Motion Segmentation
2.3. Moving Object Recognition
2.4. Geometric Observing
3. Results and Discussion
3.1. Quantitative Evaluation with Synthetic Configuration
3.2. Street View Surveillance of a Rotating PTZ Camera
3.3. Performance Evaluation of Various Networks
4. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
Configuration | Quality Index | Stable Lighting a–b | Stable Lighting b–c | Stable Lighting c–d | Stable Lighting Avg. | Lighting Change a–b | Lighting Change b–c | Lighting Change c–d | Lighting Change Avg.
---|---|---|---|---|---|---|---|---|---
Static camera | Recall | 0.78 | 0.96 | 0.75 | 0.83 | 0.65 | 0.88 | 0.79 | 0.77
Static camera | Precision | 0.90 | 0.95 | 0.99 | 0.94 | 0.95 | 0.92 | 0.93 | 0.94
Static camera | F-measure | 0.83 | 0.95 | 0.85 | 0.88 | 0.77 | 0.77 | 0.85 | 0.80
Rotating camera | Recall | 0.69 | 0.86 | 0.82 | 0.79 | 0.70 | 0.92 | 0.59 | 0.74
Rotating camera | Precision | 0.97 | 0.99 | 0.99 | 0.98 | 0.98 | 0.99 | 0.99 | 0.98
Rotating camera | F-measure | 0.81 | 0.92 | 0.89 | 0.87 | 0.82 | 0.83 | 0.74 | 0.80
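The third quality index in the motion segmentation table is unlabeled in the source; for most frame pairs its values are consistent with the F-measure, the harmonic mean of recall and precision. The sketch below assumes that definition (the function name `f_measure` is ours, not the authors') and recomputes the index for the static camera under stable lighting:

```python
def f_measure(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (F1 score); assumed definition."""
    if recall + precision == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * recall * precision / (recall + precision)


# Recall and precision for the static camera under stable lighting,
# copied from the table above (frame pairs a-b, b-c, c-d).
static_stable = {
    "a-b": (0.78, 0.90),
    "b-c": (0.96, 0.95),
    "c-d": (0.75, 0.99),
}

for pair, (r, p) in static_stable.items():
    print(pair, round(f_measure(r, p), 2))
```

This prints 0.84, 0.95 and 0.85. The small difference for a–b (0.83 in the table) is expected, because the tabulated recall and precision are themselves rounded to two decimals.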
Object | Class | Height Mean (cm) | Height Std. dev. (cm) | Velocity Mean (km/h) | Velocity Std. dev. (km/h)
---|---|---|---|---|---
Object 1 | White vehicle | 170.8 | 6.4 | 21.3 | 7.3
Object 2 | Pedestrian | 168 | 3.3 | 4.7 | 1.5
Object 3 | Blue vehicle | 180.2 | 4.3 | 27.6 | 9.9
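The street-view table summarizes each object's velocity by a mean and standard deviation over the observed frames. As a hedged illustration only, the sketch below assumes that the object's ground-plane position (in metres) has already been recovered for every frame and that the frame rate is known; it turns frame-to-frame displacement into speed in km/h and summarizes the series. The helper `speeds_kmh`, the synthetic track, and the 10 fps rate are hypothetical and are not taken from the paper.

```python
import math
import statistics


def speeds_kmh(positions_m, fps):
    """Per-interval speeds (km/h) from consecutive ground-plane positions.

    positions_m: list of (x, y) positions in metres, one per frame (assumed input).
    fps: frame rate in frames per second (assumed known).
    """
    dt = 1.0 / fps  # seconds between consecutive frames
    speeds = []
    for (x0, y0), (x1, y1) in zip(positions_m, positions_m[1:]):
        dist = math.hypot(x1 - x0, y1 - y0)  # metres travelled between frames
        speeds.append(dist / dt * 3.6)       # m/s -> km/h
    return speeds


# Hypothetical track: an object moving 0.5 m per frame, sampled at 10 fps.
track = [(0.5 * i, 0.0) for i in range(6)]
v = speeds_kmh(track, fps=10)
print(statistics.mean(v), statistics.stdev(v))  # sample standard deviation
```

The synthetic track moves 0.5 m per frame at 10 fps, i.e. 5 m/s or 18 km/h with no spread. `statistics.stdev` is the sample standard deviation; whether the paper uses the sample or the population form is not stated in this section.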
Object | Class | Height Mean (cm) | Height Std. dev. (cm) | Specification Height (cm)
---|---|---|---|---
Object 1 | White vehicle | 161.4 | 3.1 | 148
Object 3 | Blue vehicle | 174.4 | 2.6 | 171
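Comparing the mean estimated vehicle heights with the specification values reduces to a difference and a ratio. A minimal sketch, using the values from the table above (the `height_error` helper is illustrative, not the authors' code):

```python
def height_error(estimate_cm, spec_cm):
    """Absolute (cm) and relative (%) deviation of an estimate from a reference."""
    abs_err = estimate_cm - spec_cm
    return abs_err, 100.0 * abs_err / spec_cm


# Mean estimated heights and specification heights from the table above.
vehicles = {
    "White vehicle": (161.4, 148),
    "Blue vehicle": (174.4, 171),
}

for name, (est, spec) in vehicles.items():
    diff, pct = height_error(est, spec)
    print(f"{name}: {diff:+.1f} cm ({pct:+.1f}%)")
# White vehicle: +13.4 cm (+9.1%)
# Blue vehicle: +3.4 cm (+2.0%)
```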
Scene | Object ID | Faster R-CNN Height Mean (cm) | Faster R-CNN Height Std. dev. (cm) | Mask R-CNN Height Mean (cm) | Mask R-CNN Height Std. dev. (cm) | YOLOv3 Height Mean (cm) | YOLOv3 Height Std. dev. (cm)
---|---|---|---|---|---|---|---
Indoor corridor | Person 1 | 176.8 | 2.3 | 177.3 | 1.9 | 175.6 | 2.1
Indoor corridor | Person 2 | 177.5 | 3.1 | 176.1 | 2.3 | 177.9 | 3.2
Indoor corridor | Person 3 | 176.5 | 2.6 | 174.6 | 3.2 | 176.6 | 2.9
Indoor corridor | Person 4 | 173.2 | 2.5 | 169.8 | 4.8 | 174.2 | 2.3
Construction site | Person 5 | 179.7 | 1.8 | 177.2 | 4.4 | 179.2 | 2.2
Construction site | Car | 169.7 | 2.5 | 172.6 | 2.4 | 171.6 | 2.5
Construction site | Truck 1 | 193.6 | 2.7 | 194.7 | 3.2 | 203.5 | 3.3
Construction site | Truck 2 | 324.7 | 2.7 | 304.3 | 3.5 | 328 | 2.9
Sequence | Object ID | Faster R-CNN Height Mean (cm) | Faster R-CNN Height Std. dev. (cm) | Mask R-CNN Height Mean (cm) | Mask R-CNN Height Std. dev. (cm) | YOLOv3 Height Mean (cm) | YOLOv3 Height Std. dev. (cm) | KITTI True Value (cm)
---|---|---|---|---|---|---|---|---
Sequence 1 | Person 1 | 185.6 | 1.7 | 184.3 | 1.4 | 186.2 | 1.6 | 182
Sequence 2 | Person 2 | 174.1 | 2.5 | 174.6 | 3.4 | 179.5 | 2.8 | 173
Sequence 2 | Person 3 | 181.5 | 2.1 | 182.2 | 3.1 | 186.7 | 2.5 | 179
Sequence 2 | Car | 206.6 | 1.7 | 204.4 | 2.2 | n/a | n/a | 211
Sequence 2 | Bicycle 2 | 107.6 | 3.5 | 114.3 | 2.3 | 116.2 | 3.8 | 110
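Because the KITTI table includes true heights, the three detectors' mean estimates can be compared against ground truth directly. The sketch below computes a mean absolute error (MAE) per network from the tabulated means; MAE is our choice of summary metric, not necessarily the one used by the authors, and the YOLOv3 figure covers only four objects because its car result is n/a.

```python
# Mean height estimates (cm) and KITTI true values copied from the table above;
# None marks the missing (n/a) YOLOv3 result for the car.
true_cm = {"Person 1": 182, "Person 2": 173, "Person 3": 179, "Car": 211, "Bicycle 2": 110}
estimates = {
    "Faster R-CNN": {"Person 1": 185.6, "Person 2": 174.1, "Person 3": 181.5,
                     "Car": 206.6, "Bicycle 2": 107.6},
    "Mask R-CNN": {"Person 1": 184.3, "Person 2": 174.6, "Person 3": 182.2,
                   "Car": 204.4, "Bicycle 2": 114.3},
    "YOLOv3": {"Person 1": 186.2, "Person 2": 179.5, "Person 3": 186.7,
               "Car": None, "Bicycle 2": 116.2},
}


def mean_abs_error(est, truth):
    """Mean absolute error (cm) over the objects that have an estimate."""
    errs = [abs(v - truth[k]) for k, v in est.items() if v is not None]
    return sum(errs) / len(errs)


for net, est in estimates.items():
    print(f"{net}: MAE = {mean_abs_error(est, true_cm):.2f} cm")
```

With these values the MAE is about 2.8 cm for Faster R-CNN, 3.6 cm for Mask R-CNN and 6.15 cm for YOLOv3 (over four objects only).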
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Chuang, T.-Y.; Han, J.-Y.; Jhan, D.-J.; Yang, M.-D. Geometric Recognition of Moving Objects in Monocular Rotating Imagery Using Faster R-CNN. Remote Sens. 2020, 12, 1908. https://doi.org/10.3390/rs12121908