T-RexNet—A Hardware-Aware Neural Network for Real-Time Detection of Small Moving Objects
Abstract
1. Introduction
2. Tiny Moving Object Detection: State of the Art
2.1. Single-Image General-Purpose Solutions
- Multi-scale representation: high- and low-resolution feature maps stem from different levels of a feature-extraction network; after super-sampling the low-resolution maps, the features are fused either by element-wise sum (multi-scale deconvolutional single shot detector, MDSSD [17]) or by concatenation (diverse region-based CNN, DR-CNN [18]); a sketch of both fusion schemes follows this list.
- Contextual information: the network explicitly takes into account the contextual information around a candidate object. For example, ContextNet [19] applies a custom region-proposal network specifically aimed at small objects and, for each candidate region, processes an enlarged region to capture context.
- Super resolution: generative adversarial networks generate a higher-resolution version of the candidate object, thus improving detection accuracy for small objects (perceptual generative adversarial networks, PGAN [20]).
- Mixed methods: features with distinct scales are extracted from different layers of a convolutional neural network, concatenated together, and then used to generate a series of pyramid features [21].
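The two fusion mechanisms are easy to state precisely. The sketch below (PyTorch, with hypothetical tensor shapes) super-samples a coarse feature map to the resolution of a finer one and then fuses the two by element-wise sum and by concatenation; it illustrates the general mechanism rather than the exact layer choices of MDSSD [17] or DR-CNN [18].

```python
import torch
import torch.nn.functional as F

# Hypothetical feature maps from two depths of a backbone network.
high_res = torch.randn(1, 256, 64, 64)  # early layer: fine spatial detail
low_res = torch.randn(1, 256, 16, 16)   # deep layer: coarse, semantic features

# Super-sample the low-resolution map to the high-resolution grid.
up = F.interpolate(low_res, size=high_res.shape[-2:],
                   mode="bilinear", align_corners=False)

fused_sum = high_res + up                     # element-wise sum (MDSSD-style)
fused_cat = torch.cat([high_res, up], dim=1)  # concatenation (DR-CNN-style)
print(fused_sum.shape, fused_cat.shape)       # (1,256,64,64), (1,512,64,64)
```

Note that concatenation doubles the channel count, so it is usually followed by a convolution that restores the original width; the element-wise sum keeps the width unchanged.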
2.2. Background Subtraction and Frame-Difference Solutions
2.3. Spatio-Temporal Convolutional Neural Networks (CNNs)
2.4. Summary of Contribution
3. Methodology
3.1. Step 1: Extracting Motion-Augmented Images
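As a rough illustration only, a frame-difference preprocessing stage of this kind can be sketched as below (OpenCV/NumPy), stacking the current grayscale frame with two absolute frame-difference channels. The specific channel layout is an assumption made for illustration, not a verbatim reproduction of T-RexNet's input stage.

```python
import cv2
import numpy as np

def motion_augmented_input(frame_prev2, frame_prev, frame_curr):
    """Build a 3-channel motion-augmented image from three consecutive frames.

    Illustrative assumption: one appearance channel (the current grayscale
    frame) plus two absolute frame-difference channels carrying short-term
    motion cues.
    """
    g0 = cv2.cvtColor(frame_prev2, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame_prev, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(frame_curr, cv2.COLOR_BGR2GRAY)
    d21 = cv2.absdiff(g2, g1)  # motion between current and previous frame
    d10 = cv2.absdiff(g1, g0)  # motion one step earlier
    return np.dstack([g2, d21, d10])  # H x W x 3, ready as CNN input
```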
3.2. Step 2: Feature Extraction
3.3. Step 3: Object Detection
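As background for readers unfamiliar with single-shot detection heads, the snippet below shows generic SSD-style anchor-box decoding in the spirit of [7]; it is an illustrative sketch, not the exact head used by T-RexNet.

```python
import torch

def decode_boxes(anchors, offsets, variances=(0.1, 0.2)):
    """Generic SSD-style anchor decoding (illustrative, not T-RexNet-specific).

    anchors: (N, 4) as (cx, cy, w, h); offsets: (N, 4) regressed by the head.
    Returns (N, 4) boxes as (xmin, ymin, xmax, ymax).
    """
    # Shift the anchor center by the predicted offsets, scaled by anchor size.
    cxcy = anchors[:, :2] + offsets[:, :2] * variances[0] * anchors[:, 2:]
    # Rescale the anchor width/height by the exponentiated size offsets.
    wh = anchors[:, 2:] * torch.exp(offsets[:, 2:] * variances[1])
    return torch.cat([cxcy - wh / 2, cxcy + wh / 2], dim=1)
```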
4. Experimental Setup
4.1. Scenarios
- (1) the research community has proved their effectiveness in object detection and their suitability for implementation on embedded devices; the experiments focused on each method's ability to detect small moving objects;
- (2) the various methods had been targeted to their specific test scenarios; hence, comparisons with T-RexNet could highlight the latter's balance between accuracy and speed.
Scenario | # of Obj. | Obj. Size | Obj. Speed | Im. Size (px)
---|---|---|---|---
Aerial surv. | High | Small | Mid | 2000
Civilian surv. | Medium | Med. & small | Low | 512
Fast obj. track. | Single | Small | High | 300
4.1.1. Aerial Surveillance
4.1.2. Civilian Surveillance
4.1.3. Tennis Ball Tracking
4.2. Deployment
5. Results
5.1. Aerial Surveillance
5.2. Civilian Surveillance
5.3. Tennis Ball Tracking
5.4. Deployment of T-RexNet on the Jetson Nano
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Hyperparameters and Training Details
References
1. Mhalla, A.; Chateau, T.; Gazzah, S.; Amara, N.E.B. An embedded computer-vision system for multi-object detection in traffic surveillance. IEEE Trans. Intell. Transp. Syst. 2018, 20, 4006–4018.
2. Ragusa, E.; Gianoglio, C.; Zunino, R.; Gastaldo, P. Image Polarity Detection on Resource-Constrained Devices. IEEE Intell. Syst. 2020, 35, 50–57.
3. Huang, Y.C.; Liao, I.N.; Chen, C.H.; İk, T.U.; Peng, W.C. TrackNet: A Deep Learning Network for Tracking High-Speed and Tiny Objects in Sports Applications. In Proceedings of the 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Taipei, Taiwan, 18–21 September 2019; pp. 1–8.
4. Hawk-Eye Innovations Ltd. Hawk-Eye System. Available online: http://www.hawkeyeinnovations.com (accessed on 8 June 2020).
5. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. arXiv 2015, arXiv:1506.01497.
6. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
7. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
8. STMicroelectronics N.V. STM32 32-bit Arm Cortex MCUs. Available online: https://www.st.com/en/microcontrollers-microprocessors/stm32-32-bit-arm-cortex-mcus.html (accessed on 8 June 2020).
9. Intel Corporation. Intel Movidius Neural Compute Stick. Available online: https://software.intel.com/content/www/us/en/develop/articles/intel-movidius-neural-compute-stick.html (accessed on 8 June 2020).
10. NVIDIA Corporation. NVIDIA Autonomous Machines. Available online: https://www.nvidia.com/autonomous-machines/embedded-systems/ (accessed on 8 June 2020).
11. Nair, D.; Pakdaman, A.; Plöger, P.G. Performance Evaluation of Low-Cost Machine Vision Cameras for Image-Based Grasp Verification. arXiv 2020, arXiv:2003.10167.
12. Liu, Y.; Sun, P.; Wergeles, N.; Shang, Y. A survey and performance evaluation of deep learning methods for small object detection. Expert Syst. Appl. 2020.
13. Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object detection via region-based fully convolutional networks. arXiv 2016, arXiv:1605.06409.
14. Fu, C.Y.; Liu, W.; Ranga, A.; Tyagi, A.; Berg, A.C. DSSD: Deconvolutional single shot detector. arXiv 2017, arXiv:1701.06659.
15. Liu, L.; Ouyang, W.; Wang, X.; Fieguth, P.; Chen, J.; Liu, X.; Pietikäinen, M. Deep learning for generic object detection: A survey. Int. J. Comput. Vis. 2020, 128, 261–318.
16. Chen, G.; Wang, H.; Chen, K.; Li, Z.; Song, Z.; Liu, Y.; Chen, W.; Knoll, A. A survey of the four pillars for small object detection: Multiscale representation, contextual information, super-resolution, and region proposal. IEEE Trans. Syst. Man Cybern. Syst. 2020.
17. Cui, L.; Ma, R.; Lv, P.; Jiang, X.; Gao, Z.; Zhou, B.; Xu, M. MDSSD: Multi-scale deconvolutional single shot detector for small objects. arXiv 2018, arXiv:1805.07009.
18. Zhang, M.; Li, W.; Du, Q. Diverse region-based CNN for hyperspectral image classification. IEEE Trans. Image Process. 2018, 27, 2623–2634.
19. Chen, C.; Liu, M.Y.; Tuzel, O.; Xiao, J. R-CNN for small object detection. In Asian Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016; pp. 214–230.
20. Li, J.; Liang, X.; Wei, Y.; Xu, T.; Feng, J.; Yan, S. Perceptual generative adversarial networks for small object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–27 July 2017; pp. 1222–1230.
21. Lin, H.; Zhou, J.; Gan, Y.; Vong, C.M.; Liu, Q. Novel up-scale feature aggregation for object detection in aerial images. Neurocomputing 2020, 411, 364–374.
22. Joshi, K.A.; Thakore, D.G. A survey on moving object detection and tracking in video surveillance system. Int. J. Soft Comput. Eng. 2012, 2, 44–48.
23. KaewTraKulPong, P.; Bowden, R. An improved adaptive background mixture model for real-time tracking with shadow detection. In Video-Based Surveillance Systems; Springer: Berlin/Heidelberg, Germany, 2002; pp. 135–144.
24. Piccardi, M. Background subtraction techniques: A review. In Proceedings of the 2004 IEEE International Conference on Systems, Man and Cybernetics, The Hague, The Netherlands, 10–13 October 2004; Volume 4, pp. 3099–3104.
25. Váraljai, G.; Szénási, S. Projectile Detection and Avoidance using Computer Vision. In Proceedings of the 2020 IEEE 20th International Symposium on Computational Intelligence and Informatics (CINTI), Budapest, Hungary, 5–7 November 2020; pp. 000157–000160.
26. Rakibe, R.S.; Patil, B.D. Background subtraction algorithm based human motion detection. Int. J. Sci. Res. Publ. 2013, 3, 2250–3153.
27. Horprasert, T.; Harwood, D.; Davis, L.S. A statistical approach for real-time robust background subtraction and shadow detection. IEEE ICCV 1999, 99, 1–19.
28. Kim, Z. Real time object tracking based on dynamic feature grouping with background subtraction. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8.
29. Siam, M.; Mahgoub, H.; Zahran, M.; Yogamani, S.; Jagersand, M.; El-Sallab, A. MODNet: Moving object detection network with motion and appearance for autonomous driving. arXiv 2017, arXiv:1709.04821.
30. Qiu, Z.; Yao, T.; Mei, T. Learning spatio-temporal representation with pseudo-3D residual networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5533–5541.
31. LaLonde, R.; Zhang, D.; Shah, M. ClusterNet: Detecting small objects in large scenes by exploiting spatio-temporal information. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4003–4012.
32. Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2014; pp. 818–833.
33. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520.
34. Sommer, L.W.; Teutsch, M.; Schuchert, T.; Beyerer, J. A survey on moving object detection for wide area motion imagery. In Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, 7–10 March 2016; pp. 1–9.
35. Liu, Y.; Gadepalli, K.; Norouzi, M.; Dahl, G.E.; Kohlberger, T.; Boyko, A.; Venugopalan, S.; Timofeev, A.; Nelson, P.Q.; Corrado, G.S.; et al. Detecting cancer metastases on gigapixel pathology images. arXiv 2017, arXiv:1703.02442.
36. Wang, M.; Li, W.; Wang, X. Transferring a generic pedestrian detector towards specific scenes. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3274–3281.
37. NVIDIA Corporation. Jetson Nano. Available online: https://developer.nvidia.com/embedded/jetson-nano (accessed on 8 June 2020).
38. Ragusa, E.; Apicella, T.; Gianoglio, C.; Zunino, R.; Gastaldo, P. Design and deployment of an image polarity detector with visual attention. Cogn. Comput. 2021.
39. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
NVIDIA Jetson Nano specifications:

Parameter | Value
---|---
AI Performance | 472 GFLOPS
GPU | 128-core NVIDIA Maxwell GPU
CPU | Quad-core ARM Cortex-A57 MPCore
Memory | 4 GB 64-bit LPDDR4, 25.6 GB/s
Storage | 16 GB eMMC 5.1
Power | 5 W / 10 W
Mechanical | 69.6 mm × 45 mm, 260-pin SO-DIMM connector
Method | Aerial Surveillance
---|---
T-RexNet | 0.91 (3)
ClusterNet | 0.95 * (0.3 *)
Median BG + N | 0.89 *
Civilian Surveillance | Normal | Small
---|---|---
T-RexNet | 0.77 (44) | 0.79 (44)
Faster R-CNN | 0.69 (23) | 0.5 (23)
SSD512 | 0.73 (41) | 0.59 (41)
Tennis Ball Tracking | A | B | C
---|---|---|---
T-RexNet | 0.78 (47) | 0.84 (47) | 0.67 (47)
SSD300 | 0.34 (43) | <0.2 (43) | 0.23 (43)
TrackNet | >0.84 * (2.2) | >0.84 * (2.2) | >0.84 * (2.2)
Power Mode | 512 × 512, TRT (ms) | 512 × 512, TF (ms) | 300 × 300, TRT (ms) | 300 × 300, TF (ms)
---|---|---|---|---
Max-N | 70.28 | 437.15 | 65.45 | 431.28
5 W | 108.77 | 616.74 | 98.28 | 661.14
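Here TRT and TF denote, respectively, TensorRT-optimized and plain TensorFlow inference. A minimal sketch of how such a comparison can be reproduced with TF-TRT follows; the SavedModel path and input keyword are hypothetical placeholders, since the paper does not publish its conversion scripts.

```python
import time
import numpy as np
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Convert a SavedModel export to a TensorRT-optimized graph (TF-TRT).
converter = trt.TrtGraphConverterV2(input_saved_model_dir="trexnet_saved_model")
converter.convert()           # replaces supported subgraphs with TensorRT ops
converter.save("trexnet_trt")

# Time single-image inference, mirroring the per-frame latencies in the table.
# On the Jetson Nano the power mode is switched outside Python, e.g.,
# `sudo nvpmodel -m 0` (Max-N) vs. `sudo nvpmodel -m 1` (5 W profile).
model = tf.saved_model.load("trexnet_trt")
infer = model.signatures["serving_default"]
x = tf.constant(np.random.rand(1, 512, 512, 3).astype(np.float32))

infer(input_1=x)              # warm-up; the keyword name depends on the export
t0 = time.perf_counter()
for _ in range(100):
    infer(input_1=x)
print("mean latency: %.2f ms" % ((time.perf_counter() - t0) / 100 * 1000))
```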
Share and Cite
Canepa, A.; Ragusa, E.; Zunino, R.; Gastaldo, P. T-RexNet—A Hardware-Aware Neural Network for Real-Time Detection of Small Moving Objects. Sensors 2021, 21, 1252. https://doi.org/10.3390/s21041252