Open Access Article
Appl. Sci. 2017, 7(8), 826; doi:10.3390/app7080826

Optimized Deep Neural Networks for Real-Time Object Classification on Embedded GPUs

1 Dipartimento di Automatica e Informatica (DAUIN), Politecnico di Torino, 10129 Turin, Italy
2 Joint Open Lab, Telecom Italia Mobile (TIM), 10129 Turin, Italy
This paper is an extended version of our paper published as Syed Tahir Hussain Rizvi, Gianpiero Cabodi and Gianluca Francini, "GPU-only Unified ConvMM Layer for Neural Classifiers", in Proceedings of the 2017 International Conference on Control, Decision and Information Technologies (CoDIT'17), Barcelona, Spain, 2017.
* Author to whom correspondence should be addressed.
Received: 19 July 2017 / Revised: 7 August 2017 / Accepted: 10 August 2017 / Published: 11 August 2017

Abstract

Convolution is the most computationally intensive task of a Convolutional Neural Network (CNN), requiring substantial memory and computational power. Several approaches exist to compute convolution and to reduce its computational complexity. In this paper, a matrix multiplication-based convolution (ConvMM) approach is fully parallelized using the concurrent resources of a GPU (Graphics Processing Unit) and optimized, considerably improving the performance of image classifiers and making them applicable to real-time embedded applications. The flow of this CUDA (Compute Unified Device Architecture)-based scheme is optimized using unified memory and hardware-dependent acceleration of matrix multiplication. The proposed flow is evaluated on two embedded platforms: first on an Nvidia Jetson TX1 board and then on the Tegra K1 GPU of an Nvidia Shield K1 tablet. The performance of this optimized and accelerated convolutional layer is compared with its sequential and heterogeneous versions. Results show that the proposed scheme significantly improves energy efficiency, storage requirements and inference performance. In particular, the proposed scheme on embedded GPUs is hundreds of times faster than the sequential version and delivers tens of times higher performance than the heterogeneous approach.
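The core recipe described in the abstract, lowering convolution to a matrix multiplication, keeping data in CUDA unified memory, and delegating the multiplication to cuBLAS, can be illustrated with a minimal sketch. This is not the authors' GPU-only ConvMM layer: it assumes a single-channel input, a 3x3 filter, stride 1 and no padding, and all sizes, values and names are illustrative.

// Minimal ConvMM-style sketch (illustrative only): im2col lowering + cuBLAS GEMM
// on CUDA unified memory. Single channel, 3x3 filter, stride 1, no padding.
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// One thread per output pixel: copy its KxK receptive field into one column
// of the lowered matrix col (K*K rows, outH*outW columns, row-major).
__global__ void im2col(const float* in, float* col,
                       int H, int W, int K, int outH, int outW)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= outH * outW) return;
    int oy = idx / outW, ox = idx % outW;
    for (int ky = 0; ky < K; ++ky)
        for (int kx = 0; kx < K; ++kx)
            col[(ky * K + kx) * outH * outW + idx] = in[(oy + ky) * W + (ox + kx)];
}

int main()
{
    const int H = 32, W = 32, K = 3, F = 8;        // hypothetical sizes
    const int outH = H - K + 1, outW = W - K + 1;
    const int P = outH * outW;                     // number of output pixels

    float *in, *col, *filt, *out;
    // Unified memory: one pointer valid on host and device, no explicit copies.
    cudaMallocManaged(&in,   H * W * sizeof(float));
    cudaMallocManaged(&col,  K * K * P * sizeof(float));
    cudaMallocManaged(&filt, F * K * K * sizeof(float));
    cudaMallocManaged(&out,  F * P * sizeof(float));

    for (int i = 0; i < H * W; ++i)     in[i]   = 1.0f;   // dummy image
    for (int i = 0; i < F * K * K; ++i) filt[i] = 0.1f;   // dummy weights

    im2col<<<(P + 255) / 256, 256>>>(in, col, H, W, K, outH, outW);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    // Row-major out(F x P) = filt(F x K*K) * col(K*K x P), expressed through
    // cuBLAS's column-major interface as out^T = col^T * filt^T.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                P, F, K * K,
                &alpha, col, P, filt, K * K,
                &beta, out, P);

    cudaDeviceSynchronize();   // kernel + GEMM done; managed memory now visible to the CPU
    printf("out[0] = %.2f (expected %.2f)\n", out[0], 0.1f * K * K);

    cublasDestroy(handle);
    cudaFree(in); cudaFree(col); cudaFree(filt); cudaFree(out);
    return 0;
}

Compiled with something like nvcc -lcublas, the sketch prints the first output activation. A real layer would add multi-channel inputs, bias, batching and the hardware-dependent tuning the paper evaluates on the Jetson TX1 and Tegra K1.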
Keywords: concurrent computing; general purpose GPU; unified memory; convolutional neural networks; heterogeneous; matrix multiplication; CUDA basic linear algebra subroutines (cuBLAS); embedded platform
This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.



Share & Cite This Article

MDPI and ACS Style

Rizvi, S.T.H.; Cabodi, G.; Francini, G. Optimized Deep Neural Networks for Real-Time Object Classification on Embedded GPUs. Appl. Sci. 2017, 7, 826.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
