Article

Accelerating Deep Learning Inference: A Comparative Analysis of Modern Acceleration Frameworks

Department of Computer Science, Texas State University, San Marcos, TX 78666, USA
* Author to whom correspondence should be addressed.
Electronics 2025, 14(15), 2977; https://doi.org/10.3390/electronics14152977
Submission received: 16 June 2025 / Revised: 19 July 2025 / Accepted: 21 July 2025 / Published: 25 July 2025
(This article belongs to the Special Issue Hardware Acceleration for Machine Learning)

Abstract

Deep learning (DL) continues to play a pivotal role in a wide range of intelligent systems, including autonomous machines, smart surveillance, industrial automation, and portable healthcare technologies. These applications often demand low-latency inference and efficient resource utilization, especially when deployed on embedded or edge devices with limited computational capacity. As DL models become increasingly complex, selecting the right inference framework is essential to meeting performance and deployment goals. In this work, we conduct a comprehensive comparison of five widely adopted inference frameworks: PyTorch, ONNX Runtime, TensorRT, Apache TVM, and JAX. All experiments are performed on the NVIDIA Jetson AGX Orin platform, a high-performance computing solution tailored for edge artificial intelligence workloads. The evaluation considers several key performance metrics, including inference accuracy, inference time, throughput, memory usage, and power consumption. Each framework is tested using a wide range of convolutional and transformer models and analyzed in terms of deployment complexity, runtime efficiency, and hardware utilization. Our results show that certain frameworks offer superior inference speed and throughput, while others provide advantages in flexibility, portability, or ease of integration. We also observe meaningful differences in how each framework manages system memory and power under various load conditions. This study offers practical insights into the trade-offs associated with deploying DL inference on resource-constrained hardware.
Keywords: deep learning; PyTorch; real-time inference; TensorRT; JAX; Apache TVM
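As a rough illustration of how two of the metrics named in the abstract, per-inference latency and throughput, are commonly measured, the following stdlib-only Python sketch times repeated calls to a stand-in inference function. Here `dummy_model` and `benchmark` are hypothetical names for illustration; in practice the callable would wrap a PyTorch, ONNX Runtime, TensorRT, TVM, or JAX inference call, with device synchronization added for GPU backends.

```python
import time
import statistics

def benchmark(infer, n_warmup=10, n_runs=100):
    """Time repeated inference calls and report latency/throughput.

    infer: a zero-argument callable that runs one inference.
    Returns (mean latency in ms, throughput in inferences/sec).
    """
    # Warm-up runs: exclude one-time costs such as JIT compilation,
    # kernel autotuning, and cache population from the measurement.
    for _ in range(n_warmup):
        infer()
    # Timed runs using a monotonic high-resolution clock.
    latencies = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        infer()
        latencies.append(time.perf_counter() - t0)
    mean_s = statistics.mean(latencies)
    return mean_s * 1e3, 1.0 / mean_s

# Stand-in workload in place of a real model call (hypothetical).
def dummy_model():
    sum(i * i for i in range(10_000))

mean_ms, throughput = benchmark(dummy_model)
print(f"mean latency: {mean_ms:.3f} ms, throughput: {throughput:.1f} inf/s")
```

Note that on asynchronous GPU runtimes the host must synchronize before reading the clock (e.g., `torch.cuda.synchronize()` in PyTorch), otherwise the timer captures only kernel launch overhead rather than end-to-end inference time.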

Share and Cite

MDPI and ACS Style

Ratul, I.J.; Zhou, Y.; Yang, K. Accelerating Deep Learning Inference: A Comparative Analysis of Modern Acceleration Frameworks. Electronics 2025, 14, 2977. https://doi.org/10.3390/electronics14152977