Towards Sustainable AI: Benchmarking Energy Efficiency of Deep Neural Networks for Resource-Constrained Edge Devices

Qamar, Rohail; Asif, Raheela; Jameel, Syed Muslim

doi:10.3390/info17040380

by Rohail Qamar ¹,*, Raheela Asif ¹ and Syed Muslim Jameel ²

¹ Department of Computer Science & Information Technology, NED University of Engineering & Technology, Karachi 75270, Pakistan

² JANUS Research Centre, Atlantic Technological University, F92 N8H2 Galway, Ireland

* Author to whom correspondence should be addressed.

Information 2026, 17(4), 380; https://doi.org/10.3390/info17040380

Submission received: 20 February 2026 / Revised: 1 April 2026 / Accepted: 5 April 2026 / Published: 17 April 2026

(This article belongs to the Special Issue Machine Learning for Predictive Analytics: Models, Applications, and Challenges)

Abstract

Deep learning models represent one of the most advanced and effective approaches in predictive modeling. Their hierarchical architectures enable the extraction of complex, non-linear feature relationships and the identification of latent patterns within data, making them highly suitable for tasks involving high-dimensional or unstructured inputs. However, these models are computationally demanding, requiring significant processing resources and time. Furthermore, their predictive performance is largely contingent upon the availability of large-scale datasets. In this study, a Deep Green Framework is employed to benchmark models on two computer vision tasks, using CIFAR-10 and CIFAR-100 for image classification. Fifteen convolutional neural network (CNN) variants, categorized as light-weight or heavy-weight, are trained on these two datasets and compared on energy footprint, time, memory usage, Top-1 accuracy, Top-3 accuracy, model size, and number of model parameters. The study highlights that MobileNetV3-Small produces the best outcomes among the trained models, with low task latency and high efficiency, making it highly suitable for edge environments where resources are scarce.

Keywords:

sustainable AI; deep learning; energy efficiency; computational efficiency; model benchmarking; edge ML; predictive modelling; lightweight neural networks; explainable AI

1. Introduction

The IT sector produced between 2% and 6% of worldwide CO₂ emissions during 2020, and scientists predict its share could increase to 20% by 2030 [1]. The International Energy Agency (IEA) reported that data centers across the world used 200 terawatt-hours (TWh) of electricity during 2018, which represented 1% of global electricity consumption and produced 0.3% of worldwide carbon dioxide emissions [2]. Projections indicate that this carbon footprint will grow to between two and nine times its current level in the following decade, while electricity demand will exceed 974 TWh by 2030 [3,4]. The energy consumption of cryptocurrencies has also become a major problem: as of July 2021, mining operations in dedicated facilities with specialized equipment used 70 TWh/year [5]. Given increasing global computational needs, the large carbon emissions from computing activities require both personal and societal action to make computing environmentally friendly.

Artificial Intelligence (AI) is a computer science field that creates autonomous systems to perform work that requires human mental processing. Machine learning (ML) operates as a part of AI, which enables models to learn patterns from data to generate predictions through automated training processes. Deep learning (DL) functions as a machine learning subfield that uses multi-layered artificial neural networks to extract data from complex information structures. DL has proven itself highly effective in performing tasks including computer vision (CV), speech recognition, Natural Language Processing (NLP), and additional applications.

Machine learning and deep learning are two types of AI models. ML models are often linear or tree-based, whereas DL models are highly complex, non-linear, multi-layered architectures. Generally, a DL task is either training or inference. The edge environment has scarce resources in terms of computing (CPU and memory), permanent storage, and network communication, while deep learning models are typically hardware-intensive. Therefore, the merger of edge computing and deep learning is constrained by both the edge environment and the DL models. Table 1 summarizes the requirements of Edge DL and their categories. For these models to work properly, significant computing and memory resources are needed. Large, complex datasets and intricate architectures can make training DL models tedious and time-consuming, demanding high-performance hardware such as Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs). Furthermore, considerable memory space is needed for the processing and storage of large datasets. Another component of resource demand is energy usage: training just one AI model can produce as much carbon dioxide as five cars emit throughout their entire lifetimes [6]. Energy efficiency is fundamentally tied to computational capability on edge devices. Whereas cloud systems have access to effectively unlimited power supplies, edge devices often operate on batteries with strict energy budgets [7]. The computational complexity of convolutional neural networks (CNNs) makes them major power consumers, and this energy constraint effectively limits which computations can be performed [8].

Despite the ongoing man-made climate emergency, there is a general lack of awareness regarding the consequences of our actions on carbon emissions and global warming. Greenhouse gas emissions have triggered the melting of ice caps and permafrost, which is direct evidence of increasing worldwide temperatures. The entire marine ecosystem faces a major threat from ocean acidification at the same time that biodiversity worldwide continues to decrease at a fast rate. While research holds promise in developing technologies to combat global warming, one cannot ignore the environmental impact of our work contributing to more carbon emissions as a result of heavy computations.

Sustainability refers to the concept of meeting present needs without compromising the ability of future generations to meet their own needs. It involves finding a balance between social, environmental, and economic considerations to ensure long-term well-being and harmony. Sustainable AI refers to the development and deployment of AI systems that are environmentally friendly and contribute to long-term sustainability. It involves considering the environmental, social, and economic impacts of AI technologies throughout their lifecycle. Sustainable AI is classified into the following two categories: Sustainability of AI and AI for Sustainability. Sustainability of AI deals with the extent to which the AI models are sustainable for the environment, e.g., reducing carbon footprints resulting from massive computations during model training/tuning. AI for Sustainability is a branch of sustainable AI making the environment sustainable by using AI models, e.g., AI4Good [9].

It is important to note that one Edge DL solution may not satisfy all the requirements. The priority of the requirements also depends on the Edge DL application. For instance, in self-driving vehicles, the low task latency requirement could be more critical than other requirements such as energy efficiency and cost efficiency [10].

1.1. Motivation and Challenges

The motivation for sustainable AI arises from the potential environmental impacts associated with resource-intensive models and the growing reliance on AI technologies. By evaluating the energy efficiency of a model in its training and inference phases, one can harness the benefits of AI while minimizing negative environmental consequences and ensuring a sustainable future. This research aims to develop a framework that accurately measures the energy consumption of the GPU, CPU, and DRAM using the NVIDIA Management Library (NVML) and the Running Average Power Limit (RAPL) interface. The study computes computational efficiency based on the energy footprints of 15 CNN models. These models, categorized as light-weight or heavy-weight, are tailored for computer vision tasks based on the CIFAR-10 and CIFAR-100 datasets. The study also analyzes the correlation among accuracy, energy consumption, memory utilization, and task latency, where task latency refers to the time in seconds taken by a DL model to complete a task. This comprehensive approach aims to align AI with sustainability goals, a relatively new and important area of research. The second reason for developing this framework is to compare the accuracy of different DL models with respect to energy efficiency. Although significant efforts have been made to develop energy-aware tools for deep learning models, such as power and carbon consumption trackers, there is a lack of insight into more comprehensive metrics of performance, resource usage, and environmental impact. This creates a critical gap in enabling comprehensive and reproducible evaluation of deep learning models, particularly for resource-constrained and sustainability-aware deployments.

1.2. Novelty

To address these limitations, the present study proposes a Deep Green Framework (DGF), which generates a unified Model Efficiency and Effectiveness Report (MEER) by integrating predictive performance, computational efficiency, and resource utilization to recommend energy-efficient AI models, especially under resource-constrained environments. Existing tools lack a unified end-to-end framework that can handle model performance, energy consumption, fine-grained resource utilization, and carbon footprint for informed and sustainability-aware model selection. DGF is designed to address these deep learning requirements.

2. Measuring Energy in Deep Hierarchical Network

The primary aim of this study is to understand the energy efficiency of different deep learning models. This might seem like a trivial task, but it is not as simple as it sounds. To properly compare energy efficiency across DL models, it is essential to obtain various comparable implementations with a good representation of different problems/solutions. With this in mind, the study begins by addressing the following research questions:

RQ1: Can we compare the energy efficiency of deep learning models during training and inference? CNNs are well known for their exceptional ability to process and analyze visual data, which makes it possible to obtain a large, comparable, representative, and diverse set of computer vision datasets. This allows data scientists and researchers to measure, analyze, and compare the energy consumption and energy efficiency of different DL models (Energy = Power × Time).
RQ2: Is the faster model in terms of training and inference always the greenest (most energy-efficient) and most computationally efficient model? The study examines whether faster models are inherently more energy-efficient and computationally efficient. The proposed framework gives data scientists the ability to analyze the relationship between CNN model execution speed and both energy consumption and architectural design complexity.
RQ3: How does memory usage relate to energy consumption? Understanding how memory usage impacts efficiency facilitates effective memory management where computational efficiency is a concern, especially under resource-constrained environments.
RQ4: How does overall energy consumption relate to the resulting accuracy of the model? The study reveals a weak relationship between training and inference energy and the effectiveness of the model.
RQ5: Can a CNN model be automatically identified for an edge environment by considering carbon footprint, energy efficiency, accuracy, task latency, and memory usage? The study analyzed trade-offs between carbon and energy consumption, task latency, and memory usage of different CNN architectures and found notable differences in how these models utilize computational resources. The study provides information on which CNN models are best suited for specific scenarios such as resource-constrained environments.
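The relation Energy = Power × Time quoted under RQ1 can be illustrated with a minimal sketch. It assumes that power is sampled at discrete timestamps and integrates the samples with the trapezoidal rule; the function name `energy_joules` and the sample values are hypothetical and are not taken from the framework itself.

```python
def energy_joules(timestamps_s, power_w):
    """Integrate sampled power (watts) over time (seconds) with the trapezoidal rule."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have the same length")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        # Average power over the interval multiplied by its duration.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


# Hypothetical samples: a constant 100 W draw over 10 s.
print(energy_joules([0.0, 5.0, 10.0], [100.0, 100.0, 100.0]))  # 1000.0
```

With constant power the integral reduces to the simple product of power and time; the trapezoidal form only matters when the draw varies between samples.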

3. Related Work

Pinto, Gustavo, et al. analyzed a sample of 325 questions and 558 answers from more than 800 users [11]. The questions related to software energy consumption are more interesting but also have a lower success rate of answers. Code Design received the most attention and had the highest success rate, while measurement had the lowest success rate and the most views.

Pang et al. explore whether programmers possess sufficient knowledge about software energy consumption [12]. The researchers surveyed over 100 programmers and found that most had limited knowledge of energy efficiency and needed to be made aware of best practices for reducing energy consumption in software. This highlights the need for improved training and education on energy consumption and efficiency for programmers.

The authors Ardito et al. defined a taxonomy based on the time axis and space axis [13]. IT is responsible for limited percentages of energy consumption and emissions. However, an analysis of future trends shows that carbon footprint emissions and consumption tend to increase. In the spatial dimension, data centers and PCs/smartphones should be the focus for energy consumption reduction during the usage phase. In the time dimension, efforts to reduce energy consumption should concentrate on the manufacturing phase and increasing the duration of the usage phase. They proposed some guidelines for reducing energy consumption as follows: design-efficient UI, event-based programming, low-level programming to avoid byte code, reduced data redundancy, and energy profiling tools.

The accelerating expansion of software and IT infrastructures has led to a substantial rise in global energy consumption, with data centers expected to account for 10% of global electricity use by 2030. Furthermore, the Internet, telecommunications, and embedded devices could reach one-third of global demand for energy consumption. Implementing Green IT and energy-efficient software development practices is therefore imperative to mitigate environmental and climate impacts [14].

Pereira et al. validate the robustness of their rankings by comparing them to real-world implementations found in the Rosetta Code repository (https://rosettacode.org/wiki/Rosetta_Code, accessed on 12 December 2025), which confirms their reliability with small variations [15]. The researchers conducted rigorous evaluations on 10 programming problems sourced from the Computer Language Benchmark Game repository (https://benchmarksgame-team.pages.debian.net/benchmarksgame/index.html, accessed on 12 December 2025), implemented in 27 different languages. The team created an organized system to evaluate and analyze energy usage, execution time, and memory usage. The study produced interesting findings, showing that slower languages can sometimes require less energy and that memory usage directly affects the amount of energy needed. This research provides valuable guidance for software engineers seeking energy-efficient language choices.

Georgiou et al. examined the Energy Delay Product (EDP) of 25 Rosetta Code repository tasks implemented in 14 programming languages [16]. The study revealed that compiled languages outperformed interpreted ones for most tasks, with C, C#, and JavaScript being the best performing. The study used a power analyzer called Watts Up Pro (https://github.com/pyrovski/watts-up, accessed on 14 December 2025) to measure the energy consumption of each task.

The research explored the importance of empowering computer programmers to make energy-aware decisions. Two strategies were proposed to minimize energy consumption: application-level and hardware-level. Application-level approaches include data access patterns, data organization, data precision, and I/O strategies. A hardware-level management technique called Dynamic Voltage and Frequency Scaling (DVFS) allows CPU voltage and frequency to be adjusted [17]. Tests were made on Java (version 8) programs using a tool called jRAPL (https://github.com/kliu20/jRAPL, accessed on 14 December 2025).

A preliminary comparison of the energy efficiency of different implementations of the Towers of Hanoi problem was conducted using POWERAPI (beta version). The results show that the recursive version is more energy-efficient than the iterative version, and that the choice of programming language can also affect energy consumption significantly [18]. Another study compared two sorting algorithms, Bubble Sort and Heap Sort. The results show that Bubble Sort, which has a complexity of O(n²), consumes more energy than Heap Sort, which has a complexity of O(n log n). Higher performance leads to higher energy efficiency, as faster algorithms consume less energy per unit of useful work [19]. A further study introduced several strategies for mitigating carbon emissions and reducing energy consumption, such as running experiments in low-carbon regions, using efficient algorithms and hardware, considering trade-offs between performance and energy, ensuring reproducibility, and reporting energy and carbon metrics consistently [20].

The study by Anthony et al. uses the following three medical image datasets for the task of image segmentation: DRIVE, LIDC, and CXR [21]. The authors train two different convolutional neural network models on these datasets: U-Net and LungVAE. They use carbontracker (https://pypi.org/project/carbontracker/, accessed on 14 December 2025) to track and predict the energy and carbon footprint of their training sessions. Compared with the LungVAE model, the U-Net model trained on the DRIVE dataset has the lowest carbon footprint.

Hähnel et al. explored RAPL in Intel CPUs to measure the energy consumption of short code paths [22]. The study found that RAPL can perform fine-grained energy measurements at sub-update frequencies, with negligible performance overhead and high accuracy compared to external measurements. The paper showcased the use of the RAPL infrastructure to characterize the energy cost of decoding a video slice, with an error rate of 1.12%.

Another investigation of the Intel RAPL interface found that it is a potent tool for measuring power usage in cloud computing servers. The authors ran a number of tests utilizing specialized microbenchmarks and application-level benchmarks. The researchers performed their experiments on Amazon EC2 while using actual power measurement data from a supercomputing cluster to achieve realistic results. The research demonstrates that RAPL readings and plug power measurements produce equivalent results, which deliver precise energy consumption data without causing any system performance degradation. Without complicated power meters, RAPL is a useful tool for monitoring server energy use [23].

PowerJoular uses Intel RAPL to measure the power consumption of CPU and GPU components. JoularJX uses PowerJoular to monitor the power consumption of Java methods in real time. The experiment was conducted in three programming languages: Python, Java, and C. The tool measured the energy efficiency of the ray-casting algorithm taken from Rosetta Code. However, the comparison is weakened because the algorithm was executed for a different number of iterations in each language: Python (10,000), Java (100,000), and C (C11 version) (5000) [24].

The research discusses four methods to decrease ML energy consumption and carbon footprint: model efficiency, processor optimization, cloud deployment, and renewable energy site selection. Together, these practices can reduce ML training carbon emissions by up to 1000×. If they were adopted throughout all operations, machine learning training carbon emissions would decline by 2030. The study uses the carbon intensity of Google data centers for its estimates, but this might not reflect ML’s true environmental impact. It was estimated using the ML Emissions Calculator, which is an empirical method. GPT-3 and GPT-4 models are deployed on Microsoft cloud centers with varying power usage effectiveness [25].

The environmental impact of AI is driven by the super-linear growth trends in data, models, and infrastructure capacity. The operational carbon footprint of AI can be reduced by up to 800× through iterative hardware–software co-design and optimization for various ML tasks. Recommended measures include using more efficient model architectures and training algorithms, reducing the precision of model weights and activations, compressing models with techniques such as pruning and quantization, and using hardware accelerators such as GPUs, TPUs, and FPGAs to speed up training and inference [26].

High-performance computing (HPC) is a powerful scientific research tool, but it generates major environmental effects through its energy usage and hardware production processes. The computational demands of AI in high-performance computing create environmental challenges as well as social and ethical dilemmas. Scientists need to determine the carbon impact of their work and implement strategies to decrease it, including optimized code development, environmentally friendly equipment selection, careful facility choices, and emission offsetting practices [27].

A study suggested that by 2027, the AI industry is projected to consume energy comparable to that of the Netherlands, with annual usage estimated between 85 and 134 TWh. Mitigating this environmental impact requires slowing AI growth rates and ensuring greater transparency in corporate energy and water consumption practices [28]. The training process of big systems, including GPT-3 and AlphaGo, requires massive computational power, which results in high electricity consumption. The development of GPT-3 needed 1287 megawatt-hours of power, which equals the annual energy usage of 120 average U.S. households [29,30].

Recent research highlights “Green AI” as an emerging field aimed at reducing the environmental and energy impacts of AI systems. A systematic review of 98 studies identifies key trends, including benchmarking, footprint tracking, and hyperparameter optimization to enhance sustainability. Related work suggested a comprehensive framework incorporating model optimization, efficient algorithms, energy-efficient hardware, and sustainable data center operations. These can be validated through case studies in NLP and CV applications. These studies highlight the urgent need for sustainable AI development and deployment practices [31,32,33]. Another study investigates the increasing energy demands of Deep Neural Network training and introduces Zeus, an optimization framework that balances training performance with energy efficiency. It leverages real-time energy profiling and adaptive configuration, improving energy efficiency by up to 75.8% [34].

The studies collectively examine the relationship between model performance and energy efficiency within the Green Machine Learning paradigm. Through evaluations of multiple algorithms, including Linear Regression, Decision Tree, Random Forest, KNN, and SVM, the research highlights trade-offs between predictive accuracy and energy consumption. While the Random Forest model has the highest accuracy, achieving R² = 0.9989, the study found that simpler models such as Logistic Regression are more energy-efficient [35,36].

According to research referenced in [37,38], organizations need to conduct readiness assessments for sustainable and responsible AI adoption. Complementary research introduced frameworks to quantify the carbon emissions and energy consumption of Python-based ML models, examining the role of Explainable AI (XAI) and feature reduction in optimizing model performance and transparency.

These findings provide practical insights for selecting ML models that balance performance and sustainability, emphasizing the importance of energy-aware algorithm design in industrial and computational applications.

4. Methodology

A comprehensive benchmarking framework named Deep Green Framework (DGF) is developed to systematically evaluate the energy efficiency and performance effectiveness of DL models. The DGF has a set of over 60 defined measuring parameters to evaluate the performance, resource usage, and energy efficiency of a classifier. It monitors detailed energy footprints for the GPU, CPU, and DRAM, along with task latency (execution time), to assess system performance efficiency. It generates a unified Model Efficiency and Effectiveness Report (MEER). MEER encompasses performance indicators such as Top-1 and Top-3 accuracy, recall, precision, F1-score, ROC-AUC, Cohen’s Kappa, and geometric mean, as well as efficiency-related indicators such as energy consumption, carbon footprint, memory utilization, model size, and number of parameters. The inference backend (e.g., CPU, GPU, and XNNPACK) and key training hyper-parameters (such as number of epochs, batch size, and optimizer) are incorporated because they significantly influence performance and efficiency outcomes.

For each experiment, the dataset is recorded through two main fields: the dataset name and the distribution of samples across the training, validation, and test sets. Energy profiling is recorded in Joules separately for training and inference, both for the entire system and for each component, i.e., CPU, GPU, and RAM. Resource utilization metrics include average usage of CPU, GPU, and RAM in %, average usage of RAM in GB, and peak memory usage in GB. Inference efficiency is further examined through inference task latency, per-sample latency, and throughput across different inference backends. Additionally, system-level attributes such as operating system, CPU and GPU model, total RAM, Python version, and experiment location are documented to ensure reproducibility and to support carbon footprint estimation. An automated Python script is implemented, enabling sequential execution and benchmarking of fifteen pre-trained CNN models. A fixed 5 s idle interval is introduced between consecutive runs to minimize thermal and resource-induced variability. In this way, DGF aims to ensure stable, fair, and reproducible performance measurements. This integration facilitates a formalized energy-aware evaluation methodology that supports comparative analysis of deep learning models in terms of both predictive performance and sustainability, enabling responsible AI.
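The sequential benchmarking loop and the inference metrics described above can be sketched as follows. This is only an illustration under stated assumptions: the names `latency_metrics` and `run_sequentially` are hypothetical, each job is assumed to be a callable that runs one model and returns the number of samples it processed, and the definitions of per-sample latency and throughput are the conventional ones rather than formulas quoted from the paper.

```python
import time


def latency_metrics(total_latency_s, n_samples):
    """Derive per-sample latency and throughput from a total task latency."""
    throughput = n_samples / total_latency_s if total_latency_s > 0 else float("inf")
    return {
        "task_latency_s": total_latency_s,
        "per_sample_latency_ms": 1000.0 * total_latency_s / n_samples,
        "throughput_samples_per_s": throughput,
    }


def run_sequentially(jobs, idle_s=5.0):
    """Run each job in turn, pausing idle_s seconds between consecutive runs.

    jobs maps a model name to a zero-argument callable (hypothetical interface)
    that performs the run and returns the number of samples processed.
    """
    results = {}
    names = list(jobs)
    for i, name in enumerate(names):
        start = time.perf_counter()
        n_samples = jobs[name]()
        elapsed = time.perf_counter() - start
        results[name] = latency_metrics(elapsed, n_samples)
        if i < len(names) - 1:
            # Idle interval so the hardware can settle before the next model.
            time.sleep(idle_s)
    return results
```

For example, a task latency of 2 s over 10,000 test images corresponds to 0.2 ms per sample and a throughput of 5000 samples per second.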

This research comprises three phases: framework development, measurement of the energy footprints of the GPU, CPU, and DRAM, and ranking of CNN models based on energy efficiency. The developed framework helps data scientists and developers become aware of the carbon footprint of their deep learning models. It also helps to quantify the environmental impact of computationally heavy tasks such as training and inference, and it encourages practitioners to write and select more energy-efficient and sustainable DL models.

The first phase involves the development of a framework. The RAPL interface can presently be accessed from the Python, Java, and C languages through pyRAPL, pyJoules, and jRAPL [39,40,41]. To capture NVIDIA GPU energy readings, the Python NVIDIA Management Library (PYNVML) is used. The goal is to measure the amount of energy consumed by the models. Python was chosen for the framework development primarily because it has an extensive ecosystem of specialized libraries, offers good hardware compatibility, and supports direct access to registers, memory, and other hardware elements. Furthermore, this stage involves data acquisition on computer vision tasks.
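To make the idea of a RAPL energy reading concrete, the sketch below reads a cumulative energy counter before and after a workload and converts the difference to Joules, allowing for counter wraparound. It is not the DGF implementation: it assumes the Linux powercap sysfs layout (an `energy_uj` counter and a `max_energy_range_uj` limit inside a domain directory such as `/sys/class/powercap/intel-rapl:0`), which may be absent or require elevated permissions on a given machine, and the function names are hypothetical.

```python
from pathlib import Path

# Assumed location of the package-level RAPL domain; it varies between systems.
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")


def read_counter_uj(path):
    """Read an integer counter in microjoules from a sysfs file."""
    return int(Path(path).read_text().strip())


def energy_delta_j(start_uj, end_uj, max_range_uj):
    """Energy in Joules between two cumulative readings, handling one wraparound."""
    if end_uj >= start_uj:
        delta_uj = end_uj - start_uj
    else:
        # The counter wrapped past its maximum range between the two readings.
        delta_uj = (max_range_uj - start_uj) + end_uj
    return delta_uj / 1e6


def measure(workload, domain=RAPL_DOMAIN):
    """Return the energy in Joules consumed in one RAPL domain while workload() runs."""
    max_range = read_counter_uj(domain / "max_energy_range_uj")
    start = read_counter_uj(domain / "energy_uj")
    workload()
    end = read_counter_uj(domain / "energy_uj")
    return energy_delta_j(start, end, max_range)
```

Libraries such as pyRAPL and pyJoules wrap this kind of counter access; the wraparound handling matters for long training runs, during which the counter can overflow.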

In the second phase, the work focuses on recording real-time energy readings of 15 pre-trained CNN models fine-tuned using transfer learning. The energy samples are recorded via the RAPL and NVML interfaces. The DGF analyzes computational resources such as memory usage and task latency of the models to evaluate the extent of their sustainability. The final phase assesses the sustainability of the CNN models through carbon footprint analysis. A pictorial representation of the proposed framework is shown in Figure 1.

4.1. Experimental Setup

The evaluation was conducted on a system with an AMD Ryzen 5500 processor (6C/12T) at 3.6 GHz base speed and 4.2 GHz boost speed. The system has 24 GB of RAM, which provides stable and efficient training performance for deep learning experiments at high computational load. An NVIDIA GeForce RTX 3050 GPU with 8 GB of VRAM accelerates model training and inference. The experiments are executed on Ubuntu Linux 24.04 LTS (kernel version 6.14.0-28-generic, x86_64 architecture) with Python 3.12.11 as the primary programming environment. The assessment of the environmental effects of training and inference used Asia as the carbon emission region, with Pakistan as the specific location for the energy consumption factor. This setup allows the energy efficiency, computational performance, and carbon emissions of different deep learning models to be compared in a fair manner.
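A location-based carbon estimate of this kind can be sketched as a conversion from measured energy to kilowatt-hours, followed by multiplication with a grid carbon intensity. The intensity value below is a placeholder chosen only for illustration; it is not the factor used in the study and is not a figure for Pakistan. The function names are likewise hypothetical.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million Joules


def joules_to_kwh(energy_j):
    """Convert energy from Joules to kilowatt-hours."""
    return energy_j / JOULES_PER_KWH


def carbon_grams(energy_j, intensity_g_per_kwh):
    """Estimate emissions in grams of CO2-equivalent for a given grid intensity."""
    return joules_to_kwh(energy_j) * intensity_g_per_kwh


# Energy reported in Section 5 for ConvNeXt-Small on CIFAR-10.
energy_j = 576_910
# Placeholder grid intensity (gCO2e/kWh); an assumption, not a measured value.
PLACEHOLDER_INTENSITY = 400.0

print(f"{joules_to_kwh(energy_j):.4f} kWh")
print(f"{carbon_grams(energy_j, PLACEHOLDER_INTENSITY):.1f} gCO2e")
```

Because the estimate scales linearly with the intensity factor, the ranking of models by carbon footprint on a single machine follows their ranking by energy.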

4.2. Dataset Description and Key Training Hyperparameters

CIFAR-10 contains 60,000 color images in 10 classes, with 6000 images per class. CIFAR-100 also contains 60,000 color images, in 100 classes with 600 images per class. Each dataset is split into 40,000 training, 10,000 validation, and 10,000 test images. Both are widely used for benchmarking DNNs [42,43]. ImageNet, a large-scale dataset with over 1.2 million images across 1000 classes, is often used for training deep learning models [44]. The study evaluates 15 pre-trained CNN architectures that were originally trained on ImageNet. Each model is fine-tuned on the CIFAR-10 and CIFAR-100 datasets to adapt the learned representations to different levels of task complexity. Fine-tuning makes it possible to compare energy usage, execution speed, memory requirements, and accuracy across architectures ranging from lightweight models such as the MobileNet variants to high-capacity networks including the ConvNeXt and ResNet families. Using ImageNet-pretrained weights provides all models with robust initial feature extraction abilities, which enables a better evaluation of their performance trade-offs for edge and resource-constrained environments.
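The 40,000/10,000/10,000 partition can be illustrated with a short index-splitting sketch. The text does not state how the samples were assigned to each subset, so the seeded random shuffle below is an assumption, and `split_indices` is a hypothetical helper.

```python
import random


def split_indices(n_total, n_train, n_val, n_test, seed=0):
    """Shuffle sample indices reproducibly and cut them into three disjoint subsets."""
    if n_train + n_val + n_test != n_total:
        raise ValueError("subset sizes must add up to the dataset size")
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(60_000, 40_000, 10_000, 10_000)
print(len(train_idx), len(val_idx), len(test_idx))  # 40000 10000 10000
```

Fixing the seed keeps the partition identical across the fifteen models, so differences in accuracy and energy are not caused by different data splits.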

All reported CNN models were trained using a consistent set of hyperparameters on the CIFAR-10 dataset: 3 epochs, a fixed batch size of 32, and Stochastic Gradient Descent (SGD) as the optimizer. For CIFAR-100, which presents a more challenging learning scenario due to label scarcity, the same models were trained with adjusted hyperparameters, specifically 15 epochs with a fixed batch size of 32 and the same optimizer. However, for ConvNeXt-Tiny and ConvNeXt-Small, a batch size of 32 resulted in out-of-memory errors on both datasets (see Table A1). Therefore, the batch size was reduced to 16 for these models to ensure stable training under hardware constraints. Table 2 lists the mathematical notation, governing equations, and corresponding attributes.
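The hyperparameter rules above can be captured in a small configuration helper. The sketch encodes only the values stated in the text (epochs, batch size, optimizer, and the reduced batch size for the two ConvNeXt variants); the class and function names are hypothetical, and settings the text does not report, such as the learning rate, are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    dataset: str
    epochs: int
    batch_size: int
    optimizer: str = "SGD"


# Epochs per dataset, as stated in the text.
EPOCHS = {"CIFAR-10": 3, "CIFAR-100": 15}
# Models that ran out of memory at batch size 32.
REDUCED_BATCH_MODELS = {"ConvNeXt-Tiny", "ConvNeXt-Small"}


def make_config(model_name, dataset):
    """Build the training configuration for one model and dataset pair."""
    batch_size = 16 if model_name in REDUCED_BATCH_MODELS else 32
    return TrainConfig(dataset=dataset, epochs=EPOCHS[dataset], batch_size=batch_size)


print(make_config("MobileNetV3-Small", "CIFAR-10"))
print(make_config("ConvNeXt-Small", "CIFAR-100"))
```

Note that halving the batch size changes the number of optimizer steps per epoch, which is one reason the energy figures for the ConvNeXt variants are not strictly like-for-like with the other models.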

5. Analysis and Discussion

The evaluation of classifier performance on CIFAR-10 shows a trade-off between computational efficiency and predictive accuracy. The lightweight architectures, namely the MobileNet series and Neural Architecture Search Network (NASNet) Mobile, achieve lower energy usage, between 25,992 J and 108,534 J, while providing faster inference times, which makes them suitable for deployment on edge devices under resource constraints. The classification performance and energy utilization of the CNN architectures on the CIFAR-10 and CIFAR-100 datasets are summarized in Table 3, illustrating the trade-off between accuracy and computational cost. The results presented in this study are the median of 5 experimental runs for all scenarios, which mitigates cold-start effects. The mobile networks reach a prediction accuracy of 90 to 92%. The ResNet101, DenseNet, and ConvNeXt variants show the highest energy and time requirements among the models because they have the largest architectures, with ConvNeXt-Small reaching 576,910 J and 4370 s. The heavy-weight neural networks require more computational resources to operate, yet they deliver better accuracy than the light-weight networks, with ConvNeXt-Small reaching 97.54% accuracy. Memory usage stays within a narrow range across models, from 5.84 GB to 7.09 GB, which suggests that memory is not the main limiting factor for high-performance CNNs and that energy consumption and task latency are more significant. In large models, higher energy usage accompanies longer inference times and better accuracy, which affects the deployment of edge AI systems.

The evaluation of classifiers on the CIFAR-100 dataset shows that predictive performance comes at the expense of increased computational cost. The lightweight MobileNetV3-Small and MobileNetV1 use 124,356 J and 264,159 J of energy and run in 1001 s and 1925 s, respectively, while achieving 77.5% and 78.5% accuracy, levels suitable for edge environments with limited resources. The high-capacity models ConvNeXt-Tiny and ConvNeXt-Small reached the highest Top-1 accuracy, 86.16% and 87.39%, but also the highest energy and time requirements: 1,711,460 J and 2,777,480 J, and 12,911 s and 21,096 s, respectively. The EfficientNet-B1 and InceptionV3 models with intermediate complexity achieved the best results by delivering more than 83% accuracy while using approximately 664,902 J of energy and 6.40 GB of memory. The results show that expanding network depth and parameter count leads to major increases in energy consumption and inference duration yet produces consistent accuracy gains, which confirms the fundamental connection between model efficiency and accuracy for deep models running on CIFAR-100. In Table 3, a relative efficiency of 100% marks the most efficient model; a value of 50% means a model is half as efficient as the best one, and 10% means it is 10× less efficient.
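The energy-efficiency and relative-efficiency measures of Table 2 can be sketched as follows, using three CIFAR-10 rows of Table 3 as input. The function names are hypothetical, and the sketch applies the formulas to a single dataset only, so its percentages need not coincide with the single Relative Efficiency column of Table 3, which is reported once per model across both datasets.

```python
def energy_efficiency(accuracy: float, energy_j: float) -> float:
    """EE_i = Acc_i / E_i (accuracy in %, energy in joules)."""
    return accuracy / energy_j


def relative_efficiency(models: dict) -> dict:
    """REE_i = (EE_i / EE_max) * 100, so the best model scores 100."""
    ee = {name: energy_efficiency(acc, e) for name, (acc, e) in models.items()}
    ee_max = max(ee.values())
    return {name: 100.0 * value / ee_max for name, value in ee.items()}


# (Top-1 accuracy in %, energy in J) taken from the CIFAR-10 columns of Table 3.
cifar10 = {
    "MobileNetV3-Small": (91.09, 25_992),
    "MobileNetV1": (92.45, 57_380),
    "ConvNeXt-Small": (97.54, 576_910),
}

ree = relative_efficiency(cifar10)
for name, value in sorted(ree.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.2f}%")
```

Because accuracy varies by only a few points while energy varies by more than an order of magnitude, the ranking is driven almost entirely by the energy term.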

RQ2: Is the faster model in terms of training and inference always the greenest and most computationally efficient model? The study found a direct relationship between a CNN's execution time and its energy efficiency. In other words, a DL model with low task latency also consumes the least energy. The energy dissipated by a model is directly proportional to the duration of its training process: as the training time increases, the corresponding energy consumption also rises, thereby contributing to a higher carbon footprint. This relationship holds for both the training and inference phases of the models tested on the CIFAR-10 and CIFAR-100 datasets. ConvNeXt-Small dissipated the most energy and showed the highest training task latency, whereas the most energy-efficient and fastest model is MobileNetV3-Small. Figure 2 and Figure 3 show the energy, time, and carbon footprint for the two datasets; the line graphs plot the training and inference time alongside the energy and carbon footprint.
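One way to see why energy tracks time so closely is to rearrange the energy relation of Table 2, E_i = P_i × T_i, into an average power P_i = E_i / T_i. The sketch below back-calculates this average from the CIFAR-10 figures in Table 3; the function name is hypothetical, and treating the reported energy and time as a single constant-power interval is an assumption made only for illustration.

```python
def average_power_w(energy_j: float, time_s: float) -> float:
    """Average power implied by E = P * T, in watts."""
    return energy_j / time_s


# (energy in J, time in s) from the CIFAR-10 columns of Table 3.
runs = {
    "MobileNetV3-Small": (25_992, 216),
    "ConvNeXt-Small": (576_910, 4370),
}

power = {name: average_power_w(e, t) for name, (e, t) in runs.items()}
for name, watts in power.items():
    print(f"{name}: {watts:.1f} W")
```

Under this assumption the two models draw a similar average power, so the roughly 22-fold gap in energy comes almost entirely from the difference in run time.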

RQ3: How does memory usage relate to energy consumption? The study found that memory usage does affect energy consumption in DL. GPU energy accounts for the majority of the energy consumed, but the way memory usage influences energy consumption varies between CNNs. The graphs depict a non-linear trend: each model shows a distinct memory utilization pattern that does not correlate directly with energy consumption. During the training phase, NASNet-Mobile records the highest memory usage, while VGG16 and VGG19 show the lowest. In contrast, MobileNetV3-Small demonstrates the lowest DRAM energy consumption, whereas ConvNeXt-Small exhibits the highest (see Figure 4 and Figure 5 for in-depth insight). During the inference phase, ResNet101 has the highest memory usage and VGG19 the lowest, while MobileNetV3-Small again exhibits the lowest DRAM energy and ConvNeXt-Small the highest.

RQ4: How does overall energy consumption (training and inference) relate to the resulting accuracy of the model? As Figure 6 shows, most models that consume more energy tend to achieve greater accuracy. However, not all of them can be classified as energy-efficient or green. For instance, ConvNeXt-Small attains the highest accuracy (97.54%) on the CIFAR-10 dataset but also exhibits the largest energy footprint, making it less sustainable. Conversely, MobileNetV3-Small demonstrates the lowest energy consumption while maintaining a comparably high accuracy (91.09%), making it the most energy-efficient of all evaluated architectures.

Therefore, it can be inferred that models offering comparable accuracy with lower energy requirements should be prioritized to achieve an optimal balance between performance and sustainability. On the CIFAR-100 dataset, ConvNeXt-Small achieves the highest accuracy (87.39%), closely followed by EfficientNet-B1 (85.08%). Although the difference in accuracy between the two models is merely 2.31 percentage points, the energy footprint of ConvNeXt-Small is approximately 3× greater than that of EfficientNet-B1. This substantial disparity makes EfficientNet-B1 a far more suitable, energy-efficient choice that still achieves remarkable accuracy. The study found that a higher energy footprint does not guarantee a proportional gain in performance.
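The idea of preferring models that give comparable accuracy for less energy can be illustrated with a Pareto front over the CIFAR-100 energy and Top-1 accuracy values of Table 3. This is a sketch of one possible selection rule rather than the procedure used in the study; `pareto_front` is a hypothetical helper, and the model labels follow Table 3 without the weight-class suffix.

```python
def pareto_front(models):
    """Return names of models that no other model beats on both criteria.

    A model is kept if every cheaper model (lower energy) has strictly
    lower accuracy. Input items are (name, energy_j, accuracy) tuples.
    """
    front = []
    best_acc = float("-inf")
    # Sort by energy ascending; on equal energy, higher accuracy first.
    for name, energy, acc in sorted(models, key=lambda m: (m[1], -m[2])):
        if acc > best_acc:
            front.append(name)
            best_acc = acc
    return front


# (name, energy in J, Top-1 accuracy in %) from the CIFAR-100 columns of Table 3.
cifar100 = [
    ("MobNetV3Small", 124_356, 77.5), ("MobNetV1", 264_159, 78.49),
    ("MobNetV2", 307_491, 75.41), ("EffNetB0V2", 462_996, 84.76),
    ("InceptionV3", 516_232, 81.72), ("NASNetMob", 578_874, 80.1),
    ("EffNetB0", 664_902, 83.71), ("ResNet50", 774_934, 81.19),
    ("DenseNet", 853_042, 80.27), ("EffNetB1", 928_808, 85.08),
    ("VGG16", 1_162_817, 71.69), ("ResNet101", 1_269_484, 82.63),
    ("VGG19", 1_384_247, 73.53), ("ConvNeXtTiny", 1_711_460, 86.16),
    ("ConvNeXtSmall", 2_777_480, 87.39),
]

front = pareto_front(cifar100)
print(front)
```

Every model outside the front is matched or beaten on accuracy by a cheaper alternative, which is the sense in which extra energy does not buy a proportional gain.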

Model training produces greater environmental effects when moving from CIFAR-10 to CIFAR-100 because of the increased computational complexity. Training MobileNetV3-Small and MobileNetV1 on CIFAR-10 resulted in minimal emissions of 3–7 g of carbon dioxide equivalent (CO₂e), the same impact as driving a car for 26–57 m or watching TV for 49 min. The CIFAR-100 dataset required additional data and computational power, which resulted in CO₂e emissions between 15 and 32 g. The high-capacity ConvNeXt-Small and ConvNeXt-Tiny models produced the highest emissions, reaching 338 g CO₂e for CIFAR-100 testing, equivalent to driving 2.8 km or watching 242 min of TV. Training the EfficientNet-B1, ResNet50, and DenseNet models resulted in moderate CO₂e emissions ranging from 94 to 155 g; these models achieved a good balance between training speed and performance. The results in Figure A1 demonstrate that model depth directly affects computational intensity and environmental cost, which calls for energy-efficient neural network design to address label scarcity in large-scale datasets such as CIFAR-100. Training is the most computationally expensive phase and has a much higher carbon footprint, whereas the inference phase has almost the same footprint on both datasets.
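The emission and equivalent-usage formulas of Table 2 can be sketched as below. Two points are assumptions rather than statements from the study: the car emission factor of 0.12 is read as grams of CO₂e per metre, which is consistent with the quoted 338 g corresponding to about 2.8 km, and the carbon intensity of 400 g CO₂e/kWh in the usage line is a placeholder value, since the intensity used by the authors is not given in this section.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ


def co2e_grams(energy_j: float, carbon_intensity_g_per_kwh: float) -> float:
    """CO2eq = E * CI, with E converted from joules to kWh."""
    return energy_j / JOULES_PER_KWH * carbon_intensity_g_per_kwh


def car_distance_m(co2e_g: float, ef_g_per_m: float = 0.12) -> float:
    """EUE = CO2eq / EF; EF is assumed to be in g CO2e per metre."""
    return co2e_g / ef_g_per_m


# Reproduce the driving equivalent quoted in the text for 338 g CO2e.
distance_km = car_distance_m(338) / 1000
print(f"338 g CO2e is about {distance_km:.1f} km of driving")

# Placeholder carbon intensity (assumption): 1 kWh at 400 g/kWh.
print(co2e_grams(3.6e6, 400.0))
```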

RQ5: Can a CNN model be automatically identified for an edge environment by considering carbon footprint, energy efficiency, accuracy, task latency, and memory usage? The experimental results demonstrate that lightweight architectures (see Table 3), particularly MobileNetV3-Small, achieve the most favorable trade-off, offering reasonable accuracy (91.09%) with the lowest energy footprint (25,992 J) and the lowest task latency (216 s). This validates that the MobileNet series, EfficientNet-B0V2, and NASNet-Mobile are highly compatible with edge environments, as they are computationally efficient models. In contrast, deeper models such as ConvNeXt and DenseNet provide marginal accuracy gains at significantly higher energy and time costs, which makes them incompatible with resource-scarce environments. Energy consumption is therefore a defining computational constraint. Edge computing is currently growing in popularity for DL tasks. Fewer floating-point operations (FLOPs) lead to lower energy consumption; similarly, fewer model parameters require less memory and storage.

The correlation analysis in Table 4 reveals the relationships between the assessed performance metrics. There is a very strong positive Spearman correlation (ρ = 0.99) between energy consumption and task latency, because models that need extended training periods also consume more energy. Energy usage and memory requirements are strongly correlated (ρ = 0.71), as are memory usage and computation time (ρ = 0.68), which shows that models requiring more memory both use more energy and need longer processing times. The energy–accuracy correlation (ρ = 0.45) is a weak positive relationship, which indicates that higher energy usage does not result in equivalent improvements in classification accuracy. Computational resource requirements are thus directly related to one another, yet optimizing for energy efficiency does not necessarily reduce model accuracy, which makes it possible to achieve balanced performance in resource-limited settings.
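For readers who want to check the energy–time relationship themselves, the sketch below computes a Spearman rank correlation in plain Python on the CIFAR-10 energy and time columns of Table 3. The helper names are hypothetical, and the data subset used for Table 4 is not specified in this section, so the value obtained here is only expected to be close to the reported ρ = 0.99, not identical.

```python
def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# CIFAR-10 energy (J) and time (s), in the row order of Table 3.
energy = [25992, 57380, 65899, 95285, 110369, 108534, 137995, 143695,
          179481, 193568, 222019, 264034, 262008, 350932, 576910]
time_s = [216, 418, 491, 724, 778, 968, 1045, 1132,
          1295, 1459, 1509, 1857, 1745, 2673, 4370]

rho = spearman(energy, time_s)
print(f"Spearman rho (energy vs. time, CIFAR-10): {rho:.4f}")
```

Only one pair of models (InceptionV3 and NASNet-Mobile) swaps order between the two rankings, which is why the coefficient is so close to 1.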

5.1. Resource-Efficient Edge Inference

Table 5 presents the top five CNN models for edge inference, reporting inference latency, throughput, and the compute backend, i.e., CPU, CUDA-based GPU, or XNNPACK. The CPU is mainly used for sequential tasks, whereas the GPU exploits a high level of parallelism to accelerate compute-intensive operations such as the training of deep hierarchical networks. In contrast, XNNPACK serves as an optimized CPU-based inference backend within TF Lite, enhancing execution efficiency on resource-constrained edge devices without requiring specialized hardware.

The lightweight MobileNetV3-Small yields the lowest inference latency and the highest throughput across all hardware configurations, and its advantage is most pronounced in the XNNPACK-based edge environment. Conversely, more complex models such as InceptionV3 and NASNet-Mobile exhibit significantly higher inference latency and lower throughput, especially on XNNPACK, indicating their limited suitability for resource-constrained environments. EfficientNet-B0V2 offers a balanced trade-off between predictive performance and computational efficiency. Furthermore, the results remain consistent across both datasets, which suggests robustness in model behavior irrespective of dataset complexity.
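The three inference columns of Table 5 are related by simple arithmetic, sketched below. The helpers `latency_metrics` and `benchmark` are hypothetical, and the sample count of 10,000 is an assumption: it matches the size of the CIFAR test sets and the ratio between the total and per-sample latencies in the table, but the evaluation set is not stated in this section.

```python
import time


def latency_metrics(total_s: float, n_samples: int) -> dict:
    """Derive per-sample latency and throughput from a total run time."""
    return {
        "latency_s": total_s,
        "per_sample_ms": 1000.0 * total_s / n_samples,
        "throughput_inf_per_s": n_samples / total_s,
    }


def benchmark(predict, samples) -> dict:
    """Time a callable over all samples; `predict` stands in for a model."""
    samples = list(samples)
    start = time.perf_counter()
    for sample in samples:
        predict(sample)
    return latency_metrics(time.perf_counter() - start, len(samples))


# MobileNetV3-Small on GPU, CIFAR-10: 4.85 s total (Table 5),
# assuming 10,000 test images.
gpu = latency_metrics(4.85, 10_000)
print(gpu)
```

The derived values (about 0.485 ms per sample and about 2062 inferences per second) agree with the table's 0.48 ms and 2063 inf/s up to rounding of the reported total latency.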

5.2. Strategies for Resource-Efficient Edge Inference

This section proposes strategies for resource-efficient, low-carbon inference in edge environments.

5.2.1. Model-Level Optimization

The main goal of model-level compression is to reduce both the computational cost and the energy consumption of deep learning systems while maintaining their operational accuracy. The well-known static compression techniques include model pruning, weight quantization, knowledge distillation, and early exit, which reduce redundant parameters, numerical precision, and model complexity [47,48]. Moreover, adaptive compression dynamically adjusts its compression sensitivity based on accuracy requirements, input complexity, and edge device limitations [49]. These techniques preserve the predictive strength of the models while decreasing their environmental effects. From a deployment perspective, lightweight and edge-optimized architectures (e.g., MobileNet, EfficientNet-B0V2, and NASNet-Mobile) should be preferred for inference, minimizing data transmission and server energy requirements. Collectively, these strategies enhance computational efficiency, reduce CO₂ emissions, and promote responsible AI development aligned with the principles of sustainable AI.
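Of the techniques listed above, weight quantization is the simplest to illustrate. The following is a minimal sketch of symmetric per-tensor 8-bit quantization in plain Python; it is not the quantization scheme of any particular framework, and the weight values are made up for the example.

```python
def quantize_int8(weights):
    """Symmetric quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [scale * value for value in q]


weights = [0.52, -1.27, 0.003, 0.9, -0.41]  # hypothetical weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored in one byte instead of four, and the reconstruction error per weight is bounded by half the quantization step, which is the trade between numerical precision and model size that the paragraph describes.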

5.2.2. Hardware-Level Optimization

The hardware-level optimization focused on configuring the system to achieve peak computing performance while minimizing power consumption. Training used mixed-precision methods, in which the GPU or TPU executes FP16 operations to speed up computation, lowering power usage without compromising accuracy. Training efficiency was further enhanced through hardware acceleration methods, including parallel computation, asynchronous data loading, and GPU memory optimization.
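The storage side of the FP16 argument can be demonstrated with the Python standard library alone. The sketch below does not perform mixed-precision training; it only shows, under that stated limitation, that a half-precision value occupies half the bytes of a single-precision one and what rounding error this introduces. The helper `to_fp16` is a hypothetical name.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


bytes_fp32 = struct.calcsize("<f")  # single precision: 4 bytes
bytes_fp16 = struct.calcsize("<e")  # half precision: 2 bytes

rounding_error = abs(0.1 - to_fp16(0.1))
print(bytes_fp32, bytes_fp16, rounding_error)
```

Halving the bytes per value reduces memory traffic, which is one reason FP16 arithmetic can lower power draw; the small rounding error is why mixed-precision schemes typically keep selected quantities in higher precision.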

5.2.3. Training-Level Optimization

The use of meta-learning, early stopping, checkpointing, and model reuse strategies resulted in better workflow optimization. Meta-learning enabled the system to adapt more efficiently by leveraging prior training experience to improve convergence speed on new tasks, thereby reducing computation time and energy consumption. Early stopping was used to halt training once performance on the validation set plateaued, preventing overfitting and unnecessary iterations. Checkpointing mechanisms preserved progress and minimized retraining overhead in the event of interruptions. Furthermore, the reuse of pre-trained models (transfer learning) replaced full-scale training with fine-tuning, significantly lowering computational requirements while maintaining high model accuracy.
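Early stopping, as described above, can be sketched in a few lines. The class below is a generic patience-based rule, not the authors' implementation; the names `EarlyStopping` and `epochs_run`, the patience of 3, and the validation losses are all illustrative assumptions.

```python
class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(val_losses, patience: int = 3) -> int:
    """Return how many epochs are executed before the rule stops training."""
    stopper = EarlyStopping(patience)
    for epoch, loss in enumerate(val_losses, start=1):
        if stopper.step(loss):
            return epoch
    return len(val_losses)


# Hypothetical validation losses: improvement ends after epoch 3.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.5]
print(epochs_run(losses))
```

With these values training halts after epoch 6, so the seventh epoch and its energy cost are never spent; the price is that a late improvement, such as the final 0.5 here, is missed, which is the usual trade-off when choosing the patience.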

6. Conclusions

Image classification is no longer an intrinsically challenging task in the present era of modern computational capabilities. Nevertheless, attaining high accuracy depends on several crucial requirements: the availability of sufficient resources, access to a large volume and diverse variety of data, and ample computational time. Hierarchical deep networks in particular depend heavily on these factors to perform effectively. In the current study, sufficiently large datasets were used for image classification, and it was observed that computationally resource-intensive convolutional models performed inefficiently due to their dependency on complex architectures and huge datasets. In contrast, resource-efficient CNN models such as MobileNetV3-Small have the smallest model size and hence the lowest inference task latency and the highest throughput, with the largest difference on the XNNPACK-based edge platform. At the other end of the spectrum, models like InceptionV3 and NASNet-Mobile have much higher latency and lower throughput, while EfficientNet-B0V2 delivers a balanced trade-off between predictive performance and computational efficiency. These edge-oriented models exhibit superior performance in comparison to traditional DL models. This dominance is not restricted to accuracy; light-weight CNN models also have low task latency and are computationally efficient, making them a more effective and efficient solution for resource-constrained environments.

It can be deduced from this study that, when computational resources are scarce, model selection using the Deep Green Framework is preferable for obtaining improved and sustainable results. Future work will focus on designing adaptive compression techniques to achieve resource-efficient inference in edge environments. This approach will dynamically balance the trade-off between accuracy and efficiency by adjusting its compression sensitivity based on accuracy requirements, input complexity, and edge deep learning requirements. This extension is expected to enhance computational efficiency while preserving predictive performance, yielding highly energy-efficient, sustainable AI models with reduced environmental impact.

Author Contributions

Conceptualization, R.Q.; methodology, R.Q. and R.A.; validation, S.M.J. and R.A.; formal analysis, R.Q. and R.A.; investigation, R.Q., S.M.J. and R.A.; writing—original draft preparation, R.Q. and R.A.; writing—review and editing, R.A. and S.M.J.; project administration, S.M.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research did not receive any external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in CIFAR-10 and CIFAR-100 datasets at https://www.cs.toronto.edu/~kriz/cifar.html (accessed on 25 September 2025).

Acknowledgments

The authors sincerely thank the reviewers for their constructive feedback, which significantly improved the quality of this work.

Conflicts of Interest

The authors declare no conflicts of interest relevant to this article.

Appendix A

Table A1. Training hyperparameters of CNNs.

Classifier	Model Size (MB)	Model Complexity (Parameters in Millions)	CIFAR-10 Epochs	CIFAR-10 Batch Size	CIFAR-10 Optimizer	CIFAR-100 Epochs	CIFAR-100 Batch Size	CIFAR-100 Optimizer
MobNetV3Small_LW	17	2.1	3	32	SGD	15	32	SGD
MobNetV2_LW	33	4.1	3	32	SGD	15	32	SGD
MobNetV1_LW	39	4.9	3	32	SGD	15	32	SGD
EffNetB0_LW	48	5.9	3	32	SGD	15	32	SGD
NASNetMob_LW	50	5.9	3	32	SGD	15	32	SGD
EffNetB0V2_LW	63	7.8	3	32	SGD	15	32	SGD
EffNetB1_HW	68	8.5	3	32	SGD	15	32	SGD
DenseNet_HW	70	8.7	3	32	SGD	15	32	SGD
VGG16_HW	126	15.8	3	32	SGD	15	32	SGD
VGG19_HW	169	21.1	3	32	SGD	15	32	SGD
InceptionV3_HW	196	24.5	3	32	SGD	15	32	SGD
ResNet50_HW	210	26.3	3	32	SGD	15	32	SGD
ConvNeXtTiny_LW	234	29.2	3	16	SGD	15	16	SGD
ResNet101_HW	363	45.3	3	32	SGD	15	32	SGD
ConvNeXtSmall_LW	407	50.8	3	16	SGD	15	16	SGD

Figure A1. CO₂e as equivalent car distance travel.

References

1. Malmodin, J. Greenhouse Gas Emissions in the ICT Sector. 2020. Available online: https://c2e2.unepccc.org/wp-content/uploads/sites/3/2020/03/greenhouse-gas-emissions-in-the-ict-sector.pdf (accessed on 10 October 2025).
2. Jones, N. How to stop data centres from gobbling up the world’s electricity. Nature 2018, 561, 163–166.
3. Liu, Y.; Wei, X.; Xiao, J.; Liu, Z.; Xu, Y.; Tian, Y. Energy consumption and emission mitigation prediction based on data center traffic and PUE for global data centers. Glob. Energy Interconnect. 2020, 3, 272–282.
4. Andrae, A.S.G. New perspectives on internet electricity use in 2030. Eng. Appl. Sci. Lett. 2020, 3, 19–31.
5. Cambridge Blockchain Network Sustainability Index: CBECI. Available online: https://ccaf.io/cbnsi/cbeci (accessed on 14 October 2025).
6. Hao, K. Training a single AI model can emit as much carbon as five cars in their lifetimes. MIT Technol. Rev. 2020, 75, 103.
7. Su, Z.; Li, Q.; Kaneko, H.; Li, H.; Meng, L. Optimization and Deployment of DNNs for RISC-V-based Edge AI. In 2024 IEEE International Conference on Real-Time Computing and Robotics (RCAR), Alesund, Norway; IEEE: New York, NY, USA, 2024; pp. 200–205.
8. Wu, Y.; Wu, C.; Yuan, G.; Li, Y.; Guo, W.; Rao, J.; Shen, X.; Ren, B.; Wang, Y. DACO: Pursuing Ultra-low Power Consumption via DNN-Adaptive CPU-GPU CO-optimization on Mobile Devices. In 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), Valencia, Spain; IEEE: New York, NY, USA, 2024; pp. 1–2.
9. Dhiman, R.; Miteff, S.; Wang, Y.; Ma, S.-C.; Amirikas, R.; Fabian, B. Artificial Intelligence and Sustainability—A review. Analytics 2024, 3, 140–164.
10. Li, W.; Hacid, H.; Almazrouei, E.; Debbah, M. A comprehensive review and a taxonomy of edge machine learning: Requirements, paradigms, and techniques. AI 2023, 4, 729–786.
11. Pinto, G.; Castor, F.; Liu, Y.D. Mining questions about software energy consumption. In Proceedings of the 11th Working Conference on Mining Software Repositories; Association for Computing Machinery: New York, NY, USA, 2014; pp. 22–31.
12. Pang, C.; Hindle, A.; Adams, B.; Hassan, A.E. What Do Programmers Know about Software Energy Consumption? IEEE Softw. 2016, 33, 83–89.
13. Ardito, L.; Morisio, M. Green IT—Available data and guidelines for reducing energy consumption in IT systems. Sustain. Comput. Inform. Syst. 2013, 4, 24–32.
14. Verdecchia, R.; Lago, P.; Ebert, C.; De Vries, C. Green IT and green software. IEEE Softw. 2021, 38, 7–15.
15. Pereira, R.; Couto, M.; Ribeiro, F.; Rua, R.; Cunha, J.; Fernandes, J.P.; Saraiva, J. Ranking programming languages by energy efficiency. Sci. Comput. Program. 2021, 205, 102609.
16. Georgiou, S.; Kechagia, M.; Louridas, P.; Spinellis, D. What are Your Programming Language’s Energy-Delay Implications? In 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR), Gothenburg, Sweden; Association for Computing Machinery: New York, NY, USA, 2018; pp. 303–313.
17. Liu, K.; Pinto, G.; Liu, Y.D. Data-Oriented characterization of Application-Level energy optimization. In Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2015; pp. 316–331.
18. Noureddine, A.; Bourdon, A.; Rouvoy, R.; Seinturier, L. A preliminary study of the impact of software engineering on GreenIT. In 2012 First International Workshop on Green and Sustainable Software (GREENS), Zurich, Switzerland; IEEE: New York, NY, USA, 2012; pp. 21–27.
19. Johann, T.; Dick, M.; Naumann, S.; Kern, E. How to measure energy-efficiency of software: Metrics and measurement results. In 2012 First International Workshop on Green and Sustainable Software (GREENS), Zurich, Switzerland; IEEE: New York, NY, USA, 2012; pp. 51–54.
20. Henderson, P.; Hu, J.; Romoff, J.; Brunskill, E.; Jurafsky, D.; Pineau, J. Towards the systematic reporting of the energy and carbon footprints of machine learning. arXiv 2020.
21. Anthony, L.F.W.; Kanding, B.; Selvan, R. CarbonTracker: Tracking and Predicting the carbon footprint of training deep learning models. arXiv 2020.
22. Hähnel, M.; Döbel, B.; Völp, M.; Härtig, H. Measuring energy consumption for short code paths using RAPL. ACM SIGMETRICS Perform. Eval. Rev. 2012, 40, 13–17.
23. Khan, K.N.; Hirki, M.; Niemi, T.; Nurminen, J.K.; Ou, Z. RAPL in action. ACM Trans. Model. Perform. Eval. Comput. Syst. 2018, 3, 9.
24. Noureddine, A. PowerJoular and JoularJX: Multi-Platform Software Power Monitoring Tools. In 2022 18th International Conference on Intelligent Environments (IE), Biarritz, France; IEEE: New York, NY, USA, 2022; pp. 1–4.
25. Patterson, D.; Gonzalez, J.; Holzle, U.; Le, Q.; Liang, C.; Munguia, L.-M.; Rothchild, D.; So, D.R.; Texier, M.; Dean, J. The carbon footprint of machine learning training will plateau, then shrink. Computer 2022, 55, 18–28.
26. Wu, C.J.; Raghavendra, R.; Gupta, U.; Acun, B.; Ardalani, N.; Maeng, K.; Chang, G.; Aga, F.; Huang, J.; Bai, C.; et al. Sustainable AI: Environmental implications, challenges and opportunities. arXiv 2021.
27. Lannelongue, L. Carbon Footprint, the (Not So) Hidden Cost of High Performance Computing. BCS, 11 October 2023. Available online: https://www.bcs.org/articles-opinion-and-research/carbon-footprint-the-not-so-hidden-cost-of-high-performance-computing/ (accessed on 14 October 2025).
28. Kleinman, Z.; Vallance, C. Warning AI Industry Could Use as much Energy as the Netherlands. BBC News, 10 October 2023. Available online: https://www.bbc.com/news/technology-67053139 (accessed on 10 October 2025).
29. Dhar, P. The carbon impact of artificial intelligence. Nat. Mach. Intell. 2020, 2, 423–425.
30. Zhao, H.; Cui, W.; Chen, Q.; Zhang, S.; Li, Z.; Leng, J.; Li, C.; Zeng, D.; Guo, M. Towards fast setup and high throughput of GPU serverless computing. arXiv 2024.
31. Verdecchia, R.; Sallou, J.; Cruz, L. A systematic review of Green AI. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2023, 13, e1507.
32. Tabbakh, A.; Amin, L.A.; Islam, M.; Mahmud, G.M.I.; Chowdhury, I.K.; Mukta, M.S.H. Towards sustainable AI: A comprehensive framework for Green AI. Discov. Sustain. 2024, 5, 408.
33. Wang, Q.; Li, Y.; Li, R. Ecological footprints, carbon emissions, and energy transitions: The impact of artificial intelligence (AI). Humanit. Soc. Sci. Commun. 2024, 11, 1043.
34. You, J.; Chung, J.-W.; Chowdhury, M. ZEUS: Understanding and Optimizing GPU Energy Consumption of DNN Training. 2023. Available online: https://www.usenix.org/conference/nsdi23/presentation/you (accessed on 20 October 2025).
35. Santos, S.O.S.; Skiarski, A.; García-Núñez, D.; Lazzarini, V.; Moral, R.D.A.; Galvan, E.; Ottoni, A.L.C.; Nepomuceno, E. Green Machine Learning: Analysing the Energy Efficiency of Machine Learning Models. In Irish Signals and Systems Conference; IEEE: New York, NY, USA, 2024.
36. Yenikaya, M.A.; Oktaysoy, O. Machine learning in energy efficiency: Comparison of energy estimation models. EKEV Akad. Derg. 2025, 103, 196–210.
37. Aldoseri, A.; Al-Khalifa, K.N.; Hamouda, A.M. Methodological approach to assessing the current state of organizations for AI-Based Digital Transformation. Appl. Syst. Innov. 2024, 7, 14.
38. Jean-Quartier, C.; Bein, K.; Hejny, L.; Hofer, E.; Holzinger, A.; Jeanquartier, F. The Cost of Understanding—XAI Algorithms towards Sustainable ML in the View of Computational Cost. Computation 2023, 11, 92.
39. PyRAPL. PyPI, 19 December 2019. Available online: https://pypi.org/project/pyRAPL/ (accessed on 14 October 2025).
40. pyJoules. PyPI. Available online: https://pypi.org/project/pyJoules/ (accessed on 14 October 2025).
41. Kliu. GitHub-kliu20/jRAPL. GitHub. Available online: https://github.com/kliu20/jRAPL/ (accessed on 14 October 2025).
42. Liao, S.; Xie, Y.; Lin, X.; Wang, Y.; Zhang, M.; Yuan, B. Reduced-Complexity deep neural networks design using Multi-Level compression. IEEE Trans. Sustain. Comput. 2017, 4, 245–251.
43. Menghani, G. Efficient Deep Learning: A survey on making deep learning models smaller, faster, and better. ACM Comput. Surv. 2023, 55, 259.
44. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
45. Monitoring of CO₂ Emissions from Passenger Cars. Available online: https://co2cars.apps.eea.europa.eu/ (accessed on 12 September 2025).
46. Electricity: Which Appliances Consume the Most in Your Home? Act for the Ecological Transition, 11 November 2025. Available online: https://agirpourlatransition.ademe.fr/particuliers/economiser/energie/consommation-appareils-menagers (accessed on 15 September 2025).
47. Cheng, Y.; Wang, D.; Zhou, P.; Zhang, T. A survey of model compression and acceleration for deep neural networks. arXiv 2017.
48. Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A comprehensive survey on transfer learning. Proc. IEEE 2020, 109, 43–76.
49. Xu, Y.; Khan, T.M.; Song, Y.; Meijering, E. Edge deep learning in computer vision and medical diagnostics: A comprehensive survey. Artif. Intell. Rev. 2025, 58, 3.

Figure 1. Diagrammatic representation of Deep Green Framework.

Figure 2. Energy, time and carbon footprints graphical data for CIFAR-10.

Figure 3. Energy, time and carbon footprints graphical data for CIFAR-100.

Figure 4. DRAM energy and memory graphical data for CIFAR-10.

Figure 5. DRAM energy and memory graphical data for CIFAR-100.

Figure 6. Energy and accuracy graphical data for CIFAR-10/100.

Table 1. Edge deep learning specifications and requirements.

Specification	Requirements
Deep Learning Specification	• Low task latency: total processing time for a deep learning task, where a task can be either training or inference; • Model effectiveness: performance metrics such as Top-K accuracy, F1-score, and Area Under the Curve (AUC) are generally used to evaluate classifiers; • Adaptation to generalization: models should generalize the intrinsic patterns instead of memorizing them, since memorization leads to overfitting when unseen tasks are introduced; • Enhanced privacy and security: data are processed locally on the device to minimize the information shared over the network; • Annotated data independence: supervised learning requires a large amount of labeled data to train and perform inference, but data annotation is labor-intensive and time-consuming. The ability of an Edge DL network to solve classification or regression.
Edge Computing Specifications	• Computational efficiency: processing resources such as CPU/GPU, measured in floating-point operations per second (FLOPS), and the amount of memory in MB needed to complete a task. Maximum Resident Set Size (MaxRSS) is used to measure the memory that a process uses during the DL task; • Optimized bandwidth: the amount of data transferred over the network per task (MB/Task). Frequent and substantial data exchanges over the network can increase communication and task latency, so Edge DL platforms are expected to balance on-device processing and network data transfer for optimal bandwidth utilization; • Offline capability: the capability to solve a DL task when network connectivity is lost or not required.
Overall Specifications	• Availability: edge devices process data in real time and must remain operational to prevent downtime; • Reliability: the capability of a system to carry out its necessary tasks for a certain amount of time, commonly measured by Failure Rate and Mean Time Between Failures (MTBF); • Energy efficiency: useful work done per unit of energy used, where useful work means DL tasks (Task/J); • Cost efficiency: the entire cost of completing one DL task in an edge environment.

Table 2. Statistical measures and parameters for energy and emission assessment.

Measure Name	Notation
Energy	$E_i = P_i \times T_i$, where • $P_i$: power consumed during each model execution • $T_i$: execution time, i.e., the training and inference task latency of each model
Energy Efficiency	$EE_i = \frac{Acc_i}{E_i}$ • $EE_i$: base energy efficiency of a model • $Acc_i$: accuracy of the individual model • $E_i$: energy readings
Relative Energy Efficiency	$REE_i = \left(\frac{EE_i}{EE_{max}}\right) \times 100$ • $EE_{max}$: maximum energy efficiency, i.e., that of the best model • $REE_i$: percentage efficiency of model $i$ relative to the best-performing CNN, enabling comparison across architectures of varying sizes and efficiency demands
Carbon Dioxide Emission	$CO_{2eq} = E \times CI$ • $E$: energy consumed by the computational device in kWh • $CI$: carbon intensity of the electricity consumed by the computational device in gCO₂/kWh
Equivalent Usage Emission	$EUE = \frac{CO_{2eq}}{EF}$ • $EF$: emission factor (Car = 0.12 and TV = 0.084) [45,46]

Table 3. Results for CIFAR-10 and CIFAR-100.

Classifier	CIFAR-10 Energy (J)	CIFAR-10 Time (s)	CIFAR-10 Mem (GB)	CIFAR-10 Top-1 Accuracy (%)	CIFAR-10 Top-3 Accuracy (%)	CIFAR-100 Energy (J)	CIFAR-100 Time (s)	CIFAR-100 Mem (GB)	CIFAR-100 Top-1 Accuracy (%)	CIFAR-100 Top-3 Accuracy (%)	Relative Efficiency (%)
MobNetV3Small_LW	25,992	216	6.19	91.09	99.04	124,356	1001	6.34	77.5	92.65	100
MobNetV1_LW	57,380	418	6.07	92.45	99.21	264,159	1925	6.23	78.49	92.58	47.41
MobNetV2_LW	65,899	491	6.18	90.4	98.82	307,491	2272	6.34	75.41	90.77	39.60
EffNetB0V2_LW	95,285	724	5.06	94.01	99.45	462,996	3526	6.22	84.76	96.11	28.56
InceptionV3_HW	110,369	778	6.41	94.79	99.56	516,232	3629	6.53	81.72	94.25	25.12
NASNetMob_LW	108,534	968	7.09	90.43	98.63	578,874	4406	7.29	80.1	93.52	22.12
EffNetB0_LW	137,995	1045	6.2	94.13	99.36	664,902	4990	6.4	83.71	95.88	19.75
ResNet50_HW	143,695	1132	6.4	94.18	99.58	774,934	5394	6.56	81.19	94.15	17.02
DenseNet_HW	179,481	1295	6.66	95.72	99.69	853,042	6144	6.77	80.27	94.17	15.20
EffNetB1_HW	193,568	1459	6.33	94.89	99.68	928,808	6970	6.42	85.08	96.31	14.30
VGG16_HW	222,019	1509	5.85	91.03	99.1	1,162,817	7166	6.07	71.69	89.01	10.48
ResNet101_HW	264,034	1857	6.82	95.73	99.63	1,269,484	8833	6.96	82.63	94.7	10.37
VGG19_HW	262,008	1745	5.84	91.33	98.91	1,384,247	8410	6.08	73.53	89.65	8.93
ConvNeXtTiny_LW	350,932	2673	6.32	96.97	99.77	1,711,460	12,911	6.38	86.16	96.41	7.92
ConvNeXtSmall_LW	576,910	4370	6.77	97.54	99.88	2,777,480	21,096	6.85	87.39	96.71	4.92

Table 4. Spearman correlation.

Relationship	Spearman ρ	Correlation
Energy and Time	0.99	Very Strong
Energy and Memory	0.71	Strong
Memory and Time	0.68	Strong
Energy and Accuracy	0.45	Weak
Interpretation scale: • 0.80 ≤ ρ ≤ 1.00: Very Strong • 0.60 ≤ ρ < 0.80: Strong • 0.40 ≤ ρ < 0.60: Moderate • 0.20 ≤ ρ < 0.40: Weak • 0.10 ≤ ρ < 0.20: Very Weak

Table 5. Top five models for edge inference.

Classifier	CIFAR-10 Inference Latency (s)	CIFAR-10 Per-Sample Latency (ms)	CIFAR-10 Throughput (inf/s)	CIFAR-10 Inference Backend	CIFAR-100 Inference Latency (s)	CIFAR-100 Per-Sample Latency (ms)	CIFAR-100 Throughput (inf/s)	CIFAR-100 Inference Backend
MobNetV3 Small	4.85	0.48	2063	GPU	4.85	0.49	2060	GPU
	21.64	2.16	462	CPU	30.05	3.01	333	CPU
	16.79	1.68	596	XNNPACK	16.76	1.68	597	XNNPACK
MobNetV1	8.99	0.90	1112	GPU	9.01	0.90	1110	GPU
	69.22	6.92	144	CPU	87.26	8.73	115	CPU
	112.55	11.26	89	XNNPACK	112.37	11.24	89	XNNPACK
NASNetMob	19.13	1.91	523	GPU	19.17	1.92	522	GPU
	98.53	9.85	101	CPU	127.18	12.72	79	CPU
	164.86	16.49	61	XNNPACK	164.13	16.41	61	XNNPACK
EffNetB0V2	13.11	1.31	763	GPU	13.13	1.31	761	GPU
	80.30	8.03	125	CPU	103.64	10.36	96	CPU
	169.06	16.91	59	XNNPACK	169.88	16.99	59	XNNPACK
InceptionV3	15.37	1.54	651	GPU	15.20	1.52	658	GPU
	118.14	11.81	85	CPU	132.99	13.30	75	CPU
	524.20	52.42	19	XNNPACK	526.39	52.64	19	XNNPACK
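
The three latency columns in Table 5 are related by simple arithmetic, sketched below. The sample count of 10,000 is an assumption: it is the size of the standard CIFAR test split and is consistent with the ratios in the table, but this section does not state it. The function names are hypothetical.

```python
N_SAMPLES = 10_000  # assumed number of inference samples (CIFAR test split size)


def per_sample_latency_ms(total_latency_s, n_samples=N_SAMPLES):
    """Average latency per sample, in milliseconds."""
    return total_latency_s / n_samples * 1000.0


def throughput_per_s(total_latency_s, n_samples=N_SAMPLES):
    """Inferences completed per second."""
    return n_samples / total_latency_s


def speedup(baseline_latency_s, latency_s):
    """How many times faster a backend is than the baseline."""
    return baseline_latency_s / latency_s


# MobNetV3 Small on CIFAR-10: total inference latency in seconds, from Table 5.
MOBNETV3_C10 = {"GPU": 4.85, "CPU": 21.64, "XNNPACK": 16.79}

for backend, latency_s in MOBNETV3_C10.items():
    print(
        f"{backend:8s} "
        f"{per_sample_latency_ms(latency_s):5.2f} ms/sample  "
        f"{throughput_per_s(latency_s):7.1f} inf/s  "
        f"{speedup(MOBNETV3_C10['CPU'], latency_s):4.2f}x vs CPU"
    )
```

Values derived this way can differ from the table in the last digit, presumably because the tabulated total latency is itself rounded to two decimals.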

