Article

Strategies of Automated Machine Learning for Energy Sustainability in Green Artificial Intelligence

by Dagoberto Castellanos-Nieves *,† and Luis García-Forte *,†
Computer and Systems Engineering Department, University of La Laguna, 38200 San Cristóbal de La Laguna, Spain
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Appl. Sci. 2024, 14(14), 6196; https://doi.org/10.3390/app14146196
Submission received: 3 June 2024 / Revised: 7 July 2024 / Accepted: 14 July 2024 / Published: 16 July 2024
(This article belongs to the Section Ecology Science and Engineering)

Abstract: Automated machine learning (AutoML) is recognized for its efficiency in facilitating model development, owing to its ability to perform tasks autonomously, without constant human intervention. However, because AutoML automates the development and optimization of machine learning models, it entails high energy consumption due to the large number of computations involved. Hyperparameter optimization algorithms, central to AutoML, can significantly impact its carbon footprint. This work introduces and investigates energy efficiency metrics for advanced hyperparameter optimization algorithms within AutoML. These metrics enable the evaluation and optimization of an algorithm’s energy consumption, considering accuracy, sustainability, and reduced environmental impact. The experimentation demonstrates the application of Green AI principles to AutoML hyperparameter optimization algorithms. It assesses the current sustainability of AutoML practices and proposes strategies to make them more environmentally friendly. The findings indicate a 28.7% reduction in CO2e emissions when the Green AI strategy is implemented instead of the Red AI strategy. This improvement in sustainability is achieved with a minimal decrease of 0.51% in validation accuracy. This study emphasizes the importance of continuing to investigate sustainability throughout the AI life cycle, in line with the three fundamental pillars of sustainable development.

1. Introduction

Artificial intelligence (AI) holds the potential to foster a circular economy and decarbonization, as well as to optimize scientific experiments. However, it also presents significant environmental threats [1]. Information and communication technologies (ICTs) account for up to 9% of global electricity consumption and are projected to reach 20% by 2030, according to the report ‘The Role of Artificial Intelligence in the European Green Deal’ by the European Parliament [2]. This increase translates into heightened greenhouse gas emissions. So-called Red AI, that is, high-energy-consuming AI such as automated machine learning (AutoML), is particularly impactful. The computing resources required to train models, including those produced by AutoML, have been doubling approximately every 3.4 months since 2012, driven by the cost of running an AI model, the size of the training dataset, and the number of hyperparameter experiments conducted. The employment of optimal algorithms and experimental strategies can promote greener AutoML practices [3].
Automated machine learning has evolved from a promising field to a solution aimed at reducing human effort in designing and optimizing machine learning models [4,5]. Given the ubiquity of hyperparameters in machine learning algorithms and the significant impact that a well-tuned hyperparameter configuration can have on predictive performance, hyperparameter optimization (HPO) has emerged as a central problem in AutoML [6,7,8,9,10].
However, AutoML also poses challenges due to its resource and energy consumption, as well as its environmental impact, aspects that are often overlooked. Hyperparameter optimization, facilitated by specialized algorithms, can serve as a pathway to mitigate energy consumption and improve the carbon footprint. Algorithms such as Bayesian optimization HyperBand (BOHB), HyperBand, population-based training (PBT), and the asynchronous successive halving algorithm (ASHA) can mitigate energy consumption in hyperparameter optimization [11,12,13,14]. Therefore, the sustainability implications of computational processes within the context of AutoML and improvements in the aforementioned algorithms are considered important. These processes can be made more sustainable if energy efficiency metrics are meticulously considered.
Green artificial intelligence, also known as Green AI, pertains to the field of AI research that aims to generate innovative outcomes without escalating computational costs, and, ideally, achieving a reduction thereof [3]. In studies with a comprehensive perspective, such as [15,16,17,18], the carbon footprint of computing with AI has been examined through the model development cycle in use cases of machine learning (ML) at an industrial scale, while simultaneously considering the life cycle of the system hardware. This perspective underscores the importance of continuing to investigate the sustainability of AI, particularly in terms of economic, social, and environmental aspects.
This research aims to optimize the hyperparameters of advanced algorithms in the field of automated machine learning, utilizing energy efficiency metrics for the enhancement of their sustainability. In this paper, we make the following contributions:
  • We present a proof of concept in the field of Green AI for advanced hyperparameter optimization algorithms for AutoML.
  • We delve into the development and application of such metrics to evaluate and improve the sustainability of AutoML algorithms.
  • We evaluate the current sustainability of some practices in AutoML and propose strategies to make them more environmentally friendly.
In the ensuing state-of-the-art section, current advances and challenges in the field of AutoML are reviewed, highlighting relevant research on hyperparameter optimization and its environmental implications. Section 3 addresses the problem of hyperparameter optimization in AutoML. Subsequently, in Section 4, the metrics used to evaluate the environmental impact of AutoML algorithms are discussed. The materials and methods utilized in this study are described in Section 5. The results are presented in Section 6, and their implications are discussed in Section 7. Finally, the main conclusions are summarized and future lines of research are proposed in Section 8.

2. State of the Art

In this section, we explore the theoretical foundation of two closely related research areas: hyperparameter optimization in AutoML, and Green AI in the hyperparameter optimization process of AutoML, focusing on generated CO2e.

2.1. Hyperparameter Optimization

Hyperparameter tuning is a critical task for improving the performance of machine learning algorithms, including AutoML [19,20]. Consequently, numerous recent research endeavors in the field of machine learning have concentrated on enhancing methods for optimizing the hyperparameters of ML algorithms [21,22,23].
The extensive adoption of machine learning models, coupled with the implementation of AutoML, has driven a relentless pursuit of performance optimization for machine learning algorithms. This endeavor has underscored the importance of achieving a delicate balance between the performance of these models and their impact on environmental sustainability [24,25]. The exploration of various strategies has been motivated by the search for potential solutions to this critical challenge [26]. The quest for balance in AI involves the selection of efficient models, hyperparameter optimization, the use of knowledge transfer, reinforcement learning techniques, real-time monitoring, and energy-efficient hardware [27,28,29,30,31].
Hyperparameter optimization for machine learning algorithms has been a subject of intense research, with significant advancements evidenced in recent works [20,22,32,33]. In response to the growing need to develop environmentally friendly artificial intelligence, various optimization strategies have emerged [34,35,36]. These strategies, aligned with the principles of Green AI, strive to mitigate the environmental repercussions associated with AI development [37].
Among recent advancements, the work of Bergstra et al. [32] stands out with the Hyperopt library, which focuses on optimizing the hyperparameters of machine learning algorithms. Additionally, Probst et al. [20] have delved into the importance of hyperparameter tuning in machine learning algorithms, highlighting the relevance of this aspect. Claesen and De Moor [33] have addressed the topic of hyperparameter search in machine learning in an article available on arXiv. Yang and Shami [22] have provided both theoretical and practical analyses of HPO in machine learning algorithms.
Regarding optimization strategies for environmentally friendly AI, the systematic review conducted by Verdecchia et al. [34] on Green AI stands out. Yarally et al. [35] have taken preliminary steps toward Green AI, uncovering energy-efficient practices in deep learning training. Candelieri et al. [36] proposed a Green machine learning approach through augmented Gaussian processes and multi-information source optimization. These initiatives seek to integrate energy efficiency into the design and development of AI, aligning with principles of sustainability and environmental responsibility.
Among the most resource-efficient optimization techniques are those based on Bayesian sequential models and their derivatives [38,39,40]. The central purpose of these techniques is to minimize the number of objective function evaluations, which significantly contributes to the reduction of computational expenses. Bayesian optimization, in particular, leverages probabilistic surrogate models to guide the selection of hyperparameters to evaluate, based on previous results [41].
A detailed analysis of recent advancements in the field of surrogate models for modeling problems, feasibility analysis, and optimization, including the Bayesian approach, can be found in the work of Bhosekar and Ierapetritou [42]. Optimization methods based on sequential models, and particularly, Bayesian methods, emerge as promising options by offering principle-based approaches to weigh the importance of each dimension [43]. Furthermore, Yu and Zhu conducted a comparison between optimization algorithms, highlighting the Bayesian approach, and present notable approaches for model evaluation with limited computational resources in their review on HPO [44].
This convergence of evidence supports the idea that Bayesian optimization methods are not only effective in terms of resource efficiency but also offer solid and principle-based approaches for hyperparameter selection. However, they do not focus on solutions based on Green AI.

2.2. Green AI in Hyperparameter Optimization

The prevailing research efforts within the machine learning domain primarily concentrate on monitoring the carbon footprint, model benchmarking, and hyperparameter optimization, as evidenced by Verdecchia et al. [34]. However, cutting-edge methodologies are actively pursuing the reduction of iterations essential for identifying optimal hyperparameters. Approaches exemplified by the work of Stamoulis et al. [45] significantly deviated from conventional hyperparameter tuning procedures, demonstrably diminishing the energy expenditure associated with exploring the hyperparameter space to arrive at the optimal configuration. Furthermore, De Chavannes et al. [46] provided a compelling demonstration of how incorporating energy consumption considerations into the hyperparameter tuning process can ultimately promote the development of models characterized by superior energy efficiency.
In a related framework, Rajput et al. [47] presented a detailed approach to measuring energy consumption in deep learning (DL), examining the impact of parameter size and runtime on energy consumption. These contributions not only broaden the research landscape on energy efficiency in machine learning but also underscore the importance of considering factors beyond model accuracy in the search for optimal hyperparameters. In the study by Ferro et al. [37], it is concluded that the selection of certain simple parameters can have a significant impact on energy consumption and, consequently, lead to the discovery of more environmentally friendly strategies for artificial intelligence.
By highlighting these approaches, the diversity of strategies that go beyond mere carbon footprint evaluation is recognized. These investigations not only enrich the understanding of hyperparameter optimization but also offer valuable insights for the development of more energy-sustainable machine learning models.
Recent studies in HPO underscore the importance of multi-objective approaches, particularly in the context of model sustainability and efficiency. These studies, including those by Morales et al. [23], Ali et al. [48], and Kim et al. [49], employ multi-objective optimization techniques and evolutionary algorithms to simultaneously address multiple objectives such as precision, latency, and energy efficiency. Furthermore, Wistuba et al. [50] presented a unifying and categorizing overview of existing methods, including restricted and multi-objective architecture search, automatic data augmentation, optimizer, and activation function search.
Kim et al. [49] proposed Nemo, a methodology that utilizes neuroevolution alongside multi-objective optimization strategies for deep neural networks to simultaneously enhance velocity and accuracy. Ali et al. [48] concentrated their research on hyperparameter optimization processes for machine learning algorithms, targeting the minimization of computational complexity. Furthermore, Morales et al. [23] delivered an in-depth examination of multi-objective optimization techniques specifically applied to the hyperparameter tuning within the machine learning paradigm.
Wilson et al. [51] demonstrated a hybrid human–AI methodology that incorporated human expertise to boost machine learning model efficiency. Han et al. [52] described model efficiency improvements through pruning, which removes superfluous model connections and parameters. Knowledge distillation, originally proposed by Hinton et al. [53] and refined by Yang et al. [54], facilitates efficient knowledge transfer from a complex teacher model to a compact student model, enhancing machine learning sustainability with reduced resource consumption. Zoph et al. [55] advocated for quantization to decrease the computational and environmental burden of ML models by lowering the precision of numerical representations, benefitting resource usage and inference speed while maintaining accuracy.
In the research conducted by Castellanos et al. [56], energy efficiency metrics were employed to fine-tune hyperparameters using both random and Bayesian search strategies. The primary objective was to ameliorate the environmental footprint in accordance with the tenets of Green AI. The findings from this study indicate that AutoML can achieve greater sustainability when the energy efficiency of computational processes is carefully taken into account.

3. Hyperparameter Optimization Problem

In this section, the hyperparameter optimization problem is presented. A brief analysis of the hyperparameter optimization algorithms, such as Bayesian optimization HyperBand (BOHB) [12], HyperBand [11], population-based training (PBT) [13], and the asynchronous successive halving algorithm (ASHA) [14], which can mitigate energy consumption in hyperparameter optimization, is provided. These methods constitute the main approaches in the experimentation.

3.1. Definitions of Hyperparameter Optimization

In the domain of machine learning, hyperparameter optimization constitutes the process of identifying the optimal combination of hyperparameter values for a model, with the objective of maximizing its performance on a designated evaluation dataset. This pursuit can be framed as a mathematical optimization problem, as described in the work of Lorenzo et al. [57].
The goal is to find $T^*$ such that (Equation (1)):

$$T^* = \underset{T}{\arg\max}\; f(T) \quad \text{subject to} \quad T \in \mathcal{C} \tag{1}$$

In this context, $T$ denotes a set of hyperparameter values (a configuration). The objective function $f(T)$ assesses the model’s performance based on a specific hyperparameter configuration $T$. The constraints $\mathcal{C}$ encompass any limitations or requirements that must be met by hyperparameter configurations. These constraints may involve allowable values for hyperparameters, restrictions on computational resources (such as maximum runtime or available memory), or any other relevant conditions for the given problem.
The optimal hyperparameter configuration $T^*$ is the one that maximizes the objective function $f(T)$ while satisfying the constraints $\mathcal{C}$.
The proposal formulates the search for the set of hyperparameters $T^*$ that minimizes the equivalent carbon dioxide (CO2e) emissions while maximizing the model’s accuracy within an iterative hyperparameter optimization process (Equation (2)):

$$T^* = \underset{T \in \mathcal{C}}{\arg\min}\; \mathrm{CO_2e}\big(T^*_{\mathrm{accuracy}}(T)\big) \tag{2}$$

where $T$ represents a set of hyperparameters; $\mathcal{C}$ denotes the search space of all possible sets of hyperparameters; $\mathrm{accuracy}(M)$ denotes the accuracy metric of a model $M$; $\mathcal{M}(T)$ represents the set of all possible models that can be trained given the hyperparameters $T$; and $\mathrm{CO_2e}$ denotes a measure of equivalent carbon dioxide emissions, representing the environmental impact of training a model under a given set of hyperparameters. $T^*_{\mathrm{accuracy}}(T)$ denotes the hyperparameter set that maximizes accuracy over the model space $\mathcal{M}(T)$; formally (Equation (3)):

$$T^*_{\mathrm{accuracy}}(T) = \underset{M \in \mathcal{M}(T)}{\arg\max}\; \mathrm{accuracy}(M) \tag{3}$$

This means that, for a given set of hyperparameters $T$, we select the model that achieves the highest accuracy.
This dual optimization approach ensures that the model not only achieves high performance in terms of accuracy but also minimizes its environmental impact, providing a balanced and sustainable solution in the field of AI and AutoML.
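As a concrete illustration of Equations (2) and (3), the following Python sketch (with hypothetical field names; the tolerance argument is a practical relaxation, not part of the formal definition) selects, among the accuracy-maximizing trials, the one with the lowest CO2e:

from dataclasses import dataclass

@dataclass
class Trial:
    """One completed HPO trial: a configuration T with its measured outcomes."""
    hyperparameters: dict
    accuracy: float    # validation accuracy of the best model trained under T
    co2e_grams: float  # estimated CO2e emitted while training and evaluating T

def select_green_optimum(trials: list[Trial], tolerance: float = 0.0) -> Trial:
    """Equation (3): identify the accuracy-maximizing trials; Equation (2):
    among them (optionally within `tolerance` of the best accuracy), return
    the trial with the lowest CO2e emissions."""
    best_accuracy = max(t.accuracy for t in trials)
    near_optimal = [t for t in trials if t.accuracy >= best_accuracy - tolerance]
    return min(near_optimal, key=lambda t: t.co2e_grams)

With tolerance set to zero, this is a literal reading of Equation (2); a small positive tolerance trades a marginal accuracy loss for larger emission savings, which is the trade-off explored experimentally later in the paper.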

3.2. HyperBand Optimization Algorithm

The HyperBand algorithm, based on the principle of adaptively allocating resources to a hyperparameter configuration [11], is designed to expedite the optimization process. Its primary strength lies in the use of early stopping to concentrate computational resources on promising configurations. This approach, coupled with the division of the optimization process into rounds, streamlines the search for optimal hyperparameters.
Despite its simplicity and ease of implementation, which make it an attractive choice for certain applications, HyperBand’s reliance on random sampling of hyperparameter configurations can lead to suboptimal solutions. Therefore, fine-tuning the parameters that govern the early stopping mechanism is crucial for achieving optimal performance. The formalization of the algorithm is presented in Algorithm 1.
Algorithm 1 HyperBand.
Input: R (maximum resources per configuration), η (resource reduction factor)
Output: the best hyperparameter configuration
procedure HyperBand(R, η)
    s_max ← ⌊log_η R⌋
    B ← (s_max + 1) · R
    for s = s_max down to 0 do
        n ← ⌈(B/R) · η^s / (s + 1)⌉
        r ← R · η^(−s)
        T ← Sample(n)
        for i = 0, …, s do
            n_i ← ⌊n · η^(−i)⌋
            r_i ← r · η^i
            L ← Run(T, r_i)
            T ← Top(⌊n_i / η⌋, L)
        end for
    end for
    return best configuration of T
end procedure
In this context, R represents the maximum number of resources that can be allocated to a single configuration, while η is a resource reduction factor. The maximum number of iterations is denoted by s_max, and B signifies the total resource budget. The number of configurations to test is represented by n, and r is the number of resources allocated to each configuration. The set of hyperparameter configurations is represented by T. The function Sample(n) randomly selects n hyperparameter configurations, Run(T, r_i) runs the model with the configurations in T and r_i resources, and Top(k, L) selects the top k configurations from L based on a specific performance metric.
The choice of R and η can greatly impact the results of the algorithm.
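To make the resource schedule of Algorithm 1 concrete, the following Python sketch enumerates the brackets and their successive-halving rounds (n_i, r_i). It reproduces only the bookkeeping of HyperBand; Sample, Run, and Top are omitted:

import math

def hyperband_schedule(R: int, eta: int = 3):
    """Enumerate the brackets of Algorithm 1: for each s, yield the list of
    successive-halving rounds (n_i, r_i) derived from n and r."""
    s_max = int(math.floor(math.log(R, eta)))
    B = (s_max + 1) * R
    for s in range(s_max, -1, -1):
        n = int(math.ceil((B / R) * eta**s / (s + 1)))
        r = R * eta**(-s)
        rounds = [(int(math.floor(n * eta**(-i))), r * eta**i)
                  for i in range(s + 1)]
        yield s, rounds

for s, rounds in hyperband_schedule(R=81, eta=3):
    print(f"bracket s={s}: {rounds}")

With R = 81 and η = 3, the most aggressive bracket (s = 4) starts 81 configurations with 1 unit of resource each and keeps only one survivor at 81 units, while the most conservative bracket (s = 0) runs 5 configurations with the full budget from the start.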

3.3. Bayesian Optimization HyperBand

The Bayesian optimization HyperBand algorithm, which combines the strengths of Bayesian optimization and the HyperBand algorithm, is designed for HPO [12].
BOHB’s efficiency is derived from its integration of HyperBand’s early stopping mechanism and Bayesian optimization. The latter allows for a more informed selection of hyperparameter configurations, thereby enhancing the exploration of the search space. This combination results in a robust and efficient algorithm. However, the performance of BOHB may be influenced by the quality of the initial surrogate model used in Bayesian optimization [12]. Furthermore, the computational overhead associated with Bayesian optimization can pose a limitation. The formalization of the algorithm is presented in Algorithm 2.
In this context, R denotes the maximum number of resources that can be allocated to a single configuration, and η is a resource reduction factor. The objective function f measures the model’s performance for a given hyperparameter configuration, while D is the density model used for sampling these configurations. The maximum number of iterations is represented by s_max, and B is the total resource budget. The number of configurations to test is given by n, and r is the number of resources allocated to each configuration. The set of hyperparameter configurations is denoted by T. The function Sample(n, D) selects n hyperparameter configurations from the distribution D, and Run(T, r_i, f) runs the model with the configurations in T and r_i resources, returning the results of the objective function f. The function Top(k, L) selects the top k configurations from L based on a specific performance metric, and Update(D, L) updates the density model D with the results L of the objective function.
Algorithm 2 Bayesian optimization HyperBand.
Input: R (maximum resources per configuration), η (resource reduction factor), f (objective function), D (density model)
Output: the best hyperparameter configuration
procedure BOHB(R, η, f, D)
    s_max ← ⌊log_η R⌋
    B ← (s_max + 1) · R
    for s = s_max down to 0 do
        n ← ⌈(B/R) · η^s / (s + 1)⌉
        r ← R · η^(−s)
        T ← Sample(n, D)
        for i = 0, …, s do
            n_i ← ⌊n · η^(−i)⌋
            r_i ← r · η^i
            L ← Run(T, r_i, f)
            T ← Top(⌊n_i / η⌋, L)
            D ← Update(D, L)
        end for
    end for
    return best configuration of T
end procedure
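As an illustration of the Sample(n, D) and Update(D, L) steps, the following simplified, TPE-style Python sketch fits separate kernel density estimators to good and bad configurations and samples candidates by density ratio. It uses scikit-learn and assumes enough observations to fit both densities; it is one possible stand-in, not BOHB’s exact model:

import numpy as np
from sklearn.neighbors import KernelDensity

def update_density_model(configs: np.ndarray, losses: np.ndarray, gamma: float = 0.15):
    """Stand-in for Update(D, L): fit one KDE to the best gamma-fraction of
    observed configurations ('good') and one to the rest ('bad')."""
    order = np.argsort(losses)
    n_good = max(2, int(gamma * len(configs)))
    good = KernelDensity(bandwidth=0.1).fit(configs[order[:n_good]])
    bad = KernelDensity(bandwidth=0.1).fit(configs[order[n_good:]])
    return good, bad

def sample_from_model(good, bad, n_candidates: int = 64) -> np.ndarray:
    """Stand-in for Sample(n, D): draw candidates from the 'good' density and
    keep the one maximizing the log density ratio, mirroring a TPE-style
    acquisition."""
    candidates = good.sample(n_candidates)
    log_ratio = good.score_samples(candidates) - bad.score_samples(candidates)
    return candidates[np.argmax(log_ratio)]

# Example with a 2-D continuous configuration space (illustrative data):
rng = np.random.default_rng(0)
configs = rng.uniform(size=(30, 2))
losses = rng.uniform(size=30)
good, bad = update_density_model(configs, losses)
print(sample_from_model(good, bad))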

3.4. Population-Based Training

The population-based training algorithm is an optimization method that utilizes a diverse set of models and their hyperparameters to optimize performance [13]. PBT’s strength stems from its ability to adapt to shifting optima in the optimization landscape, facilitated by maintaining a varied population of models. The algorithm periodically replaces underperforming models with modified versions of superior ones, enhancing the balance between exploration and exploitation. PBT’s efficiency is highlighted by its lower computational resource demand compared to other methods. However, its effectiveness may depend on the appropriate configuration of hyperparameters that govern the population size and the exploration-exploitation trade-offs. The formalization of the algorithm is presented in Algorithm 3.
Algorithm 3 Population-based training.
Input: P (population of models), f (objective function), R (available resources)
Output: the best hyperparameter configuration
procedure PBT(P, f, R)
    P ← Initialize(P)
    while R > 0 do
        P ← Train(P)
        f(P) ← Evaluate(P, f)
        P ← Sort(P, f(P))
        P_bottom ← Bottom(P, 0.2)
        P_top ← Top(P, 0.2)
        P_bottom ← Copy(P_top)
        P_bottom ← Perturb(P_bottom)
        R ← R − UsedResources
    end while
    return arg max_{p ∈ P} f(p)
end procedure
In this context, Initialize(P) is a function that initializes P with random hyperparameter configurations, and Train(P) is a function that trains each model in P for a fixed number of steps. The function Evaluate(P, f) evaluates f for each model in P, and Sort(P, f(P)) sorts P based on the values of f. The functions Bottom(P, 0.2) and Top(P, 0.2) select the bottom 20% and top 20% of P, respectively. The function Copy(P_top) creates copies of the hyperparameter configurations in P_top, and Perturb(P_bottom) perturbs the hyperparameter configurations in P_bottom. Lastly, UsedResources represents the amount of resources used in the current iteration of the loop.
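A minimal Python sketch of one exploit/explore step of Algorithm 3 follows; the population members are assumed to be dictionaries with a hypothetical "lr" key, and the perturbation factors are illustrative:

import random

def pbt_step(population, scores, frac=0.2):
    """One exploit/explore step: rank the population by score, let the bottom
    `frac` copy the hyperparameters of the top `frac` (exploit, i.e.
    Copy(P_top)), then perturb the copied values (explore, i.e. Perturb)."""
    ranked = sorted(range(len(population)), key=lambda i: scores[i])
    k = max(1, int(frac * len(population)))
    bottom, top = ranked[:k], ranked[-k:]
    for b in bottom:
        copied = dict(population[random.choice(top)])   # Copy(P_top)
        copied["lr"] *= random.choice([0.8, 1.2])       # Perturb(P_bottom)
        population[b] = copied
    return population

# Illustrative usage with random scores standing in for Evaluate(P, f):
population = [{"lr": 10 ** random.uniform(-4, -1)} for _ in range(10)]
scores = [random.random() for _ in population]
population = pbt_step(population, scores)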

3.5. Asynchronous Successive Halving

The asynchronous successive halving algorithm is a scheduling algorithm used in HPO [14,58]. ASHA’s robustness stems from its ability to handle large-scale problems effectively. By leveraging parallelism, it optimizes resource utilization through concurrent evaluation of multiple configurations. The incorporation of aggressive early stopping ensures that computational resources are not expended on unpromising configurations. Notably, ASHA’s scalability in distributed environments allows it to adapt efficiently to various computational infrastructures. However, achieving optimal performance may require careful tuning of hyperparameters.
ASHA operates on the fundamental principle of iteratively partitioning the hyperparameter search space into several rounds or brackets. In each round, a subset of hyperparameter configurations is evaluated, and the best ones advance to the next round. This process continues until the optimal configuration is identified. The formalization of the algorithm is presented in Algorithm 4.
Algorithm 4 Asynchronous successive halving algorithm.
Input: minimum resource r, maximum resource R, reduction factor η, minimum early-stopping rate s
Output: the best hyperparameter configuration
procedure ASHA
    repeat
        for each free worker do
            (θ, k) ← GET_JOB()
            run_then_return_val_loss(θ, r · η^(s + k))
        end for
        for each completed job (θ, k) with loss l do
            update configuration θ in rung k with loss l
        end for
    until stopping condition is met
end procedure

procedure GET_JOB
    for k = ⌊log_η(R/r)⌋ − s − 1, …, 1, 0 do
        candidates ← top_k(rung k, ⌊|rung k| / η⌋)
        promotable ← {t ∈ candidates : t not already promoted}
        if |promotable| > 0 then
            return promotable[0], k + 1
        end if
    end for
    θ ← draw a random configuration    ▷ if none is promotable, grow the bottom rung
    return θ, 0
end procedure
In hyperparameter optimization, configurations are allocated resources between the minimum r and the maximum R, scaled by the reduction factor η. Workers test configurations θ, returning losses from the function run_then_return_val_loss. The procedure GET_JOB selects or generates configurations for testing in optimization rounds, promoting configurations based on performance. The minimum early-stopping rate s determines the earliest rung at which configurations can be stopped.
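The promotion rule of GET_JOB can be sketched in a few lines of Python. Here, rungs maps a rung index to (configuration, loss) pairs, configurations are assumed to be hashable tuples, and draw_random is a user-supplied sampler (all names are illustrative):

def get_job(rungs, promoted, eta=3, max_rung=4, draw_random=None):
    """Scan rungs from top to bottom; promote the best not-yet-promoted
    configuration in the top 1/eta of its rung, else grow the bottom rung."""
    for k in range(max_rung - 1, -1, -1):
        rung = sorted(rungs.get(k, []), key=lambda entry: entry[1])  # ascending loss
        candidates = [cfg for cfg, _ in rung[: len(rung) // eta]]    # top_k(rung k, |rung k|/eta)
        promotable = [cfg for cfg in candidates if (k, cfg) not in promoted]
        if promotable:
            promoted.add((k, promotable[0]))
            return promotable[0], k + 1          # promote to rung k + 1
    return draw_random(), 0                      # none promotable: grow bottom rung

# Illustrative usage: three configurations on the bottom rung, eta = 3.
rungs = {0: [(("lr", 0.01), 0.35), (("lr", 0.1), 0.52), (("lr", 0.001), 0.41)]}
promoted = set()
print(get_job(rungs, promoted, eta=3, max_rung=3,
              draw_random=lambda: ("lr", 0.05)))  # promotes ("lr", 0.01) to rung 1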

3.6. Relationship between Hyperparameter Configuration and Convergence Efficiency

The algorithms HyperBand, BOHB, PBT, and ASHA are specifically designed to optimize hyperparameters in the field of machine learning and AutoML. Each of these algorithms implements a unique strategy to reduce the time needed for convergence.
  • HyperBand is an algorithm that utilizes the concept of non-stochastic bandits to expedite the process of random search. This is achieved through the adaptive allocation of resources and the implementation of early stopping. The problem of hyperparameter optimization is formulated as a pure exploration problem involving non-stochastic infinite-armed bandits. In this scenario, a predefined resource, such as iterations, data samples, or features, is allocated to configurations that have been randomly sampled.
  • BOHB is an algorithm that amalgamates the strengths of both HyperBand and Bayesian optimization. This results in a robust and efficient method that consistently outperforms both Bayesian optimization and HyperBand across a diverse range of problem types.
  • PBT, an acronym for population-based training, is an evolutionary approach that continually adjusts a set of hyperparameters during the training of a model. This is in contrast to traditional methods that maintain a fixed set of hyperparameters. PBT allows for the evolution of these hyperparameters over time.
  • ASHA, short for the asynchronous successive halving algorithm, is an asynchronous variant of HyperBand. This algorithm leverages parallelism and aggressive early stopping to tackle large-scale hyperparameter optimization problems. ASHA surpasses existing methods of hyperparameter optimization, scales linearly with the number of workers in distributed environments, and is well-suited for massive parallelism.
The reduction in convergence time can be formalized by the relationship between the number of evaluated configurations and the necessary training time to achieve a desired performance (Equation (4)). For instance, if T represents the total training time and N denotes the number of configurations, algorithms such as HyperBand and ASHA aim to minimize T while maximizing the quality of the selected configuration. This can be expressed as follows:
$$\min \sum_{i=1}^{N} T_i \quad \text{subject to} \quad \mathrm{performance}(\mathrm{configuration}_i) \geq \text{desired threshold} \tag{4}$$

Here, $T_i$ is the training time for configuration $i$, and $\mathrm{performance}(\mathrm{configuration}_i)$ is a measure of the quality of configuration $i$.
The algorithms HyperBand, BOHB, PBT, and ASHA, among others in the field of hyperparameter optimization, do not necessarily guarantee a reduction in convergence time in all cases. This is because it depends on the nature of the hyperparameter space and the objective function. However, they have demonstrated their effectiveness across a broad spectrum of problems and are particularly beneficial when parallel or distributed computational resources are available [11,12,58,59].

4. Metrics with Environmental Implications

This section delves into various metrics employed to assess the environmental implications of computational processes [60,61,62,63]. From the fundamental measure of runtime to more nuanced evaluations such as central processing unit (CPU)/graphics processing unit (GPU) or tensor processing unit (TPU) hours and random access memory (RAM) energy consumption, each metric provides unique insights into the interplay between computational efficiency and environmental sustainability [24,64,65,66,67].

4.1. Runtime

Runtime, a key measure of program execution duration, is linked to energy consumption and carbon footprint estimation, despite not directly indicating energy efficiency. Its ease of measurement makes it a practical tool for performance evaluation across different platforms, although it requires additional data such as hardware power consumption and the composition of the power mix.

4.2. Energy Consumption of CPUs, GPUs, and TPUs

Quantifying the energy consumption of central processing units (CPUs), graphics processing units (GPUs), and tensor processing units (TPUs) provides a practical and straightforward method for assessing environmental impact. However, this metric can be ambiguous, as its calculation can be based on both the actual clock time and the actual activity time, which can lead to different interpretations of the environmental impact.
Despite its usefulness, counting usage hours is a suboptimal indicator of efficiency, largely due to its dependence on the specific hardware. Even so, this metric remains one of the most practical, thanks to its ease of measurement and the relative simplicity of estimating the carbon footprint from it. This, of course, assumes that the devices consume a constant amount of energy and that the composition of the energy mix used is known.
Below, we present the key mathematical expressions for comprehending and quantifying the environmental impact of computational processes, with a specific focus on energy consumption and the associated carbon footprint. The energy consumption of all CPU devices is denoted by $E_{CPU}$ (Equation (5)). This formula calculates the total energy consumed by the CPU over the time period 0 to $n$, taking into account the varying workload:

$$E_{CPU} = \mathrm{TDP}_{CPU} \int_{0}^{n} W_{CPU}(t)\, dt \tag{5}$$

Here, $\mathrm{TDP}_{CPU}$ represents the maximum amount of power the cooling system in a computer is required to dissipate under any workload; it is a constant value specific to the CPU model. $W_{CPU}(t)$ is the workload on the CPU at any given time $t$, expressed as a fraction of the maximum load the CPU can handle.
Analogously, the total energy consumption of all active GPUs, denoted as $E_{GPU}$, is calculated using a similar formula (Equation (6)):

$$E_{GPU} = \mathrm{TDP}_{GPU} \int_{0}^{n} W_{GPU}(t)\, dt \tag{6}$$

where $\mathrm{TDP}_{GPU}$ is analogous to the CPU’s TDP but specific to the GPU model, and $W_{GPU}(t)$ is the workload on the GPU at time $t$.
Correspondingly, the total energy consumption of all active TPUs, denoted as $E_{TPU}$, is calculated using Equation (7). To determine the energy consumption of one or more TPUs, we sum the product of the voltage $V_i$, the current $C_i$, and the usage time $T_{usage}$ of each TPU, and add the base energy consumption of each TPU operating sequentially:

$$E_{TPU} = \sum_{i=1}^{n} (V_i \cdot C_i \cdot T_{usage}) + \sum_{i=1}^{n} TPU_{ec_i} \tag{7}$$

Here, $V_i$ and $C_i$ represent the voltage and current of the $i$-th TPU, $T_{usage}$ is its usage time, and $TPU_{ec_i}$ is the base energy consumption of the $i$-th TPU, since each TPU may have different power requirements.
The total energy consumption of the system’s processing units, denoted as $E_{TCGT}$ (Equation (8)), is obtained by adding the total energy consumption of the CPUs (Equation (5)), GPUs (Equation (6)), and TPUs (Equation (7)):

$$E_{TCGT} = E_{CPU} + E_{GPU} + E_{TPU} \tag{8}$$
The formula allows us to calculate the total energy consumption of the system, taking into account the main processing units, such as CPUs, GPUs, and TPUs [68,69]. This provides us with a comprehensive view of the environmental impact of these processing units collectively. It is important to note that this calculation includes the contribution from all these processing sources, which aids us in better understanding how they affect the system’s total energy consumption when performing algorithm computations.
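In practice, the integrals in Equations (5)-(7) are approximated from sampled utilization traces. The following Python sketch uses a rectangle-rule approximation; the TDP values and utilization samples are illustrative, not measurements from this study:

import numpy as np

def device_energy(tdp_watts: float, workload: np.ndarray, dt_seconds: float) -> float:
    """Rectangle-rule approximation of Equations (5) and (6): energy in joules
    for a device with thermal design power `tdp_watts` whose fractional
    workload W(t) is sampled every `dt_seconds`."""
    return tdp_watts * float(np.sum(workload)) * dt_seconds

def tpu_energy(volts, amps, usage_seconds: float, base_joules) -> float:
    """Equation (7): sum of V_i * C_i * T_usage over all TPUs, plus each
    TPU's base energy consumption."""
    volts, amps, base_joules = map(np.asarray, (volts, amps, base_joules))
    return float(np.sum(volts * amps * usage_seconds) + np.sum(base_joules))

# Equation (8): total energy of the processing units (illustrative samples,
# taken once per second; 65 W and 170 W TDPs are assumed values).
cpu_util = np.array([0.90, 0.80, 0.95])
gpu_util = np.array([0.60, 0.70, 0.65])
e_tcgt = (device_energy(65.0, cpu_util, 1.0)      # E_CPU
          + device_energy(170.0, gpu_util, 1.0)   # E_GPU
          + tpu_energy([0.9], [10.0], 3.0, [0.0]))  # E_TPU
print(f"E_TCGT = {e_tcgt:.1f} J")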

4.3. Energy Consumption of RAM

Random access memory (RAM) significantly contributes to energy consumption in modern computing, particularly in data-intensive tasks. Accounting for RAM energy poses challenges due to its dependency on data activities. The energy consumption, $E_{RAM}$, is calculated as follows (Equation (9)):

$$E_{RAM} = 0.375 \int_{0}^{n} M_{RAM_i}(t)\, dt \tag{9}$$

where $M_{RAM_i}(t)$ is the memory (in GB) allocated to the $i$-th RAM module at time $t$, and 0.375 is a constant factor representing the energy rate per GB of allocated memory per unit time. This comprehensive approach allows us to account for the energy consumption of each individual RAM module over time, providing a more accurate estimate of total system energy consumption.
It has been discussed how to calculate the energy consumption of the processing units (CPU, GPU, and TPU) and the RAM. Now, to obtain a comprehensive view of the system’s environmental impact, it is important to consider all these energy consumption sources together. Therefore, we propose a new metric, $E_{Total}$, which sums the energy consumption of the processing units and the RAM (Equation (10)):

$$E_{Total} = E_{TCGT} + E_{RAM} \tag{10}$$
This metric furnishes a holistic understanding of the system’s environmental footprint by amalgamating the energy consumption of CPUs, GPUs, TPUs, and RAM. This results in a total energy consumption metric that encapsulates both processing and memory utilization, thereby offering a more expansive perspective. By including RAM in our metric, we can more accurately capture energy consumption in data-intensive tasks, enhancing our understanding of environmental impact [68,69].

4.4. Assessing the Carbon Footprint or CO2e

Understanding the environmental impact of AutoML experiments requires a multi-faceted approach, considering both energy consumption and its broader implications. While directly measuring energy expenditure does not provide a perfect evaluation of specific algorithms, it serves as a valuable indicator of the overall environmental footprint. Quantifying energy consumption, despite its limitations in assessing internal efficiency, provides a commendable starting point for understanding the experiment-specific environmental impact. Hardware efficiency plays a crucial role in this assessment, as it significantly influences how much energy AutoML algorithms consume during execution.
Accurately quantifying the carbon footprint (CF), also known as CO2e, associated with AI practices presents additional challenges beyond merely measuring energy consumption. Factors such as geographical location, execution duration, and various other parameters contribute to the complexity of this metric. However, despite these inherent difficulties, measuring CO2e remains essential for gaining a comprehensive understanding of the environmental impact of AI operations. This comprehensive approach to calculating the carbon footprint involves considering energy consumption alongside other critical factors such as the emission intensity ($\gamma$), which can be defined as kg of CO2 equivalent per kWh, showing how much CO2 is emitted per unit of energy consumed, and the power usage effectiveness ($PUE$), a metric used to determine the energy efficiency of a data center by comparing the total facility energy to the energy consumed by its IT equipment. Employing this comprehensive methodology, as outlined in Equation (11), fosters a deeper understanding of the sustainability and eco-friendliness of AI practices.
$$CF = \gamma \cdot PUE \cdot E_{Total} \tag{11}$$
By precisely measuring energy use and considering hardware efficiency and execution conditions, we can confidently approximate CO2e emissions generated during AutoML processes, thus offering a more comprehensive assessment of AutoML’s environmental implications. This comprehensive approach considers not only the total energy consumption but also the key parameters mentioned above. Including these factors with the total energy consumption of the AI components, the equivalent CO2e rate is obtained in Equation (12), providing a measure of the carbon emissions associated with the AI operation, enabling a deeper understanding of their sustainability and eco-friendliness.
$$\mathrm{CO_2e} = \gamma \cdot PUE \cdot E_{Total} \tag{12}$$
These calculations provide an integral view of both the direct energy consumption and its resultant environmental impact through carbon emissions. Understanding these metrics helps evaluate the sustainability of computational processes effectively.
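A short Python sketch tying Equations (9), (10), and (12) together follows; the grid intensity and PUE values are illustrative assumptions, not measurements from this study:

def ram_energy(mem_gb_samples, dt_seconds: float, watts_per_gb: float = 0.375) -> float:
    """Equation (9): rectangle-rule approximation of 0.375 * integral of
    M_RAM(t) dt, in joules, from memory samples in GB taken every dt_seconds."""
    return watts_per_gb * sum(mem_gb_samples) * dt_seconds

def co2e_grams(e_tcgt_joules: float, e_ram_joules: float,
               gamma_g_per_kwh: float, pue: float) -> float:
    """Equations (10) and (12): E_Total = E_TCGT + E_RAM, then
    CO2e = gamma * PUE * E_Total, with gamma in grams of CO2e per kWh."""
    e_total_kwh = (e_tcgt_joules + e_ram_joules) / 3.6e6  # 1 kWh = 3.6e6 J
    return gamma_g_per_kwh * pue * e_total_kwh

# Illustrative numbers: 0.5 kWh of processing energy, 0.05 kWh of RAM energy,
# an assumed grid intensity of 200 g CO2e/kWh, and a data-center PUE of 1.5:
print(co2e_grams(0.5 * 3.6e6, 0.05 * 3.6e6, gamma_g_per_kwh=200.0, pue=1.5))
# -> 165.0 grams of CO2e for this hypothetical run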

5. Materials and Methods

In this section, we detail the methodology employed in our experimental study, which is focused on validating the energy efficiency of hyperparameter optimization algorithms to promote the development of more sustainable AI. The validation consists of the development of a proof of concept (PoC) prototype, in which a series of experiments are conducted. During these tests, various metrics specifically aimed at evaluating energy efficiency and sustainability are selected and applied, with the goal of affirming the effectiveness of the algorithms within the context of Green AI.

5.1. Methodology

The proposed methodology encompasses several phases. Initially, a representative deep learning dataset is selected, one with minimal demands for storage and processing. Subsequently, the characteristics and configurations of the algorithms chosen for hyperparameter optimization in the domain of AutoML are defined. Following this, metrics to evaluate sustainability are identified, and experimentation is conducted using both Red AI metrics and Green AI metrics for comparison (Figure 1). A PoC for AutoML with a quantifiable carbon footprint is then developed, utilizing the elements from the aforementioned phases. Finally, a comprehensive evaluation of the proposal is conducted.
The implementation of the proposed methodology and hyperparameter optimization strategies focused on energy efficiency, coupled with the evaluation of the environmental impact, provides empirical evidence validating the hypothesis that such practices contribute to enhancing sustainability in the realm of Green AI. If the experiments conducted within the realm of Green AI exhibit a decrease in energy consumption and carbon footprint relative to Red AI, without significantly undermining performance, it substantiates the hypothesis that optimization predicated on energy parameters presents a viable route toward a more sustainable artificial intelligence.

5.2. Datasets

For this study, the CIFAR-10 dataset, developed by Krizhevsky [70], and the IMDb dataset, obtained through the Keras framework [71], were chosen. The CIFAR-10 dataset consists of 60,000 color images of 32 × 32 pixels, labeled into 10 mutually exclusive classes. As for the annotation of the data, the CIFAR-10 dataset is a labeled subset of the 80 million tiny images dataset, collected from various online sources. The images were reviewed and classified manually by humans, ensuring a high degree of accuracy in the image labels. During the preprocessing of the datasets, channel normalization was performed to promote equitable learning from the neural network across the different color channels. This technique is essential to ensure that the features of the various channels are considered equitably during the network’s learning process.
CIFAR-10 is widely recognized in the field of machine learning and is commonly used for tasks such as image classification, anomaly detection, and image generation. For the data division, 80% was allocated for the training set and 20% for testing. This partition aligns with conventional practices and ensures an appropriate evaluation of the model’s performance. The choice of CIFAR-10 as a dataset provides a solid foundation for evaluating and comparing AutoML models in a standard and well-established scenario.
The IMDb dataset is a popularly used dataset for sentiment analysis in text. This dataset contains 50,000 movie reviews labeled as positive or negative, making it a suitable dataset for binary sentiment classification. The dataset contains a balanced number of positive and negative reviews that were written by humans. Only highly polarizing reviews are considered for constructing the dataset. The reviews are textual and were collected from the online database known as the Internet Movie Database (IMDb), and were manually included in the dataset by humans. The IMDb dataset, due to its characteristics and size, is suitable for testing energy optimization strategies in natural language processing (NLP) for sentiment classification tasks. Addressing this challenge is crucial to reducing environmental impact and improving the efficiency of opinion analysis systems.

5.3. Proof of Concept on Sustainability

A proof-of-concept study was conducted, focusing on energy sustainability within the field of Green AI. We established an experimental framework to evaluate various Green AI strategies. This framework allows for the comparison of these strategies with conventional approaches in hyperparameter optimization for AutoML. Experiments were carried out with some of the most representative algorithms of HPO, and the corresponding carbon emissions were measured. To achieve this, energy efficiency metrics defined in previous sections were employed. We developed and trained the models using the machine learning framework, PyTorch. For hyperparameter optimization, we utilized the Ray Tune library [72]. Ray Tune provides a unified interface for a wide range of commonly used HPO packages.
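For orientation, a minimal sketch of how such an experiment can be wired together with Ray Tune’s classic tune.run API is shown below; the trainable is a placeholder and the search space and scheduler settings are illustrative, not the exact configuration used in this study:

import random
from ray import tune
from ray.tune.schedulers import ASHAScheduler

def train_cifar(config):
    # Placeholder trainable: a real implementation would build the PyTorch
    # model here and run one epoch of training/validation per iteration.
    for epoch in range(config["epochs"]):
        val_accuracy = random.random()  # stand-in for real validation accuracy
        tune.report(val_accuracy=val_accuracy)

search_space = {
    "lr": tune.loguniform(1e-4, 1e-1),
    "batch_size": tune.choice([32, 64, 128]),
    "optimizer": tune.choice(["adam", "sgd"]),
    "epochs": tune.choice([5, 10, 20]),
}

analysis = tune.run(
    train_cifar,
    config=search_space,
    num_samples=10,  # one trial per sampled configuration
    scheduler=ASHAScheduler(max_t=20, grace_period=1, reduction_factor=3),
    metric="val_accuracy",
    mode="max",
)
print(analysis.best_config)

Swapping the scheduler (for example, to HyperBand, BOHB, or population-based training) changes how trials are stopped and promoted while the trainable and search space remain unchanged, which is what makes the per-algorithm comparisons in this study possible.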

5.4. Hyperparameter Optimization and Architecture

This paper explores hyperparameter optimization within the EfficientNet architecture, focusing particularly on the EfficientNetB0 variant [27], as well as hyperparameter optimization for binary classification tasks on sequential data, such as sentiment analysis of text with a BiLSTM architecture. The choice of EfficientNetB0 arises from its unique ability to balance efficiency and performance. Its scalable design, compound scaling technique, modular structure, and effective transfer learning capabilities position it as a leader among convolutional neural network (CNN) models, especially for applications requiring lower complexity and size due to constrained computational resources. By offering an optimal equilibrium, EfficientNetB0 proves to be a proficient selection for computer vision tasks characterized by moderate memory and computing demands.
The proposed BiLSTM model, consisting of bidirectional recurrent neural networks (RNNs) with long short-term memory (LSTM) units, represents an effective architecture for addressing binary classification tasks in sequential data, with a particular focus on sentiment analysis in text. Its ability to capture contextual information in both directions and its flexible tuning capability make it suitable for a variety of applications in natural language processing (NLP) and other areas involving sequential data. The model’s hyperparameters, such as the maximum vocabulary size, the maximum length of input sequences, the training batch size, and the number of training epochs, are configurable and can be adjusted to optimize the performance of the models.
To mitigate overfitting, reduce convergence time, and address other complications associated with overfitting, several strategies were implemented in the EfficientNetB0 and BiLSTM architectures. In the EfficientNetB0 architecture, regularization techniques such as Dropout and L2 regularization were employed, along with a reduction of the learning rate. In the BiLSTM architecture, regularizers such as L1, L2, and Dropout were applied to the LSTM layers to prevent excessive co-adaptation of neurons. Continuous monitoring of model performance on the test data facilitated the rapid identification and mitigation of overfitting, preventing an unnecessary increase in the time required for the models to converge [73,74,75,76].
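A minimal PyTorch sketch of the BiLSTM described above follows; the layer sizes, dropout rate, and pooling choice are illustrative hyperparameters, not the tuned values from the experiments:

import torch
import torch.nn as nn

class SentimentBiLSTM(nn.Module):
    """Minimal BiLSTM for binary sentiment classification (sizes illustrative)."""
    def __init__(self, vocab_size: int = 20000, embed_dim: int = 128,
                 hidden_dim: int = 64, dropout: float = 0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)      # Dropout regularization
        self.fc = nn.Linear(2 * hidden_dim, 1)  # both directions concatenated

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        embedded = self.embedding(token_ids)     # (batch, seq_len, embed_dim)
        output, _ = self.lstm(embedded)          # (batch, seq_len, 2*hidden_dim)
        pooled = self.dropout(output[:, -1, :])  # last time step, both directions
        return torch.sigmoid(self.fc(pooled))    # probability of 'positive'

# L2 regularization is typically applied through the optimizer's weight decay:
# torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)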

5.5. Experiment

The experimentation was conducted following the proposed methodology, incorporating some enhancements over the approach presented in [77].
The EfficientNet architecture was used in conjunction with the CIFAR-10 dataset, which is discussed in previous sections. The dataset was divided with an 80–20% split designated for training and the remaining portion for testing. Two main experiments were conducted, named the Red AI experiment and the Green AI experiment. In these experiments, the dataset was evaluated with the implementation of a proof of concept. The hyperparameter optimization algorithms presented in Section 3 were utilized. The optimization was carried out for four hyperparameters: epochs, learning rate, batch size, and optimizer. However, the search space of the algorithms was deliberately kept conservative, and default, identical parameters were employed for each algorithm to mitigate any potential expert bias in the results. All algorithms were configured with the basic features that differentiate them.
The search procedure comprised optimization techniques and hyperparameter tuning, backed by specific algorithms. In the HyperBand algorithm, random search is accelerated through adaptive resource allocation and early stopping. In BOHB, Bayesian optimization is merged with the HyperBand to fine-tune the hyperparameters, selecting promising candidates through kernel density estimators. In the case of PBT, it improves the efficiency of machine learning model training by dynamically adjusting their hyperparameters and maintaining a population of models with different hyperparameter configurations. The ASHA adapts the successive halving algorithm to the asynchronous parallel scenario, promoting configurations to the next rung level as soon as enough observations are collected. The search and optimization process was not intensive in the pursuit of the optimum, adhering to certain sustainability guidelines.
In the Red AI experiment with CIFAR-10, HPO is performed to maximize the accuracy metric. Conversely, in the Green AI experiment with CIFAR-10, the focus is on minimizing energy consumption. In both experiments, the carbon footprint is determined to assess the environmental impact of each strategy.
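Conceptually, switching between the two strategies amounts to changing the target metric of the search. A hedged Ray Tune sketch follows, with a placeholder trainable and an abbreviated search space; the reported values are stand-ins, not the study’s measurements:

import random
from ray import tune

def train_cifar_with_energy(config):
    # Placeholder trainable: a real version would train the model and compute
    # CO2e from the energy metrics of Section 4 (Equations (10) and (12)).
    for epoch in range(config["epochs"]):
        tune.report(val_accuracy=random.random(),        # stand-in measurement
                    co2e_grams=random.uniform(0.5, 1.5))  # stand-in measurement

search_space = {"epochs": tune.choice([5, 10, 20])}  # abbreviated for brevity

# Red AI: maximize validation accuracy.
red = tune.run(train_cifar_with_energy, config=search_space,
               num_samples=10, metric="val_accuracy", mode="max")
# Green AI: minimize the carbon footprint instead.
green = tune.run(train_cifar_with_energy, config=search_space,
                 num_samples=10, metric="co2e_grams", mode="min")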
In the BiLSTM architecture, the IMDb dataset was utilized, which was discussed in previous sections. The dataset was divided with an 80–20% split designated for training and the remaining portion for validation. Two main experiments were conducted, named the Red AI with IMDb experiment and the Green AI IMDb experiment.
The BOHB and PBT hyperparameter optimization algorithms, presented in Section 3, were utilized. The BOHB algorithm, which combines Bayesian optimization with HyperBand, allows for an effective hyperparameter search and efficient allocation of computational resources. It adapts to the effectiveness of observed configurations, reducing the number of model evaluations required and potentially decreasing energy consumption. The PBT algorithm, being population-based, allows multiple models to compete and collaborate to improve their hyperparameters. This facilitates efficient exploration of the hyperparameter space and adaptive search. However, maintaining multiple models and coordinating their interaction may require additional resources, potentially increasing energy consumption.
Hyperparameter optimization was carried out considering the maximum vocabulary size, the maximum length of input sequences, the training batch size, and the number of training epochs. Similar to previous experiments, the search space of the algorithms was deliberately kept conservative, and similar parameters were used for each algorithm in order to mitigate any potential expert bias in the results. This illustrates that in both experiments, improvements in sustainability are not contingent upon extreme fine-tuning or specialized knowledge, but rather on the systematic application of energy optimization principles. The BOHB and PBT algorithms were configured taking into account the fundamental characteristics that distinguish them.
Similar to the image classification experimentation in the Red AI experiment, HPO is performed to maximize the accuracy metric. Conversely, in the Green AI experiment, the focus is on minimizing energy consumption. In both experiments, the carbon footprint is determined to assess the environmental impact of each strategy in natural language processing (NLP).
Experiments were executed on two distinct systems. The first was a workstation with a 12th-gen Intel i7 12700 CPU (4.9 GHz), 32 GB of DDR4 RAM (3600 MHz), and an RTX 3060 GPU (8 GB VRAM). The second was Google Colab, a cloud-based coding platform, running on an Intel Xeon E5-2690 v4 CPU under Ubuntu 16.04 LTS, with 12.7 GB of system RAM, a Tesla T4 GPU accelerator with 15 GB of GPU-dedicated RAM, and 78.2 GB of HDD storage.
The implementation of hyperparameter optimization algorithms, such as HyperBand, BOHB, PBT, and ASHA, with a focus on adaptive resource allocation and early stopping, signifies an energy-conscious methodology.

6. Results

In this section, we delineate the primary outcomes of the experimentation. An exhaustive analysis of the various experimental results is conducted, with a focus on the key metrics in Green AI and Red AI. To illustrate these findings, a series of tables and diagrams are presented.
The distinction between Red AI and Green AI experiments is pivotal in demonstrating the validity of the proposition. While Red AI is focused on maximizing accuracy metrics, Green AI prioritizes the minimization of energy consumption. This duality of approaches allows for a direct comparison of outcomes in terms of energy efficiency and model performance.

6.1. Experiment Red AI with CIFAR

In this experiment, as the first case, the performance of an EfficientNet-B0 model was evaluated during training and validation in an AutoML environment [27]. The performance metrics evaluated were CO2e emissions, accuracy, validation accuracy, and energy consumption (Table 1). The optimization of the parameters was carried out with respect to accuracy. Experiments were conducted with 10 and 100 trials. Five repetitions of the same experiment were conducted, and the mean (or average) of the results was calculated. This approach helps achieve a more precise and robust view of the model’s performance by mitigating the impact of random variations or errors in a single execution, thereby enabling a more reliable estimation of overall model performance.
Based on the experimental results, the models that demonstrated the highest validation accuracy for each algorithm were identified. This criterion was established to ensure optimal model performance on unseen data. Subsequently, from this set of high-precision validation models, those that exhibited the lowest equivalent CO2e emissions were selected.
An assessment was conducted to ascertain the average CO2e (g/kWh) for each algorithm, providing insightful perspectives on the environmental sustainability of the methods employed in AutoML experimentation. The aggregated metric of CO2e (g/kWh) enables the comparison of the environmental performance of different algorithms, promotes informed decision-making concerning sustainability and efficiency, identifies algorithms suitable for sustainable production environments, and raises awareness regarding the carbon footprint associated with machine learning processes.
In the Red AI experiments, the results delineated in Table 1 and Table 2, derived from a set of 10 trials, are articulated in terms of energy efficiency (quantified in CO2e emissions per kilowatt-hour, g/kWh) and validation accuracy. These findings indicate that the PBT algorithm ostensibly exhibits the highest energy efficiency, with a mean of 0.854 g/kWh of CO2e emissions, closely trailed by ASHA with a mean of 0.926 g/kWh. However, it is crucial to underscore that the median CO2e emissions for PBT are significantly elevated (1.222 g/kWh), suggesting the existence of outlier hyperparameter values that skew the mean. Pertaining to validation accuracy, the BOHB algorithm surpasses the others with a mean of 75.851% and a median of 77.420%. This implies that BOHB manifests superior consistency in terms of accuracy performance relative to the other algorithms evaluated.
In the analysis of the various algorithms employed in both the Red AI and Green AI strategies, the Kruskal–Wallis test is utilized within each group. This non-parametric method is used to compare three or more independent samples. The Kruskal–Wallis test provides a flexible and robust alternative to parametric tests, as it does not require data normality (that the samples belong to a Gaussian distribution) [78,79].
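For reference, such a comparison can be run with scipy.stats.kruskal; the samples below are illustrative placeholders, not the study’s measurements:

from scipy.stats import kruskal

# Illustrative per-trial CO2e samples (g/kWh) for the four algorithms; the
# real analysis uses the measurements behind Tables 1-6.
co2e_hb   = [1.01, 0.95, 1.10, 0.98, 1.05]
co2e_bohb = [0.92, 1.05, 0.88, 1.00, 0.97]
co2e_pbt  = [0.80, 1.22, 0.70, 1.30, 0.85]
co2e_asha = [0.90, 0.97, 1.02, 0.85, 0.93]

H, p = kruskal(co2e_hb, co2e_bohb, co2e_pbt, co2e_asha)
print(f"H = {H:.3f}, p = {p:.4f}")
# A p-value above 0.05 indicates no statistically significant difference
# between the medians of the groups.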
The Kruskal–Wallis test, applied to 10 trials of CO2e in Red AI, yielded an H = 2.917 and a p = 0.4047. As this exceeds the conventional threshold of 0.05, no significant differences were found in the CO2e medians among the evaluated algorithms. Similarly, for validation accuracy, with an H = 2.372 and a p = 0.499, no significant differences were found among the distributions of the studied algorithms.
In the experimentation involving 100 trials, as depicted in Figure 2 and detailed in Table 3, in terms of energy efficiency, BOHB appears to be the most efficient, with a mean of 0.969 g/kWh of CO2e emissions, closely followed by ASHA with a mean of 1.035 g/kWh. However, it is crucial to underscore that the median of the CO2e emissions for both algorithms is lower than the mean, suggesting the presence of outlier values that could skew the mean. With respect to validation accuracy, HB outperforms the others with a mean of 74.555% and a median of 76.045%. This suggests that HB demonstrates greater consistency in terms of accuracy performance compared to the other algorithms, within the context of the experimentation involving 100 trials.
The Kruskal–Wallis test, applied to 100 trials of the Red AI, found no significant differences in CO2e emissions among the evaluated algorithms (H = 5.332, p = 0.149). Similarly, no differences were found in the validation accuracy among the algorithms (H = 4.321, p = 0.229).

6.2. Experiment Green AI with CIFAR

In this second experiment, the performance of an EfficientNet-B0 model was again evaluated during training and validation in an AutoML environment, with the optimization process employing energy efficiency metrics [27]. The performance metrics evaluated were CO2e emissions, accuracy, validation accuracy, and energy consumption (Table 4). Experiments were conducted with 10 and 100 trials, as in the previous experiment. Each experiment was repeated five times, and the mean was calculated to enhance the robustness of the results.
The same data-analysis strategy was adopted as in the previous experiment, with models assessed by validation accuracy and by CO2e emissions (g/kWh).
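The exact functional form of the energy-aware objective is not reproduced in this section; one plausible formulation, shown below as an assumption rather than the study's implementation, is a weighted score that rewards validation accuracy while penalizing CO2e emissions.

```python
# One plausible Green AI objective (an assumption, not the study's exact
# metric): reward validation accuracy while penalizing estimated CO2e
# emissions, with a tunable trade-off weight.
def green_score(val_accuracy: float, co2e_g_per_kwh: float,
                weight: float = 0.1) -> float:
    """Higher is better; `weight` controls how strongly emissions count."""
    return val_accuracy - weight * co2e_g_per_kwh


# With values excerpted from Table 1, a high-accuracy but emission-heavy
# trial can lose to a marginally less accurate, far cleaner one:
print(green_score(78.91, 1.659))  # accurate but costly  -> 78.744
print(green_score(78.84, 0.384))  # slightly less accurate -> 78.802
```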
In the Green AI experiments, the energy efficiency and validation accuracy of the various algorithms were evaluated. In terms of energy efficiency, measured in CO2e emissions per kilowatt-hour (g/kWh), the BOHB algorithm proved to be the most efficient with an average of 0.794 g/kWh, followed by HB with 0.811 g/kWh. For both algorithms, the median CO2e emission is considerably lower than the mean, indicating the presence of high-emission outliers. In terms of validation accuracy, the PBT algorithm outperformed the others with an average of 76.788% and a median of 77.630%, indicating greater consistency in its performance. These results were obtained from a set of 10 trials (Table 5).
Applying the Kruskal–Wallis test to the ten trials in Green AI yielded H = 3.183 and p = 0.3643 for CO2e emissions, and H = 0.886 and p = 0.8287 for validation accuracy. Both p-values exceed the conventional threshold of 0.05, so the null hypothesis cannot be rejected: there are no significant differences in the medians of CO2e or in the distributions of validation accuracy among the evaluated algorithms.
In the context of Green AI, according to the data presented in Table 6, 100 trials were conducted to evaluate the energy efficiency of the various algorithms. The BOHB algorithm proved to be the most efficient, with an average CO2e emission of 0.809 g/kWh, closely followed by the PBT algorithm with an average of 0.846 g/kWh. For both algorithms, the median CO2e emissions are lower than the means, suggesting that high-emission outliers inflate the averages. With respect to validation accuracy, the PBT algorithm outperforms the others with an average of 73.918% and a median of 75.905%, again exhibiting greater consistency in accuracy performance than the other evaluated algorithms.
The analysis of 100 trials of the CO2e variable in Green AI using the Kruskal–Wallis test yielded an H = 2.177 and a p = 0.537, while for validation accuracy, an H = 2.291 and a p = 0.514 were obtained. Both p-values exceed the conventional threshold of 0.05, indicating that there are no statistically significant differences in the medians of CO2e and validation accuracy among the evaluated algorithms, suggesting similar distributions for both metrics.

6.3. Experimentation with the PBT Algorithm Using the IMDb Dataset

In this experiment, we evaluated the performance of the population-based training (PBT) algorithm on a BiLSTM architecture during training and validation for natural language processing in an AutoML environment. The performance metrics assessed included CO2e emissions and validation accuracy (see Table 7 and Table 8). Parameter optimization was conducted with respect to accuracy. The experiment was repeated three times, and the results were averaged to mitigate the impact of random variations or errors in a single run, allowing a more reliable estimation of overall model performance. Google Colab, also known as Colaboratory, was utilized for this experimentation [80].
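A hedged sketch of how PBT can be configured for this search in Ray Tune [72] follows; train_bilstm is an illustrative stand-in for the actual Keras BiLSTM training function (here reporting a simulated score), the mutation ranges mirror the hyperparameters reported in Table 7, and the classic tune.run API is assumed.

```python
# Sketch (under the stated assumptions) of PBT for the IMDb BiLSTM search.
import random

from ray import tune
from ray.tune.schedulers import PopulationBasedTraining


def train_bilstm(config):
    # Stand-in for real BiLSTM training on IMDb with the given config.
    tune.report(val_accuracy=random.uniform(0.50, 0.88))


pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="val_accuracy",
    mode="max",
    perturbation_interval=2,  # exploit/explore every 2 iterations
    hyperparam_mutations={
        "batch_size": [32, 64, 128],
        "max_features": [10_000, 20_000, 30_000],
    },
)

tune.run(
    train_bilstm,
    config={
        "optimizer": tune.choice(["adam", "rmsprop", "sgd"]),
        "batch_size": tune.choice([32, 64, 128]),
        "max_features": tune.choice([10_000, 20_000, 30_000]),
    },
    scheduler=pbt,
    num_samples=10,
)
```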
Validation accuracy and CO2e emissions are positively correlated (Table 7): as validation accuracy increases, so do CO2e emissions (measured in grams per kilowatt-hour, g/kWh). Models with higher validation accuracy are more energy-intensive and therefore generate greater emissions. The relationship between validation accuracy and the type of optimizer used (SGD, RMSprop, or Adam) is not straightforward; however, some optimizers, such as Adam, tend to achieve higher validation accuracy than SGD and RMSprop, a difference that could be attributed to variations in learning speed and convergence behavior. Additionally, validation accuracy is influenced by factors such as batch size, the number of epochs, and the maximum number of features and sequence length. In general, increasing these parameters tends to enhance validation accuracy, suggesting that a model trained with more data and additional epochs can achieve greater precision.
The data likewise indicate that as validation accuracy increases, so do CO2e emissions (g/kWh) (Table 8): models with higher validation accuracy consume more energy and thus generate more CO2e emissions, as observed in the previous experiment. The relationship between CO2e emissions and the optimizer (SGD, RMSprop, Adam) is not directly clear from the experimental data, although some optimizers, such as Adam, tend to produce higher CO2e emissions than SGD and RMSprop, possibly due to differences in learning rate and convergence behavior. CO2e emissions also appear to be influenced by batch size, the number of epochs, and the maximum number of features and sequence length; as these parameters increase, emissions tend to increase, since a model with more data and more training epochs generally consumes more energy.
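This accuracy-emissions relationship can be quantified with a rank correlation, which is robust to the outliers visible in these tables; the paired values below are a small illustrative subset excerpted from Table 7, not the full record behind the reported trends.

```python
# Sketch of quantifying the accuracy-emissions relationship with a
# Spearman rank correlation on a small excerpt of Table 7.
from scipy import stats

val_accuracy = [49.71, 51.14, 85.29, 86.46, 87.80, 88.44]
co2e = [7.112, 8.218, 2.427, 11.172, 16.683, 12.391]

rho, p = stats.spearmanr(val_accuracy, co2e)
# A positive rho is consistent with the trend described in the text.
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```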
The analysis of the PBT algorithm in the Red AI and Green AI scenarios reveals significant differences between the mean and median values of validation accuracy and CO2e emissions (Table 7 and Table 8). In Red AI, the average accuracy was 72.17% and the median was 84.25%, with average emissions of 2.999 g/kWh and a median of 1.023 g/kWh. In Green AI, the average accuracy was 70.96% and the median was 84.67%, with average emissions of 1.1198 g/kWh and a median of 0.079 g/kWh. These mean-median gaps point to outlier trials; even so, the markedly lower emissions highlight the sustainability advantage of the Green AI strategy.

6.4. Experimentation with the BOHB Algorithm Using the IMDb Dataset

In this experiment, the performance of the BOHB algorithm on the BiLSTM architecture for sentiment classification was evaluated using the IMDb dataset in an AutoML environment. The performance metrics evaluated included CO2e emissions and validation accuracy (refer to Table 9 and Table 10). The experiment was repeated three times and the average was taken to mitigate the impact of variations or random errors in a single run. Google Colab was also employed for this experimentation [80].
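A hedged sketch of the corresponding BOHB configuration in Ray Tune [72] follows; the TuneBOHB search algorithm assumes the ConfigSpace and hpbandster backend packages, module paths vary across Ray versions, and train_bilstm is again an illustrative placeholder reporting a simulated score.

```python
# Sketch (under the stated assumptions) of BOHB: Bayesian search paired
# with a HyperBand-style early-stopping scheduler.
import random

from ray import tune
from ray.tune.schedulers import HyperBandForBOHB
from ray.tune.search.bohb import TuneBOHB


def train_bilstm(config):
    # Stand-in for real BiLSTM training on IMDb with the given config.
    tune.report(val_accuracy=random.uniform(0.50, 0.89))


scheduler = HyperBandForBOHB(
    time_attr="training_iteration", metric="val_accuracy", mode="max"
)
search_alg = TuneBOHB(metric="val_accuracy", mode="max")

tune.run(
    train_bilstm,
    config={
        "batch_size": tune.choice([32, 64, 128]),
        "max_features": tune.choice([10_000, 20_000, 30_000]),
    },
    scheduler=scheduler,
    search_alg=search_alg,
    num_samples=30,  # matching the thirty trials of Tables 9 and 10
)
```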
Experimental results show that as validation accuracy increases, so does CO2e (Table 9): in general, models with higher validation accuracy also generate higher CO2e emissions. There are exceptions, however, as in the runs with validation accuracies of 50.20% and 50.98%, where the CO2e value decreases despite similar accuracies. The Adam and RMSprop optimizers appear to be associated with lower CO2e values than SGD, possibly because they are more efficient in their use of computational resources. As the maximum number of features increases, CO2e emissions also tend to increase, suggesting that models with more features require more computational resources. Although no direct correlation is observed between the number of epochs and CO2e, a greater number of epochs may imply greater use of computational resources and, hence, higher emissions. Batch sizes of 32 and 64 appear to be associated with lower CO2e values than 128, which may likewise reflect more efficient use of computational resources.
A positive correlation is observed between validation accuracy and CO2e emissions, indicating that an increase in validation accuracy coincides with an increase in CO2e emissions per unit of energy consumed (Table 10). With respect to the relationship between validation accuracy and the type of optimizer used, no direct correlation is evident, although the Adam optimizer tends to produce higher validation accuracy. Validation accuracy is also influenced by hyperparameters such as batch size, the number of epochs, and the maximum number of features and sequence length.
The comparison between the experimentation of the BOHB algorithm in the Red AI and Green AI solutions shows that although there are similarities in the general trends, there are differences in the specific CO2e values (Table 9 and Table 10). This is justified by the use of an optimization strategy focused on validation accuracy in one of the experiments and on energy efficiency in the other. The data suggest that validation accuracy and the optimizer are key factors influencing CO2e emissions, with Adam and RMSprop being the most resource-efficient optimizers.
In the Red AI scenario, the BOHB algorithm demonstrated superior performance, achieving an average validation accuracy of 77.697% and generating CO2e emissions of 2.36241 g/kWh. However, in the Green AI scenario, despite a slight decrease in validation accuracy (76.01%), greater energy efficiency was observed with lower CO2e emissions (2.05353 g/kWh). This underscores the importance of energy efficiency metrics in evaluating the performance of AI algorithms.

7. Discussion

In this section, we present the fundamental findings derived from our experiments with Red AI and Green AI. We conducted a thorough analysis of the experimental results, focusing on HPO and on key metrics such as performance, energy efficiency, and sustainability.
The evaluation of the carbon footprint across both datasets offers a quantitative assessment of environmental impact. By focusing on the minimization of energy consumption and the evaluation of the carbon footprint, Green AI experiments demonstrate a tangible improvement in sustainability.
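In practice, such a carbon-footprint evaluation wraps each trial in an emissions tracker. The sketch below uses CodeCarbon as a stand-in for the trackers cited in this work, such as Carbontracker [30] and eco2AI [69]; run_training_trial is a hypothetical placeholder for one HPO trial.

```python
# Sketch of measuring per-trial emissions with an external tracker
# (CodeCarbon here as a stand-in; assumes `pip install codecarbon`).
from codecarbon import EmissionsTracker


def run_training_trial():
    # Placeholder workload standing in for one model-training trial.
    sum(i * i for i in range(1_000_000))


tracker = EmissionsTracker(project_name="automl-hpo-trial")
tracker.start()
try:
    run_training_trial()
finally:
    emissions_kg = tracker.stop()  # estimated kg of CO2e for the trial

print(f"Estimated emissions: {emissions_kg:.6f} kg CO2e")
```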
The presence of outliers distorts the means in both directions. For validation accuracy, median values are typically higher than the means, because a few poorly performing trials drag the averages down; the PBT algorithm consistently exhibits this pattern, particularly in Green AI scenarios. For CO2e emissions, by contrast, most algorithms display medians that are lower than their means, reflecting occasional spikes of high energy consumption. These outliers correspond to hyperparameter configurations that yield results markedly different from the majority of trials. In terms of accuracy in Red AI and energy efficiency in Green AI, the BOHB algorithm demonstrates robust performance, with favorable median values.
The differences between the 10 and 100 trial scenarios suggest that a larger number of trials allows AutoML algorithms to reveal more stable performance patterns. When selecting algorithms for hyperparameter optimization, it is important to consider both medians and means to gain a more comprehensive view of typical performance and result variability.
The experimental results derived from both the Red AI and Green AI strategies on the CIFAR-10 dataset, specifically the CO2e emissions, were analyzed using the Mann–Whitney U test. The choice of a non-parametric test is based on the violation of the assumptions of normality and equality of variances, which are critical for the validity of parametric tests [78,79,81]; such violations underscore the necessity of non-parametric alternatives like the Mann–Whitney U test. A U value of 92,407.0 was obtained, with a p-value of 1.47 × 10⁻⁴. Given that this p-value is less than 0.05, it can be concluded that there are statistically significant differences between the two hyperparameter optimization strategies (Figure 3).
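The test itself reduces to a two-sample comparison, sketched below with SciPy; co2e_red and co2e_green hold illustrative per-trial subsets excerpted from Tables 1 and 4, not the full distributions behind the reported U value.

```python
# Sketch of the two-sample Mann-Whitney U comparison between strategies.
from scipy import stats

co2e_red = [0.582, 1.659, 1.158, 3.474, 1.163, 0.838]
co2e_green = [0.448, 0.211, 0.503, 0.162, 0.242, 0.436]

u, p = stats.mannwhitneyu(co2e_red, co2e_green, alternative="two-sided")
# p < 0.05 would indicate a significant difference between strategies.
print(f"U = {u}, p = {p:.4f}")
```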
In the experimentation with the Red AI and Green AI strategies using the PBT and BOHB algorithms on the IMDb dataset, the results likewise do not meet the assumptions of normality and equality of variances, so the non-parametric Mann–Whitney U test was applied. For the BOHB algorithm, the U value for CO2e emissions is 280.0 with a p-value of 0.012, while for validation accuracy U is 325.5 with a p-value of 0.067. These results indicate significant differences in CO2e emissions between the Red AI and Green AI strategies (Figure 4), whereas no significant differences in validation accuracy can be asserted. For the PBT algorithm, the U value for CO2e emissions is 589.0 with a p-value of 0.041, while for validation accuracy U is 461.0 with a p-value of 0.877. As with BOHB, these results indicate significant differences in CO2e emissions between the two strategies, but not in validation accuracy (Figure 5).

8. Conclusions

This study has highlighted the importance of considering energy efficiency in the development and optimization of automated machine learning (AutoML) models. Energy efficiency metrics for advanced hyperparameter optimization algorithms within AutoML have been introduced and examined.
In this context, the potential of algorithms such as Bayesian optimization HyperBand (BOHB), HyperBand (HB), population-based training (PBT), and the asynchronous successive halving algorithm (ASHA) to reduce the carbon footprint of AutoML has been specifically investigated. These algorithms were applied to image classification using an EfficientNet architecture; in addition, the PBT and BOHB algorithms were implemented on a BiLSTM architecture to solve a natural language processing problem, specifically sentiment analysis. The findings indicate a reduction of 28.7% in CO2e emissions when implementing the Green AI strategy compared to the Red AI strategy, achieved with a minimal decrease of 0.51% in validation accuracy. These results suggest that employing these algorithms with energy efficiency metrics can help mitigate the carbon footprint of AutoML: such metrics allow an algorithm's energy consumption to be evaluated and optimized, taking into account not only accuracy but also sustainability and reduced environmental impact.
The experimentation has demonstrated the application of Green AI principles to AutoML hyperparameter optimization algorithms. The current sustainability of AutoML practices has been evaluated, and strategies have been proposed to make them more environmentally friendly. The findings suggest that careful consideration of energy efficiency metrics can promote environmentally friendly computational processes in AutoML.
As future work, these findings underscore the importance of expanding research into efficient and eco-friendly algorithms, as well as strategies for AI sustainability. This approach should consider not only economic viability but also the dimensions of social responsibility and environmental management. Through a comprehensive assessment of the carbon footprint and consideration of various aspects of AI operations, we can move toward environmentally sustainable AI practices.

Author Contributions

All the authors have contributed equally to the realization of this work. D.C.-N. and L.G.-F. participated in the conception and design of the work; D.C.-N. and L.G.-F. reviewed the bibliography; D.C.-N. and L.G.-F. conceived and designed the experiments; D.C.-N. and L.G.-F. performed the experiments; D.C.-N. and L.G.-F. analyzed the data; D.C.-N. and L.G.-F. wrote and edited the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially possible thanks to the collaboration and support of the Spanish Ministry of Science and Innovation with the projects PDC2022-134013-I00, TED2021-131019B-I00, and PID2019-107228RB-I00.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used are in the public domain. The code can be requested from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
Adam: adaptive moment estimation
AI: artificial intelligence
ASHA: asynchronous successive halving algorithm
AutoML: automated machine learning
BiLSTM: bidirectional long short-term memory
BOHB: Bayesian optimization HyperBand
CF: carbon footprint
CNN: convolutional neural network
CO2e: carbon dioxide equivalent
CPU: central processing unit
DL: deep learning
GPU: graphics processing unit
HB: HyperBand
HPO: hyperparameter optimization
IMDb: Internet Movie Database
ML: machine learning
NLP: natural language processing
PBT: population-based training
RAM: random access memory
RMSprop: root mean square propagation
SGD: stochastic gradient descent
TPU: tensor processing unit

References

1. Dhar, P. The carbon impact of artificial intelligence. Nat. Mach. Intell. 2020, 2, 423–425.
2. Gailhofer, P.; Herold, A.; Schemmel, J.P.; Scherf, C.S.; de Stebelski, C.U.; Köhler, A.R.; Braungardt, S. The Role of Artificial Intelligence in the European Green Deal; European Parliament: Luxembourg; Brussels, Belgium, 2021.
3. Schwartz, R.; Dodge, J.; Smith, N.A.; Etzioni, O. Green AI. Commun. ACM 2020, 63, 54–63.
4. Hadi, R.H.; Hady, H.N.; Hasan, A.M.; Al-Jodah, A.; Humaidi, A.J. Improved fault classification for predictive maintenance in industrial IoT based on AutoML: A case study of ball-bearing faults. Processes 2023, 11, 1507.
5. Zhuhadar, L.P.; Lytras, M.D. The application of AutoML techniques in diabetes diagnosis: Current approaches, performance, and future directions. Sustainability 2023, 15, 13484.
6. Li, L. Towards Efficient Automated Machine Learning. Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, PA, USA, 2021.
7. Bischl, B.; Binder, M.; Lang, M.; Pielok, T.; Richter, J.; Coors, S.; Thomas, J.; Ullmann, T.; Becker, M.; Boulesteix, A.L.; et al. Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2023, 13, e1484.
8. Radzi, S.F.M.; Karim, M.K.A.; Saripan, M.I.; Rahman, M.A.A.; Isa, I.N.C.; Ibahim, M.J. Hyperparameter tuning and pipeline optimization via grid search method and tree-based AutoML in breast cancer prediction. J. Pers. Med. 2021, 11, 978.
9. Alsharef, A.; Kumar, K.; Iwendi, C. Time series data modeling using advanced machine learning and AutoML. Sustainability 2022, 14, 15292.
10. Karras, A.; Karras, C.; Schizas, N.; Avlonitis, M.; Sioutas, S. AutoML with Bayesian optimizations for big data management. Information 2023, 14, 223.
11. Li, L.; Jamieson, K.; DeSalvo, G.; Rostamizadeh, A.; Talwalkar, A. Hyperband: A novel bandit-based approach to hyperparameter optimization. J. Mach. Learn. Res. 2018, 18, 1–52.
12. Falkner, S.; Klein, A.; Hutter, F. BOHB: Robust and efficient hyperparameter optimization at scale. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 1437–1446.
13. Jaderberg, M.; Dalibard, V.; Osindero, S.; Czarnecki, W.M.; Donahue, J.; Razavi, A.; Vinyals, O.; Green, T.; Dunning, I.; Simonyan, K.; et al. Population based training of neural networks. arXiv 2017, arXiv:1711.09846.
14. Li, L.; Jamieson, K.; Rostamizadeh, A.; Gonina, E.; Hardt, M.; Recht, B.; Talwalkar, A. Massively parallel hyperparameter tuning. arXiv 2018, arXiv:1810.05934.
15. Wu, C.J.; Raghavendra, R.; Gupta, U.; Acun, B.; Ardalani, N.; Maeng, K.; Chang, G.; Aga, F.; Huang, J.; Bai, C.; et al. Sustainable AI: Environmental implications, challenges and opportunities. Proc. Mach. Learn. Syst. 2022, 4, 795–813.
16. Kaack, L.H.; Donti, P.L.; Strubell, E.; Kamiya, G.; Creutzig, F.; Rolnick, D. Aligning artificial intelligence with climate change mitigation. Nat. Clim. Chang. 2022, 12, 518–527.
17. Kuo, C.C.J.; Madni, A.M. Green learning: Introduction, examples and outlook. J. Vis. Commun. Image Represent. 2023, 90, 103685.
18. Treviso, M.; Lee, J.U.; Ji, T.; Aken, B.v.; Cao, Q.; Ciosici, M.R.; Hassid, M.; Heafield, K.; Hooker, S.; Raffel, C.; et al. Efficient methods for natural language processing: A survey. Trans. Assoc. Comput. Linguist. 2023, 11, 826–860.
19. Baratchi, M.; Wang, C.; Limmer, S.; van Rijn, J.N.; Hoos, H.; Bäck, T.; Olhofer, M. Automated machine learning: Past, present and future. Artif. Intell. Rev. 2024, 57, 122.
20. Probst, P.; Boulesteix, A.L.; Bischl, B. Tunability: Importance of hyperparameters of machine learning algorithms. J. Mach. Learn. Res. 2019, 20, 1934–1965.
21. Wu, J.; Chen, X.Y.; Zhang, H.; Xiong, L.D.; Lei, H.; Deng, S.H. Hyperparameter optimization for machine learning models based on Bayesian optimization. J. Electron. Sci. Technol. 2019, 17, 26–40.
22. Yang, L.; Shami, A. On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing 2020, 415, 295–316.
23. Morales-Hernández, A.; Van Nieuwenhuyse, I.; Rojas Gonzalez, S. A survey on multi-objective hyperparameter optimization algorithms for machine learning. Artif. Intell. Rev. 2023, 56, 8043–8093.
24. Strubell, E.; Ganesh, A.; McCallum, A. Energy and policy considerations for deep learning in NLP. arXiv 2019, arXiv:1906.02243.
25. Lacoste, A.; Luccioni, A.; Schmidt, V.; Dandres, T. Quantifying the carbon emissions of machine learning. arXiv 2019, arXiv:1910.09700.
26. Probst, P.; Wright, M.N.; Boulesteix, A.L. Hyperparameters and tuning strategies for random forest. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, 9, e1301.
27. Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
28. Feurer, M.; Hutter, F. Hyperparameter optimization. In Automated Machine Learning: Methods, Systems, Challenges; Springer: Berlin/Heidelberg, Germany, 2019; pp. 3–33.
29. Vázquez-Canteli, J.R.; Nagy, Z. Reinforcement learning for demand response: A review of algorithms and modeling techniques. Appl. Energy 2019, 235, 1072–1089.
30. Anthony, L.F.W.; Kanding, B.; Selvan, R. Carbontracker: Tracking and predicting the carbon footprint of training deep learning models. arXiv 2020, arXiv:2007.03051.
31. Han, S.; Mao, H.; Dally, W.J. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv 2015, arXiv:1510.00149.
32. Bergstra, J.; Yamins, D.; Cox, D.D. Hyperopt: A Python library for optimizing the hyperparameters of machine learning algorithms. In Proceedings of the 12th Python in Science Conference, Austin, TX, USA, 24–29 June 2013; Volume 13, p. 20.
33. Claesen, M.; De Moor, B. Hyperparameter search in machine learning. arXiv 2015, arXiv:1502.02127.
34. Verdecchia, R.; Sallou, J.; Cruz, L. A systematic review of Green AI. In Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery; Wiley: Hoboken, NJ, USA, 2023; p. e1507.
35. Yarally, T.; Cruz, L.; Feitosa, D.; Sallou, J.; Van Deursen, A. Uncovering Energy-Efficient Practices in Deep Learning Training: Preliminary Steps Towards Green AI. In Proceedings of the 2023 IEEE/ACM 2nd International Conference on AI Engineering—Software Engineering for AI (CAIN), Melbourne, Australia, 15–16 May 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 25–36.
36. Candelieri, A.; Perego, R.; Archetti, F. Green machine learning via augmented Gaussian processes and multi-information source optimization. Soft Comput. 2021, 25, 12591–12603.
37. Ferro, M.; Silva, G.D.; de Paula, F.B.; Vieira, V.; Schulze, B. Towards a sustainable artificial intelligence: A case study of energy efficiency in decision tree algorithms. Concurr. Comput. Pract. Exp. 2023, 35, e6815.
38. Bachoc, F. Cross validation and maximum likelihood estimations of hyper-parameters of Gaussian processes with model misspecification. Comput. Stat. Data Anal. 2013, 66, 55–69.
39. Snoek, J.; Larochelle, H.; Adams, R.P. Practical Bayesian optimization of machine learning algorithms. In Proceedings of the Advances in Neural Information Processing Systems 25 (NIPS 2012), Lake Tahoe, NV, USA, 3–6 December 2012; pp. 2951–2959.
40. Sun, X.; Lin, J.; Bischl, B. Reinbo: Machine learning pipeline search and configuration with Bayesian optimization embedded reinforcement learning. arXiv 2019, arXiv:1904.05381.
41. Shahriari, B.; Swersky, K.; Wang, Z.; Adams, R.P.; De Freitas, N. Taking the human out of the loop: A review of Bayesian optimization. Proc. IEEE 2015, 104, 148–175.
42. Bhosekar, A.; Ierapetritou, M. Advances in surrogate based modeling, feasibility analysis, and optimization: A review. Comput. Chem. Eng. 2018, 108, 250–267.
43. Bergstra, J.; Bengio, Y. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 2012, 13, 281–305.
44. Yu, T.; Zhu, H. Hyper-parameter optimization: A review of algorithms and applications. arXiv 2020, arXiv:2003.05689.
45. Stamoulis, D.; Cai, E.; Juan, D.C.; Marculescu, D. HyperPower: Power- and memory-constrained hyper-parameter optimization for neural networks. In Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, Germany, 19–23 March 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 19–24.
46. de Chavannes, L.H.P.; Kongsbak, M.G.K.; Rantzau, T.; Derczynski, L. Hyperparameter power impact in transformer language model training. In Proceedings of the Second Workshop on Simple and Efficient Natural Language Processing, Virtual, 10 November 2021; pp. 96–118.
47. Rajput, S.; Widmayer, T.; Shang, Z.; Kechagia, M.; Sarro, F.; Sharma, T. FECoM: A Step towards Fine-Grained Energy Measurement for Deep Learning. arXiv 2023, arXiv:2308.12264.
48. Ali, Y.A.; Awwad, E.M.; Al-Razgan, M.; Maarouf, A. Hyperparameter search for machine learning algorithms for optimizing the computational complexity. Processes 2023, 11, 349.
49. Kim, Y.H.; Reddy, B.; Yun, S.; Seo, C. Nemo: Neuro-evolution with multiobjective optimization of deep neural network for speed and accuracy. In Proceedings of the ICML 2017 AutoML Workshop, Sydney, Australia, 10–11 August 2017; pp. 1–8.
50. Wistuba, M.; Rawat, A.; Pedapati, T. A survey on neural architecture search. arXiv 2019, arXiv:1905.01392.
51. Wilson, A.G.; Dann, C.; Lucas, C.; Xing, E.P. The human kernel. Adv. Neural Inf. Process. Syst. 2015, 28, 2854–2862.
52. Han, S.; Pool, J.; Tran, J.; Dally, W. Learning both weights and connections for efficient neural network. Adv. Neural Inf. Process. Syst. 2015, 28, 1135–1143.
53. Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531.
54. Yang, J.; Martinez, B.; Bulat, A.; Tzimiropoulos, G. Knowledge distillation via adaptive instance normalization. arXiv 2020, arXiv:2003.04289.
55. Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q.V. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8697–8710.
56. Castellanos-Nieves, D.; García-Forte, L. Improving Automated Machine-Learning Systems through Green AI. Appl. Sci. 2023, 13, 11583.
57. Lorenzo, P.R.; Nalepa, J.; Kawulok, M.; Ramos, L.S.; Pastor, J.R. Particle swarm optimization for hyper-parameter selection in deep neural networks. In Proceedings of the Genetic and Evolutionary Computation Conference, Berlin, Germany, 15–19 July 2017; pp. 481–488.
58. Li, L.; Jamieson, K.; Rostamizadeh, A.; Gonina, E.; Ben-Tzur, J.; Hardt, M.; Recht, B.; Talwalkar, A. A system for massively parallel hyperparameter tuning. Proc. Mach. Learn. Syst. 2020, 2, 230–246.
59. Li, Y.; Shen, Y.; Jiang, H.; Zhang, W.; Li, J.; Liu, J.; Zhang, C.; Cui, B. Hyper-tune: Towards efficient hyper-parameter tuning at scale. arXiv 2022, arXiv:2201.06834.
60. Oyedeji, S.; Seffah, A.; Penzenstadler, B. A catalogue supporting software sustainability design. Sustainability 2018, 10, 2296.
61. Calero, C.; Moraga, M.Á.; Piattini, M. Introduction to Software Sustainability. In Software Sustainability; Calero, C., Moraga, M.Á., Piattini, M., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 1–15.
62. Noman, H.; Mahoto, N.A.; Bhatti, S.; Abosaq, H.A.; Al Reshan, M.S.; Shaikh, A. An Exploratory Study of Software Sustainability at Early Stages of Software Development. Sustainability 2022, 14, 8596.
63. Calero, C.; Bertoa, M.F.; Moraga, M.Á. A systematic literature review for software sustainability measures. In Proceedings of the 2013 2nd International Workshop on Green and Sustainable Software (GREENS), San Francisco, CA, USA, 20 May 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 46–53.
64. Tornede, T.; Tornede, A.; Hanselle, J.; Mohr, F.; Wever, M.; Hüllermeier, E. Towards green automated machine learning: Status quo and future directions. J. Artif. Intell. Res. 2023, 77, 427–457.
65. Heguerte, L.B.; Bugeau, A.; Lannelongue, L. How to estimate carbon footprint when training deep learning models? A guide and review. arXiv 2023, arXiv:2306.08323.
66. Lannelongue, L.; Grealey, J.; Inouye, M. Green algorithms: Quantifying the carbon footprint of computation. Adv. Sci. 2021, 8, 2100707.
67. Patel, Y.S.; Mehrotra, N.; Soner, S. Green cloud computing: A review on Green IT areas for cloud computing environment. In Proceedings of the 2015 International Conference on Futuristic Trends on Computational Analysis and Knowledge Management (ABLAZE), Noida, India, 25–27 February 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 327–332.
68. Maevsky, D.; Maevskaya, E.; Stetsuyk, E. Evaluating the RAM energy consumption at the stage of software development. In Green IT Engineering: Concepts, Models, Complex Systems Architectures; Springer: Berlin/Heidelberg, Germany, 2017; pp. 101–121.
69. Budennyy, S.; Lazarev, V.; Zakharenko, N.; Korovin, A.; Plosskaya, O.; Dimitrov, D.; Arkhipkin, V.; Oseledets, I.; Barsola, I.; Egorov, I.; et al. Eco2AI: Carbon emissions tracking of machine learning models as the first step towards sustainable AI. arXiv 2022, arXiv:2208.00406.
70. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 84–90.
71. Chollet, F. Keras. 2015. Available online: https://github.com/fchollet/keras (accessed on 19 December 2023).
72. Liaw, R.; Liang, E.; Nishihara, R.; Moritz, P.; Gonzalez, J.E.; Stoica, I. Tune: A Research Platform for Distributed Model Selection and Training. arXiv 2018, arXiv:1807.05118.
73. Ying, X. An overview of overfitting and its solutions. J. Phys. Conf. Ser. 2019, 1168, 022022.
74. Jabbar, H.; Khan, R.Z. Methods to avoid over-fitting and under-fitting in supervised machine learning (comparative study). Comput. Sci. Commun. Instrum. Devices 2015, 70, 978–981.
75. Dietterich, T. Overfitting and undercomputing in machine learning. ACM Comput. Surv. (CSUR) 1995, 27, 326–327.
76. Samek, W.; Stanczak, S.; Wiegand, T. The convergence of machine learning and communications. arXiv 2017, arXiv:1708.08299.
77. Tariq, H.I.; Sohail, A.; Aslam, U.; Batcha, N.K. Loan default prediction model using sample, explore, modify, model, and assess (SEMMA). J. Comput. Theor. Nanosci. 2019, 16, 3489–3503.
78. Iantovics, L.B.; Dehmer, M.; Emmert-Streib, F. MetrIntSimil—An accurate and robust metric for comparison of similarity in intelligence of any number of cooperative multiagent systems. Symmetry 2018, 10, 48.
79. Iantovics, L.B. Black-box-based mathematical modelling of machine intelligence measuring. Mathematics 2021, 9, 681.
80. Bisong, E. Google Colaboratory. In Building Machine Learning and Deep Learning Models on Google Cloud Platform: A Comprehensive Guide for Beginners; Apress: Berkeley, CA, USA, 2019; pp. 59–64.
81. McKnight, P.E.; Najab, J. Mann-Whitney U Test. In The Corsini Encyclopedia of Psychology; John Wiley & Sons: Hoboken, NJ, USA, 2010; p. 1.
Figure 1. The illustration depicts a workflow diagram for the proposed methodology. This workflow encompasses the input dataset and advanced algorithms for HPO, which are assessed using performance and sustainability metrics. Ultimately, high-performance artificial intelligence (AI) models are generated, or alternatively, more sustainable models are produced, facilitating the development of eco-friendly AI systems.
Figure 2. The CO2e metric results for the 100 trials conducted by Red AI and Green AI are depicted for each algorithm. The visualization highlights the best results in terms of the CO2e metric optimization within the Green AI strategy and Red AI. The outliers represented are combinations of hyperparameters that result in solutions with high energy consumption and, consequently, with a high value of carbon dioxide equivalent. Outliers resulting from the experimentation are also presented.
Figure 3. The graph provides a visual representation of the distribution of the CO2e metric within the context of Green and Red artificial intelligence (AI) strategies. The results offer an in-depth view of the data density, depicted on both sides of an axis, thereby facilitating a more comprehensive understanding of the data distribution. Additionally, the figure displays the median, quartiles, and outliers of the data, offering an all-encompassing view of this metric’s distribution across both AI strategies. Outliers are combinations of hyperparameters that offer atypical solutions compared to other combinations.
Figure 4. The graph provides a visual representation of the distribution of the CO2e metric in the context of Green and Red artificial intelligence (AI) strategies for the IMDb dataset for the PBT and BOHB algorithms. The figure shows the median and quartiles, providing a comprehensive view of the distribution of this metric in both AI strategies. Outliers resulting from the experimentation are also presented.
Figure 5. The graph provides a visual representation of the distribution of the validation accuracy metric in the context of Green and Red artificial intelligence (AI) strategies for the IMDb dataset for the PBT and BOHB algorithms. The figure shows the median and quartiles, providing a comprehensive view of the distribution of this metric in both AI strategies.
Table 1. The table elucidates the findings of the experiment conducted within the Red AI. Ten trials were executed, showing the selection of the two best and worst results, each comprising five iterations of the ASHA, BOHB, HB, and PBT algorithms, each with distinct hyperparameters. The optimized hyperparameters included learning rates (lrs) of 0.001, 0.00125, and 0.0015; epochs ranging from 5 to 20 iterations; optimizers such as Adam (AD), RMSprop (Rprop), and SGD; and batch sizes of 16, 32, and 64.
Algorithm | Validation Accuracy | E Total (w) | Time (s) | lr | Epoch | Optimizer | Batch | CO2e (g/kWh)
ASHA | 47.2 | 3065.44 | 506.02 | 0.0015 | 5 | AD | 16 | 0.582
ASHA | 68.68 | 3300.40 | 501.50 | 0.00125 | 5 | Rprop | 16 | 0.627
ASHA | 78.91 | 8729.97 | 1188.48 | 0.0015 | 20 | AD | 32 | 1.659
ASHA | 80.11 | 1433.52 | 206.57 | 0.001 | 5 | Rprop | 64 | 0.272
BOHB | 67.14 | 2088.49 | 299.37 | 0.001 | 5 | SGD | 32 | 0.397
BOHB | 68.78 | 6096.27 | 787.88 | 0.00125 | 20 | SGD | 64 | 1.158
BOHB | 79.15 | 8215.46 | 1154.96 | 0.001 | 20 | Rprop | 32 | 1.561
BOHB | 81.63 | 6211.69 | 795.02 | 0.00125 | 20 | Rprop | 64 | 1.180
HB | 45.45 | 4412.82 | 504.52 | 0.0015 | 5 | AD | 16 | 0.838
HB | 67.02 | 18,285.95 | 1967.38 | 0.00125 | 20 | AD | 16 | 3.474
HB | 79.61 | 3414.21 | 376.82 | 0.001 | 10 | Rprop | 64 | 0.649
HB | 80.06 | 882.07 | 231.42 | 0.001 | 10 | Rprop | 64 | 0.168
PBT | 51.63 | 3734.77 | 505.15 | 0.0015 | 5 | AD | 16 | 0.710
PBT | 63.92 | 3206.16 | 474.56 | 0.00125 | 5 | SGD | 16 | 0.609
PBT | 78.84 | 2020.27 | 261.97 | 0.001 | 5 | Rprop | 64 | 0.384
PBT | 80.32 | 6119.05 | 782.53 | 0.001 | 20 | AD | 64 | 1.163
Table 2. Comparative performance of algorithms in Red AI with 10 trials (CO2e in g/kWh; validation accuracy in %).
Algorithm | CO2e Std | CO2e Var | CO2e Avg | CO2e Median | Val Acc Std | Val Acc Var | Val Acc Avg | Val Acc Median
ASHA | 0.539 | 0.290 | 0.926 | 0.868 | 9.486 | 80.984 | 72.036 | 73.915
BOHB | 0.578 | 0.334 | 1.218 | 1.169 | 4.701 | 19.886 | 75.851 | 77.420
HB | 1.132 | 1.281 | 1.563 | 0.738 | 10.238 | 94.326 | 70.956 | 70.280
PBT | 0.447 | 0.200 | 0.854 | 1.222 | 9.078 | 74.169 | 70.502 | 71.970
Table 3. Comparative performance of algorithms in Red AI with 100 trials (CO2e in g/kWh; validation accuracy in %).
Algorithm | CO2e Std | CO2e Var | CO2e Avg | CO2e Median | Val Acc Std | Val Acc Var | Val Acc Avg | Val Acc Median
ASHA | 0.770 | 0.592 | 1.035 | 0.819 | 7.806 | 60.930 | 72.614 | 74.585
BOHB | 0.703 | 0.494 | 0.969 | 0.763 | 8.454 | 71.477 | 72.687 | 75.850
HB | 0.864 | 0.746 | 1.202 | 0.993 | 5.392 | 29.072 | 74.555 | 76.045
PBT | 0.795 | 0.633 | 1.156 | 0.958 | 8.179 | 66.888 | 72.255 | 74.395
Table 4. The table elucidates the findings of the experiment conducted within the Green AI. A total of ten trials were executed, showing the selection of the two best and worst results, each comprising five iterations of the ASHA, BOHB, HB, and PBT algorithms, each with distinct hyperparameters. The optimized hyperparameters included learning rates (lrs) of 0.001, 0.00125, and 0.0015; epochs ranging from 5 to 20 iterations; optimizers such as Adam (AD), RMSprop (Rprop), and SGD; and batch sizes of 16, 32, and 64.
Algorithm | Validation Accuracy | E Total (w) | Time (s) | lr | Epoch | Optimizer | Batch | CO2e (g/kWh)
ASHA | 57.11 | 5419.03 | 1462.94 | 0.0015 | 20 | SGD | 16 | 1.030
ASHA | 65.64 | 4169.07 | 515.17 | 0.00125 | 5 | AD | 16 | 0.792
ASHA | 79.44 | 5694.30 | 732.91 | 0.001 | 20 | Rprop | 64 | 1.082
ASHA | 80.08 | 5700.88 | 739.54 | 0.0015 | 20 | AD | 64 | 1.083
BOHB | 64.27 | 2360.44 | 446.59 | 0.0015 | 5 | SGD | 16 | 0.448
BOHB | 68.81 | 1112.85 | 182.34 | 0.001 | 5 | SGD | 64 | 0.211
BOHB | 80.34 | 2646.50 | 372.06 | 0.00125 | 10 | AD | 64 | 0.503
BOHB | 81.11 | 4827.45 | 720.88 | 0.001 | 20 | Rprop | 64 | 0.917
HB | 39.53 | 13,085.63 | 1965.14 | 0.0015 | 20 | AD | 16 | 2.486
HB | 65.31 | 851.26 | 147.24 | 0.0015 | 5 | SGD | 64 | 0.162
HB | 78.82 | 7143.64 | 1077.33 | 0.00125 | 20 | AD | 32 | 1.357
HB | 80.23 | 1254.59 | 325.38 | 0.0015 | 10 | Rprop | 64 | 0.238
PBT | 69.01 | 1274.76 | 180.78 | 0.001 | 5 | SGD | 64 | 0.242
PBT | 72.35 | 3199.51 | 480.07 | 0.001 | 5 | Rprop | 16 | 0.608
PBT | 79.58 | 1390.35 | 279.76 | 0.0015 | 10 | AD | 64 | 0.264
PBT | 80.22 | 2295.46 | 358.83 | 0.0015 | 10 | Rprop | 64 | 0.436
Table 5. Comparative performance of algorithms in Green AI with 10 trials (CO2e in g/kWh; validation accuracy in %).
Algorithm | CO2e Std | CO2e Var | CO2e Avg | CO2e Median | Val Acc Std | Val Acc Var | Val Acc Avg | Val Acc Median
ASHA | 0.565 | 0.287 | 1.059 | 0.932 | 7.538 | 51.133 | 72.750 | 74.170
BOHB | 0.584 | 0.307 | 0.794 | 0.710 | 5.717 | 29.415 | 74.644 | 75.930
HB | 0.723 | 0.470 | 0.811 | 0.549 | 12.357 | 137.427 | 72.323 | 78.210
PBT | 0.830 | 0.621 | 0.827 | 0.522 | 3.536 | 11.251 | 76.788 | 77.630
Table 6. Comparative performance of algorithms in Green AI with 100 trials (CO2e in g/kWh; validation accuracy in %).
Algorithm | CO2e Std | CO2e Var | CO2e Avg | CO2e Median | Val Acc Std | Val Acc Var | Val Acc Avg | Val Acc Median
ASHA | 0.657 | 0.432 | 0.959 | 0.723 | 9.818 | 96.391 | 71.746 | 75.825
BOHB | 0.512 | 0.263 | 0.809 | 0.753 | 8.457 | 71.528 | 72.924 | 75.915
HB | 0.578 | 0.334 | 0.870 | 0.748 | 9.258 | 85.718 | 71.799 | 74.350
PBT | 0.569 | 0.323 | 0.846 | 0.644 | 6.368 | 40.553 | 73.918 | 75.905
Table 7. The table elucidates the findings of the experiment conducted within the Red AI. A total of thirty trials were executed, each comprising three iterations of the PBT algorithm, each with distinct hyperparameters. The optimized hyperparameters included maximum features ranging from 10,000 to 30,000; epochs ranging from 2 to 10 iterations; optimizers such as Adam (AD), RMSprop (Rprop), and SGD; and batch sizes of 32, 64, and 128.
Validation Accuracy | E Total (w) | Time (s) | Max Features | Max Len | Epoch | Optimizer | Batch | CO2e (g/kWh)
49.71 | 37,433.92 | 1104.51 | 30,000 | 200 | 5 | SGD | 128 | 7.112
50.14 | 11,500.53 | 308.01 | 20,000 | 300 | 2 | SGD | 64 | 2.185
50.37 | 784.58 | 1190.19 | 30,000 | 200 | 5 | SGD | 128 | 0.149
50.44 | 1465.32 | 876.20 | 10,000 | 300 | 2 | SGD | 128 | 0.278
50.60 | 12,907.44 | 362.60 | 10,000 | 300 | 2 | SGD | 32 | 2.452
50.71 | 177.67 | 331.36 | 30,000 | 300 | 2 | SGD | 64 | 0.034
51.14 | 43,252.82 | 1531.88 | 10,000 | 200 | 5 | SGD | 128 | 8.218
51.59 | 78.84 | 108.74 | 20,000 | 100 | 2 | SGD | 64 | 0.015
51.89 | 526.51 | 529.39 | 30,000 | 200 | 5 | SGD | 64 | 0.100
53.49 | 36,301.74 | 1290.10 | 10,000 | 200 | 10 | SGD | 64 | 6.897
54.80 | 20,993.73 | 692.94 | 10,000 | 100 | 10 | SGD | 32 | 3.989
54.90 | 1447.49 | 634.36 | 30,000 | 100 | 10 | SGD | 32 | 0.275
82.78 | 69.70 | 160.30 | 30,000 | 100 | 2 | Rprop | 32 | 0.013
83.39 | 36.56 | 120.64 | 30,000 | 100 | 2 | Rprop | 64 | 0.007
83.98 | 59.84 | 223.55 | 20,000 | 200 | 2 | AD | 64 | 0.011
84.53 | 3622.88 | 122.31 | 30,000 | 100 | 2 | Rprop | 128 | 0.688
84.98 | 168.72 | 221.93 | 30,000 | 100 | 5 | AD | 64 | 0.032
85.27 | 58.44 | 118.89 | 20,000 | 100 | 2 | AD | 64 | 0.011
85.28 | 244.22 | 458.85 | 20,000 | 100 | 5 | Rprop | 128 | 0.046
85.29 | 12,772.85 | 446.51 | 10,000 | 100 | 5 | Rprop | 128 | 2.427
85.32 | 203.98 | 575.88 | 10,000 | 100 | 10 | AD | 32 | 0.039
85.51 | 20,094.64 | 739.76 | 20,000 | 100 | 10 | Rprop | 64 | 3.818
85.53 | 17,832.91 | 617.82 | 30,000 | 200 | 2 | Rprop | 128 | 3.388
86.46 | 58,798.03 | 1791.48 | 10,000 | 300 | 5 | AD | 128 | 11.172
86.95 | 9827.10 | 3291.42 | 20,000 | 200 | 10 | AD | 128 | 1.867
87.46 | 709.49 | 804.46 | 30,000 | 200 | 5 | Rprop | 64 | 0.135
87.80 | 87,803.41 | 3275.55 | 30,000 | 200 | 10 | Rprop | 128 | 16.683
88.12 | 21,998.02 | 603.18 | 10,000 | 200 | 5 | Rprop | 32 | 4.180
88.36 | 7146.22 | 1923.02 | 10,000 | 300 | 10 | Rprop | 64 | 1.358
88.44 | 65,218.35 | 1934.61 | 10,000 | 300 | 10 | AD | 32 | 12.391
Table 8. The table elucidates the findings of the experiment conducted within the Green AI. A total of thirty trials were executed, each comprising three iterations of the PBT algorithm, each with distinct hyperparameters. The optimized hyperparameters included maximum features ranging from 10,000 to 30,000; epochs ranging from 2 to 10 iterations; optimizers such as Adam (AD), RMSprop (Rprop), and SGD; and batch sizes of 32, 64, and 128.
Validation Accuracy | E Total (w) | Time (s) | Max Features | Max Len | Epoch | Optimizer | Batch | CO2e (g/kWh)
49.52 | 33.13 | 181.02 | 30,000 | 200 | 2 | SGD | 64 | 0.006
49.92 | 2228.96 | 209.74 | 30,000 | 200 | 2 | SGD | 32 | 0.424
49.95 | 5542.27 | 515.52 | 30,000 | 200 | 2 | SGD | 128 | 1.053
50.52 | 764.49 | 73.74 | 30,000 | 100 | 2 | SGD | 64 | 0.145
50.65 | 208.67 | 325.90 | 30,000 | 300 | 2 | SGD | 64 | 0.040
50.65 | 28.73 | 316.55 | 30,000 | 300 | 2 | SGD | 64 | 0.005
50.66 | 48.50 | 177.30 | 10,000 | 200 | 2 | SGD | 64 | 0.009
50.82 | 1911.84 | 178.27 | 20,000 | 200 | 2 | SGD | 64 | 0.363
51.36 | 26.67 | 205.24 | 20,000 | 200 | 2 | SGD | 32 | 0.005
52.11 | 22,179.18 | 2030.80 | 20,000 | 200 | 10 | SGD | 128 | 4.214
52.48 | 165.03 | 977.13 | 30,000 | 200 | 10 | SGD | 64 | 0.031
52.67 | 3676.45 | 334.12 | 10,000 | 100 | 5 | SGD | 128 | 0.699
54.17 | 18,966.34 | 1726.26 | 30,000 | 300 | 10 | SGD | 32 | 3.604
83.39 | 999.66 | 94.38 | 10,000 | 100 | 2 | Rprop | 32 | 0.190
84.64 | 125.26 | 341.75 | 30,000 | 100 | 5 | AD | 128 | 0.024
84.69 | 2871.99 | 272.41 | 10,000 | 100 | 5 | AD | 32 | 0.546
84.9 | 345.59 | 968.24 | 30,000 | 300 | 2 | Rprop | 128 | 0.066
85.22 | 66.89 | 619.44 | 30,000 | 100 | 10 | AD | 32 | 0.013
85.32 | 118.55 | 343.55 | 20,000 | 100 | 5 | AD | 128 | 0.023
85.58 | 14.89 | 99.98 | 30,000 | 100 | 2 | Rprop | 32 | 0.003
85.66 | 33.33 | 277.84 | 10,000 | 100 | 5 | Rprop | 32 | 0.006
86.16 | 24,388.97 | 2251.25 | 20,000 | 300 | 5 | AD | 128 | 4.634
86.29 | 12,543.62 | 1123.19 | 30,000 | 200 | 10 | AD | 32 | 2.383
86.38 | 330.43 | 556.33 | 20,000 | 200 | 5 | AD | 32 | 0.063
87.15 | 485.82 | 1190.38 | 10,000 | 200 | 5 | AD | 128 | 0.092
87.31 | 21,353.57 | 1926.95 | 30,000 | 300 | 5 | AD | 128 | 4.057
87.44 | 48.23 | 498.27 | 30,000 | 200 | 5 | Rprop | 64 | 0.009
87.7 | 40,846.20 | 3690.75 | 30,000 | 300 | 10 | AD | 128 | 7.761
87.77 | 16,337.37 | 1492.16 | 30,000 | 300 | 10 | Rprop | 64 | 3.104
87.81 | 122.23 | 490.41 | 10,000 | 200 | 5 | Rprop | 64 | 0.023
Table 9. The table elucidates the findings of the experiment conducted within the Red AI. A total of thirty trials were executed, each comprising three iterations of the BOHB algorithm, each with distinct hyperparameters. The optimized hyperparameters included maximum features ranging from 10,000 to 30,000; epochs ranging from 2 to 10 iterations; optimizers such as Adam (AD), RMSprop (Rprop), and SGD; and batch sizes of 32, 64, and 128.
Validation Accuracy | E Total (w) | Time (s) | Max Features | Max Len | Epoch | Optimizer | Batch | CO2e (g/kWh)
50.20 | 30,212.11 | 1369.62 | 10,000 | 2000 | 5 | SGD | 32 | 5.74
50.98 | 4956.48 | 156.27 | 30,000 | 300 | 5 | SGD | 64 | 0.942
51.38 | 1507.00 | 47.21 | 10,000 | 100 | 2 | SGD | 64 | 0.286
51.80 | 20,541.84 | 713.15 | 30,000 | 2000 | 5 | SGD | 64 | 3.903
51.84 | 1859.32 | 56.88 | 20,000 | 100 | 5 | SGD | 128 | 0.353
52.30 | 4861.84 | 122.80 | 20,000 | 300 | 5 | SGD | 128 | 0.924
52.36 | 32,433.27 | 1471.54 | 10,000 | 2000 | 5 | SGD | 32 | 6.162
52.58 | 43,899.18 | 1333.07 | 20,000 | 2000 | 10 | SGD | 64 | 8.341
84.82 | 4379.05 | 117.07 | 30,000 | 100 | 10 | AD | 128 | 0.832
85.28 | 4886.98 | 143.89 | 20,000 | 100 | 10 | Rprop | 64 | 0.929
85.39 | 2465.77 | 95.36 | 10,000 | 100 | 5 | Rprop | 64 | 0.468
86.17 | 10,544.97 | 337.84 | 20,000 | 300 | 10 | AD | 64 | 2.004
86.28 | 9402.99 | 343.67 | 20,000 | 2000 | 2 | AD | 64 | 1.787
86.84 | 9573.21 | 308.11 | 20,000 | 300 | 5 | AD | 32 | 1.819
86.92 | 6268.75 | 207.86 | 10,000 | 300 | 5 | AD | 64 | 1.191
86.95 | 16,685.24 | 616.88 | 20,000 | 300 | 10 | AD | 32 | 3.17
86.97 | 14,972.11 | 661.55 | 20,000 | 2000 | 2 | AD | 32 | 2.845
87.39 | 21,227.97 | 981.87 | 30,000 | 2000 | 10 | AD | 128 | 4.033
87.44 | 32,236.47 | 971.46 | 20,000 | 2000 | 10 | AD | 128 | 6.125
87.48 | 11,062.88 | 330.84 | 20,000 | 2000 | 2 | AD | 64 | 2.102
87.50 | 14,197.34 | 492.86 | 10,000 | 2000 | 5 | AD | 128 | 2.697
87.55 | 13,350.97 | 509.96 | 30,000 | 2000 | 5 | AD | 128 | 2.537
87.68 | 5943.76 | 180.63 | 30,000 | 300 | 5 | Rprop | 64 | 1.129
87.80 | 6808.25 | 213.08 | 10,000 | 2000 | 2 | AD | 128 | 1.294
87.95 | 2323.36 | 74.71 | 30,000 | 300 | 2 | AD | 128 | 0.441
87.96 | 37,971.15 | 1356.75 | 10,000 | 2000 | 5 | AD | 32 | 7.215
88.08 | 60,120.23 | 2608.80 | 10,000 | 2000 | 10 | AD | 32 | 11.423
88.14 | 21,655.02 | 715.80 | 10,000 | 2000 | 5 | Rprop | 64 | 4.114
88.15 | 8141.40 | 268.05 | 10,000 | 300 | 5 | Rprop | 32 | 1.547
88.74 | 68,038.37 | 2610.30 | 30,000 | 2000 | 10 | Rprop | 32 | 12.927
Table 10. The table elucidates the findings of the experiment conducted within the Green AI. A total of thirty trials were executed, each comprising three iterations of the BOHB algorithm, each with distinct hyperparameters. The optimized hyperparameters included maximum features ranging from 10,000 to 30,000; epochs ranging from 2 to 10 iterations; optimizers such as Adam (AD), RMSprop (Rprop), and SGD; and batch sizes of 32, 64, and 128.
Validation Accuracy | E Total (w) | Time (s) | Max Features | Max Len | Epoch | Optimizer | Batch | CO2e (g/kWh)
50.71 | 1239.15 | 47.43 | 10,000 | 100 | 2 | SGD | 64 | 0.235
51.06 | 6568.25 | 269.55 | 20,000 | 300 | 5 | SGD | 32 | 1.248
52.02 | 16,534.06 | 494.55 | 10,000 | 300 | 10 | SGD | 32 | 3.141
52.32 | 7881.13 | 280.74 | 10,000 | 300 | 5 | SGD | 32 | 1.497
52.33 | 4198.72 | 104.58 | 10,000 | 300 | 5 | SGD | 128 | 0.798
52.54 | 39,271.67 | 1264.49 | 30,000 | 2000 | 5 | SGD | 32 | 7.462
53.27 | 4319.18 | 132.45 | 20,000 | 100 | 10 | SGD | 64 | 0.821
55.32 | 3281.47 | 139.56 | 10,000 | 100 | 10 | SGD | 64 | 0.623
56.2 | 64,652.37 | 2390.88 | 10,000 | 2000 | 10 | SGD | 32 | 12.284
80.45 | 1277.27 | 43.1 | 20,000 | 100 | 2 | Rprop | 128 | 0.243
84.25 | 2651.41 | 141.34 | 30,000 | 100 | 5 | Rprop | 32 | 0.504
84.38 | 2081.01 | 59.62 | 10,000 | 100 | 5 | Rprop | 128 | 0.395
84.49 | 1890.48 | 76.92 | 10,000 | 100 | 2 | AD | 32 | 0.359
84.63 | 4143.16 | 154.09 | 20,000 | 100 | 10 | AD | 64 | 0.787
85.12 | 2231.29 | 70.12 | 10,000 | 100 | 5 | AD | 128 | 0.424
85.19 | 3435.22 | 98.1 | 30,000 | 100 | 10 | Rprop | 128 | 0.653
85.28 | 1846.09 | 60.68 | 30,000 | 300 | 2 | Rprop | 128 | 0.351
85.38 | 3129.63 | 144.08 | 20,000 | 100 | 5 | Rprop | 32 | 0.595
86.23 | 2160.13 | 73.28 | 30,000 | 300 | 2 | AD | 128 | 0.41
86.78 | 5235.87 | 133.2 | 30,000 | 300 | 5 | AD | 128 | 0.995
86.82 | 3632.52 | 129.11 | 10,000 | 300 | 2 | Rprop | 32 | 0.69
86.94 | 2044.75 | 57.46 | 10,000 | 300 | 2 | Rprop | 128 | 0.389
87.15 | 22,058.11 | 918.94 | 30,000 | 2000 | 10 | Rprop | 128 | 4.191
87.18 | 11,013 | 316 | 30,000 | 2000 | 2 | AD | 64 | 2.092
87.22 | 11,889.38 | 303.38 | 10,000 | 300 | 10 | AD | 64 | 2.259
87.28 | 4541.17 | 165.07 | 30,000 | 300 | 5 | Rprop | 64 | 0.863
87.32 | 26,534.06 | 936.43 | 10,000 | 2000 | 10 | AD | 128 | 5.041
87.44 | 15,022.42 | 464.5 | 30,000 | 2000 | 5 | Rprop | 128 | 2.854
87.45 | 3314.13 | 105.16 | 30,000 | 300 | 2 | AD | 64 | 0.63
87.55 | 46,168.17 | 1370.24 | 30,000 | 2000 | 10 | Rprop | 64 | 8.772