1. Introduction
Currently, data centres and high-performance computing infrastructures massively contribute to climate change, emitting over 100 megatons of CO2 per year—a figure comparable to that of American commercial aviation [1]. Despite the exponentially increasing computational and energy demands of modern machine learning (ML), most research articles in ML do not regularly report energy metrics or CO2 emissions [2].
Within the field of Computational Intelligence, evolutionary algorithms (EAs) are population-based optimisation algorithms that mimic natural biological evolution to find high-quality solutions to complex problems. They apply evolutionary principles to iteratively improve a set of candidate solutions through selection and variation. Even though EAs are not as computationally massive as Large Language Models (LLMs), they are widely applied in network optimisation, hyperparameter tuning, drug design, and industrial simulation [3]. Improving performance has been the main motivation behind research efforts in the domain of EAs for decades [4,5,6]. Research works are usually evaluated using two metrics: the quality of the solution to the problem and the time to convergence.
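As a concrete illustration of the selection-and-variation loop described above, the following minimal Python sketch evolves bitstrings on the OneMax problem (maximising the number of ones). It is a generic generational EA with illustrative parameter values, not the implementation of any of the frameworks evaluated later.

```python
import random

def one_max(bits):
    """OneMax fitness: the number of ones in the chromosome."""
    return sum(bits)

def evolve(n_bits=32, pop_size=20, generations=100, p_mut=0.05, seed=1):
    """Minimal generational EA: binary tournament selection, one-point
    crossover and bit-flip mutation. A sketch of the general scheme only."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        def select():
            # Selection: keep the better of two random individuals
            a, b = rng.sample(pop, 2)
            return a if one_max(a) >= one_max(b) else b
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randrange(1, n_bits)             # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Variation: flip each gene with probability p_mut
            child = [1 - g if rng.random() < p_mut else g for g in child]
            offspring.append(child)
        pop = offspring
    return max(one_max(ind) for ind in pop)
```

The stochasticity is explicit here: a different `seed` yields a different run, which is why the paper's experiments require many repetitions per configuration.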
Two distinctive characteristics of evolutionary algorithms exacerbate the problem: their stochastic nature forces many iterations to be run to achieve robust results, and hyperparameter exploration multiplies the number of executions. Moreover, some studies reveal that handheld devices such as the Raspberry Pi and tablets require an order of magnitude less energy to run evolutionary algorithms than standard computers such as laptops and iMacs, and even parameters such as the population size (the number of candidate solutions in the evolutionary process) have an impact on the energy requirements [7].
Consequently, it is essential to bring the energy awareness that has already been discussed in the context of machine learning [8] to the specific domain of evolutionary algorithms. In this specific case, two main questions must be considered. First, what is the difference in consumption when running the same EA on hardware devices with different computing characteristics, with the same configuration and stopping criterion? Second, how does this consumption relate to the quality of the solution obtained and to the internal configuration of the algorithm itself?
The following research questions (RQs) arise from these issues: RQ 1 (hardware-energy). Does a server consume significantly more energy than a laptop when running the same EA with identical settings and a time-based stopping criterion? RQ 2 (setting-efficiency). How do the algorithm parameters impact energy efficiency? Furthermore, is this effect consistent across the assessed frameworks? RQ 3 (framework-performance). Are compiled frameworks (C++ and Java) more energy efficient than interpreted ones (Python) when solving identical problems with EAs?
The rest of the paper is organised into six sections. Section 2 reviews the state of the art on the energy footprint of EAs and current monitoring tools. Section 3 details the experimental methodology, including the benchmarks chosen, the frameworks compared, the common genetic settings, the execution environments, and the energy consumption instrumentation. Section 4 outlines the quantitative results, supported by statistical and graphical analysis. In Section 5, the implications and limitations of these findings are discussed in light of the research questions posed and the previous literature. Finally, the main conclusions are summarised and suggestions for future research lines are provided in Section 6.
2. Background
The concept of green computing has emerged in the last decade to address the issue of energy consumption, which is inextricably linked to computing, especially in large infrastructures such as data centres [7]. This growing interest is particularly evident in large-scale infrastructures, where increasing physical resources inevitably leads to higher energy consumption [9].
The relevance of the energy issue in evolutionary computation has grown considerably. Henderson et al. [2] pointed out that, with the compute and energy demands of modern machine learning methods growing exponentially, ML systems have the potential to contribute significantly to carbon emissions. However, the majority of research articles do not regularly report energy or carbon emission metrics. Data centres and high-performance computing infrastructure contribute 100 megatons of CO2 emissions per year, similar to US commercial aviation [1].
In this context, EAs exhibit characteristics that exacerbate the problem: their stochastic nature forces many iterations to be run to achieve robust results, and hyperparameter exploration multiplies the number of executions.
First, the quantification of energy consumption in EAs has evolved from descriptive studies to standardised measurement protocols. Ref. [7] carried out the first cross-platform trials and demonstrated that handheld devices such as the Raspberry Pi and tablets require an order of magnitude less energy to run the same EA compared to standard computers such as laptops and iMacs, while population size also has a significant impact on energy requirements.
Notwithstanding, there has been significant advancement in energy measurement instrumentation: ref. [10] implemented considerably precise measurements using the Pinpoint tool. This tool accesses the RAPL API and facilitates the collection of consistent measurements of the energy consumption of the system while a process is running, achieving 5% precision with hardware counters. However, ref. [11] warns that accuracy depends on the instrumented code accounting for a significant share of CPU time; otherwise, system 'noise' may mask subtle differences. Furthermore, the influence of the programming language on energy consumption has generated extensive research with unexpected results. Ref. [12] established that there is no clear category of superior languages, because performance varies according to the specific operation, although Java is almost always among the fastest, along with C and Go.
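On Linux, the RAPL counters that tools such as Pinpoint build on can also be read directly through the powercap sysfs interface. The sketch below assumes an Intel CPU exposing `/sys/class/powercap/intel-rapl:0` and ignores counter wrap-around; it illustrates the measurement principle only and is not the instrumentation used in [10].

```python
RAPL_FILE = "/sys/class/powercap/intel-rapl:0/energy_uj"

def read_rapl_uj(path=RAPL_FILE):
    """Read the cumulative package energy counter, in microjoules."""
    with open(path) as f:
        return int(f.read())

def measure_joules(fn, counter=read_rapl_uj):
    """Return (result, joules consumed system-wide) for one call of fn.
    Counter wrap-around is ignored for brevity."""
    before = counter()
    result = fn()
    after = counter()
    return result, (after - before) / 1e6
```

Because RAPL reports package-level (system-wide) consumption, the share attributable to the measured process is only meaningful when that process dominates CPU time, which is exactly the caveat raised by [11].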
Secondly, comparative experiments convey additional complexities: interpreted languages such as PHP may outperform C++ in particular operations, blurring the line between fast compiled languages and slow scripting languages [12]. More recent studies have delved deeper into this research line using more accurate methodologies. Merelo et al. [10] evaluated three representative languages at different levels of abstraction using computationally intensive fitness functions, such as HIFF (Hierarchical If and only IF). Their results showed that Kotlin, which compiles to JVM bytecode, achieved the best energy efficiency at 131.02 operations per Joule, followed by Zig (114.40 ops/J) as a native, compiled low-level language, and JavaScript using bun as an interpreted language (93.92 ops/J). These findings showed that the differences in energy consumption can be small enough to fall below the variation across generations or chromosomes in specific scenarios, suggesting that optimisations at the compiler and virtual machine level are more decisive than the language's abstraction level.
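For reference, HIFF fits in a few lines. The recursive Python version below follows the standard definition (every block of equal bits contributes its size, summed over all hierarchical levels); it is a sketch, not the code benchmarked in [10].

```python
def hiff(bits):
    """Hierarchical If-and-only-If fitness.

    `bits` must have a power-of-two length. A block contributes its
    size when all of its bits are equal; the total score sums the
    contributions over every level of the recursion.
    """
    n = len(bits)
    if n == 1:
        return 1
    half = n // 2
    block = n if all(b == bits[0] for b in bits) else 0
    return block + hiff(bits[:half]) + hiff(bits[half:])
```

For a fully uniform string of length 2^k the score is (k+1)·2^k (for instance, a string of eight ones scores 32), and the number of block checks grows quickly with the string length, which is what makes HIFF computationally intensive enough to expose language-level differences.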
Furthermore, the granularity of the implementation also significantly influences energy consumption. The authors of [13] documented that the use of fixed-size data structures consumes approximately one-eighth of the energy required by variable-size structures such as dynamic arrays. Moreover, experiments with transprecision computing using customised 8- and 16-bit formats reduced energy consumption by up to 30%.
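The gap between representations can be illustrated with Python's standard library: a list stores one pointer per gene to a boxed object, whereas a typed `array` stores one fixed-size element per gene. This sketch only shows the memory difference, one driver of the energy gap reported in [13]; it is not the instrumentation used there.

```python
import array
import sys

N = 100_000

# Variable-size, boxed storage: a list of Python int objects
flexible = [0] * N

# Fixed-size, unboxed storage: one byte per gene (8-bit "transprecision")
fixed = array.array('B', bytes(N))

list_bytes = sys.getsizeof(flexible)   # list header + N pointers (int objects not counted)
array_bytes = sys.getsizeof(fixed)     # array header + N single bytes
```

On CPython, `array_bytes` is roughly an order of magnitude smaller than `list_bytes`, and the smaller, contiguous layout also improves cache behaviour, which is one plausible mechanism behind the reported energy savings.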
Thirdly, the explicit consideration of energy as an optimisation objective has evolved from specific applications to generalised, multi-objective approaches. Ref. [14] developed a simulation–optimisation approach using evolutionary programming for the effective management of the energy used by HVAC systems, achieving 7% energy savings compared to existing operational settings.
Lee et al. [15] extended this approach and demonstrated that a Differential Evolution algorithm identified combinations of parallelisation that, at low loads (40%), reduce energy consumption by up to 23.4% compared to the Lagrangian method. More recent applications include the management of residential demand. In this context, ref. [16] proposed a multi-objective differential evolution that can simultaneously reduce costs and increase user comfort.
Finally, recent literature explores strategies that adjust algorithmic parameters during execution to optimise energy efficiency. Cotta et al. [11] demonstrated that the introduction of short pauses between executions can reduce energy consumption by up to 9%. Paradoxically, ref. [7] documented that population size can influence energy consumption, with larger populations sometimes requiring less total energy due to automated adjustments in the processor frequency. The use of surrogate models has shown a significant energy impact, as [17] reported that, in cache optimisation for embedded systems, a combination of NSGA-II and the Dinero IV simulator achieved an average reduction of 92% in kWh compared to an exhaustive search.
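The energy saving of surrogate models comes from replacing expensive true evaluations with cheap approximations fitted to an archive of past evaluations. The k-nearest-neighbour sketch below conveys the principle only; it is deliberately simple and is not the NSGA-II/Dinero IV setup of [17].

```python
import math

def true_fitness(x):
    """Negated Sphere function, standing in for an expensive evaluation."""
    return -sum(v * v for v in x)

def surrogate_fitness(x, archive, k=3):
    """Estimate fitness as the mean of the k nearest archived true
    evaluations, so most individuals never pay the full evaluation cost."""
    nearest = sorted((math.dist(x, ax), fx) for ax, fx in archive)[:k]
    return sum(fx for _, fx in nearest) / len(nearest)

# Archive a handful of true evaluations, then estimate new points cheaply.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
archive = [(p, true_fitness(p)) for p in points]
estimate = surrogate_fitness((0.1, 0.1), archive)
```

In a real surrogate-assisted EA, only a small fraction of individuals would be routed to `true_fitness`, and the archive would be refreshed with those evaluations; the energy saving scales with the fraction of evaluations avoided.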
4. Results and Analysis
The primary quantitative findings derived from the experimental analysis are presented below, organised by key metrics and concluding with the overall ranking of frameworks. For better generalisation, in this section we have grouped the results at the framework and benchmark levels.
4.1. Estimated Energy Consumption
Table 2 shows the average energy consumption on the laptop platform. ParadisEO emerges as the most energy-efficient framework, with an average consumption of kWh, and DEAP comes second with kWh. This closeness, with a difference of only , demonstrates that optimised implementations in Python can achieve energy efficiency comparable to native compilation in C++ under moderate computational loads. Conversely, ECJ shows the highest consumption (0.000373 kWh), indicating that the JVM overhead imposes a substantial performance penalty in environments with limited resources.
The energy hierarchy is completely reversed in the server architecture (see Table 3). With a consumption of kWh, ECJ emerges as the most efficient framework, leveraging the server's substantial processing capabilities and JIT optimisation to the fullest extent. The values of the frameworks are similar to each other (only 7.3% variation), in contrast to the 13.6% variation observed on the laptop. This shows that high-performance architectures can balance out differences in how languages are implemented.
The server/laptop scaling factors (see Table 4) reveal systematic patterns that defy common assumptions about computational efficiency. DEAP demonstrates the most aggressive scaling in OneMax (7.15×), but exhibits more moderate scaling in other benchmarks, resulting in an overall factor of 5.08×. ParadisEO shows the most balanced scaling, with an overall factor of 5.20× and relatively steady ratios ranging from 4.04× (Rosenbrock) to 6.90× (Sphere). ECJ yields the lowest overall scaling factor (4.58×), owing to its architecture being optimised for parallelism; this benefit is particularly evident in OneMax (4.17×). Inspyred maintains an intermediate scaling factor of 5.03×, with little variability between benchmarks. These results show that the server consistently uses 4–6 times more energy than the laptop, suggesting that there is no direct relationship between computational power and energy efficiency under the same time budget.
4.2. Computing Performance Measured by the Maximum Number of Generations
The computational performance on the laptop (see Table 5) highlights significant differences between the frameworks. ECJ dominates categorically with an average of 32,685 generations, reaching extraordinary peaks in Sphere (43,335) and Rosenbrock (45,158), where its compiled loop efficiency is fully demonstrated. ParadisEO holds a solid middle position with an average of 8700 generations, showing particular strength in Schwefel (11,626). Python frameworks are severely limited: Inspyred reaches an average of merely 964 generations, while DEAP achieves an average of just 975. This 33× disparity between ECJ and the Python frameworks reflects the fundamental differences between JIT-compiled bytecode and dynamic interpretation in loop-intensive applications.
The server (see Table 6) dramatically amplifies the computational differences. ECJ achieves an extraordinary average of 91,252 generations, with outstanding performance in the Rosenbrock test (112,902), demonstrating the superior scalability of the JVM in massively parallel environments. ParadisEO maintains a competitive position with 25,251 generations, with especially good results in Schwefel (35,284). The Python frameworks show modest improvements: DEAP achieves 2522 generations and Inspyred obtains 2392, representing accelerations of just 2.6× and 2.5×, respectively, compared to the laptop, while ECJ and ParadisEO achieve accelerations of 2.8× and 2.9×. This difference in scalability demonstrates that compiled frameworks are better suited to take advantage of parallel architectures.
The server/laptop acceleration ratios (Table 7) demonstrate the inherent scaling capabilities of each framework. ECJ leads in OneMax with a factor of 4.06× and shows solid, steady accelerations across all benchmarks. ParadisEO exhibits the most consistent acceleration, with factors between 2.40× and 3.38×, demonstrating a design robust to parallelisation. The Python frameworks show more modest acceleration: DEAP achieves factors between 2.5× and 2.7×, while Inspyred remains within a similar range. The superior scaling of the compiled frameworks (ECJ and ParadisEO) over the interpreted ones suggests that, when selecting a framework, it is important to consider not only absolute performance but also the ability to leverage additional resources.
4.3. Maximum Fitness Reached
All fitness results in this section have been normalised to the range 0 to 1, with 1 representing the optimal value.
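For reproducibility, this normalisation can be expressed as a standard min-max mapping. The bounds below (`worst`, `best`) are placeholders: the paper does not state the exact reference values used per benchmark, so this is an assumed formulation rather than the authors' code.

```python
def normalise(raw, worst, best):
    """Min-max map of a raw fitness value onto [0, 1], with 1 = optimum."""
    return (raw - worst) / (best - worst)

# Example: a raw error of 2.0 on a minimisation benchmark where 10.0 is
# the worst reference value and 0.0 the optimum (lower raw error is
# better, so the roles of the bounds are swapped accordingly).
score = normalise(2.0, worst=10.0, best=0.0)  # 0.8
```

Passing the benchmark's worst reference value as `worst` and its optimum as `best` handles both minimisation and maximisation problems with the same formula.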
Table 8 shows the average fitness values obtained on the laptop platform, which reveal that computational speed does not automatically translate into better search space exploration. ParadisEO leads with an overall fitness score of 0.797, demonstrating the effectiveness of its SBX operator and polynomial mutation, with exceptional convergence in Rosenbrock (0.993). Inspyred ranks closely behind with a score of 0.794, exhibiting outstanding performance in Rosenbrock (0.984), which confirms that well-designed Python implementations can compete in terms of quality. Despite its computational superiority, ECJ ranks third (0.785), demonstrating particular strength in Sphere (0.893). DEAP exhibits the most variability in fitness (0.756 overall), excelling in Sphere (0.856) but showing weakness in Schwefel (0.589).
As depicted in Table 9, on the server Inspyred emerges as the leader in convergence quality (with an overall score of 0.815), leveraging the greater computational capacity to execute more sophisticated exploration strategies. Its performance on Rosenbrock (0.991) and Sphere (0.867) is particularly noteworthy. ParadisEO maintains a competitive position with an overall score of 0.808 and consistent convergence across benchmarks. ECJ improves slightly with respect to the laptop (0.789), but still fails to fully capitalise on its computational advantage in terms of solution quality. DEAP demonstrates the most significant relative improvement (0.782), particularly in the Sphere benchmark (0.893), indicating that its architecture leverages the parallelism available on the server.
As shown in Table 10, the server/laptop fitness ratios demonstrate remarkable stability, with variations of less than 6% across all frameworks. This confirms that changes in architecture primarily affect execution speed without fundamentally altering the dynamics of the algorithmic search under the same time budget. DEAP exhibits the greatest variability, with ratios ranging from 1.01 to 1.06, suggesting architectural sensitivity in its convergence. Conversely, ParadisEO, ECJ and Inspyred demonstrate platform-independent algorithmic robustness with very stable ratios (1.01–1.03). This consistency across platforms is crucial for reproducibility, suggesting that high-quality findings obtained on one architecture can be transferred to another.
4.4. Efficiency Metric
Table 11 depicts the values of the efficiency metric computed on the laptop. This score reveals the real balance between solution quality and energy consumption. ParadisEO is the most efficient framework overall (2520.472 fitness/kWh), standing out particularly in Sphere (3087.423), where it combines excellent convergence with low energy use. ECJ ranks second in Sphere (3460.072), but performs worse in other benchmarks, resulting in modest overall efficiency (2230.042). Inspyred maintains competitive efficiency (2206.367), demonstrating particular strength in Rosenbrock (2615.694). DEAP exhibits significant variability, achieving the highest score in OneMax (2782.618), but performing poorly in Schwefel (1656.521). This dispersion between frameworks highlights that energy efficiency is not a uniform property, but depends critically on the type of problem and the specific implementation.
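From the units reported in the tables (fitness/kWh), the score can be assumed to be the ratio of the mean normalised fitness to the mean energy consumed across the runs of a configuration. The sketch below is an assumed reconstruction of that computation, not the paper's code, and the numbers in the example are illustrative rather than reported values.

```python
def efficiency(fitness_scores, energies_kwh):
    """Fitness-per-kWh efficiency over a set of runs.

    Assumes the fitness values are already normalised to [0, 1].
    """
    mean_fitness = sum(fitness_scores) / len(fitness_scores)
    mean_energy = sum(energies_kwh) / len(energies_kwh)
    return mean_fitness / mean_energy

# Illustrative (non-reported) numbers for three runs of one configuration:
score = efficiency([0.80, 0.78, 0.82], [0.00032, 0.00031, 0.00033])
```

Dividing by energy rather than time is what makes the score penalise the server later on: two platforms that reach the same fitness in the same wall-clock time can still differ several-fold in kWh.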
As shown in Table 12, the compression of efficiency differences on the server is dramatic, with only 8% variation between frameworks. ParadisEO retains first place (471.511 fitness/kWh), excelling particularly in Rosenbrock (600.927), where its genetic operators make optimal use of the available parallelism. ECJ ranks second with a score of 462.718, demonstrating consistent performance across benchmarks. Inspyred (447.962) and DEAP (434.594) exhibit comparable efficiency, primarily penalised in Schwefel. This convergence on the server suggests that massively parallel architectures homogenise efficiency, reducing the specific competitive advantages of each implementation and rendering framework selection less critical from an energy perspective.
Table 13 depicts the laptop/server energy efficiency ratios, which reveal the 'power paradox': the laptop is consistently 4–7 times more efficient than the server. DEAP presents the most extreme ratios, suggesting that its architecture is particularly susceptible to the server's overhead. ParadisEO shows more consistent ratios (4.07–6.78×), averaging 5.346×, demonstrating its ability to adapt across platforms. ECJ exhibits the most stable ratios (4.078–6.4×), benefiting from architectural optimisations that partially offset the server's energy penalty. This paradox highlights a key trade-off: although the server can accelerate computation by 2.5–3 times, measured in the number of generations completed in the same amount of time, it reduces energy efficiency by 4–7 times, raising questions about the sustainability of high-performance computing for evolutionary algorithms.
Figure 1, Figure 2, Figure 3 and Figure 4 graphically depict the results of the efficiency metric for each benchmark (OneMax, Sphere, Rosenbrock, and Schwefel) and all the experimental configurations carried out in this research work. In all graphs, the X-axis represents the individual experimental configurations and the Y-axis the metric value. As stated previously, there are evident differences between server and laptop results, but population size is also a critical factor: Fernández de Vega et al. [7] detected a general trend of increasing energy costs as population size increases, an effect that may be due to problems related to memory management.
Figure 5 presents a thorough two-dimensional evaluation that combines energy efficiency and solution quality, revealing the holistic performance of each framework. ParadisEO emerges as the most balanced framework, positioning itself in the upper right quadrant, combining high energy efficiency with competitive convergence quality. This dominant position reflects the advantages of native C++ compilation and optimised memory management, which achieve the best balance between sustainability and algorithmic effectiveness.
Inspyred ranks second, demonstrating that Python can compete with compiled languages when optimised effectively. Its favourable position indicates an excellent balance between ease of implementation and effective performance, making it particularly valuable for researchers who prioritise results and sustainability.
DEAP ranks third, excelling in terms of energy efficiency on laptops, but being penalised by its lower scalability on servers. While its declarative architecture favours experimental reproducibility, it introduces significant overhead in large configurations, restricting its applicability in computationally intensive environments.
Despite showing the highest raw performance in terms of generations per second, ECJ ranks last in the overall classification. Its ranking is adversely affected by high energy consumption in most experimental scenarios, particularly on the laptop, which leads to an unfavourable speed/energy ratio. This highlights that ECJ is better suited to problems where pure speed is prioritised over energy efficiency, such as real-time optimisation or problems involving extremely costly evaluations.
This comprehensive ranking demonstrates that computational power alone is insufficient for ensuring overall efficiency. It highlights the need for selecting the framework carefully, balancing speed, consumption and quality based on the specific needs of the application. The results suggest that, for most evolutionary algorithm research applications where sustainability and reproducibility are important, frameworks such as ParadisEO and Inspyred offer the best overall compromise.
5. Discussion
The experimental results reveal four key findings that challenge conventional assumptions about the computational efficiency of evolutionary algorithms. Firstly, an analysis of 2880 controlled experiments identified a clear trend of increased energy consumption and a 'computational paradox': an increase in processing power from 15 W (an i7-7500U laptop) to 241 W (an i9-12900KF server) results in a speed improvement of only about 2.5× in generations completed, but increases energy consumption by between four and seven times. According to the efficiency metric, this indicates that the laptop is systematically 5–7 times more energy efficient, which fundamentally undermines assumptions about the relationship between computing power and energy efficiency.
Secondly, the systematic influence of population size on energy efficiency is a key factor in achieving a balance between solution quality and energy consumption. This finding remains consistent for both laptops and servers, suggesting a fundamental trade-off between population diversity and computational cost that is independent of implementation and hardware differences.
Thirdly, the data reveal specialisation by context, with no universally dominant framework. ParadisEO stands out for its consistency and balance (with an average of 2.408 on the laptop), while ECJ stands out for its pure processing capacity (with an average of 91,252 generations on the server). DEAP and Inspyred, meanwhile, show greater variability, but reach peaks of efficiency in specific configurations. A comprehensive analysis combining energy efficiency and solution quality through a two-dimensional evaluation confirms that ParadisEO is the most balanced framework, as it is positioned in the quadrant combining high energy efficiency and competitive convergence quality.
Finally, energy performance varies significantly depending on the topology of the problem. The greatest disparities between frameworks are observed in OneMax (up to a 40% difference in the efficiency metric), whereas in continuous benchmarks such as Sphere and Rosenbrock the differences are reduced to 20–25%. Due to its multimodal nature, Schwefel amplifies implementation differences, confirming that framework selection must consider the specific characteristics of the search space.
The experimental validation of the research questions provides robust quantitative evidence. RQ1 (hardware-energy) is strongly confirmed: for identical configurations, the server consumes between 4.3 and 7.4 times more energy than the laptop over the same amount of time, while accelerating generations by only 2.3 to 2.9 times. This finding corroborates the observations of [7], who stated that portable devices such as Raspberry Pis and tablets require an order of magnitude less energy to run evolutionary algorithms than standard computers such as laptops and iMacs. Beyond the numerical results, these differences can be explained by the higher baseline power requirements and the greater number of active resources in the server compared to the laptop, as well as by differences in memory access and instruction throughput, which lead to higher energy needs even under the same time budget. This point deserves further exploration: better resource estimation for specific computing experiments will, in turn, contribute to increased energy efficiency and a reduced environmental footprint.
RQ2 (setting-efficiency) is also answered: population size systematically influences energy efficiency in all analysed cases, establishing a robust empirical pattern that transcends differences in implementation and hardware. This corroborates the observation of [7] that population size also significantly influences energy requirements.
RQ3 (framework-performance) is confirmed, but with important nuances: while compiled frameworks offer advantages in specific contexts, the relationship is not straightforward. ParadisEO (C++) leads in overall efficiency and ECJ (Java) in raw speed, but DEAP (Python) achieves higher efficiency peaks in optimised configurations. This supports the 'no fast lunch' principle of [26]: there is no clear category of superior languages, as performance varies depending on the specific operation.
The findings' contributions to the field are twofold. Firstly, they provide a systematic quantification of the computational power paradox in evolutionary algorithms. Secondly, they fundamentally challenge assumptions about computational efficiency. The homogeneous implementation of the efficiency metric across four frameworks establishes a reproducible protocol that the community can adopt in response to the recommendation of ref. [1] for 'carbon impact statements' for computational experiments.
The results are consistent with the emerging green computing paradigm described by [7]. They extend this concept to the level of algorithm and framework selection and demonstrate that implementation decisions have a measurable environmental impact. The confirmation that DEAP and Inspyred can compete with compiled frameworks in terms of energy usage validates the assertion of [13] that 'some interpreted languages, such as Python, can achieve competitive speeds in certain specific operations'.
Finally, while this study provides a comparative analysis of several frameworks, benchmarks, and parameter configurations, it is crucial to acknowledge the inherent limitations of our approach, which prevent broad, universal conclusions. Our goal was not to declare a single 'best' framework universally, but rather to provide a methodological illustration and a controlled investigation into how specific implementations behave under defined conditions.
6. Conclusions
This research work presents the first systematic evaluation of energy efficiency in evolutionary algorithms using a unified efficiency metric, comparing four representative frameworks on two different architectures. The study reveals an interesting computational power behaviour: increasing processing power from 15 W to 241 W yields only about 2.5 times greater progress (measured in completed generations and in the fitness improvement obtained), while energy consumption increases by a factor of 4 to 7. This confirms that lower-power devices are, in some cases, more efficient.
The results confirm that there is no universally dominant framework; rather, there is contextual specialisation. ParadisEO stands out for its overall consistency, ECJ for its pure computational speed and Python frameworks (DEAP and Inspyred) for their efficiency in specific configurations. Population size emerges as a crucial factor in achieving a balance between solution quality and energy consumption, exerting a consistent influence that surpasses differences in implementation and hardware.
The research sets out a step-by-step plan for assessing sustainability in evolutionary computing, and shows that the decisions made during implementation have a quantifiable effect on the environment. For research applications where sustainability is a priority, the selection of a framework must carefully balance speed, consumption and quality according to the specific context, with ParadisEO and Inspyred being the options that offer the best overall compromise between energy efficiency and algorithmic effectiveness.
It is important to mention that our study is limited to the specific set of benchmark problems, algorithmic parameters, frameworks, and hardware configurations tested. Therefore, our findings should be interpreted as a robust methodological illustration rather than a definitive, global performance ranking of the experimental configurations or frameworks.
This study paves the way for future research, with potential areas for exploration including the analysis of more parameters or the evaluation of high-cost functions, such as the most demanding variants of the BBOB set [27,28] or the CEC benchmark industrial cases [29]. This will help to verify whether the
metric maintains its discriminatory power when fitness time dominates energy consumption. Secondly, other hardware configurations, including GPU accelerators and systematically comparing them with CPUs, will reveal the circumstances in which massive parallelisation reduces or increases consumption per unit of solution. The third step is to come up with a set of rules for stopping the process that take into account the quality of the results, how much time and energy has been used, and the sustainability of the process. The algorithm will then decide when to stop. Finally, real-time adaptive strategies, such as self-adjusting crossover intensity, should be explored to optimise energy efficiency during execution.