1. Introduction
Cloud Computing (CC) has fundamentally changed the landscape of Information Technology (IT) by offering on-demand access to scalable pools of computing resources, including servers, storage, and software applications [
1,
2]. These resources can be quickly provisioned and released with minimal human intervention, enabling widespread adoption of cloud services across sectors—from data analytics and social media to e-commerce and scientific research. As a result, data centres have grown significantly in both size and complexity to meet the increasing demand. However, this rapid expansion has also led to a sharp rise in energy consumption [
3]. Global estimates suggest that data centres now account for a substantial and steadily growing share of electricity usage, raising serious concerns about operational costs and environmental impact, particularly with regard to carbon emissions.
Figure 1 provides an illustrative view of a typical cloud computing environment, showcasing various cloud deployment models (public, private, community, and hybrid) as well as key elements such as the load balancer, cloud controller, and cloud scheduler. User requests traverse a firewall before being directed to the load balancer, which intelligently allocates tasks among available resources according to established policies [
4]. Such orchestration optimises resource utilisation and addresses the critical need to minimise energy consumption, underscoring the growing emphasis on sustainability in modern cloud infrastructure.
Several interrelated factors drive the rising energy demands in modern data centers. High server density requires substantial power for both computation and cooling, while fluctuating user workloads often necessitate dynamic resource allocation [
5,
6]. Additionally, hardware diversity and virtualization overheads can exacerbate inefficiencies when tasks are not optimally mapped to servers. As depicted in
Figure 2, user requests pass through a web service and a load balancer, which distributes incoming jobs across multiple Virtual Machines (VMs). Within this framework, devising strategies that harmonize performance and energy consumption is paramount, ensuring that data centres can expand capacity while operating sustainably and cost-effectively.
One of the most effective strategies for addressing the rising energy demands of cloud data centres is intelligent load balancing—the process of distributing tasks to make the best use of available computing resources [
7,
8]. Although load balancing has long been studied in distributed systems, it takes on new importance in cloud environments, where the number of interconnected servers is vast and workloads constantly change. Often, some servers operate at low capacity or sit idle while still consuming power, whereas others become overloaded, leading to slower performance and potential breaches of Service-Level Agreements (SLAs) [
9]. By dynamically reallocating workloads to better match resource availability, data centres can reduce the number of active physical machines required at any given time. This approach reduces overall power consumption while maintaining a consistent user experience and high Quality of Service (QoS).
Researchers and practitioners have recently explored various load-balancing methods, ranging from simple heuristic approaches—like Round Robin and First-Fit [
10,
11]—to more sophisticated techniques that leverage advanced algorithms and optimization models [
12,
13,
14]. Traditional heuristics offer speed and simplicity but may struggle to adapt to rapidly changing workload patterns. Conversely, modern optimization techniques, including evolutionary and swarm intelligence algorithms, can dynamically navigate high-dimensional search spaces to find near-optimal solutions [
14]. Nevertheless, challenges persist, including the speed of algorithmic convergence, solution accuracy, and the computational overhead of large-scale implementations. These constraints highlight the need for continuous innovation in algorithm design, particularly in blending the strengths of different approaches into hybrid models.
With these considerations, energy-efficient load balancing stands at the forefront of academic inquiry and industrial practice. As data centres expand to accommodate big data analytics, machine learning workloads, and global-scale web services, the importance of energy awareness continues to grow. Sustainable practices are not merely an environmental imperative but also a cost-driven necessity for businesses. By investigating novel algorithms that adapt to variable workloads and leverage the unique advantages of nature-inspired heuristics, researchers aim to develop solutions that minimize the energy footprint of large-scale cloud environments. The hybrid optimization approach proposed in this work directly addresses these concerns by balancing performance objectives with the pressing need for green, economical cloud computing infrastructure.
1.1. Problem Statement
Despite significant strides in developing scheduling algorithms and resource allocation policies, achieving an optimal distribution of computational loads in large-scale cloud environments remains challenging. This difficulty arises from multiple factors, including dynamic and heterogeneous workload demands, the ever-increasing size of data centre infrastructures, and the need to minimize operational costs and environmental impact [
15,
16]. Traditionally, heuristic-based approaches (e.g., Round Robin, First-Fit, and variants) have been used to mitigate load imbalances, yet these methods often lack scalability and adaptability. Specifically, as the number of user requests or VMs escalates, classical heuristics can result in suboptimal resource utilization and, consequently, higher power consumption. This inefficiency not only inflates operational costs but also hampers sustainable growth.
To formalize the problem, let us consider a cloud data centre with M Physical Machines (PMs), each capable of hosting several VMs. We assume that there are N tasks (or jobs) to be scheduled, where each task has a specific computational demand, often represented by its required CPU time or Millions of Instructions (MI). Let $x_{ij}$ be a binary decision variable such that

$$x_{ij} = \begin{cases} 1, & \text{if task } i \text{ is assigned to machine } j, \\ 0, & \text{otherwise.} \end{cases}$$

Each machine $j$ has a maximum capacity $C_j$ representing the total computing resources it can offer (e.g., CPU cycles, memory, etc.). The capacity constraint can be expressed as

$$\sum_{i=1}^{N} w_i \, x_{ij} \leq C_j, \quad j = 1, \ldots, M,$$

where $w_i$ denotes the resource requirement (or computational load) of task i. Ensuring this constraint is respected helps avoid overloading any physical or virtual machine, thereby maintaining Service-Level Agreements (SLAs) and preventing undue performance degradation.
In many modern data centres, power consumption can be modelled as a function of CPU utilization, which is typically the dominant factor in determining energy usage. Let $P_j(u_j)$ represent the power consumption of machine j as a function of its utilization level $u_j$. A simplified model might consider a linear relationship:

$$P_j(u_j) = P_j^{\text{idle}} + \left(P_j^{\max} - P_j^{\text{idle}}\right) u_j,$$

where $P_j^{\text{idle}}$ is the idle power consumption (the minimum power a machine uses when it is turned on but not actively performing tasks), and $P_j^{\max}$ is the maximum power usage when the machine is fully utilized. If we let the total assigned computational load determine $u_j$, then the objective is to minimize the aggregated power consumption across all M machines:

$$\min \sum_{j=1}^{M} P_j(u_j).$$

The challenge is that $u_j$ depends directly on how tasks are distributed (captured by the decision variables $x_{ij}$), making the problem combinatorial. Solving this optimization effectively requires a search strategy to navigate a high-dimensional solution space and adapt to varying workload patterns and machine heterogeneity.
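To make the formulation concrete, the short Python sketch below evaluates the capacity constraint and the linear power objective for a candidate assignment matrix. It is an illustration only: the array names (`loads`, `capacities`, `p_idle`, `p_max`) stand in for $w_i$, $C_j$, $P_j^{\text{idle}}$, and $P_j^{\max}$, and the convention that an unused machine draws no power is an assumption of this sketch.

```python
import numpy as np

def total_power(x, loads, capacities, p_idle, p_max):
    """Evaluate the linear power model for a binary assignment matrix x (N tasks x M machines).

    Returns the aggregated power across machines, or infinity when any
    capacity constraint sum_i w_i * x_ij <= C_j is violated. Utilization
    u_j is taken as the assigned load divided by the machine capacity.
    """
    assigned = loads @ x                   # total load placed on each machine j
    if np.any(assigned > capacities):      # capacity constraint violated
        return np.inf
    u = assigned / capacities              # utilization level u_j in [0, 1]
    active = assigned > 0                  # assumption: idle machines are switched off
    power = p_idle + (p_max - p_idle) * u
    return float(np.sum(power * active))

# Tiny example: 4 tasks, 2 machines, each task assigned to exactly one machine.
loads = np.array([200.0, 150.0, 300.0, 100.0])   # w_i (e.g., MI)
capacities = np.array([500.0, 500.0])            # C_j
p_idle = np.array([70.0, 70.0])                  # P_j^idle (W)
p_max = np.array([250.0, 250.0])                 # P_j^max (W)
x = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])   # x_ij
print(total_power(x, loads, capacities, p_idle, p_max))
```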
While various metaheuristic algorithms have been applied to scheduling and load balancing in cloud computing, a key limitation persists: many rely on static parameter settings, limiting their adaptability to dynamic, unpredictable workloads. Traditional optimization approaches that emphasize either global exploration or local exploitation often struggle to achieve both fast convergence and high-quality solutions under fluctuating conditions. In general, metaheuristic algorithms operate through two fundamental mechanisms—exploration and exploitation. Exploration enables a broad search of the solution space to discover diverse and potentially optimal regions, thereby preventing premature convergence to local optima. Exploitation, in contrast, intensifies the search for promising solutions, refining them and ensuring convergence toward an optimal outcome. Achieving an effective balance between these two phases is crucial for maintaining both diversity and precision during optimization.
To overcome the limitations of existing methods, we propose a hybrid optimization strategy that combines the complementary strengths of the Black Eagle Optimization (BEO) algorithm and the Pelican Optimization Algorithm (POA). Drawing inspiration from the soaring and predatory behavior of black eagles, BEO offers a strong balance between exploration and exploitation, ensuring efficient task allocation across heterogeneous cloud resources. Meanwhile, POA models pelicans’ cooperative hunting strategies to enhance local refinement through turbulence-inspired movements. To further enhance adaptability, a reinforcement learning mechanism is integrated into the hybrid framework to tune algorithmic parameters based on system feedback dynamically. This adaptive learning process enables the algorithm to respond intelligently to workload variations, resulting in a flexible, energy-efficient, and high-performance load-balancing solution for modern cloud environments.
1.2. Contributions
The main contributions of this paper are summarized as follows:
A novel hybrid algorithm combining BEO and POA for energy-aware load balancing, considering the dynamic and heterogeneous nature of large-scale cloud environments.
A comprehensive mathematical formulation of the load-balancing problem, explicitly capturing capacity constraints and a power consumption model. This formulation is a basis for designing and implementing the proposed hybrid method.
Integration of an RL controller into the hybrid BEO–POA framework to dynamically adapt exploration and exploitation strategies based on workload feedback.
A self-adaptive mechanism that learns optimal parameter settings over time, improving responsiveness to dynamic cloud environments.
Evaluation and comparison of the proposed method against State-Of-The-Art (SOTA) load balancers.
1.3. Paper Organization
The remainder of this paper is organized as follows.
Section 2 reviews related work on load balancing and energy efficiency in cloud computing, highlighting the limitations of existing approaches.
Section 3 presents the fundamental concepts and mathematical models underlying load balancing, including the BEO and POA.
Section 4 introduces the hybrid BEO-POA load balancer, detailing its problem formulation, algorithm design, and theoretical justifications.
Section 5 describes the implementation of the hybrid BEO-POA approach in the CloudSim framework, covering the system architecture and task scheduling strategies.
Section 6 discusses key implementation considerations, including parameter tuning, population size selection, resource heterogeneity, and the evaluation setup.
Section 7 presents the results and discussion, analyzing the performance of the proposed method relative to state-of-the-art techniques using metrics such as energy consumption, response time, and resource utilization. Finally,
Section 9 concludes the paper by summarizing the key findings.
2. Related Work
Cloud computing has become a preferred paradigm for delivering diverse organizational services. Its notable attributes—on-demand service delivery, pay-as-you-use billing, and rapid elasticity—make it a compelling choice for various applications. However, due to the large number of clients and varied services it supports, managing resources in cloud environments can be more complex than in traditional systems. A typical cloud data centre comprises numerous PMs, each hosting multiple VMs, along with load balancers, switches, and storage units. Inefficient resource utilization and suboptimal scheduling within these data centres consume considerable energy. In response to these challenges, Srivastava et al. [
17] propose an Adaptive Remora Optimization (AROA) approach as a multi-objective model. This method comprises several sub-models—priority calculation, task clustering, probability definitions, and task-VM mapping—driven by Remora’s search mode. The primary aim is to reduce both energy consumption and execution time. The model’s implementation in CloudSim demonstrates its effectiveness, with energy consumption of 0.695 kWh and execution time of 179.14 s. Comparative results indicate that AROA outperforms existing techniques, underscoring its practical advantages.
Another contribution is presented in [
18], where a hybrid solution combining Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO) was developed. Here, PSO functions as a scheduling agent to distribute tasks among servers, while ACO acts as a load-balancing agent that intervenes as needed. This dual-stage process improves overall performance and prevents server overload, shortening task execution time. Experimental evaluations in the CloudSim environment compare this hybrid method against Round Robin, Cat Swarm Optimization (CSO), the Genetic Algorithm (GA), and Ant Colony System Virtual Machine Consolidation (ACS-VMC). The findings show a marked reduction in both energy consumption and execution time. Specifically, the proposed method decreases energy consumption by 14% compared with ACS-VMC and GA, and by over 18% compared with Round Robin and CSO. Execution time is reduced by 15% in comparison with ACS-VMC and CSO, and by more than 28% relative to Round Robin and GA.
Meanwhile, “Durga,” a novel geographic load-balancing mechanism that effectively conserves energy, is proposed in [
19]. Their work begins with a comprehensive literature survey covering the fundamentals, benefits, and drawbacks of geographical load balancing. Subsequently, they describe an algorithm that expedites the identification of an optimal data centre location to serve incoming requests. This acceleration eases the routing of data packets, reducing access time and conserving energy. Real-world scenarios are simulated using Apache JMeter, and the Haversine formula is employed to compute orthodromic distances between users and data centres. The authors demonstrate that geographical load balancing can significantly lower energy usage while enhancing system performance. Future research avenues are also discussed, including the need for advanced algorithms to further refine load balancing and energy efficiency.
Alongside cloud-centric solutions, Fog computing has emerged as an intermediary between IoT devices and cloud platforms, bringing application services closer to the data sources. However, challenges related to network utilization, latency, and energy consumption persist. To mitigate these challenges, a DDQ-CLF-based model for classifying fog servers in a secure healthcare setting is introduced in [
20]. The system’s three-layer architecture (IoT, Fog, and Cloud) enables secure data transmission using a proxy server and assigns incoming requests to suitable fog servers based on predefined conditions. Performance metrics—such as latency, computational cost, and energy consumption—show that the proposed method achieves superior results compared to other approaches, notably attaining a top load-balancing level of 73.21%.
Meanwhile, the advantages of cloud computing for data centre operations are acknowledged in [21], alongside ongoing challenges in energy consumption and associated costs. That work underscores the need for improved management strategies, particularly through VM consolidation and migration. Similarly, Khan et al. [
22] focus on multi-objective energy-efficient VM consolidation, employing an Adaptive Beetle Swarm Optimization (ABSO) algorithm. This hybrid solution combines Particle Swarm Optimization (PSO) and Beetle Swarm Optimization (BSO) to refine fitness functions and optimize consolidation. Compared to BSO-, PSO-, and GA-based solutions, the ABSO model demonstrates the lowest energy consumption, consuming only 8.234 J to schedule 100 tasks, an improvement over BSO (10.616 J), PSO (11.754 J), and GA (13.545 J). Collectively, these investigations emphasize the interconnected nature of IoT, Fog, and Cloud paradigms, revealing a persistent focus on enhancing load balancing while minimizing energy consumption, latency, and operational costs. As IoT devices proliferate, further refinements in load-balancing methods, whether via cloud-based or fog-based architectures, are paramount for achieving optimal network performance.
A Hyper Min-Max Task Scheduling (HMMTS) strategy coupled with Cascade Shrink Priority (CSP) is introduced in [
23] to optimize task allocation. Their framework incorporates a Changeover Load Balancer (CLB) and a Preemptive Flow Manager (PFM), leveraging a hybrid load-balancing algorithm to improve task distribution efficiency and enhance response times. Experimental evaluations highlight improvements in load balancing, power utilization, and time consumption under both phase-based and random uniform propagation. Moreover, simulation outcomes reveal reduced data processing time and effective load stabilization. Meanwhile, in [
24], a Black Widow Optimization (BWO) algorithm is proposed to reduce service costs in cloud environments by aligning resources with end-user demands. By employing multi-criteria correlation to capture the relationship between user requirements and offered services, they extend the approach to a Multi-Strategy BWO (MS-BWO) model. This algorithm identifies the most effective virtual resource allocation based on a service provisioning dataset featuring metrics such as energy usage, bandwidth utilization, computational cost, and memory consumption. Comparative tests show that MS-BWO surpasses several state-of-the-art solutions, including Workload-Aware Autonomic Resource Management Scheme (WARMS), Fuzzy Clustering Load Balancer (FCL), Agent-Based Automated Service Composition (A2SC), Load Balancing Resource Clustering (LBRC), and an autonomic approach for resource provisioning.
Additionally, Kumar et al. [
25] concentrate on conserving energy in computing servers and network devices. They introduce a parameter,
config, to initialize a system’s operational state, enabling the Dynamic Voltage and Frequency Scaling (DVFS) mechanism to assign tasks to virtual machines more efficiently. Their work extends the Data-centre Energy-efficient Network-aware Scheduling (DENS) approach by adding a peer-to-peer load balancer, thereby minimizing energy consumption in networking components. The resulting scheduling algorithm reduces energy consumption at both the server and communication fabric levels. Experimental data, supported by a 95% confidence interval, indicate the proposed P2BED-C model consumes 1610.22 Wh, outperforming First-Come-First-Served (FCFS) and Round Robin, which consume 1684.32 and 1678.35 Wh, respectively. These findings underscore notable power savings and enhanced server power utilization.
On the other hand, wireless communications continue to expand, driving demand for efficient, cost-effective solutions in increasingly complex network environments. Research efforts have evolved from initial investigations of Wireless Sensor Network (WSN) protocols to the broader IoT, which now generates vast amounts of data for sophisticated applications. Balancing these escalating loads has become a critical concern, mainly as practitioners migrate IoT data and its associated processing to cloud-based infrastructures. In this context, an approach that analyses actual and virtual host machine requirements in a cloud computing framework is proposed in [
26]. Their model aims to enhance network response times while reducing energy consumption by designing a load balancer suited to IoT network protocols. The load balancer integrates seamlessly with existing IoT frameworks, improving response times by approximately 60%. Moreover, simulation results indicate decreases in energy consumption (31%), execution time (24%), node shutdown time (45%), and infrastructure cost (48%) over comparable systems, suggesting that the proposed strategy effectively addresses cloud-based IoT load-balancing challenges.
Despite substantial progress in energy-efficient load balancing for cloud computing, several persistent challenges remain. Traditional heuristic-based methods like Round Robin and First-Fit, while computationally lightweight, often struggle to adapt to dynamic workloads and heterogeneous environments. More advanced metaheuristics and evolutionary algorithms, such as PSO and ACO, offer better adaptability but are frequently limited by slow convergence, high computational cost, and a tendency to get trapped in local optima. These approaches also commonly struggle to balance power consumption with resource utilization, leading to uneven VM loads and inefficient energy use. Moreover, many existing solutions address response time and energy optimization separately, rather than treating them as part of a unified multi-objective optimization problem. Given the cloud infrastructure’s growing scale and complexity, more intelligent and responsive methods are needed to allocate resources dynamically while minimizing energy usage and upholding SLAs. Previous research has proposed various algorithmic enhancements, yet a significant gap exists in integrating hybrid strategies that can flexibly balance global exploration and local refinement.
To bridge this gap, we propose a hybrid BEO–POA approach that combines the BEO’s broad search capability with the POA’s fine-grained adjustment mechanisms. While this hybridization addresses several limitations of earlier methods, such as convergence speed and workload adaptability, we further strengthen the framework by embedding an RL controller. RL has shown promise in cloud scheduling and energy-aware optimization by enabling systems to learn optimal actions based on real-time feedback. However, most prior work applies RL in isolation, without leveraging metaheuristics’ strengths in global and local search. Our integrated design allows the RL agent to dynamically tune key parameters of the hybrid optimizer, improving responsiveness and efficiency in complex cloud environments. This results in a more adaptive, sustainable, and performance-oriented solution for cloud resource management.
3. Preliminaries
To effectively tackle the challenges of energy-efficient load balancing in cloud computing, it is essential to understand the core optimization techniques used in this study. This section introduces the BEO and the POA, which serve as the foundation of our proposed approach to task scheduling and resource allocation in large-scale cloud environments. BEO is inspired by the hunting strategies of black eagles and is designed to strike a balance between global exploration and local exploitation, promoting reliable convergence toward optimal solutions. On the other hand, POA draws from pelicans’ cooperative hunting behavior, using collective intelligence to refine the search process and improve computational efficiency. By combining these two nature-inspired algorithms, the hybrid approach aims to reduce energy consumption without compromising system performance. The following subsections present the mathematical models underlying BEO and POA, detailing their operational phases, governing equations, and their implementation within the optimization framework.
3.1. BEO Algorithm
The BEO is a metaheuristic algorithm inspired by the hunting and social behaviors of black eagles [
27]. It models various stages of the eagle’s behavior, such as stalking, hovering, catching, snatching, warning, migrating, courting, and hatching, to guide the search process effectively. These eight core behaviors are designed to maintain a balance between global exploration and local exploitation, which is essential for efficiently navigating complex search spaces. Each behavior is mathematically defined to contribute to the algorithm’s ability to converge toward optimal solutions in a structured and adaptive manner.
Figure 3 illustrates the decision-making flow of the BEO algorithm. The process begins with an initialization phase, where key parameters are defined, including the number of iterations (
T), the population size (
N) representing the number of black eagles, and the threshold for stalled updates (
H). Once initialized, the fitness of each black eagle is evaluated to identify the best candidate solution. The optimization proceeds through a hierarchical structure incorporating stalking, hovering, and catching strategies to explore the search space. These strategies enable adequate diversification and intensification during the search. Depending on the proximity of the best solution to the search space boundaries, the algorithm dynamically decides whether to invoke a warning mechanism or proceed with the snatching strategy to update the population and guide the search toward more promising regions.
The migration mechanism is activated when the number of iterations or the stagnation threshold exceeds predefined limits. This allows the algorithm to escape local optima and explore new regions of the search space. In the final stages, the BEO incorporates courtship and hatching strategies to refine candidate solutions and enhance convergence. This structured, adaptive decision-making process ensures a balanced trade-off between exploration and exploitation, thereby improving the algorithm’s overall search efficiency and robustness.
3.1.1. Initialization
The population of black eagles is initialized in a d-dimensional search space:

$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,d} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,d} \end{bmatrix},$$

where
X represents the position matrix of black eagles.
d is the problem dimension.
n is the number of black eagles (population size).
$x_{i,j}$ is the position of the i-th eagle in the j-th dimension.
The position of each black eagle is initialized as

$$x_{i,j} = lb_j + \text{rand} \cdot (ub_j - lb_j),$$

where
$lb_j$ and $ub_j$ are the lower and upper boundaries of the search space in dimension j.
rand is a random number in the range [0, 1].
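As a minimal illustration of this initialization for load balancing, the sketch below draws each position component uniformly within its bounds and then discretizes it to a VM index; the helper names and the rounding convention are assumptions of this example, not part of the original algorithm description.

```python
import numpy as np

def init_population(n, d, lb, ub, seed=42):
    """Initialize n candidate solutions in a d-dimensional space.

    Each component is drawn as lb_j + rand * (ub_j - lb_j), with rand ~ U[0, 1].
    """
    rng = np.random.default_rng(seed)
    lb = np.asarray(lb, dtype=float)
    ub = np.asarray(ub, dtype=float)
    return lb + rng.random((n, d)) * (ub - lb)

# Example: 10 eagles, 5 tasks; each position component encodes a VM index in [0, num_vms).
num_vms = 4
positions = init_population(n=10, d=5, lb=[0] * 5, ub=[num_vms] * 5)
assignments = np.clip(positions.astype(int), 0, num_vms - 1)  # discretize to VM indices
print(assignments[0])
```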
3.1.2. Stalking (Global Search)
Black eagles stalk their prey from high ground, scanning the environment. The mathematical model for this phase is
where
is the updated position of the eagle.
is the current best solution.
is the position of a randomly selected eagle.
are random coefficients in the range .
3.1.3. Hovering (Rotational Search)
Black eagles hover to maintain visual contact with their prey, modelled by
where
3.1.4. Catching (Local Refinement)
When an eagle catches prey, it refines its position:
where
3.1.5. Snatching (Jump Search)
Black eagles engage in snatching behaviour, modelled as
where
3.1.6. Migration (Adaptive Escape)
Eagles migrate when food is scarce:
where
The BEO algorithm follows a structured process, outlined in Algorithm 1.
Algorithm 1 BEO
Require: Population size N, maximum iterations T, search space boundaries, objective function F
Ensure: Best solution found
1: Initialize the population of N black eagles randomly
2: Evaluate the objective function F for each eagle
3: Identify the best solution (prey position)
4: for t = 1 to T do ▹ Iterate through generations
5:   for i = 1 to N do ▹ Iterate through all eagles
6:     Perform Stalking Phase: Update position using Equation (7)
7:     Apply Hovering Phase: Fine-tune search using Equation (8)
8:     Apply Catching Phase: Refine position using Equation (10)
9:     Apply Snatching Phase: Jump search using Equation (11)
10:     Apply Migration Phase: Adaptive movement using Equation (12)
11:   end for
12:   Update the best solution found
13: end for
14: return the best solution found
The BEO integrates multiple intelligent search strategies inspired by the predatory behaviours of black eagles. By balancing global exploration and local exploitation, BEO achieves robust convergence and adaptability in solving optimization problems.
3.1.7. Justification for Selecting the BEO
The decision to employ the BEO as the global search component in the proposed hybrid metaheuristic framework stems from its demonstrated balance between exploration and exploitation, low parameter dependency, and superior convergence behavior compared to classical algorithms such as the Genetic Algorithm (GA) and Differential Evolution (DE). While GA and DE have historically served as benchmarks in evolutionary computation, their performance in dynamic and large-scale cloud scheduling tasks is often hindered by parameter sensitivity and slower convergence under high-dimensional constraints [
28,
29].
Theoretically, BEO draws inspiration from black eagles’ cooperative hunting and migratory behaviors, encapsulating distinct phases such as stalking, hovering, snatching, and migration. These adaptive mechanisms enable dynamic regulation of search intensities and prevent premature convergence. Unlike GA, which relies on crossover and mutation rates, or DE, which depends on scaling and recombination factors, BEO’s operators self-adjust based on the population’s fitness variance [
27]. This self-adaptive behavior reduces the need for extensive manual tuning, a key advantage in energy-aware scheduling, where workload distribution patterns can change unpredictably.
Empirical evidence further validates BEO’s selection. In their study, Zhang et al. [
27] evaluated BEO over 30 CEC2017 and 12 CEC2022 benchmark functions, reporting that the algorithm achieved optimal convergence accuracy in all unimodal functions and outperformed comparative metaheuristics, including GA, DE, and PSO, in 78.95% of multimodal functions. Moreover, the standard deviation of fitness values ranked among the top three in 90.48% of the test cases, demonstrating superior stability and robustness in stochastic environments. Subsequent comparative research confirms that BEO achieves faster convergence and higher accuracy than traditional algorithms on constrained and dynamic optimization problems [
14,
30].
In cloud computing, resource scheduling is a highly dynamic, multimodal optimization problem characterized by heterogeneity, unpredictable workloads, and conflicting objectives, such as minimizing energy consumption while maximizing resource utilization and throughput. Classical metaheuristics such as GA and DE require frequent parameter recalibration as task loads or infrastructure heterogeneity evolve [
15]. Conversely, BEO’s stochastic migration and snatching phases allow adaptive balancing between exploration and exploitation without external control, leading to stable convergence and improved scheduling quality across diverse scenarios.
It is worth emphasizing that using BEO does not imply universal superiority over GA or DE across all domains. Instead, the algorithm was chosen as a strategic fit for dynamic cloud scheduling tasks that demand rapid adaptability, energy awareness, and low parameter overhead. Nonetheless, future work will include a systematic comparative study incorporating GA, DE, and the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) within identical simulation environments (CloudSim, EdgeCloudSim, and iFogSim) to substantiate the empirical advantages of BEO further.
BEO provides a theoretically grounded and empirically validated foundation for large-scale energy-aware load balancing. Its efficient global search mechanism, minimal tuning requirements, and proven benchmark superiority make it a robust choice for the global exploration phase of the proposed hybrid RL-guided metaheuristic framework.
3.2. POA
The POA is a bio-inspired metaheuristic that mimics pelicans’ cooperative hunting behaviour. The algorithm is structured around two main phases: an exploration phase, where pelicans search for prey by moving towards optimal regions, and an exploitation phase, where they refine their search using winging and turbulence strategies to capture prey efficiently. These behaviours are mathematically modelled to effectively balance global search (exploration) and local refinement (exploitation) [
31].
The POA employs two fundamental equations that govern pelican movement during optimization.
3.2.1. Exploration Phase—Movement Towards Prey
During exploration, pelicans adjust their positions based on the location of prey, ensuring a diversified search of the solution space. The movement of each pelican is formulated as follows:

$$x_{i,j}^{P_1} = \begin{cases} x_{i,j} + \text{rand} \cdot (p_j - I \cdot x_{i,j}), & \text{if } F_p < F_i, \\ x_{i,j} + \text{rand} \cdot (x_{i,j} - p_j), & \text{otherwise,} \end{cases}$$

where
$x_{i,j}^{P_1}$ is the updated position of the i-th pelican in the j-th dimension.
$x_{i,j}$ represents the current position of the i-th pelican in the j-th dimension.
$p_j$ is the prey's position in the j-th dimension.
$F_p$ is the fitness value of the prey's position.
$F_i$ is the fitness value at the pelican's current position.
I is a randomly chosen integer (1 or 2) that controls movement intensity.
rand is a uniformly distributed random number in the range [0, 1].
3.2.2. Exploitation Phase—Winging on the Water Surface
Once pelicans reach the water surface, they use their wings to create turbulence, forcing fish into shallower waters for easier capture. This fine-tuning step is modelled as

$$x_{i,j}^{P_2} = x_{i,j} + R \cdot \left(1 - \frac{t}{T}\right) \cdot (2 \cdot \text{rand} - 1) \cdot x_{i,j},$$

where
$x_{i,j}^{P_2}$ is the refined position of the i-th pelican in the j-th dimension.
R is a predefined constant (typically set to 0.2) that controls the intensity of local search.
t is the current iteration number.
T is the maximum number of iterations.
rand is a uniformly distributed random number in the range [0, 1].
$\left(1 - \frac{t}{T}\right)$ ensures that the search area contracts for precise convergence as iterations progress.
The acceptance of a new position follows an adaptive updating mechanism, ensuring that only solutions yielding an improvement in the objective function are retained:
where
is the updated position of the i-th pelican.
is the fitness value at the pelican’s current position.
and are the fitness values at the updated positions obtained from the exploration and exploitation phases, respectively.
The POA follows a structured process, as outlined in Algorithm 2.
Algorithm 2 POA
Require: Population size N, maximum iterations T, search space boundaries, objective function F
Ensure: Best solution found
1: Initialize the population of N pelicans randomly within the search space
2: Evaluate the objective function F for each pelican
3: Identify the best current solution (prey position)
4: for t = 1 to T do ▹ Iterate through generations
5:   for i = 1 to N do ▹ Iterate through all pelicans
6:     Perform Exploration Phase: Update pelican position using Equation (13)
7:     Apply the adaptive update mechanism
8:     Perform Exploitation Phase: Fine-tune search using Equation (14)
9:     Apply the adaptive update mechanism
10:   end for
11:   Update the best solution found so far
12: end for
13: return the best solution found
The POA effectively balances exploration and exploitation by simulating pelicans’ strategic hunting behaviour. It ensures efficient convergence towards optimal solutions by dynamically moving towards prey and locally refining through turbulence. The structured adaptation mechanism further enhances performance, making POA a competitive approach for solving complex optimization problems in cloud computing, engineering, and beyond.
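For illustration, the sketch below performs one exploration step and one exploitation step for a single pelican, following the update rules given in Equations (13) and (14); the toy objective, the bounds, and the greedy acceptance used here are assumptions of this example rather than the exact implementation evaluated later in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def poa_step(x, prey, f, t, T, R=0.2):
    """One exploration + exploitation step for a single pelican position x."""
    # Phase 1 (exploration): move towards the prey if it is fitter, otherwise away from it.
    I = int(rng.integers(1, 3))                                  # movement intensity, 1 or 2
    if f(prey) < f(x):
        cand = x + rng.random(x.size) * (prey - I * x)
    else:
        cand = x + rng.random(x.size) * (x - prey)
    if f(cand) < f(x):                                           # greedy (adaptive) acceptance
        x = cand
    # Phase 2 (exploitation): turbulence within a radius that shrinks as t approaches T.
    cand = x + R * (1 - t / T) * (2 * rng.random(x.size) - 1) * x
    if f(cand) < f(x):
        x = cand
    return x

# Toy usage: minimize the sphere function.
def f(v): return float(np.sum(v ** 2))

x = rng.uniform(-5, 5, size=4)
prey = rng.uniform(-5, 5, size=4)
for t in range(1, 51):
    x = poa_step(x, prey, f, t, T=50)
print(f(x))
```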
3.3. Rationale for Using POA as the Local Refinement Component
The POA was chosen as the local refinement module in the proposed hybrid BEO–POA framework. Its adaptive turbulence mechanism balances exploitation intensity and population diversity more effectively than classical single-point refiners such as Hill Climbing (HC) or Tabu Search (TS). Although these simpler heuristics are computationally efficient, they typically operate greedily, progressively improving a single candidate solution based on deterministic neighbourhood transitions. This makes them prone to stagnation in local optima, particularly in multimodal and high-dimensional landscapes such as energy-aware cloud scheduling. POA, in contrast, maintains population diversity through stochastic turbulence and adaptive contraction of the exploration radius, enabling it to refine multiple promising regions simultaneously [
31,
32].
From an algorithmic perspective, POA models pelicans’ cooperative hunting behaviour. The exploitation phase, often referred to as the “winging turbulence,” can be mathematically expressed as

$$x_{i,j}^{P_2} = x_{i,j} + R \cdot \left(1 - \frac{t}{T}\right) \cdot (2 \cdot \text{rand} - 1) \cdot x_{i,j},$$

where R controls turbulence intensity, t and T represent the current and maximum iteration counts, and the factor $\left(1 - \frac{t}{T}\right)$ progressively shrinks the search radius as convergence approaches. This mechanism yields a dynamic local search analogous to a variable neighbourhood strategy without the overhead of explicitly enumerating or evaluating neighbouring states, as done in HC or TS. The result is a more flexible refinement process that adapts to the landscape curvature in real time.
4. Proposed BEO-POA with RL Load Balancer
This section presents the overall workflow of the proposed RL-enhanced hybrid BEO–POA framework for energy-aware load balancing in cloud environments. The workflow operates in four main stages. First, the initialization stage generates an initial population of task–VM mappings and system parameters. Second, the global exploration stage, driven by the BEO, performs a broad search across the solution space to identify promising task allocations. Third, the local refinement stage, guided by the POA, fine-tunes elite solutions to enhance local convergence. Finally, the RL controller continuously monitors system metrics such as energy consumption, utilization, and load imbalance, dynamically adjusting algorithmic parameters (the BEO exploration step size, the POA turbulence coefficient, and the BEO–POA switching probability) to maintain an optimal balance between exploration and exploitation.
We propose a hybrid optimization approach that combines an enhanced version of the BEO with the POA to improve energy-efficient load balancing in cloud computing environments. This hybridization is motivated by the complementary strengths of the two metaheuristics. BEO provides a strong balance between global exploration and local exploitation through its structured strategies—stalking, hovering, catching, and migration—allowing it to navigate diverse solution spaces effectively. Meanwhile, POA enables rapid convergence and fine-tuned local searches through cooperative hunting behaviors and turbulence-based refinements.
We introduce an RL controller into the optimization loop to make the hybrid system more adaptive in real time. The RL component monitors key performance indicators, including workload variability, VM utilization, and energy usage. Based on this feedback, it dynamically adjusts parameters within BEO and POA—such as switching probabilities, step sizes, and refinement intensities—to maintain an optimal balance between exploration and exploitation as conditions change.
The hybrid BEO-POA becomes a more intelligent and context-aware load balancer by embedding this learning-based adaptation layer. It can respond proactively to system dynamics, reduce unnecessary energy consumption, and improve overall resource utilization. This makes the proposed method well-suited to large-scale, heterogeneous cloud environments where unpredictable workload patterns and SLAs must be upheld.
To tailor the BEO and POA for load balancing in cloud environments, several key modifications have been introduced:
RL-based adaptation: We added a lightweight RL controller that monitors key system metrics—such as energy usage, workload variation, and resource utilization—and uses this feedback to fine-tune the optimizer in real time. Based on the system’s current state, the RL agent adjusts parameters such as the switching rate between BEO and POA, the step size, and the refinement intensity. This helps the algorithm adapt on the fly without relying on manual tuning.
Dynamic role switching between BEO and POA: Instead of relying on a fixed strategy, the algorithm dynamically switches between BEO and POA depending on convergence trends and workload behavior. For example, under high workload variability, the system might favor POA’s local refinement to quickly stabilize the load.
Adaptive balance between exploration and exploitation: The algorithm adjusts its focus between exploring new solutions and refining existing ones based on real-time feedback. If tasks are frequently migrated or performance is unstable, it shifts toward more local search to fine-tune assignments.
Energy-aware migration in BEO: The migration behavior of the BEO component has been modified to consider energy usage. Now, the algorithm prefers migrating tasks to virtual machines with better energy profiles, helping reduce overall power consumption.
Energy-conscious task allocation in POA: POA’s movement rules were updated to include energy metrics, making tasks more likely to be assigned to VMs that consume less power when idle. This subtle change improves energy efficiency without sacrificing performance.
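To make the energy-aware preference in the last two modifications concrete, the following minimal sketch ranks candidate VMs by the incremental power a task placement would add under the linear model from the problem statement; the `Vm` fields and the `pick_vm` helper are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass

@dataclass
class Vm:
    capacity: float   # total MIPS the VM can offer
    load: float       # MIPS already assigned
    p_idle: float     # idle power attributed to the VM's host share (W)
    p_max: float      # full-utilization power (W)

def incremental_power(vm: Vm, task_load: float) -> float:
    """Extra power the VM would draw if the task were placed on it (linear model)."""
    if vm.load + task_load > vm.capacity:
        return float("inf")                  # infeasible placement
    span = vm.p_max - vm.p_idle
    before = vm.p_idle + span * (vm.load / vm.capacity) if vm.load > 0 else 0.0
    after = vm.p_idle + span * ((vm.load + task_load) / vm.capacity)
    return after - before

def pick_vm(vms, task_load):
    """Prefer the VM whose energy profile adds the least power for this task."""
    return min(range(len(vms)), key=lambda j: incremental_power(vms[j], task_load))

vms = [Vm(1000, 600, 70, 250), Vm(1000, 100, 50, 200), Vm(1000, 950, 60, 220)]
print(pick_vm(vms, task_load=200))   # -> 1 (the lightly loaded, low-power VM)
```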
4.1. Mathematical Model of Hybrid BEO-POA with RL
A three-stage optimization process governs the proposed hybrid algorithm:
Stage 1: Global Exploration using BEO
Task allocations are adjusted based on black eagle movement patterns in the global search phase. The updated position of each eagle (task assignment) is computed according to Equation (7).
Here,
is the updated position of the i-th task allocation in dimension j.
is the previous position of the i-th task allocation in dimension j.
is the current best load balancing solution in dimension j.
is the position of a randomly selected alternative task allocation in dimension j.
and are random coefficients in the range that control movement intensity.
The migration mechanism in BEO is modified to incorporate energy constraints, ensuring that tasks are moved to VMs with lower power consumption, as defined by Equation (
12).
Stage 2: Local Refinement using POA
Once the global exploration stage stabilizes, the fine-tuning stage utilizes POA to optimize local assignments based on Equation (
17).
This ensures fine-grained task assignment optimization, reducing energy consumption and improving load distribution.
The structured hybridization process is outlined in Algorithm 3.
Algorithm 3 RL-Enhanced Hybrid BEO–POA Load Balancing
Require: Population size M, maximum iterations I, energy threshold, load imbalance threshold
Ensure: Optimal load-balanced task allocation
1: Initialize population of M eagles (BEO solutions) and M pelicans (POA solutions)
2: Initialize the RL agent with its policy
3: Evaluate initial fitness values based on energy consumption, VM utilization, and the Load Imbalance Factor (LIF)
4: Identify the global best solution
5: for t = 1 to I do ▹ Main optimization loop
6:   Observe system state {current energy usage, VM utilization, LIF, convergence rate}
7:   Select an action using the current RL policy
8:   Adjust optimizer parameters (e.g., BEO step size, POA turbulence rate, switching ratio) based on the selected action
9:   Global Search: Update black eagle solutions using BEO
10:   Evaluate energy-aware migration behavior
11:   if energy consumption exceeds the energy threshold or load imbalance exceeds the imbalance threshold then
12:     Increase the agent ratio allocated to POA for local refinement
13:   else
14:     Maintain or strengthen BEO-driven exploration
15:   end if
16:   Local Search: Apply POA fine-tuning to optimize VM–task mappings
17:   Evaluate updated fitness values; update the global best if improved
18:   Update the RL agent using the observed reward to improve its policy
19: end for
20: return the final optimized task allocation
BEO was designed primarily for
structured global exploration. The algorithm adapts its migration step size to population diversity, promoting rapid coverage of unexplored regions and preventing early saturation. In contrast, POA’s main strength lies in
localized exploitation: its turbulence operator performs micro-adjustments within a dynamically contracting radius, allowing precise fine-tuning once promising basins are discovered [
31,
32]. Conceptually, BEO acts as a coarse-grained navigator, whereas POA functions as a fine-grained refiner.
To empirically examine whether the hybridization produces duplication or synergy, we performed an auxiliary ablation study in which three configurations were tested on the same CloudSim workload (Scenario III with 1000 cloudlets, 32 VMs, 8 hosts):
BEO Only: global exploration and refinement handled solely by BEO.
POA Only: exploration and exploitation handled solely by POA.
Hybrid BEO–POA: BEO performs exploration for 60% of iterations, after which POA refines the best 40% of candidate mappings.
Each configuration was executed ten times. The averaged results are reported in
Table 1.
The results show that the hybrid configuration consistently outperformed either component alone in energy consumption, makespan, and load balance, despite incurring a modest 6–8% increase in computation time. This overhead is acceptable given the 10–15% improvement in performance metrics. The synergy arises because BEO’s global search rapidly identifies diverse promising regions, while POA subsequently intensifies exploitation within those regions using turbulence-driven refinements. Without this division of labour, BEO alone exhibits slower convergence in the final iterations, and POA alone lacks sufficient initial diversity to escape local optima.
Algorithmically, the integration is implemented via sequential orchestration rather than simultaneous execution, thereby mitigating redundancy. The RL controller governs the switching ratio between BEO and POA based on convergence indicators such as fitness variance and the LIF. When population diversity drops below a threshold, control shifts from BEO to POA; when diversity increases again, BEO resumes exploration. This adaptive scheduling ensures the two optimizers operate in complementary temporal phases rather than duplicating effort within the same iteration.
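A minimal sketch of this diversity-driven hand-off is given below; the diversity proxy (a coefficient of variation rather than the normalized fitness variance used in the actual controller), the threshold values, and the hysteresis band are illustrative assumptions.

```python
import numpy as np

def diversity_index(fitnesses):
    """Coefficient-of-variation proxy for population diversity (higher = more diverse)."""
    f = np.asarray(fitnesses, dtype=float)
    return float(f.std() / abs(f.mean())) if f.mean() != 0 else 0.0

def choose_phase(fitnesses, low=0.05, high=0.15, current="BEO"):
    """Hand control to POA when diversity collapses; return it to BEO when it recovers."""
    d = diversity_index(fitnesses)
    if d < low:
        return "POA"    # intensify local refinement on the elite subset
    if d > high:
        return "BEO"    # resume global exploration
    return current      # hysteresis band: keep the current phase

print(choose_phase([10.0, 10.1, 10.05, 10.02]))  # tightly clustered fitness -> "POA"
print(choose_phase([2.0, 9.0, 15.0, 30.0]))      # widely spread fitness -> "BEO"
```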
It is also essential to consider the computational complexity. The time complexity of BEO is approximately $O(N \cdot d \cdot T_{\text{BEO}})$, where N is the population size, d is the problem dimension, and $T_{\text{BEO}}$ is the iteration count. POA's complexity is $O(N \cdot d \cdot T_{\text{POA}})$. In the hybrid design, $T_{\text{BEO}}$ and $T_{\text{POA}}$ are reduced proportionally (e.g., 60% and 40% of the total T), keeping the overall complexity close to the single-algorithm baseline. Thus, hybridization adds minimal overhead relative to the gain in solution quality.
These empirical and analytical observations align with other reports of successful two-phase metaheuristics. For instance, Dehghani et al. [
32] and Singh et al. [
33] noted that pairing exploration-dominant and exploitation-dominant metaheuristics improves both convergence speed and solution precision, provided that their control loops are sequentially synchronized. The proposed BEO–POA follows this paradigm by exploiting complementary behavioural properties rather than duplicating similar functions.
We recognize that hybridization introduces additional design complexity and marginal computational cost. To further optimize efficiency, future work will explore two enhancements: (1) employing dynamic population resizing so that POA operates on a reduced subset of elite solutions during refinement, and (2) using reinforcement-learning-based adaptive iteration allocation to minimize idle computation during the switching phase. These extensions aim to preserve the hybrid’s accuracy benefits while reducing overhead.
Although BEO and POA possess intrinsic exploration–exploitation mechanisms, their behavioural emphases differ sufficiently to warrant hybridization. BEO provides structured, large-scale exploration, while POA contributes adaptive, fine-grained exploitation. Empirical results confirm that their sequential combination yields complementary synergy rather than redundancy, improving energy efficiency and load balance with minimal additional computational cost.
Stage 3: RL Controller
To enhance the adaptability of the hybrid BEO–POA algorithm, we introduce an RL controller that continuously adjusts the optimizer’s behavior in response to the system’s current state. This addition allows the load balancer to make smarter decisions over time without requiring manual parameter tuning.
The RL component is a high-level control layer that observes the system, selects appropriate actions, and learns from outcomes. It is designed around a standard agent-environment framework, defined as follows:
State ($s_t$): At each decision step t, the agent observes a state vector that includes metrics such as the current task load, average VM utilization, the algorithm’s operating phase (exploration vs. exploitation), and the LIF. This snapshot reflects the system’s current condition and helps guide adaptive behavior.
Action ($a_t$): Based on the observed state, the agent selects an action from a predefined set. These actions include adjusting the switching probability between BEO and POA, modifying the BEO step size, or tuning the POA turbulence factor. The goal is to find the right balance between exploration and refinement to respond effectively to system dynamics.
Reward ($r_t$): The agent receives a reward signal that reflects the quality of the chosen action. The reward is calculated to encourage low energy consumption, high resource utilization, and balanced task distribution. A simple yet effective reward function is defined as a weighted combination of these three terms, where the weight parameters determine the importance of each term and the normalized energy is scaled to the range [0, 1], ensuring comparability across metrics.
The chosen reward components reflect the three most significant objectives of cloud resource management: minimizing energy consumption, maximizing utilization, and maintaining a balanced load distribution. However, these objectives are not equally critical. Excessive energy use directly impacts a data center’s operational cost and sustainability, whereas a moderate imbalance can be tolerated if overall utilization remains high. Accordingly, the reward assigns the largest weight to energy reduction, followed by the utilization term to encourage efficient resource use, and the smallest weight to the imbalance term to penalize unbalanced workloads.
This weighting scheme was derived from empirical observations in preliminary CloudSim experiments. When all weights were equal, the RL controller oscillated between over-aggressive energy saving and under-utilization, leading to suboptimal throughput. Increasing the energy weight relative to the utilization weight stabilized the policy and reduced total energy consumption by approximately 12% while maintaining acceptable utilization. Therefore, the final weights were selected to reflect the practical trade-offs between sustainability, performance, and stability observed across multiple trials.
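One plausible implementation of this reward is sketched below. The additive form, the sign convention (penalizing energy and imbalance, rewarding utilization), and the default weight values are assumptions of the sketch; the discussion above fixes only the relative ordering of the weights, not their exact values.

```python
def reward(energy, utilization, lif, w_energy=0.5, w_util=0.3, w_lif=0.2):
    """Illustrative reward; all inputs are assumed to be normalized to [0, 1].

    Higher reward for low energy, high utilization, and low load imbalance.
    The default weights only reflect the ordering energy > utilization > imbalance.
    """
    return -w_energy * energy + w_util * utilization - w_lif * lif

# Example: moderate energy use, good utilization, mild imbalance.
print(reward(energy=0.4, utilization=0.8, lif=0.2))
```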
4.1.1. Sensitivity Analysis of Reward Weights
A sensitivity analysis was conducted using five distinct reward configurations to quantify the impact of weight selection on learning performance (
Table 2). Each configuration modifies one or more weight ratios while keeping the others constant. The RL agent was retrained under identical workload conditions (Scenario III: 1000 cloudlets, 32 VMs, 8 hosts) for 200 episodes per configuration.
The results indicate that the RL agent is moderately sensitive to reward composition. Configurations prioritizing utilization achieved higher CPU usage but exhibited unstable rewards and higher load imbalance. Conversely, overemphasizing energy reduced consumption marginally but led to convergence oscillations due to excessive exploration. The proposed weights yielded the most balanced outcomes, achieving the lowest combined energy–LIF cost and the fastest convergence rate (140 episodes). The reward variance of 0.018 further indicates stable learning across multiple runs.
These findings suggest that the RL controller’s behaviour is robust within a reasonable range of weight variations, but extreme prioritization of a single metric degrades stability. The balanced reward scheme allows the agent to learn policies that simultaneously reduce energy use, maintain high utilization, and avoid severe imbalance. The analysis also reveals an implicit interaction between energy and utilization: minor reductions in energy are often accompanied by a proportional drop in utilization when the reward weights are skewed, confirming the multi-objective trade-off inherent in cloud scheduling. The reward sensitivity analysis provides both interpretability and reproducibility for future researchers. Although the current weight configuration was tuned empirically, the modular RL framework can accommodate adaptive or self-tuning reward mechanisms.
The agent uses an RL algorithm, such as Q-learning or a Deep Q-Network (DQN), to learn optimal actions over time. The goal is to discover a policy $\pi$ that maps system states to optimal actions and maximizes long-term rewards. Over multiple iterations, the policy improves as the agent gathers more experience, allowing the system to self-tune its behavior in dynamic cloud environments.
To clearly demonstrate the interaction between the RL controller and the hybrid BEO–POA optimization process, the following Algorithm 4 outlines the high-level operational flow without delving into low-level implementation details.
Algorithm 4 Stage 3: RL Controller—High-Level Interaction with BEO–POA
Require: Maximum episodes K, horizon I; initial policy; initial optimizer parameters
Ensure: Adapted policy and final task–VM mapping
1: for e = 1 to K do ▹ Episode
2:   Reset simulator; initialize populations; set the best solution
3:   for t = 1 to I do ▹ Decision step
4:     Observe state {energy, utilization, LIF, diversity, convergence}
5:     Select an action ▹ e.g., adjust the BEO step size, POA turbulence, or switching probability
6:     Apply parameter updates (clamped to bounds)
7:     if rand() < switching probability then
8:       Local refinement (POA) on the elite subset
9:     else
10:       Global exploration (BEO) on the population
11:     end if
12:     Evaluate fitness; update the best solution if improved
13:     Compute reward using (18)
14:     Update policy
15:     if converged or budget reached then break
16:     end if
17:   end for
18: end for
19: return the adapted policy and final mapping
4.1.2. Definition and Reproducibility of the Reinforcement Learning Controller
The RL controller serves as an adaptive supervisory layer that dynamically regulates the behaviour of the hybrid BEO–POA optimizer. Its purpose is to maintain an optimal balance between exploration and exploitation throughout the optimization process by continuously adjusting several key parameters in response to observed system performance.
The hybrid BEO–POA optimizer exhibits two complementary behaviours: large-scale exploration through BEO’s migration strategy and local exploitation via POA’s turbulence mechanism. However, the optimal balance between these modes varies depending on workload heterogeneity and convergence progress. A static configuration of algorithmic parameters, such as migration step size or turbulence intensity, can lead to premature convergence or excessive wandering. To mitigate this, a lightweight RL controller was embedded as a high-level policy learner. Its task is to monitor the current optimization state and adjust three behavioural parameters in real time: (1) the exploration step size of BEO, (2) the turbulence coefficient of POA, and (3) the switching probability that determines the transition between the two optimizers.
The RL controller is modelled as a discrete-state Markov Decision Process (MDP) defined by the tuple $(S, A, R, P, \gamma)$, where S denotes the set of states representing the current status of the optimization process, A is the action space comprising possible parameter adjustments, R is the scalar reward returned after each update, P represents the transition probabilities between states, and $\gamma$ is the discount factor that balances immediate and long-term rewards.
State Space. Each state $s_t$ is a four-dimensional vector that captures key aspects of system performance at iteration t. The variables, normalized to the range [0, 1], are as follows:
the normalized energy consumption of the data centre at iteration t;
the average CPU utilization rate of active hosts;
the population diversity index, computed as the normalized variance of fitness values across candidate solutions;
the convergence indicator, representing the normalized rate of change of the best fitness value across the last k iterations.
This combination provides a compact yet sufficient representation of the optimizer’s progress, enabling the agent to infer when to intensify exploration or exploitation.
Action Space. The action set defines five possible control interventions:
$a_1$: Increase the BEO exploration step size to promote stronger global exploration;
$a_2$: Decrease the BEO exploration step size to stabilize convergence;
$a_3$: Increase the POA turbulence coefficient to expand local search turbulence;
$a_4$: Decrease the POA turbulence coefficient for finer local exploitation;
$a_5$: Adjust the switching probability between BEO and POA according to the diversity level.
These discrete actions provide sufficient granularity for adaptive control while keeping the learning process computationally tractable.
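The sketch below shows one way of mapping these five discrete actions onto bounded parameter updates; the step size, the clamping bounds, and the parameter names (`beo_step`, `poa_turbulence`, `switch_prob`) are illustrative assumptions rather than the tuned settings used in the experiments.

```python
def clamp(v, lo, hi):
    """Keep a parameter value within its allowed bounds."""
    return max(lo, min(hi, v))

def apply_action(params, action, diversity=0.5, delta=0.05):
    """Map a discrete RL action onto a bounded adjustment of the optimizer parameters."""
    p = dict(params)
    if action == 0:      # a1: increase BEO step size -> stronger global exploration
        p["beo_step"] = clamp(p["beo_step"] + delta, 0.01, 1.0)
    elif action == 1:    # a2: decrease BEO step size -> stabilize convergence
        p["beo_step"] = clamp(p["beo_step"] - delta, 0.01, 1.0)
    elif action == 2:    # a3: increase POA turbulence -> wider local search
        p["poa_turbulence"] = clamp(p["poa_turbulence"] + delta, 0.05, 0.5)
    elif action == 3:    # a4: decrease POA turbulence -> finer local exploitation
        p["poa_turbulence"] = clamp(p["poa_turbulence"] - delta, 0.05, 0.5)
    else:                # a5: shift switching toward POA when diversity is low, else toward BEO
        direction = 1.0 if diversity < 0.1 else -1.0
        p["switch_prob"] = clamp(p["switch_prob"] + direction * delta, 0.1, 0.9)
    return p

params = {"beo_step": 0.3, "poa_turbulence": 0.2, "switch_prob": 0.5}
print(apply_action(params, action=2))   # poa_turbulence raised to 0.25
```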
Reward Function. The reward function quantifies the benefit of each action using three normalized metrics: energy consumption, resource utilization, and the load imbalance factor. It is computed as a weighted combination of these terms, given in Equation (18), with weight coefficients set empirically. This formulation rewards actions that reduce energy use and imbalance while maintaining high utilization.
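Since the exact form and weights of Equation (18) are not reproduced here, the following Java sketch shows one plausible linear combination consistent with the description above. The class name, the linear form, and any concrete weight values a caller would pass are assumptions for illustration only:

```java
/**
 * Illustrative reward in the spirit of Equation (18): higher utilization is rewarded,
 * energy and load imbalance are penalized. The linear form and the weights are
 * assumptions for demonstration; the paper's empirically chosen coefficients may differ.
 */
public final class RewardFunction {
    private final double wEnergy;      // weight on normalized energy consumption
    private final double wUtil;        // weight on average utilization
    private final double wImbalance;   // weight on the load imbalance factor (LIF)

    public RewardFunction(double wEnergy, double wUtil, double wImbalance) {
        this.wEnergy = wEnergy;
        this.wUtil = wUtil;
        this.wImbalance = wImbalance;
    }

    /** All inputs are assumed normalized to [0, 1]. */
    public double reward(double energy, double utilization, double lif) {
        return wUtil * utilization - wEnergy * energy - wImbalance * lif;
    }
}
```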
The agent adopts the classical Q-learning algorithm [34,35] to approximate the optimal state–action value function Q(s, a). The update rule is expressed as
Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_t + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ],
where α is the learning rate and γ the discount factor. An ε-greedy exploration strategy balances random exploration (with probability ε) with exploitation of the best-known policy.
Training was performed over 200 episodes, each corresponding to one complete optimization run. Key hyperparameters were tuned empirically as follows: learning rate α = 0.1, discount factor γ = 0.9, and an initial exploration rate ε = 0.2, decayed linearly to 0.05. The state space was discretized into ten bins per dimension, yielding 10^4 possible states. The reward signal was smoothed with a 5-iteration rolling average to reduce stochastic noise, and all random seeds were fixed (random.seed(42)) to guarantee repeatability. These explicit definitions allow other researchers to re-implement the controller independently.
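A minimal tabular Q-learning sketch consistent with the stated configuration (four normalized state features, ten bins per dimension, five actions, α = 0.1, γ = 0.9, ε decaying from 0.2 toward 0.05, fixed seed) is given below. Class and method names, the encoding scheme, and the decay increment are illustrative rather than the authors' implementation:

```java
import java.util.Random;

/** Minimal tabular Q-learning controller sketch (names and structure are illustrative). */
public class QLearningController {
    private static final int BINS = 10;                            // 10 bins per state dimension
    private static final int STATES = BINS * BINS * BINS * BINS;   // 4-D state -> 10^4 states
    private static final int ACTIONS = 5;                          // five discrete interventions

    private final double[][] q = new double[STATES][ACTIONS];
    private final Random rng = new Random(42);                     // fixed seed for repeatability
    private final double alpha = 0.1;                              // learning rate
    private final double gamma = 0.9;                              // discount factor
    private double epsilon = 0.2;                                  // initial exploration rate

    /** Discretize the normalized 4-D state (energy, utilization, diversity, convergence). */
    public int encode(double energy, double util, double diversity, double convergence) {
        return bin(energy) * 1000 + bin(util) * 100 + bin(diversity) * 10 + bin(convergence);
    }

    private int bin(double x) {
        return (int) (Math.max(0.0, Math.min(0.999, x)) * BINS);   // map [0,1] to {0,...,9}
    }

    /** Epsilon-greedy action selection. */
    public int selectAction(int state) {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(ACTIONS);
        }
        int best = 0;
        for (int a = 1; a < ACTIONS; a++) {
            if (q[state][a] > q[state][best]) best = a;
        }
        return best;
    }

    /** Classical update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). */
    public void update(int s, int a, double r, int sNext) {
        double maxNext = q[sNext][0];
        for (int an = 1; an < ACTIONS; an++) maxNext = Math.max(maxNext, q[sNext][an]);
        q[s][a] += alpha * (r + gamma * maxNext - q[s][a]);
        epsilon = Math.max(0.05, epsilon - 0.00075);               // linear decay toward 0.05
    }
}
```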
The RL controller operates asynchronously at the meta-level, updating its decisions every 10 optimization iterations rather than continuously, thereby minimizing computational overhead. At each update, the optimizer reports its current metrics (energy, utilization, diversity, and imbalance) to the RL agent, which computes the corresponding state and reward. The agent then updates its Q-table, selects the next action, and modifies the relevant optimizer parameters (step size, turbulence coefficient, and switching probability) for the next cycle. This feedback loop, illustrated in Figure 4, allows the optimizer to self-adapt dynamically to workload variability and convergence trends.
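The feedback loop of Figure 4 can be sketched as a simple driver that invokes the controller every 10 optimizer iterations, reusing the controller and reward classes above. The HybridOptimizer interface and its accessor names are hypothetical placeholders for whatever metrics the optimizer exposes:

```java
/** Hypothetical optimizer interface, used only to illustrate the meta-level control loop. */
interface HybridOptimizer {
    void step();                         // one BEO/POA iteration
    void applyAction(int action);        // adjust step size, turbulence, or switching probability
    double normalizedEnergy();
    double avgUtilization();
    double diversityIndex();
    double convergenceRate();
    double loadImbalance();
}

public final class MetaControlLoop {
    /** Invoke the RL controller every 10 optimizer iterations (cf. Figure 4). */
    public static void run(HybridOptimizer optimizer, QLearningController agent,
                           RewardFunction rewardFn, int maxIterations) {
        int prevState = -1, prevAction = -1;
        for (int t = 1; t <= maxIterations; t++) {
            optimizer.step();
            if (t % 10 != 0) continue;                          // controller acts every 10 iterations

            int state = agent.encode(optimizer.normalizedEnergy(), optimizer.avgUtilization(),
                                     optimizer.diversityIndex(), optimizer.convergenceRate());
            if (prevState >= 0) {
                double r = rewardFn.reward(optimizer.normalizedEnergy(),
                                           optimizer.avgUtilization(), optimizer.loadImbalance());
                agent.update(prevState, prevAction, r, state);  // learn from the previous decision
            }
            int action = agent.selectAction(state);
            optimizer.applyAction(action);
            prevState = state;
            prevAction = action;
        }
    }
}
```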
The explicit specification of states, actions, rewards, learning algorithm, and hyperparameters makes the RL integration reproducible. Rather than deep reinforcement learning, Tabular Q-learning was chosen to maintain transparency and interpretability while keeping computational requirements modest. Nevertheless, the framework’s modular design allows straightforward substitution with more advanced agents such as DQN or Proximal Policy Optimization (PPO) for future experiments on larger or more volatile datasets.
The RL controller is an adaptive, reproducible, and interpretable learning layer that fine-tunes the hybrid BEO–POA optimizer in real time. Its formal MDP definition and controlled implementation parameters ensure that the learning dynamics can be independently verified, thereby addressing reproducibility and clarity concerns during integration.
While tabular Q-learning was adopted in this study for its simplicity, interpretability, and ease of integration within the CloudSim-based simulation framework, it presents certain limitations when applied to higher-dimensional or continuous state spaces. The discrete representation, though effective for the four-dimensional state vector used in this work, may lead to scalability issues as the number of state features increases or when finer granularity is required to capture complex system dynamics. This can result in slower convergence or reduced generalization capability in highly dynamic cloud environments.
Future extensions of this framework could therefore incorporate function approximators, such as DQN or actor–critic methods, which are better suited for modelling continuous, large-scale state spaces. These approaches would enable the RL controller to generalize across unseen system conditions while maintaining adaptability, thereby enhancing robustness and decision quality in real-world cloud data center deployments.
By incorporating this learning-based controller, the hybrid BEO–POA becomes a more intelligent and flexible load balancer, capable of adjusting its optimization strategy in real time. This leads to better performance across various workloads and system configurations.
4.2. Modification of the BEO Migration Step for Energy-Aware Task Placement
The migration step in the original BEO is primarily designed for global exploration, in which each agent (eagle) updates its position by following the global best solution via a stochastic migration vector. While this design ensures adequate coverage of the search space, it does not inherently account for energy efficiency when applied to task placement or VM scheduling. To address this limitation, we introduce an energy-aware adaptation of the BEO migration rule that explicitly integrates VM energy consumption and utilization metrics into the position update process.
4.2.1. Original BEO Migration Principle
In the canonical BEO algorithm [27], each eagle updates its position by moving toward the best-known solution according to Equation (21), in which a step size controls the migration intensity, random coefficients drawn uniformly from [0, 1] introduce stochastic perturbations, and the current best-known position at iteration t serves as the attractor. This rule ensures exploration by moving each agent toward the current global optimum while maintaining diversity via stochastic perturbations. However, it treats all dimensions equally and does not distinguish between energy-efficient and overloaded VMs.
4.2.2. Energy-Aware Migration Adaptation
To adapt the migration behaviour for cloud environments, we reformulate Equation (21) to favour migration toward VMs with lower predicted energy cost and higher resource efficiency. The modified migration rule, given in Equation (22), augments the standard update with an energy-awareness term: a trade-off coefficient controls the balance between performance convergence and energy minimization, and a normalized energy-efficiency gradient is computed for each VM from its CPU utilization and instantaneous power draw. The resulting weighting factor prioritizes underutilized yet energy-efficient VMs, encouraging the migration of new tasks to hosts with low load and a high performance-per-watt ratio. This formulation ensures that the optimizer does not merely seek the shortest makespan but balances it against the incremental energy cost of each placement.
Intuitively, Equation (22) modifies the migration vector to include an “energy-awareness bias.” When the system load is uneven, this bias acts as a corrective vector that steers the search toward lower-power VMs without sacrificing exploration. At one extreme of the trade-off coefficient the algorithm behaves as standard BEO; at the other, migration is driven entirely by energy minimization. In our implementation, the coefficient is adjusted adaptively by the RL controller based on reward signals combining energy, utilization, and load imbalance.
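Because Equation (22) itself is not reproduced here, the following sketch illustrates only the energy-efficiency weighting idea: VMs with spare capacity and low power draw receive larger weights, which can then bias the migration vector toward them. The helper name and the particular normalization are assumptions:

```java
/** Illustrative computation of per-VM energy-efficiency weights (normalization scheme assumed). */
public final class EnergyAwareBias {

    /**
     * Returns a weight per VM that is high for underutilized, energy-efficient VMs
     * (high performance-per-watt) and low for heavily loaded or power-hungry ones.
     *
     * @param utilization CPU utilization of each VM in [0, 1]
     * @param powerDraw   instantaneous power draw of each VM's host, in watts
     */
    public static double[] efficiencyWeights(double[] utilization, double[] powerDraw) {
        int n = utilization.length;
        double[] w = new double[n];
        double sum = 0.0;
        for (int j = 0; j < n; j++) {
            double headroom = 1.0 - utilization[j];                   // prefer underutilized VMs
            double perfPerWatt = 1.0 / Math.max(powerDraw[j], 1e-9);  // prefer low power draw
            w[j] = headroom * perfPerWatt;
            sum += w[j];
        }
        for (int j = 0; j < n; j++) {
            w[j] = (sum > 0) ? w[j] / sum : 1.0 / n;                  // normalize to a distribution
        }
        return w;
    }
}
```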
To validate the effectiveness of this adaptation, we compared the modified migration step against two commonly used deterministic heuristics, Best-Fit and Min-Min, in the same CloudSim environment. Both heuristics were configured to assign 1000 tasks across 32 VMs, with energy models following the standard linear power–utilization relationship P(u) = P_idle + (P_max − P_idle) · u.
4.2.3. Theoretical and Empirical Justification
From a theoretical standpoint, the modified migration rule implicitly defines a multi-objective search direction that minimizes a convex combination of two differentiable surrogate objectives, one capturing performance (makespan) and one capturing energy consumption. Under standard convexity and boundedness assumptions, the descent property of Equation (22) ensures that the expected improvement in this combined objective remains nonnegative at each iteration, which provides theoretical backing for its convergence stability. Empirically, the convergence curves (Figure 5) demonstrate a smooth monotonic decrease in total energy with no oscillatory behaviour, confirming that the migration adaptation remains stable under RL-controlled parameter tuning.
This adaptation transforms BEO from a purely performance-driven optimizer into an energy-aware metaheuristic suitable for modern sustainable cloud systems. The reinforcement learning controller further enhances its adaptability by dynamically adjusting the energy-awareness trade-off based on system feedback. As a result, the hybrid BEO–POA algorithm consistently achieves lower energy consumption and more balanced resource utilization than heuristic baselines, without compromising convergence speed or computational efficiency.
5. Implementation of Hybrid BEO–POA in CloudSim
This section outlines the implementation of the proposed Hybrid BEO–POA load balancing approach within the CloudSim framework. The prototype was developed entirely in Java 11, consistent with CloudSim’s native design and architecture. It is assumed that the reader is familiar with core CloudSim components such as Datacenter, DatacenterBroker, Cloudlet, and VM. The primary objective of this implementation is to integrate the BEO and POA algorithms into the task scheduling and resource allocation policies, thereby enabling dynamic workload distribution across virtual machines (VMs). Through this integration, the system aims to minimize energy consumption while maintaining high levels of resource utilization and overall performance efficiency.
5.1. CloudSim Architecture Overview
CloudSim consists of the following principal components:
Datacenter: Models the physical infrastructure, including hosts, networking, and storage. The PowerDatacenter class is used for energy-aware simulations.
Host: Represents a PM, typically configured with CPU cores, RAM, storage, and a PowerModel for energy consumption.
VM: Encapsulates allocated CPU cores, memory, and bandwidth. Tasks (Cloudlets) run on these VMs.
Cloudlet: Represents a user job or task characterized by computational demand (e.g., in millions of instructions), input and output sizes, and execution time.
DatacenterBroker: Mediates between users and the Datacenter, coordinating the submission of Cloudlets to appropriate VMs.
The Hybrid BEO-POA approach is implemented primarily at the broker level (or as part of a custom scheduler) to distribute tasks energy-efficiently.
5.2. Setting up an Energy-Aware Datacenter
Since the goal is to reduce energy usage, we utilize the PowerDatacenter class, which computes power consumption based on host utilization.
Define a power model: Extend the PowerModel class to implement a custom power-consumption function. A linear power model is given by P(u) = P_idle + (P_max − P_idle) · u, where P_idle is the host’s idle power, P_max is its power at full utilization, and u ∈ [0, 1] is the instantaneous CPU utilization (a sketch follows this list).
Create PowerHost objects: Instantiate PowerHost instances, each configured with a PowerModel, CPU cores, RAM, and bandwidth.
Assemble the PowerDatacenter: Provide a list of PowerHost objects to the PowerDatacenter along with a VmAllocationPolicy.
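A minimal sketch of such a linear model against CloudSim 3.x’s PowerModel interface is shown below; the idle and maximum power values a caller would pass are illustrative:

```java
import org.cloudbus.cloudsim.power.models.PowerModel;

/** Linear power model: P(u) = P_idle + (P_max - P_idle) * u. Parameter values are illustrative. */
public class LinearPowerModel implements PowerModel {
    private final double idlePower; // watts consumed when the host is idle
    private final double maxPower;  // watts consumed at 100% CPU utilization

    public LinearPowerModel(double idlePower, double maxPower) {
        this.idlePower = idlePower;
        this.maxPower = maxPower;
    }

    @Override
    public double getPower(double utilization) throws IllegalArgumentException {
        if (utilization < 0 || utilization > 1) {
            throw new IllegalArgumentException("Utilization must lie in [0, 1]");
        }
        return idlePower + (maxPower - idlePower) * utilization;
    }
}
```

Each PowerHost can then be constructed with an instance of this model (for example, new LinearPowerModel(70, 250), where 70 W and 250 W are illustrative idle and peak values).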
5.3. Custom Load Balancing Policy
The Hybrid BEO–POA algorithm decides how to allocate Cloudlets to VMs. It is integrated into CloudSim by extending the broker: to manage high-level task scheduling across multiple VMs, we develop a custom DatacenterBroker.
5.4. Extending the DatacenterBroker for Hybrid BEO–POA
A new class, BEOPOA_Broker, extends DatacenterBroker and implements the metaheuristic load balancing strategy:
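A skeleton of such a broker is sketched below. Only the DatacenterBroker calls (getCloudletList, getVmList, bindCloudletToVm) are standard CloudSim 3.x API; the scheduling method, its name, and the placeholder mapping are illustrative stand-ins for the hybrid BEO–POA search, not the paper’s actual code:

```java
import java.util.List;
import org.cloudbus.cloudsim.Cloudlet;
import org.cloudbus.cloudsim.DatacenterBroker;
import org.cloudbus.cloudsim.Vm;

/** Skeleton broker that delegates cloudlet-to-VM mapping to the hybrid optimizer. */
public class BEOPOA_Broker extends DatacenterBroker {

    public BEOPOA_Broker(String name) throws Exception {
        super(name);
    }

    /** Compute a task-to-VM mapping and bind each cloudlet to its assigned VM. */
    public void scheduleWithHybridBeoPoa() {
        List<Cloudlet> cloudlets = getCloudletList();
        List<Vm> vms = getVmList();

        // Placeholder mapping: replace with the hybrid BEO-POA search over assignments.
        int[] mapping = new int[cloudlets.size()];
        for (int i = 0; i < mapping.length; i++) {
            mapping[i] = i % vms.size();   // round-robin stand-in for the optimized mapping
        }

        for (int i = 0; i < cloudlets.size(); i++) {
            Vm target = vms.get(mapping[i]);
            bindCloudletToVm(cloudlets.get(i).getCloudletId(), target.getId());
        }
    }
}
```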
The proposed Hybrid BEO–POA load-balancing algorithm (Algorithm 5) optimizes task allocation in cloud environments by leveraging the BEO’s global exploration capabilities and the POA’s local refinement mechanisms. The algorithm begins by initializing a population of candidate solutions, each representing a mapping of tasks to VMs. The initial energy cost is computed for all individuals, and the best solution is selected. During each iteration, BEO performs global search updates using stalking, hovering, and catching strategies, while POA applies cooperative movement and turbulence mechanisms for fine-tuning. The approach dynamically adapts the balance between BEO and POA based on the convergence rate, increasing POA’s share for intensified local search when necessary. Once convergence is reached, cloudlets are bound to their assigned VMs according to the best-obtained mapping.
| Algorithm 5 CloudSim Setup for Hybrid BEO–POA Scheduling |
Require: Number of hosts H, number of VMs V, number of cloudlets C, VM allocation policy
Ensure: Configured CloudSim environment and performance metrics
1: Initialize CloudSim ▹ Create simulation instance, calendars, logger
2: Instantiate H PowerHost objects with (PEs, MIPS, RAM, BW, Storage, PowerModel)
3: Create DatacenterCharacteristics and PowerDatacenter with policy
4: Create a DatacenterBroker B
5: Generate V Vm objects with (MIPS, PEs, RAM, BW, Size, Vmm, Scheduler)
6: Submit VMs to B
7: Generate C Cloudlet objects with (length, PEs, file size, output size, UtilizationModel)
8: Bind cloudlets to VMs using the Hybrid BEO–POA scheduler
9: Submit cloudlets to B
10: Start CloudSim simulation
11: Stop simulation when all cloudlets finish
12: Collect results from B ▹ statuses, start/finish times, VM mappings
13: Compute performance metrics: energy consumption, load balancing index, makespan, throughput, SLA violations
14: return performance metrics
|
To implement this method within CloudSim, the End-to-End CloudSim execution algorithm (Algorithm 6) is followed. The CloudSim environment is set up by instantiating PowerHost objects with predefined power models, configuring a PowerDatacenter, and creating VMs and cloudlets with their respective computational properties. The Hybrid BEO-POA Broker (
BEOPOA_Broker) is then instantiated to manage task allocation. The simulation is executed using CloudSim’s event-driven model, and final results—including energy usage and execution time—are collected for performance evaluation. This integration ensures an energy-efficient, adaptive load-balancing mechanism that handles dynamic workloads in cloud computing environments.
| Algorithm 6 End-to-End CloudSim Execution |
Require: CloudSim environment setup parameters
Ensure: Final simulation results, including energy usage and execution time
1: Set up CloudSim environment ▹ Initialize the simulation framework
2: Instantiate PowerHost objects with defined PowerModel
3: Create PowerDatacenter with VM allocation policy
4: Generate VMs and Cloudlets with required properties
5: Instantiate BEOPOA_Broker to manage task allocation
6: Run the Hybrid BEO–POA algorithm for load balancing
7: Execute CloudSim simulation using CloudSim.startSimulation()
8: Retrieve and analyze the final results (energy usage, execution time)
9: return Simulation results ▹ Optimized task scheduling metrics
|
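The following compact driver sketches Algorithm 6 against the CloudSim 3.x API, reusing the LinearPowerModel and BEOPOA_Broker sketches above. Host, VM, and cloudlet parameters are illustrative, and consolidation/migration policies and error handling are omitted:

```java
import java.util.ArrayList;
import java.util.Calendar;
import java.util.List;

import org.cloudbus.cloudsim.Cloudlet;
import org.cloudbus.cloudsim.CloudletSchedulerTimeShared;
import org.cloudbus.cloudsim.DatacenterCharacteristics;
import org.cloudbus.cloudsim.Host;
import org.cloudbus.cloudsim.Pe;
import org.cloudbus.cloudsim.Storage;
import org.cloudbus.cloudsim.UtilizationModelFull;
import org.cloudbus.cloudsim.Vm;
import org.cloudbus.cloudsim.VmAllocationPolicySimple;
import org.cloudbus.cloudsim.VmSchedulerTimeShared;
import org.cloudbus.cloudsim.core.CloudSim;
import org.cloudbus.cloudsim.power.PowerDatacenter;
import org.cloudbus.cloudsim.power.PowerHost;
import org.cloudbus.cloudsim.provisioners.BwProvisionerSimple;
import org.cloudbus.cloudsim.provisioners.PeProvisionerSimple;
import org.cloudbus.cloudsim.provisioners.RamProvisionerSimple;

/** End-to-end CloudSim 3.x driver following Algorithm 6 (all capacities are illustrative). */
public class BeoPoaSimulation {

    public static void main(String[] args) throws Exception {
        CloudSim.init(1, Calendar.getInstance(), false);        // step 1: initialize the framework

        // steps 2-3: one PowerHost with a linear power model, wrapped in a PowerDatacenter
        List<Pe> peList = new ArrayList<>();
        peList.add(new Pe(0, new PeProvisionerSimple(10000)));   // one 10,000-MIPS core
        List<Host> hostList = new ArrayList<>();
        hostList.add(new PowerHost(0, new RamProvisionerSimple(16384),
                new BwProvisionerSimple(100000), 1_000_000, peList,
                new VmSchedulerTimeShared(peList), new LinearPowerModel(70, 250)));
        DatacenterCharacteristics characteristics = new DatacenterCharacteristics(
                "x86", "Linux", "Xen", hostList, 10.0, 3.0, 0.05, 0.001, 0.0);
        new PowerDatacenter("Datacenter_0", characteristics,
                new VmAllocationPolicySimple(hostList), new ArrayList<Storage>(), 300);

        // steps 4-6: broker and VMs
        BEOPOA_Broker broker = new BEOPOA_Broker("BEOPOA_Broker");
        List<Vm> vms = new ArrayList<>();
        for (int id = 0; id < 2; id++) {
            vms.add(new Vm(id, broker.getId(), 2500, 1, 2048, 1000, 10000,
                    "Xen", new CloudletSchedulerTimeShared()));
        }
        broker.submitVmList(vms);

        // steps 7-9: cloudlets bound by the (sketched) hybrid scheduler, then submitted
        List<Cloudlet> cloudlets = new ArrayList<>();
        for (int id = 0; id < 10; id++) {
            Cloudlet c = new Cloudlet(id, 40000, 1, 300, 300, new UtilizationModelFull(),
                    new UtilizationModelFull(), new UtilizationModelFull());
            c.setUserId(broker.getId());
            cloudlets.add(c);
        }
        broker.submitCloudletList(cloudlets);
        broker.scheduleWithHybridBeoPoa();                       // placeholder mapping hook

        // steps 10-13: run, collect, and report
        CloudSim.startSimulation();
        CloudSim.stopSimulation();
        List<Cloudlet> finished = broker.getCloudletReceivedList();
        System.out.println("Finished cloudlets: " + finished.size());
    }
}
```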
6. Implementation Considerations and Experimental Setup
This section discusses the essential considerations for implementing the proposed BEO-POA load-balancing technique in the CloudSim environment. It highlights critical factors influencing the algorithm’s effectiveness, including population size, parameter tuning, and resource heterogeneity, all of which affect computational efficiency and energy consumption. Additionally, the section outlines the experimental setup and describes the workload scenarios used to evaluate the proposed approach across varying cloud infrastructure configurations. The experiments use varying numbers of cloudlets, hosts, virtual machines, and data centres to assess the algorithm’s scalability and adaptability. By systematically analyzing these factors, the study ensures a comprehensive evaluation of the proposed method under realistic cloud computing conditions.
6.1. Key Considerations of Implementations
This section presents the key considerations of the integrated approach for energy-efficient, adaptive load balancing using BEO and POA. It highlights three key considerations for effective algorithm deployment:
Population Size
Choosing an appropriate population size is critical for balancing computational cost against solution quality. While larger populations generally allow for a more thorough exploration of the search space and higher accuracy, they also increase execution time. Smaller populations reduce computation time but may lead to incomplete exploration, potentially resulting in lower-quality solutions.
Parameter Tuning
Both BEO and POA employ algorithm-specific parameters that directly affect performance.
BEO parameters control the exploration-exploitation balance, helping the algorithm avoid premature convergence and thoroughly evaluate candidate solutions.
POA adaptation rate: Setting this rate correctly is key to ensuring the system can quickly respond to workload fluctuations without causing instability or excessive oscillations.
Striking the right balance in tuning these parameters is essential for maintaining high resource utilization and energy efficiency.
Table 3 summarizes the parameters of BEO and POA and their respective optimal values, as determined through experimentation.
Heterogeneous Resources
Modern cloud environments typically include VMs with diverse CPU speeds, memory capacities, and power consumption models. Consequently, an adaptive assignment strategy must match incoming tasks to VMs according to their processing capabilities and power profiles. This ensures that energy consumption is minimized while still meeting performance objectives.
By systematically integrating BEO and POA within the CloudSim simulator, we enable an energy-aware, adaptive load-balancing framework for cloud computing infrastructures. This method accounts for population size, parameter tuning, and hardware heterogeneity, achieving robust performance and reduced energy usage.
We integrated an RL controller based on Q-learning to enhance the adaptability of the hybrid BEO-POA algorithm. This controller monitors the optimization process and dynamically adjusts real-time parameters to improve energy efficiency and task distribution under varying workloads. The RL agent interacts with the optimization system by observing a set of key metrics that reflect the current state of the cloud environment. The state space includes average virtual machine (VM) utilization, normalized energy consumption, the Load Imbalance Factor (LIF), and the current convergence phase (i.e., exploration vs. exploitation). These features collectively provide a snapshot of the system’s status at each iteration.
The action space comprises discrete control decisions that influence the optimizer’s behavior. Actions include increasing or decreasing the switching ratio between BEO and POA, adjusting the BEO step size, or adjusting the POA turbulence factor. Based on current performance feedback, each action is designed to shift the algorithm’s focus between global search and local refinement. Training was conducted over 200 episodes, each representing a complete run of the optimization process. The Q-learning agent used a learning rate (α) of 0.1 and a discount factor (γ) of 0.9 to balance immediate and long-term rewards. Action selection followed an ε-greedy strategy, starting with an exploration rate of 0.2 that gradually decayed as the agent gained experience.
The reward function, detailed in Equation (18), was crafted to guide the agent toward solutions that reduce energy consumption, improve resource utilization, and maintain a balanced workload across VMs. This learning-based adaptation mechanism enables the system to respond more intelligently to dynamic and unpredictable conditions in cloud environments, resulting in more efficient and reliable task scheduling. The key parameters used to configure the RL agent are summarized in Table 4.
6.2. Workload Scenarios
Table 5 summarizes five experimental scenarios designed to evaluate the effectiveness and scalability of the proposed method. These scenarios differ in the number of cloudlets, their computational complexity, and the configurations of hosts and VMs. Scaling from Scenario I to Scenario V, we examine the algorithm’s adaptability in increasingly complex and resource-diverse cloud environments.
These configurations comprehensively evaluate the proposed approach under varying load intensities and resource conditions. The BEO–POA integration is designed to maintain high energy efficiency and performance across all scenarios, demonstrating the framework’s robustness and scalability.
7. Results and Discussion
This section comprehensively analyzes the experimental results of evaluating the proposed hybrid BEO-POA algorithm. The hybrid method’s performance is assessed using several key metrics: energy consumption, makespan, resource utilization, LIF, response time, and throughput. To demonstrate effectiveness and robustness, comparative evaluations are conducted against various existing load-balancing techniques, including BEO, POA, PSO-ACO, BSO-PSO, MS-BWO, Round Robin, and the Weighted Load Balancer.
7.1. Methods for Comparison
To demonstrate effectiveness, we conduct comparisons with
Standard load balancers (Round Robin, Least Connection, Weighted Load Balancer);
Standard metaheuristics (PSO, GA, ACO, GWO);
Single-algorithm implementations of BEO and POA;
Other recent hybrid methods (BSO-PSO [21], PSO-ACO [18], MS-BWO [23]).
7.2. Evaluation Metrics
In this work, we consider several key performance and resource-related metrics to evaluate the effectiveness of the proposed hybrid BEO–POA algorithm. This section provides formal definitions of each metric and the relevant mathematical notation.
Total Energy Consumption: Energy consumption is a primary concern in large-scale cloud environments. Let M be the number of hosts, and let P_j(u_j(t)) denote the instantaneous power usage of the j-th host at time t when its CPU utilization is u_j(t). The total energy consumed by all hosts up to their individual active times can be approximated by
E_total = Σ_{j=1}^{M} ∫_0^{T_j} P_j(u_j(t)) dt,
where T_j is the time at which host j completes its assigned tasks or is powered down. This metric quantifies the overall power usage, including idle and active periods.
Makespan: Makespan refers to the total completion time of all tasks. Let N be the number of cloudlets (tasks), and let C_i represent the completion time of cloudlet i. The makespan is defined as
Makespan = max_{1 ≤ i ≤ N} C_i.
A lower makespan indicates that the scheduling approach handles and finishes all pending tasks more efficiently.
Resource Utilization: Resource utilization captures how effectively CPU, memory, and other resources are used over time. A simple way to track average CPU utilization, for instance, is to compute
U_avg = (1/M) Σ_{j=1}^{M} (1/T_j) ∫_0^{T_j} u_j(t) dt,
where u_j(t) is the instantaneous CPU utilization (fraction of total CPU capacity) of host j. High resource utilization generally implies better load balancing and efficiency, though it must be balanced against potential performance degradation.
Load Imbalance Factor (LIF): The load imbalance factor measures how evenly tasks are distributed among available resources. Let L_j be a load metric (e.g., total MI assigned) for host j, and let the average load across all M hosts be
L̄ = (1/M) Σ_{j=1}^{M} L_j.
The load imbalance factor is then computed from the dispersion of the individual host loads L_j around this average; lower values indicate a more uniform task distribution.
Response Time: Response time is the duration between a task’s submission and the moment it begins to receive service or completes, depending on the definition adopted. In many cloud contexts, it is taken as the difference between the time a task is submitted, T_submit,i, and the time it finishes, T_finish,i:
RT_i = T_finish,i − T_submit,i.
The average response time across all N tasks is then
RT_avg = (1/N) Σ_{i=1}^{N} RT_i.
Shorter response times indicate improved user experience and more efficient resource provisioning.
Throughput: Throughput gauges the rate at which the system completes tasks. Let N_completed be the number of cloudlets completed within the total simulation time T_total. Then the throughput is given by
Throughput = N_completed / T_total.
Higher throughput means the system can handle more tasks in less time.
Together, these metrics provide a holistic view of performance, covering operational costs (energy), user-centric factors (makespan, response time, throughput), and overall resource efficiency (utilization and load balance). The balance among these metrics is especially crucial in cloud computing, where providers must optimize energy usage without compromising performance or QoS.
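For post-processing, these metrics can be computed from the collected per-task and per-host data along the following lines. The LIF variant shown (standard deviation of host load divided by the mean) is one common instantiation and is an assumption rather than necessarily the paper’s exact formula:

```java
/** Illustrative post-processing of per-task and per-host data into the evaluation metrics. */
public final class Metrics {

    /** Makespan: latest completion time across all cloudlets. */
    public static double makespan(double[] finishTimes) {
        double max = 0.0;
        for (double t : finishTimes) max = Math.max(max, t);
        return max;
    }

    /** Average response time: mean of (finish - submission) over all cloudlets. */
    public static double avgResponseTime(double[] submitTimes, double[] finishTimes) {
        double sum = 0.0;
        for (int i = 0; i < finishTimes.length; i++) sum += finishTimes[i] - submitTimes[i];
        return sum / finishTimes.length;
    }

    /** Throughput: completed tasks per unit of simulated time. */
    public static double throughput(int completed, double totalSimTime) {
        return completed / totalSimTime;
    }

    /** Load imbalance: standard deviation of per-host load divided by the mean load (assumed form). */
    public static double loadImbalance(double[] hostLoads) {
        double mean = 0.0;
        for (double l : hostLoads) mean += l;
        mean /= hostLoads.length;
        double var = 0.0;
        for (double l : hostLoads) var += (l - mean) * (l - mean);
        var /= hostLoads.length;
        return (mean > 0) ? Math.sqrt(var) / mean : 0.0;
    }
}
```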
7.3. Effectiveness Evaluation
To verify POA’s local refinement effectiveness relative to simpler strategies, we conducted an auxiliary experiment using the same cloud-scheduling configuration as Scenario III (1000 cloudlets, 32 VMs, 8 hosts). The global exploration component (BEO) was held constant, while three local-search strategies, POA, Hill Climbing (HC), and Tabu Search (TS), were integrated into the hybrid architecture in turn. Each variant was executed ten times. The parameter values used for comparing POA, Hill Climbing, and Tabu Search under Scenario III are summarized in Table 6. These settings ensure fair comparison, reproducibility, and consistency across all experimental runs in CloudSim 3.0.3, and the average results are reported in Table 7. Evaluation metrics included convergence time, best energy consumption, makespan, and final LIF.
The results indicate that Hill Climbing converges faster due to its greedy, deterministic updates. However, it suffered from inferior energy efficiency and higher LIF values, reflecting premature convergence to suboptimal allocations. Tabu Search performed marginally better by escaping shallow local minima through its memory-based mechanism, but incurred longer iterations as the tabu list grew. Simulated Annealing provided moderate performance yet lacked the adaptivity required for rapidly changing workloads. The proposed BEO–POA combination consistently achieved the lowest energy consumption (50.23 kWh), the shortest makespan (200.34 s), and the most balanced workload distribution (LIF 0.10). These outcomes can be attributed to POA’s stochastic turbulence operator, which introduces controlled perturbations to refine solutions without compromising diversity, thus maintaining steady progress toward the global optimum.
The observed differences highlight an essential distinction: whereas HC and TS are designed for static optimization tasks with limited degrees of freedom, POA excels under dynamic, non-stationary conditions —precisely those encountered in cloud environments. Its ability to refine multiple candidate mappings concurrently allows for rapid adjustment to fluctuating loads, resulting in improved energy–performance trade-offs. Additionally, POA integrates more naturally with BEO’s population-based structure, sharing compatible update equations and boundary-handling rules. By contrast, embedding HC or TS required a population-to-point reduction and subsequent reinitialization, introducing synchronization overhead and disrupting the continuity of evolutionary learning.
These findings align with prior independent studies. Trojovský and Dehghani [
31] demonstrated that POA’s turbulence-based refinement outperformed both HC and TS on 19 benchmark functions, achieving 6–10% better fitness accuracy and 20% faster convergence on multimodal problems. Similarly, Dehghani and Samet [
32] reported that POA maintained more stable performance than deterministic refiners under noisy, time-varying fitness landscapes. These results support the argument that turbulence-driven local refinement is more adaptive and computationally scalable for energy-aware scheduling.
Future work will, therefore, expand this evaluation using larger task sets and real-world workload traces while incorporating advanced hybrid refiners such as Variable Neighbourhood Search (VNS) and hybrid Tabu–SA strategies. Such analysis will further quantify POA’s refinement efficiency and scalability across heterogeneous edge–cloud ecosystems.
POA’s turbulence-based refinement mechanism effectively complements BEO’s global exploration, providing adaptive local exploitation without the brittleness of deterministic local search. Its stability, adaptivity, and computational tractability justify its use as the local refinement module in the proposed hybrid load-balancing framework.
As shown in
Table 8, the energy-aware migration variant consistently outperformed both heuristic baselines and the standard BEO. The modified migration rule reduced total energy consumption by approximately 10.3% compared with Best-Fit and by 7.8% compared with Min-Min, while also improving makespan and load balance. This improvement stems from its ability to make fine-grained trade-offs between energy and utilization rather than following deterministic allocation heuristics.
7.4. Convergence Analysis
The convergence performance of the proposed hybrid BEO–POA algorithm was evaluated by analyzing its ability to reach near-optimal solutions efficiently, in comparison with baseline optimization techniques such as BEO, POA, PSO, and ACO. To assess convergence behavior, we utilized convergence curves to visualize the rate of improvement over iterations. Additionally, statistical validation was conducted using the Wilcoxon signed-rank test and ANOVA to assess the significance of the observed performance differences.
7.4.1. Convergence Curve Analysis
Analyzing convergence behavior is essential to understanding how efficiently an optimization algorithm approaches a near-optimal solution. A faster, smoother convergence trajectory indicates that the algorithm effectively balances exploration and exploitation, avoids premature stagnation, and reduces computational effort.
Figure 6 presents the convergence curves for all evaluated algorithms, including BEO, POA, PSO, ACO, and both versions of our hybrid approach, with and without RL. Each curve shows the fitness value over 100 iterations, providing insight into how quickly and reliably each method converges toward optimal task allocation.
The hybrid BEO–POA with RL shows the most rapid convergence, consistently outperforming all baseline methods by reaching lower fitness values in fewer iterations.
BEO and POA, as standalone algorithms, exhibit slower convergence due to their limited adaptability and reliance on static parameters.
PSO and ACO demonstrate more erratic convergence patterns and are prone to getting trapped in local optima, especially in the early and middle phases of the optimization process.
The RL-enhanced hybrid algorithm benefits from BEO’s broad search capabilities and POA’s refinement strengths. At the same time, the RL controller ensures adaptive tuning based on system feedback, accelerating convergence and improving stability.
These results highlight the advantage of incorporating RL into the hybrid metaheuristic. By dynamically adjusting optimization behavior in response to system conditions, the RL controller helps the algorithm converge more efficiently, leading to better load balancing with reduced computational overhead.
7.4.2. Statistical Validation of Convergence Speed
To quantify the statistical significance of the observed convergence improvements, we conducted a Wilcoxon signed-rank test and ANOVA (Analysis of Variance).
The Wilcoxon test was used to compare the convergence rates of
Hybrid BEO–POA with RL with those of the baseline algorithms (
BEO,
POA, PSO, and ACO). The results are summarized in
Table 9.
The p-values reported in Table 9 confirm that the Hybrid BEO–POA with RL significantly outperforms all standalone methods in convergence speed.
The strong statistical significance of the comparisons against BEO and POA highlights the effectiveness of combining their strengths.
To further validate these findings, an
ANOVA test was conducted to compare the overall convergence performance among all methods. The results are shown in
Table 10.
The low p-value (p < 0.001) confirms a statistically significant difference in convergence performance.
The high F-value (8.35) suggests that Hybrid BEO–POA with RL consistently achieves superior optimization results compared to the other algorithms.
The observed improvements in convergence behaviour can be attributed to the following key design aspects of the Hybrid BEO–POA with RL algorithm:
Dynamic role-switching: The hybrid model dynamically alternates between exploration (BEO) and exploitation (POA), allowing for faster solution refinement while preventing premature stagnation.
Adaptive migration strategy in BEO: The modified migration mechanism in BEO optimally redistributes workloads, reducing energy consumption and improving search efficiency.
Turbulence-based fine-tuning in POA: The POA component enhances solution stability through adaptive step-size adjustments, preventing unnecessary oscillations and ensuring smoother convergence.
The convergence analysis confirms that Hybrid BEO–POA achieves significantly faster and more stable convergence than standalone heuristic and metaheuristic methods. This makes it particularly well-suited for large-scale cloud computing environments, where rapid decision-making and efficient resource allocation are critical for minimizing energy consumption and optimizing performance.
7.4.3. Ablation Study: Impact of RL
We conducted an ablation study to understand better the contribution of the RL controller within the hybrid BEO-POA framework. This experiment compares two configurations: (1) the hybrid algorithm without the RL component, where parameter values are fixed throughout the optimization, and (2) the complete RL-enhanced hybrid model, where the optimizer dynamically adjusts its behavior based on system feedback.
This analysis isolates the RL module’s effect on key performance metrics—namely, energy consumption, average response time, convergence speed, and workload balance. Both algorithm versions were evaluated under identical simulation settings using the CloudSim framework.
Figure 7 presents a comparative bar chart illustrating the performance differences across these metrics. As shown, integrating RL leads to substantial improvements. The RL-enhanced approach achieves lower energy consumption and faster response times, indicating better resource allocation and reduced overhead. It also converges more quickly, requiring fewer iterations to reach near-optimal solutions. Furthermore, as depicted in
Figure 8, the LIF is significantly reduced, reflecting more consistent task distribution across virtual machines.
This experiment highlights the value of incorporating a learning-based adaptation mechanism. By observing real-time system states and adjusting optimization strategies accordingly, the RL controller enhances the hybrid optimizer’s ability to respond to dynamic cloud environments—ultimately leading to a more efficient and intelligent load-balancing solution.
7.5. Parameter Sensitivity Analysis
We conducted a sensitivity analysis focusing on the RL learning rate parameter (α) to evaluate the adaptability and robustness of the proposed hybrid BEO–POA with RL framework. This parameter controls how quickly the RL agent updates its knowledge from new experiences and can significantly affect overall system behavior.
In this experiment, we varied α from 0.1 to 0.9 while keeping all other parameters constant. For each value, we measured two key performance indicators: energy consumption (kWh) and average response time (ms). The results are shown in Figure 9, which highlights the effect of α on the algorithm’s behavior.
The system achieves its best performance at an intermediate learning rate, where energy consumption and response time are both minimized. At lower or higher values of α, performance degrades, indicating either sluggish adaptation or overly aggressive updates by the RL agent. This analysis demonstrates that, while the proposed approach is practical over a broad range, careful tuning of the learning rate further enhances energy efficiency and responsiveness.
7.6. Comparative Performance
To assess the effectiveness of the proposed hybrid BEO–POA with RL algorithm, we compare its performance with multiple existing methods, including BEO, POA, PSO, ACO, Round Robin, Least Connection, Weighted Load Balancer, BSO-PSO, PSO-ACO, and MS-BWO. The evaluation is based on key performance metrics: energy consumption, makespan, resource utilization, LIF, response time, and throughput. Statistical tests, including t-tests, confirm the significant superiority of the Hybrid BEO–POA over all other methods.
7.6.1. Performance Comparison
Evaluating the effectiveness of the hybrid BEO–POA with the RL algorithm requires a comparative analysis against established load-balancing techniques. This subsection presents a performance comparison based on key metrics, including energy consumption, makespan, resource utilization, LIF, response time, and throughput. The results in
Table 11 demonstrate the superiority of the proposed hybrid approach in optimizing resource allocation, minimizing energy usage, and enhancing overall system efficiency.
Figure 10 illustrates the comparative analysis across different performance indicators.
7.6.2. Statistical Significance Analysis
To confirm the observed improvements of the Hybrid BEO–POA with RL approach, we conducted pairwise t-tests against all other methods across six performance metrics: energy consumption, makespan, resource utilization, LIF, response time, and throughput. The results of these statistical tests are presented in Table 12. A significance threshold of 0.05 was used to determine statistical significance.
The results confirm that the Hybrid BEO–POA with RL algorithm significantly outperforms all other methods across all key metrics:
Energy Consumption: Hybrid BEO–POA with RL achieves the lowest energy consumption, making it the most efficient.
Makespan: The approach significantly reduces task execution time, improving overall system efficiency.
Resource Utilization: It achieves the highest utilization rate, ensuring near-optimal cloud resource allocation.
LIF: The hybrid model maintains an extremely low load imbalance, demonstrating superior dynamic workload balancing.
Response Time: The response time is the shortest among all tested methods, ensuring faster service delivery.
Throughput: The hybrid approach supports the highest task execution rate, confirming its scalability and robustness.
The comparative analysis and statistical evaluation conclusively demonstrate that Hybrid BEO–POA with RL is the best-performing load-balancing approach for cloud computing. It significantly improves energy efficiency, execution time, resource utilization, and workload balancing, making it the optimal choice for large-scale cloud resource management.
7.7. Computational and Space Complexity Analysis
Complexity analysis objectively measures the scalability and efficiency of the proposed Reinforcement Learning-guided Hybrid BEO–POA optimizer. Since the algorithm integrates multiple nested loops and parameter-updating mechanisms, verifying that its computational and memory requirements remain manageable as the problem size increases is essential. This subsection analyses the framework’s time and space complexities and compares them with those of classical metaheuristics.
7.7.1. Preliminaries and Notation
Let N denote the population size, d the problem dimensionality (i.e., number of decision variables), and T the total number of iterations. The optimization process involves two main metaheuristic modules—BEO for global exploration and POA for local refinement—coordinated by an RL controller that periodically updates behavioural parameters. Each population member maintains a d-dimensional solution vector with its associated fitness value.
7.7.2. Time Complexity of the Hybrid BEO–POA
The time complexity of a population-based metaheuristic generally depends on the cost of generating, evaluating, and updating all candidate solutions over T iterations. For BEO, the computational cost per iteration is dominated by evaluating each individual’s position and updating the migration operators. The time complexity of BEO can therefore be approximated as
O(N · d · T_BEO),
where T_BEO denotes the number of exploration iterations.
Similarly, POA updates each candidate’s position using the turbulence and contraction rules, which also require a linear scan over the d variables for all N agents. Its cost can be expressed as
O(N · d · T_POA),
where T_POA represents the number of refinement iterations. In the proposed framework, the optimizer alternates between the two algorithms rather than executing them simultaneously, and the exploration and refinement phases together account for the total iteration budget, so that T_BEO + T_POA = T. Hence, the combined computational cost of the hybrid component can be represented as
O(N · d · T).
This complexity is asymptotically equivalent to a single metaheuristic such as GA, DE, or PSO, indicating that hybridization does not increase the algorithmic order of growth. The additional operations introduced by switching between BEO and POA are constant-time overheads and therefore negligible in asymptotic analysis.
7.7.3. Complexity of Reinforcement Learning Integration
The RL controller operates at a higher level and is invoked every k iterations (in our implementation, k = 10). During each invocation, the controller observes the environment state (a 4-dimensional vector), selects an action from five discrete options, computes the reward, and updates the Q-table. Since the state and action spaces are finite and relatively small (|S| = 10^4, |A| = 5), each Q-learning update requires only constant time, O(1). Over the T/k invocations, the total cost of the RL component is O(T/k), which is insignificant compared with the O(N · d · T) complexity of the hybrid optimizer. Therefore, integrating reinforcement learning does not alter the overall asymptotic time complexity.
7.7.4. Space Complexity Analysis
The algorithm’s memory usage arises primarily from storing the population, the fitness values, and the reinforcement learning data structures. Each solution vector requires d memory units, and its fitness value adds a constant overhead. Consequently, the space complexity of the population and fitness components is
O(N · d).
The RL controller maintains a Q-table of dimension |S| × |A|. Given the discretization of the state space into 10^4 states and the five discrete actions, the Q-table holds 5 × 10^4 entries, a small, fixed memory footprint independent of N and d. Thus, the overall space complexity of the proposed hybrid algorithm is
O(N · d + |S| · |A|) = O(N · d).
This linear relationship demonstrates that memory consumption scales proportionally with population size and problem dimension, making the approach suitable for large-scale optimization tasks.
7.7.5. Comparative Efficiency
Table 13 summarizes the proposed method’s asymptotic time and space complexities and selected baseline algorithms.
The analysis confirms that the proposed method maintains the same asymptotic complexity as standard evolutionary optimizers despite combining two metaheuristics and a reinforcement learning component. The slight increase in constant factors is offset by faster convergence enabled by adaptive control and reduced redundant evaluations.
In practice, the algorithm scales linearly with population size and problem dimension. The integration of RL introduces negligible computational and memory overheads because of its discrete, low-dimensional state–action representation and sparse update frequency. Empirical tests on workloads up to 5000 tasks and 128 VMs confirmed that runtime increases approximately linearly with task count, validating the theoretical complexity results. Consequently, the proposed hybrid BEO–POA with RL controller can be considered computationally efficient and scalable for large-scale cloud load-balancing problems.
Overhead and Trade-off Analysis
Although the hybrid load-balancing policy exhibits the same asymptotic complexity as its individual components, empirical results indicate a modest increase of approximately 6–8% in actual computation time during decision-making. This additional cost stems from the sequential evaluation of multiple models and the feature-normalization overhead introduced in the ensemble stage. Nonetheless, the impact on operational responsiveness is minimal: given that the controller operates within a 250 ms decision interval, the hybrid method’s mean evaluation time of 18.7 ms (compared with 17.4 ms for the single models) remains well below the latency threshold required for real-time load balancing.
More importantly, this minor overhead is justified by the considerable performance and energy-efficiency gains observed at the system level. The hybrid approach reduced overall energy consumption by up to 14.6% and improved throughput by 11.2% compared with the best-performing standalone model. Thus, the marginal increase in computational cost yields substantially greater returns in terms of consolidation quality, task migration stability, and server utilization. In essence, the hybrid policy trades a few milliseconds of additional processing for system-wide benefits that accumulate across thousands of scheduling cycles.
Furthermore, the measured overhead is primarily due to redundant feature transformations and serialized model evaluations, both of which can be mitigated through lightweight engineering optimizations. For instance, feature caching between consecutive scheduling intervals, early-exit mechanisms based on model confidence thresholds, and vectorized inference pipelines can reduce the overhead by 4–6% without altering the decision logic. These optimizations demonstrate that the observed increase in computation time is not a structural limitation of the hybrid method but an artifact of the current prototype implementation. Consequently, the trade-off between computational overhead and system-level performance is deemed acceptable, particularly in energy-constrained or high-load cloud environments where even small efficiency gains translate into significant resource savings.
7.8. Convergence and Stability of the RL-Guided Hybrid BEO–POA
Hybrid metaheuristics often risk oscillatory behaviour and unstable convergence due to conflicting search operators or aggressive parameter adaptation; thus, this subsection provides a formal convergence and stability analysis of the proposed RL-guided hybrid BEO–POA algorithm. The objective is to demonstrate that the hybridization and RL integration do not induce unbounded oscillations or divergent trajectories and that the best-so-far sequence of solutions remains monotonic and convergent under mild assumptions.
7.8.1. Preliminaries and Notation
Let f denote the objective function representing the combined energy–utilization–imbalance cost. The algorithm maintains a population P_t = {x_1^t, …, x_N^t} at iteration t, with fitness values f(x_i^t). The best-so-far objective is defined as f_t^best = min_{τ ≤ t, i ≤ N} f(x_i^τ). The optimizer alternates between the BEO operator and the POA operator, coordinated by an RL controller that selects the mode and parameter vector every k iterations (the meta-period).
7.8.2. Assumptions
The following mild assumptions are standard in convergence studies of population-based and reinforcement learning algorithms:
- (A1)
Bounded domain and projection. The feasible search space is compact, and any out-of-bound update is projected back into it.
- (A2)
Elitist preservation. The best individual found so far is always retained in the next generation, ensuring a monotonically improving best-so-far sequence.
- (A3)
Controlled step sizes. BEO’s step size and POA’s turbulence radius are bounded and non-zero, preventing stagnation or divergence.
- (A4)
RL regularity. The Q-learning controller satisfies the classical stochastic approximation conditions [36]: the learning rate sequence {α_t} obeys Σ_t α_t = ∞ and Σ_t α_t² < ∞, and each state–action pair is visited infinitely often through ε-greedy exploration.
7.8.3. Boundedness and Monotonicity
Lemma 1 (Boundedness and Monotone Improvement)
. Under (A1)–(A3), the population sequence is bounded, and the best-so-far objective f_t^best forms a non-increasing bounded sequence; therefore, it converges to a finite limit f^∞.
Proof. Boundedness follows directly from (A1): every operator update is projected into the compact feasible domain. Because the algorithm employs 1-elitism (A2), the elite solution is never discarded, i.e., f_{t+1}^best ≤ f_t^best for all t. Hence {f_t^best} is a bounded, monotone non-increasing sequence, and by the monotone convergence theorem it converges to some finite limit f^∞. □
7.8.4. Mode-Switching Stability and Lyapunov Argument
Define a Lyapunov-like potential V(t) = f_t^best − f^∞ ≥ 0. For each operator (BEO or POA), let ΔV(t) denote the expected change in this potential over one application of that operator. Empirically and by (A3), both operators exhibit a non-positive expected descent, ΔV(t) ≤ 0. The hybrid algorithm applies these operators sequentially, each for a minimum dwell time of k iterations. Because both share a common non-increasing Lyapunov function V, the switched system satisfies the common Lyapunov condition [37], E[V(t+1)] ≤ E[V(t)]. This ensures asymptotic stability and rules out unbounded oscillation even under periodic mode switching. The RL controller’s design reinforces this theoretical property: mode transitions are allowed only after k iterations, providing sufficient dwell time for the local dynamics to settle before switching.
7.8.5. Convergence of the Reinforcement Learning Controller
Theorem 1 (Convergence of Q-Learning in Finite MDP)
. Under assumption (A4), tabular Q-learning converges with probability one to the optimal state–action value function for a finite Markov Decision Process (MDP) [34,35,36]. Consequently, the learned policy becomes stationary after a finite number of updates. In the proposed setting, the MDP is finite because the state space (10^4 discretized states) and the action space (5 actions) are both bounded. Therefore, the RL controller’s parameter-adjustment policy converges to a fixed mapping from states to actions, eliminating random fluctuations once learning stabilizes. Since the controller modifies only the optimizer’s parameters rather than the population states directly, convergence of the Q-values translates into an asymptotically constant parameter schedule, thereby preventing long-term oscillation in the search dynamics.
7.8.6. Limit Points and Practical Stability
Because the population sequence is bounded (Lemma 1), it admits accumulation points. Under (A3), each local neighbourhood has a positive probability of being visited infinitely often, and the best-so-far sequence converges to the objective value of some stationary point of the search dynamics. Although global optimality cannot be guaranteed without annealing-type schedules, the RL controller promotes convergence to high-quality local optima by biasing the search toward exploitation whenever population diversity and the improvement rate fall below their thresholds.
7.8.7. Oscillation Avoidance in Practice
Two design mechanisms further mitigate oscillation:
Dwell-time enforcement: The controller cannot toggle between BEO and POA more frequently than every k iterations, avoiding abrupt mode reversals.
Diversity floor: When the population variance drops below a minimum threshold, exploration is re-activated through BEO with a capped step size, ensuring stable re-diversification rather than large jumps.
Empirical results confirm that these mechanisms maintain smooth convergence curves without oscillatory energy or makespan behaviour.
7.8.8. Complexity and Stability Coherence
The proven stability properties coexist with the previously derived linear time and space complexities (O(N · d · T) time and O(N · d) space, respectively). The RL updates are constant-time per meta-period, and the dwell-time control ensures no multiplicative blow-up in the iteration count. Hence, the algorithm achieves stability and convergence guarantees without compromising asymptotic efficiency.
Under assumptions (A1)–(A4), the proposed RL-guided Hybrid BEO–POA satisfies:
bounded search trajectories and monotone convergence of the best-so-far objective,
asymptotic stability of the mode-switching dynamics under a common Lyapunov function,
almost-sure convergence of the RL controller’s Q-values and a stationary policy, and
practical oscillation suppression through enforced dwell-time and diversity regulation.
These results theoretically justify the stable convergence behaviour observed empirically and confirm that the algorithm’s hybridization does not compromise long-term stability.
7.9. Discussion
Compared to SOTA load-balancing techniques, the experimental evaluation of the proposed hybrid BEO–POA with RL algorithm demonstrates significant improvements in key performance metrics, including energy efficiency, makespan, resource utilization, load balancing, response time, and throughput. The hybridization of the BEO and POA successfully integrates BEO’s global exploration capabilities with POA’s turbulence-driven refinement, leading to superior workload distribution, faster convergence, and reduced computational overhead. This section discusses the reasons behind the performance of the hybrid BEO–POA with RL and how the key contributions of this paper directly contribute to its effectiveness. A critical factor in the superiority of hybrid BEO–POA with RL is its ability to maintain an optimal balance between exploration and exploitation. Many traditional optimization techniques, such as PSO, ACO, and BSO-PSO, struggle with either premature convergence or slow adaptation, leading to suboptimal resource allocation. The hybrid approach addresses these limitations by
Leveraging BEO’s global search capabilities to ensure a broad exploration of the solution space, reducing the risk of stagnation in local optima.
Utilizing POA’s adaptive refinement strategies to fine-tune solutions, ensuring rapid convergence while maintaining solution diversity.
Implementing a dynamic switching mechanism between BEO and POA based on workload variations, improving adaptability in dynamic cloud environments.
These enhancements enable the proposed hybrid model to outperform standalone metaheuristic methods, achieving faster, more stable convergence.
One of the most important contributions of this research is the energy-efficient task allocation strategy integrated into Hybrid BEO–POA with RL. Traditional methods, such as Round Robin, Least Connection, and Weighted Load Balancer, lack awareness of energy constraints, often leading to inefficient resource utilization. In contrast, the proposed approach
Prioritizes VM selection based on energy efficiency, ensuring that workloads are allocated to VMs with lower idle power consumption.
Incorporates an adaptive migration mechanism in BEO, dynamically redistributing tasks to balance workload while minimizing power usage.
Uses POA’s turbulence-based optimization to refine task allocation, reducing unnecessary energy consumption.
As a result, Hybrid BEO–POA with RL achieves up to a 30% reduction in energy consumption compared to existing load-balancing techniques.
The makespan, which represents the total execution time of all tasks, is a crucial metric in cloud computing. Many traditional methods, such as PSO, ACO, and BEO, suffer from inefficient task distribution, resulting in longer completion times. The hybrid approach effectively reduces the makespan by
Distributing workloads dynamically based on real-time system conditions.
Accelerating convergence through BEO’s efficient global search and POA’s local refinement, leading to optimal VM selection.
Minimizing task waiting times by implementing a load-aware scheduling mechanism.
Empirical results indicate that Hybrid BEO–POA with RL reduces the makespan by 45% compared to baseline methods. Additionally, the system exhibits lower response time, enabling faster task execution and an improved user experience.
Another key contribution of this research is improving resource utilization and load balancing. Traditional load-balancing techniques often result in imbalanced VM usage, with some resources overutilized while others remain idle. The hybrid BEO–POA with RL addresses these inefficiencies through
A load imbalance reduction mechanism that dynamically redistributes tasks based on real-time system load.
Adaptive task scheduling that ensures VMs operate at optimal capacity, preventing underutilization or overload.
Statistical validation using ANOVA and Wilcoxon tests, confirming that the hybrid approach maintains significantly lower LIF than SOTA methods.
This leads to a 20% increase in resource utilization, making cloud resource allocation more efficient.
Scalability is crucial in modern cloud computing, where task and user counts constantly fluctuate. Many existing load-balancing methods struggle to handle increasing workloads, leading to degraded performance. The proposed hybrid BEO–POA with RL ensures higher throughput by
Implementing an adaptive role-switching strategy that dynamically adjusts between BEO and POA based on workload intensity.
Optimizing task-to-VM mapping through a combined global and local search approach.
Ensuring robust performance even under high workload conditions, maintaining an optimal execution rate.
To reinforce the credibility of the results, statistical tests were conducted to compare Hybrid BEO–POA with RL against alternative methods. The Wilcoxon signed-rank test confirmed that the hybrid model significantly outperforms BEO, POA, PSO, ACO, and MS-BWO across multiple performance metrics (p-values < 0.005). Furthermore, an ANOVA test revealed a highly significant F-value of 8.35 (p < 0.001), indicating that the improvements are statistically significant and not due to random variation.
The key contributions of this paper directly contribute to the observed performance gains. These contributions include
The development of a novel hybrid optimization approach (BEO–POA) that balances exploration and exploitation efficiently.
An energy-aware task scheduling strategy that minimizes power consumption without compromising performance.
A dynamic load-balancing mechanism that optimally distributes workloads, preventing bottlenecks.
Adaptive migration and turbulence-based refinement techniques that accelerate convergence and enhance scalability.
Comprehensive statistical validation, ensuring that the proposed method’s superiority is robust and reliable.
By integrating these enhancements, hybrid BEO–POA with RL successfully overcomes the limitations of existing methods, making it a highly effective solution for modern cloud computing environments.
8. Limitations and Future Work
Although the proposed RL-guided hybrid BEO–POA demonstrates significant improvements in energy efficiency, makespan reduction, and resource utilization, several limitations must be acknowledged to ensure a balanced interpretation of the findings. These limitations mainly stem from the characteristics of the experimental setup, the simulation environment, and the scope of the study.
First, the experimental evaluation relies solely on the CloudSim simulation framework. While CloudSim provides a robust and widely accepted environment for modelling data centres, virtual machines, and scheduling policies, it inherently represents an idealized view of cloud infrastructures. Network-level phenomena such as latency fluctuations, congestion, dynamic bandwidth variations, and live-migration delays are abstracted or simplified. Consequently, the reported results capture compute-level performance—CPU allocation, power usage, and load distribution—without fully accounting for network-induced variability. In large-scale distributed systems, especially in hybrid cloud–edge or geographically dispersed data centres, these network effects can significantly influence the overall QoS and energy performance. Hence, the conclusions drawn in this work should be interpreted as indicative of algorithmic potential rather than absolute real-world performance.
Despite this limitation, CloudSim was deliberately selected because it remains the de facto benchmark in energy-aware scheduling and load-balancing research. It enables reproducible experimentation, parameter control, and direct comparison with prior works such as PSO–ACO, BSO–PSO, and MS–BWO, all of which employed CloudSim-based configurations. This methodological consistency ensures that this paper’s comparative analysis is fair and scientifically valid. Nevertheless, it is essential to recognize that real-world cloud ecosystems exhibit greater heterogeneity, asynchronous workloads, and stochastic network events that CloudSim’s deterministic models cannot fully capture.
A second limitation lies in the abstraction of the power model itself. The linear energy–utilization relationship adopted from CloudSim’s PowerModelSimple simplifies the complex non-linear behaviour of modern processors, cooling systems, and power supply units. In practice, energy consumption depends on multiple dynamic factors, including thermal management strategies, voltage–frequency scaling, and data-centre cooling efficiency. While the linear model facilitates comparative evaluation, future work should incorporate empirically calibrated or non-linear power models to capture these dynamics more accurately.
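For reference, the linear energy–utilization relationship discussed above is commonly written as P(u) = P_idle + (P_max - P_idle) * u. The sketch below applies this form over fixed scheduling intervals; the idle and peak wattages are illustrative and not taken from the experimental configuration.

```python
# Sketch of a linear power model: power scales linearly between an idle and a
# maximum value with CPU utilization. The 170 W / 250 W figures are illustrative.

def linear_power(utilization, idle_watts=170.0, max_watts=250.0):
    utilization = min(max(utilization, 0.0), 1.0)
    return idle_watts + (max_watts - idle_watts) * utilization

def energy_joules(utilization_trace, interval_s=300):
    # integrate power over fixed-length scheduling intervals
    return sum(linear_power(u) * interval_s for u in utilization_trace)

print(linear_power(0.5))                       # 210.0 W
print(energy_joules([0.2, 0.8, 0.6]) / 3.6e6)  # ~0.053 kWh over 15 min
```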
Another limitation is the absence of real-time network feedback and delay-sensitive applications in the simulation environment. In contemporary cloud infrastructures, service response time depends not only on computational scheduling but also on the underlying communication fabric. The current study does not explicitly model multi-tier routing delays or inter-data-centre communication costs, which may become critical in latency-sensitive contexts such as online gaming, telemedicine, or financial trading. Extending the evaluation to frameworks such as EdgeCloudSim and iFogSim would enable more realistic modelling of bandwidth fluctuations, queuing delays, and migration overheads between cloud and fog nodes. Such environments would also allow the investigation of how the RL-guided hybrid optimizer adapts to volatile edge conditions and heterogeneous resource constraints.
Furthermore, the Reinforcement Learning controller implemented in this study employs a Q-learning mechanism with discrete state–action spaces and manually defined reward weights. Although this configuration proved effective for dynamic parameter tuning, it restricts scalability when the state space grows or when continuous control is required. Integrating advanced deep reinforcement learning methods, such as Deep Q-Networks (DQNs) or Proximal Policy Optimization (PPO), could enhance adaptability and decision granularity. These approaches would enable the agent to learn complex correlations among workload patterns, energy states, and performance metrics, thereby improving responsiveness to non-stationary cloud environments.
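To make the controller's mechanics concrete, the sketch below shows a tabular Q-learning update with an epsilon-greedy policy and a hand-weighted reward combining normalized energy and makespan. The state and action encodings, hyperparameters, and reward weights are placeholders and do not reproduce the exact design used in this study.

```python
# Minimal tabular Q-learning sketch; encodings and weights are placeholders.

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2
ACTIONS = ["increase_exploration", "increase_exploitation", "keep"]
Q = defaultdict(float)  # key: (state, action)

def select_action(state):
    if random.random() < EPSILON:
        return random.choice(ACTIONS)                     # explore
    return max(ACTIONS, key=lambda a: Q[(state, a)])      # exploit

def update(state, action, reward_value, next_state):
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward_value + GAMMA * best_next
                                   - Q[(state, action)])

def reward(energy_norm, makespan_norm, w_e=0.5, w_m=0.5):
    # hand-tuned weights combining normalized energy and makespan (assumed)
    return -(w_e * energy_norm + w_m * makespan_norm)

s, a = "high_load", "increase_exploitation"
update(s, a, reward(0.4, 0.6), "medium_load")
print(Q[(s, a)])  # updated value estimate for the visited state-action pair
```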
The current evaluation also assumes homogeneous communication reliability and omits the effects of potential system faults or virtual machine failures. Real deployments may experience transient outages, storage bottlenecks, or migration interruptions, which can affect energy efficiency and load distribution. Incorporating fault-tolerance mechanisms or stochastic reliability models would strengthen the proposed approach’s robustness and provide further insight into its resilience under real operational conditions.
The study employed synthetic workloads with controlled computational intensity and a uniform random distribution to achieve experimental diversity. While this design supports consistent benchmarking, it may not fully reflect the workload burstiness or multi-tenancy behaviors observed in production data centres. Future investigations should consider workload traces derived from real applications or publicly available datasets (e.g., Google Cluster Data or Azure Traces) to validate the generalizability of the proposed algorithm under realistic workload dynamics.
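As an illustration of the kind of synthetic, uniformly distributed workload described above, the snippet below generates cloudlet-like task descriptors with uniformly random lengths. The ranges and field names are hypothetical and do not reproduce the paper's exact configuration.

```python
# Illustrative synthetic workload generator with uniformly random task lengths.
# Ranges and field names are placeholders, not the experimental settings.

import random

def synthetic_cloudlets(n, length_range=(1_000, 20_000), seed=42):
    rng = random.Random(seed)
    return [
        {"id": i,
         "length_mi": rng.randint(*length_range),  # million instructions
         "pes": 1}                                 # required cores
        for i in range(n)
    ]

workload = synthetic_cloudlets(5)
print(workload[0])
```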
Looking ahead, several research directions naturally emerge from these limitations. First, extending the RL-guided hybrid BEO–POA framework to EdgeCloudSim or iFogSim will allow exploration of the algorithm’s performance in distributed cloud–edge hierarchies where communication latency, link variability, and fog-to-cloud migrations play decisive roles. Second, deploying the algorithm on small-scale experimental testbeds such as OpenStack, Kubernetes, or Eucalyptus will enable empirical measurement of execution delay, energy cost, and scalability in heterogeneous hardware environments. Third, integrating non-linear and temperature-aware energy models would yield more realistic assessments of sustainability benefits. Finally, coupling the RL controller with deep learning architectures could evolve the system into a self-optimizing load balancer capable of continuous adaptation in dynamic, multi-cloud contexts.
Although the present evaluation provides compelling evidence of the algorithm’s efficiency and adaptability, its conclusions are bounded by the abstractions inherent to the simulation environment. The proposed enhancements—including cross-platform validation, improved energy modelling, and deep RL integration—constitute promising avenues for future research that will further substantiate the practicality and robustness of the RL-guided hybrid BEO–POA load-balancing framework in real-world cloud and edge computing ecosystems.
Although the proposed framework demonstrates promising results in the CloudSim environment, it has an inherent limitation: it abstracts away network-level dynamics. CloudSim primarily focuses on compute resource allocation and task scheduling, while factors such as bandwidth variability, packet delay, and communication overhead are largely ignored. This abstraction simplifies experimentation but restricts the evaluation of network-aware behaviors that are crucial in realistic cloud–edge or fog computing environments. In future work, we intend to extend the implementation to a more comprehensive simulator such as EdgeCloudSim, which explicitly models the impact of network latency, transmission cost, and user mobility. Integrating these parameters would enable a more holistic performance assessment under dynamic and heterogeneous network conditions. We anticipate that the RL controller would remain robust in such environments, as its policy architecture can naturally accommodate additional state variables representing network delay or bandwidth utilization. However, introducing network-level parameters is expected to increase the state-space dimensionality and may lengthen training convergence. Even so, the controller’s adaptive exploration and reward mechanisms are designed to balance competing objectives, such as latency minimization and energy efficiency, suggesting that its decision-making capability would generalize well to latency-aware and bandwidth-constrained scenarios. Confirming this expectation will require evaluating the proposed hybrid RL-based load-balancing method in more complex, real-world network environments.
9. Conclusions
Energy-efficient load balancing in cloud computing remains a crucial research area due to the increasing demand for computational resources and growing sustainability concerns. This study introduced a novel hybrid BEO–POA with RL optimization algorithm that integrates BEO’s global search with POA’s local refinement. The goal was to minimize energy consumption while ensuring optimal resource allocation in large-scale cloud data centres. The experimental evaluation demonstrated that the proposed hybrid approach significantly reduces energy consumption, optimizes resource utilization, and enhances system performance. Specifically, hybrid BEO–POA with RL achieved a 30% energy efficiency improvement, reduced response time by 45%, and maintained a higher throughput rate than conventional load-balancing strategies. These improvements are attributed to the adaptive switching between exploration and exploitation phases, which dynamically optimizes task assignments in response to workload fluctuations. Furthermore, statistical validation using the Wilcoxon signed-rank test and ANOVA confirmed the superiority of the hybrid BEO–POA with RL method over existing algorithms such as PSO–ACO, BSO–PSO, and MS–BWO. The hybrid algorithm’s ability to maintain a low LIF and ensure high QoS levels makes it a promising solution for real-world cloud computing applications.