1. Introduction
The escalating pressures of resource scarcity and environmental degradation have heightened the demand for innovative solutions in energy-intensive industries such as infrastructure and agriculture, where heavy machinery, notably hydraulic excavators (HEs), contributes significantly to global energy consumption and greenhouse gas emissions [1]. HEs, indispensable in these sectors, account for a substantial share of operational energy use, making them critical targets for advances in energy efficiency and sustainability [2]. Current hybrid architectures, which integrate an internal combustion engine (ICE) with advanced powertrains, aim to enhance the energy efficiency of HEs. For instance, Do et al. [3] proposed an electric hybrid hydraulic excavator that optimizes ICE–hydraulic pump synergy, improving efficiency across diverse operating conditions. However, this approach struggles with torque management under high loads because of the limited capacity of the hydraulic pump. Similarly, Nguyen et al. [4] introduced a hydrostatic transmission paired with a planetary gear system, expanding the ICE's operating range and enhancing adaptability, yet it falls short of delivering the precise actuator control essential for operational accuracy and safety. These limitations, namely insufficient adaptability to dynamic conditions and high computational complexity, highlight the need for novel control strategies that effectively balance energy efficiency and robust performance in challenging HE scenarios.
Recent advances in machine learning and metaheuristic optimization have spurred the development of intelligent control systems to tackle these challenges. Within the last five years, significant contributions include those of Ben Hazem et al. [5], who optimized an adaptive neuro-fuzzy inference system (ANFIS) using an enhanced particle swarm optimization (PSO) algorithm for improved actuator precision, and Song et al. [6], who employed a hybrid adaptive quantum PSO (HAQPSO) to tune PID parameters, achieving notable accuracy gains. Nguyen et al. [7] advanced this field with a PSO-LSTM-GRU-PID model, integrating PSO with long short-term memory (LSTM) and gated recurrent unit (GRU) networks to achieve positional errors below 3 mm and energy savings of up to 10.89%. However, its computational complexity hinders real-time applicability under rapidly shifting conditions. Likewise, Xuanyu et al. [8] proposed a beluga whale optimization–long short-term memory–gated recurrent unit (BWO-LSTM-GRU) framework for dynamic trajectory prediction, yet its reliance on GRU- and PID-like elements limits adaptability and increases processing demands, reducing its effectiveness in high-stakes hydraulic environments.
Despite these strides, existing methods often fail to reconcile precision, adaptability, and computational efficiency. The PSO-LSTM-GRU-PID model, for instance, combines particle swarm optimization, LSTM and GRU networks, and a PID controller in a single, highly complex architecture. This complexity increases the computational burden, making real-time processing challenging, and complicates hyperparameter tuning. As a result, performance may be less robust under the pronounced nonlinearity, time delays, and noise typical of hydraulic environments. Similarly, the BWO-LSTM-GRU framework, while improving tracking accuracy, lacks the flexibility to handle abrupt operational shifts because of its dependence on GRU components.
To address these shortcomings, this study proposes the beluga whale optimization–long short-term memory–random forest (BWO-LSTM-RF) framework. By replacing the GRU and traditional PID elements with a random forest (RF) module, this approach reduces computational complexity while enhancing adaptability to dynamic behavior. With BWO optimizing four key LSTM-RF hyperparameters, the proposed model keeps boom cylinder trajectory errors below 3 mm and achieves energy savings of 12.46%, 8.98%, and 3.97% relative to conventional PID, LSTM-RF, and PSO-LSTM-GRU-PID systems, respectively. Validated through AMESim simulation and experimental trials on a physical platform, this model-free design minimizes processing demands, offering a scalable, efficient solution for hybrid hydraulic excavators (HHEs).
Before the proposed BWO-LSTM-RF framework is detailed, the system setup and its operational dynamics are described in the next section. The remainder of this paper is organized as follows:
Section 2 outlines the system setup,
Section 3 details the hybrid algorithmic control principles,
Section 4 evaluates performance through simulation and experimental analysis,
Section 5 examines economic implications, and
Section 6 concludes with key findings and future directions.
3. Core Principles of Hybrid Algorithmic Control
To improve both positional accuracy and power efficiency, this study employs the BWO-LSTM-RF controller to orchestrate the interplay of the system's key components. Specifically, cylinder positioning is refined through precise adjustments of the control valve's aperture, while the HST optimizes energy usage by regulating system dynamics. This synergistic approach, coordinated by the BWO-LSTM-RF framework, is depicted in Figure 2, and its operational details are elaborated in the following subsections. Within this schematic, the reference input represents the desired command (for instance, a specified position or speed), while the measured output provides feedback so that the system's response closely follows the reference and any deviation is minimized.
3.1. Beluga Whale Optimization (BWO) Algorithms
Zhong et al. [13] introduced the beluga whale optimization (BWO) algorithm, which mimics the hunting patterns of beluga whale groups along with the phenomenon of whale fall. This algorithm is primarily structured into three phases: exploration, exploitation, and whale fall, with the detailed procedures outlined as follows:
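Step 1: Define the position matrix for the beluga whales in the search space:
$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,d} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,d} \end{bmatrix}$$
where $n$ represents the population size of beluga whales and $d$ indicates the dimensionality of the variables. The associated fitness value for each beluga whale is as follows:
$$F_X = \begin{bmatrix} f(x_{1,1}, x_{1,2}, \ldots, x_{1,d}) \\ f(x_{2,1}, x_{2,2}, \ldots, x_{2,d}) \\ \vdots \\ f(x_{n,1}, x_{n,2}, \ldots, x_{n,d}) \end{bmatrix}$$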
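As a minimal sketch of Step 1, the snippet below builds the $n \times d$ position matrix and its fitness vector in plain Python. Drawing the initial positions uniformly between hypothetical bounds `lb` and `ub` is an assumption (a common choice that the text does not state), and `sphere` is a hypothetical placeholder objective standing in for the controller's actual cost function.

```python
import random


def init_population(n, d, lb, ub, seed=None):
    """Return an n x d position matrix X; entries drawn uniformly in [lb, ub) (assumed)."""
    rng = random.Random(seed)
    return [[lb + (ub - lb) * rng.random() for _ in range(d)] for _ in range(n)]


def evaluate(population, objective):
    """Return the fitness vector F_X: one objective value per beluga whale (row of X)."""
    return [objective(row) for row in population]


def sphere(x):
    # Hypothetical placeholder objective (sum of squares), not the paper's cost function.
    return sum(v * v for v in x)


X = init_population(n=5, d=4, lb=-1.0, ub=1.0, seed=0)
F_X = evaluate(X, sphere)
print(len(X), len(X[0]), len(F_X))  # 5 4 5
```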
Subsequently, the transition from the exploration phase to the exploitation phase hinges on the balance factor $B_f$, formulated as follows:
$$B_f = B_0\left(1 - \frac{t}{2T}\right)$$
Here, $t$ signifies the current iteration number, $T$ denotes the maximum number of iterations, and $B_0$ is a random value between 0 and 1. A value of $B_f > 0.5$ indicates that the population is in the exploration phase, while $B_f \le 0.5$ signifies the exploitation phase.
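The balance factor can be sketched as follows. Because $B_0 < 1$, the factor at the final iteration is at most $B_0/2 < 0.5$, so late iterations always fall in the exploitation phase; the snippet makes this schedule easy to inspect. The function names are illustrative only.

```python
import random


def balance_factor(t, T, rng=random):
    """B_f = B_0 * (1 - t / (2T)), with B_0 drawn uniformly from [0, 1)."""
    b0 = rng.random()
    return b0 * (1.0 - t / (2.0 * T))


def phase(bf):
    """B_f > 0.5 selects exploration; otherwise the whale exploits."""
    return "exploration" if bf > 0.5 else "exploitation"


rng = random.Random(1)
T = 100
# Share of whales exploring at the start, middle, and end of the run.
for t in (0, 50, 100):
    share = sum(phase(balance_factor(t, T, rng)) == "exploration" for _ in range(10_000)) / 10_000
    print(t, round(share, 2))
```

The printed shares should be close to 0.5, 1/3, and 0, since exploration requires $B_0 > 0.5$ at $t = 0$, $B_0 > 2/3$ at $t = T/2$, and is impossible at $t = T$.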
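Step 2: The exploration phase of the algorithm is modeled on the swimming patterns of beluga whales. The position update equation for beluga whales is as follows:
$$x_{i,j}^{t+1} = \begin{cases} x_{i,p_j}^{t} + \left(x_{r,p_1}^{t} - x_{i,p_j}^{t}\right)(1 + r_1)\sin(2\pi r_2), & j \text{ even} \\ x_{i,p_j}^{t} + \left(x_{r,p_1}^{t} - x_{i,p_j}^{t}\right)(1 + r_1)\cos(2\pi r_2), & j \text{ odd} \end{cases}$$
where $x_{i,j}^{t+1}$ indicates the new position of the $i$-th beluga whale in the $j$-th dimension, $p_j$ ($j = 1, 2, \ldots, d$) represents a random index within the $d$-dimensional space, $x_{i,p_j}^{t}$ denotes the current position of the $i$-th beluga whale in dimension $p_j$, and $x_{r,p_1}^{t}$ is the current position of a randomly selected beluga whale $r$. The random numbers $r_1$ and $r_2$, ranging over $(0, 1)$, enhance the stochastic nature of the update. The terms $\sin(2\pi r_2)$ and $\cos(2\pi r_2)$ reflect the fin orientations of beluga whales in even or odd dimensions, capturing their synchronized or mirrored movements during swimming or diving.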
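A direct transcription of the exploration update is sketched below. Two details are assumptions not fixed by the text: the random indices $p_j$ are drawn as a permutation of the dimensions, and the partner whale $r$ is chosen to differ from $i$. Dimensions are counted from 1, so the sine branch applies to even $j$ and the cosine branch to odd $j$.

```python
import math
import random


def explore(X, i, rng):
    """Exploration (swimming) update for whale i; returns its new position vector."""
    n, d = len(X), len(X[0])
    r = rng.choice([k for k in range(n) if k != i])  # randomly selected partner whale
    p = rng.sample(range(d), d)  # random dimension indices p_1..p_d (assumed permutation)
    r1, r2 = rng.random(), rng.random()
    new_position = []
    for j in range(d):
        base = X[i][p[j]]  # x_{i, p_j}
        pull = (X[r][p[0]] - base) * (1 + r1)  # (x_{r, p_1} - x_{i, p_j}) (1 + r_1)
        # 1-based even dimensions use sin, odd dimensions use cos
        fin = math.sin(2 * math.pi * r2) if (j + 1) % 2 == 0 else math.cos(2 * math.pi * r2)
        new_position.append(base + pull * fin)
    return new_position


rng = random.Random(42)
X = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
print(explore(X, 0, rng))
```

If every whale occupies the same point, the pull term vanishes and the update returns that point unchanged, which is a quick sanity check on the transcription.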
Step 3: The exploitation phase of the algorithm is modeled on the foraging behavior of beluga whales and incorporates the Levy flight strategy to boost convergence, expressed as follows:
$$x_i^{t+1} = r_3\, x_{\text{best}}^{t} - r_4\, x_i^{t} + C_1 \cdot L_F \cdot \left(x_r^{t} - x_i^{t}\right)$$
where $x_{\text{best}}^{t}$ is the optimal position found so far, $r_3$ and $r_4$ are random numbers between 0 and 1, $x_i^{t}$ denotes the current position of the $i$-th beluga whale, and $x_r^{t}$ indicates the position of a randomly selected beluga whale. The parameter $C_1$ measures the intensity of random jumps in the Levy flight.
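The exploitation update can be sketched in the same style. The text does not give a formula for $C_1$; the sketch assumes $C_1 = 2r_4(1 - t/T)$ from the original BWO formulation, and for brevity it takes the Levy step $L_F$ as a precomputed scalar argument `levy` (a simplification of this sketch).

```python
import random


def exploit(X, i, x_best, t, T, levy, rng):
    """Exploitation (foraging) update for whale i with a Levy-flight jump."""
    n = len(X)
    r = rng.choice([k for k in range(n) if k != i])  # randomly selected whale x_r
    r3, r4 = rng.random(), rng.random()
    c1 = 2 * r4 * (1 - t / T)  # jump intensity C_1 (assumed form; shrinks to 0 at t = T)
    return [
        r3 * xb - r4 * xi + c1 * levy * (xr - xi)
        for xb, xi, xr in zip(x_best, X[i], X[r])
    ]


rng = random.Random(7)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
x_best = min(X, key=lambda row: sum(v * v for v in row))  # best whale under a toy objective
print(exploit(X, 0, x_best, t=10, T=100, levy=0.02, rng=rng))
```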
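The Levy flight function $L_F$ is defined as follows:
$$L_F = 0.05 \times \frac{u \times \sigma}{|v|^{1/\beta}}, \qquad \sigma = \left(\frac{\Gamma(1+\beta)\,\sin(\pi\beta/2)}{\Gamma\!\left(\frac{1+\beta}{2}\right)\,\beta\,2^{(\beta-1)/2}}\right)^{1/\beta}$$
where $u$ and $v$ are random numbers following a normal distribution, with $\beta$ set to 1.5.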
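The Levy step can be computed with the standard library alone, as sketched below. Here $u$ and $v$ are assumed to be standard normal draws (the text specifies only a normal distribution), and the scale $\sigma$ evaluates to roughly 0.6966 for $\beta = 1.5$.

```python
import math
import random


def levy_sigma(beta=1.5):
    """Scale factor sigma of the Levy step for exponent beta."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_flight(beta=1.5, rng=random):
    """L_F = 0.05 * u * sigma / |v|^(1/beta), with u, v ~ N(0, 1) (assumed standard normal)."""
    u = rng.gauss(0.0, 1.0)
    v = rng.gauss(0.0, 1.0)
    return 0.05 * u * levy_sigma(beta) / abs(v) ** (1 / beta)


rng = random.Random(0)
print(round(levy_sigma(1.5), 4))  # 0.6966
print([round(levy_flight(rng=rng), 4) for _ in range(5)])
```

Dividing by $|v|^{1/\beta}$ produces occasional large jumps when $v$ is near zero, which is what gives the Levy flight its heavy-tailed step distribution.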
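Step 4: Model the whale-fall phase, which occurs when a beluga whale faces threats, by updating the position equation as follows:
$$x_i^{t+1} = r_5\, x_i^{t} - r_6\, x_r^{t} + r_7\, x_{\text{step}}$$
where $r_5$, $r_6$, and $r_7$ are random numbers between 0 and 1.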
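The whale-fall step size $x_{\text{step}}$ is formulated as follows:
$$x_{\text{step}} = (u_b - l_b)\exp\!\left(-\frac{C_2\, t}{T}\right)$$
where $C_2$ represents the step factor linked to the whale-fall probability and population size, and $u_b$ and $l_b$ are the upper and lower boundaries of the variables.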
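A sketch of the whale-fall update and its step size follows. The text describes $C_2$ only qualitatively; the sketch assumes $C_2 = 2W_f n$ from the original BWO paper and receives the whale-fall probability $W_f$ as an argument `wf`. At $t = 0$ the step spans the full range $u_b - l_b$, and it decays exponentially afterwards.

```python
import math
import random


def whale_fall_step(lb, ub, t, T, c2):
    """x_step = (u_b - l_b) * exp(-C_2 * t / T), computed per dimension."""
    return [(u - l) * math.exp(-c2 * t / T) for l, u in zip(lb, ub)]


def whale_fall(X, i, lb, ub, t, T, wf, rng):
    """Whale-fall update x_i <- r5 * x_i - r6 * x_r + r7 * x_step."""
    n = len(X)
    c2 = 2 * wf * n  # assumed step factor C_2 = 2 * W_f * n
    r = rng.choice([k for k in range(n) if k != i])  # randomly selected whale
    r5, r6, r7 = rng.random(), rng.random(), rng.random()
    step = whale_fall_step(lb, ub, t, T, c2)
    return [r5 * xi - r6 * xr + r7 * s for xi, xr, s in zip(X[i], X[r], step)]


print(whale_fall_step([0.0, 0.0], [2.0, 4.0], t=0, T=100, c2=1.0))  # [2.0, 4.0]
rng = random.Random(3)
X = [[rng.uniform(0, 2), rng.uniform(0, 4)] for _ in range(4)]
print(whale_fall(X, 0, [0.0, 0.0], [2.0, 4.0], t=20, T=100, wf=0.09, rng=rng))
```

No clipping to $[l_b, u_b]$ is applied in this sketch, so the returned position can leave the search bounds.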
The whale-fall probability $W_f$ is given by the following:
$$W_f = 0.1 - \frac{0.05\, t}{T}$$
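The whale-fall probability decreases linearly from 0.1 at the first iteration to 0.05 at the last. The sketch below tabulates this schedule; the trigger rule `bf <= wf` (whale fall is applied when the balance factor does not exceed $W_f$) is taken from the original BWO formulation and is an assumption here, since the text does not state it.

```python
def whale_fall_probability(t, T):
    """W_f = 0.1 - 0.05 * t / T."""
    return 0.1 - 0.05 * t / T


def whale_fall_triggered(bf, wf):
    """Assumed BWO rule: whale fall is applied when B_f <= W_f."""
    return bf <= wf


T = 100
# W_f at t = 0, 25, 50, 75, 100: 0.1, 0.0875, 0.075, 0.0625, 0.05
for t in (0, 25, 50, 75, 100):
    print(t, round(whale_fall_probability(t, T), 4))
```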
3.2. Long Short-Term Memory (LSTM) Algorithm
LSTM [14,15,16] is a notable variant of recurrent neural networks (RNNs) that mitigates the vanishing gradient problem common in conventional RNNs. It accomplishes this by using three primary gates: the forget gate, the input gate, and the output gate. These gates operate in concert to selectively capture and update long-term information, thereby enabling the network to model dependencies across lengthy sequences. In practice, LSTM has been applied effectively to various time-series prediction tasks, including forecasting crude oil prices and estimating the operational lifespan of electronic components such as power MOSFET devices. More specifically, the forget gate employs a logistic (sigmoid) activation to determine the proportion of historical information to retain, where a value of 1 signifies full retention of past information and 0 indicates complete removal.
Figure 3 illustrates the internal structure of an LSTM cell, showcasing how these gates coordinate to manage data flow and preserve long-range dependencies.
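The forget gate within an LSTM cell controls the extent to which past information is retained in the cell state, thus managing the retention of historical data. Together with the input gate and the candidate state, it updates the cell state as follows:
$$f_t = \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right)$$
$$i_t = \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right)$$
$$\tilde{C}_t = \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
The output gate computes its result by combining the prior hidden state with the present input, and the new hidden state is obtained from the updated cell state:
$$o_t = \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right)$$
$$h_t = o_t \odot \tanh(C_t)$$
In this framework, $x_t$ represents the input vector at the current time step, and $\sigma(\cdot)$ denotes the logistic sigmoid function. The weight matrices $W_f$, $W_i$, and $W_o$, along with their corresponding bias terms $b_f$, $b_i$, and $b_o$, are associated with the forget, input, and output gates, respectively. The prior hidden state is denoted by $h_{t-1}$, and $\odot$ denotes element-wise multiplication. The outputs of these gates are denoted as $f_t$, $i_t$, and $o_t$. Moreover, $C_t$ signifies the current cell state, while $\tilde{C}_t$ denotes the candidate state. Lastly, the parameters $W_C$ and $b_C$ govern the integration of new information into the temporary cell memory.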
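The gate computations can be traced numerically with the dependency-free sketch below, which performs one LSTM time step on plain Python lists. The dictionary keys and helper names are hypothetical; each weight matrix acts on the concatenation $[h_{t-1}, x_t]$, following the standard LSTM gate equations.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W v + b for a matrix W (list of rows), bias b, and vector v."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step; p maps 'W_f', 'b_f', ... to weights acting on [h_{t-1}, x_t]."""
    z = h_prev + x_t  # list concatenation [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(p["W_f"], p["b_f"], z)]  # forget gate
    i = [sigmoid(a) for a in affine(p["W_i"], p["b_i"], z)]  # input gate
    c_tilde = [math.tanh(a) for a in affine(p["W_C"], p["b_C"], z)]  # candidate state
    o = [sigmoid(a) for a in affine(p["W_o"], p["b_o"], z)]  # output gate
    c = [ft * cp + it * ct for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]  # new cell state
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]  # new hidden state
    return h, c


# Hidden size 1, input size 1, all weights and biases zero (illustrative only):
zero = {k: [[0.0, 0.0]] for k in ("W_f", "W_i", "W_C", "W_o")}
zero.update({k: [0.0] for k in ("b_f", "b_i", "b_C", "b_o")})
h, c = lstm_step([0.3], [0.0], [1.0], zero)
print(c, h)  # every gate outputs 0.5, so c = [0.5] and h = [0.5 * tanh(0.5)]
```

With all parameters at zero, the forget gate outputs 0.5 and the candidate state is 0, so the cell simply halves its previous memory; this makes the role of $f_t$ easy to see.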
3.3. Random Forest (RF) Technique
The random forest (RF) is an ensemble learning approach developed by Breiman to address classification problems [17]. It mitigates overfitting by aggregating numerous decision trees, each constructed from a distinct bootstrap sample of the dataset [18]. These trees are grown to their full depth or halted by a termination condition, and at each node a randomly chosen subset of features is used to select the split that reduces impurity, typically assessed via the Gini index:
$$G = 1 - \sum_{k} p_k^{2}$$
where $p_k$ represents the fraction of samples at the node belonging to class $k$. Each tree $b$ produces a predicted class $\hat{C}_b(x)$ for an input $x$, and the final RF prediction is derived through a majority vote across all $B$ tree predictions $\{\hat{C}_b(x)\}_{b=1}^{B}$:
$$\hat{C}_{\mathrm{rf}}^{B}(x) = \text{majority vote}\,\{\hat{C}_b(x)\}_{b=1}^{B}$$
where “majority” refers to the most common class among the classifications from all trees. This method not only curtails overfitting but also enhances generalization, positioning the RF as a robust and dependable tool for classification tasks [19].
Figure 4 depicts the structure of the RF classification framework.
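The two RF ingredients described above, node impurity and the final vote, reduce to a few lines. The sketch below computes the Gini index from a node's class labels and aggregates per-tree predictions; it does not grow trees, and the label values are hypothetical. Ties in the vote are broken by first occurrence, an arbitrary choice of this sketch.

```python
from collections import Counter


def gini(labels):
    """Gini impurity G = 1 - sum_k p_k^2 of the class labels at a node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority_vote(tree_predictions):
    """Most common class among the B tree predictions (ties: first encountered)."""
    return Counter(tree_predictions).most_common(1)[0][0]


print(gini(["extend", "extend", "retract", "retract"]))  # 0.5 (evenly mixed, 2 classes)
print(gini(["extend"] * 4))  # 0.0 (pure node)
print(majority_vote(["extend", "retract", "extend"]))  # extend
```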
3.4. Assessment Metrics for Controller Effectiveness
In this study, we evaluate the performance of the proposed controller using two key metrics: positional accuracy and system energy usage. Positional accuracy is quantified from the cylinder displacement error using the root mean square error (RMSE) and the mean absolute error (MAE). These metrics enable a quantitative comparison between the proposed controller and other controllers, supporting well-founded conclusions about their ability to control the cylinder's position precisely and accurately. The RMSE and MAE are defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m} e_i^{2}}, \qquad \mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m} \lvert e_i \rvert$$
where $m$ denotes the total number of displacement error samples $e_i$ ($i = 1, 2, \ldots, m$).
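Both metrics follow directly from the error samples; the sketch below uses a hypothetical two-sample error trace in millimeters. Because squaring weights large errors more heavily, the RMSE is never smaller than the MAE, which the example illustrates.

```python
import math


def rmse(errors):
    """Root mean square error over m displacement error samples."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(errors):
    """Mean absolute error over m displacement error samples."""
    return sum(abs(e) for e in errors) / len(errors)


e_mm = [3.0, -4.0]  # hypothetical displacement errors (mm)
print(round(rmse(e_mm), 3), mae(e_mm))  # 3.536 3.5
```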
5. Economic Analysis of Proposed Algorithm
The BWO-LSTM-RF framework offers considerable potential to reshape the design and operation of future hydraulic excavators while delivering notable economic and environmental benefits. In terms of design, the framework's ability to achieve positional accuracy below 3 mm and energy savings of up to 12.46% compared with traditional PID controllers could reduce reliance on oversized hydraulic components, which are often used to compensate for control inaccuracies. This advancement may enable manufacturers to develop lighter, more compact excavators, lowering material costs and enhancing mobility without compromising performance. Operationally, the improved control precision and responsiveness demonstrated in both simulation and experimental results (e.g., the RMSE reduced by 9.41%, 34.88%, and 61.34% relative to PSO-LSTM-GRU-PID, LSTM-RF, and PID, respectively, as shown in Figure 13) could enhance productivity. These improvements could minimize downtime and optimize trajectory tracking in dynamic construction environments.
Economically, the energy efficiency gains translate into substantial cost savings for operators. For instance, a 12.46% reduction in energy consumption, as evidenced in Table 11, could save approximately USD 3332 annually per 48 t excavator in fuel costs, particularly in large-scale construction projects where fleets of machines operate for extended periods. This estimate assumes that an excavator operates 10 h per day for 250 days per year, with the load scaled from 800 kg in the experiment to 48 t in real conditions, and uses a diesel price of USD 1.20 per liter as of March 2025. Environmentally, these energy savings contribute to sustainability efforts by reducing greenhouse gas emissions. Under the same operational assumptions, the 12.46% decrease could reduce CO2 emissions by approximately 7.3 tons per machine annually, based on a carbon dioxide emission factor of 2.63 kg per liter of diesel [21].
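The per-machine figures can be cross-checked with simple arithmetic. The sketch below converts the stated USD 3332 annual saving back to diesel volume at USD 1.20 per liter, spreads it over the assumed 10 h × 250 days of operation, and applies the 2.63 kg/L emission factor; it uses only numbers given in the text and reproduces the quoted 7.3 t of CO2.

```python
def per_machine_savings(annual_usd, usd_per_liter, kg_co2_per_liter, hours_per_year):
    """Return (liters saved per year, liters saved per operating hour, tonnes CO2 avoided)."""
    liters = annual_usd / usd_per_liter
    return liters, liters / hours_per_year, liters * kg_co2_per_liter / 1000.0


liters, liters_per_h, co2_t = per_machine_savings(
    annual_usd=3332, usd_per_liter=1.20, kg_co2_per_liter=2.63, hours_per_year=10 * 250
)
print(round(liters), round(liters_per_h, 2), round(co2_t, 2))  # 2777 1.11 7.3
```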
Given the global scale of the construction industry, the proposed framework has significant potential for widespread impact. Based on annual sales data and market trends, the global fleet of hydraulic excavators is estimated to be approximately 6.5 million units in 2025, despite a projected 8% decline in global construction equipment sales to 1.08 million units in that year [22]. With a forecasted recovery in sales to 1.2 million units by 2028, the global fleet is projected to grow to around 6.8 million units by that year [22]. If the BWO-LSTM-RF framework were adopted across this global fleet, the cumulative fuel savings and CO2 reductions could be substantial. For example, applying the per-machine savings to the estimated 6.5 million units in 2025 could result in annual fuel cost savings of approximately USD 21.66 billion and a CO2 emission reduction of around 47.47 million tons, potentially increasing to USD 22.65 billion and 49.66 million tons by 2028. Furthermore, the control precision and energy efficiency improvements offered by the BWO-LSTM-RF framework could be adapted to other machines, such as forklifts, further amplifying its potential to reduce fuel consumption and emissions on a global scale.
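Scaling to the fleet is a multiplication by the number of units, sketched below with the fleet sizes and per-machine inputs from the text. Small last-digit differences from the quoted totals can arise from rounding of the per-machine values; for example, the rounded USD 3332 gives about USD 22.66 billion for 2028.

```python
USD_PER_MACHINE = 3332  # annual fuel-cost saving per 48 t excavator (from the text)
CO2_T_PER_MACHINE = 3332 / 1.20 * 2.63 / 1000.0  # about 7.30 t, derived from the text's inputs


def fleet_totals(units):
    """Return (annual savings in USD billions, CO2 avoided in million tonnes)."""
    return units * USD_PER_MACHINE / 1e9, units * CO2_T_PER_MACHINE / 1e6


for year, units in ((2025, 6.5e6), (2028, 6.8e6)):
    usd_bn, co2_mt = fleet_totals(units)
    print(year, round(usd_bn, 2), round(co2_mt, 2))
```

This prints 21.66 and 47.47 for 2025, and 22.66 and 49.66 for 2028.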