1. Introduction
Suspension systems are essential for vehicle stability, ride comfort, and road handling, as they absorb road-induced vibrations and shocks [1]. Hydro-pneumatic struts (HPS) are increasingly being explored for vehicle suspensions due to their ability to provide both elastic and damping effects within a compact design [2]. In recent years, compact hydro-pneumatic struts employing a floating piston have been explored for controlling road-induced vehicle vibration. The floating piston separates the gas and oil media, which introduces drawbacks such as excessive friction, leading to higher heat generation and poor vibration isolation [3]. A recent study proposed an alternate design, the gas–oil emulsion hydro-pneumatic strut (GOES), to address these limitations. This design is notably simple, as it permits a gas–oil emulsion and thereby nearly eliminates the friction effect, although it introduces considerable challenges associated with the properties and flow of the emulsion. Unlike the earlier compact HPS, the GOES eliminates the need for a floating piston, reducing friction and heat generation, which enhances the durability and efficiency of the system [1]. Despite its simple design, however, the GOES exhibits notable performance limitations attributable to its fixed flow areas [2].
In recent years, semi-active and active strategies have been applied to achieve variable flow areas and damping properties [4]. While active suspensions can offer superior ride comfort and handling performance, their implementation has been limited by high energy consumption, complexity, and maintenance costs. Semi-active strategies, alternatively, offer an attractive compromise between performance and energy demand. Unlike passive suspensions, semi-active suspensions can adapt their damping characteristics in real time using electronically controlled flow valves, such as solenoid valves [5,6], or controllable magnetorheological (MR) fluids [7,8,9]. Among semi-active configurations, systems incorporating adjustable valves are considered particularly attractive for their cost-effective and compact design, in addition to real-time controllability. Many studies have established the superior performance of semi-active suspensions employing Skyhook control [10,11], where the damping force is regulated based on the motions of both the sprung and unsprung masses. These control strategies, however, consider the damping force as a function of relative velocity alone. The damping force developed by a GOES, by contrast, strongly depends on the displacement and frequency in addition to the velocity. The development of semi-active strategies accounting for strut deflection as well as frequency is thus vital for effective semi-active vehicle suspension design. Given these additional dependencies, existing semi-active strategies may not be directly applicable, highlighting the need for a control approach specific to the GOES.
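The two-state Skyhook logic referenced above can be sketched as follows; the damping coefficients `c_min` and `c_max` are illustrative placeholders, not values from the cited studies.

```python
def skyhook_damping(v_sprung, v_rel, c_min=300.0, c_max=3000.0):
    """Two-state Skyhook: command high damping when the sprung-mass
    velocity and the relative (strut) velocity share the same sign,
    i.e. when the damper force can oppose the body motion."""
    return c_max if v_sprung * v_rel > 0 else c_min

def damper_force(v_sprung, v_rel):
    """Velocity-dependent damper force under the Skyhook switching law."""
    return skyhook_damping(v_sprung, v_rel) * v_rel
```

As the text points out, this law treats the force as a function of velocity alone, which is why it transfers poorly to a strut whose force also depends on displacement and frequency.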
Hybrid control methods have also been explored to seek optimal semi-active suspension performance. For instance, Goncalves et al. [12] introduced a hybrid control strategy combining the benefits of the Skyhook and Groundhook approaches to control sprung mass acceleration and suspension travel. Other approaches, such as Displacement-Sensitive Shock Absorbers (DSSAs), adjust the damping characteristics based on piston position, providing distinct soft and hard damping modes to improve ride comfort and road handling [13]. Jadhav et al. extended the DSSA concept by incorporating transient displacement-sensitive flow orifices to enhance adaptability [14]. Additionally, Energy-Flow-Driven (EFD) dampers achieve semi-active control performance by dynamically switching between damping modes based on energy flow analysis. Instead of continuously varying the damping coefficient, the EFD approach switches between predefined control strategies, such as Skyhook at relatively low frequencies and the Acceleration-Driven-Damper (ADD) at higher frequencies, thereby optimizing ride comfort and road handling [15]. While these advancements have improved semi-active suspension performance, they consider the damping force to depend only on relative velocity and switch the damping coefficient between distinct modes. Since the damping force of a GOES also depends on strut travel and frequency, the control strategy needs to consider the relative displacement and frequency in addition to the velocity.
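The frequency-based switching described for EFD-type dampers resembles the classical mixed SH-ADD selector, sketched below under assumed values for the crossover frequency and damping levels (none of these numbers come from the cited works).

```python
import math

def mixed_sh_add(a_s, v_s, v_rel, f_cross=2.0, c_min=300.0, c_max=3000.0):
    """Mixed SH-ADD selector: the sign of (a_s^2 - alpha^2 * v_s^2)
    acts as a frequency discriminator. Below the crossover frequency
    it is negative (Skyhook branch); above, positive (ADD branch)."""
    alpha = 2.0 * math.pi * f_cross
    if a_s**2 - alpha**2 * v_s**2 <= 0:
        return c_max if v_s * v_rel > 0 else c_min      # Skyhook branch
    return c_max if a_s * v_rel > 0 else c_min          # ADD branch
```

The selector needs no explicit frequency estimate: the acceleration-to-velocity amplitude ratio of a harmonic signal already encodes its frequency.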
Among the various optimization-based approaches for solving control problems, deep reinforcement learning (DRL) has emerged as a promising technique. Recent advancements in hardware, particularly GPUs, have enabled real-time processing of large datasets, accelerating developments in a wide range of data-driven algorithms [16,17]. Several studies have investigated reinforcement learning (RL) approaches for semi-active suspension control. For instance, Ernst et al. [18] employed the Fitted Q-Iteration algorithm to design a semi-active suspension controller. This approach operated under the constraint that the damping coefficient could switch only between predefined minimum and maximum values, and achieved performance comparable to that of the Mixed SH-ADD algorithm [18]. Liu et al. [19] proposed a semi-active suspension control strategy using the Deep Deterministic Policy Gradient (DDPG) algorithm in a continuous action space. More recently, Wong et al. [20] proposed an RL-based fault-tolerant control strategy for a magnetorheological semi-active air suspension, using a generalized fuzzy hysteresis model to address nonlinearities and fault dynamics. Their results highlighted the ability of RL controllers to adapt to complex, real-world uncertainties, further motivating the use of learning-based approaches in advanced suspension control.
This study proposes a GOES-based semi-active suspension control strategy using a deep reinforcement learning (DRL) technique. The proposed model, validated against experimental data, demonstrated improved ride comfort by exploiting the unique damping characteristics of the strut. The method permits continuous regulation of the damping force instead of switching between predefined states. The DRL controller regulates the flow area based on the vertical acceleration and velocity of the sprung mass, in addition to the suspension travel. The flow valve opening is discretized into ten levels, ranging from 10% to 100%, which constitute the available actions for the DRL agent. The reward function was selected as the weighted sum of the vertical acceleration and velocity of the sprung mass, with appropriate weighting factors, using a quarter-vehicle simulation platform under random road excitation. The performance of the proposed controller is compared with that of the optimally tuned and the original passive GOES to demonstrate the superiority of the proposed strategy.
2. GOES Model and Experimental Tests
A gas–oil emulsion hydro-pneumatic strut (GOES), schematically illustrated in Figure 1a, is used to develop a semi-active control strategy employing a deep reinforcement learning (DRL) approach. Under dynamic excitations, the nitrogen gas in the upper chamber gradually dissolves into the oil, forming a gas–oil emulsion (GOE). The strut consists of two fluid chambers: (i) the piston-side (p-s) chamber, which contains the emulsion with the nitrogen gas above it, and (ii) the rod-side chamber, which is filled solely with the emulsion. In the original setup, these chambers are connected via two pressure-dependent check valves with an area of Av and two bleed orifices with an area of Ab, as depicted in Figure 1a. The experimental setup, shown in Figure 1b, was designed to measure the performance of the strut under harmonic excitations ranging from 1 to 8 Hz, while the strut temperature was held around 30 °C. The experimental results revealed that the nonlinear behavior of the strut is highly dependent on the excitation conditions: at low velocities, the response is mainly influenced by gas compressibility, while at higher velocities, damping effects dominate, leading to stronger nonlinear characteristics.
To further enhance controllability and adaptability, a new configuration was introduced by replacing the traditional check valves and bleed orifices with two inner solenoid valves, as shown in Figure 1c. These adaptive solenoid valves permit real-time modulation of the flow area, enabling the development of a semi-active damping strategy tailored to the unique characteristics of the GOES. This modification allows more precise control of the damping properties, overcoming the limitations of passive damping systems.
The analytical modeling of the strut is based on a previous study [2], incorporating flow continuity, pressure equilibrium, and the polytropic behavior of real gas, modeled using the van der Waals equation. To simplify the model, gas mass transfer between the chambers and the effects of fluid inertia are neglected, as they represent the primary sources of uncertainty. Within the charge pressure range considered in this study, the effect of these assumptions is negligible; if the strut operates under significantly higher pressures, however, neglecting these effects may result in slight deviations in the predicted response. The total force exerted by the strut is given by

F_s = P_c A_p − P_r A_r − F_f

where P_c and P_r are the fluid pressures in the piston- and rod-side chambers, respectively, as seen in Figure 1a, and A_p and A_r are the effective areas (in cm²) of the piston and rod, respectively. The total friction force F_f accounts for fluid and dust seal effects, as well as guiding ring friction between the piston and cylinder. A modified pressure-dependent LuGre model is used to describe the friction behavior [1,2]. It is worth noting that the effects of temperature and deformation are not included in this model. The nonlinear dynamic model of the GOES used in this study has been experimentally validated in our previous studies [21].
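A minimal sketch of two model ingredients named above, assuming a force balance of the form F = Pc·Ap − Pr·Ar − Ff and using the standard van der Waals constants for nitrogen; the pressures and areas in the example are hypothetical, not the strut's actual parameters.

```python
R_GAS = 8.314                   # universal gas constant, J/(mol K)
A_N2, B_N2 = 0.1370, 3.87e-5    # van der Waals constants for nitrogen (SI units)

def vdw_pressure(n_mol, volume, temp):
    """Gas pressure from the van der Waals equation of state:
    P = n*R*T / (V - n*b) - a*n^2 / V^2."""
    return (n_mol * R_GAS * temp / (volume - n_mol * B_N2)
            - A_N2 * n_mol**2 / volume**2)

def strut_force(p_piston_side, p_rod_side, area_piston, area_rod, f_friction):
    """Net strut force: chamber pressures acting on the effective piston
    and rod areas (SI units here), minus the seal/guiding-ring friction."""
    return p_piston_side * area_piston - p_rod_side * area_rod - f_friction
```

At near-atmospheric conditions the van der Waals pressure stays within a fraction of a percent of the ideal-gas value; the real-gas correction matters at the strut's megapascal-level charge pressures.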
3. Quarter-Car Model Under Random and Deterministic Road Excitations
The quarter-vehicle model (QVM), illustrated in Figure 2, is used to evaluate the performance of the GOES under random road excitations. The model incorporates the sprung mass m_s, the unsprung mass m_u, and the GOES unit. This study focuses on a 0.68 MPa charge pressure, which corresponds to 1.10 MPa under static equilibrium conditions with the piston at mid-position. The QVM formulation considers the two mass parts of the strut, namely the piston mass and the rod mass. The tire is modeled as a linear spring with stiffness k_t, and the road excitation is denoted as a displacement input z_r. The sprung and unsprung mass displacements are denoted as z_s and z_u, respectively.
The static equilibrium force F_0 is considered to ensure an accurate representation of GOES behavior. This force is derived from the static equilibrium pressure P_0, which is approximately 23% lower than the initial GOES charge pressure [1]. Static equilibrium of the sprung mass gives

m_s g = P_0 A_p

where g is the acceleration due to gravity. Given a constant piston mass of 12 kg, the sprung mass m_s considered in this study is determined to be 197.6 kg. The equations of motion governing the system dynamics include both static and dynamic forces acting on the QVM. For the combined sprung mass, including the piston mass, the governing equation is taken as

m_s z̈_s = F_s − m_s g

For the unsprung mass, including the rod mass, the equation is expressed as

m_u z̈_u = −F_s + k_t (z_r − z_u + δ_0) − m_u g

where δ_0 represents the initial tire deflection, set to 15 mm. The tire stiffness k_t is determined from the applied static load W and the tire deflection:

k_t = W / δ_0

With an unsprung mass m_u of 50 kg, the tire stiffness is calculated as 1.89 × 10⁵ N/m for the 0.68 MPa charge pressure. Furthermore, the values of the constant mass parameters are presented in Table 1.
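The quarter-vehicle relations above can be sketched numerically as follows. The dynamics are written relative to static equilibrium (gravity and static forces cancelled), which is one common implementation choice; the strut force input and the numbers in the test are illustrative rather than the paper's exact parameter bookkeeping.

```python
G = 9.81  # gravitational acceleration, m/s^2

def tire_stiffness(static_load, deflection):
    """Tire stiffness from the static load carried by the tire and the
    initial tire deflection (k_t = W / delta_0)."""
    return static_load / deflection

def qvm_step(state, f_strut_dyn, m_s, m_u, k_t, z_road, dt):
    """One explicit-Euler step of a quarter-vehicle model about static
    equilibrium. state = (z_s, v_s, z_u, v_u); f_strut_dyn is the dynamic
    part of the strut force acting between the two masses."""
    z_s, v_s, z_u, v_u = state
    a_s = f_strut_dyn / m_s                           # sprung mass
    a_u = (-f_strut_dyn + k_t * (z_road - z_u)) / m_u  # unsprung mass + tire
    return (z_s + dt * v_s, v_s + dt * a_s,
            z_u + dt * v_u, v_u + dt * a_u)
```

With the paper's 15 mm deflection, the stated stiffness of 1.89 × 10⁵ N/m corresponds to a static tire load of about 2835 N.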
Random Excitations
Random road excitations are generated in accordance with ISO 8608, which classifies road surface roughness based on its power spectral density (PSD) [22,23,24]. The time-domain representation of road roughness for a vehicle traveling at a constant velocity is described by [23,24]:

ż_r(t) = −2π n_0 v z_r(t) + 2π n_0 √(G_q(n_0) v) w(t)

where n_0 represents the reference spatial frequency, and G_q(n_0) is the PSD of the road surface, which determines the roughness characteristics. The term n denotes the spatial frequency, defining how often road irregularities occur over a given distance; w(t) is a white noise function that follows a normal distribution and serves as a stochastic input to simulate road roughness; and v represents the constant velocity of the vehicle. A constant velocity of 10 m/s is used to simulate a Class B road profile. According to ISO 8608, the PSD value G_q(n_0) for a Class B road is 64 × 10⁻⁶ m²/(cycles/m), with n_0 = 0.1 cycles/m. The generated random road profile, based on these parameters, is illustrated in Figure 3, providing a realistic representation of road-induced disturbances typically encountered in real-world driving conditions.
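A self-contained sketch of the Class B road synthesis, using a common first-order shaping-filter variant of the ISO 8608 time-domain model; the exact discretization and the random seed are assumptions, since the paper does not give them.

```python
import math
import random

def class_b_road(duration=10.0, dt=0.001, v=10.0,
                 gq_n0=64e-6, n0=0.1, seed=0):
    """Synthesize a road height profile z_r(t) from the shaping filter
        dz/dt = -2*pi*n0*v*z + 2*pi*n0*sqrt(Gq(n0)*v) * w(t),
    with w(t) unit Gaussian white noise (scaled by 1/sqrt(dt) per step)."""
    rng = random.Random(seed)
    gain = 2.0 * math.pi * n0 * math.sqrt(gq_n0 * v)
    decay = 2.0 * math.pi * n0 * v
    z, profile = 0.0, []
    for _ in range(int(round(duration / dt))):
        w = rng.gauss(0.0, 1.0) / math.sqrt(dt)
        z += dt * (-decay * z + gain * w)
        profile.append(z)
    return profile
```

With these parameters the synthesized profile shows millimetre-to-centimetre roughness, in line with a Class B surface at 10 m/s.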
5. Semi-Active Control Strategy with Deep Reinforcement Learning
The semi-active suspension control system shown in Figure 1c can be formulated as a Markov Decision Process (MDP). Based on the optimal control framework, the system state is denoted as the MDP state s_t. The controller is assigned to the policy π_θ, and the opening percentage of the valve is defined as the action, which corresponds to the flow valve opening area, ranging from 10% to fully open. This action space represents a compromise between control precision and training efficiency: employing fewer action levels resulted in less precise damping control and consequently reduced ride comfort, while a higher number of actions slightly enhanced performance but increased the training time and introduced convergence instability. The MDP state s_t, the control policy π_θ, and the control action a_t are given by Equations (7)–(9).
Because the controller in this study is designed to enhance ride comfort, the objective function J is formulated from the mean squared values of the vertical acceleration and vertical velocity of the sprung mass, which relate to ride comfort as defined in ISO 2631-1 [27]. The optimizer maximizes the total reward, and the reward function r_t is subsequently determined from Equation (12).
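The reward structure described above reduces to a simple per-step penalty; the weights below are placeholders, as the tuned values are not reproduced in this excerpt.

```python
def step_reward(a_sprung, v_sprung, w_acc=1.0, w_vel=10.0):
    """Negative weighted sum of squared sprung-mass vertical acceleration
    and velocity; the agent maximizes this negated comfort cost."""
    return -(w_acc * a_sprung**2 + w_vel * v_sprung**2)
```

Because both terms are squared, the reward is maximal (zero) only when the sprung mass is perfectly isolated from the road input.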
After several trial-and-error simulations, the reward function was chosen as a weighted sum of the vertical acceleration and velocity, with appropriately tuned weights. Proximal Policy Optimization (PPO) was used to train the control policy within the reinforcement learning framework, and trial-and-error tuning was conducted to ensure convergence stability and optimal performance of the algorithm. The selected hyperparameters are as follows: actor network learning rate = 0.001, critic network learning rate = 0.01, discount factor γ = 0.99, batch size = 128, experience horizon = 512, entropy loss coefficient = 0.01, clip factor = 0.2, number of epochs per update = 3, and GAE factor = 0.95. The Adam optimizer was used, and these values were selected after evaluating training reward stability across multiple runs.
The neural network used to determine the optimal action consisted of three inputs representing the observed states and ten discrete action levels at the output. Training was performed over 1000 episodes, each with a maximum length of 1000 steps (interactions between the agent and the environment), and was terminated upon reaching the maximum number of permissible episodes. The network employed a fully connected structure with two hidden layers of 265 neurons each and an output layer of 10 neurons. The total episode rewards were smoothed using an exponential moving average filter with a window size of 20 steps; this smoothing, together with early stopping on reward convergence, helped prevent overfitting. The ReLU activation function was applied to the hidden layers, while the SoftMax function was applied to the final layer to produce the action probabilities. Both the policy and value networks were updated according to the RL algorithm, with the actions bounded between the lower and upper limits, ranging from ten percent to the fully opened position.
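The discrete action head can be sketched as follows: ten valve-opening levels and a categorical (SoftMax) sampling step over the network's output logits. The logit values and seed are, of course, placeholders for whatever the trained final layer produces.

```python
import math
import random

ACTIONS = [0.1 * (i + 1) for i in range(10)]   # valve opening 10% .. 100%

def sample_action(logits, rng=None):
    """Sample one of the ten valve-opening levels from a SoftMax
    (categorical) distribution over the final-layer logits."""
    rng = rng or random.Random(0)
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]   # numerically stable SoftMax
    total = sum(exps)
    r, acc = rng.random(), 0.0
    for action, e in zip(ACTIONS, exps):
        acc += e / total
        if r <= acc:
            return action
    return ACTIONS[-1]
```

Subtracting the maximum logit before exponentiating keeps the SoftMax stable even for large network outputs.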
The simulations were executed considering road excitations arising from a Class B road profile and a constant longitudinal velocity of 10 m/s. The duration of each training episode was set to 10 s to provide adequate excitation and allow the suspension response to reach steady-state conditions. Figure 4 illustrates the complete policy-learning process of the MDP-based reinforcement learning algorithm used in this study.
6. Results and Discussion
This section presents the main results of the investigation, including the outcome of the multi-objective optimization process and the performance of the semi-active suspension system with the RL-based controller. The effectiveness of each method in enhancing vehicle ride comfort by minimizing the vertical acceleration and velocity of the sprung mass is examined.
Figure 5 depicts the Pareto front generated by NSGA-II, with the mean absolute values of the vertical acceleration and vertical velocity considered as the two objective functions representing ride comfort. Each point on the Pareto front represents a non-dominated solution and illustrates the trade-off between the two objectives. To identify the most balanced solution, the values of both objective functions were normalized over their respective ranges; the normalized values were then summed for each solution, and the solution with the lowest sum was selected as the optimal design. As depicted in Figure 5, the selected optimum lies between the two best values of the objectives, with an acceptable balance. The corresponding design parameters, the check valve and bleed orifice areas, were found to be 0.27 cm² and 0.81 cm², respectively.
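The normalized-sum selection described above amounts to the following post-processing of the NSGA-II front (a generic sketch; the actual front values are those shown in Figure 5).

```python
def pick_balanced(front):
    """Select the balanced solution from a Pareto front: normalize each
    objective to [0, 1] over the front, then take the point with the
    minimum sum of normalized objectives."""
    f1 = [p[0] for p in front]
    f2 = [p[1] for p in front]
    lo1, hi1 = min(f1), max(f1)
    lo2, hi2 = min(f2), max(f2)

    def score(p):
        return (p[0] - lo1) / (hi1 - lo1) + (p[1] - lo2) / (hi2 - lo2)

    return min(front, key=score)
```

The normalization step matters because the two objectives (acceleration in m/s², velocity in m/s) have different scales; summing raw values would bias the choice toward the larger-magnitude objective.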
The training performance of the RL-based controller was assessed by analyzing the total reward obtained over successive episodes. As presented in Figure 6, the reward increases consistently during training and converges to a stable, higher level after approximately 400 episodes. This trend indicates that the RL agent successfully learned an effective damping policy that enhances ride comfort, and the convergence behavior of the reward curve confirms the effectiveness and robustness of the PPO method for semi-active suspension control in this application.
Figure 7 illustrates the time history of the valve opening percentage determined by the RL-based control policy over the ten-second simulation period. The valve opening percentage reflects the damping capability of the semi-active strut: lower percentages correspond to higher damping force capacity, and higher percentages to lower damping capability.
Figure 8 depicts the vertical acceleration responses of the sprung mass for three suspension configurations using the GOES: the original strut, the optimally tuned strut, and the DRL-based semi-active suspension. In the original strut, the bleed orifice and check valve have cross-sectional areas of 0.0707 cm² and 0.041 cm², respectively. To ensure a meaningful evaluation of ride comfort, the vertical acceleration signals were filtered in accordance with the ISO 2631 standard [27] to account for the human body's sensitivity to specific frequency ranges, and the root mean square (RMS) values of the filtered signals were evaluated for each configuration. The figure shows that the original GOES exhibits large peak accelerations, indicating reduced ride comfort. The optimally tuned passive system reduces the acceleration amplitude, showing considerably improved ride comfort. The DRL-based control, however, is superior to both, providing the minimum acceleration values over the entire simulation period. The performance is further quantified by the RMS values of the vertical acceleration: 2.3161 m/s² for the original GOES, 1.1412 m/s² for the optimally tuned passive system, and 0.8301 m/s² for the DRL-based control, as presented in Table 2. According to these RMS values, the DRL-based control improves the vertical acceleration by about 27.2% relative to the optimally tuned passive system. These observations clearly show that the proposed RL-based control approach enhances ride comfort through effective reductions in the vertical acceleration of the sprung mass. The check valve in the strut operates in two pressure modes (Pc > Pr and Pr > Pc), which provides behavior similar to basic semi-active control strategies such as Skyhook and Groundhook; adding the results of those classical methods would not reveal significant performance differences and would clutter the figures, obscuring the comparison between the main cases studied.
Figure 9 presents the vertical velocity response of the sprung mass for the three configurations considered in the acceleration analysis. Compared to the original GOES, the optimal passive system exhibits a slight reduction in velocity fluctuations, suggesting improved ride comfort. The RL-based GOES exhibits a velocity profile closely aligned with the optimal passive response, offering no significant improvement over it. This observation is supported by the RMS values of the vertical velocity: 0.0891 m/s for the original GOES, 0.0825 m/s for the optimal passive system, and 0.0866 m/s for the DRL-based controller. These results indicate that while the optimization process moderately reduces the vertical velocity, the improvement is relatively small; the RL-based GOES achieves similar performance, showing that it is more effective in minimizing acceleration than vertical velocity. The acceleration and velocity of the sprung mass generally conflict, and improving one often degrades the other. The proposed DRL-based controller, however, significantly reduces the acceleration while leaving the velocity almost unchanged.
The acceleration transmissibility depicted in Figure 10 provides valuable insight into the dynamic performance of the suspension system under the three scenarios. To ensure an accurate assessment, the vertical acceleration of the sprung mass in the frequency domain was filtered according to the ISO 2631 standard [27] before calculating the transmissibility. The optimal strut and the RL-based system exhibit the same initial transmissibility of approximately one, verifying the physical consistency of the low-frequency response. A considerable difference, however, appears beyond the initial range: the RL-based model shows a higher peak near the system's natural frequency, slightly above that of the optimally tuned passive case. While this may suggest discomfort near resonance, it is not fully representative of the overall ride comfort, which is assessed over the entire frequency range with particular attention to the frequencies to which the human body is most sensitive. Importantly, in the range where human sensitivity to vibration is highest, generally between 4 and 10 Hz as defined in ISO 2631-1 [27], the RL-based controller provides a considerable advantage: its transmissibility remains consistently lower than that of the optimum passive system. The RL model thus effectively damps the vibrations that most adversely affect ride comfort, improving overall passenger comfort.
Figure 11a,b illustrate the variation in the valve opening percentage as a function of the sprung mass velocity and the relative velocity, in three and two dimensions, respectively. Figure 11b clearly reveals the control behavior learned by the RL agent for the GOES. The pattern closely resembles the behavior prescribed by conventional Skyhook control logic: when the sprung mass velocity and the relative velocity share the same sign, their product is positive, and in such cases the RL policy consistently selects lower valve opening percentages, corresponding to higher damping forces. This indicates the controller's effort to dissipate energy and suppress undesired motion, in line with the Skyhook principle. While the Skyhook method is based on linear damping, the RL controller adjusts to the nonlinear behavior of the GOES: the learned policy captures the main principle of Skyhook, applying damping when the body's motion is likely to cause discomfort, while also accommodating the complex nonlinear behavior of the system. Owing to the strongly nonlinear damping characteristics of the GOES, however, the exact Skyhook logic is not expected to be fully reflected in the results.
Figure 12a,b illustrate the valve opening percentage as a function of the relative velocity and the sprung mass acceleration, in three and two dimensions, respectively. The opening percentages in Figure 12b generally follow the theoretical expectations of the Acceleration-Driven-Damper (ADD) control logic [28], which modulates damping according to the signs of the acceleration and the relative velocity. In the specific case where the sprung mass acceleration is negative and the relative velocity is positive, the RL-based controller commands a higher valve opening percentage, implying a lower damping force. This concurs with the ADD logic, which prescribes low damping in this condition to improve passenger comfort during rebound or compression. Owing to the nonlinear characteristics of the strut, however, the controller is not expected to follow the ADD logic consistently throughout the entire operating range.
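The ADD-consistent quadrant behavior noted above can be written directly as a valve command; the opening levels are illustrative, not the learned policy's values.

```python
def add_valve_opening(a_sprung, v_rel, open_small=0.2, open_large=1.0):
    """ADD-style command: when a_s * v_rel <= 0 (e.g. negative sprung-mass
    acceleration with positive relative velocity), request a large opening,
    i.e. low damping; otherwise a small opening, i.e. high damping."""
    return open_large if a_sprung * v_rel <= 0 else open_small
```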
Figure 13 examines the relationship between the valve opening percentage and the combined influence of the sprung mass velocity and acceleration. Unlike the trends in Figures 11 and 12, the distribution of opening percentages lacks a clear, interpretable pattern. This absence suggests that relying only on the sprung mass velocity and acceleration as state inputs is insufficient for achieving a stable and efficient semi-active control law; most semi-active control strategies rely on relative motions and velocities, as they relate more directly to energy dissipation and damper force generation. The trained RL model, with access to multiple states, clearly does not base its valve opening decisions solely on the velocity and acceleration pair. The semi-active control logics discussed in this paper assume that the damping force depends only on velocity, whereas the GOES exhibits damping behavior that depends on displacement and excitation frequency as well. Only partial similarities between the results of the DRL controller and the traditional control strategies are therefore to be expected.