Semi-Active Control of a Two-Phase Fluid Strut Suspension via Deep Reinforcement Learning

Seifi, Abolfazl; Yin, Yuming; Yao, Yumeng; Rakheja, Subhash

doi:10.3390/machines13090854

Open AccessArticle

Semi-Active Control of a Two-Phase Fluid Strut Suspension via Deep Reinforcement Learning

¹

Department of Mechanical, Industrial & Aerospace Engineering, Concordia University, Montreal, QC H3G 2W, Canada

²

School of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310023, China

^*

Author to whom correspondence should be addressed.

Machines 2025, 13(9), 854; https://doi.org/10.3390/machines13090854

Submission received: 21 June 2025 / Revised: 1 September 2025 / Accepted: 12 September 2025 / Published: 16 September 2025

(This article belongs to the Special Issue Semi-Active Vibration Control: Strategies and Applications)

Download

Browse Figures

Versions Notes

Abstract

Gas–oil emulsion struts (GOESs), with their simplified and low-cost design and minimal friction, offer attractive potential for industrial applications. However, they exhibit highly nonlinear damping behavior due to the compressibility of the gas–oil emulsion. This study proposes a semi-active control strategy for modulating the emulsion flow via a dynamically controlled solenoid valve. The GOES is modeled considering pressure-dependent friction and flow characteristics. A reinforcement learning model is further developed to modulate the opening area of the control valve under random road excitations to enhance vibration ride comfort, using a quarter-vehicle model framework. The validated model is used to analyze the strut’s performance under three different scenarios, namely, the original passive, optimal passive, and semi-active. The results suggest that the proposed semi-active strategy could yield a considerably lower root mean square of the sprung mass acceleration for both the passive and optimal systems. It is further shown that real-time adjustment of the control valve could yield nearly 27.2% enhancement in ride comfort performance in comparison to optimal passive GOES.

Keywords:

emulsion; semi-active; solenoid valve; hydropneumatics strut; optimization

1. Introduction

Suspension systems are essential in ensuring vehicle stability, ride comfort, and road handling by absorbing road-induced vibrations and shocks [1]. Hydro-pneumatic struts (HPS) are increasingly being explored for vehicle suspensions due to their ability to provide both elastic and damping effects within a compact design [2]. In recent years, compact hydro-pneumatic struts have been explored with a floating piston for controlling road-induced vehicle vibration. Such struts employ a floating piston to separate the gas and oil media, which introduces drawbacks such as excessive friction, leading to higher heat generation, and poor vibration isolation [3]. A recent study proposed an alternate design to address these limitations. This strut design is considered very simple as it permits gas–oil emulsion and thereby nearly eliminates the friction effect. The design, however, involves highly complex challenges associated with the properties and flow of gas–oil emulsion. Unlike the earlier compact HPS, GOES eliminates the need for a floating piston, reducing friction and dissipating excess heat, which enhances the durability and efficiency of the system [1]. However, despite its simple design, the GOES exhibits notable performance limitations attributed to fixed flow areas [2].

In recent years, semi-active and active strategies have been applied to achieve variable flow areas and damping properties [4]. While active suspensions can offer superior ride comfort and handling performance, their implementations have been limited due to high energy consumption, complexity, and maintenance costs. Alternatively, semi-active strategies offer an attractive compromise between performance and energy demand. Unlike the passive suspensions, the semi-active suspensions can adapt their damping characteristics in real time using electronically controlled flow valves, such as solenoid valves [5,6] or controllable magnetorheological (MR) fluids [7,8,9]. Among semi-active suspension configurations, semi-active systems incorporating adjustable valves are particularly considered attractive for their cost-effective and compact design in addition to real-time controllability. Many studies have established superior performance of semi-active suspensions employing Skyhook control [10,11], where the damping force is regulated based on the motions of both sprung and unsprung masses. These control strategies, however, consider the damping force as a function of relative velocity alone. The damping force developed by a GOES, however, strongly depends on the displacement and frequency, apart from the velocity. The development of semi-active strategies involving strut deflection as well as frequency is thus vital for effective semi-active vehicle suspension design. Given these additional dependencies, existing semi-active strategies may not be directly applicable, highlighting the need for a new control approach specifically for GOES.

Hybrid control methods have also been explored to seek optimal semi-active suspension performance. For instance, Goncalves et al. [12] introduced a hybrid control strategy that combines the benefits of both Skyhook and Groundhook control approaches to control sprung mass acceleration and suspension travel. Other control approaches, such as Displacement-Sensitive Shock Absorbers (DSSAs), have been developed to improve ride comfort and road handling by adjusting damping characteristics based on piston position, allowing for distinct soft and hard damping modes [13]. Jadhav et al. extended the DSSA concept by incorporating transient displacement-sensitive flow orifices to enhance adaptability [14]. Additionally, Energy-Flow-Driven (EFD) dampers have been developed to achieve semi-active control performance by dynamically switching between different damping modes based on energy flow analysis. Instead of continuously varying the damping coefficient, EFD adjusts between predefined control strategies, such as Skyhook, at a relatively low frequency and Acceleration-Driven-Damper (ADD) at higher frequencies, thereby optimizing ride comfort and road handling [15]. While these advancements have improved semi-active suspension performance, they have done so by considering damping force dependence only on relative velocity and by switching the damping coefficient between distinct modes. Since the damping force also depends on strut travel and frequency, the control strategies need to consider both the relative displacement and frequency, apart from the velocity.

Among the various optimization-based approaches for solving control problems, deep reinforcement learning (DRL) has emerged as a promising technique. Recent advancements in hardware technologies, particularly the GPUs, have enabled the real-time processing of large datasets, thereby accelerating developments in a wide range of data-driven algorithms [16,17]. Several studies have investigated the applications of reinforcement learning (RL) approaches in semi-active suspension control. For instance, Ernst et al. [18] employed a reinforcement learning framework to design a semi-active suspension system, utilizing the Fitted Q-Iteration algorithm. This approach operated under the constraint that the damping coefficient could switch only between the predefined maxima and minima values, which resulted in performance comparable to that of the Mixed SH-ADD algorithm [18]. Liu et al. [19] proposed a semi-active suspension control strategy using the Deep Deterministic Policy Gradient (DDPG) algorithm in a continuous action space. Recent work by Wong et al. [20] proposed a reinforcement learning-based fault-tolerant control strategy for a magnetorheological semi-active air suspension system, using a generalized fuzzy hysteresis model to address nonlinearities and fault dynamics. Their results highlighted the ability of RL controllers to adapt to complex, real-world uncertainties, further motivating the use of learning-based approaches in advanced suspension control systems.

This study proposes a GOES-based semi-active suspension control strategy using a deep reinforcement learning (DRL) technique. The proposed model, validated using the experimental data, illustrated improved ride comfort performance by varying unique damping characteristics of the strut. The proposed method permits continuous regulation of the damping force instead of switching between pre-defined states. The DRL controller regulates the flow area based on vertical acceleration and velocity of the sprung mass, in addition to the suspension travel. The flow valve opening is discretized into ten levels, ranging from 10% to 100%, which represent different available actions for the DRL agent. The reward function was selected as the weighted sum of vertical acceleration and velocity of the sprung mass, with appropriate weighting factors using a quarter-vehicle simulation platform, considering a random road excitation. The performance of the proposed GOES is compared with that of the optimally tuned and the original passive GOES to demonstrate the superior performance of the proposed strategy.

2. GOES Model and Experimental Tests

A gas–oil emulsion hydro-pneumatic strut (GOES), schematically illustrated in Figure 1a, is used to develop a semi-active control strategy, employing a deep reinforcement learning (DRL) approach. Under dynamic excitations, the nitrogen gas in the upper chamber tends to gradually dissolve into the oil, forming a gas–oil emulsion (GOE). The strut consists of two fluid chambers: (i) the piston-side (p-s) chamber, which contains the emulsion and the nitrogen gas on top of it, and (ii) the rod-side chamber, which is filled solely with the emulsion. In the original setup, these chambers are connected via two pressure-dependent check valves with an area of A_v and two bleed orifices with an area of A_b, as depicted in Figure 1a. The experimental setup, shown in Figure 1b, was designed to measure the performance of the strut under harmonic excitations ranging from 1 to 8 Hz, while the strut temperature was held around 30 °C. The experimental results revealed that the nonlinear behavior of the strut is highly dependent on the excitation conditions; at low velocities, the response is mainly influenced by gas compressibility, while at higher velocities, the damping effects dominate, leading to stronger nonlinear characteristics.

To further enhance controllability and adaptability, a new configuration was introduced by replacing the traditional check valves and bleed orifices with two inner solenoid valves, as shown in Figure 1c. This adaptive solenoid valve permitted real-time modulation of the flow area, enabling the development of a semi-active damping strategy tailored to the unique characteristics of GOES. This modification allows for more precise control of damping properties, overcoming the limitations of passive damping systems.

The analytical modeling of the strut is based on a previous study [2], incorporating flow continuity, pressure equilibrium, and the polytropic behavior of real gas, modeled using the van der Waals equation. To simplify the model, gas mass transfer between chambers and the effects of fluid inertia are neglected to further simplify the model, as they represent the primary sources of uncertainty. Within the charge pressure range in this study, the effect of the mentioned assumption is negligible. However, if the strut operates under significantly higher pressures, neglecting this effect in the formulation may result in slight deviations in the system response. The total force exerted by the strut is given by

F_{t} = P_{c} A_{c} - P_{r} (A_{c} - A_{r}) + F_{f}

(1)

where

P_{c}

and

P_{r}

are the fluid pressures in the piston- and rod-side chambers, respectively, as seen in Figure 1a. The effective areas of the piston and rod are

A_{c} = 44.18

cm² and

A_{r} = 19.63

cm², respectively. The total friction force

F_{f}

accounts for fluid and dust seal effects, as well as guiding ring friction between the piston and cylinder. A modified pressure-dependent LuGre model is used to describe the friction behavior [1,2]. It is worth noting that the effects of temperature and deformation are not included in this model. The nonlinear dynamic model of the GOES used in this study has been experimentally validated in our previous studies [21].

3. Quarter-Car Model Under Random and Deterministic Road Excitations

The quarter-vehicle model (QVM), illustrated in Figure 2, is used to evaluate the performance of the GOES under random road excitations. This model incorporates the sprung mass

(m_{s})

, unsprung mass

(m_{u})

, and the GOES unit. This study focuses only on a 0.68 MPa charge pressure, which corresponds to 1.10 MPa in static equilibrium conditions when the piston is positioned in the mid-position. The QVM formulation considered two mass parts of the strut, namely, the piston mass and the rod mass. The tire is modeled as a linear spring with stiffness

k_{t}

, and road excitation is denoted as a displacement input

(z_{r o})

. The sprung and unsprung mass displacements are denoted as

z_{s}

and

z_{u}

, respectively.

The static equilibrium force

(F_{0})

is considered to ensure an accurate representation of GOES behavior. This force is derived from the static equilibrium pressure

(P_{0})

, which is approximately 23% lower than the initial GOES charge pressure [1].

(m_{s} g + m_{s 1} g) = P_{c} A_{c} - P_{r} (A_{c} - A_{r}) = P_{0} A_{r} = F_{0}

(2)

where g is the acceleration due to gravity. Given a constant piston mass of 12 kg, the sprung mass

(m_{s})

considered in this study is determined to be 197.6 kg. The equations of motion governing the system dynamics include both static and dynamic forces acting on the QVM. For the combined sprung mass, including the piston mass, the governing equation is taken as

(m_{s} + m_{s 1}) {\ddot{z}}_{s} = F_{t} - (m_{s} g + m_{s 1} g)

(3)

For the unsprung mass, including the rod mass, the equation is expressed as

(m_{u} + m_{s 2}) {\ddot{z}}_{u} = {- F}_{t} - (m_{u} g + m_{s 2} g) - k_{t y} (z_{u} - z_{r o}) + k_{t y} {Δ z}_{0}

(4)

where

{Δ z}_{0}

represents the initial tire deflection, set to 15 mm. The tire stiffness

(k_{t y})

is determined based on the applied load and tire deflection:

k_{t y} = \frac{(m_{s} g + m_{s 1} g + m_{u} g + m_{s 2} g)}{{Δ z}_{0}}

(5)

With an unsprung mass

(m_{u})

of 50 kg, the tire stiffness is calculated as 1.89 × 10⁵ kN/m for the 0.68 MPa charge pressure. Furthermore, the values of constant mass parameters are presented in Table 1.

Random Excitations

Random road excitations are generated in accordance with ISO 8608, which classifies road surface roughness based on power spectral density (PSD) [22,23,24]. The time-domain representation of road roughness for a vehicle traveling at a constant velocity is described by [23,24]:

{\dot{z}}_{r o} (t) + 2 π n_{0} z_{r o} (t) = \sqrt{G_{z} (Ω_{0}) V} ω (t)

(6)

where

Ω_{0}

represents the reference spatial angular frequency, and

G_{z} (Ω_{0})

is the PSD of the road surface, which determines the roughness characteristics. The term

n_{0}

denotes the spatial frequency, defining how often road irregularities occur over a given distance.

ω (t)

is a white noise function that follows a normal distribution and serves as a stochastic input to simulate road roughness.

V

represents the constant velocity of the vehicle. A constant velocity of 10 m/s is used to simulate a Class B road profile. According to ISO 8608, the PSD value

G_{z} (Ω_{0})

for a Class B road is 64 × 10⁻⁶ m²/cycles/m, with

n_{0} = 0.1

cycles/m. The generated random road profile, based on these parameters, is illustrated in Figure 3, providing a realistic representation of road-induced disturbances typically encountered in real-world driving conditions.

4. Multi-Objective Optimization

Non-dominated sorting genetic algorithm II (NSGA-II), considered well-suited for mechanical engineering applications due to its ability to efficiently address complex multi-objective problems, is applied to thoroughly explore the solution space [25,26]. In this study, the algorithm was configured with an initial population size of 50, a crossover rate of 0.7, and a mutation rate of 0.4. The optimization process was terminated after 100 iterations. The objective function was defined as the mean of the absolute values of the vertical acceleration (

{C F}_{1} = \frac{1}{N} \sum_{t} |{\ddot{z}}_{s} (t)|

) and velocity (

{C F}_{2} = \frac{1}{N} \sum_{t} |{\dot{z}}_{s} (t)|

) of the sprung mass. Considering a 1000 Hz sampling rate, the simulation was conducted for 10 s. The cross-sectional areas of the bleed orifice and check valve were selected as the design variables. Their ranges were set broadly, from 0.001 × 10⁻² cm² to 100 × 10⁻² cm², to ensure comprehensive coverage of the design space.

5. Semi-Active Control Strategy with Deep Reinforcement Learning

The semi-active suspension control system shown in Figure 1c can be formulated as a Markov Decision Process (MDP), as described in Equations (7)–(9). Based on the optimal control framework, the system state is now denoted as the MDP state

s_{t}

. The controller

A_{c v} (s_{t})

is assigned to the policy πθ, and the opening percentage of the valve is defined as the action, which corresponds to the flow valve opening area, ranging from 10% to fully open (

A_{c v_m a x} = 70

cm²). This action space represents a compromise between control precision and training efficiency. Employing fewer action levels resulted in less precise damping control and consequently reduced ride comfort. Choosing a higher number of actions slightly enhanced the performance but increased the training time and introduced convergence instability. MPD state

s_{t}

, control policy, and the control action

a_{t}

are given by

s_{t} = x (t) = [\begin{matrix} {\ddot{z}}_{s} (t) & {\dot{z}}_{s} (t) & {\dot{z}}_{s} (t) - {\dot{z}}_{u} \end{matrix} (t)]

(7)

π_{θ} (a_{t} | s_{t}) = A_{c v} (s_{t})

(8)

a_{t} = [\begin{matrix} 0.1 A_{c v_m a x} & \dots \end{matrix} \begin{matrix} 0.9 A_{c v_m a x} & A_{c v_m a x} \end{matrix}]

(9)

Because the controller in this study is designed to enhance vibration ride comfort, the objective function

J_{o p t}

is formulated considering mean squared values of vertical acceleration and vertical velocity of the sprung mass, which relate to ride comfort as defined in ISO-2631-1 [27], such that

J_{o p t} = K_{a c} E ({|{\ddot{z}}_{s}|}^{2}) + K_{v e} E ({|{\dot{z}}_{s}|}^{2}) = K_{a c} E ({|{\ddot{z}}_{s}|}^{2}) + K_{v e} E ({|{\dot{z}}_{s}|}^{2})

(10)

The optimizer, used to maximize the total reward function, is defined by

\begin{matrix} θ^{*} & = a r g m a x E [- J_{o p t}] \\ = a r g m a x E [{- K}_{a c} E ({|{\ddot{z}}_{s}|}^{2}) - K_{v e} E ({|{\dot{z}}_{s}|}^{2})] \\ = a r g m a x E [\sum_{t} r (s_{t}, a_{t})] . \end{matrix}

(11)

The reward function

r (s_{t}, a_{t})

is subsequently determined from Equation (12)

r (s_{t}, a_{t}) = {- K}_{a c} ({|{\ddot{z}}_{s}|}^{2}) - K_{v e} {|{\dot{z}}_{s}|}^{2}

(12)

After several trial-and-error simulations, the reward function was chosen as a weighted sum of vertical acceleration and velocity, with weights

K_{a c} = 1

and

K_{v e} = 300

. PPO (Proximal Policy Optimization) was used to estimate the state-action value function

(Q (s_{t}, a_{t}))

in the framework of reinforcement learning. The trial-and-error tuning method was conducted to ensure convergence stability and optimal performance for the PPO algorithm. The selected hyperparameters are as follows: learning rate for the actor network = 0.001, for the critic network = 0.01, discount factor γ = 0.99, batch size = 128, experience horizon = 512, entropy loss coefficient = 0.01, clip factor = 0.2, number of epochs per update = 3, and GAE factor = 0.95. The Adam optimizer was used, and these values were selected after evaluating training reward stability across multiple runs.

The neural network used to determine the optimal

Q (s_{t}, a_{t})

consisted of three inputs representing observed states and ten discrete action levels. The training was performed over 1000 episodes, each with a maximum length of 1000 steps (interactions between the environment and agent), and it was terminated when it approached the maximum number of permissible episodes. The neural network employed a fully connected structure with three hidden layers of sizes (265, 265, and 10). The total episode rewards were smoothed using an exponential moving average filter with a window size of 20 steps. The smoothing of total rewards and also the early stopping approach on reward convergence helped the algorithm prevent the overfitting issue. The ReLU activation function was subsequently applied to both hidden layers, while the SoftMax function was applied to the final layer to estimate the Q-values. The policy and value networks were both updated according to the RL algorithm. The policy π was represented as a Gaussian distribution and bounded between the lower and higher limiting values of the actions, ranging from ten percent to the fully opened position.

The simulations were executed considering road excitations arising from a Class B road profile and a constant longitudinal velocity of 10 m/s. The duration of each training episode was taken as 10 s to provide adequate excitation, which enabled a steady state condition of the suspension system. Figure 4 illustrates the complete process of policy learning using the MDP-based reinforcement learning algorithm used in this study.

6. Results and Discussion

In this section, the major conclusions of the investigation are presented, including the results of the multi-objective optimization process as well as the performance of the semi-active suspension system with the RL-based controller. The performance of each method in enhancing vehicle ride comfort by minimizing vertical acceleration and velocity is presented. Figure 5 depicts the Pareto front generated by NSGA-II. The mean absolute value of vertical acceleration and the mean absolute value of vertical velocity were considered as the two objective functions, representing ride comfort. Every non-dominated solution is represented by each point on the Pareto front and depicts the conflict between the two goals. To identify the most balanced solution, the values of both objective functions were normalized based on their respective ranges. The normalized values were then summed for each solution, and the solution with the lowest value was selected as the optimal design. As depicted in Figure 5, the selected optimum point is between the two best values of the objectives, with an acceptable balance. The corresponding values of the design parameters for the check valve as well as the bleed orifice were found as 0.27 cm² and 0.81 cm², respectively.

The training performance of the RL-based controller was assessed by analyzing the total reward obtained over successive episodes. The algorithm for RL demonstrates a consistent increase in reward during training and ultimately converges to a stable and higher reward level after approximately 400 episodes, as presented in Figure 6. The trend indicates that the RL agent successfully learned an effective damping policy that enhances ride comfort. The convergence behavior of the reward curves confirms the effectiveness and robustness of the PPO method for semi-active suspension control in this application. Figure 7 illustrates the time history of the valve opening percentage as determined by the RL-based control policy over the ten-second simulation period. The valve opening percentage indicates the damping force capability of the semi-active strut, where lower percentages correspond to higher damping force capacity, and higher percentages indicate lower damping capability.

Figure 8 depicts the vertical acceleration responses of the sprung mass for three suspension conditions using GOESs: the original strut, the optimally tuned strut, and the DRL-based semi-active suspension. In the original strut, the bleed orifice and check valve have cross-sectional areas of 0.0707 cm² and 0.041 cm², respectively. To ensure a meaningful evaluation of ride comfort, the vertical acceleration signals were filtered based on the ISO 2631 standard [27] to account for the human body’s sensitivity to specific frequency ranges. The root mean square (RMS) values of the filtered acceleration signals were then evaluated for each strut condition. It is obvious from the figure that the original GOES exhibits big peak accelerations, indicating decreased ride comfort. The optimally tuned passive system reduces the amplitude of acceleration, thus showing considerably improved ride comfort. However, the DRL-based control is superior for both the original and optimal strut, providing the minimum acceleration values for the entire period of the simulation. The performance is further compared in quantitative terms from the root mean square (RMS) values of the vertical acceleration: 2.3161 m/s² for the original GOES, 1.1412 m/s² for the optimally tuned passive system, and 0.8301 m/s² for the DRL-based control, as presented in Table 2. According to RMS values, the improvement percentage of vertical acceleration for the DRL-based control in comparison to the optimally tuned passive system is about 27.2%. These observations clearly show that the novel RL-based control approach enhances ride comfort through effective reductions in the vertical acceleration of the sprung mass. The check valve in the strut operates in two pressure modes (Pc > Pr and Pr > Pc), which provides behavior similar to basic semi-active control strategies like Skyhook and Groundhook. Adding the results of those classical methods would not show significant performance differences and would clutter the figures, making it harder for readers to clearly understand the comparison between the main cases studied.

Figure 9 presents the vertical velocity response of the sprung mass for three scenarios, mentioned for acceleration. Compared to the original GOES, the optimal passive system exhibits a slight reduction in velocity fluctuations, suggesting improved ride comfort. The RL-based GOES exhibits a velocity profile closely aligned with the optimal passive response; however, it offers no significant improvement over it. This observation is supported by the root mean square (RMS) values of the vertical velocity, which are 0.0891 m/s for the original GOES, 0.0825 m/s for the optimal, and 0.0866 m/s for the DRL-based controller. These results indicate that while the optimization process helps to reduce vertical velocity moderately, the improvement is relatively small. In comparison, the RL-based GOES achieves similar performance, suggesting that it is more effective in minimizing acceleration than vertical velocity. The acceleration and velocity of the sprung mass are generally opposite, and improving one often results in the deterioration of the other. However, the proposed DRL-based controller significantly reduces the acceleration, while the variation in velocity remains almost unchanged.

The acceleration transmissibility depicted in Figure 10 provides valuable information about the dynamic suspension system performance under three different scenarios. To ensure accurate assessment, the vertical acceleration of the sprung mass in the frequency domain was filtered according to the ISO 2631 standard [27] before calculating transmissibility. The optimal strut and the RL-based system have the same initial transmissibility of approximately one, which verifies the physical consistency of the system’s low-frequency performance. Nevertheless, a considerable difference appears after the initial transient. Specifically, the model with the RL-based approach shows a higher peak near the system’s natural frequency, slightly above the one observed in the optimally tuned passive case. While this may indicate discomfort near the system’s resonance, it is not fully representative of the overall ride comfort performance of the strut. Ride comfort is assessed over the entire frequency range, with particular attention to the frequencies to which the human body is most sensitive. Importantly, in the frequency range where human sensitivity to vibration is highest, generally between 4 and 10 Hz as defined by ISO 2631-1 [27], the controller using the RL approach provides a considerable advantage. In this range, the transmissibility of the RL-based controller remains consistently lower than that of the optimum passive system. It stands to reason that the RL model effectively damps the vibrations that have the most adverse effect on ride comfort and therefore improves the overall passenger comfort.

Figure 11a,b illustrate the variation in the valve’s opening percentage as a function of the sprung mass velocity and the relative velocity three-dimensionally and two-dimensionally, respectively. Figure 11b clearly indicates the control behavior learned by the RL agent in the GOES. The pattern shown in this figure closely resembles the behavior prescribed by the conventional Skyhook control logic. Specifically, when both the sprung mass velocity and relative velocity share the same sign, both either positive or negative, the multiplication of the two is positive. In such cases, the RL policy consistently selects lower valve opening percentages, corresponding to higher damping forces. This indicates the controller’s effort to dissipate energy and suppress undesired motion, aligning with Skyhook’s principle. While the Skyhook control method is based on linear damping, the RL controller can adjust to the nonlinear behavior of the GOES. The learned policy captures the main principle of the Skyhook method by applying damping when the body’s motion is likely to cause discomfort, while also addressing the complex nonlinear behavior of the GOES system. However, due to the strongly nonlinear damping characteristics of the GOES, the exact logic of the Skyhook strategy is not expected to be fully reflected in the obtained results.

Figure 12a,b illustrate the valve opening percentage in terms of relative velocity and sprung mass acceleration three-dimensionally and two-dimensionally, respectively. The values of the opening percentage given in Figure 12b generally follow the theoretical expectations of the Acceleration Deviation Damping (ADD) control logic [28], which controls damping according to the signs of the acceleration and the relative velocity. In the specific case when the sprung mass acceleration is negative and the relative velocity is positive, the controller using the RL-based approach provides a higher valve opening percentage, which implies a lower damping force. This concurs with the ADD logic that suggests low damping in this condition to improve the passenger’s comfort during the rebound or compression. However, due to the nonlinear characteristics of the strut, the controller is not expected to consistently follow the ADD logic throughout the entire operating range.

Figure 13 examines the relationship between the valve opening percentage and the combined influence of sprung mass velocity and acceleration. Unlike the trends in Figure 11 and Figure 12, the distribution of opening percentage values lacks a clear, interpretable pattern. The absence of a clear pattern suggests that relying only on sprung mass velocity and acceleration as state inputs is insufficient for achieving a stable and efficient semi-active control law. Most semi-active control strategies rely on relative motions/velocities, as they more directly relate to energy dissipation and damper force generation. The use of the trained RL model with access to multiple states clearly illustrates that the decision on valve opening percentage is not heavily based solely on the pair of velocity and acceleration. The semi-active control logics discussed in this paper are primarily based on the assumption that the damping force depends only on velocity. However, the GOES exhibits damping behavior that depends not only on velocity but also on displacement and excitation frequency. Therefore, it is expected to observe some similarities between the results of the DRL controller and traditional control strategies.

7. Conclusions

This paper introduced a new deep reinforcement learning (DRL)-based control methodology for a semi-active suspension system using a gas–oil emulsion strut (GOES). The proposed approach addresses the nonlinear damping characteristics inherent to GOES, which are influenced by frequency, relative displacement, and relative velocity factors that conventional control methods often fail to capture adequately. By formulating the suspension control problem as a Markov Decision Process and using PPO (Proximal Policy Optimization), the RL agent learned to tune the solenoid valve opening percentage in real time and thus improved the ride comfort during random road excitations. The results showed that the RL controller achieved better vertical acceleration performance than both the passive and optimally tuned passive systems, leading to an overall ride comfort improvement of 27.2%. Although the velocity response showed limited enhancement beyond the optimal passive case, the RL controller exhibited superior performance in the frequency range most critical to human comfort. The incorporation of DRL in semi-active suspension control of GOESs is a promising line of development for intelligent suspension systems, offering adaptive performance guided by the intricate nonlinear nature of the interaction of fluids with elastic structures.

Author Contributions

A.S. was responsible for conceptualization, methodology, theoretical analysis, validation, and writing—original draft. Y.Y. (Yuming Yin) was responsible for investigation, resources, formal analysis, and data collection. Y.Y. (Yumeng Yao) was responsible for review and editing and visualization. S.R. was responsible for conceptualization, supervision, and funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data that support the findings of this study are available from the author [Yuming Yin], upon reasonable request.

Conflicts of Interest

The authors declare that they have no conflict of interest.

References

Yin, Y.; Rakheja, S.; Yang, J.; Boileau, P.-E. Characterization of a hydro-pneumatic suspension strut with gas-oil emulsion. Mech. Syst. Signal Process. 2018, 106, 319–333. [Google Scholar] [CrossRef]
Seifi, A.; Yao, Y.; Yin, Y.; Moore, M.; Rakheja, S. Pressure-Dependent Nonlinearities of a Compact Hydropneumatic Strut with Integrated Gas–Oil Chamber. SAE Int. J. Veh. Dyn. Stab. NVH 2025, 9, 151–170. [Google Scholar] [CrossRef]
Li, R.; Yang, F.; Lin, D. Design, experimental modeling and analysis of compact double-gas-chamber hydro-pneumatic strut. Mech. Syst. Signal Process. 2022, 172, 109015. [Google Scholar] [CrossRef]
Savares, S.M.; Poussot-Vassal, C.; Spelta, C.; Sename, O.; Dugard, L. Semi-Active Suspension Control Design for Vehicles|ScienceDirect; Elsevier: Amsterdam, The Netherlands, 2011; ISBN 978-0-08-096678-6. [Google Scholar]
Eslaminasab, N. Development of a Semi-Active Intelligent Suspension System for Heavy Vehicles; University of Waterloo: Waterloo, ON, Canada, 2008. [Google Scholar]
Ahmed, M.R.; Yusoff, A.R.; Romlay, F.R.M. Adjustable Valve Semi-Active Suspension System for Passenger Car. Int. J. Automot. Mech. Eng. 2019, 16, 6470–6481. [Google Scholar] [CrossRef]
Josee, M.; Sosthene, K.; Pacifique, T. Review of semi-active suspension based on Magneto-rheological damper. Eng. Perspect. 2024, 1, 38–51. [Google Scholar] [CrossRef]
Yao, G.Z.; Yap, F.F.; Chen, G.; Li, W.H.; Yeo, S.H. MR damper and its application for semi-active control of vehicle suspension system. Mechatronics 2002, 12, 963–973. [Google Scholar] [CrossRef]
Lee, J.; Oh, K. Hybrid Damping Mode MR Damper: Development and Experimental Validation with Semi-Active Control. Machines 2025, 13, 435. [Google Scholar] [CrossRef]
Karnopp, D.; Crosby, M.J.; Harwood, R.A. Vibration Control Using Semi-Active Force Generators. J. Eng. Ind. 1974, 96, 619–626. [Google Scholar] [CrossRef]
Wang, F.; Wen, H.; Xie, S. Performance Analysis and Hybrid Control Strategy Research of Vehicle Semi-Active Suspension for Ride Comfort and Handling Stability. Machines 2025, 13, 393. [Google Scholar] [CrossRef]
Goncalves, F.D.; Ahmadian, M. A hybrid control policy for semi-active vehicle suspensions. Shock. Vib. 2003, 10, 59–69. [Google Scholar] [CrossRef]
Lee, C.-T.; Moon, B.-Y. Simulation and experimental validation of vehicle dynamic characteristics for displacement-sensitive shock absorber using fluid-flow modelling. Mech. Syst. Signal Process. 2006, 20, 373–388. [Google Scholar] [CrossRef]
Jadhav, M.; Belkar, S. Analysis of Displacement Sensitive Twin Tube Shock Absorber. Int. J. Eng. Res. Dev. 2012, 5, 31–41. [Google Scholar]
Liu, Y.; Zuo, L. Energy-Flow-Driven (EFD) semi-active suspension control. In Proceedings of the 2014 American Control Conference, Portland, OR, USA, 4–6 June 2014; pp. 2120–2125. [Google Scholar]
Xin, J.; Zhao, H.; Liu, D.; Li, M. Application of deep reinforcement learning in mobile robot path planning. In Proceedings of the 2017 Chinese Automation Congress (CAC), Jinan, China, 20–22 October 2017; pp. 7112–7116. Available online: https://ieeexplore.ieee.org/document/8244061 (accessed on 2 May 2025).
Zhu, H.; Gupta, A.; Rajeswaran, A.; Levine, S.; Kumar, V. Dexterous Manipulation with Deep Reinforcement Learning: Efficient, General, and Low-Cost. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 3651–3657. [Google Scholar]
Ernst, D.; Geurts, P.; Wehenkel, L. Tree-Based Batch Mode Reinforcement Learning. J. Mach. Learn. Res. 2005, 6, 503–556. [Google Scholar]
Liu, M.; Li, Y.; Rong, X.; Zhang, S.; Yin, Y. Semi-Active Suspension Control Based on Deep Reinforcement Learning. IEEE Access 2020, 8, 9978–9986. [Google Scholar] [CrossRef]
Wong, P.K.; Gao, Z.; Zhao, J. Reinforcement Learning-Based Fault-Tolerant Control for Semiactive Air Suspension Based on Generalized Fuzzy Hysteresis Model. IEEE Trans. Fuzzy Syst. 2025, 33, 1803–1814. [Google Scholar] [CrossRef]
Seifi, A.; Yao, Y.; Yin, Y.; Rakheja, S. Frequency-velocity-deflection-dependent damping properties of a hydro-pneumatic strut with gas-oil emulsion. Proc. Inst. Mech. Eng. Part D J. Automob. Eng. 2025, 09544070251338359. Available online: https://scholar.google.com/citations?view_op=view_citation&hl=en&user=UzP8YzoAAAAJ&citation_for_view=UzP8YzoAAAAJ:kNdYIx-mwKoC (accessed on 20 June 2025).
ISO 8608:2016; Mechanical Vibration—Road Surface Profiles—Reporting of Measured Data. International Organization for Standardization: Geneva, Switzerland, 1997. Available online: https://www.iso.org/standard/71202.html (accessed on 8 August 2023).
Zhang, L.J.; Lee, C.M.; Wang, Y.S. A study on nonstationary random vibration of a vehicle in time and frequency domains. Int. J. Automot. Technol. 2002, 3, 101–109. [Google Scholar]
He, L.; Qin, G.; Zhang, Y.; Chen, L. Non-stationary Random Vibration Analysis of Vehicle with Fractional Damping. In Proceedings of the 2008 International Conference on Intelligent Computation Technology and Automation (ICICTA), Changsha, China, 20–22 October 2008; Volume 2, pp. 150–157. [Google Scholar]
Seifi, A.; Hassannejad, R.; Hamed, M.A. Use of nonlinear asymmetrical shock absorbers in multi-objective optimization of the suspension system in a variety of road excitations. Proc. Inst. Mech. Eng. Part K J. Multi-Body Dyn. 2017, 231, 372–387. [Google Scholar] [CrossRef]
Wu, X.; Chen, Z.; Liu, T.; Song, H.; Wang, Z.; Shi, W. Improved NSGA-II and its application in BIW structure optimization. Adv. Mech. Eng. 2023, 15, 1–11. [Google Scholar] [CrossRef]
ISO 2631-1; Mechanical Vibration and Shock—Evaluation of Human Exposure to Whole-Body Vibration—Part 1: General Requirements. International Organization for Standardization: Geneva, Switzerland, 1997.
Savaresi, S.M.; Silani, E.; Bittanti, S. Semi-Active Suspensions: An Optimal Control Strategy for a Quarter-Car Model. IFAC Proc. Vol. 2004, 37, 553–558. [Google Scholar] [CrossRef]

Figure 1. (a) Schematic of the GOES; (b) experimental platform; (c) GOES with adjustable valves.

Figure 2. Schematic of quarter-car model.

Figure 3. Random road profile.

Figure 4. Reinforcement learning framework applied to semi-active suspension systems with GOES.

Figure 5. Pareto front results of NSGA-II.

Figure 6. Total rewards of episodes.

Figure 7. Controller output (opening percentage of valve).

Figure 8. Vertical acceleration of sprung mass.

Figure 9. Vertical velocity of sprung mass.

Figure 10. The transmissibility of acceleration between sprung and unsprung mass.

Figure 11. Opening percentage value of the valve based on the velocity of sprung mass and relative velocity (a) in 3D; (b) 2D.

Figure 12. Opening percentage value of valve based on acceleration of sprung mass and relative velocity (a) in 3D; (b) 2D.

Figure 13. Opening percentage value of valve based on velocity and acceleration of sprung mass (a) in 3D; (b) 2D.

Table 1. The values of mass parameters.

Parameters	m_s (kg)	m_u (kg)	m_s1 (kg)	m_s2 (kg)
value	197.6	50	22.7	19.9

Table 2. RMS values of acceleration.

Strut Type	Passive Strut	Optimal Strut	Strut with DRL
RMS of acceleration (m/s²)	2.3161	1.1412	0.8301

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Seifi, A.; Yin, Y.; Yao, Y.; Rakheja, S. Semi-Active Control of a Two-Phase Fluid Strut Suspension via Deep Reinforcement Learning. Machines 2025, 13, 854. https://doi.org/10.3390/machines13090854

AMA Style

Seifi A, Yin Y, Yao Y, Rakheja S. Semi-Active Control of a Two-Phase Fluid Strut Suspension via Deep Reinforcement Learning. Machines. 2025; 13(9):854. https://doi.org/10.3390/machines13090854

Chicago/Turabian Style

Seifi, Abolfazl, Yuming Yin, Yumeng Yao, and Subhash Rakheja. 2025. "Semi-Active Control of a Two-Phase Fluid Strut Suspension via Deep Reinforcement Learning" Machines 13, no. 9: 854. https://doi.org/10.3390/machines13090854

APA Style

Seifi, A., Yin, Y., Yao, Y., & Rakheja, S. (2025). Semi-Active Control of a Two-Phase Fluid Strut Suspension via Deep Reinforcement Learning. Machines, 13(9), 854. https://doi.org/10.3390/machines13090854

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Semi-Active Control of a Two-Phase Fluid Strut Suspension via Deep Reinforcement Learning

Abstract

1. Introduction

2. GOES Model and Experimental Tests

3. Quarter-Car Model Under Random and Deterministic Road Excitations

Random Excitations

4. Multi-Objective Optimization

5. Semi-Active Control Strategy with Deep Reinforcement Learning

6. Results and Discussion

7. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI