Article

Multi-Objective Deep Q-Network Control for Actively Lubricated Bearings

Department of Mechatronics, Mechanics and Robotics, Orel State University, Orel 302015, Russia
* Author to whom correspondence should be addressed.
Lubricants 2024, 12(7), 242; https://doi.org/10.3390/lubricants12070242
Submission received: 4 April 2024 / Revised: 4 June 2024 / Accepted: 1 July 2024 / Published: 3 July 2024

Abstract

This paper aims to study and demonstrate the possibilities of using reinforcement learning for the synthesis of multi-objective controllers for radial actively lubricated hybrid fluid film bearings (ALHBs), which are considered to be complex multi-physical systems. In addition to the rotor displacement control problem typically solved for active bearings, the proposed approach also includes power losses due to friction and lubricant pumping in ALHBs among the control objectives to be minimized by optimizing the lubrication modes. The multi-objective controller was synthesized using the deep Q-network (DQN) learning technique. An optimal control policy was determined by the DQN agent during its repetitive interaction with the simulation model of the rotor system with ALHBs. The calculations were sped up by replacing the numerical model of an ALHB with its surrogate ANN-based counterpart and by predicting the shaft displacements in response to the operation of two independent control loops. The controller synthesized using the formulated reward function for the DQN agent is able to find a stable shaft position that reduces power losses by almost half compared to the losses observed in a passive system. It is also able to prevent the minimum fluid film thickness from falling below the established limit, thus avoiding possible system damage, for example, when the rotor is unbalanced during operation. Analysis of the development process and the results obtained allowed us to draw conclusions about the main advantages and disadvantages of the considered approach, and to identify some important directions for further research.

1. Introduction

The operation of fluid film bearings (FFBs) is governed by a complex set of interrelated hydrodynamic, thermal, tribological, and mechanical phenomena. The choice of an FFB’s design parameters determines the course of these processes and, ultimately, the bearing’s operational properties. If efficient calculation tools are available, the selection of optimal ratios of the design parameters and the improvement of the bearing’s properties can be carried out, for example, by solving optimization problems [1,2,3,4]. Additionally, the characteristics of FFBs are often improved by the use of new design solutions [5,6], advanced lubricants [7,8], etc.
The use of active control techniques is an alternative trend in improving FFBs. There are a variety of ways to implement active FFBs. The bearing’s geometric parameters [9,10,11], the lubricant’s parameters [12,13], as well as the parameters of its supply to the friction zone [14,15] have been made adjustable in various studies. The latter approach can be generally referred to as the active lubrication technique. It can be applied to various types of bearings, including rigid [9] and tilting-pad [16] designs, and both radial and thrust bearings [17,18]. An important advantage of active lubrication over the other named approaches is that in many cases, no fundamental changes to the bearing design are required; only the hydraulic (or pneumatic [19]) lubrication system undergoes changes.
The initial purpose of introducing active lubrication to FFBs was, in most cases, to reduce vibration and ensure rotor stability. Such results are shown in numerous papers, e.g., by Santos et al. [16], Nicoletti et al. [20], Rehman et al. [21], and Li et al. [22]. At the same time, active lubrication inevitably affects not only the rotor motion, but also other parameters of the FFB due to the previously noted complex relationships between physical phenomena, which were also noted in [23]. In addition to the obvious effect on the lubricant consumption, it also modifies the power consumption for pumping if external pressurization is used; the friction in the bearings, both in steady state and during startup and rundown; the lubricant temperature and the corresponding deformations; and the life of the rotor-bearing system’s components. Moreover, the last two points are also related to the friction occurring in the FFB. Thus, the energy parameters can also become the subject of control in active FFBs in addition to the system dynamics.
Multiple studies demonstrate various ways to reduce friction in FFBs: the use of less viscous [24] lubricants and improving their properties [7]; texturing bearing’s surface [25]; adding compliant areas [26]. However, friction and other energy parameters are still rarely considered to be controllable parameters or even possible objectives in controlled systems, including active bearings. Murashima et al. [27] considered friction as a factor to be directly adjusted in a tribological system. Engel et al. [28] presented a sliding bearing with adjustable friction controlled by ultrasonic oscillations. Regarding actively lubricated FFBs, the relationship between friction parameters and control techniques, namely adjustment of the shaft position, was discovered and substantiated in [29,30]. This provides the basis for the study and development of complex control strategies considering both the kinematic and the energy parameters of the rotor-bearing system. This approach requires the use of multi-objective control techniques that are also able to deal with nonlinear systems like FFBs.
Typical stabilization problems in actively lubricated FFBs are usually solved using conventional control methods: PID controller and its variations [19,31], LQG [32], and adaptive P control [33]. Some researchers used fuzzy logic [34] and model-predictive controllers [22] for this purpose. The latter, although optimal in essence, implies linearization of the system, and thus can hardly be applied to solve the control problems under consideration.
However, previous studies have shown promising results in the application of controllers based on reinforcement learning. They belong to agent-based methods and have been successfully implemented to resolve various control problems [35,36,37], including in FFBs. In particular, there is a practice of applying a deep Q-network (DQN) method to synthesize controllers in this area. In [38], a DQN controller was developed to minimize friction in an actively lubricated conical bearing. In [39], a DQN controller was used to restrict undesirable shaft movements in a conical bearing with an adjustable bushing. In [40], a DQN-based controller was implemented for a magnetorheological bearing to reduce the amplitudes of rotor vibrations at critical frequencies. Although the mentioned studies represent single-criteria controllers with a single control channel, the principle of reinforcement learning allows us to arbitrarily expand the number of control criteria, as well as control loops.
The aim of this work is to study the possibility of implementing multi-criteria optimal control for journal actively lubricated hybrid bearings (ALHBs) using reinforcement learning. One of the distinctive features of the study is implementation of several control loops that require coordinated operation within the framework of a controller based on reinforcement learning. Another distinctive feature is the multi-criteria formulation of the optimal control problem that takes into account not only the rotor dynamics, but also a number of energy characteristics of the bearing, such as power losses to overcome viscous friction and to pump the lubricant through the bearing. Such formulation of the problem has not previously been considered and solved for active FFBs, so the present study discovers the basics of the methodology for creating this kind of controllers and offers a discussion of the problems and prospects of the described approach.

2. Models and Methods

2.1. Basic Numerical Model

An actively lubricated hybrid bearing (ALHB) is the main object of this study. The ALHB is a journal bearing with four externally pressurized lubrication channels ending in hydrostatic pockets (grooves) located orthogonally along the bearing centerline. The lubricant supply pressure in each channel can be adjusted independently by electrohydraulic servovalves; thus, the hydrostatic force acting on the journal can be adjusted in both magnitude and direction. Figure 1 illustrates the bearing design and operating principle. Considering the stated purpose of this work, the rotor-bearing system’s design should ensure the representativeness of the results. Thus, a relatively heavy rotor was considered in this study, because such a design ensures a more pronounced reduction in viscous friction in an ALHB [29]. The main parameters of the rotor-bearing system considered in this study are shown in Table 1.
The bearing was modeled using a conventional approach based on numerically solving the modified Reynolds equation, coupled with the flow balance equation to model the hydrostatic effect and with the Lagrange equations to model the rotor dynamics. The Reynolds number for the bearing with the given parameters is Re ≈ 250, so the lubricant flow can be considered fully laminar.
The modified Reynolds equation was used to calculate the hydrodynamic pressure distribution in the ALHB [41]:
$$\frac{\partial}{\partial x}\left(h^{3}\,\frac{\partial p}{\partial x}\right)+\frac{\partial}{\partial z}\left(h^{3}\,\frac{\partial p}{\partial z}\right)=6\mu U\,\frac{\partial h}{\partial x}-12\mu V,$$
where $x$ and $z$ are the Cartesian coordinates of the bearing surface, $p$ is the pressure, $U$ and $V$ are the journal’s circumferential and lateral velocities, and $h$ is the bearing clearance function. The first term $6\mu U\,\partial h/\partial x$ on the right side of Equation (1) describes the pressure caused by the rotating journal surface; its optimization mainly provides the reduction in bearing friction considered in this work [29]. The second term $12\mu V$ describes the pressure caused by squeezing of the lubricant film and makes a smaller contribution to the friction reduction effect.
The flow balance equation was used to consider the throttling effect of the capillary restrictors and calculate the corresponding reduction in the lubricant supply pressure at their inputs and outputs [29,42]:
$$Q=\sum_{i=1}^{N_{H}}Q_{Hi}=\sum_{i=1}^{N_{H}}\frac{\pi d_{H}^{4}}{128\,l_{H}}\,\frac{(p_{0}-p_{H})\,\rho}{K_{H}\,\mu},$$
where $Q$ is the lubricant flow through the bearing, $Q_{Hi}$ is the lubricant flow through a given restrictor, $N_H = 4$ is the number of restrictors in the bearing, $p_H$ is the pressure in a hydrostatic pocket, and $K_H$ is a coefficient taking into account the reduction in the flow rate due to turbulence in a restrictor [43,44]:
$$K_{H}=\left(\frac{Re_{H}}{Re^{*}}\right)^{3/4},$$
where $Re_H$ is the Reynolds number characterizing the lubricant flow in an injector, and the limiting Reynolds number is $Re^{*} = 2300$.
The inlet pressure $p_0$ for each hydrostatic pocket is adjustable in the range from 0 to $p_{max}$ via the operation of the corresponding servovalve. The pressure is assumed to be almost uniform over the entire area of a hydrostatic pocket due to the appropriate choice of its depth and its limited area compared to the full bearing surface area (<7%).
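As an illustration of how Equations (2) and (3) interact, the sketch below (with invented restrictor geometry and lubricant properties, not the paper’s values) iterates the laminar flow law and the turbulence coefficient $K_H$ to a mutually consistent flow value:

```python
import math

RE_STAR = 2300.0  # limiting Reynolds number Re* from Eq. (3)

def restrictor_flow(p0, pH, d_H=1.5e-3, l_H=30e-3, mu=0.02, rho=870.0):
    """Flow through one capillary restrictor per Eq. (2), with the
    turbulence coefficient K_H of Eq. (3) applied once Re_H exceeds Re*.
    All geometry and lubricant values are illustrative assumptions."""
    A = math.pi * d_H**2 / 4.0          # restrictor cross-section
    K_H = 1.0
    for _ in range(10):                 # fixed point: Q depends on Re_H(Q)
        Q = math.pi * d_H**4 * (p0 - pH) * rho / (128.0 * l_H * K_H * mu)
        Re_H = Q * d_H / (A * mu)       # Re = rho*v*d/mu with v = Q/(rho*A)
        K_H = (Re_H / RE_STAR) ** 0.75 if Re_H > RE_STAR else 1.0
    return Q, Re_H
```

At small pressure drops the loop leaves the laminar Hagen–Poiseuille value untouched; at large drops the correction brings the flow below its laminar estimate.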
The bearing radial clearance function is as follows:
$$h=h_{0}-X_{1}\sin\alpha-X_{2}\cos\alpha,$$
where $X_1$ and $X_2$ are the coordinates of the journal’s geometric center and $\alpha$ is the angular coordinate of the bearing.
Equation (1) is solved numerically by the finite difference method [38] together with Equations (2)–(4), resulting in the pressure distribution in the ALHB. Cavitation is taken into account based on Gümbel’s hypothesis [38,45]. Numerical integration of the pressure distribution yields the following bearing forces:
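A minimal sketch of the described solution procedure, i.e., finite differences with Gümbel’s hypothesis enforced by discarding negative pressures, might look as follows; the grid sizes, geometry, and lubricant parameters are illustrative assumptions, and the circumferential boundary is treated as non-periodic for brevity:

```python
import numpy as np

def solve_reynolds_fd(nx=40, nz=12, D=0.04, L=0.04, h0=100e-6,
                      X1=0.0, X2=-30e-6, mu=0.02, U=5.0, V=0.0,
                      n_iter=1500, tol=1e-6):
    """Gauss-Seidel finite-difference solution of Eq. (1) on the unwrapped
    bearing surface (x circumferential, z axial), with Guembel's hypothesis
    applied by discarding negative pressures. Parameter values are
    illustrative; p = 0 is imposed on all outer boundaries for brevity."""
    dx = np.pi * D / (nx - 1)                      # circumferential step
    dz = L / (nz - 1)                              # axial step
    alpha = np.linspace(0.0, 2.0 * np.pi, nx)      # angular coordinate
    h = h0 - X1 * np.sin(alpha) - X2 * np.cos(alpha)   # clearance, Eq. (4)
    dhdx = np.gradient(h, dx)
    p = np.zeros((nx, nz))
    for _ in range(n_iter):
        p_old = p.copy()
        for i in range(1, nx - 1):
            he = ((h[i] + h[i + 1]) / 2.0) ** 3    # h^3 at east face
            hw = ((h[i] + h[i - 1]) / 2.0) ** 3    # h^3 at west face
            for j in range(1, nz - 1):
                rhs = 6.0 * mu * U * dhdx[i] - 12.0 * mu * V
                p[i, j] = ((he * p[i + 1, j] + hw * p[i - 1, j]) / dx**2
                           + h[i]**3 * (p[i, j + 1] + p[i, j - 1]) / dz**2
                           - rhs) / ((he + hw) / dx**2 + 2.0 * h[i]**3 / dz**2)
        p = np.maximum(p, 0.0)                     # Guembel: drop negative p
        if np.abs(p - p_old).max() < tol * (np.abs(p).max() + 1e-30):
            break
    return p, h, alpha
```

The resulting non-negative pressure field can then be integrated numerically, as in Equations (5) and (6), to obtain the bearing forces and the friction torque.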
$$F_{1}=\int_{0}^{L}\!\!\int_{0}^{\pi D}p\,\sin\alpha\;dx\,dz,\qquad F_{2}=\int_{0}^{L}\!\!\int_{0}^{\pi D}p\,\cos\alpha\;dx\,dz.$$
The viscous friction torque in the ALHB is calculated using Equation (6) [38]:
$$T_{fr}=r\iint_{S}\left(\frac{h}{2}\,\frac{\partial p}{\partial x}+\frac{U\mu}{h}\right)dx\,dz.$$
The corresponding power losses for overcoming the viscous friction $N_{fr}$, as well as for the lubricant pumping through the bearing $N_p$, are as follows:
$$N_{fr}=T_{fr}\,\omega,$$
$$N_{p}=Q_{H}\cdot p_{0}.$$
The rotor was represented by its single-mass model, assuming the absence of significant misalignments of the journal. The rotor motion equations considering the gravity force, the bearing forces, and the imbalance forces, are as follows [46,47]:
$$m_{r}\begin{bmatrix}dV_{1}/dt\\ dV_{2}/dt\end{bmatrix}=\begin{bmatrix}F_{1}\\ F_{2}\end{bmatrix}+\begin{bmatrix}F_{1}^{ext}\\ F_{2}^{ext}\end{bmatrix}+m_{ud}\,\omega^{2}\begin{bmatrix}\cos\omega t\\ \sin\omega t\end{bmatrix}+m_{r}\begin{bmatrix}0\\ -g\end{bmatrix},$$
where $t$ is time, $m_{ud}$ is the imbalance, $g$ is the gravitational acceleration, $m_r$ is the rotor mass, and $F_i^{ext}$ are other external forces applied to the shaft.
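The single-mass rotor model of Equation (9) can be integrated with a standard ODE solver; in the sketch below a simple linear spring-damper stands in for the ALHB force model, and all numerical values are illustrative assumptions:

```python
import numpy as np
from scipy.integrate import solve_ivp

def rotor_rhs(t, y, m_r=50.0, m_ud=1e-4, omega=2.0 * np.pi * 50.0, g=9.81):
    """Right side of the single-mass rotor model, Eq. (9), with the state
    y = [X1, X2, V1, V2]. A linear spring-damper stands in for the ALHB
    force model; all numerical values here are illustrative."""
    X1, X2, V1, V2 = y
    F1 = -1e6 * X1 - 2e3 * V1          # placeholder bearing force, X1 axis
    F2 = -1e6 * X2 - 2e3 * V2          # placeholder bearing force, X2 axis
    a1 = (F1 + m_ud * omega**2 * np.cos(omega * t)) / m_r
    a2 = (F2 + m_ud * omega**2 * np.sin(omega * t)) / m_r - g
    return [V1, V2, a1, a2]

# Integrate half a second of motion starting from the bearing center.
sol = solve_ivp(rotor_rhs, (0.0, 0.5), [0.0, 0.0, 0.0, 0.0], max_step=1e-3)
```

With these placeholder values the shaft settles near the static equilibrium $X_2 = -m_r g / k$ with a small superimposed imbalance orbit.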
A relatively simple proportional controller previously described in [30,33] was used to establish the relation between the control signals and the pressure changes in lubricant supply channels and further generate the corresponding data:
U X = K G E ,
where $U_X = [u_{X1},\,u_{X2}]$ is the vector of basic control signals, $K_G$ is the proportional gain, $E = [e_{X1},\,e_{X2}]$ is the vector of control errors, $e_{X1} = X_1 - X_0$, and $X_0$ is the setpoint.
$$U=U_{0}-K_{P}\,U_{X},$$
where $U$ is the vector of the control signals to the electrohydraulic servovalves; $U_0$ is the basic signal level, resulting in the pressure $p_0$ at a servovalve’s output (and at a restrictor’s input); $K_P = p_{max}/p_0$ is the ratio of possible pressure rise due to control signals. Thus, a certain error value increases the control signal at a corresponding servovalve and decreases it at the opposite one, providing their differential operation and increasing the control impact on the shaft.
Finally, the output pressure at a servovalve is as follows:
$$P_{X}=K_{SK}\,U,$$
where $K_{SK}$ is the servovalves’ transfer coefficient, i.e., the voltage-to-pressure ratio.
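The proportional loop of Equations (10)–(12) can be sketched as follows; the gains, the four-valve layout, and the clipping of valve inputs are illustrative assumptions:

```python
import numpy as np

def control_signals(X, X0, K_G=50.0, U0=1.0, K_P=4.0):
    """Proportional loop of Eqs. (10)-(11): the error along each axis raises
    the signal of one servovalve and lowers that of the opposite one
    (differential operation). Gains, the four-valve layout, and the clipping
    of valve inputs are illustrative assumptions."""
    E = np.asarray(X, dtype=float) - np.asarray(X0, dtype=float)  # Eq. (10) errors
    U_X = K_G * E                                   # basic control signals
    # One opposed servovalve pair per axis -> four signals in total.
    U = np.concatenate([U0 - K_P * U_X, U0 + K_P * U_X])          # Eq. (11)
    return np.clip(U, 0.0, None)                    # non-negative valve input

def valve_pressure(U, K_SK=2.5e6):
    """Servovalve output pressure, Eq. (12): P = K_SK * U (Pa per volt)."""
    return K_SK * np.asarray(U)
```

A positive error along one axis lowers the signal of one valve in the opposed pair and raises that of the other, which is the differential operation described above.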

2.2. Model Verification

The developed model of the ALHB was verified in several stages, in terms of calculating the pressure distribution, rotor loci, friction losses and lubricant flow rates. At each stage, the model predictions were compared to the experimental results published in other authors’ works.
Pressure distributions due to the hydrodynamic and hydrostatic effects have been compared to the experimental results published in [48,49], correspondingly (Figure 2). In both cases, the reference data on the pressure are presented for the bearing centerline.
The comparison for the plain hydrodynamic bearing is presented in Figure 2a. The modeled plain bearing has a diameter of 50 mm, a length of 20.5 mm, and a clearance of 73.2 μm, and is lubricated via a single groove supplying oil at pressures of 0.15 and 0.3 MPa [48].
The comparison for the hybrid bearing with six hydrostatic pockets is presented in Figure 2b. The modeled hybrid bearing has a diameter of 62 mm, a width of 48.5 mm, and a clearance of 60 μm, with six jets supplying oil at a pressure of 1.32 MPa [49]. The experimental test was carried out at an eccentricity of 0.5 and a rotor speed of 4800 rpm.
In both cases, the comparison shows good agreement between the simulation and the experimental data. The agreement for the hydrostatic effect is somewhat worse due to the unknown length of the restrictors used by the authors, as well as to measurement errors of the pressure sensor integrated into the rotating shaft. Moreover, the mentioned discrepancies are largely leveled out by further processing of the pressure data when calculating the rotor movement and other bearing parameters, as shown below.
Verification of the system’s model in terms of the rotor motion has been achieved via a comparison to the experimental results presented by Yi et al. in [50]. The comparison was made for a hydrostatic fluid film bearing with a diameter of 60 mm, a width of 25 mm, and a radial clearance of 52 μm with 9 injectors. The lubricant used was water at a temperature of 25 °C; the supply pressure was 0.1, 0.5, and 1 MPa.
The results presented in Figure 3 also show good agreement. The discrepancy is less than 3 μm, which is less than 2% of the bearing clearance (red dashed line in Figure 3) and on the order of the typical error of proximity sensors.
Additionally, since the issues of the lubricant flow and the viscous friction in the fluid film are also significant concerns in this study, the model’s ability to accurately predict these parameters has also been checked.
The data on the hydrodynamic friction (M. Fillon in [51]) and the lubricant flow rate (Yi et al. in [50]) have also been obtained experimentally. Their comparison to the model’s predictions is shown in Figure 4. In paper [51], the authors present a study of the friction torque. The study was carried out for a bearing with a length of 80 mm, a diameter of 100 mm, a radial clearance of 171 μm, and a rotor speed of 2000 rpm. The friction torque was measured under a static load of 2 kN and at different oil temperatures.
As with the other parameters considered, the predicted values are in good agreement with the measured ones, with a deviation of no more than 2%. The verification results make it possible to consider the developed numerical model of the ALHB adequate for the purposes of the study and the results obtained sufficiently reliable.

2.3. The Study Pipeline and the Control Problem

Reinforcement learning methods, including the DQN method used in this article, are agent-based and require simulation models of the control objects [52]. The agent learns the optimal control strategy during multiple runs of the training scenario with subsequent evaluation of the results. The iterative and stochastic nature of such a process, coupled with the quadratic complexity of the grid methods used for modeling ALHBs, leads to the need for huge amounts of computation. Therefore, data-driven ANN-based models of the ALHB were used instead of numerical models to improve the computational speed. The methodology for creating such surrogate models is based on the results for plain bearings presented in [41]. Since that work considered a plain hydrodynamic bearing, the dataset describing the ALHB was supplemented with the control signal values associated with the lubricant pressure changes in the bearing’s hydrostatic pockets. The process of developing a surrogate ANN model of the ALHB is described in more detail in Section 2.4.
Next, the resulting surrogate model is used together with the rotor model to train the DQN controller. The DQN controller finds the optimal policy that meets the given goals. Goal setting is implemented by assigning a system of rewards and penalties for “right” and “wrong” actions of the DQN agent. Formulation of the reward function is one of the subjects of this study.
Important features of the DQN learning technique are described in more detail in Section 2.6. The general pipeline of development of a DQN controller for the ALHB is schematically presented in Figure 5.
As noted above, the goal setting for the control problem solved in this study is based on the previously found relations between the controlled shaft position in the ALHB and its energy parameters, such as power losses due to friction in the bearing. At the same time, as noted in [29], the gain in reducing power consumption for pumping lubricant through the bearing may be even more significant in comparison with the similar effect from reducing the viscous friction. Therefore, the combination of these factors is taken into account in this study as the control objectives. The relation between the ALHB energy parameters and the shaft equilibrium position set by a controller was calculated using the verified model and is shown in Figure 6.
Both dependencies are characterized by a single minimum, with the function value increasing in all directions away from it. More eccentric shaft positions are less energy-efficient and more dangerous, as they reduce the fluid film thickness. This makes it possible to synthesize a controller for the ALHB that optimizes the considered energy parameters of the system and also prevents the system from transitioning to potentially dangerous states with a small fluid film thickness $h < h_{min}$. Formally, the required control policy can be represented as follows:
$$\pi=\underset{U_{X}}{\operatorname{argmin}}\left(\alpha_{1}N_{fr}+\alpha_{2}N_{p}\right)\quad\text{subject to}\;\;e\le e_{max},$$
where $e = 1 - h_{min}$ is the largest observed value of the shaft eccentricity; $e_{max}$ is the maximum permissible value of the shaft eccentricity (corresponding to the minimum permissible film thickness); and $\alpha_1$ and $\alpha_2$ are balancing (weight) coefficients to shift the emphasis, if necessary, to one of the power parameters.

2.4. Surrogate ALHB Model

The numerical ALHB model was used to generate a dataset and build a faster surrogate bearing model consisting of 9 ANNs according to the principles described in [41]. The whole bearing clearance was divided into three areas according to eccentricity (0–0.5, 0.5–0.7, and 0.7–0.9), and a separate ANN was trained for each area to predict a certain set of parameters: bearing forces ( F 1 , F 2 ); power characteristics ( N f r , N p ); and lubricant flow rate ( Q H ). The input parameters of the ANNs were the values of the control action along the axes ( u X 1 , u X 2 ), the position of the rotor in the bearing ( X 1 , X 2 ) and its velocities ( V 1 , V 2 ). As a result, a dataset with corresponding input and output parameters was collected for each eccentricity range. Each bearing sub-area was covered by a uniform grid in polar coordinates resulting in 40 data points by angle and 7 (for the (0–0.5) subrange) or 5 (for other two subranges) data points by eccentricity. Additionally, the data at each point mentioned above were collected considering the shaft velocities ( V 1 , V 2 ) values varying in a range from −0.03 to 0.03 with 28 points, and the control actions ( u X 1 , u X 2 ) values varying in a range from −1 to 1 with 6 points. The resulting data vector included the following variables: [ u X 1 , u X 2 ,   X 1 , X 2 , V 1 , V 2 ]. The resulting dataset included 9.35 million training samples.
Fully connected ANNs with one hidden layer of 64 neurons were used to approximate the dataset representing the ALHB model. The maximal prediction error of the trained ANNs relative to the verified numerical model outputs, estimated using the mean error metric, was 3%, 7%, and 13% for the bearing forces in the eccentricity ranges of 0–0.5, 0.5–0.7, and 0.7–0.9, respectively, and <1% for the other parameters over all ranges.
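The surrogate modeling step can be illustrated with a toy example: a small fully connected network with a single 64-neuron hidden layer (as in the paper) is fitted to samples of a stand-in bearing function. The analytic “bearing” used here is an invented placeholder for the numerical ALHB model, and all sizes are illustrative:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

def toy_bearing(inp):
    """Invented analytic stand-in for the numerical ALHB model: maps
    [uX1, uX2, X1, X2, V1, V2] to two bearing-force components."""
    u1, u2, x1, x2, v1, v2 = inp.T
    F1 = u1 - 0.5 * x1 - 0.05 * v1
    F2 = u2 - 0.5 * x2 - 0.05 * v2
    return np.stack([F1, F2], axis=1)

# Sample the input space, as the dataset samples the bearing clearance.
X = rng.uniform(-1.0, 1.0, size=(5000, 6))
Y = toy_bearing(X)

# One fully connected net with a single 64-neuron hidden layer, as in the paper.
scaler = StandardScaler().fit(X)
net = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
net.fit(scaler.transform(X), Y)
```

In the paper, one such network per eccentricity range and output group replaces the grid-based solver inside the simulation loop.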
After the training and testing processes were complete, the initial ALHB numerical model was then replaced by the obtained set of ANN-based models in the simulation environment, which is described in more detail in Section 2.5.

2.5. Full Simulation Model

The simulation model of the rotor-bearing system was developed in the Simulink tool of the MatLab R2020b software. The Simscape Multibody Toolbox was used to simulate the rigid body dynamics, modeling a shaft supported by two ALHBs operating in parallel. The Reinforcement Learning Toolbox was used to implement the DQN controller training. The structure of the simulation model in Simulink is shown in Figure 7.
ALHBs can be represented by their numerical or ANN-based surrogate models predicting the set of parameters including ( F 1 , F 2 , T f r , Q H , N f r , N p ). The control signals u X 1 and u X 2 generated by a controller are the inputs of the ALHB models. Applying them results in a change in the shaft position in the bearing and the corresponding changes in other observed variables.
The transition to surrogate models speeds up the calculation by more than 20 times, which makes it possible to test more variants of hyperparameters and conditions during training. Training a DQN controller thus usually takes 2–6 h, depending on the algorithm’s convergence rate in each case.
An important feature of the DQN algorithm is that it provides only discrete outputs, and their number significantly affects the training time. Therefore, the proposed controller used only three outputs, generating signals to increase, decrease, or leave the control signal level unchanged, respectively. The final value of each control signal is obtained by accumulating these steps as a cumulative sum.
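The accumulation of the three discrete DQN outputs into a continuous control level can be sketched as follows; the saturation limits are an assumption:

```python
class DiscreteActionAccumulator:
    """Maps the three discrete DQN outputs (decrease / hold / increase) to a
    continuous control level by accumulating steps of size du, as described
    for the proposed controller. The saturation limits are assumptions."""
    def __init__(self, du=0.1, u_min=-1.0, u_max=1.0):
        self.du, self.u_min, self.u_max = du, u_min, u_max
        self.u = 0.0

    def step(self, action):
        # action: 0 = decrease, 1 = hold, 2 = increase
        self.u = min(max(self.u + (action - 1) * self.du, self.u_min), self.u_max)
        return self.u
```

One such accumulator per control loop turns the agent’s discrete decisions into the smoothly varying signals $u_{X1}$ and $u_{X2}$.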

2.6. DQN-Based Control

The DQN method is a type of Q-learning that uses ANNs to find and implement the desired controller behavior policy. During the training process, the DQN agent obtains information about the state of the system $S_t$ at each time step $t$. Then, it generates a control signal $A_t$, in response to which the reward $r_t$ is calculated based on the observation parameters. Thus, the critic ANN $q(S, A)$ is trained to predict the expected future reward. The error between the trained function $q(S, A)$ and the optimal function $q^*(S, A)$ is estimated with the Bellman equation and should be minimized during the training process [53]:
$$q_{t}^{*}(S_{t},A_{t})=r_{t}+\gamma\,\max_{A_{t+1}}\,q_{t+1}(S_{t+1},A_{t+1}).$$
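In practice, the right side of Equation (14) is computed in batch from the Q-values of the next states, with the bootstrap term dropped on terminal (break) steps, a standard DQN detail not spelled out in the text. A minimal sketch:

```python
import numpy as np

def dqn_targets(q_next, rewards, dones, gamma=0.95):
    """Batch Bellman targets of Eq. (14): r_t + gamma * max_a q(S_{t+1}, a).
    The bootstrap term is dropped on terminal (break) steps; this handling
    of terminal states is an assumption of the sketch."""
    return rewards + gamma * (1.0 - dones) * q_next.max(axis=1)
```

The critic is then regressed toward these targets, minimizing the Bellman error over a mini-batch of transitions.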
In this study, a convolutional ANN with the structure shown in Figure 8 was used as a critic. The critic ANN consists of three hidden layers with 14, 18, and 18 neurons, respectively. The input parameters used during training are the observation parameters from the given time step and the control parameters from the previous time step. The critic’s outputs determine the control signals vector $U_X$. There are two parallel control loops (along the $X_1$ and $X_2$ axes).
For the control system, a condition was set to interrupt the calculation for an emergency event, which was a collision of the rotor with a bearing, specified as follows:
$$\mathrm{break}\quad\mathrm{if}\;\;e\ge h_{0}.$$
Several different reward functions have been tested during the study to meet the aim described by Equation (13) and overcome some shortcomings of the trained controllers. The obtained results and their discussion are presented in Section 3.

3. Results and Discussion

Previous studies [38,39] have shown that the discrete DQN algorithm used for the FFB control tasks performs well with a discrete reward function. Therefore, the following reward function was developed based on and taking into account Equations (13)–(15):
$$reward=\begin{cases}1 & \text{if}\;\;N_{fr}+N_{p}<N_{lim}\;\&\;e<e_{lim};\\[2pt]-100 & \text{if}\;\;break=true,\end{cases}$$
where $N_{lim}$ is the maximum permissible value of the total power losses $N_{fr} + N_p$ in the current state of the system, and $e_{lim}$ is the maximum value of the shaft eccentricity assessed as safe, i.e., ensuring sufficient fluid film thickness.
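Equation (16) translates directly into code; the zero reward for the remaining (non-break, out-of-limits) states is an assumption, since the equation leaves that branch implicit:

```python
def reward_v1(N_fr, N_p, e, broke, N_lim=2.5, e_lim=0.7):
    """Discrete reward of Eq. (16): +1 while the total losses and the largest
    eccentricity stay within limits, and a large penalty on rotor-bearing
    contact. The zero reward for the remaining states is an assumption."""
    if broke:
        return -100.0
    return 1.0 if (N_fr + N_p < N_lim and e < e_lim) else 0.0
```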
The following values were set for the hyperparameters of the DQN learning agent: the LearnRate was 0.001, the Optimizer was Adam, the TargetSmoothFactor was 1 × 10−4, the DiscountFactor was 0.95, the MiniBatchSize was 250, the control frequency was 10 signals per second, and the step (discreteness) of the control signal was set to 0.1. The following values were set for the parameters in Equation (16): $N_{lim}$ = 2.5 W, $e_{lim}$ = 0.7.
The DQN agent was trained using a perfectly balanced rotor model without external forces applied to ensure unambiguous assessment of power parameters for a certain system’s state. Since real rotors have a non-zero imbalance, the shaft center current coordinates were transmitted to the prepared DQN controller during its testing in averaged form over several periods of oscillation. These coordinates approximately characterized the position of the center of the shaft’s orbit in the bearing. On the contrary, direct coordinates values without preliminary averaging were used to check the fulfillment of the condition e < e l i m to assess the most distant shaft position from the bearing center.
The DQN agent training process is presented in Figure A1 in Appendix A. The results of testing the trained controller on the simulation model are presented in Figure 9.
Testing of the trained system shows that the controller moves the shaft from the initial equilibrium point of the passive system to a new position. The total power consumption for overcoming viscous friction and pumping lubricant decreased from 3.35 W in the initial system state to, on average, 2.25 W in the adjusted one. However, oscillations of the control signal $u_{X2}$ were observed in the system, which also led to oscillations of the shaft around its steady position, causing fluctuations in the power parameters. Presumably, the oscillations are caused by a combination of the reward function settings in terms of power losses ($N_{lim}$) and the chosen step of changing the control signals, ∆u = 0.1. The response of the shaft position in such a configuration to a minimal change in the control signal ∆u turned out to be too large. The DQN controller moves the shaft to a more optimal position to fulfill the condition in Equation (16) regarding minimizing the power losses. However, excessive displacement and subsequent correction occur in this case; the system then returns to the previous state and transitions into a self-oscillating mode.
Since self-oscillations are an undesirable operating mode of the system, measures were taken to eliminate the possibility of its occurrence. Changing the reward function from discrete to continuous did not provide the desired results; oscillations in the control action persisted. Changes to the training hyperparameters also failed to eliminate oscillations. Incorporating control actions from the previous step into the reward function turned out to be the most effective solution. The modified reward function took the following form:
$$reward=\begin{cases}k_{r}-k_{u}\left(\left|u_{X1}\right|+\left|u_{X2}\right|\right) & \text{if}\;\;N_{fr}+N_{p}<N_{lim}\;\&\;e<e_{lim};\\[2pt]-100 & \text{if}\;\;break=true,\end{cases}$$
where $k_r$ is the reward coefficient, and $k_u$ is the coefficient of influence of the control action on the reward, which is necessary to balance the reward value taking into account the levels of the control signals.
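The modified reward of Equation (17) differs from the first variant only in the in-limits branch; again, the zero reward for the remaining states is an assumption of the sketch:

```python
def reward_v2(N_fr, N_p, e, u_X1, u_X2, broke,
              N_lim=2.0, e_lim=0.7, k_r=2.0, k_u=2.0):
    """Modified reward of Eq. (17): the magnitudes of the control signals are
    subtracted from the reward to discourage self-oscillation. The zero
    reward for the remaining states is again an assumption."""
    if broke:
        return -100.0
    if N_fr + N_p < N_lim and e < e_lim:
        return k_r - k_u * (abs(u_X1) + abs(u_X2))
    return 0.0
```

Because larger control signals now reduce the reward, the agent is pushed toward the smallest control effort that still keeps the losses and eccentricity within their limits.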
The step of change of control signals was reduced to 0.05 to increase the precision of the control process. The control frequency was also increased to 20 signals per second. Other hyperparameters of the DQN agent were set as follows: the LearnRate was 0.001, the Optimizer was Adam, the TargetSmoothFactor was 1 × 10−4, the DiscountFactor was 0.9, and the MiniBatchSize was 250.
The value of the permissible power limit $N_{lim}$ was reduced to 2 W to test the stability of control at a smaller permissible deviation from the minimum. The $e_{lim}$ value was kept at 0.7. The coefficients in Equation (17) had the following values: $k_r$ was set to 2 to ensure the reward remains positive when the penalty is charged through only one of the control loops; $k_u$ was set to 2 to increase the weight of the control components after reducing the step of changing the control signals to 0.05.
The DQN agent training process with modified parameters is presented in Figure A2 in Appendix A. The results of testing the trained controller on the simulation model are presented in Figure 10.
The test results show that taking into account the value of the control signal in the reward function eliminated oscillations and provided stable control signal generation. The shaft was also moved to a new stable position, and the total reduction in power losses turned out to be more significant than in the previous case, and reached 1.64 W, i.e., an almost twofold decrease (by 49%).
As can be seen from Figure 10a, the main contribution to this effect was made by the reduction in power losses for lubricant pumping. A comparison of the steady-state shaft position in Figure 10d with the diagram data in Figure 6 shows that this position lies some distance from the minima of each of the parameters. The optimal ratio found in this way can be rebalanced by changing the weight coefficients for the parameters $N_{fr}$ and $N_p$ in Equation (17), as shown by Equation (13).
Further testing of the trained DQN controller involved checking its ability to maintain a given value of the minimum fluid film thickness in the presence of rotor imbalance. No additional external lateral forces besides the rotor's own weight were applied. The results of testing using the simulation model are presented in Figure 11. The imbalance values [1 × 10−4, 2 × 10−4, 2.5 × 10−4, 3.2 × 10−4] kg·m were selected so that the first three did not exceed the specified limit e_lim = 0.7, while the last one exceeded it and elicited a response from the DQN controller.
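As a rough order-of-magnitude check not given in the paper, the rotating force produced by an imbalance m_u·d at the rated speed n = 1500 rpm (Table 1) follows the standard relation F = (m_u·d)·ω². A short sketch for the tested imbalance values:

```python
import math

# Order-of-magnitude check (not from the paper): rotating unbalance force
# F = (m_u * d) * omega^2 for the tested imbalance values at n = 1500 rpm.
n_rpm = 1500
omega = 2 * math.pi * n_rpm / 60  # angular speed, rad/s (~157.1)

imbalances = [1e-4, 2e-4, 2.5e-4, 3.2e-4]  # kg*m, values from the test
forces = [mu_d * omega**2 for mu_d in imbalances]
for mu_d, f in zip(imbalances, forces):
    print(f"m_u*d = {mu_d:.1e} kg*m -> F = {f:.2f} N")
```

The resulting forces (roughly 2.5 to 7.9 N) are small compared to the rotor weight of about 235 N, which is consistent with the orbits in Figure 11 remaining bounded while only the largest imbalance pushes the eccentricity past the limit.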
The graphs in Figure 11 show that the rotor orbit remained stable in all cases as the imbalance increased, with the conditional center of the orbits located near the point set by the controller to minimize power losses. Exceeding the established limit of shaft eccentricity elicited a response from the controller. As a result, the amplitude of oscillations decreased, and a new stable orbit was established close to the permissible limits.
Analyzing the process of developing the controller and the results obtained allowed us to draw some conclusions about the use of DQN as a tool for synthesizing optimal control for the ALHB.
First of all, the resulting controllers ensured system stability in all cases, although controller-induced oscillations were observed in the first version (Figure 9). These oscillations, while unacceptable, had a relatively small magnitude and could be compensated by modifying the reward function and reducing the control signal step ∆u. Testing the controller with the unbalanced rotor demonstrated that system stability was preserved, including during transient processes, as shown in Figure 11b. This is due to the stability of the considered configuration of the rotor-bearing system with passive bearings. The synthesized DQN controller fulfilled the specified goals without having a destabilizing effect; that is, the operation of the two control loops turned out to be quite consistent. However, the ability of the presented controllers to stabilize rotor-bearing systems that are initially in an unstable state is not obvious. It can be assumed that this also depends on the correct choice of reward functions and may be a subject of further research.
Another logical consequence is the conclusion that the reward function can be easily modified and expanded with almost no restrictions. In this study, the synthesis of the multi-objective controller for such a complex object as the ALHB showed no fundamental differences from single-objective systems, like those previously implemented in [38,39]. The development process also showed that individual undesirable effects could be corrected by modifying the reward function and the method's hyperparameters, although this may require quite in-depth analysis and knowledge of the operation of the system. At the same time, the final control policy itself is determined automatically during the training process, without the direct participation of the developer, based only on the rules specified by the reward functions. This has the potential to reduce development time for complex controllers and may be a way to solve problems that are difficult to solve in traditional ways. It should also be taken into account that the dependencies inside the synthesized controller, as in any ANN, are implicit and do not admit a strict physical interpretation. Therefore, the selection of relevant methods for testing the trained system will also play a significant role in development practice.
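As a hypothetical illustration of this extensibility (the helper and objective names below are examples, not taken from the paper), a multi-objective reward can be assembled from independently weighted terms, so that adding or re-weighting an objective changes only a list of terms and leaves the DQN agent and its training loop untouched:

```python
# Hypothetical sketch of composing a multi-objective reward from weighted
# terms; the objectives and weights are illustrative examples, not the
# paper's actual Equation (17).

def make_reward(objectives):
    """Build a reward function from (weight, term_fn) pairs.

    Each term_fn maps an observed-state dict to a scalar contribution.
    Extending the controller's goals means editing this list only.
    """
    def reward(state):
        return sum(w * fn(state) for w, fn in objectives)
    return reward

# Example terms: minimize total power losses, penalize eccentricity overshoot.
power_term = lambda s: -(s["N_fr"] + s["N_p"])
ecc_term = lambda s: -1.0 if s["ecc"] > s["ecc_lim"] else 0.0

reward = make_reward([(1.0, power_term), (10.0, ecc_term)])
```

A third objective, for instance a vibration-amplitude term, would be one more (weight, function) pair in the same list.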
Another advantage of the considered approach is that the controller is synthesized using nonlinear models of bearings. The resulting control policies therefore take the simulated nonlinear effects into account, reducing the risk of unexpected responses in real systems. Although DQN control is not a model-predictive technique, agent training occurs during interaction with nonlinear models. This distinguishes it favorably, for example, from the ALHB model-predictive controller presented in [30], where preliminary linearization of the system was required, with the inevitable loss of some information about its behavior.
Regarding the case of the rotor system on ALHBs considered in this study, the problem posed can be considered solved. The features of active lubrication noted in previous studies [22,29] made it possible to optimize the lubricant supply and achieve a reduction in power losses due to friction and lubricant pumping, with all the corresponding advantages. It should be noted that some of the objectives considered in this problem, namely the shaft position and the power parameters, did not conflict with each other in the Pareto sense. The parameters N_fr and N_p, on the contrary, partially conflict with each other, though this did not prevent the controller from finding a single stable solution. Thus, successfully handling even conflicting objectives appears to be a matter of correctly formulating the control problem.
Despite the listed advantages of the considered approach, a number of existing and possible problems can also be noted.
First of all, it should be noted that a significant amount of computation is required for systems such as rotor-bearing systems with FFBs. Despite measures taken to speed up calculations, such as the use of ALHB surrogate models [41], training for each option could take 2–6 h, depending on the convergence rate of the current algorithm. The learning process is thus insufficiently flexible: if the problem statement contains implicit inconsistencies and unacceptable results are obtained, testing each reward function or combination of hyperparameters can take a significant amount of time, especially for complex systems. This circumstance increases the requirements for an initially correct formulation of the learning rules represented by reward functions.
Another potential problem is the tendency of ANN-based models toward overfitting. The training scenarios must be strictly analyzed and correlated with the accepted reward function in order to avoid excessive determinism in the behavior of the trained controller. An overfitted controller risks "memorizing" a single scenario and simply reproducing it; moreover, unexpected variations in the environment of a real system can then lead to unpredictable controller responses. Thus, the requirement noted above for selecting relevant methods for testing a system's robustness also applies here.
At the same time, even in the presence of the listed problems and risks, reinforcement learning methods seem to be quite a powerful tool for synthesizing controllers, including for complex and nonlinear systems such as FFBs. If the requirements for a system allow the use of controllers without an explicit physical interpretation of control laws, then controllers can be synthesized for it in this way, including multi-objective ones and those using several independent control loops.
The above issues of robustness, optimization of hyperparameters, and formulation of reward functions and constraints remain subjects in need of further research in the field of application of reinforcement learning for the synthesis of controllers for systems such as active FFBs. The results of such studies can provide answers to corresponding questions and contribute to the integration of the considered approaches and solutions into engineering practice in the field of rotary machines.

4. Conclusions

This study demonstrates the possibility and features of using reinforcement learning to synthesize controllers for radial actively lubricated hybrid fluid film bearings (ALHBs) with multiple adjustable variables. The developed control system based on deep Q-networks (DQN) considers power losses for viscous friction and for lubricant pumping through the ALHB, along with the shaft displacements, as the objective functions. The considered ALHBs included two independent control loops to adjust the lateral shaft position in a bearing. The synthesized DQN controller provides stable system operation and efficiently reduces power losses in the bearing (by up to 49% for the considered system) through optimization of the lubricant supply parameters. It is also able to maintain a minimum guaranteed value of the fluid film thickness to provide additional safety and stability of the system.
The key advantages of this approach are the ability to find optimal control strategies automatically during the training process; the possibility of using complex nonlinear models of rotor-bearing systems, and thus accounting for the corresponding features of their behavior in the obtained control strategies; and the ability to easily increase the number and complexity of the control objectives by simply modifying the reward functions.
One of the key shortcomings of the approach is the significant amount of computation required, although this is partially compensated by the use of fast data-driven surrogate models of ALHBs. Appropriate formulation of reward functions may also become a challenging task, requiring deep knowledge of the operation of a complex system such as an ALHB. Avoiding overfitting should also be taken into account when developing the reward functions and training scenarios, particularly by generalizing the applied conditions.
The results obtained in this study are the basis for further research, which should provide more detailed clarification of the capabilities and limitations of the proposed approach, particularly when applying other control objectives and varying the method hyperparameters and operating conditions to obtain a more complete picture in realistic conditions. The corresponding results can bring the proposed approaches and methods closer to implementation in engineering practice.

Author Contributions

Conceptualization, D.S.; Formal analysis, D.S.; Funding acquisition, D.S.; Investigation, D.S. and Y.K.; Methodology, D.S. and Y.K.; Project administration, D.S.; Resources, D.S.; Software, D.S. and Y.K.; Supervision, D.S.; Validation, D.S. and Y.K.; Visualization, D.S. and Y.K.; Writing—original draft, D.S. and Y.K.; Writing—review and editing, D.S. and Y.K. All authors have read and agreed to the published version of the manuscript.

Funding

The study was supported by the Russian Science Foundation grant No. 22-79-00289, https://rscf.ru/en/project/22-79-00289/ (accessed on 3 April 2024).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors express their gratitude to the Russian Science Foundation for the provided financial support and to the reviewers for their help in improving the quality of the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Nomenclature

List of abbreviations
ALHB: Actively lubricated hybrid bearing.
FFB: Fluid film bearing.
ANN: Artificial neural network.
DQN: Deep Q-network.
List of symbols
x, y, z: Cartesian coordinates.
α: Angular bearing coordinate.
U, V: Circumferential and lateral shaft velocities.
X_i: Coordinates of the geometric center of the journal.
V_1, V_2: Shaft's lateral velocity components.
F_1, F_2: Bearing forces.
F_1^ext, F_2^ext: External forces.
O, O_1: Centers of the bearing and the shaft.
h_0: Bearing radial clearance.
h_min: Minimum film thickness.
ρ: Density of lubricant.
µ: Dynamic viscosity of lubricant.
Q, Q_H: Lubricant flow rates.
p: Pressure of lubricant (function).
p_0, p_max: Instant and maximum lubricant supply pressure.
T_fr: Viscous friction torque in the bearing.
m_u d: Rotor imbalance.
N_vf, N_p: Power losses due to viscous friction and to lubricant pumping.
n, ω: Rotation speed and angular shaft speed.
r, d: Shaft radius and diameter.
R, D: Bearing radius and diameter.
L: Bearing length.
m_r: Rotor mass.
N_H: Number of restrictors in the bearing.
l_H, d_H: Restrictor length and diameter.
Re, Re_H: Reynolds numbers in the bearing and in the restrictor.
Re_C: Critical Reynolds number.
q*(S, A): Optimal action-value function.
U_X: Vector of basic control signals.
K_G: Proportional gain.
E: Vector of control errors.
U: Vector of control signals for servovalves.
K_SK: Servovalve's transfer coefficient.
U_0: Basic control signal level.
ε: Control error.

Appendix A

Figure A1. Training schedule of the initial DQN controller.
Figure A2. Training schedule of the improved DQN controller.

References

  1. Ghorbanian, J.; Ahmadi, M.; Soltani, R. Design Predictive Tool and Optimization of Journal Bearing Using Neural Network Model and Multi-Objective Genetic Algorithm. Sci. Iran. 2011, 18, 1095–1105. [Google Scholar] [CrossRef]
  2. Saruhan, H. Optimum design of rotor-bearing system stability performance comparing an evolutionary algorithm versus a conventional method. Int. J. Mech. Sci. 2006, 48, 1341–1351. [Google Scholar] [CrossRef]
  3. Codrignani, A.; Savio, D.; Pastewka, L.; Frohnapfel, B.; van Ostayen, R. Optimization of Surface Textures in Hydrodynamic Lubrication through the Adjoint Method. Tribol. Int. 2020, 148, 106352. [Google Scholar] [CrossRef]
  4. Van Ostayen, R.A.J. Film Height Optimization of Dynamically Loaded Hydrodynamic Slider Bearings. Tribol. Int. 2010, 43, 1786–1793. [Google Scholar] [CrossRef]
  5. Wasilczuk, M. Friction and Lubrication of Large Tilting-Pad Thrust Bearings. Lubricants 2015, 3, 164–180. [Google Scholar] [CrossRef]
  6. Liu, Y.; Zhou, Y.; He, T.; Xia, Y. The Utilization of a Damping Structure in the Development of Self-Adaptive Water-Lubricated Stern Bearings. Lubricants 2024, 12, 32. [Google Scholar] [CrossRef]
  7. Zhang, X.; Yan, Y.; Wang, P.; Zhang, T.; Liu, S.; Ye, Q.; Zhou, F. Thiadiazole Functionalized Covalent Organic Frameworks as Oil-Based Lubricant Additives for Anti-Friction and Anti-Wear. Tribol. Int. 2023, 183, 108393. [Google Scholar] [CrossRef]
  8. Cui, Y.; Jin, G.; Xue, S.; Liu, S.; Ye, Q.; Zhou, F.; Liu, W. Laser Manufactured-Liquid Metal Nanodroplets Intercalated Mxene as Oil-Based Lubricant Additives for Reducing Friction and Wear. J. Mater. Sci. Technol. 2024, 187, 169–176. [Google Scholar] [CrossRef]
  9. Martin, J.K.; Parkins, D.W. Testing of a Large Adjustable Hydrodynamic Journal Bearing. Tribol. Trans. 2001, 44, 559–566. [Google Scholar] [CrossRef]
  10. Chasalevris, A.; Dohnal, F. Improving Stability and Operation of Turbine Rotors Using Adjustable Journal Bearings. Tribol. Int. 2016, 104, 369–382. [Google Scholar] [CrossRef]
  11. Chasalevris, A.; Dohnal, F. Enhancing Stability of Industrial Turbines Using Adjustable Partial Arc Bearings. J. Phys. Conf. Ser. 2016, 744, 012152. [Google Scholar] [CrossRef]
  12. Laukiavich, C.A.; Braun, M.J.; Chandy, A.J. A Comparison between the Performance of Ferro- and Magnetorheological Fluids in a Hydrodynamic Bearing. Proc. Inst. Mech. Eng. Part J J. Eng. Tribol. 2014, 228, 649–666. [Google Scholar] [CrossRef]
  13. van der Meer, G.H.G.; Quinci, F.; Litwin, W.; Wodtke, M.; van Ostayen, R.A.J. Experimental Comparison of the Transition Speed of a Hydrodynamic Journal Bearing Lubricated with Oil and Magnetorheological Fluid. Tribol. Int. 2023, 189, 108976. [Google Scholar] [CrossRef]
  14. Santos, I.F.; Watanabe, F.Y. Compensation of Cross-Coupling Stiffness and Increase of Direct Damping in Multirecess Journal Bearings Using Active Hybrid Lubrication: Part I-Theory. J. Tribol. 2004, 126, 146–155. [Google Scholar] [CrossRef]
  15. Haugaard, A.M.; Santos, I.F. Elastohydrodynamics Applied to Active Tilting-Pad Journal Bearings. J. Tribol. 2010, 132, 021702. [Google Scholar] [CrossRef]
  16. Santos, I.F. Design and Evaluation of Two Types of Active Tilting Pad Journal Bearings. Act. Control Vib. 1994, 79–87. [Google Scholar]
  17. Sha, Y.; Lu, C.; Pan, W.; Chen, S.; Ge, P. Nonlinear Control System Design for Active Lubrication of Hydrostatic Thrust Bearing. Coatings 2020, 10, 426. [Google Scholar] [CrossRef]
  18. Rehman, W.U.; Khan, W.; Ullah, N.; Shahariar Chowdhury, M.D.; Techato, K.; Haneef, M. Nonlinear Control of Hydrostatic Thrust Bearing Using Multivariable Optimization. Mathematics 2021, 9, 903. [Google Scholar] [CrossRef]
  19. Pierart, F.G.; Santos, I.F. Active Lubrication Applied to Radial Gas Journal Bearings. Part 2: Modelling Improvement and Experimental Validation. Tribol. Int. 2016, 96, 237–246. [Google Scholar] [CrossRef]
  20. da Silva, H.A.P.; Nicoletti, R. Tilting-Pad Journal Bearing with Active Pads: A Way of Attenuating Rotor Lateral Vibrations. In Lecture Notes in Mechanical Engineering; Springer: Cham, Switzerland, 2023; pp. 44–54. [Google Scholar] [CrossRef]
  21. Rehman, W.U.; Jiang, G.; Luo, Y.; Wang, Y.; Khan, W.; Rehman, S.U.; Iqbal, N. Control of Active Lubrication for Hydrostatic Journal Bearing by Monitoring Bearing Clearance. Adv. Mech. Eng. 2018, 10, 2018. [Google Scholar] [CrossRef]
  22. Li, S.; Zhou, C.; Savin, L.; Shutin, D.; Kornaev, A.; Polyakov, R.; Chen, Z. Theoretical and Experimental Study of Motion Suppression and Friction Reduction of Rotor Systems with Active Hybrid Fluid-Film Bearings. Mech. Syst. Signal Process. 2023, 182, 109548. [Google Scholar] [CrossRef]
  23. Santos, I.F. Trends in Controllable Oil Film Bearings. In IUTAM Symposium on Emerging Trends in Rotor Dynamics, Proceedings of the IUTAM Symposium on Emerging Trends in Rotor Dynamics, New Delhi, India, 23–26 March 2009; Springer: Dordrecht, The Netherlands, 2011; Volume 25, pp. 185–199. [Google Scholar] [CrossRef]
  24. Knauder, C.; Allmaier, H.; Sander, D.E.; Salhofer, S.; Reich, F.M.; Sams, T. Analysis of the Journal Bearing Friction Losses in a Heavy-Duty Diesel Engine. Lubricants 2015, 3, 142–154. [Google Scholar] [CrossRef]
  25. Vlădescu, S.C.; Fowell, M.; Mattsson, L.; Reddyhoff, T. The Effects of Laser Surface Texture Applied to Internal Combustion Engine Journal Bearing Shells—An Experimental Study. Tribol. Int. 2019, 134, 317–327. [Google Scholar] [CrossRef]
  26. Rasheed, H.E. The Reduction of Friction in Axially Non-Cylindrical Journal Bearings Using Grooved Bearing Shells. Tribol. Ser. 1998, 34, 535–541. [Google Scholar] [CrossRef]
  27. Murashima, M.; Imaizumi, Y.; Murase, R.; Umehara, N.; Tokoroyama, T.; Saito, T.; Takeshima, M. Active Friction Control in Lubrication Condition Using Novel Metal Morphing Surface. Tribol. Int. 2021, 156, 106827. [Google Scholar] [CrossRef]
  28. Engel, T.; Lechler, A.; Verl, A. Sliding Bearing with Adjustable Friction Properties. CIRP Ann. 2016, 65, 353–356. [Google Scholar] [CrossRef]
  29. Shutin, D.; Kazakov, Y. Theoretical and Numerical Investigation of Reduction of Viscous Friction in Circular and Non-Circular Journal Bearings Using Active Lubrication. Lubricants 2023, 11, 218. [Google Scholar] [CrossRef]
  30. Li, S.; Babin, A.; Shutin, D.; Kazakov, Y.; Liu, Y.; Chen, Z.; Savin, L. Active Hybrid Journal Bearings with Lubrication Control: Towards Machine Learning. Tribol. Int. 2022, 175, 107805. [Google Scholar] [CrossRef]
  31. Pierart, F.G.; Santos, I.F. Lateral Vibration Control of a Flexible Overcritical Rotor via an Active Gas Bearing—Theoretical and Experimental Comparisons. J. Sound Vib. 2016, 383, 20–34. [Google Scholar] [CrossRef]
  32. Salazar, J.G.; Santos, I.F. Active Tilting-Pad Journal Bearings Supporting Flexible Rotors: Part I—The Hybrid Lubrication. Tribol. Int. 2017, 107, 94–105. [Google Scholar] [CrossRef]
  33. Shutin, D.; Polyakov, R. Adaptive Nonlinear Controller of Rotor Position in Active Hybrid Bearings. In Proceedings of the 2016 2nd International Conference on Industrial Engineering, Applications and Manufacturing (ICIEAM), Chelyabinsk, Russia, 19–20 May 2016. [Google Scholar] [CrossRef]
  34. Rehman, W.U.R.; Luo, Y.; Wang, Y.; Jiang, G.; Iqbal, N.; Rehman, S.U.R.; Bibi, S. Fuzzy Logic–based Intelligent Control for Hydrostatic Journal Bearing. Meas. Control 2019, 52, 229–243. [Google Scholar] [CrossRef]
  35. Yeo, S.; Naing, Y.; Kim, T.; Oh, S. Achieving Balanced Load Distribution with Reinforcement Learning-Based Switch Migration in Distributed SDN Controllers. Electronics 2021, 10, 162. [Google Scholar] [CrossRef]
  36. Kim, J.B.; Lim, H.K.; Kim, C.M.; Kim, M.S.; Hong, Y.G.; Han, Y.H. Imitation Reinforcement Learning-Based Remote Rotary Inverted Pendulum Control in Openflow Network. IEEE Access 2019, 7, 36682–36690. [Google Scholar] [CrossRef]
  37. Nian, R.; Liu, J.; Huang, B. A Review On Reinforcement Learning: Introduction and Applications in Industrial Process Control. Comput. Chem. Eng. 2020, 139, 106886. [Google Scholar] [CrossRef]
  38. Kazakov, Y.N.; Kornaev, A.V.; Shutin, D.V.; Li, S.; Savin, L.A. Active Fluid-Film Bearing With Deep Q-Network Agent-Based Control System. J. Tribol. 2022, 144, 081803. [Google Scholar] [CrossRef]
  39. Kazakov, Y.N.; Kornaev, A.V.; Shutin, D.V.; Kornaeva, E.P.; Savin, L.A. Reducing Rotor Vibrations in Active Conical Fluid Film Bearings with Controllable Gap. Nelineinaya Din. 2022, 18, 863–873. [Google Scholar] [CrossRef]
  40. Fetisov, A.; Kazakov, Y.; Savin, L.; Shutin, D. Synthesis of a DQN-Based Controller for Improving Performance of Rotor System with Tribotronic Magnetorheological Bearing. In Lecture Notes in Networks and Systems; Springer: Cham, Switzerland, 2023; Volume 717, pp. 81–91. [Google Scholar] [CrossRef]
  41. Shutin, D.; Kazakov, Y.; Stebakov, I.; Savin, L. Data-Driven and Physics-Informed Approaches for Improving the Performance of Dynamic Models of Fluid Film Bearings. Tribol. Int. 2024, 191, 109136. [Google Scholar] [CrossRef]
  42. Rowe, W.B. Hydrostatic, Aerostatic and Hybrid Bearing Design; Butterworth-Heinemann: Oxford, UK, 2012; pp. 1–333. [Google Scholar] [CrossRef]
  43. Constantinescu, V.N. On Turbulent Lubrication. Proc. Inst. Mech. Eng. 1959, 173, 881–900. [Google Scholar] [CrossRef]
  44. Afzal, N.; Abu, S.; Bushra, A. Friction factor power law with equivalent log law, of a turbulent fully developed flow, in a fully smooth pipe. Z. Angew. Math. Physik. 2023, 74, 144. [Google Scholar] [CrossRef]
  45. Hori, Y. Hydrodynamic Lubrication; Springer: Berlin/Heidelberg, Germany, 2006; pp. 1–231. [Google Scholar] [CrossRef]
  46. Friswell, M.I.; Penny, J.E.T.; Garvey, S.D.; Lees, A.W. Dynamics of Rotating Machines; Cambridge University Press: Cambridge, UK, 2015; pp. 1–526. [Google Scholar] [CrossRef]
  47. Babin, A.; Polyakov, R. Imitation Model of Unbalanced Rotor on Fluid-Film Bearings. Vibroeng. Procedia 2020, 32, 38–44. [Google Scholar] [CrossRef]
  48. Mansoor, Y.; Shayler, P. The Effect of Oil Feed Pressure on the Friction Torque of Plain Bearings under Light, Steady Loads. Tribol. Int. 2018, 119, 316–328. [Google Scholar] [CrossRef]
  49. Foss, S.L.; Gaev, E.P.; Palladiy, A.V.; Maksimov, V.A. Experimental study of thermal and hydrodynamic characteristics of a hydrostatic bearing-seal. In Study of Hydrostatic Bearings and Seals of Aircraft Engines; Kharkov Aviation Institute: Kharkov, Ukraine, 1987; pp. 90–95. [Google Scholar]
  50. Yi, H.; Jung, H.; Kim, K.; Ryu, K. Static Load Characteristics of Hydrostatic Journal Bearings: Measurements and Predictions. Sensors 2022, 22, 7466. [Google Scholar] [CrossRef] [PubMed]
  51. Bouyer, J.; Fillon, M. Experimental Measurement of the Friction Torque on Hydrodynamic Plain Journal Bearings during Start-Up. Tribol. Int. 2011, 44, 772–781. [Google Scholar] [CrossRef]
  52. Johnson, J.D.; Li, J.; Chen, Z. Reinforcement Learning: An Introduction: R.S. Sutton, A.G. Barto, MIT Press, Cambridge, MA 1998, 322 pp. ISBN 0-262-19398-1. Neurocomputing 2000, 35, 205–206. [Google Scholar] [CrossRef]
  53. Train DQN Agent to Swing Up and Balance Pendulum—MATLAB & Simulink. Available online: https://www.mathworks.com/help/reinforcement-learning/ug/train-dqn-agent-to-swing-up-and-balance-pendulum.html (accessed on 14 May 2022).
Figure 1. Scheme of ALHB.
Figure 2. Verification of the model by comparison of predictions to the experimental data: (a) from [48] on hydrodynamic pressure distribution; (b) from [49] on hydrostatic pressure distribution.
Figure 3. Predicted shaft loci compared to the experimental data [50].
Figure 4. Models' predictions compared to the experimental data: (a) on lubricant flow rate; (b) on friction torque.
Figure 5. General pipeline of development of the DQN controller for the ALHB.
Figure 6. Relationship between the ALHB energy parameters and the shaft equilibrium position set by a controller: (a) power to overcome bearing friction; (b) power for pumping lubricant through the bearing.
Figure 7. The structure of the simulation model in Simulink.
Figure 8. The structure of the convolutional ANN used as a critic network.
Figure 9. Results of testing the DQN controller: (a) power losses N_fr and N_p; (b) summary power losses; (c) control signals; (d) trajectory of the shaft center.
Figure 10. Results of testing the improved DQN controller: (a) power losses N_fr and N_p; (b) summary power losses; (c) control signals; (d) trajectory of the shaft center.
Figure 11. The testing results of the stable DQN model: (a) trajectories within the permissible zone; (b) trajectories outside the permissible zone.
Table 1. Parameters of the rotor-bearing system.
Rotor parameters:
Rotor diameter r = 39.84 mm
Rotor mass m_r = 24 kg
Rotation speed n = 1500 rpm
Lubricant type: water
Lubricant temperature T = 30 °C
Lubricant dynamic viscosity µ = 1.13 mPa·s
Lubricant density ρ = 1000 kg/m³
Supply pressure (initial) p_0 = 0.2 MPa
Supply pressure (maximum) p_max = 2 MPa
Bearing parameters:
Bearing length L = 40 mm
Bearing ratio L/D = 1
Bearing clearance h_0 = 80 µm
Hydrostatic pocket width W_p = 14 mm
Hydrostatic pocket length L_p = 18 mm
Restrictor diameter d_H = 1 mm
Restrictor length l_H = 11 mm
Sommerfeld number So = 0.025
Initial eccentricity e_0 ≈ 0.5
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Shutin, D.; Kazakov, Y. Multi-Objective Deep Q-Network Control for Actively Lubricated Bearings. Lubricants 2024, 12, 242. https://doi.org/10.3390/lubricants12070242
