Application of Deep Reinforcement Learning to Predict Shaft Deformation Considering Hull Deformation of Medium-Sized Oil/Chemical Tanker

Abstract: The enlargement of ships has increased the relative hull deformation owing to draft changes. Moreover, design changes such as an increased propeller diameter and pitch changes have occurred to compensate for the reduction in the engine revolution and consequent ship speed. In terms of propulsion shaft alignment, as the load of the stern tube support bearing increases, an uneven load distribution occurs between the shaft support bearings, leading to stern accidents. To prevent such accidents and to ensure shaft system stability, a shaft system design technique is required in which the shaft deformation resulting from the hull deformation is considered. Based on the measurement data of a medium-sized oil/chemical tanker, this study presents a novel approach to predicting the shaft deformation following stern hull deformation through inverse analysis using deep reinforcement learning, as opposed to traditional prediction techniques. The main bearing reaction force, which was difficult to reflect in previous studies, was predicted with high accuracy by comparing it with the measured value, and reasonable shaft deformation could be derived according to the hull deformation. The deep reinforcement learning technique in this study is expected to be expandable for predicting the dynamic behavior of the shaft of an operating vessel.


Introduction
Since the 2000s, hull stiffness has been reduced owing to the enlargement of ships, which has been achieved along with the growth of the global economy. Moreover, significant hull deformation occurs as a result of the relatively large loading changes in the aft peak tank (APT) and draft. As the stiffness of the shaft increases owing to the increased engine power, the hull deformation causes the reaction force of the bearings that support the shaft to be distributed more unevenly. Furthermore, the ultra-long stroke engine has recently been introduced for energy efficiency. To compensate for the decrease in the engine revolution caused by the long stroke and the consequent loss of ship speed, the propeller diameter has increased and the pitch has changed; the resulting heavier propeller further increases the stern load [1]. Owing to the above factors, the hull deformation causes shaft deformation, which results in shaft alignment accidents such as heat generation in the stern tube bearings, abnormal wear, and damage [2]. Such damage occurs when an excessive local load is concentrated on specific bearings while the adjacent bearings carry no load, resulting in an uneven distribution of the bearing loads. Thus, it is necessary to ensure shaft system stability by performing shaft alignment with appropriate bearing load distribution considering the hull deformation [3][4][5][6].
Many studies have been conducted to evaluate the shaft stability considering the shaft deformation owing to hull deformation. Representative examples include studies that predicted the shaft deformation through finite element (FE) analysis of a model of the stern structure [7][8][9] and studies that predicted the shaft deformation by applying measured data to various inverse-analysis techniques [3,10,11]. Choung and Choe [7] generated FE models for shaft systems, double bottoms, and main engines to predict the hull deformation and shaft deformation. Korbetis et al. [8] performed global structural analysis with a finely modeled stern of a 320 K DWT VLCC and predicted the shaft deformation according to the loading conditions by applying the lumped weight of the main engine, without separate modeling of the shaft and engine. Moreover, Seo et al. [9] performed global structural analysis with a finely modeled stern of a 300 K DWT VLCC and predicted the shaft deformation according to the draft by modeling the shaft, engine, and engine foundation separately. However, in studies on stern structural analysis using the FE method, significant deviation of the main bearing (MB) reaction force occurred in the shaft deformation prediction according to the draft, and it was difficult to consider the actual ship conditions and load distribution of the engine room equipment.
Rao et al. [10] generated the reaction force influence number (RIN) matrix from the bearing reaction force that was calculated by the bending moment measured with a strain gauge and applied inverse analysis to predict the shaft deformation. Šverko [3] predicted the shaft deformation using a genetic algorithm from the bearing reaction force that was measured by the jack-up method and the bending moment that was measured by a strain gauge. Lee [11] predicted the shaft deformation using a trial-and-error method according to the bending moment that was measured with a strain gauge. However, the techniques used in these inverse-analysis studies were unable to make predictions that sufficiently reflected the measurements, because the bearing reaction force is highly sensitive to shaft deformation: a large reaction force is generated even with a minor deformation of the shaft.
Compared to stern structural analysis using the FE method, in the inverse analysis method using the measured data, the measured bearing reaction force and bending moment can be used to predict the shaft deformation, thereby reflecting the hull deformation that is caused by the actual draft conditions of the ship, the load distribution in the engine room, and the APT loading. However, it is also difficult to predict the shaft deformation considering the reaction force of the MBs. Therefore, it is necessary to develop a technique for the inverse analysis of shaft deformation that can sufficiently reflect the MB reaction force.
Recent advances in computing performance have contributed to active research using deep neural networks and reinforcement learning (RL) in various fields. The structure of the deep neural network mimics the human brain and neural network. This technique can perform complex nonlinear analysis that is difficult to achieve with an equation for variables without prior definition [12]. RL changes to the "next state" by selecting an "action" from the "state" based on the content learned through trial and error to obtain a "reward," and an action or action pattern is selected that maximizes the sum of the rewards. Deep RL, which combines RL and deep neural networks, has been successfully applied to various problems, such as control, recommendation, optimization, and detection [13]. Uyanik et al. [14] predicted the fuel consumption of container ships using various machine learning techniques. Cheliotis et al. [15] detected abnormal symptoms of ships using machine learning. Kim et al. [16] predicted the fuel consumption rate of container ships using a deep neural network and presented the optimal operating conditions accordingly. Song et al. [17] conducted a study on the detection of ships through synthetic aperture radar (SAR) images and automatic identification system (AIS) information using a convolutional neural network. Karvelis et al. [18] studied structural health monitoring by attaching a sensor to the hull structure and using an autoencoder. Scardua et al. [19] proposed optimal ship unloader control by applying an artificial neural network to RL. Fu et al. [20] investigated the detection of the rotation of a berthing ship using RL. Zhao and Rho [21] conducted a study on the collision avoidance of ships according to COLREG using deep RL. As described above, various studies have been carried out on the successful application of deep neural networks and deep RL to ships. 
However, no research to date has applied deep RL to the shaft system of a ship, and it is expected that the use of deep RL in the inverse analysis of the measured data will be suitable for the shaft deformation problem.
In this study, measured values of the shaft under various draft conditions were applied as input variables to the deep-RL-based algorithm, with the output being the prediction of shaft deformation features that sufficiently reflect measured values. First, the supporting bearing reaction force of the shaft system and the shaft bending moment of the target ship were measured according to the draft. This was followed by creating an FE model of the shaft system of the target ship to check the fit of the measured and calculated values. Thereafter, a deep RL algorithm was modeled for use in inverse analysis, and the shaft deformation was predicted through an inverse analysis algorithm by applying deep RL using the measured values. Subsequently, the trends in the predicted shaft deformation were identified and substituted into the shaft model to evaluate how effectively the measured values could be reflected. Finally, the results were compared with those of shaft deformation prediction using existing research techniques. The purpose of this study was to determine the academic significance of the shaft deformation prediction of inverse analysis by applying deep RL.

Materials and Methods

Target Ship Specifications
The target ship of this study was a 50,000 DWT medium-sized oil/chemical tanker that was equipped with a highly energy-efficient long-stroke engine. The specifications of the target ship are presented in Table 1, and the specifications of its shaft system are listed in Table 2.

As illustrated in Figure 2, the forward stern tube bearings were omitted in the propulsion shaft system of the target ship. The aft stern tube bearing (ASTB), intermediate bearing (IB), and eight MBs were aligned with the main engine crank, which was installed linearly.
Typical methods for measuring the shaft reaction force include the jack-up method, which measures the bearing reaction force directly using a hydraulic jack, and the strain gauge method, which calculates the reaction force by obtaining the bending moment from the strain that is measured by attaching a strain gauge to the shaft and applying the moment equilibrium equation. In the jack-up method, the reaction force of the bearing is measured directly by installing a hydraulic jack near the bearing position, as indicated in Figure 3.
The reaction force of the IB is measured by installing a hydraulic jack on the foundation, as illustrated in Figure 3a. The reaction force of the MBs excluding the aftmost MB is determined by turning the crank arm to a horizontal position in the direction of the exhaust pipe, as indicated in Figure 3b, following which measurement is performed by installing a hydraulic jack on the crank arm that is adjacent to the measured bearing. The reaction force of the aftmost MB is measured by installing a rigid steel beam under the hydraulic jack and a steel bar between the flywheel and hydraulic jack, as illustrated in Figure 3c. However, as the stern tube bearing is located in the stern tube, making it difficult to access, it is impossible to install a hydraulic jack at the bearing position.
J. Mar. Sci. Eng. 2021, 9
The advantages of the jack-up method are that it is simple, and measurement can be performed using only a hydraulic jack and a dial indicator. Furthermore, it is the only method in which the reaction force is measured directly. However, the disadvantages are that it is not possible to measure the bearing reaction force against the rotating shaft, it is not possible to measure the bearing reaction force against the shaft during a ship voyage, the same hydraulic jack must be used for repeated measurements, and the measured value may exhibit errors depending on the installation degree [22].

Bending Moment Measurement Using Strain Gauge
The strain gauge method provides the moment and reaction force for the shaft system alignment using the flexural beam theory. The Wheatstone bridge circuit connection used in the strain gauge method is divided into one gauge (quarter gauge), two gauges (half gauge), and four gauges (full gauge) according to the number of attached gauges. In this study, the two-gauge method was used in consideration of the utility versus installation time. The strain gauge used was the WFLA-3-11-L1 manufactured by TML (Tokyo Sokki Kenkyujo). As illustrated in Figure 4a,c,d the two gauges, spaced 180 degrees above and below the shaft surface, were attached with a Wheatstone bridge connection, as indicated in Figure 4b. The resistance value of the strain gauge changes proportionally to the vertical strain of the shaft that is caused by the rotation of the shaft, and in this manner, a variable output voltage compared to the input voltage can be obtained. The strain in the strain gauge can be determined from the initial resistance and the amount of change in the resistance using Equation (1).

$$\varepsilon = \frac{\Delta R / R}{k} \tag{1}$$
where $\varepsilon$ is the strain, $R$ is the initial resistance measured by the strain gauge, $\Delta R$ is the amount of change in the resistance measured by the strain gauge, and $k$ is the strain gauge coefficient.
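The measurement chain can be illustrated with a minimal Python sketch that converts a resistance change to strain as in Equation (1) and then carries the result on to a bending stress and a bending moment. The half-bridge stress relation (half the difference between the lower and upper gauge strains, multiplied by the Young's modulus) and the section modulus of a hollow circular shaft are the usual textbook forms and are assumed here. All function names and numerical values (gauge resistance, gauge coefficient, shaft diameter, Young's modulus) are hypothetical and are not taken from the target ship.

```python
import math


def strain_from_resistance(delta_r, r0, gauge_factor):
    """Equation (1): strain from the relative resistance change."""
    return (delta_r / r0) / gauge_factor


def bending_stress(strain_bottom, strain_top, youngs_modulus):
    """Assumed half-bridge relation: E times half the bottom/top strain difference."""
    return youngs_modulus * (strain_bottom - strain_top) / 2.0


def bending_moment(stress, d_outer, d_inner=0.0):
    """Beam theory for a (hollow) circular shaft: M = sigma * section modulus."""
    section_modulus = math.pi * (d_outer**4 - d_inner**4) / (32.0 * d_outer)
    return stress * section_modulus


# Hypothetical inputs, chosen only to exercise the functions.
eps_b = strain_from_resistance(delta_r=0.0252, r0=120.0, gauge_factor=2.1)
eps_t = -eps_b                                   # pure bending: equal and opposite strain
sigma = bending_stress(eps_b, eps_t, youngs_modulus=206e9)   # Pa
moment = bending_moment(sigma, d_outer=0.5)                  # N*m, solid shaft

print(f"strain = {eps_b:.3e}, stress = {sigma:.3e} Pa, moment = {moment:.3e} N*m")
```

Keeping the three steps as separate functions makes it easy to swap in a different bridge configuration without touching the beam-theory step.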
Using the strain obtained from Equation (1), Equation (2) is used to calculate the bending stress in the two-gauge method, as illustrated in Figure 4a,b.
$$\sigma_b = E \, \frac{\varepsilon_b - \varepsilon_t}{2} \tag{2}$$
where $\sigma_b$ is the bending stress, $\varepsilon_b$ is the strain amplitude of the strain gauge that is installed in the lower part, $\varepsilon_t$ is the strain amplitude of the strain gauge that is installed in the upper part, and $E$ is the Young's modulus. According to the bending stress obtained in Equation (2), the bending moment is obtained from the relational equation for the beam theory, as per Equation (3):
$$M_b = \sigma_b \, \frac{\pi \left( D_o^4 - D_i^4 \right)}{32 D_o} \tag{3}$$
where $M_b$ is the bending moment, $D_o$ is the outer diameter of the shaft, and $D_i$ is the inner diameter of the shaft.
The advantage of the strain gauge method is that the reaction force is calculated using the moment equilibrium equation. Thus, the reaction force of the stern tube bearing that is located at the end of the stern tube can be calculated, and the reaction force can be measured while the shaft is rotating during the ship voyage. However, the disadvantages are that, in the case of the MB, it is impossible to install the gauge inside the engine, so the reaction force cannot be calculated; the installation time is long; and the equipment is sensitive and expensive [22].

Measurement Results
Table 3 displays the fore and aft drafts under the measured draft conditions for the reaction force and bending moment of the target ship.
The reaction force of the target vessel was measured using the jack-up method with a hydraulic jack while the engine was stopped under the conditions displayed in Table 3. Table 4 lists the reaction force measurement results of the IB and the MBs measured under each draft condition.
In this study, strain gauges were installed at seven positions, as illustrated in Figure 6, the exact positions of which are detailed in Table 5. The strain was measured by rotating the shaft from one to two turns at a low speed of 2-5 rpm using a turning gear. Table 6 presents the bending moment measurement results of the measured strain gauges.
The reaction force of the ASTB could not be measured by installing a hydraulic jack. Thus, the reaction force of the ASTB ($R_{ASTB}$), effective support point of the ASTB ($B$), no. 7 strain gauge attachment position ($A$), and propeller weight ($W_p$) were simply expressed as shown in Figure 7. On this basis, the moment equilibrium equation in Equation (4) was established, and an equation for the ASTB reaction force was established as in Equation (5).
In Equations (4) and (5), $M_A$ is the bending moment measured at the no. 7 strain gauge and $R_{ASTB}$ is the reaction force of the ASTB. The remaining terms are the distance from the effective support point of the ASTB ($B$) to the attachment position of the no. 7 strain gauge ($A$), the load of the propeller ($W_p$), the distance from the support point of the propeller to the attachment position of the no. 7 strain gauge ($A$), and the bending moment and weight of the shaft from the effective support point of the stern tube bearing to the attachment position of the no. 7 strain gauge ($A$).
Table 7 displays the results of the reaction force of the ASTB calculated under the draft conditions in Table 3.
As illustrated in Figure 8, the deep neural network consists of an input layer that receives input variables, a hidden layer that receives variables from the input layer and performs calculations, and an output layer that receives the results from the hidden layer and outputs them. Such a network is known as an artificial neural network because weights and biases are applied as per Equation (6) as it progresses from a node constituting a layer to a node in the next layer, and it is transmitted to the next layer through an activation function inside the hidden layer. An artificial neural network in which many layers are stacked is referred to as a deep neural network [23].
$$[i_1, i_2] \times \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix} + [b_{h1}, b_{h2}] = [H_1, H_2] \tag{6}$$
where $i_1$ and $i_2$ are the nodes of the input layer; $w_{11}$, $w_{12}$, $w_{21}$, and $w_{22}$ are the weights that are applied to the nodes when transmitting from the input layer to the hidden layer; $b_{h1}$ and $b_{h2}$ are the biases that are applied to the nodes; and $H_1$ and $H_2$ are the values of the nodes of the hidden layer.
In a deep neural network, the process of calculating all of the given data from the input layer to the output layer, as indicated in Figure 9, is known as the "epoch," and the function representing the error between the target value and output value after the end of the epoch is referred to as a "loss function." In deep neural network learning, this error is calculated in the opposite direction to that of the neural network to reduce the loss function, which is known as "back propagation." As this process is performed repeatedly through the optimization technique, the weight and bias of the node are updated and a value that is close to the target value is output.
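The layer computation and the loss described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the row-vector convention (inputs multiplied by a weight matrix, plus biases), the ReLU activation, the mean squared error loss, and all numerical values are chosen for the example and are not the settings used in this study.

```python
def dense_forward(inputs, weights, biases):
    """Return inputs x weights + biases for one layer (row-vector convention)."""
    return [
        sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + biases[j]
        for j in range(len(biases))
    ]


def relu(values):
    """Assumed activation function; the source does not name the one it uses."""
    return [max(0.0, v) for v in values]


def mse_loss(outputs, targets):
    """Mean squared error between the network output and the target values."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


# Hypothetical two-input, two-hidden-node layer.
i = [1.0, 2.0]
w = [[0.5, -1.0],
     [0.25, 0.75]]
b_h = [0.1, -0.2]

h_pre = dense_forward(i, w, b_h)   # weighted sums before the activation
h = relu(h_pre)                    # hidden-node values
loss = mse_loss(h, targets=[1.0, 0.0])

print(h_pre, h, loss)
```

Back propagation would then adjust `w` and `b_h` in the direction that lowers `loss`; that step is omitted here to keep the sketch short.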

RL
RL is a technique that was developed based on the dynamic programming method proposed by Bellman [24]. It can be categorized as value-based RL and policy-based RL depending on whether an action or a policy is selected to maximize the value. Value-based RL was applied in this study because the purpose was to determine the optimal value, and not the policy to determine the optimal value. Q-learning, which is a representative algorithm of value-based RL [25], performs an action ($a_t$) in the current state ($S_t$) of the $t$-th episode, as indicated in Equation (7) and Figure 10. The value ($Q$) is updated through the weighted sum of the maximum value in the next state ($\max_{a_{t+1}} Q(S_{t+1}, a_{t+1})$) and the previous value ($Q(S_t, a_t)$) that is calculated by receiving rewards ($R_t$) according to the next state ($S_{t+1}$).
$$Q(S_t, a_t) \leftarrow Q(S_t, a_t) + \alpha \left[ R_t + \gamma \max_{a_{t+1}} Q(S_{t+1}, a_{t+1}) - Q(S_t, a_t) \right] \tag{7}$$
where $Q(S_t, a_t)$ is the value when an action ($a_t$) is performed in a state ($S_t$), $\alpha$ is the learning rate indicating the update degree, $R_t$ is the reward, and $\gamma$ is the discount rate for the future reward. Furthermore, $\max_{a_{t+1}} Q(S_{t+1}, a_{t+1})$ is the greatest value obtained when an action ($a_{t+1}$) is selected in the next state ($S_{t+1}$).
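The Q-learning update can be written as a short tabular sketch. The state and action labels, the learning rate, the discount rate, and the reward are hypothetical values picked so that the arithmetic is easy to follow; they are not the hyperparameters of this study.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update and return the new Q(state, action)."""
    best_next = max(q[(next_state, a)] for a in actions)   # greatest value in the next state
    td_target = reward + gamma * best_next                 # reward plus discounted future value
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action problem; unseen (state, action) pairs default to 0.
q = defaultdict(float)
actions = ["raise", "lower"]
q[("s1", "raise")] = 2.0

new_value = q_update(q, "s0", "raise", reward=1.0, next_state="s1", actions=actions)
print(new_value)
```

With these numbers the target is 1.0 + 0.9 x 2.0 = 2.8, and the stored value moves one tenth of the way from 0 toward it.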

Deep RL
Deep RL uses a deep neural network to determine a behavior in RL and performs back propagation to reduce the loss function of the Q-value for the behavior. The deep Q-learning network is a method that combines Q-learning and deep neural networks [27]. To prevent overfitting of the result, this algorithm applies an $\varepsilon$-greedy policy, as in Equation (8): a random action is selected with a probability of $\varepsilon$, and the action ($a^*$) that takes the maximum value ($\max_a Q(s, a)$) is selected with a probability of $1 - \varepsilon$ [13].
$$a = \begin{cases} \text{random action} & \text{with probability } \varepsilon \\ a^* = \arg\max_a Q(s, a) & \text{with probability } 1 - \varepsilon \end{cases} \tag{8}$$
Moreover, the experience replay memory [28] is used to optimize the action so that the action is not affected by the old episode result, thereby enabling rapid operation by storing the experience result up to the memory size after the end of the episode. Figure 11 depicts the procedure of the deep RL algorithm model for predicting the shaft deformation from the reaction force and bending moment measurements used in this study.
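The two mechanisms described above, $\varepsilon$-greedy action selection and a bounded experience replay memory, can be sketched with the Python standard library. The names `epsilon_greedy` and `ReplayMemory`, the stored tuple layout, and the capacity and batch size are assumptions made for this illustration rather than details of the algorithm used in this study.

```python
import random
from collections import deque


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action index with probability epsilon, else the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


class ReplayMemory:
    """Fixed-size store of past experiences; the oldest entries are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        """Draw a random batch without replacement for one network update."""
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)


# Hypothetical usage: three actions, memory limited to three experiences.
greedy_action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
memory = ReplayMemory(capacity=3)
for step in range(5):
    memory.push(state=step, action=greedy_action, reward=0.0, next_state=step + 1)
batch = memory.sample(batch_size=2)
```

Sampling at random from the memory, instead of always using the latest episode, is what decouples an update from the most recent experience.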

Deep RL
Deep RL uses a deep neural network to determine a behavior in RL and performs backpropagation to reduce the loss function of the Q-value for the behavior. The deep Q-learning network is a method that combines Q-learning and deep neural networks [27]. To prevent overfitting of the result, this algorithm applies an $\epsilon$-greedy policy, as in Equation (8): a random action is selected with a probability of $\epsilon$, and the action ($a^{*}$) that takes the maximum value ($\max_{a} Q(s, a)$) is selected with a probability of $1 - \epsilon$ [13].
$$a = \begin{cases} \text{random action}, & \text{with probability } \epsilon \\ a^{*} = \arg\max_{a} Q(s, a), & \text{with probability } 1 - \epsilon \end{cases} \quad (8)$$
Moreover, an experience replay memory [28] stores the experience results, up to the memory size, after the end of each episode. This enables rapid operation and keeps the optimized action from being affected by old episode results. Figure 11 depicts the procedure of the deep RL algorithm model for predicting the shaft deformation from the reaction force and bending moment measurements used in this study. The Episode (red solid line) selects "Action (2)" from "State (1)" and calculates "Next state (3)." If the calculated "Next state" meets "Constraint (4)," the Episode ends and "State," "Action," "Next state," and "Value (5)" are stored in "Memory (6)." If the constraint is not satisfied, "Action (2)" is executed again. The Epoch (blue dotted line) saves the result from the completed Episode, initializes "State (1)," and suggests the optimal result when all epochs are completed. The Update (green dotted line) updates "Action (2)" by sampling a batch from the memory after the episode ends and performing backpropagation to reduce the loss function.
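The $\epsilon$-greedy selection and the experience replay memory described above can be sketched as follows. This is only an illustrative sketch: it assumes a discrete list of Q-values for simplicity, whereas the network in this study outputs displacement variables, and the names `epsilon_greedy` and `ReplayMemory` are hypothetical.

```python
import random
from collections import deque

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action index with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=lambda i: q_values[i])   # exploit

class ReplayMemory:
    """Fixed-size memory: the oldest experience is dropped when full."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, next_state, value):
        self.buffer.append((state, action, next_state, value))

    def sample(self, batch_size, rng=random):
        # Sample without replacement; never ask for more than is stored.
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)

# Hypothetical usage with a memory of size 2.
memory = ReplayMemory(capacity=2)
for step in range(3):
    memory.push(state=step, action=0, next_state=step + 1, value=-float(step))
greedy_action = epsilon_greedy([0.1, 0.7, 0.3], epsilon=0.0)
```

Using a bounded `deque` means that old episode results leave the memory automatically, which matches the intent that the update should not depend on outdated experience.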

Shaft FE Modeling
To calculate the shaft reaction force and bending moment according to the offset, the displacement of the aft stern tube bearing was used as the reference point, and the shaft was divided into nodes at the shaft diameter change points, load action locations, strain gauge attachment locations, and bearing locations. The element details displayed in Table 8 were applied, and the shaft was modeled with the MSC software PATRAN, as illustrated in Figure 12. Table 9 presents the applied loads from the drawing (bonnet and propeller) and the loads recommended by the manufacturer (flywheel, chain force, and crank) acting on the shaft. Table 10 displays the boundary conditions of the nodes applied to the shaft model, and Table 11 presents the density information for calculating the density of the shaft in contact with seawater, lubricant, and air.
In this study, a deep RL algorithm was modeled in which a deep neural network was applied to a Q-learning algorithm for inverse analysis of the shaft deformation. Table 12 presents the hyperparameter values applied in this algorithm. The detailed procedure of the deep RL algorithm for applying the shaft deformation, as illustrated in Figure 8, is outlined as follows:

1. State: The state consists of a total of 12 variables, which are the reaction force of each bearing (ASTB, IB, MB8, MB7, and MB6) and the bending moment at each strain gauge attachment point (SG1-7). According to the measurement method, the variables constituting the state can be divided into three types: the directly measured reaction force (IB, MB8, MB7, and MB6), the directly measured bending moment (SG1-7), and the reaction force indirectly calculated from the bending moment using the moment equilibrium equation (ASTB).

2. Action: Based on the previous state, three vertical displacement variables (IB, MB8, and MB3) are output by the deep neural network, as illustrated in Figure 13. The displacement of the ASTB, which is not computed by the deep neural network, is 0 because it is the reference point of the shaft, and the main bearings MB7 to MB4 are linearly aligned. Thus, a total of eight vertical displacements (ASTB, IB, MB8, MB7, MB6, MB5, MB4, and MB3) are obtained by linear interpolation between MB8 and MB3 and applied as inputs of the next state.

3. Next state: The next state is the bearing reaction force and the bending moment at the strain gauge points, obtained by applying the vertical displacement of each bearing (action) to the shaft model and implementing the FE method.

4. Constraint: The constraint is the allowable condition of each bearing according to the classification regulations and manufacturer standard [29][30][31], as expressed in Equation (9). In Equation (9), $\theta_{ASTB}$ is the relative slope angle of the ASTB, $P_{min}$ is the minimum allowable bearing reaction force, $P_{max}$ is the maximum allowable bearing reaction force, $P_{b}$ is the bearing reaction force, $l$ is the bearing position, $\delta_{MB}$ is the MB displacement, $\delta_{IB}$ is the IB displacement, and $\delta_{MB_{aft}}$ is the aftmost MB displacement. According to the classification rules, as illustrated in Figure 14, the relative slope angle of the support point of the ASTB should be less than 0.3 mrad, and it is recommended that the surface pressure of the ASTB and IB does not exceed the allowable value [29][30][31]. Table 13 presents the allowable reaction force of the ASTB and the IB of the target vessel, reflecting the classification rules. Furthermore, the manufacturer of the main engine of the target ship recommends that the surface pressure of the MBs does not exceed the allowable value. Table 14 presents the allowable reaction force of the MBs, reflecting the recommendations of the main engine manufacturer. According to the main engine manufacturer standard, the displacements of the MBs should be arranged linearly. Moreover, the displacement of the aftmost MB should be located lower than the displacement of the IB [32].

5. Value: The value guides the learning, over repeated epochs, toward actions that give the smallest difference from the actual measured values within the minimum number of episodes. If deep RL is performed using the measured values as they are, the bending moment, which has a relatively large number of variables, will have the greatest effect. However, as the criterion in the classification rules is the allowable reaction force of each bearing, deep RL was performed by applying a weight to the bearing reaction force [29][30][31]. In this study, the state variables were divided into three types: the directly measured reaction force, the directly measured bending moment, and the indirectly calculated reaction force. As per Equation (10), weights of 0.55 for the directly measured bearing reaction force, 0.40 for the directly measured strain gauge bending moment, and 0.05 for the indirectly calculated bearing reaction force were applied to increase the priority of the jack-up reaction force during the deep RL. In Equation (10), $R_{JU}$ and $R_{JU,measured}$ are the predicted and measured reaction forces of IB, MB8, MB7, and MB6, $M_{SG}$ and $M_{measurement}$ are the predicted and measured bending moments of strain gauges 1-7, and $R_{ASTB}$ and $R_{ASTB,measured}$ are the predicted and calculated reaction forces of the aft stern tube bearing. The goal of deep RL is to maximize the value, whereas a smaller weighted sum in Equation (10) means that the predicted shaft deformation, including the reaction force and bending moment, is closer to the actual values. Therefore, the weighted sum is multiplied by −1 so that maximizing the value yields a shaft deformation whose reaction force and bending moment are close to the actual values.

6. Memory: The memory stores the state, action, next state, and value at the end of each episode, up to the maximum memory size. A batch sampled from the memory is then used to perform backpropagation to obtain the action that maximizes the value, so that the update is focused on the most recent values.
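The action, constraint, and value steps of the procedure above can be sketched as follows. This is a sketch under several assumptions, not the implementation used in this study: the axial bearing positions are hypothetical, the slope angle and reaction forces are assumed to be supplied by the FE model, and the difference measure inside the value (mean absolute difference) is assumed because only the weights 0.55, 0.40, and 0.05 and the final multiplication by −1 are stated in the text. All function names are hypothetical.

```python
import numpy as np

BEARINGS = ["ASTB", "IB", "MB8", "MB7", "MB6", "MB5", "MB4", "MB3"]

def expand_action(d_ib, d_mb8, d_mb3, mb_positions):
    """Build the eight vertical displacements from the three network outputs.

    mb_positions: axial positions of MB8..MB3 (six values, hypothetical here).
    ASTB is the reference point (0); MB7..MB4 are interpolated linearly.
    """
    x = np.asarray(mb_positions, dtype=float)
    mb = d_mb8 + (d_mb3 - d_mb8) * (x - x[0]) / (x[-1] - x[0])
    return np.concatenate(([0.0, d_ib], mb))

def satisfies_constraints(theta_astb_mrad, loads, p_min, p_max, offsets):
    """Check slope limit, allowable load band, and aftmost MB below IB."""
    loads, p_min, p_max = (np.asarray(v, dtype=float) for v in (loads, p_min, p_max))
    slope_ok = abs(theta_astb_mrad) < 0.3                  # limit stated in the text
    load_ok = bool(np.all((loads >= p_min) & (loads <= p_max)))
    order_ok = bool(offsets[2] < offsets[1])               # MB8 lower than IB
    return slope_ok and load_ok and order_ok

def value(r_ju, r_ju_meas, m_sg, m_sg_meas, r_astb, r_astb_meas):
    """Negative weighted sum of differences (difference measure is assumed)."""
    d_ju = np.mean(np.abs(np.asarray(r_ju, float) - np.asarray(r_ju_meas, float)))
    d_sg = np.mean(np.abs(np.asarray(m_sg, float) - np.asarray(m_sg_meas, float)))
    d_astb = abs(r_astb - r_astb_meas)
    return -(0.55 * d_ju + 0.40 * d_sg + 0.05 * d_astb)

# Hypothetical example: equally spaced main bearings and made-up numbers.
offsets = expand_action(-1.0, -3.0, -2.0, mb_positions=[0, 1, 2, 3, 4, 5])
ok = satisfies_constraints(0.2, loads=[50, 60], p_min=[0, 0], p_max=[100, 100],
                           offsets=offsets)
v = value([1, 1, 1, 1], [0, 0, 0, 0], [2] * 7, [0] * 7, 3.0, 0.0)
```

Averaging within each group before weighting is one way to keep the seven bending moment variables from dominating the four jack-up reaction forces, which is the concern raised in the description of the value.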

Light Draft Condition (D1)
To show the trends of the shaft deformation predictions obtained by the deep RL inverse analysis under each draft condition, four quantities are presented for the light draft condition (D1): (a) the "offset," which is the shaft deformation predicted by deep RL on the basis of the measured values; (b) the "load" and (c) the "bending moment," which are the reaction force and bending moment calculated by applying the predicted shaft deformation to the finite element model; and (d) the "value," which is the difference between the predicted and measured values, where a smaller magnitude indicates a prediction closer to the measured value. The results are ranked according to the "value" and presented for all cases (Figure 15), the top 100 cases (Figure 16), and the top 10 cases (Figure 17).
According to Figures 15-17, which present the RL results of the light draft condition D1, the reaction force converged to a certain range as the results were filtered from all cases to the top 10 cases. The values of the top 10 cases (Figure 17d) ranged from −1.23 to −1.22, with no significant difference among the 10 cases, but the shaft deformation prediction (Figure 17a) did not converge beyond a certain level. In practice, various margins of error may occur in the shaft assembly stage, depending on the sag tolerance of the engine bed plate, the deviation of the center of the MBs, and the tendency of the operator, when the shaft installation is inclined compared to the design [12]. Given these characteristics of the shaft, the fact that the predicted reaction force and bending moment converged to similar values while filtering from all cases to the top 10 cases means that all shaft deformation predictions of the top 10 cases were valid. It is believed that presenting the corresponding range of shaft deformation predictions can aid the operator in decision-making during the process.
The shaft deformation, reaction force, bending moment predictions, and values of the top 10 cases according to the values in draft conditions D2-D5 are presented in the following.
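The filtering from all cases to the top cases by "value" can be sketched as follows. The record layout, the function names, and the numbers are hypothetical and serve only to show the ranking step; they are not results of this study.

```python
def top_cases(cases, n):
    """Return the n cases with the highest value (closest to the measurement)."""
    return sorted(cases, key=lambda c: c["value"], reverse=True)[:n]

def offset_range(cases, bearing):
    """Return the (min, max) predicted offset of one bearing over the given cases."""
    offsets = [c["offset"][bearing] for c in cases]
    return min(offsets), max(offsets)

# Hypothetical stored results: value and predicted IB offset in mm.
cases = [
    {"value": -3.10, "offset": {"IB": -1.5}},
    {"value": -1.22, "offset": {"IB": -0.6}},
    {"value": -1.23, "offset": {"IB": -0.8}},
]
best = top_cases(cases, 2)
ib_range = offset_range(best, "IB")
```

Reporting the offset range of the retained cases, rather than a single best case, corresponds to presenting a range of valid shaft deformation predictions to the operator.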

Ballast Draft Conditions (D2 and D3)
The RL results of the ballast conditions D2 and D3 are depicted in Figures 18 and 19, respectively. As in D1, the predicted values of the reaction force (Figures 18b and 19b) and bending moment (Figures 18c and 19c) in the top 10 cases converged to a certain range. The values of the 10 cases are presented in Figures 18d and 19d. The shaft deformation of D3 with the APT full (Figure 19a; IB: −0.2 to −0.8 mm; MB8: −1.6 to −3.2 mm; MB3: 0.2 to −1.9 mm) was increased compared to the shaft deformation of D2 with the APT empty (Figure 18a; IB: −0.6 to −1.7 mm; MB8: −3.0 to −5.3 mm; MB3: −1.8 to −4.5 mm). As illustrated in Figure 5, it was estimated that, when the APT located at the stern of the target ship was full, the load on the rear part of the shaft increased, so that the shaft tended to rise relative to the ASTB.

Scantling Draft Conditions (D4 and D5)
The RL results of the scantling conditions D4 and D5 are depicted in Figures 20 and 21, respectively. As with the other results, the predicted values of the bending moment (Figures 20c and 21c) in the top 10 cases converged to a certain range. The values of the 10 cases (Figures 20d and 21d) were from −8.75 to −8.60 in D4 and from −5.95 to −5.65 in D5, and there was no significant difference among the values of the 10 cases. However, the predicted reaction force of the MBs according to the shaft deformation did not converge as it did at the other drafts, and the value was estimated to be larger than the values at other drafts.

As in the ballast condition illustrated in Figure 5, it was estimated that, when the APT located at the stern of the target ship was full, the load on the rear part of the shaft increased, so that the shaft tended to rise relative to the ASTB.
However, in the ballast condition (Figure 5c), the APT was located above the water surface and its load was applied to the rear of the shaft, whereas in the scantling condition (Figure 5e), the APT was mostly immersed below the water surface, so the load acting on the rear of the shaft was estimated to have only a small effect on the shaft deformation.
As a result of deep RL, the predicted reaction force, bending moment, and value converged to an almost fixed range in the top 10 cases. As indicated in Table 15, the predicted shaft deformation varied among the top 10 cases, but overall, the predicted shaft deformation tended to increase as the load progressed from a light load to a full load draft and when the aft peak tank changed from empty to full. Furthermore, the change in the shaft deformation owing to the loading of the APT was larger in the ballast condition than in the scantling condition.

Comparison of Predicted Results and Measurements
In the shaft deformation prediction using the deep RL described in Section 3.1, the reaction force and bending moment converged within a certain range in the top 10 cases. To compare these results with the actual measured values, the predicted values of the case with the best value were extracted and analyzed, as illustrated in Figures 22 and 23.

Comparison of Predicted Bearing Reaction Force with Measurements
The measured bearing reaction force, the reaction force in the shaft deformation predicted by deep RL, and the maximum allowable load on the bearings are depicted in Figure 22. In Equation (11), Error is the ratio of the difference between the measured and the predicted reaction force to the allowable bearing load.
The calculated bearing reaction force in the predicted shaft deformation is similar to the measured value. Therefore, the inverse analysis technique applying deep RL predicted the shaft deformation in which the bearing reaction force was sufficiently reflected.
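The error measure described for Equation (11) can be sketched as follows. The use of the absolute difference and the expression as a percentage are assumptions, as are the function name and the example numbers; only the ratio of the measured-predicted difference to the allowable bearing load is taken from the text.

```python
def reaction_force_error(measured, predicted, allowable):
    """Difference between measured and predicted reaction force, relative to
    the allowable bearing load, in percent (absolute value and percent assumed)."""
    return abs(measured - predicted) / allowable * 100.0

# Hypothetical loads in kN, not taken from the measurements of this study.
error = reaction_force_error(measured=100.0, predicted=90.0, allowable=200.0)
```

Normalizing by the allowable load rather than by the measured load keeps the error meaningful for lightly loaded bearings, where a ratio to the measured value would be inflated by a small denominator.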

Comparison of Predicted Strain Gauge Bending Moment with Measurements
As illustrated in Figure 23, the strain gauge bending moment calculated at the predicted shaft deformation was similar to the measured strain gauge bending moment.
In the scantling draft conditions D4 and D5, the predicted bending moment was lower than the value measured by the strain gauge by more than 10 kNm overall. However, the changes in the bending moment between strain gauge positions predicted by deep RL under the draft conditions are similar to the changes in the measured values. Thus, the inverse analysis technique applied with deep RL predicted shaft deformation that sufficiently reflects the bending moment at the strain gauge positions, but predicted an overall low bending moment under some draft conditions.

Comparison with Previous Research Methods
The measured values and ship information used for the deep RL in this study were the same data as those in Lee [14]. The prediction results of the inverse analysis technique using the experimental method of Lee [14] and the inverse analysis technique using deep RL in this study were compared for each draft condition (D1 to D5). Figures 24 and 25 present the comparison results of the measured values under each draft condition, the predicted values of the inverse analysis technique using deep RL, and the predicted values of the inverse analysis technique using the experimental method (EM).
The comparison results of the reaction force prediction in Figure 24 demonstrate that the reaction force predicted by the inverse analysis method using deep RL under all draft conditions was close to the measured MB reaction force value. The MB reaction force predicted by the inverse analysis method using the experimental method was significantly different from the measured MB reaction force value.
Previous inverse analysis studies by Rao et al. [10] and Šverko [3] also exhibited difficulties in predicting the MB reaction force. In particular, in the work of Šverko [3], inverse analysis was conducted with a genetic algorithm using the MB reaction force and strain gauge bending moment, as in this study, but it was difficult to predict the MB reaction force. Lee [14] discussed the difficulty of predicting the reaction force of the MB when predicting the shaft deformation because a difference in the reaction force occurs even with a small offset difference owing to the characteristics of the MB with a large reaction force influence coefficient. Nevertheless, the MB reaction force in the shaft deformation predicted by the inverse analysis technique using deep RL was closer to the actual measured value compared to the techniques used in the previous studies.
The comparison results of the bending moment prediction in Figure 25 demonstrate that in the light and ballast conditions (D1-D3), the inverse analysis technique using deep RL predicted values that were closer to the measured strain gauge bending moment value than the values of the inverse analysis technique using the experimental method. In the scantling conditions D4 and D5, the inverse analysis method using the experimental method predicted the bending moment closer to the measured strain gauge bending moment value than the inverse analysis method using deep RL.
The changes in the bending moment between strain gauge positions predicted by deep RL under the draft conditions are similar to the changes in the measured values. The inverse analysis technique applied with deep RL predicted shaft deformation that sufficiently reflects the bending moment at the strain gauge positions but predicted an overall low bending moment under some draft conditions. Future research will aim to advance the inverse analysis algorithm applying deep RL to supplement this point.

Discussion
In this study, deep RL, which has been used in various fields in recent years, was applied to the shaft inverse analysis technique to predict the shaft deformation according to the hull deformation. Moreover, the prediction results and those of previously conducted inverse analysis were compared and analyzed. The novelty of this study is the reasonable prediction of the shaft deformation of the MB, which was difficult in previous studies owing to the large reaction force influence coefficient and the resulting change in the reaction force compared to other bearings. As a result, it is expected that accurate shaft deformation prediction will be possible, while saving time and costs compared to existing methods.
In this study, the shaft deformation according to the draft was predicted by applying two types of measurements to the inverse analysis technique using deep RL: the reaction force measured by the jack-up method, which is the simplest and most accurate direct measurement method but can be used only when the shaft is stopped, and the bending moment measured by the strain gauge method, which is time consuming but can be used even while the shaft is rotating. Moreover, the effect of the hull deformation according to the draft change on the shaft deformation was examined without shaft rotation; that is, in the static state. However, unlike in the static shaft state, in the dynamic shaft state it is difficult to measure the bearing reaction force using a hydraulic jack, and the effects of the thrust generated by the propeller during ship operation on the propulsion shaft system should be investigated. Therefore, future research will include cross-validation of the strain gauge method and the jack-up method to guarantee the accuracy of the strain gauge method. Subsequently, the strain gauge method will be applied to the inverse analysis technique using deep RL to predict the shaft deformation in the dynamic state of a vessel in operation.

Conclusions
In this study, the bearing reaction force and bending moment were measured according to the draft conditions in the static shaft of a 50,000 DWT medium-sized oil/chemical tanker equipped with a high-efficiency engine. The inverse analysis technique was modeled using deep RL, and the following results were obtained by predicting the shaft deformation from the measured values under five draft conditions:

• Although the predictions of the shaft deformation of the top 10 cases were not exactly the same, the predicted values of the reaction force and bending moment converged within a certain range.

• For each draft condition, the predicted shaft deformation tended to increase as the target ship progressed from a light load to a full load.

• Under the same draft condition, the shaft deformation tended to increase in the state where the APT was full compared to the state in which the APT was empty. It was estimated that the load of the APT located at the stern was applied on the rear part of the shaft.

• A rise of the shaft owing to the loading of the APT occurred significantly in the ballast condition compared to the scantling condition. It was estimated that this was because the APT was located above the water surface and the load that was applied on the rear part of the shaft was larger than that in the scantling condition.

• In this study, the shaft deformation that sufficiently reflects the measured MB reaction force was predicted, which was difficult in previous inverse analysis studies owing to the large reaction force influence coefficient.

• The aim of future research will be to confirm the validity of the strain gauge method based on the static shaft deformation prediction applied in this study, so as to use the method for dynamic shaft deformation prediction through the advancement of the inverse analysis technique with deep RL.

Informed Consent Statement: Not applicable.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to confidentiality.