Bearing Digital Twin Based on Response Model and Reinforcement Learning

Abstract: In recent years, research on bearing fault modeling has advanced significantly. However, the modeling of bearing faults using digital twins (DTs) remains an emerging area of exploration. This paper introduces a bearing digital twin developed by integrating a signal-based response model with reinforcement learning. First, a signal-based model is constructed, comprising a unit fault impulse function and a decay oscillation function. This model describes the bearing's acceleration response under fault conditions and acts as the environment within the bearing digital twin. Subsequently, a parameter estimation process identifies two critical parameters of the signal-based model: the load proportional factor and the decaying constant. The Deep Deterministic Policy Gradient (DDPG) algorithm is employed as the agent for online learning of these parameters. The cosine similarity metric is used to define the state and reward by comparing the real acceleration measurements with the simulation data generated by the digital twin. To validate the effectiveness of the digital twin, experimental data from three datasets are used. The results show that the digital twin can faithfully replicate the bearing's acceleration response under diverse conditions, with a high degree of similarity in both the time and frequency domains.


Introduction
Rolling element bearings are critical components of rotating machinery and strongly influence the overall performance of the machine. Modeling techniques are commonly employed to simulate the dynamic behavior of bearings. Mechanism-based and signal-based methods are widely applied to simulate the vibration response of rolling element bearings under both normal and fault operating conditions [1,2].
The mechanism-based model is formulated by deriving force and moment balance equations based on Hertz contact theory. In 1985, S. Fukata introduced an initial dynamic model with a two degrees of freedom (2-DoF) configuration, rooted in Hertz contact theory [3]. The Tiwari group expanded upon this concept, exploring the dynamics of balanced and unbalanced rotors supported by rolling element bearings and characterizing them as nonlinear systems [4,5]. Subsequently, S. Sopanen presented a more comprehensive bearing dynamics model that includes the influence of various geometric defects, such as surface roughness, surface waviness, and partial and distributed defects, thereby enhancing the fidelity of the bearing model [6,7]. Nevertheless, the mechanism-based model has certain limitations. Building it requires a profound understanding of both kinematics and dynamics and results in a notably intricate model. This complexity is compounded by numerous unmeasured parameters that must be identified, which not only hinders modeling efficiency but also makes real-time parameter identification more difficult.
In contrast to the mechanism-based model, the signal-based model has a simpler structure and fewer parameters requiring identification. Signal-based modeling places its emphasis on the representation of vibration signals. McFadden and Smith developed a model to describe the high-frequency vibration generated by a single point defect on the inner race under radial load [8], and subsequently extended this model to the vibration produced by multiple point defects [9]. Additionally, Mohammadi introduced a method for detecting multiple defects in bearings based on the time constant within the envelope detector. This technique is used to discern the characteristic pattern of amplitude variations in defect frequency harmonics within the frequency domain [10]. Various bearing fault characteristics can theoretically be simulated with the same model structure. Although the signal-based model is marginally less accurate than the mechanism-based model, it compensates with its high modeling efficiency, which enables the real-time identification of parameters.
As a burgeoning modeling approach, the digital twin can replicate a system using physical information and data gathered from sensors. This capability makes it suitable for signal-based modeling. Digital twin applications in bearings have yielded notable outcomes along three primary avenues: dynamics modeling, fault classification, and remaining useful life prediction. Qin [11] integrated the back-propagation neural network with the digital twin framework to construct a bearing model capable of simulating life cycle vibration signals. Addressing the challenge of limited data in bearing fault diagnosis, Zhang [12] proposed a digital-twin-driven approach featuring a transformer-based network and a selective adversarial strategy. This approach achieved an 80% accuracy in identifying various types of rolling bearing faults. Xiao [13] introduced a joint transfer network designed for unsupervised bearing fault diagnosis. This network facilitates knowledge transfer from the simulation domain to the experimental domain. Feng [14] devised a digital-twin-enabled domain adversarial graph network (DTDAGN) that relies exclusively on the structural parameters of bearing dynamics. This approach is complemented by a transfer learning framework based on graph convolutional networks. Piltan [15] used machine learning and intelligent digital twins to classify bearing faults and determine crack sizes, achieving accuracy rates of 99.5% and 99.6%, respectively. Zhang [16] employed the integrated learning CatBoost method to construct a digital twin dataset and used fusion features for the life prediction of rolling bearings. Zhao [17] introduced a hybrid approach that combines virtual and real aspects of a bearing digital twin. This approach is based on a modified CycleGAN (generative adversarial network) and the Wasserstein distance. It predicts the life of rolling bearings with a mean absolute error (MAE) of 0.13.
Updating the model and parameters is a crucial aspect of research on digital twin models for diagnosis and prognosis. For fault diagnosis of rotating machinery, Wang presented a fault diagnosis framework empowered by digital twin technology [18]. In this framework, model updating is formulated as an optimization problem and solved with particle swarm optimization to achieve real-time updates. Aivaliotis [19] employed periodic estimation of modeling parameters using the nonlinear least squares method. Similarly, Xu [20] introduced a machine learning technique based on deep transfer learning. This approach can accurately forecast the evolution of performance during the initial stages of actual manufacturing and adapt swiftly to new working conditions. Inspired by this concept, this paper considers a reinforcement learning algorithm for model updating.
The literature above shows that conventional physics-based modeling approaches can use intricate mechanisms and structures to establish highly precise models with good interpretability and generalizability. The drawback of these methods is their inefficiency, because a substantial number of parameters must be identified. In contrast, models based on signal response remedy this inefficiency by adopting a simpler structure and requiring fewer parameters to be identified. Recognizing this advantage, this study introduces a lightweight bearing digital twin centered on the signal response. The primary objective of this digital twin is to strike a balance between modeling accuracy and efficiency. This is achieved by swiftly identifying a limited number of essential parameters while maintaining interpretability through straightforward mechanisms. The effectiveness of the digital twin is assessed using experimental data from three datasets. The envelope spectrum error is employed as a metric to compare the acceleration data from the physical test bench with those generated by the digital twin. The outcomes of these comparisons support the viability and soundness of the proposed framework.
The remainder of this paper is organized as follows. Section 2 describes the signal-based model devised for capturing bearing vibration responses. Section 3 explores the construction of the bearing digital twin, based on the signal-based model and reinforcement learning. Section 4 describes the test bench setup and the acquisition of the experimental data used for validation. Section 5 analyzes the obtained results. Finally, Section 6 concludes the paper by summarizing the key findings and contributions.

Vibration Response Modeling for Bearing with Defects
In this section, a signal-based model is constructed to analyze bearing responses under fault conditions, serving as the basis of the bearing digital twin. Broadly, the signal-based model integrates various functions to depict the acceleration response characteristics and dynamics of the bearing. These functions cover load distribution, fault impulse decay, defect-induced vibration, defect localization, and defect width. The modeling theory and process are introduced in the following subsections.

Modeling for Load Distribution
The load directly determines the bearing's vibration response. According to Stribeck, the load around the circumference of a rolling element bearing under radial load, Q(ψ), is given by Equation (1) [21], where a is the load proportional factor, $Q_{max}$ is the maximum load intensity, ε is the load distribution factor, and ψ is the angle between the defect and the line of application of the load. n is the bearing type factor, with n = 3/2 for ball bearings and n = 10/9 for roller bearings.
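The load distribution can be illustrated with a short sketch. It assumes the classical Stribeck form $Q(\psi) = a\,Q_{max}\,[1 - \frac{1}{2\varepsilon}(1 - \cos\psi)]^{n}$ inside the load zone and zero outside it. Placing the load proportional factor a as a simple multiplier is an assumption of this sketch, and the function name and numerical values are hypothetical.

```python
import math

def load_distribution(psi, a, q_max, eps, n=1.5):
    """Radial load at angle psi (rad) from the load line, Stribeck form.

    Assumption: the load proportional factor `a` scales the whole expression.
    n = 3/2 for ball bearings, n = 10/9 for roller bearings.
    """
    bracket = 1.0 - (1.0 - math.cos(psi)) / (2.0 * eps)
    if bracket <= 0.0:
        # Outside the load zone the rolling element carries no load.
        return 0.0
    return a * q_max * bracket ** n

# Illustrative values only: with eps = 0.5 the load zone spans |psi| < pi/2.
peak = load_distribution(0.0, a=0.05, q_max=1000.0, eps=0.5)
inside = load_distribution(0.5, a=0.05, q_max=1000.0, eps=0.5)
outside = load_distribution(2.0, a=0.05, q_max=1000.0, eps=0.5)
```

The load is largest on the load line (ψ = 0), falls off towards the edge of the load zone, and is clipped to zero beyond it.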

Modeling for Fault Impulse Decay
When a bearing exhibits a defect, a sequence of impulses arises as the rolling elements traverse the affected area. These impulses oscillate and attenuate continuously owing to the inherent spring and damping characteristics. To capture the vibration response characteristics of the bearing, a unit fault impulse function is used together with a corresponding decay oscillation function.

Unit Fault Impulse
The unit fault impulse under a given radial load is built on the following assumptions. Firstly, at t = 0 the defect is positioned at ψ = 0 and one of the rolling elements enters the defect zone, which means an impulse occurs exactly at t = 0. Secondly, the impacts are produced under a unit load distributed uniformly around the bearing. Hence, the vibration produced by the defect can be modeled as an infinite series of impulses with equal amplitude. The unit fault impulse function d(t) is given by Equation (2), where δ(t) stands for the Dirac delta unit impulse function, $d_0$ represents the severity of the defect, and $T_d$ is the time period between the fault impulses. The number of repeated cycles k within one measurement sample is given by Equation (3), where $t_s$ is the measurement duration of one sample and $f_d$ represents the defect frequency, which can be substituted by the defect frequency of the outer race ($f_{BPFO}$), inner race ($f_{BPFI}$), cage ($f_{FTF}$), or ball ($f_{BSF}$).
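As a minimal sketch of the impulse train, the code below lists the impulse instants inside one measurement sample. It assumes $T_d = 1/f_d$, an impulse train of the form $d(t) = d_0 \sum_i \delta(t - iT_d)$, and a cycle count $k = \lfloor t_s f_d \rfloor$. The rounding rule and the function names are assumptions of this sketch.

```python
import math

def impulse_times(t_s, f_d):
    """Instants (s) of the fault impulses within one sample of duration t_s."""
    t_d = 1.0 / f_d                       # period between fault impulses, T_d
    k = math.floor(t_s * f_d + 1e-9)      # assumed cycle count, k = floor(t_s * f_d)
    return [i * t_d for i in range(k)]    # the first impulse occurs at t = 0

def unit_fault_impulses(t_s, f_d, d0=1.0):
    """(time, weight) pairs standing in for the terms d0 * delta(t - i * T_d)."""
    return [(t, d0) for t in impulse_times(t_s, f_d)]

# Illustrative values: a 1 s sample with a 100 Hz defect frequency.
times = impulse_times(t_s=1.0, f_d=100.0)
train = unit_fault_impulses(t_s=1.0, f_d=100.0, d0=2.0)
```

Every impulse carries the same weight $d_0$, matching the equal-amplitude assumption stated above.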

Decay Oscillation of Fault Impulse
The bearing can be conceptualized as a mass-spring-damping system, in which the vibrational impulse resulting from an impact within the defect zone gradually diminishes over time. Hence, the oscillation of the fault impulse can be modeled by combining an amplitude function a(t) and an exponential decay function e(t). The amplitude function is a sinusoidal function, given in Equation (4), in which $a_0$ represents the actual load applied to the bearing and $f_n$ is the bearing system resonance frequency. The exponential function is given in Equation (5), where B is the decaying parameter. It determines the decay rate of the impulse and also represents the bearing fault dynamics. Finally, the decay oscillation $k_d(t)$ of the fault impulse is defined in Equation (6) as the product of the amplitude function a(t) and the exponential decay function e(t).
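The decay oscillation can be sketched as follows, assuming the sinusoidal amplitude function takes the form $a(t) = a_0 \sin(2\pi f_n t)$ and the exponential function the form $e(t) = e^{-Bt}$, so that $k_d(t) = a_0 e^{-Bt}\sin(2\pi f_n t)$. These explicit forms and the numerical values are assumptions chosen for illustration.

```python
import math

def decay_oscillation(t, a0, f_n, B):
    """Assumed decay oscillation k_d(t) = a0 * sin(2*pi*f_n*t) * exp(-B*t)."""
    if t < 0.0:
        return 0.0  # no response before the impact
    amplitude = a0 * math.sin(2.0 * math.pi * f_n * t)  # a(t)
    decay = math.exp(-B * t)                            # e(t)
    return amplitude * decay

# Illustrative values: 3 kHz resonance, B = 300 1/s.
at_impact = decay_oscillation(0.0, a0=1.0, f_n=3000.0, B=300.0)
first_peak = decay_oscillation(1.0 / 12000.0, a0=1.0, f_n=3000.0, B=300.0)
slow_decay = abs(decay_oscillation(0.00425, a0=1.0, f_n=3000.0, B=100.0))
fast_decay = abs(decay_oscillation(0.00425, a0=1.0, f_n=3000.0, B=400.0))
```

A larger B shrinks the oscillation faster, which is the behavior the decaying parameter is meant to control.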

Modeling for Bearing Defect Vibration
By incorporating the effects of the bearing load distribution Q(ψ), the unit fault impulse d(t), and the decay oscillation of the fault impulse, a(t) and e(t), the vibration signal produced by a bearing with defects can be reconstructed, as shown in Equation (7). Since ψ can be substituted by $2\pi f_n t$, the response model can be rewritten as Equation (8), in which all the parameters are expressed in the time domain. Besides the general model for vibration signals produced by a bearing with a defect, this study also addresses the modeling of fault position and fault size.
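One way to picture how the three components combine is the sketch below. It is not the paper's Equation (7). It assumes the signal is a superposition in which each fault impulse, weighted by the load at the defect angle, excites one decaying oscillation (a convolution of the impulse train with the decay oscillation). For simplicity the defect angle is held fixed, and all names and values are hypothetical.

```python
import math

def simulate_response(f_s, t_s, f_d, f_n, a, B,
                      q_max=1.0, eps=0.5, n=1.5, a0=1.0, d0=1.0, psi=0.0):
    """Sampled acceleration for a defect held at a fixed angle psi (rad).

    Assumed structure: sum over impulses of Q(psi) * d0 * k_d(t - i / f_d).
    """
    bracket = 1.0 - (1.0 - math.cos(psi)) / (2.0 * eps)
    load = a * q_max * max(bracket, 0.0) ** n           # Stribeck-type load, Q(psi)
    n_samples = int(round(f_s * t_s))
    n_impulses = math.floor(t_s * f_d + 1e-9)
    signal = [0.0] * n_samples
    for i in range(n_impulses):
        t0 = i / f_d                                     # instant of the i-th impulse
        first = max(0, math.ceil(t0 * f_s - 1e-9))       # first sample at or after t0
        for j in range(first, n_samples):
            tau = max(0.0, j / f_s - t0)                 # time since the impact
            signal[j] += (load * d0 * a0 * math.exp(-B * tau)
                          * math.sin(2.0 * math.pi * f_n * tau))
    return signal

# Illustrative run: 0.1 s at 12 kHz, 100 Hz defect frequency, two values of a.
weak = simulate_response(12000.0, 0.1, 100.0, 3000.0, a=0.01, B=300.0)
strong = simulate_response(12000.0, 0.1, 100.0, 3000.0, a=0.02, B=300.0)
```

Under this assumed structure the response is linear in a, so doubling the load proportional factor doubles every sample of the simulated signal.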

Modeling for Defect Position
The modeling of fault positions must distinguish between bearing components. For the outer ring, which is typically fixed within the housing, the fault position remains stationary at its initial location, as denoted by Equation (9). Conversely, the inner race rotates with the shaft, so its fault position varies, as described in Equation (10). Similarly, when a fault occurs on the rolling elements, its position also changes due to their rotation, albeit at a different frequency. This variation is expressed in Equation (11), where $f_r$ represents the shaft rotation frequency and $f_B$ denotes the revolution frequency of the rolling elements.
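The three cases can be sketched in code under the assumption that the defect angle is constant for the outer ring and advances linearly with time otherwise, at $2\pi f_r$ for the inner race and $2\pi f_B$ for a rolling element. The initial angle `psi0` and the function name are hypothetical.

```python
import math

def defect_angle(t, location, f_r=0.0, f_b=0.0, psi0=0.0):
    """Angular position (rad) of the defect at time t for each component."""
    if location == "outer":
        return psi0                                # fixed in the housing
    if location == "inner":
        return psi0 + 2.0 * math.pi * f_r * t      # rotates with the shaft
    if location == "ball":
        return psi0 + 2.0 * math.pi * f_b * t      # revolves with the rolling elements
    raise ValueError(f"unknown defect location: {location!r}")

# Illustrative shaft frequency of 30 Hz: one shaft revolution takes 1/30 s.
outer = defect_angle(0.5, "outer", psi0=0.1)
inner = defect_angle(1.0 / 30.0, "inner", f_r=30.0)
```

Feeding this angle into the load distribution modulates the impulse amplitudes for inner race and ball faults, while outer race impulses keep a constant amplitude.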

Modeling for Defect Length
When the ball passes the entry and exit points of a defect, impulses are generated. As a result, a time lag ∆t between these two impulses can be observed when the defect length is noticeable. In this subsection, the time lag is modeled.
As shown in Figure 1, the angle formed by entry point A, the center of ball O, and exit point B is defined as θ. The distance between entry point A and exit point B is the length of the defect, which is approximately a straight line when θ is very small. The angle between OC and OB is θ/2. When the angle is small enough, the relationship between the angle, the radius, and the defect length can be formulated as Equation (12). Consequently, the period between the rolling balls entering and leaving the defect zone can be calculated with Equation (13), where $f_{BPFO}$ is the defect frequency of the outer race. Combining Equations (12) and (13), the time lag of passing through the defect is given in general form by Equation (14), where L is the defect length, $f_d$ is the defect frequency, and r is the radius. For an inner race fault, r is replaced by $r_{inner}$ and $f_d$ by $f_{BPFI}$. Likewise, for a ball fault, r and $f_d$ are replaced by $r_{ball}$ and $f_{BSF}$, respectively.

Bearing Digital Twin Construction Based on Reinforcement Learning
After establishing the response model of the bearing under different fault conditions, the next step is to combine it with reinforcement learning to build a bearing digital twin model.

Bearing Digital Twin Construction
A digital twin encompasses both physical and virtual domains and defines the connection or interface between them [23]. The behavior of the physical system changes with various factors, including geometrical configurations, material attributes, process variables, operational states, and environmental surroundings. Constructing a digital twin model relies on integrating physical data and sensor measurements, and it can be divided into two core components: physics-based modeling and parameter refinement. The objective of model refinement is to minimize the discrepancy between the dynamic response predicted by the digital twin and the real-time response observed in the physical system.
Figure 2 presents the main procedure for constructing a bearing digital twin. It includes six steps:
(1) The data obtained from the test bench are analyzed, and the specific conditions under which they were collected are established. The condition parameters governing the digital twin, such as the test bearing's specifications, operational settings, and defect definitions, should align precisely with those of the actual test bench.
(2) The digital twin model is set up based on the signal-based model. Its parameters fall into two groups: those that are predefined and those that remain unknown. The predefined parameters are derived from the previous step, while the unknown parameters are updated as part of the model updating strategy.
(3) The data obtained from both the test bench and the digital twin undergo initial pre-processing. An envelope spectrum analysis is then performed to establish the foundation for the cost function.
(4) The cost function is computed to quantify the disparity between the data from the test bench and those from the digital twin.
(5) Reinforcement learning identifies the unknown parameters.
(6) The values of the identified parameters are incorporated into the digital twin.
Each episode goes through steps (2) to (6) until the error between the test bench and the digital twin is as small as expected.

Analysis of Parameters to Be Identified
The construction procedure above shows that the updated parameters play a pivotal role. In this subsection, the parameters requiring updates are determined. The vibration response model comprises two primary components: one for load distribution and another for fault impulse decay. The parameters selected for identification are drawn from these two functions. More specifically, the load proportional factor a plays a critical role in modeling load distribution, defining the range of load distribution around the bearing's circumference. Figure 3 displays the envelope spectra derived from the vibration response signals under varying load proportional factors. The load proportional factor affects the modeling results directly: in the envelope spectrum, the amplitudes at the fault peaks are magnified 100 times when the load proportional factor changes from a = 0.001 to a = 0.1. Therefore, the load proportional factor is selected as a parameter for the digital twin's online updating. Additionally, the decaying parameter B is crucial for modeling the decay of the fault impulse, as it sets the decay rate. A large value of B causes the fault peaks to decay quickly, whereas a small value of B slows the decay. Based on this analysis, a and B are selected as the updated parameters in the bearing digital twin model.

Deep Deterministic Policy Gradient (DDPG)
Once the signal-based model and the relevant parameters have been established, the next step is to devise an appropriate method for parameter identification. In this study, DDPG is employed as the strategy for updating the bearing digital twin model. DDPG is theoretically derived from Policy Gradient (PG) and its extension, Deterministic Policy Gradient (DPG). Unlike DPG, a core enhancement in DDPG is the use of a convolutional neural network (CNN) in place of the traditional policy structure and value function. Furthermore, these networks are trained using deep learning techniques. The DDPG structure and algorithm are shown in Figure 4, which illustrates the nine main steps summarized in Algorithm 1.

Algorithm 1 DDPG algorithm
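Randomly initialize critic network $Q(s, a|\theta^{Q})$ and actor network $\mu(s|\theta^{\mu})$ with weights $\theta^{Q}$ and $\theta^{\mu}$.
Initialize target networks $Q'$ and $\mu'$ with weights $\theta^{Q'} \leftarrow \theta^{Q}$, $\theta^{\mu'} \leftarrow \theta^{\mu}$.
Initialize replay buffer R.
for episode = 1: M do
    Initialize a random process $\mathcal{N}$ for action exploration.
    Receive initial observation state $s_1$.
    for t = 1: T do
        Select action $a_t = \mu(s_t|\theta^{\mu}) + \mathcal{N}_t$ according to the current policy and exploration noise.
        Execute action $a_t$ and observe reward $r_t$ and new state $s_{t+1}$.
        Store transition $(s_t, a_t, r_t, s_{t+1})$ in R.
        Sample a random mini-batch of N transitions $(s_i, a_i, r_i, s_{i+1})$ from R.
        Set target $y_i = r_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1}|\theta^{\mu'})|\theta^{Q'})$.
        Update critic network by minimizing the loss: $L = \frac{1}{N}\sum_i \left(y_i - Q(s_i, a_i|\theta^{Q})\right)^2$.
        Update the actor policy network using the sampled gradient: $\nabla_{\theta^{\mu}} J \approx \frac{1}{N}\sum_i \nabla_a Q(s, a|\theta^{Q})|_{s=s_i, a=\mu(s_i)} \nabla_{\theta^{\mu}} \mu(s|\theta^{\mu})|_{s=s_i}$.
        Update the target networks: $\theta^{Q'} \leftarrow \tau\theta^{Q} + (1-\tau)\theta^{Q'}$, $\theta^{\mu'} \leftarrow \tau\theta^{\mu} + (1-\tau)\theta^{\mu'}$.
    end for
end for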
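The numerical core of the critic update in Algorithm 1 can be sketched without any deep learning library. The sketch follows the standard DDPG formulation: a temporal-difference target with discount factor γ, a mean squared critic loss, and a soft update of the target weights with rate τ. Both γ and τ come from that standard formulation rather than from values stated here, and plain lists of floats stand in for network outputs and weights.

```python
def td_targets(rewards, next_q_values, gamma):
    """y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})) for each sampled transition."""
    return [r + gamma * q for r, q in zip(rewards, next_q_values)]

def critic_loss(targets, q_values):
    """Mean squared error between the targets y_i and the critic outputs Q(s_i, a_i)."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / len(targets)

def soft_update(target_weights, online_weights, tau):
    """theta' <- tau * theta + (1 - tau) * theta', applied element-wise."""
    return [tau * w + (1.0 - tau) * w_t
            for w_t, w in zip(target_weights, online_weights)]

# Illustrative mini-batch of two transitions (values are placeholders).
targets = td_targets(rewards=[1.0, 0.0], next_q_values=[2.0, 4.0], gamma=0.5)
loss = critic_loss(targets, q_values=[1.0, 2.0])
new_target = soft_update(target_weights=[0.0, 1.0], online_weights=[1.0, 1.0], tau=0.1)
```

A small τ makes the target networks trail the online networks slowly, which is what keeps the targets $y_i$ stable during training.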

Parameters Identification Based on DDPG
The previous sections introduced the DDPG algorithm and the structure of the bearing digital twin. This section describes in detail how DDPG is applied to the digital twin.

Training Environment
The reinforcement learning algorithm has five main elements: agent, environment, state, reward, and action. Their interactions within the DDPG-based bearing digital twin are illustrated in Figure 5. The agent is the DDPG algorithm, the environment is set up based on the bearing digital twin, and the actions are the parameters to be identified: the load proportional factor a and the decaying parameter B. Both state and reward are defined based on the cost function (C). Specifically, the state is defined as the value of C, while the reward is defined as a piece-wise function of C, given in Equation (15), where $C_{min}$ and $C_{max}$ are the user-defined lower and upper limits of the cost function. This reward function is designed around two considerations. Firstly, segmentation is required, as it allows for quicker convergence in regions with larger errors. Secondly, the specific coefficients for each segment are selected by trial and error. The construction of the cost function is introduced next.
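The shape of such a segmented reward can be illustrated with a hypothetical example. The thresholds and coefficients below are placeholders chosen only to show the structure (a bonus below $C_{min}$, a mild penalty between the limits, a steeper penalty above $C_{max}$). They are not the values of Equation (15).

```python
def reward(cost, c_min=0.1, c_max=0.5):
    """Piece-wise reward of the cost C. All coefficients are hypothetical."""
    if cost <= c_min:
        return 10.0            # target accuracy reached: fixed bonus
    if cost >= c_max:
        return -10.0 * cost    # large error: steep penalty to speed up convergence
    return -1.0 * cost         # intermediate error: mild penalty

def state(cost):
    """The state observed by the agent is the cost value itself."""
    return cost

good, medium, poor = reward(0.05), reward(0.3), reward(0.8)
```

Making the penalty steeper in the high-error segment gives the agent a stronger signal far from the target, in line with the first consideration stated above.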

Cost Function Construction
The construction of the cost function is particularly important because it directly influences the definitions of state and reward. The load proportional factor and the decaying parameter shape the time-domain response of the digital twin. Consequently, the cost function is formulated in the time domain. To facilitate the training and validation of the digital twin, this study uses data spanning a complete fault period. Before that, the data require pre-processing. The number of data points within one defect period ($L_s$) can be calculated from the defect frequency and the sampling rate as $L_s = \mathrm{ceil}(f_s / f_d)$ (Equation (16)), where $f_d$ is the corresponding defect frequency, $f_s$ is the sampling rate, and ceil stands for the upward rounding function. The maximum value in one defect period is the most likely location of the fault peak, so the simulated and test data can be aligned by locating the position of the fault peak. Moreover, only the positive values of the data are used for the cost function calculation. The cost function is constructed based on cosine similarity, which is a measure of similarity between two non-zero vectors. The cosine similarity is defined as $\cos\theta = \frac{b_{sim} \cdot b_{real}}{\|b_{sim}\| \, \|b_{real}\|}$ (Equation (17)), where $b_{sim}$ and $b_{real}$ are two vectors, here the simulated data and the test data, and θ is the angle between them. When $b_{sim}$ and $b_{real}$ are n-dimensional vectors, $b_{sim} = [b_{sim,0}, b_{sim,1}, \ldots, b_{sim,n}]$ and $b_{real} = [b_{real,0}, b_{real,1}, \ldots, b_{real,n}]$, the cosine similarity is calculated for every two adjacent points and then averaged over all the cosine similarities. The average value is regarded as the cosine similarity of these n-dimensional vectors and is expressed in Equation (18). Finally, the cost function based on cosine similarity is constructed as given in Equation (19).
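The pre-processing and the similarity measure can be sketched as follows. The cosine similarity itself is standard. Two points are assumptions of this sketch: "every two adjacent points" is read as the two-element vectors $[b_i, b_{i+1}]$ taken from each signal, and the cost is taken as one minus the averaged similarity. Function names are hypothetical.

```python
import math

def samples_per_defect_period(f_s, f_d):
    """L_s = ceil(f_s / f_d): number of data points in one defect period."""
    return math.ceil(f_s / f_d)

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def mean_adjacent_similarity(b_sim, b_real):
    """Average cosine similarity over pairs of adjacent points (assumed reading)."""
    sims = [cosine_similarity(b_sim[i:i + 2], b_real[i:i + 2])
            for i in range(len(b_sim) - 1)]
    return sum(sims) / len(sims)

def cost(b_sim, b_real):
    """Assumed cost: 0 for identical shapes, growing as the shapes diverge."""
    return 1.0 - mean_adjacent_similarity(b_sim, b_real)

# Illustrative positive-valued segments aligned at the fault peak.
measured = [1.0, 0.6, 0.35, 0.2, 0.1]
matching = [1.0, 0.6, 0.35, 0.2, 0.1]
mismatch = [1.0, 0.95, 0.9, 0.85, 0.8]
```

With this reading, `cost(matching, measured)` is zero up to rounding, and a simulated segment that decays at a different rate than the measurement yields a strictly positive cost.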

Parameters Setting for DDPG
The DDPG algorithm is developed based on the actor-critic method. Both the actor and the critic adopt a CNN in DDPG. Figure 6 illustrates the structure of the critic and actor networks in the bearing digital twin. The configuration of the critic network is detailed in Table 1, while the actor network's setup is summarized in Table 2. Several key parameters deserve attention. Firstly, the learning rates of the actor and critic networks differ notably. Typically, the critic network uses a learning rate approximately an order of magnitude larger than that of the actor network. For example, if the critic network's learning rate is set to $1 \times 10^{-3}$, it is advisable to set the actor network's learning rate to $1 \times 10^{-4}$. This choice is motivated by the fact that the gradient used to update the actor network is derived from the critic network. Secondly, a gradient threshold is introduced as a parameter. Lastly, $L_2$ regularization is employed, introducing a penalty term into the cost function to enhance model robustness, mitigate overfitting, and improve overall accuracy. Specific values for these parameters are provided in Table 3. In addition to the neural network parameters and regularization settings, the agent settings are outlined in Table 4.

Experimental Datasets and Data Processing
After the bearing digital twin has been built, the next step is to validate the model with experimental data. This section introduces the test bench used for validation and the necessary data pre-processing.

Introduction of Bearing Datasets
We use the Case Western Reserve University (CWRU) dataset as the primary example to explain the data processing and experimental validation procedures [24]. Additionally, to further validate the effectiveness of the proposed method, we conducted comparative experiments using the Society for Machinery Failure Prevention Technology (MFPT) dataset [25] and the Paderborn University (PU) dataset [26]. Table 5 summarizes the bearing specifications of these datasets. As shown in Figure 7, the bearing test bench from CWRU consists of an electric motor, a torque transducer/encoder, a dynamometer, and control electronics. The test bearing is connected to the shaft. The dataset from the drive end of the motor is collected using accelerometers. There are three bearing defect types: inner race fault, ball fault, and outer race fault. Each bearing fault type has four different fault diameters (0.1778 mm, 0.3556 mm, 0.5334 mm, and 0.7112 mm) and four different motor working loads (745.7 W, 1491.4 W, 2237.1 W, and 2982.8 W).

Loss Function Based on Envelope Spectrum
Envelope spectrum analysis is an essential tool for bearing diagnostics, with which the unique resonant frequency can be isolated from vibration signals. The envelope spectrum thus reveals the repetition frequency of the impulse response series, and this repetition frequency is the bearing fault frequency. The theoretical fault frequencies $f_{BPFO}$, $f_{BPFI}$, $f_{FTF}$, and $f_{BSF}$ for the outer race, inner race, cage, and ball fault, respectively, can be calculated as follows [27]:
$f_{BPFO} = \frac{n}{2} f_r \left(1 - \frac{d}{D}\cos\theta\right)$ (20)
$f_{BPFI} = \frac{n}{2} f_r \left(1 + \frac{d}{D}\cos\theta\right)$ (21)
$f_{FTF} = \frac{f_r}{2} \left(1 - \frac{d}{D}\cos\theta\right)$ (22)
$f_{BSF} = \frac{D}{2d} f_r \left(1 - \left(\frac{d}{D}\cos\theta\right)^2\right)$ (23)
where n is the number of rolling elements, $f_r$ is the shaft frequency, θ is the initial contact angle, and d and D are the ball diameter and pitch diameter of the bearing, respectively.
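These kinematic relations translate directly into code. The sketch below uses the standard expressions for the four fault frequencies. The geometry in the example (9 balls, 7.94 mm ball diameter, 39.04 mm pitch diameter, zero contact angle) is an illustrative assumption typical of a small deep-groove ball bearing and is not taken from Table 5.

```python
import math

def fault_frequencies(n, f_r, d, D, theta=0.0):
    """Theoretical bearing fault frequencies (Hz).

    n: number of rolling elements, f_r: shaft frequency (Hz),
    d: ball diameter, D: pitch diameter (same unit), theta: contact angle (rad).
    """
    ratio = (d / D) * math.cos(theta)
    return {
        "BPFO": 0.5 * n * f_r * (1.0 - ratio),            # outer race
        "BPFI": 0.5 * n * f_r * (1.0 + ratio),            # inner race
        "FTF": 0.5 * f_r * (1.0 - ratio),                 # cage
        "BSF": D / (2.0 * d) * f_r * (1.0 - ratio ** 2),  # ball spin
    }

# Illustrative geometry (assumed) at a shaft speed of 1772 rpm.
shaft_hz = 1772.0 / 60.0
freqs = fault_frequencies(n=9, f_r=shaft_hz, d=7.94, D=39.04)
```

Two identities make a quick sanity check: the outer and inner race frequencies sum to n times the shaft frequency, and the outer race frequency equals n times the cage frequency.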
To verify the performance of the digital twin, a validation function is defined based on the envelope spectrum. Amplitude and frequency are the two critical indicators in envelope spectrum analysis; because they typically differ in scale, both are normalized [28]. The amplitudes are normalized by subtracting the mean amplitude of the first five orders and then dividing by their standard deviation, that is, $(A_{FF} - A_{\mu})/\sigma_A$, as given in Equation (24), where $A_{FF}$ stands for the amplitudes at fault frequencies, marked by the three red circles in Figure 8, $A_{\mu}$ is the mean amplitude of the first five orders, and $\sigma_A$ is their standard deviation.
The frequencies are normalized by dividing the actual defect frequencies of the first five orders by the theoretical defect frequency of the fifth order, that is, $f_i / (n_{order} f_{theo})$, as formulated in Equation (25), where $f_i$ is the i-th order of the actual fault frequency obtained from the envelope spectrum, $f_{theo}$ represents the first order of the theoretical defect frequency, and $n_{order}$ is the highest order of peak, defined as 5 in this study. After the normalization of amplitudes and frequencies, the deviation between the physical system and the digital twin is calculated with the validation function in Equation (26), where $A^{i}_{real}$ and $A^{i}_{sim}$ represent the amplitude of the i-th order fault peak from the test bench and the digital twin, respectively, and $F^{i}_{real}$ and $F^{i}_{sim}$ are the frequencies of the i-th order fault peak from the test bench and the digital twin. This function evaluates the performance of a digital twin model by quantifying the error in the envelope spectrum relative to the real measurements obtained from the test bench. The error is determined by comparing the peak information of the digital twin's simulated data with the test data from the physical system, and it is recalculated as the model updates. A digital twin can be considered a suitable substitute for the physical system when the error approaches zero. Conversely, when the error becomes significantly large, the digital twin model requires revision.
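The two normalization steps can be sketched as below. The amplitude step is a z-score over the first five fault peaks, and the frequency step divides by the fifth-order theoretical frequency $n_{order} f_{theo}$. Using the population standard deviation is an assumption, and the way the normalized values are combined into the validation function of Equation (26) is intentionally left out.

```python
import statistics

def normalize_amplitudes(peak_amplitudes):
    """(A_FF - A_mu) / sigma_A over the first fault-peak orders."""
    mu = statistics.mean(peak_amplitudes)
    sigma = statistics.pstdev(peak_amplitudes)  # population std (assumed)
    return [(a - mu) / sigma for a in peak_amplitudes]

def normalize_frequencies(peak_frequencies, f_theo, n_order=5):
    """f_i / (n_order * f_theo): the fifth theoretical order maps to 1."""
    return [f / (n_order * f_theo) for f in peak_frequencies]

# Illustrative peaks at the first five orders of a 100 Hz defect frequency.
amps = normalize_amplitudes([0.50, 0.30, 0.20, 0.12, 0.08])
freqs_norm = normalize_frequencies([100.0, 200.0, 300.0, 400.0, 500.0], f_theo=100.0)
```

After this step both indicators are dimensionless and of comparable scale, so amplitude and frequency errors can be combined in a single validation value.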

Results and Analysis
This section presents the training results of the bearing digital twin. The test data consist of measurement samples from the CWRU dataset, covering defects on the outer ring, inner ring, and ball. These data were acquired at a sampling frequency of 12,000 Hz from the drive end under a load of 745.7 W, while the motor operated at a speed of 1772 rpm.
The selected actions consist of the load proportional factor a and the decaying parameter B. The allowable ranges for these two parameters are defined through the output layer of the actor network and are set to [0.001, 0.1] and [100, 400], respectively. The training process starts with the dataset sample containing an outer ring defect. At each step, a random starting point within the same data sample is chosen, and the error between the test data and the corresponding simulated data is calculated. Each episode consists of 80 such steps. The agent is trained for 120 episodes and can then be applied in different environments. To assess the agent's ability to make accurate selections in other scenarios, the agent trained on outer ring fault data is evaluated in environments featuring inner ring and ball defects.
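One common way to enforce such ranges at the output layer is to squash the actor output to [-1, 1] and rescale it linearly. The sketch below assumes that approach (for instance after a tanh activation). The text only states that the ranges are set through the output layer, so the mechanism and the names used here are assumptions.

```python
A_RANGE = (0.001, 0.1)    # load proportional factor a
B_RANGE = (100.0, 400.0)  # decaying parameter B

def scale_action(u, low, high):
    """Map a squashed actor output u in [-1, 1] linearly onto [low, high]."""
    u = max(-1.0, min(1.0, u))  # guard against exploration noise leaving the range
    return low + 0.5 * (u + 1.0) * (high - low)

def to_parameters(raw_action):
    """Convert a two-element raw action into the pair (a, B)."""
    return (scale_action(raw_action[0], *A_RANGE),
            scale_action(raw_action[1], *B_RANGE))

a_value, b_value = to_parameters([0.0, 0.0])  # midpoints of both ranges
```

Clipping before rescaling keeps the identified parameters inside their allowable ranges even when exploration noise pushes the raw action beyond [-1, 1].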
From the training results, the smallest state value, which is the error calculated with the cosine similarity cost function, is selected and its corresponding action values are identified. These action values are then integrated into the digital twin model. Table 6 lists the step numbers associated with the minimum states and their corresponding values obtained through reinforcement learning, and Table 7 lists the corresponding actions. Figure 9 depicts the learning trajectory of the load proportional factor a and the decaying parameter B. Table 8 reports the minimum mean error achieved through digital twin training and the mean error calculated with the trained actions. After the load proportional factor a and the decaying parameter B are identified by the DDPG algorithm, the disparity between the simulation data and the actual measurements is greatly reduced.
A comparison of Tables 6 and 8 shows that the errors obtained through digital twin training do not precisely match the theoretical errors. Two factors may explain this discrepancy. First, an agent trained within a specific environment performs best in that environment, which gives the lowest error percentage for the outer ring defect data; a new environment may degrade the performance of the pre-trained agent. Second, the initial point of each episode, and of each step within an episode, is random and unknown. Consequently, the data used for comparison and validation differ from those used in the DDPG process.
Figure 10b compares the frequency domain. The digital twin accurately captures the fault frequency of the actual measurement sample. However, significant disparities are observed in the time domain of the simulated signals. Several factors may contribute to the amplitude deviations. First, noise in the actual measurements can influence the envelope spectrum, and this noise is not addressed in the digital twin model. Second, discrepancies in the identification of parameters a and B can lead to variations in response amplitude. Lastly, the simplified signal-based model may not fully represent the intricate dynamics of a real bearing test bench. The same observations and conclusions apply to the acceleration data generated by the digital twin for inner race faults and ball faults.
We conducted ablation experiments using the hyperparameters listed in Table 9 as the baseline configuration. The impact of these hyperparameters on the digital twin can be assessed from the results in Table 10. Among the four parameters, the spectrum error is most sensitive to variations in the discount factor, followed by the mini-batch size. Specifically, their average errors reach 0.5904 and 0.5633, respectively, while the average errors observed in other models fall within the range of 0.5503-0.5576.
As the discount factor approaches unity, the standard deviation (SD) of the spectrum error increases, consistent with the trend of the error as the maximum step number changes. The samples generated by the baseline model exhibit the minimal mean error and a relatively low standard deviation with respect to the actual samples. This finding supports the choice of the parameters listed in Table 9 for the proposed bearing digital twin.
The MFPT dataset and the PU dataset are used for comparative experiments, and the results are shown in Table 11. For the MFPT dataset, the average error consistently remains below 0.9250. The SD of the error for the outer race is markedly lower than that for the inner race, a pattern analogous to the findings for the CWRU dataset. For the PU dataset, the average error consistently remains below 0.1741, outperforming the result obtained with the CWRU dataset. Moreover, the SD of the errors lies in a comparable range. Together, these results indicate that using the PU and MFPT datasets reduces both the mean and the standard deviation of the errors. It is reasonable to infer that this improvement is attributable to the higher sampling frequency, which enables a more faithful representation of the bearing's true dynamic characteristics. This observation supports the effectiveness of the proposed methodology across diverse bearing datasets.
Modeling the bearing acceleration response under fault conditions is significant because it provides data for studying bearing fault dynamics and for training fault diagnostics models. This paper proposed a new method to simulate the bearing response by constructing a bearing digital twin. The work completed in this study can be summarized as follows.

• A signal-based model, consisting of a unit impulse function and decay oscillation function, is built to describe the bearing's acceleration response under different fault conditions, with fault position and length considered.

• A bearing digital twin model is constructed. The signal-based model is taken as the environment, DDPG is adopted as the agent, and the online learning of two parameters (load proportional factor, decaying parameter) from the signal-based model is regarded as the action of the digital twin. In addition, the cosine similarity between the real acceleration and simulation data from the digital twin is utilized to define the reward and state.

• Experimental data from the CWRU test bench are used to validate the proposed bearing digital twin. The acceleration similarity between physics space (test bench) and virtual space (digital twin) in the time and frequency domains is compared.
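The environment side of the digital twin summarized above can be illustrated with a simplified sketch. It is not the paper's exact formulation: the response is assumed to be a train of unit impulses convolved with an exponentially decaying sinusoid, the state is assumed to be one minus the cosine similarity, and the reward is assumed to be the negative state. The fault frequency, the resonance frequency, and the function names `simulate_response` and `step` are hypothetical placeholders.

```python
import numpy as np


def simulate_response(a, B, fs=12000.0, n=2048, fault_freq=100.0, res_freq=3000.0):
    """Illustrative impulse train convolved with a decaying oscillation.

    a scales the amplitude and B sets the exponential decay rate. fault_freq
    and res_freq are hypothetical placeholders, not values from the paper.
    """
    t = np.arange(n) / fs
    impulses = np.zeros(n)
    period = int(round(fs / fault_freq))  # samples between fault impacts
    impulses[::period] = 1.0              # unit fault impulses
    kernel = np.exp(-B * t) * np.sin(2.0 * np.pi * res_freq * t)
    return a * np.convolve(impulses, kernel)[:n]


def step(measured, action):
    """One environment step: simulate with (a, B) and return (state, reward)."""
    a, B = action
    simulated = simulate_response(a, B, n=len(measured))
    denom = np.linalg.norm(measured) * np.linalg.norm(simulated)
    similarity = float(np.dot(measured, simulated) / denom) if denom > 0 else 0.0
    state = 1.0 - similarity  # assumed error definition
    reward = -state           # assumed: smaller error gives larger reward
    return state, reward


measured = simulate_response(0.05, 250.0)  # stand-in for a real measurement
state, reward = step(measured, (0.05, 250.0))
```

In this simplified form, a acts as a pure amplitude scale, to which cosine similarity is insensitive, so only B changes the state here. The load-dependent amplitude of the paper's model is not reproduced in this sketch.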

Conclusions
Based on the aforementioned work, the following conclusions can be drawn.

• The signal-based model can represent the bearing response under normal and fault conditions. The parameters of the load proportional factor (a) and decaying parameter (B) can be used to identify the fault position and fault dynamics.

• The digital twin can be used to generate the bearing's acceleration response with high similarity in both time and frequency domains to the real measurement data from the test bench.

Outlook
Bearing fault modeling is only the basis for a series of follow-up studies in prognostics and health management (PHM). The signal simulated by the bearing digital twin may contain information on the bearing health status and degradation dynamics. Further research can therefore pursue the following directions:

• Exploring the extension of the current signal-based model to characterize defect profiles and accommodate multiple defects is a potential direction for further research.

• Integrating more complex physics models into the digital twin is another area of interest. For instance, combining the Archard model with the signal-based model would enable the modeling of dynamic wear loss and profiles, moving beyond the current focus on static defect size.

Figure 1. A rolling element traveling into the defect zone located on the outer raceway [22].

Figure 3. Envelope spectra under different load factors.

Figure 5. Structure of bearing digital twin's parameters identification based on DDPG.
(a) Critic network structure. (b) Actor network structure.

Figure 8. Inner race fault defect frequencies of first five orders.

Figure 9. Training results of digital twin under different environments.

Figure 10. Training results of digital twin (time domain and frequency domain).

Table 1. Parameters of DDPG critic network.

Table 2. Parameters of DDPG actor network.

Table 3. Parameters of DDPG neural network.

Table 5. Bearing specifications of different datasets.

Table 6. Optimal states in training of DDPG.

Table 8. Validation results based on cosine similarity.

Table 9. Hyperparameters used in digital twin as baseline.

Table 10. Spectrum error in ablation experiment results.

Table 11. Spectrum error in modeling using different datasets.