Transfer Learning Based Method for Frequency Response Model Updating with Insufficient Data

The precision of finite element model updating depends heavily on sufficient vibration feature extraction. However, collecting an adequate number of samples is generally time-consuming in frequency response (FR) model updating, so accurate vibration feature extraction with insufficient data has become a significant challenge. To update the finite element model with a small dataset, a novel approach based on transfer learning is proposed in this paper. A readily available fault diagnosis dataset is selected as ancillary knowledge to train a high-precision mapping from FR data to updating parameters. The proposed transfer learning network is constructed with two branches: a source domain feature extractor and a target domain feature extractor. To account for the cross-domain feature discrepancy, a domain adaptation method is designed that embeds the extracted features into a shared feature space to train a reliable model updating framework. The proposed method is verified on a simulated satellite example. The comparison results show that the method markedly reduces the dependency on sample amount and that, with a small dataset, the updated model is more accurate than the one obtained without transfer learning. Furthermore, the updated model is validated through dynamic responses outside the training set.


Introduction
Model updating is an important topic in dynamic analysis and structural engineering [1,2]; it aims to improve the reliability of the finite element model. Frequency response (FR) is commonly used as the updating objective in model updating algorithms [3][4][5]. Model updating performance relies heavily on the number of samples [6,7]. In methods based on deep neural networks in particular [8], the required number of training samples is generally large. Unfortunately, collecting adequate samples is extremely time-consuming in practice [9,10]. Insufficient data has therefore become a common obstacle in FR model updating [11], and reducing sample dependency is desirable.
However, in previous model updating methods, researchers have generally addressed the small sample problem by replacing the repeated FR calculation with approximate functions [12,13], so that the complicated dynamic propagation function is roughly replaced by a simplified formula. Fang et al. (2015) employed a polynomial-based response surface model to reduce repeated natural frequency calculation in interval model updating [14]. Yin et al. (2019) designed an acceleration FR objective function and applied the Kriging model to replace this function [15]. Deng et al. (2017) used the radial basis function model to simplify the natural frequency calculation [16]. These simplified meta-models are established from manually extracted FR features, which are unreliable for representing complicated vibration characteristics and distinguishing inconspicuous vibration signals [17,18]. Therefore, how to remedy [...]
[...] updating problem will be transformed to a forward optimization problem where the optimal result is the final updating result. This optimization problem is as follows [7]:
$$\min_{\theta} F(\theta) \quad \text{s.t.} \quad \theta_{iL} \le \theta_i \le \theta_{iU}$$
where $F$ represents the residual function, which is the optimization objective, and $\theta_{iL}$ and $\theta_{iU}$ stand for the lower and upper bounds of the updating parameter $\theta_i$, respectively. The residual function measures the weighted mismatch between the experimental and simulated acceleration FR amplitudes over all sampling locations and frequency points, where $w_i$ denotes the weights, $n_\theta$ is the number of measured sampling locations, $n_\omega$ is the number of selected frequency points of the FR, $\omega_i$ is a selected frequency point, and $A_n^e(\omega_i)$ and $A_n^a(\omega_i)$ denote the experimental and simulated acceleration frequency response amplitudes at $\omega_i$.
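As a rough illustration of such a residual, the sketch below evaluates a weighted sum of squared amplitude differences over sampling locations and frequency points and clips candidate parameters to their bounds. The squared-difference form and the names `fr_residual` and `clip_to_bounds` are assumptions made for illustration; the definitions above fix only the symbols involved, not this exact expression.

```python
def fr_residual(exp_amp, sim_amp, weights):
    """Weighted squared mismatch between experimental and simulated FR amplitudes.

    exp_amp[n][i] and sim_amp[n][i] are the amplitudes at sampling location n
    and frequency point i; weights[i] is the weight of frequency point i.
    ASSUMPTION: the squared-difference form is illustrative only.
    """
    total = 0.0
    for exp_row, sim_row in zip(exp_amp, sim_amp):
        for w, a_e, a_a in zip(weights, exp_row, sim_row):
            total += w * (a_e - a_a) ** 2
    return total


def clip_to_bounds(theta, lower, upper):
    """Project each updating parameter onto its interval [lower, upper]."""
    return [min(max(t, lo), hi) for t, lo, hi in zip(theta, lower, upper)]
```

An optimizer would repeatedly simulate the FR for a candidate parameter vector, evaluate `fr_residual` against the measured data, and keep the candidate inside the bounds with `clip_to_bounds`.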
However, the residual function is inadequate to represent the structural dynamic features, and manual feature extraction is inexact for complex FR features. Hence, a high-precision inverse mapping from raw FR data to the updating parameters is proposed to overcome the precision loss of manual feature extraction [8]. The inverse mapping can be formulated as follows:
$$\theta = I_{A\theta}(A_t)$$
where $I_{A\theta}$ is the inverse mapping from the experimental FR data $A_t$ to the updating parameters $\theta$. However, massive samples are essential to train a reliable network. Thus, the transfer learning technique is adopted to reduce the number of FR samples required.

Transfer Learning
Transfer learning applies the knowledge and skills available from previous domains to a novel domain. The known domain is the source domain $D_s = \{\chi_s, P_s(X_s)\}$ and the new domain with a different distribution is the target domain $D_t = \{\chi_t, P_t(X_t)\}$ [34], where $\chi_s$ and $\chi_t$ refer to the sample spaces of the source and target domains. $X_s$ denotes the sufficient source domain samples for the source task $T_s$ and $X_t$ denotes the scanty target domain samples for the target task $T_t$, where $\chi_s \in X_s$ and $\chi_t \in X_t$. In this paper, the labeled bearing fault diagnosis data lie in the sample space of the source domain with the distribution $P_s(X_s)$, and $P_t(X_t)$ stands for the distribution of the frequency response data in the target sample space, as described in Figure 1. In various research fields, fine-tuning a pre-trained network is an effective strategy when the source and target data have the same distribution [35]. However, in this paper, the feature distributions of the FR data and the fault vibration signal are different (namely $P_s(X_s) \neq P_t(X_t)$). The model updating problem therefore cannot be solved appropriately by directly fine-tuning a pre-trained network with target data, and domain adaptation is necessary to map features between the different sample spaces.

Maximum Mean Discrepancy
Domain adaptation (DA) is an important technology in transfer learning [26]. It aims to map the source and target domain data into a similar feature space while minimizing the discrepancy between the two feature spaces. The target knowledge in the shared feature space is then learnt to improve the accuracy of the target task. Maximum mean discrepancy (MMD) is widely used to quantify domain discrepancy in DA and is defined as follows [27]:
$$\mathrm{Dis}[\Psi_s(X_s), \Psi_t(X_t)] = \left\| \frac{1}{n_s}\sum_{r=1}^{n_s}\Psi_s(\chi_s^r) - \frac{1}{n_t}\sum_{r=1}^{n_t}\Psi_t(\chi_t^r) \right\|_{\mathcal{H}}^{2} \tag{4}$$
where $\mathrm{Dis}[\Psi_s(X_s), \Psi_t(X_t)]$ is the discrepancy function, $\mathcal{H}$ represents the reproducing kernel Hilbert space (RKHS), $\Psi_s(X_s)$ and $\Psi_t(X_t)$ refer to the nonlinear mapping functions of the source and target domains from the original feature space to the RKHS, $n_s$ and $n_t$ are the numbers of source and target samples, and $\chi_s^r$ and $\chi_t^r$ refer to the samples in $X_s$ and $X_t$, respectively. After feature mapping, the distribution discrepancy in the new feature space is diminished, namely $P_s(X_s) \approx P_t(X_t)$.

Procedure of Proposed Method
In this section, the procedure of the TLNet method is introduced. The method of reference [8] (hereafter abbreviated as the UCNN method) achieves a high-precision updated model with sufficient training data and is therefore introduced as a comparison. The procedures of the two methods are displayed in Figure 2. It can be inferred that the two methods are similar in target data preparation and forward propagation, but differ in transfer learning and network architecture. Source domain knowledge is learnt through auxiliary training in the TLNet method, which is the major difference between the two methods. Additionally, domain adaptation is necessary in the TLNet method to learn the cross-domain auxiliary knowledge.

Domain Adaptation
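The features extracted by the network are used to quantify the discrepancy between the source and target domains, as formulated by the MMD in Equation (4). After the source and target domain data are embedded into a shared feature space, the distribution discrepancy is diminished by training the parameters of the nonlinear mapping functions to minimize the MMD. Define matrices $K$ and $L$ as follows [34]:
$$K = \begin{bmatrix} K_{s,s} & K_{s,t} \\ K_{t,s} & K_{t,t} \end{bmatrix}$$
and
$$L_{ij} = \begin{cases} \dfrac{1}{n_s^2}, & \chi^i, \chi^j \in X_s \\ \dfrac{1}{n_t^2}, & \chi^i, \chi^j \in X_t \\ -\dfrac{1}{n_s n_t}, & \text{otherwise} \end{cases}$$
where:
$$(K_{s,t})_{ij} = k(\chi_s^i, \chi_t^j) = \left\langle \Psi_s(\chi_s^i), \Psi_t(\chi_t^j) \right\rangle_{\mathcal{H}}$$
and $K_{s,s}$, $K_{t,s}$, and $K_{t,t}$ are defined analogously. Then the discrepancy function can be simplified as follows:
$$\mathrm{Dis}[\Psi_s(X_s), \Psi_t(X_t)] = \mathrm{tr}(KL)$$
where $\mathrm{tr}(KL)$ stands for the trace of $KL$. Generally, $n_s$ and $n_t$ are the same in one batch. Therefore, with $n_s = n_t = n$, the discrepancy function to be minimized can eventually be written as follows:
$$\min \mathrm{Dis} = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left[ k(\chi_s^i, \chi_s^j) - 2\,k(\chi_s^i, \chi_t^j) + k(\chi_t^i, \chi_t^j) \right]$$
Considering the substantial distribution difference between the original source and target data, domain adaptation is applied at two parts of the network to enhance the effect.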
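The trace form of the discrepancy can be checked numerically on a toy batch. The sketch below assumes the standard construction, in which $K$ is the kernel matrix over the stacked source and target samples and $L$ holds the coefficients $1/n_s^2$, $1/n_t^2$, and $-1/(n_s n_t)$; it uses a linear kernel purely for illustration, and the function names are hypothetical. With a linear kernel the feature map is the identity, so $\mathrm{tr}(KL)$ should equal the squared distance between the two sample means.

```python
def linear_kernel(x, y):
    """Inner product of two feature vectors (identity feature map)."""
    return sum(a * b for a, b in zip(x, y))


def mmd2_trace(source, target, kernel=linear_kernel):
    """Squared MMD computed as tr(KL) over the stacked source/target samples."""
    n_s, n_t = len(source), len(target)
    stacked = list(source) + list(target)
    n = n_s + n_t
    K = [[kernel(stacked[i], stacked[j]) for j in range(n)] for i in range(n)]

    def l_entry(i, j):
        # Coefficient matrix L: same-domain pairs are positive, cross pairs negative.
        if i < n_s and j < n_s:
            return 1.0 / (n_s * n_s)
        if i >= n_s and j >= n_s:
            return 1.0 / (n_t * n_t)
        return -1.0 / (n_s * n_t)

    # tr(KL) = sum_i sum_j K[i][j] * L[j][i]
    return sum(K[i][j] * l_entry(j, i) for i in range(n) for j in range(n))


def mmd2_mean_embedding(source, target):
    """Squared distance between sample means (matches tr(KL) for a linear kernel)."""
    dim = len(source[0])
    mean_s = [sum(x[d] for x in source) / len(source) for d in range(dim)]
    mean_t = [sum(x[d] for x in target) / len(target) for d in range(dim)]
    return sum((a - b) ** 2 for a, b in zip(mean_s, mean_t))
```

In a trained network the samples would be the feature vectors produced by the two branches, and a nonlinear kernel such as a Gaussian could be passed in place of `linear_kernel`.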

Source Domain Sample
The CWRU dataset is chosen as the auxiliary training knowledge. This source domain dataset was acquired from accelerometers on a motor-driven mechanical system [33]. Artificial single-point faults were introduced on the rolling bearings. The damage diameters were 0.007 inch (0.1778 mm), 0.014 inch (0.3556 mm), and 0.021 inch (0.5334 mm). The damage points on the outer ring of the bearing at the drive end and the fan end were placed at three different positions: 3 o'clock, 6 o'clock, and 12 o'clock [36]. In this paper, the bearing data are classified into 10 categories: normal, inner ring, outer ring, and ball, with different damage positions and different damage diameters.
Since the dimensions of the source and target domain data are different, preprocessing is necessary before feature extraction. The source domain sample is a continuous time-domain response, which needs to be Fourier transformed to the frequency domain. A short stable segment of the signal with 1024 sampling points is sliced as a frame, as described in Figure 3. Adjacent frames overlap rather than following one another end to end. The distance in samples between the start positions of two adjacent frames is called the shift. The frame shift in this paper is 512 sampling points, so the overlap is also 512 sampling points. The longer the overlap, the shorter the frame shift and the more frames can be obtained. Therefore, framing with overlap is an effective way to expand the sample set. Finally, the source domain sample is transformed to the size 1 × 11 × 1024.
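The overlapped framing can be sketched in a few lines. The function name `slice_frames` and the choice to drop a trailing partial frame are assumptions for illustration; the frame length of 1024 and the shift of 512 are the values stated above.

```python
def slice_frames(signal, frame_len=1024, shift=512):
    """Cut a 1-D signal into fixed-length frames whose starts are `shift` apart.

    The overlap between adjacent frames is frame_len - shift samples.
    A trailing segment shorter than frame_len is discarded (an assumption).
    """
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += shift
    return frames
```

For a signal of 4096 samples, a shift of 512 yields 7 frames, whereas non-overlapping framing (shift of 1024) yields only 4, which shows how the overlap expands the sample set.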

Target Domain Sample
In this paper, the FR image and the corresponding updating parameters form the training pair for neural network training. In the TLNet method, a matrix assembled from the FR data is first normalized to the range 0-255. After normalization, the FR data are converted to a multichannel image without manual feature extraction such as principal component analysis, reduction, or fitting. The channels, width, and height of the image denote the acceleration orientation, the number of sampling locations, and the number of measured frequency points of the FR signal, as described in Figure 4. In Section 4, horizontal acceleration FR data at 11 sampling locations are measured from 0 to 100 Hz at 1 Hz intervals. Consequently, the FR image size is 1 × 11 × 101. The target training set is established from the simulation results of the model to be updated.
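A minimal sketch of this conversion is given below. It assumes a global min-max scaling over the whole matrix and rounding to integers; the text specifies only the 0-255 range, so both choices, along with the name `fr_to_image`, are illustrative assumptions.

```python
def fr_to_image(fr_matrix):
    """Scale an FR amplitude matrix to 0-255 and wrap it as a one-channel image.

    fr_matrix[n][i] is the amplitude at sampling location n and frequency point i.
    ASSUMPTION: global min-max scaling with rounding; other schemes are possible.
    Returns a nested list of shape 1 x n_locations x n_frequencies.
    """
    flat = [v for row in fr_matrix for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo if hi > lo else 1.0  # avoid division by zero for flat data
    scaled = [[round(255 * (v - lo) / span) for v in row] for row in fr_matrix]
    return [scaled]
```

With 11 sampling locations and 101 frequency points (0 to 100 Hz at 1 Hz intervals), the returned image has the 1 × 11 × 101 shape quoted above.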

Network Architecture
In this paper, a deep convolutional neural network is adopted as the feature extractor for the training samples. To take advantage of both target and source data, the network is designed with two branches: a target branch for the model updating task and a source branch for the fault diagnosis task. The features of the FR data are extracted layer by layer in the target branch, and the bearing vibration frames are learnt in the source branch. Figure 5 displays the details of the proposed network. The network structure is designed as follows:
1. NetT1 block: a block of four convolutional layers that extracts features of the FR data at different sampling locations and reshapes the feature map to the size of 1 × 1 × 101.
2. NetS1 block: a block of four convolutional layers and one flatten layer that extracts features of the bearing vibration and reshapes the feature map to the same size as the output of the NetT1 block.
3. NetT2 block: a five-layer network block that extracts features from the NetT1 output.
4. NetS2 block: a block with the same structure as NetT2 that extracts features from the NetS1 output.
5. Output layers: a flatten layer followed by two fully connected layers that reshape the feature map for the source and target tasks, respectively.
In the TLNet method, the network is trained with scanty FR samples and massive fault diagnosis samples. The training pairs of the two domains are sent to their respective branches of the network. First, the low-level features are extracted in each branch: through the convolutional layers of the NetT1 and NetS1 blocks, the target and source features are transformed to the same size so that the domain discrepancy can be computed. Second, the features of the two domains are mapped to a shared feature space by the first domain adaptation. Third, the NetT1 and NetS1 outputs are sent to two network branches with the same structure (namely the ShareNet) to extract the deep-level features. Fourth, the second domain discrepancy is calculated, and the output features of the two shared branches are mapped to a common feature space again by the second, enhanced domain adaptation. Finally, the source and target tasks are completed through the output layers. The kernel size, stride size, and channel number of each TLNet layer are listed in Table 1.

It can be inferred that the features from earlier layers are sufficiently propagated to later layers through the feature maps in TLNet. The domain discrepancy is narrowed through the two domain adaptation steps. Eventually, both the fault diagnosis task and the model updating task are completed through this network.
In this paper, the network is implemented in the machine learning framework PyTorch 1.4.0 [37]. The first-order gradient-based stochastic optimization algorithm Adam is used in training [38].

Model Updating
After training, the measured FR image of the real structure is sent to the trained model updating network. The network output is then the value of the target parameters, namely the final updated parameters of the FE model under study.

Case Study
A satellite model is presented to demonstrate the feasibility and effectiveness of the TLNet method. Figure 6 plots the FE model of the satellite and the sampling locations.

Example Introduction
The updating parameters are the material parameters and the thicknesses of the structure: the elastic modulus of the major structure $\theta_1$, the density of the major structure $\theta_2$, the thickness of the upper platform $\theta_3$, the thickness of the lower platform $\theta_4$, the thickness of the central cylinder $\theta_5$, and the thickness of the shear panels $\theta_6$. The real values of these parameters are listed in Table 2. Horizontal acceleration FR data at 11 sampling locations are collected from 0 to 100 Hz at 1 Hz intervals with the finite element software MSC Patran and Nastran. Repeated simulations are run to build the database of the initial model. The FR data in the X orientation are first normalized to 0-255 and then transformed into FR images in sequence. In this section, the FR data and the CWRU dataset are chosen to train the network. After training, the Z orientation data, which are outside the training set, are sent to the trained network for model validation.
In this paper, the loss function (MMD2) combines the task losses of the two branches with weighted feature losses. In this loss, $\theta_t^{output}$ and $\theta_t^{label}$ represent the network output and the label in the target domain, $X$ and $\tilde{X}$ stand for the inputs of the NetT1 and NetT2 (or NetS1 and NetS2) blocks, $\Psi$ and $\tilde{\Psi}$ denote the outputs of the NetT2 (or NetS2) block, CE refers to the cross-entropy loss for the source classification task, MSE refers to the mean squared error loss for the target regression task, and $\eta$ stands for the feature loss factor of each loss function. Another loss function (MMDMSE) is designed for comparison.
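As a rough sketch of how such a composite loss could be assembled, the code below adds a regression loss for the target branch, a classification loss for the source branch, and weighted feature losses. The additive form, the helper names `mse`, `cross_entropy`, and `total_loss`, and the convention that the domain discrepancy values are computed elsewhere and passed in are all assumptions made for illustration; this is not the paper's exact definition of MMD2 or MMDMSE.

```python
import math


def mse(pred, label):
    """Mean squared error for the target regression task."""
    return sum((p - y) ** 2 for p, y in zip(pred, label)) / len(pred)


def cross_entropy(logits, class_index):
    """Cross-entropy of one sample from raw logits (numerically stable log-sum-exp)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[class_index]


def total_loss(theta_out, theta_label, src_logits, src_class, discrepancies, etas):
    """ASSUMED additive composition: task losses plus eta-weighted feature losses.

    discrepancies holds one precomputed domain discrepancy per adaptation point;
    etas holds the matching feature loss factors.
    """
    task = mse(theta_out, theta_label) + cross_entropy(src_logits, src_class)
    feature = sum(eta * d for eta, d in zip(etas, discrepancies))
    return task + feature
```

Under this reading, the two entries of `discrepancies` would correspond to the two domain adaptation points of the network, and setting both factors in `etas` to zero reduces the loss to the two task losses alone.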

Result and Discussion
After training, the experimental FR data are sent to the trained network, and the network output is the updating result. The parameters updated by two methods without TL (the UCNN method and TLNet without the source branch) and two methods with TL (the MMDMSE and MMD2 loss functions) are compared. Figure 7 shows the average errors of these four methods as the number of training samples increases. The figure shows that the methods with domain adaptation (MMDMSE and MMD2) perform better with insufficient data. With 100 FR samples, the average errors of the TL methods are 3.302% and 3.070%, while the average errors of the methods without transfer learning are 8.980% and 7.352%. As the number of samples increases, the average errors of all four methods tend to decrease. When the number of samples reaches 4000 or more, the accuracy appears to be stable. The result implies that the proposed method performs better than the methods without transfer learning when the target training samples are extremely insufficient. This indicates that the representational ability of the proposed network is improved with the help of vibration features learnt from the source data, so the precision of the network output can be improved even with insufficient data. The FR signal of the No. 1 sampling location with increasing training sample size is plotted in Figure 8.
A visual comparison of the model frequency response with the experimental data shows that the FR curve of the updated model closely coincides with that of the real structure. The resonance peak amplitudes and positions resemble those of the experimental amplitude curve, which confirms that the model precision is substantially improved by the proposed method. As the sample size increases, the FE models updated by the TL methods still perform better than those updated without TL. Specifically, the MMD2 loss is slightly better than the MMDMSE loss. The result suggests that the loss function based on MMD works well in mapping the cross-domain features to a more similar feature space and achieves a better reduction of the domain discrepancy.
The final updating results with 4000 samples are presented in Table 2. The deviation between the updated and the real parameters is significantly lessened. The average errors of the updated parameters with transfer learning are 0.257% and 0.145%, which are lower than those of the methods without transfer learning.
Furthermore, the frequency response assurance criterion (FRAC) is selected to assess the similarity between the updated simulation outputs and the experimental measurements [39]. With 200 samples, the FRAC of the MMD2 loss method is 0.963, while that of the UCNN method is 0.73. When the number of samples increases to 2000, the FRAC is 0.999 with MMD2 and 0.755 with UCNN. This indicates that the outputs of the simulation model updated by the proposed method are closer to the experimental response, which demonstrates the superiority of the TLNet method when the sample size is limited. A review of the updating results indicates that the model updating accuracy is successfully improved when samples are lacking.
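For reference, the FRAC between two frequency response vectors is commonly computed as the squared magnitude of their inner product divided by the product of their squared norms, which gives a value between 0 and 1. The sketch below follows that common definition; the function name `frac` and the list-based inputs are illustrative, and the exact variant used in [39] may differ.

```python
def frac(h_exp, h_sim):
    """Frequency response assurance criterion between two FR vectors.

    h_exp and h_sim are equal-length sequences of (possibly complex) FR values
    at the same frequency points. Returns a value in [0, 1]; 1 means the two
    responses have the same shape up to a constant scale factor.
    """
    cross = sum(e.conjugate() * s for e, s in zip(h_exp, h_sim))
    norm_e = sum(abs(e) ** 2 for e in h_exp)
    norm_s = sum(abs(s) ** 2 for s in h_sim)
    return abs(cross) ** 2 / (norm_e * norm_s)
```

Because the criterion is insensitive to a uniform amplitude scale, it complements the parameter errors: it measures whether the shapes of the simulated and measured response curves agree.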

Model Validation
To further evaluate the updated model, it is validated with the FR data in the Z orientation and the first five natural frequencies, both of which are excluded from the training set. Figure 9 displays the frequency response in the Z orientation of the four FE models updated by the methods introduced above as the size of the training set increases gradually. With a sample size of 100, the simulated output of the FE model updated with the MMD2 loss matches the measured FR signal better. The figure shows that the frequency response of the updated model coincides better with that of the real structure, even though the Z orientation samples are excluded from training.
Furthermore, the updated model is also validated by natural frequency. Figure 10 shows the average error of the first natural frequency updated by the four methods. It implies that the average error is lower for the MMD2 method with limited training samples. As the number of training samples increases, the model accuracy of MMD2 remains higher than that of the methods without TL. Table 3 shows the natural frequencies of the updated model with 4000 training samples. The average natural frequency error of the MMD2 method is 0.012%, which is 0.586% lower than that of the UCNN method. This implies that the MMDMSE and MMD2 losses also perform better in natural frequency prediction.
These validation results illustrate that the TLNet method performs well both inside and outside the training set. The updated model achieves better accuracy with this method, and the proposed approach is able to reduce the sample size required for model updating.

Conclusions
A model updating method based on transfer learning is proposed to tackle the small dataset problem. Using a two-branch deep neural network, a high-level feature extractor is employed to learn the inverse relationship between the FR data and the updating parameters. To make full use of the source domain knowledge, a two-layer domain adaptation strategy is adopted that maps the cross-domain vibration features into a shared space. The cross-domain knowledge can therefore be used to train a more reliable learning system. Finally, a high-precision inverse mapping from FR data to updating parameters is achieved.
The proposed method is tested on a satellite example with various numbers of training samples, in which material and geometry parameters of the satellite model are updated. The results indicate that the proposed method achieves higher-precision updating parameters, and the model updated by the proposed method is more accurate than those updated by the comparison methods without transfer learning. This indicates that the vibration features from fault diagnosis can be captured by the learning system and provide more useful information for vibration analysis. The results also reveal that TLNet has a higher feature extraction ability than networks trained only on target samples. The method therefore offers a significant advantage in model updating with insufficient data, and the updated model also predicts responses outside the training set more accurately.
Future studies can be extended to applications that are sensitive to sample size, such as the uncertainty model updating problem, where the method could be used to reduce the number of samples needed in uncertainty propagation analysis.