An Adaptive Operational Modal Analysis under Non-White Noise Excitation Using Hybrid Neural Networks

To adaptively identify the modal parameters of time-invariant structures excited by non-white noise, this paper proposes a new operational modal analysis (OMA) method using hybrid neural networks. Taking the acceleration response directly as the network input both simplifies data processing and retains all the characteristics of the data. The network output is the response data processed by an output function, whose peaks correspond to the modal frequencies; this output function greatly reduces the computational cost. In addition, a small sample dataset allows the hybrid neural networks to identify the modal parameters accurately within a short training time. The hybrid neural networks combine the advantages of the convolutional neural network (CNN) and the gated recurrent unit (GRU). To illustrate the advantages of the proposed method, a cantilever beam and a rudder surface excited by white and non-white noise are taken as examples for experimental verification. The results show that the proposed method has strong anti-noise ability and high recognition accuracy, and is not limited by the type of ambient excitation.


Introduction
OMA identifies structural modal parameters from the response alone, using time-domain or frequency-domain techniques. Modal characteristics are used in various structural health monitoring applications [1][2][3], so modal analysis plays a key role in many engineering applications [4][5][6].
Recently, various methods for identifying the modal parameters of time-invariant structures have been proposed [7][8][9][10]. The simplest and most typical one is the peak picking (PP) method [11]. The frequency-domain decomposition technique, which has been widely studied [12][13][14][15], is constantly being improved; for instance, the enhanced frequency-domain decomposition (EFDD) algorithm estimates not only the natural frequencies but also the modal damping ratios [16]. Further, the frequency and spatial domain decomposition (FSDD) technique is one of the most successful modifications of classical frequency-domain decomposition and provides a more sensitive modal damping estimate [17]. Compared with frequency-domain methods, time-domain methods obtain the system parameters directly from the response signal. Thus, stochastic subspace identification (SSI), autoregressive moving average time series analysis, and their improved variants have been widely used in engineering practice [18][19][20]. Methods based on the random decrement technique, the Ibrahim time-domain method, the natural excitation technique, the eigensystem realization algorithm, and other techniques are also powerful modal analysis tools for time-invariant structures [21][22][23][24]. In addition, the least-squares complex exponential method, the Hilbert-Huang transform, and their improved variants have appeared one after another [25][26][27]. Despite this abundant literature, most traditional OMA methods only identify time-invariant structures excited by white noise [28]. In reality, however, many ambient excitations do not comply with the white noise assumption [29]. Researchers have therefore begun to study OMA under non-white noise excitation. For instance, an OMA method based on the response correlation function has been proposed for rotating structures [30].
In addition to the approaches above, the modal parameters of structures can also be identified by a neural network. In theory, a neural network can approximate any function, and it has strong nonlinear mapping ability and adaptability [31].
With the deepening of neural network research, neural networks have been successfully applied in many fields, for instance, optimization, signal processing, health monitoring, and fault diagnosis [32][33][34]. In Ref. [35], a random CNN was proposed that automatically extracts signal features through convolution to monitor the working conditions of a diesel engine. Moreover, Yu et al. proposed a residual neural network based on the characteristics of vibration signals [36]. In Ref. [37], an improved CNN-based fault diagnosis method was proposed, which uses a light neural network to process the original signal. These methods and their practical engineering applications owe their success to the ability of neural networks to extract signal characteristics. Owing to the unique advantages of neural networks in data mining and feature extraction [38,39], network-based OMA methods came into being. For instance, a three support vector regression method using observed vibration data was proposed [40]. In Ref. [41], an OMA method based on the back propagation network was also proposed. With the development of deep learning, a method based on an uncertainty diagram and a CNN was proposed [42]. Research on OMA methods using neural networks is therefore valuable and promising. The universal approximation theorem proposed by Cybenko [43] shows that a simple neural network can approximate the target function arbitrarily closely, given appropriate parameters. However, the training algorithm is not guaranteed to learn that function. In many cases, a deeper model reduces the number of units required and the generalization error. Nevertheless, a large amount of sample data increases the number of model parameters, which leads to long training times and unstable training.
To address these concerns, a hybrid neural network based on CNN and GRU is proposed. By combining the strengths of the CNN and the GRU, the hybrid neural network can extract features from long sequences with a small sample dataset, which reduces the computational cost. In the hybrid network, the CNN, which has strong feature extraction ability, first extracts preliminary features, and these features are then fed into the GRU for deeper learning. This reduces the number of network parameters and layers, and thereby shortens training time and speeds up convergence. The rest of this work is structured as follows. The background is detailed in Section 2. The method and its simulation verification are provided in Section 3. The experimental verification is provided in Section 4. Finally, conclusions are given in Section 5.

Operational Modal Analysis
In a linear system with N degrees of freedom, the relationship between the power spectral density (PSD) matrices of the response and the excitation is given by Equation (1), where Syy and Sxx are the response and excitation PSD matrices, respectively, '*' denotes the complex conjugate, and 'T' denotes the transpose. H(jω) is the frequency response function (FRF) matrix given by Equation (2), where sr = −pr + j qr is the rth pole, pr is the damping factor, and qr is the damped modal frequency. Additionally, Rr is the residue, φr is the modal shape, and γr is the modal participation vector. Combining Equations (1) and (2) expresses the response PSD in terms of the poles sr and the residue matrices Ar. In general, the damping of the structure is small. If sr = −pr + j qr satisfies pr ≪ qr, then Ar ≈ dr φr* φrT, where dr is a real number.
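The displayed relations are not reproduced in this text. As a hedged sketch only, the input-output PSD relation and the pole-residue form of the FRF that match the definitions above are usually written as follows in the classical frequency-domain decomposition literature. The factorization R_r = φ_r γ_r^T and the summation limits are assumptions taken from that literature, not a transcription of the authors' Equations (1) and (2).

```latex
\begin{align}
  S_{yy}(j\omega) &= H^{*}(j\omega)\, S_{xx}(j\omega)\, H^{T}(j\omega), \\
  H(j\omega) &= \sum_{r=1}^{N} \left( \frac{R_r}{j\omega - s_r}
               + \frac{R_r^{*}}{j\omega - s_r^{*}} \right),
  \qquad R_r = \phi_r \gamma_r^{T}, \quad s_r = -p_r + j q_r .
\end{align}
```

Under a flat S_xx and light damping, the response PSD near the rth resonance is dominated by the term associated with s_r, which is why the residue matrix A_r reduces to a rank-one matrix built from φ_r.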
The response PSD matrix can be decomposed by singular value decomposition (SVD) as in Equation (8), where the superscript 'H' denotes the conjugate transpose, U ∈ C^(Q×N) is a unitary matrix that contains the singular vectors and corresponds to Ψ in Equation (7), and Λ ∈ R^(N×N) is the diagonal matrix of real singular values. Only the dominant modes are analyzed, and an enhanced PSD is defined for the rth mode. Near the rth spectral line, the enhanced PSD function is formed from the singular value of the response spectrum matrix corresponding to Ur.
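The SVD step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names `psd_matrix` and `first_singular`, the Hann-windowed block averaging, and the synthetic three-channel signal are all assumptions made for illustration.

```python
import numpy as np

def psd_matrix(y, fs, nseg):
    """Averaged cross-PSD matrix of a response y with shape (channels, samples).

    Hypothetical helper: non-overlapping Hann-windowed blocks are transformed
    and the outer products of their spectra are averaged.
    Returns (freqs, S) with S of shape (n_freqs, channels, channels).
    """
    n_ch, n = y.shape
    n_blocks = n // nseg
    win = np.hanning(nseg)
    S = np.zeros((nseg // 2 + 1, n_ch, n_ch), dtype=complex)
    for b in range(n_blocks):
        seg = y[:, b * nseg:(b + 1) * nseg] * win
        Y = np.fft.rfft(seg, axis=1)                 # (channels, n_freqs)
        S += np.einsum("if,jf->fij", Y, Y.conj())    # Y Y^H at every line
    S /= n_blocks * fs * np.sum(win ** 2)
    return np.fft.rfftfreq(nseg, d=1.0 / fs), S

def first_singular(S):
    """SVD at every spectral line; returns the first singular value and vector."""
    U, sv, _ = np.linalg.svd(S)                      # batched over frequency
    return sv[:, 0], U[:, :, 0]

# Synthetic example (assumed data): one 50 Hz mode with shape (1, 0.5, -0.5).
rng = np.random.default_rng(0)
fs = 1000.0
t = np.arange(8000) / fs
shape = np.array([1.0, 0.5, -0.5])
y = shape[:, None] * np.sin(2 * np.pi * 50.0 * t) + 0.1 * rng.standard_normal((3, t.size))

freqs, S = psd_matrix(y, fs, nseg=1000)
sigma1, u1 = first_singular(S)
k = int(np.argmax(sigma1))
peak_freq = freqs[k]                                 # frequency of the dominant peak
shape_est = np.abs(u1[k] / u1[k, 0])                 # mode-shape magnitudes at the peak
```

The first singular vector at the peak recovers the mode shape only up to a complex scale factor, which is why the sketch normalizes by the first channel.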
Substituting sr into Equation (11) gives an explicit expression for the enhanced PSD function. The response signal therefore contains all the required modal information. The least-squares solution of the row vector B is obtained by evaluating the expression at frequencies ωi in the range of interest, and the corresponding modal frequencies and damping then follow.
The above derivation assumes that the input has a flat spectrum (white noise excitation). In many practical cases, however, the ambient excitation is non-white noise [29,30]. For this reason, a general output function is proposed that is no longer limited to the white noise assumption and whose peaks correspond to the modal frequencies of each order.
where Fh denotes the frequency of the hth peak of the singular value curve and Nr is the number of orders considered in this paper. For the output function, a find-peaks function is used to identify the frequency of each order.
where FP(x) denotes the search for the peaks of the data x, so the modal frequencies of each order can be read directly. Additionally, the output function depends only on the peak points of the singular value curve, which greatly reduces the computational cost and makes it suitable for mixed noise excitation. The modal shape Ψ in Equation (7) corresponds to the singular vector matrix U in Equation (8), so the modal shape can be obtained from the corresponding left singular vector. Based on fp, the damping of the rth mode can be obtained according to Equations (11)-(15).
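Returning to the peak-picking step, the sketch below shows one way a peak-only output could be built from a singular value curve. The exact output function is not reproduced in this text, so `output_vector` is a hypothetical stand-in that keeps the curve at its selected peaks and sets it to zero elsewhere. The simple local-maximum search stands in for a library find-peaks routine, and the curve itself is synthetic.

```python
import numpy as np

def peak_indices(sigma1, n_modes):
    """Indices of the n_modes highest local maxima of a singular value curve."""
    s = np.asarray(sigma1, dtype=float)
    # interior points higher than the left neighbour and not lower than the right
    idx = np.flatnonzero((s[1:-1] > s[:-2]) & (s[1:-1] >= s[2:])) + 1
    top = idx[np.argsort(s[idx])[::-1][:n_modes]]
    return np.sort(top)

def output_vector(sigma1, n_modes):
    """Hypothetical output function: the curve at its peaks, zero elsewhere."""
    s = np.asarray(sigma1, dtype=float)
    out = np.zeros_like(s)
    idx = peak_indices(s, n_modes)
    out[idx] = s[idx]
    return out

# Synthetic curve (assumed): five narrow resonances on a 0.625 Hz grid.
freqs = np.arange(1601) * 0.625
centers = [15.0, 95.625, 268.75, 526.875, 871.25]
curve = sum(1.0 / ((freqs - c) ** 2 + 4.0) for c in centers)

found = freqs[peak_indices(curve, 5)]     # identified frequencies
target = output_vector(curve, 5)          # sparse, peak-only output
```

Because only the peak locations are kept, the output is sparse and does not depend on the overall level or slope of the excitation spectrum between the peaks.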

CNN
Deep neural networks, such as CNN, have shown outstanding data modeling capabilities in various applications [44].
In the simplest case, the output of the convolution layer is computed from its input, weight, and bias, where '*' denotes the valid cross-correlation operation, N represents the batch size, and C denotes the number of channels. Cin and Cout are the numbers of input and output channels of the convolution layer, respectively. XC and YC are the input and output data of the convolution layer, respectively, b is the bias, and w is the weight. Adding a pooling layer after the convolution layer reduces the dimension of the feature map and hence the number of parameters passed to the fully connected layer, which speeds up computation and helps avoid overfitting.
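As a hedged illustration of the convolution layer described above, the sketch below computes a valid cross-correlation for a batch of multi-channel 1-D signals. The 1-D layout (N, C, L) and the function name `conv1d_valid` are assumptions; the paper's layer may operate on two-dimensional inputs.

```python
import numpy as np

def conv1d_valid(x, w, b):
    """Valid cross-correlation layer.

    x: (N, C_in, L) input, w: (C_out, C_in, K) weights, b: (C_out,) bias.
    Returns an array of shape (N, C_out, L - K + 1).
    """
    n, _, length = x.shape
    c_out, _, k = w.shape
    y = np.zeros((n, c_out, length - k + 1))
    for i in range(length - k + 1):
        # contract the window (N, C_in, K) with the weights over C_in and K
        y[:, :, i] = np.einsum("nck,ock->no", x[:, :, i:i + k], w)
    return y + b[None, :, None]

# Example: a difference kernel applied to a ramp gives a constant output.
x = np.arange(8.0).reshape(1, 1, 8)
w = np.array([[[1.0, 0.0, -1.0]]])
b = np.array([0.5])
y = conv1d_valid(x, w, b)
```

Each output sample is x[i] − x[i+2] + 0.5, which is −1.5 everywhere for the ramp input.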
In the sampling layer, XP and YP are the input and output data, respectively, and m and s denote the kernel size and the stride, respectively.
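The sampling layer can be sketched in the same style. Max pooling is assumed here because a max-pooling layer is named later in the paper; the function name `max_pool1d` and the 1-D layout are illustrative choices.

```python
import numpy as np

def max_pool1d(x, m, s):
    """Max pooling with kernel size m and stride s over the last axis.

    x: (N, C, L) -> (N, C, (L - m) // s + 1).
    """
    n_out = (x.shape[-1] - m) // s + 1
    windows = [x[..., i * s:i * s + m].max(axis=-1) for i in range(n_out)]
    return np.stack(windows, axis=-1)

x_p = np.array([[[1.0, 3.0, 2.0, 5.0, 4.0, 0.0]]])
y_p = max_pool1d(x_p, m=2, s=2)
```

With m = s = 2 the feature length is halved, which is the dimension reduction the text refers to.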

GRU
Unlike the long short-term memory (LSTM) network [45], the GRU has only an update gate and a reset gate. It therefore has fewer parameters and converges well. In the forward propagation of the GRU, hGt represents the hidden state at time t, and the remaining gate symbols denote the reset gate, update gate, and new gate, respectively. σ denotes the sigmoid function, and '⊙' denotes the Hadamard product.
During network propagation, the loss is defined for a single sample. The input-output relationship of a network trained with the back propagation algorithm [46] is essentially a highly nonlinear mapping. The back propagation process of the GRU is given in Equations (28)-(34). The parameters are updated and the iteration is repeated until the loss converges.
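The forward step with reset, update, and new gates can be sketched as below. The gate equations follow the commonly used GRU formulation and are an assumption, since the paper's own equations are not reproduced in this text; the parameter names in the dictionary `p` are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, p):
    """One GRU forward step for input x and previous hidden state h."""
    r = sigmoid(p["W_ir"] @ x + p["b_ir"] + p["W_hr"] @ h + p["b_hr"])   # reset gate
    z = sigmoid(p["W_iz"] @ x + p["b_iz"] + p["W_hz"] @ h + p["b_hz"])   # update gate
    n = np.tanh(p["W_in"] @ x + p["b_in"] + r * (p["W_hn"] @ h + p["b_hn"]))  # new gate
    return (1.0 - z) * n + z * h          # '*' on arrays is the Hadamard product

# Sanity check with all-zero parameters: r = z = 0.5 and n = 0, so h is halved.
n_in, n_hid = 3, 2
p = {}
for g in ("r", "z", "n"):
    p["W_i" + g] = np.zeros((n_hid, n_in))
    p["W_h" + g] = np.zeros((n_hid, n_hid))
    p["b_i" + g] = np.zeros(n_hid)
    p["b_h" + g] = np.zeros(n_hid)
h_next = gru_step(np.ones(n_in), np.array([1.0, -2.0]), p)
```

The step uses three weight groups where an LSTM uses four, which is the parameter saving mentioned above.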

Dataset Process
The cantilever beam is used as the structural model to simulate a slender aerospace structure [29], and the ambient excitation of the structure is simulated by base excitation. The beam, shown in Figure 1, is divided into 10 elements. The damping ratio of an actual aluminum beam is generally 0.01-0.03, so the damping ratio of the beam is set to 0.01 in the simulation. In practical engineering, the system is often not constant; during a rocket launch, for example, the mass of the system decreases continuously as fuel is consumed [47]. Therefore, 11 beams with different masses are constructed for the simulation. Taking the first beam as the reference, the mass changes of the other 10 beams are 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009, and 0.01 kg, so the mass difference between the first and the 11th beam is 0.01 kg. The mass change is located at the 10th node of the 10th element of the beam, and all other properties are the same. The length is 0.8 m, the section height is 0.012 m, and the width is 0.06 m. The elastic modulus is 7.1 × 10^10 Pa, and the mass density is 2770 kg/m^3. In addition, the number of spectral lines is 1600 and the analysis frequency band is 1000 Hz. The dataset is constructed as follows. Taking the response directly as the input data of the network not only simplifies the data processing but also retains all the characteristics of the data. Noise with different signal-to-noise ratios (SNRs) [48] is added to the acceleration response. The acceleration response signal is then processed by the output function (Section 2.1) to obtain the output data of the network (Figure 2). More specifically, each beam has 500 samples for model training.
Therefore, the dataset is composed of 500 × 11 samples, and each sample is a two-dimensional matrix.
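The noise-addition step of the dataset construction can be sketched as follows, assuming the SNR is expressed in decibels and the added noise is zero-mean Gaussian. Both are assumptions, since the paper does not state the noise model in this text, and `add_noise` is a hypothetical helper.

```python
import numpy as np

def add_noise(signal, snr_db, rng):
    """Add zero-mean Gaussian noise so that the result has the requested SNR (dB)."""
    signal = np.asarray(signal, dtype=float)
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / 10.0 ** (snr_db / 10.0)
    return signal + rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)

rng = np.random.default_rng(0)
t = np.arange(20000) / 2000.0
clean = np.sin(2 * np.pi * 15.0 * t)      # illustrative 15 Hz response
noisy = add_noise(clean, snr_db=20.0, rng=rng)

# SNR actually achieved by this realization
measured_snr = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

Repeating the call with different `snr_db` values produces the noisy copies of each response from which the training samples are drawn.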

Figure 2. Dataset construction process (labels: excitation point, response point, SVD, input of the network, output of the network).

The Proposed Method
Initially, the features of the acceleration response are extracted by the CNN, as shown in Section 2.2. Additionally, the Mish function [49] is used in the convolution layer because of its remarkable accuracy and generalization ability.
Here, the loss function of the whole network structure is the mean square error (MSE),
where n denotes the total number of output values, and yi and ŷi are the actual and predicted output values, respectively. In the second step, the GRU learns the features extracted by the CNN,
where G(·) denotes the GRU network calculation detailed in Section 2.3, YP is both the output of the CNN and the input of the GRU network, and YG is the output of the GRU network. Finally, the learned features are mapped to the final output of the network through the activation function.
The PReLU function [50], which converges quickly and has a simple gradient, is used as the activation function of the fully connected layer.
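The three functions named in this section have standard definitions, sketched below with NumPy. The PReLU slope `a` is learnable in practice, and the value 0.25 used here is only an illustrative default.

```python
import math
import numpy as np

def mish(x):
    """Mish activation: x * tanh(softplus(x)), with a numerically stable softplus."""
    x = np.asarray(x, dtype=float)
    return x * np.tanh(np.logaddexp(0.0, x))

def prelu(x, a=0.25):
    """PReLU activation: identity for x >= 0, slope a for x < 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0.0, x, a * x)

def mse(y_true, y_pred):
    """Mean square error between actual and predicted outputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

mish_one = float(mish(1.0))
prelu_out = prelu([-2.0, 3.0])
loss = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Mish is smooth and non-monotonic near zero, while PReLU is piecewise linear, so its gradient is either 1 or `a`.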
In brief, the hybrid neural network consists of a convolutional layer, a max-pooling layer, a GRU layer, and a fully connected layer (Figure 3 and Table 1). For convenience, each layer is denoted by its capitalized initials. The model first uses the convolutional layer C1 to learn features and then applies the Mish function to map them nonlinearly, giving A1. Next, the pooling layer P1 down-samples the features in A1. The features in P1 are then flattened into a vector and fed into layer G2 for deep learning through the GRU. Finally, the output is calculated by the fully connected layer FC3.
The more training data are available, the better the parameters of the fitted curve can be determined, which benefits the final optimization of the model. Conversely, the more test data are available, the more accurate the estimate of the generalization error. A trade-off is therefore needed when dividing the dataset. The dataset is divided by the hold-out method, for which about 2/3 to 4/5 of the samples are generally used for training; the most common ratios of training set to test set are 6:4, 7:3, and 8:2. Datasets with different training proportions (50%, 60%, 70%, 80%, and 90%) were each trained and tested several times, and the averaged results were compared. The training time increases with the size of the training set and ranges from 203 s to 292 s. The test root mean square errors (RMSEs) were 1.89%, 0.96%, 0.64%, 0.21%, and 0.26%, respectively. The test RMSE is lowest when the training set accounts for 80% or 90% of the data, but training with 90% of the data takes 292 s, compared with 246 s for 80%. The training proportion is therefore set to 80%.
Additionally, the impact of batch size on the overall performance of the model was investigated, and the batch size was set to 64.
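The hold-out division with the chosen settings can be sketched as below. The random shuffling and the helper name `hold_out_split` are assumptions; the sample count 500 × 11, the 80% training proportion, and the batch size of 64 come from the text.

```python
import numpy as np

def hold_out_split(n_samples, train_fraction, rng):
    """Shuffle sample indices and split them into training and test index sets."""
    idx = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return idx[:n_train], idx[n_train:]

rng = np.random.default_rng(0)
n_samples = 500 * 11                       # 500 samples for each of the 11 beams
train_idx, test_idx = hold_out_split(n_samples, 0.8, rng)

batch_size = 64
n_batches = -(-len(train_idx) // batch_size)   # ceiling division
```

With these settings there are 4400 training and 1100 test samples, and one pass over the training set takes 69 batches, the last of which is partially filled.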
To demonstrate the advantages of the hybrid neural network, different network structures were trained on the same dataset (Figure 4). By the fifth iteration, the loss of the proposed method is already close to its optimum, and its curve is smoother than those of the other network structures, indicating that this structure has the best training effect and the fastest convergence. The training times of the CNN and the LSTM are the longest and the shortest, respectively, and the training times of the proposed method and the GRU are similar to that of the LSTM. The training time of the proposed method is far less than the sum of the training times of the CNN and the GRU, indicating that the proposed method inherits the advantages of both. In addition, according to Equations (47)-(49), the test MRE and MARE of the proposed method are the smallest, indicating that this structure performs best. Overall, the comparison shows that the proposed method is the best of the structures tested.

Model Training and Results
Response data with different SNRs are used for training to test the anti-noise ability of the proposed method. The response data without added noise account for 40% of the dataset, and each of the other noise levels accounts for 20%. The proposed method is first trained on the dataset and then tested on the first beam. As shown in Figure 5a, the 11 beams with different SNRs are trained. By step 5, the four loss function curves are close to the optimum, which they reach by step 10; the curves are smooth, indicating that the model trains well. The first beam with different SNRs is then tested. As shown in Figure 5b, the test results have peaks at the same positions, and the first five modal frequencies (15 Hz, 95.63 Hz, 268.8 Hz, 526.9 Hz, and 871.3 Hz) are obtained according to Equation (17). The convergence rate during training and the test results are the same for all noise levels, indicating that the noise does not affect network training or testing, so the proposed method has strong anti-noise ability. Furthermore, comparing the analytical solution, the finite element solution, and the test results of the first beam (Figure 5b) shows that the test results are consistent with the other solutions, which verifies the effectiveness of the proposed method (Figure 6). The modal shapes are physically reasonable (Figure 7).

Experimental Verification
To verify the proposed method, a slender aluminum beam and an aluminum rudder surface structure were selected as the experimental specimens.

Dataset
Along the Y direction marked in Figure 8, ... acceleration signal acquisition equipment. The acceleration sensors are of type PCB 333B32 with a sensitivity of 100.4 mV/g. In the experimental system of the slender aluminum beam, the first five acceleration sensors measure the acceleration response signals under white noise and non-white noise excitations, and the last sensor measures the excitation signals (Figure 8). In the experimental system of the aluminum rudder surface structure, the first acceleration sensor measures the white noise and non-white noise excitation signals, and the second and third sensors measure the acceleration response signals (Figure 9).
In general, a white noise excitation spectrum is one whose PSD is uniformly distributed over the frequency domain, so its PSD function is a flat straight line. A non-uniform excitation spectrum, also called a non-white noise excitation spectrum, is one whose PSD function is not a flat line. Within a given frequency range, four typical non-white noises are defined by the change in PSD per octave as frequency increases: blue noise (+3 dB), pink noise (−3 dB), purple noise (+6 dB), and brown noise (−6 dB). The beam and rudder surface are widely used in the aerospace field and can be used to study the flight state of the aircraft, where the actual excitation is non-white noise [51]. In addition, measurements of boundary layer pressure fluctuation in hypersonic wind tunnels show that the boundary pressure fluctuation has a non-flat spectrum and should be regarded as non-white noise [52]. Therefore, three excitation spectra, including white and non-white noise excitations, are applied to simulate the ambient excitation in this paper. Excitation spectrum 1 is white noise excitation, and excitation spectrum 2 is a mixed spectrum composed of blue, narrow-band white, and pink noises. Similarly, excitation spectrum 3 is a mixed spectrum composed of purple, narrow-band white, and brown noises (Figure 10). Following the dataset construction process (Figure 2), the response signals measured under these three excitation spectra (Figure 10) are processed to construct the datasets. The dataset built from the acceleration response signals of the beam (Figure 8b) under excitation spectrum 1 is dataset 1, and the dataset built from the beam responses under excitation spectra 2 and 3 is dataset 2. Similarly, the dataset built from the acceleration response signals of the rudder surface (Figure 9b) under excitation spectrum 1 is dataset 3, and the dataset built from the rudder surface responses under excitation spectra 2 and 3 is dataset 4. The ratio of training set to test set is 8:2.
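The colored noises named above can be generated by reshaping the spectrum of white noise, as in the hedged sketch below. The function names and the frequency-domain shaping approach are assumptions and do not describe the shaker control used in the experiment. The band edges of the mixed spectra 2 and 3 are not given in the text, so only single-slope noises are shown.

```python
import numpy as np

def colored_noise(n, slope_db_per_octave, rng):
    """Gaussian noise whose PSD changes by the given number of dB per octave."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n)                                 # cycles per sample
    alpha = slope_db_per_octave / (10.0 * np.log10(2.0))   # PSD proportional to f**alpha
    gain = np.zeros_like(f)                                # gain[0] = 0 removes DC
    gain[1:] = f[1:] ** (alpha / 2.0)                      # amplitude gain
    x = np.fft.irfft(spectrum * gain, n=n)
    return x / np.std(x)

def measured_slope_db(x, f_lo=0.05):
    """PSD change between the octaves [f_lo, 2 f_lo) and [2 f_lo, 4 f_lo), in dB."""
    psd = np.abs(np.fft.rfft(x)) ** 2
    f = np.fft.rfftfreq(len(x))
    low = psd[(f >= f_lo) & (f < 2 * f_lo)].mean()
    high = psd[(f >= 2 * f_lo) & (f < 4 * f_lo)].mean()
    return 10.0 * np.log10(high / low)

rng = np.random.default_rng(0)
slopes = {"blue": 3.0, "pink": -3.0, "purple": 6.0, "brown": -6.0}
noises = {name: colored_noise(2 ** 16, s, rng) for name, s in slopes.items()}
checks = {name: measured_slope_db(x) for name, x in noises.items()}
```

A mixed spectrum could be assembled by applying different gains in different frequency bands, once the band edges are known.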

Conventional OMA Methods
The conventional OMA methods, such as EFDD, FSDD, data-driven SSI (SSI-DATA), and covariance-driven SSI (SSI-COV), are used to identify the modal parameters of the beam (Figure 8b) under white noise (Figure 11) and non-white noise excitations (Figures 12 and 13). As shown in Figure 12c, the SSI-COV method has first-order modal leakage when identifying the natural frequencies of the beam excited by excitation spectrum 2. As shown in Figures 12d and 13d, the SSI-DATA method also has first-order modal leakage for excitation spectra 2 and 3. Additionally, according to Figure 13a-c, the EFDD, FSDD, and SSI-COV methods show mode leakage at the first natural frequency and false modes at the second natural frequency when identifying the modal frequencies of the beam excited by excitation spectrum 3. Normally, the modal parameters identified under excitation spectrum 1 are considered accurate [21]. Figures 11a,b and 12a,b show that the EFDD and FSDD methods have no modal leakage or false modes when identifying the natural frequencies of the beam excited by excitation spectrum 2, but their results are inconsistent with those identified under excitation spectrum 1. More specifically, the first five natural frequencies of the beam under white (excitation spectrum 1) and non-white noise excitation (excitation spectrum 2) identified by EFDD are (16.

Model Training and Results
First, the dataset is fed into the proposed model for training with 50 iteration steps and a learning rate of 0.001. The training process and test results are shown in Figure 14. The optimal value is reached within the first 10 steps and the iteration curve is smooth, which shows that the model converges well (Figure 14a,b). Second, the network output is consistent with the target output and the proposed method has no modal leakage, showing that the network has high recognition accuracy. Finally, the test results and the training time are the same for the different datasets of the beam, indicating that the proposed method is not limited by the type of excitation signal (Figure 14c,d). Experimental modal analysis is a method of parameter identification based on the excitation and response signals of the structure measured under laboratory conditions. Its results are reliable and are referred to here as the analysis results.
The first five modal frequencies of the beam, (16.25, 97.5, 307.5, 617.5, 996.3) Hz, are obtained according to Equation (17). Based on the analysis results (Figure 15a), the relative errors of the different methods in identifying the modal frequencies of the beam under non-white noise excitation are obtained and listed in Table 3. The proposed method has the highest recognition accuracy, and its results under white and non-white noise excitations are consistent. Therefore, the proposed method can be used for non-white noise excitation, and it has strong anti-noise ability and wide applicability. According to Equation (47), error = (results obtained − analysis results)/analysis results × 100%.
Similarly, the proposed method is trained and tested on the rudder surface. The optimal value is reached within the first 15 steps, which shows that the model converges well (Figure 16a,b). The test results and the training time are the same for the different datasets of the rudder surface, indicating that the proposed method is also applicable to other structures (Figure 16c,d). Frequency is an important characterization parameter for damage identification, health monitoring, and finite element analysis [1,5], so accurate frequency identification has high application value. Based on the analysis results (Figure 15b), the relative errors of the proposed method under white and non-white noise excitations are analyzed in Table 4. The results of the proposed method for the rudder surface under different noise excitations are consistent, which shows that the method has strong anti-noise ability and is not limited by the excitation type. Compared with the analysis results, the identification errors of the first four frequencies are all less than 0.2%, so the proposed method has high identification accuracy.
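The relative error defined by Equation (47) is simple to compute, as in the sketch below. The frequencies used are hypothetical placeholders chosen for easy checking, not values from Table 3 or Table 4.

```python
def relative_error_percent(identified, reference):
    """Relative error (%) of identified frequencies against reference values."""
    return [(x - r) / r * 100.0 for x, r in zip(identified, reference)]

# Hypothetical frequencies in Hz, for illustration only.
reference = [100.0, 250.0]
identified = [99.8, 250.5]
errors = relative_error_percent(identified, reference)
```

The sign is kept, so a negative value means the identified frequency lies below the reference.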

Conclusions
An adaptive OMA method using a hybrid neural network is proposed and applied to extract modes from acceleration response signals under non-white noise excitation. The proposed network has few layers and far fewer characteristic parameters, which simplifies the model and reduces the time cost of training and testing. The small sample dataset makes it possible to achieve high accuracy within a short training time. The output function proposed in this paper depends only on the peak points of the singular value curve, which greatly reduces the computational cost, and it can be used for non-white noise excitation.
(1) The comparison of training and test results for different network structures on the same dataset shows that the hybrid neural network performs best.
(2) A total of 11 beams with different masses are numerically simulated. The training time and convergence speed are the same for response data with different SNRs, and the test results of the first beam are the same, showing that the proposed method has strong anti-noise ability. In addition, the effectiveness of the method is demonstrated by comparing the finite element solution, the analytical solution, and the test results of the first beam.
(3) The conventional OMA methods are used to identify the modal parameters of the beam. They show false modes and mode leakage when identifying the modal parameters from the response under non-white noise excitation (excitation spectra 2 and 3). Taking the analysis results as a reference, the results of the different methods under non-white noise excitation are compared. The proposed method has no modal leakage and the highest identification accuracy. Therefore, the proposed method is not limited by the excitation type and is better suited to actual ambient excitation.
(4) The results of the proposed method for the rudder surface under different noise excitations are consistent, which shows that the method has strong anti-noise ability. Compared with the analysis results, the identification errors of the first four modal frequencies are all less than 0.2%, so the identification accuracy of the proposed method is high.