3.1.1. Wavelet-Adaptive Prefilter Based on LSTM Optimization
Flight training involves simulated flights in many different environments, so several data streams, such as flight attitude, are contaminated by noise. To prevent useful information from being drowned in noise, we first pre-filter the multimodal data. A pre-filter processes a signal before the data are output and used; it is typically used to suppress noise and improve signal quality so that the output signal is accurate and stable. Selecting a pre-filter requires weighing the signal type, the noise characteristics, the system requirements, and the implementation difficulty. Most existing systems use simple traditional filters, such as low-pass, high-pass, band-stop, and adaptive filters. However, sensors working in harsh environments are affected by environmental uncertainties, especially jitter and displacement, and the intrinsic noise of multiple sensors can cause signal and noise to overlap. To improve the filtering efficiency for multimodal data, we propose a wavelet-adaptive pre-filter based on LSTM optimization, which dynamically adjusts the filter parameters according to the data distribution of each signal and thus meets the filtering needs of multimodal data.
- (1) Wavelet threshold filtering
Wavelet threshold filtering was first proposed by Donoho. The method retains the important information in a signal while filtering out noise, adapts to the local characteristics of the signal, and is relatively simple to implement. The algorithm proceeds as follows. First, a suitable wavelet basis function is selected and the original signal is decomposed into a series of wavelet coefficients. Next, the threshold and threshold function are applied: coefficients above the threshold are retained or appropriately shrunk, and coefficients below the threshold are set to zero. Finally, the denoised signal is reconstructed from the processed coefficients. The choice of wavelet basis function, number of decomposition layers, threshold, and threshold function is therefore crucial to the denoising result.
Figure 5 shows the wavelet threshold denoising process.
In wavelet denoising, the hard threshold function is discontinuous at the threshold, which leads to oscillation in the filtered signal. The soft threshold function is continuous, but it introduces a constant deviation between the processed and original coefficients, which results in incomplete filtering. For threshold selection, the commonly used estimation methods include the fixed threshold, the minimax criterion threshold, the Stein unbiased risk estimate (SURE) threshold, and the heuristic threshold. The fixed threshold is simple but inflexible and tends toward “over-killing.” The minimax criterion threshold tends toward “over-retention” when the signal sample is large, and the SURE threshold is unsuitable for signals with a low signal-to-noise ratio. The heuristic threshold can easily mistake useful high-frequency signal components for noise and remove them. Based on this analysis, this paper introduces fuzzy logic and establishes a wavelet fuzzy threshold and threshold processing function from two classic thresholds and a fuzzy membership function.
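The difference between the two classic threshold functions can be illustrated with a minimal sketch. The function names and the sample coefficients below are illustrative assumptions, and the fixed threshold is written in its common universal form σ·sqrt(2 ln N), which is assumed here rather than taken from the text.

```python
import math

def hard_threshold(w, t):
    """Keep a coefficient unchanged if |w| >= t, otherwise set it to zero."""
    return w if abs(w) >= t else 0.0

def soft_threshold(w, t):
    """Zero small coefficients and shrink the remaining ones toward zero by t."""
    return math.copysign(max(abs(w) - t, 0.0), w)

def fixed_threshold(n_samples, sigma):
    """Universal ("fixed") threshold, assumed form: sigma * sqrt(2 ln N)."""
    return sigma * math.sqrt(2.0 * math.log(n_samples))

# Illustrative coefficients only (not data from the paper).
coeffs = [-3.0, -0.4, 0.9, 1.1, 2.5]
t = 1.0
hard = [hard_threshold(w, t) for w in coeffs]  # jumps from 0 to +/-t at |w| = t
soft = [soft_threshold(w, t) for w in coeffs]  # continuous, but every kept value is off by t
```

The hard rule leaves surviving coefficients untouched but jumps at the threshold, whereas the soft rule is continuous but shifts every surviving coefficient by the constant t. These are the two defects that the fuzzy threshold is meant to balance.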
- (2) Fuzzy Theory
Fuzzy theory is a method for handling fuzzy information and uncertainty. Unlike traditional binary logic, it allows a statement to be partially true and partially false, which better matches the uncertainty and vagueness of the real world. Its core idea is the fuzzy set. In traditional set theory, membership is strictly binary: an element either belongs to a set or does not. In a fuzzy set, each element has a membership degree, which can be any real number between 0 and 1 and expresses how strongly the element is associated with the set.
The main application areas of fuzzy theory include fuzzy control, fuzzy reasoning, fuzzy optimization, and fuzzy decision-making. Fuzzy control is one of the most important of these. Traditional control systems usually rely on precise mathematical models to describe the relationship between input and output. Fuzzy control, in contrast, can better handle the uncertainty and vagueness between input and output, which makes the control more flexible and robust. Beyond control and reasoning, fuzzy theory can also be applied to optimization problems; for example, fuzzy set theory is widely used in linear programming. Therefore, this paper introduces fuzzy theory into the selection of the threshold function and proposes an improved scheme based on a fuzzy threshold.
- (3) Fuzzy threshold selection and fuzzy threshold function establishment
First, an appropriate wavelet basis function and number of decomposition layers are selected, and the noisy signal is decomposed into a series of wavelet coefficients. Coefficients smaller than the minimax criterion threshold are removed, coefficients larger than the fixed threshold are retained, and coefficients between the two thresholds are processed with the fuzzy membership function. Finally, the inverse wavelet transform is applied to the processed coefficients to reconstruct the denoised signal. The improved algorithm is shown in Figure 6.
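The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single-level Haar transform in place of the unspecified wavelet basis, the thresholds `t_low` (standing in for the minimax criterion threshold) and `t_high` (standing in for the fixed threshold) are placeholder numbers, and coefficients in the intermediate band are assumed to be scaled by their membership degree.

```python
import math

def haar_dwt(signal):
    """Single-level orthonormal Haar transform of an even-length sequence."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out

def semi_ridge(x, low, high):
    """Ascending semi-ridge membership: 0 below low, 1 above high, sine ramp between."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return 0.5 + 0.5 * math.sin(math.pi / (high - low) * (x - (low + high) / 2.0))

def fuzzy_threshold(w, t_low, t_high):
    """Remove below t_low, keep above t_high, scale by membership in between (assumed rule)."""
    return w * semi_ridge(abs(w), t_low, t_high)

def denoise(signal, t_low, t_high):
    """Decompose, threshold the detail coefficients, and reconstruct."""
    approx, detail = haar_dwt(signal)
    detail = [fuzzy_threshold(d, t_low, t_high) for d in detail]
    return haar_idwt(approx, detail)

# Illustrative signal: a small wiggle (2.0, 2.2) and a large step (1.0, 5.0).
noisy = [4.0, 4.0, 2.0, 2.2, 1.0, 5.0, 3.0, 3.0]
clean = denoise(noisy, t_low=0.5, t_high=1.0)
```

In this toy run the small wiggle falls below `t_low` and is smoothed out, while the large step exceeds `t_high` and is preserved unchanged.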
This paper uses the ascending semi-ridge fuzzy membership function to establish a wavelet fuzzy threshold denoising method that combines the threshold function with the threshold selection. The resulting threshold function is expressed in terms of four quantities: the minimax criterion threshold, the fixed threshold, the wavelet coefficients after decomposition, and the fuzzy membership function.
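One plausible way to write this out is sketched below. The symbols are assumptions introduced for illustration: T_a for the minimax criterion threshold, T_b for the fixed threshold, ω_{j,k} for the k-th wavelet coefficient of layer j, and μ for the membership function. The standard ascending semi-ridge form is used for μ, and scaling the coefficient by μ in the intermediate band is an assumed reading of the description, not a confirmed formula.

```latex
\mu\left(|\omega_{j,k}|\right) =
\begin{cases}
0, & |\omega_{j,k}| \le T_a \\[4pt]
\dfrac{1}{2} + \dfrac{1}{2}\sin\!\left(\dfrac{\pi}{T_b - T_a}
  \left(|\omega_{j,k}| - \dfrac{T_a + T_b}{2}\right)\right),
  & T_a < |\omega_{j,k}| \le T_b \\[4pt]
1, & |\omega_{j,k}| > T_b
\end{cases}
\qquad
\hat{\omega}_{j,k} = \mu\left(|\omega_{j,k}|\right)\,\omega_{j,k}
```

Under this form the processed coefficient is continuous at both thresholds, since μ equals 0 at T_a and 1 at T_b, and coefficients above T_b are left unshifted.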
- (4) Layered adaptive wavelet fuzzy threshold denoising
Because the noise components differ between frequency bands, this paper introduces a dynamic adjustment coefficient r into the fuzzy threshold function. Each layer selects its own threshold according to its decomposition scale, which yields a hierarchical, adaptive threshold T(j) suited to layer j. The improved threshold T(j) of each layer is calculated from the dynamic adjustment coefficient r, the minimax criterion threshold, and the fixed threshold.
In wavelet analysis, coefficients that are strongly correlated with the signal are relatively large, whereas coefficients that mainly reflect noise are small. Each layer therefore selects a threshold matched to its decomposition scale, so that the layered threshold changes dynamically. The dynamic adjustment coefficient r is constructed as a logarithmic function of j, where j is the index of the decomposition layer.
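As an illustration of a layer-dependent threshold, the sketch below uses one commonly seen logarithmic form, r(j) = 1 / ln(j + e − 1). This specific form, the function names, and the choice to scale both base thresholds by r are assumptions for illustration; the expression used in the paper may differ.

```python
import math

def adjustment_coefficient(j):
    """Hypothetical logarithmic form: r = 1 / ln(j + e - 1), so that r = 1 at layer 1."""
    return 1.0 / math.log(j + math.e - 1.0)

def layer_thresholds(t_minimax, t_fixed, n_layers):
    """Return {j: (lower, upper)} with both base thresholds scaled by r(j)."""
    result = {}
    for j in range(1, n_layers + 1):
        r = adjustment_coefficient(j)
        result[j] = (r * t_minimax, r * t_fixed)
    return result

# Placeholder base thresholds, three decomposition layers.
thresholds = layer_thresholds(t_minimax=0.5, t_fixed=1.0, n_layers=3)
```

With this choice the thresholds shrink as j grows, so less is removed at coarser layers. In the LSTM-optimized variant described later, a value of r computed this way would serve only as the initial value.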
Therefore, the new threshold function is defined in terms of the minimax criterion threshold, the fixed threshold, the wavelet coefficients after decomposition, and the fuzzy membership function, with the thresholds now adapted to each layer.
- (5) Improved wavelet threshold denoising algorithm based on LSTM optimization
Traditional wavelet thresholding denoising only processes high-frequency components, neglecting the low-frequency noise of the gyroscope. This paper proposes an improved scheme: LSTM is used to smooth the low-frequency components of the sensor after wavelet decomposition, preserving the original signal trend. After wavelet decomposition, the resulting low-frequency coefficients represent the approximate contour and main trend of the signal, and their changes exhibit time-dependent relationships. Changes in flight attitude and physiological signals can be reflected in the low-frequency coefficient sequence. Long short-term memory (LSTM) networks are specifically designed for modeling long-sequence time dependencies. They can memorize and utilize historical information to predict or generate the output at the current moment, making them suitable for learning smooth, continuous trend signals. Therefore, we use LSTM to fit the low-frequency wavelet coefficient sequence.
The LSTM network fits the wavelet coefficient at the current moment from the wavelet coefficients that precede it. The characteristics of the wavelet coefficients are modeled through the input gate, forget gate, and output gate of the network. Different weights are assigned to the different hidden-layer units, and the fitting result is output after training. Assume that the network has m neurons in total, indexed by x = 1, ..., m. Each neuron x has an output value and a weight, and the output value of the network is obtained by combining the neuron output values according to their weights.
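The gate mechanism and the weighted output can be sketched as follows. This is a minimal forward pass of a standard LSTM cell with a scalar input, followed by a weighted sum of the m hidden outputs. The parameter layout, the function names, and the weighted-sum readout are assumptions for illustration; the text does not give the exact formulation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a standard LSTM cell with scalar input x_t and m hidden units.

    params[gate] = (w_in, w_rec, b) for gate in "i", "f", "o", "g":
    w_in is a list of m input weights, w_rec an m x m list of recurrent
    weights, and b a list of m biases (hypothetical layout).
    """
    m = len(h_prev)

    def pre(gate, k):
        w_in, w_rec, b = params[gate]
        return w_in[k] * x_t + sum(w_rec[k][n] * h_prev[n] for n in range(m)) + b[k]

    h, c = [], []
    for k in range(m):
        i = sigmoid(pre("i", k))      # input gate
        f = sigmoid(pre("f", k))      # forget gate
        o = sigmoid(pre("o", k))      # output gate
        g = math.tanh(pre("g", k))    # candidate cell value
        c_k = f * c_prev[k] + i * g   # new cell state
        c.append(c_k)
        h.append(o * math.tanh(c_k))  # new hidden output
    return h, c

def readout(h, weights):
    """Assumed output rule: weighted sum of the m neuron outputs."""
    return sum(w * h_x for w, h_x in zip(weights, h))
```

Feeding a low-frequency coefficient sequence through `lstm_step` one value at a time and applying `readout` to the final hidden outputs gives a fitted value for the current coefficient. The weights would be learned by training, which is not shown here.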
This method addresses the low adaptability of traditional wavelet transform filters to signals with different numerical distributions. We use it to pre-filter the multimodal data and remove interference from the dataset.
In this algorithm, the LSTM network learns and predicts the optimal dynamic adjustment coefficient r. As shown in Equations (3) and (4), the theoretical initial value of r is given by the logarithmic function of the decomposition layer j. To further improve the adaptability of the threshold to different signal characteristics and noise levels, we introduce a lightweight LSTM network to fit the complex nonlinear relationship between the coefficient r and the statistical characteristics of the wavelet coefficients of the current decomposition layer, thereby generating adaptive optimal thresholds Ta(j) and Tb(j) for each wavelet decomposition layer. The input to the LSTM is the mean of the high-frequency wavelet coefficients ωj of the current decomposition layer j. The output is the optimized dynamic adjustment coefficient ropt.
The LSTM is trained in an unsupervised manner with respect to r: no target values of r are provided. Instead, the training objective is to minimize the reconstruction error between the denoised signal and the ideal reference signal. We use the mean squared error (MSE) as the loss function and update the LSTM weights and the threshold function parameters simultaneously through backpropagation, so that the system automatically learns the optimal adjustment coefficient r.
The LSTM layer has 32 hidden units. Training uses a large number of noisy signals and their corresponding wavelet decomposition coefficients as samples. In each iteration, the LSTM predicts ropt from the input features; the threshold of the layer is then calculated using Equations (3) and (4), the wavelet coefficients are processed, and the signal is reconstructed; finally, the gradient of the loss function is computed and the network is updated.
3.1.2. Multimodal Data Layer Fusion
According to aviation flight training regulations, formal five-side (traffic pattern) flight training includes multiple phases, such as takeoff, pull-up, descent, and landing, and it is long in duration and covers many training subjects. The multimodal data generated during training are therefore high-dimensional and large in scale before processing. Handling the computational load of such large-scale data requires substantial computing power to further fuse the training data and extract features, so that pilot operating ability can be evaluated efficiently. In addition, existing research on multimodal data for pilot training evaluation considers only the frequency-domain characteristics of a single physiological signal. It captures neither the complementary relationship among multiple physiological sensors at a given moment during flight training nor the frequency-domain characteristics of the multimodal data as a whole. To take both into account, we use the short-time Fourier transform to resample and fuse the data. To keep the data undistorted, physiological principles must be respected so that each segment of physiological data contains a complete physiological cycle. Using the timestamps, we extracted the required eye movement, electrocardiogram, electrodermal, electromyography, and respiration data, and we discarded samples that did not contain all physiological modalities simultaneously.
To better analyze and process these data, we transform them into time–frequency plots using the short-time Fourier transform (STFT), a commonly used time–frequency analysis method. It decomposes a signal along two dimensions, time and frequency, and shows how the frequency content changes over time. The multimodal dataset in this paper includes eye movement (1024 Hz), electroencephalography (EEG) (256 Hz), electrocardiography, electromyography, and electrodermal signals (1024 Hz), respiration (8 Hz), and flight attitude (256 Hz). Structured flight training has fixed and clearly defined task nodes, so the pilot's key physiological rhythms and behavioral patterns occur in distinct phases and at characteristic frequencies. Within each window, the STFT maps the signal stably to the frequency domain and produces a standard two-dimensional time–frequency plot, which makes the method suitable for analyzing such quasi-stationary time series signals.
Furthermore, compared with the wavelet transform, the Hilbert–Huang transform, and other methods, the STFT reduces the complexity and computational cost of the preprocessing stage. It is therefore better suited to processing the large-scale data from flight simulation training efficiently.
When STFT processes pilot multimodal data, it divides the original signal into several fixed-length windows. The signal within each window undergoes a Fourier transform to obtain the corresponding spectrum. The window length is typically fixed, ranging from tens to hundreds of milliseconds.
In this way, the spectral characteristics of the signal can be analyzed in each time window to obtain the frequency information of the signal changing over time. By moving the window, the time–frequency diagram of the entire signal is obtained.
In essence, the short-time Fourier transform divides the time-domain signal along the time axis into windows of a given length and applies the Fourier transform to each window, converting the one-dimensional time-domain signal into a two-dimensional time–frequency matrix. It is widely used in acoustic signal research; acoustic signal processing and data labeling software such as Adobe Audition and Praat is based on it. The short-time Fourier transform is calculated as

$$\mathrm{STFT}(\tau, f) = \int_{-\infty}^{+\infty} x(t)\, h(t - \tau)\, e^{-j 2 \pi f t}\, dt$$

where x(t) is the time-domain signal at time t, h is the window function, f is the frequency, and τ is the position on the time axis of the center of the window.
In this paper, the Hamming window is used as the window function, with a window length of 0.5 s and a window overlap of 50%. To ensure consistent input data, the data are standardized. The multimodal data fusion flow chart is shown in Figure 7.
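The windowing settings above (Hamming window, 0.5 s length, 50% overlap) can be sketched in a few lines. This is a minimal illustration using a direct discrete Fourier transform so that it needs no external library; the function names and the test signal are assumptions, and a real pipeline would use an FFT routine instead.

```python
import cmath
import math

def hamming(n):
    """Symmetric Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]

def stft_magnitude(signal, fs, win_seconds=0.5, overlap=0.5):
    """Return a time-frequency matrix: one row of DFT magnitudes per window."""
    n = int(round(win_seconds * fs))               # samples per window
    hop = max(1, int(round(n * (1.0 - overlap))))  # step between window starts
    window = hamming(n)
    frames = []
    for start in range(0, len(signal) - n + 1, hop):
        seg = [signal[start + k] * window[k] for k in range(n)]
        row = []
        for b in range(n // 2 + 1):                # non-negative frequency bins
            acc = sum(seg[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n))
            row.append(abs(acc))
        frames.append(row)
    return frames

# Illustrative input: 2 s of an 8 Hz sine sampled at 64 Hz.
fs = 64
tone = [math.sin(2.0 * math.pi * 8.0 * k / fs) for k in range(2 * fs)]
spec = stft_magnitude(tone, fs)
```

Each row of `spec` corresponds to one 0.5 s window and each column to a frequency bin of width fs / n. With a 0.5 s window, the sampling rates listed earlier give 512 samples per window at 1024 Hz, 128 at 256 Hz, and 4 at 8 Hz, which is why the modalities need resampling before their time–frequency matrices can be fused.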
We applied LSTM prediction to the data with and without feature-layer fusion and evaluated network performance using the AUC. The results on the dataset after feature-layer fusion are significantly better than those of the related algorithms without fusion: the AUC exceeds 0.94 and is higher than that of the methods without data fusion. The test results of the different fusion methods are shown in Figure 8.
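For reference, the AUC used in this comparison can be computed directly from labels and predicted scores as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. The sketch below is a minimal pairwise implementation; the function name and the toy labels and scores are illustrative and are not results from the paper.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: three of the four positive/negative pairs are ordered correctly.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is quadratic in the number of samples, so a rank-based computation is preferable for large test sets.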