1. Introduction
With increasing demands on aircraft engine performance and the continuous expansion of their operational envelopes, engine surge margins have become insufficient [
1]. Engine surge, as an unstable operational condition of aviation engines, is a low-frequency, high-amplitude oscillatory phenomenon of airflow along the compressor axis. It can induce severe mechanical vibrations in engine components and overheating of hot sections, leading to significant damage within a short period and seriously compromising flight safety [
2]. Therefore, accurate surge diagnosis has remained a key and challenging focus in the field of aero-engine health management.
The diagnosis of aero engine surge can be regarded as a specific type of fault detection and diagnosis problem. Many scholars have conducted extensive research on fault detection and diagnosis methods for mechanical systems, which primarily fall into two categories: knowledge-based reasoning methods and data-driven methods.
Knowledge-based reasoning methods do not require the establishment of an accurate system model. Instead, they perform computational reasoning and diagnosis based on systemic principles, long-term practical experience, and accumulated fault information, such as fault tree-based reasoning and expert systems. Knezevic et al. [
3] analyzed faults in the turbocharger of a diesel engine using Fault Tree Analysis (FTA), assessed system reliability, predicted fault causes, and ultimately achieved the goal of eliminating major faults in the air subsystem. Wang Hailan [
4] designed a fault tree-based expert system for natural gas engine fault diagnosis by analyzing the overall structure of such expert systems and their common fault modes, effectively improving diagnostic accuracy. However, as engineering systems have grown increasingly complex, establishing precise mathematical models has become very difficult, and it is equally challenging to encode expert knowledge and practical experience into reasoning rules; as a result, these methods often yield unsatisfactory outcomes on modern engine systems.
Data-driven methods eliminate the need for building precise complex system models or relying heavily on domain expert knowledge and knowledge representation and reasoning mechanisms. However, they typically require a large amount of accurate data. With the rapid development of artificial intelligence technology, deep learning has achieved remarkable results in many fields. Elashmawi et al. [
5] proposed a fault diagnosis model based on artificial neural networks, which offers the dual advantage of online monitoring, analysis, and diagnosis of gas turbine engine faults. Wu Bin et al. [
6] focused on turbofan engines and introduced Deep Belief Networks (DBN) for diagnosing performance degradation faults in engine components. This approach addressed the lack of generalization capability in shallow neural networks for diagnosis and improved the diagnostic accuracy for performance degradation faults in engine gas path components. Yuan et al. [
7] utilized Long Short-Term Memory Networks (LSTM) for fault diagnosis and remaining useful life prediction of aero engines. Chen et al. [
8] proposed a Hybrid Dilated Convolution (HDC) model based on CNN, Deep Neural Networks (DNN), and LSTM. This model achieved an accuracy of 83% in diagnosing gas path faults in aero engines. Guo et al. [
9] developed a real-time accurate bearing fault diagnosis method using wavelet transforms and deformable CNNs. Yang et al. [
10] applied autoencoders and CNNs to process bearing vibration signals, achieving effective fault diagnosis.
Compared with general rotating machinery, aero engines operate under highly variable conditions with extremely complex mission profiles. Additionally, due to the specific constraints of their working environment, there are strict limitations on sensor placement and weight in the aero engine, resulting in a limited number of fixed measurement points. The test signals received by sensors often undergo multipath propagation and are subject to interference from vibrations, aerodynamics, combustion, and other factors, leading to low signal-to-noise ratios and posing difficulties for surge diagnosis. Li et al. [
11] installed pressure sensors inside and at the outlet of a seven-stage compressor pipeline to measure static pressure inside the pipeline and total pressure at the outlet, respectively. By applying short-time Fourier transform to time–frequency analysis of the acquired signals, they successfully identified characteristic features before surge occurrence. Zheng et al. [
12] installed pressure and temperature sensors at multiple flow-direction positions in the pipeline to study the initiation and evolution of rotating instability. Pullan et al. [
13] identified spike-like features during stall and explained the physical mechanism behind their emergence. Munari et al. [
14] mounted vibration sensors on the casing and identified characteristic surge frequencies through sideband analysis of synchronous resonance frequencies.
However, most experimental studies on compressor surge currently rely on conventional pressure, temperature, and vibration sensors. Pressure and temperature sensor probes are installed inside the pipeline as intrusive sensors, introducing external interference into the compressor flow field [
15]. Vibration sensors, mounted on the casing, cannot effectively measure the airflow field. Moreover, the precursor features measured by these methods are relatively weak; for aero engine compressors operating under real conditions, these features are often obscured, making early warning challenging [
16]. Li Zepeng et al. [
17] employed a non-intrusive circumferential microphone array to conduct experimental research on a real aero engine fan test rig, achieving pre-surge feature identification through decomposition of pipeline noise modal waves. Jianpeng Ma et al. [
18] proposed a method based on uniform phase intrinsic time-scale decomposition (UPITD); by analyzing the time-domain correlation between weak magnetic signals and the cage rotation frequency, the method effectively separates fault signals from cage rotational signals. Yun Li et al. [
19] proposed an improved EMD method, called FAEMD, and applied it to bearing fault diagnosis; analysis of two groups of measured rolling-bearing fault signals showed that FAEMD adapts well to nonstationary signals. Xiaolin Liu et al. [
20] presented a multiscale fusion attention CNN (MSFACNN) to diagnose faults in aero engine rolling bearings. Yulai Zhao et al. [
21] combined an improved TSA with a data-driven strategy for fault diagnosis of bearing-rotor systems. Yao Yanling et al. [
22] proposed a surge diagnosis model for aero engines based on CNN-Seq2Seq.
Current research primarily focuses on surge occurring under steady-speed conditions, whereas surge is more likely to occur during acceleration, and such dynamic conditions readily lead to misjudgment. Moreover, most studies rely on signals with distinct features, so surge that presents in other forms or with weak features is susceptible to missed detection. In next-generation engines, excessively high outlet temperatures and the difficulty of measuring dynamic pressures pose additional challenges.
To overcome the obstacles identified previously, the research presented in this paper encompasses the following subjects:
1. This study proposes an advanced model for predicting surge faults in aircraft engines with high accuracy. The core concept involves capturing subtle fault precursors by fusing spatiotemporal features of signals. The model directly yields diagnostic results from raw signals, reducing the reliance on and errors associated with manual feature extraction in traditional methods.
2. Addressing the challenges of difficult acquisition and high experimental costs for engine surge data, as well as the small-sample dilemma, the number of samples was expanded using a sliding window slicing technique. This approach ensures both the informational continuity between overlapping slices and the stationarity of feature statistics. Consequently, the engine surge diagnosis model was trained using a limited number of samples.
3. Spatial feature extraction employs CNN to analyze FFT of dynamic signals, capturing patterns across different frequency components. Temporal feature extraction utilizes BiLSTM to analyze time-frequency domain characteristics (VMD) of dynamic signals, capturing dependencies and evolutionary patterns along the temporal dimension. Spatiotemporal feature fusion adopts a cross-attention mechanism, enabling deep interaction between spatial and temporal features. This allows for more precise identification of easily overlooked weak fault characteristics, thereby enhancing the accuracy of surge diagnosis.
4. Through comparative analysis of experimental results, this study critically examines the findings, which indicate that the model achieved an F1-score, Recall, Precision, and Accuracy of 97.96%, 97.52%, 98.43%, and 99.01%, respectively, for surge fault classification. These results demonstrate that the model can meet the practical requirements for engine surge diagnosis.
3. Methodology
The proposed STFF-CANet diagnosis model for aero engine surge, based on spatiotemporal feature fusion, is illustrated in
Figure 4. The overall workflow of the model comprises three main stages: data preprocessing, feature extraction, and spatiotemporal feature fusion. It aims to extract and fuse effective features from surge fault data to achieve enhanced performance in fault identification or analysis.
3.1. Data Pre-Processing
To address the challenges of difficult data acquisition and high experimental costs for aero engine surge data, as well as the problem of small-sample data, this study aims to train an efficient engine surge diagnosis model using a limited number of samples. We selected a portion of historical test data from a certain type of aero engine as the dataset. This test data includes dynamic pressure data from various scenarios such as start-up, acceleration, deceleration, thrust increase, and distortion.
The surge fault timestamps were manually annotated by experts based on corresponding rules and experience. For example, when the numerical curve from the speed sensor stabilizes, if the pressure sensor exhibits a sudden rise, drop, or severe fluctuation until it returns to stability, this interval is marked as the surge interval. Every moment within this interval is considered a surge occurrence, while all other moments are labeled as normal operation. After manual annotation, the ratio of normal points to surge points in the dataset is approximately 90:10.
The original dataset was shuffled and then split into training, validation, and test sets in a 7:2:1 ratio. This ratio is a widely adopted practice in machine learning for limited datasets, ensuring a sufficiently large training set for model learning while reserving adequate samples for unbiased validation and final testing. In our case, the 70% training set allows the model to learn the underlying patterns, the 20% validation set is used for hyperparameter tuning and preventing overfitting, and the 10% test set provides a final evaluation on unseen data, simulating real-world application scenarios. The model was trained on the training set and evaluated on the validation set. Once the optimal parameters were found, the final evaluation was conducted on the test set. Let T = {t1, ⋯, tn} represent the training set with a total of n timestamps; ti = {si1, ⋯, sim} represents the test data at the i-th timestamp; sij is the value from the j-th pressure sensor at the i-th timestamp, with a total of m pressure sensors. R = {0,1} denotes the diagnosis result, where 0 indicates normal operation and 1 indicates a surge fault.
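The shuffle-and-split procedure described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' code; the function name and seed are ours:

```python
import numpy as np

def split_dataset(samples, labels, ratios=(0.7, 0.2, 0.1), seed=0):
    """Shuffle and split a dataset into train/validation/test sets (7:2:1)."""
    n = len(samples)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)          # random shuffle of sample indices
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]      # remainder (~10%) is the test set
    return ((samples[train], labels[train]),
            (samples[val], labels[val]),
            (samples[test], labels[test]))
```

Fixing the random seed makes the split reproducible across training runs, so the validation and test sets stay disjoint from the training data.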
To increase the number of samples while ensuring both informational continuity between overlapping slices and the statistical stationarity of features, we employed a sliding window slicing method to process the time-series data and construct the model’s input, as illustrated in
Figure 5. Specifically, data collected from each sensor is segmented using a fixed-size window with a fixed sliding stride. This approach augments the number of samples in the dataset, providing more data for model training. Particularly when experimental data collection is limited, a smaller sliding stride can be used to obtain more samples. Furthermore, the overlapping sub-windows generated during sliding window sampling help the model more easily learn the characteristics of fault sequences. The size of the sliding window corresponds to the time-step length of a single training sample.
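The fixed-size window with a fixed stride can be sketched in a few lines of NumPy. This is an illustrative helper (the name and parameters are ours, not from the paper); a stride smaller than the window length produces the overlapping slices described above:

```python
import numpy as np

def sliding_window_slices(signal, window, stride):
    """Segment a 1-D sensor signal into overlapping fixed-size windows.

    Each row of the result is one training sample of length `window`;
    consecutive rows overlap by (window - stride) points.
    """
    n_windows = (len(signal) - window) // stride + 1
    return np.stack([signal[i * stride : i * stride + window]
                     for i in range(n_windows)])
```

For example, a 10-point signal with `window=4` and `stride=2` yields four samples, each sharing two points with its neighbor, quadrupling the sample count relative to non-overlapping slicing.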
The data augmentation through window slicing effectively mitigates overfitting in small-sample data classification, enhancing the model’s robustness and generalization capability. Once the data enters the model, preprocessing is first performed on the surge fault data to extract features from both frequency-domain and time-domain perspectives. Frequency-domain features are extracted by applying the FFT to the original surge fault data, converting the time-domain signal into a frequency-domain signal. FFT reveals the frequency components of the signal, aiding in the analysis of its distribution across different frequencies. The resulting frequency-domain features are represented as a series of spectrograms. Time-domain features are obtained by processing the surge fault data using the VMD method. VMD decomposes complex signals into multiple IMFs with finite bandwidth and specific center frequencies. These IMF components represent the time-domain characteristics of the signal across different frequency bands, thereby capturing local variations and detailed information of the signal along the temporal dimension.
Figure 5.
Sliding Window Slicing Data Processing Method.
3.2. Feature Extraction
Following data preprocessing, the model employs a CNN and a BiLSTM to perform further extraction of frequency-domain and time-domain features, respectively:
- Spatial Feature Extraction (CNN): The frequency-domain features obtained via the FFT are fed into the CNN, which processes them through convolutional and pooling operations. The convolutional layer slides multiple kernels over the frequency-domain feature maps to extract local patterns and spatial features under different frequency components. The pooling layer downsamples the convolved feature maps, reducing computational complexity while enhancing translation invariance. Ultimately, the CNN outputs the spatial feature representation of the frequency domain, denoted as $F_{FFT}$.
- Temporal Feature Extraction (BiLSTM): The time-domain features derived from VMD decomposition are input into the BiLSTM, which consists of two oppositely directed LSTM layers and can therefore exploit both past and future contextual information. This architecture fully captures the long-term dependencies and evolutionary patterns of the signal along the temporal dimension, yielding a more representative time-domain feature representation, denoted as $F_{VMD}$.
3.2.1. Spatial Feature Extraction
The CNN is capable of extracting spatial features from signals. We first apply the FFT to convert the dynamic pressure signal from the time domain to the frequency domain, obtaining a frequency-domain feature matrix. This matrix is then used as input to the CNN, where spatial features within the frequency domain are extracted through convolutional and pooling operations. This approach combines the strengths of FFT in frequency-domain analysis with the powerful feature extraction capabilities of CNNs, enabling effective capture of complex patterns and characteristics present in the dynamic pressure signal.
The Fast Fourier Transform is applied to the acquired dynamic pressure signal $x(t)$ to convert it from the time domain to the frequency domain. The mathematical expression for the Fourier transform is:

$$X(f) = \mathcal{F}\{x(t)\} = \int_{-\infty}^{+\infty} x(t)\, e^{-j 2\pi f t}\, dt$$

where $\mathcal{F}\{\cdot\}$ denotes the Fourier transform operation, and $X(f)$ is the frequency-domain representation of the signal $x(t)$ at frequency $f$. Through the FFT, we obtain the signal’s frequency spectrum, which contains amplitude and phase information for different frequency components.
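As a concrete illustration (with an assumed 1 kHz sampling rate and a synthetic two-tone signal standing in for the dynamic pressure data), the one-sided spectrum and its dominant frequency can be obtained with NumPy's FFT routines:

```python
import numpy as np

fs = 1000.0                               # assumed sampling rate, Hz
t = np.arange(0, 1.0, 1.0 / fs)
# Synthetic stand-in for a dynamic pressure signal: 50 Hz + weaker 120 Hz tone
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

X = np.fft.rfft(x)                        # one-sided spectrum of the real signal
freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
amps = np.abs(X) * 2.0 / len(x)           # amplitude of each frequency component
peak_freq = freqs[np.argmax(amps)]        # dominant component (here, 50 Hz)
```

The amplitude array `amps` is what gets arranged into the frequency-domain feature matrix fed to the CNN; the phase information is available via `np.angle(X)`.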
The result from the FFT is organized into a format suitable for CNN input. Typically, the spectral data can be treated as a two-dimensional image, where the horizontal axis represents frequency and the vertical axis represents time (in the case of the Short-Time Fourier Transform) or another dimension, or simply as a one-dimensional vector. Suppose we arrange the spectral data into a two-dimensional matrix $X_F$, where element $X_F(i, j)$ represents the frequency-domain value at a specific frequency and time point.
The organized frequency-domain feature matrix is then used as input to the CNN. The CNN extracts spatial features (patterns within the frequency domain) through convolutional and pooling operations. The specific steps are as follows:
Convolution Operation: The convolutional layer employs multiple convolution kernels to perform convolution operations on the input frequency-domain feature matrix, capturing local patterns under different frequency components. The mathematical expression for the convolution operation is:

$$H^{(l)} = \sigma\left(W^{(l)} * H^{(l-1)} + b^{(l)}\right)$$

where $H^{(l)}$ is the output feature map after the $l$-th convolutional layer, $\sigma$ is the activation function, $W^{(l)}$ is the weight matrix of the convolution kernel in the $l$-th layer, $*$ denotes the convolution operation, and $b^{(l)}$ is the bias vector for the $l$-th layer.
Pooling Operation: The pooling layer (typically max pooling or average pooling) downsamples the convolved feature maps. This reduces the dimensionality of the feature maps, improves computational efficiency, and enhances the translation invariance of the features. The mathematical expression for max pooling is:

$$P^{(l)}_{i,j} = \max_{(p,q) \in R_{i,j}} H^{(l)}_{p,q}$$

where $P^{(l)}$ is the output feature map after the $l$-th pooling layer, and $R_{i,j}$ is the region covered by the pooling window.
Through multiple layers of convolutional and pooling operations, the CNN can progressively extract high-level spatial features from the frequency-domain characteristics. These features can more effectively represent the patterns and structures of the dynamic pressure signal across different frequency components.
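The convolution and pooling steps above can be made concrete with a minimal NumPy sketch. This is an illustration of the operations only (single channel, "valid" padding, ReLU activation), not the actual network configuration used in STFF-CANet:

```python
import numpy as np

def conv2d(x, kernel, bias=0.0):
    """'Valid' 2-D convolution of one feature map with one kernel, then ReLU."""
    kh, kw = kernel.shape
    h = x.shape[0] - kh + 1
    w = x.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            # Local pattern response at position (i, j)
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * kernel) + bias
    return np.maximum(out, 0.0)   # ReLU activation sigma

def max_pool2d(x, size=2):
    """Non-overlapping max pooling with a size x size window."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))
```

An 8x8 frequency-domain patch convolved with a 3x3 kernel yields a 6x6 feature map, which 2x2 max pooling reduces to 3x3, halving each spatial dimension while retaining the strongest local responses.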
3.2.2. Temporal Feature Extraction
To extract more representative features from the dynamic pressure signal, a method combining VMD and BiLSTM can be employed. The detailed description is as follows:
First, the acquired dynamic pressure signal is decomposed using VMD, as described in 2.2. The individual IMF components obtained from VMD are then used as input for subsequent analysis. Each IMF component can be regarded as a time-frequency domain feature sequence. To facilitate processing by the BiLSTM, these IMF components are organized and concatenated to form a comprehensive time-frequency domain feature matrix, denoted as $F_{VMD}$. Assuming there are $K$ IMF components, each containing data points at $N$ time instances, $F_{VMD}$ can be represented as a $K \times N$ matrix, where element $F_{VMD}(i, j)$ represents the feature value of the $i$-th IMF component at the $j$-th time point.
The organized time-frequency domain feature matrix is then input into the BiLSTM. The BiLSTM consists of two oppositely directed LSTM layers, enabling it to simultaneously utilize both past and future context information. This allows for a more effective capture of the dependencies and evolutionary patterns of the signal along the temporal dimension. The BiLSTM integrates the forward and backward processing of the input data, thereby extending the standard LSTM architecture. In this setup, the input sequence is processed by two distinct LSTM networks. One network reads the sequence from start to end (the forward LSTM), while the other reads it from end to start (the backward LSTM). This bidirectional processing allows the BiLSTM to capture dependencies that might not be evident through unidirectional analysis alone, resulting in a more accurate representation of the input data.
The LSTM is capable of capturing long-term dependencies within time series. The core of an LSTM unit consists of a cell state and three gating mechanisms (the input gate, the forget gate, and the output gate). Taking a single LSTM unit as an example, its forward propagation process can be described by the following formulas:

Forget Gate: Determines which information to discard from the cell state.

$$f_t = \sigma\left(W_f \left[h_{t-1}, x_t\right] + b_f\right)$$

Input Gate: Determines how much new information should be incorporated into the cell state.

$$i_t = \sigma\left(W_i \left[h_{t-1}, x_t\right] + b_i\right)$$
$$\tilde{C}_t = \tanh\left(W_C \left[h_{t-1}, x_t\right] + b_C\right)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

Output Gate: Determines what value is to be output.

$$o_t = \sigma\left(W_o \left[h_{t-1}, x_t\right] + b_o\right)$$
$$h_t = o_t \odot \tanh\left(C_t\right)$$

where $x_t$ is the input at time $t$, $h_{t-1}$ is the hidden state at time $t-1$, and $C_{t-1}$ is the cell state at time $t-1$; $W_f$, $W_i$, $W_C$, $W_o$ are weight matrices; $b_f$, $b_i$, $b_C$, $b_o$ are bias vectors; $\sigma$ is the sigmoid activation function; and $\tanh$ is the hyperbolic tangent activation function.
The BiLSTM obtains the final output features by concatenating the hidden states of the forward LSTM and the backward LSTM. Assume the hidden state of the forward LSTM at time $t$ is $\overrightarrow{h}_t$, and the hidden state of the backward LSTM at time $t$ is $\overleftarrow{h}_t$; then, the output of the BiLSTM at time $t$ is:

$$h_t = \left[\overrightarrow{h}_t;\ \overleftarrow{h}_t\right]$$
Through the processing of BiLSTM, the long-term dependencies and evolutionary patterns of dynamic pressure signals along the temporal dimension can be fully explored, enabling the extraction of more representative temporal features.
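The gate equations and the bidirectional concatenation can be sketched directly in NumPy. This is a pedagogical single-layer illustration with a fused gate weight matrix (an assumption of ours, common in practice), not the trained model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W maps [h_prev; x_t] to the four stacked gate pre-activations."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    H = h_prev.size
    f = sigmoid(z[:H])            # forget gate f_t
    i = sigmoid(z[H:2*H])         # input gate i_t
    g = np.tanh(z[2*H:3*H])       # candidate cell state C~_t
    o = sigmoid(z[3*H:])          # output gate o_t
    c = f * c_prev + i * g        # new cell state C_t
    h = o * np.tanh(c)            # new hidden state h_t
    return h, c

def bilstm_outputs(xs, H, W, b):
    """Concatenate forward and backward hidden states at each time step."""
    def run(seq):
        h, c, hs = np.zeros(H), np.zeros(H), []
        for x_t in seq:
            h, c = lstm_step(x_t, h, c, W, b)
            hs.append(h)
        return hs
    fwd = run(xs)                 # reads the sequence start-to-end
    bwd = run(xs[::-1])[::-1]     # reads it end-to-start, then re-aligns
    return [np.concatenate([hf, hb]) for hf, hb in zip(fwd, bwd)]
```

Note that each BiLSTM output has dimension 2H, since the forward and backward hidden states are concatenated; a production model would use separate forward and backward weights.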
In summary, we first use VMD to decompose the dynamic pressure signal into multiple Intrinsic Mode Functions, obtaining the time-frequency domain feature matrix $F_{VMD}$. This matrix is then input into the BiLSTM, which captures the dependencies and evolutionary patterns of the signal along the temporal dimension through its bidirectional processing mechanism. This method combines the strengths of VMD in signal decomposition with the powerful capability of BiLSTM in extracting temporal sequence features, allowing for the effective extraction of valuable feature information from dynamic pressure signals.
3.3. Spatio-Temporal Feature Fusion
After separately extracting the spatial features from the frequency domain and the temporal features from the time domain, the model employs a cross-attention mechanism to fuse these two types of features. The spatiotemporal feature fusion utilizes a cross-attention mechanism, enabling deep interaction between spatial and temporal features. This allows for more precise identification of weak fault characteristics that are easily overlooked, thereby enhancing the accuracy of surge diagnosis.
3.3.1. Feature Input
Assume that after preprocessing, the spatial features from the frequency domain are represented as $F_{FFT}$ with dimensions $d_f \times N_f$ (where $d_f$ is the feature dimensionality and $N_f$ is the number of features), and the temporal features are represented as $F_{VMD}$ with dimensions $d_v \times N_v$ ($d_v$ is the temporal feature dimensionality and $N_v$ is the number of temporal features). To perform the cross-attention computation, these two feature sets must be appropriately transformed and aligned.
3.3.2. Cross-Attention Computation
The core of the cross-attention mechanism is to compute the similarity between the Query (Q), Key (K), and Value (V) to determine the degree of association between different features. Here, we can use the spatial feature $F_{FFT}$ as the Query (Q) and the temporal feature $F_{VMD}$ as the Key (K) and Value (V), or vice versa. The following description uses the former as an example.
First, linear transformations are applied to the Query, Key, and Value to map them into the same dimensional space for computing attention weights. Let the transformed Query matrix be $Q$, the Key matrix be $K$, and the Value matrix be $V$. The transformation process can be represented as:

$$Q = W_Q F_{FFT}, \quad K = W_K F_{VMD}, \quad V = W_V F_{VMD}$$

where $W_Q$, $W_K$, and $W_V$ are learnable weight matrices, whose dimensions are determined based on the input features and the target dimension.
Next, compute the similarity between the Query and the Key to obtain the attention scores. A commonly used method for calculating similarity is the dot product similarity, formulated as follows:

$$S = Q^{\top} K$$
To ensure better numerical stability of the attention scores, they are typically scaled. The scaling factor is $\sqrt{d_k}$ (where $d_k$ is the dimensionality of the key vectors), resulting in the scaled attention scores:

$$S' = \frac{Q^{\top} K}{\sqrt{d_k}}$$
Subsequently, a softmax function is applied to the scaled attention scores to obtain the attention weights, ensuring that the weights sum to 1:

$$A = \mathrm{softmax}\left(S'\right)$$
Finally, a weighted sum of the Value matrix is performed according to the attention weights to obtain the fused feature representation $F$:

$$F = V A^{\top}$$
The learnable weight matrices $W_Q$, $W_K$, and $W_V$ are designed to project the input features $F_{FFT}$ (dimensions $d_f \times N_f$) and $F_{VMD}$ (dimensions $d_v \times N_v$) into a shared latent space of dimension $d_k$, ensuring compatibility for the attention calculation: $Q \in \mathbb{R}^{d_k \times N_f}$, $K, V \in \mathbb{R}^{d_k \times N_v}$. The attention weights obtained from the softmax have a clear physical meaning: they represent the adaptive correlation strength between each frequency-domain spatial feature (in Q) and each time-domain temporal feature (in K). For instance, in our case studies, high attention weights were consistently assigned to interactions between low-frequency spectral bands from the FFT features (indicative of surge, as per 2.1) and high-energy transient peaks within specific VMD-IMF components. This demonstrates the mechanism’s ability to automatically focus on and fuse the most salient and mutually reinforcing spatiotemporal signatures of a surge event.
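The full Q/K/V projection, scaling, softmax, and weighted sum can be sketched in NumPy. For readability this sketch uses a row-major layout (features as rows, i.e., the transpose of the paper's $d \times N$ convention), and the weight matrices are random stand-ins for learned parameters:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(F_fft, F_vmd, W_q, W_k, W_v):
    """Spatial features attend over temporal features (Q from FFT, K/V from VMD).

    F_fft: (Nf, df) spatial features; F_vmd: (Nv, dv) temporal features.
    Returns the fused features (Nf, dk) and the attention weight matrix (Nf, Nv).
    """
    Q = F_fft @ W_q                         # project into shared dk-dim space
    K = F_vmd @ W_k
    V = F_vmd @ W_v
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)         # scaled dot-product similarity
    A = softmax(scores, axis=-1)            # each row sums to 1
    return A @ V, A                         # weighted sum of values, and weights
```

Each row of `A` shows how strongly one frequency-domain feature attends to each temporal feature, which is the adaptive correlation described above.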
3.3.3. Feature Fusion and Diagnosis
The fused feature $F$ incorporates deep interactive information between the spatial features from the frequency domain and the temporal features, providing a more comprehensive representation of surge fault characteristics. This fused feature $F$ is then fed into a fully connected layer (FC) for further feature transformation and integration. The fully connected layer can be expressed as:

$$y = \sigma\left(W F + b\right)$$

where $W$ is the weight matrix of the fully connected layer, $b$ is the bias vector, and $\sigma$ is an activation function.
Finally, a Softmax layer (SM) maps the output of the fully connected layer to a probability space, yielding the final classification or fault diagnosis result, which is used to determine the type or state of the surge fault. The formula for the Softmax function is:

$$P_i = \frac{e^{z_i}}{\sum_{c=1}^{C} e^{z_c}}$$

where $z_i$ is the $i$-th element of the output from the fully connected layer, $C$ is the number of classes, and $P_i$ represents the probability that the sample belongs to the $i$-th class.
Through the cross-attention mechanism, the model can adaptively focus on the correlations between frequency-domain spatial features and temporal-domain features, facilitating deep interaction between the two. This approach helps capture weak fault characteristics that are easily overlooked when relying on a single feature modality, thereby enhancing the accuracy and reliability of surge fault diagnosis.
In summary, the proposed model combines FFT and VMD for data preprocessing, utilizes CNN and BiLSTM to extract frequency-domain and time-domain features respectively, and realizes spatiotemporal feature fusion via a cross-attention mechanism. This integrated approach leverages the respective strengths of different methods in signal processing and feature extraction, enabling a more comprehensive and accurate analysis of surge fault data.
4. Results and Discussion
In this section, based on the experimental data from aero engine tests and the STFF-CANet diagnosis model, the evaluation metrics and surge diagnosis results of the model are presented.
4.1. Experimental Data
The dataset consists of approximately one million dynamic pressure data points obtained from a 1000 kg-class turbofan engine, covering multiple test scenarios including start-up, acceleration, deceleration, thrust increase, and inlet distortion. All experimental data were obtained under controlled ground test-stand conditions, which are the standard and necessary precursor for validating surge characteristics before flight testing. These data are used as the training, validation, and test sets for the model. The aforementioned STFF-CANet diagnostic model is employed to diagnose engine surge faults.
4.2. Evaluating Indicator
To evaluate the performance of the classification model, this paper employs the metrics of Precision, Recall, F1-score, and Accuracy. These four indicators are calculated to assess the overall capability of the fault detection model.
To distinguish the performance of the diagnostic model, we utilize four fundamental definitions: True Positive (TP), False Negative (FN), False Positive (FP), and True Negative (TN). Their specific definitions are as follows:
TP is the total number of samples where the predicted label is fault and the actual label is also fault.
FN is the total number of samples where the predicted label is normal but the actual label is fault.
FP is the total number of samples where the predicted label is fault but the actual label is normal.
TN is the total number of samples where the predicted label is normal and the actual label is also normal.
The four metrics are computed as:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}$$
$$\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \quad \mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$

In the formulas: TP denotes the number of instances correctly predicted as the positive class, i.e., the count of correctly identified surge faults; FP denotes the number of instances incorrectly predicted as the positive class, i.e., the count of normal states misclassified as faults; FN denotes the number of instances incorrectly predicted as the negative class, i.e., the count of surge faults misclassified as normal. Precision represents the model’s ability to accurately detect faults; Recall represents the model’s ability to identify all surge fault samples within the dataset; and the F1-score indicates the comprehensive performance of the model.
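These standard definitions translate directly into code; the helper below (our naming) computes all four metrics from the confusion-matrix counts:

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, Recall, F1-score and Accuracy from confusion-matrix counts.

    The positive class is 'surge fault'; assumes tp+fp > 0 and tp+fn > 0.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy
```

With the roughly 90:10 normal-to-surge class imbalance noted in 3.1, accuracy alone can look high even for a weak detector, which is why Precision, Recall, and F1-score on the surge class are reported alongside it.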
4.3. Surge Diagnosis Results
To validate the diagnostic effectiveness of the designed STFF-CANet model, this study compares it with other baseline models using the same dataset. Four evaluation metrics—macro-averaged F1-score, Recall, Precision, and overall Accuracy—are employed to assess the performance of the five models. To minimize randomness, each diagnostic model experiment was trained for 50 epochs and tested 20 times on the test set. The final evaluation metrics are derived from the average of these 20 test runs. The fault diagnosis results of different methods on the dataset are presented in
Table 1.
As can be seen from
Table 1, compared to the CNN, LSTM, BiLSTM, and CNN-BiLSTM diagnostic models, the STFF-CANet model proposed in this paper achieves the best performance, with its F1-score, Recall, Precision, and Accuracy for surge fault classification reaching 97.96%, 97.52%, 98.43%, and 99.01%, respectively. Furthermore, the experimental results validate that the dataset generation and preprocessing method proposed in this paper is effective for surge fault diagnosis, as all diagnostic models listed in
Table 1 attain a precision above 85%.
As described in 4.1, the dataset used for model training encompassed data from the engine across multiple test scenarios, including start-up, acceleration, deceleration, thrust increase, and inlet distortion.
Table 1 presents the evaluation metrics of the STFF-CANet model across all scenarios, demonstrating its robustness to varying operating modes and flight conditions. The model’s effectiveness stems from its training on a composite dataset incorporating these varied modes, enabling it to learn surge signatures that are invariant to the underlying steady-state operating point but are triggered by specific transient aerodynamic events.
To effectively identify aero engine surge faults, a deep learning-based diagnosis model named STFF-CANet is proposed, together with a sliding window-based dataset construction method. First, surge feature analysis is performed on aero engine sensor data in both the time and frequency domains, and the sliding window-based data preprocessing algorithm is applied to construct the dataset and label set. Then, by integrating a convolutional neural network and a long short-term memory network, a deep neural network model tailored for aero engine surge fault diagnosis is designed. Finally, the proposed model is compared with state-of-the-art deep neural networks on the constructed dataset. Experimental results demonstrate that the proposed model achieves an F1-score, Recall, Precision, and Accuracy of 97.96%, 97.52%, 98.43%, and 99.01%, respectively, for surge fault classification, outperforming the other network models.
4.4. Ablation Study
To validate the necessity and individual contribution of the core modules in the STFF-CANet framework, an ablation study was conducted. We designed two model variants and compared their performance with the full STFF-CANet model on the same test set. The results of the ablation study are summarized in
Table 2.
Variant A (No Sliding Window): The model was trained and tested on the original, non-overlapping data segments (i.e., without the sliding window augmentation described in Section 3.1). This directly reduces the number of training samples.
Variant B (No Cross-Attention): The spatiotemporal features extracted by the CNN and BiLSTM were fused using simple concatenation instead of the proposed cross-attention mechanism.
Table 2.
Results of the ablation study.
| Model Variant | F1 Score/% | Recall/% | Precision/% | Accuracy/% |
|---|---|---|---|---|
| STFF-CANet | 97.96 | 97.52 | 98.43 | 99.01 |
| Variant A (No Sliding Window) | 91.01 | 90.87 | 89.20 | 87.55 |
| Variant B (No Cross-Attention) | 90.88 | 89.40 | 87.38 | 88.52 |
The performance drop in Variant A confirms the effectiveness of the sliding window method in augmenting limited data and providing richer sequential context, which is crucial for learning transient surge characteristics. The inferior results of Variant B compared to the full model demonstrate that the cross-attention mechanism provides a superior fusion strategy over simple concatenation, enabling adaptive, deep interaction between spatiotemporal features. This ablation study quantitatively validates the necessity and individual contribution of each proposed core module to the overall model performance.
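The cross-attention fusion that distinguishes the full model from Variant B can be illustrated with a minimal NumPy sketch. This is a deliberate simplification of the paper's module: the learned query/key/value projections are omitted (identity projections are assumed), the spatial CNN features act as queries, and the temporal BiLSTM features supply keys and values:

```python
import numpy as np

def cross_attention_fuse(spatial, temporal):
    """Fuse CNN (spatial) and BiLSTM (temporal) features via scaled
    dot-product cross-attention, then concatenate the attended temporal
    context onto the spatial features. Shapes are illustrative."""
    d = temporal.shape[-1]
    scores = spatial @ temporal.T / np.sqrt(d)           # (Ns, Nt) attention logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    attended = weights @ temporal                        # (Ns, d) temporal context
    return np.concatenate([spatial, attended], axis=-1)  # fused representation

rng = np.random.default_rng(0)
spatial = rng.standard_normal((6, 16))    # 6 spatial feature tokens
temporal = rng.standard_normal((10, 16))  # 10 temporal feature tokens
fused = cross_attention_fuse(spatial, temporal)
```

Unlike the simple concatenation of Variant B, each fused token here carries a data-dependent weighted summary of the whole temporal sequence, which is the adaptive interaction the ablation result attributes the performance gap to.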
4.5. Discussion on Model Feasibility and Practical Deployment
The practical deployment of the STFF-CANet model in real-world aero-engine health monitoring systems necessitates a rigorous assessment of its real-time performance and adaptation to embedded hardware constraints. To preliminarily assess its real-time potential, inference tests were performed on a high-performance multi-core CPU. The average inference time for processing a single data segment (with the sliding window length defined in Section 3.1) was approximately 15 milliseconds. This latency is primarily attributed to the sequential operations of the BiLSTM layers and the attention mechanism. While this result is promising, deployment and testing on embedded aviation systems are constrained by the current research conditions and are planned as the next step; further lightweighting of the model may prove necessary, which is itself a worthwhile direction for future research. Moreover, while the current model demonstrates high diagnostic accuracy on the available test data, its sustained accuracy over an engine’s lifetime necessitates further investigation. This will be a central focus of our subsequent research as we aim to collect more comprehensive lifecycle data and develop robust online adaptation methodologies.
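A per-segment latency of the kind reported above is typically obtained by averaging many timed forward passes after a warm-up phase. A self-contained timing harness sketch (the placeholder computation stands in for the trained model's forward pass, which is not reproduced here):

```python
import time

def measure_latency_ms(model_fn, sample, runs=100, warmup=10):
    """Average single-segment inference latency in milliseconds.

    model_fn stands in for the trained STFF-CANet forward pass; warm-up
    iterations run first so caches and lazy initialisation do not skew
    the timed loop.
    """
    for _ in range(warmup):
        model_fn(sample)
    start = time.perf_counter()
    for _ in range(runs):
        model_fn(sample)
    return (time.perf_counter() - start) / runs * 1e3

# placeholder "model": a cheap reduction over one data segment
dummy_segment = list(range(1024))
latency_ms = measure_latency_ms(lambda s: sum(s), dummy_segment)
```

On embedded targets the same harness would be run on-device, since CPU-measured figures such as the 15 ms reported here do not transfer directly to avionics-grade hardware.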
The baseline models selected for comparison (CNN, LSTM, BiLSTM, CNN-BiLSTM) represent standard deep learning approaches for processing temporal and spatial features in fault diagnosis. The STFF-CANet model, with its targeted cross-attention fusion of complementary spatiotemporal features derived from FFT and optimized VMD, offers a parsimonious and effective solution tailored for the specific challenge of extracting weak surge precursors from limited, high-dimensional sensor data. Future work will focus on collecting more extensive datasets to enable fair and robust comparisons with more advanced architectures and on exploring hybrid models that may incorporate their strengths.