3.1. Data Collection and Preprocessing
Based on the test data from an LRE test bench, this study constructed a multi-channel time-series dataset comprising normal conditions (class0) and four typical fault states: ball bearing failure (class1), fatigue fracture of the turbine rotor disk-shaft joint (class2), oxygen turbopump ablation (class3), and pipeline fatigue fracture (class4). The selection of these four fault types was based on domain expert knowledge and historical failure cases, as they represent typical and frequently encountered failure modes during actual LRE operation. This dataset was used for training and validating the proposed fault diagnosis model. The data were collected from over 20 sensors distributed across key locations, including the combustion chamber and the turbopump, with sampling rates of either 1 kHz or 100 Hz, and a total duration of approximately 500 s. During the construction of the final dataset for training and generation, based on expert knowledge and fault-relevance analysis, we selected 14 signal channels strongly correlated with fault evolution from the original set of over 20 sensors, excluding some channels that monitored redundant environmental parameters or remained static throughout the tests. This feature selection process aimed to reduce noise interference and computational complexity while retaining the most representative physical information for fault diagnosis.
During the data preprocessing stage, the raw engine time-series data were first subjected to outlier removal and cleaning, followed by signal smoothing. Specifically, targeted Gaussian smoothing filtering was applied to the multidimensional sensor signals, with different smoothing intensities applied to different parameters. This operation effectively suppressed high-frequency noise and improved signal stability. Next, to ensure consistent sampling lengths across dimensions such as engine temperature and pressure, a time-window-based truncation strategy was employed. Using the engine emergency shutdown time as the reference, data segments 10 s before shutdown and 5 s after were truncated. This approach not only captured the pre-fault evolution process and post-fault characteristics but also standardized the time length of all sample data. Subsequently, standardization was performed to eliminate dimensional differences among the engine’s physical parameters. The standardized data exhibited a consistent scale across parameters, reducing the model’s dependence on parameter magnitudes and thereby enhancing training stability. Furthermore, the standardization statistics were retained so that inverse standardization could later map generated data back to real physical dimensions.
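The preprocessing chain described above can be sketched as follows. This is a minimal illustration of the three steps (Gaussian smoothing, shutdown-centered windowing, and standardization); the kernel construction, the 1e-8 stabilizer, and all parameter values are illustrative assumptions rather than the study's exact settings.

```python
import numpy as np

def gaussian_smooth(x, sigma):
    """Smooth a 1-D signal with a truncated Gaussian kernel (per-channel sigma)."""
    radius = int(4 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()
    return np.convolve(x, kernel, mode="same")

def extract_window(signal, shutdown_idx, fs, pre_s=10.0, post_s=5.0):
    """Truncate a fixed window: 10 s before and 5 s after emergency shutdown."""
    start = shutdown_idx - int(pre_s * fs)
    stop = shutdown_idx + int(post_s * fs)
    return signal[start:stop]

def standardize(x, mean=None, std=None):
    """Z-score standardization; returns the stats so that generated data
    can later be inverse-mapped back to physical dimensions."""
    mean = x.mean() if mean is None else mean
    std = x.std() if std is None else std
    return (x - mean) / (std + 1e-8), mean, std
```

In this sketch each channel is processed independently, mirroring the per-parameter smoothing intensities described above.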
Through the aforementioned preprocessing steps, a high-quality, representative dataset was constructed to support the generation of small-sample fault data and fault classification for LREs. For each engine test, a fixed-length segment centered on the emergency shutdown event was extracted. Non-overlapping sliding windows were employed, ensuring that each sample corresponded to an independent test event. The original dataset comprised 407 samples, including 325 normal samples and 4 fault classes with 14, 27, 8, and 33 samples, respectively, resulting in pronounced class imbalance.
To address training instability and insufficient fault coverage under limited sample conditions, a two-stage data augmentation strategy was adopted. Initially, simulation-augmented data from domain experts at a collaborating institution were used to improve class balance and stabilize generative model training (due to confidentiality agreements, the institution’s identity and simulation details are not disclosed). After this stage, the normal samples increased to 1167, and the four fault classes expanded to 47, 51, 39, and 62 samples, respectively. Subsequently, a generative adversarial network (GAN) trained on real data was used to generate additional fault samples, resulting in final fault class sizes of 251, 347, 249, and 250 for model training and validation. To prevent potential information leakage due to temporal autocorrelation, the real dataset was partitioned into training and test sets at the sample (event) level, with a 7:3 split, thereby ensuring temporal independence between the two subsets.
3.2. GAN-Based Fault Data Generation Model
This study adopted a GAN for multidimensional time-series data to generate synthetic LRE fault samples through adversarial training between a generator and a discriminator, thereby alleviating the limited availability of real fault samples. Because LRE fault data are scarce, particularly for extracting weak fault features, insufficient data severely compromise diagnostic accuracy. Thus, a GAN was used to generate augmented samples similar to real fault data, supporting dataset augmentation and improving the training conditions for downstream classification under few-shot settings.
Figure 1 illustrates the overall architecture.
The generator employs a hierarchical architecture that starts from a low-resolution latent feature representation and progressively upsamples it to reconstruct multivariate time-series signals, thereby balancing computational efficiency and detail recovery capability. The model takes a 100-dimensional latent space vector as input, which is first mapped to a high-dimensional feature representation via a fully connected layer. A temporal downsampling strategy is subsequently adopted, significantly reducing the number of parameters and computational complexity of subsequent convolutional operations by lowering the time-step dimension. This design enables the model to process only small-scale feature maps in the initial stage, thereby achieving efficient resource allocation across both spatial and temporal dimensions.
During the feature reconstruction phase, the downsampled features are progressively upsampled across multiple layers of transposed convolutions. This hierarchical feature refinement strategy decomposes the complex temporal reconstruction task into multiple scale-specific processing stages. Consequently, the model can focus solely on the feature maps at the current scale in each step, avoiding the high computational cost of directly processing large-scale sequences while maintaining the capacity for detail restoration. To address the strong temporal characteristics of liquid rocket engine fault data, such as the dynamic correlation between sudden pressure changes and temperature rises—which can lead to “temporally chaotic” data when using standard GANs—this study integrated a bidirectional long short-term memory (BiLSTM) network into the generator. This module receives the preliminarily upsampled features from the transposed convolutions and leverages its bidirectional architecture to capture both forward and backward dependencies within the time series. This effectively enhances the temporal coherence of the generated data without substantially increasing the computational burden, preventing the production of synthetic samples with inconsistent temporal dynamics.
To balance the independence of and the correlation among the various dimensions of multivariate time-series data, the generator incorporates a dimension-interaction module. This module uses 1D convolutional layers to learn implicit relationships across parameter dimensions, generating a dimensional weight matrix. Feature-wise multiplication is then applied to achieve coordinated generation across multiple variables. This mechanism maintains the independent statistical properties of each parameter while ensuring the consistency of the multivariate time series. Finally, the output layer uses the hyperbolic tangent (Tanh) activation function to constrain the generated data to the interval [−1, 1]. This design ensures that the generated data share the value range of the real data normalized during preprocessing, thereby guaranteeing the rationality of the generated samples and their compatibility with subsequent processing steps.
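A minimal PyTorch sketch of a generator with this structure (fully connected mapping, temporal upsampling via transposed convolutions, a BiLSTM for temporal coherence, dimension-interaction weighting, and a Tanh output) is shown below. All layer widths, kernel sizes, and the 256-step sequence length are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Sketch: 100-d latent vector -> FC -> low-resolution feature map ->
    transposed-conv upsampling -> BiLSTM -> dimension interaction -> Tanh."""
    def __init__(self, latent_dim=100, n_channels=14, seq_len=256):
        super().__init__()
        self.base_len = seq_len // 4                 # low-resolution time axis
        self.fc = nn.Linear(latent_dim, 64 * self.base_len)
        self.upsample = nn.Sequential(               # two stages of x2 upsampling
            nn.ConvTranspose1d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose1d(32, 32, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.bilstm = nn.LSTM(32, 16, batch_first=True, bidirectional=True)
        self.interact = nn.Conv1d(32, n_channels, 1)  # dimension-interaction weights
        self.head = nn.Conv1d(32, n_channels, 1)
        self.tanh = nn.Tanh()

    def forward(self, z):
        h = self.fc(z).view(z.size(0), 64, self.base_len)
        h = self.upsample(h)                          # (B, 32, seq_len)
        h, _ = self.bilstm(h.transpose(1, 2))         # forward+backward dependencies
        h = h.transpose(1, 2)
        w = torch.sigmoid(self.interact(h))           # per-dimension weight matrix
        return self.tanh(self.head(h) * w)            # (B, 14, seq_len), in [-1, 1]
```

Processing the small `base_len`-step map first, then upsampling, reflects the paper's strategy of keeping the early layers cheap while recovering detail later.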
The core task of the discriminator is to distinguish between real LRE fault data and synthetic samples. To balance feature discrimination capability and model training stability, its architecture employs a spectral normalization convolutional network [28]. The discriminator in the proposed framework functions as a critic rather than as a probabilistic binary classifier. Under the WGAN-GP objective, it outputs an unconstrained scalar score instead of a probability bounded between 0 and 1. Spectral normalization is applied to the convolutional and linear layers to improve training stability, and a gradient penalty is introduced during optimization to enforce the Lipschitz constraint.
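Under these assumptions, a minimal PyTorch sketch of such a critic might look as follows; the channel counts, kernel sizes, and 256-step input length are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class Critic(nn.Module):
    """Sketch of the WGAN-GP critic: spectral-normalized 1-D convolutions
    ending in an unbounded scalar score (no sigmoid)."""
    def __init__(self, n_channels=14, seq_len=256):
        super().__init__()
        self.net = nn.Sequential(
            spectral_norm(nn.Conv1d(n_channels, 32, 5, stride=2, padding=2)),
            nn.LeakyReLU(0.2),
            spectral_norm(nn.Conv1d(32, 64, 5, stride=2, padding=2)),
            nn.LeakyReLU(0.2),
            nn.Flatten(),
            spectral_norm(nn.Linear(64 * (seq_len // 4), 1)),  # raw critic score
        )

    def forward(self, x):
        return self.net(x)   # shape (B, 1), unbounded
```

The absence of a final sigmoid is what distinguishes this critic from a probabilistic binary classifier.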
During the training of the generator and discriminator, to optimize the model and enhance convergence stability, this study adopted the WGAN-GP framework for network training, as studies have shown that both WGAN [29] and WGAN-GP [30] stabilize the optimization process and generalize well across architectures. Correspondingly, the generator was optimized using a composite objective consisting of an adversarial loss, a multi-scale temporal consistency loss, a frequency-domain loss, and an adaptive parameter-specific regularization term. The temporal term constrains first-order, second-order, and window-level trend consistency between generated and real sequences; the frequency-domain term matches the spectral characteristics of the generated and real signals; and the parameter-specific term places additional emphasis on channels identified as difficult to model during preprocessing. That is, the generator not only needs to “deceive” the discriminator but must also ensure that the temporal variation of the generated data is consistent with that of the real data. Through the adversarial learning between the generator and discriminator, 14-dimensional time-series data are generated. During training, the generator combines residual blocks and LSTM modules to capture multi-scale temporal features and dynamically weights its output via a dimension-interaction layer. In the current implementation, the discriminator is updated on every other mini-batch, whereas the generator is updated once per mini-batch. Gradient clipping and adaptive learning-rate scheduling are further used to stabilize training.
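The gradient-penalty term of the WGAN-GP objective can be sketched as below: the penalty pushes the norm of the critic's gradient toward 1 on random interpolates between real and generated batches. The coefficient of 10 follows the common WGAN-GP default and is an assumption here.

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """WGAN-GP gradient penalty: penalize ||grad_x D(x_hat)|| deviating from 1
    on random interpolates between real and fake batches."""
    eps = torch.rand(real.size(0), 1, 1)             # per-sample mixing coefficient
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(x_hat)
    grads = torch.autograd.grad(scores.sum(), x_hat, create_graph=True)[0]
    grad_norm = grads.flatten(1).norm(2, dim=1)      # gradient norm per sample
    return lambda_gp * ((grad_norm - 1) ** 2).mean()
```

This term is added to the critic's loss on each critic update, enforcing the Lipschitz constraint mentioned above.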
In the model evaluation phase, to assess the consistency between generated data and real fault data across multidimensional characteristics, this study adopted an evaluation system based on temporal characteristics, statistical distribution, and overall similarity. The system comprises three core dimensions, namely the temporal indicator score, the statistical indicator score, and the Wasserstein distance, and measures the quality of the generated data through a weighted composite score.
Temporal characteristics are key features of LRE fault data, and their evaluation includes four critical sub-indicators. The improved dynamic time warping (DTW) distance serves as the core evaluation metric, measuring the alignment degree of temporal morphology between the generated and real sequences. By calculating the matching distance between two time series, DTW accommodates the time shifts and nonlinear distortions common in engine fault signals. For example, for key parameters such as combustion chamber pressure and turbopump speed, DTW can accurately capture the positions of their mutation points and the trends in amplitude variation. Let the real data time series be X = (x1, x2, …, xn), the generated data time series be Y = (y1, y2, …, yn), and π denote an alignment path between them; the DTW distance is then obtained by solving for the optimal alignment path via dynamic programming:
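A basic dynamic-programming implementation of the DTW recursion (without the paper's improvements) might look like the following sketch, using absolute difference as the local cost.

```python
import numpy as np

def dtw_distance(x, y):
    """Classic dynamic-programming DTW between two 1-D sequences. D[i, j]
    accumulates the minimal alignment cost of x[:i] and y[:j]."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j],       # insertion
                                 D[i, j - 1],       # deletion
                                 D[i - 1, j - 1])   # match
    return D[n, m]
```

Because the recursion allows one sample to align with several in the other sequence, a time-shifted but morphologically identical signal still yields a near-zero distance, which is the property exploited for mutation-point matching.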
The remaining three indicators are the robust autocorrelation (ACF) similarity, the power spectrum divergence, and the trend consistency. Among them, the ACF similarity is mainly used to evaluate the degree to which temporal dependencies are preserved, and is particularly suitable for reflecting periodic patterns in vibration signals (e.g., bearing faults) and pressure pulsations. The calculation of this indicator is shown in Equation (2), where ACF denotes the autocorrelation function, and ρ represents the Pearson correlation coefficient.
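A simplified numpy sketch of this indicator is shown below: the sample ACF of each sequence is computed up to a maximum lag, and the Pearson correlation ρ between the two ACF curves serves as the similarity score. The lag horizon of 20 is an illustrative assumption.

```python
import numpy as np

def acf(x, max_lag=20):
    """Sample autocorrelation function of a 1-D sequence up to max_lag."""
    x = x - x.mean()
    var = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / var
                     for k in range(1, max_lag + 1)])

def acf_similarity(x, y, max_lag=20):
    """Pearson correlation between the ACFs of two sequences, a simplified
    proxy for the robust autocorrelation-similarity metric."""
    ax, ay = acf(x, max_lag), acf(y, max_lag)
    return np.corrcoef(ax, ay)[0, 1]
```

Because the ACF discards phase, two signals sharing the same periodicity score close to 1 even when shifted in time.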
For the calculation of the power spectrum divergence indicator, this study adopted the Jensen–Shannon Divergence, as shown in Equation (3):
where PX and PY are the normalized power spectra of the real signal and the generated signal, respectively.
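Equation (3) can be sketched numerically as follows: each signal's power spectrum is normalized into a probability distribution, and the Jensen–Shannon divergence is computed as the symmetrized KL divergence against the mixture. With base-2 logarithms the value lies in [0, 1]; the base choice is an assumption here.

```python
import numpy as np

def psd_js_divergence(x, y):
    """Jensen-Shannon divergence between the normalized power spectra
    of two signals (numpy-only sketch, base-2 logs)."""
    def norm_psd(s):
        p = np.abs(np.fft.rfft(s)) ** 2
        return p / p.sum()                 # spectrum as a probability distribution
    def kl(p, q):
        mask = p > 0                       # q >= p/2 > 0 on this mask
        return np.sum(p[mask] * np.log2(p[mask] / q[mask]))
    p, q = norm_psd(x), norm_psd(y)
    m = 0.5 * (p + q)                      # mixture distribution
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Identical spectra yield 0, while tones at disjoint frequency bins approach the maximum of 1 bit.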
The last indicator, trend consistency, is implemented using sliding-window linear regression to verify the accuracy of long-term trends. In the sliding-window linear regression used for trend consistency evaluation, the window length is selected according to the sampling frequency and the characteristic time scale of signal evolution. The chosen window size is sufficiently large to smooth short-term fluctuations while preserving long-term trend information. Empirically, this setting provides a stable estimate of temporal trends, and minor variations in the window length do not lead to significant changes in trend consistency.
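A sketch of this check, under the assumption that trend consistency is scored as the correlation of per-window regression slopes, is given below; the non-overlapping window of 50 samples is illustrative.

```python
import numpy as np

def window_slopes(x, win):
    """Least-squares slope in each non-overlapping window of length `win`."""
    t = np.arange(win)
    slopes = []
    for s in range(0, len(x) - win + 1, win):
        slopes.append(np.polyfit(t, x[s:s + win], 1)[0])  # fitted slope
    return np.array(slopes)

def trend_consistency(x, y, win=50):
    """Correlation of per-window regression slopes between real and
    generated sequences (sliding-window linear regression sketch)."""
    return np.corrcoef(window_slopes(x, win), window_slopes(y, win))[0, 1]
```

Choosing `win` large enough to smooth short-term fluctuations, as described above, makes the slope sequence reflect only the long-term trend.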
The Wasserstein distance measures the difference between the distributions of generated data and real data, and its calculation formula is:
where the infimum is taken over Π(PX, PY), the set of all joint distributions whose marginals are PX and PY, the distributions of the real data X and the generated data Y, respectively.
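For one-dimensional empirical samples of equal size, the Wasserstein-1 distance reduces to the mean absolute difference of the sorted values (order statistics), which gives a compact sketch of how this metric can be evaluated per channel.

```python
import numpy as np

def wasserstein_1d(x, y):
    """Empirical 1-D Wasserstein-1 distance between two equal-size samples:
    mean absolute difference of the order statistics."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))
```

A constant shift of the whole sample by c yields a distance of exactly c, which makes the metric easy to sanity-check.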
To comprehensively and quantitatively evaluate the quality of the generated data, this study establishes a hierarchical evaluation framework in which the Temporal Metric Score and Statistical Metric Score are defined as weighted aggregates of multiple underlying metrics. Specifically, the Temporal Metric Score integrates four core sub-metrics: DTW distance, ACF similarity, PSD divergence, and trend consistency, as shown in Equation (5).
where N denotes the number of sensor channels, Mij represents the i-th fundamental temporal metric of the j-th channel, Norm(·) denotes the normalization operation, and ωi is the internal weighting coefficient for each metric.
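The aggregation in Equation (5) can be sketched as below, assuming per-metric min-max normalization across channels followed by a weighted sum and a channel average; both the normalization scheme and the weights are illustrative assumptions.

```python
import numpy as np

def temporal_score(metrics, weights):
    """Weighted aggregate of normalized temporal sub-metrics.
    `metrics` has shape (n_channels, n_metrics); `weights` has length n_metrics."""
    m = np.asarray(metrics, dtype=float)
    lo, hi = m.min(axis=0), m.max(axis=0)
    norm = (m - lo) / np.where(hi > lo, hi - lo, 1.0)  # per-metric min-max norm
    return float(np.mean(norm @ np.asarray(weights)))   # average over channels
```

With four columns for DTW distance, ACF similarity, PSD divergence, and trend consistency, this yields a single score per generated dataset.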
The Statistical Metric Score consolidates the relative errors of key statistical moments, including the mean, standard deviation, skewness, and kurtosis, as shown in Equation (6).
where Vreal and Vgen represent the statistical values of the real and generated data, respectively.
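The underlying per-moment comparison can be sketched as follows: the mean, standard deviation, skewness, and kurtosis of each set are computed and compared as relative errors. The numpy-only moment formulas and the epsilon guard are assumptions of this sketch.

```python
import numpy as np

def moment_relative_errors(real, gen):
    """Relative errors of mean, std, skewness, and kurtosis between
    real (Vreal) and generated (Vgen) samples."""
    def moments(v):
        mu, sd = v.mean(), v.std()
        z = (v - mu) / (sd + 1e-12)
        return np.array([mu, sd, (z ** 3).mean(), (z ** 4).mean()])
    mr, mg = moments(np.asarray(real)), moments(np.asarray(gen))
    return np.abs(mr - mg) / (np.abs(mr) + 1e-12)      # |Vreal - Vgen| / |Vreal|
```

Aggregating these four errors with weights, as in Equation (6), gives the Statistical Metric Score.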
The generator and discriminator were trained using the Adam optimizer (β1 = 0.5, β2 = 0.9) for 5000 epochs, with the model evaluated every 100 epochs. The optimal model was selected by jointly considering statistical similarity, DTW distance, and classifier accuracy. Finally, the generated data were restored to their original dimensions via inverse standardization. In this framework, the GAN component is primarily introduced to improve the training distribution under severe data scarcity and class imbalance. The current evaluation supports the temporal and statistical consistency of the generated samples, but does not establish complete physical realism.
3.3. LRE Fault Classification Based on the MHA-CBL Model
Using the generated augmented data samples, this study constructed a hybrid classification model, termed MHA-CBL, that integrates a convolutional neural network (CNN), a bidirectional long short-term memory network (BiLSTM), and multi-head attention (MHA) to classify fault types during LRE fault diagnosis. The CNN is recognized as an effective technique for extracting spatial features from data [31]; thus, by leveraging the CNN to extract local temporal features, the BiLSTM to capture long-range dependencies, and multi-head self-attention to focus on key data features, and by fusing GAN-generated augmented samples with real data for training, the model not only addresses the data scarcity issue but also effectively improves the accuracy of fault classification under small-sample conditions.
First, the input layer of the MHA-CBL model consists of time-series signals, such as temperature and pressure, with a fixed time step per sample. In terms of model architecture, a multi-layer CNN is first used for local feature extraction, with the number of channels (K) gradually increasing from 16 to 128 and the size of convolutional kernels (C) varying across layers. The use of convolutional layers enables the model to extract hierarchical local features from raw samples progressively. After each convolutional operation, a batch normalization layer is applied to accelerate training, followed by an average pooling layer to reduce data dimensionality—this retains key information while effectively reducing computational complexity. Meanwhile, to prevent model overfitting, a dropout layer is added immediately after the pooling layer. After local features are extracted by CNN, a BiLSTM network is adopted to capture long-range temporal dependencies. Owing to its bidirectional structure, the model can capture forward and backward information in time-series data; thus, BiLSTM enhances the model’s understanding of time-series data, which is crucial for accurately identifying fault categories. Second, to focus on the critical moments of LRE faults, the model is equipped with multi-head attention. An 8-head attention mechanism dynamically weights each time step, enabling effective identification of key time-step features associated with fault occurrence.
After the hierarchical processing above, the features are fed into a classification head for decision-making. The features are first flattened and then fed into a fully connected layer with 100 neurons for classification. This layer uses the ReLU activation function to introduce nonlinearity; a dropout layer is applied after it for regularization to prevent overfitting. Finally, the model outputs the predicted probabilities for different fault categories via a Softmax activation layer, thereby completing fault classification. The detailed layer structure and parameters of the MHA-CBL model are shown in Table 1.
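A minimal PyTorch sketch of a classifier with this layout (Conv1d blocks growing from 16 to 128 channels with batch normalization, average pooling, and dropout, then a BiLSTM, 8-head self-attention, and a 100-unit dense head with softmax) is shown below. Kernel sizes, pooling windows, dropout rates, and the 256-step input length are illustrative assumptions; Table 1 gives the exact configuration.

```python
import torch
import torch.nn as nn

class MHACBL(nn.Module):
    """Sketch of the MHA-CBL classifier: CNN -> BiLSTM -> multi-head
    attention -> dense head with softmax over the 5 classes."""
    def __init__(self, n_channels=14, seq_len=256, n_classes=5):
        super().__init__()
        def block(cin, cout, k):
            return nn.Sequential(
                nn.Conv1d(cin, cout, k, padding=k // 2), nn.BatchNorm1d(cout),
                nn.ReLU(), nn.AvgPool1d(2), nn.Dropout(0.3))
        self.cnn = nn.Sequential(block(n_channels, 16, 7), block(16, 32, 5),
                                 block(32, 64, 5), block(64, 128, 3))
        self.bilstm = nn.LSTM(128, 64, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(128, num_heads=8, batch_first=True)
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(128 * (seq_len // 16), 100),
            nn.ReLU(), nn.Dropout(0.3), nn.Linear(100, n_classes))

    def forward(self, x):                    # x: (B, 14, seq_len)
        h = self.cnn(x).transpose(1, 2)      # (B, T, 128) local features
        h, _ = self.bilstm(h)                # long-range dependencies
        h, _ = self.attn(h, h, h)            # 8-head attention over time steps
        return torch.softmax(self.head(h), dim=-1)
```

Each pooling stage halves the time axis, so after four blocks the BiLSTM and attention operate on a 16-step feature sequence.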
To improve reproducibility, we provide the complete architectural specification of the proposed MHA-CBL model in Table 1. For each layer, we report not only the output shape and parameter count but also the activation function and critical hyperparameters. Specifically, for Conv1D layers, we list the kernel size, stride, padding mode, and the number of filters; for pooling layers we provide the pooling type and window/stride; for the BiLSTM, we report the number of units and whether it is bidirectional and returns sequences; and for the multi-head attention module, we include the model dimension, number of heads, and dropout. We also report the dropout rates and the activation functions used in the fully connected layers (with softmax in the output layer). These details allow the model to be fully reproduced.
During the training process, the input data are normalized to the range [0, 1] in the preprocessing stage to eliminate potential impacts from differences in feature scales during diagnosis, thereby improving the stability of model training. For label processing, fault category labels are converted to numerical labels and transformed into a format suitable for multi-class classification using one-hot encoding. The dataset is divided into a training set and a test set: the training set combines real data and GAN-generated data to enhance the model’s robustness, while the test set consists entirely of real data not used in training and is used to evaluate the model’s generalization ability objectively. The Adam optimizer was selected for the model, with a learning rate of 0.0001, and categorical cross-entropy was adopted as the loss function for the multi-class classification task. The model used a batch size of 32 and 200 training epochs. To prevent model overfitting, an early stopping strategy was employed, and the model weights at the optimal epoch were saved by monitoring the validation set accuracy. The overall architecture of the proposed MHA-CBL model is illustrated in Figure 2.
To ensure seamless integration of the generated synthetic samples into the fault classifier, a rigorous range-alignment procedure was implemented. Although the generator internally produces normalized outputs in the range [−1, 1] through the Tanh activation, the generated sequences are subsequently inverse-transformed back to the original physical scale using the dataset-level normalization statistics before being saved or evaluated. Therefore, generated and real samples are aligned through the downstream classifier-side preprocessing pipeline rather than through a fixed X′ = (X + 1)/2 mapping alone. This step ensures that the synthetic data are numerically aligned with real-world observations while preserving crucial morphological features.
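The two-stage alignment described here can be sketched as follows: generated samples are first inverse-standardized back to physical units with the dataset-level statistics, and both real and synthetic samples then pass through the same classifier-side [0, 1] scaling. The epsilon guard is an assumption of this sketch.

```python
import numpy as np

def generated_to_physical(gen, mean, std):
    """Inverse-standardize generator output back to physical units using the
    dataset-level statistics saved during preprocessing."""
    return gen * std + mean

def classifier_scale(x, lo, hi):
    """Classifier-side min-max scaling to [0, 1], applied identically to
    real and inverse-transformed synthetic samples."""
    return (x - lo) / (hi - lo + 1e-12)
```

Because both data streams share the same scaling statistics, the classifier never sees a systematic range mismatch between real and synthetic inputs.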
After training, multidimensional metrics are used to evaluate the model’s performance, including classification accuracy, precision, recall, and F1-score. Classification accuracy reflects the model’s overall performance on the test set; precision and recall measure the model’s prediction accuracy and coverage for each category, respectively; and the F1-score, a weighted average of the two, is suitable for evaluating classification performance on imbalanced datasets. In addition, a confusion matrix is used to visualize the distribution of predictions for each LRE fault category, intuitively showing the model’s classification biases across categories. This helps adjust the model parameters to enable high-precision fault detection even with a small number of real samples.