A Fault Diagnosis Method for the Train Communication Network Based on Active Learning and Stacked Consistent Autoencoder

Yang, Yueyi; Wang, Haiquan; Nie, Xiaobo; Wen, Shengjun; Li, Guolong

doi:10.3390/sym17101622

Open AccessArticle

A Fault Diagnosis Method for the Train Communication Network Based on Active Learning and Stacked Consistent Autoencoder

by

Yueyi Yang

^1,*

,

Haiquan Wang

¹,

Xiaobo Nie

²,

Shengjun Wen

¹ and

Guolong Li

¹

Zhongyuan Petersburg Aviation College, Zhongyuan University of Technology, Zhengzhou 471700, China

²

School of Electrical Engineering, Beijing Jiaotong University, Beijing 100044, China

^*

Author to whom correspondence should be addressed.

Symmetry 2025, 17(10), 1622; https://doi.org/10.3390/sym17101622

Submission received: 13 August 2025 / Revised: 11 September 2025 / Accepted: 19 September 2025 / Published: 1 October 2025

(This article belongs to the Special Issue Symmetry in Fault Detection and Diagnosis for Dynamic Systems)

Download

Browse Figures

Versions Notes

Abstract

As a critical component of rail travel, the train communication network (TCN) is an integrated central platform that is used to realize the train control, condition monitoring, and data transmission, whose failure will disrupt the symmetry of TCN topology and endanger the security of rail trains. To enhance the reliability of TCN, an intelligent fault diagnosis method is proposed based on active learning (AL) and a stacked consistent autoencoder (SCAE), which is capable of building a competitive classifier with a limited amount of labeled training samples. SCAE can learn better feature presentations from electrical multifunction vehicle bus (MVB) signals by reconstructing the same raw input data layer by layer in the unsupervised feature learning phase. In the supervised fine-tuning phase, a deep AL-based fault diagnosis framework is proposed, and a dynamic fusion AL method is presented. The most valuable unlabeled samples are selected for labeling and training by considering uncertainty and similarity simultaneously, and the fusion weight is dynamically adjusted at the different training stages. A TCN experimental platform is constructed, and experimental results show that the proposed method achieves better performance under three different metrics with fewer labeled samples compared to the state-of-the-art methods; it is also symmetrically valid in class-imbalanced data.

Keywords:

train communication network; multifunction vehicle bus; physical-layer signal; fault diagnosis; stacked consistent autoencoder; active learning

1. Introduction

The train communication network (TCN) is used for the transmission of train control and monitoring (TCMS) data, which meets security, robustness, and real-time requirements in railway vehicles [1]. The TCN has been developed as an international electrotechnical commission (IEC), where the multifunction vehicle bus (MVB) is used for the control of data transmission among electronic control units [2]. Due to harsh working conditions and the growing complexity of TCN, various factors could cause performance degradation of the MVB and disrupt the symmetry of TCN topology, which would be catastrophic for railway traffic. Therefore, advanced fault diagnosis methods for MVB are critical not only for reducing maintenance costs but also for improving network performance and reliability.

Most existing research focuses on the effects of network delays on the networked automation system [3]. However, these methods struggle to find the root causes of faults, especially for early faults. Many factors will affect the quality of the electrical signal, such as media degradation, network impedance mismatch, and grounding problems. Thus, the signals from the physical layer contain more network performance and failure information. Lei et al. [4] presented a network health management system that provides diagnostic information for DeviceNet; the features extracted from analog waveform communication signals were used to evaluate the network performance. Choi et al. [5] introduced a source identification method for an in-vehicle network; the features in both time and frequency domains extracted from the electrical CAN signal were employed to detect malicious electronic control units (ECUs). Yang et al. [6] proposed an anomaly detection method using electrical CAN signals for the braking control network. Twelve features, such as steady-state dominance, overshoot, and bit time, were utilized to train the ensemble learning model. Li et al. [7] introduced a health evaluation method for MVB based on the physical-layer waveform; the health condition of the MVB device was quantified using the distance between the tested sample and the trained hypersphere. Li et al. [8] extracted fault features from the electrical MVB signals and used a weighted support vector machine to diagnose MVB faults. Although existing methods have demonstrated impressive results, these methods require thorough analysis and a priori knowledge of the fault mechanism, and the hand-crafted features have limitations in terms of application.

In recent years, prognostic and health management (PHM) technology has emerged as a research hotspot in both academia and industry. The application of PHM to TCN is of great significance for achieving cost-effective and intelligent maintenance, as well as improving system reliability. In PHM systems, many deep learning models have been proposed for fault diagnosis and useful life prediction to achieve automatic feature extraction and obtain high accuracy [9,10,11]. Compared with supervised learning methods such as CNN and GCN, stacked autoencoder (SAE) has the ability to find discriminative and high-level representations of complex data in an unsupervised feature learning manner, which allows for more effective utilization of unlabeled samples and significantly reduces the difficulty of model training [12]. SAE-based fault diagnosis methods have two key phases: greedy layer-wise unsupervised pre-training and supervised fine-tuning of the whole network. The weights of SAE learned in the pre-training phase are utilized to initialize the whole network for fine-tuning, which is much more effective than random initialization [13]. In the pre-training phase, the hierarchical features can be extracted by multiple autoencoders from the original inputs to the top layer. Then, each AE is trained by minimizing the reconstruction error between the low-level feature at the hidden layer of the previous AE and the reconstructed data. Since the minimum reconstruction error of each AE is non-zero, there is information loss in each AE. Therefore, the information loss of SAE is accumulated little by little, and the extracted features may not be the best representation of the original input data [14]. In general, diagnostic performance is more significantly affected by the learned features.

To obtain a high-quality deep learning model, a large number of labeled training instances are required due to the growth of training parameters in deep neural networks. However, it is too costly and difficult for engineering experts to carry out manual labeling for the enormous number of electrical MVB signals in the communication process. Moreover, it is usually difficult to obtain sufficient fault signals in practice. Active learning (AL) offers an effective approach to achieving higher performance of the model while requiring only a few labeled training samples, and it significantly lowers the cost [15]. AL has been applied to many real-world problems, such as image classification [16,17,18], fault diagnosis [19,20], text classification [21], and system monitoring [22]. The sampling strategy for deep learning has attracted the attention of researchers [23,24,25,26]. In active learning, a small labeled training set is used to train the model in the first round; then, a sampling strategy is utilized to select the most informative unlabeled samples for labeling. The newly labeled samples are added to the labeled training set. In the next round, the parameters of the model are updated based on the new training set. This process is repeated until some preset performance criterion is reached or the unlabeled pool becomes empty. Chen et al. [27] proposed an active learning-based fault diagnosis method in self-organizing cellular networks; the most informative unlabeled samples were selected by uncertainty sampling. Wang et al. [28] selected two kinds of samples for fine-tuning of CNNs (convolutional neural networks) according to the output of the CNN model; one kind is the most uncertain samples, and the other kind is the high-confidence samples. Rahhal et al. [29] presented an active deep learning method for electrocardiogram (ECG) signals, where entropy and breaking ties were used to measure uncertainty. However, the improvement achieved using a single sampling strategy was limited. The uncertainty sampling strategy is prone to selecting outliers and may result in the problem of sampling bias [30]. Although the diversity sampling strategy solves the problem of sampling bias and redundant instance selection, this strategy may require more selected unlabeled samples to achieve the target decision boundary, which may cause slow convergence.

To overcome these issues, a fault diagnosis method for TCN is proposed based on active learning and SCAE. The main contributions of this work are summarized as follows:

A TCN fault diagnosis method is proposed based on active learning and SCAE. SCAE is employed to automatically learn discriminative features from electrical MVB signals in the unsupervised feature learning phase, which present better feature representations for raw input data and diagnostic performance than the original SAE.
A framework of deep active learning for TCN fault diagnosis is designed in the supervised fine-tuning phase, and a dynamic fusion AL strategy is proposed to enhance the performance of our diagnosis model with lower labeling costs. There is a trade-off between the uncertainty and similarity sampling, and the fusion weight is dynamically adjusted at the different training stages.
A fault diagnosis testbed was constructed, and a monitoring unit was added to the MVB network. Extensive comparison experiments with the state-of-the-art methods are performed, and the experimental results demonstrate that our proposed method achieves better performance with fewer labeled samples, symmetrically improving diagnosis accuracy and the efficiency of data labeling.

2. Background

The train communication network is mainly applied for transmitting key data such as control commands and status information. Compared with bus communication protocols like CAN, ARCNET, and WordFIP, the TCN protocol is most widely applied in the field of rail transit. According to the international standard IEC 61375-3-1 [2], the train communication network typically adopts a two-level bus network structure consisting of the wire train bus (WTB) and the multifunction vehicle bus (MVB). The MVB network is widely used in high-speed EMUs and other rail transit trains that do not require frequent re-marshalling. It connects various network node devices and control devices of different train subsystems, with diverse application functions. In practical engineering applications, faults in the train communication network often occur in the MVB network. Figure 1 shows the network topology of the metro train; the in-vehicle operations are typically controlled by different electronic control units, such as the vehicle control unit (VCU), electric drive control unit (EDCU), electric braking control unit (EBCU), remote input/output module (RIOM), and air conditioning unit (ACU), which are connected by an MVB. Generally, the MVB is developed in master–slave communication mode, and the two VCUs on a metro train are configured for hot standby redundancy, with one of them functioning as the network master device. MVB messages are used to transmit the train control and monitoring data.

The transmission media of the MVB include electrical short distance (ESD), electrical middle distance (EMD), and optical glass medium (OGF), and the baud rate of MVB is 1.5 Mbps. The EMD is the most commonly used in MVB; a twisted pair of two wires is used in a shielded cable, which can support up to 32 devices over a distance of 200 miles. The electrical MVB signal conforms to ISO/IEC 8482 (RS-485) [31] and is encoded using the non-return-to-zero (NRZ) bit of Manchester code. A high-to-low-level transition within 1 bit time is recognized as 1, while a low-to-high-level transition within 1 bit time is recognized as 0. A high level that persists for 1 bit time is identified as non-data “NH”, and a low level that persists for 1 bit time is identified as non-data “NL”. The data and non-data encoding are shown in Figure 2.

The MVB protocol defines the master frame and the slave frame, and Figure 3 shows the master data frame format and the slave data frame format. The master frame is made up of the master start delimiter, F_code, slave address, check sequence (CS), and end delimiter (ED). The master start delimiter consists of the sequence {start bit, ‘NH’, ‘NL’, ‘0’, ‘NH’, ‘NL’, ‘0’, ‘0’, ‘0’}. The slave frame consists of the slave start delimiter, different lengths of the frame data, the check sequence, and the end delimiter. As an interesting feature of the MVB protocol, the master start delimiter and slave start delimiter are fixed, which do not vary according to the frame data and can be used to identify the MVB master frame.

Figure 4 shows the different electrical MVB signals in a unit macro-period; various factors will result in the signal waveform deviating from the normal condition, which may cause various MVB faults. To diagnose the MVB faults, MVB signals corresponding to the bit string of the master start delimiter are collected, and the fault features are extracted from the measured signals.

Faults and interferences are directly related to major failure modes of the MVB networks, and these failure modes result in the network’s robustness and reliability. In accordance with domain experts, the majority of failure modes commonly occurring on MVB networks are listed in Table 1 [7,8]. The open fault mode and short fault mode are known as hard faults, which cause network breakdown or network devices to move offline. Terminating fault mode, transceiver fault mode, connector degradation, and cable degradation are considered soft faults, which may cause the reflection phenomenon and thus affect the quality of MVB signals and make the system vulnerable to external interferences. These faults are all persistent failures caused by the aging of components such as cables and connectors. The fault phenomena remain until the corresponding faulty parts are replaced.

3. Proposed Method

3.1. System Overview

Due to insufficient labeled signals in practice, the aim of our system is to construct a fault diagnosis model for MVB that achieves high diagnostic performance with a lower labeling cost. The framework of our proposed method is shown in Figure 5, and the description of the proposed method is presented in the following part.

In the unsupervised feature learning phase, all unlabeled data are used to train the SCAE in an unsupervised manner. After the pre-training of SCAE, a SCAE-based DNN model is constructed and initialized using the parameters of the SCAE. In the supervised fine-tuning stage, a dynamic fusion sampling (DFS) strategy is employed to select the most informative instances for expert labeling from the unlabeled set, and the SCAE-based DNN model is trained based on the labeled instances. In the DFS strategy, similarity is employed to reduce the information redundancy, and uncertainty is used to select the unlabeled instances to speed up the convergence of the model. The fusion weight is dynamically adjusted at the different training stages. Subsequently, the SCAE-based DNN model is updated, and this process repeats until the performance requirements are satisfied or the unlabeled dataset is empty. It is worth noting that the model outputs in the current round and the previous round are fused by the max rule to avoid overfitting problems.

3.2. Unsupervised Feature Learning Using SCAE

CAE is also an AE that comprises the input, hidden, and output layers. The structure of a CAE is shown in Figure 6. From the dataset X = {x(1), x(2), … x(i), … x(N)}, x(i) ∈ RS, N is the number of data samples, and S is the number of features in each sample. {W, b} and {

\tilde{W}

,

\tilde{b}

} represent the parameters at the hidden layer and output layer. For CAE, the input vector is the original input data x(i) or the features extracted at a certain hidden layer, and the output layer of CAE is isomorphic with the original input data x(i). The output of CAE is close to or refactoring x(i). We can assume the variable vectors at the input and hidden layer are z(i) and h(i), respectively. The reconstructed original data

\tilde{x}

can be obtained as follows:

\tilde{x} = φ_{θ} (h (i)) = \tilde{f} (f (z (i))

(1)

where θ_CAE = {W, b,

\tilde{W}

,

\tilde{b}

}, f and

\tilde{f}

are the corresponding activation functions. The parameters of CAE are optimized by minimizing the reconstruction error between the original data and the output.

J_{C A E} (W, \tilde{W}, b, \tilde{b}) = \frac{1}{2 N} \sum_{i = 1}^{N} J (z (i), x (i), W, \tilde{W}, b, \tilde{b}) = \frac{1}{2 N} \sum_{i = 1}^{N} || φ_{θ} (z (i) - x (i) {||}^{2}

(2)

Multiple CAEs can be hierarchically stacked to build SCAE neural networks; the structure of SCAE is shown in Figure 7. We can assume there are a total of k CAE models that are denoted {CAE 1, CAE2, … CAE k}, and the detailed pre-training procedure for the deep SCAE is shown in Figure 7, the black dashed line indicates that the hidden layer of the previous CAE serves as the input layer of the next CAE, while the red dashed line indicates that during model training, the output layers of all CAEs are combined with the input layer of the first CAE to calculate the loss function for each CAE. The input feature vector of CAE 1 is the original input data x(i), and the output is the reconstructed raw

\tilde{x}

₁(i). The hidden layer feature h₁(i) of CAE 1 is extracted and used as the input vector of CAE 2. The parameters of CAE 1 are optimized by minimizing the reconstruction error between the original input data x(i) and the reconstructed data

\tilde{x}

₁(i).

J_{1} (W_{1}, {\tilde{W}}_{1}, b_{1}, {\tilde{b}}_{1}) = \frac{1}{2 N} \sum_{i = 1}^{N} || {\tilde{x}}_{1} (i) - x (i) {||}^{2}

(3)

Similarly, the CAE k − 1 has already been constructed and pre-trained, and the hidden layer features h_k₋₁(i) are fed to the CAE k. The hidden layer features h_k(i) and the reconstructed data

\tilde{x}

_k(i) can thus be obtained.

h_{k} (i) = f (W_{k} h_{k - 1} (i) + b_{k + 1})

(4)

{\tilde{x}}_{k} (i) = \tilde{f} ({\tilde{W}}_{k} h_{k - 1} (i) + {\tilde{b}}_{k + 1})

(5)

Then, the CAE k is trained by minimizing the reconstruction error between the original input data x(i) and the output

\tilde{x}

_k(i).

J_{k} (W_{k}, {\tilde{W}}_{k}, b_{k}, {\tilde{b}}_{k}) = \frac{1}{2 N} \sum_{i = 1}^{N} || {\tilde{x}}_{k} (i) - x (i) {||}^{2}

(6)

As can be seen, the SCAE not only has a similar merit to SAE in that it is capable of learning the features from the concept low levels; the information loss is also kept to a minimum in each layer. Hence, it is more effective to carry out pattern recognition using the deep features of SCAE.

3.3. Supervised Fine-Tuning with AL

After the feature learning of SCAE, the hidden representation layers of SCAE are obtained. The SCAE-based DNN is built by adding a Softmax classifier on top of the hidden representation layer to perform multiclass classification. Thus, the fine-tuned DNN architecture is optimized using the backpropagation algorithm by minimizing the following cost function.

L (θ_{D N N}) = - \frac{1}{n} \sum_{j = 1}^{n} \sum_{k = 1}^{K} 1 (y_{i} = k) \log (\frac{\exp (p (x_{i}))}{\sum_{k = 1}^{K} \exp (p (x_{i}))}) + \frac{γ}{2} {‖W_{s o f t \max}‖}^{2}

(7)

where I(·) denotes an indicator function, the first term refers to the cross-entropy loss for the Softmax layer, the second term is the weight decay penalty, and θ_DNN represents the parameters of DNN.

To improve the diagnosis performance with fewer labeled training samples, the active learning algorithm is applied to the SCAE-based DNN for further fine-tuning. Uncertain sampling is one of the most commonly used frameworks for AL due to its low computational cost and high efficiency. Given a labeled training set D_L, the unlabeled training set D_U, and a DNN model M, the more the M is uncertain in classifying the sample x(i), the more informative that sample will be, and the more chances it has to be selected to accelerate the convergence of the DNN model. Generally, the output value of the Softmax layer is used to measure the uncertainty of the prediction by the DNN model. Common methods of measuring the level of uncertainty are entropy, margin sampling, and least confidence.

(1): Entropy (EN): Entropy is often considered an indicator of uncertainty that uses all class label probabilities.

$E N_{i} = - \sum_{j = 1}^{C} p_{θ} (y_{i} | x (i) \log p_{θ} (y_{i} | x (i))$

(8)

where C is the number of fault causes, x(i) ∈ D_U, y_i is a fault class, and θ represents the parameters in the DNN model.
(2): Least Confidence (LC): The probability of the first popular class for an instance is called the confidence; a low confidence means a high uncertainty of the model for this instance. The least confidence is used as an uncertainty measure.

$L C_{i} = \arg \max (1 - p_{θ} (y^{*} | x (i))$

(9)

where y* = argmax(p_θ (y|x(i))) is the most probable fault class of x(i), and p_θ(y*|x(i)) is the confidence coefficient of x(i).
(3): Least Margin (LM): The margin is the difference between the values of the highest posterior probability and the second-highest posterior probability, and the instance that has the smallest margin is selected.

$L M_{i} = \arg \min (p_{θ} ({y_{1}}^{*} | x (i)) - p_{θ} ({y_{2}}^{*} | x (i))$

(10)

where y₁* and y₂* are the first and second most probable class labels classified by the model.

The diversity criterion is used to select unlabeled instances that are diverse from each other, which aims at reducing the redundancy among the selected samples. The similarity is used to ensure the diversity of selected samples, which indicates x(i) is different from the other samples in DL. The similarity function is defined as [32]

f_{s i m} (x (i) / D_{L}) = \arg \max_{x (j) \in D_{L}} \exp (- \frac{| | x (i) - x (j) | |^{2}}{2 δ^{2}})

(11)

where δ is the Gaussian kernel parameter.

The typical measure of uncertainty is entropy, which measures the purity of the sample. However, it is very prone to being affected by trivial labels. The least confident strategy considers only the best-predicted class label and omits the information for the other labels. Therefore, it may result in a wrong instance selection in the multiclassification problem. As a compromise, the least margin strategy overcomes the above problems, so it is suitable for our method.

In general, none of the strategies discussed can make a perfect selection of unlabeled instances. If the instances are selected using uncertainty sampling, then there may be a chance in the reduction of its diversity [33]. Therefore, the combination of the LM and similarity is proposed in the literature to enhance the performance of AL; it is a trade-off between these two measures.

f_{t o t a l} (x (i)) = \arg \min_{x (i) \in D_{s}} \{a f_{sim} (x (i) | D_{L}) + (1 - a) f_{l m} (x (i) | M)\}

(12)

where a ∈ [0, 1] is a weight to balance the uncertainty and similarity, and f_lm(x(i)/M) denotes the uncertainty of unlabeled instance x(i) under the model M.

In our method, the weight a of the DFS strategy is dynamically adjusted to adapt to the different training requirements. At the beginning of training, due to the scarcity of labeled data, the SCAE-based DNN model has poor performance, and posterior probabilities corresponding to the model outputs are not trustworthy. Thus, the sampling strategy should mainly rely on the similarity indicator in the initial stage, and the weight a is set as a large value. As the training process goes on, the classification capability of the model improves, and posterior probabilities corresponding to the model outputs also become credible. The sampling strategy should mainly rely on the LM indicator to select highly informative samples for training, and the value of a is reduced. Consequently, the dynamic weight is adjusted mainly according to the classification accuracy of the model at different training stages, which can be expressed as

a = a_{0} e^{- A C C t}

(13)

where a₀ is the initial value, and ACCt is the classification accuracy in the tth round. The implementation of our proposed method is described in Algorithm 1.

Algorithm 1: Dynamic fusion active deep learning algorithm

  Input:
   Initially labeled samples D⁰_L, unlabeled samples D⁰_U, sample selection size N, maximum iteration number T, initial value a₀
  Output:
   The DNN model M^T
  1. Compute the parameters θ_SDAE of SCAE by minimizing the Equation (6) using unlabeled samples D⁰_U.
  2. Use θ_SCAE to initialize model M⁰, and compute θ_DNN by minimizing the cross-entropy using initially label samples D⁰_L.
  3. Classify all unlabeled samples in D⁰_U using the M¹ and obtain the posterior probabilities of all unlabeled samples in D⁰_U: P⁰_U.
  4. for t = 0 to T do
  5.   for each sample x_i in D^t_U do
  6.   Calculate the lowest difference between the two highest values according to Equation (10) in P^t_U.
  7.   Calculate the similarity measure according to Equation (11).
  8.   Calculate the informativeness value for x_i according to Equation (12).
  9.   end
10.   Add the top N unlabeled samples with the largest informativeness values to Ds for manual labeling.
11.   D^t⁺¹_U = D^t_U − Ds, D^t⁺¹_L = D^t_L + Ds, Ds = {}.
12.   Train the model M^t⁺¹ based on D^t⁺¹_L, and obtain the posterior probabilities of all unlabeled samples in D^t⁺¹_U: P^t⁺¹_U.
13.   Update the weight a according to Equation (13).
14     Update the P^t⁺¹_U = max(P^t_U, P^t⁺¹_U).
15. end

4. Experimental Results and Discussion

4.1. Experiment Setup

To verify the effectiveness of our method for MVB fault diagnosis, an MVB testbed is constructed to simulate the MVB of the metro train, and the constructed testbed is shown in Figure 8. As an interesting feature of the MVB protocol, the master start delimiter is fixed, which does not vary according to the frame data. Therefore, an MVB monitoring unit with a high-speed analog sampling circuit is developed to acquire the electrical MVB signal of the master start delimiter under different conditions, and the sampling rate of the MVB monitoring unit is 100 Msa/s. An MVB fault injection device is designed, in which the relay and analog switch are controlled by the field programmable gate array (FPGA) to simulate different fault modes. The MVB network’s typical fault injection experiment is shown in Figure 9, and the details of these injection faults are provided in Table 1; they are all persistent failures.

To simulate the class imbalance problem existing in actual applications, we use 5000 normal instances and 1000 fault instances from each fault condition. The dataset is set to be imbalanced because the difficulty of collecting fault instances in practice should be considered. The 600 sampling points corresponding to the physical signal segments at the starting boundary of the main node are selected as the model input, forming an MVB network fault diagnosis dataset of size 11,000 × 600. As the dataset is relatively small, we follow the experimental settings adopted in several related active learning studies [27,28], where the validation set is omitted in order to maximize the number of training samples. The dataset is randomly divided, with 70% of the total samples used for model training and the rest for testing; a 5-fold cross-validation strategy is employed on the training set for hyper-parameter tuning. We consider the worst case in which all the training samples are unlabeled. All training samples are used for the unsupervised pre-training.

4.2. Parameter Selection and Evaluation Metrics

In the proposed SCAE-based DNN framework, the SCAE contains three hidden layers, and their hidden nodes are set as 50, 200, and 400, respectively. Considering the MVB signals represent seven different working conditions, the number of neurons in the input layer of the first CAE and the total number of neurons in the output layers of all CAE are all the same as the number of physical signal sampling points of the main starting delimiter of the MVB, so the network structure for DNN is set as [600, 400, 200, 50, 7]. The Adam optimizer is employed with a learning rate of 0.01. The hidden nodes of the SCAE and DNN are the same, such that the features learned by the SCAE can be used by the DNN and fine-tuned further during supervised training.

We employ Keras to implement the SCAE-based DNN model, where the mini-batch gradient descent method is used, and Adam is applied to accelerate the network convergence. All the experiments are conducted on a common desktop PC with an Intel i5 2.8 GHz dual-core processor and 16 GB of RAM. Since the number of samples for each type of fault is 1000, we select 10% of the samples for each fault as the initial training samples and selected samples on each round, which are set to 100. This setting ensures the quality of model training while minimizing the number of manual labels for each sample. At the beginning of training, the DNN model exhibits a low performance; its predicted probability is not trustworthy, and therefore the initial weight value a₀ is set to 0.9.

The fault types considered in this study are mainly persistent faults, in which the fault phenomena continuously occur. The main objective of this work is to obtain a high-quality fault diagnosis model while minimizing manual labeling costs in engineering applications. Therefore, three different indicators are used to measure the performance of the proposed fault diagnosis method. Accuracy is a commonly used metric for evaluating the quality of an algorithm in fault diagnosis; it refers to the ratio of the number of instances diagnosed correctly to the total number of instances in the test set. The diagnosis error rate (DER) represents the ratio of fault instances diagnosed as fault instances but misdiagnosed to the total number of fault instances in the test set. The undetected error rate (UER) denotes the ratio of the fault cases diagnosed as normal instances to the total number of fault instances in the test set.

4.3. Performance Evaluation

To demonstrate that the proposed framework can improve classification performance with fewer labeled instances, we compare it with three uncertainty sampling methods, as mentioned before, and two baseline methods (ALL and RA).

ALL: All the training instances are manually labeled and used to train the SCAE-based DNN. This method can be considered the upper bound (the best performance that our model can reach with all labeled training samples)
Random (RA): During the training process, some instances are randomly selected from the training set to be annotated to fine-tune our model. This method can be regarded as the lower bound.

As illustrated in Figure 10, all the AL algorithms outperform the RA method. This demonstrates the effectiveness of the application of AL algorithms to deep learning. Our DFS strategy achieves the best performance among the other AL algorithms, especially in the early rounds. This is because our method combines similarity and uncertainty criteria to select more informative training samples. To achieve the same accuracy, the number of labeled samples required in our proposed method is much less than the RA method. For example, to achieve an accuracy of 90%, the number of labeled samples required in our proposed method is 600, and that number in the RA method is 2800. The performance of the LM method is marginally better than that of the LC and EN methods, which indicates that the LM method measures uncertainty more accurately.

Figure 11 illustrates the class distributions of the labeled training set as the phase progresses. The random sampling method tends to select the normal samples, and the class distribution of the labeled training set is unchanged during the training process. Therefore, the labeled training set used to train the classifier is class-imbalanced, which significantly degrades the performance. Our method selects minority class samples more so that the proportion of minority classes increases with progress, meaning it can exhibit a relatively balanced class distribution. All of these elements indicate that our method is suitable for fault diagnosis of MVB with fewer labeled training samples.

We also compare the performance of the proposed method with an active deep learning method in the literature [30]; the SDAE has the same structure as a multi-layer network and SCAE, but the parameters are layer-wise pre-trained with traditional AE. The comparison results are shown in Figure 12. The accuracy of our model with ALL (99.24%) is better than the SDAE-based DNN with ALL (97.8%), and the DER and UER are lower. Furthermore, the performance of our approach is better than the SDAE-based DNN with AL algorithms. This is because the SCAE can more effectively capture the intrinsic features from raw input data with smaller information loss, and the DFS strategy can choose more informative training samples to improve classification performance.

To evaluate the effectiveness of our dynamic weight adjustment, our DFS strategy is compared with the fixed weight strategy. The fixed weight strategy adopts our proposed method albeit with a fixed weight. We compare two types of settings: FIX1 (a = 0.5), FIX2 (a = 0.1).

The comparison results after different rounds are summarized in Table 2. Our method outperforms two fixed-weight methods; this indicates that dynamic weight adjustment is more effective for selecting informative samples as compared to fixed-weight settings. Further exploration of the influence of weight a reveals that the performance of the FIX1 method is better than the FIX2 method, especially after five rounds. Due to the small weight value, the performance of the FIX2 method is only slightly better than the LM method. This validates our assumption that the trained DNN model is not trustworthy in the early rounds; the setting of large a in the initial rounds can improve the performance of our method.

5. Conclusions

A fault diagnosis method for TCN based on active learning and SCAE is proposed in this work. Compared with the original SAE, the SCAE can learn features directly from the MVB signals and describe the data structure of the raw inputs much better. Through the diagnosis results of the experiments, the SCAE-based CNN model can effectively identify fault states with high diagnostic performance. Moreover, a dynamic fusion active learning strategy is presented to reduce the cost of manual labeling, which is capable of improving diagnostic performance through adaptively adjusting weight between uncertainty and similarity at different training stages. The experimental results demonstrate that our proposed method outperforms state-of-the-art methods and is symmetrically valid in class-imbalanced data. Since our system just requires the installation of a monitoring unit to collect physical-layer electronic signals in the MVB network, our method can be directly employed in the current system.

Nevertheless, the experimental data were collected under relatively stable operating conditions, which may limit the method’s applicability under strong environmental interference. In addition, the proposed approach has not yet addressed the challenge of identifying previously unseen fault types, as such faults would currently be classified into existing categories.

In future work, we would like to develop a multi-criteria active learning algorithm that suits the fault diagnosis of TCN to achieve a better trade-off between diagnosis accuracy and labeling costs. At the same time, we will also study model compression algorithms to reduce the number of model parameters and further enhance real-time diagnostic performance.

Author Contributions

Conceptualization, Y.Y. and X.N.; methodology, Y.Y.; software, H.W.; validation, Y.Y. and G.L.; formal analysis, H.W. and G.L.; data curation, Y.Y. and S.W.; writing—original draft preparation, Y.Y.; supervision, S.W.; project administration, X.N.; funding acquisition, H.W. and S.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the Key Research and Development Project of Henan Province, grant number: (251111211600); the Natural Science Foundation of Henan Province, grant number: (252300420397); the Henan Province Science and Technology R&D projects, grant number: (242102320215); the Joint Fund Project of the National Natural Science Foundation of China, grant number: (U2368201); the High-end Foreign Expert Program of Henan Province, grant number: (HNGD2024032); and the Subject Strength Enhancement Plan Project of Zhongyuan University of Technology, grant number: (GG202412).

Data Availability Statement

The datasets generated and/or analyzed during the current study are not publicly available at this time as they have not yet been organized for release, but they will be made available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Luedicke, D.; Lehner, A. Train Communication Networks and Prospects. IEEE Commun. Mag. 2019, 57, 39–43. [Google Scholar] [CrossRef]
IEC 61375-1; Electronic Railway Equipment—Train Communication Network (TCN)—Part 1: General Architecture. IEC: Geneva, Switzerland, 2012.
Gao, D.H.; Wang, Q.F. Health monitoring of controller area network in hybrid excavator based on the message response time. In Proceedings of the IEEE/ASME International Conference on AIM, Besacon, France, 8–11 July 2014. [Google Scholar]
Lei, Y.; Djurdjanovic, D.; Barajas, L.; Workman, G.; Biller, S.; Ni, J. DeviceNet network health monitoring using physical layer parameters. J. Intell. Manuf. 2011, 22, 289–299. [Google Scholar] [CrossRef]
Choi, W.; Jo, H.J.; Woo, S.; Chun, J.Y.; Park, J.; Lee, D.H. Identifying ECUs using inimitable characteristics of signals in controller area networks. IEEE Trans. Veh. Technol. 2018, 67, 4757–4770. [Google Scholar] [CrossRef]
Yang, Y.Y.; Wang, L.D.; Li, Z.Z.; Shen, P.; Guan, X.; Xia, W. Anomaly Detection for Controller Area Network in Braking Control System With Dynamic Ensemble Selection. IEEE Access 2019, 7, 95418–95429. [Google Scholar] [CrossRef]
Li, Z.Z.; Wang, L.D.; Yang, Y.Y.; Du, X.M.; Song, H. Health Evaluation of MVB Based on SVDD and Sample Reduction. IEEE Access 2019, 7, 35330–35343. [Google Scholar] [CrossRef]
Li, Z.Z.; Wang, L.D.; Yang, Y.Y. Fault diagnosis of the train communication network based on weighted support vector machine. IEEJ Trans. Electr. Electron. Eng. 2020, 15, 1077–1088. [Google Scholar] [CrossRef]
He, D.Q.; Zhao, J.Y.; Jin, Z.Z.; Huang, C.; Yi, C.; Wu, J. DCAGGCN: A novel method for remaining useful life prediction of bearings. Reliab. Eng. Syst. Saf. 2025, 260, 110978. [Google Scholar] [CrossRef]
Zhao, J.Y.; He, D.Q.; Jin, Z.Z.; Zhang, X.W.; Zhou, J.X. A new method for bearing remaining useful life prediction based on dynamic wavelet and physical information constraints. Expert. Syst. Appl. 2025, 296, 129023. [Google Scholar] [CrossRef]
Yu, J.; Xu, Y.G.; Liu, K. Planetary gear fault diagnosis using stacked denoising autoencoder and gated recurrent unit neural network under noisy environment and time-varying rotational speed conditions. Meas. Sci. Technol. 2019, 30, 095003. [Google Scholar] [CrossRef]
Yu, H.; Wang, K.; Li, Y.; Zhao, W. Representation Learning With Class Level Autoencoder for Intelligent Fault Diagnosis. IEEE Signal Process. Lett. 2019, 26, 1476–1480. [Google Scholar] [CrossRef]
Sun, W.J.; Shao, S.Y.; Zhao, R.; Yan, R.; Zhang, X.; Chen, X. A sparse auto-encoder-based deep neural network approach for induction motor faults classification. Measurement 2016, 89, 171–178. [Google Scholar] [CrossRef]
Yuan, X.F.; Wang, Y.L.; Yang, C.H.; Gui, W.H. Stacked isomorphic autoencoder based soft analyzer and its application to sulfur recovery unit. Inf. Sci. 2020, 534, 72–84. [Google Scholar] [CrossRef]
Kumar, P.; Gupta, A. Active Learning Query Strategies for Classification, Regression, and Clustering: A Survey. J. Comput. Sci. Technol. 2020, 35, 913–945. [Google Scholar] [CrossRef]
Cao, X.Y.; Yao, J.; Xu, Z.B.; Meng, D.Y. Hyperspectral Image Classification With Convolutional Neural Network and Active Learning. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4604–4616. [Google Scholar] [CrossRef]
Bi, H.X.; Xu, F.; Wei, Z.Q.; Xue, Y.; Xu, Z.B. An Active Deep Learning Approach for Minimally Supervised PolSAR Image Classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 9378–9395. [Google Scholar] [CrossRef]
Xu, J.; He, Z.Q.; Wang, Y.Y.; Guo, Y. Strategically annotating rare disease samples to improve semi-supervised learning in long-tailed medical image classification. Biomed. Signal Process. Control. 2025, 110, 108258. [Google Scholar] [CrossRef]
Lu, J.T.; Wu, W.; Huang, X.; Yin, Q.T.; Yang, K.Z.; Li, S. A modified active learning intelligent fault diagnosis method for rolling bearings with unbalanced samples. Adv. Eng. Inform. 2024, 60, 102397. [Google Scholar] [CrossRef]
Fan, C.; Wu, Q.T.; Zhao, Y.; Mo, L.K. Integrating active learning and semi-supervised learning for improved data-driven HVAC fault diagnosis performance. Appl. Energy 2024, 356, 122356. [Google Scholar] [CrossRef]
Zhang, A.M.; Li, B.H.; Wang, W.H.; Wan, S.; Chen, W.T. MII: A Novel Text Classification Model Combining Deep Active Learning with BERT. CMC-Comput. Mater. Contin. 2020, 63, 1499–1514. [Google Scholar] [CrossRef]
Zhao, X.; Li, M.; Xu, J.; Song, G. An effective procedure exploiting unlabeled data to build monitoring system. Expert. Syst. Appl. 2011, 38, 10199–10204. [Google Scholar] [CrossRef]
Gao, F.; Yue, Z.Y.; Wang, J.; Sun, J.; Yang, E.; Zhou, H. A novel active semisupervised convolutional neural network algorithm for SAR image recognition. Comput. Intell. Neurosci. 2017, 2017, 3105053. [Google Scholar] [CrossRef]
Sener, O.; Savarese, S. Active Learning for Convolutional Neural Networks: A Core-Set Approach. arXiv 2018, arXiv:1708.00489. [Google Scholar] [CrossRef]
Xiang, Z.L.; Chen, J.H.; Bao, Y.Q.; Li, H. An active learning method combining deep neural network and weighted sampling for structural reliability analysis. Mech. Syst. Signal Proc. 2020, 140, 16. [Google Scholar] [CrossRef]
Liu, F.; Zhang, T.; Zheng, C.; Zheng, C.X.; Cheng, Y.Y.; Liu, X.L.; Qi, M.; Kong, J.; Wang, J.Z. An Intelligent Multi-View Active Learning Method Based on a Double-Branch Network. Entropy 2020, 22, 901. [Google Scholar] [CrossRef]
Chen, M.; Zhu, K.; Wang, R.; Niyato, D. Active Learning-Based Fault Diagnosis in Self-Organizing Cellular Networks. IEEE Commun. Lett. 2020, 24, 1734–1737. [Google Scholar] [CrossRef]
Wang, K.Z.; Zhang, D.Y.; Li, Y.; Zhang, R.M.; Lin, L. Cost-Effective Active Learning for Deep Image Classification. IEEE Trans. Circuits Syst. Video Technol. 2017, 27, 2591–2600. [Google Scholar] [CrossRef]
Al Rahhal, M.M.; Bazi, Y.; AlHichri, H.; Alajlan, N.; Melgani, F.; Yager, R.R. Deep learning approach for active classification of electrocardiogram signals. Inf. Sci. 2016, 345, 340–354. [Google Scholar] [CrossRef]
Zhang, Y.Q.; Cao, G.; Li, X.S.; Wang, B.S.; Fu, P. Active Semi-Supervised Random Forest for Hyperspectral Image Classification. Remote Sens. 2019, 11, 21. [Google Scholar] [CrossRef]
ISO/IEC Standard 8482; Information Technology—Telecommunications and Information Exchange Between Systems—Twisted pair Multipoint Interconnections. ISO: Geneva, Switzerland, 1993.
Matiz, S.; Barner, K.E. Inductive conformal predictor for convolutional neural networks: Applications to active learning for image classification. Pattern Recognit. 2019, 90, 172–182. [Google Scholar] [CrossRef]
Yuan, J.; Hou, X.X.; Xiao, Y.Q.; Cao, D.; Guan, W.L.; Nie, L.Q. Multi-criteria active deep learning for image classification. Knowl.-Based Syst. 2019, 172, 86–94. [Google Scholar] [CrossRef]

Figure 1. The TCN topology of the metro train.

Figure 2. Data and non-data encoding.

Figure 3. MVB frame format. (a) Master frame format. (b) Slave frame format.

Figure 4. Electrical MVB signals in a unit macro-period.

Figure 5. Framework of the proposed method.

Figure 6. Network structure of CAE.

Figure 7. Network structure of SCAE.

Figure 8. Fault diagnosis testbed of TCN.

Figure 9. Schematic diagram of the MVB network’s typical fault injection experiment.

Figure 10. Performance comparison between different sampling strategies. (a) Accuracy after each round. (b) DER after each round. (c) UER after each round.

Figure 11. Class distributions of the labeled training set as the phase progresses. (a) Random selection. (b) The proposed method.

Figure 12. Performance comparison with different diagnosis methods. (a) Accuracy after each round. (b) DER after each round. (c) UER after each round.

Table 1. MVB condition descriptions.

Fault ID	State Definition	Description
F0	Normal communication	Conforms to the MVB protocol
F1	Open fault	Some nodes are disconnected
F2	Terminating fault	Missing a terminating resistor
F3	Transceiver fault	The component values’ deviations
F4	Connector degradation	Erosion or wear of the connector may change contact resistance
F5	Cable degradation	The electronic properties of the cable are changed due to the vibration and improper mounting
F6	Short fault	Two wires of MVB are short-connected

Table 2. Performance using different active learning techniques after different rounds.

Method	No. Rounds 5			No. Rounds 10
Method	ACC	DER	UER	ACC	DER	UER
SCAE + LM	84.09	30.88	10.66	91.77	13.31	1.67
SCAE + FIX1	87.32	16.18	7.62	92.23	11.49	0.27
SCAE + FIX2	85.47	26.9	10.21	92.06	13.68	0.35
SCAE + RA	63.16	51.04	15.28	75.21	31.15	10.65
Our method	89.43	11.78	4.98	95.25	6.33	0.21

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Yang, Y.; Wang, H.; Nie, X.; Wen, S.; Li, G. A Fault Diagnosis Method for the Train Communication Network Based on Active Learning and Stacked Consistent Autoencoder. Symmetry 2025, 17, 1622. https://doi.org/10.3390/sym17101622

AMA Style

Yang Y, Wang H, Nie X, Wen S, Li G. A Fault Diagnosis Method for the Train Communication Network Based on Active Learning and Stacked Consistent Autoencoder. Symmetry. 2025; 17(10):1622. https://doi.org/10.3390/sym17101622

Chicago/Turabian Style

Yang, Yueyi, Haiquan Wang, Xiaobo Nie, Shengjun Wen, and Guolong Li. 2025. "A Fault Diagnosis Method for the Train Communication Network Based on Active Learning and Stacked Consistent Autoencoder" Symmetry 17, no. 10: 1622. https://doi.org/10.3390/sym17101622

APA Style

Yang, Y., Wang, H., Nie, X., Wen, S., & Li, G. (2025). A Fault Diagnosis Method for the Train Communication Network Based on Active Learning and Stacked Consistent Autoencoder. Symmetry, 17(10), 1622. https://doi.org/10.3390/sym17101622

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Fault Diagnosis Method for the Train Communication Network Based on Active Learning and Stacked Consistent Autoencoder

Abstract

1. Introduction

2. Background

3. Proposed Method

3.1. System Overview

3.2. Unsupervised Feature Learning Using SCAE

3.3. Supervised Fine-Tuning with AL

4. Experimental Results and Discussion

4.1. Experiment Setup

4.2. Parameter Selection and Evaluation Metrics

4.3. Performance Evaluation

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI