1. Introduction
The train communication network (TCN) is used for the transmission of train control and monitoring system (TCMS) data, which must meet the security, robustness, and real-time requirements of railway vehicles [1]. The TCN was developed as an International Electrotechnical Commission (IEC) standard, in which the multifunction vehicle bus (MVB) is used to control data transmission among electronic control units [2].
2]. Due to harsh working conditions and the growing complexity of TCN, various factors could cause performance degradation of the MVB and disrupt the symmetry of TCN topology, which would be catastrophic for railway traffic. Therefore, advanced fault diagnosis methods for MVB are critical not only for reducing maintenance costs but also for improving network performance and reliability.
Most existing research focuses on the effects of network delays on networked automation systems [3]. However, these methods struggle to find the root causes of faults, especially early faults. Many factors affect the quality of the electrical signal, such as media degradation, network impedance mismatch, and grounding problems; thus, the signals from the physical layer carry richer information about network performance and failures. Lei et al. [4] presented a network health management system that provides diagnostic information for DeviceNet; features extracted from analog communication waveforms were used to evaluate network performance. Choi et al. [5] introduced a source identification method for an in-vehicle network; features extracted from the electrical CAN signal in both the time and frequency domains were employed to detect malicious electronic control units (ECUs). Yang et al. [6] proposed an anomaly detection method using electrical CAN signals for the braking control network; twelve features, such as steady-state dominance, overshoot, and bit time, were utilized to train an ensemble learning model. Li et al. [7] introduced a health evaluation method for the MVB based on the physical-layer waveform; the health condition of an MVB device was quantified by the distance between the tested sample and a trained hypersphere. Li et al. [8] extracted fault features from electrical MVB signals and used a weighted support vector machine to diagnose MVB faults. Although existing methods have demonstrated impressive results, they require thorough analysis and a priori knowledge of the fault mechanism, and their hand-crafted features limit the scope of application.
In recent years, prognostics and health management (PHM) technology has emerged as a research hotspot in both academia and industry. The application of PHM to the TCN is of great significance for achieving cost-effective, intelligent maintenance and improving system reliability. In PHM systems, many deep learning models have been proposed for fault diagnosis and useful life prediction to achieve automatic feature extraction and high accuracy [9,10,11]. Compared with supervised learning methods such as CNNs and GCNs, the stacked autoencoder (SAE) can find discriminative, high-level representations of complex data in an unsupervised feature learning manner, which allows more effective utilization of unlabeled samples and significantly reduces the difficulty of model training [12]. SAE-based fault diagnosis methods have two key phases: greedy layer-wise unsupervised pre-training and supervised fine-tuning of the whole network. The weights of the SAE learned in the pre-training phase are used to initialize the whole network for fine-tuning, which is much more effective than random initialization [13]. In the pre-training phase, hierarchical features are extracted by multiple autoencoders from the original inputs up to the top layer; each AE is trained by minimizing the reconstruction error between its input, i.e., the low-level feature at the hidden layer of the previous AE, and its reconstructed output. Since the minimum reconstruction error of each AE is non-zero, there is information loss in each AE; the information loss of the SAE therefore accumulates layer by layer, and the extracted features may not be the best representation of the original input data [14]. In general, diagnostic performance depends heavily on the quality of the learned features.
To obtain a high-quality deep learning model, a large number of labeled training instances is required due to the growth of trainable parameters in deep neural networks. However, it is too costly and difficult for engineering experts to manually label the enormous number of electrical MVB signals produced during communication, and it is usually difficult to obtain sufficient fault signals in practice. Active learning (AL) offers an effective approach to achieving higher model performance while requiring only a few labeled training samples, which significantly lowers the labeling cost [15]. AL has been applied to many real-world problems, such as image classification [16,17,18], fault diagnosis [19,20], text classification [21], and system monitoring [22]. Sampling strategies for deep learning have also attracted the attention of researchers [23,24,25,26]. In active learning, a small labeled training set is used to train the model in the first round; a sampling strategy then selects the most informative unlabeled samples for labeling, and the newly labeled samples are added to the labeled training set. In the next round, the parameters of the model are updated on the new training set. This process is repeated until a preset performance criterion is reached or the unlabeled pool becomes empty. Chen et al. [27] proposed an active learning-based fault diagnosis method for self-organizing cellular networks, in which the most informative unlabeled samples were selected by uncertainty sampling. Wang et al. [28] selected two kinds of samples for fine-tuning convolutional neural networks (CNNs) according to the model output: the most uncertain samples and the high-confidence samples. Rahhal et al. [29] presented an active deep learning method for electrocardiogram (ECG) signals, where entropy and breaking ties were used to measure uncertainty. However, the improvement achieved using a single sampling strategy is limited. The uncertainty sampling strategy is prone to selecting outliers and may suffer from sampling bias [30]. Although the diversity sampling strategy avoids sampling bias and redundant instance selection, it may require more selected unlabeled samples to reach the target decision boundary, which can cause slow convergence.
To overcome these issues, a fault diagnosis method for TCN is proposed based on active learning and SCAE. The main contributions of this work are summarized as follows:
A TCN fault diagnosis method is proposed based on active learning and SCAE. The SCAE is employed to automatically learn discriminative features from electrical MVB signals in the unsupervised feature learning phase, providing better feature representations of the raw input data and better diagnostic performance than the original SAE.
A framework of deep active learning for TCN fault diagnosis is designed for the supervised fine-tuning phase, and a dynamic fusion AL strategy is proposed to enhance the performance of our diagnosis model at a lower labeling cost. The strategy trades off uncertainty and similarity sampling, and the fusion weight is dynamically adjusted at different training stages.
A fault diagnosis testbed was constructed, and a monitoring unit was added to the MVB network. Extensive comparison experiments with state-of-the-art methods were performed, and the results demonstrate that our proposed method achieves better performance with fewer labeled samples, symmetrically improving diagnosis accuracy and the efficiency of data labeling.
2. Background
The train communication network is mainly used to transmit key data such as control commands and status information. Compared with bus communication protocols like CAN, ARCNET, and WorldFIP, the TCN protocol is the most widely applied in the field of rail transit. According to the international standard IEC 61375-3-1 [2], the train communication network typically adopts a two-level bus structure consisting of the wire train bus (WTB) and the multifunction vehicle bus (MVB). The MVB network is widely used in high-speed EMUs and other rail transit trains that do not require frequent re-marshalling. It connects the network node devices and control devices of different train subsystems, which serve diverse application functions. In practical engineering applications, faults in the train communication network most often occur in the MVB network.
Figure 1 shows the network topology of the metro train; the in-vehicle operations are typically controlled by different electronic control units, such as the vehicle control unit (VCU), electric drive control unit (EDCU), electric braking control unit (EBCU), remote input/output module (RIOM), and air conditioning unit (ACU), which are connected by an MVB. Generally, the MVB is developed in master–slave communication mode, and the two VCUs on a metro train are configured for hot standby redundancy, with one of them functioning as the network master device. MVB messages are used to transmit the train control and monitoring data.
The transmission media of the MVB include electrical short distance (ESD), electrical middle distance (EMD), and optical glass fibre (OGF), and the baud rate of the MVB is 1.5 Mbps. The EMD is the most commonly used medium in the MVB; a twisted pair of two wires in a shielded cable can support up to 32 devices over a distance of 200 m. The electrical MVB signal conforms to ISO/IEC 8482 (RS-485) [31] and is encoded with Manchester code. A high-to-low transition within one bit time is decoded as '1', while a low-to-high transition within one bit time is decoded as '0'. A high level that persists for one bit time is identified as the non-data symbol "NH", and a low level that persists for one bit time is identified as the non-data symbol "NL". The data and non-data encodings are shown in Figure 2.
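To make the encoding concrete, the sketch below classifies each bit cell of an oversampled MVB waveform into the four symbols described above. It is a minimal illustration, assuming the waveform is already aligned on bit boundaries and compared against a fixed threshold; `decode_bit_cells`, `samples_per_bit`, and `threshold` are hypothetical names, not part of the MVB standard.

```python
import numpy as np

def decode_bit_cells(waveform, samples_per_bit, threshold=0.0):
    """Classify each bit cell of a Manchester-coded MVB waveform.

    waveform: 1-D array of sampled voltages, aligned on bit boundaries.
    Returns a list of symbols: '1', '0', 'NH', or 'NL'.
    """
    levels = waveform > threshold                 # True = high level
    n_bits = len(waveform) // samples_per_bit
    half = samples_per_bit // 2
    symbols = []
    for k in range(n_bits):
        cell = levels[k * samples_per_bit:(k + 1) * samples_per_bit]
        first_high = cell[:half].mean() > 0.5     # majority vote, first half-bit
        second_high = cell[half:].mean() > 0.5    # majority vote, second half-bit
        if first_high and not second_high:
            symbols.append('1')                   # high-to-low transition
        elif not first_high and second_high:
            symbols.append('0')                   # low-to-high transition
        elif first_high and second_high:
            symbols.append('NH')                  # high for a full bit time
        else:
            symbols.append('NL')                  # low for a full bit time
    return symbols
```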
The MVB protocol defines the master frame and the slave frame, and Figure 3 shows their data frame formats. The master frame is made up of the master start delimiter, F_code, slave address, check sequence (CS), and end delimiter (ED). The master start delimiter consists of the sequence {start bit, 'NH', 'NL', '0', 'NH', 'NL', '0', '0', '0'}. The slave frame consists of the slave start delimiter, frame data of different lengths, the check sequence, and the end delimiter. As an interesting feature of the MVB protocol, the master and slave start delimiters are fixed; they do not vary with the frame data and can therefore be used to identify MVB master frames.
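As an illustration, a decoded symbol stream can simply be scanned for this fixed pattern to locate master frames. This is a sketch under the same assumptions as above; whether the start bit decodes as a data '1' depends on the line state and is assumed here for simplicity.

```python
# Symbol sequence of the master start delimiter (the start bit is
# assumed to decode as a data '1' in this sketch).
MASTER_START = ['1', 'NH', 'NL', '0', 'NH', 'NL', '0', '0', '0']

def find_master_frames(symbols):
    """Return the start indices of master start delimiters in a symbol stream."""
    n = len(MASTER_START)
    return [i for i in range(len(symbols) - n + 1)
            if symbols[i:i + n] == MASTER_START]
```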
Figure 4 shows the different electrical MVB signals in a unit macro-period; various factors will result in the signal waveform deviating from the normal condition, which may cause various MVB faults. To diagnose the MVB faults, MVB signals corresponding to the bit string of the master start delimiter are collected, and the fault features are extracted from the measured signals.
Faults and interferences are directly related to the major failure modes of MVB networks, and these failure modes degrade the network's robustness and reliability. In accordance with domain experts, the failure modes most commonly occurring on MVB networks are listed in Table 1 [7,8]. The open fault mode and short fault mode are known as hard faults, which cause network breakdown or force network devices offline. The terminating fault mode, transceiver fault mode, connector degradation, and cable degradation are considered soft faults, which may cause reflection phenomena, degrade the quality of MVB signals, and make the system vulnerable to external interferences. These faults are all persistent failures caused by the aging of components such as cables and connectors; the fault phenomena remain until the corresponding faulty parts are replaced.
3. Proposed Method
3.1. System Overview
Due to insufficient labeled signals in practice, the aim of our system is to construct a fault diagnosis model for the MVB that achieves high diagnostic performance at a low labeling cost. The framework of our proposed method is shown in Figure 5 and described below.
In the unsupervised feature learning phase, all unlabeled data are used to train the SCAE in an unsupervised manner. After the pre-training of SCAE, a SCAE-based DNN model is constructed and initialized using the parameters of the SCAE. In the supervised fine-tuning stage, a dynamic fusion sampling (DFS) strategy is employed to select the most informative instances for expert labeling from the unlabeled set, and the SCAE-based DNN model is trained based on the labeled instances. In the DFS strategy, similarity is employed to reduce the information redundancy, and uncertainty is used to select the unlabeled instances to speed up the convergence of the model. The fusion weight is dynamically adjusted at the different training stages. Subsequently, the SCAE-based DNN model is updated, and this process repeats until the performance requirements are satisfied or the unlabeled dataset is empty. It is worth noting that the model outputs in the current round and the previous round are fused by the max rule to avoid overfitting problems.
3.2. Unsupervised Feature Learning Using SCAE
A CAE is also an AE that comprises input, hidden, and output layers; the structure of a CAE is shown in Figure 6. Consider the dataset $X = \{x^{(1)}, x^{(2)}, \ldots, x^{(i)}, \ldots, x^{(N)}\}$, $x^{(i)} \in \mathbb{R}^{S}$, where $N$ is the number of data samples and $S$ is the number of features in each sample. $\{W, b\}$ and $\{\tilde{W}, \tilde{b}\}$ represent the parameters at the hidden layer and the output layer, respectively. For a CAE, the input vector is either the original input data $x^{(i)}$ or the features extracted at a certain hidden layer, while the output layer of the CAE is isomorphic with the original input data $x^{(i)}$; that is, the output of the CAE reconstructs $x^{(i)}$. Let $z^{(i)}$ and $h^{(i)}$ be the variable vectors at the input and hidden layers, respectively. The reconstruction $\hat{x}^{(i)}$ of the original data can be obtained as follows:

$$h^{(i)} = f\left(W z^{(i)} + b\right), \quad (1)$$

$$\hat{x}^{(i)} = \tilde{f}\left(\tilde{W} h^{(i)} + \tilde{b}\right), \quad (2)$$

where $\theta_{\mathrm{CAE}} = \{W, b, \tilde{W}, \tilde{b}\}$, and $f$ and $\tilde{f}$ are the corresponding activation functions. The parameters of the CAE are optimized by minimizing the reconstruction error between the original data and the output:

$$L\left(\theta_{\mathrm{CAE}}\right) = \frac{1}{N} \sum_{i=1}^{N} \left\| x^{(i)} - \hat{x}^{(i)} \right\|^{2}. \quad (3)$$
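A single CAE of this form can be sketched in Keras as follows; `build_cae` is a hypothetical helper, and the sigmoid/linear activations and MSE loss are assumptions, since the text only states that $f$ and $\tilde{f}$ are the corresponding activation functions. The key point is that the output layer always has the dimension of the original data $x^{(i)}$, whatever the input dimension of the CAE is.

```python
from tensorflow.keras import layers, Model

def build_cae(input_dim, hidden_dim, recon_dim):
    """One CAE: z -> h = f(Wz + b) -> x_hat = f~(W~h + b~).

    recon_dim is the size of the original data x(i); the output layer
    is isomorphic with the original input, so for CAE 1 it equals
    input_dim.
    """
    z = layers.Input(shape=(input_dim,))
    h = layers.Dense(hidden_dim, activation='sigmoid', name='hidden')(z)
    x_hat = layers.Dense(recon_dim, activation='linear')(h)
    cae = Model(z, x_hat)
    cae.compile(optimizer='adam', loss='mse')   # reconstruction error
    encoder = Model(z, h)                       # exposes h(i) for stacking
    return cae, encoder
```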
Multiple CAEs can be hierarchically stacked to build an SCAE neural network; the structure of the SCAE and the detailed pre-training procedure for the deep SCAE are shown in Figure 7. Assume there are $k$ CAE models in total, denoted $\{\mathrm{CAE}\,1, \mathrm{CAE}\,2, \ldots, \mathrm{CAE}\,k\}$. In Figure 7, the black dashed lines indicate that the hidden layer of the previous CAE serves as the input layer of the next CAE, while the red dashed lines indicate that, during model training, the output layer of each CAE is compared with the input layer of the first CAE to calculate the loss function of that CAE. The input feature vector of CAE 1 is the original input data $x^{(i)}$, and its output is the reconstruction $\hat{x}_1^{(i)}$. The hidden-layer feature $h_1^{(i)}$ of CAE 1 is extracted and used as the input vector of CAE 2. The parameters of CAE 1 are optimized by minimizing the reconstruction error between the original input data $x^{(i)}$ and the reconstructed data $\hat{x}_1^{(i)}$. Similarly, once CAE $k-1$ has been constructed and pre-trained, its hidden-layer features $h_{k-1}^{(i)}$ are fed to CAE $k$:

$$h_k^{(i)} = f_k\left(W_k h_{k-1}^{(i)} + b_k\right), \quad (4)$$

$$\hat{x}_k^{(i)} = \tilde{f}_k\left(\tilde{W}_k h_k^{(i)} + \tilde{b}_k\right). \quad (5)$$

Then, CAE $k$ is trained by minimizing the reconstruction error between the original input data $x^{(i)}$ and the output $\hat{x}_k^{(i)}$:

$$L\left(\theta_{\mathrm{CAE}_k}\right) = \frac{1}{N} \sum_{i=1}^{N} \left\| x^{(i)} - \hat{x}_k^{(i)} \right\|^{2}. \quad (6)$$
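The layer-wise pre-training just described can be sketched as the loop below, reusing the hypothetical `build_cae` helper from above; the epoch and batch-size values are assumptions. Note that every CAE is fitted against the original data X rather than against the features of the previous layer, which is what bounds the information loss of each layer.

```python
def pretrain_scae(X, hidden_dims):
    """Greedy layer-wise pre-training of an SCAE (a sketch).

    X: array of shape (N, S) holding the original input data.
    hidden_dims: hidden-layer sizes of CAE 1 ... CAE k.
    Returns the trained encoders, whose weights initialize the DNN.
    """
    encoders, Z = [], X
    for dim in hidden_dims:
        cae, encoder = build_cae(Z.shape[1], dim, X.shape[1])
        # Each CAE k minimizes the error between X and its output x_hat_k.
        cae.fit(Z, X, epochs=50, batch_size=128, verbose=0)
        encoders.append(encoder)
        Z = encoder.predict(Z, verbose=0)   # h_k(i) feeds CAE k+1
    return encoders
```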
As can be seen, the SCAE not only shares the merit of the SAE in learning features from low-level concepts upward, but also keeps the information loss of each layer to a minimum. Hence, it is more effective to carry out pattern recognition using the deep features of the SCAE.
3.3. Supervised Fine-Tuning with AL
After the feature learning of the SCAE, the hidden representation layers of the SCAE are obtained. The SCAE-based DNN is built by adding a Softmax classifier on top of the hidden representation layers to perform multiclass classification. The fine-tuned DNN architecture is then optimized with the backpropagation algorithm by minimizing the following cost function:

$$J\left(\theta_{\mathrm{DNN}}\right) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} I\left\{y^{(i)} = j\right\} \log p_{\theta}\left(y^{(i)} = j \mid x^{(i)}\right) + \frac{\lambda}{2} \left\|W\right\|^{2}, \quad (7)$$

where $I(\cdot)$ denotes an indicator function, the first term is the cross-entropy loss of the Softmax layer, the second term is the weight decay penalty with coefficient $\lambda$, and $\theta_{\mathrm{DNN}}$ represents the parameters of the DNN.
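A minimal sketch of this construction is given below, assuming the `pretrain_scae` encoders from Section 3.2; the L2 factor stands in for the weight decay penalty of the cost function, and its value is an assumption.

```python
from tensorflow.keras import layers, models, regularizers

def build_dnn(encoders, input_dim, n_classes, weight_decay=1e-4):
    """SCAE-based DNN: pre-trained hidden layers plus a Softmax classifier."""
    model = models.Sequential([layers.Input(shape=(input_dim,))])
    for enc in encoders:
        pretrained = enc.get_layer('hidden')
        model.add(layers.Dense(pretrained.units, activation='sigmoid',
                               kernel_regularizer=regularizers.l2(weight_decay)))
        model.layers[-1].set_weights(pretrained.get_weights())  # init from SCAE
    model.add(layers.Dense(n_classes, activation='softmax'))
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```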
To improve the diagnosis performance with fewer labeled training samples, an active learning algorithm is applied to the SCAE-based DNN for further fine-tuning. Uncertainty sampling is one of the most commonly used frameworks for AL due to its low computational cost and high efficiency. Given a labeled training set $D_L$, an unlabeled training set $D_U$, and a DNN model $M$, the more uncertain $M$ is in classifying a sample $x^{(i)}$, the more informative that sample is, and the more likely it is to be selected to accelerate the convergence of the DNN model. Generally, the output of the Softmax layer is used to measure the uncertainty of the DNN model's prediction. Common measures of uncertainty are entropy, margin sampling, and least confidence.
- (1) Entropy (EN): Entropy is often used as an uncertainty indicator that takes all class label probabilities into account:

$$f_{en}\left(x^{(i)}\right) = -\sum_{j=1}^{C} p_{\theta}\left(y_j \mid x^{(i)}\right) \log p_{\theta}\left(y_j \mid x^{(i)}\right), \quad (8)$$

where $C$ is the number of fault classes, $x^{(i)} \in D_U$, $y_j$ is a fault class, and $\theta$ represents the parameters of the DNN model.
- (2) Least Confidence (LC): The probability of the most probable class for an instance is called the confidence; a low confidence means a high uncertainty of the model about this instance. The least confidence is used as an uncertainty measure:

$$f_{lc}\left(x^{(i)}\right) = 1 - p_{\theta}\left(y^{*} \mid x^{(i)}\right), \quad (9)$$

where $y^{*} = \arg\max_{y} p_{\theta}\left(y \mid x^{(i)}\right)$ is the most probable fault class of $x^{(i)}$, and $p_{\theta}\left(y^{*} \mid x^{(i)}\right)$ is the confidence coefficient of $x^{(i)}$.
- (3) Least Margin (LM): The margin is the difference between the highest and the second-highest posterior probabilities, and the instance with the smallest margin is selected:

$$f_{lm}\left(x^{(i)}\right) = p_{\theta}\left(y_1^{*} \mid x^{(i)}\right) - p_{\theta}\left(y_2^{*} \mid x^{(i)}\right), \quad (10)$$

where $y_1^{*}$ and $y_2^{*}$ are the first and second most probable class labels predicted by the model.
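For reference, all three measures can be computed directly from the Softmax outputs, as in the sketch below; the scores are oriented so that larger values always mean more uncertainty, which matches how samples are ranked later.

```python
import numpy as np

def uncertainty_scores(probs):
    """EN, LC, and LM scores from Softmax outputs of shape (n, C).

    All three are oriented so that larger values mean more uncertain.
    """
    eps = 1e-12                                      # guards log(0)
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    least_conf = 1.0 - probs.max(axis=1)             # 1 - p(y* | x)
    top2 = np.sort(probs, axis=1)[:, -2:]            # two highest probabilities
    least_margin = 1.0 - (top2[:, 1] - top2[:, 0])   # small margin -> large score
    return entropy, least_conf, least_margin
```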
The diversity criterion is used to select unlabeled instances that are diverse from each other, which reduces the redundancy among the selected samples. The similarity measure is used to ensure the diversity of the selected samples; a low similarity indicates that $x^{(i)}$ is different from the samples already in $D_L$. The similarity function is defined as [32]

$$f_{sim}\left(x^{(i)}, x^{(j)}\right) = \exp\left(-\frac{\left\|x^{(i)} - x^{(j)}\right\|^{2}}{2\delta^{2}}\right), \quad (11)$$

where $\delta$ is the Gaussian kernel parameter.
The typical measure of uncertainty is entropy, which reflects the purity of the predicted label distribution; however, it is easily affected by the probabilities of trivial labels. The least confidence strategy considers only the best-predicted class label and omits the information of the other labels, so it may select the wrong instances in multiclass problems. As a compromise, the least margin strategy overcomes both problems, so it is suitable for our method.
In general, none of the strategies discussed above makes a perfect selection of unlabeled instances; if instances are selected using uncertainty sampling alone, their diversity may be reduced [33]. Therefore, a combination of the LM and similarity measures is proposed in the literature to enhance the performance of AL, as a trade-off between the two:

$$f_{dfs}\left(x^{(i)}\right) = (1 - a)\, f_{lm}\left(x^{(i)} \mid M\right) + a\, f_{sim}\left(x^{(i)}\right), \quad (12)$$

where $a \in [0, 1]$ is a weight that balances uncertainty and similarity, $f_{lm}(x^{(i)} \mid M)$ denotes the uncertainty of the unlabeled instance $x^{(i)}$ under the model $M$, and $f_{sim}(x^{(i)})$ aggregates the pairwise similarity of Equation (11) over the labeled set $D_L$; both terms are oriented so that a larger value indicates a more informative instance.
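A sketch of the fused score is shown below; how the pairwise kernel of Equation (11) is aggregated over the labeled set is not fully specified in the text, so the max-aggregation and the 1 − sim diversity term here are assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

def dfs_scores(probs, X_unlabeled, X_labeled, a, delta=1.0):
    """Dynamic fusion score: (1 - a) * uncertainty + a * diversity."""
    _, _, lm = uncertainty_scores(probs)        # least-margin term of Eq. (12)
    # Gaussian-kernel similarity (Eq. (11)) to the closest labeled sample;
    # the max-aggregation over D_L is an assumption of this sketch.
    d2 = cdist(X_unlabeled, X_labeled, 'sqeuclidean')
    sim = np.exp(-d2 / (2.0 * delta ** 2)).max(axis=1)
    return (1.0 - a) * lm + a * (1.0 - sim)     # 1 - sim rewards diversity
```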
In our method, the weight $a$ of the DFS strategy is dynamically adjusted to adapt to the requirements of different training stages. At the beginning of training, owing to the scarcity of labeled data, the SCAE-based DNN model performs poorly, and the posterior probabilities at the model outputs are not trustworthy. Thus, the sampling strategy should mainly rely on the similarity indicator in the initial stage, and the weight $a$ is set to a large value. As training proceeds, the classification capability of the model improves, and the posterior probabilities become credible; the sampling strategy should then mainly rely on the LM indicator to select highly informative samples, and the value of $a$ is reduced. Consequently, the dynamic weight is adjusted according to the classification accuracy of the model at different training stages, which can be expressed as
$$a_t = a_0\left(1 - ACC_t\right), \quad (13)$$

where $a_0$ is the initial value and $ACC_t$ is the classification accuracy in the $t$-th round. The implementation of our proposed method is described in Algorithm 1.
Algorithm 1: Dynamic fusion active deep learning algorithm

Input: initially labeled samples D0L, unlabeled samples D0U, sample selection size N, maximum iteration number T, initial weight value a0
Output: the DNN model MT

1. Compute the parameters θSCAE of the SCAE by minimizing Equation (6) using the unlabeled samples D0U.
2. Use θSCAE to initialize the model M0, and compute θDNN by minimizing the cross-entropy loss using the initially labeled samples D0L.
3. Classify all unlabeled samples in D0U using M0 and obtain their posterior probabilities P0U.
4. for t = 0 to T do
5.   for each sample x(i) in DtU do
6.     Calculate the margin between the two highest posterior probabilities in PtU according to Equation (10).
7.     Calculate the similarity measure according to Equation (11).
8.     Calculate the informativeness value of x(i) according to Equation (12).
9.   end
10.  Add the top N unlabeled samples with the largest informativeness values to Ds for manual labeling.
11.  Dt+1U = DtU − Ds, Dt+1L = DtL + Ds, Ds = {}.
12.  Train the model Mt+1 on Dt+1L, and obtain the posterior probabilities Pt+1U of all unlabeled samples in Dt+1U.
13.  Update the weight a according to Equation (13).
14.  Update Pt+1U = max(PtU, Pt+1U).
15. end
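The loop below sketches Algorithm 1 end to end, reusing the helpers defined above; `oracle` stands in for manual expert labeling, the per-round epoch count is an assumption, and the weight update implements the reconstructed form of Equation (13).

```python
import numpy as np

def active_learning_loop(model, X_l, y_l, X_u, oracle,
                         n_select=100, rounds=10, a0=0.9):
    """A sketch of Algorithm 1; oracle(indices) returns expert labels."""
    a, p_prev = a0, None
    for t in range(rounds):
        probs = model.predict(X_u, verbose=0)
        if p_prev is not None:
            probs = np.maximum(p_prev, probs)       # step 14: max-rule fusion
        scores = dfs_scores(probs, X_u, X_l, a)     # steps 5-9
        idx = np.argsort(scores)[-n_select:]        # step 10: top-N samples
        X_l = np.vstack([X_l, X_u[idx]])            # step 11: move to D_L
        y_l = np.concatenate([y_l, oracle(idx)])
        keep = np.ones(len(X_u), dtype=bool)
        keep[idx] = False
        X_u, p_prev = X_u[keep], probs[keep]
        model.fit(X_l, y_l, epochs=20, batch_size=64, verbose=0)  # step 12
        # ACC_t measured on the labeled set here, as a stand-in.
        acc = model.evaluate(X_l, y_l, verbose=0)[1]
        a = a0 * (1.0 - acc)                        # step 13 (assumed form)
    return model
```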
4. Experimental Results and Discussion
4.1. Experiment Setup
To verify the effectiveness of our method for MVB fault diagnosis, an MVB testbed is constructed to simulate the MVB of a metro train; the constructed testbed is shown in Figure 8. As an interesting feature of the MVB protocol, the master start delimiter is fixed and does not vary with the frame data. Therefore, an MVB monitoring unit with a high-speed analog sampling circuit is developed to acquire the electrical MVB signal of the master start delimiter under different conditions; the sampling rate of the MVB monitoring unit is 100 MSa/s. An MVB fault injection device is designed, in which the relay and analog switch are controlled by a field programmable gate array (FPGA) to simulate different fault modes. The typical fault injection experiment on the MVB network is shown in Figure 9, and the details of the injected faults are provided in Table 1; they are all persistent failures.
To simulate the class imbalance problem existing in actual applications, we use 5000 normal instances and 1000 fault instances for each fault condition. The dataset is deliberately imbalanced because the difficulty of collecting fault instances in practice must be considered. The 600 sampling points corresponding to the physical signal segment of the master start delimiter are selected as the model input, forming an MVB network fault diagnosis dataset of size 11,000 × 600. As the dataset is relatively small, we follow the experimental settings adopted in several related active learning studies [27,28], where the validation set is omitted in order to maximize the number of training samples. The dataset is randomly divided, with 70% of the samples used for model training and the rest for testing; a 5-fold cross-validation strategy is employed on the training set for hyper-parameter tuning. We consider the worst case, in which all training samples are initially unlabeled, and all training samples are used for unsupervised pre-training.
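Under these settings, assembling and splitting the dataset might look like the sketch below; the arrays are random placeholders standing in for the measured 600-point delimiter waveforms, and the stratified split is an assumption about how the random division was implemented.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
normal = rng.normal(size=(5000, 600))                  # placeholder waveforms
faults = [rng.normal(size=(1000, 600)) for _ in range(6)]
X = np.vstack([normal] + faults)                       # (11000, 600)
y = np.concatenate([np.zeros(5000, dtype=int)] +
                   [np.full(1000, c, dtype=int) for c in range(1, 7)])

# 70/30 train/test split; all 7700 training samples start unlabeled.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
```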
4.2. Parameter Selection and Evaluation Metrics
In the proposed SCAE-based DNN framework, the SCAE contains three hidden layers, whose hidden nodes are set to 400, 200, and 50, respectively. Since the MVB signals represent seven different working conditions, and the number of neurons in the input layer of the first CAE and in the output layer of each CAE equals the number of physical signal sampling points of the MVB master start delimiter, the network structure of the DNN is set to [600, 400, 200, 50, 7]. The Adam optimizer is employed with a learning rate of 0.01. The hidden nodes of the SCAE and the DNN are identical, so the features learned by the SCAE can be used by the DNN and fine-tuned further during supervised training.
We employ Keras to implement the SCAE-based DNN model, where the mini-batch gradient descent method is used and Adam is applied to accelerate network convergence. All experiments are conducted on a common desktop PC with an Intel i5 2.8 GHz dual-core processor and 16 GB of RAM. Since the number of samples for each fault type is 1000, we select 10% of the samples of each fault type as the initial training samples, and the number of samples selected in each round is set to 100. This setting ensures the quality of model training while minimizing the number of manual labels. At the beginning of training, the DNN model exhibits low performance and its predicted probabilities are not trustworthy; therefore, the initial weight value a0 is set to 0.9.
The fault types considered in this study are mainly persistent faults, in which the fault phenomena occur continuously. The main objective of this work is to obtain a high-quality fault diagnosis model while minimizing manual labeling costs in engineering applications. Therefore, three indicators are used to measure the performance of the proposed fault diagnosis method. Accuracy, a commonly used metric for evaluating fault diagnosis algorithms, is the ratio of the number of correctly diagnosed instances to the total number of instances in the test set. The diagnosis error rate (DER) is the ratio of fault instances that are detected as faulty but assigned to the wrong fault class to the total number of fault instances in the test set. The undetected error rate (UER) is the ratio of fault instances diagnosed as normal to the total number of fault instances in the test set.
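The three indicators can be computed from predictions as in the sketch below; treating class 0 as the normal condition is an assumption of the example.

```python
import numpy as np

def diagnosis_metrics(y_true, y_pred, normal_class=0):
    """Accuracy, DER, and UER as defined above."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    accuracy = (y_true == y_pred).mean()
    fault = y_true != normal_class                   # true fault instances
    n_fault = fault.sum()
    # detected as a fault, but assigned to the wrong fault class
    der = ((y_pred != normal_class) & (y_pred != y_true) & fault).sum() / n_fault
    # fault instances diagnosed as normal
    uer = ((y_pred == normal_class) & fault).sum() / n_fault
    return accuracy, der, uer
```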
4.3. Performance Evaluation
To demonstrate that the proposed framework can improve classification performance with fewer labeled instances, we compare it with three uncertainty sampling methods, as mentioned before, and two baseline methods (ALL and RA).
ALL: All the training instances are manually labeled and used to train the SCAE-based DNN. This method can be considered the upper bound (the best performance that our model can reach with all labeled training samples).
Random (RA): During the training process, some instances are randomly selected from the training set to be annotated to fine-tune our model. This method can be regarded as the lower bound.
As illustrated in Figure 10, all the AL algorithms outperform the RA method, which demonstrates the effectiveness of applying AL algorithms to deep learning. Our DFS strategy achieves the best performance among the AL algorithms, especially in the early rounds, because it combines the similarity and uncertainty criteria to select more informative training samples. To achieve the same accuracy, the number of labeled samples required by our proposed method is much smaller than that of the RA method; for example, to achieve an accuracy of 90%, our method requires 600 labeled samples, whereas the RA method requires 2800. The performance of the LM method is marginally better than that of the LC and EN methods, which indicates that the LM method measures uncertainty more accurately.
Figure 11 illustrates the class distributions of the labeled training set as training progresses. The random sampling method tends to select normal samples, and the class distribution of the labeled training set remains unchanged during the training process; the resulting class-imbalanced labeled training set significantly degrades classifier performance. In contrast, our method selects more minority-class samples, so the proportion of minority classes increases as training proceeds, yielding a relatively balanced class distribution. All of these results indicate that our method is suitable for fault diagnosis of the MVB with fewer labeled training samples.
We also compare the performance of the proposed method with an active deep learning method from the literature [30]; the SDAE has the same multi-layer network structure as the SCAE, but its parameters are layer-wise pre-trained with traditional AEs. The comparison results are shown in Figure 12. The accuracy of our model with ALL (99.24%) is better than that of the SDAE-based DNN with ALL (97.8%), and the DER and UER are lower. Furthermore, our approach outperforms the SDAE-based DNN with AL algorithms. This is because the SCAE captures the intrinsic features of the raw input data more effectively, with smaller information loss, and the DFS strategy chooses more informative training samples to improve classification performance.
To evaluate the effectiveness of our dynamic weight adjustment, our DFS strategy is compared with the fixed weight strategy. The fixed weight strategy adopts our proposed method albeit with a fixed weight. We compare two types of settings: FIX1 (a = 0.5), FIX2 (a = 0.1).
The comparison results after different rounds are summarized in Table 2. Our method outperforms both fixed-weight methods, indicating that dynamic weight adjustment selects informative samples more effectively than fixed-weight settings. Further exploration of the influence of the weight a reveals that the FIX1 method performs better than the FIX2 method, especially after five rounds. Due to its small weight value, the FIX2 method performs only slightly better than the LM method. This validates our assumption that the trained DNN model is not trustworthy in the early rounds, and that setting a large a in the initial rounds improves the performance of our method.
5. Conclusions
A fault diagnosis method for TCN based on active learning and SCAE is proposed in this work. Compared with the original SAE, the SCAE can learn features directly from the MVB signals and describes the data structure of the raw inputs much better. The experimental diagnosis results show that the SCAE-based DNN model can effectively identify fault states with high diagnostic performance. Moreover, a dynamic fusion active learning strategy is presented to reduce the cost of manual labeling, which improves diagnostic performance by adaptively adjusting the weight between uncertainty and similarity at different training stages. The experimental results demonstrate that our proposed method outperforms state-of-the-art methods and remains effective on class-imbalanced data. Since our system only requires the installation of a monitoring unit to collect physical-layer electrical signals on the MVB network, our method can be directly deployed in existing systems.
Nevertheless, the experimental data were collected under relatively stable operating conditions, which may limit the method’s applicability under strong environmental interference. In addition, the proposed approach has not yet addressed the challenge of identifying previously unseen fault types, as such faults would currently be classified into existing categories.
In future work, we would like to develop a multi-criteria active learning algorithm that suits the fault diagnosis of TCN to achieve a better trade-off between diagnosis accuracy and labeling costs. At the same time, we will also study model compression algorithms to reduce the number of model parameters and further enhance real-time diagnostic performance.