Improved RAkEL’s Fault Diagnosis Method for High-Speed Train Traction Transformer

The traction system is very important to ensure the safe operation of high-speed trains, and the failure of the traction transformer is the most likely fault in the traction system. Fault diagnosis in actual work relies largely on manual experience. This paper proposes an improved RAkEL (Random k-Labelsets) algorithm for the fault diagnosis of high-speed train traction transformers. Firstly, this article starts from the large amount of “sleeping” fault maintenance data accumulated by the railway department, takes a single maintenance record as an instance, uses specific monitoring values to construct an instance vector, and uses the fault phenomena corresponding to the monitoring indicators as labels. Then, this paper improves the step of selecting k-labelsets in RAkEL, and extracts associated faults using the Relief algorithm. Finally, this paper excavates and uses the association rules between data and faults to identify traction transformer faults. The results showed that the improved RAkEL diagnostic method had a significant improvement in the evaluation indicators. Compared with other multi-label classification algorithms, including BR (Binary Relevance) and CLR (Calibrated Label Ranking), this method performs well on multiple evaluation indicators. It can further help engineers perform timely maintenance work in the future.


Introduction
In recent years, high-speed railway trains have developed rapidly in terms of speed and capacity. The system structure is complex, the equipment is closely connected with each other, and there are many types of faults. An efficient and accurate fault diagnosis mechanism is of great significance to reducing or eliminating the occurrence of accidents and ensuring the safety of high-speed train operations. The traction system is a key subsystem of high-speed trains, and the traction transformer is one of the main equipment. It is composed of iron core, coil winding, oil tank, oil protection device, and other equipment. Its operation is often affected by various factors such as electricity, heat, and magnetism. With the increase in service time, changeable train operating environment, or due to defects in the manufacturing process, irregular maintenance work, etc. [1], various failures of traction transformers will inevitably occur, burying potential safety hazards.
The large amount of maintenance data accumulated by the railway department can provide data support for fault diagnosis. The data-driven fault diagnosis method has the advantages of a wide application range, less need for modeling and prior knowledge, and easy implementation [1]. In the field of data-driven fault diagnosis of complex electromechanical equipment, machine learning methods are widely used due to their high reliability and easy implementation [2][3][4][5][6]. For example, support vector machine (SVM), a new machine learning method based on statistical learning theory, is a powerful tool for solving small sampling, nonlinear, and high-dimensional problems, which has inspired several works in recent years [7,8]. Many scholars optimize penalty factors and kernel function parameters through intelligent optimization algorithms, such as applying particle swarm optimization (PSO) [9], improved seagull optimization algorithm [10], etc., to improve diagnostic accuracy. Bayesian networks are also widely used in this field [11,12].
The key to fault diagnosis is feature extraction. The principal component analysis (PCA) method is another important research topic in the field of system fault analysis [13,14]. Among them, kernel principal component analysis (KPCA) is a commonly used method, such as the fuzzy clustering-based operating state diagnosis algorithm proposed in reference [15], or by combining it with SVM [10] to further obtain fault information. Deep PCA combines PCA theory with deep learning to effectively achieve early fault diagnosis [16,17].
In the diagnosis of faults in complex electromechanical equipment, existing research predominantly relies on Dissolved Gas Analysis (DGA), processing its characteristic values, and identifying corresponding single-fault types. However, these studies often overlook the fact that in practical operation, equipment instances are often associated with one or more fault categories, naturally giving rise to multi-label classification problems. Therefore, in recent years, multi-label classification methods have become increasingly popular in the field of fault diagnosis (three articles), which can well explore fault diagnosis situations of high-dimensional features and compound fault types. There are currently a large number of mature methods in the field to solve multi-label learning problems [18]. The first is the transformation method problem, such as BR (Binary Correlation), CC (Classifier Chains), CLR (Calibrated Label Ranking) [19][20][21]. The other is to improve the existing single classifier, such as transforming the nearest neighbor classifier (KNN) into multi-label nearest neighbor classifier (MLKNN), transforming support vector machine (SVM) into rank support vector machine (RANKSVM) [22,23]. The classic multi-label classification algorithm RAkEL has been widely used in image classification [24], biomedicine [25,26], information security, and other fields [27]. By dividing the label space, RAkEL tries to overcome the problems of computational cost and label powerset explosion. This paper is based on the ordinary RAkEL method, treats labels as features, uses the Relief algorithm to improve the steps of randomly selecting labelsets, and diagnoses high-speed train traction transformer faults based on existing fault and maintenance data. The results show that, compared with the ordinary RAkEL and other ordinary multi-label classification algorithms, the improved RAkEL has better performance.
The main contributions of this paper are as follows: 1.
Using the improved RAkEL method proposed in this paper to mine the correlation between fault phenomena in the actual high-speed train traction transformer fault dataset, the accuracy of the fault phenomenon identification and final maintenance diagnosis is high. At the same time, this paper sets relevant parameters based on the high-speed train traction transformer dataset and achieves good diagnostic results, indicating that this method is suitable for the actual fault diagnosis process.

2.
Based on the ordinary RAkEL algorithm, this paper considers the correlation between fault manifestations in the process of selecting k-labelsets, adds the Relief algorithm to reduce the randomness of labelset selection, and improves the calculation efficiency and diagnosis accuracy. In the actual high-speed train traction transformer fault dataset, compared with the ordinary RAkEL, the improved RAkEL has better performance in various evaluation indicators. Based on the optimal parameters obtained through experiments, AP increased by 7.4%, and Coverage, Hamming Loss, One Error, and Ranking Loss decreased by 51.2%, 9.8%, 13.3%, and 51.6%, respectively.

3.
After adding the Relief algorithm to mine label correlation, the improved RAkEL performs the best overall in comparison with other algorithms. In the actual high-speed train traction transformer fault dataset, based on the set parameters, the improved RAkEL has the best comprehensive performance compared with BR, CLR, and LP, and is only slightly lower than BR in AP. This shows that compared with BR, the number of related instances that have not been diagnosed is larger in the improved RAkEL method.
The organizational structure of the rest of the paper is as follows: Section 2 introduces the working content and diagnostic difficulties of the traction transformer and the RAkEL method. Section 3 introduces the model construction, including the data processing process and evaluation indicator system. Section 4 uses the known data to verify the calculation examples, and discusses the related parameters. Section 5 summarizes the full text, and puts forward the deficiencies and the direction of future efforts.

Work Content and Diagnostic Difficulties of Traction Transformers
Taking the CRH5 model as an example, the multiple unit is composed of eight car formations and two power units. The first power unit consists of three high-speed trains and one trailer (M-M-T-M); The second power unit consists of two high-speed trains and two trailers (T-T-M-M). Each power unit is equipped with a main transformer (TT) and a pantograph, and the entire train is equipped with two pantographs.
The design of the traction structure is that the third traction converter of the first traction unit can be powered by the second traction unit and can be switched from the first traction unit to another traction unit. This feature enables the balancing of two main transformers when a traction converter fails; at least three traction converters operate when one main transformer fails. When the electrical equipment in each traction power unit malfunctions, the power train of that unit can be completely or partially cut off (cutting off one or two traction converters), but it does not affect the operation of other power units.
The main transformer (TT) is a single pole transformer equipped with six secondary windings, which allows the train to operate on a line powered by a nominal AC voltage of 25 kV-50 Hz, reducing the voltage to a value suitable for driving the parts. It is cooled by forced oil circulation; a dedicated oil-gas heat exchanger (cooler) is used to cool the oil, and an oil evaporator is also integrated into the transformer to ensure the space required for evaporation and oil storage. The main transformer (TT) meets the needs of the traction unit; each traction/auxiliary converter is powered by two secondary windings. In the component called "HV Controller Box" (COMB) adjacent to the transformer, remote isolation switches SAZ1 and SAZ (to drive one or eight vehicles, respectively), SAZ21 and SAZ122 (to drive two or seven vehicles, respectively), and SAZ31 and SAZ32 (to drive four vehicles) are installed to perform the task of disconnecting from the traction converter in the event of a fault affecting the converter. Figure 1 shows the current phase measurement points for traction transformers, such as breakdown voltage (BV), moisture content, acid value, and dielectric loss factor. Due to the complex types of faults in traction transformers, there is currently little research on the analysis of the relationship between faults in traction transformers. Currently, maintenance work mainly relies on manual testing and judgment. Workers often misjudge or miss faults due to lack of experience or misjudgment, which seriously affects the safe operation of equipment. Therefore, analyzing the relationship between closely related faults and establishing an efficient and accurate fault diagnosis model is crucial for improving the reliability of traction transformer equipment.

RAkEL
The RAkEL method proposed by Tsoumakas et al. involves training multi-classifiers for each k-labelset using the LP method after randomly dividing the total labelset of the data into multiple small labelsets containing k labels (k-labelsets), collecting and combin-

RAkEL
The RAkEL method proposed by Tsoumakas et al. involves training multi-classifiers for each k-labelset using the LP method after randomly dividing the total labelset of the data into multiple small labelsets containing k labels (k-labelsets), collecting and combining the decisions of all the LP classifiers to constitute a multi-label classification of unlabeled instances [28]. Compared to LP, RAkEL has the advantage of generating computationally simpler single-label classification tasks with a more balanced distribution of class values. And in the case of overlapping labelsets, RAkEL is able to collect multiple diagnostic results for the same label through different LP models, which are voted to obtain the final output. This provides diagnostic results with the opportunity to correct potentially irrelevant errors and improve performance. At the same time, RAkEL can generalize to labelsets other than the known ones, which LP cannot do. The pseudocode of the algorithm is shown in Algorithm 1.
Construct the multi-class training set D + L k (l r ) according to Equation (8) . end for 6. Return the diagnostic labelset Y according to Equation (15).

Model Construction
At this stage, the railway department has accumulated a large amount of high-speed train traction transformer operation and maintenance data, including condition monitoring data, fault types and performance, repair and maintenance records, etc. It is of great significance to analyze and excavate these dormant data for the traction transformer's fault identification and diagnosis of the repair method.
The ordinary RAkEL method randomly decomposes the initial labelset into multiple small labelsets, which reduces the complexity of the generated single-label classification task, but also brings about the problems of redundancy and low computational efficiency. Therefore, this paper considers adding the Relief algorithm for improvement, treating labels as features, and preferentially selecting relevant labels to form a labelset. Finally, we use the improved RAkEL algorithm to mine the association rules between data and faults, faults and faults, point out the types of faults that occur in the newly generated data, and further provide preventive maintenance recommendations.

Constructing the Training Set Containing Instances of Maintenance Records and Corresponding Fault Phenomenon Labels
The historical condition-monitoring data in the existing records contain monitoring values for specific monitoring indicators, including BV, moisture content, dielectric loss factor, and so on. In this paper, we take a single maintenance record as an instance and use its corresponding specific monitoring values to construct the instance space in the multi-label classification algorithm: where X denotes the instance space and x i denotes the ith maintenance record as the instance. x i is a d-dimensional feature vector with d corresponding to the total number of monitoring indicators. In this paper, the fault phenomena corresponding to the monitoring indicators are used as labels to construct the label space in the multi-label classification algorithm: where L denotes the label space, M is the total number of possible fault phenomena, and M denotes the Mth label contained in the label space. Based on the fault phenomena in each maintenance record, the label 0/1 matrix corresponding to the instance is constructed: Y i denotes the labelset associated with the ith instance x i , where i = 1 . . . N. a j indicates whether the jth fault phenomenon occurs or not, if the fault occurs, then a j = 1, otherwise, it is 0, where j = 1 . . . M.
The training set containing maintenance record instances and their associated labelsets is denoted as: D denotes the training set, where x i ∈ X denotes the ith maintenance record instance, Y i ⊆ L is the number of instances associated with the set of related labels x i , and N is the total number of instances.

Constructing Multi-Classification Training Sets
RAkEL is an algorithm that transforms a multi-label problem into a multi-classification problem. It reduces the label space into k-labelsets and calls LP methods separately for each k-labelset to train the multi-classifiers and finally votes to obtain the results to improve the accuracy.
Given the labelset size k, the set consisting of the labels in the space of k labels is called a k-labelset. We construct the set of all possible k-labelsets: where L k is the set of all possible k-labelsets, L k denotes the size of L k , M is the number of labels, and ( M k ) denotes that k labels are randomly taken from M labels. For the labelsets in L k : where L k (l) denotes the lth labelset in L k and L k (l) denotes the size of the L k (l). Given the desired number of classifiers m: In the training phase, for an original multi-labeled sub-training set, we reduce the original labeling space L to L k (l) by converting it to the following multi-class single-labeled training set: D + L k (l) contains the new class: where D + L k (l) denotes the training set with label space L k (l); σ L k (l) : 2 L → N is the mapping from the power set of L to the natural numbers of the inverse function; and Γ D +

Constructing a Collection of Multi-Class Classifiers
A multi-class learning algorithm is utilized to induce a multi-class classifier: i.e., To create an integration with m component classifiers, a collection of multi-label classifiers is constructed by calling LP on a set of m randomly chosen k-labelsets L k (l r ) (1 ≤ r ≤ m): where g + L k (l m ) denotes the multi-class classifier constructed for the mth k-labelset. The ordinary RAkEL randomly selects labels to divide the labelset with strong randomness and ignores the correlation between labels. It is easy to select a large number of redundant or irrelevant label combinations during random selection, which reduces the computational efficiency and diagnostic accuracy. Based on this, this paper applies the Relief algorithm, which treats labels as features and constructs closely related labelsets to reduce randomness [29].
Finally, we construct the related label matrix.
where A j denotes the related label matrix of the jth label, a i represents the value of x i with respect to the label in Y C j , N denotes the number of instances in the training set, Y C j denotes the label matrix consisting of labels other than the jth label, Φ(Y i , j ) denotes the value of x i with respect to the label j , and Φ(Y i , j ) is 1 if j ∈ Y i and 0 otherwise. Y i denotes the set of related labels associated with x i ; c 1 denotes the first label in Y C j ; and c k denotes the jth label in Y C j . We apply the Relief algorithm to A j to obtain the weight of each of the remaining labels on j , and next take the k − 1 label with the largest weight value among them as the closely related label of the jth label to obtain the set of closely related labels for each label R j : where R j denotes the set of closely related label corresponding to the jth label; rank( b ) denotes the ranking of weight values corresponding to the bth label in Y C j . By forming a k-labelset with the jth label and its closely related labelset, we obtain a total of M closely related k-labelsets, and then randomly select m-M non-repeated k-labelsets from them to jointly construct a multi-classifier ensemble, which reduces the randomness in the selection of k-labelsets.

Building Diagnostic Instances Labelsets
For unrecognized record instances x, the following two quantities are calculated for each label: where τ( , j ) counts the maximum number of votes for diagnostic integration on label j and µ( , j ) counts the actual number of votes for diagnostic integration on label j . X denotes the voting method, X is 1 if X is true, and 0 otherwise. σ L k (l) : 2 L → N is the mapping from the power set of L to the natural numbers of the inverse function; σ −1 L k (l) represents the inverse function corresponding to it. g + L k (l r ) denotes the multi-class classifier constructed for the rth k-labelset.
Then, the set of diagnostic labels for undiagnosed instances is represented as follows: In other words, it is considered that j is related to when the actual number of votes is more than half of the maximum number of votes. For a collection created from a set of k labels, the maximum number of votes on each label is mk/M on average. Table 1 shows the diagnostic voting process of RAkEL for a multi-label training set with the number of labels M = 6, for example, run with the labelset size k = 3 and the number of desired classifiers m = 7.

Hamming Loss
Hamming Loss is used to evaluate how often labels are misdiagnosed, recording cases where: relevant labels are misdiagnosed as irrelevant, and irrelevant labels are misdiagnosed as relevant.
Hamming Loss = 1 where p denotes the number of instances, q is the number of labels, and h(x i ) denotes the number of labels corresponding to the diagnosed error for instance x i .

Ranking Loss
Ranking Loss evaluates the proportion of reverse-ordered label pairs, recording the proportion of instances where the diagnostic rank of relevant labels is lower than the diagnostic rank of irrelevant labels.
where l denotes the actual relevant label for the ith instance and l denotes the actual irrelevant label for that instance; Y i is the set of irrelevant labels for the instance; f )x, y) is a real-valued function indicating the confidence that y is the relevant label for x, obtained by the classifier system.

One Error
One Error is used to evaluate the proportion of instances where the highest-ranked label is not relevant to the instance.
where oneerror(k) is the One Error value of the kth label. argmax l∈y f (x i , l) denotes the highest ranked label associated with instance x i .

Coverage
Coverage evaluates how many steps it takes, on average, to move down the list of ranking labels to cover all relevant labels for the instance.
where rank f is the rank function corresponding to the real-valued function f .

Average Precision
Average Precision (AP) is used to evaluate the average percentage of relevant labels that rank higher than a specific label.
where rank f (x i , l) is the rank function corresponding to the real-valued function indicating the rank of l in the diagnostic results of the unknown instance x i .

Algorithm Validation
The dataset used in this paper is the historical maintenance record dataset of traction transformer for high-speed trains. The maintenance record contains condition monitoring data under different monitoring indicators several times, and the monitoring indicators include BV, moisture content, acid value, etc. Specific monitoring values can reflect certain fault phenomena, and different combinations of fault manifestations can provide recommendations for high-speed train traction transformer maintenance methods.

Parameter Selection
In the RAkEL algorithm, the settings of the labelset size k and the number of classifiers m have a large impact on the diagnostic results. In this paper, 70% is randomly selected from the dataset as the training set and 30% as the test set, and the values of k and m are discussed.
Based on the different values of the k set (m = 2 M), the change of each evaluation indicator is shown in Figure 2, where coverage has been normalized.
With different values of m set (k = 2), the changes in each evaluation indicator are shown in Figure 3, where coverage has been normalized. Based on the different values of the k set (m = 2 M), the change of each evaluation indicator is shown in Figure 2, where has been normalized. With different values of m set (k = 2), the changes in each evaluation indicator are shown in Figure 3, where has been normalized. With different values of m set (k = 3), the changes in each evaluation indicator are shown in Figure 4, where has been normalized: indicator is shown in Figure 2, where has been normalized. With different values of m set (k = 2), the changes in each evaluation indicator are shown in Figure 3, where has been normalized. With different values of m set (k = 3), the changes in each evaluation indicator are shown in Figure 4, where has been normalized: With different values of m set (k = 3), the changes in each evaluation indicator are shown in Figure 4, where coverage has been normalized:  With different values of m set (k = 4), the changes in each evaluation indicator are shown in Figure 5, where has been normalized. With different values of m set (k = 4), the changes in each evaluation indicator are shown in Figure 5, where coverage has been normalized. With different values of m set (k = 4), the changes in each evaluation indicator are shown in Figure 5, where has been normalized. From Figures 3-5, it can be seen that the comprehensive performance is good when k is set to 3 and m is set to 34, and the results of the indicators near the optimal value show a smooth trend. Considering the test results and suggestions [28], the labelset size k is determined as 3 and the number of classifiers m is determined as 34.

Comparison of Evaluation Indicators before and after Improvement
In order to reduce the impact of randomness in RAkEL, this paper adds the Relief algorithm that mines label correlation, selects labels with greater correlation to form a labelset, and reduces the redundant and irrelevant impact of randomly selecting a labelset. In this paper, 70% of the dataset is randomly selected as the training set and 30% as the test set; based on the parameters set in the previous section, the indicators before and after RAkEL improvement are shown in Table 2, among which are  ,  ,  , and , whereby the smaller the value of the indicator, the better the performance, and the larger the value of the indicator, the better the performance. From Figures 3-5, it can be seen that the comprehensive performance is good when k is set to 3 and m is set to 34, and the results of the indicators near the optimal value show a smooth trend. Considering the test results and suggestions [28], the labelset size k is determined as 3 and the number of classifiers m is determined as 34.

Comparison of Evaluation Indicators before and after Improvement
In order to reduce the impact of randomness in RAkEL, this paper adds the Relief algorithm that mines label correlation, selects labels with greater correlation to form a labelset, and reduces the redundant and irrelevant impact of randomly selecting a labelset. In this paper, 70% of the dataset is randomly selected as the training set and 30% as the test set; based on the parameters set in the previous section, the indicators before and after RAkEL improvement are shown in Table 2, among which are Hamming Loss, One Error, Coverage, and Ranking Loss, whereby the smaller the value of the indicator, the better the performance, and the larger the value of the AP indicator, the better the performance. As shown in Figure 6, the improved RAkEL outperforms the ordinary RAkEL in all indicators, with an AP improvement of 7.4%. This indicates that the improved RAkEL algorithm proposed in this paper, which incorporates feature selection and calculates the correlation between labels to prioritize selecting relevant labels for composing k-labelsets, has analyzed the association between faults and their concurrent failures effectively. It significantly reduces the impact caused by the random selection of labelsets in ordinary RAkEL, thus validating the effectiveness of the improvement.
indicators, with an improvement of 7.4%. This indicates that the improved RAkEL algorithm proposed in this paper, which incorporates feature selection and calculates the correlation between labels to prioritize selecting relevant labels for composing k-labelsets, has analyzed the association between faults and their concurrent failures effectively. It significantly reduces the impact caused by the random selection of labelsets in ordinary RAkEL, thus validating the effectiveness of the improvement.

Comparison with Other Multi-Label Classification Algorithms
According to the dataset and parameters set in the previous section, Table 3 demonstrates the results of comparing the indicators of this experiment with other multi-label classification algorithms, including BR and CLR, among which are , , , and , whereby the smaller the value of the indicator, the better the performance; the larger the value of the indicator, the better the performance. From Table 3, the improved RAkEL had the best overall performance in all the indicators, except for the slightly lower compared to BR.

Comparison with Other Multi-Label Classification Algorithms
According to the dataset and parameters set in the previous section, Table 3 demonstrates the results of comparing the indicators of this experiment with other multi-label classification algorithms, including BR and CLR, among which are Hamming Loss, One Error, Coverage, and Ranking Loss, whereby the smaller the value of the indicator, the better the performance; the larger the value of the AP indicator, the better the performance. From Table 3, the improved RAkEL had the best overall performance in all the indicators, except for the slightly lower AP compared to BR.
The comparison of the improved RAkEL indicators with other methods is shown in Figure 7. The comparison of the improved RAkEL indicators with other methods is shown in Figure 7. As shown in the figure, the improved RAkEL has the best comprehensive performance compared to other classification algorithms in the application of a high-speed train traction system's fault dataset. This indicates that the improved RAkEL has a better performance in mining the correlation between fault phenomenon labels, and also shows the effectiveness and applicability of mining the correlation of fault phenomena for fault diagnosis and taking effective maintenance measures. The actual fault diagnosis results are shown in Table 4. As shown in the figure, the improved RAkEL has the best comprehensive performance compared to other classification algorithms in the application of a high-speed train traction system's fault dataset. This indicates that the improved RAkEL has a better performance in mining the correlation between fault phenomenon labels, and also shows the effectiveness and applicability of mining the correlation of fault phenomena for fault diagnosis and taking effective maintenance measures. The actual fault diagnosis results are shown in Table 4. In this case, if a breakdown voltage > 50 kV is detected, the probability of diagnosing a C3-level repair is 34.8%; a C4-level repair, 52.2%; and a C6-level repair, 13.0%. The diagnostic accuracy of the final maintenance method reached 93.62%, which is 8.62% higher than the commonly used diagnostic method based on the BPLN model in current engineering, indicating that the method proposed in this paper has practical application value [30].

Conclusions
It is very important to carry out efficient and accurate fault prediction for the main transformer, as it is one of the components of the high-speed train traction system with the most types of faults and the most frequent monitoring of the phase points. The algorithm proposed in this paper has a small degree of AP and enhancement before and after improvement, and there is still some room for improvement in the evaluation indicators, such as AP, in comparison with other algorithms. The next step we propose is to study the way with which we can achieve a higher-accuracy fault diagnosis while maintaining the existing level.