A Fault Diagnosis Method of Mine Hoist Disc Brake System Based on Machine Learning

Abstract: The performance of the brake system is directly related to the safety and reliability of mine hoist operation. Mining the useful fault information from the operation of a mine hoist brake system, analyzing the abnormal parts and causes, and making accurate early prediction and diagnosis of hidden faults are of great significance for ensuring the safe and stable operation of a mine hoist. This study presents a fault diagnosis method for the hoist disc brake system based on machine learning. First, the monitoring system collects information on the hoist brake system, and the fault features are extracted and pretreated with SPSS (Statistical Product and Service Solutions). This work provides data support for fault classification. Then, because the structure of the hoist brake system is complex, the relationship between the fault factors often has a significant impact on the fault. Considering the correlation between the fault samples and the attributes of each sample, the C4.5 decision tree algorithm is improved by adding the Kendall concordance coefficient, and the improved algorithm is used to train the sample data to obtain the decision tree classification model. Finally, the fault diagnosis rules are generated from the trained model, and the state of the brake system is judged by classifying the data. Experiments show that the improved C4.5 decision tree algorithm takes the correlation of conditional attributes into account, has a higher diagnostic accuracy when processing more data, and produces concise and clear fault classification rules, which can meet the needs of hoist fault diagnosis.


Introduction
The mine hoist is a large-scale system that integrates mechanical, electrical, and hydraulic subsystems, and it is indispensable transportation equipment in coal mining. The performance of the brake system is directly related to the safety and reliability of the mine hoist, so research on fault diagnosis of the brake system is of great significance for ensuring safe hoist operation.
Disc brake systems are widely used in mine hoisting equipment and comprise disc brakes and hydraulic stations. Figure 1 shows a large mine hoist system operating at a mine (Figure 1a), together with the disc brakes (Figure 1b) and hydraulic station (Figure 1c) of the hoist.
Because of the complex structure of the system, there are many uncertain factors and much uncertain information in actual engineering. Also, there are many complex relationships among faults, and multiple faults can occur simultaneously. Hence, using existing knowledge to analyze and infer faults is a complicated process [1]. In recent years, many researchers have applied artificial intelligence and other new technologies to fault diagnosis, which provides new ideas for research on mine hoist fault diagnosis methods. To improve mine hoist safety and to prevent the crash of a cage at the shaft boundaries, Giraud et al. [2] used a fault tree to analyze the accident scenarios of a cage crash in a shaft and proposed two generic fault trees. Li et al. [3] introduced the support vector machine into hoist brake system fault diagnosis, which greatly improved the efficiency of diagnosis. Lei [4] put forward a fault diagnosis classification method based on SOM (Self-Organizing Map), which successfully achieved the first level of diagnosis.
Effective fault diagnosis presupposes a large amount of diagnostic knowledge. At present, however, fault diagnosis of the brake systems of large-scale complex mechanical and electrical equipment mainly depends on traditional fault diagnosis expert systems and the experience of field technicians, which does not make full use of existing fault diagnosis knowledge and monitoring data. The stability and correctness of the diagnosis results rest only on successful diagnosis examples and experience, so diagnosis decisions lack a scientific theoretical basis, diagnostic efficiency is low, and knowledge management lags behind. To make full use of on-site monitoring data and expert experience, the authors of this study used knowledge engineering, multi-source information fusion, evidence theory, and other techniques to optimize the combination and extraction of this information and to generate diagnostic rules, which provide a reliable basis for fault warning and diagnosis [5][6][7][8]. This has laid the foundation for research on this topic.
With the continuous improvement of intelligent sensors and monitoring technology, and with the growing number of measuring points and higher sampling frequencies, the condition monitoring data of hoisting equipment are characterized by large volume, high dimensionality, and redundant attributes. Existing diagnostic methods can no longer meet the processing needs of this rapid data growth. Machine learning has developed into one of the most efficient data processing approaches in the era of big data. It studies fault characteristics on the basis of data: by collecting, analyzing, and processing the running data of the equipment, it extracts the useful characteristic data to diagnose system faults, and it offers notable advantages in fault classification in particular [9]. The commonly used machine learning classification algorithms are the decision tree, Bayesian classification, and the support vector machine. Liu Tao et al. [10] used the decision tree construction method to design a fault alarm system, which addressed the passive fault detection, inaccurate fault data, and detection delay of mine hoists. Based on Bayesian theory, Li Juanli et al. [11] conducted uncertainty reasoning on mine hoist faults and obtained better fault recognition results. Vernekar et al. [12] used the support vector machine as a classifier to diagnose rolling bearings, which demonstrated the value of machine learning technology in fault diagnosis. Yao Dechen et al. [13] used an optimized support vector machine to diagnose train bearing faults; this method can accurately identify the bearing fault type and improve the classification accuracy. Among the above methods, Bayesian classification is mainly used to process nominal data. In this study, most of the data obtained from the various attributes of the hoist are numerical, so Bayesian classification is not selected. The original support vector machine classifier is suitable only for binary classification; applying it to the hoist brake system, which has various failure modes, requires modified algorithms that increase the amount of calculation, so it is not well suited here. In contrast, the decision tree classification method can process sample sets containing both discrete and continuous attributes, and its results are easy to convert into classification rules. It can also mitigate over-fitting through pruning, which makes it easy to apply to practical work [14][15][16][17][18]. Therefore, this study chooses the decision tree classification method to study the relationship between data and faults in the monitoring system.
Appl. Sci. 2020, 10, 1768

Architecture of Mine Hoist Fault Diagnosis System Based on Machine Learning
According to the needs of fault diagnosis for the hoist brake system, this study established a general machine-learning-based fault diagnosis framework for the mine hoist brake system, which is shown in Figure 2.
(1) Data acquisition: The data are mainly obtained from the monitoring system's tracking of the operating state of the hoist brake system and from the historical diagnostic knowledge base of the brake system. These data are large in quantity, high in dimension, and rich in content, and together they make up massive, heterogeneous data. The collected data are stored in a SQL Server database. The filtering and sorting functions of the database meet the low-cost storage requirements of massive and diverse data, and the data suitable for analysis can be filtered and integrated.
(2) Data processing: The collected fault data of the hoist brake system are processed: fault features are extracted, the data are denoised, and missing values and continuous values in the sample data are handled. Finally, continuous values are discretized with the K-means clustering algorithm in SPSS.
(3) Model training: The C4.5 decision tree algorithm is used to classify the sample data. The Kendall concordance coefficient is added to improve the algorithm, and the improved C4.5 decision tree algorithm is used to train the sample data, which yields the decision tree classification model.
(4) Model application: The trained model generates fault diagnosis rules and judges the state of the brake system by classifying the data. The results are then applied to the fault diagnosis of the hoist brake system.

Data Acquisition
The brake system of the hoist includes disc brakes and hydraulic stations. The hoist brake system used in the laboratory has a brake disc on each side of the drum, eight brake shoes, and two hydraulic pipes. In the laboratory, displacement sensors, temperature sensors, pressure sensors, and other sensors are used to monitor the key parameters of the brake system directly or to calculate them indirectly. The specific monitoring parameters are shown in Table 1. After the monitoring data are collected, the SPSS data pretreatment method described in the next subsection is used to analyze and process the fault eigenvalues, which reduces the data dimension and the computational workload.

SPSS-Based Data Pretreatment
SPSS is one of the current mainstream data analysis software packages and can quickly analyze the hoist operating state information contained in the data. It can be applied to the processing and analysis of large amounts of data, providing a reliable scientific basis for analysts to make relevant decisions [19]. In the pretreatment of the brake system monitoring data, the relationship between the monitoring attributes is analyzed with SPSS, which gives some guidance for selecting the characteristic attributes in the decision tree. The main treatment methods are as follows:
(1) Parameter test and interval estimation of data sets
To estimate trends under the same working conditions, confidence interval estimation is usually adopted. Both the main effect test and confidence interval estimation can be carried out conveniently in SPSS. For example, by checking the main effects between brake shoe clearance and the lifting velocity, hydraulic station oil temperature, and hydraulic station hydraulic pressure, it can be concluded that the brake shoe clearance has a significant influence on the lifting speed, as shown in Table 2.
(2) Correlation analysis and regression analysis
In data analysis, the correlation between parameters, which mainly refers to the relationship between attributes, is usually analyzed. For instance, when dealing with missing values or predicting faults, models can be established under specific conditions, or missing values can be statistically compensated. For the multi-attribute datasets found in fault data, correlation analysis can establish correlations between attributes and support estimates of future data trends. Regression analysis mainly aims to predict the trend of future variables by fitting the data over time and to establish a functional relationship between variables.
According to the analysis in SPSS, the lifting velocity and the acceleration are linearly correlated, which can be used for dimensionality reduction. However, there is no correlation between the clearances of the individual brake shoes. Therefore, this study analyzes only the fault of a single brake shoe, using data collected during the test.
(3) Cluster analysis
Cluster analysis groups variables with high similarity into one class by calculating the affinity between variables and displays the classification results in different ways. Since SPSS has an automatically generated cluster analysis model, the system itself can standardize the data, which solves the problem of normalizing and standardizing data. Cluster analysis can also discretize data. In this study, SPSS is used to discretize continuous attributes by K-means clustering [20], as shown in the fault diagnosis experiment in Section 5.2.
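The discretization step in item (3) can be illustrated with a small sketch. The study performs this step in SPSS; the code below is only a pure-Python stand-in, assuming a one-dimensional Lloyd-style K-means with a deterministic start. The function names `kmeans_1d` and `discretize` and the clearance readings are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def kmeans_1d(values: Sequence[float], k: int, iters: int = 100) -> List[float]:
    """Return k cluster centres for 1-D data using Lloyd's algorithm."""
    ordered = sorted(values)
    # Deterministic start (an assumption of this sketch): spread the
    # initial centres evenly over the sorted data.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centres[j]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous centre.
        updated = [sum(g) / len(g) if g else centres[j]
                   for j, g in enumerate(groups)]
        if updated == centres:
            break
        centres = updated
    return sorted(centres)


def discretize(values: Sequence[float], centres: Sequence[float]) -> List[int]:
    """Map each value to the 1-based index of its nearest centre."""
    return [min(range(len(centres)), key=lambda j: abs(v - centres[j])) + 1
            for v in values]


# Hypothetical brake shoe clearance readings in mm (not data from the study).
clearance = [0.4, 0.5, 0.45, 1.0, 1.1, 1.05, 2.0, 2.1, 1.95]
centres = kmeans_1d(clearance, k=3)
levels = discretize(clearance, centres)
print(levels)
```

Each continuous reading is replaced by the index of its cluster, which gives the kind of discrete attribute values a decision tree can split on directly.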

Decision Tree Classification Method
The commonly used decision tree classification algorithms are ID3 and C4.5. The ID3 algorithm uses the information gain criterion to select feature attributes and constructs the decision tree layer by layer. C4.5 is an improvement on ID3. Besides providing the functions of ID3, C4.5 has the following advantages [21]: (1) it uses the information gain rate as the basis for dividing attributes, so the classification is more reasonable when sample subsets are small; (2) it can process training samples that contain both discrete and continuous data, as well as training samples with missing attribute values; (3) it uses multiple pruning methods to avoid redundant rules; (4) its rules are easier to interpret and more accurate. Therefore, this study focuses on an in-depth analysis of the C4.5 algorithm.
The core idea of C4.5 is as follows. Given a training dataset $T$, the attribute with the largest information gain rate is selected as the dividing node when the decision tree is constructed, and $T$ is divided into $n$ subsets according to the values of that attribute. If all elements of a subset $T_i$ belong to the same class, the node becomes a leaf node and partitioning stops. If the subset still contains elements of different classes, it is divided recursively in the same way until all elements of every subset belong to the same category. At that point the nodes are no longer divided and the tree is complete [22][23][24]. The specific process is as follows.
Let $D$ be the training dataset and $|D|$ its sample size. Let $K$ be a natural number and, for all $k = 1, 2, \ldots, K$, let $C_k$ be a class. Let $|C_k|$ be the number of samples that belong to class $C_k$, so that $\sum_{k=1}^{K} |C_k| = |D|$. Suppose characteristic $A$ has $n$ different values $\{a_1, a_2, \ldots, a_n\}$. According to the value of characteristic $A$, $D$ can be divided into $n$ subsets $D_1, D_2, \ldots, D_n$, where, for each $i = 1, 2, \ldots, n$, $|D_i|$ is the sample size of $D_i$ and $\sum_{i=1}^{n} |D_i| = |D|$. Denote the set of samples in subset $D_i$ that belong to class $C_k$ as $D_{ik}$, namely $D_{ik} = D_i \cap C_k$, and let $|D_{ik}|$ be the sample size of $D_{ik}$.
Step 1: Calculate the empirical entropy $H(D)$:
$$H(D) = -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|} \tag{1}$$
Step 2: Calculate the empirical conditional entropy $H(D|A)$:
$$H(D|A) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} H(D_i) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \sum_{k=1}^{K} \frac{|D_{ik}|}{|D_i|} \log_2 \frac{|D_{ik}|}{|D_i|} \tag{2}$$
Step 3: Calculate the information gain:
$$g(D, A) = H(D) - H(D|A) \tag{3}$$
Step 4: Calculate the information gain rate:
$$g_R(D, A) = \frac{g(D, A)}{H_A(D)} \tag{4}$$
where
$$H_A(D) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \log_2 \frac{|D_i|}{|D|} \tag{5}$$
and $n$ is the number of values of characteristic $A$.
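Steps 1 to 4 can be sketched in a few lines of Python. This is a minimal illustration of the standard C4.5 quantities (empirical entropy, conditional entropy, information gain, and information gain rate) for discrete attributes, not the code used in this study; the function names and the toy data are assumptions made for the example.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Empirical entropy H(D) of a sequence of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())


def conditional_entropy(feature: Sequence[Hashable],
                        labels: Sequence[Hashable]) -> float:
    """Empirical conditional entropy H(D|A): weighted entropy of each subset D_i."""
    total = len(labels)
    subsets: Dict[Hashable, List[Hashable]] = {}
    for a, y in zip(feature, labels):
        subsets.setdefault(a, []).append(y)
    return sum(len(s) / total * entropy(s) for s in subsets.values())


def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain rate: gain divided by the entropy of the attribute itself."""
    gain = entropy(labels) - conditional_entropy(feature, labels)
    split_info = entropy(feature)  # H_A(D), the entropy of the attribute values
    # An attribute with a single value cannot split the data.
    return gain / split_info if split_info > 0 else 0.0


# Toy data (hypothetical): attribute A separates the classes, attribute B does not.
y = [0, 0, 1, 1]
attributes = {"A": [1, 1, 2, 2], "B": [1, 2, 1, 2]}
ratios = {name: gain_ratio(col, y) for name, col in attributes.items()}
best = max(ratios, key=ratios.get)
print(ratios, best)
```

The attribute with the largest gain rate (`best`) would become the dividing node; the same selection is then repeated on each subset.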

C4.5 Algorithm Improvement
The structure of the hoist brake system is complex and a failure often has more than one cause, so the various failure factors should all be taken into consideration. It is also worth noting that the relationship between the failure factors often has a significant impact on the fault. Therefore, in this study, the Kendall concordance coefficient is introduced at the step where the decision tree selects the optimal partition attribute, so that the correlation between attributes is considered. In this way, more accurate classification rules can be obtained while the advantages of the C4.5 algorithm are kept.

Kendall Concordance Coefficient
The Kendall concordance coefficient can be used to calculate the degree of correlation between multiple ranked variables. Let $T$ represent the data set, consisting of $N$ attribute variables $X$ and $K$ decision variables $Y$. Let $B_k$ ($k = 1, 2, \ldots, N$) be the values of attribute $X$ and $R_i$ ($i = 1, 2, \ldots, K$) be the sum of each row of $B_k$. Here, $W$ denotes the Kendall concordance coefficient.
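If the values in each attribute variable are all different, the calculation formula, in its standard form, is:
$$W = \frac{\sum_{i=1}^{K} R_i^2 - \frac{1}{K}\left(\sum_{i=1}^{K} R_i\right)^2}{\frac{1}{12} N^2 \left(K^3 - K\right)} \tag{6}$$
If the $k$-th attribute variable contains $m_k$ groups of equal values, and the $j$-th group contains $n_{kj}$ equal values, the calculation formula is:
$$W = \frac{\sum_{i=1}^{K} R_i^2 - \frac{1}{K}\left(\sum_{i=1}^{K} R_i\right)^2}{\frac{1}{12} N^2 \left(K^3 - K\right) - N \sum_{k=1}^{N} T_k} \tag{7}$$
where
$$T_k = \sum_{j=1}^{m_k} \frac{n_{kj}^3 - n_{kj}}{12} \tag{8}$$
Because the numerator is a sum of squared deviations of the $R_i$ from their mean, the Kendall concordance coefficient $W$ takes values between 0 and 1. If $W = 1$, there is a fully consistent correlation between the variables. If $W = 0$, there is no correlation between the variables, namely, they are independent of each other.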
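As a minimal sketch, the coefficient can be computed as follows, assuming the standard textbook form of Kendall's W with the usual tie correction and assuming that each attribute variable is supplied as a vector of ranks (tied values carrying average ranks). The function name `kendall_w` and the rank vectors are hypothetical and are not taken from the study.

```python
from collections import Counter
from typing import Sequence


def kendall_w(ranks: Sequence[Sequence[float]]) -> float:
    """Kendall's W for N rank vectors (rows) over K objects (columns).

    Assumes each row already holds ranks; tied values carry average ranks.
    The tie term is zero when a row has no ties.
    """
    n_vars = len(ranks)      # N: number of ranked variables
    n_objs = len(ranks[0])   # K: number of ranked objects
    rank_sums = [sum(row[i] for row in ranks) for i in range(n_objs)]
    # Sum of squared deviations of the rank sums from their mean.
    s = sum(r * r for r in rank_sums) - sum(rank_sums) ** 2 / n_objs
    # Tie correction: (t^3 - t) / 12 for every group of t equal ranks.
    ties = sum((t ** 3 - t) / 12
               for row in ranks for t in Counter(row).values())
    denom = n_vars ** 2 * (n_objs ** 3 - n_objs) / 12 - n_vars * ties
    # Degenerate case (every row fully tied): W is undefined, return 0.0 here.
    return s / denom if denom else 0.0


# Hypothetical rank vectors, for illustration only.
w_agree = kendall_w([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
w_oppose = kendall_w([[1, 2, 3], [3, 2, 1]])
w_tied = kendall_w([[1.5, 1.5, 3], [1.5, 1.5, 3]])
print(w_agree, w_oppose, w_tied)
```

Identical rankings give W = 1, while two exactly reversed rankings give W = 0, since the rank sums of all objects are then equal.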

Algorithm Optimization
Introduce the Kendall concordance coefficient W into the C4.5 algorithm and simplify the formula. The process is: Step 1: Calculate the correlation coefficient W using formula (6) or (7).

Algorithm Implementation
The calculation described below details the implementation of the K_C4.5 algorithm optimization. Table 3 is the sample data set. As can be seen from Table 3, the condition attributes A, B, and D each have three attribute values (1, 2, 3), and C has two attribute values (1, 2). The decision attribute F has two attribute values (0, 1). According to the attribute values in the table, the coefficient W is first calculated, and then the gain rate of each attribute is calculated with formula (12). After that, the decision tree is constructed according to the gain rate. The detailed steps are as follows: First, each attribute is evaluated according to the rating method. Assuming that the grade of 0 in the decision attribute F is higher than that of 1, the probabilities of the three attribute values in A are obtained as $P(A_1) = \frac{1}{6}$, $P(A_2) = \frac{1}{2}$, and $P(A_3) = \frac{1}{3}$, so $P(A_2) > P(A_3) > P(A_1)$. Similarly, we can get $P(B_2) > P(B_3) > P(B_1)$ in attribute B, $P(C_2) > P(C_1)$ in attribute C, and $P(D_1) > P(D_2) > P(D_3)$ in attribute D. The results of the grade evaluation are shown in Table 4.

Experimental Verification
In this study, the 2JTP-1.2 hoist in the laboratory is used as the test object to test and verify the diagnostic model generated by the algorithm. Fault data are collected through simulated fault tests, and the diagnostic rules are then used for fault diagnosis and prediction. The improved algorithm is then compared with the original. The algorithm is implemented in Python [25,26].

Fault Simulation
Since faults cannot be deliberately set in actual production, a test rig is set up in the laboratory to perform fault simulation tests, and the monitoring data are collected in the fault state. In this test, the parameters of the brake system are adjusted to simulate the individual faults and mixed faults of the brake system, and the correctness of the diagnosis is verified. These parameters include brake disc swing offset, brake shoe clearance, and hydraulic station residual pressure. The specific fault simulation method is as follows [27]:
(1) Change brake shoe clearance test
Step 1: Start the hydraulic pump and turn the switch to the rope adjustment indicator so that the hand brake is in the released state. At this moment, the brake oil pressure of the brake system is 4.8 MPa.
Step 2: Take out the hexagon socket head bolts in the center of the brake disc and loosen the larger bolts in the center of the brake disc. Carry out the same operation on the other brake discs, keeping the number of loosening turns consistent for the bolts on the same side.
Step 3: Turn the switch to the normal indication and run the hoist, then collect the data.
(2) Change brake disc swing offset test
Step 1: Start the hydraulic pump when the hoist is stopped. Then turn the switch to the rope adjustment indicator to ensure that the hand brake is in the fully released state. Thereafter, turn off the hand brake switch on one side of the oil circuit. At this time, the brake on that oil circuit is in the fully released state and does not participate in braking the hoist.
Step 2: Adjust the brake shoe clearance of the oil circuit brake on the other side.
Step 3: Turn the switch to the normal indication and run the hoist. Then brake, and repeat the brake operation. After this process, collect the data.
(3) Change hydraulic station residual pressure test
Step 1: Start the hydraulic pump with the hoist stopped and turn the switch to the rope adjustment indication so that the hand brake is in the tight state. The oil pressure at this stage is 0.2 MPa.
Step 2: Adjust the screw on the far-right side of the oil pressure gauge so that the residual pressure of the system reaches 0.3-1 MPa. Finally, collect the data to observe the brake effect.
Combined with the test conditions, the states simulated by adjusting the above parameters include: brake shoe clearance too small, brake disc overheating, idle motion time too long, emergency brake fault, residual pressure too large, disc spring fault, and normal operation. The fault types are numbered as shown in Table 5.

Fault Diagnosis
The data used in the test include not only normal operating data but also data for various brake system faults, and the data set used to train the model is extracted from the historical monitoring database of the hoist brake system. The decision tree algorithm constructs the model by mining the hidden fault patterns in the historical monitoring data of the hoist brake system and extracting diagnostic rules that provide the basis for hoist fault diagnosis.
The redundant data are removed by the SPSS analysis and pretreatment described above, and the relevant feature data are retained for diagnosing the above-mentioned faults. The fault analysis test data mainly include X1 (brake shoe clearance), X2 (hydraulic station residual pressure), X3 (the contact area between disc and brake shoe), X4 (brake disc swing offset), and other characteristic attributes. Every fault is simulated and the test data are collected. Some of the data are displayed in Table 6. Taking K = 4 (the number of conditional attributes), SPSS is used to perform K-means clustering discretization. The results are displayed in Table 7. After the Table 7 data set is imported into the K_C4.5 algorithm, the decision tree is generated in Python. The results are shown in Figure 3. After the Python program produces the classification model of the K_C4.5 decision tree, the diagnostic rules are generated. Table 8 shows the results.
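The step from a trained decision tree to diagnostic rules can be sketched as a walk over every root-to-leaf path. The code below is only an illustration under stated assumptions: the tree is stored as nested tuples and dictionaries, the attribute names X1 and X2 follow the text, and the discretized values and fault labels F0 to F2 are hypothetical placeholders rather than the contents of Figure 3 or Table 8.

```python
from typing import Dict, List, Tuple, Union

# A tree is either a leaf (fault label) or (attribute name, {value: subtree}).
Tree = Union[str, Tuple[str, Dict[int, "Tree"]]]


def extract_rules(tree: Tree, path: Tuple[str, ...] = ()) -> List[str]:
    """Turn every root-to-leaf path of the tree into one IF-THEN rule."""
    if isinstance(tree, str):
        condition = " AND ".join(path) if path else "TRUE"
        return [f"IF {condition} THEN fault = {tree}"]
    attribute, branches = tree
    rules: List[str] = []
    for value, subtree in sorted(branches.items()):
        rules.extend(extract_rules(subtree, path + (f"{attribute} = {value}",)))
    return rules


# Hypothetical toy tree; values and fault labels are placeholders.
toy_tree: Tree = ("X1", {
    1: "F1",
    2: ("X2", {1: "F0", 2: "F2"}),
})
rules = extract_rules(toy_tree)
for rule in rules:
    print(rule)
```

Each leaf yields exactly one rule, so the number of decision rules equals the number of leaves of the tree.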

Result Analysis
(1) Classification accuracy test
The authors collected 100 sets of data covering the various faults to test the above diagnostic rules. The results of the tests appear in Table 9. The classification accuracy reaches 95.85%, and the model meets the classification accuracy requirements.
(2) Evaluation index analysis before and after algorithm improvement
This study uses the following evaluation indicators to test the performance of the algorithm before and after improvement: decision tree size, number of decision rules, tree building time, correct classification percentage, and difference degree (Kappa statistic). Among them, the decision tree size refers to the total number of nodes generated, the number of decision rules refers to the number of diagnosis rules finally generated, and the Kappa statistic K is used to evaluate the difference between the classification result of the classifier and random classification. K = 1 indicates that the classifier is completely different from random classification. K = 0 indicates that the classifier is the same as random classification and has no classification effect. The closer the value is to 1, the better. The results of the tests appear in Table 10.
(3) Accuracy test before and after algorithm improvement in big data samples
To better test the change in accuracy of the C4.5 algorithm before and after improvement, this study observes the change of accuracy as the number of samples increases (taking 1000 samples). The samples are classified according to the diagnostic rules generated by the K_C4.5 algorithm and by the original C4.5 algorithm. The accuracy of the final test results is shown in Figure 4. Figure 4 shows that the accuracy of the two algorithms is similar when the number of samples is small, but as the number of samples increases, the accuracy of the K_C4.5 algorithm becomes significantly higher than that of the original C4.5 algorithm and gradually stabilizes at a higher level.
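Two of the indicators in item (2), the correct classification percentage and the Kappa statistic, can be computed from actual and predicted labels as sketched below. The sketch assumes the Kappa statistic is Cohen's kappa in its usual form, (p_o - p_e) / (1 - p_e); the function name and the label sequences are hypothetical and are not results from this study.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def accuracy_and_kappa(actual: Sequence[Hashable],
                       predicted: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (fraction correctly classified, Cohen's kappa)."""
    n = len(actual)
    # p_o: observed agreement between predictions and true labels.
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # p_e: agreement expected if predictions were assigned at random
    # with the same class frequencies.
    expected = sum(actual_counts[c] * predicted_counts[c]
                   for c in actual_counts) / (n * n)
    # Kappa is undefined when p_e = 1; this sketch returns 0.0 in that case.
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 0.0
    return observed, kappa


# Hypothetical fault labels, for illustration only.
actual = [0, 0, 1, 1, 2, 2]
predicted = [0, 0, 1, 1, 2, 0]
acc, kappa = accuracy_and_kappa(actual, predicted)
print(round(acc, 4), round(kappa, 4))
```

A kappa near 1 means the agreement is far above what chance alone would produce, which matches the reading of the statistic given in item (2).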

Conclusions
(1) A fault diagnosis method for mine hoist brake systems based on machine learning is proposed, and the corresponding fault diagnosis model is established. This method is data-driven and can meet the needs of fault diagnosis in a big data environment.
(2) A dynamic decision-making model based on the C4.5 decision tree is established. The model takes the information gain rate as the basis of attribute partition and can deal with training sets that have small sample subsets and both discrete and continuous data. It can also deal with training samples with missing attributes. In addition, the model considers the correlation between attributes and introduces the Kendall concordance coefficient into the process of establishing a decision tree, so that more accurate classification rules can be obtained.
(3) Through the fault simulation and comparison tests, the validity and diagnostic accuracy of the diagnostic method are verified. The improved C4.5 decision tree classification algorithm can effectively improve diagnostic efficiency and reliability.
(4) This study focuses on the application of the decision tree algorithm to the fault diagnosis of the hoist brake system. The application of other machine learning algorithms to fault diagnosis can be studied further, and in particular the improvement of the algorithm needs further exploration to bring it more in line with the data characteristics of the hoist, which will become the focus of the author's next research.