Predictive Data Mining Techniques for Fault Diagnosis of Electric Equipment: A Review

Abstract: Data mining is a technological and scientific field that has gained importance in many areas over the years, attracting scientists, developers, and researchers around the world. This interest derives from its remarkable benefits, such as the exploitation of large databases and the intelligent use of the information extracted from them through the analysis and discovery of knowledge. This document provides a review of the predictive data mining techniques used for the diagnosis and detection of faults in electric equipment, which constitutes a pillar of any industrialized country. A review of the methods used in classification and regression tasks for the diagnosis of electric equipment, covering the period from the year 2000 to the present, is carried out. Current research on data mining techniques is also listed and discussed according to the results obtained by different authors.


Introduction
Over the past few years, the number and diversity of electrical equipment, such as motors, transformers, generators, electric vehicles, and energy transmission and distribution systems, among many others, have grown considerably [1][2][3][4][5][6]. This rapid growth is due to people's need to perform many different activities, ranging from industrial processes to everyday activities such as charging a cell phone battery or starting the car to go to work. Because of the paramount importance of this equipment in every facet of society, its safety and correct operation are vital, even more so considering that a failure in one of its components can produce 1) high economic losses derived from its partial or total repair, 2) degraded performance, 3) outages in the production process, 4) damage to other equipment, and 5) conditions that put the physical integrity of people at risk, among others.
In this regard, the application and development of new techniques and methods to monitor the condition of electric machines and systems are important topics of research. In general, a condition monitoring strategy consists of the following steps (see Figure 1): data collection through different types of sensors, data processing and feature extraction, and data analysis for condition assessment. The latter can be seen as the process of exploring, finding, selecting, and using specific data to solve the given problem, e.g., a diagnosis problem. However, it is not an easy or straightforward process, since the data analyst has to deal with different volumes and varieties of data, as well as redundant and unneeded data, which can compromise and complicate the solution of the assigned task. In fact, in many cases only a small part of the dataset is used because its volume is simply too large to be processed effectively. One solution to this problem has been the use of data mining (DM) techniques. DM is one of the fastest growing fields at both the computational and industrial levels. Its main characteristic is the search for patterns across different sets of data in order to discover the available knowledge. Kantardzic [7] defines DM as the process of applying a computer-based methodology for discovering knowledge from data. Although DM is based on computational algorithms, the best results can be obtained by balancing the knowledge of human experts about the problem under study with the advantages and operating modes of different algorithms [4]. In general, DM functionalities can be divided into two categories: predictive and descriptive. The former is used to construct models that allow the prediction of unknown or future values, whereas the latter is in charge of finding new information that allows the description of the dataset. 
In this regard, the prediction functionalities become the most suitable option to perform condition monitoring, since a new and unknown equipment condition can be determined or predicted from specific input information. Therefore, this manuscript is aimed at reviewing the classification and regression tasks that fall within the predictive category of DM, as well as hybrid techniques that combine more than one prediction method. Specifically, classification techniques attempt to find a function or model that distinguishes or predicts the class of unknown data by analyzing a training dataset [8]. Regression analysis is used for numerical prediction, i.e., to predict missing or new numerical data values [8].
In the literature, two main groups of research works related to DM and electric equipment are found. On the one hand, there are different reviews about DM applications, e.g., diagnosis in health [9][10][11], marketing [12], industrial [13,14], climatological [15], and financial [16] issues, among others. On the other hand, there are also reviews related to diagnosis methods for specific machines such as transformers [17][18][19], estimation strategies in electric vehicles [20], mathematical models used to study induction motors in defective conditions [21], or, in a more general sense, methodologies of fault classification in transmission systems [22,23] and distribution of energy [24]. There is also the work of Hare et al. [25], where they present a study of modern diagnosis methods in smart micro grids. Although there are specialized reviews on topics of either DM or electric equipment and systems, none of these works have been specifically focused on reviewing the research that has been carried out about the applications of DM techniques for condition monitoring of electric equipment and systems, which is very important in order to highlight the algorithms that have been used in specific equipment but can be applied to other machines since the application core is similar. In this regard, this manuscript provides a review of DM techniques focused particularly on the tasks of classification and regression within the category of predictive analysis applied to various electric machines and systems such as transformers, electric vehicles, heating, ventilation, and air conditioning (HVAC) systems, airplane, automotive, three-phase and multi-phase induction motors, centrifugal pumps, generators, distribution systems, and transmission lines, among others.
The rest of this manuscript is prepared as follows. Section 2 deals with the classification, regression, and hybrid techniques used for the detection of faults and the diagnosis of electrical systems. In Section 3, recent research works on these topics and the latest contributions on DM techniques that can be explored in fault diagnosis methodologies are presented. Finally, Section 4 shows the conclusions of this work.

Predictive Model
DM tasks can be conducted through prediction and description models [7,8] (see Figure 2a). In general, the prediction models are constructed through the learning of known data classes, whereas the description models arise from the findings obtained in a dataset [8]. In this regard, predictive DM techniques are the straightforward option to perform the diagnosis of equipment and systems since their different operating conditions can be learned and determined by a prediction model. In this model (see Figure 2b), its learning is carried out by means of the analysis of a data training set (input data with known outputs) and then used to predict the unknown output (class or value) of new input data. According to the nature of data (discrete or continuous), the prediction model can be used to perform either classification or regression functionalities [7] (see Figure 2a). A classification procedure consists of the assignment of an object (unknown class) into one of several predefined classes (predicted class) according to its properties or features. In a different way, a regression procedure involves the modeling of continuous functions to determine new numerical values (predicted values) according to specific inputs. Classification and regression techniques used for fault detection in electric equipment and systems are presented in the following sections.
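The distinction between the two predictive functionalities can be illustrated with a minimal sketch. It is not taken from any of the reviewed works: the nearest-centroid classifier, the least-squares line, and all of the numerical values (current, vibration, load, and temperature readings) are assumptions chosen only to show how a model learned from a training set with known outputs is then applied to new input data.

```python
from statistics import mean

def fit_nearest_centroid(samples, labels):
    """Learn one centroid (mean feature vector) per known class."""
    grouped = {}
    for x, y in zip(samples, labels):
        grouped.setdefault(y, []).append(x)
    return {y: [mean(col) for col in zip(*xs)] for y, xs in grouped.items()}

def predict_class(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: dist2(centroids[y]))

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Classification: hypothetical (current RMS in A, vibration RMS in mm/s) -> condition.
train_x = [(10.0, 1.0), (10.4, 1.2), (9.8, 0.9), (14.9, 4.1), (15.3, 3.8), (15.1, 4.4)]
train_y = ["healthy", "healthy", "healthy", "faulty", "faulty", "faulty"]
centroids = fit_nearest_centroid(train_x, train_y)
predicted_condition = predict_class(centroids, (14.5, 3.5))  # discrete output (class)

# Regression: hypothetical load (%) -> winding temperature (deg C).
load = [20.0, 40.0, 60.0, 80.0]
temp = [45.0, 55.0, 65.0, 75.0]
slope, intercept = fit_line(load, temp)
predicted_temp = slope * 100.0 + intercept  # continuous output (value)
```

In both cases the training stage and the prediction stage are separate, which mirrors the structure of the prediction model described above: only the type of output (class or numerical value) differs.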

Classification-Based Methods
In a condition monitoring context, a classification model can be constructed from a given system and used to provide warnings and predict certain failures in early stages. In this regard, researchers around the world have proposed and used different classification methods from machine learning, pattern recognition, and statistics to perform fault diagnosis.
Recent research on DM has focused on developing classification techniques capable of handling datasets with different features, e.g., imbalanced class proportions and large amounts of data. For the latter, this capacity is strongly required because, on the one hand, the availability of data is growing and, on the other hand, the performance of the classifiers can be compromised if limited datasets are analyzed. In fact, the amount of data available during the training of a neural network (NN) plays an important role in its performance. For instance, Taylor et al. [26] contrasted three different techniques: neural networks trained by using a hybrid of evolutionary search and backpropagation, neural networks trained by straightforward backpropagation, and simple predictive rulesets trained by evolutionary algorithms. Results indicate that evolved NNs outperform backpropagation-trained NNs. However, the results are slightly unsatisfactory from a business viewpoint, with a maximum accuracy of 77.9%, which can be expected given the small amount of training data and highlights the need for additional data to establish a better reference during the pattern recognition task. Fortunately, there are many works in which the authors also use NNs as the basis of their investigations and obtain promising results. In an energy consumption context, Magoulès et al. [27] diagnose different electrical equipment of an office building, including fans, pumps, cooling equipment, and chillers. They use a recursive deterministic perceptron NN to distinguish between normal and defective datasets, where an effectiveness percentage higher than 97% is obtained. Similarly, the use of NNs for fault detection in induction motors is presented in [28][29][30][31][32][33][34][35]. Tallam et al. [28] presented an on-line diagnostic scheme to alert the engine protection system of an incipient failure. 
This scheme consists of a feed-forward NN with a self-organized feature map to display the operating conditions of the in-test machine. An interesting feature offered by the results is that the method is not sensitive to unbalanced supply voltages or asymmetries in the machine. Martins et al. [29] use the alpha-beta stator currents of a three-phase induction motor as input variables to diagnose stator faults. In their proposal, an unsupervised Hebbian-based NN is used to extract the main components of the stator current data. Other proposals combine NNs with fuzzy logic systems (FLSs) to detect inter-turn faults [30,31]. In particular, Ballal et al. [31] developed an ANFIS (Adaptive Neural Fuzzy Inference System) for the detection of stator inter-turn insulation and bearing wear faults, where five input parameters, i.e., current, bearing temperature, winding temperature, speed, and the noise of the machine are used to construct the model. For the inter-turn insulation fault, they obtain an effectiveness of 94.03% using two inputs and 96.67% using five inputs. For the bearing wear fault, the accuracy rate with two inputs is 90.5%, and 98.7% with five inputs. These results demonstrate the importance of an information-rich dataset. In [32][33][34], several NNs are implemented in field programmable gate arrays (FPGAs) to diagnose different faults in induction motors. The diagnosis of broken rotor bars is presented by Zolfaghari et al. [35], where the multilayer perceptron NN used is able to detect the faults in the rotor with a classification effectiveness of 98.80%. Furthermore, modular NNs are used to diagnose transmission lines from the voltage and current signals of their elements (busses, transmission lines, and transformers). Given its modular nature, the diagnosis can be carried out by element, by area, or for the entire context of the electrical system [36]. 
In [37], adaptive linear neural networks and feed forward neural networks are combined to classify electrical disturbances that affect the electric equipment. The best classification results are obtained when only a single disturbance appears; when more disturbances are combined, the effectiveness is reduced, but it is worth noticing that the effectiveness percentage obtained exceeds 90% for a noiseless condition and exceeds 77% for a noisy condition in the presence of six combined disturbances. In addition, the overall methodology takes 46.5 milliseconds per half cycle analyzed. Hare et al. [25] present a survey for fault diagnostics in smart micro grids, in which they discuss the faults within various components of a micro network, e.g., photovoltaic panels, wind turbines, conventional generation systems, as well as cables and transmission lines, etc., where several classification algorithms such as NNs, decision trees, and FLSs, among others, are presented.
Regarding transformers, Rigatos and Siano [38] propose neural-fuzzy network modeling and the local statistical approach for the detection of incipient faults in power transformers. Another technique commonly used for the diagnosis of transformers is the decision tree [39][40][41]. Menezes et al. [39] and Han et al. [40] used experimental data from a dissolved gas analysis (DGA) to illustrate the performance of their decision tree-based models. In [39], a comparison between the method based on the C4.5 algorithm and other methods used in DGA is presented. Only 162 samples are used for the analysis, obtaining the following accuracies: 99.38% for the proposed method, 98.15% for the rules extracted, 88.03% for Duval Triangle, 63.25% for Dornenburg IEC C57.104, and 56.41% for Rogers IEC C57.104. In [40], a C4.5 decision tree algorithm obtained an effectiveness of 86% for a thermoelectric fault in an oil-immersed transformer. Samantaray and Dash [41] analyze the current of a power transformer to discriminate between the current signals generated by the inrush effect and the ones generated by its internal faults. The processing time of the proposed approach is 0.12 s, and it provides an accuracy greater than 96%, exceeding the accuracy of the support vector machine (93.33%). As can be noted, the type of variable to be analyzed by decision tree-based methods is not restricted; in fact, the use of vibration signals for the diagnosis of faults in monoblock centrifugal pumps [42] and internal combustion engines [43] is also presented. The latter also compares the classification accuracy obtained by the J48 algorithm, random forest tree algorithm, linear model tree algorithm, best first tree algorithm, and functional tree algorithm, where the linear model tree algorithm provides the best results, offering a classification accuracy of 100% using statistical features. 
In general, it can be noted that decision tree algorithms are a practical, economical, and very effective approach. In addition to these different types of decision trees, the fault tree is another alternative used for the diagnosis of systems. For example, Volkanovski et al. [44] evaluate the reliability of a power system for energy delivery by constructing a fault tree structure, which represents the system configuration and includes all the possible flow routes of interruption of the power supply from the generators to the loads, including energy transfer limitations, common cause failure of power lines, energy flows and the capacity of generators, and loads in the power system. Duan and Zhou [45] also use fault tree analysis and Bayesian networks for fault detection in an oil pressure warning system of an aircraft engine, where a diagnostic decision tree is obtained to guide maintenance personnel toward more efficient decisions when attempting to repair the system. An advanced Bayesian non-linear state estimation technique called Unscented Kalman Filtering is presented by Bonvini et al. [46] to detect faults in HVAC (heating, ventilation, and air conditioning) components. This algorithm can detect common faults in a chiller plant and functional failures caused by problems in the compressor and occlusions in the valves with a computational performance of 0.25 s using Intel Xeon (R) 2.67 GHz-19 Cores and 0.52 s using Intel Core i7 2.8 GHz-1 Core. Another tree-based method is the tree-structured fault dependence kernel developed by Li et al. [47]. It implements a structured labeling to include dependency information and describe severity levels in a high-margin learning framework for fault detection of building cooling systems. It is important to highlight that the testing accuracy increases or decreases with the number of training samples. 
For instance, in [47], the testing accuracy of the proposed strategy increased from 69.64% (six training samples) to 99.12% (180 training samples). That is, accumulating more training data is beneficial for fault detection and diagnosis.
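To make the decision tree idea more concrete, the following minimal sketch shows how a single split on a continuous variable can be chosen by entropy reduction. It is an illustration only: it uses plain information gain, whereas C4.5 normalizes the gain by the split information (gain ratio), and the gas concentrations and labels are hypothetical values, not data from [39] or [40].

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, information gain) of the best split 'value <= threshold'."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for v, y in pairs if v <= thr]
        right = [y for v, y in pairs if v > thr]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best[1]:
            best = (thr, gain)
    return best

# Hypothetical dissolved-gas readings (ppm) with known transformer condition.
h2_ppm = [20, 35, 50, 60, 180, 220, 300, 410]
condition = ["normal"] * 4 + ["fault"] * 4
threshold, gain = best_threshold(h2_ppm, condition)
```

A full tree is obtained by applying the same selection recursively to each resulting subset, which is also why the learned rules remain easy to read as a chain of threshold tests.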
Another classification method that has been widely used is the FLS. In general, it uses knowledge-based reasoning to construct logical rules and, thus, diagnose faults. In this type of algorithm, the designer's knowledge about the in-test equipment, e.g., operating conditions, nominal parameters, overall performance, etc., plays a fundamental role. In [48], an FLS is designed to diagnose stator winding faults in induction motors. Similar results are obtained under noisy and noiseless conditions. Therefore, the FLS is a good option because there is no general and accurate analytical model that completely describes the induction motor under fault conditions, leaving the door open to uncertainties or noisy conditions. Amezquita-Sanchez et al. [49] present two FLSs to detect broken rotor bars (BRB) in both operating regimes, i.e., transient and stationary. The combination of fractal dimension analysis and the FL system proved to be highly effective in identifying half-BRB, one BRB, and two BRB, as well as the healthy condition, since an effectiveness of 95% and 100% is obtained for the start-up transient and the steady state, respectively. For transformers, Islam et al. [50] present the diagnosis of several transformer faults using dissolved gas in oil analysis (DGA) and an FLS for its interpretation. An overview of different FLSs for DGA is presented in [51], where it is indicated that no single technique can detect the full range of faults; therefore, the combination of different methods has to be explored as a promising solution. Although promising results have been obtained using FLSs, the superiority of an adaptive neuro-fuzzy inference system for DGA is reported in [52], with an accuracy of 98% over the 100 fault cases under study, while FL obtained 95%. Regarding other electric systems and equipment, the fault diagnosis of the power system using fuzzy logic is presented in [53]. 
An online monitoring system of voltage variations in electric systems is presented in [54], where an FLS is used to diagnose and classify instantaneously, i.e., sample to sample, the severity of the electric variation. The proposal is a suitable tool for analyzing stored data; furthermore, it provides phase information, unlike the conventional root mean square technique, and it gives results sample to sample, which is better for nonstationary signals. Lauro et al. [55] diagnose an electric fan coil, and Zio et al. [56] classify the faults of a steam generator of a pressurized water reactor. In the latter, a fuzzy clustering-based classification model is transformed into a fuzzy logic inference model, allowing its direct interpretation and inspection; also, improvements in the construction of the model are presented to allow the treatment of more complicated scenarios. Table 1 shows a summary of the works reviewed above, where the techniques used and their conventional applications, along with the physical variables that have been analyzed by them, are presented. As can be observed, NNs, decision trees, and FLSs are the most commonly used methods for fault detection. Although NNs can be more suitable for fault detection from a generalization viewpoint, decision trees have been preferred in many cases because of the clarity of their interpretation (human friendly) and their low computational burden, which are desirable features in online condition monitoring systems. Also, if the amount of data is limited, a simple decision tree can be used; yet, other aspects of such a small dataset have to be taken into account, for instance, redundancy of data, data imbalance, information contained, data type (continuous or discrete), range, time dependency, etc. 
Regarding the physical variable measured from the in-test equipment, current signals prove to be a powerful and representative source of information for fault detection. Although promising results are obtained, the combination of multiple physical variables, e.g., current and vibration signals, should be explored in order to improve the reliability of new classification schemes and expand the number of fault conditions that can be determined by a single classification algorithm. Such a combination exploits the information that each signal can provide, e.g., current signals can provide information to diagnose electrical faults and vibration signals can provide information to diagnose mechanical faults.
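As an illustration of how an FLS can grade a condition sample to sample, the sketch below assigns a severity label to each voltage deviation using triangular membership functions. It is a simplified assumption-based example and not the system of [54]: the membership breakpoints, the label names, and the use of the maximum membership degree as the decision rule are all hypothetical choices.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising linearly to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy sets over the absolute voltage deviation from nominal (%).
SEVERITY_SETS = {
    "low": (-1.0, 0.0, 10.0),
    "moderate": (5.0, 25.0, 45.0),
    "severe": (30.0, 100.0, 101.0),
}

def classify_deviation(deviation_pct):
    """Return (label, memberships) for one sample, clipping the input to [0, 100]."""
    x = min(max(abs(deviation_pct), 0.0), 100.0)
    memberships = {name: triangular(x, *abc) for name, abc in SEVERITY_SETS.items()}
    label = max(memberships, key=memberships.get)
    return label, memberships

# Sample-to-sample classification of a short, made-up sequence of deviations.
samples = [2.0, 30.0, 80.0]
labels = [classify_deviation(s)[0] for s in samples]
```

Because the rules and membership functions are written explicitly, the designer's knowledge of the equipment enters the model directly, which is the property emphasized in the works reviewed above.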

Regression-Based Methods
In general, the use of regression techniques consists of numerical prediction, i.e., a methodology to generate a mathematical function or model to predict missing or new numerical data values; but it also covers the identification of distribution trends based on the available data. For the latter, the support vector machine (SVM) has been widely used since a regression function is found from the training dataset.
Among the available research works, SVMs have been presented in the literature as one of the most promising methods to diagnose faults in power transformers [57]. Lv et al. [58] and Bacha et al. [59] implement SVM-based strategies to classify faults in power transformers by using the gases available from the DGA. Both works present an interesting performance comparison among different methods. In [58], five artificial intelligence methods are presented. It is found that the SVM is the most effective and fastest method, with an accuracy of 100% and a training time of less than 1 s, compared with the NN (92.76% accuracy and 81 s training time), the expert system (89.34% accuracy; training time not reported), FL (92.32% accuracy and 82 s training time), and the combined NN and expert system (93.54% accuracy and 44 s training time). In [59], the classification accuracy of FL (86.7%), multi-layer perceptron (80%), radial basis function (86.7%), and SVM (90%) is presented. In a similar vein, SVMs are also explored for the detection and localization of faults in transmission lines, where Johnson and Yadav [60] and Parikh et al. [61] conclude that SVMs are a highly accurate method for these tasks. Zhang et al. [62] present an SVM-based methodology for data-based line trip fault prediction in power systems, where long short-term memory (LSTM) networks are used to capture time series characteristics from multiple sources in large systems. The accuracy of the line trip fault prediction can reach about 97%. SVMs have also been employed in the diagnosis of induction motors, e.g., Gangsar and Tiwari [63] carry out a comparative investigation to predict mechanical and electrical faults in induction motors from the analysis of vibration and current signals and the use of multiclass SVM methods. Zhang et al. 
[64] propose a method based on the robust local linear embedding algorithm and an SVM for the diagnosis of gear faults in an experimental setup composed of a motor, a torque transducer/encoder, and a dynamometer. The diagnosis of fault severity in the stator winding of induction motors using an SVM in regression mode is presented by Das et al. [65]. In their research, they analyze the current signals for different levels of short circuit fault, different unbalance conditions in the voltage supply, and different load levels. In the methodology, they use recursive feature elimination to select the optimum number of features and an SVM as a load-immune classifier, demonstrating the high capabilities of SVMs. Other electric machines and systems where the SVM has been applied include heating, ventilation, and air conditioning (HVAC) systems [66,67] and the steam generator and pressure boundary of the Chinese CNP300 PWR (Qinshan I NPP) reactor coolant system [68]. For the latter, a specialized SVM module monitors the subunits of the reactor coolant system and is capable of making fault diagnosis at the component level. Finally, Lai et al. [69] investigate partial discharge activities for online monitoring of power equipment. They use a back-propagation NN, a self-organizing map, and an SVM for classification and comparison, concluding that the SVM is the best method in terms of classification accuracy and processing speed.
Some other approaches related to regression models include Poisson regression, least-square regression, and logistic regression [70]. Jena and Bhalja [71] use a logistic regression binary classifier to develop a new fault zone identification scheme for busbars, verified by modeling an existing power generation station in a design software package. The proposed scheme is able to identify the fault zone with an accuracy of 99% when it is tested on a large dataset (28,800 cases) by using a small training dataset (9600 cases). In the diagnosis of power systems, Xu and Chow [72] report the results obtained after using two different techniques, i.e., logistic regression and artificial NN, for the identification of the cause of faults in power distribution systems. Logistic regression is a parametric model that is rarely used in power system fault diagnosis, while the artificial NN is a nonparametric method that has been extensively used in this field. Logistic regression, as a conventional statistical method, has formalized models to exhibit the nonlinear relationship between the independent and dependent variables, while the artificial NN can increase its flexibility by including hidden layers, which is often regarded as a substantial advantage. They conclude that both can be easily implemented. As seen from the results, the artificial NN can achieve a higher balanced accuracy than logistic regression; however, logistic regression is much faster because the artificial NN requires a relatively long training time and cross-validation requires an even longer computation time. Regarding the linear regression-based methods, the work of Cha et al. [73] presents the diagnosis and detection of faults in the main engine of a space shuttle during a stable state. 
Within the automotive industry, Jiang and Yin [74] present a new design and implementation approach based on recursive total principal component regression for efficient data-driven fault detection in automobile cyber-physical systems. Meanwhile, Bolovinou et al. [75] solve the problem of predicting the distance that an electric vehicle can be driven before an energy recharge is required. The fact that the model is online implies that the prediction can be made at any distance traveled from the beginning of the trip, which is achieved through a regression analysis. Using least-squares linear regressions, Cappiello et al. [76] present a statistical model to predict the instantaneous emissions and fuel consumption of light-duty vehicles. Yu et al. [77] provide theoretical support for the prediction of faults in highway electromechanical equipment through a panel data model-based multi-factor predictive model. This model is characterized by a two-dimensional multivariate regression analysis based on individuals and time. Emphasizing the intelligent diagnosis of faults, the classification and regression tree (CART) is used by Gopinath et al. [78] as a back-end classifier to diagnose synchronous generators. The statistical characteristics of the frequency domain are extracted from the current signals of the in-test generators. According to the work presented by Bangura et al. [79], time-series DM can identify the hidden patterns and subtle differences between healthy performance signatures and several fault signatures for the diagnosis of eccentricities and bar/end-ring connector breakages in polyphase induction motors. In a more general scenario, Wang and Jiao [80] propose a method of quality-related failure prediction by constructing a total principal component regression model, which divides the space of the variables into two subspaces, only one of which is related to the quality fault. 
Table 2 summarizes the information reviewed above, where the effectiveness percentage of each method is also presented; from this information, it is evident that SVMs are one of the most used methods for fault detection. Many authors agree that SVMs are more robust than other algorithms and satisfy the minimization of structural risk; yet, their effectiveness relies on the features and preparation of the data. In addition, they have a high correct identification rate according to the reported effectiveness percentages. In several works, SVMs have presented a better performance than NNs. These works highlight that SVMs reach the global optimum in a more direct way, are less prone to overfitting, present a smaller computational model, etc. Similar to other algorithms such as NNs, once the training stage has been carried out, the computational time to perform an SVM-based diagnosis is relatively short, making it suitable for online and continuous diagnosis of electric equipment. Although in several works SVMs have presented a low computation cost/time, a fair comparison cannot be made unless aspects such as the effectiveness reached, overfitting issues, robustness, number of hidden layers and neurons per layer, number of nodes, model complexity, activation and kernel functions used, and training algorithms, among others, are taken into account.
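The logistic regression binary classifier and the balanced accuracy metric mentioned in this section can be sketched as follows. This is a minimal illustration under stated assumptions and not the scheme of [71] or [72]: the single standardized input feature, its values, the learning rate, and the number of iterations are hypothetical, and training uses plain batch gradient descent.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss for a single input feature."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

def predict(w, b, x):
    """Return 1 (fault) if the estimated probability is at least 0.5, else 0."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, which is insensitive to class imbalance."""
    recalls = []
    for cls in set(y_true):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        recalls.append(sum(1 for i in idx if y_pred[i] == cls) / len(idx))
    return sum(recalls) / len(recalls)

# Hypothetical standardized feature (e.g., a differential current) and fault labels.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(xs, ys)
train_pred = [predict(w, b, x) for x in xs]
score = balanced_accuracy(ys, train_pred)
```

The model has only two parameters here, which reflects why logistic regression trains much faster than an NN; the price is a fixed functional form between the input and the predicted probability.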

Hybrid Techniques
It is common to find research where the authors decide to use not only predictive techniques, but also to combine different algorithms that lead them to obtain models or methods that offer better results, including greater precision and efficiency, as well as better handling of data. This section deals with those works whose authors use more than one method, combining classification techniques and regression techniques, as well as other methods that do not belong to the predictive modeling of DM. The use of hybrid techniques, i.e., techniques that combine different methods, is frequently observed in the diagnosis of equipment such as motors, transformers, and electric vehicles mainly, as stated below.
In the extensive field of motors, Seera et al. [81] use the hybrid fuzzy min-max (FMM) neural network and classification and regression tree (CART), which is known as FMM-CART, to perform rule extraction and data classification in order to detect and classify faults in different motor conditions. They show the overall accuracy rates of five motor conditions (healthy, broken rotor bars, unbalanced voltages, eccentricity, and stator winding faults). FMM presented the lowest accuracy, 93.62%, while CART and FMM-CART achieved 98.11% and 98.25%, respectively, for multiple motor conditions in a time of 0.21s, 0.92s, and 0.96s, respectively. Two years later, in 2014, Seera and Lim [82] implement this hybrid model and conclude that it can produce accurate predictions of motor failures in an online learning environment. In addition, the results of the model are better than those compared with CART, FMM, and multi-layer perceptron. At the noisy test, multi-layer perceptron and FMM presented 78.39% and 94.88% accuracy, whereas FMM-CART and CART achieved stable results with 96.54% and 97.82% accuracy. The multilayer perceptron structure was the most complex with 30 hidden nodes, whereas FMM produced 12 nodes (hyperboxes). FMM-CART and CART created eight and six leaves, respectively. The computational time of FMM was only 0.13s. Multilayer perceptron consumed the longest time (2.08s), whereas FMM-CART and CART used almost 1s. The CART method combined with adaptive neuro-fuzzy inference system is presented by Tran et al. [83]. They use current and vibration signals from the induction motor for fault diagnosis; additionally, the hybrid of back-propagation and least-squares algorithm is used to adjust the parameters of membership functions. The total classification accuracy was 91.11% and 76.67% for vibration and current signals, respectively. Other works such as the one presented by Pramesti et al. 
[84] involve the identification of stator failures in induction motors using multinomial logistic regression analysis and the Wavelet Transform (WT). Júnior et al. [85] use a multiple linear regression modeling technique along with analysis of variance and genetic algorithm optimization to obtain classification models to diagnose three-phase induction motors under normal and short-circuit conditions. The method achieves hit rates greater than 95% in the diagnosis of the normal and incipient short-circuit fault conditions, even at different motor load levels. In addition to its low cost and simplicity, this method does not require physical access to the machine because the current and voltage can be measured from the motor control board. Thus, the probability of accidents involving personnel is reduced significantly. Unlike several other reported methods for fault diagnosis, the proposed approach requires few data and only uses simulation data to construct the expert system. Seshadrinath et al. [86] propose an algorithm with two parts: In the first one, the optimal size of the structure of the Probabilistic Neural Network (PNN) is determined using an orthogonal least-squares regression algorithm. In the second part, the fusion of a Bayesian classifier is recommended as an effective solution to diagnose incipient interturn faults in the machine. To track the health status of a degraded system and predict the remaining service life of a turbofan engine, Zhou et al. [87] propose a method that combines the echo state kernel recursive least-squares algorithm and a Bayesian technique, which demonstrates excellent performance with respect to long-term prediction.
To obtain an effective diagnosis of faults in automotive systems, intelligent monitoring schemes of the vehicle's condition are needed. In this regard, Choi et al. [88] develop three new approaches for the fusion of classifiers in order to reduce the error rate. These approaches are joint optimization of the fusion center and individual classifiers, class-specific Bayesian fusion, and dynamic fusion; the authors demonstrate that the proposed techniques surpass individual classifiers such as PNN, k-Nearest Neighbor (kNN), or principal components analysis. A fault detection scheme for applications in the automotive industry is presented by Jakubek and Strasser [89]. They detect faults by using kernel regression techniques and a NN. The resulting network uses significantly fewer basis functions than a radial basis function network with the same accuracy. Oliva et al. [90] present a model-based approach to predict the remaining driving range by combining particle filtering and Markov chains with detailed models of the battery, electric motor, and vehicle dynamics. Tseng and Chau [91] and Grubwinkler and Lienkamp [92] study different methodologies for the prediction of electric vehicle energy consumption. In particular, Tseng and Chau compare three approaches: 1) an approach based on driver/vehicle/environment-dependent factors using speed profile matching and driving habit matching, 2) a comparison with the average using personalized adjustment, and 3) a collaborative filtering approach that uses matrix factorization; Grubwinkler and Lienkamp, in turn, use the least-mean-square algorithm for the prediction of the mean energy consumption. For a broad overview of the methodologies used in estimation strategies related to the battery, control, and energy management of both hybrid and electric vehicles, the work of Cuma and Koroglu [93] is recommended.
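The least-mean-square (LMS) update mentioned above can be illustrated with a minimal sketch. It is not the implementation of [92]: the two input features (hypothetically, a normalized mean speed and a normalized road gradient), the step size `mu`, the number of passes, and the synthetic noise-free data are all assumptions chosen only to show how the weights are adapted sample by sample.

```python
def lms_fit(samples, targets, mu=0.1, epochs=200):
    """Adapt linear weights with the LMS rule: w <- w + mu * error * x."""
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, d in zip(samples, targets):
            y = sum(wi * xi for wi, xi in zip(w, x))  # current prediction
            e = d - y                                   # prediction error
            w = [wi + mu * e * xi for wi, xi in zip(w, x)]
    return w


def lms_predict(w, x):
    """Linear prediction with the adapted weights."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical, noise-free data: consumption = 0.2 * speed + 0.1 * gradient
X = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
d = [0.2 * a + 0.1 * b for a, b in X]
weights = lms_fit(X, d)
```

With noise-free data and a sufficiently small step size, the weights approach the coefficients used to generate the targets; with real measurements, the step size trades convergence speed against steady-state error.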
In transformers, preventive maintenance is very often emphasized. Liao et al. [94] use least-squares SVM (LS-SVM) and particle swarm optimization to optimize the regression parameters for the diagnosis of oil-immersed transformers by using dissolved gases. A comparison with back-propagation neural network, radial basis function neural network, generalized regression neural network, and support vector regression is carried out. Advantages of the regression model include those inherited from support vector regression, i.e., a unique solution and the support of statistical learning theory. In subsequent years, the wavelet technique is fused with the LS-SVM by Zheng et al. [95] and Zhang et al. [96] to diagnose transformers as well. From the analysis and interpretation of the data generated by the concentration of dissolved gases, Yang and Hu [97] propose a fault diagnosis system that combines a back-propagation NN and a multinomial logistic regression model. Al-Janabi et al. [98] also propose a hybrid system to diagnose transformers. The proposal is based on genetic algorithms and neural networks; in general, it provides information to identify the exact fault in the transformer and its fault state. Fei and Zhang [99] also make use of genetic algorithms along with SVM for fault detection in transformers. Unlike the abovementioned works, Koley et al. [100] use the WT to extract features of the impulse test response of a transformer in the time and frequency domains and the SVM in regression mode to classify transformer faults. It should be pointed out that the SVM tool trained with only simulated data was capable of predicting fault classes accurately when the analog data were presented to the trained SVM for fault prediction.
For the diagnosis of faults in centrifugal pumps, Yunlong and Peng [101] present a new method based on LS-SVM and the empirical mode decomposition. In the case of monoblock centrifugal pumps, Sakthivel et al. [102] use a decision tree-fuzzy hybrid system. On the test dataset, the classification accuracy was 99.3% for the decision tree-fuzzy method, 97.50% for the rough set-fuzzy method, and 96.67% for the PCA-based decision tree-fuzzy method. For the same task, in [103], Muralidharan and Sugumaran use Wavelet analysis, the Naïve Bayes (NB) algorithm, and the Bayes network algorithm. In [104], they apply the J48 algorithm and the continuous Wavelet transform (CWT). The sym3, rbio2.6, and coif1 mother wavelets are the most suitable for fault diagnosis of centrifugal pumps, reaching a classification accuracy of 100%. Finally, in [105], they use the SVM and the CWT. In this case, bior3.7_17 is the wavelet that gives the maximum classification accuracy (99.76%). Hence, it can be considered the best wavelet, as it has the maximum fault-discriminating capability for the system under study. The WT has also been used in electric power distribution systems. Jamil et al. [106] implement an algorithm based on fuzzy logic that uses the discrete WT (DWT) to identify 10 different types of faults in an electrical power distribution system. For high impedance fault detection in electrical distribution networks, the WT extracts dynamic characteristics to feed a decision-making system based on SVM [107]. The SVM is also used along with the Hilbert-Huang transform to decompose the voltages of transmission lines into intrinsic mode functions [108] for fault classification in power systems. The main contribution of the proposed algorithm is that it can be applied to any transmission line, regardless of the line configuration, with no need for re-training at different load values, voltage levels, and fault resistances. 
In 2018, Singh and Vishwakarma [109] present a methodology to classify cross-country faults in series-compensated double circuit transmission lines. This method is based on EMD and three different classifiers: SVM, NB, and PNN. The effectiveness is 95% for SVM, 91.66% for NB, and 96.7% for PNN, with response times of 0.03 s, 0.012 s, and 0.016 s, respectively. Da Silva et al. [110] apply qualitative trend analysis and NB for the diagnosis of multiple failures in transmission lines. This hybrid diagnosis system can be generalized to deal with other types of faults along the transmission line.
Regarding other machines, Lin and Horng [111] use a scheme for the classification and detection of faults in an ion implanter, proposing a hybrid classification tree, i.e., they combine a grouping algorithm with CART. They indicate that their methodology is general and can be applied to other machines by simply modifying the warning generation criteria. For fault detection in components of nuclear power plants, statistical methods have been used. Di Maio et al. [112] used a set of auto-associative kernel regression models, a hybrid approach based on correlation analysis, a genetic algorithm, and a sequential probability ratio test to detect faults by taking as a case study a coolant pump of a typical pressurized water reactor. Liangyu et al. [113] propose an artificial NN combined with optimal zoom search to recognize various degrees of failure in a high-pressure feedwater heater system. The classification of the healthy and defective conditions of a face milling tool is done through the acquisition of sound signals using the discrete WT (DWT) and the J48 algorithm, which is a decision tree technique [114]. On the other hand, to detect and diagnose faults in HVAC systems, Du et al. [115] combine NNs and clustering analysis. Table 3 lists the works that have presented hybrid techniques for the detection of faults and diagnosis of the abovementioned electric equipment and systems. According to the information shown in Table 3, two different techniques are combined on average to perform the diagnosis, where not only DM techniques are implemented, but other signal processing algorithms are also used to extract or highlight features contained in the analyzed signals in order to simplify the fault classification task. The main signal processing techniques found are the WT and the EMD. While the works that use WT exploit its capability for time-frequency decomposition in a symmetric way, the works that use EMD exploit its capability to decompose a signal in an adaptive way. 
In this regard, EMD has been preferred in many works since a-priori information about the analyzed signal is not needed. From this point of view, other recent schemes based on EMD such as down-sampling EMD [116], which is a method that provides specific advantages over EMD, should be explored in the field of fault detection in electric equipment. WT has also been widely used to remove unwanted noise from electrical signals. This noise is generated by acquisition systems, sensors, or any electronic device. Regarding DM techniques, FLSs have presented suitable results under noisy conditions in the input signals, since this noise is comparable to the uncertainties of the input data, and handling such uncertainties is an inherent ability of FLSs.
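As a rough illustration of the hybrid schemes summarized above, in which a signal processing stage extracts features and a DM technique classifies them, the following sketch computes normalized Haar wavelet energies and feeds them to a nearest-centroid classifier. It does not reproduce any cited work: the Haar wavelet, the nearest-centroid rule, the synthetic signals, and all function names are assumptions made for illustration; a practical system would more likely rely on a dedicated wavelet library and one of the classifiers reviewed here.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def wavelet_energy_features(signal, levels=3):
    """Normalized energy of each detail band plus the final approximation."""
    feats, approx = [], list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        feats.append(sum(c * c for c in detail))
    feats.append(sum(c * c for c in approx))
    total = sum(feats) or 1.0
    return [f / total for f in feats]


def fit_centroids(features, labels):
    """Mean feature vector per class."""
    sums, counts = {}, {}
    for f, lab in zip(features, labels):
        acc = sums.setdefault(lab, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, f):
    """Label of the closest centroid (squared Euclidean distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(c, f))
    return min(centroids, key=lambda lab: dist2(centroids[lab]))


def make_signal(n, faulty, phase):
    """Synthetic signal: a slow sinusoid, plus a fast component if faulty."""
    out = []
    for t in range(n):
        x = math.sin(2 * math.pi * 2 * t / n + phase)
        if faulty:
            x += 0.5 * math.sin(2 * math.pi * 24 * t / n + phase)
        out.append(x)
    return out


train_signals, train_labels = [], []
for phase in (0.0, 0.5, 1.0):
    for faulty, label in ((False, "healthy"), (True, "faulty")):
        train_signals.append(make_signal(64, faulty, phase))
        train_labels.append(label)

centroids = fit_centroids(
    [wavelet_energy_features(s) for s in train_signals], train_labels
)
```

In this toy setting, the fault adds energy to the finest detail band, so the two classes separate in the feature space; real signals generally require more careful wavelet and feature selection.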

Recent Methods for General Applications
Various recent articles present research on classification and regression algorithms that can be used in different areas of application. Djeffal et al. [117] present a method based on filtering and revision stages to delete samples that have little influence on the learning results of an SVM, where the goal is to reduce training time without losing accuracy. This strategy could be used for handling and reducing huge databases before the application of any other algorithm. Zhao et al. [118] tackle the challenging problem of classification in the presence of label noise. In this regard, they propose a Markov chain sampling framework that robustly learns effective classifiers and accurately identifies mislabeled instances. Hwang and Son [119] propose a prototype-based classification to select some data from a dataset for the development of learning rules and prediction, demonstrating that the proposed approach outperforms other classifiers such as the Bayes classifier and the nearest neighbor. Regarding the Bayesian approaches, Zhang et al. [120] present a probability density estimation approach based on the nonparametric kernel mixing model in order to estimate reliable class-conditional probability functions; in general, the proposed Bayesian classifier consists of three steps: Partitioning, structure learning, and estimation of probability density functions. Zhang et al. [121] propose a learning scheme that offers a recursive algorithm to explore the distribution of class density for the Bayesian estimation and an automated approach to select powerful discriminant functions for the classification of high-dimensional data, while Celotto [122] proposes a unified visual approach to compare and classify a large subset of Bayesian confirmation measures. In the work of Becker et al. 
[123], analytical and approximate inference methods are discussed to calculate the marginal probabilities of Bayes factors, providing guidance on the interpretation of results and offering new types of analysis to study sequential data in many application areas.
Regarding regression analysis, Le et al. [124] present the geometric-based online Gaussian process, which can scale to massive datasets while guaranteeing a good enough solution (close to the optimal one) and fast online regression. Marx and Vreeken [125] present an information theory-based approach using the Kolmogorov complexity and the principle of minimum description length to provide a practical solution to the problem of inferring the direction of causal dependence from observational data. Rudaś and Jaroszewicz [126] analyze two uplift modeling approaches for linear regression and identify the situations in which each model works best; in fact, they propose a third model that combines the benefits of both approaches. Liang et al. [70] propose a model called heterogeneous-target robust mixture regression that addresses the challenges and practical concerns of joint learning for multiple objectives/multi-tasking learning by managing mixed types of objectives simultaneously, imposing structural constraints on each component of the mixture, and adopting robustness strategies.
On the other hand, Chen and Guestrin [127] describe a scalable end-to-end tree boosting system (XGBoost) and propose a sparsity-aware algorithm for sparse data, providing insights on cache access patterns, data compression, and sharding to construct a scalable tree boosting system. They claim that XGBoost is widely used by data scientists to achieve state-of-the-art results in many machine learning challenges. Teinemaa et al. [128] evaluate the temporal stability and prediction accuracy of different existing predictive process monitoring methods, finding that the methods based on XGBoost and LSTM exhibit the highest temporal stability. In relation to NN, Baldi [129] studies the internal and external approaches for the design of recursive neural architectures. Zhang et al. [130] address the problems of intelligent fault diagnosis when the training and testing data do not come from the same distribution by using domain adaptive convolutional NNs. Bouguelia et al. [131] propose an adaptive algorithm to continuously update a system of neurons through the extension of the growing neural gas algorithm with three complementary mechanisms, which allows one to closely monitor the gradual and sudden changes in the distribution of data. The imbalanced data problem is addressed by Xi et al. [132]. They propose the least-squares support vector machine for class imbalance learning by evaluating two parameters of misclassification costs; also, the Cholesky factorization is used to enhance computational stability. In order to reduce the estimation error in online sequential extreme learning machine systems, Lu et al. [133] present a new training approach based on the Kalman filter. Although the last two works have been applied to fault detection in aircraft engines, they can be used in other machines. Table 4 shows a compendium of the abovementioned methods. 
They are grouped by year in order to show their chronological appearance and to highlight the latest algorithms or strategies proposed in the literature to solve DM issues or improve DM tasks. As these methods can address general applications, their investigation and integration into fault detection methodologies for electric equipment and systems is recommended. For instance, the least-squares SVM is useful for imbalanced data, i.e., when there is a disproportionate ratio of observations in each class, and Markov chain sampling is a useful tool for mislabeled data.
Least squares support vector machine [132] ▪ Imbalance of data
Logistic regression and Kalman filter [133] ▪ Estimation error reduction in online sequential extreme learning machine systems
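To make the notion of a disproportionate class ratio concrete, the sketch below computes inverse-frequency class weights and a cost-weighted error rate. This is a common heuristic and not the least-squares SVM formulation of [132]; the class names, the 90/10 split, and the function names are assumptions used only for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def weighted_error(y_true, y_pred, weights):
    """Error rate in which each sample counts with the weight of its true class."""
    total = sum(weights[t] for t in y_true)
    wrong = sum(weights[t] for t, p in zip(y_true, y_pred) if t != p)
    return wrong / total


# Hypothetical dataset: 90 healthy observations and 10 faulty ones
y_true = ["healthy"] * 90 + ["fault"] * 10
weights = balanced_class_weights(y_true)

# A trivial classifier that always answers "healthy"
y_pred = ["healthy"] * 100
plain_error = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
cost_error = weighted_error(y_true, y_pred, weights)
```

The trivial classifier looks acceptable under the plain error rate (0.1) but poor under the weighted one (0.5), which is the kind of distortion that class-imbalance methods are designed to expose and correct.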

Conclusions
The development of efficient and reliable methodologies represents an extremely important task for researchers and developers of diagnosis systems; in order to contribute to the solution of this task, DM techniques have been widely used. To offer the reader an overview of DM techniques used in the detection of faults and diagnosis of electrical equipment and systems in recent years, this paper provides a general review that can facilitate informed decision-making for specific applications. All the details and results obtained by the cited authors can be consulted directly in the corresponding references.
Although certain techniques have been constantly used for specific applications, e.g., SVMs in transformers, the selection of an appropriate DM technique for either classification or regression will depend on many factors, e.g., monitoring technology, features of data, and knowledge about the in-test system operation. However, it is important to take into account that the more complex and robust the systems, the greater the amount and variety of data produced, and the more difficult the detection of faults and the diagnosis. Additionally, the researcher has to be informed about the features of specific DM algorithms so that, through their implementation, the information contained in the acquired data can be exploited. It is extremely difficult for a single technique to detect the full range of faults of a system in a 100% reliable way. Each method has its own strengths and weaknesses. Outputs from various diagnostic methods must be aggregated into an overall evaluation system; thus, instead of using one diagnostic method, intelligent hybrid methods that combine the strengths of each method can be developed. In the literature, there are many articles using hybrid techniques in order to increase the efficiency, accuracy, reliability, and speed of their models. On the one hand, different signal processing techniques have been used for pre-processing of data. This pre-processing allows highlighting and extracting information from raw data. Typical operations are denoising, frequency or mode decomposition, and space transformation, where the WT- and EMD-based methods have demonstrated promising results. On the other hand, the combination of different DT algorithms has also been explored in order to take advantage of their individual benefits. Wu et al. [134] present an important analysis of the 10 most influential DM algorithms in the research community, namely C4.5, CART, PageRank, k-Means, kNN, Apriori, AdaBoost, Expectation-Maximization, NB, and SVM. 
It should be noted that these algorithms cover statistical learning, classification, clustering, link mining, and association analysis. Despite the promising results obtained in many works and the available knowledge about both the in-test system and the analyzed data, it is difficult or impossible to conceive a perfect algorithm in terms of accuracy, speed, or complexity for specific applications, mainly considering that even similar applications can have many different requirements; in this regard, the design and development of new algorithms and methods are still of paramount importance.
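The aggregation of outputs from several diagnostic methods discussed above can be sketched, in its simplest form, as a (weighted) vote. This is only one of many possible fusion rules and is not taken from any cited work; the method names, fault labels, and reliability weights below are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights=None):
    """Fuse the labels predicted by several methods.

    predictions: dict mapping method name -> predicted label.
    weights: optional dict mapping method name -> reliability weight;
             if omitted, every method counts equally (majority vote).
    Ties are broken in favor of the alphabetically first label.
    """
    scores = defaultdict(float)
    for method, label in predictions.items():
        scores[label] += 1.0 if weights is None else weights[method]
    return max(sorted(scores), key=lambda label: scores[label])


# Hypothetical outputs of three diagnostic methods for one observation
predictions = {"svm": "broken_rotor_bar", "cart": "healthy", "nb": "broken_rotor_bar"}
majority = weighted_vote(predictions)

# Hypothetical reliability weights, e.g., validation accuracies
reliability = {"svm": 0.3, "cart": 0.9, "nb": 0.4}
weighted = weighted_vote(predictions, reliability)
```

With equal weights the majority label wins, whereas the hypothetical reliability weights let a single trusted method override the other two; choosing such weights is itself a design decision that depends on validation data.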
Also, special attention has to be given to equipment related to renewable energy sources such as wind turbines, photovoltaic systems, power converters, and energy storage systems, among others, due to their rapid development and growth [135,136]. For instance, the void defects evolving into damage in wind turbine blades are investigated in [137]. The improvement of photovoltaic and wind power storage systems based on the prediction of battery life and its faults using SVMs is presented in [138]; in these systems, the correct operation of batteries is fundamental. It is clear that all the elements of a system are important, and research on specialized fault detection methodologies for the individual elements and the system as a whole is critical for the maintenance and repair of the system. Some recommended directions for future research are: i) Fusion and analysis of multiple physical variables as sources of information about a specific piece of equipment, ii) exploration and integration of recent algorithms to improve the quality of data before the application of a DT-based algorithm, iii) development of practical hardware solutions for online and real-time fault diagnosis, and iv) detection of incipient faults.

Conflicts of Interest:
The authors declare no conflict of interest.