Trends and Challenges in Intelligent Condition Monitoring of Electrical Machines Using Machine Learning

Abstract: A review of fault diagnostic techniques for electrical machines based on machine learning is presented in this paper. As the world moves towards Industry 4.0 standards, the problems of limited computational power and available memory are decreasing day by day. A significant amount of data covering a variety of faulty conditions of electrical machines working in different environments can be handled remotely using cloud computation. Moreover, the mathematical models of electrical machines can be utilized for the training of AI algorithms; this is valuable because the collection of big data is a challenging task for industry and laboratories owing to their limited resources. In this paper, some promising machine learning-based diagnostic techniques are presented from the perspective of their attributes.


Introduction
Nowadays, electrical machines and drive systems are used in many applications and play a significant role in industry. As electrical machines serve in such different applications, the question of maintenance is of great importance. Today, there are plenty of condition monitoring methods to detect failures in electrical equipment, and diagnostic techniques can be divided into several groups [1][2][3][4][5]. Generally, the stresses that impact electrical machines' operation can be classified into four main categories, also known as TEAM (Thermal, Electric, Ambient, and Mechanical) stresses. Because of these stresses, faults tend to appear in the machine.
Statistically, 36% of all motor failures are related to the stator winding faults [6]. Usually, winding failures develop from a turn-to-turn short circuit [7]. Without timely maintenance, this fault can grow to phase-to-phase or phase-to-ground short circuits [8].
Because this inter-turn fault is hardly detectable in the early stages of its development, the topic is particularly challenging for the electrical machine industry [9]. From the point of view of reliability, one of the most critical points in this case is the electrical machine's insulation [10], which plays a significant role already during the design process [11]. The insulation condition can be determined by chemical, mechanical, or electrical analysis of the insulating materials [12].
Table 1. Signatures of the main faults in electrical machines.
As shown in Figure 1, three main types of machine maintenance can be expressed to be applied in practice: corrective, preventive, and predictive maintenance [29].
In the case of corrective maintenance, also known as reactive maintenance, all needed repairs are assumed to be done after the failure has already occurred. However, this solution is appropriate only for small and insignificant workstations, where an unexpected failure does not lead to economic or catastrophic consequences. Alternatively, many manufacturers apply preventive maintenance to the machine to avoid fatal outcomes. In this case, the electrical equipment needs to be regularly checked by the manufacturers through scheduled and specified inspections.
Although this solution can prolong machine lifespan, this schedule-based condition monitoring approach provides very little information on the remaining useful lifetime (RUL) of the devices and does not allow for their prognostic and full exploitation [30]. Moreover, because of the scheduled controls in production, it usually means a partial or total shutdown of the manufacturing process, leading to inefficient resource usage and extra operating costs.
To decrease shutdown costs and minimize downtime, manufacturers switch their production over to predictive maintenance [31,32]. Condition monitoring is an essential component of predictive maintenance that allows forecasting a future failure based on the electrical equipment's working conditions. A schematic illustration of condition monitoring is shown in Figure 2. As can be seen, condition monitoring consists of several stages. The accuracy of the measuring system largely depends on the sensors used for data acquisition, and signal processing is one of the essential stages in condition monitoring.
For feature extraction, and to teach the system to predict and detect faults in the future, a more powerful tool is needed. Moreover, as the amount of data is increasing worldwide and computer science is rapidly developing, it is reasonable to rebuild production around advanced approaches using artificial intelligence (AI). Thermal imaging, for example, is widely used in industry to monitor faults at the early stages of their development [33]. In this context, different variants of machine learning (ML) algorithms can be used for fault detection. These algorithms, as well as their comparison, are described in the following chapters.

Diagnostic Possibilities with Machine Learning
Many types of research on intelligent health monitoring refer to machine learning (ML) [34][35][36]. ML is a field of computer science and artificial intelligence that is not oriented directly at solving a problem but rather at learning in the process of applying solutions to many similar problems [37]. Typical tasks of ML are classification and regression, learning associations, and clustering, along with other tasks such as reinforcement learning, learning to rank, and structure prediction [38]. ML is closely related to data mining, which can discover new data patterns in large datasets. The main difference is that ML concentrates on adaptive behavior and operative usage, while data mining focuses on processing extensive amounts of data and discovering unknown patterns. Based on a dataset, the so-called training data, ML algorithms can build a model that predicts and makes decisions. There are many types as well as algorithms of ML; these algorithms can be supervised, unsupervised, semi-supervised, and reinforcement [39]. Figure 3 shows the most common methods used in machine learning.

The basic paradigms of ML are supervised and unsupervised algorithms. Supervised ML, also known as "learning with a teacher," is a type of learning from examples, where both the training set (situations) and the test set (required solutions) are given [40,41]. Such training sets are challenging to obtain from industry and laboratories: because of scheduled (preventive) maintenance, only a limited number of faulty machines operate in industry, and only a limited number of destructive tests can be performed in laboratories for training purposes. Moreover, collecting data with more than one fault (composite faults) in the same machine is not straightforward in either scenario. Thanks to the increasing computational power of computers and cloud computation, the mathematical models of electrical machines can be used to train AI algorithms.
A comparison of different types of mathematical models of induction motors and their attributes can be found in [42,43].
At the same time, unsupervised ML, also known as "learning without a teacher", is a type of learning where patterns are to be discovered from unknown data [44,45]. In this case, there is only training data, and the aim is to group objects into clusters and/or to reduce a large amount of the given data. Sometimes, industrial systems use semi-supervised algorithms to obtain a more precise outcome; here, some of the data have both inputs and the required solutions, while the rest have only inputs.
Differently from the basic approaches, reinforcement ML focuses on understanding patterns in repetitive situations and their generalization [46]. To minimize errors and increase accuracy, the machine learns to analyze the information before each step. Moreover, the machine aims to get the maximum reward (benefit) from the learning, which is set in advance, such as minimum resource spending, reaching a desired value, minimum analysis time, etc.
One group of widely used intelligent condition monitoring methods, which can be successfully applied to many machine parameters, is artificial neural networks (ANNs). ANNs can be supervised, unsupervised, and reinforced. Many studies mistakenly consider NNs a field separate from machine learning. However, NNs and deep learning belong to computer science, artificial intelligence, and machine learning. A diagram of the fields related to NNs is shown in Figure 4.
Machine learning is a powerful tool with a broad set of different algorithms that can be applied to solve many problems. These algorithms, as well as their applications, are described in more detail in the following chapters.

Supervised Machine Learning
Supervised ML includes a variety of function algorithms that can map inputs to desired outputs. Usually, supervised learning is used in classification and regression problems: classifiers map inputs into pre-defined classes, while regression algorithms map inputs into a real-valued domain. In other words, classification allows predicting the input's category, while regression allows predicting a numerical value based on collected data. The general algorithm of supervised learning is shown in Figure 5.

Among supervised algorithms, the most widely used are linear and logistic regression [47,48], Naive Bayes [49,50], nearest neighbor [51,52], and random forest [53][54][55][56]. In condition monitoring and diagnostics of electrical machines, the most suitable supervised algorithms are decision trees [57][58][59] and support vector machines [60][61][62].
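As a concrete illustration of the nearest neighbor idea, the sketch below classifies a new measurement by the label of its closest training sample. It is a minimal, hypothetical example: the two features (a vibration amplitude and a temperature) and the labels are invented for illustration, not taken from a real dataset.

```python
# Minimal sketch of supervised classification with a 1-nearest-neighbour rule.
# The training samples are hypothetical: two features (vibration amplitude,
# temperature) with a "healthy" or "faulty" label.
import math

def nearest_neighbor(train, query):
    """Return the label of the training sample closest to the query point."""
    features, label = min(train, key=lambda s: math.dist(s[0], query))
    return label

# Hypothetical labelled training set: (features, label)
train = [
    ((0.2, 40.0), "healthy"),
    ((0.3, 42.0), "healthy"),
    ((1.5, 70.0), "faulty"),
    ((1.8, 75.0), "faulty"),
]

print(nearest_neighbor(train, (0.25, 41.0)))  # near the healthy cluster
print(nearest_neighbor(train, (1.6, 72.0)))   # near the faulty cluster
```

In a real diagnostic setting the features would come from the signal processing stage discussed earlier, and k > 1 neighbours would usually be consulted to reduce sensitivity to noise.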

Decision Trees
A decision tree (DT) is a decision support tool extensively used in data analysis and statistics, and special attention has been paid to DTs in artificial intelligence and data mining. A DT's goal is to create a model that predicts the target's value based on multiple inputs. The structure of a DT can be represented by branches and leaves: the branches contain the attributes on which the function depends, the leaves contain the values of the function, and the other nodes contain the attributes by which the decision cases differ. An example of the DT algorithm is shown in Figure 6.

Among decision models, DTs are the simplest and need only a small amount of data to succeed. Moreover, this algorithm can be combined with another decision model into a hybrid to achieve a more accurate outcome. However, these models are unstable: a small change in the input data can lead to a significant change in the decision tree structure, leading to inaccurate results. Additionally, regression algorithms can fail in the case of decision trees.
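The branch-and-leaf idea above can be sketched as a single node of such a tree: a threshold test on one attribute, chosen to minimize misclassifications when each side predicts its majority label. The samples are hypothetical one-feature measurements; a full tree would simply repeat this split recursively on each side.

```python
# A single decision-tree node ("stump"): one threshold test on one attribute,
# chosen to minimise misclassifications when each side predicts its majority
# label. Samples are hypothetical one-feature measurements.

def majority_errors(labels):
    """Misclassifications if the group predicts its most common label."""
    if not labels:
        return 0
    majority = max(set(labels), key=labels.count)
    return sum(1 for l in labels if l != majority)

def build_stump(samples, feature):
    """Return the best split threshold on one feature of (features, label) pairs."""
    best_thr, best_err = None, None
    values = sorted({s[0][feature] for s in samples})
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2                      # candidate split point
        left = [lab for feat, lab in samples if feat[feature] <= thr]
        right = [lab for feat, lab in samples if feat[feature] > thr]
        err = majority_errors(left) + majority_errors(right)
        if best_err is None or err < best_err:
            best_thr, best_err = thr, err
    return best_thr

samples = [((0.2,), "healthy"), ((0.4,), "healthy"),
           ((1.5,), "faulty"), ((1.9,), "faulty")]
thr = build_stump(samples, 0)
print(thr)  # the midpoint between the two clusters
```

The instability mentioned above is visible even here: moving one sample across the gap can shift the chosen threshold considerably.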

Support Vector Machines
Another widely used set of ML algorithms in condition monitoring is the support vector machine (SVM). This is a set of supervised models that can be used for regression, novelty detection, and feature reduction, although SVM is preferable for classification objectives [63]. In linear classification, each datapoint is represented as a vector in n-dimensional space (n is the number of features). Each of these points belongs to only one of two classes. Figure 7 shows an example of data classification.


In the picture, two data classes are represented: Class 1 (triangles) and Class 2 (squares). The aim is to separate these points by a hyperplane of dimension (n − 1), ensuring a maximum gap between them. There are many possible hyperplanes; maximizing the gap between the classes contributes to a more confident classification and helps to find the optimal hyperplane. As shown in Figure 8, to detect the optimal hyperplane, it is essential to find the support vectors, which are the datapoints positioned closest to the hyperplane.
In addition to linear classification, SVMs can deal with non-linear classification using the kernel trick, also known as the kernel machine. As shown in Figure 9, the processing algorithm is similar to the linear one, but a kernel function replaces the inner products between datapoints.
SVM is a good solution when there is no initial information about the data. The method is also highly preferred because of the little computational power needed to produce results with significant accuracy. Although the kernel machine is a great advantage of SVM, managing it is a complicated task. Moreover, processing large amounts of data can take a long time, so SVM is not preferable for large datasets.
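The two ingredients described above can be sketched in a few lines: a linear decision function that assigns a point to one side of a hyperplane, and a Gaussian (RBF) kernel that replaces the inner product for non-linear problems. The weights below are illustrative placeholders, not the result of actual SVM training.

```python
# Two SVM ingredients in miniature: a linear decision function for a separating
# hyperplane, and an RBF kernel for the non-linear case. The weights w and bias
# b are illustrative placeholders, not the result of SVM training.
import math

def linear_decision(w, b, x):
    """Class 1 if the point lies on the positive side of the hyperplane."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else 2

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel: similarity that decays with squared distance."""
    sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq)

w, b = [1.0, -1.0], 0.0                    # hypothetical hyperplane x1 = x2
print(linear_decision(w, b, (2.0, 1.0)))   # positive side -> class 1
print(linear_decision(w, b, (1.0, 2.0)))   # negative side -> class 2
print(rbf_kernel((0, 0), (0, 0)))          # identical points -> 1.0
```

Training an SVM amounts to choosing w and b (or the kernel coefficients) so that the margin around the hyperplane is maximal; that optimization step is omitted here.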
Supervised ML approaches are widely applicable to condition monitoring of electrical machines, and many relevant studies can be found in the literature. The authors in [64] proposed a new signal processing method for fault diagnosis of low-speed machinery based on DT approaches. In [65], the authors applied statistical process control and supervised ML techniques to diagnose wind turbine faults and predict maintenance needs. The researchers in [66] presented a semi-supervised ML method that uses co-training of the DT algorithm to handle unlabeled data and applied it to fault classification in electric power systems. In [67], the authors proposed a RUL prediction method for lithium-ion batteries using a particle filter and support vector regression.

Unsupervised Machine Learning
Unsupervised ML includes algorithms that can learn spontaneously to perform a proposed task without intervention from a teacher. Unsupervised learning is often contrasted with supervised learning, where the outcome is known and the task is to find a relationship between the system's inputs and responses. In unsupervised learning, as shown in Figure 10, the program tries to find similarities between objects and divides them into groups when there are similar patterns. These groups are called clusters. Among unsupervised algorithms, the most widely used are cluster analysis, fuzzy c-means [68,69], and k-means [70]. In the diagnosis of electrical machines, principal component analysis is the most frequently used algorithm [71][72][73].
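As an illustration of the clustering idea, the sketch below implements a minimal k-means loop on one-dimensional toy data: points are assigned to their nearest centroid, and each centroid then moves to the mean of its cluster. The data and the fixed starting centroids are invented to keep the run deterministic.

```python
# Minimal k-means loop on 1-D toy data: assign each point to its nearest
# centroid, then move each centroid to the mean of its cluster. Data and the
# fixed starting centroids are invented so the run is deterministic.

def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        # Assignment step: every point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        # Update step: each centroid moves to the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

data = [0.1, 0.2, 0.3, 5.1, 5.2, 5.3]   # two obvious groups
centers = kmeans(data, [0.0, 1.0])
print(centers)  # centroids settle near 0.2 and 5.2
```

In practice, k-means is run on multi-dimensional feature vectors and with several random restarts, since the result depends on the starting centroids.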

More frequently, the dataset is so large that it is difficult to interpret and distinguish the necessary information. Principal component analysis (PCA) is one of the most widespread algorithms for reducing the dimensionality of the data while losing the least amount of information. PCA can be interpreted geometrically, as shown in Figure 11.
The algorithm of PCA is as follows: (a) Points with specific coordinates are designated on the plane. (b) The direction of the maximum data change is selected, and a new PCA axis is drawn through the experimental points. (c) The experimental points are projected onto the PCA axis. (d) It is assumed that all the points were initially projected onto the PCA axis, and all deviations from this axis can be considered noise.
If the noise is considerable, another axis can be added perpendicular to the first one to describe the data's remaining change. As a result, there is a new representation with a smaller number of variables, in which all original variables are accounted for and none of them are deleted. An insignificant part of the data is separated and turns into noise. The principal components yield the initially hidden variables that govern the structure of the data.
PCA is the most common approach to dimensionality reduction and a useful tool for the visualization of large datasets. One of PCA's main advantages is that the components are independent of each other, with no correlation between them, which can significantly reduce the training time. At the same time, these independent values can become less interpretable. Moreover, even after applying PCA there is still information loss, and the data analysis is relatively less precise than with the original values.
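Steps (a)–(d) above can be reproduced numerically: centre the data, form the covariance matrix, and take the eigenvector with the largest eigenvalue as the first principal axis. The points below are synthetic, lying roughly along the line y = x, with the small deviations from that line playing the role of noise.

```python
# Numerical version of PCA steps (a)-(d): centre the data, form the covariance
# matrix, and keep the eigenvector with the largest eigenvalue as the first
# principal axis. Synthetic points lie roughly along y = x; the small
# deviations from that line play the role of noise.
import numpy as np

def first_principal_axis(X):
    """Unit vector of the direction of maximum variance in data matrix X."""
    Xc = X - X.mean(axis=0)                 # centre the cloud of points
    cov = np.cov(Xc, rowvar=False)          # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: symmetric eigendecomposition
    return eigvecs[:, np.argmax(eigvals)]   # eigenvector of largest eigenvalue

X = np.array([[0.0, 0.1], [1.0, 0.9], [2.0, 2.1], [3.0, 2.9]])
axis = first_principal_axis(X)
print(axis)  # close to the diagonal direction (1, 1)/sqrt(2), up to sign
```

Projecting the centred points onto this axis gives the one-dimensional representation; the residuals perpendicular to it are the part treated as noise.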
Many studies are available in the literature where unsupervised algorithms are used for the analysis of high-dimensional datasets. In [74], the authors applied a new method to the fault diagnosis of rolling bearings in the field of high-dimensional unbalanced fault diagnosis data based on PCA for better classification performance. In [75], researchers used a PCA-based method to monitor non-linear processes. The researchers in [76] proposed a PCA-based hybrid method for monitoring linear and non-linear industrial processes.

Reinforcement Learning
Reinforcement learning (RL) is one of the ML methods, where the system (agent) learns by interacting with some environment. Unlike supervised algorithms, it does not need labeled data pairs. RL is mainly focused on finding a balance between exploring an unknown environment and exploiting existing knowledge. The general algorithm of reinforcement learning is shown in Figure 12.
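The agent/environment loop can be sketched with tabular Q-learning on a toy problem. The environment here is a hypothetical 5-state corridor (not from the cited works): the agent starts at state 0 and receives a reward only upon reaching state 4, and an epsilon-greedy rule balances exploration against exploitation.

```python
# Tabular Q-learning on a toy 5-state corridor. Environment, rewards, and
# hyperparameters are illustrative assumptions, chosen only to show the
# agent/environment interaction loop described above.
import random

N_STATES, ACTIONS = 5, (1, -1)           # move right / move left
alpha, gamma, eps = 0.5, 0.9, 0.1        # learning rate, discount, exploration

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0), nxt == N_STATES - 1

random.seed(0)
for episode in range(200):
    s = 0
    while True:
        # epsilon-greedy: balance exploring the unknown environment
        # against exploiting existing knowledge
        a = random.choice(ACTIONS) if random.random() < eps \
            else max(ACTIONS, key=lambda a: Q[(s, a)])
        s2, r, done = step(s, a)
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2
        if done:
            break

# after training, the greedy policy should move right from every state
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

Note that no labeled input/output pairs are provided anywhere: the agent learns the value table purely from the rewards returned by the environment.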

One of the algorithms that can be used in data mining and cluster analysis is swarm intelligence [77][78][79]. Swarm intelligence (SI) describes the collective behavior of a decentralized, self-organized system and is considered an optimization method. An SI system consists of agents (boids) that interact with each other and with the environment. SI should be a multi-agent system with self-organized behavior that exhibits reasonable collective behavior. The algorithm can adapt to changes and converge quickly to some optima. At the same time, its solutions are dependent sequences of random decisions and can become trapped in local minima in complex tasks.
At the same time, the reinforcement algorithm more frequently used in condition monitoring is the genetic algorithm [80][81][82]. A genetic algorithm (GA) is a tool for solving optimization problems that models random selection using natural selection mechanisms found in the environment. A distinctive feature of the GA is its emphasis on the "crossover" operator, which mirrors the instrumental role of crossover in wildlife.
In the case of GA, the problem is formalized so that its solution can be encoded in the form of a vector of genes (genotype), where each gene has some value. In classical implementations of GA, it is assumed that the genotype has a fixed length. However, there are GA variations that are free from this limitation. The general diagram of GA is shown in Figure 13.
Basically, the optimization procedure using a GA is as follows: (a) given a task, a set of genotypes forming the initial population is created; (b) this initial population is assessed using the "fitness function," which determines how well each genotype solves the task; (c) the fittest candidates in the population are then selected for the next generation; (d) the selected candidates produce new solutions. This process repeats until the task is fulfilled and a resultant population is created.
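Steps (a)–(d) above can be sketched on the classic "OneMax" toy task, where genotypes are fixed-length bit vectors and fitness simply counts the ones. The task and all parameters are illustrative assumptions, not taken from the cited works.

```python
# A minimal genetic algorithm on the OneMax toy task (maximize the number
# of 1-bits in a fixed-length genotype). Purely an illustrative sketch.
import random

random.seed(1)
GENES, POP, GENERATIONS = 20, 30, 60

def fitness(g):                       # (b) assess each genotype
    return sum(g)

def select(pop):                      # (c) keep the fitter half
    return sorted(pop, key=fitness, reverse=True)[:len(pop) // 2]

def crossover(a, b):                  # the "crossover" operator
    cut = random.randrange(1, GENES)
    return a[:cut] + b[cut:]

def mutate(g, rate=0.02):             # occasional random bit flips
    return [bit ^ (random.random() < rate) for bit in g]

# (a) create the initial population of genotypes
pop = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
for _ in range(GENERATIONS):
    parents = select(pop)
    # (d) the selected candidates produce new solutions
    pop = parents + [mutate(crossover(random.choice(parents),
                                      random.choice(parents)))
                     for _ in range(POP - len(parents))]

best = max(pop, key=fitness)
print(fitness(best))   # should approach the optimum of 20
```

Note that the algorithm needs no domain knowledge beyond the fitness function itself, which is the benefit discussed below; the mutation operator also illustrates why runs can stagnate when many chromosomes encode the same solution.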
The main benefit of GA is that no specialized knowledge about the domain is needed: GA generates solutions through genetic operators. Moreover, a result can contain more than one appropriate solution. However, GA sometimes suffers from degeneracy, which can occur when multiple chromosomes represent the same solution and the same chromosome shapes occur repeatedly. In this case, the optimal solution is not guaranteed.
Nonetheless, GA is an efficient tool for industrial process optimization. In [83], researchers proposed a new GA-based method that can be used for both fault-type classification and RUL prediction. The authors in [84] proposed a method based on genetic mutation particle swarm optimization for gear fault diagnosis. In [85], researchers proposed a GA-based method to optimize and improve photovoltaic array accuracy.

Neural Networks
ANNs have proven to be promising tools for condition monitoring and prediction of RUL due to their adaptability, nonlinearity, and arbitrary function approximation ability [86,87]. The main advantage of NNs is that they can outperform nearly every other ML algorithm. This method is meant to analyze and model damage propagation processes and predict further failures based on collected data. The main tasks that neural networks deal with include classification, among others [88,89].
Artificial neural networks originate from attempts to reproduce biological nervous systems' ability to learn and correct errors by modeling the brain's low-level structure. To create artificial intelligence, one needs to build a system with a similar architecture. The architecture of an ANN is shown in Figure 14.
ANNs are machine learning models that mimic the human brain using connected units called neurons. Neurons, both biological and artificial, consist of the cell body, dendrite (input), synapse (connection), and axon (output). As seen in Figure 14, the simplest artificial neural network model has three layers of neurons: the first (input) layer is connected to a middle (hidden) layer, which in turn is connected to the final (output) layer. To solve a given problem with a neural network, it is necessary to collect training data. A training dataset is a collection of observations in which the values of the input and output variables are defined and specified. The neurons transfer a signal from the input layer to the output. The input-layer neurons receive data from the outside environment (measuring system, sensors) and, after processing them, transmit signals through the synapses to the neurons of the hidden layer. The hidden-layer neurons process the received signals and transmit them to the neurons of the output layer. Basically, a neuron is a computing unit that receives information, performs simple calculations on it, and transfers it further.
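The three-layer input–hidden–output structure described above can be sketched as a small network trained with plain gradient descent. The XOR task, layer sizes, learning rate, and iteration count are illustrative assumptions, not taken from the cited works.

```python
# A minimal three-layer network (input -> hidden -> output) trained with
# gradient descent on XOR. All sizes and hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # targets

W1 = rng.normal(size=(2, 8))   # input -> hidden connection weights
b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1))   # hidden -> output connection weights
b2 = np.zeros(1)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(5000):
    # forward pass: signals flow input -> hidden -> output
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: adjust the connection coefficients to reduce error
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

pred = (out > 0.5).astype(int).ravel()
print(pred)
```

Training here is exactly what the next paragraph describes: nothing is hard-coded about XOR; the network finds the connection coefficients itself from the input/output observations.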
Neural networks are not programmed; they learn. Learning is one of the main advantages of neural networks over traditional algorithms. Technically, training consists of finding the coefficients of the connections between neurons. During training, the neural network can identify complex dependencies between input and output data and generalize. This means that, after successful training, the network can return the correct result based on data absent from the training sample, as well as on incomplete or partially distorted data.
If a neural network consists of more than three layers, which is an increasing tendency nowadays, the algorithm can be considered deep learning or a deep neural network (DNN). Generally, deep learning is an ML technique within ANNs that analyzes big machinery data with more precise results.
NNs have been considered a universal tool for solving many problems. However, each method has its own limitations, and NNs are no exception. Usually, NNs are used in a hybrid with some other condition monitoring techniques. All the limitations of ANNs and the other mentioned ML techniques are given in the following section.
Different types of NN are used for monitoring different parameters, and a variety of applications can be found in the literature. The authors in [90] proposed a novel intelligent fault diagnosis method based on a multiscale convolutional NN to identify different failures of wind turbine gearboxes. In [91], the authors proposed an intelligent bearing fault diagnosis method combining compressed data acquisition and deep learning, which provides a new strategy for handling massive data more effectively. The authors in [92] proposed a deep transfer learning (DTL)-based method to predict the remaining useful life in manufacturing. In [93], the author suggested a novel deep convolutional NN cascading architecture for localizing and detecting defects in power line insulators. Many algorithms have been developed over the years for the automated identification of partial discharges. In [94], an application of a neural network to partial discharge images is presented, based on the convolutional neural network architecture, to recognize the aging of high-voltage electrical insulation.

Trends in Condition Monitoring and Discussion
The maintenance of electrical equipment is a very challenging topic at present. Proper, reliable, and efficient fault diagnostic techniques are becoming more and more essential as the world moves towards Industry 4.0 standards [9]. A major issue related to prediction and condition monitoring is the reliability of the methods used [95,96]. ML algorithms provide a potent tool for classification. ML methods are not a novelty, and researchers thus encounter various limitations. Nowadays, the intelligent condition monitoring methods mentioned in the previous chapters are mainly used together, as hybrids, to obtain more precise and robust fault diagnostics in industrial systems [97].
The main problem of machine learning and neural networks is the training datasets required for system training. To obtain precise results and make accurate predictions, the amount and the quality of data play a significant role. Often, a dataset contains irrelevant features, and a function is required to build a model; this function determines how flexible the model is. The main problem with the data is either overfitting or underfitting.
Big data is a trending challenge nowadays. At the same time, high dimensionality and a limited number of training samples lead to overfitting [98]. Frequently, this problem occurs with neural networks [99]. Overfitting means that the model performs very well on the training dataset but very poorly on the test dataset. The system also cannot perform well if the training set is too small or if the data are too noisy and corrupted with irrelevant features; in that case the underfitting phenomenon can occur, where the model is too simple and performs poorly even on the training data. All the examples are shown in Figure 15. As Figure 15 shows, both the underfitted and the overfitted models describe the same dataset. The overly generalized model does not give the most precise results, while the overfitted model is fixed on the training data and not flexible enough for upcoming new datasets. The challenge is to find a balance between underfitting and overfitting through the use of different models.
ML is a widespread trend in load forecasting. Many operating decisions, such as reliability analysis or maintenance planning, are based on load forecasts [100]. In this context, artificial neural networks have received significant attention for their performance. The main problem is an overfitted, sub-optimally configured ANN, which can lead to uncertain forecast results [101]. Working in dynamically changing environments can be a complicated task for NNs. Even if the network has been successfully trained, there is no guarantee that it will work in the future. The market is continually transforming, so today's model can be obsolete tomorrow. In this case, various network architectures must be tested to choose the best one that can follow changes in the environment. Moreover, in the case of NNs, a phenomenon known as catastrophic forgetting can occur. This means that NNs cannot be sequentially trained on several tasks: each new training set causes the rewriting of all neuron weights, and, as a result, the previously trained data are forgotten.
Another widespread limitation of NNs is the so-called "black box" phenomenon. As already mentioned, deep learning successfully learns the hidden layers of the NN architecture mapping inputs to outputs. Approximating the function in this way makes it impossible to gain insight into the structure and, as a result, to study the cause of a mistake. For this reason in particular, it is often reasonable to choose some other technique or to use NNs in combination with another algorithm.

Conclusions
A review of the state-of-the-art machine learning-based fault diagnostic techniques in the field of electrical machines is presented in this paper. Artificial intelligence-based condition monitoring techniques are becoming more popular as computing power increases day by day. Unlike conventional on-board processors responsible for data collection and analysis, the utilization of powerful remote resources through cloud computation gives the freedom of unlimited memory and processing power to handle the big data vital for intelligent techniques. Moreover, through effective training of AI algorithms using mathematical models with various faulty conditions, the diagnostic algorithms can be made more reliable.
The collection of such big data is hardly possible from either industry or the lab environment: in industry, because of the limited number of faulty machines in service; in the lab, because only a limited number of machines can be broken due to economic constraints. With the trend of mounting sensors on remotely located machines and collecting their data over the cloud, the processing power-related constraints are resolved. Machine learning makes up a considerably significant portion of AI techniques. For future work, the studied techniques will be implemented in practice on real industrial objects. Those techniques can use statistical or conventional signal processing techniques to detect fault-related patterns and estimate electrical machines' remaining life. Moreover, they give the flexibility to train algorithms under a variety of working conditions, which may include grid-fed operation, scalar control, low load, and changing load, for induction machines in particular and for other machines in general.