Convolutional Neural Network-Based Stator Current Data-Driven Incipient Stator Fault Diagnosis of Inverter-Fed Induction Motor

Abstract: In this paper, the idea of using a convolutional neural network (CNN) for the detection and classification of induction motor stator winding faults is presented. The diagnosis inference of the stator inter-turn short-circuits is based on raw stator current data. It offers the possibility of direct processing of the diagnostic signal, which could replace well-known analytical methods. Tests were carried out for various levels of stator failures. In order to assess the sensitivity of the applied CNN-based detector to motor operating conditions, the tests were carried out for variable load torques and for different values of supply voltage frequency. Experimental tests were conducted on a specially designed setup with a 3 kW induction motor of special construction, which allowed for the physical modelling of inter-turn short-circuits in each of the three phases of the machine. The on-line tests prove the possibility of using CNN in a real-time diagnostic system with high accuracy of incipient stator winding fault detection and classification. The impact of the developed CNN structure and training method parameters on the fault diagnosis accuracy has also been tested.


Introduction
Nowadays, drive systems based on AC (alternating current) electric motors play a key role in industrial applications. It is estimated that these machines account for approximately 29% of global and 69% of industrial electricity consumption. During the operation of electric motors in industrial drive systems, various types of defects may occur, preventing further operation of the machine. According to EPRI's statistics, the most common defects of electrical machines include stator (37%), rotor (10%), and rolling bearing (41%) damages. Statistical studies show that, as the rated power of electric motors increases, the share of mechanical damages decreases in favor of electrical damages [1]. For this reason, many research centers focus on developing diagnostic methods that enable a partial reduction of the number of electrical damages or the detection of a machine state (incipient fault) in which the machine can still be repaired.
The basis for the operation of diagnostic systems is a thorough knowledge of the changes occurring in the machine as a result of damage. Observation of changes occurring in AC machines is carried out using signals available for measurement on the tested object. The most commonly used are currents [2,3], voltages [4,5], vibrations [6,7] as well as flux [8,9] and temperature [10,11]. The idea of analytical methods for assessing the technical condition of the machine is based on the extraction of damage symptoms in measured diagnostic signals.
Among the used deep learning network structures, the most commonly used ones are convolutional neural networks (CNNs) [46][47][48][49] and autoencoders [50,51]. CNNs used in diagnostic applications are characterized by a much higher level of effectiveness compared to MLP and RBF networks, which are presented in [25,49]. In diagnostic processes, CNNs can act as a damage classification system [49], as well as provide information about the degree of damage [24]. In [51], the possibility of using CNNs as members of an autoencoder structure was presented. By comparing structures based on MLP, CNN and LSTM (long short-term memory), the authors [51] showed that the use of an appropriate structure of autoencoders significantly affects the effectiveness of the diagnostic system. An important aspect in the use of DNN in diagnostic processes is the appropriate adjustment of the measured signal to the structure and properties of the network. The input vector of DNNs can result from the signal analysis [24,25] and the directly provided diagnostic signal [35,46]. Due to the principle of operation of deep learning structures, in most cases the measured signal is converted into a 2D [25,48,52] or 3D [51] matrix. The impact of the input matrix size on the damage detection efficiency is presented, among others, in [48].
This article presents the possibility of using CNNs in the detection of inter-turn short circuits in IM stator windings, especially incipient faults. The developed application is characterized by the direct processing of a raw diagnostic signal. The authors present the possibility of extracting damage symptoms directly from the stator phase current signals, omitting well-known analytical pre-processing methods, such as the FFT, WT, and other higher-order transforms [53].
The idea of the CNN-based diagnostic system that performs the task of the fault detecting and assessing the degree of damage of the IM stator windings is described. Unlike the methods presented in [35,46] based on an easy to analyze vibration signal for the detection of mechanical damages, the developed method consists of the direct processing of the phase current signals through a DNN, which is a novelty in the detection systems of stator faults. It should be emphasized that the development of a diagnostic system based on direct analysis of the stator current is associated with additional difficulties, due to the fact that the increase in the number of shorted turns causes a similar effect as an increase in the load torque, i.e., an increase in stator current amplitude. However, the diagnostic system should detect the characteristics of the damage, i.e., distinguish between the effects of short circuits and the impact of the drive operating conditions on the diagnostic signal being analyzed. Nevertheless, the article shows the high efficiency of the developed technique for detecting incipient damage to the stator winding of the converter-fed induction machine.
The article is divided into five main sections. The introduction is followed by the second section presenting the theoretical basis of the developed CNN structure. In the third section, the discussion on diagnostic signal processing method and the training process, as well as the parameters of the developed CNN structure applied for the incipient stator fault detection of the IM is presented. The laboratory set-up is also presented in the section. In the next section, the results of experimental verification are presented, including the on-line detection of inter-turn short circuit faults in the one phase and in three stator phases of the motor operating in different conditions. In the following fifth part, the authors focused on the impact of CNN training process parameters and the proposed network structure on the effectiveness of the developed diagnostic system. The conclusions and observations resulting from the performed research are presented in the last section.

Structure of the Convolutional Network
The basic function of CNN is to extract the features of higher orders from the analyzed signal using convolutional operations. These networks do not have a preconceived architecture, parameter selection methods or rules regarding the number of convolutional layers.
Energies 2020, 13, 1475
A CNN structure should be seen as the determination of features, which progress with each additional convolutional layer. In the presented CNN application in the diagnostic process, the first layer can be understood as the filter of basic features, e.g., maximum or minimum values. The subsequent execution of the convolutional operation allows us to detect higher order features, i.e., the distances between minimum and maximum values. Therefore, the network structure will depend on the type of information provided, as well as the function performed by CNN.
To detect complex features, structures composed of multiple sets of layers are used. The ability to detect features is associated with the process of acquiring generalization skills by the network. Due to the extensive structure of CNNs, methods that avoid overfitting become very important. The structure and training method of the CNN are described below using the example of the network applied in this paper to the diagnosis of incipient damages of the IM stator winding.
The following Figure 1 presents an example of basic structure of the CNN consisting of a few sets of convolutional layers and one set of layers responsible for determining class membership. The task of the CNNs developed in the application described in this paper was to distinguish the degree of damage to the IM stator windings on the basis of information derived directly from the raw phase current signal (details will be given in the next section).
The principle of the CNN's individual layers was discussed in detail, among others, in [54][55][56][57][58][59][60][61]. The application of each of the presented layers enabled the CNN to acquire the characteristic features of the stator current data, which resulted in an increase in the efficiency of the neural detector of IM faults. The convolutional layer [54,55] performs the function of a feature detector using the convolutional operation of combining two data sets. This convolutional operation is performed for multidimensional input arrays. In the application of the CNN for stator fault detection presented in this paper, the convolutional layer acts as a filter searching for fault symptoms in the phase current waveform.
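The filtering role of a convolutional layer can be illustrated with a minimal NumPy sketch. It is not the network used in the paper: the function name conv2d_valid, the random 40 × 50 input plane and the two-element difference kernel are assumptions chosen only for illustration, and the sliding product is written as a cross-correlation without kernel flipping, which is how deep learning frameworks commonly implement the operation.

```python
import numpy as np

def conv2d_valid(x, kernel):
    """Slide `kernel` over `x` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(x[r:r + kh, c:c + kw] * kernel)
    return out

# Hypothetical input: one 40 x 50 plane standing in for a reshaped phase current.
rng = np.random.default_rng(0)
plane = rng.standard_normal((40, 50))

# Hypothetical filter: difference between horizontally adjacent samples.
kernel = np.array([[1.0, -1.0]])

features = conv2d_valid(plane, kernel)  # feature map of shape (40, 49)
```

In a trained network the kernel values are not fixed by hand as above but are learned, so that each filter responds to a pattern that helps separate the fault classes.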
To accelerate the training process and to increase the stability of the NN training, the batch normalization layer was used [56]. This layer normalizes the output of the previous layer by subtracting the average value of the batch elements and dividing the result by their standard deviation. The impact of using this normalization method is thoroughly discussed in [56]. As in the case of classic NN structures, the activation function plays a crucial role in DNN structures. The most commonly used activation function is the rectified linear unit (ReLU) [54,57]. It is mainly used as a complement to convolutional layers and allows the network to capture interactions and represent nonlinearities.
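The normalization and activation steps can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions: the learnable scale and shift parameters of a full batch normalization layer are omitted, the small constant eps is added only for numerical safety, and the example batch is made up.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature (column) over the batch dimension (rows)."""
    mean = x.mean(axis=0)
    std = np.sqrt(x.var(axis=0) + eps)
    return (x - mean) / std

def relu(x):
    """Rectified linear unit: negative values are replaced by zero."""
    return np.maximum(x, 0.0)

# Hypothetical batch of three samples with two features each.
batch = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0]])

normalized = batch_norm(batch)   # zero mean per feature
activated = relu(normalized)     # only above-average entries stay non-zero
```

Although the two features differ in scale by a factor of ten, they become identical after normalization, which is the property that stabilizes training.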
Convolutional layers provide a very large amount of information (symptoms) observed in the input matrix. In many cases, the use of so many features is pointless. Pooling layers are used for this reason [54,58]. Their task is to select only the information whose contribution to each cell (window) is the largest. For this purpose, methods that search for the maximum or average values of the cell elements are most often used. The advantage of pooling layers is that they reduce the spatial size of the data representation, which helps to prevent overfitting.
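A minimal sketch of max pooling over non-overlapping windows is given below. The window size of 2 and the 4 × 4 feature map are illustrative assumptions and do not reflect the pooling configuration of the developed networks.

```python
import numpy as np

def max_pool2d(x, size=2):
    """Keep the largest value of each non-overlapping size x size window."""
    h_out, w_out = x.shape[0] // size, x.shape[1] // size
    trimmed = x[:h_out * size, :w_out * size]   # drop rows/cols that do not fit
    windows = trimmed.reshape(h_out, size, w_out, size)
    return windows.max(axis=(1, 3))

# Hypothetical 4 x 4 feature map with values 0..15.
feature_map = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool2d(feature_map)
# pooled is [[5, 7], [13, 15]]: a 4 x 4 map reduced to 2 x 2
```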
The significant number of CNN parameters causes some difficulties in giving the neurons special characteristics, from the perspective of the generalization property. The dropout layer [54,59,60] is used to avoid a situation in which a single neuron strongly depends on the state of the others. This technique allows each of the neurons to learn a different useful feature of the analyzed input data. Due to the rejection of a number of neural connections in each iteration, the training process is accelerated, and overfitting is prevented.
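The dropout mechanism can be sketched as a random mask applied to the activations during training. The rescaling of the surviving activations by 1/(1 - rate), often called inverted dropout, is an assumption of this sketch and is not stated in the paper; the rate of 0.5 is likewise only an example.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Randomly zero a fraction `rate` of activations during training."""
    if not training or rate == 0.0:
        return x                        # inference: activations pass unchanged
    keep = rng.random(x.shape) >= rate  # True for the activations that survive
    return x * keep / (1.0 - rate)      # rescale to preserve the expected value

rng = np.random.default_rng(0)
activations = np.ones(1000)             # hypothetical layer output

dropped = dropout(activations, rate=0.5, rng=rng)
unchanged = dropout(activations, rate=0.5, rng=rng, training=False)
```

Because a different mask is drawn in every iteration, no neuron can rely on a specific neighbor being active, which is the behavior described in the text.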
The last layer in CNNs is usually the fully connected layer, which enables the assessment of the share of individual classes that are the result of the network operation [54,55]. Fully connected layers applied in CNNs mostly cooperate with softmax layers [61]. Their task is to determine the probability that the elements of the input batch belong to one of the categories. The response of the softmax layer takes the form of a vector of probabilities and is used by the classification layer to assign the CNN input matrix to one of the specified categories. For this purpose, the cross-entropy loss is calculated.
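The softmax response and the cross-entropy loss can be illustrated with the following sketch. The logits and labels are made-up numbers for three hypothetical classes and are not results from the paper.

```python
import numpy as np

def softmax(logits):
    """Convert each row of scores into a vector of class probabilities."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true classes."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked)))

# Hypothetical fully connected outputs for two samples and three classes.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])             # hypothetical true class indices

probs = softmax(logits)               # each row sums to 1
predicted = probs.argmax(axis=1)      # class chosen by the classification layer
loss = cross_entropy(probs, labels)
```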

Training of the Convolutional Network
The CNN training process is, in most cases, carried out using the stochastic gradient descent (SGD) algorithm. The SGD method determines an unbiased estimate of the gradient as the average of the sample gradients computed on a mini-batch of data. The most important parameter of the SGD algorithm is the learning rate. This value is most often chosen by trial and error, by analyzing the learning curves. Unfortunately, this approach does not guarantee an optimal learning process: a value that is too high causes rapid oscillations of the learning curve, while a value that is too low extends the training time. An alternative to the fairly slow SGD algorithm is its extension with the momentum parameter (SGDM). Unlike the SGD method, the SGDM algorithm takes the largest steps when multiple consecutive gradients point in the same direction [62]. The training process, according to the SGDM algorithm, begins with determining the initial value of the learning rate η and the momentum parameter α. Then, for a mini-batch {x_1, ..., x_m} drawn from the training set, the gradient p is estimated according to the relationship:
$$p = \frac{1}{m} \nabla_w \sum_{i=1}^{m} L(f(x_i, w), y_i)$$
where: x_i is a randomly selected element of the mini-batch of size m, and L(f(x_i, w), y_i) is the loss function calculated for the i-th sample. For the estimated gradient p, the momentum v and the parameters w are updated in accordance with the following relationships:
$$v \leftarrow \alpha v - \eta p$$
$$w \leftarrow w + v$$
where: α is the hyperparameter determining how quickly the contributions of previous gradients decay exponentially. The above algorithm is repeated until the stop condition is met, which is most often the number of training epochs or the value of the loss function. The SGDM method, thanks to the use of mini-batches to approximate the gradient, enables network training during experimental verification, using the data from current measurements. This is an advantage of this method, especially in terms of implementing CNNs in electric machine fault detection systems.
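A minimal sketch of one SGDM update is given below, assuming the common form of the rule in which the momentum is updated as v ← αv − ηp and the parameters as w ← w + v. The quadratic toy loss, the learning rate of 0.1 and the momentum of 0.9 are illustrative assumptions and are not the settings used for CNN-1 or CNN-2.

```python
import numpy as np

def sgdm_step(w, v, grad, lr, momentum):
    """One SGDM update: v <- momentum * v - lr * grad, then w <- w + v."""
    v_new = momentum * v - lr * grad
    return w + v_new, v_new

# First step from rest: the momentum term is zero, so this is a plain SGD step.
w_first, v_first = sgdm_step(np.array([1.0]), np.array([0.0]),
                             grad=np.array([1.0]), lr=0.1, momentum=0.9)

# Toy problem (assumption): L(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(200):
    w, v = sgdm_step(w, v, grad=w, lr=0.1, momentum=0.9)
```

In a real training loop the gradient passed to sgdm_step would be the mini-batch estimate p rather than the exact gradient of a toy loss.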

Short Description of the Laboratory Set-Up
The experimental verification of the proposed CNN-based fault detector was carried out on a specially designed setup with a 3 kW IM, presented in Figure 2. The construction of the IM allowed for the physical modelling of inter-turn short circuits in each of the three phases in a range of 0-5 stator turns (the parameters and nominal data of the IM, as well as the connection of the stator winding for inter-turn short-circuit modelling, are shown in Appendices A and B). The tested motor was operated under scalar control us/fs = const. The tests were carried out for various values of the load torque in the range TL = (0-1)TN. The loading machine was fed from an industrial frequency converter, operating in a torque mode, with 10 kHz modulation frequency.
During the preparation of the CNN training data, the information about the actual values of stator currents was used. The measured diagnostic signals were provided by the data acquisition measurement card (DAQ) to the diagnostic application developed in the LabVIEW software, National Instruments (Austin, TX, USA). After measuring the diagnostic signal, the input data were preprocessed with the use of the Matlab software. It should be emphasized that the information on actual drive system speed and load torque, visible in Figure 2a, was not used in the CNN-based detector; these signals were measured for monitoring the proper training process only.

Description of the Input Data Preprocessing for the Developed CNNs
In a diagnostic system using a CNN to extract fault symptoms, the measured input signals must be properly processed. The principle of CNN operation requires the appropriate adaptation of the network input matrix. The developed input batch should be a compromise between the size of the input matrix and the amount of information delivered to the network. If the input matrix is too large, the network training time increases significantly; conversely, if it is too small, it does not ensure the proper division of the input data into classes.
The following steps of the data preprocessing algorithm are illustrated in Figure 3. In the first step {1}, the phase currents of the IM were measured. The research assumed the possibility of fault detection after measuring only 2000 samples of the diagnostic signal, which constituted two full periods for the minimal frequency of the motor supply voltage equal to 10 Hz.
Then, the phase current vectors, each of size 1 × 2000, were normalized to the rated current of the tested motor {2}. In the third step, each normalized vector (expressed in relative units) was converted to a 40 × 50 matrix. This operation was performed for each of the measured phase currents {3}. The last step of the procedure consisted of combining the three matrices containing the normalized phase current samples into a three-dimensional matrix {4}. The presented method of diagnostic signal preprocessing made it possible to generate 3200 input vectors for the neural network. It should also be emphasized that the use of such a small number of samples, compared to those used in the literature [48], limited the time of a single measurement to 0.2 s.
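The preprocessing steps {2}-{4} can be sketched as follows. Several details are assumptions of this sketch rather than facts from the paper: the row-major filling order of the 40 × 50 matrix, the rated current value (a placeholder, not the data of the tested motor), and the synthetic sinusoidal currents. The 10 kHz sampling rate is not stated explicitly but follows from 2000 samples acquired in 0.2 s.

```python
import numpy as np

N_SAMPLES = 2000        # samples per phase current, as stated in the text
ROWS, COLS = 40, 50     # 40 * 50 = 2000
FS_HZ = 10_000          # derived: 2000 samples / 0.2 s

def preprocess(i_a, i_b, i_c, rated_current):
    """Build one 40 x 50 x 3 CNN input from three raw phase current vectors."""
    planes = []
    for phase in (i_a, i_b, i_c):
        phase = np.asarray(phase, dtype=float)
        if phase.shape != (N_SAMPLES,):
            raise ValueError(f"expected {N_SAMPLES} samples per phase")
        normalized = phase / rated_current               # step {2}
        planes.append(normalized.reshape(ROWS, COLS))    # step {3}, row-major (assumed)
    return np.stack(planes, axis=-1)                     # step {4}

# Synthetic example: 10 Hz three-phase currents, two full periods.
RATED_CURRENT_A = 7.0   # hypothetical placeholder value
t = np.arange(N_SAMPLES) / FS_HZ
i_a = 5.0 * np.sin(2 * np.pi * 10 * t)
i_b = 5.0 * np.sin(2 * np.pi * 10 * t - 2 * np.pi / 3)
i_c = 5.0 * np.sin(2 * np.pi * 10 * t + 2 * np.pi / 3)

x = preprocess(i_a, i_b, i_c, RATED_CURRENT_A)   # shape (40, 50, 3)
```

With the row-major order assumed here, each row of a plane holds 50 consecutive samples, so adjacent rows are 5 ms apart; a column-major reshape, which is the default in Matlab, would place consecutive samples along the columns instead.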
The analysis of the results is presented below and in the next section of the paper.
The NN training process was performed using the Matlab environment. The basic quantities describing the developed structures and the parameters of the CNN training process are listed in Table 1.
In the research, two structures of convolutional networks, CNN-1 and CNN-2, were used to classify the degree of damage to the stator winding of the IM, based on the information about the phase currents. The differences in the structures of the discussed networks result from the task assigned to them (assessment of the number of categories). The extension of the CNN-1 structure with an additional convolutional layer gave the network the ability to recognize in which phase of the stator winding a damage occurred. A slight expansion of the structure allowed for a triple increase in the number of the IM stator fault categories. As shown in Table 1, the developed networks are characterized by a relatively small number of neurons compared to known convolutional structures [48]. This results from the fact that to solve some problems, it is required to use many layers detecting features, not necessarily having a large number of neurons.
After the training processes were completed, both developed CNNs were tested in on-line operation on raw stator current data different from those used in the learning procedures. The results of the operation of the CNN-based detectors of inter-turn short circuits in the IM stator winding are presented in the next section.

Analysis of Experimental Results of the CNN-Based Stator Winding Fault Detectors Working On-Line
The experimental verification of the developed structures was performed on the basis of prepared test data of the following sizes: 600 for CNN-1 and 1600 for CNN-2. They contained information about the measured stator currents for various degrees of stator winding faults. The test data were developed for various frequencies of the supply voltage and various load torque values of the motor. Analyzing the responses of the developed CNN structures to the test data, a very high degree of effectiveness in assessing the degree of stator winding damage can be observed. Figure 4 presents an example of the on-line operation of the fault detector based on CNN-1. Physical damage modeling consisted of temporary shorting of five turns of phase A. This approach allowed the determination of the minimum detection time T_d, as well as the damage classification time T_c. Detection time should be understood as the time between the occurrence of a winding short-circuit and the first CNN output information indicating damage to the machine. The classification time, in turn, is measured from the moment the damage occurs until the system response is established, and should not be longer than the measurement time of 2000 current samples. The test results presented in Figure 4 were obtained during IM operation at various values of the supply voltage frequency and the load torque.
The analysis of the obtained results shows that the developed neural structure allows for the correct detection of defects, even when the measurement covers a transient state of the machine. In addition, the machine operating conditions do not affect the precision of the diagnostic system. As can be seen in Figure 4a-d, as the CNN input data buffer is filled with subsequent samples characteristic of a faulty winding, the CNN shows a gradual change in the category of damage. The same behavior can be observed in Figure 4 during the clearance of the damage. The analysis of transient states shows that the assessment of the level of damage is based on changes in the whole input matrix. On the other hand, damage detection is based on outlier samples, hence the detection time is much shorter than the classification time. Nevertheless, the main task of the diagnostic system is damage detection, while grade assessment is a secondary function.
Figure 5 shows the accuracy of the fault level assessment for both tested CNN-based detectors. The fault category assessment accuracy was 99.3% for inter-turn short circuits in a single phase (Figure 5a, CNN-1) and 98.8% for fault detection in three phases (Figure 5b, CNN-2). Errors in the assessment of the winding condition most frequently resulted from the impact of the load torque on the value of the measured current amplitudes. In Figure 5b, the individual phases of the stator winding in which the inter-turn short-circuits occurred are marked with different colors. The analysis of the results presented in this figure showed that CNN-2 provided a small number of incorrect responses regarding the phase in which a failure occurred (approximately 0.43%). The incorrect network responses occurred mainly at small values of the load torque.
Nevertheless, both developed structures maintained a very high degree of effectiveness, although no advanced signal processing methods (FFT, DWT, HHT) were used in the diagnostic procedure. The fact that the raw diagnostic signal (measured stator current in one or three phases) went directly to the input of the CNN is, to the authors' knowledge, a novelty in the field of the diagnostics of inter-turn short circuits of electrical machines. Additionally, the use of analytical diagnostic methods requires a few seconds of signal measurement. In the presented CNN application, the input vector was based on the current measurement in a time interval of at most 0.2 s; however, the detection and classification times are usually shorter, as can be seen in the examples in Figure 4. The differences in damage detection and classification times visible in the waveforms shown in Figure 4 are due to the fact that, under variable operating conditions of the drive (frequency or load change), the NN needs more or fewer samples to determine the damage class correctly. Therefore, the proposed CNN structures shorten the detection process several times while maintaining a high efficiency level, which is an undoubted advantage given the avalanche-like spreading of the analyzed damages, such as inter-turn short-circuits of the stator winding.

Impact of the Convolutional Network Training Parameters on the Effectiveness of IM Stator Damage Assessment
The appropriate selection of the network structure for the analyzed task can ensure high CNN precision only with the properly adjusted parameters of the training process. The research focused on the simple gradient algorithm, which is the most popular method in the deep network training [62]. Detailed studies on the impact of training process parameters were carried out for the developed CNN-1 and CNN-2 structures and the results are shown in Figure 6.
According to the SGDM algorithm, the right choice of the data packet size, momentum factor and the frequency of parameter update (number of epochs, number of iterations in the epoch) is crucial.
As observed in Figure 6, the impact of the training parameters on the effectiveness of individual structures is similar. The differences result from a larger number of layers in CNN-2, as well as from a much broader learning data vector. The largest differences in the accuracy of the neural detector are noticeable when the momentum factor changes. The CNN-1 structure, with a much smaller number of classes required for recognition, is characterized by higher efficiency for much higher values of the momentum coefficient (Figure 6c). The high value of this parameter during the training of the CNN-1 network resulted in a rapid decrease in the value of the loss function, and at the same time an increase in the number of learning epochs necessary to ensure maximum precision (Figure 6a). The analysis of Figure 6b shows that the size of the data batch significantly affects the training process of both developed structures. The research abandoned the standard approach of choosing the batch size as a power of two. That approach results from the fact that memory operations are optimized for processing arrays of size 2^n, while the structures presented here did not require such accurate optimization of the training process.
Selecting the structures and the parameters of the training process in accordance with this principle can be justified for much more complex CNN structures. The right number of network parameter updates is much more important for the training process. This property is directly related to the number of learning epochs and the batch size. With a fixed number of epochs, the number of network parameter updates decreases as the batch size increases. Therefore, as the batch size increases, the number of learning epochs should be increased accordingly. The number of parameter updates, equal to the quotient of the training vector size and the batch size, should always be constant. A very important technique used to improve the training process is the shuffling of the validation and training data. As can be seen in Figure 6d, the structures in which shuffling was used are characterized by a much higher level of efficiency. Shuffling before each training epoch yields the highest level of effectiveness. When the size of the training data batch does not contain an even number of samples from each category, some samples are discarded. Shuffling the data before each epoch prevents the situation in which the same data are regularly discarded in every epoch.
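The relation between batch size, number of epochs and number of parameter updates, as well as the effect of shuffling on discarded samples, can be sketched as follows. The sketch assumes that the samples which do not fill the last complete batch of an epoch are the ones discarded; this is a common convention and an assumption here, not a detail confirmed by the text. All function names and sizes are illustrative.

```python
import random


def updates_per_epoch(n_samples, batch_size):
    """Full batches per epoch; an incomplete final batch is dropped."""
    return n_samples // batch_size


def total_updates(n_samples, batch_size, epochs):
    """Parameter updates over the whole training run."""
    return epochs * updates_per_epoch(n_samples, batch_size)


def epoch_batches(indices, batch_size, shuffle, rng):
    """Split sample indices into full batches, optionally shuffling first."""
    order = list(indices)
    if shuffle:
        rng.shuffle(order)
    n_full = len(order) // batch_size
    return [order[i * batch_size:(i + 1) * batch_size] for i in range(n_full)]


def never_seen(n_samples, batch_size, epochs, shuffle, seed=0):
    """Indices that no batch contained over the whole training run."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(epochs):
        for batch in epoch_batches(range(n_samples), batch_size, shuffle, rng):
            seen.update(batch)
    return set(range(n_samples)) - seen
```

For example, with 1200 samples, doubling the batch size from 30 to 60 halves the updates per epoch, so the number of epochs has to double to keep the total number of updates unchanged. Without shuffling, the same tail of the data set is dropped in every epoch, whereas shuffling before each epoch changes which samples fall outside the full batches.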

Impact of the Convolutional Network Structure on the Effectiveness of IM Stator Damage Assessment
The task of assessing the technical condition of electrical machines operating in converter-fed drives imposes the additional requirements for neural systems such as a high level of detection efficiency, high accuracy in class evaluation, insensitivity to measurement disturbances, and the ability to generalize. Meeting these requirements is possible only with the appropriate selection of the network structure, learning methods, as well as the appropriate choice of the diagnostic information carrier. As presented in the previous section, changes in the network structure affect changes in training process parameters.

In order to optimize the process of selecting network parameters as much as possible, one should simultaneously examine the impact of changes in the structure and in the learning process. Table 2 presents a summary of various CNN-1 structure configurations, used to identify the relationship between the number of layers and the network efficiency. The table is divided into five sections, according to the different changes made in the network structure. Bold font in this table shows what was changed in the CNN structure in a given experiment; e.g., a bold Activation function indicates that this function was changed, while the rest of the CNN's parameters remained the same. The best obtained accuracy is also marked in bold. In the first step, the effect of the number of convolutional layers was analyzed. It was noted that, starting from the basic structure of CNN-1 with a fixed number of categories to recognize, adding a further convolutional layer does not increase the network efficiency. However, the additional layer significantly increased the number of network parameters, which made the training process longer.
Subsequent changes in the CNN structure concerned the activation function. The undoubted advantage of the ReLU (rectified linear unit) function used together with SGDM is a significant increase in the dynamics of the training process compared to the hyperbolic tangent function. The analysis of the results presented in the second section of Table 2 shows the superiority of the ReLU function over the other activation functions.
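One commonly cited reason for the faster training with ReLU, which the text does not state explicitly, is that the derivative of the hyperbolic tangent vanishes for large inputs while the derivative of ReLU stays equal to one for every positive input. The following minimal sketch compares both derivatives; the function names are illustrative.

```python
import math


def relu(x):
    """Rectified linear unit: passes positive inputs, zeroes the rest."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU (taken as 0 at x = 0 by convention)."""
    return 1.0 if x > 0 else 0.0


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2, which shrinks as |x| grows."""
    return 1.0 - math.tanh(x) ** 2
```

For an input of 5, the ReLU derivative is still 1, whereas the tanh derivative is already far below 0.001, so gradient-based updates passing through a saturated tanh unit become very small.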
The use of pooling layers in the CNN structure makes the network more resistant to noise. This results from the principle of operation of this layer, which provides information about the maximum or average values of a given window. The combination of convolutional and pooling layers results in a significant increase in the network precision, which can be observed by analyzing the results from the third section of Table 2. The advantage of the maximum search method over the calculation of the average value of the window is also noticeable. A clear improvement in CNN performance is visible when batch normalization is used at the output of each of the convolutional layers (see the fourth section of Table 2). Thanks to this approach, the mapping of features is independent of their spatial location, which directly affects the network efficiency, especially when the test data differ significantly from the training data.
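The two pooling variants and the normalization of a batch of values can be sketched in one dimension as follows. This is a simplified illustration with non-overlapping windows and without the learnable scale and shift that normalization layers usually include; the function names and the window size are assumptions, not the configuration from Table 2.

```python
import math


def pool1d(signal, window, mode="max"):
    """Non-overlapping 1-D pooling; a trailing partial window is ignored."""
    if mode not in ("max", "mean"):
        raise ValueError("mode must be 'max' or 'mean'")
    pooled = []
    for start in range(0, len(signal) - window + 1, window):
        chunk = signal[start:start + window]
        pooled.append(max(chunk) if mode == "max" else sum(chunk) / window)
    return pooled


def batch_normalise(values, eps=1e-5):
    """Shift and scale a batch of values to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]
```

Max pooling keeps the strongest response in each window, while average pooling smooths it together with its neighbours, which is the difference compared in the third section of Table 2.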
The use of the dropout layer increases the training time, but it also yields more generalized models with higher precision, which can be seen in the fifth section of Table 2.
Only one dropout probability, equal to 0.5, was considered, because this parameter did not influence the network effectiveness. The dropout layer was applied only before the fully connected layers, so as not to lose a part of the input data set. The analysis of the fifth section of Table 2 also shows that using multiple fully connected layers to better determine the impact of individual features does not have the intended effect. Therefore, it is enough to use one such layer with the number of neurons equal to the number of recognized categories, in combination with the classifying layers.
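The operation of a dropout layer with a probability of 0.5 can be sketched as below. The sketch uses the common "inverted dropout" variant, in which the retained activations are rescaled during training; this variant and the function signature are assumptions made for illustration, not details taken from the authors' implementation.

```python
import random


def dropout(activations, p=0.5, training=True, rng=None):
    """Zero each activation with probability p and rescale the survivors."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    if not training or p == 0.0:
        # At inference time the layer passes its input through unchanged.
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - p
    # Dividing by keep preserves the expected value of each activation.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

Placed before the fully connected layers, such a layer randomly removes extracted features rather than raw input samples, which is consistent with the placement described above.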

Conclusions
The main goal of the research presented in this paper has been achieved. Experimental tests have shown that a CNN with a relatively simple structure, compared to those used in the literature, gives interesting results in the detection and classification of IM incipient stator winding faults. The proper detection of the incipient inter-turn short circuits of the IM stator winding was achieved after measuring at most 2000 samples of the diagnostic signal (the stator current transient). It should also be emphasized that the use of such a small number of samples, compared to those used in the literature, limits the time of stator winding fault detection to a maximum of 0.2 s, which is directly connected with the size of the input matrix. Moreover, the on-line tests show that a high accuracy of fault detection is obtained based on fewer than 200 samples, equivalent to 0.02 s of measurement. In addition, the detection system responses do not depend on the motor operating conditions. The impact of changes in the structure of the analyzed CNNs, as well as in the parameters of the learning process, has been analyzed in the presented research, and a detailed discussion is presented in the fourth section of this article. The two developed CNN structures, one for assessing the degree of damage to a single stator phase and the other for assessing the degree of damage to all three phases, maintain a very high degree of effectiveness based on the direct processing of the raw measured stator current data, without the use of advanced signal processing methods (FFT, DWT, HHT) in the diagnostic procedure. This is, to the best of the authors' knowledge, a novelty in the diagnosis and classification of stator winding faults using NNs, for both shallow and deep learning structures.
The main advantages of the presented IM stator winding fault detectors based on CNN networks and stator current measurement are:
• the use of raw measurement data as network input signals,
• no need to pre-process measurement data with analytical methods,
• high accuracy in detecting the localization of the failure (motor phase) and the fault level (number of shorted turns),
• the ability to detect even individual shorted turns (incipient faults),
• a merger of symptom extraction and damage detection in one machine learning process.
It has also been shown that, after proper training, the developed CNNs can be used for on-line fault detection and fault level assessment through the cooperation of Matlab and LabVIEW software. Future work of the authors will focus on the on-line implementation of CNN-based stator winding incipient fault detectors using microcontroller systems and FPGA.

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A
Table A1. Rated parameters of the tested induction motor.

Name of the Parameter Symbol Units
Power