Automatic Classification of Rotor Faults in Soft-Started Induction Motors, Based on Persistence Spectrum and Convolutional Neural Network Applied to Stray-Flux Signals

Due to their robustness, versatility and performance, induction motors (IMs) have been widely used in many industrial applications. Despite their characteristics, these machines are not immune to failures. In this sense, breakage of the rotor bars (BRB) is a common fault, which is mainly related to the high currents flowing along those bars during start-up. In order to reduce the stresses that could lead to the appearance of these faults, the use of soft starters is becoming usual. However, these devices introduce additional components in the current and flux signals, affecting the evolution of the fault-related patterns and so making the fault diagnosis process more difficult. This paper proposes a new method to automatically classify the rotor health state in IMs driven by soft starters. The proposed method relies on obtaining the Persistence Spectrum (PS) of the start-up stray-flux signals. To obtain a proper dataset, Data Augmentation Techniques (DAT) are applied, adding Gaussian noise to the original signals. Then, these PS images are used to train a Convolutional Neural Network (CNN), in order to automatically classify the rotor health state, depending on the severity of the fault, namely: healthy motor, one broken bar and two broken bars. This method has been validated by means of a test bench consisting of a 1.1 kW IM driven by four different soft starters coupled to a DC motor. The results confirm the reliability of the proposed method, obtaining a classification rate of 100.00% when analyzing each model separately and 99.89% when all the models are analyzed at a time.


Introduction
Induction Motors (IMs) are used in a large share of industrial applications in industrialized countries [1]. Their robustness, reliability, easy maintenance and low cost, among other characteristics, have contributed to this fact. Squirrel Cage Induction Motors (SCIM), more specifically, are a significant part of the IMs used in those applications [2], consuming almost 89% of the power that industrial facilities demand [3]. Despite those characteristics, SCIM are not immune to failures. Due to the high currents during the start-up and other transients, their rotor bars are subjected to thermo-mechanical stresses that can lead to a fault. This is particularly true in applications where continuous start-stop cycles are required [4]. To avoid stresses during the start-up, several starting systems are used in the industry. Auto-transformers, stator resistors, soft starters and star-delta starting are the most usual ones [5]. In this context, soft starters have become one of the most preferred starting systems due to their advantages. By means of a power electronics circuit, based typically on thyristors connected in anti-parallel [...]
[...] as a method to detect the presence and the severity of bar breakages in induction motors driven by soft starters. The accuracy rate achieved in this work was 94.40%. Likewise, in [28], the authors used Linear Discriminant Analysis (LDA) and an FFNN, applied to a combination of current and stray-flux signals, to detect the presence and the severity of bar breakages in an IM driven by soft starters. In this case, the accuracy rate achieved was 94.40%.
In view of the above, the automatic methods for rotor fault detection in soft-started induction motors still leave room for improvement. This work presents a new methodology for the automatic detection and severity categorization of rotor faults in induction motors driven by soft starters. The novelty of the proposed methodology is the use of the Persistence Spectrum (PS) applied to the start-up transient stray-flux signals. Then, a convolutional neural network (CNN) is used to automatically categorize the severity of the rotor faults. Data Augmentation Techniques (DAT) are used to enlarge the dataset, since they have been proven to be a reliable way to enhance the database used to train a CNN. In [18,27] it was stated that DAT are a reliable way to deal with the scarcity of samples, providing a good dataset for a CNN. In particular, adding Gaussian noise to a signal is a commonly employed DAT.
On the other hand, the Persistence Spectrum (PS), also known as Power-Spectrum Histogram, shows the percentage of time that a particular frequency is present in a signal. That is, the longer a given frequency persists in a signal as it evolves, the higher its percentage of time and, therefore, the brighter it appears in the PS. Hence, any hidden component in the signal will be revealed, even if it is a weak one. Thus, the PS images are suitable for use as input to a CNN.
To summarize, since the use of soft starters makes it more difficult to identify the fault-related patterns with the most commonly used time-frequency tools, the main goal of this work was to obtain a methodology that led to an easier identification of the presence of rotor faults in soft-started induction motors. Additionally, the suitability of the method to perform an automatic fault classification system was also a goal of this work. Having this in mind, the characteristics of the PS made it suitable for this application, one of them being its ability to reveal very short events present in the analyzed signal. Finally, the stray-flux during the startup transient was the chosen magnitude for this study due to its richer harmonic content compared to other magnitudes.
In order to verify the effectiveness of the proposed methodology, a test-bench consisting of a 1.1 kW induction motor and a DC motor acting as a load was used. Four different commercial soft starters were used to start the motor. The obtained results, achieving an accuracy rate of 100% for each model separately and 99.89% for all the models together, show the capabilities of the proposed approach.
Finally, the structure of the paper is as follows: Section 2 presents the materials and methods, including the theoretical background and the proposed methodology. Section 3 describes the experimental setup used for the tests. Section 4 shows the results and their discussion and, finally, Section 5 gives the conclusions of the study.

Stray-Flux Analysis
In recent years, the use of the magnetic flux generated by electrical motors to obtain information about their health state has gained interest. The analysis of this magnitude has been proven to be a good alternative to other typical techniques used in the industry for the condition monitoring of electrical motors (e.g., MCSA).
Within this methodology, two approaches have arisen: (1) air-gap flux analysis [29] and (2) stray-flux analysis [30]. The second one has attracted significant interest for several reasons, the most important being the low cost of the required sensors [31], their simple and flexible installation on the frame of the motor [20], the fact that it is a non-invasive technique [17] and the fact that it remains reliable in some cases where other techniques yield false positive fault indications [4,32].
Due to the non-invasive nature of the stray-flux analysis technique, it is possible to install sensors in different positions on the motor frame. This allows one to capture different flux components depending on the sensor position [33]. In an induction motor, axial and radial stray-flux components can be distinguished [34]. It has been proven in other works that the presence of faults in electrical motors may affect the stray-flux, amplifying some specific frequency components of the stray-flux signal that depend on the existing fault [35]. In Figure 1, the different stray-flux components and positions of the sensor are shown. In position A, mainly the axial flux is captured by the sensor, while in position C, mainly the radial component is acquired. Finally, setting the sensor in position B allows one to capture a combination of radial and axial stray-flux.

Fault-Related Patterns: Theoretical Frequency Evolution during the Start-Up Transient
Many previous works have shown that the presence of rotor faults affects the stray-flux, amplifying some specific harmonics which are related to each fault. In particular, rotor bar breakages affect the following harmonics in the Fourier spectrum of stray-flux signals:

1. Sideband harmonics ($f_{SH}$): These harmonics mainly appear in the radial component of the stray-flux [35,36]. Their frequency values can be calculated by Equation (1):

$$f_{SH} = f \cdot (1 \pm 2 \cdot s) \quad (1)$$

2. Axial components: Mainly observed in the axial component of the stray-flux [17], they can be calculated by Equations (2) and (3):

$$f_{axial} = s \cdot f \quad (2)$$

$$f_{axial} = 3 \cdot s \cdot f \quad (3)$$

For all the above-mentioned components, $s$ refers to the slip and $f$ is the supply frequency.
The theoretical transient evolutions of the radial and axial components related to rotor bar breakages are shown in Figure 2. As can be seen in that figure, the upper sideband harmonic, given by $f_{SH} = f \cdot (1 + 2 \cdot s)$ and depicted in blue, drops from 150 Hz to almost 50 Hz. On the other hand, the lower sideband harmonic, given by $f_{SH} = f \cdot (1 - 2 \cdot s)$ and depicted in orange, drops from 50 Hz to 0 Hz and then rises to almost 50 Hz. Regarding the axial components, $s \cdot f$ (depicted in yellow) drops from 50 Hz to almost 0 Hz, while $3 \cdot s \cdot f$ (depicted in purple) drops from 150 Hz to almost 0 Hz. That evolution is valid for stray-flux signals during a direct online start-up transient.
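As an illustration of the trajectories described above, the following minimal sketch evaluates the four fault-related frequencies for a few slip values. The function name `rotor_fault_frequencies`, the 50 Hz default and the linear slip ramp are assumptions made for this example only; the absolute value is taken so that the lower sideband is reported as a positive frequency, which is what produces its drop to 0 Hz followed by a rise.

```python
import numpy as np

def rotor_fault_frequencies(slip, f_supply=50.0):
    """Theoretical fault-related frequencies (Hz) for the given slip values."""
    s = np.asarray(slip, dtype=float)
    return {
        # Sideband harmonics, f * (1 +/- 2s); magnitudes are reported
        "upper_sideband": np.abs(f_supply * (1 + 2 * s)),
        "lower_sideband": np.abs(f_supply * (1 - 2 * s)),
        # Axial components, s*f and 3*s*f
        "axial_sf": s * f_supply,
        "axial_3sf": 3 * s * f_supply,
    }

# During start-up the slip falls from 1 (standstill) towards 0.
# The linear ramp below is purely illustrative, not a measured slip curve.
slip = np.linspace(1.0, 0.0, 5)
freqs = rotor_fault_frequencies(slip)
for name, values in freqs.items():
    print(name, np.round(values, 1))
```

In a real start-up the slip never reaches exactly zero, which is why the text speaks of "almost" 50 Hz and "almost" 0 Hz at the end of the transient.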

Persistence Spectrum
Persistence Spectrum (PS) is a commonly used technique in spectrum analyzers. Also known as Spectrum Histogram, it is a histogram in power-frequency space. It allows one to see the percentage of time that a specific frequency is present in a signal. The more time a specific frequency persists in a signal as it evolves, the brighter it will appear in the PS. Therefore, it allows one to see very short events and even low power signals hidden in other signals [37].
The procedure to obtain the PS follows the steps listed below [37]:

• Step 1: The original signal is split into segments of the same length (see Figure 3). These segments may or may not overlap, but overlapping leads to more detailed spectrum analyses. The time resolution, or segment length, has to be equal to or smaller than the signal length. The number of segments, $k$, is given by Equation (4):

$$k = \left\lfloor \frac{N_x - L}{M - L} \right\rfloor \quad (4)$$

with $N_x$ being the signal duration or length, $L$ the length of the overlap and $M$ the time resolution or segment length. The symbols $\lfloor \; \rfloor$ denote the floor function, which rounds the result down to the nearest integer.

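• Step 2: Once the signal is split, the power spectrum of each segment is computed by applying the Short-Time Fourier Transform (STFT), as shown in Figure 3. The STFT matrix is obtained by applying Equation (5), with one column per segment, $k$ being the number of segments:

$$X(f) = \left[ X_1(f) \;\; X_2(f) \;\; X_3(f) \;\; \cdots \;\; X_k(f) \right] \quad (5)$$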
Sensors 2023, 23, 316
As stated in [38–40], the mth element of the STFT matrix is given by Equation (6):

$$X_m(f) = \sum_{n=-\infty}^{\infty} x(n) \, g(n - mR) \, e^{-j 2 \pi f n} \quad (6)$$

where $x(n)$ is the input signal at time $n$, $g(n)$ is the window function (a Kaiser window in this work), $X_m(f)$ is the Discrete Fourier Transform (DFT) of the windowed data centered at time $mR$, and $R$ is the number of samples between subsequent DFTs (the difference between the segment length and the overlap length).
For each segment, as stated in [39,41], the power spectrum is given by Equation (7).

• Step 3: A bivariate histogram of the logarithm of the power spectrum is computed for each time value, where each segment corresponds to one time value. Every power-frequency bin in which there is signal energy at that time increases the corresponding matrix element by 1 (see Figure 3).

• Step 4: Once all the bivariate histograms are obtained, an accumulated histogram is plotted against the frequency (X axis) and the power (Y axis). Brighter colors represent a higher presence in time of a component.
In Figure 3, an overview of the Persistence Spectrum computation procedure is shown [37]. In this figure, a 50% overlap rate is applied, which is the same rate used in this work.
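The four steps above can be sketched in a few lines of NumPy. This is only an illustrative approximation under stated assumptions, not the implementation used in this work: the function name `persistence_spectrum`, the number of power bins, the Kaiser shape parameter `kaiser_beta` and the power normalization are all choices made for the example, and the test signal is a synthetic 50 Hz sine rather than a stray-flux recording.

```python
import numpy as np

def persistence_spectrum(x, fs, seg_len, overlap, n_power_bins=64,
                         power_limits_db=(-100.0, 0.0), kaiser_beta=8.0):
    """Accumulated power-frequency histogram, in percent of segments."""
    x = np.asarray(x, dtype=float)
    hop = seg_len - overlap                      # R: samples between DFTs
    n_segments = (len(x) - overlap) // hop       # floor((Nx - L) / (M - L))
    window = np.kaiser(seg_len, kaiser_beta)     # beta value is an assumption
    freqs = np.fft.rfftfreq(seg_len, d=1.0 / fs)
    edges = np.linspace(power_limits_db[0], power_limits_db[1],
                        n_power_bins + 1)
    hist = np.zeros((n_power_bins, freqs.size))
    cols = np.arange(freqs.size)
    for m in range(n_segments):
        # Steps 1 and 2: windowed segment and its spectrum
        segment = x[m * hop:m * hop + seg_len] * window
        spectrum = np.fft.rfft(segment)
        # One possible normalization (assumed here, not taken from the paper)
        power = (np.abs(spectrum) / window.sum()) ** 2
        power_db = 10.0 * np.log10(power + 1e-20)
        # Step 3: add 1 to the power-frequency bin hit at this time value
        rows = np.digitize(power_db, edges) - 1
        valid = (rows >= 0) & (rows < n_power_bins)
        hist[rows[valid], cols[valid]] += 1
    # Step 4: accumulated histogram expressed as a percentage of time
    return freqs, edges, 100.0 * hist / n_segments

fs = 5000                                        # sampling rate in Hz
t = np.arange(2 * fs) / fs                       # 2 s synthetic signal
x = np.sin(2 * np.pi * 50 * t)
freqs, edges, persistence = persistence_spectrum(x, fs, seg_len=1000, overlap=500)
print(persistence.shape)
```

With a 50% overlap, as in the example call, each sample contributes to two segments. A stationary component such as the 50 Hz sine lands in the same power-frequency bin in every segment and therefore reaches the maximum persistence, whereas a component that sweeps in frequency during the start-up spreads its counts over many bins.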

Convolutional Neural Network (CNN)
Being a type of Artificial Neural Network (ANN), CNNs perform especially well in image recognition. They are composed of an input layer, several hidden layers and an output layer. What differentiates CNNs from other ANNs is the presence of at least one convolutional layer among the hidden layers. It is this convolution operation that identifies local characteristics of the input data that can be used for the classification.
The basics of CNNs are explained in [42]. Nevertheless, their operation is summarized here:

• Learning stage: Assuming that the input data $x^{l-1}$ includes $m$ 2-D matrices, they are convolved in the convolutional layer with the learnable kernels of that layer. That is, each input matrix $x_i^{l-1}$ ($i \in m$) is convolved with the kernel (or filter) $k_j$. The results are then summed and added to the bias $b^l$. Finally, the activation function $f$ (typically a ReLU function) is applied to the result, producing the final output of the jth kernel (or filter). This is mathematically expressed in Equation (8):

$$x_j^{l} = f\left( \sum_{i \in m} x_i^{l-1} * k_j + b^{l} \right) \quad (8)$$

After this, a batch normalization layer is typically used. It speeds up the training by normalizing every input channel across a mini-batch. Finally, a pooling layer divides the input into smaller regions and then calculates the average or the maximum of each region [43].

• Classification stage: Consisting typically of two layers (a fully connected layer and a classification layer), this stage combines all the features extracted from the input data in the learning stage. First, the fully connected layer generates a vector with as many dimensions as the number of classes the CNN is able to predict. Then, a classification layer, usually using a softmax function, provides the classification output.
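To make the two stages more concrete, the sketch below implements the forward pass of one filter (convolution, bias and ReLU, as in Equation (8)), a max-pooling step and a softmax output in plain NumPy. All function names are hypothetical, the loops favor readability over speed, and one kernel slice per input matrix is assumed, which is how most CNN libraries organize a filter. As in those libraries, the "convolution" is computed as a cross-correlation.

```python
import numpy as np

def conv2d_valid(x, k):
    """2-D 'valid' cross-correlation of matrix x with kernel k."""
    kh, kw = k.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(x[r:r + kh, c:c + kw] * k)
    return out

def conv_layer_output(inputs, kernels, bias):
    """Output of one filter: ReLU(sum_i conv(x_i, k_i) + b)."""
    total = sum(conv2d_valid(x_i, k_i) for x_i, k_i in zip(inputs, kernels))
    return np.maximum(total + bias, 0.0)          # ReLU activation

def max_pool(x, size=2):
    """Non-overlapping max pooling over size x size regions."""
    h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    blocks = x[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

def softmax(z):
    """Class probabilities from the fully connected layer output."""
    e = np.exp(z - np.max(z))                     # shift for numerical stability
    return e / e.sum()

# Toy example: two 4x4 input matrices and one filter with 3x3 kernel slices
inputs = [np.ones((4, 4)), np.ones((4, 4))]
kernels = [np.ones((3, 3)), np.ones((3, 3))]
feature_map = conv_layer_output(inputs, kernels, bias=1.0)
print(feature_map)
print(softmax(np.array([2.0, 1.0, 0.1])))         # three classes, as in this work
```

Batch normalization is omitted here for brevity; it would be applied to `feature_map` before pooling.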

Proposed Methodology
The proposed methodology consists of 5 main steps. First, the current and stray-flux signals are captured by means of different sensors. Then, White Gaussian Noise is added to the original signals to increase the number of signals. In the third step, the persistence spectrum of each signal is computed. Then, the PS images obtained are cropped and resized to adapt them to the requirements of the CNN and, finally, these images are used as input to the classification CNN in the fifth step. In other words, the input of the CNN is the set of PS images after being cropped and resized to 224 × 224 × 3 images, while the output of the CNN is one of the three rotor fault classes, namely healthy state, one broken bar and two broken bars.
To better illustrate the sequence of the procedure, a flow diagram of the proposed methodology is shown in Figure 4. As mentioned before, the proposed methodology follows the steps listed here:

• Step 1: Acquisition of the current and stray-flux signals. These two magnitudes were captured simultaneously during the start-up transients. To do this, a current clamp and a coil-based flux sensor, both of them described in the next section, were used. The signals were stored in a waveform recorder (oscilloscope) and then downloaded to a PC, where the signal analyses were performed. The selected position of the flux sensor was the one that captures a combination of radial and axial stray-flux (Position B, see Figure 1).

• Step 2: Data augmentation. In order to generate a higher number of training samples, a Data Augmentation Technique (DAT) was applied. In this case, the addition of White Gaussian Noise (WGN) to the original stray-flux signals was the selected technique.
The addition of Gaussian noise to the original signals is a frequently used data augmentation tool. This technique can increase the dataset by selecting different values of the standard deviation (σ) [44] or, since it directly affects the value of σ, different values of the Signal-to-Noise Ratio (SNR). In this regard, also in [44], it is shown that SNR values smaller than 10 dB bring little improvement to the accuracy of the classification methods. On the other hand, the authors of [45] pointed out that large SNR ranges in the injected noise lead to better performance on the test datasets. In other works, as in [46], the authors set the SNR range of the injected noise between 10 dB and 20 dB. Taking all this into account, and given that an SNR of 20 dB is commonly considered a good value for Additive White Gaussian Noise (AWGN) in electrical signals [19], a set of Gaussian noise signals with SNR values from 10 dB to 20 dB, in steps of 0.2 dB, was generated for this work. Thus, the number of signals of the dataset, including the original ones, reached the values shown in Table 1.

Table 1. Number of signals in the dataset, including the original ones.

| Soft-Starter | Healthy | 1 BB | 2 BB |
|---|---|---|---|
| Schneider | 420 | 420 | 420 |
| ABB | 252 | 252 | 252 |
| Siemens | 336 | 336 | 336 |
| Omron | 252 | 252 | 252 |
| Total | 1260 | 1260 | 1260 |

In Figure 5, a comparison between one original stray-flux signal and three of the signals with AWGN is shown. In addition, the Persistence Spectra computed for each of the mentioned signals are shown.

• Step 3: Computation of the Persistence Spectrum of each signal. Once all the stray-flux signals were obtained, both the captured ones and those resulting from the data augmentation process, the start-up transient was identified and isolated from the signal itself. Then, the Persistence Spectrum (PS) was computed for all the transients obtained, setting an overlap of 50% and using the Kaiser window as window function. The process to obtain the PS was the one described in Section 2.3. As a result, a set of 3780 images was obtained, one for each transient. Those images were stored in different folders. For each model of soft starter, the images were divided into three folders depending on the health state of the rotor (namely healthy, one broken bar and two broken bars). Those folders contained, in each case, the resulting PS images for the different parameter settings of the soft starter model, with and without load. In Figure 6, an example of the PS images obtained is shown.
As explained in Section 2.3, the Persistence Spectrum represents, by means of a color map, the percentage of time that a particular frequency appears in a signal. The x-axis shows the frequency (in Hz) and the y-axis the power (in dB). Thus, it is a time-frequency view. In this regard, the display limits for the frequency were set between 0 Hz and 200 Hz, while those for the power were set between −100 dB and 0 dB. Those limits were chosen for the following reasons:

• Regarding the power, the limits were set according to the maximum and minimum values obtained from all the Persistence Spectra computed for all the signals.

• Regarding the frequency, the limits were set according to the frequencies where the components related to the patterns of the studied fault (broken bars) must appear.

• Step 4: Crop and resize images. In order to adapt the PS images obtained in the previous step to the needs of the CNN, they were cropped and resized. The main aim of the cropping was to eliminate the color bar and the axis legends, keeping only the area where the PS was represented. On the other hand, since the CNN input size for the images was set to 224 × 224 pixels, the cropped images needed to be reduced to that size. In Figure 7, an example of the cropped and resized images against the PS images can be seen.

• Step 5: Automatic fault identification (CNN). For the automatic classification of the different health states of the rotor (healthy, 1 BB and 2 BB), a self-developed Convolutional Neural Network (CNN) was used. It was implemented in MATLAB, and the detailed information of the CNN layers is shown in Figure 8 and Table 2. Additionally, the MATLAB pseudocode is shown in Appendix A, Figure A1.
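The data preparation in Steps 2 and 4 could be sketched as follows. This is a minimal illustration in Python rather than the MATLAB code used in this work. The helper names `add_awgn` and `crop_and_resize`, the crop box, the nearest-neighbor resizing and the synthetic 50 Hz signal are all assumptions introduced for the example; the interpolation method actually used for resizing is not stated in the text.

```python
import numpy as np

def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise so that the result has the requested SNR (dB)."""
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = np.sqrt(noise_power)              # standard deviation of the noise
    return signal + rng.normal(0.0, sigma, size=signal.shape)

def crop_and_resize(image, box, out_size=(224, 224)):
    """Crop to box = (top, bottom, left, right), then nearest-neighbor resize."""
    top, bottom, left, right = box
    cropped = image[top:bottom, left:right]
    rows = np.arange(out_size[0]) * cropped.shape[0] // out_size[0]
    cols = np.arange(out_size[1]) * cropped.shape[1] // out_size[1]
    return cropped[rows[:, None], cols]       # keeps the color channels

# Step 2: SNR grid from 10 dB to 20 dB in steps of 0.2 dB
snr_levels_db = np.round(np.arange(10.0, 20.0 + 1e-9, 0.2), 1)
rng = np.random.default_rng(0)
t = np.arange(5000) / 5000.0
flux = np.sin(2 * np.pi * 50 * t)             # synthetic stand-in for a flux signal
noisy_signals = [add_awgn(flux, snr, rng) for snr in snr_levels_db]

# Step 4: placeholder RGB image standing in for a rendered PS figure;
# the crop box that removes color bar and axis legends is hypothetical
ps_image = np.zeros((600, 800, 3), dtype=np.uint8)
cnn_input = crop_and_resize(ps_image, box=(50, 550, 100, 700))
print(len(noisy_signals), cnn_input.shape)
```

Each noisy copy would then go through the Persistence Spectrum computation of Step 3 before being cropped and resized.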

With regard to the training process, the Stochastic Gradient Descent with Momentum algorithm was selected. The initial learning rate was set to $10^{-4}$, the momentum to 0.9 and the $L_2$ regularization factor (or weight decay factor) to $10^{-4}$. Furthermore, the mini-batch size was set to 10, in line with the results available in the technical literature. For instance, in [47], it is stated that sizes smaller than 32 lead to better training stability and generalization results. In turn, the authors of [48] state that values above 10 allow faster computations. Finally, the maximum number of epochs was set to 20. The validation set used during the training consisted of 25% of the available samples, randomly selected. An overview of these properties is shown in Table 3.
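For reference, one common formulation of the Stochastic Gradient Descent with Momentum update with $L_2$ regularization is sketched below. The function name `sgdm_step` and the toy quadratic loss are illustrative assumptions; the defaults mirror the learning rate, momentum and regularization factor quoted above, but the exact update used by the training software may differ in detail.

```python
import numpy as np

def sgdm_step(theta, velocity, grad, lr=1e-4, momentum=0.9, l2=1e-4):
    """One SGD-with-momentum update with L2 regularization (weight decay)."""
    grad_reg = grad + l2 * theta                     # gradient of loss + (l2/2)*||theta||^2
    velocity = momentum * velocity - lr * grad_reg   # accumulate past updates
    return theta + velocity, velocity

# Toy usage: minimize 0.5 * ||theta - target||^2, whose gradient is theta - target.
# A larger learning rate than in the paper is used so the toy problem converges quickly.
target = np.array([1.0, -2.0, 3.0])
theta = np.zeros(3)
velocity = np.zeros(3)
for _ in range(200):
    grad = theta - target
    theta, velocity = sgdm_step(theta, velocity, grad, lr=0.1, l2=0.0)
print(np.round(theta, 3))
```

In an actual training loop, `grad` would be the loss gradient averaged over one mini-batch (10 images in this work), and the loop would run over all mini-batches for up to 20 epochs.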

Experimental Setup
In order to validate the effectiveness of the proposed methodology, several tests were carried out in the laboratory. The test bench employed is shown in Figure 9. It consisted of a 1.1 kW squirrel cage induction motor (tested motor), coupled to a DC motor which acted as a load. The tested motor (SCIM) was started by means of four different commercial soft starters. During every start-up, the stray-flux and the current demanded by the motor were captured. To capture the stray-flux, a handmade coil-based sensor attached to the motor frame was used. A picture of it and its shape and main dimensions are shown in Figure 10(a1,a2), respectively. To capture the current signal of one of the supply phases of the motor, a current clamp was used (see Figure 10b). All these signals were recorded with an oscilloscope and then downloaded to a PC, where the signal analyses were performed. Both the stray-flux and the current signals were acquired for 40 s at a sampling rate of 5 kHz. All the analyses, as well as the training and validation processes, were conducted on a PC with an Intel Core i5-9400 CPU (2.9 GHz) and 8 GB of memory.

The main characteristics of the tested motor (SCIM) are shown in Table 4. The main characteristics of the coil-flux sensor are listed in Table 5; in addition, as said before, the shape and dimensions of the coil sensor can be seen in Figure 10(a1,a2). On the other hand, the main characteristics of the current clamp are listed in Table 6, and a picture of it is shown in Figure 10b.
To carry out all the tests, four different models of soft starters were employed. Each of them had a different topology, controlling one, two or all three supply phases depending on the model.
Furthermore, each model allowed one to control the start-up time-ramp and the initial voltage or torque. The different models of soft starters used for the tests are shown in Figure 11, and their main characteristics are listed in Table 7.
Regarding the studied fault, different bar breakages were induced in the rotor of the SCIM tested. First, once the healthy rotor had been tested, one rotor bar was broken by drilling a hole at its junction with the end short-circuit ring. Then, once the one-broken-bar tests had been carried out, a second rotor bar, contiguous to the previous one, was broken in the same way. A detail of the healthy rotor and the forced bar breakages is shown in Figure 12.
The tests were carried out following the same sequence for the four models of soft starters. First, the healthy motor was started, without load, by means of one of the soft starters. Different combinations of time-ramp and initial voltage/torque were set for each model of soft starter and, for each of those combinations, the tested motor was started once. Then, the same tests were repeated, but this time with the tested motor fully loaded. This was achieved by varying the excitation voltage of the DC machine coupled to the tested motor. Afterwards, the procedure was repeated first for the case of one broken bar and then for the case of two broken bars.
For each start-up, the coil-flux sensor was placed in a position that captured a combination of axial and radial flux (called Position B, see Figure 1). In addition, the current signal of one of the supply phases was captured by means of the above-mentioned current clamp. These tests yielded a batch of signals of the tested SCIM under different starting conditions. The combinations of parameters used for each soft starter, together with the number of signals obtained for each model, are listed in Table 8.

Results and Discussion
In this section, the results obtained by applying the proposed methodology are shown. First, a comparison of the persistence spectra for the four models of soft starter and each health case is presented, highlighting the differences found. Then, the effectiveness of the proposed CNN is shown, both for each model of soft starter separately and for all of them combined.
Although many analyses were carried out in this work, only the most representative are shown here. In this regard, in Figure 13, persistence spectra for each rotor health state and soft starter model are compared. Those persistence spectra correspond to tests in which the motor was fully loaded. The settings for each soft starter model were those corresponding to the combination of longest time-ramp and lowest initial voltage (see Table 8).
Looking at the images in Figure 13, some differences can be distinguished. In all cases, some components between 0 Hz and 50 Hz increase in amplitude as the fault worsens. The same happens to some components between 50 Hz and 150 Hz. This fits the typical behavior of the axial and radial components associated with the presence of broken bars. As can be seen in Figure 2, the axial components s·f and 3·s·f evolve from 50 Hz to almost 0 Hz for the first one and from 150 Hz to almost 0 Hz for the second one. On the other hand, the radial components evolve from 50 Hz to 0 Hz and back to almost 50 Hz in the case of f·(1 − 2·s), and from 150 Hz to almost 50 Hz in the case of f·(1 + 2·s). Since the position of the flux sensor captured the combination of radial and axial stray-flux, it makes sense to see the influence of both types of components in the persistence spectra.
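The frequency trajectories described above follow directly from the component expressions evaluated as the slip s falls from 1 (standstill) towards 0 during the start-up. The short Python sketch below evaluates them for a few slip values. It assumes a supply frequency of 50 Hz, as implied by the 50 Hz and 150 Hz limits quoted in the text; the function name `fault_components` and the chosen slip values (including the final slip of 0.02) are illustrative assumptions.

```python
SUPPLY_FREQ_HZ = 50.0  # assumed supply frequency f


def fault_components(slip, f=SUPPLY_FREQ_HZ):
    """Return the absolute frequency (Hz) of each broken-bar component at a given slip."""
    return {
        "axial s*f": abs(slip * f),
        "axial 3*s*f": abs(3.0 * slip * f),
        "radial f*(1-2s)": abs(f * (1.0 - 2.0 * slip)),
        "radial f*(1+2s)": abs(f * (1.0 + 2.0 * slip)),
    }


if __name__ == "__main__":
    # Slip decreases from 1 at standstill to a small value at steady state.
    for slip in (1.0, 0.75, 0.5, 0.25, 0.02):
        values = fault_components(slip)
        row = ", ".join(f"{name} = {hz:6.1f} Hz" for name, hz in values.items())
        print(f"s = {slip:4.2f}: {row}")
```

The component f·(1 − 2·s) passes through 0 Hz at s = 0.5, which is why its trajectory first falls and then rises again towards 50 Hz.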
In Figure 14, as an example, the above-mentioned differences for each health state of the rotor are highlighted in a set of PS images. The same differences can be identified in all the cases studied.
For the case of the SCHNEIDER model, 945 training samples (315 samples per category) were used to train the CNN and 315 different samples (105 samples per category) were used for the validation. In Figure 15, it can be observed that the accuracy of the CNN reaches 100%, which means that the methodology can identify and separate the different rotor health states in every case, even with different combinations of time-ramp and initial voltage. In addition, the training process reaches 100% accuracy after about 150 iterations, in epoch two, and it becomes stable at 100% after approximately 900 iterations.
For the case of the ABB model, 567 training samples (189 samples per category) were used to train the CNN and 189 different samples (63 samples per category) were used for the validation. In Figure 16, it can be observed that the accuracy of the CNN also reaches 100% in this case. Furthermore, the training process reaches 100% accuracy after about 180 iterations, in epoch four, and it becomes stable at 100% after approximately 200 iterations.
For the case of the SIEMENS model, 756 training samples (252 samples per category) were used to train the CNN and 252 different samples (84 samples per category) were used for the validation. In Figure 17, it can be observed that the accuracy of the CNN also reaches 100% in this case. In addition, the training process reaches 100% accuracy after about 68 iterations, in epoch one, and it becomes stable at 100% after approximately 160 iterations.
For the case of the OMRON model, 567 training samples (189 samples per category) were used to train the CNN and 189 different samples (63 samples per category) were used for the validation. In Figure 18, it can be observed that the accuracy of the CNN also reaches 100% in this case. Furthermore, the training process reaches 100% accuracy after about 160 iterations, in epoch three, and it becomes stable at 100% after approximately 340 iterations.
Finally, in Figure 19, the confusion matrix and the training progress for all the models of soft starters combined are shown. In this case, 2835 training samples (945 samples per category) were used to train the CNN and 945 different samples (315 samples per category) were used for the validation. Although four different topologies of soft starter and different combinations of time-ramp duration and initial voltage were combined in this case, the accuracy reached 99.89%. That is to say, only one of the samples was misclassified. Moreover, the misclassification occurred between the healthy state and the first stage of failure (one broken bar). The training process reaches this accuracy after about 650 iterations, in epoch three, and it becomes stable at 99.89% after approximately 3000 iterations.
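The sample counts, epochs and accuracy quoted in this section can be cross-checked with a few lines of arithmetic. The Python sketch below uses the reported training and validation sizes and the mini-batch size of 10. It assumes that an incomplete final mini-batch is discarded in each epoch, which is an assumption about the training software rather than something stated in this work; all function names are hypothetical.

```python
import math

BATCH_SIZE = 10  # mini-batch size reported in the text

# (training samples, validation samples) as reported for each case.
DATASETS = {
    "SCHNEIDER": (945, 315),
    "ABB": (567, 189),
    "SIEMENS": (756, 252),
    "OMRON": (567, 189),
    "ALL": (2835, 945),
}


def validation_fraction(n_train, n_val):
    """Share of the available samples that was held out for validation."""
    return n_val / (n_train + n_val)


def iterations_per_epoch(n_train, batch_size=BATCH_SIZE):
    """Complete mini-batches per epoch (assumes the remainder is discarded)."""
    return n_train // batch_size


def epoch_of_iteration(iteration, n_train, batch_size=BATCH_SIZE):
    """1-based epoch in which a given 1-based iteration falls."""
    return math.ceil(iteration / iterations_per_epoch(n_train, batch_size))


def accuracy_percent(n_val, n_misclassified):
    """Validation accuracy in percent."""
    return 100.0 * (n_val - n_misclassified) / n_val


if __name__ == "__main__":
    for name, (n_train, n_val) in DATASETS.items():
        print(name, validation_fraction(n_train, n_val),
              iterations_per_epoch(n_train))
    # One misclassified sample out of the 945 combined validation samples.
    print(round(accuracy_percent(945, 1), 2))
```

Under these assumptions, the iteration counts quoted for each soft starter fall in the epochs stated in the text, and one misclassified sample out of 945 corresponds to the reported 99.89%.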
Once the capabilities of the proposed methodology have been presented, it is compared in Table 9 with the results of other methodologies proposed for automatic broken bar detection in soft-started induction motors. Additionally, since there are not many works focused on soft starters, the results of other works focused on Direct On-Line (DOL) starting are also included in the table.
With regard to the accuracy of the methodologies, some of the works in Table 9 achieved a rate of 100% [22,23,49], but all of them were focused on DOL starting and analyzed current signals. On the other hand, the works focused on soft-started induction motors achieved, in both cases, an overall accuracy of 94.40%, analyzing the stray-flux [27] and the combination of stray-flux and current [28]. Both of them relied on the STFT as the time-frequency analysis tool, which displays noisy time-frequency maps when soft starters are used, making it more difficult to identify the typical patterns related to broken bars.
In contrast, the proposed methodology relies on the Persistence Spectrum as the time-frequency display. This method reveals even very short events, which makes the fault-related patterns easier to identify and yields an accuracy rate of 100% when analyzing each model of soft starter separately and 99.89% when analyzing all the models combined.

Conclusions
In this work, a novel methodology to automatically detect and categorize the severity of rotor faults in induction motors driven by soft starters is presented. This methodology relies on the computation of the Persistence Spectrum of the start-up transient of stray-flux signals. Then, the images obtained are used as input for a self-developed CNN in order to classify them. Experimental results show not only that the accuracy achieved is very high, improving on that of other works focused on soft-started induction motors, but also that the training progress converges very quickly to the final accuracy rate.
Thus, taking into account all of the above and the results shown in the previous section, the following conclusions can be drawn:

• The use of the persistence spectrum as a way to analyze the stray-flux signals during the start-up transient allows one to detect the health state of the rotor.
• Even in the case analyzed in this paper, where soft starters are used to drive the motor and so the level of noise in the signal makes it difficult to identify the characteristic patterns of the fault when using typical time-frequency tools (such as STFT), the use of this method allows one to identify not only the presence of the fault, but also its severity.
• Even when different starting settings are performed and different topologies of soft starters are used, this method achieves a very high accuracy rate (99.89%), proving its reliability.
• This method is a promising way to diagnose induction motors when using soft starters and could lead to integrate the diagnosis system in the soft starter itself, only by adding an external flux sensor.
The results prove that the use of this method could lead to a reliable diagnosis of the health state of the rotor of SCIMs, allowing one to schedule proper maintenance and hence, reducing the energy consumption due to the running of damaged motors and avoiding unscheduled shutdowns of the processes depending on them.
Finally, although a very high accuracy has been achieved with this classification method, further studies have to be carried out in order to evaluate how well the proposed methodology generalizes. The authors are carrying out more tests to evaluate the application of this method to other faults and to SCIMs of different nominal power. Furthermore, the authors plan to evaluate the application of this method to other types of motors, such as synchronous reluctance motors or permanent magnet synchronous motors, to detect other kinds of failures, as well as to propose complementary methodologies, based on statistical indicators computed from the obtained results, that may enhance the diagnosis in some specific cases.

Appendix A
In this appendix, the pseudocode of the CNN is shown in Figure A1: