Using ANN and SVM for the Detection of Acoustic Emission Signals Accompanying Epoxy Resin Electrical Treeing

Electrical treeing is one of the effects of partial discharges in the solid insulation of high-voltage electrical insulating systems. The process involves the formation of conductive channels inside the dielectric. Acoustic emission (AE) is a non-destructive method of partial discharge detection and measurement. When electrical treeing is to be detected, the measurement, recording, and analysis of the signals that accompany the phenomenon are difficult because of the low signal-to-noise ratio and possible multiple signal reflections from the boundaries of the object. That is why only selected signal parameters are used for the detection and analysis of the phenomenon. A detailed analysis of various acoustic emission signals is a complex and time-consuming process, which has inspired the search for new methods of identifying the symptoms related to partial discharge in the recorded signal. Because the task is to find signals with similar characteristics, the use of artificial neural networks seems pertinent. The paper presents an effort to automate the identification of insulation material condition based on neural classifiers. An attempt was made to develop a neural classifier that detects, in the recorded acoustic emission signals, the symptoms that are evidence of treeing. The performed studies assessed the efficiency with which different artificial neural networks (ANN) detect treeing-related signals, as well as the selection of input parameters such as statistical indicators and analysis windows. The feedforward network revealed the highest classification efficiency among all analyzed networks. Moreover, the use of principal component analysis helps to reduce the teaching data to one variable at a loss of classification efficiency of up to 1%.


Introduction
The major factor that contributes to the deterioration of the insulation characteristics of dielectrics is partial discharges (PD), which occur under the influence of high-intensity electric fields. Partial

Test Setup and the Course of the Experiment
The first stage of the studies involved measurements of acoustic emission signals in electrically stressed epoxy resin samples. The samples were cuboids measuring 25 mm × 10 mm × 4 mm. One 25 mm × 4 mm surface of each sample was ground and coated with electrically conductive varnish.
Since the formation of tree channels in a dielectric may take a very long time, an electrode made of a T10 surgical needle was embedded in the liquid resin during sample formation, such that the distance between the sample bottom and the electrode tip ranged from 1 to 3 mm. This procedure produced an electric field intensity between the needle electrode and the electrode applied to the bottom of the sample that was high enough to reduce the tree-forming duration to a few hours. The needle was connected to the neutral terminal of the transformer. The opposite base of the sample was adjacent to a plane copper electrode. The electrode was connected through a 0.5 MΩ resistor to the high-voltage terminal of a test transformer with a ratio of 220 V/30 kV and a power of 10 kVA. The voltage was measured with an electrostatic voltmeter.
As part of the tests, acoustic signals were recorded for a dozen or so samples in which, under the influence of a high-intensity electric field, the formation of an electrical tree began. The measurements were made for various values of the 50 Hz supply voltage and different distances between the electrodes.
The elastic waves of acoustic frequencies were conducted from the analyzed sample through a waveguide made of a steel rod 2 mm in diameter. One of its ends was inserted into a hole bored in the sample, while the other was connected to an electroacoustic converter.
During the measurements, the studied sample was placed in a poly(methyl methacrylate) vessel filled with electrical insulating oil. The voltage between the electrodes during the tests ranged from a few to several kV. The AE signals were measured by means of a Physical Acoustics Corporation (PAC) R3α electroacoustic converter and a filtering and amplifying system composed of a PAC 2/4/6 pre-amplifier, a band-pass filter with a passband of 20 to 1000 kHz, and a PAC AE5A amplifier. After amplification, the signal was recorded in computer memory with an NI-USB6251 data acquisition (DAQ) card, which enabled recording with a sampling frequency of up to 1 MS/s and 16-bit resolution. The test setup schematic, the view of the measuring equipment, and the view of the sample holder are presented in Figure 1a-c, respectively.
Appl. Sci. 2019, 9, x FOR PEER REVIEW 3 of 14
To ensure that the recorded signals originated from electrical treeing, while preparing the experiment we also tested the system in other states, i.e., without any PDs, with corona, etc., and observed the samples under a microscope. We noticed that each PD type was associated with different signal waveforms, which is consistent with the results of other researchers [35]. After that test, we reconfigured the setup and chose a voltage range that ensured that other types of PDs would not be present. Physically, electrical treeing is a combination of chemical and physical changes inside the specimen, so one cannot be sure whether a particular signal is connected with the breaking of a polymer chain or with a PD inside an existing channel.

Features' Definition
In order to extract the training data for the neural classifiers, the analyzed signal fragment was divided into blocks x, each N samples long, where:
$$\mathbf{x} = [x_1, x_2, \ldots, x_N] \quad (1)$$
For each signal block x, a set of statistical parameters typically used in the analysis of AE signals was identified [36,37] and then used as input data for the neural network. The following parameters were used: signal energy (e), band power (pBP), signal upper envelope (env), skewness (skew), and kurtosis (kurt).
The signal energy is identified based on the following relationship:
$$e = \sum_{i=1}^{N} x_i^2 \quad (2)$$
Skewness, as the probability distribution asymmetry measure, is identified in the following way:
$$\mathrm{skew} = \frac{3\,(\bar{x} - \mathrm{med}(\mathbf{x}))}{\sigma} \quad (3)$$
where $\bar{x}$ is the mean value of the samples, med(x) the block median, and σ the standard deviation of the samples in the block. Kurtosis, which is the measure of the results' concentration around the mean, amounts to:
$$\mathrm{kurt} = \frac{\mu_4}{\sigma^4} \quad (4)$$
where µ4 is the fourth central moment of the samples in the block and σ the standard deviation of the samples in the block. A training data vector d was identified for each signal block x, where:
$$\mathbf{d} = [e,\; p_{BP},\; env,\; skew,\; kurt] \quad (5)$$
In order to obtain the training information y, which belongs to the set {0,1}, the recorded signal was denoised using wavelet decomposition with the fifth-order Daubechies wavelet. Then, using the trigger threshold crossing method, with the threshold set at 20 mV for the presented problem, the value y was assigned to each training set: 0 for no acoustic emission and 1 for an acoustic emission event. Figure 2 presents one signal fragment, recorded with a sampling frequency of 1 MS/s, selected for further studies. The red areas in Figure 2 mark the acoustic emission events determined on the basis of threshold crossing.
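The block features and the threshold labelling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the skewness is taken to be Pearson's median skewness (suggested by the mean, median, and σ named in the text), the kurtosis is the fourth central moment divided by the squared variance, and the 20 mV threshold is applied directly to the block samples rather than to a wavelet-denoised signal. The function names are hypothetical.

```python
from statistics import mean, median, pstdev

def split_blocks(signal, n):
    """Split a signal into consecutive, non-overlapping blocks of n samples."""
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]

def block_features(block):
    """Return (energy, skewness, kurtosis) for one block of samples."""
    n = len(block)
    m = mean(block)
    sigma = pstdev(block)  # population standard deviation of the block
    energy = sum(v * v for v in block)
    if sigma == 0:
        # Flat block: the shape measures are undefined, so report 0 (assumption).
        return energy, 0.0, 0.0
    skew = 3.0 * (m - median(block)) / sigma    # Pearson's median skewness (assumed form)
    mu4 = sum((v - m) ** 4 for v in block) / n  # fourth central moment
    kurt = mu4 / sigma ** 4
    return energy, skew, kurt

def label_block(block, threshold=0.020):
    """Return 1 if any sample magnitude exceeds the trigger threshold (in volts), else 0."""
    return 1 if max(abs(v) for v in block) > threshold else 0
```

Band power and the upper envelope would be appended to the same tuple to form the full five-element feature vector.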


Passband Power
The band used for identifying the band power pBP was selected based on the averaged spectrum of the recorded AE signals. Figure 3 presents the spectrum averaged over 1000 recorded AE signals and the spectrum of a selected signal fragment in which no acoustic emission was observed. For both spectra, the transfer function of the selected detector (R3α), given in the calibration datasheet, was taken into account.
The presented spectrum was used to select a signal band with a frequency ranging from 15 kHz to 30 kHz for further analysis because the power of the signal recorded within the band increased significantly when acoustic emission occurred.
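One possible way to compute the power of a block within the 15 kHz to 30 kHz band is to sum the squared magnitudes of the discrete Fourier transform bins that fall inside it. The sketch below uses a direct DFT from the standard library for clarity; the paper does not state how the band power was actually computed, so the estimator and the function name are assumptions.

```python
import cmath
import math

def band_power(block, fs, f_lo, f_hi):
    """Mean-square power of `block` inside the band [f_lo, f_hi] Hz, via a direct DFT."""
    n = len(block)
    power = 0.0
    for k in range(1, n // 2 + 1):  # positive-frequency bins; DC is skipped
        f = k * fs / n
        if f_lo <= f <= f_hi:
            xk = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                     for i, v in enumerate(block))
            # One-sided spectrum: every bin except Nyquist also stands for
            # its negative-frequency mirror, so it is counted twice.
            scale = 1.0 if (n % 2 == 0 and k == n // 2) else 2.0
            power += scale * abs(xk) ** 2 / n ** 2
    return power
```

For a 200-sample block recorded at 1 MS/s the bin spacing is 5 kHz, so the selected band covers the bins at 15, 20, 25, and 30 kHz.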

Block Length
Selecting the right length of the signal blocks is among the most important factors affecting classification quality. When a signal is analyzed with a trained network, it is not possible to predict whether the analyzed block starts before, during, or at the end of an acoustic emission signal. It is therefore necessary to know how the position of the analysis window relative to the acoustic signal influences the values of the selected training features.
In order to identify these relationships, window lengths of 10, 60, 200, and 300 samples were used. The windows were applied to a signal containing a single acoustic emission event and shifted by one sample at a time, which changed the window position relative to the AE signal. A set of features was calculated for each window position. Figure 4 presents the values of each feature as a function of the window position relative to the AE signal, for the different window lengths: kurtosis (a), skewness (b), and energy (c). Moreover, Figure 4c presents the identified upper envelope of the sample signal.
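The moving-window procedure can be illustrated with a short sketch: a window of fixed length is shifted by one sample and a feature is evaluated at each position. The helper names are hypothetical, and energy is used here only as an example feature.

```python
def energy(block):
    """Sum of squared samples in the block."""
    return sum(v * v for v in block)

def sliding_feature(signal, window, feature):
    """Evaluate `feature` on every window of `window` samples, shifted by one sample."""
    return [feature(signal[start:start + window])
            for start in range(len(signal) - window + 1)]
```

Calling `sliding_feature` with each of the window lengths listed above, and with skewness or kurtosis in place of `energy`, would give curves of the kind plotted against window position.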


Principal Component Analysis
Principal component analysis (PCA) is a statistical method of factor analysis. A set of N observations of K variables is treated as a set of N points in a K-dimensional space. Using the singular value decomposition of the covariance matrix, it is possible to construct a new coordinate system that maximizes the variance of successive coordinates. This way, it is possible to reduce the dimension of the space (decrease the number of analyzed input variables) [30] and the size of the neural network, while retaining as much information about the input process as possible.
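As an illustration of the idea, the first principal component can be found from the covariance matrix of the centered data. The sketch below uses power iteration instead of a full singular value decomposition to stay within the standard library; this substitution, and all function names, are assumptions made for the example.

```python
import math

def covariance_matrix(rows):
    """Return the K x K covariance matrix and the centered rows of N observations."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(k)]
           for a in range(k)]
    return cov, centered

def first_principal_component(rows, iterations=200):
    """Return (direction, variance, scores) of the first principal component."""
    cov, centered = covariance_matrix(rows)
    k = len(cov)
    # Starting vector for power iteration; it must not be orthogonal
    # to the leading eigenvector, otherwise the iteration cannot find it.
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    variance = sum(v[a] * sum(cov[a][b] * v[b] for b in range(k)) for a in range(k))
    scores = [sum(c[j] * v[j] for j in range(k)) for c in centered]
    return v, variance, scores
```

The returned `scores` are the one-dimensional coordinates that would replace the original K features as network input.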


Feedforward Neural Network
In a feedforward neural network (FNN), the flow of information (signals) is unidirectional. Its structure consists of the input layer, two or more hidden layers, and the output layer (Figure 5). The inputs of each layer can be linked only with the neurons in the preceding layer. The input signals x of a neuron are multiplied by the weights w and summed. The output signal y of the neuron is calculated using a non-linear activation function φ, according to the relationship [38]:
$$y = \varphi\left(\sum_{i} w_i x_i\right) \quad (6)$$
The unipolar sigmoid function is among the most commonly used activation functions:
$$\varphi(u) = \frac{1}{1 + e^{-u}} \quad (7)$$
alongside a bipolar function (hyperbolic tangent):
$$\varphi(u) = \tanh(u) \quad (8)$$
The FNN was trained with the Levenberg-Marquardt algorithm, which is one of the most effective training algorithms. It converges quickly when the network weights are near the optimal solution, where it behaves like the Gauss-Newton method, and remains effective when the network is far from the optimal solution, where it behaves like the gradient descent method. Detailed information about the Levenberg-Marquardt algorithm can be found in [39].
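The forward pass of a single neuron and of a layer can be sketched as follows. The weighted-sum form, the optional bias term, and the function names are assumptions for illustration; the training step (Levenberg-Marquardt) is not shown.

```python
import math

def sigmoid(u):
    """Unipolar sigmoid, written via tanh so that large |u| cannot overflow."""
    return 0.5 * (1.0 + math.tanh(0.5 * u))

def neuron_output(inputs, weights, bias=0.0, activation=sigmoid):
    """Weighted sum of the inputs passed through the activation function."""
    u = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(u)

def layer_output(inputs, weight_rows, biases, activation=sigmoid):
    """Outputs of all neurons in one layer; each row of weights defines one neuron."""
    return [neuron_output(inputs, w, b, activation)
            for w, b in zip(weight_rows, biases)]
```

Passing `activation=math.tanh` switches the same neuron to the bipolar activation; stacking calls to `layer_output` gives the unidirectional flow from input to output layer.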

Radial Basis Functions
A radial basis function (RBF) network, similarly to the FNN, is a network with a directed flow of signals. The network neurons have a radial activation function φ. The network output y is described by the relationship [40]:
$$y_j = \sum_{i} w_{ij}\, \varphi\left(\lVert \mathbf{x} - \mathbf{c}_i \rVert\right) \quad (9)$$
where i is the hidden layer neuron number, j the radial network output number, w_ij the weight coefficient, x the input data vector, and c_i the center of the radial function for the ith neuron of the hidden layer.
In such a neuron, the value of the output signal does not depend on the scalar product of the inputs x and the neuron weights w, but decreases with the distance between x and the center c of the radial function, located in the space of the network input parameters. The Gaussian function is among the most commonly used radial functions. Figure 6 presents the structure of an RBF network.
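A minimal sketch of an RBF network output with Gaussian radial functions is given below. The common width parameter shared by all neurons and the function names are assumptions; the paper does not specify how the centers and widths were chosen.

```python
import math

def gaussian_rbf(x, center, width=1.0):
    """Gaussian radial function of the Euclidean distance between x and center."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-dist_sq / (2.0 * width ** 2))

def rbf_output(x, centers, weights, width=1.0):
    """Weighted sum of the hidden-layer radial responses for one network output."""
    return sum(w * gaussian_rbf(x, c, width) for w, c in zip(weights, centers))
```

The response equals 1 when the input coincides with a center and decays towards 0 as the input moves away from it.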



Wavelet Neural Network
A wavelet neural network (WNN) structure consists of an input layer, an output layer, and a hidden layer of wavelet neurons (Figure 7). The activation function φ of the wavelet neurons is described by the relationship [41]:
$$\varphi(\mathbf{x}) = \prod_{k} \psi\left(\frac{x_k - \zeta_k}{\varepsilon_k}\right) \quad (10)$$
where x is the input signal vector, ψ the wavelet function, ζ the translation parameter, and ε the scale parameter.
Training such a network involves selecting the scale and translation parameters of each wavelet ψ used inside a wavelet neuron. For the study, the Mexican hat wavelet function was used.
Unlike in other network types, random drawing of the initial values of the wavelet parameters ζ and ε may result in the training algorithm becoming stuck in a local minimum. That is why network training is preceded by the initialization of these parameters. An initialization method based on wavelet selection, developed by Oussar and Dreyfus [42], was used for the study. It consists of the following steps:

• Step 1: creating a library of wavelets with different parameters.
• Step 2: removing those wavelets that do not fall within the variability range of each input parameter.
• Step 3: ranking the wavelets and iteratively selecting the best ones using the Gram-Schmidt orthonormalization algorithm.

Subsequently, the fully initialized WNN was trained with the backpropagation algorithm. A detailed description of the WNN's structure and training algorithm used for the study can be found in [41].
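The response of a single wavelet neuron with the Mexican hat wavelet can be sketched as below. The multidimensional form, a product of one-dimensional wavelets over the input components, and the unnormalized Mexican hat expression are assumptions for illustration; the initialization and backpropagation steps are not shown.

```python
import math

def mexican_hat(z):
    """Unnormalized Mexican hat wavelet (second derivative of a Gaussian, up to sign)."""
    return (1.0 - z * z) * math.exp(-0.5 * z * z)

def wavelet_neuron(x, translations, scales):
    """Product of Mexican hat wavelets, each shifted and scaled for one input component."""
    out = 1.0
    for xk, t, s in zip(x, translations, scales):
        out *= mexican_hat((xk - t) / s)
    return out
```

The translation moves the peak of the response along each input axis, while the scale sets how quickly the response falls off and changes sign around that peak.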


Support Vector Machine
A support vector machine (SVM) was another classifier used in the studies. An SVM seeks a hyperplane in the space of the input parameters that separates a collection of points (sets of input parameters) belonging to two different classes, with a certain margin.
It is a type of binary classifier. Its operating principle can be presented on the example of a classifier with two input parameters x = (x1, x2). Figure 8 presents the operating principle of such a classifier. The points lying in the plane (x1, x2) are divided into two classes marked in green and violet. The SVM classifier searches for a straight line (the black line in the drawing) that maximizes the margin bounded on both sides of the line by the support vectors (the brown lines in the drawing). The points exceeding the limits of the margin will be misclassified. The purpose of SVM training is to select the coefficients of the separating straight line in such a way that the margin (the distance between the support vectors) is maximized. Algorithms of non-linear optimization with constraints, and Lagrange's method in particular, were used for the training [43].
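Once the coefficients of the separating line are known, classification and the margin width follow directly from them. The sketch below shows only this final step for a linear classifier; the coefficient values used with it would be hypothetical, and the constrained optimization that finds them is not reproduced here.

```python
import math

def decision_value(x, w, b):
    """Signed value of w . x + b; its sign tells on which side of the line x lies."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(x, w, b):
    """Return class 1 for points on the non-negative side of the line, else class 0."""
    return 1 if decision_value(x, w, b) >= 0 else 0

def margin_width(w):
    """Distance between the two margin lines w . x + b = +1 and w . x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))
```

Because the margin width is 2/||w||, maximizing the margin is equivalent to minimizing the norm of w subject to all training points lying outside the margin.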

Results
Each of the described networks was trained to classify the signals into two groups: those that contain and those that do not contain acoustic emission signals accompanying dielectric treeing. The previously described features were used to develop training data for blocks with the following lengths: 10, 60, 100, 200, and 300 samples.
In the first part of the results, the classification efficiency is given for a test fragment of the signal different from the signal used for network training. In the second part, the classification efficiency is presented for the training data reduced by means of PCA.
Additionally, the optimum size of each network was determined by cross-validation, which consisted of dividing the set of reference data into k subsets of equal size. The network with a given number of neurons was trained on k-1 subsets, and the unused subset served as the validation data. This process was repeated k times, and the k results obtained were averaged.
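The cross-validation scheme can be sketched as follows. The helper names are hypothetical, the folds are taken as contiguous index ranges for simplicity, and `train_and_score` stands for whatever routine trains a network of a given size and returns its validation efficiency.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size

def cross_validate(n_samples, k, train_and_score):
    """Average the score returned by train_and_score(train, validation) over k folds."""
    scores = [train_and_score(train, validation)
              for train, validation in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)
```

Running `cross_validate` once per candidate number of neurons and keeping the size with the best average score corresponds to the selection procedure described above.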

Classifier Efficiency Analysis
Each trained network was subjected to a test based on a randomly selected signal fragment. Table 1 presents the classification efficiency of each network type for the block lengths used in the calculations.


Classification Using PCA
Since the most stable efficiency results for each network were obtained for the data identified based on the block length of 300 samples, these data were subjected to PCA, and principal components (PCs) were derived from them. Table 2 shows the variances of the individual components and their percentage share in the total variance. Based on the results presented in Table 2, it can be concluded that the share of the first principal component in the total variance amounted to over 98%, which means that a system of five decision variables can be reduced to only one variable with no significant information loss. Table 3 presents the efficiency of the analyzed networks trained using only the first principal component.
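The decision to keep only the first component rests on its share in the total variance. A short sketch of that calculation is given below; the function names are hypothetical, and any variances passed to it in an example are illustrative values, not the entries of Table 2.

```python
def explained_variance_share(variances):
    """Percentage share of each component variance in the total variance."""
    total = sum(variances)
    return [100.0 * v / total for v in variances]

def components_needed(variances, target_percent):
    """Smallest number of leading components whose cumulative share reaches the target."""
    shares = explained_variance_share(sorted(variances, reverse=True))
    cumulative = 0.0
    for count, share in enumerate(shares, start=1):
        cumulative += share
        if cumulative >= target_percent:
            return count
    return len(shares)
```

With five components, a result of 1 from `components_needed` for the chosen target means that the five-element feature vector can be replaced by a single score per block.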

Discussion
The paper proposed the use of ANN for the analysis of acoustic signals accompanying electrical treeing in epoxy resin.
The recorded signals were divided into blocks used to identify the values of the teaching data. Then, the features of the reference signal were analyzed using a moving window of a set length.
The features were divided into two groups whose properties depended on window length and position versus the beginning of the AE pulse. Hence, kurtosis and skewness, due to the high dynamics of changes, can be good indicators for the identification of the beginnings of AE pulses related to treeing. The values of the parameters, however, did not reflect the duration of AE signals, and their use required the application of an analysis window with a minimum length. In the analyzed case, the results were regarded as satisfactory and repeatable for window lengths over 200 samples.
The other group of analyzed parameters, which describe the signal by its power characteristics, i.e., band power and signal energy, demonstrated elevated values for the duration of the signal. The steepness of the signal energy increase did not depend on the length of the analysis window, but an appropriate choice of the length helped to identify the place in the signal where the signal amplitude dropped significantly. This is seen in the diagram as an inflection point.
Since AEs in electrical treeing are related to the breaking of polymer chains or the presence of partial discharge in the existing channels, the value and dynamics of signal energy changes during a single pulse can be linked to treeing intensity.
By analyzing the results for each neural network, one can see that the FNN demonstrated the highest efficiency. Slightly poorer results were obtained for RBF, WNN, and SVM. Moreover, for FNN and RBF, the results for each window length applied to the signal were comparable. WNN and SVM rendered better results for longer analysis windows. Moreover, the WNN in its realization used only two wavelet neurons, while the structure of the FNN required at least 15 neurons.
In the case of networks trained by means of the first principal component, the efficiency of each network dropped by no more than 0.5%.
Summing up, the high efficiency in detecting AE signals accompanying electrical treeing needs to be highlighted. However, in order to obtain as much information as possible on the AE pulse course, it seems justified to use an algorithm based on the following three stages: signal envelope analysis, detection of AE signals, and analysis of individual signals.
A signal envelope analysis could provide hints for identifying the optimum window length at the discrimination level assumed on an arbitrary basis.
The performed experiments revealed that ANN can be successfully used for signal detection and classification, with the kurtosis, skewness, and energy of a single signal as the classifying parameters.
The effectiveness of the applied method was confirmed by the results obtained for various electrical parameters, i.e., different inter-electrode distances and different intensity of the electric field in this space. Regardless of the values of the above parameters, the results of AE signal identification were characterized by equally high efficiency, which proves the correctness of the assumptions made. The next stage covered a detailed analysis of the signals classified in the previous step with regard to the criteria assumed by the experimenter.
The completed analyses justify further studies on using ANN for the analysis of partial discharges and electrical treeing in solid dielectrics.
This knowledge could be of particular value to persons making diagnoses, provided that the proposed method can be used online. That is why future efforts should focus on developing algorithms that would satisfy this requirement.

Conflicts of Interest:
The authors declare no conflict of interest.