Artificial Neural Network Application for Partial Discharge Recognition: Survey and Future Directions

This paper surveys recent progress in applying artificial neural networks (ANNs) to partial discharge (PD) pattern recognition and classification. Contributions from several authors are presented and discussed. High recognition rates have been recorded for several PD faults, but many factors still hinder correct recognition of PD by the ANN: high-amplitude noise or the wide spectral content typical of industrial environments, trial-and-error approaches to determining an optimum ANN, multiple PD sources acting simultaneously, the lack of a comprehensive and up-to-date databank of PD faults, and the selection of the characteristics that allow the type of source to be recognized correctly. These issues are currently being addressed by researchers. The suggestions for improvement proposed by the authors include: (1) determining the optimum weights in training the ANN; (2) using PD data captured over a long stressing period in training the ANN; (3) having the ANN recognize different PD degradation levels; (4) using the same resolution sizes of the PD patterns when training and testing the ANN with different PD datasets; (5) understanding the characteristics of multiple concurrent PD faults and effectively recognizing them; and (6) developing techniques to shorten the training time of the ANN as applied to PD recognition. Finally, this paper critically assesses the suitability of ANNs for both online and offline PD detection, outlining the advantages to practitioners in the field. It is possible for ANNs to determine the stage of degradation of the PD, thereby giving an indication of the seriousness of the fault.


Introduction
Over the years, partial discharge (PD) recognition has been a topic of interest for a number of reasons, in particular the need to distinguish between different PD fault sources within the insulation systems of power apparatus and to discriminate them from extraneous interference events considered as noise [1–5]. PDs are electrical discharges that occur within or outside the insulation of a high-voltage (HV) system under electric stress [6,7]. It is essential to recognize these faults at an early stage, before they lead to disastrous equipment conditions with serious financial and safety implications. Therefore, developing techniques to characterize and classify PD has become of profound importance to condition monitoring (CM) engineers [8]. The nature, form and characteristics of PD have been widely investigated and, in many ways, established [9–11]. Despite that, a further step is needed to develop novel techniques that can effectively classify PD patterns and give a reliable assessment of the nature of the PD faults. To carry out the pattern recognition task, four main techniques have been recognized [12]: template matching, the statistical approach, the syntactic approach and intelligent systems. (1) In template matching, a sample of the pattern to be recognized is readily available and is correlated with a stored template. Examples of this technique are the distance classifiers, e.g., the minimum distance classifier [13]. (2) In the statistical approach, each pattern is characterized by some measured features and represented as a point in a multi-dimensional space [14]. The objective of this second technique is to choose features that allow pattern fingerprints belonging to different categories to occupy separate regions of the multi-dimensional feature space. (3) The syntactic approach is another technique for recognizing complex patterns. Here a hierarchical view is adopted, in which a pattern is regarded as being composed of subpatterns that are individually less complex [15]. The main complex pattern is a function of the interrelationships between these smaller subpatterns. (4) Among the intelligent system techniques, one example is the artificial neural network (ANN).
Based on the aforementioned pattern recognition techniques, distance classifiers, statistical classifiers and artificial intelligence classifiers have been applied to recognize PD. Examples of the distance classifiers that have been applied are the minimum distance classifier [13] and nearest neighbour classifiers [16]. The statistical classifiers employed are the Bayes classifiers [17] and the recognition rate classifiers [18], while the intelligence classifiers include ANNs [7,10,19–21], fuzzy logic controllers [17], hidden Markov models [22,23], support vector machines [24,25], genetic algorithms [26] and data mining techniques [27]. Among them, the ANN is one of the most successful pattern recognition techniques because of its capability to learn input-output relationships from a few examples. The ANN techniques applied for PD pattern recognition include the feed-forward neural network using back-propagation (BP) [10,16,28], the Kohonen self-organizing map (KOH) and learning vector quantization (LVQ) [16,29], adaptive resonance theory [30], the counter propagation neural network [31], the probabilistic neural network (PNN) [18,32], the cellular neural network [33], the modular neural network (MNN) [34,35], the extension neural network [36], fuzzy neural networks [37] and, most recently, the ensemble neural network (ENN) [7]. These techniques yield encouraging results, with recognition rates reaching as high as 90% in some instances when testing was done with unknown PD fingerprints. When applied for PD pattern recognition, the template matching approach (e.g., the minimum distance classifier) and the intelligent technique (e.g., the ANN) recorded up to 100% recognition rate for some PD fault examples [10,38]. The statistical approach (e.g., principal component analysis), however, is commonly applied as a feature extraction technique for PD data in order to determine the most suitable parameters for classification. The syntactic approach has never been applied to classify PD fingerprints.
Energies 2016, 9, 574
In this paper, a comprehensive survey of the performance of several ANN models applied for PD recognition is carried out, together with their strengths and limitations. The ANN is chosen for this study because of its wider adoption for PD recognition by several researchers. The main advantage of the ANN over other techniques is its ability to learn complex nonlinear input-output relationships and to apply sequential training procedures in order to adapt itself to the data to be recognized. Suggestions for improvement are made, and the impact of ANN research on real-time PD location and recognition is critically analysed.
Section 1 is the introduction, while Section 2 evaluates the impact of ANN research on practical PD recognition. Section 3 describes the BP algorithm topologies. In Section 4, the ANN models applied for PD recognition are discussed. Section 5 presents the previous research works on ANNs for PD recognition. Afterwards, Section 6 provides some discussion of the strengths of the ANN when applied for PD recognition. Section 7 presents the limitations and suggestions for future improvement. Finally, the conclusions are presented in Section 8.

Impact of the Artificial Neural Networks Research on Practical Partial Discharge Recognition
CM has become a vital technique in HV equipment maintenance and is increasingly attracting attention globally [39,40]. Minimizing false alarms and allowing planned maintenance bring a series of advantages to the power industry. There is a need to lower maintenance cost, reduce the severity of damage, minimize accidents and ensure the safety of personnel. Because of these needs, the development of new techniques for identifying PD sources has become the main challenge for many experts interested in improving the procedures currently used in condition-based maintenance (CBM).
This paper proposes that the ANN could be a potential tool for future CM, offering improved sensitivity, reliability, intelligence and cost savings. The question now is how the ANN can be applied to improve CM and its assessment. The ANN on its own cannot perform all the CM functions, but it can be used in conjunction with existing techniques to provide a robust CM tool. Given the many advantages of online CM, the capability of the ANN to be applied to online monitoring can be examined. A review of the literature [41] shows that online CM systems (e.g., PD measurement) have four main parts, i.e., sensors, data acquisition, fault detection and diagnosis. The sensors usually detect the fault and convert a physical quantity to an electrical signal. Data acquisition systems then process this information from the sensors, usually using microcomputers. Finally, the fault detection and diagnosis systems determine the nature of the fault and give a clear indication for maintenance.
Current fault detection techniques involve the application of frequency-domain and time-domain signal processing techniques to obtain signatures for a fault or a normal condition. At present, fault diagnosis is carried out by experts with the aid of computers and advanced techniques, such as the one demonstrated by Álvarez et al. [42]. The ANN can be very attractive for both online and offline fault detection and diagnosis and can reduce the reliance on experts for fault interpretation, thereby reducing cost and visual implementation work. The ANN can be trained offline with all possible fault data. If sufficient data is not available in terms of scope (i.e., data must be available from a wide range of operating conditions, ideally including fault, unusual and undesirable conditions), then the developed ANN may not possess adequate accuracy or functionality for the intended applications. Therefore, the data used for training the ANN can come from the actual HV plant, either in service or from factory tests. If such data is not available, then the possibility of using simulated data from laboratory experiments may be investigated. After sufficient training, the ANN can act as an experienced evaluator. Once the fault data is fed into the developed ANN, it can indicate the fault within seconds. Through training and testing with known faults, the ANN can also track the degradation level and indicate the urgency of fault correction. However, the ANN has some limitations. These include the excessive training involved, the lack of sufficient examples of PD faults currently occurring in the field and the fact that, in some cases, one fault may lead to another. Multiple concurrent faults and the problems associated with the presence of noise sources of different nature may also hinder the process of identification.
A diagram showing the overall structure of the proposed CM technique, encompassing the ANN with post-processing elements, is shown in Figure 1. The ANN can be applied in two stages: first, an online ANN for detecting the fault and, second, an offline ANN for tracking the level of degradation of the insulation, thereby offering significant potential for improving plant CM functions. The PD degradation assessment is done offline because of the repeated training and testing of the ANN involved. The ANN can have a significant influence on the overall maintenance cost and reliability by greatly reducing the time needed for fault diagnosis and increasing its accuracy.

The Back Propagation Algorithm
The BP algorithm is a form of supervised learning for the feed-forward ANN [10,43] and it occurs in two steps, namely forward and backward learning. It learns through repeated presentation of the input and output examples, each time back-propagating the error and updating the weights and biases until the error is reduced to the desired value [7].
Figure 2 shows how the BP algorithm is implemented together with the neurons (also known as the processing elements). The inputs (X_P1, ..., X_PNo) are initially propagated through the network and the output is computed. Then, the error at the output (T_P1, ..., T_PNM) is back-propagated through the network and the weights are updated according to the gradient descent algorithm [34]. This process continues until the mean square error at the output reaches the minimum acceptable value. Some of the major drawbacks of the BP are its longer convergence time and its susceptibility to training failure [44]. One of the improvements made to address these problems is adding a momentum term for faster training, at the cost of extra memory space [22,44]. Despite these issues, the BP has been widely applied for PD recognition because of its easier implementation and its ability to provide better PD recognition results than other ANN algorithms. Gulski and Krivda [10] applied different ANN algorithms for PD recognition, and their results show that the BP provides the better recognition result.
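The forward pass, error back-propagation and momentum update described above can be illustrated with a minimal sketch. It assumes a single hidden layer, sigmoid neurons, one output and sample-by-sample updates; the function names (`train_bp`, `predict`), the learning rate and the momentum value are illustrative assumptions and are not taken from the surveyed papers.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(w_h, w_o, x):
    """Forward pass: returns the hidden activations and the network output."""
    xb = list(x) + [1.0]  # trailing 1.0 feeds the bias weight
    h = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in w_h]
    y = sigmoid(sum(w * hi for w, hi in zip(w_o, h + [1.0])))
    return h, y


def predict(w_h, w_o, x):
    return forward(w_h, w_o, x)[1]


def train_bp(samples, n_hidden=3, lr=0.5, momentum=0.5,
             target_mse=1e-3, max_epochs=5000, seed=0):
    """Train on (inputs, target) pairs until the MSE target or epoch limit."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w_h = [[rng.uniform(-1.0, 1.0) for _ in range(n_in + 1)]
           for _ in range(n_hidden)]
    w_o = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden + 1)]
    # Momentum buffers: one extra stored value per weight (the memory cost).
    v_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
    v_o = [0.0] * (n_hidden + 1)
    mse, epochs = float("inf"), 0
    for epochs in range(1, max_epochs + 1):
        sq_err = 0.0
        for x, t in samples:
            h, y = forward(w_h, w_o, x)
            err = t - y
            sq_err += err * err
            # Backward pass: output delta first, then hidden deltas.
            d_o = err * y * (1.0 - y)
            d_h = [d_o * w_o[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            xb, hb = list(x) + [1.0], h + [1.0]
            # Gradient-descent updates with a momentum term.
            for j in range(n_hidden + 1):
                v_o[j] = momentum * v_o[j] + lr * d_o * hb[j]
                w_o[j] += v_o[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    v_h[j][i] = momentum * v_h[j][i] + lr * d_h[j] * xb[i]
                    w_h[j][i] += v_h[j][i]
        mse = sq_err / len(samples)
        if mse <= target_mse:
            break
    return w_h, w_o, mse, epochs
```

In a PD setting the input vectors would be the chosen fingerprints and the targets would encode the defect class; the two-input toy data used for checking the sketch is only a stand-in.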

Modular Neural Network
Several implementations of the MNN exist. A feature-decomposition-based MNN was employed for PD recognition [34,35]. In this case, the bulk training fingerprints are divided into several subsets, with each subset comprising values of a specific parameter, as illustrated in Figure 3. Each ANN can be trained independently on a subset of the data using the BP algorithm. To determine the output of the modular network, a majority-voting technique is employed to combine the outputs of the constituent ANNs and reach the final decision. The MNN therefore recognizes a particular input as belonging to a particular group if the majority of the sub-networks assign the input to that group [35].
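The majority-voting step can be sketched as follows. This is only an illustration: the sub-networks are represented by arbitrary callables (hypothetical stand-ins for trained ANNs), and returning `None` when no class reaches a strict majority is an assumption, since the source does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label chosen by more than half of the voters, else None."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes > len(labels) / 2 else None


def mnn_classify(sub_networks, feature_subsets):
    """Feed each feature subset to its own sub-network and combine the votes.

    sub_networks    -- list of callables, one per parameter subset
    feature_subsets -- list of inputs, aligned with sub_networks
    """
    labels = [net(subset) for net, subset in zip(sub_networks, feature_subsets)]
    return majority_vote(labels), labels
```

With three sub-networks, two agreeing votes are enough to fix the decision, which is why an odd number of modules is a convenient choice.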

The Ensemble Neural Network
An ENN, as presented in Figure 4, is a method of training several BP ANN topologies and combining their component predictions [45]. The motivation for this method is that combining the component ANN predictions is expected to improve the generalization performance of the ANN considerably. The literature [46] shows that this is only possible if the constituent neural networks forming the ensemble are both diverse and accurate. Several techniques have evolved for training the ENN [45], but bagging (bootstrapping) is seen to be the most effective. In bootstrapping, a number of training fingerprints are generated by bootstrap resampling of the original fingerprint, so that some training samples are repeated while others are left out. Bootstrapping helps prevent the overfitting associated with NNs and provides estimates of the bias and variance [43].
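A minimal sketch of bagging is given below. Each ensemble member is trained on a bootstrap resample (drawn with replacement, same size as the original set) and the members' predictions are combined by plurality vote. The combination rule and the `train_nearest_mean` learner are assumptions made to keep the example short; in the ENN the members would be BP-trained networks.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement: some repeat, some are left out."""
    return [data[rng.randrange(len(data))] for _ in data]


def train_ensemble(data, train_fn, n_members, seed=0):
    """Train each member on its own bootstrap resample of the data."""
    rng = random.Random(seed)
    return [train_fn(bootstrap_sample(data, rng)) for _ in range(n_members)]


def ensemble_predict(members, x):
    """Combine member predictions by plurality vote (an assumed rule)."""
    votes = Counter(member(x) for member in members)
    return votes.most_common(1)[0][0]


def train_nearest_mean(sample):
    """Hypothetical stand-in learner: nearest class mean on one feature."""
    sums = {}
    for value, label in sample:
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + value, count + 1)
    means = {label: total / count for label, (total, count) in sums.items()}
    return lambda x: min(means, key=lambda label: abs(x - means[label]))
```

Because every member sees a slightly different training set, the members differ from one another, which is the diversity condition mentioned above.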

The Probabilistic Neural Network
The PNN is a competitive learning technique based on the Parzen window concept of multivariate probability approximation [18] (Figure 5). The PNN obtains the probability density function (PDF) based on Bayes' decision-making approach [18,28]. The PNN is made up of an input layer, a hidden layer and an output layer [27]. The hidden layer consists of the exemplar and summation layers. Input parameters are fed into the network through the input layer. The exemplar layer is made up of Gaussian functions formed using a specified set of data points representing the centres [47]. The summation, or class, layer sums the outputs coming from the exemplar layer for each class [47]. The decision layer then performs the voting, choosing the highest value [28], and the related class label is obtained.
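The layer structure described above maps onto a few lines of code. The sketch below assumes one Gaussian per training exemplar, a single shared smoothing width `sigma` and equal class priors; these are simplifying assumptions, and the function name `pnn_classify` is illustrative.

```python
import math


def pnn_classify(x, exemplars, sigma=0.1):
    """Classify x with a Parzen-window PNN.

    exemplars -- dict mapping class label to a list of training vectors
    sigma     -- smoothing width of the Gaussian kernels (assumed shared)
    """
    scores = {}
    for label, centres in exemplars.items():
        total = 0.0
        for c in centres:
            # Exemplar layer: one Gaussian centred on each training vector.
            d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
            total += math.exp(-d2 / (2.0 * sigma ** 2))
        # Summation layer: average kernel response per class.
        scores[label] = total / len(centres)
    # Decision layer: the class with the highest score wins.
    best = max(scores, key=scores.get)
    return best, scores
```

No iterative training is involved: storing the exemplars is the whole learning step, which is why the choice of `sigma` largely determines the behaviour of the classifier.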

The Radial Basis Function Network
The radial basis function network (RBFN) is another ANN model, mostly applied to solve interpolation problems, and consists of two layers [43], as shown in Figure 6. Unlike in the feed-forward network, the neurons in the first layer do not pass a weighted sum of the inputs through a sigmoid function. Instead, this layer consists of basis functions (φ_i), mostly Gaussian functions, and the output of each first-layer neuron is determined by the centre of its basis function and the network input. When the input moves away from a given centre, the neuron's output drops off quickly to zero. These neurons therefore possess receptive fields, because they respond only to inputs that are close to their centres [43]. The RBFN trains more quickly than the feed-forward network and has unsupervised learning characteristics, but it requires many neurons for high-dimensional input spaces.
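The localized response of the basis functions can be seen in a short sketch of the RBFN forward pass. It assumes Gaussian basis functions with a common width `sigma` and a linear output layer; how the centres and output weights are obtained is outside the scope of this example.

```python
import math


def gaussian_basis(x, centre, sigma):
    """Response of one basis function; 1.0 at the centre, near 0 far away."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def rbfn_output(x, centres, sigma, weights, bias=0.0):
    """Linear combination of the basis-function responses."""
    phis = [gaussian_basis(x, c, sigma) for c in centres]
    y = sum(w * p for w, p in zip(weights, phis)) + bias
    return y, phis
```

Because each basis function covers only a small region, covering a high-dimensional input space needs many centres, which is the scaling drawback noted above.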

Relevant Previous Research Works on Artificial Neural Network for Partial Discharge Recognition
Many authors have investigated the application of NNs for PD pattern recognition, and this research seems to have started in the early nineties. Because of the advantages of the phase-amplitude-number (φ-q-n) patterns in evaluating PD defects (e.g., their visible discriminating capability) [48], earlier research started by extracting useful information from these distributions. Since different types of PD faults generate different φ-q-n patterns (see the example in Figure 7), the ANN was able to discriminate these faults even with slight pattern variations. The initial stage in pattern recognition was the choice of appropriate fingerprints that can be applied as training and testing parameters for the ANN.
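To make the φ-q-n pattern concrete, the sketch below accumulates a list of PD pulses into a two-dimensional count matrix. The pulse format (phase in degrees, charge in arbitrary units), the use of the charge magnitude and the clamping of charges above `q_max` into the top bin are all assumptions for illustration.

```python
def build_phi_q_n(pulses, n_phase_bins, n_charge_bins, q_max):
    """Count pulses per (charge bin, phase bin) cell.

    pulses -- iterable of (phase_deg, charge) pairs
    Returns a matrix indexed as grid[charge_bin][phase_bin].
    """
    grid = [[0] * n_phase_bins for _ in range(n_charge_bins)]
    for phase_deg, charge in pulses:
        p = int((phase_deg % 360.0) / 360.0 * n_phase_bins)
        p = min(p, n_phase_bins - 1)  # guard against rounding at 360 degrees
        q = int(abs(charge) / q_max * n_charge_bins)
        q = min(q, n_charge_bins - 1)  # clamp charges at or above q_max
        grid[q][p] += 1
    return grid
```

Each cell value is the number n of discharges observed at that phase and amplitude, so flattening the matrix row by row gives one possible ANN input vector.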
Earlier research work by Suzuki and Endoh [49] showed how the φ-q-n patterns from a needle-type defect in cross-linked polyethylene (XLPE) cable can be transformed into smaller patterns by reducing the number of pixels, thereby reducing the amplitude and phase resolutions. This reduces the amount of input data to the ANN. A pixel corresponds to a specific phase angle range and a specific discharge magnitude range. The paper applied the BP algorithm, and the results showed that the correct response reaches 100% detection probability and converges more rapidly for the smaller distributions than for the larger ones. This result indicates that φ-q-n distributions with smaller numbers of pixels are better PD recognition parameters for the ANN.
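One simple way to reduce the number of pixels is to merge neighbouring cells by summing their counts. The sketch below does this with fixed integer factors; it is an assumed illustration of the idea and not necessarily the exact transformation used in [49].

```python
def reduce_resolution(grid, factor_q, factor_phi):
    """Merge factor_q x factor_phi blocks of cells by summing their counts."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor_q or cols % factor_phi:
        raise ValueError("grid size must be divisible by the reduction factors")
    out = [[0] * (cols // factor_phi) for _ in range(rows // factor_q)]
    for r in range(rows):
        for c in range(cols):
            out[r // factor_q][c // factor_phi] += grid[r][c]
    return out
```

Summation keeps the total number of discharges unchanged while shrinking the ANN input layer by a factor of `factor_q * factor_phi`.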
The technique of choosing learning fingerprints by Suzuki and Endoh [49] was adopted by Hozumi et al. [50] and Phung et al. [51]. They also applied the BP algorithm, and their results show that the ANN learns and updates faster, with a high recognition rate above 90%. Gulski and Krivda [10] evaluated the performance of different ANN algorithms, though they used a different approach to determining the input pattern from that of Suzuki and Endoh [49].
Gulski and Krivda [10] studied the application of three types of ANN algorithms for classifying two-electrode PD models. These are models of artificial defects of industrial objects in 400 kV gas insulated substation (GIS) compartments. The work derived the H_n(φ)+, H_n(φ)−, H_qn(φ)+, H_qn(φ)−, H_n(q)+ and H_n(q)− plots during 20 min of testing at 20% above the PD inception voltage, and the patterns were evaluated using 15 sets of statistical fingerprints. These included the skewness (sk) and kurtosis (ku) of the positive and negative half cycles of the H_n(φ) and H_qn(φ) histograms, as well as the cross-correlation (cc), discharge factor (Q) and the number of peaks. Definitions of these statistical parameters can be found in the literature [10]. These statistical parameters form the bulk of the training and testing data for the ANN models, and encouraging performance, with recognition rates up to 100%, was recorded for trained PD fingerprints. A recognition efficiency of 100% was obtained for the BP, compared with 70% for the other algorithms. Despite the success of this scheme, a number of PD fault misclassifications were recorded. For each algorithm, approximately 8 out of 12 PD defects were misclassified as belonging to others, but Gulski and Krivda [10] did not offer an explanation for this observation.
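As an illustration of two of these fingerprints, the sketch below computes the skewness and kurtosis of a half-cycle histogram by treating the normalized bin heights as a discrete distribution over the bin index. This uses the common textbook definitions, with 3 subtracted from the kurtosis so that a normal distribution scores zero; the exact definitions used in [10] should be taken from that reference.

```python
import math


def histogram_moments(heights):
    """Return (skewness, kurtosis) of a histogram given its bin heights."""
    total = float(sum(heights))
    if total <= 0:
        raise ValueError("histogram must contain at least one count")
    p = [h / total for h in heights]  # normalize to probabilities
    mu = sum(i * pi for i, pi in enumerate(p))
    var = sum((i - mu) ** 2 * pi for i, pi in enumerate(p))
    if var == 0:
        raise ValueError("histogram has zero variance")
    sigma = math.sqrt(var)
    sk = sum((i - mu) ** 3 * pi for i, pi in enumerate(p)) / sigma ** 3
    ku = sum((i - mu) ** 4 * pi for i, pi in enumerate(p)) / sigma ** 4 - 3.0
    return sk, ku
```

A symmetric histogram gives a skewness of zero, while a histogram whose tail extends towards higher bin indices gives a positive value; a negative kurtosis indicates a flatter shape than the normal distribution.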
Along with Gulski and Krivda [10], further literature has adopted the use of statistical fingerprints derived from φ-q-n patterns. For example, Candela et al. [52] developed a PD recognition system in which statistical Weibull analysis was applied to the 3-D patterns for feature extraction. The paper considered three artificial PD geometries, i.e., dielectric surface discharges in air, a metallic dielectric parallel air gap and a dielectric bounded spherical cavity. The sk, ku, α and β values formed the input parameters to the ANN, where α and β are the Weibull scale and shape parameters. Based on the application of these parameters, success rates up to 98% were recorded. Further, Mirelli and Schifani [29] evaluated the application of statistical fingerprints as input variables to the ANN. They evaluated two separate parameters from each of the three PD patterns, i.e., H_n(q), H_n(φ) and H_qn(φ). From the H_n(q) histograms, α and β were determined, while from the H_n(φ), sk and ku were evaluated. The lacunarity (a measure of the denseness of the fractal surface) and the dimensionality (a quantification of the surface roughness of the φ-q-n plot) are further factors derived from the φ-q-n plots. These six parameters were applied as inputs for training the ANN using the BP algorithm. The discriminating capability of the system was tested, with a high recognition rate of 92% in 20 kV insulators. In another work, Karthikeyan et al. [28] investigated the effectiveness of the BP algorithm for recognizing PD defects in voids, corona and surface discharges, using various statistical measures to obtain the fingerprints for the ANN. The statistical parameters considered as inputs to the ANN include: (1) the maximum and minimum values of a specific parameter; (2) measures of dispersion, i.e., range, mean deviation and quartile deviation; and (3) measures of central tendency. Their results show that the BP algorithm achieved a good recognition rate, up to 100% for some testing examples, though a number of misclassifications were recorded. The paper concludes that the BP algorithm is not suitable for online training because of the excessive training time needed to reach the target MSE at the output of the ANN. The learning rate also plays an important role in the convergence of the system, as high learning rates yield shorter training times but a higher convergence error. However, these issues can be addressed with the corrective measures proposed by the authors of this paper.
Other investigations employ a large matrix of parameters derived from the phase-resolved patterns as training and testing fingerprints, comprising the number of discharges, the amplitude and the phase angle. These have demonstrated good classification potential even with untrained data. For example, Badent et al. [53] developed a novel PD diagnosis system using ANNs. The training data consisted of a (125 × 125) matrix derived from the phase-resolved patterns, where the row index is the apparent charge and the column index is the phase resolution. The value at each index corresponds to the number of discharges. The system recorded excellent performance.
Hong et al. [34] investigated the application of a feature-decomposition-based MNN for classifying PD patterns. The training data comprised fingerprints derived from the φ-q-n PD patterns. These included the pulse count, the average PD magnitude and the maximum PD magnitude. The bulk training parameters were partitioned into three subsets, with each subset comprising values of a specific parameter. Three ANNs were independently trained by the BP algorithm, each using one subset of the data. To determine the output of the MNN, a majority-voting technique combined the outputs of these ANNs to give the final decision. The MNN learns faster than the single network and has been shown to perform better, especially when discriminating unknown data.
Apart from the work of Badent et al. [53], Yamazaki et al. [54] investigated the application of ANNs to categorize PD patterns from voids with or without ultraviolet radiation. Partial discharge inception voltages (PDIV) were used as the training fingerprints. The performances of the ANNs were evaluated based on their mean square error (MSE). The network with the highest number of layers had the lowest MSE.
Mazroua et al. [16] did not adopt the usual feature extraction technique using ϕ-q-n PD patterns but rather used pulse train patterns for classifying PD from electrical trees and voids. The parameters chosen included peak amplitude, rise time, fall time, duration and the area covered by the PD pulse. The ANN was also able to recognize some changes as a result of ageing. It gave good recognition only on single discharge sources, and the paper identified the design of ANNs for multiple defects as future work.
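As a rough illustration of pulse-shape fingerprints of this kind, the sketch below extracts the five quantities from one sampled pulse. The 10% and 90% thresholds used for rise and fall time, the trapezoidal area and the function name `pulse_features` are assumptions for illustration; [16] may define these quantities differently.

```python
def pulse_features(samples, dt):
    """Extract shape features from one sampled, positive-going PD pulse.

    `samples` is a list of amplitudes and `dt` the sampling interval.
    Rise and fall times use 10%/90% of the peak (an assumed convention).
    """
    peak = max(samples)
    i_peak = samples.index(peak)
    lo, hi = 0.1 * peak, 0.9 * peak
    last = len(samples) - 1
    # Leading edge: first samples reaching 10% and 90% of the peak.
    i10 = next((i for i in range(i_peak + 1) if samples[i] >= lo), i_peak)
    i90 = next((i for i in range(i_peak + 1) if samples[i] >= hi), i_peak)
    # Trailing edge: first samples falling back to 90% and 10% of the peak.
    j90 = next((i for i in range(i_peak, last + 1) if samples[i] <= hi), last)
    j10 = next((i for i in range(i_peak, last + 1) if samples[i] <= lo), last)
    # Trapezoidal estimate of the area under the pulse.
    area = sum((a + b) / 2.0 * dt for a, b in zip(samples, samples[1:]))
    return {
        "peak": peak,
        "rise_time": (i90 - i10) * dt,
        "fall_time": (j10 - j90) * dt,
        "duration": (j10 - i10) * dt,
        "area": area,
    }
```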
In another work, Tian et al. [44] also applied the Fourier transform for PD identification, but chose the learning fingerprints from the spectral density of acoustic signals and so did not utilize the ϕ-q-n patterns as training sets for the ANN. This technique reduces the number of input neurons and the amount of training data. The ANN algorithms showed satisfactory results, with success rates of up to 90%.
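One way a spectral density can yield a compact input vector is to sum the power spectrum into a few frequency bands. The sketch below assumes this band-summing scheme and a direct DFT for simplicity; the function name `band_powers` and the number of bands are hypothetical, and the actual feature construction in [44] may differ.

```python
import cmath
import math

def band_powers(signal, n_bands):
    """Normalized power in `n_bands` equal-width frequency bands.

    Uses a direct one-sided DFT (O(n^2)), which is adequate for a sketch.
    """
    n = len(signal)
    half = n // 2
    if n_bands < 1 or n_bands > half:
        raise ValueError("n_bands must be between 1 and len(signal) // 2")
    power = []
    for k in range(half):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        power.append(abs(coeff) ** 2 / n)
    width = half // n_bands
    bands = [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]
    total = sum(bands) or 1.0  # avoid division by zero for an all-zero signal
    return [p / total for p in bands]
```

With, say, four bands, a record of several thousand samples is reduced to four network inputs, which illustrates how the input layer can be kept small.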
Because of the misclassification problem inherent to the BP algorithm, Hoof et al. [55] developed a new ANN classification technique using the guarded neural classifier (GNC), which performs better than the usual BP algorithm when both are applied to the multi-layer perceptron neural network (MLPN). The GNC applies nearest neighbour classification. Its main advantage is the way it handles the misclassification problem inherent to the BP algorithm, i.e., strange inputs and uncertain classifications are treated separately during learning. In order to classify these strange input vectors during training, the network output obtained during the training session is supervised and evaluated by an independent unit known as the guarded independent neuron (GI-neuron). Overall performance shows its ability to recognize new patterns without forgetting previous ones.
More recent research focused on: (1) determining the best technique of choosing initial weights and biases for training the ANNs; (2) noise elimination ability; and (3) short training and convergence times.
The noise elimination capability of ANNs was investigated by Chang et al. [56]. Four experimental models of PD in cast-resin current transformers (CTs) with some insulating defects were used as training sets for the ANN. The four cases were a perfect CT, corona discharge, low-voltage coil PD and HV coil PD. From the experiments, ϕ-q-n patterns were derived from these models and about 120 matrices were obtained. Each matrix was of (M × N) dimension, where the x-axis covers 0°-360° with each phase segment of size (360°/M), and the q-axis has a range of 0-400 pC with each division equivalent to (400/N) pC. In order to ascertain the recognition efficiency, different levels of random noise were created to distort the original measurement. Their results show a recognition rate of about 80% for PD measurements with up to a 20% noise level.
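The construction of an (M × N) ϕ-q-n count matrix and its distortion by random noise can be sketched as below. The noise model (a uniform perturbation scaled by the largest cell count) and the names `phi_q_n_matrix` and `add_random_noise` are assumptions for illustration; the exact noise definition used in [56] is not given here.

```python
import random

def phi_q_n_matrix(pulses, m, n, q_max=400.0):
    """Count discharges in an m x n grid: matrix[phase_bin][charge_bin].

    `pulses` is an iterable of (phase_deg, charge_pC) pairs. Each phase
    bin spans 360/m degrees and each charge bin spans q_max/n pC.
    Charges above q_max are placed in the last bin (an assumed choice).
    """
    matrix = [[0] * n for _ in range(m)]
    for phase, charge in pulses:
        if charge < 0:
            continue  # this sketch only bins non-negative magnitudes
        row = min(int((phase % 360.0) / (360.0 / m)), m - 1)
        col = min(int(charge / (q_max / n)), n - 1)
        matrix[row][col] += 1
    return matrix

def add_random_noise(matrix, level, rng):
    """Distort each cell by up to +/- level * (largest count), floored at 0."""
    peak = max(max(row) for row in matrix)
    return [[max(0.0, cell + rng.uniform(-level, level) * peak)
             for cell in row] for row in matrix]

# Example: a 20% noise level applied to a small 4 x 4 pattern.
clean = phi_q_n_matrix([(10, 50), (10, 60), (350, 390)], m=4, n=4)
noisy = add_random_noise(clean, 0.2, random.Random(0))
```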
A potential technique for choosing the optimal set of initial weights for training an ANN was proposed by Kuo [57]. Particle swarm optimization (PSO) was applied to provide the optimal set of initial weights and biases for the ANN model. According to [57], PSO is the first optimization technique used to obtain the best initial weights and biases for the BP ANN; this is achieved by optimizing certain objective functions. The paper shows the efficiency of the scheme in identifying the insulation aging status of cast-resin transformers in both noisy and noiseless environments. A high recognition rate of 94% was obtained without noise, and with a noise level of around 30% the recognition rate was still around 84%.
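The idea of using PSO to pick initial weights can be sketched with a very small network. Everything specific here is an assumption for illustration: the 2-2-1 network, the MSE objective, the inertia and acceleration constants (0.7 and 1.5) and the names `network_mse` and `pso_initial_weights` are not taken from [57]. The weights returned would then serve as the starting point for BP training.

```python
import math
import random

def network_mse(w, data):
    """MSE of a 2-2-1 network (tanh hidden units, sigmoid output).

    `w` is a flat list of 9 weights and biases; `data` is a list of
    ((x1, x2), target) pairs.
    """
    err = 0.0
    for (x1, x2), target in data:
        h1 = math.tanh(w[0] * x1 + w[1] * x2 + w[2])
        h2 = math.tanh(w[3] * x1 + w[4] * x2 + w[5])
        z = max(-60.0, min(60.0, w[6] * h1 + w[7] * h2 + w[8]))  # avoid overflow
        out = 1.0 / (1.0 + math.exp(-z))
        err += (out - target) ** 2
    return err / len(data)

def pso_initial_weights(data, dim=9, particles=20, iters=100, seed=0):
    """Search for low-MSE starting weights with a basic global-best PSO."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]
    best_err = [network_mse(p, data) for p in pos]
    g = min(range(particles), key=best_err.__getitem__)
    g_pos, g_err = best_pos[g][:], best_err[g]
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (best_pos[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            e = network_mse(pos[i], data)
            if e < best_err[i]:            # update the particle's own best
                best_pos[i], best_err[i] = pos[i][:], e
                if e < g_err:              # update the swarm's best
                    g_pos, g_err = pos[i][:], e
    return g_pos, g_err
```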
The work undertaken by Chen et al. [58] produced a faster learning algorithm for training the ANN. In this paper, the BP network was applied to transformer insulation diagnosis, which included analysing: (1) low-voltage coil PD; (2) coil PD; (3) corona discharge; and (4) a healthy transformer. A faster learning algorithm known as resilient propagation was adopted as the learning rule because of its high convergence speed. The algorithm showed high recognition precision and high noise elimination capability. With 30% added random noise, recognition rates as high as 80% were recorded for some defects. Despite the faster convergence achieved by Chen et al. [58], the recognition rate was lower than that obtained by Kuo [57] with the same BP ANN initialized by PSO. In order to improve PD recognition, novel BP ANN optimization techniques combined with resilient propagation should be investigated to obtain high PD recognition rates in the presence of noise.
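Resilient propagation adapts a separate step size for every weight from the sign of successive gradients rather than from their magnitude, which is what gives it its fast convergence. The sketch below shows one commonly described variant (often called iRprop-, without weight backtracking) using the usual default constants; which variant and settings were used in [58] is not stated here, so these are assumptions.

```python
def rprop_step(weights, grads, prev_grads, steps,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """Apply one sign-based resilient propagation update in place.

    All four lists have one entry per weight. `steps` holds the current
    per-weight step sizes; `prev_grads` the gradients of the last call.
    """
    for i, g in enumerate(grads):
        product = g * prev_grads[i]
        if product > 0:        # same sign as before: accelerate
            steps[i] = min(steps[i] * eta_plus, step_max)
        elif product < 0:      # sign flipped: a minimum was overshot
            steps[i] = max(steps[i] * eta_minus, step_min)
            g = 0.0            # skip this update and forget the gradient
        if g > 0:
            weights[i] -= steps[i]
        elif g < 0:
            weights[i] += steps[i]
        prev_grads[i] = g

# Example: minimize f(w) = (w - 3)^2 for a single weight.
w, prev, step = [0.0], [0.0], [0.1]
for _ in range(200):
    rprop_step(w, [2.0 * (w[0] - 3.0)], prev, step)
```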
Recently, attention has been paid to the application of the PNN [18,32] and the RBFN [59] to PD recognition. Evagorou et al. [60] applied the PNN to categorize some PD fault geometries, i.e., corona and surface PD in air and oil. After training the PNN, the input vectors containing the features for classification were used to calculate the probability density function (PDF) of each category; together with the cost assigned to misclassification, the decision that minimizes the expected risk is then taken. Maximum likelihood training was applied and encouraging recognition probabilities of 99% were recorded for corona, while lower rates were recorded for floating and internal discharges. In other research, Karthikeyan et al. [18] also applied the PNN to categorize single-source PD patterns and a recognition performance of 100% was obtained for some input PD classes, though misclassification still persisted. This indicates that misclassification is still an issue with the BP ANN, where PD faults are misclassified as others, and techniques to eliminate this issue must be investigated. Recently, Venkatesh and Gopal [32] focused on recognizing complex multiple-source PD patterns using a composite version of the PNN, with a recognition efficiency of 97% being recorded. Chang [59] applied the RBFN to classify insulation defects such as external discharge, internal discharge and corona. The results indicate that the RBFN has the potential for PD recognition and is very effective for clustering PD defects of insulators with less complex features, which greatly reduced the size of the PD fault database. A summary of relevant ANN implementations is shown in Table 1.
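The PNN decision rule can be sketched as a Parzen-window estimate of each class PDF followed by a choice of the largest value. This minimal version assumes Gaussian kernels with a single smoothing parameter `sigma`, equal class priors and equal misclassification costs; none of these settings, nor the name `pnn_classify`, are taken from [18,32,60].

```python
import math

def pnn_classify(x, training, sigma=0.5):
    """Classify feature vector `x` with a basic probabilistic neural network.

    `training` maps each class label to a list of stored feature vectors.
    Returns the winning label and the per-class density estimates.
    """
    scores = {}
    for label, exemplars in training.items():
        total = 0.0
        for centre in exemplars:
            # Exemplar layer: Gaussian kernel centred on a training vector.
            d2 = sum((a - b) ** 2 for a, b in zip(x, centre))
            total += math.exp(-d2 / (2.0 * sigma ** 2))
        # Summation layer: average kernel response for this class.
        scores[label] = total / len(exemplars)
    # Decision layer: choose the class with the highest estimate.
    return max(scores, key=scores.get), scores

# Example with two hypothetical two-feature classes.
train = {"corona": [[0.0, 0.0], [0.1, 0.0]],
         "surface": [[1.0, 1.0], [0.9, 1.0]]}
label, _ = pnn_classify([0.05, 0.05], train)
```

Because there is no iterative weight adjustment, adding a new class only requires storing its exemplars.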

Strength of the Artificial Neural Networks Applied to Partial Discharge Recognition
From the review carried out, PD pattern recognition using ANNs covers the following aspects:
• Selection of faults to be investigated.
• Selection of appropriate fingerprints, which can be used to train and test the ANN.
• Achievement of high recognition rate targets.
Accordingly, several PD defects from two-electrode models as well as other models of artificial defects have been investigated, and PD patterns have been captured and established. ANNs have utilized these artificial PD for pattern classification tasks. These defects include cavities or voids at various positions in insulation, corona and surface discharges in air and oil, electrical trees and floating parts in insulation. Mechanisms of cavities of different sizes and positions within HV insulation systems such as polyethylene-terephthalate and epoxy resin have been established, including treeing pattern development and voids in transformer oil. Corona discharges in air and oil have been widely investigated. Corona discharge in air is studied using a point-plane arrangement with different gap distances. Corona in liquid is studied using a sharp needlepoint placed on pressboard in oil. The mechanism of corona in air and oil has already been established. Surface PD activity at the oil-pressboard interface is a well-known phenomenon that has been identified as a critical effect in HV apparatus. Repeatable surface discharge measurements have been obtained from the oil-pressboard interface using a point-plane configuration or any other sharp point. These faults represent some of the most common defects found in transformers, underground cables and electrical machines.
Among the variety of PD parameters fed into the ANN, pulse-height (Hqn(ϕ)) and phase analysis (Hn(ϕ)) fingerprints appear to be the most common among researchers tackling PD classification problems. The ϕ-q-n patterns are complex to analyse mathematically, so they are converted into 2D distributions, i.e., Hqn(ϕ) and Hn(ϕ), to simplify the analysis. Several fingerprints have been derived from these distributions, including discharge numbers, amplitude, phase and statistical operators. Statistical PD operators have been the most widely utilized and have been found to give good recognition probability, since they uniquely identify the parameters associated with each type of PD without being affected by the experimental set-up or the applied voltage level during measurement. These include the skewness (sk) and kurtosis (ku) of the Hn(ϕ)+, Hn(ϕ)−, Hqn(ϕ)+ and Hqn(ϕ)− distributions, the discharge asymmetry (Q), the cross-correlation factor (cc) and the modified cross-correlation factor (mcc). However, since the work of Gulski and Krivda [10], there appears to have been no research into further parameters best suited for PD recognition by the ANN. On the other hand, the BP network is the most widely used training model for the ANN as applied to PD classification problems. This is due to its ease of implementation and its track record of classifying complex data in other fields of application [61][62][63]. High recognition rates above 90% were recorded with BP, which is a major success of this scheme. The MNN using BP has also shown improved classification results, with recognition reaching as high as 90%. More recently, attention has been paid to the application of the PNN, and classification results of up to 100% have been recorded for some geometries. Based on this achievement, ongoing research by Venkatesh and Gopal [32] has examined the robustness of PNNs with regard to multiple concurrent PD sources, which are difficult to recognize effectively.
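The sk and ku operators can be computed from a phase-resolved distribution by treating the normalized bin values as probabilities. The sketch below follows the usual moment-based definitions (with 3 subtracted from the kurtosis so that a normal distribution gives 0); the function name `sk_ku` and the example bins are hypothetical.

```python
def sk_ku(phase, counts):
    """Skewness and excess kurtosis of a distribution such as Hn(ϕ).

    `phase` holds the bin positions (e.g., phase windows in degrees) and
    `counts` the value of the distribution in each bin.
    """
    total = sum(counts)
    p = [c / total for c in counts]              # normalize to probabilities
    mu = sum(x * pi for x, pi in zip(phase, p))  # mean position
    var = sum((x - mu) ** 2 * pi for x, pi in zip(phase, p))
    sk = sum((x - mu) ** 3 * pi for x, pi in zip(phase, p)) / var ** 1.5
    ku = sum((x - mu) ** 4 * pi for x, pi in zip(phase, p)) / var ** 2 - 3.0
    return sk, ku

# Example: a symmetric distribution has zero skewness.
sk, ku = sk_ku([0, 1, 2, 3, 4], [1, 2, 3, 2, 1])
```

Applying the function separately to the positive and negative half-cycle distributions gives one pair of inputs per distribution.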

Limitations and Suggestions for Improvement
After critical analysis of the literature, some limitations have been identified and suggestions for future improvement of the ANN are made:

• The BP depends on a trial and error approach to determine the optimum topology, which is time consuming [64]. Though the PSO technique [57] has been implemented to obtain the optimum weights and biases of the ANN, a simpler approach has not yet been determined to ensure a shorter training time for the ANN.
• Very few works have considered the application of ANNs for discriminating PD source positioning [49]. Discharges from similar PD sources (e.g., corona, void, surface discharges) at various positions of the HV insulation may vary in characteristics depending on whether they come from single or multiple sources. ANN topologies for these scenarios should be further developed.
• Most of the PD fingerprints applied for training and testing the ANN have been captured over a short period of time. It is necessary to capture data over long stressing periods because some fault PD patterns change over time scales of hours (e.g., voids), which can produce significant changes in parametric statistics. The authors of this paper made an attempt to discriminate different degradation levels of pressboard subjected to sustained oil surface discharges [7].
• Despite the great success recorded by Gulski in applying certain statistical operators as input to the ANN, there is a need to investigate novel PD parameters and those better suited for the ANN.
• As stated previously, feature selection from the ϕ-q-n pattern using statistical tools has been the most extensively utilized parameter extraction method for selecting inputs to the ANN. However, PD data has generally been captured and examined using fixed phase resolutions and amplitude bins. Little work has been reported in the literature on the sensitivity of the statistical operators to different ϕ-q-n resolution sizes and how this can affect the recognition rates of the ANNs, although recent work [65] has investigated the effect of ϕ-q-n resolution sizes on ANN recognition results. This is important because different instruments may have different settings for the ϕ-q-n patterns, which may lead to inaccurate PD classification results.

• Since PD fault research is still an on-going activity, to date there appears to be no comprehensive and up to date databank of PD faults, making the recognition task more challenging.
• To date, the recognition of multiple concurrent PD sources remains difficult because of the overlapping of discharges, and several attempted classification techniques have not yielded a reasonable degree of success. There is a need to investigate and understand the mechanism of these fault scenarios in order to recognize them effectively. However, several papers have reported the correct separation of simultaneous PD sources without using the ANN (e.g., as demonstrated by Albarracín et al. [66]). Applying these separation techniques before classification should significantly improve the recognition of PD sources by the ANN.
• Electrical and radiated noise is a serious challenge for PD classification research because, as shown in Figure 8, these disturbances can couple with the PD signals and completely alter their spectral content. Moreover, periodic pulsed noise, for example from thyristor operation, can hide the presence of PD. Uncertainty regarding the level and proportion of noise in PD patterns may lead to inaccurate classification. To this end, the influence of the types and levels of noise on PD source classification needs to be investigated and well understood. Recently, research carried out by Carvalho et al. [67] and Álvarez et al. [68] has shown that noise can be removed from PD signals by wavelet denoising, which in combination with ANN techniques results in improved classification.
In summary, if these weaknesses are adequately addressed in future work, it is possible to realize a robust and more reliable PD pattern recognition tool.
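The resolution issue raised in the list above can be explored by rebinning a ϕ-q-n count matrix to a coarser grid and recomputing the fingerprints at each resolution. The following sketch only performs the rebinning; the function name `rebin` and the integer block factors are assumptions for illustration and are not taken from [65].

```python
def rebin(matrix, row_factor, col_factor):
    """Sum blocks of a count matrix to obtain a coarser resolution.

    A 100 x 100 pattern with factors (2, 2) becomes a 50 x 50 pattern
    containing the same total number of discharges.
    """
    rows, cols = len(matrix), len(matrix[0])
    if rows % row_factor or cols % col_factor:
        raise ValueError("matrix size must be divisible by the factors")
    return [[sum(matrix[r * row_factor + i][c * col_factor + j]
                 for i in range(row_factor)
                 for j in range(col_factor))
             for c in range(cols // col_factor)]
            for r in range(rows // row_factor)]

# Example: halve the resolution of a 4 x 4 pattern in both directions.
coarse = rebin([[1, 0, 2, 0],
                [0, 1, 0, 2],
                [3, 0, 0, 0],
                [0, 3, 0, 1]], 2, 2)
```

Comparing the statistical operators obtained before and after rebinning gives a direct indication of how sensitive a trained network's inputs are to the instrument settings.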

Conclusions
This paper has reviewed the recent progress made on the application of ANNs for PD recognition and has proposed some suggestions for future improvement. Tremendous success has been achieved in discriminating a number of PD fault examples, with recognition rates reaching as high as 90%. However, different proportions and levels of noise in PD patterns hinder the recognition task. There is the issue of the long training time of the ANN using the conventional trial and error approach. There appears to be no reliable databank for PD faults as novel PD faults are still being investigated, making the recognition task challenging. The mechanisms of multiple defects are not yet understood, and such defects are not yet effectively classified by the ANN, although this has been achieved with other techniques. There is also the issue of PD pattern variation over different time periods and degradation levels, which has not been well addressed and established. PD data has generally been captured and recognized using fixed phase resolutions and amplitude bins, and different PD testing apparatus may have different resolution settings that may provide an incorrect recognition result. In order to improve PD recognition using the ANN, certain suggestions were proposed by the authors. These include investigating new optimization techniques for PD recognition using the ANN, using PD data captured over long stressing periods for training and testing the ANN, fully understanding the mechanism of multiple faults and using identical ϕ-q-n resolution sizes in training the ANN.
As a further contribution of this paper, the suitability of the ANN for practical PD recognition has been assessed, and benefits to the practitioners outlined. The ANN can give information regarding seriousness of the PD and the urgency of the need to rectify the fault.
maintenance cost plus reliability, by greatly reducing the time and increasing the accuracy of fault diagnosis.

Figure 1. A proposed online/offline partial discharge (PD) based condition monitoring (CM) using the artificial neural network (ANN).

Energies 2016, 9, 574 6 of 17
summation layers. Input parameters are fed into the network through the input layers. The exemplar layer is made up of Gaussian functions formed using a specified set of data points representing the centres [47]. The summation or class layer performs the summing operation of the outputs coming from the second layer for each class [47]. The decision layer then performs the voting, choosing the highest value [28]. Then, the related class label is obtained.
for trained patterns and approximately 90% for new patterns.Trained patterns represent patterns already used for training while new patterns are patterns not used for training.

Figure 8. Example of a signal formed by components of PD and noise.

Table 1. Summary of relevant ANNs that have been used for PD recognition.