High-Accuracy Power Quality Disturbance Classification Using the Adaptive ABC-PSO As Optimal Feature Selection Algorithm

Abstract: Power quality disturbance (PQD) is an important issue in electrical distribution systems; disturbances need to be detected and identified promptly to prevent the degradation of system reliability. This work proposes a PQD classification system that uses a novel feature selection algorithm called "adaptive ABC-PSO", which combines the artificial bee colony (ABC) and particle swarm optimization (PSO) algorithms and applies an adaptive technique to the combination. A discrete wavelet transform is used as the feature extraction method, and a probabilistic neural network is used as the classifier. We found that the highest classification accuracy (99.31%) could be achieved with nine optimally selected features out of all 72 extracted features. Moreover, the proposed PQD classification system demonstrated high performance in a noisy environment, as well as on a real distribution system. Compared with previous studies, the PQD classification accuracy obtained using adaptive ABC-PSO as the optimal feature selection algorithm is in the high range; therefore, the adaptive ABC-PSO algorithm can be used to classify PQDs in a practical electrical distribution system.


Introduction
The demand for electricity in Thailand has continually increased over the past several decades, due mainly to an expansion of industrial and commercial customers. Numerous load types have been installed to the grid according to the various customer types [1][2][3]. The power quality disturbance (PQD) is an important issue in electrical distribution systems, especially since this issue becomes more severe in a system having different load types [4]. PQD can significantly decrease the reliability and performance of the grid [5,6]. According to international standards, i.e., EN 50160 and IEEE-1159, PQD can occur in various forms, such as voltage sag, flicker, swell, harmonic distortion, momentary interruption, spike, notch, transient, or a combination of these forms [7][8][9]. The power quality signals are generally categorized into stationary and nonstationary signals, which can be characterized and analyzed either by time-domain or frequency domain analysis methods [10].
Feature extraction is a well-known and reliable procedure used to identify performance degradation in a distribution system caused by PQD; feature selection and feature classifiers are also useful [11]. Feature extraction is a signal processing method to estimate the signals' data based on its statistics [11]. So far, several signal processing tools have been used for feature extraction, depending on the type of signals [12]. The empirical mode decomposition (EMD) is one of the signal analysis methods for time-domain analysis; meanwhile, Fourier transform (FT) is extensively used in the frequency domain signal analysis [13]. The wavelet transform (WT) and S-transform (ST) are applicable for both time and frequency domain signal analysis [14][15][16], and their outstanding ability makes them a suitable signal analysis method for feature selection. FT is usually used for characterizing the spectrum and harmonics of the stationary signal, whereas it is inappropriate to identify the characteristics of nonstationary signals, due to its fixed window function [10,17]. The short-time Fourier transform (STFT) was used to improve the time-frequency window of the FT during the temporary, transient signal localization [18]. ST has been proposed to improve the localization of the STFT [19]; nevertheless, ST still has poor detection potentiality for nonstationary transient disturbances [20]. The Discrete Wavelet Transform (DWT) was developed from the STFT to improve the fixed resolution issue; this extraction method can provide a short window in high-frequency compositions while maintaining a long window at low-frequency components. In the literature, DWT has been widely used for signal detection and feature extraction of a transient disturbance, i.e., fault detection and identification of the transmission line, PQD classification, and islanding detection [21][22][23].
Since feature extraction is insufficient to classify the type of power quality signals accurately, an additional tool called a "classifier" is used after the feature extraction process to improve the accuracy of power quality signal identification [10]. Artificial intelligence (AI) is widely known as an effective tool used to identify the power quality signals and solve many problems in power systems research [24][25][26]. AI is an automatic system that can behave similarly to human thinking, and can engage in decision-making, learning, problem-solving, perception, reasoning, and classification [12,27]. In particular, several AI techniques, i.e., Expert Systems, Fuzzy Logic, Artificial neural networks, and Genetic Algorithms (GA), have been applied to PQD classification, showing that it could precisely classify the power quality signal type [28][29][30][31][32][33][34]. The neural network especially demonstrates higher performance at classifying the PQD than many other AI techniques [35][36][37].
In addition, the feature selection process has been implemented extensively to further improve the precision of PQD identification [38][39][40]. Feature selection is the process that eliminates redundant information from the extracted feature data, aiming to improve accuracy and reduce the computational time of the PQD classification. Several previous studies have demonstrated that PQD classification adopting feature selection can achieve high accuracy. In 2009, adaptive particle swarm optimization (PSO) was proposed as the optimal feature selection algorithm for PQD classification, based on ST feature extraction to identify nine PQD types; this classification system could provide 96.33% accuracy [37]. H. Erişti et al. presented the k-means with the Apriori algorithm to determine the optimal features of PQD classification using five PQD signals, and it was shown that an accuracy of 98.88% could be reached [36]. After that, a PQD classification system based on the PSO algorithm as the optimal feature selection algorithm with DWT feature extraction was introduced by R. Ahila et al. [41]. This system could provide a classification accuracy of 97.60% for ten PQD signals. A. A. Abdoos et al. showed that their PQD classification system using the Sequential Forward Selection (SFS) algorithm to find the optimal features could achieve up to 99.66% classification accuracy [13]. In 2018, the Fisher linear discriminant analysis (FDA) algorithm was used as the optimal feature selection method to classify 15 PQD types based on Variational Mode Decomposition (VMD) as feature extraction, indicating that an accuracy of 98.82% could be accomplished [39]. Recently, L. Fu et al. used Permutation Entropy with the Fisher Score algorithm to find the optimal features based on 13 PQD types [40]. It was found that this system has an accuracy of 97.6%.
From the previous studies, we noticed that the performance of PQD classification systems could be further improved. In this paper, a combination of the adaptive artificial bee colony (ABC) and the PSO algorithms is introduced and applied to identify 13 PQD signals. We propose a high-accuracy PQD classification using a novel algorithm called "adaptive ABC-PSO" as the feature selection algorithm. The inspiration for this feature selection algorithm was to use the merit of PSO to compensate for ABC's weakness; an adaptive technique was also applied to improve the searching ability of the combined algorithm. The feature extraction of the presented classification system was based on the DWT, whereas the probabilistic neural network (PNN) was used as the classifier.

Power Quality Disturbances
In this work, 13 types of power quality signals were used for PQD classification. Ten single types of PQD signals were the pure sine waveform, sag, swell, interruption, harmonics, impulsive transients, oscillatory transients, flicker, notch, and spikes, whereas the rest were multiple-type PQD signals, including sag with harmonics, swell with harmonics, and interruption with harmonics. The mathematical models for PQD signal generation were based on the IEEE-1159 standard, as shown in Table 1. All PQD waveforms were considered for 10 cycles with 2000 sampling points, signal amplitude of 1 p.u., electrical frequency of 50 Hz, and sampling frequency of 10 kHz. The signals were randomly generated for each test within their mathematical constraints.
Table 1. Mathematical equations of power quality disturbances (PQDs) [7].
[Table 1 body not recovered. Surviving labels: class C1; pure sine, sag, interruption, harmonic, impulsive transient, flicker, swell with harmonic, interruption with harmonic.]
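The signal generation described above can be sketched as follows. The sampling settings (10 kHz, 50 Hz, 2000 points, 1 p.u.) come from the text; the sag model and its parameter ranges (depth between 0.1 and 0.9, duration between one and nine cycles) are assumptions based on commonly used PQD models, because the body of Table 1 is not available here. The function names are hypothetical.

```python
import math
import random

FS = 10_000        # sampling frequency in Hz (from the text)
F0 = 50            # fundamental frequency in Hz (from the text)
N_SAMPLES = 2000   # 10 cycles at 50 Hz sampled at 10 kHz


def pure_sine(amplitude=1.0):
    """Undisturbed waveform with the given amplitude in p.u."""
    return [amplitude * math.sin(2 * math.pi * F0 * n / FS) for n in range(N_SAMPLES)]


def sag(alpha, t1, t2, amplitude=1.0):
    """Sine whose amplitude drops by the fraction alpha for t1 <= t < t2 (seconds)."""
    out = []
    for n in range(N_SAMPLES):
        t = n / FS
        depth = alpha if t1 <= t < t2 else 0.0
        out.append(amplitude * (1.0 - depth) * math.sin(2 * math.pi * F0 * t))
    return out


def random_sag(rng):
    """Draw sag parameters at random within the ASSUMED constraints."""
    period = 1.0 / F0
    alpha = rng.uniform(0.1, 0.9)                # assumed sag depth range
    duration = rng.uniform(period, 9 * period)   # assumed duration range
    t1 = rng.uniform(0.0, N_SAMPLES / FS - duration)
    return sag(alpha, t1, t1 + duration)


signal = random_sag(random.Random(0))
```

Passing a seeded `random.Random` instance keeps each generated test case reproducible while still drawing the disturbance parameters at random.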

Feature Extraction
Feature extraction reduces the raw data to a set of representative features. Figure 1 shows that the dataset of power quality signals is passed through the WT with multi-resolution analysis (MRA) to obtain the signals' wavelet coefficients. The statistical parameters of the signals are calculated from the obtained coefficients and used for feature vector construction. This section is organized as follows: Section 3.1 describes the wavelet transform. Then, the MRA and the feature vector construction are explained in Sections 3.2 and 3.3, respectively.

Wavelet Transform
In power quality research, the WT and ST are widely known as efficient signal processing techniques. The WT is a signal processing tool that can identify the global and local components of signals through the wavelet function. The advantages of WT are its resilient time-scale representation and good conservation of time and frequency information without resolution reduction compared to the STFT; however, the WT sometimes requires complicated setting parameters. The ST is extended from the continuous wavelet transform, which magnifies the referenced phase information in specific frequency bands with suitable expansion and contraction of a Gaussian window as the mother wavelet. Comparing the WT to the ST shows that the WT has a simpler implementation and requires less memory storage than the ST; therefore, the WT is more suitable for the PQD classification of numerous signal types [21][22][23][42]. In this work, we used the WT in PQD classification, since the number of signal types is large (13 types). The DWT is developed from the WT using a discrete set of wavelet scales. This method has been extensively used to detect the characteristics of signals, especially in the feature extraction of recent PQD classification research [21]. In this work, we used the DWT to identify the characteristics of signals in the feature extraction process. The expression for the DWT can be written as Equation (1).
where $a_0^m$ is the scaling parameter representing the recurrence and the length of the wavelet as a function of the indicated frequency localization, $m$. $b_0$ is the translation parameter that collects a moving position of the wavelet. The parameter $n$ indicates the time localization. $f(k)$ is the sequence of discrete points of the continuous-time signal. $\psi$ is called the mother wavelet. In this paper, the fourth-order Daubechies (db4) wavelet was used as the mother wavelet, since it was claimed in many studies that this choice is suitable for analysis of power system transient signals and power quality signals [23][43][44][45].
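For reference, a commonly used form of the DWT that is consistent with the symbols defined above is given below. It is offered as an assumption based on the standard definition, not as a verified copy of the authors' Equation (1).

```latex
\mathrm{DWT}(m,n) = \frac{1}{\sqrt{a_0^{m}}} \sum_{k} f(k)\,
  \psi\!\left( \frac{n - k\, b_0\, a_0^{m}}{a_0^{m}} \right)
```

With the usual dyadic choice $a_0 = 2$ and $b_0 = 1$ (also an assumption here), each increment of $m$ doubles the wavelet length and halves the analyzed frequency band.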

Multi-Resolution Analysis
The theory of the MRA algorithm was introduced by S. G. Mallat in 1989 [44]. This algorithm uses the scaling function and the orthogonal wavelet function to decompose and reconstruct signals at different resolution levels. The decomposition of a time-varying signal, $f(t)$, can be written in terms of the scaling function, $\phi_{m,n}$, and the wavelet function, $\psi_{m,n}$, as shown in Equations (2) and (3), respectively. Figure 2 shows that the time-varying signal is passed through a system consisting of low-pass (LP) and high-pass (HP) filters at each resolution level. After the original time-domain signal is passed through the HP filter and downsampled, the high-frequency components, i.e., the first detail coefficient signal ($D_1$), are obtained. Similarly, the low-frequency components, i.e., the first approximation coefficient signal ($A_1$), are obtained after LP filtering and downsampling of the original signal. After that, $A_1$ is decomposed to produce the second detail coefficient signal ($D_2$) and the second approximation coefficient signal ($A_2$). This procedure is repeated until the desired decomposition level is reached. In this work, the decomposition level was set to eight, since this value results in an appropriate frequency range at each decomposed level for characterizing the disturbances of the power quality signals of interest [18]. Then, the feature vector of the PQD signals, $F(k)$, can be written as Equation (4).
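The filter-bank procedure above can be illustrated with a short sketch. To stay dependency-free it uses the two-tap Haar filters as a stand-in for the db4 wavelet used in the paper, so the coefficient values differ from the authors' results; only the structure (filter, downsample by two, recurse on the approximation) is the point. The helper `detail_band` gives the nominal band of each detail level for the 10 kHz sampling rate. All function names are hypothetical.

```python
import math


def haar_step(signal):
    """One filter-bank level: returns (approximation, detail), each downsampled by 2."""
    pairs = len(signal) // 2          # a trailing odd sample is dropped
    s = math.sqrt(2.0)
    approx = [(signal[2 * i] + signal[2 * i + 1]) / s for i in range(pairs)]  # low-pass
    detail = [(signal[2 * i] - signal[2 * i + 1]) / s for i in range(pairs)]  # high-pass
    return approx, detail


def mra(signal, levels=8):
    """Return [D1, ..., D_levels, A_levels] by repeatedly splitting the approximation."""
    coeffs = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        coeffs.append(detail)
    coeffs.append(approx)
    return coeffs


def detail_band(level, fs=10_000):
    """Nominal (low, high) frequency band in Hz of detail level `level`."""
    return fs / 2 ** (level + 1), fs / 2 ** level


wave = [math.sin(2 * math.pi * 50 * n / 10_000) for n in range(2000)]
bands = mra(wave)   # eight detail signals followed by one approximation signal
```

With eight levels at 10 kHz, the nominal band of $D_1$ is 2.5 to 5 kHz and each further level halves it, which is why a fixed decomposition depth determines which disturbances each level can resolve.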

Feature Vector Construction
When the PQD signals are passed through the MRA process, nine coefficient signals are obtained, and their characteristics are described by statistical parameters. In this work, eight statistical features were used: the energy ($E$), entropy ($Ent$), standard deviation ($\sigma$), mean value ($\mu$), kurtosis ($KT$), skewness ($SK$), root-mean-square ($RMS$), and range ($RG$). The statistical equation for each feature is given in Table 2, where $i, j = 1, 2, 3, \ldots, l$ are the number of wavelet decompositions in level $l$, and $N$ represents the number of coefficients in each decomposed data. $X$ is the signal obtained from eight detail signals ($D_1, D_2, \ldots, D_8$) and one approximate signal ($A_8$). The total number of features is therefore 72 (eight statistical parameters for each of the nine coefficient signals). The feature selection algorithm selects the best nine statistical parameters that can provide the highest classification accuracy. This quantity of selected features corresponds to the number of obtained signal coefficients. Then, the feature vector of each statistical parameter was created, as shown in Table 3. Since the feature vector's magnitude might be obtained in highly different scales, the feature vectors were normalized before entering the classifier, as expressed in Equation (5).
where $Z_i$ is the normalized data, $F_i$ is the feature vector, and $F_{\min}$ and $F_{\max}$ are the minimum and maximum values of the feature vectors, respectively. Then, the overall feature set is given as Equation (6).
Table 3. Feature vectors created from statistical parameters.
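A minimal sketch of the feature construction follows. The exact formulas of Table 2 and Equation (5) are not available here, so textbook definitions are assumed: population moments for kurtosis and skewness, a Shannon-type entropy of the squared coefficients, and min-max scaling to [0, 1] for the normalization. The feature order follows the list in the text; the function names are hypothetical.

```python
import math


def stats(x):
    """Eight statistics of one coefficient signal (textbook forms, ASSUMED)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    std = math.sqrt(var)
    energy = sum(v * v for v in x)
    # Shannon-type entropy of squared coefficients; zero terms contribute nothing
    entropy = -sum(p * math.log(p) for p in (v * v for v in x) if p > 0.0)
    # Standardized moments are undefined for constant signals, so return 0 there
    kurt = sum((v - mean) ** 4 for v in x) / (n * var ** 2) if var > 0 else 0.0
    skew = sum((v - mean) ** 3 for v in x) / (n * std ** 3) if var > 0 else 0.0
    rms = math.sqrt(energy / n)
    rng = max(x) - min(x)
    return [energy, entropy, std, mean, kurt, skew, rms, rng]


def feature_vector(coeff_signals):
    """Concatenate the eight statistics of every coefficient signal."""
    out = []
    for c in coeff_signals:
        out.extend(stats(c))
    return out


def min_max(column):
    """Scale one feature across all signals to [0, 1] (ASSUMED form of the normalization)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]


demo = [[1.0, 2.0, 3.0, 4.0]] * 9      # stand-in for D1..D8 and A8
features = feature_vector(demo)        # 9 signals x 8 statistics
```

Normalization is applied per feature across the dataset, so that large-magnitude features such as energy do not dominate the distance computed by the classifier.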

Probabilistic Neural Network for PQD Classification as a Classifier
To classify the type of power quality signals, the optimally selected features from the feature extraction described in Section 3 were passed through the classifier based on the PNN. Then, the dataset was trained and used to classify the types of power quality signals, as shown in Figure 1. The PNN is one type of neural network that has been widely used as the classifier in PQD classification. It operates the neural network through the probabilistic model based on the Bayesian classifier for a pattern recognition system [46]. The main advantage of PNN is that its linear learning algorithms are capable of reaching similar results to the nonlinear learning algorithms, while conserving high accuracy. The simple implementation of PNN is also a widely known merit, since the number of hidden layers and weights of the network are automatically defined by the network itself through the spread constant. In the literature, it has been widely shown that PNN is a highly capable tool for solving several types of classification problems [23][47][48][49]. Figure 3 shows that the PNN system comprises three layers: the input layer, hidden layer, and output layer. It is a radial network based on the Bayes strategy and the Parzen window. The initial weight value of each node is automatically initialized. The input data set is classified according to the distribution value or probability density function, $f_k(X)$, as given in Equation (7).
where $X$ represents the input vector, $\sigma$ is the standard deviation value, which is generally known as the spread constant or smoothing parameter, $N$ is the number of clusters, $j$ is the number of output layers, $k$ is the amount of training data, and the $\|X - X_{kj}\|$ term represents the Euclidean distance between $X$ and $X_{kj}$. The probability density function is utilized to obtain the output vector of the hidden layer of PNN, $H_h$, as written in Equation (8).
where $W_{ih}^{xh}$ represents the connection weight between the input layers and the hidden layer, $i$ is the number of input layers, and $h$ is the number of hidden layers. Each pattern unit contributing to a signal has the same probability as its participated category unit. The test point is created by a Gaussian centered on the associated training point. The summation of these local estimations, computed at the corresponding category unit, provides the network output as $net_j = \max_k(net_k)$, given in Equation (9).
where $W_{hj}^{hy}$ is the connection weight between the output layers $Y$ and the hidden layers. The $\max_k(net_k)$ operation provides the desired category of the test point.
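The decision rule described above (a Gaussian kernel centered on every training pattern, a per-class sum, then the maximum) can be sketched as follows. This is the textbook Parzen-window form of a PNN under the assumption of a single shared spread constant; it is not a transcription of Equations (7) to (9), and the names `pnn_classify` and `training` are hypothetical.

```python
import math


def pnn_classify(x, training, spread):
    """Classify x with a Parzen-window PNN.

    training: dict mapping class label -> list of feature vectors
    spread:   smoothing parameter (sigma) shared by all kernels
    Returns (winning label, dict of per-class density estimates).
    """
    scores = {}
    for label, patterns in training.items():
        total = 0.0
        for p in patterns:
            # Squared Euclidean distance between the test point and one pattern unit
            dist2 = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-dist2 / (2.0 * spread ** 2))
        # Category unit: average of the local Gaussian estimates for this class
        scores[label] = total / len(patterns)
    # Output unit: the class with the largest estimated density wins
    return max(scores, key=scores.get), scores


training = {
    "sag": [[0.10, 0.20], [0.15, 0.25]],
    "swell": [[0.90, 0.80], [0.85, 0.90]],
}
label, densities = pnn_classify([0.12, 0.22], training, spread=0.1)
```

There is no iterative weight training here: storing the training patterns is the whole "training" step, which is why the spread constant is the only parameter that needs tuning.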

Proposed Adaptive ABC-PSO Algorithm as Optimal Feature Selection
The literature shows that the feature selection process is an important procedure in PQD classification, since this process can significantly impact the classification system's performance indicators, i.e., the quantity of feature data obtained from feature extraction, processing time, and accuracy [38][39][40]. Figure 1 shows that feature selection was performed to select the optimal features of the extracted feature data before they were passed through the classifier. We expected that the classification system could provide higher accuracy with a much shorter computational time based on the optimally selected data. We aimed to improve PQD classification performance by using a combination of adaptive ABC and PSO algorithms as a feature selection algorithm. This combination was motivated by the fact that the ABC algorithm can provide a high-quality solution, but its convergence rate is low; therefore, the fast convergence rate of the PSO algorithm might compensate for the drawbacks of the ABC algorithm. An adaptive technique was also applied to the ABC-PSO combination to improve the search ability of the ABC. Details of the proposed algorithm are described in the following sub-sections.

Artificial Bee Colony
The ABC is one of the swarm intelligent algorithms based on an emulation of honey bees seeking nectar [50,51]. This algorithm's outstanding merits are a high-quality solution, good local search, and an updated solution mechanism. The type of bee grouping of the ABC algorithm is divided into three groups: The employed bee, onlooker bee, and scout bee. Each bee type typically finds the nectar source and exchanges the information with others until the best nectar source is found. The ABC algorithm procedure starts from the employed bee randomly searching the position of the best nectar and then memorizing the obtained position, as written in Equation (10). In this work, it is noted that the feature vector represents the position of the nectar.
where $x_{ij}$ is the feature vector, $v_{ij}$ is an updated vector according to the memorized feature vector, $i$ is the iteration number, $j$ is the feature dimension, $k$ is a random feature at each iteration, and $\phi$ is a random value between −1 and 1. The feature vector is updated by the onlooker bee, based on the result previously memorized by the employed bee. If the result obtained by the onlooker bee is higher than that of the observation bee, the existing result will be replaced by the updated feature data position. Then, all feature vectors collected by the employed bee are analyzed by the onlooker bee. The quality of the solution, called the fitness value, $fit(\theta_i)$, is calculated from the fitness function, $f_i$, of the classifier, as given in Equation (11).
The probability of the highest solution related to the PQD classification accuracy, $p_i$, can be calculated using Equation (12).
where $\theta_i$ is the accuracy of the feature vector and $S$ is the number of food sources, which is equal to the number of employed bees. Then, the probability of the fitness having the best solution is selected. An updated food position is calculated using Equation (13).
where $v$ is an updated feature vector, $k$ is a random value, and $\varphi$ is a random number between 0 and 1, with the neighbor food sources located around $z$. The scout bee randomly searches the solution within a limited boundary, which is defined as "Limit". If the fitness value (classification accuracy) of a food source cannot be improved within the maximum cycle number, the source is abandoned, and its employed bee turns into a scout bee. The expression for finding the new solution of the scout bee's feature vector is given as Equation (14).
where $z_i^j$ is the abandoned feature vector, $j$ is the feature dimension, and $i$ is the iteration number.
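The three bee phases can be sketched with the standard ABC update rules. These are the textbook forms and are assumed, not confirmed, to match Equations (10) to (14); how a continuous position is mapped to discrete feature indices is not specified in the text and is left out. All function names are hypothetical.

```python
import random


def neighbour(foods, i, rng):
    """Employed-bee move: perturb one random dimension j of source i
    relative to a randomly chosen other source k."""
    x = foods[i]
    j = rng.randrange(len(x))
    k = rng.choice([s for s in range(len(foods)) if s != i])
    phi = rng.uniform(-1.0, 1.0)
    v = list(x)
    v[j] = x[j] + phi * (x[j] - foods[k][j])
    return v


def selection_probabilities(fitness):
    """Onlooker-bee probabilities: each source's share of the total fitness."""
    total = sum(fitness)
    return [f / total for f in fitness]


def roulette(probabilities, rng):
    """Pick a source index with probability proportional to its fitness share."""
    r = rng.random()
    cumulative = 0.0
    for idx, p in enumerate(probabilities):
        cumulative += p
        if r < cumulative:
            return idx
    return len(probabilities) - 1   # guard against rounding at the upper end


def scout(lower, upper, rng):
    """Scout-bee move: replace an abandoned source with a random point in bounds."""
    return [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]


rng = random.Random(1)
foods = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
candidate = neighbour(foods, 0, rng)
probs = selection_probabilities([0.90, 0.95, 0.99])   # e.g. classification accuracies
chosen = roulette(probs, rng)
fresh = scout([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], rng)
```

Because the employed-bee move changes only one dimension at a time, the search is fine-grained but slow, which is the convergence weakness that the combination with PSO is meant to offset.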

Particle Swarm Optimization
The PSO algorithm is a kind of swarm intelligence technique that emulates a swarm's foraging behavior, for example, birds, fish, ants, or bees [52]. The advantages of this algorithm are its simple implementation, good global optima, fast convergence, and short computational time [53]. The mechanism of the PSO algorithm is that the particles randomly move to find the best solution, and then the previous and the updated best positions throughout the maximum iteration number are recorded. The velocity of particles is updated using Equation (15), whereas the updated position of particles is expressed in Equation (16).
where $j$ is the feature dimension, $V$ is the speed of the feature dimension, $X$ is the position of the feature dimension, $i$ is the iteration number, $w$ is the inertia weight, $c_1$ and $c_2$ are the acceleration constants, $r_1$ and $r_2$ are random numbers between 0 and 1, $P$ is the best feature dimension position identified during past iterations, and $P_g$ is the best global feature dimension position obtained from overall iterations.
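The velocity and position updates can be sketched using the standard PSO rule, which matches the symbols defined above; it is assumed, not confirmed, to be the form of Equations (15) and (16). The function name `pso_step` and the numeric settings in the example are hypothetical.

```python
import random


def pso_step(x, v, p_best, g_best, w, c1, c2, rng):
    """One PSO update for a single particle.

    x, v    : current position and velocity (lists of equal length)
    p_best  : best position this particle has visited
    g_best  : best position found by the whole swarm
    w       : inertia weight; c1, c2: acceleration constants
    Returns (new position, new velocity).
    """
    new_x, new_v = [], []
    for j in range(len(x)):
        r1, r2 = rng.random(), rng.random()   # fresh random factors per dimension
        vj = (w * v[j]
              + c1 * r1 * (p_best[j] - x[j])   # pull toward the particle's own best
              + c2 * r2 * (g_best[j] - x[j]))  # pull toward the swarm's best
        new_v.append(vj)
        new_x.append(x[j] + vj)
    return new_x, new_v


rng = random.Random(0)
pos, vel = pso_step([0.2, 0.8], [0.0, 0.0], [0.3, 0.7], [0.5, 0.5],
                    w=0.7, c1=2.0, c2=2.0, rng=rng)
```

Every dimension moves on every step, so the swarm closes in on good regions quickly; this is the fast convergence that the combined algorithm borrows from PSO.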

The Proposed Adaptive ABC-PSO Algorithm
As previously mentioned, we use the distinctive property of the PSO algorithm to compensate for the convergence weakness of the ABC algorithm (described in Section 5.1), aiming to improve the performance of PQD classification. An adaptive technique was also applied to the ABC algorithm to improve the searching resolution of the overall algorithm. The procedure to implement the adaptive ABC-PSO algorithm as the feature selection algorithm of PQD classification is shown in the pseudocode of Figure 4 and is detailed as follows.
Step 1: Initialize the control parameters of PSO and ABC algorithms, including the initial speed, position, and number of particles of the PSO algorithm.
Step 2: Determine the solution of the PSO algorithm. All feature vectors are compared in order to select the one having the best solution (highest classification accuracy). After that, calculate the fitness value of the selected optimal feature vector. In this work, the accuracy of identifying each type of PQD is defined as a ratio of the numbers of correctly classified signal types to the numbers of test signals. The overall accuracy (classification accuracy) is defined as the average accuracy of all signal types.
Step 3: The highest classification accuracy obtained from the PSO algorithm is adopted as the initial solution of the ABC algorithm. Then, obtain a new vector feature by using the adaptive ABC, as in the following details. In this work, we enhanced the exploration of the ABC-PSO algorithm by enlarging the employed bee's search space, while finding the new position of best nectar. This process aims to achieve a better solution that could be located near the previously obtained solution; therefore, the improved random value of the proposed self-adaptive bee colony technique is redefined based on the random value of the previous iteration, as given in Equation (17).
where $\phi_{ij}^{k}$ is a random value obtained from the previous iteration, $w_{\max}$ and $w_{\min}$ are the initial and the final weights, respectively, $C$ is the current iteration of the adaptive process, $C_{\max}$ is the number of iterations of the adaptive process, $j$ is the feature dimension, $i$ is the iteration number of the employed bee phase, and $k$ is the iteration of the updated random value. Then, one randomly selected element of the feature vector is replaced by the improved random value to generate the updated vector feature, as written in Equation (18).
Next, calculate the fitness value of the classification accuracy of each feature vector. If the result obtained by the adaptive ABC algorithm is better than that of the PSO algorithm, the PSO algorithm's optimal feature is replaced by that of the adaptive ABC algorithm.
Step 4: The onlooker bee selects a feature vector according to the quantity of the optimal feature following the probability function, shown in Equation (12).
Step 5: The scout bee randomly generates the feature vector of the next iteration using Equation (14) and then evaluates the optimal feature.
Step 6: The optimal feature vector is selected, while the fitness value is collected and used as the input data of the feature selection.
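The accuracy definition given in Step 2 (per-type accuracy as the ratio of correctly classified signals to test signals, and overall accuracy as the average over all signal types) can be sketched as follows. The function names and the example labels are hypothetical.

```python
def per_class_accuracy(true_labels, predicted_labels):
    """Accuracy of each signal type: correctly classified / number of test signals."""
    counts, correct = {}, {}
    for t, p in zip(true_labels, predicted_labels):
        counts[t] = counts.get(t, 0) + 1
        correct[t] = correct.get(t, 0) + (1 if t == p else 0)
    return {c: correct[c] / counts[c] for c in counts}


def overall_accuracy(true_labels, predicted_labels):
    """Overall accuracy: unweighted average of the per-type accuracies."""
    acc = per_class_accuracy(true_labels, predicted_labels)
    return sum(acc.values()) / len(acc)


truth = ["sag", "sag", "swell", "swell", "swell", "swell"]
guess = ["sag", "swell", "swell", "swell", "swell", "swell"]
by_type = per_class_accuracy(truth, guess)
fitness = overall_accuracy(truth, guess)
```

Averaging per type, instead of pooling all test signals, gives every disturbance type the same weight even when the number of test signals differs between types. In the example above, five of six signals are correct, yet the value is 0.75 because the sag class is only half right.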

The accuracy of the PQD classification with and without the proposed adaptive ABC-PSO algorithm as optimal feature selection, based on 100 tests, is shown in Table 6.
Table 6 shows that the classification accuracy based on a single statistical parameter is about 87.38–96.84%, which is low relative to other existing techniques. Thus, multiple statistical parameters are typically required in PQD classification to improve accuracy. The results also show that using three and five statistical parameters provides an overall accuracy of 97.54% and 98.15%, respectively, while using all statistical parameters yields 98.54%. Comparing the overall accuracy achieved by the PNN classifier without feature selection against the existing literature, we found that our classification accuracy of 98.54% is higher than that reported in Reference [11] (96.11%), Reference [36] (94.4%), Reference [40] (94.4%), Reference [49] (98.42%), Reference [51] (97.6%), Reference [52] (98%), and Reference [53] (98.08%). An overall accuracy of up to 99.31% was achieved when the nine optimally selected features were used. In this case, it is also unnecessary to use all features in the classification process, which significantly reduces computational time; therefore, the proposed adaptive ABC-PSO as the optimal feature selection algorithm delivers high performance in PQD classification.
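The comparison between "all features" and "selected features" can be sketched as the evaluation of a binary feature mask with a PNN. The Python code below is a minimal illustration, not the authors' MATLAB implementation: the PNN is written as a plain Gaussian Parzen-window classifier, the data are synthetic, and the nine informative columns, the class count, and the spread value are hypothetical. A function like `subset_accuracy` is the kind of fitness that a feature selection search would maximize.

```python
import numpy as np

def pnn_predict(X_train, y_train, X_test, spread):
    """PNN as a Parzen-window classifier: one Gaussian kernel per training
    pattern, averaged per class, followed by an argmax over classes."""
    classes = np.unique(y_train)
    # Squared Euclidean distances, shape (n_test, n_train).
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Subtracting the row minimum rescales each row by a constant, which
    # avoids underflow without changing the winning class.
    d2 = d2 - d2.min(axis=1, keepdims=True)
    k = np.exp(-d2 / (2.0 * spread ** 2))
    scores = np.stack([k[:, y_train == c].mean(axis=1) for c in classes], axis=1)
    return classes[np.argmax(scores, axis=1)]

def subset_accuracy(mask, X_train, y_train, X_test, y_test, spread):
    """Classification accuracy when only the features flagged in mask are used."""
    cols = np.flatnonzero(mask)
    pred = pnn_predict(X_train[:, cols], y_train, X_test[:, cols], spread)
    return float(np.mean(pred == y_test))

# Synthetic data (assumption): 72 features, 9 of which carry class information.
rng = np.random.default_rng(0)
n_per_class, n_features, n_classes = 40, 72, 3
informative = np.arange(9)                      # hypothetical informative columns
y = np.repeat(np.arange(n_classes), n_per_class)
X = rng.normal(0.0, 5.0, size=(y.size, n_features))
X[:, informative] = rng.normal(0.0, 1.0, size=(y.size, 9)) + 5.0 * y[:, None]

idx = rng.permutation(y.size)
train, test = idx[:90], idx[90:]

mask_all = np.ones(n_features, dtype=bool)
mask_sel = np.zeros(n_features, dtype=bool)
mask_sel[informative] = True

acc_all = subset_accuracy(mask_all, X[train], y[train], X[test], y[test], spread=1.0)
acc_sel = subset_accuracy(mask_sel, X[train], y[train], X[test], y[test], spread=1.0)
print(f"all 72 features: {acc_all:.3f}, 9 selected features: {acc_sel:.3f}")
```

Because the mask also shrinks the distance computation from 72 to 9 columns, the sketch reflects the reduction in computational cost noted above.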

Classification Accuracy under a Noisy Environment
Since noise is typical in a practical electrical distribution system, the performance of the presented PQD classification system under a noisy environment was also tested. White noise with signal-to-noise ratios of 10, 20, 30, 40, and 50 dB was added uniformly to the power quality signals. Table 7 shows that the classification system using the optimal feature selection algorithm is more robust to noise than the systems without optimal feature selection and those using ABC or PSO; therefore, the proposed PQD classification system could operate well in a noisy environment.
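The noise test can be reproduced in outline by scaling white noise to a target signal-to-noise ratio, using SNR(dB) = 10 log10(P_signal / P_noise). The Python sketch below assumes zero-mean Gaussian white noise and a hypothetical 50 Hz test waveform sampled at 3.2 kHz; the source states only that white noise at the listed SNR levels was applied.

```python
import numpy as np

def add_white_noise(signal, snr_db, rng):
    """Add zero-mean Gaussian white noise so that the result has the given SNR in dB."""
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

rng = np.random.default_rng(1)
t = np.arange(6400) / 3200.0                 # hypothetical 2 s record at 3.2 kHz
clean = np.sin(2 * np.pi * 50 * t)

measured = {}
for snr_db in (10, 20, 30, 40, 50):          # SNR levels used in the noise test
    noisy = add_white_noise(clean, snr_db, rng)
    residual_power = np.mean((noisy - clean) ** 2)
    measured[snr_db] = 10.0 * np.log10(np.mean(clean ** 2) / residual_power)
    print(f"target {snr_db} dB, measured {measured[snr_db]:.2f} dB")
```

The measured value fluctuates slightly around the target because the noise power is estimated from a finite number of samples.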

Convergence Rate
As mentioned above, we aimed to exploit the fast convergence of the PSO to compensate for the slow convergence of the ABC. Figure 5 compares the convergence rates of the PSO, ABC, and adaptive ABC-PSO algorithms. In this work, the convergence rate is determined by the iteration at which the best solution is reached. Note that the number of iterations must be sufficient to cover the steady state of the solutions. As expected, the PSO algorithm converges fastest, reaching its best solution at iteration 84, whereas the ABC algorithm converges slowest (iteration 1556), as shown in Table 8. Remarkably, the adaptive ABC-PSO reaches the highest overall accuracy at iteration 728, showing that the strength of the PSO can substantially offset the weakness of the ABC. Nevertheless, the adaptive ABC-PSO still converges more slowly than the PSO algorithm because it contains the ABC algorithm. Although we attempted to improve the convergence rate of the proposed algorithm, overall accuracy has the highest priority in PQD classification; on that basis, the adaptive ABC-PSO algorithm shows the highest performance in this case. Hence, feature selection based on the adaptive ABC-PSO algorithm not only provides high classification accuracy but also finds the best solution in less time than the ABC algorithm alone.

Table 7. Classification accuracy under a noisy environment.
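The convergence criterion described above, the iteration at which the best solution is first reached, can be expressed as a short Python helper. This is a sketch under assumptions: the function names, the optional tolerance, and the six-iteration fitness trace are hypothetical and do not reproduce the data behind Figure 5 or Table 8.

```python
from itertools import accumulate

def best_so_far(raw_fitness):
    """Running maximum of the per-iteration fitness (accuracy) values."""
    return list(accumulate(raw_fitness, max))

def convergence_iteration(history, tol=0.0):
    """1-based iteration at which the best-so-far curve first comes within
    tol of its final value. Assumes history is non-decreasing."""
    if not history:
        raise ValueError("history must not be empty")
    final_best = history[-1]
    for iteration, value in enumerate(history, start=1):
        if value >= final_best - tol:
            return iteration

# Hypothetical accuracy trace over six iterations.
raw = [0.90, 0.93, 0.91, 0.97, 0.95, 0.97]
history = best_so_far(raw)
print(history)                         # [0.9, 0.93, 0.93, 0.97, 0.97, 0.97]
print(convergence_iteration(history))  # 4
```

As the text notes, the run must be long enough to cover the steady state; otherwise the final value of the trace is not the true best solution and the reported iteration is too early.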
Table 9 shows the classification accuracy based on real datasets from distribution networks in China, Singapore, South Africa, and Sweden. The datasets were obtained from PQube equipment, an instrument for power quality monitoring and real-time recording of electrical signal phenomena [54]. The classification accuracy was based on the optimally selected features. The classification system using the proposed adaptive ABC-PSO algorithm for feature selection achieves a classification accuracy of 99.2%, which is consistent with the accuracy shown in Table 6; therefore, the proposed algorithm not only classifies the standard PQD waveforms with high accuracy, but can also accurately classify the PQDs in a real distribution system.

Performance Comparison to the Existing PQD Classification Systems
The PQD classification performance using the adaptive ABC-PSO algorithm as optimal feature selection was compared to other existing methods, as shown in Table 10. The overall accuracy of each method was taken from the cited references. The adaptive ABC-PSO algorithm provides a high classification accuracy of 99.31% for identifying 13 types of power quality signals. This accuracy is higher than that of most existing methods and falls in the very high precision range. Although the proposed algorithm has a slightly lower accuracy than that reported in Reference [52] for the same PQD signal types, its convergence rate could be much better, because Reference [52] uses a GA as the selection algorithm; therefore, the proposed optimal feature selection algorithm is suitable for practical PQD classification.

Conclusions
In this article, a high-accuracy PQD classification system, based on the adaptive ABC-PSO algorithm as optimal feature selection, was introduced to classify 13 types of power quality signals. Feature extraction was based on the DWT, and a PNN served as the classifier. Nine optimal features were selected by the optimal feature selection algorithm. The results show that the classification accuracy obtained with the optimal features was higher than that obtained with other numerical features. The highest classification accuracy of 99.31% was achieved by using the optimally selected features. The proposed PQD classification system also showed high performance under a noisy environment. In addition, the classification accuracy on the real dataset of a distribution network was consistent with that on the standard waveforms. Compared with previous studies, the PQD classification accuracy using the adaptive ABC-PSO as the optimal feature selection algorithm falls in the high precision range; therefore, the adaptive ABC-PSO algorithm is suitable for implementing PQD classification in electrical distribution systems.