Power Quality Disturbances Classification via Fully-Convolutional Siamese Network and k-Nearest Neighbor

Abstract: The classification of disturbance signals is of great significance for improving power quality. Existing methods for power quality disturbance classification require a large number of samples to train the model, and their accuracy is limited in small-sample learning. In this paper, a hybrid algorithm of k-nearest neighbor and fully-convolutional Siamese network is proposed to classify power quality disturbances by learning from small samples. Multiple convolutional layers and full connection layers are used to construct the Siamese network, and the output of the Siamese network is used to judge the category of the signal. The simulation results show that, for small sample sizes, the accuracy of the proposed approach is significantly higher than that of the existing methods. In addition, it has a strong anti-noise ability.


Introduction
With the development of smart grids, the large-scale integration of advanced power electronic devices, renewable energies, and electric vehicles has brought new challenges to the operation of distribution networks. For example, the uncertainty of photovoltaic output power easily causes a voltage drop in the distribution network [1]. Similarly, the fluctuation and intermittence of the power produced by wind farms may cause the voltages of the distribution network to exceed their limits, since wind power depends on the wind speed, which varies over time [2]. Unstable power quality seriously affects industrial production and residential electricity consumption, and can even threaten the safe and stable operation of the power system. How to improve power quality has become a common concern of power companies and power users. The premise of improving power quality is to quickly identify the category of disturbance in massive, high-dimensional power quality data.
Power quality disturbance classification mainly consists of feature extraction and pattern recognition. The quality of the features has a major impact on the accuracy of the classification. The traditional methods for feature extraction mainly include the wavelet transform, fast Fourier transform, S-transform, and Hilbert-Huang transform. The wavelet transform can be used to analyze the time-frequency characteristics of signals and is suitable for the analysis of non-stationary signals with sudden changes. However, it is sensitive to noise, and the choice of basis function depends on expert experience [3,4]. The fast Fourier transform has the advantage of low computational cost, which makes it widely used in the field of signal processing. However, it is only suitable for the analysis of stationary signals and cannot deal with power quality disturbance signals with non-stationary characteristics such as transients and sudden changes [5]. As an extension of the wavelet transform and the short-time Fourier transform, the S-transform uses a window function whose height and width vary with frequency, which overcomes the shortcoming of the short-time Fourier transform, whose window function has a fixed height and width. Nevertheless, the S-transform is insensitive to singularities in signals, and its computational complexity is high [6,7]. The Hilbert-Huang transform is good at dealing with nonlinear and non-stationary power quality disturbance signals, but it suffers from the endpoint effect and modal aliasing [8]. Generally speaking, these traditional methods rely on experience to select features. There is no unified theoretical basis for extracting features, and the resulting features are not universal. For different data sets, the quality of classification is difficult to guarantee.
Pattern recognition uses the extracted features to determine the category of power quality disturbance data. Common methods include the support vector machine (SVM), Bayesian classification, the k-nearest neighbor method, case-based reasoning, and the multi-layer perceptron (MLP). SVM is suitable for binary classification. For n categories of power quality disturbances, n SVMs need to be trained, and each SVM needs to use the whole training set. Therefore, the training speed of SVM decreases sharply as the training set grows, which makes it difficult to process large data sets [9,10]. Bayesian classification is sensitive to the form of the input data, and the prior probability depends on the hypothesis, so an inaccurate prior model may lead to poor classification results [11]. The k-nearest neighbor algorithm is simple to implement. However, when the number of training samples and the dimension of the feature vectors are large, the complexity of the algorithm is very high. In addition, the value of k, which has a great impact on the results, needs to be set manually [12]. Case-based reasoning requires a large amount of historical data, and all kinds of disturbance scenarios should be included in the database [13]. The multi-layer perceptron has a powerful non-linear mapping ability and can theoretically fit arbitrary continuous functions. However, it is prone to over-fitting [14]. In general, due to the high dimensionality of power quality disturbance data, the accuracy of the traditional methods for pattern recognition is low, and it is difficult to meet the actual demand.
In recent years, deep learning has become one of the most popular research fields of artificial intelligence, and it has made great achievements in power system applications such as load curve modeling, photovoltaic power prediction, and fault diagnosis. Specifically, many scholars have tried to apply deep neural networks to improve the accuracy of power quality disturbance classification. For example, a stacked sparse auto-encoder is proposed in [15] to automatically extract the features of power quality disturbance data. The simulation results show that the stacked sparse auto-encoder can effectively learn the natural characteristics of the data by reducing dimensionality. To improve the accuracy of classification, a novel framework consisting of convolutional neural networks and other classifiers is presented in [16,17]. In [18], the S-transform is used to obtain specific features, and then the categories of power quality disturbances are determined by a probabilistic neural network. Although the accuracy of these deep neural networks is high, the training process requires a large number of samples, which are difficult to obtain in some distribution networks. The Siamese network is a typical neural network for few-shot learning. It can automatically determine the categories of samples by calculating similarity, which makes it very suitable for classification using a few samples from the training set. At present, effective applications of the Siamese network focus on data classification tasks such as signature recognition [19,20], disease diagnosis [21] and object tracking [22]. To the best of our knowledge, there is no report on the use of the Siamese network to classify power quality disturbances.
The above analysis can be summarized as follows: although traditional methods (e.g., SVM) are suitable for small-sample learning, their accuracy is low. Some deep neural networks such as convolutional neural networks (CNN) have relatively high accuracy, but they need a large number of samples to train the model. How to design a method that achieves high accuracy by learning from small samples deserves further study.
In order to address these issues, a hybrid algorithm of k-nearest neighbor and fully-convolutional Siamese network is proposed to classify power quality disturbances. The key contributions of this paper mainly include:
1. This is the first exploration of the application of the hybrid algorithm of k-nearest neighbor and fully-convolutional Siamese network to power quality disturbances. The proposed approach only requires a few samples to train the model, and its accuracy is higher than that of the traditional methods.
2. The conventional Siamese network is composed of multiple full connection layers, which leads to its low accuracy. By applying multiple convolutional layers, the Siamese network can automatically extract the intrinsic attributes of power quality disturbances to improve accuracy.
3. Unlike most deep neural networks (e.g., CNN and MLP) that train a classifier (e.g., SoftMax) through samples, the Siamese network judges the categories by calculating the distance between two feature vectors, which provides a new idea for power quality disturbance classification.
The rest of this paper is organized as follows. Section 2 introduces the generation of data sets. Section 3 explains the principle of the Siamese network and its application in power quality disturbance classification. Section 4 tests the performance of the proposed approach through simulation. Section 5 summarizes the work and results of this paper.

Data Set Generation
Most of the existing literature obtains data sets through simulation, since it is difficult to obtain actual power quality disturbance data. Various power quality disturbances are defined in IEEE Standard 1159 [23]. This paper considers seven classical power quality disturbance signals: swell, sag, harmonic, flicker, interruption, spike and oscillatory transient. Their mathematical formulas are shown in Table 1, where T is 0.02 and α is a random number within the thresholds. These power quality disturbance signals are visualized in Figure 1.

[Table 1. Columns: Symbol, Type of Disturbance, Equations, Parameters. Surviving row labels: Harmonic, Oscillatory transient.]
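As an illustration of how such data can be synthesized, the following Python sketch generates three of the seven disturbances (sag, swell, and interruption) on a 50 Hz sinusoid. The model forms, the magnitude ranges, and the window limits are assumptions chosen for illustration and are not the entries of Table 1. The sampling frequency (3916 Hz) and the sample length (784 points) follow the experimental setup of this paper, T = 0.02 is interpreted as seconds, and the function name make_signal is hypothetical.

```python
import numpy as np

FS = 3916        # sampling frequency in Hz (from the experimental setup)
T = 0.02         # fundamental period, interpreted here as seconds (50 Hz)
N_POINTS = 784   # points per sample, about 10 cycles


def make_signal(kind, rng):
    """Return one synthetic disturbance signal (illustrative models only)."""
    t = np.arange(N_POINTS) / FS
    base = np.sin(2 * np.pi * t / T)
    # Disturbance window [t1, t2]; these ranges are assumptions.
    t1 = rng.uniform(1 * T, 4 * T)
    t2 = t1 + rng.uniform(1 * T, 5 * T)
    window = ((t >= t1) & (t <= t2)).astype(float)
    alpha = rng.uniform(0.1, 0.8)  # assumed magnitude range for sag/swell
    if kind == "sag":
        return (1 - alpha * window) * base
    if kind == "swell":
        return (1 + alpha * window) * base
    if kind == "interruption":
        return (1 - rng.uniform(0.9, 1.0) * window) * base
    raise ValueError(f"unsupported kind: {kind}")


rng = np.random.default_rng(0)
dataset = {kind: make_signal(kind, rng) for kind in ("sag", "swell", "interruption")}
```

Drawing the window position and the magnitude at random for every call gives each class a variety of shapes, which is the role that the random number α plays in the text.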

Siamese Network
The traditional approach to power quality disturbance classification is to train a classifier, such as a CNN or an SVM. Such methods are not well suited to applications where the number of categories is large and the number of samples per category is small. The Siamese network is a distance-based method for solving this problem. It calculates a similarity metric between the power quality disturbance signal to be classified and a database of stored prototypes. This similarity metric can also be used to match new power quality disturbance signals from categories that were not seen during training.
The core idea of the Siamese network is to map the power quality disturbance signal to a target space through a function, and to compare similarity in the target space using a simple distance (e.g., the Euclidean distance). Specifically, given a family of functions $G_W(X)$ parameterized by $W$, the goal of the training process is to find the optimal parameters $W$ so that the similarity metric (a distance) is small when $X_1$ and $X_2$ belong to the same disturbance category and large when they belong to different categories. In the training phase, two samples are selected as a pair of inputs for the Siamese network [24][25][26]. As shown in Figure 2, the Siamese framework has a symmetrical structure in which the two neural networks share weights to process power quality disturbance signals.
The similarity of $X_1$ and $X_2$ in the low-dimensional space is measured by the energy function, i.e., the distance between the two mapped vectors $G_W(X_1)$ and $G_W(X_2)$.
The contrastive loss function $L$ depends on the samples and on the parameters of the energy function. It is accumulated over the $P$ samples in the training set, where $(Y, X_1, X_2)^i$ is the i-th sample, consisting of a pair of power quality disturbance signals and a label $Y$ (same or different category). $L_G$ is the loss function for a pair of signals from the same category, and $L_I$ is the loss function for a pair of signals from different categories. $L$ should decrease the energy of same-category pairs and increase the energy of different-category pairs. To achieve this, $L_G$ is designed as a monotonically increasing function of the energy, and $L_I$ is designed as a monotonically decreasing function. In this paper, the exact loss function for a single sample involves a constant $Q$, and the loss function is convergent. The concrete proof can be seen in [25].
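To make the roles of the energy function and the two partial losses concrete, the sketch below implements one commonly used form of the contrastive loss, $L = (1 - Y)\,\frac{2}{Q}E^2 + Y \cdot 2Q\,e^{-2.77E/Q}$, where $E$ is the Euclidean distance between the two mapped vectors. The symbol $E$, the label convention (Y = 0 for the same category, Y = 1 for different categories), the constants 2/Q, 2Q and 2.77, and all function names are assumptions made for illustration; the exact expression used in this paper may differ.

```python
import numpy as np


def energy(g1, g2):
    """Euclidean distance between two mapped signals G_W(X1) and G_W(X2)."""
    return float(np.linalg.norm(np.asarray(g1, float) - np.asarray(g2, float)))


def pair_loss(e, y, q=1.0):
    """Loss of one pair; y = 0 for the same category, y = 1 otherwise (assumed)."""
    loss_same = (2.0 / q) * e ** 2                # L_G: increases with the energy
    loss_diff = 2.0 * q * np.exp(-2.77 * e / q)   # L_I: decreases with the energy
    return float((1 - y) * loss_same + y * loss_diff)


def total_loss(energies, labels, q=1.0):
    """Sum of the pair losses over the P samples of the training set."""
    return sum(pair_loss(e, y, q) for e, y in zip(energies, labels))
```

With this form, a same-category pair is penalized more the farther apart its two vectors are, while a different-category pair is penalized more the closer they are, which matches the monotonicity requirements on $L_G$ and $L_I$ stated above.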

Convolutional Network
The convolutional network has been widely used in image classification, target detection, and style transfer because of its powerful feature extraction ability [27][28][29]. In this paper, an important contribution of the proposed approach is that the convolutional network is used to extract representations that are robust to geometric distortions of the input data.
The convolutional network consists of an input layer, convolutional layers, pooling layers, and an output layer. The operation of the convolutional layer is shown in Figure 3. The convolutional layer convolves its input signal matrix with the kernel, adds the bias vector, and outputs the feature map through the activation function. The dimensions are related by $m = n - k + 1$, where $n$, $k$ and $m$ are the side lengths of the input matrix, the convolutional kernel, and the feature map, respectively. The mathematical formula of the convolutional layer is as follows:
$$x_i = f(w_i * x_{i-1} + b_i)$$
where $w_i$ denotes the weight of the convolutional kernel in the i-th layer, $x_i$ is the output of the i-th convolutional layer, and $b_i$ is the bias vector of the i-th convolutional layer. $*$ denotes the convolution operation and $f$ denotes the activation function.
The activation function transforms the data nonlinearly so that the neural network can fit complex nonlinear relationships. Common activation functions include the sigmoid, the hyperbolic tangent and the rectified linear unit. When the magnitude of the input is large, the gradients of the sigmoid and hyperbolic tangent functions are close to 0. In this case, as the number of hidden layers increases, the error becomes difficult to propagate backwards and the gradient vanishes easily. For this reason, the rectified linear unit is used as the activation function in the convolutional layer.
The pooling layer compresses the feature map from the convolutional layer to reduce the computational complexity and the dimension of the features. The features generated by the pooling layer have a degree of invariance to small geometric distortions of the input (e.g., small shifts or rotations), and pooling can prevent over-fitting to a certain extent. As shown in Figure 4, the size of the feature map decreases after being processed by the pooling layer, and the dimensions of the input data, the pooling matrix, and the output data are related by $m = n / k$. The mathematical formula of the pooling layer is as follows:
$$x_i = f(\beta_i \, \mathrm{subdown}(x_{i-1}) + b_i)$$
where $\mathrm{subdown}(\cdot)$ is the subsampling function, and $\beta_i$ and $b_i$ are bias vectors.
In this paper, max pooling is used as the pooling function.
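The two dimension relations, m = n - k + 1 for the convolutional layer and m = n / k for the pooling layer, can be checked with a small NumPy sketch. It is a minimal illustration with a single kernel and a scalar bias, not the network of this paper; the function names are hypothetical, and, as in most deep learning libraries, the kernel is applied without flipping (cross-correlation).

```python
import numpy as np


def conv2d_valid(x, w, b=0.0):
    """Single-kernel convolution without padding, followed by ReLU."""
    n, k = x.shape[0], w.shape[0]
    m = n - k + 1                        # side length of the feature map
    out = np.empty((m, m))
    for r in range(m):
        for c in range(m):
            out[r, c] = np.sum(x[r:r + k, c:c + k] * w) + b
    return np.maximum(out, 0.0)          # rectified linear unit


def max_pool(x, k):
    """Non-overlapping k x k max pooling; the output side is n // k."""
    m = x.shape[0] // k
    blocks = x[:m * k, :m * k].reshape(m, k, m, k)
    return blocks.max(axis=(1, 3))


signal_matrix = np.arange(144, dtype=float).reshape(12, 12)
feature_map = conv2d_valid(signal_matrix, np.ones((5, 5)))  # 12 - 5 + 1 = 8
pooled = max_pool(feature_map, 2)                           # 8 / 2 = 4
```

For the 12 × 12 input used as an example in this paper, a kernel of size 5 yields an 8 × 8 feature map, and a pooling size of 2 reduces it to 4 × 4.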

K-Nearest Neighbor
The input of the Siamese network is a pair of power quality disturbance signals, and its output is the distance between the two signals. Usually, many pairs of unknown and known signals are formed and fed to the Siamese network, and the unknown signal is then assigned to the category of the known signal with the shortest distance from it. This traditional method is affected by noise, which reduces the accuracy of the Siamese network. Therefore, the k-nearest neighbor method is introduced to improve the accuracy of the Siamese network. Its steps are as follows:
(1) An unknown signal is combined with each known signal from the training set to form n pairs, where n is the number of samples in the training set. These n pairs of signals are fed to the Siamese network, which outputs the distance between the unknown signal and each of the n samples.
(2) The n samples are sorted in ascending order of distance. The first k samples are selected, and the number of samples in each category among them is counted. Finally, the unknown signal is assigned to the category with the largest number of samples among the k nearest ones.
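The two steps above can be sketched as a majority vote over the k smallest distances. The sketch assumes that the distances between the unknown signal and the n training samples have already been produced by the Siamese network; the function name knn_predict and the sample values are hypothetical.

```python
from collections import Counter

import numpy as np


def knn_predict(distances, labels, k):
    """Assign the category that is most frequent among the k nearest samples."""
    distances = np.asarray(distances, dtype=float)
    nearest = np.argsort(distances, kind="stable")[:k]  # ascending distance
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Distances from one unknown signal to five labeled training samples.
distances = [0.90, 0.10, 0.20, 0.80, 0.15]
labels = ["sag", "swell", "swell", "sag", "sag"]
predicted = knn_predict(distances, labels, k=3)
```

Voting over several neighbors makes the decision less sensitive to a single noisy training sample than taking only the nearest one, which is the motivation given above for combining the two methods.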

Process of the Proposed Method
To summarize the above analysis, the steps for power quality disturbance classification based on the Siamese network are as follows:
(1) Format transformation: deep convolutional neural networks were originally designed to classify images, which here have the same number of rows and columns. However, the power quality disturbance signal is a one-dimensional time series, which cannot be used directly as the input of such a network. Therefore, it is necessary to convert the power quality disturbance signal into a two-dimensional matrix with the same number of rows and columns. Take a signal containing 140 elements as an example. Firstly, zero elements are appended to the tail of the time series to make it a vector with 144 elements. Then, the time series is transformed into a 12 × 12 matrix as the input data of the Siamese network.
(2) Data normalization: after format transformation, the power quality disturbance signals need to be normalized; otherwise, the loss function may not converge. In this paper, the input data is scaled to the range from 0 to 1 by the min-max normalization method.
(3) Updating network parameters: after normalization, the convolutional neural network maps the two signals to low-dimensional vectors. Then, the similarity between the two signals is calculated, and the weights of the Siamese network are updated via the chain rule and the gradient descent method.
(4) Obtaining the results: after the network is trained, an unknown signal is compared against the labeled samples to determine the labeled signals that are most similar to it, which yields the classification result.
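Steps (1) and (2) can be sketched as follows. This is a minimal NumPy illustration of zero padding to the next square size followed by min-max normalization, assuming that each sample is normalized on its own; the function names are hypothetical.

```python
import math

import numpy as np


def to_square_matrix(signal):
    """Zero-pad a 1-D signal at its tail and reshape it into a square matrix."""
    signal = np.asarray(signal, dtype=float)
    side = math.isqrt(signal.size)
    if side * side < signal.size:
        side += 1                        # smallest square that holds the signal
    padded = np.zeros(side * side)
    padded[:signal.size] = signal
    return padded.reshape(side, side)


def min_max_normalize(x):
    """Scale the values of x linearly into the range [0, 1]."""
    lo, hi = x.min(), x.max()
    if hi == lo:                         # constant input: avoid division by zero
        return np.zeros_like(x)
    return (x - lo) / (hi - lo)


series = np.sin(np.linspace(0.0, 2.0 * np.pi, 140))   # 140-element example
matrix = min_max_normalize(to_square_matrix(series))  # 12 x 12, values in [0, 1]
```

The same function maps the 784-point samples used in the experiments to 28 × 28 matrices without any padding.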
The program of power quality disturbances classification via the Siamese network is designed with multiple stages: (1) define network, (2) share weights, (3) train network, (4) predict class. Part of the code is shown in Table 2.

Architecture and Parameters
The sampling frequency of the power quality disturbance data is 3916 Hz, and the sampling time is 10 cycles, namely 784 points per sample. The training set accounts for 80% of the data set, and the validation set and the test set account for 10% each. The proposed method is implemented in MATLAB 2018a and Keras, a deep learning library. The computer has 6 GB of memory and a dual-core 2.4 GHz Intel Core i3-3110M processor.
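The 80%/10%/10% partition can be reproduced with a shuffled index split. The sketch below is one possible way to do it; the function name split_indices and the random seed are hypothetical and are not taken from the code of this paper.

```python
import numpy as np


def split_indices(n_samples, seed=0):
    """Shuffle sample indices and split them 80/10/10 into train/val/test."""
    order = np.random.default_rng(seed).permutation(n_samples)
    n_train = n_samples * 8 // 10
    n_val = n_samples // 10
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]       # the remainder goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(1000)
```

Shuffling before splitting keeps the class proportions of the three subsets roughly equal when the samples are stored class by class.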
The architecture and parameters of the Siamese network are shown in Table 3 and Figure 5. In addition, CNN, MLP, SVM, the extreme gradient boosting (XGBoost) method and the light gradient boosting machine (LightGBM) method are used as baselines to verify the performance of the proposed Siamese network. Their parameters and structures are as follows:
(1) For MLP, the input layer has 784 neurons, and the two hidden layers have 500 and 200 neurons, respectively. The number of neurons in the output layer is equal to the number of categories. To prevent over-fitting, a dropout layer with a rate of 0.25 is inserted between adjacent full connection layers. The loss function is cross-entropy and the optimizer is root mean square propagation (RMSprop).
(2) The CNN consists of two convolutional layers, two max-pooling layers, two dropout layers and two full connection layers. The kernel size in the convolutional layers is 5. The rate of the dropout layers is 0.25. The pool size in the max-pooling layers is 2. The full connection layers have 128 and 8 neurons, respectively.
(3) For SVM, the fitcecoc function from MATLAB 2018a is used to classify power quality disturbances.
(4) For XGBoost, gamma is 0.1, the max depth is 6, the subsample is 0.7, the min child weight is 3, and eta is 0.1.
(5) For LightGBM, the specific parameters are shown in Table 4.
(6) For the proposed method, after many experiments, the best performance is obtained when k is equal to 20.
Figure 6 shows the training process of the Siamese network. As the number of iterations increases, the loss functions of the training set and the validation set decrease. When the number of iterations exceeds 120, the loss function of the neural network tends to be stable, which indicates that the network has converged. The loss function of the validation set is very close to that of the training set, which indicates that the Siamese network has strong generalization performance.

Simulation Results
In order to analyze the influence of data size on the performance of the proposed method, simulations were carried out in the seven cases shown in Table 5. Each case was run 30 times independently, and the average accuracy is shown in Table 6.
In order to analyze the performance of the proposed method under different signal-to-noise ratios (SNR), the original power quality disturbance signals were combined with Gaussian white noise to form new samples, as shown in Figure 7. Each case was tested 30 times independently, and the average accuracy of each case under each SNR is shown in Table 7 and Table 8.
In order to analyze the performance of the proposed method under different sampling frequencies, each algorithm was repeated 30 times at each frequency, and the average accuracy is shown in Table 9.
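Noisy samples of the kind used in the SNR experiments can be generated by scaling Gaussian white noise to a target signal-to-noise ratio. The sketch below assumes that the SNR is expressed in decibels and is defined as the ratio of the mean signal power to the noise power; both are assumptions, and the function name add_white_noise is hypothetical.

```python
import numpy as np


def add_white_noise(signal, snr_db, rng):
    """Add zero-mean Gaussian white noise so that the SNR is about snr_db (dB)."""
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


rng = np.random.default_rng(0)
t = np.arange(784) / 3916.0                  # 784 points sampled at 3916 Hz
clean = np.sin(2.0 * np.pi * 50.0 * t)       # undisturbed 50 Hz waveform
noisy = add_white_noise(clean, snr_db=15.0, rng=rng)
```

Because the noise is random, the SNR measured on a single 784-point sample deviates slightly from the target value.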

Discussion of Results
The following conclusions can be drawn from Table 6: (1) The accuracy of the existing methods is less than 70% in cases 1 and 2, whereas the accuracy of the proposed method is higher than 80%, which indicates that the proposed method has a clear advantage in power quality disturbance classification with small samples. (2) As the number of samples increases, the accuracy of each algorithm increases, which shows that increasing the number of samples helps to improve accuracy. For a large sample size, the accuracy of the proposed method is very close to that of CNN, and it is much higher than that of the other methods. (3) Generally speaking, the proposed method has the best performance, followed by CNN. The performance of XGBoost and LightGBM is similar. SVM performs worst.
The following conclusions can be drawn from Table 7 and Table 8: (1) When the signal-to-noise ratio is 15, the accuracy of XGBoost and LightGBM decreases significantly, which indicates that their anti-noise ability is weak. (2) The accuracy of MLP and SVM under different SNR is relatively low, which indicates that they are not suitable for classifying noisy power quality disturbances. (3) In contrast, the accuracy of the proposed method and CNN under different SNR in case 3 is more than 85%, which shows that they have strong robustness. In addition, the accuracy of the proposed method is slightly higher than that of CNN.
The following conclusions can be drawn from Table 9: the accuracy of the proposed method is positively correlated with the sampling frequency. When the sampling frequency is 715 Hz, the accuracy of the proposed method is slightly lower than that of CNN, which indicates that the proposed method is better suited to power quality disturbance signals with a high sampling frequency. When the sampling frequency is more than 1275 Hz, the accuracy of the proposed method is higher than that of the other methods.

Conclusions
The classification of disturbance signals is of great significance for improving power quality and system operation. In this paper, a hybrid algorithm of k-nearest neighbor and fully-convolutional Siamese network is proposed to classify power quality disturbances by learning from small samples. The following conclusions are obtained through simulation:
(1) For larger sample sizes, the accuracy of the proposed method is very close to that of CNN, and higher than that of the other methods. For small sample sizes, the accuracy of the proposed method is significantly higher than that of the existing methods (e.g., MLP, CNN, SVM, XGBoost and LightGBM), which shows that the proposed method is very suitable for power quality disturbance classification with a small number of samples.
(2) If the data size is small, the accuracy of the proposed method is higher than that of the existing methods (e.g., MLP, CNN, SVM, XGBoost and LightGBM) under different SNR. Besides, both the proposed method and CNN show strong anti-noise ability.
(3) The accuracy of the proposed method is positively correlated with the sampling frequency. To ensure that the accuracy of the proposed method is high enough, the sampling frequency of the power quality disturbance signal should preferably exceed 1275 Hz.

Conflicts of Interest:
The authors declare no conflict of interest.