1. Introduction
Power systems have advanced considerably, with electricity playing a crucial role in the manufacturing industry and in daily life. However, the use of high currents carries the risk of fire, equipment explosions, and damage to the system. To mitigate these risks, maintaining stable system operation is important. Gas-insulated switchgear (GIS) is a type of electrical equipment that uses gas to insulate and protect different parts of power transmission and distribution systems [1]. GIS is advantageous because of its small size [2], flexible configuration, and ease of maintenance [3].
In power systems, partial discharges (PDs) threaten the reliability and safety of the electrical infrastructure. When PDs occur in GIS, insulation deterioration accelerates, which can result in serious accidents owing to insulation defects. Hence, detecting PDs at an early phase is very important [4]. Different methods have been developed to detect PDs, including the use of loop antennas, acoustic emissions, or different types of internal and external sensors [5,6,7,8]. Among these methods, ultra-high-frequency (UHF) sensors can capture a wide range of frequencies and effectively reduce noise [9]. Therefore, we use PD signals from UHF sensors for fault diagnosis in GIS [10].
Recently, artificial intelligence (AI) has been successfully applied to pattern recognition and classification of PDs in GIS using various machine learning algorithms, such as support vector machines (SVMs), random forests (RFs), logistic regression (LR), k-nearest neighbors (kNN), and backpropagation neural networks (BPNNs) [11,12,13,14,15,16]. The classification of PD signals involves two distinct patterns: time-resolved partial discharges (TRPDs) and phase-resolved partial discharges (PRPDs). TRPD-based methods for pattern recognition and classification have the advantages of a simple measurement system and the ability to distinguish signals from noise [11,12,13,16]. An SVM with chromatic methodology has been used for TRPD-based pattern recognition [11]. Also based on TRPDs, PD classification has been proposed through an RF with the Hilbert transform [12], and incipient discharge classification has been proposed through RF and kNN using FFT analysis [13]. PRPD-based methods for pattern recognition and classification are robust against signal attenuation and noise interference [14,15,16]. Based on PRPDs, a PD severity assessment has been proposed, utilizing minimum-redundancy maximum-relevance feature selection and an SVM [14]. A two-step LR has been proposed for the identification of multisource PDs based on each PRPD pattern [15]. Moreover, PD recognition has been proposed using BPNNs that integrate information from both TRPDs and PRPDs [16]. The limited ability to extract high-level features is a major issue for conventional machine learning techniques [17], because their feature extraction process is time-consuming and depends on domain expertise [17].
To address the disadvantages of machine learning techniques, deep neural networks (DNNs) have been developed that perform complex calculations on large amounts of data. A DNN-based PD classification method, consisting of multiple hidden layers and activation functions, was proposed using UHF PRPDs [16]. However, the convergence of this structure is slow and complicated owing to its numerous nonlinear activation functions, layers, and neurons. To address this problem, convolutional neural networks (CNNs) have been proposed in several studies, including PRPD pattern recognition in GIS [18], TRPD pattern recognition in GIS [19], and transfer learning combining a CNN with the IoT for the automatic management of GIS [20]. CNNs comprise different convolution blocks and fully connected layers. Further, different types and variations of CNN architectures have been proposed, including AlexNet, VGG, ResNet, and GoogLeNet [21]. While CNNs demonstrate stronger learning capabilities than conventional machine learning methods, those employed for fault diagnosis require a large training dataset [22]. In fault diagnosis for GIS, obtaining extensive fault data covering different environments and levels of failure severity is challenging.
In this study, we propose a novel approach for fault diagnosis in GIS by utilizing supervised contrastive learning (SCL) with PRPD signals. Our work builds upon recent successes of SCL in related studies, exemplified by a supervised contrastive learning-based domain adaptation network (SCLDAN) for cross-domain fault diagnosis of rolling bearings [23], multimodal SCL for classifying MRI regions in prostate cancer diagnosis [24], SCL for image dataset cleaning [25], and SCL for text representation [26]. Contrastive learning aims to maximize the similarity between vector representations of related samples by minimizing a contrastive loss function, and its advantages include distinguishing different types of data [27]. The proposed SCL model comprises a pretrained task and a downstream task. As regards the pretrained task, we propose new data augmentation methods for PRPDs in GIS: Gaussian noise adding, Gaussian noise scaling, random cropping, and phase shifting, which generate more views as input for the supervised contrastive loss. To validate the robustness and effectiveness of the proposed SCL model, we conduct experiments using PRPDs and on-site noise in GIS. The proposed SCL achieves a classification accuracy of 97.28%, outperforming conventional methods such as SVM, MLP, and CNN by margins of 6.8%, 4.28%, and 2.04%, respectively.
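As a minimal illustration of the supervised contrastive objective, the following NumPy sketch implements the standard SupCon formulation (positives are all other samples sharing the anchor's label); the temperature value is an assumption, and this is not necessarily identical to the loss defined later in the paper.

```python
import numpy as np

def supervised_contrastive_loss(z, labels, tau=0.1):
    """Supervised contrastive loss over embeddings z of shape (N, D).

    For each anchor i, positives are all other samples with the same label;
    the loss pulls positives together and pushes all other samples apart.
    """
    z = np.asarray(z, dtype=float)
    labels = np.asarray(labels)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # project to unit sphere
    sim = z @ z.T / tau                                # temperature-scaled similarities
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)                  # exclude self-contrast
    losses = []
    for i in range(n):
        pos = (labels == labels[i]) & not_self[i]      # same-class positives
        if not pos.any():
            continue                                   # anchor has no positive pair
        log_denom = np.log(np.exp(sim[i][not_self[i]]).sum())
        losses.append(-(sim[i][pos] - log_denom).mean())
    return float(np.mean(losses))
```

Embeddings that cluster by class yield a lower loss than embeddings where classes are mixed, which is exactly the pressure the pretrained task exploits.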
The remainder of this paper is organized as follows: In Section 2, works related to the SCL method and the measurement of PRPDs in GIS are presented. The proposed SCL model, including data augmentation, is presented in Section 3. The validation results based on measured PRPDs and noise are presented in Section 4. Section 5 concludes this paper.
4. Experimental Results
In this section, we present the performance evaluation of the fault diagnosis in power equipment using the proposed SCL.
Table 1 shows the fault types (corona, floating, void, particle, and noise) and the number of measurement samples. A detailed description of the experimental signals is provided in Section 2.2.
For data augmentation, we choose the mean and standard deviation as the parameters for Gaussian noise adding, a mean of 1 together with a standard deviation for Gaussian noise scaling, and a crop size for random cropping.
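The four augmentation methods can be sketched as follows for a 2-D PRPD array. The array shape, noise levels, and crop ratio below are illustrative assumptions, not the values used in the experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(x, mean=0.0, std=0.05):
    """Gaussian noise adding: perturb each bin additively."""
    return x + rng.normal(mean, std, size=x.shape)

def scale_gaussian_noise(x, mean=1.0, std=0.05):
    """Gaussian noise scaling: multiply each bin by noise centred at 1."""
    return x * rng.normal(mean, std, size=x.shape)

def random_crop(x, crop=0.9):
    """Random cropping along the phase axis, zero-padded back to full size."""
    n_phase = x.shape[1]
    width = int(crop * n_phase)
    start = rng.integers(0, n_phase - width + 1)
    out = np.zeros_like(x)
    out[:, :width] = x[:, start:start + width]
    return out

def phase_shift(x, max_shift=None):
    """Phase shifting: circularly roll the PRPD along the phase axis."""
    n_phase = x.shape[1]
    shift = rng.integers(0, max_shift or n_phase)
    return np.roll(x, shift, axis=1)
```

Each transform preserves the array shape, so two randomly augmented views of the same PRPD can be fed directly to the encoder as a positive pair.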
Figure 7 shows the structure of the encoder network, which comprises five convolutional blocks (each including a 3 × 3 convolution layer, a rectified linear unit (ReLU) activation, and a max pooling layer) followed by a flatten layer. After the encoder network, a projection head comprising a single hidden layer with 256 nodes is used to apply a nonlinear transformation and project the encoded representation into the space where the supervised contrastive loss in (6) is optimized.
Regarding the downstream task, the classifier network comprises a 900-node hidden layer and a Softmax layer with five output nodes for classifying the five fault types.
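The encoder, projection head, and classifier described above could be sketched in Keras roughly as follows. The input shape and per-block filter counts are assumptions, since the paper specifies only the kernel size, node counts, and number of blocks:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_encoder(input_shape=(64, 64, 1)):
    # Five conv blocks: 3x3 convolution + ReLU + 2x2 max pooling, then flatten.
    # The input shape and filter counts here are illustrative assumptions.
    x = inp = layers.Input(shape=input_shape)
    for filters in (32, 32, 64, 64, 128):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    return models.Model(inp, layers.Flatten()(x), name="encoder")

def build_projection_head(encoder):
    # Single 256-node hidden layer projecting onto the contrastive space
    # (used only during the pretrained task).
    z = layers.Dense(256, activation="relu")(encoder.output)
    return models.Model(encoder.input, z, name="projection_head")

def build_classifier(encoder, n_classes=5):
    # Downstream task: 900-node hidden layer with 20% dropout,
    # then Softmax over the five fault types.
    h = layers.Dense(900, activation="relu")(encoder.output)
    h = layers.Dropout(0.2)(h)
    out = layers.Dense(n_classes, activation="softmax")(h)
    return models.Model(encoder.input, out, name="classifier")
```

In the pretrained task the encoder plus projection head are trained with the supervised contrastive loss; in the downstream task the projection head is discarded and the classifier is attached to the same encoder.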
In our experiment, the dataset was divided into training and testing sets with 80% and 20% of the samples, respectively. In addition, the data augmentation methods double the size of the training set. During training, we conducted experiments with different options for the batch size, number of layers, and filter size to obtain optimized hyperparameters.
Table 2 shows the minimum and maximum bounds for each hyperparameter, as well as their types. The proposed SCL was optimized using different combinations of parameters to obtain the best choice. After trying several architectures, we used five convolution layers with a 3 × 3 kernel for the encoder network, a single hidden layer with 256 nodes for the projection head, and a 900-node hidden layer together with a 20% dropout rate for the classifier network. This architecture, with a learning rate, batch size, and number of epochs of 0.0001, 16, and 200, respectively, achieves the highest overall classification accuracy for each fault type. In addition, we used the ReLU function to apply the nonlinear transformation for the SCL method with each combination of parameters. Python with TensorFlow version 2.12.0 and scikit-learn version 1.2.2 was used to implement the fault diagnosis methods.
In this experiment, we compared the performance of the proposed SCL with those of an SVM with a nonlinear radial basis function (RBF) kernel, an MLP, and a CNN, where the MLP uses ReLU activation and the Adam optimizer with a maximum of 200 iterations. For the SVM and MLP, we use feature extraction to reduce the two-dimensional raw data into one-dimensional vectors: the mean of the nonzero values and the number of nonzero values in each phase are used as features.
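A sketch of this feature extraction, assuming the PRPD is stored as a 2-D array with one row per phase bin (the layout is an assumption):

```python
import numpy as np

def extract_features(prpd):
    """Reduce a 2-D PRPD (phase bins x samples) to a 1-D feature vector:
    the per-phase mean of nonzero values followed by the per-phase count
    of nonzero values."""
    prpd = np.asarray(prpd, dtype=float)
    counts = (prpd != 0).sum(axis=1)           # nonzero count per phase
    sums = prpd.sum(axis=1)                    # zeros contribute nothing
    means = np.divide(sums, counts,
                      out=np.zeros_like(sums), where=counts > 0)
    return np.concatenate([means, counts.astype(float)])
```

For a PRPD with P phase bins, this yields a length-2P vector suitable as input to the SVM and MLP baselines.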
Table 3 shows the experiments for the SVM and MLP with various hyperparameter ranges. The regularization parameter C for the SVM ranges from 0.001 to 100 to avoid overfitting, and the SVM model with the best-performing value reached the highest accuracy. For the MLP method, different numbers of hidden layers from 1 to 5 were tried, and the number of nodes in each layer was reduced to prevent overfitting. After trying many configurations with different numbers of nodes and hidden layers, we found that the MLP model with (200, 150, 100, 50) nodes in four hidden layers achieved the highest accuracy. We used the same range of hyperparameters and architectures for both the SCL encoder-classifier network and the CNN method, as shown in Table 2. For the CNN model, five convolution layers with a 3 × 3 kernel, a 900-node hidden layer, and two 20% dropout layers, with a learning rate, batch size, and number of epochs of 0.0001, 16, and 200, respectively, achieved the highest classification accuracy.
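The SVM and MLP baselines described above might be set up in scikit-learn as follows; the specific C grid values and the number of cross-validation folds are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV

# RBF-kernel SVM: search the regularization parameter C over [0.001, 100].
svm_search = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.001, 0.01, 0.1, 1, 10, 100]},
    cv=3,
)

# MLP baseline: ReLU activation, Adam optimizer, at most 200 iterations,
# with the (200, 150, 100, 50) hidden-layer configuration reported above.
mlp = MLPClassifier(hidden_layer_sizes=(200, 150, 100, 50),
                    activation="relu", solver="adam", max_iter=200)
```

Both models consume the 1-D feature vectors (per-phase nonzero means and counts) rather than the raw 2-D PRPDs.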
Table 4 presents the performance analysis for the proposed SCL. The proposed SCL is superior to the SVM, mainly owing to the difference in particle fault performance (100% in testing versus 53.85%). The proposed SCL outperforms the MLP by a margin of 4.28% in overall classification performance, with particularly high performance in classifying the particle, void, and noise classes. While the accuracy of the corona class for the proposed SCL is lower than that of the CNN, the overall performance (97.28% for the proposed SCL versus 95.24% for the CNN) remains superior, primarily owing to distinctions in the particle and noise classifications. Hence, Table 4 shows that, in terms of accuracy, the proposed SCL is the best among the four methods.
For the performance evaluation of the imbalanced dataset, precision, recall, and F1-score are used [37]. Precision and recall are defined as

Precision = TP / (TP + FP)

and

Recall = TP / (TP + FN),

respectively, where TP is the number of samples accurately predicted as "positive", FP is the number of samples wrongly predicted as "positive", and FN is the number of samples wrongly predicted as "negative". Precision represents the quality of a positive prediction made by the model, while recall is the fraction of positives that are correctly classified. The F1-score is the trade-off between precision and recall, and is expressed as

F1-score = 2 × Precision × Recall / (Precision + Recall).

The range of the F1-score is between 0 and 1; the closer the value is to 1, the better the model, and conversely, the closer it is to 0, the worse the model.
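These three metrics can be computed directly from the TP/FP/FN counts, for example:

```python
import numpy as np

def precision_recall_f1(y_true, y_pred, positive):
    """Per-class precision, recall, and F1-score for one positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Treating each fault type in turn as the positive class yields the per-class scores reported in Table 5.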
Table 5 presents the precision, recall, and F1-score of all classes for the SVM, MLP, CNN, and the proposed SCL. The proposed SCL exhibits notable performance in terms of precision and recall, particularly in the noise class, where it achieves values of 0.968 and 1, respectively. Hence, the F1-score reaches its peak at 0.984, surpassing the other methods. The proposed SCL also achieves a precision of 1 for the floating fault and a recall of 1 for the particle fault. The proposed SCL method achieves the highest F1-scores in three out of the five classes, excluding void and corona. This deviation is attributed to their lower recall values compared to the CNN (0.958 and 0.947 compared to 0.958 and 1, respectively), resulting in the F1-scores for void and corona dropping to second place (0.963 and 0.973 for the proposed SCL compared to 0.917 and 0.974 for the CNN, respectively). In terms of F1-score performance, the proposed SCL is better than the SVM, MLP, and CNN, as presented in Table 5.
Figure 8 shows the confusion matrices for the SVM, MLP, CNN, and the proposed SCL. Both the proposed SCL and the CNN have only two void samples misclassified as noise, whereas the SVM and MLP have six and four misclassified void samples, respectively. In addition, the proposed SCL performs better than the other three methods in testing the particle fault. As shown in Figure 8, the performance of the proposed SCL method is the best among the four methods, with only a limited amount of the test data mistakenly recognized as other classes (one corona sample misclassified as a void pattern, one floating sample misclassified as a particle, and two void samples misclassified as noise).
Table 6 shows the accuracies of the baseline CNN under the same conditions for data augmentation, where the training data for the data augmentation technique consist of raw data and augmented data. When the raw training data are combined with the proposed data augmentation methods, the accuracy on the test set is better than when only the raw data are used.
Figure 9 shows the confusion matrices in detail. All augmentation methods have an advantage in classifying the noise class, whereas training with only the raw data leads to five noise samples being misclassified as the four other fault types.
Table 7 shows the training and testing times for the SVM, MLP, CNN, and the proposed SCL. In our experiments, the models were trained and tested on a PC with an NVIDIA GeForce RTX 2080 Ti GPU and 128 GB of RAM. The training and testing times of the proposed SCL model were longer than those of the SVM, MLP, and CNN because the proposed SCL requires data augmentation, the pretrained task, and the downstream task during training. Although the test time of the proposed SCL was longer than that of the other methods, it was only 0.978 s, a computational cost acceptable for a fault diagnosis system.