Partial Discharge Pattern Recognition Based on an Ensembled Simple Convolutional Neural Network and a Quadratic Support Vector Machine

Partial discharge (PD) is a critical and intricate electrical phenomenon observed in various types of electrical equipment. Identifying and characterizing PD is essential for maintaining the integrity and reliability of electrical assets. This paper proposes an ensemble methodology that aims to balance model complexity against predictive performance in PD pattern recognition. A simple convolutional neural network (SCNN) was constructed to reduce the number of model parameters. A quadratic support vector machine (QSVM) was established and ensembled with the SCNN model to improve the PD recognition accuracy. The input to the QSVM consisted of circular local binary pattern (CLBP) features extracted from the enhanced image. A testing prototype with three types of PD was constructed, and 3D phase-resolved pulse sequence (PRPS) spectrograms were measured and recorded by ultra-high-frequency (UHF) sensors. The proposed methodology was compared with three existing lightweight CNNs. The experimental results on the collected dataset show that the proposed method achieves high recognition accuracy with relatively few model parameters, making it more suitable for PD pattern recognition on resource-constrained devices.


Introduction
Partial discharge (PD), characterized by localized and transient discharges that typically occur at defects within insulation systems, is a critical and intricate electrical phenomenon in various types of electrical equipment. PD does not completely bridge the insulation between conductors [1]; instead, it is a localized flashover within an insulation system that occurs where the local electric field exceeds the dielectric withstand capability, while the overall insulation system remains capable of withstanding the applied electric field. PD is diverse in both form and location. It can occur in various electrical equipment, including transformers, generators, insulators, cables, and switchgear. The occurrence of PD in these systems can be ascribed to uneven electric field distributions, material imperfections, or operational stresses, and it generates various signals, including light, heat, odor, sound, electromagnetic waves, and high-frequency electric currents.
Detecting and characterizing PD is paramount in maintaining the integrity and reliability of electrical assets. PD measurements are used to evaluate the condition of insulation systems, enabling the identification of potential defects and facilitating proactive maintenance. There are several techniques for detecting PD in electrical systems. Ultrasonic detection captures the ultrasonic noise emitted by PD using sensitive sensors, providing insights into the discharge location and severity. Electromagnetic interference (EMI) detection monitors electromagnetic signals to locate areas of partial discharge activity. Acoustic emission detection captures and analyzes the acoustic signals produced by PD, offering valuable information about discharge characteristics. High-frequency current transient measurements are effective in assessing insulation conditions and identifying potential failure points. Dissolved gas analysis (DGA) monitors and analyzes the composition of gases dissolved in insulating oil, providing indications of PD and potential insulation degradation. Electric field measurements detect anomalies and areas of increased field intensity, serving as an indicator of partial discharge activity.
In engineering applications, the original measured data are processed to extract statistical feature parameters and generate phase-resolved partial discharge (PRPD) patterns [2]. PD pattern recognition is then carried out on these processed data. PD pattern recognition involves identifying and analyzing the characteristic electromagnetic, acoustic, and ultrasonic signals to distinguish the type of PD activity based on its unique pattern features. By utilizing advanced signal processing techniques and machine learning algorithms, PD pattern recognition enables the classification of partial discharge sources within high-voltage equipment. Consequently, PD pattern recognition plays a crucial role in condition monitoring, allowing for the early detection of insulation defects in an electrical system.
Energies 2024, 17, 2443
Traditional PD pattern features typically include waveform characteristics, spectral features, pulse counts, phase characteristics, and amplitude features. Traditional machine learning methods, such as artificial neural networks (ANNs) and support vector machines (SVMs), are conventionally used to learn from these features for pattern recognition. Tang et al. proposed a feature selection method based on the minimum-redundancy maximum-relevance (mRMR) algorithm to select statistical features under a PRPD model [3]. The results indicated that PD severity assessment with the optimal feature set yielded more stable precision than with the traditional feature set. Zhou et al. utilized both time-domain and frequency-domain features and introduced an optimized SVM algorithm for the pattern recognition of PD using ultrasonic signals [4]. The results showed that the proposed SVM algorithm had a higher recognition accuracy and a faster convergence speed. Carvalho et al. compared three clustering algorithms (K-means, Gaussian mixture model, and mean-shift) and the SVM method for PD classification; the supervised SVM demonstrated a notably high average accuracy [5]. Furthermore, global optimization algorithms have been used to optimize the hyperparameters of SVM models in some studies. Sun et al. proposed an improved whale optimization algorithm (IWOA) to optimize the hyperparameters of SVMs to identify different types of PD [6]. The resulting accuracy verified that IWOA was effective for the parameter optimization of SVMs. Sun et al. also proposed an improved northern goshawk optimization (SCNGO) to optimize the penalty factor and the kernel parameter of the SVM [7]. Fujioka et al. utilized the maximum intensity observed in the PRPD pattern as the input data of an ANN [8]. The classification accuracy was improved by shifting the phase of the maximum sensor output to 0°, as proposed. Haiba et al. utilized ANNs for classifying defects in ceramic insulators [9]. The results from the ANN indicated that the overall recognition rate depended on the number of collected signals: a greater number of captured signals led to a higher recognition rate. The findings of the ANN technique were also verified by SVM and k-nearest neighbor (KNN) models in [9]. Nevertheless, the major drawback of using traditional machine learning methods for PD pattern recognition is the necessity to extract features in advance.
In recent years, studies on the recognition of PRPDs, phase-resolved pulse sequences (PRPSs) [10], and other spectrograms for PD pattern recognition have demonstrated outstanding performance, attributable to advancements in image recognition technology. Aldosari et al. combined long short-term memory (LSTM) networks and convolutional neural networks (CNNs) to identify PD patterns, demonstrating that the integrated CNN-LSTM network outperformed an individual CNN or LSTM network [11]. Additionally, they found that image data augmentation improved the results for both grayscale and RGB images. Fu et al. employed the DenseNet model in conjunction with transfer learning to extract features from the time-domain signal map of gas-insulated switchgear PD [12]. The proposed method enabled direct pattern recognition on unstructured data, namely the time-domain waveform spectrogram of PD. Yin et al. constructed a model for identifying the statistical parameters of PRPD patterns based on the Hausdorff-like distance and an improved CNN for PRPD pattern recognition [13]. They utilized Dempster-Shafer (D-S) evidence theory to combine the results of the two pattern recognition methods, thus enhancing the accuracy of PD pattern recognition. Song et al. utilized the histogram of oriented gradient (HOG) features of 3D PRPSs and designed the attribute selective Naïve Bayes (ASNB) classifier to recognize 3D PRPS graphs [10]. Compared with statistical feature parameters, the HOG features resulted in a higher recognition accuracy and stronger robustness in PD recognition under different voltages. Wang et al. enhanced the PRPS graph using the contrast-limited adaptive histogram equalization (CLAHE) algorithm and employed uniform local binary patterns (ULBPs) as the feature vector of the PRPS graph [14]. They then used the AdaBoost cascade classifier for the integrated learning of different classification models. The experimental results indicated that using ULBP as the feature vector could enhance the generalization ability of traditional algorithms, and that CLAHE enhancement raised the upper limit of the recognition rate. Nevertheless, due to their limited number of layers, these models may not comprehensively extract the PD features.
Lightweight CNNs are increasingly being employed in the recognition of PD due to their hardware-friendly nature [15][16][17][18]. A lightweight CNN, in the context of deep learning, is a neural network architecture designed with a relatively small number of parameters and computations, enabling efficient inference on resource-constrained devices such as mobile phones or edge devices. These networks are tailored to strike a balance between model complexity and predictive performance, making them well suited to deployment in PD pattern recognition. Currently, the most widely used mobile networks include ShuffleNet [19], MobileNet [20][21][22], and EfficientNet [23]. Although significant progress has been made in lightweight CNN-based methods for recognizing PD patterns, their large model size still makes it challenging to satisfy real-time recognition requirements, especially when they are deployed to embedded devices. Acknowledging these limitations, this paper proposes an ensemble learning method that combines an SVM and a CNN, with improved recognition accuracy, high solution efficiency, and a reduced number of parameters. MobileNet V2 was simplified into a simple CNN (SCNN) to meet the demand for more efficient models while preserving accuracy. Furthermore, the integration of a quadratic SVM (QSVM) with the SCNN model effectively enhances the accuracy of PD recognition. Together, these innovations streamline a complex network architecture while maintaining accuracy, and integrate a traditional machine learning method with a modern CNN model to improve the recognition accuracy. This approach not only achieves enhanced recognition results but is also more suitable for deployment on terminal devices, in line with the demands of real-world applications. The research thus addresses the challenges of real-time recognition and deployment on embedded devices in the context of identifying PD patterns in electrical systems.

The Proposed PD Pattern Recognition Methodology
For completeness, the 3D PRPS graph and MobileNet V2 are first briefly reviewed, and the proposed PD pattern recognition methodology is then detailed.

Three-Dimensional Graph of PRPS
According to the generating mechanism, PD can be classified into suspended electrode discharge, surface discharge, and metal tip discharge. Suspended electrode discharge arises from the presence of free or floating conductive particles within an insulation material. When subjected to an electric field, these particles can cause localized discharges due to the concentration of the electric field in their vicinity. Surface discharge occurs when the electric field at the surface of the insulator exceeds the dielectric strength of the material. This can be caused by surface irregularities, impurities, or imperfections, leading to localized discharge along the insulation surface. Metal tip discharge arises from high electric field concentrations at the tips of protruding conductive elements within the insulation system, which initiate localized discharge.
The 3D-PRPS graphs in PD analysis are visualization tools that represent the distribution of partial discharge events in three dimensions, providing a comprehensive view of the period, phase, and discharge amplitude of PD [10]. Typical 3D-PRPS graphs of suspended electrode discharge, surface discharge, and metal tip discharge are shown in Figure 1. For suspended electrode discharge, there are obvious discharge pulses in both the positive and negative half-cycles. Comparatively, the phase width of a surface discharge is broader, while the pulse pattern of a metal tip discharge appears sporadic and dispersed. In summary, PRPS graphs exhibit distinct visual patterns for different discharge types, which forms the basis for PD pattern recognition.

MobileNet V2
MobileNet V2 is a neural network architecture designed to facilitate efficient and high-performance deep learning on resource-constrained devices such as mobile phones and embedded systems [21]. Building on the depthwise separable convolution of MobileNet V1, the MobileNet V2 network uses inverted residual blocks with linear bottlenecks and shortcut connections, as shown in Figure 2, where W, H, and C are the width, the height, and the number of channels of the input image, respectively; N is the kernel size of the depthwise convolution; and M is the number of kernels in the pointwise convolution. In Figure 2a, the depthwise separable convolution splits a standard convolution into a depthwise convolution and a pointwise convolution. Inverted residual blocks, shown in Figure 2b, are building blocks designed to capture nonlinearities more effectively than traditional residual blocks. The input is first expanded to a higher-dimensional space using a 1 × 1 pointwise convolution, then processed with a depthwise convolution, and finally projected back to a lower-dimensional space. Within the linear bottleneck, a linear activation function is used to alleviate the information collapse that arises when information undergoes a nonlinear mapping from a high-dimensional space to a low-dimensional space. Additionally, shortcut connections are employed to facilitate information flow and aid gradient propagation during training. MobileNet V2 is reported to achieve an accuracy of 72% on ImageNet classification [24].
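To make the saving from the depthwise separable convolution concrete, the following minimal sketch counts convolution weights using the symbols N, C, and M defined above. The function names and the example sizes are illustrative assumptions and are not taken from the paper; biases and normalization parameters are ignored.

```python
def standard_conv_weights(n: int, c: int, m: int) -> int:
    """Weights of a standard N x N convolution mapping C channels to M channels."""
    return n * n * c * m


def depthwise_separable_weights(n: int, c: int, m: int) -> int:
    """Weights of an N x N depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = n * n * c  # one N x N filter per input channel
    pointwise = c * m      # M kernels, each of size 1 x 1 x C
    return depthwise + pointwise


# Illustrative sizes only (hypothetical layer, not from the paper).
n, c, m = 3, 32, 64
std = standard_conv_weights(n, c, m)
sep = depthwise_separable_weights(n, c, m)
print(std, sep, sep / std)
```

The ratio of the two counts equals 1/M + 1/N², so for 3 × 3 kernels the separable form needs roughly one ninth of the weights once M is large.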

CLAHE and Circular LBP Features
Contrast-limited adaptive histogram equalization (CLAHE) is an image processing technique used to improve the local contrast of an image by adjusting the intensity distribution in small regions [25]. Unlike traditional histogram equalization, CLAHE limits the contrast enhancement to prevent the over-amplification of noise. By adaptively modifying the contrast in different areas of the image, CLAHE effectively enhances the visual appearance of images, particularly in regions with varying contrast levels.
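The contrast-limiting step can be illustrated with a small sketch for a single image tile. It is a simplified illustration under stated assumptions, not the implementation used in the paper: the helper names are hypothetical, the excess counts are redistributed in a single pass (so a bin can end up slightly above the limit again), and the bilinear blending between neighboring tiles that full CLAHE performs is omitted.

```python
def clip_histogram(hist, clip_limit):
    """Clip each bin at clip_limit and spread the removed counts evenly over all bins."""
    excess = sum(max(h - clip_limit, 0) for h in hist)
    clipped = [min(h, clip_limit) for h in hist]
    share, remainder = divmod(excess, len(hist))
    # Every bin receives the same share; the first `remainder` bins get one extra count.
    return [h + share + (1 if i < remainder else 0) for i, h in enumerate(clipped)]


def equalization_map(hist, levels=256):
    """Map each gray level to its scaled cumulative frequency (plain equalization)."""
    total = sum(hist)
    mapping, running = [], 0
    for h in hist:
        running += h
        mapping.append(round((levels - 1) * running / total))
    return mapping


# Toy 4-level tile histogram dominated by one gray level.
tile_hist = [10, 0, 2, 0]
limited = clip_histogram(tile_hist, clip_limit=4)
print(limited, equalization_map(limited, levels=4))
```

Clipping flattens the dominant peak before the cumulative mapping is built, which is what keeps nearly uniform (and therefore noisy) regions from being stretched excessively.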
Circular local binary pattern (CLBP) is a texture descriptor used in computer vision and image analysis [26]. It works by comparing each pixel with its neighboring pixels on a circle to encode the local texture information into a binary pattern. The CLBP feature vector is created by calculating the frequency of occurrence of these patterns within a local neighborhood. This method is robust to monotonic grayscale changes and provides a compact representation of the texture information.
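The descriptor can be sketched in plain Python as follows. The sketch assumes P = 8 sampling points on a circle of radius 1 and the common "uniform" pattern mapping, which gives 58 uniform bins plus one shared bin, i.e., 59 values in total. That count matches the 59 CLBP features reported later in the paper, but whether the authors used exactly this mapping is an assumption, and all function names are hypothetical.

```python
import math


def circular_lbp_code(img, y, x, points=8, radius=1.0):
    """LBP code of pixel (y, x): threshold `points` circle samples against the centre."""
    center = img[y][x]
    code = 0
    for p in range(points):
        angle = 2.0 * math.pi * p / points
        sy = y - radius * math.sin(angle)
        sx = x + radius * math.cos(angle)
        # Bilinear interpolation, since circle samples rarely fall on pixel centres.
        y0, x0 = math.floor(sy), math.floor(sx)
        dy, dx = sy - y0, sx - x0
        y1, x1 = min(y0 + 1, len(img) - 1), min(x0 + 1, len(img[0]) - 1)
        value = (img[y0][x0] * (1 - dy) * (1 - dx) + img[y0][x1] * (1 - dy) * dx
                 + img[y1][x0] * dy * (1 - dx) + img[y1][x1] * dy * dx)
        if value >= center - 1e-9:  # small tolerance against rounding noise
            code |= 1 << p
    return code


def uniform_lookup(points=8):
    """Give each uniform code (at most 2 circular bit transitions) its own bin."""
    lookup = {}
    for code in range(1 << points):
        bits = [(code >> i) & 1 for i in range(points)]
        transitions = sum(bits[i] != bits[(i + 1) % points] for i in range(points))
        if transitions <= 2:
            lookup[code] = len(lookup)
    return lookup, len(lookup)  # second value: index of the shared non-uniform bin


def clbp_histogram(img, points=8, radius=1.0):
    """Normalized histogram of uniform circular LBP codes over the interior pixels."""
    lookup, other = uniform_lookup(points)
    hist = [0] * (other + 1)
    r = math.ceil(radius)
    for y in range(r, len(img) - r):
        for x in range(r, len(img[0]) - r):
            hist[lookup.get(circular_lbp_code(img, y, x, points, radius), other)] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]


flat = [[5] * 5 for _ in range(5)]  # toy grayscale image
print(len(clbp_histogram(flat)))
```

Because each bit only records whether a neighbor is at least as bright as the center, any monotonic change of the gray scale leaves the codes unchanged, which is the robustness property mentioned above.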

Three-Dimensional PRPSs Acquisition
This study first developed a PD defect test prototype that uses an ultra-high-frequency (UHF) sensor to obtain PD signals. The voltage was supplied by a PD-free booster transformer. The PD spectrogram and the amplitude of the partial discharge UHF signals under simulated defects were measured by the UHF sensor. A schematic diagram of the prototype is shown in Figure 3, in which the resistance-capacitance voltage divider is composed of the coupling capacitance and the measuring impedance. The UHF sensor was 3 m away from the PD generator. Three types of discharge (suspended electrode discharge, surface discharge, and metal tip discharge) could be generated in the PD generator. Discharge data were recorded by the UHF sensor over 50 power-frequency cycles at a phase resolution of 5°. The final numbers of collected samples for suspended electrode discharge, surface discharge, and metal tip discharge were 262, 64, and 319, respectively.
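The recording scheme above (50 power-frequency cycles, one phase window every 5°) implies a 50 × 72 grid per sample. The sketch below shows one way such a grid could be assembled. The pulse tuple format `(cycle, phase_deg, amplitude)` and the choice to keep the maximum amplitude per cell are assumptions made for illustration, not details given in the paper.

```python
CYCLES = 50                          # power-frequency cycles per record (from the text)
PHASE_STEP_DEG = 5                   # phase resolution in degrees (from the text)
PHASE_BINS = 360 // PHASE_STEP_DEG   # 72 phase windows per cycle


def build_prps(pulses):
    """Arrange (cycle, phase_deg, amplitude) pulses into a CYCLES x PHASE_BINS grid."""
    grid = [[0.0] * PHASE_BINS for _ in range(CYCLES)]
    for cycle, phase_deg, amplitude in pulses:
        if not 0 <= cycle < CYCLES:
            continue  # ignore pulses outside the recorded window
        col = min(int(phase_deg % 360) // PHASE_STEP_DEG, PHASE_BINS - 1)
        # Assumption: keep the largest amplitude seen in each phase window.
        grid[cycle][col] = max(grid[cycle][col], amplitude)
    return grid


# Hypothetical pulses: two in cycle 0 near 90 degrees, one in the last cycle at 270 degrees.
grid = build_prps([(0, 92.0, 1.5), (0, 93.0, 0.7), (49, 270.0, 2.0)])
print(len(grid), len(grid[0]), grid[0][18], grid[49][54])
```

Plotting amplitude over the cycle and phase axes of this grid gives the 3D PRPS view, while the grid itself corresponds to a view from the top.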

SCNN Structure Design
This paper presents a simple CNN (SCNN) structure based on the fundamental bottleneck residual block of MobileNet V2, aiming to strike a good balance between the size of the CNN model and the training accuracy. To examine the influence of the number of bottleneck residual blocks, this study first investigated the recognition accuracy of a CNN with varying numbers of bottleneck residual blocks using all the collected data. The recognition process was repeated 10 times; the averaged accuracy is shown in Table 1. Table 1 shows that the recognition accuracy rises as the number of blocks increases from one to six. However, the difference in recognition accuracy between five blocks and six blocks was marginal, within 1%, and further increases in the number of blocks did not yield significant improvements. Consequently, the number of blocks in this study was selected to be six, considering both the computational resources and the recognition accuracy. The swish activation function has been reported to outperform the ReLU function [27]. The H-swish activation function replaces the sigmoid function in swish with a cheaper approximation, exhibiting similar performance to swish while reducing the computational cost and improving the execution speed [22]. Therefore, the H-swish activation function is more suitable for applications on mobile devices that require real-time image processing. Hence, in this study, the H-swish function was used as the activation function in the first and second layers of the model, as well as in the final layer.
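For reference, the two activation functions can be compared with a short sketch. It uses the definitions swish(x) = x · sigmoid(x) and H-swish(x) = x · ReLU6(x + 3) / 6 from the cited literature; the function names and the sample points are chosen here for illustration only.

```python
import math


def relu6(x: float) -> float:
    """ReLU capped at 6."""
    return min(max(x, 0.0), 6.0)


def swish(x: float) -> float:
    """x * sigmoid(x); requires an exponential per evaluation."""
    return x / (1.0 + math.exp(-x))


def h_swish(x: float) -> float:
    """x * ReLU6(x + 3) / 6; a piecewise approximation without exponentials."""
    return x * relu6(x + 3.0) / 6.0


for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"x={x:+.1f}  swish={swish(x):+.4f}  h_swish={h_swish(x):+.4f}")
```

H-swish is exactly zero for x ≤ -3 and exactly x for x ≥ 3, and it only needs additions, comparisons, and multiplications, which is why it is cheaper on embedded hardware.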
The final structure of the proposed SCNN is shown in Table 2, where t represents the expansion factor compared to the input channels in the inverted residual structure using 1 × 1 convolutions, c denotes the depth of the output feature map (channel), n signifies the repetition of the bottleneck, and s indicates the stride of the depthwise convolution in the first bottleneck of each row.
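The roles of t, c, and n can be illustrated by counting the convolution weights of one table row. The sketch below follows the usual MobileNet V2 convention (1 × 1 expansion, 3 × 3 depthwise convolution, 1 × 1 linear projection) and assumes that the expansion layer is omitted when t = 1, as in common reference implementations. The channel values are hypothetical and are not the entries of Table 2; biases and normalization parameters are ignored.

```python
def bottleneck_weights(c_in: int, c_out: int, t: int, kernel: int = 3) -> int:
    """Convolution weights of one inverted residual bottleneck."""
    hidden = t * c_in                        # channels after the 1 x 1 expansion
    expand = c_in * hidden if t != 1 else 0  # assumption: no expansion layer when t == 1
    depthwise = kernel * kernel * hidden     # one kernel x kernel filter per hidden channel
    project = hidden * c_out                 # linear 1 x 1 projection
    return expand + depthwise + project


def row_weights(c_in: int, c_out: int, t: int, n: int) -> int:
    """Weights of a table row: n bottlenecks, only the first one changes the channel count."""
    first = bottleneck_weights(c_in, c_out, t)
    rest = (n - 1) * bottleneck_weights(c_out, c_out, t)
    return first + rest


def uses_shortcut(c_in: int, c_out: int, stride: int) -> bool:
    """A shortcut connection is only possible when input and output shapes match."""
    return stride == 1 and c_in == c_out


# Hypothetical row: 16 -> 24 channels, t = 6, repeated n = 2 times, stride s = 2.
print(bottleneck_weights(16, 24, 6), row_weights(16, 24, 6, 2), uses_shortcut(16, 24, 2))
```

The stride s affects only the spatial size of the output and whether the shortcut can be used, not the number of weights.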

Furthermore, to determine the most suitable batch size for training the SCNN, various minibatch sizes were investigated. After five repeated runs of each setting, the averaged training time and accuracy are shown in Table 3. As a compromise between computational time and recognition accuracy, the minibatch size was set to eight when training the SCNN.
For the obtained 3D PRPS graph, more than half of the image space lacked feature information. Consequently, the graph was converted to a 2D image by taking the top view. The resulting 2D color image was then converted to grayscale using the floating-point method and enhanced through CLAHE. CLBP feature extraction was performed to generate the CLBP feature space, and the method described in [14] was adopted to select features within it. Ultimately, 59 CLBP features were obtained and used as the input data of the SVM. The whole processing procedure and its results are shown in Figure 4.
To determine the most suitable SVM model, experiments were conducted for six types of SVMs: linear SVM, quadratic SVM, cubic SVM, coarse SVM, medium SVM, and fine SVM. The training data comprised all the data for the three types of PD. The different SVMs were trained using the Classification Learner app in MATLAB R2022b. The receiver operating characteristic (ROC) curve and the area under the curve (AUC) [28] were used to evaluate the performance of the different SVMs. The ROC curve is a graphical tool that plots the true positive rate against the false positive rate, providing a visual representation of a classifier's ability to discriminate between classes across different threshold values. A ROC curve that rises more steeply toward the top-left corner indicates better performance, and the AUC quantifies the overall performance of the classifier. AUC values range from 0 to 1, where a value closer to 1 indicates better discrimination performance, while a value near 0.5 suggests a performance similar to random guessing. The ROC curves and AUC values for the different types of PD are shown in Figure 5.
Observing Figure 5, it is apparent that among the six types of SVM models, the quadratic SVM demonstrated higher AUC values across the three fault types. Therefore, the quadratic SVM model (QSVM) was selected to construct the PD pattern recognition model for the CLBP features.

Procedures of the Proposed ENS-SCNN-QSVM
Based on the aforementioned studies, our PD pattern recognition methodology is proposed; its overall procedure is illustrated in Figure 6 to facilitate its implementation by fellow researchers. After data were collected from the UHF sensors, the obtained images were first preprocessed through resizing, rotation, graying, and enhancement. For the SCNN model, each image was resized to match the network input size of 224 × 224. For the QSVM model, the CLBP features were extracted, as shown in Figure 4. After image preprocessing, the SCNN and QSVM models were established separately. The output scores of the SCNN and QSVM, each with as many entries as there are types of PD, were concatenated into one vector that served as the input to the ensemble learning model, which was trained using the bagging and discriminant method.
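The score-level ensembling step can be illustrated with a small sketch. It is not the implementation used in this study: the base learner below is a nearest-class-mean discriminant, a deliberately simplified stand-in for the discriminant learner, and the bagging is plain bootstrap resampling with majority voting. All function names and score values are hypothetical.

```python
import random


def concat_scores(scnn_scores, qsvm_scores):
    """Join the per-class scores of both base models into one feature vector."""
    return list(scnn_scores) + list(qsvm_scores)


def fit_class_means(X, y):
    """Fit a nearest-class-mean discriminant: one mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_class_means(means, x):
    """Assign x to the class whose mean is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, means[c]))
    return min(sorted(means), key=dist)


def fit_bagged(X, y, n_learners=25, seed=0):
    """Train each learner on a bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(X)
    learners = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]
        learners.append(fit_class_means([X[i] for i in idx],
                                        [y[i] for i in idx]))
    return learners


def predict_bagged(learners, x):
    """Majority vote over the bagged learners."""
    votes = {}
    for means in learners:
        c = predict_class_means(means, x)
        votes[c] = votes.get(c, 0) + 1
    return max(sorted(votes), key=lambda c: votes[c])


# Made-up per-class scores for three PD types (labels 0, 1, 2).
scnn = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.2, 0.7, 0.1],
        [0.1, 0.8, 0.1], [0.1, 0.2, 0.7], [0.2, 0.1, 0.7]]
qsvm = [[0.6, 0.3, 0.1], [0.9, 0.05, 0.05], [0.3, 0.6, 0.1],
        [0.2, 0.7, 0.1], [0.3, 0.1, 0.6], [0.1, 0.1, 0.8]]
y = [0, 0, 1, 1, 2, 2]
X = [concat_scores(s, q) for s, q in zip(scnn, qsvm)]

learners = fit_bagged(X, y)
new_sample = concat_scores([0.75, 0.15, 0.1], [0.7, 0.2, 0.1])
pred = predict_bagged(learners, new_sample)
print("predicted PD type:", pred)
```

With three PD types, each concatenated vector has six entries, three from the SCNN and three from the QSVM, so the ensemble learner operates on a very low-dimensional input and adds little to the overall parameter count.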

Experimental Study
To demonstrate the performance of the proposed PD pattern recognition methodology, comprehensive experiments were conducted. In the experimental study, all recorded data were split into two parts, 70% for training and 30% for testing; the training dataset sizes for suspended electrode discharge, surface discharge, and metal tip discharge were 183, 45, and 223, respectively. Comparison was performed among SCNN, QSVM, random forest (RF) [29], extreme gradient boosting (XGBoost) [30], ensemble learning of SCNN and QSVM (ENS-SCNN-QSVM), and three existing lightweight networks: MobileNet V2 [21], EfficientNetB0 [23], and ShuffleNet [19]. The comparison focused on the recognition accuracy, the parameter quantity, and the training time. For the identification of the three types of PD, each classifier was run independently 10 times to obtain an average recognition efficiency and an average training time. For SCNN, ENS-SCNN-QSVM, MobileNet V2, EfficientNetB0, and ShuffleNet, minibatch training was used with a minibatch size of 8, a maximum of 20 training epochs, and a learning rate of 0.001. For RF, the number of trees was 100, the minimum number of samples for each leaf node was 5, each tree was trained using a random selection of 10 features, and the maximum depth of each tree was 100. For XGBoost, the number of weak classifiers was 100, the maximum depth was 10, and the learning rate was 0.1. CLBP features were used in both the RF model and the XGBoost model. The experiments were conducted in MATLAB R2022b on a machine with an AMD Ryzen 7 4800H with Radeon Graphics (2.90 GHz) CPU and a single NVIDIA GeForce GTX 1650 Ti GPU. Notably, MobileNet V2, EfficientNetB0, and ShuffleNet were trained using transfer learning, where the networks pre-trained on ImageNet [24] were loaded. The initial weights of the main backbone were frozen; then, retraining was conducted using the training data presented in this paper. The PD recognition results are shown in Table 4. The confusion matrices for the eight methods run once on the testing data are shown in Figure 7. The precision, recall, and accuracy for each method on the testing data are presented in Table 5.
[...] could be correctly identified, while SCNN and ENS-SCNN-QSVM both achieved a recall of 100%. Among the eight methods for PD pattern recognition, suspended electrode discharge and metal tip discharge were easily misidentified. The proposed ENS-SCNN-QSVM achieved the highest overall recognition rate on the testing dataset, at 70.6%.
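The per-class precision and recall and the overall accuracy reported for each method follow directly from a confusion matrix. The sketch below shows the arithmetic under the assumption that rows index the true class and columns the predicted class; the function name and the example matrix are hypothetical and do not reproduce the results of this study.

```python
def per_class_metrics(cm):
    """Compute per-class precision and recall, and overall accuracy.

    cm[i][j] = number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    precision, recall = [], []
    for k in range(n):
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        precision.append(cm[k][k] / predicted_k if predicted_k else 0.0)
        recall.append(cm[k][k] / actual_k if actual_k else 0.0)
    accuracy = correct / total if total else 0.0
    return precision, recall, accuracy


# Hypothetical 3-class confusion matrix (not the paper's data).
# Class order: suspended electrode, surface, metal tip discharge.
cm = [[8, 0, 2],
      [0, 5, 0],
      [1, 0, 9]]
precision, recall, accuracy = per_class_metrics(cm)
for name, p, r in zip(["suspended electrode", "surface", "metal tip"],
                      precision, recall):
    print(f"{name}: precision={p:.3f}, recall={r:.3f}")
print(f"accuracy={accuracy:.3f}")
```

In this made-up example, the off-diagonal counts sit only between the first and third classes, which is the kind of pattern that arises when suspended electrode discharge and metal tip discharge are confused with each other.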

Conclusions
The precise identification of PD is pivotal for ensuring the reliability of power supply within a power system. As CNNs are progressively employed in PD pattern recognition, the challenge of large model sizes persists, especially when real-time demands must be met on embedded devices. This paper introduces an ensemble learning method that combines SCNN and QSVM for identifying PD patterns. An SCNN was constructed based on the inverted residual blocks utilized in MobileNet V2. The QSVM model was established using the CLBP vectors, which were extracted from the enhanced 2D gray image. The SCNN and QSVM scores were ensembled using bagging and discriminant methods. Comparative results with existing lightweight CNNs demonstrate the proposed method's advantages in recognition accuracy, response efficiency, and parameter quantity, making it more suitable for deployment on terminal devices for PD pattern recognition.
In conclusion, the presented method advances the field of PD pattern recognition, offering potential applications in the real-time identification of online PD in electrical equipment such as switchgear. By situating the UHF PD sensor outside the electrical equipment under test and connecting it to an oscilloscope or computer host through the PD host, one can display the PRPS spectrum, facilitating the application of the proposed method. Further research could explore multi-source mixed PD pattern recognition, focusing on separating mixed PD signals and extracting their respective characteristics, and could investigate different methodologies for combining the SCNN and QSVM.

Figure 3. The prototype testing device for PD.

Figure 5. The ROC curves and AUC values for (a) suspended electrode discharge, (b) surface discharge, and (c) metal tip discharge.

Figure 6. The procedure of the proposed PD pattern recognition methodology.

Table 1. Accuracy under different numbers of bottleneck residual blocks.

Table 2. The proposed SCNN body architecture.

Table 3. Accuracy and runtime under different training minibatch sizes.
Energies 2024, 17, x FOR PEER REVIEW

Table 4. PD recognition results using 8 different methods.

Table 5. The precision, recall, and accuracy for 8 methods on the testing data.