Frequency-Based Density Estimation and Identification of Partial Discharges Signal in High-Voltage Generators via Gaussian Mixture Models

Romphuchaiyapruek, Krissana; Wattanawongpitak, Sarawut

doi:10.3390/eng6040064

by Krissana Romphuchaiyapruek * and Sarawut Wattanawongpitak *

Department of Electrical and Computer Engineering, Naresuan University, Phitsanulok 65000, Thailand

* Authors to whom correspondence should be addressed.

Eng 2025, 6(4), 64; https://doi.org/10.3390/eng6040064

Submission received: 10 February 2025 / Revised: 14 March 2025 / Accepted: 25 March 2025 / Published: 27 March 2025

(This article belongs to the Section Electrical and Electronic Engineering)

Abstract

Online monitoring of partial discharge (PD) is a complex task traditionally requiring specialized expertise. However, recent advancements in signal processing and machine learning have facilitated the development of automated tools to identify and categorize PD patterns, aiding those without extensive experience. This paper aims to identify PD types and estimate the density distribution of frequency characteristics for three PD types, internal PD, surface PD, and corona PD, using verified PD data. The proposed method employs a findpeaks algorithm based on Fast Fourier Transform (FFT) to extract frequency key features, denoted as f₁ and f₂, from the frequency spectrum. These features are used to estimate model parameters for each PD type, enabling the representation of their frequency density distributions in a 2D map (f₁, f₂) via Gaussian Mixture Models (GMMs). The optimal number of Gaussian components, determined as five using the Bayesian Information Criterion (BIC), ensures accurate modeling. For PD identification, log-likelihood and softmax functions are applied, achieving an evaluation accuracy of 96.68%. The model also demonstrates robust performance in identifying unknown PD data, with accuracy ranging from 78.10% to 95.11%. This approach enhances the distinction between PD types based on their frequency characteristics, providing a reliable tool for PD signal analysis and identification.

Keywords:

partial discharge; fast Fourier transform; density estimation; Gaussian mixture model; identification; high voltage generators

1. Introduction

In the power plant industry, high-voltage (HV) generators are critical components of electricity generation and face significant challenges, particularly in developing countries [1]. Power system failures, often caused by component malfunctions, human errors, or aging equipment, result in costly repairs, lost productivity, and dissatisfied customers [2]. Aging equipment and unforeseen failures in particular pose significant risks to power plant operations, leading to disruptions and financial losses. To mitigate these risks and ensure reliable power generation, effective monitoring of HV generators is crucial [3]. Key aspects monitored include electrical parameters such as voltage, current, and frequency; mechanical parameters such as vibration, temperature, and oil pressure; thermal parameters such as winding temperature, core temperature, and cooling system performance; and partial discharge activity [4].

Partial discharge (PD) is an electrical phenomenon that occurs within the insulation of HV electrical equipment, where a localized discharge only partially bridges the insulation between conductors [5]. Over time, PD processes gradually cause the insulation system to break down [6]. Various methods and systems can detect PD, based on electrical signals, acoustic emissions, optical emissions, and chemical byproducts [7]. Among them, electrical signals are commonly used to detect and locate the PD source. The overall condition of the insulation system can be evaluated by monitoring partial discharge statistics, such as the number of discharges, their amplitude or frequency distribution, and their repetition rate [8]. Trends in these statistics can indicate deterioration over time [9]. Electrical detection has thus become the most widely used online detection method [10,11], and it is the primary focus of this study.

The principal steps involved in online PD detection methods are signal acquisition and signal analysis [12]. The signal acquisition step is to convert the underlying physical PD phenomenon into an electrical signal that can be detected and analyzed. The process generally entails the utilization of a coupling capacitor or a high-frequency current transformer to capture high-frequency current pulses linked to PD events [13]. Following the initial signal acquisition, the PD signal is subjected to a sequence of processing stages, encompassing both hardware and software methodologies. These stages involve hardware-based preprocessing, signal digitization, the implementation of software-driven denoising algorithms, and the subsequent identification of the PD source via feature extraction and pattern recognition techniques [14]. It is clear that PD online assessment tools have been increasingly used recently [15,16,17]. One explanation for an increase in popularity is the improvement of identification capabilities [14]. Additionally, the evolution of digital signal processing methods has enhanced denoising techniques [18], enabled the extraction of relevant PD parameters [19], and allowed for more accurate diagnosis of insulation conditions [20].

The identification of PD sources and noise discrimination increasingly relies on the measurement and appropriate characterization of pulse waveforms [21,22]. This approach necessitates the use of high-frequency bands for signal acquisition, coupled with advanced signal processing techniques, to effectively extract the distinguishing features of both PD and noise signals [23]. This high-frequency pulse analysis has also yielded valuable insights into various approaches for analyzing pulse waveforms, based on signal energy ratios analysis [24], spectral density analysis [25], wavelet-based techniques [26], and time-frequency analysis [27]. Recent techniques, such as zero-crossing rate and fundamental frequency estimation for acoustic signals [28] and advanced ICEEMDAN-based noise isolation methods [29], have further improved PD identification accuracy in noisy industrial environments. While this method of source identification facilitates appropriate identification of the type of PD, industrial environments often present external noise or multiple PD sources, complicating pattern interpretation and necessitating a separation process before identification [30,31]. This separation is often achieved through clusters produced in various regions on a separation map, with each cluster representing a source of PD or electrical noise.

In particular, the use of machine learning (ML) techniques, such as Random Forest (RF) [32], support vector machines (SVMs) [33], and convolutional neural networks (CNNs) [34], has emerged as they can automatically extract and identify the complex patterns within the PD and noise signals. However, these ML models require a large and diverse dataset or effective statistical feature extraction [35]. Additionally, a key challenge is that PD data often exhibit overlapping clusters corresponding to various discharge types. The non-linearity and density overlap of these clusters can make traditional identifiers less effective [36]. Density-based approaches utilizing the Gaussian Mixture Model (GMM) have proven to be a more robust solution for handling the high dimensionality and complexity of signal data [37] as they can effectively model the non-linear and overlapping clusters within the electrical signal data [38,39,40]. Therefore, the objective of this research is to estimate the frequency density distribution for each type of PD observed in onsite industrial environments, extract key features, and develop models to characterize the patterns of each PD type. Furthermore, the study aims to calculate the maximum likelihood probability for each PD signal to evaluate the identification accuracy of the proposed models. By establishing these models, the research also seeks to enable the identification of unknown PD data. This approach intends to enhance the understanding and differentiation of PD types based on their unique frequency characteristics, contributing to more effective monitoring and diagnostics in industrial environments.

This paper proposes a frequency-based density estimation and identification method for PD signals in an HV generator. For frequency density estimation, the approach combines the FFT, the findpeaks algorithm, and the GMM. Initially, the FFT is applied to convert time-domain signals into the frequency domain, enabling the identification of prominent frequencies. Subsequently, the findpeaks algorithm is employed to extract the two most significant peaks in the frequency spectrum, denoted as f₁ (major frequency) and f₂ (minor frequency). These peaks represent dominant features of the signal and play a critical role in distinguishing between different PD types. In the GMM approach, the density of the frequencies extracted by the findpeaks algorithm is estimated and visualized in a 2D map (f₁, f₂) to model the structural patterns of each PD type. For PD identification, the maximum likelihood method is applied using the log-likelihood function and the softmax function to determine the most probable PD type. This analytical framework is designed to be adaptable for real-time frequency representation within data acquisition systems, offering practical utility for users in industrial applications.

The remainder of this paper is organized as follows: Section 2 describes the methodology, starting with the data acquisition stage, followed by preprocessing and feature extraction, parameter estimation and modeling, and the identification stage. Section 3 presents the results and discussion, including the analysis of frequency distribution of PD patterns using only f₁ and both f₁ and f₂, the performance evaluation of the PD identification model, and model validation using unknown data. Finally, Section 4 provides the conclusions from this study.

2. Methodology

This section outlines the improvements in the PD identification methods. The objective is reached with the help of a modified version of the clustering PD algorithm, which was proposed in reference [40]. The structured methodology comprises four main steps, as illustrated in Figure 1. In the first phase, PD signals are recorded using sensors during the data acquisition process. The second phase involves preprocessing and feature extraction of PD signals for analysis through the FFT and the findpeaks algorithm to identify the dominant frequencies of the PD signals. The third phase focuses on parameter estimation to model the frequency distribution of PD patterns for each type, based on verified data, using the GMM algorithm. Finally, in the fourth phase, the identification of PD types is performed using the developed model. This is achieved by applying the log-likelihood and the softmax function. Additionally, unknown PDs are tested using the PD model to assess their accuracy in identifying PD types.

2.1. Data Acquisition Stage

The primary components of the PD testing platform for high-voltage generators are described below. In this experiment, the unit under test is a gas turbine generator at an independent power producer in Thailand, generating between 24 and 58 MVA. It is connected to a voltage divider, which provides the 50 Hz voltage reference signal from one of the three phases. To measure the time-domain waveforms of the PD pulses, 1 nF coupling capacitors are used. These capacitors are designed to operate effectively over a frequency range of approximately 100 kHz to 30 MHz. PD pulses arising from generator insulation faults are transmitted via coaxial cables to the PD coupler terminal, which is the central connection point for the PD signals measured by the sensors in each phase. The signals are then passed to the data acquisition system, which includes the AQUILA portable PD analyzer responsible for the acquisition, storage, and analysis of PD pulse data, specifically time-domain pulse waveforms sampled at 100 MS/s. It employs an ultra-wide band (UWB) of 16 kHz–30 MHz. The generator under study uses six sensors (three pairs) to achieve noise rejection based on the arrival time of pulses at the two couplers of each pair. This method relies on PD pulses that originate within the generator insulation propagating differently from external noise or interference signals. The directional arrangement employs two sensors for each phase, with one sensor positioned near the phase terminal and the other located on the machine’s output bus, at a minimum distance of 2 m. When an electrical noise pulse enters the ring bus, it splits and travels in both directions. In a symmetric setup, the pulse reaches both ends simultaneously and cancels out in the differential circuit. If a PD occurs near one coupling point, its pulse arrives there earlier, creating a time difference. This time difference produces a net output from the differential circuit, indicating a real PD event [41]. Moreover, the analyzer software performs signal filtering, in which automatic interference elimination is applied based on statistical disturbance recognition by adjusting the time and frequency windows in the PD analyzer. The IEEE 1434-2000 standard describes this method [12]. For the PD signal obtained from one of the measurement terminals, the display unit shows the PRPD pattern, TF Map, PD signal, and pulse spectrum. The diagrams and circuit measurement equipment are shown in Figure 2 and Figure 3.

2.2. Preprocessing and Feature Extraction Stage

The data utilized in this research were collected from real case studies of HV generators experiencing issues. Three primary forms of PD were considered in this investigation: internal PD, surface PD, and corona PD. For surface PD (stress grading) and corona PD, the occurrence data were validated by opening the machinery to inspect the visible signs directly, as depicted in Figure 4. Internal PD is harder to verify because it occurs within the insulation, where no direct visual evidence is available. For this reason, the internal PD data used in this study were carefully reviewed by experts using PRPD and TF Map as verification tools. The process included categorizing data into PD and noise groups and comparing the results with standard PRPD patterns [12,13].

2.2.1. PD Signal Waveforms

PD signal waveforms from a real generator were recorded using a PD measuring apparatus. The length of a single recorded PD waveform in the time domain can be set to 100, 200, or 1000 points at a sampling rate of 100 MS/s, which corresponds to a pulse recording time between 1 μs and 10 μs. This flexibility allows the recording to accommodate pulse current waveforms of different lengths. It is imperative to choose an appropriate dataset size. While smaller datasets (<1000 signals) [42] may result in reduced reliability and increased variance in identification performance, larger datasets (>100,000 signals) [31] can improve accuracy but may also introduce computational challenges, such as increased processing time and data storage requirements. This study used 10,000 PD signal waveforms per PD type, ensuring equal representation of each type during model training. Data balancing prevents bias towards prevalent classes, reduces the risk of overfitting, and improves the model’s generalizability when applied to new data. An example of a captured PD signal is presented in Figure 5, with a pulse recording time of 2 μs.

2.2.2. Fast Fourier Transform (FFT)

The FFT is a widely used algorithm in signal processing for transforming time-domain signals into the frequency domain, allowing for easier analysis and preprocessing of data. It computes the discrete Fourier transform at a low processing cost [43]. As a first step, the acquired PD signals are converted into the frequency domain. For a time-series PD signal (x_n), the frequency spectrum (X_i) is defined by Equation (1) as the representation of the frequency components of the signal.

X_{i} = \sum_{n = 0}^{N - 1} x_{n} e^{- \frac{j 2 π i n}{N}}, i = 0, 1, \dots, N - 1

(1)

where N represents the length of the PD signal (x), i is the index of the frequency component, and j is the imaginary unit. The frequency spectrum (X_i, 0 ≤ i ≤ N − 1) describes the frequency content of x_n. In the frequency domain, Δf is the frequency spacing between X components (see Equation (2)), analogous to Δt, the interval between x samples in the time domain. Here, f_s is the sampling rate (samples/s).

Δ f = \frac{f_{s}}{N} = \frac{1}{N Δ t}

(2)
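As a quick numerical check of Equation (2), the following minimal sketch (Python with NumPy, not the authors' Matlab code) computes Δf and the one-sided frequency axis for the three record lengths mentioned in Section 2.2.1. The helper name `frequency_axis` is hypothetical and introduced only for illustration.

```python
import numpy as np

def frequency_axis(n_points, fs):
    """Return (delta_f, freqs) for an n_points record sampled at fs (Hz)."""
    delta_f = fs / n_points                          # Equation (2): delta_f = f_s / N
    freqs = np.fft.rfftfreq(n_points, d=1.0 / fs)    # one-sided bin frequencies, 0 .. f_s/2
    return delta_f, freqs

fs = 100e6  # 100 MS/s, the sampling rate stated for the acquisition system
for n in (100, 200, 1000):
    delta_f, freqs = frequency_axis(n, fs)
    print(f"N={n}: duration={n / fs * 1e6:.0f} us, "
          f"delta_f={delta_f / 1e6:.2f} MHz, bins={len(freqs)}")
```

Under these assumptions, a shorter record gives a coarser frequency grid, so the record length bounds how finely any spectral peak can be located without interpolation.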

2.2.3. Findpeaks Algorithm

The mean and standard deviation of the PD signal magnitude are used to help the findpeaks function locate local maxima of the signal in the frequency domain. This work uses Matlab 2018b, under an academic license provided by Naresuan University, and the Matlab function [peaks, location] = findpeaks (data), together with other proprietary algorithms, to locate points on the graph that satisfy a given threshold. The findpeaks function finds local maxima in a dataset and returns the magnitude and location of these peaks based on input parameters such as minimum peak distance and minimum peak height. This matters for automated local-maxima detection when peaks overlap and are not separated clearly enough to establish prominence. The resulting data matrix contains the peak locations and peak magnitudes, which are used to formulate the empirical model and also serve as input for the initial condition. Peaks are located using findpeaks as follows:

The signal obtained from the FFT is normalized so that the magnitude values lie within the range of 0 to 1.
The findpeaks function is used to detect peaks in the normalized spectrum, with the goal in this work being to identify the two highest peaks, referred to as the f₁ (major) and f₂ (minor) frequencies. Figure 6 illustrates an example of the results for determining f₁ and f₂ for each PD type.
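The two steps above can be sketched in Python with NumPy. This is an illustrative stand-in, not the authors' implementation: a simple local-maximum rule with a minimum height replaces Matlab's findpeaks, and the function name `two_dominant_frequencies`, the `min_height` value, and the synthetic two-tone test signal are all assumptions made for the example.

```python
import numpy as np

def two_dominant_frequencies(x, fs, min_height=0.1):
    """Return (f1, f2) in Hz: the two highest local maxima of the normalized spectrum."""
    spectrum = np.abs(np.fft.rfft(x))
    spectrum = spectrum / spectrum.max()             # step 1: scale magnitudes into [0, 1]
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    mid = spectrum[1:-1]
    # step 2: interior samples that exceed their left neighbour, are not below
    # their right neighbour, and reach the (assumed) minimum peak height
    is_peak = (mid > spectrum[:-2]) & (mid >= spectrum[2:]) & (mid >= min_height)
    idx = np.flatnonzero(is_peak) + 1
    if idx.size < 2:
        raise ValueError("fewer than two peaks above min_height")
    top = idx[np.argsort(spectrum[idx])[::-1][:2]]   # two tallest peaks, tallest first
    return freqs[top[0]], freqs[top[1]]

# Synthetic signal (hypothetical): a strong 2 MHz tone plus a weaker 7 MHz tone
fs = 100e6
n = 1000
t = np.arange(n) / fs
x = 1.0 * np.sin(2 * np.pi * 2e6 * t) + 0.5 * np.sin(2 * np.pi * 7e6 * t)
f1, f2 = two_dominant_frequencies(x, fs)
print(f"f1 = {f1 / 1e6:.2f} MHz, f2 = {f2 / 1e6:.2f} MHz")
```

In this sketch f₁ is the frequency of the tallest peak and f₂ that of the second tallest, matching the major/minor convention used in the text.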

2.3. Parameter Estimation and Modeling Stage

2.3.1. Gaussian Mixture Model (GMM)

The GMM is a statistical model that combines several weighted Gaussian components to represent the density of a random variable [44]. Using a set of Gaussian functions, each characterized by its own mean and covariance matrix, enhances the modeling capacity. Equation (3) gives the overall probability of x under the GMM: the probability of x under each Gaussian component is multiplied by the weight of that component, and the products are summed.

P (x | Θ) = \sum_{k = 1}^{K} w_{k} G (x | μ_{k}, Σ_{k})

(3)

where x denotes a data point from the set {x₁, x₂, …, x_N}, and N is the number of data points. K is the total number of Gaussian components. The model parameters are represented by the symbol Θ = {µ_k, Σ_k, w_k}. The mean of the kth Gaussian component is denoted by μ_k, and its covariance by Σ_k. w_k is the weight of the kth Gaussian component. Each weight lies within the range of 0 to 1 and indicates the importance of its component, and the weights sum to 1. In Equation (4), the function G(x|µ, Σ) represents the probability density of a Gaussian component.

G (x | μ, Σ) = \frac{1}{{(2 π)}^{D / 2} | Σ |^{1 / 2}} e^{- \frac{1}{2} {(x - μ)}^{T} Σ^{- 1} (x - μ)}

(4)

where D is the dimension of x. Σ is the covariance matrix, which describes the spread and shape of the component, that is, how tightly or loosely the data points are grouped. μ is the center of each Gaussian component, representing the “average” location of data points in that group.
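Equations (3) and (4) can be evaluated directly, as in the following minimal NumPy sketch. The function names and the two-component parameter values are hypothetical and chosen only for illustration (frequencies in MHz); they are not the fitted parameters reported in this paper.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Equation (4): multivariate Gaussian density at each row of x."""
    x = np.atleast_2d(x)
    d = x.shape[1]
    diff = x - mu
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu) for every row
    quad = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(sigma), diff)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm

def gmm_pdf(x, weights, means, covs):
    """Equation (3): weighted sum of the component densities."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, covs))

# Hypothetical two-component model on the (f1, f2) plane, in MHz
weights = [0.7, 0.3]
means = [np.array([1.3, 3.2]), np.array([0.8, 1.2])]
covs = [np.array([[0.01, 0.0], [0.0, 0.8]]),
        np.array([[0.05, 0.0], [0.0, 0.1]])]
density = gmm_pdf([1.3, 3.2], weights, means, covs)
print(density)
```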

The Expectation–Maximization (EM) algorithm is used to estimate parameters, improving the model’s fit to the data through iterative steps.

In the expectation step, the EM algorithm calculates the probability that each component generated each sample. Equation (5) gives the likelihood L(C_k|x_n) that the kth component C_k generated sample x_n.

L (C_{k} | x_{n}) = w_{k} G (x_{n} | μ_{k}, Σ_{k}) / (\sum_{l = 1}^{K} w_{l} G (x_{n} | μ_{l}, Σ_{l}))

(5)

In the maximization step, the EM algorithm refines the parameters of the Gaussian Mixture Model. Specifically, it updates the weight of each component as given in Equation (6), the mean as defined in Equation (7), and the covariance as outlined in Equation (8). These updates are based on the likelihoods computed during the expectation step, and they aim to maximize the overall log-likelihood of the observed data under the current model. This process helps the algorithm converge to parameter estimates that best represent the underlying data distribution.

w_{k} = \sum_{n = 1}^{N} L (C_{k} | x_{n}) / N

(6)

μ_{k} = \sum_{n = 1}^{N} L (C_{k} | x_{n}) \cdot x_{n} / (N \cdot w_{k})

(7)

Σ_{k} = \sum_{n = 1}^{N} L (C_{k} | x_{n}) [x_{n} - μ_{k}] {[x_{n} - μ_{k}]}^{T} / (N \cdot w_{k})

(8)

The EM algorithm is an iterative method that improves the parameter estimates by increasing the log-likelihood function, L(Θ) (see Equation (9)). The algorithm starts from initial parameter values and updates them repeatedly, using Equation (5) followed by Equations (6)–(8) for each component and evaluating Equation (9) after each update, until it converges [45]. The GMM algorithm for parameter estimation, which is used to represent the PD distribution, is given in Table 1, while Table 2 outlines the overall PD modeling process.

L (Θ) = \ln \prod_{n = 1}^{N} P (x_{n} | Θ) = \sum_{n = 1}^{N} \ln \{\sum_{k = 1}^{K} w_{k} G (x_{n} | μ_{k}, Σ_{k})\}

(9)

Table 1. The GMM algorithm for parameter estimation.

Input: The frequency values for each type of PD obtained using the findpeaks algorithm.
Output: μ = {μ_{1}, \dots, μ_{K}}, Σ = {Σ_{1}, \dots, Σ_{K}}, w = {w_{1}, \dots, w_{K}}, and BIC_K

1: for k = 1: K do
2: Initialize μ_{k}, Σ_{k}, w_{k} and set tol = 0.001.
3: while not converged do
4: Compute L (C_{k} | x_{n}) using Equation (5).
5: Compute μ_{k}, Σ_{k}, w_{k} using Equations (6)–(8).
6: Compute L(Θ) using Equation (9).
7: Check for convergence: the change in log-likelihood (L(Θ) − previous L(Θ)) < tol
8: end while
9: Compute BIC value using Equation (10).
10: k \leftarrow k + 1
11: end for
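The inner loop of Table 1 (steps 3 to 8) can be sketched as follows for a fixed number of components. This is a minimal NumPy illustration, not the authors' Matlab code. Several details are assumptions made for the example: the farthest-point initialization of the means, the isotropic initial covariance, the small 1e-6 regularization term, the function names, and the synthetic two-cluster data. The log-likelihood is evaluated at the start of each pass rather than at the end, which is equivalent for the convergence test.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Equation (4) evaluated at each row of x (N x D)."""
    d = x.shape[1]
    diff = x - mu
    quad = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(sigma), diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))

def fit_gmm(x, k, tol=1e-3, max_iter=200, seed=0):
    """Fit a k-component GMM with EM; returns (w, mu, sigma, log_likelihood)."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    # Assumed initialization: one random sample, then the farthest remaining samples
    mu = [x[rng.integers(n)]]
    for _ in range(k - 1):
        dist = np.min([np.sum((x - m) ** 2, axis=1) for m in mu], axis=0)
        mu.append(x[np.argmax(dist)])
    mu = np.array(mu)
    sigma = np.array([x.var(axis=0).mean() * np.eye(d) for _ in range(k)])
    w = np.full(k, 1.0 / k)
    prev = -np.inf
    for _ in range(max_iter):
        # Weighted component densities w_k * G(x_n | mu_k, Sigma_k)
        dens = np.stack([w[j] * gaussian_pdf(x, mu[j], sigma[j]) for j in range(k)], axis=1)
        total = dens.sum(axis=1, keepdims=True)
        loglik = float(np.log(total).sum())          # Equation (9)
        if loglik - prev < tol:                      # convergence check (step 7)
            break
        prev = loglik
        resp = dens / total                          # E-step, Equation (5)
        nk = resp.sum(axis=0)
        w = nk / n                                   # Equation (6)
        mu = resp.T @ x / nk[:, None]                # Equation (7)
        for j in range(k):                           # Equation (8), plus regularization
            diff = x - mu[j]
            sigma[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
    return w, mu, sigma, loglik

# Synthetic (f1, f2) data in MHz: two well-separated clusters (hypothetical values)
data_rng = np.random.default_rng(1)
data = np.vstack([data_rng.normal([1.0, 3.0], 0.5, size=(500, 2)),
                  data_rng.normal([12.0, 15.0], 0.5, size=(500, 2))])
w, mu, sigma, loglik = fit_gmm(data, k=2)
print(np.round(w, 2), np.round(mu, 2))
```

Working in MHz rather than Hz keeps the covariance entries at a moderate scale, which helps avoid numerical underflow in this simple implementation.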

Table 2. The process for modeling of PD.

Input: The internal, surface, and corona PD signals based on verified PD data
Output: The models of internal PD, surface PD, and corona PD

1: Select a PD type for model creation.
2: Apply the FFT to the PD signals to transform them into the frequency domain.
3: Use the findpeaks function to identify the two highest peaks (f₁, f₂).
4: Use the GMM algorithm in Table 1 for parameter estimation.
5: Develop a model using the parameters from Step 4 and incorporate them into Equation (3). For the visualization of the 2D model, 10,000 frequency data points were randomly generated. An example of the modeling of internal PD is illustrated in Figure 7.

Figure 7. The modeling of internal PD using the GMM with K = 1 (a); 2 (b); 3 (c); and 5 (d).

2.3.2. BIC

The Bayesian Information Criterion (BIC) [46] evaluates a model by balancing its goodness of fit against its complexity, as expressed in Equation (10). It penalizes models with more parameters in order to avoid overfitting. The lower the BIC score, the better the trade-off between accuracy and model simplicity [47]; thus, it is helpful when determining the optimal number of components in the GMM.

BIC = h \ln (N) - 2 \ln (\hat{L})

(10)

where \hat{L} is the maximum likelihood of the model, so that \ln (\hat{L}) is the converged value of L(Θ) in Equation (9), and N is the number of data points. h represents the number of free parameters in the model. For a GMM with K components, h includes the parameters needed to describe the means, covariances, and weights of each Gaussian component.
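A minimal sketch of Equation (10) in Python follows. The parameter count is an assumption, since the article does not spell out h: each component is taken to have a D-dimensional mean and a full symmetric D × D covariance matrix, and only K − 1 weights are counted as free because the weights sum to 1. The function names are hypothetical.

```python
import math

def gmm_free_parameters(k, d):
    """Assumed count of free parameters for a full-covariance GMM."""
    means = k * d                      # one D-dimensional mean per component
    covs = k * d * (d + 1) // 2        # symmetric D x D covariance per component
    weights = k - 1                    # weights sum to 1, so one is determined
    return means + covs + weights

def bic(log_likelihood, k, d, n):
    """Equation (10): BIC = h ln(N) - 2 ln(L_hat), with ln(L_hat) passed in directly."""
    return gmm_free_parameters(k, d) * math.log(n) - 2.0 * log_likelihood

# Parameter counts for the 2D (f1, f2) map and K = 1..5
for k in range(1, 6):
    print(k, gmm_free_parameters(k, d=2))
```

Under this counting, a five-component model on the (f₁, f₂) map has 29 free parameters. Model selection would then fit a GMM for each candidate K, pass each converged log-likelihood to `bic`, and keep the K with the lowest value.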

2.4. Identification Stage

For PD identification, our goal is to distinguish between different PD types by modeling the statistical distribution that underlies each type. Each PD type has unique signal characteristics, and the GMM helps capture these features by representing each PD type as a combination of several Gaussian distributions. Table 3 outlines the process used for identifying and evaluating PD signals. To identify a PD signal, we compute its log-likelihood under the GMM of each PD type and apply a softmax identifier [48] (see Equation (11)) to obtain the probability, P(c_j|x), that the signal corresponds to each PD type; the type with the highest probability is selected. Let C = {c₁, c₂, …, c_J} be the set of PD classes. The probability that sample x belongs to class c_j is expressed as follows:

P (c_{j} | x) = e^{L_{j}} / \sum_{i = 1}^{J} e^{L_{i}}

(11)

where L_j is the log-likelihood of sample x for class c_j, and J is the total number of PD classes, as demonstrated in the calculation example in Figure 8 and Table 4. The process for identification and evaluation of PD signals is given in Table 3.
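Equation (11) can be sketched in Python as follows. The log-likelihood values are hypothetical placeholders rather than results from this study, and the function name is an assumption. Subtracting the largest log-likelihood before exponentiating is a standard numerical safeguard that leaves the result of Equation (11) unchanged.

```python
import numpy as np

def class_posteriors(log_likelihoods):
    """Equation (11): softmax over the per-class log-likelihoods L_j."""
    l = np.asarray(log_likelihoods, dtype=float)
    shifted = l - l.max()              # avoids underflow; cancels in the ratio
    p = np.exp(shifted)
    return p / p.sum()

classes = ("internal", "surface", "corona")
log_l = [-28.4, -30.1, -95.7]          # hypothetical log-likelihoods of one PD signal
probs = class_posteriors(log_l)
predicted = classes[int(np.argmax(probs))]
print(dict(zip(classes, np.round(probs, 4))), predicted)
```

Because the softmax is monotonic, the predicted class is always the one with the largest log-likelihood; the softmax adds a normalized score that indicates how decisive that choice is.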

3. Results and Discussion

3.1. Analysis Frequency Distribution of PD Patterns with Only f₁ and Both f₁ and f₂

In Step 3 (see Table 2), we extracted features using the findpeaks function to identify the two highest peaks (f₁ and f₂) from 10,000 PD signal data for each of the three types. This method was used to differentiate the frequencies of the three PD types, with particular attention given to internal PD and surface PD, where the frequencies tend to overlap. The frequency results for each type, as determined by the findpeaks function, are shown in Figure 9 and Figure 10.

The frequency distribution of f₁ is illustrated in Figure 9. Internal PD had the least dispersion, with a significant accumulation of f₁ at approximately 1.34 MHz. In contrast, surface PD and corona PD exhibited greater dispersion, with mean f₁ values of 1.15 MHz and 12.80 MHz, respectively. These differing characteristics can be used to distinguish between the types of PD more accurately. For identifying the type of PD, the two highest peaks were selected, yielding f₁ and f₂, and these parameters were represented on a two-dimensional map. The results of this feature extraction are discussed in detail below.

In the frequency distribution shown in Figure 10, internal PD displayed an f₁ distribution between 0.1 and 4.8 MHz and an f₂ distribution between 1.0 and 5.0 MHz. Few data points were observed for f₁ between 2.25 and 4.80 MHz, while the majority of the distribution clustered densely around f₁ of 0.5 to 1.5 MHz and f₂ of 1.5 to 5.0 MHz, suggesting that internal PD is highly probable within this range. Surface PD showed a distribution pattern that differed from internal PD, taking on an elliptical shape. Its f₁ ranged from 0.1 to 4.2 MHz, while its f₂ exhibited a wider distribution from 0.8 to 8.8 MHz, which could be divided into two groups: one with f₂ from 0.8 to 5.0 MHz and the other from 5.0 to 8.8 MHz. This pattern likely represents a characteristic feature of surface PD. The frequency distribution of corona PD, on the other hand, was distinct from both internal PD and surface PD. Corona PD exhibited higher frequencies than the other two types, with f₁ from 11.0 to 13.0 MHz and f₂ from 10.0 to 21.0 MHz. The distribution was dense and concentrated, indicating that the corona PD signals analyzed were consistent, with frequencies recurring at the same points. Additionally, the distribution of f₁ was relatively narrow (low variance), while f₂ was noticeably more dispersed.

It was observed that the frequency distribution of all three types had some overlapping areas and some clearly distinguishable parts. Therefore, an analysis should be conducted to determine the model and how many groups the data should be divided into. To identify the optimal GMM using BIC, a range of component counts from one to five was tested. Component counts exceeding five were not assessed, as they caused the mixtures to become too small for reliable model fitting.

The BIC value for each type of PD is displayed in Figure 11. The experimental findings showed that, within the tested range, the BIC value for each type of PD was lowest with five components. This identifies the optimal GMM configuration, and the parameter estimation and modeling results for each PD type are presented next.

The GMM results presented in Table 5 and Figure 12 effectively characterize the distinct features of internal PD, surface PD, and corona PD by employing five Gaussian components (K = 5). These results demonstrate the ability of the model to capture and differentiate the unique frequency distribution patterns associated with each type of PD. Each component in the model was characterized by parameter estimation, including the mean frequency (μ_f₁, μ_f₂), covariances (Σ), and weights (w). For internal PD, Group 1 emerged as the most significant cluster, with a weight of 0.71 and mean frequencies of (1.34, 3.16) MHz. Additionally, the Σ for (f₁, f₂) was calculated as (0.01, 8.34) × 10¹¹. The high weight value of Group 1, combined with its distinct frequency distribution characteristics, indicates its dominance in the dataset. Specifically, the frequency distribution in Group 1 exhibited a very narrow spread in the f₁ range but a significantly broader spread in the f₂ range. These distinctive features can be effectively utilized to differentiate between various types of PD activities. Additionally, surface PD exhibited its own unique frequency distribution characteristics. The majority of the data distribution was concentrated in Group 1 and Group 2, with weights of 0.40 and 0.20, respectively. The mean frequencies for these groups were estimated at (1.15, 2.56) MHz and (0.81, 1.19) MHz. Furthermore, the Σ for both groups was found to be the smallest compared to other groups in surface PD. This indicates that the frequency structures of Group 1 and Group 2 in surface PD are relatively compact, with minimal frequency dispersion. These distinct features enhance the ability to identify and differentiate surface PD activities effectively.

In contrast, the corona PD category exhibited very high mean frequency values. For Group 1 and Group 2, the weights were 0.39 and 0.38, respectively, with mean frequencies estimated at (12.83, 10.45) MHz and (12.67, 19.82) MHz. The frequency structure of corona PD showed no overlap with the frequency ranges of internal PD and surface PD. This distinct separation highlights the unique characteristics of corona discharge, as the higher frequency range is a key indicator of corona PD activity [49]. Figure 12 illustrates the identification regions for each type of PD, demonstrating the capability of the proposed design to differentiate between PD types based on their unique pattern characteristics. The ellipses in each subplot represent the GMM components, with positions and shapes determined by the mean and covariance matrix of each group. Notably, corona PD (Figure 12c) exhibits clusters with higher centroid frequencies than internal and surface PDs, reflecting its characteristic high-frequency activity. The distinct properties of internal PD and surface PD, as shown in Figure 12a,b, are characterized by smaller covariance matrix values in Groups 1 and 2. These groups are particularly important for the identification process, as they provide key features that enable the differentiation between internal PD and surface PD. The smaller covariance values in these groups indicate more tightly clustered data points, highlighting the frequency distribution patterns specific to each PD type.

3.2. The Performance of PD Model for Identification

In this experiment, the PD dataset was split into 80% for training and 20% for testing, with a 5-fold cross-validation approach [50] applied. For each fold, one subject’s data was excluded for validation while the rest were used for training. Based on the analysis of the optimal number of components using the BIC in the previous section, a value of K = 5 was selected for the GMM to calculate the accuracy. Accuracy was evaluated for two cases: f₁ alone and f₁ combined with f₂. The performance of the algorithm was comprehensively evaluated using metrics such as accuracy, precision, recall, and confusion matrix values to identify the highest accuracy rate.
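The relationship between the 80%/20% split and 5-fold cross-validation can be sketched as follows: each of the five folds holds out one fifth of the data for testing and trains on the rest. This is a minimal pulse-level sketch with random shuffling; the grouping of data by subject mentioned above is not reproduced, and the function name `k_fold_indices` is hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold holds out about 1/k of the data."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(n_samples=100, k=5))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))  # 80 training and 20 test samples per fold
```

Every sample appears in exactly one test fold, so the accuracy averaged over the folds uses all of the data for evaluation.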

The performance of PD identification for each type using a single frequency (f₁ only) is presented in Table 6. Using f₁ alone, the model achieves an accuracy of 93.17%. Table 7 provides the performance metrics obtained using both f₁ and f₂. The GMM achieved high precision values across all classes: 100% for corona PD, 96.42% for internal PD, and 93.69% for surface PD, with recall values of 100%, 93.50%, and 96.53%, respectively. The overall accuracy is 96.68%. Combining f₁ and f₂ therefore identifies the PD type more accurately than f₁ alone.
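The reported metrics follow directly from the confusion matrix counts. The sketch below recomputes accuracy, recall, and precision from the counts in Table 7; the function name `summarize` is illustrative.

```python
LABELS = ["internal", "surface", "corona"]

# Rows are true labels, columns are predicted labels (counts from Table 7).
CONFUSION_F1_F2 = [
    [9350, 650, 0],
    [347, 9653, 0],
    [0, 0, 10000],
]


def summarize(confusion):
    """Return overall accuracy plus per-class recall and precision, in percent."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    # Recall: correct predictions divided by the row total (all true members).
    recall = [100.0 * confusion[i][i] / sum(confusion[i]) for i in range(n)]
    # Precision: correct predictions divided by the column total (all predicted).
    precision = [
        100.0 * confusion[j][j] / sum(confusion[i][j] for i in range(n))
        for j in range(n)
    ]
    return 100.0 * correct / total, recall, precision


accuracy, recall, precision = summarize(CONFUSION_F1_F2)
print(f"accuracy = {accuracy:.2f}%")
for name, r, p in zip(LABELS, recall, precision):
    print(f"{name}: recall = {r:.2f}%, precision = {p:.2f}%")
```

Replacing the matrix with the counts from Table 6 gives the corresponding figures for identification with f₁ alone.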

In this study, the frequency-domain features f₁ and f₂ were selected for their high discriminative power, as demonstrated by their ability to distinguish among internal, surface, and corona PD types. These frequency components are also straightforward to compute and suitable for real-time use through the FFT and the findpeaks algorithm, making them well suited to practical field implementations [51,52]. A preliminary evaluation of additional features, including time-domain statistics (peak amplitude, mean, variance) and waveform characteristics, indicates that the performance improvement is minimal compared to the increased computational load [53]. Some features, such as peak amplitude and variance, contribute even less because their values do not differ significantly between PD types and their distributions overlap heavily. Future studies should further investigate the integration of these diverse feature domains by exploring variable correlations [54] or dimensionality reduction [55] to potentially achieve improved classification accuracy and robustness.
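The feature extraction step can be sketched on a synthetic signal. The example below is a simplified stand-in, not the findpeaks routine used in the study: it computes a one-sided magnitude spectrum with a naive DFT and takes the tallest and second-tallest local maxima as f₁ and f₂. That ordering, the 10% relative height threshold, the sampling rate, and the test tones are all assumptions made for illustration.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """One-sided DFT magnitudes (naive O(N^2) transform, fine for short records)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def two_dominant_peaks(samples, sample_rate, min_rel_height=0.1):
    """Return (f1, f2): frequencies of the tallest and second-tallest spectral peaks."""
    mags = magnitude_spectrum(samples)
    floor = min_rel_height * max(mags)
    # A peak is a bin that exceeds both neighbours and the relative height floor.
    peaks = [
        k for k in range(1, len(mags) - 1)
        if mags[k] > mags[k - 1] and mags[k] > mags[k + 1] and mags[k] >= floor
    ]
    peaks.sort(key=lambda k: mags[k], reverse=True)
    if len(peaks) < 2:
        raise ValueError("fewer than two spectral peaks found")
    bin_width = sample_rate / len(samples)
    return peaks[0] * bin_width, peaks[1] * bin_width


# Synthetic signal: a strong 1 MHz tone plus a weaker 2 MHz tone, sampled at 16 MHz.
FS = 16e6
signal = [
    math.sin(2 * math.pi * 1e6 * i / FS) + 0.5 * math.sin(2 * math.pi * 2e6 * i / FS)
    for i in range(64)
]
f1, f2 = two_dominant_peaks(signal, FS)
print(f1 / 1e6, f2 / 1e6)  # peak frequencies in MHz
```

For measured pulses, an FFT routine would replace the naive transform, and the height threshold would need tuning to the noise floor of the acquisition system.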

Table 8 presents a comparative overview of various PD identification methods, highlighting differences in feature extraction techniques, classification algorithms, and their resultant accuracies. The approach proposed in this work, employing the FFT, peak identification with the findpeaks algorithm, and GMM combined with the softmax function, demonstrated robust performance, achieving accuracies between 93.17% and 96.68%. This result aligns well with previously published methods, reinforcing the viability of frequency-domain features for accurate PD identification.

Among laboratory-based studies, Reference [57] utilized the discrete wavelet transform (DWT) along with statistical parameters and achieved accuracies between 90% and 100%, indicating excellent discriminative capability on controlled laboratory data. Reference [56] demonstrated the effectiveness of K-means clustering for PD classification, achieving an accuracy of 88.9% using statistical and pulse-shape characteristics derived from the cumulative energy (CE) function. However, laboratory-based experiments, while beneficial for controlled analyses, may not fully replicate the complexity of on-site conditions.

On-site studies typically face additional challenges from noise and interference, making real-world accuracy particularly significant. Reference [58] applied frequency-domain analysis via the Welch method combined with a probabilistic neural network (PNN) and reached an accuracy of 92%, reflecting high reliability in practical environments. Similarly, Reference [32] employed pulse height analysis (PHA) coupled with a random forest (RF) classifier, reaching accuracy levels as high as 99% in hydro-generator contexts. Reference [59] achieved an accuracy between 88% and 94.8% by applying ANNs to sub-PRPD images and PD cloud shape quantifications, underlining the utility of advanced pattern recognition techniques in complex PD scenarios. Reference [60], using amplitude histograms and Self-Organizing Probability Maps (SOPMs), achieved a classification accuracy of approximately 90%, suggesting moderate effectiveness for large datasets from operational environments.

The methodology proposed in this research presents a balanced trade-off between computational simplicity and high identification accuracy, positioning it favorably among existing methods. Moreover, frequency-domain analysis combined with the GMM and softmax identification has demonstrated clear practical advantages, especially considering real-time and online monitoring applications.

3.3. Evaluation of the Model for Unknown Data

At this stage, the PD identification model was tested on four PD cases with unseen data to evaluate its performance.

The frequency distributions (f₁, f₂) of the four PD cases are shown in Figure 13. A total of 10,000 PD pulse waveforms per case were used for testing. The focus was on two models: internal PD and surface PD. Testing involved determining the probability of each pulse belonging to a specific PD type: each unidentified pulse was evaluated by comparing its probability values under the PD model of each type. If the calculated probability for a given pulse was less than 5%, the pulse was identified as an unknown PD type; otherwise, it was assigned to the PD type with the highest posterior probability. The test results, giving the number of PD signals in each of the four cases identified as a particular PD type, are presented in Table 9.
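The decision rule for unknown data can be sketched as a simple threshold followed by a tally of the outcomes per case. The text does not spell out how the per-pulse probability is computed, so the sketch takes the probabilities as given; the values below are hypothetical and serve only to exercise the rule.

```python
from collections import Counter


def identify(probabilities, threshold=0.05):
    """Return the most probable PD type, or 'unknown' if even the best is below threshold."""
    best_type = max(probabilities, key=probabilities.get)
    return best_type if probabilities[best_type] >= threshold else "unknown"


# Hypothetical per-pulse probabilities, made up purely for illustration.
pulses = [
    {"internal": 0.90, "surface": 0.10},
    {"internal": 0.30, "surface": 0.70},
    {"internal": 0.01, "surface": 0.02},
    {"internal": 0.60, "surface": 0.04},
]
counts = Counter(identify(p) for p in pulses)
for pd_type in ("internal", "surface", "unknown"):
    share = 100.0 * counts[pd_type] / len(pulses)
    print(f"{pd_type}: {counts[pd_type]} ({share:.2f}%)")
```

Applied to the 10,000 pulses of each case, a tally of this kind yields counts and percentages in the same layout as Table 9.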

The testing results demonstrated that cases 1 and 2 exhibited probabilities of being identified as internal PD at 78.10% and 80.95%, respectively. In contrast, cases 3 and 4 showed significantly higher probabilities of being identified as surface PD, at 94.46% and 95.11%, respectively. As illustrated in Figure 13, the distribution patterns of cases 1 and 2 closely aligned with the data used to construct the GMM identification model for internal PD. Similarly, cases 3 and 4 displayed patterns that closely matched the surface PD model. However, a notable portion of the data in cases 1 and 2, accounting for 18.83% and 9.23%, respectively, could not be confidently identified. This was attributed to their significant deviation from the mean frequency (µ_f) of PD models for each type, highlighting limitations in the model’s ability to identify data points that fall outside the established distribution patterns.

4. Conclusions

This work proposes a novel methodology for frequency density estimation of PD signals in HV generators and for their identification into three categories: internal PD, surface PD, and corona PD. The approach transforms time-domain PD signals into the frequency domain using the FFT, extracts the key frequencies with the findpeaks algorithm, and employs the GMM to estimate the model parameters and to model the frequency patterns of each PD category. The number of Gaussian components is chosen by selecting the model with the lowest BIC value. Experimental results show that optimal identification performance is achieved with five Gaussian components and that the distinct mean frequency values (highest for corona PD, followed by internal PD and then surface PD) demonstrate the method's ability to differentiate between PD types. Moreover, combining the major (f₁) and minor (f₂) frequency features improves identification accuracy from 93.17% with f₁ alone to 96.68% during model evaluation, and the model achieves between 78.10% and 95.11% accuracy when tested with unknown data. These findings not only validate the effectiveness of the frequency-domain approach but also provide a basis for further exploration. The performance and robustness of the proposed method make it well suited to practical field implementations. Future studies should investigate the integration of additional feature domains, such as time-domain statistics and waveform characteristics, to improve identification accuracy and extend the applicability of the model to additional PD characteristics.

Author Contributions

Conceptualization, K.R. and S.W.; Data curation, K.R.; Formal analysis, K.R. and S.W.; Funding acquisition, K.R.; Investigation, K.R. and S.W.; Methodology, K.R. and S.W.; Project administration, S.W.; Resources, K.R.; Software, K.R.; Supervision, S.W.; Validation, S.W.; Visualization, K.R.; Writing—original draft K.R.; Writing—review and editing, K.R. and S.W. All authors have read and agreed to the published version of the manuscript.

Funding

This study was financially supported by the Ministry of Science and Technology, Thailand. The authors sincerely appreciate their support.

Institutional Review Board Statement

Not applicable, as our study did not involve human or animal subjects.

Informed Consent Statement

Not applicable, as our study did not involve human subjects.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

Acknowledgments

The PD data acquisition test HV generators were provided by Bluetech Unique Engineering Co., Ltd.

Conflicts of Interest

All authors declare no conflicts of interest.

References

1. Ardeshiri, A.; Lotfi, A.; Behkam, R.; Moradzadeh, A.; Barzkar, A. Introduction and Literature Review of Power System Challenges and Issues. In Application of Machine Learning and Deep Learning Methods to Power System Problems; Nazari-Heris, M., Asadi, S., Mohammadi-Ivatloo, B., Abdar, M., Jebelli, H., Sadat-Mohammadi, M., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 19–43. ISBN 978-3-030-77696-1.
2. Darab, C.; Tarnovan, R.; Turcu, A.; Martineac, C. Artificial Intelligence Techniques for Fault Location and Detection in Distributed Generation Power Systems. In Proceedings of the 2019 8th International Conference on Modern Power Systems (MPS), Cluj Napoca, Romania, 21–23 May 2019; pp. 1–4.
3. Li, S.; Li, J. Condition Monitoring and Diagnosis of Power Equipment: Review and Prospective. High Volt. 2017, 2, 82–91.
4. Kande, M.; Isaksson, A.J.; Thottappillil, R.; Taylor, N. Rotating Electrical Machine Condition Monitoring Automation—A Review. Machines 2017, 5, 24.
5. Stone, G.C. Partial Discharge Diagnostics and Electrical Equipment Insulation Condition Assessment. IEEE Trans. Dielectr. Electr. Insul. 2005, 12, 891–904.
6. Fuhr, J.; Aschwanden, T. Identification and Localization of PD-Sources in Power-Transformers and Power-Generators. IEEE Trans. Dielectr. Electr. Insul. 2017, 24, 17–30.
7. Yaacob, M.M.; Alsaedi, M.A.; Rashed, J.R.; Dakhil, A.M.; Atyah, S.F. Review on Partial Discharge Detection Techniques Related to High Voltage Power Equipment Using Different Sensors. Photonic Sens. 2014, 4, 325–337.
8. Kunicki, M.; Cichoń, A.; Nagi, Ł. Statistics Based Method for Partial Discharge Identification in Oil Paper Insulation Systems. Electr. Power Syst. Res. 2018, 163, 559–571.
9. Montanari, G.C.; Cavallini, A. Partial Discharge Diagnostics: From Apparatus Monitoring to Smart Grid Assessment. IEEE Electr. Insul. Mag. 2013, 29, 8–17.
10. Luo, Y.; Li, Z.; Wang, H. A Review of Online Partial Discharge Measurement of Large Generators. Energies 2017, 10, 1694.
11. Hussain, M.R.; Refaat, S.S.; Abu-Rub, H. Overview and Partial Discharge Analysis of Power Transformers: A Literature Review. IEEE Access 2021, 9, 64587–64605.
12. IEEE 1434-2014; IEEE Guide for the Measurement of Partial Discharges in AC Electric Machinery. IEEE: New York, NY, USA, 2014.
13. IEC 60034-27-2; Rotating Electrical Machines-Part 27-2: On-Line Partial Discharge Measurements on the Stator Winding Insulation of Rotating Electrical Machines. International Electrotechnical Commission: Geneva, Switzerland, 2012.
14. Lee, S.B.; Stone, G.C.; Antonino-Daviu, J.; Gyftakis, K.N.; Strangas, E.G.; Maussion, P.; Platero, C.A. Condition Monitoring of Industrial Electric Machines: State of the Art and Future Challenges. IEEE Ind. Electron. Mag. 2020, 14, 158–167.
15. Wang, Y.; Yan, J.; Yang, Z.; Zhao, Y.; Liu, T. Optimizing GIS Partial Discharge Pattern Recognition in the Ubiquitous Power Internet of Things Context: A MixNet Deep Learning Model. Int. J. Electr. Power Energy Syst. 2021, 125, 106484.
16. Chang, C.-K.; Chang, H.-H.; Boyanapalli, B.K. Application of Pulse Sequence Partial Discharge Based Convolutional Neural Network in Pattern Recognition for Underground Cable Joints. IEEE Trans. Dielectr. Electr. Insul. 2022, 29, 1070–1078.
17. Ilkhechi, H.D.; Samimi, M.H. Applications of the Acoustic Method in Partial Discharge Measurement: A Review. IEEE Trans. Dielectr. Electr. Insul. 2021, 28, 42–51.
18. Chaudhuri, S.; Ghosh, S.; Dey, D.; Munshi, S.; Chatterjee, B.; Dalai, S. Denoising of Partial Discharge Signal Using a Hybrid Framework of Total Variation Denoising-Autoencoder. Measurement 2023, 223, 113674.
19. Florkowski, M. Anomaly Detection, Trend Evolution, and Feature Extraction in Partial Discharge Patterns. Energies 2021, 14, 3886.
20. Govindarajan, S.; Morales, A.; Ardila-Rey, J.A.; Purushothaman, N. A Review on Partial Discharge Diagnosis in Cables: Theory, Techniques, and Trends. Measurement 2023, 216, 112882.
21. Rostaghi-Chalaki, M.; Yousefpour, K.; Klüss, J.; Kurum, M.; Donohoe, J.P.; Park, C. Classification and Comparison of AC and DC Partial Discharges by Pulse Waveform Analysis. Int. J. Electr. Power Energy Syst. 2021, 125, 106518.
22. Long, J.; Xie, L.; Wang, X.; Zhang, J.; Lu, B.; Wei, C.; Dai, D.; Zhu, G.; Tian, M. A Comprehensive Review of Signal Processing and Machine Learning Technologies for UHF PD Detection and Diagnosis (II): Pattern Recognition Approaches. IEEE Access 2024, 12, 29850–29890.
23. Ardila-Rey, J.A.; Cerda-Luna, M.P.; Rozas-Valderrama, R.A.; de Castro, B.A.; Andreoli, A.L.; Muhammad-Sukki, F. Separation Techniques of Partial Discharges and Electrical Noise Sources: A Review of Recent Progress. IEEE Access 2020, 8, 199449–199461.
24. Mor, A.R.; Castro Heredia, L.C.; Muñoz, F.A. New Clustering Techniques Based on Current Peak Value, Charge and Energy Calculations for Separation of Partial Discharge Sources. IEEE Trans. Dielectr. Electr. Insul. 2017, 24, 340–348.
25. Fresno, J.M.; Ardila-Rey, J.A.; Martínez-Tarifa, J.M.; Robles, G. Partial Discharges and Noise Separation Using Spectral Power Ratios and Genetic Algorithms. IEEE Trans. Dielectr. Electr. Insul. 2017, 24, 31–38.
26. Kumar, C.; Ganguly, B.; Dey, D.; Chatterjee, S. Wavelet-Based Convolutional Neural Network for Denoising Partial Discharge Signals Extracted via Acoustic Emission Sensors. IEEE Sens. Lett. 2024, 8, 6007804.
27. Chan, J.C.; Ma, H.; Saha, T.K. Time-Frequency Sparsity Map on Automatic Partial Discharge Sources Separation for Power Transformer Condition Assessment. IEEE Trans. Dielectr. Electr. Insul. 2015, 22, 2271–2283.
28. Maresch, K.; Freitas-Gutierres, L.F.; Oliveira, A.L.; Borin, A.S.; Cardoso, G.; Damiani, J.S.; Morais, A.M.; Correa, C.H.; Martins, E.F. Advanced Diagnostic Approach for High-Voltage Insulators: Analyzing Partial Discharges through Zero-Crossing Rate and Fundamental Frequency Estimation of Acoustic Raw Data. Energies 2023, 16, 6033.
29. Thuc, V.C.; Lee, H.S. Partial Discharge (PD) Signal Detection and Isolation on High Voltage Equipment Using Improved Complete EEMD Method. Energies 2022, 15, 5819.
30. Ardila-Rey, J.A.; Schurch, R.; Poblete, N.M.; Govindarajan, S.; Muñoz, O.; de Castro, B.A. Separation of Partial Discharges Sources and Noise Based on the Temporal and Spectral Response of the Signals. IEEE Trans. Instrum. Meas. 2021, 70, 3526013.
31. Melo, J.V.J.; Lira, G.R.S.; Costa, E.G.; Vilar, P.B.; Andrade, F.L.M.; Marotti, A.C.F.; Costa, A.I.; Leite Neto, A.F.; Santos Júnior, A.C. dos. Separation and Classification of Partial Discharge Sources in Substations. Energies 2024, 17, 3804.
32. Pardauil, A.C.N.; Nascimento, T.P.; Siqueira, M.R.S.; Bezerra, U.H.; Oliveira, W.D. Combined Approach Using Clustering-Random Forest to Evaluate Partial Discharge Patterns in Hydro Generators. Energies 2020, 13, 5992.
33. Zhou, Y.; Liu, Y.; Wang, N.; Han, X.; Li, J. Partial Discharge Ultrasonic Signals Pattern Recognition in Transformer Using BSO-SVM Based on Microfiber Coupler Sensor. Measurement 2022, 201, 111737.
34. Florkowski, M. Classification of Partial Discharge Images Using Deep Convolutional Neural Networks. Energies 2020, 13, 5496.
35. Boppiniti, S.T. Big Data Meets Machine Learning: Strategies for Efficient Data Processing and Analysis in Large Datasets. Int. J. Creat. Res. Comput. Technol. Des. 2020, 2. Available online: https://jrctd.in/index.php/IJRCTD/article/view/68 (accessed on 9 February 2025).
36. Mantach, S. Supervised and Unsupervised Deep Learning Models for Partial Discharge Source Detection and Classification in Electrical Insulation. Ph.D. Thesis, University of Manitoba, Winnipeg, MB, Canada, 2023.
37. Miraftabzadeh, S.M.; Colombo, C.G.; Longo, M.; Foiadelli, F. K-Means and Alternative Clustering Methods in Modern Power Systems. IEEE Access 2023, 11, 119596–119633.
38. Ma, Y.; Hao, Y. Antenna Classification Using Gaussian Mixture Models (GMM) and Machine Learning. IEEE Open J. Antennas Propag. 2020, 1, 320–328.
39. Mas’ud, A.A.; Sundaram, A.; Ardila-Rey, J.A.; Schurch, R.; Muhammad-Sukki, F.; Bani, N.A. Application of the Gaussian Mixture Model to Classify Stages of Electrical Tree Growth in Epoxy Resin. Sensors 2021, 21, 2562.
40. Romphuchaiyapruek, K.; Wattanawongpitak, S. Application of the Gaussian Mixture Model for Clustering Partial Discharge Signals in High Voltage Generators. In Proceedings of the 2023 International Conference on Power, Energy and Innovations (ICPEI), Phrachuap Khirikhan, Thailand, 18–20 October 2023; pp. 31–35.
41. Stone, G.C.; Sasic, M. Twenty-Five Years of Experience With On-Line Partial Discharge Testing of Stator Windings. In Proceedings of the 12th International Conference on Electrical Insulation, Birmingham, UK, 29–31 May 2013; pp. 29–31.
42. Pattanadech, N.; Nimsanong, P. Effect of Training Methods on the Accuracy of PCA-KNN Partial Discharge Classification Model. In Proceedings of the TENCON 2014—2014 IEEE Region 10 Conference, Bangkok, Thailand, 22–25 October 2014; pp. 1–5.
43. Krishna, H. Digital Signal Processing Algorithms: Number Theory, Convolution, Fast Fourier Transforms, and Applications; Routledge: London, UK, 2017; ISBN 978-1-351-45496-4.
44. McNicholas, P.D. Mixture Model-Based Classification; CRC Press: Boca Raton, FL, USA, 2016; ISBN 978-1-4822-2567-9.
45. Reynolds, D.A. Gaussian Mixture Models. Encycl. Biom. 2009, 741, 3.
46. Neath, A.A.; Cavanaugh, J.E. The Bayesian Information Criterion: Background, Derivation, and Applications. WIREs Comput. Stat. 2012, 4, 199–203.
47. Zhang, J.; Yang, Y.; Ding, J. Information Criteria for Model Selection. WIREs Comput. Stat. 2023, 15, e1607.
48. Franke, M.; Degen, J. The Softmax Function: Properties, Motivation, and Interpretation. 2023. Available online: https://osf.io/preprints/psyarxiv/vsw47_v1 (accessed on 9 February 2025).
49. Javandel, V.; Akbari, A.; Ardebili, M.; Werle, P. Simulation of Negative and Positive Corona Discharges in Air for Investigation of Electromagnetic Waves Propagation. IEEE Trans. Plasma Sci. 2022, 50, 3169–3177.
50. Alpaydin, E. Introduction to Machine Learning, 4th ed.; MIT Press: Cambridge, MA, USA, 2020; ISBN 978-0-262-04379-3.
51. Kunicki, M.; Cichoń, A.; Borucki, S. Measurements on Partial Discharge in On-Site Operating Power Transformer: A Case Study. IET Gener. Transm. Distrib. 2018, 12, 2487–2495.
52. Kim, J.; Kim, K.-I. Partial Discharge Online Detection for Long-Term Operational Sustainability of On-Site Low Voltage Distribution Network Using CNN Transfer Learning. Sustainability 2021, 13, 4692.
53. Liu, H.; Xiang, M.X.; Zhou, B.; Zhu, L.; Duan, Y.; Zhang, X. Partial Discharge Detection Method for Distribution Network Based on Feature Engineering. J. Phys. Conf. Ser. 2023, 2456, 012048.
54. Yuan, X.; Wang, Y.; Wang, C.; Ye, L.; Wang, K.; Wang, Y.; Yang, C.; Gui, W.; Shen, F. Variable Correlation Analysis-Based Convolutional Neural Network for Far Topological Feature Extraction and Industrial Predictive Modeling. IEEE Trans. Instrum. Meas. 2024, 73, 3001110.
55. Jia, W.; Sun, M.; Lian, J.; Hou, S. Feature Dimensionality Reduction: A Review. Complex Intell. Syst. 2022, 8, 2663–2693.
56. Hassan, W.; Mahmood, F.; Hussain, G.A.; Amin, S.; Kay, J.A. Feature Extraction of Partial Discharges During Multiple Simultaneous Defects in Low-Voltage Electric Machines. IEEE Trans. Instrum. Meas. 2021, 70, 3523410.
57. Kumar, H.; Shafiq, M.; Kauhaniemi, K. Performance Evaluation of AI-Based Algorithms for Condition Assessment of Power Components. In Proceedings of the 2022 9th International Conference on Condition Monitoring and Diagnosis (CMD), Kitakyushu, Japan, 13–18 November 2022; pp. 231–236.
58. Boczar, T.; Borucki, S.; Jancarczyk, D.; Bernas, M.; Kurtasz, P. Application of Selected Machine Learning Techniques for Identification of Basic Classes of Partial Discharges Occurring in Paper-Oil Insulation Measured by Acoustic Emission Technique. Energies 2022, 15, 5013.
59. Araújo, R.C.F.; de Oliveira, R.M.S.; Barros, F.J.B. Automatic PRPD Image Recognition of Multiple Simultaneous Partial Discharge Sources in On-Line Hydro-Generator Stator Bars. Energies 2022, 15, 326.
60. de Oliveira, R.M.S.; Fernandes, F.C.; Barros, F.J.B. Novel Self-Organizing Probability Maps Applied to Classification of Concurrent Partial Discharges from Online Hydro-Generators. Energies 2024, 17, 2208.

Figure 1. Overall partial discharge identification flowchart.

Figure 2. The diagrams of PD measurement.

Figure 3. The devices in the testing facility: (a) gas turbine generators; (b) PD coupler terminal; (c) data acquisition unit; and (d) computer display unit.

Figure 4. Evidence of PD occurrences in the tested equipment, verified by opening and inspecting it: (a) surface PD (stress grading) and (b) corona PD.

Figure 5. Example of the PD signal in the time domain for (a) internal PD; (b) surface PD; and (c) corona PD.

Figure 6. The frequency (f₁ and f₂) feature extraction with the findpeaks algorithm for (a) internal PD, (b) surface PD, and (c) corona PD.

Figure 8. Example of the frequency being tested at (1, 2) MHz for (a) internal model; (b) surface model; and (c) corona model.

Figure 9. The histogram of f₁ for each PD type.

Figure 10. The frequency (f₁, f₂) distribution of (a) internal PD; (b) surface PD; and (c) corona PD.

Figure 11. The BIC value of each PD type.

Figure 12. The modeling of (a) internal PD; (b) surface PD; and (c) corona PD using the GMM.

Figure 13. The frequency distribution of four PD cases: (a) case 1; (b) case 2; (c) case 3; (d) case 4.

Table 3. Process for identification and evaluation of PD signals.

Input: 1. The models of internal PD, surface PD, and corona PD.
2. PD signals represented as frequencies (f₁ only, or both f₁ and f₂).
Output: The accuracy, precision, recall, and confusion matrix values.

1: The frequency value (f₁ only, or both f₁ and f₂) is used to calculate the maximum likelihood using Equation (9) and the value of the softmax function using Equation (11) for each PD model, as illustrated in Figure 8.
2: The PD type is identified by comparing all three models; the type whose softmax value exceeds 50% is selected, as shown in Table 4.
3: Evaluate the accuracy, precision, recall, and confusion matrix values.

Table 4. Example of the calculation of log-likelihood, softmax, and identification type.

	Freq For Testing (f₁, f₂) MHz	Value of Log-Likelihood Function (L)	Value of Softmax Function	Identification Type with Softmax > 50%
Internal model	(1, 2)	−31.8521	11.99%
Surface model		−29.8588	88.01%	✓
Corona model		NaN	0.00%
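The softmax values in Table 4 can be reproduced from the listed log-likelihoods. The sketch below assumes that the NaN entry for the corona model corresponds to a log-likelihood of negative infinity, that is, the corona model assigns effectively zero density to the test point; the function name is illustrative.

```python
import math


def softmax_from_loglik(log_likelihoods):
    """Numerically stable softmax; -inf entries get exactly zero probability."""
    peak = max(log_likelihoods.values())
    exps = {name: math.exp(ll - peak) for name, ll in log_likelihoods.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


# Log-likelihoods at (1, 2) MHz from Table 4.
# Assumption: the corona entry (NaN in the table) is treated as -inf.
loglik = {"internal": -31.8521, "surface": -29.8588, "corona": -math.inf}
posterior = softmax_from_loglik(loglik)

# Identification rule from Table 3: accept the type whose softmax value exceeds 50%.
winner = max(posterior, key=posterior.get)
identified = winner if posterior[winner] > 0.5 else None

for name, p in posterior.items():
    print(f"{name}: {100 * p:.2f}%")
print("identified as:", identified)
```

This prints 11.99% for the internal model, 88.01% for the surface model, and 0.00% for the corona model, so the test point is identified as surface PD, in agreement with Table 4.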

Table 5. The values of mean, covariance, and weight from the GMM of internal, surface, and corona PD with K = 5.

Group	Internal PD			Surface PD			Corona PD
	(µ_f₁, µ_f₂) (MHz)	Σ	w	(µ_f₁, µ_f₂) (MHz)	Σ	w	(µ_f₁, µ_f₂) (MHz)	Σ	w
1	(1.34, 3.16)	$\begin{bmatrix} 0.01 & 0.03 \\ 0.03 & 8.34 \end{bmatrix} \times 10^{11}$	0.71	(1.15, 2.56)	$\begin{bmatrix} 0.27 & -0.03 \\ -0.03 & 2.30 \end{bmatrix} \times 10^{9}$	0.40	(12.83, 10.45)	$\begin{bmatrix} 0.10 & 0.65 \\ 0.65 & 4.19 \end{bmatrix} \times 10^{11}$	0.39
2	(0.88, 1.31)	$\begin{bmatrix} 4.20 & -0.39 \\ -0.39 & 0.61 \end{bmatrix} \times 10^{9}$	0.16	(0.81, 1.19)	$\begin{bmatrix} 1.15 & -0.26 \\ -0.26 & 0.49 \end{bmatrix} \times 10^{9}$	0.20	(12.67, 19.82)	$\begin{bmatrix} 0.02 & -0.18 \\ -0.18 & 1.78 \end{bmatrix} \times 10^{11}$	0.38
3	(1.08, 3.72)	$\begin{bmatrix} 0.12 & 0.09 \\ 0.09 & 1.53 \end{bmatrix} \times 10^{12}$	0.10	(0.98, 2.11)	$\begin{bmatrix} 1.71 & 2.35 \\ 2.35 & 6.69 \end{bmatrix} \times 10^{11}$	0.17	(12.51, 14.64)	$\begin{bmatrix} 0.23 & 0.38 \\ 0.38 & 5.29 \end{bmatrix} \times 10^{8}$	0.20
4	(2.49, 4.28)	$\begin{bmatrix} 0.16 & 0.26 \\ 0.26 & 1.16 \end{bmatrix} \times 10^{11}$	0.02	(1.02, 3.79)	$\begin{bmatrix} 0.38 & -0.87 \\ -0.87 & 2.12 \end{bmatrix} \times 10^{11}$	0.12	(11.00, 12.85)	$\begin{bmatrix} 3.28 & 0.60 \\ 0.60 & 0.36 \end{bmatrix} \times 10^{10}$	0.02
5	(3.83, 4.76)	$\begin{bmatrix} 2.85 & 0.05 \\ 0.05 & 0.45 \end{bmatrix} \times 10^{10}$	0.01	(1.39, 6.98)	$\begin{bmatrix} 0.03 & 0.17 \\ 0.17 & 4.38 \end{bmatrix} \times 10^{12}$	0.11	(12.86, 16.67)	$\begin{bmatrix} 0.02 & -0.28 \\ -0.28 & 8.83 \end{bmatrix} \times 10^{12}$	0.01

Table 6. Confusion matrix and performance metrics for PD signal identification using only f₁.

Predicted Label
True Label		Internal PD	Surface PD	Corona PD	Recall
	Internal PD	8771	1229	0	87.71%
	Surface PD	820	9180	0	91.80%
	Corona PD	0	0	10,000	100.00%
	Precision	91.45%	88.20%	100.00%
Accuracy = 93.17%

Table 7. Confusion matrix and performance metrics for PD signal identification using f₁ and f₂.

Predicted Label
True Label		Internal PD	Surface PD	Corona PD	Recall
	Internal PD	9350	650	0	93.50%
	Surface PD	347	9653	0	96.53%
	Corona PD	0	0	10,000	100.00%
	Precision	96.42%	93.69%	100.00%
Accuracy = 96.68%

Table 8. Summary of the performance of the identification methods.

Reference	Test Database	Feature Extraction	PD Identification/Classification/Clustering	Accuracy
Hassan et al., 2021 [56]	Laboratory	Statistical and pulse shape characteristics of cumulative energy (CE) function	K-means clustering algorithm	88.90%
Kumar et al., 2022 [57]	Laboratory	Discrete wavelet transform (DWT) and statistical parameters	SVM, KNN	90–100%
Boczar et al., 2022 [58]	On-site (Oil Power Transformers)	Frequency domain, using Welch method	Probabilistic neural network (PNN)	92%
Pardauil et al., 2020 [32]	On-site (Hydro generators)	Pulse Height Analysis (PHA)	Random Forest (RF) with various clustering algorithms	94–99%
Araújo et al., 2022 [59]	On-site (Hydro generators)	sub-PRPDs, variables of valid PD, and quantify the shape of PD clouds	Artificial Neural Networks (ANNs) for image recognition	88–94.80%
de Oliveira et al., 2024 [60]	On-site (Hydro generators)	Amplitude histograms, sample-neuron distances in the space of features	Self-Organizing Probability Maps (SOPMs)	90%
This work	On-site (Gas turbine generators)	Fast Fourier Transform (FFT), findpeaks algorithm	Gaussian Mixture Model (GMM) and softmax function	93.17–96.68%

Table 9. Identification of four cases with unknown PD data.

	Assessment Predicted (%)
	Internal PD	Surface PD	Unknown PD
Case 1	7810 (78.10%)	307 (3.07%)	1883 (18.83%)
Case 2	8095 (80.95%)	982 (9.82%)	923 (9.23%)
Case 3	234 (2.34%)	9446 (94.46%)	320 (3.20%)
Case 4	153 (1.53%)	9511 (95.11%)	336 (3.36%)


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
