1. Introduction
Transformers are among the most critical components in power systems, playing a vital role in the stable operation of the electrical grid, and they are commonly monitored through multi-dimensional condition-sensing approaches, including electrical, chemical, thermal, and acoustic voiceprint-based techniques. Incorrect or delayed fault diagnosis may lead to unexpected outages, accelerated insulation deterioration, and even catastrophic transformer failures, causing substantial repair and replacement costs, loss of revenue due to service interruption, and increased safety and environmental risks. As electrical networks grow in complexity, the consequences of such failures, including power outages, significant repair costs, and system instability, become more severe. Early fault detection and diagnosis are therefore paramount for ensuring the reliability and efficiency of power grids: effective diagnosis reduces downtime, prevents further damage, and improves the overall maintenance strategy of transformers, which directly contributes to grid security and operational continuity [1,2].
Transformers, particularly oil-immersed types, can develop faults over time, and these faults generate gases as a byproduct. These gases, typically hydrogen (H2), methane (CH4), ethane (C2H6), ethylene (C2H4), and acetylene (C2H2), dissolve in the transformer oil and provide valuable insight into the fault mechanisms. The type and concentration of these gases are closely related to specific fault types, such as partial discharge, overheating, or electrical arcing [3]. Monitoring the gases dissolved in transformer oil is therefore a crucial method for diagnosing faults.
Dissolved Gas Analysis (DGA) is a widely adopted technique used to detect and analyze the dissolved gases in transformer oil. Through DGA, it is possible to measure the concentration of specific gases and identify the fault type in a transformer. Several methods, including the IEEE standard and Rogers' ratio, have been proposed for interpreting DGA results and diagnosing transformer faults [4,5,6,7].
However, as fault patterns become more complex and subtle, traditional diagnostic methods may not offer sufficient accuracy. This has led to the development of advanced diagnostic tools based on artificial intelligence (AI) and machine learning (ML) techniques.
In recent years, many scholars have focused on the application of machine learning algorithms for fault classification using DGA data. Several studies have explored the use of Support Vector Machines (SVM), Artificial Neural Networks (ANN), Decision Trees (DT), and Random Forests (RF) for diagnosing transformer faults. These methods have shown varying degrees of success in classifying faults based on gas concentration data [8,9,10]. However, the performance of these models is highly dependent on the selection of features, the optimization of model parameters, and the handling of class imbalances in the data [11,12].
In the broader field of condition monitoring and fault diagnostics, statistical learning has also been applied to other types of condition indicators. For example, Niola et al. employed discriminant analysis to characterize and classify the vibrational behavior of a gas micro-turbine under different fuel conditions, highlighting the effectiveness of data-driven classification for machine monitoring [13]. In addition, recent surveys have discussed the opportunities and challenges of large models (e.g., large language/multimodal models) for machine monitoring and fault diagnostics, which may further promote more adaptive and interpretable diagnostic frameworks in the future [14].
To improve classification accuracy and robustness, hybrid models combining optimization algorithms with machine learning classifiers have been proposed. For example, Particle Swarm Optimization (PSO), Grey Wolf Optimization (GWO), and Artificial Bee Colony (ABC) algorithms have been integrated with SVM to optimize hyperparameters and enhance classification performance [15,16]. Despite these advancements, challenges such as entrapment in local minima and the need for more computationally efficient algorithms persist [17,18].
Furthermore, many existing models have shown limitations when applied to diverse datasets or in real-time fault detection scenarios [19].
This study introduces a hybrid optimization model combining PSO and GWO with SVM, referred to as PSO-GWO-SVM. This model aims to improve the accuracy of transformer fault classification by optimizing the parameters of SVM, thereby enhancing its ability to classify faults based on the dissolved gas concentration data. PSO is employed to explore the parameter space efficiently, while GWO enhances the global search capability, ensuring the optimal balance between exploration and exploitation. SVM is adopted as the base classifier due to its strong generalization capability and robustness on small-to-medium tabular datasets, which is suitable for DGA-based fault classification. Since the diagnostic performance of SVM is sensitive to hyperparameters, metaheuristic optimizers are introduced to perform data-driven hyperparameter tuning. The optimizers considered in this study (PSO, GWO, IGWO, and ISSA) are representative population-based methods that have been widely used for model parameter optimization in fault diagnosis applications.
The proposed PSO-GWO hybrid is motivated by a complementary exploration–exploitation mechanism: PSO’s velocity-driven updates help preserve population diversity and improve global exploration, whereas GWO’s elite-guided encircling strengthens local exploitation and refinement around promising regions. Combining these complementary behaviors is expected to reduce premature convergence and improve the stability of hyperparameter search. Based on this rationale, the manuscript evaluates PSO-SVM, GWO-SVM, IGWO-SVM, ISSA-SVM, and the hybrid PSO-GWO-SVM under the same evaluation protocol to verify the effectiveness of the proposed combination.
Preliminary experiments have demonstrated the superior performance of the PSO-GWO-SVM model in comparison to other hybrid models such as PSO-SVM, GWO-SVM, and IGWO-SVM.
This paper contributes to the field by proposing an advanced hybrid model for transformer fault diagnosis, addressing the challenges faced by traditional machine learning algorithms. The proposed approach not only optimizes the classification process but also provides a more robust and reliable tool for the early detection of transformer faults, which is critical for ensuring the security and stability of power systems.
To clearly summarize the objective and contributions of this work, the key points are as follows:
(1) A hybrid PSO-GWO strategy is proposed to tune SVM hyperparameters for DGA-based multi-class transformer fault diagnosis.
(2) The hybrid search is designed to balance exploration and exploitation, improving convergence stability and diagnostic accuracy.
(3) A strict evaluation protocol is adopted, in which hyperparameters are optimized via cross-validation on the training set and final performance is assessed on a held-out test set.
(4) Comparative experiments are conducted against representative single-optimizer baselines, with performance evaluated using accuracy and confusion-matrix analysis.
The remainder of this paper is organized as follows: Section 2 presents the proposed method, Section 3 describes the dataset and experimental settings, Section 4 reports the experimental results, and Section 5 concludes the paper and outlines future work.
2. PSO-GWO-SVM Hybrid Model
2.1. Principles of Particle Swarm Optimization
Particle Swarm Optimization is a population-based optimization algorithm proposed by Kennedy and Eberhart in 1995 [20]. PSO simulates the collective behavior of natural systems such as bird flocks and fish schools, where particles communicate and share information to achieve global optimization.
The position of each particle in the search space is represented as:

$$\mathbf{X}_i(t) = [x_{i1}(t), x_{i2}(t), \ldots, x_{iD}(t)],$$

where $D$ is the problem dimension, and $\mathbf{X}_i(t)$ is the position of particle $i$ at time $t$. The velocity update rule for each particle $i$ is given by:

$$\mathbf{v}_i(t+1) = w\,\mathbf{v}_i(t) + c_1 r_1 \bigl(\mathbf{p}_i(t) - \mathbf{X}_i(t)\bigr) + c_2 r_2 \bigl(\mathbf{g}(t) - \mathbf{X}_i(t)\bigr),$$

where $w$ is the inertia weight, controlling the balance between exploration and exploitation; $c_1$ and $c_2$ are the acceleration coefficients, determining the influence of the personal and global best positions; $r_1$ and $r_2$ are two random numbers typically drawn from the range $[0,1]$; $\mathbf{p}_i(t)$ is the personal best position of particle $i$; and $\mathbf{g}(t)$ is the global best position found by the swarm. The position update rule is given by:

$$\mathbf{X}_i(t+1) = \mathbf{X}_i(t) + \mathbf{v}_i(t+1).$$
In high-dimensional and complex constraint spaces, PSO may suffer from premature convergence to local optima, thus limiting its search performance. PSO has been widely applied in various fields, including function optimization, neural network training, data mining, and path planning [21,22,23].
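As a concrete illustration of these update rules, the following is a minimal PSO sketch in plain Python (the paper's experiments were implemented in MATLAB; the test function, bounds, and parameter values here are illustrative only, not the paper's settings):

```python
import random

def pso(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5,
        lo=-5.0, hi=5.0, seed=0):
    """Minimal PSO minimizer using the velocity/position updates of Section 2.1."""
    rng = random.Random(seed)
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    P = [x[:] for x in X]                      # personal best positions p_i
    p_val = [f(x) for x in X]                  # personal best fitness values
    g = min(range(n_particles), key=lambda i: p_val[i])
    G, g_val = P[g][:], p_val[g]               # global best position g and value
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # velocity update: inertia + cognitive + social terms
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (P[i][d] - X[i][d])
                           + c2 * r2 * (G[d] - X[i][d]))
                # position update, clipped to the search bounds
                X[i][d] = min(hi, max(lo, X[i][d] + V[i][d]))
            val = f(X[i])
            if val < p_val[i]:                 # refresh personal best
                P[i], p_val[i] = X[i][:], val
                if val < g_val:                # refresh global best
                    G, g_val = X[i][:], val
    return G, g_val
```

For example, `pso(lambda x: sum(v * v for v in x), dim=3)` drives the sphere function close to its minimum at the origin.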
2.2. Principles of Grey Wolf Optimization
Grey Wolf Optimization is a nature-inspired optimization algorithm introduced by Mirjalili et al. in 2014 [24]. It is based on the hunting behavior and social hierarchy of grey wolves, which have an organized social structure. In this algorithm, wolves are considered as search agents, and their movements are influenced by both their individual experiences and the experiences of other wolves in the pack. The social hierarchy of the grey wolf pack is shown in Figure 1.
The position of each wolf $i$ in the search space is represented as:

$$\mathbf{X}_i(t) = [x_{i1}(t), x_{i2}(t), \ldots, x_{iD}(t)],$$

where $D$ is the problem dimension, and $\mathbf{X}_i(t)$ is the position of the $i$-th wolf at time $t$. The mathematical formulation for updating the position of a wolf is given by:

$$\mathbf{D} = \left|\mathbf{C} \cdot \mathbf{X}_{\alpha}(t) - \mathbf{X}(t)\right|, \qquad \mathbf{X}(t+1) = \mathbf{X}_{\alpha}(t) - \mathbf{A} \cdot \mathbf{D},$$

where $\mathbf{D}$ denotes the (scaled) distance between the current wolf position $\mathbf{X}(t)$ and the alpha wolf position $\mathbf{X}_{\alpha}(t)$; $\mathbf{A}$ and $\mathbf{C}$ are coefficient vectors that control the exploration and exploitation balance and are updated dynamically based on random values and the current iteration number; and $\mathbf{X}_{\alpha}(t)$ is the position of the alpha wolf. The process of encircling, chasing, and attacking helps the wolves explore the search space and converge towards the global optimum.
Grey Wolf Optimization has been successfully applied to a wide range of optimization problems, including multi-objective optimization, feature selection, machine learning, and engineering design optimization [25,26,27].
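The encircling mechanism above can be sketched in plain Python as follows (an illustrative simplification, not the paper's implementation; the full algorithm averages guidance from the alpha, beta, and delta wolves, and all parameter values here are assumptions):

```python
import random

def gwo(f, dim, n_wolves=30, iters=200, lo=-5.0, hi=5.0, seed=0):
    """Minimal Grey Wolf Optimizer with alpha/beta/delta-guided encircling."""
    rng = random.Random(seed)
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    for t in range(iters):
        ranked = sorted(X, key=f)
        alpha, beta, delta = ranked[0], ranked[1], ranked[2]
        a = 2.0 * (1 - t / iters)                  # linearly decreasing from 2 to 0
        for i in range(n_wolves):
            new = []
            for d in range(dim):
                pos = 0.0
                for leader in (alpha, beta, delta):
                    A = 2 * a * rng.random() - a   # coefficient vector A
                    C = 2 * rng.random()           # coefficient vector C
                    D = abs(C * leader[d] - X[i][d])   # scaled distance to leader
                    pos += leader[d] - A * D           # encircling step
                new.append(min(hi, max(lo, pos / 3)))  # average of three guides
            X[i] = new
    best = min(X, key=f)
    return best, f(best)
```

Large |A| values (early iterations, large a) push wolves away from the leaders (exploration), while small |A| values pull them in (exploitation), which is the balance referred to in the text.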
2.3. Principles of Support Vector Machine
Support Vector Machine is a supervised machine learning algorithm primarily used for classification and regression tasks. It was first introduced by Cortes and Vapnik in 1995 [28]. SVM works by finding a hyperplane that best separates the data into distinct classes. In a two-dimensional space, this hyperplane is simply a line that divides the dataset into two classes. The goal of SVM is to find the optimal hyperplane that maximizes the margin between the two classes. The margin is defined as the distance between the hyperplane and the nearest data points from each class, called support vectors, as shown in Figure 2.
In higher dimensions, the separating boundary generalizes to a hyperplane. The objective of SVM is to find a hyperplane that best separates the classes while maximizing the margin. The decision function is given by:

$$f(\mathbf{x}) = \operatorname{sign}\left(\mathbf{w}^{\top}\mathbf{x} + b\right),$$

where $\mathbf{w}$ is the normal vector to the hyperplane, and $b$ is the bias term. The margin is:

$$\gamma = \frac{2}{\lVert \mathbf{w} \rVert},$$

and the objective is to maximize this margin, which is equivalent to minimizing $\tfrac{1}{2}\lVert \mathbf{w} \rVert^2$ subject to $y_i(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1$ for all training samples.
SVM can handle non-linearly separable data by using the kernel trick. The idea is to map the input data into a higher-dimensional feature space where a linear hyperplane can separate the data. The kernel function computes the inner product of the data points in the high-dimensional space without explicitly transforming the data, allowing SVM to construct complex, non-linear decision boundaries. Common kernels include the linear kernel:

$$K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^{\top}\mathbf{x}_j,$$

and the radial basis function (RBF) kernel:

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2}{2\sigma^2}\right).$$

This allows SVM to handle complex decision boundaries and improves its performance in non-linear classification tasks [23].
SVM is known for its ability to perform well with high-dimensional data and in cases where the number of dimensions exceeds the number of data points. However, it is sensitive to the choice of kernel function, regularization parameters, and other hyperparameters. Additionally, SVM training can be computationally expensive, especially for large datasets, and it can be sensitive to noisy data and outliers, which can influence the optimal hyperplane [24]. Despite these limitations, SVM has been widely applied in various fields such as image recognition, bioinformatics, text classification, and speech recognition. Recent studies have explored its applications and further refinements to improve its performance in these areas [29].
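To make the kernel functions concrete, the following plain-Python sketch evaluates the linear and RBF kernels defined above and the resulting kernelized decision value (illustrative only; the helper names, `sigma`, and the toy values are assumptions, not the paper's settings):

```python
import math

def linear_kernel(x, y):
    """Linear kernel: K(x, y) = x . y"""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, sigma=1.0):
    """RBF kernel: K(x, y) = exp(-||x - y||^2 / (2 sigma^2))"""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * sigma ** 2))

def decision(x, support, alphas_y, b, kernel):
    """Kernelized decision value f(x) = sum_i (alpha_i y_i) K(x_i, x) + b,
    computed over the support vectors only."""
    return sum(ay * kernel(s, x) for s, ay in zip(support, alphas_y)) + b
```

Note that `rbf_kernel(x, x)` is always 1 and the value decays with distance, which is why the RBF kernel yields smooth, localized decision boundaries.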
2.4. Multi-Model Hybrid Approach
A multi-model hybrid approach combines the strengths of different optimization algorithms to improve overall performance. In the case of GWO, PSO, and SVM, the hybrid model leverages the search capabilities of GWO and PSO to optimize the hyperparameters of the SVM, thereby enhancing classification accuracy. In this approach, GWO and PSO work together to find optimal values for the SVM hyperparameters, which are crucial for achieving good performance in classification tasks.
The flow of the hybrid model algorithm is shown in Figure 3.
By combining the search behaviors of GWO and PSO, the hybrid approach ensures a more thorough search of the hyperparameter space: PSO's velocity-driven updates preserve population diversity for global exploration, while GWO's elite-guided encircling refines the search around promising regions. This hybrid approach improves both the efficiency and accuracy of the resulting SVM model.
To avoid information leakage and ensure an unbiased evaluation, the dataset is first split into a training set (70%) and an independent test set (30%) using stratified sampling. The test set is strictly held out and is not involved in any stage of preprocessing parameter estimation, hyperparameter tuning, or model selection.
During the hybrid GWO-PSO optimization, the fitness of each candidate SVM parameter set is computed only on the training set using stratified 5-fold cross-validation. In each fold, the normalization (or other preprocessing) parameters are fitted on the fold-training subset and then applied to the fold-validation subset. The fitness is defined as the mean classification performance (e.g., accuracy) across the 5 folds.
After the optimization converges, the SVM is retrained on the full training set using the optimal hyperparameters. Finally, the trained model is evaluated once on the independent test set to report the final generalization performance.
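The leakage-free protocol described above can be sketched in plain Python (an illustrative skeleton, not the paper's MATLAB implementation; the min-max scaling, fold count, and the pluggable `train_and_score` callback are assumptions):

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; samples of each class are shuffled
    and distributed round-robin so every fold is class-stratified."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for f in range(k):
        val = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, val

def cv_fitness(X, y, k, train_and_score):
    """Mean validation accuracy across k folds; preprocessing (here min-max
    scaling) is fitted on each fold's training part only, so no information
    leaks from the validation part."""
    scores = []
    for tr, va in stratified_kfold(y, k):
        cols = list(zip(*[X[i] for i in tr]))
        lo = [min(c) for c in cols]
        hi = [max(c) for c in cols]
        def scale(row):
            return [(v - l) / (h - l) if h > l else 0.0
                    for v, l, h in zip(row, lo, hi)]
        Xtr = [scale(X[i]) for i in tr]; ytr = [y[i] for i in tr]
        Xva = [scale(X[i]) for i in va]; yva = [y[i] for i in va]
        scores.append(train_and_score(Xtr, ytr, Xva, yva))
    return sum(scores) / len(scores)
```

Here `train_and_score` would train an SVM with the candidate hyperparameters and return its validation accuracy; the optimizer treats `cv_fitness` as the objective, and the held-out test set is never seen inside this loop.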
Although PSO and GWO are both well-known optimizers, the proposed PSO-GWO hybrid is motivated by their complementary search behaviors. PSO maintains population diversity through velocity-driven updates, which helps preserve global exploration and reduces premature convergence. In contrast, GWO performs exploitation by encircling and refining the search around elite solutions. By combining these two roles in a cooperative manner, the hybrid strategy achieves a more balanced exploration–exploitation trade-off for SVM hyperparameter tuning, which in turn improves the stability of the optimization process and enhances the generalization performance on multi-class transformer fault diagnosis.
To improve reproducibility and methodological transparency, we summarize the key settings of the training protocol and the optimization algorithms in Table 1. The SVM classifier is implemented as an ECOC multi-class model with an RBF kernel. Hyperparameter tuning is performed in the log10 domain, i.e., $C = 10^{\theta_C}$ (BoxConstraint) and $\sigma = 10^{\theta_\sigma}$ (KernelScale), where $\theta_C$ and $\theta_\sigma$ are optimized by the corresponding metaheuristic algorithm. The objective function is the 5-fold cross-validation accuracy on the training set, while the test set is strictly held out and used only for the final evaluation.
4. Experimental Results
4.1. Accuracy Comparison Among Different Models
In this study, the classification performance of multiple models for gas data was evaluated using MATLAB R2024a (MathWorks, Natick, MA, USA), and the results were imported into Origin 2024 (OriginLab Corporation, Northampton, MA, USA) for visualization. To evaluate the effectiveness of the proposed hybrid optimization approach, the classification performance of five different models is compared, including GWO-SVM, IGWO-SVM, ISSA-SVM, PSO-SVM, and the proposed PSO-GWO-SVM.
All models are evaluated using the same dataset and experimental settings, including a stratified train–test split and 5-fold cross-validation. The overall classification accuracy on the test set is used as the primary metric for performance comparison. The results of the experiment are summarized in Table 4.
The specific classification results of each classification algorithm are shown in Figure 5. The results in Table 4 and Figure 5 are evaluated on the independent test set, while cross-validation is only used within the training set for hyperparameter optimization.
Figure 5 provides a sample-level visualization of the classification results. By comparing the predicted labels (red) with the ground-truth labels (blue) across the test samples, the figure directly shows where misclassifications occur and whether errors are concentrated in certain fault categories. This visualization complements the overall accuracy and confusion-matrix results by offering an intuitive interpretation of prediction consistency at the individual-sample level.
The PSO-GWO-SVM model achieved the highest test accuracy of 96.98%, outperforming PSO-SVM (93.57%), GWO-SVM (92.94%), IGWO-SVM (93.17%), and ISSA-SVM (91.03%) under the same evaluation protocol. In particular, the accuracy gains of PSO-GWO-SVM over these baselines are 3.41, 4.04, 3.81, and 5.95 percentage points, respectively. These improvements indicate that the proposed hybrid optimization provides a more effective hyperparameter search for SVM tuning than the compared single-optimizer baselines and improved variants.
To quantify variability caused by the stochastic nature of the metaheuristic optimization (random population initialization and random coefficients in PSO/GWO updates), the experiment was repeated 10 times under the same data split and experimental settings. The resulting accuracy is 97.24% ± 0.33% (mean ± standard deviation), and the 95% confidence interval for the mean accuracy across runs is [97.01%, 97.47%].
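An interval of this kind can be reproduced with the standard two-sided Student-t formula; a minimal sketch follows (assuming a t-based interval, with the critical value hard-coded rather than computed from a distribution table):

```python
import math

def mean_ci(values, t_crit):
    """Mean and confidence-interval half-width: mean +/- t_crit * s / sqrt(n),
    where s is the sample standard deviation (n - 1 denominator)."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return m, t_crit * s / math.sqrt(n)

# For n = 10 runs (df = 9), the two-sided 95% critical value is about 2.262,
# so a mean accuracy of 97.24 with standard deviation 0.33 gives a half-width
# of roughly 2.262 * 0.33 / sqrt(10), i.e. about 0.24 percentage points.
```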
While PSO-SVM achieved an accuracy of 93.57% and GWO-SVM reached 92.94%, the hybrid PSO-GWO-SVM significantly improved performance by leveraging the strengths of both algorithms. PSO is known for its excellent exploration capabilities, while GWO excels at balancing exploration and exploitation in optimization tasks [31,32]. The combination of these algorithms enables PSO-GWO-SVM to achieve faster convergence and a more effective search of the parameter space, resulting in superior classification accuracy [33,34]. Previous studies have demonstrated that hybrid optimization methods, like PSO-GWO, outperform single optimization algorithms, particularly in handling complex, high-dimensional datasets [35]. Thus, the PSO-GWO-SVM model provides a significant improvement in parameter optimization, making it highly effective for classification tasks.
4.2. Confusion Matrix Comparison
In this paper, the confusion matrix is used as the primary evaluation tool. A confusion matrix is a tabular representation of classification performance that summarizes the numbers of true positives, true negatives, false positives, and false negatives for each class, thereby providing detailed insight into class-wise prediction behavior beyond overall accuracy [30]. Based on the confusion matrix, performance metrics such as precision, recall, and F1-score can be further derived. The confusion matrix of each algorithm shows the prediction results across categories, including the numbers of correctly and incorrectly classified samples. The following is an analysis of the confusion matrices of the algorithms for the seven categories: normal, low energy discharge, high energy discharge, medium-low temperature overheat, high temperature overheat, medium temperature overheat, and low temperature overheat. The results of the confusion matrix evaluation are shown in Figure 6.
In addition to accuracy, macro-averaged precision, recall, and F1-score are reported to provide a more complete evaluation of multi-class diagnostic performance. These metrics are computed from the confusion matrices. Since the test set is class-balanced, macro-averaged and weighted-averaged results are identical. The detailed performance comparison of the five models on the test set is presented in Table 5.
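The macro-averaged metrics can be derived directly from a confusion matrix; the following plain-Python sketch shows the computation (illustrative only; the 2-class example in the test is not drawn from the paper's data):

```python
def macro_metrics(cm):
    """Macro-averaged precision, recall, and F1 from a square confusion
    matrix, where cm[i][j] counts samples of true class i predicted as j."""
    n = len(cm)
    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true other
        fn = sum(cm[c][r] for r in range(n)) - tp   # true c, predicted other
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p); recalls.append(r); f1s.append(f)
    return (sum(precisions) / n, sum(recalls) / n, sum(f1s) / n)
```

Each class contributes equally to the macro average regardless of its sample count, which is exactly why macro and weighted averages coincide on a class-balanced test set, as noted above.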
The results of the confusion matrix analysis reveal that the PSO-GWO-SVM model outperforms the other algorithms in terms of classification accuracy. The hybrid combination of Particle Swarm Optimization and Grey Wolf Optimization allows for improved parameter optimization, reducing misclassification rates across all categories. Compared to the other models, PSO-GWO-SVM demonstrated fewer misclassifications, especially for the Normal, T1/T2, and T3 classes, highlighting its robustness in handling complex classification tasks. The GWO-SVM and IGWO-SVM models also performed well, especially in the Normal and T3 categories, but they showed slightly higher misclassification rates in D1 and T1/T2 compared to PSO-GWO-SVM. ISSA-SVM performed reasonably well but had higher error rates in the D1 and T1/T2 classes, which impacted its overall performance. These findings are consistent with recent literature that emphasizes the effectiveness of hybrid optimization strategies [36,37].
The superior performance of the hybrid PSO-GWO-SVM is not only reflected by faster convergence, but also by the quality and robustness of the hyperparameter search. The objective landscape induced by cross-validation accuracy is generally non-convex and may contain multiple local optima. In this context, single-optimizer strategies may suffer from either insufficient exploration (prematurely converging to a suboptimal region) or insufficient exploitation (failing to refine around promising regions). The proposed hybrid strategy leverages the complementary behaviors of PSO and GWO: PSO’s velocity-driven update helps maintain population diversity and facilitates escaping local optima, while GWO’s elite-guided encircling mechanism strengthens exploitation and refines the search around high-quality candidate solutions. This cooperative exploration–exploitation balance increases the probability of locating better-performing hyperparameter configurations, leading to improved generalization on the held-out test set. As a result, the hybrid model yields more reliable decision boundaries for multi-class DGA fault discrimination, which is consistent with the observed improvements in accuracy and confusion-matrix performance compared with PSO-SVM, GWO-SVM, IGWO-SVM, and ISSA-SVM.
4.3. Convergence Speed Analysis
In this study, the optimization iteration curves are plotted using the mean 5-fold cross-validation accuracy on the training set, which serves as the fitness value during the optimization process. The independent test set is not used to monitor the optimization progress and is only employed for the final evaluation after the optimal hyperparameters are obtained, as shown in Figure 7.
The PSO-GWO-SVM algorithm demonstrates the best performance, achieving the highest classification accuracy most rapidly compared to the other models. The hybrid combination of Particle Swarm Optimization for global exploration and Grey Wolf Optimization for local exploitation allows the model to converge quickly to an optimal solution, reaching a high accuracy level early in the iterations. The PSO-SVM shows a steady improvement over iterations. The GWO-SVM also shows a gradual increase in accuracy, but it requires more iterations to approach the hybrid model, reflecting a slower convergence rate. The ISSA-SVM does not perform as well as the other models in terms of final classification accuracy. Finally, the IGWO-SVM shows similar behavior to GWO-SVM, improving over time but not reaching the performance of the hybrid model. These results suggest that hybrid optimization algorithms, such as PSO-GWO-SVM, are highly effective for transformer fault diagnosis, as they balance exploration and exploitation to find optimal solutions more efficiently than single optimization approaches.
A possible reason for the relatively slower convergence observed in some baseline optimizers is the exploration–exploitation imbalance and parameter sensitivity inherent to stochastic metaheuristics. For example, certain algorithms may maintain excessive exploration (slow refinement near promising regions) or become prematurely trapped in suboptimal regions if the population diversity decreases too quickly. In addition, the cross-validation accuracy used as the optimization objective can induce a non-smooth and multi-modal search landscape, making it difficult for some optimizers to consistently improve within limited iterations. By contrast, the proposed PSO-GWO hybrid benefits from complementary behaviors—diversity-preserving exploration and elite-guided exploitation—thereby improving convergence stability and efficiency.
The improvements in classification accuracy and convergence speed also have practical relevance for transformer condition monitoring and maintenance. First, higher diagnostic accuracy can reduce misclassification risk (including false alarms and missed detections), which supports more reliable maintenance prioritization and targeted inspections, especially when multiple units compete for limited maintenance resources. Second, faster and more stable hyperparameter tuning facilitates model updating when new DGA samples become available, making the approach more suitable for periodic re-training in an online/near-online monitoring pipeline and thereby improving early-warning responsiveness. Third, although the current study focuses on fault-type identification, the SVM decision scores (e.g., margins or class-wise confidence scores from the ECOC model) can be further utilized as a proxy for diagnostic confidence; combined with trend analysis of key gases, these scores can support practical alarm thresholding and preliminary fault severity discrimination (e.g., higher confidence and persistent abnormal trends may indicate higher urgency for maintenance actions).
5. Discussion
In this paper, various optimization algorithms integrated with Support Vector Machines are investigated for transformer fault diagnosis, including PSO-SVM, ISSA-SVM, GWO-SVM, IGWO-SVM, and a hybrid model, PSO-GWO-SVM. Through rigorous evaluation using accuracy comparisons, confusion matrix analysis, and optimization iteration curves, the results demonstrate that the proposed hybrid PSO-GWO-SVM algorithm achieves superior performance, obtaining the highest classification accuracy (96.98%) and the most efficient convergence behavior among the tested models.
The hybrid algorithm effectively leverages the global search capabilities of Particle Swarm Optimization and the local exploitation strengths of Grey Wolf Optimization, providing balanced and robust parameter optimization for the SVM.
The confusion matrix analysis further validated the superiority of the PSO-GWO-SVM model, exhibiting fewer misclassifications across different fault categories compared to single optimization-based models. Moreover, the optimization iteration curves confirmed that PSO-GWO-SVM quickly converges to optimal or near-optimal solutions, highlighting its practical applicability in transformer fault diagnosis. In practical substation monitoring scenarios involving multi-source condition perception, including transformer operational data and acoustic voiceprint-based detection, the proposed method can serve as a reliable decision-support component for fault trend analysis and abnormal state identification.
It is worth noting that the superiority of the hybrid PSO-GWO-SVM is not merely due to faster convergence. The hybrid search tends to retain broader exploration in early iterations while providing stronger refinement around elite regions in later iterations, thereby lowering the risk of being trapped in local optima. This complementary behavior offers a plausible explanation for the observed improvement in test accuracy over single-optimizer baselines.
Recent advances in fault diagnosis increasingly emphasize incorporating mechanism-related constraints into feature extraction and model design to improve physical consistency and interpretability. For example, Cheng et al. proposed CFFsBD, which explicitly exploits candidate fault frequencies to guide blind deconvolution for bearing fault feature enhancement [38], and further developed an improved envelope spectrum via a candidate fault frequency optimization-gram to adaptively select informative spectral bands, both demonstrating the benefit of embedding mechanism-related priors into the diagnostic pipeline. Inspired by these studies, future work on DGA-based transformer diagnosis may consider integrating mechanism-guided constraints (e.g., gas-generation signatures and physically meaningful ratio/consistency rules) into feature construction or learning objectives to further enhance interpretability [16,39].
Despite the improved accuracy and convergence behavior, several limitations should be acknowledged. First, the current evaluation is conducted on a dataset of limited scale, and real-world DGA data can be larger and class-imbalanced; therefore, future work will investigate scalability and imbalance-aware learning (e.g., cost-sensitive training, re-sampling strategies, and more extensive cross-site validation). Second, DGA patterns may be influenced by operating conditions (load, temperature, oil aging, moisture) and measurement variability, which may cause distribution shifts; future studies will consider robustness evaluation under varying operating regimes and potential domain adaptation/normalization strategies [40,41]. Third, while this paper focuses on offline diagnosis, practical deployment often requires online/continuous monitoring; future work will explore incremental model updating, drift detection, and computationally efficient implementation for near-real-time early warning.
In future work, deep learning can be incorporated mainly as a feature-learning module for DGA inputs (e.g., a lightweight MLP/1D-CNN to learn more discriminative representations), and the proposed PSO-GWO strategy can still be used for robust hyperparameter tuning of the subsequent classifier. Moreover, deep learning–based techniques such as class-balanced training and domain adaptation may help mitigate limitations related to data imbalance and operating-condition variability, and incremental updating can facilitate online monitoring.