1. Introduction
Power transformers play a central role in modern power systems, and their reliable operation is vital for system stability, safety, and economic efficiency. Unexpected failures can cause widespread outages, high repair and replacement costs, and long service interruptions, which underlines the need for timely and accurate fault diagnosis within condition monitoring and asset management schemes [1,2]. Among the available diagnostic tools, dissolved gas analysis (DGA) is the most commonly used method for detecting incipient faults in oil-immersed transformers. In this approach, fault-related thermal and electrical stresses generate characteristic gas patterns that are interpreted using gas concentration ratios. Traditional DGA methods such as the Rogers ratio, IEC ratio, and Duval triangle are widely employed because they are straightforward to apply and easy to interpret [3]. However, these rule-based schemes rely on fixed decision boundaries and can suffer from overlapping fault regions, sensitivity to noise [4], and limited adaptability to mixed or evolving fault conditions, which may reduce diagnostic accuracy under complex operating environments [5,6].
Model-based and signal-processing-based methods have been widely studied alongside data-driven approaches for fault detection and diagnosis in power electronic and energy conversion systems. Model-based techniques rely on mathematical representations of system dynamics and residual analysis to identify abnormal behavior, whereas signal-processing-based approaches extract fault-related features from measured electrical signals. These methods offer valuable physical insight and can provide theoretical performance guarantees; however, their effectiveness in practice may be constrained by modeling uncertainties, parameter variations [7], and complex operating conditions. Recent studies have underscored the challenges of ensuring robustness and managing uncertainty in model-based fault detection frameworks, and have also highlighted growing interest in hybrid model–data-driven diagnostic schemes aimed at improving adaptability and reliability under real-world conditions [7,8,9].
To overcome the limitations of traditional DGA-based diagnostic schemes, data-driven and optimization-based methods have attracted growing interest in recent years. Machine learning models such as Support Vector Machines (SVM), Extreme Learning Machines (ELM), Kernel Extreme Learning Machines (KELM), and Artificial Neural Networks (ANN) have shown higher diagnostic accuracy than conventional DGA ratio schemes [10,11,12,13]. Their performance, however, depends strongly on appropriate parameter tuning, feature selection, and decision boundary design, which has encouraged the use of metaheuristic optimization [14,15]. Population-based optimizers, including Particle Swarm Optimization (PSO) and Genetic Algorithms (GA), are well-suited to handling nonlinear and constrained optimization problems [16]. Their complementary exploration and exploitation behaviors have motivated a range of hybrid and multi-objective formulations aimed at improving convergence reliability, robustness, and computational efficiency [17,18]. For transformer fault diagnosis, several metaheuristic-assisted frameworks based on optimizers such as the grey wolf optimizer, ant colony optimization, and their hybrid variants have been proposed and reported to enhance classification performance [19,20,21,22]. Nonetheless, many published studies focus mainly on final accuracy indicators and give relatively little attention to convergence characteristics, repeatability over multiple runs, and computational cost [23,24,25]. Metaheuristics have also been applied in several other power engineering areas, such as economic load dispatch, optimal power flow, and fuel cell and photovoltaic systems [26,27,28,29,30,31,32].
Recently, the artificial protozoa optimizer (APO) has been proposed as a bio-inspired metaheuristic that mimics key protozoan behaviors such as foraging, reproduction, and dormancy, and has shown strong global exploration capability in complex engineering optimization tasks [33,34]. Several improved APO variants have since been introduced for a range of applications, including renewable energy optimization, multimodal search problems, image segmentation, large-scale optimization, and hybrid learning-based frameworks [35,36,37,38,39]. Despite these developments, previous studies have noted that APO can experience relatively slow convergence in the later stages of the search because of its persistently exploratory behavior [40]. To alleviate this issue, recent work has investigated hybrid APO-based optimization schemes—particularly combinations with particle swarm optimization—which have reported faster convergence and more stable solutions in multi-objective and security-constrained optimization problems. However, the use of APO and its hybrid variants for dissolved gas analysis-based transformer fault diagnosis, especially for adaptive threshold optimization within rule-based diagnostic methods, has received little attention.
Motivated by this gap, the present study examines the effectiveness of APO and hybrid APO-based optimization frameworks for transformer fault diagnosis using DGA data. Standalone APO is systematically compared with established hybrid optimizers, including PSO–GWO and GA–ACO, as well as with a newly developed APO–PSO hybrid model. The proposed APO–PSO framework combines APO’s adaptive global exploration with PSO’s rapid local exploitation to enhance diagnostic accuracy, speed up convergence, and improve robustness.
Extensive experiments were carried out on a dissolved gas analysis (DGA) dataset containing 500 fault samples, with each optimization method evaluated over 50 independent runs. The comparison focused on convergence behavior, rapidity of solution, computational time, and the distribution of final fitness values. In addition, the statistical significance of the observed performance differences was assessed using the Friedman and Wilcoxon signed-rank tests to support the reliability and reproducibility of the results.
The surveyed literature suggests that although traditional DGA interpretation schemes and standalone metaheuristic algorithms can deliver acceptable diagnostic performance, hybrid optimization frameworks generally achieve higher accuracy and greater stability. In line with these findings, the present study shows that the proposed APO–PSO hybrid clearly outperforms standalone APO, GA–ACO, and PSO–GWO in terms of convergence speed, final fitness quality, and robustness across repeated runs. By alleviating the late-stage convergence weakness of APO through PSO-driven local exploitation, the APO–PSO framework provides a reliable and efficient optimization strategy for DGA-based power transformer fault diagnosis.
The main contributions of this study can be summarized as follows:
A hybrid APO–PSO optimization framework is developed for DGA-based transformer fault diagnosis, explicitly targeting the late-stage convergence limitations of standalone APO.
The proposed approach enables dynamic optimization of Rogers’ ratio decision thresholds, yielding higher diagnostic accuracy than conventional static ratio-based schemes.
A rigorous multi-run evaluation protocol, with 50 independent executions for each algorithm, is employed to enhance the robustness and reproducibility of the findings.
A comprehensive set of performance indicators—including convergence characteristics, rapidity behavior, computational time, fitness distribution, and formal statistical significance tests—is examined to assess algorithmic performance.
The proposed method achieves a measurable diagnostic accuracy gain of up to roughly 3%, together with improved stability, supporting its applicability to practical transformer condition monitoring.
Recent data-driven methods, including Support Vector Machines (SVM) [14], neural networks [16], and related machine learning classifiers, have been used for transformer fault diagnosis and have shown strong performance on DGA data. While these models are effective at capturing complex nonlinear relationships between gas patterns and fault types, their decision processes typically function as black boxes and do not provide directly interpretable gas ratio thresholds that are familiar to industry practitioners [27]. In contrast, this work concentrates on optimization-based refinement of transparent, rule-driven diagnostic boundaries, with a particular focus on adaptive adjustment of Rogers’ ratio limits [6,7]. This approach preserves the interpretability and practical acceptance of classical ratio-based schemes while improving their flexibility through metaheuristic optimization. For this reason, the study prioritizes a comparative assessment of different optimization strategies rather than benchmarking against alternative machine learning classifiers, since the main goal is to enhance explainable and adaptive rule-based diagnosis rather than replace it with purely black-box models.
2. Optimization Algorithms for Transformer Fault Diagnosis
Transformer fault diagnosis using Dissolved Gas Analysis (DGA) can be cast as a nonlinear optimization problem in which gas ratio thresholds determine the decision boundaries separating different fault types. In traditional ratio-based schemes, these thresholds are fixed, which restricts the method’s ability to adapt to changes in operating conditions. To overcome this limitation, biologically inspired optimization algorithms are used to adjust the thresholds dynamically by minimizing a suitable diagnostic loss function.
Let the threshold vector be defined as T = [T1, T2, …, Tm], where each element Ti represents a boundary associated with a specific gas ratio. The optimization goal is to determine the optimal vector T* that minimizes the overall misclassification error over the available DGA dataset.
In the proposed formulation, the chromosome vector T encodes the decision threshold boundaries of the Rogers ratio method. The elements of T specify the upper and lower limits of the key gas ratio intervals used to distinguish between different fault types. During optimization, these threshold values are adaptively adjusted to reduce misclassification while preserving the logical structure of the original Rogers diagnostic rules.
Before introducing the individual optimization algorithms, the diagnostic task is first posed as an optimization problem over a threshold vector T, whose components define the decision boundaries of a rule-based DGA classifier. For each gas sample, the classifier maps the input features to a fault type, and a simple loss function measures whether this prediction matches the true label. The overall objective is to choose T so that the aggregate misclassification over all available samples is minimized, ensuring that every algorithm is evaluated against the same classifier structure and error criterion.
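To make this formulation concrete, the following Python sketch shows one way such an objective could be evaluated. The two-ratio, three-class rule set, the threshold values, and the class labels are illustrative assumptions standing in for the full Rogers ratio table, not the classifier used in the study:

```python
def classify(ratios, T):
    # Hypothetical two-ratio rule set standing in for the Rogers table;
    # the two entries of T are the decision boundaries being optimized.
    r1, r2 = ratios
    if r1 < T[0]:
        return 0  # e.g., partial discharge
    if r2 < T[1]:
        return 1  # e.g., thermal fault
    return 2      # e.g., discharge fault

def fitness(T, X, y):
    # Aggregate misclassification rate F(T) over the samples (lower is better).
    errors = sum(classify(x, T) != label for x, label in zip(X, y))
    return errors / len(y)

# Toy dataset: (ratio pair, fault label)
X = [(0.05, 0.5), (0.5, 0.2), (0.5, 2.0)]
y = [0, 1, 2]
```

Every optimizer in the comparison would then search over T while this evaluation (the classifier structure and the error criterion) stays fixed.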
2.1. Particle Swarm Optimization (PSO)
Particle Swarm Optimization (PSO) is a population-based search technique inspired by the collective movement and social learning behavior observed in bird flocks and fish schools. In the proposed framework, each particle encodes a candidate threshold vector Ti [13,14], and the swarm evolves by iteratively updating particle velocities and positions according to both individual experience and the best-performing solution found by the group.
The PSO update mechanism is derived from a simplified Newtonian motion model, in which the velocity term retains a memory of previous search directions. The velocity update for a particle can be written as
vi(t + 1) = w vi(t) + c1 r1 [pi − xi(t)] + c2 r2 [g − xi(t)],
where vi(t) is the velocity of particle i at iteration t, xi(t) is the current position of particle i, pi denotes the personal best position of particle i, and g represents the global best position identified by the swarm. The parameter w is the inertia weight controlling the balance between exploration and exploitation, c1 and c2 are the cognitive and social acceleration coefficients, and r1 and r2 are independent random numbers uniformly distributed in the interval [0, 1].
Replacing the general position xi with the threshold vector Ti, the fitness function F(T) takes the place of the general cost function f(x), and the position increment ΔTi(t) follows the PSO velocity update rule, so that each threshold vector is displaced by the updated velocity at every iteration.
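The velocity and position rules above can be sketched as follows. This is a minimal Python illustration rather than the MATLAB implementation used in the study, and the coefficient defaults are assumptions:

```python
import random

def pso_step(x, v, p_best, g_best, w=0.7, c1=1.5, c2=1.5, rng=random):
    # Velocity keeps a memory term w*v and is pulled toward the personal
    # best p_best and the global best g_best.
    new_v = [w * vi
             + c1 * rng.random() * (pb - xi)
             + c2 * rng.random() * (gb - xi)
             for xi, vi, pb, gb in zip(x, v, p_best, g_best)]
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v

# When a particle already sits on both bests, only the inertia term remains.
x, v = pso_step([1.0, 2.0], [0.5, -0.5], [1.0, 2.0], [1.0, 2.0])
```

The fixed-point behavior in the example illustrates why PSO exploits aggressively: once attraction terms vanish, motion decays geometrically with the inertia weight.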
PSO is known for its fast convergence, largely due to the velocity memory and strong attraction toward the best solutions. This property enables rapid adaptation of DGA threshold values with relatively low computational cost, which is advantageous for real-time or near real-time diagnostic applications. However, the strong exploitation tendency can cause premature convergence in multimodal search spaces and may reduce robustness when dealing with noisy or highly overlapping DGA data [2,15].
2.2. Genetic Algorithm (GA)
The Genetic Algorithm (GA) is an evolutionary optimization technique that emulates natural selection through the iterative use of selection, crossover [16], and mutation operators. In the proposed framework, each chromosome represents a candidate threshold vector, and population diversity is maintained by applying these genetic operators across generations.
The evolutionary update of the population can be expressed in compact form as
where x denotes a candidate solution.
The crossover operation between two parent threshold vectors T1 and T2 is defined as
T′ = αT1 + (1 − α)T2,
where α ∈ [0, 1] is a blending factor that controls the contribution of each parent chromosome T1 and T2 to the offspring T′. Mutation then perturbs the offspring as
T′′ = T′ + σN(0, 1),
where σ is the mutation step size and N(0, 1) is a standard Gaussian random variable, introducing controlled randomness to maintain population diversity and avoid premature convergence.
Each chromosome encodes a threshold vector T = [T1, …, Tm], and the fitness F(T) is defined in terms of classification accuracy. The population evolves iteratively through the selection, crossover, and mutation operators until F(T) converges to the optimum T*, where the fitness function measures classification accuracy and the genes directly encode the threshold values.
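The blend crossover and Gaussian mutation described above can be sketched as follows; the parameter values for α and σ are illustrative assumptions:

```python
import random

def crossover(T1, T2, alpha=0.5):
    # Blend crossover: offspring T' = alpha*T1 + (1 - alpha)*T2, per gene.
    return [alpha * a + (1 - alpha) * b for a, b in zip(T1, T2)]

def mutate(T, sigma=0.05, rng=random.Random(1)):
    # Gaussian mutation: each threshold is perturbed by sigma * N(0, 1).
    return [t + sigma * rng.gauss(0.0, 1.0) for t in T]

child = crossover([0.0, 2.0], [2.0, 0.0], alpha=0.5)
mutant = mutate(child)
```

With α = 0.5 the offspring is the midpoint of the parents, and the small σ keeps mutations local so feasible threshold orderings are rarely destroyed.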
GA is strongly exploratory and is effective at maintaining population diversity, which helps the search escape local minima and perform robust global exploration. However, its convergence rate is sensitive to parameter settings (such as crossover and mutation probabilities and population size), and it typically converges more slowly than some swarm-based methods. For this reason, GA is particularly well suited to offline or batch diagnostic optimization, whereas its relatively higher computational demand can be a limitation for strict real-time applications [16,17].
2.3. Grey Wolf Optimization (GWO)
The Grey Wolf Optimizer (GWO) is a population-based metaheuristic inspired by the leadership hierarchy and cooperative hunting behavior of grey wolves. It models four levels of social hierarchy—alpha, beta, delta, and omega—where the alpha wolf corresponds to the best current solution. During optimization, candidate solutions update their positions by “encircling” and following the leading wolves, which helps maintain a balance between exploration and exploitation. This mechanism allows GWO to guide the population effectively toward promising regions of the search space.
GWO is a swarm-based algorithm that simulates the leadership hierarchy and cooperative hunting strategy of grey wolves [18,19]. In this scheme, the three best solutions in the population, denoted as α, β, and δ, act as leaders and guide the remaining wolves toward promising regions of the search space.
During optimization, candidate solutions update their positions by encircling the prey, which represents the optimal solution. The position update of a wolf relative to the prey can be expressed as
where X⃗t is the current position of a wolf (candidate threshold vector), X⃗p is the estimated position of the prey (optimal solution), and A and C are coefficient vectors that control exploration and exploitation.
The position update rule is given by
where X⃗α, X⃗β, and X⃗δ denote the positions of the three best wolves, and D⃗α, D⃗β, and D⃗δ are the distances between the current wolf and the leaders.
By mapping the position vector X to the threshold vector T, the GWO update rule for threshold optimization can be written as
where Tα, Tβ, and Tδ correspond to the threshold vectors of the top three candidate solutions. The hierarchical leadership structure ensures convergence toward an optimal threshold set T*, while the fitness function F(T) evaluates diagnostic performance.
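A minimal sketch of the leader-guided update is given below, assuming a simplified constant-range schedule for the coefficient vectors A and C (the full GWO decreases the control parameter a linearly over iterations):

```python
import random

def gwo_step(T, T_alpha, T_beta, T_delta, a=1.0, rng=random.Random(2)):
    # Each leader contributes a guided candidate position; the wolf moves
    # to the average of the three leader-guided candidates.
    def guided(leader):
        pos = []
        for ti, li in zip(T, leader):
            A = a * (2 * rng.random() - 1)  # A in [-a, a]
            C = 2 * rng.random()            # C in [0, 2]
            D = abs(C * li - ti)            # distance to this leader
            pos.append(li - A * D)
        return pos
    candidates = [guided(T_alpha), guided(T_beta), guided(T_delta)]
    return [sum(c[i] for c in candidates) / 3 for i in range(len(T))]

# If the wolf and all three leaders coincide at the origin, the update
# is a fixed point of the search dynamics.
T_next = gwo_step([0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
```

Averaging over three leaders, rather than following only the single best wolf, is what gives GWO its characteristic balance between exploration and exploitation.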
GWO provides a good compromise between exploration and exploitation through its hierarchical leadership mechanism, and typically exhibits stable convergence behavior. Nonetheless, its convergence in the early iterations is generally slower than that of PSO, which can be a disadvantage when very fast adaptation is required [18,19].
2.4. Ant Colony Optimization (ACO)
Ant Colony Optimization (ACO) is a metaheuristic inspired by the foraging behavior of real ant colonies, where ants communicate indirectly through pheromone trails to discover good paths. In ACO, artificial ants probabilistically construct solutions using both pheromone intensity and heuristic information, while pheromone evaporation helps avoid premature convergence. By iteratively reinforcing high-quality solution paths, the algorithm maintains a balance between exploration and exploitation and gradually converges toward optimal or near-optimal solutions.
ACO is a population-based algorithm inspired by the pheromone-driven path selection observed in real ant colonies [20,21]. In this context, each artificial ant probabilistically constructs a candidate threshold solution by favoring paths with higher pheromone intensity and stronger heuristic desirability.
The pheromone update rule can be written as
where τij(t) is the pheromone level on the path from i to j, ηij is the heuristic value (often the inverse of the distance), ρ ∈ [0, 1] is the pheromone evaporation rate (dimensionless) controlling the forgetting of past information, and Δτijk is the pheromone deposited by ant k on path (i, j), typically proportional to the fitness of the solution constructed by that ant (dimensionless).
Each “path” i corresponds to one threshold dimension Tj, and ant k builds a complete solution (threshold vector Tk). The probability Pij(t) of an ant moving from node i to node j at time t is given by
where α and β control the relative importance of pheromone and heuristic information, and Ni is the set of admissible nodes (threshold values) reachable from node i.
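The pheromone-biased selection and the evaporation-plus-deposit update can be sketched as follows; the values chosen for α, β, and ρ are illustrative:

```python
def selection_probs(tau, eta, alpha=1.0, beta=2.0):
    # P_j is proportional to tau_j^alpha * eta_j^beta over the admissible
    # candidate values for one threshold dimension.
    weights = [t ** alpha * e ** beta for t, e in zip(tau, eta)]
    total = sum(weights)
    return [w / total for w in weights]

def evaporate_and_deposit(tau, deposits, rho=0.1):
    # tau <- (1 - rho) * tau + sum of deposits from the ants on each path.
    return [(1 - rho) * t + d for t, d in zip(tau, deposits)]

# Candidate 1 has the strongest heuristic desirability, so it dominates.
probs = selection_probs([1.0, 1.0, 2.0], [1.0, 2.0, 1.0])
```

Evaporation (the (1 − ρ) factor) is what prevents early high-pheromone candidates from permanently locking in, which is the mechanism the text credits for avoiding premature convergence.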
The transition probability expressions used in Equations (14), (25) and (28) are based on the standard ant colony optimization formulation and are deliberately kept identical across the different hybrid schemes to ensure methodological consistency and fair comparison.
ACO offers strong local exploitation and memory-based refinement through its pheromone mechanism, which can yield high diagnostic sensitivity when tuning DGA thresholds. However, its performance is sensitive to pheromone initialization and parameter settings, and it often benefits from hybridization with other metaheuristics to strengthen global search capability [20,21].
2.5. PSO–GWO Hybrid Model
The PSO–GWO hybrid model combines the velocity-driven search mechanism of PSO with the leader-based hierarchy of GWO to achieve a balance between rapid convergence and structured exploration [22,23,24].
The PSO–GWO hybrid model combines the fast convergence behavior of particle swarm optimization with the leadership-based search mechanism of the grey wolf optimizer. In this hybrid structure, PSO mainly updates candidate solutions using velocity and position equations, while GWO guides the search toward promising regions via the alpha, beta, and delta wolves. This cooperation enhances convergence reliability and reduces the risk of premature stagnation in local optima.
The hybrid position update is defined as
where vi(t) and xi(t) denote the velocity and position of particle i at iteration t, pi is the personal best position, g is the global best position, w is the inertia weight, c1 and c2 are acceleration coefficients, and r1, r2 ∈ [0, 1] are random numbers.
GWO Distance and Position Updates:
where Xα, Xβ, and Xδ are the positions of the three best solutions, and A and C are coefficient vectors controlling convergence.
The hybridization principle combines the velocity-driven exploration of PSO with the leader-hierarchy exploitation of GWO. Introducing a dynamic hybrid weight λ(t) ∈ [0, 1] gradually shifts the search from PSO to GWO dominance as iterations increase, with λ(t) controlling the PSO-to-GWO transition.
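One possible realization of the λ(t)-weighted transition is sketched below, assuming a simple linear decay schedule; the text only requires that λ(t) lie in [0, 1] and decrease over iterations:

```python
def hybrid_weight(t, t_max):
    # Assumed linear schedule: lambda decays from 1 (PSO-dominated) to 0
    # (GWO-dominated) across the run.
    return 1.0 - t / t_max

def pso_gwo_update(x_pso, x_gwo, t, t_max):
    # Convex combination of the PSO and GWO candidate positions.
    lam = hybrid_weight(t, t_max)
    return [lam * p + (1 - lam) * g for p, g in zip(x_pso, x_gwo)]

start = pso_gwo_update([1.0, 1.0], [3.0, 3.0], t=0, t_max=100)   # pure PSO
end = pso_gwo_update([1.0, 1.0], [3.0, 3.0], t=100, t_max=100)   # pure GWO
```

At the endpoints of the schedule the update reduces exactly to one constituent algorithm, which makes the transition easy to verify during implementation.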
The PSO–GWO hybrid accelerates convergence while maintaining robustness, making it suitable for online transformer monitoring applications where both speed and stability are required [22,24].
2.6. GA–ACO Hybrid Model
The GA–ACO hybrid approach integrates the broad global exploration capability of GA with the pheromone-guided local exploitation of ACO [21,24].
The GA–ACO hybrid framework integrates the global exploration capability of genetic algorithms with the constructive search behavior of ant colony optimization. GA operators such as selection, crossover, and mutation generate diverse candidate solutions, while ACO mechanisms reinforce high-quality solutions through pheromone updating. This hybridization improves solution diversity and stability, especially in complex optimization landscapes.
A generic hybrid update for the threshold vector can be expressed as
Crossover:
where λ ∈ [0, 1] is the crossover blending coefficient (dimensionless) controlling the contribution from parent thresholds Ti(t) and Tj(t).
Mutation is then applied to preserve population diversity:
where μ is the mutation rate scaling the standard Gaussian random variable N(0, 1) to maintain population diversity.
In the ACO phase, pheromone trails are updated according to
where ρ ∈ [0, 1] is the pheromone evaporation rate and Δτijk is the pheromone deposited by ant k.
The probability of selecting threshold component j is defined as
where α and β control the influence of pheromone intensity and heuristic desirability, respectively.
where ω ∈ [0, 1] is the hybrid balance factor that controls the contribution of GA versus ACO. Each candidate threshold vector Ti is then constructed component-wise using the pheromone-based selection probability defined above.
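A minimal sketch of the ω-weighted GA–ACO combination is given below, reading the balance factor as a convex combination of the GA offspring and the ACO-constructed vector (one possible interpretation of the compact update):

```python
def ga_aco_update(T_ga, T_aco, omega=0.5):
    # omega in [0, 1] balances the GA offspring against the ACO-constructed
    # threshold vector, component-wise.
    return [omega * g + (1 - omega) * a for g, a in zip(T_ga, T_aco)]

blended = ga_aco_update([0.2, 0.8], [0.6, 0.4], omega=0.5)
```

Setting ω near 1 makes the search behave like a pure GA (diversity-driven), while ω near 0 hands control to the pheromone-guided ACO refinement.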
By combining GA’s strong diversity maintenance with ACO’s memory-based refinement, GA–ACO delivers high diagnostic sensitivity and stable performance, particularly in noisy environments and in the presence of complex or overlapping fault patterns [21,24].
2.7. Artificial Protozoan Optimization (APO)
Artificial Protozoan Optimization (APO) is a population-based metaheuristic inspired by the foraging and adaptive motility mechanisms of protozoa in response to chemical and environmental stimuli [
16,
17].
Each protozoan represents a candidate threshold vector and adjusts its motion through chemotactic-like movement, aggregation around favorable regions, and dispersal from unpromising areas, thereby balancing local exploitation and global exploration.
At iteration t, the position of protozoan i can be expressed as
where Xi(t) is the current threshold vector, Si(t) is an adaptive step-size vector, Di(t) is a direction vector guided by fitness information, and ∘ denotes element-wise multiplication. The step size is typically updated using a feedback rule of the form
where γ ∈ [0, 1] is the memory factor and Φi(t) is a performance-dependent term that reduces the step size in promising regions.
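The chemotactic movement and step-size feedback can be sketched as follows. The concrete form of the direction vector and of the Φ term are assumptions here, since the text specifies them only qualitatively:

```python
import random

def apo_step(X, S, fitness, gamma=0.9, rng=random.Random(3)):
    # Chemotactic-style move: X <- X + S ∘ D, with a random direction D.
    # The step size contracts by the memory factor gamma when the move
    # improves fitness (an assumed form of the performance term Phi).
    D = [rng.uniform(-1.0, 1.0) for _ in X]
    X_new = [x + s * d for x, s, d in zip(X, S, D)]
    if fitness(X_new) < fitness(X):
        S = [gamma * s for s in S]
    return X_new, S

# Drive a protozoan on a sphere objective; steps shrink as it homes in.
sphere = lambda X: sum(x * x for x in X)
X, S = [2.0, -2.0], [0.5, 0.5]
for _ in range(20):
    X, S = apo_step(X, S, sphere)
```

Because S only ever contracts, exploration stays broad early on and narrows in promising regions, mirroring the aggregation behavior described above.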
APO alternates between aggregation and dispersal phases to prevent premature convergence. In the aggregation phase, protozoa tend to move toward locally best solutions or cluster centers to refine thresholds in high-fitness regions; in the dispersal phase, a subset of protozoa is randomly perturbed or reinitialized around unexplored areas to restore diversity and enhance global search capability.
Through this adaptive movement and step-size modulation, APO offers a controllable trade-off between exploration and exploitation, which is particularly suitable for nonlinear and noisy DGA threshold optimization. Its robust convergence behavior and sensitivity to subtle fitness variations make it a promising candidate for high-reliability transformer fault diagnosis and online monitoring applications [16,17].
2.8. Hybrid APO–PSO Model (Proposed)
The proposed hybrid APO–PSO algorithm is designed to exploit the complementary strengths of the Artificial Protozoa Optimizer (APO) and Particle Swarm Optimization (PSO). APO offers strong global exploration capability through protozoan-inspired behaviors such as foraging and reproduction, which promote thorough coverage of the search space; however, this persistent exploration can lead to relatively slow convergence in the later stages of optimization. By contrast, PSO converges quickly and provides efficient local exploitation by directing particles toward their personal and global best positions, but it is prone to premature convergence when applied on its own.
To overcome these complementary weaknesses, the hybrid APO–PSO framework couples APO-driven exploration with PSO-based velocity updating. In the proposed scheme, APO governs the generation of exploratory candidate solutions, while PSO dynamics are embedded to accelerate convergence toward promising regions of the search space. A dynamic weighting mechanism is used to balance the contributions of APO and PSO over the course of the optimization. In the initial iterations, a higher weight is assigned to APO updates to emphasize exploration, whereas the influence of PSO is gradually increased in later iterations to reinforce exploitation and refine the solution.
Mathematically, the hybridization is implemented by combining the APO-based position update and the PSO-based candidate solution through a weighted aggregation strategy, as expressed in Equations (25)–(28). This formulation provides a smooth shift from exploration-dominated behavior to exploitation-focused refinement. Consequently, the hybrid APO–PSO algorithm attains faster convergence, improved solution stability, and superior diagnostic performance compared with standalone APO and the other hybrid optimization strategies considered in this study.
2.8.1. APO Foraging Update
A simplified APO position update for protozoan i can be written as
Xi(t + 1) = Xi(t) + r [Xbest(t) − Xi(t)],
where Xbest(t) is the best protozoan at iteration t and r ∈ (0, 1) is a random scalar controlling exploratory movement.
This equation captures the exploratory tendency of APO: movements are biased toward good regions, but the random factor r maintains diversity and avoids overly aggressive exploitation in early iterations.
2.8.2. PSO Velocity Dynamics
In standard PSO, the position and velocity of particle i evolve as
where w is the inertia weight, c1 and c2 are acceleration coefficients, and r1, r2 ∈ [0, 1] are random numbers. The parameters w, c1, and c2 control inertia, self-learning, and social learning, respectively. These equations naturally emphasize exploitation because particles are repeatedly pulled toward Pibest and Pbest.
2.8.3. Time-Dependent Behavioral Weight
To control the relative contribution of APO-like exploration and PSO-like exploitation over time, a decaying weight is defined as
λ(t) = exp(−t/Tmax),
where Tmax is the maximum number of iterations. For t = 0, λ(0) = 1 and the search is dominated by the exploratory component, while λ(t) decays monotonically as t approaches Tmax (reaching exp(−1) ≈ 0.37 at t = Tmax), so the exploitative component progressively receives the dominant weight. This exponential schedule smoothly shifts the algorithm from exploration to exploitation as the search progresses.
2.8.4. Derivation of the Hybrid Update
Let TiAPO(t + 1) denote the APO-based candidate for protozoan i:
which is algebraically equivalent to Equation (32) expressed in terms of the threshold vector Ti.
Using the PSO velocity from Equation (33), the PSO-based candidate is
A natural way to combine the two behaviors is to form a convex combination of the two candidates:
Substituting the expressions of TiPSO(t + 1) and TiAPO(t + 1) gives
If the random factor r is absorbed into the APO attraction term and represented explicitly as a directional pull toward the global best, the APO contribution can be written as a deterministic attraction term [Pbest(t) − Ti(t)]. This leads to the compact hybrid form used in the proposed model:
In this expression, the first term λ(t)[Ti(t) + vi(t + 1)] retains the full PSO velocity-driven movement, while the second term (1 − λ(t))[Pbest(t) − Ti(t)] plays the role of an APO-style attraction toward the current global best region. The combination is not arbitrary; it results directly from blending APO and PSO candidate moves with a time-varying convex weight and then regrouping terms.
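The compact hybrid form can be exercised directly; the λ schedule below follows the exp(−t/Tmax) rule given in Algorithm 1:

```python
import math

def behavioral_weight(t, t_max):
    # lambda(t) = exp(-t / T_max), as in the pseudocode of Algorithm 1.
    return math.exp(-t / t_max)

def hybrid_update(T, v, p_best, t, t_max):
    # T(t+1) = lambda(t)*[T + v] + (1 - lambda(t))*[P_best - T]
    lam = behavioral_weight(t, t_max)
    return [lam * (ti + vi) + (1 - lam) * (pb - ti)
            for ti, vi, pb in zip(T, v, p_best)]

early = hybrid_update([1.0], [0.5], [9.0], t=0, t_max=100)
late = hybrid_update([1.0], [0.5], [9.0], t=5000, t_max=100)
```

At t = 0 the update reproduces the pure PSO move T + v exactly, and as λ(t) vanishes it approaches the Pbest − T attraction term, matching the two regimes discussed in Section 2.8.5.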
2.8.5. Behavioral Interpretation
For small t, λ(t) ≈ 1, so Ti(t + 1) ≈ Ti(t) + vi(t + 1) and the dynamics are dominated by PSO-like motion enhanced by stochastic exploration inherited from APO’s initialization and diversity mechanisms.
For large t, λ(t) ≈ 0, so Ti(t + 1) ≈ Pbest(t) − Ti(t), corresponding to a strong, directed pull toward the best-known solution and fine exploitation around it.
The exponential decay of λ(t) guarantees a smooth transition between these regimes, preventing sudden changes that could destabilize convergence.
By construction, the hybrid APO–PSO update preserves APO’s exploration capability while enhancing late-stage exploitation and convergence speed, which accounts for the statistically superior diagnostic performance reported for APO-based hybrids in complex engineering optimization and power system applications [25,26,27].
2.8.6. Algorithmic Implementation of the Proposed APO–PSO
The pseudocode follows directly from the mathematical formulation developed in Section 2 and is applied to optimize the DGA gas-ratio threshold vector used in the diagnostic scheme. The proposed APO–PSO hybrid algorithm follows a structured optimization flow in which APO-based exploration and PSO-based exploitation are combined within a single iterative process. APO governs the generation of exploratory candidate solutions inspired by protozoan behaviors, while PSO velocity updating is incorporated to accelerate convergence toward optimal regions. A dynamic weighting strategy controls the relative influence of APO and PSO across iterations, with exploration dominating in the early stages and exploitation increasing progressively. The complete algorithmic flow of the proposed APO–PSO framework is summarized in Algorithm 1.
Algorithm 1. Hybrid APO–PSO for DGA Threshold Optimization
1:  Initialize population {Ti(0)}, i = 1, 2, …, P
2:  Initialize velocities {vi(0)} randomly
3:  Evaluate fitness F(Ti(0)) for all i
4:  Set personal best Pibest = Ti(0)
5:  Set global best Pbest = argmin F(Ti(0))
6:  for t = 1 to Tmax do
7:      Compute behavioral weight: λ(t) = exp(−t/Tmax)
8:      for each agent i = 1 to P do
9:          Update PSO velocity: vi(t + 1) = w vi(t) + c1 r1 (Pibest − Ti(t)) + c2 r2 (Pbest − Ti(t))
10:         APO-based attraction: TiAPO = Ti(t) + r (Pbest − Ti(t))
11:         PSO-based candidate: TiPSO = Ti(t) + vi(t + 1)
12:         Hybrid APO–PSO update: Ti(t + 1) = λ(t) TiPSO + (1 − λ(t)) TiAPO
13:         Evaluate fitness F(Ti(t + 1))
14:         if F(Ti(t + 1)) < F(Pibest) then Pibest = Ti(t + 1)
15:     end for
16:     Update global best: Pbest = argmin F(Pibest)
17: end for
18: return T* = Pbest
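As a runnable companion to Algorithm 1, the following Python sketch applies the same update flow to a generic objective. The population size, coefficient values, bound clamping, and the sphere test function are illustrative assumptions standing in for the DGA threshold problem and the study's MATLAB implementation:

```python
import math
import random

def apo_pso_optimize(fitness, dim, bounds, pop=20, t_max=100,
                     w=0.7, c1=1.5, c2=1.5, seed=0):
    # Hybrid APO-PSO loop following the structure of Algorithm 1.
    rng = random.Random(seed)
    lo, hi = bounds
    T = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop)]
    v = [[0.0] * dim for _ in range(pop)]
    p_best = [ti[:] for ti in T]
    p_fit = [fitness(ti) for ti in T]
    g_idx = min(range(pop), key=lambda i: p_fit[i])
    g_best, g_fit = p_best[g_idx][:], p_fit[g_idx]

    for t in range(1, t_max + 1):
        lam = math.exp(-t / t_max)              # behavioral weight
        for i in range(pop):
            r = rng.random()                    # APO attraction factor
            for d in range(dim):
                # PSO velocity with personal and global attraction
                v[i][d] = (w * v[i][d]
                           + c1 * rng.random() * (p_best[i][d] - T[i][d])
                           + c2 * rng.random() * (g_best[d] - T[i][d]))
                t_pso = T[i][d] + v[i][d]                    # PSO candidate
                t_apo = T[i][d] + r * (g_best[d] - T[i][d])  # APO attraction
                # Convex blend, clamped to the search bounds
                T[i][d] = min(hi, max(lo, lam * t_pso + (1 - lam) * t_apo))
            f = fitness(T[i])
            if f < p_fit[i]:                    # personal best update
                p_fit[i], p_best[i] = f, T[i][:]
                if f < g_fit:                   # global best update
                    g_fit, g_best = f, T[i][:]
    return g_best, g_fit

# Sphere function as a stand-in objective (minimum 0 at the origin).
best, best_fit = apo_pso_optimize(lambda T: sum(x * x for x in T),
                                  dim=2, bounds=(-5.0, 5.0), seed=0)
```

Swapping the sphere objective for a misclassification-rate function over real DGA samples turns this sketch into the threshold-optimization loop the paper describes.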
To provide a clearer view of how the proposed diagnostic method operates, an overall framework of the APO–PSO-based transformer fault diagnosis approach is presented in Figure 1, illustrating the main execution steps.
The framework integrates data preprocessing and intelligent optimization within a unified diagnostic pipeline, enabling adaptive and robust fault identification.
2.9. Unified Objective Function
All optimizers in this work minimize the same diagnostic loss function defined as

F(T) = (1/N) Σ_{i=1}^{N} L(f(x_i; T), y_i),

where T is the threshold vector, N is the number of DGA samples, x_i is the input gas pattern, y_i is the corresponding fault label, f(·) is the classifier output, and L(·) is the sample-wise loss.
This unified formulation guarantees a fair comparison across all optimization algorithms, since they are evaluated against the same data, model, and loss metric. It also provides a direct link between the search dynamics of each metaheuristic and the final diagnostic accuracy of the DGA-based transformer fault identification framework, in line with best practices in transformer diagnostics and performance evaluation [1,2].
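The unified objective can be sketched as a small evaluation routine. The classifier f(·; T) and the sample-wise loss L(·) are left as pluggable callables, since the text does not fix them here; the default 0–1 misclassification loss is our illustrative assumption.

```python
def diagnostic_loss(T, X, y, classify, loss=None):
    """Unified objective F(T) = (1/N) * sum_i L(f(x_i; T), y_i).

    `classify(x, T)` maps a gas pattern x to a predicted fault label given
    the threshold vector T; `loss` is the sample-wise loss L, defaulting to
    0-1 misclassification loss (our assumption, not the paper's choice)."""
    if loss is None:
        loss = lambda pred, true: float(pred != true)
    return sum(loss(classify(x, T), yi) for x, yi in zip(X, y)) / len(X)
```

Because every optimizer calls this same function on the same data, differences in final loss are attributable to the search dynamics alone.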
3. Results and Discussion
3.1. Simulation Setup and Evaluation Protocol
All optimization algorithms were programmed and executed in MATLAB R2018a within a dissolved gas analysis (DGA) framework designed for power transformer fault diagnosis. The employed dataset included 500 authentic dissolved gas samples covering five fault types: partial discharge (PD), low-energy discharge (D1), high-energy discharge (D2), low-temperature thermal fault (T1), and medium-temperature thermal fault (T2). These samples were collected from real transformer test records and represent a variety of operational and fault conditions commonly observed in practical transformer monitoring applications.
For each DGA sample, the following dissolved gas components are recorded: hydrogen (H2), methane (CH4), acetylene (C2H2), ethylene (C2H4), ethane (C2H6), carbon monoxide (CO), and carbon dioxide (CO2).
Prior to analysis, standard preprocessing steps such as data validation and normalization were applied to ensure the consistency and comparability of all samples. Anonymized representative samples of the dataset are included within the manuscript. Each optimization algorithm—APO, GA–ACO, PSO–GWO, and the proposed APO–PSO—was executed over 50 independent runs, using 100 iterations and a population size of 100 agents in each run. Distinct random seeds were initialized for every trial to avoid bias from initial conditions and to ensure statistically consistent performance evaluation.
The parameter settings listed in Table 1 were chosen within the commonly reported ranges found in previous studies on metaheuristic optimization and transformer fault diagnosis. These settings were subsequently fine-tuned through preliminary experiments to achieve an effective balance between diagnostic accuracy, convergence rate, and computational efficiency.
The selection of control parameters for APO and PSO was based on commonly accepted ranges reported in the literature and was further refined through preliminary tuning experiments. These experiments were designed to strike an appropriate balance between convergence speed, solution stability, and diagnostic accuracy, while avoiding excessive sensitivity to parameter changes. In practical use, the proposed hybrid APO–PSO framework takes advantage of its adaptive switching mechanism, which dynamically balances global exploration and local exploitation throughout the optimization process. This adaptive behavior reduces the need for fine manual tuning and improves robustness under varying operating conditions.
To enable a rigorous and well-rounded assessment, several complementary performance metrics were employed:
Best fitness value, reflecting the optimization objective.
Diagnostic accuracy, defined as the percentage of correctly classified fault samples.
Convergence behavior across iterations.
Rapidity, expressed as the rate of change in the fitness value.
Cumulative computational time.
Fitness distribution stability, examined using boxplot-based analysis.
It is worth noting that the fitness value represents an optimization-oriented objective that combines accuracy and margin stability, whereas diagnostic accuracy specifically measures the actual fault classification performance.
Unlike conventional machine learning classifiers that rely on explicit training and testing phases, the proposed diagnostic framework optimizes decision thresholds for DGA interpretation. The optimization was therefore carried out on the full dataset of 500 samples to obtain globally effective diagnostic thresholds. To reduce the risk of overfitting, each optimization algorithm was run 50 times with different random initializations, and performance consistency was assessed using statistical indicators and nonparametric significance tests. This evaluation strategy emphasizes robustness and generalization rather than memorization of specific data samples.
All simulations were conducted on a standard personal computer equipped with an Intel Core i7 processor, 16 GB of RAM, and running MATLAB under a 64-bit Windows operating system. This computing setup corresponds to a typical workstation rather than high-performance computing hardware, so the reported computational performance demonstrates that the proposed APO–PSO framework is feasible for practical offline diagnostic studies and can be adapted to real-world transformer monitoring applications without requiring specialized computing resources.
3.2. Dataset Description and Origin
The data used in this study were obtained from dissolved gas analysis (DGA) measurements carried out by an electrical utility company on in-service oil-immersed power transformers. The dataset covers units operating at voltage levels between 33 kV and 220 kV, with rated capacities in the range of 100–500 MVA, and reflects typical operating conditions encountered in practical transmission and sub-transmission networks.
Transformer oil samples were taken as part of routine condition monitoring and specific fault investigations and analyzed by gas chromatography in accordance with the procedures recommended in IEEE Std C57.104-2019 [1]. For each sample, the recorded variables include the concentrations of seven key gases (H2, CH4, C2H2, C2H4, C2H6, CO, and CO2), together with auxiliary operating information such as ambient temperature, oil temperature, and transformer loading at the time of sampling.
Fault labels were assigned by experienced utility engineers using established DGA interpretation guidelines, including IEEE, Rogers’ ratio, and Duval methods, and were cross-checked against maintenance logs and inspection reports whenever available. This procedure ensured that the dataset provides a realistic and reliably annotated basis for evaluating the proposed diagnostic framework [1,2,3]. Anonymized representative samples of the dataset are provided in the manuscript.
Prior to the optimization process, the dissolved gas analysis (DGA) samples were subjected to standard preprocessing. These steps included consistency checks to remove incomplete or inconsistent records, normalization of gas concentration values to place them on comparable numerical scales, and verification of data integrity. Fault labels were assigned according to established DGA interpretation guidelines to ensure consistent labeling across all samples. This preprocessing strategy provides reliable input data and enables a fair comparison among the evaluated optimization algorithms.
Representative Real-World DGA Samples
To illustrate the nature of the real-world data processed in this study, Table 2 presents a representative subset of anonymized DGA records obtained from the utility dataset. These samples demonstrate how physical operating conditions and gas concentrations are mapped to diagnostic outcomes and maintenance recommendations.
The complete dataset is subject to confidentiality agreements and cannot be publicly released in full; however, the representative samples provided are sufficient to demonstrate the realism and applicability of the proposed optimization framework.
The fault diagnosis labels reported in Table 2 correspond directly to the classification results produced by the proposed diagnostic algorithm.
The dissolved gas data samples were preprocessed before applying the intelligent diagnostic algorithms to ensure both data reliability and numerical stability. The preprocessing procedure involved consistency checks to detect missing or abnormal values, followed by normalization of the gas concentration measurements to a common scale. The normalized gas quantities were then arranged into input feature vectors suitable for subsequent optimization and classification. These steps help mitigate bias arising from differing gas magnitude ranges and support a fair performance comparison among the investigated diagnostic algorithms.
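The preprocessing pipeline described above might be sketched as follows. The specific choices (row-wise consistency filtering, min–max normalization per gas column) are assumptions consistent with the description, not the authors' exact procedure.

```python
import numpy as np

def preprocess_dga(samples):
    """Sketch of the described preprocessing: drop records with missing or
    negative concentrations (consistency check), then min-max normalize
    each gas column to [0, 1] so no gas dominates by sheer magnitude."""
    X = np.asarray(samples, dtype=float)
    valid = ~np.isnan(X).any(axis=1) & (X >= 0).all(axis=1)
    X = X[valid]
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant columns
    return (X - lo) / span                   # rows become input feature vectors
```

After this step, each row is a normalized feature vector ready for the optimization and classification stages.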
3.3. Convergence Analysis
The mean convergence curves, obtained by averaging results over 50 independent runs for each algorithm, are depicted in Figure 2.
APO shows strong exploratory behavior in the early iterations, leading to rapid improvements in the fitness value at the start of the search. Its convergence rate, however, tends to slow in the later stages because the algorithm continues to emphasize exploration rather than intensive local refinement.
GA–ACO converges in a smooth and stable manner, benefiting initially from the genetic diversity generated by crossover and mutation, and later from pheromone-guided reinforcement of promising solution paths. PSO–GWO achieves relatively fast convergence by combining PSO’s velocity-driven exploitation with the leader-based search hierarchy of GWO, which helps guide the population toward high-quality regions of the search space.
Among all tested methods, APO–PSO consistently delivers the fastest convergence and the highest final fitness values. This indicates that the PSO component provides effective local exploitation that compensates for APO’s tendency toward slower late-stage convergence. A key observation is that APO–PSO attains near-optimal fitness in fewer iterations while maintaining a smooth and monotonic improvement trend, reflecting a well-balanced interplay between exploration and exploitation.
3.4. Rapidity and Optimization Stability
The rapidity curves, defined as the absolute change in fitness between successive iterations and illustrated in Figure 3, offer additional insight into the stability of the optimization process.
The rapidity curves do not rely on a separate predictive model; instead, they represent the rate of change in the fitness value across successive iterations. Specifically, rapidity is computed as the difference between fitness values in consecutive iterations, which provides an indication of the convergence speed of the optimization process.
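Computed directly from a recorded best-fitness history, the rapidity metric as defined above reduces to a first difference:

```python
import numpy as np

def rapidity(fitness_history):
    """Rapidity: absolute change in best fitness between successive
    iterations, |F(t+1) - F(t)|. A smooth decay toward zero indicates
    controlled convergence without abrupt fitness jumps."""
    f = np.asarray(fitness_history, dtype=float)
    return np.abs(np.diff(f))
```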
Standalone APO shows noticeable oscillations in the mid-to-late iterations, indicating fluctuations between exploration and exploitation. GA–ACO reduces these oscillations through pheromone reinforcement, but its convergence is more gradual. PSO–GWO stabilizes relatively quickly; however, it can experience early stagnation if the search concentrates too soon around suboptimal regions. In contrast, APO–PSO exhibits the smoothest rapidity decay, reflecting controlled convergence without abrupt fitness jumps. This pattern indicates that the hybrid APO–PSO design suppresses unnecessary exploration while still maintaining adaptive refinement of solutions.
3.5. Fitness Distribution and Robustness
The distribution of final fitness values over 50 independent runs for each algorithm is summarized in Figure 4.
APO displays comparatively wider interquartile ranges, indicating a higher sensitivity to random initialization and greater variability in the obtained solutions. GA–ACO and PSO–GWO markedly reduce this fitness variance, leading to more consistent outcomes across repeated runs. APO–PSO, however, presents the narrowest interquartile range and the smallest overall variance, which reflects a high level of robustness and repeatability.
This robustness is particularly important in practical transformer monitoring applications, where consistent performance across runs is essential.
3.6. Computational Time Analysis
The cumulative execution times recorded for each optimization method are depicted in Figure 5.
APO incurs a moderate computational cost because of its repeated exploratory position updates. GA–ACO introduces additional overhead due to the application of genetic operators and pheromone-update procedures. PSO–GWO attains shorter runtimes by relying on direct, velocity-based position updates. APO–PSO, in turn, maintains competitive computational time while delivering superior optimization performance, which makes it suitable for both offline studies and near real-time diagnostic applications.
3.7. Diagnostic Accuracy and Fault Sensitivity
The comparative assessment of APO and the hybrid optimization algorithms over 50 independent runs shows that hybridization consistently improves diagnostic performance (Table 3). The proposed hybrid APO–PSO achieves the highest average best fitness value (0.55593) together with the lowest fitness variance (2.95 × 10^−5), indicating both high accuracy and strong robustness.
Although the hybrid PSO–GWO scheme attains the shortest execution time (0.013 s), its fitness stability is inferior to that of APO–PSO. The hybrid GA–ACO provides a moderate improvement over standalone APO but does not reach the same level of accuracy or stability as APO–PSO. Statistical analysis using the Friedman test (p = 9.09 × 10^−18) confirms that the performance differences among the algorithms are significant, and Wilcoxon signed-rank tests further verify that the gains achieved by APO–PSO are statistically meaningful, particularly when compared with APO and GA–ACO. Taken together, these results indicate that APO–PSO offers the most favorable balance among accuracy, robustness, and computational feasibility for DGA-based transformer fault diagnosis.
The statistical analysis supports the superiority of the proposed hybrid optimizer. The Friedman test yields p = 9.0869 × 10^−18, indicating that the performance differences among the compared algorithms are statistically significant. Pairwise Wilcoxon signed-rank tests further show that APO–PSO significantly outperforms APO (p = 4.51 × 10^−9) and GA–ACO (p = 8.53 × 10^−10), while its performance is statistically comparable to PSO–GWO (p = 2.73 × 10^−1).
In terms of diagnostic accuracy, the individual optimizers (APO, GA–ACO, PSO–GWO) achieve accuracy levels in the range of 94–96%. APO–PSO consistently attains the highest accuracy, around 96–97%, with noticeable gains in distinguishing between closely related thermal fault categories, which are typically difficult to separate in DGA-based diagnosis. This improvement is mainly attributed to the dynamically optimized threshold boundaries, which adapt more effectively to overlapping gas ratio distributions.
3.8. Statistical Validation
To substantiate the observed differences, a Friedman test followed by pairwise Wilcoxon signed-rank tests was applied to the final fitness values of all algorithms. The Friedman test confirmed that the methods differ significantly at the 5% significance level (p < 0.05). The Wilcoxon tests verified that APO–PSO offers statistically superior performance compared with standalone APO and GA–ACO, while remaining statistically comparable to PSO–GWO in final fitness. These findings indicate that the performance gains of APO–PSO are not due to random variation but result from the proposed hybrid optimization strategy.
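This validation protocol maps directly onto SciPy's nonparametric tests. The sketch below assumes each algorithm's final fitness values over the independent runs are available as equal-length vectors; the names and synthetic data are illustrative only.

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

def compare_optimizers(results):
    """Friedman test across all algorithms, then pairwise Wilcoxon
    signed-rank tests. `results` maps algorithm name -> array of final
    fitness values from the independent runs (paired by run index)."""
    names = list(results)
    _, p_friedman = friedmanchisquare(*(results[n] for n in names))
    p_pairwise = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            _, p = wilcoxon(results[a], results[b])
            p_pairwise[(a, b)] = p
    return p_friedman, p_pairwise
```

A small Friedman p-value licenses the pairwise comparisons; each Wilcoxon p-value then indicates whether one optimizer's per-run fitness systematically exceeds another's.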
3.9. Comparative Technical Discussion
For consistency and fair comparison, all results reported in Table 4, including those corresponding to the Rogers ratio method and the standalone optimization algorithms, were obtained from our own experimental evaluations using the same dataset and identical diagnostic protocol. No performance values were adopted from external literature sources.
APO–PSO achieves the highest average diagnostic accuracy (approximately 96–97.5%), outperforming standalone APO by about 2–3%, PSO–GWO by around 1–2%, and GA–ACO by roughly 1%. It also attains the lowest fitness variance over 50 runs, which reflects strong robustness and repeatability, and shows the fastest convergence due to PSO-driven exploitation in the later iterations. Statistical tests based on the Friedman and Wilcoxon procedures (p < 0.05) confirm that these improvements are significant, and the method offers a favorable exploration–exploitation balance that mitigates APO’s slow late-stage convergence and PSO’s tendency toward premature stagnation.
To further assess the diagnostic effectiveness of the proposed approach, its classification performance was evaluated across different fault categories. The results show that the hybrid APO–PSO algorithm maintains consistently high diagnostic accuracy for partial discharge, electrical, and thermal faults. In particular, the method exhibits enhanced discrimination between closely related fault types, such as low- versus high-energy discharges and low- versus medium-temperature thermal faults. This consistent behavior under varying fault conditions underscores the robustness and generalization capability of the proposed diagnostic framework.
Strengths of the Proposed APO–PSO
Combines APO’s wide-range global exploration with PSO’s rapid local exploitation.
Reduces stagnation in the later stages of the search.
Increases diagnostic accuracy and improves stability across runs.
Preserves an acceptable computational burden suitable for practical use.
Limitations
More complex to implement and analyze than single optimizers.
Requires careful tuning of the hybrid control parameters.
Does not yet exploit deep learning–based feature extraction within the diagnostic pipeline.
3.10. Practical Implications
APO–PSO is well-suited for adaptive transformer fault diagnosis systems that require reliable online decision support.
Hybrid optimization enables real-time or near real-time adjustment of diagnostic thresholds.
The approach enhances the reliability of DGA-based monitoring under changing operating conditions.
3.11. Summary of Findings
The simulation results show that hybrid metaheuristic optimization can substantially improve transformer fault diagnosis performance compared with standalone optimizers. Among all tested algorithms, APO–PSO offers the most favorable compromise between accuracy, convergence speed, robustness, and computational efficiency, making it a strong candidate for deployment in next-generation intelligent transformer monitoring systems.
4. Conclusions
This work presented a comprehensive optimization-based framework for transformer fault diagnosis using dissolved gas analysis, addressing key limitations of traditional ratio-based schemes and individual metaheuristic optimizers. Four optimization strategies—APO, GA–ACO, PSO–GWO, and the proposed hybrid APO–PSO—were assessed using 500 real DGA samples over 50 independent runs to ensure statistical reliability.
The findings showed that hybrid optimization substantially improves diagnostic performance. Standalone APO delivered satisfactory accuracy owing to its strong global exploration ability, but the proposed APO–PSO hybrid consistently outperformed all benchmark methods, achieving about 96–97% diagnostic accuracy, faster convergence, and the lowest fitness variance across repeated trials. Statistical analyses using Friedman and Wilcoxon tests confirmed that these gains are statistically significant. From an engineering standpoint, the proposed framework enables more reliable fault discrimination and enhanced robustness under varying operating conditions, making it suitable for both offline diagnostic studies and online transformer condition monitoring. Overall, the hybrid APO–PSO approach offers an effective and robust solution for intelligent transformer fault diagnosis and provides a solid basis for advanced monitoring and asset management systems.
Despite the promising results, this study has certain limitations. The experimental evaluation was conducted on a dataset derived from a relatively small set of transformer types and operating conditions, which may restrict direct generalization to all practical scenarios. Moreover, the current implementation is limited to offline analysis, and further investigation is needed to examine computational constraints and real-time deployment requirements in large-scale monitoring systems. Future work could focus on combining the APO–PSO framework with conventional machine learning and deep learning models for adaptive parameter tuning and improved feature extraction. Furthermore, validating the method on datasets from transformers with different ratings and operating environments would allow a more thorough assessment of its generalization capability. Future research will investigate the integration of the proposed APO–PSO framework into an online diagnostic platform for real-time transformer condition monitoring and verification.