Article

PCA and Autoencoder-Based ANN Models for Transformer Fault Diagnosis Using Dissolved Gas Analysis: Comparative Insights and Challenges

by Mwamba S. Nkwambe and Bonginkosi A. Thango *
Department of Electrical and Electronic Engineering Technology, University of Johannesburg, Johannesburg 2092, South Africa
* Author to whom correspondence should be addressed.
Energies 2026, 19(12), 2806; https://doi.org/10.3390/en19122806
Submission received: 30 April 2026 / Revised: 5 June 2026 / Accepted: 7 June 2026 / Published: 11 June 2026

Abstract

Accurate fault diagnosis of power transformers using Dissolved Gas Analysis (DGA) depends on effective feature extraction to reduce redundancy and improve classification performance. This study compares a linear and a nonlinear feature extraction method, namely Principal Component Analysis (PCA) and bottleneck Autoencoders (AE), to determine whether nonlinear representations provide diagnostic advantages for transformer fault classification. A dataset of 595 IEC 60599-labeled DGA samples covering six fault classes (PD, D1, D2, T1, T2, T3) was used. A 15-dimensional feature space was constructed from gas concentrations, total hydrocarbon content, and IEC-aligned gas ratios. PCA and AE were applied for dimensionality reduction across latent dimensions (k = 1–15), followed by an identical Artificial Neural Network (ANN) classifier. Performance was evaluated using test accuracy, cross-validation stability, and per-class F1-scores. The PCA+ANN model achieved a maximum accuracy of 68.9% at k = 11, outperforming AE+ANN, which achieved 66.4% at k = 4. PCA also demonstrated greater cross-validation stability (62 ± 3.5%) compared to AE (62 ± 6.6%). However, AE improved F1-scores for discharge faults (D1 and D2) by enhancing nonlinear separation of overlapping samples. Overall, PCA provides superior accuracy and stability for transformer fault diagnosis, while AE offers targeted advantages in distinguishing discharge-related faults. These findings establish a consistent benchmark for future studies and highlight the complementary roles of linear and nonlinear feature extraction in DGA-based diagnostic systems.

1. Introduction

Power transformers are critical assets in modern electrical power systems because they enable voltage transformation, efficient energy transmission, and stable power delivery across generation, transmission, and distribution networks [1,2,3]. In substations and integrated energy systems, failure of transformers can affect substation reliability and system resilience during extreme events such as earthquakes, cascading outages, and severe operational disturbances, motivating resilient operation strategies, risk-economic coordinated seismic-resilient planning, and interdependent expansion planning for integrated electricity and natural gas networks [4,5,6]. Their reliable operation is therefore essential for maintaining grid security and continuity of supply. However, despite their robust design, power transformers are exposed to electrical, thermal, mechanical, and environmental stresses during long-term operation. These stresses may gradually degrade the insulation system, particularly transformer oil and cellulose paper, and may initiate internal defects such as partial discharge, low-energy discharge, high-energy arcing, and thermal overheating. If such incipient faults remain undetected, they can evolve into severe failures, causing forced outages, equipment damage, safety risks, and substantial economic losses [7]. Dissolved Gas Analysis (DGA) is one of the most widely adopted condition-monitoring techniques for transformer fault diagnosis. During abnormal electrical or thermal activity inside a transformer, insulating oil and solid insulation decompose and release characteristic gases. The type and concentration of these gases provide valuable diagnostic information regarding the underlying fault mechanism. Standards such as IEC 60599 and IEEE C57.104 provide interpretation guidelines for relating dissolved gas patterns to transformer fault types [8,9,10]. 
Conventional diagnostic approaches, including IEC ratio methods, Rogers ratios, and related rule-based schemes, have been extensively used in practice. However, these approaches often struggle when gas signatures overlap, when measurements are affected by uncertainty, or when fault mechanisms do not fit neatly into predefined ratio boundaries [11]. With the development of artificial intelligence and data-driven condition monitoring, machine learning methods have increasingly been applied to DGA-based transformer fault diagnosis. Among these methods, Artificial Neural Networks (ANNs) have demonstrated strong potential because of their ability to approximate nonlinear relationships between gas features and fault classes [12,13,14]. Nevertheless, the performance of ANN-based diagnostic models depends strongly on the quality of the input feature space. When raw gas concentrations are combined with gas-ratio features, the resulting feature set may contain substantial redundancy and correlation. Such redundancy can degrade generalization, increase model sensitivity, and contribute to overfitting, especially when the available fault dataset is limited [15].
Feature extraction offers a practical solution to this problem by transforming the original high-dimensional feature space into a compact representation that preserves diagnostically relevant information while reducing redundancy. Principal Component Analysis (PCA) and Autoencoders (AEs) are two representative feature extraction approaches. PCA is a linear dimensionality reduction method that projects correlated variables onto orthogonal components arranged according to explained variance [12,16,17]. Its advantages include mathematical transparency, deterministic behavior, and computational efficiency. In contrast, bottleneck Autoencoders are nonlinear neural network models that learn compressed latent representations through reconstruction-based training. Their main advantage is the ability to capture nonlinear structures that may not be adequately represented by linear projections. Although both PCA and Autoencoder-based dimensionality reduction have been applied in intelligent diagnostic systems, their relative diagnostic value for DGA-based transformer fault classification remains insufficiently clarified. In particular, it is important to determine whether the additional modeling flexibility of nonlinear Autoencoders produces measurable diagnostic benefits over the simpler and more stable PCA approach when both are evaluated under identical experimental conditions. This study, therefore, compares PCA+ANN and AE+ANN fault diagnosis pipelines using the same IEC 60599-labeled DGA dataset, the same engineered feature space, and the same ANN classifier. The central research question is whether nonlinear hidden-feature learning provides practical diagnostic advantages over linear dimensionality reduction for classifying transformer fault types.

1.1. Necessity of the Study

The diagnosis of transformer faults from DGA measurements is not a straightforward pattern-recognition task. Gas formation inside a transformer is governed by complex physical and chemical processes associated with insulation degradation, discharge activity, oil decomposition, and thermal stress. As a result, different fault mechanisms may produce partially overlapping gas signatures. For example, discharge-related faults such as partial discharge, low-energy discharge, and high-energy discharge may all involve elevated hydrogen and acetylene-related patterns, while thermal faults may show gradual transitions across temperature-dependent gas-generation regions. This overlap makes it difficult for conventional rule-based techniques to provide consistently accurate classification across all fault types. The increasing use of ANN-based classifiers in DGA fault diagnosis has improved the ability to model nonlinear relationships between gas features and fault categories. However, ANN classifiers do not automatically resolve all limitations associated with the input data [18,19]. If the feature space contains strongly correlated variables, redundant ratio features, or overlapping class distributions, the classifier may learn patterns that are unstable or dataset-specific. This is particularly important in transformer fault diagnosis, where datasets are often moderate in size, class distributions may be uneven, and field measurements may contain noise or uncertainty. Therefore, the feature extraction stage becomes a critical component of the diagnostic pipeline rather than a secondary preprocessing step. PCA is commonly used because it provides a compact and stable linear representation of correlated variables. For industrial diagnostic applications, this stability is valuable because the method is deterministic, computationally efficient, and relatively easy to interpret. However, DGA fault patterns are not always linearly separable. 
A purely linear transformation may preserve global variance while failing to emphasize nonlinear class boundaries, especially among fault classes with similar gas profiles. Autoencoders provide an alternative by learning nonlinear latent representations that may better capture curved or complex data structures. Yet, Autoencoders introduce their own practical challenges, including sensitivity to initialization, architecture selection, training variability, and reduced interpretability. For this reason, it is necessary to evaluate PCA and Autoencoder feature extraction under a controlled and comparable framework. Without such a controlled comparison, performance differences may be caused by unrelated factors such as different datasets, different classifiers, different feature sets, or different evaluation procedures. A fair comparison must isolate the feature extraction method as the primary variable while keeping the classifier, dataset, preprocessing, and performance metrics consistent. This study addresses that need by examining whether nonlinear Autoencoder representations offer genuine diagnostic advantages over PCA, or whether the simpler linear approach remains preferable for practical transformer fault diagnosis.

1.2. Novelty of the Proposed Work

Existing research on DGA-based transformer fault diagnosis has shown that both statistical dimensionality reduction and neural feature learning can improve diagnostic performance. PCA-based methods are often valued for reducing correlated variables and producing compact features before classification, while Autoencoder-based methods are increasingly used to extract nonlinear latent patterns from high-dimensional diagnostic data. However, many existing studies evaluate these methods in isolation or compare them under inconsistent experimental settings. In such cases, conclusions about the superiority of one method over another may be influenced by variations in dataset composition, feature engineering strategy, classifier architecture, hyperparameter tuning, or validation procedure. The novelty of this study is that it establishes a controlled head-to-head comparison between a linear feature extraction paradigm and a nonlinear feature extraction paradigm for transformer fault diagnosis. PCA and bottleneck Autoencoder models are not evaluated as independent, unrelated diagnostic systems; instead, they are embedded into two parallel diagnostic pipelines that share the same preprocessing stage, the same 15-dimensional DGA feature space, and the same ANN classifier. By maintaining identical classification conditions, the study ensures that any observed difference in diagnostic performance can be attributed primarily to the feature extraction mechanism. This makes the comparison more rigorous than approaches that test PCA- and Autoencoder-based models with different classifiers or feature inputs. Another novel aspect of the work is the systematic evaluation of latent dimensionality. Rather than selecting one arbitrary compressed dimension, both PCA and Autoencoder representations are assessed across the full range of latent dimensions from k = 1 to k = 15. 
This allows the study to examine not only the highest classification accuracy but also the relationship between compression level and diagnostic performance. Such analysis is important because a method that performs well only at high dimensionality may not be suitable for compact real-time diagnostic systems, while a method that retains useful performance at low dimensionality may be advantageous for embedded implementation.
The study also extends beyond overall accuracy by analyzing cross-validation stability and per-class diagnostic behavior. In practical transformer monitoring, overall accuracy alone is insufficient because different fault classes carry different operational implications. Misclassification among discharge faults, for example, may lead to different maintenance priorities compared with misclassification among thermal faults. By evaluating precision, recall, and F1-score for each IEC 60599 fault class, the proposed work reveals where each feature extraction strategy is diagnostically stronger or weaker. This provides a more detailed interpretation of model behavior than a single aggregate accuracy value. Furthermore, the proposed analysis connects numerical performance with feature-space interpretation. PCA is examined in terms of variance retention and component behavior, while the Autoencoder is examined through reconstruction error and latent representation. This allows the study to explain why PCA may provide stronger overall stability, while Autoencoder representations may still offer targeted advantages in separating overlapping discharge-related classes. Therefore, the novelty of the work is not only the identification of the better-performing model, but also the clarification of the diagnostic conditions under which linear and nonlinear feature extraction are most useful.

1.3. Research Contributions

This paper makes several contributions to DGA-based transformer fault diagnosis. First, it develops a comparative diagnostic framework that integrates PCA-based and Autoencoder-based feature extraction with an identical ANN classifier. This controlled design allows the study to isolate the effect of feature extraction and provide a fair comparison between linear and nonlinear dimensionality reduction approaches. Second, the study constructs and evaluates a 15-dimensional DGA feature space comprising raw dissolved gas concentrations, total hydrocarbon content, and IEC-aligned gas-ratio features. This feature set captures both absolute gas concentrations and relative gas-generation patterns, thereby providing a richer diagnostic representation than raw gas inputs alone. Third, the study systematically investigates the effect of latent dimensionality on classification performance. Both PCA+ANN and AE+ANN pipelines are evaluated across latent dimensions k = 1 to k = 15, allowing the analysis to identify the optimal dimensionality for each approach and to assess the trade-off between compression and diagnostic accuracy. Fourth, the work provides a detailed comparison of classification performance using overall accuracy, cross-validation stability, and per-class precision, recall, and F1-score. This makes it possible to determine not only which method performs better overall, but also which method is more effective for specific IEC 60599 fault categories. Fifth, the study offers interpretive insight into the relative strengths of PCA and Autoencoder representations.
Finally, the study establishes a consistent benchmark for future work on feature extraction in transformer fault diagnosis. By using identical classifiers, identical data partitions, and identical evaluation procedures, the results provide a reproducible baseline against which future PCA, Autoencoder, hybrid, or deep feature-learning methods can be assessed.
The remainder of this paper is organized as follows. Section 2 presents the theoretical background of DGA-based transformer fault classification, PCA-based linear feature extraction, bottleneck Autoencoder-based nonlinear feature extraction, and ANN classification. Section 3 describes the dataset, feature engineering process, comparative pipeline architecture, and evaluation procedure. Section 4 presents and discusses the experimental results, including dimensionality-reduction behavior, classification accuracy, confusion matrices, per-class metrics, cross-validation stability, and classifier benchmarking. Section 5 concludes the paper and outlines directions for future research.

2. Theoretical Background

2.1. DGA and IEC 60599 Fault Classification

Dissolved Gas Analysis (DGA) is one of the most established diagnostic techniques for assessing the internal condition of oil-immersed power transformers. During normal operation, only small quantities of gases are produced in transformer oil. However, under abnormal electrical or thermal stress, the insulating oil and cellulose materials decompose and generate characteristic gases [2]. The type, concentration, and relative proportion of these gases provide important diagnostic information about the nature and severity of the internal fault [4]. DGA generally considers key dissolved gases such as hydrogen (H2), methane (CH4), ethane (C2H6), ethylene (C2H4), acetylene (C2H2), carbon monoxide (CO), and carbon dioxide (CO2). Among these, hydrocarbon and hydrogen gases are particularly important for distinguishing discharge and thermal fault mechanisms. The composition and concentration of gases resulting from the decomposition of transformer oil and insulating materials vary according to the fault type. Therefore, gas patterns can be mapped to specific fault categories using standardized interpretation methods. IEC 60599 classifies transformer faults into six major fault categories: partial discharge (PD), low-energy discharge (D1), high-energy discharge (D2), and three thermal fault levels (T1, T2, and T3). These fault types are summarized in Table 1, which presents the associated physical mechanisms and primary gas indicators used for diagnosis [1,20].
Although IEC 60599 provides an important diagnostic basis, practical DGA interpretation is often complicated by overlapping gas patterns, uncertain measurement conditions, and mixed fault mechanisms. For example, discharge faults may share elevated hydrogen and acetylene patterns, while thermal faults may show gradual gas-transition behavior across temperature regions. These overlaps make rule-based diagnosis difficult and motivate the use of data-driven approaches capable of learning nonlinear relationships between gas features and fault classes [11].

2.2. Linear Feature Extraction Using PCA

Principal Component Analysis (PCA) is a widely used linear feature extraction method for reducing dimensionality in correlated datasets [12,21]. In DGA-based transformer fault diagnosis, raw gas concentrations and gas-ratio features often contain redundant information because several gases are generated together during related fault mechanisms. PCA transforms the original feature space into a new set of orthogonal variables called principal components, which are arranged according to the amount of variance they explain [3,12]. The PCA space is defined by k principal components. These components are mutually uncorrelated and represent the dominant directions of variation in the dataset. The transformation can be obtained through eigenvalue decomposition of the covariance matrix or by using Singular Value Decomposition (SVD). In this study, PCA is used to obtain a compact linear representation of the 15-dimensional DGA feature space before ANN classification. The PCA procedure begins with standardization of the original feature values:
$$x_j' = \frac{x_j - \mu_j}{\sigma_j}, \quad j = 1, 2, \ldots, m$$
where $x_j'$ is the standardized value of the $j$th feature, while $\mu_j$ and $\sigma_j$ are the mean and standard deviation of the same feature, respectively. After standardization, the covariance matrix is computed as:
$$C_x = \frac{1}{n} X X^{T}$$
where $X$ is the standardized data matrix, $n$ is the number of samples, and $C_x$ is the covariance matrix. The eigenvectors of the covariance matrix define the principal component directions. The transformed PCA feature space is obtained as:
$$P_x = V^{T} X$$
where V contains the selected eigenvectors associated with the largest eigenvalues. By retaining only the first k components, PCA reduces the input dimensionality while preserving the dominant variance structure of the DGA feature space. PCA is attractive for industrial diagnostic systems because it is deterministic, computationally efficient, and relatively interpretable. However, since PCA is a linear projection, it may not fully capture nonlinear relationships among DGA features, particularly when discharge and thermal fault classes overlap in curved or complex regions of the feature space.
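The PCA steps above (standardization, covariance, eigendecomposition, projection onto the first k components) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `pca_fit` and `pca_transform` are hypothetical, the data are synthetic, and samples are stored in rows, so the covariance is computed as Z^T Z / n and the projection as Z V, which are the transposed forms of the expressions in the text.

```python
import numpy as np

def pca_fit(X, k):
    """Fit PCA on X (n_samples x m_features) and keep the top-k components."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0               # assumption: guard against constant features
    Z = (X - mu) / sigma                  # standardized features
    C = (Z.T @ Z) / Z.shape[0]            # m x m covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]     # re-sort to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    V = eigvecs[:, :k]                    # selected eigenvectors
    explained = eigvals[:k] / eigvals.sum()
    return mu, sigma, V, explained

def pca_transform(X, mu, sigma, V):
    """Project standardized samples onto the retained principal components."""
    return ((X - mu) / sigma) @ V

# Synthetic stand-in for a 15-dimensional feature matrix (not DGA data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 15))
X_demo[:, 5] = X_demo[:, 1:5].sum(axis=1)  # inject a redundant, correlated column
mu, sigma, V, explained = pca_fit(X_demo, k=4)
P_demo = pca_transform(X_demo, mu, sigma, V)
print(P_demo.shape, explained.round(3))
```

The statistics `mu`, `sigma`, and `V` are estimated once and then reused, so a held-out test set can be projected with parameters fitted on the training set only.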

2.3. Nonlinear Feature Extraction Using Bottleneck Autoencoder

Autoencoders are neural network-based feature extraction models designed to learn compressed representations of input data. Unlike PCA, which performs a linear projection, an Autoencoder can learn nonlinear mappings through an encoder–decoder structure. This makes it suitable for datasets where relevant diagnostic information may lie on nonlinear manifolds [22,23]. A bottleneck Autoencoder consists of two main parts: an encoder and a decoder. The encoder maps the original input vector into a lower-dimensional latent representation, while the decoder reconstructs the original input from this compressed code. The bottleneck layer forces the network to retain only the most informative structure of the input data, thereby reducing redundancy and suppressing less useful variations. The encoder function is defined as:
$$\phi : \mathbb{R}^{d} \to \mathbb{R}^{p}$$
where $d$ is the input dimension and $p$ is the bottleneck or latent dimension, with $p < d$. The decoder function is expressed as:
$$\psi : \mathbb{R}^{p} \to \mathbb{R}^{d}$$
The reconstructed input is obtained by composing the encoder and decoder:
$$\hat{X} = \psi(\phi(X))$$
The Autoencoder is trained by minimizing the reconstruction loss, commonly expressed using mean squared error (MSE):
$$L(\phi, \psi) = \frac{1}{n} \sum_{i=1}^{n} \left\| X_i - \psi(\phi(X_i)) \right\|^{2}$$
where $X_i$ is the original input sample, $\hat{X}_i = \psi(\phi(X_i))$ is the reconstructed sample, and $n$ is the number of training samples. In this study, the Autoencoder uses an encoder architecture of $m \to 32 \to k$, where $m = 15$ is the original DGA feature dimension and $k$ is the bottleneck dimension. The decoder mirrors this structure as $k \to 32 \to m$, with a linear output layer. The Autoencoder is trained using standardized DGA features and the Adam optimizer. The learned bottleneck representation is then used as the input to the ANN classifier. The bottleneck Autoencoder architecture for nonlinear DGA feature extraction is illustrated in Figure 1 [3].
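A bottleneck Autoencoder with the m → 32 → k → 32 → m layout and a linear output layer can be sketched in plain NumPy as follows. Several details are assumptions made only for illustration, since the text does not specify them: tanh hidden activations, a linear bottleneck, Glorot-style initialization, full-batch updates, and the learning rate and epoch count. The function names `train_autoencoder` and `encode` are hypothetical.

```python
import numpy as np

def init_layer(rng, n_in, n_out):
    # Glorot-style uniform initialization (an assumption, not from the paper).
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out)), np.zeros(n_out)

def train_autoencoder(X, k, epochs=300, lr=1e-2, seed=0):
    """Full-batch Adam training of an m -> 32 -> k -> 32 -> m autoencoder on MSE."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    sizes = [m, 32, k, 32, m]
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        params += list(init_layer(rng, n_in, n_out))
    mom = [np.zeros_like(p) for p in params]   # Adam first-moment estimates
    vel = [np.zeros_like(p) for p in params]   # Adam second-moment estimates
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    losses = []
    for t in range(1, epochs + 1):
        W1, c1, W2, c2, W3, c3, W4, c4 = params
        # Forward pass: encoder, bottleneck, decoder.
        h1 = np.tanh(X @ W1 + c1)
        z = h1 @ W2 + c2                       # linear bottleneck code (assumption)
        h2 = np.tanh(z @ W3 + c3)
        X_hat = h2 @ W4 + c4                   # linear output layer
        err = X_hat - X
        losses.append(np.mean(np.sum(err ** 2, axis=1)))
        # Backward pass for L = (1/n) * sum_i ||X_i - X_hat_i||^2.
        d4 = 2.0 * err / n
        d3 = (d4 @ W4.T) * (1.0 - h2 ** 2)
        d2 = d3 @ W3.T
        d1 = (d2 @ W2.T) * (1.0 - h1 ** 2)
        grads = [X.T @ d1, d1.sum(axis=0), h1.T @ d2, d2.sum(axis=0),
                 z.T @ d3, d3.sum(axis=0), h2.T @ d4, d4.sum(axis=0)]
        # Adam update with bias correction.
        for i, g in enumerate(grads):
            mom[i] = beta1 * mom[i] + (1 - beta1) * g
            vel[i] = beta2 * vel[i] + (1 - beta2) * g ** 2
            m_hat = mom[i] / (1 - beta1 ** t)
            v_hat = vel[i] / (1 - beta2 ** t)
            params[i] = params[i] - lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, losses

def encode(X, params):
    """Return the k-dimensional bottleneck code for each row of X."""
    W1, c1, W2, c2 = params[:4]
    return np.tanh(X @ W1 + c1) @ W2 + c2
```

After training on standardized features, `encode(X, params)` yields the latent codes that would be passed to the downstream classifier. Unlike PCA, the result depends on the random seed, which is one source of the training variability discussed in this paper.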

2.4. ANN Classifier

A Multilayer Perceptron (MLP) is used as the classifier for both feature extraction pipelines. The purpose of using the same ANN classifier after PCA and Autoencoder feature extraction is to ensure that performance differences are caused by the feature extraction stage rather than by the classification model [24,25,26]. The classifier receives k input features, corresponding either to PCA components or Autoencoder latent codes. It consists of one hidden layer with 20 neurons using the hyperbolic tangent activation function and an output layer with six neurons corresponding to the IEC 60599 fault classes: PD, D1, D2, T1, T2, and T3. The output layer uses a softmax activation function to generate class probabilities. The ANN is trained using the L-BFGS optimizer with L2 regularization ($\alpha = 0.01$) and a maximum of 2000 training epochs. This controlled design ensures that both PCA+ANN and AE+ANN are evaluated under the same classification conditions. Therefore, any observed difference in accuracy, F1-score, or cross-validation stability can be attributed primarily to the difference between linear and nonlinear feature extraction.
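The forward computation of the classifier described above (k inputs, 20 tanh hidden units, 6 softmax outputs) and an L2-penalized objective can be sketched as follows. This is an illustration under stated assumptions: the cross-entropy loss and the exact scaling of the penalty term are not given in the text, the weights here are random rather than trained, and the L-BFGS optimization itself is omitted. The function names are hypothetical.

```python
import numpy as np

CLASSES = ["PD", "D1", "D2", "T1", "T2", "T3"]

def softmax(a):
    a = a - a.max(axis=1, keepdims=True)   # subtract row maximum for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)

def mlp_forward(Z, W1, b1, W2, b2):
    """k inputs -> 20 tanh hidden units -> 6 softmax class probabilities."""
    H = np.tanh(Z @ W1 + b1)
    return softmax(H @ W2 + b2)

def regularized_loss(Z, y, W1, b1, W2, b2, alpha=0.01):
    """Mean cross-entropy plus an L2 weight penalty (loss form and scaling assumed)."""
    P = mlp_forward(Z, W1, b1, W2, b2)
    n = Z.shape[0]
    cross_entropy = -np.mean(np.log(P[np.arange(n), y] + 1e-12))
    penalty = 0.5 * alpha * (np.sum(W1 ** 2) + np.sum(W2 ** 2)) / n
    return cross_entropy + penalty

# Untrained random weights and random latent features, for shape checking only.
rng = np.random.default_rng(0)
k = 11
W1, b1 = rng.normal(scale=0.1, size=(k, 20)), np.zeros(20)
W2, b2 = rng.normal(scale=0.1, size=(20, len(CLASSES))), np.zeros(len(CLASSES))
Z_demo = rng.normal(size=(5, k))
probs = mlp_forward(Z_demo, W1, b1, W2, b2)
print([CLASSES[i] for i in probs.argmax(axis=1)])
```

Because the same function is applied to PCA components and to Autoencoder codes, only the contents of `Z` change between the two pipelines.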

2.5. State-of-the-Art in DGA-Based Transformer Fault Diagnosis

Recent studies on transformer condition monitoring show that artificial intelligence and machine learning are increasingly being adopted to improve the reliability of fault diagnosis beyond conventional rule-based DGA interpretation. Ref. [1] applied pattern-recognition ANN models for distinguishing transformer fault conditions from normal operating states, showing that neural classifiers can improve diagnostic automation when sufficient fault patterns are available. Ref. [2] demonstrated the broader value of machine learning in transformer condition assessment by applying numerical indices and learning-based interpretation to frequency response analysis, confirming the relevance of data-driven methods for transformer diagnostics beyond DGA alone. Root-cause analysis has also benefited from machine learning. Ref. [7] showed that ML-assisted failure analysis can improve the identification of transformer failure mechanisms, supporting the view that intelligent diagnostic systems are useful for both predictive maintenance and post-failure investigation. More recently, ref. [8] provided a systematic review of AI and machine learning in transformer fault diagnosis, emphasizing that intelligent models can overcome several limitations of traditional diagnostic methods, particularly when nonlinear relationships exist between input features and fault mechanisms. Deep learning methods have also been introduced to improve the representational capacity of diagnostic systems. Ref. [9] proposed an improved deep coupled dense convolutional neural network for transformer fault diagnosis, demonstrating the growing role of deep architectures in extracting discriminative patterns from transformer diagnostic data. Ref. [11] specifically compared Autoencoders and PCA for low-dimensional DGA-based fault diagnosis, reporting that nonlinear feature learning may provide advantages under strong dimensionality reduction. 
This study is closely related to the present work; however, the current paper extends the comparison by evaluating both methods across all latent dimensions and by using an identical ANN classifier for a controlled head-to-head assessment. PCA-based approaches remain important because of their simplicity, interpretability, and stability. Ref. [12] combined PCA with ANN for transformer fault diagnosis and showed that PCA can reduce redundancy before classification. However, the diagnostic performance of PCA depends on how well linear projections preserve class-discriminative gas patterns. In contrast, ref. [15] used generative adversarial networks and deep stacked Autoencoders, showing that deep representation learning can improve fault diagnosis by learning nonlinear structures from DGA data. These studies indicate that the main unresolved issue is not whether feature extraction is useful, but whether linear or nonlinear feature extraction is preferable under controlled diagnostic conditions. Several studies also provide methodological foundations relevant to feature-space representation and dimensionality reduction. Ref. [18] introduced t-SNE as a nonlinear visualization method for high-dimensional data, which is relevant for understanding separability in latent spaces. Ref. [20] proposed a feature-extraction and ensemble-learning model for transformer fault diagnosis, further confirming that diagnostic performance can be improved when informative transformed features are supplied to classifiers. Ref. [3] reviewed deep Autoencoder neural networks and emphasized their usefulness for compression, representation learning, and nonlinear feature extraction. Ref. [27] also demonstrated the effectiveness of stacked Autoencoders for feature extraction, supporting their application in problems where nonlinear latent structures are expected.
Although these studies have advanced transformer fault diagnosis, several gaps remain. Many works use different datasets, feature spaces, classifiers, and validation procedures, making direct comparison difficult. Some studies emphasize overall accuracy but provide limited analysis of per-class behavior, cross-validation stability, or compression effects across different latent dimensions. Therefore, the present study addresses this gap by comparing PCA+ANN and AE+ANN under identical experimental conditions using the same 15-dimensional DGA feature space, the same IEC 60599-labeled dataset, and the same ANN classifier. The comparison of existing state-of-the-art studies and the proposed work is summarized in Table 2.

3. Methodology

3.1. Comparative Pipeline Architecture

This study adopts a controlled comparative methodology to evaluate the diagnostic effectiveness of linear and nonlinear feature extraction for DGA-based transformer fault classification. Two parallel diagnostic pipelines are developed, as shown in Figure 2. The first pipeline combines Principal Component Analysis with an Artificial Neural Network classifier. The second pipeline combines a bottleneck Autoencoder with the same ANN classifier. Both pipelines follow the same general structure: feature engineering, standardization, dimensionality reduction, ANN-based classification, and performance evaluation. The only methodological difference between the two pipelines is the feature extraction stage. In the PCA+ANN pipeline, the standardized 15-dimensional DGA feature vector is projected into a linear k-dimensional principal component space. In the AE+ANN pipeline, the same standardized feature vector is compressed into a nonlinear k-dimensional bottleneck representation. The extracted features are then passed to an identical ANN classifier. This design ensures that the feature extraction method is treated as the independent variable, while classification performance is treated as the dependent variable.
All remaining experimental conditions, including the dataset, train/test split, feature set, scaling procedure, ANN architecture, optimizer, and evaluation metrics, are kept constant. This controlled design is important because it ensures that differences in diagnostic performance can be attributed primarily to PCA-based linear projection or Autoencoder-based nonlinear encoding rather than to unrelated model or data-processing variations.
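The controlled design can be expressed as a loop in which only the feature extractor changes while the data split, the classifier, and the metric stay fixed. The sketch below illustrates that structure only. To keep it short and dependency-free, a random linear projection stands in for the Autoencoder encoder and a nearest-centroid rule stands in for the ANN; both stand-ins, the function names, and the synthetic data are assumptions and are not the models used in this study.

```python
import numpy as np

def run_comparison(extractors, fit_predict, X_tr, y_tr, X_te, y_te, ks):
    """Score the SAME classifier on features from each extractor at each k."""
    results = {}
    for name, make_extractor in extractors.items():
        for k in ks:
            transform = make_extractor(X_tr, k)   # fitted on training data only
            y_pred = fit_predict(transform(X_tr), y_tr, transform(X_te))
            results[(name, k)] = float(np.mean(y_pred == y_te))  # test accuracy
    return results

def pca_extractor(X_tr, k):
    # Linear extractor: project onto the top-k right singular vectors.
    mu = X_tr.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_tr - mu, full_matrices=False)
    return lambda X: (X - mu) @ Vt[:k].T

def random_projection_extractor(X_tr, k, seed=0):
    # Stand-in for a learned encoder (illustration only, not an Autoencoder).
    R = np.random.default_rng(seed).normal(size=(X_tr.shape[1], k))
    return lambda X: X @ R

def nearest_centroid(Z_tr, y_tr, Z_te):
    # Stand-in classifier (illustration only, not the ANN).
    labels = np.unique(y_tr)
    centroids = np.stack([Z_tr[y_tr == c].mean(axis=0) for c in labels])
    dist = ((Z_te[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return labels[dist.argmin(axis=1)]

# Synthetic three-class data in 15 dimensions (not DGA data).
rng = np.random.default_rng(1)
centers = rng.normal(scale=5.0, size=(3, 15))
y_all = np.repeat(np.arange(3), 60)
X_all = centers[y_all] + rng.normal(size=(180, 15))
extractors = {"pca": pca_extractor, "random_projection": random_projection_extractor}
results = run_comparison(extractors, nearest_centroid,
                         X_all[::2], y_all[::2], X_all[1::2], y_all[1::2],
                         ks=range(1, 16))
print(max(results, key=results.get))
```

Swapping the stand-ins for a trained encoder and the MLP classifier would leave `run_comparison` unchanged, which is the point of the design: any difference in `results` between the two keys at a given k comes from the extractor.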

3.2. Dataset Description and Class Distribution

The dataset used in this study consists of 595 DGA transformer fault records from the power supply and is labeled according to the IEC 60599 fault classification scheme [28]. Representative DGA cases for each investigated transformer fault type, with gas concentrations in ppm, are detailed in Table A1. Six fault classes are considered: partial discharge (PD), low-energy discharge (D1), high-energy discharge (D2), low-temperature thermal fault (T1), medium-temperature thermal fault (T2), and high-temperature thermal fault (T3). These classes correspond directly to the IEC 60599 fault categories described in Section 2.1. To preserve the class distribution during model development and evaluation, a stratified 80/20 train/test split is used. The training set contains 476 samples, while the testing set contains 119 samples. Stratification ensures that each class is proportionally represented in both subsets, reducing the risk that model performance is biased by class imbalance. The distribution of samples is presented in Table 3 and visualized in Figure 3.
The class distribution shows that T3 is the most represented class with 117 samples, while PD is the least represented class with 58 samples. This moderate imbalance is important when interpreting the classification results, particularly the per-class precision, recall, and F1-score values reported in the results section.
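The stratified 80/20 split can be illustrated with a minimal pure-Python sketch. The helper name `stratified_split`, the random seed, and the per-class rounding rule are assumptions made for illustration. Only the totals (595, 476, 119) and the T3 and PD counts come from the text, and the remaining four class counts are hypothetical placeholders.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices so each class keeps roughly the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        # Per-class rounding is an illustrative choice, not the authors' rule.
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Only T3 = 117 and PD = 58 are stated in the text; the other four counts
# are hypothetical placeholders chosen so that the total is 595.
counts = {"PD": 58, "D1": 105, "D2": 105, "T1": 105, "T2": 105, "T3": 117}
labels = [c for c, n in counts.items() for _ in range(n)]

train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # prints: 476 119
```

With other class counts, per-class rounding may not reproduce the 476/119 totals exactly, so a library splitter may allocate the remainder differently.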

3.3. Feature Construction

The original DGA measurements consist of five dissolved gas concentrations:
$$\mathbf{g} = [\mathrm{H_2},\ \mathrm{CH_4},\ \mathrm{C_2H_6},\ \mathrm{C_2H_4},\ \mathrm{C_2H_2}]$$
These gases are selected because they are strongly associated with discharge and thermal degradation mechanisms in transformer oil. To improve diagnostic representation, the raw gas measurements are expanded using total hydrocarbon content and IEC-aligned gas ratios. The total hydrocarbon content, denoted as $\mathrm{TCH}$, is computed as:
$$\mathrm{TCH} = \mathrm{CH_4} + \mathrm{C_2H_6} + \mathrm{C_2H_4} + \mathrm{C_2H_2}$$
Nine gas-ratio features are then constructed to capture relative gas-generation behavior:
$$r_1 = \frac{\mathrm{CH_4}}{\mathrm{H_2} + \varepsilon}$$
$$r_2 = \frac{\mathrm{C_2H_2}}{\mathrm{C_2H_4} + \varepsilon}$$
$$r_3 = \frac{\mathrm{C_2H_4}}{\mathrm{C_2H_6} + \varepsilon}$$
$$r_4 = \frac{\mathrm{C_2H_2}}{\mathrm{CH_4} + \varepsilon}$$
$$r_5 = \frac{\mathrm{C_2H_6}}{\mathrm{CH_4} + \varepsilon}$$
$$r_6 = \frac{\mathrm{C_2H_4}}{\mathrm{TCH} + \varepsilon}$$
$$r_7 = \frac{\mathrm{C_2H_6}}{\mathrm{TCH} + \varepsilon}$$
$$r_8 = \frac{\mathrm{CH_4}}{\mathrm{TCH} + \varepsilon}$$
$$r_9 = \frac{\mathrm{C_2H_2}}{\mathrm{TCH} + \varepsilon}$$
where $\varepsilon = 10^{-6}$ is added to each denominator to avoid division by zero. The final input feature vector is therefore defined as:
$$\mathbf{x} = [\mathrm{H_2},\ \mathrm{CH_4},\ \mathrm{C_2H_6},\ \mathrm{C_2H_4},\ \mathrm{C_2H_2},\ \mathrm{TCH},\ r_1,\ r_2,\ \ldots,\ r_9] \in \mathbb{R}^{15}$$
Thus, each DGA sample is represented by a 15-dimensional feature vector. This feature construction combines raw gas concentrations, total hydrocarbon content (TCH), and IEC-aligned gas ratios. The raw gases preserve magnitude information, while the ratios emphasize diagnostic relationships associated with IEC 60599 interpretation. As illustrated in Figure 4, the resulting gas distributions show class-dependent behavior, with acetylene ($\mathrm{C_2H_2}$) providing particularly strong visual discrimination between discharge and thermal fault classes.
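The feature construction described in this subsection can be sketched in a few lines of Python. The function name `build_features` and the sample gas values are hypothetical, and the sketch assumes the small constant is $\varepsilon = 10^{-6}$.

```python
EPS = 1e-6  # assumed value of the small constant added to each denominator

def build_features(h2, ch4, c2h6, c2h4, c2h2, eps=EPS):
    """Return the 15-dimensional vector: 5 gases, TCH, and 9 ratios (ppm inputs)."""
    tch = ch4 + c2h6 + c2h4 + c2h2
    ratios = [
        ch4 / (h2 + eps),     # r1 = CH4 / H2
        c2h2 / (c2h4 + eps),  # r2 = C2H2 / C2H4
        c2h4 / (c2h6 + eps),  # r3 = C2H4 / C2H6
        c2h2 / (ch4 + eps),   # r4 = C2H2 / CH4
        c2h6 / (ch4 + eps),   # r5 = C2H6 / CH4
        c2h4 / (tch + eps),   # r6 = C2H4 / TCH
        c2h6 / (tch + eps),   # r7 = C2H6 / TCH
        ch4 / (tch + eps),    # r8 = CH4 / TCH
        c2h2 / (tch + eps),   # r9 = C2H2 / TCH
    ]
    return [h2, ch4, c2h6, c2h4, c2h2, tch] + ratios

# Hypothetical sample (ppm), used only to show the shape of the output.
x = build_features(h2=100.0, ch4=50.0, c2h6=20.0, c2h4=30.0, c2h2=0.0)
print(len(x), x[5])  # prints: 15 100.0
```

Because the offset sits in every denominator, a sample with all-zero gas readings yields zero ratios rather than a division error.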

3.4. Data Standardization

Before feature extraction, all input features are standardized to remove scale effects. This step is essential because gas concentrations and gas ratios may have different numerical ranges. Without standardization, features with larger magnitudes could dominate the PCA variance structure or affect Autoencoder optimization. For each feature $x_j$, standardization is performed as:
$$z_j = \frac{x_j - \mu_j}{\sigma_j}$$
where $z_j$ is the standardized feature, $x_j$ is the original feature, $\mu_j$ is the training-set mean, and $\sigma_j$ is the training-set standard deviation. The standardized feature matrix is denoted as:
$$\mathbf{Z} \in \mathbb{R}^{n \times 15}$$
where n is the number of samples. The scaler is fitted only on the training data and then applied to the test data to avoid information leakage.
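A minimal sketch of leakage-free standardization follows: the statistics are estimated on the training rows only and then reused for the test rows. The function names are hypothetical. The use of the population standard deviation and the guard for constant features are assumptions, since the text does not specify either.

```python
from statistics import fmean, pstdev

def fit_scaler(train_rows):
    """Estimate per-feature mean and standard deviation from training data only."""
    columns = list(zip(*train_rows))
    means = [fmean(col) for col in columns]
    # Population standard deviation is assumed; a zero spread falls back to 1.0.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds

def transform(rows, means, stds):
    """Apply z = (x - mu) / sigma using the training-set statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

# Toy two-feature data, used only to show the fit/transform separation.
train = [[1.0, 10.0], [3.0, 30.0]]
test = [[5.0, 20.0]]

means, stds = fit_scaler(train)
z_train = transform(train, means, stds)
z_test = transform(test, means, stds)  # test data never influences means/stds
print(z_train, z_test)
```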

3.5. Feature-Space Correlation Analysis

A Pearson correlation analysis is performed to examine redundancy in the engineered 15-dimensional feature space. The correlation coefficient between two features $x_i$ and $x_j$ is computed as:
$$\rho_{ij} = \frac{\sum_{l=1}^{n} (x_{li} - \bar{x}_i)(x_{lj} - \bar{x}_j)}{\sqrt{\sum_{l=1}^{n} (x_{li} - \bar{x}_i)^2}\ \sqrt{\sum_{l=1}^{n} (x_{lj} - \bar{x}_j)^2}}$$
where $x_{li}$ and $x_{lj}$ are the values of features $i$ and $j$ for sample $l$, while $\bar{x}_i$ and $\bar{x}_j$ are the corresponding feature means. The full correlation matrix is expressed as:
$$\mathbf{R} = \begin{bmatrix} \rho_{11} & \rho_{12} & \cdots & \rho_{1m} \\ \rho_{21} & \rho_{22} & \cdots & \rho_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{m1} & \rho_{m2} & \cdots & \rho_{mm} \end{bmatrix}$$
where $m = 15$. As shown in Figure 5, the engineered DGA feature space exhibits two major correlation structures. The raw gas concentrations, especially $\mathrm{H_2}$, $\mathrm{CH_4}$, $\mathrm{C_2H_6}$, and $\mathrm{C_2H_4}$, are strongly correlated with one another and with $\mathrm{TCH}$. A second correlation block is formed by the $\mathrm{TCH}$-normalized ratio features, including $\mathrm{C_2H_4}/\mathrm{TCH}$, $\mathrm{C_2H_6}/\mathrm{TCH}$, $\mathrm{CH_4}/\mathrm{TCH}$, and $\mathrm{C_2H_2}/\mathrm{TCH}$. This correlation structure confirms that dimensionality reduction is methodologically justified for both PCA and Autoencoder compression.
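The Pearson coefficient and the full correlation matrix can be computed directly from their definitions, as in the sketch below. The function names and the toy data are hypothetical, and the sketch assumes that no feature is constant, since a constant feature would give a zero denominator.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equally long feature columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def correlation_matrix(rows):
    """Return the m x m matrix of pairwise correlations for n x m data."""
    columns = list(zip(*rows))
    return [[pearson(a, b) for b in columns] for a in columns]

# Toy data: the second feature is a multiple of the first, the third is reversed.
rows = [[1.0, 2.0, 4.0], [2.0, 4.0, 3.0], [3.0, 6.0, 2.0], [4.0, 8.0, 1.0]]
R = correlation_matrix(rows)
for row in R:
    print([round(value, 3) for value in row])
```

In the toy data the first two columns are perfectly correlated and the third is perfectly anti-correlated with both, which is the kind of redundancy that motivates compression.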

3.6. PCA-Based Linear Feature Extraction

For the PCA+ANN pipeline, PCA is applied to the standardized feature matrix $\mathbf{Z}$. The covariance matrix is computed as:
$$\mathbf{C} = \frac{1}{n - 1} \mathbf{Z}^{T} \mathbf{Z}$$
The eigenvalue decomposition of the covariance matrix is given by:
$$\mathbf{C} \mathbf{v}_j = \lambda_j \mathbf{v}_j$$
where $\lambda_j$ is the eigenvalue associated with eigenvector $\mathbf{v}_j$. The eigenvalues are ordered as:
$$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_m$$
The cumulative explained variance ratio for the first $k$ principal components is calculated as:
$$\mathrm{EVR}(k) = \frac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{m} \lambda_j} \times 100\%$$
The PCA-transformed feature matrix is obtained by projecting the standardized data onto the first $k$ eigenvectors:
$$\mathbf{Z}_{\mathrm{PCA}}^{(k)} = \mathbf{Z} \mathbf{V}_k$$
where $\mathbf{V}_k = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k]$. In this study, $k$ is varied from 1 to 15 to evaluate how the number of retained principal components affects classification performance. This directly supports the results in which PCA+ANN achieved its best test accuracy at $k = 11$, corresponding to approximately 99.9% cumulative variance retention.
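The PCA steps above (covariance matrix, eigendecomposition, ordering, explained variance, projection) can be sketched with NumPy. The function names and the synthetic data are assumptions for illustration. The matrix passed in is assumed to be already standardized, and the numbers produced here are unrelated to the DGA results.

```python
import numpy as np

def pca_fit(Z):
    """Eigendecomposition of C = Z^T Z / (n - 1), sorted by descending eigenvalue."""
    n = Z.shape[0]
    C = (Z.T @ Z) / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

def explained_variance_ratio(eigvals, k):
    """Cumulative explained variance of the first k components, in percent."""
    return 100.0 * eigvals[:k].sum() / eigvals.sum()

def pca_transform(Z, eigvecs, k):
    """Project standardized data onto the first k eigenvectors."""
    return Z @ eigvecs[:, :k]

# Synthetic stand-in for the standardized 15-feature matrix (not DGA data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 15))
X[:, 5] = X[:, 1:5].sum(axis=1)  # one redundant column, loosely mimicking TCH
Z = (X - X.mean(axis=0)) / X.std(axis=0)

eigvals, V = pca_fit(Z)
for k in (1, 5, 11, 15):
    print(k, round(explained_variance_ratio(eigvals, k), 2))
Z_pca = pca_transform(Z, V, 11)
print(Z_pca.shape)
```

In a train/test setting, the eigenvectors would be estimated on the training matrix only and then reused to project the test matrix.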

3.7. Autoencoder-Based Nonlinear Feature Extraction

For the AE+ANN pipeline, a bottleneck Autoencoder is trained on the same standardized 15-dimensional feature space. The encoder maps the input vector $\mathbf{z} \in \mathbb{R}^{15}$ into a lower-dimensional latent representation:
$$\mathbf{h}_1 = f(\mathbf{W}_1 \mathbf{z} + \mathbf{b}_1)$$
$$\mathbf{q}_k = f(\mathbf{W}_2 \mathbf{h}_1 + \mathbf{b}_2)$$
where $\mathbf{q}_k \in \mathbb{R}^{k}$ is the bottleneck feature vector, $\mathbf{W}_1$ and $\mathbf{W}_2$ are encoder weight matrices, $\mathbf{b}_1$ and $\mathbf{b}_2$ are bias terms, and $f(\cdot)$ denotes the nonlinear activation function. The decoder reconstructs the input as:
$$\mathbf{h}_2 = f(\mathbf{W}_3 \mathbf{q}_k + \mathbf{b}_3)$$
$$\hat{\mathbf{z}} = \mathbf{W}_4 \mathbf{h}_2 + \mathbf{b}_4$$
where $\hat{\mathbf{z}}$ is the reconstructed standardized input. A linear output layer is used in the decoder to reconstruct continuous standardized features. The reconstruction loss is defined as:
$$\mathrm{MSE}(k) = \frac{1}{n} \sum_{i=1}^{n} \lVert \mathbf{z}_i - \hat{\mathbf{z}}_i \rVert_2^2$$
The Autoencoder architecture used in this study is selected as a balanced shallow nonlinear compression model suitable for the moderate dataset size:
$$15 \rightarrow 32 \rightarrow k \rightarrow 32 \rightarrow 15$$
where the bottleneck dimensionality $k$ is varied from 1 to 15. Each hidden layer contains 32 neurons, and the model is trained using the Adam optimizer. After training, the decoder is discarded, and only the encoder output $\mathbf{q}_k$ is used as the compressed feature representation for ANN classification:
$$\mathbf{Z}_{\mathrm{AE}}^{(k)} = \phi(\mathbf{Z})$$
where $\phi(\cdot)$ denotes the trained encoder mapping. This procedure supports the feature extraction diagnostics reported in Figure 6, where Autoencoder reconstruction error decreases sharply from $k = 1$ to $k = 4$. The best AE+ANN test accuracy is later obtained at $k = 4$, indicating that the nonlinear encoder compressed most diagnostically useful information into a compact latent representation.
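The data flow through the 15-32-k-32-15 architecture can be illustrated with a forward pass in NumPy. This sketch uses untrained random weights and omits Adam training entirely, so it shows only tensor shapes and the reconstruction loss. The tanh activation is an assumption, because the text does not name the Autoencoder activation, and all function names are hypothetical.

```python
import numpy as np

def init_autoencoder(k, n_in=15, n_hidden=32, seed=0):
    """Random (untrained) weights for the 15 -> 32 -> k -> 32 -> 15 layout."""
    rng = np.random.default_rng(seed)
    shapes = [(n_hidden, n_in), (k, n_hidden), (n_hidden, k), (n_in, n_hidden)]
    return [(rng.normal(0.0, 0.1, size=s), np.zeros(s[0])) for s in shapes]

def encode(Z, params, f=np.tanh):
    """Encoder: h1 = f(W1 z + b1), q = f(W2 h1 + b2); tanh is an assumed f."""
    (W1, b1), (W2, b2) = params[0], params[1]
    H1 = f(Z @ W1.T + b1)
    return f(H1 @ W2.T + b2)

def decode(Q, params, f=np.tanh):
    """Decoder: h2 = f(W3 q + b3), followed by a linear output layer."""
    (W3, b3), (W4, b4) = params[2], params[3]
    H2 = f(Q @ W3.T + b3)
    return H2 @ W4.T + b4

def reconstruction_mse(Z, Z_hat):
    """Mean over samples of the squared L2 reconstruction error."""
    return float(np.mean(np.sum((Z - Z_hat) ** 2, axis=1)))

# Synthetic standardized inputs (not DGA data), bottleneck size k = 4.
Z = np.random.default_rng(1).normal(size=(8, 15))
params = init_autoencoder(k=4)
Q = encode(Z, params)      # compressed features passed on to the classifier
Z_hat = decode(Q, params)  # reconstruction, used only during training
print(Q.shape, Z_hat.shape, reconstruction_mse(Z, Z_hat))
```

After training, only `encode` would be kept, which corresponds to discarding the decoder.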

3.8. ANN Classification Procedure

After dimensionality reduction, the extracted features are passed to the same ANN classifier. For PCA+ANN, the classifier input is $\mathbf{Z}_{\mathrm{PCA}}^{(k)}$. For AE+ANN, the classifier input is $\mathbf{Z}_{\mathrm{AE}}^{(k)}$. The ANN input vector for sample $i$ is therefore defined as:
$$\mathbf{u}_i = \begin{cases} \mathbf{z}_{\mathrm{PCA},i}^{(k)}, & \text{for PCA+ANN} \\ \mathbf{z}_{\mathrm{AE},i}^{(k)}, & \text{for AE+ANN} \end{cases}$$
The hidden layer output is computed as:
$$\mathbf{a}_i = \tanh(\mathbf{W}_h \mathbf{u}_i + \mathbf{b}_h)$$
where $\mathbf{W}_h$ and $\mathbf{b}_h$ are the hidden-layer weights and biases. The output logits are:
$$\mathbf{o}_i = \mathbf{W}_o \mathbf{a}_i + \mathbf{b}_o$$
The predicted class probability for class $c$ is computed using the softmax function:
$$\hat{y}_{ci} = \frac{\exp(o_{ci})}{\sum_{r=1}^{6} \exp(o_{ri})}$$
The predicted fault class is then:
$$\hat{c}_i = \arg\max_{c}\ \hat{y}_{ci}$$
The ANN contains an input layer with $k$ neurons (one per extracted feature), one hidden layer with 20 neurons using the hyperbolic tangent activation function, and six output neurons corresponding to the IEC 60599 classes. The same classifier structure is used for every value of $k$ and for both feature extraction pipelines. The ANN is trained using the L-BFGS optimizer with L2 regularization:
$$J = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{6} y_{ci} \log(\hat{y}_{ci}) + \alpha \lVert \mathbf{W} \rVert_2^2$$
where $y_{ci}$ is the one-hot encoded true label, $\hat{y}_{ci}$ is the predicted class probability, $\alpha = 0.01$ is the regularization coefficient, and $\mathbf{W}$ denotes the trainable ANN weights. The maximum number of training iterations is set to 2000.
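The classifier equations in this subsection (tanh hidden layer, softmax output, and regularized cross-entropy) can be sketched as a NumPy forward pass. The weights below are random and untrained, L-BFGS optimization is not shown, and the function names are hypothetical. The penalty is written exactly as in the loss above, although library implementations may scale the L2 term differently.

```python
import numpy as np

def forward(U, W_h, b_h, W_o, b_o):
    """Class probabilities for inputs U of shape (n, k)."""
    A = np.tanh(U @ W_h.T + b_h)          # hidden layer with 20 tanh units
    O = A @ W_o.T + b_o                   # six output logits
    O = O - O.max(axis=1, keepdims=True)  # shift for numerical stability
    P = np.exp(O)
    return P / P.sum(axis=1, keepdims=True)

def loss(P, y, weights, alpha=0.01):
    """Mean cross-entropy plus alpha times the squared L2 norm of the weights."""
    n = len(y)
    cross_entropy = -np.log(P[np.arange(n), y]).mean()
    penalty = alpha * sum(np.sum(W ** 2) for W in weights)
    return cross_entropy + penalty

# Illustrative shapes: k = 4 input features, 20 hidden units, 6 classes.
k, n_hidden, n_classes = 4, 20, 6
rng = np.random.default_rng(0)
W_h, b_h = rng.normal(0.0, 0.1, (n_hidden, k)), np.zeros(n_hidden)
W_o, b_o = rng.normal(0.0, 0.1, (n_classes, n_hidden)), np.zeros(n_classes)

U = rng.normal(size=(5, k))    # synthetic extracted features
y = np.array([0, 1, 2, 3, 4])  # synthetic integer class labels
P = forward(U, W_h, b_h, W_o, b_o)
print(P.argmax(axis=1), loss(P, y, [W_h, W_o]))
```

Because the input width equals $k$, the same 20-unit hidden layer can be reused for every latent dimension by changing only the shape of `W_h`.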

3.9. Feature Extraction Diagnostics

Feature extraction diagnostics are used to interpret how PCA and the Autoencoder compress the DGA feature space. For PCA, the cumulative explained variance $\mathrm{EVR}(k)$ in Equation (27) is used to determine how much information is retained as the number of principal components increases. For the Autoencoder, the reconstruction error $\mathrm{MSE}(k)$ in Equation (33) is used to evaluate how well the compressed latent representation reconstructs the original standardized feature space. As shown in Figure 6, PCA cumulative variance increases progressively as $k$ increases and reaches approximately 99.9% at $k = 11$. This indicates that most of the variance in the engineered DGA feature space is distributed across multiple linear directions. In contrast, the Autoencoder reconstruction error drops sharply between $k = 1$ and $k = 4$, suggesting that the nonlinear encoder captures the dominant structure of the DGA feature space using fewer latent dimensions.

3.10. Latent-Space Visualization

To qualitatively assess the separability of the learned representations, the PCA and Autoencoder latent spaces are visualized. For PCA, the first two principal components are plotted as:
$$\mathbf{s}_{\mathrm{PCA},i} = [z_{\mathrm{PC1},i},\ z_{\mathrm{PC2},i}]$$
For the Autoencoder, the first two latent dimensions are plotted as:
$$\mathbf{s}_{\mathrm{AE},i} = [q_{1i},\ q_{2i}]$$
These visualizations provide insight into whether the extracted features produce compact class clusters or overlapping class regions. As shown in Figure 7, the PCA latent space shows relatively compact thermal fault clusters, especially for T2 and T3, along the dominant principal component direction. The discharge classes PD, D1, and D2 exhibit greater overlap, reflecting the similarity of their hydrogen- and acetylene-related signatures. The Autoencoder latent space shows a nonlinear reorganization of the feature space, with improved compactness for some regions but persistent overlap among discharge classes. This visual evidence aligns with the classification results, where AE+ANN improves selected discharge fault F1-scores but PCA+ANN achieves stronger overall test accuracy and cross-validation stability.

3.11. Performance Evaluation Metrics

The diagnostic performance of both pipelines is evaluated using test accuracy, precision, recall, F1-score, and cross-validation stability. These metrics are selected because overall accuracy alone may not fully describe model behavior across imbalanced fault classes. The overall test accuracy is computed as:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
For each class $c$, precision is defined as:
$$\mathrm{Precision}_c = \frac{TP_c}{TP_c + FP_c}$$
Recall is defined as:
$$\mathrm{Recall}_c = \frac{TP_c}{TP_c + FN_c}$$
The class-specific F1-score is computed as:
$$F1_c = \frac{2 \cdot \mathrm{Precision}_c \cdot \mathrm{Recall}_c}{\mathrm{Precision}_c + \mathrm{Recall}_c}$$
The weighted average F1-score is calculated as:
$$F1_{\mathrm{weighted}} = \sum_{c=1}^{6} \frac{n_c}{N} F1_c$$
where $n_c$ is the number of test samples in class $c$, and $N$ is the total number of test samples. Cross-validation stability is reported using the mean and standard deviation of validation accuracy:
$$CV_{\mathrm{mean}} = \frac{1}{K} \sum_{j=1}^{K} A_j$$
$$CV_{\mathrm{std}} = \sqrt{\frac{1}{K - 1} \sum_{j=1}^{K} (A_j - CV_{\mathrm{mean}})^2}$$
where $A_j$ is the validation accuracy in fold $j$, and $K$ is the number of validation folds. This metric is used to compare robustness between PCA+ANN and AE+ANN. The reported results show that PCA+ANN achieves greater stability, with $62 \pm 3.5\%$, compared with $62 \pm 6.6\%$ for AE+ANN.
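The per-class metrics, the weighted F1-score, and the cross-validation summary can be computed as in the following pure-Python sketch. The function names, the toy labels, and the fold accuracies are hypothetical and are not taken from the reported experiments. Returning zero when a denominator is empty is an illustrative convention.

```python
from math import sqrt

def per_class_metrics(y_true, y_pred, classes):
    """Return {class: (precision, recall, f1, support)}."""
    results = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results[c] = (precision, recall, f1, tp + fn)
    return results

def weighted_f1(results):
    """Support-weighted average of the class-specific F1-scores."""
    total = sum(m[3] for m in results.values())
    return sum(m[3] / total * m[2] for m in results.values())

def cv_summary(fold_accuracies):
    """Mean and sample standard deviation (K - 1 denominator) over folds."""
    K = len(fold_accuracies)
    mean = sum(fold_accuracies) / K
    std = sqrt(sum((a - mean) ** 2 for a in fold_accuracies) / (K - 1))
    return mean, std

# Hypothetical labels for three of the six classes.
y_true = ["PD", "PD", "D1", "D1", "T3", "T3"]
y_pred = ["PD", "D1", "D1", "D1", "T3", "PD"]
metrics = per_class_metrics(y_true, y_pred, ["PD", "D1", "T3"])
print(metrics, weighted_f1(metrics))

# Hypothetical fold accuracies, unrelated to the reported 62% figures.
cv_mean, cv_std = cv_summary([0.70, 0.80, 0.75, 0.65, 0.85])
print(cv_mean, cv_std)
```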

3.12. Experimental Protocol

The complete experimental procedure is summarized as follows. First, the raw DGA gas measurements are expanded into a 15-dimensional feature vector using total hydrocarbon content and IEC-aligned gas ratios. Second, the dataset is divided into stratified training and testing subsets using an 80/20 split. Third, standardization is fitted on the training data and applied consistently to both training and testing sets. Fourth, PCA and Autoencoder feature extraction are performed independently for each latent dimension $k = 1, 2, \ldots, 15$. Fifth, the extracted $k$-dimensional features are used to train identical ANN classifiers. Finally, both pipelines are evaluated using test accuracy, confusion matrices, per-class precision, recall, F1-score, and cross-validation stability. This protocol ensures that the comparison between PCA+ANN and AE+ANN is systematic, controlled, and directly aligned with the results presented in Section 4. The methodology also supports the central objective of the study: to determine whether nonlinear Autoencoder-based feature extraction provides measurable diagnostic advantages over linear PCA-based dimensionality reduction for IEC 60599 transformer fault classification.

4. Results and Discussion

4.1. Classification Accuracy Across Latent Dimensions

The classification performance of the PCA+ANN and AE+ANN pipelines was first evaluated across all latent dimensions k = 1, 2, …, 15. This analysis determines how the degree of dimensionality reduction affects diagnostic performance and identifies the optimal compressed feature size for each method. Table 4 summarizes the representative accuracy values at selected latent dimensions, while Figure 8 presents the complete accuracy trend for both pipelines.
The PCA+ANN pipeline achieved its highest test accuracy of 68.9% at k = 11, where the retained cumulative variance was approximately 99.9%. This indicates that although the engineered DGA feature space contains redundancy, diagnostically useful information is distributed across several principal components. Therefore, aggressive PCA compression removes information that remains relevant for fault discrimination. The performance increase at higher k-values suggests that the ANN classifier benefits from retaining nearly the full linear variance structure of the original 15-dimensional feature space. In contrast, the AE+ANN pipeline achieved its best test accuracy of 66.4% at k = 4. This result indicates that the bottleneck Autoencoder compressed most of the useful nonlinear structure into a much smaller latent representation. The sharp decrease in reconstruction error from k = 1 to k = 4 supports this observation, showing that the Autoencoder rapidly learned a compact representation of the standardized DGA feature space. However, increasing the latent dimension beyond k = 4 did not produce a consistent accuracy improvement. At k = 15, AE+ANN accuracy decreased to 62.2%, suggesting that additional latent dimensions may reintroduce redundant or less discriminative variations into the ANN classifier. PCA+ANN outperformed AE+ANN by 2.5 percentage points at their respective optimal configurations. This confirms that, for the present dataset, linear variance-preserving feature extraction provides the strongest overall classification accuracy. However, AE+ANN achieved competitive accuracy using only four latent dimensions, making it potentially useful when compact representation is prioritized over maximum accuracy.
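Since the test set contains 119 samples, each reported accuracy corresponds to a whole number of correctly classified samples. The short check below infers those counts from the rounded percentages. The counts are derived here for illustration and are not reported in the text, and the helper name `implied_correct` is hypothetical.

```python
N_TEST = 119  # size of the held-out test set

# Test accuracies (percent) quoted in this subsection.
reported = {
    "PCA+ANN (k=11)": 68.9,
    "AE+ANN (k=4)": 66.4,
    "AE+ANN (k=15)": 62.2,
}

def implied_correct(accuracy_percent, n=N_TEST):
    """Nearest whole number of correct predictions for a rounded accuracy."""
    return round(accuracy_percent / 100 * n)

counts = {name: implied_correct(acc) for name, acc in reported.items()}
for name, correct in counts.items():
    # Recompute the percentage from the inferred count as a consistency check.
    print(name, correct, round(100 * correct / N_TEST, 1))

gap = counts["PCA+ANN (k=11)"] - counts["AE+ANN (k=4)"]
print("difference in correctly classified samples:", gap)
```

Under this reading, the 2.5 percentage-point gap between the two optimal configurations corresponds to three test samples, which is worth keeping in mind when interpreting small accuracy differences on a test set of this size.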

4.2. Confusion Matrix Analysis

Confusion matrices were used to examine the class-level behavior of the two optimal models: PCA+ANN at k = 11 and AE+ANN at k = 4. Figure 9 compares the classification patterns produced by both pipelines on the 119-sample test set.
The PCA+ANN model produced the highest overall test accuracy and showed stronger classification performance for several thermal fault classes. In particular, PCA+ANN more effectively classified T2 and T3 faults, indicating that the linear principal component space preserved important gas-ratio information associated with thermal degradation. This is consistent with the observation that thermal classes form more compact structures in the PCA latent space. The AE+ANN model showed a different class-level behavior. Although its overall accuracy was slightly lower, it improved the separation of some discharge-related classes, particularly D1 and D2. This suggests that nonlinear encoding can reorganize overlapping discharge signatures in a way that improves their distinction. However, the AE+ANN model also introduced additional misclassifications among thermal classes, especially where the Autoencoder compressed features too aggressively. Both models struggled most with the PD-D1-D2 discharge group. This is expected because these fault classes share partially overlapping gas-generation mechanisms, especially involving elevated hydrogen and acetylene-related patterns. Such overlap makes the discharge classes inherently more difficult to separate than the higher-temperature thermal classes.

4.3. Per-Class Precision, Recall, and F1-Score Analysis

To provide a more detailed diagnostic interpretation, per-class precision, recall, and F1-score were calculated for both optimal pipelines. Table 5 presents the comparative per-class metrics, while Figure 10 and Figure 11 visualize the same results.
The per-class results show that neither method dominates across all fault classes. PCA+ANN achieved higher F1-scores for PD, T1, T2, and T3, while AE+ANN achieved higher F1-scores for D1 and D2. This confirms that the two feature extraction strategies preserve different diagnostic structures in the DGA feature space. For thermal faults, PCA+ANN performed particularly well. The highest F1-score was obtained for T3, where PCA+ANN achieved 85.1%, compared with 77.6% for AE+ANN. PCA+ANN also achieved a strong F1-score of 81.0% for T2, compared with 69.8% for AE+ANN. These results indicate that the linear principal component space effectively preserves the dominant gas-generation patterns associated with thermal degradation, especially those involving ethylene and hydrocarbon concentration trends. For discharge faults, AE+ANN showed targeted improvement. The F1-score for D1 increased from 54.1% with PCA+ANN to 60.0% with AE+ANN, while the F1-score for D2 increased from 65.4% to 72.0%. These improvements suggest that the nonlinear Autoencoder representation better captures overlapping discharge-related structures that are not fully separated by linear PCA projection. However, AE+ANN showed weaker performance for PD, T2, and T3. The reduction in T2 performance was the largest, with an F1-score decrease of 11.2 percentage points compared with PCA+ANN. This indicates that although nonlinear compression benefits some discharge classes, it may suppress or distort linear gas-ratio information that is important for thermal fault classification. The weighted average F1-score was 68.1% for PCA+ANN and 65.4% for AE+ANN. Therefore, PCA+ANN remains the stronger overall model, while AE+ANN provides class-specific advantages for discharge fault discrimination.

4.4. Cross-Validation Stability Comparison

Cross-validation was used to evaluate the robustness of both pipelines on the training set. The purpose of this analysis was not only to estimate average performance, but also to determine how sensitive each method is to variation in the training data. The PCA+ANN pipeline achieved a cross-validation accuracy of:
$$62.0\% \pm 3.5\%$$
whereas the AE+ANN pipeline achieved:
$$62.0\% \pm 6.6\%$$
Although both methods obtained the same average cross-validation accuracy, PCA+ANN showed substantially lower variability, with a standard deviation of 3.5% compared with 6.6% for AE+ANN. The smaller standard deviation indicates that PCA+ANN produced more consistent results across validation folds. This stability is expected because PCA is deterministic once the training data are fixed. Its transformation is based on eigenvalue decomposition and does not depend on random weight initialization. In contrast, AE+ANN showed higher fold-to-fold variation. This is consistent with the stochastic nature of neural representation learning. The Autoencoder training process depends on initialization, nonlinear optimization, and latent-space learning dynamics. These factors can lead to different compressed representations across folds, particularly when the dataset is moderately sized. From a practical deployment perspective, this result is important. Industrial transformer diagnostic systems require not only high accuracy, but also predictable and repeatable behavior. Therefore, the lower cross-validation variability of PCA+ANN strengthens its suitability as a reliable baseline for real-world DGA-based fault diagnosis.

4.5. Benchmark Against Conventional Classifiers

To contextualize the performance of the two proposed feature-extraction pipelines, a broader classifier benchmark was conducted using the same DGA test set. Figure 12 compares Gaussian Naïve Bayes, K-Nearest Neighbors, Support Vector Machine, a standard ANN without dimensionality reduction, PCA+ANN, and AE+ANN.
Gaussian Naïve Bayes achieved the weakest performance, with an accuracy of approximately 36.1%. This poor result is expected because DGA features are strongly correlated and do not satisfy the conditional independence assumptions of Naïve Bayes. In addition, gas concentration and ratio features are typically non-Gaussian and class-overlapping, further limiting the suitability of this classifier. KNN achieved approximately 55% accuracy, while SVM achieved approximately 62% accuracy. These results indicate that distance-based and margin-based classifiers can capture some structure in the engineered DGA feature space, but they do not outperform the best feature-extraction-based ANN pipelines. The baseline ANN trained directly on the uncompressed 15-dimensional input achieved 63.9% accuracy. While this result confirms the usefulness of ANN-based nonlinear classification, it also shows that using the full uncompressed feature space does not produce the best performance. The uncompressed ANN also exhibited higher validation variance, indicating greater sensitivity to redundant input features. Both PCA+ANN and AE+ANN outperformed the baseline ANN at their optimal configurations. This supports the central hypothesis of the study: dimensionality reduction improves DGA fault classification by reducing redundancy and providing a more compact representation before ANN classification. Among all tested models, PCA+ANN achieved the best overall test accuracy, while AE+ANN achieved competitive performance with a smaller latent dimension.

4.6. Head-to-Head Comparison of PCA+ANN and AE+ANN

A direct comparison between PCA+ANN and AE+ANN is summarized in Figure 13, which presents overall test accuracy, weighted average precision, and per-class F1-score differences.
The head-to-head comparison confirms three major findings. First, PCA+ANN provides the strongest overall classification performance. It achieved 68.9% test accuracy compared with 66.4% for AE+ANN. It also achieved a higher weighted average F1-score, confirming that its advantage is not limited to a single class. Second, AE+ANN provides targeted diagnostic improvement for discharge-related faults. The F1-score increased by 5.9 percentage points for D1 and 6.6 percentage points for D2. This supports the interpretation that nonlinear encoding can help separate overlapping discharge signatures in the engineered gas-ratio feature space. Third, PCA+ANN remains stronger for most thermal fault classes. The F1-score advantage of PCA+ANN was especially clear for T2 and T3, where thermal gas-generation patterns appear to be better preserved by linear variance-based projection. This suggests that the dominant structure of the thermal classes is sufficiently represented by PCA components. It is important to note that AE+ANN did not improve T2 performance; instead, T2 F1-score decreased by 11.2 percentage points relative to PCA+ANN. Therefore, AE+ANN should not be described as uniformly superior for thermal or general fault classification. Its advantage is specific to selected discharge classes, particularly D1 and D2. The results demonstrate that PCA and Autoencoder feature extraction provide complementary diagnostic behavior. PCA is more suitable when the goal is stable, general-purpose transformer fault classification. Autoencoder-based extraction is more suitable when compact nonlinear representation or improved discharge-class discrimination is prioritized.

4.7. Practical Interpretation and Deployment Implications

The results have direct implications for practical transformer condition-monitoring systems. In industrial environments, diagnostic models must be accurate, stable, computationally efficient, and sufficiently interpretable for engineering decision support. Based on these requirements, PCA+ANN is the preferable general-purpose model for the present dataset because it achieved the highest test accuracy, stronger weighted F1-score, and lower cross-validation variability. The deterministic nature of PCA also supports deployment in settings where repeatability is important. Since PCA does not depend on random initialization or iterative neural representation learning, the extracted features are more stable across repeated experiments. This makes PCA+ANN attractive for applications requiring consistent diagnostic outputs and simpler model validation. However, AE+ANN remains valuable in specific settings. Its ability to achieve 66.4% accuracy using only k = 4 latent features indicates that nonlinear Autoencoder compression can produce compact diagnostic representations. This may be useful for embedded monitoring systems or edge-based transformer diagnostic devices where computational resources and storage are limited. In addition, the improved F1-scores for D1 and D2 suggest that AE+ANN may be useful when discharge fault discrimination is the primary diagnostic objective. Therefore, the choice between PCA+ANN and AE+ANN should depend on the deployment objective. PCA+ANN is recommended as the stronger baseline for robust industrial classification, while AE+ANN may be considered for compact or discharge-focused diagnostic systems.

4.8. Summary of Findings

The main findings from the results can be summarized as follows. PCA+ANN achieved the best overall test accuracy of 68.9% at k = 11, while AE+ANN achieved its best accuracy of 66.4% at k = 4. PCA required more latent dimensions to reach its optimum, indicating that relevant diagnostic variance is distributed across multiple linear components. In contrast, the Autoencoder achieved competitive performance using fewer latent dimensions, indicating effective nonlinear compression. Per-class analysis showed that PCA+ANN performed better for PD and thermal faults, especially T2 and T3. AE+ANN improved D1 and D2 classification, confirming its targeted advantage in separating discharge-related classes. Cross-validation results further showed that PCA+ANN was more stable, with a lower standard deviation than AE+ANN. The broader classifier benchmark confirmed that both feature-extraction pipelines outperform conventional classifiers and a baseline uncompressed ANN. These findings validate the importance of feature extraction in DGA-based transformer fault diagnosis and establish PCA+ANN as the preferred overall approach for this dataset, while highlighting AE+ANN as a useful alternative for compact nonlinear representation and discharge fault discrimination.

5. Conclusions

This paper presented a comparative evaluation of PCA+ANN and AE+ANN models for dimensionality reduction and fault classification in DGA-based power transformer diagnosis. The results showed that PCA+ANN achieved the highest overall test accuracy of 68.9% at k = 11, compared with 66.4% for AE+ANN at k = 4. PCA also demonstrated greater cross-validation stability, indicating that its deterministic linear projection provides a more reliable feature representation for this dataset. Although AE+ANN produced a more compact latent representation, it did not surpass PCA+ANN in overall accuracy. However, the autoencoder showed targeted advantages in distinguishing discharge-related faults, particularly D1 and D2, suggesting that nonlinear feature extraction can better capture overlapping gas-ratio patterns for specific fault types. In contrast, PCA+ANN performed more strongly for the PD, T1, T2, and T3 classes, supporting its suitability for general industrial deployment where reliability and stability are important. The findings confirm that feature extraction improves ANN-based transformer fault diagnosis compared with using uncompressed input features. PCA+ANN is recommended as the preferred baseline model for robust DGA fault classification, while AE+ANN may be useful in applications focused on discharge fault discrimination or compact embedded diagnostic systems. Future work should investigate variational, stacked, and class-conditional autoencoder architectures, as well as hybrid PCA-AE models and optimized training strategies, to further improve classification accuracy and class-specific fault separation.

Author Contributions

Conceptualization, M.S.N. and B.A.T.; methodology, M.S.N. and B.A.T.; software, M.S.N.; validation, M.S.N. and B.A.T.; formal analysis, M.S.N.; investigation, M.S.N.; resources, B.A.T.; data curation, M.S.N.; writing—original draft preparation, M.S.N.; writing—review and editing, B.A.T.; visualization, M.S.N.; supervision, B.A.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data used in this study are included within the article. Additional data may be made available by the corresponding author upon reasonable request.

Acknowledgments

The authors would like to acknowledge the Department of Electrical and Electronic Engineering Technology, University of Johannesburg, for the academic support provided during this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
| Abbreviation | Definition |
|---|---|
| AE | Autoencoder |
| AE+ANN | Autoencoder with Artificial Neural Network |
| ANN | Artificial Neural Network |
| CV | Cross-Validation |
| D1 | Low Energy Discharge |
| D2 | High Energy Discharge |
| DGA | Dissolved Gas Analysis |
| F1 | F1-Score |
| GNB | Gaussian Naïve Bayes |
| IEC | International Electrotechnical Commission |
| KNN | K-Nearest Neighbors |
| L-BFGS | Limited-memory Broyden-Fletcher-Goldfarb-Shanno |
| MLP | Multilayer Perceptron |
| MSE | Mean Squared Error |
| P | Precision |
| PCA | Principal Component Analysis |
| PCA+ANN | Principal Component Analysis with Artificial Neural Network |
| PC | Principal Component |
| PD | Partial Discharge |
| R | Recall |
| SVD | Singular Value Decomposition |
| SVM | Support Vector Machine |
| T1 | Thermal Fault (T < 300 °C) |
| T2 | Thermal Fault (300–700 °C) |
| T3 | Thermal Fault (T > 700 °C) |
| TCH | Total Hydrocarbon Content |
| H2 | Hydrogen |
| CH4 | Methane |
| C2H2 | Acetylene |
| C2H4 | Ethylene |
| C2H6 | Ethane |
| CO | Carbon Monoxide |

Appendix A. Representative DGA Cases

Table A1. Simplified transformer dataset gas concentrations.
Dissolved Gases (ppm) | H2 | CH4 | C2H6 | C2H4 | C2H2 | IEC Diagnosis
PD-12587.2112.254.7041.40.001Partial Discharges (PD)
PD-216,0003600670140.001Partial Discharges (PD)
PD-34901938547521Partial Discharges (PD)
PD-4292.5838.393.870.840.001Partial Discharges (PD)
PD-516024.738.50.0010.001Partial Discharges (PD)
PD-6103.54.716.33.50.001Partial Discharges (PD)
PD-75869.58175.2116.451.450.001Partial Discharges (PD)
PD-8107695714231Partial Discharges (PD)
PD-944167873620Partial Discharges (PD)
PD-1019921040144Partial Discharges (PD)
PD-116600100038219Partial Discharges (PD)
PD-123417.62131.4214.361.220.001Partial Discharges (PD)
D1-119501233822Discharges of low energy (D1)
D1-29807358120.001Discharges of low energy (D1)
D1-392017442261213Discharges of low energy (D1)
D1-414.241.41.59.51Discharges of low energy (D1)
D1-5700137.414.9194.8936.6Discharges of low energy (D1)
D1-617620647.775.768.7Discharges of low energy (D1)
D1-74879262153321827Discharges of low energy (D1)
D1-85022886Discharges of low energy (D1)
D1-99619780220938Discharges of low energy (D1)
D1-10476282736148Discharges of low energy (D1)
D1-1113733829111Discharges of low energy (D1)
D1-1265.2203.98.1325.1Discharges of low energy (D1)
D2-14405223162183Discharges of high energy
D2-222717394419350,700626,882Discharges of high energy
D2-315151161219Discharges of high energy
D2-421188444443449,264540,711Discharges of high energy
D2-5235.46333.59177.521201.85148.87Discharges of high energy
D2-626.640.001850Discharges of high energy
D2-77238.97695.16231.62394.32308.92Discharges of high energy
D2-824097124284440,297589,171Discharges of high energy
D2-92402862685Discharges of high energy
D2-10431930.00140Discharges of high energy
D2-11576054040.510002760Discharges of high energy
D2-1284611486Discharges of high energy
T1-1110.411232.580.80.001Thermal fault, t < 300 °C
T1-243287290.001Thermal fault, t < 300 °C
T1-3851521281200Thermal fault, t < 300 °C
T1-468309320Thermal fault, t < 300 °C
T1-5903136756610Thermal fault, t < 300 °C
T1-621944330.001Thermal fault, t < 300 °C
T1-792276770.001Thermal fault, t < 300 °C
T1-818126241280.001Thermal fault, t < 300 °C
T1-99389380.001Thermal fault, t < 300 °C
T1-1093.5131.93911.70.001Thermal fault, t < 300 °C
T1-111126813690.001Thermal fault, t < 300 °C
T1-124840110.50.001Thermal fault, t < 300 °C
T2-1110.6458.8242.6406.40.001Thermal fault, 300 °C < t < 700 °C
T2-210910228910.001Thermal fault, 300 °C < t < 700 °C
T2-3128320Thermal fault, 300 °C < t < 700 °C
T2-435112551430Thermal fault, 300 °C < t < 700 °C
T2-52233901603922Thermal fault, 300 °C < t < 700 °C
T2-6100125188100Thermal fault, 300 °C < t < 700 °C
T2-7110136501250Thermal fault, 300 °C < t < 700 °C
T2-8110137501240Thermal fault, 300 °C < t < 700 °C
T2-967022445672Thermal fault, 300 °C < t < 700 °C
T2-108412628.9132.20.37Thermal fault, 300 °C < t < 700 °C
T2-116542171030Thermal fault, 300 °C < t < 700 °C
T2-125503836180Thermal fault, 300 °C < t < 700 °C
T3-1374900932575955Thermal fault, t > 700 °C
T3-2133952200Thermal fault, t > 700 °C
T3-3179306735790.001Thermal fault, t > 700 °C
T3-420266550.001Thermal fault, t > 700 °C
T3-52563311030Thermal fault, t > 700 °C
T3-64279311521Thermal fault, t > 700 °C
T3-7505256300Thermal fault, t > 700 °C
T3-81483964811311Thermal fault, t > 700 °C
T3-9853201026500Thermal fault, t > 700 °C
T3-100.0011061708716Thermal fault, t > 700 °C
T3-11103221.747.24220.9Thermal fault, t > 700 °C
T3-1227510.0011530.001Thermal fault, t > 700 °C
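The abstract states that IEC-aligned gas ratios are derived from gas concentrations like those tabulated above. As a minimal sketch, the three basic IEC 60599 ratios (C2H2/C2H4, CH4/H2, C2H4/C2H6) could be computed as follows. The helper name `iec_ratios`, the sample values, and the use of 0.001 as a floor for undetected gases are assumptions made for illustration; the floor mirrors the 0.001 entries in Table A1, whose purpose is not stated in this section, and the authors' exact feature definitions may differ.

```python
# Hypothetical helper for illustration; not the authors' implementation.
EPS = 0.001  # assumed floor for undetected gases (mirrors the 0.001 entries in Table A1)

def iec_ratios(h2, ch4, c2h6, c2h4, c2h2):
    """Return the three basic IEC 60599 gas ratios for one sample (inputs in ppm)."""
    # Clamp every concentration to EPS so no ratio divides by zero.
    h2, ch4, c2h6, c2h4, c2h2 = (max(g, EPS) for g in (h2, ch4, c2h6, c2h4, c2h2))
    return {
        "C2H2/C2H4": c2h2 / c2h4,
        "CH4/H2": ch4 / h2,
        "C2H4/C2H6": c2h4 / c2h6,
    }

# Made-up sample, not a row of Table A1.
sample = iec_ratios(h2=100.0, ch4=50.0, c2h6=20.0, c2h4=40.0, c2h2=0.0)
print(sample)
```

Clamping before division keeps the ratios finite for samples in which a gas was not detected, at the cost of producing very small or very large ratio values that a downstream scaler has to handle.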

References

  1. Gifalli, A.; Neto, A.B.; de Souza, A.N.; de Mello, R.P.; Ikeshoji, M.A.; Garbelini, E.; Neto, F.T. Fault Detection and Normal Operating Condition in Power Transformers via Pattern Recognition Artificial Neural Network. Appl. Syst. Innov. 2024, 7, 41.
  2. De Andrade Ferreira, R.S.; Picher, P.; Ezzaidi, H.; Fofana, I. Frequency Response Analysis Interpretation Using Numerical Indices and Machine Learning: A Case Study Based on a Laboratory Model. IEEE Access 2021, 9, 67051–67063.
  3. Domor, I.; Theo, M. Deep Autoencoder Neural Networks: A Comprehensive Review and New Perspectives. Arch. Comput. Methods Eng. 2025, 32, 0123456789.
  4. Faridpak, B.; Musilek, P. Resilient Operation Strategies for Integrated Power-Gas Systems. Energies 2024, 17, 6270.
  5. Pan, W.; Li, Y.; Guo, Z.; Zhang, Y. Interdependent Expansion Planning for Resilient Electricity and Natural Gas Networks. Processes 2024, 12, 775.
  6. Sun, Q.; Wu, Z.; Gu, W.; Dong, Z.Y.; Liu, P.; Qiu, H.; Amer, Y.; Lu, Y.; Zheng, Y. Seismic-Resilient Planning for Integrated Energy System: A Risk-Economic Coordination Perspective. In IEEE Transactions on Power Systems; IEEE: Washington, DC, USA, 2025.
  7. Velásquez, R.M.A.; Lara, J.V.M. Root cause analysis improved with machine learning for failure analysis in power transformers. Eng. Fail. Anal. 2020, 115, 104684.
  8. Khan, M.A.M. AI and Machine Learning in Transformer Fault Diagnosis: A Systematic Review. Am. J. Adv. Technol. Eng. Solut. 2025, 1, 290–318.
  9. Li, Z.; He, Y.; Xing, Z.; Duan, J. Transformer fault diagnosis based on improved deep coupled dense convolutional neural network. Electr. Power Syst. Res. 2022, 209, 107969.
  10. Rangel Bessa, A.; Farias Fardin, J.; Marques Ciarelli, P.; Frizera Encarnação, L. Conventional Dissolved Gases Analysis in Power Transformers: Review. Energies 2023, 16, 7219.
  11. Cabral, T.W.; De Lima, E.R.; Cândido, J.; Filho, S.S.; Meloni, L.G.P. Autoencoders Beat PCA for Low-Dimension DGA-based Fault Diagnosis of Power Transformers. In Proceedings of the XLII Simpósio Brasileiro de Telecomunicações e Processamento de Sinais, Belem do Pará, Brazil, 1–4 October 2024.
  12. Du, Y.; Wang, Z.; Feng, G. A Methodology to Diagnose Transformer Faults Based on Principal Components Analysis and Artificial Neural Network. In 2022 IEEE 6th Conference on Energy Internet and Energy System Integration (EI2); IEEE: Washington, DC, USA, 2022; pp. 1186–1189.
  13. Demirci, M.; Gözde, H.; Taplamacioglu, M.C. Improvement of power transformer fault diagnosis by using sequential Kalman filter sensor fusion. Int. J. Electr. Power Energy Syst. 2023, 149, 109038.
  14. Al-Sakini, S.R.; Bilal, G.A.; Sadiq, A.T.; Al-Maliki, W.A.K. Dissolved Gas Analysis for Fault Prediction in Power Transformers Using Machine Learning Techniques. Appl. Sci. 2025, 15, 118.
  15. Zhang, L.; Xu, Z.; Lu, C.; Qiao, T.; Su, H.; Luo, Y. Transformer fault diagnosis based on adversarial generative networks and deep stacked autoencoder. Heliyon 2024, 10, e30670.
  16. Sakurada, M.; Yairi, T. Anomaly detection using autoencoders with nonlinear dimensionality reduction. In Proceedings of the 2014 2nd Workshop on Machine Learning for Sensory Data Analysis; Association for Computing Machinery: New York, NY, USA, 2014; pp. 4–11.
  17. Jaiswal, G.; Rani, R.; Mangotra, H.; Sharma, A. Integration of hyperspectral imaging and autoencoders: Benefits, applications, hyperparameter tunning and challenges. Comput. Sci. Rev. 2023, 50, 100584.
  18. Van Der Maaten, L.; Hinton, G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
  19. Rahman, M.A.; Muniyandi, R.C. An Enhancement in Cancer Classification Accuracy Using a Two-Step Feature Selection Method Based on Artificial Neural Networks with 15 Neurons. Symmetry 2020, 12, 271.
  20. Xu, G.; Zhang, M.; Chen, W.; Wang, Z. Transformer Fault Diagnosis Utilizing Feature Extraction and Ensemble Learning Model. Information 2024, 15, 561.
  21. Shlens, J. A tutorial on principal component analysis. arXiv 2014, arXiv:1404.1100.
  22. Laayati, A.; El-Bazi, N.; Bouzi, M.; Chebak, A.; Guerrero, J.M. An AI-Layered with MultiAgent Systems Architecture for Prognostics Health Management of Smart Transformers: A Novel Approach for Smart Grid-Ready Energy Management Systems. Energies 2022, 15, 7217.
  23. Matharage, S.Y.; Liu, Q.; Wang, Z.D.; Mavrommatis, P.; Wilson, G.; Jarman, P. Ageing assessment of transformer paper insulation through detection of methanol in oil. In Proceedings of the 2015 IEEE 11th International Conference on the Properties and Applications of Dielectric Materials (ICPADM), Sydney, NSW, Australia, 19–22 July 2015; pp. 392–395.
  24. Kamel, H. Artificial intelligence for predictive maintenance. J. Phys. Conf. Ser. 2022, 2299, 012001.
  25. Nagpal, T.; Brar, Y.S. Artificial neural network approaches for fault classification: Comparison and performance. Neural Comput. Appl. 2014, 25, 1863–1870.
  26. Schmidgall, S.; Ziaei, R.; Achterberg, J.; Kirsch, L.; Hajiseyedrazi, S.P.; Eshraghian, J. Brain-Inspired Learning in Artificial Neural Networks: A Review. APL Mach. Learn. 2024, 2, 021501.
  27. Liu, S.; Zhang, C.; Ma, J. Stacked Auto-Encoders for Feature Extraction with Neural Networks. In Bio-Inspired Computing—Theories and Applications; Springer: Berlin/Heidelberg, Germany, 2016; Volume 1, pp. 377–384.
  28. Aciu, A.-M.; Nicola, C.-I.; Nicola, M.; Nițu, M.-C. Complementary Analysis for DGA Based on Duval Methods and Furan Compounds Using Artificial Neural Networks. Energies 2021, 14, 588.
Figure 1. Bottleneck Autoencoder architecture for nonlinear DGA feature extraction.
Figure 2. Comparative pipeline architectures between PCA linear projection and Autoencoder nonlinear encoding, both followed by an identical ANN classifier.
Figure 3. DGA dataset overall class distribution and training vs. testing split.
Figure 4. DGA gas concentration distributions by IEC 60599 fault type using logarithmic scale.
Figure 5. Pearson correlation matrix of all 15 DGA features.
Figure 6. Feature extraction diagnostics showing PCA cumulative explained variance and Autoencoder reconstruction error across latent dimensions.
Figure 7. Latent-space comparison between PCA projection and Autoencoder nonlinear encoding.
Figure 8. Test accuracy versus latent dimension k for PCA+ANN and AE+ANN pipelines.
Figure 9. Confusion matrices for PCA+ANN at k = 11 and AE+ANN at k = 4.
Figure 10. Per-class precision, recall, and F1-score for PCA+ANN and AE+ANN.
Figure 11. Per-class metric radar charts for PCA+ANN and AE+ANN.
Figure 12. Complete classifier benchmark evaluated on the same DGA test set.
Figure 13. Head-to-head summary of overall test accuracy, weighted average precision, and per-class F1-score change.
Table 1. IEC 60599 Transformer Fault Classification and Gas Signs.
Code | Fault Type Description/Mechanism | Primary Gas Signs | Label
PD | Partial Discharge; low-energy corona in voids | Very high H2, trace C2H2 | 1
D1 | Low Energy Discharge; sparking or arcing at low energy | High H2 and C2H2 | 2
D2 | High Energy Discharge; sustained arc discharge | High C2H2 and C2H4 | 3
T1 | Thermal Fault; mild overheating of cellulose, T < 300 °C | CH4, some CO | 4
T2 | Thermal Fault; moderate oil overheating, 300–700 °C | CH4 and C2H4 | 5
T3 | Thermal Fault; severe oil thermal degradation, T > 700 °C | High C2H4, some C2H2 | 6
Table 2. Comparison of existing state-of-the-art research and the proposed study.
Author | Year | Objective | Method/Technique | Pros | Cons/Research Gap
Gifalli et al. [1] | 2024 | Detect transformer faults and normal operating conditions | Pattern recognition ANN | Demonstrates ANN capability for automated transformer fault detection | Does not focus on dimensionality reduction or PCA-AE comparison
Ferreira et al. [2] | 2021 | Interpret transformer frequency response analysis using numerical indices and ML | Numerical indices, machine learning | Shows the usefulness of ML in transformer condition assessment | Focuses on FRA rather than DGA-based fault classification
Arias Velásquez and Mejía Lara [7] | 2020 | Improve root-cause analysis for transformer failure | Machine learning-based failure analysis | Supports ML for identifying transformer failure mechanisms | Not centered on IEC 60599 DGA class prediction
Khan [8] | 2025 | Review AI and ML methods for transformer fault diagnosis | Systematic review | Provides broad coverage of AI-based transformer diagnosis | Review-based; does not provide a controlled experimental PCA-AE benchmark
Li et al. [9] | 2022 | Improve transformer fault diagnosis using deep learning | Improved deep coupled dense CNN | Strong deep-learning feature extraction capability | Higher architectural complexity; limited interpretability
Cabral et al. [11] | 2024 | Compare Autoencoders and PCA for low-dimensional DGA fault diagnosis | Autoencoder, PCA | Directly addresses low-dimensional DGA feature extraction | Requires further controlled evaluation across all latent dimensions and identical ANN classification
Du et al. [12] | 2022 | Diagnose transformer faults using PCA and ANN | PCA+ANN | Shows PCA can reduce redundancy before ANN classification | Focuses on PCA; does not compare with nonlinear Autoencoder features
Zhang et al. [15] | 2024 | Diagnose transformer faults using generative and deep Autoencoder methods | Adversarial generative networks, deep stacked Autoencoder | Captures nonlinear DGA representations | More complex model; limited direct comparison with classical PCA
Van der Maaten and Hinton [18] | 2008 | Visualize high-dimensional data in low-dimensional space | t-SNE | Useful for latent-space visualization and separability analysis | Visualization method, not a classifier or diagnostic framework
Xu et al. [20] | 2024 | Improve transformer fault diagnosis using feature extraction and ensemble learning | Feature extraction, ensemble model | Confirms the value of transformed features in DGA diagnosis | Does not isolate PCA and AE under identical ANN conditions
Domor and Theo [3] | 2025 | Review deep Autoencoder neural networks | Deep Autoencoder review | Establishes AE relevance for nonlinear representation learning | General review; not specific to transformer DGA diagnosis
Liu et al. [27] | 2016 | Investigate stacked Autoencoders for feature extraction | Stacked Autoencoder | Supports AE-based feature extraction in complex datasets | Not directly applied to IEC 60599 transformer fault classification
Proposed study | 2026 | Compare linear PCA and nonlinear AE feature extraction for IEC 60599 DGA fault diagnosis | PCA+ANN and AE+ANN | Uses the same dataset, 15-dimensional feature space, ANN classifier, and evaluates k = 1–15, accuracy, F1-score, and CV stability | Limited to the available IEC 60599-labeled DGA dataset; future work can explore hybrid and class-conditional AE models
Table 3. Dataset class distribution and stratified train/test split.
Code | Fault Class | Total | Training | Testing
PD | Partial Discharge | 58 | 46 | 12
D1 | Low Energy Discharge | 106 | 85 | 21
D2 | High Energy Discharge | 113 | 90 | 23
T1 | Thermal Fault, T < 300 °C | 106 | 85 | 21
T2 | Thermal Fault, 300–700 °C | 95 | 76 | 19
T3 | Thermal Fault, T > 700 °C | 117 | 94 | 23
Total | | 595 | 476 | 119
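The per-class counts in Table 3 are consistent with holding out 20% of each class for testing. The sketch below reproduces that arithmetic; the 20% fraction is inferred from the table totals (119/595) rather than stated in this section, and `stratified_counts` is a hypothetical helper, not the authors' code.

```python
# Class totals transcribed from Table 3.
totals = {"PD": 58, "D1": 106, "D2": 113, "T1": 106, "T2": 95, "T3": 117}
TEST_FRACTION = 0.2  # assumption inferred from 119 / 595

def stratified_counts(totals, test_fraction):
    """Return {class: (n_train, n_test)} by rounding each class's test share."""
    split = {}
    for code, n in totals.items():
        n_test = round(n * test_fraction)
        split[code] = (n - n_test, n_test)
    return split

split = stratified_counts(totals, TEST_FRACTION)
n_train = sum(train for train, _ in split.values())
n_test = sum(test for _, test in split.values())
print(split, n_train, n_test)
```

Per-class rounding does not in general guarantee that the totals hit an exact overall fraction; it happens to do so for these class sizes. In practice a library routine such as scikit-learn's `train_test_split` with the `stratify` argument would normally handle the allocation and the shuffling together.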
Table 4. PCA+ANN and AE+ANN test accuracy at selected latent dimensions.
k | PCA Cumulative Variance | PCA+ANN Test Accuracy | AE Reconstruction MSE | AE+ANN Test Accuracy
1 | 26.0% | 49.6% | 0.655 | 47.9%
2 | 39.5% | 59.7% | 0.173 | 57.1%
3 | 51.2% | 63.0% | 0.096 | 57.1%
4 | 60.7% | 58.8% | 0.019 | 66.4%
5 | 69.1% | 63.9% | 0.019 | 63.9%
8 | 88.2% | 58.8% | 0.010 | 65.5%
10 | 97.4% | 63.0% | 0.006 | 66.4%
11 | 99.9% | 68.9% | 0.007 | 66.4%
15 | 100.0% | 65.5% | 0.005 | 62.2%
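The best latent dimension for each pipeline can be read off Table 4 programmatically. The following sketch covers only the selected dimensions listed in the table, and it breaks ties in favor of the smallest k, which is an assumption; it is consistent with the abstract reporting k = 4 for AE+ANN even though k = 10 and k = 11 reach the same accuracy. The helper `best_k` is hypothetical.

```python
# (k, PCA+ANN accuracy %, AE+ANN accuracy %) transcribed from Table 4.
rows = [
    (1, 49.6, 47.9), (2, 59.7, 57.1), (3, 63.0, 57.1),
    (4, 58.8, 66.4), (5, 63.9, 63.9), (8, 58.8, 65.5),
    (10, 63.0, 66.4), (11, 68.9, 66.4), (15, 65.5, 62.2),
]

def best_k(rows, column):
    """Return (k, accuracy) with the highest accuracy; ties go to the smallest k."""
    candidates = ((row[0], row[column]) for row in rows)
    return max(candidates, key=lambda kv: (kv[1], -kv[0]))

pca_k, pca_acc = best_k(rows, 1)
ae_k, ae_acc = best_k(rows, 2)
print(f"PCA+ANN: k={pca_k}, {pca_acc}%  |  AE+ANN: k={ae_k}, {ae_acc}%")
```

Preferring the smallest k among tied accuracies favors the more compact representation, which is the usual motivation for dimensionality reduction in the first place.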
Table 5. Per-class performance metrics for PCA+ANN and AE+ANN on the test set (n = 119).
Code | Fault Class | PCA P | PCA R | PCA F1 | AE P | AE R | AE F1 | ΔF1 (AE − PCA)
PD | Partial Discharge | 62.5% | 41.7% | 50.0% | 57.1% | 33.3% | 42.1% | −7.9%
D1 | Low Energy Discharge | 62.5% | 47.6% | 54.1% | 63.2% | 57.1% | 60.0% | +5.9%
D2 | High Energy Discharge | 58.6% | 73.9% | 65.4% | 66.7% | 78.3% | 72.0% | +6.6%
T1 | Thermal Fault, T < 300 °C | 68.4% | 61.9% | 65.0% | 68.8% | 52.4% | 59.5% | −5.5%
T2 | Thermal Fault, 300–700 °C | 73.9% | 89.5% | 81.0% | 62.5% | 78.9% | 69.8% | −11.2%
T3 | Thermal Fault, T > 700 °C | 83.3% | 87.0% | 85.1% | 73.1% | 82.6% | 77.6% | −7.6%
Weighted Avg. | | 68.6% | 68.9% | 68.1% | 66.0% | 66.4% | 65.4% | −2.7%
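The F1 columns in Table 5 follow from the precision and recall columns, and the "Weighted Avg." row appears to weight each class by its test-set support from Table 3. The sketch below checks both relationships. The support weighting is an assumption that the tabulated numbers bear out, the function names are hypothetical, and F1 values recomputed from rounded percentages can differ from the table by about 0.1.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def weighted_average(values, weights):
    """Average of values weighted by the per-class sample counts."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Class order: PD, D1, D2, T1, T2, T3.
support = [12, 21, 23, 21, 19, 23]             # test counts from Table 3
pca_f1 = [50.0, 54.1, 65.4, 65.0, 81.0, 85.1]  # PCA F1 column, Table 5
ae_f1 = [42.1, 60.0, 72.0, 59.5, 69.8, 77.6]   # AE F1 column, Table 5

pca_weighted = weighted_average(pca_f1, support)
ae_weighted = weighted_average(ae_f1, support)
print(round(f1(62.5, 41.7), 1), round(pca_weighted, 1), round(ae_weighted, 1))
```

Because the average is weighted by support, the larger thermal classes (T1 to T3, 63 of 119 test samples) dominate it, which is why AE's gains on D1 and D2 do not offset its losses elsewhere in the overall figure.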
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
