PCA and Autoencoder-Based ANN Models for Transformer Fault Diagnosis Using Dissolved Gas Analysis: Comparative Insights and Challenges

Nkwambe, Mwamba S.; Thango, Bonginkosi A.

doi:10.3390/en19122806


by Mwamba S. Nkwambe and Bonginkosi A. Thango *

Department of Electrical and Electronic Engineering Technology, University of Johannesburg, Johannesburg 2092, South Africa

* Author to whom correspondence should be addressed.

Energies 2026, 19(12), 2806; https://doi.org/10.3390/en19122806

Submission received: 30 April 2026 / Revised: 5 June 2026 / Accepted: 7 June 2026 / Published: 11 June 2026

(This article belongs to the Special Issue Artificial Intelligence and Its Application in Condition Monitoring and Evaluation of Power Equipment)


Abstract

Accurate fault diagnosis of power transformers using Dissolved Gas Analysis (DGA) depends on effective feature extraction to reduce redundancy and improve classification performance. This study compares a linear and a nonlinear feature extraction method, namely Principal Component Analysis (PCA) and bottleneck Autoencoders (AE), to determine whether nonlinear representations provide diagnostic advantages for transformer fault classification. A dataset of 595 IEC 60599-labeled DGA samples covering six fault classes (PD, D1, D2, T1, T2, T3) was used. A 15-dimensional feature space was constructed from gas concentrations, total hydrocarbon content, and IEC-aligned gas ratios. PCA and AE were applied for dimensionality reduction across latent dimensions k = 1–15, and each reduced representation was passed to an identical Artificial Neural Network (ANN) classifier. Performance was evaluated using test accuracy, cross-validation stability, and per-class F1-scores. The PCA+ANN model achieved a maximum accuracy of 68.9% at k = 11, outperforming AE+ANN, which reached 66.4% at k = 4. PCA also showed greater cross-validation stability (62 ± 3.5%) than AE (62 ± 6.6%), with the same mean accuracy but a smaller spread. However, AE improved F1-scores for the discharge faults (D1 and D2) by enhancing the nonlinear separation of overlapping samples. PCA therefore provides superior overall accuracy and stability for transformer fault diagnosis, while AE offers targeted advantages in distinguishing discharge-related faults. These findings establish a consistent benchmark for future studies and highlight the complementary roles of linear and nonlinear feature extraction in DGA-based diagnostic systems.

Keywords:

power transformer; dissolved gas analysis; fault diagnosis; principal component analysis; artificial neural network; IEC 60599; autoencoders

1. Introduction

Power transformers are critical assets in modern electrical power systems because they enable voltage transformation, efficient energy transmission, and stable power delivery across generation, transmission, and distribution networks [1,2,3]. In substations and integrated energy systems, transformer failures can compromise substation reliability and system resilience during extreme events such as earthquakes, cascading outages, and severe operational disturbances; this has motivated resilient operation strategies, risk-economic coordinated seismic-resilient planning, and interdependent expansion planning for integrated electricity and natural gas networks [4,5,6]. Reliable transformer operation is therefore essential for maintaining grid security and continuity of supply. Despite their robust design, however, power transformers are exposed to electrical, thermal, mechanical, and environmental stresses during long-term operation. These stresses can gradually degrade the insulation system, particularly the transformer oil and cellulose paper, and may initiate internal defects such as partial discharge, low-energy discharge, high-energy arcing, and thermal overheating. If such incipient faults remain undetected, they can evolve into severe failures, causing forced outages, equipment damage, safety risks, and substantial economic losses [7].

Dissolved Gas Analysis (DGA) is one of the most widely adopted condition-monitoring techniques for transformer fault diagnosis. During abnormal electrical or thermal activity inside a transformer, the insulating oil and solid insulation decompose and release characteristic gases. The type and concentration of these gases provide valuable diagnostic information about the underlying fault mechanism. Standards such as IEC 60599 and IEEE C57.104 provide interpretation guidelines that relate dissolved gas patterns to transformer fault types [8,9,10].

Conventional diagnostic approaches, including IEC ratio methods, Rogers ratios, and related rule-based schemes, have been extensively used in practice. However, these approaches often struggle when gas signatures overlap, when measurements are affected by uncertainty, or when fault mechanisms do not fit neatly into predefined ratio boundaries [11]. With the development of artificial intelligence and data-driven condition monitoring, machine learning methods have increasingly been applied to DGA-based transformer fault diagnosis. Among these methods, Artificial Neural Networks (ANNs) have demonstrated strong potential because of their ability to approximate nonlinear relationships between gas features and fault classes [12,13,14]. Nevertheless, the performance of ANN-based diagnostic models depends strongly on the quality of the input feature space. When raw gas concentrations are combined with gas-ratio features, the resulting feature set may contain substantial redundancy and correlation. Such redundancy can degrade generalization, increase model sensitivity, and contribute to overfitting, especially when the available fault dataset is limited [15].

Feature extraction offers a practical solution to this problem by transforming the original high-dimensional feature space into a compact representation that preserves diagnostically relevant information while reducing redundancy. Principal Component Analysis (PCA) and Autoencoders (AEs) are two representative feature extraction approaches. PCA is a linear dimensionality reduction method that projects correlated variables onto orthogonal components arranged according to explained variance [12,16,17]. Its advantages include mathematical transparency, deterministic behavior, and computational efficiency. In contrast, bottleneck Autoencoders are nonlinear neural network models that learn compressed latent representations through reconstruction-based training. Their main advantage is the ability to capture nonlinear structures that may not be adequately represented by linear projections. Although both PCA and Autoencoder-based dimensionality reduction have been applied in intelligent diagnostic systems, their relative diagnostic value for DGA-based transformer fault classification remains insufficiently clarified. In particular, it is important to determine whether the additional modeling flexibility of nonlinear Autoencoders produces measurable diagnostic benefits over the simpler and more stable PCA approach when both are evaluated under identical experimental conditions. This study, therefore, compares PCA+ANN and AE+ANN fault diagnosis pipelines using the same IEC 60599-labeled DGA dataset, the same engineered feature space, and the same ANN classifier. The central research question is whether nonlinear hidden-feature learning provides practical diagnostic advantages over linear dimensionality reduction for classifying transformer fault types.

1.1. Necessity of the Study

The diagnosis of transformer faults from DGA measurements is not a straightforward pattern-recognition task. Gas formation inside a transformer is governed by complex physical and chemical processes associated with insulation degradation, discharge activity, oil decomposition, and thermal stress. As a result, different fault mechanisms may produce partially overlapping gas signatures. For example, discharge-related faults such as partial discharge, low-energy discharge, and high-energy discharge may all involve elevated hydrogen and acetylene-related patterns, while thermal faults may show gradual transitions across temperature-dependent gas-generation regions. This overlap makes it difficult for conventional rule-based techniques to provide consistently accurate classification across all fault types. The increasing use of ANN-based classifiers in DGA fault diagnosis has improved the ability to model nonlinear relationships between gas features and fault categories. However, ANN classifiers do not automatically resolve all limitations associated with the input data [18,19]. If the feature space contains strongly correlated variables, redundant ratio features, or overlapping class distributions, the classifier may learn patterns that are unstable or dataset-specific. This is particularly important in transformer fault diagnosis, where datasets are often moderate in size, class distributions may be uneven, and field measurements may contain noise or uncertainty. Therefore, the feature extraction stage becomes a critical component of the diagnostic pipeline rather than a secondary preprocessing step. PCA is commonly used because it provides a compact and stable linear representation of correlated variables. For industrial diagnostic applications, this stability is valuable because the method is deterministic, computationally efficient, and relatively easy to interpret. However, DGA fault patterns are not always linearly separable. 
A purely linear transformation may preserve global variance while failing to emphasize nonlinear class boundaries, especially among fault classes with similar gas profiles. Autoencoders provide an alternative by learning nonlinear latent representations that may better capture curved or complex data structures. Yet, Autoencoders introduce their own practical challenges, including sensitivity to initialization, architecture selection, training variability, and reduced interpretability. For this reason, it is necessary to evaluate PCA and Autoencoder feature extraction under a controlled and comparable framework. Without such a controlled comparison, performance differences may be caused by unrelated factors such as different datasets, different classifiers, different feature sets, or different evaluation procedures. A fair comparison must isolate the feature extraction method as the primary variable while keeping the classifier, dataset, preprocessing, and performance metrics consistent. This study addresses that need by examining whether nonlinear Autoencoder representations offer genuine diagnostic advantages over PCA, or whether the simpler linear approach remains preferable for practical transformer fault diagnosis.

1.2. Novelty of the Proposed Work

Existing research on DGA-based transformer fault diagnosis has shown that both statistical dimensionality reduction and neural feature learning can improve diagnostic performance. PCA-based methods are often valued for reducing correlated variables and producing compact features before classification, while Autoencoder-based methods are increasingly used to extract nonlinear latent patterns from high-dimensional diagnostic data. However, many existing studies evaluate these methods in isolation or compare them under inconsistent experimental settings. In such cases, conclusions about the superiority of one method over another may be influenced by variations in dataset composition, feature engineering strategy, classifier architecture, hyperparameter tuning, or validation procedure. The novelty of this study is that it establishes a controlled head-to-head comparison between a linear feature extraction paradigm and a nonlinear feature extraction paradigm for transformer fault diagnosis. PCA and bottleneck Autoencoder models are not evaluated as independent, unrelated diagnostic systems; instead, they are embedded into two parallel diagnostic pipelines that share the same preprocessing stage, the same 15-dimensional DGA feature space, and the same ANN classifier. By maintaining identical classification conditions, the study ensures that any observed difference in diagnostic performance can be attributed primarily to the feature extraction mechanism. This makes the comparison more rigorous than approaches that test PCA- and Autoencoder-based models with different classifiers or feature inputs. Another novel aspect of the work is the systematic evaluation of latent dimensionality. Rather than selecting one arbitrary compressed dimension, both PCA and Autoencoder representations are assessed across the full range of latent dimensions from k = 1 to k = 15. 
This allows the study to examine not only the highest classification accuracy but also the relationship between compression level and diagnostic performance. Such analysis is important because a method that performs well only at high dimensionality may not be suitable for compact real-time diagnostic systems, while a method that retains useful performance at low dimensionality may be advantageous for embedded implementation.

The study also extends beyond overall accuracy by analyzing cross-validation stability and per-class diagnostic behavior. In practical transformer monitoring, overall accuracy alone is insufficient because different fault classes carry different operational implications. Misclassification among discharge faults, for example, may lead to different maintenance priorities compared with misclassification among thermal faults. By evaluating precision, recall, and F1-score for each IEC 60599 fault class, the proposed work reveals where each feature extraction strategy is diagnostically stronger or weaker. This provides a more detailed interpretation of model behavior than a single aggregate accuracy value. Furthermore, the proposed analysis connects numerical performance with feature-space interpretation. PCA is examined in terms of variance retention and component behavior, while the Autoencoder is examined through reconstruction error and latent representation. This allows the study to explain why PCA may provide stronger overall stability, while Autoencoder representations may still offer targeted advantages in separating overlapping discharge-related classes. Therefore, the novelty of the work is not only the identification of the better-performing model, but also the clarification of the diagnostic conditions under which linear and nonlinear feature extraction are most useful.

1.3. Research Contributions

This paper makes several contributions to DGA-based transformer fault diagnosis. First, it develops a comparative diagnostic framework that integrates PCA-based and Autoencoder-based feature extraction with an identical ANN classifier. This controlled design allows the study to isolate the effect of feature extraction and provide a fair comparison between linear and nonlinear dimensionality reduction approaches. Second, the study constructs and evaluates a 15-dimensional DGA feature space comprising raw dissolved gas concentrations, total hydrocarbon content, and IEC-aligned gas-ratio features. This feature set captures both absolute gas concentrations and relative gas-generation patterns, thereby providing a richer diagnostic representation than raw gas inputs alone. Third, the study systematically investigates the effect of latent dimensionality on classification performance. Both PCA+ANN and AE+ANN pipelines are evaluated across latent dimensions k = 1 to k = 15, allowing the analysis to identify the optimal dimensionality for each approach and to assess the trade-off between compression and diagnostic accuracy. Fourth, the work provides a detailed comparison of classification performance using overall accuracy, cross-validation stability, and per-class precision, recall, and F1-score. This makes it possible to determine not only which method performs better overall, but also which method is more effective for specific IEC 60599 fault categories. Fifth, the study offers interpretive insight into the relative strengths of PCA and Autoencoder representations.

Finally, the study establishes a consistent benchmark for future work on feature extraction in transformer fault diagnosis. By using identical classifiers, identical data partitions, and identical evaluation procedures, the results provide a reproducible baseline against which future PCA, Autoencoder, hybrid, or deep feature-learning methods can be assessed.

The remainder of this paper is organized as follows. Section 2 presents the theoretical background of DGA-based transformer fault classification, PCA-based linear feature extraction, bottleneck Autoencoder-based nonlinear feature extraction, and ANN classification. Section 3 describes the dataset, feature engineering process, comparative pipeline architecture, and evaluation procedure. Section 4 presents and discusses the experimental results, including dimensionality-reduction behavior, classification accuracy, confusion matrices, per-class metrics, cross-validation stability, and classifier benchmarking. Section 5 concludes the paper and outlines directions for future research.

2. Theoretical Background

2.1. DGA and IEC 60599 Fault Classification

Dissolved Gas Analysis (DGA) is one of the most established diagnostic techniques for assessing the internal condition of oil-immersed power transformers. During normal operation, only small quantities of gases are produced in transformer oil. However, under abnormal electrical or thermal stress, the insulating oil and cellulose materials decompose and generate characteristic gases [2]. The type, concentration, and relative proportion of these gases provide important diagnostic information about the nature and severity of the internal fault [4]. DGA generally considers key dissolved gases such as hydrogen (H₂), methane (CH₄), ethane (C₂H₆), ethylene (C₂H₄), acetylene (C₂H₂), carbon monoxide (CO), and carbon dioxide (CO₂). Among these, hydrocarbon and hydrogen gases are particularly important for distinguishing discharge and thermal fault mechanisms. The composition and concentration of gases resulting from the decomposition of transformer oil and insulating materials vary according to the fault type. Therefore, gas patterns can be mapped to specific fault categories using standardized interpretation methods. IEC 60599 classifies transformer faults into six major fault categories: partial discharge (PD), low-energy discharge (D1), high-energy discharge (D2), and three thermal fault levels (T1, T2, and T3). These fault types are summarized in Table 1, which presents the associated physical mechanisms and primary gas indicators used for diagnosis [1,20].

Although IEC 60599 provides an important diagnostic basis, practical DGA interpretation is often complicated by overlapping gas patterns, uncertain measurement conditions, and mixed fault mechanisms. For example, discharge faults may share elevated hydrogen and acetylene patterns, while thermal faults may show gradual gas-transition behavior across temperature regions. These overlaps make rule-based diagnosis difficult and motivate the use of data-driven approaches capable of learning nonlinear relationships between gas features and fault classes [11].

2.2. Linear Feature Extraction Using PCA

Principal Component Analysis (PCA) is a widely used linear feature extraction method for reducing dimensionality in correlated datasets [12,21]. In DGA-based transformer fault diagnosis, raw gas concentrations and gas-ratio features often contain redundant information because several gases are generated together during related fault mechanisms. PCA transforms the original feature space into a new set of orthogonal variables, called principal components, which are ordered by the amount of variance they explain [3,12]. The PCA space is defined by k principal components. These components are mutually uncorrelated and represent the dominant directions of variation in the dataset. The transformation can be obtained through eigenvalue decomposition of the covariance matrix or through Singular Value Decomposition (SVD). In this study, PCA is used to obtain a compact linear representation of the 15-dimensional DGA feature space before ANN classification. The PCA procedure begins by standardizing the original feature values:

\tilde{x}_{j} = \frac{x_{j} - μ_{j}}{σ_{j}}, \quad j = 1, 2, \dots, m

(1)

where x_{j} is the original value of the j-th feature, \tilde{x}_{j} is its standardized value, μ_{j} and σ_{j} are the mean and standard deviation of that feature, respectively, and m is the number of features (m = 15 in this study). After standardization, the covariance matrix is computed as:

C_{x} = \frac{1}{n} X X^{T}

(2)

where X ∈ R^{m×n} is the standardized data matrix with one sample per column, n is the number of samples, and C_{x} is the resulting m × m covariance matrix. The eigenvectors of the covariance matrix define the principal component directions. The transformed PCA feature space is obtained as:

P_{x} = V^{T} X

(3)

where V ∈ R^{m×k} contains the k eigenvectors of C_{x} associated with the largest eigenvalues. By retaining only these first k components, PCA reduces the input dimensionality while preserving the dominant variance structure of the DGA feature space. PCA is attractive for industrial diagnostic systems because it is deterministic, computationally efficient, and relatively interpretable. However, since PCA is a linear projection, it may not fully capture nonlinear relationships among DGA features, particularly when discharge and thermal fault classes overlap in curved or complex regions of the feature space.
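Equations (1)–(3) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes samples are stored one per row (so the array is transposed to the m × n layout of Eqs. (2) and (3)) and that no feature has zero standard deviation; the function name `pca_project` is hypothetical, and the random demo data merely stand in for the 15-dimensional DGA features.

```python
import numpy as np

def pca_project(X_raw, k):
    """Return (scores, explained_variance_ratio) for the first k components.

    X_raw has shape (n, m) with one sample per row; it is transposed
    internally to the m x n layout used in Eqs. (2) and (3).
    Assumes every feature has a non-zero standard deviation.
    """
    # Eq. (1): standardize each feature (population std, matching 1/n in Eq. (2))
    Z = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
    X = Z.T                                  # (m, n): samples as columns
    n = X.shape[1]

    # Eq. (2): covariance matrix of the standardized features, shape (m, m)
    C = (X @ X.T) / n

    # Eigendecomposition of the symmetric matrix C; eigh sorts eigenvalues
    # in ascending order, so reverse to put the largest-variance directions first
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Eq. (3): project onto the k leading eigenvectors
    V = eigvecs[:, :k]                       # (m, k)
    P = V.T @ X                              # (k, n)
    return P.T, eigvals[:k] / eigvals.sum()

# Demo on random data standing in for the 15-D DGA features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 15))
scores, ratios = pca_project(X_demo, k=4)    # scores has shape (200, 4)
```

The projected scores are mutually uncorrelated, and the same component variances (and, up to the arbitrary sign of each eigenvector, the same scores) follow from an SVD of the standardized data matrix.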

2.3. Nonlinear Feature Extraction Using Bottleneck Autoencoder

Autoencoders are neural network-based feature extraction models designed to learn compressed representations of input data. Unlike PCA, which performs a linear projection, an Autoencoder can learn nonlinear mappings through an encoder–decoder structure. This makes it suitable for datasets where relevant diagnostic information may lie on nonlinear manifolds [22,23]. A bottleneck Autoencoder consists of two main parts: an encoder and a decoder. The encoder maps the original input vector into a lower-dimensional latent representation, while the decoder reconstructs the original input from this compressed code. The bottleneck layer forces the network to retain only the most informative structure of the input data, thereby reducing redundancy and suppressing less useful variations. The encoder function is defined as:

ϕ : \mathbb{R}^{d} \to \mathbb{R}^{p}

(4)

where d is the input dimension and p is the bottleneck (latent) dimension, with p < d. The decoder function is expressed as:

ψ : \mathbb{R}^{p} \to \mathbb{R}^{d}

(5)

The reconstructed input is obtained by composing the encoder and decoder:

\hat{X} = ψ (ϕ (X))

(6)

The Autoencoder is trained by minimizing the reconstruction loss, commonly expressed using mean squared error (MSE):

L (ϕ, ψ) = \frac{1}{n} \sum_{i = 1}^{n} {∥ X^{(i)} - ψ (ϕ (X^{(i)})) ∥}^{2}

(7)

where X^{(i)} is the i-th original input sample, \hat{X}^{(i)} = ψ(ϕ(X^{(i)})) is its reconstruction, and n is the number of training samples. In this study, the encoder architecture is m → 32 → k, where m = 15 is the original DGA feature dimension and k is the bottleneck dimension (corresponding to d and p in Eqs. (4) and (5)). The decoder mirrors this structure as k → 32 → m, with a linear output layer. The Autoencoder is trained on the standardized DGA features using the Adam optimizer, and the learned bottleneck representation is then used as the input to the ANN classifier. The bottleneck Autoencoder architecture for nonlinear DGA feature extraction is illustrated in Figure 1 [3].
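One compact way to sketch the m → 32 → k → 32 → m bottleneck Autoencoder is to train scikit-learn's `MLPRegressor` to reproduce its own input and then run only the first two layers as the encoder. This is an illustration under stated assumptions, not the authors' code: the tanh hidden activation, `max_iter`, and random seed are assumed (the text specifies only the layer sizes, the linear output layer, and Adam), scikit-learn applies the same activation to the bottleneck layer, and `fit_bottleneck_ae` / `encode` are hypothetical helper names. The synthetic data have a three-dimensional latent structure purely for demonstration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_bottleneck_ae(X_std, k, seed=0):
    """Train an m -> 32 -> k -> 32 -> m Autoencoder on standardized features."""
    # Assumed hyperparameters: tanh hidden layers, up to 1000 Adam epochs.
    # MLPRegressor's output layer is linear (identity), matching the text.
    ae = MLPRegressor(hidden_layer_sizes=(32, k, 32), activation="tanh",
                      solver="adam", max_iter=1000, random_state=seed)
    ae.fit(X_std, X_std)  # the reconstruction target is the input itself
    return ae

def encode(ae, X_std):
    """Run only the encoder half (m -> 32 -> k) to obtain the latent codes."""
    h = np.tanh(X_std @ ae.coefs_[0] + ae.intercepts_[0])
    return np.tanh(h @ ae.coefs_[1] + ae.intercepts_[1])

# Synthetic 15-D data with 3-D latent structure (stand-in for DGA features)
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 3))
X = latent @ rng.normal(size=(3, 15)) + 0.05 * rng.normal(size=(300, 15))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

ae = fit_bottleneck_ae(X_std, k=4)
codes = encode(ae, X_std)                          # latent codes, shape (300, 4)
# Per-element reconstruction MSE, i.e. the loss of Eq. (7) divided by m
mse = np.mean((ae.predict(X_std) - X_std) ** 2)
```

Internally, `MLPRegressor` minimizes half the per-element squared error plus a small L2 penalty (`alpha`, default 1e-4), so its objective differs from Eq. (7) only by a constant scale factor and that penalty.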

2.4. ANN Classifier

A Multilayer Perceptron (MLP) is used as the classifier for both feature extraction pipelines. Using the same ANN classifier after PCA and Autoencoder feature extraction ensures that performance differences are caused by the feature extraction stage rather than by the classification model [24,25,26]. The classifier receives k input features, corresponding either to PCA components or to Autoencoder latent codes. It consists of one hidden layer with 20 neurons using the hyperbolic tangent activation function and an output layer with six neurons corresponding to the IEC 60599 fault classes: PD, D1, D2, T1, T2, and T3. The output layer uses a softmax activation function to generate class probabilities. The ANN is trained using the L-BFGS optimizer with L₂ regularization (α = 0.01) and a maximum of 2000 training epochs. This controlled design ensures that both PCA+ANN and AE+ANN are evaluated under the same classification conditions. Therefore, any observed difference in accuracy, F1-score, or cross-validation stability can be attributed primarily to the difference between linear and nonlinear feature extraction.
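The classifier configuration described above maps naturally onto scikit-learn's `MLPClassifier`. The sketch below assumes that library (the text does not name one), uses a hypothetical helper `make_ann`, and uses `make_classification` only to generate a synthetic six-class problem for demonstration. Note that for the `lbfgs` solver, scikit-learn's `max_iter` counts optimizer iterations rather than epochs; because L-BFGS is a full-batch method, each iteration uses the whole training set.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

def make_ann(seed=0):
    """One tanh hidden layer of 20 neurons, L-BFGS, L2 penalty alpha = 0.01."""
    return MLPClassifier(hidden_layer_sizes=(20,), activation="tanh",
                         solver="lbfgs", alpha=0.01, max_iter=2000,
                         random_state=seed)

# Synthetic stand-in for k = 4 extracted features and six fault classes
X, y = make_classification(n_samples=300, n_features=4, n_informative=4,
                           n_redundant=0, n_classes=6,
                           n_clusters_per_class=1, random_state=0)
clf = make_ann().fit(X, y)
proba = clf.predict_proba(X)  # one probability per class for each sample
print(clf.out_activation_)    # scikit-learn picks softmax for multiclass targets
```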

2.5. State-of-the-Art in DGA-Based Transformer Fault Diagnosis

Recent studies on transformer condition monitoring show that artificial intelligence and machine learning are increasingly being adopted to improve the reliability of fault diagnosis beyond conventional rule-based DGA interpretation. Ref. [1] applied pattern-recognition ANN models for distinguishing transformer fault conditions from normal operating states, showing that neural classifiers can improve diagnostic automation when sufficient fault patterns are available. Ref. [2] demonstrated the broader value of machine learning in transformer condition assessment by applying numerical indices and learning-based interpretation to frequency response analysis, confirming the relevance of data-driven methods for transformer diagnostics beyond DGA alone. Root-cause analysis has also benefited from machine learning. Ref. [7] showed that ML-assisted failure analysis can improve the identification of transformer failure mechanisms, supporting the view that intelligent diagnostic systems are useful for both predictive maintenance and post-failure investigation. More recently, ref. [8] provided a systematic review of AI and machine learning in transformer fault diagnosis, emphasizing that intelligent models can overcome several limitations of traditional diagnostic methods, particularly when nonlinear relationships exist between input features and fault mechanisms. Deep learning methods have also been introduced to improve the representational capacity of diagnostic systems. Ref. [9] proposed an improved deep coupled dense convolutional neural network for transformer fault diagnosis, demonstrating the growing role of deep architectures in extracting discriminative patterns from transformer diagnostic data. Ref. [11] specifically compared Autoencoders and PCA for low-dimensional DGA-based fault diagnosis, reporting that nonlinear feature learning may provide advantages under strong dimensionality reduction. 
That study is closely related to the present work; however, the current paper extends the comparison by evaluating both methods across all latent dimensions and by using an identical ANN classifier for a controlled head-to-head assessment. PCA-based approaches remain important because of their simplicity, interpretability, and stability. Ref. [12] combined PCA with ANN for transformer fault diagnosis and showed that PCA can reduce redundancy before classification. However, the diagnostic performance of PCA depends on how well linear projections preserve class-discriminative gas patterns. In contrast, ref. [15] used generative adversarial networks and deep stacked Autoencoders, showing that deep representation learning can improve fault diagnosis by learning nonlinear structures from DGA data. These studies indicate that the main unresolved issue is not whether feature extraction is useful, but whether linear or nonlinear feature extraction is preferable under controlled diagnostic conditions. Several studies also provide methodological foundations relevant to feature-space representation and dimensionality reduction. Ref. [18] introduced t-SNE as a nonlinear visualization method for high-dimensional data, which is relevant for understanding separability in latent spaces. Ref. [20] proposed a feature-extraction and ensemble-learning model for transformer fault diagnosis, further confirming that diagnostic performance can be improved when informative transformed features are supplied to classifiers. Ref. [3] reviewed deep Autoencoder neural networks and emphasized their usefulness for compression, representation learning, and nonlinear feature extraction. Ref. [27] also demonstrated the effectiveness of stacked Autoencoders for feature extraction, supporting their application in problems where nonlinear latent structures are expected.

Although these studies have advanced transformer fault diagnosis, several gaps remain. Many works use different datasets, feature spaces, classifiers, and validation procedures, making direct comparison difficult. Some studies emphasize overall accuracy but provide limited analysis of per-class behavior, cross-validation stability, or compression effects across different latent dimensions. Therefore, the present study addresses this gap by comparing PCA+ANN and AE+ANN under identical experimental conditions using the same 15-dimensional DGA feature space, the same IEC 60599-labeled dataset, and the same ANN classifier. The comparison of existing state-of-the-art studies and the proposed work is summarized in Table 2.

3. Methodology

3.1. Comparative Pipeline Architecture

This study adopts a controlled comparative methodology to evaluate the diagnostic effectiveness of linear and nonlinear feature extraction for DGA-based transformer fault classification. Two parallel diagnostic pipelines are developed, as shown in Figure 2. The first pipeline combines Principal Component Analysis with an Artificial Neural Network classifier; the second combines a bottleneck Autoencoder with the same ANN classifier. Both pipelines follow the same general structure: feature engineering, standardization, dimensionality reduction, ANN-based classification, and performance evaluation. The only methodological difference between them is the feature extraction stage. In the PCA+ANN pipeline, the standardized 15-dimensional DGA feature vector is projected into a linear k-dimensional principal component space. In the AE+ANN pipeline, the same standardized feature vector is compressed into a nonlinear k-dimensional bottleneck representation. The extracted features are then passed to an identical ANN classifier. This design treats the feature extraction method as the independent variable and classification performance as the dependent variable.

All remaining experimental conditions, including the dataset, train/test split, feature set, scaling procedure, ANN architecture, optimizer, and evaluation metrics, are kept constant. This controlled design is important because it ensures that differences in diagnostic performance can be attributed primarily to PCA-based linear projection or Autoencoder-based nonlinear encoding rather than to unrelated model or data-processing variations.
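Because the two pipelines differ only in the extraction step, the comparison can be sketched as a scikit-learn `Pipeline` whose `"extract"` slot is swapped while the scaler and classifier stay fixed. The sketch below assumes scikit-learn, runs the PCA branch over k = 1–15 with a stratified 80/20 split, and then estimates the cross-validation mean ± standard deviation at the best k. The 5-fold setting, random seeds, and helper name `make_pipeline_for` are assumptions, and the synthetic data are not the study's dataset. An Autoencoder branch would plug into the same slot through any transformer exposing `fit` and `transform` (for instance, one wrapping the encoder half of a trained Autoencoder).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

def make_pipeline_for(extractor, seed=0):
    """Scaler and ANN are held fixed; only the 'extract' step changes."""
    return Pipeline([
        ("scale", StandardScaler()),
        ("extract", extractor),
        ("ann", MLPClassifier(hidden_layer_sizes=(20,), activation="tanh",
                              solver="lbfgs", alpha=0.01, max_iter=2000,
                              random_state=seed)),
    ])

# Synthetic stand-in for the 595 x 15 feature matrix and six class labels
X, y = make_classification(n_samples=595, n_features=15, n_informative=8,
                           n_classes=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y,
                                          random_state=0)  # 476 / 119 samples

# Sweep the latent dimension k for the PCA branch
results = {}
for k in range(1, 16):
    pipe = make_pipeline_for(PCA(n_components=k)).fit(X_tr, y_tr)
    results[k] = pipe.score(X_te, y_te)  # test accuracy

best_k = max(results, key=results.get)

# Cross-validation stability at the best k (5 stratified folds, assumed)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(make_pipeline_for(PCA(n_components=best_k)),
                            X_tr, y_tr, cv=cv)
cv_mean, cv_std = cv_scores.mean(), cv_scores.std()
print(f"best k = {best_k}: test acc {results[best_k]:.3f}, "
      f"CV {cv_mean:.3f} ± {cv_std:.3f}")
```

Placing the scaler inside the pipeline means it is refitted on the training portion of every split, so no statistics from held-out samples leak into standardization.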

3.2. Dataset Description and Class Distribution

The dataset used in this study consists of 595 DGA transformer fault records from the power supply, labeled according to the IEC 60599 fault classification scheme [28]. Representative DGA cases for each investigated fault type, with gas concentrations in ppm, are listed in Table A1. Six fault classes are considered: partial discharge (PD), low-energy discharge (D1), high-energy discharge (D2), low-temperature thermal fault (T1), medium-temperature thermal fault (T2), and high-temperature thermal fault (T3). These classes correspond directly to the IEC 60599 fault categories described in Section 2.1. To preserve the class distribution during model development and evaluation, a stratified 80/20 train/test split is used. The training set contains 476 samples, while the testing set contains 119 samples. Stratification ensures that each class is proportionally represented in both subsets, reducing the risk that model performance is biased by class imbalance. The distribution of samples is presented in Table 3 and visualized in Figure 3.

The class distribution shows that T3 is the most represented class with 117 samples, while PD is the least represented class with 58 samples, a ratio of roughly 2:1. This moderate imbalance is important when interpreting the classification results, particularly the per-class precision, recall, and F1-score values reported in Section 4.
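The paper does not state which tool performed the stratified split. As a minimal stdlib sketch (the helper name `stratified_split` is hypothetical), each class can be shuffled and split separately so that its share of the test set matches the chosen fraction. In a scikit-learn workflow, `train_test_split(X, y, test_size=0.2, stratify=y)` serves the same purpose.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), splitting every class in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for label, indices in sorted(by_class.items()):
        rng.shuffle(indices)
        # Per-class rounding: totals can differ by a sample from an exact 80/20 split.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy usage: 10 PD and 20 T3 labels give 2 PD and 4 T3 test samples.
labels = ["PD"] * 10 + ["T3"] * 20
train_idx, test_idx = stratified_split(labels)
```

Because rounding is done per class, the total test size for the full 595-sample dataset may deviate slightly from 119; library implementations typically redistribute the remainder to hit the requested size exactly.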

3.3. Feature Construction

The original DGA measurements consist of five dissolved gas concentrations:

g = [\mathrm{H_2}, \mathrm{CH_4}, \mathrm{C_2H_6}, \mathrm{C_2H_4}, \mathrm{C_2H_2}]

(8)

These gases are selected because they are strongly associated with discharge and thermal degradation mechanisms in transformer oil. To improve diagnostic representation, the raw gas measurements are expanded using total hydrocarbon content and IEC-aligned gas ratios. The total hydrocarbon content, denoted as $\mathrm{TCH}$, is computed as:

\mathrm{TCH} = \mathrm{CH_4} + \mathrm{C_2H_6} + \mathrm{C_2H_4} + \mathrm{C_2H_2}

(9)

Nine gas-ratio features are then constructed to capture relative gas-generation behavior:

r_1 = \frac{\mathrm{CH_4}}{\mathrm{H_2} + \varepsilon}

(10)

r_2 = \frac{\mathrm{C_2H_2}}{\mathrm{C_2H_4} + \varepsilon}

(11)

r_3 = \frac{\mathrm{C_2H_4}}{\mathrm{C_2H_6} + \varepsilon}

(12)

r_4 = \frac{\mathrm{C_2H_2}}{\mathrm{CH_4} + \varepsilon}

(13)

r_5 = \frac{\mathrm{C_2H_6}}{\mathrm{CH_4} + \varepsilon}

(14)

r_6 = \frac{\mathrm{C_2H_4}}{\mathrm{TCH} + \varepsilon}

(15)

r_7 = \frac{\mathrm{C_2H_6}}{\mathrm{TCH} + \varepsilon}

(16)

r_8 = \frac{\mathrm{CH_4}}{\mathrm{TCH} + \varepsilon}

(17)

r_9 = \frac{\mathrm{C_2H_2}}{\mathrm{TCH} + \varepsilon}

(18)

where $\varepsilon = 10^{-6}$ is added to each denominator to avoid division by zero. The final input feature vector is therefore defined as:

x = [\mathrm{H_2}, \mathrm{CH_4}, \mathrm{C_2H_6}, \mathrm{C_2H_4}, \mathrm{C_2H_2}, \mathrm{TCH}, r_1, r_2, \dots, r_9] \in \mathbb{R}^{15}

(19)

Thus, each DGA sample is represented by a 15-dimensional feature vector. This feature construction combines raw gas concentrations, total hydrocarbon content (TCH), and IEC-aligned gas ratios. The raw gases preserve magnitude information, while the ratios emphasize diagnostic relationships associated with IEC 60599 interpretation. As illustrated in Figure 4, the resulting gas distributions show class-dependent behavior, with acetylene ($\mathrm{C_2H_2}$) providing particularly strong visual discrimination between discharge and thermal fault classes.
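The feature construction of Equations (8) to (19) can be sketched in plain Python. The function name `dga_features` is hypothetical; the order of the returned list follows Equation (19), and the usage line takes case D1-1 from Table A1 as input.

```python
EPS = 1e-6  # epsilon from Eqs. (10) to (18), guards against division by zero

def dga_features(h2, ch4, c2h6, c2h4, c2h2):
    """Build the 15-dimensional feature vector x of Eq. (19) from five gas concentrations (ppm)."""
    tch = ch4 + c2h6 + c2h4 + c2h2  # total hydrocarbon content, Eq. (9)
    ratios = [
        ch4 / (h2 + EPS),     # r1
        c2h2 / (c2h4 + EPS),  # r2
        c2h4 / (c2h6 + EPS),  # r3
        c2h2 / (ch4 + EPS),   # r4
        c2h6 / (ch4 + EPS),   # r5
        c2h4 / (tch + EPS),   # r6
        c2h6 / (tch + EPS),   # r7
        ch4 / (tch + EPS),    # r8
        c2h2 / (tch + EPS),   # r9
    ]
    return [h2, ch4, c2h6, c2h4, c2h2, tch] + ratios

# Case D1-1 from Table A1: H2, CH4, C2H6, C2H4, C2H2 in ppm.
x = dga_features(1950, 123, 38, 2, 2)
```

One property worth noting: because TCH is the sum of the four hydrocarbons, the last four ratios (r6 to r9) always sum to approximately one, which is one source of the redundancy discussed in Section 3.5.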

3.4. Data Standardization

Before feature extraction, all input features are standardized to remove scale effects. This step is essential because gas concentrations and gas ratios span very different numerical ranges; the raw concentrations in Table A1 alone range from below 1 ppm to several hundred thousand ppm. Without standardization, features with larger magnitudes could dominate the PCA variance structure or distort Autoencoder optimization. For each feature $x_j$, standardization is performed as:

z_j = \frac{x_j - \mu_j}{\sigma_j}

(20)

where $z_j$ is the standardized feature, $x_j$ is the original feature, $\mu_j$ is the training-set mean, and $\sigma_j$ is the training-set standard deviation. The standardized feature matrix is denoted as:

Z \in \mathbb{R}^{n \times 15}

(21)

where $n$ is the number of samples. The scaler is fitted only on the training data and then applied to the test data to avoid information leakage.
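A minimal sketch of Equation (20) with the fit/apply separation described above: statistics come from the training rows only and are reused unchanged on the test rows. The helper names are hypothetical, and the population standard deviation is an assumption, since the paper does not state which estimator its scaler uses.

```python
import math

def fit_scaler(train_rows):
    """Per-feature training-set mean and standard deviation for Eq. (20)."""
    n, m = len(train_rows), len(train_rows[0])
    means = [sum(row[j] for row in train_rows) / n for j in range(m)]
    # Population standard deviation (divide by n); an assumed convention.
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in train_rows) / n) for j in range(m)]
    # A constant feature would give sigma = 0; use 1.0 so the division stays defined.
    stds = [s if s > 0 else 1.0 for s in stds]
    return means, stds

def transform(rows, means, stds):
    """Apply z_j = (x_j - mu_j) / sigma_j with statistics fitted on the training set."""
    return [[(x - mu) / sd for x, mu, sd in zip(row, means, stds)] for row in rows]

means, stds = fit_scaler([[1.0, 10.0], [3.0, 30.0]])
z_test = transform([[5.0, 20.0]], means, stds)  # test rows never influence means/stds
```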

3.5. Feature-Space Correlation Analysis

A Pearson correlation analysis is performed to examine redundancy in the engineered 15-dimensional feature space. The correlation coefficient between two features $x_i$ and $x_j$ is computed as:

\rho_{ij} = \frac{\sum_{l=1}^{n} (x_{li} - \bar{x}_i)(x_{lj} - \bar{x}_j)}{\sqrt{\sum_{l=1}^{n} (x_{li} - \bar{x}_i)^2} \sqrt{\sum_{l=1}^{n} (x_{lj} - \bar{x}_j)^2}}

(22)

where $x_{li}$ and $x_{lj}$ are the values of features $i$ and $j$ for sample $l$, while $\bar{x}_i$ and $\bar{x}_j$ are the corresponding feature means. The full correlation matrix is expressed as:

R = \begin{bmatrix} \rho_{11} & \rho_{12} & \cdots & \rho_{1m} \\ \rho_{21} & \rho_{22} & \cdots & \rho_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{m1} & \rho_{m2} & \cdots & \rho_{mm} \end{bmatrix}

(23)

where $m = 15$. As shown in Figure 5, the engineered DGA feature space exhibits two major correlation structures. The raw gas concentrations, especially $\mathrm{H_2}$, $\mathrm{CH_4}$, $\mathrm{C_2H_6}$, and $\mathrm{C_2H_4}$, are strongly correlated with one another and with $\mathrm{TCH}$. A second correlation block is formed by the $\mathrm{TCH}$-normalized ratio features $\mathrm{C_2H_4}/\mathrm{TCH}$, $\mathrm{C_2H_6}/\mathrm{TCH}$, $\mathrm{CH_4}/\mathrm{TCH}$, and $\mathrm{C_2H_2}/\mathrm{TCH}$. These four features are constrained by construction, since $r_6 + r_7 + r_8 + r_9 = \mathrm{TCH}/(\mathrm{TCH} + \varepsilon) \approx 1$. These correlation blocks indicate substantial redundancy in the feature space, which justifies dimensionality reduction by both PCA and Autoencoder compression.
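Equations (22) and (23) can be sketched in plain Python as follows. The helper `redundant_pairs` and its 0.9 cut-off are hypothetical additions for illustration only; the paper reads the correlation blocks from the heatmap in Figure 5 rather than applying a threshold.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of Eq. (22) between two feature columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_matrix(rows):
    """m x m matrix R of Eq. (23) for a list of samples (one row per sample)."""
    columns = list(zip(*rows))
    m = len(columns)
    return [[pearson(columns[i], columns[j]) for j in range(m)] for i in range(m)]

def redundant_pairs(R, threshold=0.9):
    """Hypothetical helper: feature-index pairs whose |rho| exceeds a chosen threshold."""
    m = len(R)
    return [(i, j) for i in range(m) for j in range(i + 1, m) if abs(R[i][j]) > threshold]

# Toy data: column 1 is exactly twice column 0, so they are perfectly correlated.
rows = [(1, 2, 5), (2, 4, 3), (3, 6, 4), (4, 8, 1)]
R = correlation_matrix(rows)
```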

3.6. PCA-Based Linear Feature Extraction

For the PCA+ANN pipeline, PCA is applied to the standardized feature matrix $Z$. Because the standardized training features have zero mean, the covariance matrix is computed as:

C = \frac{1}{n - 1} Z^{T} Z

(24)

The eigenvalue decomposition of the covariance matrix is given by:

C v_j = \lambda_j v_j

(25)

where $\lambda_j$ is the eigenvalue associated with eigenvector $v_j$. The eigenvalues are ordered as:

\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_m

(26)

The cumulative explained variance ratio for the first $k$ principal components is calculated as:

\mathrm{EVR}(k) = \frac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{m} \lambda_j} \times 100\%

(27)

The PCA-transformed feature matrix is obtained by projecting the standardized data onto the first $k$ eigenvectors:

Z_{\mathrm{PCA}}^{(k)} = Z V_k

(28)

where $V_k = [v_1, v_2, \dots, v_k]$. In this study, $k$ is varied from 1 to 15 to evaluate how the number of retained principal components affects classification performance. In the results reported in Section 4, PCA+ANN achieves its best test accuracy at $k = 11$, where approximately 99.9% of the cumulative variance is retained.
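To make Equations (24) to (28) concrete, the sketch below computes the covariance matrix, extracts the leading eigenpairs, evaluates $\mathrm{EVR}(k)$, and projects the data. Power iteration with deflation is swapped in here only because it fits in a few lines of stdlib Python; it assumes distinct leading eigenvalues, and practical implementations use a dense eigensolver or SVD instead. All function names are hypothetical.

```python
import math

def covariance(Z):
    """Eq. (24): C = Z^T Z / (n - 1); valid because standardized training columns have zero mean."""
    n, m = len(Z), len(Z[0])
    return [[sum(row[i] * row[j] for row in Z) / (n - 1) for j in range(m)] for i in range(m)]

def top_eigenpairs(C, k, iters=500):
    """Leading k (eigenvalue, eigenvector) pairs of a symmetric PSD matrix by power iteration."""
    m = len(C)
    A = [row[:] for row in C]  # working copy; C itself is not modified
    pairs = []
    for _ in range(k):
        norm0 = math.sqrt(sum((1.0 + i) ** 2 for i in range(m)))
        v = [(1.0 + i) / norm0 for i in range(m)]  # deterministic unit start vector
        for _ in range(iters):
            w = [sum(A[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining spectrum is numerically zero
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T A v is the eigenvalue for the unit vector v
        lam = sum(v[i] * sum(A[i][j] * v[j] for j in range(m)) for i in range(m))
        pairs.append((lam, v))
        # Deflation A <- A - lam * v v^T so the next pass finds the next eigenpair
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    return pairs

def explained_variance_ratio(top_eigenvalues, C):
    """Eq. (27) in percent; the sum of all eigenvalues equals trace(C)."""
    return 100.0 * sum(top_eigenvalues) / sum(C[i][i] for i in range(len(C)))

def project(Z, eigvecs):
    """Eq. (28): Z_PCA^(k) = Z V_k, the scores on the retained eigenvectors."""
    return [[sum(zj * vj for zj, vj in zip(z, v)) for v in eigvecs] for z in Z]

# Toy zero-mean data with covariance diag(2.5, 1.0): EVR(1) = 2.5 / 3.5, about 71.4%.
Z = [[-2, 1], [-1, -1], [0, 0], [1, -1], [2, 1]]
C = covariance(Z)
pairs = top_eigenpairs(C, 2)
```

Eigenvector signs are arbitrary in PCA, so projected scores may flip sign between implementations without changing the information they carry.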

3.7. Autoencoder-Based Nonlinear Feature Extraction

For the AE+ANN pipeline, a bottleneck Autoencoder is trained on the same standardized 15-dimensional feature space. The encoder maps the input vector $z \in \mathbb{R}^{15}$ into a lower-dimensional latent representation:

h_1 = f(W_1 z + b_1)

(29)

q^{(k)} = f(W_2 h_1 + b_2)

(30)

where $q^{(k)} \in \mathbb{R}^{k}$ is the bottleneck feature vector, $W_1$ and $W_2$ are encoder weight matrices, $b_1$ and $b_2$ are bias terms, and $f(\cdot)$ denotes the nonlinear activation function. The decoder reconstructs the input as:

h_2 = f(W_3 q^{(k)} + b_3)

(31)

\hat{z} = W_4 h_2 + b_4

(32)

where $\hat{z}$ is the reconstructed standardized input. A linear output layer is used in the decoder to reconstruct continuous standardized features. The reconstruction loss is defined as:

\mathrm{MSE}(k) = \frac{1}{n} \sum_{i=1}^{n} \lVert z^{(i)} - \hat{z}^{(i)} \rVert_2^2

(33)

The Autoencoder architecture used in this study is selected as a balanced shallow nonlinear compression model suitable for the moderate dataset size:

15 \to 32 \to k \to 32 \to 15

(34)

where the bottleneck dimensionality $k$ is varied from 1 to 15. The encoder and decoder hidden layers each contain 32 neurons, and the model is trained with the Adam optimizer to minimize the reconstruction loss in Equation (33). After training, the decoder is discarded, and only the encoder output $q^{(k)}$ is used as the compressed feature representation for ANN classification:

Z_{\mathrm{AE}}^{(k)} = \phi(Z)

(35)

where $\phi(\cdot)$ denotes the trained encoder of Equations (29) and (30), applied to each row of $Z$. This procedure underlies the feature extraction diagnostics reported in Figure 6, where the Autoencoder reconstruction error decreases sharply from $k = 1$ to $k = 4$. The best AE+ANN test accuracy is later obtained at $k = 4$, suggesting that the nonlinear encoder compresses most of the diagnostically useful information into a compact latent representation.
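The forward pass of the $15 \to 32 \to k \to 32 \to 15$ Autoencoder in Equations (29) to (33) can be sketched without any framework. This is a minimal sketch under stated assumptions: $f$ is taken to be tanh (the paper does not name the activation), the uniform initialization is an assumed scheme, and Adam training is omitted, so the weights below are untrained. All names are hypothetical.

```python
import math
import random

def dense(W, b, x, activation=None):
    """One fully connected layer: activation(W x + b), or linear if no activation is given."""
    out = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return [activation(o) for o in out] if activation else out

def init_layer(n_out, n_in, rng):
    """Assumed initialization: uniform weights scaled by 1/sqrt(n_in), zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out

def build_autoencoder(k, n_in=15, n_hidden=32, seed=0):
    """Layers (W1, b1) to (W4, b4) for the 15 -> 32 -> k -> 32 -> 15 architecture."""
    rng = random.Random(seed)
    return [init_layer(n_hidden, n_in, rng), init_layer(k, n_hidden, rng),
            init_layer(n_hidden, k, rng), init_layer(n_in, n_hidden, rng)]

def encode(params, z, f=math.tanh):
    (W1, b1), (W2, b2) = params[0], params[1]
    h1 = dense(W1, b1, z, f)     # Eq. (29)
    return dense(W2, b2, h1, f)  # Eq. (30): bottleneck q^(k)

def decode(params, q, f=math.tanh):
    (W3, b3), (W4, b4) = params[2], params[3]
    h2 = dense(W3, b3, q, f)     # Eq. (31)
    return dense(W4, b4, h2)     # Eq. (32): linear output layer

def reconstruction_mse(params, Z):
    """Eq. (33): mean over samples of the squared L2 reconstruction error."""
    total = 0.0
    for z in Z:
        z_hat = decode(params, encode(params, z))
        total += sum((a - b) ** 2 for a, b in zip(z, z_hat))
    return total / len(Z)

params = build_autoencoder(k=4)
q = encode(params, [0.1 * i for i in range(15)])  # 4 latent features for the ANN
```

In a trained model, `encode` plays the role of $\phi(\cdot)$ in Equation (35); the decoder is only needed while minimizing `reconstruction_mse`.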

3.8. ANN Classification Procedure

After dimensionality reduction, the extracted features are passed to the same ANN classifier. For PCA+ANN, the classifier input is $Z_{\mathrm{PCA}}^{(k)}$; for AE+ANN, it is $Z_{\mathrm{AE}}^{(k)}$. The ANN input vector for sample $i$ is therefore defined as:

u^{(i)} = \begin{cases} z_{\mathrm{PCA}}^{(i,k)}, & \text{for PCA+ANN} \\ z_{\mathrm{AE}}^{(i,k)}, & \text{for AE+ANN} \end{cases}

(36)

The hidden layer output is computed as:

a^{(i)} = \tanh(W_h u^{(i)} + b_h)

(37)

where $W_h$ and $b_h$ are the hidden-layer weights and biases. The output logits are:

o^{(i)} = W_o a^{(i)} + b_o

(38)

The predicted class probability for class $c$ is computed using the softmax function:

\hat{y}_c^{(i)} = \frac{\exp(o_c^{(i)})}{\sum_{r=1}^{6} \exp(o_r^{(i)})}

(39)

The predicted fault class is then:

\hat{c}^{(i)} = \arg\max_{c} \hat{y}_c^{(i)}

(40)

The ANN contains an input layer with $k$ neurons, one hidden layer of 20 neurons using the hyperbolic tangent activation function, and six output neurons corresponding to the IEC 60599 classes. The same classifier structure is used for every value of $k$ and for both feature extraction pipelines. The ANN is trained using the L-BFGS optimizer with L₂ regularization by minimizing:

J = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{6} y_c^{(i)} \log(\hat{y}_c^{(i)}) + \alpha \lVert W \rVert_2^2

(41)

where $y_c^{(i)}$ is the one-hot encoded true label, $\hat{y}_c^{(i)}$ is the predicted class probability, $\alpha = 0.01$ is the regularization coefficient, and $W$ denotes the trainable ANN weights. The maximum number of training iterations is set to 2000.
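Equations (37) to (41) can be sketched as a forward pass plus the regularized loss. Assumptions: class labels are integer indices (so the one-hot sum in Equation (41) reduces to the log-probability of the true class), only the weight matrices enter the L₂ penalty, and the L-BFGS optimization itself is omitted. The function names and toy weights are hypothetical.

```python
import math

def ann_forward(Wh, bh, Wo, bo, u):
    """Class probabilities: tanh hidden layer (Eq. 37), logits (Eq. 38), softmax (Eq. 39)."""
    a = [math.tanh(sum(w * x for w, x in zip(row, u)) + b) for row, b in zip(Wh, bh)]
    o = [sum(w * x for w, x in zip(row, a)) + b for row, b in zip(Wo, bo)]
    shift = max(o)  # subtracting the max logit avoids overflow and leaves softmax unchanged
    exps = [math.exp(v - shift) for v in o]
    total = sum(exps)
    return [e / total for e in exps]

def predict(probs):
    """Eq. (40): index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)

def regularized_loss(prob_rows, labels, weights, alpha=0.01):
    """Eq. (41): mean cross-entropy plus alpha times the squared L2 norm of the weight matrices."""
    n = len(labels)
    ce = -sum(math.log(p[c]) for p, c in zip(prob_rows, labels)) / n
    l2 = sum(w * w for matrix in weights for row in matrix for w in row)
    return ce + alpha * l2

# Toy shapes matching the paper: k = 4 inputs, 20 tanh units, 6 classes.
k = 4
Wh, bh = [[0.0] * k for _ in range(20)], [0.0] * 20
Wo, bo = [[0.0] * 20 for _ in range(6)], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
probs = ann_forward(Wh, bh, Wo, bo, [0.5, -0.2, 0.1, 0.3])
```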

3.9. Feature Extraction Diagnostics

Feature extraction diagnostics are used to interpret how PCA and the Autoencoder compress the DGA feature space. For PCA, the cumulative explained variance $\mathrm{EVR}(k)$ in Equation (27) is used to determine how much variance is retained as the number of principal components increases. For the Autoencoder, the reconstruction error $\mathrm{MSE}(k)$ in Equation (33) is used to evaluate how well the compressed latent representation reconstructs the original standardized feature space. As shown in Figure 6, PCA cumulative variance increases progressively as $k$ increases and reaches approximately 99.9% at $k = 11$. This indicates that the variance of the engineered DGA feature space is distributed across multiple linear directions. In contrast, the Autoencoder reconstruction error drops sharply between $k = 1$ and $k = 4$, suggesting that the nonlinear encoder captures the dominant structure of the DGA feature space using fewer latent dimensions.
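The two diagnostics can be read off programmatically. The sketch below finds the smallest $k$ whose cumulative EVR reaches a target, and a hypothetical plateau rule that returns the first $k$ after which one extra latent dimension lowers $\mathrm{MSE}(k)$ by less than a chosen fraction of $\mathrm{MSE}(1)$. The 2% tolerance and the input values are illustrative toy numbers, not the paper's results.

```python
def smallest_k_for_variance(eigenvalues, threshold_pct):
    """Smallest k with cumulative EVR(k) >= threshold (Eq. 27); eigenvalues sorted descending."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, lam in enumerate(eigenvalues, start=1):
        cumulative += lam
        if 100.0 * cumulative / total >= threshold_pct:
            return k
    return len(eigenvalues)

def first_plateau(mse_by_k, rel_tol=0.02):
    """Hypothetical rule: first k after which adding a dimension cuts MSE by < rel_tol * MSE(1)."""
    floor = rel_tol * mse_by_k[0]
    for i in range(len(mse_by_k) - 1):
        if mse_by_k[i] - mse_by_k[i + 1] < floor:
            return i + 1  # mse_by_k[i] corresponds to k = i + 1
    return len(mse_by_k)

# Toy inputs only, chosen to show the mechanics.
k_pca = smallest_k_for_variance([8, 4, 2, 1, 0.5, 0.5], 90.0)
k_ae = first_plateau([1.0, 0.5, 0.2, 0.05, 0.045, 0.044])
```

Neither rule is used in the paper, which selects $k$ by sweeping all values and comparing test accuracy; they are shown only to make the shape of the two curves in Figure 6 easier to reason about.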

3.10. Latent-Space Visualization

To qualitatively assess the separability of the learned representations, the PCA and Autoencoder latent spaces are visualized. For PCA, the first two principal components are plotted as:

s_{\mathrm{PCA}}^{(i)} = [z_{\mathrm{PC1}}^{(i)}, z_{\mathrm{PC2}}^{(i)}]

(42)

For the Autoencoder, the first two latent dimensions are plotted as:

s_{\mathrm{AE}}^{(i)} = [q_1^{(i)}, q_2^{(i)}]

(43)

These visualizations provide insight into whether the extracted features produce compact class clusters or overlapping class regions. As shown in Figure 7, the PCA latent space shows relatively compact thermal fault clusters, especially for T2 and T3, along the dominant principal component direction. The discharge classes PD, D1, and D2 exhibit greater overlap, reflecting the similarity of their hydrogen- and acetylene-related signatures. The Autoencoder latent space shows a nonlinear reorganization of the feature space, with improved compactness for some regions but persistent overlap among discharge classes. This visual evidence aligns with the classification results, where AE+ANN improves selected discharge fault F1-scores but PCA+ANN achieves stronger overall test accuracy and cross-validation stability.

3.11. Performance Evaluation Metrics

The diagnostic performance of both pipelines is evaluated using test accuracy, precision, recall, F1-score, and cross-validation stability. These metrics are selected because overall accuracy alone may not fully describe model behavior across imbalanced fault classes. The overall test accuracy, i.e., the fraction of test samples whose predicted class matches the true class, is computed as:

\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

(44)

For each class $c$, precision is defined as:

\mathrm{Precision}_c = \frac{TP_c}{TP_c + FP_c}

(45)

Recall is defined as:

\mathrm{Recall}_c = \frac{TP_c}{TP_c + FN_c}

(46)

The class-specific F1-score is computed as:

F1_c = 2 \cdot \frac{\mathrm{Precision}_c \cdot \mathrm{Recall}_c}{\mathrm{Precision}_c + \mathrm{Recall}_c}

(47)

The weighted average F1-score is calculated as:

F1_{\mathrm{weighted}} = \sum_{c=1}^{6} \frac{n_c}{N} F1_c

(48)

where $n_c$ is the number of test samples in class $c$, and $N$ is the total number of test samples. Cross-validation stability is reported using the mean and standard deviation of validation accuracy:

CV_{\mathrm{mean}} = \frac{1}{K} \sum_{j=1}^{K} A_j

(49)

CV_{\mathrm{std}} = \sqrt{\frac{1}{K - 1} \sum_{j=1}^{K} (A_j - CV_{\mathrm{mean}})^2}

(50)

where $A_j$ is the validation accuracy in fold $j$, and $K$ is the number of validation folds. These statistics are used to compare the robustness of PCA+ANN and AE+ANN. The reported results show that PCA+ANN achieves greater stability, with $62 \pm 3.5\%$, compared with $62 \pm 6.6\%$ for AE+ANN.
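Equations (44) to (50) can be sketched directly from predicted and true labels. Assumptions: a class with an empty denominator is scored as 0, and fold accuracies are supplied as plain fractions; the labels and fold values in the usage lines are toy data, not the paper's results.

```python
import math
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class (multiclass Eq. 44)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_metrics(y_true, y_pred, classes):
    """(precision, recall, F1) per class, Eqs. (45) to (47); 0.0 when a denominator is zero."""
    out = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        out[c] = (prec, rec, f1)
    return out

def weighted_f1(y_true, metrics):
    """Eq. (48): per-class F1 weighted by the class share n_c / N of the test samples."""
    support = Counter(y_true)
    n = len(y_true)
    return sum(support[c] / n * f1 for c, (_, _, f1) in metrics.items())

def cv_summary(fold_accuracies):
    """Eqs. (49) and (50): mean and sample standard deviation (K - 1 denominator)."""
    k = len(fold_accuracies)
    mean = sum(fold_accuracies) / k
    std = math.sqrt(sum((a - mean) ** 2 for a in fold_accuracies) / (k - 1))
    return mean, std

y_true = ["PD", "PD", "D1", "D1"]
y_pred = ["PD", "D1", "D1", "D1"]
m = per_class_metrics(y_true, y_pred, ["PD", "D1"])
mean, std = cv_summary([0.60, 0.62, 0.64])
```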

3.12. Experimental Protocol

The complete experimental procedure is summarized as follows. First, the raw DGA gas measurements are expanded into a 15-dimensional feature vector using total hydrocarbon content and IEC-aligned gas ratios. Second, the dataset is divided into stratified training and testing subsets using an 80/20 split. Third, standardization is fitted on the training data and applied consistently to both training and testing sets. Fourth, PCA and Autoencoder feature extraction are performed independently for each latent dimension $k = 1, 2, \dots, 15$. Fifth, the extracted $k$-dimensional features are used to train identical ANN classifiers. Finally, both pipelines are evaluated using test accuracy, confusion matrices, per-class precision, recall, F1-score, and cross-validation stability. This protocol ensures that the comparison between PCA+ANN and AE+ANN is systematic, controlled, and directly aligned with the results presented in Section 4. The methodology also supports the central objective of the study: to determine whether nonlinear Autoencoder-based feature extraction provides measurable diagnostic advantages over linear PCA-based dimensionality reduction for IEC 60599 transformer fault classification.
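The controlled design reduces to a loop over $k$ in which only the extractor changes. A minimal sketch, assuming two hypothetical callables: `extract(k)` returns training and test features from an extractor fitted on the training data only, and `train_eval` trains the fixed ANN and returns its test accuracy. Running the same sweep once with a PCA extractor and once with an Autoencoder extractor mirrors the two pipelines.

```python
def sweep_latent_dims(extract, train_eval, k_values=range(1, 16)):
    """Evaluate a fixed classifier on k-dimensional features for each k; return the best k.

    extract(k) -> (train_features, test_features)   # hypothetical, fitted on training data only
    train_eval(train_features, test_features) -> test accuracy   # hypothetical, same ANN every time
    """
    results = {}
    for k in k_values:
        train_feats, test_feats = extract(k)
        results[k] = train_eval(train_feats, test_feats)
    best_k = max(results, key=results.get)  # ties resolve to the smallest k
    return best_k, results

# Stand-in callables whose "accuracy" peaks at k = 11, just to exercise the loop.
best_k, results = sweep_latent_dims(lambda k: (k, k), lambda tr, te: 1.0 - abs(tr - 11) / 20)
```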

4. Results and Discussion

4.1. Classification Accuracy Across Latent Dimensions

The classification performance of the PCA+ANN and AE+ANN pipelines was first evaluated across all latent dimensions $k = 1, 2, \dots, 15$. This analysis determines how the degree of dimensionality reduction affects diagnostic performance and identifies the optimal compressed feature size for each method. Table 4 summarizes the representative accuracy values at selected latent dimensions, while Figure 8 presents the complete accuracy trend for both pipelines.

The PCA+ANN pipeline achieved its highest test accuracy of 68.9% at $k = 11$, where the retained cumulative variance was approximately 99.9%. This indicates that although the engineered DGA feature space contains redundancy, diagnostically useful information is distributed across several principal components. Therefore, aggressive PCA compression removes information that remains relevant for fault discrimination. The performance increase at higher $k$-values suggests that the ANN classifier benefits from retaining nearly the full linear variance structure of the original 15-dimensional feature space.

In contrast, the AE+ANN pipeline achieved its best test accuracy of 66.4% at $k = 4$. This result indicates that the bottleneck Autoencoder compressed most of the useful nonlinear structure into a much smaller latent representation. The sharp decrease in reconstruction error from $k = 1$ to $k = 4$ supports this observation, showing that the Autoencoder rapidly learned a compact representation of the standardized DGA feature space. However, increasing the latent dimension beyond $k = 4$ did not produce a consistent accuracy improvement. At $k = 15$, AE+ANN accuracy decreased to 62.2%, suggesting that additional latent dimensions may reintroduce redundant or less discriminative variations into the ANN classifier.

PCA+ANN outperformed AE+ANN by 2.5 percentage points at their respective optimal configurations. This confirms that, for the present dataset, linear variance-preserving feature extraction provides the strongest overall classification accuracy. However, AE+ANN achieved competitive accuracy using only four latent dimensions, making it potentially useful when compact representation is prioritized over maximum accuracy.

4.2. Confusion Matrix Analysis

Confusion matrices were used to examine the class-level behavior of the two optimal models: PCA+ANN at $k = 11$ and AE+ANN at $k = 4$. Figure 9 compares the classification patterns produced by both pipelines on the 119-sample test set.

The PCA+ANN model produced the highest overall test accuracy and showed stronger classification performance for several thermal fault classes. In particular, PCA+ANN more effectively classified T2 and T3 faults, indicating that the linear principal component space preserved important gas-ratio information associated with thermal degradation. This is consistent with the observation that thermal classes form more compact structures in the PCA latent space. The AE+ANN model showed a different class-level behavior. Although its overall accuracy was slightly lower, it improved the separation of some discharge-related classes, particularly D1 and D2. This suggests that nonlinear encoding can reorganize overlapping discharge signatures in a way that improves their distinction. However, the AE+ANN model also introduced additional misclassifications among thermal classes, especially where the Autoencoder compressed features too aggressively. Both models struggled most with the PD-D1-D2 discharge group. This is expected because these fault classes share partially overlapping gas-generation mechanisms, especially involving elevated hydrogen and acetylene-related patterns. Such overlap makes the discharge classes inherently more difficult to separate than the higher-temperature thermal classes.

4.3. Per-Class Precision, Recall, and F1-Score Analysis

To provide a more detailed diagnostic interpretation, per-class precision, recall, and F1-score were calculated for both optimal pipelines. Table 5 presents the comparative per-class metrics, while Figure 10 and Figure 11 visualize the same results.

The per-class results show that neither method dominates across all fault classes. PCA+ANN achieved higher F1-scores for PD, T1, T2, and T3, while AE+ANN achieved higher F1-scores for D1 and D2. This indicates that the two feature extraction strategies preserve different diagnostic structures in the DGA feature space.

For thermal faults, PCA+ANN performed particularly well. The highest F1-score was obtained for T3, where PCA+ANN achieved 85.1%, compared with 77.6% for AE+ANN. PCA+ANN also achieved a strong F1-score of 81.0% for T2, compared with 69.8% for AE+ANN. These results indicate that the linear principal component space effectively preserves the dominant gas-generation patterns associated with thermal degradation, especially those involving ethylene and hydrocarbon concentration trends.

For discharge faults, AE+ANN showed targeted improvement. The F1-score for D1 increased from 54.1% with PCA+ANN to 60.0% with AE+ANN, while the F1-score for D2 increased from 65.4% to 72.0%. These improvements suggest that the nonlinear Autoencoder representation better captures overlapping discharge-related structures that are not fully separated by linear PCA projection. However, AE+ANN performed worse than PCA+ANN for PD and all three thermal classes. The reduction in T2 performance was the largest, with an F1-score decrease of 11.2 percentage points compared with PCA+ANN. This indicates that although nonlinear compression benefits some discharge classes, it may suppress or distort linear gas-ratio information that is important for thermal fault classification.

The weighted average F1-score was 68.1% for PCA+ANN and 65.4% for AE+ANN. Therefore, PCA+ANN remains the stronger overall model, while AE+ANN provides class-specific advantages for discharge fault discrimination.

4.4. Cross-Validation Stability Comparison

Cross-validation was used to evaluate the robustness of both pipelines on the training set. The purpose of this analysis was not only to estimate average performance but also to determine how sensitive each method is to variation in the training data. The PCA+ANN pipeline achieved a cross-validation accuracy of:

62.0\% \pm 3.5\%

whereas the AE+ANN pipeline achieved:

62.0\% \pm 6.6\%

Although both methods obtained the same mean cross-validation accuracy, PCA+ANN showed substantially lower variability, with a standard deviation of 3.5% compared with 6.6% for AE+ANN. The smaller standard deviation indicates that PCA+ANN produced more consistent results across validation folds. This stability is expected because PCA is deterministic once the training data are fixed. Its transformation is based on eigenvalue decomposition and does not depend on random weight initialization. In contrast, AE+ANN showed higher fold-to-fold variation. This is consistent with the stochastic nature of neural representation learning. The Autoencoder training process depends on initialization, nonlinear optimization, and latent-space learning dynamics. These factors can lead to different compressed representations across folds, particularly when the dataset is moderately sized.

From a practical deployment perspective, this result is important. Industrial transformer diagnostic systems require not only high accuracy but also predictable and repeatable behavior. Therefore, the lower cross-validation variability of PCA+ANN strengthens its suitability as a reliable baseline for real-world DGA-based fault diagnosis.

4.5. Benchmark Against Conventional Classifiers

To contextualize the performance of the two proposed feature-extraction pipelines, a broader classifier benchmark was conducted using the same DGA test set. Figure 12 compares Gaussian Naïve Bayes, K-Nearest Neighbors, Support Vector Machine, a standard ANN without dimensionality reduction, PCA+ANN, and AE+ANN.

Gaussian Naïve Bayes achieved the weakest performance, with an accuracy of approximately 36.1%. This poor result is expected because DGA features are strongly correlated and do not satisfy the conditional independence assumptions of Naïve Bayes. In addition, gas concentration and ratio features are typically non-Gaussian and class-overlapping, further limiting the suitability of this classifier. KNN achieved approximately 55% accuracy, while SVM achieved approximately 62% accuracy. These results indicate that distance-based and margin-based classifiers can capture some structure in the engineered DGA feature space, but they do not outperform the best feature-extraction-based ANN pipelines.

The baseline ANN trained directly on the uncompressed 15-dimensional input achieved 63.9% accuracy. While this result confirms the usefulness of ANN-based nonlinear classification, it also shows that using the full uncompressed feature space does not produce the best performance. The uncompressed ANN also exhibited higher validation variance, indicating greater sensitivity to redundant input features.

Both PCA+ANN and AE+ANN outperformed the baseline ANN at their optimal configurations. This supports the central hypothesis of the study: dimensionality reduction improves DGA fault classification by reducing redundancy and providing a more compact representation before ANN classification. Among all tested models, PCA+ANN achieved the best overall test accuracy, while AE+ANN achieved competitive performance with a smaller latent dimension.

4.6. Head-to-Head Comparison of PCA+ANN and AE+ANN

A direct comparison between PCA+ANN and AE+ANN is summarized in Figure 13, which presents overall test accuracy, weighted average precision, and per-class F1-score differences.

The head-to-head comparison supports three major findings.

First, PCA+ANN provides the strongest overall classification performance. It achieved 68.9% test accuracy compared with 66.4% for AE+ANN. It also achieved a higher weighted average F1-score, confirming that its advantage is not limited to a single class.

Second, AE+ANN provides targeted diagnostic improvement for discharge-related faults. The F1-score increased by 5.9 percentage points for D1 and 6.6 percentage points for D2. This supports the interpretation that nonlinear encoding can help separate overlapping discharge signatures in the engineered gas-ratio feature space.

Third, PCA+ANN remains stronger for the thermal fault classes. The F1-score advantage of PCA+ANN was especially clear for T2 and T3, where thermal gas-generation patterns appear to be better preserved by linear variance-based projection. This suggests that the dominant structure of the thermal classes is sufficiently represented by PCA components. AE+ANN did not improve T2 performance; instead, its T2 F1-score was 11.2 percentage points lower than that of PCA+ANN. Therefore, AE+ANN should not be described as uniformly superior for thermal or general fault classification; its advantage is specific to selected discharge classes, particularly D1 and D2.

Overall, PCA and Autoencoder feature extraction provide complementary diagnostic behavior. PCA is more suitable when the goal is stable, general-purpose transformer fault classification. Autoencoder-based extraction is more suitable when compact nonlinear representation or improved discharge-class discrimination is prioritized.

4.7. Practical Interpretation and Deployment Implications

The results have direct implications for practical transformer condition-monitoring systems. In industrial environments, diagnostic models must be accurate, stable, computationally efficient, and sufficiently interpretable for engineering decision support. Based on these requirements, PCA+ANN is the preferable general-purpose model for the present dataset because it achieved the highest test accuracy, stronger weighted F1-score, and lower cross-validation variability. The deterministic nature of PCA also supports deployment in settings where repeatability is important. Since PCA does not depend on random initialization or iterative neural representation learning, the extracted features are more stable across repeated experiments. This makes PCA+ANN attractive for applications requiring consistent diagnostic outputs and simpler model validation.

However, AE+ANN remains valuable in specific settings. Its ability to achieve 66.4% accuracy using only $k = 4$ latent features indicates that nonlinear Autoencoder compression can produce compact diagnostic representations. This may be useful for embedded monitoring systems or edge-based transformer diagnostic devices where computational resources and storage are limited. In addition, the improved F1-scores for D1 and D2 suggest that AE+ANN may be useful when discharge fault discrimination is the primary diagnostic objective. Therefore, the choice between PCA+ANN and AE+ANN should depend on the deployment objective. PCA+ANN is recommended as the stronger baseline for robust industrial classification, while AE+ANN may be considered for compact or discharge-focused diagnostic systems.

4.8. Summary of Findings

The main findings from the results can be summarized as follows. PCA+ANN achieved the best overall test accuracy of 68.9% at $k = 11$, while AE+ANN achieved its best accuracy of 66.4% at $k = 4$. PCA required more latent dimensions to reach its optimum, indicating that relevant diagnostic variance is distributed across multiple linear components. In contrast, the Autoencoder achieved competitive performance using fewer latent dimensions, indicating effective nonlinear compression. Per-class analysis showed that PCA+ANN performed better for PD and thermal faults, especially T2 and T3. AE+ANN improved D1 and D2 classification, confirming its targeted advantage in separating discharge-related classes. Cross-validation results further showed that PCA+ANN was more stable, with a lower standard deviation than AE+ANN. The broader classifier benchmark confirmed that both feature-extraction pipelines outperform conventional classifiers and a baseline uncompressed ANN. These findings validate the importance of feature extraction in DGA-based transformer fault diagnosis and establish PCA+ANN as the preferred overall approach for this dataset, while highlighting AE+ANN as a useful alternative for compact nonlinear representation and discharge fault discrimination.

5. Conclusions

This paper presented a comparative evaluation of PCA+ANN and AE+ANN models for dimensionality reduction and fault classification in DGA-based power transformer diagnosis. The results showed that PCA+ANN achieved the highest overall test accuracy of 68.9% at k = 11, compared with 66.4% for AE+ANN at k = 4. PCA also demonstrated greater cross-validation stability, indicating that its deterministic linear projection provides a more reliable feature representation for this dataset. Although AE+ANN produced a more compact latent representation, it did not surpass PCA+ANN in overall accuracy. However, the autoencoder showed targeted advantages in distinguishing discharge-related faults, particularly D1 and D2, suggesting that nonlinear feature extraction can better capture overlapping gas-ratio patterns in specific fault classes. In contrast, PCA+ANN performed more strongly for the PD, T1, T2, and T3 classes, supporting its suitability for general industrial deployment where reliability and stability are important. The findings confirm that feature extraction improves ANN-based transformer fault diagnosis compared with using uncompressed input features. PCA+ANN is recommended as the preferred baseline model for robust DGA fault classification, while AE+ANN may be useful in applications focused on discharge fault discrimination or compact embedded diagnostic systems. Future work should investigate variational, stacked, and class-conditional autoencoder architectures, as well as hybrid PCA-AE models and optimized training strategies, to further improve classification accuracy and class-specific fault separation.

Author Contributions

Conceptualization, M.S.N. and B.A.T.; methodology, M.S.N. and B.A.T.; software, M.S.N.; validation, M.S.N. and B.A.T.; formal analysis, M.S.N.; investigation, M.S.N.; resources, B.A.T.; data curation, M.S.N.; writing—original draft preparation, M.S.N.; writing—review and editing, B.A.T.; visualization, M.S.N.; supervision, B.A.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data used in this study are included within the article. Additional data may be made available by the corresponding author upon reasonable request.

Acknowledgments

The authors would like to acknowledge the Department of Electrical and Electronic Engineering Technology, University of Johannesburg, for the academic support provided during this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AE	Autoencoder
AE+ANN	Autoencoder with Artificial Neural Network
ANN	Artificial Neural Network
CV	Cross-Validation
D1	Low Energy Discharge
D2	High Energy Discharge
DGA	Dissolved Gas Analysis
F1	F1-Score
GNB	Gaussian Naïve Bayes
IEC	International Electrotechnical Commission
KNN	K-Nearest Neighbors
L-BFGS	Limited-memory Broyden-Fletcher-Goldfarb-Shanno
MLP	Multilayer Perceptron
MSE	Mean Squared Error
P	Precision
PCA	Principal Component Analysis
PCA+ANN	Principal Component Analysis with Artificial Neural Network
PC	Principal Component
PD	Partial Discharge
R	Recall
SVD	Singular Value Decomposition
SVM	Support Vector Machine
T1	Thermal Fault (T < 300 °C)
T2	Thermal Fault (300–700 °C)
T3	Thermal Fault (T > 700 °C)
TCH	Total Hydrocarbon Content
H₂	Hydrogen
CH₄	Methane
C₂H₂	Acetylene
C₂H₄	Ethylene
C₂H₆	Ethane
CO	Carbon Monoxide

Appendix A. Representative DGA Cases

Table A1. Simplified transformer dataset gas concentrations.

Case	H₂	CH₄	C₂H₆	C₂H₄	C₂H₂	IEC Diagnosis
PD-1	2587.2	112.25	4.704	1.4	0.001	Partial Discharges (PD)
PD-2	16,000	3600	670	14	0.001	Partial Discharges (PD)
PD-3	490	193	85	475	21	Partial Discharges (PD)
PD-4	292.58	38.39	3.87	0.84	0.001	Partial Discharges (PD)
PD-5	160	24.7	38.5	0.001	0.001	Partial Discharges (PD)
PD-6	103.5	4.7	16.3	3.5	0.001	Partial Discharges (PD)
PD-7	5869.58	175.21	16.45	1.45	0.001	Partial Discharges (PD)
PD-8	1076	95	71	4	231	Partial Discharges (PD)
PD-9	441	678	73	62	0	Partial Discharges (PD)
PD-10	199	21	0	40	144	Partial Discharges (PD)
PD-11	6600	1000	38	2	19	Partial Discharges (PD)
PD-12	3417.62	131.42	14.36	1.22	0.001	Partial Discharges (PD)
D1-1	1950	123	38	2	2	Discharges of low energy (D1)
D1-2	980	73	58	12	0.001	Discharges of low energy (D1)
D1-3	9201	744	226	12	13	Discharges of low energy (D1)
D1-4	14.2	4	1.4	1.5	9.51	Discharges of low energy (D1)
D1-5	700	137.4	14.9	194.8	936.6	Discharges of low energy (D1)
D1-6	176	206	47.7	75.7	68.7	Discharges of low energy (D1)
D1-7	4879	262	15	332	1827	Discharges of low energy (D1)
D1-8	50	2	28	8	6	Discharges of low energy (D1)
D1-9	9619	780	220	9	38	Discharges of low energy (D1)
D1-10	476	28	27	36	148	Discharges of low energy (D1)
D1-11	137	33	8	29	111	Discharges of low energy (D1)
D1-12	65.2	20	3.9	8.13	25.1	Discharges of low energy (D1)
D2-1	440	522	31	62	183	Discharges of high energy (D2)
D2-2	2271	739	4419	350,700	626,882	Discharges of high energy (D2)
D2-3	151	51	16	12	19	Discharges of high energy (D2)
D2-4	2118	844	4443	449,264	540,711	Discharges of high energy (D2)
D2-5	235.46	333.59	177.52	1201.85	148.87	Discharges of high energy (D2)
D2-6	26.6	4	0.001	8	50	Discharges of high energy (D2)
D2-7	7238.97	695.16	231.6	2394.3	2308.92	Discharges of high energy (D2)
D2-8	2409	712	4284	440,297	589,171	Discharges of high energy (D2)
D2-9	240	28	6	26	85	Discharges of high energy (D2)
D2-10	43	19	3	0.001	40	Discharges of high energy (D2)
D2-11	5760	540	40.5	1000	2760	Discharges of high energy (D2)
D2-12	84	6	1	14	86	Discharges of high energy (D2)
T1-1	110.4	112	32.5	80.8	0.001	Thermal fault, t < 300 °C
T1-2	43	28	72	9	0.001	Thermal fault, t < 300 °C
T1-3	85	152	128	120	0	Thermal fault, t < 300 °C
T1-4	68	30	9	32	0	Thermal fault, t < 300 °C
T1-5	90	313	67	566	10	Thermal fault, t < 300 °C
T1-6	219	44	3	3	0.001	Thermal fault, t < 300 °C
T1-7	92	27	67	7	0.001	Thermal fault, t < 300 °C
T1-8	181	262	41	28	0.001	Thermal fault, t < 300 °C
T1-9	9	38	93	8	0.001	Thermal fault, t < 300 °C
T1-10	93.5	131.9	39	11.7	0.001	Thermal fault, t < 300 °C
T1-11	112	68	136	9	0.001	Thermal fault, t < 300 °C
T1-12	48	40	11	0.5	0.001	Thermal fault, t < 300 °C
T2-1	110.6	458.8	242.6	406.4	0.001	Thermal fault, 300 °C < t < 700 °C
T2-2	109	102	28	91	0.001	Thermal fault, 300 °C < t < 700 °C
T2-3	12	8	3	2	0	Thermal fault, 300 °C < t < 700 °C
T2-4	35	112	55	143	0	Thermal fault, 300 °C < t < 700 °C
T2-5	223	390	160	392	2	Thermal fault, 300 °C < t < 700 °C
T2-6	100	125	188	10	0	Thermal fault, 300 °C < t < 700 °C
T2-7	110	136	50	125	0	Thermal fault, 300 °C < t < 700 °C
T2-8	110	137	50	124	0	Thermal fault, 300 °C < t < 700 °C
T2-9	670	224	45	67	2	Thermal fault, 300 °C < t < 700 °C
T2-10	84	126	28.9	132.2	0.37	Thermal fault, 300 °C < t < 700 °C
T2-11	65	42	17	103	0	Thermal fault, 300 °C < t < 700 °C
T2-12	550	38	36	18	0	Thermal fault, 300 °C < t < 700 °C
T3-1	374	900	932	5759	55	Thermal fault, t > 700 °C
T3-2	13	39	52	20	0	Thermal fault, t > 700 °C
T3-3	179	306	73	579	0.001	Thermal fault, t > 700 °C
T3-4	20	26	6	55	0.001	Thermal fault, t > 700 °C
T3-5	25	63	31	103	0	Thermal fault, t > 700 °C
T3-6	42	79	31	152	1	Thermal fault, t > 700 °C
T3-7	50	52	56	30	0	Thermal fault, t > 700 °C
T3-8	148	396	481	131	1	Thermal fault, t > 700 °C
T3-9	85	320	102	650	0	Thermal fault, t > 700 °C
T3-10	0.001	106	170	871	6	Thermal fault, t > 700 °C
T3-11	103	221.7	47.2	422	0.9	Thermal fault, t > 700 °C
T3-12	27	51	0.001	153	0.001	Thermal fault, t > 700 °C
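Part of the feature space can be computed from a single row of Table A1. The sketch below assumes the three basic IEC 60599 ratios (C₂H₂/C₂H₄, CH₄/H₂, C₂H₄/C₂H₆) and a total hydrocarbon value defined as the sum of the four hydrocarbon gases; the paper's full 15-feature set and its exact definitions are not reproduced here, and the `EPS` guard against zero denominators is a hypothetical choice.

```python
# Gas concentrations for case PD-1 in Table A1 (same units as the table).
PD_1 = {"H2": 2587.2, "CH4": 112.25, "C2H6": 4.704, "C2H4": 1.4, "C2H2": 0.001}

EPS = 1e-6  # hypothetical guard so that zero concentrations do not divide by zero


def iec_basic_ratios(gas):
    """The three basic gas ratios used by IEC 60599."""
    return {
        "C2H2/C2H4": gas["C2H2"] / (gas["C2H4"] + EPS),
        "CH4/H2": gas["CH4"] / (gas["H2"] + EPS),
        "C2H4/C2H6": gas["C2H4"] / (gas["C2H6"] + EPS),
    }


def total_hydrocarbons(gas):
    """Assumed definition: sum of the hydrocarbon gases CH4, C2H6, C2H4 and C2H2."""
    return gas["CH4"] + gas["C2H6"] + gas["C2H4"] + gas["C2H2"]


def feature_vector(gas):
    """Raw gases, total hydrocarbons and basic ratios (only a subset of the 15 features)."""
    ratios = iec_basic_ratios(gas)
    raw = [gas[g] for g in ("H2", "CH4", "C2H6", "C2H4", "C2H2")]
    return raw + [total_hydrocarbons(gas), *ratios.values()]


ratios = iec_basic_ratios(PD_1)
print({name: round(value, 4) for name, value in ratios.items()})
print(round(total_hydrocarbons(PD_1), 3))
```

For PD-1 this gives CH₄/H₂ ≈ 0.043, C₂H₄/C₂H₆ ≈ 0.298, and a total hydrocarbon value of 118.355.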

References

1. Gifalli, A.; Neto, A.B.; de Souza, A.N.; de Mello, R.P.; Ikeshoji, M.A.; Garbelini, E.; Neto, F.T. Fault Detection and Normal Operating Condition in Power Transformers via Pattern Recognition Artificial Neural Network. Appl. Syst. Innov. 2024, 7, 41.
2. De Andrade Ferreira, R.S.; Picher, P.; Ezzaidi, H.; Fofana, I. Frequency Response Analysis Interpretation Using Numerical Indices and Machine Learning: A Case Study Based on a Laboratory Model. IEEE Access 2021, 9, 67051–67063.
3. Domor, I.; Theo, M. Deep Autoencoder Neural Networks: A Comprehensive Review and New Perspectives. Arch. Comput. Methods Eng. 2025, 32.
4. Faridpak, B.; Musilek, P. Resilient Operation Strategies for Integrated Power-Gas Systems. Energies 2024, 17, 6270.
5. Pan, W.; Li, Y.; Guo, Z.; Zhang, Y. Interdependent Expansion Planning for Resilient Electricity and Natural Gas Networks. Processes 2024, 12, 775.
6. Sun, Q.; Wu, Z.; Gu, W.; Dong, Z.Y.; Liu, P.; Qiu, H.; Amer, Y.; Lu, Y.; Zheng, Y. Seismic-Resilient Planning for Integrated Energy System: A Risk-Economic Coordination Perspective. In IEEE Transactions on Power Systems; IEEE: Washington, DC, USA, 2025.
7. Arias Velásquez, R.M.; Mejía Lara, J.V. Root cause analysis improved with machine learning for failure analysis in power transformers. Eng. Fail. Anal. 2020, 115, 104684.
8. Khan, M.A.M. AI and Machine Learning in Transformer Fault Diagnosis: A Systematic Review. Am. J. Adv. Technol. Eng. Solut. 2025, 1, 290–318.
9. Li, Z.; He, Y.; Xing, Z.; Duan, J. Transformer fault diagnosis based on improved deep coupled dense convolutional neural network. Electr. Power Syst. Res. 2022, 209, 107969.
10. Rangel Bessa, A.; Farias Fardin, J.; Marques Ciarelli, P.; Frizera Encarnação, L. Conventional Dissolved Gases Analysis in Power Transformers: Review. Energies 2023, 16, 7219.
11. Cabral, T.W.; De Lima, E.R.; Cândido, J.; Filho, S.S.; Meloni, L.G.P. Autoencoders Beat PCA for Low-Dimension DGA-based Fault Diagnosis of Power Transformers. In Proceedings of the XLII Simpósio Brasileiro de Telecomunicações e Processamento de Sinais, Belem do Pará, Brazil, 1–4 October 2024.
12. Du, Y.; Wang, Z.; Feng, G. A Methodology to Diagnose Transformer Faults Based on Principal Components Analysis and Artificial Neural Network. In 2022 IEEE 6th Conference on Energy Internet and Energy System Integration (EI2); IEEE: Washington, DC, USA, 2022; pp. 1186–1189.
13. Demirci, M.; Gözde, H.; Taplamacioglu, M.C. Improvement of power transformer fault diagnosis by using sequential Kalman filter sensor fusion. Int. J. Electr. Power Energy Syst. 2023, 149, 109038.
14. Al-Sakini, S.R.; Bilal, G.A.; Sadiq, A.T.; Al-Maliki, W.A.K. Dissolved Gas Analysis for Fault Prediction in Power Transformers Using Machine Learning Techniques. Appl. Sci. 2025, 15, 118.
15. Zhang, L.; Xu, Z.; Lu, C.; Qiao, T.; Su, H.; Luo, Y. Transformer fault diagnosis based on adversarial generative networks and deep stacked autoencoder. Heliyon 2024, 10, e30670.
16. Sakurada, M.; Yairi, T. Anomaly detection using autoencoders with nonlinear dimensionality reduction. In Proceedings of the 2014 2nd Workshop on Machine Learning for Sensory Data Analysis; Association for Computing Machinery: New York, NY, USA, 2014; pp. 4–11.
17. Jaiswal, G.; Rani, R.; Mangotra, H.; Sharma, A. Integration of hyperspectral imaging and autoencoders: Benefits, applications, hyperparameter tunning and challenges. Comput. Sci. Rev. 2023, 50, 100584.
18. Van Der Maaten, L.; Hinton, G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
19. Rahman, M.A.; Muniyandi, R.C. An Enhancement in Cancer Classification Accuracy Using a Two-Step Feature Selection Method Based on Artificial Neural Networks with 15 Neurons. Symmetry 2020, 12, 271.
20. Xu, G.; Zhang, M.; Chen, W.; Wang, Z. Transformer Fault Diagnosis Utilizing Feature Extraction and Ensemble Learning Model. Information 2024, 15, 561.
21. Shlens, J. A tutorial on principal component analysis. arXiv 2014, arXiv:1404.1100.
22. Laayati, A.; El-Bazi, N.; Bouzi, M.; Chebak, A.; Guerrero, J.M. An AI-Layered with MultiAgent Systems Architecture for Prognostics Health Management of Smart Transformers: A Novel Approach for Smart Grid-Ready Energy Management Systems. Energies 2022, 15, 7217.
23. Matharage, S.Y.; Liu, Q.; Wang, Z.D.; Mavrommatis, P.; Wilson, G.; Jarman, P. Ageing assessment of transformer paper insulation through detection of methanol in oil. In Proceedings of the 2015 IEEE 11th International Conference on the Properties and Applications of Dielectric Materials (ICPADM), Sydney, NSW, Australia, 19–22 July 2015; pp. 392–395.
24. Kamel, H. Artificial intelligence for predictive maintenance. J. Phys. Conf. Ser. 2022, 2299, 012001.
25. Nagpal, T.; Brar, Y.S. Artificial neural network approaches for fault classification: Comparison and performance. Neural Comput. Appl. 2014, 25, 1863–1870.
26. Schmidgall, S.; Ziaei, R.; Achterberg, J.; Kirsch, L.; Hajiseyedrazi, S.P.; Eshraghian, J. Brain-Inspired Learning in Artificial Neural Networks: A Review. APL Mach. Learn. 2024, 2, 021501.
27. Liu, S.; Zhang, C.; Ma, J. Stacked Auto-Encoders for Feature Extraction with Neural Networks. In Bio-Inspired Computing—Theories and Applications; Springer: Berlin/Heidelberg, Germany, 2016; Volume 1, pp. 377–384.
28. Aciu, A.-M.; Nicola, C.-I.; Nicola, M.; Nițu, M.-C. Complementary Analysis for DGA Based on Duval Methods and Furan Compounds Using Artificial Neural Networks. Energies 2021, 14, 588.

Figure 1. Bottleneck Autoencoder architecture for nonlinear DGA feature extraction.
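A bottleneck autoencoder of the type named in Figure 1 can be sketched in dependency-free Python. This is a minimal illustration, not the authors' implementation: it assumes a single tanh encoder layer that compresses d inputs to k latent units, a linear decoder, a mean-squared-error loss, and plain full-batch gradient descent (the abbreviations list L-BFGS, so the published training setup may differ). The layer sizes, learning rate, epoch count, toy data, and the function name `train_bottleneck_ae` are hypothetical. After training, the encoder output plays the role that PCA scores play in the other pipeline, that is, it becomes the input to the ANN classifier.

```python
import math
import random


def train_bottleneck_ae(data, k, epochs=300, lr=0.05, seed=0):
    """Train a d -> k -> d autoencoder (tanh encoder, linear decoder) on MSE.

    Returns the trained encoder function and the per-epoch MSE history.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(k)]  # encoder weights
    b1 = [0.0] * k
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(d)]  # decoder weights
    b2 = [0.0] * d

    def encode(x):
        # Nonlinear bottleneck: k tanh units.
        return [math.tanh(sum(W1[j][i] * x[i] for i in range(d)) + b1[j]) for j in range(k)]

    def decode(h):
        # Linear reconstruction of the d input features.
        return [sum(W2[i][j] * h[j] for j in range(k)) + b2[i] for i in range(d)]

    history = []
    for _ in range(epochs):
        gW1 = [[0.0] * d for _ in range(k)]
        gb1 = [0.0] * k
        gW2 = [[0.0] * k for _ in range(d)]
        gb2 = [0.0] * d
        loss = 0.0
        for x in data:
            h = encode(x)
            err = [y - t for y, t in zip(decode(h), x)]
            loss += sum(e * e for e in err) / (d * n)
            dy = [2.0 * e / (d * n) for e in err]  # gradient of the mean squared error
            for i in range(d):
                gb2[i] += dy[i]
                for j in range(k):
                    gW2[i][j] += dy[i] * h[j]
            # Back-propagate through the decoder and the tanh derivative (1 - h^2).
            dh = [sum(W2[i][j] * dy[i] for i in range(d)) * (1.0 - h[j] ** 2) for j in range(k)]
            for j in range(k):
                gb1[j] += dh[j]
                for i in range(d):
                    gW1[j][i] += dh[j] * x[i]
        history.append(loss)
        # Full-batch gradient-descent step; parameters are updated in place,
        # so the encode/decode closures always see the latest values.
        for j in range(k):
            b1[j] -= lr * gb1[j]
            for i in range(d):
                W1[j][i] -= lr * gW1[j][i]
        for i in range(d):
            b2[i] -= lr * gb2[i]
            for j in range(k):
                W2[i][j] -= lr * gW2[i][j]
    return encode, history


# Toy usage: points on a one-dimensional curve embedded in four dimensions.
ts = [-1 + 2 * i / 19 for i in range(20)]
toy = [[t, t * t, math.sin(t), math.cos(t)] for t in ts]
encode, history = train_bottleneck_ae(toy, k=2)
latent = [encode(x) for x in toy]  # 2-D codes that would feed the classifier
print(f"MSE: {history[0]:.4f} -> {history[-1]:.4f}")
```

The reconstruction MSE recorded in `history` is the same kind of quantity that Table 4 reports per latent dimension, although the values depend entirely on the data and training setup.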

Figure 2. Comparative pipeline architectures between PCA linear projection and Autoencoder nonlinear encoding, both followed by an identical ANN classifier.

Figure 3. DGA dataset overall class distribution and training vs. testing split.

Figure 4. DGA gas concentration distributions by IEC 60599 fault type using logarithmic scale.

Figure 5. Pearson correlation matrix of all 15 DGA features.

Figure 6. Feature extraction diagnostics showing PCA cumulative explained variance and Autoencoder reconstruction error across latent dimensions.

Figure 7. Latent-space comparison between PCA projection and Autoencoder nonlinear encoding.

Figure 8. Test accuracy versus latent dimension k for PCA+ANN and AE+ANN pipelines.

Figure 9. Confusion matrices for PCA+ANN at k = 11 and AE+ANN at k = 4.

Figure 10. Per-class precision, recall, and F1-score for PCA+ANN and AE+ANN.

Figure 11. Per-class metric radar charts for PCA+ANN and AE+ANN.

Figure 12. Complete classifier benchmark evaluated on the same DGA test set.

Figure 13. Head-to-head summary of overall test accuracy, weighted average precision, and per-class F1-score change.

Table 1. IEC 60599 Transformer Fault Classification and Gas Signs.

Code	Fault Type Description/Mechanism	Primary Gas Signs	Label
PD	Partial Discharge; low-energy corona in voids	Very high H₂, trace C₂H₂	1
D1	Low Energy Discharge; sparking or arcing at low energy	High H₂ and C₂H₂	2
D2	High Energy Discharge; sustained arc discharge	High C₂H₂ and C₂H₄	3
T1	Thermal Fault; mild overheating of cellulose, T < 300 °C	CH₄, some CO	4
T2	Thermal Fault; moderate oil overheating, 300–700 °C	CH₄ and C₂H₄	5
T3	Thermal Fault; severe oil thermal degradation, T > 700 °C	High C₂H₄, some C₂H₂	6

Table 2. Comparison of existing state-of-the-art research and the proposed study.

Author	Year	Objective	Method/Technique	Pros	Cons/Research Gap
Gifalli et al. [1]	2024	Detect transformer faults and normal operating conditions	Pattern recognition ANN	Demonstrates ANN capability for automated transformer fault detection	Does not focus on dimensionality reduction or PCA-AE comparison
Ferreira et al. [2]	2021	Interpret transformer frequency response analysis using numerical indices and ML	Numerical indices, machine learning	Shows the usefulness of ML in transformer condition assessment	Focuses on FRA rather than DGA-based fault classification
Arias Velásquez and Mejía Lara [7]	2020	Improve root-cause analysis for transformer failure	Machine learning-based failure analysis	Supports ML for identifying transformer failure mechanisms	Not centered on IEC 60599 DGA class prediction
Khan [8]	2025	Review AI and ML methods for transformer fault diagnosis	Systematic review	Provides broad coverage of AI-based transformer diagnosis	Review-based; does not provide a controlled experimental PCA-AE benchmark
Li et al. [9]	2022	Improve transformer fault diagnosis using deep learning	Improved deep coupled dense CNN	Strong deep-learning feature extraction capability	Higher architectural complexity; limited interpretability
Cabral et al. [11]	2024	Compare Autoencoders and PCA for low-dimensional DGA fault diagnosis	Autoencoder, PCA	Directly addresses low-dimensional DGA feature extraction	Requires further controlled evaluation across all latent dimensions and identical ANN classification
Du et al. [12]	2022	Diagnose transformer faults using PCA and ANN	PCA+ANN	Shows PCA can reduce redundancy before ANN classification	Focuses on PCA; does not compare with nonlinear Autoencoder features
Zhang et al. [15]	2024	Diagnose transformer faults using generative and deep Autoencoder methods	Adversarial generative networks, deep stacked Autoencoder	Captures nonlinear DGA representations	More complex model; limited direct comparison with classical PCA
Van der Maaten and Hinton [18]	2008	Visualize high-dimensional data in low-dimensional space	t-SNE	Useful for latent-space visualization and separability analysis	Visualization method, not a classifier or diagnostic framework
Xu et al. [20]	2024	Improve transformer fault diagnosis using feature extraction and ensemble learning	Feature extraction, ensemble model	Confirms the value of transformed features in DGA diagnosis	Does not isolate PCA and AE under identical ANN conditions
Domor and Theo [3]	2025	Review deep Autoencoder neural networks	Deep Autoencoder review	Establishes AE relevance for nonlinear representation learning	General review; not specific to transformer DGA diagnosis
Liu et al. [27]	2016	Investigate stacked Autoencoders for feature extraction	Stacked Autoencoder	Supports AE-based feature extraction in complex datasets	Not directly applied to IEC 60599 transformer fault classification
Proposed study	2026	Compare linear PCA and nonlinear AE feature extraction for IEC 60599 DGA fault diagnosis	PCA+ANN and AE+ANN	Uses the same dataset, 15-dimensional feature space, ANN classifier, and evaluates k = 1–15, accuracy, F1-score, and CV stability	Limited to the available IEC 60599-labeled DGA dataset; future work can explore hybrid and class-conditional AE models

Table 3. Dataset class distribution and stratified train/test split.

Code	Fault Class	Total	Training	Testing
PD	Partial Discharge	58	46	12
D1	Low Energy Discharge	106	85	21
D2	High Energy Discharge	113	90	23
T1	Thermal Fault, T < 300 °C	106	85	21
T2	Thermal Fault, 300–700 °C	95	76	19
T3	Thermal Fault, T > 700 °C	117	94	23
Total		595	476	119
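The per-class counts in Table 3 are consistent with an 80/20 split made separately within each fault class (for example, round(0.2 × 58) = 12 PD test samples). The sketch below reproduces those counts with a seeded per-class shuffle. The 0.2 test fraction is inferred from the table, while the seed, the placeholder label list, and the function name `stratified_split` are hypothetical, since the exact splitting routine is not given in this section.

```python
import random
from collections import Counter

# Class totals from Table 3.
CLASS_TOTALS = {"PD": 58, "D1": 106, "D2": 113, "T1": 106, "T2": 95, "T3": 117}


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Shuffle each class separately and hold out round(test_fraction * n_c) samples."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        n_test = round(test_fraction * len(shuffled))
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return sorted(train), sorted(test)


# Placeholder label list with the Table 3 class sizes (features omitted).
labels = [code for code, total in CLASS_TOTALS.items() for _ in range(total)]
train_idx, test_idx = stratified_split(labels)
test_counts = Counter(labels[i] for i in test_idx)
train_counts = Counter(labels[i] for i in train_idx)
print(len(train_idx), len(test_idx), dict(test_counts))
```

Python's `round()` rounds halves to even, but none of these class sizes produce an exact .5 at a 0.2 fraction, so the counts do not depend on that convention.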

Table 4. PCA+ANN and AE+ANN test accuracy at selected latent dimensions.

k	PCA Cumulative Variance	PCA+ANN Test Accuracy	AE Reconstruction MSE	AE+ANN Test Accuracy
1	26.0%	49.6%	0.655	47.9%
2	39.5%	59.7%	0.173	57.1%
3	51.2%	63.0%	0.096	57.1%
4	60.7%	58.8%	0.019	66.4%
5	69.1%	63.9%	0.019	63.9%
8	88.2%	58.8%	0.010	65.5%
10	97.4%	63.0%	0.006	66.4%
11	99.9%	68.9%	0.007	66.4%
15	100.0%	65.5%	0.005	62.2%
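The PCA cumulative-variance column in Table 4 comes from the eigenvalues of the covariance matrix of the standardized features (equivalently, their correlation matrix). The dependency-free sketch below computes those eigenvalues with the cyclic Jacobi method, returns the cumulative explained-variance curve and the projected scores, and finds the smallest k that reaches a variance threshold. It is an illustration under these assumptions only: the abbreviations list SVD, which yields the same principal components, so the published pipeline may have used an SVD-based routine instead, and all function names here are hypothetical.

```python
import math


def standardize(rows):
    """Z-score every column using the population standard deviation."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0 for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def covariance(z):
    """Covariance of already-centred data (divides by n, matching the scaler above)."""
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]


def jacobi_eigh(a, tol=1e-12, max_sweeps=50):
    """Eigenvalues and eigenvectors (as columns) of a symmetric matrix, cyclic Jacobi."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the off-diagonal entry a[p][q].
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for m in range(n):  # A <- A P (columns p and q)
                    amp, amq = a[m][p], a[m][q]
                    a[m][p], a[m][q] = c * amp - s * amq, s * amp + c * amq
                for m in range(n):  # A <- P^T A (rows p and q)
                    apm, aqm = a[p][m], a[q][m]
                    a[p][m], a[q][m] = c * apm - s * aqm, s * apm + c * aqm
                for m in range(n):  # V <- V P accumulates the eigenvectors
                    vmp, vmq = v[m][p], v[m][q]
                    v[m][p], v[m][q] = c * vmp - s * vmq, s * vmp + c * vmq
    return [a[i][i] for i in range(n)], v


def pca(rows, k):
    """Return (scores on the top-k components, cumulative explained-variance ratios)."""
    z = standardize(rows)
    evals, evecs = jacobi_eigh(covariance(z))
    order = sorted(range(len(evals)), key=lambda i: evals[i], reverse=True)
    total = sum(evals)
    ratios = [evals[i] / total for i in order]
    cumulative = [sum(ratios[: m + 1]) for m in range(len(ratios))]
    components = [[evecs[r][i] for r in range(len(evecs))] for i in order[:k]]
    scores = [[sum(x * w for x, w in zip(row, comp)) for comp in components] for row in z]
    return scores, cumulative


def smallest_k(cumulative, threshold):
    """Smallest number of components whose cumulative variance reaches the threshold."""
    return next(m + 1 for m, share in enumerate(cumulative) if share >= threshold - 1e-12)


# Toy usage: three mutually uncorrelated, equal-variance columns share the variance equally.
toy = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
scores, cumulative = pca(toy, k=2)
print([round(c, 3) for c in cumulative], smallest_k(cumulative, 0.95))
```

Applied to the 15 standardized DGA features, `smallest_k(cumulative, 0.95)` would return the first k whose cumulative share reaches 95%; according to Table 4 that lies between k = 8 (88.2%) and k = 10 (97.4%), whereas the best PCA+ANN accuracy occurs at k = 11, so retained variance and classification accuracy do not peak at the same k.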

Table 5. Per-class performance metrics for PCA+ANN and AE+ANN on the test set (n = 119).

Code	Fault Class	PCA P	PCA R	PCA F1	AE P	AE R	AE F1	Δ F1 (AE − PCA)
PD	Partial Discharge	62.5%	41.7%	50.0%	57.1%	33.3%	42.1%	−7.9%
D1	Low Energy Discharge	62.5%	47.6%	54.1%	63.2%	57.1%	60.0%	+5.9%
D2	High Energy Discharge	58.6%	73.9%	65.4%	66.7%	78.3%	72.0%	+6.6%
T1	Thermal Fault, T < 300 °C	68.4%	61.9%	65.0%	68.8%	52.4%	59.5%	−5.5%
T2	Thermal Fault, 300–700 °C	73.9%	89.5%	81.0%	62.5%	78.9%	69.8%	−11.2%
T3	Thermal Fault, T > 700 °C	83.3%	87.0%	85.1%	73.1%	82.6%	77.6%	−7.6%
Weighted Avg.		68.6%	68.9%	68.1%	66.0%	66.4%	65.4%	−2.7%
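The weighted averages in Table 5 follow from per-class counts. The counts below are not published in the paper; they are back-calculated (an inference) from the Table 5 precision and recall values and the Table 3 test supports, and they reproduce the rounded percentages in the table. The sketch also makes one property explicit: support-weighted recall always equals overall accuracy, which is why the Weighted Avg. recall entries (68.9% and 66.4%) match the headline test accuracies. The function names are illustrative.

```python
# Per-class (TP, FP, FN) on the 119-sample test set, back-calculated from the
# Table 5 precision/recall values and the Table 3 supports (an inference, not published counts).
PCA_COUNTS = {"PD": (5, 3, 7), "D1": (10, 6, 11), "D2": (17, 12, 6),
              "T1": (13, 6, 8), "T2": (17, 6, 2), "T3": (20, 4, 3)}
AE_COUNTS = {"PD": (4, 3, 8), "D1": (12, 7, 9), "D2": (18, 9, 5),
             "T1": (11, 5, 10), "T2": (15, 9, 4), "T3": (19, 7, 4)}


def per_class(tp, fp, fn):
    """Precision, recall and F1 for one class (0.0 when a ratio is undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def summarize(counts):
    """Accuracy plus support-weighted precision, recall and F1."""
    n = sum(tp + fn for tp, _, fn in counts.values())  # support = TP + FN per class
    accuracy = sum(tp for tp, _, _ in counts.values()) / n
    w_p = w_r = w_f1 = 0.0
    for tp, fp, fn in counts.values():
        p, r, f1 = per_class(tp, fp, fn)
        weight = (tp + fn) / n
        w_p += weight * p
        w_r += weight * r
        w_f1 += weight * f1
    return accuracy, w_p, w_r, w_f1


for name, counts in (("PCA+ANN", PCA_COUNTS), ("AE+ANN", AE_COUNTS)):
    acc, w_p, w_r, w_f1 = summarize(counts)
    print(f"{name}: acc={acc:.1%} P={w_p:.1%} R={w_r:.1%} F1={w_f1:.1%}")
```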

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Nkwambe, M.S.; Thango, B.A. PCA and Autoencoder-Based ANN Models for Transformer Fault Diagnosis Using Dissolved Gas Analysis: Comparative Insights and Challenges. Energies 2026, 19, 2806. https://doi.org/10.3390/en19122806

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
