1. Introduction
The generator excitation system provides excitation current to the generator, playing a crucial role in maintaining its normal and stable operation [1]. As a three-phase alternating current excitation power device in large generator excitation systems, the excitation transformer is exposed to the combined effects of alternating voltage and pulse voltage during operation, resulting in a complex insulation aging process. Furthermore, due to prolonged high-load operation and harsh environmental conditions, excitation transformers are susceptible to various faults, including partial discharge [2], grounding faults [3], and direct current bias magnetization [4]. The safe and stable operation of the excitation transformer is essential for the reliability of the self-excitation system and is a prerequisite for ensuring safe electricity production and full-capacity generation in power plants. Therefore, real-time monitoring and precise fault diagnosis of excitation transformers are critical for ensuring the safe operation of the unit and facilitating planned maintenance.
Existing transformer fault diagnosis methods can be broadly categorized into offline testing and online monitoring techniques [5]. Offline testing methods include low-voltage pulse testing, short-circuit impedance testing, and frequency response analysis [6,7]. While these methods are well-established, they require the transformer to be shut down for implementation, which disrupts production schedules and incurs high labor costs. In contrast, online monitoring involves the real-time collection of data on parameters such as vibration [8], partial discharge [9,10], electrical characteristics [11], and dissolved gases in oil [12], all of which provide insights into the transformer’s operational status and potential faults. However, for dry-type transformers, the absence of cooling-oil data limits the development of online fault diagnosis techniques. Additionally, in power plant monitoring systems, the diversity of data types and their often nonlinear interdependencies present challenges. Effectively extracting valuable information from the large volume of operational data, while mitigating issues such as the incompleteness of single-signal diagnostics and interference from environmental noise [13], is crucial for reducing both misdiagnoses and missed diagnoses of faults.
Transformers typically exhibit changes in the electromagnetic field under fault conditions [14], which in turn influence the vibration and acoustic characteristics of the components. Vibration and voiceprint signals contain rich information regarding the equipment’s operational status, and by analyzing these signals, effective fault diagnosis of transformers can be achieved. In recent years, extensive research has been conducted both domestically and internationally on transformer condition monitoring techniques based on vibration and voiceprint signals. Compared to vibration signals, voiceprint signals offer several advantages, including non-contact measurement, making them particularly promising for application. Zhang et al. [15] proposed a method for estimating the internal aging degree of transformers by collecting acoustic signals from both the high and low voltage sides and combining these signals with a backpropagation neural network. Li et al. [16] addressed the challenge of diagnosing faults in converter transformers by proposing a fault diagnosis method that integrates multi-strategy enhanced Mel-Frequency Cepstral Coefficients (MFCC) feature extraction with Improved Hunter–Prey Optimization-optimized (IHPO-optimized) time convolution networks, thereby improving feature representation and model recognition capabilities. Yu et al. [17] explored the voiceprint features of internal vibration sound signals in power transformers under various operating conditions and developed an automatic, non-invasive condition monitoring and fault diagnosis system based on acoustic characteristics. This system is capable of rapidly distinguishing fault operating conditions across different noise environments and diagnosing six typical faults, as well as mixed faults, in power transformers.
However, under complex operating conditions, fault diagnosis methods based on single signals still face limitations in terms of identification capability and stability. Additionally, the types and degrees of interference factors in on-site operating environments are highly uncertain. To address these challenges, multi-sensor information fusion technology has gained widespread application in fault diagnosis in recent years, achieving significant results in areas such as dissolved gas detection in transformer oil. For instance, Gong et al. [18] proposed a novel multi-source information fusion strategy that combines a hierarchical vision transformer with a wavelet time-frequency architecture. By extracting and integrating features from vibration and current signals, this method enhances the stability of diagnostic performance. Cui et al. [19] introduced an intelligent fault diagnosis method based on multi-source data fusion and correlation analysis, utilizing an improved entropy-weighted method to fuse and predict data related to the load rate of dissolved gases in transformers, upper oil temperature, winding temperature, and the melting index of dissolved gas components. Hou et al. [20] developed a fault diagnosis method based on transformer information fusion technology, which combines a probabilistic extreme learning machine with an improved Dempster–Shafer (D-S) evidence theory. This method applies the enhanced D-S evidence theory algorithm to fuse multiple evidence bodies, enabling comprehensive transformer diagnostics. Tests on real fault data have demonstrated that this approach significantly improves the accuracy of diagnostic results.
Feature extraction, as a core element of fault diagnosis, aims to transform raw monitoring signals into feature information that possesses clear physical meaning or strong discriminatory power. Common signal processing methods currently include Fourier decomposition, wavelet analysis, and others. Zhu et al. [21] extracted time-frequency features of vibration signals using the Wavelet Packet Transform (WPT), forming a time-frequency feature matrix. The features of the matrix obtained by WPT were then further processed, with irrelevant features removed and fault-sensitive features retained. The resulting feature matrix was used as input to a classifier for bearing fault diagnosis. However, traditional signal feature extraction methods are heavily reliant on manual expertise, making it challenging to effectively capture nonlinear features. Additionally, they are sensitive to noise, limiting their adaptability and robustness under complex operating conditions. In contrast, deep learning methods can automatically learn multi-level, high-dimensional abstract features directly from raw data without manual design. This characteristic makes deep learning particularly suitable for complex nonlinear systems, contributing to its widespread application in feature extraction. Xue et al. [22] constructed multiple independent network structures in the time, frequency, and time-frequency domains. One network was used for supervised feature fusion, while the other networks autonomously performed feature extraction through multiple Inception layers and convolutional layers. To support the extraction of key features across multiple fusion layers in different transformation domains, Xue et al. introduced multiple fusion nodes between the layers of the various feature extraction networks and the feature summary network. Experimental results demonstrated that multi-transformation domain feature fusion significantly enhanced performance, surpassing the results of single-domain fusion methods.
To address the limitations of current fault diagnosis methods for excitation transformers, this paper proposes a fault classification approach that leverages the Hunter–Prey Optimization (HPO) algorithm to optimize a Deep Belief Network (DBN) and performs multi-source heterogeneous information fusion, integrating advanced machine learning and signal processing techniques. In this method, HPO is employed to optimize the hyperparameters of the DBN, allowing it to automatically extract deep features from voiceprint signals that are closely correlated with faults and to effectively capture the complex patterns of subtle faults. Furthermore, by incorporating multi-source monitoring information, such as temperature and electrical parameters, and introducing a working condition awareness indicator, a classification model based on Adaboost-SVM is developed to perform semi-supervised fault diagnosis for excitation transformers.
3. Fault Diagnosis Framework Based on Multi-Source Heterogeneous Information Fusion
Due to differences in sampling frequency and response time among the excitation transformer voiceprint data, temperature data, and electrical parameter data, as well as potential data processing delays during classifier computation, it is essential to partition the heterogeneous signals collected from multiple sensors into windows. This partitioning ensures time synchronization of different signal types within the same time window, thereby enhancing the timeliness of fault diagnosis. Specifically, for the voiceprint signal, a 1 s time window with a 1 s sliding step is employed, and downsampling is performed at an interval of 10 (retaining every tenth point) to generate several sub-samples. For the temperature and electrical signals, a 10 min time window with a 1 s sliding step is used, similarly partitioned into multiple sub-samples. Given the high dimensionality and large volume of the collected data, practical applications may encounter issues such as packet loss or data instability, leading to missing values and anomalies. These challenges can adversely affect model training and diagnostic accuracy. To address these issues, this study employs the Isolation Forest (IF) algorithm to detect outliers at the window level. Missing values are then imputed using the median substitution method, completing the data cleaning process. Finally, the data are normalized using the Z-score method.
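To make the preprocessing pipeline concrete, the following is a minimal sketch (Python with NumPy and scikit-learn) of the windowing, Isolation Forest outlier screening, median imputation, and Z-score normalization steps described above. The helper names and the contamination rate are illustrative assumptions, not the authors’ implementation.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def window_signal(x, fs, win_s=1.0, step_s=1.0, downsample=1):
    """Split a 1-D signal into fixed-length windows with a sliding step."""
    win, step = int(win_s * fs), int(step_s * fs)
    windows = [x[i:i + win:downsample] for i in range(0, len(x) - win + 1, step)]
    return np.asarray(windows)

def clean_and_normalize(windows, contamination=0.05):
    """Impute NaNs with the median, drop anomalous windows, then Z-score."""
    # Median imputation per feature (column).
    med = np.nanmedian(windows, axis=0)
    filled = np.where(np.isnan(windows), med, windows)
    # Window-level outlier detection with Isolation Forest (+1 = inlier).
    keep = IsolationForest(contamination=contamination,
                           random_state=0).fit_predict(filled) == 1
    kept = filled[keep]
    # Z-score normalization.
    return (kept - kept.mean(axis=0)) / (kept.std(axis=0) + 1e-12)

# Example: 1 s windows of the 48 kHz voiceprint signal, downsampled by 10.
voice = np.random.randn(48_000 * 10)  # placeholder for a 10 s recording
voice_windows = window_signal(voice, fs=48_000, win_s=1.0, step_s=1.0, downsample=10)
voice_clean = clean_and_normalize(voice_windows)
```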
Furthermore, considering the complex and variable operating conditions encountered in the actual operation of excitation transformers, traditional semi-supervised learning methods often fail to fully account for the differences in the distribution of unlabeled samples across these conditions. As a result, the model may overfit the observed conditions, leading to performance degradation when applied to new conditions. To address this challenge, this paper introduces a working condition awareness mechanism. By utilizing distribution distance information in the feature space, the mechanism selects unlabeled samples that significantly differ from the labeled samples and incorporates them into the training set. This approach enhances the model’s adaptability and generalization ability to diverse operating conditions.
The Mahalanobis distance between each unlabeled sample and the center of the labeled samples is calculated using Equation (1) [29] as an indicator of operating condition differences. This distance is then combined with confidence and disparity thresholds for dual filtering, which facilitates the construction of a pseudo-label dataset.

$$ D_i = \sqrt{(x_i - c)^{T} S^{-1} (x_i - c)} \qquad (1) $$

In the equation, Di represents the computed Mahalanobis distance of the unlabeled sample xi, where i = 1, 2, …, N; c is the center of all labeled samples; and S is the covariance matrix of the training samples.
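As an illustration, the working condition awareness filter can be sketched as follows. The threshold names (conf_thr, disp_thr) and the use of the classifier’s maximum posterior probability as the confidence score are assumptions about the dual-filtering step; the Mahalanobis distance is computed per Equation (1).

```python
import numpy as np

def condition_aware_select(X_unlabeled, X_labeled, proba, conf_thr=0.9, disp_thr=2.0):
    """Select unlabeled samples that are confidently classified yet lie far
    (in Mahalanobis distance, Eq. (1)) from the labeled-sample center."""
    c = X_labeled.mean(axis=0)                       # center of labeled samples
    S_inv = np.linalg.pinv(np.cov(X_labeled, rowvar=False))
    diff = X_unlabeled - c
    # Per-sample Mahalanobis distance: sqrt(diff_i^T S^-1 diff_i).
    D = np.sqrt(np.einsum('ij,jk,ik->i', diff, S_inv, diff))
    conf = proba.max(axis=1)                         # classifier confidence
    mask = (conf >= conf_thr) & (D >= disp_thr)      # dual filtering
    pseudo_labels = proba.argmax(axis=1)
    return X_unlabeled[mask], pseudo_labels[mask]
```

Samples passing both thresholds are appended to the training set with their pseudo-labels, which is the mechanism by which distant operating conditions enter the model.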
Finally, to meet the real-time and fast-response requirements of excitation transformer fault diagnosis, a Support Vector Machine (SVM) is selected as the base classifier for Adaboost [30], and a fault diagnosis model is constructed. Given a sample dataset containing n sets of training data K = {(x1, y1), …, (xn, yn)}, where xi denotes the i-th feature vector and yi its corresponding label, taking values of −1 or 1, the weight of the i-th sample is denoted by ai, with the initial sample weight set as ai = 1/n. The number of iterations is denoted by m.
The initial feature parameters are input into the SVM to obtain the t-th initial weak classifier as follows:

$$ h_t(x) = \mathrm{SVM}_t(x), \quad h_t(x) \in \{-1, +1\} \qquad (2) $$
Calculate the classification error Eerr of the weak classifier ht as follows:

$$ E_{err} = \sum_{i=1}^{n} a_i \, I\left( h_t(x_i) \neq y_i \right) \qquad (3) $$

where I(·) is the indicator function.
Adjust the weight coefficient at of the t-th weak classifier based on its classification error as follows:

$$ a_t = \frac{1}{2} \ln \frac{1 - E_{err}}{E_{err}} \qquad (4) $$
Reallocate the weights of each sample and adjust the sample distribution based on the magnitude of the weight coefficients as follows:

$$ a_i^{(t+1)} = \frac{a_i^{(t)}}{C_t} \exp\left( -a_t \, y_i \, h_t(x_i) \right) \qquad (5) $$

In the formula, Ct is the generalization coefficient, which normalizes the weights so that they sum to one. If the sample distribution is the same as in the previous iteration, exit the loop; otherwise, continue the loop.
The resulting combined strong classifier is

$$ H(x) = \mathrm{sign}\left( \sum_{t=1}^{m} a_t \, h_t(x) \right) \qquad (6) $$

In the formula, sign represents the sign function, and m denotes the number of weak classifiers.
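For reference, an Adaboost-SVM classifier following Equations (2)–(6) can be assembled with scikit-learn, whose SAMME algorithm implements the same error and weight-update scheme. The kernel, penalty parameter, number of estimators, and placeholder data below are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Placeholder data standing in for the fused feature vectors of Section 3.
X, y = make_classification(n_samples=500, n_features=42, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=0)

ada_svm = AdaBoostClassifier(
    estimator=SVC(kernel='rbf', C=1.0),  # weak classifier h_t, Eq. (2)
    n_estimators=50,                     # number of iterations m
    algorithm='SAMME')                   # weight updates as in Eqs. (3)-(5)
# Note: the parameter is named base_estimator in scikit-learn < 1.2.
ada_svm.fit(X_tr, y_tr)
print('test accuracy:', ada_svm.score(X_te, y_te))  # strong classifier, Eq. (6)
```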
The overall framework is illustrated in Figure 5.
4. Model Evaluation
4.1. Dataset Acquisition
The experimental subject of this study is a three-phase modular, resin-cast, F-class insulated, dry-type excitation transformer with a rated capacity of 3200 kVA and a primary voltage of 24 kV. This transformer is supplied by Sunten Electric Equipment Co., Ltd., and is manufactured using the YD11 wiring method. Using phase A as an example, the signal data and types collected on-site are shown in Figure 6. This study conducted a field deployment at a large hydropower station, with the installation schematics for the voiceprint sensor and the fiber optic sensor shown in Figure 7. In the experiment, an electret capacitor microphone was used to collect voiceprint signals, while fiber optic sensors were employed to acquire temperature signals. Additionally, power loss and core grounding current data, derived from the on-site system, were also utilized to construct the dataset required for fault diagnosis. The sampling frequency of the voiceprint signal was 48 kHz, with a collection duration of 10 s for each segment; the temperature signal had a sampling period of 1 s; and the electrical signal had a sampling frequency of 1 Hz. Transformer operation data were collected through online monitoring, energized detection, and simulation tests for four typical fault conditions: partial discharge, DC bias magnetization, component loosening, and multi-point grounding of the core. Subsequently, 10 samples of each signal type under normal conditions and 10 samples of each signal type under each abnormal condition were randomly selected. Finally, the collected samples were windowed according to the method described in Section 3, thus constructing the excitation transformer fault diagnosis dataset.
4.2. Model Evaluation Metrics
The evaluation is performed using accuracy (ACC), recall (REC), specificity (SPE), and F1 score. The formulas for each evaluation metric are provided as follows:

$$ ACC = \frac{TP + TN}{TP + TN + FP + FN} \qquad (7) $$

$$ REC = \frac{TP}{TP + FN} \qquad (8) $$

$$ SPE = \frac{TN}{TN + FP} \qquad (9) $$

$$ F1 = \frac{2 \times PRE \times REC}{PRE + REC}, \quad PRE = \frac{TP}{TP + FP} \qquad (10) $$

In the formulas, TP represents the number of samples where both the model prediction and the true value are 1; TN denotes the number of samples where both the model prediction and the true value are 0; FP refers to the number of samples where the model predicts 1 but the true value is 0; FN corresponds to the number of samples where the model predicts 0 but the true value is 1; and precision (PRE) represents the proportion of actual positive samples among the samples predicted as positive.
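These metrics follow directly from the confusion counts; a minimal sketch for the binary case:

```python
def metrics(tp, tn, fp, fn):
    """Compute ACC, REC, SPE, and F1 from confusion-matrix counts (Eqs. (7)-(10))."""
    acc = (tp + tn) / (tp + tn + fp + fn)  # accuracy
    rec = tp / (tp + fn)                   # recall
    spe = tn / (tn + fp)                   # specificity
    pre = tp / (tp + fp)                   # precision
    f1 = 2 * pre * rec / (pre + rec)       # F1 score
    return acc, rec, spe, f1

print(metrics(tp=50, tn=40, fp=5, fn=5))
```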
4.3. HPO-DBN Feature Extraction Model Optimization Parameters
The DBN model is trained using an experimental setup based on the Windows 10 operating system, employing the CPU version of the TensorFlow framework. The programming environment is Python 3.7, with PyCharm 2023 utilized as the development tool. Model training is performed on an NVIDIA GeForce RTX 4070 GPU platform. The experiment is configured with 200 iterations, an RBM learning rate of 0.05, a fully connected layer learning rate of 0.1, a learning rate momentum of 0.02, and 800 neurons in the RBM layer. The dataset is partitioned into training and testing sets with a 6:4 ratio. Following the parameter input, feature vectors are extracted, and the resulting feature vectors are subsequently classified using an SVM classifier. The model evaluation parameters are detailed in Table 4.
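As a rough stand-in for the DBN-SVM baseline, a Bernoulli RBM feature layer from scikit-learn can feed an SVM. The layer width (800), learning rate (0.05), and iteration count (200) follow the text; the RBM variant, the single-layer stack, and the SVM kernel are assumptions.

```python
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

dbn_svm = Pipeline([
    # Unsupervised RBM feature extraction; inputs should be scaled to [0, 1].
    ('rbm', BernoulliRBM(n_components=800, learning_rate=0.05,
                         n_iter=200, random_state=0)),
    # SVM classification of the extracted feature vectors.
    ('svm', SVC(kernel='rbf')),
])
# Further RBM layers can be stacked before the SVM to deepen the hierarchy:
# dbn_svm.fit(X_train, y_train); dbn_svm.score(X_test, y_test)
```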
As shown in Table 4, the ACC of the DBN-SVM model is 88.4%, REC is 87.1%, SPE is 86.7%, and the F1 score is 88.5%, indicating that the overall prediction performance is quite satisfactory. However, the number of RBM layers, the number of neurons in each RBM layer, and the learning rate are manually set, which introduces a certain degree of subjectivity. To improve the objectivity of parameter selection and further enhance model performance, this paper introduces the HPO algorithm to optimize the DBN model parameters. The HPO parameter settings are as follows: the initial population size is 20, and the termination condition is set to 200 iterations. The range for the number of hidden layers is [1, 5], the number of neurons per layer is [50, 1500], and the learning rate is [0.001, 0.1]. After initializing the model with the non-optimized DBN parameters and inputting the data, the optimizer determines the optimal number of layers, neurons, and learning rate after 200 iterations, as shown in Table 5. The SVM classifier is also used to classify the feature vectors, and the corresponding model evaluation parameters are presented in Table 6.
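The hyperparameter search loop can be sketched as follows, using random sampling as a stand-in for the Hunter–Prey Optimization update rules (which are outside the scope of this sketch). The ranges match those stated above; the fitness function, assumed to return the validation accuracy of the resulting DBN-SVM, is a placeholder.

```python
import random

# Search ranges stated in the text.
SEARCH_SPACE = {
    'n_layers': (1, 5),            # number of RBM hidden layers
    'n_neurons': (50, 1500),       # neurons per layer
    'learning_rate': (0.001, 0.1),
}

def sample_candidate():
    """Draw one hyperparameter configuration from the search space."""
    return {
        'n_layers': random.randint(*SEARCH_SPACE['n_layers']),
        'n_neurons': random.randint(*SEARCH_SPACE['n_neurons']),
        'learning_rate': random.uniform(*SEARCH_SPACE['learning_rate']),
    }

def search(fitness, n_iter=200):
    """fitness(params) -> validation accuracy; higher is better."""
    best, best_fit = None, float('-inf')
    for _ in range(n_iter):
        cand = sample_candidate()
        f = fitness(cand)
        if f > best_fit:
            best, best_fit = cand, f
    return best, best_fit
```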
From the comparison between Table 4 and Table 6, it can be observed that the HPO-optimized DBN model outperforms the non-optimized DBN model in terms of accuracy. The ACC of the HPO-optimized model reaches 94.5%, with REC, SPE, and F1 score at 92.2%, 92.0%, and 93.4%, respectively, each of which surpasses the performance of the non-optimized DBN model. The loss curves for both the DBN and HPO-DBN models are shown in Figure 8. The DBN model stabilizes around the 20th iteration, with the loss value eventually settling around 1.2. In contrast, the HPO-optimized DBN model stabilizes around the 10th iteration, with the loss value reaching approximately 0.8, indicating a faster decline and a lower final loss than the non-optimized DBN model.
4.4. Analysis of Voiceprint Feature Extraction Results
To evaluate the performance of the HPO-DBN feature extraction method, it is compared with the models proposed in Refs. [31,32,33,34]. For all methods, the feature vectors obtained after feature extraction are input into an SVM classifier for fault diagnosis. The bar chart illustrating the evaluation metrics for eight different methods in excitation transformer fault diagnosis is presented in Figure 9.
As shown in Figure 9, the HPO-DBN method outperforms the other models proposed in Refs. [31,32,33,34] across all four evaluation metrics. Specifically, this method not only achieves higher accuracy (ACC) but also maintains a good balance with recall (REC), effectively reducing the occurrence of false positives and false negatives. Compared to the traditional SVM, HPO-DBN-SVM shows significant improvements in both accuracy and recall, indicating its ability to identify faulty samples more effectively and reduce missed detections. While IDBO-SVM and IGWO-SVM have made strides in parameter optimization, their performance still falls short of HPO-DBN-SVM, particularly in balancing recall and specificity; the HPO-DBN-SVM method excels in this balance. In terms of SPE, HPO-DBN-SVM accurately identifies negative class samples and reduces misdiagnosis. By contrast, deep learning methods such as CNN-SVM and QPSO-LSTM-SVM show weaker performance in SPE, likely because their complex feature extraction processes can misclassify negative class samples, thereby reducing specificity.
This suggests that the HPO-DBN method not only has strong capabilities in identifying positive class samples but also effectively reduces false positives, ensuring the stability and reliability of the fault diagnosis system. Moreover, the improvement in the F1 score further confirms that the HPO-DBN method strikes a better balance between accuracy and recall, ensuring model stability across different samples. Compared to other methods, HPO-DBN-SVM enhances classification accuracy in complex fault conditions by optimizing feature extraction and model parameters, while effectively capturing voiceprint feature information. This provides robust support for the accurate identification of excitation transformer fault states.
4.5. Analysis of Temperature–Electrical Signal Feature Extraction Results
A total of 30 groups of signals were collected from the high- and low-voltage windings (three phases) of the excitation transformer, including winding temperature, power loss, and core grounding current. For each type of signal, 20-dimensional time-domain statistical features were extracted, resulting in a 600-dimensional time-domain feature set. The importance of these statistical features was evaluated using the Random Forest algorithm, with the number of decision trees set to 300. By comparing the classification performance under different feature dimensions, a curve illustrating the relationship between classification error and feature dimension was obtained, as shown in Figure 10.
According to the analysis results, the classification error is lowest, at 10.86%, when the top 148 feature dimensions are selected. To further reduce feature dimensionality and eliminate the impact of redundant features, this study employs an autoencoder network to fuse and reduce the dimensionality of the selected statistical features. By optimizing the number of neurons in the hidden layer of the autoencoder, the final value is set to 42, with the mean squared error (MSE) used as the loss function. The final model parameter settings are presented in Table 7.
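A sketch of this two-stage reduction, Random Forest importance ranking (300 trees, top 148 features) followed by an autoencoder with a 42-neuron bottleneck trained with MSE loss, is given below. The single-hidden-layer architecture and activation choices are assumptions beyond what the text specifies.

```python
import numpy as np
import tensorflow as tf
from sklearn.ensemble import RandomForestClassifier

def select_top_features(X, y, k=148, n_trees=300):
    """Rank features by Random Forest importance and keep the top k."""
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=0).fit(X, y)
    top = np.argsort(rf.feature_importances_)[::-1][:k]
    return X[:, top], top

def build_autoencoder(input_dim=148, code_dim=42):
    """Autoencoder with a 42-unit bottleneck; the encoder output is the fused feature."""
    inp = tf.keras.Input(shape=(input_dim,))
    code = tf.keras.layers.Dense(code_dim, activation='relu')(inp)
    out = tf.keras.layers.Dense(input_dim, activation='linear')(code)
    auto = tf.keras.Model(inp, out)
    auto.compile(optimizer='adam', loss='mse')  # MSE loss, as in the text
    encoder = tf.keras.Model(inp, code)         # yields the 42-d fused features
    return auto, encoder
```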
The feature vectors obtained after extraction are input into an SVM classifier for fault diagnosis. The confusion matrices for three conditions, namely the normal state (I), the DC bias magnetization state (II), and the multi-point grounding state of the core (III), are compared across three scenarios: using the original time-domain statistical features, the features filtered by Random Forest, and the features fused and reduced by the autoencoder. The results are presented in Figure 11.
The results show that the classification accuracies on the test set for 42-dimensional, 148-dimensional, and 600-dimensional features are 93%, 87%, and 72%, respectively. When using the original 600-dimensional time-domain statistical features, the excessive number of irrelevant features overwhelms key information, negatively impacting classification performance. After feature selection using Random Forest, most irrelevant features are eliminated, resulting in an improvement of approximately 15% in model accuracy compared to the original feature set. However, some highly correlated redundant features remain. To address this, further nonlinear dimensionality reduction is performed using an autoencoder, which effectively removes redundant information, raising the final model accuracy to 93%. These results indicate that appropriate dimensionality reduction and feature selection can significantly enhance classification performance. Moreover, the combination of Random Forest and the autoencoder effectively reduces redundancy, improves computational efficiency, and boosts the model’s generalization ability.
4.6. Analysis of Fault Diagnosis Results
Based on the preprocessed feature vector data, this study conducts training and validation analysis of the proposed model. Several models, namely SVM, Adaboost-BP, XGBoost-SVM, and CNN, are selected for comparison with Adaboost-SVM, using the same input dataset. By comparing the evaluation metrics of each model, the effectiveness of the proposed fault diagnosis framework is validated. To minimize the influence of randomness, all models are tested 25 times, with the final overall classification performance reported as the average of these trials. The comparison results are summarized in Table 8, with evaluation metrics including average accuracy (PACC), average F1 score (PF1), average training time (Ttrain), and average testing time (Ttest).
From the comprehensive comparison of the various metrics, it is evident that the Adaptive Boosting with Support Vector Machine (Adaboost-SVM) model performs the best in terms of average accuracy and average F1 score, achieving 96.89% and 96.48%, respectively. While ensuring high diagnostic accuracy, its training time is 1692.49 s and its testing time is only 0.093 s, effectively balancing model accuracy and real-time performance. The Adaptive Boosting with Backpropagation (Adaboost-BP) model ranks second in classification accuracy, with a PACC of 95.46% and a PF1 of 92.74%, but its training time (3030.62 s) and testing time (0.185 s) make it less efficient and responsive than Adaboost-SVM. The Extreme Gradient Boosting with Support Vector Machine (XGBoost-SVM) model follows, with PACC and PF1 values of 92.25% and 90.22%, respectively; it also exhibits relatively short training and testing times, demonstrating good efficiency and fast response capability. In contrast, although the Convolutional Neural Network (CNN) model maintains moderate performance, with a PACC of 88.21% and a PF1 of 85.30%, its training time is 5007.96 s and its testing time is 0.376 s, resulting in high training overhead and inference latency that make it unsuitable for real-time online diagnostic requirements. The traditional SVM model performs significantly worse in terms of PACC (74.90%) and PF1 (73.97%); while it has the shortest training time and the fastest inference speed, its classification accuracy is insufficient to meet the demands of high-reliability diagnostics. It should be noted, however, that baseline models such as SVM, CNN, and XGBoost-SVM were not optimized to the same extent as the proposed method. Future work will involve comprehensive hyperparameter tuning and the inclusion of state-of-the-art deep architectures to ensure fairer benchmarking and stronger comparative validation. Overall, Adaboost-SVM achieves a well-balanced trade-off between diagnostic accuracy, training cost, and inference efficiency, making it the most suitable model for online fault diagnosis of excitation transformers under complex operating conditions.
The classification performance of the various models for normal operating conditions and faults, such as partial discharge, DC bias magnetization, component loosening, and multi-point grounding of the core, is shown in Figure 12.
As shown in Figure 12, the Adaboost-SVM model exhibits strong generalization performance, achieving the best values of ACC, REC, SPE, and F1 score for most conditions, including partial discharge, DC bias magnetization, and multi-point grounding of the core. For the normal state, its F1 score is slightly lower than that of the Adaboost-BP model, and in the recognition of component loosening faults, its ACC and REC are slightly lower than those of Adaboost-BP. Overall, however, Adaboost-SVM demonstrates the best performance across all evaluation metrics.
In summary, the Adaboost-SVM model excels in fault diagnosis accuracy, generalization ability, and real-time performance. It is well-suited to meet the practical demands for rapid response and precise classification in on-site operations.