Article

Power Transformer Winding Fault Diagnosis Method Based on Time–Frequency Diffusion Model and ConvNeXt-1D

School of Electrical Engineering, Shanghai University of Electric Power, Shanghai 200090, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(5), 2528; https://doi.org/10.3390/app16052528
Submission received: 26 January 2026 / Revised: 2 March 2026 / Accepted: 3 March 2026 / Published: 6 March 2026
(This article belongs to the Section Electrical, Electronics and Communications Engineering)

Abstract

To address the challenges of insufficient transformer winding fault samples and the effective fusion of heterogeneous multi-source data, this study proposes an intelligent fault diagnosis method based on a time–frequency diffusion model and ConvNeXt-1D. First, data augmentation is performed on the original signals using the time–frequency diffusion model. Through a forward noise injection and reverse denoising process, the limited time-series samples are expanded. By alternately applying time-domain noise addition and frequency-domain blurring, the signals are jointly enhanced in the time–frequency domain, improving sample diversity and feature representation. Next, a ConvNeXt-1D network is constructed for multi-scale feature extraction and fault classification, incorporating an attention mechanism to efficiently fuse multi-source features and achieve precise fault identification. Finally, the proposed method is validated using dynamic model experiments. The results indicate that under typical fault conditions—such as inter-turn short circuits, winding deformation, and arc discharge—the proposed method achieves a diagnostic accuracy of 99.23 ± 0.29%. Compared with other classical models, the proposed approach demonstrates stronger classification capability and higher stability under small-sample data conditions.

1. Introduction

In the context of power supply structure modernization and intelligent operation of power systems, transformers, as the core hubs of electrical networks, must operate reliably over long periods to ensure system safety and stability [1,2]. However, the early diagnosis of transformer winding faults (such as inter-turn short circuits and winding deformation) still faces numerous challenges, including severe signal noise interference, difficulty in extracting effective fault features, and insufficient model generalization due to limited sample data [3]. Traditional machine learning methods rely heavily on expert knowledge and prior experience, constructing handcrafted features from signal processing and statistical analysis as model inputs [4,5,6]. Under complex operating conditions, these features often lack robustness and expressive power, which limits diagnostic accuracy and increases dependence on data quality and prior knowledge. In recent years, deep learning has been widely applied in transformer fault diagnosis owing to its capability for automatic feature extraction and nonlinear modeling [7,8]. Generative models (such as diffusion models) have demonstrated strong potential in denoising and data augmentation, offering new solutions for small-sample fault diagnosis [9].
To improve diagnostic and assessment accuracy, existing studies have conducted comprehensive transformer condition evaluations by constructing multi-feature indicator systems, thereby enhancing information diversity and anti-interference capability. Reference [10] realized winding short-circuit fault diagnosis from the perspective of multi-physical feature fusion; Reference [11] introduced fuzzy evaluation and information fusion methods to assess transformer operating states; Reference [12] conducted a comprehensive analysis of multiple transformer characteristics, including DGA, winding resistance, moisture content in insulating oil, and acidity, achieving detection accuracy significantly higher than that obtained using DGA alone. However, such approaches typically rely on manually designed feature indicators, making it difficult to fully exploit the potential deep correlations and complementary information among multi-source data. ConvNeXt, a high-performance convolutional neural network proposed in recent years, has demonstrated strong feature modeling capability in classification tasks and has been applied in fault diagnosis to improve recognition accuracy [13,14]. Nevertheless, as it was originally designed for two-dimensional visual data, processing one-dimensional time-series signals usually requires signal-to-image transformation, which may result in the loss of temporal information. Moreover, under practical operating conditions, insufficient data can lead to inaccurate feature maps. To address these issues, this study proposes a ConvNeXt-1D fault diagnosis model tailored for one-dimensional time-series signals, enabling end-to-end diagnostic optimization. 
Moreover, under actual fault operating conditions, multi-source signals exhibit significant differences in generation mechanisms, time scales, and frequency-domain characteristics, making it difficult for a single model to simultaneously capture both the discriminative features and complementary information of these signals. To address this, the present study introduces a self-attention mechanism to achieve adaptive feature fusion across signal sources, thereby enhancing the model’s representation capability and diagnostic robustness for complex faults.
From a data perspective, sample generation is an effective approach to alleviating the few-sample problem [15]. Traditional generative methods, such as Generative Adversarial Networks (GANs) [16] and Variational Autoencoders (VAEs) [17], generate new samples by learning the distribution of training data to achieve data augmentation, but their modeling capacity is limited. Moreover, GANs, constrained by the adversarial interplay between the generator and discriminator, often suffer from training instability and mode collapse [18,19]. VAEs typically assume that the latent variables follow a Gaussian distribution, which often leads to overly smooth generation results, making it difficult to accurately capture the fine-grained features in fault signals. As an emerging class of generative models, diffusion models can stably produce high-fidelity samples through a forward noise-adding and reverse denoising process, demonstrating significant advantages in data augmentation tasks. However, the application of diffusion models in transformer fault diagnosis remains relatively limited. Since transformer monitoring signals are primarily one-dimensional time series, existing diffusion-based methods often focus on spatial-domain feature modeling, which can easily overlook the rich frequency-domain information embedded in the temporal signals. To overcome this limitation, this study constructs a time–frequency diffusion model that alternately applies noise addition in the time domain and blurring in the frequency domain, enabling joint time–frequency feature extraction and the generation of highly diverse sample data.
In summary, this paper presents a transformer winding fault diagnosis framework based on a time–frequency diffusion model and ConvNeXt-1D. The main contributions are as follows: (1) a time–frequency diffusion model is constructed to augment original fault signals, producing high-quality and diverse samples that effectively alleviate the small-sample problem; (2) a ConvNeXt-1D network structure is designed for one-dimensional time-series signals, enabling automatic multi-scale feature extraction and precise fault classification; (3) a joint training strategy combining the time–frequency diffusion model and ConvNeXt-1D is proposed, leveraging the data augmentation capabilities of the generative model and the feature representation strengths of the deep network, significantly improving diagnostic accuracy and stability. This framework not only continues the principle of multi-source information utilization in data fusion methods but also provides a new technical pathway for transformer health diagnosis in complex industrial environments through the synergistic optimization of generative models and deep networks.

2. Basic Principles

2.1. Diffusion Model

The Denoising Diffusion Probabilistic Model (DDPM) is a fundamental architecture within the diffusion model framework [20]. This model consists of a forward noising process and a reverse denoising process. In the forward process, the model gradually adds Gaussian noise to the original data according to a predefined noise schedule, causing the data to progressively lose structural information over continuous time steps, ultimately approaching an isotropic Gaussian distribution. In the reverse process, the model learns to remove noise, thereby probabilistically reconstructing the clean signal at each time step. In this way, the model can generate high-quality samples from pure noise, effectively alleviating the issue of limited data, with stable training and high scalability.
(1) Forward Process
The forward diffusion process $q(x_t \mid x_{t-1})$ is based on a Markov chain, in which Gaussian noise $\varepsilon_t$ is gradually added to a clean sample $x_0 \sim q(x_0)$ to generate a sample sequence $x_0, x_1, \ldots, x_T$. At each time step $t$, the sample $x_t$ is obtained by adding noise to the sample $x_{t-1}$ from the previous time step. The forward diffusion process is defined as:
$$ q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right) $$
where $\mathcal{N}$ denotes a Gaussian distribution, $\beta_t \in (0,1)$ represents the noise scheduling parameter at time step $t$, and $\mathbf{I}$ is the identity matrix.
By applying the reparameterization trick with $\alpha_t = 1 - \beta_t$, the sample $x_t$ at any arbitrary time step can be directly sampled from $x_0$ as follows:
$$ x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \mathbf{I}) $$
The cumulative coefficient $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$ represents the proportion of the remaining signal. As the time step increases, the weight of the noise gradually increases, and the sample approaches an isotropic Gaussian distribution.
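As a minimal numerical sketch of this closed-form forward sampling (NumPy; the linear $\beta$ schedule and toy signal below are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def forward_sample(x0, t, betas, rng):
    """Sample x_t directly from x_0 via the closed-form DDPM forward process."""
    alphas = 1.0 - betas                 # alpha_t = 1 - beta_t
    alpha_bar = np.cumprod(alphas)       # cumulative signal proportion
    eps = rng.standard_normal(x0.shape)  # ground-truth Gaussian noise
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)                  # illustrative schedule
x0 = np.sin(2 * np.pi * 50 * np.arange(2048) / 1600)   # toy 50 Hz "signal"
xt, eps = forward_sample(x0, t=999, betas=betas, rng=rng)
# at the last step alpha_bar is tiny, so x_t is almost pure noise
```

Because $\bar{\alpha}_t$ is precomputed as a cumulative product, $x_t$ at any step is obtained in a single operation rather than by iterating through all intermediate steps.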
(2) Reverse Process
In contrast to the forward process, the reverse denoising process aims to recover data step by step from random noise. However, the true posterior distribution $q(x_{t-1} \mid x_t)$ is intractable, so a learnable Gaussian distribution $p_\theta$ is used to approximate it. The reverse denoising process can also be represented as a Markov chain:
$$ p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \beta_t \mathbf{I}\right), \qquad \mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left( x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\varepsilon_\theta(x_t, t) \right) $$
where $\theta$ denotes the model parameters, and $\mu_\theta(x_t, t)$ represents the mean predicted by the neural network. In practice, the neural network is usually trained to predict the added noise $\varepsilon_\theta$ rather than to predict the mean directly.
The conditional variable $t$ is introduced in the form of positional embeddings, enabling parameter sharing across different time steps. To ensure that the predicted distribution approximates the true posterior distribution, the training objective is typically defined as minimizing the mean squared error between the true noise and the predicted noise. The loss function is expressed as:
$$ \mathcal{L} = \mathbb{E}_{x_0, \varepsilon}\left[ \left\| \varepsilon - \varepsilon_\theta(x_t, t) \right\|^2 \right] $$
where $\mathbb{E}_{x_0, \varepsilon}$ denotes the expectation over clean samples and noise, $\varepsilon$ represents the ground-truth noise, and $\varepsilon_\theta$ denotes the noise predicted at time step $t$.
The sampling process also follows the reverse diffusion procedure. After training, a random noise vector $x_T \sim \mathcal{N}(0, \mathbf{I})$ is drawn from a standard Gaussian distribution, and the trained neural network $\varepsilon_\theta(x_t, t)$ is used to progressively separate the noise components. Each reverse iteration is given by:
$$ x_{t-1} = \mu_\theta(x_t, t) + \sigma_t z, \qquad z \sim \mathcal{N}(0, \mathbf{I}) $$
where $z$ represents the Gaussian noise injected at time step $t$ (with $z = 0$ at the final step) and $\sigma_t$ is the sampling standard deviation. After iterating from $t = T$ down to $t = 1$, a high-fidelity data sample $x_0$ is generated.
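The reverse iteration above can be sketched as an ancestral sampling loop (NumPy; the zero-output predictor below is only a placeholder for the trained network $\varepsilon_\theta$, and $\sigma_t^2 = \beta_t$ is one common choice, both assumptions for illustration):

```python
import numpy as np

def reverse_sample(eps_model, betas, shape, rng):
    """Ancestral DDPM sampling: start from pure noise and iterate t = T..1."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(shape)            # x_T ~ N(0, I)
    for t in range(len(betas) - 1, -1, -1):
        eps_hat = eps_model(x, t)             # predicted noise at step t
        mu = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alphas[t])
        z = rng.standard_normal(shape) if t > 0 else 0.0  # no noise at last step
        x = mu + np.sqrt(betas[t]) * z        # sigma_t^2 = beta_t (one common choice)
    return x

# placeholder predictor standing in for the trained U-Net eps_theta
rng = np.random.default_rng(1)
betas = np.linspace(1e-4, 0.02, 50)
x0_hat = reverse_sample(lambda x, t: np.zeros_like(x), betas, (2048,), rng)
```

With a trained predictor, the same loop progressively removes the noise the forward process injected, recovering a sample that follows the learned data distribution.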

2.2. ConvNeXt Model

ConvNeXt [21] is a modernized convolutional neural network architecture that combines the design strengths of residual networks (ResNet) and Swin Transformer, exhibiting powerful feature extraction capabilities. It has achieved outstanding performance in tasks such as image classification, object detection, and semantic segmentation. In the field of fault diagnosis, ConvNeXt demonstrates higher accuracy and faster inference speed than conventional convolutional baselines. The network mainly consists of convolutional layers, normalization layers, downsampling modules, ConvNeXt blocks, pooling layers, and fully connected layers. The major improvements of ConvNeXt are summarized as follows.
First, the number of block repetitions in each stage of the original ResNet is adjusted from (3, 4, 6, 3) to (3, 3, 9, 3), leading to a more balanced allocation of computational resources. The conventional progressive downsampling stem is replaced with a stem layer using a 4 × 4 convolution kernel with stride 4, reducing information loss. The downsampling modules in subsequent stages employ independent convolution layers with kernel size 2 × 2 and stride 2. In addition, layer normalization is used instead of batch normalization, making the model more suitable for small-batch training scenarios.
Within each block module, a large convolution kernel of size 7 × 7 is adopted to expand the receptive field. Depthwise separable convolution is introduced, consisting of depthwise convolution and pointwise convolution, which can be regarded as a special group convolution where the number of groups equals the number of channels, significantly reducing computational cost. In addition, the depthwise convolution module is moved to the front, and an inverted bottleneck design is employed, following a channel expansion–compression strategy. The smoother Gaussian Error Linear Unit (GELU) activation function is used to replace ReLU, enhancing nonlinear feature representation capability.
$$ \mathrm{GELU}(x) = \frac{1}{2}\,x \left[ 1 + \tanh\!\left( \sqrt{\frac{2}{\pi}} \left( x + 0.044715\,x^{3} \right) \right) \right] $$
Furthermore, the use of activation functions and normalization layers is reduced to simplify the network structure and mitigate overfitting. DropPath, a stochastic depth regularization technique, is introduced to improve the generalization ability of the model.
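As a quick numerical sanity check (a sketch added here, not part of the original text), the tanh approximation of GELU given above closely tracks the exact erf-based definition $\mathrm{GELU}(x) = x\,\Phi(x)$:

```python
import math

def gelu_tanh(x):
    """GELU via the tanh approximation quoted in the text."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def gelu_exact(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

# the two forms agree to within ~1e-3 over typical activation ranges
for v in (-3.0, -1.0, 0.0, 0.5, 2.0):
    assert abs(gelu_tanh(v) - gelu_exact(v)) < 1e-3
```

Unlike ReLU, GELU is smooth around zero, which is the property the text credits with enhancing nonlinear feature representation.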

3. Few-Shot Fault Diagnosis Model Based on Time–Frequency Diffusion and ConvNeXt-1D

3.1. Time–Frequency Diffusion Model

Frequency-domain characteristics of transformer fault signals, such as harmonics or specific frequency components, play a crucial role in fault diagnosis [22]. Traditional diffusion models mainly focus on the time domain or spatial domain while neglecting structured information in the frequency domain. Learning frequency-domain features requires the model to capture spectral representations of signals, and the frequency-domain information of transformer fault signals demands additional representation mechanisms. If a diffusion model is not optimized for the frequency domain, the generated signals may fail to accurately reflect the fault-related frequency characteristics.
To address this issue, this paper extends conventional denoising-based diffusion models to a joint time–frequency domain and constructs a time–frequency diffusion framework, enabling the generation of diverse and high-quality signal data, as illustrated in Figure 1. By introducing noise in the time domain and blurring operations in the frequency domain, the proposed method effectively disrupts and gradually restores the time–frequency structural characteristics of signals. This approach enhances temporal sampling accuracy while achieving a balance between spectral detail modeling capability and generation stability.
During the forward diffusion process, the time–frequency diffusion model progressively introduces noise into the signal to destroy the original samples, thereby constructing a diffusion sequence from clean samples to noisy samples. At each diffusion step, perturbations are simultaneously introduced in both the time and frequency domains, enabling gradual degradation of the signal’s time–frequency characteristics. Specifically, the forward diffusion process from step t 1 to t consists of two stages: frequency-domain blurring and time-domain noising. First, in the frequency-domain blurring stage, the Fourier transform ζ is applied to the time-domain signal, and a Gaussian convolution kernel G t is introduced to perform circular convolution on the spectrum, thereby weakening the spectral structural information and obtaining a blurred spectral representation. Subsequently, in the time-domain noising stage, the blurred spectrum is transformed back to the time domain, and Gaussian noise ε is added. The original signal and noise are combined using the cumulative coefficient α t through weighted summation. The overall forward diffusion process combining these two steps is expressed as:
$$ x_t = \sqrt{\alpha_t}\,\zeta^{-1}\!\left( G_t \circledast \zeta(x_{t-1}) \right) + \sqrt{1-\alpha_t}\,\varepsilon $$
where $\zeta^{-1}$ denotes the inverse Fourier transform and $\circledast$ denotes circular convolution. To further simplify the time–frequency diffusion process, the iterative procedure from $x_0$ to $x_t$ is eliminated to reduce computational complexity. According to the convolution theorem, $\zeta^{-1}\!\left( G_t \circledast \zeta(x_{t-1}) \right) = \zeta^{-1}(G_t) \odot x_{t-1}$ can be derived. Therefore, the computation can be simplified as:
$$ x_t = \sqrt{\alpha_t}\, g_t \odot x_{t-1} + \sqrt{1-\alpha_t}\,\varepsilon $$
where $g_t = \zeta^{-1}(G_t)$ remains a Gaussian kernel, indicating that the convolution of a signal with a Gaussian kernel in the frequency domain is equivalent to the pointwise multiplication of the signal with another Gaussian kernel in the time domain. Let $\gamma_t = \sqrt{\alpha_t}\, g_t$. By similarly applying the reparameterization trick, the diffusion process can be expressed as:
$$ x_t = \bar{\gamma}_t\, x_0 + \sum_{s=1}^{t} \sqrt{1-\alpha_s}\,\frac{\bar{\gamma}_t}{\bar{\gamma}_s}\,\varepsilon_s = \bar{\gamma}_t\, x_0 + \bar{\sigma}_t\,\varepsilon $$
where $\bar{\gamma}_t = \prod_{s=1}^{t} \gamma_s$; $\alpha_t$ and $g_t$ correspond to the noise and blurring scheduling strategies, while $\bar{\gamma}_t$ and $\bar{\sigma}_t$ represent the weighting coefficient of the original signal and the standard deviation of the accumulated noise, respectively. Under this formulation, the diffusion state at any step $t$ can be computed efficiently without iterative computation.
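A single simplified forward step can be sketched as follows (NumPy; the Gaussian window $g_t$, the value of $\alpha_t$, and the toy signal are illustrative assumptions rather than the paper's actual schedule):

```python
import numpy as np

def tf_forward_step(x_prev, alpha_t, g_t, rng):
    """One simplified time-frequency diffusion step.

    g_t is a time-domain Gaussian kernel; by the convolution theorem,
    multiplying the signal pointwise by g_t is equivalent to circularly
    convolving its spectrum with the frequency-domain kernel G_t
    (the frequency-domain blurring stage), so the two stages collapse
    into one closed-form update.
    """
    eps = rng.standard_normal(len(x_prev))
    return np.sqrt(alpha_t) * g_t * x_prev + np.sqrt(1.0 - alpha_t) * eps

n = 2048
idx = np.arange(n)
x = np.sin(2 * np.pi * 50 * idx / 1600)                 # toy 50 Hz signal
g_t = np.exp(-0.5 * ((idx - n / 2) / (0.4 * n)) ** 2)   # illustrative Gaussian window
rng = np.random.default_rng(2)
x1 = tf_forward_step(x, alpha_t=0.98, g_t=g_t, rng=rng)
```

The update degrades the time–frequency structure in one shot: the multiplicative Gaussian weakens spectral detail while the additive term injects time-domain noise.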

3.2. Training of the Generative Model

In transformer condition monitoring scenarios, raw operational signals usually exhibit strong periodicity and repetitiveness, and a single cycle or even a half cycle often contains sufficient fault-related information. If the data are directly split into training, validation, and test sets according to temporal order, the model may learn spurious correlations associated with time positions rather than discriminative patterns that reflect the intrinsic fault characteristics, leading to overfitting. Moreover, due to practical operating constraints, transformer fault samples are difficult to obtain in large quantities, resulting in a severe imbalance between normal and fault data, which further degrades the training effectiveness and generalization performance of deep learning models. To address these issues, a time–frequency diffusion-based signal generation method is introduced to augment the training data, allowing the limited real-world data to be more effectively used for model validation and performance evaluation. Meanwhile, the data-driven generation of diverse fault samples reduces the temporal dependency of the training data and encourages the model to focus on intrinsic fault features, thereby enhancing its generalization capability under complex and unseen operating conditions.
In the context of this study, the diffusion-based generation process is primarily employed to alleviate the limited sample size of vibration and ultrasonic signals. During the forward diffusion process, Gaussian noise is gradually injected into the original signals, causing them to evolve into noise distributions in both the time and frequency domains. In the reverse process, a neural network is used to progressively denoise and reconstruct the noisy signals, thereby generating high-fidelity samples. Through this bidirectional diffusion mechanism, the model learns the mapping from random noise to meaningful structural information, enabling effective separation of noise components from fault-related features and improving the realism and diversity of the generated data under small-sample conditions. In contrast, leakage magnetic flux signals generally exhibit approximately one-dimensional sinusoidal waveforms, with fault characteristics mainly reflected in parameters such as amplitude or phase. Owing to their clear physical mechanisms and stable features [23], the leakage flux signal dataset in this study is expanded using a transformer simulation model rather than a diffusion-based generation strategy.
The network architecture adopted in this model is U-Net [24], whose input and output dimensions are kept consistent. The model consists of seven downsampling stages, each containing two Conv1D convolutional layers. Conv1D is well suited for processing continuous data with a single spatial dimension, such as time-series or textual sequences. Layer normalization is applied after each convolutional layer, and residual connections are incorporated to enhance training stability and feature representation capability. During the downsampling process, average pooling layers are used to pool features, with a total of 7 pooling layers employed. The number of channels is eventually expanded to 128. The corresponding upsampling stages gradually recover the feature scale, and the dimensionality of the final output representation is consistent with that of the last Conv1D layer. In addition, skip connections between the downsampling and upsampling paths facilitate efficient feature propagation and fusion, further improving overall model performance.
The network takes the noisy signal and the noise variance corresponding to the time step as inputs, where the noise variance represents the noise intensity of the current signal and is used to guide the network to perform conditional noise prediction during the reverse diffusion process. Since signals exhibit different statistical characteristics under different noise levels, incorporating noise variance information enables the network to adapt its denoising strategy according to the noise intensity. To enhance the network’s awareness of time-step variations, the noise variance is mapped into a high-dimensional feature space via sinusoidal embedding and fused with the backbone network through a Lambda layer.
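A minimal sketch of the sinusoidal embedding of the noise variance (the embedding dimension and frequency range below are assumptions for illustration, not the paper's configuration):

```python
import numpy as np

def sinusoidal_embedding(noise_variance, dim=32, min_freq=1.0, max_freq=1000.0):
    """Map a scalar noise variance to a dim-dimensional sinusoidal embedding.

    Frequencies are geometrically spaced, as in standard positional
    encodings, so nearby variances map to smoothly varying embeddings.
    """
    freqs = np.exp(np.linspace(np.log(min_freq), np.log(max_freq), dim // 2))
    angles = 2.0 * np.pi * freqs * noise_variance
    return np.concatenate([np.sin(angles), np.cos(angles)])

emb = sinusoidal_embedding(0.5)   # shape (32,), values in [-1, 1]
```

This high-dimensional representation is what the Lambda layer fuses with the backbone, letting the network condition its denoising on the current noise intensity.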
Unlike the noise variance, the diffusion schedule is a rule used during the forward diffusion process to generate training samples. It defines the mixing ratio between the original signal and noise at each time step, thereby controlling the generation of noisy samples at different noise levels. Adopting a cosine-form diffusion schedule allows the signal-to-noise ratio to vary smoothly over time, with relatively slower changes at the beginning and end of the diffusion process, which helps stabilize model training.
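The cosine-form schedule described above can be sketched as follows (a common formulation with a small offset $s$; the offset value is an assumption, not taken from the paper):

```python
import numpy as np

def cosine_alpha_bar(T, s=0.008):
    """Cosine diffusion schedule: cumulative signal rate alpha_bar(t).

    The signal-to-noise ratio changes slowly near t = 0 and t = T,
    which helps stabilize training compared with a linear schedule.
    """
    t = np.arange(T + 1) / T
    f = np.cos((t + s) / (1.0 + s) * np.pi / 2.0) ** 2
    return f / f[0]               # normalize so alpha_bar(0) = 1

ab = cosine_alpha_bar(1000)       # monotonically decreasing from 1 toward 0
```

The flat ends of the cosine curve are exactly the "relatively slower changes at the beginning and end" that the text credits with stabilizing training.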
Separate diffusion models are trained for vibration and ultrasonic state features, respectively. The generation process is guided by conditional labels, which indicate the specific class of the signal to be generated, enabling optimal feature learning under the corresponding conditions. A small amount of random noise is added prior to training as a data augmentation strategy.
In diffusion model training, the commonly used evaluation metric is the Kernel Inception Distance (KID), while parameter smoothing is often implemented via Exponential Moving Average (EMA). KID measures the similarity between real and generated signals through a pre-trained network, but its application to one-dimensional signals is limited. EMA can smooth model parameters and improve stability, yet it suffers from lag, complex hyperparameter tuning, and high computational cost. In contrast, Mean Squared Error (MSE) has well-understood mathematical properties, aligns with the Gaussian noise assumption, provides stable and fast optimization, preserves fine details in generated samples, and is computationally efficient, making it convenient for implementation and extension.
For vibration fault data of transformer inter-turn short circuits, the MSE-based diffusion model processing pipeline is illustrated in Figure 2, visually demonstrating the original signal, the noisy signal, and the reconstructed result. In this study, the MSE loss function is used to evaluate the discrepancy between the predicted noise and the actual noise as well as the signal components. For data generation, the reverse diffusion process is employed, where the model starts from random Gaussian noise and gradually denoises it to recover the original signal waveform.

3.3. ConvNeXt-1D Training Model

Following the design philosophy of ConvNeXt, a ConvNeXt-1D network model suitable for one-dimensional time-series signals is constructed. The overall architecture is illustrated in Figure 3. The number of stacked blocks in each stage is set to (1, 1, 3, 1), while retaining a convolution kernel size of 7 to expand the receptive field and better capture long-range temporal dependencies in time-series signals. AdamW optimizer is adopted for parameter updates to maintain good generalization performance. In addition, a warm-up learning rate strategy is employed, allowing the training process to gradually ramp up in the initial stage, thereby avoiding training instability and achieving improved convergence performance and stability. Scale Layer is introduced to incorporate a learnable residual scaling factor, gamma, which adaptively adjusts the response intensity of features across different channels. This mechanism enhances the model’s sensitivity to critical information while preventing excessive amplification of less important features, thereby improving the discriminative power of the learned representations.
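A single ConvNeXt-1D block consistent with this description can be sketched in PyTorch as follows (the channel count, expansion ratio, and initial value of gamma are illustrative assumptions, not the paper's exact settings):

```python
import torch
import torch.nn as nn

class ConvNeXtBlock1D(nn.Module):
    """One ConvNeXt-style block for 1-D signals: depthwise conv (kernel 7),
    LayerNorm, inverted bottleneck (expand then compress), GELU, and a
    learnable per-channel residual scaling factor gamma (Scale Layer)."""

    def __init__(self, dim, expansion=4, layer_scale_init=1e-6):
        super().__init__()
        self.dwconv = nn.Conv1d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)
        self.pwconv1 = nn.Linear(dim, expansion * dim)   # pointwise expand
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(expansion * dim, dim)   # pointwise compress
        self.gamma = nn.Parameter(layer_scale_init * torch.ones(dim))

    def forward(self, x):                  # x: (batch, channels, length)
        residual = x
        x = self.dwconv(x)
        x = x.transpose(1, 2)              # -> (batch, length, channels)
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = self.gamma * x                 # learnable residual scaling
        x = x.transpose(1, 2)
        return residual + x

block = ConvNeXtBlock1D(dim=64)
y = block(torch.randn(2, 64, 2048))        # shape preserved: (2, 64, 2048)
```

Stacking such blocks with counts (1, 1, 3, 1) across four stages, as described above, yields the full ConvNeXt-1D backbone.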
Under transformer fault operating conditions, different types of signals exhibit distinct generation mechanisms, time scales, and time–frequency characteristics. Ultrasonic signals, excited by faults such as partial discharges, are highly transient, non-stationary, and sensitive to high-frequency defects. Vibration signals, originating from electromagnetic and mechanical interactions between the winding and core, are more periodic and stable, reflecting the overall structural condition. Leakage flux signals, produced by winding currents, are sensitive to subtle or localized current variations and can detect winding anomalies at an early stage. These signals complement each other in terms of frequency characteristics and information focus, providing rich multi-source information for fault representation while imposing higher requirements on the adaptability and robustness of feature extraction methods.
To address the complexity and complementarity of multi-source signals, independent ConvNeXt-1D feature extraction branches are constructed for vibration, ultrasonic, and leakage flux signals, enabling effective extraction of discriminative features from each signal type. In each branch, the raw one-dimensional signals are progressively abstracted through multiple ConvNeXt-1D blocks to obtain high-dimensional feature representations with rich semantic information. The feature maps from different signal sources are then concatenated and fed into a self-attention module for cross-source feature modeling and adaptive fusion, thereby fully exploiting complementary information between signals and enhancing fault discrimination capability.
The self-attention mechanism is introduced to simulate the human visual attention process by focusing on salient features while suppressing irrelevant information, thus improving the effectiveness and efficiency of feature representation. Since the saliency of different fault categories varies across different signal sources, the self-attention mechanism dynamically adjusts the contribution weights of features. Specifically, the input features are first linearly projected to generate Query (Q), Key (K), and Value (V) vectors. The query vectors compute correlations with the key vectors, and attention weights are obtained through a normalization operation. These weights are then used to perform a weighted summation over the value vectors, preserving important feature information associated with higher weights and enabling adaptive regulation of feature contributions from different signal sources. Through this mechanism, the network can automatically enhance feature representations that are highly correlated with the target fault category, thereby improving the accuracy and robustness of transformer fault diagnosis. The expression is given as:
$$ \mathrm{Self\text{-}Attention}(Q, K, V) = \mathrm{Softmax}\!\left( \frac{Q K^{T}}{\sqrt{d_K}} \right) V $$
where $\sqrt{d_K}$ is a scaling factor that prevents excessively large inner products, which would otherwise push the softmax into regions with vanishing gradients.
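The scaled dot-product attention above can be sketched as follows (NumPy; the feature dimensions and random projection matrices are illustrative assumptions):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)    # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of feature vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))  # (seq, seq), rows sum to 1
    return weights @ V                         # weighted sum of value vectors

rng = np.random.default_rng(3)
X = rng.standard_normal((3, 8))                # e.g. 3 fused source features, dim 8
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)            # shape (3, 8)
```

Here each row of the weight matrix expresses how strongly one signal source attends to the others, which is the adaptive cross-source fusion described in the text.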

3.4. Fault Diagnosis Procedure

The transformer winding fault diagnosis procedure based on the proposed time–frequency diffusion model and ConvNeXt-1D network is illustrated in Figure 4. The specific steps are as follows:
  • Leakage flux, vibration, and ultrasonic signals collected by sensors are sampled at fixed intervals using a unified time window.
  • According to the model input requirements, the three types of signals are preprocessed separately, and their sample lengths are aligned through upsampling or downsampling operations.
  • The processed data are divided into a training dataset and a test dataset according to a predefined ratio.
  • The training dataset is used to train the time–frequency diffusion model to generate augmented vibration and ultrasonic samples, while a transformer simulation model is constructed to supplement leakage flux samples, thereby forming an expanded and balanced dataset.
  • The effectiveness of sample generation and fault diagnosis using the time–frequency diffusion model combined with ConvNeXt-1D is validated on a test dataset.
  • Visualization analysis is conducted on the classification results and diagnostic accuracy.
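The length-alignment preprocessing in the steps above can be sketched as simple linear resampling to a common length (an illustrative choice; the paper does not specify the interpolation method, and the example sampling rates are assumptions):

```python
import numpy as np

def align_length(signal, target_len=2048):
    """Resample a 1-D signal to target_len points by linear interpolation,
    covering both upsampling (short records) and downsampling (long ones)."""
    src = np.linspace(0.0, 1.0, num=len(signal))
    dst = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(dst, src, signal)

flux = np.sin(2 * np.pi * np.arange(1600) / 1600)      # e.g. one cycle of leakage flux
vib = np.random.default_rng(4).standard_normal(4096)   # e.g. a longer vibration record
flux_a, vib_a = align_length(flux), align_length(vib)
# both now contain 2048 samples and can be stacked as multi-source model input
```

Aligning all three signal types to the same length is what allows the branches of the diagnosis network to share a common input format.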
Figure 4. Transformer winding fault diagnosis process.

4. Experimental Analysis

4.1. Experimental Platform

To verify the feasibility of the transformer winding fault diagnosis method based on the time–frequency diffusion model and the ConvNeXt-1D network, transformer fault data were collected in a dynamic model laboratory. The dynamic experimental system consists of a transformer control cabinet, a three-phase three-limb core-type transformer, a load box, and protection and measurement devices. The experimental transformer has a rated voltage of 1 kV, a rated capacity of 50 kVA, a rated voltage ratio of 1/0.4 kV, a winding turn ratio of 201/139, a rated frequency of 50 Hz, and a winding connection type of Ynd11. The transmission line parameters are R = 1.767   Ω and L = 0.964   H , and the adjustable load is Z = 20   Ω . The experimental system is shown in Figure 5. The experimental platform can simulate three typical transformer winding faults: high-voltage winding arc faults, high-voltage winding inter-turn short-circuit faults, and high-voltage winding axial deformation faults. Specifically, internal winding arc faults are simulated by connecting an arc generation device to different turns of the transformer winding. The inter-turn short-circuit fault is set as a two-turn short circuit, corresponding to approximately 1% of the total winding turns. The winding deformation fault is simulated by introducing a 10% axial deformation in the upper section of the winding.
The signal acquisition system consists of a magneto-optical sensor, an acceleration sensor, and an ultrasonic sensor. The magneto-optical sensor is arranged on the surface of the transformer winding and connected to a magnetic protection unit via optical fiber. Based on the Faraday effect, it enables high-sensitivity measurement of the local magnetic field, with a sampling frequency of 1600 Hz. A customized solenoid was used for calibration, and multiple experiments verified that the sensor measurement error remained within 1.4%; a piezoelectric acceleration sensor is mounted on the transformer yoke clamp, with a frequency measurement range of 0–10 kHz and a measurement error within 2%; the ultrasonic sensor is positioned facing the middle section of the transformer winding, with a frequency measurement range of 15–70 kHz. Both the acceleration sensor and the ultrasonic sensor amplify the acquired signals through a constant-current power supply device, and the output signals are fed into an oscilloscope for display. The oscilloscope channel sampling frequency is 1 GHz. Finally, all measured waveform data are transmitted to an upper computer for further processing. The structural parameters of the transformer are shown in Table 1.
The experiments in this study were conducted in the PyCharm environment using the PyTorch 2.6 deep learning framework. The hardware used consisted of a computer equipped with an NVIDIA GeForce RTX 4060 GPU, an Intel i7-12650H CPU, and 16 GB of RAM.

4.2. Experimental Data Processing

For the operating conditions of the experimental transformer winding, five operating states are considered in this study: normal operation, inter-turn short circuit in the middle winding, inter-turn short circuit in the lower winding, axial compressive deformation of the winding, and winding arc discharge. For each operating state, 560 sample groups are constructed, resulting in a total of 2800 data samples. The data conditions and corresponding class labels of each sample group are listed in Table 2. The dataset is divided into training, validation, and test sets with a ratio of 7:1.5:1.5. The test set is used to evaluate the generalization capability of the model and contains 420 sample groups. The originally sampled data are preferentially allocated to the test dataset, while the remaining samples are used for augmentation.
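For reference, the 7:1.5:1.5 partition described above can be reproduced with a simple shuffled index split; the seed and shuffling scheme below are illustrative assumptions, not the authors' exact procedure:

```python
import random

def split_indices(n_samples, seed=0):
    """Shuffle sample indices and split them 70/15/15 into
    training, validation, and test subsets (integer arithmetic
    avoids floating-point rounding of the split sizes)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = n_samples * 70 // 100
    n_val = n_samples * 15 // 100
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

# 2800 samples -> 1960 training, 420 validation, 420 test
train_idx, val_idx, test_idx = split_indices(2800)
```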
Each sample is strictly derived from a clean dataset corresponding to its fault category, and signal segments that do not contain fault-related information are manually screened and removed. For normal-condition samples with relatively sufficient data, an interval sampling strategy based on a sliding window is adopted to fully utilize valid data while maintaining sample independence. In contrast, fault samples with insufficient data are expanded using data augmentation strategies. Specifically, vibration and ultrasonic signals are augmented using a generative model to produce diverse new samples, thereby alleviating training insufficiency under small-sample conditions and improving model performance. Leakage flux signals are expanded by constructing a transformer electromagnetic simulation model to ensure physical consistency and feature reliability of the generated data. Each sample consists of three groups of time-series data: one leakage flux signal group collected from the upper, middle, and lower positions of the transformer, one vibration signal, and one ultrasonic signal. After unified preprocessing, each signal sequence contains 2048 sampling points, providing rich multi-source feature information for the model. The time-domain, frequency-domain, and time–frequency distributions of the five categories of original vibration and ultrasonic signals are illustrated in Figure 6.
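The interval sampling strategy mentioned above can be sketched as follows; the window length of 2048 points matches the preprocessing described, while the stride (and hence the gap between consecutive windows) is an assumed value:

```python
def interval_windows(signal, window=2048, stride=4096):
    """Interval sampling with a sliding window: take one window every
    `stride` points. Choosing stride > window leaves a gap between
    consecutive windows, which helps keep samples independent."""
    return [signal[i:i + window]
            for i in range(0, len(signal) - window + 1, stride)]

# a 10,000-point record yields two non-overlapping 2048-point samples
samples = interval_windows(list(range(10000)))
```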
Based on the structural parameters of the three-phase three-limb dry-type transformer used in the dynamic model laboratory, a three-dimensional finite element model with dimensions identical to the physical transformer was established using ANSYS 2021 R2 finite element simulation software. The model was constructed at a 1:1 scale, and its overall structure is consistent with that of the actual transformer. The simulation model is shown in Figure 7.
To verify the consistency between the established simulation model and the dynamic experimental transformer, the radial leakage magnetic field amplitudes at three measurement points—located at the upper, middle, and lower parts of the transformer—were obtained under normal operating conditions through both experimental measurement and simulation calculation. The corresponding variation curves were plotted, and the comparison results are shown in Figure 8.
As shown in Figure 8, the simulation results exhibit consistent variation trends with the measured data in terms of radial leakage magnetic flux at all measurement points, indicating that the established simulation model has high accuracy. The mean errors of all sampled points for the three curves were 5.96%, 11.93%, and 7.12%, respectively, with an overall average error of 8.33%. Therefore, this simulation model can be used to provide targeted supplementation and expansion of training samples for leakage magnetic field data.
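The quoted error figures are mean relative errors over the sampled points of each curve; a minimal sketch of that computation, using made-up values rather than the measured data:

```python
def mean_percent_error(measured, simulated):
    """Average absolute relative error (%) between measured and
    simulated values over all sampled points of one curve."""
    errors = [abs(s - m) / abs(m) * 100.0
              for m, s in zip(measured, simulated)]
    return sum(errors) / len(errors)

# illustrative values only, not the experimental leakage-field data
err = mean_percent_error([1.0, 2.0, 4.0], [1.1, 1.9, 4.4])
```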

4.3. Experimental Results Analysis

The processed training and validation datasets were fed into the network model for training, and the training hyperparameters are listed in Table 3. The batch size was set to 32 and the number of training epochs to 50. The convolutional layers adopt the GELU activation function to promote gradient propagation and alleviate training oscillations, while the output layer employs the Softmax activation function to generate class probability distributions. The AdamW optimizer was selected; it applies weight decay directly during parameter updates, improving model generalization on top of adaptive learning rates. The initial learning rate was set to 5 × 10−4, and a warm-up strategy was applied during the first 10% of the training epochs to gradually increase the learning rate and ensure training stability. During the main training phase, a ReduceLROnPlateau scheduler monitors the validation loss: when the loss shows no significant improvement within a patience period of three epochs, the learning rate is halved (factor 0.5), down to a minimum of 1 × 10−6, in order to prevent overfitting and premature learning rate decay. The loss function used during training is categorical cross-entropy.
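The learning-rate policy above, linear warm-up over the first 10% of epochs followed by plateau-based halving with a floor of 1 × 10−6, can be sketched in plain Python. The linear warm-up shape is an assumption; in a PyTorch training loop the built-in ReduceLROnPlateau scheduler would normally handle the plateau phase.

```python
class WarmupPlateauLR:
    """Linear warm-up for the first 10% of epochs, then plateau-style
    decay: halve the rate when validation loss fails to improve for
    more than `patience` consecutive epochs, never below `min_lr`."""

    def __init__(self, base_lr=5e-4, total_epochs=50, warmup_frac=0.1,
                 factor=0.5, patience=3, min_lr=1e-6):
        self.base_lr = base_lr
        self.warmup_epochs = max(1, int(total_epochs * warmup_frac))
        self.factor, self.patience, self.min_lr = factor, patience, min_lr
        self.lr = base_lr
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        if epoch < self.warmup_epochs:          # warm-up phase
            return self.base_lr * (epoch + 1) / self.warmup_epochs
        if val_loss < self.best_loss:           # improvement resets patience
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr

# with a constant validation loss, the rate warms up, holds, then halves
sched = WarmupPlateauLR()
lrs = [sched.step(e, val_loss=1.0) for e in range(10)]
```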
The transformer winding fault diagnosis model adopts a large-kernel convolution design, with the number of feature channels gradually increasing from 32 to 256 to enhance feature extraction capability. Layer normalization is employed to accelerate training and improve network stability, while DropPath regularization is introduced to mitigate overfitting risk. The training results of the model are shown in Figure 9.
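One block of such a network can be sketched as follows; the kernel size of 7, the 4× pointwise expansion, and the omission of DropPath follow standard ConvNeXt conventions rather than the authors' exact configuration:

```python
import torch
import torch.nn as nn

class ConvNeXtBlock1D(nn.Module):
    """A minimal 1-D ConvNeXt block: depthwise large-kernel convolution,
    LayerNorm over channels, pointwise expansion with GELU, and a
    residual connection."""

    def __init__(self, dim, kernel_size=7):
        super().__init__()
        self.dwconv = nn.Conv1d(dim, dim, kernel_size,
                                padding=kernel_size // 2, groups=dim)
        self.norm = nn.LayerNorm(dim)
        self.pwconv1 = nn.Linear(dim, 4 * dim)   # expand channels 4x
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)   # project back

    def forward(self, x):                        # x: (batch, dim, length)
        residual = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 1)                   # (batch, length, dim)
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 2, 1)
        return residual + x

# a stage with 32 channels applied to 2048-point sequences
block = ConvNeXtBlock1D(dim=32)
out = block(torch.randn(8, 32, 2048))
```

In the full model, stacked stages widen the channel dimension from 32 to 256 between downsampling layers.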
As shown in Figure 9, the model exhibits rapid and stable convergence during training. The validation accuracy approaches 100% and stabilizes after the 21st epoch, while the validation loss shows no significant fluctuation after the 28th epoch. Training was stopped after 50 epochs, with the final model achieving 99.75% accuracy and a loss of 0.0267 on the training set, and 99.13% accuracy and a loss of 0.0261 on the validation set, for an average accuracy of 99.44%. These results indicate that the proposed model demonstrates excellent learning efficiency and diagnostic performance. Two key observations can be made from the convergence curves. First, the model converges rapidly within the initial few epochs, with accuracy rising quickly as loss decreases, indicating that the model structure effectively captures fault features. Second, the training and validation accuracy and loss curves closely align, with minimal differences and no signs of divergence, suggesting that the model does not overfit and generalizes well to unseen data. To evaluate the robustness of the model, 5-fold cross-validation was performed, yielding an average accuracy of 99.23% with a standard deviation of 0.29%. The small variation across training runs indicates that the proposed method possesses strong stability and reliability.
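The 5-fold protocol and the mean ± standard-deviation aggregation can be sketched as follows; the per-fold accuracies are hypothetical values used only to illustrate the computation, not the experimental results:

```python
import statistics

def kfold_indices(n_samples, k=5):
    """Yield (train, val) index lists for k-fold cross-validation,
    assigning sample i to fold i % k."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i
                 for j in fold]
        yield train, val

# hypothetical per-fold accuracies aggregated into mean +/- std
fold_acc = [0.9893, 0.9964, 0.9905, 0.9929, 0.9925]
mean_acc = statistics.mean(fold_acc)
std_acc = statistics.stdev(fold_acc)
```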
To comprehensively evaluate the diagnostic performance of the model and demonstrate its discriminative capability for different fault categories, the trained fault diagnosis model is used to predict the test dataset. The classification results are then visualized using the t-SNE algorithm [25] and a confusion matrix [26]. The test dataset consists of 420 samples, accounting for 15% of the total data, with 84 samples for each fault category. The t-SNE clustering obtained after feeding the experimental data into the diagnostic model is shown in Figure 10.
As shown in Figure 10, the original input features exhibit significant class overlap in the t-SNE visualization, indicating strong similarity among different fault types in the original feature space. After model training, the feature distributions in the low-dimensional space become clearly separated, with well-defined inter-class boundaries and compact intra-class clusters. This demonstrates that the model not only effectively extracts key signal features but also enhances the discriminative capability among different fault categories, achieving effective differentiation of complex multi-source signals.
In the confusion matrix, rows represent the true classes of samples, while columns correspond to the predicted classes. Diagonal elements indicate the number of correctly classified samples, where higher values reflect stronger recognition capability for the corresponding class. Off-diagonal elements represent misclassifications between classes. As shown in the confusion matrix of the test set in Figure 11, fault diagnosis performance based on single-source signal networks is generally suboptimal, and fault diagnosis relying solely on vibration or ultrasonic signals lacks sufficient discriminative capability for certain classes. After multi-source signal fusion, the model correctly classifies the vast majority of samples, with only a small number of misclassifications remaining. This may be attributed to the high similarity of early-stage features among certain fault modes under experimental conditions, which limits their separability. Nevertheless, the model demonstrates high accuracy and strong generalization ability during both training and testing, achieving overall excellent classification performance.
To further verify the superiority of the proposed algorithm, accuracy, precision, recall, and F1-score were selected as evaluation metrics for network performance. These evaluation metrics are defined as follows:
$$A = \frac{TN + TP}{TN + TP + FN + FP}, \quad P = \frac{TP}{FP + TP}, \quad R = \frac{TP}{FN + TP}, \quad F1 = \frac{2 \times P \times R}{P + R}$$
where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
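These definitions translate directly into code; the confusion-matrix counts below are illustrative, not taken from the experiments:

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 score from the four
    confusion-matrix counts, following the formulas above."""
    accuracy = (tn + tp) / (tn + tp + fn + fp)
    precision = tp / (fp + tp)
    recall = tp / (fn + tp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# illustrative one-vs-rest counts for a 420-sample test set
acc, p, r, f1 = classification_metrics(tp=83, fp=2, fn=1, tn=334)
```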
The performance metrics for each fault category—including precision, recall, and F1 score—were analyzed individually to provide a more detailed assessment of diagnostic performance. The evaluation results for each fault category are presented in Table 4. The results indicate that the high overall accuracy is maintained across individual categories, demonstrating that the model achieves stable recognition performance for all fault types.
To evaluate the contribution of multi-source signals to different fault categories, the self-attention weights of each network branch were visualized. For an intuitive presentation of the weight distribution, a single sample was selected from each of the five categories in the dataset for sensor weight analysis. The weight values were recorded when the model achieved optimal classification performance and were normalized using the softmax function. The visualization results of the sensor weights are shown in Figure 12, where the magnitude of the weights is proportional to their contribution to fault diagnosis.
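The softmax normalization applied to the branch weights is the standard one; the raw attention scores below are hypothetical:

```python
import math

def softmax(scores):
    """Normalize raw attention scores into weights that sum to 1,
    subtracting the maximum score for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# hypothetical raw branch scores: (leakage flux, vibration, ultrasonic)
weights = softmax([2.1, 1.3, 0.6])
```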
As shown in Figure 12, the weights assigned to the different sensors vary. Averaged across all categories, the leakage flux sensor receives the largest weight, reflecting its higher monitoring sensitivity, and the vibration sensor is generally weighted more heavily than the ultrasonic sensor. The weight distribution also varies with the sample type: the vibration sensor receives its largest weight for winding deformation samples, whereas the ultrasonic sensor receives its largest weight for discharge samples. This is because each sensor contributes differently to diagnosing different fault types, resulting in varying weight distributions.
Based on the same dataset, the proposed method is compared with a conventional CNN, GRU, TCN, and DRSN, and the fault recognition results are listed in Table 5. The conventional CNN achieves an accuracy of 92.64%, but its performance is limited by a restricted receptive field and the lack of effective temporal dependency modeling. GRU and TCN reach accuracies of 96.68% and 97.08%, respectively, benefiting from their inherent temporal modeling mechanisms, with GRU capturing long-term dependencies and TCN enlarging the receptive field through causal and dilated convolutions. The DRSN model attains an accuracy of 95.45%, where the residual shrinkage mechanism improves robustness to noise to some extent. In contrast, the proposed model achieves the highest accuracy of 99.29%, significantly outperforming the comparison methods. This demonstrates that the proposed approach can more effectively integrate local and global features for one-dimensional time-series fault diagnosis while maintaining computational efficiency and robustness.
Moreover, to further investigate the impact of the proposed data augmentation strategy on diagnostic performance, training sets with different ratios of real samples were constructed by reducing the number of real samples while keeping the total number of training samples constant. Comparative experiments were also conducted using a simple noise injection strategy. The proposed method was evaluated using 5-fold cross-validation, and the diagnostic results under different training set compositions are presented in Table 6. The results indicate that as the proportion of real samples decreases, the accuracy gradually declines; however, the performance drops slowly, and the accuracy remains consistently above 90%. When generated samples were replaced with noisy samples, the accuracy decreased significantly.

5. Conclusions

The transformer winding fault diagnosis method proposed in this study, based on the time–frequency diffusion model and ConvNeXt-1D, achieves high diagnostic accuracy and stable recognition performance across multiple fault conditions, demonstrating the effectiveness of the approach. The specific contributions are as follows:
  • Considering that transformer monitoring signals are primarily one-dimensional time series, an end-to-end ConvNeXt-1D model was constructed. A multi-branch structure was employed to extract vibration, ultrasonic, and leakage flux signal features separately, and a self-attention mechanism was incorporated to achieve adaptive fusion across signal sources, fully exploiting the complementary information among multi-source signals and enhancing feature representation and discriminative performance;
  • A diffusion-based generative model was innovatively introduced into transformer fault diagnosis, proposing a time–frequency diffusion generation strategy. By alternately performing time-domain noise injection and frequency-domain blurring, joint time–frequency modeling is achieved. This effectively augments the limited training data while preserving the consistency of fault feature distributions, thereby improving model generalization;
  • A collaborative optimization mechanism between generative modeling and the discriminative network was proposed. The diffusion-generated data enhances the expressiveness of the data distribution, which, combined with multi-branch ConvNeXt-1D feature extraction and self-attention-based adaptive fusion, enables complementary advantages between data augmentation and network structure, resulting in higher diagnostic accuracy and more stable recognition performance under conditions of limited samples and multi-source signals.
This study provides a new technical approach for transformer winding fault diagnosis. Its high diagnostic accuracy and strong generalization capability offer practical engineering value for online condition monitoring and predictive maintenance of transformers. It should be noted that the performance of the proposed method is still influenced by factors such as sensor placement and operating-condition consistency, and the overall framework is primarily data-driven, so its interpretability remains to be improved. Moreover, generating data with diffusion models is computationally intensive and imposes higher demands on hardware. Future work will focus on further validation under complex operating conditions and on integrating physical-mechanism models to enhance the interpretability and reliability of diagnostic results.

Author Contributions

Conceptualization, X.D.; methodology, Y.Y.; investigation, Y.Y.; resources, X.D.; data curation, Y.Y.; writing—original draft preparation, Y.Y.; writing—review and editing, Y.Y.; visualization, Y.Y.; supervision, X.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Project of State Grid Corporation of China (520940240037).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data supporting the results of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, Y.; Yu, J.; Peng, P.; Xie, J.; Yi, J.; Tao, Z. Online Detection Method for Transformer Faults Based on Multi-model Fusion. High Volt. Eng. 2023, 49, 3415–3424.
  2. Barkas, A.D.; Chronis, I.; Psomopoulos, C. Failure Mapping and Critical Measurements for the Operating Condition Assessment of Power Transformers. Energy Rep. 2022, 8, 527–547.
  3. Ou, Y.; Li, Z. Transformer fault diagnosis technology based on sample expansion and feature selection and SVM optimized by IGWO. Power Syst. Prot. Control 2023, 51, 11–20.
  4. Yang, Y.; Li, X.; Li, J.; Zhao, W.; Chen, W.; Xia, N. Time Frequency Diagnosis of Transformer Mechanical Fault Based on SSWT-GLCM and Improved WOA-SVM. Vib. Test Diagn. 2024, 44, 1135–1143+1247.
  5. Ma, H.; Xiao, Y.; Yan, J.; Sun, Y. Transformer Winding Looseness Diagnosis Method Based on Multiple Feature Extraction and Sparrow Search Algorithm Optimized XGBoost. Electr. Mach. Control 2024, 28, 87–97.
  6. Liu, B.; Chen, Z.; Zhang, X.; Bai, Y.; Ou, Q.; Chen, G. Method Identifying the Vibration State of Transformer Winding Based on CEEMDAN and QPSO-SVM. J. Lanzhou Univ. Nat. Sci. 2025, 61, 452–458.
  7. Qiu, S.; Cui, X.; Ping, Z.; Shan, N.; Li, Z.; Bao, X.; Xu, X. Deep Learning Techniques in Intelligent Fault Diagnosis and Prognosis for Industrial Systems: A Review. Sensors 2023, 23, 1305.
  8. ElSayed, M.E.; Albalawi, F.; Ward, S.A.; Ghoneim, S.S.M.; Eid, M.M.; Abdelhamid, A.A.; Bailek, N.; Ibrahim, A. Feature Selection and Classification of Transformer Faults Based on Novel Meta-Heuristic Algorithm. Mathematics 2022, 10, 3144.
  9. Yang, X.; Ye, T.; Xuan, X.; Zhu, W.; Mei, X.; Zhou, F. A Novel Data Augmentation Method Based on Denoising Diffusion Probabilistic Model for Fault Diagnosis Under Imbalanced Data. IEEE Trans. Ind. Inform. 2024, 20, 7820–7831.
  10. Liu, J.; Li, Z.; Li, K.; Chen, P.; Xu, S.; Xu, K.; Xiao, N. Multi-feature Fusion Detection Method for Interturn Short-circuit Faults of Transformers. J. Hunan Univ. Nat. Sci. 2023, 50, 210–216.
  11. Shi, Y.; Tan, G.; Zhao, B.; Zhang, G. Condition Assessment Method for Power Transformers Based on Fuzzy Comprehensive Evaluation and Information Fusion. Power Syst. Prot. Control 2022, 50, 167–176.
  12. Badawi, M.; Ibrahim, S.A.; Mansour, D.-E.A.; El-Faraskoury, A.A.; Ward, S.A.; Mahmoud, K.; Lehtonen, M.; Darwish, M.M.F. Reliable Estimation for Health Index of Transformer Oil Based on Novel Combined Predictive Maintenance Techniques. IEEE Access 2022, 10, 25954–25972.
  13. Cui, S.; Wu, Z.; Cui, Y.; Zhang, Q.; Zhao, Y. Fault Diagnosis of Planetary Gearbox Based on Frequency Slice Wavelet Transform and Attention-enhanced ConvNeXt Model. Acta Armamentarii 2023, 46, 157–166.
  14. Wan, K.; Ma, H.; Cui, J.; Wang, J. Fault Diagnosis Method of Transformer Core Loosening Based on Mel-GADF and ConvNeXt-T. Electr. Power Autom. Equip. 2024, 44, 217–224.
  15. Lopes, S.M.A.; Flauzino, R.A.; Altafim, R.A.C. Incipient Fault Diagnosis in Power Transformers by Data-Driven Models with Over-Sampled Dataset. Electr. Power Syst. Res. 2021, 201, 107519.
  16. Li, Y.; Hou, H.; Xu, M.; Li, H.; Sheng, G.; Jiang, X. Oil Chromatogram Case Generation Method of Transformer Based on Policy Gradient and Generative Adversarial Networks. Electr. Power Autom. Equip. 2020, 40, 211–218.
  17. Li, P.; Hu, G. Transformer Fault Diagnosis Based on Data Enhanced One-dimensional Improved Convolutional Neural Network. Power Grid Technol. 2023, 47, 2957–2967.
  18. Luo, J.; Huang, J.; Li, H. A Case Study of Conditional Deep Convolutional Generative Adversarial Networks in Machine Fault Diagnosis. J. Intell. Manuf. 2020, 32, 407–425.
  19. Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A. Improved Training of Wasserstein GANs. Adv. Neural Inf. Process. Syst. 2017, 30, 1–11.
  20. Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851.
  21. Liu, Z.; Mao, H.; Wu, C.-Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11976–11986.
  22. Du, H.; Liu, H.; Lei, L.; Tong, J.; Huang, J.; Ma, G. Power Transformer Fault Detection Based on Multi-Eigenvalues of Vibration Signal. Trans. China Electrotech. Soc. 2023, 38, 83–94.
  23. Cao, J.; Chen, Z.; Wang, J.; Chen, L. Few-Shot Gearbox Fault Diagnosis Method Based on Diffusion Model and DenseNet. J. Jilin Univ. Eng. Ed. 2025, 1–11.
  24. Deng, X.; Liu, M.; Ceng, P.; Zhou, D.; Du, Z.; Feng, Q. Research on early fault protection of transformer windings based on leakage magnetic field phase change characteristic. Power Syst. Prot. Control 2025, 53, 1–12.
  25. Gisbrecht, A.; Schulz, A.; Hammer, B. Parametric Nonlinear Dimensionality Reduction Using Kernel t-SNE. Neurocomputing 2015, 147, 71–82.
  26. Ma, J.; Liu, F. Bearing Fault Diagnosis with Variable Speed Based on Fractional Hierarchical Range Entropy and Hunter–Prey Optimization Algorithm-Optimized Random Forest. Machines 2022, 10, 763.
Figure 1. Time–frequency diffusion process.
Figure 2. Diffusion model processing of inter-turn short-circuit vibration signals; (a) original signal; (b) inter-turn short circuit diffusion process; (c) recovered result.
Figure 3. Fault diagnosis framework based on ConvNeXt-1D.
Figure 5. Dynamic model testing platform: (a) experimental system wiring diagram; (b) dynamic model experimental equipment; (c) acceleration sensor; (d) ultrasonic sensor; (e) magneto-optical sensor; (f) transformer control cabinet.
Figure 6. Fault data analysis of vibration and ultrasonic signals.
Figure 7. Dry-type transformer simulation model.
Figure 8. Comparison of measured and simulated leakage magnetic signals.
Figure 9. The loss convergence curve at the optimal accuracy.
Figure 10. t-SNE clustering visualization; (a) original input features; (b) terminal output layer.
Figure 11. Confusion matrix; (a) single vibration signal; (b) single ultrasonic signal; (c) single leakage magnetic signal; (d) multi-source signal fusion.
Figure 12. Sensor weight variation across different fault categories.
Table 1. Structural parameters of dry-type double-winding transformer.

| Parameter Name | Dimension/mm | Parameter Name | Dimension/mm |
|---|---|---|---|
| Low-voltage winding height | 350 | Height per section of high-voltage non-compacted winding | 19 |
| Low-voltage winding thickness | 15 | High-voltage winding thickness | 22.5 |
| Window height | 430 | Distance from high-voltage winding to neutral axis | 15.5 |
| Distance between low-voltage winding and core | 30 | Upper-section height of inner high-voltage winding | 116 |
| Inter-winding distance (HV–LV) | 30 | Edge height difference between HV and LV windings | 5 |
| Lower-section height of inner high-voltage winding | 174 | Distance between upper and lower sections of inner HV winding | 20 |
| Upper-section height of outer high-voltage winding | 102 | Distance between upper and lower sections of outer HV winding | 34 |
Table 2. Sample data information table.

| Tag Number | Transformer Operating Status | Sample Data Size (CSV) |
|---|---|---|
| 0 | normal operation | 560 |
| 1 (MSC) | inter-turn short circuit in the middle of the winding | 560 |
| 2 (LSC) | inter-turn short circuit in the lower winding | 560 |
| 3 (ACD) | axial compressive deformation of the winding | 560 |
| 4 (AD) | winding arc discharge | 560 |
Table 3. Model training hyperparameter table.

| Hyperparameter | Value/Choice |
|---|---|
| batch size | 32 |
| number of training epochs | 50 |
| activation function | convolutional layers: GELU; output layer: Softmax |
| optimization algorithm | AdamW |
| initial learning rate | 5 × 10−4 |
| learning rate adjustment | warm-up; ReduceLROnPlateau (factor = 0.5, patience = 3) |
| loss function | categorical cross-entropy |
Table 4. Diagnostic performance evaluation for different fault categories.

| Label | Precision | Recall | F1 Score |
|---|---|---|---|
| 0 | 0.9767 | 1.000 | 0.988 |
| 1 (MSC) | 1.000 | 0.9881 | 0.994 |
| 2 (LSC) | 0.9882 | 1.000 | 0.994 |
| 3 (ACD) | 1.000 | 0.9762 | 0.988 |
| 4 (AD) | 1.000 | 1.000 | 1.000 |
Table 5. Comparison of fault identification results for different algorithm models.

| Algorithm Category | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| CNN | 0.9264 | 0.9221 | 0.9218 | 0.9219 |
| GRU | 0.9668 | 0.9634 | 0.9641 | 0.9637 |
| TCN | 0.9708 | 0.9685 | 0.9692 | 0.9688 |
| DRSN | 0.9545 | 0.9516 | 0.9529 | 0.9522 |
| Our method | 0.9929 | 0.9908 | 0.9905 | 0.9906 |
Table 6. Diagnostic results for different training set compositions (per class).

| Real Samples | Generated Samples | Average Accuracy | Real Samples | Noisy Samples | Average Accuracy |
|---|---|---|---|---|---|
| 40 | 436 | 0.9130 | 40 | 436 | 0.8454 |
| 60 | 416 | 0.9515 | 60 | 416 | 0.9071 |
| 80 | 396 | 0.9856 | 80 | 396 | 0.9433 |
| 100 | 376 | 0.9923 | 100 | 376 | 0.9802 |

Yang, Y.; Deng, X. Power Transformer Winding Fault Diagnosis Method Based on Time–Frequency Diffusion Model and ConvNeXt-1D. Appl. Sci. 2026, 16, 2528. https://doi.org/10.3390/app16052528

