Prediction of Stress in Power Transformer Winding Conductors Using Artificial Neural Networks: Hyperparameter Analysis

Valencia, Fausto; Arcos, Hugo; Quilumba, Franklin

doi:10.3390/en14144242

Open AccessArticle

Prediction of Stress in Power Transformer Winding Conductors Using Artificial Neural Networks: Hyperparameter Analysis

by

Fausto Valencia

^*

,

Hugo Arcos

and

Franklin Quilumba

Faculty of Electrical Engineering, Escuela Politecnica Nacional, Ladrón de Guevara 253, Quito 170517, Ecuador

^*

Author to whom correspondence should be addressed.

Energies 2021, 14(14), 4242; https://doi.org/10.3390/en14144242

Submission received: 9 June 2021 / Revised: 4 July 2021 / Accepted: 8 July 2021 / Published: 14 July 2021

(This article belongs to the Section F5: Artificial Intelligence and Smart Energy)

Download

Browse Figures

Versions Notes

Abstract

:

The purpose of this research is the evaluation of artificial neural network models in the prediction of stresses in a 400 MVA power transformer winding conductor caused by the circulation of fault currents. The models were compared considering the training, validation, and test data errors’ behavior. Different combinations of hyperparameters were analyzed based on the variation of architectures, optimizers, and activation functions. The data for the process was created from finite element simulations performed in the FEMM software. The design of the Artificial Neural Network was performed using the Keras framework. As a result, a model with one hidden layer was the best suited architecture for the problem at hand, with the optimizer Adam and the activation function ReLU. The final Artificial Neural Network model predictions were compared with the Finite Element Method results, showing good agreement but with a much shorter solution time.

Keywords:

artificial neural networks; deep learning; power transformers; stress; finite element method; electromagnetic forces

Graphical Abstract

1. Introduction

Electromagnetic forces and stresses in conductors are usually determined with the Finite Element Method (FEM) [1,2,3,4,5]. As a numerical method, FEM needs the discretization of the medium, which results in the creation of nodes, each one represented as one row and one column in a matrix [6]. This process is not unique because FEM internally looks for the discretization that presents a slight field variation between nodes [7]. This is achieved through an iterative process, which lasts longer when the parameters of the problem have a nonlinear behavior—such as the permeability of the transformer core.

As a consequence, FEM could spend excessive time in obtaining one solution. When the transformer design requires the electromagnetic forces, this is not a problem because just a small number of calculations need to be performed. For example, if the procedure of evaluation of design recommended in [8] is followed, only the worst force and stress must be calculated, which implies one FEM simulation. Even if the dynamical analysis of the same reference is considered, 250 simulations should be performed at most, which correspond to 2 s at 60 Hz, at the highest value of short circuit current and the most pessimistic transient conditions.

However, if the forces must be calculated multiple times, FEM could become cumbersome. For that reason, finding a method that quickly determines the stress in winding conductors of a power transformer, such as an artificial neural network (ANN), may be very useful.

The interaction between artificial intelligence and electromagnetic forces is not widely studied in the research community.

Some investigations are focused on the analysis of possible damage inside the equipment. For example, in [9], ANNs were applied to classify the condition of power transformers as normal, degraded, or anomalous. The novelty of the research was the identification of nonlinearities in the response of the vibrations before electrical load variations. In the experimental setup, the vibrations due to electromagnetic forces in windings were separated from those due to magnetostriction in the core. The ANN classification results were compared with those of a Naive Bayesian Classifier, giving better accuracy.

Other related investigations are not focused on forces but on the determination of the magnetic field. In [10], magnetic fields were calculated with the use of Convolutional Neural Networks (CNN). In that research, a transformer with variable dimensions of the core and windings was included in the analysis. The authors used pictures of the magnetic field distribution as input for training. Each pixel was an input for the CNN. The training process was of the supervised kind. The authors used dropout layers to increase the generalization of the network. The dropout neurons were probabilistically chosen through a Bayesian Monte Carlo Technique. This same technique was used to detect if the problem was not suitable to be modeled with a CNN. An important feature of the research is the use of dilated filters, which fit well with the behavior of the magnetic field, which is influenced by the surrounding fields. The main limitation of the research is that the electrical current was not varied when creating the training data. This restriction limits the application of the CNN model when the goal is its use in practical problems. The study is limited to only one current, which could become a restriction when the fields and the forces are required for different transformer operating conditions.

A similar approach to using CNN—treating the magnetic field like a picture—was used in [11]. In this case, the problem is the determination of electromagnetic scattering when there is an incident wave. Finite differences in the frequency domain were used to generate the training data corresponding to wave and scattering pictures. The novelty of the paper is the use of deep residual networks to improve the network accuracy despite the number of hidden layers. The main difference with the problem of the fields in power transformers is the characteristics of the internal components’ dimensions. While the core has lengths in the order of one meter, the winding conductors or disks could be as thin as millimeters. This wide range of dimension values brings difficulties in treating the pictures required to use this technique.

This research presents an ANN design to determine the stress in the conductors of a 400 MVA, three-phase power transformer produced by the circulation of three-phase fault currents.

FEM simulations are used to train the ANN, where a set of possible fault currents is the input data, and the corresponding stresses in the conductors are the output data. As discussed, each FEM simulation takes a long time. Thus, if the objective is to monitor the effects of the stress on the power transformer in real-time, the FEM simulations would not be the most appropriate option. On the other hand, when developing the ANN model, the long FEM simulation time is needed just during the data collection. After that, the ANN model can predict the stress due to any electrical current inside the range that was established during the creation of the model. As a consequence, there would be a handy ANN model fast enough to get results in real-time, and that could be used during the transformer’s lifetime.

In regards to the design of the ANN model an objective is to find the best combination of alternatives based on three characteristics: architecture, activation functions, and optimizers used in the ANN model. The best combination is assessed by checking three attributes: the error for the training data, the generalization with the validation data, and the convergence time of the process.

2. Artificial Neural Networks Characteristics

Two concepts are commonly used in ANN literature: parameters and hyperparameters. Parameters change during the training process, i.e., they are controlled by the computational program—for example, the weights on each connection. Hyperparameters are defined outside the training process and usually are predefined, for example, the number of ANN layers [12]. Hence, the designer’s task is to find the appropriate hyperparameters to get the lowest difference between the actual and the calculated value.

The difference between the calculated output value

\hat{y}

and the actual value y is the error function of the model L, determined in Equation (1). This function is minimized during the training process.

Minimize L = \sum {(y - \hat{y})}^{2}

(1)

The objective of the ANN model design is to obtain the minimum error. The hyperparameters that the designer can change are the architecture, the optimizer, and the activation function. The architecture includes the number of layers and the number of neurons on each layer. The optimizer is the method to minimize the error function. The activation function introduces nonlinearities in the model.

2.1. Optimizers

Each iteration updates the weights to reach a point as close as possible to the error function’s minimum. This optimization problem is multidimensional (one dimension per weight); hence, the best direction of change for the weights is given by the function’s gradient to be optimized, i.e., the error function. This research has compared four optimizers: SGD, SGD with Momentum (SGDmom), RMSprop, Adam, whose implementations are shown from Equation (2) to Equation (5).

α

is the learning rate,

η

is the momentum, and v is the velocity of change.

ρ

,

ϵ

, m,

β_{1}

, and

β_{2}

are factors according to each optimizer.

SGD

${\bar{w}}_{j} \leftarrow {\bar{w}}_{j} - α \nabla_{w} L$

(2)
SGD with Momentum (SGDmom) [13]

$\begin{matrix} v_{t + 1} & = η \cdot v_{t} + \nabla_{w} L \\ w_{t + 1} & = w_{t} - α \cdot v_{t + 1} \end{matrix}$

(3)
RMSprop [14]

$\begin{matrix} v_{t + 1} & = ρ \cdot v_{t} + (1 - ρ) \cdot \nabla_{w} L^{2} \\ Δ w_{t + 1} & = η \cdot Δ w_{t} + \frac{α}{\sqrt{v_{t + 1} + ε}} \cdot \nabla_{w} L \\ w_{t + 1} & = w_{t} - Δ w_{t + 1} \end{matrix}$

(4)
Adam [15]

$\begin{matrix} m_{t + 1} & = β_{1} \cdot m_{t} + (1 - β_{1}) \cdot \nabla_{w} L \\ v_{t + 1} & = β_{2} \cdot v_{t} + (1 - β_{2}) \cdot \nabla_{w} L^{2} \\ b_{t + 1} & = \frac{\sqrt{1 - β_{2}^{t + 1}}}{1 - β_{1}^{t + 1}} \\ Δ w_{t + 1} & = α \cdot \frac{m_{t + 1}}{\sqrt{v_{t + 1}} + ϵ} \cdot b_{t + 1} \\ w_{t + 1} & = w_{t} - Δ w_{t + 1} \end{matrix}$

(5)

2.2. Activation Functions

The activation functions that are part of the analysis in the present research are

ReLU

$f (x) = max [0, x]$

(6)
Leaky ReLU [16,17]

$\begin{matrix} f (x) & = \{\begin{matrix} x & if x \geq 0 \\ α x & if x < 0 \end{matrix} \end{matrix}$

(7)
PReLU [18]: The factor a is determined adaptively during training [19].

$\begin{matrix} f (x) & = \{\begin{matrix} x & if x \geq 0 \\ a x & if x < 0 \end{matrix} \end{matrix}$

(8)
ELU

\begin{matrix} f (x) & = \{\begin{matrix} x & if x \geq 0 \\ α (e^{x} - 1) & if x < 0 \end{matrix} \end{matrix}

(9)

3. Description of the Case

3.1. Power Transformer Characteristics

The transformer under analysis is a two-winding, three-phase power transformer. Table 1 presents its characteristics, Table 2 indicates the dimensions, and Figure 1a shows its internal structure. The windings are of a disk type. The transformer core material is US Steel Type 2-S, 0.018 inch thickness, whose saturation curve is represented in Figure 1b. This material is part of the FEMM software library [20].

3.2. Currents in Windings

The currents considered to calculate the conductors’ stress are those of three-phase faults seen by the transformer. The range goes from fault currents on the transformer terminals to fault currents with values lower than the rated current.

The peak values are taken from the transient currents. For simplicity, the origin of the fault is located when one phase current (A, B, or C) crosses through zero. In Equation (10), the general form to calculate the currents in the three phases of the low-voltage windings is shown [21]. The current previous to the fault has not been considered when generating the training samples; however, in the final results, it will be seen that the model is accurate even for preloaded cases. Table 3 shows the correspondence between the phase currents and those of Equation (10), according to the instant of the fault.

\begin{matrix} i_{1} (t) & = I_{L V} \cdot [sin (ω t - ϕ) + sin (ϕ) \cdot exp (- λ t)] \\ i_{2} (t) & = I_{L V} \cdot [sin (ω t - θ - ϕ) + sin (θ + ϕ) \cdot exp (- λ t)] \\ i_{3} (t) & = I_{L V} \cdot [sin (ω t + θ - ϕ) + sin (- θ + ϕ) \cdot exp (- λ t)] \end{matrix}

(10)

I_{L V}

is the maximum fault current when it has reached the steady-state. It depends on the impedance Z seen by the transformer during the fault—i.e., on the resistance R and the inductive reactance

X_{L}

—as shown in Equation (11), where

V_{L V}

is the peak line-to-neutral voltage of the low-voltage winding (

112.7

kV for this study).

\begin{matrix} I_{L V} & = V_{L V} / Z \\ Z & = \sqrt{R^{2} + X_{L}^{2}} \end{matrix}

(11)

where

ω

is the angular frequency (

2 \cdot π \cdot 60

for a 60 Hz power frequency).

θ

is the angle to control the fault’s time; one phase will see the fault at zero seconds or zero radians, whereas the other two phases will see the fault at

\pm 120^{\circ}

, depending on the phase.

ϕ

is the angle of the impedance Z, i.e.,

cos ϕ = R / Z

.

λ

is the time constant of the electromagnetic transient, i.e.,

λ = R / L

, where L is the inductance of the inductive reactance

X_{L}

.

The transformer’s impedance is the lowest limit, which, in the generation of input data, represents the current at the transformer’s terminals. The leakage reactance is

33.61 %

, which is 16

Ω

in the transformer base. The highest limit has been set until a reactance of 80

Ω

, as Table 4 shows.

3.3. Electromagnetic Forces

Electromagnetic forces act on the conductors of a winding when the current interacts with the surrounding magnetic field [22]. In mathematical form, this is expressed in Equation (12); f is the force per volume, J is the current density, and B is the magnetic induction [23].

f = J \times B

(12)

Therefore, the main task is to find the magnetic induction B since J is part of the input data. From Electromagnetic Theory, B can be obtained by first calculating the magnetic vector potential A in Equation (13) (Laplace equation for magnetics) and then B in Equation (14) [24]. However, this process requires the treatment of partial differential equations (PDE), which usually do not have analytical solutions unless the problem has a simple geometry [25]. That is not the case for power transformers, which, on the contrary, have a complex geometry [26].

\nabla^{2} A = - μ J

(13)

\nabla \times B = A

(14)

FEM is a numerical method used to solve partial differential equations that have the form of Poisson or Laplace equations. Hence, the method can be used to find the magnetic field B in power transformers. FEM discretizes the medium where the field must be determined.

Like any partial differential equation, Equation (13) also needs a boundary condition to have a unique solution. This condition is considered inside FEM and must be set during the simulation. FEM simulations for power transformers use the Dirichlet boundary condition, which sets a null value for A in some predetermined border. The ideal case would be to have a null vector magnetic potential at infinity because it works as a reference for the rest of the system. As it is not physically possible to have this condition, an appropriate border is chosen by following the convergence criteria for computational methods [27], which recommends performing simulations until no high variation is seen in the values of the potential inside the power transformer. Moreover, the boundary conditions have been located symmetrically around the equipment to have a similar influence in the internal fields.

Figure 2 presents the discretization of the 400 MVA power transformer under analysis, generated by the software FEMM. Note that around the disks, the discretization is denser. This characteristic of the triangulation process allows finding a more detailed field where it is suspected that the magnetic induction will have a higher change.

Figure 3 shows the magnetic induction when the transformer is working under rated conditions. The current on Phase A is at its peak. The magnetic induction is the highest in the first limb, which belongs to Phase A. The same behavior is seen in the flux lines. Note that some lines cross the disks of the transformer. They represent the magnetic flux that, together with the current density, will produce a force in the disks.

For this same case, Figure 4 shows the radial and axial forces on the low-voltage winding. Disk 1 is the highest of the winding, and Disk 100 is the lowest. The radial forces are all positive, which means they are directed towards the core. The maximum radial force is located around the middle of the winding. The axial forces are negative in the highest disk and positive in the lowest disk. This characteristic means that they are directed towards the middle of the winding.

In the high-voltage winding (see Figure 5), the radial forces are negative. They are directed away from the core. The axial forces have similar behavior to the low-voltage radial forces, which means that they point towards the middle of the winding.

3.4. Winding Conductors Stress

The stresses in the winding conductors are the outputs of the ANN. For the sake of simplicity, the ANN design considers only the middle winding conductors, which have the highest value of stress and force. Thus, for the low-voltage winding, the stress is stored from disk 48 to disk 51; for the high-voltage winding, it is held from disk 49 to disk 51.

In addition, to reduce the simulation time, each disk of the winding is taken as one conductor. This simplification in the model is possible because the stress is defined as a per area magnitude, which does not have a high difference among the disk conductors when there is no considerable variation in the force per volume. The procedure developed in this research has taken the middle conductor of the disk to analyze the average value of stress.

The model considers each disk as a ring. Only the radial force is counted towards calculating the stress because it is the only one that can have a value without causing displacement of the ring. The axial force is nullified in the winding structure, either by another disk or by the winding supports; if that were not the case, the disk would have vertical movements [28].

Figure 6 shows a ring subject to a radial force. The wire radius is much lower than the radius R. The radial force

f_{r}

causes P, which is perpendicular to the wire section. Due to the force P, there is a stress

σ

acting on the wire. In that way, the stress

σ

is calculated according to Equation (15) [29]. The force

f_{r}

is the radial component of the force per unit length.

\begin{matrix} P & = f_{r} \cdot R \\ σ & = \frac{P}{S c} \end{matrix}

(15)

The following example uses a case with

R = 1 Ω

and

X_{L} = 15 Ω

to explain the approach to find the conductors’ stress. Table 5 presents the values used in Equation (10) to calculate the current as a function of time. Figure 7a shows the three phases’ current shape for a fault starting when the Phase A current is zero.

The current and the transformer internal geometry are the input data for the FEM simulations. The output of the FEM simulations is the magnetic field. Then, the stress is calculated as described before. Figure 7b shows the stress for the example.

4. ANN Design

The input data of the ANN are the currents circulating through the windings and the output data are the stresses on the conductors, as shown in Figure 8.

The design of the ANN follows three steps:

Tune the architecture of the model;
Tune the optimizer;
Tune the activation functions.

SGD models are developed to obtain a set of possible architectures. The learning rate used during the process is

0.1

because a value less than that would take too much time and too many epochs to converge. The following architectures had the best performance and are part of the next design steps.

One hidden layer:

126	251	501	1000
1995	3981	7943	10,000

Two hidden layers:

$[16 1000]$	$[32 501]$	$[32 1000]$	$[63 126]$
$[63 251]$	$[63 501]$	$[63 1000]$	$[126 32]$
$[126 63]$	$[126 126]$	$[126 251]$	$[126 501]$
$[126 1000]$	$[251 32]$	$[251 63]$	$[251 126]$
$[251 251]$	$[251 501]$	$[251 1000]$	$[501 16]$
$[501 32]$	$[501 63]$	$[501 126]$	$[501 251]$
$[501 501]$	$[501 1000]$	$[1000 16]$	$[1000 32]$
$[1000 63]$	$[1000 126]$	$[1000 251]$	$[1000 501]$
$[1000 1000]$

4.1. Tuning the Momentum and Epsilon

Three models from SGD are used to find the appropriate momentum. They represent the first, second, and third quartile: 3981,

[501 126]

, and

[63 126]

. The usual behavior is to have a lower error as long as the momentum rises. The momentum with the lowest error is 0.9, as Table 6a shows in the mean value for the three cases tested. Therefore, a momentum of 0.9 is used when testing the optimizers.

For tuning epsilon in RMSprop, the first, second, and third quartiles SGDmom are taken, i.e.,

[501 251]

, 1000,

[126 251]

. An epsilon of

10^{- 4}

yields the lowest error for the three cases. This result is verified in the mean values of Table 6b.

The same quartiles of SGDmom are used to tune epsilon in Adam. The lowest error appears for the epsilon

10^{- 4}

. Table 6c confirms this result.

4.2. Comparison of Optimizers

Table 7 summarizes the hyperparameters used in the process for each optimizer. RMSprop and Adam have a lower learning rate than SGD and SGDmom because they are faster to converge.

Figure 9 shows the training time for each optimizer. By far, SGD needs the longest time to converge, even with a higher learning rate. It is about forty times greater than RMSprop and Adam times.

In Figure 10a, the performance for the optimizers analyzed in this research is presented for a patience 10. RMSprop and Adam have the lowest error; hence, they are chosen for the analysis with patience 40, which is shown in Figure 10b. Both optimizers have similar errors, though Adam has a slightly lower value; therefore, it is chosen for the ongoing analysis.

4.3. Choosing the Activation Function

Similar to the procedure to get the optimizers’ hyperparameters, the first, second, and third quartiles obtained in the Adam process with patience 40 are used to analyze the activation functions. The related architectures are

[251 32]

,

[251 1000]

, and 126, respectively.

Table 8a shows the behavior of the error considering the variation of Alpha for Leaky ReLU. Alpha 0.1 presents the lowest error and is used in the detailed analysis. Table 8b shows the behavior of loss for alpha with the ELU activation function. The value of alpha 0.3 presents the lowest error.

Figure 11 presents the loss distribution for the four activation functions: ReLU, Leaky ReLU, ELU, and PReLU. The behavior is similar for all of them. ReLU and Leaky ReLU have the lowest losses. ReLU is the activation function chosen because of its natural simplicity.

To conclude, the best suited ANN has the following characteristics:

Optimizer—Adam;
Activation Function—ReLU.

4.4. Choosing the Final Architecture

Until this point, the design procedure has considered all the architectures described at the beginning of Section 4. Once the optimizer and the activation function have been settled, along with the related hyperparameters, the architectures included in the first quartile are used to continue with the design:

One hidden layer: 501, 1000, 1995, 3981, 7943, 10,000.

Two hidden layers:

[16 1000]

,

[63 1000]

,

[501 16]

,

[1000 251]

,

[1000 1000]

.

Figure 12 shows the error for each architecture. The best four, one-hidden-layer architectures have a better performance than any two-hidden-layer architectures. In addition, they have a similar value of error. The architecture with one hidden layer and 1995 neurons is chosen for the simplicity of the model. According to the training processes, the model does not require additional tuning since the error will not decrease significantly.

Figure 13 presents the process of training the final design with a patience 100. The error consistently reduces until the number of epochs is close to the thousands. Then, the error is almost constant. There is a noise, which means that the algorithm has reached a minimum and is skipping from one point to another.

The final design is validated with data that were not used for training nor testing during the design process. The error with the validation data was

7.8136 \cdot 10^{- 8}

. The validation data has 795 samples from different values of resistance and reactance combinations.

5. Comparison to the Finite Element Method

5.1. Results in Actual Measurement Units

The model is applied to the transformer where the input data are the current through the windings in amperes, and the output data are the stresses on the winding’s middle disks in Pascals. For the simulated cases, the stress is calculated at the points where the currents (hence, the forces and the stresses) have an extreme value. Figure 14 illustrates this point for one cycle. For Phase A, the stress is calculated in

π

and in

2 π

. For Phase B, the stress is calculated in

2 π / 3

and in

5 π / 3

. Finally, for Phase C, the stress is calculated in

π / 3

and in

4 π / 3

. The process is repeated for the rest of the analyzed time.

All of the cases simulate a fault that begins when Phase A is crossing through zero, and the power transformer has a prefault rated current.

The first case simulates a fault with resistance

R = 1 Ω

and a reactance

X_{L} = 80 Ω

. This impedance represents a fault located far away from the transformer’s terminals. Figure 15 shows the results for FEM and ANN simulations. The high

X_{L} / R

ratio extends the transient of the phenomenon. Thus, the cycles with high stress do not touch the cycles with low stress within the fifty cycle simulations. The results for FEM and ANN simulations are very close.

The second event is a fault with a resistance

R = 1 Ω

and a reactance

X_{L} = 47 Ω

(see Figure 16). The impedance is lower than in the first case, which causes higher stress on the conductors. The resistance/reactance ratio is also lower; hence, the transient ends faster and the difference between cycles of high stress and low stress is reduced at the end of the fifty cycles period. Some differences in the results of FEM and ANN are seen at the beginning of the phenomenon, mainly when the stress is high. Nonetheless, such difference is negligible.

Finally, a fault at the terminals of the transformer, with a resistance

R = 1 Ω

and

X_{L} = 15 Ω

, is analyzed (see Figure 17). The transient ends in about 20 cycles. The highest stress is present in Phase A, which agrees with the fault time when Phase A crosses zero. Most of the results are very close for FEM and ANN. A few cases where an appreciable difference exists are located at the beginning of the transient in Phase B and Phase C, for the highest stress values.

Table 9 presents the mean absolute percentage error for the foregoing cases. The model has the least error when the fault is at the terminals of the power transformer. As long as the fault is farther away, the error increases. In each case, the error is greater when the stress is lower. This behavior agrees with the training characteristics of ANNs that are more influenced by higher values than by lower ones.

5.2. Simulation Time

A fault with

R = 1 Ω

and

X_{L} = 15 Ω

was simulated with FEM and the ANN model to obtain the time of the simulation.

Figure 18a presents the time spent in FEM simulations. The relationship between the number of samples and the time of simulation is linear. Each sample takes about 143 s for its simulation. For this reason, the simulation of a phenomenon with 150 pulses of stress required about six hours.

On the other hand, ANN simulations require about 0.53 s for 10,000 samples, see Figure 18b. The reduction of time due to the direct application of the model is evident. This reduction was expected since the model contains only matrix operations and the use of the activation function. The computational system limits the time reduction; therefore, the minimum simulation time is constant (0.085 s) from the tens to the hundreds of samples.

The high number of samples that can be simulated in less than one second opens the possibility of using the model in real-time applications. Considering only the peak values of stress, there are six values per cycle (two per phase); for a power frequency of 60 Hz, that means a sample each

2.78 \cdot 10^{- 3}

s. In real-time, 10,000 samples would require

27.8

s. Consequently, the 0.53 s are enough for a real-time simulation of this kind. The limit would be a batch of 40 samples that has an average of about 470 samples per second to be simulated, which is higher than the 360 samples per second needed for stress analysis. This analysis was performed in a personal computer with an Intel i5, 64 bit, 2.50 GHz processor and 4.00 GB of RAM.

6. Conclusions

This research shows the possibility of modeling the stresses in winding conductors of a 400 MVA, three-phase power transformer using artificial neural networks. For this purpose, an analysis of the combination of hyperparameters that best reduces the error between the proposed model and the finite element analysis results was performed.

For the conductor stress problem, the single-layer models performed better than the two-layer ones. Additionally, more complex models took longer to minimize the error function, which is a disadvantage during the design process. In other words, it is not always true that a more complex model is better.

As the optimizers were more elaborated or had more hyperparameters, the algorithm found the minimum of the loss function faster. RMSprop and Adam had a better convergence speed than SGD or SGDmom. As a consequence, lower learning rates could be used, and the model achieved lower losses.

As for this research’s problem, the activation functions did not have too much influence on the model’s accuracy. ReLU was chosen, in the end, because of its simplicity since it does not have any hyperparameter and its implementation is straightforward.

With groups of data greater than 40 samples, it is even possible to use the artificial neural network model in real-time applications when the stress is needed only in peak values, such as in fatigue analysis.

A possible future investigation could be the design of an artificial neural network model for stresses caused by inrush currents. The problem has particular characteristics that make it challenging for an artificial neural network model. Mainly, the nonlinear features of the core could be an issue in modeling the phenomenon overall because, in this case, the transformer might be working under saturated conditions.

Author Contributions

Conceptualization, F.V.; methodology, F.V.; software, F.V.; validation, H.A. and F.Q.; formal analysis, F.Q.; investigation, F.V.; resources, H.A.; data curation, F.V. and F.Q.; writing—original draft preparation, F.V.; writing—review and editing, H.A. and F.Q.; visualization, F.V.; supervision, H.A.; project administration, H.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

Behjat, V.; Shams, A.; Tamjidi, V. Characterization of power transformer electromagnetic forces affected by winding faults. J. Oper. Autom. Power Eng. 2018, 6, 40–49. [Google Scholar]
Moradnouri, A.; Vakilian, M.; Hekmati, A.; Fardmanesh, M. HTS transformer windings design using distributive ratios for minimization of short circuit forces. J. Supercond. Nov. Magn. 2019, 32, 151–158. [Google Scholar] [CrossRef]
Mikhak-Beyranvand, M.; Rezaeealam, B.; Faiz, J.; Rezaei-Zare, A. Impacts of ferroresonance and inrush current forces on transformer windings. IET Electr. Power Appl. 2019, 13, 914–921. [Google Scholar] [CrossRef]
Najafi, A.; Iskender, I. Electromagnetic force investigation on distribution transformer under unbalanced faults based on time stepping finite element methods. Int. J. Electr. Power Energy Syst. 2016, 76, 147–155. [Google Scholar] [CrossRef]
Ge, W.; Zhao, J.; Wang, Y. Analysis of the residual flux influence on inrush current and electromagnetic force in large power transformer. J. Eng. 2019, 2019, 2426–2429. [Google Scholar] [CrossRef]
Jin, J.M. The Finite Element Method in Electromagnetics; John Wiley & Sons: Hoboken, NJ, USA, 2015. [Google Scholar]
Peterson, A.F.; Ray, S.L.; Mittra, R.; Institute of Electrical and Electronics Engineers. Computational Methods for Electromagnetics; IEEE: New York, NY, USA, 1998; Volume 351. [Google Scholar]
IEC. IEC 60075 Power Transformers-Part 5: Ability to Withstand Short Circuit; International Electrotechnical Commission: Geneva, Switzerland, 2006. [Google Scholar]
Hong, K.; Lin, G. State classification of transformers using nonlinear dynamic analysis and Hidden Markov models. Measurement 2019, 147, 106851. [Google Scholar] [CrossRef]
Khan, A.; Ghorbanian, V.; Lowther, D. Deep learning for magnetic field estimation. IEEE Trans. Magn. 2019, 55, 1–4. [Google Scholar] [CrossRef]
Qi, S.; Wang, Y.; Li, Y.; Wu, X.; Ren, Q.; Ren, Y. 2D Electromagnetic Solver Based on Deep Learning Technique. IEEE J. Multiscale Multiphy. Comput. Tech. 2020, 5, 83–88. [Google Scholar] [CrossRef]
Diaz, G.I.; Fokoue-Nkoutche, A.; Nannicini, G.; Samulowitz, H. An effective algorithm for hyperparameter optimization of neural networks. IBM J. Res. Dev. 2017, 61, 9:1–9:11. [Google Scholar] [CrossRef] [Green Version]
Sutskever, I.; Martens, J.; Dahl, G.; Hinton, G. On the importance of initialization and momentum in deep learning. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 1139–1147. [Google Scholar]
Hinton, G.; Srivastava, N.; Swersky, K. Neural networks for machine learning lecture 6a overview of mini-batch gradient descent. Cited 2012, 14, 2. [Google Scholar]
Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
Laurent, T.; Brecht, J. The multilinear structure of ReLU networks. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 2908–2916. [Google Scholar]
Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier nonlinearities improve neural network acoustic models. Proc. Icml. Citeseer 2013, 30, 3. [Google Scholar]
He, K.; Zhang, X.; Ren, S.; Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1026–1034. [Google Scholar] [CrossRef] [Green Version]
Xu, B.; Wang, N.; Chen, T.; Li, M. Empirical evaluation of rectified activations in convolutional network. arXiv 2015, arXiv:1505.00853. [Google Scholar]
Meeker, D. Finite Element Method Magnetics Version 4.2: User’s Manual; IEEE: New York, NY, USA, 2010. [Google Scholar]
Greenwood, A. Electrical Transients in Power Systems; John Wiley and Sons Inc.: New York, NY, USA, 1991. [Google Scholar]
Bertagnolli, G. Short-Circuit Duty of POWER Transformers; ABB: Zürich, Switzerland, 2013. [Google Scholar]
Vanderlinde, J. Classical Electromagnetic Theory; Fundamental Theories of Physics; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
Stratton, J.A. Electromagnetic Theory; IEEE Press Series on Electromagnetic Wave Theory; John Wiley & Sons: Hoboken, NJ, USA, 2007; Volume 33. [Google Scholar]
Jackson, J. Classical Electrodynamics; Wiley: Hoboken, NJ, USA, 2012. [Google Scholar]
Kulkarni, S.V.; Khaparde, S. Transformer Engineering: Design and Practice; CRC Press: Boca Raton, FL, USA, 2004; Volume 25. [Google Scholar]
Rylander, T.; Ingelström, P.; Bondeson, A. Computational Electromagnetics; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012. [Google Scholar]
Flügge, W. Stresses in Shells; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
Timoshenko, S. Strength of Materials Part 1; D. Van Nostrand Company Incorporated: New York, NY, USA, 1940. [Google Scholar]

Figure 1. Power transformer characteristics. (a) Internal structure; (b) Core saturation curve.

Figure 2. Discretization of 400 MVA power transformer. (a) Complete transformer; (b) Discretization around the disks.

Figure 3. Magnetic induction for rated current. (a) Magnetic induction; (b) Flux lines.

Figure 4. Forces on Phase A low-voltage winding. (a) Radial force; (b) Axial force.

Figure 5. Forces on Phase A high-voltage winding. (a) Radial force; (b) Axial force.

Figure 6. Ring subject to the action of a radial force f and a force P perpendicular to its section.

Figure 7. Current and stress for a three-phase fault with a system equivalent impedance R = 1 and X_L = 15. (a) Current on the three phases; (b) Stress on phase A.

Figure 8. Artificial neural network used to find the stress in the conductors of the windings as a function of the input currents.

Figure 9. Histograms for the convergence time of each optimizer.

Figure 10. Comparison of errors for different optimizers.

Figure 11. Comparison of activation functions.

Figure 12. Errors for Patience 100, Optimizer—Adam.

Figure 13. Behavior of the training and testing errors for the 1995 neurons, one-hidden-layer model.

Figure 14. Extreme values location of current for each phase in one cycle.

Figure 15. FEM and ANN simulations for a fault impedance of R = 1 Ω and X_L = 80 Ω.

Figure 16. FEM and ANN simulations for a fault impedance of R = 1 Ω and X_L = 47 Ω.

Figure 17. FEM and ANN simulations for a fault impedance of R = 1 Ω and X_L = 15 Ω.

Figure 18. Simulation time according to the number of samples.

Table 1. Power transformer under analysis.

Variable	Value	Unit
Power	400	MVA
High Voltage	230	kV
Low Voltage	138	kV
Frequency	60	Hz
Group of Connection	Yy0
Impedance (own base)	33.61	%
Number of Low-Voltage Disks	100	u
Number of High Voltage Disks	105	u

Table 2. Transformer internal dimensions.

Item	Length (m)
Core Diameter	0.9582
Core window height	3.034
Limb–limb separation	1.853
Low-voltage winding inside diameter	1.108
Low-voltage winding outside diameter	1.309
High-voltage winding inside diameter	1.429
High-voltage winding outside diameter	1.625
Low-voltage disk height	0.0175
High-voltage disk height	0.0136
Spacer block (located between disks)	0.010

Table 3. Correspondence between the phase currents and Equation (10).

Phase Crossing the Zero Current	$i_{1}$	$i_{2}$	$i_{3}$
Phase A	$i_{a}$	$i_{b}$	$i_{c}$
Phase B	$i_{b}$	$i_{c}$	$i_{a}$
Phase C	$i_{c}$	$i_{a}$	$i_{b}$

Table 4. Range of impedances used in the calculation of the phase currents.

Location of Fault	Resistance	Reactance
Transformer terminals	1 $Ω$	15 $Ω$
Far away from transformer	5 $Ω$	80 $Ω$

Table 5. Variables for the

R = 1 Ω

and

X_{L} = 15 Ω

case.

Table 5. Variables for the

R = 1 Ω

and

X_{L} = 15 Ω

case.

Variable	Value	Unit
$α$	$25.13$	$s^{- 1}$
Z	$15.03$	$Ω$
$ϕ$	$1.504$	radians
V max	$112.67$	$k V$
I max	7495	A
$θ$	$2.094$	radians

Table 6. Mean value for the error according to the momentum and epsilon.

(a) Momentum for SGDmom
Momentum	Mean value
$0.1$	$6.3774 \cdot 10^{- 6}$
$0.3$	$6.6169 \cdot 10^{- 6}$
$0.5$	$6.2652 \cdot 10^{- 6}$
$0.7$	$5.8841 \cdot 10^{- 6}$
$0.9$	$4.8939 \cdot 10^{- 6}$
(b) Epsilon for RMSprop
Epsilon	Mean value
$10^{- 5}$	$3.52 \cdot 10^{- 6}$
$10^{- 4}$	$2.90 \cdot 10^{- 6}$
$10^{- 3}$	$4.15 \cdot 10^{- 6}$
$10^{- 2}$	$5.01 \cdot 10^{- 6}$
$10^{- 1}$	$5.52 \cdot 10^{- 6}$
$1.0$	$7.03 \cdot 10^{- 6}$
(c) Epsilon for ADAM
Epsilon	Mean value
$10^{- 7}$	$6.60 \cdot 10^{- 6}$
$10^{- 6}$	$5.91 \cdot 10^{- 6}$
$10^{- 5}$	$4.14 \cdot 10^{- 6}$
$10^{- 4}$	$3.11 \cdot 10^{- 6}$
$10^{- 3}$	$3.80 \cdot 10^{- 6}$

Table 7. Range of each optimizer hyperparameter.

Optimizer	Hyperparameter	Value
SGD	$α$	$0.1$
SGDmom	$α$	0.1
SGDmom	$η$	$0.9$
RMSprop	$α$	0.01
	$ρ$	0.9
	$η$	0.9
	$ϵ$	$10^{- 4}$
Adam	$α$	0.001
	$β_{1}$	0.9
	$β_{2}$	0.999
	$ϵ$	$10^{- 4}$

Table 8. Mean value for the error according to Alpha.

a. Leaky ReLU with Adam
Alpha	Mean value
$0.1$	$2.80 \cdot 10^{- 6}$
$0.2$	$3.21 \cdot 10^{- 6}$
$0.3$	$4.02 \cdot 10^{- 6}$
$0.4$	$7.78 \cdot 10^{- 6}$
$0.5$	$2.25 \cdot 10^{- 5}$
b. ELU with Adam
Alpha	Mean value
$0.1$	$5.12 \cdot 10^{- 6}$
$0.3$	$3.42 \cdot 10^{- 6}$
$0.5$	$7.05 \cdot 10^{- 6}$
$0.7$	$8.20 \cdot 10^{- 6}$
$0.9$	$1.83 \cdot 10^{- 5}$
$1.0$	$1.29 \cdot 10^{- 5}$

Table 9. Mean absolute percentage error for the ANN model.

a. $R = 1$ and $X_{L} = 15$
Phase	Winding	MAPE(%)
A	Low Voltage	0.33
A	High Voltage	0.26
B	Low Voltage	0.92
B	High Voltage	0.74
C	Low Voltage	0.66
C	High Voltage	0.65
b. $R = 1$ and $X_{L} = 47$
Phase	Winding	MAPE(%)
A	Low Voltage	5.21
A	High Voltage	5.30
B	Low Voltage	2.76
B	High Voltage	1.81
C	Low Voltage	1.47
C	High Voltage	1.51
c. $R = 1$ and $X_{L} = 80$
Phase	Winding	MAPE(%)
A	Low Voltage	15.10
A	High Voltage	15.47
B	Low Voltage	18.06
B	High Voltage	12.90
C	Low Voltage	13.40
C	High Voltage	14.85

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Valencia, F.; Arcos, H.; Quilumba, F. Prediction of Stress in Power Transformer Winding Conductors Using Artificial Neural Networks: Hyperparameter Analysis. Energies 2021, 14, 4242. https://doi.org/10.3390/en14144242

AMA Style

Valencia F, Arcos H, Quilumba F. Prediction of Stress in Power Transformer Winding Conductors Using Artificial Neural Networks: Hyperparameter Analysis. Energies. 2021; 14(14):4242. https://doi.org/10.3390/en14144242

Chicago/Turabian Style

Valencia, Fausto, Hugo Arcos, and Franklin Quilumba. 2021. "Prediction of Stress in Power Transformer Winding Conductors Using Artificial Neural Networks: Hyperparameter Analysis" Energies 14, no. 14: 4242. https://doi.org/10.3390/en14144242

APA Style

Valencia, F., Arcos, H., & Quilumba, F. (2021). Prediction of Stress in Power Transformer Winding Conductors Using Artificial Neural Networks: Hyperparameter Analysis. Energies, 14(14), 4242. https://doi.org/10.3390/en14144242

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Prediction of Stress in Power Transformer Winding Conductors Using Artificial Neural Networks: Hyperparameter Analysis

Abstract

1. Introduction

2. Artificial Neural Networks Characteristics

2.1. Optimizers

2.2. Activation Functions

3. Description of the Case

3.1. Power Transformer Characteristics

3.2. Currents in Windings

3.3. Electromagnetic Forces

3.4. Winding Conductors Stress

4. ANN Design

4.1. Tuning the Momentum and Epsilon

4.2. Comparison of Optimizers

4.3. Choosing the Activation Function

4.4. Choosing the Final Architecture

5. Comparison to the Finite Element Method

5.1. Results in Actual Measurement Units

5.2. Simulation Time

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI