A Machine-Learning Approach to Identify the Influence of Temperature on FRA Measurements

Abstract: Frequency response analysis (FRA) is a powerful and widely used tool for condition assessment in power transformers. However, interpretation schemes are still challenging. Studies show that FRA data can be influenced by parameters other than winding deformation, including temperature. In this study, a machine-learning approach with temperature as an input attribute was used to objectively identify faults in FRA traces. To the best of the authors' knowledge, this has not been reported in the literature. A single-phase transformer model was specifically designed and fabricated for use as a test object for the study. The model is unique in that it allows the non-destructive interchange of healthy and distorted winding sections and, hence, reproducible and repeatable FRA measurements. FRA measurements taken at temperatures ranging from −40 °C to 40 °C were used first to describe the impact of temperature on FRA traces and then to test the ability of the machine-learning algorithms to discriminate between fault conditions and temperature variation. The results show that when temperature is not considered in the training dataset, the algorithm may misclassify healthy measurements, taken at different temperatures, as mechanical or electrical faults. However, once the influence of temperature was considered in the training set, the performance of the classifier as studied was restored. The results indicate the feasibility of using the proposed approach to prevent misclassification based on temperature changes.


Introduction
Power transformer monitoring is crucial to prevent unplanned service interruptions and maintain electric power system stability. Frequency response analysis (FRA) is a well-known method for condition monitoring in power transformers that can identify changes in a transformer's active parts. From early studies of the technique in the late 1970s [1] to the present, FRA has demonstrated an ability to detect mechanical and electrical faults in power transformers.
FRA is a non-intrusive monitoring and diagnostic technique that can be implemented without requiring transformer disassembly. As recommended by the principal FRA standards [2,3], a small sinusoidal voltage waveform is applied over a large frequency band (from a few Hz up to a couple of MHz) to one of the terminals of the transformer (input point), and the response is measured in terms of its amplitude (dB) and phase (degrees) at another available terminal (output point).
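As a minimal sketch of how one point of an FRA trace is expressed, the snippet below converts an input and an output voltage phasor into amplitude (dB) and phase (degrees). The function name `fra_point` and the phasor values are illustrative assumptions, not part of the measurement procedure described here.

```python
import cmath
import math


def fra_point(v_in: complex, v_out: complex) -> tuple[float, float]:
    """Return (amplitude in dB, phase in degrees) of the ratio V_out / V_in."""
    h = v_out / v_in
    amplitude_db = 20.0 * math.log10(abs(h))
    phase_deg = math.degrees(cmath.phase(h))
    return amplitude_db, phase_deg


# Illustrative phasors (not measured data): the output is ten times smaller
# than the input and leads it by 90 degrees.
amp_db, phase_deg = fra_point(1 + 0j, 0.1j)
print(f"{amp_db:.1f} dB, {phase_deg:.1f} deg")
```

Repeating this calculation at every swept frequency yields the amplitude and phase traces that are later compared with the reference.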
The current and reference FRA traces are compared to interpret the FRA measurements, identify changes in the transformer's active parts and relate these changes to faults. Ideally, reference measurements are taken just before energization, and subsequent FRA measurements can then show the evolution of the mechanical condition of the transformer over the years. When a reference trace is not available, comparisons between phases in [...]
• Investigation of FRA measurements in a laboratory winding model under a wide range of temperatures (−40 °C to 40 °C);
• Comparative analysis of machine learning algorithm performance with a large database of fault modes, considering the effects of temperature on automatic classification;
• Recommendations that will minimize the influence of temperature variation on automated FRA trace interpretation.
The research was not intended to evaluate the numerical index used to quantify deviations between traces or the performance of the classification algorithms. The CSD index and the SVM were selected because they have been widely used in previous studies concerning FRA interpretation [6,8,13,14]. Other numerical indices and/or classification algorithms should offer similar conclusions.

Materials and Methods
The study had three parts: (1) FRA measurements were performed on a laboratory winding model; (2) the numerical CSD index was calculated to quantify deviations between reference measurements, and frequencies and amplitudes of resonance and anti-resonance points were determined; and (3) an SVM algorithm was used to automatically classify the measurements.

Laboratory Setup
Measurements were taken on a laboratory transformer model specifically designed for FRA testing, the model having no specifications for power or voltage ratings. The model has a uniform conductor structure (same conductor throughout the windings and an equal number of turns per winding section), and solid, non-graded insulation. The model has two windings that are arranged concentrically. The outer winding (winding 1) has 16 separable sections, each with 28 turns, for a total of 448 turns. The outer diameter of winding 1 measures 317 mm, its inner diameter measures 300 mm, and it is 511.3 mm high. The inner winding (winding 2) consists of three fixed layers with 76 turns per layer, for a total of 228 turns. The outer diameter of winding 2 measures 277 mm, its inner diameter measures 259 mm, and it is 530 mm high. Figure 1 shows the laboratory winding model and its connection schematic. A commercial instrument was used to measure FRA traces in the laboratory winding model. For this study, the minimum number of data points per decade was 200, as specified in the IEC standard 60076-18 [2]. Measurements were taken from 1 kHz up to 1 MHz. Two databases of measurements were used for the study: a database with four fault modes and a database with measurements taken at a variety of temperatures.
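The sweep settings above can be illustrated with a short sketch that builds a logarithmically spaced frequency grid. It assumes exactly 200 points per decade and log spacing; the text only states 200 as the minimum, and the instrument's actual grid is not given. The helper name `log_sweep` is hypothetical.

```python
import math


def log_sweep(f_start: float, f_stop: float, points_per_decade: int) -> list[float]:
    """Return logarithmically spaced frequencies from f_start to f_stop (inclusive)."""
    decades = math.log10(f_stop / f_start)
    n_points = round(decades * points_per_decade) + 1
    return [f_start * 10 ** (i / points_per_decade) for i in range(n_points)]


# 1 kHz to 1 MHz spans three decades, so 200 points per decade gives 601 points.
freqs = log_sweep(1e3, 1e6, 200)
print(len(freqs), freqs[0], freqs[-1])
```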

Fault Database
The fault database was created by introducing four different faults in the laboratory winding model, as well as taking healthy state measurements to serve as reference measurements. The faults include one electrical fault, shorted turns (ST), and three mechanical deformations: axial displacement (AD), radial deformation (RD), and disc space variation (DSV). Figure 2 illustrates the faults and the healthy state of the laboratory winding model.
The AD fault was created by inserting spacers at the bottom of winding 1 to displace it relative to winding 2, resulting in a loss of magnetic coupling between the windings. The fault was incremented in six steps, AD 1 to AD 6. For the first step (AD 1), 6-mm spacers were inserted under winding 1. Spacers were then added in steps of 5.4 mm, to a maximum of 34.4 mm of displacement (AD 6). Figure 2b illustrates the winding model as winding 1 is displaced vertically upwards.
The RD fault was generated by replacing healthy sections of the winding with deformed sections. Figure 3 shows examples of both healthy and deformed sections used for the measurements. This fault was also incremented in six steps, RD 1 to RD 6. At RD 1, only one deformed section was introduced, to replace section 2 (top to bottom). The sections highlighted in Figure 2c were replaced one by one in each subsequent step with a deformed section until six sections were deformed (RD 6).
The DSV fault was created by adding spacers in three different positions between the sections of winding 1, as shown in Figure 2d. For DSV 1, a 6-mm spacer was inserted between Sections 2 and 3, and for DSV 2, a 5.4-mm spacer was inserted between these same sections, for a total displacement of 11.4 mm. Next, first a 6-mm spacer (DSV 3) and then a 5.4-mm spacer (DSV 4) were added between Sections 8 and 9, and finally, DSV 5 and DSV 6 were created by adding 6-mm and then 5.4-mm spacers between Sections 14 and 15. In incrementing the DSV fault, the new spacers were added as described without removing the spacers already added for the preceding steps.
The shorted turns (ST) fault was created by short-circuiting sections of winding 1. For ST 1, the turns of Section 2 were shorted, for a total of 28 shorted turns. For ST 2, the turns of Section 3 were also shorted, for a total of 56 shorted turns, and so forth, with ST 6 having Sections 2, 3, 8, 9, 14, and 15 shorted, for a total of 168 shorted turns. In incrementing the ST fault, the shorted turns were added without correcting those shorted for the preceding step. Figure 2e shows the locations of the shorted turns.
All measurements in the database were taken at 20 °C. A total of 343 FRA traces were used. Details of the FRA traces in the database are given in reference [13].

Temperature Database
The second database created in this study was a temperature database. The FRA measurements were taken with the laboratory winding model placed inside a climatic chamber that can simulate temperatures ranging from −40 °C to 30 °C. For testing at 40 °C, portable heaters were added inside the chamber. The chamber was first heated to 40 °C using the portable heaters and the temperature was then decreased in steps of 10 °C, down to −40 °C. This was done to prevent condensation from forming on the winding model. Once the room temperature was stable (not varying more than ±1 °C), the FRA trace was obtained. At least four measurements were taken at each temperature to ensure a sufficient database of measurements. A total of 42 measurements were included in the temperature database.

Numerical Index Calculation
The numerical CSD index was used to quantify deviations between the reference measurement at 20 °C and measurements for other temperatures and faults. CSD values range from zero (perfect match) to infinity, increasing as the deviations between traces increase. The index works well for frequency deviations, but its sensitivity is not as good for amplitude deviations [8,14]. In a comparison with other numerical indices, the CSD was deemed to offer good performance in evaluating deviations in FRA traces, given its monotonicity, linearity, and sensitivity [8,13].
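The following equation is used to calculate the CSD:

$$\mathrm{CSD} = \sqrt{\frac{\sum_{i=1}^{N}\left[\left(X(i)-\bar{X}\right)-\left(Y(i)-\bar{Y}\right)\right]^{2}}{N-1}} \qquad (1)$$

where X and Y are, respectively, the reference and investigated amplitude vectors of measured frequency responses; X(i) and Y(i) are the ith elements of these vectors; $\bar{X}$ and $\bar{Y}$ are the mean values of these vectors; and N is the number of data points in vectors X and Y at the frequency window under evaluation.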
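A minimal sketch of the index calculation is given below. It assumes the commonly used definition of the comparative standard deviation, in which each vector is mean-centered before the point-by-point difference is taken; the function name `csd` and the sample vectors are illustrative only.

```python
import math


def csd(x: list[float], y: list[float]) -> float:
    """Comparative standard deviation between reference x and measurement y.

    Assumed definition: sqrt(sum(((x_i - mean_x) - (y_i - mean_y))^2) / (N - 1)).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, with N >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    total = sum(((xi - mean_x) - (yi - mean_y)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(total / (n - 1))


# Identical traces give zero; so does a constant amplitude offset, because
# both vectors are mean-centered before they are compared.
print(csd([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(csd([1.0, 2.0, 3.0], [11.0, 12.0, 13.0]))
```

Under this definition, a uniform amplitude offset leaves the index at zero, while a change in the shape of the trace increases it.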
It is important to note that the frequency range of the index calculation has a significant impact on the calculated value. Many different methods are thus used to select the frequency band for the index calculation. One of the simplest approaches is to evaluate the entire frequency spectrum, as described in [15,16]. However, if the frequency range is too wide, deviations between traces might be suppressed or may overlap, resulting in a lack of sensitivity in the numerical index evaluation. The frequencies may then be divided into sub-bands, as explained in [2,17]. To overcome the problem of frequency band division, this study used a sweep frequency window approach, a method based on the study described in [18]. A frequency window (WS) is determined from the number of data points per decade ($f_{p/d}$) in the FRA traces, using Equation (2). Then, the frequency window is swept over the complete frequency range (1 kHz to 1 MHz) in steps of WS/4 to obtain a vector of CSD values.
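The sweep can be sketched as follows. Because Equation (2) is not reproduced here, the window size `ws` is simply passed in as a number of data points, which is an assumption of this sketch; the `csd` helper uses the common comparative standard deviation definition, and the traces are synthetic.

```python
import math


def csd(x, y):
    """Comparative standard deviation (assumed mean-centered definition)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    total = sum(((xi - mean_x) - (yi - mean_y)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(total / (n - 1))


def sweep_csd(reference, measured, ws):
    """Slide a window of ws points in steps of ws/4 and return one CSD per window."""
    step = max(1, ws // 4)
    last_start = len(reference) - ws
    return [
        csd(reference[s:s + ws], measured[s:s + ws])
        for s in range(0, last_start + 1, step)
    ]


# Synthetic traces: 601 points, with a local deviation around points 300 to 349.
ref = [float(i % 7) for i in range(601)]
meas = list(ref)
for i in range(300, 350):
    meas[i] += float(i % 2)

values = sweep_csd(ref, meas, ws=100)
print(len(values), max(values))
```

Windows that do not overlap the altered region return zero, so the resulting vector localizes the deviation along the frequency axis.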

Support Vector Machine Learning
A support vector machine (SVM) is a supervised learning model with associated learning algorithms. SVMs were first developed for solving binary classification problems. They can, however, be adapted to multiclass problems with the help of one-versus-one or one-versus-all heuristic methods, which split a multiclass problem into a set of binary classification problems. The SVM algorithm allows the classification of linearly separable patterns ($x_i$) from two classes, $C_1$ and $C_2$. The discrimination between classes is achieved by positioning a hyperplane as a decision boundary. SVMs choose the maximum-margin linear separator centered between the hyperplanes $h_1$ and $h_2$, described in Equations (3) and (4):

$$h_1:\; w \cdot x_i + b = +1 \qquad (3)$$

$$h_2:\; w \cdot x_i + b = -1 \qquad (4)$$

where w is the weight vector and b is the bias or threshold. The support vectors, which give the name to the method, are all the points lying on $h_1$ or $h_2$. The main task of SVM algorithms is to find the optimal weights and biases that minimize the cost function [19].
However, real-world data are frequently not linearly separable, so the SVM uses the kernel trick to transform the input space into a higher-dimensional space where the data are linearly separable. This transformation is made possible by the use of kernel functions [20]. Many different functions can be used as kernel functions in SVMs, some of the most common being linear, polynomial and Gaussian. For the database used in this research, the polynomial kernel function was found to perform well and was used for classification. A polynomial function of order p was used, as defined in Equation (5).
A 10-fold cross-validation method can be used to train and test SVM algorithms to prevent overfitting in the data used for model validation. In this method, the data set is divided into 10 parts. One part is then left out of the training and is used instead as the test set, and the classification is performed 10 times, with a different part used each time as the test set. The average error over the repeated classifications is then returned as the classification error.
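The cross-validation procedure described above can be sketched in a few lines. This is a simplified illustration and not the Weka implementation: the folds are formed by a plain shuffle with no stratification by class, and `train_and_score` is a hypothetical callback that would train a classifier on the training indices and return its error on the test indices.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, train_and_score, k=10, seed=0):
    """Average the error returned by train_and_score over the k held-out folds."""
    folds = k_fold_indices(n_samples, k, seed)
    errors = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        errors.append(train_and_score(train_idx, test_idx))
    return sum(errors) / k


# 343 traces split into 10 folds of 34 or 35 traces each.
folds = k_fold_indices(343)
print([len(fold) for fold in folds])
```

Each trace is used for testing exactly once and for training in the other nine repetitions.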
To optimize the study results, grid search optimization [21,22] was used to determine the best SVM parameters and hence improve algorithm accuracy. The polynomial kernel of order p = 2 was found to be the best fit for the dataset classification, together with a one-versus-one heuristic method.
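To make the one-versus-one heuristic concrete, the sketch below enumerates the binary problems for the five classes used in this study and combines pairwise decisions by majority vote. The kernel form `(u·v + 1)^p` is one common convention and is an assumption here, and the pairwise classifiers are hypothetical stand-ins rather than trained SVMs.

```python
from collections import Counter
from itertools import combinations

CLASSES = ["no-fault", "AD", "RD", "DSV", "ST"]


def poly_kernel(u, v, p=2):
    """Polynomial kernel of order p; the (u.v + 1)^p form is an assumption."""
    return (sum(a * b for a, b in zip(u, v)) + 1.0) ** p


def one_vs_one_predict(sample, pairwise):
    """Return the class that wins the majority vote of all pairwise classifiers."""
    votes = Counter(classifier(sample) for classifier in pairwise.values())
    return votes.most_common(1)[0][0]


pairs = list(combinations(CLASSES, 2))  # 5 classes give 10 binary problems


def make_toy_classifier(a, b):
    # Hypothetical stand-in for a trained binary SVM: it returns the sample's
    # true label when that label belongs to the pair, and the first label otherwise.
    return lambda sample: sample["label"] if sample["label"] in (a, b) else a


pairwise = {pair: make_toy_classifier(*pair) for pair in pairs}
predicted = one_vs_one_predict({"label": "DSV"}, pairwise)
print(len(pairs), predicted)
```

With five classes, each class takes part in four of the ten binary problems, so a class that wins all of its own comparisons collects four votes.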
The described SVM algorithm was used to automatically classify the data and obtain an objective interpretation of trace deviation. The machine learning algorithm was implemented using Weka, an open-source software package developed at the University of Waikato in New Zealand [21].
Three classification scenarios were produced. For the first, the algorithm was trained and tested as described above, using the fault database. For the second, the same already trained algorithm was tested using the temperature database. For the third classification scenario, the SVM algorithm was trained and tested using a combined database that included faults and temperature measurements.
Three inputs were considered in the SVM classification. First, the CSD values calculated for the frequency windows were used. Then, the frequencies and amplitudes of resonance and anti-resonance points were also used as classification input. Lastly, the combination of these two inputs was used to produce the classification scenarios.
Frequencies and amplitudes of resonance and anti-resonance points were detected by a maxima and minima search of the frequency response traces. Five main resonances and anti-resonances for each measurement were identified using an automatic search. Figure 4 shows a flowchart of the methodology. It is important to note the colors of the arrows in the chart: each classification scenario is presented in a different color.
Figure 5 shows the FRA measurements taken at temperatures from −40 °C to 40 °C, in increments of 10 °C. As the figure shows, although deviations are more perceptible at the first anti-resonance and resonance points for the complete frequency range, even higher frequencies also present slight frequency shifts. Zooming in on the first anti-resonance frequency region allows better visualization of the deviations influenced by temperature changes. As the temperature increases, the resonance points shift to lower frequencies. The zoomed-in portion of Figure 5b also shows that resonance amplitudes are damped as the temperature increases.
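The resonance and anti-resonance points discussed above are located by a maxima and minima search; a minimal sketch of such a search is shown below. How the five "main" points were selected is not specified, so this sketch simply keeps the first `n_each` extrema in order of increasing frequency, which is an assumption. The trace values are made up for illustration.

```python
def find_extrema(freqs, amp_db, n_each=5):
    """Return (maxima, minima) as lists of (frequency, amplitude) pairs.

    A point is a local maximum (minimum) when it is strictly greater (smaller)
    than both of its neighbours; only the first n_each of each kind are kept.
    """
    maxima, minima = [], []
    for i in range(1, len(amp_db) - 1):
        if amp_db[i - 1] < amp_db[i] > amp_db[i + 1]:
            maxima.append((freqs[i], amp_db[i]))
        elif amp_db[i - 1] > amp_db[i] < amp_db[i + 1]:
            minima.append((freqs[i], amp_db[i]))
    return maxima[:n_each], minima[:n_each]


# Toy trace (not measured data) with two peaks and one valley.
toy_freqs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
toy_amp = [0.0, 1.0, 0.0, -1.0, 0.0, 2.0, 0.0]
maxima, minima = find_extrema(toy_freqs, toy_amp)
print(maxima, minima)
```

On measured traces, some smoothing or a minimum prominence threshold would likely be needed so that measurement noise is not reported as an extremum.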
The deviations in the FRA traces, such as those seen in Figure 5, arise from changes in transformer elements, as demonstrated in previous FRA studies [1,3,8]. Changes to winding inductances, series and shunt capacitances, resistances, and insulation conductance are the main causes of deviations. Temperature can influence FRA traces by modifying material parameters such as magnetic permeability, resistivity, and electrical permittivity [23–25]. Changes to geometry due to temperature changes (thermal expansion of conductors, for example) might also be present and affect self and mutual inductances, as well as capacitances between turns. However, in the temperature range considered in this study (−40 °C to 40 °C), copper dilation can be assumed to be negligible [26] and, hence, changes to geometry were not considered as possibly affecting the FRA traces.

Temperature Influence in Frequency Response
Coil inductances can be affected by changes in magnetic permeability due to temperature variation. However, the studies in [12] show only small inductance variations (less than 1.1%) under similar conditions for a temperature shift of 60 °C. In addition, there is no magnetic core in the tested model. Hence, the impact of inductance variation on the FRA traces due to temperature change can be considered insignificant.
The complex model for high-frequency studies of a transformer winding can be summarized as a series impedance, Z(ω), and a dielectric shunt admittance, Y(ω). These elements are presented as:

$$Z(\omega) = R + j\omega L \qquad (6)$$

$$Y(\omega) = G + j\omega C \qquad (7)$$

where ω is the angular frequency, R and L are the equivalent resistance and equivalent inductance of the conductors, respectively, and G and C are the equivalent conductance and equivalent capacitance of the insulation system, respectively. The model under study contains dielectric materials, such as pressboard, paper, and air. The response of these materials in the presence of alternating fields can be described by a frequency-dependent complex permittivity, presented in Equation (8):

$$\varepsilon(\omega) = \varepsilon'(\omega) - j\varepsilon''(\omega) \qquad (8)$$

where ε′ and ε″ are the real and imaginary parts of the dielectric permittivity, respectively. The admittance can then be rewritten, in terms of the dielectric response [9], as Equation (9):

$$Y(\omega) = j\omega C_0\left[\varepsilon'(\omega) - j\varepsilon''(\omega)\right] \qquad (9)$$

where $C_0$ is the geometric capacitance of the insulation system.
The dielectric response is a function of both frequency and temperature. The dependency of the permittivity on temperature can be described by its relation to the medium conductivity (σ), which is highly temperature (T)-dependent. The imaginary part of the dielectric response related to the conductivity is then given by:

$$\varepsilon''(\omega) = \frac{\sigma(T)}{\varepsilon_0\,\omega} \qquad (10)$$

and the temperature dependence of the conductivity can be described by the Arrhenius equation, as in Equation (11) [27]:

$$\sigma(T) = \sigma_0 \exp\left(-\frac{E_a}{k_B T}\right) \qquad (11)$$

where $\varepsilon_0$ is the vacuum permittivity, $\sigma_0$ is the pre-exponential factor, $E_a$ is the activation energy, and $k_B$ is the Boltzmann constant.
Finally, the increase in the complex permittivity of the insulation, due to the increase in temperature, has an impact on the conductance loss, and this loss can be identified by the damping effect present in the resonances in the FRA traces, as illustrated in Figure 5b. Furthermore, the shift of resonances to lower frequencies, as the temperature increases, can be associated with an increase in the capacitances of the model [27]. The capacitance changes can be calculated from the first resonance point (Figure 5b).
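The temperature dependence described above can be explored numerically with a short sketch. It assumes the Arrhenius form for the conductivity and the usual relation between conductivity and the imaginary part of the relative permittivity; the pre-exponential factor and activation energy below are hypothetical placeholders, not values from this study.

```python
import math

K_B_EV = 8.617333262e-5   # Boltzmann constant in eV/K
EPS_0 = 8.8541878128e-12  # vacuum permittivity in F/m


def conductivity(temp_c, sigma_0, e_a_ev):
    """Arrhenius conductivity at temp_c (degrees Celsius), in the units of sigma_0."""
    temp_k = temp_c + 273.15
    return sigma_0 * math.exp(-e_a_ev / (K_B_EV * temp_k))


def loss_factor(sigma, freq_hz):
    """Imaginary part of the relative permittivity caused by conduction."""
    return sigma / (EPS_0 * 2.0 * math.pi * freq_hz)


SIGMA_0 = 1e-3  # S/m, hypothetical pre-exponential factor
E_A = 0.5       # eV, hypothetical activation energy

for temp_c in (-40, 20, 40):
    sigma = conductivity(temp_c, SIGMA_0, E_A)
    print(temp_c, sigma, loss_factor(sigma, 1e3))
```

Whatever the actual material constants, the exponential term makes the conduction loss grow with temperature, which is consistent with the stronger damping of the resonances at higher temperatures.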
Local resonances and anti-resonances are characterized by the interaction between inductive and capacitive reactances [28]. Every resonance or anti-resonance considered independently can be interpreted through Equation (12):

$$f_{res} = \frac{1}{2\pi\sqrt{L_i C_i}} \qquad (12)$$

where $f_{res}$ is the resonance frequency, and $L_i$ and $C_i$ are the inductance and capacitance corresponding to the resonance point under consideration. It is well established that winding inductances are not significantly affected by temperature variation [9,12]; the main hypothesis is therefore that temperature primarily influences the resonance frequency points through moisture migration/dynamics and electrical permittivity changes. Based on this hypothesis, capacitance variation with temperature can be estimated using Equation (12), with the inductance value estimated from Equation (13). This equation is derived from the FRA transfer function in Equation (14) with a 50 Ω measurement impedance for the measuring instrument [4], where $V_{in}$ is the input voltage applied at the input point, $V_{out}$ is the output voltage measured at the response terminal, and $\phi$ is the phase difference between input and output voltages [3].
The inductance value is then calculated on the linearly descending part of the FRA trace leading to the first anti-resonance. In this region (around 4 kHz), the inductance values obtained at the different temperatures are very close to each other, demonstrating that the inductance is not significantly influenced by temperature. The average value is 28 mH. Capacitance values are calculated by rearranging Equation (12) to solve for $C_i$, using the first anti-resonance frequency at each temperature point. The results are shown in Figure 6, along with the polynomial fitted curve for the data points of the capacitance calculation.
As shown in Figure 6, distributed capacitance increases as the temperature increases. This is mainly due to the electrical permittivity change in the test environment. Since the model has air insulation, any change in the testing chamber temperature directly affects the temperature of the insulation, which in turn displaces the resonant frequencies.
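The inductance and capacitance estimates described above can be sketched as follows. The sketch assumes a simple voltage-divider model, V_out/V_in = R/(R + Z) with R = 50 Ω, to recover the series impedance from one FRA point; this model is an assumption and may differ from the exact expression used in the study. The capacitance then follows from rearranging the resonance relation f = 1/(2π√(LC)). All function names and numerical inputs other than the 28 mH inductance are illustrative.

```python
import cmath
import math

R_MEAS = 50.0  # measurement impedance in ohms


def series_impedance(amp_db, phase_deg, r_meas=R_MEAS):
    """Impedance Z seen between terminals, assuming V_out/V_in = R / (R + Z)."""
    h = 10 ** (amp_db / 20.0) * cmath.exp(1j * math.radians(phase_deg))
    return r_meas * (1.0 / h - 1.0)


def inductance(amp_db, phase_deg, freq_hz):
    """Equivalent inductance from the imaginary part of the series impedance."""
    return series_impedance(amp_db, phase_deg).imag / (2.0 * math.pi * freq_hz)


def capacitance(f_res_hz, l_henry):
    """Capacitance from f_res = 1 / (2*pi*sqrt(L*C)), solved for C."""
    return 1.0 / ((2.0 * math.pi * f_res_hz) ** 2 * l_henry)


# Round-trip check with an ideal 28 mH inductor at 4 kHz (synthetic, not measured).
L_TRUE, F_TEST = 28e-3, 4e3
h_ideal = R_MEAS / (R_MEAS + 1j * 2.0 * math.pi * F_TEST * L_TRUE)
amp_db = 20.0 * math.log10(abs(h_ideal))
phase_deg = math.degrees(cmath.phase(h_ideal))
l_est = inductance(amp_db, phase_deg, F_TEST)

# Hypothetical resonance frequency of 20 kHz combined with the 28 mH inductance.
c_est = capacitance(20e3, L_TRUE)
print(l_est, c_est)
```

Because the inductance is held fixed, a lower resonance frequency maps directly onto a larger estimated capacitance, which is the trend reported in Figure 6.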

Numerical Index Results
The CSD index was used to quantify deviations according to temperature change. The index was calculated over the complete frequency range, from 1 kHz to 1 MHz, in frequency windows calculated from Equation (2). Figure 7 illustrates CSD values for the different temperatures. To avoid overloading the figure, only the two extreme temperatures (40 °C and −40 °C), plus the curve at reference temperature (20 °C), are included in the figure. The CSD index indicated higher values around the first anti-resonance and resonance points, and lower but significant values at higher frequencies (above 200 kHz), as can be seen in Figure 7.
CSD values were similarly calculated for the different fault modes (axial displacement, radial deformation, disc space variation, and shorted turns) for further comparisons and classification algorithm implementation. Figure 8 provides a sample (only one step of each fault) of the results and the calculated CSD vectors.
As Figure 8 shows, the different faults affected the frequency response at different frequency ranges. A comparison of Figures 7 and 8 shows that the shorted turns fault had an impact similar to that of temperature variation on the first anti-resonance, with smaller CSD values. Temperature variation caused significant deviations at higher frequencies (above 250 kHz), as was the case with the different fault modes. As this comparison indicates, an automatic algorithm might have difficulty distinguishing simple temperature variation in FRA measurements from a fault mode.

Classification Algorithm Results and Discussions
The fault database was used for classification scenario 1. A CSD vector was calculated and used as input for the classification algorithm. The resonance and anti-resonance points of the traces were also considered. The algorithm analyzed each of the 343 instances and classified them into 5 classes: no-fault, axial displacement (AD), radial deformation (RD), disc space variation (DSV), or shorted turns (ST). Figure 9 shows the confusion matrices obtained for these classifications.
The general performance of this classification was 93% when only CSD values were used as input. The performance increased to 99.7% when resonance and anti-resonances were considered as input. In all the classification scenarios, 10-fold cross-validation was used to train the algorithm. The confusion matrix shows the percentage of instances classified in each class, along with the total number of instances corresponding to this percentage.
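As a rough illustration of how such a confusion matrix and the general performance figure can be derived from classifier output, consider the sketch below. It assumes that percentages are normalized per true class (per row) and that "general performance" means overall accuracy; neither is stated explicitly above. The labels are toy values, not the study's data, and the function names are hypothetical.

```python
CLASSES = ["no-fault", "AD", "RD", "DSV", "ST"]


def confusion_counts(true_labels, predicted_labels, classes=CLASSES):
    """Count instances per (true class, predicted class) pair."""
    index = {name: i for i, name in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for true, predicted in zip(true_labels, predicted_labels):
        counts[index[true]][index[predicted]] += 1
    return counts


def row_percentages(counts):
    """Convert each row of counts to percentages of that true class."""
    result = []
    for row in counts:
        total = sum(row)
        result.append([100.0 * c / total if total else 0.0 for c in row])
    return result


def overall_accuracy(counts):
    """Fraction of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in counts)
    correct = sum(counts[i][i] for i in range(len(counts)))
    return correct / total if total else 0.0


# Toy labels for illustration only.
true = ["no-fault", "no-fault", "AD", "AD", "RD", "DSV", "ST", "ST"]
pred = ["no-fault", "ST", "AD", "AD", "RD", "DSV", "ST", "AD"]

counts = confusion_counts(true, pred)
for name, row, pct in zip(CLASSES, counts, row_percentages(counts)):
    print(name, row, [round(p, 1) for p in pct])
print("overall accuracy:", overall_accuracy(counts))
```

Reporting the count next to each percentage, as the figures do, keeps small classes from being over-interpreted: a 50% error in a class with two instances is a single misclassified measurement.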

The temperature database was then used to test the classification algorithm (classification scenario 2), with the CSD vector and a combination of CSD vector values and resonance and anti-resonance points as input. Since classification with only the resonance points did not differ from the combination of inputs, this classification was omitted from this point on. The algorithm was expected to classify the data without having previously been trained for temperatures other than 20 °C. Figure 10 shows the confusion matrices obtained for this new test. The general performance of the SVM method dropped to 71% when the CSD vector was used, and the performance of the classifier using the resonance and anti-resonance points dropped further, to 40%. These confusion matrices were also divided into four additional matrix lines, according to the temperature of the measurements classified.
The confusion matrices shown in Figure 10 corroborate the hypothesis that the algorithm is not always capable of distinguishing faults from temperature variation. As the matrices divided by temperature show, significant problems occurred when the temperature dropped below −10 °C, that is, a shift of −30 °C from the reference temperature (20 °C). The measurements from −40 °C to −20 °C were misclassified as axial displacement, disc space variation, or shorted turns faults, depending on the input used for classification.
To overcome the misclassification problem, the SVM algorithm needs to be trained with FRA measurements at different temperatures, to learn as many different patterns as possible. In classification scenario 3, both the fault database and the temperature database were considered when training the SVM algorithm. For this classification scenario, 70% of the complete dataset was used to train the classification algorithm, with the remaining 30% left for testing and validation. The datasets were stratified to ensure that the ratio of temperature data to fault data in the complete dataset was maintained in both the training and testing sets.
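A stratified 70/30 split of the kind described above might look like the following sketch. It is an illustration under assumptions: records are stratified by the database they come from (a hypothetical `database` field), the function name `stratified_split` is made up for this example, and the shuffling seed is arbitrary. The actual tooling used for the study is not specified here.

```python
import random
from collections import defaultdict


def stratified_split(samples, key, train_fraction=0.7, seed=0):
    """Split samples so every stratum keeps about the same train/test ratio.

    `key` maps a sample to its stratum, for example the source database.
    """
    groups = defaultdict(list)
    for sample in samples:
        groups[key(sample)].append(sample)

    rng = random.Random(seed)
    train, test = [], []
    for members in groups.values():
        members = members[:]          # copy so the input order is not modified
        rng.shuffle(members)
        cut = round(train_fraction * len(members))
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


# Hypothetical records: 20 from a fault database, 10 from a temperature database.
records = ([{"database": "fault", "id": i} for i in range(20)]
           + [{"database": "temperature", "id": i} for i in range(20, 30)])

train, test = stratified_split(records, key=lambda r: r["database"])
print(len(train), len(test))
```

Splitting each stratum separately, rather than shuffling the pooled dataset, is what keeps the proportion of temperature measurements the same in the training and testing sets, even when one database is much smaller than the other.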
The SVM's general performance, using the combined databases (fault and temperature databases) and the CSD vector as input, was once again 93.9%. Its performance returned to 99.1% when resonances and anti-resonances were used in combination with the CSD vector. This indicates that once temperature is considered in the training dataset, the classification algorithm performs as well as when only faults are used in the classification. This was true for all inputs considered, confirming the importance of a large database of measurements that consider different temperatures in the training dataset.
In this study, measurements at different temperatures were possible because the laboratory winding model allowed a number of possibilities for FRA measurements. With real transformers, measuring a wide range of temperatures may not be feasible. One possible solution to this problem is to improve automated interpretation by using computational simulation environments to help generate a database of frequency responses that includes different fault and temperature conditions. Further research into this possibility should be considered.

Conclusions
This paper addresses the interpretation of FRA measurements at different temperatures using machine-learning applications. A laboratory winding model specially designed for FRA measurements was used as the testing equipment. The model allows the introduction of mechanical and electrical faults and, hence, frequency response under different conditions can be assessed. Tests were performed in a climatic chamber, allowing the temperature to vary from −40 °C to 40 °C. The influence of the temperature on an SVM algorithm classification was reported.
As already reported in the literature, temperature affected the measurements. Among other things, variations in capacitance values were noted, probably due to moisture dynamics related to changes in the insulation temperature. The results also showed that when temperature is not considered in the training set of the machine learning algorithm, the classification can be compromised. In fact, at least 30% of the tested measurements were misclassified on the first attempt, with the error of classification as high as 60%, depending on the input data for the classification algorithm. The misclassification occurred predominantly in a group with temperature shifts of more than 30 °C.
Temperature measurements need to be included in the training set to overcome the misclassification problem and restore SVM performance. The SVM classifications were performed using the following as classifier input: (a) CSD index values; (b) trace resonance and anti-resonance frequencies and amplitudes; and (c) a combination of (a) and (b). The CSD was calculated over a frequency window that swept the entire frequency range to obtain a vector of CSD index values.
Confusion matrices were used to assess the SVM performance. They show that the algorithm misclassifies measurements taken at different temperatures as axial displacement, disc space variation, or shorted turns faults, corroborating the need to include different measurement conditions in the training datasets of machine learning algorithms. Including measurements that account for other factors influencing FRA traces improves the database, and this improvement needs to be acknowledged. This is one of the contributions of this research.