# A Transformer Heavy Overload Spatiotemporal Distribution Prediction Ensemble under Imbalanced and Nonlinear Data Scenarios


## Abstract


## 1. Introduction

#### 1.1. Motivation and Background

#### 1.2. Problems

#### 1.3. Research and Contributions

- The proposed method removes the need to directly extract the potential relationships between condition components and transformer heavy overloads, which enables heavy overload prediction for distribution transformers in real-world application data scenarios.
- The TCCPR model incorporates DSPRts and SCSSC to account for the distribution of UHR factors across different time series and environmental features. This enables a comprehensive analysis of multi-source inputs when the data are imbalanced in both the spatial and temporal dimensions, which improves prediction performance, especially in imbalanced data scenarios.
- The CSD model directly measures the relative risk impact weight of each factor by analyzing the trend and amplitude of the change in overall system risk that results from its appearance. Compared with weights based on appearance frequency or data proportion, this gives a more direct weight assessment based on each factor's impact on fault results, which makes it more suitable for nonlinear data scenarios.

## 2. Models and Methods

#### 2.1. Establishment of Comprehensive Evaluation Feature Database

#### 2.2. Two-Fold Conditional Connection Pattern Recognition (TCCPR) Model

#### 2.2.1. Principle Description: Pattern Recognition (PR)

#### 2.2.2. The Establishment of Dynamic Self-Adaptive PR Thresholds (DSPRts)

- First, divide the year into four seasons, with one season serving as the baseline time series, and categorize the collected data into these four time series.
- Next, set the improved DSPRts according to the significance of each time series.
- Finally, screen the factors strictly: a factor is retained only if it meets or exceeds the thresholds of its time series; otherwise, it is excluded.
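
The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the month-to-season mapping, the record fields (`month`, `weather`, `overload`), the choice of support and confidence as the only two indexes, and all threshold values are hypothetical.

```python
from collections import defaultdict

# Hypothetical month-to-season mapping; the paper's seasonal split may differ.
SEASON_OF_MONTH = {12: "Winter", 1: "Winter", 2: "Winter",
                   3: "Spring", 4: "Spring", 5: "Spring",
                   6: "Summer", 7: "Summer", 8: "Summer",
                   9: "Autumn", 10: "Autumn", 11: "Autumn"}

def seasonal_stats(records, feature):
    """Per (season, factor): support and confidence of 'factor -> heavy overload'."""
    totals = defaultdict(int)      # records per season
    seen = defaultdict(int)        # records per (season, factor)
    overloaded = defaultdict(int)  # overloaded records per (season, factor)
    for rec in records:
        season = SEASON_OF_MONTH[rec["month"]]
        key = (season, rec[feature])
        totals[season] += 1
        seen[key] += 1
        overloaded[key] += rec["overload"]
    return {key: (overloaded[key] / totals[key[0]], overloaded[key] / seen[key])
            for key in seen}

def filter_factors(stats, thresholds):
    """Keep factors that meet or exceed both thresholds of their own season."""
    kept = []
    for (season, factor), (support, confidence) in stats.items():
        min_su, min_co = thresholds[season]
        if support >= min_su and confidence >= min_co:
            kept.append((season, factor))
    return sorted(kept)

# Toy records; values are invented for illustration only.
records = [
    {"month": 1, "weather": "Cloudy", "overload": 1},
    {"month": 1, "weather": "Cloudy", "overload": 1},
    {"month": 2, "weather": "Sunny", "overload": 0},
    {"month": 2, "weather": "Snowy", "overload": 1},
    {"month": 7, "weather": "Sunny", "overload": 1},
    {"month": 7, "weather": "Sunny", "overload": 0},
    {"month": 8, "weather": "Rainy", "overload": 0},
    {"month": 8, "weather": "Sunny", "overload": 0},
]
# Hypothetical (min support, min confidence) per season.
thresholds = {"Winter": (0.2, 0.6), "Spring": (0.3, 0.6),
              "Summer": (0.3, 0.6), "Autumn": (0.3, 0.6)}

kept = filter_factors(seasonal_stats(records, "weather"), thresholds)
print(kept)
```

In this toy data, the Winter factor `Snowy` has a support of 0.25, so it survives the lower Winter threshold but would be discarded under a single year-round support threshold of 0.3. That is the kind of rare but risky factor a per-season threshold is meant to retain.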

#### 2.2.3. The Establishment of Spatial Conditions Significant Scores Calculation (SCSSC)

#### 2.2.4. The Utilization of MFP-Growth

#### 2.3. Component Significance Diagnostic (CSD) Model

#### The Establishment of a CSD Model for Overall System Risk

#### 2.4. The Operation Procedure of the TCCPR-CSD Classifier Model

1. Data collection and integration: Based on the input features of the distribution transformer, pertinent data were collected and integrated with the risk values associated with various factors, covering both external and internal environmental features.
2. Establishment of Dynamic Self-adaptive PR thresholds in the temporal dimension: Based on the training data in the database, all factors included in a feature were comprehensively analyzed using four significant PR indexes. Exceptional datasets were identified by comparison against the DSPRts calculated by Formulas (19)–(21).
3. Establishment of SCSSC in the spatial dimension: The fault records containing any unusual factors in this feature were classified into the unusual dataset ${A}^{y1}$, and the UHR factors in these unusual datasets were mined by Formulas (24)–(27) to characterize their potential influence on distribution transformers.
4. Steps 1–3 were repeated for each environmental feature in the training dataset.
5. The results of the SCSSC were compared against the DSPRts to identify UHR factors in the unusual datasets.
6. Establishment of the risk impact weight measure for the CSD model: The relative risk impact weight ${\mu}_{{v}_{x,k}}$ of each feature factor was calculated by Formula (33), and the final predicted failure risk level was then calculated for each failure record.
7. Performance verification: Finally, the predicted failure risk level was normalized to the range 0 to 1 (0: impossible to occur; 1: certain to occur) and compared with the actual overload records in the test set (1 or 0: occurred or not occurred) to verify the performance of the predictive model in this study.
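
The final verification step can be illustrated with a short sketch. The paper does not state here which normalization is used, so min-max scaling, the 0.5 cutoff, and the raw risk values below are all assumptions made for illustration.

```python
def min_max_normalize(scores):
    """Rescale raw risk levels to [0, 1] (assumed min-max scaling)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def confusion_counts(risk, actual, cutoff=0.5):
    """Compare normalized risk with actual records (1 = overload occurred)."""
    tp = fp = tn = fn = 0
    for r, a in zip(risk, actual):
        predicted = 1 if r >= cutoff else 0
        if predicted == 1 and a == 1:
            tp += 1
        elif predicted == 1 and a == 0:
            fp += 1
        elif predicted == 0 and a == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

raw_risk = [2.0, 8.0, 5.0, 10.0, 4.0]  # invented raw risk levels
actual = [0, 1, 0, 1, 1]               # invented test-set overload records

risk = min_max_normalize(raw_risk)
counts = confusion_counts(risk, actual)
print(risk, counts)
```

A single cutoff yields one confusion matrix; sweeping the cutoff over the normalized risk levels yields the points of an ROC or P-R curve.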

## 3. Empirical Case Study

#### 3.1. Data Description

#### 3.2. Classification Performance Analysis

#### 3.3. Failure Cause Analysis

#### 3.4. Algorithms Analysis

## 4. Conclusions

- In imbalanced data distributions, some rarely occurring environmental condition factors may also be risky. The TCCPR model was therefore built to incorporate UHR factors in the spatiotemporal dimensions, together with the different temporal risks of each time series, when analyzing the feature factors that affect the occurrence of transformer heavy overload. On the one hand, the four Dynamic Self-adaptive PR thresholds were designed to account for imbalanced risk distributions in the temporal dimension. On the other hand, SCSSC was developed to calculate the conditional significance scores that identify UHR factors from the imbalanced distribution of data in the spatial dimension.
- In nonlinear data distributions, data proportion or appearance frequency cannot simply be taken as a measure of a factor's impact on overall system risk. The CSD model was therefore designed to evaluate the relative impact weight of each identified risky environmental factor directly, through the trend and magnitude of the variation in the overall system failure risk level that the factor causes. This accounts for the impact of factors with different characteristics on system risk when assessing the relative risk weight of each factor.
- In the empirical case study, the proposed TCCPR model effectively extracted UHR factors from the unusual components. In addition, the CSD model was more accurate and more reasonable than the traditional linear weight calculation method based on the frequency of occurrence of fixed factors. Combining the two, the integrated model accurately predicted heavy overloads in scenarios with multi-source and imbalanced data distributions under spatiotemporal conditions. The prediction outcomes can serve as a reference for allocating and scheduling operation and maintenance work. This helps to prevent equipment damage and environmental pollution caused by heavy overload and contributes to a more reliable and sustainable power supply.

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest


**Figure 4.** Comparison of classifier models via ROC curves: (**a**) revised and standard classifiers; (**b**) revised and other classifier models.

**Figure 5.** Comparison of classifier models via P-R curves: (**a**) revised and standard classifiers; (**b**) revised and other classifier models.

| Nature Factors | System Factors | Device Factors |
|---|---|---|
| Date | Load | Device age |
| Weather | Voltage | Rated capacity |
| Topographical | Current | Cooling efficiency |
| Flora and fauna | Phase | Short time capacity |

| PR Indexes | Minimum Thresholds |
|---|---|
| Support | $Min\_Su$ |
| Confidence | $Min\_Co$ |
| Kulczynski | $Min\_K$ |
| Imbalance Ratio | $Min\_ImRat$ |
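
For readers unfamiliar with the four PR indexes, the sketch below computes them for a single rule A → B using the standard textbook definitions from association rule mining. It is assumed, not confirmed by this section, that the paper's formulas follow these definitions; the function name and the toy records are hypothetical.

```python
def pr_indexes(transactions, antecedent, consequent):
    """Support, confidence, Kulczynski and imbalance ratio for the rule A -> B."""
    a, b = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)         # records containing A
    n_b = sum(1 for t in transactions if b <= t)         # records containing B
    n_ab = sum(1 for t in transactions if (a | b) <= t)  # records containing both
    if n_a == 0 or n_b == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    return {
        "support": n_ab / n,                                   # P(A and B)
        "confidence": n_ab / n_a,                              # P(B | A)
        "kulczynski": 0.5 * (n_ab / n_a + n_ab / n_b),         # mean of P(B|A), P(A|B)
        "imbalance_ratio": abs(n_a - n_b) / (n_a + n_b - n_ab),
    }

# Toy records, invented for illustration only.
transactions = [
    {"Cloudy", "Overload"},
    {"Cloudy", "Overload"},
    {"Cloudy"},
    {"Sunny", "Overload"},
    {"Sunny", "Overload"},
    {"Sunny"},
]
idx = pr_indexes(transactions, {"Cloudy"}, {"Overload"})
print(idx)
```

Kulczynski and the imbalance ratio are typically read together: the former averages the two conditional probabilities, while the latter indicates how unevenly A and B occur, which matters when one side of the rule is rare.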

| Prediction | Reality: True | Reality: False |
|---|---|---|
| Negative | FN | TN |
| Positive | TP | FP |

| Classifier Metrics | Formulas |
|---|---|
| TP Rate | $TP/(TP+FN)$ |
| FP Rate | $FP/(TN+FP)$ |
| Recall Rate | $TP/(FN+TP)$ |
| Precision Rate | $TP/(TP+FP)$ |
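
The four formulas in the table translate directly into code. The sketch below is only an illustration; the function name and the confusion counts are hypothetical, and non-zero denominators are assumed.

```python
def classifier_metrics(tp, fp, tn, fn):
    """Rates from the table above, computed from confusion-matrix counts."""
    return {
        "tp_rate": tp / (tp + fn),    # y-axis of the ROC curve
        "fp_rate": fp / (tn + fp),    # x-axis of the ROC curve
        "recall": tp / (fn + tp),     # x-axis of the P-R curve, equal to the TP rate
        "precision": tp / (tp + fp),  # y-axis of the P-R curve
    }

# Invented counts for illustration only.
metrics = classifier_metrics(tp=40, fp=10, tn=45, fn=5)
print(metrics)
```

Note that the TP rate and the recall rate are the same quantity; the table lists both because one is used for the ROC curve and the other for the P-R curve.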

| Features | Factor Type |
|---|---|
| Heavy overload | 1, 0 |
| Day | 1–31 |
| Hour | 1–24 |
| Month | 1–12 |
| Season | Spring, Summer, Autumn, Winter |
| Topography | Plains, Hills, Plateaus, Basins, Mountains |
| Weather | Sunny, Rainy, Cloudy, Snowy |
| Device age | Years |
| Voltage level | 10 kV, 35 kV, 110 kV, 220 kV, 330 kV, 500 kV |
| Extreme weather duration | Days |
| Rated capacity | kVA |
| Short time capacity | kVA |
| Continuous features | Load balance rate, Plant distribution rate, Animal activity density, Average temperatures, Relative humidity, Average illumination, Cooling efficiency, Relative humidity, Current and voltage phase |

| Features | UHR Factor Type |
|---|---|
| Month | 2, 3, 10, 11, 12 |
| Season | Autumn, Winter |
| Weather | Cloudy |
| Topography | Plains, Hills, Mountains |
| Voltage level | 110 kV, 500 kV |
| All unusual features | Short time capacity, Extreme weather duration, Animal activity density, Average temperatures, Relative humidity, Average illumination, and Relative humidity |

| Classifier Models | AUC (ROC) (%) | Classifier Models | AUC (ROC) (%) |
|---|---|---|---|
| TCCPR-CSD | 93.15 | BA-PNN | 88.96 |
| PR-CSD | 84.86 | MCNN | 86.14 |
| PR-AF | 81.21 | SVM | 83.12 |
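
As a reminder of how an AUC (ROC) value of this kind is typically obtained, the sketch below sweeps a cutoff over predicted risk levels and integrates the resulting ROC curve with the trapezoidal rule. It is a generic illustration; the paper's own evaluation code is not shown in this section, and the scores and labels are invented.

```python
def roc_points(scores, labels):
    """ROC points (FP rate, TP rate), one per distinct cutoff, highest cutoff first."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Invented predicted risk levels and actual overload records.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]

auc = auc_trapezoid(roc_points(scores, labels))
print(round(100 * auc, 2))
```

The result equals the fraction of (overload, non-overload) pairs in which the overload record receives the higher risk level, which is 8 of 9 pairs in this toy example.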

| Classifier Models | AUC (P-R) (%) | Classifier Models | AUC (P-R) (%) |
|---|---|---|---|
| TCCPR-CSD | 93.62 | BA-PNN | 89.81 |
| PR-CSD | 85.49 | MCNN | 86.53 |
| PR-AF | 81.91 | SVM | 83.76 |


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Liu, Y.; Sun, C.; Yang, X.; Jia, Z.; Su, J.; Guo, Z.
A Transformer Heavy Overload Spatiotemporal Distribution Prediction Ensemble under Imbalanced and Nonlinear Data Scenarios. *Sustainability* **2024**, *16*, 3110.
https://doi.org/10.3390/su16083110
