Power Transformer Fault Diagnosis Using Neural Network Optimization Techniques
Abstract
1. Introduction
- Electrical: Partial Discharge, Corona, Arcing, Oil’s Breakdown Voltage;
- Thermal: Cellulose Overheating, Oil Overheating;
- Mechanical: Winding, Core Deformations.
- Maximizing the Accuracy for PTFD.
- Minimizing the Architectural Complexity of the ANN.
2. Materials and Methods
2.1. Dissolved Gas Analysis
The Rogers Ratio Method
- MH = CH4/H2
- EM = C2H6/CH4
- EE = C2H4/C2H6
- AE = C2H2/C2H4
- F1: No-Fault (1);
- F2: Low Energy Discharge (2), (12);
- F3: High Energy Discharge (9), (10), (11);
- F4: Low and Medium Thermal Faults (3), (4), (5), (7), (8);
- F5: High-Temperature Thermal Faults (6).
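To make the coding scheme concrete, the sketch below computes the four Rogers ratios from raw gas concentrations (in ppm) and assigns the codes defined in the ratio-code table of this section. It is an illustrative MATLAB sketch rather than the authors' implementation; the function name rogersCodes and the tie-handling at range boundaries are our own choices.

```matlab
% Illustrative sketch: compute the four Rogers ratios from dissolved-gas
% concentrations (ppm) and assign the corresponding Rogers codes.
% Function name and boundary tie-handling are assumptions, not the paper's code.
function codes = rogersCodes(H2, CH4, C2H6, C2H4, C2H2)
    MH = CH4  / H2;     % methane   / hydrogen
    EM = C2H6 / CH4;    % ethane    / methane
    EE = C2H4 / C2H6;   % ethylene  / ethane
    AE = C2H2 / C2H4;   % acetylene / ethylene

    if MH < 0.1,      cMH = 5;      % code boundaries follow the ratio-code table
    elseif MH <= 1.0, cMH = 0;
    elseif MH <= 3.0, cMH = 1;
    else,             cMH = 2;
    end

    cEM = double(EM >= 1.0);        % 0 if EM < 1.0, otherwise 1

    if EE < 1.0,      cEE = 0;
    elseif EE <= 3.0, cEE = 1;
    else,             cEE = 2;
    end

    if AE < 0.1,      cAE = 0;
    elseif AE <= 3.0, cAE = 1;
    else,             cAE = 2;
    end

    codes = [cMH cEM cEE cAE];      % [MH EM EE AE] codes, matched against the fault table
end
```

For example, rogersCodes(13, 138, 83, 16, 0) — the gas values of sample 1 in the comparison table of Section 3 — returns the code vector [2 0 0 0].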
2.2. Artificial Neural Networks
Backpropagation
- Loss function: The choice of the loss function depends on the specific problem. If we denote the predicted output of the ANN as ŷ and the target output as y, then the loss function measures the discrepancy between ŷ and y; typical choices are the mean squared error for regression and the cross-entropy for classification.
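For reference, the two loss functions mentioned above have the standard forms (with ŷ the predicted output and y the target):

$$
L_{\mathrm{MSE}}(y,\hat{y}) = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i-\hat{y}_i\bigr)^{2},
\qquad
L_{\mathrm{CE}}(y,\hat{y}) = -\frac{1}{n}\sum_{i=1}^{n}\sum_{c=1}^{C} y_{i,c}\,\log\hat{y}_{i,c}.
$$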
2.3. Model’s Parameters and Hyperparameters
- Hyperparameters that define the neural network structure:
- The quantity of hidden layers;
- The quantity of nodes in each layer;
- The kind of activation function.
- Hyperparameters that determine how the network is trained and directly control the training process:
- Learning rule (optimizer) type;
- Weight initialization methods;
- Type of loss (or cost) function;
- Learning rate;
- Batch size;
- Number of epochs;
- Training steps.
- Hyperparameters that govern the regularization effect and directly control overfitting:
- Regularization parameters;
- The coefficient λ in L1 and L2 regularization;
- The dropout rate in dropout regularization.
2.4. The Hyperparameter Optimization
- (x_i, y_i), with i = 1, 2, …, n — the training data;
- w — the model’s weights;
- f(x; w) — the prediction function;
- L — the loss function.
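With these symbols, hyperparameter optimization can be written as the standard nested minimization below, where λ denotes the hyperparameter vector, Λ its search space, and the outer objective is evaluated on a held-out validation set (a textbook formulation, not an equation reproduced from the paper):

$$
\lambda^{*}=\arg\min_{\lambda\in\Lambda}\ \frac{1}{n_{\mathrm{val}}}\sum_{i\in\mathcal{D}_{\mathrm{val}}} L\bigl(f(x_i;w^{*}(\lambda)),y_i\bigr),
\qquad
w^{*}(\lambda)=\arg\min_{w}\ \frac{1}{n}\sum_{i=1}^{n} L\bigl(f(x_i;w),y_i\bigr).
$$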
2.4.1. Optimization of the Hyperparameters That Define the Neural Network Structure
- The determination of input nodes;
- The identification of hidden layers and nodes;
- The determination of output nodes;
- The activation function.
2.4.2. Optimization of Hyperparameters That Determine the State of the Network Training and Directly Control the Training Process
2.4.3. Optimization of Hyperparameters That Operate Regularization Effect and Directly Control the Overfitting
- Techniques to prevent overfitting
- L1 regularization
- L2 regularization
- Early Stopping
- When λ equals 0, no regularization is applied;
- When λ equals 1, complete regularization comes into effect;
- The default setting for Keras is λ = 0.01.
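As a concrete reference for the L1/L2 comparison that follows, the penalized cost functions have the standard forms, where L(w) is the unregularized loss and λ the regularization coefficient:

$$
J_{L1}(w)=L(w)+\lambda\sum_{j}\lvert w_{j}\rvert,
\qquad
J_{L2}(w)=L(w)+\lambda\sum_{j} w_{j}^{2}.
$$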
2.5. Visualize and Estimate the Performance of a Classifier in the Classification Learner App (CLA)
2.5.1. ROC Curves
2.5.2. Confusion Matrix
- True positives (TP): The number of occurrences where the actual class is positive (actually positive) and the model predicted it as positive;
- True negatives (TN): The number of occurrences where the actual class is negative (actually negative), and the model predicted it as negative. This is often more relevant in binary classification;
- False positives (FP): The number of occurrences where the actual class is negative (actually negative), but the model predicted it as positive;
- False negatives (FN): The number of occurrences where the actual class is positive (actually positive), but the model predicted it as negative.
- Accuracy: (TP + TN)/(TP + TN + FP + FN);
- Precision: TP/(TP + FP);
- Recall = TP/(TP + FN);
- F1 Score = 2 ∗ (Precision ∗ Recall)/(Precision + Recall);
- Specificity = TN/(TN + FP).
- These metrics provide different perspectives on the performance of the ANN. The confusion matrix and the metrics derived from it show where the model makes correct predictions and where it makes mistakes, which is essential for improving the model or choosing the right model for a given task.
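The short MATLAB helper below computes the five metrics listed above from raw confusion-matrix counts; it is an illustrative sketch (the function name is ours), not code from the paper.

```matlab
% Illustrative helper: derive classification metrics from confusion-matrix counts.
function m = confusionMetrics(TP, TN, FP, FN)
    m.accuracy    = (TP + TN) / (TP + TN + FP + FN);
    m.precision   = TP / (TP + FP);
    m.recall      = TP / (TP + FN);
    m.f1          = 2 * (m.precision * m.recall) / (m.precision + m.recall);
    m.specificity = TN / (TN + FP);
end
```

Called with the Class 1 rates reported in Section 2.6.3 (TP = 0.831, TN = 4.1, FP = 0.211, FN = 0.169), it reproduces the precision (≈0.797), recall (0.831), F1 score (≈0.813), and specificity (≈0.95) quoted there.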
2.6. Experiment
- Data extraction;
- Load data in the Classification Learner app;
- Select classifier options;
- Train classifiers;
- Choose the best classifier type (ANN);
- Train 30 ANNs with different parameters;
- Select the five most accurate networks;
- Train and test 5 ANNs and select the most accurate model;
- ANN hyperparameter optimization;
- Visualize and assess the ANN’s performance;
- Select the ANN with the best accuracy for PTFD.
2.6.1. First Experiment—Selecting the Best Classifier Type
- Validation accuracy: Percentage of properly identified observations;
- Total cost: Misclassification cost;
- Prediction speed: Estimated prediction speed for test data, based on the prediction timings for the validation data. This speed may be affected by background operations both within and outside of the app, so we train models under identical conditions for more accurate comparisons;
- Training time: Duration of the model’s training (models are trained under identical conditions for more accurate comparisons);
- Model size: The model’s size if it were exported without training data.
1. Data extraction.
2. Load data in the Classification Learner app.
3. Select classifier options.
- Decision trees;
- Naive Bayes classifiers;
- Support Vector Machines (SVMs);
- Kernel;
- ANN.
2.6.2. Second Experiment—Finding the ANN with the Best Performance
- Finding the best possible training options using Bayesian optimization (a minimal bayesopt sketch follows this list);
- Using the built-in trainNetwork function or designing a custom training function;
- Evaluating the performance of various network topologies by comparing the outcomes of utilizing distinct datasets.
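As a sketch of the Bayesian-optimization step mentioned above, the bayesopt function (Statistics and Machine Learning Toolbox) could drive the search over two training options roughly as follows; trainAndGetValError is a hypothetical helper that trains the network with the given values and returns the validation classification error.

```matlab
% Illustrative Bayesian optimization over two training options with bayesopt.
vars = [ optimizableVariable('lr', [1e-3, 1e-1], 'Transform', 'log')     % learning rate
         optimizableVariable('l2', [1e-5, 1e-1], 'Transform', 'log') ];  % L2 regularization

objFcn = @(p) trainAndGetValError(p.lr, p.l2);   % hypothetical helper returning validation error

results = bayesopt(objFcn, vars, ...
    'MaxObjectiveEvaluations', 30, ...
    'AcquisitionFunctionName', 'expected-improvement-plus');

bestParams = bestPoint(results);                 % hyperparameters at the best observed point
```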
2.6.3. Third Experiment—ANN Hyperparameter Optimization
- Firstly, we systematically explore various values for each hyperparameter using the grid, random, and Bayesian search approaches to identify the optimal combination of hyperparameters that maximizes the performance on the validation set (search strategy).
- Secondly, we tune the hyperparameters for the three optimizers according to the optimal combination of hyperparameters we find with the previous search strategy.
- Hyperparameter Tuning Approaches
- Define the Hyperparameters: Identify the tuning hyperparameters: learning rate (LR), regularization strength (L2), momentum (M), and batch size (BS).
- Define the Search Space: Determine the range of values that each hyperparameter can take:
- LR search space between 0.001 and 0.1;
- L2 search space between 0.00001 and 0.1;
- M search space between 0.5 and 0.99;
- BS search space between 32 and 4096.
- Choosing a Search Strategy: grid search, random search, or Bayesian optimization.
- Train and Evaluate: For each hyperparameter setting, we train the neural network and assess its performance by measuring the validation accuracy and loss (a minimal random-search sketch follows below).
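The sketch below illustrates one such search loop: a random search over the ranges defined above, using the Deep Learning Toolbox functions trainingOptions and trainNetwork that the paper relies on. The data variables (XTrain, YTrain, XVal, YVal) and the layers array are assumed to exist already; the loop itself is our own illustrative code, not the authors'.

```matlab
% Illustrative random search over the hyperparameter ranges defined above.
rng(0);                                   % for reproducibility
nTrials = 20;
bestAcc = 0;
for k = 1:nTrials
    lr  = 10^(-3 + 2*rand);               % learning rate in [0.001, 0.1] (log-uniform)
    l2  = 10^(-5 + 4*rand);               % L2 strength in [0.00001, 0.1] (log-uniform)
    mom = 0.5 + 0.49*rand;                % momentum in [0.5, 0.99]
    bs  = 2^randi([5 12]);                % batch size in {32, 64, ..., 4096}

    opts = trainingOptions('sgdm', ...
        'InitialLearnRate', lr, ...
        'L2Regularization', l2, ...
        'Momentum',         mom, ...
        'MiniBatchSize',    bs, ...
        'MaxEpochs',        30, ...
        'ValidationData',   {XVal, YVal}, ...
        'Verbose',          false);

    net    = trainNetwork(XTrain, YTrain, layers, opts);    % train one candidate
    valAcc = mean(classify(net, XVal) == YVal);             % validation accuracy
    if valAcc > bestAcc
        bestAcc  = valAcc;                                   % keep the best setting found so far
        bestOpts = opts;
    end
end
```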
- Each light blue point represents the estimated minimum classification error. This estimate is produced by the optimization procedure, which takes into account all combinations of hyperparameter values evaluated so far, including the current iteration.
- Each dark blue point represents the observed minimum classification error, i.e., the error actually measured during the optimization procedure.
- The red square marks the optimized hyperparameters, i.e., the iteration with the best performance. As the figure shows, the best-point hyperparameter is the L2 regularization with value 0.00000287, and the observed minimum classification error is 0.172 in the eighth iteration. The optimized hyperparameters do not always coincide with the reported minimum classification error: the application employs Bayesian optimization for hyperparameter tuning, and it selects the combination of hyperparameter values that minimizes the upper confidence bound of the classification error predicted by the objective model, rather than the classification error itself.
- The yellow point marks the hyperparameters that result in the smallest classification error and the corresponding iteration. Here, the L2 regularization with value 0.00000286 resulted in the minimum classification error of 0.172 in the 6th iteration.
- For grid search, the best-point hyperparameter is the L2 regularization with value 0.00006155, with an observed minimum classification error of 0.152 in the 55th iteration, while the L2 regularization with value 0.0000369 produced the minimum classification error of 0.152 in the 29th iteration. From these two diagrams we conclude that grid search yields a smaller minimum classification error, and hence better accuracy than Bayesian optimization, as also seen in the confusion matrix and ROC curve. However, the minimum classification error plots indicate that Bayesian optimization converges faster, reaching its minimum error in the sixth iteration.
- The first numerical value (83.1%) indicates that the neural network correctly predicted 83.1% of fault type 1 observations as fault type 1. So, the true positives (TP) for Class 1 are 83.1% (blue cell).
- The second numerical value, 0, indicates that the neural network did not mispredict fault type 1 as fault type 2. So, the corresponding false negatives (FN) for Class 1 are 0% (white cell).
- The third value (1.4%) indicates that the neural network incorrectly predicted fault type 1 as fault type 3 at a rate of 1.4%. These false negatives (FN) for Class 1 are 1.4% (light yellow cell).
- The fourth value (8.5%) indicates that the neural network incorrectly predicted fault type 1 as fault type 4 at a rate of 8.5%. These false negatives (FN) for Class 1 are 8.5% (orange cell).
- The fifth value (7.0%) indicates that the neural network incorrectly predicted fault type 1 as fault type 5 at a rate of 7.0%. These false negatives (FN) for Class 1 are 7.0% (orange cell).
- So, TP = 0.831, TN = 4.1, FP = 0.211, and FN = 0.169 for Class 1 in Figure 7a.
- Accuracy: (TP + TN)/(TP + TN + FP + FN) = 0.92;
- Precision: TP/(TP + FP) = 0.797;
- Recall = TP/(TP + FN) = 0.831;
- F1 Score = 2 ∗ (Precision ∗ Recall)/(Precision + Recall) = 0.813;
- Specificity = TN/(TN + FP) = 0.95.
2. Adaptive Learning Rate Method
A. Adam solver
- Squared Gradient Decay Factor;
- Learning rate;
- Gradient Decay Factor (the decay rate of the gradient moving average);
- Epsilon (ε): A tiny value added to the denominator to prevent division by zero, usually on the order of 1 × 10−7 or 1 × 10−8. Epsilon typically appears in machine learning when computing ratios, gradients, or other operations involving division; adding it guarantees that the division remains defined even when the denominator is near zero, so the computation does not suffer from numerical instability or errors.
- Best validation accuracy: 90.4%
- Best test accuracy: 90.7%
- Gradient Decay Factor: 0.999
- Learning rate: 0.001
- We carry out an identical process for both the SGDM and RMSprop optimizers; the findings are presented in the following tables and figures. A trainingOptions sketch mirroring the Adam settings is given below.
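For reference, the Adam settings listed in the training-options table could be expressed with the Deep Learning Toolbox trainingOptions function roughly as follows (a sketch mirroring the reported values, not the authors' script):

```matlab
% Adam training options mirroring the values reported for the best Adam run.
optsAdam = trainingOptions('adam', ...
    'GradientDecayFactor',        0.999, ...
    'SquaredGradientDecayFactor', 0.999, ...
    'Epsilon',                    1e-8, ...
    'InitialLearnRate',           1e-3, ...
    'LearnRateSchedule',          'none', ...
    'LearnRateDropFactor',        0.1, ...
    'LearnRateDropPeriod',        10, ...
    'L2Regularization',           1e-4, ...
    'GradientThresholdMethod',    'l2norm', ...
    'GradientThreshold',          Inf, ...
    'MiniBatchSize',              64, ...
    'MaxEpochs',                  30);
```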
B. SGDM optimization
- Best validation accuracy: 82.7%
- Best test accuracy: 81%
- Momentum: 0.9
- Learning rate: 0.00008
C. RMSprop
- Validation accuracy: 67.31%
- Test accuracy: 75%
- Momentum: 0.99
- Learning rate: 0.0003
- Best validation accuracy: 86%
- Best test accuracy: 83%
- Momentum: 0.999
- Learning rate: 0.001
3. Results and Analysis
- Number of layers: 3;
- Layer sizes: 100, 50, 5;
- Activation function: Tanh;
- Best validation accuracy: 90.4%;
- Best test accuracy: 90.7%.
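In Deep Learning Toolbox terms, the selected architecture could be expressed roughly as the layer array below (an illustrative sketch: a 4-ratio input, two tanh hidden layers of 100 and 50 nodes, and a 5-class softmax output; XTrain, YTrain, and optsAdam are assumed to be defined as sketched earlier).

```matlab
% Sketch of the selected 100-50-5 tanh network for the four gas-ratio inputs.
layers = [
    featureInputLayer(4)          % MH, EM, EE, AE ratios
    fullyConnectedLayer(100)
    tanhLayer
    fullyConnectedLayer(50)
    tanhLayer
    fullyConnectedLayer(5)        % five fault classes F1-F5
    softmaxLayer
    classificationLayer];

net = trainNetwork(XTrain, YTrain, layers, optsAdam);   % train with the Adam options above
```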
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Ratio | Range | Code |
---|---|---|
MH (CH4/H2) | x < 0.1 | 5 |
MH (CH4/H2) | 0.1 ≤ x ≤ 1.0 | 0 |
MH (CH4/H2) | 1.0 ≤ x ≤ 3.0 | 1 |
MH (CH4/H2) | x > 3.0 | 2 |
EM (C2H6/CH4) | x < 1.0 | 0 |
EM (C2H6/CH4) | x ≥ 1.0 | 1 |
EE (C2H4/C2H6) | x < 1.0 | 0 |
EE (C2H4/C2H6) | 1.0 ≤ x ≤ 3.0 | 1 |
EE (C2H4/C2H6) | x > 3.0 | 2 |
AE (C2H2/C2H4) | x < 0.1 | 0 |
AE (C2H2/C2H4) | 0.1 ≤ x ≤ 3.0 | 1 |
AE (C2H2/C2H4) | x > 3.0 | 2 |
No. | CH4/H2 | C2H6/CH4 | C2H4/C2H6 | C2H2/C2H4 | Fault Type |
---|---|---|---|---|---|
1 | 0 | 0 | 0 | 0 | No Fault |
2 | 1–2 | 0 | 0 | 0 | <150 °C Thermal Fault |
3 | 1–2 | 1 | 0 | 0 | 150–200 °C Thermal Fault |
4 | 0 | 1 | 0 | 0 | 200–300 °C Thermal Fault |
5 | 0 | 0 | 1 | 0 | General Conductor Overheating |
6 | 1 | 0 | 1 | 0 | Winding Circulating Currents |
7 | 1 | 0 | 2 | 0 | Core and Tank Circulating Currents, Overheated Joints |
8 | 5 | 0 | 0 | 0 | Partial Discharge |
9 | 5 | 0 | 0 | 1–2 | Partial Discharge with Tracking |
10 | 0 | 0 | 0 | 1 | Flashover Without Power Follow-Through |
11 | 0 | 0 | 1–2 | 1–2 | Arc with Power Follow-Through |
12 | 0 | 0 | 2 | 2 | Continuous Sparking to Floating Potential |
L1 Regularization | L2 Regularization |
---|---|
Shrinks weight magnitudes toward 0 | Shrinks weight magnitudes to be small but not exactly 0
Penalizes the sum of the absolute values of the weights | Penalizes the sum of the squared values of the weights
The cost of outliers present in the data rises linearly | The cost of outliers present in the data rises exponentially
Preferable when the model is simple | Preferable when the model is complex
Number of Predictors | Number of Observations | Number of Classes | Response |
---|---|---|---|
4 | 350 | 5 | Faults |
Classifiers | Classifier Type | Accuracy% (Validation) | Cost (Validation) | Prediction Speed (obj/s) | Training Time (s) | Model Size (KB) |
---|---|---|---|---|---|---|
Decision trees | Fine Tree | 78 | 77 | 6388.3 | 6.37 | 11.991
Decision trees | Medium Tree | 78 | 77 | 21,522.5 | 0.87 | 11.991
Decision trees | Coarse Tree | 67.4 | 77 | 20,652.2 | 0.37 | 5.011
SVM | Quadratic | 77.7 | 78 | 9763.7 | 0.7 | 60.282
SVM | Cubic | 80.9 | 67 | 10,405.0 | 0.6 | 58.026
SVM | Fine Gaussian | 81.1 | 66 | 9530.5 | 0.7 | 65.810
SVM | Medium Gaussian | 80.6 | 68 | 10,127.4 | 0.6 | 65.042
SVM | Coarse Gaussian | 72.0 | 98 | 10,758.9 | 0.5 | 76.082
KNN | Fine | 80.3 | 69 | 7547.1 | 0.7 | 28.934
KNN | Medium | 77.4 | 79 | 9726.3 | 0.4 | 28.934
KNN | Cubic | 76.9 | 81 | 10,022.9 | 0.4 | 28.934
KNN | Weighted | 81.4 | 65 | 10,189.9 | 0.4 | 28.952
ANN | Narrow | 81.4 | 65 | 13,372.7 | 3.8 | 6.143
ANN | Medium | 82.9 | 60 | 21,741.1 | 2.5 | 7.343
ANN | Wide | 81.1 | 66 | 20,582.1 | 2.8 | 13.343
ANN | Bilayered | 82.0 | 63 | 22,651.2 | 3.6 | 7.943
ANN | Trilayered | 81.7 | 64 | 21,268.2 | 4.6 | 9.743
Naïve Bayes | Kernel | 70.0 | 105 | 6570.1 | 1.1 | 92.444
Num. of Layers | Activation Function | Layer 1 Size | Layer 2 Size | Layer 3 Size | Validation Accuracy (%) |
---|---|---|---|---|---|
1 | Sigmoid | 240 | 5 | 0 | 75.0 |
3 | Tanh | 200 | 100 | 5 | 80.1 |
3 | ReLU | 300 | 200 | 5 | 80.2
3 | ReLU | 100 | 5 | 0 | 80.2
3 | Tanh | 100 | 50 | 5 | 80.9 |
Num. of Layers | Activation Function | Layer 1 Size | Layer 2 Size | Layer 3 Size | Validation Accuracy (%) |
---|---|---|---|---|---|
3 | Tanh | 100 | 50 | 5 | 80.9 |
Algorithm | Validation Accuracy (%) | Test Accuracy (%) | L2 Regularization |
---|---|---|---|
Random search (Model 8) | 82 | 85 | 0.001 |
Bayesian optimizer (Model 12) | 82.86 | 83.1 | 0.00002 |
Grid search (Model 4) | 84.57 | 86 | 0.00002 |
Manual search (Model 2) | 80.9 | 85 | 0.01 |
Adam Training Options | SGDM Training Options | RMSprop Training Options |
---|---|---|
Gradient Decay Factor: 0.999 | Momentum: 0.900 | Squared Gradient Decay Factor: 0.990 |
Squared Gradient Decay Factor: 0.999 | Initial Learning Rate: 0.0100 | Epsilon: 1.0000 × 10−8 |
Epsilon: 1.0000 × 10−8 | Learning Rate Schedule: ‘piecewise’ | Initial Learning Rate: 3.0000 × 10−4 |
Initial Learning Rate: 1.0000 × 10−3 | Learning Rate Drop Factor: 0.2000 | Learning Rate Schedule: ‘none’ |
Learn Rate Schedule: ‘none’ | Learning Rate Drop Period: 5 | Learning Rate Drop Factor: 0.1000 |
Learning Rate Drop Factor: 0.1000 | L2 Regularization: 1.0000 × 10−4 | Learning Rate Drop Period: 10 |
Learning Rate Drop Period: 10 | Gradient Threshold Method: ‘l2norm’ | L2 Regularization: 1.0000 × 10−4 |
L2 Regularization: 1.0000 × 10−4 | Gradient Threshold: Inf | Gradient Threshold Method: ‘l2norm’ |
Gradient Threshold Method: ‘l2norm’ | Mini-Batch Size: 64 | Gradient Threshold: Inf |
Gradient Threshold: Inf | Max Epochs: 100 | Mini-Batch Size: 64 |
Mini-Batch Size: 64 | | Max Epochs: 20
Max Epochs: 30 | |
Algorithm | Validation Accuracy | Testing Accuracy |
---|---|---|
Adam | 90.4% | 90.7%
SGDM | 82% | 81.4%
RMSprop | 86% | 83.3%
(1) S.N | (2) H2 | (3) CH4 | (4) C2H6 | (5) C2H4 | (6) C2H2 | (7) Real Fault | (8) Rogers | (9) Agr. | (10) ANN | (11) Agr. |
---|---|---|---|---|---|---|---|---|---|---|
1 | 13 | 138 | 83 | 16 | 0 | F4 | F4 | √ | F4 | √ |
2 | 762 | 93 | 38 | 54 | 126 | F3 | F3 | √ | F3 | √ |
3 | 43 | 116 | 65 | 139 | 0 | F4 | F4 | √ | F4 | √ |
4 | 179 | 306 | 73 | 579 | 0 | F4 | F4 | √ | F4 | √ |
5 | 57 | 141 | 38 | 51 | 0 | F4 | F4 | √ | F4 | √ |
6 | 40 | 8 | 34 | 15 | 0 | F4 | F4 | √ | F4 | √ |
7 | 35 | 283 | 121 | 222 | 0 | F4 | U.F | ⸺ | F4 | √ |
8 | 15 | 159 | 29 | 87 | 0 | F4 | U.F | ⸺ | F5 | ⸺ |
9 | 55 | 159 | 114 | 493 | 0 | F4 | F4 | √ | F4 | √ |
10 | 37 | 123 | 67 | 52 | 0 | F4 | F4 | √ | F4 | √ |
11 | 723 | 191 | 110 | 293 | 288 | F3 | F3 | √ | F3 | √ |
12 | 7 | 15 | 78 | 58 | 0 | F4 | F4 | √ | F4 | √ |
13 | 30 | 51 | 12 | 54 | 0 | F4 | F4 | √ | F4 | √ |
14 | 31 | 56 | 33 | 77 | 0 | F4 | F4 | √ | F4 | √ |
15 | 109 | 226 | 68 | 192 | 0 | F4 | F4 | √ | F4 | √ |
16 | 137 | 279 | 66 | 505 | 0 | F4 | F4 | √ | F4 | √ |
17 | 59 | 119 | 36 | 70 | 0 | F4 | F4 | √ | F4 | √ |
18 | 151 | 242 | 68 | 232 | 0 | F4 | F4 | √ | F4 | √ |
19 | 870 | 77 | 73 | 54 | 14 | F2 | F2 | √ | F2 | √ |
20 | 376 | 575 | 146 | 1092 | 0 | F4 | F4 | √ | F4 | √ |
21 | 269 | 1081 | 347 | 1725 | 25 | F5 | U.F | ⸺ | F5 | √ |
22 | 10 | 10 | 8 | 1 | 0.01 | F4 | F4 | √ | F4 | √ |
23 | 30 | 22 | 14 | 4.10 | 0.1 | F1 | F1 | √ | F1 | √ |
24 | 2.90 | 2 | 2 | 0.3 | 0.1 | F1 | F1 | √ | F1 | √ |
25 | 4 | 99 | 82 | 4 | 0.1 | F4 | F4 | √ | F4 | √ |
26 | 21 | 34 | 5 | 47 | 62 | F3 | U.F | ⸺ | F3 | √ |
27 | 50 | 100 | 51 | 305 | 9 | F4 | F4 | √ | F4 | √ |
28 | 120 | 17 | 32 | 4 | 23 | F1 | U.F | ⸺ | F1 | √ |
29 | 980 | 73 | 58 | 12 | 0.01 | F2 | F2 | √ | F2 | √ |
30 | 1607 | 615 | 80 | 916 | 1294 | F3 | U.F | ⸺ | F3 | √ |
31 | 14.7 | 3.7 | 10.5 | 2.7 | 0.2 | F4 | U.F | ⸺ | F5 | ⸺ |
32 | 181 | 262 | 41 | 28 | 0.01 | F4 | U.F | ⸺ | F4 | √ |
33 | 173 | 334 | 172 | 812.5 | 33.7 | F4 | F4 | √ | F4 | √ |
34 | 127 | 107 | 11 | 154 | 224 | F3 | F4 | ⸺ | F3 | √ |
35 | 60 | 40 | 6.9 | 110 | 70 | F3 | F4 | ⸺ | F3 | √ |
36 | 980 | 73 | 58 | 12 | 0.01 | F2 | F3 | ⸺ | F2 | √ |
37 | 86 | 187 | 136 | 363 | 0.01 | F4 | F3 | ⸺ | F5 | ⸺ |
38 | 10 | 24 | 372 | 24 | 0.01 | F4 | U.F | ⸺ | F4 | √ |
39 | 260 | 3 | 18 | 2 | 0.01 | F2 | F2 | √ | F2 | √ |
40 | 586 | 19 | 77 | 6 | 0.01 | F2 | F4 | ⸺ | F2 | √ |
41 | 20 | 175 | 92 | 14 | 0.02 | F4 | F4 | √ | F4 | √ |
42 | 801 | 87 | 45 | 62 | 150 | F3 | F3 | √ | F3 | √ |
43 | 51 | 99 | 75 | 150 | 0.03 | F4 | F4 | √ | F4 | √ |
44 | 200 | 298 | 69 | 602 | 0.05 | F5 | F4 | ⸺ | F4 | ⸺ |
45 | 60 | 154 | 41 | 49 | 0 | F4 | F4 | √ | F4 | √ |
46 | 40 | 8 | 34 | 15 | 0.2 | F1 | F4 | ⸺ | F4 | ⸺ |
47 | 45 | 283 | 158 | 199 | 0 | F4 | U.F | ⸺ | F4 | √ |
48 | 21 | 159 | 22 | 91 | 0.02 | F4 | U.F | ⸺ | F4 | √ |
49 | 55 | 159 | 128 | 502 | 0 | F5 | F4 | ⸺ | F5 | √ |
50 | 41 | 223 | 71 | 52 | 0 | F3 | F4 | √ | F4 | ⸺ |
51 | 689 | 203 | 129 | 301 | 362 | F2 | F3 | √ | F2 | √ |
52 | 10 | 24 | 95 | 45 | 0.02 | F4 | F4 | √ | F4 | √ |
53 | 45 | 69 | 7 | 45 | 0.003 | F5 | F4 | ⸺ | F5 | √ |
54 | 45 | 59 | 45 | 89 | 0.01 | F4 | F4 | √ | F4 | √ |
55 | 98 | 198 | 70 | 201 | 0.04 | F4 | F4 | √ | F4 | √ |
56 | 204 | 302 | 57 | 495 | 0 | F5 | F4 | ⸺ | F5 | √ |
57 | 45 | 125 | 48 | 82 | 0 | F4 | F4 | √ | F4 | √ |
58 | 201 | 256 | 54 | 224 | 0 | F4 | F4 | √ | F4 | √ |
59 | 905 | 83 | 81 | 63 | 12 | F2 | F2 | √ | F2 | √ |
60 | 402 | 604 | 99 | 998 | 0.02 | F5 | F4 | ⸺ | F5 | √ |
Rokani, V.; Kaminaris, S.D.; Karaisas, P.; Kaminaris, D. Power Transformer Fault Diagnosis Using Neural Network Optimization Techniques. Mathematics 2023, 11, 4693. https://doi.org/10.3390/math11224693