# Random Forest Regressor-Based Approach for Detecting Fault Location and Duration in Power Systems


## Abstract


## 1. Introduction

## 2. Methodology

#### 2.1. Random Forest Regressor (RFR) Model

#### 2.2. Dataset

## 3. Results and Discussion

#### 3.1. Experiments and Metrics

#### 3.2. Models Hyperparameters Tuning

NB’s optimal alpha and lambda values are 1 × 10^{−6} and 1 × 10^{−2}, respectively. NB’s results indicate that changing the alpha or lambda value has no significant impact on model performance. For HT, the optimal configuration uses information gain as the split function with a split confidence of 1 × 10^{−5}. HT’s results indicate that a lower split confidence combined with information gain significantly reduces both the error rate and its standard deviation.
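The selection described above can be checked against the reported numbers by picking, for each model, the configuration with the lowest mean squared error. The sketch below copies the HT and NB values from the hyperparameter table in this article; the dictionary layout and the helper names `best_config` and `mse_spread` are illustrative assumptions, not part of the study.

```python
# (mean squared error, standard deviation) per configuration,
# copied from the hyperparameter table in this article.
HT_RESULTS = {  # keys are (split function, split confidence)
    ("gini index", 1e-5): (12.41, 4.88),
    ("gini index", 1e-4): (14.53, 6.13),
    ("gini index", 1e-3): (14.91, 6.24),
    ("information gain", 1e-5): (10.88, 2.89),
    ("information gain", 1e-4): (11.24, 8.13),
    ("information gain", 1e-3): (17.64, 7.22),
}
NB_RESULTS = {  # keys are (alpha, lambda)
    (1e-6, 1e-6): (1.26e-3, 1.42e-4),
    (1e-6, 1e-4): (1.17e-3, 1.56e-4),
    (1e-6, 1e-2): (1.07e-3, 1.66e-4),
    (1e-4, 1e-6): (1.14e-3, 1.98e-4),
    (1e-4, 1e-4): (1.19e-3, 1.51e-4),
    (1e-4, 1e-2): (1.15e-3, 2.37e-4),
}


def best_config(results):
    """Return the configuration with the lowest mean squared error (hypothetical helper)."""
    return min(results, key=lambda cfg: results[cfg][0])


def mse_spread(results):
    """Relative gap between the worst and the best MSE across configurations."""
    values = [mse for mse, _ in results.values()]
    return (max(values) - min(values)) / min(values)


print(best_config(HT_RESULTS), round(mse_spread(HT_RESULTS), 2))
print(best_config(NB_RESULTS), round(mse_spread(NB_RESULTS), 2))
```

The relative spread is much smaller for NB than for HT, which is consistent with the observation that NB is largely insensitive to alpha and lambda while HT depends strongly on its split settings.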

#### 3.3. Experiment Result #1: Fault Location Detection

#### 3.4. Experiment Results #2: Fault Duration Prediction

#### 3.5. Experiment Results #3: Handling Missing Data

#### 3.6. Experiment Results #4: Handling Streaming Data

## 4. Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Conflicts of Interest

## Abbreviations

Abbreviation | Definition |
---|---|
AM | Active management |
BPN | Back-propagation neural network |
CNN | Convolutional neural network |
DBSCAN | Density-based spatial clustering of applications with noise |
DLG | Double line to ground |
DNN | Deep neural network |
DT | Decision tree |
DWT | Discrete wavelet transform |
ELE | Event location estimation |
EZ | Electrical zone |
FNN | Feedforward neural network |
GA | Genetic algorithm |
GPS | Global Positioning System |
HPC | High-performance computing |
HT | Hoeffding tree |
KNN | k-nearest neighbors |
LL | Line to line |
LLL | Three phase to ground |
MAE | Mean absolute error |
ML | Machine learning |
MLE | Maximum likelihood estimation |
MSE | Mean squared error |
MTHVDC | Multi-terminal high-voltage direct current |
MWE | Modified wavelet energy |
NB | Naive Bayes |
NBC | Naive Bayes classifier |
PDT | Physics-based decision tree |
PMU | Phasor measurement unit |
PNN | Probabilistic neural network |
PV | Photovoltaic |
RF | Random forest |
RFR | Random forest regressor |
RNN | Recurrent neural network |
SE | Shannon’s entropy |
SLG | Single line to ground |
SVM | Support vector machine |

## References

1. Haes Alhelou, H.; Hamedani-Golshan, M.E.; Njenda, T.C.; Siano, P. A survey on power system blackout and cascading events: Research motivations and challenges. Energies **2019**, 12, 682.
2. Salimian, M.R.; Aghamohammadi, M.R. A three stages decision tree-based intelligent blackout predictor for power systems using brittleness indices. IEEE Trans. Smart Grid **2017**, 9, 5123–5131.
3. Zhang, Y.; Xu, Y.; Dong, Z.Y. Robust ensemble data analytics for incomplete PMU measurements-based power system stability assessment. IEEE Trans. Power Syst. **2017**, 33, 1124–1126.
4. Lawton, L.; Sullivan, M.; Van Liere, K.; Katz, A.; Eto, J. A Framework and Review of Customer Outage Costs: Integration and Analysis of Electric Utility Outage Cost Surveys; Technical Report; Lawrence Berkeley National Lab. (LBNL): Berkeley, CA, USA, 2003.
5. Jaech, A.; Zhang, B.; Ostendorf, M.; Kirschen, D.S. Real-time prediction of the duration of distribution system outages. IEEE Trans. Power Syst. **2018**, 34, 773–781.
6. Joyokusumo, I.; Putra, H.; Fatchurrahman, R. A Machine Learning-Based Strategy For Predicting The Fault Recovery Duration Class In Electric Power Transmission System. In Proceedings of the 2020 International Conference on Technology and Policy in Energy and Electric Power (ICT-PEP), Bandung, Indonesia, 23–24 September 2020; pp. 252–257.
7. Gururajapathy, S.S.; Mokhlis, H.; Illias, H.A. Fault location and detection techniques in power distribution systems with distributed generation: A review. Renew. Sustain. Energy Rev. **2017**, 74, 949–958.
8. Ajenikoko, G.A.; Sangotola, S.O. An overview of impedance-based fault location techniques in electrical power-transmission network. Int. J. Adv. Eng. Res. Appl. (IJA-ERA) **2016**, 2, 123–130.
9. Lim, P.K.; Dorr, D.S. Understanding and resolving voltage sag related problems for sensitive industrial customers. In Proceedings of the 2000 IEEE Power Engineering Society Winter Meeting. Conference Proceedings (Cat. No. 00CH37077), Singapore, 23–27 January 2000; Volume 4, pp. 2886–2890.
10. Ma, G.; Jiang, L.; Zhou, K.; Xu, G. A Method of line fault location based on traveling wave theory. Int. J. Control Autom. **2016**, 9, 261–270.
11. Alwash, S.F.; Ramachandaramurthy, V.K.; Mithulananthan, N. Fault-location scheme for power distribution system with distributed generation. IEEE Trans. Power Deliv. **2014**, 30, 1187–1195.
12. Javadian, S.A.M.; Nasrabadi, A.M.; Haghifam, M.R.; Rezvantalab, J. Determining fault’s type and accurate location in distribution systems with DG using MLP Neural networks. In Proceedings of the 2009 International Conference on Clean Electrical Power, Capri, Italy, 9–11 June 2009; pp. 284–289.
13. Aslan, Y. An alternative approach to fault location on power distribution feeders with embedded remote-end power generation using artificial neural networks. Electr. Eng. **2012**, 94, 125–134.
14. Dehghani, F.; Nezami, H. A new fault location technique on radial distribution systems using artificial neural network. In Proceedings of the 22nd International Conference and Exhibition on Electricity Distribution (CIRED 2013), Stockholm, Sweden, 10–13 June 2013.
15. Li, W.; Deka, D.; Chertkov, M.; Wang, M. Real-Time Faulted Line Localization and PMU Placement in Power Systems Through Convolutional Neural Networks. IEEE Trans. Power Syst. **2019**, 34, 4640–4651.
16. Zainab, A.; Refaat, S.S.; Syed, D.; Ghrayeb, A.; Abu-Rub, H. Faulted Line Identification and Localization in Power System using Machine Learning Techniques. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 2975–2981.
17. Okumus, H.; Nuroglu, F.M. A random forest-based approach for fault location detection in distribution systems. Electr. Eng. **2021**, 103, 257–264.
18. Madeti, S.R.; Singh, S. Modeling of PV system based on experimental data for fault detection using kNN method. Sol. Energy **2018**, 173, 139–151.
19. Pandey, S.; Srivastava, A.; Amidan, B. A Real Time Event Detection, Classification and Localization using Synchrophasor Data. IEEE Trans. Power Syst. **2020**, 35, 4421–4431.
20. Ekici, S. Support Vector Machines for classification and locating faults on transmission lines. Appl. Soft Comput. **2012**, 12, 1650–1658.
21. Kim, D.I.; White, A.; Shin, Y.J. PMU-based event localization technique for wide-area power system. IEEE Trans. Power Syst. **2018**, 33, 5875–5883.
22. Hossam-Eldin, A.; Lotfy, A.; Elgamal, M.; Ebeed, M. Combined traveling wave and fuzzy logic based fault location in multi-terminal HVDC systems. In Proceedings of the 2016 IEEE 16th International Conference on Environment and Electrical Engineering (EEEIC), Florence, Italy, 7–10 June 2016; pp. 1–6.
23. Mohammadnian, Y.; Amraee, T.; Soroudi, A. Fault detection in distribution networks in presence of distributed generations using a data mining–driven wavelet transform. IET Smart Grid **2019**, 2, 163–171.
24. Chow, M.Y.; Taylor, L.S.; Chow, M.S. Time of outage restoration analysis in distribution systems. IEEE Trans. Power Deliv. **1996**, 11, 1652–1658.
25. Palmer, B.; Perkins, W.; Chen, Y.; Jin, S.; Callahan, D.; Glass, K.; Diao, R.; Rice, M.; Elbert, S.; Vallem, M. GridPACK™: A framework for developing power grid simulations on high-performance computing platforms. Int. J. High Perform. Comput. Appl. **2016**, 30, 223–240.
26. Muallem, A.; Shetty, S.; Pan, J.W.; Zhao, J.; Biswal, B. Hoeffding Tree Algorithms for Anomaly Detection in Streaming Datasets: A Survey. J. Inf. Secur. **2017**, 8, 720–726.
27. He, Y.; Mendis, G.J.; Wei, J. Real-Time Detection of False Data Injection Attacks in Smart Grid: A Deep Learning-Based Intelligent Mechanism. IEEE Trans. Smart Grid **2017**, 8, 2505–2516.
28. Mrabet, Z.E.; Selvaraj, D.F.; Ranganathan, P. Adaptive Hoeffding Tree with Transfer Learning for Streaming Synchrophasor Data Sets. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 5697–5704.
29. Dahal, N.; Abuomar, O.; King, R.; Madani, V. Event stream processing for improved situational awareness in the smart grid. Expert Syst. Appl. **2015**, 42, 6853–6863.
30. Adhikari, U.; Morris, T.H.; Pan, S. Applying hoeffding adaptive trees for real-time cyber-power event and intrusion classification. IEEE Trans. Smart Grid **2017**, 9, 4049–4060.
31. Breiman, L. Random Forests. Mach. Learn. **2001**, 45, 5–32.
32. Glocker, B.; Pauly, O.; Konukoglu, E.; Criminisi, A. Joint classification-regression forests for spatially structured multi-object segmentation. In Proceedings of the European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 870–881.
33. Linusson, H. Multi-Output Random Forests; School of Business and IT, University of Borås: Borås, Sweden, 2013.
34. Pauly, O. Random Forests for Medical Applications. Ph.D. Thesis, Technische Universität München, München, Germany, 2012.
35. Paper, D. Scikit-Learn Regression Tuning. In Hands-On Scikit-Learn for Machine Learning Applications: Data Science Fundamentals with Python; Apress: Berkeley, CA, USA, 2020; pp. 189–213.
36. Cheng, R.; Fang, Y.; Renz, M. Data Classification: Algorithms and Applications; CRC Press: Boca Raton, FL, USA, 2014; pp. 37–64.

**Figure 4.** Comparison between the proposed model (RFR) and NN, DNN, SVM, NB, DT, and HT in terms of fault location detection accuracy at various locations.

**Figure 5.** The MAE and MSE of RFR, NN, DNN, SVM, NB, DT, and HT in terms of fault duration prediction.

**Figure 7.** MSE and MAE as a function of the percentage of missing data for the three models: DNN, HT, and RFR.
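One way to set up a missing-data experiment of this kind is to blank out a chosen percentage of feature values and fill them in before prediction. The sketch below assumes values go missing uniformly at random and are filled with the column mean. The masking scheme, the imputation rule, the toy data, and the helper names are all assumptions for illustration, since the handling used in the study is not described in this section.

```python
import random


def mask_values(rows, fraction, seed=0):
    """Return a copy of rows with the given fraction of cells set to None at random."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(rows) for j in range(len(row))]
    n_missing = round(fraction * len(cells))
    masked = [list(row) for row in rows]
    for i, j in rng.sample(cells, n_missing):
        masked[i][j] = None
    return masked


def impute_column_mean(rows):
    """Return a copy of rows where each None is replaced by its column's observed mean."""
    filled = [list(row) for row in rows]
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        # Fall back to 0.0 only if an entire column is missing.
        mean = sum(observed) / len(observed) if observed else 0.0
        for row in filled:
            if row[j] is None:
                row[j] = mean
    return filled


# Toy measurements: 5 samples with 4 features each (made-up numbers).
data = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.0],
    [3.0, 4.0, 5.0, 6.0],
    [4.0, 5.0, 6.0, 7.0],
    [5.0, 6.0, 7.0, 8.0],
]

masked = mask_values(data, fraction=0.2)  # 20% of the 20 cells become None
restored = impute_column_mean(masked)
```

Repeating this for increasing values of `fraction` and scoring a model on `restored` each time would give one error value per missing-data percentage.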

Category | Approach | Fault Types | Advantages | Limitations |
---|---|---|---|---|
Conventional | Impedance-based [8,11] | Physical | Ease of implementation. | Accuracy can degrade for grounded faults with high fault resistance. The fault duration was not considered. |
Conventional | Time wave-based [10] | Physical | Large resistance, load variance, grounding resistance, reflection, and refraction of the traveling wave and series capacitor bank. | Accuracy depends on correctly estimating the line parameters, including capacitance and line inductance. The fault duration was not considered. |
Machine learning | NN + Levenberg–Marquardt [12,13] | Physical | The detection error is less than 3%. High tolerance to the fault resistance, fault type, fault location, and the embedded remote-end source. | The convergence time of the training process is high. The fault duration was not considered. |
Machine learning | NN-based [14] | Physical | Optimal results in estimating the fault distance from the substations, even under network topology changes. High tolerance to noise. | Inappropriate for detecting fault location in a streaming power system network. |
Machine learning | CNN-based [15] | Physical | Optimal localization estimation even under low visibility (7% of buses). | The fault duration was not considered. |
Machine learning | RF + DT [16] | Physical | Fault location detection accuracy is 91% with a minimum number of buses (5–7%). | |
Machine learning | RF [17] | Physical | Fault location detection accuracy is 90.96% in distribution systems. | |
Machine learning | MLE + DBSCAN [19] | Physical | The proposed data cleansing approach outperforms Chebyshev and K-means and achieves a precision of 95%. Less than 0.9 s to classify an event for a typical window size of 30 data samples. | |
Machine learning | KNN [18] | Physical | Fault location accuracy reaches 98.70% with an error between 0.61% and 6.5%. | The proposed model was trained and tested on the PV system only. |
Machine learning | HAT + DDM + ADWIN [29,30] | Physical and cyber | Classification accuracy is greater than 94% for multiclass and greater than 98% for binary class. Adaptable to concept drift events. | The fault location and duration were not considered. |
Machine learning | RNN-based [5], NBC + SVM [6] | Physical | Predicts fault duration with 97% accuracy. The RNN-based approach is suitable for a real-time environment. | The fault location was not considered. |
Hybrid | Wavelet transform + SVM [20] | Physical | The fault classification error is below 1% for all fault types. The overall error is 0.26% for SLG, 0.74% for LLG, 0.20% for LL, and 0.39% for LLLG. | The fault duration was not considered. Not suitable for streaming power system data. The accuracy of the SVM depends on selecting and tuning the appropriate kernel type and hyperparameters. |
Hybrid | Wavelet analysis + K-means + ELE [21] | Physical | Fault location accuracy attained 100%. | The fault duration was not considered. |
Hybrid | Wavelet analysis + Fuzzy logic [22] | Physical | The error between the actual fault location and the predicted one is lower than 0.002%. | |
Hybrid | Discrete wavelet transform + SVM [23] | Physical | Fault location accuracy is 98.27% for the IEEE 13-bus and 98.29% for the IEEE 34-bus test systems. | |

Scenario | Fault Location | Fault Duration | Simulation Time | Number of Generated Samples for Each Fault Duration | Number of Generated Samples for Each Scenario |
---|---|---|---|---|---|
Scenarios 1–9 | Fault applied at bus 1–9 | 0.05 s to 0.5 s in steps of 0.05 s | 10 s | 594 samples | 5945 samples/scenario. Total number of samples is 53,512 |
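The scenario grid in this table can be enumerated directly: nine fault locations crossed with ten fault durations. The sketch below only lists the (bus, duration) pairs and does not run any simulation; the variable names are illustrative.

```python
# Fault durations from 0.05 s to 0.5 s in steps of 0.05 s.
# Built from integer multiples to avoid drift from repeated float addition.
durations = [round(0.05 * k, 2) for k in range(1, 11)]

# One scenario per bus (1 to 9), each swept over every fault duration.
cases = [(bus, duration) for bus in range(1, 10) for duration in durations]

print(len(durations))  # number of fault durations per scenario
print(len(cases))      # number of (bus, duration) combinations
```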

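Model | Fixed Setting | Varied Hyperparameter | Value | Mean Squared Error | Standard Deviation |
---|---|---|---|---|---|
KNN | Weight function: uniform | | 1 | 11.21 | 2.6 |
KNN | Weight function: uniform | | 10 | 7.25 | 0.45 |
KNN | Weight function: uniform | | 100 | 6.71 | 0.17 |
KNN | Weight function: distance | | 1 | 11.21 | 2.6 |
KNN | Weight function: distance | | 10 | 7.24 | 0.43 |
KNN | Weight function: distance | | 100 | 6.7 | 0.16 |
RFR | Max feature: sqrt | Number of trees | 1 | 10.31 | 2.68 |
RFR | Max feature: sqrt | Number of trees | 10 | 6.45 | 0.53 |
RFR | Max feature: sqrt | Number of trees | 100 | 6.2 | 0.67 |
RFR | Max feature: log2 | Number of trees | 1 | 10.52 | 2.68 |
RFR | Max feature: log2 | Number of trees | 10 | 6.75 | 1.26 |
RFR | Max feature: log2 | Number of trees | 100 | 6.15 | 0.63 |
SVM | Polynomial kernel | C | 1 | 6.013 | 0.11 |
SVM | Polynomial kernel | C | 5 | 6.13 | 0.14 |
SVM | Polynomial kernel | C | 10 | 6.16 | 0.08 |
SVM | Radial basis function (RBF) kernel | C | 1 | 6.09 | 0.14 |
SVM | Radial basis function (RBF) kernel | C | 5 | 6.17 | 0.08 |
SVM | Radial basis function (RBF) kernel | C | 10 | 5.9 | 0.1 |
NN | ReLU function | Number of hidden nodes | 150 | 4.37 | 0.18 |
NN | ReLU function | Number of hidden nodes | 300 | 4.64 | 0.23 |
NN | ReLU function | Number of hidden nodes | 450 | 4.62 | 0.12 |
NN | Identity function | Number of hidden nodes | 150 | 6.15 | 0.08 |
NN | Identity function | Number of hidden nodes | 300 | 6.15 | 0.06 |
NN | Identity function | Number of hidden nodes | 450 | 6.16 | 0.08 |
DT | Minimum leaf size = 1 | Random state | 0 | 10.51 | 3.56 |
DT | Minimum leaf size = 1 | Random state | 1 | 10.39 | 3.65 |
DT | Minimum leaf size = 1 | Random state | 2 | 10.58 | 3.57 |
DT | Minimum leaf size = 6 | Random state | 0 | 9.32 | 3.15 |
DT | Minimum leaf size = 6 | Random state | 1 | 9.29 | 3.12 |
DT | Minimum leaf size = 6 | Random state | 2 | 9.31 | 3.15 |
NB | Alpha = 1 × 10^{−6} | Lambda | 1 × 10^{−6} | 1.26 × 10^{−3} | 1.42 × 10^{−4} |
NB | Alpha = 1 × 10^{−6} | Lambda | 1 × 10^{−4} | 1.17 × 10^{−3} | 1.56 × 10^{−4} |
NB | Alpha = 1 × 10^{−6} | Lambda | 1 × 10^{−2} | 1.07 × 10^{−3} | 1.66 × 10^{−4} |
NB | Alpha = 1 × 10^{−4} | Lambda | 1 × 10^{−6} | 1.14 × 10^{−3} | 1.98 × 10^{−4} |
NB | Alpha = 1 × 10^{−4} | Lambda | 1 × 10^{−4} | 1.19 × 10^{−3} | 1.51 × 10^{−4} |
NB | Alpha = 1 × 10^{−4} | Lambda | 1 × 10^{−2} | 1.15 × 10^{−3} | 2.37 × 10^{−4} |
DNN | ReLU function, 5 hidden layers | Hidden nodes | 50 | 1.20 × 10^{−2} | 2.40 × 10^{−3} |
DNN | ReLU function, 5 hidden layers | Hidden nodes | 100 | 1.12 × 10^{−2} | 1.39 × 10^{−3} |
DNN | ReLU function, 5 hidden layers | Hidden nodes | 150 | 1.14 × 10^{−2} | 1.39 × 10^{−3} |
DNN | ReLU function, 10 hidden layers | Hidden nodes | 50 | 1.12 × 10^{−2} | 3.51 × 10^{−3} |
DNN | ReLU function, 10 hidden layers | Hidden nodes | 100 | 1.14 × 10^{−2} | 1.39 × 10^{−3} |
DNN | ReLU function, 10 hidden layers | Hidden nodes | 100 | 1.20 × 10^{−2} | 2.40 × 10^{−3} |
HT | Split function: Gini index | Split confidence | 1 × 10^{−5} | 12.41 | 4.88 |
HT | Split function: Gini index | Split confidence | 1 × 10^{−4} | 14.53 | 6.13 |
HT | Split function: Gini index | Split confidence | 1 × 10^{−3} | 14.91 | 6.24 |
HT | Split function: Information gain | Split confidence | 1 × 10^{−5} | 10.88 | 2.89 |
HT | Split function: Information gain | Split confidence | 1 × 10^{−4} | 11.24 | 8.13 |
HT | Split function: Information gain | Split confidence | 1 × 10^{−3} | 17.64 | 7.22 |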
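A sweep like the RFR rows of this table could be set up along the following lines with scikit-learn. This is a minimal sketch on synthetic data: the feature matrix, the two targets standing in for fault location and duration, and the 5-fold cross-validation are all assumptions, so the numbers it prints are not comparable to the values reported in the table.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))  # synthetic stand-in for measurement features
y = np.column_stack([
    X[:, 0] + 0.5 * X[:, 1],                       # placeholder for fault location
    np.abs(X[:, 2]) + 0.1 * rng.normal(size=300),  # placeholder for fault duration
])

cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for max_features in ("sqrt", "log2"):
    for n_trees in (1, 10, 100):
        model = RandomForestRegressor(
            n_estimators=n_trees, max_features=max_features, random_state=0
        )
        # scikit-learn reports negated MSE so that larger is better; flip the sign.
        scores = -cross_val_score(
            model, X, y, cv=cv, scoring="neg_mean_squared_error"
        )
        results[(max_features, n_trees)] = (scores.mean(), scores.std())

for (max_features, n_trees), (mean_mse, std_mse) in results.items():
    print(f"{max_features:>4} trees={n_trees:<3} MSE={mean_mse:.3f} +/- {std_mse:.3f}")
```

Fitting both targets in one multi-output forest is a design choice of this sketch; training one regressor per target would be an equally valid setup.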

**Table 4.** Summary of the RFR’s performance compared to that of DNN, HT, NN, SVM, DT, NB, and KNN in the four experiments.

Experiment | Performance Metrics | RFR | DNN | HT | NN | SVM | DT | NB | KNN |
---|---|---|---|---|---|---|---|---|---|
1. Detecting fault location | Overall accuracy for four fault locations | 84% | 72.5% | 27% | 18.75% | 14% | 2% | 8.25% | 41% |
2. Predicting fault duration | MSE | 1.1 s | 1.2 s | 1.1 s | 5.6 s | 6.5 s | 6.6 s | 6.2 s | 5.1 s |
2. Predicting fault duration | MAE | 0.6 s | 0.6 s | 0.6 s | 1.9 s | 2.2 s | 2.5 s | 2.2 s | 1.8 s |
3. Handling missing data | MSE | 4.6 s | 8.4 s | 8.7 s | - | - | - | - | - |
3. Handling missing data | MAE | 1.5 s | 2.09 s | 2.14 s | - | - | - | - | - |
4. Detecting fault in streaming data | Processing time per sample | 0.0028 ms | 0.0032 ms | 0.7 ms | - | - | - | - | - |
Overall ranking | | High | Medium | Low | Low | Low | Low | Low | Low |
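For reference, the two error metrics used in this summary can be computed as shown below. The sample durations are made up for illustration, and the function names are not taken from the study.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted| over all samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of (actual - predicted)^2; penalizes large errors more than MAE."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up fault durations in seconds: actual vs. predicted.
actual = [0.10, 0.20, 0.30, 0.40]
predicted = [0.10, 0.25, 0.20, 0.50]

mae = mean_absolute_error(actual, predicted)
mse = mean_squared_error(actual, predicted)
print(f"MAE = {mae:.4f}, MSE = {mse:.6f}")
```

Because MSE squares each error, a model with a few large misses can rank worse on MSE than on MAE, which is why the summary reports both.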


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

El Mrabet, Z.; Sugunaraj, N.; Ranganathan, P.; Abhyankar, S.
Random Forest Regressor-Based Approach for Detecting Fault Location and Duration in Power Systems. *Sensors* **2022**, *22*, 458.
https://doi.org/10.3390/s22020458
