1. Introduction
Driven by the unabated global demand for electrical energy, conventional power grid infrastructures have undergone continual upgrades to accommodate changing requirements, such as the high penetration of renewable energy. The emergence of smart electrical power grid technologies has delivered the desired enhancements, offering improved efficiency, reliability, and sustainability. Such far-reaching transformations invariably present challenges; a primary issue facing smart grids is the detection and management of faults. Intrinsic equipment malfunctions and failures, together with stealthy cyber-attacks, pose significant risks to the uninterrupted operability of smart grids, calling for grids equipped with robust fault detection to guarantee uninterrupted service. The complex nature of smart grids, characterized by the integration of diverse energy sources, advanced communication networks, and intelligent control systems, makes fault detection especially challenging. In contrast to traditional power grids, smart grids rely on the continuous exchange of real-time data processed by advanced algorithms to optimize energy flow, track the health of the system, and react to dynamic shifts in energy supply and demand. Due to this increased complexity, fault occurrences have a greater impact and must be detected and mitigated quickly to prevent widespread disruptions.
Traditional fault detection methods rely on large labeled datasets, rule-based thresholds, and predefined fault models. However, particularly in a rapidly changing grid landscape, these approaches frequently lack adaptability to evolving fault conditions. Furthermore, the efficacy of classic machine learning (ML) algorithms in identifying novel fault states is limited by their reliance on historical data. Modern power systems are becoming increasingly complicated, necessitating a move towards more sophisticated fault detection techniques that make use of state-of-the-art artificial intelligence (AI) methodology. Deep learning-based architectures, especially autoencoders, have shown promise in addressing these constraints. Because autoencoders are unsupervised models, they learn the underlying data representations instead of relying only on labeled fault cases, which gives them stronger generalization capabilities.
Furthermore, the incorporation of Generative AI for the creation of synthetic data has shown promise in augmenting scarce real-world datasets, resolving issues associated with data scarcity, and improving model resilience. Generative AI can enhance fault detection model training and testing by producing high-quality synthetic datasets, guaranteeing the models’ effectiveness in a variety of unexpected and varied fault scenarios.
This study investigates a two-pronged strategy for the detection of smart grid faults. First, a thorough comparison of several machine learning models, both supervised and unsupervised, is performed to evaluate how well they identify symmetrical and unsymmetrical faults. Among them, a fault detection system based on autoencoders is presented that exhibits enhanced precision in detecting unseen faults. Second, to assess the effect of synthetic data on model performance, a conventional dataset (D1) is contrasted with a synthetic dataset (D2) produced using generative artificial intelligence. To improve smart grid resilience and ultimately arrive at a more robust and adaptable fault detection framework, the study emphasizes the value of sophisticated machine learning architectures and high-quality synthetic datasets.
For smart grid systems to remain operationally reliable and sustainable, fault detection and categorization are essential. Traditional fault detection techniques frequently lack the accuracy and flexibility needed for contemporary power networks. Fault detection, classification, and mitigation techniques have greatly improved with the integration of machine learning (ML) and artificial intelligence (AI) models. With an emphasis on machine learning, deep learning, and generative AI approaches, this review of the literature examines current approaches and developments in fault detection within smart grids.
Generation, transmission, and distribution are among the operational scenarios in which smart grids are susceptible to a variety of problems. Ref. [
1] offers a thorough analysis of smart grid failure types and characterization, including advanced metering infrastructure (AMI), communication systems, cyberattack detection, and real-time monitoring. Economic issues that affect consumers and service providers must be taken into consideration by fault detection systems.
ML-based methods for smart grid fault detection have been the subject of numerous investigations. Ref. [
2] suggested a Recurrent Neural Network (RNN) model that uses voltage and current measurements to identify arc and pole-to-pole faults in electric vehicle (EV) charging systems. For validation, accuracy and F1-score were used. In a similar vein, ref. [
3] addressed consumer resistance to the deployment of smart meters, highlighting the function of monitoring and security applications in smart grids.
Concerns about voltage instability have grown as EVs and photovoltaic (PV) systems are integrated into power grids. Ref. [
4] proposed integrating smart meters as a way to address undetected voltage variations. In [
5], a 9-bus distribution system using the Fault Detection, Isolation, and Restoration (FDIR) technique was used to illustrate the self-healing potential of smart grids. Ref. [
6] explored how plug-in hybrid EVs affected grid stability and suggested coordinated charging to lessen disruptions. Rapid charging and vehicle-to-grid (V2G) capabilities were also examined in the study. Furthermore, ref. [
7] improved grid protection, especially in microgrid settings, by introducing a revolutionary anti-islanding strategy based on Support Vector Machines (SVM).
In microgrids, traditional protection strategies frequently fail, requiring ML-based alternatives. Ref. [
8] introduced a hybrid machine learning technique that combines Gaussian regression for localization and prediction with SVM for fault identification. The model was put to the test in conditions of load fluctuation and distributed generation (DG) penetration. By utilizing networked intelligent agents to maximize communication, control, and protection, multi-agent systems (MASs) have been investigated as a means of enhancing smart grid operations. The effectiveness of MASs in controlling power system functions, such as transmission switching, relaying, and plant control, is demonstrated by a study [
9] examining MAS-based smart grid control. However, while an MAS enhances decision-making in smart grids, its direct application to real-time fault detection and classification remains an area for further exploration.
The emergence of sensor faults in power systems demands prompt identification. Ref. [
10] utilized the Unknown Input Observer to ensure accuracy in fault detection under uncertain conditions, including renewable energy fluctuations and load variations. Ref. [
11] examined fault detection via the JADE platform using a multi-agent architecture, integrating EV batteries with PV systems for effective restoration.
Artificial Neural Networks (ANNs) have been widely employed for fault detection, classification, and location identification. Ref. [
12] utilized ANNs for detecting transmission line faults, leveraging their ability to handle nonlinear system volatilities. A model incorporating Empirical Wavelet Transform (EWT) and cyclic entropy for fault detection in EV inverters was proposed in [
13], effectively mitigating non-Gaussian white noise effects. A fault detection model utilizing Matching Pursuit Decomposition (MPD) and Hidden Markov Models (HMM) was presented in [
14]. The study demonstrated superior accuracy by grouping voltage and frequency characteristics through multiple algorithms. Similarly, ref. [
15] employed discrete wavelet transform to classify transmission line faults within a Simulink simulation environment.
Sparse autoencoders have been leveraged for overvoltage detection and classification, as demonstrated in [
16], which achieved feature extraction and dimensionality reduction without manual feature engineering. Ref. [
17] introduced an Isolation Forest-based anomaly detection scheme for smart grids, focusing on anomalies in current, voltage, and power consumption using real-world data. K-nearest neighbor (KNN) algorithms were employed for fault location in [
18], reducing errors induced by multiple factors and improving localization precision. Ref. [
19] compared the performance of three ML algorithms for fault detection, whereas [
20] proposed an Extreme Learning Machine (ELM)-based approach for fault detection in extensive EV charging stations, validated through Simulink simulations. As discussed in the Introduction, the complexity of smart grids, with their diverse energy sources, communication networks, and real-time data exchange, makes robust condition monitoring, fault detection and identification, and intelligent system frameworks indispensable to guarantee reliable operation [
21]. Foundational studies in mechanical, structural, and energy domains demonstrate how intelligent monitoring approaches can significantly enhance resilience and predictive capabilities, providing a strong basis for their application in smart grid environments [
22].
A comparative overview of recent studies is provided in
Table 1, which summarizes the key contributions and limitations of various approaches to smart grid fault detection and control. As shown in the table, traditional methods (e.g., threshold- or rule-based) often lack adaptability under diverse operating conditions, while machine learning-based techniques such as SVM, Random Forest, and Autoencoders demonstrate higher flexibility and accuracy. Moreover, recent works emphasize the role of synthetic data generation (e.g., via GANs) to overcome data scarcity, highlighting the importance of integrating generative models into the fault detection framework.
Traditional ML models often require high-quality training data, which is expensive and labor-intensive to obtain. Generative Adversarial Networks (GANs) have emerged as a promising solution for synthetic dataset generation. Ref. [
24] demonstrated the efficacy of GANs in producing synthetic load profiles that closely resemble empirical data, enhancing the accuracy of fault detection. Combining synthetic datasets with empirical data can markedly reduce electricity-consumption forecasting errors and strengthen risk management assessments. To explore this possibility, a GAN-based model that takes individual electricity consumption data as input can be used to generate synthetic data. We study one-dimensional time series, and numerical tests on an empirical dataset verify that GANs can produce synthetic data that closely resembles the real data.
To improve fault detection in power systems and smart grids, this study investigated the application of both conventional and generative techniques. Using two different datasets—one created using conventional techniques (D1) and the other using generative artificial intelligence (D2)—a comparison of supervised and unsupervised machine learning models was conducted. The study emphasizes how generative AI can produce artificial data that closely mimics real conditions, improving the precision and dependability of fault identification. We also investigate the role of autoencoders as unsupervised models. This work demonstrates how GAN-generated synthetic datasets can improve ML-based load models, increasing the robustness of smart grid applications in identifying random and unseen failures.
This work offers the following novel contributions.
Two-Dataset Method: For a thorough evaluation of fault detection performance, the research uses both synthetic datasets produced using generative artificial intelligence (D2) and traditionally maintained datasets (D1).
Superiority of the Autoencoder: a thorough assessment of the Autoencoder's capacity, as an unsupervised model, to identify faults outside its training dataset, hence improving the resilience of the smart grid.
This study makes contributions both to theory and practice. From a theoretical standpoint, it advances the integration of supervised and unsupervised machine learning approaches with generative models, demonstrating how synthetic data can overcome limitations of conventional datasets and improve generalization to unseen fault conditions. From a practical perspective, the proposed framework validates the effectiveness of GAN-generated data for smart grid fault detection, providing a scalable and adaptable solution that enhances grid resilience under diverse operating conditions. Together, these contributions highlight the dual value of the work: extending the methodological foundations of AI-based fault detection while offering actionable insights for real-world power system applications.
The overall structure of the paper is outlined as follows.
Section 2 presents the system model and methodology, including the simulation setup, dataset preparation, and machine learning models considered.
Section 3 describes the data collection from the smart distribution grid and the generation of synthetic datasets.
Section 4 provides the comparative analysis of supervised and unsupervised learning models under both conventional and random fault conditions.
Section 5 introduces the GAN-based framework for synthetic data generation and compares classifier performance using real and synthetic datasets. Finally,
Section 6 summarizes the key findings, discusses their implications for smart grid resilience, and highlights directions for future research.
2. System Model and Methodology
The methodology adopted in this research integrates detailed power system simulations with advanced machine learning and generative modeling techniques. The overall approach includes (i) defining the research problem and objectives, (ii) constructing and simulating the grid model, (iii) preparing both real and synthetic datasets, (iv) developing and tuning fault detection models, and (v) evaluating the models through multiple performance indicators.
The central research problem is the reliable detection of diverse fault types in distribution grids that integrate renewable resources. Traditional protection schemes are limited in adaptability under stochastic operating conditions, motivating the use of data-driven methods. The objectives of this work are (a) to benchmark conventional and AI-based fault detection methods, (b) to assess the contribution of synthetic data generated through Generative Adversarial Networks (GANs), and (c) to evaluate detection accuracy across operating conditions with variable distributed generation (DG).
The study is conducted on the IEEE 9-bus test system, implemented in MATLAB/Simulink. The network consists of multiple feeders and distributed generators modeled as constant power sources. To capture operational diversity, scenarios with varying levels of photovoltaic and wind generation are simulated. Faults considered include single line-to-ground, line-to-line, double line-to-ground, and three-phase faults. Each case records bus voltages, currents, and relay signals for subsequent feature extraction.
The dataset combines simulated fault cases with synthetic samples generated using GANs. The synthetic augmentation ensures balanced class distribution and addresses the limited availability of rare fault types. All features are normalized using min–max scaling. The dataset is divided into training and testing subsets using an 80:20 split, with five-fold cross-validation applied on the training set to enhance generalization.
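A minimal sketch of this preparation pipeline in Python with scikit-learn is shown below; the variable names and the use of a stratified split are assumptions, since the paper does not specify them.

```python
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.preprocessing import MinMaxScaler

def prepare_data(X, y, seed=42):
    """Min-max scale the features and create the 80:20 train/test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.20, stratify=y, random_state=seed)
    scaler = MinMaxScaler()
    X_train = scaler.fit_transform(X_train)  # fit scaling on training data only
    X_test = scaler.transform(X_test)        # reuse the same scaling for testing
    return X_train, X_test, y_train, y_test

# Five-fold cross-validation applied on the training set
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
```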
Four machine learning approaches are developed and tuned: Support Vector Machine (SVM with RBF kernel), Random Forest (200 trees, depth optimized by grid search), K-Nearest Neighbors (k = 5), and an autoencoder-based anomaly detector. Hyperparameters such as kernel width, learning rate, and maximum depth are optimized through grid search. The models are implemented in Python using the Scikit-Learn and TensorFlow libraries.
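The following sketch illustrates how these models and the grid search could be set up in scikit-learn; the candidate hyperparameter grids are illustrative assumptions, as the paper reports only the selected values.

```python
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

models = {
    # RBF-kernel SVM with C and gamma (kernel width) tuned by grid search
    "svm": GridSearchCV(SVC(kernel="rbf"),
                        {"C": [1, 10, 100], "gamma": ["scale", 0.1, 0.01]}, cv=5),
    # Random Forest with 200 trees; depth optimized by grid search
    "rf": GridSearchCV(RandomForestClassifier(n_estimators=200),
                       {"max_depth": [5, 10, 20, None]}, cv=5),
    # K-Nearest Neighbors with k = 5
    "knn": KNeighborsClassifier(n_neighbors=5),
}
# The autoencoder-based anomaly detector is built in TensorFlow/Keras
# (see the sketch in Section 4.1.4).
```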
The models are evaluated using accuracy, precision, recall, F1-score, and the Area Under the ROC Curve (AUC). Confusion matrices are analyzed to identify misclassification patterns among different fault types. This ensures both overall performance and class-wise robustness are quantified.
The complete research pipeline is summarized in
Figure 1. It illustrates the process from grid simulation, through synthetic data generation and preprocessing, to model training and evaluation.
The proposed methodology integrates system-level simulations, synthetic data augmentation, and advanced machine learning models to enable robust fault detection in a 9-bus distribution network. By simulating diverse fault scenarios and enriching the dataset with GAN-generated samples, the approach ensures adequate coverage of both common and rare operating conditions. The application of supervised and unsupervised learning models, coupled with rigorous training–testing protocols and cross-validation, provides a transparent and reproducible framework. This methodological design establishes a solid foundation for the comparative analysis presented in the following sections, enabling a fair evaluation of model accuracy, generalization capability, and resilience under stochastic distributed generation conditions.
4. Comparative Analysis
Four different machine learning algorithms—KNN, SVM, Random Forest, and Autoencoder—were applied in a case study on the dataset. The dataset we generated through simulation was used to train the machine learning models, and each model was examined. Accuracy score, loss, precision, recall, F1-score, the Receiver Operating Characteristic (ROC) curve, and the Area Under the Curve (AUC) were the primary metrics used to analyze each algorithm. The AUC denotes the area under the ROC curve; a value of 0.5 corresponds to random guessing, and values approaching 1 indicate strong discrimination. The ROC curve is constructed from the True Positive Rate (TPR) and the False Positive Rate (FPR), which are defined below. To identify the best algorithm for fault detection, the advantages and disadvantages of each algorithm were examined.
True Positive Rate is given as:

$$\mathrm{TPR} = \frac{TP}{TP + FN}$$

False Positive Rate is given as:

$$\mathrm{FPR} = \frac{FP}{FP + TN}$$
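As a quick illustration, both rates can be computed directly from the entries of a binary confusion matrix; the sketch below (assuming labels 0 = normal, 1 = fault) mirrors the definitions above.

```python
from sklearn.metrics import confusion_matrix

def tpr_fpr(y_true, y_pred):
    """Compute TPR and FPR from the binary confusion matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    tpr = tp / (tp + fn)  # True Positive Rate
    fpr = fp / (fp + tn)  # False Positive Rate
    return tpr, fpr
```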
To thoroughly assess the above-mentioned machine learning algorithms, it is crucial to consider factors such as computational efficiency, interpretability, and adaptability to diverse fault types. The simplicity and transparency of KNN render it an ideal foundational option, in contexts where interpretability of fault classification and detection processes is essential. The capability of SVM to manage non-linear relationships and high-dimensional data makes it a formidable option for intricate fault situations. Random Forest, with its ensemble approach, excels in capturing intricate patterns and providing robust predictions, while Autoencoder, leveraging neural network capabilities, proves effective in learning complex representations and detecting anomalies. The specific requirements of the power system and the nature of the problems detected ultimately determine which of these approaches is best.
4.1. Case 1: Training and Testing with Identical Fault Simulation Data
In the first case, both the training and testing datasets are obtained directly from the same pool of simulated fault data generated in the 9-bus Simulink model. Faults of different types and at different locations are simulated, and the dataset is randomly divided into training (80%) and testing (20%) subsets. This case provides a baseline evaluation of the models’ ability to learn fault signatures when the training and testing distributions are identical. Although it offers insight into model accuracy under controlled conditions, it does not reflect real-world variations where unseen operating conditions or new data distributions may occur.
4.1.1. K-Nearest Neighbor (KNN)
KNN is an instance-based, non-parametric learning method that uses the proximity principle. For a given dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ represents the input features and $y_i$ denotes the class label, KNN uses the majority class of the k nearest neighbors to predict the class label for a new input $x$. The classification rule is:

$$\hat{y} = \arg\max_{c} \sum_{i \in \mathcal{N}_k(x)} \mathbb{1}(y_i = c)$$

Here, $\hat{y}$ denotes the predicted class label, $k$ is the number of neighbors taken into account, $\mathcal{N}_k(x)$ is the set of the $k$ nearest neighbors of $x$, and $\mathbb{1}(\cdot)$ is the indicator function. KNN is a good option for preliminary research because it is easy to use and computationally efficient.
To validate the algorithm, the Accuracy score was computed and an accuracy of 99% was achieved as shown in
Figure 5. The classification report in
Table 6 gives the precision, recall, and F1 score of the model for both normal conditions and fault-induced cases. This algorithm produces the second-lowest value for the loss function, as shown in
Figure 6, and the AUC, shown in
Figure 7, is 0.9975, indicating highly efficient performance.
For this study, the KNN classifier was implemented with k = 5 using the Euclidean distance metric. The dataset was divided into 80% training and 20% testing, with random shuffling applied to avoid bias. A five-fold cross-validation strategy was employed to improve generalization and reduce overfitting.
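A minimal sketch of this KNN configuration follows, assuming the preprocessed splits from Section 2 (X_train, y_train, X_test, and y_test are placeholder names).

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
cv_scores = cross_val_score(knn, X_train, y_train, cv=5)  # five-fold CV
knn.fit(X_train, y_train)
test_accuracy = knn.score(X_test, y_test)  # accuracy on the 20% hold-out set
```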
4.1.2. Support Vector Machine (SVM)
SVM is a discriminative model that excels at binary classification tasks. Given a set of training data $\{(x_i, y_i)\}_{i=1}^{N}$, where $y_i \in \{-1, +1\}$, SVM finds a hyperplane that maximally separates the classes. The decision function is defined as:

$$f(x) = \operatorname{sign}(w \cdot x + b)$$

The parameters $w$ and $b$ in this equation are obtained during the training phase, and $w \cdot x$ denotes the dot product of the vectors $w$ and $x$. SVM is a flexible option for power system failure detection because of its capacity to manage high-dimensional data and non-linear decision boundaries.
Support Vector Machine is a supervised learning algorithm primarily used for classification tasks, and in this study, it was employed to classify data into two categories: “Normal” (no fault) and “With Fault” (fault present). Two variants of SVM were implemented—Support Vector Classification (SVC) and Nu-Support Vector Classification (NuSVC). Both can function as either binary or multi-class classifiers, but a binary classification approach was used in this study. The key difference between SVC and NuSVC lies in how they regulate support vectors: SVC uses the regularization parameter C, which controls the trade-off between maximizing the margin and minimizing classification errors, whereas NuSVC replaces C with $\nu$ (nu), a parameter that controls the proportion of support vectors and margin errors, making it more flexible in handling imbalanced datasets. The SVM classifier was trained using a radial basis function (RBF) kernel, with the regularization parameter $C$ and kernel coefficient $\gamma$ selected by grid search. An 80/20 train–test split with five-fold cross-validation was applied for consistent evaluation.
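A sketch contrasting the two variants is given below; the nu value is an illustrative assumption, and the C and gamma values stand in for the grid-searched settings.

```python
from sklearn.svm import SVC, NuSVC

svc = SVC(kernel="rbf", C=10.0, gamma="scale")      # margin/error trade-off via C
nusvc = NuSVC(kernel="rbf", nu=0.1, gamma="scale")  # nu bounds the fraction of
                                                    # margin errors / support vectors
for name, clf in [("SVC", svc), ("NuSVC", nusvc)]:
    clf.fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))
```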
From the simulation results, SVC outperformed NuSVC in terms of accuracy and loss value, indicating better classification performance. However, despite its structured classification approach, SVM had the lowest accuracy and Area Under the Curve (AUC) among all the machine learning algorithms evaluated. The AUC, which measures classification quality, confirmed that SVM performed the worst in
Table 7, making it the least preferred model for fault detection in the proposed system. Due to its lower accuracy and weaker classification ability in this fault detection context, SVM is not the most suitable algorithm for the given dataset. While it remains a powerful classifier in certain applications, its performance in detecting faults in smart grids was suboptimal compared to other machine learning methods analyzed in this study.
4.1.3. Random Forest
Several decision trees are combined in the ensemble learning technique Random Forest to produce reliable predictions. A bootstrap sample of the data is used to train each tree, and the trees' diversity is increased through random feature selection. The final prediction aggregates the individual tree predictions by majority vote:

$$\hat{y} = \operatorname{mode}\{T_1(x; \Theta_1), \ldots, T_B(x; \Theta_B)\}$$

where $B$ is the number of trees, $\Theta_b$ represents the set of parameters defining the $b$-th decision tree, and $x$ represents the features of the data. When it comes to managing intricate relationships in data, preventing overfitting, and producing accurate predictions, Random Forest shines. The Random Forest model was implemented with 100 estimators (trees) and a maximum depth of 10. As with the other models, an 80/20 train–test split and five-fold cross-validation were adopted to ensure robustness and reduce bias.
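A sketch matching the configuration reported here (100 trees, maximum depth 10); the remaining arguments are scikit-learn defaults.

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
rf.fit(X_train, y_train)
fault_scores = rf.predict_proba(X_test)[:, 1]  # class probabilities for ROC/AUC
```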
The Random Forest classifier achieved the highest accuracy among all the algorithms compared in this investigation. With the lowest loss value and an AUC of 1.0, the Random Forest classifier was the best ML algorithm among the supervised learning models examined in this study, giving exemplary results as shown in
Figure 5.
Figure 5. Accuracy comparison of different machine learning algorithms.
4.1.4. Autoencoder
An Autoencoder is built on a neural network architecture designed for unsupervised learning and is particularly effective for anomaly detection. Comprising an encoder and a decoder, it maps input data to a latent representation and reconstructs the input from this representation. Anomaly detection is achieved by measuring the reconstruction error, defined as:

$$E(x) = \| x - \hat{x} \|^{2}$$

In this equation, $x$ represents the input data, $\hat{x}$ its reconstruction, and the reconstruction error $E(x)$ quantifies the dissimilarity between the input data and its reconstructed form. Autoencoders are adept at capturing complex patterns in the data, making them suitable for power system fault detection. The Autoencoder used here is a fully connected unsupervised neural network trained using only clean input data, which also serves as its target output. The Autoencoder is of interest in this study for two reasons: it reduces the dimensionality of the data, and it learns the correlations between feature vectors, i.e., Autoencoders are not attack-specific. The Autoencoder gave an accuracy of 98%, as shown in
Figure 5, which is lower compared to other algorithms. The Autoencoder was configured with three hidden layers using ReLU activation, a latent dimension of 16, and trained with the Adam optimizer at a learning rate of 0.001. The model was trained exclusively on normal operating data, with evaluation based on an 80/20 train–test division and five-fold cross-validation.
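A hedged Keras sketch of this configuration follows (three ReLU hidden layers, latent size 16, Adam at 0.001, trained on normal samples only); the encoder/decoder widths, epoch count, and the 99th-percentile threshold are assumptions not stated in the paper.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

n_features = X_train.shape[1]
autoencoder = models.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(16, activation="relu"),   # latent representation (dimension 16)
    layers.Dense(64, activation="relu"),
    layers.Dense(n_features, activation="linear"),
])
autoencoder.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                    loss="mse")

X_normal = X_train[y_train == 0]           # train exclusively on clean data
autoencoder.fit(X_normal, X_normal, epochs=50, batch_size=32, verbose=0)

# Flag samples whose reconstruction error exceeds a threshold set on normal data
err_normal = np.mean((X_normal - autoencoder.predict(X_normal)) ** 2, axis=1)
threshold = np.percentile(err_normal, 99)
err_test = np.mean((X_test - autoencoder.predict(X_test)) ** 2, axis=1)
y_pred = (err_test > threshold).astype(int)
```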
The results indicate that Autoencoder and Random Forest are the most reliable algorithms for fault detection in smart distribution grids. Autoencoder’s ability to minimize reconstruction error and Random Forest’s high accuracy and low loss values make them suitable for ensuring grid resilience. KNN, while simple and interpretable, and SVM, although effective for specific scenarios, do not perform as well as Autoencoder and Random Forest in this context.
The results demonstrate that supervised learning algorithms, such as KNN, SVM, and Random Forest, are effective models for fault detection in a smart grid when trained on labeled datasets containing both normal and faulty conditions. However, their performance is highly dependent on the nature of the test data, particularly when it closely resembles the training data. In contrast, unsupervised learning models like Autoencoders operate differently; they are trained exclusively on normal (clean) data and detect anomalies based on deviations from learned patterns. This distinction is critical when evaluating fault detection capabilities, as supervised models may exhibit high accuracy when tested on faults they have been explicitly trained on but struggle with unseen or random faults—fault conditions that were not included in the training dataset.
Random faults, such as high impedance faults, intermittent faults, or cyber-induced disturbances, introduce unpredictable variations that test a model’s ability to generalize beyond predefined faults. To ensure a fair comparison, the study evaluates how each model performs when tested with such unknown faults. This highlights a key limitation of supervised models, which tend to be fault-specific, whereas Autoencoders generalize better by identifying deviations from normal behavior. Therefore, for robust fault detection in smart grids, where unforeseen faults are common, Autoencoders offer a significant advantage in ensuring grid resilience.
The comparative analysis of machine learning algorithms for fault detection in transmission lines indicates that the Random Forest model is the most reliable, exhibiting consistently low log loss (
Figure 6) and high AUC values (
Figure 7). The SVM model also demonstrates excellent performance, particularly in capturing non-linear relationships and managing high-dimensional data (
Figure 5 and
Figure 6). Although KNN is a simpler approach, it achieves competitive classification accuracy and retains interpretability, making it suitable for baseline fault detection applications (
Figure 5 and
Figure 6). The Autoencoder, while effective for unsupervised anomaly detection, shows higher log loss (
Figure 6) and comparatively lower robustness in terms of AUC (
Figure 8). Overall, Random Forest and SVM emerge as the most effective models, offering a favorable balance of accuracy, adaptability, and interpretability across diverse fault conditions.
Figure 6. Log loss for case 1.
Figure 7. AUC of supervised learning algorithms.
Figure 8. AUC of Autoencoders.
4.2. Case 2: Training with Fault Simulation Data and Testing with a Random Fault
Accuracies of the supervised algorithms drop drastically when tested for a random fault; comparison of the accuracies is presented in
Figure 9. The supervised algorithms are inefficient at detecting a fault that lies outside the scope of their training data, which indicates that they succeed only at detecting the specific faults they were trained on. Therefore, the supervised algorithms in this case study (SVM, KNN, and Random Forest) are fault-specific in nature. This characteristic is the least desirable in the context of a smart grid, which is vulnerable to diverse kinds of faults because of the extensive integration of information and communication technologies and intelligent systems.
Autoencoders under the same condition retain an accuracy of 98%, similar to that of the previous condition. This can be attributed to the training process of Autoencoders, wherein the algorithm, trained only on clean data, learns the correlations between the data entries. Autoencoders can therefore detect any fault, since faulty data stands out from the clean data and fails to reproduce the correlation structure learned during training. As a result, Autoencoders are not fault-specific, which makes them an optimal choice for fault detection in smart grids and helps ensure enhanced grid stability and resilience. The log losses for the two cases are given in
Table 8.
The comparative analysis was carried out on a workstation equipped with an Intel Core i7 processor (3.2 GHz), 16 GB RAM, and an NVIDIA GTX 1660 GPU. All simulations of the distribution grid and fault scenarios were implemented in MATLAB/Simulink (R2022a). The machine learning models were developed in Python (version 3.9), utilizing the scikit-learn library for Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbors (KNN), and TensorFlow/Keras for the Autoencoder.
5. GAN-Based Algorithm for Generation of Synthetic Data in Fault Prediction
Building directly on the findings of the comparative analysis in Section 4, this section introduces a Generative Adversarial Network (GAN)-based approach for generating synthetic data (D2). The purpose of this extension is to address the limitations identified in the comparative analysis by augmenting the training dataset with diverse and realistic fault scenarios. By comparing the performance of machine learning models under both D1 and D2, we verify the effectiveness of GAN-generated data in enhancing fault detection accuracy and robustness.
The Generative Adversarial Network (GAN)-based algorithm is designed to generate synthetic data for fault prediction, serving as a critical tool for improving the reliability and efficiency of industrial systems. Because obtaining labeled data for ML models is arduous due to constraints such as privacy issues, this paper proposes a method that uses GANs to create synthetic data points that closely mimic the underlying distribution of the original dataset.
The synthetic data generation process using a GAN begins with preprocessing the dataset by cleaning, normalizing, and selecting relevant features, focusing on the extraction of numerical features for fault prediction. Once these features are selected, the GAN is trained to model the data as a combination of several Gaussian distributions, characterized by their means and covariances. After training, the GAN generates synthetic data by sampling from the learned distributions, ensuring the generated data replicates the original dataset's underlying patterns. For categorical features, the algorithm assumes independence and samples based on their observed distributions, thereby preserving the categorical characteristics. Finally, the sampled numerical and categorical features are combined to create a synthetic dataset that closely mirrors the original data distribution. This synthetic dataset is valuable for training ML models in situations where labeled data is limited or privacy concerns restrict access to openly accessible real-world datasets, thus enhancing the performance of fault prediction systems [
26].
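As a minimal illustration of the Gaussian-mixture-style sampling described above, the sketch below uses scikit-learn's GaussianMixture as a stand-in generator for the numerical features and draws categorical features independently from their observed frequencies; the function, column names, and component count are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture

def synthesize(df, numeric_cols, categorical_cols, n_samples, n_components=5):
    """Fit Gaussians (means and covariances) to the numeric features and resample;
    categoricals are sampled independently from their empirical distributions."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="full")
    gmm.fit(df[numeric_cols].values)
    X_num, _ = gmm.sample(n_samples)
    synth = pd.DataFrame(X_num, columns=numeric_cols)
    for col in categorical_cols:
        freq = df[col].value_counts(normalize=True)
        synth[col] = np.random.choice(freq.index, size=n_samples, p=freq.values)
    return synth
```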
The creation of accurate machine learning models depends on access to sufficient labeled data; however, obtaining a significant quantity of labeled data is frequently difficult because of practical limitations. Synthetic data generation offers an alternative, either by augmenting existing datasets or by producing new ones that closely resemble the original data. In this research, we focus on generating synthetic data for fault-type prediction using GANs.
In fields where data is expensive or hard to obtain, GANs have become a potent foundation for creating realistic synthetic data. A GAN consists of two neural networks, a generator $G$ and a discriminator $D$, trained concurrently in a minimax game. The generator $G(z)$, where $z$ is a noise vector drawn from a prior $p_z(z)$, learns to map from the latent space to the data distribution $p_{\mathrm{data}}$. The discriminator $D(x)$ learns to distinguish between real data samples $x \sim p_{\mathrm{data}}(x)$ and generated samples $G(z)$. The objective function of a GAN is given by:

$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
In this game, the discriminator seeks to maximize the objective, whereas the generator seeks to minimize it. The adversarial interplay between the two networks pushes the generator to create data that is nearly indistinguishable from the real data, increasing the model's ability to produce high-quality synthetic data for a range of uses.
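A compact TensorFlow sketch of this adversarial game is given below; the network sizes, noise dimension, feature count, and learning rates are illustrative assumptions, not the paper's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

latent_dim, n_features = 32, 6  # assumed noise and feature dimensions

G = models.Sequential([layers.Input(shape=(latent_dim,)),
                       layers.Dense(64, activation="relu"),
                       layers.Dense(n_features)])
D = models.Sequential([layers.Input(shape=(n_features,)),
                       layers.Dense(64, activation="relu"),
                       layers.Dense(1, activation="sigmoid")])

bce = tf.keras.losses.BinaryCrossentropy()
g_opt, d_opt = tf.keras.optimizers.Adam(1e-4), tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(x_real):
    z = tf.random.normal([tf.shape(x_real)[0], latent_dim])
    with tf.GradientTape() as gt, tf.GradientTape() as dt:
        x_fake = G(z, training=True)
        d_real, d_fake = D(x_real, training=True), D(x_fake, training=True)
        # Discriminator maximizes log D(x) + log(1 - D(G(z)))
        d_loss = bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)
        # Generator loss (non-saturating form of min log(1 - D(G(z))))
        g_loss = bce(tf.ones_like(d_fake), d_fake)
    d_opt.apply_gradients(zip(dt.gradient(d_loss, D.trainable_variables),
                              D.trainable_variables))
    g_opt.apply_gradients(zip(gt.gradient(g_loss, G.trainable_variables),
                              G.trainable_variables))
```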
This study investigated the use of both conventional and generative techniques to improve fault detection in power systems and smart grids. It uses two datasets—a synthetic dataset produced using generative artificial intelligence (D2) and a traditionally curated dataset (D1)—to compare supervised and unsupervised machine learning methods. More precisely, Dataset D1 consists of experimental data gathered using traditional techniques and thus corresponds to the conventional notion of a dataset. The study demonstrates that the synthetic data greatly increases fault detection accuracy and reliability by closely simulating real-world conditions.
As shown in
Figure 10, the process begins with the acquisition of two datasets: (i) Dataset D1 obtained from conventional simulation of grid faults and (ii) Dataset D2 generated using a GAN to mimic real operating conditions. Both datasets undergo preprocessing steps such as normalization and feature selection. The processed data are then used to train machine learning models (Random Forest, SVM, KNN, and Autoencoder). In addition, a voting-based ensemble classifier is applied to combine predictions from individual models. Evaluation metrics, including Accuracy, Precision, Recall, F1-score, log loss, and ROC–AUC, are calculated. Finally, the performance of the models trained on D1 and D2 is compared to validate the effectiveness of synthetic data in improving fault detection accuracy and robustness.
The results show how generative AI can produce high-quality datasets that improve fault detection processes, providing a more dependable and scalable solution for industrial applications (Figure 11).
To ensure fair comparability, both the conventional dataset D1 and the GAN-augmented synthetic dataset D2 are tied to the same Simulink grid model and the same set of fault locations. Dataset D2 is generated by training the GAN on the RMS voltage and current features derived from D1 and then sampling additional points that preserve the joint distribution of these features under the same grid topology and operating ranges. Consequently, all model training and testing using D2 remains grounded in the identical physical grid representation used for D1. The purpose of our approach is to anticipate fault types using synthetic data, which is crucial when addressing privacy issues or the scarcity of relevant, high-quality data. To illustrate this, we build a fault prediction system using ensemble learning with a Voting Classifier, relying on standard libraries such as pandas, NumPy, and scikit-learn for data preprocessing, model training, and evaluation. Our approach involves setting up several base classifiers, including Random Forest Classifier, Logistic Regression, Stochastic Gradient Descent (SGD) Classifier, and K-Nearest Neighbors (KNN) Classifier, and then creating a Voting Classifier that combines the predictions of these base classifiers through a hard voting strategy, as sketched below. The ensemble model is trained on the available dataset and used to predict fault types on the generated synthetic dataset, as described in the subsequent sections.
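A minimal sketch of this ensemble follows; the dataset variables (X_train_d1, y_train_d1, X_d2) are placeholder names for the D1 training data and the synthetic D2 features.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.neighbors import KNeighborsClassifier

ensemble = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100)),
                ("lr", LogisticRegression(max_iter=1000)),
                ("sgd", SGDClassifier()),
                ("knn", KNeighborsClassifier(n_neighbors=5))],
    voting="hard")                          # hard voting: majority of class labels

ensemble.fit(X_train_d1, y_train_d1)        # train on the available (D1) data
y_pred_synth = ensemble.predict(X_d2)       # predict fault types on synthetic D2
```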
5.1. Comparing Classifier Performance: Synthetic vs. Real Data
In fault detection tasks, synthetic data generated from Gaussian models offers valuable perspectives. When comparing the accuracy of classifiers trained on synthetic data rather than real-world data, noteworthy differences emerge. According to the examination, the Boosting Classifier (BC) [
27] and the Random Forest Classifier (RFC) [
28] both exhibit the best accuracy on synthetic data, with RFC slightly beating BC. According to these results, fault identification via synthetic data can be accomplished successfully while preserving competitive classifier performance [
29].
The findings in
Figure 11 show that some classifiers successfully use synthetic data to improve the accuracy of fault detection. Interestingly, when trained on synthetic data instead of real data, the KNN [
30] classifier shows a substantial increase in accuracy. On the other hand, classifiers such as Gaussian Naive Bayes (GNB) [
31] and Stochastic Gradient Descent (SGD) exhibit reduced accuracy on synthetic data. This implies that the synthetic data produced by Gaussian models might not accurately capture certain subtleties or features present in the general data, resulting in a decline in the models’ performance.
For many classifiers, the general increase in accuracy with synthetic data is apparent; nevertheless, additional research is necessary to understand the underlying dynamics behind these variations. Comparing the distribution of synthetic data to that of general data, spotting biases or deviations, and improving the synthetic data creation procedure to better match the features of the original dataset could all be part of this analysis. Furthermore, adjusting the Gaussian model parameters or investigating different synthetic data-generating techniques may improve the value of synthetic data in fault detection tasks.
The bar graph compares the accuracy of various classifiers trained on synthetic and on real data, with the synthetic data results displayed on the left and the general data results on the right. Each bar reflects the accuracy achieved by a particular classifier, and the performance gap between the two types of datasets provides insight into how well synthetic data generation strategies improve classifier accuracy for fault prediction tasks.
5.2. Impact of Synthetic Data Proportion on Model Performance
To further evaluate the effectiveness of GAN-generated data, we conducted an analysis by varying the proportion of synthetic data combined with real data during model training. Four scenarios were considered: 25% synthetic data, 50% synthetic data, 75% synthetic data, and 100% synthetic data. The evaluation was performed using Accuracy, Precision, Recall, F1-score, and ROC–AUC metrics.
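One way to run this mixing experiment is sketched below; the variable names and the Random Forest base model are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Vary the synthetic share of a fixed-size training set and score on a
# held-out real test set (X_real_train, X_synth, etc. are placeholders).
for frac in [0.25, 0.50, 0.75, 1.00]:
    n_total = len(X_real_train)
    n_synth = int(frac * n_total)
    n_real = n_total - n_synth
    X_mix = np.vstack([X_real_train[:n_real], X_synth[:n_synth]])
    y_mix = np.concatenate([y_real_train[:n_real], y_synth[:n_synth]])
    clf = RandomForestClassifier(n_estimators=100).fit(X_mix, y_mix)
    acc = accuracy_score(y_real_test, clf.predict(X_real_test))
    print(f"{frac:.0%} synthetic -> accuracy {acc:.3f}")
```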
The results in
Table 9 indicate that introducing synthetic data improves model performance when used in moderate proportions. In particular, adding 50–75% synthetic data yields the highest improvement in Accuracy and AUC. However, when the training dataset consists entirely of synthetic data (100%), a slight decline in performance is observed compared to the mixed scenarios. This suggests that GAN-generated data are highly effective for augmenting real data, but the best results are obtained when synthetic and real datasets are combined.
These findings validate the effectiveness of the data generated by GAN and highlight the importance of balancing synthetic and real data when developing robust fault detection models.