1. Introduction
The power industry is taking a dramatic turn toward a new era as distributed energy resources (DERs) are adopted at high rates, renewable energy becomes widely integrated, and the world moves toward sustainability and decentralization. In this developing environment, microgrids (MGs) emerge as one of the key elements of future electrical systems. A microgrid is an energy system that can work in island mode or interconnected with a main grid; it incorporates different energy generation sources (solar PV, wind turbines, diesel generators, and battery storage systems), local loads, and control equipment [1,2,3,4,5,6,7].
1.1. Background and Motivation
Microgrids are a crucial part of modern power systems due to their capabilities for integrating distributed energy resources, enhancing reliability, and supporting decarbonization. Their ability to operate in both grid-connected and islanded modes enhances the resilience of critical facilities but degrades the performance of conventional protection schemes. Within an AC microgrid, the most common faults are LG, LL, LLG, LLL, and LLLG faults, and their detection is essential to prevent serious system stability issues [8]. Additionally, due to the extensive application of power electronic converters, fault current characteristics have changed greatly, leading to reduced and uncertain fault levels compared to traditional power systems [9]. In this respect, LSTM and Bi-LSTM networks are well suited to time-series-based fault diagnosis in microgrids. LSTM models effectively capture the temporal dependencies in fault transients, whereas Bi-LSTM networks exploit information from both forward and backward sequences, thereby improving their ability to represent symmetrical and non-stationary fault events [10,11,12,13].
1.2. Challenges in Fault Detection
Several technical limitations make fault detection in microgrids challenging. Inverter-interfaced DERs contribute limited fault currents, which often causes conventional protection devices to underperform. Bidirectional power flow introduced by DER integration obscures fault direction and reduces relay selectivity [14,15]. In hybrid AC/DC microgrids, transient fault signatures are further distorted by nonlinear converter dynamics, while the non-stationary, short-duration nature of microgrid faults limits the effectiveness of traditional signal processing methods [16].
Moreover, fault signals in microgrids are diverse, rapid, and highly nonlinear, and are therefore difficult to characterize under stationary assumptions. In addition, practical constraints such as communication delays, sensor placement limitations, cyber vulnerabilities, and a lack of labeled real-world fault data add to the challenge of timely and precise diagnosis. These challenges call for adaptive, data-driven protection strategies. In this respect, deep learning models such as LSTM and Bi-LSTM are well suited to capturing temporal correlations and representing nonlinear fault characteristics in complex microgrid environments [17,18,19].
1.3. Machine Learning and Deep Learning for Fault Detection
Since voltages and currents have inherent features that can be learned by data-intensive models, various machine learning tools have been employed in power system protection. Classical classifiers such as SVM, Decision Trees, and k-NN have shown reasonable accuracy in identifying faults, but because they rely on manually designed features, they may not remain robust in noisy, higher-dimensional spaces. Deep learning models overcome these limitations, as they automatically discover discriminative features from large volumes of data. In microgrid protection, CNN, LSTM, and hybrid CNN-LSTM models have shown superior performance in identifying faults with complex characteristics such as nonlinearity and time variation. LSTM and Bi-LSTM models have specifically shown strong performance in modeling temporal relationships, and improvements have been observed in identifying symmetrical and overlapping faults with Bi-LSTMs thanks to bidirectional sequence learning [20,21,22].
Recent works have focused on embedding IoT technologies into microgrid management and control frameworks to make these systems more effective and to enable monitoring of microgrid operation. Studies, for example, have explored mechanisms for integrating EVs (electric vehicles) and DERs (distributed energy resources) and optimizing microgrid operation through an IoT-based energy management strategy. Furthermore, hierarchical IoT-based control architectures have been developed to improve stability and provide inertia support for modern DC microgrids. These studies show that intelligent, data-driven mechanisms for enhancing the reliability and resilience of microgrids are becoming increasingly important [23,24].
1.4. Existing Research and Limitations
Most existing works demonstrate the capabilities of neural networks for fault classification, but many are limited in scope. Pradhan et al. [1] presented an MLP-based classifier using phase current magnitudes, which attained an accuracy of approximately 97% but covered fewer fault types. Bu et al. [16] developed a CNN-attention-LSTM model, enhancing interpretability with higher accuracy in hybrid AC/DC microgrids. Other works utilized Bi-LSTM approaches for distribution network faults [17] and LSTM-based approaches for DC microgrid fault detection [11].
Key limitations remain: most works address only a subset of fault types, such as LG, LL, LLG, and LLL, while LLLG faults are rarely addressed, and few comparisons exist between LSTM and Bi-LSTM under the same microgrid conditions. This justifies a comprehensive, multi-fault comparative study.
1.5. Novel Contribution of This Work
Both the LSTM and Bi-LSTM models were chosen for this study because of their capacity to learn the nonlinear, time-dependent fault transients characteristic of microgrid data signatures. Traditional machine learning pipelines, and even many CNN-based fault classifiers, still depend on manually designed features to condition the model, whereas LSTMs learn directly from raw time-series input. CNNs are efficient at spatial feature extraction but lack a memory mechanism to retain information about progress through a data sequence, which LSTMs provide through their memory gates. Bi-LSTMs offer further gains because they exploit both forward and backward information when classifying faults such as LL, LLL, and LLLG. The models' accuracy, generalization ability, speed, and performance in an LLLG-fault case study validate the work's applicability to real-time microgrid protection systems.
This study presents a systematic comparison between LSTM and Bi-LSTM models for AC microgrid fault classification based on simulated three-phase current data. The contributions of this study are as follows:
- •
Comprehensive fault coverage: classification of five major fault types, namely LG, LL, LLG, LLL, and LLLG.
- •
Comparing performance: both models have been trained and evaluated under the same conditions so their performance can be compared fairly.
- •
Phase-level analysis: model behavior is checked for individual phase responses, in particular for the LLLG faults.
- •
Hyperparameter-optimized modelling: learning architecture and training parameters are refined for high accuracy.
- •
Better performance compared to classical models: the results show that Bi-LSTM consistently outperforms both classical ML classifiers and the standard LSTM.
1.6. Organization of Paper
The rest of this paper is organized into three sections that explain the methodology, evaluation, and conclusions of this study.
- •
Section 2 describes the methodological framework of microgrid modelling, fault scenario generation, preprocessing, and network architecture configuration.
- •
Section 3 describes the results: accuracy metrics, confusion matrices, ROC analysis, influence of learning rates, and performance in making predictions for every fault type.
- •
Section 4 summarizes major findings and discusses practical implications and limitations, while suggesting future extensions such as real-time implementation and hardware-in-the-loop validation.
2. Methodology
In the methodology, a modified AC microgrid is simulated in the Real-Time Digital Simulator (RTDS) environment to obtain current data for five different fault scenarios. To ensure data diversity, faults are applied at multiple locations with varied fault resistances and inception angles. Three-phase currents are recorded and preprocessed with normalization and segmentation, producing time-series signals suitable for LSTM and Bi-LSTM model training. Each network is fine-tuned for optimal performance via standard training parameters. The classification accuracy of each model is analyzed and compared by fault type to assess effectiveness in fault classification.
2.1. Microgrid System Model
A modified medium-voltage AC microgrid was modelled and simulated with Real-Time Digital Simulator (RTDS, V5) software for the purpose of analyzing and classifying different fault situations in a practical operating environment. The RTDS provides an accurate, high-fidelity representation of real-world power system behavior, making it an ideal platform for testing protection algorithms under varied fault conditions.
The simulated microgrid configuration closely resembles the ideal distribution-level microgrid system configuration with conventional and renewable energy generation sources. The system consists of a 3 MW diesel generator, a 2 MW wind turbine with a Doubly Fed Induction Generator (DFIG), and a 1.74 MW photovoltaic (PV) array with a two-level Voltage Source Converter (VSC) using Maximum Power Point Tracking (MPPT) to ensure efficient energy extraction. An optimal 11 kV nominal voltage value operates the microgrid, and reactive power support is realized via a 0.5 MVAR capacitor bank at Bus B1, which maintains voltage stability in the network for the entire circuit. The system may work in grid-connected mode or islanded mode, being controlled by the strategically positioned switches S1, S2, and S3 shown in
Figure 1. Switches S2 and S3 enable both radial and meshed network configurations, while the shift between grid-connected and islanded operation is accomplished through switch S1.
The microgrid is a radial distribution network composed of several buses interconnected through feeder lines. The distribution lines are modeled with an R–L impedance model to simulate realistic feeder behavior. Distributed energy resources and load buses are connected through these feeders, which carry the fault behavior of the network in the given scenarios. The capacitor bank, located at the main distribution bus, provides reactive power compensation to regulate voltage and improve system stability. The diesel generator is a dispatchable generation unit with a speed governor for frequency regulation. The wind turbine's DFIG uses vector control for active and reactive power output, and the PV array is equipped with a VSC-based interface connected to the MPPT control.
The key electrical parameters and operating conditions used in the RTDS-based microgrid simulation are summarized in
Table 1, which provides a clear overview of the system configuration and improves the reproducibility of the proposed simulation framework.
2.2. Fault Scenario and Modelling
In order to effectively train and test the overall performance of the proposed deep learning models, a complete set of fault scenarios was modelled in the RTDS-based microgrid environment. The objective of the fault simulation was to portray realistic disturbances in the power system through modelling different types of short-circuit scenarios at alternate locations and other operational states.
Five major symmetrical and asymmetrical faults, which are among the most frequent faults in low- and medium-AC voltage microgrids, are considered in this study:
- •
Line-to-Ground (LG): one-phase contact to the ground.
- •
Line-to-Line (LL): two phases short-circuit without ground.
- •
Double Line-to-Ground (LLG): two phases short-circuit and ground.
- •
Three-Phase (LLL): balanced fault in all three phases.
- •
Three-Phase-to-Ground (LLLG): all three phases short-circuit to ground.
To provide a sufficiently diverse dataset for training the deep learning models, fault simulations were conducted at multiple points in the microgrid network. In particular, faults were applied to buses B1, B2, B3, and B5 in both grid-connected and islanded operating conditions. For each fault type, a number of simulations were carried out by varying the fault resistance and inception angle. Following signal segmentation with the sliding window method, about 10,000 labeled samples were generated and used for training, validating, and testing the LSTM and Bi-LSTM models.
The chosen fault parameters were representative of realistic disturbance conditions often encountered in distribution-level microgrids. The fault duration of 200 ms approximates the clearing time of protective relays used in medium-voltage networks. The 0.001 Ω–10 Ω resistance range is suitable for simulating both solid faults and high-impedance faults. Furthermore, variations in fault inception angles between 0° and 90° enable the study of faults initiated at different points of the AC waveform and, therefore, provide the flexibility to include different transient characteristics in model training.
The fault inception angle is used to establish the phase angle of the voltage waveform at which the fault is initiated. The instantaneous voltage and current values at the fault instant depend on the starting position of the waveform and lead to different transient current characteristics at different inception angles. Faults occurring near the voltage peak typically lead to greater transient current magnitudes, whereas faults initiated near the zero-crossing point usually give lower initial currents with different transient oscillations. The RTDS simulation exploits the fault inception angle of 0° to 90° for diverse patterns for the transient current generation at the point of common coupling (PCC). This variability promotes the diversity of data and allows the proposed deep learning models to learn different fault patterns under different operating conditions.
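The combinations of fault type, location, operating mode, resistance, and inception angle described above form a scenario grid. A minimal sketch of how such a grid can be enumerated is shown below; the specific step counts for resistance and angle are illustrative assumptions, not the paper's exact values.

```python
from itertools import product

# Hypothetical enumeration of the fault scenario grid; the discrete
# resistance and angle values are assumptions within the stated ranges.
fault_types = ["LG", "LL", "LLG", "LLL", "LLLG"]
buses = ["B1", "B2", "B3", "B5"]
modes = ["grid-connected", "islanded"]
resistances_ohm = [0.001, 0.01, 0.1, 1.0, 10.0]      # within 0.001-10 ohm
inception_angles_deg = [0, 15, 30, 45, 60, 75, 90]   # within 0-90 degrees

scenarios = [
    {"fault": f, "bus": b, "mode": m, "R_f": r, "angle": a}
    for f, b, m, r, a in product(fault_types, buses, modes,
                                 resistances_ohm, inception_angles_deg)
]
print(len(scenarios))  # 5 * 4 * 2 * 5 * 7 = 1400 distinct scenarios
```

Each dictionary in `scenarios` would drive one RTDS simulation run, which is how the dataset's diversity in transient characteristics is obtained.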
2.3. Data Acquisition and Preprocessing
The experimental setup of the proposed system is depicted in
Figure 2. The microgrid model was developed and implemented in real time using the Real-Time Digital Simulator (RTDS). The simulated system incorporates distributed generation resources, loads, and grid connections, which are modelled to operate in both grid-connected and islanded modes. Different fault conditions, such as LG, LL, LLG, LLL, and LLLG, were introduced at predetermined points to produce three-phase current signals for various operating conditions.
The current signals were acquired from the RTDS using GTAO analog output cards and connected to the dSPACE DS1104 controller board for real-time signal acquisition. The signals were also observed using a Digital Storage Oscilloscope (DSO) for verification purposes. The acquired signals were processed and transmitted to a host computer for preprocessing, normalization, and segmentation prior to training the LSTM and Bi-LSTM models. The RTDS-based hardware-in-the-loop (HIL) setup provides a realistic real-time environment for validating the proposed fault classification system.
The hardware-in-the-loop RTDS platform provides a realistic representation of microgrid dynamics, but a practical measurement system also introduces sensor noise, communication delays, and measurement inaccuracies. This study therefore evaluates the performance of the proposed deep learning models on controlled fault conditions generated in the RTDS-based microgrid environment, and further examines how measurement noise and other practical disturbances affect the proposed fault classification framework in order to assess its applicability to real-life microgrid deployments.
2.3.1. Data Acquisition
To represent fault-induced disturbances correctly, three-phase currents were measured at 20 kHz (a sampling interval of 50 µs), allowing both transient and DC components to be captured accurately. A total duration of 500 ms was used for each fault simulation, covering the pre-fault, fault, and post-fault phases. The instantaneous values of the phase currents Ia(t), Ib(t), and Ic(t) are recorded at the point of common coupling (PCC), since the PCC reflects the overall system behavior and remains sensitive to faults throughout the microgrid. A sample LLG fault current is shown in Table 2.
In order to increase the generalization and robustness of the classification model, the fault events were inserted into various operation conditions. These included the following:
- •
Various fault locations (Buses B1, B2, B3, B5);
- •
Variety of fault resistances from 0.001 Ω to 10 Ω;
- •
Variety of inception angles ranging from 0° to 90°;
- •
Grid-connected and islanded modes of operation.
The final dataset presents a diverse range of system responses under symmetrical and asymmetrical fault conditions and constitutes the data used for preprocessing, feature extraction, and classification with the deep learning models. This diversity helps ensure that the trained LSTM and Bi-LSTM networks can classify the chosen fault types with high accuracy under conditions representative of a real-world microgrid.
2.3.2. Windowing and Segmentation
To allow the deep learning models to properly learn temporal features from the current waveforms, the continuous current signals were split into fixed-length time windows. This method is known as windowing and is an important step to prepare the time-series data for input to LSTM (Long Short-Term Memory) and Bi-LSTM (Bidirectional Long Short-Term Memory) networks, which use sequential patterns in data for classification.
The three-phase current signal is represented as follows:

I(t) = [Ia(t), Ib(t), Ic(t)], 0 ≤ t ≤ T

where Ia(t), Ib(t), and Ic(t) are the instantaneous values of the phase A, B, and C currents, respectively, and T = 0.5 s is the total duration of the signal.
Since the sampling frequency fs = 20 kHz, each second of data contains 20,000 samples. To balance temporal resolution and computational efficiency, each signal was split into overlapping windows of L = 100 ms (N = 2000 samples per segment), represented as follows:

Wk = [I(tk), I(tk+1), …, I(tk+N−1)], Wk ∈ ℝ^(N×3)

Each segment Wk is a matrix of three-phase current samples for a particular time window. The segmentation used a sliding window with step size s < N, so that consecutive windows overlap and preserve continuity for sequential learning.
Each window segment was assigned a fault class label corresponding to the specific fault observed in that segment. The assignments were as follows: LG → 0, LL → 1, LLG → 2, LLL → 3, and LLLG → 4. With these class labels, a structured dataset was generated:

D = {(Wk, yk)}, k = 1, …, M

Here, D is the complete dataset, Wk is the input window, yk ∈ {0, 1, 2, 3, 4} is the fault label, and M is the total number of windows.
This segmentation methodology ensures that the deep learning models receive input sequences of equal shape while retaining the essential temporal information needed to separate fault types. In addition, overlapping windows increase the size and diversity of the training dataset, which potentially improves the classifiers' generalization to unseen data.
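The sliding-window segmentation above can be sketched in a few lines; the window length follows the stated N = 2000 samples, while the step size of 500 samples is an illustrative assumption.

```python
import numpy as np

def segment_signal(signal, win_len=2000, step=500):
    """Split a (T_total, 3) three-phase current array into overlapping
    windows of win_len samples with stride `step` (step value assumed)."""
    windows = []
    for start in range(0, signal.shape[0] - win_len + 1, step):
        windows.append(signal[start:start + win_len])
    return np.stack(windows)  # shape: (M, win_len, 3)

# 0.5 s of three-phase current at 20 kHz -> 10,000 samples per simulation
sig = np.random.randn(10_000, 3)
W = segment_signal(sig)
print(W.shape)  # (17, 2000, 3)
```

With a 500-sample step, consecutive windows share 75% of their samples, which is one way to realize the overlap and continuity described above.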
2.3.3. Normalization
Normalization is an essential preprocessing step when designing time-series-based deep learning models, especially when the input signals span different scales under various operating conditions. Without it, inputs with larger magnitudes can dominate the learning process, biasing the model and hindering convergence. All signal inputs were z-score standardized prior to training to guarantee homogeneous scaling and numerical stability.
The normalized value In(t) of a phase current signal I(t) is derived as follows:

In(t) = (I(t) − μ) / σ

where I(t) is the original current value at time t, μ is the mean of the current values in the time window, and σ is the standard deviation of the current values in the time window.
This process standardizes all input attributes (phase currents) to zero mean and unit variance, so the data are scaled to a similar range while their temporal relationships are preserved.
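A minimal sketch of the per-window, per-phase z-score normalization defined above; the small epsilon guarding against a zero standard deviation is an implementation assumption.

```python
import numpy as np

def zscore(window, eps=1e-8):
    """Per-phase z-score normalization of one (N, 3) current window,
    matching the formula above; eps avoids division by zero."""
    mu = window.mean(axis=0)       # per-phase mean over the window
    sigma = window.std(axis=0)     # per-phase standard deviation
    return (window - mu) / (sigma + eps)

# Arbitrary scale and offset to mimic differing signal magnitudes
w = np.random.randn(2000, 3) * 50 + 120
wn = zscore(w)
print(np.allclose(wn.mean(axis=0), 0, atol=1e-6))  # True
```

After this step, every phase in every window has zero mean and (approximately) unit variance regardless of the original fault current magnitude.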
Figure 3 shows that z-score normalization was applied independently to each phase of the three-phase current signals Ia(t), Ib(t), and Ic(t) across the entire segmented time periods. This preserved the waveform patterns while still differentiating the fault types (i.e., LG, LLLG, LL, and LLL) through phase-specific differences in signal amplitude and transient characteristics. The standardized input signals were then used for training and prediction in both the LSTM and Bi-LSTM models, which improved learning efficiency and translated into better generalization across diverse fault cases.
2.3.4. Dataset Splitting
To prevent data leakage from overlapping sliding windows, the continuous fault simulation events were first divided into independent training, validation, and testing sets. Proper dataset splitting is essential for evaluating the generalization of deep learning models and avoiding overfitting. The dataset was split into three non-overlapping subsets:
- •
70% training: used to update model parameters and learn the temporal information in the current signals.
- •
15% validation: used to assess the model performance and to tune hyperparameters throughout training.
- •
15% testing: used to evaluate the final performance of the trained model on previously unseen data.
After dividing the fault events into these subsets, sliding window segmentation and normalization were performed separately within each subset. This ensures that overlapping windows from the same fault event never appear in both the training and test sets, preventing data leakage and providing a valid evaluation of the proposed LSTM and Bi-LSTM models.
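The event-level split described above can be sketched as follows; the random seed and the use of integer event identifiers are assumptions for illustration.

```python
import numpy as np

def split_events(event_ids, seed=0, frac=(0.7, 0.15, 0.15)):
    """Split whole fault events (not windows) into train/val/test so that
    overlapping windows from one event never cross subsets."""
    rng = np.random.default_rng(seed)
    ids = rng.permutation(event_ids)
    n = len(ids)
    n_tr, n_va = int(frac[0] * n), int(frac[1] * n)
    return ids[:n_tr], ids[n_tr:n_tr + n_va], ids[n_tr + n_va:]

train_ev, val_ev, test_ev = split_events(np.arange(100))
print(len(train_ev), len(val_ev), len(test_ev))  # 70 15 15
```

Only after this split would each event's signal be windowed and normalized, so every window inherits its event's subset assignment.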
In
Figure 4, faulty three-phase currents are shown to represent four fault cases: LG, LLG, LLLG, and LL. The shaded area indicates the fault duration for each fault condition, which demonstrates the transience of the current waveforms.
2.3.5. Input Shape for LSTM and Bi-LSTM
Deep learning models, including LSTM and Bi-LSTM, necessitate a particular input format to enable sequential data processing. In this work, the current signals that were segmented and normalized were organized into three-dimensional tensors, each tensor preserving the temporal evolution of three-phase current waveforms over a defined period. The input architecture was designed to incorporate and maintain both time-dependent and phase-dependent information that is important for classification accuracy.
Figure 5 graphically presents the structural differences between the LSTM and Bi-LSTM structures that serve a similar purpose of performing fault classification with data represented in time series in this work.
LSTM Architecture
An LSTM (Long Short-Term Memory) network processes sequences in a forward manner and learns dependencies over time steps (past to present). The LSTM model takes input of the following shape:

X ∈ ℝ^(S×T×F)

where S is the batch size (number of samples per training round), T is the number of time steps per sequence (e.g., 2000 samples for 100 ms at 20 kHz), and F = 3 is the number of features, corresponding to the three-phase currents Ia(t), Ib(t), and Ic(t).
Figure 5a shows the proposed LSTM architecture used for fault classification in a microgrid. The three-phase currents (Ia, Ib, Ic) are given as sequential input to an LSTM layer with 100 memory units. The features of the input currents are learned, and a dropout layer follows with a dropout rate of 0.5. The extracted temporal features are fed to a fully connected dense layer of 50 neurons with ReLU activation. The final output layer contains five neurons with Softmax activation, representing the five fault classes (LG, LL, LLG, LLL, and LLLG). The Softmax function provides the probability distribution over fault classes for multi-class classification.
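The memory-gate mechanism the architecture relies on can be illustrated with a single LSTM cell step in NumPy. This is a didactic sketch of the standard LSTM update, not the trained Keras/TensorFlow model; the stacked weight layout and random initialization are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step: input (i), forget (f), output (o) gates and
    candidate (g). W: (4H, F), U: (4H, H), b: (4H,) hold stacked gates."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i, f, o = sigmoid(z[:H]), sigmoid(z[H:2*H]), sigmoid(z[2*H:3*H])
    g = np.tanh(z[3*H:])
    c_t = f * c_prev + i * g     # memory cell retains long-term context
    h_t = o * np.tanh(c_t)       # hidden state passed to the next step
    return h_t, c_t

F, H = 3, 100                    # 3 phase currents, 100 memory units
rng = np.random.default_rng(0)
W = rng.normal(size=(4*H, F)) * 0.1
U = rng.normal(size=(4*H, H)) * 0.1
b = np.zeros(4*H)
h, c = lstm_step(rng.normal(size=F), np.zeros(H), np.zeros(H), W, U, b)
print(h.shape, c.shape)  # (100,) (100,)
```

Iterating this step over the 2000 samples of a window yields the sequence representation that the dense and Softmax layers then classify.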
Bi-LSTM Architecture
The Bi-LSTM (Bidirectional LSTM) structure is an improvement on traditional LSTM by processing input sequences from both directions, forwards and backwards, allowing the model to learn information from both past and forward time steps. This capability provides the model with considerable strength in detecting patterns which involve both time directions.
The input shape is unchanged from the LSTM:

X ∈ ℝ^(S×T×F)
However, the internal structure is made of two LSTM layers running in parallel:
- •
A Forward LSTM, processing t = 1 to t = T.
- •
A Backward LSTM, processing t = T to t = 1.
At each time step, the outputs from both directions are concatenated to form a unified representation as follows:

ht = [ht^f ; ht^b]

where ht^f and ht^b represent the forward and backward hidden states of the LSTM layers.
Figure 5b shows the Bidirectional LSTM network. The Bidirectional Long Short-Term Memory (Bi-LSTM) network extends the standard LSTM architecture by processing data in both directions, which helps the network capture the full temporal dependencies within the three-phase current signal. This is significant because fault-induced disturbances affect the signal over a period of time. In the proposed model, the Bi-LSTM layer uses 100 memory units in each direction to learn bidirectional features from the input sequences. A dropout rate of 0.5 is applied to avoid overfitting, followed by a dense layer with ReLU activation for nonlinear feature transformation. As with the LSTM architecture, the Bi-LSTM model then employs an output layer of five neurons with Softmax activation for multi-class fault classification. Bidirectional processing improves the recognition of transient abnormalities and thus increases classification accuracy compared to the standard LSTM model.
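The forward/backward split and per-step concatenation can be sketched with a toy recurrent pass. A simple tanh RNN stands in for the full LSTM cell here, purely to show the bidirectional mechanics; all weights are random illustrative assumptions.

```python
import numpy as np

def run_direction(seq, H, rng):
    """Toy recurrent pass (tanh RNN standing in for an LSTM) over a
    (T, F) sequence, returning the (T, H) hidden-state trajectory."""
    W = rng.normal(size=(H, seq.shape[1])) * 0.1
    U = rng.normal(size=(H, H)) * 0.1
    h = np.zeros(H)
    outs = []
    for x_t in seq:
        h = np.tanh(W @ x_t + U @ h)
        outs.append(h)
    return np.stack(outs)

rng = np.random.default_rng(0)
seq = rng.normal(size=(50, 3))                    # 50 time steps, 3 phases
h_fwd = run_direction(seq, 100, rng)              # processes t = 1 ... T
h_bwd = run_direction(seq[::-1], 100, rng)[::-1]  # t = T ... 1, re-aligned
h_bi = np.concatenate([h_fwd, h_bwd], axis=1)     # per-step concatenation
print(h_bi.shape)  # (50, 200)
```

Note that the concatenated representation has twice the width of a unidirectional layer (here 200 vs. 100), which is why a Bi-LSTM with the same unit count carries more context into the classifier.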
2.4. Hyperparameter Configuration
The performance and generalization ability of deep learning models largely depend on different hyperparameters. In our work, hyperparameter optimization for both the LSTM and Bi-LSTM models was carried out in a systematic manner to maximize accuracy while minimizing overfitting and training time.
Both the LSTM and Bi-LSTM models were trained with the Adam optimizer at an initial learning rate of 0.001, which ensured stable convergence in the primary training experiments. In addition to this default configuration, numerous independent training runs with different learning rate values were conducted to examine their impact on model convergence and classification performance. The learning rate was chosen to balance fast convergence with training stability. A batch size of 64 was used for efficient learning and effective generalization. Both models used 128 memory units, which offer high capacity and good modelling of fault dynamics without excessive complexity. To control overfitting, a dropout rate of 0.2 was applied after the recurrent layers. Training ran for a maximum of 100 epochs, with early stopping if the validation loss did not decrease for 10 consecutive epochs. The categorical cross-entropy loss function and the accuracy metric were used for the multi-class fault classification task. The hyperparameters were identical in both designs to provide a balanced and fair performance comparison across all fault types.
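For reproducibility, the hyperparameters listed above can be collected in one configuration object; this is a plain summary of the stated values, not code from the original study.

```python
# Training configuration matching the hyperparameters stated in the text.
config = {
    "optimizer": "adam",
    "learning_rate": 1e-3,
    "batch_size": 64,
    "memory_units": 128,
    "dropout": 0.2,
    "max_epochs": 100,
    "early_stopping_patience": 10,   # stop after 10 stagnant epochs
    "loss": "categorical_crossentropy",
    "metrics": ["accuracy"],
}
print(config["learning_rate"], config["batch_size"])  # 0.001 64
```

Keeping both models' runs driven by this single dictionary is one way to guarantee the like-for-like comparison the study requires.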
2.5. Loss Functions and Evaluation Metrics for LSTM and Bi-LSTM
In this research, the LSTM and Bi-LSTM models are trained with the categorical cross-entropy loss function, which is suitable for fault-type detection in microgrids as a multi-class classification problem. This loss measures the discrepancy between the predicted probability distribution and the true class labels. During training, the loss function penalizes incorrect predictions more severely when the model assigns high probability to the wrong class. While fault datasets in power system applications sometimes suffer from class imbalance, the dataset used in this study was designed with an equal number of samples for all fault classes. The final dataset contained 30,000 samples evenly divided across the five fault types (LG, LL, LLG, LLL, and LLLG), i.e., 6000 samples per class, for balanced training. Thus, the proposed models could be trained with the standard categorical cross-entropy loss without additional class weighting. The models were trained using the Adam optimizer, and loss values were monitored for both the training and validation sets. The decrease in loss over epochs indicates stable learning and good generalization ability. Moreover, a comparison of the LSTM and Bi-LSTM models illustrates that bidirectional temporal learning reduces the loss and improves classification results.
Let xi ∈ ℝ^(T×F) be the ith window of sequence length T with F = 3 features, with the LSTM and Bi-LSTM producing class logits:

zi = W hi + b, pi = softmax(zi)

where K = 5 is the number of classes (LG, LL, LLG, LLL, and LLLG), hi is the sequence representation from the last hidden states, and W and b are trainable parameters.

With one-hot targets yi ∈ {0, 1}^K, the mini-batch loss for a batch β of size B is represented as follows:

L = −(1/B) Σ_{i∈β} Σ_{k=1}^{K} yik log pik

To mitigate class imbalance, a class-weighted cross-entropy can be used:

Lw = −(1/B) Σ_{i∈β} Σ_{k=1}^{K} wk yik log pik

where wk is the weight assigned to class k, yik indicates whether the true label of sample i is class k, pik is the predicted probability of class k for sample i, and the result is the average loss over all samples in the batch.
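A minimal NumPy sketch of the class-weighted categorical cross-entropy defined above; the toy batch and uniform weights are illustrative assumptions (uniform weights reduce it to the plain cross-entropy used in this balanced-data study).

```python
import numpy as np

def weighted_cross_entropy(y_true, p_pred, w, eps=1e-12):
    """Class-weighted categorical cross-entropy.
    y_true: (B, K) one-hot labels, p_pred: (B, K) softmax outputs,
    w: (K,) per-class weights; eps guards log(0)."""
    return -np.mean(np.sum(w * y_true * np.log(p_pred + eps), axis=1))

K = 5                                   # LG, LL, LLG, LLL, LLLG
y = np.eye(K)[[0, 2, 4]]                # three one-hot samples
p = np.full((3, K), 0.1)
p[np.arange(3), [0, 2, 4]] = 0.6        # 0.6 on the true class each time
w = np.ones(K)                          # uniform weights -> plain CE
loss = weighted_cross_entropy(y, p, w)
print(round(float(loss), 4))  # -log(0.6) = 0.5108
```

With every sample predicted at probability 0.6 on its true class, the loss is exactly −log(0.6), which matches the per-term structure of the formula above.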
Beyond the training and validation loss curves (monitored per epoch), we report performance using standard multi-class metrics computed on the held-out test set.
Accuracy (Acc) is represented as follows:

Acc = (1/N) Σ_{i=1}^{N} 1(ŷi = yi)

where ŷi = argmax_k pik is the predicted class for sample i, yi is the true class label, N is the total number of samples in the dataset, and 1(·) is the indicator function, equal to 1 when the prediction matches the true label and 0 otherwise.
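The accuracy formula above is a one-line computation; the toy labels are illustrative.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Multi-class accuracy: fraction of samples whose predicted label
    equals the true label, as in the formula above."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

y_true = [0, 1, 2, 3, 4, 4, 2, 1]
y_pred = [0, 1, 2, 3, 4, 3, 2, 1]       # one misclassification out of eight
print(accuracy(y_true, y_pred))  # 0.875
```

Precision, recall, and F1-score per class follow the same pattern from the confusion-matrix counts, which is how the per-fault-type results in Section 3 are obtained.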
2.6. Machine Learning and Deep Learning Approaches
In recent times, data-driven fault learning methods have become popular for analyzing microgrid disturbances because they address the inflexibility of rule-based protection systems. Traditional classifiers such as Decision Trees, SVMs, and Naïve Bayes, operating on current and voltage characteristics, have been applied; however, their reliance on manually designed features limits scalability in complex microgrid environments. Neural models such as the multilayer perceptron described by Pradhan et al. [1] do not require manually designed features, since they learn nonlinear mappings directly from data. The classification performance achieved on high-intensity faults with MLPs demonstrates the success of data-driven methods and serves as a stepping stone toward the advanced LSTM and Bi-LSTM models described in this work.
2.7. Methodology Flowchart
Figure 6 illustrates the entire procedure applied here for fault classification in microgrids with LSTM and Bi-LSTM networks. The method is structured in clear steps, making the workflow from generating data to validating models logical and repeatable.
A medium-voltage AC microgrid is simulated using the RTDS system in both grid-connected and islanded operating scenarios, with different fault configurations (LG, LL, LLG, LLL, and LLLG). Three-phase currents are measured at the PCC at a sampling rate of 20 kHz. The sampled signals are then segmented into fixed-size windows, normalized by z-score normalization, labelled according to fault type, and split into training and testing datasets. These segmented sequences are then classified by the LSTM and Bi-LSTM networks, optimized with the Adam algorithm. The classifiers are validated using the standard metrics of classification performance: accuracy, precision, recall, and F1-score. The complete system combines real-time simulation, signal processing, and deep learning into a single platform that enables the intelligent detection of faults in a microgrid system.
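The segmentation and z-score normalization steps can be sketched as follows; the window length, stride, and function names are illustrative assumptions, not the exact values used with the 20 kHz RTDS data:

```python
import numpy as np

def segment_and_normalize(signal, win_len, stride):
    """Chop a multichannel current signal into fixed-size windows and
    z-score normalize each window per channel.

    signal: (N, C) samples x channels (e.g. three-phase currents)
    returns: (num_windows, win_len, C)
    """
    n = signal.shape[0]
    starts = range(0, n - win_len + 1, stride)
    windows = np.stack([signal[s:s + win_len] for s in starts])
    mu = windows.mean(axis=1, keepdims=True)
    sigma = windows.std(axis=1, keepdims=True) + 1e-8   # avoid divide-by-zero
    return (windows - mu) / sigma

# 1000 samples of a synthetic 3-phase signal (placeholder for PCC currents)
rng = np.random.default_rng(0)
sig = rng.normal(size=(1000, 3))
X = segment_and_normalize(sig, win_len=200, stride=100)  # shape (9, 200, 3)
```

Each resulting window is a zero-mean, unit-variance sequence ready to feed the LSTM/Bi-LSTM input layer.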
3. Results and Discussion
In this section, we present and discuss the results obtained by applying the proposed LSTM- and Bi-LSTM-based fault classification models.
Table 3 presents the training, validation, and test set performance for the five categories of faults: LG, LL, LLG, LLL, and LLLG. The effectiveness of the models is measured using standard evaluation metrics, including accuracy, precision, recall, and F1-score, as well as confusion matrices.
Under islanded mode, both models maintained high classification accuracy, confirming that they do not depend on strong grid support or high fault current magnitudes. The Bi-LSTM model achieved superior performance even for symmetric faults, where traditional relays typically perform poorly. For the most complicated LLLG fault, the Bi-LSTM yielded 98.88% test accuracy, outperforming the LSTM model's 98.06%.
3.1. Training, Validation and Test Accuracy
Figure 7 shows the convergence behavior for both the LSTM and Bi-LSTM architectures in the LG, LLG, LL, and LLL fault categories. It is seen that both architectures converge rapidly during the first few epochs of training and then remain stable around saturation. However, the convergence in the case of Bi-LSTM is smoother, and its performance on the validation set is slightly better in all categories. For multi-phase faults (LLG and LLL), the bidirectional architecture exhibited better generalization and smaller fluctuations in validation accuracy than the standard LSTM. The small difference between the training and validation curves further depicts strong robustness, with minimal overfitting issues within the models.
The small fluctuations evident in LSTM learning curves are mainly due to the stochastic nature of gradient-based optimization and the differences in transient fault signals created by varying fault resistances and inception angles. Despite these fluctuations, the model converges to stable performance and achieves high classification accuracy on unseen test data, demonstrating satisfactory generalization capability.
Figure 8 depicts the learning curves for both LSTM and Bi-LSTM models in learning the LLLG fault. In spite of the observed fluctuations in the learning curve for the LSTM model, it succeeds in achieving higher test accuracy, which determines a good level of generalization capabilities. The Bi-LSTM model, being more complex in nature, takes longer to achieve higher accuracy in both the training and validation phases.
3.2. Confusion Matrix
Figure 9, below, shows the confusion matrices of the LSTM model for four different fault categories. In the LG fault case, the model demonstrates perfect classification performance, with a distinct separation between normal and faulty states. In the LLG fault case, a negligible number of faulty instances are classified as normal. In the LL fault case, a slightly higher level of misclassification is observed, which may reflect partial similarity between LL fault signatures and normal operation. A similar pattern is seen in the LLL fault case, where a small number of faulty instances are misclassified.
Figure 10 shows the confusion matrices achieved with the Bi-LSTM model in the different fault situations, highlighting its efficient learning ability. For the LG fault, perfect classification is attained, with a clear demarcation between normal and faulty states. In the LLG fault case, very few fault samples are classified as normal, demonstrating robustness in the mixed-fault mode. The LL fault shows better performance than with the unidirectional LSTM, with fewer misclassified fault samples, reflecting the enhanced learning capability of bidirectional processing. Similarly, for the LLL fault, the Bi-LSTM preserves near-perfect accuracy, with negligible misclassifications and correct classification of most fault samples. Collectively, these findings highlight the enhanced consistency and fault identification capacity of the Bi-LSTM model, especially in dealing with complex faults.
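For readers reproducing the confusion matrices, a minimal NumPy sketch follows; the toy label streams are illustrative, not the paper's data:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """cm[i, j] = number of samples with true class i predicted as class j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# illustrative binary case: 0 = normal, 1 = fault
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=2)
# rows index the true class, columns the predicted class
```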
The phase-by-phase assessment of LLLG fault identification based on the LSTM and Bi-LSTM models is shown in
Figure 11. In Phase A, both models classify effectively, although the LSTM records a larger imbalance between false-negative and false-positive samples; the Bi-LSTM reduces this imbalance, yielding a more stable decision boundary. Similar observations hold for Phase B, where classification accuracy is comparable for both models and the Bi-LSTM remains the more balanced of the two. For Phase C, the trend is unchanged, with the Bi-LSTM again showing the smaller error imbalance. In summary, although both architectures identify LLLG fault instances with comparable accuracy, the Bi-LSTM network shows better fault identification reliability, especially in maintaining a small number of false positives.
3.3. ROC Curve
Receiver Operating Characteristic (ROC) curves provide a visual assessment of each model's ability to discriminate fault conditions from non-fault conditions. ROC curves were obtained for each fault type with both models (LSTM and Bi-LSTM).
Figure 12 highlights the ROC properties of the LSTM and Bi-LSTM models for the four fault levels. ROC curves showcase the relationship between the true-positive rate and the false-positive rate based on a set of predetermined thresholds for a classifier in a binary classification problem. For the LG and LLG fault levels in
Figure 12a,b, both models are found to provide slightly short-of-perfect classification accuracy, with ROC-AUC values greater than 99.7%. In the LL fault case in
Figure 12c, very high discrimination power is retained, with ROC-AUC values of 99.96% and 99.97%, respectively, for the LSTM and Bi-LSTM models. In the LLL fault case in
Figure 12d, both models continue to provide excellent results, with a slightly higher ROC-AUC value of 99.60% for Bi-LSTM in comparison with 99.53% for the LSTM model.
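The ROC-AUC values reported above can be computed directly from classifier scores via the rank (Mann-Whitney) formulation; a small self-contained sketch with illustrative scores:

```python
import numpy as np

def roc_auc(y_true, scores):
    """ROC-AUC via the rank (Mann-Whitney) formulation: the probability
    that a randomly chosen positive scores higher than a random negative."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # pairwise comparisons; ties count as half a win
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# illustrative labels (1 = fault) and model scores
y = [0, 0, 0, 1, 1, 1]
s = [0.1, 0.3, 0.4, 0.35, 0.8, 0.9]
auc = roc_auc(y, s)   # 8 of 9 positive/negative pairs ranked correctly
```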
The ROC curve for the LLLG fault is shown in
Figure 13, which compares how well the LSTM and Bi-LSTM models each performed in detecting faults in the same faulty condition in all three phases. In
Figure 13a, the LSTM model performs a multi-class classification task in which each phase of the LLLG fault is designated as a separate class. All three classes have an AUC of 1.00, meaning there is no overlap between fault and non-fault samples. The Bi-LSTM's performance in a multi-label classification framework is demonstrated in
Figure 13b. Here, the model predicts whether A-, B-, and C-phase faults occur. Each label—FAULT (A), FAULT (B), and FAULT (C)—also receives an AUC of 1.00, indicating high accuracy.
3.4. Comparative Analysis: Precision, Recall, F1-Score
Figure 14 compares the LSTM and Bi-LSTM models on precision, recall, and F1-score for LLLG fault classification.
Figure 14a indicates that the LSTM achieves a precision of 0.98, a recall of 0.95, and an F1-score of 0.96. While the high precision shows that the model's fault predictions are mostly correct, the lower recall indicates that it misses some actual fault cases.
Figure 14b indicates that the Bi-LSTM performs better than the LSTM in all categories, with a precision of 1.00, a recall of 0.97, and an F1-score of 0.99. These metrics indicate that the Bi-LSTM was better at detecting and classifying LLLG faults, with no false positives and far fewer false negatives.
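Precision, recall, and F1-score follow directly from the confusion counts; a minimal sketch with hypothetical counts (not the paper's data):

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class metrics from true-positive, false-positive, and
    false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# illustrative counts for one fault class: no false positives, 5 missed faults
p, r, f1 = precision_recall_f1(tp=95, fp=0, fn=5)
```

With zero false positives the precision is 1.0, while the missed faults lower the recall, exactly the trade-off the F1-score balances.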
Table 3 demonstrates the performance results of the LSTM and Bi-LSTM models for the various fault types. Both models are highly accurate; however, the Bi-LSTM outperformed the LSTM because it better captures temporal dependencies in both directions of the analyzed signals, which led to better recall and F1-scores under higher-complexity conditions.
3.5. Accuracy Using Different Learning Rates
To study how responsive our considered models are to changes in the learning rate parameter, we performed further independent training trials at different learning rates while keeping the other hyperparameters fixed.
Figure 15 illustrates a performance comparison of the LSTM and Bi-LSTM with regard to their test accuracy across the LG, LLG, LL, and LLL fault conditions for different learning rates ($10^{-4}$, $10^{-3}$, $10^{-2}$).
The variation in learning rate has a different effect depending on the type of fault in both models. For LG faults, higher learning rates are preferred for Bi-LSTM; however, they are less sensitive to this hyperparameter in LSTM. When faults are LLG, the convergence is dominated by intermediate learning rates, with a slight edge for Bi-LSTM. For LL and LLL faults, higher learning rates lead to a decline in accuracy, particularly for Bi-LSTM models, which are more sensitive. Optimum and intermediate values of learning rates strike a good balance, with improved stability in Bi-LSTMs and higher robustness in LSTMs with lower learning rates.
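The sensitivity to learning rate observed here is a generic property of gradient-based optimization; the following toy sketch (gradient descent on a simple quadratic, not our LSTM) shows how a moderate rate converges fastest while an excessive rate diverges:

```python
def gd_final_loss(lr, steps=50):
    """Gradient descent on f(w) = 0.5 * a * w^2 with a = 10.
    The update converges only when lr < 2/a; larger rates diverge."""
    a, w = 10.0, 1.0
    for _ in range(steps):
        w -= lr * a * w          # gradient of f is a*w
    return 0.5 * a * w * w

# final loss after 50 steps at three learning rates
low = gd_final_loss(1e-3)        # converges, but slowly
mid = gd_final_loss(1e-2)        # near the optimum
high = gd_final_loss(0.25)       # exceeds 2/a = 0.2 and diverges
```

The same qualitative behavior appears in Figure 15: intermediate rates balance convergence speed and stability, while excessive rates destabilize training.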
Figure 16 describes the impact of different learning rates on the performance of the LSTM and Bi-LSTM models in detecting LLLG faults. In
Figure 16a, the LSTM model achieves maximum accuracy at a learning rate of $10^{-3}$ before performance drops considerably, indicating instability caused by excessive learning rates.
Figure 16b demonstrates that the Bi-LSTM model maintains almost perfect accuracy up to a learning rate of $10^{-3}$, after which it fails at $10^{-2}$ and, subsequently, $10^{-1}$. These trends confirm that, while the Bi-LSTM model yielded higher accuracy, it was also more sensitive to excessive learning rates, underscoring the need for rigorous hyperparameter tuning.
3.6. Actual vs. Predicted Fault Classes
In order to evaluate the practical performance of the proposed models, we made a direct comparison of the actual and predicted fault classes. This comparison serves to illustrate how well the LSTM and Bi-LSTM models generalize on previously unseen data. Predicted classes were compared with corresponding ground truth labels associated with all five fault types: LG, LLG, LLLG, LL, and LLL.
In
Figure 17, we show the alignment of actual vs. model-predicted fault classes over time, across fault scenarios, using LSTM. In
Figure 17a and
Figure 17b, representing LG and LLG faults, respectively, the model-predicted outputs (in red) resemble the actual fault occurrences (in blue), indicating high accuracy and only small time lags in fault detection. The LL fault in
Figure 17c provides comparable overall accuracy but shows discrepancies across a larger number of samples. Overall accuracy remains high, but some faults may be misclassified owing to similarities between the signal traces of fault and non-fault classes.
Figure 17d shows that predicted fault labels using LSTM for LLL faults significantly resemble the actual labels, with no significant deviation. This could suggest LSTM is capable of detecting symmetrical three-phase faults, provided the features are different between the fault and non-fault classes.
Figure 18 shows the time domain analysis of the actual and predicted fault types based on the Bi-LSTM model for all kinds of faults. In the LG and LLG faults in
Figure 18a,b, the predicted results track the actual fault types very closely, with high detection accuracy and few false alarms. In the LL fault (in
Figure 18c), the Bi-LSTM outperforms the traditional LSTM, correctly distinguishing between fault and non-fault conditions with fewer misclassifications. In the LLL faults (in
Figure 18d), the results continue to closely track the original fault patterns, establishing the Bi-LSTM model's capability to generalize fault patterns correctly.
Figure 19 highlights the time-series fault predictions of the LSTM and Bi-LSTM models for LLLG faults in all three phases. The LSTM model identifies the faults correctly, albeit with some discrepancies between the predicted and actual fault conditions. The Bi-LSTM model correctly identifies both the occurrence and the clearance of the faults in all three phases.
Overall, as shown in
Figure 19, the Bi-LSTM model's advantage over the LSTM lies in phase-level fault tracking combined with multi-label fault classification. Complex, non-conventional faults such as LLLG activate multiple fault labels simultaneously, making precise analysis of the per-phase signals and their labels necessary.
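One convenient way to quantify the small time lags between actual and predicted fault conditions is the sample delay between the two onsets; a minimal sketch with synthetic label streams (the sample indices are illustrative):

```python
import numpy as np

def detection_delay(actual, predicted):
    """Delay (in samples) between the first actual fault sample and the
    first predicted fault sample; negative means early detection.
    Assumes both streams contain at least one fault sample."""
    actual = np.asarray(actual)
    predicted = np.asarray(predicted)
    t_actual = int(np.argmax(actual == 1))     # index of first actual fault
    t_pred = int(np.argmax(predicted == 1))    # index of first predicted fault
    return t_pred - t_actual

# toy label streams: fault starts at sample 100, detected at sample 103
actual = np.zeros(200, dtype=int); actual[100:150] = 1
pred = np.zeros(200, dtype=int); pred[103:150] = 1
delay = detection_delay(actual, pred)          # 3 samples
```

At a 20 kHz sampling rate, a delay of a few samples corresponds to a fraction of a millisecond, well within typical protection-relay requirements.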
3.7. Output Layer Weights
In LSTM-based models, the weights at the output layer transform learned temporal features into fault class decisions. Being the last set of trainable parameters, these weights define the decision boundaries among different fault categories and reflect the relative importance of hidden unit activations.
Figure 20 shows the output-layer weight distributions of the LSTM model for the LG, LLG, LL, and LLL faults, making clear how each hidden unit contributes to classification.
It can be seen that there is a distinct pattern of weight for each fault type, implying the learning of fault-specific representations by the model. The coexistence of positive and negative weights in every faulty class indicates the selective amplification and suppression of features to increase class separability. Notably, LL and LLL faults have more widely spread magnitudes, indicating increased sensitivity with higher interaction of features for more severe fault conditions. In all, these non-uniform distributions confirm that the LSTM output layer is indeed effective in capturing the discriminative temporal fault characteristics.
Figure 21 presents the output-layer weights of the Bi-LSTM model for the four fault classes. They indicate how the network weights individual hidden units when predicting faults. These weights connect the final hidden state to the output layer and therefore strongly influence the decision. For LG and LLG faults, the weights show moderate dispersion with alternating positive and negative values, reflecting selective learning of bidirectional temporal features. In contrast, for LL and LLL faults, the weights vary more widely in magnitude, showing that the Bi-LSTM encodes more sophisticated temporal fault behaviors by learning from both forward and backward sequences. Strong peaks across various neurons convey that diverse temporal patterns contribute significantly to the final decision. Overall, the Bi-LSTM output layer exhibits improved discriminative capability in representing fault severity and structure.
In
Figure 22 below, the output-layer weight responses of the LSTM and Bi-LSTM models for the LLLG fault can be compared; these weights map the previous layer's neuron outputs to the corresponding fault labels.
In the LSTM case, the weight trends are smoother, implying a more linear feature-selection process for a given fault type. The overall effect of the unidirectional learning process is that dominant fault patterns are captured with less consideration of feature couplings. The weight plots for the Bi-LSTM show large fluctuations with changing signs and magnitudes, confirming that the forward and backward passes capture more complex features.
In summary, this comparison illustrates the superior representational capability of the Bi-LSTM for the complex dynamics of the LLLG fault, as indicated by its more diverse weight distribution.
3.8. Comparative Performance Analysis with Recent Methods
The performance of the LSTM and Bi-LSTM models proposed in this study is compared with a number of state-of-the-art machine learning and neural network-based approaches that have appeared in the recent literature. These include DT, SVM, NB, and MLP classifiers, which were also evaluated under identical fault test cases, namely LG, LLG, LL, and LLL faults.
While the traditional classifiers achieved between 91.4% and 96.3% test accuracy, the MLP model used in the base study [1] improved on this, reaching up to 98% accuracy in the case of LL faults. However, the deep learning models proposed in this study significantly outperform these methods. The LSTM model reaches test accuracies of 99.76% (LG), 99.75% (LLG), 98.90% (LL), and 98.08% (LLL), while the Bi-LSTM model further improves these results to 99.82%, 99.83%, 99.75%, and 98.24%, respectively, as shown in
Table 4.
Importantly, this work also introduces LLLG fault classification, which was not considered in previous comparative works. For such a severe symmetric fault condition, the LSTM model achieved 98.06%, while Bi-LSTM achieved 98.88%, confirming that bidirectional temporal learning is effective in capturing highly correlated multi-phase fault patterns.
In the bar chart in
Figure 23, the models are grouped by fault classification accuracy, ranging from traditional machine learning classifiers to deep learning architectures and covering five fault types: LG, LLG, LL, LLL, and LLLG. The traditional models, Decision Tree (DT), Support Vector Machine (SVM), Naive Bayes (NB), and Multilayer Perceptron (MLP), achieve at most 98% accuracy, with the MLP showing the strongest learning capability of the four. The LSTM and Bi-LSTM are markedly superior to the classical classifiers, both achieving more than 98% accuracy for every fault type considered.
Compared to the latest hybrid models, such as CNN-attention-LSTM [16], and Bi-LSTM-based protection schemes for distribution networks [13], the proposed framework yields competitive or superior performance while maintaining low structural complexity and fast inference. These results validate that the Bi-LSTM provides a reliable and scalable solution for intelligent microgrid protection under both conventional and extreme fault conditions.
Although advanced deep learning architectures (e.g., hybrid CNN-LSTM, GRUs, and attention-based sequence models) have been studied for power system fault analysis in recent years, this study focused on the ability of LSTM and Bi-LSTM networks to learn temporal features from experimental microgrid fault signals. Conventional ML classifiers were used as baselines to quantify the gain achieved by deep temporal feature extraction. The findings indicate that bidirectional learning provides better classification accuracy for microgrid fault signals than unidirectional learning. More complex sequence modeling approaches could be combined into hybrid architectures to enhance classification accuracy even further.
4. Conclusions
Fault detection and classification are essential for reliable and secure microgrid operation, particularly under the increasing penetration of distributed energy resources. This study presented LSTM- and Bi-LSTM-based fault classification models using three-phase current signals to address the limitations of conventional protection schemes under symmetrical and overlapping fault conditions. Five major fault types, including the rarely investigated LLLG fault, were successfully classified. The results demonstrate that both models significantly outperform existing MLP-based approaches, with the Bi-LSTM consistently achieving superior performance due to its ability to capture bidirectional temporal dependencies. Notably, the Bi-LSTM achieved a classification accuracy of 98.88% for the complex LLLG fault, which was addressed for the first time in this study. These findings confirm the suitability of the proposed models for real-time fault detection in smart microgrids and their potential integration into intelligent protection relays.
The proposed framework was validated in an RTDS-based simulation environment, but real-world applications face additional challenges, including sensor noise, communication delays, and measurement uncertainty. The next phase of our work will evaluate how well the proposed deep learning models perform under such non-idealities. Future work will also focus on validating the approach using real microgrid measurements, incorporating voltage and power-quality features, and deploying low-complexity architectures on embedded platforms for decentralized, high-speed fault protection.