Attention-Guided Multi-Task Learning for Fault Detection, Classification, and Localization in Power Transmission Systems

Alam, Md Samsul; Islam, Md Raisul; Fan, Rui; Alam Shazid, Md Shafayat; Hasan, Abu Shouaib

doi:10.3390/en18246547

by Md Samsul Alam ¹,*, Md Raisul Islam ², Rui Fan ¹, Md Shafayat Alam Shazid ³ and Abu Shouaib Hasan ¹

¹ Electrical and Computer Engineering, University of Denver, Denver, CO 80208, USA
² Institute of Information and Communication Technology, Bangladesh University of Engineering and Technology (BUET), Dhaka 1000, Bangladesh
³ Energy Science and Engineering, Khulna University of Engineering and Technology (KUET), Khulna 9203, Bangladesh
* Author to whom correspondence should be addressed.

Energies 2025, 18(24), 6547; https://doi.org/10.3390/en18246547

Submission received: 30 October 2025 / Revised: 1 December 2025 / Accepted: 11 December 2025 / Published: 15 December 2025

(This article belongs to the Special Issue Advanced Machine Learning and Data Analysis Technologies in Modern Energy Systems)

Abstract

Timely and accurate fault diagnosis in power transmission systems is critical to ensuring grid stability, operational safety, and minimal service disruption. This study presents a unified deep learning framework that simultaneously performs fault identification, fault type classification, and fault location estimation using a multi-task learning (MTL) approach. Using the IEEE 39–Bus network, a comprehensive data set was generated under various load conditions, fault types, resistances, and location scenarios to reflect real-world variability. The proposed model integrates a shared representation layer and task-specific output heads, enhanced with an attention mechanism to dynamically prioritize salient input features. To further optimize the model architecture, Optuna was employed for hyperparameter tuning, enabling systematic exploration of design parameters such as neuron counts, dropout rates, activation functions, and learning rates. Experimental results demonstrate that the proposed Optimized Multi-Task Learning Attention Network (MTL-AttentionNet) achieves high accuracy across all three tasks, outperforming traditional models such as Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP), which require separate training for each task. The attention mechanism contributes to both interpretability and robustness, while the MTL design reduces computational redundancy. Overall, the proposed framework provides a unified and efficient solution for real-time fault diagnosis on the IEEE 39–bus transmission system, with promising implications for intelligent substation automation and smart grid resilience.

Keywords: multitask learning (MTL); fault diagnosis; fault detection; classification of fault types; estimation of fault location; attention mechanism; hyperparameter optimization

1. Introduction

The stability and operational reliability of a power system depend heavily on the performance and integrity of its transmission network. Any disruption within transmission lines can significantly affect the continuity of the electrical supply, potentially leading to widespread blackouts and economic losses. These transmission lines, while essential, are prone to various types of short-circuit faults, both symmetrical and asymmetrical, which threaten the secure delivery of electrical energy across the grid [1]. Such faults often arise from natural hazards such as storms or lightning, insulation failures, equipment failures, or human error, all of which require immediate attention to prevent prolonged outages and equipment damage [2].

As modern power grids evolve with the integration of distributed energy resources (DERs), automation systems, and wide-area monitoring technologies, the need for accurate and real-time fault diagnosis has become increasingly critical [3,4]. The effectiveness of fault identification and classification depends largely on the quality and diversity of the underlying dataset. In practice, datasets must encompass a wide array of operational scenarios, including different loading conditions, fault types, resistance levels, and system topologies. Without capturing such variability, diagnostic models may fail to generalize when deployed in actual grid environments. Therefore, developing realistic and comprehensive datasets that closely reflect practical network behavior is fundamental to advancing fault diagnosis capabilities in contemporary power systems.

Despite notable advancements in the application of machine learning and deep learning for power system fault analysis, many existing studies suffer from a lack of realistic data representation. Several works rely on public or simplified data sets that often exclude critical parameters such as fluctuating active/reactive power flows, unbalanced operating conditions, and variable fault resistances [5,6,7]. Moreover, a significant number of approaches address fault classification and location identification as separate problems, each requiring independent models or sequential processing pipelines [5,8]. While effective in controlled settings, these methods introduce computational inefficiencies and hinder scalability for real-time grid monitoring applications.

This research addresses the above limitations by advocating for an integrated approach. A unified framework capable of simultaneously detecting faults, classifying their types, and estimating their locations not only reduces computational overhead but also improves consistency and responsiveness. To make such a model reliable, it must be trained on data that realistically captures fault behavior across a broad spectrum of grid conditions. This motivation drives the need for an advanced model architecture and a robust dataset reflecting real-world network dynamics.

In this paper, we introduce a novel deep learning framework that unifies three essential tasks (fault detection, fault type classification, and fault location identification) in a single cohesive model. This multi-task learning (MTL) model, referred to as MTL-AttentionNet, is designed to enhance both performance and efficiency in power system fault analysis. The key contributions are:

Dataset generation: A comprehensive dataset is generated using the IEEE 39–Bus transmission system in DIgSILENT PowerFactory, incorporating diverse fault scenarios, probabilistic load variations ($\pm 5\%$), a wide range of fault resistances (0–50 $\Omega$), and spatially distributed fault locations (10% to 90% of line length). This realistic simulation setup captures the stochastic behavior of power systems and provides a robust foundation for developing and evaluating fault diagnosis models under practical grid conditions.
Unified multi-task framework: We propose a multi-task deep learning framework for end-to-end fault diagnosis in power systems, simultaneously addressing fault detection, classification, and location. The model integrates shared hierarchical feature extraction with task-specific adaptive branches, explicitly leveraging cross-task dependencies to enhance diagnostic accuracy and computational efficiency.
Raw signal-based feature learning: Unlike traditional ML approaches that depend on manual feature engineering, our framework leverages raw three-phase voltage and current samples from a single post-fault instant for fault diagnosis. This eliminates the need for complex signal processing or high-frequency time-series inputs, streamlining the modeling process and enhancing computational efficiency without sacrificing diagnostic accuracy.
Comprehensive benchmarking: We benchmark the proposed MTL-AttentionNet against state-of-the-art methods and baselines, including Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP). Across detection, classification, and location, the framework consistently outperforms these baselines, demonstrating robustness, superior diagnostic accuracy, and adaptability in diverse fault scenarios and network conditions.

The remainder of this paper is structured as follows. Section 2 reviews existing research related to fault diagnosis in power systems. Section 3 outlines the fault scenario simulation and dataset development process using the IEEE 39–Bus system. Section 4 introduces the MTL methodology and the formulation of the model. Section 5 presents the proposed attention-enhanced MTL framework, including architectural design and feature engineering. Section 6 reports experimental results, performance evaluation, and comparative analysis with baseline models. Finally, Section 7 summarizes the key contributions and concludes the study.

2. Review of Existing Works

A broad spectrum of fault diagnosis methods has been explored in transmission and distribution networks, each offering unique advantages and encountering specific limitations. Conventional techniques often rely on impedance-based estimation or traveling-wave analysis to detect and classify faults [9,10].

Although traveling-wave-based approaches can be accurate for high-resistance faults, they typically require higher sampling rates, fast data synchronization, and advanced communication infrastructure, all of which can pose practical challenges [9,11]. Impedance-based methods, on the other hand, are comparatively straightforward but may become less accurate in the presence of high fault impedances or complex lateral branches [12,13]. Wavelet transform-based solutions have also gained traction by effectively capturing and analyzing transient fault signals across multiple frequency bands; however, their computational load can become significant, especially when dealing with large-scale systems or higher decomposition levels [14,15,16]. Knowledge-based and expert system approaches have been proposed to leverage heuristic rules and historical data, but their robustness diminishes if the system deviates from previously observed operating conditions [17].

Over time, the increasing accessibility of higher-fidelity data, enabled by intelligent electronic devices, phasor measurement units (PMUs), and real-time digital simulators, has led to a surge in data-centric methods for fault diagnosis [4,18,19]. Researchers have begun integrating machine learning (ML) algorithms such as support vector machines (SVMs), artificial neural networks (ANNs), and deep learning architectures to classify and localize faults efficiently. In many cases, these ML-based methods demonstrate impressive classification accuracies exceeding 95%, with some even approaching or surpassing 99% [5,20,21]. However, their reliance on either specialized hardware (e.g., PMUs) [5,6] or computationally intensive hybrid feature extraction has limited their adoption in broader, real-world settings [5,22]. Moreover, several studies emphasize that although deep learning models like convolutional neural networks (CNNs) or hybrid CNN-LSTM techniques achieve high accuracy for fault classification, they often either do not tackle fault location or address it with limited precision [8,23,24,25]. In addition, the training datasets in some of these studies originate from relatively small or simplified networks, thus raising questions about scalability and real-world applicability [20,26,27].

Another common limitation is the discrepancy between simulation-heavy datasets and practical field conditions. Although certain works incorporate noise or data augmentation to mimic real scenarios [18,25,28], many still fall short in accounting for the full complexity of large-scale networks with varying load demand and uncertain fault resistance [25,29,30]. Even when advanced ML models, such as explainable CNNs or spatiotemporal graph networks, are developed for fault classification and localization, they often demand substantial computational resources or extensive instrumentation to achieve their reported accuracy [18,30]. Likewise, approaches using adaptive neuro-fuzzy inference systems often outperform standard neural networks, but at the cost of higher computational overhead, making real-time deployment challenging [20]. These constraints highlight the necessity for more efficient, scalable, and data-driven solutions that address both fault classification and precise fault location under realistic, large-scale operating conditions.

In recent years, multi-task learning (MTL) has also gained momentum in various power system applications, leveraging the interrelatedness of multiple tasks to enhance overall performance. For instance, an MTL-based framework employing a logistic low-ranked dirty model has been proposed for fault detection in PMU data, enabling improvements by exploiting shared features among different network locations [31]. In the context of smart grids, MTL has further been adopted for building load forecasting, where predicting both load and temperature simultaneously yields superior predictive accuracy compared to single-task approaches [32]. These advances highlight the versatility of MTL in addressing complex, high-dimensional power system problems. Nonetheless, to the best of our knowledge, no existing study applies MTL for fault identification, type classification, and location within a unified framework, despite the potential benefits of jointly optimizing these interconnected tasks in large-scale and dynamic networks.

3. Fault Scenario Simulation and Dataset Development

3.1. IEEE 39–Bus System Setup and Load Variation

The IEEE 39–Bus test system, depicted in Figure 1, was selected as the reference network for fault simulation due to its realistic complexity and widespread use in transmission system studies [33]. The system comprises multiple generators, transformers, and loads interconnected through transmission lines, offering a dynamic environment for modeling fault scenarios across diverse operational conditions.

Fault simulations were performed using DIgSILENT PowerFactory operating in engine mode, coupled with automated Python (v3.7) scripts. This integration enabled systematic variation in load conditions, fault injection, simulation control, and data extraction in an efficient and reproducible manner. To reflect real-world grid variability, active (P) and reactive (Q) loads at each bus were probabilistically perturbed using a normal distribution with a standard deviation of approximately $\pm 5\%$ around their nominal values. This approach ensured that the network remained operationally feasible while exhibiting realistic operating point fluctuations.

3.2. Fault Scenario Configuration and Data Generation

Fault scenarios were comprehensively designed to include typical conditions encountered in practical power systems, encompassing single-phase-to-ground faults, two-phase faults, two-phase-to-ground faults, three-phase-to-ground faults, and no-fault conditions. For each transmission line in the IEEE 39–bus network, these fault types were simulated at multiple points along the line, covering fault locations from 10% to 90% of the line length. To capture the effects of varying fault severities, fault resistances were randomized between 0 $\Omega$ and 50 $\Omega$ across different scenarios. Each case was labeled with its corresponding fault type and location class, and, over all iterations, the procedure covered faults and healthy conditions on all transmission lines of the IEEE 39–bus system.

A total of 86,676 samples were generated, covering both faulted and healthy conditions under diverse operating scenarios. The complete distribution of fault types and sample counts is summarized in Table 1, providing a balanced dataset for robust machine learning model development.

As shown in Figure 2, the training dataset is generated through an automated co-simulation loop between DIgSILENT PowerFactory and Python. The script first initializes the PowerFactory environment and retrieves the list of all transmission lines. In each iteration, it selects one line and perturbs the bus loads around their nominal values using the probabilistic $\pm 5\%$ P–Q variation described in Section 3.1. The routine then decides whether to simulate a faulted or healthy (no-fault) operating condition. For faulted cases, the algorithm selects, in turn, the fault type (single-phase-to-ground, SLG; two-phase, LL; two-phase-to-ground, LLG; or three-phase-to-ground), the affected phase(s), the fault location along the line (10–90% of the line length), and the fault resistance (0–50 $\Omega$). An RMS dynamic simulation is then executed for the selected scenario over a fixed time window with identical start and end times for all scenarios; when a fault is present, it is always applied at the same instant within this window. Data from both ends of the line are recorded to form one labeled sample comprising the electrical measurements and the associated fault/no-fault label, fault type, and location class. This process is repeated for all transmission lines and across multiple operating conditions until the full dataset of 86,676 samples is obtained.
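The sampling logic of this loop can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' script: the PowerFactory calls are omitted entirely, and the names `sample_scenario`, `perturb_loads`, and `FAULT_CLASSES`, the phase-combination labels, the 20% share of healthy cases, the 10% location step, and the load values are all hypothetical choices made for the example.

```python
import random

# Hypothetical labels: three SLG, three LL, three LLG and one three-phase fault.
FAULT_CLASSES = {
    "SLG": ["AG", "BG", "CG"],
    "LL": ["AB", "BC", "CA"],
    "LLG": ["ABG", "BCG", "CAG"],
    "3PH": ["ABCG"],
}
LOCATIONS_PCT = list(range(10, 100, 10))  # 10%, 20%, ..., 90% of line length


def perturb_loads(nominal_pq, rng, sigma=0.05):
    """Scale each bus's (P, Q) by an independent Gaussian factor around 1.0."""
    return {
        bus: (p * rng.gauss(1.0, sigma), q * rng.gauss(1.0, sigma))
        for bus, (p, q) in nominal_pq.items()
    }


def sample_scenario(line, nominal_pq, rng, p_healthy=0.2):
    """Draw one labeled scenario description for a given line."""
    scenario = {"line": line, "loads": perturb_loads(nominal_pq, rng)}
    if rng.random() < p_healthy:  # assumed share of no-fault cases
        scenario.update(fault=False, fault_type=None, phases=None,
                        location_pct=None, resistance_ohm=None)
        return scenario
    fault_type = rng.choice(list(FAULT_CLASSES))
    scenario.update(
        fault=True,
        fault_type=fault_type,
        phases=rng.choice(FAULT_CLASSES[fault_type]),
        location_pct=rng.choice(LOCATIONS_PCT),
        resistance_ohm=rng.uniform(0.0, 50.0),
    )
    return scenario


rng = random.Random(0)
nominal = {"Bus 03": (322.0, 2.4), "Bus 04": (500.0, 184.0)}  # illustrative values only
scenarios = [sample_scenario("Line 03-04", nominal, rng) for _ in range(200)]
```

Each returned dictionary would then parameterize one RMS simulation. The enumeration above yields ten faulted phase combinations and nine location steps; whether the paper's class labels are defined exactly this way is an assumption.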

For each simulated case, the input feature set consists of electrical measurements obtained from both terminal buses of the selected transmission line. Specifically, the model uses the three-phase voltages and three-phase line currents (magnitudes and phase angles) recorded at the two line ends. These features capture phase-wise asymmetries and magnitude variations essential for fault identification and fault type classification. For the fault location task, the shared features are enhanced with line-level contextual information, including an encoded line identifier, the symmetrical-component voltages (positive, negative, and zero sequence) at both line ends, and the phase-angle measurements of the three-phase voltages. This results in a compact but informative feature representation tailored to each task within the MTL framework.
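The symmetrical-component voltages used by the location task follow from the standard Fortescue transformation of the three phase phasors. The sketch below assumes phasors given as magnitude and angle (in degrees) and an a-b-c phase sequence; the function names are hypothetical, and the paper does not state how these quantities were obtained from the simulation.

```python
import cmath
import math

A = cmath.rect(1.0, math.radians(120.0))  # Fortescue operator a = 1 at 120 degrees


def sequence_components(va, vb, vc):
    """Return (zero, positive, negative) sequence phasors from phase phasors."""
    v0 = (va + vb + vc) / 3
    v1 = (va + A * vb + A * A * vc) / 3
    v2 = (va + A * A * vb + A * vc) / 3
    return v0, v1, v2


def phasor(mag, angle_deg):
    """Build a complex phasor from magnitude and angle in degrees."""
    return cmath.rect(mag, math.radians(angle_deg))


# Balanced a-b-c set: only the positive-sequence component is non-zero.
v0, v1, v2 = sequence_components(phasor(1.0, 0), phasor(1.0, -120), phasor(1.0, 120))
```

For an unbalanced fault, the zero- and negative-sequence terms become non-zero, which is why these quantities carry useful information about fault type and position.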

4. Baseline Models and Proposed MTL Framework

4.1. Multi-Task Learning (MTL)

4.1.1. MTL Framework and Model Formulation

This study adopts the MTL approach as the foundational modeling framework for fault detection, classification, and location identification. MTL is a technique in which multiple related tasks are learned concurrently within a single neural network architecture. Rather than training isolated models for each task, MTL facilitates shared representation learning, enabling the model to exploit inter-task correlations. This approach improves generalization, reduces overfitting, and enhances overall model efficiency [32].

At the core of MTL is the concept of parameter sharing. The network consists of shared layers that extract general features useful across all tasks, followed by task-specific branches (or heads) that learn representations tailored to individual outputs. This architecture allows the model to learn both commonalities and distinctions between tasks, improving its ability to make accurate predictions under a variety of scenarios [34].

In a general MTL setup, let $\theta_{S}$ denote the parameters of the shared layers and $\theta_{i}$ represent the task-specific parameters for task $i$. The complete parameter set for a model handling $T$ tasks is expressed as:

$$\theta = \{\theta_{S}, \theta_{1}, \theta_{2}, \dots, \theta_{T}\} \tag{1}$$

Each task $i$ has its own loss function, $L_{i}(\theta_{S}, \theta_{i})$. These losses are aggregated to form a joint objective function:

$$L_{MTL}(\theta) = \sum_{i=1}^{T} \alpha_{i} L_{i}(\theta_{S}, \theta_{i}) \tag{2}$$

where $\alpha_{i}$ is a scalar weight controlling the influence of task $i$ in the combined loss. During training, gradients are computed for all parameters, and the shared layers are updated by combining the weighted gradients from each task:

$$\nabla_{\theta_{S}} L_{MTL}(\theta) = \sum_{i=1}^{T} \alpha_{i} \nabla_{\theta_{S}} L_{i}(\theta_{S}, \theta_{i}) \tag{3}$$

This collaborative optimization allows one task to support others through shared learning signals, promoting the development of more generalizable features.
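A toy numerical example may help make Equations (2) and (3) concrete. It is a deliberately simplified sketch: the shared parameter is a single scalar, there are no task-specific parameters, and each task loss is a hypothetical quadratic that pulls the shared parameter toward its own target.

```python
def joint_loss(theta_s, tasks, alphas):
    """Eq. (2): weighted sum of per-task losses."""
    return sum(a * loss(theta_s) for a, (loss, _) in zip(alphas, tasks))


def shared_gradient(theta_s, tasks, alphas):
    """Eq. (3): weighted sum of per-task gradients w.r.t. the shared parameter."""
    return sum(a * grad(theta_s) for a, (_, grad) in zip(alphas, tasks))


def quadratic(target):
    """Toy task: (loss, gradient) of a quadratic centred on `target`."""
    return (lambda t: (t - target) ** 2, lambda t: 2.0 * (t - target))


tasks = [quadratic(1.0), quadratic(-2.0), quadratic(4.0)]
alphas = [1.0 / len(tasks)] * len(tasks)  # equal task weights

# Plain gradient descent on the shared parameter.
theta_s = 0.0
for _ in range(200):
    theta_s -= 0.1 * shared_gradient(theta_s, tasks, alphas)
```

With equal weights, the shared parameter settles at the mean of the three targets, a compromise that no single task would choose on its own. Changing `alphas` shifts that compromise toward the more heavily weighted task.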

4.1.2. Architectural Components and Training Strategy

The MTL model architecture is designed to balance shared feature extraction with task-specific specialization, enhancing both generalization and accuracy. It begins with shared dense layers that transform input electrical measurements into high-level representations common to all tasks. These shared features enable the model to exploit inter-task correlations among fault detection, classification, and localization, improving data efficiency and reducing overfitting [34].

Following the shared backbone, the network branches into three task-specific heads: fault detection (binary classification with sigmoid activation), and fault type classification and fault location estimation (both multi-class classification with softmax activation). This structure preserves shared learning benefits while allowing each branch to optimize its respective diagnostic goal.

Each task employs its corresponding loss: binary cross-entropy for detection and categorical cross-entropy for classification and localization. In this work, all three tasks are given equal importance. Accordingly, we set $\alpha_{i} = 1/T$ for $i = 1, \dots, T$ in (2), so that the total multi-task loss reduces to the average of the task-specific losses [31]. Gradients from all tasks jointly update shared and task-specific parameters through backpropagation.
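The equal-weight loss can be written out for a single sample as follows. This is a sketch with hypothetical function names and made-up head outputs; in a TensorFlow/Keras implementation the same quantities would normally come from the built-in cross-entropy losses rather than hand-written functions.

```python
import math


def binary_cross_entropy(y_true, p):
    """Loss for the detection head (sigmoid output p in (0, 1))."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


def categorical_cross_entropy(true_index, probs):
    """Loss for a softmax head, given the index of the true class."""
    return -math.log(probs[true_index])


def mtl_loss(det, typ, loc):
    """Average of the three task losses, i.e. Eq. (2) with alpha_i = 1/3."""
    losses = [
        binary_cross_entropy(*det),
        categorical_cross_entropy(*typ),
        categorical_cross_entropy(*loc),
    ]
    return sum(losses) / len(losses)


# One faulted sample with made-up head outputs (shortened class lists).
loss = mtl_loss(
    det=(1, 0.9),                      # true label 1, predicted fault probability 0.9
    typ=(2, [0.05, 0.05, 0.8, 0.1]),   # true type class 2
    loc=(0, [0.7, 0.2, 0.1]),          # true location class 0
)
```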

To improve robustness, dropout and batch normalization are applied in both shared and task-specific layers, mitigating overfitting and stabilizing training in high-dimensional input spaces [34]. Performance for each task is tracked independently using accuracy, precision, recall, and F1-score to ensure balanced learning.

As shown in Figure 3, the workflow involves input preprocessing, shared feature extraction, branching into task heads, combined loss computation, and iterative parameter updates until convergence. Once performance thresholds are met, the finalized model forms the foundation of the proposed MTL-AttentionNet, optimized for real-time fault diagnosis in transmission systems.

4.2. Support Vector Machine (SVM)

Support Vector Machine (SVM) was adopted as a baseline for fault type classification owing to its solid theoretical foundation and reliability in high-dimensional feature spaces. SVM constructs an optimal hyperplane that maximizes class separation in a transformed feature space, effectively distinguishing fault categories based on electrical signatures. A multi-class SVM using a one-vs-rest strategy and an RBF kernel was trained on normalized voltage and current samples from simulated fault conditions. Although SVM is not inherently multi-task, it serves as a strong benchmark for evaluating the proposed MTL framework.

4.3. Multi-Layer Perceptron (MLP)

The Multi-Layer Perceptron (MLP) was employed as another baseline for fault detection, classification, and location estimation. It is a feedforward neural network capable of learning nonlinear mappings between electrical features and fault labels. A shallow MLP with two hidden layers, ReLU activations, and dropout regularization was trained using standardized measurements with the Adam optimizer. Binary and multi-class outputs were used for detection and classification tasks, respectively. Despite lacking task-sharing capabilities, MLP provides a valuable comparison to assess the standalone performance of deep learning models.

5. Proposed Model

5.1. Overall Architecture of the Proposed Framework

The proposed framework is designed to simultaneously address three critical tasks in transmission line fault diagnosis: fault identification, fault type classification, and fault location estimation. This integrated approach is rooted in the principles of multi-task learning (MTL), which facilitates shared feature representation while allowing task-specific outputs.

At the core of the model lies a shared representation module that processes fundamental electrical measurements, including three-phase currents and bus voltages. These shared features capture essential dynamics of the power system and serve as the input to an initial stack of dense layers, followed by dropout regularization. This part of the network is responsible for learning general patterns associated with power system behavior under various conditions, including healthy states and fault events. To further refine feature relevance, an attention mechanism is introduced after the shared dense layers. This mechanism assigns adaptive weights to different input features based on their contextual importance, enabling the model to emphasize measurements that are most indicative of fault presence, type, or location.

Following the attention-enhanced representation, the network branches into three task-specific output layers:

A binary classification head using a sigmoid activation function to determine whether a fault has occurred.
A multi-class softmax classifier for identifying the specific type of fault.
A second multi-class softmax classifier for estimating the location of the fault along the affected transmission line, expressed as discrete segments (e.g., 10% to 90% of line length).

This architecture allows shared learning of common fault features while preserving the flexibility to handle the unique demands of each task. By jointly modeling these tasks, the network benefits from shared gradient flows, where improvements in one task can positively influence the learning of another. This synergy not only boosts overall model performance but also facilitates practical deployment in real-time environments by eliminating the need for separate models for each diagnostic stage. The complete architecture is illustrated in Figure 4, highlighting the flow of information from shared input layers to task-specific outputs.
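The branching structure can be illustrated with a dependency-free forward pass. The sketch below is not the authors' TensorFlow/Keras model: it uses random untrained weights, a single shared ReLU layer, and omits dropout and the attention step of Section 5.3. The layer sizes (24 inputs, 16 hidden units) are illustrative assumptions; only the head sizes (one sigmoid output, ten fault types, nine location classes) follow the text.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: one output per (weight row, bias) pair."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def init_layer(n_out, n_in, rng):
    """Random small weights and zero biases (untrained, for illustration)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(shared, heads, x):
    """Shared ReLU layer followed by the three task-specific heads."""
    h = [max(0.0, v) for v in dense(x, *shared)]
    return {
        "fault": sigmoid(dense(h, *heads["fault"])[0]),
        "type": softmax(dense(h, *heads["type"])),
        "location": softmax(dense(h, *heads["location"])),
    }


rng = random.Random(0)
n_features, n_hidden = 24, 16  # illustrative sizes, not the tuned configuration
shared = init_layer(n_hidden, n_features, rng)
heads = {
    "fault": init_layer(1, n_hidden, rng),
    "type": init_layer(10, n_hidden, rng),
    "location": init_layer(9, n_hidden, rng),
}
out = forward(shared, heads, [rng.uniform(-1.0, 1.0) for _ in range(n_features)])
```

All three outputs are computed from the same hidden vector `h`, which is what lets the gradients of each head's loss shape the shared layer during training.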

5.2. Task Formulation

Effective feature selection is critical in building robust machine learning models, particularly when the goal is to simultaneously address multiple tasks. In this study, fault identification, fault type classification, and fault location estimation are formulated as three interrelated tasks, each drawing from a common pool of electrical measurements while also benefiting from task-specific information.

The shared encoder receives the three-phase voltage and current measurements from both line ends, while the location head additionally incorporates the encoded line identifier and sequence-component and phase-angle features to resolve spatially close fault locations.

Fault Identification relies on global indicators that differentiate healthy and faulty states. Key features include sudden changes in current magnitude or significant deviations in voltages, which signal the presence of a disturbance in the system.
Fault Type Classification relies on phase-wise differences in the measured quantities. The model receives the three-phase voltages and line currents at both ends of the selected line, which encode how each fault type distorts the balance between phases. For example, single-line-to-ground faults mainly affect one phase and its corresponding return current, whereas double-line and three-phase faults produce more symmetric but higher-magnitude changes across multiple phases. By learning patterns in inter-phase magnitude ratios and angle differences at the two line ends, the shared representation can reliably distinguish among the ten fault categories.
Fault Location Estimation additionally depends on line-specific contextual information. In this task, the shared electrical features are augmented with auxiliary inputs, including the encoded line identifier and the fault-type label, so that the model knows on which transmission line the event occurs and what kind of disturbance is present. This extra context helps disambiguate spatially adjacent locations that may exhibit very similar voltage and current signatures, enabling the location head to refine its prediction among the nine distance classes along the faulted line.

Thus, the shared feature set provides a common representation of the instantaneous electrical state, while each task head exploits a different subset of these signals and contextual variables tailored to its diagnostic goal.

5.3. Attention Mechanism

In the proposed MTL-AttentionNet, attention is applied on top of the shared latent representation rather than directly on the raw voltages and currents. Let $h \in \mathbb{R}^{d}$ denote the output of the shared dense layer (with $d = 256$ units in this work), which represents nonlinear combinations of the original measurements at both ends of the line. To account for the fact that not all latent dimensions contribute equally to fault diagnosis, we employ a feature-wise (channel) attention mechanism with a single attention head.

The attention module first computes an intermediate score vector

$$s = W_{a} h + b_{a}, \tag{4}$$

where $W_{a} \in \mathbb{R}^{d \times d}$ and $b_{a} \in \mathbb{R}^{d}$ are learnable parameters. These scores are then normalized using the softmax function to obtain attention coefficients

$$\alpha_{i} = \frac{\exp(s_{i})}{\sum_{j=1}^{d} \exp(s_{j})}, \quad i = 1, \dots, d, \tag{5}$$

which satisfy $\alpha_{i} \geq 0$ and $\sum_{i=1}^{d} \alpha_{i} = 1$. The attended shared representation is obtained by element-wise reweighting,

$$\tilde{h}_{i} = \alpha_{i} h_{i}, \quad i = 1, \dots, d, \tag{6}$$

and $\tilde{h}$ is subsequently fed to the three task-specific heads for fault detection, fault type classification, and fault location estimation. This design corresponds to a single-head, feature-wise attention layer (channel attention) acting on a static feature vector.
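Equations (4) to (6) translate directly into a few lines of code. The following is a minimal sketch with random, untrained parameters and a reduced dimension ($d = 8$ instead of 256) so the result is easy to inspect; the function name `feature_attention` is hypothetical.

```python
import math
import random


def feature_attention(h, w_a, b_a):
    """Eqs. (4)-(6): scores s = W_a h + b_a, softmax over features, then alpha * h."""
    scores = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(w_a, b_a)]
    m = max(scores)  # subtract the max to stabilise the exponentials
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    attended = [a * hi for a, hi in zip(alpha, h)]
    return attended, alpha


rng = random.Random(0)
d = 8  # the paper uses d = 256; a small value keeps the example readable
h = [rng.uniform(0.0, 1.0) for _ in range(d)]
w_a = [[rng.gauss(0.0, 0.5) for _ in range(d)] for _ in range(d)]
b_a = [0.0] * d
h_tilde, alpha = feature_attention(h, w_a, b_a)
```

Because the coefficients sum to one, perfectly uniform attention would assign $1/d$ to every dimension, which gives a natural reference level when reading a plot of average attention weights.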

From a power-system perspective, each component of $h$ captures a latent pattern constructed from the underlying features (e.g., phase imbalances, asymmetries between line terminals, or magnitude/angle combinations). The attention layer learns which of these latent patterns are most informative for the diagnosis tasks and amplifies them, while suppressing less relevant dimensions. In this way, the attention module acts as a learned feature-selection or gating mechanism on top of the shared representation.

To analyze the behavior of the attention mechanism, we examined the attention coefficients on the test set. Figure 5 illustrates the average attention weights of the shared latent features, sorted by importance, and highlights the top 30 dimensions. The distribution is clearly non-uniform: a relatively small subset of latent features receives higher attention weights, whereas many dimensions are assigned near-zero values. This indicates that the attention module concentrates the model’s capacity on a compact set of informative latent features derived from the measured voltages and currents, rather than treating all dimensions equally.

5.4. Hyperparameter Tuning with Optuna

Model optimization was performed using Optuna, an automated framework for hyperparameter tuning. Key parameters—such as hidden layer count, neuron size, dropout rate, activation type, and learning rate—were systematically explored within predefined ranges to identify the most effective configuration.

In each trial, Optuna instantiated an attention-based MTL model with sampled hyperparameters and evaluated its performance using the aggregated validation loss across fault detection, classification, and location tasks. Optimizing this single aggregated objective ensured balanced optimization across all outputs.

Using adaptive sampling and early stopping, Optuna efficiently converged to an optimal configuration, minimizing manual intervention. The best-performing model, retrained on the full dataset, achieved the lowest validation loss and demonstrated strong generalization for complex fault diagnosis scenarios. The final configuration is presented in Table 2.
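A search of this kind might be set up as sketched below. The search ranges, the parameter names, and the `build_and_validate` placeholder are assumptions made for illustration: the placeholder returns a synthetic value instead of training a network, and its arbitrary constants are not the tuned configuration of Table 2. Only `create_study`, `optimize`, and the `suggest_*` methods are actual Optuna calls.

```python
import math


def build_and_validate(params):
    """Placeholder for training the MTL model and returning its aggregated
    validation loss; a synthetic bowl-shaped function stands in here."""
    return (
        (params["n_layers"] - 2) ** 2
        + (params["units"] - 256) ** 2 / 256.0 ** 2
        + (params["dropout"] - 0.2) ** 2
        + (0.0 if params["activation"] == "relu" else 0.1)
        + abs(math.log10(params["lr"]) + 3.0)
    )


def objective(trial):
    """Sample one configuration and return the value Optuna should minimize."""
    params = {
        "n_layers": trial.suggest_int("n_layers", 1, 4),
        "units": trial.suggest_categorical("units", [64, 128, 256, 512]),
        "dropout": trial.suggest_float("dropout", 0.0, 0.5),
        "activation": trial.suggest_categorical("activation", ["relu", "tanh", "elu"]),
        "lr": trial.suggest_float("lr", 1e-4, 1e-2, log=True),
    }
    return build_and_validate(params)


def run_search(n_trials=30):
    """Run the study and return the best hyperparameters found."""
    import optuna  # imported here so the objective can be read without Optuna installed

    study = optuna.create_study(direction="minimize")
    study.optimize(objective, n_trials=n_trials)
    return study.best_params
```

Calling `run_search()` returns a dictionary of the best sampled values. In a real run, `build_and_validate` would train the network on the training split and return the sum or average of the three validation losses.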

6. Results and Performance

6.1. Experimental Setup and Evaluation Metrics

The experimental evaluation was conducted using a dataset comprising labeled transmission line scenarios, encompassing both normal operating states and various fault conditions. To ensure reliable model training and performance assessment, the dataset was divided into three distinct subsets: 70% for training, 15% for validation, and 15% for testing. This division was performed in a two-step procedure: initially separating the training portion from a 30% holdout set, and subsequently dividing the holdout set equally into validation and test sets.
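The two-step split can be sketched with index lists alone. The random seed, the absence of stratification, and the rounding rule are assumptions, since the paper does not state them; the function name is hypothetical.

```python
import random


def split_70_15_15(indices, seed=42):
    """Two-step split: hold out 30%, then halve the holdout into val/test."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_train = round(0.70 * len(shuffled))
    train, holdout = shuffled[:n_train], shuffled[n_train:]
    n_val = len(holdout) // 2
    return train, holdout[:n_val], holdout[n_val:]


# Sample indices for a dataset of 86,676 cases.
train_idx, val_idx, test_idx = split_70_15_15(range(86_676))
```

Fixing the seed makes the partition reproducible, so that baselines and the proposed model can be evaluated on exactly the same test cases.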

Model training and evaluation were carried out on a Windows 64-bit platform equipped with an Intel(R) Core(TM) i7-8750H CPU (2.20 GHz), 16 GB RAM, and an NVIDIA GeForce RTX 2060 GPU with 6 GB GDDR6 memory. The implementation used TensorFlow 2.x with Keras for model development, while Optuna was used to automate hyperparameter optimization.

To assess the model’s performance in fault detection, type classification, and location estimation, standard classification metrics were employed. These included accuracy, precision, recall, specificity, and F1-score, which are particularly useful in the presence of class imbalance. Confusion matrices were additionally examined to evaluate the distribution of correct and incorrect predictions across classes. Definitions and mathematical formulations of the evaluation metrics are presented in Table 3.
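For the multi-class tasks, the metrics in Table 3 are computed per class in a one-vs-rest fashion from the confusion matrix. The sketch below shows that computation; the function name and the 3-class matrix are hypothetical and serve only as an illustration.

```python
def one_vs_rest_metrics(cm, cls):
    """Compute the Table 3 metrics for class `cls` from a confusion matrix.

    cm[i][j] = number of samples with true class i predicted as class j.
    """
    total = sum(sum(row) for row in cm)
    tp = cm[cls][cls]
    fn = sum(cm[cls]) - tp                  # true class, predicted elsewhere
    fp = sum(row[cls] for row in cm) - tp   # other classes, predicted as cls
    tn = total - tp - fn - fp

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, total),
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical 3-class confusion matrix, for illustration only.
toy_cm = [
    [48, 2, 0],
    [1, 47, 2],
    [0, 3, 47],
]
m = one_vs_rest_metrics(toy_cm, cls=1)
print(m)
```

Because one-vs-rest accuracy also counts true negatives, it is usually higher than the recall of the same class. Table 5 shows this pattern: the 80% location class has an accuracy of 0.9801 but a recall of 0.8852.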

6.2. Fault Detection Results

The proposed MTL framework demonstrated excellent fault detection capability, reliably distinguishing between normal and faulty conditions. This task is treated as a binary classification problem and serves as the initial stage of the hierarchical fault analysis. It uses the shared features, which comprise three-phase voltages and currents from both line ends.

The model achieved an overall accuracy of 100% in fault identification. The confusion matrix (Figure 6) confirms perfect classification, with no false positives or false negatives recorded. Precision, recall, and F1-score values of 1.000 across both classes further highlight the model’s robustness and generalization ability.

6.3. Fault Type Classification Performance

The second task in the proposed MTL framework addresses fault type classification, formulated as a ten-class problem encompassing various fault scenarios, including single-phase-to-ground, two-phase, two-phase-to-ground, and three-phase-to-ground faults. Utilizing the shared representations extracted during fault detection, the model refines its decision boundaries to accurately assign each detected fault to its corresponding class.

The confusion matrix in Figure 7 confirms the high reliability of the classification task, with only minor confusion between neighboring fault types. The numeric class indices (0–10) in this figure follow the fault category mapping given in Table 1. Notably, A-B phase-to-ground faults exhibit slight misclassification with other line-to-ground faults, attributed to inherent similarities.

Class-wise performance metrics summarized in Table 4 reveal consistently high precision, recall, and specificity across all categories. Several fault types, including B-phase-to-ground, B-C-phase-to-ground, and three-phase-to-ground faults, achieved perfect scores across all evaluation metrics. Slightly lower, though still excellent, performance was observed for A-B and C-A phase faults, likely due to waveform overlaps in high-resistance or partial-load conditions. The weighted average F1-score reached 0.9998, demonstrating the robustness of the model even under complex and overlapping fault scenarios.

6.4. Fault Location Identification

The final task within the proposed MTL framework addresses fault localization along transmission lines, formulated as a multi-class classification problem across nine location classes (from 10% to 90% of line length). The model leverages shared features such as three-phase voltages and currents, along with task-specific inputs including line identifiers and fault types, emulating real-world hierarchical protection schemes.

The confusion matrix in Figure 8 demonstrates that most faults are accurately classified, with only minor confusion between neighboring segments, as expected due to gradual electrical transitions along lines. The model achieves an overall fault location accuracy of 98.82%, as summarized in Table 5. Higher F1-scores are observed for locations nearer to the sending end (e.g., 10%, 30%, and 40% segments), while marginally lower performance occurs toward the receiving end (e.g., 80% and 90%), where signal attenuation and reflections increase. A closer inspection of the 154 misclassified samples from 13,001 test samples shows that they are predominantly associated with higher fault resistances (e.g., $R_{f} > 30\ \Omega$) near these end segments, where the RMS voltage and current patterns of adjacent location classes become particularly similar.

An additional evaluation of fault resistance impact is illustrated in Figure 9 and Figure 10. Location accuracy remains near perfect for faults with low resistance but gradually declines as resistance exceeds 30 $\Omega$. This degradation, typical in protective relaying, results from weaker fault signatures at high resistances, underscoring the challenge of detecting high-impedance faults under noisy conditions.

Despite these variations, the proposed MTL model maintains strong localization capability, suggesting its practical utility for real-time fault isolation and system protection.

6.5. Comparative Analysis

Recent studies in fault classification and location have explored a wide range of ML and DL architectures with impressive reported accuracy, though system configurations and evaluation setups vary widely. For instance, Nawaz et al. [18] implemented eight classical ML models on a three-terminal 735 kV line and used 12,350 simulated faults generated in Simulink. Their best-performing model, Logistic Regression, achieved 99.97% classification accuracy, though fault location was not addressed. In another study, Fang et al. [30] proposed a multitask CNN trained on 20 kHz-sampled voltage and current waveforms from a 200 km, 220 kV line. Their model achieved 99.85% classification accuracy and 98.02% location accuracy. However, the reliance on high-resolution pre- and post-fault time-series data from a single-circuit TL model limits the generalizability and real-time applicability of the approach, as such high sampling rates are costly in both computation and data collection.

Other notable works have employed a variety of network types and features, generally balancing accuracy against practical complexity. Onaolapo et al. [35] used MLP-ANN models on a 400 kV, 29-bus GB system, leveraging windowed V/I ratios across fault stages and attaining a location error of 1% to 3%, depending on fault type. Belagoune et al. [36] adopted multitask LSTMs on the Kundur 4-Machine Two-Area system with 84-dimensional phasor inputs, reaching a classification accuracy of 99.93%. Chen et al. [37] employed a convolutional sparse autoencoder trained on half-cycle waveform data from a high-voltage TL and reported 99.74% accuracy in fault classification.

In contrast, our proposed MTL-AttentionNet delivers competitive performance, with 99.99% fault classification accuracy and 98.82% fault location accuracy on the complex IEEE 39-bus system. The model processes only single-sampled post-fault data and leverages a unified multitask architecture with shared feature extraction and task-specific branches. This design reduces dependency on dense time-series data, improving scalability for real-time applications. A comparison with existing single-task models is provided in Table 6.

To provide additional perspective, two baseline algorithms, SVM and MLP, were evaluated under identical dataset conditions and task formulations, with their architectures and hyperparameters tuned on the validation set to provide competitive baselines. As shown in Table 7, the MLP achieved a fault classification accuracy of 99.98%, closely approaching that of the proposed MTL-AttentionNet. However, its performance on fault location estimation dropped to 94.52%. SVM underperformed across all tasks, with a notably low location estimation accuracy of 84.47%.

In general, while previous work exhibits high performance under specific conditions, many approaches are constrained by the complexity of their feature engineering pipelines, system configurations, or data requirements. The proposed MTL-AttentionNet addresses these limitations by including probabilistic load variations and a wide range of fault resistances in the dataset, which strengthens the applicability of the model to real-world fault analysis.

6.6. Statistical Robustness and Practical Considerations

To assess the robustness of the proposed MTL-AttentionNet beyond a single training run, a stability analysis was performed over $K = 5$ independent runs with different random seeds. For each run, the model was trained on the same train/validation splits and evaluated on the held-out test set ($N$ = 13,001 samples). The resulting test accuracies were then summarized by their mean and standard deviation across the five runs. As reported in Table 8, all three tasks exhibit consistently high performance, with fault identification and fault type classification showing negligible variability across runs and fault location estimation maintaining a high mean accuracy with low variance. Specifically, the mean ± standard deviation of test accuracy over the five runs is $100.00 \pm 0.00$% for fault identification, $99.98 \pm 0.02$% for fault type classification, and $98.82 \pm 0.13$% for fault location estimation.

In addition, we report 95% confidence intervals (CIs) for the test-set accuracies of the final selected model, computed using a binomial proportion model with $N$ = 13,001 test samples. The resulting accuracies and 95% CIs are $100.0$% (CI: $[100.0, 100.0]$%) for fault identification, $99.98$% (CI: $[99.95, 100.0]$%) for fault type classification, and $98.82$% (CI: $[98.63, 99.00]$%) for fault location estimation (Table 8). The narrow confidence intervals and small standard deviations reflect both the large size of the test set and the stability of the optimization process, indicating that the reported performance is statistically robust and not the result of a single favourable initialization.
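As a worked check of the fault location interval, the sketch below applies the normal-approximation (Wald) interval for a binomial proportion to the 154 misclassified samples out of 13,001 reported in Section 6.4. The text only states that a binomial proportion model was used, so the choice of the Wald form and of $z = 1.96$ is an assumption made here for illustration.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) CI for a binomial proportion, clipped to [0, 1]."""
    p = successes / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


n_test = 13_001
n_errors = 154  # misclassified location samples reported in Section 6.4
low, high = wald_interval(n_test - n_errors, n_test)
print(f"[{100 * low:.2f}, {100 * high:.2f}] %")
```

With these inputs the interval evaluates to [98.63, 99.00] %, which agrees with the values listed in Table 8. Other binomial intervals, such as the Wilson score interval, would give slightly different bounds.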

From an implementation perspective, the proposed MTL-AttentionNet contains 173,845 trainable parameters, corresponding to approximately $0.66$ MB of weights in 32-bit floating-point representation. To obtain an indicative measure of computational cost, the average inference latency for a single forward pass was evaluated on a CPU-only environment (no GPU acceleration). On this experimental setup, the mean inference time per sample was approximately 61 ms over 500 repeated predictions. While these numbers are hardware-dependent and may vary across platforms, they suggest that the model has a modest memory footprint and an inference latency that is compatible with practical deployment in digital substations or dedicated embedded controllers, especially when implemented on modern industrial hardware or optimized with standard edge-inference tools.
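The footprint figure and the latency protocol can be sketched as follows. The parameter count and the 500 repetitions come from the text; the helper names, the warm-up count, and `dummy_predict` are hypothetical. `dummy_predict` is only a stand-in for a single-sample forward pass of the trained model, so the latency it produces is not comparable to the reported 61 ms.

```python
import time


def weights_size_mib(n_params, bytes_per_param=4):
    """Size of the weights in MiB (2**20 bytes), float32 by default."""
    return n_params * bytes_per_param / 2**20


def mean_latency_ms(predict, sample, repeats=500, warmup=10):
    """Average wall-clock time of predict(sample) in milliseconds."""
    for _ in range(warmup):  # warm-up calls are excluded from the timing
        predict(sample)
    start = time.perf_counter()
    for _ in range(repeats):
        predict(sample)
    return (time.perf_counter() - start) * 1000.0 / repeats


def dummy_predict(x):
    """Hypothetical stand-in for one forward pass of the trained model."""
    return sum(v * v for v in x)


size = weights_size_mib(173_845)
latency = mean_latency_ms(dummy_predict, [0.1] * 12)  # arbitrary toy input
print(f"{size:.2f} MiB, {latency:.4f} ms per call")
```

The computed size rounds to 0.66 when the unit is taken as $2^{20}$ bytes, which matches the figure quoted above.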

6.7. Discussion, Limitations, and Future Work

The results in Section 6.5 show that the proposed MTL-AttentionNet can accurately diagnose faults across three complementary tasks using a single unified architecture. Fault detection and fault type classification both achieve near-perfect accuracy, while fault location estimation attains high performance even for high-resistance faults and different line lengths. The attention analysis further indicates that the model concentrates on a small subset of latent features derived from the three-phase voltages and currents, suggesting that the attention layer effectively acts as a data-driven feature selection and gating mechanism within the shared representation.

Despite these strengths, several assumptions constrain the current scope. First, each sample is constructed from a single post-fault RMS snapshot of three-phase voltages and currents at the two line terminals, taken at a fixed instant after fault inception and before clearing. This phasor-based view is consistent with PMU/IED protection, but it does not explicitly exploit temporal dynamics, which may be important for the most challenging location cases on long lines and under very high fault resistance. Second, the framework assumes three-phase measurements at both ends of each line, as provided in modern digital substations; long-distance lines with only single-end measurements would require retraining on one-terminal features or additional estimation steps.

These aspects suggest several directions for future work. Extending the architecture to sequence models that process short time windows of phasor or waveform data, adapting the framework to single-end measurement scenarios, and adding further deep-learning and capacity-matched single-task baselines on the same dataset would provide a more comprehensive assessment of multi-task learning benefits. In addition, validating the approach on other benchmark networks and, ultimately, on field PMU data will be important to confirm its robustness and practical applicability in real transmission systems.

7. Conclusions

This study proposed a unified deep learning framework based on multi-task learning (MTL) for fault diagnosis in power transmission systems. The MTL-AttentionNet model integrates shared feature representations, task-specific branches, and an attention mechanism to capture both global and task-specific fault characteristics, enhancing prediction accuracy and model robustness.

Experimental evaluations demonstrated that the proposed model achieves high accuracy in fault detection and fault type classification, while maintaining strong performance in fault location estimation, even under challenging conditions such as high-resistance faults. Comparative analysis with benchmark machine learning and deep learning methods further confirmed the superiority of the proposed model in diagnostic precision.

By consolidating three critical diagnostic tasks into a single architecture, the MTL-AttentionNet framework demonstrates practical efficiency and reduced system complexity for PMU-like transmission-grid data. While the present study focuses on the IEEE 39-bus network, the diverse training scenarios, incorporating varying load conditions, line lengths, fault resistances, and fault locations, indicate that the approach is well-suited for extension to other networks. In future work, we plan to validate the framework on additional benchmark systems and field PMU datasets, and to investigate real-time implementations for intelligent substation automation, grid monitoring, and smart grid resilience applications.

Author Contributions

M.S.A. led the analytical work, methodology development, dataset generation, simulation studies, and writing of the paper. M.R.I. contributed to methodology formulation, data curation, and simulation validation. R.F. provided overall vision, guidance, supervision, and project administration. M.S.A.S. assisted with simulation setup, review and model implementation. A.S.H. provided guidance on model design, review, and editing of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets used and/or analyzed during the current study are available from the corresponding author on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Lu, D.; Liu, Y.; Liao, Q.; Wang, B.; Huang, W.; Xi, X. Time-Domain Transmission Line Fault Location Method With Full Consideration of Distributed Parameters and Line Asymmetry. IEEE Trans. Power Deliv. 2020, 35, 2651–2662.
2. Gururajapathy, S.S.; Mokhlis, H.; Illias, H.A. Fault location and detection techniques in power distribution systems with distributed generation: A review. Renew. Sustain. Energy Rev. 2017, 74, 949–958.
3. Khodayar, M.; Liu, G.; Wang, J.; Khodayar, M.E. Deep learning in power systems research: A review. CSEE J. Power Energy Syst. 2021, 7, 209–220.
4. Zhang, J.; He, Z.Y.; Lin, S.; Zhang, Y.B.; Qian, Q.Q. An ANFIS-based fault classification approach in power distribution system. Int. J. Electr. Power Energy Syst. 2013, 49, 243–252.
5. Siddique, M.N.I.; Shafiullah, M.; Mekhilef, S.; Pota, H.; Abido, M.A. Fault classification and location of a PMU-equipped active distribution network using deep convolution neural network (CNN). Electr. Power Syst. Res. 2024, 229, 110178.
6. Alhanaf, A.S.; Farsadi, M.; Balik, H.H. Fault Detection and Classification in Ring Power System With DG Penetration Using Hybrid CNN-LSTM. IEEE Access 2024, 12, 59953–59975.
7. Mampilly, B.J.; Sheeba, V.S. An empirical wavelet transform based fault detection and hybrid convolutional recurrent neural network for fault classification in distribution network integrated power system. Multimed. Tools Appl. 2024, 83, 77445–77468.
8. Hu, J.; Hu, W.; Chen, J.; Cao, D.; Zhang, Z.; Liu, Z.; Chen, Z.; Blaabjerg, F. Fault Location and Classification for Distribution Systems Based on Deep Graph Learning Methods. J. Mod. Power Syst. Clean Energy 2023, 11, 35–51.
9. Ngu, E.E.; Ramar, K. A combined impedance and traveling wave based fault location method for multi-terminal transmission lines. Int. J. Electr. Power Energy Syst. 2011, 33, 1767–1775.
10. Reddy, M.J.B.; Mohanta, D.K. Performance Evaluation of an Adaptive-Network-Based Fuzzy Inference System Approach for Location of Faults on Transmission Lines Using Monte Carlo Simulation. IEEE Trans. Fuzzy Syst. 2008, 16, 909–919.
11. Shi, S.; Zhu, B.; Lei, A.; Dong, X. Fault Location for Radial Distribution Network via Topology and Reclosure-Generating Traveling Waves. IEEE Trans. Smart Grid 2019, 10, 6404–6413.
12. Tresso, Y.V.; Fernandes, R.A.S.; Coury, D.V. Reducing multiple estimation for fault location in medium voltage distribution networks. Electr. Power Syst. Res. 2021, 199, 107424.
13. Mishra, D.P.; Ray, P. Fault detection, location and classification of a transmission line. Neural Comput. Appl. 2018, 30, 1377–1424.
14. Mukherjee, A.; Kundu, P.K.; Das, A. Transmission Line Faults in Power System and the Different Algorithms for Identification, Classification and Localization: A Brief Review of Methods. J. Inst. Eng. India Ser. B 2021, 102, 855–877.
15. Fornás, J.G.; Jaraba, E.H.; Bludszuweit, H.; García, D.C.; Estopiñán, A.L. Modeling and Simulation of Time Domain Reflectometry Signals on a Real Network for Use in Fault Classification and Location. IEEE Access 2023, 11, 23596–23619.
16. Shafiullah, M.; Abido, M.A. S-Transform Based FFNN Approach for Distribution Grids Fault Detection and Classification. IEEE Access 2018, 6, 8080–8088.
17. Buzo, R.F.; Barradas, H.M.; Leão, F.B. A New Method for Fault Location in Distribution Networks Based on Voltage Sag Measurements. IEEE Trans. Power Deliv. 2021, 36, 651–662.
18. Nawaz, R.; Albalawi, H.A.; Bukhari, S.B.A.; Mehmood, K.K.; Sajid, M. Timeseries Fault Classification in Power Transmission Lines by Non-Intrusive Feature Extraction and Selection Using Supervised Machine Learning. IEEE Access 2024, 12, 93426–93449.
19. Sodin, D.; Smolnikar, M.; Rudež, U.; Čampa, A. Precise PMU-Based Localization and Classification of Short-Circuit Faults in Power Distribution Systems. IEEE Trans. Power Deliv. 2023, 38, 3262–3273.
20. Kanwal, S.; Jiriwibhakorn, S. Advanced Fault Detection, Classification, and Localization in Transmission Lines: A Comparative Study of ANFIS, Neural Networks, and Hybrid Methods. IEEE Access 2024, 12, 49017–49033.
21. Kanwal, S.; Jiriwibhakorn, S. Artificial Intelligence based Faults Identification, Classification, and Localization Techniques in Transmission Lines-A Review. IEEE Lat. Am. Trans. 2023, 21, 1291–1305.
22. Shafiullah, M.; Abido, M.A.; Al-Hamouz, Z. Wavelet-based extreme learning machine for distribution grid fault location. IET Gener. Transm. Distrib. 2017, 11, 4256–4263.
23. Mirshekali, H.; Keshavarz, A.; Dashti, R.; Hafezi, S.; Shaker, H.R. Deep learning-based fault location framework in power distribution grids employing convolutional neural network based on capsule network. Electr. Power Syst. Res. 2023, 223, 109529.
24. Kim, C.H.; Aggarwal, R. Wavelet transforms in power systems. Part 1: General introduction to the wavelet transforms. Power Eng. J. 2000, 14, 81–87.
25. Barkhi, M.; Pourhossein, J.; Hosseini, S.A. Integrating fault detection and classification in microgrids using supervised machine learning considering fault resistance uncertainty. Sci. Rep. 2024, 14, 28466.
26. Roy, B.; Adhikari, S.; Datta, S.; Devi, K.J.; Devi, A.D.; Alsaif, F.; Alsulamy, S.; Ustun, T.S. Deep Learning Based Relay for Online Fault Detection, Classification, and Fault Location in a Grid-Connected Microgrid. IEEE Access 2023, 11, 62674–62696.
27. Bhatnagar, M.; Yadav, A.; Swetapadma, A.; Abdelaziz, A.Y. LSTM-based low-impedance fault and high-impedance fault detection and classification. Electr. Eng. 2024, 106, 6589–6613.
28. Jayasinghe, J.A.R.R.; Malindi, J.H.E.; Rajapaksha, R.M.A.M.; Logeeshan, V.; Wanigasekara, C. Classification and Localization of Faults in AC Microgrids Through Discrete Wavelet Transform and Artificial Neural Networks. IEEE Open Access J. Power Energy 2024, 11, 303–313.
29. Owolabi, J.; Pius, O. A Comparative Study of Symmetrical Method and Artificial Neural Network in Faults Detection in Power Transmission Lines. Int. J. Innov. Sci. Res. Technol. 2022, 7, 1362–1366.
30. Fang, J.; Chen, K.; Liu, C.; He, J. An Explainable and Robust Method for Fault Classification and Location on Transmission Lines. IEEE Trans. Ind. Inform. 2023, 19, 10182–10191.
31. Gilanifar, M.; Cordova, J.; Wang, H.; Stifter, M.; Ozguven, E.E.; Strasser, T.I.; Arghandeh, R. Multi-Task Logistic Low-Ranked Dirty Model for Fault Detection in Power Distribution System. IEEE Trans. Smart Grid 2020, 11, 786–796.
32. Zhang, Y.; Yang, Q. A Survey on Multi-Task Learning. IEEE Trans. Knowl. Data Eng. 2022, 34, 5586–5609.
33. Athay, T.; Podmore, R.; Virmani, S. A Practical Method for the Direct Analysis of Transient Stability. IEEE Trans. Power Appar. Syst. 1979, PAS-98, 573–584.
34. Liu, C.L.; Tseng, C.J.; Huang, T.H.; Yang, J.S.; Huang, K.B. A multi-task learning model for building electrical load prediction. Energy Build. 2023, 278, 112601.
35. Onaolapo, A.K.; Carpanen, R.P.; Dorrell, D.G.; Ojo, E.E. Transmission Line Fault Classification and Location Using Multi-Layer Perceptron Artificial Neural Network. In Proceedings of the IECON 2020 The 46th Annual Conference of the IEEE Industrial Electronics Society, Singapore, 18–21 October 2020; pp. 5182–5187.
36. Belagoune, S.; Bali, N.; Bakdi, A.; Baadji, B.; Atif, K. Deep learning through LSTM classification and regression for transmission line fault detection, diagnosis and location in large-scale multi-machine power systems. Measurement 2021, 177, 109330.
37. Chen, K.; Hu, J.; He, J. Detection and Classification of Transmission Line Faults Based on Unsupervised Feature Learning and Convolutional Sparse Autoencoder. IEEE Trans. Smart Grid 2018, 9, 1748–1758.

Figure 1. IEEE 39–Bus System Configuration Used for Fault Simulation.

Figure 2. Automated PowerFactory–Python workflow for generating labeled fault and no-fault samples in the IEEE 39 bus system.

Figure 3. Generalized training pipeline of a multi-task learning model.

Figure 4. Architecture of the Optimized Multi-Task Attention Model.

Figure 5. Top 30 shared latent features ranked by average attention weight.

Figure 6. Confusion matrix for fault identification.

Figure 7. Confusion matrix for fault type classification.

Figure 8. Confusion matrix for fault location classification.

Figure 9. Fault location accuracy vs. fault resistance.

Figure 10. Fault location misclassification rate vs. fault resistance.

Table 1. Fault Types and Corresponding Number of Samples Generated.

Fault Name	Fault Code	Size
No Fault	0	14,236
Single Phase a to Ground Fault	1	7370
Single Phase b to Ground Fault	2	7370
Single Phase c to Ground Fault	3	7370
Two Phase Ground Fault (a–b–g)	4	7353
Two Phase Ground Fault (b–c–g)	5	7353
Two Phase Ground Fault (c–a–g)	6	7353
Two Phase Fault (a–b)	7	6856
Two Phase Fault (b–c)	8	6856
Two Phase Fault (c–a)	9	6856
Three Phase Fault	10	7703
Total	–	86,676

Table 2. Selected Hyperparameters for Final MTL Architecture.

Hyperparameter	Search Range	Optimal Value
Number of Shared Layers	1 to 4 (Integer Range)	4
Number of Neurons per Layer	128, 192, or 256 (Categorical)	128
Dropout Rate	0.1 to 0.3 (Continuous Range)	0.1
Activation Function	ReLU or Tanh (Categorical)	ReLU
Optimizer	Adam, SGD, or RMSprop (Categorical)	SGD
Learning Rate	$1 \times 10^{-4}$ to $1 \times 10^{-2}$ (Log scale)	0.00115

Table 3. Performance Metrics and Their Definitions.

Performance Metric	Mathematical Expression
Accuracy	$\frac{TP + TN}{TP + FP + TN + FN}$
Recall	$\frac{TP}{TP + FN}$
Specificity	$\frac{TN}{TN + FP}$
Precision	$\frac{TP}{TP + FP}$
F1-score	$\frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$

$TP$ = True Positive, $TN$ = True Negative, $FP$ = False Positive, $FN$ = False Negative.

Table 4. Class-wise accuracy, precision, recall, specificity, and F1-score for fault type classification.

Class Name	Accuracy	Recall	Specificity	Precision	F1-Score
A phase to Gnd	0.9999	0.9991	1	1	0.9996
B phase to Gnd	1	1	1	1	1
C phase to Gnd	1	1	1	1	1
A-B phase to Gnd	0.9998	0.9991	0.9999	0.9991	0.9991
B-C phase to Gnd	1	1	1	1	1
C-A phase to Gnd	0.9999	0.9991	1	0.999	0.9995
A-B phase	0.9999	1	0.9999	0.999	0.9995
B-C phase	1	1	1	1	1
C-A phase	0.9999	1	0.9999	0.9991	0.9995
Three phases to Gnd	1	1	1	1	1
Average	0.9999	0.9998	1	0.9998	0.9998

Table 5. Class-wise fault location metrics including accuracy, precision, recall, specificity, and F1-score.

Class	Accuracy	Recall	Specificity	Precision	F1-Score
10%	0.9925	0.9884	0.9939	0.9821	0.9853
20%	0.9888	0.9239	0.9957	0.9572	0.9403
30%	0.9897	0.9548	0.9934	0.9381	0.9464
40%	0.9910	0.9395	0.9965	0.9664	0.9528
50%	0.9892	0.9539	0.9928	0.9305	0.9421
60%	0.9896	0.9382	0.9949	0.9499	0.9440
70%	0.9872	0.9035	0.9956	0.9535	0.9278
80%	0.9801	0.8852	0.9897	0.8979	0.8915
90%	0.9855	0.9625	0.9878	0.8891	0.9244
Average	0.9882	0.9469	0.9934	0.9473	0.9468

Table 6. Performance Comparison of MTL-AttentionNet and Existing Models for Fault Diagnosis.

Work	Algorithm	Network	Software	Fault Detection Accuracy (%)	Fault Classification Accuracy (%)	Fault Location Accuracy (%)	Task Type
Nawaz et al. [18]	LR	3-Bus System	Simulink	–	99.97	–	Single-task
Fang et al. [30]	CNN	Two-Bus Network	Simulink	–	99.85	98.02	Single-task
Onaolapo et al. [35]	MLP-ANN	29-Bus System	Simulink	–	–	97–98	Single-task
Belagoune et al. [36]	LSTM	Two-Area Network	MATLAB	–	99.93	–	Single-task
Chen et al. [37]	Convolutional Autoencoder	Two-Bus Network	MATLAB/Simulink	–	99.74	–	Single-task
Proposed Work	MTL-AttentionNet	IEEE 39-Bus	DIgSILENT PowerFactory 2024	100.00	99.99	98.82	Multi-task

Note. “Single-task” indicates that each fault-related task (detection, classification, or location) is handled by a separate model in the cited work. “Multi-task” indicates that the proposed MTL-AttentionNet jointly learns fault identification, fault type classification, and fault location within a single shared model.

Table 7. Performance Comparison of MTL-AttentionNet and Baseline Models for Fault Diagnosis.

Model	Fault Detection Accuracy (%)	Fault Classification Accuracy (%)	Fault Location Accuracy (%)	Training Type
MTL-AttentionNet	100	99.99	98.82	Joint multitasks
SVM	100	99.97	84.47	Separate models
MLP	100	99.98	94.52	Separate models

Table 8. Test accuracy, stability over $K = 5$ runs, and 95% confidence intervals for the three tasks on the held-out test set ($N$ = 13,001).

Task	Mean Accuracy (%)	Std. Dev. (%)	95% CI (%)
Fault identification	100.00	0.00	$[100.00, 100.00]$
Fault type classification	99.98	0.02	$[99.95, 100.00]$
Fault location estimation	98.82	0.13	$[98.63, 99.00]$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
