Integration of Multi-Agent Systems and Artificial Intelligence in Self-Healing Subway Power Supply Systems: Advancements in Fault Diagnosis, Isolation, and Recovery

Feng, Jianbing; Yu, Tao; Zhang, Kuozhen; Cheng, Lefeng

doi:10.3390/pr13041144

Open Access Review

by Jianbing Feng ¹,², Tao Yu ¹, Kuozhen Zhang ³,* and Lefeng Cheng ⁴,*

¹ School of Electric Power, South China University of Technology, Guangzhou 510641, China
² Guangzhou Metro Construction Management Co., Ltd., Guangzhou 510330, China
³ Law School, Shantou University, Shantou 515063, China
⁴ School of Mechanical and Electrical Engineering, Guangzhou University, Guangzhou 510006, China
* Authors to whom correspondence should be addressed.

Processes 2025, 13(4), 1144; https://doi.org/10.3390/pr13041144

Submission received: 18 March 2025 / Revised: 3 April 2025 / Accepted: 8 April 2025 / Published: 10 April 2025

(This article belongs to the Special Issue Industrial IoT-Enabled Modeling and Optimization for the Process Industry—2nd Edition)

Abstract

The subway power supply system, as a critical component of urban rail transit infrastructure, plays a pivotal role in ensuring operational efficiency and safety. However, current systems remain heavily dependent on manual interventions for fault diagnosis and recovery, limiting their ability to meet the growing demand for automation and efficiency in modern urban environments. While the concept of “self-healing” has been successfully implemented in power grids and distribution networks, adapting these technologies to subway power systems presents distinct challenges. This review introduces an innovative approach by integrating multi-agent systems (MASs) with advanced artificial intelligence (AI) algorithms, focusing on their potential to create fully autonomous self-healing control architectures for subway power networks. The novel contribution of this review lies in its hybrid model, which combines MASs with the IEC 61850 communication standard to develop fault diagnosis, isolation, and recovery mechanisms specifically tailored for subway systems. Unlike traditional methods, which rely on centralized control, the proposed approach leverages distributed decision-making capabilities within MASs, enhancing fault detection accuracy, speed, and system resilience. Through a thorough review of the state of the art in self-healing technologies, this work demonstrates the unique benefits of applying MASs and AI to address the specific challenges of subway power systems, offering a significant advance over existing methodologies in the field.

Keywords:

subway power supply systems; self-healing technologies; multi-agent systems (MASs); IEC 61850 standard; fault diagnosis and isolation; artificial intelligence (AI) algorithms; power grid self-healing; distribution networks; fault recovery; real-time fault detection; automated restoration processes; decentralized control systems; subway network resilience; predictive maintenance

1. Introduction

With the rapid pace of global urbanization, subways have become an essential solution to alleviate urban traffic congestion. According to the China Urban Rail Transit Association, subway operating mileage and passenger numbers continue to grow, cementing subways as a cornerstone of urban transportation [1]. As urban populations expand, ensuring the reliability of subway power supply systems has become increasingly crucial. Failures in the power supply system can lead to disruptions in subway operations, negatively impacting passenger safety and system efficiency. Traditionally, these systems have relied on manual interventions for fault diagnosis and recovery, limiting their ability to address the growing demand for automation and rapid response in modern urban environments.

To address these critical challenges, this review presents a novel approach for integrating multi-agent systems (MASs) with advanced artificial intelligence (AI) algorithms to enable fully autonomous self-healing capabilities in subway power systems. The integration of MASs with AI technologies aims to enhance subway power systems’ ability to detect, isolate, and recover from faults more efficiently than traditional methods, which rely heavily on centralized control. This hybrid system enables distributed decision-making, allowing real-time, local fault detection and diagnosis without a central authority, thus reducing response times and improving system resilience.

The main objectives of this review are as follows:

(1): Investigate the historical development and current state of self-healing technologies in power supply systems, with a particular focus on their adaptation and application in subway power systems.
(2): Analyze how MASs and AI enhance the capabilities of subway systems in fault detection, isolation, and recovery, enabling autonomous decision-making and real-time responses to system failures.
(3): Examine the integration of the IEC 61850 communication standard with MASs [2], and how this contributes to decentralized control, improving fault recovery and enhancing the scalability of self-healing systems in subway power networks.
(4): Address the unique challenges faced by subway systems, such as reliability, response times, fault management, and system resilience, and propose integrated solutions through the application of MASs and AI.

The reliability and efficiency of subway systems are tightly coupled with the performance and stability of their power supply systems. Ensuring uninterrupted service and the safety of passengers requires that power systems be able to self-heal, automatically recovering from faults and minimizing disruptions. While self-healing technology has been widely researched and implemented in power grids, adapting this technology to subway systems presents a distinct set of challenges due to the unique operational environment of urban rail systems. Traditional centralized control methods often fail to provide the level of speed, accuracy, and resilience required for the dynamic and complex environment of subway power systems.

Self-healing technologies in power systems allow for autonomous fault recovery without relying on human intervention, improving overall system reliability. First introduced in U.S. power grids, this technology enables the automatic identification, isolation, and restoration of power during faults, significantly improving system performance and minimizing downtime [3]. This review explores how MASs and AI, integrated with the IEC 61850 communication standard, offer a decentralized, autonomous approach that is a marked improvement over traditional fault recovery methods. This innovative combination has the potential to revolutionize subway power systems, providing faster and more efficient responses to faults.

The hybrid model proposed in this review utilizes MASs to decentralize decision-making, allowing each agent within the system to independently detect, diagnose, and resolve faults. The decentralized nature of MASs enhances fault detection by distributing decision-making processes across multiple system components, enabling quicker responses and improving fault isolation accuracy. Furthermore, the integration of AI enhances predictive capabilities, enabling the system to anticipate potential failures and proactively manage faults before they escalate into service disruptions. Compared to traditional centralized methods, this decentralized approach offers greater flexibility, scalability, and resilience, addressing the dynamic and increasingly complex demands of modern subway networks.
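Local, agent-level fault diagnosis can be made concrete with a minimal Python sketch. It is an illustration only, not the method of any work cited here. The sketch assumes a radial feeder divided into sections by switches. Each switch has a hypothetical `SwitchAgent` that knows only whether its own sensor saw fault current, and that can query its immediate downstream neighbour. Under these assumptions, the faulted section lies just downstream of the last switch that carried fault current. Each agent can therefore decide for itself without a central controller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SwitchAgent:
    """Hypothetical agent attached to one sectionalizing switch on a radial feeder."""
    name: str
    saw_fault_current: bool  # local overcurrent flag from this switch's own sensor
    downstream: Optional["SwitchAgent"] = None  # only neighbour this agent talks to

    def faulted_section_is_mine(self) -> bool:
        """Decide locally whether the fault lies in the section just downstream.

        True if this switch carried fault current but the next switch
        downstream did not (or there is no downstream switch).
        """
        if not self.saw_fault_current:
            return False
        neighbour_saw = self.downstream.saw_fault_current if self.downstream else False
        return not neighbour_saw


def build_feeder(names, flags):
    """Chain agents from the substation outward and return them in feeder order."""
    agents = [SwitchAgent(n, f) for n, f in zip(names, flags)]
    for up, down in zip(agents, agents[1:]):
        up.downstream = down
    return agents


if __name__ == "__main__":
    # Illustrative case: fault between S2 and S3, so S0..S2 see fault current, S3 does not.
    feeder = build_feeder(["S0", "S1", "S2", "S3"], [True, True, True, False])
    print([a.name for a in feeder if a.faulted_section_is_mine()])  # ['S2']
```

Each agent needs only its neighbour's flag. The decision therefore scales with the number of switches and does not depend on a central server. A practical MAS would add message timeouts and sensor-failure handling, followed by the isolation and restoration steps.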

The integration of MASs with the IEC 61850 communication standard represents a novel approach in self-healing technology, moving beyond conventional methods that primarily rely on centralized control systems. By empowering each agent within the system to independently detect, diagnose, and resolve faults, this hybrid architecture offers a significant improvement in speed, accuracy, and system resilience compared to traditional fault recovery methods. Through this novel integration, we aim to provide a comprehensive and scalable solution to enhance the reliability of subway power systems, setting the foundation for more autonomous urban transport networks.

Although subway power systems differ from traditional power grids in function and requirements, they similarly demand high reliability and fast response capabilities. Wang (2010) [4] and Du (2010) [5] explored fault diagnosis and protection methods in traction power systems, laying the foundation for research into self-healing technologies in subway systems. Subsequently, research into fault response and recovery in subway systems began incorporating MASs and AI to enhance the automation and intelligence of fault handling [6,7]. The application of MASs in subway power systems mainly focuses on optimizing fault detection, diagnosis, and system recovery. Song (2015) conducted an in-depth study of fault location in urban rail transit traction power systems, proposing MAS-based optimization strategies [8]. Additionally, AI technologies, particularly machine learning and deep learning, have been widely applied in fault data analysis and fault prediction [9,10].
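One concrete protection method studied for DC traction feeders is the rate-of-rise (di/dt) and current-increment (ΔI) criterion. It is commonly used to distinguish short-circuit currents from train acceleration transients, and the sketch below implements a simplified version of it. The function name, the sampling model and all threshold values are illustrative assumptions, not settings from the cited studies.

```python
from typing import Optional, Sequence


def didt_delta_i_trip(
    current_a: Sequence[float],
    dt_s: float,
    didt_start: float,
    didt_reset: float,
    delta_i_trip: float,
) -> Optional[int]:
    """Return the sample index at which a trip would be issued, or None.

    current_a    -- feeder current samples in A, taken every dt_s seconds
    didt_start   -- rate of rise (A/s) that starts watching an event
    didt_reset   -- rate of rise (A/s) below which the event is abandoned
    delta_i_trip -- current increment (A) since the event started that trips
    """
    i_start: Optional[float] = None
    for k in range(1, len(current_a)):
        didt = (current_a[k] - current_a[k - 1]) / dt_s
        if i_start is None:
            if didt >= didt_start:
                i_start = current_a[k - 1]  # current just before the steep rise
        elif didt < didt_reset:
            i_start = None  # rise has levelled off: treat as a load transient
            continue
        if i_start is not None and current_a[k] - i_start >= delta_i_trip:
            return k
    return None


if __name__ == "__main__":
    # Illustrative thresholds only: 100 kA/s to start, 50 kA/s to reset, 3 kA to trip.
    settings = dict(dt_s=1e-3, didt_start=1e5, didt_reset=5e4, delta_i_trip=3000.0)
    short_circuit = [1000, 1000, 3000, 5000, 7000]  # steep, sustained rise
    acceleration = [1000, 1200, 1400, 1440, 1450]   # rise that levels off
    print(didt_delta_i_trip(short_circuit, **settings))  # 3
    print(didt_delta_i_trip(acceleration, **settings))   # None
```

The two-threshold design lets a brief steep rise that then flattens (as during train acceleration) reset the detector. A sustained steep rise that accumulates a large increment leads to a trip.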

Research on self-healing technology in subway power systems not only emphasizes rapid fault recovery but also explores how technological integration and innovation can improve overall system stability and reliability. For instance, Wang and Lv (2022) improved fault location accuracy and efficiency by studying fault point distance measurement methods in direct current (DC) traction power systems [11]. Another unique challenge faced by subway systems is how to restore power quickly without interrupting service. Wei et al. (2023) utilized global positioning system (GPS) time synchronization technology to enhance fault distance measurement accuracy, providing technical support for the rapid recovery of subway systems [12]. Additionally, Jin et al. (2017) conducted simulation studies on fault location in subway DC power systems using time-domain differentiation methods, improving fault response efficiency [13]. The integration of advanced AI technologies and real-time communication protocols like IEC 61850 has further enhanced the fault diagnosis process, enabling subway systems to predict and address multi-fault scenarios that traditional methods would struggle to handle. Reliability studies are also a crucial aspect of the development of self-healing technologies in subway power systems. Pei (2018) conducted an in-depth study on the reliability of subway traction power systems, identifying key technologies and methods for improving system reliability [14]. Meanwhile, Zhou (2012), in his master’s dissertation, analyzed online reliability assessments of subway power systems, providing scientific support for real-time monitoring and maintenance [15].
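GPS time synchronization matters for fault location because double-ended traveling-wave methods compare arrival times recorded at the two ends of a section. The sketch below shows the generic double-ended formula, assuming a known section length L and propagation speed v. It is offered for illustration and is not necessarily the method of Wei et al. [12]. If the fault is at distance x from terminal A, the wavefront reaches A after x/v and B after (L − x)/v. Hence t_A − t_B = (2x − L)/v and x = (L + v(t_A − t_B))/2.

```python
def double_ended_fault_distance(
    line_length_m: float, wave_speed_m_s: float, t_a_s: float, t_b_s: float
) -> float:
    """Distance (m) of the fault from terminal A by the double-ended method.

    t_a_s and t_b_s are wavefront arrival times at terminals A and B, which must
    share a common time base (e.g. GPS-disciplined clocks at both ends).
    From t_A - t_B = (2x - L) / v it follows that x = (L + v (t_A - t_B)) / 2.
    """
    x = 0.5 * (line_length_m + wave_speed_m_s * (t_a_s - t_b_s))
    if not 0.0 <= x <= line_length_m:
        raise ValueError("arrival times are inconsistent with a fault on this section")
    return x


if __name__ == "__main__":
    # Illustrative values: 2 km section, assumed propagation speed 2e8 m/s.
    # A fault 500 m from A reaches A after 2.5 us and B after 7.5 us.
    x = double_ended_fault_distance(2000.0, 2.0e8, 2.5e-6, 7.5e-6)
    print(f"{x:.1f} m")  # 500.0 m
```

The formula also shows why synchronization quality matters. A clock offset of 1 µs between the two ends shifts the estimate by v × 1 µs / 2, which is about 100 m at the assumed v = 2 × 10⁸ m/s.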

The subway power system is vast, with numerous risk points, and any fault can have widespread consequences, negatively impacting trains, passengers, and equipment. It could even lead to serious disruptions in traffic and social order. However, the current capabilities of subway power systems in fault analysis, handling, recovery, and prediction are relatively weak and inefficient. In the case of a failure, the system still relies heavily on emergency repairs and manual interventions, which fall short of the high service standards expected for modern subway systems.

Since 1999, when the United States’ “Consortium for Electric Infrastructure to Support a Digital Society (CEIDS)” [3] applied the concept of “self-healing” to the power grid, self-healing has become a research focus and a key marker of grid intelligence. The “IntelliGrid” research project of the Electric Power Research Institute (EPRI) and the “Modern Grid Initiative” research project of the U.S. Department of Energy’s National Energy Technology Laboratory (NETL) have both made self-healing a primary research topic for the next generation of power grids. Similarly, in 2009, China’s State Grid Corporation proposed the development plan for a “robust smart grid”, emphasizing eight key characteristics for smart grids: self-healing, incentivizing and accommodating users, resisting attacks, providing power quality that meets the demands of the 21st century, allowing for the integration of various forms of power generation and storage, supporting a thriving electricity market, ensuring the optimal and efficient operation of assets, and utilizing high-speed communication and online monitoring. Thus, grids both in China and internationally regard self-healing as a primary feature of next-generation smart grids [16].

The application of MASs and AI in subway power systems builds on extensive prior research in fault detection, location, and recovery within traditional power grids. Notable contributions to self-healing technologies in power systems have been made by initiatives such as EPRI’s IntelliGrid project and the U.S. Department of Energy’s Modern Grid Initiative, which have explored the integration of self-healing technologies into grid systems [11,12]. These projects have demonstrated the effectiveness of self-healing technologies in improving grid stability and fault recovery speed and in reducing downtime, providing valuable insights into their potential application in subway systems.

Furthermore, IEC 61850 (Communication Networks and Systems for Power Utility Automation), the International Electrotechnical Commission standard initially developed for substation automation, is increasingly being adopted in subway power systems. This standard enables real-time data exchange and ensures interoperability between different devices within the subway power network. Its integration with MASs creates a highly responsive and adaptive environment that enhances the coordination of fault recovery efforts, optimizes energy distribution, and improves overall system resilience. IEC 61850 facilitates synchronized operations across multiple agents in the system, ensuring that fault recovery actions are carried out quickly and efficiently, with minimal impact on subway operations.
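The sketch below illustrates the kind of real-time exchange involved. It models how a subscribing agent might interpret the two counters carried in IEC 61850 GOOSE messages. The first, stNum, changes when the published data change state. The second, sqNum, counts retransmissions of an unchanged state and restarts when stNum changes. This is a simplified, hypothetical model of the bookkeeping only: it does not parse Ethernet frames, handle counter wraparound, or check time-allowed-to-live. `GooseSubscriberState` is not part of any real library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GooseSubscriberState:
    """Hypothetical per-control-block bookkeeping for a GOOSE subscriber agent."""
    last_st_num: Optional[int] = None
    last_sq_num: Optional[int] = None

    def classify(self, st_num: int, sq_num: int) -> str:
        """Label a received (stNum, sqNum) pair and remember it.

        'first'          - nothing received yet from this publisher
        'new_event'      - stNum changed: the published state changed, act on it
        'retransmission' - same stNum, next sqNum: repeat of a known state
        'sequence_gap'   - same stNum but sqNum did not advance by one
                           (lost, duplicated, or reordered messages)
        """
        if self.last_st_num is None or self.last_sq_num is None:
            verdict = "first"
        elif st_num != self.last_st_num:
            verdict = "new_event"
        elif sq_num == self.last_sq_num + 1:
            verdict = "retransmission"
        else:
            verdict = "sequence_gap"
        self.last_st_num, self.last_sq_num = st_num, sq_num
        return verdict


if __name__ == "__main__":
    sub = GooseSubscriberState()
    # Illustrative (stNum, sqNum) pairs: steady state, a state change, then a lost message.
    received = [(5, 3), (5, 4), (6, 0), (6, 1), (6, 3)]
    print([sub.classify(st, sq) for st, sq in received])
    # ['first', 'retransmission', 'new_event', 'retransmission', 'sequence_gap']
```

In an MAS setting, only "new_event" would trigger a fault-handling action. "sequence_gap" would prompt the agent to treat its view of the publisher as possibly stale.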

The need for autonomous fault management is particularly pressing in the context of subway systems. These systems are highly complex, with a large number of potential fault points across numerous subsystems. Faults in the power supply can cause significant disruptions, affecting train operations, passenger safety, and equipment functionality [17,18]. Despite advancements in fault management, current subway power systems still rely on manual interventions and are often slow to respond to issues. Self-healing technology, when integrated with MASs and AI, has the potential to address these challenges by providing autonomous, real-time responses to faults, thus improving system reliability and reducing downtime [19,20].

The unique operational environment of subway systems also presents several additional challenges. Power supply systems in subway networks must be able to maintain continuous service, even when faults occur, which is crucial for minimizing service interruptions and ensuring the safety of passengers [21,22]. Moreover, subway power systems often experience high levels of dynamic demand, with power requirements fluctuating throughout the day. This requires power systems to be highly adaptable and responsive to changes in demand, which traditional fault recovery methods are not equipped to handle. MASs and AI provide the necessary intelligence and flexibility to manage these dynamic conditions, enabling subway power systems to function more efficiently and reliably [15,23,24].

Overall, self-healing technology plays a crucial role in enhancing the reliability and efficiency of subway power systems. As a key supporting technology for urban transportation, it helps keep city traffic safe and smooth. It also drives the modernization of subway systems through technological innovation and system optimization, enabling them to better meet the complex demands of modern urban development. By enabling faster fault detection and response, it reduces downtime and thereby improves service continuity and reliability. The technology is not limited to resolving existing faults; it also aims to prevent potential issues, significantly boosting the overall operational efficiency of the subway system [25]. Self-healing technology also facilitates real-time monitoring, which is crucial for predicting and preventing service interruptions [26]. The resulting real-time data insights strengthen operational resilience and passenger safety.

Safety is the top priority in urban rail transit systems. Self-healing technology significantly raises the safety standards of subway power systems by enabling real-time monitoring and automatic adjustment of system settings. For instance, advanced sensors and monitoring equipment allow the status of power supply lines to be detected in real time, so that faults can be identified and isolated quickly to prevent accidents [27]. The application of this technology greatly reduces train delays and accidents caused by power supply issues, providing passengers with a safer and more stable travel environment.

Self-healing technology also plays a significant role in improving passenger convenience. Optimizing the self-healing capabilities of the subway power system minimizes service interruptions due to power failures, giving passengers smoother, less interrupted travel [28]. Furthermore, when self-healing technology is integrated with mobile connectivity, passengers can follow real-time train operation status and fault recovery progress through smartphone applications, making travel more transparent and convenient [29].

As urbanization accelerates, subway systems are expected to face increasing demands for higher capacity, greater operational efficiency, and faster response times. Self-healing technology in subway power systems is essential for meeting these demands, as it can improve service continuity, enhance system resilience, and reduce the need for manual intervention [30]. For example, through intelligent upgrades, subway systems can automatically adjust train operating frequencies and power distribution during peak periods, optimizing resource utilization to meet continuously changing passenger flow [31]. By integrating advanced monitoring technologies, AI-based fault diagnosis, and automated recovery mechanisms, self-healing systems can proactively address potential failures, enhancing system stability and ensuring seamless subway operations. Looking ahead, the self-healing capabilities of subway power systems will continue to improve with ongoing technological advances. This improvement will come not only from technical innovation but also from optimized management strategies and operational models. Through these comprehensive measures, subway systems will be able to provide safer, more convenient, and more reliable services to passengers while contributing to the sustainable development of cities [32].

As an integral part of public transportation, subway systems play a crucial role in urban development and modernization. They significantly improve the efficiency of urban transportation and help reduce road congestion and environmental pollution. However, their efficient operation relies heavily on the reliability and stability of their power supply systems. A power system failure can disrupt operations and cause safety incidents, with serious impacts on city operations and the daily lives of citizens. Researching and implementing self-healing technologies for subway power systems has therefore become a critical necessity. The aim is not only to enhance system reliability but also to ensure passenger safety and improve the service experience. On this basis, the necessity of this research is detailed in the following points.

(1): Enhancing the Reliability and Efficiency of Power Supply Systems: The self-healing technology in subway power systems can significantly improve the system’s automatic diagnosis and fault recovery capabilities, thereby reducing service interruptions caused by system failures. By incorporating advanced monitoring technologies and automation tools, the self-healing system can respond rapidly to faults, minimizing reliance on manual intervention, and increasing the speed and accuracy of fault resolution.
(2): Ensuring Safe and Smooth Urban Transit: As a major public transportation system, the safety of subway operations directly impacts the lives of thousands of passengers and the overall public safety of the city. Self-healing technology helps prevent accidents caused by power instability or interruptions by promptly detecting and addressing power supply issues, significantly improving the safety of subway operations.
(3): Adapting to the Needs of Modern Urban Development: As urbanization accelerates, subway systems face increasing challenges, including rising passenger numbers, higher service expectations, and more complex operating environments. Self-healing technology, through intelligent management and real-time data analysis, optimizes the performance of subway power systems, better meeting these evolving demands.
(4): Improving Passenger Experience: The application of self-healing technology goes beyond enhancing technical performance; it directly improves the passenger experience by reducing faults and delays. For example, the system can automatically isolate minor faults and restore supply to the healthy sections without disrupting the entire network, providing passengers with more stable and reliable service.
(5): Driving Technological Innovation and Industry Progress: Research into self-healing technology for subway power systems has spurred innovations in related technologies, including applications in artificial intelligence, the Internet of Things (IoT), and big data analytics. The integration and innovation of these technologies not only optimize subway power systems but also promote the development of intelligent transportation and smart city technologies.

Research on self-healing technology in subway power systems is therefore of great significance for ensuring the safe, efficient, and reliable operation of urban rail transit. It not only meets the demands of modern cities for high-standard public transportation systems but also provides an effective means to raise the technological level and service quality of public transit systems. Accordingly, this paper provides a detailed summary of the research progress on self-healing technology in subway power systems, with a particular focus on the integrated application of MASs and the IEC 61850 standard. The main contents of this review are summarized below.

(1): Introduction to the Concept of Self-Healing: This paper begins by introducing the basic concept of self-healing, which originally stems from biological systems. It explains how this concept has been adapted for use in power systems. The primary function of self-healing technology in power systems is to reduce human intervention by automating the processes of fault detection, isolation, and recovery. This, in turn, enhances the reliability and efficiency of the overall system. In terms of historical background, this paper reviews the evolution of the self-healing concept within power systems. It highlights notable initiatives such as EPRI’s IntelliGrid project and the U.S. Department of Energy’s Modern Grid Initiative, both of which signify the integration of self-healing technologies as essential components of modern intelligent energy systems.
(2): Self-Healing Control Architectures: The discussion then shifts to various control architectures employed in self-healing technology for distribution networks. This paper compares hierarchical control systems with MASs, illustrating the shift from centralized systems to more decentralized and faster-responding systems. This paper emphasizes that, although self-healing technologies have been extensively researched and developed in traditional power systems, they are relatively new in the context of subway power systems. It advocates for adapting self-healing technology to subway systems by leveraging the unique characteristics of these systems and incorporating both MASs and the IEC 61850 standard.
(3): Fault Diagnosis and Recovery Technologies: This paper provides a comprehensive review of current technologies used for fault location, isolation, and recovery in distribution networks and railway systems. Special attention is given to the application of these technologies in subway systems, where both direct judgment methods and computational analysis approaches for fault location are explored. In terms of innovation, this paper discusses the potential of using AI for fault diagnosis and recovery. The integration of AI is seen as a promising way to significantly enhance the system’s ability to address complex, multi-point faults.
(4): Development of New Technologies and Challenges: This paper forecasts the future application of hybrid augmented intelligence and generative AI in subway power systems. These emerging technologies are expected to be effective tools for solving complex fault scenarios. However, this paper also discusses the technical challenges posed by the introduction of flexible direct current (DC) technology into subway power systems. It examines how this development may introduce new challenges for implementing self-healing technologies in these systems.
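The fault isolation and recovery steps surveyed above can be illustrated with a minimal switching-plan sketch. It assumes a hypothetical radial feeder, for example one half of an open-ring medium-voltage network. Switch k sits on the source side of section k, and switch 0 is the feeder breaker. A normally open tie switch at the far end can back-feed from an adjacent feeder with a known spare capacity.

Given the faulted section, the sketch opens the two bounding switches and recloses the breaker for the healthy upstream sections. It closes the tie switch only if the downstream load fits within the spare capacity. The layout, function name and load figures are illustrative assumptions, not a method from the reviewed literature.

```python
from typing import List, Sequence, Tuple


def isolate_and_restore(
    section_loads_kw: Sequence[float], faulted: int, tie_spare_kw: float
) -> Tuple[List[str], List[int]]:
    """Return (switching actions, sections left unsupplied) for one permanent fault.

    Hypothetical layout: switch k is on the source side of section k, switch 0 is
    the feeder breaker, and a normally open tie switch after the last section can
    back-feed from an adjacent feeder with tie_spare_kw of spare capacity.
    """
    n = len(section_loads_kw)
    if not 0 <= faulted < n:
        raise ValueError("faulted section index out of range")

    actions = [f"open switch {faulted}"]              # isolate: upstream boundary
    if faulted + 1 < n:
        actions.append(f"open switch {faulted + 1}")  # isolate: downstream boundary
    if faulted > 0:
        actions.append("close breaker (switch 0)")    # restore healthy upstream sections

    unsupplied = [faulted]
    downstream = range(faulted + 1, n)
    downstream_kw = sum(section_loads_kw[k] for k in downstream)
    if downstream:
        if downstream_kw <= tie_spare_kw:
            actions.append("close tie switch")        # restore downstream via backup feeder
        else:
            unsupplied.extend(downstream)             # backup feeder lacks spare capacity
    return actions, unsupplied


if __name__ == "__main__":
    loads = [300, 200, 400, 250]  # illustrative section loads in kW
    print(isolate_and_restore(loads, faulted=1, tie_spare_kw=800))
    # (['open switch 1', 'open switch 2', 'close breaker (switch 0)', 'close tie switch'], [1])
    print(isolate_and_restore(loads, faulted=1, tie_spare_kw=500))
    # (['open switch 1', 'open switch 2', 'close breaker (switch 0)'], [1, 2, 3])
```

A practical restoration scheme would also check voltage limits and protection coordination, and it would choose among several tie options. This is where optimization and AI-based methods become useful.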

Through a comprehensive review and analysis, this paper not only clarifies the current state of research and future directions for self-healing technology in subway power systems but also provides a theoretical foundation and technical guidance for achieving intelligent and autonomous subway operations. This review offers an in-depth exploration of self-healing technologies for subway power systems, with a particular focus on the application of MASs and the IEC 61850 standard. This research is of significant academic value and offers critical insights and impetus for related fields of study and practice. The key contributions of this paper are summarized as follows:

(1): Enhancing the Stability and Reliability of Subway Power Systems: The application of self-healing technologies can significantly reduce service interruptions and accidents caused by power issues in subway operations. This, in turn, improves the overall stability and reliability of the subway power system, which is crucial for meeting the growing demand for urban public transportation, ensuring the safety and efficiency of travel for millions of passengers.
(2): Promoting the Development of Intelligent Transportation Systems: By integrating advanced information technologies and communication standards such as IEC 61850, self-healing technology in subway power systems not only improves the efficiency of fault management but also accelerates the development of intelligent transportation systems. These integrated technologies provide vital support for building smart cities.
(3): Optimizing Energy Management and Environmental Sustainability: Self-healing technologies contribute to optimizing energy distribution and usage efficiency, helping reduce energy consumption and environmental impact. When applied globally, these technologies can positively influence energy conservation, emissions reduction, and environmental protection.
(4): Inspiration and Advancement for Related Fields. (i) Cross-application of smart grid technologies: The self-healing technology in subway power systems draws from key aspects of smart grid technology, such as real-time data monitoring and automated fault response. This not only enhances the level of automation in subway systems but also provides new approaches and methodologies for applying smart grid technologies in other fields. (ii) Fostering multidisciplinary integration: This paper emphasizes the integration of MASs and artificial intelligence in self-healing systems for subway power supply. This multidisciplinary fusion promotes collaboration across fields such as computer science, electrical engineering, and transportation engineering, opening up new research and application areas. (iii) Inspiring new business models and policy development: The advancement of self-healing technologies in subway power systems may inspire new business models, such as performance-based service contracts and advanced maintenance services. It may also encourage governments and industries to establish relevant standards and policies to support the widespread deployment and application of such technologies.

This review aims to provide a comprehensive synthesis of the research landscape on self-healing subway power supply systems, focusing specifically on the integration of MASs and AI technologies. We will examine how these technologies contribute to improving fault detection, diagnosis, and recovery in subway power supply systems, and explore the challenges and future directions for further development. Through this exploration, we aim to present a clear understanding of the existing advancements in this field and propose avenues for future research to enhance the reliability and efficiency of subway power systems. Overall, this paper not only demonstrates academic innovation and foresight but also holds broad applicability and significance in real-world contexts. It provides a theoretical foundation and technical roadmap for self-healing technologies in subway power systems, while also offering valuable insights and long-lasting influence for researchers in related fields.

In conclusion, integrating MASs, AI, and the IEC 61850 communication standard represents a groundbreaking approach to self-healing subway power systems. This combination offers autonomous, distributed decision-making capabilities that significantly improve the speed and accuracy of fault detection, isolation, and recovery. By addressing the unique challenges of subway systems, these technologies will play a pivotal role in enhancing the resilience, efficiency, and safety of urban transit infrastructure. The following sections of this review will explore these advancements in detail, providing a comprehensive overview of the state of the art in self-healing technologies and their potential to revolutionize subway power supply systems.

The rest of this article is organized as follows. Section 2 reviews the current state of self-healing technologies within electrical and subway power systems, detailing the historical development and recent advancements that establish the foundation for subsequent discussions. It also introduces the key concepts and terminology used throughout this paper. Section 3 examines the specific challenges faced by subway power supply systems, distinguishing them from general electrical grids, with a focus on their unique operational demands and the critical need for reliable power delivery; it also critically analyzes the fault diagnosis and protection methods reported in key studies in the field. Section 4 explores the integration of MASs and the IEC 61850 standard into subway power systems, outlining how these technologies converge to enhance self-healing capabilities and discussing the synergy between advanced control architectures and standardized communication protocols. Section 5 presents novel research findings and practical applications of self-healing techniques in subway systems, discussing case studies and experimental results that demonstrate the effectiveness of MASs and AI algorithms in improving fault detection, isolation, and recovery. Section 6 discusses the implications of these technologies for future developments in subway power systems, highlighting the potential for broader applications of AI and MASs in the automation and intelligence of urban transit systems and proposing a roadmap for future research. Finally, Section 7 summarizes the key findings and their implications for subway power supply systems, reiterates the importance of advancing self-healing technologies to meet the growing demands of modern urban transportation, and calls for continued research and collaboration within the field.

Each section builds on the one before it. Section 2 lays the groundwork by reviewing the history and current advancements in self-healing technologies, setting the stage for Section 3, which addresses the unique challenges and needs of subway systems. Section 4 builds on this by discussing the integration of MASs and the IEC 61850 standard, which are crucial for enhancing fault management capabilities. This technical exploration feeds into Section 5, where case studies illustrate the practical effectiveness of these technologies. Section 6 explores broader implications and future potential, leading to Section 7, which synthesizes the preceding discussions, summarizes key insights, and affirms the importance of continued research. This progression ensures that each section logically supports the next in exploring the application and impact of self-healing technologies in urban transit.

2. Review of Self-Healing Technologies Within Electrical and Subway Power Systems

In this chapter, we first introduce the concept of self-healing (Section 2.1). We then examine its historical evolution (Section 2.2), the key architectural components of self-healing frameworks (Section 2.3), and their emerging adoption and future prospects in subway power systems (Section 2.4). Together, these subsections establish a foundation for understanding how self-healing technologies can, and increasingly do, shape modern electrical and traction networks. They also highlight the technical sophistication, interdisciplinary nature, and forward-looking research opportunities that make self-healing a transformative force in the pursuit of reliable, efficient, and intelligent urban power infrastructures.

2.1. The Concept of Self-Healing in Metro Power Supply Systems

The concept of “self-healing” originates from biology, where it refers to the intrinsic ability of living organisms to maintain homeostasis and recover from external disturbances and damage. In the context of power grids, self-healing generally denotes the capability of an electrical network to identify and isolate faults rapidly and to restore power supply to critical loads, ideally with minimal or no human intervention [31]. This process is often likened to the immune system in biological organisms, which detects threats, isolates them, and re-establishes normal operation. The earliest formal definition of self-healing in the electric grid was provided by the Electric Power Research Institute (EPRI) in the SPID (Strategic Power Infrastructure Defense System) project [32], which emphasized adaptive measures (e.g., intentional islanding, adaptive protection, and information and sensing capabilities) to mitigate various threats, including natural disasters, communication failures, market perturbations, and deliberate acts of sabotage. While theoretical advancements in self-healing systems are promising, real-world applications remain underexplored, and future empirical studies will be necessary to test the efficacy of these techniques in operational subway power systems.

In China’s power industry, self-healing has been similarly characterized as the procedure by which problematic or failed components in the grid are detected and isolated automatically, or with minimal operator interaction, such that the broader system swiftly returns to normal operational conditions. In international practice, the essential self-healing functionality is broadly summarized as FLISR (Fault Location, Isolation, and Service Restoration). Research efforts worldwide have focused on developing self-healing control architectures, algorithms for fault identification and analysis, and strategies for fault isolation and rapid system reconfiguration.

2.1.1. Relevance to Metro Power Supply Systems

Metro power supply systems are generally concentrated in densely populated urban centers, where operational reliability and continuity of service are paramount for passenger safety, transit efficiency, and broader socio-economic stability. A power supply disruption in a metro system can cause immediate and extensive societal impacts, ranging from passenger inconvenience to significant losses in productivity and heightened safety risks. Consequently, implementing self-healing mechanisms in metro power supply systems involves adopting fast, automated, and robust control strategies that can isolate malfunctioning lines or equipment and restore power within seconds or even on sub-second timescales. By leveraging real-time monitoring, advanced fault detection, and switchgear automation, metro systems aim to ensure seamless service despite faults or disturbances. Although these concepts have been widely proposed in the literature, empirical validation is required to confirm their effectiveness under real-world operating conditions, and pilot studies and simulations will be essential to demonstrate the practical viability of self-healing systems in metro environments.

2.1.2. Mathematical Representation of Self-Healing in Metro Power Supply

To capture the key aspects of self-healing, namely fault detection, fault isolation, and supply restoration, quantitative models are often developed [33,34,35]. Several studies have implemented fault detection and isolation algorithms in simulation environments, but validation against real-world data is still needed to assess the accuracy of these models in practice; future efforts will involve testing these algorithms in operational subway systems to calibrate and refine the mathematical models based on actual fault occurrences and system behavior. Quantitative treatments of self-healing are not limited to power networks: Ref. [35], for instance, presents the functioning mechanisms of five different strategies for implementing self-healing capability in cement-based materials. Such models facilitate the design and evaluation of strategies that minimize the impact of faults on metro operations while satisfying stringent safety and reliability constraints. The following illustrative formulations can be adapted for more detailed analyses:

1. Fault Detection Probability

Let t denote the time elapsed since a fault occurs, and let P_d(t) denote the probability that the fault has been detected and located by time t. A common simplifying model treats the detection time as exponentially distributed with rate α, so that the detection probability rises toward 1 according to

P_{d}(t) = 1 - e^{-\alpha t},

(1)

where α > 0 is a rate parameter (in s⁻¹) that captures the sensitivity of the monitoring equipment or detection algorithm; under this model the mean detection time is 1/α. A higher α therefore implies faster fault detection, that is, a higher detection probability at any given time. The curve rises steeply just after the fault and then saturates toward 1 (for example, P_d reaches about 95% at t = 3/α), which highlights why efficient fault detection matters for minimizing service interruptions. The sensitivity parameter α shows how the system’s responsiveness can be improved through advanced fault detection algorithms and more sensitive monitoring equipment.
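As a minimal sketch of Eq. (1), the functions below evaluate P_d(t) and invert it to find how long the system must wait before a target detection probability p is reached, t_p = −ln(1 − p)/α. The rate α = 2.0 s⁻¹ used in the demonstration is an illustrative assumption, not a value from the cited studies.

```python
import math


def detection_probability(t: float, alpha: float) -> float:
    """Eq. (1): probability that the fault has been detected by time t (seconds)."""
    if t < 0 or alpha <= 0:
        raise ValueError("t must be non-negative and alpha must be positive")
    return 1.0 - math.exp(-alpha * t)


def time_to_confidence(p: float, alpha: float) -> float:
    """Invert Eq. (1): time needed for P_d(t) to reach the target probability p."""
    if not 0.0 <= p < 1.0 or alpha <= 0:
        raise ValueError("p must lie in [0, 1) and alpha must be positive")
    return -math.log(1.0 - p) / alpha


if __name__ == "__main__":
    alpha = 2.0  # illustrative detection rate in 1/s (assumption)
    for t in (0.1, 0.5, 1.0, 2.0):
        print(f"P_d({t:.1f} s) = {detection_probability(t, alpha):.3f}")
    print(f"mean detection time = {1 / alpha:.2f} s")
    print(f"time to 99% detection = {time_to_confidence(0.99, alpha):.2f} s")
```

Because t_p is inversely proportional to α, doubling the detection rate halves the time needed to reach any given confidence level.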

2. Load Restoration Ratio

How effectively the system restores loads after a disturbance can be quantified by the load restoration ratio η:

\eta = \frac{L_{\text{restored}}}{L_{\text{total}}},

(2)

where L_restored is the total load that is successfully restored following reconfiguration and L_total is the total pre-disturbance load, so that 0 ≤ η ≤ 1. This ratio is vital for assessing the efficacy of the self-healing system: a higher η indicates that the system recovers from faults more effectively by restoring a larger proportion of the total load. The load restoration ratio thus provides a tangible measure of how robustly subway power supply networks handle disruptions, with direct implications for their overall operational stability.
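A minimal sketch of Eq. (2) follows. It assumes that L_restored counts every load supplied once reconfiguration completes, and the load names and kW values are hypothetical.

```python
def load_restoration_ratio(loads_kw: dict[str, float], energized: set[str]) -> float:
    """Eq. (2): share of the pre-disturbance load supplied after reconfiguration.

    loads_kw maps each load to its pre-disturbance demand (L_total is their sum);
    energized lists the loads supplied once reconfiguration is complete.
    """
    total = sum(loads_kw.values())
    if total <= 0:
        raise ValueError("total pre-disturbance load must be positive")
    restored = sum(p for name, p in loads_kw.items() if name in energized)
    return restored / total


if __name__ == "__main__":
    # Hypothetical loads on one metro section, in kW.
    loads = {"station_A": 800.0, "station_B": 600.0, "station_C": 400.0, "depot": 200.0}
    eta = load_restoration_ratio(loads, {"station_A", "station_B", "depot"})
    print(f"eta = {eta:.2f}")  # 1600 kW of 2000 kW restored
```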

3. Self-Healing Objective Function

Under fault conditions, the metro supply system typically seeks to optimize both restoration speed and the proportion of recovered loads, subject to safety constraints. The self-healing problem can thus be framed as minimizing an objective function of the following form:

\min_{u} \left[ w_{1} \cdot T_{\text{interrupt}} + w_{2} \cdot (1 - \eta) \right],

(3)

where T_interrupt is the duration of service interruption, η is the load restoration ratio, and u is a vector of decision variables (e.g., breaker switching states and power flow allocations) restricted to values that satisfy the safety constraints. The weights w₁ and w₂ reflect the relative importance assigned to minimizing interruption time versus maximizing load restoration; because T_interrupt carries units of time while 1 − η is dimensionless, w₁ also acts as a scaling factor (e.g., per second) that makes the two terms commensurable. This optimization framework captures the essence of a self-healing system by balancing the competing goals of minimizing service disruptions and ensuring effective load recovery. The decision variables u represent the system’s control parameters, which can be adjusted to achieve the desired outcomes, and the objective function quantifies the trade-offs involved in system reconfiguration, providing a mechanism for dynamically adapting to fault conditions.
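The sketch below illustrates Eq. (3) on a toy restoration problem by exhaustive search over a binary vector u, where each entry decides whether a candidate switching action is executed. The action names, backup sources, capacities, and switching times are hypothetical, the assumption that actions run sequentially (so their times add up to T_interrupt) is ours, and brute-force enumeration is used only for clarity; realistic networks would need mixed-integer programming or heuristic search.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SwitchingAction:
    name: str
    source: str       # backup source the action draws on (hypothetical)
    load_kw: float    # load re-energized if the action is executed
    op_time_s: float  # time the action adds to the restoration sequence


def evaluate(plan, actions, capacity_kw, unaffected_kw, total_kw, w1, w2):
    """Return (cost, T_interrupt, eta) for one binary plan u, or None if infeasible."""
    used: dict[str, float] = {}
    restored = unaffected_kw
    t_interrupt = 0.0
    for selected, action in zip(plan, actions):
        if selected:
            used[action.source] = used.get(action.source, 0.0) + action.load_kw
            restored += action.load_kw
            t_interrupt += action.op_time_s  # actions assumed to run one after another
    if any(used[src] > capacity_kw[src] for src in used):
        return None  # a backup source would be overloaded (safety constraint)
    eta = restored / total_kw
    return w1 * t_interrupt + w2 * (1.0 - eta), t_interrupt, eta


def best_plan(actions, capacity_kw, unaffected_kw, total_kw, w1, w2):
    """Search all 2^n plans; return (cost, executed action names, T_interrupt, eta)."""
    best = None
    for plan in product((False, True), repeat=len(actions)):
        result = evaluate(plan, actions, capacity_kw, unaffected_kw, total_kw, w1, w2)
        if result is not None and (best is None or result[0] < best[0]):
            names = tuple(a.name for sel, a in zip(plan, actions) if sel)
            best = (result[0], names, result[1], result[2])
    return best


if __name__ == "__main__":
    actions = [
        SwitchingAction("A", "S2", 200.0, 2.0),
        SwitchingAction("B", "S2", 150.0, 3.0),
        SwitchingAction("C", "S3", 50.0, 5.0),
    ]
    capacity = {"S2": 300.0, "S3": 100.0}
    for w1 in (0.02, 0.005):  # weight on interruption time, per second
        cost, names, t, eta = best_plan(actions, capacity, 600.0, 1000.0, w1, 1.0)
        print(f"w1={w1}: execute {names}, T={t:.0f} s, eta={eta:.2f}, cost={cost:.3f}")
```

With w₁ = 0.02 s⁻¹ the search executes only action A (T = 2 s, η = 0.80); lowering w₁ to 0.005 s⁻¹ makes the slower plan A + C (T = 7 s, η = 0.85) preferable. Actions A and B are never combined because together they would exceed the capacity of source S2.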

These formulations highlight the primary objectives and constraints associated with designing self-healing strategies for metro power supply systems. In practice, more sophisticated models can integrate various operational constraints (such as power quality, thermal limits, or protection coordination) to accurately represent the system’s behavior under disturbance. Based on the above, Table 1 synthesizes key similarities and differences of the self-healing concept in general power/energy systems and in metro power supply systems. It examines at least eight distinct dimensions—ranging from scope and control hierarchy to fault characteristics and implementation status—to provide a high-level comparative overview.

Overall, while the metro power supply setting shares self-healing principles (e.g., FLISR) with broader power and energy systems, it imposes more stringent real-time performance requirements and heightened safety standards. Future advancements in metro self-healing are anticipated to involve the deeper integration of sensing technologies, sophisticated fault isolation and reconfiguration algorithms, and more robust data communication protocols. These developments aim to ensure that even under adverse conditions, metro systems can autonomously detect and isolate faults, reconfigure feeder networks, and restore power with minimal disruption to passenger transit and operational stability.

2.2. Historical Evolution of Self-Healing Strategies in Electrical Power Systems

The concept of self-healing within electrical power systems emerged alongside the growing importance of system reliability, stability, and automation [14,36,37]. Historically, power utilities faced ever-increasing demands for seamless electricity provision while grappling with the technical and economic challenges posed by grid expansion and complexity. Early engineering solutions typically aimed to enhance robustness through redundancy measures and improved protective devices [38]; however, the notion of a system that could detect, isolate, and autonomously recover from faults without significant human intervention was not fully articulated until the latter half of the 20th century. Although the evolution of self-healing strategies is well documented in the literature, the implementation and operational effectiveness of these strategies in large-scale power systems remain insufficiently verified, and field trials are needed to evaluate their real-world performance across various environmental and operational contexts, including urban subway power systems. This subsection traces the foundational developments that led to contemporary self-healing frameworks, emphasizing the evolution of control paradigms, the role of technological innovations such as supervisory control and data acquisition (SCADA) systems, and the gradual shift toward intelligent, automated solutions.

2.2.1. Early Concepts and Precursor Technologies

Prior to the advent of fully computerized control centers, power systems relied on mechanical relays, manual switchgear, and onsite personnel to handle contingencies. Protective relays were designed to operate when specific fault conditions exceeded threshold limits, thus providing a basic isolation mechanism. While these early solutions prevented catastrophic equipment damage, they were reactive in nature and limited by a lack of real-time data or predictive insight. Operators could only respond to disturbances once alarms were triggered or visible signs of failure became apparent.

The introduction of SCADA technology during the mid-20th century marked a major milestone in laying the groundwork for self-healing strategies [39]. SCADA systems allowed for remote monitoring and control of substations, transforming operational practices by enabling operators to gather near-real-time data regarding voltage levels, current flows, breaker statuses, and other critical parameters [40,41]. This shift created a platform for more advanced computational tools that could process large volumes of data and inform decision-making at control centers.

Simultaneously, power systems research began to address dynamic stability problems, frequency control, and load forecasting. The emergent field of power system stability studies spearheaded by various research groups underscored the need for adaptive, real-time techniques to maintain system equilibrium after disturbances. These developments foreshadowed the modern idea of “self-healing”, wherein the system would respond to perturbations with minimal external intervention.

2.2.2. The Emergence of Self-Healing Principles in the Late 20th Century

As computational capabilities expanded in the 1970s and 1980s, power engineers and researchers explored ways to automate fault detection and isolation [42,43]. Ref. [42], for example, discusses microprocessor-based digital relays and their application to self-healing systems. These digital relays supplanted their purely electromechanical counterparts, offering more flexible protection schemes and the ability to communicate detailed fault information back to central control systems. From the 1980s onward, the wider use of microprocessors in substations and control centers enabled real-time data analytics, an essential enabler for the advanced functionalities that characterize self-healing systems.

During this period, the term “self-healing” began to appear in power system discourse, reflecting a move from static reliability concepts (such as N-1 contingency planning) to dynamic resilience and adaptability. Early academic studies proposed hierarchical control architectures that would locally identify faults, isolate affected segments, and promptly reconfigure the network to restore service. Yet, practical implementation faced significant barriers, including the complexity of coordinating multiple control agents, the limited bandwidth and reliability of communication channels, and the computational cost of running real-time algorithms on then-current hardware.

The hierarchical control architecture is a well-established framework for organizing complex control tasks into multiple layers, ensuring efficient management and operation across large-scale systems. Initially proposed by Professor G.N. Saridis of Purdue University in 1977, the hierarchical control architecture has been widely applied in various fields, including electrical power systems. This approach divides control responsibilities into three distinct levels: the organizational level, the coordination level, and the execution level. The adaptability and clear structure of this architecture have made it a key tool for managing and controlling the complex, distributed systems found in modern electrical grids and transit networks.

In the context of subway power systems, the hierarchical control architecture offers a structured solution to the challenges of fault recovery and system stability. Subway power systems, characterized by their intricate, multi-layered distribution network, benefit from the clear division of responsibilities inherent in this architecture. The organizational level oversees strategic decision-making and global optimization, while the coordination level manages the distribution of tasks and ensures coordination between different subsystems. The execution level, where actual control and operational adjustments occur, is responsible for fault isolation, reconfiguration, and system recovery.

1. Application of Hierarchical Control in Subway Power Systems

In the context of subway power systems, the hierarchical control framework aligns well with the system’s natural structure, which spans from the main power stations to substations and finally to the traction systems. Each level of the architecture performs specific functions tailored to the subway’s operational requirements:

(1): Organizational Level: At this highest level, the central control center formulates global power strategies, ensuring the continuous operation and safety of the system. It is responsible for strategic planning, long-term optimization, and high-level decision-making. The decisions made at this level influence the overall performance and resilience of the subway power network.
(2): Coordination Level: The substation level corresponds to the coordination level, where individual regions or sub-networks are managed. This level coordinates the activities of various subsystems, ensuring that resources are effectively allocated, especially during fault conditions. Coordination includes dynamic load balancing, fault recovery task prioritization, and optimization of energy distribution across the network. The system is capable of rapid response during fault events, adjusting operational parameters to restore normal conditions as quickly as possible.
(3): Execution Level: The execution level consists of intelligent devices and control units, such as circuit breakers, switches, and sensors. These devices perform specific actions, such as disconnecting faulty areas, restoring power from backup sources, and adjusting load distribution. The execution level’s effectiveness is critical for minimizing the impact of faults, as it directly influences the speed and precision of the fault recovery process.
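The three levels described above can be sketched as cooperating objects. In this minimal illustration, which is an assumption for brevity rather than a design taken from the literature, the organizational level ranks de-energized loads by priority, a substation coordinator allocates its spare capacity greedily in that order, and breakers at the execution level carry out the resulting commands. All class names, load names, and values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Breaker:
    """Execution level: a switching device that carries out commands."""
    name: str
    closed: bool = False

    def close(self) -> str:
        self.closed = True
        return f"{self.name} closed"


@dataclass
class Coordinator:
    """Coordination level: one substation allocating its spare capacity."""
    spare_kw: float
    breakers: dict[str, Breaker] = field(default_factory=dict)

    def restore(self, requests: list[tuple[str, float]]) -> list[str]:
        """Close breakers in the order received while spare capacity lasts."""
        log = []
        for breaker_name, load_kw in requests:
            if load_kw <= self.spare_kw:
                self.spare_kw -= load_kw
                log.append(self.breakers[breaker_name].close())
            else:
                log.append(f"{breaker_name} deferred (insufficient capacity)")
        return log


def organize(loads: list[tuple[str, float, int]]) -> list[tuple[str, float]]:
    """Organizational level: rank de-energized loads by priority (1 = most critical)."""
    return [(name, kw) for name, kw, _priority in sorted(loads, key=lambda item: item[2])]


if __name__ == "__main__":
    loads = [("lighting", 120.0, 2), ("traction", 900.0, 1), ("retail", 300.0, 3)]
    substation = Coordinator(spare_kw=1100.0,
                             breakers={name: Breaker(name) for name, _, _ in loads})
    for entry in substation.restore(organize(loads)):
        print(entry)
```

Here the traction and lighting feeders are restored first, while the retail load is deferred because the remaining 80 kW of spare capacity cannot supply it; escalating such deferred tasks back to the organizational level would be a natural extension.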

2. Hierarchical Control Architecture Scheme

Based on the above, Figure 1 presents a hierarchical control architecture scheme that visually represents the division of control responsibilities across the different levels of a power system. The diagram illustrates how control tasks are organized, starting from the strategic decisions made at the organizational level, which cascade down to the coordination level where operational tasks are distributed and managed. Finally, at the execution level, the schematic shows the physical devices that carry out the control actions necessary for fault recovery and system stabilization.

The hierarchical control architecture in Figure 1 organizes power system management into three levels: organizational, coordination, and execution. At the organizational level, global strategies and resource allocation are determined. The coordination level manages inter-subsystem cooperation, ensuring efficient fault recovery. The execution level handles direct actions, such as fault isolation and load reconfiguration, through intelligent devices. In subway power systems, this architecture enhances operational stability and fault recovery by clearly delineating responsibilities across levels, enabling rapid response to disruptions, optimizing energy distribution, and improving overall system resilience, crucial for maintaining continuous and reliable service in complex transit networks.

This layered structure is designed to optimize the management of complex power systems by ensuring that each level focuses on specific tasks, with minimal overlap. The organizational level ensures that global objectives are met, while the coordination level handles the real-time adjustment and distribution of tasks across the system. The execution level then implements these decisions, taking direct action to address faults and restore normal operations.

In summary, the hierarchical control architecture offers a comprehensive, adaptable framework for managing subway power systems, providing significant advantages in terms of fault detection, isolation, and recovery. By clearly dividing control responsibilities into multiple layers, it enhances system stability, improves recovery times, and allows for more efficient management of resources, making it well suited to the highly complex and dynamic environment of subway power systems.

2.2.3. Influence of the Smart Grid Paradigm

The evolution of self-healing strategies gained further momentum with the emergence of the “smart grid” paradigm in the early 21st century. Smart grid initiatives emphasized digitalization, two-way communication, and integration of decentralized energy resources to enhance sustainability and efficiency. Within this paradigm, self-healing became a vital functionality, aiming to maintain power quality and reliability amid a growing proliferation of distributed energy resources (DERs), such as photovoltaic systems and wind farms, and increasing load volatility caused by electric vehicle charging and other new demands.

With greater sensor deployment—ranging from phasor measurement units (PMUs) in transmission systems to intelligent electronic devices (IEDs) in distribution networks—operators acquired a richer set of real-time measurements. Coupled with advanced analytics, these data streams opened the door to automated fault management protocols. Self-healing functions within the smart grid context typically entailed the following:

(1): Wide-Area Monitoring, Protection, and Control (WAMPAC): PMUs measuring voltage and current phasors synchronized to a global positioning system (GPS) time reference provided near-instantaneous snapshots of system conditions [44]. Such granular visibility enabled early fault detection and advanced protection schemes that adapt to changing conditions [45].
(2): Distributed Intelligent Control: The shift from monolithic control architectures to decentralized or distributed approaches, wherein local controllers or agents communicate and collaborate, accelerated. This setup was regarded as crucial for self-healing, as localized intelligence can isolate faults closer to their source and coordinate reconfiguration strategies quickly.
(3): Predictive and Preventive Measures: Smart grids embraced a shift from reactive fault management to proactive asset management and system planning. Machine learning models and robust optimization techniques were developed to predict equipment failures, forecast load patterns, and identify vulnerabilities in the network topology.

By integrating these elements, the modern power industry envisioned systems capable of maintaining stability and continuity of service under a variety of operational threats.

2.2.4. Convergence with Multi-Agent Systems and Artificial Intelligence

Although early self-healing concepts relied heavily on centralized approaches, the limitations of single-point decision-making—such as communication bottlenecks and slower response times—spurred interest in MASs. In MAS frameworks, multiple intelligent agents (e.g., at substations, feeder lines, or distributed generators) interact, negotiate, and collaborate to detect, isolate, and remedy faults. Each agent typically possesses partial knowledge of the system but is capable of local decision-making, thus distributing the computational burden and avoiding single points of failure.
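As a minimal sketch of local decision-making with partial knowledge, the agents below sit at the sectionalizing switches of a radial feeder. Each agent knows only whether its own switch passed fault current and, through a message from its immediate downstream neighbor, whether that switch did. Because fault current flows from the source through every switch up to the fault, the faulted section is the last one whose switch saw fault current. The feeder model assumes a single source and no fault-current contribution from distributed generation, and the names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SectionAgent:
    """Agent at the upstream switch of one feeder section (hypothetical model)."""
    name: str
    saw_fault_current: bool
    downstream: SectionAgent | None = None  # neighbor reachable by peer messaging

    def locate(self) -> bool:
        """Decide locally whether the fault lies inside this agent's section."""
        neighbour_saw = self.downstream is not None and self.downstream.saw_fault_current
        return self.saw_fault_current and not neighbour_saw

    def isolation_switches(self) -> list[str]:
        """Switches to open: this section's own switch and its downstream neighbor's."""
        switches = [self.name]
        if self.downstream is not None:
            switches.append(self.downstream.name)
        return switches


def build_feeder(flags: list[bool]) -> list[SectionAgent]:
    """Chain agents S1..Sn from the source outward along a radial feeder."""
    agents = [SectionAgent(f"S{i + 1}", flag) for i, flag in enumerate(flags)]
    for upstream, downstream in zip(agents, agents[1:]):
        upstream.downstream = downstream
    return agents


if __name__ == "__main__":
    feeder = build_feeder([True, True, False, False])  # fault inside section S2
    for agent in feeder:
        if agent.locate():
            print(f"fault in section {agent.name}; open {agent.isolation_switches()}")
```

No agent needs the full network state: the decision uses two bits of information, which illustrates how MAS frameworks spread the computational burden and avoid a single point of failure.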

AI further revolutionized self-healing strategies by enabling more sophisticated fault detection, classification, and system optimization [46,47]. Techniques such as artificial neural networks [46], support vector machines, deep learning [47], and fuzzy logic controllers facilitated rapid and accurate fault diagnosis, particularly in complex or noisy scenarios. As AI algorithms matured, they began to provide real-time decision support for reclosing sequences, sectionalizing, and load transfer operations [48]. Additionally, advanced machine learning models that rely on historical data and real-time sensor inputs could predict incipient failures in cables, transformers, or switchgear, thereby allowing proactive or condition-based maintenance to avert large-scale outages. Despite promising theoretical results, empirical testing is crucial to verify the real-world performance of MASs in self-healing systems. Simulations and pilot studies in operational environments will be key to refining MAS frameworks, especially when applied to metro power systems, which pose unique challenges such as rapid load fluctuations and complex network topologies.

By the early 2010s, industrial pilot projects began to demonstrate the feasibility of full or partial self-healing systems at distribution and sub-transmission voltage levels. These systems responded autonomously to single-phase or multi-phase faults by performing fault isolation and service restoration within seconds or minutes. Some utilities reported substantial improvements in reliability indices such as the system average interruption duration index (SAIDI) and the system average interruption frequency index (SAIFI).
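SAIFI and SAIDI are customer-weighted averages: SAIFI divides the total number of customer interruptions by the number of customers served, and SAIDI divides the total customer-minutes of interruption by the same number. The sketch below follows the IEEE 1366 convention of counting only sustained interruptions (longer than five minutes). The outage records are hypothetical; they show how restoring most customers within about a minute lowers both indices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interruption:
    customers: int       # customers interrupted by the event
    duration_min: float  # minutes until their supply was restored


def reliability_indices(events, customers_served, sustained_threshold_min=5.0):
    """Return (SAIFI, SAIDI) counting only sustained interruptions."""
    sustained = [e for e in events if e.duration_min > sustained_threshold_min]
    saifi = sum(e.customers for e in sustained) / customers_served
    saidi = sum(e.customers * e.duration_min for e in sustained) / customers_served
    return saifi, saidi


if __name__ == "__main__":
    # Hypothetical records for 10,000 customers: the same two faults without and
    # with automatic restoration (most customers transferred within one minute).
    manual = [Interruption(2000, 90.0), Interruption(1500, 45.0)]
    self_healing = [Interruption(1800, 1.0), Interruption(200, 90.0),
                    Interruption(1300, 1.0), Interruption(200, 45.0)]
    for label, events in (("manual", manual), ("self-healing", self_healing)):
        saifi, saidi = reliability_indices(events, 10_000)
        print(f"{label}: SAIFI = {saifi:.2f}, SAIDI = {saidi:.2f} min")
```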

2.2.5. Lessons Learned and Ongoing Challenges

Decades of technological progress show that self-healing strategies can significantly enhance power system resilience. Yet, challenges remain. Among the key lessons learned from these historical developments are the following:

(1): Communication Infrastructure: Adequate, reliable, and secure data exchange is critical for successful self-healing. Historically, the absence of high-bandwidth, low-latency communication hampered early initiatives, underscoring the need for robust communication standards and architectures, such as IEC 61850, to ensure interoperability among devices and systems.
(2): Coordination Complexity: The transition from centralized to distributed control paradigms introduces complexity in coordination among multiple agents. The necessity of robust algorithms for consensus, negotiation, and conflict resolution remains an important area of ongoing research.
(3): Scalability: Early demonstration projects often took place on relatively small-scale feeders. Scaling up self-healing solutions to entire distribution networks or interlinked systems involving numerous microgrids requires careful architectural design that balances local autonomy with central oversight.
(4): Cybersecurity Concerns: Increased digitalization raises the threat of cyberattacks, data tampering, and privacy breaches. Protecting self-healing frameworks from malicious interventions or denial-of-service attacks presents a nontrivial challenge that requires sophisticated security protocols and risk assessment methodologies.
(5): Economic Viability: Self-healing systems can be capital-intensive to implement, especially in existing grids with aging infrastructure. The cost-effectiveness of retrofits, the complexity of new device installation, and the required training of operational staff are all factors influencing widespread adoption.

Overall, the historical evolution of self-healing in electrical power systems reflects the progression from manual, reactive fault handling toward a digitally enabled, data-driven, and intelligent control paradigm. This evolution underscores the potential for similar developments in niche application areas, most notably subway power systems, which share many of the reliability and safety imperatives that have historically shaped the broader electrical grid.

2.3. Key Components and Architecture of Self-Healing Mechanisms in Modern Power Networks

Modern self-healing power networks are defined by sophisticated hardware and software components designed to ensure rapid fault detection, isolation, and system reconfiguration. These systems rely on the synergy of advanced sensors, protection devices, communication protocols, and intelligent algorithms to deliver an automated, efficient, and highly resilient energy supply. Despite the promise of these technologies, however, pilot studies are essential to assess how effective self-healing mechanisms actually are in operational systems. In particular, metro power systems, with their unique load dynamics and safety requirements, require extensive operational trials of fault detection and isolation algorithms. This subsection dissects the primary building blocks of self-healing mechanisms as they have evolved in contemporary electrical grids, focusing on the architectural arrangements, control paradigms, and underlying standards (particularly IEC 61850) that facilitate interoperability and real-time responsiveness.

2.3.1. Hardware Foundation: Intelligent Electronic Devices, Sensors, and Switchgear

At the physical layer of a self-healing network, intelligent electronic devices (IEDs) and sensors form the backbone of measurement and protection. IEDs are microprocessor-based controllers that perform multiple functions, such as protective relaying, metering, and local automation. They gather high-resolution data on current, voltage, frequency, and harmonic content, enabling sophisticated fault detection schemes. When integrated with remote terminal units (RTUs) or SCADA systems, these IEDs relay detailed status updates and measurements to central or distributed controllers.

Equally important are the automated switchgear components (reclosers, sectionalizers, and circuit breakers) that physically isolate faulty segments and reconfigure network topology. The switchgear must respond rapidly and reliably to commands generated by the control logic. Advancements in switchgear design, including the use of vacuum or SF6 interrupting media, have improved operational speed and reduced maintenance requirements. In many modern systems, these components can be triggered either by local protective relays or by higher-level controllers orchestrating broader reconfiguration strategies.
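The following minimal sketch shows how a recloser and a downstream sectionalizer can isolate a faulty segment without any communication. The sectionalizer counts interruptions that follow a passage of fault current and opens during the recloser's dead time once its count is reached, so the next reclosure re-energizes the healthy upstream section. The shot and count settings are hypothetical; the sketch assumes the common coordination rule that the sectionalizer count is lower than the recloser's operations to lockout, and it ignores count-reset timers.

```python
def recloser_sequence(fault_permanent: bool, reclose_shots: int = 3,
                      sectionalizer_count: int = 2) -> tuple[list[str], bool]:
    """Simulate a fault downstream of a sectionalizer fed through a recloser.

    Returns the event log and whether the sectionalizer ended up open.
    """
    log: list[str] = []
    interruptions = 0
    fault_present = True
    sectionalizer_open = False
    for shot in range(reclose_shots + 1):
        if not fault_present or sectionalizer_open:
            log.append("recloser holds closed")
            break
        log.append("recloser trips")
        interruptions += 1  # sectionalizer saw fault current, then loss of voltage
        if interruptions >= sectionalizer_count:
            sectionalizer_open = True
            log.append("sectionalizer opens during dead time")
        if not fault_permanent:
            fault_present = False  # a transient fault clears once de-energized
        if shot == reclose_shots:
            log.append("recloser locks out")
            break
        log.append("recloser recloses")
    return log, sectionalizer_open


if __name__ == "__main__":
    for permanent in (False, True):
        events, isolated = recloser_sequence(permanent)
        print(f"permanent={permanent}: {events} -> segment isolated: {isolated}")
```

A transient fault is cleared by the first reclosure with no segment lost, whereas a permanent fault is isolated after the second trip and the upstream supply is then restored.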

2.3.2. Communication Protocols and Standards

A robust communication framework is vital for self-healing. Power utilities have historically used proprietary protocols, which often hindered interoperability. However, industry-wide acceptance of open communication standards, such as IEC 61850, has greatly facilitated multi-vendor interoperability and laid the groundwork for integrated, system-wide self-healing solutions.

IEC 61850 delineates a comprehensive data model and communication framework for substation automation. Its object-oriented design structures data into logical nodes that represent devices, measurements, and control functions. This approach allows for seamless exchange of information among relays, protection devices, and supervisory systems. Notably, IEC 61850 supports generic object-oriented substation event (GOOSE) messaging, which provides high-priority, low-latency data transfer for critical protection and control signals. Through GOOSE, devices can publish or subscribe to messages on the network, enabling rapid relay coordination and sophisticated interlocking schemes.
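To illustrate the publish/subscribe idea, the toy model below mimics two GOOSE behaviors: a state number (stNum) that increments whenever the published data change, and a sequence number (sqNum) that increments on each retransmission of an unchanged state, with retransmissions starting fast after a change and backing off to a heartbeat interval. It is an in-process simulation, not the IEC 61850-8-1 Ethernet encoding. The doubling back-off, the 2 ms and 1000 ms intervals, the restart of sqNum at 0, and the rule that the announced time-to-live equals twice the next expected interval are all illustrative assumptions, since actual retransmission profiles are device-specific.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GooseFrame:
    st_num: int  # state number: incremented whenever the published data change
    sq_num: int  # sequence number: incremented on each retransmission of a state
    value: bool  # hypothetical dataset, e.g., a trip signal
    ttl_ms: int  # time allowed to live announced to subscribers


class GoosePublisher:
    """Toy publisher: fast repeats after a change, backing off to a heartbeat."""

    def __init__(self, min_interval_ms: int = 2, heartbeat_ms: int = 1000):
        self.min_interval_ms = min_interval_ms
        self.heartbeat_ms = heartbeat_ms
        self.st_num, self.sq_num, self.value = 0, 0, False
        self.interval_ms = heartbeat_ms

    def _frame(self) -> GooseFrame:
        # Assumption: subscribers may wait twice the next expected interval.
        return GooseFrame(self.st_num, self.sq_num, self.value, 2 * self.interval_ms)

    def change(self, value: bool) -> GooseFrame:
        """Publish a new state immediately and restart the fast-repeat schedule."""
        self.value = value
        self.st_num += 1
        self.sq_num = 0
        self.interval_ms = self.min_interval_ms
        return self._frame()

    def retransmit(self) -> GooseFrame:
        """Repeat the current state with a doubled (capped) interval."""
        self.sq_num += 1
        self.interval_ms = min(2 * self.interval_ms, self.heartbeat_ms)
        return self._frame()


class GooseSubscriber:
    def __init__(self):
        self.last_st_num = None

    def receive(self, frame: GooseFrame) -> bool:
        """Return True only when the frame reports a new state (a new stNum)."""
        is_new = frame.st_num != self.last_st_num
        self.last_st_num = frame.st_num
        return is_new


if __name__ == "__main__":
    pub, sub = GoosePublisher(), GooseSubscriber()
    print(sub.receive(pub.change(True)))  # new state: act on it
    print(sub.receive(pub.retransmit()))  # repeat of the same state: no new action
```

The fast repeats make a lost frame unlikely to delay a protection signal, while the heartbeat lets subscribers detect a silent publisher once the announced time-to-live expires.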

Furthermore, modern systems may employ protocols like DNP3 (Distributed Network Protocol) or Modbus for backward compatibility, while layering advanced cybersecurity measures (e.g., encryption and authentication) to safeguard communications. Where wide-area coordination is necessary, particularly in transmission-level self-healing or large-scale distribution automation schemes, telecommunication technologies such as fiber optics, wireless mesh networks, or 5G solutions can be employed to achieve the latency and reliability thresholds required for real-time control.
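As a sketch of the authentication measures mentioned above, the code below attaches an HMAC-SHA256 tag to a switching command and rejects commands whose tag does not verify (tampering or forgery) or whose sequence number is not newer than the last accepted one (replay). The JSON message layout, device name, and key are hypothetical. This is not the wire format of DNP3 Secure Authentication or IEC 62351, it provides authentication only and no encryption, and key distribution is out of scope.

```python
import hashlib
import hmac
import json


def _tag(key: bytes, body: dict) -> str:
    """HMAC-SHA256 over a canonical (sorted-key) JSON encoding of the command."""
    payload = json.dumps(body, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def sign_command(key: bytes, seq: int, device: str, action: str) -> dict:
    body = {"seq": seq, "device": device, "action": action}
    return {**body, "mac": _tag(key, body)}


class CommandVerifier:
    """Field-side check run before a command reaches the switchgear."""

    def __init__(self, key: bytes):
        self.key = key
        self.last_seq = -1

    def accept(self, msg: dict) -> bool:
        body = {k: msg[k] for k in ("seq", "device", "action")}
        if not hmac.compare_digest(_tag(self.key, body), msg.get("mac", "")):
            return False  # tampered or forged command
        if body["seq"] <= self.last_seq:
            return False  # replay of an old command
        self.last_seq = body["seq"]
        return True


if __name__ == "__main__":
    key = b"demo-shared-key"  # hypothetical pre-shared key
    verifier = CommandVerifier(key)
    cmd = sign_command(key, 1, "feeder_breaker_1", "open")
    print(verifier.accept(cmd))  # genuine command
    print(verifier.accept(cmd))  # same frame replayed
    print(verifier.accept(dict(sign_command(key, 2, "feeder_breaker_1", "open"), action="close")))
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not leak how many characters of the tag matched.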

2.3.3. Control Hierarchies: Centralized, Decentralized, and Distributed Approaches

One of the most critical aspects of self-healing architecture is the organizational structure of control. Historically, centralized approaches dominated, wherein control centers collected measurements from the entire network, executed fault detection and isolation algorithms, and issued commands to field devices. This approach can be effective in relatively small or well-defined systems, but it risks single points of failure and communication bottlenecks, which become problematic as the network grows in complexity.

In contrast, decentralized (or hierarchical) approaches distribute decision-making authority closer to the field level, granting local controllers the autonomy to detect and respond to faults. A commonly adopted structure is a three-tier hierarchy:

(1): Primary Control (Local/Device Level): Protective relays and IEDs that execute overcurrent detection, undervoltage protection, or distance protection. They can isolate faults locally with minimal latency.
(2): Secondary Control (Feeder or Zone Level): Substation-based controllers that coordinate reconfiguration among multiple feeders or zones. They receive aggregated data from local devices and can implement advanced reconfiguration strategies such as switching feeder ties or transferring loads.
(3): Tertiary Control (Control Center Level): Higher-level supervision that oversees the entire utility network, optimizing long-term planning, load balancing, and restoration procedures when local measures are insufficient.
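The escalation logic implied by this three-tier hierarchy can be sketched as follows: a fault is offered to the lowest tier first and passed upward only if that tier cannot resolve it. The `Tier` names mirror the list above. The handler interface and the fault fields (`isolated_locally`, `tie_available`) are hypothetical placeholders for real relay, zone-controller, and control-center logic.

```python
from enum import Enum


class Tier(Enum):
    PRIMARY = 1    # local relays / IEDs
    SECONDARY = 2  # feeder- or zone-level controller
    TERTIARY = 3   # control center


def handle_fault(fault, handlers):
    """Offer the fault to each tier in order, escalating until one resolves it.

    `handlers` maps each Tier to a callable returning True when that tier
    resolved the fault (hypothetical interface).
    """
    log = []
    for tier in Tier:  # Enum members iterate in definition order
        resolved = handlers[tier](fault)
        log.append((tier.name, resolved))
        if resolved:
            return tier, log
    return None, log


# Hypothetical handlers for illustration.
handlers = {
    Tier.PRIMARY: lambda f: f["isolated_locally"],
    Tier.SECONDARY: lambda f: f["tie_available"],
    Tier.TERTIARY: lambda f: True,  # e.g., operator-guided restoration
}
```

This strictly sequential model is a simplification; in practice the tiers also act concurrently, with local protection tripping while higher tiers plan reconfiguration.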

A fully distributed or multi-agent architecture further refines the decentralized approach by allowing intelligent agents—each equipped with localized sensing, decision-making, and communication capabilities—to collaborate with one another. This MAS approach is particularly powerful for fault restoration in complex distribution networks, as it can reduce computation time and enhance system-wide resilience. Agents may employ consensus algorithms, negotiation protocols, or artificial intelligence techniques to optimize reconfiguration in real time.
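As one concrete instance of the consensus algorithms mentioned above, the sketch below implements standard discrete-time average consensus. Each agent repeatedly nudges its value toward those of its direct neighbors, and all agents converge to the network-wide average without a central collector. The agent names, the line-shaped communication graph, and the quantity being agreed on (spare capacity in MW) are hypothetical.

```python
def average_consensus(values, neighbors, epsilon, iterations):
    """Discrete-time average consensus: x_i <- x_i + epsilon * sum_j (x_j - x_i).

    On a connected undirected graph with 0 < epsilon < 1 / max_degree, every
    agent's value converges to the average of the initial values, using only
    exchanges with direct neighbors.
    """
    x = dict(values)
    for _ in range(iterations):
        # Synchronous update: all agents use the previous iteration's values.
        x = {i: x[i] + epsilon * sum(x[j] - x[i] for j in neighbors[i]) for i in x}
    return x


# Hypothetical example: four substation agents in a line share local spare
# capacity (MW) and agree on the network-wide average without a central node.
spare_mw = {"TSS1": 10.0, "TSS2": 0.0, "TSS3": 4.0, "TSS4": 6.0}
links = {"TSS1": ["TSS2"], "TSS2": ["TSS1", "TSS3"],
         "TSS3": ["TSS2", "TSS4"], "TSS4": ["TSS3"]}
agreed = average_consensus(spare_mw, links, epsilon=0.2, iterations=100)
```

Here the maximum degree is 2, so epsilon = 0.2 satisfies the convergence condition, and every agent's value approaches the average of 5.0 MW.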

2.3.4. Core Functions of Self-Healing Mechanisms

Despite differences in architectural preferences and technology stacks, most self-healing systems revolve around a set of shared core functions:

(1): Fault Detection and Classification: High-speed relays, coupled with modern sensor networks, identify abnormal conditions (e.g., short circuits, overcurrent, or voltage collapse) and classify the type and location of the fault. AI-based classifiers often enhance accuracy under noisy conditions or complex fault scenarios.
(2): Fault Isolation: Once a fault is identified, circuit breakers, reclosers, or sectionalizers operate to isolate only the affected section. This isolation must be performed quickly to mitigate damage and maintain stability in the healthy portions of the system.
(3): Service Restoration (Reconfiguration): The most distinctive feature of self-healing systems is their capacity to reroute power around the fault, restoring service to the greatest extent possible. Automated reconfiguration may involve closing tie switches or adjusting feeder topology. MASs can play a significant role in coordinating these reconfigurations autonomously.
(4): System Optimization: Beyond restoring service, many self-healing frameworks incorporate optimization functions that improve voltage profiles, line loading, and overall reliability. Techniques such as dynamic voltage regulation, reactive power compensation, and automated load shedding contribute to system stability and performance.
(5): Predictive and Preventive Maintenance: Self-healing extends beyond fault handling to proactively safeguard system health. Condition monitoring of critical assets (e.g., transformers and cables) and AI-driven anomaly detection can reduce the incidence of unexpected failures and optimize maintenance scheduling.
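Functions (2) and (3) can be illustrated on the simplest case: a single radial feeder split into sections by sectionalizing switches, with a normally open tie switch to an alternate source at the far end. The sketch below uses hypothetical switch names `SW0`, `SW1`, and so on, and an all-or-nothing transfer rule. It is a simplification; practical schemes also check voltage limits, protection coordination, and partial load transfer.

```python
def flisr(sections, faulted, alt_capacity_mw, loads_mw):
    """Fault location, isolation, and service restoration on one radial feeder.

    sections: section names ordered from the main source toward the normally
    open tie switch. Switch SWk sits just upstream of sections[k], so SW0 is
    the feeder breaker; the tie switch follows the last section.
    Returns (switches to open, whether to close the tie, sections restored).
    """
    k = sections.index(faulted)
    # Isolate the faulted section between its upstream and downstream switches.
    to_open = [f"SW{k}"] + ([f"SW{k + 1}"] if k + 1 < len(sections) else [])
    upstream = sections[:k]        # re-energized from the main source once isolated
    downstream = sections[k + 1:]  # healthy but cut off from the main source
    # Back-feed downstream sections through the tie only if the alternate
    # source can carry their whole load (all-or-nothing simplification).
    close_tie = bool(downstream) and sum(loads_mw[s] for s in downstream) <= alt_capacity_mw
    restored = upstream + (downstream if close_tie else [])
    return to_open, close_tie, restored
```

With sections A to D and a fault in B, the sketch opens SW1 and SW2, re-energizes A from the main source, and back-feeds C and D through the tie if the alternate source has enough capacity.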

2.3.5. Role of Artificial Intelligence and Advanced Analytics

Modern power networks leverage AI and big data analytics to implement adaptive, predictive, and real-time self-healing solutions. AI methods excel at interpreting the vast influx of sensor data, enabling the following [49,50,51,52,53]:

(1): Fault Pattern Recognition: Neural networks and machine learning models can detect subtle fault precursors by analyzing waveform distortions, harmonic anomalies, or partial discharge data.
(2): Real-Time Contingency Analysis: AI-driven simulators can run contingency analyses in parallel, evaluating various switching actions or load transfers under multiple fault scenarios.
(3): Adaptive Protection: In networks with high penetration of distributed generation, fault levels and power flows can vary significantly. AI-based adaptive protection adjusts relay settings dynamically to accommodate changing conditions.
(4): Asset Health Forecasting: Machine learning algorithms parse historical failure data, meteorological records, and real-time measurements to predict the residual life of components, supporting proactive replacement or refurbishment decisions.

Such capabilities significantly bolster the autonomy and responsiveness of self-healing. Nevertheless, the adoption of AI necessitates robust verification, validation, and interpretability measures—especially for mission-critical power system applications.
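Adaptive protection (item 3 above) typically means recomputing relay settings when the network state changes, for example when distributed generation alters the fault current a relay sees. The sketch below uses the IEC standard inverse overcurrent characteristic, t = TMS · 0.14 / ((I/I_s)^0.02 − 1). It picks the most sensitive pickup that still exceeds peak load by a margin and detects the minimum expected fault by a sensitivity factor. The margin values (1.25 and 1.5) and the example currents are illustrative assumptions, and this rule-based update stands in for, rather than reproduces, the AI-based schemes discussed above.

```python
import math


def iec_standard_inverse(i_fault, i_pickup, tms):
    """Operating time (s) of the IEC standard inverse curve.

    t = TMS * 0.14 / ((I / Is) ** 0.02 - 1); no trip at or below pickup.
    """
    ratio = i_fault / i_pickup
    if ratio <= 1.0:
        return math.inf
    return tms * 0.14 / (ratio ** 0.02 - 1.0)


def adaptive_pickup(i_load_max, i_fault_min, load_margin=1.25, sensitivity=1.5):
    """Most sensitive pickup that still rides through peak load.

    Pickup must be at least load_margin * peak load and no more than
    i_fault_min / sensitivity so the smallest expected fault is detected.
    Margins are illustrative, not normative.
    """
    lowest = load_margin * i_load_max
    highest = i_fault_min / sensitivity
    if lowest > highest:
        raise ValueError("no pickup satisfies both the load and sensitivity margins")
    return lowest
```

When distributed generation connects and lowers the minimum fault current seen by the relay, rerunning `adaptive_pickup` with the new value either confirms the setting or raises an error indicating that overcurrent protection alone can no longer meet both margins.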

2.3.6. Security and Reliability Considerations

Because self-healing systems rely on extensive data exchange and automated decision-making, ensuring cybersecurity and reliability is paramount. Malicious actors could theoretically disrupt or manipulate automated functions, leading to unnecessary outages or, worse, physical damage to infrastructure. Consequently, modern architectures integrate the following [54,55,56]:

(1): Intrusion Detection Systems (IDSs): Deployed at the substation level to monitor suspicious network traffic or unauthorized system access.
(2): Encryption and Authentication: Communication protocols incorporate cryptographic methods to protect data integrity and confidentiality.
(3): Access Control Policies: Role-based access, multi-factor authentication, and stringent authorization policies limit the potential attack surface.
(4): Redundant Pathways: Networks are often designed with diverse communication paths and backup control systems, preventing single points of failure from compromising the entire self-healing mechanism.

In parallel, reliability assessments must consider the possibility of simultaneous equipment failures and communication outages. Scenario-based testing, hardware-in-the-loop simulations, and stress testing are commonly used to validate self-healing performance under extreme or cascading fault conditions.

2.3.7. Outlook: Convergence with Distributed Energy Resources and Microgrids

A contemporary trend shaping self-healing architectures is the rising prevalence of distributed energy resources (DERs). As more consumers install rooftop solar panels or adopt electric vehicles, distribution feeders can experience bidirectional power flow and dynamic load/generation profiles. Self-healing systems therefore require new algorithms capable of balancing local generation and consumption while maintaining system voltage and frequency stability.

Microgrids (localized energy networks that can operate autonomously) also introduce novel opportunities for self-healing [57,58]. In islanded mode, a microgrid’s self-healing mechanism can isolate internal faults and redispatch generation resources to preserve critical loads. When connected to the main utility grid, microgrids serve as controllable cells that bolster overall system resilience. Coordinating self-healing at the microgrid level with higher-level grid control is an active area of research, promising future improvements in reliability and energy efficiency.

In summary, the core components and architecture of modern self-healing mechanisms reflect a multifaceted interplay of advanced protective devices, intelligent data sharing governed by standards such as IEC 61850, distributed or multi-agent control paradigms, and AI-powered analytics. These technologies collectively undergird robust, flexible, and future-proof electrical power networks, setting the stage for specialized applications in subway systems, where the imperatives of safety, operational continuity, and rapid fault response are particularly pronounced.

2.4. Emerging Self-Healing Solutions in Subway Power Systems and Future Directions

Subway power systems, often referred to as traction power supply systems (illustrated in Figure 2), present a unique environment in which reliability, safety, and operational efficiency are paramount. Compared to traditional distribution networks, subway systems typically exhibit higher load densities over shorter distances, frequent load variations due to train acceleration and deceleration, and stringent safety requirements for passengers. These systems also incorporate specialized equipment such as rectifier transformers, third-rail or overhead catenary structures, and robust protective relays calibrated for traction loads. Although theoretical models and early-stage simulations show promise, empirical data from pilot studies and real-world subway systems are essential to validate the proposed self-healing solutions. Such studies will enable a comprehensive evaluation of the impact of self-healing on subway network reliability, operational efficiency, and safety. This subsection discusses how self-healing solutions are being adapted and refined to address the particular challenges of subway environments, highlighting key technological innovations, best practices, and prospects for future development.

Figure 2 illustrates the structure of an electrified railway traction power supply system, which supplies trains through either a single-phase or three-phase alternating current (AC) system or a direct current (DC) system. The system includes a transformer substation that converts high-voltage electricity to a level suitable for rail operations. The design aims for high reliability and a continuous power supply to the trains while limiting power losses. A significant feature is the return current rail, which provides the path for current to flow back to the substation, keeping the system efficient and stable. The power supply network is typically designed for single-side (or single-arm) feeding, which offers fault tolerance and makes the system easy to monitor and maintain.

1. Function of the Traction Substation

The traction substation in the electrified railway system plays a critical role in power conversion and distribution. It receives high-voltage electricity from the main grid and steps it down to lower voltages suitable for the traction system. The traction substation regulates the power supply, ensuring the voltage and frequency meet the requirements for the trains to operate smoothly. In addition, it handles the distribution of electricity across various sections of the rail network, ensuring continuous power delivery and operational reliability for trains.

2. Function of the Catenary and Track

The catenary system in the electrified railway system provides the overhead line through which electrical power is transmitted to the trains. It is connected to the traction substation and supplies electricity to the train’s pantograph, ensuring consistent voltage and current flow. The track, often referred to as the return current rail, serves as the return path for the electrical current. It ensures that the electrical loop is closed, allowing the traction system to function effectively and ensuring the safe operation of the railway network by maintaining the flow of electricity and reducing the risk of electrical faults.
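To make the roles of the catenary and return rail concrete, the following sketch treats them as one series loop resistance per kilometer and computes the voltage at a train's current collector. It compares single-end feeding with feeding from substations at both ends of a section, which is one reason why multiple traction substations help hold the line voltage. The 1500 V nominal voltage, the 0.03 Ω/km loop resistance, and the train current are illustrative assumptions; substation source impedance and rail-to-earth leakage are ignored.

```python
def train_voltage_single_end(v_sub, i_train, r_loop_per_km, x_km):
    """Single-end feeding: catenary-plus-rail loop resistance grows with distance x."""
    return v_sub - i_train * r_loop_per_km * x_km


def train_voltage_double_end(v_sub, i_train, r_loop_per_km, x_km, span_km):
    """Double-end feeding from two substations at equal voltage span_km apart.

    Train current divides inversely with distance to each substation, so the
    voltage drop is I * r * x * (L - x) / L, largest at mid-span.
    """
    return v_sub - i_train * r_loop_per_km * x_km * (span_km - x_km) / span_km
```

For a 3000 A train 2 km from a 1500 V substation, single-end feeding gives about 1320 V at the train, while double-end feeding over a 4 km span gives about 1410 V.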

3. Summary of a Typical Electrified Railway Traction Power Supply System

The electrified railway traction power supply system depicted in Figure 2 serves as the foundational architecture for providing electrical power to urban transit systems, such as subways. This system efficiently converts high-voltage electricity from the main grid through traction substations, which step down the voltage to levels suitable for traction operations. Key components include the catenary system, which supplies power to the trains, and the return current rail, which closes the electrical loop by guiding the current back. The system’s key advantages lie in its robust reliability, adaptability to varying operational loads, and its fault-tolerant design that ensures continuous power delivery under normal and fault conditions. A notable feature is the integration of both AC and DC systems, offering the flexibility to accommodate different types of trains and operational demands. The system’s hybrid nature allows it to efficiently manage energy distribution, enhance operational stability, and minimize power losses, ensuring the sustained operation of the subway network.

2.4.1. Characteristics and Challenges of Subway Power Systems

Unlike conventional distribution grids, subway power networks are designed to handle high transient currents caused by accelerating trains and regenerative braking. They also feature multiple traction substations spaced along the railway line to ensure stable voltage supply. Key challenges include the following [59,60,61]:

(1): Rapid Fluctuations in Load Demand: Train movements impose high-power draws within seconds, necessitating real-time monitoring of current and voltage profiles. Self-healing mechanisms must thus accommodate frequent load spikes without triggering false alarms.
(2): Critical Safety Requirements: Failure in a subway power circuit can strand trains in tunnels or disrupt essential ventilation and signaling systems. Any self-healing strategy must prioritize passenger safety, ensuring that fault isolation or network reconfiguration does not inadvertently disconnect essential loads or violate safety protocols.
(3): Limited Redundancy and Topological Constraints: While overhead distribution networks can add tie-lines or reconfigure feeders relatively easily, subway systems often have limited alternatives for routing power around a fault due to space constraints and rigid corridor layouts. This places heavier emphasis on pinpoint fault localization and targeted restoration strategies.
(4): Integration with Signaling and Control Systems: Subway power infrastructure is closely interlinked with signaling, communications, and station facilities. Coordinating self-healing events with traction power protection, passenger information systems, and operational schedules can be complex, requiring robust communication and control architectures.
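Challenge (1), tolerating load spikes without false alarms, is widely addressed in DC traction feeders by combining a rate-of-rise (di/dt) criterion with a current-increment (ΔI) criterion. Train acceleration raises current comparatively slowly, whereas a short circuit produces a rise that is both steep and large. The sketch below is a simplified version of that idea for fixed-step current samples; the thresholds (10 kA/s and 2 kA) and the test waveforms are assumptions for illustration, not recommended settings.

```python
def detect_dc_feeder_fault(samples_a, dt_s, didt_threshold, delta_i_threshold):
    """Flag a fault when a steep current rise also becomes large in magnitude.

    A rise event starts when di/dt exceeds didt_threshold; the increment
    delta-I is measured from the start of the event while the steep rise
    continues. Slow rises (e.g., train acceleration) never start an event.
    Returns (fault_detected, sample_index).
    """
    start = None
    for k in range(1, len(samples_a)):
        didt = (samples_a[k] - samples_a[k - 1]) / dt_s
        if didt > didt_threshold:
            if start is None:
                start = k - 1
            if samples_a[k] - samples_a[start] > delta_i_threshold:
                return True, k
        else:
            start = None  # steep rise interrupted: reset the event
    return False, None
```

A fuller scheme would also tolerate brief noise within a rise and include a time-delayed ΔI stage, but the two-criterion structure is the key point.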

2.4.2. Adapting Self-Healing Functions for Subway Applications

Subway power systems have begun to adopt many of the core self-healing functions originally developed for electrical distribution grids, albeit with tailored modifications to meet stringent traction needs.

(1): Fault Detection and Localization: Traditional overcurrent or distance protection relays, combined with advanced sensor arrays, are complemented by traction-specific detection algorithms that account for the distinctive waveforms and power electronics used in subway systems. For instance, in systems equipped with regenerative braking, fault signals can overlap with normal operational signals. Intelligent algorithms, often grounded in AI-based pattern recognition, can distinguish these conditions more accurately than conventional threshold-based relays.
(2): Isolation Strategies: Unlike overhead feeders, subway power rails or catenaries cannot always be sectionalized as flexibly. Self-healing solutions typically rely on specialized disconnect switches or breaker arrangements at traction substations. These devices must isolate the faulted segment while retaining power to adjacent segments, preventing a single fault event from cascading into large-scale service interruptions. The isolation strategy may also consider the dynamic location of trains, ensuring that no train is left in an unsafe or dark tunnel segment during the isolation process.
(3): Rapid Service Restoration: Given the high passenger throughput in urban subway networks, restoring power promptly is a top operational priority. Some subway operators deploy ring or loop architectures, allowing the line to be fed from multiple substations. When a fault occurs, the system automatically opens circuit breakers to isolate the fault and closes alternate pathways so that power can still be supplied from another substation. Adaptive algorithms within a multi-agent framework can further refine the restoration sequence, minimizing inrush currents and voltage dips when re-energizing lines.
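The idea of refining the restoration sequence to limit inrush can be illustrated with a simple greedy plan. Sections are re-energized in priority order and grouped so that the summed inrush peak in each step stays below a limit, with steps spaced far enough apart for inrush to decay. This is a centralized sketch of the concept, not the multi-agent algorithm itself. The section names, inrush peaks, limit, and step spacing are hypothetical.

```python
def staggered_restoration(sections, inrush_limit_a, step_s):
    """Group re-energization steps so summed inrush per step stays under a limit.

    sections: list of (name, priority, inrush_peak_a); a lower priority number
    is restored first. Assumes step_s is long enough for each step's inrush to
    decay before the next step begins.
    Returns a list of (time_offset_s, [section names]) steps.
    """
    plan, group, group_sum = [], [], 0.0
    for name, _priority, inrush in sorted(sections, key=lambda s: s[1]):
        if inrush > inrush_limit_a:
            raise ValueError(f"{name} alone exceeds the inrush limit")
        if group_sum + inrush > inrush_limit_a:
            plan.append(group)  # close the current step and start a new one
            group, group_sum = [], 0.0
        group.append(name)
        group_sum += inrush
    if group:
        plan.append(group)
    return [(i * step_s, g) for i, g in enumerate(plan)]
```

Greedy grouping keeps the sketch short; an optimizer or negotiating agents could instead trade restoration speed against voltage-dip limits.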

2.4.3. Leveraging IEC 61850 and Multi-Agent Systems in Subway Contexts

Building on the architectural insights gained from larger distribution systems, subway operators are increasingly looking to IEC 61850 for standardizing communications among traction substations, protective devices, and control centers. The flexibility of IEC 61850 logical nodes permits the modeling of traction-specific devices—like rectifier units or track section switches—ensuring that relevant fault signals and control commands can be shared efficiently. While these integration strategies have been discussed extensively in the literature, field trials and pilot programs are necessary to assess the true effectiveness of IEC 61850 in operational subway systems. Empirical case studies will provide valuable insights into the challenges and opportunities of implementing these technologies in metro environments.

MASs are proving especially promising in this domain. Agents deployed at each traction substation or track section can monitor local conditions (e.g., voltage levels, breaker states, and train locations) and communicate with neighboring agents to coordinate fault isolation and reconfiguration. By distributing intelligence throughout the power network, MASs can significantly reduce dependence on a central control center, thereby mitigating single-point failures and communication latency issues. In scenarios where partial or full communication loss occurs—an unfortunate but conceivable event in underground tunnels—MAS agents can resort to fallback strategies or local heuristics to maintain at least basic service levels.
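The fallback behavior described above can be sketched as a substation agent that tracks heartbeats from its neighbors. When it detects a fault on an adjacent section, it always opens its own breaker toward that section. If the neighbor's link is fresh, it also requests coordinated isolation; otherwise it falls back to relying on the neighbor's local protection. The class, message tuples, and 0.5 s timeout are hypothetical illustrations, not a specific MAS framework.

```python
class SubstationAgent:
    """Hypothetical traction-substation agent with a communication-loss fallback."""

    def __init__(self, name, neighbors, timeout_s=0.5):
        self.name = name
        self.timeout_s = timeout_s
        self.last_heard = {n: None for n in neighbors}
        # One feeder breaker toward each neighboring substation.
        self.breakers = {n: "closed" for n in neighbors}

    def on_heartbeat(self, neighbor, t):
        self.last_heard[neighbor] = t

    def neighbor_alive(self, neighbor, now):
        t = self.last_heard[neighbor]
        return t is not None and now - t <= self.timeout_s

    def on_fault_toward(self, neighbor, now):
        """Fault detected on the section between this agent and `neighbor`."""
        self.breakers[neighbor] = "open"  # always isolate this end locally
        if self.neighbor_alive(neighbor, now):
            return ("request_open", neighbor)  # coordinated isolation
        return ("local_fallback", neighbor)    # link lost: rely on neighbor's own protection
```

Breakers toward other neighbors stay closed in both cases, so adjacent healthy sections remain energized.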

2.4.4. AI-Driven Fault Prediction and Maintenance

Artificial intelligence techniques are increasingly adopted for condition-based maintenance and fault prediction in subway power systems, complementing their role in real-time restoration. For instance, traction power cables and switchgear can be equipped with sensors measuring temperature, partial discharge activity, and vibration. Machine learning models process these data to predict the health status of components and forecast the likelihood of imminent failure.

This predictive approach is especially valuable in subways where service disruptions can affect thousands of passengers in a short timeframe. By scheduling maintenance during off-peak hours or proactively replacing aging components, subway operators reduce the risk of disruptive breakdowns. Furthermore, advanced data analytics can optimize maintenance budgets by prioritizing interventions on components with the highest criticality and most pronounced signs of deterioration.
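The prioritization step can be sketched by scoring each component's latest sensor reading against its own history and weighting the score by criticality. A plain z-score stands in here for the machine learning models mentioned above; the component names, criticality weights, and temperature readings are hypothetical.

```python
from statistics import mean, stdev


def anomaly_score(history, latest):
    """z-score of the latest reading against the component's own history."""
    sigma = stdev(history)
    return 0.0 if sigma == 0 else (latest - mean(history)) / sigma


def prioritize(components):
    """Rank components by criticality x positive anomaly score, highest first.

    components: list of (name, criticality, history, latest_reading).
    """
    scored = [(name, crit * max(0.0, anomaly_score(hist, latest)))
              for name, crit, hist, latest in components]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical temperature readings (degrees C) for three assets.
components = [
    ("cable_A", 3.0, [40, 41, 39, 40, 40], 42),
    ("switchgear_B", 5.0, [30, 30, 31, 29, 30], 31),
    ("cable_C", 4.0, [50, 50, 50, 50, 50], 50),
]
ranking = prioritize(components)
```

Here cable_A ranks first despite its lower criticality, because its latest reading deviates more strongly from its normal behavior.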

2.4.5. Case Studies and Pilots

A growing number of urban transit authorities have undertaken pilot programs to test self-healing functionalities:

(1): Pilot A deployed a multi-agent system spanning multiple traction substations on a busy metropolitan rail line. Each substation agent automatically adjusted feeder connections when localized faults were detected. Early results showed a drastic reduction in fault clearance times and improved power quality during reconfiguration.
(2): Pilot B focused on AI-based fault prediction for critical power components. By combining historical data on cable insulation failures with real-time temperature and partial discharge sensors, the pilot achieved a substantial decrease in unexpected cable faults, enhancing overall system availability.
(3): Pilot C explored the integration of IEC 61850-based control architecture in a newly built subway extension. Standardized communication protocols allowed different vendors’ substations, protective relays, and SCADA systems to interoperate. The pilot demonstrated that advanced GOOSE messaging could achieve fault isolation within milliseconds, significantly reducing service interruptions.

While these pilots demonstrate the feasibility and benefits of self-healing solutions, they also highlight the importance of robust training programs for maintenance staff and control center operators, clear design guidelines for applying standards like IEC 61850 to traction scenarios, and thorough cybersecurity audits to safeguard the system from unauthorized access.

2.4.6. Future Directions and Research Opportunities

The evolution of subway power systems toward self-healing architectures continues to present numerous opportunities for further innovation and refinement. One of the primary research areas is the validation of self-healing technologies through pilot studies and real-world data collection. These studies will play a crucial role in demonstrating the practical applicability of these technologies in operational metro systems. Furthermore, research in advanced AI-based fault prediction and maintenance, coupled with real-time data analytics, will be essential to optimize self-healing processes and reduce the occurrence of service disruptions. These opportunities are summarized as follows.

(1): Integration with Smart Mobility and Energy Management: As urban areas adopt more holistic “smart city” strategies, subway power systems may be integrated with other mobility solutions, such as electric buses or shared autonomous vehicles, forming an interconnected transportation energy ecosystem. Coordinated energy management across these systems could unlock novel self-healing and load balancing capabilities, for example by rerouting surplus regenerative braking energy to nearby electrical loads or EV charging stations.
(2): Enhanced Sensor Deployment and Data Analytics: Future subways could leverage high-resolution sensors for continuous waveform monitoring, partial discharge analysis, and real-time location tracking of trains. With the advent of 5G and edge computing, massive data streams can be processed at the substation or trackside in near real time, facilitating ultra-fast fault detection and system reconfiguration. Research in advanced analytics, such as deep neural networks or reinforcement learning, promises to further refine these capabilities.
(3): Holistic Resilience Frameworks: Beyond electrical faults, subway systems may face a variety of disruptions, from extreme weather events (e.g., flooding in tunnels) to cyberattacks targeting control systems. Expanding self-healing to encompass multi-hazard resilience would involve integrated monitoring of infrastructure conditions (e.g., water leakage and track integrity) and dynamic adaptation of protective or evacuation measures. This comprehensive approach would require new interdisciplinary collaborations among electrical engineers, civil engineers, cybersecurity experts, and urban planners.
(4): Human-in-the-Loop vs. Full Autonomy: Although the end goal for many operators is to minimize human intervention, achieving full autonomy in critical infrastructure raises important questions regarding reliability, liability, and public acceptance. Ongoing research could investigate hybrid frameworks that allow human supervisors to override or guide self-healing decisions when system states deviate significantly from normal operating conditions. This “human-in-the-loop” paradigm can bolster operator trust while still leveraging the speed and efficiency of AI-driven automation.
(5): Regulatory and Standardization Needs: Uniform guidelines for applying IEC 61850 or similar standards to traction power systems remain in their infancy. Multiple national and international standard-setting bodies may need to coordinate new protocols specific to subway environments. Moreover, regulators must evaluate safety and reliability metrics in the context of self-healing performance, ensuring that subway operators maintain rigorous compliance with established norms.

2.4.7. Synthesis and Outlook

In summary, the adoption of self-healing technologies in subway power systems signifies a pivotal shift from reactive fault response to proactive, intelligent, and resilient operations. By leveraging principles from larger electrical grids—such as advanced sensor networks, IEC 61850-based communications, multi-agent coordination, and AI-enhanced analytics—subway operators can substantially improve service reliability and safety. The distinct challenges posed by subterranean environments, constrained topology, and high-density loads necessitate careful customization of these technologies, but successful pilot programs demonstrate their feasibility and benefits.

Looking forward, the continued urbanization of metropolitan centers and the increasing importance of mass transit solutions position subway systems as prime candidates for next-generation self-healing research. Ongoing studies will likely explore deeper integration with other urban energy systems, broader resilience frameworks that account for climate and security risks, and innovative control architectures balancing automated intelligence with prudent human oversight. As these efforts mature, self-healing subway power systems will not only advance the overall reliability of urban rail transit but will also serve as an exemplary application domain, pushing the boundaries of intelligent control and automation in critical infrastructures worldwide.

3. Specific Challenges Faced by Subway Power Supply Systems

Subway power supply systems present a unique set of technical, operational, and regulatory challenges that distinguish them from traditional power grids or typical distribution networks. These challenges arise not only from the complex topology and confined operating environment in underground rail systems but also from increasingly stringent requirements for reliability, safety, and real-time fault management. Furthermore, the push toward higher levels of automation and intelligence introduces additional layers of complexity and integration needs, including communication standards, multi-agent coordination, and data-driven algorithms.

In this chapter, we discuss three principal categories of challenges, each meriting a dedicated subsection. First, in Section 3.1, we analyze the complex topology and operational constraints that impede straightforward adoption of conventional self-healing solutions. Second, in Section 3.2, we delve into fault diagnosis, isolation, and recovery in real time, focusing on the technological barriers and performance metrics that must be addressed to achieve rapid response. Finally, in Section 3.3, we examine regulatory, safety, and integration barriers with emerging technologies, emphasizing the interplay between standards compliance (e.g., IEC 61850), safety-critical requirements, and the integration of MASs and AI. These three dimensions are interlinked, collectively shaping the reliability, intelligence, and overall feasibility of self-healing subway power supply systems.

3.1. Complexity of Topology and Operational Constraints

Subway power supply systems typically feature intricate power distribution architectures, including multiple substations and complex feeder lines that operate under high load density and limited physical space [62]. This complexity is driven by high passenger demand, the need for continuous operation, and safety requirements such as mandatory power redundancy to allow safe evacuation of passengers in case of a single-point failure [63,64]. These design constraints make it difficult to implement conventional self-healing solutions, which are often designed for more flexible or widely distributed networks. In the following subsections, we deliberately present these technical complexities in simplified form so that they remain accessible to interdisciplinary and non-specialist readers.

3.1.1. Unique Structural Layout and Load Characteristics

Unlike standard distribution networks, subway systems employ ring-like or radial topologies—often in combination—to ensure redundancy. These are designed to provide backup power in the event of a fault, but the limited space in tunnels restricts the addition of extra cables or protective devices. Furthermore, subway systems experience rapid fluctuations in load due to train acceleration, deceleration, and regenerative braking. This variability creates challenges for monitoring and controlling the system in real time. To make this concept clearer, we will now describe the load behavior in simpler terms.

Mathematically, we can describe the power flow through a given feeder segment i using a simplified representation of a DC traction power system (assuming DC electrification for many modern subway systems). If P_i is the power demanded by the train(s) on feeder i and V_i is the operating voltage, then the current I_i is as follows:

I_{i} = \frac{P_{i}}{V_{i}} \qquad (4)

Equation (4) gives the current I_i drawn on feeder i from the power demand P_i and the operating voltage V_i. Because P_i changes continuously as trains move and accelerate, I_i changes with it. Traditional fault detection models assume a stable load, whereas in a subway system the load fluctuates rapidly; detection methods that rely on steady-state signals therefore become less effective for real-time monitoring, and adaptive methods are needed to account for these changes.

Because P_i changes frequently with train movement and acceleration patterns, the power balance must be updated continually in real time, which yields a time-varying load profile:

P_{i}(t) = F\left(\text{position}, \text{acceleration}, \text{passenger load}, \dots\right), \qquad (5)

where F(·) is a function capturing the instantaneous power consumption influenced by operational and environmental factors. Such dynamic load behavior complicates fault detection algorithms that rely on stable or quasi-steady-state load signatures, necessitating advanced, predictive, or adaptive methods.
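Equation (5) leaves F(·) unspecified. One simple physics-based choice, sketched below under stated assumptions, computes tractive force from inertia, gradient, and a Davis-type running resistance A + Bv + Cv², converts it to electrical power through a drive efficiency (negative power represents regenerative braking), and then applies Equation (4) to obtain the feeder current. Passenger load enters through the train mass. The Davis coefficients, efficiency, mass, and 1500 V supply voltage are placeholder values, and V_i is treated as constant.

```python
G = 9.81  # gravitational acceleration, m/s^2


def train_power_w(mass_kg, speed_ms, accel_ms2, grade=0.0,
                  davis=(2000.0, 30.0, 6.0), efficiency=0.85):
    """Electrical power drawn (+) or regenerated (-) by one train.

    Tractive force = inertia + gradient + running resistance A + B*v + C*v^2,
    with grade as rise over run (small-angle approximation). Davis
    coefficients (N, N*s/m, N*s^2/m^2) and efficiency are placeholders.
    """
    a, b, c = davis
    force_n = (mass_kg * accel_ms2 + mass_kg * G * grade
               + a + b * speed_ms + c * speed_ms ** 2)
    mech_w = force_n * speed_ms
    # Motoring draws more than mechanical power; braking returns less.
    return mech_w / efficiency if mech_w >= 0 else mech_w * efficiency


def feeder_current_a(power_w, voltage_v):
    """Equation (4): I_i = P_i / V_i."""
    return power_w / voltage_v
```

For a hypothetical 200 t train at 10 m/s, switching from 1.0 m/s² acceleration to 1.0 m/s² braking flips the feeder current from roughly +1590 A to roughly −1120 A at 1500 V, illustrating the rapid swings that make steady-state fault detection unreliable.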

3.1.2. Space Constraints and Infrastructure Limitations

Owing to dense urban environments, subway substations and power cables often share limited underground space with other utilities. This makes it difficult to install additional equipment, such as redundant cables, circuit breakers, or advanced switchgear, and to deploy or retrofit the sensors and intelligent devices on which real-time fault detection and condition monitoring depend. These infrastructural limitations underscore the importance of designing compact yet reliable control modules.

Moreover, specialized ventilation and cooling requirements, imposed by the underground setting, introduce additional operational constraints. Protective devices must be designed to handle higher ambient temperatures and humidity levels, while also meeting stringent fire and smoke control regulations. Consequently, hardware designed for standard aboveground distribution networks is not always directly applicable to subway environments. In summary, the confined environment of subway systems demands compact, yet highly reliable equipment for efficient self-healing mechanisms.

3.1.3. Operational Demands and Safety Considerations

Safety is of paramount concern in subway operations. Any fault or power interruption could endanger passengers, particularly in tunnels with limited escape routes. Subway operators need systems that can quickly detect faults, isolate them, and restore power to critical systems such as ventilation and lighting. In a typical self-healing system, restoration is ordered by load importance; in subways, safety-critical loads such as lighting and ventilation must always take precedence, even when restoring them delays power to non-essential areas. This places additional demands on the restoration logic, which must act correctly under high-stress emergency conditions.

To meet these heightened safety requirements, self-healing solutions must incorporate advanced features, such as automatic rerouting of power and dynamic load shedding for non-essential systems. The complexity of these solutions, combined with the underground operational constraints, highlights the need for more refined, adaptive control strategies. Based on this, Table 2 summarizes eight key aspects of complex topological and operational constraints in power and energy systems, along with a special focus on subway power supply systems [65,66,67]. This table encapsulates the multifaceted nature of operational constraints and network topologies in subway systems compared to broader power and energy frameworks. From the heightened fault tolerance requirements necessary to safeguard human lives, to severe spatial limitations and challenging environmental conditions, subway environments significantly amplify conventional distribution network complexities. These constraints underscore the urgent need for research into compact, resilient, and adaptive technologies, including advanced sensor networks, real-time analytics, and robust communication protocols. Our viewpoint is that developing a holistic approach—one that integrates hardware design, data analytics, and regulatory compliance—will be essential for effectively addressing these challenges in self-healing subway systems.

Below is Table 3, which further expands on the same constraints but from the perspective of potential technological and research-based interventions aimed at overcoming them. This table highlights the range of existing and emerging technological interventions that address the unique constraints of subway power supply systems. While many solutions are at a pilot or early-adoption stage, they collectively represent promising avenues for achieving greater reliability and smarter operations. Crucially, the success of these innovations depends not only on technological feasibility but also on regulatory frameworks, cost-effectiveness, and the availability of skilled personnel. Our view is that a holistic, lifecycle approach—one that spans design, implementation, maintenance, and upgrade—will be the linchpin for ensuring these interventions evolve into robust, standardized solutions tailored to underground rail environments.

3.2. Fault Diagnosis, Isolation, and Recovery in Real-Time

Real-time fault diagnosis, isolation, and system recovery form the technical core of any self-healing power network. In subway power supply systems, rapid fault response is even more imperative because of high safety requirements, the potential for substantial passenger disruption, and the confined nature of subway tunnels. This section examines the distinctive aspects of fault management in subway systems, focusing on the integration of advanced diagnostics, multi-agent coordination, and communication protocols suitable for underground conditions, and walks through the three stages in turn: detection and localization, isolation, and restoration.

3.2.1. High-Speed Fault Detection and Localization

Subway power systems typically rely on a combination of protective relays and local measurement devices (e.g., current transformers and voltage sensors) to identify fault conditions [68,69,70]. Because faults must be caught before they fully propagate through the network, these devices are required to act almost instantly. While conventional protection can already detect faults within milliseconds, subway systems must both detect and respond within a fraction of a power-frequency cycle (less than 20 ms at 50 Hz) to prevent accidents and minimize damage.

Recent advances in signal processing, notably wavelet-based methods, have shown promise in identifying fault signals almost instantly [71,72,73]. These methods detect sudden changes in voltage or current signals. Because wavelet transforms analyze a signal at multiple scales, they can pinpoint abrupt changes in time, while slower variations produce only small coefficients at fine scales. The key challenge is ensuring that these algorithms can handle the variability caused by rapid load changes, which are common in subway systems. Formula (6) defines the wavelet transform used to capture such changes; applying it to the measured signals allows the system to identify fault occurrences almost immediately, thus reducing detection time.

Let I_a(t) be the current signal measured at feeder location a. A wavelet-based algorithm may compute the wavelet coefficients W(τ, s) at scale s > 0 and time shift τ:

W(\tau, s) = \frac{1}{\sqrt{s}} \int_{-\infty}^{\infty} I_{a}(t)\, \psi\!\left(\frac{t - \tau}{s}\right) \mathrm{d}t,

(6)

where ψ is the mother wavelet. Large coefficient magnitudes at fine scales (i.e., in high-frequency bands) mark the fault transient, so the fault can be detected and localized almost instantaneously. Keeping these algorithms robust to variable load levels, measurement noise, and electromagnetic interference remains an ongoing challenge in the subterranean environment; detection thresholds and scales must therefore be designed with the operational context and fluctuating load demands of subway networks in mind.
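A minimal numerical sketch of Eq. (6) follows. It assumes a real-valued Mexican-hat (Ricker) mother wavelet, a single fine scale, and a simple magnitude threshold. The 50 Hz test current, the 100 kHz sampling rate, the 800 A fault component, the scale of 0.1 ms, and the threshold of 1.0 are hypothetical illustration values, not parameters from the cited studies.

```python
import numpy as np

def mexican_hat(u):
    """Real-valued Mexican-hat (Ricker) mother wavelet, unnormalized: (1 - u^2) exp(-u^2 / 2)."""
    return (1.0 - u**2) * np.exp(-0.5 * u**2)

def wavelet_coefficients(signal, dt, scale):
    """Discretize Eq. (6) at every sample instant tau = k * dt:
    W(tau, s) ~= (1 / sqrt(s)) * sum_k I_a(t_k) * psi((t_k - tau) / s) * dt."""
    half_width = int(np.ceil(5.0 * scale / dt))      # psi is negligible for |u| > 5
    offsets = np.arange(-half_width, half_width + 1) * dt
    kernel = mexican_hat(offsets / scale)
    # psi is symmetric, so this convolution equals the correlation in Eq. (6).
    return np.convolve(signal, kernel, mode="same") * dt / np.sqrt(scale)

def first_detection_time(coeffs, dt, threshold):
    """Time of the first coefficient whose magnitude exceeds the threshold, or None."""
    idx = np.flatnonzero(np.abs(coeffs) > threshold)
    return float(idx[0] * dt) if idx.size else None

# Hypothetical test signal: 50 Hz load current (100 A peak) sampled at 100 kHz,
# with an extra 800 A fault component switched in at t = 55 ms (a current peak).
dt = 1e-5
t = np.arange(10_000) * dt
current = 100.0 * np.sin(2 * np.pi * 50 * t)
fault_idx = 5_500
current[fault_idx:] += 800.0 * np.sin(2 * np.pi * 50 * t[fault_idx:])

coeffs = wavelet_coefficients(current, dt, scale=1e-4)   # fine scale: 0.1 ms
t_detect = first_detection_time(coeffs, dt, threshold=1.0)
print(t_detect)
```

At this fine scale, the smooth 50 Hz waveform yields coefficients orders of magnitude below the threshold, while the abrupt jump at fault inception produces a large response, so t_detect lands within a fraction of a millisecond of 55 ms. Because the window in Eq. (6) extends on both sides of τ, the first flagged τ falls slightly before inception; an online version must wait about 5s (0.5 ms here) for the samples after τ, which still keeps detection well inside one 20 ms cycle.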

3.2.2. Isolation Strategies in Constrained Environments

Once a fault is detected, it must be isolated quickly to prevent further damage. In subway systems, space constraints make it difficult to deploy additional circuit breakers, so isolation must make the best use of the switching devices already installed. Instead of relying on a central controller, a MAS-based scheme uses a network of intelligent devices that communicate with each other to identify and isolate the faulted section.

Concretely, MASs allow distributed relays or intelligent electronic devices (IEDs) to communicate and coordinate their actions [74,75,76]. When a fault is detected, these agents negotiate which segment should be isolated, balancing safety, load requirements, and operational constraints. Such decision-making can be modeled as an optimization problem under real-time constraints, often solved through heuristics or simplified linear programming so that a solution is computed fast enough for practical deployment.
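One simple heuristic of this kind is sketched below. It assumes a radially fed section in which each agent reports whether its IED saw fault current, and it replaces the full optimization with the classic rule that the fault lies just downstream of the last switch that carried fault current. The agent class, switch names, and topology are hypothetical; sections fed from both ends (common in DC traction) would need the rule applied from each source.

```python
from dataclasses import dataclass

@dataclass
class SwitchAgent:
    """Hypothetical agent attached to one sectionalizing switch or breaker IED."""
    name: str
    saw_fault_current: bool   # local overcurrent flag shared with neighbouring agents

def switches_to_open(agents):
    """Agents are ordered from the supplying substation outward along a radially fed section.
    Fault current passes through every switch between the source and the fault, so the
    faulted segment lies between the last agent that saw fault current and the next agent."""
    last_flagged = None
    for idx, agent in enumerate(agents):
        if not agent.saw_fault_current:
            break
        last_flagged = idx
    if last_flagged is None:
        return []   # fault is upstream of the first switch; substation protection handles it
    to_open = [agents[last_flagged].name]
    if last_flagged + 1 < len(agents):
        to_open.append(agents[last_flagged + 1].name)
    return to_open

feeder = [SwitchAgent("S1", True), SwitchAgent("S2", True),
          SwitchAgent("S3", False), SwitchAgent("S4", False)]
print(switches_to_open(feeder))   # ['S2', 'S3']
```

Opening S2 and S3 removes only the segment between them, leaving the upstream segments energized and the downstream segment available for restoration from an alternate feed.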

3.2.3. Rapid Service Restoration and Self-Healing Techniques

After isolation, the objective is to restore service to as many segments as possible while keeping critical loads operational. Self-healing mechanisms may use alternative feed paths, either from a different substation or from available backup power sources; in DC traction systems, the supply can be reconfigured from one substation to another when multiple substations feed overlapping regions. The process must prioritize safety-critical systems and avoid overloading other sections of the network. MASs can coordinate this process by letting agents communicate and adjust their actions based on real-time conditions, which helps minimize downtime and prevent secondary faults during restoration. Key considerations for restoration strategies include the following:

(1): Prioritization of Essential Loads: Station lighting, ventilation fans, and communication systems typically take precedence.
(2): Gradual Re-Energization: Inrush currents from multiple loads can lead to secondary faults if not controlled.
(3): Adaptive Coordination: Agents update each other on the status of circuit breakers and load demands, recalculating the optimal restoration path dynamically.
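The three considerations above can be combined in a minimal greedy planner, sketched below under stated assumptions. Loads are visited in priority order, a load is restored only if the alternate feed still has spare capacity for it, and switching steps are staggered by a fixed delay to limit simultaneous inrush. The load names, priorities, demands, 1000 kW capacity, and 2 s step delay are hypothetical; in an MAS, the capacity figure would come from the agents' updated status reports rather than a constant.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Load:
    name: str
    priority: int        # 1 = safety-critical (lighting, ventilation, communications)
    demand_kw: float

def plan_restoration(loads, spare_capacity_kw, step_delay_s=2.0):
    """Greedy restoration plan on an alternate feed.
    Loads are visited in priority order (ties keep their input order); a load is
    restored only if the alternate feed still has spare capacity for it, and each
    switching step is delayed by step_delay_s to stagger inrush currents."""
    plan, remaining_kw = [], spare_capacity_kw
    for load in sorted(loads, key=lambda item: item.priority):
        if load.demand_kw <= remaining_kw:
            plan.append((len(plan) * step_delay_s, load.name))
            remaining_kw -= load.demand_kw
    return plan, remaining_kw

loads = [
    Load("escalators", 3, 300.0),
    Load("tunnel_ventilation", 1, 400.0),
    Load("station_lighting", 1, 150.0),
    Load("communications", 1, 50.0),
    Load("retail", 4, 200.0),
    Load("chillers", 2, 500.0),
]
plan, headroom = plan_restoration(loads, spare_capacity_kw=1_000.0)
for delay_s, name in plan:
    print(f"t+{delay_s:.0f} s: close feeder to {name}")
print(f"remaining headroom: {headroom:.0f} kW")
```

Here the three safety-critical loads are restored first, the chillers (priority 2) are skipped because 500 kW exceeds the 400 kW left, and the escalators still fit, leaving 100 kW of headroom. Whether a lower-priority load should be picked up when a higher-priority one does not fit is a policy choice that a real scheme would need to make explicitly.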

Fault-tolerant communication remains pivotal here, as real-time data exchange among IEDs, substation controllers, and train control centers is critical for coordinated restoration. Based on the above, below is Table 4, describing the key dimensions of real-time fault management and how they compare between general power systems and subway-specific applications. This table clarifies how fault management considerations in subways differ from those in broader power systems. While the need for speedy detection and isolation exists universally, the stakes in a subway environment are amplified by passenger safety and the confined space. Traditional solutions must often be miniaturized, accelerated, or re-engineered for subterranean use. Our viewpoint is that a concerted push toward distributed, intelligent solutions that integrate seamlessly with robust communication frameworks is vital. Given the urgency and operational constraints, these solutions should also incorporate redundancy in both hardware and decision-making processes.

Further, below is Table 5, focusing on specific technological enablers and approaches that enhance fault detection, isolation, and recovery in real time. This table spotlights the technological solutions that promise to revolutionize real-time fault management in subway power systems. From high-speed relays and advanced signal processing techniques to multi-agent coordination and AI-driven prediction, the options are diverse yet complementary. In our assessment, the challenge lies in harmonizing these approaches into a unified architecture that can meet the stringent safety and reliability benchmarks specific to subway operations. As communications improve and the costs of sensors and computational hardware decline, the feasibility of sophisticated real-time schemes will only increase, reinforcing the need for standardization and robust testing in live subway environments.

3.3. Regulatory, Safety, and Integration Barriers with Emerging Technologies

While technology plays a vital role in self-healing subway power systems, it must align with strict regulatory frameworks and safety standards. Subways are subject to multiple layers of oversight from local governments and safety authorities, making the certification and adoption of new technologies a complex process.

3.3.1. Safety Standards and Compliance Requirements

Subway systems are subject to multiple layers of regulation, typically involving railway authorities, local governments, and international standards bodies. For example, any modifications to power infrastructure may require compliance with IEC 62443 (for industrial communication networks and security), local traction power guidelines, and specialized rail transit codes. IEC 61850, although originally developed for substation automation in power grids, is increasingly recognized for its potential in rail environments, yet it must be adapted to handle traction power specifics and integrated with existing railway safety protocols (e.g., EN 50126/50128/50129 in Europe) [77,78,79].

Attaining certification for newly introduced protective devices or software modules can take years, owing to the rigorous testing processes mandated for passenger safety. This elongated timeframe impacts the agility with which subway operators can adopt emerging self-healing technologies and necessitates thorough planning from the inception of any R&D initiative.

3.3.2. Interoperability and Integration with Legacy Systems

Many subway power systems were installed decades ago and lack the modern communication interfaces required to support new technologies. Integrating these systems with MAS and AI solutions requires overcoming significant challenges such as protocol mismatches and hardware limitations. One solution is to use gateways and middleware to bridge the gap between old and new technologies, allowing legacy systems to communicate with modern devices. This approach can help transition subway systems to self-healing architectures without needing a complete overhaul. The major hurdles include the following [80,81,82]:

(1): Data Format Incompatibility: Legacy devices may not generate standardized digital outputs necessary for AI-based analysis.
(2): Protocol Mismatch: Communication standards, such as IEC 61850, must be layered on top of older SCADA systems or even analog control signals.
(3): Hardware Limitations: Legacy switchgear may lack the control interfaces to enable external agent-based decisions or real-time reconfiguration.

Integrating modern automation and intelligent control into legacy subway power infrastructure is hampered by outdated equipment and communication protocols. Mbango (2009) [80] highlights the challenges of retrofitting SCADA-based legacy systems with modern communication standards such as IEC 61850, emphasizing compatibility issues with aging switchgear and transformers. Dutta Pramanik and Upadhyaya (2025) [81] explore how advanced IoT solutions, including motorized actuators and standardized communication protocols, can be layered onto older grid systems to bridge protocol mismatches, ensure interoperability, and mitigate vendor lock-in. Their study also underscores the need to update legacy data formats and communication systems before AI-driven and MAS applications can be deployed, because the digital outputs and real-time decision-making these applications require often exceed what older infrastructure can provide. Together, these studies analyze the technical and financial challenges of modernizing legacy subway power networks while maintaining reliability and efficiency.

As a result, achieving a unified self-healing architecture often involves partial overhauls or staged deployments, which complicate operational continuity and budget planning. Figure 3 illustrates the process of integrating advanced MAS or AI solutions into aging subway power infrastructures. The flowchart highlights key challenges, such as data format incompatibility, protocol mismatch, and hardware constraints, and proposes a phased approach that achieves a unified self-healing architecture while preserving operational continuity. The steps shown in Figure 3 are summarized as follows.

1. Identify Existing Legacy Devices

A thorough survey of legacy switchgear and control equipment is conducted to determine their current functionality, communication interfaces, and overall compatibility with modern data acquisition and control standards.

2. Evaluate Key Constraints

(1): Data Format Incompatibility: Older devices may provide analog or proprietary digital signals, which necessitate specialized conversion or encapsulation.
(2): Protocol Mismatch: Historic SCADA platforms or purely analog signaling can diverge significantly from contemporary standards like IEC 61850.
(3): Hardware Limitations: Legacy switchgear often lacks the necessary interfaces for remote actuation or real-time reconfiguration, hindering direct MAS or AI control.

3. Data and Protocol Adaptation

(1): Protocol Gateways/Bridges: Gateways facilitate communication between legacy systems and modern platforms without requiring a wholesale replacement.
(2): IEC 61850 Encapsulation: Wrapping legacy SCADA or analog signals in IEC 61850-compliant structures enables standardized management and interoperability.
(3): Middleware for Data Format Conversion: Dedicated software tools unify disparate data formats, facilitating seamless integration into MAS or AI analytics.

4. Hardware Retrofits and Expansions

(1): Upgraded Control Interfaces: Introducing new control boards or modules into legacy switchgear equips these devices with real-time monitoring and remote operation capabilities.
(2): Additional Real-Time Monitoring Modules: Enhancing measurement accuracy and granularity via sensors or digital metering units provides critical data for AI-driven decision-making.
(3): Partial Preservation of Analog Devices: When a full replacement is not immediately feasible, integrating digital solutions alongside retained analog components ensures a gradual transition.

5. Formulate Phased Deployment

(1): Prioritize Critical Nodes: Target the most failure-prone or operationally significant components for early-stage retrofitting.
(2): Assess Feasible Investment and Downtime: Balance the need for system reliability with available funding and permissible service interruptions.
(3): Define a Technological Evolution Path: Implement an overarching plan that anticipates future standards and protects long-term compatibility.

6. Implementation and Testing

(1): Incremental Equipment Replacement/Installation: Execute hardware upgrades and software integration in a series of controlled deployments to minimize risk.
(2): Protocol and Data Interface Validation: Conduct rigorous testing of gateways, interfaces, and data conversion processes to ensure coherence and reliability.
(3): MAS/AI-Integrated Testing: Validate the interaction between upgraded devices and AI-driven control systems, confirming that self-healing mechanisms function effectively under real-world conditions.

7. Unified Self-Healing Architecture Online

Upon successful validation, the modernized system with integrated MAS/AI solutions is commissioned, enabling comprehensive automated fault detection, isolation, and service restoration.

8. Ongoing Optimization and Maintenance

(1): Technological Upgrades: Continuously refine the integrated system in response to emerging digital standards and novel AI algorithms.
(2): Scheduled Device Renewal: Replace aging assets as part of routine maintenance, gradually increasing the proportion of modern, digitally enabled equipment.
(3): Adoption of Emerging Standards: Align future developments with evolving industry protocols to maintain long-term interoperability and performance excellence.
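Step 3 of the procedure above relies on middleware for data format conversion. As a minimal sketch, assuming a legacy remote terminal unit that reports feeder current as a 4–20 mA analog channel, the function below scales the raw reading into engineering units and attaches an IEC 61850-style object reference, a quality flag, and a timestamp. The channel name, reference string, and measuring range are hypothetical, and the record is a simplified stand-in rather than an actual IEC 61850 data model or MMS/GOOSE encoding.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Measurement:
    reference: str        # IEC 61850-style object reference (illustrative only)
    value: float          # engineering units (A)
    quality: str          # "good" or "invalid" (simplified quality flag)
    timestamp: datetime

# Hypothetical mapping: legacy channel -> (object reference, range low, range high)
LEGACY_MAP = {
    "AI_07": ("SUB1_FDR3/MMXU1.A.phsA.cVal.mag.f", 0.0, 2000.0),
}

def convert_legacy_reading(channel, raw_ma, timestamp=None):
    """Convert a 4-20 mA legacy reading into a normalized measurement record."""
    reference, low, high = LEGACY_MAP[channel]
    ts = timestamp or datetime.now(timezone.utc)
    if not 4.0 <= raw_ma <= 20.0:
        # An out-of-range loop current may indicate a wiring or transducer fault.
        return Measurement(reference, math.nan, "invalid", ts)
    value = low + (raw_ma - 4.0) / 16.0 * (high - low)
    return Measurement(reference, value, "good", ts)

reading = convert_legacy_reading("AI_07", 12.0)
print(reading.reference, reading.value, reading.quality)
# SUB1_FDR3/MMXU1.A.phsA.cVal.mag.f 1000.0 good
```

Keeping the mapping in a table rather than in code lets operators add channels during a staged retrofit without touching the conversion logic, and the explicit quality flag lets downstream MAS or AI analytics discard readings from faulty legacy transducers.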

As demonstrated in Figure 3, this phased methodology ensures the progressive transformation of legacy subway power systems into fully integrated, self-healing networks. By systematically identifying key technological constraints, implementing both hardware and software retrofits, and employing protocol adaptation strategies, stakeholders can maintain robust operational continuity while incrementally modernizing their infrastructures. Through the careful prioritization of critical nodes and the adoption of specialized tools for data conversion, legacy devices can be seamlessly incorporated into cutting-edge MAS or AI frameworks. The outcome is a resilient power network characterized by intelligent fault management, real-time monitoring, and long-term adaptability to new technological standards.
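Step 5 of the Figure 3 procedure (prioritizing critical nodes within an investment budget) can be illustrated with a simple ranking heuristic. The sketch below assumes that each candidate node has a historical failure rate, a criticality weight, and a retrofit cost, ranks nodes by expected risk (failure rate × criticality) per unit cost, and fills the first deployment phase greedily. All names and figures are hypothetical, and a greedy ranking is only an approximation to the underlying budget-constrained selection problem.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    name: str
    failures_per_year: float   # hypothetical historical failure rate
    criticality: float         # hypothetical weight, e.g. 1 (auxiliary) to 5 (traction feeder)
    retrofit_cost: float       # hypothetical cost in arbitrary currency units

def first_phase(nodes, budget):
    """Greedy selection by risk (failure rate x criticality) per unit cost, within budget."""
    ranked = sorted(nodes,
                    key=lambda n: n.failures_per_year * n.criticality / n.retrofit_cost,
                    reverse=True)
    chosen, spent = [], 0.0
    for node in ranked:
        if spent + node.retrofit_cost <= budget:
            chosen.append(node.name)
            spent += node.retrofit_cost
    return chosen, spent

candidates = [
    Node("traction_sub_1", 2.0, 5.0, 100.0),
    Node("traction_sub_2", 0.5, 5.0, 100.0),
    Node("aux_sub_3", 3.0, 2.0, 40.0),
    Node("ring_main_unit_4", 1.0, 3.0, 25.0),
]
chosen, spent = first_phase(candidates, budget=200.0)
print(chosen, spent)   # ['aux_sub_3', 'ring_main_unit_4', 'traction_sub_1'] 165.0
```

Nodes left out of the first phase (here traction_sub_2) roll over into later phases, which matches the incremental replacement described in step 6.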

3.3.3. Balancing Innovation, Cost, and Public Acceptability

Public transportation authorities must balance the costs of upgrading infrastructure against the need to keep service affordable. The cost of implementing self-healing systems, together with the specialized training and software licenses required, can be a barrier to adoption. Demonstrating a clear return on investment (ROI), usually through reductions in service interruptions and associated penalties, is often critical to securing funding. Public acceptance is also a prerequisite for system innovation, since substantial operational modifications may raise concerns over service reliability and rising costs.

From a strategic perspective, smaller pilot projects allow operators to gather performance data and build confidence among stakeholders before full-scale implementation. However, scaling from pilot to system-wide deployment introduces additional complexities, underscoring the need for stable, well-documented, and standardized solutions.

Below is Table 6, summarizing major regulatory, safety, and integration issues, contrasting their treatment in general power systems versus specialized subway networks. This table highlights the interplay between stringent regulatory environments, public safety imperatives, and the integration of legacy systems that typify subway power networks. Unlike typical power utilities, where incremental modernization is possible with relatively less public scrutiny, subway systems face direct accountability to commuters and municipal governments. Consequently, adopting a self-healing paradigm requires a thorough demonstration of reliability and compliance. Our perspective is that the regulatory dimension should not be viewed merely as a constraint but as an essential guideline to ensure passenger well-being and system robustness. Collaborative efforts between standardization bodies and railway authorities, coupled with well-defined pilot projects, can help accelerate the adoption of advanced technologies.

Further, below is Table 7, focusing on specific measures, strategies, and research directions that can alleviate regulatory, safety, and integration bottlenecks. In this table, we map out a variety of strategic pathways by which regulatory, safety, and technological barriers can be mitigated. From unified standardization to pilot sandbox environments and modular retrofitting approaches, there are numerous tactics available to ease the transition toward self-healing architectures in subway systems. Our viewpoint underscores the importance of synergy between technology developers, railway authorities, and public stakeholders. The potential payoffs—enhanced passenger safety, improved operational efficiency, and a more resilient transit system—amply justify the investment and effort required to navigate these constraints.

Overall, this chapter has surveyed the multifaceted challenges unique to subway power supply systems. In Section 3.1, we covered the intricate topological constraints and the difficulty of operating within confined underground environments. Section 3.2 then examined the technical hurdles in implementing real-time fault diagnosis, isolation, and self-healing restoration, emphasizing the importance of speed, coordination, and robust communication. Finally, Section 3.3 analyzed the broader regulatory, safety, and integration issues that govern how new technologies can be adopted and scaled within subway networks. These three sections collectively highlight that any self-healing strategy for subway power supply systems must be holistically designed—encompassing engineering solutions, operational practices, and regulatory frameworks—to meet the stringent reliability and safety demands of modern urban rail transit.

3.4. Advancing Fault Management and Self-Healing Capabilities in Subway Power Supply Systems

In this section, we delve into the key technological innovations that are reshaping fault management and self-healing capabilities within subway power supply systems. As subway systems face increasing operational demands and stringent safety requirements, the need for more intelligent, adaptive, and efficient fault management systems has never been more critical. By leveraging emerging technologies such as MASs, AI-based algorithms, and real-time data analytics, subway power networks can achieve faster fault detection, isolation, and recovery, ultimately enhancing both operational reliability and passenger safety.

As presented in Table 8, the comparison between traditional self-healing techniques and the proposed MAS-based strategy highlights the distinct advantages of adopting AI-driven solutions in managing faults within the complex and confined environments of subway systems. This table presents a detailed comparison, illustrating how the MAS-based approach addresses critical challenges, such as fault detection speed, real-time adaptability, and recovery efficiency, that conventional methods struggle to overcome.

1. Key Advantages

The comparison in Table 8 highlights the significant advancements that MAS-based self-healing systems offer over traditional techniques. By leveraging AI-based fault detection and distributed decision-making, MASs offer more precise, rapid, and adaptive fault management compared to conventional methods that rely on static algorithms and manual intervention [83,84,85]. Key advantages include the following:

(1): Improved Detection Speed: MAS-based systems are capable of detecting faults in near real time, a critical feature for high-speed subway systems where rapid fault detection can minimize service disruptions and enhance passenger safety.
(2): Increased Flexibility: Unlike traditional systems that follow fixed algorithms, MASs adapt to real-time network conditions, allowing for dynamic fault isolation and recovery strategies. This is especially important in complex subway topologies, where conventional systems struggle to handle intricate network configurations.
(3): Enhanced Recovery Efficiency: MASs can reconfigure power distribution networks on the fly, ensuring that critical loads like lighting and ventilation are restored first, which is crucial for subway systems where passenger safety is paramount.
(4): Space and Maintenance Benefits: Traditional systems require significant hardware installations, which can be difficult in the confined underground spaces of subway systems. MASs, by contrast, use distributed sensors and intelligent agents, reducing the need for additional hardware and simplifying system maintenance.

2. Future Outlook and Research Directions

The adoption of MAS-based self-healing systems in subway power supply networks is a promising avenue for overcoming the unique challenges faced by these systems. However, several hurdles remain:

(1): Integration with Legacy Systems: Integrating MASs with older subway power infrastructures presents significant challenges due to outdated communication protocols and hardware limitations. Future research should focus on developing seamless integration frameworks that allow for gradual modernization of legacy systems without major disruptions to existing operations.
(2): Regulatory Challenges: The deployment of MAS-based systems in subway networks requires updates to regulatory frameworks, especially in terms of safety certifications and standards compliance. Ongoing collaboration between AI experts, regulatory bodies, and transit authorities will be essential to harmonize new technologies with existing safety protocols.
(3): Cost and Implementation Feasibility: While MASs offer significant benefits, the initial cost of implementation may be prohibitive for many subway systems, especially those in less economically developed regions. Future research should focus on developing cost-effective solutions that make MAS adoption more accessible, including low-cost sensors and cloud-based processing frameworks.
(4): Real-Time Data Processing: Edge computing and machine learning algorithms will play a critical role in processing the massive amounts of data generated by subway power supply systems [86,87,88,89,90]. Future advancements in these technologies will be crucial for achieving the real-time decision-making required for effective self-healing.

Overall, this section has examined the technological innovations that underpin self-healing subway power supply systems, with a focus on the MAS-based strategy. A comparison with traditional systems underscores the substantial benefits of MASs in terms of fault detection, isolation, recovery efficiency, and system scalability. However, challenges related to legacy system integration, regulatory compliance, and cost remain, and further research is necessary to address them. As the technology matures, MASs are likely to become an increasingly integral component of resilient and adaptive power networks, offering enhanced reliability and safety for subway operations.

4. The Integration of MASs and the IEC 61850 Standard into Subway Power Systems

Before delving into the detailed discussions in this chapter, it is crucial to establish a coherent overview of how each subsection will contribute to our central theme: integrating MASs with the IEC 61850 standard to achieve enhanced self-healing capabilities in subway power networks. Section 4.1 lays the theoretical and conceptual groundwork by examining the principles, operational frameworks, and core algorithms underlying MAS-based self-healing approaches, thereby clarifying the motivations for distributing intelligence and control across multiple agents in complex subway power infrastructures. Section 4.2 shifts the focus to the IEC 61850 standard itself, detailing its data modeling techniques, communication protocols, and engineering methodologies. This portion provides clarity on why IEC 61850 is pivotal for creating standardized data structures and high-speed communication channels in modern traction power systems. Finally, Section 4.3 synthesizes the findings of the previous two sections by proposing a convergent MAS–IEC 61850 architecture, highlighting how the synergy between a distributed agent framework and a standardized communication protocol can revolutionize fault detection, isolation, and restoration (FDIR) processes. These three subsections collectively illustrate how MASs and IEC 61850 can be cohesively integrated to bolster both the intelligence and interoperability of next-generation subway power systems.

4.1. MAS-Based Approaches to Self-Healing in Subway Power Systems

MASs have gained prominence as a robust paradigm to address the growing complexities within electric power networks, including distribution grids, transmission infrastructures, and railway/metro traction power systems. In subway power networks, the non-trivial combination of AC (e.g., 25 kV or 35 kV) and DC (often 750 V or 1500 V) segments, along with extensive feeder lines, makes centralized architectures prone to single-point failures, communication bottlenecks, and slow response times. By contrast, MAS-based frameworks distribute intelligence among localized agents that monitor and control subsets of the network. These agents can collaborate with one another to execute real-time fault diagnosis, isolation, and restoration decisions, thereby enabling faster self-healing actions and minimizing disruptions to subway operations.

4.1.1. Conceptual Foundations and Control Philosophies of MASs

From a theoretical standpoint, MASs can be viewed as an ensemble of autonomous entities, termed agents, each responsible for a specific functional or geographical segment of the power system. Agents perceive local states (such as voltage, current, power flows, or device status) and communicate with peer agents (or higher-level controllers) to reach optimized global or semi-global control objectives [91,92,93]. Logenthiran (2012) [91] investigates the application of MASs in distributed power systems, proposing an agent-based real-time management and optimization strategy to enhance system flexibility and autonomy. Dou et al. (2014) [92] develop a decentralized coordinated control method based on MASs that improves the transient stability of large-scale power systems through information exchange and collaboration among agents. Farid (2015) [93] focuses on the design principles of MASs for resilient coordination and control in future power systems, introducing a framework that enables more efficient responses to sudden failures and dynamic load changes. Together, these studies show that MAS technology, through the collaborative work of distributed intelligent agents, can optimize control and enhance the reliability and self-healing capabilities of power systems.

A key advantage lies in the ability of agents to respond locally to disturbances while still coordinating across the network to avoid suboptimal or contradictory actions. Formally, let the network be described by a set of buses B and a set of lines L. Each agent A_i, where i ∈ {1, 2, …, N}, monitors a subset of buses B_i ⊆ B and lines L_i ⊆ L. In a distributed control algorithm, the decision u_i of agent i may be formulated as follows:

u_{i}^{*} = \arg \min_{u_{i} \in U_{i}} J_{i} (u_{i}; x_{i}, x_{- i}),

(7)

where

x_i is the local state vector observed by agent i;
x_−i represents the states observed by neighboring agents or those shared through communication;
J_i is the cost function capturing local objectives (e.g., minimize power loss, ensure safe operation under fault conditions, or maintain voltage within permissible limits);
U_i is the feasible action space for agent i.

Through an iterative or event-triggered communication protocol, each agent refines its control decision u_i, coordinating with adjacent agents until a convergent solution is reached or a deadline for fast self-healing control expires. This iterative process, facilitated by MASs, ensures real-time adaptation to evolving operating conditions in the subway power network.
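To make Equation (7) concrete, the sketch below runs synchronous best-response rounds for a toy problem. It assumes, for illustration only, a scalar action per agent, a quadratic cost J_i = ½q_i u_i² − r_i u_i + k u_i Σ_{j∈N(i)} u_j in which the neighbours' shared values play the role of x_{−i}, and a box feasible set U_i = [lo_i, hi_i]. A round budget stands in for the self-healing deadline; all names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical agent with a scalar action u_i in U_i = [lo, hi]."""
    q: float   # curvature of the local cost (must be > 0)
    r: float   # linear incentive of the local cost
    lo: float  # lower bound of the feasible set U_i
    hi: float  # upper bound of the feasible set U_i


def best_response(agent: Agent, neighbour_sum: float, k: float) -> float:
    """Minimise J_i(u) = 0.5*q*u**2 - r*u + k*u*neighbour_sum over [lo, hi].

    The unconstrained minimiser of this convex quadratic is (r - k*s)/q;
    clipping it to [lo, hi] gives the constrained minimiser.
    """
    u = (agent.r - k * neighbour_sum) / agent.q
    return min(max(u, agent.lo), agent.hi)


def distributed_solve(agents, neighbours, k=0.2, tol=1e-6, max_rounds=100):
    """Synchronous best-response rounds until convergence or the deadline.

    In each round every agent uses only its neighbours' values from the
    previous round, mimicking one exchange of messages.
    Returns (actions, rounds_used, converged).
    """
    u = [0.0] * len(agents)
    for round_no in range(1, max_rounds + 1):
        new_u = [
            best_response(a, sum(u[j] for j in neighbours[i]), k)
            for i, a in enumerate(agents)
        ]
        change = max(abs(new - old) for new, old in zip(new_u, u))
        u = new_u
        if change < tol:
            return u, round_no, True
    return u, max_rounds, False  # deadline reached: keep the latest iterate


# Three agents on a chain 0 - 1 - 2 (hypothetical data).
agents = [Agent(1.0, 1.0, -0.5, 0.5), Agent(1.0, 0.5, -0.5, 0.5), Agent(1.0, -0.2, -0.5, 0.5)]
neighbours = {0: [1], 1: [0, 2], 2: [1]}
actions, rounds, converged = distributed_solve(agents, neighbours)
print(actions, rounds, converged)
```

With k times the largest neighbourhood size smaller than every q_i (here 0.2 × 2 < 1), each round is a contraction, so the iteration converges well before the deadline; with stronger coupling the deadline branch returns the latest, possibly unconverged, iterate.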

4.1.2. MAS-Based Fault Detection and Diagnosis

In the context of subway power systems, MAS-based fault detection methodologies frequently combine local sensor data with higher-level decision-making processes. Each agent employs local measurements—such as overcurrent readings, voltage dips, traveling-wave signals, or negative sequence components—to detect anomalies. When a local agent suspects a fault, it initiates a distributed consensus mechanism to confirm that the disturbance is genuine and not merely a sensor malfunction or transient event. A simple but illustrative approach is shown below:

\alpha_{i}(t) = \begin{cases} 1, & \text{if the local measurement indicates a fault at time } t, \\ 0, & \text{otherwise.} \end{cases}

(8)

By exchanging their α values, the agents can collectively raise a system-wide fault alarm when a sufficient fraction of the neighbors j of agent i also report α_j(t) = 1. This distributed confirmation reduces false positives caused by an isolated sensor error and removes the reliance on a single central controller. Once a fault is confirmed, specialized diagnostic agents apply advanced signal processing or pattern recognition algorithms to classify the fault type (e.g., single-phase-to-ground, phase-to-phase short circuit, or DC traction line grounding fault) and to localize the faulted segment within the network’s geographic or topological map.
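The neighbourhood confirmation built on Equation (8) can be sketched as a simple vote. The threshold `min_fraction = 0.5` and the requirement that the agent's own flag be set are assumptions; the text only asks for "a sufficient fraction" of neighbours. Agent identifiers and the topology are hypothetical.

```python
from typing import Dict, List


def confirm_fault(alpha: Dict[int, int], neighbours: Dict[int, List[int]],
                  agent: int, min_fraction: float = 0.5) -> bool:
    """Return True if `agent` flags a fault and enough neighbours agree.

    alpha: agent id -> 0/1 flag from Equation (8) at time t.
    neighbours: agent id -> list of neighbouring agent ids.
    """
    if alpha[agent] != 1:
        return False
    peers = neighbours[agent]
    if not peers:
        return False  # nobody to corroborate; defer to other mechanisms
    agreeing = sum(alpha[j] for j in peers)
    return agreeing / len(peers) >= min_fraction


def system_alarm(alpha, neighbours, min_fraction=0.5):
    """Agents whose local suspicion is corroborated by their neighbourhood."""
    return sorted(a for a in alpha if confirm_fault(alpha, neighbours, a, min_fraction))


# A genuine fault near agents 1-2 versus a single noisy sensor at agent 4.
alpha = {0: 0, 1: 1, 2: 1, 3: 0, 4: 1}
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(system_alarm(alpha, neighbours))  # [1, 2]
```

Agent 4's isolated alarm is not corroborated by its only neighbour, so it does not trigger the system-wide alarm, which is the false-positive filtering the paragraph describes.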

4.1.3. MAS-Based Fault Isolation and Restoration

Fault isolation and restoration processes in MASs rely on cooperative control actions among protective devices such as breakers, switches, and reclosers. In a typical subway distribution scenario, the primary objective is to de-energize only the faulted section while maintaining power supply to all unaffected sections. The MAS approach can be summarized by the following pseudo-equations. Suppose that an agent controlling a switch SW_j needs to decide whether to open or close the switch after a fault is detected and located:

\text{Open}(SW_{j}) = \begin{cases} 1, & \text{if } \Delta P_{loss} - \Delta P_{risk} < \lambda, \\ 0, & \text{otherwise.} \end{cases}

(9)

Here, ΔP_loss represents the power that would be disconnected if the switch is opened, ΔP_risk quantifies the operational or safety risk of leaving the switch closed, and λ is a threshold that the agent dynamically adjusts based on the system’s real-time operational context (e.g., passenger load demands or train frequency). Once the faulted section is isolated, restoration agents reconfigure network topology to reroute power through alternative paths or feeder lines. Each agent reevaluates the line capacities, voltage levels, and breaker statuses, ensuring that the newly configured network operates within acceptable thermal limits.
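Equation (9) translates directly into a per-switch rule. The sketch below applies it to the switches around a located fault. It assumes that ΔP_loss, ΔP_risk, and λ are expressed in one common unit (kW here) so they can be compared; the switch names and values are hypothetical, and how λ should track passenger load or train frequency is left open, as in the text.

```python
from typing import Dict, Tuple


def open_switch(dp_loss_kw: float, dp_risk_kw: float, lam_kw: float) -> int:
    """Equation (9): return 1 (open) if dP_loss - dP_risk < lambda, else 0.

    All three quantities are assumed to share one unit (kW) so that they
    can be compared directly.
    """
    return 1 if dp_loss_kw - dp_risk_kw < lam_kw else 0


def isolation_plan(candidates: Dict[str, Tuple[float, float]],
                   lam_kw: float) -> Dict[str, int]:
    """Apply the rule to every switch adjacent to the located fault."""
    return {sw: open_switch(loss, risk, lam_kw)
            for sw, (loss, risk) in candidates.items()}


# Hypothetical (dP_loss, dP_risk) pairs for three switches around a fault.
candidates = {"SW1": (200.0, 900.0), "SW2": (1500.0, 300.0), "SW3": (400.0, 350.0)}
print(isolation_plan(candidates, lam_kw=100.0))  # {'SW1': 1, 'SW2': 0, 'SW3': 1}
```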

4.1.4. Evaluation of MASs in Subway Environments

Empirical studies and field trials suggest that MAS-based approaches reduce fault-clearing times and curtail the scope of outages in urban rail power networks. Moreover, their distributed nature inherently accommodates incremental system expansions. However, challenges remain, notably the standardization of agent communication protocols and the design of robust agent negotiation schemes that handle complex couplings between AC traction, DC traction, and station loads. These gaps emphasize the importance of coupling MASs with standardized frameworks such as IEC 61850, which we examine more thoroughly in subsequent sections.

Based on the above, Table 9 provides a comparative overview of key application areas where MAS approaches can significantly enhance the self-healing capabilities of subway power systems [94,95,96]. The reviewed literature highlights the integration of MASs into the optimization and control of power systems, particularly for self-healing and fault management. Herrera et al. (2020) [94] provide a comprehensive review of MAS applications in complex networks, emphasizing their potential for resilient design and self-healing, which can significantly improve fault isolation and service restoration. Sharifi (2015) [95] focuses on energy-aware service provisioning in peer-to-peer cloud ecosystems, discussing how MASs can optimize energy flow and enhance system reconfiguration. Irfan et al. (2017) [96] examine the role of MASs in smart grid control, stressing their contribution to predictive maintenance, fault detection, and self-healing, and thus to overall system reliability. Collectively, these works underscore the strategic importance of MASs for fault management and resilience in modern power networks, particularly in urban infrastructure such as subway power systems. Drawing on them, Table 9 summarizes potential growth trends, technological barriers, and strategic importance across ten dimensions. It indicates that fault isolation, restoration, and system reconfiguration hold the most immediate promise for broad adoption, while predictive maintenance, microgrid integration, and power quality monitoring remain underexplored but offer high potential for future research and implementation. In our view, a collaborative, standardized approach, supported by robust communication protocols, will be crucial for moving MAS-based solutions from pilot demonstrations to large-scale deployments.

Table 9 thus compares the applications of MASs in subway power systems across scenarios such as fault diagnostics, system reconfiguration, and energy management, highlighting adoption levels, challenges, and future directions. Building on it, Table 10 examines emergent research directions for MASs in subway power systems, covering scalable architectures, agent-based security, big data integration, and different hierarchical designs. A salient point from Table 10 is the growing interest in hybrid hierarchical–distributed frameworks and integrated intrusion detection solutions that enhance both system reliability and cybersecurity. In our assessment, real-time simulation and hardware-in-the-loop testing remain critical for validating new MAS concepts in an environment that faithfully reflects the complexities of day-to-day subway operations.

Overall, this section shows that MASs offer a potent framework for distributed self-healing control in subway power systems. To unlock their full potential, however, they must be reinforced by standardized communication infrastructures, particularly those specified in IEC 61850. The next section examines how IEC 61850 facilitates high-speed, interoperable, and secure data exchange across a broad spectrum of power system devices, thereby complementing and enhancing MAS-based strategies.

4.2. Implementation of the IEC 61850 Standard in Subway Power Systems

The IEC 61850 standard was originally devised for substation automation, enabling standardized object models, data structures, and communication protocols that support interoperability and vendor-neutral engineering [97,98,99,100,101]. Over the past decade, it has been extended to encompass distribution automation, renewable energy integration, and even railway electrification contexts. In subway power systems, the standard addresses complex challenges such as seamless multi-vendor integration, real-time protective relaying, and advanced automation functionalities essential for self-healing. This section thoroughly explores the protocols, models, and engineering techniques that make IEC 61850 a cornerstone for modernizing subway power networks.

4.2.1. Foundations of IEC 61850 and Its Relevance to Subway Networks

IEC 61850 provides a comprehensive approach encompassing data modeling, communication stacks, and configuration languages [102,103]. Its fundamental building blocks include the following:

(1): Logical Nodes (LNs): Abstract representations of power system functions (e.g., measurement, protection, and control).
(2): Data Objects and Data Attributes: Structured to capture various aspects of a function’s state, measurement readings, and control parameters.
(3): Communication Services: Such as GOOSE (Generic Object Oriented Substation Event) for high-speed event transfer, and MMS (Manufacturing Message Specification) for client–server communications.
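The LN/DO/DA hierarchy above can be pictured as nested names. The sketch below resolves object references of the form `LDevice/LN.DO.DA`. The logical-device name `TS01PROT`, the instance numbers, and the values are hypothetical, and the functional constraint (e.g., ST or MX) is omitted for brevity. The LN classes `PTOC` (time overcurrent protection), `XCBR` (circuit breaker), and `MMXU` (measurement), with their data objects and attributes `Op.general`, `Pos.stVal`, and `TotW.mag.f`, follow the standard naming.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

Value = Union[bool, float, str]


@dataclass
class LogicalNode:
    """A logical node instance: optional prefix + LN class + instance number."""
    ln_class: str
    inst: str
    prefix: str = ""
    # Data object name -> {data attribute path -> value}
    data: Dict[str, Dict[str, Value]] = field(default_factory=dict)

    @property
    def name(self) -> str:
        return f"{self.prefix}{self.ln_class}{self.inst}"


@dataclass
class LogicalDevice:
    name: str
    nodes: Dict[str, LogicalNode] = field(default_factory=dict)

    def add(self, ln: LogicalNode) -> None:
        self.nodes[ln.name] = ln

    def read(self, reference: str) -> Value:
        """Resolve 'LD/LN.DO.DA[.DA...]' (functional constraint omitted)."""
        ld_name, path = reference.split("/", 1)
        if ld_name != self.name:
            raise KeyError(f"unknown logical device {ld_name}")
        ln_name, do_name, da_path = path.split(".", 2)
        return self.nodes[ln_name].data[do_name][da_path]


ld = LogicalDevice("TS01PROT")  # hypothetical traction-substation relay
ld.add(LogicalNode("PTOC", "1", data={"Op": {"general": False}}))
ld.add(LogicalNode("XCBR", "1", data={"Pos": {"stVal": "on"}}))
ld.add(LogicalNode("MMXU", "1", data={"TotW": {"mag.f": 1830.5}}))

print(ld.read("TS01PROT/MMXU1.TotW.mag.f"))  # 1830.5
```

Because every device exposes the same names for the same functions, an agent written against `PTOC1.Op.general` works with any vendor's relay that implements the standard model, which is the interoperability argument made above.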

For subway systems, the dual AC/DC supply lines, the presence of intricate protective schemes (like distance protection for AC lines, and overcurrent or undervoltage protection for DC traction feeders), and the need for reliable real-time data exchange across stations render IEC 61850 uniquely valuable. In particular, the GOOSE mechanism allows for “peer-to-peer” communication [104,105]. This ensures that protective relays and control IEDs (Intelligent Electronic Devices) can exchange trip signals, blocking commands, or reclose instructions with minimal latency—a critical requirement when trains must be continuously powered and passenger safety is paramount.

4.2.2. IEC 61850 Network Redundancy and Communication Protocols

Reliability is a non-negotiable criterion in traction power systems. IEC 61850 accommodates various redundancy protocols to ensure minimal downtime in the event of network faults:

(1): Parallel Redundancy Protocol (PRP) sends duplicate packets over independent LANs, eliminating single points of failure.
(2): High-availability Seamless Redundancy (HSR) adopts a ring topology in which the source node sends each frame in both directions around the ring, so a single link interruption causes no recovery delay.
(3): Rapid Spanning Tree Protocol (RSTP) [106,107,108] provides a loop-free topology but may involve small reconvergence delays.

In large-scale subway systems, a combination of PRP and HSR is often deemed optimal for process-level communications to guarantee near-instant failover. Stations, wayside cabinets, and centralized operation control centers thus rely on robust ring or mesh topologies that incorporate specialized switches supporting IEC 61850 traffic priorities. Latency, jitter, and packet-loss thresholds must be carefully specified to accommodate the stringency of traction power automation.
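The seamless failover of PRP comes from the receiver side: each frame travels over LAN A and LAN B carrying the same source and sequence number, and the receiver passes up the first copy and discards the second. The sketch below is a simplified model of that duplicate discard (a bounded per-source history, no sequence-number wraparound); the duplicate-discard algorithm specified in IEC 62439-3 differs in detail, and the device names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict


class DuplicateDiscard:
    """Simplified PRP-style receiver that keeps the first copy of each frame.

    A frame is identified by (source, sequence number). Each source keeps a
    bounded history so memory stays constant; wraparound is ignored here.
    """

    def __init__(self, history: int = 64):
        self.history = history
        self.seen: Dict[str, Deque[int]] = {}

    def accept(self, src: str, seq: int) -> bool:
        recent = self.seen.setdefault(src, deque(maxlen=self.history))
        if seq in recent:
            return False  # duplicate arriving from the other LAN: drop it
        recent.append(seq)
        return True  # first copy: pass it to the application


rx = DuplicateDiscard()
arrivals = [("A", "relay1", 1), ("B", "relay1", 1),  # both LANs healthy
            ("B", "relay1", 2),                     # LAN A copy lost
            ("A", "relay1", 3), ("B", "relay1", 3)]
delivered = [seq for lan, src, seq in arrivals if rx.accept(src, seq)]
print(delivered)  # [1, 2, 3]
```

Frame 2 is delivered even though its LAN A copy was lost, with no reconvergence step, which is the property contrasted with RSTP above.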

4.2.3. System Configuration Language (SCL) and Engineering

One of the distinguishing features of IEC 61850 is its system configuration language (SCL), defined in Part 6 of the standard. SCL allows power engineers to describe a substation’s single-line diagram, the communication architecture, and the functions hosted on each IED in a vendor-neutral XML (eXtensible Markup Language)-based format. For a typical subway traction substation, the SCL file might describe the hierarchy

\text{Substation} \to \text{VoltageLevel} \to \text{Bay} \to \text{LN} \to \text{DataObjects}.

By unifying the engineering process, SCL lowers the risk of misconfiguration and allows future expansions or modifications to be accommodated with minimal re-engineering effort [109]. In self-healing contexts, the tight correlation between logical nodes (e.g., PDIS for distance protection and PTOC for time overcurrent protection) and the physical layout allows agent-based restoration strategies to be updated automatically whenever the station’s layout changes.
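Because SCL is plain XML, an agent can derive its topology view with a standard XML parser. The fragment below is a minimal, hand-written SCL-like excerpt that uses the SCL namespace and the `Substation`/`VoltageLevel`/`Bay`/`LNode` elements. The names `TS01`, `DC1500`, `Feeder1`, `REL1`, and so on are hypothetical, and a real SCD file carries many more mandatory elements. The code lists which LN instances are bound to each bay.

```python
import xml.etree.ElementTree as ET

SCL_NS = {"scl": "http://www.iec.ch/61850/2003/SCL"}

SCL_SNIPPET = """<SCL xmlns="http://www.iec.ch/61850/2003/SCL">
  <Substation name="TS01">
    <VoltageLevel name="DC1500">
      <Bay name="Feeder1">
        <LNode iedName="REL1" ldInst="PROT" lnClass="PTOC" lnInst="1"/>
        <LNode iedName="REL1" ldInst="CTRL" lnClass="XCBR" lnInst="1"/>
      </Bay>
      <Bay name="Feeder2">
        <LNode iedName="REL2" ldInst="PROT" lnClass="PTOC" lnInst="1"/>
      </Bay>
    </VoltageLevel>
  </Substation>
</SCL>"""


def bay_functions(scl_text: str) -> dict:
    """Map 'Substation/VoltageLevel/Bay' -> list of 'IED/LD/LNclassInst'."""
    root = ET.fromstring(scl_text)
    result = {}
    for sub in root.findall("scl:Substation", SCL_NS):
        for vl in sub.findall("scl:VoltageLevel", SCL_NS):
            for bay in vl.findall("scl:Bay", SCL_NS):
                key = f"{sub.get('name')}/{vl.get('name')}/{bay.get('name')}"
                result[key] = [
                    f"{ln.get('iedName')}/{ln.get('ldInst')}/"
                    f"{ln.get('lnClass')}{ln.get('lnInst')}"
                    for ln in bay.findall("scl:LNode", SCL_NS)
                ]
    return result


for bay, lns in bay_functions(SCL_SNIPPET).items():
    print(bay, lns)
```

When a bay is added or rewired, regenerating this map from the updated SCL file is what lets restoration agents track the change without hand-edited configuration.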

4.2.4. IEC 61850 Services for Self-Healing

IEC 61850 offers several key services that directly underpin self-healing operations:

(1): GOOSE Messaging for Fast Trip Signals: Agents or protective relays can disseminate critical messages throughout the local substation or extended feeder within milliseconds, facilitating prompt fault isolation.
(2): Reporting and Logging Services: These allow an MAS or central authority to monitor system states in near real-time, capturing event sequences needed for diagnosing deeper network issues.
(3): MMS for Agent–IED Interactions: An MAS can rely on MMS-based client–server communications to read or write device parameters, retrieve trending data, or orchestrate switchgear commands on a slower but more comprehensive timescale.
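Item (1) relies on GOOSE's publish-and-retransmit behaviour. On a state change the publisher increments `stNum`, resets `sqNum`, and sends at once. It then repeats the same message at growing intervals until it settles at a heartbeat period, incrementing `sqNum` with each repetition. The doubling policy and the 2 ms / 1000 ms values below are assumptions for illustration, and the subscriber logic is simplified to "act only on a new `stNum`".

```python
from dataclasses import dataclass
from typing import List


def retransmission_times(t_event_ms: float, first_gap_ms: float = 2.0,
                         heartbeat_ms: float = 1000.0,
                         horizon_ms: float = 3000.0) -> List[float]:
    """Send instants after a state change: an immediate send, then doubling
    gaps capped at the heartbeat period (assumed policy)."""
    times, t, gap = [t_event_ms], t_event_ms, first_gap_ms
    while True:
        t += gap
        if t > t_event_ms + horizon_ms:
            return times
        times.append(t)
        gap = min(gap * 2, heartbeat_ms)


@dataclass
class GooseSubscriber:
    """Simplified subscriber that acts only when stNum changes."""
    last_st: int = -1

    def on_message(self, st_num: int, sq_num: int, value: bool) -> str:
        # sq_num could be used to detect lost repetitions; unused here.
        if st_num != self.last_st:
            self.last_st = st_num
            return f"new event: trip={value}"
        return "repeat (retransmission/heartbeat): ignore"


print(retransmission_times(0.0)[:6])  # [0.0, 2.0, 6.0, 14.0, 30.0, 62.0]
sub = GooseSubscriber()
print(sub.on_message(1, 0, True))  # new event: trip=True
print(sub.on_message(1, 1, True))  # repeat (retransmission/heartbeat): ignore
```

The dense early repetitions make a single lost frame unlikely to delay a trip, while the slow heartbeat lets subscribers detect a silent publisher.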

4.2.5. Challenges and Limitations in Subway Contexts

Despite its merits, deploying IEC 61850 in subway power contexts is not without hurdles. Ensuring electromagnetic compatibility in mixed high-voltage/low-voltage environments, training a workforce specialized in substation automation, and retrofitting older devices that do not natively support the standard object models are formidable tasks. Moreover, bridging DC traction equipment with AC-oriented LN definitions often requires custom or extended logical nodes. Nonetheless, the industry is gradually developing specialized profiles for railway electrification that align with standard IEC 61850 principles. Table 11 enumerates the current landscape of IEC 61850 applications within subway power systems along multiple dimensions. Noteworthy takeaways include the relatively high adoption rates for protection and SCADA integration in newer installations, as well as emerging interest in condition-based monitoring and MAS–IEC 61850 hybrid solutions. While the standard provides robust functionality for AC systems, bridging the gap with DC traction elements remains a work in progress. Even so, ongoing refinements and extension efforts position IEC 61850 as an increasingly indispensable backbone for any advanced self-healing architecture in subway settings.

Building on Table 11, Table 12 summarizes the challenges and future potential of IEC 61850 in subway power systems. It emphasizes ongoing issues in retrofitting legacy DC equipment, securing GOOSE/MMS communications, and managing the complexity of LN/DO/DA (logical node/data object/data attribute) mapping. Despite these obstacles, active research and standardization efforts continue to refine IEC 61850 for railway electrification contexts. In the long term, better synergy with MAS frameworks, improved security protocols, and a cohesive approach to LN extensions should yield a robust environment in which advanced self-healing functions can be reliably deployed.

Overall, this section demonstrates that IEC 61850 serves as an enabling framework for achieving real-time, interoperable communications in subway power systems. Its emphasis on standardized data models, high-speed messaging, and robust engineering languages paves the way for advanced control schemes—particularly those based on MASs. In the subsequent section, we will elaborate on how MASs and IEC 61850 can be harmonized into a convergent architecture that maximizes the advantages of both approaches.

4.3. Convergent MAS–IEC 61850 Architecture for Fault Diagnosis, Isolation, and Restoration

While Section 4.1 and Section 4.2 have, respectively, presented the strengths of MASs and IEC 61850, the fusion of these two technologies represents a paradigm shift for urban rail power systems. By combining agent intelligence with standardized communication, subway operators can establish an ecosystem in which self-healing actions are triggered, coordinated, and verified in a robust and scalable manner. This section outlines how a convergent MAS–IEC 61850 architecture can be implemented, highlighting design considerations, operational workflows, and potential obstacles along the integration pathway.

4.3.1. Architectural Overview and Design Considerations

A convergent MAS–IEC 61850 architecture introduces distributed “agent brains” into the well-defined communication and modeling backbone offered by IEC 61850. Each protective device or Intelligent Electronic Device (IED) can be associated with an “agent” that interprets local measurements (modeled under IEC 61850 logical nodes), exchanges GOOSE messages with neighboring agents, and cooperates with a supervisory agent at the substation or control center level via MMS. The system-level design can be broken down as follows:

(1): Agent Layer: Comprising local device agents (e.g., relay agents and switchgear agents) and a station-level agent (or aggregator) that coordinates sectional restoration.
(2): IEC 61850 Communication Layer: Facilitating high-speed GOOSE transmissions among local agents for real-time protective functions, and employing MMS for configuration, monitoring, and slower control commands.
(3): Coordinated Control Layer: A higher-level mechanism, often at the control center, that merges data from multiple stations or lines. This layer might also integrate advanced AI algorithms for centralized oversight and strategic decisions.

From a modeling standpoint, each local agent references LN objects for measurement and protection (e.g., MMXU (measurement) for power and voltage, and PTOC for time overcurrent protection) and can therefore read or manipulate the data attributes under these logical nodes directly. The agent’s internal logic can be abstractly formulated as follows:

f_{\text{agent}}(X, Y, t) \to \{\text{Control Actions}, \text{Setpoints}\}

(10)

where X is the set of local LN data attributes (such as measured currents, voltages, and breaker statuses), Y is the set of GOOSE signals received from neighbors (e.g., protective trip commands and alarm states), and t indicates time or event trigger references.
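One way to instantiate f_agent in Equation (10) is a blocking-based selectivity rule, a technique chosen here for illustration rather than one prescribed by the text. A feeder agent trips its own breaker when its PTOC start condition persists and no downstream neighbour has published a GOOSE "fault seen here" blocking signal. If blocked, it waits and acts only as backup after a longer delay. The signal names, delays, and action format are hypothetical.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str]  # (target, command) in a hypothetical format


def f_agent(X: Dict[str, object], Y: Dict[str, bool], t_ms: float,
            trip_delay_ms: float = 100.0,
            backup_delay_ms: float = 300.0) -> List[Action]:
    """Illustrative blocking-based rule mapping (X, Y, t) to control actions.

    X: local LN data keyed by reference, e.g. 'PTOC1.Str.general'.
    Y: downstream neighbour id -> True if it has published a GOOSE
       'fault seen here' (blocking) signal.
    t_ms: time elapsed since the local start (pickup) condition appeared.
    """
    started = bool(X.get("PTOC1.Str.general", False))
    breaker_closed = X.get("XCBR1.Pos.stVal") == "on"
    if not started or not breaker_closed:
        return []  # nothing to do
    blocked = any(Y.values())  # the fault lies further downstream
    if not blocked and t_ms >= trip_delay_ms:
        return [("XCBR1", "open"), ("GOOSE", "publish trip")]
    if blocked and t_ms >= backup_delay_ms:
        # The downstream agent has not cleared the fault in time: back it up.
        return [("XCBR1", "open"), ("GOOSE", "publish backup trip")]
    return []  # keep waiting


X = {"PTOC1.Str.general": True, "XCBR1.Pos.stVal": "on"}
print(f_agent(X, {"feeder_B": False}, t_ms=120.0))  # fault is local: trip
print(f_agent(X, {"feeder_B": True}, t_ms=120.0))   # blocked: wait
```

The rule uses exactly the three inputs of Equation (10): LN data for X, received GOOSE flags for Y, and elapsed time for t.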

4.3.2. Fault Diagnosis and Localization Workflow

A typical fault diagnosis scenario under convergent MAS–IEC 61850 architecture proceeds as follows:

(1): Initial Detection (Local Agent): Upon sensing an abnormal current or voltage signature (modeled by LN PTOC or PDIS), the local agent increments an internal fault counter. If the magnitude exceeds a set threshold, the agent broadcasts a GOOSE-based “suspected fault” message to adjacent nodes.
(2): Peer Confirmation (Neighboring Agents): Neighboring agents also evaluate their local signals. If they detect correlated anomalies, they respond with a GOOSE “confirmation” message. Weighted voting can be employed to mitigate false positives.
(3): Station-Level Aggregation: The station-level or aggregator agent (connected via MMS and local GOOSE) collects these events. Using a pre-defined topology map (SCL-based), it identifies the line segment or bus location with the highest probability of fault development.
(4): Refined Diagnostics: Optionally, advanced AI modules or traveling-wave-based algorithms can run at the aggregator level to further pinpoint the fault location.
(5): Isolation Instruction: Once the aggregator agent validates the fault location, it issues GOOSE open commands to the relevant breakers or switches, ensuring minimal disruption to unaffected lines.
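Steps (2) to (5) can be sketched as a weighted vote at the aggregator. Each agent's report carries a confidence weight, and the report is credited to every line segment that agent's measurements cover, according to a topology map that would in practice be derived from the SCL file. The best-scoring segment is isolated by opening its two boundary switches. Weights, the acceptance threshold `min_score`, and all identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def localise(reports: Dict[str, float], coverage: Dict[str, List[str]],
             min_score: float = 1.0) -> Optional[str]:
    """Return the segment with the highest weighted vote, or None.

    reports: agent id -> weight of its 'suspected/confirmed fault' message.
    coverage: agent id -> segments its measurements can see (SCL-derived).
    """
    score: Dict[str, float] = defaultdict(float)
    for agent, weight in reports.items():
        for seg in coverage.get(agent, []):
            score[seg] += weight
    if not score:
        return None
    best = max(score, key=score.get)
    return best if score[best] >= min_score else None


def isolation_commands(segment: str,
                       boundary: Dict[str, Tuple[str, str]]) -> List[Tuple[str, str]]:
    """GOOSE 'open' commands for the two switches bounding the segment."""
    return [(sw, "open") for sw in boundary[segment]]


coverage = {"R1": ["S1", "S2"], "R2": ["S2", "S3"], "R3": ["S3"]}
boundary = {"S1": ("Q0", "Q1"), "S2": ("Q1", "Q2"), "S3": ("Q2", "Q3")}
reports = {"R1": 0.9, "R2": 0.8}  # R3 reports nothing
seg = localise(reports, coverage)
print(seg, isolation_commands(seg, boundary))  # S2 [('Q1', 'open'), ('Q2', 'open')]
```

Segment S2 is the only one seen by both reporting agents, so it wins the vote; a lone weak report (below `min_score`) produces no isolation command.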

4.3.3. Restoration and Reconfiguration Strategies

After isolating the fault, agents collaborate to restore power to the maximum possible portion of the subway network. The station-level agent consults the SCL topology to identify alternate feeding paths. If the AC ring or DC feeder lines can accommodate the extra load without violating thermal or voltage constraints, a reconfiguration command is broadcast. Restoration might unfold in a multi-step process:

Step 1: SW_healthy ← Close;
Step 2: Check P_capacity ≥ P_demand;
Step 3: Activate the line if all constraints are satisfied.

Each local agent (switchgear or feeder agent) acknowledges the command, rechecks local conditions, and then closes or opens respective switches. This sequence is governed by multi-agent consensus, ensuring that no single device performs an unsafe action. Detailed logging and reporting (via MMS) guarantee thorough post-event analysis.
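The restoration steps and the acknowledgement round can be sketched as an all-or-nothing, two-phase procedure. The station agent proposes an alternate path, every agent on it rechecks its local thermal headroom and voltage and votes, and switches are closed only if all agents agree. In this sketch the capacity check is performed before any switch is closed, one conservative reading of the step sequence. The 0.9 p.u. voltage limit, the agent names, and all data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FeederAgent:
    """Hypothetical switchgear/feeder agent on an alternate feeding path."""
    name: str
    spare_kw: float        # thermal headroom of the element it controls
    v_min_pu: float        # lowest voltage expected after the transfer
    switch_closed: bool = False

    def vote(self, extra_kw: float, v_limit_pu: float = 0.9) -> bool:
        """Local recheck: enough headroom and voltage stays above the limit."""
        return self.spare_kw >= extra_kw and self.v_min_pu >= v_limit_pu


def restore(paths: List[List[FeederAgent]], demand_kw: float) -> Optional[int]:
    """Try alternate paths in order; close switches only on unanimous agreement.

    Returns the index of the path used, or None if no path is feasible.
    """
    for idx, path in enumerate(paths):
        # Phase 1: every agent on the path rechecks its local constraints.
        if all(agent.vote(demand_kw) for agent in path):
            # Phase 2: commit, i.e. close the switches along the path.
            for agent in path:
                agent.switch_closed = True
            return idx
    return None  # leave the section de-energized and alert operators


path_a = [FeederAgent("TieA", spare_kw=500.0, v_min_pu=0.95)]
path_b = [FeederAgent("TieB1", 1500.0, 0.97), FeederAgent("TieB2", 1200.0, 0.93)]
used = restore([path_a, path_b], demand_kw=900.0)
print(used, [a.switch_closed for a in path_a + path_b])  # 1 [False, True, True]
```

Because no switch moves until every agent on a path has voted yes, a single agent with insufficient headroom vetoes the whole path, which matches the requirement that no single device performs an unsafe action.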

4.3.4. Security and Redundancy Considerations

When MAS intelligence relies on an Ethernet-based IEC 61850 network (GOOSE frames travel directly over Ethernet, whereas MMS runs over TCP/IP), cybersecurity and redundancy are paramount. Agents must handle the authentication, and where feasible the encryption, of GOOSE messages. Meanwhile, ring or dual-network topologies ensure that a single link failure does not compromise the entire self-healing process. Key security approaches include the following:

(1): Role-Based Access Control (RBAC) [110]: Agents only process control instructions from authenticated roles recognized by the substation system.
(2): Encrypted Tunneling of GOOSE/MMS: Emerging solutions propose TLS-based encryption for MMS, though GOOSE typically remains unencrypted for performance reasons.
(3): Backup Communication Channels [111]: For critical commands, multiple GOOSE subscriptions may be created in parallel networks (e.g., PRP + HSR) to reduce the risk of packet loss or delay.
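Items (1) and (2) can be combined in a small receive-side check. The agent verifies a message authentication code over the payload and its `stNum`/`sqNum` counters, rejects non-increasing counters to resist replay, and only then applies the role check. This is a simplified illustration using Python's standard `hmac` module with a pre-shared key; it does not reproduce the message format or key management of the IEC 62351 security extensions. The roles, permissions, and key are hypothetical, and in practice the role would be bound to the authenticated identity rather than declared in the message.

```python
import hashlib
import hmac
import json
from typing import Dict, Set, Tuple

KEY = b"demo-pre-shared-key"  # hypothetical; real keys come from key management

# Hypothetical RBAC table: role -> commands it may issue.
PERMISSIONS: Dict[str, Set[str]] = {
    "station_agent": {"open", "close"},
    "monitor": set(),
}


def sign(msg: dict) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the message."""
    body = json.dumps(msg, sort_keys=True).encode()
    return hmac.new(KEY, body, hashlib.sha256).hexdigest()


class SecureReceiver:
    def __init__(self) -> None:
        self.last_seen: Dict[str, Tuple[int, int]] = {}  # sender -> (stNum, sqNum)

    def accept(self, msg: dict, tag: str) -> bool:
        # 1) Integrity/authenticity: constant-time comparison of the MAC.
        if not hmac.compare_digest(sign(msg), tag):
            return False
        # 2) Replay protection: counters must strictly increase per sender.
        counters = (msg["stNum"], msg["sqNum"])
        if counters <= self.last_seen.get(msg["sender"], (-1, -1)):
            return False
        # 3) RBAC: the sender's role must permit the requested command.
        if msg["command"] not in PERMISSIONS.get(msg["role"], set()):
            return False
        self.last_seen[msg["sender"]] = counters
        return True


rx = SecureReceiver()
cmd = {"sender": "ST01", "role": "station_agent", "command": "open", "stNum": 5, "sqNum": 0}
tag = sign(cmd)
print(rx.accept(cmd, tag))  # True
print(rx.accept(cmd, tag))  # False (replayed)
```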

4.3.5. Practical Challenges and Future Outlook

In practice, the synergy of MASs and IEC 61850 faces numerous engineering, organizational, and financial challenges. Notably, older DC traction systems lack standardized LN definitions, requiring proxy or extended LN models. Additionally, debugging multi-agent logic in a live subway environment that serves thousands of passengers daily demands rigorous offline testing (e.g., hardware-in-the-loop simulation) before rollout. Nonetheless, the promise of real-time, distributed intelligence coordinated through a vendor-agnostic, standards-based communication framework suggests that convergent MAS–IEC 61850 solutions are likely to shape the next generation of subway power systems.

Based on the above, Table 13 outlines various deployment modalities—ranging from full greenfield implementations to more conservative station-focused upgrades. Each scenario entails different levels of complexity, initial expenditure, and performance demands. The success of these deployments hinges on a confluence of factors, including the readiness of legacy systems for integration, the expertise of stakeholders, and the clarity of standard definitions for DC traction contexts. Nonetheless, incremental or phased approaches can systematically unlock the benefits of MAS–IEC 61850 synergy.

Based on Table 13, we further summarize the future R&D themes for MAS–IEC 61850 convergence in subway networks, as demonstrated in Table 14. In this table, several emergent R&D themes underscore the evolving nature of MAS–IEC 61850 convergence, including the integration of edge computing, AI-driven fault forecasting, and the development of new LN classes for DC traction applications. Each theme demands multidisciplinary collaborations that range from cryptography for secure GOOSE messaging to advanced hardware engineering for resilient edge-based agents. The ultimate payoff—a fully autonomous, self-healing subway power system that leverages standardized communication—justifies the complexity of these endeavors.

Overall, this section has presented a comprehensive view of how MASs and the IEC 61850 standard can be synthesized into a single, cohesive architecture that elevates fault management and self-healing to new levels of efficiency and reliability in subway power networks. While operationalizing this synergy demands substantial effort in engineering, cybersecurity, and standardization, the strategic advantages are evident: higher system resiliency, minimized downtime, and an adaptive framework capable of meeting future urban transportation demands. By situating MAS intelligence within the standardized data and communication protocols of IEC 61850, subway operators can accelerate fault response, reduce manual interventions, and pave the way for a truly smart, autonomous rapid transit infrastructure. This integrated approach stands at the forefront of the ongoing “intelligence revolution” in power systems, offering a compelling vision for the next generation of urban rail electrification.

5. Practical Applications of Self-Healing Techniques in Subway Systems

In this chapter, we will examine the practical applications of self-healing techniques within subway power systems, specifically focusing on the integration of MASs, AI, and relevant standards such as IEC 61850. The application of these techniques is vital for improving the reliability and efficiency of subway networks, ensuring rapid fault detection, isolation, and recovery. This chapter will be organized into four key sub-sections:

(1): Substation-level self-healing applications in subway power systems;
(2): Line-level self-healing mechanisms and network reconfiguration;
(3): Cross-layer fault recovery techniques and strategies;
(4): AI-driven fault diagnosis and recovery in complex scenarios.

Each sub-section will delve into the practical applications of self-healing mechanisms in subway systems, highlight the technological advancements, and provide an in-depth analysis of the benefits and challenges involved in implementing these strategies. The sub-sections are connected logically, starting from individual substations and their self-healing capabilities; progressing to line-level self-healing; and finally addressing complex, multi-layer fault recovery strategies with the help of AI algorithms.

5.1. Substation-Level Self-Healing Applications in Subway Power Systems

Substations play a crucial role in the overall operation of subway power systems. They are the primary points of interaction between the high-voltage grid and the subway’s internal power distribution network. Because they convert and distribute electrical power to subway stations and trains, they are also a critical point of failure in any power outage event [112,113,114]. Refs. [112,113,114] examine the role of substations in fault detection, isolation, and reconfiguration within self-healing smart grids, and highlight how these processes are integrated into critical infrastructure, such as subway power systems, to improve reliability and operational efficiency. Integrating self-healing mechanisms at the substation level therefore enhances system reliability by enabling rapid fault identification and isolation, so that power can be restored to unaffected areas more efficiently.

However, scalability concerns arise when considering the application of these technologies in larger subway systems. While substation-level self-healing systems work effectively in smaller networks, the implementation of these systems in expansive urban subway networks requires careful planning. Specifically, as the number of substations increases, communication overheads and real-time data processing requirements increase, which may necessitate substantial investment in communication infrastructure and AI-driven control systems.

Cost considerations are also significant in large-scale implementation. While automated fault detection, isolation, and reconfiguration systems can minimize downtime and reduce maintenance costs over time, the initial capital expenditure for deploying advanced communication systems like IEC 61850 and AI-driven algorithms can be high. Therefore, a phased deployment strategy is recommended, starting with pilot implementations in smaller, manageable sections of the subway network before scaling up.

5.1.1. Fault Detection and Isolation

In substation-level self-healing systems, fault detection is the first critical step, followed by rapid isolation to prevent further system disruption. The application of digital fault recorders, current transformers, and protection relays with IEC 61850 protocols enables real-time fault detection and automated isolation. These systems are essential for minimizing the mean time to repair (MTTR) and improving overall network resilience.

Scalability issues may emerge when these systems are applied to larger subway networks with more substations, as the complexity of managing real-time communication between all devices increases. Effective system integration across substations becomes critical, and ensuring data consistency across multiple network layers will require robust data management systems and high-capacity communication infrastructure.

Once a fault is detected, the system must isolate the affected area quickly to prevent it from spreading to other parts of the network. The IEC 61850 standard enables real-time communication between devices within the substation, allowing the isolation process to be automated. Remote-controlled devices and automated switches disconnect the faulted section from the rest of the network, so that only that section is de-energized and power continues to flow to other critical sections of the subway system.
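The isolation logic described above can be sketched on a section/switch graph: open every switch that borders the faulted section, then search from the supply points through the remaining closed switches to see which sections are still energized. The radial topology, the names, and the assumption that each switch joins exactly two sections are illustrative.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def isolate(switches: Dict[str, Tuple[str, str]], closed: Set[str],
            faulted: str) -> Set[str]:
    """Return the set of closed switches after opening those bordering `faulted`."""
    return {sw for sw in closed if faulted not in switches[sw]}


def energized(switches: Dict[str, Tuple[str, str]], closed: Set[str],
              sources: List[str]) -> Set[str]:
    """Breadth-first search from the sources through closed switches."""
    adj: Dict[str, List[str]] = {}
    for sw in closed:
        a, b = switches[sw]
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    seen, queue = set(sources), deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Busbar BB feeds sections S1 - S2 - S3 in series (hypothetical).
switches = {"Q1": ("BB", "S1"), "Q2": ("S1", "S2"), "Q3": ("S2", "S3")}
closed = {"Q1", "Q2", "Q3"}
after = isolate(switches, closed, faulted="S2")
print(sorted(after), sorted(energized(switches, after, ["BB"])))
# ['Q1'] ['BB', 'S1']
```

In this radial example S3 is healthy but has lost supply after isolation; returning it to service is the task of the automated reconfiguration discussed in Section 5.1.2.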

5.1.2. Automated Reconfiguration

After isolating the faulted section, automated reconfiguration becomes necessary to restore service to the remaining sections. The key challenge here lies in optimizing the network’s configuration dynamically, based on real-time load conditions and fault location. The integration of AI-based reconfiguration allows the system to predict network behavior based on historical data and current system conditions, ensuring that power is rerouted efficiently.

Cost issues arise in the deployment of AI-based reconfiguration systems. While AI models require substantial computational resources and high-quality data for training, the long-term benefits—such as reduced downtime, optimized system performance, and predictive maintenance—can outweigh these initial costs. Predictive maintenance algorithms, for instance, can help prevent system failures before they occur, reducing unplanned maintenance costs significantly.

For example, Refs. [115,116,117] collectively explore the role of AI and machine learning in optimizing network performance, enhancing reconfiguration processes, and improving system efficiency. Alabi (2023) [115] discusses how AI methodologies such as reinforcement learning and deep learning contribute to network optimization in telecommunications, particularly through autonomous reconfiguration. Similarly, Umoga and Sodiya (2024) [116] delve into AI-driven optimization for dynamic network performance, focusing on how machine learning algorithms can facilitate adaptive network configurations in response to fluctuating conditions. Cruz et al. (2024) [117] provide a comprehensive review of AI applications for self-reconfiguration in smart manufacturing systems, emphasizing the integration of machine learning techniques to optimize operational efficiency and predict system behavior. Together, these studies underline the transformative impact of AI on system reconfiguration, predictive maintenance, and overall optimization in complex network environments.

The scalability of AI-based reconfiguration is also worth noting. As subway networks grow and expand, the demand for real-time processing power and advanced predictive models will increase, making it necessary to implement scalable cloud-based solutions and distributed computing frameworks. The ability of AI systems to scale will depend on the development of more efficient algorithms and distributed learning techniques that can be deployed across different sections of the network.

5.1.3. Key Technologies for Substation-Level Self-Healing

Several critical technologies contribute to substation-level self-healing, each addressing a specific aspect of fault detection, isolation, and recovery, as summarized in Table 15. These technologies include advanced communication protocols, remote control and monitoring systems, machine learning-based fault detection algorithms, and automated reconfiguration systems. These systems work in tandem to ensure that substations can respond quickly and efficiently to faults, minimizing downtime and ensuring continuous operation.

The integration of self-healing technologies at the substation level has been shown to significantly enhance the overall reliability and efficiency of subway power systems. Automated detection, isolation, and reconfiguration allow faults to be managed with minimal manual intervention, which both reduces the risk of human error and shortens recovery times. Despite these advantages, challenges such as communication delays and system integration must still be addressed to further improve the effectiveness of these systems.

5.2. Line-Level Self-Healing Mechanisms and Network Reconfiguration

Line-level self-healing mechanisms are designed to handle faults along power distribution lines, which can be caused by environmental factors such as storms and wildlife interference as well as by overloads. In this context, ring network configurations are particularly valuable because they allow power to be rerouted through an alternative path when a fault occurs.

However, implementing these systems in large subway networks involves scalability concerns. Ring networks may become complex and inefficient if the number of interconnected lines increases. Additionally, line-level self-healing systems must be able to handle the increased number of fault detection sensors, automated switches, and MAS agents required to manage the additional complexity. Data communication between these components needs to be highly synchronized, and large-scale deployments will require robust communication protocols to handle the increased volume of data.

Cost and practical implementation issues also arise. While automated switches and MAS-based decision-making systems can significantly enhance fault detection and recovery, the high initial investment costs and the need for continuous maintenance of these systems could pose a barrier to widespread adoption in large urban transit systems. Therefore, the implementation of modular systems, where critical components are first deployed in high-priority areas, followed by gradual expansion, could offer a cost-effective solution.

5.2.1. Ring Network Configuration

A ring network is a topology in which multiple power paths are connected in a loop. This configuration allows power to be rerouted if one segment of the network fails, so that supply continues without major interruption. When a fault occurs, the ring network can detect the faulted section and automatically isolate it, while the healthy sections that lost supply are re-energized from the other end of the loop. This feature is particularly valuable in subway systems, where uninterrupted service is crucial.
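The isolate-and-restore behaviour of an open ring can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the ring is modeled as a chain of sections fed from two hypothetical sources, `A` and `B`, with one normally-open tie switch, and switch i joins section i-1 to section i. The function names are hypothetical and are not taken from the reviewed works.

```python
from collections import deque


def switch_ends(i: int, n: int):
    """Switch i joins section i-1 to section i; switches 0 and n connect to sources 'A' and 'B'."""
    left = "A" if i == 0 else i - 1
    right = "B" if i == n else i
    return left, right


def energized_sections(n: int, closed: set) -> set:
    """Breadth-first search from both sources through closed switches (hypothetical ring model)."""
    adjacency = {}
    for i in closed:
        u, v = switch_ends(i, n)
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    seen = {"A", "B"}
    queue = deque(["A", "B"])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return {node for node in seen if isinstance(node, int)}


def isolate_and_restore(n: int, normally_open: int, faulted: int) -> set:
    """Open the two switches bounding the faulted section, then close the normally-open tie."""
    closed = set(range(n + 1)) - {normally_open}
    bounding = {faulted, faulted + 1}
    closed -= bounding
    # If the tie is itself a bounding switch, closing it would re-energize the fault, so skip it.
    if normally_open not in bounding:
        closed.add(normally_open)
    return closed


n, tie = 5, 3  # five sections, open point between sections 2 and 3
print(sorted(energized_sections(n, set(range(n + 1)) - {tie})))       # [0, 1, 2, 3, 4]
print(sorted(energized_sections(n, isolate_and_restore(n, tie, 1))))  # [0, 2, 3, 4]
```

After a fault on section 1, section 0 stays on source A while sections 2 to 4 are picked up from source B through the closed tie; because both switches around the faulted section are open, the network stays radial.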

5.2.2. Automated Switches and MASs for Fault Detection and Isolation

Automated switches play a critical role in line-level self-healing. These switches can automatically disconnect faulty sections from the grid, ensuring that faults do not propagate further. They are equipped with sensors and communication devices that relay information about fault conditions to the central control system. These switches are often controlled through MASs, where multiple agents (representing different network components) collaborate to determine the best course of action based on real-time data.
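One common way for switch agents to locate and isolate a fault without a central controller is to compare fault-passage indications with a neighbour: on a radially fed feeder, the fault lies between the last switch that saw fault current and the first one that did not. The sketch below assumes a single-ended radial feeder and an idealized, loss-free exchange of one boolean per agent; `SwitchAgent` and its methods are hypothetical names, not an API from the cited works.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SwitchAgent:
    name: str
    saw_fault_current: bool                      # local fault-passage indication
    downstream: Optional["SwitchAgent"] = None   # neighbour agent further from the source
    is_open: bool = False

    def fault_is_just_downstream(self) -> bool:
        # The decision uses only the local reading and the downstream neighbour's report.
        neighbour_saw = self.downstream.saw_fault_current if self.downstream else False
        return self.saw_fault_current and not neighbour_saw


def build_feeder(names: List[str], readings: List[bool]) -> List[SwitchAgent]:
    agents = [SwitchAgent(n, r) for n, r in zip(names, readings)]
    for upstream, downstream in zip(agents, agents[1:]):
        upstream.downstream = downstream
    return agents


def isolate(agents: List[SwitchAgent]) -> List[str]:
    """Open the switches that bound the faulted section and return their names."""
    opened = []
    for agent in agents:
        if agent.fault_is_just_downstream():
            agent.is_open = True
            opened.append(agent.name)
            if agent.downstream is not None:
                agent.downstream.is_open = True
                opened.append(agent.downstream.name)
    return opened


feeder = build_feeder(["CB", "S1", "S2", "S3", "S4"], [True, True, True, False, False])
print(isolate(feeder))  # ['S2', 'S3']
```

In practice the upstream circuit breaker would normally trip first and the section switches would operate during its dead time; that sequencing is omitted here for brevity.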

5.2.3. Reconfiguration and Rerouting Power

When a fault is detected and isolated, the network must quickly reconfigure to restore power to the affected areas. Automated systems based on MASs and AI help decide the most efficient way to reroute power to ensure continuity of service. This includes balancing loads across unaffected sections and optimizing power flow to reduce stress on the remaining parts of the network. On this basis, Table 16 presents an in-depth comparative analysis of six key line-level self-healing mechanisms within subway power networks: ring network reconfiguration, automated switches, MAS-based decision-making, fault detection sensors, real-time load balancing, and adaptive rerouting. Each plays a critical role in enhancing the resilience, reliability, and overall performance of subway power supply systems. The table evaluates these mechanisms across multiple dimensions: degree of implementation, differences between systems, future prospects, issues to address, research potential, reliability impact, cost considerations, maintenance requirements, and impact on power quality.
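The load-balancing part of reconfiguration can be illustrated with a largest-first heuristic: each de-energized load block is transferred to the reachable healthy feeder with the most remaining headroom, and blocks that fit nowhere are left for load shedding. This is a minimal sketch, not the method of any cited work; the block and feeder names, the megawatt figures, and the `reachable` map (which feeders each block can be tied to) are hypothetical.

```python
from typing import Dict, List, Tuple


def transfer_loads(
    blocks: Dict[str, float],         # de-energized load blocks, MW
    headroom: Dict[str, float],       # spare capacity of healthy feeders, MW
    reachable: Dict[str, List[str]],  # feeders each block can be tied to
) -> Tuple[Dict[str, str], List[str], Dict[str, float]]:
    remaining = dict(headroom)
    plan: Dict[str, str] = {}
    unserved: List[str] = []
    # Largest blocks first, so big loads are placed while headroom is still plentiful.
    for block, load in sorted(blocks.items(), key=lambda item: item[1], reverse=True):
        candidates = reachable.get(block, [])
        if not candidates:
            unserved.append(block)
            continue
        feeder = max(candidates, key=lambda f: remaining[f])
        if remaining[feeder] >= load:
            plan[block] = feeder
            remaining[feeder] -= load
        else:
            # The feeder with the most headroom cannot take it, so no reachable feeder can.
            unserved.append(block)
    return plan, unserved, remaining


plan, unserved, remaining = transfer_loads(
    {"L1": 4.0, "L2": 3.0, "L3": 2.0},
    {"F1": 5.0, "F2": 4.0},
    {"L1": ["F1", "F2"], "L2": ["F1", "F2"], "L3": ["F1", "F2"]},
)
print(plan, unserved, remaining)
```

Largest-first is only a heuristic: in this example it leaves L3 unserved, even though tying L1 to F2 and both L2 and L3 to F1 would restore all three blocks, which is one reason the optimization-based methods discussed in this review are of interest.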

A detailed summary and evaluation of Table 16 follows.

1. Ring Network Reconfiguration

Ring network reconfiguration is implemented in most subway networks. It provides significant advantages, particularly in reducing the risk of power surges. Its main challenge, however, is the variability of its design across different systems, which can complicate integration. Its future prospects are high, especially with respect to standardization. It has high research potential and a high reliability impact, while requiring moderate costs and maintenance. Its impact on power quality is significant, making it an essential feature for enhancing power system robustness.

2. Automated Switches

Automated switches are widely implemented, particularly for fault isolation and rerouting, and are available in most subway systems, where they contribute significantly to power reliability. However, operating time delays remain a challenge, and their future prospects are rated only moderate. These systems offer moderate research potential, and their high reliability impact makes them indispensable in critical power situations. Their costs are medium and their maintenance requirements relatively low. Their contribution to power quality is notable, though not as high as that of some of the more advanced technologies.

3. MAS-Based Decision Making

MAS-based decision-making is currently implemented to a moderate degree, focusing on real-time fault management. The mechanism holds great promise, particularly in smart grids, where its innovative potential can lead to more dynamic and adaptive responses. Managing data communication overhead remains a challenge, but its research potential is high. Its impact on reliability is significant, especially for maintaining consistent power quality, and its cost and maintenance requirements are moderate. In the long term, MAS-based decision-making also has high potential to improve system efficiency.

4. Fault Detection Sensors

Fault detection sensors are crucial for detecting and locating faults in subway networks. They are widely implemented and are critical to system efficiency. However, false negatives remain a challenge. These sensors offer medium research potential but contribute significantly to the reliability of power networks. Their implementation costs are moderate to high, and their maintenance requirements are medium. Their high impact on power quality makes them an essential part of a well-functioning self-healing system.

5. Real-time Load Balancing

Real-time load balancing dynamically redistributes load across the network. This mechanism offers very high scalability but is challenging to coordinate in complex systems. It is highly promising for enhancing power quality, although it requires significant investment in coordination and data throughput. Despite its high initial costs, it shows great promise for optimizing power system performance in the long term. Its research potential is high, as it enables real-time adjustments that prevent overloads.

6. Adaptive Rerouting

Adaptive rerouting mechanisms, which adjust power flow dynamically, are based on cutting-edge technology and hold very high future prospects. These mechanisms offer significant advantages in terms of real-time adaptability, with the potential to dramatically reduce power disruptions. However, adaptive rerouting requires high data throughput and advanced technologies, making it one of the more complex systems to implement. Despite the high costs and medium maintenance requirements, adaptive rerouting mechanisms have a very high impact on power quality, which is crucial for ensuring the continuous operation of subway networks.

In conclusion, line-level self-healing mechanisms such as ring network reconfiguration, automated switches, and MAS-based decision-making enable quick fault isolation and recovery (Figure 4). These technologies minimize power disruptions and allow service to be restored quickly. The main challenges lie in the complexity of coordinating multiple agents across the network and in the requirement for continuous communication. Nonetheless, these mechanisms significantly improve the robustness and resilience of the system.

The fault isolation and recovery flowchart in Figure 4 illustrates the essential processes of self-healing control in a subway power supply system, from fault detection to system restoration. It comprises the following four steps.

1. Real-Time Monitoring and Data Collection

In this first step, the system continuously monitors operational parameters such as voltage, current, and temperature through sensors and monitoring devices. Once an anomaly is detected, such as voltage fluctuations or current imbalances, the system automatically triggers the self-healing control process. The use of the IEC 61850 communication protocol ensures the efficient and reliable transmission of data, allowing for the dynamic monitoring of critical system parameters. This early detection phase is crucial for initiating a rapid and accurate response to potential faults.

2. Fault Diagnosis and Location Analysis

After identifying an abnormal condition, the system utilizes historical data and real-time sensor readings to quickly diagnose the fault type and its precise location. AI algorithms, including pattern recognition and deep learning, process these data to create diagnostic models that pinpoint the fault accurately. The integration of these advanced AI techniques, coupled with the use of zero-sequence and differential current monitoring, enhances the efficiency and precision of fault identification, enabling the system to respond rapidly to issues in the subway power supply network.

3. Fault Isolation and Protection Actions

Once the fault is diagnosed, automated isolation devices are triggered to disconnect the affected area, preventing the fault from spreading and causing further damage. The multi-agent system (MAS) plays a critical role in coordinating resources across the entire network, dynamically adjusting load distribution to stabilize the system. The MAS ensures that the fault isolation is executed in an optimized sequence, minimizing the overall impact on the system’s functionality and ensuring that non-faulted regions maintain power. This coordination enhances system resilience and ensures minimal service disruption.

4. Recovery of Power Supply and Load Optimization

After the fault is isolated, the system focuses on restoring power to the unaffected regions. This is accomplished through the automatic switching of circuits to reconnect these areas to the power supply. To prevent overloads and secondary faults, intelligent optimization algorithms, such as game theory-based optimization, are applied to balance resource distribution across the network. These algorithms ensure the efficient recovery of power without overloading critical lines and help avoid secondary faults or service interruptions. The dynamic load optimization enhances the overall system stability and speeds up the recovery process, minimizing the downtime of the subway network.
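Steps 1 and 2 can be sketched as a threshold trigger followed by a rule-based diagnosis that uses the zero-sequence current, I0 = (Ia + Ib + Ic)/3, and the differential current across a protected zone. The sketch assumes a three-phase AC section (for example, a medium-voltage ring) with phasors in per unit; all threshold values are hypothetical placeholders, and the simple rules stand in for, rather than reproduce, the AI-based diagnostic models described in step 2.

```python
import cmath
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical settings in per unit; real values come from protection studies.
V_NOMINAL = 1.0
V_TOLERANCE = 0.10
I0_PICKUP = 0.20
DIFF_PICKUP = 0.15


@dataclass
class Sample:
    v: float        # voltage magnitude, pu
    ia: complex     # phase current phasors, pu
    ib: complex
    ic: complex
    i_in: complex   # current entering the protected zone, pu
    i_out: complex  # current leaving the protected zone, pu


def zero_sequence(s: Sample) -> complex:
    # I0 = (Ia + Ib + Ic) / 3 is close to zero for a balanced, fault-free set.
    return (s.ia + s.ib + s.ic) / 3


def detect(s: Sample) -> List[str]:
    """Step 1: flag anomalies that trigger the self-healing process."""
    flags = []
    if abs(s.v - V_NOMINAL) > V_TOLERANCE:
        flags.append("voltage_deviation")
    if abs(zero_sequence(s)) > I0_PICKUP:
        flags.append("current_imbalance")
    return flags


def diagnose(s: Sample) -> Tuple[str, str]:
    """Step 2: coarse fault type, and whether the fault is inside the protected zone."""
    kind = "earth fault" if abs(zero_sequence(s)) > I0_PICKUP else "other disturbance"
    zone = "internal" if abs(s.i_in - s.i_out) > DIFF_PICKUP else "external"
    return kind, zone


def monitor(s: Sample) -> Optional[Tuple[List[str], Tuple[str, str]]]:
    flags = detect(s)
    if not flags:
        return None  # nothing abnormal: keep monitoring
    return flags, diagnose(s)


balanced = [cmath.rect(1.0, 0.0), cmath.rect(1.0, -2 * math.pi / 3), cmath.rect(1.0, 2 * math.pi / 3)]
healthy = Sample(1.0, *balanced, i_in=1.0, i_out=1.0)
faulted = Sample(0.75, 3.0, balanced[1], balanced[2], i_in=3.0, i_out=0.2)
print(monitor(healthy))  # None
print(monitor(faulted))  # (['voltage_deviation', 'current_imbalance'], ('earth fault', 'internal'))
```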

As illustrated in Figure 4, this process begins with real-time monitoring and data collection, where sensors continuously gather operational parameters such as voltage and current. Upon detecting abnormalities, such as voltage fluctuations or current imbalances, the system triggers the self-healing procedure. The fault diagnosis and location analysis stage utilizes AI algorithms, including pattern recognition and deep learning, to swiftly identify the fault type and pinpoint its exact location based on both historical and real-time data. Following diagnosis, fault isolation and protection actions are automatically implemented, where automated isolation devices disconnect the faulted area, preventing further damage. The multi-agent system (MAS) coordinates resources across the network, dynamically adjusting load distribution to maintain system stability. Finally, in the recovery and load optimization phase, the system restores power to the unaffected areas by automatically switching circuits, and smart optimization algorithms ensure balanced resource allocation, minimizing the risk of overloading or secondary faults. This flowchart exemplifies the advantages of modern self-healing systems, combining AI-driven diagnostics, automated fault isolation, and dynamic load management to rapidly restore service and maintain network stability. Its key strengths lie in its high-speed response to faults, minimal disruption to non-faulted areas, and the intelligent optimization of resources, making it highly efficient for managing complex subway power networks.

Overall, this fault isolation and recovery flowchart is integral to a self-healing control system and shows an intelligent, automated approach to managing faults in complex subway power supply networks. Its reliance on real-time data collection, AI-based diagnostics, automated fault isolation, and dynamic recovery enables a rapid, accurate, and minimal-impact response to power system disruptions. This approach enhances both operational reliability and efficiency, maintaining continuous service while minimizing power loss and service downtime.

The mechanisms summarized in Table 16 (ring network reconfiguration, automated switches, MAS-based decision-making, fault detection sensors, real-time load balancing, and adaptive rerouting) each contribute in a distinct way to the resilience and reliability of subway power systems. Although each has its own challenges, particularly in integration complexity, coordination, and data management, their future potential remains high. Mechanisms with higher scalability and dynamic response capabilities, such as real-time load balancing and adaptive rerouting, are particularly important for meeting the growing demands of modern subway networks.

While some systems, such as automated switches, are already well integrated and have a proven track record, others, such as MAS-based decision-making and adaptive rerouting, present substantial opportunities for future research and development. These technologies will become increasingly vital as urban transit systems grow more complex, making continued innovation and research investment in this area critical.

5.3. Cross-Layer Fault Recovery Techniques and Strategies

Cross-layer fault recovery involves the coordination of fault management across different layers of the power network, from generation to distribution. This approach is essential for ensuring that faults are handled in a synchronized manner, minimizing the risk of cascading failures across multiple network layers.

Scalability challenges arise when attempting to implement multi-layer coordination systems in large subway networks. The complexity of cross-layer communication and data synchronization increases as the number of network layers grows. Hierarchical recovery systems, which involve different levels of control, must be carefully designed to ensure that they can scale without overwhelming the system’s computational resources.

Cost issues are particularly relevant here. The deployment of cross-layer recovery systems often requires substantial investment in advanced communication infrastructure and real-time data processing technologies. However, as these systems improve the overall resilience and efficiency of the subway power network, the long-term savings from reduced downtime, improved power quality, and predictive fault management can justify the initial costs.

5.3.1. Hierarchical Recovery Systems

Hierarchical recovery systems involve different levels of fault management, from local fault detection and isolation to higher-level network-wide coordination. The first level involves the detection and isolation of faults at the substation or line level, while the next level includes coordinating recovery actions across multiple substations or even across the entire power grid. At the highest level, centralized control centers monitor the overall system status and ensure that recovery actions are coordinated across the network.
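The escalation logic of a hierarchical scheme can be sketched as a search from the most local control level upward: the first level whose switching scope covers the faulted section and whose available backup capacity covers the load to be restored takes charge. The three level names, their scopes, and the capacity figures below are hypothetical, and real schemes weigh many more criteria than these two.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class ControlLevel:
    name: str
    scope: Set[str]   # sections this level is allowed to switch
    backup_mw: float  # backup supply this level can marshal


def assign_recovery(levels: List[ControlLevel], faulted: str, load_mw: float) -> Optional[str]:
    """Return the most local level able to handle the fault, or None to hand over to operators."""
    for level in levels:  # ordered from local to central
        if faulted in level.scope and level.backup_mw >= load_mw:
            return level.name
    return None


levels = [
    ControlLevel("substation", {"S1", "S2"}, 2.0),
    ControlLevel("zone", {"S1", "S2", "S3", "S4"}, 6.0),
    ControlLevel("control center", {f"S{i}" for i in range(1, 9)}, 20.0),
]
print(assign_recovery(levels, "S2", 1.5))   # substation
print(assign_recovery(levels, "S2", 5.0))   # zone
print(assign_recovery(levels, "S7", 1.0))   # control center
print(assign_recovery(levels, "S2", 30.0))  # None
```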

5.3.2. Coordinated Fault Isolation Across Layers

Coordinated fault isolation is crucial to ensure that faults at one level do not affect other levels of the system. For example, if a fault occurs at the substation level, the system must isolate the fault while maintaining the overall operation of the grid. At the same time, automated decision-making processes must be in place to restore service as quickly as possible, whether through reconfiguration, load balancing, or rerouting.
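One established way to keep an isolation action from spreading to upper levels is time-graded (definite-time) protection: devices nearer the source wait longer, so the device closest to the fault clears it first and upstream devices reset. The sketch assumes definite-time overcurrent relays along a single path with the fault beyond the last device; the device names, pickup currents, delays, and the 300 ms grading margin are illustrative assumptions, not settings from the reviewed systems.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Relay:
    name: str
    pickup_a: float  # current at or above which the relay starts timing
    delay_ms: int    # definite-time delay before tripping


def check_grading(path: List[Relay], margin_ms: int = 300) -> bool:
    """path runs from the source towards the load; each upstream delay must exceed the next by margin_ms."""
    return all(up.delay_ms - down.delay_ms >= margin_ms for up, down in zip(path, path[1:]))


def first_to_trip(path: List[Relay], fault_current_a: float) -> Optional[Relay]:
    """Among relays that pick up, the one with the shortest delay trips first; the others reset."""
    picked_up = [r for r in path if fault_current_a >= r.pickup_a]
    return min(picked_up, key=lambda r: r.delay_ms) if picked_up else None


path = [
    Relay("incomer", 2000.0, 900),
    Relay("feeder breaker", 1200.0, 600),
    Relay("section switch", 800.0, 300),
]
print(check_grading(path))               # True
print(first_to_trip(path, 3000.0).name)  # section switch
```

Integer milliseconds are used so that the grading check is not affected by floating-point rounding.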

5.3.3. MASs for Multi-Layer Coordination

Multi-agent systems are particularly useful in multi-layer fault recovery. They enable distributed decision-making, in which each agent coordinates recovery actions within its designated layer, and the agents communicate with each other to optimize system-wide recovery. For example, an agent at the substation level may isolate a fault while an agent at the transmission level reroutes power from other sources to ensure continuous service [118,119,120]. These studies highlight the significant role of MASs in enhancing the resilience and efficiency of power systems, particularly in multi-layer fault recovery. Lin and Bie (2018) [118] provide a comprehensive analysis of strategies for achieving power system resilience, emphasizing decentralized decision-making in MASs. Yu et al. (2020) [119] explore survivability-aware routing restoration mechanisms, demonstrating how MASs can optimize network communication during large-scale failures. Moradi et al. (2016) [120] examine the application of MASs in power engineering, emphasizing their ability to coordinate distributed recovery actions across different layers of the power network, such as substations and transmission systems. Collectively, these studies underscore the importance of MASs for rapid fault detection, isolation, and system restoration in complex, multi-layered infrastructures.
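The agent interaction described above, in which a substation agent that has isolated a fault asks higher-level agents for backup supply, is often realized with a Contract Net style negotiation (call for proposals, bids, award). The sketch below is a minimal single-round version that assumes each source agent knows its spare capacity and the number of switching operations needed to pick up the load; the agent names, figures, and ranking rule (fewest operations, then largest margin) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bid:
    source: str
    spare_mw: float
    switching_ops: int


@dataclass
class SourceAgent:
    name: str
    capacity_mw: float
    load_mw: float
    switching_ops: int  # operations needed to pick up the isolated area (assumed known)

    def propose(self, request_mw: float) -> Optional[Bid]:
        spare = self.capacity_mw - self.load_mw
        # Refuse rather than bid if accepting would overload this source.
        return Bid(self.name, spare, self.switching_ops) if spare >= request_mw else None

    def award(self, request_mw: float) -> None:
        self.load_mw += request_mw


def negotiate(request_mw: float, sources: List[SourceAgent]) -> Optional[str]:
    """Manager side: call for proposals, rank the bids, award the contract to the best bidder."""
    bids = [b for b in (s.propose(request_mw) for s in sources) if b is not None]
    if not bids:
        return None
    best = min(bids, key=lambda b: (b.switching_ops, -b.spare_mw))
    next(s for s in sources if s.name == best.source).award(request_mw)
    return best.source


sources = [
    SourceAgent("T1", 10.0, 8.0, 1),
    SourceAgent("T2", 10.0, 5.0, 2),
    SourceAgent("T3", 10.0, 4.0, 2),
]
print(negotiate(3.0, sources))  # T3: T1 lacks headroom, T3 beats T2 on spare capacity
print(sources[2].load_mw)       # 7.0
```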

Based on this, Table 17 outlines key cross-layer fault recovery mechanisms aimed at enhancing the resilience of subway power systems. These mechanisms—hierarchical recovery, cross-layer isolation, MASs for multi-layer coordination, adaptive load balancing, data integration systems, and predictive recovery algorithms—represent various approaches to fault isolation, decision-making, and recovery coordination across multiple layers of the power system. The table assesses each mechanism based on its degree of implementation, differences between systems, future prospects, issues to address, research potential, reliability impact, cost considerations, maintenance requirements, and impact on power quality.

A detailed summary and evaluation of Table 17 follows.

1. Hierarchical Recovery

Hierarchical recovery is widely implemented in subway power systems, particularly for coordinated fault recovery. Its main challenge is system complexity, as the approach varies significantly across systems. Hierarchical recovery has very high future prospects because of its potential for seamless fault management. Its research potential is high, and its impact on reliability is very significant. However, its costs and maintenance requirements are high, a common challenge in large-scale systems. Despite these challenges, hierarchical recovery mechanisms offer very high benefits for power quality, making them essential for enhancing overall system robustness.

2. Cross-Layer Isolation

Cross-layer isolation mechanisms are moderately implemented and are particularly focused on isolating faults across multiple layers of the system. These systems are critical for new and emerging subway power networks. The key issue to address is maintaining data consistency, which is essential for effective fault detection and recovery. The research potential is high, and the impact on reliability is substantial. While the costs are moderate, the complexity of integrating multiple layers of data can be a challenge. Cross-layer isolation mechanisms have a high impact on power quality, making them crucial for efficient and resilient power systems.

3. MASs for Multi-Layer Coordination

MASs are used for distributed decision-making in recovery processes. They are widely implemented and require data consistency across multiple layers of the power system. The main challenge is overcoming communication delays between agents, which can reduce decision-making efficiency. Despite these challenges, MASs for multi-layer coordination have very high future prospects, particularly for enhancing system resilience. Their research potential remains high, as they are central to modern smart grids and large-scale systems. Although they entail high costs and maintenance, they have a very high impact on power quality, especially in complex networks requiring adaptive decision-making.

4. Adaptive Load Balancing

Adaptive load balancing is critical for balancing load across multiple levels of the subway network. It is widely implemented, especially in large systems where dynamic load adjustment is needed. Its key requirements are real-time data processing and accurate system monitoring. Its research potential is high, with very high prospects for improving system efficiency. Although its costs and maintenance needs are high, it has a very high impact on power quality because it allows the network to adjust dynamically to load fluctuations. This makes adaptive load balancing a vital tool for maintaining power stability in large subway networks.

5. Data Integration Systems

Data integration systems are essential for integrating data across multiple layers of the subway network, particularly for decision-making processes. These systems are moderately implemented, with ongoing challenges in ensuring data synchronization across different layers. The research potential remains high, with a significant focus on overcoming data consistency and synchronization issues. Data integration systems contribute highly to the reliability and efficiency of subway power systems, although their cost and maintenance requirements are also high. Despite these challenges, the systems have a very high impact on power quality, making them a key area for continued development.

6. Predictive Recovery Algorithms

Predictive recovery algorithms are at the experimental stage and focus on anticipating faults and the corresponding recovery actions. They are particularly useful for providing proactive solutions that prevent faults from escalating. However, their effectiveness is limited by the accuracy of the underlying fault models. Their research potential is rated moderate; the area is still evolving, and real-world applications remain limited. While implementation costs are moderate, maintenance needs are high because of the complexity of the algorithms. Despite these challenges, predictive recovery algorithms have a very high impact on power quality, as they enhance the system’s ability to address potential disruptions preemptively.

Overall, the cross-layer fault recovery techniques ensure that faults are managed seamlessly across different levels of the subway power network. By using hierarchical recovery systems, coordinating fault isolation, and leveraging MASs for multi-layer coordination, these techniques help improve overall system resilience. The integration of real-time data and predictive algorithms further enhances the effectiveness of these systems. However, issues related to data synchronization, communication, and system complexity remain challenges to be addressed.

The cross-layer fault recovery mechanisms outlined in Table 17 (hierarchical recovery, cross-layer isolation, MASs for multi-layer coordination, adaptive load balancing, data integration systems, and predictive recovery algorithms) are all integral to improving the robustness and resilience of subway power systems. Each offers distinct advantages in fault isolation, dynamic recovery, and decision-making, although each also presents challenges related to system complexity, data synchronization, and real-time operation.

Among these, hierarchical recovery and MASs for multi-layer coordination show the most promise for long-term development, as they provide coordinated and adaptive solutions for large-scale systems. However, issues related to data consistency, communication delays, and high implementation costs remain significant barriers that need to be addressed. Predictive recovery algorithms, while still experimental, represent a transformative approach to proactive fault management and could become crucial as AI and machine learning technologies advance.

Ultimately, the integration of these mechanisms into subway power systems will enhance operational efficiency, reduce downtime, and improve overall power quality, making them indispensable for the future of smart and resilient subway networks.

5.4. AI-Driven Fault Diagnosis and Recovery in Complex Scenarios

AI is playing an increasingly critical role in the diagnosis and recovery of faults in complex subway power systems. Machine learning, deep learning, and predictive analytics allow for faster and more accurate identification of faults, even in challenging scenarios where traditional methods may struggle.

Scalability remains a significant challenge when applying AI-based fault diagnosis to large-scale systems. The number of sensors and data points, and the computational requirements of deep learning models, increase as the subway network expands. A cloud-based approach or distributed AI systems may therefore be necessary to handle the data processing demands of larger systems. These AI models must also be continuously updated with real-time operational data to maintain their accuracy and adaptability in dynamic environments [121,122,123].

The cost of AI-driven systems can be high, particularly in terms of computational resources and data storage. However, AI-based systems can significantly reduce manual intervention and improve operational efficiency, leading to long-term savings. Additionally, AI-based reconfiguration systems can reduce downtime and optimize network recovery, potentially providing a high return on investment over time.

5.4.1. Machine Learning for Fault Prediction

Machine learning algorithms can be trained using historical fault data to predict potential future faults based on patterns and trends [124,125,126]. By analyzing large volumes of data, these algorithms can identify early warning signs of potential faults before they occur. This proactive approach allows for better planning and mitigation strategies, reducing the overall impact of faults.
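The idea of learning a trend from historical data and extrapolating it into an early warning can be illustrated with the simplest possible model: an ordinary least-squares line fitted to a monitored quantity. This is a deliberately minimal stand-in for the machine learning models cited above, not a reproduction of them; the monitored quantity (a transformer temperature in °C), the 90 °C limit, and the 14-day warning horizon are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def fit_line(t: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares: (slope, intercept) of y ≈ slope * t + intercept.
    Needs at least two distinct time stamps."""
    n = len(t)
    t_mean, y_mean = sum(t) / n, sum(y) / n
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    s_ty = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = s_ty / s_tt
    return slope, y_mean - slope * t_mean


def time_to_limit(t, y, limit) -> Optional[float]:
    """Time after the last sample until the fitted trend reaches the limit (None if not rising)."""
    slope, intercept = fit_line(t, y)
    if slope <= 0:
        return None
    return max(0.0, (limit - intercept) / slope - t[-1])


def early_warning(t, y, limit, horizon) -> bool:
    remaining = time_to_limit(t, y, limit)
    return remaining is not None and remaining <= horizon


days = [0, 1, 2, 3, 4]
temps = [60.0, 62.0, 64.0, 66.0, 68.0]
print(time_to_limit(days, temps, 90.0))              # 11.0
print(early_warning(days, temps, 90.0, horizon=14))  # True
```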

5.4.2. Deep Learning for Real-Time Fault Diagnosis

Deep learning models, which are a subset of machine learning, can be particularly useful for real-time fault diagnosis [127,128]. These models analyze data from multiple sensors and sources, identifying complex patterns that may indicate a fault. The advantage of deep learning is its ability to process large amounts of data and learn from it, enabling quicker diagnosis and recovery times.
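To make the train-then-infer loop concrete, the sketch below trains a one-hidden-layer neural network with plain stochastic gradient descent to separate two classes of measurement windows. It is far shallower than the deep models cited and uses only the Python standard library; the two input features (a current magnitude and a zero-sequence current ratio, both in per unit), the synthetic clusters, and all hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(params, x):
    W1, b1, W2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return h, sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)


def train(X, y, hidden=6, lr=0.1, epochs=200, seed=0):
    rng = random.Random(seed)
    W1 = [[rng.uniform(-0.5, 0.5) for _ in X[0]] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            h, p = forward((W1, b1, W2, b2), x)
            dz = p - target  # gradient of binary cross-entropy w.r.t. the output pre-activation
            for j in range(hidden):
                dh = dz * W2[j] * (1.0 - h[j] ** 2)  # back-propagate through tanh before W2[j] changes
                W2[j] -= lr * dz * h[j]
                for i in range(len(x)):
                    W1[j][i] -= lr * dh * x[i]
                b1[j] -= lr * dh
            b2 -= lr * dz
    return W1, b1, W2, b2


def predict(params, x) -> int:
    return int(forward(params, x)[1] >= 0.5)


# Hypothetical synthetic data: [current magnitude, zero-sequence ratio] in per unit.
rng = random.Random(42)
X, y = [], []
for _ in range(50):
    X.append([rng.gauss(1.0, 0.1), rng.gauss(0.05, 0.02)])  # normal operation
    y.append(0)
    X.append([rng.gauss(3.0, 0.3), rng.gauss(0.8, 0.1)])    # earth-fault-like window
    y.append(1)
params = train(X, y)
accuracy = sum(predict(params, x) == t for x, t in zip(X, y)) / len(X)
print(f"training accuracy: {accuracy:.2f}")
```

Accuracy is reported on the training set only; a real diagnostic model would be judged on held-out recordings.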

5.4.3. AI for Automated Reconfiguration

AI can also assist in the automated reconfiguration of subway power systems after a fault has been isolated [129]. By considering a range of factors such as network load, fault severity, and environmental conditions, AI can determine the most effective configuration for restoring power. This dynamic reconfiguration is crucial in ensuring that service is restored as quickly as possible without overloading other parts of the network. Based on this, Table 18 presents an analysis of several AI-driven technologies that are applied to fault diagnosis and automated reconfiguration in subway power systems. The technologies highlighted include machine learning algorithms, deep learning models, AI-based reconfiguration, predictive analytics, fault pattern recognition, and automated diagnostic systems. These systems aim to predict faults, diagnose issues in real time, and optimize recovery strategies to ensure the rapid restoration of power in the event of system disruptions.
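For a small isolated area, the reconfiguration decision can be posed as a search over tie-switch choices: each de-energized load block is either left off or tied to one reachable feeder, plans that overload a feeder are rejected, and the plan restoring the most priority-weighted load (with fewer switching operations as a tie-breaker) is selected. The exhaustive enumeration below grows exponentially with the number of blocks, which is one reason AI and metaheuristic methods are attractive for real networks. The block names, loads, priority weights, and feeder capacities are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple


def best_plan(
    blocks: Dict[str, Tuple[float, float]],  # block -> (load in MW, priority weight)
    spare: Dict[str, float],                 # feeder -> spare capacity in MW
    ties: Dict[str, List[str]],              # block -> feeders it can be tied to
) -> Tuple[Dict[str, Optional[str]], float]:
    names = list(blocks)
    # At most one tie per block keeps each restored island radial (each block assumed to be one island).
    options = [[None] + ties.get(name, []) for name in names]
    best, best_key = None, None
    for choice in product(*options):
        used = {feeder: 0.0 for feeder in spare}
        for name, feeder in zip(names, choice):
            if feeder is not None:
                used[feeder] += blocks[name][0]
        if any(used[f] > spare[f] for f in spare):
            continue  # reject plans that overload any feeder
        value = sum(blocks[n][0] * blocks[n][1] for n, f in zip(names, choice) if f is not None)
        key = (value, -sum(f is not None for f in choice))
        if best_key is None or key > best_key:
            best, best_key = dict(zip(names, choice)), key
    return best, best_key[0]


plan, value = best_plan(
    {"L1": (4.0, 1.0), "L2": (3.0, 1.0), "L3": (2.0, 2.0)},
    {"F1": 5.0, "F2": 4.0},
    {"L1": ["F1", "F2"], "L2": ["F1", "F2"], "L3": ["F1", "F2"]},
)
print(plan, value)  # {'L1': 'F2', 'L2': 'F1', 'L3': 'F1'} 11.0
```

The all-off plan is always feasible, so the search always returns a plan, even if it restores nothing.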

A detailed summary and evaluation of Table 18 follows.

1. Machine Learning Algorithms

Machine learning algorithms are widely implemented in modern systems to predict faults before they occur [130]. The key issue with this technology is data accuracy, as accurate input data are critical for effective fault prediction. The future prospects for machine learning in fault prediction are very high owing to its ability to enhance real-time diagnostics and manage faults preemptively. These algorithms have a very high reliability impact because they improve overall system resilience. Their research potential remains high, while the associated costs are moderate and maintenance requirements medium. Their impact on power quality is also very high because of the proactive nature of fault management and early intervention.

2. Deep Learning Models

Deep learning models are increasingly used for real-time fault diagnosis, with moderate implementation in subway systems [127]. These models require large datasets to effectively identify faults, which poses challenges in data consistency and accuracy. Despite these challenges, deep learning models have very high future prospects, as they can learn from vast amounts of historical data to provide highly accurate diagnostics. The reliability impact is very high, and the research potential remains significant, especially with advancements in neural networks and computational capabilities. However, deep learning models require high computational power, making them costly and high-maintenance. Despite these requirements, the impact on power quality is very high due to their accuracy and real-time capabilities.

3. AI-based Reconfiguration

AI-based reconfiguration focuses on optimizing network recovery after a fault is isolated. While the technology is still evolving, it holds very high future prospects due to its potential to improve recovery times and efficiency. One of the primary challenges with AI-based reconfiguration is the need for fast computation to optimize decisions in real time, particularly in large subway systems. The research potential remains high, and the technology’s reliability impact is also significant. The cost considerations are high, as the system requires substantial computational resources and integration with existing infrastructure. Despite the costs, the impact on power quality is very high due to its ability to dynamically restore power to affected parts of the network.

4. Predictive Analytics

Predictive analytics is widely used in advanced grids to forecast network conditions and potential faults. Its level of implementation is high, and its future prospects are very high, given its ability to anticipate issues before they occur. The major challenge is algorithmic complexity, as the models must process and analyze large volumes of real-time data. The research potential is high, and the impact on reliability is significant because power system management can be optimized before faults emerge. Costs remain high, and maintenance requirements are medium. The impact on power quality is very high, especially for maintaining system stability.

5. Fault Pattern Recognition

Fault pattern recognition identifies complex fault scenarios and is commonly used in AI-driven systems. Its level of implementation is high, especially for scenarios in which faults are difficult to detect manually. The key challenge is the large amount of training data needed to identify fault patterns accurately. The future prospects for fault pattern recognition are very high, and its reliability impact is substantial because it improves fault detection accuracy. The research potential is high, but the technology requires significant investment in training data and must overcome considerable integration complexity. Despite these challenges, its impact on power quality is very high, as it can substantially reduce downtime and system disruptions.

6. Automated Diagnostic Systems

Automated diagnostic systems are used for automatic fault detection and troubleshooting, and their level of implementation is moderate [131,132]. Because the technology is relatively new in subway systems, integrating it can be complex. Even so, automated diagnostic systems offer high research potential owing to their ability to identify and address faults quickly. Their integration and maintenance costs are medium to high, but they substantially improve the system's ability to respond to faults. Their impact on power quality is very high, as automated diagnostics help maintain continuous operation and prevent larger system failures.

As summarized above, AI-driven fault diagnosis and recovery technologies are central to enhancing the resilience, efficiency, and quality of subway power systems. Each of the technologies listed in Table 18—machine learning algorithms, deep learning models, AI-based reconfiguration, predictive analytics, fault pattern recognition, and automated diagnostic systems—contributes uniquely to fault management and network recovery. While they all present some challenges related to data accuracy, integration complexity, and computational demands, their future prospects remain very high.

Machine learning and deep learning technologies, in particular, have the potential to revolutionize fault prediction and diagnosis, providing advanced solutions to issues related to system reliability. AI-based reconfiguration and predictive analytics are poised to improve recovery times and power system optimization, ensuring that power disruptions are minimized. Fault pattern recognition and automated diagnostic systems are essential for improving fault detection and reducing downtime.

Overall, the continued development and integration of these technologies are crucial for the advancement of modern subway power networks. However, issues related to data synchronization, computational requirements, and system integration must be addressed to fully realize their potential. Taken together, the subsections of Section 5 provide a comprehensive overview of the practical applications of self-healing technologies in subway power systems, covering fault recovery, network reconfiguration, and the integration of advanced AI and MASs for improved system resilience and efficiency. Applying these techniques enhances the reliability and sustainability of urban transportation networks and helps maintain service even when faults occur.

6. Implications of AI Technologies for Future Subway Power Systems

In this section, we examine the wide-ranging implications of emerging AI technologies for the future development of subway power systems. Building on the previous discussions of self-healing architectures, multi-agent frameworks, and IEC 61850-based communication protocols, we focus here on how AI, particularly machine learning (ML), deep learning (DL), and reinforcement learning (RL), can transform operational strategies, maintenance paradigms, data governance, and broader socio-economic outcomes in the context of subway power supply. Specifically, we present five key subtopics that capture the multifaceted opportunities and challenges posed by AI in this domain.

First, Section 6.1 explores AI-enhanced fault diagnosis and prognostics, showing how advanced data analytics can enable predictive maintenance and near-real-time failure detection. We present a distinct approach to predictive maintenance in which AI methods combine historical data with real-time operational signals to improve failure prediction accuracy in complex subway power systems. This extends traditional techniques with newer AI-based predictive models that integrate physics-driven insights and data fusion for improved fault detection and prognosis, as seen in recent applications in smart grids and industrial automation. Next, Section 6.2 investigates reinforcement learning and decision-making in self-healing processes, illustrating how adaptive algorithms can optimize power restoration and system resilience. Section 6.3 addresses the integration of AI and MASs under IEC 61850, highlighting how standardized communication protocols can support distributed intelligence and collective fault management. Moving beyond purely technical dimensions, Section 6.4 focuses on cybersecurity, privacy, and data management for AI-driven subway power systems, considering how the influx of real-time data demands new governance frameworks. Finally, Section 6.5 discusses next-generation operational strategies and socio-economic implications, examining how AI-enabled subway power systems may reshape workforce development, economic modeling, and stakeholder engagement.

These five sections provide a comprehensive perspective on how AI innovations are expected to evolve and influence subway power systems. Through the inclusion of novel approaches, particularly in AI-driven decision-making processes for self-healing, we introduce emerging techniques and applications that have not been extensively explored in prior studies. These techniques, including hybrid machine learning methods and real-time adaptive models, are at the forefront of advancing subway power infrastructure, offering new contributions to the field and distinguishing this review from earlier works.

6.1. AI-Enhanced Fault Diagnosis and Prognostics

One of the most significant applications of artificial intelligence in subway power systems is the enhancement of fault diagnosis and prognostics. Traditional fault detection methods—often reliant on predefined thresholds and manual inspections—are increasingly inadequate in the face of complex, dynamic operating conditions. AI-based approaches, by contrast, can leverage advanced data analytics to identify subtle patterns in real-time signals, historical maintenance logs, and contextual environmental data. The implications for subway power systems are profound: improved detection speed, greater accuracy, reduced downtime, and the ability to predict failures before they occur.

AI-based fault diagnosis now incorporates advanced methodologies that allow dynamic analysis of both sensor data and environmental context. The shift from traditional rule-based models to AI-enhanced diagnostics involves fusing deep learning and machine learning techniques with domain-specific knowledge. Such hybrid AI-physics models can identify fault signatures that might otherwise go undetected by conventional methods. Recent research has applied these hybrid models successfully to predictive maintenance in industrial and energy systems, suggesting that they could likewise enhance the robustness of subway power systems.

6.1.1. Transition from Reactive to Predictive Maintenance

AI-empowered strategies for fault diagnosis enable a shift from reactive to predictive maintenance paradigms. Historically, failures in power components such as transformers, switchgear, and traction substations could lead to severe operational disruptions, significant repair costs, and compromised passenger safety. Predictive maintenance uses AI to process multi-source data (e.g., sensor readings, operational logs, images from thermal cameras, vibrations, and acoustic signals) and identify early warning signs.

(1): Machine Learning Models: Supervised learning algorithms (e.g., Support Vector Machines and Random Forests) can detect anomalies in high-dimensional data, building predictive models that correlate subtle parameter shifts—such as partial discharges or fluctuating voltage profiles—to impending failures.
(2): Deep Learning Architectures: Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) offer enhanced feature extraction and sequence analysis, making it possible to detect complex non-linear relationships in sensor streams, identify irregularities in real time, and precisely localize potential faults within specific subsystems.
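As a minimal, hedged illustration of the supervised approach in item (1), the Python sketch below trains a decision stump (a one-level decision tree) by exhaustive threshold search. A random forest would instead train many deeper trees on bootstrap samples and random feature subsets and combine their votes; the stump shown here is only the basic building block. The two features (partial-discharge magnitude in pC and voltage deviation in %), the labels, and the data are hypothetical.

```python
def predict_stump(stump, row):
    """Return 1 (failure precursor) or 0 (normal) for one feature vector."""
    feature, threshold, polarity = stump
    above = row[feature] > threshold
    return int(above) if polarity == 1 else int(not above)


def fit_stump(X, y):
    """Find the (feature, threshold, polarity) split with the fewest training errors."""
    best_errors, best_stump = None, None
    for feature in range(len(X[0])):
        values = sorted({row[feature] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                errors = sum(predict_stump(stump, row) != label for row, label in zip(X, y))
                if best_errors is None or errors < best_errors:
                    best_errors, best_stump = errors, stump
    return best_stump


# Hypothetical samples: (partial discharge in pC, voltage deviation in %).
X = [(5, 1.0), (8, 0.5), (12, 2.0), (40, 1.5), (55, 0.8), (60, 2.5)]
y = [0, 0, 0, 1, 1, 1]  # 1 = the component later developed a fault
stump = fit_stump(X, y)
```

On this toy data the stump splits on partial-discharge magnitude, whereas voltage deviation alone cannot separate the two classes.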

By incorporating these AI methods, subway operators can anticipate maintenance needs and schedule repairs to minimize disruption. As a result, subway power systems can achieve higher reliability while also reducing overall operational expenditures.

6.1.2. Real-Time Monitoring and Edge Analytics

Real-time monitoring is essential for the effective operation of subway power systems, given their mission-critical nature. Deploying AI-driven edge analytics in local controllers allows data to be processed close to where they are generated, avoiding the latency of round trips to centralized cloud computing [133]. This decentralized approach, similar to advancements in the industrial IoT sector, can substantially improve fault response times by enabling local processing of sensor data [134]. For instance, Refs. [133,134] emphasize the integration of real-time monitoring and AI-based edge analytics to enhance the performance and reliability of critical infrastructure, including subway power systems. Running deep learning models for fault detection on edge devices helps maintain operation even during communication breakdowns, preserving system resilience in highly dynamic environments. This setup reduces system vulnerabilities and shortens response times, helping critical components remain functional despite potential disruptions.

In practice, an edge device near a substation might run a compact deep learning model trained to recognize specific fault signatures (e.g., harmonic distortions indicative of insulation breakdown). Upon detecting an anomaly, it could initiate an automated diagnostic routine or communicate with a higher-level control center for further analysis. Such distributed intelligence aligns closely with the principles of self-healing networks and multi-agent systems, wherein localized decisions can contain or mitigate faults quickly.
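The edge scenario above can be sketched in Python under simplifying assumptions. Instead of a trained deep learning model, the sketch uses a hand-crafted feature, total harmonic distortion (THD), computed from single DFT bins over a window of exactly one fundamental cycle, and raises an alarm when THD exceeds a fixed limit. The 5% limit, the 13-harmonic cutoff, and the alarm format are illustrative assumptions rather than values from the reviewed literature; in a deployment, the alarm would trigger a local diagnostic routine or be forwarded to the control center.

```python
import cmath
import math


def harmonic_amplitudes(samples, max_harmonic):
    """Peak amplitude of harmonics 1..max_harmonic.

    Assumes `samples` covers exactly one fundamental cycle, so DFT bin h
    corresponds to the h-th harmonic (max_harmonic must stay below len/2).
    """
    n = len(samples)
    amplitudes = []
    for h in range(1, max_harmonic + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * h * k / n) for k, x in enumerate(samples))
        amplitudes.append(2 * abs(acc) / n)
    return amplitudes


def thd(samples, max_harmonic=13):
    """Total harmonic distortion relative to the fundamental."""
    amps = harmonic_amplitudes(samples, max_harmonic)
    return math.sqrt(sum(a * a for a in amps[1:])) / amps[0]


def edge_check(samples, thd_limit=0.05):
    """Return an alarm record if THD exceeds the (hypothetical) limit, else None."""
    value = thd(samples)
    if value > thd_limit:
        return {"event": "harmonic_distortion", "thd": round(value, 4)}
    return None
```

For a sampled sine wave with an added 5th harmonic at 8% of the fundamental amplitude, `thd` returns approximately 0.08 and `edge_check` raises an alarm; a clean sine returns `None`.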

6.1.3. Challenges and Future Directions

While AI-enhanced fault diagnosis offers considerable potential, several challenges remain. One key obstacle is the quality and integrity of data, which are essential for generating accurate predictions. Recent advancements in AI, such as the application of generative models for data augmentation, have been proposed to address the challenge of limited datasets, especially for rare fault scenarios. These techniques, which have proven successful in industries such as healthcare and autonomous vehicles, could greatly enhance the robustness of AI models used for fault prediction in subway power systems. Furthermore, the interpretability of deep learning models continues to be a concern. Research is actively addressing this issue by developing explainable AI techniques that provide insight into the decision-making process, fostering operator trust and ensuring model transparency. Looking ahead, addressing these challenges will involve the following:

(1): Data Fusion: Combining structured data (e.g., SCADA logs) with unstructured data (e.g., images and audio signals) for more holistic fault models.
(2): Transfer Learning: Leveraging knowledge from related domains (e.g., electrified railways) to build robust classifiers even when local fault data are scarce.
(3): Explainable AI: Integrating interpretable model architectures or post hoc interpretability frameworks to ensure that operators understand and trust AI-driven fault detection decisions [135,136].
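Item (3) can be illustrated with permutation feature importance, one widely used model-agnostic post hoc technique (not necessarily the method used in [135,136]). The Python sketch below shuffles one input feature at a time and measures how much the detector's accuracy drops, so a feature the model ignores scores zero. The detector, its 30 K threshold, and the data are hypothetical.

```python
import random


def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean accuracy drop when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            X_perm = [row[:j] + (value,) + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical detector: flags a fault when the winding temperature rise exceeds
# 30 K and ignores the second feature (relative humidity) entirely.
def detector(row):
    return int(row[0] > 30)


X = [(10, 0.3), (15, 0.8), (20, 0.5), (25, 0.9), (35, 0.4), (40, 0.7), (45, 0.2), (50, 0.6)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
importances = permutation_importance(detector, X, y)
```

An operator reading the result would see that the alarm depends on temperature rise and not on humidity, which is the kind of insight that supports trust in the detector.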

Table 19 highlights various fault diagnosis and prognostics scenarios in subway power systems, mapping each to its implementation stage, unique features, and prospective impact on overall system performance. The table indicates that AI-based methods can yield significant improvements in fault isolation speed and predictive maintenance accuracy, while challenges persist in addressing data scarcity and system complexity. Bridging these gaps requires both domain-oriented strategies, such as hybrid physics-data approaches, and advanced AI methods such as transfer learning, federated learning, and explainable AI. Overall, AI-enhanced fault diagnosis and prognostics offer a transformative path forward, enabling subway power systems to move beyond reactive maintenance toward a resilient, future-ready operational model. As data quality, model interpretability, and real-time decision-making capabilities improve, these techniques are likely to become integral to enhancing reliability, lowering operational costs, and ensuring passenger safety.

6.2. Reinforcement Learning and Decision-Making in Self-Healing Processes

Reinforcement learning (RL) presents a compelling framework for dynamic decision-making in self-healing subway power systems. Unlike traditional control methods that rely on static, rule-based logic, RL algorithms learn optimal policies through continuous interactions with the environment. This approach holds immense promise for managing complex power distribution networks, where real-time reconfiguration, load balancing, and fault recovery must be orchestrated under varying operational constraints.

Recent advancements in deep reinforcement learning (DRL), particularly in the form of multi-agent RL frameworks, have demonstrated significant promise for complex infrastructure management. These frameworks enable distributed decision-making among multiple agents operating in parallel, each responsible for managing specific subsystems. This approach is well suited for subway power systems, where different agents can independently optimize substation operations, power rerouting, and fault recovery while ensuring overall system stability. These developments represent a significant evolution from traditional RL approaches and align closely with emerging smart grid technologies.

6.2.1. Core Principles of Reinforcement Learning

RL algorithms are typically defined by the concepts of agents, environments, states, actions, and rewards. In a subway power system context, they are as follows:

(1): Agent: The AI controller (or controllers) responsible for adjusting switch positions, redirecting power flows, or prioritizing fault isolation.
(2): Environment: The subway power system, inclusive of its multi-bus topology, traction substations, feeders, and protective devices.
(3): State: The real-time status of the system, including voltage levels, load demands, equipment health indicators, and fault locations.
(4): Action: Any operational command the RL agent can perform, such as opening or closing circuit breakers, adjusting converter setpoints, or initiating fault isolation protocols.
(5): Reward: A numerical signal representing the quality of each action, often linked to performance metrics like minimized outage duration, voltage stability, or reduced energy losses.

Through iterative exploration and exploitation, RL algorithms can discover dynamic strategies for reconfiguring the power system in response to faults or varying load conditions, thereby supporting self-healing functionality.
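The mapping above can be made concrete with tabular Q-learning on a deliberately tiny, hypothetical restoration task. The Python sketch below has three states (fault present, fault isolated, load restored), two actions (isolate the fault or close a tie switch), and rewards that penalize each switching step and heavily penalize closing onto an unisolated fault. Epsilon-greedy action selection provides the exploration and exploitation described above. All states, rewards, and hyperparameters are illustrative assumptions; a real subway network would need a far larger state space and function approximation, as in the Deep Q-Networks discussed in Section 6.2.2.

```python
import random

# Hypothetical toy task: state -> action -> (next_state, reward).
# "restored" is terminal. Closing the tie switch before isolating the fault
# re-energizes the faulted section, so protection trips again (large penalty).
TRANSITIONS = {
    "fault": {"isolate": ("isolated", -1.0), "close_tie": ("fault", -10.0)},
    "isolated": {"isolate": ("isolated", -1.0), "close_tie": ("restored", 10.0)},
}
ACTIONS = ["isolate", "close_tie"]


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=20, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in TRANSITIONS for a in ACTIONS}
    for _ in range(episodes):
        state = "fault"
        for _ in range(max_steps):
            # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward = TRANSITIONS[state][action]
            # The terminal state contributes no future value.
            future = 0.0 if next_state == "restored" else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
            if state == "restored":
                break
    return q


def greedy_policy(q):
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in TRANSITIONS}
```

After training, the greedy policy isolates the fault first and only then closes the tie switch, which is the safe restoration sequence encoded by the rewards.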

6.2.2. Adaptive Self-Healing Under Uncertainty

One of the critical challenges in self-healing networks is uncertainty, arising both from partial system observability and from rapidly changing demand patterns. RL is well suited to these conditions, as it can learn to balance exploration (trying new configurations) with exploitation (using actions known to succeed).

(1): Deep Q-Networks (DQN): These incorporate neural networks to approximate the action-value function, enabling RL agents to handle high-dimensional state spaces [137,138].
(2): Policy Gradient Methods: Algorithms such as Proximal Policy Optimization (PPO) or Advantage Actor-Critic (A2C) optimize the policy directly and can operate over continuous action spaces, facilitating more nuanced actions such as incremental power flow adjustments.
(3): Model-Based RL: By integrating predictive models of system behavior (e.g., equations describing power flows in the network), RL agents can plan ahead, simulating the potential outcomes of different actions before implementation.

This adaptability allows RL-driven self-healing systems to isolate faults more rapidly and re-route power with minimal operator intervention, enhancing overall reliability and resilience.
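For reference, the standard training objectives behind items (1) and (2) can be written as follows. In DQN, the parameters θ are fitted to a bootstrapped target computed with a periodically copied target network θ⁻ over transitions (s, a, r, s′) drawn from a replay buffer D, with discount factor γ; for terminal transitions the max term is dropped, and the loss is minimized. In PPO, the clipped surrogate objective, which is maximized, limits how far the updated policy π_θ can move from the previous policy π_θold in a single update; ρ_t(θ) is the probability ratio (written r_t in the original PPO formulation and renamed here to avoid a clash with the reward r), Â_t is an advantage estimate, and ε is the clipping parameter. These are the generic forms of the algorithms, not formulations specific to subway power systems.

```latex
L_{\mathrm{DQN}}(\theta) = \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta)\right)^{2}\right]

\rho_t(\theta) = \frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}, \qquad
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(\rho_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(\rho_t(\theta), 1 - \epsilon, 1 + \epsilon\right)\hat{A}_t\right)\right]
```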

6.2.3. Multi-Agent Coordination

In the context of subway power systems, different functional areas—traction substations, feeders, and signaling equipment—can be managed by specialized RL agents. A multi-agent RL approach allows these subsystems to coordinate actions effectively. The integration of cooperative multi-agent RL models ensures that decisions made by individual agents are aligned with broader system-wide goals, such as minimizing power outages and maximizing operational efficiency. Recent progress in cooperative RL, particularly in optimizing energy distribution across urban power networks, shows promise for enhancing coordination among subway power system agents, ensuring both fault isolation and optimal power restoration.

6.2.4. Challenges in RL-Based Self-Healing

Despite its promise, RL faces notable hurdles in practical railway power scenarios:

(1): Safety Constraints: Subways operate under strict safety regulations, so RL actions must never compromise passenger safety. Techniques like safe RL or reward shaping can incorporate safety margins.
(2): Scalability: Large systems with dozens of substations and thousands of sensors result in vast state-action spaces, necessitating advanced function approximation and distributed training architectures.
(3): Learning Speed: RL algorithms may require numerous interactions or simulated “episodes” to learn effective policies. Building high-fidelity digital twins for training is thus essential.
(4): Generalization: Policies learned under certain load patterns or fault conditions may not generalize well to unseen scenarios, underscoring the need for robust domain adaptation and online learning strategies.
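One simple form of the safe-RL idea in item (1) is a shield, or action mask: before the agent acts, a rule-based check removes any candidate action that would violate a hard constraint, and the agent chooses the highest-valued action among those that remain. The Python sketch below assumes two hypothetical rules (never re-energize a section flagged as faulted, and never exceed the feeder rating) and uses made-up Q-values. It illustrates action masking only and is not a certified protection scheme.

```python
def is_safe(state, action):
    """Hypothetical hard constraints checked before any switching action."""
    kind, section, added_kw = action
    if kind != "close":
        return True  # holding the current configuration is always allowed here
    if section in state["faulted"]:
        return False  # never re-energize a section flagged as faulted
    return state["load_kw"] + added_kw <= state["rating_kw"]


def shielded_greedy(state, q_values, fallback):
    """Pick the highest-valued action that passes the safety check."""
    allowed = [action for action in q_values if is_safe(state, action)]
    if not allowed:
        return fallback
    return max(allowed, key=q_values.get)


state = {"faulted": {"S2"}, "load_kw": 3000.0, "rating_kw": 4000.0}
q_values = {
    ("close", "S2", 500.0): 9.0,   # highest value, but S2 is faulted
    ("close", "S3", 1500.0): 7.0,  # would overload the feeder (4500 > 4000 kW)
    ("close", "S4", 800.0): 5.0,
    ("hold", None, 0.0): 0.0,
}
HOLD = ("hold", None, 0.0)
choice = shielded_greedy(state, q_values, fallback=HOLD)
```

Here the shield discards the two highest-valued actions and the agent closes onto section S4; because the check sits outside the learned policy, it applies regardless of how well the agent has been trained.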

Table 20 outlines the primary RL approaches, benefits, and challenges for self-healing processes in subway power systems. While feeder reconfiguration and load balancing show high promise, scaling these solutions requires addressing safety, computational complexity, and real-time adaptability. Cooperative multi-agent RL emerges as a powerful paradigm, but it demands seamless data exchange and robust coordination protocols. Each scenario also highlights the necessity of sophisticated simulation tools (digital twins) for training RL agents under realistic conditions.

In conclusion, RL offers a dynamic, adaptive framework for enhancing self-healing capabilities in subway power systems. By learning from experience, RL algorithms can optimize fault isolation and power restoration in uncertain and evolving conditions. Although technical challenges—particularly in the realms of safety, scalability, and domain adaptation—remain, continued research and pilot implementations will refine RL-driven approaches, ushering in a new era of intelligent, resilient, and self-reconfiguring subway power infrastructures.

6.3. Integration of AI and Multi-Agent Systems Under IEC 61850

The third focal area explores how AI methods, when paired with MASs and standardized by IEC 61850 protocols, can unlock advanced self-healing, interoperability, and collaborative decision-making in future subway power systems. While MAS architectures enable the distribution of tasks and knowledge among specialized agents, IEC 61850 provides a common language for data exchange. AI techniques further enrich these frameworks by injecting predictive capabilities, adaptive optimization, and real-time learning.

Recent advancements in IEC 61850 extensions have focused on integrating AI-driven systems to improve real-time fault management, such as through enhanced predictive maintenance capabilities and adaptive load balancing. Furthermore, the application of AI-enhanced decision-making algorithms within these MAS frameworks ensures that fault recovery processes are both faster and more reliable. The increasing integration of AI and MASs with IEC 61850 is driving new innovations in the development of self-healing networks that rely on adaptive, data-driven approaches to enhance system reliability.

6.3.1. Roles of MASs and IEC 61850 in Subway Power Systems

MASs are collections of semi-autonomous entities—“agents”—capable of independent decision-making. In a subway environment, each agent might represent a specific subsystem or physical asset (e.g., a traction transformer, a protection relay, or a signaling interface). The MAS approach decentralizes control, enhancing flexibility, scalability, and fault tolerance.

IEC 61850, originally developed for substation automation in utility power grids, offers a rich object-oriented data model and standardized communication services (GOOSE, MMS, etc.) [139,140]. As subway power systems become more sophisticated, adopting IEC 61850 ensures consistent naming conventions, standardized data structures, and event-driven messaging. Consequently, agents in an MAS architecture can seamlessly exchange data, coordinate actions, and maintain a shared situational awareness.
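To show what consistent naming means in practice, the Python sketch below splits an IEC 61850 object reference of the form LogicalDevice/LogicalNode.DataObject.DataAttribute, for example the status value stVal of the position data object Pos in a circuit-breaker logical node of class XCBR. It is a simplified illustration rather than a conformant parser: it does not handle functional constraints, sub-data objects, or array indices separately, it recognizes only a small subset of logical-node classes, and the device names are hypothetical.

```python
import re
from typing import NamedTuple

# Small illustrative subset of IEC 61850-7-4 logical-node classes.
LN_CLASSES = {
    "XCBR": "circuit breaker",
    "XSWI": "circuit switch",
    "PTOC": "time overcurrent protection",
    "MMXU": "measurement",
}

# LD/LN.DO.DA, where the LN name is an optional prefix, a 4-letter class,
# and an instance number (e.g. Q0XCBR1).
REFERENCE = re.compile(
    r"^(?P<ld>[A-Za-z0-9_]+)/(?P<prefix>[A-Za-z0-9_]*?)(?P<lnclass>[A-Z]{4})"
    r"(?P<inst>\d*)\.(?P<do>[A-Za-z0-9]+)\.(?P<da>[A-Za-z0-9.]+)$"
)


class ObjectReference(NamedTuple):
    logical_device: str
    ln_prefix: str
    ln_class: str
    ln_instance: str
    data_object: str
    data_attribute: str


def parse_reference(ref):
    match = REFERENCE.match(ref)
    if match is None:
        raise ValueError(f"not an LD/LN.DO.DA reference: {ref!r}")
    return ObjectReference(*match.group("ld", "prefix", "lnclass", "inst", "do", "da"))


def describe(ref):
    r = parse_reference(ref)
    kind = LN_CLASSES.get(r.ln_class, "unrecognized logical node")
    return f"{r.logical_device}: {kind} #{r.ln_instance} {r.data_object}.{r.data_attribute}"
```

Because every vendor's device exposes breaker position under the same class and data-object names, an agent can interpret `describe("SUB1PROT/Q0XCBR1.Pos.stVal")` without device-specific mapping tables.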

6.3.2. AI-Driven Coordination and Decision-Making

AI plays a pivotal role in orchestrating multi-agent cooperation under IEC 61850. By analyzing system-wide telemetry and status signals, AI algorithms can identify potential conflicts or synergies among agents. For instance, a traction substation agent anticipating overload conditions might request a neighboring substation agent to reroute power flows. AI-driven coordination ensures that these negotiations happen quickly and optimally, considering constraints like safety margins, service priorities, or economic dispatch rules.
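The overload example above can be sketched as a simple contract-net-style exchange in Python: the overloaded agent announces its excess load, neighboring agents bid their spare headroom, and the largest bids are accepted first. This greedy allocation stands in for the AI-driven negotiation described in the text; the agent class, the 10% safety margin, and the loads are hypothetical, and network power-flow and voltage constraints are ignored.

```python
from dataclasses import dataclass


@dataclass
class SubstationAgent:
    name: str
    capacity_kw: float
    load_kw: float

    def headroom(self, margin):
        """Spare capacity after reserving a safety margin (fraction of rating)."""
        return max(0.0, self.capacity_kw * (1 - margin) - self.load_kw)


def negotiate_transfer(requester, neighbors, margin=0.1):
    """Shift the requester's excess load to the neighbors bidding the most headroom.

    Returns ({neighbor name: transferred kW}, unserved excess kW).
    """
    excess = requester.load_kw - requester.capacity_kw * (1 - margin)
    awards = {}
    # Each neighbor "bids" its headroom; the largest bids are accepted first.
    for agent in sorted(neighbors, key=lambda a: a.headroom(margin), reverse=True):
        if excess <= 0:
            break
        share = min(excess, agent.headroom(margin))
        if share > 0:
            agent.load_kw += share
            requester.load_kw -= share
            awards[agent.name] = share
            excess -= share
    return awards, max(0.0, excess)
```

For example, a 5000 kW substation loaded to its rating sheds 500 kW: 300 kW to a neighbor with 300 kW of headroom and 200 kW to one with 200 kW. Any unserved remainder is returned so that a supervisory layer could escalate it.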

Moreover, advanced AI algorithms—such as Graph Neural Networks [141] or complex optimization solvers—can leverage the structured data from IEC 61850 to model the relationships among power system components. Agents can then perform local computations while concurrently feeding results into a global optimization layer, leading to emergent, system-wide intelligence.
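The message-passing idea behind Graph Neural Networks can be sketched in a few lines of Python. In the hypothetical example below, substations are graph nodes, edges follow the feeder topology, and each node carries a small feature vector (per-unit loading and voltage deviation). One layer averages each node's features with those of its neighbors and applies a weight matrix followed by a ReLU. The weights are fixed, made-up numbers; a real GNN would learn them from data and typically stack several layers.

```python
def mean_aggregate(features, adjacency):
    """Average each node's feature vector with its neighbors' (self-loop included)."""
    out = {}
    for node, neighbors in adjacency.items():
        group = [node] + list(neighbors)
        dim = len(features[node])
        out[node] = [sum(features[u][k] for u in group) / len(group) for k in range(dim)]
    return out


def gnn_layer(features, adjacency, weight):
    """One message-passing layer: aggregate, apply a linear map, then ReLU."""
    aggregated = mean_aggregate(features, adjacency)
    return {
        node: [max(0.0, sum(w * x for w, x in zip(row, vec))) for row in weight]
        for node, vec in aggregated.items()
    }


# Hypothetical three-substation line S1 - S2 - S3.
adjacency = {"S1": ["S2"], "S2": ["S1", "S3"], "S3": ["S2"]}
features = {"S1": [0.9, 0.02], "S2": [0.6, 0.0], "S3": [0.3, -0.02]}
weight = [[1.0, 0.5], [-1.0, 0.0]]  # made-up, untrained weights
out = gnn_layer(features, adjacency, weight)
```

After one layer, each substation's representation already reflects its neighbors' loading, which is how topology enters the model.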

6.3.3. Interoperability and Standardization Benefits

A key advantage of integrating AI with MASs under IEC 61850 is interoperability. Legacy subway systems often use proprietary communication protocols, leading to device incompatibility and vendor lock-in. IEC 61850 breaks down these barriers, enabling cross-vendor and cross-application integration. From the perspective of AI, having a standardized data schema enhances the portability and scalability of models, as consistent data structures streamline data ingestion and model training processes.

As a result, operators can incorporate new AI-driven applications—like advanced fault analytics or dynamic reconfiguration—without overhauling existing hardware. The MAS approach further compartmentalizes tasks, so if a new AI module is added, only the relevant agents need updating, preserving overall system stability.

6.3.4. Potential Obstacles and Evolution

The integration of AI and MASs under IEC 61850 faces several potential obstacles:

(1): Communication Latency: While IEC 61850 supports high-speed messaging, real-time AI inference may still demand edge computing infrastructure to avoid round-trip delays.
(2): Cybersecurity: The standard’s emphasis on connectivity raises cybersecurity concerns. AI modules and MAS agents could become targets of sophisticated cyberattacks, necessitating robust encryption, authentication, and intrusion detection schemes.
(3): Complexity of Agent Interactions: As the number of agents grows, orchestrating their interactions can become unwieldy. AI-based supervision layers must handle negotiation protocols, conflict resolution, and consistency checks.
(4): Operational Validation: Formal verification and testing of AI-driven MAS solutions remain challenging, given the high-stakes nature of subway operations.

In the future, ongoing standardization efforts for advanced functionalities—like the IEC 61850 extensions for distributed energy resources—could incorporate guidelines specific to AI integration. Initiatives aimed at real-time digital twins and 5G communication might also enhance the synergy between MASs and IEC 61850, broadening the horizon for intelligent subway power systems. Based on this, Table 21 enumerates key dimensions of integrating AI, MASs, and IEC 61850. Each row highlights how agent-based architectures benefit from standardized communication and AI-driven analytics, describing both advantages (e.g., swift fault isolation and improved energy routing) and obstacles (e.g., vendor constraints and cybersecurity risks). Notably, the long-term potential for each dimension tends to be high or very high, underscoring the transformative power of this integration.

In closing, merging AI with MASs under the IEC 61850 umbrella paves the way for a more coordinated, interoperable, and adaptive subway power ecosystem. By distributing intelligence across multiple agents, harnessing standard communication protocols, and leveraging AI for data-driven decisions, subway systems can evolve toward more agile and resilient operations. Future developments will likely emphasize refined cybersecurity policies, real-time digital twins, and advanced scheduling algorithms, collectively forming the backbone of next-generation intelligent rail networks.

6.4. Cybersecurity, Privacy, and Data Management for AI-Driven Subway Power Systems

With the integration of AI, IoT devices, and multi-agent architectures, the volume and sensitivity of data in subway power systems are growing at an unprecedented rate. While these data streams fuel advanced analytics and self-healing capabilities, they also introduce heightened risks related to cybersecurity, data privacy, and governance. A robust data management framework is thus imperative to ensure reliable operations and maintain public trust.

To safeguard AI-driven systems, technologies such as blockchain for secure data logging and federated learning for data privacy are being explored. These technologies, already applied in fields such as digital finance and healthcare, support secure data sharing and model training while preserving data privacy. Blockchain provides a transparent, append-only ledger in which any modification of recorded operational data is detectable, which helps preserve the integrity of those records during and after cyberattacks. Federated learning, in turn, trains AI models locally so that raw data need not leave their source, reducing exposure risks and enabling privacy-preserving analytics, which is critical for managing sensitive infrastructure data.
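The core mechanism behind a blockchain ledger's tamper resistance, hash-linking of records, can be illustrated without a full blockchain. The Python sketch below chains log records with SHA-256 so that altering any stored record causes verification to fail. A real blockchain would add replication across nodes and a consensus protocol, both of which are omitted here. The record fields (timestamp, breaker identifier, state) are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def entry_hash(prev_hash, record):
    # Canonical JSON (sorted keys) so identical content always hashes identically.
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(chain, record):
    """Append a record linked to the hash of the previous entry."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify_chain(chain):
    """Recompute every link; editing or reordering entries, or deleting one
    from the middle, breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

Truncating the newest entries is not detected by the chain alone; detecting that requires storing the latest hash somewhere independent, which is one role played by replication in a real blockchain.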

6.4.1. Cyber Threat Landscape

AI-driven subway power systems, connected through IEC 61850 or other communication protocols, present an attractive target for cybercriminals or malicious state actors. Potential attacks include the following:

(1): Data Poisoning: Manipulating training datasets to degrade AI model performance or trigger erroneous system decisions.
(2): Ransomware: Encrypting critical operational data and demanding payment to restore access, thus threatening system continuity.
(3): Denial of Service (DoS): Flooding communication channels with spurious traffic, hindering real-time control signals.
(4): Sensor Spoofing: Feeding corrupted sensor data into AI models, leading to incorrect fault diagnoses or false alarms.

In the context of a critical infrastructure like a subway system, even minor disruptions can have severe social, economic, and safety repercussions. Consequently, cybersecurity must become an integral component of AI system design, not an afterthought.

6.4.2. Data Privacy and Ethics

Beyond technical vulnerabilities, subway operators and city authorities must navigate privacy concerns. AI systems might aggregate detailed operational data, video feeds, passenger flow metrics, or location patterns. While these data are valuable for predictive maintenance or advanced analytics, storing and analyzing them also raises ethical questions. For instance, if camera feeds are used to measure passenger loads to predict electrical demand, they might inadvertently collect personally identifiable information. Ensuring compliance with data protection laws (such as the General Data Protection Regulation, GDPR, in the European context) is critical for maintaining public trust.

Operators should consider adopting privacy-preserving AI techniques, including differential privacy or secure multiparty computation, which allow for collaborative model training or data analysis without exposing sensitive details. Implementing role-based data access controls and robust de-identification procedures further reduces the risk of misuse or accidental disclosure.
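As a minimal sketch of differential privacy, the Python code below applies the Laplace mechanism to a count, for example a hypothetical hourly passenger count derived from camera feeds and shared with a demand forecaster. Because adding or removing one passenger changes the count by at most one, the sensitivity is 1, and Laplace noise with scale 1/ε gives ε-differential privacy for a single release. Each additional release of the same data spends additional privacy budget, so the number of queries must also be controlled in practice.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent Exp(1) draws is Laplace(0, 1); rescale it.
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count with epsilon-differential privacy via the Laplace mechanism."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)
```

Averaging many noisy releases of the same count recovers the true value, which is exactly why every release must be charged against the privacy budget.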

6.4.3. Comprehensive Data Governance

Data governance encompasses the policies, processes, and technical measures necessary for responsible data handling. A holistic framework would include the following:

(1): Data Ownership: Defining clear ownership structures for sensor data, operational logs, and passenger metrics, potentially involving multiple stakeholders (public transport authorities, private operators, and technology vendors).
(2): Data Lifecycle Management: Establishing guidelines for data collection, storage, access, retention, and deletion. Ensuring that data archiving practices meet both regulatory requirements and system operational needs.
(3): Metadata and Standardization: Maintaining standardized metadata to enhance data discoverability and interoperability, vital for multi-agent systems reliant on consistent data schemas.
(4): Quality Assurance: Integrating data validation protocols and anomaly detection to safeguard against corrupted or incomplete data inputs that could compromise AI-driven decisions.
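For item (4), a minimal validation layer might apply plausibility checks before readings reach an AI model. The Python sketch below flags missing values, values outside a physical range, and implausibly large jumps between consecutive accepted readings. The 1000 to 1800 V range and the 100 V step limit are hypothetical values for a 1500 V DC traction bus. Because genuine faults also produce abrupt changes, flagged readings should be routed for review rather than silently discarded.

```python
def validate_readings(readings, lo, hi, max_step):
    """Return (index, reason) flags for missing, out-of-range, or jumping readings."""
    flags = []
    last_good = None
    for i, value in enumerate(readings):
        if value is None:
            flags.append((i, "missing"))
            continue
        if not lo <= value <= hi:
            flags.append((i, "out_of_range"))
            continue
        if last_good is not None and abs(value - last_good) > max_step:
            flags.append((i, "implausible_step"))
            continue
        # Only accepted readings serve as the reference for the next step check.
        last_good = value
    return flags


# Hypothetical 1500 V DC traction-bus voltage samples (V).
readings = [1500, 1510, None, 1495, 2500, 1490, 1200, 1485]
flags = validate_readings(readings, lo=1000, hi=1800, max_step=100)
```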

6.4.4. Strategies for Resilience and Compliance

To fortify cybersecurity and privacy in an AI-driven environment, operators must adopt a layered security approach, including encryption, anomaly detection, intrusion detection systems (IDSs), and zero-trust architectures. Specific measures may involve the following:

(1): Secure AI Pipelines: Implementing code-signing, containerization, and version control to prevent tampering with ML models or inference services.
(2): Federated Learning: Training AI models locally on devices or substations and then aggregating only model parameters. This approach minimizes data movement and reduces exposure risks.
(3): Incident Response and Recovery: Developing well-rehearsed contingency plans that detail how to isolate compromised systems, restore operational data, and communicate effectively with stakeholders.
(4): Certification and Audits: Conducting regular third-party audits and penetration testing to validate the integrity of software components and to ensure ongoing compliance with evolving regulatory standards.
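The integrity check behind item (1) can be sketched as follows. Each model artifact is tagged together with its version label, and the tag is verified before the artifact is loaded. For brevity, this sketch uses HMAC-SHA256 from the Python standard library as a symmetric stand-in. Production code-signing normally uses asymmetric signatures (for example, Ed25519 or RSA), so that inference hosts can verify artifacts without holding the signing key. The function names are hypothetical.

```python
import hashlib
import hmac


def sign_model(artifact: bytes, version: str, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the version label and model bytes."""
    # Binding the version label into the message prevents an old artifact
    # from being replayed under a newer version tag.
    message = version.encode("utf-8") + b"\x00" + artifact
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_model(artifact: bytes, version: str, key: bytes, tag: str) -> bool:
    """Check the tag in constant time before the model is loaded."""
    return hmac.compare_digest(sign_model(artifact, version, key), tag)
```

In a pipeline, the tag would be produced when the model is released under version control, and `verify_model` would run at load time. If verification fails, the service can fall back to the last verified model.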

Here, Table 22 provides a structured overview of cybersecurity and data governance dimensions in AI-driven subway power systems. Each row pinpoints the major threats, required security measures, privacy implications, and governance challenges. Notably, risk levels vary from medium to very high, underscoring the criticality of robust security frameworks. The table also emphasizes that future technological developments—from zero-trust network architectures to advanced privacy-preserving analytics—will heavily influence how effectively operators can safeguard these systems.

In sum, as AI and big data analytics become ubiquitous in subway power systems, cybersecurity, privacy, and data management must be addressed comprehensively. Achieving a holistic solution involves aligning technical safeguards, organizational protocols, and regulatory mandates. While the challenges are significant, so are the rewards: with well-managed data and secure AI pipelines, subway power networks can harness the full promise of advanced analytics without compromising safety or public trust.

6.5. Next-Generation Operational Strategies and Socio-Economic Implications

Beyond the immediate technical benefits of AI in monitoring, diagnosis, and self-healing, these technologies also herald broader changes in operational strategies and socio-economic landscapes. By reducing maintenance costs, improving reliability, and enabling flexible energy management, AI-driven subway power systems are poised to influence workforce development, system financing, and urban planning in meaningful ways.

AI technologies are expected to foster a shift in workforce skills towards data science, cybersecurity, and AI model interpretation, as traditional roles in power engineering evolve. This transition reflects broader trends seen in industries adopting AI for system optimization and automation. Moreover, the spread of AI-enhanced decision-making is likely to require new policy and regulatory frameworks, particularly for the management of public transportation infrastructure and energy resources. As cities adopt AI-driven subway systems, there is an opportunity to enhance collaboration between technology developers, transit authorities, and urban planners to create more sustainable and resilient urban environments.

6.5.1. Evolving Role of the Workforce

(1): Shift in Human Roles: As AI-driven analytics and semi-autonomous systems take on routine tasks, such as fault detection or reconfiguration, human roles are likely to shift toward oversight, strategic decision-making, and specialized technical functions.
(2): Upskilling and Reskilling: Engineers and technicians will need new skill sets, bridging power engineering with data science, cybersecurity, and AI model interpretation.
(3): Collaborative Decision-Making: Operators will collaborate more closely with AI recommendations, requiring training in human–machine interfaces and explainable AI solutions to bolster trust and accountability.
(4): New Roles: AI ethicists, data stewards, and cybersecurity specialists are likely to become essential staff for managing the complex socio-technical ecosystem.

In this context, it is imperative for subway authorities and vocational institutions to align educational programs with these new requirements, fostering a workforce that can manage and continually refine AI-enabled subway power systems.

6.5.2. Financial and Economic Dimensions

AI-driven efficiency gains—such as reduced unplanned downtime, lower energy losses, and improved asset utilization—can translate into substantial cost savings. These savings can be redirected toward infrastructure upgrades or used to reduce the fiscal burden on local governments. Additionally, more reliable services may boost ridership, generating indirect economic benefits for the city (e.g., increased retail sales near stations and improved labor mobility).

However, the initial investment required for AI tools, sensor networks, and data infrastructures can be considerable. This may prompt public–private partnerships or alternative financing models to share risks and rewards among multiple stakeholders. Over time, data generated by these systems might even be monetized (e.g., through analysis services for third parties), creating new revenue streams. While this can strengthen financial sustainability, it also necessitates robust governance frameworks to ensure data privacy and equitable value distribution.

6.5.3. Urban Planning and Sustainable Development

Intelligent subway power systems can play a critical role in shaping sustainable urban growth. By optimizing energy consumption and integrating with other urban infrastructure—such as electric vehicle (EV) charging networks or district heating—subway systems can support a more holistic approach to urban energy management. For example, advanced forecasting of passenger flows, combined with dynamic power distribution, can reduce peak loads on city grids. This synergy fosters better urban planning, reduced carbon emissions, and improved overall quality of life for residents.

Moreover, AI-driven fault detection and rapid incident response can bolster public perception of subways as safe and reliable. In turn, cities may be more inclined to expand rail transit networks, encouraging a modal shift away from cars and thereby reducing congestion and air pollution.

6.5.4. Policy and Regulatory Considerations

Governments and regulatory bodies will need to modernize policies to keep pace with AI’s rapid integration. Potential areas of focus include the following:

(1): Standardization: Expanding IEC 61850 or similar standards to cover next-generation AI requirements (e.g., real-time data streaming and advanced analytics models).
(2): Safety and Liability: Clarifying who is responsible when AI-driven systems make decisions that lead to incidents—particularly if they deviate from conventional operator guidelines.
(3): Incentive Structures: Providing tax breaks, grants, or other incentives for subway operators investing in advanced AI technologies, especially if these innovations yield public benefits such as reduced CO₂ emissions or improved accessibility.

A forward-looking regulatory environment will ensure that AI enhancements align with public interest objectives, balancing innovation with risk management and equity considerations. For example, Refs. [142,143] underscore the crucial intersection of AI regulation, public interest, and risk management in ensuring the responsible and equitable deployment of artificial intelligence. Concretely, Alex-Omiogbemi et al. (2024) [142] present a framework for enhancing regulatory compliance and mitigating risks in emerging markets through digital innovations, illustrating how policy frameworks can support the responsible use of AI. Furthermore, Wang and Wu (2024) [143] address the need to strike a balance between fostering AI-driven innovation and maintaining robust regulatory oversight, highlighting the ethical and social implications of generative AI technologies. Together, these works contribute to the growing discourse on ensuring that AI development serves societal well-being while managing associated risks effectively.

Based on this, Table 23 outlines key socio-economic and operational considerations that arise from deploying AI in subway power systems. Gains in reliability and cost-effectiveness can translate into broader economic and environmental benefits, but they also introduce transitions in labor markets, regulatory frameworks, and urban planning. Each dimension involves interplay between technical innovations and societal factors, underscoring the necessity for multidisciplinary collaboration.

Overall, AI-driven subway power systems have the potential to redefine how metropolitan regions plan, finance, and operate their mass transit infrastructures. Policymakers, industry leaders, and community stakeholders should collaborate to craft strategies that maximize public benefit, minimize negative externalities, and ensure equitable access to these transformative technologies. By doing so, cities worldwide can harness AI’s power to create cleaner, safer, and more efficient transportation systems that support sustainable growth for generations to come.

Through the five sections above, we have examined the comprehensive implications of AI technologies for future subway power systems. Beginning with the role of AI in fault diagnosis and prognostics, we moved to reinforcement learning applications in self-healing and then explored the synergy of AI, MASs, and IEC 61850 for interoperable infrastructures. Subsequently, we addressed critical issues in cybersecurity, privacy, and data management and finally evaluated the broader socio-economic and operational transformations likely to emerge. This holistic coverage underscores not only the technical possibilities of AI-driven subway power systems but also the regulatory, workforce, and societal shifts required to bring about a truly intelligent, secure, and sustainable urban rail future.

6.6. Potential Security Flaws in AI-Driven Subway Power Systems and Mitigation Strategies

As subway power systems increasingly adopt AI technologies and MASs for self-healing, fault detection, and optimization, they become more vulnerable to cybersecurity threats. The integration of real-time data streams, edge computing, and AI-driven analytics introduces significant risks related to data integrity, system privacy, and overall network security. This section discusses the potential security flaws associated with AI-enhanced subway power systems and provides comprehensive strategies to mitigate these risks, ensuring the robustness, resilience, and safety of urban rail infrastructures.

6.6.1. Key Security Threats in AI-Driven Subway Power Systems

AI-enabled subway power systems, especially those incorporating machine learning (ML), deep learning (DL), and reinforcement learning (RL), increase both the complexity and attack surface of the system. The following are the primary security vulnerabilities identified in such systems:

1. Data Poisoning Attacks

AI models rely heavily on large datasets for training and decision-making. Data poisoning occurs when attackers intentionally manipulate training datasets to degrade the performance of the AI model, leading to incorrect predictions and compromised decision-making processes.

2. Sensor Spoofing

AI-driven systems depend on real-time sensor data to make decisions. In sensor spoofing, malicious actors manipulate sensor outputs—such as voltage, current, and temperature readings—to create false information that can trigger inappropriate actions by the system, such as misidentifying faults or failing to isolate them properly.

3. Ransomware and Denial of Service (DoS) Attacks

Ransomware attacks target critical infrastructure systems, encrypting operational data and demanding payment for restoring access. In DoS attacks, attackers flood communication channels, preventing timely data exchange between system components, potentially leading to failures in real-time control and communication.

4. Unauthorized System Access

AI-enabled subway systems are susceptible to unauthorized access, especially where communication protocols such as IEC 61850 are deployed without authentication or encryption. Attackers could exploit such weaknesses to gain control over the system and compromise the decision-making of MAS- and AI-driven controllers.

5. Communication Latency and Spoofing

AI algorithms, especially those based on reinforcement learning, rely on real-time data for decision-making. Any latency or interruption in communication, whether through network failures or malicious interference, can degrade the performance of the self-healing system and delay fault isolation and recovery.

6.6.2. Mitigation Strategies for Enhancing Security

To safeguard AI-driven subway power systems, a multi-layered security approach is essential. The following mitigation strategies are proposed:

1. Advanced Encryption and Secure Communication Protocols

To prevent unauthorized access and data breaches, all communication between the system’s components, including sensors, agents, and controllers, should be authenticated and encrypted using well-established mechanisms, for example Transport Layer Security (TLS) with cipher suites based on the Advanced Encryption Standard (AES). IEC 61850 communication should be secured according to its companion security series, IEC 62351, to ensure data integrity and confidentiality.
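The client side of an encrypted channel can be sketched with Python's standard `ssl` module. The context below verifies the server certificate and hostname and refuses protocol versions older than TLS 1.2. TLS 1.2 and 1.3 cipher suites commonly use AES-GCM, which is how AES typically enters in practice. This is a generic sketch only. It does not implement the IEC 62351 profiles, and the optional client-certificate parameters are assumptions for mutual authentication between agents and controllers.

```python
import ssl
from typing import Optional


def make_tls_client_context(ca_file: Optional[str] = None,
                            cert_file: Optional[str] = None,
                            key_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client TLS context that verifies the server certificate
    and rejects protocol versions older than TLS 1.2."""
    # create_default_context enables certificate verification and hostname
    # checking; cafile=None falls back to the system trust store.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Optional client certificate so that the peer can authenticate this
    # agent as well (mutual TLS).
    if cert_file is not None:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

A socket is then protected with `ctx.wrap_socket(sock, server_hostname="...")`. With a private PKI, `ca_file` would point at the operator's own certificate authority rather than the public trust store.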

2. Intrusion Detection and Prevention Systems (IDSs/IPSs)

Implementing IDSs/IPSs can help detect and prevent malicious activities such as sensor spoofing, unauthorized access, and data tampering. These systems monitor the network for unusual activities and trigger alerts for potential security threats, allowing operators to take timely action.
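The sketch below shows a rule-based detector in its simplest form. It inspects only message metadata, namely the arrival time and the source identifier. It flags sources that are not on an allow-list and sources that exceed a message rate within a sliding window, which is a crude flooding indicator. Real IDSs for substation networks also inspect protocol content. The class name, thresholds and identifiers are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Set


class RuleBasedIDS:
    """Toy metadata-only intrusion detector (hypothetical message fields).

    Rule 1: traffic from a source not on the allow-list is flagged.
    Rule 2: a source sending more than `max_msgs` messages within the last
            `window_s` seconds is flagged as possible flooding.
    """

    def __init__(self, allowed_sources: Set[str], max_msgs: int, window_s: float):
        self.allowed = set(allowed_sources)
        self.max_msgs = max_msgs
        self.window_s = window_s
        self.recent: Dict[str, Deque[float]] = defaultdict(deque)

    def inspect(self, t: float, source: str) -> List[str]:
        """Inspect one message seen at time t (seconds, non-decreasing)."""
        alerts: List[str] = []
        if source not in self.allowed:
            alerts.append(f"unknown source {source}")
        times = self.recent[source]
        times.append(t)
        # Drop timestamps that have fallen out of the sliding window.
        while times and times[0] <= t - self.window_s:
            times.popleft()
        if len(times) > self.max_msgs:
            alerts.append(f"rate limit exceeded by {source}: {len(times)} msgs")
        return alerts
```

This sketch keeps a history for every source it sees, including unknown ones. A deployed version would cap that state so that an attacker cycling through forged identifiers cannot exhaust the detector's memory.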

3. Federated Learning for Decentralized AI Models

Federated learning allows AI models to be trained without exposing raw data to centralized systems. By keeping data locally at each station or substation and aggregating only model updates, federated learning reduces the risks associated with data breaches and eases privacy concerns while aiming to maintain AI model performance. Model updates can still leak some information, however, so federated learning is often combined with secure aggregation or differential privacy.
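The sketch below shows one round of federated averaging (FedAvg) under simple assumptions. Each client is a substation holding private data for a linear model y ≈ w·x. The client runs gradient descent locally, and the server averages the returned weights, weighted by each client's sample count. The model, the data and the learning rate are hypothetical. Raw (x, y) pairs never leave `local_update`; only weights reach the server.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], float]  # (feature vector x, target y)


def local_update(weights: List[float], data: Sequence[Sample], lr: float,
                 epochs: int = 1) -> List[float]:
    """Full-batch gradient descent on mean squared error for y ≈ w·x,
    run on one client's private data."""
    w = list(weights)
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i, xi in enumerate(x):
                grad[i] += 2.0 * err * xi / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def fed_avg_round(global_w: List[float], clients: Sequence[Sequence[Sample]],
                  lr: float) -> List[float]:
    """One FedAvg round: local training on every client, then a
    sample-count-weighted average of the returned weights."""
    updates = [local_update(global_w, data, lr) for data in clients]
    sizes = [len(data) for data in clients]
    total = sum(sizes)
    return [sum(u[i] * n for u, n in zip(updates, sizes)) / total
            for i in range(len(global_w))]
```

For example, two clients whose data both follow y = 2x converge to w ≈ 2 after a few dozen rounds. When client data distributions differ strongly, convergence is slower, which is one reason the performance of federated models can lag that of centrally trained ones.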

4. Data Validation and Integrity Checks

Ensuring the integrity of the data fed into AI models is critical for maintaining accurate predictions. AI models should be equipped with built-in data validation checks to identify inconsistencies or anomalies in real-time data. Moreover, regular audits and updates of sensor calibration and maintenance schedules are necessary to ensure the continued accuracy of the system.
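Physical laws provide one inexpensive integrity check that complements statistical validation. At a busbar, the measured currents must satisfy Kirchhoff's current law, so a large residual means at least one reading is wrong or spoofed. The sketch below assumes a simple sign convention and hypothetical feeder names and tolerances. It can detect that a set of readings is inconsistent, but it cannot identify which sensor is at fault. An attacker who spoofs several sensors consistently can also evade it.

```python
from typing import Dict, Tuple


def kcl_check(currents_a: Dict[str, float], abs_tol_a: float = 10.0,
              rel_tol: float = 0.02) -> Tuple[bool, float]:
    """Kirchhoff's current law plausibility check at one busbar.

    Sign convention: positive = current into the bus, negative = out of it.
    Returns (consistent, residual). The tolerance scales with the largest
    measured current to absorb ordinary sensor error.
    """
    residual = sum(currents_a.values())
    largest = max((abs(i) for i in currents_a.values()), default=0.0)
    tolerance = max(abs_tol_a, rel_tol * largest)
    return abs(residual) <= tolerance, residual
```

For instance, a rectifier feeding 1200 A into the bus while two feeders draw 700 A and 500 A gives a residual of zero. If the second feeder's reading is spoofed to 100 A, the residual jumps to 400 A and the check fails.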

5. AI-Powered Anomaly Detection and Secure AI Pipelines

AI models should be trained to recognize and alert the system when abnormal patterns—indicative of attacks such as data poisoning—are detected. Additionally, secure AI pipelines, where each model update or decision is validated and signed by a trusted authority, can protect against tampering and unauthorized changes.
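The sketch below uses a rolling z-score detector as the simplest statistical stand-in for the anomaly detectors described above. It is not an AI model, and the window and threshold values are assumptions. A reading is flagged when it lies more than `threshold` standard deviations from the mean of the recent baseline.

```python
import math
from collections import deque


class RollingZScoreDetector:
    """Flag readings that deviate strongly from a rolling baseline."""

    def __init__(self, window: int = 30, threshold: float = 4.0,
                 min_std: float = 1e-6):
        if window < 2:
            raise ValueError("window must hold at least two samples")
        self.baseline = deque(maxlen=window)
        self.threshold = threshold
        self.min_std = min_std  # floor avoids division by zero on flat signals

    def update(self, x: float) -> bool:
        """Return True if x is anomalous relative to the current baseline."""
        anomalous = False
        if len(self.baseline) == self.baseline.maxlen:
            n = len(self.baseline)
            mean = sum(self.baseline) / n
            var = sum((v - mean) ** 2 for v in self.baseline) / (n - 1)
            std = max(math.sqrt(var), self.min_std)
            anomalous = abs(x - mean) / std > self.threshold
        # Flagged readings are kept out of the baseline, so a spike does not
        # widen the tolerance band for the readings that follow.
        if not anomalous:
            self.baseline.append(x)
        return anomalous
```

Excluding flagged readings from the baseline has a trade-off. It resists sudden injections, but a legitimate step change in load will keep being flagged until an operator resets the baseline. In addition, because the baseline slowly follows gradual drift, slow data poisoning calls for complementary checks.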

6. Backup and Recovery Mechanisms

To mitigate the risks of ransomware and DoS attacks, subway power systems must implement robust backup and recovery mechanisms. Regular backups of system configurations, AI model parameters, and critical operational data should be maintained, and recovery plans should be established to restore system functionality swiftly in case of an attack.
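The sketch below shows a backup routine that refuses to restore corrupted data. Each configuration snapshot is stored with a SHA-256 digest, which is checked before the snapshot is used. The file layout and snapshot contents are hypothetical. A digest stored next to the file only detects accidental or partial corruption, because an attacker able to rewrite both files defeats it. For ransomware resilience, backups should be kept on offline or immutable storage, and the digest can be signed.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def _digest_path(path: Path) -> Path:
    return path.with_name(path.name + ".sha256")


def write_backup(snapshot: dict, path: Path) -> str:
    """Serialize a snapshot and store its SHA-256 digest alongside it."""
    payload = json.dumps(snapshot, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()
    path.write_bytes(payload)
    _digest_path(path).write_text(digest)
    return digest


def restore_backup(path: Path) -> dict:
    """Load a snapshot only if its bytes still match the stored digest."""
    payload = path.read_bytes()
    expected = _digest_path(path).read_text().strip()
    if hashlib.sha256(payload).hexdigest() != expected:
        raise ValueError(f"checksum mismatch, refusing to restore {path}")
    return json.loads(payload)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        p = Path(tmp) / "tss01_config.json"  # hypothetical file name
        write_backup({"breaker_QF1": "closed", "model_version": "v1.2.0"}, p)
        print(restore_backup(p))
```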

7. Zero-Trust Security Architecture

The zero-trust model assumes that no device or user is inherently trustworthy, whether inside or outside the network. In the context of subway power systems, this approach would involve strict access control policies, continuous monitoring of all system interactions, and multi-factor authentication (MFA) for all users and devices.
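The sketch below expresses the zero-trust principle as a per-request authorization function. Access is denied by default, and every request must show verified MFA and an attested device before the role policy is consulted, whatever the request's network origin. The roles, resource prefixes and request fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

# Hypothetical policy: role -> allowed (action, resource prefix) pairs.
POLICY: Dict[str, FrozenSet[Tuple[str, str]]] = {
    "operator": frozenset({("read", "scada/"), ("command", "breaker/")}),
    "analyst": frozenset({("read", "scada/"), ("read", "models/")}),
}


@dataclass(frozen=True)
class AccessRequest:
    subject: str          # kept for audit logging
    role: str
    mfa_verified: bool
    device_attested: bool  # e.g. device certificate and posture check passed
    action: str
    resource: str


def authorize(req: AccessRequest) -> bool:
    """Deny by default and evaluate every request on its own merits,
    with no implicit trust for requests from inside the network."""
    if not (req.mfa_verified and req.device_attested):
        return False
    allowed = POLICY.get(req.role, frozenset())
    return any(req.action == action and req.resource.startswith(prefix)
               for action, prefix in allowed)
```

Under this policy, an operator with MFA on an attested device may issue a command to `breaker/QF1`, whereas an analyst making the same request is refused, as is an operator without MFA. Continuous monitoring would log each decision together with `subject`.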

6.6.3. Summary of the Key Security Threats in AI-Driven Subway Power Systems

Based on the above, Table 24 provides a comprehensive summary of the key security threats in AI-driven subway power systems, along with the corresponding mitigation strategies and their implementation priorities. The table highlights critical threats such as data poisoning, sensor spoofing, ransomware, unauthorized access, and communication latency, which can compromise system performance and safety. For each threat, the table outlines specific countermeasures, including secure data pipelines, anomaly detection, encryption, multi-factor authentication, and edge computing, together with their implementation complexity. Notably, threats like data poisoning and ransomware are rated high-priority because of their potential to disrupt system functionality, while anomaly detection and secure AI pipelines are highlighted as essential for maintaining data integrity and system resilience. The table underscores the importance of a multi-layered, adaptive security approach that addresses both technical and operational vulnerabilities. In our view, these mitigation strategies should be implemented incrementally, with particular emphasis on continuous monitoring and system updates to stay ahead of emerging threats in the evolving landscape of smart transportation infrastructure.

Overall, the integration of AI and MASs in subway power systems significantly enhances system efficiency and self-healing capabilities but also introduces new security risks that must be addressed comprehensively. The proposed security measures, including advanced encryption, intrusion detection systems, federated learning, and AI-powered anomaly detection, are critical in safeguarding subway power systems from malicious threats. Ensuring that these systems are robust against potential cyberattacks will require continuous monitoring, adaptive security measures, and ongoing research to stay ahead of emerging threats.

By prioritizing cybersecurity and data integrity, operators can safeguard the reliability and safety of AI-driven subway power systems, thus fostering a secure and resilient urban transportation infrastructure that can meet the demands of future smart cities. As AI and cybersecurity technologies evolve, these systems will need to be periodically reassessed and upgraded to maintain their effectiveness in protecting public infrastructure.

7. Conclusions and Policy Implications

7.1. Conclusions

The research presented in this review highlights recent advances in enhancing the self-healing capabilities of subway power supply systems, with a particular focus on the integration of MASs and AI algorithms. As a critical component of urban rail transit, the reliability and safety of the subway power supply are paramount, and traditional manual interventions for fault diagnosis and recovery have become insufficient to meet the increasingly complex demands of modern urban transportation systems. This paper has explored the evolving concept of self-healing technology, which has been applied successfully in power grids and distribution networks, and it has discussed how these technologies could transform subway power supply systems.

The integration of MASs and the IEC 61850 standard offers a novel approach to building an autonomous, adaptive, and intelligent self-healing control framework. By leveraging the strengths of MASs in decentralized control, coordination, and decision-making, subway power systems can respond dynamically to faults in ways that minimize the impact on service continuity and operational safety. The IEC 61850 standard, a globally recognized communication standard for power systems, provides the interoperability and flexibility needed to implement these complex, decentralized self-healing mechanisms effectively. This hybrid model has the potential not only to enhance the reliability of subway power systems but also to lay the foundation for more robust and scalable self-healing systems in urban rail infrastructure.

The studies reviewed here also indicate that MASs combined with AI-driven fault diagnosis algorithms can substantially improve the speed, accuracy, and efficiency of fault detection, analysis, isolation, and recovery. Specifically, AI algorithms have the capacity to handle complex, multi-fault scenarios that may overwhelm traditional control methods. Furthermore, through advanced machine learning techniques, the system can continuously learn and adapt to new fault patterns, improving efficiency over time, which distinguishes this approach from conventional methods.

One of the most promising findings is the application of hybrid architectures that combine MASs with the IEC 61850 framework to support critical functions such as fault localization, isolation, and recovery in subway power systems. These hybrid architectures facilitate seamless communication between various subsystems, enabling a holistic view of the system’s health, making them ideal for real-time fault management in complex, large-scale networks like those of modern subway systems. This innovative integration is presented as a unique contribution, offering greater resilience and adaptability in fault management than existing technologies.

The potential for intelligent fault recovery strategies, supported by AI, is also highlighted in this review. These strategies, capable of quick adaptation to various fault conditions, can drastically reduce the time required for recovery, thereby improving the overall reliability of subway operations. Through continuous monitoring and real-time decision-making, AI-based recovery systems enhance the ability of subway power supply systems to self-heal, ensuring operational resilience even in the face of unforeseen challenges. This ability to adapt in real time, without requiring manual intervention, marks a significant shift in how subway systems handle faults.

In conclusion, the research presented in this review indicates that self-healing technologies, underpinned by MASs and AI, represent a crucial evolution in the design and operation of subway power systems. By reducing the need for manual intervention, enhancing fault detection and recovery processes, and improving system efficiency, these technologies will play a key role in shaping the future of urban transportation infrastructure. The integration of these innovative technologies not only holds promise for improving the resilience and performance of subway power supply systems but also sets the stage for broader applications of self-healing technologies in other critical infrastructure systems, marking a significant contribution to the ongoing intelligent infrastructure revolution.

7.2. Policy Implications

The findings of this research highlight several critical policy implications for the advancement and deployment of self-healing technologies in subway power systems, particularly those driven by MASs and AI. As subway systems worldwide face increasing demands for reliability, efficiency, and automation, the adoption of self-healing technologies is becoming an essential step toward achieving these goals. To fully realize the potential of MASs and AI in self-healing subway power systems, policymakers will need to consider a range of strategic actions and regulatory frameworks to facilitate the integration of these advanced technologies.

First and foremost, policymakers must recognize the need for substantial investment in research and development (R&D) to continue advancing the capabilities of MASs and AI algorithms in self-healing applications. While promising, the implementation of these technologies in subway power systems requires overcoming technical challenges, including data acquisition, system integration, and real-time decision-making capabilities. Governments and industry stakeholders should collaborate to fund R&D initiatives that focus on refining the algorithms, improving system interoperability, and testing the performance of these technologies in real-world environments. Public–private partnerships can play a crucial role in accelerating the development and deployment of these innovations.

Another significant policy consideration is the establishment of regulatory standards that ensure the compatibility and interoperability of self-healing systems across different subway networks and urban environments. The IEC 61850 standard, which has already been proposed as a framework for integration, is a step in the right direction. However, to facilitate the widespread adoption of self-healing technologies, policymakers must ensure that these standards are continuously updated to reflect the rapid advancements in AI and MASs. This includes promoting global standardization efforts to ensure that subway power systems in different regions can communicate seamlessly, share data, and collaborate in real time.

Furthermore, policymakers must address the training and upskilling of the workforce to manage and maintain the advanced self-healing systems. As AI and MASs take a more prominent role in subway power system management, there will be a need for a skilled workforce capable of operating and troubleshooting these complex systems. Educational institutions, in collaboration with industry experts, should develop specialized training programs to equip engineers, operators, and maintenance personnel with the necessary knowledge and skills. Additionally, governments can incentivize workforce development through grants, scholarships, and industry partnerships.

In addition to technological and workforce considerations, policymakers should ensure that the implementation of self-healing technologies aligns with broader sustainability and resilience goals. Subway power supply systems play a key role in reducing urban congestion and greenhouse gas emissions. By integrating self-healing technologies, these systems can become more energy-efficient, reducing the overall environmental footprint of urban transportation infrastructure. Policies promoting the adoption of green technologies and a reduction in carbon emissions in subway networks will further incentivize the integration of advanced self-healing solutions.

Finally, the policy landscape should encourage the collection and sharing of data for ongoing performance analysis. Self-healing systems require continuous data input for machine learning algorithms to adapt and optimize. Therefore, policies promoting data transparency, privacy, and security will be critical to ensuring the safe and efficient operation of self-healing technologies. Regulations must balance the need for open data sharing with the protection of sensitive information, particularly regarding the security of critical infrastructure.

In conclusion, the successful deployment of self-healing technologies in subway power supply systems requires comprehensive policy support that encompasses investment in R&D, the establishment of standards, workforce development, sustainability considerations, and data governance. Policymakers must take proactive steps to create an enabling environment for these innovations to thrive, ensuring that subway systems are not only more resilient and efficient but also more adaptable to future challenges. Through targeted policy initiatives, governments can play a vital role in shaping the future of urban transportation infrastructure and ensuring its continued evolution in an increasingly intelligent, autonomous, and sustainable direction.

Author Contributions

Conceptualization, J.F., T.Y., K.Z. and L.C.; methodology, J.F., T.Y., K.Z. and L.C.; formal analysis, J.F., K.Z. and L.C.; investigation, J.F., T.Y., K.Z. and L.C.; resources, J.F., T.Y., K.Z. and L.C.; data curation, J.F., T.Y., K.Z. and L.C.; writing—original draft preparation, J.F., T.Y., K.Z. and L.C.; writing—review and editing, J.F., T.Y., K.Z. and L.C.; visualization, J.F., T.Y., K.Z. and L.C.; supervision, L.C.; project administration, L.C.; funding acquisition, L.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the Guangzhou Education Bureau University Research Project - Graduate Research Project, grant number 2024312278 (funder: L.C.), and in part by the STU Scientific Research Initiation Grant (SRIG), grant number STF23021 (funder: K.Z.).

Data Availability Statement

No new data were created or analyzed in this study.

Acknowledgments

We sincerely thank the associate editor and invited anonymous reviewers for their kind and helpful comments on our paper.

Conflicts of Interest

Author Jianbing Feng was employed by the company Guangzhou Metro Construction Management Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

Abbreviation	Full Form
AI	Artificial Intelligence
AC/DC	Alternating Current/Direct Current
CI	Condition-based Inspection
DA/DO	Data Attribute/Data Object
DC	Direct Current
DER	Distributed Energy Resources
DO	Data Object
DoS	Denial of Service
EMS	Energy Management System
FLISR	Fault Location, Isolation, and Service Restoration
GOOSE	Generic Object-Oriented Substation Event
IED	Intelligent Electronic Device
IEC 61850	International Electrotechnical Commission 61850 Standard
IDS	Intrusion Detection System
IoT	Internet of Things
LN	Logical Node
MAS	Multi-Agent System
MMS	Manufacturing Message Specification
MMXU	Measurement (IEC 61850 logical node)
MTTR	Mean Time to Repair
PDIS	Distance Protection (IEC 61850 logical node)
PMU	Phasor Measurement Unit
PRP	Parallel Redundancy Protocol
PTOC	Time Overcurrent Protection (IEC 61850 logical node)
RSTP	Rapid Spanning Tree Protocol
SAIDI	System Average Interruption Duration Index
SAIFI	System Average Interruption Frequency Index
SCADA	Supervisory Control and Data Acquisition
SCL	System Configuration Language
VAR	Volt-Ampere Reactive
WAMPAC	Wide-Area Monitoring, Protection, and Control
XML	eXtensible Markup Language

References

China Association of Metros. Annual Report on Statistics and Analysis of Urban Rail Transit 2023. Available online: https://www.camet.org.cn/xytj/tjxx/14894.shtml (accessed on 29 March 2024).
IEC 61850; Communication Networks and Systems in Substations. International Electrotechnical Commission (IEC): Geneva, Switzerland, 2011.
Shang, W.L.; Lv, Z. Low carbon technology for carbon neutrality in sustainable cities: A survey. Sustain. Cities Soc. 2023, 92, 104489.
Wang, L. Study on the Fault Diagnosis and Protection of Energy-Fed Supply System in Urban Mass Transit. Ph.D. Thesis, Beijing Jiaotong University, Beijing, China, 2010.
Du, F. Modeling for Metro Locomotive and Analysis of Fault Condition of DC Traction Power Supply System. Ph.D. Thesis, Beijing Jiaotong University, Beijing, China, 2010.
Zeng, B.; Zhang, J.; Yang, X.; Wang, J.; Dong, J.; Zhang, Y. Integrated planning for transition to low-carbon distribution system with renewable energy generation and demand response. IEEE Trans. Power Syst. 2013, 29, 1153–1165.
Allegretti, G.; Montoya, M.A.; Bertussi, L.A.S.; Talamini, E. When being renewable may not be enough: Typologies of trends in energy and carbon footprint towards sustainable development. Renew. Sustain. Energy Rev. 2022, 168, 112860.
Song, X. Research on Fault Location Methods for City DC Railway Traction System; Beijing Jiaotong University: Beijing, China, 2015.
Qin, B.; Wang, H.; Wang, Z.; Xiong, Z.; Zhao, J.; Lu, H.; Wang, M. Integrated development of urban rail transit and energy systems supported by underground space. Strateg. Study CAE 2023, 25, 45–59.
Serdar, M.Z.; Koç, M.; Al-Ghamdi, S.G. Urban transportation networks resilience: Indicators, disturbances, and assessment methods. Sustain. Cities Soc. 2022, 76, 103452.
Wang, K.K.; Lv, Y. Fault Location Method of Metro DC Traction Power Supply System Catenary. Urban Mass Transit 2022, 7, 222–224, 229.
Wei, R.; Shi, G.; Zhuang, K.; Xia, J. Research on Fault Location of Subway DC Traction Power Supply System Based on GPS Time Synchronization. Mar. Electr. Electron. Eng. 2023, 43, 85–88.
Jin, X.; Li, Z.; Hu, Z. Simulation of Fault Location for Subway DC Power Supply System Based on Time Domain Differential. Mar. Electr. Electron. Eng. 2017, 37, 67–70.
Pei, W. Research on the Reliability of Subway Traction Power Supply Systems; Nanjing University of Science and Technology: Nanjing, China, 2018.
Zhou, J. Research on Online Reliability Evaluation for Traction Power Supply System of Metro Network; Shanghai Jiaotong University: Shanghai, China, 2012.
Sheng, S.; Li, K.K.; Chan, W.L.; Xiangjun, Z.; Xianzhong, D. Agent-based self-healing protection system. IEEE Trans. Power Deliv. 2006, 21, 610–618.
Ji, X.; Jian, L.; Yan, X.; Wang, H. Research on self-healing technology of smart distribution network based on multi-agent system. In Proceedings of the 2016 Chinese Control and Decision Conference (CCDC), Yinchuan, China, 28–30 May 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 6132–6137.
Xiang, G.; Xin, A. The application of self-healing technology in smart grid. In Proceedings of the 2011 Asia-Pacific Power and Energy Engineering Conference, Wuhan, China, 25–28 March 2011.
Zhang, R.; Bie, Z. Distributed cluster-level cooperative control of dynamic virtual microgrid cluster for active distribution network. Autom. Electr. Power Syst. 2022, 46, 55–62.
Zhao, Y.; Rieger, C.; Zhu, Q. Multi-agent learning for resilient distributed control systems. arXiv 2022, arXiv:2208.05060.
Pang, Y.; Lodewijks, G. Agent-based intelligent monitoring in large-scale continuous material transport. In Proceedings of the 2012 9th IEEE International Conference on Networking, Sensing and Control, Beijing, China, 11–14 April 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 79–84.
Mayorov, G.; Stennikov, V.; Barakhtenko, E. Application of the multiagent approach to the research of integrated energy supply systems. In E3S Web of Conferences 2019; EDP Sciences: Les Ulis, France, 2019; Volume 114, p. 01006.
Sujil, A.; Verma, J.; Kumar, R. Multi agent system: Concepts, platforms and applications in power systems. Artif. Intell. Rev. 2018, 49, 153–182.
Yu, H.; Wang, Y.; Chen, Z. A novel renewable microgrid-enabled metro traction power system—Concepts, framework, and operation strategy. IEEE Trans. Transp. Electrif. 2021, 7, 1733–1749.
Saray, M.; Saray, M.; Kazan, C.; Guner, S. Optimization of renewable energy usage in public transportation: Mathematical model for energy management of plug-in PV-based electric metrobuses. J. Energy Storage 2024, 78, 109946. [Google Scholar] [CrossRef]
Kilic, B.; Dursun, E. Integration of innovative photovoltaic technology to the railway trains: A case study for Istanbul airport-M1 light metro line. In Proceedings of the IEEE EUROCON 2017-17th International Conference on Smart Technologies, Ohrid, Macedonia, 6–8 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 336–340. [Google Scholar]
Kumar, G.M.S.; Cao, S. Leveraging energy flexibilities for enhancing the cost-effectiveness and grid-responsiveness of net-zero-energy metro railway and station systems. Appl. Energy 2023, 333, 120632. [Google Scholar] [CrossRef]
Yu, J.; Wang, J.; Tong, F. Research and analysis of power supply load forecasting and self-healing control in urban rail transit system. In IOP Conference Series: Earth and Environmental Science 2021; IOP Publishing: Bristol, UK, 2021; Volume 769, p. 042093. [Google Scholar]
Zheng, S.; Liu, Y.; Lin, Y.; Wang, Q.; Yang, H.; Chen, B. Bridging strategy for the disruption of metro considering the reliability of transportation system: Metro and conventional bus network. Reliab. Eng. Syst. Saf. 2022, 225, 108585. [Google Scholar] [CrossRef]
Kalyvas, M.; McCracken, A. Doha Metro Novel Building Automation and Control System (BACS). J. Ind. Integr. Manag. 2024, 9, 571–596. [Google Scholar] [CrossRef]
Longo, M.; Bramani, M. The automation control systems for the efficiency of metro transit lines. In Proceedings of the 2015 AEIT International Annual Conference (AEIT), Naples, Italy, 14–16 October 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1–6. [Google Scholar]
National Energy Technology Laboratory. The Modern Grid Initiative; U.S. Department of Energy: Washington, DC, USA, 2008; pp. 26–30. [Google Scholar]
Liu, C.; Jung, J.; Heydt, G.T.; Vittal, V.; Phadke, A. The Strategic Power Infrastructure Defense (SPID) System: A Conceptual Design. IEEE Control Syst. Mag. 2000, 20, 40–52. [Google Scholar]
Li, T. Research on the Self-Healing Functions of Smart Distribution Grid and Its Benefits Evaluation Model; North China Electric Power University: Beijing, China, 2012. [Google Scholar]
Amin, M. Toward self-healing energy infrastructure systems. IEEE Comput. Appl. Power 2001, 14, 20–28. [Google Scholar] [CrossRef]
Shittu, E.; Tibrewala, A.; Kalla, S.; Wang, X. Meta-analysis of the strategies for self-healing and resilience in power systems. Adv. Appl. Energy 2021, 4, 100036. [Google Scholar] [CrossRef]
Arefifar, S.A.; Alam, M.S.; Hamadi, A. A review on self-healing in modern power distribution systems. J. Mod. Power Syst. Clean Energy 2023, 11, 1719–1733. [Google Scholar] [CrossRef]
Madani, V.; Novosel, D.; Horowitz, S.; Adamiak, M.; Amantegui, J.; Karlsson, D.; Imai, S.; Apostolov, A. IEEE PSRC report on global industry experiences with system integrity protection schemes (SIPS). IEEE Trans. Power Deliv. 2010, 25, 2143–2155. [Google Scholar] [CrossRef]
Maqsood, M.; Masood, A. Integration of Wireless HART and STK600 Development Kit for Data Collection in Wireless Sensor Networks. Master’s Thesis, Universitetet i Agder/University of Agder, Kristiansand, Norway, 2013. [Google Scholar]
Morais, B.T.P. Emerging Technologies and Future Trends in Substation Automation Systems for the Protection, Monitoring and Control of Electrical Substations; PQDT-Global: Ann Arbor, MI, USA, 2013. [Google Scholar]
Majhi, A.A.K.; Mohanty, S. A comprehensive review on Internet of Things applications in power systems. IEEE Internet Things J. 2024, 11, 34896–34923. [Google Scholar] [CrossRef]
Wang, L.; Bo, Z.; Wang, Q.P.; Liu, R.T.; Fan, W. Design of integrated wide area protection and control for power grid. DPI Proc. 2018, 1, 206–214. [Google Scholar] [CrossRef] [PubMed]
Hellman, C.; Aronson, M.; Tom, N.; Quan, W. The microprocessor and the minicomputer for earth terminal and network control. ITC Proc. 1981, 1, 529–548. [Google Scholar]
Terzija, V.; Valverde, G.; Cai, D.; Regulski, P.; Madani, V.; Fitch, J.; Skok, S.; Begovic, M.M.; Phadke, A. Wide-area monitoring, protection, and control of future electric power networks. Proc. IEEE 2010, 99, 80–93. [Google Scholar] [CrossRef]
Rahman, W.U.; Ali, M.; Mehmood, C.A.; Khan, A. Design and implementation for wide area power system monitoring and protection using phasor measuring units. WSEAS Trans. Power Syst. 2013, 8, 57–64. [Google Scholar]
Cheng, L.F.; Yu, T. A new generation of AI: A review and perspective on machine learning technologies applied to smart energy and electric power systems. Int. J. Energy Res. 2019, 43, 1928–1973. [Google Scholar] [CrossRef]
Cheng, L.F.; Wei, X.; Li, M.; Tan, C.; Yin, M.; Shen, T.; Zou, T. Integrating evolutionary game-theoretical methods and deep reinforcement learning for adaptive strategy optimization in user-side electricity markets: A comprehensive review. Mathematics 2024, 12, 3241. [Google Scholar] [CrossRef]
Cheng, L.F.; Yu, T.; Zhang, X.S.; Yang, B. Parallel cyber-physical-social systems based smart energy robotic dispatcher and knowledge automation: Concepts, architectures and challenges. IEEE Intell. Syst. 2019, 34, 54–64. [Google Scholar] [CrossRef]
Nyangon, J. Climate-proofing critical energy infrastructure: Smart grids, artificial intelligence, and machine learning for power system resilience against extreme weather events. J. Infrastruct. Syst. 2024, 30, 03124001. [Google Scholar] [CrossRef]
Ahmad, T.; Madonski, R.; Zhang, D.; Huang, C.; Mujeeb, A. Data-driven probabilistic machine learning in sustainable smart energy/smart energy systems: Key developments, challenges, and future research opportunities in the context of smart grid paradigm. Renew. Sustain. Energy Rev. 2022, 160, 112128. [Google Scholar] [CrossRef]
Dick, K.; Russell, L.; Souley Dosso, Y.; Kwamena, F.; Green, J.R. Deep learning for critical infrastructure resilience. J. Infrastruct. Syst. 2019, 25, 05019003. [Google Scholar] [CrossRef]
Nama, P.; Reddy, P.; Pattanayak, S.K. Artificial Intelligence for Self-Healing Automation Testing Frameworks: Real-Time Fault Prediction and Recovery. Artif. Intell. 2024, 64 (Suppl. S3), 111–141. [Google Scholar]
Plevris, V.; Papazafeiropoulos, G. AI in Structural Health Monitoring for Infrastructure Maintenance and Safety. Infrastructures 2024, 9, 225. [Google Scholar] [CrossRef]
Manoharan, A.; Sarker, M. Revolutionizing Cybersecurity: Unleashing the Power of Artificial Intelligence and Machine Learning for Next-Generation Threat Detection. Int. Res. J. Mod. Eng. Technol. Sci. 2022, 4, 2151–2164. [Google Scholar] [CrossRef]
Fadi, O.; Karim, Z.; Mohammed, B. A survey on blockchain and artificial intelligence technologies for enhancing security and privacy in smart environments. IEEE Access 2022, 10, 93168–93186. [Google Scholar] [CrossRef]
Tooki, O.O.; Popoola, O.M. A critical review on intelligent-based techniques for detection and mitigation of cyberthreats and cascaded failures in cyber-physical power systems. Renew. Energy Focus 2024, 51, 100628. [Google Scholar] [CrossRef]
Ahmad, S. Real-Time Control and Power Management for Interconnected Microgrids with Self-Healing Capability. Ph.D. Thesis, University of Malaya, Kuala Lumpur, Malaysia, 2022. [Google Scholar]
Rath, S.; Nguyen, L.D.; Sahoo, S.; Popovski, P. Self-healing secure blockchain framework in microgrids. IEEE Trans. Smart Grid 2023, 14, 4729–4740. [Google Scholar] [CrossRef]
Watuwa, B. Power Reliability Analysis of DC Traction Power Supply System: A Case Study of Addis Ababa Light Rail Transit; Addis Ababa University: Addis Ababa, Ethiopia, 2019; Available online: https://scholar.googleusercontent.com/scholar?q=cache:Q3mqEIVgZ1kJ:scholar.google.com/&hl=zh-CN&as_sdt=0,5&scioq=Power+Reliability+Analysis+of+DC+Traction+Power+Supply+System:+A+Case+Study+of+Addis+Ababa+Light+Rail+Transit (accessed on 10 March 2025).
Ogunsola, A.; Mariscotti, A. Electromagnetic Compatibility in Railways: Analysis and Management; Springer Science & Business Media, 2013; pp. 1–528. Available online: https://books.google.com/books?hl=zh-CN&lr=&id=N5B3S13cPpIC&oi=fnd&pg=PR2&dq=related:YcJun2vRVgIJ:scholar.google.com/&ots=EecduOIHre&sig=teucUbqoCLzfPK-9XlVwRKHvlp4#v=onepage&q&f=false (accessed on 10 March 2025). [CrossRef]
López, D.Á.J.L. Optimising the Electrical Infrastructure of Mass Transit Systems to Improve the Use of Regenerative Braking. Ph.D. Thesis, Universidad Pontificia Comillas, Madrid, Spain, 2016. [Google Scholar]
Parizad, A.; Baghaee, H.R. Overview of smart cyber-physical power systems: Fundamentals, Challenges, and Solutions. Wiley Online Libr. 2025, 1, 157–178. Available online: https://onlinelibrary.wiley.com/doi/abs/10.1002/9781394191529.ch1 (accessed on 10 March 2025).
Castro Gómez, A. Feasibility for the Introduction of Current Limiting Impedance for a Previously Solid Grounded Medium Voltage Distribution Network. Master’s Thesis, Politecnico di Milano, Milan, Italy, 28 April 2017. Available online: https://www.politesi.polimi.it/retrieve/a81cb05c-3dc7-616b-e053-1605fe0a889a/Thesis%20Alex%20Castro.pdf (accessed on 10 March 2025).
Haque, A.; Malik, A.; Shah, N.; Malik, J.A.; Ahmad, R.; Arif, M. Fundamentals of power electronics in smart cities. Taylor Fr. 2024, 1, 77–89. Available online: https://www.taylorfrancis.com/chapters/edit/10.1201/9781032669809-1/fundamentals-power-electronics-smart-cities-ahteshamul-haque-naila-shah-junaid-ahmad-malik-azra-malik (accessed on 10 March 2025).
Iovanovici, A. Designing Low Latency, Fault-Tolerant Sensor Networks Using Complex Networks Analysis; Editura Politehnica: Timişoara, Romania, 2015; ISBN 9786065549623 (ISBN-10: 6065549622). Available online: https://search.worldcat.org/zh-cn/title/1288695767 (accessed on 10 March 2025).
Raghunath, K.; Rengarajan, N. Response time optimization with enhanced fault-tolerant wireless sensor network design for on-board rapid transit applications. Clust. Comput. 2019, 22 (Suppl. 4), 9737–9753. [Google Scholar] [CrossRef]
Kumari, S.; Tyagi, A.K. Wireless sensor networks: An introduction. Digit. Twin Blockchain Smart Cities 2024, 1, 12–22. Available online: https://scholar.google.com/citations?user=RIgaVmUAAAAJ&hl=en&num=20&oi=sra (accessed on 10 March 2025).
Hernandez, J.C.; Sutil, F.S.; Vidal, P.G. Protection of a multiterminal DC compact node feeding electric vehicles on electric railway systems, secondary distribution networks, and PV systems. Turk. J. Electr. Eng. Comput. Sci. 2016, 24, 3123–3143. [Google Scholar] [CrossRef]
Swain, A.; Abdellatif, E.; Mousa, A.; Pong, P.W. Sensor technologies for transmission and distribution systems: A review of the latest developments. Energies 2022, 15, 7339. [Google Scholar] [CrossRef]
Georgilakis, P.S.; Hatziargyriou, N.D. Optimal distributed generation placement in power distribution networks: Models, methods, and future research. IEEE Trans. Power Syst. 2013, 28, 3420–3428. [Google Scholar] [CrossRef]
Muzzammel, R.; Raza, A.; Hussain, M.R.; Abbas, G. MT-HVdc systems fault classification and location methods based on traveling and non-traveling waves—A comprehensive review. Appl. Sci. 2019, 9, 4760. [Google Scholar] [CrossRef]
Hamidi, R.J.; Livani, H. A recursive method for traveling-wave arrival-time detection in power systems. IEEE Trans. Power Deliv. 2018, 33, 1097–1106. [Google Scholar]
Costa, F.B.; Miranda, V.; Leite, H. Wavelet-based analysis and detection of traveling waves due to DC faults in LCC HVDC systems. Int. J. Electr. Power Energy Syst. 2019, 105, 158–165. [Google Scholar]
Esmail, E.M.; Elsadd, M.A.; Elkalashy, N.I. A review: Smart distribution grid management using agents. WSEAS Trans. Power Syst. 2020, 1, 348234782. [Google Scholar] [CrossRef]
Liu, C.; Chen, Z.; Bak, C.L. Multi-agent system based adaptive protection for dispersed generation integrated distribution systems. Trans. Power Syst. 2013, 1, 270506087. [Google Scholar] [CrossRef]
Rahman, M.S.; Muyeen, S.M.; Ghosh, A.; Islam, S.M. Multi-agent systems in ICT enabled smart grid: A status update on technology framework and applications. IEEE Trans. Power Deliv. 2019, 1, 8765552. [Google Scholar]
Alstom. Towards the First Railway Cybersecurity International Standard: Why Standards Are Important to Secure Railways; Alstom: Saint-Ouen-sur-Seine, France, 2024; Available online: https://www.alstom.com/press-releases-news/2024/3/towards-first-railway-cybersecurity-international-standard-why-standards-are-important-secure-railways (accessed on 10 March 2025).
Radiflow. Securing Railway Operations from OT Cyberattacks; Radiflow: Mahwah, NJ, USA, 2024; Available online: https://www.radiflow.com/white-papers/securing-railway-operations-from-ot-cyberattacks/ (accessed on 11 March 2025).
REPLIL. Cybersecurity in Railway Digital Transformation Journey; REPLIL: Dubai, United Arab Emirates, 2024; Available online: https://www.replil.com/cybersecurity-in-railway-digital-transformation-journey/ (accessed on 11 March 2025).
Mbango, F. Investigation into Alternative Protection Solutions for Distribution Networks. Ph.D. Thesis, Cape Peninsula University of Technology, Bellville, South Africa, 2009. Available online: https://core.ac.uk/download/pdf/148365012.pdf (accessed on 11 March 2025).
Dutta Pramanik, P.; Upadhyaya, B.; Kushwaha, A.; Bhowmik, D. Harnessing IoT: Transforming Smart Grid Advancements. In IoT for Smart Grid: Revolutionizing Electrical Engineering; Wiley: Hoboken, NJ, USA, 2025; Chapter 7, pp. 127–174. Available online: https://onlinelibrary.wiley.com/doi/abs/10.1002/9781394279401.ch7 (accessed on 11 March 2025).
Pandiyan, P.; Saravanan, S.; Kannadasan, R.; Krishnaveni, S.; Alsharif, M.; Kim, M. A comprehensive review of advancements in green IoT for smart grids: Paving the path to sustainability. Energy Rep. 2024, 11, 5504–5531. [Google Scholar] [CrossRef]
Baroud, S.Y.; Yahaya, N.A.; Elzamly, A.M. Cutting-Edge AI Approaches with MAS for PdM in Industry 4.0: Challenges and Future Directions. J. Appl. Data Sci. 2024, 5, 455–473. [Google Scholar] [CrossRef]
Chouhan, S.; Mohammadi, F.D.; Feliachi, A.; Solanki, J.M.; Choudhry, M.A. Hybrid MAS Fault Location, Isolation, and Restoration for Smart Distribution System with Microgrids. In Proceedings of the 2016 IEEE Power and Energy Society General Meeting (PESGM), Boston, MA, USA, 17–21 July 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–5. [Google Scholar]
Han, Y.; Zhang, K.; Li, H.; Coelho, E.A.A.; Guerrero, J.M. MAS-Based Distributed Coordinated Control and Optimization in Microgrid and Microgrid Clusters: A Comprehensive Overview. IEEE Trans. Power Electron. 2017, 33, 6488–6508. [Google Scholar] [CrossRef]
Hua, H.; Li, Y.; Wang, T.; Dong, N.; Li, W.; Cao, J. Edge Computing with Artificial Intelligence: A Machine Learning Perspective. ACM Comput. Surv. 2023, 55, 1–35. [Google Scholar] [CrossRef]
Wang, F.; Zhang, M.; Wang, X.; Ma, X.; Liu, J. Deep Learning for Edge Computing Applications: A State-of-the-Art Survey. IEEE Access 2020, 8, 58322–58336. [Google Scholar] [CrossRef]
Wang, X.; Han, Y.; Leung, V.C.M.; Niyato, D.; Yan, X.; Chen, X. Convergence of Edge Computing and Deep Learning: A Comprehensive Survey. IEEE Commun. Surv. Tutor. 2020, 22, 869–904. [Google Scholar] [CrossRef]
Chen, J.; Ran, X. Deep Learning with Edge Computing: A Review. Proc. IEEE 2019, 107, 1655–1674. [Google Scholar] [CrossRef]
Murshed, M.G.S.; Murphy, C.; Hou, D.; Khan, N.; Ananthanarayanan, G.; Hussain, F. Machine Learning at the Network Edge: A Survey. ACM Comput. Surv. 2021, 54, 1–37. [Google Scholar] [CrossRef]
Logenthiran, T. Multi-Agent System for Control and Management of Distributed Power Systems. Ph.D. Thesis, National University of Singapore, Singapore, 2012. [Google Scholar]
Dou, C.; Hao, D.; Jin, B.; Wang, W.; An, N. Multi-agent-system-based decentralized coordinated control for large power systems. Int. J. Electr. Power Energy Syst. 2014, 63, 814–821. [Google Scholar] [CrossRef]
Farid, A.M. Multi-agent system design principles for resilient coordination & control of future power systems. Intell. Ind. Syst. 2015, 1, 13–34. [Google Scholar]
Herrera, M.; Pérez-Hernández, M.; Parlikad, A.; Izquierdo, J. Multi-Agent Systems and Complex Networks: Review and Applications in Systems Engineering. Processes 2020, 8, 312. [Google Scholar] [CrossRef]
Sharifi, L. Economics Inspired Energy Aware Service Provisioning in P2P Assisted Cloud Ecosystems. Technico.Ulisboa.Pt 2015, 1, 72–98. Available online: https://web.tecnico.ulisboa.pt/~ist14191/repository/Leila-Sharifi-CAT.pdf (accessed on 13 March 2025).
Irfan, M.; Iqbal, J.; Iqbal, A.; Riaz, R.A. Opportunities and Challenges in Control of Smart Grids—Pakistani Perspective. Renew. Sustain. Energy Rev. 2017, 7, 652–674. [Google Scholar] [CrossRef]
Aftab, M.A.; Hussain, S.M.S.; Ali, I.; Ustun, T.S. IEC 61850-Based Communication Layer Modeling for Electric Vehicles: Electric Vehicle Charging and Discharging Processes Based on the International Electrotechnical Commission 61850 Standard and Its Extensions. IEEE Ind. Electron. Mag. 2020, 14, 4–14. [Google Scholar] [CrossRef]
Mackiewicz, R.E. Overview of IEC 61850 and Benefits. In Proceedings of the 2006 IEEE Power Engineering Society General Meeting, Montreal, QC, Canada, 18–22 June 2006; IEEE: Piscataway, NJ, USA, 2006; p. 8. [Google Scholar]
Youssef, T.A.; El Hariri, M.; Bugay, N.; Mohammed, O.A. IEC 61850: Technology Standards and Cyber-Threats. In Proceedings of the 2016 IEEE 16th International Conference on Environment and Electrical Engineering (EEEIC), Florence, Italy, 7–10 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–6. [Google Scholar]
Aftab, M.A.; Hussain, S.M.S.; Ali, I.; Ustun, T.S. IEC 61850-Based Substation Automation System: A Survey. Int. J. Electr. Power Energy Syst. 2020, 120, 106008. [Google Scholar] [CrossRef]
Shin, I.J.; Song, B.K.; Eom, D.S. International Electronical Committee (IEC) 61850 Mapping with Constrained Application Protocol (CoAP) in Smart Grids Based European Telecommunications Standard Institute Machine-to-Machine (M2M) Environment. Energies 2017, 10, 393. [Google Scholar] [CrossRef]
Ozansoy, C.R.; Zayegh, A.; Kalam, A. Object Modeling of Data and Datasets in the International Standard IEC 61850. IEEE Trans. Power Deliv. 2009, 24, 1140–1147. [Google Scholar] [CrossRef]
Kostic, T.; Preiss, O.; Frei, C. Understanding and Using the IEC 61850: A Case for Meta-Modelling. Comput. Stand. Interfaces 2005, 27, 679–695. [Google Scholar] [CrossRef]
Ihle, C.; Trautwein, D.; Schubotz, M.; Meuschke, N.; Gipp, B. Incentive Mechanisms in Peer-to-Peer Networks—A Systematic Literature Review. ACM Comput. Surv. 2023, 55 (Suppl. S14), 1–69. [Google Scholar] [CrossRef]
Reckerd, D.; Vico, J. Application of Peer-to-Peer Communication, for Protection and Control, at Seward Distribution Substation. In Proceedings of the 58th Annual Conference for Protective Relay Engineers, College Station, TX, USA, 5–7 April 2005; IEEE: Piscataway, NJ, USA, 2005; pp. 40–45. [Google Scholar]
Wojdak, W. Rapid Spanning Tree Protocol: A New Solution from an Old Technology. Reprinted from CompactPCI Systems March 2003. Available online: http://pdf.cloud.opensystemsmedia.com/advancedtca-systems.com/PerfTech.Mar03.pdf (accessed on 13 March 2025).
Marchese, M.; Mongelli, M. Simple Protocol Enhancements of Rapid Spanning Tree Protocol Over Ring Topologies. Comput. Netw. 2012, 56, 1131–1151. [Google Scholar] [CrossRef]
Pallos, R.; Farkas, J.; Moldovan, I.; Lukovszki, C. Performance of Rapid Spanning Tree Protocol in Access and Metro Networks. In Proceedings of the 2007 Second International Conference on Access Networks & Workshops, Ottawa, ON, Canada, 22–24 August 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 1–8. [Google Scholar]
Li, Q.; Wang, D.; Huang, X.; Zhang, H. A System Configuration Description Language (SCL) Complied File Based Configuration Method for Bridges in Smart Substation. In Proceedings of the 2023 7th International Conference on Smart Grid and Smart Cities (ICSGSC), Lanzhou, China, 22–24 September 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 136–141. [Google Scholar]
Cruz, J.P.; Kaji, Y.; Yanai, N. RBAC-SC: Role-Based Access Control Using Smart Contract. IEEE Access 2018, 6, 12240–12251. [Google Scholar] [CrossRef]
Zeeshan, M.; Manzoor, M.F.; Qadir, J. Backup Channel and Cooperative Channel Switching On-Demand Routing Protocol for Multi-Hop Cognitive Radio Ad Hoc Networks (BCCCS). In Proceedings of the 2010 6th International Conference on Emerging Technologies (ICET), Islamabad, Pakistan, 18–19 October 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 394–399. [Google Scholar]
Ahmad, A.; El Haffar, A.; Lavanya, P. Moving towards reliable and fault-tolerant smart grid systems. Int. J. Adv. Res. Electr. Electron. Instrum. Eng. 2023, 1, 508. [Google Scholar] [CrossRef]
Essackjee, I.A. Leveraging disruptive technologies to realize the smart grid. ResearchGate 2023, 1, 346402311. Available online: https://www.researchgate.net/profile/Ismael-Essackjee/publication/346402311_Leveraging_Disruptive_Technologies_to_Realize_the_Smart_Grid/links/62a23ada55273755ebe07e71/Leveraging-Disruptive-Technologies-to-Realize-the-Smart-Grid.pdf (accessed on 14 March 2025).
De Almeida, L.F.F.; Pereira, L.A.M.; Sodré, A.C. Control networks and smart grid teleprotection: Key aspects, technologies, protocols, and case-studies. IEEE Access 2020, 1, 9200485. [Google Scholar] [CrossRef]
Alabi, M. The Impact of Artificial Intelligence on Network Optimization in Telecommunications. ResearchGate 2023, 1, 384664972. Available online: https://www.researchgate.net/profile/Moses-Alabi/publication/384664972_The_Impact_of_Artificial_Intelligence_on_Network_Optimization_in_Telecommunications/links/6701933d9e6e82486f0549d5/The-Impact-of-Artificial-Intelligence-on-Network-Optimization-in-Telecommunications.pdf (accessed on 14 March 2025).
Umoga, U.J.; Sodiya, E.O.; Ugwuanyi, E.D.; Jacks, B.S.; Lottu, O.A.; Daraojimba, O.D.; Obaigbena, A. Exploring the potential of AI-driven optimization in enhancing network performance and efficiency. Magna Sci. Adv. Res. Rev. 2024, 10, 368–378. [Google Scholar] [CrossRef]
Cruz, Y.J.; Castaño, F.; Haber, R.E.; Villalonga, A.; Ejsmont, K.; Gladysz, B.; Flores, Á.; Alemany, P. Self-Reconfiguration for Smart Manufacturing Based on Artificial Intelligence: A Review and Case Study. In Artificial Intelligence in Manufacturing: Enabling Intelligent, Flexible and Cost-Effective Production Through AI; Springer Nature: Cham, Switzerland, 2024; pp. 121–144. [Google Scholar]
Lin, Y.; Bie, Z. A review of key strategies in realizing power system resilience. Glob. Energy Interconnect. 2018, 1, 2096511718300094. Available online: https://www.sciencedirect.com/science/article/pii/S2096511718300094 (accessed on 14 March 2025).
Yu, P.; Shi, L.; Liu, B. Survivability-aware routing restoration mechanism for smart grid communication network in large-scale failures. EURASIP J. Wirel. Commun. Netw. 2020, 1, 104. Available online: https://link.springer.com/article/10.1186/s13638-020-1653-4 (accessed on 14 March 2025).
Moradi, M.H.; Razini, S.; Hosseinian, S.M. State of the art of multi-agent systems in power engineering: A review. Renew. Sustain. Energy Rev. 2016, 58, 814–824. [Google Scholar] [CrossRef]
Cheng, L.; Yu, T. Smart Dispatching for Energy Internet with Complex Cyber-Physical-Social Systems: A Parallel Dispatch Perspective. Int. J. Energy Res. 2019, 43, 3080–3133. [Google Scholar] [CrossRef]
Cheng, L.; Yu, T.; Zhang, X.; Yin, L. Machine Learning for Energy and Electric Power Systems: State of the Art and Prospects. Autom. Electr. Power Syst. 2019, 43, 15–43. [Google Scholar] [CrossRef]
Renugadevi, R.; Shobana, J.; Arthi, K.; AV, K.; Satishkumar, D.; Sivaraja, M. Real-Time Applications of Artificial Intelligence Technology in Daily Operations. In Using Real-Time Data and AI for Thrust Manufacturing; IGI Global: Hershey, PA, USA, 2024; pp. 243–257. [Google Scholar]
Cen, J.; Yang, Z.; Liu, X.; Xiong, J.; Chen, H. A Review of Data-Driven Machinery Fault Diagnosis Using Machine Learning Algorithms. J. Vib. Eng. Technol. 2022, 10, 2481–2507. [Google Scholar] [CrossRef]
Diez-Olivan, A.; Del Ser, J.; Galar, D.; Sierra, B. Data Fusion and Machine Learning for Industrial Prognosis: Trends and Perspectives Towards Industry 4.0. Inf. Fusion 2019, 50, 92–111. [Google Scholar] [CrossRef]
Fernandes, M.; Corchado, J.M.; Marreiros, G. Machine Learning Techniques Applied to Mechanical Fault Diagnosis and Fault Prognosis in the Context of Real Industrial Manufacturing Use-Cases: A Systematic Literature Review. Appl. Intell. 2022, 52, 14246–14280. [Google Scholar] [CrossRef]
Saufi, S.R.; Ahmad, Z.A.B.; Leong, M.S.; Lim, M.H. Challenges and Opportunities of Deep Learning Models for Machinery Fault Detection and Diagnosis: A Review. IEEE Access 2019, 7, 122644–122662. [Google Scholar] [CrossRef]
Leite, D.; Martins Jr, A.; Rativa, D.; De Oliveira, J.F.; Maciel, A.M. An Automated Machine Learning Approach for Real-Time Fault Detection and Diagnosis. Sensors 2022, 22, 6138. [Google Scholar] [CrossRef]
Jung, K.H.; Kim, H.; Ko, Y. Network Reconfiguration Algorithm for Automated Distribution Systems Based on Artificial Intelligence Approach. IEEE Trans. Power Deliv. 1993, 8, 1933–1941. [Google Scholar] [CrossRef]
Shakiba, F.M.; Azizi, S.M.; Zhou, M.; Abusorrah, A. Application of Machine Learning Methods in Fault Detection and Classification of Power Transmission Lines: A Survey. Artif. Intell. Rev. 2023, 56, 5799–5836. [Google Scholar] [CrossRef]
Bruton, K.; Raftery, P.; Kennedy, B.; Keane, M.M.; O’Sullivan, D.T.J. Review of Automated Fault Detection and Diagnostic Tools in Air Handling Units. Energy Effic. 2014, 7, 335–351. [Google Scholar] [CrossRef]
Fenton, W.G.; McGinnity, T.M.; Maguire, L.P. Fault Diagnosis of Electronic Systems Using Intelligent Techniques: A Review. IEEE Trans. Syst. Man Cybern. Part C 2001, 31, 269–281. [Google Scholar] [CrossRef]
Wu, D.; Zheng, A.; Yu, W.; Cao, H.; Ling, Q.; Liu, J.; Zhou, D. Digital Twin Technology in Transportation Infrastructure: A Comprehensive Survey of Current Applications, Challenges, and Future Directions. Appl. Sci. 2025, 15, 1911. [Google Scholar] [CrossRef]
Arora, S.; Tewari, A. AI-Driven Resilience: Enhancing Critical Infrastructure with Edge Computing. Int. J. Curr. Eng. Technol. 2022, 12, 151–157. [Google Scholar]
Nasarian, E.; Alizadehsani, R.; Acharya, U.R.; Tsui, K.L. Designing interpretable ML system to enhance trust in healthcare: A systematic review to propose responsible clinician-AI-collaboration framework. Inf. Fusion 2024, 108, 102412. [Google Scholar] [CrossRef]
KN, K.; Perrusquia, A.; Tsourdos, A.; Ignatyev, D. Integrating Explainable AI into Two-Tier ML Models for Trustworthy Aircraft Landing Gear Fault Diagnosis. In AIAA SCITECH 2025 Forum; American Institute of Aeronautics and Astronautics: Reston, VA, USA, 2025; p. 1928. [Google Scholar]
Nguyen, T.T.; Nguyen, N.D.; Nahavandi, S. Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications. IEEE Trans. Cybern. 2020, 50, 3826–3839. [Google Scholar] [CrossRef]
Cross, L.; Cockburn, J.; Yue, Y.; O’Doherty, J.P. Using deep reinforcement learning to reveal how the brain encodes abstract state-space representations in high-dimensional environments. Neuron 2021, 109, 724–738. [Google Scholar] [CrossRef]
Altaher, A. Implementation of a Dependability Framework for Smart Substation Automation Systems: Application to Electric Energy Distribution. Ph.D. Thesis, Université Grenoble Alpes, Grenoble, France, 2018. [Google Scholar]
Baigent, D.; Adamiak, M.; Mackiewicz, R. IEC 61850 Communication Networks and Systems in Substations: An Overview for Users; SISCO Systems: Sterling Heights, MI, USA, 2004. [Google Scholar]
Cappart, Q.; Chételat, D.; Khalil, E.B.; Lodi, A.; Morris, C.; Veličković, P. Combinatorial optimization and reasoning with graph neural networks. J. Mach. Learn. Res. 2023, 24, 1–61. [Google Scholar]
Alex-Omiogbemi, A.A.; Sule, A.K.; Omowole, B.M. Conceptual framework for advancing regulatory compliance and risk management in emerging markets through digital innovation. World J. Adv. Res. Rev. 2024, 24, 1155–1162. [Google Scholar] [CrossRef]
Wang, X.; Wu, Y.C. Balancing innovation and regulation in the age of generative artificial intelligence. J. Inf. Policy 2024, 14, 93–112. [Google Scholar] [CrossRef]

Figure 1. Hierarchical control architecture for subway power systems: a structured approach to fault recovery and system optimization.

Figure 2. Architecture of a typical electrified railway traction power supply system, which serves as the foundation for supplying electrical power to urban subway transit systems.

Figure 3. A flowchart of the legacy equipment integration process.

Figure 4. Fault isolation and recovery process in self-healing control for subway power supply systems.

Table 1. Comparative summary of self-healing in power/energy systems vs. metro power supply systems.

Comparative Dimension	Self-Healing in Power and Energy Systems	Self-Healing in Metro Power Supply Systems	Typical Scenarios, Application Levels, and Definitional Differences
Primary Objective	Ensures wide-area safety and reliability, promptly isolates faults, and restores service to critical loads across transmission and distribution.	Focuses on swiftly identifying and isolating faults within confined metro lines, minimizing operational disruption, and sustaining continuous train service.	In power systems, self-healing primarily targets large-scale networks. In metro systems, it aims at uninterrupted public transit. Both emphasize isolation and quick restoration, yet metro systems impose more stringent continuity requirements.
Network Scale and Complexity	Comprises hierarchical generation, transmission, and distribution across vast geographic regions, integrating both conventional and distributed energy resources.	Concentrates on urban rail corridors, with relatively fixed routing but complex operational environments; traction loads exhibit cyclical fluctuations with high power demands.	Power systems handle multi-voltage-level, widely dispersed networks. Metro systems focus on shorter feeder sections and specialized loads. Both require robust control, but metro systems demand faster, more localized responses.
Control and Management Layers	Typically includes a layered architecture with a central energy management system (EMS)/supervisory control and data acquisition (SCADA) system, substation automation, and distributed control in feeder terminals.	Employs centralized or semi-centralized control via SCADA or integrated supervisory control systems (ISCSs), with shorter communication pathways for rapid switching actions.	Power systems rely on multi-tier communications for broad-area coordination. Metro systems maintain shorter command chains, enabling near-instantaneous protection and recovery. The underlying definitions emphasize automation, but with distinct time horizons.
Fault Types and Detection	Encompasses conventional short-circuits, equipment aging, and severe external disruptions (e.g., lightning, ice storms). Fault detection relies on diverse sensors, relays, and advanced topology analyses.	Primarily contends with feeder or substation faults (e.g., short-circuits, overloads), external threats from construction, and environmental factors damaging contact lines; detection uses specialized track sensors.	General power systems confront a wider range of fault types, while metro faults typically concentrate on contact-line or substation issues. Both emphasize real-time detection, though metro systems have elevated safety margins due to passenger transport.
Fault Isolation and Network Reconfiguration	Achieves isolation through automated breakers, reclosers, and load transfer, often within seconds to minutes; integrates alternative power sources to maintain supply continuity.	Relies on ring-supplied networks and rapid switching to isolate faulty sections while sustaining feeder services to unaffected track segments, typically within a matter of seconds or less.	Both leverage automated switches and feeder reconfiguration. However, metro systems often need a faster (sub-minute to second-level) approach to preserve critical train operations without substantial delays.
Information and Communication Technologies	Relies on multi-level, wide-area networks (optical fiber, wireless, private lines) with numerous nodes and potential bandwidth constraints; designed for comprehensive monitoring and control.	Benefits from relatively shorter distances and more centralized configurations, typically integrated into a single specialized or semi-isolated communication framework with low latency.	Both necessitate reliable, real-time communication. Yet, power systems face more distributed deployments, whereas metro systems can leverage a dedicated, smaller-scale communication backbone.
Reliability and Safety Standards	Must comply with national or industry regulations and reliability targets (e.g., SAIDI (system average interruption duration index), SAIFI (system average interruption frequency index)) and increasingly consider cybersecurity challenges; reliability is critical but often measured statistically across larger geographic footprints.	Stringent safety standards and zero tolerance for lengthy service interruptions, given their direct impact on public transit; fault response must also consider passenger evacuation and emergency scenarios.	Both strive for high reliability, but metro systems face more immediate safety and service pressures. Definitions converge on the principle of minimizing power interruption, though metro systems place human safety and operational continuity at the forefront.
Current Implementation and Trends	Already deployed widely in smart distribution grids internationally, with varying degrees of investment and maturity; advanced sensors, grid automation, and microgrid technologies are growing rapidly.	Actively being integrated into new and existing metro lines worldwide; especially in newer constructions, self-healing features are incorporated during design to minimize service interruption times and enhance safety.	Power grids and metro systems both advance toward greater intelligence and automation, but metro self-healing is more domain-specific and dedicated to ensuring passenger service. Definitions reflect macroscopic grid security vs. localized transit continuity.
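
Table 1 cites SAIDI and SAIFI as the statistical reliability measures used for general power systems. As a minimal sketch, assuming the customer-based definitions of IEEE Std 1366 (SAIFI = total customers interrupted / total customers served; SAIDI = sum of customer interruption durations / total customers served) and purely hypothetical interruption records, both indices can be computed as follows. The names `Interruption`, `saifi`, and `saidi` are illustrative and do not come from any specific tool.

```python
from dataclasses import dataclass


@dataclass
class Interruption:
    customers_interrupted: int  # customers affected by one sustained interruption
    duration_min: float         # time until their supply was restored, in minutes


def saifi(events, customers_served):
    """System average interruption frequency index (interruptions per customer served)."""
    return sum(e.customers_interrupted for e in events) / customers_served


def saidi(events, customers_served):
    """System average interruption duration index (minutes per customer served)."""
    return sum(e.customers_interrupted * e.duration_min for e in events) / customers_served


# Hypothetical records for a network serving 10,000 customers.
events = [Interruption(1200, 45.0), Interruption(300, 10.0)]
print(saifi(events, 10_000))  # 0.15
print(saidi(events, 10_000))  # 5.7
```

Because both indices are averaged over every customer served, a short but disruptive outage on a single metro feeder barely moves them, which is consistent with Table 1 describing grid reliability as measured "statistically across larger geographic footprints".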

Table 2. Complex topological and operational constraints in subway power supply systems.

Aspect	Power and Energy System Application	Subway Power Supply Application	Level of Implementation	Differences in Implementation	Future Potential	Urgent Challenges	Research Opportunities
Topological Complexity	Typically radial or meshed network designs	Hybrid ring/radial topologies under tight constraints	Medium to high	Limited space for additional cables; strict safety requirements	Enhanced modeling tools	Integrating advanced sensors in limited space	Compact, scalable approaches for real-time fault management
Load Variability	Seasonal/diurnal load patterns	Rapid load changes due to train acceleration	Medium	High-frequency fluctuations unique to rail traction	Predictive analytics for dynamic load management	Handling transient conditions in real-time detection	AI-driven adaptive protection schemes
Fault Tolerance Requirements	Important but can rely on alternative feeders	Critical: passenger safety at stake	High	Stricter recovery times; mandatory redundancies in underground segments	Seamless reconfiguration for uninterrupted service	Maintaining safety with minimal downtime	MAS-based solutions that integrate safety logic
Infrastructure Constraints	Often more flexible, especially above ground	Very limited corridor space; complex cable routing	Low to medium	Equipment miniaturization needed; advanced maintenance scheduling	Innovative hardware designs	High costs of retrofitting and expansion	Modular protective devices optimized for underground environment
Environmental Factors (Heat, Humidity, etc.)	Relevant but typically less extreme	Critical in enclosed tunnels	Medium	Ventilation and cooling demands must be integrated with power layout	Energy-efficient solutions for underground ventilation	Protective devices degrade faster due to harsh conditions	Designing robust sensors and switchgear suited to harsh environments
Communication Constraints	Generally open space for wireless or fiber links	Underground layout complicates communication wiring	Medium	Signal attenuation in tunnels; need for robust communication protocols	Advanced tunnel communication frameworks	Ensuring real-time data flow under challenging conditions	Research on fault-tolerant communication for tunnel environments
Maintenance and Operational Constraints	Significant but often can schedule downtime	Very limited track closure windows	Medium to high	Maintenance must be performed swiftly, often in off-peak or night hours	Autonomous or remote inspection tools	High operational risk if maintenance is delayed	Development of continuous monitoring systems and predictive maintenance
Regulatory and Safety Standards	National grid codes and industry standards	Stricter local transit authority regulations	High	Safety certification for every component or software module	Holistic compliance with railway regulations	Multiple approvals from transportation authorities	Integrated safety frameworks bridging power and rail standards
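
Tables 1 and 2 describe metro feeders that rely on ring or hybrid ring/radial supply so that a faulted section can be isolated while healthy sections stay energized. The sketch below illustrates that idea under simplifying assumptions: a chain of track sections fed from two substations at its ends (labelled hypothetically "A" and "B"), a switch at every section boundary, and no capacity or protection-coordination limits. The function name and labels are hypothetical.

```python
def isolate_and_restore(num_sections, faulted):
    """Isolate faulted sections on a line fed from both ends and report the new supply.

    Sections are numbered 0..num_sections-1. Boundary switch b sits between
    sections b-1 and b, so section k is bounded by switches k and k+1.
    Source "A" feeds boundary 0 and source "B" feeds boundary num_sections.
    """
    faulted = set(faulted)
    # Open the switches on both sides of every faulted section.
    switches_to_open = sorted({k for k in faulted} | {k + 1 for k in faulted})

    supply = {}
    # Energize from source A until the first isolated section blocks the path.
    for s in range(num_sections):
        if s in faulted:
            break
        supply[s] = "A"
    # Energize from source B in the opposite direction; sections already reached
    # from A keep that source (this only happens when nothing is faulted).
    for s in reversed(range(num_sections)):
        if s in faulted:
            break
        supply.setdefault(s, "B")

    de_energized = [s for s in range(num_sections) if s not in supply]
    return switches_to_open, supply, de_energized


switches, supply, dark = isolate_and_restore(6, {2})
print(switches)                    # [2, 3]
print(dict(sorted(supply.items())))  # {0: 'A', 1: 'A', 3: 'B', 4: 'B', 5: 'B'}
print(dark)                        # [2]
```

With two faults, for example `{1, 4}`, sections 2 and 3 are left without a source in addition to the faulted sections themselves, which is where load transfer and priority-based restoration have to take over.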

Table 3. Potential technological and research interventions for complex operational constraints.

Aspect/Constraint	Proposed Interventions	Implementation Status	Key Benefits	Limitations/Barriers	Future Potential	Urgent Challenges	Research Directions
Space Constraints and Equipment Size	Miniaturized switchgear, compact substations, solid-state switching devices	Early adoption in some metros	Saves valuable tunnel space, eases retrofitting	Higher cost, potential reliability issues	Medium to high	Testing and safety certifications	Developing robust, affordable miniaturized devices
High Load Variability	AI-based predictive load balancing, advanced SCADA	In pilot projects	Accurate forecasting, improved real-time control	Requires extensive sensor data, complex algorithms	High	Ensuring real-time responsiveness	Machine learning algorithms for real-time load prediction
Environmental Challenges (Heat, Humidity)	Enhanced insulation, specialized cooling systems	Widely used but needs updates	Protects equipment longevity, improves reliability	Increases CAPEX/OPEX, depends on ventilation design	Medium	Integrating with energy efficiency	Smart materials, advanced sensor-based heat management
Fault Tolerance and Rapid Isolation	MAS-based reconfiguration, advanced protective relaying	Experimental or partial use	Minimizes downtime, enhances passenger safety	Requires robust communication and standard protocols	High	Fast, secure communications	MAS architecture aligning with IEC 61850 and railway codes
Communication and Data Synchronization	IEC 61850 GOOSE messaging, fiber-optic and wireless hybrids	Expanding in pilot programs	Enables real-time data sharing, simplified integration	Tunnel attenuation and installation complexity	High	Guaranteed QoS in tunnel environment	Ultra-reliable communication protocols for underground rail
Maintenance and Operational Scheduling	Predictive maintenance, digital twins, remote inspection	Growing adoption	Minimizes downtime, reduces costs, extends asset life	Requires high initial investment, specialized staff	Very high	Coordinating track closures	AI-driven digital twins for continuous condition monitoring
Regulatory Compliance and Safety	Unified standards bridging power and railway domains	Ongoing efforts	Streamlines certification, ensures compatibility	Multiple authorities, differing regulations	High	Prolonged approvals	Collaborative frameworks for standardization
Cybersecurity Tools	End-to-end encryption, intrusion detection systems	Varies by region	Protects critical control systems from cyber threats	Requires advanced IT infrastructure	High	Ensuring trust in automated systems	AI-based anomaly detection integrated with self-healing
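
Table 3 lists AI-based load prediction and anomaly detection as research directions. Before a learned model is deployed, a simple statistical baseline can serve as a point of comparison. The sketch below is such a baseline, not an AI method: it flags a sample whose deviation from the mean of a trailing window exceeds a threshold expressed in standard deviations. The window length, the threshold, and the function name `zscore_alarms` are assumptions chosen for illustration.

```python
import math
from collections import deque


def zscore_alarms(samples, window=20, threshold=4.0):
    """Return indices of samples that deviate from the trailing-window mean
    by more than `threshold` population standard deviations."""
    history = deque(maxlen=window)
    alarms = []
    for i, x in enumerate(samples):
        if len(history) == window:
            mean = sum(history) / window
            std = math.sqrt(sum((h - mean) ** 2 for h in history) / window)
            if std > 0 and abs(x - mean) / std > threshold:
                alarms.append(i)
        history.append(x)  # the window slides forward whether or not x was flagged
    return alarms


# Hypothetical feeder current with small ripple and one abnormal spike at index 30.
data = [100.0 + (i % 2) for i in range(40)]
data[30] = 150.0
print(zscore_alarms(data))  # [30]
```

Rapid but legitimate load swings caused by train acceleration (Table 2) would also trip such a fixed threshold, which is the "handling transient conditions" challenge that motivates more adaptive, learned detectors.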

Table 4. Dimensions of real-time fault management in subway vs. general power systems.

Aspect	Power and Energy Systems Application	Subway Power Supply Application	Level of Implementation	Differences in Implementation	Future Potential	Urgent Challenges	Research Opportunities
Fault Detection Speed	Millisecond to second range	Must be sub-cycle to tens of milliseconds	Medium to high	Higher sensitivity due to enclosed spaces and passenger risk	Very high (real-time control)	Achieving ultra-fast detection in tunnels	Wavelet-based traveling-wave methods
Isolation Techniques	Circuit breakers at multiple nodes	Limited breakers, need precise isolation zones	Medium	Space constraints, higher cost for extra switchgear	Medium to high	Minimizing downtime in short track segments	MAS-based isolation protocols
Restoration Priority	Typically based on load importance	Safety-critical loads (e.g., lighting) take top priority	High	Focus on passenger evacuation and ventilation requirements	High	Ensuring continuous service with minimal risk	Hierarchical multi-agent strategies
Communication and Coordination	SCADA systems, sometimes distributed	Underground environment with possible signal loss	Medium to high	Signal attenuation in tunnels; need for robust communication protocols	Very high	Reliable data exchange in underground conditions	IEC 61850-based GOOSE in tunnels
Automation Level	Moderate to advanced in smart grids	Rapidly evolving with pilot tests in subways	Low to medium	Standard solutions less tested in subterranean rail networks	High	Balancing new tech with proven reliability	Full-scale, integrated MASs + AI systems
Data Handling and Analytics	Cloud-based or on-premise analytics	On-site edge computing for real-time decisions	Growing	High real-time constraints, limited data bandwidth	High	Managing large-scale sensor data in real time	Edge AI algorithms for fault prediction
Reliability and Redundancy	Important for major loads, less for small feeders	Critical for all track sections	High	Redundancies have to be physically feasible underground	Very high	Ensuring no single point of failure	Optimized redundancy planning
Cost and Investment	Balanced with broad utility budgets	Constrained by transit authority budgets	Varies	High capital expenditures for specialized rail infrastructure	Medium	Gaining stakeholder support	Economic feasibility studies
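
Table 4 notes that subway restoration gives safety-critical loads such as lighting and ventilation top priority. A minimal way to express that rule, assuming each load has a numeric priority (1 = most critical) and a known demand, is a strict-priority pass that re-energizes loads in order while capacity on the alternate supply remains. The load names, demands, and available capacity below are hypothetical, and a strict-priority pass is a simplification rather than an optimal allocation.

```python
from dataclasses import dataclass


@dataclass
class Load:
    name: str
    demand_kw: float
    priority: int  # 1 = most critical (e.g., emergency lighting)


def restore_by_priority(loads, available_kw):
    """Re-energize loads in priority order, skipping any load that no longer fits."""
    restored, remaining = [], available_kw
    for ld in sorted(loads, key=lambda ld: ld.priority):
        if ld.demand_kw <= remaining:
            restored.append(ld.name)
            remaining -= ld.demand_kw
    return restored, remaining


loads = [
    Load("station_hvac", 600.0, 4),
    Load("emergency_lighting", 50.0, 1),
    Load("escalators", 150.0, 3),
    Load("tunnel_ventilation", 400.0, 2),
]
print(restore_by_priority(loads, available_kw=700.0))
# (['emergency_lighting', 'tunnel_ventilation', 'escalators'], 100.0)
```

Because a load that does not fit is skipped rather than partially served, the pass can leave capacity unused; a knapsack-style allocation or an agent negotiation could refine the result.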

Table 5. Technological enablers for real-time fault management in subway power networks.

Aspect/Technology	Current Adoption	Primary Advantage	Limitations	Future Potential	Urgent Challenges	Research Gaps	Recommended Solutions/Directions
High-Speed Protection Relays	Moderate	Sub-cycle response, improved sensitivity	Prone to nuisance tripping under variable load	High	Tuning relay settings to subway load patterns	Algorithm refinement for multi-condition loads	Customized relay settings with AI-based adaptation
Wavelet-Based Fault Detection	Pilot	Accurate detection of transient signals	Computational complexity in real time	Medium to high	Guaranteeing stable performance under noise	Optimal wavelet design for DC traction signals	Hybrid wavelet + machine learning methods
Traveling Wave Methods	Limited	Precise fault localization	Requires synchronized data acquisition	High	Installing enough sensors in short intervals	Cost-effective sensor deployment	Cooperative traveling wave detection systems
MAS-based Isolation	Experimental	Distributed decision-making, resilience	Complexity of agent coordination protocols	Very high	Achieving sub-second isolation in tunnels	Standardizing MAS frameworks for rail settings	IEC 61850-compatible MAS design
Self-Healing Restoration Logic	Early prototypes	Automated service restoration, dynamic re-routing	Requires robust network modeling	Medium to high	Handling partial restorations effectively	Real-time load and system state estimation	MAS-based restoration integrated with SCADA
AI/ML for Fault Prediction	Emerging	Predictive maintenance, early anomaly detection	Data scarcity, labeling issues, validation costs	Very high	Ensuring model accuracy in changing conditions	Neural network interpretability and reliability	Hybrid physics-informed neural networks
Edge Computing in Substations	Low but growing	Reduces latency, improves local decision-making	Limited computational resources on site	High	Onboard analytics to handle real-time data	Designing low-power, high-performance hardware	Embedded systems optimized for fault analytics
Cybersecurity Compliance	Varies by region	Protects reliability of automated fault systems	Additional cost and complexity	High	Mitigating increased attack surface	Secure data frameworks integrated with self-healing	Blockchain-based identity and access management

Table 6. Regulatory, safety, and integration barriers in subway power systems.

Aspect	Power and Energy Systems Application	Subway Power Supply Application	Level of Implementation	Differences in Implementation	Future Potential	Urgent Challenges	Research Opportunities
Regulatory Frameworks	Utility-level codes (IEEE, IEC), less direct public scrutiny	Multi-layered railway authority oversight, strict passenger safety	Medium	Longer certification processes, overlapping authorities	Medium to high	Streamlining multi-agency approvals	Standardization bridging IEC 61850 and rail codes
Safety Assurance	Important but primarily equipment-focused	Critical for passenger well-being; zero tolerance for major failures	High	Need for rapid evacuation, air quality, and lighting continuity	High	Minimizing disruptions that endanger passengers	MAS designs incorporating real-time hazard monitoring
Legacy System Integration	Often stepwise modernization	Legacy traction power systems with partial SCADA	Low to medium	Protocol mismatch, older hardware with minimal digital interfaces	Medium	Retrofitting with minimal service downtime	Adaptive hardware modules, protocol converters
Standards and Protocols	Common adoption of IEC 61850 in substation automation	Emerging adaptation for traction power and MAS coordination	Low to medium	Must handle DC traction specifics, tunnel conditions	High	Harmonizing substation automation with rail codes	IEC 61850 profiles specialized for traction systems
Cost vs. Benefit Analysis	Long-term cost-benefit for large-scale utilities	Immediate passenger service impact, budget constraints	Medium	Hard to quantify intangible benefits (e.g., safety, brand image)	Medium	Securing investments under strict budget caps	Detailed ROI models including passenger satisfaction
Training and Workforce	Utility engineers, typical skill sets	Specialized rail engineers, safety certifications	Low to medium	Additional training for advanced AI/MAS solutions	High	Building cross-functional teams	Education programs bridging power and rail domains
Public Acceptance and Trust	Generally behind-the-scenes updates	High visibility with potential passenger disruption	Medium to high	Risk of negative perception if technology causes downtime	Medium	Ensuring stable operation during pilot phases	Transparent communication about improvements
Cybersecurity Compliance	Growing awareness, diverse regulations	Vital to protect passenger and operational data	Medium	Potential for large-scale disruptions if hacked	High	Ensuring end-to-end security in tunnels	Integrated intrusion detection with MAS frameworks
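
Table 6 identifies protocol mismatch with legacy traction equipment as an integration barrier and points to protocol converters as a research opportunity. The core of such a converter is a point map from legacy data items to IEC 61850 object references of the form LogicalDevice/LogicalNode.DataObject.DataAttribute. The sketch below assumes hypothetical legacy register addresses, a hypothetical logical device name `TPS1_LD0`, and a hypothetical scaling of 0.1 A per count. It maps breaker status to `XCBR1.Pos.stVal` using the IEC 61850 double-point encoding (1 = off, 2 = on) and, in the absence of a dedicated DC traction logical node, carries the feeder current on the generic `GGIO` node.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class PointMapping:
    object_ref: str                   # IEC 61850-style reference: LD/LN.DO.DA
    convert: Callable[[int], object]  # converts the raw legacy value


def breaker_position(raw: int) -> int:
    # Assumed legacy encoding: 0 = open, 1 = closed.
    # IEC 61850 double-point position: 1 = off (open), 2 = on (closed).
    return 2 if raw else 1


# Hypothetical point list for one legacy traction substation.
LEGACY_POINTS: Dict[int, PointMapping] = {
    40001: PointMapping("TPS1_LD0/XCBR1.Pos.stVal", breaker_position),
    40002: PointMapping("TPS1_LD0/GGIO1.AnIn1.mag.f", lambda raw: raw / 10.0),  # 0.1 A per count
}


def translate(registers: Dict[int, int]) -> Dict[str, object]:
    """Translate raw legacy readings into IEC 61850-style values; unmapped points are skipped."""
    values = {}
    for address, raw in registers.items():
        mapping = LEGACY_POINTS.get(address)
        if mapping is not None:
            values[mapping.object_ref] = mapping.convert(raw)
    return values


print(translate({40001: 1, 40002: 12345, 49999: 7}))
# {'TPS1_LD0/XCBR1.Pos.stVal': 2, 'TPS1_LD0/GGIO1.AnIn1.mag.f': 1234.5}
```

A real gateway would also handle quality flags, timestamps, and reporting services; the mapping of DC traction elements is the kind of bridging logic that Table 12 lists as a standardization gap.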

Table 7. Strategies to overcome regulatory, safety, and integration barriers.

Strategy/Measure	Adoption Level	Expected Impact	Implementation Difficulty	Key Benefits	Potential Risks/Barriers	Research Needs	Future Outlook
Unified Standardization Efforts	Growing momentum	Streamlines compliance across agencies and vendors	Medium	Reduced project delays, interoperability	Requires consensus among diverse stakeholders	Holistic standards for MASs + traction power systems	Feasible with continued collaboration among IEC, IEEE, rail authorities
Cross-Domain Training Programs	Limited pilots	Enhances workforce competency in both power and rail	Medium to high	Facilitates smooth technology integration	Budget constraints, scheduling complexities	Curriculum design integrating railway safety and AI	Key to building a sustainable talent pipeline
Pilot and Sandbox Environments	Emerging in some metros	Allows safe testing of new systems in controlled settings	Medium	Minimizes risk to passengers, validates ROI	May still require partial line closures	Detailed performance metrics, extended pilot durations	Gradual system-wide rollouts after proven success
Modular Retrofit Approaches	Limited	Incrementally modernizes legacy systems	High	Avoids complete system overhaul, spreads cost	Complexity of ensuring compatibility	Adaptive hardware modules, protocol converters	Could become standard practice for older metro lines
Risk–Benefit Communication	Ad hoc	Improves public acceptance and stakeholder engagement	Low to medium	Builds trust, eases implementation controversies	Requires public outreach, specialized messaging	Communication frameworks and standardized ROI metrics	Crucial for ensuring supportive regulatory environment
Comprehensive Cybersecurity	Growing awareness	Safeguards data integrity, essential for MASs/AI systems	Medium	Avoids catastrophic disruptions, protects passenger data	Costly to maintain, evolving threat landscape	Intrusion detection, endpoint security for IEDs	Integral part of future integrated self-healing systems
Funding and Incentive Mechanisms	Region-dependent	Encourages R&D investment and pilot deployment	Medium to high	Facilitates advanced research, reduces operator risk	Political and economic uncertainties	Economic models that quantify intangible benefits	Key to bridging the gap between research and real deployment
Long-Term Maintenance Contracts	Limited	Ensures continuous expert support post-deployment	Medium	Maximizes system reliability, knowledge transfer	Potential vendor lock-in	Service-level agreements with advanced penalty clauses	A stable framework for ensuring reliability over the system lifetime

Table 8. Comparison between current self-healing techniques and the suggested MAS-based strategy.

Comparison Criteria	Current Self-Healing Techniques	Suggested Self-Healing Strategy
Technology Foundation	Traditional Fault Detection Algorithms: Based on simple fault detection (e.g., overcurrent, voltage drop) and subsequent isolation using conventional relays.	AI-based MASs: Utilizes real-time data analysis and intelligent agents that communicate autonomously to identify, localize, and isolate faults more efficiently.
Fault Detection Speed	Typically requires several cycles to detect and isolate faults, leading to delayed responses in high-speed environments like subways.	Detects faults in fractions of a cycle (using wavelet-based techniques), significantly improving detection time in high-speed subway systems.
System Flexibility	Often fixed in design, requiring manual intervention or predetermined responses to faults.	Highly flexible, where agents adapt and optimize their responses based on changing conditions in real time, offering greater scalability.
Real-Time Adaptability	Limited real-time adaptability, as conventional methods use static rules for fault isolation.	Uses AI to adapt fault recovery strategies in real time, considering dynamic load and environmental factors, especially in urban settings with complex network topologies.
Resilience to Complex Topologies	Struggles in complex network topologies (e.g., ring or radial configurations) because fault detection and isolation follow fixed, predetermined paths.	MAS-based strategy is inherently better suited to complex network topologies, allowing autonomous decision-making across distributed systems.
Failure Recovery Efficiency	Traditional methods often lead to long recovery times and may not restore service to critical areas promptly.	MASs enable rapid rerouting of power through alternate paths in real time, ensuring minimal downtime and prioritized restoration of critical services such as lighting and ventilation.
Space Constraints	Conventional systems use additional hardware (e.g., circuit breakers), which may be difficult to install in confined spaces such as subway tunnels.	MASs make use of distributed sensors and intelligent devices already in place, reducing the need for additional hardware and enabling more compact, space-efficient implementations.
Maintenance and Scalability	Requires regular maintenance of each individual component, and expanding the system often requires significant hardware upgrades.	MASs require less hardware maintenance and can be scaled up by adding more intelligent agents, making them easier to adapt and expand.
Safety Protocols	Basic safety mechanisms (e.g., emergency power supply, fire safety) that activate in case of failure but often lack dynamic prioritization of critical services.	Integrates advanced safety protocols, ensuring that critical systems (e.g., lighting, ventilation) are always prioritized during fault isolation and power restoration.
Cost	Lower initial cost but higher long-term costs due to maintenance, hardware upgrades, and manual intervention during fault recovery.	Higher initial setup cost for AI-based systems, but lower long-term costs due to reduced maintenance needs and quicker fault recovery, offsetting initial investments.
Regulatory Compliance	Generally compliant with existing safety standards, but lacks integration with emerging AI-based regulations.	Requires new regulatory frameworks to accommodate AI-based systems, including certification of MAS- and AI-driven fault management protocols.
Integration with Legacy Systems	Difficulty in integrating with older infrastructures, requiring hardware upgrades or complete system overhauls.	MASs can integrate with legacy systems via gateways, allowing for gradual upgrades without a complete system overhaul.
Use of Data Analytics	Relies on limited data for fault detection, often with basic analysis on voltage, current, and fault type.	Uses advanced machine learning models that analyze vast datasets from multiple sensors to predict faults before they occur and optimize recovery strategies.
Environmental Adaptability	Traditional systems are often static, and environmental factors (e.g., temperature, humidity) can impact their performance.	Adaptive MAS technology continuously adjusts to environmental conditions such as temperature or humidity, enhancing system resilience in diverse settings like subway tunnels.
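
Table 8 credits wavelet-based techniques with sub-cycle fault detection. As a minimal illustration, assuming a uniformly sampled DC feeder current and a hand-chosen threshold, the sketch below computes level-1 Haar detail coefficients in undecimated form, d[n] = (x[n] − x[n+1])/√2, and reports the first sample at which a coefficient exceeds the threshold. The undecimated form is used so that a step is caught regardless of where it falls relative to sample pairs; sign conventions for the Haar filter differ between libraries, which does not matter here because only magnitudes are compared. Practical relays use higher sampling rates, several decomposition levels, and noise-aware thresholds.

```python
import math


def haar_details(signal):
    """Undecimated level-1 Haar detail coefficients: (x[n] - x[n+1]) / sqrt(2)."""
    return [(signal[n] - signal[n + 1]) / math.sqrt(2) for n in range(len(signal) - 1)]


def detect_transient(signal, threshold):
    """Return the index of the first sample after a detail coefficient exceeds threshold, or None."""
    for n, d in enumerate(haar_details(signal)):
        if abs(d) > threshold:
            return n + 1
    return None


# Hypothetical 1000 A feeder current with +/-5 A ripple and a +4000 A fault step at sample 50.
current = [1000.0 + (5.0 if i % 2 else -5.0) + (4000.0 if i >= 50 else 0.0) for i in range(100)]
print(detect_transient(current, threshold=100.0))  # 50
```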

Table 9. Comparative applications of MASs in subway power systems.

Application Scenario	Degree of Adoption	Main Differences vs. Conventional Methods	Future Prospects	Key Challenges	Potential Research Directions	Implementation Complexity	Current Deployments	Estimated Cost	Strategic Importance
Fault Diagnostics	Medium	Distributed vs. centralized analysis	High, with advanced AI pattern matching	Standardization of agent protocols	Agent-based feature extraction, big data	Moderate	Some pilot projects in major cities	Moderate	Very high for timely response
Fault Isolation and Restoration	High	Faster local control decisions	Expansion into microgrid or hybrid systems	Communication security	Autonomous agent negotiation algorithms	High	Widely tested internationally	High (due to new device requirements)	Critical for system safety and operation
System Reconfiguration	Moderate	Cooperative agent-based switching	Multi-level architecture	Reliability of real-time signals	Hybrid MAS-IEC 61850 integration	Moderate	Ongoing research initiatives	Medium	Essential for robust self-healing
Load Shedding	Low	Intelligent selective load dropping vs. global shedding	Potentially large in future DC traction systems	Complexity in load forecasting	Agent-based optimization with historical data	Low to moderate	Rare field demos	Low	Important for emergency readiness
Predictive Maintenance	Low	Online prognosis vs. reactive upkeep	Growing, particularly with big data	Accuracy of machine learning methods	Deep learning integrated with MASs	Moderate	Conceptual studies ongoing	Medium	Enhances preventive strategies
Voltage Regulation	Moderate	Distributed voltage control vs. single-point control	High in smart-grid expansions	Coordinated agent control frameworks	MAS-based dynamic VAR control	Moderate	Some pilot tests	Medium	Improves power quality and reliability
Power Quality Monitoring	Low	Real-time harmonic detection vs. offline analysis	Emerging with new sensor technologies	Limited coverage of sensor networks	MAS-based harmonic mitigation techniques	Moderate to high	Minimal large-scale deployments	Medium–high	Boosts passenger experience
Microgrid Integration	Emerging	Local agent-based decisions vs. central SCADA	Potential synergy with renewable and storage in depots	Technical maturity in AC/DC hybrid systems	MAS strategies for integrated AC/DC grids	High (novel tech)	Few advanced pilots	High	Key for future urban rail expansions
Energy Management	Moderate	Intelligent routing of feeders and loads	Large, with data-driven analytics	MAS coordination of traction loads	Cooperative scheduling with power flow control	Moderate	Under study	Medium	Affects cost optimization
Security Assessment	Low	Real-time multi-agent vigilance vs. static approaches	Likely critical with growing threats	Cybersecurity integration	MAS-based anomaly detection and intrusion response	High	Conceptual prototypes	Medium	Ensures reliability and safety
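
Table 9 names autonomous agent negotiation as a research direction for fault isolation and restoration. One classic pattern is the Contract Net Protocol: a manager agent announces a task, contractor agents bid, and the manager awards the task to the best bid. The single-round, synchronous sketch below applies it to load transfer, assuming that a feeder agent bids its spare capacity only if it can absorb the whole request and that the largest spare capacity wins. Class names, ratings, and the award rule are hypothetical; messaging, timeouts, and protection constraints are not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FeederAgent:
    name: str
    capacity_kw: float
    load_kw: float

    def bid(self, request_kw: float) -> Optional[float]:
        """Offer spare capacity if the whole request fits; otherwise decline with None."""
        spare = self.capacity_kw - self.load_kw
        return spare if spare >= request_kw else None

    def accept_award(self, request_kw: float) -> None:
        self.load_kw += request_kw


def contract_net(request_kw: float, contractors: List[FeederAgent]) -> Optional[str]:
    """One announce-bid-award round; returns the winning agent's name or None."""
    best_agent, best_bid = None, None
    for agent in contractors:
        offer = agent.bid(request_kw)
        if offer is not None and (best_bid is None or offer > best_bid):
            best_agent, best_bid = agent, offer
    if best_agent is None:
        return None  # no feeder can take the load: escalate, e.g., to load shedding
    best_agent.accept_award(request_kw)
    return best_agent.name


feeders = [
    FeederAgent("F2", capacity_kw=2000.0, load_kw=1500.0),
    FeederAgent("F3", capacity_kw=2500.0, load_kw=1200.0),
    FeederAgent("F4", capacity_kw=1000.0, load_kw=900.0),
]
print(contract_net(800.0, feeders))  # F3
print(feeders[1].load_kw)            # 2000.0
```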

Table 10. Research directions and emerging trends in MASs for subway networks.

Research Focus	Current State	Potential Evolution	Key Technical Barriers	Unique Subway Constraints	Synergy with Other Technologies (5G, Edge AI)	Real-Time Simulation Capabilities	Scalability	Funding and Collaboration	Projected Long-Term Impact
Agent Coordination	Experimental pilots in academic settings	Larger multi-level hierarchical models within an IEC 61850 environment	Communication latency and security	AC/DC mixture in traction systems	Integrated platform for distributed intelligence	High-fidelity agent-based simulation with real-time data injection	Moderate; advanced control algorithms needed	Government + private railway operators	Possibly transformative, enabling advanced self-healing frameworks
Resilient Architecture for Fault Handling	Conceptual designs only	Full integration with standardized protocols (IEC 61850)	Dynamic stability and real-time demands	Frequent service runs with minimal downtime	Cloud-edge synergy for real-time data analytics and AI-based forecasting	Testing under stress conditions, hardware-in-the-loop evaluations	High, must handle large expansions	Partnerships among academia and metro agencies	Potential to drastically reduce service disruptions
Multi-Agent Security and Cyber-Resilience	Emerging topic, few publications	Integrated self-healing + intrusion detection architecture	Lack of robust agent intrusion detection and incomplete security policies	Threat potential from critical service lines	AI-driven intrusion detection and response	Simulation involving both operational technology (OT) and information technology (IT)	Moderate, as overhead on agent frameworks might be significant	Collaborative R&D with cybersecurity vendors and standards bodies	Could become essential as 5G and IoT expand the attack surface
Integration with Big Data and Analytics	Limited integration for offline analysis	Full data-driven MAS control loops with streaming inputs	Complexity of data ingestion, real-time transformations (AC, DC, station, etc.)	High volume, multi-domain measurements	ML, deep learning-based advanced analytics for anomaly detection	Real-time streaming and batch processing synergy for agent training	High, as big data solutions require robust infrastructure	Industry–university collaboration crucial	Opens new avenues for predictive control, faster fault recovery
Hierarchical vs. Decentralized Agents	Mostly hierarchical prototypes	Hybrid approaches combining local and centralized control	Inter-agent conflicts in fully distributed agent models	AC traction and DC feeding lines require different control logics	IoT-based sensing nodes for real-time data acquisition	Flexible scenario testing across different topologies (urban, suburban lines)	High for large networks requiring modular expansion	Global consortia and standards organizations	Could shape next-generation topology management methods
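
Table 10 contrasts hierarchical and fully decentralized agents. A basic building block for the decentralized case is linear average consensus, in which each agent repeatedly moves toward its neighbours' values, x_i ← x_i + ε·Σ_{j∈N(i)} (x_j − x_i); on a connected undirected graph with 0 < ε < 1/(maximum degree), every agent converges to the network-wide average without a central coordinator. The sketch below applies it to four hypothetical substation agents on a ring that agree on the average loading; the loads, the step size ε, and the iteration count are assumptions for illustration.

```python
def average_consensus(values, neighbors, epsilon, iterations):
    """Synchronous linear consensus: each agent uses only its neighbours' current values."""
    x = list(values)
    for _ in range(iterations):
        # The comprehension reads the previous x in full before x is rebound.
        x = [xi + epsilon * sum(x[j] - xi for j in neighbors[i]) for i, xi in enumerate(x)]
    return x


# Four hypothetical substation agents on a ring, each knowing only its own load (kW).
loads = [400.0, 800.0, 600.0, 200.0]
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
estimates = average_consensus(loads, ring, epsilon=0.25, iterations=50)
print([round(e, 6) for e in estimates])  # [500.0, 500.0, 500.0, 500.0]
```

In a hybrid architecture, such locally agreed values could feed a supervisory layer, but the convergence time, governed here by ε and the topology, would have to fit within the sub-second isolation targets listed in Table 5.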

Table 11. IEC 61850 applications in subway systems: status and outlook.

Application Scenario	Current Adoption Level	Key Benefits	Main Implementation Obstacles	Technical Gaps	Standard Extensions Being Explored	Scalability Concerns	Cost/Benefit Analysis	Industry Collaboration	Long-Term Outlook
Protection and Control	High in new installations	Ultra-fast clearing times, standardized object modeling	Integration with legacy DC gear	Custom logical nodes (LNs) for DC traction needed	IEC 61850-90-6 for fault location, isolation, and service restoration (FLISR) indicates extensions toward distribution automation	Mostly manageable, as each station covers a limited number of IEDs	Positive ROI in the long run; short-term high investment	Joint pilots by OEMs and transit agencies	Key segment likely to see continuous growth
GOOSE-Based Signaling	Moderate usage, primarily on the AC side	Millisecond-level event-driven control	Fine-tuning of network redundancy and VLANs	Overlapping VLAN, QoS configurations	Railway-specific LN expansions	Challenging in large city-wide systems but feasible	Often justified by improved reliability	Vendor-driven updates, global standardization	Expanding as reliability demands escalate
Central Monitoring and SCADA Integration	High for new lines, partial retrofit in legacy lines	Standardized data acquisition, unified engineering	Coordinating data from diverse device types	Full mapping for older DC devices incomplete or vendor-proprietary	IEC 61850 for traction automation beyond the substation is under development	System-level expansions in large subway networks	High initial cost, strong ROI in O&M savings over time	Partnerships with system integrators, local operators	Indispensable for future expansions in subways and electrified rail
Condition Monitoring	Emerging interest	Uniform data model, improved predictive maintenance	Retrofitting sensors into older assets	Data volume and storage infrastructure	IEC 61850 extension for advanced sensors	Potentially large as new sensor deployments scale out	Potentially high initial investment, offset by O&M savings over the long term	Potential synergy with predictive maintenance platforms	Highly promising with AI strategies for improved reliability and safety
Integration with MASs	Limited yet growing interest	Common data exchange framework with fast protocols	Achieving consistent LN naming and object structures for all AC/DC equipment	Mapping agent tasks to LN objects remains a major barrier	Under development: bridging railway LN with agent logic (90-16 etc.)	Potentially seamless if standard LN definitions are carefully extended	Long-run ROI promising; short-run complexity high	Consortia for MAS-based standardization efforts are essential	Significant synergy with advanced self-healing capabilities
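
Table 11 credits GOOSE with millisecond-level, event-driven control. As a minimal sketch of the mechanism behind that claim, the Python code below models the two counters a GOOSE publisher carries, stNum (incremented when the published data change) and sqNum (counting repeats of the same state, reset on a state change as commonly described), together with a retransmission burst whose interval grows from a short minimum to a steady heartbeat. The interval values T_MIN_MS and T0_MS and the doubling rule are illustrative assumptions; actual timing is set by device configuration and is not taken from the standard's text or from any vendor.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical timing parameters (configuration-dependent in real devices).
T_MIN_MS = 2      # first retransmission delay after an event
T0_MS = 1000      # steady-state heartbeat interval


@dataclass
class GoosePublisher:
    st_num: int = 1        # state number: changes only when the published data change
    sq_num: int = 0        # sequence number: counts repeats of the current state
    value: bool = False    # e.g. a trip command carried in the dataset
    log: List[Tuple[int, int, int, bool]] = field(default_factory=list)

    def _send(self, t_ms: int) -> None:
        # Record (time, stNum, sqNum, value) instead of putting a frame on the wire.
        self.log.append((t_ms, self.st_num, self.sq_num, self.value))

    def state_change(self, t_ms: int, new_value: bool) -> None:
        """New event: bump stNum, reset sqNum, publish immediately."""
        self.value = new_value
        self.st_num += 1
        self.sq_num = 0
        self._send(t_ms)

    def retransmit_until(self, t_event_ms: int, t_end_ms: int) -> None:
        """Repeat the current state, doubling the interval up to the heartbeat T0."""
        t, interval = t_event_ms, T_MIN_MS
        while t + interval <= t_end_ms:
            t += interval
            self.sq_num += 1
            self._send(t)
            interval = min(interval * 2, T0_MS)


pub = GoosePublisher()
pub.state_change(0, True)          # e.g. a protection trip at t = 0 ms
pub.retransmit_until(0, 3000)
print([entry[0] for entry in pub.log])
```

With these assumed parameters, an event at t = 0 produces messages at 0, 2, 6, 14, 30, 62, 126, 254, 510, 1022 and 2022 ms, all with the same stNum while sqNum counts from 0 to 10. A subscriber that sees stNum change knows a new event occurred, and the dense early repeats make it unlikely that a single lost frame delays a protection action.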

Table 12. Key challenges and research potential for IEC 61850 in subway environments.

Core Challenge	Real-World Impact	Underlying Cause	Potential Mitigations	Ongoing R&D Directions	Standardization Gaps	Implementation Costs	Stakeholder Collaboration	Scalability	Long-Term Opportunities
Retrofitting Legacy Equipment	High in older lines with minimal budgets	Non-compatible IEDs, vendor-proprietary protocols	Gateway solutions, incremental hardware integration	Developing specialized engineering profiles	Limited LN definitions for DC traction elements and bridging logic	Medium (moderate hardware outlay)	Transit agencies, system integrators, vendors	Potentially low with well-planned modular upgrades	High, can extend system lifespan with minimal disruptions
Security Threats to GOOSE/MMS	Potentially extreme disruptions to train services	Larger digital footprint, IP-based communications	Adoption of secure substation gating, role-based access, encryption, intrusion detection frameworks	Enhancing GOOSE, MMS security, emerging best practices	Not fully integrated into IEC 61850 documents	High (due to specialized cybersecurity hardware and software)	Collaboration with cybersecurity experts, standard bodies	High, as networks grow and more connected components are added	Enhanced reliability and passenger safety
Complexity of LN/DO/DA Mapping	Risk of misconfigurations that hamper reliability	Numerous LNs, DOs, and DAs with cryptic nomenclature in large networks	Rigorous workforce training, improved SCL tools, compliance checks, advanced engineering software	Tools for automated SCL checks, global LN expansions for traction	Some LN expansions not widely implemented globally	Medium	Vendor synergy crucial to ensure consistent naming and modeling	Medium; partial automation feasible for new lines	Streamlined expansions for new lines with minimal manual overhead
High-Speed Communication	Vital for self-healing performance but can be limited	GOOSE and SMV traffic congestion or suboptimal routing	QoS management, VLAN segmentation, robust redundancy protocols	Designing advanced traffic shaping and ring redundancy protocols, ongoing expansions for time synchronization	PTP profiles for time synchronization in the traction context	Moderate to high (switches, network)	Industry alliances linking communication vendors and train operators	High if network architecture is well designed from the start	Real-time detection and response enhancements

Table 13. Practical deployment cases for MAS–IEC 61850 in subway systems.

Deployment Scenario	Network Configuration	MAS Scope of Control	IEC 61850 Layer Usage	Integration Complexity	Performance Requirements	Initial Investment	Scalability	Stakeholder Involvement	Long-Term Feasibility
Full Greenfield	New lines with fully digital substations and advanced control	End-to-end, covering AC and DC feeders, station-based MASs, protection, and supervisory tasks	GOOSE for real-time protection, MMS for supervisory tasks	Medium (designed from scratch to integrate MASs with IEC 61850)	High reliability, ultra-low latency needed	Relatively high, but offset by lower future O&M costs	High potential to add new stations and lines seamlessly	Turnkey solution from major manufacturers	Very high; standardized frameworks remain relevant for decades
Partial Retrofit	Existing lines with some digital and some analog equipment	Focus on station-based AC feeder switching or DC traction auto-reconfiguration	GOOSE bridging older and modern devices, MMS for SCADA overlay	High, due to mismatch among legacy gear, older comm protocols, and new LN definitions	Moderate reliability, improved reaction times needed	Moderate to high, given new hardware (protocol gateways, partial re-wiring)	Potentially moderate expansions if planning is carefully carried out	Collaboration with domain experts, vendor interoperability	Feasible, but requires methodical phase-based approach
Station-Focused Deployment	Only substation-level architecture with ring or dual LAN	MASs handle localized fault detection and equipment monitoring	GOOSE for protective actions, limited MMS to the station-level agent	Medium; the scope is confined, but integration with existing station IEDs is still required	Local reliability within stations, no wide-area reconfiguration	Low to moderate, as fewer devices require standard LN definitions	Limited but can be scaled to multiple stations	Typically station staff plus specialized MAS integrators	High for localized improvements; partial but effective

Table 14. Future R&D themes for MAS–IEC 61850 convergence in subway networks.

R&D Focus	Current Exploration	Anticipated Challenges	Proposed Solutions	Dependencies	Potential Impact	Multi-Domain Synergies	Stakeholder Cooperation	Funding Opportunities	Roadmap Timeline
AI-Driven Prediction and Forecasting	Pilot implementations with limited scope	Data heterogeneity from AC/DC equipment and sensor types	Centralized big data platforms ingesting LN data, advanced AI models	5G networks, real-time analytics in agent frameworks	High, can drastically improve self-healing efficiency	Overlap with condition monitoring and dynamic control	Joint programs among subway operators, OEMs, AI developers	Government grants, private R&D funding	3–5 years to robust field deployments
Integration with Edge and Fog Computing	Conceptual stage in some research labs	Edge computing infrastructure cost, cybersecurity issues, standard APIs	Deploy compact agent hardware, containerized LN-based virtualization	Real-time operating systems, advanced QoS management	Moderate to high; can reduce latency and enhance resilience	IoT-driven station automation, synergy with AI services	Requires coordination between computing and utility standardization bodies	Industry–academia partnerships	3–7 years for large-scale acceptance
Cybersecurity Frameworks for GOOSE and MMS	Early adoption of stronger encryption or role-based access	Full encryption for GOOSE might hamper performance	GOOSE extension with minimal overhead, role-based access for MASs	Next-gen cryptography frameworks, post-quantum cryptography	Extremely high, especially for vital city infrastructure	Cross-pollination from IT and OT sectors on intrusion detection	Government-level directives, transit agencies, vendors	Dedicated security R&D initiatives, global standard bodies	Ongoing evolution as standards mature and threats escalate
Standard Extensions for DC Traction LN	Ongoing expansions to define new LN classes in the IEC 61850-7 series	Fragmented LN coverage for DC traction, inconsistent vendor support	Formal LN definitions for DC traction, synergy with existing AC LN	Collaboration among large railway operators, standard committees	High, bridging the gap between AC substation standards and DC-based railway standards	Closer alignment with railway committees and bodies, advanced pilot programs	WG-level involvement from IEC and IRIS-like organizations	Potentially large funding from major suppliers, government R&D programs	2–5 years to finalize LN amendments and test them in pilot projects

Table 15. Key technologies for substation-level self-healing.

Technology	Application Scenario	Degree of Implementation	Differences Between Systems	Future Prospects	Issues to Address	Research Potential	Reliability Impact	Cost Considerations	Maintenance Requirements	Impact on Power Quality
Fault Detection Algorithms	Detection of electrical faults	Moderate	Varies by system type	High	False positives, sensitivity	High	Significant	Low	Medium	High
Remote Control and Isolation	Isolation of faulty segments	High	Advanced systems available	Moderate	Communication delays	Moderate	High	Medium	Low	High
Automated Reconfiguration	System restoration after isolation	High	Varies in implementation	High	Delays in reconfiguration	High	High	High	Low	Medium
Predictive Maintenance Systems	Fault prediction and maintenance	High	Available in some systems	High	Data accuracy	High	High	Medium	High	High
Real-Time Monitoring Systems	Continuous system monitoring	Very High	Available in all modern systems	Very High	Requires significant infrastructure	High	Very High	High	High	Very High
AI-Based Reconfiguration	Dynamic system restoration and reconfiguration	Moderate	New and emerging technology	Very High	Data communication delays	High	Very High	High	Medium	Very High

Table 16. Comparative analysis of line-level self-healing mechanisms in subway power networks.

Mechanism	Application Scenario	Degree of Implementation	Differences Between Systems	Future Prospects	Issues to Address	Research Potential	Reliability Impact	Cost Considerations	Maintenance Requirements	Impact on Power Quality
Ring Network Reconfiguration	Network reconfiguration after faults	High	Varies by design	High	Risk of power surges	High	High	Medium	Medium	High
Automated Switches	Fault isolation and rerouting	High	Available in most systems	Moderate	Time delays in operation	Moderate	High	Medium	Low	Medium
MAS-Based Decision Making	Real-time fault management	Moderate	Innovative for smart grids	High	Data communication overhead	High	Significant	Low	Medium	High
Fault Detection Sensors	Detecting and locating faults	High	Critical for efficient systems	High	False negatives	Moderate	High	Medium	High	High
Real-Time Load Balancing	Dynamic balancing of network load	High	Can differ across systems	Very High	Coordination complexity	High	High	High	Low	High
Adaptive Rerouting	Adjusting power flow dynamically	High	Cutting-edge technology	Very High	Requires high data throughput	High	Very High	High	Medium	Very High
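
Table 16 lists ring network reconfiguration and automated switches as the main line-level mechanisms. A minimal sketch of the isolate-then-restore sequence on an open-loop (normally open ring) feeder is shown below. It assumes the fault has already been located to one section, that each section is bounded by remotely operable switches, and that a normally open tie switch connects the far end to a second source B with known spare capacity. The function name flisr_open_loop, the topology, and the load figures are hypothetical.

```python
from typing import Dict, List


def flisr_open_loop(loads_kw: List[float], fault_section: int,
                    spare_b_kw: float) -> Dict[str, list]:
    """Isolate a faulted section on an open-loop feeder and restore what is feasible.

    Hypothetical topology: A -sw0- S0 -sw1- S1 - ... - S(n-1) -sw_n- B,
    where switch i is the upstream boundary of section i and sw_n is a
    normally open tie switch to a second source B.
    """
    n = len(loads_kw)
    if not 0 <= fault_section < n:
        raise ValueError("fault_section out of range")
    closed = [True] * n + [False]           # sw0..sw(n-1) closed, tie open

    # Isolation: open the switches on both sides of the faulted section.
    closed[fault_section] = False
    closed[fault_section + 1] = False

    # Upstream restoration: with the fault bounded, source A feeds the sections before it.
    restored = list(range(fault_section))

    # Downstream restoration: close the tie only if source B has enough headroom.
    downstream = list(range(fault_section + 1, n))
    if downstream and sum(loads_kw[i] for i in downstream) <= spare_b_kw:
        closed[n] = True
        restored += downstream

    unserved = [i for i in range(n) if i not in restored]
    return {"closed": closed, "restored": restored, "unserved": unserved}


print(flisr_open_loop([200, 300, 250, 150], fault_section=1, spare_b_kw=500))
```

With section loads of 200, 300, 250 and 150 kW, a fault in section 1 and 500 kW of headroom at B, the sketch opens the two switches around section 1, closes the tie, and leaves only section 1 unserved. With only 300 kW of headroom, the downstream 400 kW cannot be transferred and sections 1 to 3 remain de-energized. A practical scheme would also check voltage limits and protection settings before closing the tie; the single capacity test here is a simplification.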

Table 17. Comparative analysis of cross-layer fault recovery mechanisms in subway power networks.

Mechanism	Application Scenario	Degree of Implementation	Differences Between Systems	Future Prospects	Issues to Address	Research Potential	Reliability Impact	Cost Considerations	Maintenance Requirements	Impact on Power Quality
Hierarchical Recovery	Coordinated fault recovery	High	Varies across systems	Very High	System complexity	High	Very High	High	High	Very High
Cross-Layer Isolation	Fault isolation across multiple layers	Moderate	New and emerging systems	High	Data consistency	High	High	Medium	High	High
MASs for Multi-Layer Coordination	Distributed decision making in recovery	High	Requires data consistency	Very High	Communication delays	High	Very High	High	Medium	Very High
Adaptive Load Balancing	Balancing load across multiple levels	High	Critical for large systems	Very High	Requires real-time data	High	High	High	Low	High
Data Integration Systems	Integration of data across multiple layers	Moderate	Critical for decision making	High	Data synchronization issues	High	High	High	Medium	Very High
Predictive Recovery Algorithms	Anticipating faults and recovery actions	Moderate	Experimental in some systems	High	Lack of accurate models	Moderate	High	Low	High	Very High

Table 18. AI-driven fault diagnosis and reconfiguration in subway power systems.

Mechanism	Application Scenario	Degree of Implementation	Differences Between Systems	Future Prospects	Issues to Address	Research Potential	Reliability Impact	Cost Considerations	Maintenance Requirements	Impact on Power Quality
Hierarchical Recovery	Coordinated fault recovery	High	Varies across systems	Very High	System complexity	High	Very High	High	High	Very High
Cross-Layer Isolation	Fault isolation across multiple layers	Moderate	New and emerging systems	High	Data consistency	High	High	Medium	High	High
MASs for Multi-Layer Coordination	Distributed decision making in recovery	High	Requires data consistency	Very High	Communication delays	High	Very High	High	Medium	Very High
Adaptive Load Balancing	Balancing load across multiple levels	High	Critical for large systems	Very High	Requires real-time data	High	High	High	Low	High
Data Integration Systems	Integration of data across multiple layers	Moderate	Critical for decision making	High	Data synchronization issues	High	High	High	Medium	Very High
Predictive Recovery Algorithms	Anticipating faults and recovery actions	Moderate	Experimental in some systems	High	Lack of accurate models	Moderate	High	Low	High	Very High

Table 19. Key dimensions of AI-enhanced fault diagnosis and prognostics in subway power systems.

Application Scenario	Implementation Stage	Distinct Features	Prospects	Challenges	Future Research Potential	Data Requirements	Level of System Impact
Transformer Monitoring	Emerging	AI-based sensor data fusion	Extended component lifespan	Lack of labeled failure data	Transfer learning for rare faults	High-frequency sensor logs	Medium to high
Switchgear Fault Detection	Intermediate	Real-time analytics at the edge	Rapid fault isolation	Complexity of multi-vendor systems	Federated learning for distributed sites	Medium-volume event-driven data streams	High
Cable Insulation Prognostics	Early Adoption	Predictive modeling via ML	Reduced unplanned outages	Environmental variability	Hybrid physics-data-driven approaches	Continuous partial discharge measurements	Medium
Substation Asset Management	Emerging	Digital twins with AI forecasting	Intelligent maintenance scheduling	Integration with legacy SCADA	Explainable AI for operator trust	Historical maintenance and operation logs	High
Voltage/Current Anomaly Alerts	Intermediate	DL-based pattern recognition	Near-instant fault recognition	High false-positive risk	Active learning with operator feedback	High-frequency waveform data	Medium to high
Power Converter Monitoring	Early Adoption	CNN-based image analysis	Improved reliability and efficiency	Model interpretability	Domain adaptation from similar systems	Thermal imagery, high-speed sensor data	Medium
Passenger Load Prediction (Indirect for Fault Stress)	Experimental	AI correlation with ridership data	Load management optimization	Data privacy concerns	Multi-modal integration (ridership and power)	Access to fare collection and energy data	Low to medium
Overhead Line Wear Detection	Experimental	Computer vision for wear detection	Enhanced safety and service life	Sensor deployment challenges	UAV and robotics-based inspection	High-resolution imagery and real-time streams	Medium
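
Table 19 notes that voltage/current anomaly alerts carry a high false-positive risk. The sketch below does not implement the DL-based pattern recognition the table mentions; it swaps in a simple statistical baseline, a rolling z-score with a persistence requirement, to show how confirming an anomaly over several consecutive samples trades a little detection delay for fewer one-sample false alarms. The window, threshold, and persistence values, as well as the voltage trace, are illustrative assumptions.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, List


def zscore_alerts(samples: Iterable[float], window: int = 20,
                  threshold: float = 4.0, persistence: int = 3) -> List[int]:
    """Return the indices at which an anomaly alert is confirmed."""
    history: deque = deque(maxlen=window)    # recent samples judged normal
    streak = 0                               # consecutive abnormal samples so far
    alerts: List[int] = []
    for i, x in enumerate(samples):
        abnormal = False                     # no verdict until the baseline is full
        if len(history) == window:
            baseline = list(history)
            mu, sigma = mean(baseline), pstdev(baseline)
            abnormal = abs(x - mu) > threshold * max(sigma, 1e-9)
        if abnormal:
            streak += 1
            if streak == persistence:        # confirmed: report once per event
                alerts.append(i)
        else:
            streak = 0
            history.append(x)                # only normal samples update the baseline
    return alerts


# Hypothetical DC traction voltage trace (V): noise, a one-sample spike, then a sustained dip.
trace = [1500.0 if i % 2 == 0 else 1502.0 for i in range(40)]
trace += [1800.0] + [1500.0, 1502.0] * 4 + [1000.0] * 5
print(zscore_alerts(trace))
```

The isolated spike at index 40 is ignored, while the sustained dip starting at index 49 is confirmed at index 51. Because abnormal samples are kept out of the baseline, a sustained dip keeps registering as abnormal instead of being absorbed; the flip side is that a legitimate, permanent change in operating point would need an explicit baseline reset.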

Table 20. Core aspects of RL-driven self-healing in subway power systems.

Application Scenario	Adoption Level	RL Methodology	Key Advantages	Challenges	Future Research Potential	Distinct Operational Constraints	Long-Term Prospects
Feeder Reconfiguration	Pilot Studies	Deep Q-Networks	Automated fault isolation	Large state-action space	Transfer learning from simulation	Voltage, current, and safety margins	High, with proven pilot successes
Load Balancing in Peak Hours	Emerging	Policy Gradient	Dynamic response to changing demand	Maintaining service continuity	Meta-RL for rapid policy updates	Passenger safety, train schedules	High, crucial for growing urban demands
Multi-Substation Coordination	Experimental	Cooperative RL	Global optimization of resources	Communication overhead among agents	Hierarchical RL for layered coordination	Data exchange and synchronization	Medium to high, depends on standardization
Integration with MASs	Early Adoption	A2C, PPO	Distributed, scalable intelligence	Complexity of multi-agent negotiations	Hybrid MAS–RL frameworks	Cybersecurity for distributed agents	High, synergy with self-healing goals
Emergency Fault Recovery	Proof-of-Concept	Model-Based RL	Preemptive planning and quick restore	Ensuring real-time updates of system model	Real-time digital twins and advanced simulation	Strict time constraints	Medium, requires robust real-world data
Autonomous Voltage Regulation	Conceptual Studies	Off-policy RL	Reduces human oversight	Risk of instability if policy is incorrect	Offline learning with partial environment models	Regulatory compliance and device limitations	Medium, depends on regulatory acceptance
Signaling-Power Coordination	Emerging	Multi-Agent RL	Holistic approach to operational safety	Complexity of multi-objective optimization	Cross-domain RL frameworks for signal and power data	Interdisciplinary data standards	High, synergy between power and signaling
Energy Storage Management	Preliminary	Hybrid RL	Minimizes energy costs and improves reliability	Uncertain battery degradation profiles	Transfer and continual learning for battery health	Battery lifetime and cost constraints	Medium, depends on cost-effectiveness
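
Table 20 lists Deep Q-Networks for feeder reconfiguration. A DQN is beyond a short example, so the sketch below uses tabular Q-learning on a deliberately tiny, hypothetical problem: the state is the faulted section of a four-section open-loop feeder, an action chooses which section to isolate and whether to close the tie switch, and each episode is a single decision, so the Q-learning target reduces to the immediate reward. The loads, the 500 kW headroom, the penalties, and the tie-switching cost are all assumptions chosen only to make the trade-offs visible.

```python
import random
from typing import Dict, List, Tuple

LOADS_KW = [200.0, 300.0, 250.0, 150.0]   # hypothetical section loads
SPARE_B_KW = 500.0                         # hypothetical headroom at tie source B
TIE_COST = 10.0                            # hypothetical penalty for operating the tie
N = len(LOADS_KW)
# Action = (section to isolate, close the tie switch afterwards?)
ACTIONS: List[Tuple[int, bool]] = [(j, tie) for j in range(N) for tie in (False, True)]


def reward(fault: int, action: Tuple[int, bool]) -> float:
    """Restored load in kW, minus penalties for unsafe or overloading actions."""
    isolated, close_tie = action
    if isolated != fault:
        return -1000.0                     # faulted section left energized
    served = sum(LOADS_KW[:fault])         # upstream sections fed from source A
    downstream = sum(LOADS_KW[fault + 1:])
    if close_tie:
        if downstream > SPARE_B_KW:
            return -500.0                  # closing the tie would overload source B
        served += downstream - TIE_COST
    return served


def train(episodes: int = 20000, alpha: float = 0.2, epsilon: float = 0.3,
          seed: int = 0) -> Dict[int, List[float]]:
    rng = random.Random(seed)
    q = {s: [0.0] * len(ACTIONS) for s in range(N)}
    for _ in range(episodes):
        s = rng.randrange(N)                              # a random fault location
        if rng.random() < epsilon:
            a = rng.randrange(len(ACTIONS))               # explore
        else:
            a = max(range(len(ACTIONS)), key=lambda i: q[s][i])  # exploit
        r = reward(s, ACTIONS[a])
        # One-step episodes: the next state is terminal, so the target is r alone.
        q[s][a] += alpha * (r - q[s][a])
    return q


def greedy_policy(q: Dict[int, List[float]]) -> Dict[int, Tuple[int, bool]]:
    return {s: ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])] for s in q}


print(greedy_policy(train()))
```

The learned greedy policy isolates the faulted section in every case and closes the tie only when the downstream load fits within source B's headroom (faults in sections 1 and 2); for a fault in section 0 the 700 kW downstream load would overload B, and for section 3 there is nothing downstream to restore. A DQN replaces the table with a neural network so that the same idea can extend to the large state-action spaces Table 20 flags as the main challenge.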

Table 21. Implications of integrating AI, MASs, and IEC 61850 in subway power systems.

Integration Dimension	Current Maturity Level	MAS Role	IEC 61850 Feature	AI Contribution	Key Advantages	Major Challenges	Long-Term Potential
Substation Automation	Intermediate	Agents for local protection schemes	GOOSE for event-driven messaging	Predictive analytics for fault detection	Faster response, standard data modeling	Ensuring backward compatibility, vendor constraints	Very high (foundational for self-healing)
Energy Routing and Sharing	Emerging	Distributed load management agents	MMS-based data exchange	RL for optimal scheduling	Improved energy efficiency, reduced costs	Coordination complexity	High, particularly for integrated urban networks
Real-Time Fault Recovery	Early Adoption	Negotiation protocols among agents	Sampled Values (SV) for high-fidelity data	Fast reconfiguration decisions	Autonomous fault isolation and restoration	Handling concurrency and data bursts	High, improves reliability and safety
Predictive Maintenance	Intermediate	Coordination among asset-level agents	Structured data for sensor readings	ML-based asset health predictions	Reduced downtime, extended equipment life	Integrating with legacy diagnostic systems	Medium to high, reliant on data quality
Resilience under Cyberattacks	Conceptual Studies	Agents implementing security policies	Role-based access control features	Anomaly detection in network traffic	Enhanced system security and resilience	Evolving threat landscape	Medium, but essential for modern systems
Integration of Renewables	Pilot	Agents to manage local generation	Enhanced IEC 61850 DER profiles	Multi-objective optimization (AI)	Reduced carbon footprint, diversified energy mix	Uncertainty in renewable supply	Medium to high, synergy with green policies
Network-wide Optimization	Emerging	Hierarchical MAS architecture	Interoperability across devices	Graph-based AI for global optimization	Holistic approach to load flow and reliability	Scalability of centralized–decentralized hybrids	Very high, potential step-change in performance
Human–Machine Collaboration	Early Research	Agent-based interactive dashboards	Standardized data for UI integration	Explainable AI for operator guidance	Improved situational awareness and operator trust	Complexity in interface design, training staff	Medium, fosters acceptance of AI decisions
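
Table 21 assigns MASs negotiation protocols for real-time fault recovery. One classic MAS negotiation pattern is the Contract Net Protocol: a manager agent announces a task, contractor agents bid, and the manager awards the task to the best bid. The sketch below applies that pattern to picking up load orphaned by a fault, assuming each candidate feeder agent bids its post-transfer loading ratio and refuses when the transfer would exceed its rating. Agent names, ratings, and the bidding rule are hypothetical, and messaging is reduced to direct method calls.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FeederAgent:
    """Contractor agent for a supply path that could pick up orphaned load."""
    name: str
    rating_kw: float
    load_kw: float

    def bid(self, demand_kw: float) -> Optional[float]:
        """Answer a call for proposals with the post-transfer loading ratio,
        or None (refuse) if the transfer would exceed the rating."""
        new_load = self.load_kw + demand_kw
        if new_load > self.rating_kw:
            return None
        return new_load / self.rating_kw

    def award(self, demand_kw: float) -> None:
        self.load_kw += demand_kw


def contract_net(demand_kw: float, contractors: List[FeederAgent]) -> Optional[str]:
    """Manager side: announce the task, collect bids, award the lowest loading ratio."""
    bids = [(agent.bid(demand_kw), agent) for agent in contractors]
    feasible = [(ratio, agent) for ratio, agent in bids if ratio is not None]
    if not feasible:
        return None                      # no feasible transfer: the load stays shed
    _, winner = min(feasible, key=lambda pair: pair[0])
    winner.award(demand_kw)
    return winner.name


agents = [FeederAgent("A", 1000, 800), FeederAgent("B", 1200, 700), FeederAgent("C", 600, 100)]
print(contract_net(300, agents), contract_net(300, agents), contract_net(600, agents))
```

With feeders A (800 of 1000 kW), B (700 of 1200 kW) and C (100 of 600 kW), a first 300 kW request is awarded to C, the least-loaded feasible bidder; a second 300 kW request then goes to B because C would now exceed its rating; a 600 kW request finds no feasible bidder and remains shed.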

Table 22. Core dimensions of cybersecurity, privacy, and data management in AI-driven subway power systems.

Focus Area	Risk Level	Primary Threats	Key Security Measures	Privacy Considerations	Data Governance Needs	Implementation Complexity	Future Development Outlook
Data Poisoning Prevention	High	Altered training datasets	Trusted data pipelines, robust data validation	Minimizing personal data usage	Clear ownership of dataset curation and updates	Medium to high	More advanced anomaly detection
Intrusion and Ransomware Defense	Very High	Unauthorized system access, encryption of operational data	Multi-factor authentication, network segmentation	Potential exposure of passenger data	Comprehensive role-based permissions	High	Zero-trust architectures, advanced IDS
Sensor and Edge Device Security	Medium	Spoofing or tampering with local sensors	Secure hardware modules, signed firmware updates	Minimal retention of localized data	Local data lifecycle policies	Medium	Widespread adoption of secure edge computing
Privacy-Preserving Analytics	High	Inadvertent personal data gathering	Differential privacy, anonymization, secure computation	Regulatory compliance (GDPR, etc.)	Clear guidelines on data usage and sharing	Medium	Adoption of standard privacy frameworks
Encryption and Secure Protocols	High	Eavesdropping on communication channels	End-to-end encryption, TLS-based solutions	Minimal stored passenger identifiers	IEC 61850 extension with security profiles	Medium	Integration with post-quantum algorithms
Data Lifecycle Management	Low to Medium	Retention of outdated or unverified data	Automated purging, archiving policies	Proper anonymization before storage	Centralized metadata, consistent versioning	Medium	Growth of advanced data-lake solutions
Incident Response and Recovery	Very High	Prolonged downtime, compromised assets	Redundant backups, well-defined playbooks, real-time alerts	Data subject notifications if breach occurs	Legislative alignment with local government policies	High	AI-driven automated containment solutions
Compliance and Certification	Medium	Penalties for non-compliance	Frequent audits, standardized frameworks	Transparent privacy statements	Align with international standards (ISO, IEC)	Medium	Greater emphasis on multi-stakeholder certification

Table 23. Socio-economic and operational dimensions of AI-driven subway power systems.

Dimension	Operational Impact	Economic Influence	Policy Framework	Workforce Implications	Urban Development	Challenges	Long-Term Outlook
Reliability and Service Quality	Fewer disruptions, faster recovery	Higher ridership, reduced compensation costs	Safety regulations, service standards	Shift toward strategic oversight	Public trust in mass transit	Ensuring AI reliability and acceptance	High, fosters public adoption of subways
Cost Structure and Funding	Reduced O&M expenses, capital reallocation	Potential new revenue streams (data monetization)	PPP frameworks, capital incentives	Demand for financial data analysts	Reinforcement of transit networks	Cost of AI integration	Medium to high, dependent on ROI
Workforce Transition	Automated routine tasks, improved safety	Indirect cost savings from fewer human errors	Labor guidelines, reskilling grants	Need for data science and AI specialists	Enhanced system stability	Resistance to change, union negotiations	Medium, requires policy and education synergy
Environmental Sustainability	Energy optimization, synergy with renewables	Lower carbon footprint, positive brand image	Green certifications, carbon credits	Additional roles for sustainability officers	Integration with EV infrastructure	Gaps in grid readiness, technology unproven in some areas	High, part of broader city climate goals
Innovation and Technology Ecosystem	Faster deployment of advanced solutions	Stimulates local tech sectors, fosters start-ups	IP regulations, open data policies	Collaborative R&D roles across universities and operators	Enhanced city-wide innovation	Balancing proprietary and open-source solutions	High, strong synergy with digital economy
Urban Resilience	Quick adaptation to unexpected events	Reduces economic losses from major incidents	Disaster preparedness rules, city planning	Cross-functional roles in risk management	Encourages stronger public transport usage	Complexity of integrating multiple infrastructures	High, critical for disaster mitigation
Regulatory Alignment	Compliance with safety/operational mandates	Avoids penalties, fosters public–private partnerships	Evolving standards for AI and data usage	Higher accountability for operators	Possible expansion of rail services	Complexity of multi-layer governance	Medium, depends on legislative agility
Public Trust and Acceptance	Transparent, real-time communication	Potential for fare policy changes, better ridership	Privacy protection, public engagement	Emphasis on communication skills in staff training	Improved passenger satisfaction	Data privacy concerns, potential for misunderstandings	High, essential for widespread adoption

Table 24. Summary of security threats, mitigation strategies, and their implementation priorities.

Security Threat	Potential Impact	Mitigation Strategy	Implementation Complexity	Priority Level
Data Poisoning	Degraded AI model performance and incorrect decisions	Secure data pipelines, anomaly detection, and data validation	High	Very High
Sensor Spoofing	Misleading data leading to incorrect fault isolation	Encryption of sensor data, anomaly detection, sensor authentication	Medium	High
Ransomware and DoS Attacks	Disruption of system functionality and operational downtime	Regular backups, intrusion detection systems, secure communication	High	Very High
Unauthorized Access	Compromised system control and decision-making	Multi-factor authentication, access control policies, secure protocols	High	Very High
Communication Latency	Delayed fault detection and recovery	Edge computing, low-latency communication protocols	Medium	High
Spoofing of AI Decisions	Inaccurate decisions leading to system instability	Secure AI pipelines, explainable AI, anomaly detection	High	Medium
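
Table 24 lists sensor authentication as a mitigation for sensor spoofing. A minimal sketch of message authentication for sensor readings is shown below, using HMAC-SHA256 from Python's standard library plus a per-sensor counter that rejects replayed messages. The pre-shared key, message fields, and JSON encoding are assumptions for illustration; this is not an implementation of any IEC 61850 or IEC 62351 security profile, and a real deployment would obtain keys from a managed key distribution scheme.

```python
import hashlib
import hmac
import json
from typing import Dict, Tuple

DEMO_KEY = b"demo-shared-key"   # hypothetical pre-shared key, for illustration only


def _mac(key: bytes, body: Dict) -> str:
    payload = json.dumps(body, sort_keys=True).encode()    # canonical byte encoding
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def sign(sensor_id: str, counter: int, value: float, key: bytes = DEMO_KEY) -> Dict:
    """Sensor side: attach an HMAC-SHA256 tag over id, counter and value."""
    body = {"id": sensor_id, "ctr": counter, "val": value}
    return {**body, "tag": _mac(key, body)}


class Verifier:
    """Receiver side: check the tag, then reject stale or replayed counters."""

    def __init__(self, key: bytes = DEMO_KEY) -> None:
        self.key = key
        self.last_ctr: Dict[str, int] = {}

    def verify(self, msg: Dict) -> Tuple[bool, str]:
        body = {k: msg[k] for k in ("id", "ctr", "val")}
        if not hmac.compare_digest(_mac(self.key, body), msg.get("tag", "")):
            return False, "bad tag"      # altered payload or wrong key
        if body["ctr"] <= self.last_ctr.get(body["id"], -1):
            return False, "replay"       # counter did not advance
        self.last_ctr[body["id"]] = body["ctr"]
        return True, "ok"


v = Verifier()
reading = sign("DCV-01", 1, 1500.0)
print(v.verify(reading), v.verify(reading))
```

The verifier accepts a fresh, correctly tagged reading, rejects the same message when it is replayed, and rejects any reading whose value was altered after signing or that was tagged with a different key. Checking the tag before the counter ensures that a forged message cannot advance the stored counter.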

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

