Review

Integration of Multi-Agent Systems and Artificial Intelligence in Self-Healing Subway Power Supply Systems: Advancements in Fault Diagnosis, Isolation, and Recovery

by Jianbing Feng 1,2, Tao Yu 1, Kuozhen Zhang 3,* and Lefeng Cheng 4,*
1 School of Electric Power, South China University of Technology, Guangzhou 510641, China
2 Guangzhou Metro Construction Management Co., Ltd., Guangzhou 510330, China
3 Law School, Shantou University, Shantou 515063, China
4 School of Mechanical and Electrical Engineering, Guangzhou University, Guangzhou 510006, China
* Authors to whom correspondence should be addressed.
Processes 2025, 13(4), 1144; https://doi.org/10.3390/pr13041144
Submission received: 18 March 2025 / Revised: 3 April 2025 / Accepted: 8 April 2025 / Published: 10 April 2025

Abstract

The subway power supply system, as a critical component of urban rail transit infrastructure, plays a pivotal role in ensuring operational efficiency and safety. However, current systems remain heavily dependent on manual interventions for fault diagnosis and recovery, limiting their ability to meet the growing demand for automation and efficiency in modern urban environments. While the concept of “self-healing” has been successfully implemented in power grids and distribution networks, adapting these technologies to subway power systems presents distinct challenges. This review introduces an innovative approach by integrating multi-agent systems (MASs) with advanced artificial intelligence (AI) algorithms, focusing on their potential to create fully autonomous self-healing control architectures for subway power networks. The novel contribution of this review lies in its hybrid model, which combines MASs with the IEC 61850 communication standard to develop fault diagnosis, isolation, and recovery mechanisms specifically tailored for subway systems. Unlike traditional methods, which rely on centralized control, the proposed approach leverages distributed decision-making capabilities within MASs, enhancing fault detection accuracy, speed, and system resilience. Through a thorough review of the state of the art in self-healing technologies, this work demonstrates the unique benefits of applying MASs and AI to address the specific challenges of subway power systems, offering significant advancement over existing methodologies in the field.

1. Introduction

With the rapid pace of global urbanization, subways have become an essential solution to alleviate urban traffic congestion. According to the China Urban Rail Transit Association, subway operating mileage and passenger numbers continue to grow, cementing subways as a cornerstone of urban transportation [1]. As urban populations expand, ensuring the reliability of subway power supply systems has become increasingly crucial. Failures in the power supply system can lead to disruptions in subway operations, negatively impacting passenger safety and system efficiency. Traditionally, these systems have relied on manual interventions for fault diagnosis and recovery, limiting their ability to address the growing demand for automation and rapid response in modern urban environments.
To address these critical challenges, this review presents a novel approach for integrating multi-agent systems (MASs) with advanced artificial intelligence (AI) algorithms to enable fully autonomous self-healing capabilities in subway power systems. The integration of MASs with AI technologies aims to enhance subway power systems’ ability to detect, isolate, and recover from faults more efficiently than traditional methods, which heavily rely on centralized control. This hybrid system enables distributed decision-making, allowing for real-time, local fault detection and diagnosis without central authority, thus reducing response times and improving system resilience.
The main objectives of this review are as follows:
(1)
Investigate the historical development and current state of self-healing technologies in power supply systems, with a particular focus on their adaptation and application in subway power systems.
(2)
Analyze how MASs and AI enhance the capabilities of subway systems in fault detection, isolation, and recovery, enabling autonomous decision-making and real-time responses to system failures.
(3)
Examine the integration of the IEC 61850 communication standard with MASs [2], and how this contributes to decentralized control, improving fault recovery and enhancing the scalability of self-healing systems in subway power networks.
(4)
Address the unique challenges faced by subway systems, such as reliability, response times, fault management, and system resilience, and propose integrated solutions through the application of MASs and AI.
The reliability and efficiency of subway systems are tightly coupled with the performance and stability of their power supply systems. Ensuring uninterrupted service and the safety of passengers requires that power systems be able to self-heal, automatically recovering from faults and minimizing disruptions. While self-healing technology has been widely researched and implemented in power grids, adapting this technology to subway systems presents a distinct set of challenges due to the unique operational environment of urban rail systems. Traditional centralized control methods often fail to provide the level of speed, accuracy, and resilience required for the dynamic and complex environment of subway power systems.
Self-healing technologies in power systems allow for autonomous fault recovery without relying on human intervention, improving overall system reliability. First introduced in U.S. power grid systems, this technology enables the automatic identification and isolation of faults and the restoration of power, significantly improving system performance and minimizing downtime [3]. This review explores how MASs and AI, integrated with the IEC 61850 communication standard, offer a decentralized, autonomous approach that is a marked improvement over traditional fault recovery methods. This combination has the potential to provide faster and more efficient responses to faults in subway power systems.
The hybrid model proposed in this review utilizes MASs to decentralize decision-making, allowing each agent within the system to independently detect, diagnose, and resolve faults. The decentralized nature of MASs enhances fault detection by distributing decision-making processes across multiple system components, enabling quicker responses and improving fault isolation accuracy. Furthermore, the integration of AI enhances predictive capabilities, enabling the system to anticipate potential failures and proactively manage faults before they escalate into service disruptions. Compared to traditional centralized methods, this decentralized approach offers greater flexibility, scalability, and resilience, addressing the dynamic and increasingly complex demands of modern subway networks.
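The decentralized detect, isolate, and restore sequence described above can be illustrated with a minimal sketch. It is not the architecture of any cited study: the `SectionAgent` class, the pickup threshold, the radial single-source feeder, and the normally open tie switch are all assumptions made for illustration (real DC traction networks are often fed from both ends, which complicates the rule used here).

```python
from dataclasses import dataclass


@dataclass
class SectionAgent:
    """Hypothetical agent guarding one feeder section (illustrative only)."""
    name: str
    saw_fault_current: bool = False
    isolated: bool = False

    def measure(self, current_a: float, pickup_a: float) -> None:
        # Local detection: flag an overcurrent without a central controller.
        self.saw_fault_current = current_a > pickup_a


def locate_faulted_section(agents):
    """Return the index of the faulted section, or None if no agent saw a fault.

    Assumes a radial feeder supplied from the left: every agent upstream of
    the fault sees fault current and every agent downstream does not, so the
    fault lies in the section of the last agent that saw it.
    """
    for i, agent in enumerate(agents):
        has_next = i + 1 < len(agents)
        downstream_saw = agents[i + 1].saw_fault_current if has_next else False
        if agent.saw_fault_current and not downstream_saw:
            return i
    return None


def isolate_and_restore(agents, faulted):
    """Isolate the faulted section and report how the healthy sections are fed."""
    if faulted is None:
        return {"isolated": None, "normal_source": [a.name for a in agents], "tie_source": []}
    agents[faulted].isolated = True
    # Upstream sections stay on the normal source; downstream sections are
    # picked up through the assumed normally open tie switch.
    return {
        "isolated": agents[faulted].name,
        "normal_source": [a.name for a in agents[:faulted]],
        "tie_source": [a.name for a in agents[faulted + 1:]],
    }


# Example: a fault in section S1, so S0 and S1 measure fault current.
agents = [SectionAgent(f"S{i}") for i in range(4)]
currents_a = [5200.0, 5100.0, 300.0, 280.0]
for agent, current in zip(agents, currents_a):
    agent.measure(current, pickup_a=2000.0)

faulted = locate_faulted_section(agents)
plan = isolate_and_restore(agents, faulted)
print(plan)
```

In this sketch each agent only compares its own flag with that of its downstream neighbour, which is the sense in which the decision is distributed; a practical system would also need to verify that the alternative source has enough capacity before closing the tie switch.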
The integration of MASs with the IEC 61850 communication standard represents a novel approach in self-healing technology, moving beyond conventional methods that primarily rely on centralized control systems. By empowering each agent within the system to independently detect, diagnose, and resolve faults, this hybrid architecture offers a significant improvement in speed, accuracy, and system resilience compared to traditional fault recovery methods. Through this novel integration, we aim to provide a comprehensive and scalable solution to enhance the reliability of subway power systems, setting the foundation for more autonomous urban transport networks.
Although subway power systems differ from traditional power grids in function and requirements, they similarly demand high reliability and fast response capabilities. Wang (2010) [4] and Du (2010) [5] explored fault diagnosis and protection methods in traction power systems, laying the foundation for research into self-healing technologies in subway systems. Subsequently, research into fault response and recovery in subway systems began incorporating MASs and AI to enhance the automation and intelligence of fault handling [6,7]. The application of MASs in subway power systems mainly focuses on optimizing fault detection, diagnosis, and system recovery. Song (2015) conducted an in-depth study of fault location in urban rail transit traction power systems, proposing MAS-based optimization strategies [8]. Additionally, AI technologies, particularly machine learning and deep learning, have been widely applied in fault data analysis and fault prediction [9,10].
Research on self-healing technology in subway power systems not only emphasizes rapid fault recovery but also explores how technological integration and innovation can improve overall system stability and reliability. For instance, Wang and Lv (2022) improved fault location accuracy and efficiency by studying fault point distance measurement methods in direct current (DC) traction power systems [11]. Another unique challenge faced by subway systems is how to restore power quickly without interrupting service. Wei et al. (2023) utilized global positioning system (GPS) time synchronization technology to enhance fault distance measurement accuracy, providing technical support for the rapid recovery of subway systems [12]. Additionally, Jin et al. (2017) conducted simulation studies on fault location in subway DC power systems using time-domain differentiation methods, improving fault response efficiency [13]. The integration of advanced AI technologies and real-time communication protocols like IEC 61850 has further enhanced the fault diagnosis process, enabling subway systems to predict and address multi-fault scenarios that traditional methods would struggle to handle. Reliability studies are also a crucial aspect of the development of self-healing technologies in subway power systems. Pei (2018) conducted an in-depth study on the reliability of subway traction power systems, identifying key technologies and methods for improving system reliability [14]. Meanwhile, Zhou (2012), in his master’s dissertation, analyzed online reliability assessments of subway power systems, providing scientific support for real-time monitoring and maintenance [15].
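To make concrete why time synchronization matters for fault distance measurement, the sketch below uses double-ended travelling-wave location as an assumed example; the cited studies' exact methods are not reproduced here. The function name, the 2 km line length, and the propagation speed of 1.5e8 m/s are illustrative assumptions. If a fault launches a wavefront that reaches terminal A at time t_A and terminal B at time t_B on a line of length L with wave speed v, the distance from A is d = (L + v (t_A - t_B)) / 2.

```python
def fault_distance_from_a(line_length_m, wave_speed_m_s, t_a_s, t_b_s):
    """Distance of the fault from terminal A, in metres.

    Double-ended travelling-wave estimate: d = (L + v * (t_a - t_b)) / 2,
    where t_a and t_b are wavefront arrival times on a common time base.
    """
    d = 0.5 * (line_length_m + wave_speed_m_s * (t_a_s - t_b_s))
    if not 0.0 <= d <= line_length_m:
        raise ValueError("arrival times are inconsistent with the line length")
    return d


# Assumed example values (not taken from the cited studies).
LINE_LENGTH_M = 2000.0
WAVE_SPEED_M_S = 1.5e8
TRUE_DISTANCE_M = 500.0

# Ideal arrival times for a fault 500 m from terminal A.
t_a = TRUE_DISTANCE_M / WAVE_SPEED_M_S
t_b = (LINE_LENGTH_M - TRUE_DISTANCE_M) / WAVE_SPEED_M_S

estimate_m = fault_distance_from_a(LINE_LENGTH_M, WAVE_SPEED_M_S, t_a, t_b)

# Same fault, but terminal A's clock runs 100 ns late relative to terminal B.
clock_offset_s = 100e-9
skewed_m = fault_distance_from_a(LINE_LENGTH_M, WAVE_SPEED_M_S, t_a + clock_offset_s, t_b)
error_m = skewed_m - estimate_m

print(f"estimate: {estimate_m:.1f} m, error from clock offset: {error_m:.1f} m")
```

Under these assumptions, a clock offset of 100 ns between the two terminals shifts the estimate by about 7.5 m (half the wave speed times the offset), which is why a shared time reference such as GPS directly limits the achievable location accuracy.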
The subway power system is vast, with numerous risk points, and any fault can have widespread consequences, negatively impacting trains, passengers, and equipment. It could even lead to serious disruptions in traffic and social order. However, the current capabilities of subway power systems in fault analysis, handling, recovery, and prediction are relatively weak and inefficient. In the case of a failure, the system still relies heavily on emergency repairs and manual interventions, which fall short of the high service standards expected for modern subway systems.
Since 1999, when the United States’ “Consortium for Electric Infrastructure to Support a Digital Society (CEIDS)” [3] applied the concept of “self-healing” to the power grid, it has become a research focus and a key marker of grid intelligence. The “IntelliGrid” research project of the Electric Power Research Institute (EPRI) and the “Modern Grid Initiative” research project of the U.S. Department of Energy’s National Energy Technology Laboratory (NETL) have both made self-healing a primary research topic for the next generation of power grids. Similarly, in 2009, China’s State Grid Corporation proposed the development plan for a “robust smart grid”, emphasizing eight key characteristics for smart grids: self-healing, incentivizing and accommodating users, resisting attacks, providing power quality that meets the demands of the 21st century, allowing for the integration of various forms of power generation and storage, supporting a thriving electricity market, ensuring the optimal and efficient operation of assets, and utilizing high-speed communication and online monitoring. Thus, both domestic and international grids consider “self-healing” as a primary feature of next-generation smart grids [16].
The application of MASs and AI in subway power systems builds on extensive prior research in fault detection, location, and recovery within traditional power grids. Notable contributions to self-healing technologies in power systems have been made by initiatives such as the IntelliGrid project by the EPRI and the Modern Grid Initiative by the U.S. Department of Energy, which have explored the integration of self-healing technologies into grid systems [11,12]. These projects have demonstrated the effectiveness of self-healing technologies in improving grid stability, fault recovery speed, and reducing downtime, providing valuable insights into their potential application in subway systems.
Furthermore, the International Electrotechnical Commission's IEC 61850 communication standard (Communication Networks and Systems for Power Utility Automation), initially developed for substation automation, is increasingly being adopted in subway power systems. This standard enables real-time data exchange and ensures interoperability between different devices within the subway power network. Its integration with MASs creates a highly responsive and adaptive environment that enhances the coordination of fault recovery efforts, optimizes energy distribution, and improves overall system resilience. IEC 61850 facilitates synchronized operations across multiple agents in the system, ensuring that fault recovery actions are carried out quickly and efficiently, with minimal impact on subway operations.
The need for autonomous fault management is particularly pressing in the context of subway systems. These systems are highly complex, with a large number of potential fault points across numerous subsystems. Faults in the power supply can cause significant disruptions, affecting train operations, passenger safety, and equipment functionality [17,18]. Despite advancements in fault management, current subway power systems still rely on manual interventions and are often slow to respond to issues. Self-healing technology, when integrated with MASs and AI, has the potential to address these challenges by providing autonomous, real-time responses to faults, thus improving system reliability and reducing downtime [19,20].
The unique operational environment of subway systems also presents several additional challenges. Power supply systems in subway networks must be able to maintain continuous service, even when faults occur, which is crucial for minimizing service interruptions and ensuring the safety of passengers [21,22]. Moreover, subway power systems often experience high levels of dynamic demand, with power requirements fluctuating throughout the day. This requires power systems to be highly adaptable and responsive to changes in demand, which traditional fault recovery methods are not equipped to handle. MASs and AI provide the necessary intelligence and flexibility to manage these dynamic conditions, enabling subway power systems to function more efficiently and reliably [15,23,24].
Overall, the self-healing technology of subway power systems plays a crucial role in enhancing the reliability and efficiency of these systems. As a key support technology for urban transportation, it not only ensures the safety and smooth operation of city traffic but also drives the modernization of subway systems through technological innovation and system optimization, enabling them to better meet the complex demands of modern urban development. By enabling faster fault detection and response, the self-healing technology in subway power systems reduces downtime, thereby improving service continuity and reliability. This technology is not limited to resolving existing faults but also aims at preventing potential issues, significantly boosting the overall operational efficiency of the subway system [25]. Moreover, the integration of self-healing technology facilitates real-time monitoring, which is crucial in predicting and preventing potential service interruptions [26]. Self-healing technology further contributes to the overall safety and efficiency of subway systems, offering real-time data insights for enhanced operational resilience and passenger safety.
Safety is the top priority in urban rail transit systems. Self-healing technology significantly enhances the safety standards of subway power systems by enabling real-time monitoring and automatic adjustments of system settings. For instance, deploying advanced sensors and monitoring equipment allows for real-time detection of the power supply line’s status, enabling quick fault identification and isolation to prevent potential accidents [27]. The application of this technology greatly reduces train delays and accidents caused by power supply issues, providing passengers with a safer and more stable travel environment.
Self-healing technology also plays a significant role in improving passenger convenience. By optimizing the self-healing capabilities of the subway power system, service interruptions due to power failures are minimized, ensuring smoother and more uninterrupted travel for passengers [28]. Furthermore, with the integration of self-healing technology and mobile connectivity, passengers can access real-time train operation statuses and fault recovery progress through smartphone applications, enhancing the transparency and convenience of the travel experience [29].
As urbanization accelerates, subway systems are expected to face increasing demands for higher capacity, greater operational efficiency, and faster response times. Self-healing technology in subway power systems is essential for meeting these demands, as it can improve service continuity, enhance system resilience, and reduce the need for manual intervention [30]. For example, through intelligent upgrades, subway systems can automatically adjust operating frequencies and power distribution during peak periods, optimizing resource utilization to meet the continuously changing passenger flow demands [31]. By integrating advanced monitoring technologies, AI-based fault diagnosis, and automated recovery mechanisms, self-healing systems can proactively address potential failures, enhancing system stability and ensuring seamless subway operations. Looking ahead, the self-healing capabilities of subway power systems will continue to improve with ongoing technological advancements and innovations. This improvement is not limited to technical innovations but also includes optimizing management strategies and operational models. Through these comprehensive measures, subway systems will be able to provide safer, more convenient, and more reliable services to passengers, while contributing to the sustainable development of cities [32]. Thus, in the process of urban development and modernization, subway systems, as an integral part of public transportation, play a crucial role. Subways not only significantly improve the efficiency of urban transportation but also help reduce road congestion and environmental pollution. However, the efficient operation of subway systems heavily relies on the reliability and stability of their power supply systems. In the event of a power system failure, operations can be disrupted, and safety incidents may occur, causing serious impacts on city operations and the daily lives of citizens. 
Therefore, researching and implementing self-healing technologies for subway power systems has become a critical necessity, both to enhance system reliability and to ensure passenger safety and improve the service experience. The following points explain why this research is necessary.
(1)
Enhancing the Reliability and Efficiency of Power Supply Systems: The self-healing technology in subway power systems can significantly improve the system’s automatic diagnosis and fault recovery capabilities, thereby reducing service interruptions caused by system failures. By incorporating advanced monitoring technologies and automation tools, the self-healing system can respond rapidly to faults, minimizing reliance on manual intervention, and increasing the speed and accuracy of fault resolution.
(2)
Ensuring Safe and Smooth Urban Transit: As a major public transportation system, the safety of subway operations directly impacts the lives of thousands of passengers and the overall public safety of the city. Self-healing technology helps prevent accidents caused by power instability or interruptions by promptly detecting and addressing power supply issues, significantly improving the safety of subway operations.
(3)
Adapting to the Needs of Modern Urban Development: As urbanization accelerates, subway systems face increasing challenges, including rising passenger numbers, higher service expectations, and more complex operating environments. Self-healing technology, through intelligent management and real-time data analysis, optimizes the performance of subway power systems, better meeting these evolving demands.
(4)
Improving Passenger Experience: The application of self-healing technology goes beyond enhancing technical performance; it directly improves the passenger experience by reducing faults and delays. For example, the system can automatically isolate and repair minor faults without disrupting the entire network, providing passengers with more stable and reliable service.
(5)
Driving Technological Innovation and Industry Progress: Research into self-healing technology for subway power systems has spurred innovations in related technologies, including applications in artificial intelligence, the Internet of Things (IoT), and big data analytics. The integration and innovation of these technologies not only optimize subway power systems but also promote the development of intelligent transportation and smart city technologies.
Research on self-healing technology in subway power systems is of great significance for ensuring the safe, efficient, and reliable operation of urban rail transit. It not only meets the demands of modern cities for high-standard public transportation systems but also provides an effective means to enhance the technological level and service quality of public transit systems. Based on this, this paper provides a detailed summary of the research progress on self-healing technology in subway power systems, with a particular focus on the integrated application of MASs and the IEC 61850 standard. The following is a summary of the main contents of this review paper.
(1)
Introduction to the Concept of Self-Healing: This paper begins by introducing the basic concept of self-healing, which originally stems from biological systems. It explains how this concept has been adapted for use in power systems. The primary function of self-healing technology in power systems is to reduce human intervention by automating the processes of fault detection, isolation, and recovery. This, in turn, enhances the reliability and efficiency of the overall system. In terms of historical background, this paper reviews the evolution of the self-healing concept within power systems. It highlights notable initiatives such as EPRI’s IntelliGrid project and the U.S. Department of Energy’s Modern Grid Initiative, both of which signify the integration of self-healing technologies as essential components of modern intelligent energy systems.
(2)
Self-Healing Control Architectures: The discussion then shifts to various control architectures employed in self-healing technology for distribution networks. This paper compares hierarchical control systems with MASs, illustrating the shift from centralized systems to more decentralized and faster-responding systems. This paper emphasizes that, although self-healing technologies have been extensively researched and developed in traditional power systems, they are relatively new in the context of subway power systems. It advocates for adapting self-healing technology to subway systems by leveraging the unique characteristics of these systems and incorporating both MASs and the IEC 61850 standard.
(3)
Fault Diagnosis and Recovery Technologies: This paper provides a comprehensive review of current technologies used for fault location, isolation, and recovery in distribution networks and railway systems. Special attention is given to the application of these technologies in subway systems, where both direct judgment methods and computational analysis approaches for fault location are explored. In terms of innovation, this paper discusses the potential of using AI for fault diagnosis and recovery. The integration of AI is seen as a promising way to significantly enhance the system’s ability to address complex, multi-point faults.
(4)
Development of New Technologies and Challenges: This paper forecasts the future application of hybrid augmented intelligence and generative AI in subway power systems. These emerging technologies are expected to be effective tools for solving complex fault scenarios. However, this paper also discusses the technical challenges posed by the introduction of flexible direct current (DC) technology into subway power systems. It examines how this development may introduce new challenges for implementing self-healing technologies in these systems.
Through a comprehensive review and analysis, this paper not only clarifies the current state of research and future directions for self-healing technology in subway power systems but also provides a theoretical foundation and technical guidance for achieving intelligent and autonomous subway operations. This review offers an in-depth exploration of self-healing technologies for subway power systems, with a particular focus on the application of MASs and the IEC 61850 standard. This research is of significant academic value and offers critical insights and impetus for related fields of study and practice. The key contributions of this paper are summarized as follows:
(1)
Enhancing the Stability and Reliability of Subway Power Systems: The application of self-healing technologies can significantly reduce service interruptions and accidents caused by power issues in subway operations. This, in turn, improves the overall stability and reliability of the subway power system, which is crucial for meeting the growing demand for urban public transportation, ensuring the safety and efficiency of travel for millions of passengers.
(2)
Promoting the Development of Intelligent Transportation Systems: By integrating advanced information technologies and communication standards such as IEC 61850, self-healing technology in subway power systems not only improves the efficiency of fault management but also accelerates the development of intelligent transportation systems. These integrated technologies provide vital support for building smart cities.
(3)
Optimizing Energy Management and Environmental Sustainability: Self-healing technologies contribute to optimizing energy distribution and usage efficiency, helping reduce energy consumption and environmental impact. When applied globally, these technologies can positively influence energy conservation, emissions reduction, and environmental protection.
(4)
Inspiration and Advancement for Related Fields. (i) Cross-application of smart grid technologies: The self-healing technology in subway power systems draws from key aspects of smart grid technology, such as real-time data monitoring and automated fault response. This not only enhances the level of automation in subway systems but also provides new approaches and methodologies for applying smart grid technologies in other fields. (ii) Fostering multidisciplinary integration: This paper emphasizes the integration of MASs and artificial intelligence in self-healing systems for subway power supply. This multidisciplinary fusion promotes collaboration across fields such as computer science, electrical engineering, and transportation engineering, opening up new research and application areas. (iii) Inspiring new business models and policy development: The advancement of self-healing technologies in subway power systems may inspire new business models, such as performance-based service contracts and advanced maintenance services. It may also encourage governments and industries to establish relevant standards and policies to support the widespread deployment and application of such technologies.
This review aims to provide a comprehensive synthesis of the research landscape on self-healing subway power supply systems, focusing specifically on the integration of MASs and AI technologies. We will examine how these technologies contribute to improving fault detection, diagnosis, and recovery in subway power supply systems, and explore the challenges and future directions for further development. Through this exploration, we aim to present a clear understanding of the existing advancements in this field and propose avenues for future research to enhance the reliability and efficiency of subway power systems. Overall, this paper provides a theoretical foundation and technical roadmap for self-healing technologies in subway power systems, with applicability to real-world contexts and insights for researchers in related fields.
In conclusion, integrating MASs, AI, and the IEC 61850 communication standard represents a groundbreaking approach to self-healing subway power systems. This combination offers autonomous, distributed decision-making capabilities that significantly improve the speed and accuracy of fault detection, isolation, and recovery. By addressing the unique challenges of subway systems, these technologies will play a pivotal role in enhancing the resilience, efficiency, and safety of urban transit infrastructure. The following sections of this review will explore these advancements in detail, providing a comprehensive overview of the state of the art in self-healing technologies and their potential to revolutionize subway power supply systems.
The rest of this article is organized as follows. Section 2 reviews the current state of self-healing technologies in electrical and subway power systems, covering their historical development and recent advancements, and introduces the key concepts and terminology used throughout the paper. Section 3 examines the specific challenges faced by subway power supply systems, distinguishing them from general electrical grids in terms of their operational demands and the critical need for reliable power delivery, and critically analyzes existing fault diagnosis and protection methodologies. Section 4 explores the integration of MASs and the IEC 61850 standard into subway power systems, discussing the synergy between advanced control architectures and standardized communication protocols. Section 5 presents research findings and practical applications of self-healing techniques in subway systems, including case studies and experimental results on the effectiveness of MASs and AI algorithms in fault detection, isolation, and recovery. Section 6 discusses the implications of these technologies for future subway power systems, highlights broader applications of AI and MASs in urban transit automation, and proposes a roadmap for future research. Finally, Section 7 summarizes the key findings, reiterates the importance of advancing self-healing technologies to meet the growing demands of modern urban transportation, and calls for continued research and collaboration within the field.
This progression gives the review a coherent flow, with each section building on the one before it.

2. Review of Self-Healing Technologies Within Electrical and Subway Power Systems

In this section, we first introduce the concept of self-healing (Section 2.1). We then examine its historical evolution (Section 2.2), the key architectural components of self-healing frameworks (Section 2.3), and their emerging adoption and future prospects in subway power systems (Section 2.4). Together, these subsections establish a foundation for understanding how self-healing technologies can, and increasingly do, shape modern electrical and traction networks. They also underscore the technical sophistication, interdisciplinary nature, and forward-looking research opportunities that define self-healing in the pursuit of reliable, efficient, and intelligent urban power infrastructures.

2.1. The Concept of Self-Healing in Metro Power Supply Systems

The concept of “self-healing” originates from biology, where it refers to the intrinsic ability of living organisms to maintain homeostasis and recover from external disturbances and damage. In the context of power grids, self-healing generally denotes the capability of an electrical network to identify and isolate faults rapidly and to restore power supply to critical loads, ideally with minimal or no human intervention [31]. This process is often likened to the immune system in biological organisms, which detects threats, isolates them, and re-establishes normal operation. The earliest formal definition of self-healing in the electric grid was provided by the Electric Power Research Institute (EPRI) in the SPID (Strategic Power Infrastructure Defense System) project [32], which emphasized adaptive measures (e.g., intentional islanding, adaptive protection, information, and sensing) to mitigate various threats, including natural disasters, communication failures, market perturbations, and deliberate acts of sabotage. While theoretical advancements in self-healing systems are promising, real-world applications remain underexplored, and empirical studies will be necessary to test the efficacy of these techniques in operational subway power systems.
In China’s power industry, self-healing has been similarly characterized as the procedure by which problematic or failed components in the grid are detected and isolated automatically, or with minimal operator interaction, such that the broader system swiftly returns to normal operational conditions. In international practice, the essential self-healing functionality is broadly summarized as FLISR (Fault Location, Isolation, and Service Restoration). Research efforts worldwide have focused on developing self-healing control architectures, algorithms for fault identification and analysis, and strategies for fault isolation and rapid system reconfiguration.

2.1.1. Relevance to Metro Power Supply Systems

Metro power supply systems are generally concentrated in densely populated urban centers, where operational reliability and continuity of service are paramount for passenger safety, transit efficiency, and broader socio-economic stability. A power supply disruption in metro systems can cause immediate and extensive societal impacts, ranging from passenger inconvenience to significant losses in productivity and heightened safety risks. Consequently, implementing self-healing mechanisms in metro power supply systems involves adopting fast, automated, and robust control strategies that can isolate malfunctioning lines or equipment and restore power in seconds or even sub-second timescales. By leveraging real-time monitoring, advanced fault detection, and switchgear automation, metro systems aim to ensure seamless service despite faults or disturbances. Although these concepts have been widely proposed in the literature, empirical validation is required to confirm their effectiveness under real-world operating conditions. Pilot studies and simulations will be essential to demonstrate the practical viability of self-healing systems in metro environments.

2.1.2. Mathematical Representation of Self-Healing in Metro Power Supply

To encapsulate key aspects of self-healing, namely fault detection, fault isolation, and supply restoration, quantitative models are often developed [33,34,35]. These models facilitate the design and evaluation of strategies that minimize fault impact on metro operations while satisfying stringent safety and reliability constraints. Several studies have implemented fault detection and isolation algorithms in simulation environments, but empirical validation with real-world data is necessary to assess the accuracy of these models in practice. Future efforts will involve testing these algorithms in operational subway systems to calibrate and refine the mathematical models based on actual fault occurrences and system behavior. Self-healing is also modeled outside the power domain: Ref. [35], for instance, presents the functioning mechanisms of five different strategies for implementing self-healing capability in cement-based materials. Below are illustrative formulations that can be adapted for more detailed analyses:
1. Fault Detection Probability
Let t represent the time elapsed since a fault occurrence, and let Pd(t) denote the probability of having successfully detected and located the fault by time t. A widely adopted model assumes that the detection probability starts at zero and approaches one exponentially over time, given by
$$P_d(t) = 1 - e^{-\alpha t},$$
where α is a positive parameter that captures the sensitivity of the monitoring equipment or detection algorithm. A higher α implies faster and more reliable fault detection. Under this model, the detection probability rises most steeply immediately after the fault and then levels off as it approaches one, which highlights the importance of efficient fault detection in minimizing service interruptions. The sensitivity parameter α also indicates how the system’s responsiveness can be improved through advanced fault detection algorithms and more sensitive monitoring equipment.
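As a minimal numerical sketch of this detection model, the snippet below evaluates the detection probability and inverts it to obtain the time needed to reach a target probability. The function names and the two α values are hypothetical and chosen only for illustration; they are not taken from any cited study.

```python
import math


def detection_probability(t: float, alpha: float) -> float:
    """P_d(t) = 1 - exp(-alpha * t) for t >= 0 and alpha > 0."""
    if t < 0 or alpha <= 0:
        raise ValueError("t must be non-negative and alpha must be positive")
    return 1.0 - math.exp(-alpha * t)


def time_to_reach(p_target: float, alpha: float) -> float:
    """Invert the model: the time at which P_d(t) first equals p_target."""
    if not 0.0 <= p_target < 1.0:
        raise ValueError("p_target must lie in [0, 1)")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return -math.log(1.0 - p_target) / alpha


# Hypothetical sensitivities (per second), used only to compare two detectors.
for alpha in (2.0, 10.0):
    p_100ms = detection_probability(0.1, alpha)
    t_95 = time_to_reach(0.95, alpha)
    print(f"alpha={alpha}: P_d(0.1 s)={p_100ms:.3f}, time to 95%={t_95:.3f} s")
```

Because the inverse is t = -ln(1 - p)/α, the time needed to reach any fixed detection probability scales as 1/α, so in this model a fivefold increase in sensitivity shortens that time by a factor of five.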
2. Load Restoration Ratio
The measure of how effectively the system can restore loads after a disturbance can be quantified by the load restoration ratio η:
$$\eta = \frac{L_{\text{restored}}}{L_{\text{total}}},$$
where Lrestored is the total load that is successfully restored following reconfiguration and Ltotal is the total pre-disturbance load. This ratio is vital for assessing the efficacy of the self-healing system. A higher η value indicates a system that is more effective in recovering from faults by restoring a higher proportion of the total load. The load restoration ratio provides a tangible measure of the system’s robustness in handling disruptions, with significant implications for the overall operational stability of subway power supply networks.
3. Self-Healing Objective Function
In faulted conditions, the metro supply system typically seeks to optimize both restoration speed and the proportion of recovered loads, subject to safety constraints. One may frame the self-healing problem as minimizing an objective function of the following form:
$$\min_{u} \; w_1 T_{\text{interrupt}} + w_2 \left(1 - \eta\right),$$
where Tinterrupt is the duration of service interruption, η is the load restoration ratio, and u is a vector of decision variables (e.g., breaker switching states and power flow allocations). The weights w1 and w2 reflect the relative importance assigned to minimizing interruption time versus maximizing load restoration. This optimization framework captures the essence of a self-healing system by balancing the competing goals of minimizing service disruptions and ensuring effective load recovery. The decision variables u represent the system’s control parameters, which can be adjusted to achieve the desired outcomes. The objective function quantifies the trade-offs involved in system reconfiguration, providing a mechanism for dynamically adapting to fault conditions.
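The trade-off expressed by this objective can be illustrated with a small sketch that scores a handful of candidate restoration plans and selects the one with the lowest cost. Here the decision vector u is represented simply by an enumerated list of plans; the plan names, load figures, and weights are hypothetical, and safety constraints are not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RestorationPlan:
    name: str
    t_interrupt: float   # interruption duration T_interrupt, in seconds
    l_restored: float    # load restored after reconfiguration, in MW


def restoration_ratio(l_restored: float, l_total: float) -> float:
    """eta = L_restored / L_total."""
    if l_total <= 0 or not 0 <= l_restored <= l_total:
        raise ValueError("need l_total > 0 and 0 <= l_restored <= l_total")
    return l_restored / l_total


def objective(plan: RestorationPlan, l_total: float, w1: float, w2: float) -> float:
    """w1 * T_interrupt + w2 * (1 - eta); lower is better."""
    eta = restoration_ratio(plan.l_restored, l_total)
    return w1 * plan.t_interrupt + w2 * (1.0 - eta)


def best_plan(plans: List[RestorationPlan], l_total: float,
              w1: float, w2: float) -> RestorationPlan:
    """Pick the candidate plan that minimizes the weighted objective."""
    return min(plans, key=lambda p: objective(p, l_total, w1, w2))


# Hypothetical candidates: a fast partial restoration and a slow full one.
L_TOTAL = 20.0
PLANS = [
    RestorationPlan("fast_partial", t_interrupt=0.5, l_restored=14.0),
    RestorationPlan("slow_full", t_interrupt=5.0, l_restored=20.0),
]

for w2 in (10.0, 100.0):
    chosen = best_plan(PLANS, L_TOTAL, w1=1.0, w2=w2)
    print(f"w1=1.0, w2={w2}: choose {chosen.name}")
```

Since the interruption time carries units and η is dimensionless, the weights implicitly set the exchange rate between the two terms. Raising w2 relative to w1 shifts the choice from the fast partial plan to the slow full one.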
These formulations highlight the primary objectives and constraints associated with designing self-healing strategies for metro power supply systems. In practice, more sophisticated models can integrate various operational constraints (such as power quality, thermal limits, or protection coordination) to accurately represent the system’s behavior under disturbance. Based on the above, Table 1 synthesizes key similarities and differences of the self-healing concept in general power/energy systems and in metro power supply systems. It examines at least eight distinct dimensions—ranging from scope and control hierarchy to fault characteristics and implementation status—to provide a high-level comparative overview.
Overall, while the metro power supply setting shares commonalities with broader power and energy systems in terms of self-healing principles (e.g., FLISR), it imposes more stringent real-time performance requirements and heightened safety standards. Future advancements in metro self-healing are anticipated to involve the deeper integration of sensing technologies, sophisticated fault isolation and reconfiguration algorithms, and more robust data communication protocols. These developments aim to ensure that even under adverse conditions, metro systems can autonomously detect and isolate faults, reconfigure feeder networks, and restore power with minimal disruption to passenger transit and operational stability.

2.2. Historical Evolution of Self-Healing Strategies in Electrical Power Systems

The concept of self-healing within electrical power systems emerged in tandem with the growing importance of system reliability, stability, and automation [14,36,37]. Historically, power utilities faced ever-increasing demands for seamless electricity provision while grappling with the technical and economic challenges posed by grid expansion and complexity. Early engineering solutions were typically aimed at enhancing robustness through redundancy measures and improved protective devices [38]; however, the notion of a system that could detect, isolate, and autonomously recover from faults without significant human intervention was not fully articulated until the latter half of the 20th century. While the evolution of self-healing strategies has been extensively documented in the literature, the actual implementation and operational effectiveness of these strategies in large-scale power systems remain under-verified. It is crucial to conduct field trials to evaluate the real-world performance of these strategies in various environmental and operational contexts, including urban subway power systems. This subsection traces the foundational developments that led to contemporary self-healing frameworks, emphasizing the evolution of control paradigms, the role of technological innovations such as supervisory control and data acquisition (SCADA) systems, and the gradual shift toward intelligent, automated solutions.

2.2.1. Early Concepts and Precursor Technologies

Prior to the advent of fully computerized control centers, power systems relied on mechanical relays, manual switchgear, and onsite personnel to handle contingencies. Protective relays were designed to operate when specific fault conditions exceeded threshold limits, thus providing a basic isolation mechanism. While these early solutions prevented catastrophic equipment damage, they were reactive in nature and limited by a lack of real-time data or predictive insight. Operators could only respond to disturbances once alarms were triggered or visible signs of failure became apparent.
The introduction of SCADA technology during the mid-20th century marked a major milestone in laying the groundwork for self-healing strategies [39]. SCADA systems allowed for remote monitoring and control of substations, transforming operational practices by enabling operators to gather near-real-time data regarding voltage levels, current flows, breaker statuses, and other critical parameters [40,41]. This shift created a platform for more advanced computational tools that could process large volumes of data and inform decision-making at control centers.
Simultaneously, power systems research began to address dynamic stability problems, frequency control, and load forecasting. The emergent field of power system stability studies spearheaded by various research groups underscored the need for adaptive, real-time techniques to maintain system equilibrium after disturbances. These developments foreshadowed the modern idea of “self-healing”, wherein the system would respond to perturbations with minimal external intervention.

2.2.2. The Emergence of Self-Healing Principles in the Late 20th Century

As computational capabilities expanded in the 1970s and 1980s, power engineers and researchers explored ways to automate fault detection and isolation [42,43]. For example, Ref. [42] discusses the integration of microprocessor-based digital relays and their application to self-healing systems. Digital relays supplanted their purely electromechanical counterparts, offering more flexible protection schemes and the ability to communicate detailed fault information back to central control systems. The wider use of microprocessors in substations and control centers, starting in the 1980s, allowed for real-time data analytics, an essential enabler for the advanced functionalities that characterize self-healing systems.
During this period, the term “self-healing” began to appear in power system discourse, reflecting a move from static reliability concepts (such as N-1 contingency planning) to dynamic resilience and adaptability. Early academic studies proposed hierarchical control architectures that would locally identify faults, isolate affected segments, and promptly reconfigure the network to restore service. Yet, practical implementation faced significant barriers, including the complexity of coordinating multiple control agents, the limited bandwidth and reliability of communication channels, and the computational cost of running real-time algorithms on then-current hardware.
The hierarchical control architecture is a well-established framework for organizing complex control tasks into multiple layers, ensuring efficient management and operation across large-scale systems. Initially proposed by Professor G.N. Saridis of Purdue University in 1977, the hierarchical control architecture has been widely applied in various fields, including electrical power systems. This approach divides control responsibilities into three distinct levels: the organizational level, the coordination level, and the execution level. The adaptability and clear structure of this architecture have made it a key tool for managing and controlling the complex, distributed systems found in modern electrical grids and transit networks.
In the context of subway power systems, the hierarchical control architecture offers a structured solution to the challenges of fault recovery and system stability. Subway power systems, characterized by their intricate, multi-layered distribution network, benefit from the clear division of responsibilities inherent in this architecture. The organizational level oversees strategic decision-making and global optimization, while the coordination level manages the distribution of tasks and ensures coordination between different subsystems. The execution level, where actual control and operational adjustments occur, is responsible for fault isolation, reconfiguration, and system recovery.
1. Application of Hierarchical Control in Subway Power Systems
The hierarchical control framework aligns well with the natural structure of a subway power system, which spans from the main power stations to substations and finally to the traction systems. Each level of the architecture performs specific functions tailored to the subway’s operational requirements:
(1) Organizational Level: At this highest level, the central control center formulates global power strategies, ensuring the continuous operation and safety of the system. It is responsible for strategic planning, long-term optimization, and high-level decision-making. The decisions made at this level influence the overall performance and resilience of the subway power network.
(2) Coordination Level: The substation level corresponds to the coordination level, where individual regions or sub-networks are managed. This level coordinates the activities of various subsystems, ensuring that resources are effectively allocated, especially during fault conditions. Coordination includes dynamic load balancing, fault recovery task prioritization, and optimization of energy distribution across the network. The system is capable of rapid response during fault events, adjusting operational parameters to restore normal conditions as quickly as possible.
(3) Execution Level: The execution level consists of intelligent devices and control units, such as circuit breakers, switches, and sensors. These devices perform specific actions, such as disconnecting faulty areas, restoring power from backup sources, and adjusting load distribution. The execution level’s effectiveness is critical for minimizing the impact of faults, as it directly influences the speed and precision of the fault recovery process.
2. Hierarchical Control Architecture Scheme
Based on the above, Figure 1 shows a hierarchical control architecture scheme that divides control responsibilities across the different levels of a power system. Control tasks start with the strategic decisions made at the organizational level and cascade down to the coordination level, where operational tasks are distributed and managed. At the execution level, physical devices carry out the control actions necessary for fault recovery and system stabilization.
The hierarchical control architecture in Figure 1 organizes power system management into three levels: organizational, coordination, and execution. At the organizational level, global strategies and resource allocation are determined. The coordination level manages inter-subsystem cooperation, ensuring efficient fault recovery. The execution level handles direct actions, such as fault isolation and load reconfiguration, through intelligent devices. In subway power systems, this architecture enhances operational stability and fault recovery by clearly delineating responsibilities across levels, enabling rapid response to disruptions, optimizing energy distribution, and improving overall system resilience, crucial for maintaining continuous and reliable service in complex transit networks.
This layered structure is designed to optimize the management of complex power systems by ensuring that each level focuses on specific tasks, with minimal overlap. The organizational level ensures that global objectives are met, while the coordination level handles the real-time adjustment and distribution of tasks across the system. The execution level then implements these decisions, taking direct action to address faults and restore normal operations.
In summary, the hierarchical control architecture offers a comprehensive, adaptable framework for managing subway power systems, providing significant advantages in terms of fault detection, isolation, and recovery. By clearly dividing control responsibilities into multiple layers, it enhances system stability, improves recovery times, and allows for more efficient management of resources, making it an ideal solution for the highly complex and dynamic environment of subway power systems.
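The three-level division of responsibilities can be sketched in code as follows. This is a deliberately simplified illustration under stated assumptions: the class names, the single-substation network, and the policy flag `allow_backup` are all hypothetical, and the electrical checks a real scheme would perform before closing a tie are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Breaker:
    """Execution level: a switching device that carries out commands."""
    name: str
    closed: bool = True


@dataclass
class SubstationController:
    """Coordination level: turns a fault report into device commands."""
    name: str
    feeder_breakers: Dict[str, Breaker]   # section name -> feeder breaker
    backup_tie: Breaker                   # normally open tie to a backup source

    def handle_fault(self, section: str, allow_backup: bool) -> List[str]:
        actions = []
        breaker = self.feeder_breakers[section]
        breaker.closed = False             # isolate the faulted section
        actions.append(f"open {breaker.name}")
        if allow_backup:
            self.backup_tie.closed = True  # restore supply from the backup source
            actions.append(f"close {self.backup_tie.name}")
        return actions


@dataclass
class ControlCenter:
    """Organizational level: sets global policy and delegates to substations."""
    substations: Dict[str, SubstationController]
    allow_backup: bool = True              # policy decided centrally

    def report_fault(self, substation: str, section: str) -> List[str]:
        controller = self.substations[substation]
        return controller.handle_fault(section, self.allow_backup)


# Hypothetical single-substation network used only to exercise the sketch.
feeder_a = Breaker("CB-A")
tie = Breaker("TIE-1", closed=False)
substation = SubstationController("SS1", {"section_a": feeder_a}, tie)
center = ControlCenter({"SS1": substation})
actions = center.report_fault("SS1", "section_a")
print(actions)
```

The control center never operates a breaker directly. It only fixes the policy, the substation controller translates that policy into commands, and the breakers execute them, which mirrors the organizational, coordination, and execution levels described above.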

2.2.3. Influence of the Smart Grid Paradigm

The evolution of self-healing strategies gained further momentum with the emergence of the “smart grid” paradigm in the early 21st century. Smart grid initiatives emphasized digitalization, two-way communication, and integration of decentralized energy resources to enhance sustainability and efficiency. Within this paradigm, self-healing became a vital functionality, aiming to maintain power quality and reliability amid a growing proliferation of distributed energy resources (DERs), such as photovoltaic systems and wind farms, and increasing load volatility caused by electric vehicle charging and other new demands.
With greater sensor deployment—ranging from phasor measurement units (PMUs) in transmission systems to intelligent electronic devices (IEDs) in distribution networks—operators acquired a richer set of real-time measurements. Coupled with advanced analytics, these data streams opened the door to automated fault management protocols. Self-healing functions within the smart grid context typically entailed the following:
(1) Wide-Area Monitoring, Protection, and Control (WAMPAC): PMUs measuring voltage and current phasors synchronized to a global positioning system (GPS) time reference provided near-instantaneous snapshots of system conditions [44]. Such granular visibility enabled early fault detection and advanced protection schemes that adapt to changing conditions [45].
(2) Distributed Intelligent Control: The shift from monolithic control architectures to decentralized or distributed approaches, wherein local controllers or agents communicate and collaborate, accelerated. This setup was regarded as crucial for self-healing, as localized intelligence can isolate faults closer to their source and coordinate reconfiguration strategies quickly.
(3) Predictive and Preventive Measures: Smart grids embraced a shift from reactive fault management to proactive asset management and system planning. Machine learning models and robust optimization techniques were developed to predict equipment failures, forecast load patterns, and identify vulnerabilities in the network topology.
By integrating these elements, the modern power industry envisioned systems capable of maintaining stability and continuity of service under a variety of operational threats.

2.2.4. Convergence with Multi-Agent Systems and Artificial Intelligence

Although early self-healing concepts relied heavily on centralized approaches, the limitations of single-point decision-making—such as communication bottlenecks and slower response times—spurred interest in MASs. In MAS frameworks, multiple intelligent agents (e.g., at substations, feeder lines, or distributed generators) interact, negotiate, and collaborate to detect, isolate, and remedy faults. Each agent typically possesses partial knowledge of the system but is capable of local decision-making, thus distributing the computational burden and avoiding single points of failure.
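A minimal sketch of this idea is given below for a radial feeder supplied from a single source. Each switch agent knows only whether it observed fault current and can query its immediate downstream neighbor; the agent that saw fault current while its neighbor did not concludes that the fault lies just downstream and opens both switches around it. The class and function names are hypothetical, and the single-source assumption is a simplification that does not hold for bilaterally fed traction sections.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SwitchAgent:
    name: str
    saw_fault_current: bool                       # local measurement only
    downstream: Optional["SwitchAgent"] = None    # the one neighbor it can query
    switch_closed: bool = True

    def decide(self) -> bool:
        """Open the switches around the fault if it lies just downstream of this agent."""
        neighbor_saw = (self.downstream is not None
                        and self.downstream.saw_fault_current)
        if self.saw_fault_current and not neighbor_saw:
            self.switch_closed = False                  # isolate the upstream side
            if self.downstream is not None:
                self.downstream.switch_closed = False   # request downstream isolation
            return True
        return False


def build_feeder(fault_flags: List[bool]) -> List[SwitchAgent]:
    """Create agents ordered from the source and link each to its downstream neighbor."""
    agents = [SwitchAgent(f"S{i + 1}", flag) for i, flag in enumerate(fault_flags)]
    for upstream, downstream in zip(agents, agents[1:]):
        upstream.downstream = downstream
    return agents


# Hypothetical feeder with four switches; the fault lies between S2 and S3.
feeder = build_feeder([True, True, False, False])
locators = [agent.name for agent in feeder if agent.decide()]
open_switches = [agent.name for agent in feeder if not agent.switch_closed]
print(locators, open_switches)
```

No agent needs a global view of the feeder. The fault is located and isolated through one local comparison per agent, which is the property that lets MAS schemes avoid a central bottleneck.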
AI further revolutionized self-healing strategies by enabling more sophisticated fault detection, classification, and system optimization [46,47]. Techniques such as artificial neural networks [46], support vector machines, deep learning [47], and fuzzy logic controllers facilitated rapid and accurate fault diagnosis, particularly in complex or noisy scenarios. As AI algorithms matured, they began to provide real-time decision support for reclosing sequences, sectionalizing, and load transfer operations [48]. Additionally, advanced machine learning models that rely on historical data and real-time sensor inputs could predict incipient failures in cables, transformers, or switchgear, thereby allowing proactive or condition-based maintenance to avert large-scale outages. Despite promising theoretical results, empirical testing is crucial to verify the real-world performance of MASs in self-healing systems. Simulations and pilot studies in operational environments will be key to refining MAS frameworks, especially when applied to metro power systems, which pose unique challenges such as rapid load fluctuations and complex network topologies.
By the early 2010s, industrial pilot projects began to demonstrate the feasibility of full or partial self-healing systems at distribution and sub-transmission voltage levels. These systems responded autonomously to single-phase or multi-phase faults by performing fault isolation and service restoration within seconds or minutes. Some utilities reported substantial improvements in reliability indices such as the system average interruption duration index (SAIDI) and the system average interruption frequency index (SAIFI).
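For readers unfamiliar with these indices, the sketch below computes them from a list of sustained interruption events using the standard definitions: SAIFI is the total number of customer interruptions divided by the number of customers served, and SAIDI is the total customer interruption duration divided by the number of customers served. The event data are hypothetical and serve only to show how faster restoration for part of the affected customers changes the indices.

```python
from typing import List, Tuple

# Each event: (customers interrupted, interruption duration in minutes)
Interruption = Tuple[int, float]


def saifi(events: List[Interruption], customers_served: int) -> float:
    """Average number of sustained interruptions per customer served."""
    return sum(n for n, _ in events) / customers_served


def saidi(events: List[Interruption], customers_served: int) -> float:
    """Average interruption duration (minutes) per customer served."""
    return sum(n * minutes for n, minutes in events) / customers_served


# Hypothetical yearly records for a network serving 1000 customers.
CUSTOMERS = 1000
before = [(500, 60.0), (250, 20.0)]
# Same faults, but automated isolation confines the first one to 100 customers.
after = [(100, 60.0), (250, 20.0)]

for label, events in (("before", before), ("after", after)):
    print(f"{label}: SAIFI={saifi(events, CUSTOMERS):.2f}, "
          f"SAIDI={saidi(events, CUSTOMERS):.1f} min")
```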

2.2.5. Lessons Learned and Ongoing Challenges

Decades of technological progress show that self-healing strategies can significantly enhance power system resilience. Yet, challenges remain. Among the key lessons learned from these historical developments are the following:
(1) Communication Infrastructure: Adequate, reliable, and secure data exchange is critical for successful self-healing. Historically, the absence of high-bandwidth, low-latency communication hampered early initiatives, underscoring the need for robust communication standards and architectures, such as IEC 61850, to ensure interoperability among devices and systems.
(2) Coordination Complexity: The transition from centralized to distributed control paradigms introduces complexity in coordination among multiple agents. The necessity of robust algorithms for consensus, negotiation, and conflict resolution remains an important area of ongoing research.
(3) Scalability: Early demonstration projects often took place on relatively small-scale feeders. Scaling up self-healing solutions to entire distribution networks or interlinked systems involving numerous microgrids requires careful architectural design that balances local autonomy with central oversight.
(4) Cybersecurity Concerns: Increased digitalization raises the threat of cyberattacks, data tampering, and privacy breaches. Protecting self-healing frameworks from malicious interventions or denial-of-service attacks presents a nontrivial challenge that requires sophisticated security protocols and risk assessment methodologies.
(5) Economic Viability: Self-healing systems can be capital-intensive to implement, especially in existing grids with aging infrastructure. The cost-effectiveness of retrofits, the complexity of new device installation, and the required training of operational staff are all factors influencing widespread adoption.
Overall, the historical evolution of self-healing in electrical power systems reflects the progression from manual, reactive fault handling toward a digitally enabled, data-driven, and intelligent control paradigm. This evolution underscores the potential for similar developments in niche application areas, most notably subway power systems, which share many of the reliability and safety imperatives that have historically shaped the broader electrical grid.

2.3. Key Components and Architecture of Self-Healing Mechanisms in Modern Power Networks

Modern self-healing power networks are defined by sophisticated hardware and software components designed to ensure rapid fault detection, isolation, and system reconfiguration. These systems rely on the synergy of advanced sensors, protection devices, communication protocols, and intelligent algorithms to deliver an automated, efficient, and highly resilient energy supply. Despite the promise of these technologies, validation through pilot studies is essential to assess the operational effectiveness of self-healing mechanisms in real-world systems. In particular, metro power systems, with their unique load dynamics and safety requirements, require extensive testing of fault detection and isolation algorithms in operational trials. This subsection dissects the primary building blocks of self-healing mechanisms as they have evolved in contemporary electrical grids, focusing on the architectural arrangements, control paradigms, and underlying standards, particularly IEC 61850, that facilitate interoperability and real-time responsiveness.

2.3.1. Hardware Foundation: Intelligent Electronic Devices, Sensors, and Switchgear

At the physical layer of a self-healing network, intelligent electronic devices (IEDs) and sensors form the backbone of measurement and protection. IEDs are microprocessor-based controllers that perform multiple functions, such as protective relaying, metering, and local automation. They gather high-resolution data on current, voltage, frequency, and harmonic content, enabling sophisticated fault detection schemes. When integrated with remote terminal units (RTUs) or SCADA systems, these IEDs relay detailed status updates and measurements to central or distributed controllers.
Equally important are the automated switchgear components—reclosers, sectionalizers, and circuit breakers—that physically isolate faulty segments and reconfigure network topology. The switchgear must respond rapidly and reliably to commands generated by the control logic. Advancements in switchgear design, including the use of vacuum or SF6 interrupting mediums, have improved operational speed and reduced maintenance requirements. In many modern systems, these components can be triggered either by local protective relays or by higher-level controllers orchestrating broader reconfiguration strategies.

2.3.2. Communication Protocols and Standards

A robust communication framework is vital for self-healing. Power utilities have historically used proprietary protocols, which often hindered interoperability. However, industry-wide acceptance of open communication standards, such as IEC 61850, has greatly facilitated multi-vendor interoperability and laid the groundwork for integrated, system-wide self-healing solutions.
IEC 61850 delineates a comprehensive data model and communication framework for substation automation. Its object-oriented design structures data into logical nodes that represent devices, measurements, and control functions. This approach allows for seamless exchange of information among relays, protection devices, and supervisory systems. Notably, IEC 61850 supports generic object-oriented substation event (GOOSE) messaging, which provides high-priority, low-latency data transfer for critical protection and control signals. Through GOOSE, devices can publish or subscribe to messages on the network, enabling rapid relay coordination and sophisticated interlocking schemes.
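The publish/subscribe pattern behind GOOSE can be illustrated with a purely in-memory sketch. It is not an IEC 61850 implementation: real GOOSE messages are multicast Ethernet frames with their own encoding and retransmission behavior, none of which is modeled here. The bus, the relay classes, the topic strings, and the blocking scheme are all hypothetical stand-ins that show only how a subscribing device can react to an event published by a peer without passing through a central controller.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Message = Dict[str, object]
Handler = Callable[[Message], None]


class EventBus:
    """In-memory stand-in for the station network; not an IEC 61850 stack."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> None:
        # Deliver the message to every device subscribed to this topic.
        for handler in self._handlers[topic]:
            handler(message)


class DownstreamRelay:
    """Publishes a pickup event when it sees a fault on its own feeder."""

    def __init__(self, name: str, bus: EventBus) -> None:
        self.name = name
        self.bus = bus

    def pickup(self) -> None:
        self.bus.publish(f"{self.name}/pickup",
                         {"source": self.name, "pickup": True})


class UpstreamRelay:
    """Holds back its fast trip while a subscribed downstream relay has picked up."""

    def __init__(self, bus: EventBus, downstream_name: str) -> None:
        self.fast_trip_blocked = False
        bus.subscribe(f"{downstream_name}/pickup", self.on_pickup)

    def on_pickup(self, message: Message) -> None:
        self.fast_trip_blocked = bool(message["pickup"])


bus = EventBus()
feeder_relay = DownstreamRelay("feeder_relay", bus)    # hypothetical device name
incomer_relay = UpstreamRelay(bus, "feeder_relay")
feeder_relay.pickup()
print(incomer_relay.fast_trip_blocked)
```

In this sketch the upstream relay learns directly from its peer that the fault is on the downstream feeder and can leave the clearing to the device closest to the fault, which is the kind of relay coordination and interlocking that peer-to-peer messaging makes possible.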
Furthermore, modern systems may employ protocols like DNP3 (Distributed Network Protocol) or Modbus for backward compatibility, while layering advanced cybersecurity measures (e.g., encryption and authentication) to safeguard communications. Where wide-area coordination is necessary, particularly in transmission-level self-healing or large-scale distribution automation schemes, telecommunication technologies such as fiber optics, wireless mesh networks, or 5G solutions can be employed to achieve the latency and reliability thresholds required for real-time control.

2.3.3. Control Hierarchies: Centralized, Decentralized, and Distributed Approaches

One of the most critical aspects of self-healing architecture is the organizational structure of control. Historically, centralized approaches dominated, wherein control centers collected measurements from the entire network, executed fault detection and isolation algorithms, and issued commands to field devices. This approach can be effective in relatively small or well-defined systems, but it risks single points of failure and communication bottlenecks, which become problematic as the network grows in complexity.
In contrast, decentralized (or hierarchical) approaches distribute decision-making authority closer to the field level, granting local controllers the autonomy to detect and respond to faults. A commonly adopted structure is a three-tier hierarchy:
(1)
Primary Control (Local/Device Level): Protective relays and IEDs that execute overcurrent detection, undervoltage protection, or distance protection. They can isolate faults locally with minimal latency.
(2)
Secondary Control (Feeder or Zone Level): Substation-based controllers that coordinate reconfiguration among multiple feeders or zones. They receive aggregated data from local devices and can implement advanced reconfiguration strategies such as switching feeder ties or transferring loads.
(3)
Tertiary Control (Control Center Level): Higher-level supervision that oversees the entire utility network, optimizing long-term planning, load balancing, and restoration procedures when local measures are insufficient.
A fully distributed or multi-agent architecture further refines the decentralized approach by allowing intelligent agents—each equipped with localized sensing, decision-making, and communication capabilities—to collaborate with one another. This MAS approach is particularly powerful for fault restoration in complex distribution networks, as it can reduce computation time and enhance system-wide resilience. Agents may employ consensus algorithms, negotiation protocols, or artificial intelligence techniques to optimize reconfiguration in real time.
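As one possible illustration of the consensus algorithms mentioned above, the sketch below runs a basic average-consensus iteration in which each agent only exchanges values with its direct neighbors. The agent names, the three-agent line topology, the quantity being agreed on (an estimate of spare capacity in MW), and the step size are assumptions chosen for the example, not values from the reviewed literature.

```python
from typing import Dict, List


def consensus_step(values: Dict[str, float],
                   neighbors: Dict[str, List[str]],
                   step: float = 0.3) -> Dict[str, float]:
    """One synchronous update: each agent moves toward its neighbors' values."""
    return {
        agent: value + step * sum(values[n] - value for n in neighbors[agent])
        for agent, value in values.items()
    }


def run_consensus(values: Dict[str, float],
                  neighbors: Dict[str, List[str]],
                  rounds: int = 50,
                  step: float = 0.3) -> Dict[str, float]:
    """Repeat the local update; no agent ever sees the whole network."""
    for _ in range(rounds):
        values = consensus_step(values, neighbors, step)
    return values


# Hypothetical line of three agents: A - B - C.
topology = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
spare_capacity_mw = {"A": 1.0, "B": 2.0, "C": 6.0}

agreed = run_consensus(spare_capacity_mw, topology)
print({name: round(value, 3) for name, value in agreed.items()})
```

With symmetric links the sum of the values is preserved at every step, so the agents settle on the network-wide average. The step size must stay below the reciprocal of the largest neighbor count (here 0.5) for this simple update to converge.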

2.3.4. Core Functions of Self-Healing Mechanisms

Despite differences in architectural preferences and technology stacks, most self-healing systems revolve around a set of shared core functions:
(1)
Fault Detection and Classification: High-speed relays, coupled with modern sensor networks, identify abnormal conditions (e.g., short circuits, overcurrent, or voltage collapse) and classify the type and location of the fault. AI-based classifiers often enhance accuracy under noisy conditions or complex fault scenarios.
(2)
Fault Isolation: Once a fault is identified, circuit breakers, reclosers, or sectionalizers operate to isolate only the affected section. This isolation must be performed quickly to mitigate damage and maintain stability in the healthy portions of the system.
(3)
Service Restoration (Reconfiguration): The most distinctive feature of self-healing systems is their capacity to reroute power around the fault, restoring service to the greatest extent possible. Automated reconfiguration may involve closing tie switches or adjusting feeder topology. MASs can play a significant role in coordinating these reconfigurations autonomously.
(4)
System Optimization: Beyond restoring service, many self-healing frameworks incorporate optimization functions that ensure voltage profiles, line loading, and overall reliability are improved. Techniques such as dynamic voltage regulation, reactive power compensation, and automated load shedding contribute to system stability and performance.
(5)
Predictive and Preventive Maintenance: Self-healing extends beyond fault handling to proactively safeguard system health. Condition monitoring of critical assets (e.g., transformers and cables) and AI-driven anomaly detection can reduce the incidence of unexpected failures and optimize maintenance scheduling.
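The first three functions in the list above (detection, isolation, and restoration) can be sketched for a deliberately simple case. The example assumes a radial feeder whose sections are indexed from the main source, a single tie switch at the far end, and a fault location that has already been identified. The `plan_restoration` function and its load figures are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RestorationPlan:
    isolated: int             # faulted section, switched out at both ends
    fed_from_main: List[int]  # upstream sections that stay on the main source
    fed_from_tie: List[int]   # downstream sections picked up via the tie switch
    unserved: List[int]       # downstream sections the tie cannot carry


def plan_restoration(loads_kw: List[float], faulted: int,
                     tie_capacity_kw: float) -> RestorationPlan:
    if not 0 <= faulted < len(loads_kw):
        raise ValueError("faulted section index out of range")
    upstream = list(range(faulted))
    downstream = list(range(faulted + 1, len(loads_kw)))

    # Pick up sections contiguously, starting at the tie end of the feeder.
    fed_from_tie: List[int] = []
    remaining = tie_capacity_kw
    for section in reversed(downstream):
        if loads_kw[section] > remaining:
            break
        fed_from_tie.append(section)
        remaining -= loads_kw[section]

    unserved = [s for s in downstream if s not in fed_from_tie]
    return RestorationPlan(faulted, upstream, sorted(fed_from_tie), unserved)


plan = plan_restoration([100.0, 200.0, 150.0, 300.0], faulted=1, tie_capacity_kw=400.0)
print(plan)
```

In this example the tie source can carry section 3 but not section 2 as well, so section 2 remains unserved until the fault is repaired or further capacity becomes available.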

2.3.5. Role of Artificial Intelligence and Advanced Analytics

Modern power networks leverage AI and big data analytics to implement adaptive, predictive, and real-time self-healing solutions. AI methods excel at interpreting the vast influx of sensor data, enabling the following [49,50,51,52,53]:
(1)
Fault Pattern Recognition: Neural networks and machine learning models can detect subtle fault precursors by analyzing waveform distortions, harmonic anomalies, or partial discharge data.
(2)
Real-Time Contingency Analysis: AI-driven simulators can run contingency analyses in parallel, evaluating various switching actions or load transfers under multiple fault scenarios.
(3)
Adaptive Protection: In networks with high penetration of distributed generation, fault levels and power flows can vary significantly. AI-based adaptive protection adjusts relay settings dynamically to accommodate changing conditions.
(4)
Asset Health Forecasting: Machine learning algorithms parse historical failure data, meteorological records, and real-time measurements to predict the residual life of components, supporting proactive replacement or refurbishment decisions.
Such capabilities significantly bolster the autonomy and responsiveness of self-healing. Nevertheless, the adoption of AI necessitates robust verification, validation, and interpretability measures—especially for mission-critical power system applications.
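The idea of adaptive protection can be made concrete with a small rule-based stand-in. It is not an AI model: it simply recomputes an overcurrent pickup from recent load measurements and refuses to return a setting that would reach the smallest expected fault current. The function name, the 1.25 margin, and the sample currents are assumptions.

```python
from typing import Sequence


def adaptive_pickup_a(recent_load_a: Sequence[float],
                      min_fault_current_a: float,
                      margin: float = 1.25) -> float:
    """Pickup above the recent peak load but below the smallest fault current."""
    if not recent_load_a:
        raise ValueError("need at least one load sample")
    pickup = margin * max(recent_load_a)
    if pickup >= min_fault_current_a:
        # No setting can separate load from faults; escalate instead of guessing.
        raise ValueError("no secure setting: load margin reaches the fault level")
    return pickup


print(adaptive_pickup_a([800.0, 1200.0, 1000.0], min_fault_current_a=4000.0))
```

A learning-based scheme would replace the fixed margin with a model of expected load and fault levels, but the constraint that the setting must stay between the two remains the same.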

2.3.6. Security and Reliability Considerations

Because self-healing systems rely on extensive data exchange and automated decision-making, ensuring cybersecurity and reliability is paramount. Malicious actors could theoretically disrupt or manipulate automated functions, leading to unnecessary outages or, worse, physical damage to infrastructure. Consequently, modern architectures integrate the following [54,55,56]:
(1)
Intrusion Detection Systems (IDSs): Deployed at the substation level to monitor suspicious network traffic or unauthorized system access.
(2)
Encryption and Authentication: Communication protocols incorporate cryptographic methods to protect data integrity and confidentiality.
(3)
Access Control Policies: Role-based access, multi-factor authentication, and stringent authorization policies limit the potential attack surface.
(4)
Redundant Pathways: Networks are often designed with diverse communication paths and backup control systems, preventing single points of failure from compromising the entire self-healing mechanism.
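The authentication measure in item (2) can be illustrated with a keyed message authentication code from the Python standard library. This is a minimal sketch that assumes a shared secret key and a made-up command string. It provides integrity and origin checking only, without confidentiality or replay protection, and it is not the mechanism of any particular protocol.

```python
import hashlib
import hmac


def sign(key: bytes, payload: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the command payload."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify(key: bytes, payload: bytes, tag: bytes) -> bool:
    """Constant-time comparison avoids leaking how many bytes matched."""
    return hmac.compare_digest(sign(key, payload), tag)


# Hypothetical key and command; a real key would come from secure storage.
shared_key = b"example-key-for-illustration-only"
command = b"OPEN breaker=Q1 section=7"
tag = sign(shared_key, command)

print(verify(shared_key, command, tag))                        # genuine command
print(verify(shared_key, b"OPEN breaker=Q2 section=7", tag))   # tampered command
```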
In parallel, reliability assessments must consider the possibility of simultaneous equipment failures and communication outages. Scenario-based testing, hardware-in-the-loop simulations, and stress testing are commonly used to validate self-healing performance under extreme or cascading fault conditions.

2.3.7. Outlook: Convergence with Distributed Energy Resources and Microgrids

A contemporary trend shaping self-healing architectures is the rising prevalence of distributed energy resources (DERs). As more consumers install rooftop solar panels or adopt electric vehicles, distribution feeders can experience bidirectional power flow and dynamic load/generation profiles. Self-healing systems therefore require new algorithms capable of balancing local generation and consumption while maintaining system voltage and frequency stability.
Microgrids, localized energy networks that can operate autonomously, also introduce novel opportunities for self-healing [57,58]. In islanded mode, a microgrid’s self-healing mechanism can isolate internal faults and re-dispatch generation resources to preserve critical loads. When connected to the main utility grid, microgrids serve as controllable cells that bolster overall system resilience. Coordinating self-healing at the microgrid level with higher-level grid control is an active area of research, promising future improvements in reliability and energy efficiency.
In summary, the core components and architecture of modern self-healing mechanisms reflect a multifaceted interplay of advanced protective devices, intelligent data sharing governed by standards such as IEC 61850, distributed or multi-agent control paradigms, and AI-powered analytics. Together, these technologies underpin robust, flexible, and future-proof electrical power networks, setting the stage for specialized applications in subway systems, where the imperatives of safety, operational continuity, and rapid fault response are particularly pronounced.

2.4. Emerging Self-Healing Solutions in Subway Power Systems and Future Directions

Subway power systems, often referred to as traction power supply systems (as demonstrated in Figure 2), present a unique environment in which reliability, safety, and operational efficiency are paramount. Compared to traditional distribution networks, subway systems typically exhibit higher load densities over shorter distances, frequent load variations due to train acceleration and deceleration, and stringent safety requirements for passengers. These systems also incorporate specialized equipment such as rectifier transformers, third-rail or overhead catenary structures, and robust protective relays calibrated for traction loads. Although theoretical models and early-stage simulations show promise, empirical data from pilot studies and real-world subway systems are essential to validate the proposed self-healing solutions. Such studies will enable a comprehensive evaluation of the impact of self-healing on subway network reliability, operational efficiency, and safety. This subsection discusses how self-healing solutions are being adapted and refined to address the particular challenges of subway environments, highlighting key technological innovations, best practices, and prospects for future development.
Figure 2 illustrates the structure of an electrified railway traction power supply system, which delivers power to trains through either an alternating current (AC) supply, single-phase or three-phase, or a direct current (DC) supply. The system includes a transformer substation that converts high-voltage electricity to a level suitable for rail operations. The design is highly reliable and ensures a constant power supply to the trains while mitigating power losses. A significant feature is the integration of the return current rail, which provides a path for the current to flow back and thereby keeps the system both efficient and stable. The power supply network is typically designed for single-side (or single-arm) distribution, offering fault tolerance and making the system easy to monitor and maintain.
1. Function of the Traction Substation
The traction substation in the electrified railway system plays a critical role in power conversion and distribution. It receives high-voltage electricity from the main grid and steps it down to lower voltages suitable for the traction system. The traction substation regulates the power supply, ensuring the voltage and frequency meet the requirements for the trains to operate smoothly. In addition, it handles the distribution of electricity across various sections of the rail network, ensuring continuous power delivery and operational reliability for trains.
2. Function of the Catenary and Track
The catenary system in the electrified railway system provides the overhead line through which electrical power is transmitted to the trains. It is connected to the traction substation and supplies electricity to the train’s pantograph, ensuring consistent voltage and current flow. The track, often referred to as the return current rail, serves as the return path for the electrical current. It ensures that the electrical loop is closed, allowing the traction system to function effectively and ensuring the safe operation of the railway network by maintaining the flow of electricity and reducing the risk of electrical faults.
3. Summary of a Typical Electrified Railway Traction Power Supply System
The electrified railway traction power supply system depicted in Figure 2 serves as the foundational architecture for providing electrical power to urban transit systems, such as subways. This system efficiently converts high-voltage electricity from the main grid through traction substations, which step down the voltage to levels suitable for traction operations. Key components include the catenary system, which supplies power to the trains, and the return current rail, which closes the electrical loop by guiding the current back. The system’s key advantages lie in its robust reliability, adaptability to varying operational loads, and its fault-tolerant design that ensures continuous power delivery under normal and fault conditions. A notable feature is the integration of both AC and DC systems, offering the flexibility to accommodate different types of trains and operational demands. The system’s hybrid nature allows it to efficiently manage energy distribution, enhance operational stability, and minimize power losses, ensuring the sustained operation of the subway network.

2.4.1. Characteristics and Challenges of Subway Power Systems

Unlike conventional distribution grids, subway power networks are designed to handle high transient currents caused by accelerating trains and regenerative braking. They also feature multiple traction substations spaced along the railway line to ensure stable voltage supply. Key challenges include the following [59,60,61]:
(1)
Rapid Fluctuations in Load Demand: Train movements impose high-power draws within seconds, necessitating real-time monitoring of current and voltage profiles. Self-healing mechanisms must thus accommodate frequent load spikes without triggering false alarms.
(2)
Critical Safety Requirements: Failure in a subway power circuit can strand trains in tunnels or disrupt essential ventilation and signaling systems. Any self-healing strategy must prioritize passenger safety, ensuring that fault isolation or network reconfiguration does not inadvertently disconnect essential loads or violate safety protocols.
(3)
Limited Redundancy and Topological Constraints: While overhead distribution networks can add tie-lines or reconfigure feeders relatively easily, subway systems often have limited alternatives for routing power around a fault due to space constraints and rigid corridor layouts. This places heavier emphasis on pinpoint fault localization and targeted restoration strategies.
(4)
Integration with Signaling and Control Systems: Subway power infrastructure is closely interlinked with signaling, communications, and station facilities. Coordinating self-healing events with traction power protection, passenger information systems, and operational schedules can be complex, requiring robust communication and control architectures.

2.4.2. Adapting Self-Healing Functions for Subway Applications

Subway power systems have begun to adopt many of the core self-healing functions originally developed for electrical distribution grids, albeit with tailored modifications to meet stringent traction needs.
(1)
Fault Detection and Localization: Traditional overcurrent or distance protection relays, combined with advanced sensor arrays, are complemented by traction-specific detection algorithms that account for the distinctive waveforms and power electronics used in subway systems. For instance, in systems equipped with regenerative braking, fault signals can overlap with normal operational signals. Intelligent algorithms, often grounded in AI-based pattern recognition, can distinguish these conditions more accurately than conventional threshold-based relays.
(2)
Isolation Strategies: Unlike overhead feeders, subway power rails or catenaries cannot always be sectionalized as flexibly. Self-healing solutions typically rely on specialized disconnect switches or breaker arrangements at traction substations. These devices must isolate the faulted segment while retaining power to adjacent segments, preventing a single fault event from cascading into large-scale service interruptions. The isolation strategy may also consider the dynamic location of trains, ensuring that no train is left in an unsafe or dark tunnel segment during the isolation process.
(3)
Rapid Service Restoration: Given the high passenger throughput in urban subway networks, restoring power promptly is a top operational priority. Some subway operators deploy ring or loop architectures, allowing the line to be fed from multiple substations. When a fault occurs, the system automatically opens circuit breakers to isolate the fault and closes alternate pathways so that power can still be supplied from another substation. Adaptive algorithms within a multi-agent framework can further refine the restoration sequence, minimizing inrush currents and voltage dips when re-energizing lines.

2.4.3. Leveraging IEC 61850 and Multi-Agent Systems in Subway Contexts

Building on the architectural insights gained from larger distribution systems, subway operators are increasingly looking to IEC 61850 for standardizing communications among traction substations, protective devices, and control centers. The flexibility of IEC 61850 logical nodes permits the modeling of traction-specific devices—like rectifier units or track section switches—ensuring that relevant fault signals and control commands can be shared efficiently. While these integration strategies have been discussed extensively in the literature, field trials and pilot programs are necessary to assess the true effectiveness of IEC 61850 in operational subway systems. Empirical case studies will provide valuable insights into the challenges and opportunities of implementing these technologies in metro environments.
MASs are proving especially promising in this domain. Agents deployed at each traction substation or track section can monitor local conditions (e.g., voltage levels, breaker states, and train locations) and communicate with neighboring agents to coordinate fault isolation and reconfiguration. By distributing intelligence throughout the power network, MASs can significantly reduce dependence on a central control center, thereby mitigating single-point failures and communication latency issues. In scenarios where partial or full communication loss occurs—an unfortunate but conceivable event in underground tunnels—MAS agents can resort to fallback strategies or local heuristics to maintain at least basic service levels.
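The fallback behavior described above can be sketched as a substation agent that tracks when it last heard from a neighbor and switches to a local rule once that neighbor goes silent. The class, the action names, the substation labels, and the two-second timeout are hypothetical. Timestamps are passed in explicitly so that the logic stays deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SubstationAgent:
    name: str
    timeout_s: float = 2.0  # assumed limit before a neighbor counts as lost
    last_heard: Dict[str, float] = field(default_factory=dict)

    def receive_heartbeat(self, neighbor: str, now: float) -> None:
        self.last_heard[neighbor] = now

    def reachable(self, neighbor: str, now: float) -> bool:
        last = self.last_heard.get(neighbor)
        return last is not None and now - last <= self.timeout_s

    def decide(self, neighbor: str, local_fault: bool, now: float) -> str:
        if not local_fault:
            return "monitor"
        if self.reachable(neighbor, now):
            return "coordinate_with_neighbor"  # negotiate isolation and re-feed
        return "isolate_locally"               # fallback when communication is lost


agent = SubstationAgent("TSS-1")
agent.receive_heartbeat("TSS-2", now=10.0)
print(agent.decide("TSS-2", local_fault=True, now=11.0))
print(agent.decide("TSS-2", local_fault=True, now=13.0))
```

The second call falls back to local isolation because the last heartbeat is three seconds old, which exceeds the assumed timeout.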

2.4.4. AI-Driven Fault Prediction and Maintenance

Artificial intelligence techniques are increasingly adopted for condition-based maintenance and fault prediction in subway power systems, complementing their role in real-time restoration. For instance, traction power cables and switchgear can be equipped with sensors measuring temperature, partial discharge activity, and vibration. Machine learning models process these data to predict the health status of components and forecast the likelihood of imminent failure.
This predictive approach is especially valuable in subways where service disruptions can affect thousands of passengers in a short timeframe. By scheduling maintenance during off-peak hours or proactively replacing aging components, subway operators reduce the risk of disruptive breakdowns. Furthermore, advanced data analytics can optimize maintenance budgets by prioritizing interventions on components with the highest criticality and most pronounced signs of deterioration.
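A very simple statistical baseline conveys the idea of condition monitoring before any machine learning model is involved: flag a sensor reading whose z-score against recent history exceeds a threshold. The temperature values, the threshold of 3, and the function names are assumptions for illustration. The models surveyed in the literature are considerably more elaborate.

```python
from statistics import mean, stdev
from typing import Sequence


def z_score(history: Sequence[float], latest: float) -> float:
    """Distance of the latest reading from the historical mean, in standard deviations."""
    if len(history) < 2:
        raise ValueError("need at least two historical readings")
    spread = stdev(history)
    if spread == 0:
        # Flat history: any different reading is infinitely unusual.
        return 0.0 if latest == history[0] else float("inf")
    return (latest - mean(history)) / spread


def is_anomalous(history: Sequence[float], latest: float,
                 threshold: float = 3.0) -> bool:
    return abs(z_score(history, latest)) > threshold


# Hypothetical cable-joint temperatures in degrees Celsius.
joint_temps_c = [60.0, 61.0, 59.0, 60.0, 62.0]
print(is_anomalous(joint_temps_c, 61.0))
print(is_anomalous(joint_temps_c, 66.0))
```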

2.4.5. Case Studies and Pilots

A growing number of urban transit authorities have undertaken pilot programs to test self-healing functionalities:
(1)
Pilot A deployed a multi-agent system spanning multiple traction substations on a busy metropolitan rail line. Each substation agent automatically adjusted feeder connections when localized faults were detected. Early results showed a drastic reduction in fault clearance times and improved power quality during reconfiguration.
(2)
Pilot B focused on AI-based fault prediction for critical power components. By combining historical data on cable insulation failures with real-time temperature and partial discharge sensors, the pilot achieved a substantial decrease in unexpected cable faults, enhancing overall system availability.
(3)
Pilot C explored the integration of IEC 61850-based control architecture in a newly built subway extension. Standardized communication protocols allowed different vendors’ substations, protective relays, and SCADA systems to interoperate. The pilot demonstrated that advanced GOOSE messaging could achieve fault isolation within milliseconds, significantly reducing service interruptions.
While these pilots demonstrate the feasibility and benefits of self-healing solutions, they also highlight the importance of robust training programs for maintenance staff and control center operators, clear design guidelines for applying standards like IEC 61850 to traction scenarios, and thorough cybersecurity audits to safeguard the system from unauthorized access.

2.4.6. Future Directions and Research Opportunities

The evolution of subway power systems toward self-healing architectures continues to present numerous opportunities for further innovation and refinement. One of the primary research areas is the validation of self-healing technologies through pilot studies and real-world data collection. These studies will play a crucial role in demonstrating the practical applicability of these technologies in operational metro systems. Furthermore, research in advanced AI-based fault prediction and maintenance, coupled with real-time data analytics, will be essential to optimize self-healing processes and reduce the occurrence of service disruptions. These opportunities are summarized as follows.
(1)
Integration with Smart Mobility and Energy Management: As urban areas adopt more holistic “smart city” strategies, subway power systems may be integrated with other mobility solutions, such as electric buses or shared autonomous vehicles, forming an interconnected transportation energy ecosystem. Coordinated energy management across these systems could unlock novel self-healing and load balancing capabilities, for example by rerouting excess regenerative braking energy to nearby electrical loads or EV charging stations.
(2)
Enhanced Sensor Deployment and Data Analytics: Future subways could leverage high-resolution sensors for continuous waveform monitoring, partial discharge analysis, and real-time location tracking of trains. With the advent of 5G and edge computing, massive data streams can be processed at the substation or trackside in near real time, facilitating ultra-fast fault detection and system reconfiguration. Research in advanced analytics, such as deep neural networks or reinforcement learning, promises to further refine these capabilities.
(3)
Holistic Resilience Frameworks: Beyond electrical faults, subway systems may face a variety of disruptions, from extreme weather events (e.g., flooding in tunnels) to cyberattacks targeting control systems. Expanding self-healing to encompass multi-hazard resilience would involve integrated monitoring of infrastructure conditions (e.g., water leakage and track integrity) and dynamic adaptation of protective or evacuation measures. This comprehensive approach would require new interdisciplinary collaborations among electrical engineers, civil engineers, cybersecurity experts, and urban planners.
(4)
Human-in-the-Loop vs. Full Autonomy: Although the end goal for many operators is to minimize human intervention, achieving full autonomy in critical infrastructure raises important questions regarding reliability, liability, and public acceptance. Ongoing research could investigate hybrid frameworks that allow human supervisors to override or guide self-healing decisions when system states deviate significantly from normal operating conditions. This “human-in-the-loop” paradigm can bolster operator trust while still leveraging the speed and efficiency of AI-driven automation.
(5)
Regulatory and Standardization Needs: Uniform guidelines for applying IEC 61850 or similar standards to traction power systems remain in their infancy. Multiple national and international standard-setting bodies may need to coordinate new protocols specific to subway environments. Moreover, regulators must evaluate safety and reliability metrics in the context of self-healing performance, ensuring that subway operators maintain rigorous compliance with established norms.

2.4.7. Synthesis and Outlook

In summary, the adoption of self-healing technologies in subway power systems signifies a pivotal shift from reactive fault response to proactive, intelligent, and resilient operations. By leveraging principles from larger electrical grids—such as advanced sensor networks, IEC 61850-based communications, multi-agent coordination, and AI-enhanced analytics—subway operators can substantially improve service reliability and safety. The distinct challenges posed by subterranean environments, constrained topology, and high-density loads necessitate careful customization of these technologies, but successful pilot programs demonstrate their feasibility and benefits.
Looking forward, the continued urbanization of metropolitan centers and the increasing importance of mass transit solutions position subway systems as prime candidates for next-generation self-healing research. Ongoing studies will likely explore deeper integration with other urban energy systems, broader resilience frameworks that account for climate and security risks, and innovative control architectures balancing automated intelligence with prudent human oversight. As these efforts mature, self-healing subway power systems will not only advance the overall reliability of urban rail transit but will also serve as an exemplary application domain, pushing the boundaries of intelligent control and automation in critical infrastructures worldwide.

3. Specific Challenges Faced by Subway Power Supply Systems

Subway power supply systems present a unique set of technical, operational, and regulatory challenges that distinguish them from traditional power grids or typical distribution networks. These challenges arise not only from the complex topology and confined operating environment in underground rail systems but also from increasingly stringent requirements for reliability, safety, and real-time fault management. Furthermore, the push toward higher levels of automation and intelligence introduces additional layers of complexity and integration needs, including communication standards, multi-agent coordination, and data-driven algorithms.
In this section, we discuss three principal categories of challenges, each meriting a dedicated subsection. First, in Section 3.1, we analyze the complex topology and operational constraints that impede straightforward adoption of conventional self-healing solutions. Second, in Section 3.2, we delve into fault diagnosis, isolation, and recovery in real time, focusing on the technological barriers and performance metrics that must be addressed to achieve rapid response. Finally, in Section 3.3, we examine regulatory, safety, and integration barriers with emerging technologies, emphasizing the interplay between standards compliance (e.g., IEC 61850), safety-critical requirements, and the integration of MASs and AI. These three dimensions are interlinked, collectively shaping the reliability, intelligence, and overall feasibility of self-healing subway power supply systems.

3.1. Complexity of Topology and Operational Constraints

Subway power supply systems typically feature intricate power distribution architectures, including multiple substations and complex feeder lines that operate under high load density and limited physical space [62]. This complexity is driven by high passenger demand, the need for continuous operation, and safety requirements such as mandatory power redundancy to allow safe evacuation of passengers in case of a single-point failure [63,64]. These design constraints make it difficult to implement conventional self-healing solutions, which are often based on more flexible or widely distributed networks. In the following subsections, these technical complexities are presented in simplified form to keep them accessible to interdisciplinary and non-specialist readers.

3.1.1. Unique Structural Layout and Load Characteristics

Unlike standard distribution networks, subway systems employ ring-like or radial topologies, often in combination, to ensure redundancy. These topologies are designed to provide backup power in the event of a fault, but the limited space in tunnels restricts the addition of extra cables or protective devices. Furthermore, subway systems experience rapid fluctuations in load due to train acceleration, deceleration, and regenerative braking. This variability creates challenges for monitoring and controlling the system in real time. To make this concrete, the load behavior is described below with a simplified model.
Mathematically, the power flow through a given feeder segment $i$ can be described with a simplified representation of a DC traction power system (DC electrification is used in many modern subway systems). If $P_i$ is the power demanded by the train(s) on feeder $i$ and $V_i$ is the operating voltage, then the current $I_i$ is as follows:
$$I_i = \frac{P_i}{V_i}. \tag{4}$$
Equation (4) relates the feeder current $I_i$ to the power demand $P_i$ on feeder $i$, where $P_i$ changes as the train moves and accelerates. Traditional fault detection models assume that the load is stable. In a subway system, however, the load fluctuates rapidly, so algorithms that rely on steady-state signals become less effective for real-time monitoring, and adaptive methods are needed to account for these changes.
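As a rough numerical illustration of Equation (4), consider hypothetical values that are not taken from the reviewed literature: a nominal feeder voltage of 1500 V DC and a demand that rises from 1 MW to 3 MW while a train accelerates.

$$I_i = \frac{P_i}{V_i}: \qquad \frac{1 \times 10^{6}\ \text{W}}{1500\ \text{V}} \approx 667\ \text{A} \quad\longrightarrow\quad \frac{3 \times 10^{6}\ \text{W}}{1500\ \text{V}} = 2000\ \text{A}.$$

Under these assumptions the feeder current triples within seconds during entirely normal operation, which suggests why a fixed current threshold tuned for a steady load is hard to set: placed low it may trip on ordinary acceleration, and placed high it may miss genuine faults.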
Because $P_i$ changes frequently with train movement and acceleration patterns, the power balance equation must be continually updated in real time. This results in a time-varying load profile:
$$P_i(t) = F(\text{position}, \text{acceleration}, \text{passenger load}, \ldots), \tag{5}$$
where $F(\cdot)$ is a function capturing the instantaneous power consumption influenced by operational and environmental factors. Such dynamic load behavior complicates fault detection algorithms that rely on stable or quasi-steady-state load signatures, necessitating advanced, predictive, or adaptive methods.
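One possible concrete form of the load function can be sketched from basic mechanics, with electrical power taken as tractive force times speed, corrected for drivetrain efficiency, plus auxiliary consumption. This is an assumed toy model, not the function used in the reviewed work: the running resistance, efficiency, auxiliary load, and the 1500 V feeder voltage are illustrative, and passenger load enters only through the train mass.

```python
def traction_power_kw(mass_t: float, speed_mps: float, accel_mps2: float,
                      resistance_kn: float = 10.0, efficiency: float = 0.85,
                      aux_kw: float = 100.0) -> float:
    """Electrical demand of one train; negative values mean net regeneration."""
    force_kn = mass_t * accel_mps2 + resistance_kn  # tonnes * m/s^2 gives kN
    mech_kw = force_kn * speed_mps                  # kN * m/s gives kW
    if mech_kw >= 0:
        return mech_kw / efficiency + aux_kw        # motoring: losses add to demand
    return mech_kw * efficiency + aux_kw            # braking: losses reduce recovery


def expected_current_a(power_kw: float, voltage_v: float = 1500.0) -> float:
    """Feeder current implied by the power demand at a given voltage."""
    return power_kw * 1000.0 / voltage_v


# Hypothetical 300 t train (vehicle plus passengers) at 10 m/s.
for label, accel in [("accelerating", 1.0), ("coasting", 0.0), ("braking", -1.0)]:
    power = traction_power_kw(300.0, 10.0, accel)
    print(label, round(power, 1), "kW", round(expected_current_a(power), 1), "A")
```

A detector that compares the measured current with the value expected from such a model, rather than with a fixed limit, is one way to obtain the adaptive behavior called for above.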

3.1.2. Space Constraints and Infrastructure Limitations

Owing to dense urban environments, subway substations and power cables often share limited underground space with other utilities. This limited space makes it difficult to install additional equipment or sensors, which are crucial for real-time fault detection and system monitoring. The physical layout can restrict the addition of redundant cables or the implementation of standard protective equipment, such as circuit breakers and advanced switchgear. The deployment of sensors for condition monitoring and the retrofitting of intelligent devices also become more challenging in tight spaces. These infrastructural limitations underscore the importance of designing compact yet reliable control modules.
Moreover, specialized ventilation and cooling requirements, imposed by the underground setting, introduce additional operational constraints. Protective devices must be designed to handle higher ambient temperatures and humidity levels, while also meeting stringent fire and smoke control regulations. Consequently, hardware designed for standard aboveground distribution networks is not always directly applicable to subway environments. In summary, the confined environment of subway systems demands compact, yet highly reliable equipment for efficient self-healing mechanisms.

3.1.3. Operational Demands and Safety Considerations

Safety is of paramount concern in subway operations. Any fault or power interruption could endanger passengers, particularly in tunnels with limited escape routes. Subway operators need systems that can quickly detect faults, isolate them, and restore power to critical systems like ventilation and lighting to ensure passenger safety. In a typical self-healing system, power restoration is based on load importance. In subways, however, safety-critical loads such as lighting and ventilation must always be prioritized, even if they conflict with restoring power to non-essential areas. This places additional demands on the system to function optimally under high stress.
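The priority rule described above can be sketched as a greedy restoration order in which safety-critical loads are always considered first and the remaining loads follow by importance, subject to the capacity left after the fault. The load names, demands, importance scores, and capacity are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Load:
    name: str
    demand_kw: float
    safety_critical: bool
    importance: int  # higher means more important among non-critical loads


def restore_order(loads: List[Load], capacity_kw: float) -> Tuple[List[str], List[str]]:
    """Return (restored, shed) names; safety-critical loads are ranked first."""
    ranked = sorted(loads, key=lambda l: (not l.safety_critical, -l.importance))
    restored: List[str] = []
    shed: List[str] = []
    remaining = capacity_kw
    for load in ranked:
        if load.demand_kw <= remaining:
            restored.append(load.name)
            remaining -= load.demand_kw
        else:
            shed.append(load.name)
    return restored, shed


station_loads = [
    Load("escalators", 150.0, safety_critical=False, importance=5),
    Load("ventilation", 200.0, safety_critical=True, importance=1),
    Load("advertising", 80.0, safety_critical=False, importance=2),
    Load("lighting", 50.0, safety_critical=True, importance=1),
]
print(restore_order(station_loads, capacity_kw=350.0))
```

With 350 kW available, ventilation and lighting are restored first even though escalators carry the highest importance score. Escalators are then shed because they no longer fit, while the smaller advertising load does.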
To meet these heightened safety requirements, self-healing solutions must incorporate advanced features, such as automatic rerouting of power and dynamic load shedding for non-essential systems. The complexity of these solutions, combined with the underground operational constraints, highlights the need for more refined, adaptive control strategies. Based on this, Table 2 summarizes eight key aspects of complex topological and operational constraints in power and energy systems, along with a special focus on subway power supply systems [65,66,67]. This table encapsulates the multifaceted nature of operational constraints and network topologies in subway systems compared to broader power and energy frameworks. From the heightened fault tolerance requirements necessary to safeguard human lives, to severe spatial limitations and challenging environmental conditions, subway environments significantly amplify conventional distribution network complexities. These constraints underscore the urgent need for research into compact, resilient, and adaptive technologies, including advanced sensor networks, real-time analytics, and robust communication protocols. Our viewpoint is that developing a holistic approach—one that integrates hardware design, data analytics, and regulatory compliance—will be essential for effectively addressing these challenges in self-healing subway systems.
Table 3 expands on the same constraints from the perspective of potential technological and research-based interventions aimed at overcoming them. It highlights the range of existing and emerging technological interventions that address the unique constraints of subway power supply systems. While many solutions are at a pilot or early-adoption stage, they collectively represent promising avenues for achieving greater reliability and smarter operations. Crucially, the success of these innovations depends not only on technological feasibility but also on regulatory frameworks, cost-effectiveness, and the availability of skilled personnel. Our view is that a holistic lifecycle approach, one that spans design, implementation, maintenance, and upgrade, will be the linchpin for ensuring these interventions evolve into robust, standardized solutions tailored to underground rail environments.

3.2. Fault Diagnosis, Isolation, and Recovery in Real-Time

Real-time fault diagnosis, isolation, and system recovery form the technical core of any self-healing power network. In subway power supply systems, rapid fault response is even more imperative because of high safety requirements, the potential for substantial passenger disruptions, and the confined nature of subway tunnels. This section delves into the distinctive aspects of fault management in subway systems, focusing on the integration of advanced diagnostics, multi-agent coordination, and communication protocols suitable for underground conditions. We will now break down these concepts into simpler terms, explaining the steps involved in fault detection and recovery.

3.2.1. High-Speed Fault Detection and Localization

Subway power systems typically rely on a combination of protective relays and local measurement devices (e.g., current transformers and voltage sensors) to identify fault conditions [68,69,70]. Because faults in a traction network can escalate quickly, these devices must detect a fault almost instantly, before it fully propagates through the network. Whereas conventional protection schemes can often tolerate detection delays of several cycles, subway systems need detection and response within a fraction of a cycle to prevent accidents and minimize damage. Formula (6) provides a mathematical framework for fault detection using wavelet transforms, enabling the detection of rapid fault transients with high precision.
Recent advances in algorithms, such as wavelet-based methods, have shown promise in identifying fault signals within a fraction of a cycle [71,72,73]. These methods work by detecting sudden changes in voltage or current signals. Wavelet transforms are particularly effective because they analyze signals at multiple scales, so abrupt changes can be picked out in real time. The key challenge is ensuring that these algorithms can handle the variability caused by rapid load changes, which are common in subway systems. Formula (6) defines the wavelet transform used to capture such changes; applying it to the measured signals shortens detection time.
For instance, a wavelet transform approach can identify abrupt changes in current or voltage signals within fractions of a cycle. Let $I_a(t)$ be the fault current signal measured at a feeder location $a$. A wavelet-based algorithm may compute the wavelet coefficients $W(\tau, s)$ at scale $s$ and time shift $\tau$:
$$W(\tau, s) = \frac{1}{\sqrt{s}} \int I_a(t)\, \psi\!\left(\frac{t - \tau}{s}\right) dt, \tag{6}$$
where ψ is the mother wavelet. By identifying large coefficient magnitudes in certain frequency bands, the fault can be detected and localized almost instantaneously. This gives a practical basis for real-time fault detection in subway power systems. Keeping these algorithms robust to variable load levels and measurement noise remains an ongoing challenge, especially underground, where electromagnetic interference and fluctuating load demands must be accounted for in the design of detection algorithms.
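As a rough illustration of how Formula (6) might be evaluated on sampled data, the following sketch approximates the integral with a Riemann sum at a single small scale and flags the first coefficient whose magnitude exceeds a threshold. The Mexican-hat mother wavelet, the 10 kHz sampling rate, the scale, the threshold, and the synthetic current waveform are all assumptions chosen for this example; the text does not specify them.

```python
import math

def ricker(u):
    """Mexican-hat mother wavelet psi(u); an assumed choice, the text does not fix psi."""
    return (1.0 - u * u) * math.exp(-0.5 * u * u)

def wavelet_coefficients(samples, dt, scale, support=6.0):
    """Riemann-sum approximation of W(tau, s) at one scale, with tau on the sample grid."""
    half = int(math.ceil(support * scale / dt))  # kernel is negligible beyond +/- support*scale
    norm = dt / math.sqrt(scale)                 # dt from the sum, 1/sqrt(s) from Formula (6)
    n = len(samples)
    coeffs = []
    for k in range(n):
        acc = 0.0
        for j in range(max(0, k - half), min(n, k + half + 1)):
            acc += samples[j] * ricker((j - k) * dt / scale)
        coeffs.append(acc * norm)
    return coeffs

def first_exceedance(coeffs, threshold):
    """Index of the first coefficient whose magnitude exceeds the threshold, else None."""
    for k, w in enumerate(coeffs):
        if abs(w) > threshold:
            return k
    return None

# Synthetic feeder current: 50 Hz load current plus a sudden offset at t = 0.05 s.
DT = 1e-4          # 10 kHz sampling (assumed)
SCALE = 5e-4       # one small scale, sensitive to fast transients (assumed)
THRESHOLD = 0.02   # hand-picked for this synthetic signal (assumed)
FAULT_TIME = 0.05

def feeder_current(t, faulted):
    load = math.sin(2.0 * math.pi * 50.0 * t)
    return load + (4.0 if faulted and t >= FAULT_TIME else 0.0)

times = [k * DT for k in range(1000)]
healthy = [feeder_current(t, False) for t in times]
faulty = [feeder_current(t, True) for t in times]

healthy_hit = first_exceedance(wavelet_coefficients(healthy, DT, SCALE), THRESHOLD)
fault_hit = first_exceedance(wavelet_coefficients(faulty, DT, SCALE), THRESHOLD)
```

In this sketch the smooth 50 Hz component produces only small coefficients at the chosen scale, while the abrupt offset produces large ones near the fault instant. Because the kernel is symmetric, each coefficient uses samples on both sides of its time shift, so an online implementation would incur a latency of roughly half the kernel width.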

3.2.2. Isolation Strategies in Constrained Environments

Once a fault is detected, it must be isolated quickly to prevent further damage. In subway systems, space constraints make it difficult to deploy additional circuit breakers. Instead of relying on a central controller and extra switchgear, the system can use a network of intelligent devices that work together to identify and isolate the faulted section.
MASs offer a promising way to do this by allowing distributed relays or intelligent electronic devices (IEDs) to communicate and coordinate their actions [74,75,76]. When a fault is detected, these agents negotiate which segment should be isolated, balancing safety, load requirements, and operational constraints. Such decision-making can be modeled as an optimization problem under real-time constraints, often solvable through heuristics or simplified linear programming approaches so that the solution is computed fast enough for practical deployment.
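One simple heuristic that agents could converge on is sketched below, under the assumption of a radial feeder supplied from one end: only agents upstream of the fault see fault current, so the faulted section lies just past the last agent that reports it. The function names, the flag list, and the command tuples are hypothetical and are not taken from the cited works; double-end-fed DC sections would need a different rule.

```python
def locate_faulted_section(saw_fault_current):
    """Return (upstream_switch, downstream_switch) bounding the faulted section.

    saw_fault_current[i] is True when agent i, ordered from the source outward,
    measured fault current. Assumes a radial, single-end-fed feeder.
    """
    n_seen = sum(saw_fault_current)
    if n_seen == 0:
        return None  # no agent saw fault current
    if not all(saw_fault_current[:n_seen]):
        # Flags must form an unbroken run starting at the source; otherwise a sensor is suspect.
        raise ValueError("inconsistent fault flags; fall back to backup protection")
    upstream = n_seen - 1
    downstream = upstream + 1 if upstream + 1 < len(saw_fault_current) else None
    return (upstream, downstream)

def isolation_commands(section):
    """Translate the bounding switches into 'open' commands (hypothetical message format)."""
    if section is None:
        return []
    return [("open", switch) for switch in section if switch is not None]

# Four agents along a feeder; the first two report fault current.
section = locate_faulted_section([True, True, False, False])
commands = isolation_commands(section)
```

Here the fault is placed between agents 1 and 2, and both bounding switches are opened. A fuller formulation would add the load and safety terms mentioned above to choose among alternative isolation boundaries.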

3.2.3. Rapid Service Restoration and Self-Healing Techniques

After isolation, the objective is to restore service to as many segments as possible while keeping critical loads operational. To do this, self-healing mechanisms may reroute power along alternative feed paths, draw on available backup power sources, or, in DC traction systems, reconfigure the supply from one substation to another where multiple substations supply overlapping regions. The process must prioritize safety-critical systems and avoid overloading other sections of the network. MASs can help coordinate this process by allowing different agents to communicate and adjust their actions based on real-time conditions. This adaptive approach helps minimize downtime and prevent further faults from occurring during the restoration process. Key considerations for restoration strategies include the following:
(1)
Prioritization of Essential Loads: Station lighting, ventilation fans, and communication systems typically take precedence.
(2)
Gradual Re-Energization: Inrush currents from multiple loads can lead to secondary faults if not controlled.
(3)
Adaptive Coordination: Agents update each other on the status of circuit breakers and load demands, recalculating the optimal restoration path dynamically.
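The first two considerations can be sketched as a greedy plan: loads are admitted in priority order while the alternate feed has capacity, and the admitted loads are grouped into energization steps so that no single step picks up too much demand at once. The load names, priorities, demands, and limits below are invented for illustration, and a step-size cap is only a crude stand-in for a proper inrush-current model.

```python
from dataclasses import dataclass

@dataclass
class Load:
    name: str
    priority: int      # lower number = more critical (assumed convention)
    demand_kw: float

def plan_restoration(loads, capacity_kw, max_step_kw):
    """Admit loads in priority order within capacity, then group them into steps.

    Returns (steps, skipped): steps is a list of lists of load names to energize
    together; skipped lists loads that did not fit in the remaining capacity.
    """
    remaining = capacity_kw
    steps, current, current_kw = [], [], 0.0
    skipped = []
    for load in sorted(loads, key=lambda item: (item.priority, item.demand_kw)):
        if load.demand_kw > remaining:
            skipped.append(load.name)
            continue
        remaining -= load.demand_kw
        # Start a new step when adding this load would exceed the per-step limit.
        if current and current_kw + load.demand_kw > max_step_kw:
            steps.append(current)
            current, current_kw = [], 0.0
        current.append(load.name)
        current_kw += load.demand_kw
    if current:
        steps.append(current)
    return steps, skipped

station_loads = [
    Load("ventilation", 1, 120.0),
    Load("lighting", 1, 40.0),
    Load("communications", 1, 20.0),
    Load("escalators", 2, 90.0),
    Load("advertising", 3, 30.0),
    Load("office_hvac", 3, 150.0),
]
steps, skipped = plan_restoration(station_loads, capacity_kw=300.0, max_step_kw=150.0)
```

With these numbers the safety-critical loads are energized in the first two steps and the office HVAC load is deferred. In an agent-based scheme, the plan would be recomputed whenever breaker status or demand changes, which corresponds to the third consideration.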
Fault-tolerant communication remains pivotal here, as real-time data exchange among IEDs, substation controllers, and train control centers is critical for coordinated restoration. Table 4 describes the key dimensions of real-time fault management and how they compare between general power systems and subway-specific applications. It clarifies how fault management considerations in subways differ from those in broader power systems. While the need for speedy detection and isolation exists universally, the stakes in a subway environment are amplified by passenger safety and the confined space. Traditional solutions must often be miniaturized, accelerated, or re-engineered for subterranean use. Our viewpoint is that a concerted push toward distributed, intelligent solutions that integrate seamlessly with robust communication frameworks is vital. Given the urgency and operational constraints, these solutions should also incorporate redundancy in both hardware and decision-making processes.
Table 5 focuses on specific technological enablers and approaches that enhance fault detection, isolation, and recovery in real time. It spotlights the technological solutions that promise to improve real-time fault management in subway power systems. From high-speed relays and advanced signal processing techniques to multi-agent coordination and AI-driven prediction, the options are diverse yet complementary. In our assessment, the challenge lies in harmonizing these approaches into a unified architecture that can meet the stringent safety and reliability benchmarks specific to subway operations. As communications improve and the costs of sensors and computational hardware decline, the feasibility of sophisticated real-time schemes will only increase, reinforcing the need for standardization and robust testing in live subway environments.

3.3. Regulatory, Safety, and Integration Barriers with Emerging Technologies

While technology plays a vital role in self-healing subway power systems, it must align with strict regulatory frameworks and safety standards. Subways are subject to multiple layers of oversight from local governments and safety authorities, making the certification and adoption of new technologies a complex process.

3.3.1. Safety Standards and Compliance Requirements

Subway systems are subject to multiple layers of regulation, typically involving railway authorities, local governments, and international standards bodies. For example, any modifications to power infrastructure may require compliance with IEC 62443 (for industrial communication networks and security), local traction power guidelines, and specialized rail transit codes. IEC 61850, although originally developed for substation automation in power grids, is increasingly recognized for its potential in rail environments, yet it must be adapted to handle traction power specifics and integrated with existing railway safety protocols (e.g., EN 50126/50128/50129 in Europe) [77,78,79].
Attaining certification for newly introduced protective devices or software modules can take years, owing to the rigorous testing processes mandated for passenger safety. This elongated timeframe impacts the agility with which subway operators can adopt emerging self-healing technologies and necessitates thorough planning from the inception of any R&D initiative.

3.3.2. Interoperability and Integration with Legacy Systems

Many subway power systems were installed decades ago and lack the modern communication interfaces required to support new technologies. Integrating these systems with MAS and AI solutions requires overcoming significant challenges such as protocol mismatches and hardware limitations. One solution is to use gateways and middleware to bridge the gap between old and new technologies, allowing legacy systems to communicate with modern devices. This approach can help transition subway systems to self-healing architectures without needing a complete overhaul. The major hurdles include the following [80,81,82]:
(1)
Data Format Incompatibility: Legacy devices may not generate standardized digital outputs necessary for AI-based analysis.
(2)
Protocol Mismatch: Communication standards, such as IEC 61850, must be layered on top of older SCADA systems or even analog control signals.
(3)
Hardware Limitations: Legacy switchgear may lack the control interfaces to enable external agent-based decisions or real-time reconfiguration.
For example, the integration of modern automation and intelligent control systems into legacy subway power infrastructures presents significant challenges due to outdated equipment and communication protocols. Mbango (2009) [80] highlights the difficulties of retrofitting SCADA-based legacy systems with modern communication standards like IEC 61850, emphasizing compatibility issues with aging switchgear and transformers. Dutta Pramanik and Upadhyaya (2025) [81] further explore how advanced IoT solutions, including motorized actuators and standardized communication protocols, can be layered onto older grid systems to bridge protocol mismatches and ensure interoperability while mitigating vendor lock-in. Additionally, their study underscores the necessity of updating legacy data formats and communication systems to enable AI-driven and MAS applications, as modern digital outputs and real-time decision-making capabilities often exceed the capabilities of older infrastructure. Together, these studies provide a comprehensive analysis of the technical and financial challenges associated with modernizing legacy subway power networks while ensuring reliability and efficiency.
As a result, achieving a unified self-healing architecture often involves partial overhauls or staged deployments, which complicate operational continuity and budget planning. Figure 3 illustrates the process of integrating advanced MAS or AI solutions into aging subway power infrastructures. This flowchart highlights key challenges (such as data format incompatibility, protocol mismatch, and hardware constraints) and proposes a phased approach to achieve a unified self-healing architecture while preserving operational continuity. The steps shown in Figure 3 are summarized as follows.
1. Identify Existing Legacy Devices
A thorough survey of legacy switchgear and control equipment is conducted to determine their current functionality, communication interfaces, and overall compatibility with modern data acquisition and control standards.
2. Evaluate Key Constraints
(1)
Data Format Incompatibility: Older devices may provide analog or proprietary digital signals, which necessitate specialized conversion or encapsulation.
(2)
Protocol Mismatch: Historic SCADA platforms or purely analog signaling can diverge significantly from contemporary standards like IEC 61850.
(3)
Hardware Limitations: Legacy switchgear often lacks the necessary interfaces for remote actuation or real-time reconfiguration, hindering direct MAS or AI control.
3. Data and Protocol Adaptation
(1)
Protocol Gateways/Bridges: Gateways facilitate communication between legacy systems and modern platforms without requiring a wholesale replacement.
(2)
IEC 61850 Encapsulation: Wrapping legacy SCADA or analog signals in IEC 61850-compliant structures enables standardized management and interoperability.
(3)
Middleware for Data Format Conversion: Dedicated software tools unify disparate data formats, facilitating seamless integration into MAS or AI analytics.
4. Hardware Retrofits and Expansions
(1)
Upgraded Control Interfaces: Introducing new control boards or modules into legacy switchgear equips these devices with real-time monitoring and remote operation capabilities.
(2)
Additional Real-Time Monitoring Modules: Enhancing measurement accuracy and granularity via sensors or digital metering units provides critical data for AI-driven decision-making.
(3)
Partial Preservation of Analog Devices: When a full replacement is not immediately feasible, integrating digital solutions alongside retained analog components ensures a gradual transition.
5. Formulate Phased Deployment
(1)
Prioritize Critical Nodes: Target the most failure-prone or operationally significant components for early-stage retrofitting.
(2)
Assess Feasible Investment and Downtime: Balance the need for system reliability with available funding and permissible service interruptions.
(3)
Define a Technological Evolution Path: Implement an overarching plan that anticipates future standards and protects long-term compatibility.
6. Implementation and Testing
(1)
Incremental Equipment Replacement/Installation: Execute hardware upgrades and software integration in a series of controlled deployments to minimize risk.
(2)
Protocol and Data Interface Validation: Conduct rigorous testing of gateways, interfaces, and data conversion processes to ensure coherence and reliability.
(3)
MAS/AI-Integrated Testing: Validate the interaction between upgraded devices and AI-driven control systems, confirming that self-healing mechanisms function effectively under real-world conditions.
7. Unified Self-Healing Architecture Online
Upon successful validation, the modernized system with integrated MAS/AI solutions is commissioned, enabling comprehensive automated fault detection, isolation, and service restoration.
8. Ongoing Optimization and Maintenance
(1)
Technological Upgrades: Continuously refine the integrated system in response to emerging digital standards and novel AI algorithms.
(2)
Scheduled Device Renewal: Replace aging assets as part of routine maintenance, gradually increasing the proportion of modern, digitally enabled equipment.
(3)
Adoption of Emerging Standards: Align future developments with evolving industry protocols to maintain long-term interoperability and performance excellence.
As demonstrated in Figure 3, this phased methodology ensures the progressive transformation of legacy subway power systems into fully integrated, self-healing networks. By systematically identifying key technological constraints, implementing both hardware and software retrofits, and employing protocol adaptation strategies, stakeholders can maintain robust operational continuity while incrementally modernizing their infrastructures. Through the careful prioritization of critical nodes and the adoption of specialized tools for data conversion, legacy devices can be seamlessly incorporated into cutting-edge MAS or AI frameworks. The outcome is a resilient power network characterized by intelligent fault management, real-time monitoring, and long-term adaptability to new technological standards.
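To make the data and protocol adaptation step more concrete, the sketch below shows what a very small piece of middleware might do: take a raw reading from a legacy register, convert it to engineering units, and attach a standardized name and a quality flag. Everything here is hypothetical, including the register address, the scaling constants, and the record layout; the reference string merely imitates IEC 61850 naming for a phase current measurement and is not a complete or validated data model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LegacyPoint:
    register: int     # address in the legacy RTU (hypothetical)
    scale: float      # engineering units per raw count
    offset: float     # engineering-unit offset
    raw_min: int      # lowest valid raw count
    raw_max: int      # highest valid raw count
    reference: str    # IEC 61850-style object reference (illustrative naming only)

def to_standard_record(point, raw_value, timestamp):
    """Convert one raw legacy reading into a standardized record (simplified)."""
    in_range = point.raw_min <= raw_value <= point.raw_max
    return {
        "ref": point.reference,
        "value": raw_value * point.scale + point.offset if in_range else None,
        "quality": "good" if in_range else "invalid",
        "t": timestamp,
    }

# Hypothetical mapping table maintained by the integrator.
POINT_MAP = {
    40001: LegacyPoint(40001, 0.5, -24.0, 0, 4095, "TPS1/MMXU1.A.phsA.cVal.mag.f"),
}

good_record = to_standard_record(POINT_MAP[40001], 2048, "2025-01-01T00:00:00Z")
bad_record = to_standard_record(POINT_MAP[40001], 5000, "2025-01-01T00:00:01Z")
```

Keeping the mapping in a table rather than in code is one way to support the phased deployment described above, since points can be added as individual devices are retrofitted.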

3.3.3. Balancing Innovation, Cost, and Public Acceptability

Public transportation authorities must balance the costs of upgrading infrastructure with the need to maintain service affordability. The cost of implementing self-healing systems, along with the specialized training and software licenses required, can be a barrier to adoption. Demonstrating a clear return on investment (ROI), usually via reductions in service interruptions and associated penalties, is often critical to securing funding. Notably, public acceptance serves as a critical prerequisite for system innovation, given that substantial operational modifications may elicit concerns over service reliability and cost escalations.
From a strategic perspective, adopting emerging technologies in smaller pilot projects can help gather performance data and build confidence among stakeholders before full-scale implementation. However, scaling from pilot to system-wide deployment introduces additional complexities, underscoring the need for stable, well-documented, and standardized solutions.
Table 6 summarizes major regulatory, safety, and integration issues, contrasting their treatment in general power systems versus specialized subway networks. It highlights the interplay between stringent regulatory environments, public safety imperatives, and the integration of legacy systems that typify subway power networks. Unlike typical power utilities, where incremental modernization is possible with relatively less public scrutiny, subway systems face direct accountability to commuters and municipal governments. Consequently, adopting a self-healing paradigm requires a thorough demonstration of reliability and compliance. Our perspective is that the regulatory dimension should not be viewed merely as a constraint but as an essential guideline to ensure passenger well-being and system robustness. Collaborative efforts between standardization bodies and railway authorities, coupled with well-defined pilot projects, can help accelerate the adoption of advanced technologies.
Table 7 focuses on specific measures, strategies, and research directions that can alleviate regulatory, safety, and integration bottlenecks. In this table, we map out a variety of strategic pathways by which regulatory, safety, and technological barriers can be mitigated. From unified standardization to pilot sandbox environments and modular retrofitting approaches, there are numerous tactics available to ease the transition toward self-healing architectures in subway systems. Our viewpoint underscores the importance of synergy between technology developers, railway authorities, and public stakeholders. The potential payoffs (enhanced passenger safety, improved operational efficiency, and a more resilient transit system) amply justify the investment and effort required to navigate these constraints.
Overall, this chapter has surveyed the multifaceted challenges unique to subway power supply systems. In Section 3.1, we covered the intricate topological constraints and the difficulty of operating within confined underground environments. Section 3.2 then examined the technical hurdles in implementing real-time fault diagnosis, isolation, and self-healing restoration, emphasizing the importance of speed, coordination, and robust communication. Finally, Section 3.3 analyzed the broader regulatory, safety, and integration issues that govern how new technologies can be adopted and scaled within subway networks. These three sections collectively highlight that any self-healing strategy for subway power supply systems must be holistically designed—encompassing engineering solutions, operational practices, and regulatory frameworks—to meet the stringent reliability and safety demands of modern urban rail transit.

3.4. Advancing Fault Management and Self-Healing Capabilities in Subway Power Supply Systems

In this section, we delve into the key technological innovations that are reshaping fault management and self-healing capabilities within subway power supply systems. As subway systems face increasing operational demands and stringent safety requirements, the need for more intelligent, adaptive, and efficient fault management systems has never been more critical. By leveraging emerging technologies such as MASs, AI-based algorithms, and real-time data analytics, subway power networks can achieve faster fault detection, isolation, and recovery, ultimately enhancing both operational reliability and passenger safety.
Table 8 compares traditional self-healing techniques with the proposed MAS-based strategy, highlighting the distinct advantages of adopting AI-driven solutions in managing faults within the complex and confined environments of subway systems. The comparison illustrates how the MAS-based approach addresses critical challenges, such as fault detection speed, real-time adaptability, and recovery efficiency, that conventional methods struggle to overcome.
1. Key Advantages
The comparison in Table 8 highlights the significant advancements that MAS-based self-healing systems offer over traditional techniques. By leveraging AI-based fault detection and distributed decision-making, MASs offer more precise, rapid, and adaptive fault management compared to conventional methods that rely on static algorithms and manual intervention [83,84,85]. Key advantages include the following:
(1)
Improved Detection Speed: MAS-based systems are capable of detecting faults in near real time, a critical feature for high-speed subway systems where rapid fault detection can minimize service disruptions and enhance passenger safety.
(2)
Increased Flexibility: Unlike traditional systems that follow fixed algorithms, MASs adapt to real-time network conditions, allowing for dynamic fault isolation and recovery strategies. This is especially important in complex subway topologies, where conventional systems struggle to handle intricate network configurations.
(3)
Enhanced Recovery Efficiency: MASs can reconfigure power distribution networks on the fly, ensuring that critical loads like lighting and ventilation are restored first, which is crucial for subway systems where passenger safety is paramount.
(4)
Space and Maintenance Benefits: Traditional systems require significant hardware installations, which can be difficult in the confined underground spaces of subway systems. MASs, by contrast, use distributed sensors and intelligent agents, reducing the need for additional hardware and simplifying system maintenance.
2. Future Outlook and Research Directions
The adoption of MAS-based self-healing systems in subway power supply networks is a promising avenue for overcoming the unique challenges faced by these systems. However, several hurdles remain:
(1)
Integration with Legacy Systems: Integrating MASs with older subway power infrastructures presents significant challenges due to outdated communication protocols and hardware limitations. Future research should focus on developing seamless integration frameworks that allow for gradual modernization of legacy systems without major disruptions to existing operations.
(2)
Regulatory Challenges: The deployment of MAS-based systems in subway networks requires updates to regulatory frameworks, especially in terms of safety certifications and standards compliance. Ongoing collaboration between AI experts, regulatory bodies, and transit authorities will be essential to harmonize new technologies with existing safety protocols.
(3)
Cost and Implementation Feasibility: While MASs offer significant benefits, the initial cost of implementation may be prohibitive for many subway systems, especially those in less economically developed regions. Future research should focus on developing cost-effective solutions that make MAS adoption more accessible, including low-cost sensors and cloud-based processing frameworks.
(4)
Real-Time Data Processing: Edge computing and machine learning algorithms will play a critical role in processing the massive amounts of data generated by subway power supply systems [86,87,88,89,90]. Future advancements in these technologies will be crucial for achieving the real-time decision-making required for effective self-healing.
Overall, this section has examined the technological innovations that underpin self-healing subway power supply systems, with a focus on the MAS-based strategy. A comparison with traditional systems underscores the substantial benefits of MASs in terms of fault detection, isolation, recovery efficiency, and system scalability. However, challenges related to legacy system integration, regulatory compliance, and cost remain, and further research is necessary to address these barriers. As technology evolves, MASs will become an increasingly integral component of resilient and adaptive power networks, offering enhanced reliability and safety for subway operations.

4. The Integration of MASs and the IEC 61850 Standard into Subway Power Systems

Before delving into the detailed discussions in this chapter, it is crucial to establish a coherent overview of how each subsection will contribute to our central theme: integrating MASs with the IEC 61850 standard to achieve enhanced self-healing capabilities in subway power networks. Section 4.1 lays the theoretical and conceptual groundwork by examining the principles, operational frameworks, and core algorithms underlying MAS-based self-healing approaches, thereby clarifying the motivations for distributing intelligence and control across multiple agents in complex subway power infrastructures. Section 4.2 shifts the focus to the IEC 61850 standard itself, detailing its data modeling techniques, communication protocols, and engineering methodologies. This portion provides clarity on why IEC 61850 is pivotal for creating standardized data structures and high-speed communication channels in modern traction power systems. Finally, Section 4.3 synthesizes the findings of the previous two sections by proposing a convergent MAS–IEC 61850 architecture, highlighting how the synergy between a distributed agent framework and a standardized communication protocol can revolutionize fault detection, isolation, and restoration (FDIR) processes. These three subsections collectively illustrate how MASs and IEC 61850 can be cohesively integrated to bolster both the intelligence and interoperability of next-generation subway power systems.

4.1. MAS-Based Approaches to Self-Healing in Subway Power Systems

MASs have gained prominence as a robust paradigm to address the growing complexities within electric power networks, including distribution grids, transmission infrastructures, and railway/metro traction power systems. In subway power networks, the non-trivial combination of AC (e.g., 25 kV or 35 kV) and DC (often 750 V or 1500 V) segments, along with extensive feeder lines, makes centralized architectures prone to single-point failures, communication bottlenecks, and slow response times. By contrast, MAS-based frameworks distribute intelligence among localized agents that monitor and control subsets of the network. These agents can collaborate with one another to execute real-time fault diagnosis, isolation, and restoration decisions, thereby enabling faster self-healing actions and minimizing disruptions to subway operations.

4.1.1. Conceptual Foundations and Control Philosophies of MASs

From a theoretical standpoint, MASs can be viewed as an ensemble of autonomous entities—termed agents—each responsible for a specific functional or geographical segment of the power system. Agents are designed to perceive local states (such as voltage, current, power flows, or device status) and communicate with peer agents (or higher-level controllers) to reach optimized global or semi-global control objectives [91,92,93]. These studies [91,92,93] collectively explore the control and management methodologies of MASs in power systems, emphasizing the role of autonomous agents in local perception, global optimization, and coordinated control. Logenthiran (2012) [91] investigates the application of MASs in distributed power systems, proposing a real-time management and optimization strategy based on agents to enhance system flexibility and autonomy. Dou et al. (2014) [92] further developed a decentralized coordinated control method based on MASs, improving the transient stability of large-scale power systems through information exchange and collaboration among agents. Farid (2015) [93] focuses on the design principles of MASs for resilient coordination and control in future power systems, introducing a framework that enables more efficient responses to sudden failures and dynamic load changes. These studies demonstrate that MAS technology, through the collaborative work of distributed intelligent agents, can achieve optimized control and enhance the reliability and self-healing capabilities of power systems.
A key advantage lies in the ability of agents to respond locally to disturbances while still coordinating across the network to avoid suboptimal or contradictory actions. Formally, let us denote the network by a set of buses $B$ and lines $L$. Each agent $A_i$ (where $i \in \{1, 2, \ldots, N\}$) monitors a subset of buses $B_i \subseteq B$ and lines $L_i \subseteq L$. In a distributed control algorithm, an agent’s decision $u_i$ may be formulated as follows:
$$u_i = \arg\min_{u_i \in U_i} J_i(x_i, x_{-i}),$$
where
  • $x_i$ is the local state vector observed by agent $i$;
  • $x_{-i}$ represents the states observed by neighboring agents or those shared through communication;
  • $J_i$ is the cost function capturing local objectives (e.g., minimize power loss, ensure safe operation under fault conditions, or maintain voltage within permissible limits);
  • $U_i$ is the feasible action space for agent $i$.
Through an iterative or event-triggered communication protocol, each agent refines its control decision ui, coordinating with adjacent agents until a convergent solution is reached or a deadline for fast self-healing control expires. This iterative process, facilitated by MASs, ensures real-time adaptation to evolving operating conditions in the subway power network.
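The iterative refinement described above can be illustrated with a minimal sketch. It assumes a toy setting that is not taken from the reviewed literature: each agent holds a per-unit voltage as its local state, chooses a correction from a small discrete action set, and uses a hypothetical quadratic cost that penalizes deviation from a target and disagreement with neighbors. The names `local_cost` and `coordinate`, the weights, and the round limit standing in for the self-healing deadline are all illustrative assumptions.

```python
from typing import Dict, List, Sequence


def local_cost(u: float, x_i: float, neighbor_values: Sequence[float],
               target: float = 1.0, weight: float = 0.5) -> float:
    """Hypothetical J_i: track a target voltage and stay close to neighbors."""
    v = x_i + u  # local state after applying the candidate action
    tracking = (v - target) ** 2
    agreement = sum((v - vn) ** 2 for vn in neighbor_values)
    return tracking + weight * agreement


def coordinate(x: Dict[int, float], actions: Sequence[float],
               neighbors: Dict[int, List[int]],
               max_rounds: int = 20) -> Dict[int, float]:
    """Each round, every agent picks argmin over U_i given shared neighbor states.

    Iteration stops when no decision changes (convergence) or when
    max_rounds is reached (a stand-in for the control deadline).
    """
    u = {i: 0.0 for i in x}
    for _ in range(max_rounds):
        new_u = {}
        for i in x:
            # States communicated by neighbors, based on their last decisions.
            shared = [x[j] + u[j] for j in neighbors[i]]
            new_u[i] = min(actions, key=lambda a: local_cost(a, x[i], shared))
        if new_u == u:
            break
        u = new_u
    return u


if __name__ == "__main__":
    states = {1: 0.95, 2: 1.00, 3: 1.04}
    links = {1: [2], 2: [1, 3], 3: [2]}
    print(coordinate(states, [-0.05, 0.0, 0.05], links))
```

In this sketch all agents update simultaneously from the previous round's decisions; an event-triggered variant would only recompute when a neighbor's shared state changes.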

4.1.2. MAS-Based Fault Detection and Diagnosis

In the context of subway power systems, MAS-based fault detection methodologies frequently combine local sensor data with higher-level decision-making processes. Each agent employs local measurements—such as overcurrent readings, voltage dips, traveling-wave signals, or negative sequence components—to detect anomalies. When a local agent suspects a fault, it initiates a distributed consensus mechanism to confirm that the disturbance is genuine and not merely a sensor malfunction or transient event. A simple but illustrative approach is shown below:
$$\alpha_i(t) = \begin{cases} 1, & \text{if local measurement indicates fault at time } t \\ 0, & \text{otherwise} \end{cases}$$
Agents exchange their $\alpha_i(t)$ values with their neighbors; if a sufficient fraction of neighboring agents also detect anomalies ($\alpha_j(t) = 1$ for neighbors $j$), the network of agents collectively raises a system-wide fault alarm. This distributed confirmation reduces false positives and removes reliance on a single central controller. Once a fault is confirmed, specialized diagnostic agents implement advanced signal processing or pattern recognition algorithms to classify the fault type (e.g., single-phase-to-ground, short-circuit between phases, or DC traction line grounding fault) and localize the faulted segment within the network’s geographic or topological mapping.
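As a minimal sketch of this confirmation step, the following assumes that each agent derives its binary flag from a simple overcurrent pickup and that a fault is confirmed when at least half of an agent's neighbors also raise their flag. The function names, the pickup rule, and the 0.5 fraction are illustrative assumptions rather than values from the source.

```python
from typing import Dict, List


def local_flag(current_a: float, pickup_a: float) -> int:
    """Binary indicator: 1 if the local measurement suggests a fault, else 0."""
    return 1 if current_a > pickup_a else 0


def confirm_fault(flags: Dict[str, int], neighbors: Dict[str, List[str]],
                  agent: str, min_fraction: float = 0.5) -> bool:
    """Confirm the agent's suspicion only if enough neighbors agree."""
    if flags[agent] == 0:
        return False  # nothing to confirm
    peers = neighbors[agent]
    if not peers:
        return False  # assumption: an isolated agent cannot self-confirm
    agreeing = sum(flags[j] for j in peers)
    return agreeing / len(peers) >= min_fraction


if __name__ == "__main__":
    flags = {"A": local_flag(3200.0, 2000.0), "B": 1, "C": 0}
    print(confirm_fault(flags, {"A": ["B", "C"]}, "A"))
```

A lone flag with no agreeing neighbor is treated as a possible sensor malfunction or transient and does not raise the alarm.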

4.1.3. MAS-Based Fault Isolation and Restoration

Fault isolation and restoration processes in MASs rely on cooperative control actions among protective devices such as breakers, switches, and reclosers. In a typical subway distribution scenario, the primary objective is to de-energize only the faulted section while maintaining power supply to all unaffected sections. The MAS approach can be summarized by the following pseudo-equations. Suppose that an agent controlling a switch SWj needs to decide whether to open or close the switch after a fault is detected and located:
$$\text{Open}(SW_j) = \begin{cases} 1, & \text{if } \dfrac{\Delta P_{\text{loss}}}{\Delta P_{\text{risk}}} < \lambda \\ 0, & \text{otherwise} \end{cases}$$
Here, ΔPloss represents the power that would be disconnected if the switch is opened, ΔPrisk quantifies the operational or safety risk of leaving the switch closed, and λ is a threshold that the agent dynamically adjusts based on the system’s real-time operational context (e.g., passenger load demands or train frequency). Once the faulted section is isolated, restoration agents reconfigure network topology to reroute power through alternative paths or feeder lines. Each agent reevaluates the line capacities, voltage levels, and breaker statuses, ensuring that the newly configured network operates within acceptable thermal limits.
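The switching rule can be sketched as follows, assuming that the decision quantity is the ratio of ΔPloss to ΔPrisk compared against the threshold λ. The function name and the handling of a non-positive risk term are illustrative assumptions.

```python
def should_open(delta_p_loss: float, delta_p_risk: float, threshold: float) -> bool:
    """Return True when the switch should open.

    delta_p_loss: power that would be disconnected by opening the switch.
    delta_p_risk: risk measure of leaving the switch closed.
    threshold:    lambda, tuned by the agent to the operating context.
    """
    if delta_p_risk <= 0:
        # Assumption: with no quantified risk there is no reason to open.
        return False
    return (delta_p_loss / delta_p_risk) < threshold


if __name__ == "__main__":
    # Small load lost relative to a high risk: open the switch.
    print(should_open(2.0, 10.0, 0.5))
    # Large load lost relative to the same risk: keep it closed.
    print(should_open(8.0, 10.0, 0.5))
```

Raising λ makes the agent more willing to open the switch for the same loss and risk, while lowering λ makes it more conservative about disconnecting load.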

4.1.4. Evaluation of MASs in Subway Environments

Empirical studies and field trials suggest that MAS-based approaches reduce fault-clearing times and curtail the scope of outages in urban rail power networks. Moreover, their distributed nature inherently accommodates incremental system expansions. However, challenges remain, notably the standardization of agent communication protocols and the design of robust agent negotiation schemes that handle complex couplings between AC traction, DC traction, and station loads. These gaps emphasize the importance of coupling MASs with standardized frameworks such as IEC 61850, which we examine more thoroughly in subsequent sections.
Based on the above, Table 9 provides a comparative overview of key application areas where MAS approaches can significantly enhance the self-healing capabilities of subway power systems [94,95,96]. The reviewed literature highlights the integration of MASs in the optimization and control of power systems, particularly in enhancing self-healing capabilities and fault management. Herrera et al. (2020) [94] provide a comprehensive review of MAS applications in complex networks, emphasizing their potential in resilient design and self-healing, which can significantly improve fault isolation and service restoration. Sharifi (2015) [95] focuses on energy-aware service provisioning in peer-to-peer cloud ecosystems, discussing how MASs can optimize energy flow and enhance system reconfiguration. Meanwhile, Irfan et al. (2017) [96] examine the role of MASs in the control of smart grids, stressing their contribution to predictive maintenance, fault detection, and self-healing, thus strengthening the overall reliability of the system. Collectively, these works underscore the strategic importance of MASs in advancing the fault management and resilience of modern power networks, particularly in urban infrastructure like subway power systems. Table 9 also summarizes potential growth trends, technological barriers, and strategic importance across ten dimensions. Overall, it indicates that fault isolation, restoration, and system reconfiguration hold the most immediate promise for broad adoption, while areas such as predictive maintenance, microgrid integration, and power quality monitoring remain underexplored but present high potential for future research and implementation. Our viewpoint is that a collaborative, standardized approach, supported by robust communication protocols, will be crucial in advancing MAS-based solutions from pilot demonstrations to large-scale deployments.
Table 9 provides a comprehensive comparison of the applications of MASs in subway power systems across various scenarios, including fault diagnostics, system reconfiguration, and energy management, highlighting adoption levels, challenges, and future directions. Building on this, Table 10 delves deeper into emergent research directions for MASs in subway power systems, covering topics such as scalable architectures, agent-based security, big data integrations, and varying hierarchical designs. A salient point from this table is the growing interest in hybrid hierarchical–distributed frameworks and integrated intrusion detection solutions that enhance both system reliability and cybersecurity. In our assessment, real-time simulation and hardware-in-the-loop testing remain critical to validate new MAS concepts in an environment that faithfully reflects the complexities of day-to-day subway operations.
Overall, this section underscores that MASs offer a potent framework for distributed self-healing control in subway power systems. Yet, to unlock their full potential, they must be reinforced by standardized communication infrastructures, particularly those specified in IEC 61850. The next section provides an in-depth exposition of how IEC 61850 can facilitate high-speed, interoperable, and secure data exchange across a broad spectrum of power system devices, thereby complementing and enhancing MAS-based strategies.

4.2. Implementation of the IEC 61850 Standard in Subway Power Systems

The IEC 61850 standard was originally devised for substation automation, enabling standardized object models, data structures, and communication protocols that support interoperability and vendor-neutral engineering [97,98,99,100,101]. Over the past decade, it has been extended to encompass distribution automation, renewable energy integration, and even railway electrification contexts. In subway power systems, the standard addresses complex challenges such as seamless multi-vendor integration, real-time protective relaying, and advanced automation functionalities essential for self-healing. This section thoroughly explores the protocols, models, and engineering techniques that make IEC 61850 a cornerstone for modernizing subway power networks.

4.2.1. Foundations of IEC 61850 and Its Relevance to Subway Networks

IEC 61850 provides a comprehensive approach encompassing data modeling, communication stacks, and configuration languages [102,103]. Its fundamental building blocks include the following:
(1) Logical Nodes (LNs): Abstract representations of power system functions (e.g., measurement, protection, and control).
(2) Data Objects and Data Attributes: Structured to capture various aspects of a function’s state, measurement readings, and control parameters.
(3) Communication Services: Such as GOOSE for high-speed event transfer, and MMS (Manufacturing Message Specification) for client–server communications.
For subway systems, the dual AC/DC supply lines, the presence of intricate protective schemes (like distance protection for AC lines, and overcurrent or undervoltage protection for DC traction feeders), and the need for reliable real-time data exchange across stations render IEC 61850 uniquely valuable. In particular, the GOOSE mechanism allows for “peer-to-peer” communication [104,105]. This ensures that protective relays and control IEDs (Intelligent Electronic Devices) can exchange trip signals, blocking commands, or reclose instructions with minimal latency—a critical requirement when trains must be continuously powered and passenger safety is paramount.
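To make the logical node, data object, and data attribute hierarchy concrete, the sketch below splits an IEC 61850-style object reference of the form `LDName/LNName.DataObject.DataAttribute...` into its parts. The example reference (a phase-A voltage magnitude under an MMXU measurement node) and the logical device name `SS1LD0` are illustrative assumptions, and the helper `parse_object_reference` is hypothetical rather than part of any IEC 61850 library.

```python
from typing import List, Tuple


def parse_object_reference(ref: str) -> Tuple[str, str, List[str]]:
    """Split 'LD/LN.DO.DA...' into (logical device, logical node, path).

    The path holds the data object followed by nested data attributes.
    """
    ld_name, sep, rest = ref.partition("/")
    if not sep or not ld_name or not rest:
        raise ValueError(f"not an LD/LN.path reference: {ref!r}")
    ln_name, *path = rest.split(".")
    return ld_name, ln_name, path


if __name__ == "__main__":
    # Hypothetical reference: phase-A voltage magnitude on a measurement LN.
    print(parse_object_reference("SS1LD0/MMXU1.PhV.phsA.cVal.mag.f"))
```

An agent that reads such references can map each measurement or status value back to the function (logical node) that produced it without vendor-specific naming rules.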

4.2.2. IEC 61850 Network Redundancy and Communication Protocols

Reliability is a non-negotiable criterion in traction power systems. IEC 61850 accommodates various redundancy protocols to ensure minimal downtime in the event of network faults:
(1) Parallel Redundancy Protocol (PRP) sends duplicate packets over independent LANs, eliminating single points of failure.
(2) High-availability Seamless Redundancy (HSR) adopts a ring topology, where each node forwards frames in both directions around the ring, ensuring zero recovery time in the case of link interruption.
(3) Rapid Spanning Tree Protocol (RSTP) [106,107,108] provides a loop-free topology but may involve small reconvergence delays.
In large-scale subway systems, a combination of PRP and HSR is often deemed optimal for process-level communications to guarantee near-instant failover. Stations, wayside cabinets, and centralized operation control centers thus rely on robust ring or mesh topologies that incorporate specialized switches supporting IEC 61850 traffic priorities. Latency, jitter, and packet-loss thresholds must be carefully specified to accommodate the stringency of traction power automation.
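The duplicate-packet idea behind PRP can be illustrated with a toy receiver that accepts the first copy of each frame and discards the second. This is a simplified sketch under stated assumptions: frames are modeled as (source, sequence number, payload) tuples, the class and function names are hypothetical, and the unbounded set of seen frames stands in for the bounded duplicate-detection window a real device would use.

```python
from typing import List, Optional, Set, Tuple

Frame = Tuple[str, int, str]  # (source, sequence number, payload)


class PrpReceiver:
    """Toy duplicate-discard stage for frames arriving over two LANs."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, int]] = set()

    def receive(self, source: str, seq: int, payload: str) -> Optional[str]:
        """Return the payload for the first copy, None for a duplicate."""
        key = (source, seq)
        if key in self._seen:
            return None
        self._seen.add(key)
        return payload


def deliver(receiver: PrpReceiver, lan_a: List[Frame], lan_b: List[Frame]) -> List[str]:
    """Feed frames from both LANs through the receiver, keeping unique payloads."""
    delivered = []
    for frame in lan_a + lan_b:
        out = receiver.receive(*frame)
        if out is not None:
            delivered.append(out)
    return delivered


if __name__ == "__main__":
    frames = [("IED1", 1, "trip"), ("IED1", 2, "block")]
    print(deliver(PrpReceiver(), frames, frames))  # both LANs healthy
    print(deliver(PrpReceiver(), [], frames))      # LAN A failed
```

Because every frame travels on both networks, losing one LAN changes nothing for the application, which is why no recovery time is needed.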

4.2.3. System Configuration Language (SCL) and Engineering

One of the distinguishing features of IEC 61850 is its system configuration language (SCL), defined in Part 6 of the standard. SCL allows power engineers to describe a substation’s single-line diagram, the communication architecture, and the functions hosted on each IED in a vendor-neutral XML (eXtensible Markup Language)-based format. For a typical subway traction substation, the SCL file might include the hierarchy Substation → VoltageLevel → Bay → LN → DataObjects.
By unifying the engineering process, SCL lowers the risk of misconfigurations and ensures that future expansions or modifications can be accommodated with minimal re-engineering efforts [109]. In self-healing contexts, the tight correlation of logical nodes (e.g., PDIS for distance protection and PTOC for time overcurrent protection) with the real physical layout helps in automatically updating agent-based restoration strategies whenever the station’s layout changes.
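As an illustration of how an agent might read topology from SCL, the sketch below walks a heavily simplified SCL fragment with Python's standard `xml.etree.ElementTree` module and lists the logical node classes attached to each bay. The fragment is an assumption made for illustration: the substation, voltage level, bay, and IED names are invented, and a real SCL file contains further mandatory sections (such as a header and IED descriptions) that are omitted here.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Simplified, hypothetical fragment; not a schema-valid SCL file.
SCL_FRAGMENT = """
<SCL xmlns="http://www.iec.ch/61850/2003/SCL">
  <Substation name="TractionSS1">
    <VoltageLevel name="DC1500">
      <Bay name="Feeder1">
        <LNode iedName="IED_F1" lnClass="PTOC" lnInst="1"/>
        <LNode iedName="IED_F1" lnClass="XCBR" lnInst="1"/>
      </Bay>
    </VoltageLevel>
  </Substation>
</SCL>
"""

NS = {"scl": "http://www.iec.ch/61850/2003/SCL"}


def list_logical_nodes(scl_text: str) -> List[Tuple[str, str, str, str]]:
    """Return (substation, voltage level, bay, LN class) for every LNode."""
    root = ET.fromstring(scl_text.strip())
    rows = []
    for ss in root.findall("scl:Substation", NS):
        for vl in ss.findall("scl:VoltageLevel", NS):
            for bay in vl.findall("scl:Bay", NS):
                for ln in bay.findall("scl:LNode", NS):
                    rows.append((ss.get("name"), vl.get("name"),
                                 bay.get("name"), ln.get("lnClass")))
    return rows


if __name__ == "__main__":
    for row in list_logical_nodes(SCL_FRAGMENT):
        print(row)
```

Re-running such a parser after an engineering change is one way an agent could refresh its view of which protection functions sit on which bay.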

4.2.4. IEC 61850 Services for Self-Healing

IEC 61850 offers several key services that directly underpin self-healing operations:
(1) GOOSE Messaging for Fast Trip Signals: Agents or protective relays can disseminate critical messages throughout the local substation or extended feeder within milliseconds, facilitating prompt fault isolation.
(2) Reporting and Logging Services: These allow an MAS or central authority to monitor system states in near real-time, capturing event sequences needed for diagnosing deeper network issues.
(3) MMS for Agent–IED Interactions: MAS architecture can rely on MMS-based client–server communications to read or write device parameters, retrieve trending data, or orchestrate switchgear commands at a slower but more comprehensive timescale.

4.2.5. Challenges and Limitations in Subway Contexts

Despite its merits, deploying IEC 61850 in subway power contexts is not without hurdles. Ensuring electromagnetic compatibility in high-voltage/low-voltage mixed environments, training a workforce specialized in substation automation, and retrofitting older devices that do not inherently support standard object models are formidable tasks. Moreover, bridging DC traction equipment with AC-based LN definitions often requires custom or extended logical nodes. Nonetheless, the industry is gradually developing specialized profiles for railway electrification that align with standard IEC 61850 principles. Based on this, Table 11 enumerates the current landscape of IEC 61850 applications within subway power systems along multiple dimensions. Noteworthy takeaways include the relatively high adoption rates for protection and SCADA integration in newer installations, as well as the emerging interest in condition-based monitoring and MAS–IEC 61850 hybrid solutions. While the standard provides robust functionalities for AC traction, bridging the gap with DC traction elements remains a work in progress. Nonetheless, ongoing refinements and extension efforts position IEC 61850 as an increasingly indispensable backbone for any advanced self-healing architecture in subway settings.
Based on Table 11, we further summarize the challenges and future potential of IEC 61850 in subway power systems, as presented in Table 12. This table emphasizes ongoing issues in retrofitting legacy DC equipment, securing GOOSE/MMS (manufacturing message specification) communications, and grappling with the complexity of LN/DO/DA (logical node/data object/data attribute) mapping. Despite these obstacles, active research and standardization efforts continue to refine IEC 61850 for railway electrification contexts. In the long term, better synergy with MAS frameworks, improved security protocols, and a cohesive approach to LN expansions will yield a robust environment where advanced self-healing functions can be reliably deployed.
Overall, this section demonstrates that IEC 61850 serves as an enabling framework for achieving real-time, interoperable communications in subway power systems. Its emphasis on standardized data models, high-speed messaging, and robust engineering languages paves the way for advanced control schemes—particularly those based on MASs. In the subsequent section, we will elaborate on how MASs and IEC 61850 can be harmonized into a convergent architecture that maximizes the advantages of both approaches.

4.3. Convergent MAS–IEC 61850 Architecture for Fault Diagnosis, Isolation, and Restoration

While Section 4.1 and Section 4.2 have, respectively, presented the strengths of MASs and IEC 61850, the fusion of these two technologies represents a paradigm shift for urban rail power systems. By harnessing agent intelligence in tandem with standardized communication, subway operators can establish an ecosystem where self-healing actions are triggered, coordinated, and verified in a manner that is both robust and scalable. This final section outlines how a convergent MAS–IEC 61850 architecture can be implemented, highlighting design considerations, operational workflows, and potential obstacles along the integration pathway.

4.3.1. Architectural Overview and Design Considerations

A convergent MAS–IEC 61850 architecture introduces distributed “agent brains” into the well-defined communication and modeling backbone offered by IEC 61850. Each protective device or Intelligent Electronic Device (IED) can be associated with an “agent” that interprets local measurements (modeled under IEC 61850 logical nodes), exchanges GOOSE messages with neighboring agents, and cooperates with a supervisory agent at the substation or control center level via MMS. The system-level design can be broken down as follows:
(1) Agent Layer: Comprising local device agents (e.g., relay agents and switchgear agents) and a station-level agent (or aggregator) that coordinates sectional restoration.
(2) IEC 61850 Communication Layer: Facilitating high-speed GOOSE transmissions among local agents for real-time protective functions, and employing MMS for configuration, monitoring, and slower control commands.
(3) Coordinated Control Layer: A higher-level mechanism, often at the control center, that merges data from multiple stations or lines. This layer might also integrate advanced AI algorithms for centralized oversight and strategic decisions.
From a modeling standpoint, each local agent references LN objects for measurement and protection (e.g., MMXU for power and voltage measurements and PTOC for overcurrent protection) and thus can directly read or manipulate the data attributes under these logical nodes. The agent’s internal logic can be abstractly formulated as follows:
$$f_{\text{agent}}(X, Y, t) \rightarrow \{\text{Control Actions}, \text{Setpoints}\}$$
where X is the set of local LN data attributes (such as measured currents, voltages, and breaker statuses), Y is the set of GOOSE signals received from neighbors (e.g., protective trip commands and alarm states), and t indicates time or event trigger references.
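A minimal sketch of such an agent function is given below. The rule set is an assumption chosen for illustration: the agent opens its breaker when it sees a local overcurrent that is not blocked, or when a neighbor sends a trip signal. The dictionary keys, the signal strings `"TRIP"` and `"BLOCK"`, and the returned structure are hypothetical names, not identifiers defined by IEC 61850.

```python
from typing import Any, Dict, List


def f_agent(x: Dict[str, Any], y: List[str], t: float) -> Dict[str, Any]:
    """Map local LN data (x), received GOOSE signals (y), and time (t) to outputs."""
    actions = []
    overcurrent = x["current_a"] > x["pickup_a"]
    blocked = "BLOCK" in y          # a neighbor asks this agent not to trip
    remote_trip = "TRIP" in y       # a neighbor asks this agent to trip
    if x["breaker_closed"] and ((overcurrent and not blocked) or remote_trip):
        actions.append(("OPEN_BREAKER", t))
    # Setpoints are passed through unchanged in this sketch.
    return {"control_actions": actions, "setpoints": {"pickup_a": x["pickup_a"]}}


if __name__ == "__main__":
    local = {"current_a": 3200.0, "pickup_a": 2000.0, "breaker_closed": True}
    print(f_agent(local, [], 0.01))
    print(f_agent(local, ["BLOCK"], 0.01))
```

Keeping the function free of side effects makes it straightforward to replay recorded measurements and GOOSE events through it during offline testing.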

4.3.2. Fault Diagnosis and Localization Workflow

A typical fault diagnosis scenario under convergent MAS–IEC 61850 architecture proceeds as follows:
(1) Initial Detection (Local Agent): Upon sensing an abnormal current or voltage signature (modeled by LN PTOC or PDIS), the local agent increments an internal fault counter. If the magnitude exceeds a set threshold, the agent broadcasts a GOOSE-based “suspected fault” message to adjacent nodes.
(2) Peer Confirmation (Neighboring Agents): Neighboring agents also evaluate their local signals. If they detect correlated anomalies, they respond with a GOOSE “confirmation” message. Weighted voting can be employed to mitigate false positives.
(3) Station-Level Aggregation: The station-level or aggregator agent (connected via MMS and local GOOSE) collects these events. Using a pre-defined topology map (SCL-based), it identifies the line segment or bus location with the highest probability of fault development.
(4) Refined Diagnostics: Optionally, advanced AI modules or traveling-wave-based algorithms can run at the aggregator level to further pinpoint the fault location.
(5) Isolation Instruction: Once the aggregator agent validates the fault location, it issues GOOSE open commands to the relevant breakers or switches, ensuring minimal disruption to unaffected lines.
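The aggregation, voting, and isolation steps above can be sketched as a single function at the station-level agent. The sketch assumes a simple scoring rule that is not taken from the source: a fault is confirmed when the weighted share of reporting agents reaches a threshold, and the suspect segment is the one whose bounding agents reported most often. The agent names, weights, segment map, and the `OPEN:` command strings are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def diagnose(reports: Dict[str, bool], weights: Dict[str, float],
             segments: Dict[str, Tuple[str, ...]],
             threshold: float = 0.5) -> Tuple[Optional[str], List[str]]:
    """Return (suspect segment, open commands), or (None, []) if unconfirmed."""
    total = sum(weights[a] for a in reports)
    score = sum(weights[a] for a, flagged in reports.items() if flagged)
    # Weighted voting: ignore isolated reports that may be false positives.
    if total == 0 or score / total < threshold:
        return None, []

    def support(seg: str) -> int:
        # Number of agents bounding this segment that reported an anomaly.
        return sum(1 for a in segments[seg] if reports.get(a, False))

    suspect = max(segments, key=support)
    commands = [f"OPEN:{a}" for a in segments[suspect]]
    return suspect, commands


if __name__ == "__main__":
    reports = {"A1": True, "A2": True, "A3": False}
    weights = {"A1": 1.0, "A2": 1.0, "A3": 0.5}
    segments = {"S12": ("A1", "A2"), "S23": ("A2", "A3")}
    print(diagnose(reports, weights, segments))
```

In a deployment, the segment map would come from the SCL-based topology rather than a hand-written dictionary, and the refined diagnostics of step (4) could replace the simple support count.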

4.3.3. Restoration and Reconfiguration Strategies

After isolating the fault, agents collaborate to restore power to the maximum possible portion of the subway network. The station-level agent consults the SCL topology to identify alternate feeding paths. If the AC ring or DC feeder lines can accommodate the extra load without violating thermal or voltage constraints, a reconfiguration command is broadcast. Restoration might unfold in a multi-step process:
Step 1: $SW_{\text{healthy}} \leftarrow \text{Close}$; Step 2: Check $P_{\text{capacity}} \geq P_{\text{demand}}$; Step 3: Activate the line if all constraints are satisfied.
Each local agent (switchgear or feeder agent) acknowledges the command, rechecks local conditions, and then closes or opens respective switches. This sequence is governed by multi-agent consensus, ensuring that no single device performs an unsafe action. Detailed logging and reporting (via MMS) guarantee thorough post-event analysis.
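A minimal sketch of this restoration logic follows. It assumes that each alternate feeding path is described by its capacity, present load, a voltage check result, and the switch agents that must act, and that the path is only activated when every one of those agents has acknowledged the command. The dictionary keys, path names, and the choice of the path with the largest spare capacity are illustrative assumptions.

```python
from typing import Any, Dict, List, Optional


def plan_restoration(paths: List[Dict[str, Any]], demand_kw: float,
                     acks: Dict[str, bool]) -> Optional[str]:
    """Pick an alternate path that can carry the demand and is fully acknowledged."""
    feasible = [
        p for p in paths
        if p["voltage_ok"] and p["capacity_kw"] - p["load_kw"] >= demand_kw
    ]
    if not feasible:
        return None  # no path satisfies the capacity and voltage constraints
    # Assumption: prefer the path with the largest spare capacity.
    best = max(feasible, key=lambda p: p["capacity_kw"] - p["load_kw"])
    # Consensus: every involved switch agent must acknowledge before closing.
    if not all(acks.get(agent, False) for agent in best["agents"]):
        return None
    return best["name"]


if __name__ == "__main__":
    paths = [
        {"name": "ring_east", "capacity_kw": 5000, "load_kw": 4200,
         "voltage_ok": True, "agents": ["SW3"]},
        {"name": "ring_west", "capacity_kw": 6000, "load_kw": 3000,
         "voltage_ok": True, "agents": ["SW5", "SW6"]},
    ]
    print(plan_restoration(paths, 1500, {"SW5": True, "SW6": True}))
```

Checking constraints before any switch is closed keeps the plan conservative; a missing acknowledgment simply leaves the network in its isolated but safe state.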

4.3.4. Security and Redundancy Considerations

When MAS intelligence relies on an Ethernet-based IEC 61850 network (with MMS carried over TCP/IP and GOOSE exchanged directly at the Ethernet layer), cybersecurity and redundancy are paramount. Agents must handle the encryption or authentication of GOOSE messages. Meanwhile, ring or dual-network topologies ensure that a single link failure does not compromise the entire self-healing process. Key security approaches include the following:
(1) Role-Based Access Control (RBAC) [110]: Agents only process control instructions from authenticated roles recognized by the substation system.
(2) Encrypted Tunneling of GOOSE/MMS: Emerging solutions propose TLS-based encryption for MMS, though GOOSE typically remains unencrypted for performance reasons.
(3) Backup Communication Channels [111]: For critical commands, multiple GOOSE subscriptions may be created in parallel networks (e.g., PRP + HSR) to reduce the risk of packet loss or delay.

4.3.5. Practical Challenges and Future Outlook

In practice, the synergy of MASs and IEC 61850 faces numerous engineering, organizational, and financial challenges. Notably, older DC traction systems lack standardized LN definitions, requiring the use of proxy or extended LN models. Additionally, debugging multi-agent logic in a live subway environment with thousands of daily passengers demands rigorous offline testing (hardware-in-the-loop simulations) prior to rollout. Nonetheless, the promise of real-time, distributed intelligence—coordinating with a vendor-agnostic, standards-based communication framework—strongly indicates that convergent MAS–IEC 61850 solutions will shape the next generation of subway power systems.
Based on the above, Table 13 outlines various deployment modalities—ranging from full greenfield implementations to more conservative station-focused upgrades. Each scenario entails different levels of complexity, initial expenditure, and performance demands. The success of these deployments hinges on a confluence of factors, including the readiness of legacy systems for integration, the expertise of stakeholders, and the clarity of standard definitions for DC traction contexts. Nonetheless, incremental or phased approaches can systematically unlock the benefits of MAS–IEC 61850 synergy.
Based on Table 13, we further summarize the future R&D themes for MAS–IEC 61850 convergence in subway networks, as demonstrated in Table 14. In this table, several emergent R&D themes underscore the evolving nature of MAS–IEC 61850 convergence, including the integration of edge computing, AI-driven fault forecasting, and the development of new LN classes for DC traction applications. Each theme demands multidisciplinary collaborations that range from cryptography for secure GOOSE messaging to advanced hardware engineering for resilient edge-based agents. The ultimate payoff—a fully autonomous, self-healing subway power system that leverages standardized communication—justifies the complexity of these endeavors.
Overall, this section has presented a comprehensive view of how MASs and the IEC 61850 standard can be synthesized into a single, cohesive architecture that elevates fault management and self-healing to new levels of efficiency and reliability in subway power networks. While operationalizing this synergy demands substantial effort in areas such as engineering, cybersecurity, and standardization, the strategic advantages are evident: higher system resiliency, minimized downtime, and an adaptive framework capable of meeting future urban transportation demands. By situating MAS intelligence within the standardized data and communication protocols of IEC 61850, subway operators can accelerate fault response; reduce manual interventions; and pave the way for a truly smart, autonomous rapid transit infrastructure. This integrated approach stands at the forefront of the ongoing “intelligence revolution” in power systems, offering a compelling vision for the next generation of urban rail electrification.

5. Practical Applications of Self-Healing Techniques in Subway Systems

In this section, we examine the practical applications of self-healing techniques within subway power systems, specifically focusing on the integration of MASs, AI, and relevant standards such as IEC 61850. The application of these techniques is vital for improving the reliability and efficiency of subway networks, ensuring rapid fault detection, isolation, and recovery. This section is organized into four key sub-sections:
(1) Substation-level self-healing applications in subway power systems;
(2) Line-level self-healing mechanisms and network reconfiguration;
(3) Cross-layer fault recovery techniques and strategies;
(4) AI-driven fault diagnosis and recovery in complex scenarios.
Each sub-section will delve into the practical applications of self-healing mechanisms in subway systems, highlight the technological advancements, and provide an in-depth analysis of the benefits and challenges involved in implementing these strategies. The sub-sections are connected logically, starting from individual substations and their self-healing capabilities; progressing to line-level self-healing; and finally addressing complex, multi-layer fault recovery strategies with the help of AI algorithms.

5.1. Substation-Level Self-Healing Applications in Subway Power Systems

Substations play a crucial role in the overall operation of subway power systems. They are the primary points of interaction between the high-voltage grid and the subway’s internal power distribution network. Their responsibilities include converting and distributing electrical power to subway stations and trains, making them a critical point of failure in any power outage event [112,113,114]. Refs. [112,113,114] provide insights into the role of substations in fault detection, isolation, and reconfiguration in self-healing smart grid systems, and they highlight how these processes are integrated into critical infrastructure, like subway power systems, for reliability and operational efficiency. Because these functions are essential for the continuous operation of subway systems, integrating self-healing mechanisms at the substation level enhances system reliability by enabling rapid fault identification and isolation, thereby restoring power to unaffected areas more efficiently.
However, scalability concerns arise when considering the application of these technologies in larger subway systems. While substation-level self-healing systems work effectively in smaller networks, the implementation of these systems in expansive urban subway networks requires careful planning. Specifically, as the number of substations increases, communication overheads and real-time data processing requirements increase, which may necessitate substantial investment in communication infrastructure and AI-driven control systems.
Cost considerations are also significant in large-scale implementation. While automated fault detection, isolation, and reconfiguration systems can minimize downtime and reduce maintenance costs over time, the initial capital expenditure for deploying advanced communication systems like IEC 61850 and AI-driven algorithms can be high. Therefore, a phased deployment strategy is recommended, starting with pilot implementations in smaller, manageable sections of the subway network before scaling up.

5.1.1. Fault Detection and Isolation

In substation-level self-healing systems, fault detection is the first critical step, followed by rapid isolation to prevent further system disruption. The application of digital fault recorders, current transformers, and protection relays with IEC 61850 protocols enables real-time fault detection and automated isolation. These systems are essential for minimizing the mean time to repair (MTTR) and improving overall network resilience.
Scalability issues may emerge when these systems are applied to larger subway networks with more substations, as the complexity of managing real-time communication between all devices increases. Effective system integration across substations becomes critical, and ensuring data consistency across multiple network layers will require robust data management systems and high-capacity communication infrastructure.
Once a fault is detected, the system must isolate the affected area quickly to prevent it from spreading and impacting other parts of the network. IEC 61850 standards enable real-time communication between devices within the substation, allowing for automated decision-making in the isolation process. Remote control devices and automated switches help to disconnect faulty sections from the rest of the grid, ensuring that only the impacted area is affected, and power continues to flow to other critical sections of the subway system.

5.1.2. Automated Reconfiguration

After isolating the faulted section, automated reconfiguration becomes necessary to restore service to the remaining sections. The key challenge here lies in optimizing the network’s configuration dynamically, based on real-time load conditions and fault location. The integration of AI-based reconfiguration allows the system to predict network behavior based on historical data and current system conditions, ensuring that power is rerouted efficiently.
Cost issues arise in the deployment of AI-based reconfiguration systems. While AI models require substantial computational resources and high-quality data for training, the long-term benefits—such as reduced downtime, optimized system performance, and predictive maintenance—can outweigh these initial costs. Predictive maintenance algorithms, for instance, can help prevent system failures before they occur, reducing unplanned maintenance costs significantly.
For example, Refs. [115,116,117] collectively explore the role of AI and machine learning in optimizing network performance, enhancing reconfiguration processes, and improving system efficiency. Alabi (2023) [115] discusses how AI methodologies such as reinforcement learning and deep learning contribute to network optimization in telecommunications, particularly through autonomous reconfiguration. Similarly, Umoga and Sodiya (2024) [116] delve into AI-driven optimization for dynamic network performance, focusing on how machine learning algorithms can facilitate adaptive network configurations in response to fluctuating conditions. Cruz et al. (2024) [117] provide a comprehensive review of AI applications for self-reconfiguration in smart manufacturing systems, emphasizing the integration of machine learning techniques to optimize operational efficiency and predict system behavior. Together, these studies underline the transformative impact of AI on system reconfiguration, predictive maintenance, and overall optimization in complex network environments.
The scalability of AI-based reconfiguration is also worth noting. As subway networks grow and expand, the demand for real-time processing power and advanced predictive models will increase, making it necessary to implement scalable cloud-based solutions and distributed computing frameworks. The ability of AI systems to scale will depend on the development of more efficient algorithms and distributed learning techniques that can be deployed across different sections of the network.

5.1.3. Key Technologies for Substation-Level Self-Healing

Several critical technologies contribute to substation-level self-healing, each addressing a specific aspect of fault detection, isolation, and recovery, as summarized in Table 15. These technologies include advanced communication protocols, remote control and monitoring systems, machine learning-based fault detection algorithms, and automated reconfiguration systems. These systems work in tandem to ensure that substations can respond quickly and efficiently to faults, minimizing downtime and ensuring continuous operation.
The integration of self-healing technologies at the substation level has proven to significantly enhance the overall reliability and efficiency of subway power systems. Automated detection, isolation, and reconfiguration ensure that faults are managed with minimal manual intervention, which not only reduces the risk of human error but also speeds up recovery times. Despite the advantages, challenges such as communication delays and system integration need to be addressed to further improve these systems’ effectiveness.

5.2. Line-Level Self-Healing Mechanisms and Network Reconfiguration

Line-level self-healing mechanisms are designed to handle faults along power distribution lines, which are often subject to environmental factors like storms, wildlife interference, and overloads. In this context, ring network configurations are particularly valuable, as they allow power to be rerouted through an alternative path when a fault occurs.
However, implementing these systems in large subway networks involves scalability concerns. Ring networks may become complex and inefficient if the number of interconnected lines increases. Additionally, line-level self-healing systems must be able to handle the increased number of fault detection sensors, automated switches, and MAS agents required to manage the additional complexity. Data communication between these components needs to be highly synchronized, and large-scale deployments will require robust communication protocols to handle the increased volume of data.
Cost and practical implementation issues also arise. While automated switches and MAS-based decision-making systems can significantly enhance fault detection and recovery, the high initial investment costs and the need for continuous maintenance of these systems could pose a barrier to widespread adoption in large urban transit systems. Therefore, the implementation of modular systems, where critical components are first deployed in high-priority areas, followed by gradual expansion, could offer a cost-effective solution.

5.2.1. Ring Network Configuration

A ring network is a type of network topology where multiple power paths are connected in a loop. This configuration allows for the rerouting of power if one segment of the network fails, ensuring that power continues to flow without major interruptions. When a fault occurs, the ring network can detect the faulted section and automatically isolate it, while simultaneously rerouting power to the affected area. This feature is particularly valuable in subway systems where uninterrupted service is crucial.
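The rerouting property of a ring can be illustrated with a small connectivity check. The sketch below is only an illustration under stated assumptions: the station names, the single source `SRC`, and the helper `energized` are hypothetical and not taken from any specific subway system. It compares a closed ring with the same feeder operated radially after one faulted segment has been isolated.

```python
from collections import deque

def energized(edges, source, isolated=()):
    """Return the nodes still reachable from `source` once `isolated` segments are opened."""
    blocked = {frozenset(e) for e in isolated}
    adjacency = {}
    for a, b in edges:
        if frozenset((a, b)) in blocked:
            continue  # this segment has been switched out
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {source}
    queue = deque([source])
    while queue:  # breadth-first search over closed segments
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Hypothetical five-node loop: the last edge closes the ring back to the source.
RING = [("SRC", "A"), ("A", "B"), ("B", "C"), ("C", "D"), ("D", "SRC")]
RADIAL = RING[:-1]  # same feeder without the closing link

fault = [("A", "B")]  # faulted segment that has been isolated
ring_after = energized(RING, "SRC", isolated=fault)
radial_after = energized(RADIAL, "SRC", isolated=fault)
```

In this toy case every station in the ring remains supplied through the alternative path, whereas the radial feeder loses everything downstream of the isolated segment.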

5.2.2. Automated Switches and MASs for Fault Detection and Isolation

Automated switches play a critical role in line-level self-healing. These switches can automatically disconnect faulty sections from the grid, ensuring that faults do not propagate further. They are equipped with sensors and communication devices that relay information about fault conditions to the central control system. These switches are often controlled through MASs, where multiple agents (representing different network components) collaborate to determine the best course of action based on real-time data.
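One simple way such agents could cooperate is a neighbor-comparison rule: a switch agent that saw fault current while its downstream neighbor did not concludes that the fault lies in the section between them. The sketch below assumes a single fault on a feeder supplied from one end, with agents listed from source to feeder end; the class `SwitchAgent` and its fields are hypothetical names used for illustration, not an interface described in the reviewed literature.

```python
from dataclasses import dataclass

@dataclass
class SwitchAgent:
    name: str
    saw_fault_current: bool  # local sensor indication (assumed available)
    closed: bool = True

    def fault_is_just_downstream(self, neighbour):
        """Local rule: I saw fault current, my downstream neighbour did not."""
        neighbour_saw = neighbour.saw_fault_current if neighbour else False
        return self.saw_fault_current and not neighbour_saw

def isolate(agents):
    """Open the two switches bounding the faulted section; return their names."""
    for upstream, downstream in zip(agents, agents[1:] + [None]):
        if upstream.fault_is_just_downstream(downstream):
            upstream.closed = False
            if downstream is not None:
                downstream.closed = False
            return upstream.name, (downstream.name if downstream else None)
    return None  # no agent reported fault current

# Hypothetical feeder: the fault lies between S2 and S3.
agents = [
    SwitchAgent("S1", True),
    SwitchAgent("S2", True),
    SwitchAgent("S3", False),
    SwitchAgent("S4", False),
]
section = isolate(agents)
states = [a.closed for a in agents]
```

Only the two switches bounding the faulted section are opened, so the sections upstream of the fault stay connected to the source.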

5.2.3. Reconfiguration and Rerouting Power

When a fault is detected and isolated, the network must quickly reconfigure to restore power to the affected areas. Automated systems, powered by MASs and AI, help in making decisions about the most efficient way to reroute power to ensure continuity of service. This includes balancing loads across unaffected sections and optimizing power flow to reduce stress on the remaining parts of the network. Based on this, Table 16 presents an in-depth comparative analysis of several key line-level self-healing mechanisms within subway power networks. These mechanisms, including ring network reconfiguration, automated switches, MAS-based decision-making, fault detection sensors, real-time load balancing, and adaptive rerouting, play critical roles in enhancing the resilience, reliability, and overall performance of subway power supply systems. The table evaluates these mechanisms across multiple dimensions, including their degree of implementation, differences between systems, future prospects, issues to address, research potential, reliability impact, cost considerations, maintenance requirements, and impact on power quality.
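As a rough illustration of the load-aware rerouting described in this subsection, the following sketch assigns de-energized sections to backup feeders using a greedy rule. It is a simplification under stated assumptions: the section names, feeder names, and kW figures are hypothetical, any feeder is assumed able to reach any section, and the greedy heuristic stands in for the MAS- and AI-based optimization discussed in the text.

```python
def restore_loads(loads_kw, spare_kw):
    """Greedily assign each de-energized section to the feeder with the most spare capacity.

    Sections that no feeder can carry without overload are shed instead.
    """
    spare = dict(spare_kw)  # work on a copy so the caller's data is untouched
    plan, shed = {}, []
    # Place the largest loads first so they get the best chance of fitting.
    for section, demand in sorted(loads_kw.items(), key=lambda kv: -kv[1]):
        feeder = max(spare, key=spare.get, default=None)
        if feeder is not None and spare[feeder] >= demand:
            plan[section] = feeder
            spare[feeder] -= demand
        else:
            shed.append(section)
    return plan, shed, spare

# Hypothetical figures in kW.
loads = {"station_1": 400, "station_2": 300, "station_3": 250}
spare = {"feeder_A": 500, "feeder_B": 450}
plan, shed, remaining = restore_loads(loads, spare)
```

With these numbers, two sections are restored on different feeders and the third is shed because neither feeder has enough remaining headroom, which is the kind of overload avoidance the reconfiguration step aims for.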
A detailed summary and evaluation of Table 16 follows.
1. Ring Network Reconfiguration
Ring network reconfiguration is implemented at a high degree in most subway networks. It provides significant advantages, particularly in reducing the risk of power surges. However, the main challenge lies in the variability of its design across different systems, which can complicate integration. The future prospects for ring network reconfiguration are high, especially in terms of standardization. While it has a high research potential and reliability impact, it requires moderate costs and maintenance. Its impact on power quality is significant, making it an essential feature for enhancing power system robustness.
2. Automated Switches
Automated switches are widely implemented, particularly for fault isolation and rerouting. They are available in most subway systems and contribute significantly to power reliability. However, operating time delays remain a challenge, and their future prospects are only moderate. Their research potential is moderate, and their high reliability impact makes them indispensable in critical power situations. Although their costs are medium, they have relatively low maintenance requirements. Their contribution to improving power quality is notable, though not as high as that of some of the more advanced technologies.
3. MAS-Based Decision Making
The implementation of MAS-based decision-making is currently moderate, focusing on real-time fault management. This mechanism holds great promise for the future, particularly in smart grids, where its innovative potential can lead to more dynamic and adaptive responses. Despite the challenges in managing data communication overhead, this mechanism’s research potential remains high. The system’s impact on reliability is significant, especially for maintaining consistent power quality, but it faces moderate cost and maintenance requirements. Furthermore, MAS-based decision-making has high potential to improve system efficiency in the long term.
4. Fault Detection Sensors
Fault detection sensors are crucial for detecting and locating faults in subway networks. Their implementation is high, and they are critical for the efficiency of the systems. However, the occurrence of false negatives remains a challenge. These sensors provide medium research potential but contribute significantly to the reliability of power networks. The systems require moderate to high costs for implementation but have medium maintenance requirements. Their high impact on power quality makes them an essential part of a well-functioning self-healing system.
5. Real-time Load Balancing
Real-time load balancing plays a critical role in dynamically balancing the network load. This mechanism exhibits very high scalability but also presents challenges in coordinating complex systems. Real-time load balancing systems are highly promising in terms of enhancing power quality, though they require a significant investment in coordination and data throughput. Despite its high initial costs, this system shows great promise for optimizing the power system’s performance in the long term. Its research potential is high, as it can enable real-time adjustments to prevent overloads.
6. Adaptive Rerouting
Adaptive rerouting mechanisms, which adjust power flow dynamically, are based on cutting-edge technology and hold very high future prospects. These mechanisms offer significant advantages in terms of real-time adaptability, with the potential to dramatically reduce power disruptions. However, adaptive rerouting requires high data throughput and advanced technologies, making it one of the more complex systems to implement. Despite the high costs and medium maintenance requirements, adaptive rerouting mechanisms have a very high impact on power quality, which is crucial for ensuring the continuous operation of subway networks.
In conclusion, line-level self-healing mechanisms such as ring network reconfiguration, automated switches, and MAS-based decision-making enable quick fault isolation and recovery (Figure 4). These technologies ensure that power disruptions are minimized, and service can be restored quickly. The main challenges lie in the complexity of coordinating multiple agents across the network, and the requirement for continuous communication. Nonetheless, these mechanisms significantly improve the robustness and resilience of the system.
The fault isolation and recovery flowchart presented in Figure 4 illustrates the essential processes involved in self-healing control within a subway power supply system, highlighting the crucial steps from fault detection to system restoration. The flowchart comprises four steps, elaborated below.
1. Real-Time Monitoring and Data Collection
In this first step, the system continuously monitors operational parameters such as voltage, current, and temperature through sensors and monitoring devices. Once an anomaly is detected, such as voltage fluctuations or current imbalances, the system automatically triggers the self-healing control process. The use of the IEC 61850 communication protocol ensures the efficient and reliable transmission of data, allowing for the dynamic monitoring of critical system parameters. This early detection phase is crucial for initiating a rapid and accurate response to potential faults.
2. Fault Diagnosis and Location Analysis
After identifying an abnormal condition, the system utilizes historical data and real-time sensor readings to quickly diagnose the fault type and its precise location. AI algorithms, including pattern recognition and deep learning, process these data to create diagnostic models that pinpoint the fault accurately. The integration of these advanced AI techniques, coupled with the use of zero-sequence and differential current monitoring, enhances the efficiency and precision of fault identification, enabling the system to respond rapidly to issues in the subway power supply network.
3. Fault Isolation and Protection Actions
Once the fault is diagnosed, automated isolation devices are triggered to disconnect the affected area, preventing the fault from spreading and causing further damage. The multi-agent system (MAS) plays a critical role in coordinating resources across the entire network, dynamically adjusting load distribution to stabilize the system. The MAS ensures that the fault isolation is executed in an optimized sequence, minimizing the overall impact on the system’s functionality and ensuring that non-faulted regions maintain power. This coordination enhances system resilience and ensures minimal service disruption.
4. Recovery of Power Supply and Load Optimization
After the fault is isolated, the system focuses on restoring power to the unaffected regions. This is accomplished through the automatic switching of circuits to reconnect these areas to the power supply. To prevent overloads and secondary faults, intelligent optimization algorithms, such as game theory-based optimization, are applied to balance resource distribution across the network. These algorithms ensure the efficient recovery of power without overloading critical lines and help avoid secondary faults or service interruptions. The dynamic load optimization enhances the overall system stability and speeds up the recovery process, minimizing the downtime of the subway network.
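The zero-sequence and differential current checks mentioned in step 2 can be sketched as follows. The zero-sequence current is the phasor average of the three phase currents, and the differential current is the magnitude of the difference between the currents entering and leaving a protected zone. The pickup thresholds and the function names below are hypothetical values chosen for illustration; practical relays use more elaborate characteristics.

```python
import cmath
import math

def phasor(magnitude, angle_deg):
    """Build a complex phasor from magnitude and angle in degrees."""
    return cmath.rect(magnitude, math.radians(angle_deg))

def zero_sequence(ia, ib, ic):
    """Zero-sequence current: I0 = (Ia + Ib + Ic) / 3."""
    return (ia + ib + ic) / 3

def differential(i_in, i_out):
    """Magnitude of the difference between current entering and leaving a zone."""
    return abs(i_in - i_out)

def diagnose(ia, ib, ic, i_in, i_out, i0_pickup=5.0, diff_pickup=10.0):
    """Return a list of suspicion flags; thresholds (in A) are illustrative assumptions."""
    flags = []
    if abs(zero_sequence(ia, ib, ic)) > i0_pickup:
        flags.append("ground_fault_suspected")
    if differential(i_in, i_out) > diff_pickup:
        flags.append("internal_fault_suspected")
    return flags

# Balanced three-phase currents with equal current in and out of the zone.
healthy = diagnose(phasor(100, 0), phasor(100, -120), phasor(100, 120),
                   phasor(100, 0), phasor(100, 0))

# Phase A carries extra current that does not leave the zone.
faulted = diagnose(phasor(300, 0), phasor(100, -120), phasor(100, 120),
                   phasor(300, 0), phasor(100, 0))
```

In the balanced case both quantities are close to zero and no flag is raised, while the unbalanced case trips both checks.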
As illustrated in Figure 4, this process begins with real-time monitoring and data collection, where sensors continuously gather operational parameters such as voltage and current. Upon detecting abnormalities, such as voltage fluctuations or current imbalances, the system triggers the self-healing procedure. The fault diagnosis and location analysis stage utilizes AI algorithms, including pattern recognition and deep learning, to swiftly identify the fault type and pinpoint its exact location based on both historical and real-time data. Following diagnosis, fault isolation and protection actions are automatically implemented, where automated isolation devices disconnect the faulted area, preventing further damage. The multi-agent system (MAS) coordinates resources across the network, dynamically adjusting load distribution to maintain system stability. Finally, in the recovery and load optimization phase, the system restores power to the unaffected areas by automatically switching circuits, and smart optimization algorithms ensure balanced resource allocation, minimizing the risk of overloading or secondary faults. This flowchart exemplifies the advantages of modern self-healing systems, combining AI-driven diagnostics, automated fault isolation, and dynamic load management to rapidly restore service and maintain network stability. Its key strengths lie in its high-speed response to faults, minimal disruption to non-faulted areas, and the intelligent optimization of resources, making it highly efficient for managing complex subway power networks.
Overall, this fault isolation and recovery flowchart is integral to a self-healing control system, showcasing an intelligent and automated approach to managing faults in complex subway power supply networks. The system’s reliance on real-time data collection, AI-based diagnostics, automated fault isolation, and dynamic recovery processes ensures a rapid, accurate, and minimal-impact response to power system disruptions. This approach enhances both operational reliability and efficiency, crucially maintaining continuous service while minimizing power loss and service downtime.
The mechanisms compared in Table 16 (ring network reconfiguration, automated switches, MAS-based decision-making, fault detection sensors, real-time load balancing, and adaptive rerouting) each contribute uniquely to enhancing the resilience and reliability of subway power systems. While each mechanism has its own set of challenges, particularly in terms of integration complexity, coordination, and data management, their future potential remains high. The mechanisms with higher scalability and dynamic response capabilities, such as real-time load balancing and adaptive rerouting, are particularly important for adapting to the growing demands of modern subway networks.
It is clear that while some systems, like automated switches, are already well integrated and have a proven track record, others like MAS-based decision-making and adaptive rerouting present substantial opportunities for future research and development. These technologies will become increasingly vital as the complexity of urban transit systems continues to evolve, making continuous innovation and research investment in this area critical.

5.3. Cross-Layer Fault Recovery Techniques and Strategies

Cross-layer fault recovery involves the coordination of fault management across different layers of the power network, from generation to distribution. This approach is essential for ensuring that faults are handled in a synchronized manner, minimizing the risk of cascading failures across multiple network layers.
Scalability challenges arise when attempting to implement multi-layer coordination systems in large subway networks. The complexity of cross-layer communication and data synchronization increases as the number of network layers grows. Hierarchical recovery systems, which involve different levels of control, must be carefully designed to ensure that they can scale without overwhelming the system’s computational resources.
Cost issues are particularly relevant here. The deployment of cross-layer recovery systems often requires substantial investment in advanced communication infrastructure and real-time data processing technologies. However, as these systems improve the overall resilience and efficiency of the subway power network, the long-term savings from reduced downtime, improved power quality, and predictive fault management can justify the initial costs.

5.3.1. Hierarchical Recovery Systems

Hierarchical recovery systems involve different levels of fault management, from local fault detection and isolation to higher-level network-wide coordination. The first level involves the detection and isolation of faults at the substation or line level, while the next level includes coordinating recovery actions across multiple substations or even across the entire power grid. At the highest level, centralized control centers monitor the overall system status and ensure that recovery actions are coordinated across the network.
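The escalation logic implied by this hierarchy can be sketched as a chain of handlers, where a fault is passed upward only when the current level cannot resolve it. The level names, the fault fields, and the resolution rules below are hypothetical assumptions chosen to keep the example small; they are not drawn from a specific deployment.

```python
LEVELS = ["substation", "area", "control_center"]

def handle_fault(fault, handlers):
    """Try each level in order; escalate when a level reports it cannot resolve the fault."""
    trail = []
    for level in LEVELS:
        resolved = handlers[level](fault)
        trail.append((level, resolved))
        if resolved:
            break  # recovery completed at this level, no further escalation
    return trail

# Illustrative rules: each handler returns True if its level can resolve the fault.
handlers = {
    # Local recovery needs a single affected substation with its own backup.
    "substation": lambda f: f["affected_substations"] == 1 and f["local_backup"],
    # An area coordinator is assumed to manage up to three substations.
    "area": lambda f: f["affected_substations"] <= 3,
    # The control center is the last resort and always takes over.
    "control_center": lambda f: True,
}

local = handle_fault({"affected_substations": 1, "local_backup": True}, handlers)
regional = handle_fault({"affected_substations": 2, "local_backup": False}, handlers)
network = handle_fault({"affected_substations": 5, "local_backup": False}, handlers)
```

The returned trail records which levels were tried, which makes it easy to see how far a given fault had to be escalated.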

5.3.2. Coordinated Fault Isolation Across Layers

Coordinated fault isolation is crucial to ensure that faults at one level do not affect other levels of the system. For example, if a fault occurs at the substation level, the system must isolate the fault while maintaining the overall operation of the grid. At the same time, automated decision-making processes must be in place to restore service as quickly as possible, whether through reconfiguration, load balancing, or rerouting.

5.3.3. MASs for Multi-Layer Coordination

Multi-agent systems are particularly useful in multi-layer fault recovery. They enable distributed decision-making, where each agent is responsible for coordinating recovery actions within its designated layer. These agents communicate with each other to ensure that the system-wide recovery actions are optimized. For example, an agent at the substation level may isolate a fault, while an agent at the transmission level reroutes power from other sources to ensure continuous service [118,119,120]. The research in [118,119,120] highlights the significant role of MASs in enhancing the resilience and efficiency of power systems, particularly in the context of multi-layer fault recovery. Lin and Bie (2018) [118] provide a comprehensive analysis of strategies for achieving power system resilience, emphasizing decentralized decision-making in MASs. Yu et al. (2020) [119] explore survivability-aware routing restoration mechanisms, demonstrating how MASs can optimize network communication during large-scale failures. Furthermore, Moradi et al. (2016) [120] examine the application of MASs in power engineering, emphasizing their ability to coordinate distributed recovery actions across different layers of the power network, such as substations and transmission systems. These studies collectively underscore the importance of MASs in ensuring rapid fault detection, isolation, and system restoration in complex, multi-layered infrastructures.
Based on this, Table 17 outlines key cross-layer fault recovery mechanisms aimed at enhancing the resilience of subway power systems. These mechanisms—hierarchical recovery, cross-layer isolation, MASs for multi-layer coordination, adaptive load balancing, data integration systems, and predictive recovery algorithms—represent various approaches to fault isolation, decision-making, and recovery coordination across multiple layers of the power system. The table assesses each mechanism based on its degree of implementation, differences between systems, future prospects, issues to address, research potential, reliability impact, cost considerations, maintenance requirements, and impact on power quality.
A detailed summary and evaluation of Table 17 follows.
1. Hierarchical Recovery
Hierarchical recovery is implemented at a high level in subway power systems, particularly for coordinated fault recovery. The main challenge here is the system complexity, as the approach varies significantly across different systems. Hierarchical recovery shows very high future prospects due to its potential for seamless fault management. The research potential is high, and its impact on reliability is very significant. However, its cost considerations and maintenance requirements are high, which is a common challenge in large-scale systems. Despite these challenges, hierarchical recovery mechanisms provide very high benefits in terms of power quality, making them essential for enhancing overall system robustness.
2. Cross-Layer Isolation
Cross-layer isolation mechanisms are moderately implemented and are particularly focused on isolating faults across multiple layers of the system. These systems are critical for new and emerging subway power networks. The key issue to address is maintaining data consistency, which is essential for effective fault detection and recovery. The research potential is high, and the impact on reliability is substantial. While the costs are moderate, the complexity of integrating multiple layers of data can be a challenge. Cross-layer isolation mechanisms have a high impact on power quality, making them crucial for efficient and resilient power systems.
3. MASs for Multi-Layer Coordination
MASs are used for distributed decision-making in recovery processes. These systems are implemented to a high degree and require data consistency across multiple layers of the power system. The challenge is overcoming communication delays between agents, which can hinder decision-making efficiency. Despite this challenge, MASs for multi-layer coordination hold very high future prospects, particularly in enhancing system resilience. The research potential of this mechanism remains high, as it is central to modern smart grids and large-scale systems. While it demands high costs and maintenance, it has a very high impact on power quality, especially when applied to complex networks requiring adaptive decision-making.
4. Adaptive Load Balancing
Adaptive load balancing is critical for balancing load across multiple levels of the subway network. Its implementation is high, especially in large systems where dynamic load adjustment is needed. The key issue in this approach is the requirement for real-time data processing and accurate system monitoring. The research potential for adaptive load balancing is high, with very high prospects for improving system efficiency. While it has high costs and maintenance needs, it offers a very high impact on power quality by ensuring that the network can dynamically adjust to load fluctuations. This makes adaptive load balancing a vital tool for maintaining power stability in large subway networks.
5. Data Integration Systems
Data integration systems are essential for integrating data across multiple layers of the subway network, particularly for decision-making processes. These systems are moderately implemented, with ongoing challenges in ensuring data synchronization across different layers. The research potential remains high, with a significant focus on overcoming data consistency and synchronization issues. Data integration systems contribute highly to the reliability and efficiency of subway power systems, although their cost and maintenance requirements are also high. Despite these challenges, the systems have a very high impact on power quality, making them a key area for continued development.
6. Predictive Recovery Algorithms
Predictive recovery algorithms are in the experimental stage and focus on anticipating faults and recovery actions. These systems are particularly useful in providing proactive solutions to prevent faults from escalating. However, they face challenges related to the accuracy of fault models, which can limit their effectiveness. The research potential is moderate, as this area is still evolving, and real-world applications are limited. While the costs for implementing predictive recovery algorithms are moderate, their maintenance needs are high due to the complexity of the algorithms. Despite the challenges, these systems offer a very high impact on power quality by enhancing the system’s ability to preemptively address potential disruptions.
Overall, the cross-layer fault recovery techniques ensure that faults are managed seamlessly across different levels of the subway power network. By using hierarchical recovery systems, coordinating fault isolation, and leveraging MASs for multi-layer coordination, these techniques help improve overall system resilience. The integration of real-time data and predictive algorithms further enhances the effectiveness of these systems. However, issues related to data synchronization, communication, and system complexity remain challenges to be addressed.
The cross-layer fault recovery mechanisms outlined in Table 17 (hierarchical recovery, cross-layer isolation, MASs for multi-layer coordination, adaptive load balancing, data integration systems, and predictive recovery algorithms) are all integral to improving the robustness and resilience of subway power systems. Each mechanism offers distinct advantages in terms of fault isolation, dynamic recovery, and decision-making, although each also presents challenges related to system complexity, data synchronization, and real-time operation.
Among these, hierarchical recovery and MASs for multi-layer coordination show the most promise for long-term development, as they provide coordinated and adaptive solutions for large-scale systems. However, issues related to data consistency, communication delays, and high implementation costs remain significant barriers that need to be addressed. Predictive recovery algorithms, while still experimental, represent a transformative approach to proactive fault management and could become crucial as AI and machine learning technologies advance.
Ultimately, the integration of these mechanisms into subway power systems will enhance operational efficiency, reduce downtime, and improve overall power quality, making them indispensable for the future of smart and resilient subway networks.

5.4. AI-Driven Fault Diagnosis and Recovery in Complex Scenarios

AI is playing an increasingly critical role in the diagnosis and recovery of faults in complex subway power systems. Machine learning, deep learning, and predictive analytics allow for faster and more accurate identification of faults, even in challenging scenarios where traditional methods may struggle.
Scalability remains a significant challenge when applying AI-based fault diagnosis to large-scale systems. The number of sensors, data points, and computational requirements for deep learning models increases as the subway network expands. Therefore, a cloud-based approach or distributed AI systems may be necessary to handle the data processing demands of larger systems. These AI models must be continuously updated with real-time operational data to ensure their accuracy and adaptive capabilities in dynamic environments [121,122,123].
The cost of AI-driven systems can be high, particularly in terms of computational resources and data storage requirements. However, the implementation of AI-based systems can significantly reduce manual intervention and improve operational efficiency, leading to long-term savings. Additionally, AI-based reconfiguration systems can reduce downtime and optimize network recovery, providing a high return on investment over time.

5.4.1. Machine Learning for Fault Prediction

Machine learning algorithms can be trained using historical fault data to predict potential future faults based on patterns and trends [124,125,126]. By analyzing large volumes of data, these algorithms can identify early warning signs of potential faults before they occur. This proactive approach allows for better planning and mitigation strategies, reducing the overall impact of faults.
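A minimal sketch of this idea uses a nearest-centroid classifier, one of the simplest supervised learning methods, trained on labeled historical operating points. The features (normalized equipment temperature and normalized load), the labels, and the sample values are all hypothetical; the studies cited above use considerably richer models and data.

```python
import math

def fit_centroids(samples, labels):
    """Compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))

# Hypothetical history: (normalized temperature, normalized load) per observation.
history = [
    (0.30, 0.40), (0.35, 0.50), (0.40, 0.45),   # periods with no subsequent fault
    (0.80, 0.90), (0.85, 0.95), (0.75, 0.85),   # periods that preceded a fault
]
labels = ["normal"] * 3 + ["pre_fault"] * 3

centroids = fit_centroids(history, labels)
warning = predict(centroids, (0.78, 0.88))
quiet = predict(centroids, (0.33, 0.42))
```

A new operating point close to the conditions that historically preceded faults is flagged as an early warning, while a point resembling normal operation is not.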

5.4.2. Deep Learning for Real-Time Fault Diagnosis

Deep learning models, which are a subset of machine learning, can be particularly useful for real-time fault diagnosis [127,128]. These models analyze data from multiple sensors and sources, identifying complex patterns that may indicate a fault. The advantage of deep learning is its ability to process large amounts of data and learn from it, enabling quicker diagnosis and recovery times.

5.4.3. AI for Automated Reconfiguration

AI can also assist in the automated reconfiguration of subway power systems after a fault has been isolated [129]. By considering a range of factors such as network load, fault severity, and environmental conditions, AI can determine the most effective configuration for restoring power. This dynamic reconfiguration is crucial in ensuring that service is restored as quickly as possible without overloading other parts of the network. Based on this, Table 18 presents an analysis of several AI-driven technologies that are applied to fault diagnosis and automated reconfiguration in subway power systems. The technologies highlighted include machine learning algorithms, deep learning models, AI-based reconfiguration, predictive analytics, fault pattern recognition, and automated diagnostic systems. These systems aim to predict faults, diagnose issues in real time, and optimize recovery strategies to ensure the rapid restoration of power in the event of system disruptions.
A detailed summary and evaluation of Table 18 follows.
1. Machine Learning Algorithms
Machine learning algorithms are widely implemented in modern systems to predict faults before they occur [130]. The key issue with this technology is ensuring data accuracy, as accurate data input is critical for the effective prediction of faults. The future prospects for machine learning in fault prediction are very high due to their ability to enhance real-time diagnostics and preemptively manage faults. These algorithms have a very high reliability impact as they improve the system’s overall resilience. Although the research potential remains high, the associated costs are moderate, with medium maintenance requirements. Their impact on power quality is also very high due to the proactive nature of fault management and early intervention.
2. Deep Learning Models
Deep learning models are increasingly used for real-time fault diagnosis, with moderate implementation in subway systems [127]. These models require large datasets to effectively identify faults, which poses challenges in data consistency and accuracy. Despite these challenges, deep learning models have very high future prospects, as they can learn from vast amounts of historical data to provide highly accurate diagnostics. The reliability impact is very high, and the research potential remains significant, especially with advancements in neural networks and computational capabilities. However, deep learning models require high computational power, making them costly and high-maintenance. Despite these requirements, the impact on power quality is very high due to their accuracy and real-time capabilities.
3. AI-based Reconfiguration
AI-based reconfiguration focuses on optimizing network recovery after a fault is isolated. While the technology is still evolving, it holds very high future prospects due to its potential to improve recovery times and efficiency. One of the primary challenges with AI-based reconfiguration is the need for fast computation to optimize decisions in real time, particularly in large subway systems. The research potential remains high, and the technology’s reliability impact is also significant. The cost considerations are high, as the system requires substantial computational resources and integration with existing infrastructure. Despite the costs, the impact on power quality is very high due to its ability to dynamically restore power to affected parts of the network.
4. Predictive Analytics
Predictive analytics is widely used in advanced grids to forecast network conditions and potential faults. The system is highly implemented and has very high future prospects, given its ability to anticipate issues before they occur. The major challenge for predictive analytics lies in algorithm complexity, as the models must process and analyze large volumes of real-time data. The research potential for predictive analytics is high, and its impact on reliability is significant due to its ability to optimize power system management before faults emerge. Cost considerations remain high, but predictive analytics offers high potential for improving system reliability. Its maintenance requirements are medium, and the impact on power quality is very high, especially in maintaining system stability.
5. Fault Pattern Recognition
Fault pattern recognition identifies complex fault scenarios and is a common component of AI-driven systems. It is widely implemented, especially where faults are difficult to detect manually. The key challenge is the need for large amounts of training data to identify fault patterns accurately. Its future prospects are very high, and its reliability impact is also substantial, as it enhances fault detection accuracy. The research potential is high, although the technology requires significant investment in training data and integration effort. Despite these challenges, the impact on power quality is very high, as it can significantly reduce downtime and system disruptions.
6. Automated Diagnostic Systems
Automated diagnostic systems are used for automated fault detection and troubleshooting, and their implementation is moderate [131,132]. This technology is relatively new in subway systems, which can lead to integration complexity. Despite these challenges, automated diagnostic systems offer high research potential due to their ability to quickly identify and address faults. They require medium to high costs for integration and maintenance, but the technology significantly improves the system’s ability to respond to faults efficiently. The impact on power quality is very high, as automated diagnostics help maintain continuous operation and prevent larger system failures.
As summarized above, AI-driven fault diagnosis and recovery technologies are central to enhancing the resilience, efficiency, and quality of subway power systems. Each of the technologies listed in Table 18—machine learning algorithms, deep learning models, AI-based reconfiguration, predictive analytics, fault pattern recognition, and automated diagnostic systems—contributes uniquely to fault management and network recovery. While they all present some challenges related to data accuracy, integration complexity, and computational demands, their future prospects remain very high.
Machine learning and deep learning technologies, in particular, have the potential to revolutionize fault prediction and diagnosis, providing advanced solutions to issues related to system reliability. AI-based reconfiguration and predictive analytics are poised to improve recovery times and power system optimization, ensuring that power disruptions are minimized. Fault pattern recognition and automated diagnostic systems are essential for improving fault detection and reducing downtime.
Overall, the continued development and integration of these technologies are crucial for the advancement of modern subway power networks. However, issues related to data synchronization, computational requirements, and system integration must be addressed to fully realize their potential. Taken together, the subsections of Section 5 provide a comprehensive overview of the practical applications of self-healing technologies in subway power systems, focusing on fault recovery, network reconfiguration, and the integration of advanced AI and MASs for improved system resilience and efficiency. The application of these techniques enhances the reliability and sustainability of urban transportation networks, ensuring uninterrupted service even in the event of faults.

6. Implications of AI Technologies for Future Subway Power Systems

In this section, we examine the wide-ranging implications of emerging AI technologies for the future development of subway power systems. Building on previous discussions of self-healing architectures, multi-agent frameworks, and IEC 61850-based communication protocols, we focus here on how AI, particularly machine learning (ML), deep learning (DL), and reinforcement learning (RL), can transform operational strategies, maintenance paradigms, data governance, and broader socio-economic outcomes in the context of subway power supply. Specifically, we present five key subtopics that capture the multifaceted opportunities and challenges posed by AI in this domain.
First, Section 6.1 explores AI-enhanced fault diagnosis and prognostics, illustrating how cutting-edge data analytics can enable predictive maintenance and near-real-time failure detection. We present a distinct approach to predictive maintenance by incorporating AI methodologies that combine historical data with real-time operational signals to improve failure prediction accuracy in complex subway power systems. This expands upon traditional techniques by incorporating newer AI-based predictive models that integrate physics-driven insights and data fusion for improved fault detection and prognosis, as seen in recent applications within smart grid and industrial automation systems. Next, Section 6.2 investigates reinforcement learning and decision-making in self-healing processes, illustrating how adaptive algorithms can optimize power restoration and system resilience. Section 6.3 addresses the integration of AI and MASs under IEC 61850, highlighting the ways in which standardized communication protocols can be leveraged to support distributed intelligence and collective fault management. Moving beyond the purely technical dimensions, Section 6.4 focuses on cybersecurity, privacy, and data management for AI-driven subway power systems, considering how the influx of real-time data demands novel governance frameworks. Finally, Section 6.5 concludes with next-generation operational strategies and socio-economic implications, discussing how AI-enabled subway power systems may reshape workforce development, economic modeling, and stakeholder engagement.
These five sections provide a comprehensive perspective on how AI innovations are expected to evolve and influence subway power systems. Through the inclusion of novel approaches, particularly in AI-driven decision-making processes for self-healing, we introduce emerging techniques and applications that have not been extensively explored in prior studies. These techniques, including hybrid machine learning methods and real-time adaptive models, are at the forefront of advancing subway power infrastructure, offering new contributions to the field and distinguishing this review from earlier works.

6.1. AI-Enhanced Fault Diagnosis and Prognostics

One of the most significant applications of artificial intelligence in subway power systems is the enhancement of fault diagnosis and prognostics. Traditional fault detection methods—often reliant on predefined thresholds and manual inspections—are increasingly inadequate in the face of complex, dynamic operating conditions. AI-based approaches, by contrast, can leverage advanced data analytics to identify subtle patterns in real-time signals, historical maintenance logs, and contextual environmental data. The implications for subway power systems are profound: improved detection speed, greater accuracy, reduced downtime, and the ability to predict failures before they occur.
AI-based fault diagnosis now incorporates advanced methodologies that allow for dynamic analysis of both sensor data and environmental context. The shift from traditional rule-based models to AI-enhanced diagnostics involves the fusion of deep learning and machine learning techniques with domain-specific knowledge. This approach, particularly in the form of hybrid AI-physics models, helps identify fault signatures that might otherwise remain undetected using conventional methods. Recent research has shown the successful application of such hybrid models to predictive maintenance in industrial and energy systems, which suggests they could similarly enhance the robustness of subway power systems.

6.1.1. Transition from Reactive to Predictive Maintenance

AI-empowered strategies for fault diagnosis enable a shift from reactive to predictive maintenance paradigms. Historically, failures in power components such as transformers, switchgear, and traction substations could lead to severe operational disruptions, significant repair costs, and compromised passenger safety. Predictive maintenance uses AI to process multi-source data (e.g., sensor readings, operational logs, images from thermal cameras, vibrations, and acoustic signals) and identify early warning signs.
(1) Machine Learning Models: Supervised learning algorithms (e.g., Support Vector Machines and Random Forests) can detect anomalies in high-dimensional data, building predictive models that correlate subtle parameter shifts (such as partial discharges or fluctuating voltage profiles) to impending failures.
(2) Deep Learning Architectures: Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) offer enhanced feature extraction and sequence analysis, making it possible to detect complex non-linear relationships in sensor streams, identify irregularities in real time, and precisely localize potential faults within specific subsystems.
By incorporating these AI methods, subway operators can anticipate maintenance needs and schedule repairs in a manner that minimizes disruption. As a result, not only can subway power systems achieve higher reliability, but also overall operational expenditures can be significantly reduced.
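As a rough illustration of the early-warning idea, the sketch below flags sustained deviations of a sensor reading from a healthy baseline. It is a simple statistical stand-in, not one of the learned models (SVMs, Random Forests, CNNs, RNNs) named above; the readings, the 3-sigma threshold, and the three-sample persistence rule are all assumed values chosen for the example.

```python
from statistics import mean, stdev

def fit_baseline(healthy_readings):
    """Summarize readings recorded while the asset was known to be healthy."""
    return mean(healthy_readings), stdev(healthy_readings)

def early_warnings(readings, baseline, z_threshold=3.0, min_run=3):
    """Return indices at which at least `min_run` consecutive readings have
    deviated from the baseline by more than `z_threshold` standard deviations."""
    mu, sigma = baseline
    flagged, run = [], 0
    for i, value in enumerate(readings):
        z = abs(value - mu) / sigma
        run = run + 1 if z > z_threshold else 0   # reset on any normal sample
        if run >= min_run:
            flagged.append(i)
    return flagged

# Hypothetical transformer temperature readings in degrees Celsius.
healthy = [61.8, 62.1, 62.0, 61.9, 62.3, 62.0, 61.7, 62.2]
stream = [62.0, 62.1, 64.5, 62.2, 63.9, 64.4, 65.0, 65.8]

baseline = fit_baseline(healthy)
print(early_warnings(stream, baseline))
```

The persistence rule means the isolated spike at index 2 is ignored, while the sustained rise at the end of the stream is flagged. A learned model would replace the z-score with a richer anomaly score but could keep the same interface.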

6.1.2. Real-Time Monitoring and Edge Analytics

Real-time monitoring is essential for the effective operation of subway power systems, given their mission-critical nature. Deploying AI-driven edge analytics in local controllers allows for immediate processing of data without the latency introduced by centralized cloud computing [133]. This decentralized approach, similar to advancements in the industrial IoT sector, offers significant improvements in fault response times by enabling local processing of sensor data [134]. For instance, Refs. [133,134] emphasize the integration of real-time monitoring and AI-based edge analytics to enhance the performance and reliability of critical infrastructure, including subway power systems. Leveraging edge computing with deep learning models for real-time fault detection ensures continuous operation even during communication breakdowns, maintaining system resilience in highly dynamic environments. This setup reduces system vulnerabilities and optimizes response times, ensuring critical components remain functional despite potential disruptions.
In practice, an edge device near a substation might run a compact deep learning model trained to recognize specific fault signatures (e.g., harmonic distortions indicative of insulation breakdown). Upon detecting an anomaly, it could initiate an automated diagnostic routine or communicate with a higher-level control center for further analysis. Such distributed intelligence aligns closely with the principles of self-healing networks and multi-agent systems, wherein localized decisions can contain or mitigate faults quickly.
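The local decision described here can be sketched with a plain signal-processing check in place of the compact deep learning model. The example computes total harmonic distortion (THD) over one fundamental period and escalates when it exceeds a limit. The 8% limit, the 128-sample window, and the function names are assumptions made for illustration only.

```python
import cmath
import math

def harmonic_magnitude(samples, k):
    """Amplitude of the k-th harmonic, assuming `samples` spans exactly one
    fundamental period (a unit-amplitude sine at harmonic k yields 1.0)."""
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
    return 2 * abs(acc) / n

def total_harmonic_distortion(samples, max_harmonic=10):
    """Ratio of the combined harmonics 2..max_harmonic to the fundamental."""
    fundamental = harmonic_magnitude(samples, 1)
    harmonics = [harmonic_magnitude(samples, k) for k in range(2, max_harmonic + 1)]
    return math.sqrt(sum(h * h for h in harmonics)) / fundamental

def edge_decision(samples, thd_limit=0.08):
    """Decide locally whether to escalate to the control center."""
    thd = total_harmonic_distortion(samples)
    return ("escalate" if thd > thd_limit else "normal"), thd

N = 128  # samples per fundamental period (assumed)
clean = [math.sin(2 * math.pi * i / N) for i in range(N)]
distorted = [
    math.sin(2 * math.pi * i / N) + 0.2 * math.sin(2 * math.pi * 5 * i / N)
    for i in range(N)
]

print(edge_decision(clean)[0])
print(edge_decision(distorted)[0])
```

Because the check runs entirely on the edge device, the decision does not depend on the link to the control center, which is the property the text emphasizes.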

6.1.3. Challenges and Future Directions

While AI-enhanced fault diagnosis offers considerable potential, several challenges remain. One key obstacle is the quality and integrity of data, which are essential for generating accurate predictions. Recent advancements in AI, such as the application of generative models for data augmentation, have been proposed to address the challenge of limited datasets, especially for rare fault scenarios. These techniques, which have proven successful in industries such as healthcare and autonomous vehicles, could greatly enhance the robustness of AI models used for fault prediction in subway power systems. Furthermore, the interpretability of deep learning models continues to be a concern. Research is actively addressing this issue by developing explainable AI techniques that provide insight into the decision-making process, fostering operator trust and ensuring model transparency. Looking ahead, addressing these challenges will involve the following:
(1) Data Fusion: Combining structured data (e.g., SCADA logs) with unstructured data (e.g., images and audio signals) for more holistic fault models.
(2) Transfer Learning: Leveraging knowledge from related domains (e.g., electrified railways) to build robust classifiers even when local fault data are scarce.
(3) Explainable AI: Integrating interpretable model architectures or post hoc interpretability frameworks to ensure that operators understand and trust AI-driven fault detection decisions [135,136].
Table 19 highlights various fault diagnosis and prognostics scenarios in subway power systems, mapping each to its implementation stage, unique features, and prospective impact on overall system performance. This table demonstrates that AI-based methods yield significant improvements in fault isolation speeds and predictive maintenance accuracy, while persistent challenges remain in addressing data scarcity and system complexity. Notably, bridging these gaps requires both domain-oriented strategies—like hybrid physics-data approaches—and advanced AI methods such as transfer learning, federated learning, and explainable AI. Overall, AI-enhanced fault diagnosis and prognostics offer a transformative path forward, enabling subway power systems to move beyond reactive maintenance and toward a resilient, future-ready operational model. As data quality, model interpretability, and real-time decision-making capabilities improve, these techniques will become integral to enhancing reliability, lowering operational costs, and ensuring passenger safety.

6.2. Reinforcement Learning and Decision-Making in Self-Healing Processes

Reinforcement learning (RL) presents a compelling framework for dynamic decision-making in self-healing subway power systems. Unlike traditional control methods that rely on static, rule-based logic, RL algorithms learn optimal policies through continuous interactions with the environment. This approach holds immense promise for managing complex power distribution networks, where real-time reconfiguration, load balancing, and fault recovery must be orchestrated under varying operational constraints.
Recent advancements in deep reinforcement learning (DRL), particularly in the form of multi-agent RL frameworks, have demonstrated significant promise for complex infrastructure management. These frameworks enable distributed decision-making among multiple agents operating in parallel, each responsible for managing specific subsystems. This approach is well suited for subway power systems, where different agents can independently optimize substation operations, power rerouting, and fault recovery while ensuring overall system stability. These developments represent a significant evolution from traditional RL approaches and align closely with emerging smart grid technologies.

6.2.1. Core Principles of Reinforcement Learning

RL algorithms are typically defined by the concepts of agents, environments, states, actions, and rewards. In a subway power system context, they are as follows:
(1) Agent: The AI controller (or controllers) responsible for adjusting switch positions, redirecting power flows, or prioritizing fault isolation.
(2) Environment: The subway power system, inclusive of its multi-bus topology, traction substations, feeders, and protective devices.
(3) State: The real-time status of the system, including voltage levels, load demands, equipment health indicators, and fault locations.
(4) Action: Any operational command the RL agent can perform, such as opening or closing circuit breakers, adjusting converter setpoints, or initiating fault isolation protocols.
(5) Reward: A numerical signal representing the quality of each action, often linked to performance metrics like minimized outage duration, voltage stability, or reduced energy losses.
Through iterative exploration and exploitation, RL algorithms can discover dynamic strategies for reconfiguring the power system in response to faults or varying load conditions, thereby supporting self-healing functionality.
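To make the five concepts concrete, the following sketch applies tabular Q-learning with epsilon-greedy exploration to a deliberately tiny, hypothetical feeder. The two-flag state, the three actions, the reward values, and the hyperparameters are all assumptions for illustration; a real subway network would have a far larger state space and would be trained in simulation.

```python
import random

ACTIONS = ["open_breaker", "close_tie", "wait"]

def step(state, action):
    """Toy environment. State = (fault_isolated, load_restored).
    Returns (next_state, reward, done)."""
    isolated, restored = state
    if action == "open_breaker" and not isolated:
        return (True, restored), -1.0, False
    if action == "close_tie":
        if not isolated:
            return state, -10.0, False   # closing onto a live fault is penalized
        return (True, True), 10.0, True  # supply restored, episode ends
    return state, -1.0, False            # every other step prolongs the outage

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state, done, steps = (False, False), False, 0
        while not done and steps < 20:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))  # exploit
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q.get((nxt, a), 0.0) for a in ACTIONS)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state, steps = nxt, steps + 1
    return q

def greedy(q, state):
    """Best known action for a state under the learned Q-table."""
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))

q_table = train()
print(greedy(q_table, (False, False)), greedy(q_table, (True, False)))
```

The learned policy isolates the fault before closing the tie switch, because the reward signal penalizes both outage duration and re-energizing an uncleared fault.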

6.2.2. Adaptive Self-Healing Under Uncertainty

One of the critical challenges in self-healing networks is dealing with uncertainty—both in terms of partial system observability and rapidly changing demand patterns. RL excels in these conditions, as it can learn to balance exploration (trying new configurations) with exploitation (using known successful actions).
(1) Deep Q-Networks (DQN): These incorporate neural networks to approximate the action-value function, enabling RL agents to handle high-dimensional state spaces [137,138].
(2) Policy Gradient Methods: Algorithms like Proximal Policy Optimization (PPO) or Advantage Actor-Critic (A2C) can learn continuous control policies, facilitating more nuanced actions such as incremental power flow adjustments.
(3) Model-Based RL: By integrating predictive models of system behavior (e.g., power-flow or differential-algebraic equations describing network behavior), RL agents can plan ahead, simulating potential outcomes of different actions before implementation.
This adaptability allows RL-driven self-healing systems to isolate faults more rapidly and re-route power with minimal operator intervention, enhancing overall reliability and resilience.

6.2.3. Multi-Agent Coordination

In the context of subway power systems, different functional areas—traction substations, feeders, and signaling equipment—can be managed by specialized RL agents. A multi-agent RL approach allows these subsystems to coordinate actions effectively. The integration of cooperative multi-agent RL models ensures that decisions made by individual agents are aligned with broader system-wide goals, such as minimizing power outages and maximizing operational efficiency. Recent progress in cooperative RL, particularly in optimizing energy distribution across urban power networks, shows promise for enhancing coordination among subway power system agents, ensuring both fault isolation and optimal power restoration.

6.2.4. Challenges in RL-Based Self-Healing

Despite its promise, RL faces notable hurdles in practical railway power scenarios:
(1) Safety Constraints: Subways operate under strict safety regulations, so RL actions must never compromise passenger safety. Techniques like safe RL or reward shaping can incorporate safety margins.
(2) Scalability: Large systems with dozens of substations and thousands of sensors result in vast state-action spaces, necessitating advanced function approximation and distributed training architectures.
(3) Learning Speed: RL algorithms may require numerous interactions or simulated “episodes” to learn effective policies. Building high-fidelity digital twins for training is thus essential.
(4) Generalization: Policies learned under certain load patterns or fault conditions may not generalize well to unseen scenarios, underscoring the need for robust domain adaptation and online learning strategies.
Table 20 outlines the primary RL approaches, benefits, and challenges for self-healing processes in subway power systems. While feeder reconfiguration and load balancing show high promise, scaling these solutions requires addressing safety, computational complexity, and real-time adaptability. Cooperative multi-agent RL emerges as a powerful paradigm, but it demands seamless data exchange and robust coordination protocols. Each scenario also highlights the necessity of sophisticated simulation tools (digital twins) for training RL agents under realistic conditions.
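One simple way to enforce the safety constraints listed above is to filter the agent's candidate actions through hard rules before any learned value is consulted, an approach often called action masking. The sketch below assumes two hypothetical rules and a hypothetical "hold" fallback; it illustrates the pattern and is not a complete safe-RL method.

```python
def allowed_actions(state, actions):
    """Drop actions that violate hard safety rules (the rules are illustrative)."""
    safe = []
    for action in actions:
        if action == "close_tie" and not state["fault_isolated"]:
            continue  # never re-energize a section with an uncleared fault
        if action == "shed_load" and state["feeder"] == "emergency_lighting":
            continue  # never drop safety-critical loads
        safe.append(action)
    return safe

def select_safe_action(state, q_values):
    """Pick the highest-valued action among those that pass the safety mask."""
    candidates = allowed_actions(state, list(q_values))
    if not candidates:
        return "hold"  # conservative default when nothing is permitted
    return max(candidates, key=q_values.get)

state = {"fault_isolated": False, "feeder": "traction"}
q_values = {"close_tie": 9.0, "open_breaker": 4.0, "shed_load": 1.0}
print(select_safe_action(state, q_values))
```

Even though the learned values favor closing the tie switch, the mask removes that option until the fault is isolated, so the constraint holds regardless of how well the policy has been trained.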
In conclusion, RL offers a dynamic, adaptive framework for enhancing self-healing capabilities in subway power systems. By learning from experience, RL algorithms can optimize fault isolation and power restoration in uncertain and evolving conditions. Although technical challenges—particularly in the realms of safety, scalability, and domain adaptation—remain, continued research and pilot implementations will refine RL-driven approaches, ushering in a new era of intelligent, resilient, and self-reconfiguring subway power infrastructures.

6.3. Integration of AI and Multi-Agent Systems Under IEC 61850

The third focal area explores how AI methods, when paired with MASs and standardized by IEC 61850 protocols, can unlock advanced self-healing, interoperability, and collaborative decision-making in future subway power systems. While MAS architectures enable the distribution of tasks and knowledge among specialized agents, IEC 61850 provides a common language for data exchange. AI techniques further enrich these frameworks by injecting predictive capabilities, adaptive optimization, and real-time learning.
Recent advancements in IEC 61850 extensions have focused on integrating AI-driven systems to improve real-time fault management, such as through enhanced predictive maintenance capabilities and adaptive load balancing. Furthermore, the application of AI-enhanced decision-making algorithms within these MAS frameworks ensures that fault recovery processes are both faster and more reliable. The increasing integration of AI and MASs with IEC 61850 is driving new innovations in the development of self-healing networks that rely on adaptive, data-driven approaches to enhance system reliability.

6.3.1. Roles of MASs and IEC 61850 in Subway Power Systems

MASs are collections of semi-autonomous entities—“agents”—capable of independent decision-making. In a subway environment, each agent might represent a specific subsystem or physical asset (e.g., a traction transformer, a protection relay, or a signaling interface). The MAS approach decentralizes control, enhancing flexibility, scalability, and fault tolerance.
IEC 61850, originally developed for substation automation in utility power grids, offers a rich object-oriented data model and standardized communication services (GOOSE, MMS, etc.) [139,140]. As subway power systems become more sophisticated, adopting IEC 61850 ensures consistent naming conventions, standardized data structures, and event-driven messaging. Consequently, agents in an MAS architecture can seamlessly exchange data, coordinate actions, and maintain a shared situational awareness.
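The value of consistent naming can be illustrated with a small parser for IEC 61850 style object references of the form LogicalDevice/LogicalNode.DataObject.DataAttribute. This is a simplified sketch: it ignores logical-node prefixes and functional constraints, the three class descriptions are a small excerpt, and the device name TSS1LD0 is hypothetical.

```python
import re
from dataclasses import dataclass

# A few standard logical-node classes (not an exhaustive list).
LN_CLASSES = {
    "XCBR": "circuit breaker",
    "MMXU": "measurement",
    "PTOC": "time overcurrent protection",
}

@dataclass(frozen=True)
class ObjectRef:
    logical_device: str
    ln_class: str
    ln_instance: int
    data_object: str
    data_attribute: str

# Simplified pattern: LD/CCCCn.DO.DA with no LN prefix and no functional constraint.
_REF = re.compile(r"^(\w+)/([A-Z]{4})(\d+)\.(\w+)\.([\w.]+)$")

def parse_ref(text):
    """Split an object reference into its named parts, or raise ValueError."""
    match = _REF.match(text)
    if match is None:
        raise ValueError(f"not a recognized object reference: {text!r}")
    ld, cls, inst, do, da = match.groups()
    return ObjectRef(ld, cls, int(inst), do, da)

ref = parse_ref("TSS1LD0/XCBR1.Pos.stVal")
print(ref.ln_class, LN_CLASSES[ref.ln_class], ref.data_object, ref.data_attribute)
```

Because every agent addresses a breaker position the same way, an agent can interpret data from another vendor's device without device-specific mapping code.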

6.3.2. AI-Driven Coordination and Decision-Making

AI plays a pivotal role in orchestrating multi-agent cooperation under IEC 61850. By analyzing system-wide telemetry and status signals, AI algorithms can identify potential conflicts or synergies among agents. For instance, a traction substation agent anticipating overload conditions might request a neighboring substation agent to reroute power flows. AI-driven coordination ensures that these negotiations happen quickly and optimally, considering constraints like safety margins, service priorities, or economic dispatch rules.
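The overload example can be sketched as a minimal request-and-grant exchange between substation agents. Direct method calls stand in for the IEC 61850 messaging, and the capacities, loads, 10% reserve margin, and agent names are assumed values; a real negotiation would also weigh service priorities and dispatch rules.

```python
from dataclasses import dataclass

@dataclass
class SubstationAgent:
    name: str
    capacity_mw: float
    load_mw: float
    margin: float = 0.1  # fraction of capacity kept in reserve (assumed)

    def limit(self):
        """Maximum load the agent accepts once the safety margin is reserved."""
        return self.capacity_mw * (1.0 - self.margin)

    def headroom(self):
        return max(0.0, self.limit() - self.load_mw)

    def accept(self, requested_mw):
        """Grant as much of the request as local headroom allows."""
        granted = min(requested_mw, self.headroom())
        self.load_mw += granted
        return granted

def request_relief(requester, forecast_mw, neighbors):
    """Ask neighbors, largest headroom first, to absorb the forecast overload."""
    excess = max(0.0, forecast_mw - requester.limit())
    transfers = {}
    for neighbor in sorted(neighbors, key=lambda a: a.headroom(), reverse=True):
        if excess <= 1e-9:
            break
        granted = neighbor.accept(excess)
        if granted > 0:
            transfers[neighbor.name] = granted
            excess -= granted
    return transfers, excess

a = SubstationAgent("TSS-A", capacity_mw=10.0, load_mw=8.0)
b = SubstationAgent("TSS-B", capacity_mw=10.0, load_mw=5.0)
c = SubstationAgent("TSS-C", capacity_mw=10.0, load_mw=8.5)
transfers, unserved = request_relief(a, forecast_mw=11.0, neighbors=[b, c])
print(transfers, unserved)
```

Each agent decides locally how much it can accept, so the requester never needs a full model of its neighbors, only their replies.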
Moreover, advanced AI algorithms—such as Graph Neural Networks [141] or complex optimization solvers—can leverage the structured data from IEC 61850 to model the relationships among power system components. Agents can then perform local computations while concurrently feeding results into a global optimization layer, leading to emergent, system-wide intelligence.

6.3.3. Interoperability and Standardization Benefits

A key advantage of integrating AI with MASs under IEC 61850 is interoperability. Legacy subway systems often use proprietary communication protocols, leading to device incompatibility and vendor lock-in. IEC 61850 breaks down these barriers, enabling cross-vendor and cross-application integration. From the perspective of AI, having a standardized data schema enhances the portability and scalability of models, as consistent data structures streamline data ingestion and model training processes.
As a result, operators can incorporate new AI-driven applications—like advanced fault analytics or dynamic reconfiguration—without overhauling existing hardware. The MAS approach further compartmentalizes tasks, so if a new AI module is added, only the relevant agents need updating, preserving overall system stability.

6.3.4. Potential Obstacles and Evolution

The integration of AI and MASs under IEC 61850 faces several potential obstacles:
(1) Communication Latency: While IEC 61850 supports high-speed messaging, real-time AI inference may still demand edge computing infrastructure to avoid round-trip delays.
(2) Cybersecurity: The standard’s emphasis on connectivity raises cybersecurity concerns. AI modules and MAS agents could become targets of sophisticated cyberattacks, necessitating robust encryption, authentication, and intrusion detection schemes.
(3) Complexity of Agent Interactions: As the number of agents grows, orchestrating their interactions can become unwieldy. AI-based supervision layers must handle negotiation protocols, conflict resolution, and consistency checks.
(4) Operational Validation: Formal verification and testing of AI-driven MAS solutions remain challenging, given the high-stakes nature of subway operations.
In the future, ongoing standardization efforts for advanced functionalities—like the IEC 61850 extensions for distributed energy resources—could incorporate guidelines specific to AI integration. Initiatives aimed at real-time digital twins and 5G communication might also enhance the synergy between MASs and IEC 61850, broadening the horizon for intelligent subway power systems. Based on this, Table 21 enumerates key dimensions of integrating AI, MASs, and IEC 61850. Each row highlights how agent-based architectures benefit from standardized communication and AI-driven analytics, describing both advantages (e.g., swift fault isolation and improved energy routing) and obstacles (e.g., vendor constraints and cybersecurity risks). Notably, the long-term potential for each dimension tends to be high or very high, underscoring the transformative power of this integration.
In closing, merging AI with MASs under the IEC 61850 umbrella paves the way for a more coordinated, interoperable, and adaptive subway power ecosystem. By distributing intelligence across multiple agents, harnessing standard communication protocols, and leveraging AI for data-driven decisions, subway systems can evolve toward more agile and resilient operations. Future developments will likely emphasize refined cybersecurity policies, real-time digital twins, and advanced scheduling algorithms, collectively forming the backbone of next-generation intelligent rail networks.

6.4. Cybersecurity, Privacy, and Data Management for AI-Driven Subway Power Systems

With the integration of AI, IoT devices, and multi-agent architectures, the volume and sensitivity of data in subway power systems are growing at an unprecedented rate. While these data streams fuel advanced analytics and self-healing capabilities, they also introduce heightened risks related to cybersecurity, data privacy, and governance. A robust data management framework is thus imperative to ensure reliable operations and maintain public trust.
To safeguard AI-driven systems, cutting-edge technologies such as blockchain for secure data logging and federated learning for data privacy are being explored. These technologies, proven in fields such as digital finance and healthcare, allow for secure data sharing and model training while ensuring data privacy. Blockchain provides a transparent and immutable ledger, ensuring that operational data remain tamper-proof and secure, particularly in the event of cyberattacks. Additionally, federated learning enables AI models to be trained locally, reducing data exposure risks and allowing for privacy-preserving data analytics, critical for managing sensitive infrastructure data.
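The federated learning idea can be sketched with a federated-averaging loop over two hypothetical substations. Each site fits a one-dimensional linear model on its own data and shares only the resulting parameters, which are averaged in proportion to sample counts. The data (generated from y = 2x), the learning rate, and the round count are assumptions chosen for the example.

```python
def local_update(weights, samples, lr=0.1):
    """One stochastic-gradient pass of the model y = w*x + b over local data."""
    w, b = weights
    for x, y in samples:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return (w, b)

def federated_average(updates):
    """Average (weights, n_samples) pairs, weighting each site by its sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return tuple(sum(wts[i] * n for wts, n in updates) / total for i in range(dim))

def federated_round(global_weights, site_datasets):
    """Each site trains locally; only parameters travel to the aggregator."""
    updates = [(local_update(global_weights, data), len(data)) for data in site_datasets]
    return federated_average(updates)

# Hypothetical local datasets; the raw samples never leave their site.
site_a = [(-1.0, -2.0), (1.0, 2.0)]
site_b = [(2.0, 4.0), (-2.0, -4.0), (0.0, 0.0)]

weights = (0.0, 0.0)
for _ in range(200):
    weights = federated_round(weights, [site_a, site_b])
print(weights)
```

Only the two parameters cross the network in each round, which is the data-minimization property the text attributes to federated learning. Parameter updates can still leak information in some settings, so this is not a complete privacy guarantee on its own.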

6.4.1. Cyber Threat Landscape

AI-driven subway power systems, connected through IEC 61850 or other communication protocols, present an attractive target for cybercriminals or malicious state actors. Potential attacks include the following:
(1) Data Poisoning: Manipulating training datasets to degrade AI model performance or trigger erroneous system decisions.
(2) Ransomware: Encrypting critical operational data and demanding payment to restore access, thus threatening system continuity.
(3) Denial of Service (DoS): Flooding communication channels with spurious traffic, hindering real-time control signals.
(4) Sensor Spoofing: Feeding corrupted sensor data into AI models, leading to incorrect fault diagnoses or false alarms.
In the context of a critical infrastructure like a subway system, even minor disruptions can have severe social, economic, and safety repercussions. Consequently, cybersecurity must become an integral component of AI system design, not an afterthought.

6.4.2. Data Privacy and Ethics

Beyond technical vulnerabilities, subway operators and city authorities must navigate privacy concerns. AI systems might aggregate detailed operational data, video feeds, passenger flow metrics, or location patterns. While these data points are essential for predictive maintenance or advanced analytics, storing and analyzing them also raise ethical questions. For instance, if camera feeds are used to measure passenger loads to predict electrical demand, they might inadvertently collect personally identifying information. Ensuring compliance with data protection laws (such as the General Data Protection Regulation, GDPR, in the European context) is critical for maintaining public trust.
Operators should consider adopting privacy-preserving AI techniques, including differential privacy or secure multiparty computation, which allow for collaborative model training or data analysis without exposing sensitive details. Implementing role-based data access controls and robust de-identification procedures further reduces the risk of misuse or accidental disclosure.
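As one concrete instance of differential privacy, the sketch below releases a passenger count through the Laplace mechanism, adding noise scaled to sensitivity divided by epsilon. The count, the epsilon value, and the function names are assumptions for illustration; choosing epsilon for a real deployment is a policy decision outside the scope of this example.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) noise, drawn as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count under epsilon-differential privacy (Laplace mechanism).
    Sensitivity 1 assumes one passenger changes the count by at most one."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)
released = [private_count(100, epsilon=0.5, rng=rng) for _ in range(20000)]
average = sum(released) / len(released)
print(round(average, 1))
```

Individual releases are noisy, but the aggregate remains useful for load forecasting: the average of many releases stays close to the true count while any single passenger's presence is masked.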

6.4.3. Comprehensive Data Governance

Data governance encompasses the policies, processes, and technical measures necessary for responsible data handling. A holistic framework would include the following:
(1) Data Ownership: Defining clear ownership structures for sensor data, operational logs, and passenger metrics, potentially involving multiple stakeholders (public transport authorities, private operators, and technology vendors).
(2) Data Lifecycle Management: Establishing guidelines for data collection, storage, access, retention, and deletion, and ensuring that data archiving practices meet both regulatory requirements and system operational needs.
(3) Metadata and Standardization: Maintaining standardized metadata to enhance data discoverability and interoperability, vital for multi-agent systems reliant on consistent data schemas.
(4) Quality Assurance: Integrating data validation protocols and anomaly detection to safeguard against corrupted or incomplete data inputs that could compromise AI-driven decisions.

6.4.4. Strategies for Resilience and Compliance

To fortify cybersecurity and privacy in an AI-driven environment, operators must adopt a layered security approach, including encryption, anomaly detection, intrusion detection systems (IDSs), and zero-trust architectures. Specific measures may involve the following:
(1) Secure AI Pipelines: Implementing code-signing, containerization, and version control to prevent tampering with ML models or inference services.
(2) Federated Learning: Training AI models locally on devices or substations and then aggregating only model parameters. This approach minimizes data movement and reduces exposure risks.
(3) Incident Response and Recovery: Developing well-rehearsed contingency plans that detail how to isolate compromised systems, restore operational data, and communicate effectively with stakeholders.
(4) Certification and Audits: Conducting regular third-party audits and penetration testing to validate the integrity of software components and to ensure ongoing compliance with evolving regulatory standards.
Here, Table 22 provides a structured overview of cybersecurity and data governance dimensions in AI-driven subway power systems. Each row pinpoints the major threats, required security measures, privacy implications, and governance challenges. Notably, risk levels vary from medium to very high, underscoring the criticality of robust security frameworks. The table also emphasizes that future technological developments—from zero-trust network architectures to advanced privacy-preserving analytics—will heavily influence how effectively operators can safeguard these systems.
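Among the measures listed above, the secure AI pipeline item lends itself to a short illustration: verifying that a model artifact has not been altered before it is loaded. The sketch uses a keyed hash (HMAC-SHA256) as a simplified stand-in for code-signing, which in practice would normally rely on asymmetric signatures and managed keys. The key and model bytes here are placeholders.

```python
import hashlib
import hmac

def sign_model(model_bytes, key):
    """Compute an HMAC-SHA256 tag over the serialized model."""
    return hmac.new(key, model_bytes, hashlib.sha256).hexdigest()

def verify_model(model_bytes, key, expected_tag):
    """Return True only if the artifact matches the tag recorded at release time."""
    return hmac.compare_digest(sign_model(model_bytes, key), expected_tag)

# Placeholder key and artifact; a real key would come from a secrets manager.
key = b"demo-key-not-for-production"
released_model = b"serialized-model-weights-v1"
tag = sign_model(released_model, key)

print(verify_model(released_model, key, tag))
print(verify_model(released_model + b"-tampered", key, tag))
```

An inference service that refuses to load artifacts failing this check closes one avenue for the model tampering described in the threat list.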
In sum, as AI and big data analytics become ubiquitous in subway power systems, cybersecurity, privacy, and data management must be addressed comprehensively. Achieving a holistic solution involves aligning technical safeguards, organizational protocols, and regulatory mandates. While the challenges are significant, so are the rewards: with well-managed data and secure AI pipelines, subway power networks can harness the full promise of advanced analytics without compromising safety or public trust.

6.5. Next-Generation Operational Strategies and Socio-Economic Implications

Beyond the immediate technical benefits of AI in monitoring, diagnosis, and self-healing, these technologies also herald broader changes in operational strategies and socio-economic landscapes. By reducing maintenance costs, improving reliability, and enabling flexible energy management, AI-driven subway power systems are poised to influence workforce development, system financing, and urban planning in meaningful ways.
AI technologies are expected to foster a shift in workforce skills towards data science, cybersecurity, and AI model interpretation, as traditional roles in power engineering evolve. This transition reflects broader trends seen in industries adopting AI for system optimization and automation. Moreover, AI-enhanced decision-making is likely to introduce new policy and regulatory frameworks, particularly in relation to the management of public transportation infrastructure and energy resources. As cities adopt AI-driven subway systems, there is an opportunity to enhance collaboration between technology developers, transit authorities, and urban planners to create more sustainable and resilient urban environments.

6.5.1. Evolving Role of the Workforce

(1)
As AI-driven analytics and semi-autonomous systems take on routine tasks—such as fault detection or reconfiguration—human roles are likely to shift toward oversight, strategic decision-making, and specialized technical functions.
(2)
Upskilling and Reskilling: Engineers and technicians will need new skill sets, bridging power engineering with data science, cybersecurity, and AI model interpretation.
(3)
Collaborative Decision-Making: Operators will collaborate more closely with AI recommendations, requiring training in human–machine interfaces and explainable AI solutions to bolster trust and accountability.
(4)
New Roles: AI ethicists, data stewards, and cybersecurity specialists will emerge as essential staff for managing the complex socio-technical ecosystem.
In this context, it is imperative for subway authorities and vocational institutions to align educational programs with these new requirements, fostering a workforce that can manage and continually refine AI-enabled subway power systems.

6.5.2. Financial and Economic Dimensions

AI-driven efficiency gains—such as reduced unplanned downtime, lower energy losses, and improved asset utilization—can translate into substantial cost savings. These savings can be redirected toward infrastructure upgrades or used to reduce the fiscal burden on local governments. Additionally, more reliable services may boost ridership, generating indirect economic benefits for the city (e.g., increased retail sales near stations and improved labor mobility).
However, the initial investment required for AI tools, sensor networks, and data infrastructures can be considerable. This may prompt public–private partnerships or alternative financing models to share risks and rewards among multiple stakeholders. Over time, data generated by these systems might even be monetized (e.g., through analysis services for third parties), creating new revenue streams. While this can strengthen financial sustainability, it also necessitates robust governance frameworks to ensure data privacy and equitable value distribution.

6.5.3. Urban Planning and Sustainable Development

Intelligent subway power systems can play a critical role in shaping sustainable urban growth. By optimizing energy consumption and integrating with other urban infrastructure—such as electric vehicle (EV) charging networks or district heating—subway systems can support a more holistic approach to urban energy management. For example, advanced forecasting of passenger flows, combined with dynamic power distribution, can reduce peak loads on city grids. This synergy fosters better urban planning, reduced carbon emissions, and improved overall quality of life for residents.
Moreover, AI-driven fault detection and rapid incident response can bolster public perception of subways as safe and reliable. In turn, cities may be more inclined to expand rail transit networks, encouraging a modal shift away from cars and thereby reducing congestion and air pollution.

6.5.4. Policy and Regulatory Considerations

Governments and regulatory bodies will need to modernize policies to keep pace with AI’s rapid integration. Potential areas of focus include the following:
(1)
Standardization: Expanding IEC 61850 or similar standards to cover next-generation AI requirements (e.g., real-time data streaming and advanced analytics models).
(2)
Safety and Liability: Clarifying who is responsible when AI-driven systems make decisions that lead to incidents—particularly if they deviate from conventional operator guidelines.
(3)
Incentive Structures: Providing tax breaks, grants, or other incentives for subway operators investing in advanced AI technologies, especially if these innovations yield public benefits such as reduced CO₂ emissions or improved accessibility.
A forward-looking regulatory environment will ensure that AI enhancements align with public interest objectives, balancing innovation with risk management and equity considerations. For example, Refs. [142,143] underscore the crucial intersection of AI regulation, public interest, and risk management in ensuring the responsible and equitable deployment of artificial intelligence. Concretely, Alex-Omiogbemi et al. (2024) [142] present a framework for enhancing regulatory compliance and mitigating risks in emerging markets through digital innovations, illustrating how policy frameworks can support the responsible use of AI. Furthermore, Wang and Wu (2024) [143] address the need to strike a balance between fostering AI-driven innovation and maintaining robust regulatory oversight, highlighting the ethical and social implications of generative AI technologies. Together, these works contribute to the growing discourse on ensuring that AI development serves societal well-being while managing associated risks effectively.
Based on this, Table 23 outlines key socio-economic and operational considerations that arise from deploying AI in subway power systems. Gains in reliability and cost-effectiveness can translate into broader economic and environmental benefits, but they also introduce transitions in labor markets, regulatory frameworks, and urban planning. Each dimension involves interplay between technical innovations and societal factors, underscoring the necessity for multidisciplinary collaboration.
Overall, AI-driven subway power systems have the potential to redefine how metropolitan regions plan, finance, and operate their mass transit infrastructures. Policymakers, industry leaders, and community stakeholders should collaborate to craft strategies that maximize public benefit, minimize negative externalities, and ensure equitable access to these transformative technologies. By doing so, cities worldwide can harness AI’s power to create cleaner, safer, and more efficient transportation systems that support sustainable growth for generations to come.
Through the five sections above, we have examined the comprehensive implications of AI technologies for future subway power systems. Beginning with the role of AI in fault diagnosis and prognostics, we moved to reinforcement learning applications in self-healing and then explored the synergy of AI, MASs, and IEC 61850 for interoperable infrastructures. Subsequently, we addressed critical issues in cybersecurity, privacy, and data management and finally evaluated the broader socio-economic and operational transformations likely to emerge. This holistic coverage underscores not only the technical possibilities of AI-driven subway power systems but also the regulatory, workforce, and societal shifts required to bring about a truly intelligent, secure, and sustainable urban rail future.

6.6. Potential Security Flaws in AI-Driven Subway Power Systems and Mitigation Strategies

As subway power systems increasingly adopt AI technologies and MASs for self-healing, fault detection, and optimization, they become more vulnerable to cybersecurity threats. The integration of real-time data streams, edge computing, and AI-driven analytics introduces significant risks related to data integrity, system privacy, and overall network security. This section discusses the potential security flaws associated with AI-enhanced subway power systems and provides comprehensive strategies to mitigate these risks, ensuring the robustness, resilience, and safety of urban rail infrastructures.

6.6.1. Key Security Threats in AI-Driven Subway Power Systems

AI-enabled subway power systems, especially those incorporating machine learning (ML), deep learning (DL), and reinforcement learning (RL), increase both the complexity and attack surface of the system. The following are the primary security vulnerabilities identified in such systems:
1. Data Poisoning Attacks
AI models rely heavily on large datasets for training and decision-making. Data poisoning occurs when attackers intentionally manipulate training datasets to degrade the performance of the AI model, leading to incorrect predictions and compromised decision-making processes.
2. Sensor Spoofing
AI-driven systems depend on real-time sensor data to make decisions. In sensor spoofing, malicious actors manipulate sensor outputs—such as voltage, current, and temperature readings—to create false information that can trigger inappropriate actions by the system, such as misidentifying faults or failing to isolate them properly.
3. Ransomware and Denial of Service (DoS) Attacks
Ransomware attacks target critical infrastructure systems, encrypting operational data and demanding payment for restoring access. In DoS attacks, attackers flood communication channels, preventing timely data exchange between system components, potentially leading to failures in real-time control and communication.
4. Unauthorized System Access
AI-enabled subway systems are susceptible to unauthorized access, especially when communication protocols such as IEC 61850 are deployed without adequate authentication and access control. Attackers could exploit vulnerabilities in these communication protocols to gain control over the system, affecting the decision-making capabilities of MAS- and AI-driven controllers.
5. Communication Latency and Spoofing
AI algorithms, especially those based on reinforcement learning, rely on real-time data for decision-making. Any latency or interruption in communication, whether through network failures or malicious interference, can degrade the performance of the self-healing system and delay fault isolation and recovery.

6.6.2. Mitigation Strategies for Enhancing Security

To safeguard AI-driven subway power systems, a multi-layered security approach is essential. The following mitigation strategies are proposed:
1. Advanced Encryption and Secure Communication Protocols
To prevent unauthorized access and data breaches, all communication between the system’s components, including sensors, agents, and controllers, should be encrypted using well-established mechanisms such as the Advanced Encryption Standard (AES) cipher and the Transport Layer Security (TLS) protocol. Because IEC 61850 does not itself define these protections, its message exchanges should be secured with companion security standards for critical infrastructure, such as IEC 62351, to ensure data integrity and confidentiality.
2. Intrusion Detection and Prevention Systems (IDSs/IPSs)
Implementing IDSs/IPSs can help detect and prevent malicious activities such as sensor spoofing, unauthorized access, and data tampering. These systems monitor the network for unusual activities and trigger alerts for potential security threats, allowing operators to take timely action.
3. Federated Learning for Decentralized AI Models
Federated learning allows for the training of AI models without exposing sensitive data to centralized systems. By keeping data locally at each station or substation and only aggregating model updates, federated learning mitigates the risks associated with data breaches and ensures that privacy concerns are addressed while maintaining AI model performance.
4. Data Validation and Integrity Checks
Ensuring the integrity of the data fed into AI models is critical for maintaining accurate predictions. AI models should be equipped with built-in data validation checks to identify inconsistencies or anomalies in real-time data. Moreover, regular audits and updates of sensor calibration and maintenance schedules are necessary to ensure the continued accuracy of the system.
5. AI-Powered Anomaly Detection and Secure AI Pipelines
AI models should be trained to recognize and alert the system when abnormal patterns—indicative of attacks such as data poisoning—are detected. Additionally, secure AI pipelines, where each model update or decision is validated and signed by a trusted authority, can protect against tampering and unauthorized changes.
6. Backup and Recovery Mechanisms
To mitigate the risks of ransomware and DoS attacks, subway power systems must implement robust backup and recovery mechanisms. Regular backups of system configurations, AI model parameters, and critical operational data should be maintained, and recovery plans should be established to restore system functionality swiftly in case of an attack.
7. Zero-Trust Security Architecture
The zero-trust model assumes that no device or user is inherently trustworthy, whether inside or outside the network. In the context of subway power systems, this approach would involve strict access control policies, continuous monitoring of all system interactions, and multi-factor authentication (MFA) for all users and devices.
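Item 5 in the list above calls for model updates to be validated and signed before use. The sketch below illustrates the idea with Python's standard `hmac` and `hashlib` modules. It is a simplification under stated assumptions: a shared-secret HMAC stands in for signing by a trusted authority (a production pipeline would more likely use asymmetric signatures and managed keys), and the key, field names, and function names are hypothetical.

```python
import hashlib
import hmac
import json


def canonical_bytes(update: dict) -> bytes:
    """Serialize the update deterministically so signer and verifier agree."""
    return json.dumps(update, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_update(update: dict, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the serialized model update."""
    return hmac.new(key, canonical_bytes(update), hashlib.sha256).hexdigest()


def verify_update(update: dict, signature: str, key: bytes) -> bool:
    """Accept the update only if its tag matches (constant-time comparison)."""
    expected = sign_update(update, key)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    key = b"demo-key-not-for-production"  # hypothetical shared secret
    update = {"model": "fault-classifier", "version": 7,
              "weights": [0.12, -0.5, 1.3]}
    tag = sign_update(update, key)

    print(verify_update(update, tag, key))    # True: untouched update

    tampered = dict(update, weights=[0.12, -0.5, 9.9])
    print(verify_update(tampered, tag, key))  # False: weights were altered
```

A controller that refuses to load any update failing `verify_update` would reject parameters altered in transit or at rest, which is the tampering scenario the text describes.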

6.6.3. Summary of the Key Security Threats in AI-Driven Subway Power Systems

Based on the above, Table 24 summarizes the key security threats in AI-driven subway power systems, along with the corresponding mitigation strategies and their implementation priorities. It covers data poisoning, sensor spoofing, ransomware, unauthorized access, and communication latency, each of which can compromise system performance and safety. For each threat, the table lists specific countermeasures, including secure data pipelines, anomaly detection, encryption, multi-factor authentication, and edge computing, together with the implementation complexity of each. Notably, data poisoning and ransomware are treated as high-priority threats because of their potential to disrupt system functionality, while anomaly detection and secure AI pipelines are highlighted as essential for maintaining data integrity and system resilience. The table underscores the importance of a multi-layered, adaptive security approach that addresses both technical and operational vulnerabilities. In our view, these mitigation strategies should be implemented incrementally, with particular emphasis on continuous monitoring and system updates to stay ahead of emerging threats in smart transportation infrastructure.
Overall, the integration of AI and MASs in subway power systems significantly enhances system efficiency and self-healing capabilities but also introduces new security risks that must be addressed comprehensively. The proposed security measures, including advanced encryption, intrusion detection systems, federated learning, and AI-powered anomaly detection, are critical in safeguarding subway power systems from malicious threats. Ensuring that these systems are robust against potential cyberattacks will require continuous monitoring, adaptive security measures, and ongoing research to stay ahead of emerging threats.
By prioritizing cybersecurity and data integrity, operators can safeguard the reliability and safety of AI-driven subway power systems, thus fostering a secure and resilient urban transportation infrastructure that can meet the demands of future smart cities. As AI and cybersecurity technologies evolve, these systems will need to be periodically reassessed and upgraded to maintain their effectiveness in protecting public infrastructure.

7. Conclusions and Policy Implications

7.1. Conclusions

The research presented in this review highlights groundbreaking advancements in enhancing the self-healing capabilities of subway power supply systems, with a particular focus on the integration of MASs and AI algorithms. As a critical component of urban rail transit, the reliability and safety of the subway power supply are paramount, and traditional manual interventions for fault diagnosis and recovery have become insufficient to meet the increasingly complex demands of modern urban transportation systems. This paper has explored the evolving concept of self-healing technology, which has found successful applications in power grids and distribution networks, and it has demonstrated how these technologies can revolutionize subway power supply systems.
The integration of MASs and the IEC 61850 standard offers a novel approach to building an autonomous, adaptive, and intelligent self-healing control framework. By leveraging the strengths of MASs in decentralized control, coordination, and decision-making, subway power systems can respond dynamically to faults in ways that minimize the impact on service continuity and operational safety. The IEC 61850 standard, a globally recognized communication protocol for power systems, provides the interoperability and flexibility needed to implement these complex, decentralized self-healing mechanisms effectively. This hybrid model can not only enhance the reliability of subway power systems but also lay the foundation for more robust and scalable self-healing systems in urban rail infrastructure.
This review has also demonstrated that MASs combined with AI-driven fault diagnosis algorithms can drastically improve the speed, accuracy, and efficiency of fault detection, analysis, isolation, and recovery. Specifically, AI algorithms have the capacity to handle complex, multi-fault scenarios that may overwhelm traditional control methods. Furthermore, through advanced machine learning techniques, the system can continuously learn and adapt to new fault patterns, improving efficiency over time, which distinguishes this approach from conventional methods.
One of the most promising findings is the application of hybrid architectures that combine MASs with the IEC 61850 framework to support critical functions such as fault localization, isolation, and recovery in subway power systems. These hybrid architectures facilitate seamless communication between various subsystems, enabling a holistic view of the system’s health, making them ideal for real-time fault management in complex, large-scale networks like those of modern subway systems. This innovative integration is presented as a unique contribution, offering greater resilience and adaptability in fault management than existing technologies.
The potential for intelligent fault recovery strategies, supported by AI, is also highlighted in this review. These strategies, capable of quick adaptation to various fault conditions, can drastically reduce the time required for recovery, thereby improving the overall reliability of subway operations. Through continuous monitoring and real-time decision-making, AI-based recovery systems enhance the ability of subway power supply systems to self-heal, ensuring operational resilience even in the face of unforeseen challenges. This ability to adapt in real time, without requiring manual intervention, marks a significant shift in how subway systems handle faults.
In conclusion, the research presented in this review indicates that self-healing technologies, underpinned by MASs and AI, represent a crucial evolution in the design and operation of subway power systems. By reducing the need for manual intervention, enhancing fault detection and recovery processes, and improving system efficiency, these technologies will play a key role in shaping the future of urban transportation infrastructure. The integration of these innovative technologies not only holds promise for improving the resilience and performance of subway power supply systems but also sets the stage for broader applications of self-healing technologies in other critical infrastructure systems, marking a significant contribution to the ongoing intelligent infrastructure revolution.

7.2. Policy Implications

The findings of this research highlight several critical policy implications for the advancement and deployment of self-healing technologies in subway power systems, particularly those driven by MASs and AI. As subway systems worldwide face increasing demands for reliability, efficiency, and automation, the adoption of self-healing technologies is becoming an essential step toward achieving these goals. To fully realize the potential of MASs and AI in self-healing subway power systems, policymakers will need to consider a range of strategic actions and regulatory frameworks to facilitate the integration of these advanced technologies.
First and foremost, policymakers must recognize the need for substantial investment in research and development (R&D) to continue advancing the capabilities of MASs and AI algorithms in self-healing applications. While promising, the implementation of these technologies in subway power systems requires overcoming technical challenges, including data acquisition, system integration, and real-time decision-making capabilities. Governments and industry stakeholders should collaborate to fund R&D initiatives that focus on refining the algorithms, improving system interoperability, and testing the performance of these technologies in real-world environments. Public–private partnerships can play a crucial role in accelerating the development and deployment of these innovations.
Another significant policy consideration is the establishment of regulatory standards that ensure the compatibility and interoperability of self-healing systems across different subway networks and urban environments. The IEC 61850 standard, which has already been proposed as a framework for integration, is a step in the right direction. However, to facilitate the widespread adoption of self-healing technologies, policymakers must ensure that these standards are continuously updated to reflect the rapid advancements in AI and MASs. This includes promoting global standardization efforts to ensure that subway power systems in different regions can communicate seamlessly, share data, and collaborate in real time.
Furthermore, policymakers must address the training and upskilling of the workforce to manage and maintain the advanced self-healing systems. As AI and MASs take a more prominent role in subway power system management, there will be a need for a skilled workforce capable of operating and troubleshooting these complex systems. Educational institutions, in collaboration with industry experts, should develop specialized training programs to equip engineers, operators, and maintenance personnel with the necessary knowledge and skills. Additionally, governments can incentivize workforce development through grants, scholarships, and industry partnerships.
In addition to technological and workforce considerations, policymakers should ensure that the implementation of self-healing technologies aligns with broader sustainability and resilience goals. Subway systems, which depend on a reliable power supply, play a key role in reducing urban congestion and greenhouse gas emissions. By integrating self-healing technologies, their power supply systems can become more energy-efficient, reducing the overall environmental footprint of urban transportation infrastructure. Policies promoting the adoption of green technologies and a reduction in carbon emissions in subway networks will further incentivize the integration of advanced self-healing solutions.
Finally, the policy landscape should encourage the collection and sharing of data for ongoing performance analysis. Self-healing systems require continuous data input for machine learning algorithms to adapt and optimize. Therefore, policies promoting data transparency, privacy, and security will be critical to ensuring the safe and efficient operation of self-healing technologies. Regulations must balance the need for open data sharing with the protection of sensitive information, particularly regarding the security of critical infrastructure.
In conclusion, the successful deployment of self-healing technologies in subway power supply systems requires comprehensive policy support that encompasses investment in R&D, the establishment of standards, workforce development, sustainability considerations, and data governance. Policymakers must take proactive steps to create an enabling environment for these innovations to thrive, ensuring that subway systems are not only more resilient and efficient but also more adaptable to future challenges. Through targeted policy initiatives, governments can play a vital role in shaping the future of urban transportation infrastructure and ensuring its continued evolution in an increasingly intelligent, autonomous, and sustainable direction.

Author Contributions

Conceptualization, J.F., T.Y., K.Z. and L.C.; methodology, J.F., T.Y., K.Z. and L.C.; formal analysis, J.F., K.Z. and L.C.; investigation, J.F., T.Y., K.Z. and L.C.; resources, J.F., T.Y., K.Z. and L.C.; data curation, J.F., T.Y., K.Z. and L.C.; writing—original draft preparation, J.F., T.Y., K.Z. and L.C.; writing—review and editing, J.F., T.Y., K.Z. and L.C.; visualization, J.F., T.Y., K.Z. and L.C.; supervision, L.C.; project administration, L.C.; funding acquisition, L.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the Guangzhou Education Bureau University Research Project - Graduate Research Project, grant number 2024312278 (funder: L.C.), and in part by the STU Scientific Research Initiation Grant (SRIG), grant number STF23021 (funder: K.Z.).

Data Availability Statement

No new data were created or analyzed in this study.

Acknowledgments

We sincerely thank the associate editor and invited anonymous reviewers for their kind and helpful comments on our paper.

Conflicts of Interest

Author Jianbing Feng was employed by the company Guangzhou Metro Construction Management Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

Abbreviation | Full Form
AI | Artificial Intelligence
AC/DC | Alternating Current/Direct Current
CI | Condition-based Inspection
DA/DO | Data Attribute/Data Object
DC | Direct Current
DER | Distributed Energy Resources
DO | Data Object
DOS | Denial of Service
EMS | Energy Management System
FLISR | Fault Location, Isolation, and Service Restoration
GOOSE | Generic Object-Oriented Substation Event
IED | Intelligent Electronic Device
IEC 61850 | International Electrotechnical Commission 61850 Standard
IDS | Intrusion Detection System
IOT | Internet of Things
LN | Logical Node
MAS | Multi-Agent System
MMS | Manufacturing Message Specification
MMXU | Measurement Unit
MTTR | Mean Time to Repair
PDIS | Protection Distance Intelligent System
PMU | Phasor Measurement Unit
PRP | Parallel Redundancy Protocol
PTOC | Protection Overcurrent Unit
RSTP | Rapid Spanning Tree Protocol
SAIDI | System Average Interruption Duration Index
SAIFI | System Average Interruption Frequency Index
SCADA | Supervisory Control and Data Acquisition
SCL | System Configuration Language
VAR | Voltage Amperes Reactive
WAMPAC | Wide-Area Monitoring, Protection, and Control
XML | eXtensible Markup Language

References

  1. China Association of Metros. Annual Report on Statistics and Analysis of Urban Rail Transit 2023. Available online: https://www.camet.org.cn/xytj/tjxx/14894.shtml (accessed on 29 March 2024).
  2. IEC 61850; Communication Networks and Systems in Substations. International Electrotechnical Commission (IEC): Geneva, Switzerland, 2011.
  3. Shang, W.L.; Lv, Z. Low carbon technology for carbon neutrality in sustainable cities: A survey. Sustain. Cities Soc. 2023, 92, 104489.
  4. Wang, L. Study on the Fault Diagnosis and Protection of Energy-Fed Supply System in Urban Mass Transit. Ph.D. Thesis, Beijing Jiaotong University, Beijing, China, 2010.
  5. Du, F. Modeling for Metro Locomotive and Analysis of Fault Condition of DC Traction Power Supply System. Ph.D. Thesis, Beijing Jiaotong University, Beijing, China, 2010.
  6. Zeng, B.; Zhang, J.; Yang, X.; Wang, J.; Dong, J.; Zhang, Y. Integrated planning for transition to low-carbon distribution system with renewable energy generation and demand response. IEEE Trans. Power Syst. 2013, 29, 1153–1165.
  7. Allegretti, G.; Montoya, M.A.; Bertussi, L.A.S.; Talamini, E. When being renewable may not be enough: Typologies of trends in energy and carbon footprint towards sustainable development. Renew. Sustain. Energy Rev. 2022, 168, 112860.
  8. Song, X. Research on Fault Location Methods for City DC Railway Traction System; Beijing Jiaotong University: Beijing, China, 2015.
  9. Qin, B.; Wang, H.; Wang, Z.; Xiong, Z.; Zhao, J.; Lu, H.; Wang, M. Integrated development of urban rail transit and energy systems supported by underground space. Strateg. Study CAE 2023, 25, 45–59.
  10. Serdar, M.Z.; Koç, M.; Al-Ghamdi, S.G. Urban transportation networks resilience: Indicators, disturbances, and assessment methods. Sustain. Cities Soc. 2022, 76, 103452.
  11. Wang, K.K.; Lv, Y. Fault Location Method of Metro DC Traction Power Supply System Catenary. Urban Mass Transit 2022, 7, 222–224, 229.
  12. Wei, R.; Shi, G.; Zhuang, K.; Xia, J. Research on Fault Location of Subway DC Traction Power Supply System Based on GPS Time Synchronization. Mar. Electr. Electron. Eng. 2023, 43, 85–88.
  13. Jin, X.; Li, Z.; Hu, Z. Simulation of Fault Location for Subway DC Power Supply System Based on Time Domain Differential. Mar. Electr. Electron. Eng. 2017, 37, 67–70.
  14. Pei, W. Research on the Reliability of Subway Traction Power Supply Systems; Nanjing University of Science and Technology: Nanjing, China, 2018.
  15. Zhou, J. Research on Online Reliability Evaluation for Traction Power Supply System of Metro Network; Shanghai Jiaotong University: Shanghai, China, 2012.
  16. Sheng, S.; Li, K.K.; Chan, W.L.; Xiangjun, Z.; Xianzhong, D. Agent-based self-healing protection system. IEEE Trans. Power Deliv. 2006, 21, 610–618.
  17. Ji, X.; Jian, L.; Yan, X.; Wang, H. Research on self-healing technology of smart distribution network based on multi-agent system. In Proceedings of the 2016 Chinese Control and Decision Conference (CCDC), Yinchuan, China, 28–30 May 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 6132–6137.
  18. Xiang, G.; Xin, A. The application of self-healing technology in smart grid. In Proceedings of the 2011 Asia-Pacific Power and Energy Engineering Conference, Wuhan, China, 25–28 March 2011.
  19. Zhang, R.; Bie, Z. Distributed cluster-level cooperative control of dynamic virtual microgrid cluster for active distribution network. Autom. Electr. Power Syst. 2022, 46, 55–62.
  20. Zhao, Y.; Rieger, C.; Zhu, Q. Multi-agent learning for resilient distributed control systems. arXiv 2022, arXiv:2208.05060.
  21. Pang, Y.; Lodewijks, G. Agent-based intelligent monitoring in large-scale continuous material transport. In Proceedings of the 2012 9th IEEE International Conference on Networking, Sensing and Control, Beijing, China, 11–14 April 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 79–84.
  22. Mayorov, G.; Stennikov, V.; Barakhtenko, E. Application of the multiagent approach to the research of integrated energy supply systems. In E3S Web of Conferences 2019; EDP Sciences: Les Ulis, France, 2019; Volume 114, p. 01006.
  23. Sujil, A.; Verma, J.; Kumar, R. Multi agent system: Concepts, platforms and applications in power systems. Artif. Intell. Rev. 2018, 49, 153–182.
  24. Yu, H.; Wang, Y.; Chen, Z. A novel renewable microgrid-enabled metro traction power system—Concepts, framework, and operation strategy. IEEE Trans. Transp. Electrif. 2021, 7, 1733–1749.
  25. Saray, M.; Saray, M.; Kazan, C.; Guner, S. Optimization of renewable energy usage in public transportation: Mathematical model for energy management of plug-in PV-based electric metrobuses. J. Energy Storage 2024, 78, 109946.
  26. Kilic, B.; Dursun, E. Integration of innovative photovoltaic technology to the railway trains: A case study for Istanbul airport-M1 light metro line. In Proceedings of the IEEE EUROCON 2017-17th International Conference on Smart Technologies, Ohrid, Macedonia, 6–8 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 336–340.
  27. Kumar, G.M.S.; Cao, S. Leveraging energy flexibilities for enhancing the cost-effectiveness and grid-responsiveness of net-zero-energy metro railway and station systems. Appl. Energy 2023, 333, 120632.
  28. Yu, J.; Wang, J.; Tong, F. Research and analysis of power supply load forecasting and self-healing control in urban rail transit system. In IOP Conference Series: Earth and Environmental Science 2021; IOP Publishing: Bristol, UK, 2021; Volume 769, p. 042093. [Google Scholar]
  29. Zheng, S.; Liu, Y.; Lin, Y.; Wang, Q.; Yang, H.; Chen, B. Bridging strategy for the disruption of metro considering the reliability of transportation system: Metro and conventional bus network. Reliab. Eng. Syst. Saf. 2022, 225, 108585. [Google Scholar] [CrossRef]
  30. Kalyvas, M.; McCracken, A. Doha Metro Novel Building Automation and Control System (BACS). J. Ind. Integr. Manag. 2024, 9, 571–596. [Google Scholar] [CrossRef]
  31. Longo, M.; Bramani, M. The automation control systems for the efficiency of metro transit lines. In Proceedings of the 2015 AEIT International Annual Conference (AEIT), Naples, Italy, 14–16 October 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1–6. [Google Scholar]
  32. National Energy Technology Laboratory. The Modern Grid Initiative; US: Department of Energy: Washington, DC, USA, 2008; pp. 26–30. [Google Scholar]
  33. Liu, C.; Jung, J.; Heydt, G.T.; Vittal, V.; Phadke, A. The Strategic Power Infrastructure Defense (SPID) System: A Conceptual Design. IEEE Control Syst. Mag. 2000, 20, 40–52. [Google Scholar]
  34. Li, T. Research on the Self-Healing Functions of Smart Distribution Grid and Its Benefits Evaluation Model; North China Electric Power University: Beijing, China, 2012. [Google Scholar]
  35. Amin, M. Toward self-healing energy infrastructure systems. IEEE Comput. Appl. Power 2001, 14, 20–28. [Google Scholar] [CrossRef]
  36. Shittu, E.; Tibrewala, A.; Kalla, S.; Wang, X. Meta-analysis of the strategies for self-healing and resilience in power systems. Adv. Appl. Energy 2021, 4, 100036. [Google Scholar] [CrossRef]
  37. Arefifar, S.A.; Alam, M.S.; Hamadi, A. A review on self-healing in modern power distribution systems. J. Mod. Power Syst. Clean Energy 2023, 11, 1719–1733. [Google Scholar] [CrossRef]
  38. Madani, V.; Novosel, D.; Horowitz, S.; Adamiak, M.; Amantegui, J.; Karlsson, D.; Imai, S.; Apostolov, A. IEEE PSRC report on global industry experiences with system integrity protection schemes (SIPS). IEEE Trans. Power Deliv. 2010, 25, 2143–2155. [Google Scholar] [CrossRef]
  39. Maqsood, M.; Masood, A. Integration of Wireless HART and STK600 Development Kit for Data Collection in Wireless Sensor Networks. Master’s Thesis, Universitetet i Agder/University of Agder, Kristiansand, Norway, 2013. [Google Scholar]
  40. Morais, B.T.P. Emerging Technologies and Future Trends in Substation Automation Systems for the Protection, Monitoring and Control of Electrical Substations; PQDT-Global: Ann Arbor, MI, USA, 2013. [Google Scholar]
  41. Majhi, A.A.K.; Mohanty, S. A comprehensive review on Internet of Things applications in power systems. IEEE Internet Things J. 2024, 11, 34896–34923. [Google Scholar] [CrossRef]
  42. Wang, L.; Bo, Z.; Wang, Q.P.; Liu, R.T.; Fan, W. Design of integrated wide area protection and control for power grid. DPI Proc. 2018, 1, 206–214. [Google Scholar] [CrossRef] [PubMed]
  43. Hellman, C.; Aronson, M.; Tom, N.; Quan, W. The microprocessor and the minicomputer for earth terminal and network control. ITC Proc. 1981, 1, 529–548. [Google Scholar]
  44. Terzija, V.; Valverde, G.; Cai, D.; Regulski, P.; Madani, V.; Fitch, J.; Skok, S.; Begovic, M.M.; Phadke, A. Wide-area monitoring, protection, and control of future electric power networks. Proc. IEEE 2010, 99, 80–93. [Google Scholar] [CrossRef]
  45. Rahman, W.U.; Ali, M.; Mehmood, C.A.; Khan, A. Design and implementation for wide area power system monitoring and protection using phasor measuring units. WSEAS Trans. Power Syst. 2013, 8, 57–64. [Google Scholar]
  46. Cheng, L.F.; Yu, T. A new generation of AI: A review and perspective on machine learning technologies applied to smart energy and electric power systems. Int. J. Energy Res. 2019, 43, 1928–1973. [Google Scholar] [CrossRef]
  47. Cheng, L.F.; Wei, X.; Li, M.; Tan, C.; Yin, M.; Shen, T.; Zou, T. Integrating evolutionary game-theoretical methods and deep reinforcement learning for adaptive strategy optimization in user-side electricity markets: A comprehensive review. Mathematics 2024, 12, 3241. [Google Scholar] [CrossRef]
  48. Cheng, L.F.; Yu, T.; Zhang, X.S.; Yang, B. Parallel cyber-physical-social systems based smart energy robotic dispatcher and knowledge automation: Concepts, architectures and challenges. IEEE Intell. Syst. 2019, 34, 54–64. [Google Scholar] [CrossRef]
  49. Nyangon, J. Climate-proofing critical energy infrastructure: Smart grids, artificial intelligence, and machine learning for power system resilience against extreme weather events. J. Infrastruct. Syst. 2024, 30, 03124001. [Google Scholar] [CrossRef]
  50. Ahmad, T.; Madonski, R.; Zhang, D.; Huang, C.; Mujeeb, A. Data-driven probabilistic machine learning in sustainable smart energy/smart energy systems: Key developments, challenges, and future research opportunities in the context of smart grid paradigm. Renew. Sustain. Energy Rev. 2022, 160, 112128. [Google Scholar] [CrossRef]
  51. Dick, K.; Russell, L.; Souley Dosso, Y.; Kwamena, F.; Green, J.R. Deep learning for critical infrastructure resilience. J. Infrastruct. Syst. 2019, 25, 05019003. [Google Scholar] [CrossRef]
  52. Nama, P.; Reddy, P.; Pattanayak, S.K. Artificial Intelligence for Self-Healing Automation Testing Frameworks: Real-Time Fault Prediction and Recovery. Artif. Intell. 2024, 64 (Suppl. S3), 111–141. [Google Scholar]
  53. Plevris, V.; Papazafeiropoulos, G. AI in Structural Health Monitoring for Infrastructure Maintenance and Safety. Infrastructures 2024, 9, 225. [Google Scholar] [CrossRef]
  54. Manoharan, A.; Sarker, M. Revolutionizing Cybersecurity: Unleashing the Power of Artificial Intelligence and Machine Learning for Next-Generation Threat Detection. Int. Res. J. Mod. Eng. Technol. Sci. 2022, 4, 2151–2164. [Google Scholar] [CrossRef]
  55. Fadi, O.; Karim, Z.; Mohammed, B. A survey on blockchain and artificial intelligence technologies for enhancing security and privacy in smart environments. IEEE Access 2022, 10, 93168–93186. [Google Scholar] [CrossRef]
  56. Tooki, O.O.; Popoola, O.M. A critical review on intelligent-based techniques for detection and mitigation of cyberthreats and cascaded failures in cyber-physical power systems. Renew. Energy Focus 2024, 51, 100628. [Google Scholar] [CrossRef]
  57. Ahmad, S. Real-Time Control and Power Management for Interconnected Microgrids with Self-Healing Capability. Ph.D. Thesis, University of Malaya, Kuala Lumpur, Malaysia, 2022. [Google Scholar]
  58. Rath, S.; Nguyen, L.D.; Sahoo, S.; Popovski, P. Self-healing secure blockchain framework in microgrids. IEEE Trans. Smart Grid 2023, 14, 4729–4740. [Google Scholar] [CrossRef]
  59. Watuwa, B. Power Reliability Analysis of DC Traction Power Supply System: A Case Study of Addis Ababa Light Rail Transit; Addis Ababa University: Addis Ababa, Ethiopia, 2019; Available online: https://scholar.googleusercontent.com/scholar?q=cache:Q3mqEIVgZ1kJ:scholar.google.com/&hl=zh-CN&as_sdt=0,5&scioq=Power+Reliability+Analysis+of+DC+Traction+Power+Supply+System:+A+Case+Study+of+Addis+Ababa+Light+Rail+Transit (accessed on 10 March 2025).
  60. Ogunsola, A.; Mariscotti, A. Electromagnetic Compatibility in Railways: Analysis and Management; Springer Science & Business Media, Springer Publishing Company, Incorporated, 1 July 2013; pp. 1–528. Available online: https://books.google.com/books?hl=zh-CN&lr=&id=N5B3S13cPpIC&oi=fnd&pg=PR2&dq=related:YcJun2vRVgIJ:scholar.google.com/&ots=EecduOIHre&sig=teucUbqoCLzfPK-9XlVwRKHvlp4#v=onepage&q&f=false (accessed on 10 March 2025). [CrossRef]
  61. López, D.Á.J.L. Optimising the Electrical Infrastructure of Mass Transit Systems to Improve the Use of Regenerative Braking. Ph.D. Thesis, Universidad Pontificia Comillas, Madrid, Spain, 2016. [Google Scholar]
  62. Parizad, A.; Baghaee, H.R. Overview of smart cyber-physical power systems: Fundamentals, Challenges, and Solutions. Wiley Online Libr. 2025, 1, 157–178. Available online: https://onlinelibrary.wiley.com/doi/abs/10.1002/9781394191529.ch1 (accessed on 10 March 2025).
  63. Castro Gómez, A. Feasibility for the Introduction of Current Limiting Impedance for a Previously Solid Grounded Medium Voltage Distribution Network. Master’s Thesis, Politecnico di Milano, Milan, Italy, 28 April 2017. Available online: https://www.politesi.polimi.it/retrieve/a81cb05c-3dc7-616b-e053-1605fe0a889a/Thesis%20Alex%20Castro.pdf (accessed on 10 March 2025).
  64. Haque, A.; Malik, A.; Shah, N.; Malik, J.A.; Ahmad, R.; Arif, M. Fundamentals of power electronics in smart cities. Taylor Fr. 2024, 1, 77–89. Available online: https://www.taylorfrancis.com/chapters/edit/10.1201/9781032669809-1/fundamentals-power-electronics-smart-cities-ahteshamul-haque-naila-shah-junaid-ahmad-malik-azra-malik (accessed on 10 March 2025).
  65. Iovanovici, A. Designing Low Latency, Fault-Tolerant Sensor Networks Using Complex Networks Analysis. Timişoara: Editura Politehnica. 2015. ISBN 9786065549623, 6065549622. Available online: https://search.worldcat.org/zh-cn/title/1288695767 (accessed on 10 March 2025).
  66. Raghunath, K.; Rengarajan, N. Response time optimization with enhanced fault-tolerant wireless sensor network design for on-board rapid transit applications. Clust. Comput. 2019, 22 (Suppl. 4), 9737–9753. [Google Scholar] [CrossRef]
  67. Kumari, S.; Tyagi, A.K. Wireless sensor networks: An introduction. Digit. Twin Blockchain Smart Cities 2024, 1, 12–22. Available online: https://scholar.google.com/citations?user=RIgaVmUAAAAJ&hl=en&num=20&oi=sra (accessed on 10 March 2025).
  68. Hernandez, J.C.; Sutil, F.S.; Vidal, P.G. Protection of a multiterminal DC compact node feeding electric vehicles on electric railway systems, secondary distribution networks, and PV systems. Turk. J. Electr. Eng. Comput. Sci. 2016, 24, 3123–3143. [Google Scholar] [CrossRef]
  69. Swain, A.; Abdellatif, E.; Mousa, A.; Pong, P.W. Sensor technologies for transmission and distribution systems: A review of the latest developments. Energies 2022, 15, 7339. [Google Scholar] [CrossRef]
  70. Georgilakis, P.S.; Hatziargyriou, N.D. Optimal distributed generation placement in power distribution networks: Models, methods, and future research. IEEE Trans. Power Syst. 2013, 28, 3420–3428. [Google Scholar] [CrossRef]
  71. Muzzammel, R.; Raza, A.; Hussain, M.R.; Abbas, G. MT-HVdc systems fault classification and location methods based on traveling and non-traveling waves—A comprehensive review. Appl. Sci. 2019, 9, 4760. [Google Scholar] [CrossRef]
  72. Hamidi, R.J.; Livani, H. A recursive method for traveling-wave arrival-time detection in power systems. IEEE Trans. Power Deliv. 2018, 33, 1097–1106. [Google Scholar]
  73. Costa, F.B.; Miranda, V.; Leite, H. Wavelet-based analysis and detection of traveling waves due to DC faults in LCC HVDC systems. Int. J. Electr. Power Energy Syst. 2019, 105, 158–165. [Google Scholar]
  74. Esmail, E.M.; Elsadd, M.A.; Elkalashy, N.I. A review: Smart distribution grid management using agents. WSEAS Trans. Power Syst. 2020, 1, 348234782. [Google Scholar] [CrossRef]
  75. Liu, C.; Chen, Z.; Bak, C.L. Multi-agent system based adaptive protection for dispersed generation integrated distribution systems. Trans. Power Syst. 2013, 1, 270506087. [Google Scholar] [CrossRef]
  76. Rahman, M.S.; Muyeen, S.M.; Ghosh, A.; Islam, S.M. Multi-agent systems in ICT enabled smart grid: A status update on technology framework and applications. IEEE Trans. Power Deliv. 2019, 1, 8765552. [Google Scholar]
  77. Alstom. Towards the First Railway Cybersecurity International Standard: Why Standards Are Important to Secure Railways; Alstom: Saint-Ouen-sur-Seine, France, 2024; Available online: https://www.alstom.com/press-releases-news/2024/3/towards-first-railway-cybersecurity-international-standard-why-standards-are-important-secure-railways (accessed on 10 March 2025).
  78. Radiflow. Securing Railway Operations from OT Cyberattacks; Radiflow: Mahwah, NJ, USA, 2024; Available online: https://www.radiflow.com/white-papers/securing-railway-operations-from-ot-cyberattacks/ (accessed on 11 March 2025).
  79. REPLIL. Cybersecurity in Railway Digital Transformation Journey; REPLIL: Dubai, United Arab Emirates, 2024; Available online: https://www.replil.com/cybersecurity-in-railway-digital-transformation-journey/ (accessed on 11 March 2025).
  80. Mbango, F. Investigation into Alternative Protection Solutions for Distribution Networks. Ph.D. Thesis, Cape Peninsula University of Technology, Bellville, TX, USA, 2009. Available online: https://core.ac.uk/download/pdf/148365012.pdf (accessed on 11 March 2025).
  81. Dutta Pramanik, P.; Upadhyaya, B.; Kushwaha, A.; Bhowmik, D. Harnessing IoT: Transforming Smart Grid Advancements. In IoT for Smart Grid: Revolutionizing Electrical Engineering 2025, Chapter 7, 127–174; Wiley Online Library: Hoboken, NJ, USA, 2025; Available online: https://onlinelibrary.wiley.com/doi/abs/10.1002/9781394279401.ch7 (accessed on 11 March 2025).
  82. Pandiyan, P.; Saravanan, S.; Kannadasan, R.; Krishnaveni, S.; Alsharif, M.; Kim, M. A comprehensive review of advancements in green IoT for smart grids: Paving the path to sustainability. Energy Rep. 2024, 11, 5504–5531. [Google Scholar] [CrossRef]
  83. Baroud, S.Y.; Yahaya, N.A.; Elzamly, A.M. Cutting-Edge AI Approaches with MAS for PdM in Industry 4.0: Challenges and Future Directions. J. Appl. Data Sci. 2024, 5, 455–473. [Google Scholar] [CrossRef]
  84. Chouhan, S.; Mohammadi, F.D.; Feliachi, A.; Solanki, J.M.; Choudhry, M.A. Hybrid MAS Fault Location, Isolation, and Restoration for Smart Distribution System with Microgrids. In Proceedings of the 2016 IEEE Power and Energy Society General Meeting (PESGM), Boston, MA, USA, 17–21 July 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–5. [Google Scholar]
  85. Han, Y.; Zhang, K.; Li, H.; Coelho, E.A.A.; Guerrero, J.M. MAS-Based Distributed Coordinated Control and Optimization in Microgrid and Microgrid Clusters: A Comprehensive Overview. IEEE Trans. Power Electron. 2017, 33, 6488–6508. [Google Scholar] [CrossRef]
  86. Hua, H.; Li, Y.; Wang, T.; Dong, N.; Li, W.; Cao, J. Edge Computing with Artificial Intelligence: A Machine Learning Perspective. ACM Comput. Surv. 2023, 55, 1–35. [Google Scholar] [CrossRef]
  87. Wang, F.; Zhang, M.; Wang, X.; Ma, X.; Liu, J. Deep Learning for Edge Computing Applications: A State-of-the-Art Survey. IEEE Access 2020, 8, 58322–58336. [Google Scholar] [CrossRef]
  88. Wang, X.; Han, Y.; Leung, V.C.M.; Niyato, D.; Yan, X.; Chen, X. Convergence of Edge Computing and Deep Learning: A Comprehensive Survey. IEEE Commun. Surv. Tutor. 2020, 22, 869–904. [Google Scholar] [CrossRef]
  89. Chen, J.; Ran, X. Deep Learning with Edge Computing: A Review. Proc. IEEE 2019, 107, 1655–1674. [Google Scholar] [CrossRef]
  90. Murshed, M.G.S.; Murphy, C.; Hou, D.; Khan, N.; Ananthanarayanan, G.; Hussain, F. Machine Learning at the Network Edge: A Survey. ACM Comput. Surv. 2021, 54, 1–37. [Google Scholar] [CrossRef]
  91. Logenthiran, T. Multi-Agent System for Control and Management of Distributed Power Systems. Ph.D Thesis, National University of Singapore, Singapore, 2012. [Google Scholar]
  92. Dou, C.; Hao, D.; Jin, B.; Wang, W.; An, N. Multi-agent-system-based decentralized coordinated control for large power systems. Int. J. Electr. Power Energy Syst. 2014, 63, 814–821. [Google Scholar] [CrossRef]
  93. Farid, A.M. Multi-agent system design principles for resilient coordination & control of future power systems. Intell. Ind. Syst. 2015, 1, 13–34. [Google Scholar]
  94. Herrera, M.; Pérez-Hernández, M.; Parlikad, A.; Izquierdo, J. Multi-Agent Systems and Complex Networks: Review and Applications in Systems Engineering. Processes 2020, 8, 312. [Google Scholar] [CrossRef]
  95. Sharifi, L. Economics Inspired Energy Aware Service Provisioning in P2P Assisted Cloud Ecosystems. Technico.Ulisboa.Pt 2015, 1, 72–98. Available online: https://web.tecnico.ulisboa.pt/~ist14191/repository/Leila-Sharifi-CAT.pdf (accessed on 13 March 2025).
  96. Irfan, M.; Iqbal, J.; Iqbal, A.; Riaz, R.A. Opportunities and Challenges in Control of Smart Grids—Pakistani Perspective. Renew. Sustain. Energy Rev. 2017, 7, 652–674. [Google Scholar] [CrossRef]
  97. Aftab, M.A.; Hussain, S.M.S.; Ali, I.; Ustun, T.S. IEC 61850-Based Communication Layer Modeling for Electric Vehicles: Electric Vehicle Charging and Discharging Processes Based on the International Electrotechnical Commission 61850 Standard and Its Extensions. IEEE Ind. Electron. Mag. 2020, 14, 4–14. [Google Scholar] [CrossRef]
  98. Mackiewicz, R.E. Overview of IEC 61850 and Benefits. In Proceedings of the 2006 IEEE Power Engineering Society General Meeting, Montreal, QC, Canada, 18–22 June 2006; IEEE: Piscataway, NJ, USA, 2006; p. 8. [Google Scholar]
  99. Youssef, T.A.; El Hariri, M.; Bugay, N.; Mohammed, O.A. IEC 61850: Technology Standards and Cyber-Threats. In Proceedings of the 2016 IEEE 16th International Conference on Environment and Electrical Engineering (EEEIC), Florence, Italy, 7–10 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–6. [Google Scholar]
  100. Aftab, M.A.; Hussain, S.M.S.; Ali, I.; Ustun, T.S. IEC 61850-Based Substation Automation System: A Survey. Int. J. Electr. Power Energy Syst. 2020, 120, 106008. [Google Scholar] [CrossRef]
  101. Shin, I.J.; Song, B.K.; Eom, D.S. International Electronical Committee (IEC) 61850 Mapping with Constrained Application Protocol (CoAP) in Smart Grids Based European Telecommunications Standard Institute Machine-to-Machine (M2M) Environment. Energies 2017, 10, 393. [Google Scholar] [CrossRef]
  102. Ozansoy, C.R.; Zayegh, A.; Kalam, A. Object Modeling of Data and Datasets in the International Standard IEC 61850. IEEE Trans. Power Deliv. 2009, 24, 1140–1147. [Google Scholar] [CrossRef]
  103. Kostic, T.; Preiss, O.; Frei, C. Understanding and Using the IEC 61850: A Case for Meta-Modelling. Comput. Stand. Interfaces 2005, 27, 679–695. [Google Scholar] [CrossRef]
  104. Ihle, C.; Trautwein, D.; Schubotz, M.; Meuschke, N.; Gipp, B. Incentive Mechanisms in Peer-to-Peer Networks—A Systematic Literature Review. ACM Comput. Surv. 2023, 55 (Suppl. S14), 1–69. [Google Scholar] [CrossRef]
  105. Reckerd, D.; Vico, J. Application of Peer-to-Peer Communication, for Protection and Control, at Seward Distribution Substation. In Proceedings of the 58th Annual Conference for Protective Relay Engineers, College Station, TX, USA, 5–7 April 2005; IEEE: Piscataway, NJ, USA, 2005; pp. 40–45. [Google Scholar]
  106. Wojdak, W. Rapid Spanning Tree Protocol: A New Solution from an Old Technology. Reprinted from CompactPCI Systems March 2003. Available online: http://pdf.cloud.opensystemsmedia.com/advancedtca-systems.com/PerfTech.Mar03.pdf (accessed on 13 March 2025).
  107. Marchese, M.; Mongelli, M. Simple Protocol Enhancements of Rapid Spanning Tree Protocol Over Ring Topologies. Comput. Netw. 2012, 56, 1131–1151. [Google Scholar] [CrossRef]
  108. Pallos, R.; Farkas, J.; Moldovan, I.; Lukovszki, C. Performance of Rapid Spanning Tree Protocol in Access and Metro Networks. In Proceedings of the 2007 Second International Conference on Access Networks & Workshops, Ottawa, ON, Canada, 22–24 August 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 1–8. [Google Scholar]
  109. Li, Q.; Wang, D.; Huang, X.; Zhang, H. A System Configuration Description Language (SCL) Complied File Based Configuration Method for Bridges in Smart Substation. In Proceedings of the 2023 7th International Conference on Smart Grid and Smart Cities (ICSGSC), Lanzhou, China, 22–24 September 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 136–141. [Google Scholar]
  110. Cruz, J.P.; Kaji, Y.; Yanai, N. RBAC-SC: Role-Based Access Control Using Smart Contract. IEEE Access 2018, 6, 12240–12251. [Google Scholar] [CrossRef]
  111. Zeeshan, M.; Manzoor, M.F.; Qadir, J. Backup Channel and Cooperative Channel Switching On-Demand Routing Protocol for Multi-Hop Cognitive Radio Ad Hoc Networks (BCCCS). In Proceedings of the 2010 6th International Conference on Emerging Technologies (ICET), Islamabad, Pakistan, 18–19 October 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 394–399. [Google Scholar]
  112. Ahmad, A.; El Haffar, A.; Lavanya, P. Moving towards reliable and fault-tolerant smart grid systems. Int. J. Adv. Res. Electr. Electron. Instrum. Eng. 2023, 1, 508. [Google Scholar] [CrossRef]
  113. Essackjee, I.A. Leveraging disruptive technologies to realize the smart grid. ResearchGate 2023, 1, 346402311. Available online: https://www.researchgate.net/profile/Ismael-Essackjee/publication/346402311_Leveraging_Disruptive_Technologies_to_Realize_the_Smart_Grid/links/62a23ada55273755ebe07e71/Leveraging-Disruptive-Technologies-to-Realize-the-Smart-Grid.pdf (accessed on 14 March 2025).
  114. De Almeida, L.F.F.; Pereira, L.A.M.; Sodré, A.C. Control networks and smart grid teleprotection: Key aspects, technologies, protocols, and case-studies. IEEE Access 2020, 1, 9200485. [Google Scholar] [CrossRef]
  115. Alabi, M. The Impact of Artificial Intelligence on Network Optimization in Telecommunications. ResearchGate 2023, 1, 384664972. Available online: https://www.researchgate.net/profile/Moses-Alabi/publication/384664972_The_Impact_of_Artificial_Intelligence_on_Network_Optimization_in_Telecommunications/links/6701933d9e6e82486f0549d5/The-Impact-of-Artificial-Intelligence-on-Network-Optimization-in-Telecommunications.pdf (accessed on 14 March 2025).
  116. Umoga, U.J.; Sodiya, E.O.; Ugwuanyi, E.D.; Jacks, B.S.; Lottu, O.A.; Daraojimba, O.D.; Obaigbena, A. Exploring the potential of AI-driven optimization in enhancing network performance and efficiency. Magna Sci. Adv. Res. Rev. 2024, 10, 368–378. [Google Scholar] [CrossRef]
  117. Cruz, Y.J.; Castaño, F.; Haber, R.E.; Villalonga, A.; Ejsmont, K.; Gladysz, B.; Flores, Á.; Alemany, P. Self-Reconfiguration for Smart Manufacturing Based on Artificial Intelligence: A Review and Case Study. In Artificial Intelligence in Manufacturing: Enabling Intelligent, Flexible and Cost-Effective Production Through AI; Springer Nature: Cham, Switzerland, 2024; pp. 121–144. [Google Scholar]
  118. Lin, Y.; Bie, Z. A review of key strategies in realizing power system resilience. Glob. Energy Interconnect. 2018, 1, 2096511718300094. Available online: https://www.sciencedirect.com/science/article/pii/S2096511718300094 (accessed on 14 March 2025).
  119. Yu, P.; Shi, L.; Liu, B. Survivability-aware routing restoration mechanism for smart grid communication network in large-scale failures. EURASIP J. Wirel. Commun. Netw. 2020, 1, 104. Available online: https://link.springer.com/article/10.1186/s13638-020-1653-4 (accessed on 14 March 2025).
  120. Moradi, M.H.; Razini, S.; Hosseinian, S.M. State of the art of multi-agent systems in power engineering: A review. Renew. Sustain. Energy Rev. 2016, 58, 814–824. [Google Scholar] [CrossRef]
  121. Cheng, L.; Yu, T. Smart Dispatching for Energy Internet with Complex Cyber-Physical-Social Systems: A Parallel Dispatch Perspective. Int. J. Energy Res. 2019, 43, 3080–3133. [Google Scholar] [CrossRef]
  122. Cheng, L.; Yu, T.; Zhang, X.; Yin, L. Machine Learning for Energy and Electric Power Systems: State of the Art and Prospects. Autom. Electr. Power Syst. 2019, 43, 15–43. [Google Scholar] [CrossRef]
  123. Renugadevi, R.; Shobana, J.; Arthi, K.; AV, K.; Satishkumar, D.; Sivaraja, M. Real-Time Applications of Artificial Intelligence Technology in Daily Operations. In Using Real-Time Data and AI for Thrust Manufacturing; IGI Global: Hershey, PA, USA, 2024; pp. 243–257. [Google Scholar]
  124. Cen, J.; Yang, Z.; Liu, X.; Xiong, J.; Chen, H. A Review of Data-Driven Machinery Fault Diagnosis Using Machine Learning Algorithms. J. Vib. Eng. Technol. 2022, 10, 2481–2507. [Google Scholar] [CrossRef]
  125. Diez-Olivan, A.; Del Ser, J.; Galar, D.; Sierra, B. Data Fusion and Machine Learning for Industrial Prognosis: Trends and Perspectives Towards Industry 4.0. Inf. Fusion 2019, 50, 92–111. [Google Scholar] [CrossRef]
  126. Fernandes, M.; Corchado, J.M.; Marreiros, G. Machine Learning Techniques Applied to Mechanical Fault Diagnosis and Fault Prognosis in the Context of Real Industrial Manufacturing Use-Cases: A Systematic Literature Review. Appl. Intell. 2022, 52, 14246–14280. [Google Scholar] [CrossRef]
  127. Saufi, S.R.; Ahmad, Z.A.B.; Leong, M.S.; Lim, M.H. Challenges and Opportunities of Deep Learning Models for Machinery Fault Detection and Diagnosis: A Review. IEEE Access 2019, 7, 122644–122662. [Google Scholar] [CrossRef]
  128. Leite, D.; Martins Jr, A.; Rativa, D.; De Oliveira, J.F.; Maciel, A.M. An Automated Machine Learning Approach for Real-Time Fault Detection and Diagnosis. Sensors 2022, 22, 6138. [Google Scholar] [CrossRef]
  129. Jung, K.H.; Kim, H.; Ko, Y. Network Reconfiguration Algorithm for Automated Distribution Systems Based on Artificial Intelligence Approach. IEEE Trans. Power Deliv. 1993, 8, 1933–1941. [Google Scholar] [CrossRef]
  130. Shakiba, F.M.; Azizi, S.M.; Zhou, M.; Abusorrah, A. Application of Machine Learning Methods in Fault Detection and Classification of Power Transmission Lines: A Survey. Artif. Intell. Rev. 2023, 56, 5799–5836. [Google Scholar] [CrossRef]
  131. Bruton, K.; Raftery, P.; Kennedy, B.; Keane, M.M.; O’sullivan, D.T.J. Review of Automated Fault Detection and Diagnostic Tools in Air Handling Units. Energy Effic. 2014, 7, 335–351. [Google Scholar] [CrossRef]
  132. Fenton, W.G.; McGinnity, T.M.; Maguire, L.P. Fault Diagnosis of Electronic Systems Using Intelligent Techniques: A Review. IEEE Trans. Syst. Man Cybern. Part C 2001, 31, 269–281. [Google Scholar] [CrossRef]
  133. Wu, D.; Zheng, A.; Yu, W.; Cao, H.; Ling, Q.; Liu, J.; Zhou, D. Digital Twin Technology in Transportation Infrastructure: A Comprehensive Survey of Current Applications, Challenges, and Future Directions. Appl. Sci. 2025, 15, 1911. [Google Scholar] [CrossRef]
  134. Arora, S.; Tewari, A. AI-Driven Resilience: Enhancing Critical Infrastructure with Edge Computing. Int. J. Curr. Eng. Technol. 2022, 12, 151–157. [Google Scholar]
  135. Nasarian, E.; Alizadehsani, R.; Acharya, U.R.; Tsui, K.L. Designing interpretable ML system to enhance trust in healthcare: A systematic review to propose responsible clinician-AI-collaboration framework. Inf. Fusion 2024, 108, 102412. [Google Scholar] [CrossRef]
  136. KN, K.; Perrusquia, A.; Tsourdos, A.; Ignatyev, D. Integrating Explainable AI into Two-Tier ML Models for Trustworthy Aircraft Landing Gear Fault Diagnosis. In AIAA SCITECH 2025 Forum; American Institute of Aeronautics and Astronautics: Reston, VA, USA, 2025; p. 1928. [Google Scholar]
  137. Nguyen, T.T.; Nguyen, N.D.; Nahavandi, S. Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications. IEEE Trans. Cybern. 2020, 50, 3826–3839. [Google Scholar] [CrossRef]
  138. Cross, L.; Cockburn, J.; Yue, Y.; O’Doherty, J.P. Using deep reinforcement learning to reveal how the brain encodes abstract state-space representations in high-dimensional environments. Neuron 2021, 109, 724–738. [Google Scholar] [CrossRef]
  139. Altaher, A. Implementation of a Dependability Framework for Smart Substation Automation Systems: Application to Electric Energy Distribution. Ph.D. Thesis, Université Grenoble Alpes, Grenoble, France, 2018. [Google Scholar]
  140. Baigent, D.; Adamiak, M.; Mackiewicz, R.; Sisco, G.M.G.M. IEC 61850 Communication Networks and Systems in Substations: An Overview for Users; SISCO Systems: Sterling Heights, MI, USA, 2004. [Google Scholar]
  141. Cappart, Q.; Chételat, D.; Khalil, E.B.; Lodi, A.; Morris, C.; Veličković, P. Combinatorial optimization and reasoning with graph neural networks. J. Mach. Learn. Res. 2023, 24, 1–61. [Google Scholar]
  142. Alex-Omiogbemi, A.A.; Sule, A.K.; Omowole, B.M. Conceptual framework for advancing regulatory compliance and risk management in emerging markets through digital innovation. World J. Adv. Res. Rev. Dec. 2024, 24, 1155–1162. [Google Scholar] [CrossRef]
  143. Wang, X.; Wu, Y.C. Balancing innovation and regulation in the age of generative artificial intelligence. J. Inf. Policy 2024, 14, 93–112. [Google Scholar] [CrossRef]
Figure 1. Hierarchical control architecture for subway power systems: a structured approach to fault recovery and system optimization.
Figure 2. A typical electrified railway traction power supply architecture, which forms the foundation for supplying electrical power to urban subway transit systems.
Figure 3. A flowchart of the legacy equipment integration process.
Figure 4. Fault isolation and recovery process in self-healing control for subway power supply systems.
Table 1. Comparative summary of self-healing in power/energy systems vs. metro power supply systems.
Comparative Dimension | Self-Healing in Power and Energy Systems | Self-Healing in Metro Power Supply Systems | Typical Scenarios, Application Levels, and Definitional Differences
Primary Objective | Ensures wide-area safety and reliability, promptly isolates faults, and restores service to critical loads across transmission and distribution. | Focuses on swiftly identifying and isolating faults within confined metro lines, minimizing operational disruption, and sustaining continuous train service. | In power systems, self-healing primarily targets large-scale networks. In metro systems, it aims at uninterrupted public transit. Both emphasize isolation and quick restoration, yet metro systems impose more stringent continuity requirements.
Network Scale and Complexity | Comprises hierarchical generation, transmission, and distribution across vast geographic regions, integrating both conventional and distributed energy resources. | Concentrates on urban rail corridors, with relatively fixed routing but complex operational environments; traction loads exhibit cyclical fluctuations with high power demands. | Power systems handle multi-voltage-level, widely dispersed networks. Metro systems focus on shorter feeder sections and specialized loads. Both require robust control, but metro systems demand faster, more localized responses.
Control and Management Layers | Typically includes a layered architecture with a central energy management system (EMS)/supervisory control and data acquisition (SCADA) system, substation automation, and distributed control in feeder terminals. | Employs centralized or semi-centralized control via SCADA or integrated supervisory control systems (ISCSs), with shorter communication pathways for rapid switching actions. | Power systems rely on multi-tier communications for broad-area coordination. Metro systems maintain shorter command chains, enabling near-instantaneous protection and recovery. The underlying definitions emphasize automation, but with distinct time horizons.
Fault Types and Detection | Encompasses conventional short-circuits, equipment aging, and severe external disruptions (e.g., lightning, ice storms). Fault detection relies on diverse sensors, relays, and advanced topology analyses. | Primarily contends with feeder or substation faults (e.g., short-circuits, overloads), external threats from construction, and environmental factors damaging contact lines; detection uses specialized track sensors. | General power systems confront a wider range of fault types, while metro faults typically concentrate on contact-line or substation issues. Both emphasize real-time detection, though metro systems have elevated safety margins due to passenger transport.
Fault Isolation and Network Reconfiguration | Achieves isolation through automated breakers, reclosers, and load transfer, often within seconds to minutes; integrates alternative power sources to maintain supply continuity. | Relies on ring-supplied networks and rapid switching to isolate faulty sections while sustaining feeder services to unaffected track segments, typically within a matter of seconds or less. | Both leverage automated switches and feeder reconfiguration. However, metro systems often need a faster (sub-minute to second-level) approach to preserve critical train operations without substantial delays.
Information and Communication Technologies | Relies on multi-level, wide-area networks (optical fiber, wireless, private lines) with numerous nodes and potential bandwidth constraints; designed for comprehensive monitoring and control. | Benefits from relatively shorter distances and more centralized configurations, typically integrated into a single specialized or semi-isolated communication framework with low latency. | Both necessitate reliable, real-time communication. Yet, power systems face more distributed deployments, whereas metro systems can leverage a dedicated, smaller-scale communication backbone.
Reliability and Safety Standards | Must comply with national or industry regulations (e.g., SAIDI (system average interruption duration index), SAIFI (system average interruption frequency index)) and increasingly consider cybersecurity challenges; reliability is critical but often measured statistically across larger geographic footprints. | Stringent safety standards and zero tolerance for lengthy service interruptions, given its direct impact on public transit; also must consider passenger evacuation and emergency scenarios in fault response. | Both strive for high reliability, but metro systems face more immediate safety and service pressures. Definitions converge on the principle of minimizing power interruption, though metro systems place human safety and operational continuity at the forefront.
Current Implementation and Trends | Already deployed widely in smart distribution grids internationally, with varying degrees of investment and maturity; advanced sensors, grid automation, and microgrid technologies are growing rapidly. | Actively being integrated into new and existing metro lines worldwide; especially in newer constructions, self-healing features are incorporated during design to minimize service interruption times and enhance safety. | Power grids and metro systems both advance toward greater intelligence and automation, but metro self-healing is more domain-specific and dedicated to ensuring passenger service. Definitions reflect macroscopic grid security vs. localized transit continuity.
Table 2. Complex topological and operational constraints in subway power supply systems.
Aspect | Power and Energy System Application | Subway Power Supply Application | Level of Implementation | Differences in Implementation | Future Potential | Urgent Challenges | Research Opportunities
Topological Complexity | Typically radial or meshed network designs | Hybrid ring/radial topologies under tight constraints | Medium to high | Limited space for additional cables; strict safety requirements | Enhanced modeling tools | Integrating advanced sensors in limited space | Compact, scalable approaches for real-time fault management
Load Variability | Seasonal/diurnal load patterns | Rapid load changes due to train acceleration | Medium | High-frequency fluctuations unique to rail traction | Predictive analytics for dynamic load management | Handling transient conditions in real-time detection | AI-driven adaptive protection schemes
Fault Tolerance Requirements | Important but can rely on alternative feeders | Critical: passenger safety at stake | High | Stricter recovery times; mandatory redundancies in underground segments | Seamless reconfiguration for uninterrupted service | Maintaining safety with minimal downtime | MAS-based solutions that integrate safety logic
Infrastructure Constraints | Often more flexible, especially above ground | Very limited corridor space; complex cable routing | Low to medium | Equipment miniaturization needed; advanced maintenance scheduling | Innovative hardware designs | High costs of retrofitting and expansion | Modular protective devices optimized for underground environment
Environmental Factors (Heat, Humidity, etc.) | Relevant but typically less extreme | Critical in enclosed tunnels | Medium | Ventilation and cooling demands must be integrated with power layout | Energy-efficient solutions for underground ventilation | Protective devices degrade faster due to harsh conditions | Designing robust sensors and switchgear suited to harsh environments
Communication Constraints | Generally open space for wireless or fiber links | Underground layout complicates communication wiring | Medium | Signal attenuation in tunnels; need for robust communication protocols | Advanced tunnel communication frameworks | Ensuring real-time data flow under challenging conditions | Research on fault-tolerant communication for tunnel environments
Maintenance and Operational Constraints | Significant but often can schedule downtime | Very limited track closure windows | Medium to high | Maintenance must be performed swiftly, often in off-peak or night hours | Autonomous or remote inspection tools | High operational risk if maintenance is delayed | Development of continuous monitoring systems and predictive maintenance
Regulatory and Safety Standards | National grid codes and industry standards | Stricter local transit authority regulations | High | Safety certification for every component or software module | Holistic compliance with railway regulations | Multiple approvals from transportation authorities | Integrated safety frameworks bridging power and rail standards
Table 3. Potential technological and research interventions for complex operational constraints.
Aspect/Constraint | Proposed Interventions | Implementation Status | Key Benefits | Limitations/Barriers | Future Potential | Urgent Challenges | Research Directions
Space Constraints and Equipment Size | Miniaturized switchgear, compact substations, solid-state | Early adoption in some metros | Saves valuable tunnel space, eases retrofitting | Higher cost, potential reliability issues | Medium to high | Testing and safety certifications | Developing robust, affordable miniaturized devices
High Load Variability | AI-based predictive load balancing, advanced SCADA | In pilot projects | Accurate forecasting, improved real-time control | Requires extensive sensor data, complex algorithms | High | Ensuring real-time responsiveness | Machine learning algorithms for real-time load prediction
Environmental Challenges (Heat, Humidity) | Enhanced insulation, specialized cooling systems | Widely used but needs updates | Protects equipment longevity, improves reliability | Increases CAPEX/OPEX, depends on ventilation design | Medium | Integrating with energy efficiency | Smart materials, advanced sensor-based heat management
Fault Tolerance and Rapid Isolation | MAS-based reconfiguration, advanced protective relaying | Experimental or partial use | Minimizes downtime, enhances passenger safety | Requires robust communication and standard protocols | High | Fast, secure communications | MAS architecture aligning with IEC 61850 and railway codes
Communication and Data Synchronization | IEC 61850 GOOSE messaging, fiber-optic and wireless hybrids | Expanding in pilot programs | Enables real-time data sharing, simplified integration | Tunnel attenuation and installation complexity | High | Guaranteed QoS in tunnel environment | Ultra-reliable communication protocols for underground rail
Maintenance and Operational Scheduling | Predictive maintenance, digital twins, remote inspection | Growing adoption | Minimizes downtime, reduces costs, extends asset life | Requires high initial investment, specialized staff | Very High | Coordinating track closures | AI-driven digital twins for continuous condition monitoring
Regulatory Compliance and Safety | Unified standards bridging power and railway domains | Ongoing efforts | Streamlines certification, ensures compatibility | Multiple authorities, differing regulations | High | Prolonged approvals | Collaborative frameworks for standardization
Cybersecurity Tools | End-to-end encryption, intrusion detection systems | Varies by region | Protects critical control systems from cyber threats | Requires advanced IT infrastructure | High | Ensuring trust in automated systems | AI-based anomaly detection integrated with self-healing
Table 4. Dimensions of real-time fault management in subway vs. general power systems.
Aspect | Power and Energy Systems Application | Subway Power Supply Application | Level of Implementation | Differences in Implementation | Future Potential | Urgent Challenges | Research Opportunities
Fault Detection Speed | Millisecond to second range | Must be sub-cycle to tens of milliseconds | Medium to high | Higher sensitivity due to enclosed spaces and passenger risk | Very high (real-time control) | Achieving ultra-fast detection in tunnels | Wavelet-based traveling-wave methods
Isolation Techniques | Circuit breakers at multiple nodes | Limited breakers, need precise isolation zones | Medium | Space constraints, higher cost for extra switchgear | Medium to high | Minimizing downtime in short track segments | MAS-based isolation protocols
Restoration Priority | Typically based on load importance | Safety-critical loads top priority (lighting) | High | Focus on passenger evacuation and ventilation requirements | High | Ensuring continuous service with minimal risk | Hierarchical multi-agent strategies
Communication and Coordination | SCADA systems, sometimes distributed | Underground environment with possible signal loss | Medium to high | Signal attenuation in tunnels; need for robust communication protocols | Very high | Reliable data exchange under underground conditions | IEC 61850-based GOOSE in tunnels
Automation Level | Moderate to advanced in smart grids | Rapidly evolving with pilot tests in subways | Low to medium | Standard solutions less tested in subterranean rail networks | High | Balancing new tech with proven reliability | Full-scale, integrated MASs + AI systems
Data Handling and Analytics | Cloud-based or on-premise analytics | On-site edge computing for real-time decisions | Growing | High real-time constraints, limited data bandwidth | High | Managing large-scale sensor data in real time | Edge AI algorithms for fault prediction
Reliability and Redundancy | Important for major loads, less for small feeders | Critical for all track sections | High | Redundancies have to be physically feasible underground | Very high | Ensuring no single point of failure | Optimized redundancy planning
Cost and Investment | Balanced with broad utility budgets | Constrained by transit authority budgets | Varies | High capital expenditures for specialized rail infrastructure | Medium | Gaining stakeholder support | Economic feasibility studies
Table 5. Technological enablers for real-time fault management in subway power networks.
Aspect/Technology | Current Adoption | Primary Advantage | Limitations | Future Potential | Urgent Challenges | Research Gaps | Recommended Solutions/Directions
High-Speed Protection Relays | Moderate | Sub-cycle response, improved sensitivity | Prone to nuisance tripping under variable load | High | Tuning relay settings to subway load patterns | Algorithm refinement for multi-condition loads | Customized relay settings with AI-based adaptation
Wavelet-Based Fault Detection | Pilot | Accurate detection of transient signals | Computational complexity in real time | Medium to High | Guaranteeing stable performance under noise | Optimal wavelet design for DC traction signals | Hybrid wavelet + machine learning methods
Traveling Wave Methods | Limited | Precise fault localization | Requires synchronized data acquisition | High | Installing enough sensors in short intervals | Cost-effective sensor deployment | Cooperative traveling wave detection systems
MAS-based Isolation | Experimental | Distributed decision-making, resilience | Complexity of agent coordination protocols | Very High | Achieving sub-second isolation in tunnels | Standardizing MAS frameworks for rail settings | IEC 61850-compatible MAS design
Self-Healing Restoration Logic | Early prototypes | Automated service restoration, dynamic re-routing | Requires robust network modeling | Medium to High | Handling partial restorations effectively | Real-time load and system state estimation | MAS-based restoration integrated with SCADA
AI/ML for Fault Prediction | Emerging | Predictive maintenance, early anomaly detection | Data scarcity, labeling issues, validation costs | Very High | Ensuring model accuracy in changing conditions | Neural network interpretability and reliability | Hybrid physics-informed neural networks
Edge Computing in Substations | Low but growing | Reduces latency, improves local decision-making | Limited computational resources on site | High | Onboard analytics to handle real-time data | Designing low-power, high-performance hardware | Embedded systems optimized for fault analytics
Cybersecurity Compliance | Varies by region | Protects reliability of automated fault systems | Additional cost and complexity | High | Mitigating increased attack surface | Secure data frameworks integrated with self-healing | Blockchain-based identity and access management
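
The wavelet-based fault detection entry in Table 5 can be illustrated with a minimal sketch. It is not the method of any cited work: it assumes an undecimated single-level Haar detail (a scaled difference of neighbouring samples) and a fixed threshold, and the function names, the sample current values, and the 500 A threshold are all hypothetical choices made for illustration.

```python
import math


def haar_detail(samples):
    """Undecimated level-1 Haar detail: scaled difference of neighbouring samples."""
    return [
        (samples[n] - samples[n - 1]) / math.sqrt(2.0)
        for n in range(1, len(samples))
    ]


def detect_transients(samples, threshold):
    """Return sample indices whose Haar detail magnitude exceeds `threshold`."""
    details = haar_detail(samples)
    # details[k] compares samples[k + 1] with samples[k], so report index k + 1.
    return [k + 1 for k, d in enumerate(details) if abs(d) > threshold]


# Hypothetical feeder current in amperes: a slow ramp (train acceleration)
# followed by an abrupt step at sample 12 standing in for a short-circuit.
current = [1000.0 + 5.0 * n for n in range(20)]
current[12:] = [value + 4000.0 for value in current[12:]]

fault_indices = detect_transients(current, threshold=500.0)
print(fault_indices)
```

The slow ramp produces small detail coefficients and is ignored, while the abrupt step produces a large one. A practical detector would likely use deeper decompositions and a noise-adaptive threshold, which is where the table's "hybrid wavelet + machine learning" direction comes in.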
Table 6. Regulatory, safety, and integration barriers in subway power systems.
Aspect | Power and Energy Systems Application | Subway Power Supply Application | Level of Implementation | Differences in Implementation | Future Potential | Urgent Challenges | Research Opportunities
Regulatory Frameworks | Utility-level codes (IEEE, IEC), less direct public scrutiny | Multi-layered railway authority oversight, strict passenger safety | Medium | Longer certification processes, overlapping authorities | Medium to high | Streamlining multi-agency approvals | Standardization bridging IEC 61850 and rail codes
Safety Assurance | Important but primarily equipment-focused | Critical for passenger well-being; zero tolerance for major failures | High | Need for rapid evacuation, air quality, and lighting continuity | High | Minimizing disruptions that endanger passengers | MAS designs incorporating real-time hazard monitoring
Legacy System Integration | Often stepwise modernization | Legacy traction power systems with partial SCADA | Low to medium | Protocol mismatch, older hardware with minimal digital interfaces | Medium | Retrofitting with minimal service downtime | Adaptive hardware modules, protocol converters
Standards and Protocols | Common adoption of IEC 61850 in substation automation | Emerging adaptation for traction power and MAS coordination | Low to medium | Must handle DC traction specifics, tunnel conditions | High | Harmonizing substation automation with rail codes | IEC 61850 profiles specialized for traction systems
Cost vs. Benefit Analysis | Long-term cost-benefit for large-scale utilities | Immediate passenger service impact, budget constraints | Medium | Hard to quantify intangible benefits (e.g., safety, brand image) | Medium | Securing investments under strict budget caps | Detailed ROI models including passenger satisfaction
Training and Workforce | Utility engineers, typical skill sets | Specialized rail engineers, safety certifications | Low to medium | Additional training for advanced AI/MAS solutions | High | Building cross-functional teams | Education programs bridging power and rail domains
Public Acceptance and Trust | Generally behind-the-scenes updates | High visibility with potential passenger disruption | Medium to high | Risk of negative perception if technology causes downtime | Medium | Ensuring stable operation during pilot phases | Transparent communication about improvements
Cybersecurity Compliance | Growing awareness, diverse regulations | Vital to protect passenger and operational data | Medium | Potential for large-scale disruptions if hacked | High | Ensuring end-to-end security in tunnels | Integrated intrusion detection with MAS frameworks
Table 7. Strategies to overcome regulatory, safety, and integration barriers.
Strategy/Measure | Adoption Level | Expected Impact | Implementation Difficulty | Key Benefits | Potential Risks/Barriers | Research Needs | Future Outlook
Unified Standardization Efforts | Growing momentum | Streamlines compliance across agencies and vendors | Medium | Reduced project delays, interoperability | Requires consensus among diverse stakeholders | Holistic standards for MASs + traction power systems | Feasible with continued collaboration among IEC, IEEE, rail authorities
Cross-Domain Training Programs | Limited pilots | Enhances workforce competency in both power and rail | Medium to high | Facilitates smooth technology integration | Budget constraints, scheduling complexities | Curriculum design integrating railway safety and AI | Key to building a sustainable talent pipeline
Pilot and Sandbox Environments | Emerging in some metros | Allows safe testing of new systems in controlled settings | Medium | Minimizes risk to passengers, validates ROI | May still require partial line closures | Detailed performance metrics, extended pilot durations | Gradual system-wide rollouts after proven success
Modular Retrofit Approaches | Limited | Incrementally modernizes legacy systems | High | Avoids complete system overhaul, spreads cost | Complexity of ensuring compatibility | Adaptive hardware modules, protocol converters | Could become standard practice for older metro lines
Risk–Benefit Communication | Ad hoc | Improves public acceptance and stakeholder engagement | Low to medium | Builds trust, eases implementation controversies | Requires public outreach, specialized messaging | Communication frameworks and standardized ROI metrics | Crucial for ensuring supportive regulatory environment
Comprehensive Cybersecurity | Growing awareness | Safeguards data integrity, essential for MASs/AI systems | Medium | Avoids catastrophic disruptions, protects passenger data | Costly to maintain, evolving threat landscape | Intrusion detection, endpoint security for IEDs | Integral part of future integrated self-healing systems
Funding and Incentive Mechanisms | Region-dependent | Encourages R&D investment and pilot deployment | Medium to high | Facilitates advanced research, reduces operator risk | Political and economic uncertainties | Economic models that quantify intangible benefits | Key to bridging the gap between research and real deployment
Long-Term Maintenance Contracts | Limited | Ensures continuous expert support post-deployment | Medium | Maximizes system reliability, knowledge transfer | Potential vendor lock-in | Service-level agreements with advanced penalty clauses | A stable framework for ensuring reliability over the system lifetime
Table 8. Comparison between current self-healing techniques and the suggested MAS-based strategy.
Comparison Criteria | Current Self-Healing Techniques | Suggested Self-Healing Strategy
Technology Foundation | Traditional Fault Detection Algorithms: Based on simple fault detection (e.g., overcurrent, voltage drop) and subsequent isolation using conventional relays. | AI-based MASs: Utilizes real-time data analysis and intelligent agents that communicate autonomously to identify, localize, and isolate faults more efficiently.
Fault Detection Speed | Typically requires several cycles to detect and isolate faults, leading to delayed responses in high-speed environments like subways. | Detects faults in fractions of a cycle (using wavelet-based techniques), significantly improving detection time in high-speed subway systems.
System Flexibility | Often fixed in design, requiring manual intervention or predetermined responses to faults. | Highly flexible, where agents adapt and optimize their responses based on changing conditions in real time, offering greater scalability.
Real-Time Adaptability | Limited real-time adaptability, as conventional methods use static rules for fault isolation. | Uses AI to adapt fault recovery strategies in real time, considering dynamic load and environmental factors, especially in urban settings with complex network topologies.
Resilience to Complex Topologies | Struggles in complex network topologies (e.g., ring or radial configurations) as there are fixed paths for fault detection and isolation. | MAS-based strategy is inherently more suited for complex network topologies, allowing autonomous decision-making across distributed systems.
Failure Recovery Efficiency | Traditional methods often lead to long recovery times and may not restore service to critical areas promptly. | MASs enable rapid rerouting of power through alternate paths in real time, ensuring minimal downtime and prioritized restoration of critical services such as lighting and ventilation.
Space Constraints | Conventional systems use additional hardware (e.g., circuit breakers) which may be difficult to install in confined spaces such as subway tunnels. | MASs use distributed sensors and devices (without the need for additional hardware), facilitating more compact and space-efficient implementations.
Maintenance and Scalability | Requires regular maintenance of each individual component, and expanding the system often requires significant hardware upgrades. | MASs require less hardware maintenance and can be scaled up by adding more intelligent agents, making them easier to adapt and expand.
Safety Protocols | Basic safety mechanisms (e.g., emergency power supply, fire safety) that activate in case of failure but often lack dynamic prioritization of critical services. | Integrates advanced safety protocols, ensuring that critical systems (e.g., lighting, ventilation) are always prioritized during fault isolation and power restoration.
Cost | Lower initial cost but higher long-term costs due to maintenance, hardware upgrades, and manual intervention during fault recovery. | Higher initial setup cost for AI-based systems, but lower long-term costs due to reduced maintenance needs and quicker fault recovery, offsetting initial investments.
Regulatory Compliance | Generally compliant with existing safety standards, but lacks integration with emerging AI-based regulations. | Requires new regulatory frameworks to accommodate AI-based systems, including certification of MAS- and AI-driven fault management protocols.
Integration with Legacy Systems | Difficulty in integrating with older infrastructures, requiring hardware upgrades or complete system overhauls. | MASs can integrate with legacy systems via gateways, allowing for gradual upgrades without a complete system overhaul.
Use of Data Analytics | Relies on limited data for fault detection, often with basic analysis on voltage, current, and fault type. | Uses advanced machine learning models that analyze vast datasets from multiple sensors to predict faults before they occur and optimize recovery strategies.
Environmental Adaptability | Traditional systems are often static, and environmental factors (e.g., temperature, humidity) can impact their performance. | Adaptive MAS technology continuously adjusts to environmental conditions such as temperature or humidity, enhancing system resilience in diverse settings like subway tunnels.
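
Table 8 states that critical services such as lighting and ventilation are prioritized during restoration. One simple way to picture this is a greedy plan that restores loads in priority order until the available capacity is used up. The sketch below is an illustrative assumption, not the strategy defined in the review: the `Load` type, the priority convention (lower number means more critical), and all load names and kW figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Load:
    name: str
    demand_kw: float
    priority: int  # assumed convention: lower number = more safety-critical


def plan_restoration(loads, available_kw):
    """Greedy plan: restore loads in priority order while capacity remains."""
    restored = []
    remaining = available_kw
    # Within one priority class, smaller loads are tried first.
    for load in sorted(loads, key=lambda item: (item.priority, item.demand_kw)):
        if load.demand_kw <= remaining:
            restored.append(load.name)
            remaining -= load.demand_kw
    return restored, remaining


# Hypothetical station loads after a fault leaves 450 kW of backup capacity.
loads = [
    Load("station_hvac", 300.0, 3),
    Load("emergency_lighting", 50.0, 1),
    Load("tunnel_ventilation", 200.0, 1),
    Load("escalators", 150.0, 2),
]
restored, spare_kw = plan_restoration(loads, available_kw=450.0)
print(restored, spare_kw)
```

With these figures the lighting and ventilation loads are restored first, the escalators fit into the remaining capacity, and the HVAC load is deferred. A greedy ordering is only one option; an operator could equally formulate this as a constrained optimization over switching states.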
Table 9. Comparative applications of MASs in subway power systems.
Application Scenario | Degree of Adoption | Main Differences vs. Conventional Methods | Future Prospects | Key Challenges | Potential Research Directions | Implementation Complexity | Current Deployments | Estimated Cost | Strategic Importance
Fault Diagnostics | Medium | Distributed vs. centralized analysis | High, with advanced AI pattern matching | Standardization of agent protocols | Agent-based feature extraction, big data | Moderate | Some pilot projects in major cities | Moderate | Very high for timely response
Fault Isolation and Restoration | High | Faster local control decisions | Expansion into microgrid or hybrid | Communication security | Autonomous agent negotiation algorithms | High | Widely tested internationally | High (due to new device requirements) | Critical for system safety and operation
System Reconfiguration | Moderate | Cooperative agent-based switching | Multi-level architecture | Reliability of real-time signals | Hybrid MAS-IEC 61850 integration | Moderate | Ongoing research initiatives | Medium | Essential for robust self-healing
Load Shedding | Low | Intelligent selective dropping vs. global | Potentially large in future DC traction | Complexity in load forecasting | Agent-based optimization with historical data | Low to moderate | Rare field demos | Low | Important for emergency readiness
Predictive Maintenance | Low | Online prognosis vs. reactive upkeep | Growing, particularly with big data | Accuracy of machine learning methods | Deep learning integrated with MASs | Moderate | Conceptual studies ongoing | Medium | Enhances preventive strategies
Voltage Regulation | Moderate | Distributed voltage control vs. single | High in smart-grid expansions | Coordinated agent control frameworks | MAS-based dynamic VAR control | Moderate | Some pilot tests | Medium | Improves power quality and reliability
Power Quality Monitoring | Low | Real-time harmonic detection vs. offline | Emerging with new sensor technologies | Limited coverage of sensor networks | MAS-based harmonic mitigation techniques | Moderate to high | Minimal large-scale deployments | Medium–high | Boosts passenger experience
Microgrid Integration | Emerging | Local agent-based decisions vs. central SCADA | Potential synergy with renewable and storage in depots | Technical maturity in AC/DC hybrid systems | MAS strategies for integrated AC/DC grids | High (novel tech) | Few advanced pilots | High | Key for future urban rail expansions
Energy Management | Moderate | Intelligent routing of feeders and loads | Large, with data-driven analytics | MAS coordination of traction loads | Cooperative scheduling with power flow control | Moderate | Under study | Medium | Affects cost optimization
Security Assessment | Low | Real-time multi-agent vigilance vs. static approaches | Likely critical with growing threats | Cybersecurity integration | MAS-based anomaly detection and intrusion response | High | Conceptual prototypes | Medium | Ensures reliability and safety
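
The "faster local control decisions" that Table 9 attributes to MAS-based fault isolation can be pictured with a small sketch. It assumes a single-source radial feeder in which each switch agent reports whether it observed fault current; the faulted section then lies just downstream of the last agent that did. The function name and the flag-list representation are hypothetical, and no real agent framework or IEC 61850 messaging is modeled here.

```python
def locate_faulted_section(fault_flags):
    """Locate the faulted section on a single-source radial feeder (assumed topology).

    fault_flags[i] is True if switch agent i, counted from the source,
    observed fault current. Returns (upstream, downstream) switch indices
    bounding the fault, with downstream set to None when the fault lies
    beyond the last switch, or None when no agent saw fault current.
    """
    seen = sum(1 for flag in fault_flags if flag)
    if seen == 0:
        return None
    # On a radial feeder, fault current flows through a contiguous run of
    # switches starting at the source; anything else is a bad report.
    if not all(fault_flags[:seen]):
        raise ValueError("inconsistent agent reports for a radial feeder")
    upstream = seen - 1
    downstream = seen if seen < len(fault_flags) else None
    return upstream, downstream


# Hypothetical reports from four switch agents: the first two saw fault current.
section = locate_faulted_section([True, True, False, False])
print(section)
```

Opening the two bounding switches isolates the section, after which unaffected segments can be re-supplied from an alternate feed. Ring-supplied or double-ended feeding, which the review notes is common in metro systems, would need directional information rather than plain flags.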
Table 10. Research directions and emerging trends in MASs for subway networks.
Research Focus | Current State | Potential Evolution | Key Technical Barriers | Unique Subway Constraints | Synergy with Other Technologies (5G, Edge AI) | Real-Time Simulation Capabilities | Scalability | Funding and Collaboration | Projected Long-Term Impact
Agent Coordination | Experimental pilots in academic settings | Larger multi-level hierarchical models within an IEC 61850 environment | Communication latency and security | AC/DC mixture in traction systems | Integrated platform for distributed intelligence | High-fidelity agent-based simulation with real-time data injection | Moderate; advanced control algorithms needed | Government + private railway operators | Possibly transformative, enabling advanced self-healing frameworks
Resilient Architecture for Fault Handling | Conceptual designs only | Full integration with standardized protocols (IEC 61850) | Dynamic stability and real-time demands | Frequent service runs with minimal downtime | Cloud-edge synergy for real-time data analytics and AI-based forecasting | Testing under stress conditions, hardware-in-the-loop evaluations | High, must handle large expansions | Partnerships among academia and metro agencies | Potential to drastically reduce service disruptions
Multi-Agent Security and Cyber-Resilience | Emerging topic, few publications | Integrated self-healing + intrusion detection architecture | Lack of robust agent intrusion detection and incomplete policy | Threat potential from critical service lines | AI-driven intrusion detection and response | Simulation involving both operational tech (OT) and information tech (IT) | Moderate, as overhead on agent frameworks might be significant | Collaborative R&D with cybersecurity vendors and standard bodies | Could become essential as 5G and IoT expand the attack surface
Integration with Big Data and Analytics | Limited integration for offline analysis | Full data-driven MAS control loops with streaming inputs | Complexity of data ingestion, real-time transformations (AC, DC, station, etc.) | High volume, multi-domain measurements | ML, deep learning-based advanced analytics for anomaly detection | Real-time streaming and batch processing synergy for agent training | High, as big data solutions require robust infrastructure | Industry–university collaboration crucial | Opens new avenues for predictive control, faster fault recovery
Hierarchical vs. Decentralized Agents | Mostly hierarchical prototypes | Hybrid approaches combining local and centralized synergy | Inter-agent conflicts in fully distributed agent models | AC traction and DC feeding lines require different control logics | IoT-based sensing nodes for real-time data acquisition | Flexible scenario testing across different topologies (urban, suburban lines) | High for large networks wanting modular expansions | Global consortia and standard organizations | Could shape the next-gen topology management methods
Table 11. IEC 61850 applications in subway systems: status and outlook.
Application Scenario | Current Adoption Level | Key Benefits | Main Implementation Obstacles | Technical Gaps | Standard Extensions Being Explored | Scalability Concerns | Cost/Benefit Analysis | Industry Collaboration | Long-Term Outlook Avenues
Protection and Control | High in new installations | Ultra-fast clearing times, standardized object modeling | Integration with legacy DC gear | Custom LN for DC traction needed | IEC 61850-90-6 for FLISR indicates expansions for distribution | Mostly manageable, as each station covers a limited number of IEDs | Positive ROI in the long run; short-term high investment | Joint pilots by OEMs and transit agencies | Key segment likely to see continuous growth
GOOSE-Based Signaling | Moderate usage primarily in AC side | Millisecond-level event-driven control | Fine-tuning of network redundancy and VLANs | Overlapping VLAN, QoS configurations | Railway-specific LN expansions | Challenging in large city-wide systems but feasible | Often justified by improved reliability | Vendor-driven updates, global standardization | Expanding as reliability demands escalate
Central Monitoring and SCADA Integration | High for new lines, partial retrofit in legacy lines | Standardized data acquisition, unified engineering | Coordinating data from diverse device types | Full mapping for older DC devices incomplete or vendor-proprietary | IEC 61850 in traction automation is under development beyond substation | System-level expansions in large subway networks | High initial cost, strong ROI in O&M savings over time | Partnerships with system integrators, local operators | Indispensable for future expansions in subways and electrified rail
Condition Monitoring | Emerging interest | Uniform data model, improved predictive maintenance | Retrofitting sensors into older assets | Data volume and storage infrastructure | IEC 61850 extension for advanced sensors | Potentially large as new sensor deployments scale out | Potentially high initial investment, offset by O&M savings in the short term | Potential synergy with predictive maintenance platforms | Highly promising with AI strategies for improved reliability and safety
Integration with MASs | Limited yet growing interest | Common data exchange framework with fast protocols | Achieving consistent LN naming and object structures for all AC/DC | Mapping agent tasks to LN objects remains a major barrier | Under development: bridging railway LN with agent logic (90-16 etc.) | Potentially seamless if standard LN definitions are carefully extended | Long-run ROI promising; short-run complexity high | Consortia for MAS-based standardization efforts are essential | Significant synergy with advanced self-healing capabilities
Table 12. Key challenges and research potential for IEC 61850 in subway environments.
Core Challenge | Real-World Impact | Underlying Cause | Potential Mitigations | Ongoing R&D Directions | Standardization Gaps | Implementation Costs | Stakeholder Collaboration | Scalability | Long-Term Opportunities
Retrofitting Legacy Equipment | High in older lines with minimal budgets | Non-compatible IEDs, vendor-proprietary protocols | Gateway solutions, incremental hardware integration | Developing specialized engineering profiles | Limited LN definitions for DC traction elements and bridging logic | Medium (moderate hardware outlay) | Transit agencies, system integrators, vendors | Potentially low with well-planned modular upgrades | High, can extend system lifespan with minimal disruptions
Security Threats to GOOSE/MMS | Potentially extreme disruptions to train services | Larger digital footprint, IP-based communications | Adoption of secure substation gating, role-based access, encryption, intrusion detection frameworks | Enhancing GOOSE, MMS security, emerging best practices | Not fully integrated into IEC 61850 documents | High (due to specialized cybersecurity hardware and software) | Collaboration with cybersecurity experts, standard bodies | High, as networks grow and more connected components are added | Enhanced reliability and passenger safety
Complexity of LN/DO/DA Mapping | Risk of misconfigurations, hamper reliability | Many LN, DO, DA with cryptic nomenclatures in large networks | Rigorous workforce training, improved SCL tools, compliance checks, advanced engineering software | Tools for automated SCL checks, global LN expansions for traction | Some LN expansions not widely implemented globally | Medium | Vendor synergy crucial to ensure consistent naming and modeling | Medium, partial automation feasible for new lines with minimal manual overhead | Streamlined expansions for new lines with minimal manual overhead
High-Speed Communication | Vital for self-healing performance but can be limited | GOOSE, SMV traffic congestion or suboptimal routing | QoS management, VLAN segmentation, robust redundancy protocols | Designing advanced traffic shaping and ring redundancy protocols, ongoing expansions for time synchronization | PTP profiles for time synchronization in traction context | Moderate to high (switches, network) | Industry alliances and communication vendors bridging train operators | High if network architecture is well designed from the start | Real-time detection and response enhancements
Table 13. Practical deployment cases for MAS–IEC 61850 in subway systems.
Deployment Scenario | Network Configuration | MAS Scope of Control | IEC 61850 Layer Usage | Integration Complexity | Performance Requirements | Initial Investment | Scalability | Stakeholder Involvement | Long-Term Feasibility
Full Greenfield | New lines with fully digital substations and advanced control | End-to-end, covering AC and DC feeders, station-based MASs, protection, MMS for supervisory tasks | GOOSE for real-time protection, MMS for supervisory tasks | Medium (designed from scratch to incorporate MASs+IEC 61850 synergy) | High reliability, ultra-low latency needed | Relatively high, but offset by lower future O&M costs | High potential to add new stations, lines seamlessly | Turnkey solution from major manufacturers | Very high; standardized frameworks remain relevant for decades
Partial Retrofit | Existing lines with some digital and some analog equipment | Focus on station-based AC feeder switching or DC traction auto-reconfiguration | GOOSE bridging older and modern devices, MMS for SCADA overlay | High, due to mismatch among legacy gear, older comm protocols, and new LN definitions | Moderate reliability, improved reaction times needed | Moderate to high, given new hardware (protocol gateways, partial re-wiring) | Potentially moderate expansions if planning is carefully carried out | Collaboration with domain experts, vendor interoperability | Feasible, but requires methodical phase-based approach
Station-Focused Deployment | Only substation-level architecture with ring or dual LAN | MASs handle localized fault detection and equipment monitoring | GOOSE for protective actions, limited MMS to station-level agent | Medium, as the scope is confined, but integration with existing station IEDs | Local reliability within stations, no wide-area reconfiguration | Low to moderate, as fewer devices require standard LN definitions | Limited but can be scaled to multiple stations | Typically station staff plus specialized MAS integrators | High for localized improvements; partial but effective
Table 14. Future R&D themes for MAS–IEC 61850 convergence in subway networks.
R&D Focus | Current Exploration | Anticipated Challenges | Proposed Solutions | Dependencies | Potential Impact | Multi-Domain Synergies | Stakeholder Cooperation | Funding Opportunities | Roadmap Timeline
AI-Driven Prediction and Forecasting | Pilot implementations with limited scope | Data heterogeneity from AC/DC equipment and sensor types | Centralized big data platforms ingesting LN data, advanced AI models | 5G networks, real-time analytics in agent frameworks | High, can drastically improve self-healing efficiency | Overlap with condition monitoring and dynamic control | Joint programs among subway operators, OEMs, AI developers | Government grants, private R&D funding | 3–5 years to robust field deployments
Integration with Edge and Fog Computing | Conceptual stage in some research labs | Edge computing infra cost, cybersecurity issues, standard APIs | Deploy compact agent hardware, containerized LN-based virtualization | Real-time operating systems, advanced QoS management | Moderate to high, can reduce latency, enhance resilience | IoT-driven station automation, synergy with AI services | Need synergy between computing and utility standardization bodies | Industry–academia partnerships | 3–7 years for large-scale acceptance
Cybersecurity Frameworks for GOOSE and MMS | Early adoption of stronger encryption or role-based access | Full encryption for GOOSE might hamper performance | GOOSE extension with minimal overhead, role-based access for MASs | Next-gen cryptography frameworks, post-quantum cryptography | Extremely high, especially for vital city infrastructure | Cross-pollination from IT and OT sectors on intrusion detection | Government-level directives, transit agencies, vendors | Dedicated security R&D initiatives, global standard bodies | Ongoing evolution as standards and threats escalate
Standard Extensions for DC Traction LN | Ongoing expansions to define new LN classes in IEC 61850-7 series | Fragmented LN coverage for DC traction, inconsistent vendor support | Formal LN definitions for DC traction, synergy with existing AC LN | Collaboration among large railway operators, standard committees | High, bridging the gap between AC substation standards and DC-based railway standards | Closer alignment with railway committees and bodies, advanced pilot programs | WG-level involvement from IEC and IRIS-like organizations | Possibly large from major suppliers, government R&D programs | 2–5 years to finalize LN amendments, testing in pilot projects
Table 15. Key technologies for substation-level self-healing.
Technology | Application Scenario | Degree of Implementation | Differences Between Systems | Future Prospects | Issues to Address | Research Potential | Reliability Impact | Cost Considerations | Maintenance Requirements | Impact on Power Quality
Fault Detection Algorithms | Detection of electrical faults | Moderate | Varies by system type | High | False positives, sensitivity | High | Significant | Low | Medium | High
Remote Control and Isolation | Isolation of faulty segments | High | Advanced systems available | Moderate | Communication delays | Moderate | High | Medium | Low | High
Automated Reconfiguration | System restoration after isolation | High | Varies in implementation | High | Delays in reconfiguration | High | High | High | Low | Medium
Predictive Maintenance Systems | Fault prediction and maintenance | High | Available in some systems | High | Data accuracy | High | High | Medium | High | High
Real-Time Monitoring Systems | Continuous system monitoring | Very High | Available in all modern systems | Very High | Requires significant infrastructure | High | Very High | High | High | Very High
AI-Based Reconfiguration | Dynamic system restoration and reconfiguration | Moderate | New and emerging technology | Very High | Data communication delays | High | Very High | High | Medium | Very High
Table 16. Comparative analysis of line-level self-healing mechanisms in subway power networks.
Mechanism | Application Scenario | Degree of Implementation | Differences Between Systems | Future Prospects | Issues to Address | Research Potential | Reliability Impact | Cost Considerations | Maintenance Requirements | Impact on Power Quality
Ring Network Reconfiguration | Network reconfiguration after faults | High | Varies by design | High | Risk of power surges | High | High | Medium | Medium | High
Automated Switches | Fault isolation and rerouting | High | Available in most systems | Moderate | Time delays in operation | Moderate | High | Medium | Low | Medium
MAS-Based Decision Making | Real-time fault management | Moderate | Innovative for smart grids | High | Data communication overhead | High | Significant | Low | Medium | High
Fault Detection Sensors | Detecting and locating faults | High | Critical for efficient systems | High | False negatives | Moderate | High | Medium | High | High
Real-Time Load Balancing | Dynamic balancing of network load | High | Can differ across systems | Very High | Coordination complexity | High | High | High | Low | High
Adaptive Rerouting | Adjusting power flow dynamically | High | Cutting-edge technology | Very High | Requires high data throughput | High | Very High | High | Medium | Very High
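The ring network reconfiguration entry in Table 16 can be illustrated with a toy model. The following is a minimal sketch, assuming a hypothetical five-node ring fed from a single source with one normally open tie section. The node names, the `energized` and `reconfigure` helpers, and the topology are illustrative assumptions and are not taken from any system reviewed here. It opens the faulted section, closes the tie section, and checks by breadth-first search which nodes are supplied again.

```python
from collections import deque

def energized(sections, closed, source):
    """Return the set of nodes reachable from the source through closed sections."""
    adjacency = {}
    for name, (a, b) in sections.items():
        if name in closed:
            adjacency.setdefault(a, []).append(b)
            adjacency.setdefault(b, []).append(a)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def reconfigure(sections, closed, tie, faulted, source):
    """Open the faulted section, then close the tie if that restores more nodes."""
    isolated = closed - {faulted}
    before = energized(sections, isolated, source)
    after = energized(sections, isolated | {tie}, source)
    return (isolated | {tie}) if len(after) > len(before) else isolated

# Hypothetical ring SRC-A-B-C-D-SRC; section "D-SRC" is the normally open tie.
SECTIONS = {
    "SRC-A": ("SRC", "A"), "A-B": ("A", "B"), "B-C": ("B", "C"),
    "C-D": ("C", "D"), "D-SRC": ("D", "SRC"),
}
NORMAL = {"SRC-A", "A-B", "B-C", "C-D"}

new_closed = reconfigure(SECTIONS, NORMAL, tie="D-SRC", faulted="A-B", source="SRC")
restored = energized(SECTIONS, new_closed, "SRC")
print(sorted(new_closed))
print(sorted(restored))
```

The sketch ignores electrical constraints such as the power-surge risk listed in the table; a real scheme would check loading and protection settings before closing the tie.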
Table 17. Comparative analysis of cross-layer fault recovery mechanisms in subway power networks.
Mechanism | Application Scenario | Degree of Implementation | Differences Between Systems | Future Prospects | Issues to Address | Research Potential | Reliability Impact | Cost Considerations | Maintenance Requirements | Impact on Power Quality
Hierarchical Recovery | Coordinated fault recovery | High | Varies across systems | Very High | System complexity | High | Very High | High | High | Very High
Cross-Layer Isolation | Fault isolation across multiple layers | Moderate | New and emerging systems | High | Data consistency | High | High | Medium | High | High
MASs for Multi-Layer Coordination | Distributed decision making in recovery | High | Requires data consistency | Very High | Communication delays | High | Very High | High | Medium | Very High
Adaptive Load Balancing | Balancing load across multiple levels | High | Critical for large systems | Very High | Requires real-time data | High | High | High | Low | High
Data Integration Systems | Integration of data across multiple layers | Moderate | Critical for decision making | High | Data synchronization issues | High | High | High | Medium | Very High
Predictive Recovery Algorithms | Anticipating faults and recovery actions | Moderate | Experimental in some systems | High | Lack of accurate models | Moderate | High | Low | High | Very High
Table 18. AI-driven fault diagnosis and reconfiguration in subway power systems.
Mechanism | Application Scenario | Degree of Implementation | Differences Between Systems | Future Prospects | Issues to Address | Research Potential | Reliability Impact | Cost Considerations | Maintenance Requirements | Impact on Power Quality
Hierarchical Recovery | Coordinated fault recovery | High | Varies across systems | Very High | System complexity | High | Very High | High | High | Very High
Cross-Layer Isolation | Fault isolation across multiple layers | Moderate | New and emerging systems | High | Data consistency | High | High | Medium | High | High
MASs for Multi-Layer Coordination | Distributed decision making in recovery | High | Requires data consistency | Very High | Communication delays | High | Very High | High | Medium | Very High
Adaptive Load Balancing | Balancing load across multiple levels | High | Critical for large systems | Very High | Requires real-time data | High | High | High | Low | High
Data Integration Systems | Integration of data across multiple layers | Moderate | Critical for decision making | High | Data synchronization issues | High | High | High | Medium | Very High
Predictive Recovery Algorithms | Anticipating faults and recovery actions | Moderate | Experimental in some systems | High | Lack of accurate models | Moderate | High | Low | High | Very High
Table 19. Key dimensions of AI-enhanced fault diagnosis and prognostics in subway power systems.
Application Scenario | Implementation Stage | Distinct Features | Prospects | Challenges | Future Research Potential | Data Requirements | Level of System Impact
Transformer Monitoring | Emerging | AI-based sensor data fusion | Extended component lifespan | Lack of labeled failure data | Transfer learning for rare faults | High-frequency sensor logs | Medium to high
Switchgear Fault Detection | Intermediate | Real-time analytics at the edge | Rapid fault isolation | Complexity of multi-vendor systems | Federated learning for distributed sites | Medium-volume event-driven data streams | High
Cable Insulation Prognostics | Early Adoption | Predictive modeling via ML | Reduced unplanned outages | Environmental variability | Hybrid physics-data-driven approaches | Continuous partial discharge measurements | Medium
Substation Asset Management | Emerging | Digital twins with AI forecasting | Intelligent maintenance scheduling | Integration with legacy SCADA | Explainable AI for operator trust | Historical maintenance and operation logs | High
Voltage/Current Anomaly Alerts | Intermediate | DL-based pattern recognition | Near-instant fault recognition | High false-positive risk | Active learning with operator feedback | High-frequency waveform data | Medium to high
Power Converter Monitoring | Early Adoption | CNN-based image analysis | Improved reliability and efficiency | Model interpretability | Domain adaptation from similar systems | Thermal imagery, high-speed sensor data | Medium
Passenger Load Prediction (Indirect for Fault Stress) | Experimental | AI correlation with ridership data | Load management optimization | Data privacy concerns | Multi-modal integration (ridership and power) | Access to fare collection and energy data | Low to medium
Overhead Line Wear Detection | Experimental | Computer vision for wear detection | Enhanced safety and service life | Sensor deployment challenges | UAV and robotics-based inspection | High-resolution imagery and real-time streams | Medium
Table 20. Core aspects of RL-driven self-healing in subway power systems.
Application Scenario | Adoption Level | RL Methodology | Key Advantages | Challenges | Future Research Potential | Distinct Operational Constraints | Long-Term Prospects
Feeder Reconfiguration | Pilot Studies | Deep Q-Networks | Automated fault isolation | Large state-action space | Transfer learning from simulation | Voltage, current, and safety margins | High, with proven pilot successes
Load Balancing in Peak Hours | Emerging | Policy Gradient | Dynamic response to changing demand | Maintaining service continuity | Meta-RL for rapid policy updates | Passenger safety, train schedules | High, crucial for growing urban demands
Multi-Substation Coordination | Experimental | Cooperative RL | Global optimization of resources | Communication overhead among agents | Hierarchical RL for layered coordination | Data exchange and synchronization | Medium to high, depends on standardization
Integration with MASs | Early Adoption | A2C, PPO | Distributed, scalable intelligence | Complexity of multi-agent negotiations | Hybrid MAS–RL frameworks | Cybersecurity for distributed agents | High, synergy with self-healing goals
Emergency Fault Recovery | Proof-of-Concept | Model-Based RL | Preemptive planning and quick restore | Ensuring real-time updates of system model | Real-time digital twins and advanced simulation | Strict time constraints | Medium, requires robust real-world data
Autonomous Voltage Regulation | Conceptual Studies | Off-policy RL | Reduces human oversight | Risk of instability if policy is incorrect | Offline learning with partial environment models | Regulatory compliance and device limitations | Medium, depends on regulatory acceptance
Signaling-Power Coordination | Emerging | Multi-Agent RL | Holistic approach to operational safety | Complexity of multi-objective optimization | Cross-domain RL frameworks for signal and power data | Interdisciplinary data standards | High, synergy between power and signaling
Energy Storage Management | Preliminary | Hybrid RL | Minimizes energy costs and improves reliability | Uncertain battery degradation profiles | Transfer and continual learning for battery health | Battery lifetime and cost constraints | Medium, depends on cost-effectiveness
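To make the feeder reconfiguration row of Table 20 more concrete, the sketch below trains a tabular Q-learning agent on a toy fault-recovery task. This is a deliberate simplification: the table names Deep Q-Networks, whereas this example swaps in a lookup table because the toy problem has only four states. The states, actions, rewards, and hyperparameters are all illustrative assumptions, not values from the reviewed studies.

```python
import random

ISOLATE, CLOSE_TIE, WAIT = 0, 1, 2                 # hypothetical action set
FAULTED, ISOLATED, RESTORED, FAILED = 0, 1, 2, 3   # hypothetical states

def step(state, action):
    """Toy transition model: returns (next_state, reward, done)."""
    if state == FAULTED:
        if action == ISOLATE:
            return ISOLATED, -1.0, False
        if action == CLOSE_TIE:
            return FAILED, -10.0, True   # tie closed onto a live fault
        return FAULTED, -1.0, False
    # state == ISOLATED: only closing the tie restores supply
    if action == CLOSE_TIE:
        return RESTORED, 10.0, True
    return ISOLATED, -1.0, False

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning over the toy model."""
    rng = random.Random(seed)
    q = [[0.0] * 3 for _ in range(4)]
    for _ in range(episodes):
        state = FAULTED
        for _ in range(20):  # cap episode length
            if rng.random() < epsilon:
                action = rng.randrange(3)
            else:
                action = max(range(3), key=lambda a: q[state][a])
            nxt, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

q_table = train()
policy = {s: max(range(3), key=lambda a: q_table[s][a]) for s in (FAULTED, ISOLATED)}
print(policy)
```

In this toy setting the learned greedy policy isolates the fault first and closes the tie second. The large state-action space noted in the table is the reason practical studies move from a lookup table to a neural approximator.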
Table 21. Implications of integrating AI, MASs, and IEC 61850 in subway power systems.
Integration Dimension | Current Maturity Level | MAS Role | IEC 61850 Feature | AI Contribution | Key Advantages | Major Challenges | Long-Term Potential
Substation Automation | Intermediate | Agents for local protection schemes | GOOSE for event-driven messaging | Predictive analytics for fault detection | Faster response, standard data modeling | Ensuring backward compatibility, vendor constraints | Very high (foundational for self-healing)
Energy Routing and Sharing | Emerging | Distributed load management agents | MMS-based data exchange | RL for optimal scheduling | Improved energy efficiency, reduced costs | Coordination complexity | High, particularly for integrated urban networks
Real-time Fault Recovery | Early Adoption | Negotiation protocols among agents | Sampled values for high-fidelity data | Fast reconfiguration decisions | Autonomous fault isolation and restoration | Handling concurrency and data bursts | High, improves reliability and safety
Predictive Maintenance | Intermediate | Coordination among asset-level agents | Structured data for sensor readings | ML-based asset health predictions | Reduced downtime, extended equipment life | Integrating with legacy diagnostic systems | Medium to high, reliant on data quality
Resilience under Cyberattacks | Conceptual Studies | Agents implementing security policies | Role-based access control features | Anomaly detection in network traffic | Enhanced system security and resilience | Evolving threat landscape | Medium, but essential for modern systems
Integration of Renewables | Pilot | Agents to manage local generation | Enhanced IEC 61850 DER profiles | Multi-objective optimization (AI) | Reduced carbon footprint, diversified energy mix | Uncertainty in renewable supply | Medium to high, synergy with green policies
Network-wide Optimization | Emerging | Hierarchical MAS architecture | Interoperability across devices | Graph-based AI for global optimization | Holistic approach to load flow and reliability | Scalability of centralized–decentralized hybrids | Very high, potential step-change in performance
Human–Machine Collaboration | Early Research | Agent-based interactive dashboards | Standardized data for UI integration | Explainable AI for operator guidance | Improved situational awareness and operator trust | Complexity in interface design, training staff | Medium, fosters acceptance of AI decisions
Table 22. Core dimensions of cybersecurity, privacy, and data management in AI-driven subway power systems.
Focus Area | Risk Level | Primary Threats | Key Security Measures | Privacy Considerations | Data Governance Needs | Implementation Complexity | Future Development Outlook
Data Poisoning Prevention | High | Altered training datasets | Trusted data pipelines, robust data validation | Minimizing personal data usage | Clear ownership of dataset curation and updates | Medium to high | More advanced anomaly detection
Intrusion and Ransomware Defense | Very High | Unauthorized system access, encryption of operational data | Multi-factor authentication, network segmentation | Potential exposure of passenger data | Comprehensive role-based permissions | High | Zero-trust architectures, advanced IDS
Sensor and Edge Device Security | Medium | Spoofing or tampering with local sensors | Secure hardware modules, signed firmware updates | Minimal retention of localized data | Local data lifecycle policies | Medium | Widespread adoption of secure edge computing
Privacy-Preserving Analytics | High | Inadvertent personal data gathering | Differential privacy, anonymization, secure computation | Regulatory compliance (GDPR, etc.) | Clear guidelines on data usage and sharing | Medium | Adoption of standard privacy frameworks
Encryption and Secure Protocols | High | Eavesdropping on communication channels | End-to-end encryption, TLS-based solutions | Minimal stored passenger identifiers | IEC 61850 extension with security profiles | Medium | Integration with post-quantum algorithms
Data Lifecycle Management | Low to Medium | Retention of outdated or unverified data | Automated purging, archiving policies | Proper anonymization before storage | Centralized metadata, consistent versioning | Medium | Growth of advanced data-lake solutions
Incident Response and Recovery | Very High | Prolonged downtime, compromised assets | Redundant backups, well-defined playbooks, real-time alerts | Data subject notifications if breach occurs | Legislative alignment with local government policies | High | AI-driven automated containment solutions
Compliance and Certification | Medium | Penalties for non-compliance | Frequent audits, standardized frameworks | Transparent privacy statements | Align with international standards (ISO, IEC) | Medium | Greater emphasis on multi-stakeholder certification
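Table 22 lists differential privacy among the measures for privacy-preserving analytics. As a minimal sketch of what that can mean in practice, the example below applies the Laplace mechanism to a single ridership count before it is shared with a load-forecasting agent. The count value, the `private_count` helper, and the choice of sensitivity 1 (one passenger changes the count by at most one) are assumptions made for illustration.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count under epsilon-differential privacy (Laplace mechanism)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # Noise scale grows as the privacy budget epsilon shrinks.
    return true_count + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(42)
hourly_entries = 1280  # hypothetical station entry count
noisy = private_count(hourly_entries, epsilon=0.5, rng=rng)
print(round(noisy, 1))
```

A smaller `epsilon` gives stronger privacy but noisier counts, so the budget would have to be chosen against the accuracy that load prediction actually needs.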
Table 23. Socio-economic and operational dimensions of AI-driven subway power systems.
Dimension | Operational Impact | Economic Influence | Policy Framework | Workforce Implications | Urban Development | Challenges | Long-Term Outlook
Reliability and Service Quality | Fewer disruptions, faster recovery | Higher ridership, reduced compensation costs | Safety regulations, service standards | Shift toward strategic oversight | Public trust in mass transit | Ensuring AI reliability and acceptance | High, fosters public adoption of subways
Cost Structure and Funding | Reduced O&M expenses, capital reallocation | Potential new revenue streams (data monetization) | PPP frameworks, capital incentives | Demand for financial data analysts | Reinforcement of transit networks | Cost of AI integration | Medium to high, dependent on ROI
Workforce Transition | Automated routine tasks, improved safety | Indirect cost savings from fewer human errors | Labor guidelines, reskilling grants | Need for data science and AI specialists | Enhanced system stability | Resistance to change, union negotiations | Medium, requires policy and education synergy
Environmental Sustainability | Energy optimization, synergy with renewables | Lower carbon footprint, positive brand image | Green certifications, carbon credits | Additional roles for sustainability officers | Integration with EV infrastructure | Gaps in grid readiness, technology unproven in some areas | High, part of broader city climate goals
Innovation and Technology Ecosystem | Faster deployment of advanced solutions | Stimulates local tech sectors, fosters start-ups | IP regulations, open data policies | Collaborative R&D roles across universities and operators | Enhanced city-wide innovation | Balancing proprietary and open-source solutions | High, strong synergy with digital economy
Urban Resilience | Quick adaptation to unexpected events | Reduces economic losses from major incidents | Disaster preparedness rules, city planning | Cross-functional roles in risk management | Encourages stronger public transport usage | Complexity of integrating multiple infrastructures | High, critical for disaster mitigation
Regulatory Alignment | Compliance with safety/operational mandates | Avoids penalties, fosters public–private partnerships | Evolving standards for AI and data usage | Higher accountability for operators | Possible expansion of rail services | Complexity of multi-layer governance | Medium, depends on legislative agility
Public Trust and Acceptance | Transparent, real-time communication | Potential for fare policy changes, better ridership | Privacy protection, public engagement | Emphasis on communication skills in staff training | Improved passenger satisfaction | Data privacy concerns, potential for misunderstandings | High, essential for widespread adoption
Table 24. Summary of security threats, mitigation strategies, and their implementation priorities.
Security Threat | Potential Impact | Mitigation Strategy | Implementation Complexity | Priority Level
Data Poisoning | Degraded AI model performance and incorrect decisions | Secure data pipelines, anomaly detection, and data validation | High | Very High
Sensor Spoofing | Misleading data leading to incorrect fault isolation | Encryption of sensor data, anomaly detection, sensor authentication | Medium | High
Ransomware and DoS Attacks | Disruption of system functionality and operational downtime | Regular backups, intrusion detection systems, secure communication | High | Very High
Unauthorized Access | Compromised system control and decision-making | Multi-factor authentication, access control policies, secure protocols | High | Very High
Communication Latency | Delayed fault detection and recovery | Edge computing, low-latency communication protocols | Medium | High
Spoofing of AI Decisions | Inaccurate decisions leading to system instability | Secure AI pipelines, explainable AI, anomaly detection | High | Medium
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Feng, J.; Yu, T.; Zhang, K.; Cheng, L. Integration of Multi-Agent Systems and Artificial Intelligence in Self-Healing Subway Power Supply Systems: Advancements in Fault Diagnosis, Isolation, and Recovery. Processes 2025, 13, 1144. https://doi.org/10.3390/pr13041144

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
