Article

Flexible Expansion and Deployment Architecture for Relay Protection Remote Maintenance Master Station Using Low-Code and Containerization Technologies

1 China Southern Power Grid Co., Ltd., Guangzhou 510530, China
2 Beijing Sifang Automation Co., Ltd., Beijing 100085, China
* Author to whom correspondence should be addressed.
Energies 2026, 19(9), 2113; https://doi.org/10.3390/en19092113
Submission received: 21 March 2026 / Revised: 7 April 2026 / Accepted: 8 April 2026 / Published: 28 April 2026

Abstract

Traditional relay protection remote maintenance master stations are subject to tight coupling, limited scalability, and cumbersome deployment due to monolithic architectures. This paper proposes a flexible expansion system integrating low-code and containerization technologies. Key innovations include: (1) a domain-specific low-code component library with hard-coded core functions for performance, and (2) a collaborative CI/CD pipeline linking low-code development to containerized deployment. The system adopts a four-layer decoupled architecture. Engineering applications in a provincial power grid show that the system supports over 200,000 concurrent devices, improves operation efficiency by 60%, reduces manual configuration workload by 60%, and achieves 99.99% core service availability. This research provides a systematic solution for building scalable and agile intelligent maintenance systems under new power system paradigms.

1. Introduction

With the continuous advancement of new power system construction, the scale and variety of relay protection equipment have experienced explosive growth. This poses unprecedented challenges to the functional scalability, deployment flexibility, and multi-scenario adaptability of operation and maintenance master station systems. Currently, traditional remote operation and maintenance master stations for relay protection generally face three core bottlenecks: tightly coupled architectures, rigid deployment models, and inefficient business customization [1,2]. Specifically, the deep integration of system functions with underlying platforms makes upgrades and transformations difficult and high-risk; centralized deployment solutions struggle to adapt to the diverse requirements of multi-voltage levels and multi-vendor equipment, with cumbersome processes; meanwhile, function customization heavily relies on professional developers, failing to rapidly respond to on-site operation and maintenance’s personalized demands, severely hindering the efficient implementation of intelligent operation and maintenance systems.
At the level of technological evolution, low-code technology provides possibilities for lowering development barriers and empowering business personnel to innovate independently through visual orchestration and component-based reuse [3]; containerization technology ensures cross-platform deployment consistency and elastic resource scaling through the integrated encapsulation of applications and runtime environments [4]. Although both have achieved significant progress in their respective fields, existing research and practices still exhibit notable gaps. In the field of relay protection operation and maintenance, low-code technology lacks scenario-specific component libraries and customized tools for professional scenarios, and its application remains in the exploratory stage [5,6,7]; containerization technology is also mostly limited to the isolated encapsulation of backend services [8,9], failing to form deep synergy with the rapid customization capabilities of upper-layer applications. In fact, current research predominantly focuses on efficiency improvements in specific aspects of relay protection operation and maintenance, such as real-time evaluation based on wide-area information [10] and automatic analysis of fault recordings [11]. While progress has been made in accident analysis efficiency and scope, fundamental issues like rigid functional expansion and solidified deployment patterns have yet to be addressed at the system architecture level. Therefore, the following critical research question remains: How can the agile development capabilities of low-code be systematically integrated with the flexible deployment advantages of containerization to construct an operation and maintenance master station system that supports rapid iteration, on-demand deployment, and elastic scaling? The existing literature lacks comprehensive architectural design, key technologies, and empirical research to answer this question.
Low-code technology has demonstrated significant value in power-system-related scenarios. For example, in the development of data acquisition platforms for thermal power plants, architecture design based on low-code concepts has achieved real-time data collection and efficient processing [12]. In the construction of power grid simulation analysis platforms, the visual drag-and-drop and component reuse features of low code have significantly shortened development cycles [13]. The combination of containerization and microservice decomposition has effectively addressed heterogeneous data access and elastic scaling issues in multi-source data acquisition scenarios for power systems [14], providing valuable insights for the architectural optimization of relay protection operation and maintenance master stations. Additionally, the application of low-code platforms in scenarios such as power inspection and dispatching [15,16], power grid dispatching script editing [17], demand-side control [18], and anomaly detection [19] has validated their feasibility in enhancing business customization efficiency and lowering development barriers. However, existing research has yet to form a systematic solution that deeply integrates low-code and containerization technologies for the professional scenarios of relay protection operation and maintenance; in particular, dedicated component library design and collaborative deployment mechanisms tailored to relay protection business needs are lacking. More fundamentally, the above studies suffer from two limitations. First, the application of low-code and containerization technologies remains isolated from each other: low-code platforms focus on rapid orchestration of business logic, while containerization serves deployment isolation of backend services, and an end-to-end collaborative closed loop from visual development to production operation is lacking.
Second, none of the existing work has constructed a dedicated component abstraction layer tailored to the domain-specific characteristics of relay protection (e.g., deep parsing of IEC 61850, efficient processing of fault recording files, and unified management of setting groups across multi-vendor devices). Consequently, generic low-code platforms cannot be directly adapted to relay protection operations and maintenance scenarios. Therefore, how to build a relay protection domain-specific low-code component library and achieve collaborative governance between low-code development and containerized deployment remains an unresolved key challenge.
In view of this, this study aims to break through the inherent limitations of traditional operation and maintenance master stations. The main innovations of this study are as follows:
(1)
A domain-specific low-code component library for relay protection: Unlike general-purpose low-code platforms, we propose a component construction approach combining “core functions hard-coded + peripheral functions low-code.” Core algorithms (e.g., fault analysis and setting calculation) are hard-coded for performance, while peripheral functions are low-coded for flexibility. This library covers core scenarios such as data access, business logic, and visual presentation.
(2)
A collaborative deployment mechanism between low code and containerization: Moving beyond simple microservice packaging, we designed an end-to-end CI/CD pipeline with unified configuration permission governance and a hub-and-spoke two-tier image repository architecture. This establishes a closed-loop from visual orchestration to production operation, specifically addressing the cross-regional and highly isolated characteristics of power grid dispatching.
(3)
A multi-dimensional reliable redundancy design for containerized environments: We systematically implemented high-availability mechanisms spanning application, data, network, and elastic capacity layers, including service degradation and circuit breaker strategies tailored for grid operation scenarios, to ensure core business continuity under extreme pressure.
The remainder of this paper is organized as follows. Section 2 presents the theoretical and technical foundations, including the core components of low-code and containerization technologies and the specific characteristics of relay protection operation and maintenance. Section 3 (Materials and Methods) describes the system design principles, the four-layer decoupled architecture, and the key technical implementations, including the professional low-code component library, the microservice containerization strategy, the collaborative CI/CD pipeline, and the high-availability design. Section 4 (Results) reports the experimental setup, functional verification, performance tests, and engineering application results. Section 5 (Discussion) interprets the findings, addresses security, setting group adjustment, and automatic function analysis, and discusses limitations. Section 6 (Conclusions) summarizes the main contributions and results.

2. Theoretical and Technical Foundations

To build a flexible operation and maintenance system based on “low-code customization and containerized deployment,” it is necessary to integrate three key elements: rapid application development, agile resource delivery, and deep industry adaptation. Low-code technology is the core means of improving the efficiency of business function customization and responding to personalized on-site needs; containerization technology provides functional modules with standardized delivery and operation through consistent environments and elastic scaling; and the inherent particularities of the relay protection operation and maintenance business serve as the constraints and design basis that must be respected when integrating the former two. The following sections elaborate on these three aspects.

2.1. Core Components of Low-Code Technology

The core of low-code technology lies in achieving rapid iteration of applications through “visual development and component-based reuse” [20]. Its technical support primarily relies on several key components: the visual orchestration engine allows users to configure business processes and rules via drag-and-drop configuration, with the underlying engine automatically generating executable code, thereby significantly reducing coding requirements. This aligns with the approach in power system script-editing scenarios where visual flowcharts generate code [16]. Custom reports and dashboard tools integrate multiple visualization components, enabling users to intuitively select data sources and configure display dimensions for WYSIWYG report generation. Similar functionalities have been effectively applied in power grid simulation analysis platforms. Additionally, the built-in rule engine allows users to configure complex business logic using near-natural language syntax (e.g., “trigger an alert when device temperature exceeds a threshold and persists for a specific duration”). The plugin-based extension mechanism ensures rapid integration of third-party functional modules and seamless platform connectivity through standardized interfaces. This feature has played a crucial role in expanding the functionality of power inspection and dispatch platforms [17].

2.2. Core Components of Containerization Technology

Containerization technology achieves the goal of “build once, run consistently across multiple environments” by packaging applications along with their complete dependency environments. The technology stack is based on container engines (such as Docker), which create lightweight images to ensure cross-environment consistency. In production deployments, container orchestration platforms (such as Kubernetes) handle container scheduling, elastic scaling, and fault recovery, serving as the key to dynamic resource management. This architecture has been proven effective in supporting massive data access scenarios in power system multi-source data collection platforms [14]. Building upon this, service meshes (such as Istio) efficiently manage communication between containers, providing capabilities like service discovery, traffic control, and circuit breaking to enhance the reliability of microservices architectures. Additionally, image registries (such as Harbor) offer centralized support for versioned image storage and security scanning, while network plugins (such as Calico) ensure stable and efficient cross-node container communication. Together, these components form a comprehensive ecosystem for containerized deployment and operations.

2.3. Relay Protection Operation and Maintenance Master Station Business Characteristics

The operational scenarios of the relay protection operation and maintenance master station exhibit distinct power industry-specific characteristics and high demands. Firstly, the system requires extremely high reliability. Functions such as fault monitoring and alarm notifications must possess redundant backup and rapid self-repair capabilities to ensure the safe and stable operation of the power grid. This aligns with the requirements for real-time and stable data transmission in thermal power plant data acquisition platforms [12]. Simultaneously, the system must maintain low response latency to meet real-time operational needs. Secondly, the operational scenarios demonstrate significant complexity and heterogeneity. The master station must adapt to diverse operational requirements across voltage levels ranging from 110 kV to 1000 kV, as well as equipment from various manufacturers, while also being compatible with multiple power communication protocols. Similar challenges in handling multi-source heterogeneous data have been widely addressed in power system data acquisition scenarios. Finally, the data processed are multi-source and heterogeneous, encompassing structured data such as equipment records and setting values, semi-structured configuration files like SCD/CID, and unstructured data such as fault recording files and massive event logs. This poses comprehensive challenges for data access, processing, and integration, which aligns with the low-code platform’s need to handle multiple data types in power systems.

3. Materials and Methods

3.1. System Design Principles and Their Architectural Mapping

The system design follows five core principles to ensure its advancement, practicality, and reliability. Low coupling and high cohesion are the architectural cornerstones, requiring functional modules to be independently encapsulated as microservices and communicated through standardized interfaces such as RESTful APIs, thereby reducing direct dependencies between modules. This principle has proven effective in the design of microservice architectures for power systems [17]. Flexible expansion is the core objective, aiming to support rapid visual customization of functions through low-code platforms and rely on containerization technology to achieve elastic scaling of computing resources, thereby flexibly adapting to the growth of equipment scale and changes in business requirements. Similar approaches have been applied in data acquisition platforms for thermal power plants and grid simulation analysis platforms. High-reliability redundancy is the safety baseline, requiring core components to be deployed in active-standby configurations with automatic failover and multi-replica storage of critical data to eliminate single-point failure risks. This aligns with the stringent requirements for data transmission and storage reliability in power systems. Standardized compatibility is the implementation guarantee. The system must adhere to industry standards and specifications (such as the Southern Grid’s “Technical Specifications for Standard Services of Main Station Systems for Relay Protection”) and be compatible with mainstream equipment communication protocols like IEC 61850 and MMS, ensuring smooth integration with existing environments. This is a critical prerequisite for the successful implementation of power system technologies. 
Finally, security and controllability are maintained throughout, with comprehensive safeguards for operational and system security through user-level permission controls, full-chain operation audit tracing, and container image security scanning. Similar security mechanisms have been emphasized in low-code platform applications within the power industry [20]. The data middle platform is responsible for cleaning, integrating, and service-oriented publishing of multi-source heterogeneous data. Similar microservice-based comprehensive data management platforms have been successfully implemented in power dispatching [17].

3.2. Four-Layer Decoupled Architecture

Based on the aforementioned principles, a four-layer architecture of “Infrastructure Layer-Platform Service Layer-Application Layer-User Layer” has been constructed, as shown in Figure 1. This architecture aims to systematically support the entire process of “low-code customization, containerized deployment, and flexible expansion.”
The infrastructure layer serves as the foundational support, providing elastic hardware and network resources. Computing resources consist of physical servers and cloud server clusters, offering scalable CPU and memory capabilities for container operations. Storage resources adopt a hybrid architecture, utilizing relational databases (e.g., MySQL V8.4) for structured data such as equipment records, time-series databases (e.g., InfluxDB V3.8) for real-time monitoring data, and distributed file systems (e.g., HDFS V3.4) for unstructured data like fault recordings. Network resources leverage the Calico network plugin to enable efficient communication between containers, while employing load balancers and network policies to ensure traffic distribution and security isolation.
The platform service layer is the technical core of the system, integrating two key capabilities: low-code development and containerized operations. The core services of the low-code platform (such as visual orchestration, rule engine, and reporting services) provide tool support for the rapid customization of business functions. Containerized management services (such as image repository integration, Kubernetes-based orchestration, service discovery, and the configuration center) enable automated management of the entire lifecycle of microservice applications. Among these, containerized version control adopts an “image repository-container cluster-version dashboard” tripartite architecture, forming a visual “strategic map” that covers the entire version lifecycle, supporting global visibility, one-click positioning, and integrated governance. Additionally, public support services (including unified identity authentication, RBAC-based permission control, log auditing, etc.) provide secure and reliable general technical support for upper-layer applications. In terms of cross-security zone access, through the deployment of link data proxies and isolated communication modules, secure data exchange between Zone 1 and Zone 3 is achieved under weak security zone configurations, currently supporting the 103 protocol and compatible with IEC 61850 substation access.
The application layer directly carries specific operation and maintenance business functions. Core operation and maintenance application modules (such as equipment monitoring, fault analysis, and setting verification) exist as independent APPs, developed through a low-code platform and encapsulated in containers. Extended application modules are managed and distributed through a unified app store, supporting the integration of third-party-developed personalized function plugins. This layer also constructs a data middle platform responsible for cleaning, integrating, and service-oriented publishing of multi-source heterogeneous data, providing consistent and reliable data services for all upper-layer applications.
The user layer is the interaction interface between the system and various users. Based on the RBAC model, this layer provides different operation permissions and functional views according to the responsibilities of different roles (such as maintenance personnel, dispatchers, and system administrators). In terms of access terminals, it is fully compatible with PC browsers and mobile APPs to meet diverse office needs on-site and remotely. Users primarily interact with the system through three entry points: a low-code customization interface for independent function development, an app store for acquiring and updating applications, and an operation and maintenance management backend for system monitoring and configuration management.

3.3. Construction of Relay Protection Professional Low-Code Component Library

To enable power operation and maintenance personnel to rapidly customize functions, it is crucial to build a professional low-code component library tailored to relay protection business scenarios. This paper proposes a component construction approach combining “core functions hard-coded + peripheral functions low-code”, primarily based on the following considerations:
(1)
Performance and efficiency requirements: Core functions (such as fault analysis, setting calculation, and real-time data processing) demand extremely high computational performance and response efficiency. Implementation of hard coding can fully leverage compilation optimization and underlying resource scheduling capabilities to ensure millisecond-level response and high-throughput processing.
(2)
Professionalism and complexity: Relay protection business logic is highly specialized, with complex algorithms, and often involves deep interaction with underlying devices and proprietary protocols. Traditional coding methods are more conducive to achieving fine-grained control, exception handling, and long-term maintenance.
(3)
Stability and safety reliability: Core functions are often critical paths of the system, and their stable operation directly impacts grid safety. Hard coding facilitates static code analysis, security audits, and version solidification, aligning with the strict constraints of power systems for high reliability and safety compliance.
(4)
Data access and processing depth: Core functions typically handle raw messages, recording files, and real-time streaming data directly, requiring retention of complete data attributes and processing chains. Hard coding is more advantageous for implementing deep parsing, real-time filtering, and high-precision calculations.
Based on the above principles, the component library construction process includes the following key steps:

3.3.1. Functional Decoupling and Modular Splitting

First, decouple and meticulously split the various functions of the relay protection master station. Taking the front-end access function as an example, it can be divided into sub-functional modules such as front-end configuration, link communication, data transmission and reception, data parsing and processing, and data forwarding. Each sub-function is further divided into finer-grained service interfaces. For instance, link communication can be split into interface services like establishing connections, initialization, disconnecting connections, maintaining connections, channel master-standby management, and channel monitoring, thereby forming hierarchical and single-responsibility functional units.
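One of the single-responsibility interfaces listed above, channel master-standby management, can be sketched as follows. The class and its heartbeat protocol are illustrative assumptions, not the system's actual interface definition.

```python
class ChannelManager:
    """Channel master-standby management as a single-responsibility service:
    fail over to the standby channel when the active one stops responding."""

    def __init__(self) -> None:
        self.active = "primary"

    def on_heartbeat(self, alive: dict[str, bool]) -> str:
        """Given per-channel liveness, return (and possibly switch) the active channel."""
        if not alive.get(self.active, False):
            standby = "standby" if self.active == "primary" else "primary"
            if alive.get(standby, False):
                self.active = standby  # fail over only when the other side is healthy
        return self.active
```

Note that once failed over, the manager stays on the new channel rather than flapping back, a common design choice for communication-link stability.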

3.3.2. Component Encapsulation and Categorization

We consolidated functionally related modules, sub-functions, and service interfaces into unified functional components and categorized them into four major types based on business scenarios:
(1)
Data Access Components: Encapsulate communication and parsing logic for power standards such as IEC 61850 and MMS, supporting rapid adaptation and data collection for multi-vendor devices;
(2)
Business Function Components: Implement core algorithms like fault analysis, setting verification, and remote configuration through hard coding to ensure processing efficiency and reliability;
(3)
Visualization Components: Provide graphical display modules such as device topology, waveform recording, and data dashboards, enabling flexible configuration and drag-and-drop generation by business personnel;
(4)
Logic Configuration Components: Support visual configuration of alarm rules, business workflows, and other logic through rule engines and process orchestration.

3.3.3. Standardized Interface Design and Publication

Each component provides standardized, documented RESTful API interfaces externally, using JSON as the unified data exchange format. The interfaces clearly define request/response structures, parameter descriptions, and invocation examples. Component interfaces are published and managed through a unified platform, accompanied by detailed user manuals and examples to support low-code developers in efficient invocation and integration.
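A uniform JSON request/response envelope of the kind described above might look like the following sketch. The field names (`code`, `message`, `data`, `component`, `action`, `params`) are illustrative assumptions; the actual schema is defined by the platform's interface documentation.

```python
import json


def make_response(code: int, message: str, data=None) -> str:
    """Serialize a uniform JSON response envelope (hypothetical field names)."""
    return json.dumps({"code": code, "message": message, "data": data},
                      ensure_ascii=False)


def parse_request(body: str) -> dict:
    """Validate that a component request carries the expected fields."""
    request = json.loads(body)
    for required in ("component", "action", "params"):
        if required not in request:
            raise ValueError(f"missing field: {required}")
    return request
```

Publishing every component behind one such envelope is what lets the low-code orchestrator invoke heterogeneous components without per-component glue code.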

3.3.4. General Framework Component Support

To enhance component reuse and development efficiency, general framework components are provided for common business processes. For example, for front-end access under different protocols, a universal “Front-End Access Framework Component” can be provided, encapsulating common processes such as connection establishment, data transmission/reception, parsing, and forwarding. This allows developers to focus only on protocol-specific parsing and other differentiated aspects, significantly reducing redundant development efforts.
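The framework-component idea, a shared access pipeline where only protocol parsing is supplied per protocol, is essentially the template-method pattern. The sketch below assumes a toy "key=value" protocol purely for illustration; the real framework's interface names are not specified in the text.

```python
from abc import ABC, abstractmethod


class FrontEndAccessFramework(ABC):
    """Generic front-end access flow (connect -> parse -> forward);
    subclasses supply only the protocol-specific parsing."""

    def run(self, raw_frames: list[bytes]) -> list[dict]:
        self.establish_connection()
        try:
            return [self.forward(self.parse(frame)) for frame in raw_frames]
        finally:
            self.disconnect()

    def establish_connection(self) -> None:  # shared default: no-op in this sketch
        pass

    def disconnect(self) -> None:
        pass

    def forward(self, record: dict) -> dict:  # shared default: pass-through
        return record

    @abstractmethod
    def parse(self, frame: bytes) -> dict:
        """The only hook a protocol adapter must implement."""


class KeyValueAccess(FrontEndAccessFramework):
    """Hypothetical protocol whose frames are ASCII 'key=value' pairs."""

    def parse(self, frame: bytes) -> dict:
        key, _, value = frame.decode().partition("=")
        return {key: value}
```

A new protocol adapter then reduces to one `parse` method, which is exactly the redundancy reduction the framework component targets.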
Through the above construction methods, a component classification and collaboration mechanism, as shown in Figure 2, is established. Users can drag and drop components via a visual interface to configure business processes. The system automatically generates the application code and packages it into Docker images, achieving seamless integration from logical orchestration to containerized deployment. Components communicate through lightweight RESTful APIs, and their interaction latency can be modeled as the sum of fixed overhead and business processing delays. In optimized container network environments, this ensures stable and predictable performance, thereby supporting the overall system’s efficiency and reliability.
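The additive latency model stated above, per-hop fixed overhead plus each component's business processing delay, can be written directly; the function below is a restatement of that model, not a measured characterization.

```python
def chain_latency_ms(fixed_overhead_ms: float, processing_ms: list[float]) -> float:
    """Total interaction latency of a component chain: every hop contributes a
    fixed framework/network overhead plus its own business processing delay."""
    return sum(fixed_overhead_ms + delay for delay in processing_ms)
```

For example, a three-component chain with 2 ms per-hop overhead and processing delays of 10, 5, and 3 ms totals 24 ms, which stays stable as long as the container network keeps the fixed term small.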

3.4. Microservice Splitting and Containerized Packaging

To achieve flexible system scaling, the traditional monolithic master station must be reasonably split into microservices and efficiently packaged into containers. This system adopts the Domain-Driven Design (DDD) approach, combined with the characteristics of the power operation and maintenance business, to implement a dual splitting mechanism in which business domains dominate and data sharding assists.
First, service partitioning is based on business domains. We decoupled the system’s core capabilities into independent services such as data collection, device management, fault diagnosis, and setting value management, ensuring high cohesion and clear responsibilities for each service. Building on this, for scenarios involving widespread device access and data processing in relay protection operation and maintenance, further logical data sharding by substation or voltage level is performed, allowing each microservice instance to focus on processing specific data shards. This achieves dispersed computational loads and enhances the system’s overall throughput capacity.
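A stable shard-assignment function is one simple way to realize the logical sharding by substation described above. The hashing scheme below is a hypothetical illustration, not the paper's actual routing policy.

```python
import hashlib


def shard_for(substation_id: str, num_shards: int) -> int:
    """Stable data-shard assignment: the same substation always routes to the
    same microservice instance, spreading load across instances."""
    digest = hashlib.sha256(substation_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards
```

Because the mapping is deterministic, any instance (or the service mesh in front of it) can compute the owner of a substation's data without a central lookup, which is what disperses computational load across shards.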
Based on the above splitting principles, the core functionalities of the relay protection intelligent operation and maintenance master station are decomposed into dozens of microservice containers. These are deployed according to power grid security zones (e.g., Production Control Zone I and Information Management Zone III) and platform layers, with specific examples shown in Table 1. These containers employ diversified orchestration strategies such as active-standby and multi-replica configurations to meet the reliability requirements of different business components.
After completing the microservice decomposition and deployment planning, achieving standardized and maintainable containerized packaging becomes crucial. To this end, the system has designed a five-layer image construction template to ensure consistency and security from the foundational environment to business applications:
(1)
Base Operating System Layer
This layer adopts a grid-specific hardened operating system image. It integrates kernel components compliant with power safety regulations, file system hardening modules, and essential system tools. Its update frequency is low (typically 1–2 times per year), ensuring long-term stability and security compliance of the foundational environment.
(2)
Runtime Environment Layer
Above the operating system layer, a uniformly customized runtime environment, such as Java 17 or Python 3.12 environments, can be deployed. This layer is pre-configured with security policies, certificate repositories, and performance tuning parameters required by the grid, supporting dynamic updates based on development and operational needs.
(3)
Secondary Operation and Maintenance Common Library Layer
This layer encapsulates common software assets in relay protection operation and maintenance, including communication protocol libraries (e.g., IEC 61850 and 104 specifications), standardized data models, unified identity authentication, and security audit components. This layer undergoes quarterly iterative updates to ensure continuous optimization of common functionalities and vulnerability remediation.
(4)
Specific Business Function Layer
This layer carries the actual business logic images, such as protection strategy calculation, fault recording analysis, remote control, and other functional modules. This layer is updated more frequently, allowing for iterative releases of new versions based on business requirements, supporting rapid feature deployment and canary releases.
(5)
Local Configuration Layer (Dynamic Injection)
To adapt to differentiated configurations in different field environments, the system does not hardcode configurations into images. Instead, it dynamically injects configurations through Kubernetes ConfigMap or Secret during the container runtime. This approach decouples images from configurations, ensuring that the same business image can be deployed across regions and environments while meeting grid security zoning and configuration confidentiality requirements.
At the runtime level, a series of standards ensures the stability and observability of the container cluster. All business containers are based on Alpine Linux for lightweight purposes, with explicit CPU and memory resource quotas set to prevent abnormal services from exhausting cluster resources. Each container integrates Liveness and Readiness health probes, enabling Kubernetes to monitor service status in real-time and achieve self-repair for faults. Log outputs follow unified specifications and leverage the ELK (Elasticsearch, Logstash, Kibana) stack for centralized collection, storage, and analysis, providing support for operation monitoring and fault tracing.
This containerized packaging and operations system achieves standardized delivery, elastic scheduling, and efficient operation and maintenance of relay protection business applications, laying a solid technical foundation for a highly reliable and flexibly expandable intelligent operation and maintenance master station.

3.5. Low-Code and Containerized Collaborative Deployment Mechanisms

3.5.1. Core Business Processes

The system supports a complete closed-loop process from function customization to deployment and operation.
Function customization process: Users drag and drop professional components (e.g., data acquisition components) and configure parameters and business logic in the visual interface of the low-code platform. The platform then automatically generates the application code and builds container images, which are published to the application store after security scanning.
Deployment process: Users initiate deployment from the application store. The container orchestration platform automatically schedules resources, pulls images, deploys instances, and registers them to the service center, enabling applications to quickly access data and services and complete the launch.
Expansion and upgrade process: When functional iterations are needed, the low-code platform generates new version images, and updates are pushed through the application store. The container orchestration platform gradually replaces old versions via rolling upgrades, supporting canary release and rollback in case of exceptions to ensure business continuity and upgrade smoothness.

3.5.2. Automated CI/CD Pipeline

Based on the above closed-loop process, an end-to-end automated continuous integration and continuous deployment (CI/CD) pipeline is established.
Efficient collaboration between low-code development and containerized deployment is key to breaking down the barriers between business customization and production operation and maintenance, achieving agile delivery. The system deeply integrates unified configuration and permission governance with power-grid-specific upgrade controls, realizing a closed loop from visual orchestration to stable operation.
First, the automated CI/CD pipeline serves as the technological core of this collaboration. Its workflow is shown in Figure 3: when a user completes application customization on the low-code platform, the system automatically commits the generated code and configurations to a Git repository, triggering the full automated process. The continuous integration tool performs code compilation, unit testing, and security scanning, and builds Docker images based on the five-layer standardized templates described in Section 3.2. After passing security vulnerability scans, the images are pushed to a private image repository (such as Harbor). The container orchestration platform (Kubernetes) continuously monitors the repository for changes; once an application image update is detected, it automatically triggers a rolling upgrade or blue-green deployment strategy to achieve uninterrupted service updates. This automated closed loop compresses the traditional iteration cycle measured in “weeks” to “hours,” significantly enhancing the agility of feature delivery. Measurements in the experimental environment show the following average pipeline performance: image build (including unit tests and a vulnerability scan) completed in 2.3 min, image push to the Harbor registry in 0.5 min, rolling update (1 to 10 replicas) in 1.2 min, and a full cycle from commit to production deployment in under 5 min. These metrics represent a reduction of over 90% compared with traditional manual deployment, which typically requires 2–4 h per iteration.
Secondly, a unified configuration and permission management system is established as the collaborative security and governance foundation. The system builds a unified configuration center on Nacos, which centrally manages the business parameters of low-code applications and the runtime environment configurations of containers, supporting dynamic distribution and configurations taking effect in real time, effectively resolving multi-environment configuration consistency issues. For permission control, a four-tier RBAC model linking “user–role–application–container resource” is designed, ensuring that the operational permissions granted to business users at the interface are accurately mapped to, and enforced on, the access to underlying containers and microservices, achieving unified management and closed-loop permission logic. All configuration changes, image releases, and deployment operations are fully recorded and audited, comprehensively meeting the power system’s strict requirements for operational traceability and security controllability.
Finally, in response to the cross-regional and highly isolated characteristics of power grid dispatching operations, a hierarchical control and collaborative upgrade mechanism was designed. This mechanism is a critical extension of the aforementioned general CI/CD process in the context of power production scenarios. The system adopts a “hub-and-spoke” two-tiered image repository architecture: the central repository uniformly distributes baseline images and upgrade strategies, ensuring version authority and consistency; local repositories provide caching in network-isolated environments, guaranteeing high availability and efficiency during upgrades. The upgrade process follows a “cascading upgrade” model, supporting three-level collaborative control across main dispatching, intermediate dispatching, and local dispatching (the architecture is shown in Figure 4). Upper-level nodes can orchestrate the upgrade sequence and time windows across the entire network, while lower-level nodes automatically execute instructions upon receiving them, achieving strong controllability, high reliability, and precise, orderly progression of versions across the network. This perfectly aligns with the hierarchical dispatching and management model of power production systems.
Through the organic integration of the aforementioned three-layer mechanisms, this system not only realizes horizontal collaboration between development and operations but also achieves vertical integration of general technical capabilities and the special requirements of the power industry. It establishes deployment capabilities that support the continuous, secure, and agile evolution of relay protection operation and maintenance master stations.

3.6. High Availability and Elastic Design for Reliable Grid Operation

To meet the stringent requirements of relay protection operation and maintenance for system continuity, and to ensure the stable operation of the flexible system based on low-code customization and containerized deployment in the power grid production environment, this study systematically designs a multi-level high availability and elasticity mechanism from the architectural level. This design spans all aspects of application deployment, data storage, and network communication, aiming to achieve rapid self-healing of faults and core business assurances in extreme scenarios.
At the application service layer, high availability is deeply integrated with the capabilities of the container orchestration platform (Kubernetes). All core microservices are deployed in multi-replica (stateless) or primary-backup (stateful) modes (deployment strategies are detailed in Section 4.2). Kubernetes Deployment and StatefulSet controllers, combined with the Liveness and Readiness probes built into each container, enable automatic detection, isolation, and reconstruction of instance-level faults within seconds. By setting policies such as PodDisruptionBudget, a specified number of service instances remains available during cluster maintenance or node failures, achieving seamless failover and zero-downtime operation and maintenance for the business.
In the data persistence layer, differentiated redundancy strategies are adopted based on data types and access characteristics. Critical business data (such as equipment records and protection settings) achieves high availability through master–slave replication or cluster architectures in relational databases (e.g., MySQL). Massive real-time monitoring data and time-series data (stored in InfluxDB) utilize replica set mechanisms. For unstructured data like fault recording files, multi-copy storage in distributed file systems such as HDFS ensures data reliability and persistence. Additionally, regular snapshots and cross-availability zone data backup strategies provide extra safeguards for data recovery.
In the network communication layer, high availability design is reflected in both inter-service communication and external access. Within the container cluster, multi-path communication between pods is achieved through network plugins like Calico, combined with Kubernetes Services for load balancing and failover. At the external access level, redundant load balancers and dual-active API gateways are deployed, along with intelligent DNS or floating IP mechanisms, to ensure high availability of external request entry points.
In the elastic capacity and overload protection layer, the system introduces service degradation and circuit breaker mechanisms, which are key to handling sudden traffic surges or localized failures and ensuring the overall resilience of the system. When the monitoring system detects that a service’s response time exceeds the standard or the error rate rises, the circuit breaker will automatically cut off some calls to the service to prevent fault propagation. Similar intelligent anomaly detection models have been studied in low-code platforms [20]. When overall system resources are strained, service degradation can be automatically triggered based on predefined business priority strategies. Examples include temporarily restricting or disabling non-core, resource-intensive functions such as deep historical data queries and large-scale report generation, prioritizing limited computing, memory, and network bandwidth for core business flows that directly impact grid safety, such as real-time data collection, fault alerts, and remote control. This “lossy service” design philosophy ensures that the system can maintain the continuity of the most critical operational capabilities under extreme pressure, achieving stable degradation from “full-feature service” to “core service assurance.”
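The circuit-breaker behavior described above can be sketched as follows; the thresholds, cooldown, and class name are illustrative assumptions rather than the system's actual implementation:

```python
import time


class CircuitBreaker:
    """Simplified circuit breaker: trips after a run of failures,
    rejects calls while open, then allows a trial call after a cooldown."""

    def __init__(self, failure_threshold=5, cooldown_s=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown_s:
                raise RuntimeError("circuit open: call rejected")
            self.opened_at = None  # half-open: permit one trial call
            self.failures = 0
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # any success resets the failure counter
        return result
```

While the breaker is open, callers fail fast instead of queuing on a degraded service, which is exactly the fault-propagation cut-off the text describes; production implementations add per-service metrics and configurable half-open probing on top of this core logic.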
In summary, through collaborative design across the four dimensions of application, data, network, and elastic capacity, this architecture constructs a layered-defense high-availability framework. It not only supports the pursuit of 99.999% availability in relay protection operation and maintenance scenarios but also, through intelligent elastic strategies, endows the system with the resilience and adaptability to cope with uncertain impacts, providing a solid reliability foundation for the flexibly scalable operation and maintenance master station.

3.7. Information Security Considerations

The proposed system operates across multiple security zones, including Production Control Zone I and Information Management Zone III, with communication channels belonging to different owners such as substations, dispatch centers, and third-party networks. Deliberate cyber-attacks, especially data substitution attacks, pose significant risks because compromising a master station could potentially disable a large number of relay protection devices across a wide area. To address these challenges, the system implements a multi-layered security architecture.
First, end-to-end message authentication is enforced. All IEC 61850 MMS and 103/104 messages are signed at the gateway using HMAC-SHA256 before transmission. The receiving end verifies the integrity and authenticity of each message, so data substitution without the correct cryptographic key is detected and rejected.
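A minimal sketch of such gateway-side signing and verification, using Python's standard `hmac` module (the frame layout, key handling, and replay window here are illustrative assumptions, not the system's actual wire format):

```python
import hashlib
import hmac
import struct
import time

SECRET_KEY = b"example-shared-key"  # in practice distributed out of band


def sign_message(payload: bytes, key: bytes = SECRET_KEY) -> bytes:
    """Prepend a millisecond timestamp and append an HMAC-SHA256 tag."""
    ts = struct.pack(">Q", int(time.time() * 1000))  # timestamp aids replay defence
    tag = hmac.new(key, ts + payload, hashlib.sha256).digest()
    return ts + payload + tag


def verify_message(frame: bytes, key: bytes = SECRET_KEY,
                   max_age_ms: int = 5000) -> bytes:
    """Return the payload if the tag verifies; raise ValueError otherwise."""
    ts, payload, tag = frame[:8], frame[8:-32], frame[-32:]
    expected = hmac.new(key, ts + payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("HMAC verification failed: frame rejected")
    age = int(time.time() * 1000) - struct.unpack(">Q", ts)[0]
    if age > max_age_ms:
        raise ValueError("stale frame: possible replay")
    return payload
```

Any substitution of the timestamp or payload without the shared key changes the expected tag, so the receiving end rejects the frame, which is the detection property the text relies on.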
Second, anomaly detection for data tampering is deployed as a lightweight behavioral model running as a sidecar container in the orchestration layer. The model is based on historical statistical distributions of telemetry values such as currents, voltages, and breaker status. When incoming data deviates beyond a predefined threshold—for example, a sudden step change is inconsistent with physical limits or historical patterns—an alert is raised, and the suspicious data is quarantined for manual review.
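A simplified stand-in for that behavioral model might flag step changes against a rolling statistical baseline; the window size, z-score threshold, and class name below are illustrative assumptions:

```python
import statistics
from collections import deque


class TelemetryAnomalyDetector:
    """Flags values that deviate sharply from the recent history of one
    telemetry channel (a simplified stand-in for the behavioral model)."""

    def __init__(self, window=100, z_threshold=5.0):
        self.history = deque(maxlen=window)
        self.z_threshold = z_threshold

    def check(self, value: float) -> bool:
        """Return True if `value` is anomalous and should be quarantined."""
        anomalous = False
        if len(self.history) >= 10:  # require a minimal baseline first
            mean = statistics.fmean(self.history)
            stdev = statistics.pstdev(self.history) or 1e-9
            anomalous = abs(value - mean) / stdev > self.z_threshold
        if not anomalous:
            self.history.append(value)  # quarantined values stay out of the baseline
        return anomalous
```

Excluding flagged samples from the baseline keeps a burst of tampered values from dragging the statistics toward the attacker's data, although, as noted in Section 5.3, slow drifts still require forecasting-based models.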
Third, network segmentation and micro-segmentation are applied using Calico network policies, which enforce low-privilege communication between containers. Even if a single container is compromised, lateral movement to other containers or services is restricted by explicit allow or deny rules.
Fourth, immutable container images and runtime integrity monitoring are employed. All container images are signed and verified before deployment. At runtime, Falco is deployed to monitor file system integrity, process execution, and network connections, alerting to any anomalous behavior.
Fifth, secure multi-party communication is enforced for channels that cross organizational boundaries, for example, between different grid companies or third-party maintenance providers. TLS 1.3 with mutual authentication is required, and application-layer sequence numbers and timestamps prevent replay attacks.
These measures collectively ensure that data substitution is detected in real time, preventing the large-scale disabling of relay protection devices in a wide area. In simulated man-in-the-middle attacks with 1000 injection attempts, all tampered packets were either rejected due to HMAC failure or flagged as anomalies due to behavioral deviation, with zero false negatives.

3.8. Protection Setting Group Adjustment and Maintenance Work Execution

The system supports changing protection device setting groups in response to changes in power system operating modes, such as normal operation, maintenance, or post-fault restoration. This is performed at two levels.
For automatic group switching, the master station receives real-time topology and load-flow data from SCADA or EMS. When a predefined condition is detected, for example, a line outage, generator tripping, or bus splitting, the system automatically selects the appropriate group, such as Group 1 for normal operation, Group 2 for maintenance, or Group 3 for weak source conditions. It then issues a SelectActiveSG command via MMS to the relay protection device. The change is logged with a timestamp, operator ID (or system trigger), and device information. The system also supports manual rollback to a previous group.
For manual adjustment via the low-code rule engine, operators can configure conditional rules using the visual rule engine, for instance, specifying that if the load current exceeds 80 percent of the rated value for ten consecutive minutes, the system should switch to Group 3. This rule is compiled into a microservice that executes the command when the condition is met. This enables non-programmers to define complex, site-specific switching logic.
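The example rule above (switch to Group 3 when load current exceeds 80 percent of rated for ten consecutive minutes) could compile to logic of roughly this shape; the class and parameter names are illustrative, and `switch_group` stands in for issuing the SelectActiveSG command via MMS:

```python
import time


class SettingGroupRule:
    """Sketch of a compiled low-code rule: if load current stays above a
    ratio of rated current for a sustained hold time, switch groups."""

    def __init__(self, rated_current, ratio=0.8, hold_s=600, target_group=3):
        self.threshold = ratio * rated_current
        self.hold_s = hold_s
        self.target_group = target_group
        self.exceeded_since = None  # start of the current over-threshold run

    def evaluate(self, current, now=None, switch_group=print):
        """Call on every acquisition cycle; fires switch_group at most once
        per sustained excursion. Returns True when the switch is issued."""
        now = time.monotonic() if now is None else now
        if current <= self.threshold:
            self.exceeded_since = None  # condition broken, restart the clock
            return False
        if self.exceeded_since is None:
            self.exceeded_since = now
        if now - self.exceeded_since >= self.hold_s:
            switch_group(self.target_group)  # e.g., SelectActiveSG over MMS
            self.exceeded_since = None       # avoid re-firing every cycle
            return True
        return False
```

The rule engine's visual form maps directly onto these parameters (threshold ratio, hold time, target group), which is what lets non-programmers express site-specific switching logic.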
Maintenance work on relay protection devices is supported at three levels depending on the nature of the task.
At Level 1 (remote monitoring and diagnosis), there is no physical or configuration change; this level includes viewing device self-check reports, fault records, oscillography, and real-time status monitoring, and requires read-only access.
At Level 2 (remote configuration and testing), parameter changes and functional tests are performed without physical intervention; examples include modifying setting groups, performing online setting verification, and executing simulated fault tests such as trip tests. This level requires two-factor authentication plus an operation audit.
At Level 3 (on-site maintenance), hardware repair, firmware upgrade, or physical inspection is carried out; examples include replacing a faulty card, upgrading firmware, or inspecting CT or VT connections. This level requires a work order system combined with mobile app verification and on-site sign-off. For Level 3, the system automatically generates a detailed work order containing the device’s location (substation, bay, or panel), the required tools, safety procedures, and estimated duration. The on-site engineer receives the work order via a mobile app, documents each step with photos, and closes the order after supervisor approval.
All remote commands and on-site actions are logged in a tamper-proof audit trail.

3.9. Automatic Analysis of Correct Functioning of Protection Devices

The system implements a post-fault correct functioning analysis (PFA) module to automatically evaluate the behavior of relay protection devices following actual power system disturbances. The module runs as a containerized microservice in the form of a Kubernetes Job, operating in the background without interfering with real-time data acquisition or control functions. It is automatically triggered when the SCADA system records a fault event, such as an overcurrent, distance, or differential protection operation.
During the analysis, the module first retrieves all relevant data from the affected area, including fault records from primary and backup protection devices, COMTRADE disturbance recorder files, status change logs (breaker positions and protection pickup/operation flags), and the active setting groups of each device involved. For each protection stage (e.g., distance zones 1, 2, and 3), the module checks whether the stage correctly started by comparing the measured impedance or current with the setting zone boundaries. It verifies whether the protection operated within the expected time delay (with a tolerance of ±20 ms) and evaluates selectivity—whether the correct breaker tripped and whether backup protections operated unnecessarily.
The module also compares the actual operating quantities with the expected values derived from the active setting group. Deviations exceeding preset tolerances (e.g., ±5% for current and ±10% for impedance) trigger a “setting mismatch” alert. Upon completion, the system produces a structured report listing correctly operating devices, faulty or failed-to-operate devices, and specific corrective recommendations such as recalibrating distance protection settings, inspecting CT circuits, or updating setting groups. Testing on 50 historical faults in a provincial power grid demonstrated that the module correctly identified 98% of events, detected 12 hidden setting mismatches previously unnoticed by operators, and reduced the average fault analysis time from 45 min to 3 min. The core algorithms are hard-coded for performance, but the module’s interfaces are exposed through the low-code platform, allowing engineers to customize report formats and add new analysis rules.
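The per-stage checks described above (±20 ms operating-time tolerance, ±5% current and ±10% impedance deviations) can be sketched as follows; the field names and report structure are illustrative assumptions standing in for parsed COMTRADE and event-log data:

```python
def check_operating_time(expected_ms, measured_ms, tol_ms=20):
    """Did the stage operate within the expected delay (±20 ms)?"""
    return abs(measured_ms - expected_ms) <= tol_ms


def check_setting_match(setting, measured, tol_ratio):
    """Compare an actual operating quantity with the value implied by the
    active setting group; tol_ratio is e.g. 0.05 for current (±5%) or
    0.10 for impedance (±10%)."""
    return abs(measured - setting) <= tol_ratio * abs(setting)


def evaluate_stage(stage):
    """Produce the per-stage verdict used in the structured PFA report.
    `stage` is a dict with expected/measured times and quantities."""
    findings = []
    if not check_operating_time(stage["expected_ms"], stage["measured_ms"]):
        findings.append("operating time out of tolerance")
    if not check_setting_match(stage["set_current"], stage["meas_current"], 0.05):
        findings.append("setting mismatch: current deviation > 5%")
    return {"stage": stage["name"], "correct": not findings, "findings": findings}
```

Each finding maps to a corrective recommendation in the final report (e.g., a persistent "setting mismatch" suggests reviewing the active setting group or the CT circuit).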

4. Results

To comprehensively evaluate the effectiveness of the “low-code + containerization” flexible expansion and deployment system proposed in this paper, this section presents an empirical evaluation along four dimensions: experimental environment construction, core function verification, system performance testing, and practical engineering application. Through quantitative comparisons and simulated fault testing, the system’s advantages in improving operational efficiency, enhancing system elasticity, and ensuring service reliability are objectively demonstrated.

4.1. Experimental Environment Setup

The experiment established a simulation test platform closely resembling a production environment. The hardware environment was a cluster of four high-performance physical servers (each with a 32-core CPU, 64 GB of RAM, and a 1 TB hard drive), with one server deployed as the Kubernetes control plane node responsible for cluster scheduling and management, and the remaining three serving as worker nodes hosting business containers. The client test environment covered typical operational terminals, including Linux workstations with 16-core CPUs and 32 GB of RAM, as well as Android 12 and iOS 16 mobile devices.
The software stack achieved comprehensive integration of autonomous and controllable technology components. The low-code platform is self-developed with a frontend–backend separation architecture. The frontend utilizes Vue.js V2.7 to implement visual interactions, while the backend employs the Spring Cloud Alibaba microservices framework to provide orchestration engines, rule engines, and other services. Among these, Nacos 2.2 is used for service registration, discovery, and configuration management, and Sentinel 1.8 offers traffic control capabilities. The containerized environment adopts iSulad 2.0 as the container runtime, builds orchestration clusters using Kubernetes 1.24 enhanced with KubeSphere 4.0, selects Kube-OVN 1.10 to provide container networking, and deploys Harbor 2.8 as a private image repository. The data storage layer employs differentiated selections based on data types: openGauss 3.0 stores structured relational data such as equipment records; TDengine 3.0 handles high-performance writing and querying of real-time time-series monitoring data; and the JuiceFS 1.0 distributed file system is used to store unstructured data like fault recording files, with HDFS interface compatibility. For service governance, Higress 1.0 is introduced as an API gateway for traffic control. Additionally, Nightingale Monitoring (Nightingale) 6.0 is deployed as an observability platform to achieve centralized collection and visual analysis of monitoring metrics, logs, and traces.

4.2. Functional Verification

(1)
Low-Code Customization Efficiency Validation
For common functional customization requirements in operation and maintenance, two typical scenarios—“Equipment Health Comprehensive Report” and “Complex Fault Alarm Rule Configuration”—were selected for comparative validation. In the traditional development mode, completing the “Equipment Health Comprehensive Report” requires steps such as requirement analysis, UI design, backend interface development, frontend–backend integration, and testing, with a total cycle of approximately 7 working days. The “Complex Fault Alarm Rule Configuration,” involving multi-condition logical judgments, takes about 5 working days for development and testing. However, using the low-code platform developed in this study, operation and maintenance personnel can complete tasks without coding. By dragging and dropping pre-built visual components like data tables and trend charts, and linking them to relevant data sources, the report was autonomously designed and published in just 4 h. Through a graphical rule engine, alarm conditions and actions were configured in a natural language-like manner, taking approximately 2 h to finalize and activate the rules. Quantitative calculations show that the low-code customization model improves functional delivery efficiency by over 90% and successfully empowers frontline operation and maintenance personnel with customization capabilities, transitioning from “technology-driven” to “business-driven” approaches.
(2)
Containerized Deployment Flexibility Validation
To verify the agility of containerized deployment, the experiment tested the entire process from application distribution to elastic scaling. Testers selected the “Fixed Value Online Verification” microservice application from the internal application store and triggered the deployment command. The backend Kubernetes cluster automatically completed resource scheduling, pulled images from the Harbor image repository, initialized container instances, and registered them to the service mesh. The entire process took 8 min, and the application was ready. Further testing of elastic scaling capabilities was performed: By setting the Horizontal Pod Autoscaler policy, when the average CPU utilization of the application container continuously exceeded the 70% threshold, Kubernetes automatically scaled the number of instances from 1 to 10 within 30 s to distribute the load. When the load dropped below 30%, the system automatically and gradually reduced the instances to the initial number. This process fully validated the “plug-and-play” capability and dynamic resource optimization.
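The scaling behavior observed here follows the standard Kubernetes Horizontal Pod Autoscaler rule, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the configured replica bounds; the sketch below uses the experiment's 70% CPU target and 1–10 replica range:

```python
import math


def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct=70,
                     min_replicas=1, max_replicas=10):
    """Kubernetes HPA scaling rule:
    desired = ceil(current * currentMetric / targetMetric), clamped to
    the [min_replicas, max_replicas] range configured on the HPA."""
    desired = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, desired))
```

For example, a single replica averaging 350% of its CPU request scales to five replicas in one step, while a load drop well below the target shrinks the deployment back toward the minimum, matching the scale-up and gradual scale-down behavior reported in the test.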

4.3. Performance Verification

(1)
System Scalability Verification
To evaluate the system’s ability to support widespread device access, simulated data injection was used to test service response performance under different device scales, and a comparison was made with traditional monolithic architectures. The results are summarized in Table 2.
During testing, with 1000 simulated relay protection devices connected, the response time of the system’s core business operations (such as device status queries and real-time data retrieval) remained low. As the number of simulated devices increased to 10,000, thanks to the per-substation data sharding strategy and the elastic scaling mechanism of containerized microservices, the system response time rose only gently from 45 ms to 210 ms (a 4.7-fold increase) without significant degradation. In contrast, the traditional monolithic system’s response time increased from 48 ms to 680 ms at 5000 devices and timed out beyond that point. Throughout the stress test, the proposed system experienced no request timeouts or service unavailability, demonstrating excellent scalability and service stability.

4.4. Engineering Case Analysis

This system has been successfully implemented in the upgrade project of a provincial power grid company’s relay protection operation and maintenance master station. The project required the integration of remote operation and maintenance capabilities for 3200 heterogeneous relay protection devices across voltage levels from 110 kV to 500 kV province-wide. Post-implementation results include the following: In terms of functional customization efficiency, addressing the “remote batch upgrade” requirement for new intelligent fault recorders, the low-code platform enabled visual process orchestration and encapsulation as an independent microservice application. The entire process took only 3 days to complete, including containerized deployment and launch, representing a 75% improvement compared to traditional outsourced development models (approximately 2 weeks). Regarding resource utilization, leveraging containerized encapsulation and Kubernetes fine-grained scheduling, average server CPU utilization increased from 45% under the original virtualization architecture to 85%, significantly reducing hardware investment while maintaining the same operational workload. In operational maintenance models, approximately 75% of report generation and alarm rule adjustment tasks were transitioned to maintenance personnel through the low-code platform, substantially reducing reliance on backend development. Business demand response cycles were compressed from “weekly level” to “hourly level.” In terms of system stability, core monitoring and alarm services achieved 99.99% availability during 6 months of continuous production environment operation, with no business interruptions caused by platform failures. Furthermore, the system currently supports 3200 heterogeneous device connections and possesses architectural scalability to handle ≥200,000 devices, strongly supporting the safe, efficient, and resilient development of power grid operation and maintenance.
As summarized in Table 3, this system achieved significant comprehensive benefits in the provincial power grid master station upgrade project. In terms of efficiency, the functional customization cycle has been greatly compressed; in terms of resources, containerized scheduling has led to savings in hardware investment costs; in terms of operation and maintenance models, it has successfully empowered frontline personnel, realizing a shift from “technology-driven” to “business-driven” performance; finally, the high-availability design of the system ensures long-term stable operation. These four quantitative indicators collectively demonstrate the feasibility and superiority of this system in engineering practice.

5. Discussion

5.1. Interpretation of Results

The results presented in Section 4 demonstrate that the proposed low-code and containerization architecture significantly outperforms traditional monolithic master stations in deployment agility, scalability, and fault resilience. The near-linear scaling of response time, which grew only 4.7-fold from 1000 to 10,000 devices, is attributed to two factors: data sharding by substation, which limits each microservice instance’s workload, and the Kubernetes Horizontal Pod Autoscaler, which dynamically adds replicas under high load. In contrast, the monolithic system exhibited super-linear response-time growth, reaching 680 ms at only 5000 devices before timing out, stemming from shared database locks and single-threaded processing bottlenecks.
The 99.99 percent availability achieved in production over six months validates the multi-layered redundancy design spanning the application, data, network, and elastic capacity layers. Fault injection tests confirmed that stateless services recover within 1.8 s, meeting the stringent requirements of power grid operation, where non-critical functions typically tolerate interruptions of less than five seconds and critical alarms require recovery within one second. A recovery point objective of zero for critical data ensures no loss of setting records or event logs.

5.2. Practical Application and Industrial Impact

The provincial power grid deployment, which covers 3200 heterogeneous devices across voltage levels from 110 kV to 500 kV, has transformed maintenance workflows. Frontline engineers who previously relied on developers for any report or rule change can now independently create custom dashboards and alarm rules using the low-code platform. This reduces the average demand–response cycle from five days to four hours. The containerized deployment also enabled seamless integration of new device types, such as intelligent fault recorders, within three days compared to two weeks previously.
Quantitative improvements further demonstrate the value of the proposed architecture. Deployment time for a new function is reduced by 96 percent, from two to four hours in a traditional monolithic system to under five minutes. Customization effort for a new report is reduced by 93 percent, from seven days of coding to four hours of drag-and-drop configuration. Scalability is doubled, with the proposed system supporting over 10,000 devices before timeout, while the monolithic system times out at 5000 devices. These figures show that the proposed architecture represents not merely an incremental improvement but a fundamental shift in master station design. For grid operators, this translates into a lower total cost of ownership, faster adaptation to new equipment and regulations, and reduced risk of human error during upgrades.

5.3. Addressing Security, Setting Group Management, and Functioning Analysis

Regarding security considerations, deliberate cyber-attacks, especially data substitution, remain a critical concern. The implemented message authentication using HMAC-SHA256 and behavioral anomaly detection described in Section 3.7 have been tested against simulated man-in-the-middle attacks. In 1000 injection attempts, all tampered packets were either rejected due to HMAC failure or flagged as anomalies due to behavioral deviation, with zero false negatives. However, sophisticated attacks that slowly drift data values over days, sometimes called low-and-slow attacks, are harder to detect. Future work will incorporate time-series forecasting models such as LSTM to identify such stealthy tampering.
On setting group adjustment, the mechanism described in Section 3.8 has been used successfully in 12 real-time mode changes, such as during planned outages. No incorrect switching has occurred, thanks to the rule engine’s validation checks that ensure, for example, that the new group is compatible with current CT and VT ratios. The system logs every group change with a timestamp and operator identity, providing a complete audit trail.
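The shape of such a validation-then-audit rule can be sketched as follows. The check shown (requiring the target group's settings to match the live CT and VT ratios) mirrors the example given above; the data model, field names, and logging format are assumptions for illustration, not the system's actual rule engine API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class SettingGroup:
    group_id: int
    ct_ratio: float   # CT ratio the group's settings were computed for
    vt_ratio: float

audit_log: list = []  # each entry: (UTC timestamp, operator, action)

def switch_group(current: SettingGroup, target: SettingGroup,
                 live_ct: float, live_vt: float, operator: str) -> bool:
    """Validate a setting group change against live CT/VT ratios, then log it."""
    if target.ct_ratio != live_ct or target.vt_ratio != live_vt:
        audit_log.append((datetime.now(timezone.utc), operator,
                          f"REJECTED: group {target.group_id} incompatible with live ratios"))
        return False
    audit_log.append((datetime.now(timezone.utc), operator,
                      f"SWITCHED: group {current.group_id} -> {target.group_id}"))
    return True

g1 = SettingGroup(1, ct_ratio=1200 / 5, vt_ratio=500_000 / 100)
g2 = SettingGroup(2, ct_ratio=1200 / 5, vt_ratio=500_000 / 100)
assert switch_group(g1, g2, live_ct=1200 / 5, live_vt=500_000 / 100, operator="op-07")
```

Every outcome, including a rejected switch, lands in the audit trail with a timestamp and operator identity, which is what makes the 12 production mode changes fully traceable.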
For post-fault correct-functioning analysis, the PFA module described in Section 3.9 was tested on 50 historical fault events. It correctly identified 98 percent of protection failures and provided actionable recommendations. In two cases it uncovered hidden setting mismatches that operators had not noticed, such as a Zone 2 reach set roughly 15 percent too low. The reduction in fault analysis time from 45 min to 3 min allows engineers to focus on root-cause analysis and preventive measures rather than manual data collection.
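A rule of the kind that catches such a mismatch can be sketched as a simple ratio check. The 120 percent margin used here is a common engineering rule of thumb for Zone 2 reach relative to protected line impedance; the actual criterion is utility-specific, and the function name and threshold are illustrative assumptions rather than the PFA module's real logic.

```python
def check_zone2_reach(z2_setting_ohm: float, line_impedance_ohm: float,
                      min_margin: float = 1.2) -> str:
    """Flag a Zone 2 reach set below the assumed margin over line impedance.

    Assumption for illustration: Zone 2 should cover at least ~120% of the
    protected line's impedance so it reliably overreaches the remote bus.
    """
    ratio = z2_setting_ohm / line_impedance_ohm
    if ratio < min_margin:
        return (f"MISMATCH: Zone 2 reach is {ratio:.0%} of line impedance, "
                f"below the {min_margin:.0%} margin")
    return "OK"

print(check_zone2_reach(z2_setting_ohm=9.2, line_impedance_ohm=8.0))
```

Run on every archived fault record, even a check this simple surfaces settings that drift out of policy, which is the kind of finding the PFA module reported in the two mismatch cases.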

5.4. Limitations and Future Directions

Despite these successes, the current system has several limitations. Component library coverage remains incomplete, as not all relay protection scenarios, such as transformer differential protection with harmonic blocking or distance protection with load encroachment, are available yet as hard-coded components. Adding new components requires core algorithm development, which currently takes two to four weeks per component.
Container security in Zone I presents another limitation. Generic Linux container runtimes such as iSulad may not be sufficiently hardened for the production control zone (Zone I) against advanced persistent threats. While the security measures in Section 3.7 provide defense in depth, stronger isolation mechanisms such as Kata Containers or gVisor should be evaluated.
Cross-region synchronization latency also poses a challenge. The hub-and-spoke two-tier image repository architecture introduces delays of up to 15 min for version propagation to remote dispatch centers. For time-critical security patches, this latency may be unacceptable. A peer-to-peer distribution protocol such as Dragonfly will be investigated.
Finally, AI integration remains at an early stage. The current PFA module uses rule-based logic. Integrating deep learning for fault classification and predictive maintenance is the next step. Preliminary experiments with a transformer-based model achieved 92 percent accuracy on fault-type classification, but an inference latency of approximately 50 milliseconds needs optimization.

5.5. Open Research Questions

Beyond the immediate future work, several broader research questions remain. One question is how to formally verify that low-code orchestrated workflows cannot violate grid safety constraints, such as issuing accidental trip commands or performing incorrect setting group switches. Model checking or runtime verification techniques could be adapted to address this.
Another question concerns the optimal trade-off between hard-coded and low-code components. Hard-coded components offer better performance and security, while low-code components provide flexibility and agility. A systematic decision framework, based on criticality and update frequency, is needed to choose between them for each relay protection function.
A third question is how to achieve sub-second recovery time objectives for stateful services without data loss. Stateful services, such as the front-end protocol data access container, are particularly challenging. Active–active geo-distributed clusters using conflict-free replicated data types represent a promising direction for future research.
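The appeal of conflict-free replicated data types for this problem can be seen in a minimal sketch of a grow-only counter, the simplest CRDT: each replica increments only its own slot, and merging is an element-wise maximum, so concurrent updates at geo-distributed sites converge without any coordination or locking. This is a textbook illustration, not the system's design.

```python
class GCounter:
    """Grow-only counter CRDT: one slot per replica; merge = element-wise max."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts: dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Merge is commutative, associative, and idempotent, so replicas
        # can exchange state in any order and still converge.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

a, b = GCounter("site-A"), GCounter("site-B")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5
```

Richer CRDTs (registers, sets, maps) follow the same merge discipline, which is why active-active clusters built on them can keep accepting writes at every site while guaranteeing convergence, an attractive property for a stateful protocol-access container that must not lose data during failover.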

6. Conclusions

This paper proposed and empirically validated a flexible expansion and deployment system for relay protection remote maintenance master stations, integrating low-code and containerization technologies. The main results obtained from this research are summarized as follows.
The four-layer decoupled architecture, combined with a domain-specific low-code component library that hard-codes core functions for performance while leaving peripheral functions to low-code configuration, reduced function customization cycles from weeks to hours. The collaborative CI/CD pipeline achieved full-cycle deployment in under five minutes, a 90 percent improvement over traditional manual deployment.
In terms of scalability, the system supported over 10,000 simulated devices, maintaining a stable response time of 210 milliseconds at the 10,000-device level, whereas the monolithic baseline timed out at 5000 devices. The provincial grid deployment currently supports 3200 heterogeneous devices, with architectural capacity for 200,000 or more.
Regarding reliability, the system achieved 99.99 percent core service availability over six months of production operation. Fault injection tests showed a recovery time of 1.8 s for stateless services and a recovery point objective of zero for critical data, achieved through synchronous replication.
For operational efficiency, the system improved operation and configuration efficiency by 60 percent and reduced manual configuration workload at master stations and substations by 60 percent each, as measured in the provincial power grid upgrade project.
For security and advanced functions, the system implemented HMAC-SHA256 message authentication and behavioral anomaly detection against data substitution attacks. The setting group adjustment mechanisms based on SCADA or EMS data enabled automatic switching according to power system operating modes. The post-fault correct-functioning analysis module correctly identified 98 percent of protection failures for 50 historical fault events and reduced fault analysis time from 45 min to 3 min.
These results confirm that the proposed system effectively resolves the scalability, agility, and reliability challenges of traditional operation and maintenance master stations, providing a systematic and industrially validated solution for new power systems.

Author Contributions

Conceptualization, Z.S. and J.Y.; Methodology, Z.S. and H.G.; Software, Z.Z.; Validation, Y.D. and X.C.; Formal Analysis, Z.S.; Investigation, H.G. and J.Y.; Resources, Y.D.; Data Curation, Z.Z.; Writing—Original Draft Preparation, Z.S. and H.G.; Writing—Review and Editing, X.C.; Supervision, J.Y.; Project Administration, Z.S.; Funding Acquisition, Z.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by CHINA SOUTHERN POWER GRID (000005KC24010012).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

Author Zebing Shi, Honghui Gao, Jiang Yu, Yang Diao and Ze Zhao were employed by the company China Southern Power Grid Co., Ltd. Author Xiaoliang Chen was employed by the company Beijing Sifang Automation Co., Ltd. All authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
O&M: Operation and Maintenance
CI/CD: Continuous Integration and Continuous Deployment
RBAC: Role-Based Access Control
IEC 61850: International Electrotechnical Commission 61850
SCADA: Supervisory Control and Data Acquisition
DDD: Domain-Driven Design
API: Application Programming Interface
RESTful: Representational State Transfer
JSON: JavaScript Object Notation
CPU: Central Processing Unit
RAM: Random Access Memory

References

  1. Zhang, H.B.; Zhou, Y.Y.; Xu, L.; Shi, H. Full-Link Monitoring Technology of Power Information System Based on Microservice Architecture. J. Shenyang Univ. Technol. 2022, 44, 409–414. [Google Scholar] [CrossRef]
  2. Chen, B.; Ge, W. Design and Optimization Strategy of Electricity Marketing Information System Supported by Cloud Computing Platform. Energy Inform. 2024, 7, 67. [Google Scholar] [CrossRef]
  3. Su, W.; Guo, J.X.; Feng, K. Research on the Development Status and Standardization of Low Code Development Platforms. Inf. Technol. Stand. 2024, Z1, 17–21. [Google Scholar] [CrossRef]
  4. Nie, Z.; Zhang, J.M.; Fu, H.W. Key Technologies and Application Scenario Design for Making Distribution Transformer Terminal Unit a Containerized Edge Node. Autom. Electr. Power Syst. 2020, 44, 154–161. [Google Scholar] [CrossRef]
  5. He, Y.Y.; Chen, X.J.; Hao, Z.Y.; Ding, L.; Gao, W. Design of Intelligent Mine Low Code Industrial IoT Platform. J. Mine Autom. 2023, 49, 141–148+174. [Google Scholar] [CrossRef]
  6. Hou, X.Y.; Xie, M.L.; Xie, Z.; Wei, W.; Ma, G. Low-Code Development Platform for Nuclear Power Plant Emergency Software. Nucl. Sci. Eng. 2024, 44, 360–366. [Google Scholar] [CrossRef]
  7. Wang, Y.; Wang, T.; Song, C.H.; Cui, Y.; Wang, H.; Zhu, J. Deep Learning Codeless Development Platform for Discrete Manufacturing. Comput. Integr. Manuf. Syst. 2022, 28, 2091–2101. [Google Scholar] [CrossRef]
  8. Li, J.; Liu, G.Z. Research on Automatic Deployment of Hadoop Distributed Cluster. Appl. Res. Comput. 2016, 33, 3404–3407+3445. [Google Scholar] [CrossRef]
  9. Zhang, Q. Research and Design of CAAS Management Platform Architecture Based on Docker. Comput. Appl. Softw. 2018, 35, 33–41+54. [Google Scholar]
  10. Ye, Y.B.; Liu, H.J.; Zhang, Z.Y.; Xie, M.; Zhao, Z. Research on Real-Time Evaluation of Relay Protection Based on Wide Area Information. Power Syst. Prot. Control 2021, 49, 150–157. [Google Scholar] [CrossRef]
  11. Ye, Y.B.; Cheng, X.P.; Zhang, Z.Y.; Liu, H.; Wang, W.; Shao, Q. Key Technology of Automatic Analysis of Fault Area Wave Recording of Power System. Electr. Power 2022, 55, 93–99. [Google Scholar] [CrossRef]
  12. Yan, Z.; Yin, E.; Lin, X.; Wang, D.; Peng, D.; Sun, Y. Development of Intelligent Thermal Power Plant Data Acquisition Platform Based on Transport Driver Interface. In Proceedings of the 10th Asia Conference on Power and Electrical Engineering (ACPEE), Beijing, China, 15–19 April 2025; pp. 86–90. [Google Scholar] [CrossRef]
  13. Zhong, X.; Zhou, X.; Xia, Y.; Bai, G.; Dong, J.; Xue, D. A GUI-Based Low-Code Development Platform for Power Systems Analysis. In Proceedings of the 2022 China Automation Congress (CAC), Xiamen, China, 25–27 November 2022; pp. 6080–6085. [Google Scholar] [CrossRef]
  14. Wang, Z.; Jing, Z.; Li, S.; Qi, G.; Wang, Z. Research and Application of Multi-Source Data Collection Method of Power System Based on Microservice Idea. In Proceedings of the 2023 International Conference on Power System Technology (PowerCon), Jinan, China, 21–22 September 2023; pp. 1–5. [Google Scholar] [CrossRef]
  15. Wang, S.; Wu, H.; Wang, X.; Diao, L.; Zhao, Y.; He, H. Research on the Application of Low Code Orchestration in Power Inspection and Scheduling. In Proceedings of the 2025 8th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE), Nanjing, China, 9–11 May 2025; IEEE: New York, NY, USA, 2025. [Google Scholar] [CrossRef]
  16. Bian, Y.; Yue, W.; Zou, Q.; Liu, L.; Xu, H.; Han, X. Low-Code Visual Flowchart Script Editing and Syntax Checking Based on Power System Graphical Human-Computer Interaction Scripts. In Proceedings of the 2025 5th International Conference on Intelligent Power and Systems (ICIPS), Xi'an, China, 24–26 October 2025; IEEE: New York, NY, USA, 2025. [Google Scholar] [CrossRef]
  17. Zhou, Z.F.; Zhu, W.; Li, Y.C. Integrated Management Platform for Power Dispatch Data Based on Microservice Architecture. Electr. Autom. 2025, 47, 4–7. [Google Scholar] [CrossRef]
  18. Jia, X.; Liu, K.; He, G.; Li, J.; Wang, S.; Chen, H.; Wu, B. Low Code Control and Hardware in the Loop Simulation Method for Demand Side Resource Based on Hybrid Cybernetics. In Proceedings of the 2024 IEEE 2nd International Conference on Power Science and Technology (ICPST), Dali, China, 9–11 May 2024; IEEE: New York, NY, USA, 2024. [Google Scholar] [CrossRef]
  19. Huang, J.; Xu, M.S.; Zhu, C.M.; Wang, G.A. Anomaly Detection and Business Process Orchestration for Low-Code Platform in Power System Based on Deep-Cross Model With Data Balancing. IEEE Access 2025, 13, 6350–6361. [Google Scholar] [CrossRef]
  20. Cui, C.; Gao, S.; Wei, H. Research on Software Development Based on Low-Code Technology. In Proceedings of the 2023 2nd International Conference on Artificial Intelligence and Autonomous Robot Systems (AIARS), Bristol, UK, 29–31 July 2023; IEEE: New York, NY, USA, 2023. [Google Scholar] [CrossRef]
Figure 1. Architecture of the relay protection remote maintenance master station.
Figure 2. Classification and interaction diagram of the relay protection professional low-code component library.
Figure 3. Workflow of the automated continuous integration and deployment (CI/CD) pipeline.
Figure 4. Containerized upgrade architecture of the secondary operation and maintenance master station.
Table 1. Core function-level orchestration strategies of the main functional containers for the master station.

| Security Zone | Functional Container | Core Function | Deployment and Orchestration Strategy |
| --- | --- | --- | --- |
| Production Control Zone (Zone I) | Front-End Protocol Data Access Container | Collects raw data such as analog values, status quantities, protection events, and fault waveforms from slave-side protection devices or substations in real time. | Active-standby mode |
| Production Control Zone (Zone I) | SCADA Data Processing Container | Processes real-time data collected from station endpoints and writes it into the corresponding real-time database. | Active-standby mode |
| Information Management Zone (Zone III and Cloud) | Data Reception and Synchronization Container | Receives data from Zone I through reverse isolation devices and writes it into the Zone II database. | Active-standby mode |
| Platform Layer | eip-nacos | Configuration management and service registration/discovery center for microservices; the foundational dependency for all microservices. | Multi-replica mode |
| Platform Layer | eip-gateway/eip-apisix | API gateway: the unified entry point for all external requests, responsible for routing, load balancing, authentication, and rate limiting. | Multi-replica mode |
| Platform Layer | eip-auth/eip-sso-auth | Handles user login, authentication, and single sign-on logic; the security core of the system. | Multi-replica mode |
| Platform Layer | eip-system | Manages foundational system data such as users, roles, menus, and permissions. | Multi-replica mode |
| Platform Layer | eip-resource | Upload, download, and management of files (e.g., images, documents, and recording files). | Multi-replica mode |
| Platform Layer | eip-message-server | Configuration and management center for message notifications (e.g., email and SMS templates). | Multi-replica mode |
| Platform Layer | eip-message-worker | Task execution node for message notifications; retrieves tasks from the message queues and executes sends. | Multi-replica mode |
| Platform Layer | eip-workerorder | Business logic backend of the work order system; handles CRUD operations and work order workflow. | Multi-replica mode |
| Platform Layer | eip-api-platform | Unified lifecycle management of internal and external APIs. | Multi-replica mode |
| Platform Layer | eip-console | Backend console integrating system monitoring and operation and maintenance management. | Multi-replica mode |
| Platform Layer | eip-iot-dp | IoT data processing: cleaning, computing, and standardizing collected raw data. | Multi-replica mode |
| Platform Layer | powerjob-server | Distributed task scheduling platform responsible for scheduled tasks and distributed computing jobs. | Multi-replica mode |
| Platform Layer | eip-hertzbeat | Monitoring and operation tool tracking the health of containers, servers, and microservices. | Multi-replica mode |
| Platform Layer | datalink | Data flow orchestration system enabling visual ETL and process orchestration. | Multi-replica mode |
| Platform Layer | eip-apisix-dashboard | Management interface for the APISIX API gateway. | Single instance |
| Backend Application Layer | Fault archiving visualization container | Web interface for displaying, analyzing, comparing, and downloading fault reports and fault recording data. | Multi-replica mode + load balancing |
| Backend Application Layer | Panoramic monitoring and historical query container | Query and analysis of the status panorama, historical operation data, and operation logs of the entire station's protection equipment. | Multi-replica mode |
| Backend Application Layer | Intelligent monitoring operation container | Aggregation analysis, interface display, alarm push, and resource-intensive real-time intelligent applications based on the full historical data in Zone II. | Multi-replica mode |
| Backend Application Layer | Glance-through sender | Provides the glance-through interface to superior systems. | Active-standby mode |
| Backend Application Layer | Glance-through integration end | Accesses and displays information from subordinate master stations. | Active-standby mode |
| Backend Application Layer | Special maintenance inspection | Provides queries for inspection strategies and inspection results. | Multi-replica mode |
| Backend Application Layer | Container version management upper-level end | Container image version collection and upgrade control management. | Active-standby mode |
| Backend Application Layer | Container version management execution end | Container image version collection and upgrade control execution. | Active-standby mode |
| Backend Application Layer | Device inventory management container | Full lifecycle inventory of protection devices, secondary circuits, and other equipment. | Multi-replica mode |
| Front-End Interface | All UI services (eip-workerorder-ui, eip-console-ui, eip-report-ui, fis-monitor-ui, fis-admin-ui) | Front-end interfaces for each business module, served as static files (HTML, JS, CSS): work order management, operation and maintenance control, report management, operation and maintenance user interface, and system management, respectively. | Multi-replica mode |
Table 2. Response time comparison under different device scales.

| Number of Devices | Proposed System Response Time (ms) | Traditional Monolithic System Response Time (ms) |
| --- | --- | --- |
| 1000 | 45 | 48 |
| 2000 | 68 | 95 |
| 5000 | 125 | 680 (near timeout) |
| 8000 | 185 | timeout |
| 10,000 | 210 | timeout |
Table 3. Key achievements of the provincial power grid relay protection remote maintenance master station upgrade project.

| Category | Technical Indicator | Indicator Performance |
| --- | --- | --- |
| Number of Concurrently Supported Devices | Master station concurrently accesses ≥ 200,000 devices | (1) Zone I achieves massive access through containerization and data sharding; (2) the Zone III data center realizes data cleaning and efficient access; (3) comprehensive overview and retrieval of subordinate dispatch data at a glance. |
| Protection Operation and Maintenance Application Configuration | Operational efficiency in application configuration improved by 60% | Comparative testing of program deployment, module upgrade, project, and application module configuration: (1) upgrade efficiency (containerization, version management, version upgrade); (2) abnormal-handling efficiency (containerization); (3) project configuration efficiency (efficient transparent configuration, primary-secondary association). |
| Manual Configuration Workload at Master Station | Reduced by 60% | Comparative station connection verification at the master station: (1) automatic modeling of the master station; (2) efficient, transparent configuration; (3) automatic primary-secondary association and efficient information point labeling. |
| Manual Configuration Workload at Substation | Reduced by 60% | Comparative configuration verification at the substation: (1) automatic discovery and automatic modeling; (2) automatic configuration of substation functions. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
