Reliability Analysis of Cyber–Physical Systems: Case of the Substation Based on the IEC 61850 Standard in China

With the increasing interaction between physical devices and communication components, a substation based on the IEC 61850 standard can be regarded as a cyber–physical system. This paper proposes a reliability analysis method for such substations based on a cyber–physical interface matrix (CPIM). The method accounts for the influence of both physical device failures and communication device failures. Two indices, Probability of Load Curtailments and Expected Demand Not Supplied, are used in the reliability analysis. Applied to a simplified model of a practical substation based on the Chinese IEC 61850 standard, the method shows that the substation has a potential risk of cascading failure under the cyber–physical fusion trend, because failures in the cyber layer increase the power loss of the whole system. The change in Expected Demand Not Supplied grew significantly as the transmission delay rate of the process bus increased.


Introduction
Over the years, cyber-physical systems (CPSs) have attracted considerable attention because of their wide applications in power grids, intelligent robot networks, embedded systems, and other fields. A typical CPS is capable of real-time sensing, dynamic control, and information services [1][2][3]. Smart cyber systems provide better monitoring, data transfer, and control functions for the substation, but this comes with a trade-off: the substation becomes more exposed to cyber-attacks. The Supervisory Control and Data Acquisition (SCADA) system of a nuclear plant recently experienced a severe cyber-attack [4], and cyber security has since become a hot topic in smart grids. Beyond attacks, however, the interactions between cyber devices and physical devices in substations based on the IEC 61850 standard might create new failure scenarios for substations. Thus, it is important to address the reliability of the substation considering the interactions between the cyber layer and the physical layer.
In recent years, more research has focused on cyber security in power grids. S. Lim et al. [5] illustrated cyber security in a typical smart grid, and B. Falahati et al. [6] categorized four types of cyber-power interdependencies. To evaluate the direct element-element interdependency between the power grid and the communication network, B. Falahati et al. [6] proposed a probability table, denoted the P-Table, to analyze the reliability of integrated systems. Based on a state-updating model, indirect cyber-power interdependency has also been modeled to evaluate the reliability of cyber-power networks.

Interaction Framework of the Cyber-Physical Substation
In Figure 1, once a physical device fails, the physical fault clearing process is the key factor in keeping the substation functioning correctly. Fault clearing proceeds as follows: when a physical component fails, the corresponding voltage or current transformers measure the fault information and send the analog signal to the merging unit (MU) [13,16]. The MU digitizes the information and sends it to the protection Intelligent Electronic Devices (IEDs) of the corresponding physical components. The protection IEDs generate the tripping signal through the protection algorithm. Finally, the process bus sends the signal to the circuit breaker, which acts accordingly, thus limiting the scope of the failure of the physical components. This process therefore depends partly on the cyber components. If all the components in the process act correctly and accurately, the fault clearing succeeds and the failure is confined to the initially failed physical components. Otherwise, the fault clearing fails and the failure propagates to other physical components.
In summary, the reliability of cyber elements such as MUs, IEDs, and the process bus is important for signaling primary equipment failures and keeping the substation working. Once a failure occurs in the primary equipment of the substation, three types of scenarios can occur during the physical fault clearing process: low-impact failure, local-impact failure, and wide-impact failure. Figure 2 shows the three types of impact for a failure at the busbar. In this paper, if the related cyber devices work correctly and accurately during the physical fault clearing process, we say they are working functionally; otherwise, we say they are malfunctioning.

The first type is the low-impact situation, in which no fault occurs in the cyber components (Figure 2a). All the information from the primary equipment can be sent out, so the physical fault clearing process works normally. For example, in Figure 2a, the fault occurs in the busbar and does not spread elsewhere.
The second type is local-impact. If some cyber components other than the process bus malfunction during the physical fault clearing process, the failure might spread to the surrounding components, but it can be confined to a local scope by other functional cyber components. For example, in Figure 2b, the initial fault also occurs in the busbar, but the final fault spreads to the main transformer because of MU failures.
The third type is wide-impact. The entire communication of the cyber-physical substation breaks down if a core communication component is damaged. The process bus plays this core role in the communication process: once it fails, no information about the substation operating states can be sent out. For example, in Figure 2c, the initial fault still occurs in the busbar, but the whole system breaks down because of the process bus failure.

Model Quantifying the Interactions
Considering the three kinds of impact caused by cascading failures in substations, listed in Section 2, cascading failure chains can be described by a probabilistic model. To describe the final cascading failure impact, we define the working states of the cyber components: a 0/1 sequence of the related cyber components reflects the final system state under different physical faults. Let 0 denote functioning and 1 denote malfunctioning. Given the original failure in the substation, the working states of all related cyber components in the cascading failure chain can be obtained, and the impact of the cascading failure chain can be quantified by Equation (1). In (1), $m$ is the number of physical components, $n$ is the number of cascading scenarios of each physical component, and $p_{m,n}$ is the probability that the $m$th physical component causes its $n$th cascading scenario; thus, the row vector $[p_{m,i}]$, $i \in [0, n]$, is the cascading scenario set of the $m$th physical component.
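As an illustration of how one row of Equation (1) can be obtained from 0/1 state vectors, the following Python sketch enumerates every state vector of the cyber devices involved in clearing a fault on line A, multiplies the per-device probabilities (assuming the devices fail independently), and aggregates the results by cascading scenario. The device names, the malfunction probabilities in `MALFUNCTION_PROB`, and the rule in `classify` are hypothetical simplifications loosely modeled on the line-A discussion of Table 4; they are not the paper's data or its exact procedure.

```python
from itertools import product

# Hypothetical malfunction probabilities p' of the cyber devices involved in
# clearing a fault on line A (illustrative values, not Table 2).
MALFUNCTION_PROB = {
    "breaker_1": 1e-3,
    "MU_1": 1e-3,
    "IED_A": 1e-3,
    "process_bus": 3e-3,
}


def classify(states):
    """Map a 0/1 state vector (0 = functioning, 1 = malfunctioning) to a
    cascading scenario using a simplified, assumed rule."""
    if states["process_bus"] == 1:
        return "wide-impact: the entire system"
    if any(states[d] == 1 for d in ("breaker_1", "MU_1", "IED_A")):
        return "local-impact: AB"
    return "low-impact: A"


def cpim_row(malfunction_prob):
    """Aggregate the probabilities of all 2^n state vectors by scenario,
    assuming the devices fail independently."""
    devices = list(malfunction_prob)
    row = {}
    for bits in product((0, 1), repeat=len(devices)):
        states = dict(zip(devices, bits))
        prob = 1.0
        for device, bit in states.items():
            q = malfunction_prob[device]
            prob *= q if bit == 1 else 1.0 - q
        scenario = classify(states)
        row[scenario] = row.get(scenario, 0.0) + prob
    return row


row = cpim_row(MALFUNCTION_PROB)
for scenario, prob in row.items():
    print(f"{scenario:32s} {prob:.9f}")
```

Because every state vector maps to exactly one scenario, the entries of the resulting row sum to 1, the same property that each CPIM row must satisfy.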
In practice, however, a cyber component's working state is not deterministically 0 or 1. Therefore, we model each cyber component with a two-state model, as shown in Figure 3. The state of the cyber component is either 0 or 1, where 0 represents working functionally (down) and 1 represents malfunctioning (up). In Figure 3, λ denotes the failure rate of an individual component and µ denotes its repair rate. The detailed data are given in Table 1.
The occurrence probabilities of the functional working state, $p$, and of the malfunctioning state, $p'$, are calculated with Equations (2) and (3), respectively.
There are some delays in the communication process [11]. Transmission over the process bus is delayed with probability η (η = 0.3% in the case study). Considering this delay, Equations (2) and (3) are updated to Equations (4) and (5), respectively.
The functional and malfunctioning state probabilities of each cyber component are calculated as shown in Table 2. According to Equation (4), the functional working state probability of the process bus is smaller than that of the other components, because the delay probability is included for the process bus.
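A minimal sketch of the two-state model, assuming the standard steady-state result for a two-state Markov process, $p = \mu/(\lambda+\mu)$ and $p' = \lambda/(\lambda+\mu)$. Equations (2) to (5) are not reproduced in this text, so the `(1 - delay_prob)` factor applied to the process bus is only an assumed way of folding in the delay probability η, and the rates below are hypothetical rather than the values of Table 1.

```python
from dataclasses import dataclass


@dataclass
class TwoStateComponent:
    """Two-state (functional / malfunctioning) repairable cyber component."""
    failure_rate: float  # lambda, e.g. failures per year
    repair_rate: float   # mu, repairs per year (same time unit as lambda)

    def functional_prob(self, delay_prob: float = 0.0) -> float:
        """Steady-state probability p of the functional state.

        mu / (lambda + mu) is the standard two-state Markov result; the
        (1 - delay_prob) factor is an ASSUMPTION, not the paper's Eq. (4).
        """
        lam, mu = self.failure_rate, self.repair_rate
        return (1.0 - delay_prob) * mu / (lam + mu)

    def malfunction_prob(self, delay_prob: float = 0.0) -> float:
        """Complementary probability p' of the malfunctioning state."""
        return 1.0 - self.functional_prob(delay_prob)


# Hypothetical rates (not Table 1): lambda = 0.01 /yr, mu = 100 /yr.
ied = TwoStateComponent(failure_rate=0.01, repair_rate=100.0)
process_bus = TwoStateComponent(failure_rate=0.01, repair_rate=100.0)
eta = 0.003  # delay probability used in the case study

print(f"IED:         p = {ied.functional_prob():.8f}")
print(f"process bus: p = {process_bus.functional_prob(eta):.8f}")
```

With equal rates, the only difference between the two outputs is the delay factor, which reproduces the qualitative observation that the process bus has the smallest functional probability.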

Indices of Cyber-Physical Substation Reliability
The Probability of Load Curtailments ($U_k$) and the Expected Demand Not Supplied (EDNS) were used to evaluate the reliability of the cyber-physical substation; they are defined in Equations (6) and (7), respectively. In Equation (6), $N$ is the number of simulations, $T_{dn,ik}$ is the duration of the $i$th curtailment of load $k$, and $T_{up,ik}$ is the duration of the $i$th functional working period of load $k$.
In Equation (7), $L_k$ is the average load not supplied at load-point $k$ during the simulation, $P_{ik}$ is the probability of failure of sub-state $i$ at load-point $k$, and $N_k$ is the total number of states or sub-states that cause load curtailment at load-point $k$.
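Because Equations (6) and (7) are not reproduced in this text, the sketch below uses conventional forms consistent with the variable definitions above, which is an assumption: $U_k$ as total curtailment time divided by total simulated time, and EDNS as the probability-weighted sum of load not supplied over the curtailment states, with a per-state load $L_{ik}$ in place of the average $L_k$. The function names and example numbers are hypothetical.

```python
from typing import Sequence


def probability_of_load_curtailment(t_down: Sequence[float],
                                    t_up: Sequence[float]) -> float:
    """Assumed form of Eq. (6): sum(T_dn,ik) / sum(T_dn,ik + T_up,ik).

    t_down[i] is the duration of the i-th curtailment of load k and
    t_up[i] the duration of the i-th functional period of load k.
    """
    if len(t_down) != len(t_up):
        raise ValueError("t_down and t_up must have the same length")
    total_down = sum(t_down)
    return total_down / (total_down + sum(t_up))


def expected_demand_not_supplied(load_not_supplied_mw: Sequence[float],
                                 state_probs: Sequence[float]) -> float:
    """Conventional EDNS in MW: sum over curtailment states of L_ik * P_ik."""
    if len(load_not_supplied_mw) != len(state_probs):
        raise ValueError("inputs must have the same length")
    return sum(l * p for l, p in zip(load_not_supplied_mw, state_probs))


# Hypothetical inputs: two curtailments (5 h and 3 h) in 8760 simulated hours,
# and two curtailment states with 100 MW and 200 MW lost.
u_k = probability_of_load_curtailment([5.0, 3.0], [4000.0, 4752.0])
edns = expected_demand_not_supplied([100.0, 200.0], [1e-3, 1e-4])
print(f"U_k = {u_k:.6f}, EDNS = {edns:.3f} MW")
```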

Reliability Simulation Method
The simulation was based on the sequential Monte Carlo method. Considering cascading failures in the substation, the reliability simulation steps were as follows:
(1) Set the simulation time t = 0 and initialize both the cyber-layer and the physical components.
(2) Randomly generate the states of all physical components. The time to the next state change of each physical component follows an exponential distribution:
$$T_i = -\frac{1}{\sigma_i}\ln U_i \qquad (8)$$
where $U_i$ is a random number uniformly distributed on [0, 1]. If item $i$ is currently functional, $\sigma_i$ is its failure rate; otherwise, it is unfunctional and $\sigma_i$ is its repair rate. Based on Equation (8), find $\min\{T_i\}$ and the corresponding component $j$; the working state of physical component $j$ changes at the next simulation time.
... the $s$th scenario of physical component $j$ occurs.
(5) Calculate the reliability indices.
(6) Repeat steps (3) to (5) until the coefficient of variation is less than the allowable value:
$$\beta = \frac{\sqrt{V(F)/\mathit{NS}}}{E(F)}$$
where $V(F)$ is the variance of the test function, $\mathit{NS}$ is the number of simulated years, and $E(F)$ is the expected value of the test function.
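The sketch below illustrates steps (1), (2), and (6) for purely physical components: each step draws $T_i = -\ln(U_i)/\sigma_i$ for every component, advances the clock to the earliest transition, and stops once the coefficient of variation $\sqrt{V(F)/\mathit{NS}}/E(F)$ of a yearly test function falls below a tolerance. The cyber-layer steps (3) and (4) are omitted, and the series rule "the load is curtailed whenever any component is down", the rates, and the tolerance are hypothetical. Redrawing all transition times after every event is valid here only because the exponential distribution is memoryless.

```python
import math
import random

HOURS_PER_YEAR = 8760.0


class RunningStats:
    """Welford's online mean/variance of the yearly test function F."""

    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def coefficient_of_variation(self):
        """beta = sqrt(V(F) / NS) / E(F), with NS simulated years."""
        if self.n < 2 or self.mean == 0.0:
            return math.inf
        return math.sqrt(self.m2 / (self.n - 1) / self.n) / self.mean


def next_transition(states, failure_rates, repair_rates, rng):
    """Step (2): T_i = -ln(U_i) / sigma_i; return (min T_i, index j)."""
    best_t, best_j = math.inf, -1
    for i, is_up in enumerate(states):
        sigma = failure_rates[i] if is_up else repair_rates[i]  # per hour
        u = 1.0 - rng.random()  # U_i uniform on (0, 1], avoids log(0)
        t = -math.log(u) / sigma
        if t < best_t:
            best_t, best_j = t, i
    return best_t, best_j


def simulate(failure_rates, repair_rates, tol=0.05, min_years=100,
             max_years=50_000, seed=1):
    rng = random.Random(seed)
    states = [True] * len(failure_rates)  # step (1): all components up at t = 0
    stats = RunningStats()
    t, year_end, down_time = 0.0, HOURS_PER_YEAR, 0.0
    while stats.n < max_years:
        dt, j = next_transition(states, failure_rates, repair_rates, rng)
        t_next = t + dt
        curtailed = not all(states)  # hypothetical series rule
        while t_next >= year_end:  # close every simulated year crossed
            if curtailed:
                down_time += year_end - t
            stats.add(down_time / HOURS_PER_YEAR)  # F = curtailed fraction
            t, year_end, down_time = year_end, year_end + HOURS_PER_YEAR, 0.0
        if curtailed:
            down_time += t_next - t
        t = t_next
        states[j] = not states[j]  # component j changes state
        if stats.n >= min_years and stats.coefficient_of_variation() < tol:
            break  # step (6): convergence reached
    return stats.mean, stats.n


# Two hypothetical components: 0.5 failures/yr each, 20 h mean repair time.
mean_f, years = simulate([0.5 / HOURS_PER_YEAR] * 2, [1 / 20.0] * 2)
print(f"estimated unavailability = {mean_f:.5f} after {years} simulated years")
```

For these hypothetical rates the analytical series unavailability is about 0.00228, so the estimate can be sanity-checked against it.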

CPIM of Each Component in the Cyber-Physical Substation
A simplified model of a typical substation based on the IEC 61850 standard in China, a 220/121/38.5 kV step-down substation, is shown in Figure 4. The annual average load of each of load-point-1 and load-point-2 is 100 MW. Details of the primary devices of the substation are given in Table 3. In Figure 4, there are 11 breakers, denoted 1, 2, 3, ...; A and J are the transmission lines; C, D, and E are the main transformers; B, F, G, H, and I are the buses; and there are 8 merging units (MUs), denoted MU1, MU2, .... According to (1), the shape of the CPM of Figure 4 is given by (11). In (11), there are 10 physical devices, denoted A, B, ..., J, so the number of rows is m = 10, and each row vector is the CPIM of one physical device. For example, the CPIM of physical device A is denoted $\mathrm{CPIM}_A^{1\times a}$, where a is the number of cascading scenarios of A; similarly, the CPIM of physical device B is denoted $\mathrm{CPIM}_B^{1\times b}$, where b is the number of cascading scenarios of B; and so on, up to $\mathrm{CPIM}_J^{1\times j}$ for physical device J, where j is the number of cascading scenarios of J. Thus, the number of columns of the CPM is a + b + ... + j. The CPIM of each physical device is given in Tables 4-13.
Based on the CPIM method in Section 3, $\mathrm{CPIM}_A$ for a fault clearing at line A is shown in Table 4. In this case, three kinds of cascading chains can occur within the substation. In scenario 1, all the related cyber devices work functionally, so the breaker obtains the failure information and then locates and clears the failure. The failure scope is limited to A, which is the low-impact case described in Section 2. Table 4 shows that, for a fault clearance on line A, more than 99% of failures are limited to A.
However, in very rare cases, the failure scope extends to the entire system because the process bus connected to A malfunctions, which is the wide-impact case described in Section 2; in Table 4, this scenario has the smallest probability. With a small probability of 0.3%, one or more of breaker 1, merging unit 1, and the protection IED of A malfunction, which causes breaker 1 to fail and, in turn, B to fail. Because breakers 2, 3, and 4 still work functionally, the failure scope is limited to A and B, which is the local-impact case described in Section 2. Thus, based on

Cascading Scenario | Effects Scope | Probability
... | The entire system | 0.000033384
3 | AB | 0.003009105

Table 5. $\mathrm{CPIM}_J$ of the line fault clearance at transmission line J.

Cascading Scenario | Effects Scope | Probability
low-impact | J | 0.996957511
wide-impact | The entire system | 0.000033384
local-impact | IJ | 0.003009105

Table 6 shows a similar analysis for a fault clearing at bus B. In this case, consider all the cyber devices connected to B: merging units 1, 2, 3, and 4; breakers 1, 2, 3, and 4; and the process bus. Three kinds of cascading chains can occur within the substation: low-impact, wide-impact, and local-impact. In Table 6, more than 99% of failures are limited to B, because all the related cyber devices function properly. However, with a smaller probability of 0.3%, the failure scope extends to the entire system because the process bus malfunctions. Depending on the size of the failure scope caused by different related cyber devices, four kinds of local-impact may occur, each with a very small probability.
In Table 6, there are four types of local-impact, denoted local-impact 1, 2, 3, and 4, and the total number of cascading scenarios is 17. Local-impact 1: if one of the merging units or related breakers malfunctions, the failure effect scope is limited to B and one of its connected physical devices; local-impact 1 comprises 4 cascading scenarios. For example, if either merging unit 2 or breaker 2 is dysfunctional while the others are functional, the effect scope is limited to B and C. Local-impact 2: if two of the merging units or related breakers are dysfunctional, the failure effect scope is limited to B and two of its connected physical devices; local-impact 2 comprises 6 cascading scenarios. For example, the effect scope ABE might result from failures of breakers (or merging units) 1 and 4. Similarly, if three (four) of the merging units or related breakers malfunction, the failure effect scope is limited to B and three (four) of its connected physical devices.
The numbers of cascading scenarios belonging to local-impact 3 and local-impact 4 are 4 and 1, respectively.
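The local-impact counts 4, 6, 4, and 1 are the numbers of ways to choose one, two, three, or four of the four devices connected to bus B, that is, the binomial coefficients C(4, r); adding one low-impact and one wide-impact scenario gives the 17 scenarios of Table 6. The Python sketch below enumerates the resulting effect scopes under an assumed breaker-to-device mapping: the examples in the text imply that breaker 1 leads to A, breaker 2 to C, and breaker 4 to E, while the assignment of D to breaker 3 is hypothetical.

```python
from itertools import combinations

# Hypothetical breaker -> neighbouring device mapping for bus B. Breakers 1, 2
# and 4 follow the examples in the text; breaker 3 -> "D" is an assumption.
BUS_B_NEIGHBOURS = {1: "A", 2: "C", 3: "D", 4: "E"}


def local_impact_scopes(bus, neighbours):
    """Map r -> effect scopes when r connected devices fail along with the bus."""
    scopes = {}
    for r in range(1, len(neighbours) + 1):
        scopes[r] = ["".join(sorted(bus + "".join(combo)))
                     for combo in combinations(neighbours.values(), r)]
    return scopes


scopes = local_impact_scopes("B", BUS_B_NEIGHBOURS)
for r, scope_list in scopes.items():
    print(f"local-impact {r}: {len(scope_list)} scenarios {scope_list}")

# One low-impact scenario + one wide-impact scenario + all local-impact ones.
total = 2 + sum(len(s) for s in scopes.values())
print(f"total cascading scenarios: {total}")
```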
Thus, based on Table 6

Table 8. $\mathrm{CPIM}_G$ of the fault clearance at bus G.

Cascading Scenario | Effects Scope | Probability
low-impact | G | 0.996957511
wide-impact | The entire system | 0.000033384
local-impact | GD | 0.003009105

Table 9. $\mathrm{CPIM}_H$ of the fault clearance at bus H.

Cascading Scenario | Effects Scope | Probability
low-impact | H | 0.996957511
wide-impact | The entire system | 0.000033384
local-impact | HE | 0.003009105

Table 10. $\mathrm{CPIM}_I$ of the fault clearance at bus I.

Cascading Scenario | Effects Scope | Probability
low-impact | I | 0.996911991
wide-impact | The ...

Using the same analysis method, Table 11 shows the results for a fault clearing at transformer C. The results are similar to those in Table 6: (1) more than 99% of failures are low-impact and limited to C; (2) with a smaller probability of 0.3%, the failure scope extends to the entire system because the process bus malfunctions, which is a wide-impact case; and (3) local-impact cases, classified by the number of malfunctioning related cyber devices, occur with low probability. Thus, based on Table 11, the number of cascading scenarios is 9, and
$$\mathrm{CPIM}_C = [0.996927164,\ 1.51734070\times10^{-5},\ 1.51734070\times10^{-5},\ 1.51734070\times10^{-5},\ 2.30941925\times10^{-10},\ 2.30941925\times10^{-10},\ 2.30941925\times10^{-10},\ 1.82100387\times10^{-5},\ 0.003009105]_{1\times 9},$$
which satisfies $\sum \mathrm{CPIM}_C = 1$.
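One reading of the CPM shape in (11), consistent with m = 10 rows and a + b + ... + j columns, is a block layout in which the row for device d carries $\mathrm{CPIM}_d$ in its own column block and zeros elsewhere. The Python sketch below assembles such a matrix from the four rows whose full values appear in this section (C, G, H, and J) and checks that each row sums to 1; the block layout is an interpretation, not a reproduction of Equation (11), and `assemble_cpm` is a hypothetical helper.

```python
# CPIM rows reproduced from Tables 5, 8, 9 and the CPIM_C vector above.
CPIM = {
    "C": [0.996927164, 1.51734070e-5, 1.51734070e-5, 1.51734070e-5,
          2.30941925e-10, 2.30941925e-10, 2.30941925e-10,
          1.82100387e-5, 0.003009105],
    "G": [0.996957511, 0.000033384, 0.003009105],
    "H": [0.996957511, 0.000033384, 0.003009105],
    "J": [0.996957511, 0.000033384, 0.003009105],
}


def assemble_cpm(rows):
    """Place each device's 1 x n_d CPIM in its own row and column block."""
    width = sum(len(r) for r in rows.values())
    cpm, offset = [], 0
    for row in rows.values():
        full = [0.0] * width
        full[offset:offset + len(row)] = row
        cpm.append(full)
        offset += len(row)
    return cpm


cpm = assemble_cpm(CPIM)
for device, full_row in zip(CPIM, cpm):
    print(f"{device}: {len(CPIM[device])} scenarios, row sum = {sum(full_row):.10f}")
```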

Reliability Analysis Results
We consider the reliability of load-point-1, load-point-2, and the entire system in Figure 4. Both a traditional simulation that ignores the impact of the cyber layer and the proposed method with integrated CPIM were carried out, and the resulting Probability of Load Curtailment (PLC) is given in Table 14. As the growth rate (∆%) shows, considering the cyber layer increased the probability of load curtailment only slightly, by 4.43%. The increase relative to the traditional simulation was not pronounced, especially for the entire substation, because the high reliability of the cyber components keeps the risk of cascading failure low.
The comparison of EDNS is shown in Table 15. Compared with the traditional simulation, the EDNS of the entire substation increased by 7.41%, and the EDNS at load-point-1 increased by 11.93%. Comparing Table 15 with Table 14 shows that failures in the cyber layer have a more significant impact on electricity unavailability than on the probability of load curtailment.
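The growth rates (∆%) in Tables 14 and 15 compare each index with and without the cyber layer. Assuming the usual definition of relative change with respect to the traditional simulation, the computation is a one-liner; the helper name and the input values below are illustrative, not taken from the tables.

```python
def growth_rate_percent(with_cyber: float, traditional: float) -> float:
    """Assumed definition: Delta% = (with_cyber - traditional) / traditional * 100."""
    if traditional == 0:
        raise ValueError("the traditional-simulation index must be non-zero")
    return (with_cyber - traditional) / traditional * 100.0


# Illustrative values only: an index rising from 1.0000 to 1.0741.
print(f"{growth_rate_percent(1.0741, 1.0):.2f}%")
```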

Effects of Delay Rates
Delay rates from 0 to 0.005 were assumed for all process buses. In practice, the delay rate may increase because of electromagnetic interference or other factors. The quantitative relationship between the process-bus delay rate and the EDNS was studied, and the results are shown in Figure 5. The system EDNS increased considerably, and its growth rate increased linearly as the delay rate rose. This shows that the delay rate of the process bus significantly affects fault clearing. Advanced smart-grid technologies are therefore important: highly reliable control components and fast information transmission accelerate cyber failure identification and physical fault clearing.

Conclusions
With the development of substation automation applications, the interdependency between the communication network and the primary equipment must be considered. This paper extended the cyber-physical interface matrix (CPIM) methodology to reliability analysis. Two reliability indices, PLC and EDNS, were used; the case-study results verified that failures in the cyber layer reduce the reliability of the substation system, and the sensitivity analysis revealed that the process bus plays a key role in the reliability of the entire substation. Although the probability of a time delay in information transmission is small, it is a critical factor driving reliability changes in cyber-physical substations.
The proposed reliability assessment method can also be applied to the reliability problems of cyber-physical power systems more broadly. Future work should analyze in more detail the interdependency between the physical side and the cyber layer of such systems.
