Reliability Evaluation of Smart Substation Based on Time-Varying Probabilistic Hybrid Attack Graph

Abstract: A substation is the portion of a power grid that forms a link between the cyber system and the physical system. This paper studies the reliability evaluation of smart substations based on a time-varying probabilistic hybrid attack graph (TVPHAG). First, the topology network of the smart substation is established, with its attributes represented by probabilities. Then, to address the asynchrony of the cyber-physical system and the hybrid behavior caused by heterogeneity, time-varying state equations in topology and cuts in algebra are introduced into TVPHAG. Based on TVPHAG, a reliability evaluation of cyber-physical systems with multiple pieces of equipment and multiple timescales is established. On this basis, the influences of physical conditions, cyberattacks, physical attacks, and cyber-physical attacks on substations are analyzed. Finally, simulations show that the method is effective in evaluating the reliability of smart substations and thus offers a new approach to reliability evaluation.


Introduction
The modern power grid is developing toward the cyber-physical power system (CPPS), which coordinates the cyber system and the physical power system [1]. A substation is the portion of a power grid that forms a link between the cyber system and the physical system. The cyber system monitors the state of the physical system and then sends real-time control information to the physical system [2,3]. The process is time-varying because of the asynchrony of the control system and the interdependence between the cyber and physical systems [4,5]. Cascading failures of the power grid have caused several incidents around the world, such as the large-scale blackout in Ukraine caused by a deliberate cyberattack in 2015 [6], the blackout in Israel in 2016 [7], and the blackout in Venezuela in 2019. A cyberattack may destroy the transient stability of the power grid through cross-space spreading, thus breaking the physical system, causing large-scale power outages [8,9], and even inducing power grid splitting [10]. Therefore, the reliability assessment of smart substations is of great importance.
An attack graph is a security assessment technique. The likelihood of attack paths can be calculated through the causality among attack steps [11]. Attack graphs can be used to identify network vulnerabilities and evaluate the safety of the system [12,13]. The reliability of substations is also affected by physical equipment [14], and attack graphs are used to identify hazards of physical conditions as well [15], for example in the risk assessment of power distribution equipment [16] and the assessment of the status of an attacked power grid [17].
To avoid the spread of risks between the cyber system and the physical system, one approach is to establish a model of interdependence using complex network theory to abstract the power grid and cyber grid into a stochastic network [18,19].
However, those papers ignore the complex coupling between the two systems. In order to obtain quantitative results, attack tree theory was used for vulnerability analyses and security assessments of SCADA (supervisory control and data acquisition) systems [20][21][22].
Then, a quantifiable method combined with CVSS (Common Vulnerability Scoring System) was proposed. With the help of the Bayesian network, the method addressed the lack of quantification, confidence, and readability in the attack graph [23]. The Bayesian network was also used to study the probability of cascading failures and the consequences of cyberattacks [24][25][26] and to trace the complete path of attacks [27].
The above papers have put forward probabilistic methods for the reliability evaluation of smart substations. However, being limited to static characteristics, they have difficulty capturing the time-varying and dynamic relationships of components and systems. A hierarchical Bayesian model was then proposed, which achieved dynamic reliability evaluation by integrating historical data and real-time data [28], but it was not suitable for cyber-physical systems. In [29,30], the Bayesian network was used to track the complete path of attacks, which caused a cascading failure in the order of vulnerability on the network, host authority, and executor sequence. Taking probability and coupling into consideration, the model was used to study the relevance between the transition of system state and the attack interval. The Petri net was also used in the modeling of cyber-physical systems to figure out vulnerability cascade propagation [31,32]. Used as a substation fault diagnosis method, it incorporates time sequence and probabilistic features and obtains the diagnosis results with the help of failure information [33]. However, the above methods were essentially used to study the time-invariant dynamic response under attack, ignoring the asynchrony and reconstruction of the network in a cascading failure process. The discrete-time state-space model provides another method for solving problems in the time domain [34]. However, research so far has usually established a linear time-invariant model of the system state. Although it is applicable to physical conditions [35], cyberattacks [36], and Gaussian noise [37], it still fails to solve time-varying problems.
In addition, current research often ignores the complex time-varying coupling inside the substation. Some papers tried to study interdependence by using correlation matrices, but they only considered the one-to-one correspondence between the two layers to establish a 0-1 logical matrix [38]. The Petri net is able to analyze the internal situation of substations, but it is a post-mortem diagnosis method that requires the alarm information at the time of failure as a basis [39]. Some scholars have tried to use the artificial neural network (ANN) to evaluate substation reliability. However, fault samples are difficult to obtain and the results lack confidence [40]. It is therefore necessary to analyze the coupling between equipment inside the substation.
Aiming to solve the problem of poor practicability in reliability evaluation, this paper proposes a time-varying probabilistic hybrid attack graph and its generation algorithm. The method takes the cyber-physical conditions and the hybrid coupling of the substation into consideration, and it abstracts each piece of equipment as a vertex to establish the attack graph based on the structure of the smart substation. Considering the timeliness of transmission in the network, the established model reflects the asynchrony and reconstruction of the network and then infers the dynamic changes in the reliability of each piece of equipment and of the substation. The simulations of TVPHAG illustrate the characteristics of physical conditions, cyberattacks, physical attacks, and cyber-physical attacks, which provides guidance for ensuring the safety of substations.

Definition and Description of TVPHAG
This paper proposes the time-varying probabilistic hybrid attack graph (TVPHAG), whose vertices and edges are weighted by probabilities. The model is suitable for smart substations with heterogeneous components, dynamics, and asynchronous behaviors. Moreover, an evaluation for each vertex in the grid is proposed. The definition and algorithm of TVPHAG ( , , , ) are given as follows:

Establishment of Network
TVPHAG is a directed graph whose topology is determined by , . The set of vertices is defined in Equation (1), where represents the set of all vertices in TVPHAG, represents the set of vertices of equipment, and represents the set of vertices of consequence. represents equipment, with a total of . represents the consequences of cascading failures, with a total of .
The set of edges is defined in Equation (2), where , represents a directed edge ⟨ , ⟩. is the set of directed edges with coupling between vertices, i.e., the set of paths of spread.
The geometric topology of TVPHAG is established through the following rules: (1) the messages and instructions on the secondary side are directed from the previous level to the next level; (2) the secondary equipment points to the primary equipment controlled by it; (3) the primary equipment points to the measuring equipment; (4) other couplings between the equipment are added as edges; (5) the primary equipment with an abnormality points to the corresponding consequences.
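The construction rules above can be sketched as a small directed graph. The following Python example is illustrative only: the toy bay, the choice of devices, the consequence vertex `bay_outage`, and the helper `build_topology` are assumptions made here and do not reproduce the substation studied in this paper.

```python
from collections import defaultdict

def build_topology(edges):
    """Return adjacency lists {source: [targets]} for a directed graph."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    return dict(graph)

# Hypothetical toy bay; each edge is annotated with the rule that creates it.
EDGES = [
    ("NCCS", "PD"),        # rule 1: secondary messages flow from upper to lower level
    ("PD", "IED"),         # rule 1
    ("IED", "CB"),         # rule 2: secondary equipment points to the primary equipment it controls
    ("CB", "VCT"),         # rule 3: primary equipment points to measuring equipment
    ("VCT", "MU"),         # rule 4: other coupling (assumed measurement path)
    ("MU", "PD"),          # rule 4: other coupling (assumed sampled-value path)
    ("CB", "bay_outage"),  # rule 5: abnormal primary equipment points to its consequence
]

graph = build_topology(EDGES)

# Entering edges per vertex: the sources whose failure can spread to it.
entering = defaultdict(list)
for src, targets in graph.items():
    for dst in targets:
        entering[dst].append(src)

print(graph["CB"])     # successors of the circuit breaker
print(entering["PD"])  # vertices whose failure can spread to the protection device
```

Keeping the entering-edge lists alongside the adjacency lists is convenient because the failure probability of a vertex is later computed from the vertices that point to it.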

Establishment of Parameter
The parameters of TVPHAG include , . The vector of device attributes is defined in Equation (3), where represents the self-triggering probability of the vertex of equipment in TVPHAG, i.e., the probability of equipment spontaneously failing, which is affected by operating conditions, working years, and other factors.
In Equation (4), → represents the mapping from the left set to the right set, and represents the mapping function. The mapping of → { } is a one-to-one correspondence, i.e., a bijection.
The matrix of the directed edge is defined in Equation (5), where , is defined as the triggering probability. is defined as the matrix of probability that faults are triggered by other vertices, whose element , represents the probability that causes a malfunction ( ∈ , ∈ ). The element , in the matrix represents the probability of causing a malfunction ( ∈ , ∈ ).
In Equation (6), is the mapping relationship between the sets formed by elements in and sets and , which is a bijection.
In this paper, the MTTF (Mean Time To Failure) of equipment is used as the self-triggering probability . However, the conditions of substations and equipment differ, so the influence of temperature, operating years, precipitation, loading rate, etc., should be considered. These factors are set as the independent variables Ω = [ 1 (℃) ( ) ( ) (%)], and the relative failure probability is recorded as (Ω) = ( ( ), ( ), ( ), ( )). According to big data on equipment failure, a relationship between them is obtained, given in Equation (7) [41]. The factors, viz. temperature, operating years, precipitation, and loading rate, should be evaluated separately for reliability because they carry different weights. The required information is collected according to technical standards and the actual operating conditions of the power equipment. The correction coefficient of is calculated by Equation (8), where * ( = 1, 2, 3, 4) are the reference values, forming the vector Ω * = [ * * * * ] = [15 6 10 50], which is selected according to statistical data. The MTTF is replaced by = Ω × . Finally, the self-triggering probability is described in terms of this corrected MTTF.

Analysis of the Spread of Faults
Abnormal data or actions are accompanied by abnormal states of the equipment. In TVPHAG, the probability of failure changes as the abnormal state spreads through the equipment. The spread of state is asynchronous due to the delays of physical equipment actions, information transmission, data sampling, etc. The process is analyzed as follows. According to the IEC 61850 standard, all messages of a smart substation are divided into seven categories according to their transmission time range. The main information flows of the process layer are GOOSE, SV, and synchronization messages. The transmission time of the secondary messages [42] and the response time of the primary devices [43] are shown in Tables 1 and 2.
The transfer of a device's state is delayed by the period of sampling, transmission, and processing of messages and by the action of the equipment. In TVPHAG, this is manifested as a delay in the failure cascade. Therefore, TVPHAG is a dynamic directed graph with asynchrony. Figure 1 shows a simple TVPHAG, which can be divided into multiple subgraphs according to the period of delay. In this paper, the transmission mechanism of failure probability is represented by matrices and . It follows from the above analysis that matrices and are time-varying. With these methods, the reliability of equipment and bays can be calculated and expressed as a probability, and the loss-of-load probability (LOLP) of the substation can then be calculated according to the connection relationship and operation mode of each bay. LOLP is defined as the expected loss of load per unit.
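One way to picture the delay-based decomposition is to group edges by their delay and to activate each group only at time steps that are multiples of that delay. The sketch below is a minimal illustration under stated assumptions: a 1 ms base step, hypothetical edges and delays, and a hypothetical activation rule (`active_edges`). The paper's own construction of the time-varying matrices may differ.

```python
from collections import defaultdict

# Hypothetical edges: (source, target, triggering probability, delay in ms).
EDGES = [
    ("MU", "PD", 0.12, 1),
    ("PD", "IED", 0.12, 3),
    ("IED", "CB", 0.12, 25),
    ("CB", "VCT", 0.12, 100),
]

def split_by_delay(edges):
    """Group edges into subgraphs that share the same delay."""
    subgraphs = defaultdict(list)
    for src, dst, prob, delay in edges:
        subgraphs[delay].append((src, dst, prob))
    return dict(subgraphs)

def active_edges(subgraphs, step_ms):
    """Edges that update at this step: those whose delay divides the elapsed time."""
    active = []
    for delay, edges in subgraphs.items():
        if step_ms > 0 and step_ms % delay == 0:
            active.extend(edges)
    return active

subgraphs = split_by_delay(EDGES)
for t in (1, 3, 75, 100):
    print(t, [(s, d) for s, d, _ in active_edges(subgraphs, t)])
```

Under this assumed rule, the set of active edges changes from step to step, which is one simple way a state-update matrix becomes time-varying even though the underlying topology is fixed.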

Algorithm of TVPHAG
The failure probability of equipment is influenced by the following factors: the equipment spontaneously transfers to an abnormal state with the probability ; the equipment state is affected by other equipment with the probability ∑ × , summed over ∈ ( ), where ( ) represents the set of the entering edges of and represents the failure probability of . Therefore, the failure probability of is obtained as Equation (9). A discrete-time system is then established by discretizing time.
( ) represents the failure probability of at time . Equation (9) is extended to all vertices in sets and and expressed in matrix form in Equations (10) and (11). According to Equations (10) and (11), the smart substation is modeled as a multivariable discrete-time linear time-invariant system.
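For concreteness, the single-vertex relation and its stacked matrix form can be written as follows. This is one reading consistent with the verbal description above, not a quotation of Equations (9) to (11). The symbols are assumptions chosen here: x_j is the failure probability of vertex v_j, p_j its self-triggering probability, w_{ij} the triggering probability of the edge from v_i to v_j, and In(v_j) the set of vertices with an edge entering v_j.

```latex
% Single vertex: spontaneous failure plus contributions spread along entering edges
x_j = p_j + \sum_{i \in \mathrm{In}(v_j)} x_i \, w_{ij}

% Discretized in time and stacked over all vertices, with W = [w_{ij}]
x(k+1) = W^{\mathsf{T}} x(k) + p
```

The transpose appears because row j of the update must collect the weights of the edges entering v_j, which are stored in column j of W.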
The vector is iteratively updated in the process of calculation, which means the equipment turns to a new state after physical equipment actions, information transmission, and data sampling. Each iteration is one update cycle of device status, which is related to the sampling and action times and the path length. The TVPHAG proposed in this paper considers the heterogeneity of the equipment and the asynchrony of action responses, so the dynamic matrix of the state update is time-varying; its establishment is introduced in Section 2.3, Analysis of the Spread of Faults. Based on the original model, the system is further modeled as a multi-variable discrete-time linear time-varying system. With the iterative calculation, the vector is continuously updated until it converges. This paper gives the condition of convergence without proof [44]: the spectral radius ρ = max{|λ|} < 1, where λ denotes the eigenvalues of the matrix ( ). For physical attacks or cyberattacks on the substation that cause an abnormal state of the equipment, the elements of the attack vector are formed from the attack strength, where ( ) ∈ [0,1] represents the probability of the attack causing an abnormal state of the equipment at time . For a physical attack, the energy intensity applied or caused by the attack is used to evaluate the failure probability. For equipment at a given voltage level, the current determines the hazard level. The current caused by a direct lightning strike can reach 40 kA, and the current caused by a short circuit of the line can reach 1 kA. Assuming that a direct lightning strike will certainly cause a failure, ( ) = 1/40 in the case of a short circuit. Cyberattacks are divided into three categories according to their purpose, i.e., destroying the availability, integrity, or confidentiality of information. In Equation (16), ( ) ∈ ℝ represents the vector of failure probability; ( ) ∈ ℝ represents the vector of consequences of bays; ( ) represents the reliability of the substation, i.e., LOLP; ( ) ∈ ℝ represents the cyber-physical attack on the substation; and matrices , , are dynamic matrices.
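The iteration, its convergence check, and an attack input can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the paper's Equation (16): the three-vertex cycle, the self-triggering probabilities, the additive way the attack enters the update, and the clipping to [0, 1] are all hypothetical, and the matrix is held constant for brevity rather than varying in time.

```python
import numpy as np

def spectral_radius(matrix):
    """Largest eigenvalue magnitude of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(matrix))))

def simulate(A, p, attack, steps):
    """Iterate x(k+1) = clip(A x(k) + p + a(k), 0, 1) from x(0) = 0."""
    x = np.zeros(len(p))
    history = [x.copy()]
    for k in range(steps):
        x = np.clip(A @ x + p + attack(k), 0.0, 1.0)
        history.append(x.copy())
    return np.array(history)

# Hypothetical 3-vertex cycle v0 -> v1 -> v2 -> v0; A[j, i] is the probability
# that a failure of v_i triggers v_j (0.12 on every edge).
A = np.array([[0.00, 0.00, 0.12],
              [0.12, 0.00, 0.00],
              [0.00, 0.12, 0.00]])
p = np.array([0.01, 0.01, 0.01])  # assumed self-triggering probabilities

def attack(k):
    """Short-circuit attack on v0 (strength 1/40) during steps 5..14, else none."""
    strength = 1.0 / 40.0 if 5 <= k < 15 else 0.0
    return np.array([strength, 0.0, 0.0])

assert spectral_radius(A) < 1.0   # convergence condition quoted in the text
trajectory = simulate(A, p, attack, steps=30)
print(trajectory[-1])             # settles back to the attack-free fixed point
```

The attack strength of 1/40 follows the current-ratio example in the text (1 kA short circuit relative to a 40 kA lightning strike). Because the spectral radius is below one, the state returns to the attack-free fixed point after the attack is removed.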

Results and Case Study of TVPHAG
The D2-1 type of smart substation in IEC 61850 has been widely built and includes transformer bays, bus bays, and feeder bays. The primary equipment includes primary power equipment such as buses, transformers, circuit breakers (CB), disconnect switches (DS), and electronic voltage current transformers (VCTs). The secondary equipment includes the merging unit (MU), the intelligent electronic device (IED), the protection device (PD), the measurement and control device (MD), switches, the Network Control Center Server (NCCS), and other equipment [45]. The secondary system can be divided into three levels, viz., station level, bay level, and process level. This paper takes the D2-1 110 kV smart substation as an example. It has 2 SSZ10-40000/110 main transformers with a total capacity of 2 × 40 MVA, a 110 kV sectionalized configuration, 2 incoming and outgoing lines, etc. Its topology is shown in Figure 3, where A and B are transformer bays, C is a bus bay, D and E are feeder bays, and bays of the same type have the same structure and configuration. TVPHAG is established according to the above analyses of the primary and secondary equipment of the smart substation. Figures 3 and 4 show the corresponding equipment of some vertices, and the other equipment can be derived by analogy. The vertices of equipment and of consequences in TVPHAG are distinguished by circles and squares in Figure 5. The description of the vertices in Figure 5 is shown in Table 3.
Table 3. The description of vertices in Figure 5.
The self-triggering probability is defined as MTTF and is shown in Table 4 [46,47]. This paper first discusses the influence of . According to the spectral radius of the matrix ( ), the convergence condition of the established model is calculated, that is, < 0.135. The result is shown in Figure 6.
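When every edge carries the same triggering probability w, the state matrix is w times the 0-1 adjacency matrix, so the spectral-radius condition reduces to w < 1/ρ(adjacency). The sketch below computes this bound for a hypothetical four-vertex graph. It illustrates the kind of calculation that yields a threshold such as 0.135, under the assumption of uniform edge weights, and does not reproduce the substation's matrix; `max_uniform_probability` is a name introduced here.

```python
import numpy as np

def max_uniform_probability(adjacency):
    """Upper bound on a uniform edge weight w such that rho(w * adjacency) < 1."""
    rho = float(np.max(np.abs(np.linalg.eigvals(adjacency))))
    # A graph without cycles has rho = 0, so any weight converges.
    return np.inf if rho < 1e-12 else 1.0 / rho

# Hypothetical graph: every vertex points to every other vertex (no self-loops).
adjacency = np.ones((4, 4)) - np.eye(4)
bound = max_uniform_probability(adjacency)
print(round(bound, 4))
```

Denser coupling raises the spectral radius of the adjacency matrix and therefore lowers the admissible triggering probability, which matches the remark that this parameter is related to the degree of connection between equipment.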

Vertices of Consequence
In Equation (17), the function × × 0.5 × 1 represents the case in which bay A and the bus bay fail, where the probability of operating on either bus is equal. In this case, all loads are lost; this can be adjusted in accordance with the actual operation of the substation. The probability of failure increases with the increase in . The simulation is shown in Figure 6. Two cases are simulated in this paper: in one, is a constant value; in the other, is evenly distributed around that constant value. The failure probability of the system increases exponentially and rapidly when > 0.12. Therefore, the situation that > 0.12 should be avoided. In practical engineering, is related to the degree of connection between equipment. In this study, = 0.12 is used.


Static features: To get a clearer picture of the impact of physical conditions, it is assumed that the primary equipment in bays E and D operates at a loading rate of 80%, while the other conditions are at their reference values. According to Equations (9) and (10), the correction coefficient = 2.9437, that is, the MTTF of vertex ~ .~ is = 2.9437 × . According to Tables 1 and 2, the process delays in this case include 1 ms, 3 ms, 10 ms, 20 ms, 25 ms, 100 ms, 500 ms, and 1000 ms. According to these state-switching delays, eight attack graphs are established, each on a single time scale. These graphs are all subgraphs of the TVPHAG established above.
The two situations, loading at the reference value and loading at 80%, are compared for the primary equipment in bay E and bay D in Figure 7. The darker bars represent the reference situation, and the lighter bars represent the situation in which the loading exceeds the reference. With only the primary equipment of bay E and bay D under high load, the failure probability of the bays whose equipment is in a harsh condition increases considerably, and the other bays also see relatively small increases. The cause is that anomalous data obtained by sampling is injected into other bays, which may lead to anomalies there. The increase in bay E is much smaller than that in bay B because bay B has more primary equipment, connected in series. The results in Figure 7 show that the operating conditions of physical equipment have a greater impact on their own bay, while the impact on other bays is relatively limited.
The LOLP under different physical conditions is recorded in Table 5. Considering that the physical conditions of substations do not change drastically, the factors are set to deviate from their reference values by 10% and 20%. The impact of precipitation is relatively small, whereas high temperature leads to a more rapid increase in hazards. LOLP increases proportionally with loading rate and operating years. The result supports the correctness of the model. In the simulation, vertex 75 of TVPHAG is attacked from t = 50 ms to t = 6000 ms. The results for bay A, bay B, bay C, PD, MD, and NCCS are shown in Figure 8. The failure probability increases rapidly within 2 s after the attack arrives. The response of each device is delayed after the attack is applied or removed because the transition between the normal and abnormal states takes time to transfer and process.
The failure probability reaches more than 80% of the increased value within 2 s after the vertex is attacked, while it takes 4 s to drop to 20% of the increased value after the attack is removed. The simulation is in line with the observation that the system is easy to damage but hard to recover. In the following simulations of TVPHAG, continuous physical attacks, cyberattacks, and cyber-physical attacks are applied and analyzed in turn. The main equipment and bays, viz., bay A, bay B, bay C, PD, MD, and NCCS, are observed and analyzed.
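The 80% rise and 20% decay times quoted above can be read off a simulated trajectory with a simple threshold search. The sketch below uses a synthetic first-order response with hypothetical time constants purely to exercise the helper `first_crossing`; the curve is not the simulation data of Figure 8.

```python
import numpy as np

def first_crossing(t, y, level, start, rising=True):
    """First time at or after `start` where y is beyond `level` in the given direction."""
    mask = (t >= start) & ((y >= level) if rising else (y <= level))
    if not mask.any():
        return None
    return float(t[np.argmax(mask)])

# Synthetic response (assumed): attack from 0.05 s to 6 s, fast rise, slower decay.
t = np.round(np.arange(0.0, 12.0, 0.01), 2)
t_on, t_off, base, peak = 0.05, 6.0, 0.02, 0.30
rise = base + (peak - base) * (1 - np.exp(-(t - t_on) / 1.0))
top = base + (peak - base) * (1 - np.exp(-(t_off - t_on) / 1.0))
fall = base + (top - base) * np.exp(-(t - t_off) / 2.5)
y = np.where(t < t_on, base, np.where(t <= t_off, rise, fall))

increase = y.max() - base
rise_time = first_crossing(t, y, base + 0.8 * increase, t_on) - t_on
fall_time = first_crossing(t, y, base + 0.2 * increase, t_off, rising=False) - t_off
print(round(rise_time, 2), round(fall_time, 2))
```

Measuring both times against the same increase (peak minus baseline) makes the damage and recovery phases directly comparable.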
Figure 9 shows the simulated failure probability when attacks arrive at t = 50 ms. Various delays, such as those of sampling, transmission, processing, and equipment actions, are included. Several main indexes are selected as indicators, including the failure probability of the bay where the attack is located, the mean failure probability of the other bays, and LOLP. Critical times are selected to illustrate the trend, namely 50 ms (initial state), 500 ms, 1000 ms (rapid rise), and 6000 ms (stable). The result is shown in Table 6. As for physical attacks, an attack on measuring equipment, such as a C/VT, has a small influence range, fast action, and little hazard. An attack on important primary equipment such as a CB will cause direct failures of its bay and greatly increase LOLP while having little impact on other bays.
Compared with physical attacks, the transmission of cyberattacks in substations is more complicated and causes larger hazards in both range and duration. The failure probability of the secondary equipment increases significantly, and the failure can also spread across bays. At the same time, it may affect the servers in the station and spread to other power stations. Compared with an attack on the bay layer, an attack on the access layer mainly affects the primary and secondary equipment of the bay while threatening the cyber system of the whole station less. Compared with attacks on the station layer, an attack on the bay layer is more harmful to its own bay and less threatening to other bays, so abnormal states often appear on this line.
The cyber-physical attack has the characteristics of both physical attacks and cyberattacks.Due to the complex transmission of the TVPHAG, the abnormality of one device may cause an abnormality in other equipment.Therefore, under cyber-physical attack, the failure probability is mutually coupled and superimposed.The TVPHAG can reveal the relationship between system risks and attacks and its dynamic trends.

Discussion
This paper proposes the TVPHAG, which discretizes the state of the system.The model constructs a one-to-one mapping between the availability of the equipment/bay and nodes of TVPHAG.Based on the internal correlation of the system, the subgraphs of TVPHAG are established by the time scale of equipment transmission.The model solves the problem of asynchrony in the cyber-physical system and the phenomenon of confounding caused by heterogeneity and dynamically evaluates the reliability of the system.
The contribution of this paper lies in the following aspects. On the one hand, a smart substation is a complex network of cyber and physical equipment, with complex data flows and equipment connections, that faces threats of cyber-physical attacks. By expressing uncertainty in probabilities, TVPHAG adapts to this complex network and simulates the cascade propagation process. On the other hand, due to the delay of state transfer, time-varying state equations in topology and cuts in algebra are introduced to TVPHAG. Combining graph theory and algebra, it overcomes the insufficiency caused by analyzing only static networks in current research and helps to analyze the state of complex systems over time. In addition, unlike diagnosis, TVPHAG does not require information collected after faults occur; it can evaluate the reliability of the system by using the basic information of the system's equipment and professional data in the field. On this basis, the conclusions on substation reliability can provide a reference for the planning and design of future substation construction and upgrades.
There are several aspects worth studying in the future. (1) The method treats systems as linear dynamics. Many device processes are linear or can be linearized, while some cannot be handled by linearization. Modeling nonlinearity is a direction for future work. (2) The method handles association relationships based on messages. In actual production, messages have more characteristics due to different objects and types. Establishing more specific characteristics for more types of equipment and messages is one direction for improving the reliability evaluation. (3) The failure transmission probability data needs to be supported by many experiments or big data, and this support is still thin at present.
The value of ( ) depends on the type and intensity of the cyberattack. In particular, ( ) = 1 represents that the attack will definitely cause an abnormal state of the equipment , while ( ) = 0 represents no attack. The matrix ( ) is established by the wiring of the substation, the load of the incoming and outgoing lines, and the operation mode. Finally, the multi-variable discrete-time linear time-invariant model of the smart substation is established in Figure 2 and Equation (16).

Figure 2. Multi-variable discrete-time linear time-invariant model of smart substation.

Figure 3. The message flow of smart substation.

Figure 6. The failure probability of the substation as increases. The function × 0.5 represents the case in which bay C fails, where the load of each feeder bay is equal.

Figure 7. Failure probability under high and low loading rates.

Figure 8. Dynamic change of failure probability after the attack.

Figure 9. Dynamic diagram of failure probability of multi-type attack. (a) CB (bay A) is attacked, (b) transformer (bay A) is attacked, (c) CB-IED (bay A) is attacked, (d) MU (bay A) is attacked, (e) MD (bay A) is attacked, (f) NCCS is attacked, (g) transformer (bay A) and CB-IED (bay A) are attacked, (h) transformer (bay A) and MD (bay A) are attacked, (i) transformer (bay A) and NCCS are attacked.

Table 1. Typical response delay of the primary equipment.

Table 2. Response delay of the primary equipment.

Table 4. Reliability data of equipment.

Table 5. LOLP on different physical conditions.

Table 6. Record table of critical time probability of multi-type attacks (unit: %).