1. Introduction
The most vulnerable part of any system, including railway signalling systems, is the human element: not only the people who use technical equipment, but also those who design and manufacture it. Railway signalling systems are becoming increasingly complicated and complex. The overarching aim at the design stage is to automate work in order to eliminate human errors from the signalling process [1,2,3,4]. As a result, technical problems become a significant issue for safety and risk analysis. The main challenge now is to apply a reasonable and practicable method of risk assessment in order to identify the safety requirements that shall be considered during the development of a signalling system. In a dynamic world where changes occur so fast, such a method should not consume excessive time and resources. The purpose of the article is to present a selected method of risk assessment for railway control and signalling systems, including the current normative and legal bases, such as the directives and regulations governing the interoperability and safety of the railway system. These problems differ because the devices and systems vary significantly. Engineers solve them mainly at the design stage. They are legally and morally responsible for the safety of the future users of the devices they design, and they design and supervise the production processes of these devices so as to maintain the safety level required by law [5]. It is especially difficult to specify the expected safety level, which is linked to the level of social risk acceptance. To specify the safety level, we use the term risk, which is inextricably linked with the randomness of phenomena and events in the world [6,7]. Risk assessment is a key part of the safety management strategy [8]. In the article, the authors focus on the initial stage of risk assessment: from defining the system to specifying safety requirements. The goal of the article is to present the hazard and operability study (HAZOP) and an adapted risk graph method for performing the initial risk assessment and determining the safety integrity level/tolerable hazard rate (SIL/THR) requirement, while also meeting the requirements of railway regulations, especially Regulation 402/2013 [9,10,11]. In the context of this article, SIL and THR are used interchangeably because the method used is qualitative. The authors are aware that the two parameters represent different requirements; however, in the authors' opinion, the distinction has impact and meaning at later steps of the safety analysis of a given system [12]. The authors propose the combination of HAZOP and an adapted risk graph as a practical and comprehensive method to determine the safety targets for the system. Many other qualitative and quantitative methods are available. However, the goal of the method selection was to use well-proven but adapted methods that are easy to apply at this level of system analysis and easy to understand for railway authorities and/or decision makers. The notations and abbreviations can be found in Appendix A, Table A2.
This publication is part of the research carried out as statutory activity and within other research programs.
2. Literature Review
Risk assessment of the railway system is specified in Commission Implementing Regulation (EU) No 402/2013 of 30 April 2013 on the common safety method for risk evaluation and assessment and repealing Regulation (EC) No 352/2009 [9]. If a risk assessment is required by the relevant technical specification for interoperability (TSI), the TSI shall, where necessary, specify which parts of this Regulation apply. In the case of a railway signalling system, Commission Regulation (EU) 2016/919 of 27 May 2016 on the technical specification for interoperability relating to the ‘control-command and signalling’ subsystems of the rail system in the European Union states that CENELEC standards shall apply [10,11]. Currently, no single method for determining SIL in railway signalling is clearly defined. Most standards propose several methods for performing the analysis without indicating which one is the most suitable [10]. The idea of applying the SIL concept is based on the standard PN-EN 61508-1 [12], which also proposes several methods for generic E/E/PE systems, including the risk graph. Most scientific papers related to SIL determination lie outside the railway signalling domain [13,14].
The HAZOP method has its deficiencies [15,16,17]; however, it is commonly used in the railway system to identify hazards and, when carefully applied, still provides very good value for the risk assessment process. In [18], the authors discuss a new HAZOP method, with modified parameters, applied to a Train Control System in the railway environment. The approach taken in this article is to apply selected guide words to specific functions of the system, which is the typical use of HAZOP in the signalling domain. The risk graph method is rarely used for main-line signalling systems, where risk matrices are applied more often [19]; however, as a qualitative method, the risk graph offers very good value: it does not require a large effort and provides a higher level of detail to consider during decision making [20,21].
In the new version of the PN-EN 50126 standard [10] and in the regulation [9], the descriptions of risk assessment are similar and differ mainly in the details of the implementation stage. The standard [10] presents a simplified model of safety activities, named the Hourglass model, which separates the risk analysis process, part of the risk assessment at the system concept stage, from the hazard analysis, part of hazard control at the system implementation stage. The Hourglass model is well illustrated in the standard PN-EN 50126 [10].
In considerations and analyses regarding safe systems, the concept of a risk model is used because the real risk is not known and cannot be specified. Therefore, the risk analysis must be preceded by a risk model, within a chain that takes into account, in turn, the human being, the real risk, the description of the risk (concept mapping), the risk model, and the risk analysis [11,22,23].
The risk model adopted in Commission Implementing Regulation (2013) [9] shows the relations between causes, hazards, accidents and their consequences. In particular, it is assumed that a single cause can lead to several hazards, and a hazard can in turn lead to a number of accidents of different types, depending on the context of the operation process and on environmental parameters. Such an accident can therefore have various consequences. An exemplary risk model is shown in Figure 1. It specifies how a hazard at the level of the considered subsystem or system, resulting from operational or technical factors, can propagate to the railway system level and lead to an accident, taking into account trigger events and the availability of external barriers. The risk graph was applied to this risk model so as to include the evaluation of causes, external barriers, and consequences.
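The one-to-many relations of the risk model (one cause, several hazards; one hazard, several accidents; one accident, several consequences) can be made concrete with a small data structure. The following Python sketch is illustrative only: the class names and the example cause, hazards and accident are hypothetical and are not taken from Figure 1.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

# Hypothetical sketch of the risk model: cause -> hazards -> accidents -> consequences.


@dataclass
class Accident:
    name: str
    consequences: List[str]


@dataclass
class Hazard:
    name: str
    accidents: List[Accident] = field(default_factory=list)
    external_barriers: List[str] = field(default_factory=list)


@dataclass
class Cause:
    name: str
    hazards: List[Hazard] = field(default_factory=list)


def scenarios(cause: Cause) -> Iterator[Tuple[str, str, str, str, Tuple[str, ...]]]:
    """Enumerate every cause -> hazard -> accident -> consequence path,
    together with the external barriers acting on the hazard."""
    for hazard in cause.hazards:
        for accident in hazard.accidents:
            for consequence in accident.consequences:
                yield (cause.name, hazard.name, accident.name, consequence,
                       tuple(hazard.external_barriers))


if __name__ == "__main__":
    # Hypothetical example: one cause leading to two hazards.
    collision = Accident("train runs into road vehicle", ["injuries", "fatalities"])
    not_closed = Hazard("crossing not closed before train arrives", [collision],
                        ["road user vigilance"])
    outdated = Hazard("gatekeeper acts on outdated information", [collision])
    cause = Cause("telegram to gatekeeper not delivered", [not_closed, outdated])
    for path in scenarios(cause):
        print(path)
```

Listing the paths explicitly mirrors how the risk graph is later applied: each path carries its causes, external barriers and consequences into the evaluation.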
One of the key and most difficult stages of risk analysis is hazard identification. It is a continuous activity that should be conducted regularly throughout the whole lifecycle of the system. Here, the hazard identification process at the initial stage of system development is described. First, the parts of the system where an unwanted event can take place are selected, and the causes and consequences of these events are specified. Many methods support hazard identification; FMECA [24] and HAZOP are the most widespread and are described in detail in standards [17,25]. It is also recommended to determine the principles and criteria of hazard identification beforehand. The risk model needs to be supplemented with information about the consequences of a potential accident. For this purpose, inductive and deductive methods, or combinations thereof, are applied, including those described above and methods such as ETA and FTA [26], which are described in detail in standards [27,28]. These activities provide the information that is essential for classifying hazards and estimating risk. To evaluate risk, it is necessary to specify risk acceptance criteria. Each analysis starts with a qualitative approach, followed by a quantitative one; however, it is not always possible to estimate risk mathematically. In [9,10], three risk acceptance principles have been adopted: codes of practice, comparison with a reference system, and explicit risk estimation. Such an approach provides an overview of the whole system, not just the part that can be assessed quantitatively. Proposals regarding explicit risk estimation are presented in Chapters 3 and 4 of this publication. Hazards can be controlled by applying a code of practice or by comparison with a reference system. The code of practice can include principles that are recognized and applied in the railway environment (e.g., standard [5], registers of railway plans). New principles can be applied, but they need to meet the requirements indicated in [9] and be justified. A comparison with a reference system consists of applying safety measures already proven in a system that has been accepted as safe and is in operation. With regard to residual risk, managing the established Safety Related Application Conditions (SRAC) is very important [10].
3. Materials and Methods
Risk assessment means the overall, multi-stage process comprising system definition, hazard identification, risk estimation, and risk evaluation. Risk assessment is linked to hazard management through a hazard record. The system definition should specify, among others, the objective of the system, its functions and elements, as well as its boundary, interfaces and environment. After hazard identification, risk acceptability is determined using the following risk acceptance principles: the application of codes of practice, a comparison with similar systems, and an explicit risk estimation [9]. At the risk estimation stage, it needs to be shown that the risk acceptance principle has been applied correctly. Applying these principles makes it possible to identify the safety measures that will make the risk of the assessed system acceptable. The measures selected to control the risk become the safety requirements that the system needs to meet [4]. The whole process should be documented in the hazard record, i.e., the document in which the identified hazards, their related measures, their origin, and the reference to the organization that has to manage them are recorded and referenced.
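The hazard record described above can be pictured as a list of entries, one per hazard. The following is a minimal Python sketch under that assumption; the field names and the `open_hazards` helper are hypothetical and are not prescribed by [9] or [4].

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class AcceptancePrinciple(Enum):
    """The three risk acceptance principles named in the text [9]."""
    CODES_OF_PRACTICE = "application of codes of practice"
    SIMILAR_SYSTEM = "comparison with similar systems"
    EXPLICIT_RISK = "explicit risk estimation"


@dataclass
class HazardRecordEntry:
    """Hypothetical hazard record row: the hazard, its origin, the
    organization that has to manage it and the related measures."""
    hazard_id: str
    description: str
    origin: str
    managing_organization: str
    principle: AcceptancePrinciple
    measures: List[str] = field(default_factory=list)


def open_hazards(record: List[HazardRecordEntry]) -> List[str]:
    """Return the IDs of hazards for which no safety measure is recorded yet."""
    return [entry.hazard_id for entry in record if not entry.measures]
```

Keeping the acceptance principle on each entry makes it visible, hazard by hazard, which principle was used to show that the risk is acceptable.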
To use explicit risk estimation as the risk acceptance principle, it is necessary to specify an acceptable risk level. An explicit risk can be estimated by specifying the frequency of hazard occurrence and its severity. Frequency and severity can be specified qualitatively or quantitatively (e.g., by matrix methods or ratio methods). For technical systems, a safety objective is set in the form of a THR, taking into account the frequency and severity; the THR in turn determines the SIL. This chapter describes the risk graph method, which was chosen for further consideration.
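The step from THR to SIL can be illustrated with the THR bands given in EN 50129:2003 (Table A.1), where THR is expressed per hour and per function. In the Python sketch below, the function name and the convention of returning 0 when THR is at or above 10⁻⁵ per hour are assumptions of this example; the bands should be checked against the edition of the standard applied in the project.

```python
def sil_from_thr(thr_per_hour: float) -> int:
    """Map a tolerable hazard rate (per hour, per function) to a SIL.

    Bands (EN 50129:2003, Table A.1):
      1e-9 <= THR < 1e-8 -> SIL 4
      1e-8 <= THR < 1e-7 -> SIL 3
      1e-7 <= THR < 1e-6 -> SIL 2
      1e-6 <= THR < 1e-5 -> SIL 1
    Returns 0 (assumed convention: no SIL band applies) when THR >= 1e-5.
    """
    if thr_per_hour <= 0:
        raise ValueError("THR must be positive")
    if thr_per_hour < 1e-9:
        raise ValueError("THR below 1e-9 per hour is outside the table")
    for upper_bound, sil in ((1e-8, 4), (1e-7, 3), (1e-6, 2), (1e-5, 1)):
        if thr_per_hour < upper_bound:
            return sil
    return 0


if __name__ == "__main__":
    for thr in (5e-9, 5e-8, 5e-7, 5e-6, 5e-5):
        print(thr, sil_from_thr(thr))
```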
The risk graph method, in accordance with the recommendations of the standard IEC 61508 [12] as well as the standard PN-EN 50126 [4], makes it possible to estimate risk and determine the required safety integrity level targets or THR using the following risk parameters:
S—potential consequences of the event
E—exposure (time/frequency of exposure to the event)
A—possibility to avoid or limit damages
O—probability of the occurrence of the event
The relationship between the elements of the method and the passage through the subsequent assessment steps is shown in Figure 2.
The method is relatively easy to use and, when specifying the risk level, explicitly takes more parameters into account than risk matrix methods do. However, it needs to be adapted to the specific application.
Other data should be determined using hazard identification methods (such as HAZOP) and consequence analysis (e.g., ETA). The determined hazard rate constitutes the tolerable hazard rate (THR) for a given system. The adaptation of the method is presented in Section 4, as it is part of the HAZOP method and no new approach has been identified.
For each factor, criteria based on quantitative and qualitative values have been adopted. For the purpose of the analysis it was necessary to adapt the initial parameters/criteria of the graph. They have been defined in the following way:
Initial conditions set up for the analysis:
There is a procedure of bidirectional communication between the train dispatcher and the level crossing attendant (currently in analogue mode, by telephone).
There is no dependence between the track/station-side devices and the communication devices used between the train dispatcher and the attendant.
The SWI system cannot be worse than the existing communication system.
A risk analysis using the risk graph method was carried out for the SWI communication system between the train dispatcher and the level crossing attendant. The system shall support bi-directional communication based on telegrams and confirmation of messages. The initial conditions relate to the system under analysis and serve as a reference for the further steps of the analysis; the authors established them as the reference base for analyzing the criteria. The conditions and graduation were analyzed at a workshop together with railway experts, to which people responsible for safety, engineering, maintenance, and operation were invited. At the meeting, the method was explained. The goal of the meeting was to analyze the proposed definitions of the parameters through brainstorming. The parameters were also verified by the railway infrastructure manager.
S—potential consequences of the event
S0—event not affecting safety
S1—event affecting safety (no fatalities)
S2—event with a serious consequence (one fatality)
S3—event with catastrophic consequences (more than one fatality)
The classes of potential consequences were developed to meet the regulation applied in Poland [5] and form a four-step scale of increasing severity, from S0 to S3.
E—exposure (time/frequency of exposure to the event)
Exposure to the event was selected based on two possible options. In the authors' opinion, these two options actually define whether the function operates in demand mode or in continuous mode. This assumption was made for the further analysis.
A—possibility to avoid or limit damages
Damage can be avoided or limited for technical reasons (the system is equipped with existing technical safety measures), human reasons (skills, awareness, knowledge, psychophysical predispositions), and organizational reasons.
These two parameters were designed to draw the analysts' attention to external barriers that reduce the frequency or the consequences of the event. This category takes into account the external barriers of the risk model presented in Figure 1 above. It is important to note that at the bottom of the graph there is no difference in the resulting SIL (SIL = 4) between selecting A1 and A2. The authors assumed here that the goal of the analysis was to determine the SIL for the electronic system. Any additional external measures have to be analyzed together with the railway infrastructure manager and were not considered in the analysis. This assumption is discussed further in Chapter 5.
O—probability of the occurrence of the event
Determined from the history of accidents for the same or similar systems:
O1—the event can happen often (more often than once every 10 years; 1 × 10⁻⁵)
O2—the event can happen sometimes during the lifecycle of the system
O3—the event can happen rarely (more rarely than once every 20 years; 5 × 10⁻⁶).
The probability of occurrence of the event is established from historical data. As mentioned at the start of the analysis, one of the assumptions is that the new system cannot be worse than the one used before. The three-step approach and the ranges were selected at a workshop with railway experts. The main issue in selecting this parameter is to establish the level of contribution of the system under consideration to the event scenario; the authors see this as an area for further research.
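The O thresholds can be cross-checked with a short calculation. Converting a return period into an hourly rate, assuming 8760 operating hours per year, gives about 1.14 × 10⁻⁵ per hour for once every 10 years and about 5.7 × 10⁻⁶ per hour for once every 20 years, close to the rounded values quoted for O1 and O3. Reading those values as per-hour rates, and treating exactly 10 or 20 years as O2, are assumptions of this Python sketch rather than statements of the method.

```python
HOURS_PER_YEAR = 8760  # assumption: 365 days of continuous operation


def rate_per_hour(return_period_years: float) -> float:
    """Convert 'once every N years' into an hourly event rate."""
    if return_period_years <= 0:
        raise ValueError("return period must be positive")
    return 1.0 / (return_period_years * HOURS_PER_YEAR)


def occurrence_class(return_period_years: float) -> str:
    """Classify an event by its return period using the O1/O2/O3 bounds.

    Exactly 10 or 20 years is treated as O2 (assumed boundary handling).
    """
    if return_period_years <= 0:
        raise ValueError("return period must be positive")
    if return_period_years < 10:
        return "O1"  # more often than once every 10 years
    if return_period_years > 20:
        return "O3"  # more rarely than once every 20 years
    return "O2"


if __name__ == "__main__":
    print(f"{rate_per_hour(10):.2e}")  # 1.14e-05
    print(f"{rate_per_hour(20):.2e}")  # 5.71e-06
```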
On the basis of these data, a risk analysis report has been prepared. Results for an exemplary function have been presented in Chapter 4.
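How the four parameters combine when passing through the graph can be sketched as a table lookup: S, E and A select a row, and O selects a column. The calibration tables in this Python sketch are hypothetical placeholders, not the authors' graph from Figure 2 or Figure 5. The names E1/E2 for the two exposure options and the reading of A2 as "harder to avoid" are also assumptions. The placeholder only reproduces one feature stated in the text: on the most severe path (S3, E2, O1), A1 and A2 both lead to SIL4.

```python
from typing import Dict, List, Optional

# HYPOTHETICAL calibration: each parameter value moves the path down some rows.
ROW_OFFSET: Dict[str, Dict[str, int]] = {
    "S": {"S1": 0, "S2": 2, "S3": 4},
    "E": {"E1": 0, "E2": 1},  # assumed names for the two exposure options
    "A": {"A1": 0, "A2": 1},  # assumed: A2 = damage harder to avoid
}

# HYPOTHETICAL SIL per row; columns are O1, O2, O3 (O1 = most frequent).
# 0 means "below SIL1" in this sketch.
SIL_TABLE: List[List[int]] = [
    [1, 0, 0],
    [2, 1, 0],
    [2, 2, 1],
    [3, 2, 1],
    [3, 3, 2],
    [4, 3, 3],
    [4, 4, 3],
]

OCCURRENCE_COLUMNS = ("O1", "O2", "O3")


def risk_graph_sil(s: str, e: str, a: str, o: str) -> Optional[int]:
    """Walk the risk graph: S, E and A select the row, O selects the column.

    Returns None for S0 (event not affecting safety), for which no further
    steps of the analysis are needed.
    """
    if s == "S0":
        return None
    row = ROW_OFFSET["S"][s] + ROW_OFFSET["E"][e] + ROW_OFFSET["A"][a]
    column = OCCURRENCE_COLUMNS.index(o)
    return SIL_TABLE[row][column]


if __name__ == "__main__":
    print(risk_graph_sil("S3", "E2", "A1", "O1"))
    print(risk_graph_sil("S2", "E1", "A2", "O3"))
```

Separating the traversal from the calibration means that adopting the agreed graph would only require replacing the two tables.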
The method was used to calculate the required THR/SIL values for specific system functions and to verify the SIL4 requirement imposed by the railway infrastructure manager on the whole SWI system. The SWI system is to be used by the train dispatcher and the level crossing operator to support the communication between them. Manually operated level crossings are used in areas with heavy road and railway traffic. At present, there are 2415 such level crossings (category A) in operation in Poland, which is 18.9% of all level crossings in the country [29,30]. Figure 3 shows the share of the level crossing categories in Poland [30].
Kat. A—means level crossing category A—Manually operated level crossing (by signalman or gatekeeper)
Kat. B—means level crossing category B—Automatic level crossing equipped with road signals and barriers
Kat. C—means level crossing category C—Automatic level crossing equipped with road signals only
Kat. D—means level crossing category D—Level crossing not equipped with any LX system
Kat. E—means level crossing category E—Level crossing for pedestrians equipped with systems as for cat. A or B
Kat. F—means level crossing category F—Private level crossing equipped as for cat. A or B.
From 2013 to 2018 there were, on average, 12.5 accidents per year [30], which clearly shows the need to improve level crossing operation. The SWI system is intended to improve this at the level of communication. In principle, the system shall support bi-directional communication based on telegrams and confirmation of messages, and it shall be the primary means of communication between the two operators, with the current analogue telephone communication serving as a fallback system.
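The intended communication principle (a telegram, a confirmation, and the telephone as fallback) can be sketched as follows. This is a hypothetical Python illustration, not the SWI protocol: the `Telegram` fields, matching the confirmation by sequence number and the limit of three attempts are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Telegram:
    """Hypothetical telegram exchanged between train dispatcher and gatekeeper."""
    seq: int  # sequence number expected back in the confirmation
    sender: str
    receiver: str
    text: str


def deliver(telegram: Telegram,
            transmit: Callable[[Telegram], Optional[int]],
            max_attempts: int = 3) -> str:
    """Send a telegram and wait for a matching confirmation.

    `transmit` sends the telegram and returns the sequence number carried by
    the confirmation, or None if no confirmation arrived in time. After
    `max_attempts` unconfirmed attempts, the operators fall back to the phone.
    """
    for _ in range(max_attempts):
        confirmed_seq = transmit(telegram)
        if confirmed_seq == telegram.seq:
            return "confirmed"
    return "fallback: analogue telephone"


if __name__ == "__main__":
    t = Telegram(1, "dispatcher", "gatekeeper", "train approaching, close barriers")
    print(deliver(t, lambda tg: tg.seq))  # confirmed
    print(deliver(t, lambda tg: None))    # fallback: analogue telephone
```

A confirmation carrying the wrong sequence number is treated like a missing one, so a stale acknowledgement cannot be mistaken for confirmation of the current telegram.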
The SWI system consists of several components: SWI-BD, a database that serves as the recording system and stores the administration and configuration files; SWI-IF, an interface transferring data to and from external systems; SWI-PI, the human-machine interface unit responsible for exchanging telegrams between the signaller and the gatekeeper; and SWI-SZ, an optional approach detection unit notifying the gatekeeper of a railway vehicle approaching the level crossing [31]. The general decomposition of the system is presented in Figure 4.
4. Results
At the first stage of the analysis, the railway instruction specifying the requirements for SWI was analyzed, and a functional analysis was carried out to identify the functions that the system has to perform. The application of the system on railway line no. 7 Warsaw–Lublin at category A level crossings was taken into consideration. As a result, the system functions were determined; each was assigned an identification number, and the preliminary information flow required to perform it was specified. In total, 52 functions were identified, and they were further broken down into 81 information flows. Table 1 presents examples of the SWI system functions and their decomposition into information flows.
On the basis of the determined functions, 81 system requirements were developed in total. All of them were analyzed using the HAZOP method with the following guide words:
“Loss of function”,
“Excess of function”,
“Inverse of intended function”,
“Function done too early”,
“Function done too late” and
“Other than intended function”.
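Assuming every guide word was applied to every requirement, the analysis starts from one deviation per requirement and guide word; for 81 requirements and six guide words this gives 81 × 6 = 486 candidate deviations. The Python sketch below only builds that empty worksheet. The requirement IDs are hypothetical, and the consequence and hazard fields are still filled in by the team during the sessions.

```python
from itertools import product
from typing import Dict, List, Sequence

# The six guide words listed in the text.
GUIDE_WORDS = (
    "Loss of function",
    "Excess of function",
    "Inverse of intended function",
    "Function done too early",
    "Function done too late",
    "Other than intended function",
)


def hazop_worksheet(requirements: Sequence[str]) -> List[Dict[str, str]]:
    """Pair every requirement with every guide word, giving one deviation row
    whose consequence and hazard fields are completed during the session."""
    return [
        {"requirement": req, "guide_word": word, "consequence": "", "hazard": ""}
        for req, word in product(requirements, GUIDE_WORDS)
    ]


if __name__ == "__main__":
    rows = hazop_worksheet([f"REQ-{i:02d}" for i in range(1, 82)])  # hypothetical IDs
    print(len(rows))  # 486
```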
The HAZOP method is commonly used in the railway signalling domain and enables the identification of critical elements of the system functionality. Table 2 presents an extract from the HAZOP analysis.
As a result of the HAZOP analysis, the consequences of incorrect execution of a given function were determined and assigned to generic hazards identified earlier in a preliminary hazard analysis (PHA). The purpose of the PHA was to derive the generic hazards in the form of preliminary hazard lists; it was based on a brainstorming meeting and an analysis of the railway infrastructure's Hazard Log. The HAZOP was then used to detail the hazards related to the identified high-level functions.
The next step of the analysis was the application of the risk graph. The risk graph method was described in Chapter 3, and an example for the functions mentioned above is shown graphically in Figure 5.
Table 3 presents exemplary results of the risk analysis for several functions.
The analysis provided a very good screening of the requirements. The full scope of the analysis was covered in 6 sessions with a relatively small team. Participants responsible for safety, engineering, maintenance, and operation were invited to the workshop. Five people took part in total: two experienced in risk assessment (five years' experience) and three experienced in signalling (from five to eight years). The results were verified by the safety authority and then by the railway authority at a separate meeting. The necessary effort could easily be accommodated in the project activities, including clarifications with the user. Of the total of 81 functions, 23 were classified at the S3 level (the worst-case scenario), none at S2, and 4 at S1. The other functions were estimated at S0, so no further steps in the analysis were necessary for them. In total, 16 SIL3/4 functions were identified, but the most important point was the possibility to justify 65 functions at a lower safety level, i.e., SIL0, 1, 2, or non-SIL.
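The reported counts can be cross-checked with simple arithmetic. The S0 count is not stated explicitly but follows as the remainder, 81 − 23 − 0 − 4 = 54, and 81 − 16 = 65 functions fall below SIL3/4. The Python helper below is a hypothetical convenience for such consistency checks, not part of the method.

```python
from typing import Dict, Tuple


def summarise(total: int, counted: Dict[str, int], sil34: int) -> Tuple[Dict[str, int], int]:
    """Derive the S0 count as the remainder of the severity tally and the
    number of functions justified below SIL3/4."""
    counts = dict(counted)
    counts["S0"] = total - sum(counted.values())
    if counts["S0"] < 0 or sil34 > total:
        raise ValueError("inconsistent counts")
    return counts, total - sil34


if __name__ == "__main__":
    counts, below = summarise(81, {"S3": 23, "S2": 0, "S1": 4}, sil34=16)
    print(counts["S0"], below)  # 54 65
```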
5. Discussion
The results of the analysis for all system functions meet the expectations of the Railway Authority. An interesting result of adapting the risk graph is that only 65% of the functions with the highest severity (S3) received the highest safety requirement (SIL4). For the remaining 35% of cases, careful analysis showed that two functions have high exposure, and it was possible to limit the resulting requirement only because of the low values of parameters A and O.
The results of the analysis were presented to the railway infrastructure manager, and the next step was to review them jointly. The method normally used by the railway infrastructure manager is based on FMEA and risk priority number (RPN) analysis and is regulated by the Technical and Operational Risk Analysis procedure [32]. The authors therefore compared their approach with the methods used so far. As a result, the value of parameter “O” was reduced in some situations, given that the SWI system can only partly contribute to accidents at level crossings in which a railway vehicle runs into a road vehicle. This further reduced the resulting SIL [13]. Additionally, with reference to the initial stage of the analysis, where the set of technical devices performing specific functions was not clearly defined and was treated as a “black box”, the following principles were adopted: “Worst possible scenario”, “Reasonable estimates”, “Reasonable worst case”. This approach is not considered further in this article and could be investigated in future work. The principle introduced in [33] was also taken into account: it stipulates that the risk related to technical systems with a plausible likelihood of catastrophic consequences as a direct result of a breakdown does not need to be reduced further if the frequency of such breakdowns is equal to or lower than 10⁻⁹ per hour of the system’s operating time. On the basis of the above activities, the required SIL and THR were assigned to every function of the system rather than to the system as a whole, as had been the case in earlier analyses made by the infrastructure manager.
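Assigning requirements per function rather than to the whole system amounts to applying each check function by function. The Python sketch below encodes only the 10⁻⁹ per hour criterion cited from [33]. The function IDs and rates in it are hypothetical, and how a failure rate would be obtained for each function is outside the scope of the sketch.

```python
from typing import Dict, Optional, Tuple

# Criterion cited in the text from [33]: no further risk reduction is needed
# if the breakdown frequency is at most 1e-9 per operating hour.
RATE_LIMIT_PER_HOUR = 1e-9


def further_reduction_required(failure_rate_per_hour: float,
                               catastrophic: bool) -> Optional[bool]:
    """Apply the criterion to a single function.

    Returns None when no catastrophic consequence is plausible (the criterion
    does not apply), False when the rate is at or below the limit, and True
    when it is above the limit.
    """
    if not catastrophic:
        return None
    return failure_rate_per_hour > RATE_LIMIT_PER_HOUR


def per_function_check(functions: Dict[str, Tuple[float, bool]]) -> Dict[str, Optional[bool]]:
    """Evaluate the criterion separately for every (hypothetical) function ID."""
    return {fid: further_reduction_required(rate, catastrophic)
            for fid, (rate, catastrophic) in functions.items()}


if __name__ == "__main__":
    print(per_function_check({
        "F-01": (1e-9, True),
        "F-02": (5e-8, True),
        "F-03": (1e-6, False),
    }))
```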