1. Introduction
Railways continue to be one of the main transport systems nowadays [
1,
2], covering public, private, and military needs over a wide operational area. Thus, railway assets are an attractive target for malicious actors and are exposed to various threats, from natural events to man-made ones, such as terrorism or vandalism (e.g., [
3,
4,
5,
6]).
The associated risks are exacerbated by the fact that railway infrastructure assets are typically placed along the route, including remote areas where physically protecting them is challenging. Moreover, railway premises have a large attack surface (due to their numerous electronic and electrical parts, such as power supply, switches, scheduling, and other subsystems), but often reside far from the main stations. While auditing for physical threats is quite important [
7], the premises are usually inspected remotely through cameras. Sensory equipment is also deployed to monitor environmental parameters. The goal is to detect and deter potential intruders [
8,
9,
10], avoid machinery overheating, and detect fires. Since the interconnection of this monitoring equipment is, at least partly, wireless, it can become a target of several types of attacks.
In this context and considering that a successful attack could damage the railway’s operation or even cause severe injuries and deaths, cybersecurity is an important consideration for such interconnected critical systems [
11,
12]. Attackers can disrupt communications (e.g., through jammers) or even infiltrate the networks and take control of critical equipment [
13]. Cyber-attacks on the command-and-control centers (C&C) and the information systems are also feasible [
13,
14]. Thus, the secure interconnection of all the deployed elements and platforms is important, and the cyber and physical security of the critical infrastructure becomes imperative [
9,
10].
As sketched above, safety is another design factor and one that is closely related to security. Cargo and passengers are transported in high volumes each day, covering long distances. In the past, railway accidents have caused a number of deaths, along with significant financial losses [
15]. While the introduction of electronic controllers reduced the occurrence of such situations [
16], safety risks cannot be ignored, considering the wide railway coverage that still includes aspects such as uninspected car-crossings, system malfunctions (like signal loss), and, of course, the human factor [
17,
18].
Within the ever-changing technological landscape, there is currently a move from automated to intelligent cyber-physical systems (CPS), motivated by the rapid spread of the Internet of Things (IoT) and cloud computing and enabled by wireless networking [
19,
20,
21,
22]. Wireless sensor networks (WSNs) [
23] can cover the wide railway operational territory, gathering and processing pieces of ambient knowledge, while gateways can be used to transmit the data to the controlling center or a cloud service. The railway controlling software at the backend can, then, collect and integrate the spatial information and manage the underlying subsystems [
24,
25]. Therefore, WSNs are an ideal solution for covering the railway operating area, including the railway routes and various scattered shelters.
However, the railway cyber infrastructure and networks currently only adopt rudimentary defenses (e.g., cryptography), which provide protection against the most basic threats, forfeiting effective ways of detecting advanced cyber-attacks [
26]. While initially designed as closed systems, current infrastructure networks are vulnerable to various network layer attacks, like blackhole, badmouthing, and jamming attacks [
27].
Motivated by the above, this work presents SPD-Safe, an administration framework for railway CPS whose name refers to security, privacy, and dependability (SPD). It aims to enhance the security, privacy, dependability, and safety of the intelligent railway infrastructure, while enabling services for monitoring and managing the overall setting. The framework integrates mechanisms for mitigating cyber-attacks that attempt to disrupt communications or compromise infrastructure assets, and it also takes periodic malfunctioning of assets into consideration. SPD-Safe can act as an intelligent communications-based train control (CBTC) system for railway CPS, leveraging artificial intelligence (AI) to manage the system at runtime. The system uses standardized solutions, and its building blocks can be easily retrofitted in current deployments.
In addition to the detailed description of the proposed framework, a preliminary implementation is described and evaluated, concentrating on the management of: (a) in-carriage, and (b) on-route sub-systems. WSNs are deployed inside the carriage and by the railway tracks to safeguard carriages that transfer dangerous freight and to help avoid crashes with objects blocking the train’s route (like stuck vehicles on rail track crossings), respectively. Furthermore, smart cameras are installed to improve the physical security of the critical infrastructure. In the context of the two use cases (a) and (b) above, the railway CPS is configured through SPD-Safe in real time to tackle ongoing cyber-attacks and control safety-related incidents. This hands-on validation was developed and demonstrated under the EU-funded project new embedded Systems arcHItecturE for multi-Layer Dependable solutions (nSHIELD) [
28], with the cooperation of major industrial partners in the railway and defense domains, including Ansaldo STS (
http://www.railway-technology.com/contractors/signal/ansaldo-sts/), Selex ES (now Leonardo S.p.A.:
https://www.leonardocompany.com/en/home), and HAI (
http://www.haicorp.com/en/). Simulation analysis was also conducted during the design phase, utilizing the security-aware Cyber-Physical Systems (CPS) Simulator Framework (COSSIM) [
29], paving the way for the final installation of the proposed system, as presented in the following sections.
The rest of the paper is structured as follows: In
Section 2, related work on railway signaling systems is reviewed. In
Section 3, the middleware platform and intelligent agent technologies that manage the underlying equipment are presented. In
Section 4, the network layer protection mechanisms are detailed. In
Section 5, the implementation details of SPD-Safe are provided and the application in the railway setting is demonstrated. The proposed system is also compared with relevant systems in
Section 6, while
Section 7 features the concluding remarks.
2. Materials and Methods—Related Work
Smart transportation ecosystems involve, among others, passenger services as well as critical infrastructure-related applications and the associated safeguards. The fundamental goals in this context include “green” (i.e., environment-friendly) operation, improved performance and efficacy, as well as enhanced security and safety.
Railways, in particular, rely on signaling systems that direct the trains’ traffic. Infrastructure control and management are achieved via various telecommunication means that are installed on carriages and tracks. Communication between track equipment and trains is achieved via CBTC signaling systems [
30,
31,
32] enabling the railway’s management and infrastructure control. For the European Union (EU), the international wireless communications standard for railways includes the European Train Control System (ETCS) [
33]. The communication baseline is implemented by the Global System for Mobile Communications—Railway (GSM-R) [
34], which is further enhanced with the General Packet Radio Service (GPRS) [
35] and forms the base of an intelligent transportation application. ETCS utilizes trackside equipment that transmits information regarding the route to unified controlling equipment within the train cab. Thus, all lineside data are passed wirelessly to the driver, without requiring the direct observation of lineside visual signals, as was the case in legacy railway settings. The adoption of ETCS results in more and longer running trains, with increased traffic and railway management capabilities.
In addition to the signaling developments, WSNs can now cover a wide railway operational area, gathering ambient data. Embedded systems implement intelligence solutions encompassing the underlying critical assets as well as the interlinked smart city ecosystems. Related frameworks for intelligent monitoring of the critical infrastructure have already been proposed in the literature (e.g., [
36,
37]). The Integrated System for Transport Infrastructure surveillance and Monitoring by Electromagnetic Sensing (ISTIMES) project [
36] implements a transport infrastructure surveillance and monitoring system with electromagnetic sensing. Distributed and local sensory equipment (e.g., optic fiber sensors, infrared thermography, low-frequency geographical techniques, etc.) are utilized to perform non-destructive electromagnetic sensing and monitoring of the critical infrastructure. The Cloud to Infrared Thermography (Cloud2IR) [
37] deploys an infrared and environmental Structural Health Monitoring (SHM) information system. The software architecture enables multi-sensor connection and the interplay with cloud computing services (e.g., data aggregation, system management, etc.). However, the heterogeneity of the deployed equipment and diverse demands of the various applications make the administration of the underlying infrastructure a challenging task.
In parallel, as Service-oriented Architectures (SoAs) increase in popularity, a continuous effort to deploy SoAs within the Industrial IoT (IIoT) domain and the smart railway CPSs can be observed. Several technologies are proposed that support the required functionality, ranging from agent frameworks and middleware platforms, to communication protocols and data representation standards. Such state-of-the-art solutions are presented in the subsections that follow.
2.1. Management Platforms and Reasoning Systems
Agent technologies constitute the typical option for modeling ambient intelligent systems that exchange information with the environment and user [
38,
39]. Intelligent agents inspect the surrounding setting and react to upcoming events at normal operation. Their AI modules process context-aware data, as collected from the surrounding environment by the attached devices.
Regarding the various agent technologies, 24 frameworks were analyzed in Kravari and Bassiliades [
40], including the popular Java Agent DEvelopment framework (JADE), Agent Globe (A-GLOBE), and Jason (the hero’s name from Greek mythology). JADE implements the relevant standards for Semantic Web and the Foundation for Intelligent Physical Agents (FIPA:
http://www.fipa.org/) (e.g., the Agent Communication Language (ACL) [
41]). The platform is easy to learn and user-friendly, while offering portability and compatibility with all Java Virtual Machines (JVMs). The open-source and stable developer versions operate with several programming languages, such as Java, Jess, and Prolog. The agent communication is fast, and the overall framework is efficient and scalable. Moreover, JADE supports strong user authentication and cryptographic solutions—i.e., JADE security (JADE-S)—along with Hypertext Transfer Protocol Secure (HTTPS). The framework is widely-used and is deployed in several fields, including reasoning in multiple domains, general purpose applications, mobile computing, and e-commerce. The study of Kravari and Bassiliades [
40] also concludes that JADE is the most popular framework due to the pure Java design and the co-operation with several web systems. In addition, five respectable organizations (France Telecom, Motorola, Profactor, TILAB, and Whitestein Technologies AG) supervise the framework [
40].
Regarding middleware systems and messaging protocols, a comparative analysis of relevant IoT solutions (Constrained Application Protocol (CoAP), Message Queuing Telemetry Transport (MQTT), and Devices Profile for Web Services (DPWS)) was conducted in Fysarakis et al. [
42]. DPWS [
43], by the Organization for the Advancement of Structured Information Standards (OASIS:
https://www.oasis-open.org/), constitutes the benchmark in terms of ease-of-design. The framework is flexible and robust in terms of service eventing, discovery, and subscription; following initialization, the underlying devices can discover the provided services and communicate in a seamless manner.
Finally, concerning the deductive rule engines that enable the AI reasoning features, National Aeronautics and Space Administration (NASA) examines the capabilities of several related approaches (Jess, Drools, Microsoft Business Rule Engine, Official Production System Java (OPSJ), and Intelligence Logiciel (ILOG)), as described in [
44,
45]. Jess is efficient and excels in many categories. It works with dynamic facts and dynamism in variables and rules and is appropriate for NASA’s mission-critical applications as well as many other research areas [
46,
47].
2.2. Intelligent Railway Systems
To the best of the authors’ knowledge, only a few multi-agent systems (MAS) have been developed and tested on actual railway environments [
48,
49]. In addition, despite strong industrial involvement, not all developed agent capabilities are used in practice, and thus the full potential of agent technologies is not exploited. Three indicative installations on railway settings are: Train Integrity (TrainIntegrity) [
50], Condition Monitoring of a Light Rail Vehicle and Track (CMLRVT) [
51], and Sensor Networks for Railways (SENSORAIL) [
52].
TrainIntegrity utilizes WSNs to check the integrity of cargo trains [
50]. The nodes consist of the RCM 3400 RabbitCore module and sense environmental parameters. The WSN raises an alarm if it infers that an unexpected change has occurred in the train’s composition. CMLRVT is built and tested on a tramway operation in Poland. The system consists of a dispersed sensor network that is installed on the vehicles and railway infrastructure, along with the data acquisition component and a data server that maintains the artifacts of the management and analysis procedures. At first, the system collects data during normal operation, which is stored in the server. Then, the new pieces of knowledge that are sensed by the devices are compared with the nominal values. The detected variations are further analyzed by the system, revealing safety-related incidents (e.g., rail cracks) that are presented to the user through a dedicated application. However, neither of the two systems considers security issues at the network link, nor do they integrate and manage the heterogeneous underlying embedded systems.
SENSORAIL [
52] is an early warning system for the railway monitoring infrastructure. WSNs collect and integrate data to enable the detection of structural failures and security threats. Sensor clusters communicate information towards distant controlling centers through GSM-R/GPRS mobile equipment. The integration of heterogeneous sensors is managed by a component referred to as “scalable software architecture for the integration of heterogeneous sensor systems” (SeNsIM) [
53], while the detection of events is made by a model-based data correlation component, called “novel framework for the detection of attacks to critical infrastructure” (DETECT) [
54]. SENSORAIL specifies the examined threats in the Event Description Language (EDL) [
55] and maintains them within a scenario repository. Upcoming events are stored in a history database and model-checking is performed at runtime. Regarding the middleware and agent platform, SeNsIM does not utilize any semantic technologies and does not support related standards, thus lacking in terms of interoperability and ease of integration with existing setups. Moreover, SENSORAIL does not include any protection mechanisms, solely focusing on the detection of threats.
2.3. Network Layer Protection
Several schemes are suggested in the literature for protecting communication in WSNs, attempting to address pertinent security concerns (e.g., [
56,
57,
58,
59]).
The Reputation-based Framework for Sensor Networks (RFSN) [
56] authenticates underlying nodes with the micro version of the Timed Efficient Stream Loss-tolerant Authentication protocol (μTESLA) [
60], implementing a beta Bayesian formulation for fading and evaluating the reputation of the routing operation and the legitimacy of the reported sensed variables. Ariadne [
57] also utilizes a TESLA variant for authentication, collecting feedback regarding the successful delivery of packets to choose optimum communication paths and avoid malicious behavior. The Cooperative Secure routing protocol based on ARAN (CSRAN) [
58] integrates digital certificates and asymmetric cryptography for authentication. As in the case of RFSN, it uses a Bayesian distribution for fading and, when a node detects malicious activity, it automatically re-routes communication from that point on. The Secure Resilient Reputation-based Routing (SR3) [
59] adopts lightweight cryptography (LWC) and symmetric modules for security and authentication. Fading is accomplished by a First In, First Out (FIFO) finite list. The system combines reputation with a reinforced random walk algorithm, producing enhanced load-balancing at the cost of a high intermediate forwarding node count.
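The two fading styles mentioned above (a beta Bayesian formulation, as in RFSN and CSRAN, and a finite FIFO list, as in SR3) can be illustrated with a minimal Python sketch. This is not the code of any of the cited schemes: the class names, the fading weight, the window size, and the neutral prior of 0.5 are assumptions chosen for illustration only.

```python
from collections import deque


class BetaReputation:
    """Beta-distribution reputation with exponential fading (aging) of old evidence."""

    def __init__(self, fading=0.9):
        self.fading = fading  # assumed weight in (0, 1]; lower values forget faster
        self.good = 0.0       # discounted count of cooperative observations
        self.bad = 0.0        # discounted count of misbehaving observations

    def observe(self, cooperative):
        # Age the accumulated evidence first, then add the new observation.
        self.good = self.fading * self.good + (1.0 if cooperative else 0.0)
        self.bad = self.fading * self.bad + (0.0 if cooperative else 1.0)

    def score(self):
        # Expected value of a Beta(good + 1, bad + 1) distribution.
        return (self.good + 1.0) / (self.good + self.bad + 2.0)


class FifoReputation:
    """Fading through a bounded FIFO list: only the last `size` observations count."""

    def __init__(self, size=8):
        self.window = deque(maxlen=size)  # oldest entries drop out automatically

    def observe(self, cooperative):
        self.window.append(bool(cooperative))

    def score(self):
        if not self.window:
            return 0.5  # assumed neutral prior when nothing has been observed
        return sum(self.window) / len(self.window)
```

In both variants, a node that starts misbehaving loses reputation gradually, and a node that recovers can regain it, since old observations lose weight (beta variant) or leave the window (FIFO variant).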
Despite the plethora of proposed solutions, and while most can tackle basic security attacks and malfunctioning cases, there are still various open avenues for attackers, including flooding attacks in congested periods, topology-related attacks, and jamming [
61,
62].
3. Administration of IoT Deployments
Considering the landscape sketched above, this section presents the proposed SPD-Safe solution, and more specifically the deployed platform and the reasoning process of each SPD-Safe agent. The core reasoning engine has been previously presented by the authors in [
63]. This version enhances the network layer security and is applied in a mobile setting, forming an intelligent transportation system that complies with the real-time requirements of CBTC for railways.
SPD-Safe comprises a framework that integrates variants of the aforementioned primitives (i.e., agents, middleware, rule engine, and network layer protection) across all system layers to implement an efficient, scalable, practical, and easy-to-deploy and maintain solution, with adequate reasoning and management capabilities. From top-to-bottom, the system consists of four layers: (i) An overlay with intelligent agents that control distinct subsystems; (ii) a middleware platform that enables communication between the agents and underlying networks; (iii) the network layer that consists of interconnected IoT devices; and (iv) the node layer that represents the devices themselves. The core technological building blocks will be detailed in the subsections that follow.
3.1. Agent Technologies & Middleware Solutions
SPD-Safe utilizes JADE [
64] as the top-layer multi-agent system. It adopts and implements standardized approaches to agent deployment, such as the ACL [
41] by FIPA. The JADE-S add-on [
65,
66] safeguards communication at the overlay and offers built-in security functionality for confidentiality, integrity, authentication, and authorization.
Then, each agent is ported as a bundle in the middleware platform Open Service Gateway initiative (OSGi) [
67]. Through it, an agent can monitor the underlying subsystem, enhancing real-time management. Network gateways also deploy a controlling bundle in the same platform, defining the offered functionality as a service in the DPWS standard [
43]. The agent and the related network bundles interchange well-structured semantic data, as defined in the OASIS standardized Common Alerting Protocol (CAP) [
68]. The OSGi platform also provides its own built-in security features for the inner-platform communication, limiting bundle functionality to pre-defined capabilities and protecting both the agent and controller bundles.
Here, other than these built-in features that are provided by the deployed platforms at the overlay and middleware layers, SPD-Safe integrates an additional defense mechanism for the network and node layers, namely Secure Route (SecRoute) [
61], a security protocol that protects the wireless ad hoc communication of the underlying embedded devices. This protocol counters several types of threats and attacks at the network link, protects the nodes’ assets and their resource consumption, and acts as an intrusion detection module for the upper layers. When a security incident is recorded, the network gateway bundle will send related CAP messages to the responsible agent, which may take further action. SecRoute is detailed in the next section.
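To make the gateway-to-agent notification more concrete, the following Python sketch builds a minimal CAP 1.2 alert for a security incident. It is only an illustration under stated assumptions: the paper does not specify which CAP fields SPD-Safe populates, so the sender and recipient identifiers, the event text, and the chosen urgency, severity, and certainty values are hypothetical. The element names and namespace follow the OASIS CAP 1.2 schema.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

CAP_NS = "urn:oasis:names:tc:emergency:cap:1.2"


def build_cap_alert(sender, recipient, event, description):
    """Return a CAP 1.2 <alert> as an XML string (all field values are illustrative)."""
    ET.register_namespace("", CAP_NS)
    alert = ET.Element(f"{{{CAP_NS}}}alert")

    def add(parent, tag, text):
        child = ET.SubElement(parent, f"{{{CAP_NS}}}{tag}")
        child.text = text
        return child

    now = datetime.now(timezone.utc).replace(microsecond=0)
    add(alert, "identifier", f"{sender}-{int(now.timestamp())}")
    add(alert, "sender", sender)
    # CAP uses a numeric offset instead of "Z"; UTC is written as -00:00.
    add(alert, "sent", now.strftime("%Y-%m-%dT%H:%M:%S") + "-00:00")
    add(alert, "status", "Actual")
    add(alert, "msgType", "Alert")
    add(alert, "scope", "Private")      # Private scope requires <addresses>
    add(alert, "addresses", recipient)

    info = ET.SubElement(alert, f"{{{CAP_NS}}}info")
    add(info, "category", "Security")
    add(info, "event", event)
    add(info, "urgency", "Immediate")   # assumed values for a live attack
    add(info, "severity", "Severe")
    add(info, "certainty", "Observed")
    add(info, "description", description)
    return ET.tostring(alert, encoding="unicode")
```

A gateway bundle could, for instance, call `build_cap_alert("gw-07", "agent-train-12", "Jamming detected", "...")` when the network layer reports an incident; the receiving agent would then parse the `info` block to decide on further action.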
Metrics that evaluate the various system aspects are now an integral feature of the development cycle. They offer a quantitative indication regarding the compliance with the targeted requirements of the application domain. An evaluation method for the estimation of the security, privacy, and dependability (SPD) properties for configurable embedded systems is presented by the authors in Hatzivasilis et al. [
63]. For every configuration option, the metrics derive a triple vector of <Security, Privacy, Dependability>, whereby the vector’s factors are assigned a value from 0–100, representing no to full protection respectively. SPD-Safe adopts this methodology to enable a metric-driven SPD- and safety-aware administration, where the reasoning procedure triggers runtime system adaptations to reach specific SPD goals [
69].
3.2. Reasoning Capabilities & Conflict Resolution
The artificial intelligence (AI) behavior of each agent is developed in the rule engine Jess [
70,
71]. For knowledge representation and reasoning, the Jess-EC [
72] is used. The latter is an Event Calculus (EC) [
73] implementation in Jess, offering the required semantics modeling. SPD-Safe’s software layers are illustrated in
Figure 1.
Each agent’s AI procedure implements automated temporal, causal, and epistemic reasoning with real-time events, action preconditions, rule priorities, indirect effects, context-sensitive side-effects, as well as the common law of inertia. Moreover, the reasoning capabilities can cope with the requirements of dynamic and partially known or uncertain domains.
However, as agents exchange information, contradictory reasoning results may occur due to the local viewpoint of each entity and the lack of global knowledge. Thus, for resolving conflicts, SPD-Safe introduces the epistemic mechanism of share theories [
63]. The participating entities send the involved theory rules to a mediator agent, along with the recently sensed local events. The mediator combines these elements and performs a reasoning operation that determines the final outcome and the state of the conflicting assets.
Nevertheless, if an agent utilizes protected data that must be maintained locally and not distributed (e.g., confidential information regarding user policies or system settings), it will not be able to contribute to the share theory with its full knowledge. In this case, an alternative relational grading mechanism, called certainty degree [
63], resolves the affair quickly and efficiently. The mechanism utilizes subjective criteria as well as the agents’ roles and hierarchy, marshaling the problem without constructing the related share theory and retaining the system’s coherency. Thus, the certainty degree is applied in affairs where reasoning with locally protected data is involved, otherwise a share theory is constructed.
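The choice between the two conflict-resolution mechanisms can be summarized in a short Python sketch. It is a simplified illustration, not the SPD-Safe implementation (whose reasoning runs in Jess): the `AgentView` fields, the numeric `rank` and `confidence` criteria, and the `mediator_reason` callback are hypothetical stand-ins for the agents' roles, hierarchy, and subjective criteria.

```python
from dataclasses import dataclass, field


@dataclass
class AgentView:
    name: str
    conclusion: str            # what this agent inferred about the contested asset
    rank: int                  # hypothetical hierarchy level (higher = more authority)
    confidence: float          # hypothetical subjective criterion in [0, 1]
    rules: list = field(default_factory=list)
    events: list = field(default_factory=list)
    has_protected_data: bool = False


def resolve_conflict(views, mediator_reason):
    """Select the mechanism described in the text and return (mechanism, outcome)."""
    if any(v.has_protected_data for v in views):
        # Certainty degree: grade the conclusions without sharing any local rules.
        winner = max(views, key=lambda v: (v.rank, v.confidence))
        return "certainty_degree", winner.conclusion
    # Share theory: the mediator reasons over the union of rules and recent events.
    rules = [r for v in views for r in v.rules]
    events = [e for v in views for e in v.events]
    return "share_theory", mediator_reason(rules, events)
```

The key design point is that the protected-data branch never touches `rules` or `events`, so confidential knowledge stays local to the agent that owns it.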
3.3. SPD Measurement
The SPD multi-metric methodology [
69] measures the provided protection level of a system and its various configurations. The system’s perimeter is identified and the data sources, entry, and exit points are recorded. Then, the mechanisms that protect each of these elements are assessed based on the standardized Criteria Evaluation Methodology (CEM) [
69]. This involves the attack potential risk analysis that evaluates the attacker’s motive to misuse specific system elements, their expertise, and the resources that they are willing to devote to an attack. Accordingly, five parameters are examined for the analysis of a potential threat:
- Required time: The time required to perform a specific attack (e.g., in days or weeks);
- Expertise: The technical skills and knowledge that the attacking group can exhibit (such as copy-cat, advanced, or expert);
- Knowledge of the target: Familiarity with the targeted system and its operation (e.g., public, sensitive, or critical information concerning some subsystems, etc.);
- Window of opportunity: The attacker may require appreciable access to the system in order to exploit a vulnerability and avoid detection;
- Resources: The software, hardware, or other equipment that is necessary to perform an attack (such as specialized or common resources).
The method does not investigate every possible attack but educes a good indication of the defense status in accordance with standard ratings. The protection level for each of the three SPD properties is calculated by integrating the risk analysis with the efficacy of the installed defenses against known attacks and/or other limitations (e.g., based on the latest reports from Computer Emergency Response Teams (CERTs) or Common Vulnerabilities and Exposures (CVE) repositories). The result is a value in the range of 0–100, where 0 represents the absence of defense mechanisms and 100 represents full protection. The final outcome is a vector of <Security, Privacy, Dependability>, which represents the total SPD value of the currently composed setting of the system. The SPDs of different system configurations can be estimated either in advance or at runtime. The first option is leveraged by the AI units of SPD-Safe in order to perform proactive and/or automated changes in the state architecture when a safety or security event occurs. The second option provides indications to the human operator in order to take decisions and make manual interventions.
Therefore, the protection status of all mechanisms and their integration in the demonstration examples are pre-calculated based on this method, as described in
Section 4 and
Section 5. Then, automated administration policies are triggered in response to real-time events, as presented in
Section 5.
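As an illustration of how pre-calculated SPD vectors can drive an administration policy, the following Python sketch models the <Security, Privacy, Dependability> triple and selects a configuration that satisfies a given goal. The configuration names, the numeric values, and the tie-breaking rule (prefer the lowest total among the qualifying options) are assumptions made for the example and are not taken from the SPD-Safe evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SPD:
    security: int
    privacy: int
    dependability: int

    def __post_init__(self):
        # Each factor ranges from 0 (no protection) to 100 (full protection).
        for value in (self.security, self.privacy, self.dependability):
            if not 0 <= value <= 100:
                raise ValueError("SPD factors must lie in the range 0-100")

    def total(self):
        return self.security + self.privacy + self.dependability

    def meets(self, goal):
        # A configuration qualifies only if every factor reaches the goal.
        return (self.security >= goal.security
                and self.privacy >= goal.privacy
                and self.dependability >= goal.dependability)


def pick_configuration(precomputed, goal):
    """Return the name of a pre-assessed configuration that meets the goal, or None."""
    candidates = [name for name, spd in precomputed.items() if spd.meets(goal)]
    # Assumed tie-break: avoid over-provisioning by taking the lowest total.
    return min(candidates, key=lambda name: precomputed[name].total(), default=None)
```

For example, with hypothetical entries `{"plain": SPD(20, 10, 40), "encrypted": SPD(70, 60, 65)}` and a goal of `SPD(60, 50, 60)`, the function returns `"encrypted"`; when no configuration qualifies it returns `None`, which a policy could treat as a cue for operator intervention.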
These features enable the implementation of a relatively novel protection strategy, called Moving Target Defenses (MTDs) [
74]. When a system is stable, it is seen as a “sitting duck” by the attacker, who has plenty of time to analyze it, detect potential vulnerabilities, and exploit them. With MTD, a system that is aware of the defense level of its various components, their configurations, and the integration of all of them, can alter the setting automatically or semi-automatically in a periodic fashion. The AI modules always keep the system in a secure state, while the different configuration and architectural sets increase the number of system states that have to be analyzed by the attacker. In addition, the time that a specific setting remains active is determined by the time required for an average hacker to analyze it (i.e., based on the “Required Time” factor of the attack potential risk analysis). Performing attacks thus becomes considerably harder, while the window of opportunity for malicious entities is significantly reduced.
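The periodic rotation can be sketched as a simple scheduler in Python. This is a minimal illustration, assuming that the candidate configurations have already been assessed as acceptable and that the dwell time is set to a fraction (`safety_factor`, a hypothetical parameter) of the "Required Time" estimate; the paper does not prescribe this exact policy.

```python
import random


def mtd_schedule(configs, required_time_h, horizon_h, safety_factor=0.5, seed=None):
    """Yield (start_hour, config) pairs whose dwell time stays below required_time_h."""
    if required_time_h <= 0 or not 0 < safety_factor < 1:
        raise ValueError("required_time_h must be positive and safety_factor in (0, 1)")
    rng = random.Random(seed)
    dwell = required_time_h * safety_factor  # switch before an analysis can complete
    current, t = None, 0.0
    while t < horizon_h:
        # Avoid re-selecting the active configuration when alternatives exist.
        options = [c for c in configs if c != current] or list(configs)
        current = rng.choice(options)
        yield t, current
        t += dwell
```

With a required time of 24 h and the default factor, the setting changes every 12 h, so an attacker who needs a full day to analyze one configuration never faces the same setting long enough to finish.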
3.4. AI Processing & Performance
The reasoning component of Jess implements the Rete algorithm (rete is the Latin word for net, here meaning network) [
75]. This is the most widely used pattern-matching technique for rule-based systems and is optimized for speed. Scalability and performance are affected by three factors: (i) the number of rules (R), (ii) the average number of patterns in the left-hand side of each rule (P), and (iii) the number of facts in the working memory (F). Computational complexity is linear in the working memory size and in the order of O(RPF). For each SPD agent in the railway mission-critical applications that are examined in the following sections, the number of theory rules (R) is very low (around 30 rules per scenario). To reduce the pattern-matching space, unique identifiers are assigned to every modeled entity; occurring events therefore affect only specifically defined parameters, keeping the pattern-matching ratio (P) low, in the order of 1–3. Performance is mostly influenced by the number of facts (F), and the demonstrated cases require 10–20 facts per scenario. The computational overhead for an SPD agent is in the range of a nanosecond, with an additional 50 bytes in memory.
For a central agent that collects information from the whole railway system, around 500 facts and 40 rules are required to model the underlying setting. At boot time, the reasoning engine takes around 1.6 s to start and occupies 87 MB for code and 45 MB in RAM. Then, a reasoning process for a theory and a few hundred facts requires 0.002 s on average, representing the actual delay that affects the applications.
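To give a feel for the O(RPF) estimate, the following Python sketch plugs in the upper ends of the reported ranges. It only computes an order-of-magnitude bound on pattern-matching work, not a measured cost; the value P = 3 for the central agent is an assumption, since the text reports P only for the edge agents.

```python
def rete_match_bound(rules, patterns_per_rule, facts):
    """Order-of-magnitude bound on pattern-match work under the O(R*P*F) estimate."""
    return rules * patterns_per_rule * facts


# Edge SPD agent: about 30 rules, up to 3 patterns per rule, up to 20 facts.
edge_agent = rete_match_bound(rules=30, patterns_per_rule=3, facts=20)

# Central agent: about 40 rules and 500 facts; P = 3 is assumed here.
central_agent = rete_match_bound(rules=40, patterns_per_rule=3, facts=500)

print(edge_agent, central_agent)  # 1800 60000
```

Under these assumptions the central agent's bound is roughly 33 times that of an edge agent, and the growth comes almost entirely from the number of facts, consistent with the observation that F dominates performance.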
3.5. Relevant Methodologies for Secure IoT Modeling
Over time, several solutions have been proposed that try to resolve the open issues of capturing the security posture of an IoT or other system and facilitate its administration [
76,
77,
78,
79,
80]. Eby et al. [
76] integrated the Simple Modeling Language for Embedded Systems (SMoLES) with the Security Model Analysis Language (SMAL) [
76]. SMAL provides security extensions to the composition meta-model of the Domain Specific Modeling Language (DSML) [
77] and can express access control policies for IoT applications. The resulting framework is called SMoLES Security (SMoLES-SEC). However, its reasoning capabilities are bounded due to the constrained expressiveness of the underlying SMAL. Furthermore, SMoLES-SEC cannot deduce which security characteristics hold after the compositions of two components or the final security status of the composed system.
Service Dependency Trees (SDTs) [
78] support the verification of secure service composition in IoT ecosystems. The IoT devices/nodes construct their own SDT. For each provided service, the relevant SDT defines the potential external service nodes that the service depends on. The nodes are also aware of all recursive SDTs for their composed services. Thus, secure service composition is performed by enabling integration only with SDTs where all paths and involved entities are trusted. On the other hand, creating an SDT for a real IoT application is not trivial, while ensuring trustworthiness and consistency in an actual complex and dynamic environment may be challenging.
Albanese et al. [
79] utilize attack surface metrics in order to evaluate the security aspects of a system and materialize MTD strategies. This solution calculates the distance between the security surfaces of the various system states. The goal is to administer responses against ongoing attacks as well as to deduce a system setting that exhibits specific desirable parameters. Techniques for assessing and reducing the cost for the defender are also included.
Savola and Sihvonen [
80] propose an MTD approach based on a multi-metric-driven management framework. The overall solution has been applied in an e-health digital environment for chronic diseases [
80], where three metric types are considered. Risk-driven security assurance and engineering metrics are defined at deployment-time to offer an early assessment on the deployed defense mechanisms and their effectiveness. Continuous security monitoring metrics are determined at operational-time, enabling the security correctness assessment, enhanced systematization, and traceability of the various product requirements and involved metrics. Thereupon, automated adaptive decision-making metrics are assigned at operational-time and accomplish a higher quality security effectiveness understanding in operational security auditing and future versioning of the system. The method supports continuous security monitoring and automated metric-driven security-related actions.
Table 1 presents the outcomes of the qualitative analysis. The modeling expressiveness of SPD-Safe is quite general and can also be utilized in complex and dynamic systems. Moreover, it assesses all three security, privacy, and dependability properties and can evaluate their status both before a composition is performed and after the integration of the system. As with the other relevant approaches, the MTD features are driven by metrics and SPD-Safe provides a concrete implementation of this modern defense type. The overall solution fits with the distributed nature of IoT ecosystems and can resolve conflicts that may arise due to knowledge sharing between the various entities.
5. SPD-Safe Demonstration
5.1. Railway CPS Architecture
This section details the demonstration and evaluation of the whole SPD-Safe framework in the context of protecting and managing a railway CPS. In the proof-of-concept setting, our proposal assesses and manages the system and ambient ecosystem with the goal of safeguarding the train carriages and the railway routes. The hardware platforms incorporate embedded devices that control smart equipment (e.g., cameras and electronic doors), inspect environmental conditions, and exchange information wirelessly. Furthermore, the PBAC framework is applied to control physical access for personnel, determined by access rights that are specified in XACML policies. Every agent manages a smart subsystem, like a train or a station. Backend agents can also run in the cloud in order to gather high-level information, perform big data analysis, and enable interaction with external systems and actuators. These agents run on virtual machines deployed on the research cloud platform GRNET Virtual MAchines (ViMA:
http://vime.grnet.gr/about/info/en/).
Figure 4 illustrates the railway system architecture. The whole setting is administered by a master agent (MA) at the C&C. At the edge, simple and more lightweight agents (SAs) protect the local subsystems (applying access control, lightweight data analysis, incident detection, etc.) and exchange information with the MA (i.e., security/safety events and response strategies). The MA can, optionally, forward data to a cloud SA for storing or in-depth analysis. The cloud SA also presents high-level knowledge to end-users as well as the current SPD status of the railway infrastructure.
For this demonstration, the MA and the C&C services are deployed on a laptop. Both the MA and the cloud SA are installed on machines with a 2.1 GHz Intel Core i7 processor, 8 GB of RAM, and the Ubuntu Linux Operating System (OS). The SAs are deployed on the BeagleBone devices at the edge systems.
As a case study, two deployments are evaluated. In the first indoor setting, which emulates in-carriage or shelter equipment, we test the system under normal operation and the aforementioned attacks on routing. In the second outdoor scenario, which emulates the on-route equipment, we examine the system’s response to safety-related incidents. Both networks run the SecRoute protocol [
61] to enable communication, protect the network layer against cyber-attacks, and act as an intrusion detection and incident response system for the upper layers.
5.2. Indoor Setting—Cyber-Security
The demonstration setting includes a carriage/shelter inspecting application, which is equipped with a surveillance system and WSNs. Those components are sensitive to network layer threats, like jamming and blackhole attacks. The deployed network is depicted in
Figure 5, where these devices are deployed in a shelter [
28]:
At the entrance, the smart camera inspects for physical intrusion;
Two WSNs are deployed in the shelter. WSN1-1 (green color) monitors light and temperature, and WSN1-2 (red color) senses temperature. WSN1-1 and WSN1-2 utilize different hardware to enhance diversity and ensure redundancy for the monitored factors;
A gateway interconnects the rest of the components with the C&C.
WSN1-1 consists of eight Memsic Iris sensor nodes (16 MHz Atmel ATMega 1281 processor, 8 KB RAM, Contiki OS). The devices are battery powered and measure light and temperature. Furthermore, the smart camera is controlled by the node at the carriage’s entrance. WSN1-2 is installed for redundancy and comprises Zolertia Z1 sensor nodes (16 MHz MSP430 processor, 8 KB RAM, Contiki OS) that collect temperature data. The two WSNs are monitored by two corresponding simple agents (SA1-1 and SA1-2, respectively). Every device executes the PEP module of the PBAC framework. The devices also exchange data with the gateway, which runs the access policies for PBAC and communicates with the MA that administrates the whole network.
The devices gather environmental information and send data to the relevant base station (laptop with WiFi connectivity). This component integrates and processes the received information. It also runs an application with which the user accesses and manages the overall testbed.
The different components are evaluated by the corresponding agents, which also estimate the aggregate SPD value of the whole system. The agents inspect their underlying domain and manage it through an SPD-aware reasoning operation. Furthermore, the system is re-configurable at runtime according to the SPD protection and performance goals defined in the activated policy. Affected agents configure their subsystem’s settings to raise the SPD value while attacks are in progress and return to the normal configuration once the attacks are over (to save resources). Regarding the adaptation capabilities integrated within the proof-of-concept, the cryptographic service provides three communication states: plaintext, authenticated, and authenticated encryption. Additionally, the trust scheme supports two trust evaluation states: direct trust only, and a combination of direct and indirect trust.
The system begins with a moderate SPD configuration to conserve resources (i.e., authenticated communication and direct trust). If SPD-Safe observes malicious activity, it informs the system entities to raise their protection level. The relevant response actions are specified in a security policy (applicable to the specific device type), such as applying authenticated and encrypted communication with combined direct and indirect trust information. The SPD value and status of each system component are then altered in response to the launched attacks, so as to achieve a sufficient level of protection. The WSNs comply with the current policies, becoming stricter toward misbehavior and isolating the compromised nodes. The main protection mechanism against cyber-attacks (i.e., blackhole or link-spoofing) is provided by SecRoute, while the smart camera enhances physical protection. In the same way, the system returns to the previous (initial) state when the triggering conditions are over.
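The adaptation cycle described above can be sketched as a two-level switch between configurations. This is a minimal illustration, not the SPD-Safe implementation: the class and constant names are hypothetical, and the real system derives its response from security policies and Event Calculus reasoning rather than from a boolean flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpdConfig:
    """One combination of the cryptographic and trust states named in the text."""
    crypto: str  # "plaintext", "authenticated", or "authenticated_encryption"
    trust: str   # "direct" or "direct_and_indirect"


# Moderate starting configuration that conserves resources.
INITIAL = SpdConfig(crypto="authenticated", trust="direct")
# Stricter configuration applied while malicious activity is observed.
ELEVATED = SpdConfig(crypto="authenticated_encryption", trust="direct_and_indirect")


class SimpleAgent:
    """Hypothetical agent that tracks the active configuration of its subsystem."""

    def __init__(self):
        self.config = INITIAL

    def on_observation(self, attack_ongoing: bool) -> SpdConfig:
        # Raise protection during an attack; fall back once the trigger is over.
        self.config = ELEVATED if attack_ongoing else INITIAL
        return self.config
```

A call such as `SimpleAgent().on_observation(True)` switches to the stricter configuration, and a later call with `False` restores the initial one.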
For WSN1-1, we emulate scenarios where (i) a node is malfunctioning due to low battery, and (ii) a compromised node launches a bad-mouthing attack. In case (i), when a low energy level is observed, the node is protected by excluding it from traffic forwarding operations, and the administrator is notified accordingly. When the issue is fixed, the trust level is restored and the node’s operational status returns to normal. In case (ii), the compromised entity is detected when the attack rate reaches a threshold, and it is blocked from routing operations. For WSN1-2, we launch blackhole and jamming attacks against congested or topologically significant components. The secure routing mechanism successfully identifies and mitigates both attacks.
Table 5 presents the above-mentioned scenario phases in detail. The SPD levels are depicted with: (i) red for values of 0–50, i.e., a situation where the provided protection is low, proper functionality may not be available, and the operator must immediately take the related countermeasures; (ii) yellow for values of 51–70, i.e., moderate protection but still safe operation; and (iii) green for values of 71–100, i.e., high levels of protection.
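The color coding of the SPD levels can be expressed as a simple threshold function. The sketch below assumes integer SPD values on the 0–100 scale stated above; the function name is hypothetical.

```python
def spd_color(spd_value: int) -> str:
    """Map an SPD value (0-100) to the color band used for the scenario phases."""
    if not 0 <= spd_value <= 100:
        raise ValueError("SPD value must lie between 0 and 100")
    if spd_value <= 50:
        return "red"      # low protection, operator action required
    if spd_value <= 70:
        return "yellow"   # moderate protection, still safe operation
    return "green"        # high level of protection
```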
5.3. Outdoor Setting—Safety Scenario
For outdoor on-route defense, a similar WSN with four BeagleBone nodes is installed. The nodes are connected to a mains power supply and control a smart camera as well as weather sensors. In the emulated use-case, the nodes and related SAs are deployed on: (i) the passenger station; (ii) the track; (iii) the carriage departure; and (iv) all bridges and tunnels along the track.
Figure 6 illustrates the on-route WSN2 [28] along with the central MA and the underlying SA2-1–SA2-4.
Through the responsible SA, the networking components (e.g., sensors and cameras) send real-time information to a security control center and the related master agent.
Figure 7 depicts the graphical user interface and the visualization of the information that is collected by the on-route equipment, as developed by Ansaldo STS.
In case of an emergency, the agents manage the system components to advise the personnel and assist the passengers. The demonstrated incident emulates the response strategy for a fire alarm, where decisions concerning both safety and security must be taken. In Appendix A, the code sample in Figure A1 describes the CAP message that indicates the fire alarm.
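For readers unfamiliar with the format, a fire alert in the OASIS Common Alerting Protocol (CAP) 1.2 can be assembled roughly as follows. This sketch is not the message of Figure A1: the identifier, sender, and area description are placeholder values, and the chosen scope, urgency, severity, and certainty are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

CAP_NS = "urn:oasis:names:tc:emergency:cap:1.2"


def build_fire_alert(identifier: str, sender: str, area_desc: str) -> str:
    """Return a minimal CAP 1.2 alert (as an XML string) announcing a fire."""
    ET.register_namespace("", CAP_NS)  # serialize CAP as the default namespace

    def add(parent, tag, text=None):
        child = ET.SubElement(parent, f"{{{CAP_NS}}}{tag}")
        child.text = text
        return child

    alert = ET.Element(f"{{{CAP_NS}}}alert")
    add(alert, "identifier", identifier)
    add(alert, "sender", sender)
    # CAP expects a timestamp with an explicit UTC offset.
    add(alert, "sent", datetime.now(timezone.utc).isoformat(timespec="seconds"))
    add(alert, "status", "Actual")
    add(alert, "msgType", "Alert")
    add(alert, "scope", "Public")       # assumed; a real deployment may restrict it

    info = add(alert, "info")
    add(info, "category", "Fire")
    add(info, "event", "Fire alarm")
    add(info, "urgency", "Immediate")   # assumed value
    add(info, "severity", "Severe")     # assumed value
    add(info, "certainty", "Observed")  # assumed value
    area = add(info, "area")
    add(area, "areaDesc", area_desc)

    return ET.tostring(alert, encoding="unicode")


# Placeholder values for illustration only.
message = build_fire_alert("example-0001", "sa@example.org", "Carriage shelter")
```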
Normally, for the indoor setting, the personnel and passengers are allowed to open doors based on their access rights (as determined by safety and security rules). When fire is detected by the sensors, an alarm is triggered and the associated agent is notified. The agent decides to degrade the security status by unlocking all doors, thereby enabling the unhindered evacuation of the train. Furthermore, via GSM, the agent automatically transmits an SMS to the responsible authorities concerning the incident (including the severity of the situation and the GPS coordinates) and alerts the neighboring entities (e.g., agents on nearby trains). The agents of trains that cross the area are also notified to perform related actions (such as stopping at the nearest station or changing route). Moreover, it is assumed that during normal operation the smart cameras capture frames at a low rate to preserve bandwidth. When the alarm is raised, the setting is reconfigured at runtime, offering a high frame rate and continuous monitoring of the affected area. Once the fire is extinguished and the damaged components are repaired, the system returns to its normal status. The code shown in
Figure A2 summarizes the main processing flow and the emergency response rules that perform the described actions (for more information regarding EC, please refer to Mueller [
73]).
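The response steps listed above can also be outlined procedurally. The sketch below is plain Python rather than Event Calculus and is not the code of Figure A2; the state fields, frame rates, and notification labels are hypothetical and only record which actions would be taken.

```python
from dataclasses import dataclass, field

LOW_FPS = 1    # assumed frame rate during normal operation
HIGH_FPS = 25  # assumed frame rate while the alarm is active


@dataclass
class CarriageState:
    """Hypothetical view of the subsystem managed by one agent."""
    doors_locked: bool = True
    camera_fps: int = LOW_FPS
    notifications: list = field(default_factory=list)


def on_fire_alarm(state: CarriageState, severity: str, gps: tuple) -> CarriageState:
    """Apply the emergency response: favor safety over access restrictions."""
    state.doors_locked = False     # unlock all doors for evacuation
    state.camera_fps = HIGH_FPS    # continuous monitoring of the affected area
    state.notifications.append(("sms_authorities", severity, gps))
    state.notifications.append(("alert_neighboring_agents", severity, gps))
    return state


def on_fire_cleared(state: CarriageState) -> CarriageState:
    """Restore normal operation once the incident is over."""
    state.doors_locked = True
    state.camera_fps = LOW_FPS
    return state
```

Keeping the response and the recovery as separate functions mirrors the text, where the degraded security status is temporary and is reverted once the triggering condition has passed.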
6. Discussion
6.1. Comparison
This subsection compares SPD-Safe with the related works presented in
Section 2.2 (i.e., TrainIntegrity, CMLRVT, and SENSORAIL) in terms of features.
Table 6 summarizes the comparison results.
All the related smart railway systems identified adopt semantic representation and reasoning. The service-oriented approaches conform to the specific application aspects and, therefore, in all relevant systems the agents are uniquely responsible for specific operations. The conflicting patterns are also not examined in most of these designs, limiting their applicability to specific deployments.
Furthermore, the three related systems do not use any management middleware for embedded devices. This approach is quite limiting in the IoT era, where high volumes of heterogeneous equipment have to be deployed and co-function. The systems also neglect the popular agent frameworks which, among others, provide efficient agent-related functionality and implement relevant standards. The reasoning operation is developed with general purpose programming languages, ignoring the advantages offered by the deductive rule-based techniques. Mechanisms for resolving conflicts, when implemented, are based either on epistemic or relational reasoning. More importantly, these related systems do not safeguard security, privacy, and dependability, and do not utilize any built-in protection technologies.
Conversely, SPD-Safe is a solution focusing on the SPD management of IoT and CPS settings. The SPD modeling is based on well-structured metrics that analyze the various configuration options of a multi-layered system. The AI process adjusts the railway CPS and counters attacks at runtime. SPD-Safe integrates state-of-the-art technological building blocks and platforms for the implementation of reasoning, as well as the management of devices and agents. Epistemic and relational reasoning are incorporated for resolving conflicts. Furthermore, the proposed framework adopts standardized technologies, from semantic standards to communication protocols and authorization schemes.
6.2. Future Work
SPD-Safe integrates several technologies in a secure manner. It preserves the SPD properties, enables active defenses and countermeasures, and can facilitate emergency response operations.
Active and offensive types of defenses are being proposed nowadays as the next step to enhance protection and mitigate threats that the mainstream passive mechanisms (e.g., cryptography, network slicing, antivirus software, etc.) cannot tackle. MTD is one such approach: it makes it harder for an attacker to analyze the system and exploit its vulnerabilities. Furthermore, in conjunction with other intrusion detection techniques, it can mitigate or even block some types of ongoing attacks. Nevertheless, more research is needed to establish guidelines for the implementation of effective MTD policies, as well as strategies to mitigate more advanced attacks.
Moreover, safety-related events require the participation of relevant authorities. In modern settings, emergency authorities possess their own equipment, which is utilized during safety incidents. The cooperation of the involved systems becomes vital when it comes to rescuing lives. For the effectiveness of the response services, the systems must authenticate and authorize the various participants and exchange information (e.g., sensors’ data, surveillance video, etc.) in real time. The seamless interoperation will be examined in future extensions of SPD-Safe.
7. Conclusions
This paper introduced SPD-Safe, an administration framework for IoT settings in ambient secure and safety-critical domains, applied to protect a railway CPS. For secure connectivity, an innovative secure routing protocol was integrated in the network layer. The protocol covers all core security properties (confidentiality, integrity, and authentication) and features policy-based authorization. It was found to be energy efficient and could effectively counter a variety of attacks, providing defense against several threats that are not mitigated by existing solutions. For smart monitoring and automatic adaptation, smart agents were deployed at the edge systems and backend infrastructure, and performed the required AI processes. A multi-agent system was developed in the JADE platform and integrated on the OSGi middleware for the management of DPWS-enabled equipment, also utilizing various built-in protection mechanisms. The core reasoning process was implemented in Event Calculus. The SPD validation and metric-driven administration were modeled as a heuristic framework in a security-related theory. The implementation of MTDs was enabled, providing extra protection against attacks that were not mitigated by passive defenses. Furthermore, the system modeled a safety-related theory and implemented the associated AI ambient strategies and plans. The two features were incorporated to administrate the underlying components, considering both the SPD and safety aspects. To validate the proposed approach, SPD-Safe was deployed to administrate WSNs on a complex railway CPS testbed, where the underlying components were successfully configured at runtime and mitigated security-related attacks, while AI reactive plans preserved the safety of personnel and passengers in emergency situations. The average delay on real equipment was around 0.2–0.6 s.
In terms of future work, advances in MTD solutions and integration with emergency response services were considered. MTDs are coming to the foreground nowadays and are expected to play a significant role in future defense strategies as AI becomes an integral part of new generation systems. Safety critical systems, such as the railway ones, must provide an adequate means to collaborate with emergency authorities and support their operations. Facilitating emergency response and a rapid restoration of service must also be considered by modern smart railway installations.