A Novel Intrusion Detection Model Using a Fusion of Network and Device States for Communication-Based Train Control Systems

Abstract: Security is crucial in cyber-physical systems (CPS). As a typical CPS, the communication-based train control (CBTC) system faces increasingly serious cyber-attacks. Intrusion detection systems (IDSs) are vital for protecting the system against cyber-attacks. However, a traditional IDS cannot distinguish between cyber-attacks and system faults. Furthermore, the design of traditional IDSs does not take the operating principles of CBTC systems into consideration, so when deployed they cannot effectively detect cyber-attacks against CBTC systems. In this paper, we propose a novel intrusion detection method that considers both the status of the networks and the states of the equipment to identify whether an abnormality is caused by cyber-attacks or by system faults. The proposed method is verified on a hardware-in-the-loop simulation platform of CBTC systems. Simulation results indicate that the proposed method achieves a true positive rate of 97.64%, which can significantly improve the security protection level of CBTC systems.

anomaly detection. Parametric techniques estimate model parameters from the given data [18], but such systems may generate incorrect results when the monitored system is non-stationary. To overcome this problem, non-parametric techniques are used [19]; they can provide accurate notification of abnormal activities and detect DoS attacks without delay [20]. However, their detection rate is low when the anomalous traffic intensity is lower than 5% of the background traffic. Manikopoulos introduces a multi-window statistical method that combines statistical modeling with neural network classification to achieve a high detection rate along with a low misclassification rate [21].
Machine learning algorithms, such as decision trees, neural networks, support vector machines, and clustering, are widely used in IDSs [11]. Among them, the decision tree is easy to comprehend and requires little data preprocessing. Sindhu uses it to construct a lightweight IDS that can discover specific attacks with a true positive rate of 98.4% [22]. As the data set becomes larger, the decision tree grows deeper and broader, and extracting rules becomes much more challenging; random forest is therefore used instead to process vast amounts of data [23]. In addition, the traditional decision tree may have a low detection rate on highly imbalanced data, so Jahromi combines a deep unsupervised learning approach with the decision tree for effective detection [24].
Although extensive research has been carried out on intrusion detection for CPS, few studies address CBTC systems specifically. Melaragno proposes a signature-based rail radio intrusion detection system (RRIDS) to detect command replay, guessing, and message corruption attacks [25]. RRIDS detects intrusions by modeling each type of attack, so it relies on fixed signatures and requires frequent database updates. As CBTC systems run continuously and are widely distributed in space, frequent updates may be unsuitable for them. Zhang studies data tampering attacks on trains and proposes an intrusion detection method based on the running status of the train, using a Kalman filter and a χ² detector [26]. However, the method can only detect data tampering attacks and may not be effective against other anomalies. Gao proposes an improved AdaBoost multi-classification IDS based on the n-gram model [27]. Experiments show that this IDS can effectively detect attacks on the train-ground communication subsystem.
Analysis of the related works shows that existing methods mostly focus on a specific attack or subsystem and cannot provide security protection for an entire CBTC system. Therefore, an intrusion detection mechanism for CBTC should combine different techniques to achieve good performance.

CBTC Systems and Cyber-Attacks
In this section, an overview of CBTC systems is presented first, followed by the impacts of cyber-attacks on CBTC systems.

An Introduction to CBTC Systems
As shown in Figure 1, a CBTC system consists of cyber networks and physical processes. A typical CBTC system comprises wayside equipment, on-board equipment, and data communication systems (DCS). The wayside equipment, including automatic train supervision (ATS), the zone controller (ZC), computer interlocking (CI), and the data storage unit (DSU), is connected through the wired backbone network. Through the wireless network, the wayside equipment communicates with the on-board equipment, called the vehicle on-board controller (VOBC), which includes automatic train protection (ATP), automatic train operation (ATO), and the mobile station (MS). Due to the high reliability and safety requirements of CBTC systems, redundant and fault-tolerant equipment is adopted [28]. Meanwhile, redundant networks and dedicated safety communication protocols are deployed for data transmission.
In a CBTC system, the position and speed of the foregoing train are transmitted to ZC through the wireless network. After receiving this information from the foregoing train and the safe route information from CI, ZC generates the limit of movement authority (LMA), a location on the line that the train must not pass, and forwards it to the following train. The VOBC keeps the train running below the protective curve, which is calculated from the LMA and the status of the train.

Impacts of Cyber-Attacks on CBTC Systems
The traditional rail system is a track-based train control (TBTC) system, which uses track circuits to transmit information [29]. As TBTC systems are physically isolated from external networks, cybersecurity issues were not considered in their design. With the increasing passenger volume of urban rail transit, CBTC systems have been widely deployed all over the world. Commercial off-the-shelf (COTS) products, including general-purpose computers, commercial operating systems, and standard communication protocols, are extensively used in CBTC systems. COTS products improve the automation level of the system, shorten the headways between trains, and enhance the capacity of urban rail transit; however, they also introduce the risk of cyber-attacks. In this paper, we consider cyber-attacks that have serious impacts on CBTC, including denial of service (DoS) and data integrity attacks (DIA) [30].
When a CBTC system operates normally, the ATP of the following train calculates the protective position/speed curve based on the received LMA. As shown in Figure 2, the LMA of the following train at time $t$ is $L_m(t)$, the safe position of the tail of the foregoing train. The ATO of the following train calculates a service braking curve under the ATP curve and controls the train to run under the service braking curve. In this paper, we consider cyber-attacks that interfere with the operation of trains. To achieve this goal, the attacks impair the availability or the integrity of the LMAs, directly or indirectly. A DoS attack reduces the availability of a train's LMA. When a train cannot receive an LMA, it uses the latest received LMA to calculate the ATP and ATO curves. If the train is continuously unable to receive LMAs and the interruption time exceeds a specified threshold, it applies the emergency brake to ensure safety. As depicted in Figure 2, if the following train does not receive $L_m(t)$, it uses the latest received LMA, $L_m(t-1)$, to generate the ATP curve $C_2$. As $C_2$ is closer to the following train than $C_1$, the train may slow down unnecessarily, and the efficiency of train operation decreases.
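The on-board fallback behavior under a DoS attack can be illustrated with a minimal Python sketch. It assumes a constant braking deceleration, so the ATP speed ceiling at distance d before the LMA is taken as sqrt(2·b·d); the deceleration `decel_mps2`, the interruption threshold `timeout_s`, and the function names are hypothetical illustrations, not values or interfaces from the paper.

```python
import math


def atp_speed_limit(position_m, lma_m, decel_mps2=1.0):
    """Highest speed (m/s) that still allows stopping at the LMA.

    Assumption: constant braking deceleration decel_mps2 (hypothetical value).
    """
    distance = max(lma_m - position_m, 0.0)
    return math.sqrt(2.0 * decel_mps2 * distance)


def onboard_step(position_m, received_lma, last_lma, silence_s, timeout_s=2.0):
    """Return (lma_used, speed_limit, emergency_brake).

    received_lma is None when no LMA arrived in this cycle (e.g. under DoS).
    The train then reuses the latest received LMA and applies the emergency
    brake once the interruption exceeds timeout_s (hypothetical threshold).
    """
    if received_lma is not None:
        lma = received_lma
    else:
        lma = last_lma
        if silence_s > timeout_s:
            return lma, 0.0, True
    return lma, atp_speed_limit(position_m, lma), False
```

With a fresh LMA at 500 m the ceiling at the origin is higher than with a stale LMA at 400 m, which mirrors the unnecessary slowdown from curve $C_1$ to $C_2$; once the silence exceeds the threshold, the sketch reports an emergency brake.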
If an attacker has prior knowledge of CBTC systems, they may launch a DIA that tampers with the LMA of a train, directly or indirectly, to cause more damage. There are three possible consequences. If the tampered LMA violates the communication protocol or is logically unreasonable, it is detected and discarded by the train; in this case, the DIA has the same impact on the train's operation as a DoS attack. If the tampered LMA is behind the real LMA and is used to calculate the ATP and ATO curves, the DIA may impair the efficiency of the train's operation. If the tampered LMA is in front of the real LMA and passes the train's inspection, the DIA may lead to an accident, because the train may crash into barriers after crossing the real LMA. As shown in Figure 2, if $L_m(t)$ is tampered into $L'_m(t)$, which lies in front of the foregoing train, the following train runs under the ATP curve $C_3$ and may collide with the foregoing train.
This paper presents an IDS to detect CBTC-specific attacks, including DoS and DIA. The main technical challenges and proposed solutions, such as adding a detection model based on device states, are summarized in Table 1.
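The three DIA outcomes described above can be summarized as a small decision function. This is an illustrative sketch only: positions are measured along the direction of travel (larger means further ahead), and the boolean `passes_inspection` is a hypothetical stand-in for the on-board protocol and logic checks, whose details are not given here.

```python
def dia_consequence(tampered_lma, real_lma, passes_inspection):
    """Classify the effect of a tampered LMA on the following train.

    Positions grow in the direction of travel; passes_inspection is a
    hypothetical flag for the on-board protocol/logic checks.
    """
    if not passes_inspection:
        # Discarded by the train: the impact equals that of a DoS attack.
        return "dos-like"
    if tampered_lma < real_lma:
        # Behind the real LMA: the train brakes earlier than necessary.
        return "efficiency loss"
    if tampered_lma > real_lma:
        # In front of the real LMA: the train may overrun it.
        return "safety hazard"
    return "no effect"
```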

The Intrusion Detection Model Using a Fusion of Network and Device States
In this section, the framework of the proposed IDS is presented first. Then the detection models based on the network status and on the device states are described in detail. Lastly, an HMM classifier is adopted to distinguish between random faults and cyber-attacks.
As depicted in Figure 3, the intrusion detection process of the IDS is divided into two phases: the anomaly detection phase and the result classification phase. The anomaly detection phase includes two models, the network detection model and the device detection model. The network detection model analyzes the data throughput and the content of packets to detect network anomalies. The device detection model detects device abnormalities based on the tasks and resource usage of hosts in CBTC systems. In the classification phase, the network and device anomalies are fused through the HMM classifier to distinguish between random faults and cyber-attacks.

Figure 3. A novel intrusion detection system (IDS) for CBTC systems (anomaly detection: packet analysis in the network detection model, task and resource analysis in the device states detection model; result classification).

The Network Detection Model
As shown in Figure 4, the network detection model includes a throughput analysis module and a packet analysis module. Before being used to detect anomalies, the model is trained on historical packets to capture the normal behavior of a CBTC system in terms of throughput and transmitted packets. A DoS attack may hinder the normal operation of an IDS through very high data throughput, as a flood of packets consumes excessive IDS resources. To avoid this situation, a throughput threshold is predefined. The throughput analysis module delivers packets to the packet analysis module only if the throughput is below the threshold; otherwise, it outputs detection results directly.

Due to the periodic communication between different equipment, the data throughput in a CBTC system is stable. A successful DoS attack on a CBTC system leads to abrupt changes in throughput, so a sudden change in the statistical parameters can be observed. Detecting such changes is therefore equivalent to a change point detection problem [31,32]. The exponentially weighted moving average (EWMA) control chart, a sequential analysis technique, is typically used for change point detection. Because it combines current and historical data, EWMA detects small shifts more easily and quickly than other control charts [33,34]. In this paper, EWMA is used to identify sudden changes in the data throughput of a CBTC system.
Taking the data throughput between VOBC and ZC as an example, the throughput predicted by EWMA is calculated as [35]

$$z(i) = \lambda\, x(i) + (1-\lambda)\, z(i-1),$$

where $x(i)$ is the throughput at time $i$ and $\lambda$ ($0 < \lambda \le 1$) is the smoothing factor indicating the sensitivity of $z(i)$ to the observed $x(i)$. The mean and variance of $z(i)$ can be expressed as

$$\mu_z = \mu_x, \qquad \sigma_z^2 = \frac{\lambda}{2-\lambda}\,\sigma_x^2,$$

where $\mu_x$ and $\sigma_x^2$ are the mean and variance of $x(i)$, which can be estimated from historical data in the training phase, and $\mu_z$ and $\sigma_z^2$ are the mean and variance of $z(i)$, respectively. A change point is detected if

$$z(i) > \mu_z + L\sigma_z \quad \text{or} \quad z(i) < \mu_z - L\sigma_z,$$

where $L$ is a coefficient that affects the detection results. However, because communication in CBTC systems is periodic, the observed throughput data have strong autocorrelation, and the traditional EWMA is not suitable for highly autocorrelated data [36]. To solve this problem, the error between $x(i)$ and $z(i-1)$ is defined and used to detect change points:

$$e(i) = x(i) - z(i-1).$$

The variance of $e(i)$ is then estimated using a coefficient $\alpha$ that controls how sensitive the upper and lower limits of the interval are to $e(i)$; its value is determined in the Parameter Settings subsection. Accordingly, a lower limit $D$ and an upper limit $U$ of $e(i)$ are derived for change point detection. In the EWMA control chart, the value of $L$ strongly influences detection performance. A larger $L$ raises the threshold of the control chart, which may cause more attacks to be missed, whereas a small $L$ may cause more false alarms. In the detection phase, an anomaly is reported if $e(i)$ falls outside the interval $[D, U]$.
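The residual-based EWMA detector can be sketched in Python as follows. The prediction $z(i)$ and residual $e(i)$ follow the definitions above, and the defaults $\lambda = 0.01$, $\alpha = 0.001$, $L = 1.96$ are the values chosen in the Parameter Settings subsection. The recursive variance update $\sigma_e^2(i) = \alpha e^2(i) + (1-\alpha)\sigma_e^2(i-1)$ and the limits $D = -L\sigma_e$, $U = +L\sigma_e$ are assumptions for illustration, since the exact forms are not reproduced here; `z0` and `var0` stand in for statistics estimated during training.

```python
import math


def ewma_residual_detector(xs, lam=0.01, alpha=0.001, L=1.96, z0=None, var0=None):
    """Flag change points in a throughput series xs.

    z(i) = lam * x(i) + (1 - lam) * z(i-1)   EWMA prediction
    e(i) = x(i) - z(i-1)                    one-step residual
    Assumption: sigma_e^2 is smoothed recursively with alpha, and e(i) is
    flagged when it leaves [D, U] = [-L * sigma_e, +L * sigma_e].
    z0 / var0 should come from training data; the fallbacks are placeholders.
    """
    z = xs[0] if z0 is None else z0
    var_e = 1.0 if var0 is None else var0
    flags = []
    for x in xs:
        e = x - z                                # residual against previous prediction
        limit = L * math.sqrt(var_e)             # limit from the variance before e(i)
        flags.append(abs(e) > limit)             # e(i) outside [D, U]
        var_e = alpha * e * e + (1 - alpha) * var_e
        z = lam * x + (1 - lam) * z              # update the EWMA prediction
    return flags
```

The limit is computed from the variance estimate before the current residual is folded in, so a single large residual cannot widen its own acceptance interval.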
As CBTC networks carry far fewer packet types than a general network [37], a decision tree is adopted in the proposed IDS. The decision tree is one of the most popular and useful machine learning algorithms and is mainly used for classification. It uses a tree-like structure in which each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node corresponds to a class label. The merits of the decision tree include high classification accuracy and simple implementation. The best-known method for building a decision tree automatically is the ID3 algorithm [38], which uses the information gain $I$ to choose the attribute that splits the data at each internal node.
Let $D$ be the training set and $D^v$ the subset of samples in $D$ whose attribute $a$ takes its $v$th value. The entropies of $D$ and $D^v$ are calculated as

$$\mathrm{Ent}(D) = -\sum_{k} \frac{|D_k|}{|D|}\log_2\frac{|D_k|}{|D|}, \qquad \mathrm{Ent}(D^v) = -\sum_{k} \frac{|D^v_k|}{|D^v|}\log_2\frac{|D^v_k|}{|D^v|},$$

where $D_k$ and $D^v_k$ are the subsets of $D$ and $D^v$, respectively, whose samples all belong to the $k$th category, and $|\cdot|$ denotes the number of samples in a set. The information gain of using attribute $a$ to classify $D$ is defined as

$$I(D, a) = \mathrm{Ent}(D) - \sum_{v} \frac{|D^v|}{|D|}\,\mathrm{Ent}(D^v).$$

The attribute with the highest information gain is chosen as the root node. $I$ is then computed on the remaining attributes to select each branch node, until all the remaining samples belong to the same class. The root node, branch nodes, and leaf nodes make up the decision tree.
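The ID3 attribute-selection step can be sketched directly from the entropy and information-gain formulas. In this minimal Python example each sample is a dict of already-discretized attribute values; the attribute names `P` and `Len` follow the paper, while the bucket labels and sample data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Ent(D) = -sum_k p_k * log2(p_k), with p_k = |D_k| / |D|."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """I(D, a) = Ent(D) - sum_v |D^v| / |D| * Ent(D^v)."""
    n = len(labels)
    by_value = {}
    for row, y in zip(rows, labels):
        by_value.setdefault(row[attr], []).append(y)   # build the subsets D^v
    remainder = sum(len(ys) / n * entropy(ys) for ys in by_value.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """ID3 splits on the attribute with the highest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))
```

For example, with `rows = [{"P": "UDP", "Len": "normal"}, {"P": "UDP", "Len": "normal"}, {"P": "UDP", "Len": "abnormal"}, {"P": "ICMP", "Len": "abnormal"}]` and labels `["normal", "normal", "attack", "attack"]`, `Len` separates the classes perfectly and is chosen as the root.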
In the proposed IDS, the following attributes are chosen for detecting anomalies in CBTC systems:

$$A_N = \{\mathit{sMAC}, \mathit{sIP}, \mathit{sPort}, \mathit{dMAC}, \mathit{dIP}, \mathit{dPort}, \mathit{Len}, P, M\},$$

where the first six attributes are the MAC address, IP address, and port of the source and of the destination, respectively. $\mathit{Len}$ is the length of a packet, $P$ is the protocol type (TCP, UDP, or ICMP), and $M$ is the position of the train. As a CBTC system adopts specified protocols, packet lengths vary within a predictable range: analysis of a typical CBTC system shows that normal values of $\mathit{Len}$ lie between 0 and 400 or between 800 and 900.
A malicious attacker may try to threaten the safety of a train by tampering with its LMA to cause a high-profile incident. The most likely DIA method is to add an offset to the real LMA. To detect anomalies caused by DIA, attribute $M$ is included in the decision tree to check whether the position change of the foregoing train conforms to the kinematic equation.
The position change of the foregoing train equals the difference between two consecutive LMAs, which should be less than

$$S_{\max} = v_t T,$$

where $S_{\max}$ is the maximum position change of the foregoing train during one communication period $T$ and $v_t$ is the train speed. If the difference between two consecutive LMAs is larger than $S_{\max}$, attribute $M$ is marked as abnormal.
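The two protocol-specific attributes, $\mathit{Len}$ and $M$, can be turned into simple boolean features before they enter the decision tree. This Python sketch uses the normal length ranges reported above and assumes $S_{\max} = v_t T$ (constant speed over one period) with positions in meters, speed in m/s, and $T$ in seconds; the function names are hypothetical.

```python
def len_is_normal(length):
    """Len attribute: normal packets are 0-400 or 800-900 long (observed ranges)."""
    return 0 <= length <= 400 or 800 <= length <= 900


def position_is_normal(prev_lma, curr_lma, v_t, T):
    """Attribute M: consecutive LMAs may differ by at most S_max.

    Assumption: S_max = v_t * T, i.e. the foregoing train moves at speed v_t
    during one communication period T. The magnitude of the change is checked,
    so jumps in either direction are flagged.
    """
    s_max = v_t * T
    return abs(curr_lma - prev_lma) <= s_max
```

With $v_t$ = 20 m/s and $T$ = 0.5 s, an LMA that advances by 20 m in one period exceeds $S_{\max}$ = 10 m and would mark $M$ as abnormal, which is the signature of an offset added by a DIA.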

The Detection Model Based on Device States
To satisfy the high requirements on reliability and safety, different redundant architectures are adopted in a CBTC system [39], such as hot-standby, two-out-of-three, and double two-out-of-two. Taking the hot-standby structure adopted in ATS (Figure 5) as an example, if processing unit A is not working properly, the switching unit automatically switches to processing unit B. The redundant unit maintains normal operation of the system even when the other unit is out of order due to random faults or cyber-attacks. Thus, the redundant architectures that CBTC adopts for safety can also protect against security risks to a certain extent. This creates two problems for detection. On the one hand, while the redundant unit masks a compromised unit, the cyber-attack causes no abnormal communication behavior, so the network detection model cannot detect it yet. On the other hand, random faults and cyber-attacks may cause the same abnormal communication behavior, so the network detection model alone cannot identify the causes of anomalies. To solve these problems, the states of devices are analyzed to distinguish anomalies caused by random faults from those introduced by cyber-attacks.

As the trains of a CBTC system run at fixed headways, the tasks and load of the key subsystems are stable, and the hosts of the subsystems have stable resource utilization, including CPU, memory, disk, and network. Association rule mining is a rule-based machine learning method for discovering potential relations among variables in large databases [40,41]. In the device detection model, association rule mining is used to check the resource usage of hosts. The set of all items is defined as

$$I = \{S_t, S_{ip}, R_c, R_m, R_d, R_n\},$$

where $S_t$ is the task running on the host, $S_{ip}$ is the IP address of the host, and $R_c$, $R_m$, $R_d$, and $R_n$ are the CPU, memory, disk, and network usage of the host, respectively.
Based on the item set, the states of the monitored hosts are gathered to form the CBTC database, which is a collection of transactions. Each transaction is a non-empty subset of $I$, such as

$$\{\text{ZC1},\ 192.168.1.2,\ 11\%,\ 2\%,\ 0\%,\ 0.1\%\},$$

where ZC1 is a task running on the host with IP address 192.168.1.2, whose CPU, memory, disk, and network occupancy are 11%, 2%, 0%, and 0.1%, respectively. An association rule is defined as

$$X \Rightarrow Y \quad (s, c),$$

where $X$ is the antecedent and $Y$ is the consequent; the support $s$ is the percentage of transactions that contain both $X$ and $Y$, and the confidence $c$ is the ratio of the number of transactions containing both $X$ and $Y$ to the number of transactions containing $X$. The Apriori algorithm is a classical association rule mining algorithm that works in two steps: it first finds frequent itemsets and then generates association rules from them. However, Apriori cannot handle continuous attributes such as CPU usage, so these attributes are discretized. For example, the value range of the CPU usage $R_c$ is divided equally into ten intervals, and $R_c$ is discretized as

$$\tilde{R}_c = n \quad \text{if} \quad R_n^{l} \le R_c < R_n^{u},$$

where $\tilde{R}_c$ is the discretized $R_c$, and $R_n^{l}$ and $R_n^{u}$ are the lower and upper limits of the $n$th interval, respectively.
In addition, Apriori may suffer from a heavy computational load when mining association rules, and this load is closely related to the size of the database [42]. Since different subsystems implement different functions, each has its own unique rules. Thus, each subsystem can maintain its own database and mine association rules individually. Furthermore, as the processes running on the monitored hosts are fixed in a CBTC system, transactions containing illegal processes can be directly classified as anomalies. With $s$ and $c$ set to 60% and 70%, respectively, some of the association rules obtained through the Apriori algorithm are listed in Table 2.
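The discretization and the support/confidence definitions can be checked with a short Python sketch. Usage values are assumed to be percentages in [0, 100], and items are encoded as hypothetical `"name=value"` strings so that a transaction is a plain set; the task name `CI1` in the example is illustrative only.

```python
def discretize(value, lo=0.0, hi=100.0, bins=10):
    """Return the index n of the equal-width interval [R_n^l, R_n^u) holding value.

    The top edge (value == hi) is assigned to the last interval.
    """
    width = (hi - lo) / bins
    n = int((value - lo) // width)
    return min(max(n, 0), bins - 1)


def support_confidence(transactions, X, Y):
    """s = share of transactions containing X and Y; c = count(X and Y) / count(X)."""
    X, Y = set(X), set(Y)
    n_x = sum(1 for t in transactions if X <= t)
    n_xy = sum(1 for t in transactions if (X | Y) <= t)
    s = n_xy / len(transactions)
    c = n_xy / n_x if n_x else 0.0
    return s, c
```

For instance, the 11% CPU occupancy of task ZC1 falls into interval 1 and becomes the item `"cpu=1"`; a rule such as `{"task=ZC1"} => {"cpu=1"}` is then scored against the host's transactions.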
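A compact Apriori implementation, together with the illegal-process check, is sketched below in Python. It is a textbook version (level-wise candidate generation with subset pruning), not necessarily the exact variant used in the paper; items use a hypothetical `"name=value"` string encoding, and `allowed_tasks` stands for the fixed process list of a host.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    n = len(transactions)
    level = {frozenset([item]) for t in transactions for item in t}
    frequent, k = {}, 1
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        current = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(current)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets and prune any candidate
        # that has an infrequent (k-1)-subset (the Apriori property).
        level = set()
        for a, b in combinations(list(current), 2):
            cand = a | b
            if len(cand) == k and all(frozenset(s) in current
                                      for s in combinations(cand, k - 1)):
                level.add(cand)
    return frequent


def association_rules(frequent, min_conf):
    """Return (X, Y, support, confidence) for rules X => Y with confidence >= min_conf."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for x in combinations(itemset, r):
                X = frozenset(x)
                conf = supp / frequent[X]  # subsets of frequent sets are frequent
                if conf >= min_conf:
                    rules.append((X, itemset - X, supp, conf))
    return rules


def has_illegal_process(transaction, allowed_tasks):
    """Flag a transaction whose task item is not in the host's fixed process list."""
    tasks = {item.split("=", 1)[1] for item in transaction if item.startswith("task=")}
    return not tasks <= set(allowed_tasks)
```

Running `apriori(tx, 0.6)` followed by `association_rules(freq, 0.7)` mirrors the 60% support and 70% confidence setting above; because each subsystem mines its own small database, the candidate sets stay small.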

The HMM Classifier Distinguishing Faults and Attacks
Due to the fail-safe mechanism of CBTC systems, both random faults and cyber-attacks may lead to anomalies. If anomalies are caused by failures, the broken devices should be repaired or replaced. However, if abnormalities are triggered by cyber-attacks, not only should the equipment be restored, but defensive measures should also be taken to prevent similar incidents. An IDS that adopts an anomaly detection model based on either network status or device states alone cannot identify the causes of anomalies, let alone advise administrators to take appropriate measures.
HMM is a statistical method that characterizes observation samples arranged in discrete time series and can infer the hidden states from the observations [43]. As investigated by Hindy, HMM can meet the requirements of network intrusion detection, such as a high detection rate, online learning ability, and high stability [44]. In addition, HMM requires little time to train the detection models.
In the proposed IDS, the results of different detection models are observable, while the state of the system is invisible. An HMM classifier can fuse the results of different models to differentiate cyber-attacks from failures and improve the performance of detection effectively.
In an HMM, $Q$ is the set of possible hidden states and $V$ is the set of possible observations:

$$Q = \{q_1, q_2, \ldots, q_N\}, \qquad V = \{v_1, v_2, \ldots, v_M\},$$

where $N$ is the number of hidden states and $M$ is the number of possible observations. In this paper, the hidden states are normal, fault, and attack, so $N = 3$. The most typical structure of a CBTC system is hot-standby, where two redundant devices are connected to two physically independent networks, respectively. Based on this structure, the observation is designed as shown in Figure 6. An observation includes data obtained from the two networks, denoted Network A and Network B. The data obtained from network A include $F_A$, $P_A$, $H_A$, and $T_A$, which are the analysis results of throughput, packet content, running processes, and host state, respectively. The possible values of $F_A$ are "0," "1," and "2," indicating "normal," "low," and "high," respectively. The values of $P_A$, $H_A$, and $T_A$ are "0" and "1," representing "normal" and "abnormal," respectively. The data obtained from network B have the same form as those collected from network A. Hence, the number of possible observations of the HMM classifier is $M = (3 \times 2 \times 2 \times 2)^2 = 576$.
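Since the HMM works with discrete observation symbols, each 8-tuple $(F_A, P_A, H_A, T_A, F_B, P_B, H_B, T_B)$ must be mapped to an index in $0, \ldots, M-1$. The Python sketch below enumerates all combinations to build that mapping and confirms $M = 576$; the enumeration order is an assumption, as the ordering used in Figure 6 is not specified here.

```python
from itertools import product

F_VALUES = (0, 1, 2)    # throughput: normal, low, high
BIN_VALUES = (0, 1)     # packet content / running process / host state: normal, abnormal

# Per-network tuples (F, P, H, T): 3 * 2 * 2 * 2 = 24 combinations.
PER_NETWORK = list(product(F_VALUES, BIN_VALUES, BIN_VALUES, BIN_VALUES))

# Observation symbol for every (network A, network B) pair, in enumeration order.
INDEX = {obs: k for k, obs in enumerate(product(PER_NETWORK, PER_NETWORK))}
M = len(INDEX)


def encode(net_a, net_b):
    """Map ((F_A, P_A, H_A, T_A), (F_B, P_B, H_B, T_B)) to a symbol in 0..M-1."""
    return INDEX[(tuple(net_a), tuple(net_b))]
```

An all-normal observation on both networks maps to symbol 0, and any fixed, consistent ordering works as long as training and online classification share it.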
As an example, consider an observation in which the throughput is lower than the predefined threshold and all other results are normal. The following sequences are defined:

$$I = (i_1, i_2, \ldots, i_T), \qquad O = (o_1, o_2, \ldots, o_T),$$

where $I$ and $O$ are the sequence of hidden system states and the sequence of observations, respectively, $i_t$ is the hidden system state at time $t$, and $o_t$ is the observation at time $t$. Two assumptions are embodied in the HMM classifier. The first is the Markov assumption on the sequence of system states: the probability of a system state depends only on the previous state,

$$P(i_t \mid i_{t-1}, o_{t-1}, \ldots, i_1, o_1) = P(i_t \mid i_{t-1}).$$
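The state transition probability is defined as

$$a_{ij} = P(i_{t+1} = q_j \mid i_t = q_i), \quad i, j = 1, 2, \ldots, N,$$

where $a_{ij}$ is the probability of moving from state $i$ to state $j$. The transition probability matrix is composed of all the $a_{ij}$:

$$A = [a_{ij}]_{N \times N}.$$

The second assumption is that the probability of an observation depends only on the hidden system state that produced it. The observation probability is defined as

$$b_j(k) = P(o_t = v_k \mid i_t = q_j), \quad k = 1, 2, \ldots, M, \quad j = 1, 2, \ldots, N,$$

and the observation probability matrix is

$$B = [b_j(k)]_{N \times M}.$$

Besides $A$ and $B$, the initial probability distribution of the system state is defined as

$$\Pi = (\pi_i), \qquad \pi_i = P(i_1 = q_i), \quad i = 1, 2, \ldots, N.$$

The HMM classifier is specified by $A$, $B$, and $\Pi$ and is expressed as

$$\lambda = (A, B, \Pi).$$

The classification scheme of the proposed IDS is shown in Figure 7. In the offline phase, historical data, i.e., a sequence of observations, are used to learn the parameters of the HMM classifier. The Baum-Welch (BW) algorithm, a special case of the expectation-maximization algorithm [45], is adopted to train the $A$ and $B$ matrices. The forward probability, i.e., the joint probability of the observations up to time $t$ and of being in state $i$ at time $t$, is expressed as

$$\alpha_t(i) = P(o_1, o_2, \ldots, o_t, i_t = q_i \mid \lambda),$$

where $\alpha_t(i)$ is the forward probability of state $i$ at time $t$.
The forward probability is calculated recursively as

$$\alpha_1(i) = \pi_i\, b_i(o_1), \qquad \alpha_{t+1}(j) = \Big[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\Big] b_j(o_{t+1}).$$

The backward probability, i.e., the probability of the observations after time $t$ given state $i$ at time $t$, is represented as

$$\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid i_t = q_i, \lambda),$$

where $\beta_t(i)$ is the backward probability of state $i$ at time $t$.
The backward probability is computed as

$$\beta_T(i) = 1, \qquad \beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j).$$

The probability of a system state can then be written as

$$\gamma_t(i) = P(i_t = q_i \mid O, \lambda) = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)},$$

where $\gamma_t(i)$ is the probability of system state $i$ at time $t$. Given the observation sequence and the HMM, the probability of being in state $i$ at time $t$ and in state $j$ at time $t+1$ is defined as

$$\xi_t(i, j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{p=1}^{N}\sum_{r=1}^{N} \alpha_t(p)\, a_{pr}\, b_r(o_{t+1})\, \beta_{t+1}(r)}.$$

The state transition probability can then be estimated as

$$\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)},$$

where $\hat{a}_{ij}$ is the estimate of $a_{ij}$, and the observation probability can be estimated as

$$\hat{b}_j(k) = \frac{\sum_{t=1,\, o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}.$$

To train the HMM classifier of the proposed IDS, a sequence of historical observations and the set of possible hidden system states are input to the BW algorithm. The initial hidden system state is assumed to be "normal," i.e., $P(i_1 = \text{normal}) = 1$. The $A$ and $B$ matrices are initialized randomly at the beginning of the iterations. The convergence conditions of the algorithm are defined on $\|A\|$ and $\|B\|$, the 1-norms of the matrices $A$ and $B$, respectively.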
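The BW training loop can be sketched with NumPy as follows. It implements the forward and backward recursions, $\gamma_t(i)$, $\xi_t(i,j)$, and the re-estimation formulas above, keeping $\Pi$ fixed as in the text. Two choices are assumptions rather than details from the paper: the per-step rescaling of $\alpha$ and $\beta$ (a standard guard against numerical underflow, under which $\gamma = \hat\alpha\hat\beta$ directly), and the tolerance `tol` on the matrix 1-norm of the parameter change, since the paper's exact convergence thresholds are not reproduced here.

```python
import numpy as np


def forward_backward(A, B, pi, obs):
    """Scaled forward/backward passes; returns (alpha_hat, beta_hat, c)."""
    pi = np.asarray(pi, dtype=float)
    T, N = len(obs), A.shape[0]
    alpha, beta, c = np.zeros((T, N)), np.ones((T, N)), np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]    # sum_i alpha_t(i) a_ij, times b_j
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1]) / c[t + 1]
    return alpha, beta, c


def baum_welch(obs, N, M, pi, max_iter=500, tol=1e-6, seed=0):
    """Estimate A (N x N) and B (N x M) from one observation sequence, pi fixed."""
    obs = np.asarray(obs)
    rng = np.random.default_rng(seed)
    A = rng.random((N, N))
    A /= A.sum(axis=1, keepdims=True)                   # random row-stochastic init
    B = rng.random((N, M))
    B /= B.sum(axis=1, keepdims=True)
    for it in range(1, max_iter + 1):
        alpha, beta, c = forward_backward(A, B, pi, obs)
        gamma = alpha * beta                            # gamma_t(i)
        # xi_t(i, j) for t = 1..T-1, shape (T-1, N, N)
        xi = (alpha[:-1, :, None] * A[None, :, :]
              * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / c[1:, None, None]
        A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        B_new = np.zeros_like(B)
        for k in range(M):
            B_new[:, k] = gamma[obs == k].sum(axis=0)   # sum of gamma_t(j) where o_t = v_k
        B_new /= gamma.sum(axis=0)[:, None]
        # Stop when the matrix 1-norm of both parameter changes is below tol.
        done = (np.linalg.norm(A_new - A, 1) < tol
                and np.linalg.norm(B_new - B, 1) < tol)
        A, B = A_new, B_new
        if done:
            break
    return A, B, it
```

With `pi = [1, 0, 0]` (the "normal" initial state), `N = 3`, and `M = 576`, the same function would train the classifier on encoded historical observations; the number of iterations it needs depends on the data and on `tol`.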
The algorithm converges after 152 iterations, which yields the trained $A$ and $B$ matrices of the HMM classifier. Given the assumed $\Pi$ and the trained matrices $A$ and $B$, the HMM classifier is fully determined and is used in the online phase to classify the detection results of the different models. Using the forward, backward, and state probabilities above, the most probable hidden system state at time $t$, given a sequence of observations, is

$$i_t^* = \arg\max_{1 \le i \le N} \gamma_t(i).$$

Finally, the proposed IDS outputs $i_t^*$ as the detection result.
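The online classification step, $i_t^* = \arg\max_i \gamma_t(i)$, can be sketched with NumPy. The matrices in the usage example are hypothetical values over a toy three-symbol alphabet (0: all results normal, 1: device anomaly only, 2: network and device anomalies), not the trained parameters of the paper; states are indexed 0 = normal, 1 = fault, 2 = attack.

```python
import numpy as np


def posterior_decode(A, B, pi, obs):
    """Return (i*_t for every t, gamma) using scaled forward/backward passes."""
    pi = np.asarray(pi, dtype=float)
    T, N = len(obs), len(pi)
    alpha, beta, c = np.zeros((T, N)), np.ones((T, N)), np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1]) / c[t + 1]
    gamma = alpha * beta            # gamma_t(i); each row sums to 1
    return gamma.argmax(axis=1), gamma


# Hypothetical parameters: states 0 = normal, 1 = fault, 2 = attack.
A = np.array([[0.90, 0.05, 0.05],
              [0.10, 0.85, 0.05],
              [0.10, 0.05, 0.85]])
B = np.array([[0.90, 0.05, 0.05],
              [0.10, 0.80, 0.10],
              [0.05, 0.15, 0.80]])
pi = np.array([1.0, 0.0, 0.0])
states, gamma = posterior_decode(A, B, pi, [0, 0, 0, 2, 2, 2, 0, 0])
```

Under these toy parameters, the run of combined network and device anomalies is attributed to the attack state, while isolated symbols near normal ones are absorbed by the surrounding states; this is the fusion effect the classifier relies on.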

Experimental Data Collection
In this section, an experimental environment is constructed to evaluate the proposed IDS. The simulation platform of Beijing Subway Line No. 7 is introduced first, and attack scenarios are then designed on it to collect experimental data.

Semi-Physical Simulation Platform of CBTC
The proposed method is verified on the hardware-in-the-loop simulation platform of CBTC systems. As shown in Figure 8, the platform contains two networks, the automatic train supervision (ATS) network and the automatic train control (ATC) network, which are connected by a gateway. In the platform, ZC and VOBC are real devices, while other devices such as CI, DSU, and the gateway are simulated by software on different computers. Additionally, CBTC systems support degraded modes to ensure high availability, so different operation modes can be simulated and all kinds of data collected on the platform. The operation modes can be classified as CBTC mode and intermittent ATP mode. In CBTC mode, information is transmitted continuously to realize automatic train protection, whereas in intermittent ATP mode, the movement authority (MA) is only updated at discrete locations along the track. CBTC mode provides accurate closed-loop control of trains through continuous, bidirectional, high-capacity communication between trains and wayside equipment. As the LMA is calculated from the position of the train ahead, CBTC mode is a moving block signaling system. If continuous communication is interrupted, however, the system degrades to intermittent ATP mode, where the LMA is transmitted through beacons.
The proposed IDS is also implemented on this platform. As shown in Figure 8, the processor collects all packets and information from the CBTC platform, and the detector is responsible for anomaly detection. As an IDS should not introduce new threats to CBTC, a firewall is placed between the processor and the detector so that attacks cannot be launched through the intrusion detection devices.

Experimental Scenarios
As shown in Figure 8, a typical attack path is highlighted in red and the major injection locations are highlighted in yellow. CBTC is connected with other systems, such as the passenger information system (PIS), through the communication computer in the ATS network. As the security protection of these other systems may be weak, attackers may first capture the communication computer through other networks and then launch an attack on the ATS. To reach the ATC network, they next need to compromise the gateway. Finally, they can directly attack equipment such as ZC and seriously affect train operation.
There is a wide variety of attacks on IT systems. However, most common attacks cannot be carried out against CBTC because CBTC systems are not directly connected to the Internet. In this paper, we only select attack scenarios that may occur in CBTC systems, namely DoS and DIA. DoS attacks include vulnerability triggering and resource exhaustion. As COTS products are widely used in CBTC systems, various buffer overflow vulnerabilities also exist. Resource exhaustion is mainly caused by flood attacks, such as Smurf and synchronize sequence numbers (SYN) flood. In addition, LMAs directly determine how far trains can travel, so the DIA scenarios in the experiments are launched against the LMAs.
As our IDS can distinguish between faults and attacks, fault injection scenarios are also designed. Equipment faults and communication failures are selected, as they are the most common faults in CBTC systems.
When designing the experiments, the difference between faults and attacks is also considered. An attacker's goal is usually to disrupt train operation, so the choice of target and the duration of attacks are purposeful, whereas faults are random.
The methods of attack emulation and fault injection are described in detail below.

• Buffer overflow
A buffer overflow occurs during program execution when more data is copied into a fixed-size buffer than it can hold [46]. Buffer overflow attacks can target the stack during program execution, overwriting adjacent memory locations and altering the behavior of the software. Since most applications in CBTC are developed in C, a publicly available suite is used in this paper to identify buffer overflows and help launch attacks.

• Smurf
Smurf is a type of DoS attack that floods a victim network via spoofed broadcast ping messages [47]. Windows operating systems now adopt strategies to avoid this attack, but the vulnerability may still exist in other operating systems, such as VxWorks in ZC. We simulate attackers sending ICMP echo request packets to the broadcast address with the source address forged as the IP address of ZC. The resulting replies generate significant traffic on the ZC subsystem, which can bring ZC down.

• SYN flood
When a SYN flood attack occurs, all open ports may be saturated with requests so that none are available for legitimate users to connect to. In this paper, the targets of the SYN flood are the communication computer, gateway, ZC, CI, and DSU.

• Tamper attack
The tamper attack mainly affects the location information or the LMA transmitted between the trains and ZC. In the experiments, we design attack nodes that modify the train position or LMA before the packets reach their destination nodes. Therefore, the targets of tamper attacks are ZC or VOBC.

• Replay attack
The replay attack is also most likely to occur in the communication between ZC and VOBC, where its impact is greatest. In this paper, the attacker is simulated to eavesdrop on LMAs and resend them repeatedly to disturb the normal operation of the trains.

• Equipment faults
In this paper, the target of equipment fault injection is chosen randomly. As shown in Figure 8, we take CI as an example to show how faults are injected. In the first case, an application error is simulated by shutting down related tasks running on CI. In the second case, unexpected device faults are simulated by shutting down the CI host.

• Communication faults
Wired communication is used between wayside devices, while VOBC communicates with wayside devices wirelessly. Therefore, faults are much more likely between VOBC and the ATC network. As shown in Figure 8, communication faults are injected between VOBC and the wayside equipment by simulating packet loss and increasing transmission delay.

Experimental Data
The collected data includes traffic flow, packets, process lists and resource utilization of each process. The normal operation of CBTC systems is simulated on the platform, and the data is marked as "normal". The data collected during the attack is labeled as "attack". Similarly, the data is marked as "fault" if it is obtained during the fault injection.
True positive rate (TPR) and false positive rate (FPR) are selected to measure the performance of the IDS. TPR is the proportion of anomalous instances correctly classified as anomalous out of all anomalous instances, while FPR is the proportion of normal instances misclassified as anomalous out of all normal instances. In this paper, we denote the number of non-attack records correctly detected as non-attack by non-attack → non-attack (TN). Similarly, we define non-attack → attack (FP), attack → attack (TP), and attack → non-attack (FN). Thus, TPR = TP / (TP + FN) and FPR = FP / (FP + TN).
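A minimal sketch of the TPR and FPR computation follows. It assumes three-class labels ("normal", "fault", "attack") and maps both "normal" and "fault" to non-attack, as the paper does when calculating results. The function names and the example labels are illustrative, not data from the experiments.

```python
from collections import Counter


def to_binary(label):
    # Both "normal" and "fault" records count as non-attack
    return "attack" if label == "attack" else "non-attack"


def tpr_fpr(true_labels, predicted_labels):
    counts = Counter(
        (to_binary(t), to_binary(p)) for t, p in zip(true_labels, predicted_labels)
    )
    tp = counts[("attack", "attack")]          # attack -> attack
    fn = counts[("attack", "non-attack")]      # attack -> non-attack
    fp = counts[("non-attack", "attack")]      # non-attack -> attack
    tn = counts[("non-attack", "non-attack")]  # non-attack -> non-attack
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Hypothetical labels, not data from the paper
y_true = ["attack", "attack", "attack", "attack", "normal", "normal", "fault", "fault"]
y_pred = ["attack", "attack", "attack", "normal", "normal", "attack", "fault", "normal"]
print(tpr_fpr(y_true, y_pred))  # (0.75, 0.25)
```

Note that a fault predicted as "normal" (or vice versa) is not an error under this binary view; only confusions across the attack/non-attack boundary affect TPR and FPR.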
To choose an appropriate dataset size, we evaluate the impact of dataset size on detection performance. Figure 9 shows the performance of the IDS with 20, 40, 60, 80, 100, 120, 140, and 160 experiments. Up to 80 experiments, detection performance improves as the dataset grows. Going from 80 to 100 experiments yields a smaller improvement, and TPR and FPR are basically stable beyond 100 experiments. Therefore, to save computing resources and reduce model training time, the dataset in this paper contains 100 experiments.
Finally, all of the data is summarized in Table 3, comprising 560,977 packets and 31,649 state messages. The whole dataset is divided into a training set, used for model training, and a test set, used for performance evaluation. If the training set is too small, the trained model is not accurate enough; conversely, if the test set is too small, the performance evaluation is not comprehensive. Following common practice, 80% of the data is randomly selected as the training set, and the remaining 20% forms the test set.
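The random 80/20 split can be sketched as follows. This is illustrative, not the authors' code: `split_dataset` is a hypothetical helper, and shuffling individual records is an assumption, since the text does not say whether individual records or whole experiment runs are shuffled.

```python
import random


def split_dataset(records, train_ratio=0.8, seed=None):
    """Randomly split records into a training set and a test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # in-place shuffle of the copy
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# e.g. splitting indices of the 31,649 state messages
train, test = split_dataset(range(31649), seed=42)
print(len(train), len(test))  # 25319 6330
```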

Parameter Settings
It should be noted that when the results are calculated, both "normal" and "fault" instances are treated as "non-attack" instances. To implement the proposed IDS, the parameters of each model must be determined.
Because the flow statistics use an EWMA control chart, L, λ, and α may all affect detection performance. Following [48], L is set to 1.96. Table 4 shows how TPR and FPR change as λ and α take different values; EWMA performs best when λ = 0.01 and α = 0.001. In the detection model based on device states, the support s and the confidence c used in association rule mining also strongly affect performance. The larger these two parameters are, the fewer frequent itemsets are mined; thus, fewer association rules are generated and the FPR may be higher. However, if s and c are too small, many redundant rules may be generated, and mining them consumes substantial computing resources. Therefore, following expert advice, we set s = 0.6 and c = 0.8.
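The two parameter-dependent components above can be sketched as follows, under stated assumptions. The EWMA part is the textbook EWMA control chart with a known in-control mean `mu0` and standard deviation `sigma`, using L = 1.96 and λ = 0.01 as above; the role of α is defined elsewhere in the paper and is not modeled here. The association-rule part is a simplified stand-in for the DAD miner: it only mines rules between pairs of state items (a full Apriori would also grow larger itemsets) and flags a record when a rule's antecedent holds but its consequent does not. All function names, state names, and records are hypothetical.

```python
import math
from itertools import combinations


# Textbook EWMA control chart; the paper's alpha parameter is NOT modeled here.
def ewma_alarms(xs, mu0, sigma, lam=0.01, L=1.96):
    """Return 0-based indices where the EWMA statistic leaves its control limits."""
    z = mu0
    alarms = []
    for i, x in enumerate(xs, start=1):
        z = lam * x + (1 - lam) * z
        # Time-varying half-width of the EWMA control limits
        half_width = L * sigma * math.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * i)))
        if abs(z - mu0) > half_width:
            alarms.append(i - 1)
    return alarms


# Simplified association-rule miner: rules between pairs of state items only.
def mine_rules(records, s=0.6, c=0.8):
    """Return rules (lhs, rhs) with support >= s and confidence >= c."""
    n = len(records)
    items = sorted(set().union(*records))

    def count(itemset):
        return sum(1 for r in records if itemset <= r)

    rules = []
    for a, b in combinations(items, 2):
        both = count({a, b})
        if both / n < s:
            continue  # {a, b} is not a frequent itemset
        for lhs, rhs in ((a, b), (b, a)):
            if both / count({lhs}) >= c:
                rules.append((lhs, rhs))
    return rules


def violated(record, rules):
    """Rules whose antecedent holds in the record but whose consequent does not."""
    return [(lhs, rhs) for lhs, rhs in rules if lhs in record and rhs not in record]


# Hypothetical flow statistic: the mean shifts from 100 to 160 at sample 50
flow = [100.0] * 50 + [160.0] * 50
alarms = ewma_alarms(flow, mu0=100.0, sigma=10.0)
print(alarms[:3])

# Hypothetical normal device-state records (not data from the paper)
normal_states = [{"ZC:up", "CI:up", "VOBC:LMA_ok"}] * 4 + [{"ZC:up", "CI:down", "VOBC:LMA_ok"}]
rules = mine_rules(normal_states)
print(violated({"ZC:down", "CI:up", "VOBC:LMA_ok"}, rules))
# -> [('CI:up', 'ZC:up'), ('VOBC:LMA_ok', 'ZC:up')]
```

With λ as small as 0.01, the EWMA statistic reacts slowly to single outliers but accumulates evidence from a sustained shift, which suits flooding attacks that persist over many samples.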

Experiment Results
In this section, we compare the detection performance of the proposed IDS with other approaches. Because the IDS applies multiple methods from multiple perspectives, it is difficult to compare it directly with other intrusion detection approaches. Therefore, we first compare the performance of each single detection model with that of the entire IDS in Experiment 1. Then the network states anomaly detection (NAD) model, the device states anomaly detection (DAD) model, and the HMM classifier are each compared with other approaches. As shown in Figure 10, the test dataset is used to generate the following results.

• Experiment 1
To show that the proposed IDS performs better than the NAD model or the DAD model alone, we calculate their detection results, as shown in Table 5; Figure 11 plots the data in Table 5 to display the results more intuitively. Because the NAD and DAD models each obtain information from only one aspect, the TPR of either single model is lower than that of the entire IDS. Generally, the HMM classifier combines the different information from NAD and DAD to obtain a lower FPR. However, in the case of SYN flood attacks, the FPR of DAD reaches 11.64%. Because DAD generates many false alarms, the HMM classifier may also generate more false positives; as a result, the FPR of the IDS in this case is 0.05% higher than that of NAD. For the entire dataset, fusing the detection models increases the TPR by 3.51% and 7.76% compared with the two single models, while the FPR is reduced by 0.86% and 4.95%. In summary, the proposed method performs better than a single detection model.

• Experiment 2
Table 6 lists the methods, application scenarios, and results of other published IDSs. Yang and Liu adopt statistics-based methods to analyze network traffic [49,50]. Akbar applies a supervised classifier to detect attacks in Voice over Internet Protocol (VoIP) networks [51]. Because each of these works uses a single detection method, their TPR is lower than that of the combined methods.
Some works combine statistical methods with machine learning algorithms to detect attacks in different systems [52][53][54][55][56]. Among them, Verba does not report detailed detection performance [52]. Judging by TPR, the methods of Valdes and Amini are effective in detecting DoS attacks [53,54]; however, they do not consider DIA scenarios. Only Goh reports detection results for DIA, but that work does not analyze the more common DoS attacks [56]. The NAD model adopts both the statistical method and a decision tree, so it can detect DoS and DIA at the same time. Moreover, the NAD model is designed according to the characteristics of CBTC systems. It achieves good detection performance, with a TPR of 98.86% for DoS and 92.95% for DIA.

• Experiment 3
As mentioned before, TPR and FPR are selected to measure the performance of the IDS, where TPR represents the detection rate. To compare the DAD model with other methods, we use the receiver operating characteristic (ROC) curve, which evaluates the tradeoff between TPR and FPR. By carrying out several tests with different values of s and c for association rule mining, we plot the ROC curve shown in Figure 12. Since the states of the CBTC subsystems are generally stable, the association rule mining algorithm performs well at detecting abnormal device states. As shown in Figure 12, under the same FPR the DAD model achieves a higher detection rate than the length decreasing support (LDS) Apriori [57] and the hybrid IDS [58]. When the FPR is higher than 15%, the detection rate of the fuzzy IDS [59] exceeds that of the DAD model. However, such an FPR is too high for CBTC systems: a high FPR may produce a large number of false alarms, which can cause trains to perform emergency braking and reduce the efficiency of CBTC systems.
When the FPR is 7.61%, the detection rate of the DAD model is 89.88%, which is higher than that of LDS Apriori by 13.34%, the fuzzy IDS by 61.88%, and the hybrid IDS by 38.91%.

• Experiment 4
Finally, several classification algorithms commonly used in IDSs, namely naive Bayes, neural networks (NN), and support vector machines (SVM), are selected for comparison with the HMM classifier. They are applied to the same data as the HMM classifier, and the detection results are shown in Figure 13. Because naive Bayes, NN, and SVM may misjudge faults as attacks, their FPR is higher than that of the proposed IDS. As shown in Figure 13, the FPRs of naive Bayes, NN, and SVM are 19.21%, 4.26%, and 9.77%, respectively, whereas the HMM classifier reduces the FPR to 2.66%. This indicates that the HMM classifier can effectively distinguish between faults and attacks.
The proposed IDS can also improve the packet loss rate and throughput of the system after an attack. Taking the communication between VOBC and the wayside equipment as an example, we repeat the experiments with different attacks; the results are shown in Figure 14. DIA attacks only tamper with information and do not change the number of packets, so they have almost no impact on the packet loss rate or throughput. Therefore, only DoS attacks are simulated, starting at the 5th second. In Figure 14, the blue curve shows the case with the IDS deployed. The IDS can detect attacks and notify administrators to take defensive measures. The results show that attacks have a great impact on the communication performance of CBTC. The proposed IDS not only detects attacks but also raises alarms promptly, so the system can recover quickly from abnormal communication. In summary, the proposed IDS can effectively prevent attacks from causing more serious impacts on CBTC.
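To make the fault-versus-attack decision concrete, the sketch below classifies a sequence of NAD/DAD alarm observations by comparing its likelihood under two discrete HMMs, one describing attacks and one describing faults, using the scaled forward algorithm. This two-model maximum-likelihood design, the observation encoding, the two hidden states, and every probability in the tables are assumptions for illustration; they are not the structure or parameters of the paper's HMM classifier.

```python
import math

# Hypothetical observation encoding per detection window:
# 0 = no alarm, 1 = NAD alarm only, 2 = DAD alarm only, 3 = NAD and DAD alarms
START = [0.5, 0.5]                # hidden states: 0 = quiet, 1 = active
TRANS = [[0.7, 0.3], [0.2, 0.8]]  # hypothetical transition probabilities
MODELS = {
    # Hypothetical emission tables: row = hidden state, column = observation
    "attack": [[0.7, 0.1, 0.1, 0.1], [0.05, 0.35, 0.1, 0.5]],
    "fault": [[0.7, 0.1, 0.1, 0.1], [0.05, 0.1, 0.75, 0.1]],
}


def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete-output HMM."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    scale = sum(alpha)
    log_lik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][o] for s in range(n)]
        scale = sum(alpha)          # rescale to avoid numerical underflow
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def classify(obs):
    """Label an anomalous observation sequence with the best-fitting model."""
    scores = {label: forward_log_likelihood(obs, START, TRANS, emit)
              for label, emit in MODELS.items()}
    return max(scores, key=scores.get)


print(classify([0, 3, 3, 1, 3]))  # attack
print(classify([0, 2, 2, 2, 0]))  # fault
```

Here a run of combined NAD and DAD alarms favors the attack model, while device-only alarms with quiet network statistics favor the fault model; in practice the tables would be estimated from labeled training data rather than set by hand.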

Conclusions
In this paper, a novel intrusion detection method for CBTC based on network and device states is designed. The impact of cyber-attacks on CBTC is analyzed, and different detection models are proposed according to the principles of CBTC systems. An HMM classifier is adopted to differentiate cyber-attacks from random system faults. Our experiments, though limited, show that the proposed IDS can effectively detect attacks in CBTC systems, with the TPR reaching 97.64% while the FPR is limited to 2.66%.
In future work, the proposed IDS could be extended to use multiple data sources, such as fault identification results and the running status of the trains. Additionally, we observed during the experiments that the detection rate of data tampering attacks was lower than that of the other attacks; more detection patterns are needed to improve the performance of the IDS in this respect.

Conflicts of Interest:
The authors declare no conflict of interest.