A Safety Computer System Based on Multi-Sensor Data Processing

The safety computer in the train control system is designed to be the double two-vote-two architecture. If safety-critical multi-input data are inconsistent, this may cause non-strict multi-sensor data problems in the output. These kinds of problems may directly affect the decision making of the safety computer and even pose a serious threat to the safe operation of the train. In this paper, non-strict multi-sensor data problems that exist in traditional safety computers are analyzed. The input data are classified based on data features and safety computer features. Then, the input data that cause non-strict multi-sensor data problems are modeled. Fuzzy theory is used in the safety computer to process multi-sensor data and to avoid the non-strict multi-sensor problems. The fuzzy processing model is added into the onboard double two-vote-two architecture safety computer platform. The fuzzy processing model can be divided into two parts: improved fuzzy decision tree and improved fuzzy weighted fusion. Finally, the model is verified based on two kinds of data. Verification results indicate that the fuzzy processing model can effectively reduce the non-strict identical problems and improve the system efficiency on the premise of ensuring the data reliability.


Introduction
With the development of the railway field, the safety computer is playing an increasingly important role in the train control system [1][2][3][4]. Due to different data having different data characteristics, the analysis and discussion of multi-sensor data processing in the safety computer are of great significance. The current problem is that analog data of multi-channel sensors are inconsistent due to the sensors' own features and environmental factors, which leads to non-strict multi-sensor data problems in the safety computer. Because of the increasing amount of data, these problems are becoming more and more serious. They will bring problems such as inefficiency and low availability in train control systems, even affecting the safe operation of the train if these data are safety-critical data, such as speed data. For example, there are multiple inputs of speed data in train control systems. The double two-vote-two onboard safety computer requires two inputs of speed data. If these data are different, the onboard safety computer will switch the system. However, this does not solve the problem because the safety computer still does not have an accurate value of speed data to calculate the operation curve for the next step. Various types of data in the train control system have this kind of problem, such as speed data, air pressure data, and positioning data.

Non-Strict Multi-Sensor Data Problems in the Safety Computer
The traditional double two-vote-two safety computer compares the homogenous and heterogeneous redundant data during data processing, and the redundant data need to be identical. If these data are different, the safety computer will switch regardless of the slightness of the data difference. Although this logical frame can ensure system safety, it may cause the problem of data inconsistency if input data are analog data, such as speed data and air pressure data, which require accurate values. The problem cannot be solved by switching the system. Safety-critical analog data play a vital role in the operation of the train control system. If these data have errors, this will bring problems such as inefficiency and low availability, even affecting the safe operation of the train.
The problems of multi-sensor data inconsistency can be roughly divided into four cases, i.e., the data parameters do not meet the input standard, there is a large difference between adjacent data, data drift, and the Byzantium problem. (1) The data parameters that do not meet the input standard mean that the values of the collected data far exceed their standard range. For example, the maximum train speed can reach 350 km/h, but the collected value is 600 km/h. In this case, the data input should be discarded. (2) The difference between adjacent acquisition data is obviously too large, which is obviously a sensor failure. If the data are input to a safety computer for data processing, there will be a situation where the efficiency is reduced and resources are wasted. (3) After the multi-channel sensor collects data, data drift may occur due to channel interference and environmental interference. (4) The Byzantium problem is a common problem in fault tolerance mechanism. It is impossible to achieve consistency by means of messaging on an unreliable channel with data failure. Figure 1 shows four kinds of problems of multi-sensor data inconsistency. Currently, in the train control system, the method of calculating the average value for multiple input data is adopted. However, air pressure data are affected by many factors like pipe diameter, duct shape, and the hysteresis effect. In this case, the method is obviously unsafe for the operation of the train control system.

Data Features' Discussion
As a safety critical system, the safety computer has different requirements for different data. According to the principle of safety, periodicity, redundancy, and functionality, the data can be divided into three types: state data, dynamic data, and static data.
State data are safety-critical data, and this reflects a certain state of the device or track in the operation of the train control system, such as the state of the segment. This type of data is usually digital data.
Dynamic data such as speed data and air pressure data are analog data which are required for calculation in the operation of the train control system. They play a vital role in system operation, which has a strong real-time performance and requires accurate values.
Static data are mainly composed of monitoring information and log information that have a large amount of data and low safety.
The non-strict multi-sensor data problems of the safety computer can only be caused by dynamic data. Therefore, modeling dynamic data separately can effectively improve the data processing capability and system efficiency. Establishing a dynamic database requires a large amount of dynamic data. A feature parameter database can be obtained by extracting features from the data. It is worth noting that data features need to be classified and extracted according to different operating environments and different operating modes.

The Architecture of the Safety Computer in the Train Control System
Traditional onboard safety computer platforms adopt the architecture of double two-vote-two. This kind of architecture forms a double relationship between two computers with identical functional structures, and each computer has two heterogeneous units to form the relationship of two-vote-two [11].
Only the dynamic data can result in the non-strict multi-sensor data problems of the safety computer. Thus, a fuzzy processing model based on dynamic data is established. Under the architecture of double two-vote-two, this paper proposes an improved onboard safety computer platform, which adds a fuzzy processing model to process dynamic data separately to improve the system communication efficiency and capability of data processing. The fuzzy processing model consists of two parts: improved fuzzy decision tree and fuzzy weighted fusion. The architecture of the improved onboard safety computer is shown as Figure 2.

Data Feature Extraction and Selection
The extraction and selection of data features play important roles in data classification, algorithm calculation, and simulation verification [12]. The data of train control system are characterized by a large amount of data, high repetition, and contradiction. It is very difficult to extract the information needed from the massive amount of data, so the data features of the train control system must be preprocessed in order to accurately reflect the state of the system.
The steps of data feature extraction are sorting, grouping, collating, dimensionality reduction, and illustration. The function of these steps is to further understand the distribution features of data and then find out the representative values that reflect the distribution features of data.
The feature distribution of data is described in three aspects: the centralized trend of data distribution, the discrete degree of data distribution, and the shape of data distribution. The centralized trend of data distribution reflects the degree of convergence or the convergence of data toward their central value; the discrete degree of data distribution reflects the trend of data away from their central value; and the shape of data distribution reflects the skewness and peak state of the data distribution.
Data feature extraction is the basis of the algorithm, which determines the result of algorithm simulation and the authenticity of the experiment. Therefore, data extraction needs to refer to a variety of factors affecting data, such as different operating modes, operating environments, and sensor types.
Different data features are extracted for different purposes. For example, the average value reflects the centralized trend of the data, and the variance reflects the degree of discreteness of the data.
It is incomplete to use a single data feature to reflect the whole system state, so this paper extracts several data features in data processing, and two kinds of data features are selected from each of three aspects of data features. The centralized trend of the data distribution chooses the mean value and median; the discrete degree of the data distribution chooses the standard deviation and discrete coefficient; and the shape of the data distribution chooses skewness and peak state. It is worth noting that the same kind of data collected from different sensors may also be different data feature. In data weighted fusion, the weights of different sensors need to be considered in particular. Figure 3 shows three operation modes of air pressure data.

Normal mode
Access mode Shunting mode

The Improved Fuzzy Decision Tree
The fuzzy decision tree introduces fuzzy mathematics into a classical decision tree. The difference between the classical decision tree and fuzzy decision tree is that the fuzzy decision tree needs to establish the membership function to represent the membership degree of data. The fuzzy decision tree use fuzzy multi-valued logic with ambiguous values instead of binary logic with precise numerical values.
The establishment process of the fuzzy decision tree model can be divided into six steps: (1) data collection; (2) data preprocessing; (3) induction and establishment of the fuzzy decision tree; (4) pruning the established fuzzy decision tree; (5) converting the fuzzy decision tree into a set of fuzzy rules; (6) applying the obtained fuzzy rules to forecast [13][14][15].
The fuzzy decision tree based on the FID3 algorithm has a poor real-time data processing ability and cannot analyze the historical data. Aiming to deal with these disadvantages, this paper proposes an improved real-time FID3 algorithm based on historical data, which fully takes historical data and real-time performance into consideration. Data preprocessing is one of the most important steps in the modeling of the fuzzy decision tree. The function of data preprocessing is to fuzzify the collected data and establish its membership function. In this paper, the membership function is established according to the speed data. In addition, a time function and a delay parameter are added to satisfy the requirements of historical data and real-time performance. The time function can be divided into two parts, the historical credibility and a similarity function. The function of the equation is to compare the similarity of adjacent data. Different data have different membership functions. Speed data will be used to build the improved fuzzy decision tree. The delay parameter represents the time delay of the data.
According to the characteristics of the data, the algorithm adds a time function based on the initial one. Assume there is an input instance In the formula, t i denotes the data acquisition time. σ denotes the distribution parameter. g(t i ) denotes the time function. δ denotes the difference between the speed acquisition value and the speed estimation value. V(t k ) denotes the speed data collected at t k point. ∑ i j=i−1 t j a j denotes the speed change value calculated by the collected acceleration. The abscissa xdenotes the historical speed value and the range of values is [0, +∞]. ∆t denotes the maximum delay allowed for the system.
In the formula, T(t i ) denotes the degree of real-time and the reliability of the data collected at time t i , and the value's range of T(t i ) is [0, 1]. θ(t i ) denotes the similarity of data collected at times t i−1 and t i . The value's range of θ(t i ) is [0, 1]; the higher the similarity is, the closer to one. β t i denotes the historical credibility of sensor, which ranges from [0, 1]. The higher the reliability is, the closer to one.
The rest of the algorithm is similar to the FID3 algorithm. According to the principle of maximum information gain, each time, the attribute with the greatest information gain is selected as the test attribute. In the process of dividing, according to the value of the degree of membership of a certain test attribute of each instance, it can be subsumed into multiple subsets to generate a fuzzy decision tree.
The functions of the improved fuzzy decision tree are data fuzzification, the generation of historical credibility, and data preliminary filtering.

Fuzzy Weighted Fusion Algorithm
Different sensors have different operational principles. Different sensors have their own advantages under different environmental conditions. The weighted fusion model gives different weights to different sensors; however, it is difficult to ensure the accuracy of the fusion for real-time and variable data. Fuzzy weighted fusion introduces fuzzy mathematics into the traditional weighted fusion model, with the advantages of high fusion precision and strong robustness. Fuzzy weighted fusion can solve nonlinear problems and environmental disturbance.
The main function of the fuzzy weighted fusion algorithm is to fuse multi-sensor data and improve the accuracy of data and fault tolerance. The degree of importance and historical credibility are used in the algorithm for the problem of different sensor sensitivities under different environmental conditions. The main point of this fuzzy weighted fusion algorithm is to fuzzify the degree of importance and then convert the fuzzified degree of importance and historical credibility into a comprehensive weighting matrix. Finally, the multi-sensor data are fused by a comprehensive weighting matrix. Figure 4 shows the structure of the fuzzy weighted fusion algorithm.  Assuming there are multiple inputs U = U{U 1 , U 2 , · · · , U m }, S i denotes the historical credibility, and P i denotes the degree of importance [16][17][18][19].

Multi-input data
The fuzzy weighted fusion algorithm is divided into five steps: (1) the fuzzification of the importance degree and historical credibility; (2) the analysis of the importance degree and historical credibility fusion; (3) determining the fuzzy attribute value of the data; (4) the weighted fusion of data; (5) the simulation and verification on the fuzzy weighted data.
The first step is to fuzzify the importance degree and the historical credibility. The historical credibility is fuzzified in the improved fuzzy decision tree. The fuzzification of the importance degree is based on the feature parameter database according to different environments and operating modes.
The second step is to analyze the historical credibility and importance degree after fuzzification. OWA (ordered weighted averaging operator) is used in the transformation. OWA is that between the maximum and minimum operators. It can be used to fuse multi-source and multi-attribute data effectively by adjusting the degree or-and in the operator.
There are two measures associated with an OWA operator, orness and andness.
andness(w) = 1 − orness(w) In the formula, n denotes the number of data. Wdenotes an ordered weighted vector.
An ordered weighted vector w = w(w 1 , w 2 , · · · , w n ) can be defined as: In the formula, Q denotes a fuzzy semantic quantization operator, which can be defined as: In the formula, the value range of α, β, r is [0, 1]. The parameters α, β are affected by the characteristics of the sensor, environmental factors, and different operating modes.
The third step is to synthesize and convert the historical credibility and importance degree. The fuzzy synthesis algorithm is used in transformation; a fuzzy conversion operator is defined as: In the formula, m denotes the number of data sources. d denotes the sum of the importance degree. T and S denote two different fuzzy logic operators.
Based on the previous discussion, fuzzy synthesis function can be defined as: The rules are as follows: Assuming that R i = G(P i , S i ), R i denotes the fuzzy attribute value of the ith data after fuzzy synthesis. G max , G ave , and G min can be defined as: In the fourth step, the sensor state matrix can be calculated by the weighted fusion formula that is defined as follows: In the formula, C denotes the fusion data. t denotes multi-source data collected by real-time multi-channel sensors.
The last step is to verify the algorithm. The dynamic data are divided into two parts after collection, the test set and training set. The training set is used for modeling, and the test set is used to verify the model. The model needs to be verified in multiple operating modes because data from different operating modes have different features.

Fuzzy Decision Tree Submodel
The fuzzy processing model is capable of processing multiple dynamic data. Two kinds of dynamic data (speed data and air pressure data) will be used in this section. The speed data in this paper were acquired by the wheel speed sensor and GPS sensor of the laboratory test vehicle. The air pressure data were collected in real time from the train K9828 in Yunnan Province.
Before data fuzzification, all the data need to be reprocessed, as shown in Figure 5. This can be divided into four steps. Step 1: The data are input into the historical database. If the historical range is satisfied, turn to Step 2; otherwise, modify the historical credibility.
Step 2: Calculate the simulation speed based on the previous frame speed data and acceleration.
The difference between the simulation speed and current frame speed are obtained. If the difference has a reasonable range calculated according to historical data, turn to Step 3; otherwise, modify the historical credibility Step 3: If data are delayed, modify the historical credibility; otherwise, turn to Step 4.
Step 4: Fuzzify the data to establish membership functions based on slope parameters, environmental factors, and acceleration information.
Using the proposed algorithm, the feature properties of all data gains are examined. Then, the fuzzy set is segmented and iterated. Finally, a fuzzy decision tree is obtained, as shown in Figure 6. This algorithm uses speed data as an example to model.
The fuzzy rules can be extracted by the established fuzzy decision tree. There are five parts to the fuzzy rules (the degree of confidence of the rules):

Fuzzy Weighted Fusion Submodule
The multiple inputs of speed data need to be weighted and merged to improve the data accuracy by the fuzzy decision submodule. This submodule adopts the feature parameter database and the fuzzy weighted fusion algorithm introduced in the previous section. The fuzzy level can be defined as: If c ≤ 0.5, R i can be defined as: If c > 0.5, R i can be defined as:

Algorithm Analysis and Verification
The functions of the fuzzy decision tree submodule are data fuzzification and data preliminary filtering on multiple inputs of data and adjusting each input's historical credibility based on whether the data meet the input criteria. Currently, the train control system adopts a redundant structure to collect data, so four channels of speed data are collected (A, B, C, D). Figure 7 shows the comparison of Channel A data and data processed by the fuzzy decision tree submodule.  Air pressure data of single input Air pressure data after processing As can be seen from Figures 7 and 8, the noise of air pressure data is preliminarily filtered out. The results show that fuzzy decision tree submodule improves data reliability and data accuracy.
In order to verify the reliability of the model, fault data are randomly injected into the test set of speed data and air pressure data. Then, the test set data are input into the fuzzy decision tree submodule. Figure 9 shows the comparison of the single input of speed data with fault injection and data processed by the fuzzy decision tree submodule.  Air pressure data after processing Figure 10. Single input of air pressure data with fault injection and processed data.
As can be seen from Figures 9 and 10, after submodule processing, no matter the data's noise level, the injected fault data of air pressure data and speed data are eliminated. The results show that the fuzzy decision tree submodule effectively improves the data inconsistency and improves the reliability of the data.
The simulation results of MATLAB show that the accuracy of speed data is increased by 15.72%, and air pressure data is improved by 17.57%. For injection faults, the elimination rate is 100%.
There are four channels of speed input data in the train control system because single input data cannot satisfy the reliability of the system. After data have been processed by the submodule, the fuzzy weighted fusion submodule fuses multiple inputs of data. Figure 11 shows the comparison of multiple inputs of speed data (A-D) and data processed by the fuzzy weighted fusion submodule (E).  As can be seen from Figures 11 and 12, after data filtering and data fusion, accurate data are obtained by multiple inputs of data. The results show that the fuzzy weighted fusion submodule improves the robustness and accuracy of data.
In order to verify the fault tolerance of the fuzzy processing model, one of inputs of speed data and air pressure data is in the state of error to verify whether the fuzzy processing model can output the correct data by other inputs. Figure 13 shows the comparison of multiple inputs of speed data (A-D) and data processed by the fuzzy processing model (E).  Figure 14 shows the comparison of multiple inputs of air pressure data and data processed by the fuzzy processing model. Air pressure data Fault input of air pressure data Air pressure data after processing Figure 14. Air pressure data processing with one error input.
As can be seen from Figures 13 and 14, the error input barely affects the result, which is processed by the fuzzy processing model. The fuzzy processing model can process data correctly with one error input or even multiple error inputs. The results show that the fuzzy processing model has strong robustness and can further improve the accuracy of data.
According to the above analysis, it proves that the fuzzy processing model can effectively reduce the non-strict multi-sensor data problems, improve the data processing capability, as well as the system efficiency. In addition, the fuzzy processing model provides a high level of robustness.
In this section, the speed data and the air pressure data are modeled by the fuzzy processing model. Based on previous analysis and verification, the results show that the non-strict multi-sensor data problems of dynamic data are improved, indicating that the fuzzy processing model can improve the data processing capability and system efficiency. However, the non-strict problem of the safety computer in the train control system is not only limited to these two kinds of multi-sensor data, but also has corresponding problems for multi-sensor data such as bearing temperature and train safety operation condition evaluation. These data play important roles in the train control system. This needs further study. It is worth noting that the method studied in this paper can be applied to other heterogeneous redundant structure data.
There are several important things to mention. Firstly, The method studied in this paper can be applied to other heterogeneous redundant structure data. One kind of data may come from different sensors, and the data of different sensors have different data features. Therefore, it is necessary to collect a large amount of data and establish a database to analyze the data features before they can be used. Secondly, the membership functions of different data are totally different. The membership function plays a decisive role in the simulation, so we may need to adjust the parameter in the data analysis. Then, the sensor fusion data can be obtained by using the weighted fusion formula and the fuzzy attribute matrix.
A large amount of analog data and digital data will be generated in the operation of the train control system; both of them are essential. There are some important digital data problems in the train control system, but the method of fuzzy theory in this paper is only applicable to analog data. Therefore, it is important to make further efforts in that direction.

Conclusions
Traditional safety computers in the train control system only compare multiple input data and operate based on the sameness of data, and all types of data perform the same operation. The architecture is simple, but has non-strict multi-sensor data problems and a low efficiency in performance. In this paper, the non-strict multi-sensor data problems that exist in traditional safety computers are analyzed. The multi-sensor input data are classified into three types based on data features and safety computer features. The feature extraction of train control data is discussed in detail, and data features are selected from three aspects of data features to reflect the operating state of the train control system accurately. The non-strict multi-sensor data problems of the safety computer can only be caused by dynamic data; therefore, this paper proposes a fuzzy processing model that only processes dynamic data in order to improve the data processing capability and system efficiency. Then, an improved onboard safety computer platform that combines a double two-vote-two onboard safety computer with a fuzzy processing model for processing dynamic data is proposed. The fuzzy processing model is divided into two parts: improved fuzzy decision tree and fuzzy weighted fusion. Finally, the fuzzy processing model is verified based on the speed and air pressure data. Verification results indicate that the fuzzy processing model can effectively reduce the non-strict multi-sensor data problems and improve the system's efficiency and robustness on the premise of ensuring the reliability of the input data.
Author Contributions: The authors contributed equally to this work.
Funding: This work is supported by the National Natural Science Foundation of China (U1734211 and U1534208).