Multiple Sensor Fail-Operational Architecture for Electric Vehicle Powertrain Control System

Abstract: With the growing demand to meet specific safety requirements, a new system-level definition of the architecture is required to keep the powertrain operational after faults emerge in some sensors. This work proposes a fail-operational architecture that integrates the battery management and motor control systems, implements heterogeneous sensor signal reconstruction and model-based signal estimation for redundant signal generation, and adopts a random forest for signal arbitration. The proposed architecture reduces the system failure rate and tolerates faults in up to three sensors at any given time without increasing costs. Finally, the proposed architecture is verified by comparing the fault detection performance of three arbitration algorithms on a model-in-the-loop (MIL) platform.


Introduction
With the significant growth of vehicle electrification in the past few years, the complexity of electronic systems and the difficulty of their functional safety design have grown as well. For example, from the first edition of ISO 26262 [1] to the second edition in 2018, the focus shifted from fail-safe systems [2] in general toward stricter hazard-avoidance requirements that call for fail-operational systems. To keep a system operating continuously when a fault occurs, changes to the entire system and the handling of random hardware failures must be considered at a higher system level, such as the powertrain.
The powertrain controllers of most electric vehicles include the vehicle controller unit (VCU), motor controller unit (MCU), and battery management system (BMS), connected via a controller area network (CAN) [3]. Among them, the BMS and MCU generate the vehicle propulsion, taking the role of the engine in a conventional internal combustion powertrain. The main function of the BMS is to monitor and calculate the state of charge (SoC), state of health (SoH), and state of power (SoP) of the battery pack, to equalize the battery cell voltages [4][5][6][7][8], and to ensure battery safety using the DC bus current sensor, the cell voltage sensors, and the temperature sensors [9][10][11]. As Bukhari shows, the MCU performs propulsion control based on the battery status and on the phase current, rotor position, and temperature sensors [12]. Both the BMS and MCU rely heavily on sensor signals, and any sensor fault may cause unexpected behavior in both systems [13]. Effective sensor fault detection and isolation mechanisms are therefore essential.
Common MCU faults include damage to or short circuits of the power switching transistors, and sensor anomalies. The extended Kalman filter (EKF) can be used to compute sensor residuals, effectively preventing system failure caused by a single sensor fault [14,15]. Furthermore, an additional current sensor can be used to reconstruct the phase currents and achieve tolerance of multiple faults in motor controllers [16]. Some research has used a deep long short-term memory (LSTM) network as the residual generator and the random forest (RF) algorithm as the fault classifier to perform early fault diagnosis of wind turbines [17]. The advantage of this method is that the physical model of the system need not be known. In addition, the fast Fourier transform can be used to extract signal features, from which Bayesian networks are constructed to detect inverter faults [18].
Common BMS faults can be classified into internal and external faults [19,20]. Internal faults include changes in battery characteristics caused by overcharge, overdischarge, overheating, thermal runaway, electromagnetic-interference-induced failure [21], and other behaviors, while external faults consist of abnormalities in the cooling system, wiring, or sensors. Many studies have adopted model-based methods to detect sensor faults.
A sliding mode observer based on an electrical design has been used to detect voltage, temperature, and current sensor failures [22]. Coulomb counting and an unscented Kalman filter (UKF) can each estimate the SoC of every battery cell; the residual between the two estimates is then used to locate sensor faults in the battery pack [23]. The recursive least squares method has been used to estimate the equivalent model, with a common change-point algorithm adapted to detect anomalies in the voltage and current sensors [24].
The vast majority of researchers have focused on fault detection for a single sensor and on generating estimated signals from the other, healthy sensors to achieve fault tolerance. However, multiple simultaneous faults are common in vehicles because of common cause failures (CCF). For example, two phase current sensors often share the same power supply, so a power supply failure can make both malfunction at the same time. Maintaining system operation in such situations is therefore a major challenge.
In addition, model-based residual generation can effectively detect sensor faults, but it relies on accurate model parameters to ensure system reliability. Although motor parameters do not change significantly over the product life cycle, manufacturing tolerances make accurate parameters difficult to obtain. Model-based fault detection may therefore produce improper results, which poses a further challenge for model-based diagnosis techniques. In contrast, redundant sensors do not rely on accurate models; however, because of cost, redundant sensors are rarely used in vehicle controllers.
In this study, a new integrated battery management and motor control system (IBMS) is proposed. Without increasing sensor costs, the integrated system lets the sensors of each subsystem generate redundant signals for the other and uses machine learning to arbitrate and reconfigure the output signals. The proposed IBMS meets the requirement for continuous operation, which is verified on an MIL platform. The main contribution of this work is a new powertrain architecture and algorithm that makes the sensors of the BMS and MCU mutually redundant at the system level. As a result, the powertrain can avoid failure when sensors in these two subsystems are faulty.

System Overview
The proposed IBMS architecture is shown in Figure 1, where the virtual BMS and MCU run on a high-performance controller called the integrated battery management and motor control unit (IBMU). Within this architecture, the driver's pedal commands are converted into a torque command by the external VCU and sent to the IBMU over CAN. The battery system consists of several battery packs, each containing several battery modules connected to a battery cell controller IC. Each chip provides measurement signals to the IBMU, and the battery packs are connected in parallel through the power distribution unit. Based on the cell voltages and pack currents, the virtual BMS calculates the SoC and SoP and provides them to the virtual MCU. Finally, the virtual MCU performs closed-loop current control based on the torque command, the phase currents, and the rotor position.
The IBMS makes the key sensors of the BMS and MCU complementary by integrating the two subsystems, so that the system can continue operating when some sensors fail. The integration of heterogeneous signals and machine-learning arbitration allows the system to tolerate multiple signal faults. This is possible because (1) the BMS takes the direct current (DC) bus current signal and (2) the cell voltage signal as inputs to estimate the battery SoC, while (3) the motor controller takes the phase current signal and the DC bus voltage signal as inputs to the field-oriented control (FOC) algorithm. In this setup, the DC bus current and the phase currents can be converted into each other through dedicated sampling and calculation, enabling fail-operational behavior. Table 1 lists the main sensors of the BMS and MCU and the hazards caused by their faults; these sensors are included because their faults directly affect the vehicle.
Figure 2 shows the fault tree analysis of the IBMS architecture. The solid lines represent the failure relationships in the original BMS and MCU systems; before the proposed architecture and algorithm are introduced, any single sensor fault causes unexpected system behavior. The dotted lines show the effect of the redundant signals after integrating the BMS, the MCU, and the proposed algorithm; in particular, the AND gates in the failure paths effectively prevent single points of failure (SPOF) and CCF. Without changing the component failure rates, the new architecture reduces the failure rate by about one order of magnitude. For example, assume that the failure probability of each sensor is 1 × 10⁻³ (P1-P6) and that of the controller is 1 × 10⁻⁴ (P7). In the original architecture, the probability of the FOC module failure event is 2.997 × 10⁻³, whereas in the proposed architecture it is 3.029 × 10⁻⁴ [31]. In this study, two strategies, using heterogeneous sensors and using different algorithms, are adopted to avoid system failures caused by CCF.
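The failure probabilities above follow from standard fault-tree gate arithmetic. The sketch below assumes statistically independent basic events. Treating the original FOC failure event as an OR of three sensor failures is an assumption made here because it reproduces the quoted 2.997 × 10⁻³; the sketch does not reconstruct the full tree of Figure 2 or the 3.029 × 10⁻⁴ result.

```python
"""Fault-tree gate arithmetic for independent basic events (illustrative sketch)."""
from functools import reduce


def or_gate(probabilities):
    """P(at least one event occurs) for independent events: 1 - prod(1 - p_i)."""
    return 1.0 - reduce(lambda acc, p: acc * (1.0 - p), probabilities, 1.0)


def and_gate(probabilities):
    """P(all events occur) for independent events: prod(p_i)."""
    return reduce(lambda acc, p: acc * p, probabilities, 1.0)


P_SENSOR = 1e-3      # failure probability of each sensor (P1-P6), from the text
P_CONTROLLER = 1e-4  # failure probability of the controller (P7), from the text

# Assumption: the original FOC failure event is an OR of three sensor failures.
original = or_gate([P_SENSOR] * 3)

# Hypothetical redundant path: a signal is lost only if the sensor AND its
# independent substitute both fail (one branch, not the paper's full tree).
redundant_pair = and_gate([P_SENSOR, P_SENSOR])

print(f"{original:.4e}")
print(f"{redundant_pair:.1e}")
```

The AND gate is what breaks single points of failure: two independent 10⁻³ events coincide with probability on the order of 10⁻⁶, so the remaining failure probability of the redundant design is dominated by shared elements such as the controller.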
Figure 3 is a block diagram of the IBMU algorithm, which is divided into four blocks: (a) the control signal fusion module, responsible for signal arbitration and redundant signal generation; (b) the motor control module, responsible for running the FOC algorithm; (c) the battery status monitor module, responsible for monitoring battery status indicators such as SoH, SoC, and SoP, for cell balancing, and for relay connection; and (d) the supporting module, responsible for thermal management, communication, signal conversion, I/O control, and emergency measures.

The Proposed Approach
The fault detection process consists of four steps: (1) the supporting module acquires the raw signals and divides them into two groups, where the first group includes the DC bus current, phase voltage, cell voltage, and pack current signals, and the second group includes the phase current, rotor position, DC bus voltage, and phase voltage signals; (2) the first group is used to generate reconstructed signals through signal recombination or sampling; (3) the second group is used to generate estimated signals through a model-based approach; (4) the random forest classifier uses the three sets of signals (raw, reconstructed, and estimated) to identify the faulty sensor.

Motor Control Module
The motor control module uses the FOC algorithm to control the three-phase AC motor. FOC allows separate control of the magnetic flux and the torque. To simplify the motor model, FOC uses the Clarke and Park transformations to convert the three-phase currents into the rotor reference frame, as in Equations (1)-(4), where i_d is the flux-producing current, i_q is the torque-producing current, i_a and i_b are the phase currents, and θ_r is the rotor position.
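Equations (1)-(4) are not reproduced in this text. A minimal sketch of the coordinate conversion is shown below, assuming the common amplitude-invariant Clarke transform with balanced currents (i_a + i_b + i_c = 0) and θ_r taken as the electrical rotor angle; other scaling conventions exist, and the function names are illustrative.

```python
import math


def clarke(i_a: float, i_b: float) -> tuple[float, float]:
    """Amplitude-invariant Clarke transform, assuming i_a + i_b + i_c = 0."""
    i_alpha = i_a
    i_beta = (i_a + 2.0 * i_b) / math.sqrt(3.0)
    return i_alpha, i_beta


def park(i_alpha: float, i_beta: float, theta_r: float) -> tuple[float, float]:
    """Rotate the stationary alpha-beta frame into the rotor d-q frame."""
    c, s = math.cos(theta_r), math.sin(theta_r)
    i_d = c * i_alpha + s * i_beta
    i_q = -s * i_alpha + c * i_beta
    return i_d, i_q


# Balanced currents whose peak is aligned with the d axis.
theta = 0.7        # electrical rotor angle in rad (arbitrary test value)
amplitude = 10.0   # peak phase current in A (arbitrary test value)
i_a = amplitude * math.cos(theta)
i_b = amplitude * math.cos(theta - 2.0 * math.pi / 3.0)
i_d, i_q = park(*clarke(i_a, i_b), theta)
print(i_d, i_q)
```

For these inputs the transform returns i_d equal to the amplitude and i_q close to zero, which is a quick sanity check of the sign conventions before wiring the transform into a current loop.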
After the coordinate conversion, the circuit model of the permanent magnet synchronous motor (PMSM) can be expressed as in Equations (5) and (6), where the back electromotive forces e_d and e_q are related to the generated flux. In a permanent magnet motor, most of the magnetic field is generated by the magnets mounted on the rotor, so the magnet flux is constant. In (7), λ_f is the magnetic flux generated by the magnet, while the stator flux linkages λ_d and λ_q depend on the currents, as shown in Equations (7) and (8). Equations (9) and (10) express the d-q axis back electromotive forces as the product of the rotor speed and the magnetic flux. The PMSM torque equation is given in (11); two PID controllers regulate i_d and i_q to control the motor torque [32].
where v_d and v_q are the d-q axis stator voltages, R_s is the stator resistance, ω_r is the rotor angular speed, and L_d and L_q are the d-q axis inductances.
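As a numerical sketch of the d-q model, the code below uses the standard textbook PMSM relations: λ_d = L_d i_d + λ_f, λ_q = L_q i_q, back electromotive forces e_d = −ω_r λ_q and e_q = ω_r λ_d, and torque T_e = 1.5 p (λ_f i_q + (L_d − L_q) i_d i_q). Whether Equations (5)-(11) use exactly this scaling is an assumption. The pole-pair count p and all parameter values are hypothetical, ω_r is treated as the electrical angular speed, and the voltages are steady-state values (flux derivatives set to zero).

```python
from dataclasses import dataclass


@dataclass
class PmsmParams:
    r_s: float       # stator resistance R_s [ohm]
    l_d: float       # d-axis inductance L_d [H]
    l_q: float       # q-axis inductance L_q [H]
    lambda_f: float  # permanent-magnet flux linkage [Wb]
    pole_pairs: int  # hypothetical: not specified in the text


def flux_linkages(p: PmsmParams, i_d: float, i_q: float) -> tuple[float, float]:
    """Stator flux linkages: lambda_d = L_d i_d + lambda_f, lambda_q = L_q i_q."""
    return p.l_d * i_d + p.lambda_f, p.l_q * i_q


def back_emf(p: PmsmParams, i_d: float, i_q: float, omega_r: float) -> tuple[float, float]:
    """Back EMFs: rotor (electrical) speed times flux linkage."""
    lambda_d, lambda_q = flux_linkages(p, i_d, i_q)
    return -omega_r * lambda_q, omega_r * lambda_d


def steady_state_voltages(p: PmsmParams, i_d: float, i_q: float, omega_r: float) -> tuple[float, float]:
    """v_d, v_q with d(lambda)/dt = 0: resistive drop plus back EMF."""
    e_d, e_q = back_emf(p, i_d, i_q, omega_r)
    return p.r_s * i_d + e_d, p.r_s * i_q + e_q


def torque(p: PmsmParams, i_d: float, i_q: float) -> float:
    """Magnet torque plus reluctance torque (amplitude-invariant scaling)."""
    return 1.5 * p.pole_pairs * (p.lambda_f * i_q + (p.l_d - p.l_q) * i_d * i_q)


params = PmsmParams(r_s=0.05, l_d=2e-4, l_q=3e-4, lambda_f=0.08, pole_pairs=4)
print(torque(params, -20.0, 100.0))
print(steady_state_voltages(params, -20.0, 100.0, 500.0))
```

Because L_q > L_d in this hypothetical machine, a negative i_d adds reluctance torque, which is why the two PID loops regulate i_d and i_q separately.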

Battery Status Monitor Module
This study mainly focuses on the impact of sensor faults on SoC and SoP: SoC is used to estimate the remaining driving range, and SoP directly affects the power output. SoC estimation methods can be divided into eight types [9], of which Coulomb counting and electrical-model-based estimation are relatively mature and widely used. The most common SoC estimation method is Coulomb counting, given in Equation (12), where SoC(t_0) is the initial SoC, C_Q is the rated capacity, I_dc is the DC bus current, and I_loss represents the chemical loss of the battery itself.
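Equation (12) can be discretized for a controller running at a fixed sample period. The sketch below assumes that discharge current is positive, that I_loss adds to the discharge current as a constant, and that rectangular (forward-Euler) integration is used; these are assumptions, not details given in the text. The clamp to [0, 1] is an added safeguard, and the function name is illustrative.

```python
def coulomb_count(soc0, currents_a, dt_s, capacity_ah, i_loss_a=0.0):
    """Assumed discrete form of Eq. (12): SoC_k = SoC_{k-1} - (I_dc + I_loss) * dt / C_Q."""
    capacity_as = capacity_ah * 3600.0  # convert ampere-hours to ampere-seconds
    soc = soc0
    trace = []
    for i_dc in currents_a:
        soc -= (i_dc + i_loss_a) * dt_s / capacity_as
        soc = min(max(soc, 0.0), 1.0)  # added safeguard: keep SoC within [0, 1]
        trace.append(soc)
    return trace


# One hour at a constant 50 A discharge from a 100 Ah pack starting at 80 % SoC.
trace = coulomb_count(0.8, [50.0] * 3600, 1.0, 100.0)
print(trace[-1])
```

Any constant offset in the measured current integrates into a linearly growing SoC error, which is why a faulty DC bus current sensor is so damaging to this method.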
By integrating the measured output current, the charge drawn from the battery, and hence the SoC, can be tracked. This method requires accurate measurement of the battery current, a known initial SoC, and integration of the charging and discharging currents over time. The initial SoC can be obtained from the battery usage history or by the open circuit voltage (OCV) method. Another common SoC estimation method relies on the EKF [33,34], whose state models are shown in Equations (13) and (14), where w is the system noise and v is the measurement noise; these two noise vectors are zero-mean and mutually uncorrelated.
The battery equivalent circuit model shown in Figure 4 [35][36][37] is used to estimate the cell state, with SoC as one of the internal states. The accuracy of the EKF-based SoC algorithm depends on this equivalent circuit model and its parameters, including the internal resistance R_0, the diffusion resistances R_n, and the diffusion capacitances C_n.
Higher-order models achieve a better fit but require more computing resources. This research adopts three RC pairs to build the model (15), where x is the state vector, consisting of the SoC and the voltages of the RC pairs. The input vector u is the DC bus current due to charging or discharging, and the output vector y is the cell voltage, obtained by adding the OCV and the voltages of the RC pairs. The behavior model of the battery is described in Equation (16).
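The three-RC model can be sketched as a discrete-time state update. This version assumes discharge current is positive (so the RC voltages and the R_0 drop are subtracted from the OCV), uses the exact zero-order-hold solution for each RC pair, and replaces the measured OCV-SoC curve with a hypothetical linear function; the class name and all parameter values are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class CellModel3RC:
    """Hypothetical three-RC equivalent circuit; state x = [SoC, V1, V2, V3]."""
    capacity_as: float               # rated capacity C_Q in ampere-seconds
    r0: float                        # internal resistance R_0 [ohm]
    r: tuple[float, float, float]    # diffusion resistances R_1..R_3 [ohm]
    c: tuple[float, float, float]    # diffusion capacitances C_1..C_3 [F]

    def ocv(self, soc: float) -> float:
        # Hypothetical linear OCV curve; a real model interpolates a measured OCV-SoC table.
        return 3.0 + 1.2 * soc

    def step(self, x: list[float], current: float, ts: float) -> list[float]:
        """Advance the state by one sample period ts (discharge current positive)."""
        soc, *v_rc = x
        soc_next = soc - ts * current / self.capacity_as
        v_next = []
        for v, r_n, c_n in zip(v_rc, self.r, self.c):
            a = math.exp(-ts / (r_n * c_n))  # exact decay over one zero-order-hold period
            v_next.append(a * v + r_n * (1.0 - a) * current)
        return [soc_next, *v_next]

    def output(self, x: list[float], current: float) -> float:
        """Terminal voltage: OCV(SoC) - R_0 * I - (V1 + V2 + V3)."""
        soc, *v_rc = x
        return self.ocv(soc) - self.r0 * current - sum(v_rc)


cell = CellModel3RC(360000.0, 0.015, (0.01, 0.02, 0.03), (1000.0, 2000.0, 3000.0))
x = [0.5, 0.0, 0.0, 0.0]
for _ in range(5000):
    x = cell.step(x, 10.0, 1.0)
print(x, cell.output(x, 10.0))
```

Under a constant current, each RC voltage settles to R_n·I with time constant R_n·C_n, which is a convenient check of the discretization.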
The EKF operation consists of two steps: prediction and correction. The prediction step calculates the predicted state x̂_k⁻ and the predicted covariance matrix P_k⁻, as shown in Equations (17) and (18), where T_s is the sampling period of the EKF and F is the system Jacobian matrix. The correction step updates the predicted results: it first calculates the Kalman gain K_k using Equation (19), then corrects x̂_k⁻ from (17) using (20) and updates the covariance matrix P_k⁻ from (18) using (21), where H is the measurement Jacobian matrix.
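The prediction and correction steps (17)-(21) map directly onto a few matrix operations. The sketch below is a generic EKF step written with NumPy. To keep the demonstration checkable, it collapses the battery model to a single SoC state with a hypothetical linear OCV curve and an R_0 term, rather than the three-RC state vector used in this study; T_s enters through the state function f, and the noise covariances Q and R are illustrative tuning values.

```python
import numpy as np


def ekf_step(x, P, u, y, f, h, f_jac, h_jac, Q, R):
    """One EKF iteration: prediction (17)-(18) followed by correction (19)-(21)."""
    # Prediction: propagate the state and covariance through the model.
    x_pred = f(x, u)
    F = f_jac(x, u)
    P_pred = F @ P @ F.T + Q
    # Correction: weigh the measurement residual by the Kalman gain.
    H = h_jac(x_pred, u)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y - h(x_pred, u))
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new


# Hypothetical single-state cell model: x = [SoC], u = current (discharge > 0).
TS, CAPACITY_AS, R0 = 1.0, 360000.0, 0.015
OCV_OFFSET, OCV_SLOPE = 3.0, 1.2  # hypothetical linear OCV(SoC) = 3.0 + 1.2 * SoC


def f(x, u):
    return np.array([x[0] - TS * u / CAPACITY_AS])


def h(x, u):
    return np.array([OCV_OFFSET + OCV_SLOPE * x[0] - R0 * u])


def f_jac(x, u):
    return np.array([[1.0]])


def h_jac(x, u):
    return np.array([[OCV_SLOPE]])


# The true cell rests at SoC = 0.6; the filter starts from a poor guess of 0.2.
measured_voltage = np.array([OCV_OFFSET + OCV_SLOPE * 0.6])
x_est, P_est = np.array([0.2]), np.array([[0.1]])
Q, R = np.array([[1e-7]]), np.array([[1e-4]])
for _ in range(50):
    x_est, P_est = ekf_step(x_est, P_est, 0.0, measured_voltage, f, h, f_jac, h_jac, Q, R)
print(x_est)
```

The voltage residual corrects a wrong initial SoC within a few iterations, which is the advantage over pure Coulomb counting noted in the next paragraph.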
With the DC bus current sensor, the Coulomb counting method can easily estimate the battery SoC, but accumulated integration errors and different battery usage conditions may lead to significant errors. By measuring the cell voltage and the DC bus current, the EKF-based SoC algorithm can promptly correct errors in the initial value and suppress noise. In this study, both methods are implemented, and the system switches between them according to the sensor condition. As another battery status indicator, SoP has a more direct impact on the system: it evaluates the output power available from the battery under safe conditions [8]. The torque output of the powertrain is adjusted according to this value, and an incorrect SoP can cause unexpected acceleration or deceleration of the vehicle. The most common case is that the motor driver draws more power than is available, which sharply drops the battery voltage and triggers the low-voltage protection mechanism, interrupting propulsion. The calculation of SoP is closely related to the degree of battery aging, the ambient temperature, and the usage conditions. A characteristic lookup table is the most widely used estimation method. In this study, SoP is built as a three-dimensional table, from which the output current limit is calculated based on the battery temperature, SoC, and SoH, as shown in Figure 5.
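The three-dimensional SoP table can be queried with trilinear interpolation. The sketch below assumes a table of current limits indexed by temperature, SoC, and SoH on ascending grids, clamps queries to the grid edges, and uses only the standard library; the grid points and table values are hypothetical (a simple linear function chosen so the result is easy to check), whereas a real table would come from battery characterization.

```python
import bisect


def _bracket(grid, value):
    """Lower grid index and interpolation weight for value on an ascending grid (clamped)."""
    value = min(max(value, grid[0]), grid[-1])
    i = min(bisect.bisect_right(grid, value) - 1, len(grid) - 2)
    return i, (value - grid[i]) / (grid[i + 1] - grid[i])


def sop_current_limit(table, temps, socs, sohs, temp, soc, soh):
    """Trilinear interpolation of a current-limit table indexed table[temp][soc][soh]."""
    i, wi = _bracket(temps, temp)
    j, wj = _bracket(socs, soc)
    k, wk = _bracket(sohs, soh)
    limit = 0.0
    # Weighted sum over the 8 corners of the enclosing grid cell.
    for di, fi in ((0, 1.0 - wi), (1, wi)):
        for dj, fj in ((0, 1.0 - wj), (1, wj)):
            for dk, fk in ((0, 1.0 - wk), (1, wk)):
                limit += fi * fj * fk * table[i + di][j + dj][k + dk]
    return limit


# Hypothetical grid and table: limit [A] = 2*T + 100*SoC + 50*SoH (for checking only).
TEMPS = [0.0, 25.0, 45.0]
SOCS = [0.0, 0.5, 1.0]
SOHS = [0.8, 1.0]
TABLE = [[[2 * t + 100 * s + 50 * h for h in SOHS] for s in SOCS] for t in TEMPS]
print(sop_current_limit(TABLE, TEMPS, SOCS, SOHS, 10.0, 0.3, 0.9))
```

Clamping at the grid edges keeps the limit bounded when a temperature reading leaves the characterized range, although a faulty temperature sensor would still produce a plausible but wrong limit, which is why its health matters to the arbitration.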

Control Signal Fusion Module
The control signal fusion module of the fault-tolerant system relies on heterogeneous signal reconstruction and signal estimation to build a fusion algorithm that can detect and isolate multiple faults in the IBMS. As shown in Figure 6, the module includes three sub-blocks: (a) signal estimation, (b) signal reconstruction, and (c) signal arbitration.
The signal estimation block uses the phase current signal and the extended Kalman filter to predict the phase current and rotor speed signals. The signal reconstruction block rebuilds the phase current signal from the DC bus current sensor and the pulse width modulation (PWM) status, and calculates the rotor speed from the line voltage sensor and the back electromotive force. The DC bus voltage can be obtained by summing the cell voltages, while the DC bus current can be obtained by summing the battery pack currents or reconstructed from the phase currents and the PWM status.
Although both blocks can output current, voltage, and rotor speed signals, the signal reconstruction block converts real measurement signals into the target quantities through calculation, whereas the estimation block uses the motor and battery models to predict them. Together, these two mechanisms keep the system operating normally even under CCF. The signal arbitration block makes the final decision: a random forest model trained on a large dataset identifies the likely failure sources, and a finite state machine then reconfigures the control signals.
In the reconstruction block, the phase currents are rebuilt from the DC bus current [38][39][40]; these studies showed how to use a single current sensor to rebuild the phase currents. Table 2 lists the relationship between the DC bus current and the AC phase currents for each PWM switching state.
When the PWM applies the zero vectors V0 or V7, no current flows on the DC bus (I_dc = 0). When it applies vector V1, the DC bus current flows into S_a and out through S_b and S_c, so the measured DC bus current equals the phase A current, as shown by the red dotted line in Figure 7a. When it applies vector V2, the measured DC bus current equals the negative phase C current, as shown by the green dotted line in Figure 7b. The phase currents can be reconstructed by applying this logic to every switching state.
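The switching-state logic of Table 2 can be sketched as follows. The code assumes the conventional space-vector numbering V1 = (1,0,0), V2 = (1,1,0), V3 = (0,1,0), V4 = (0,1,1), V5 = (0,0,1), V6 = (1,0,1), where each tuple gives the upper-switch states of phases a, b, and c, and phase currents are taken as positive into the motor; this agrees with the V1 and V2 cases described above. It ignores dead time and the minimum active-vector duration needed for a valid sample, which practical single-current-sensor schemes must handle; the function names are illustrative.

```python
# Upper-switch states (S_a, S_b, S_c) -> (index of the measured phase, sign),
# assuming conventional SVPWM numbering and currents positive into the motor.
ACTIVE_VECTORS = {
    (1, 0, 0): (0, +1),  # V1: I_dc = +i_a
    (1, 1, 0): (2, -1),  # V2: I_dc = -i_c
    (0, 1, 0): (1, +1),  # V3: I_dc = +i_b
    (0, 1, 1): (0, -1),  # V4: I_dc = -i_a
    (0, 0, 1): (2, +1),  # V5: I_dc = +i_c
    (1, 0, 1): (1, -1),  # V6: I_dc = -i_b
}


def dc_bus_current(switch_state, phase_currents):
    """Ideal DC bus current: sum of the phase currents whose upper switch conducts."""
    return sum(s * i for s, i in zip(switch_state, phase_currents))


def reconstruct_phase_currents(samples):
    """Rebuild (i_a, i_b, i_c) from (switch_state, i_dc) samples of one PWM period."""
    currents = [None, None, None]
    for state, i_dc in samples:
        if state not in ACTIVE_VECTORS:
            continue  # V0 (0,0,0) and V7 (1,1,1): no DC bus current, no information
        phase, sign = ACTIVE_VECTORS[state]
        currents[phase] = sign * i_dc
    missing = [k for k, value in enumerate(currents) if value is None]
    if len(missing) != 1:
        raise ValueError("need two active vectors that measure different phases")
    # Balanced three-wire system: i_a + i_b + i_c = 0 gives the third phase.
    currents[missing[0]] = -sum(value for value in currents if value is not None)
    return tuple(currents)


true_currents = (5.0, -2.0, -3.0)
samples = [(state, dc_bus_current(state, true_currents))
           for state in [(0, 0, 0), (1, 0, 0), (1, 1, 0)]]  # V0, V1, V2
print(reconstruct_phase_currents(samples))
```

Two adjacent active vectors always measure two different phases, so one PWM period suffices; the third current follows from Kirchhoff's current law.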

Estimation Block
The estimation block consists of three subunits, the DC voltage estimation unit, the DC current estimation unit, and the phase current and rotor speed estimation unit, which produce the estimated control signals shown in Figure 8.
The DC voltage estimation unit uses the torque and motor speed to calculate the overall system power, as shown in (24), where T_e is given by the torque equation (11). The DC voltage is then obtained from (25), where η_sys is the system efficiency of the controller and motor. Because η_sys varies with speed and torque, an efficiency table must be built before this method can be used.
The DC current estimation unit assumes that the input power V_dc I_dc multiplied by the system efficiency η_sys gives the output power, as shown in (26), where V_pha-a, V_pha-b, and V_pha-c are the root-mean-square values of the three phase voltages. The PWM amplitudes can be expressed as in (27); substituting them into (26) yields the DC bus current I_dc given in (30).
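Because the intermediate expressions after (27) are not shown here, the sketch below substitutes the instantaneous three-phase power p = v_a i_a + v_b i_b + v_c i_c for the RMS/PWM-amplitude formulation of (26) to (30) and solves η_sys V_dc I_dc = p for I_dc, assuming motoring operation. For balanced sinusoidal phases this power equals 3 V_rms I_rms cos φ, which connects it to the RMS quantities used in (26). The function names are hypothetical.

```python
import math


def estimate_idc(v_phase, i_phase, v_dc, eta_sys):
    """Estimate I_dc from eta_sys * V_dc * I_dc = p_out (motoring assumed).

    v_phase, i_phase: instantaneous (a, b, c) phase voltages [V] and currents [A].
    """
    if v_dc <= 0:
        raise ValueError("V_dc must be positive")
    p_out = sum(v * i for v, i in zip(v_phase, i_phase))  # instantaneous three-phase power
    return p_out / (eta_sys * v_dc)


def balanced_three_phase_power(v_rms, i_rms, power_factor):
    """Average power of a balanced three-phase load: 3 * V_rms * I_rms * cos(phi)."""
    return 3.0 * v_rms * i_rms * power_factor
```

For a balanced sinusoidal set, `estimate_idc` gives the same result at every instant, because the instantaneous three-phase power is constant and equal to `balanced_three_phase_power(v_rms, i_rms, math.cos(phi))`.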
In the phase current and rotor speed estimation unit, the EKF algorithm is used to generate the estimated phase currents and rotor speed. In Equation (31), x is the state vector, which comprises the d-q axis magnetic flux, the rotor speed, and the rotor angle; the input vector is the voltage, and the output is the phase current. The state transition function f, the measurement function h, the system Jacobian matrix F, and the measurement Jacobian matrix H are given in Equations (32)-(35), respectively. The EKF estimates the optimal value of the state x, from which the phase current is further estimated based on Equation (33).
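Since Equations (31) to (35) are not reproduced here, the sketch below is a generic discrete-time EKF step in which f, h and their Jacobians F and H are passed in as functions; for the PMSM case they would encode the flux, speed, and angle dynamics of (32) to (35). The function name and argument order are hypothetical, and the scalar usage below only illustrates the call pattern.

```python
import numpy as np


def ekf_step(x, P, u, z, f, h, F_jac, H_jac, Q, R):
    """One discrete-time extended Kalman filter cycle.

    f(x, u): state transition; h(x): measurement model;
    F_jac(x, u), H_jac(x): their Jacobians; Q, R: process and measurement noise covariances.
    """
    # Predict: propagate the state through f and the covariance through the Jacobian F.
    F = F_jac(x, u)
    x_pred = f(x, u)
    P_pred = F @ P @ F.T + Q
    # Update: correct the prediction with the measurement z (e.g. the phase currents).
    H = H_jac(x_pred)
    innovation = z - h(x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ innovation
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

With a constant scalar state (f(x) = x, h(x) = x), prior x = 0 with unit variance, no process noise, unit measurement noise and a measurement of 2, one step returns x = 1 and P = 0.5, as the linear Kalman filter would.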

Signal Arbitration
After signal reconstruction and signal estimation, multiple sets of reference signals are available. The random forest (RF) classifier, which we use for arbitration, combines decision trees, bootstrap aggregation (bagging), and random feature selection. Unlike a single traditional decision tree, the random forest uses bootstrap resampling to draw several training sets from the original data and builds a decision tree on each of them.
To assess the required computing resources and performance, the RF method [41] is compared with two other machine learning algorithms: the decision tree and the support vector machine (SVM). Because the classified content is a time series, time-domain features are also extracted when generating the training dataset: in addition to the raw data, the coefficient of variation, mean, and peak-to-peak value of the original, reconstructed, and estimated signals, together with their residuals, are included.
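To make the bagging and random-feature ideas concrete, the pure-Python sketch below builds a toy forest of depth-one trees (decision stumps), each trained on a bootstrap sample drawn with replacement and restricted to a random subset of features, and arbitrates by majority vote. It is a simplified illustration, not the classifier configuration used in this work; full random forests grow deeper trees and resample features at every split. All names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def fit_stump(X, y, features):
    """Return a one-split classifier that minimizes the weighted Gini impurity."""
    majority = Counter(y).most_common(1)[0][0]
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                left_lab = Counter(left).most_common(1)[0][0] if left else majority
                right_lab = Counter(right).most_common(1)[0][0] if right else majority
                best = (score, f, thr, left_lab, right_lab)
    _, f, thr, left_lab, right_lab = best
    return lambda row: left_lab if row[f] <= thr else right_lab


def fit_forest(X, y, n_trees=20, n_features=None, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    d = len(X[0])
    k = n_features or d
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # rows drawn with replacement
        feats = rng.sample(range(d), k)  # random feature subset for this tree
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees


def predict(trees, row):
    """Arbitrate by majority vote over the trees."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]
```

With `n_features=None` every stump sees all features, which reduces the method to plain bagging; a smaller `n_features` adds the random-feature step.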
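The exact feature layout is not specified here, so the sketch below is one plausible arrangement, stated as an assumption: the coefficient of variation, mean, and peak-to-peak value are computed for each window of the raw, reconstructed, and estimated signals, and the mean and peak-to-peak value for the residuals of the raw signal against the two redundant signals. The coefficient of variation is omitted for residuals because their mean is often near zero, which makes the ratio unstable. Function and key names are hypothetical.

```python
import statistics


def window_features(signal):
    """Coefficient of variation, mean and peak-to-peak value of one sample window."""
    mean = statistics.fmean(signal)
    cv = statistics.pstdev(signal) / abs(mean) if mean else float("nan")
    return {"cv": cv, "mean": mean, "p2p": max(signal) - min(signal)}


def feature_vector(raw, reconstructed, estimated):
    """Features of the three redundant signals plus their residuals (assumed layout)."""
    feats = {}
    for name, sig in (("raw", raw), ("rec", reconstructed), ("est", estimated)):
        for key, value in window_features(sig).items():
            feats[f"{name}_{key}"] = value
    for name, other in (("rec", reconstructed), ("est", estimated)):
        residual = [a - b for a, b in zip(raw, other)]
        feats[f"res_{name}_mean"] = statistics.fmean(residual)
        feats[f"res_{name}_p2p"] = max(residual) - min(residual)
    return feats
```

Each call yields 13 values per sensor window, which would be concatenated with the raw samples before training.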

Evaluation of the Proposed Approach
To quickly train the classification model and test different fault scenarios, this work uses an MIL system to verify the proposed architecture and algorithm [42]. Figure 9 shows the construction of the MIL system, which comprises offline modeling and real-time in-the-loop simulation. A sufficiently accurate model is essential for MIL verification to approximate real-world verification. In this study, the 120155250N battery manufactured by AMITA and a 50 kW PMSM were selected as modeling targets. The battery model follows the approach proposed by Huria [43], and the motor flux model can be fully established by measuring the three-phase voltages and currents [44]. After the offline parameters are identified, the battery and motor plant models are placed in the real-time simulator, and the IBMU algorithm proposed in this work runs on a virtual microcontroller. The high-fidelity physical plant model and the virtual microcontroller are connected through an internal communication protocol or shared memory. Because all sensor signals are generated by the real-time simulator and plant model, sensor faults can easily be injected into the MIL system, in which the quick emulator (QEMU) simulates the virtual microcontroller with the Central Processing Unit (CPU) frequency set to 1000 MHz.
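Because every sensor signal comes from the plant model, a fault can be injected by post-processing the simulated signal. The sketch below shows hypothetical fault models for the four fault types used for labeling in this work (no signal, bias, stuck, intermittent). Modeling "no signal" and intermittent dropouts as zero readings is an assumption; a real test bench might use a saturated or out-of-range value instead.

```python
import random

FAULT_KINDS = ("none", "no_signal", "bias", "stuck", "intermittent")


def inject_fault(signal, kind, start, bias=0.0, drop_prob=0.5, seed=0):
    """Return a copy of `signal` with a hypothetical sensor fault applied from sample `start` on."""
    if kind not in FAULT_KINDS:
        raise ValueError(f"unknown fault kind: {kind}")
    rng = random.Random(seed)
    out = list(signal)
    frozen = signal[max(start - 1, 0)] if signal else 0.0  # last good value for "stuck"
    for k in range(start, len(out)):
        if kind == "no_signal":
            out[k] = 0.0  # sensor output lost
        elif kind == "bias":
            out[k] = signal[k] + bias  # constant offset
        elif kind == "stuck":
            out[k] = frozen  # output frozen at the last good value
        elif kind == "intermittent" and rng.random() < drop_prob:
            out[k] = 0.0  # sporadic dropout
    return out
```

Sweeping `kind`, `start`, and `bias` over a simulation run is one way to produce labeled scenarios like those in Table 3.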

Model Training and Deployment
The motor and battery have different response bandwidths, so their fault diagnoses are executed at different rates. To reduce the controller load, the fault classification model is divided into two parts. The main classification model has a total of 75 input signals: 5 measured signals, 11 estimated or reconstructed signals, 11 residual values, and 48 extracted features. All features are first normalized to the range 0 to 2, and each sensor status is encoded as one of five classes: normal, no signal, bias, stuck, and intermittent fault.
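A minimal sketch of the preprocessing described above, assuming per-feature min-max scaling onto the stated range of 0 to 2 and integer labels for the five sensor states. The scaler is fit on training data only; the class order and all names are hypothetical.

```python
FAULT_CLASSES = ("normal", "no_signal", "bias", "stuck", "intermittent")
N_INPUTS = 5 + 11 + 11 + 48  # measured + estimated/reconstructed + residuals + features = 75


def fit_minmax(columns, lo=0.0, hi=2.0):
    """Return a function mapping each feature column linearly onto [lo, hi]."""
    bounds = [(min(col), max(col)) for col in columns]

    def scale(row):
        out = []
        for x, (mn, mx) in zip(row, bounds):
            span = mx - mn
            # A constant training column carries no information; map it to lo.
            out.append(lo if span == 0 else lo + (hi - lo) * (x - mn) / span)
        return out

    return scale


def encode_status(status):
    """Integer class label for a sensor status."""
    return FAULT_CLASSES.index(status)
```

Values seen at run time outside the training range map outside [0, 2]; whether to clip them is a design choice the text does not specify.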
The proposed architecture is simulated in Simulink, and the sensor states are varied to generate the test and labeled data sets listed in Table 3. For each scenario, 20,000 samples are collected and used to train three classification models: a decision tree, a random forest, and an SVM. The decision tree uses all features, with the Gini index as the split criterion and the maximum number of splits limited to 60. The random forest is trained with 20 classification trees and all features as the upper limits, while the SVM uses a linear kernel to train the model and classify faults. The secondary classification model locates the malfunctioning integrated circuit (IC). Cell voltages are usually measured by a dedicated battery cell controller IC; for example, a single MC33771B manufactured by NXP Semiconductors can measure the voltages of up to 14 cells. Two 84S3P battery packs are used in this study, so the IBMU needs at least 12 ICs (6 per pack) to measure all cell voltages. The main model confirms the IC state, while the secondary model performs fault detection across the 12 ICs.
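The three model configurations above can be approximated in scikit-learn as sketched below. The mapping is an assumption: a limit of 60 splits on a binary tree corresponds to at most 60 internal nodes, i.e. `max_leaf_nodes=61`; the phrase "all features as the upper limits" for the random forest is read as the pool from which per-split features are sampled, so `max_features="sqrt"` is used; and the data come from `make_classification` as a synthetic stand-in with 75 inputs and 5 sensor-status classes, not the Simulink dataset of Table 3.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_classifiers(seed=0):
    """Return the three arbitration models keyed by name (assumed hyperparameter mapping)."""
    return {
        # Gini criterion; at most 60 splits = 60 internal nodes = 61 leaves in a binary tree.
        "decision_tree": DecisionTreeClassifier(criterion="gini", max_leaf_nodes=61,
                                                random_state=seed),
        # 20 trees; each split considers a random subset of sqrt(n_features) features.
        "random_forest": RandomForestClassifier(n_estimators=20, max_features="sqrt",
                                                random_state=seed),
        # Linear-kernel support vector machine.
        "svm": SVC(kernel="linear"),
    }


def evaluate(seed=0):
    """Fit each model on synthetic stand-in data (75 inputs, 5 classes); return test accuracy."""
    X, y = make_classification(n_samples=1000, n_features=75, n_informative=15,
                               n_classes=5, random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=seed)
    scores = {}
    for name, model in build_classifiers(seed).items():
        model.fit(X_tr, y_tr)
        scores[name] = model.score(X_te, y_te)
    return scores
```

The accuracies returned by `evaluate()` reflect only the synthetic data and say nothing about the results in Table 5.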

Measure for Implementation
The architecture proposed in this study includes the machine-learning-based signal fusion algorithm, the motor control algorithm, the battery status monitoring algorithm, and other logic operations and drivers with lower computational load. In the past, limited chip performance meant that few chips could both meet functional safety requirements and simultaneously control the motor and estimate the battery status. According to related experiments, the single-core TMS570 ARM Cortex-R4 chip manufactured by Texas Instruments, rated at 160 Dhrystone million instructions per second (DMIPS), needs roughly 60 µs of computing time to complete the MCU control without specific hardware acceleration. In a control system with a 10 kHz PWM frequency, the CPU load reached nearly 70%, leaving no capacity to integrate other algorithms. In recent years, however, the architecture of vehicle electronic systems has gradually evolved from a central gateway architecture to a domain controller architecture, and high-performance chips suitable for power systems have begun to appear on the market, such as NXP's S32P series, which offers a computing power of 6000 DMIPS. The emergence of domain controllers therefore makes the proposed architecture feasible. Figure 10 shows the algorithm flow chart of the IBMU, and Figure 11 shows the execution timing of each task. The highest-priority interrupt executes every 100 µs and triggers task T2, which calculates the FOC algorithm and outputs the results to the hardware circuit in a timely manner. The second-highest-priority interrupt executes every 200 µs and triggers task T1, which generates the redundant signals.
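The load figures quoted in this section follow from the periodic-task utilization U = Σ C_i / T_i, where C_i is the execution time and T_i the period of task i. The small sketch below, with hypothetical function names, reproduces two of them: the FOC computation alone on the TMS570 (60 µs every 100 µs at 10 kHz PWM) accounts for 60% of the CPU, with the reported total of nearly 70% presumably including other work; and the 233% load reported for the 2500 DMIPS simulated core corresponds to about 2.33 × 2500 ≈ 5825 DMIPS, which is why at least 6000 DMIPS is required.

```python
def cpu_load(tasks):
    """Total utilization of periodic tasks given as (execution_time, period) in the same unit."""
    return sum(c / t for c, t in tasks)


def required_dmips(load_on_core, core_dmips):
    """Computing power needed to run a workload measured as a load fraction on a reference core."""
    return load_on_core * core_dmips


foc_only = cpu_load([(60.0, 100.0)])  # TMS570: 60 us of FOC every 100 us -> 0.6
needed = required_dmips(2.33, 2500.0)  # 233 % of a 2500 DMIPS core -> about 5825 DMIPS
```

A load above 1.0 simply means the workload does not fit on the reference core; it is then rescaled to the computing power a faster chip must provide.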
Figure 11. Task execution chart.
Tasks T3 and T4 perform SoC estimation every 10 ms and 100 ms, respectively. Because each EKF estimation can only process one cell, a full SoC estimation cycle actually takes 840 ms, and SoP can only be computed after all cells have been processed. Task T5 runs every 20 ms and is mainly responsible for the main fault classification, invoking the fault handler when a fault is detected. Task T6 performs the second-level classification, whose goal is to identify the faulty measurement IC. Table 4 summarizes the utilization of each task. This research mainly uses the QEMU ARM Cortex-A9 1000 MHz simulator provided by MathWorks for testing, and the computing power of this simulated chip is about 2500 DMIPS per core. The overall CPU load is around 233%, which means that the CPU needs at least 6000 DMIPS of computing power to execute all tasks.
Figure 12 shows the response when all sensors are healthy, which serves as the baseline for the other experiments. In this simulation, the motor continuously outputs 100 Nm of torque as a load and continuously discharges the battery for 10 s. The reference SoC and torque in Figure 12a are consistent with the estimated results, and the raw, reconstructed, and estimated sensor data in Figures 12b-d are all consistent, with residuals within the tolerance range. This means that both the estimated and the reconstructed signals can replace the raw sensor signals. The impact of each sensor fault and the time required to restore the system are described below.
Figure 13 shows the system response after the phase A current sensor signal is lost. Because the motor control task runs at a much higher frequency than the fault detection task, the system temporarily loses control after the fault is injected at 0.05 s, causing unstable torque output. When the fault is diagnosed at 0.076 s, the FOC input is switched from the raw current sensor signal to the reconstructed sensor signal, and the system returns to stability at 0.085 s. In this research, the diagnosis task is executed periodically, and the user can set a phase current threshold as the condition for triggering it.
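The delay between fault injection (0.05 s) and diagnosis (0.076 s) arises because the fast FOC task keeps reading whatever source is active until the slower diagnosis task changes it. A hypothetical selector that captures this split might look as follows: the control task calls `select()` every cycle, and the diagnosis task calls `report_fault()` once the classifier flags a source. The fallback order (reconstructed before estimated) mirrors the phase A example, but it is an assumption rather than a rule stated in this work.

```python
from enum import Enum


class Source(Enum):
    RAW = "raw"
    RECONSTRUCTED = "reconstructed"
    ESTIMATED = "estimated"


class SignalSelector:
    """Choose the control-loop input for one measured quantity (hypothetical helper)."""

    def __init__(self, fallback_order=(Source.RECONSTRUCTED, Source.ESTIMATED)):
        self.fallback_order = fallback_order
        self.faulty = set()

    def report_fault(self, source):
        """Called by the (slower) diagnosis task when a source is classified as faulty."""
        self.faulty.add(source)

    def active_source(self):
        """First healthy source: the raw sensor, then the fallbacks in order."""
        for source in (Source.RAW, *self.fallback_order):
            if source not in self.faulty:
                return source
        raise RuntimeError("no healthy source left for this signal")

    def select(self, samples):
        """Called every control cycle; samples maps Source -> latest value of that signal."""
        return samples[self.active_source()]
```

Until `report_fault(Source.RAW)` is called, `select()` keeps returning the lost raw reading, which corresponds to the unstable interval before 0.076 s in Figure 13.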

DC Bus Voltage Sensor Fault
Because the motor torque output is determined by the current, a voltage sensor fault does not directly affect the power output in most cases. However, to suppress DC bus voltage ripple, the controller takes the d-q axis commands and compensates them using the voltage signal, so a faulty voltage sensor causes the output torque to deviate from the torque command. As shown in Figure 14, when a fault is injected into the voltage sensor signal at 0.05 s, the motor output falls below the expected value because of the voltage signal drift. In this case, the reconstructed DC bus voltage is obtained by summing the cell voltages, so it is not affected by the drift of the raw voltage sensor signal. The estimated DC bus voltage, by contrast, is calculated using the torque estimate of Equation (11), so the change in output torque indirectly degrades its accuracy. It is precisely this signature that allows signal drift faults to be diagnosed clearly. When the fault is diagnosed at 0.076 s, the voltage input of the motor control algorithm is switched to the reconstructed voltage signal, and the system regains stability at 0.081 s.
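The arbitration in this work is done by the trained classifier; as a simpler, swapped-in illustration of why the drift is visible, the sketch below forms the residuals of the raw DC bus voltage against the reconstructed signal (sum of cell voltages) and the estimated signal, and flags the raw sensor when it disagrees with both beyond a hypothetical tolerance. It assumes all measured cells are connected in series between the DC bus terminals; all names are hypothetical.

```python
def reconstructed_vdc(cell_voltages):
    """Reconstructed DC bus voltage as the sum of series cell voltages."""
    return sum(cell_voltages)


def voltage_residuals(v_raw, cell_voltages, v_est):
    """Residuals of the raw DC bus voltage against the reconstructed and estimated signals."""
    return v_raw - reconstructed_vdc(cell_voltages), v_raw - v_est


def drift_suspected(v_raw, cell_voltages, v_est, tol):
    """Threshold rule (stand-in for the classifier): raw sensor disagrees with both references."""
    r_rec, r_est = voltage_residuals(v_raw, cell_voltages, v_est)
    return abs(r_rec) > tol and abs(r_est) > tol
```

Requiring disagreement with both references keeps a fault in one redundant path (for example, a cell-measurement IC) from being blamed on the raw sensor.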

DC Bus Current Sensor Fault
A DC bus current sensor fault affects the SoC estimation. Because fault diagnosis runs at a higher frequency than the SoC calculation, most faults can be corrected in time. Figure 15 shows the worst case: the loss of the DC bus current signal is injected at 5.99 s, and the SoC estimation and main diagnosis tasks are both executed at 6 s. Since the diagnosis task takes 14 ms to complete, the system can only diagnose the fault at 6.014 s, so the SoC estimation performed at 6 s uses the abnormal signal and deviates. When the system detects the fault at 6.014 s, it switches the input of the SoC estimation from the raw DC bus current signal to the reconstructed DC bus current signal. Although the faulty signal has been reconfigured, the Coulomb-counting-based SoC estimation cannot correct the deviation caused by the signal loss; however, this deviation is negligible.
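To see why the deviation is negligible, note that a Coulomb counter integrates the current, SoC(k+1) = SoC(k) - I Δt / (3600 Q) with Q in Ah, so each step that reads a lost (zero) current instead of the true current I leaves a permanent error of I Δt / (3600 Q). The sketch below uses hypothetical values (100 A, a 10 ms step, a 50 Ah pack), not the parameters of the tested battery, and hypothetical function names.

```python
def coulomb_count(soc, current_a, dt_s, capacity_ah):
    """One Coulomb-counting step (SoC as a fraction); positive current means discharge."""
    return soc - current_a * dt_s / (capacity_ah * 3600.0)


def soc_error_from_lost_samples(current_a, dt_s, n_samples, capacity_ah):
    """SoC error (fraction) if n steps read 0 A instead of the true current_a."""
    return n_samples * current_a * dt_s / (capacity_ah * 3600.0)


error = soc_error_from_lost_samples(100.0, 0.01, 1, 50.0)  # about 5.6e-6
```

With those numbers one lost step shifts the SoC by about 5.6 × 10^-6, roughly 0.0006 percentage points, and the error stays fixed because Coulomb counting has no feedback to remove it.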

Battery Cell Voltage Sensor Fault
A cell voltage fault mainly affects the EKF-based SoC estimation and the reconstructed DC bus voltage. In this work, the battery pack uses a total of 12 ICs to measure the voltages of 86 cells. A fault is injected at 6 s to disable one of the measurement ICs. As Figure 16 shows, this causes the reconstructed voltage signal to be lower than the original signal and triggers the secondary fault classification model to identify the faulty IC number for isolation. For SoC estimation, the SoC value temporarily deviates from the reference value when the fault occurs, but after the faulty IC is isolated, the remaining ICs still work properly and the error gradually converges. In this case, the system diagnoses the fault at 6.013 s and eliminates the system failure within 10 s.
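The secondary model in this work is a classifier; the sketch below is only a rule-based stand-in to show the bookkeeping it relies on: how many 14-cell ICs (the MC33771B capacity cited earlier) two 84-series packs need, which IC measures a given cell, and which IC's cells all read out of range. It assumes consecutive wiring of cells to ICs and no IC spanning two packs; the voltage limits and all names are hypothetical.

```python
import math

CELLS_PER_IC = 14  # MC33771B measures up to 14 cells


def ics_needed(series_cells_per_pack, n_packs, cells_per_ic=CELLS_PER_IC):
    """Minimum number of cell-monitoring ICs, assuming no IC spans two packs."""
    return n_packs * math.ceil(series_cells_per_pack / cells_per_ic)


def ic_of_cell(cell_index, cells_per_ic=CELLS_PER_IC):
    """Index of the IC measuring a given cell (0-based, consecutive wiring assumed)."""
    return cell_index // cells_per_ic


def suspect_ics(cell_voltages, v_min, v_max, cells_per_ic=CELLS_PER_IC):
    """Rule-based stand-in for the secondary classifier: ICs whose cells all read out of range."""
    n_ics = math.ceil(len(cell_voltages) / cells_per_ic)
    bad = []
    for ic in range(n_ics):
        group = cell_voltages[ic * cells_per_ic:(ic + 1) * cells_per_ic]
        if all(not (v_min <= v <= v_max) for v in group):
            bad.append(ic)
    return bad
```

Once an IC index is returned, its cells can be excluded from the reconstructed bus voltage and the SoC update, which is the isolation step described above.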

Figure 17 shows the system response after the loss of two phase current sensors is injected at 0.05 s. Because the fault cannot be detected immediately, the motor torque output is temporarily unstable; since the frequency response of the vehicle is much lower than the electrical response, this transient instability has little effect. When multiple signals are lost, the EKF-based estimation algorithm can no longer work properly, so the phase current signal reconstructed from the DC bus current is used as the input of the FOC algorithm. Therefore, when the system detects the fault at 0.076 s, it switches the control signal source to the reconstructed current signal, and the system finally recovers its stability at 0.85 s. Under multi-sensor faults, multiple sets of signals must be switched, and switching them directly makes the system take longer to restore stability and may even prevent it from converging. In this situation, the PWM output is briefly turned off, the control signal sources are reconfigured, and the PWM output is then re-enabled.
In this architecture, the system can continue to operate even with three sensor faults, with one exception: the simultaneous failure of the phase A, phase B, and DC bus current sensors. Nonetheless, most MCUs are designed with three current sensors for hardware overcurrent protection, and in such MCU systems the third current sensor can be used to restore the system. Table 5 presents the evaluation results of the different fault detection algorithms. The decision tree has the lowest execution cost and identifies 95% of faults in the first diagnosis cycle, while RF and SVM have relatively higher execution costs and higher accuracy. Compared with SVM, RF has the advantages that (1) its parameters can be easily adjusted, (2) it requires less memory during execution, and (3) it effectively reduces overfitting by introducing random features. Therefore, in an embedded system with limited resources, RF may be the better choice for this purpose.
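The multi-sensor reconfiguration sequence described above (disable PWM, switch several signal sources, re-enable PWM) can be sketched as follows, with single-signal faults switched on the fly as in the phase A and DC bus voltage cases. `set_pwm` is a hypothetical driver hook and the signal names are illustrative.

```python
def reconfigure(sources, replacements, set_pwm):
    """Switch signal sources; several switches at once happen with the PWM output disabled.

    sources: dict signal name -> active source (updated in place and returned);
    replacements: dict signal name -> new source;
    set_pwm: callable(bool) enabling or disabling the inverter PWM (hypothetical hook).
    """
    if len(replacements) > 1:
        set_pwm(False)  # stop switching so no control step sees a half-updated signal set
        try:
            sources.update(replacements)
        finally:
            set_pwm(True)  # resume output once all inputs are reconfigured
    else:
        sources.update(replacements)  # a single signal is switched on the fly
    return sources
```

The `try`/`finally` ensures the PWM is re-enabled even if the update raises, which is one design option; a safety-oriented implementation might instead keep the PWM off on error.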

Conclusions
Model-based fault-tolerant systems often fail to detect and isolate faults effectively in complex fault scenarios. By integrating the BMS and MCU and using signal reconstruction, signal estimation, and signal arbitration, this work proposes a framework for continued operation under sensor faults. It allows the two systems to tolerate up to three simultaneous sensor faults without increasing costs, and it can further identify the sensor fault type, including no signal, bias, stuck, and intermittent faults. Without changing the component failure rates, the proposed IBMU can reduce the powertrain failure rate by about one order of magnitude. Three arbitration algorithms are evaluated in the proposed system; among them, the RF algorithm recognizes 97.4% of faults in the first diagnosis cycle, and higher accuracy is obtained in subsequent diagnosis cycles. After a fault is detected, the system recovery time ranges from 10 to 4000 ms. Furthermore, this architecture can be implemented on chips with a floating-point unit and a computing power greater than 6000 DMIPS.