Big Data-Based Early Fault Warning of Batteries Combining Short-Text Mining and Grey Correlation

Abstract: Considering the battery-failure-induced catastrophic events reported frequently, the early fault warning of batteries is essential to the safety of electric vehicles (EVs). Motivated by this, a novel data-driven method for early-stage battery-fault warning is proposed in this paper by fusing short-text mining and grey correlation. In particular, the short-text mining approach is exploited to identify the fault information recorded in maintenance and service documents and to statistically analyze the categories of battery faults in EVs. The grey correlation algorithm is employed to build the relevance between the vehicle states and typical battery faults, which contributes to extracting the key features of the corresponding failures. A key fault-prediction model of electric buses based on big data is then established on the key feature variables. Different selections of kernel functions and hyperparameters are scrutinized to optimize the warning performance. The proposed method is validated with real-world data acquired from electric buses in operation. The results suggest that the constructed prediction model can effectively predict the faults and deliver the desired early fault warning.


Introduction
Due to the diversity of available energy and the promotion of government policies, electric vehicles (EVs) constantly receive the attention of researchers and entrepreneurs as a means to boost economic transformation, optimize the energy architecture, and ameliorate air quality [1,2]. With the accumulation of high-intensity operation and the aging of critical parts, however, failures of EVs occur increasingly often, posing a significant challenge to the reliability and safety of vehicle control systems, as well as to unimpeded traffic flow [3][4][5]. In a recent report [6], issues of the battery, motor, and electronic control systems account for 52.5% of EV failures. As the only energy source in EVs, the battery is of vital importance to the battery management system (BMS) because of cell inconsistency, high energy density, and the rapid decline in cycle life. To a considerable extent, the early fault warning of batteries, with consequential timely repair and maintenance, can lead to a series of improvements to EVs. Therefore, it is imperative to develop an early fault warning system for batteries in EVs [7,8].
During the past few years, considerable efforts have been dedicated to early fault warning systems in EVs. Traditionally, early warning systems depend highly on human expert knowledge and can generally be divided into several categories, including signal-processing-based, reliability-statistics-based, and data-driven approaches [9][10][11][12]. For the signal-processing-based approach, it is not necessary to establish a mathematical model.

(1) The short-text mining is applied to analyze the manually filled vehicle maintenance data and categorize the key battery faults, together with the grey correlation analysis that establishes the relationship between the vehicle state data and the main faults. The scheme makes it possible to choose the data highly correlated with faults and lowers the implementation difficulty for the subsequent machine learning algorithm. (2) The scheme is an early fault warning method for comprehensive failures instead of an approach for one specific type of failure. Therefore, it can analyze and pre-warn critical faults in the vehicle, such as poor consistency of cells, parameter errors, communication failures, etc. Moreover, it can be applied to the analysis of general EVs, not limited to the electric buses studied in this research. Specifically, the critical faults are extracted from a pile of sample data produced by EVs, analyzed by the grey correlation, and classified using the SVM algorithm. (3) The presented scheme can reduce the computational complexity of the machine learning algorithm for model construction without additional hardware cost, which is more practicable and efficient in actual implementation. Besides, it also provides higher effectiveness and robustness through the comparison of different model functions and parameters.
The structure of this paper is organized as follows: Section 2 presents a scheme combining short-text mining and grey correlation for early fault warning. In Section 3, the key faults and characteristic parameters are extracted from a batch of sample data. The prediction model is then established in Section 4, and its effectiveness is verified. Section 5 finally concludes the paper and outlines future work.

Description of the Early Fault Warning Scheme
The crux of early fault warning lies in identifying the abnormal change in performance parameters. Once identified, the abnormality can consequently be determined to inform the users of the corresponding fault in the system. The architecture of the presented early fault warning scheme based on the fusion of short-text mining and the grey correlation is illustrated in Figure 1. It consists chiefly of the maintenance service station, electric bus, cloud server, and early fault warning platform. The monitoring data of the electric bus and the historical data recorded in the maintenance service station are forwarded to the cloud server to predict possible faults further. The early fault warning scheme is implemented in the platform, including data processing, short-text mining, the grey correlation analysis, and the support vector machine (SVM) algorithm.

Data Preprocessing
Even though the data forwarded to the cloud server are well sorted, various abnormal data still exist and will seriously impact further data mining and exploration. Therefore, it is necessary to conduct data preprocessing, centered around data cleansing, data integration, and data protocol application. Data cleansing is the process of detecting and correcting (or removing) corrupt or inaccurate data [33]. Its primary purpose is to improve data quality, which can be preliminarily assessed by the atlas. According to the theoretical correlation, field data that have little impact on the research can be deleted. For data with slow state changes and a small number of vacancies, the interpolation mean method can be utilized for individual missing data, or the previous adjacent value can be adopted for replacement. In data cleansing, it is also necessary to locate and clean the adjacent problem data [34].
When integrating data from different sources, some data may be inconsistent and redundant, which results in poor prediction results and increased time consumption. Based on the data protocol standard, redundant data can be adequately removed, data consistency maintained, and system complexity reduced.

Short-Text Mining
Simply analyzing the vehicle state data to predict faults may lose sight of the primary goal. Therefore, a fault prediction method based on reliability statistics is adopted to find the name, time, and phenomenon of the key faults and to clarify the goal of the prediction model. The short-text mining method is applied to extract useful information from the unstructured text of various documents.
The text-mining process can be conducted as follows: the text contents are first preprocessed to extract the main features and then structured; the structured text is then decomposed and sorted by a related algorithm to obtain the required results. To reduce the computational difficulty, the sample text can be short-text-based. Short-text mining consists of three steps: keyword extraction, key concept extraction, and topic description analysis. Keyword extraction captures the main features of the text and is the basis of the latter two steps, relying mainly on mathematical tools to quantify and compare. Two main methods for keyword extraction are as follows: (1) Feature Selection Based on Statistical Word Occurrence Frequency: The occurrence frequency of keywords in the text is the key topic feature for analysis. Words occurring more often are retained, and the rest are deleted to improve the accuracy of frequency screening. The term frequency-inverse document frequency (TF-IDF) algorithm is a relatively mature method for text mining, defined by the calculation [35]:

w_ij = (t_ij / Σ_{k=1}^{n} t_kj) × log(N / n_i)

where t_ij is the occurrence frequency of the keyword in the sample text D_j; Σ_{k=1}^{n} t_kj is the total number of keywords in D_j; N is the total number of short texts in the training model; and n_i is the number of texts containing the keyword in the training model.
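The TF-IDF weighting above can be sketched in a few lines of Python; the toy corpus and whitespace tokenization below are illustrative assumptions, not the paper's data:

```python
import math

# Toy short-text corpus standing in for the maintenance records (illustrative only).
corpus = [
    "battery cell consistency poor",
    "battery voltage alarm",
    "motor controller communication failure",
]
docs = [text.split() for text in corpus]
N = len(docs)  # total number of short texts in the training model

def tf_idf(word, doc):
    """w_ij = (t_ij / sum_k t_kj) * log(N / n_i)."""
    tf = doc.count(word) / len(doc)            # keyword frequency within D_j
    n_i = sum(1 for d in docs if word in d)    # number of texts containing the keyword
    return tf * math.log(N / n_i)

print(tf_idf("battery", docs[0]))   # common word -> lower weight
print(tf_idf("motor", docs[2]))     # rare word -> higher weight
```

A keyword that appears in many records ("battery") is down-weighted relative to one concentrated in few records ("motor"), which is what makes TF-IDF useful for isolating fault-specific terms.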
(2) Information Gain: Based on information entropy, this method measures the proportion of a feature in classification and the amount of information supplied; concretely, it gauges the expected reduction in entropy [36]. The information gain of keyword D_i can be introduced as the difference between the class entropy and the conditional entropy given the keyword:

IG(D_i) = −Σ_{i=1}^{n} P(c_i) log P(c_i) + P(D_i) Σ_{i=1}^{n} P(c_i|D_i) log P(c_i|D_i) + P(D̄_i) Σ_{i=1}^{n} P(c_i|D̄_i) log P(c_i|D̄_i)

where P(c_i) is the occurrence probability of keyword category c_i in the sample text; n is the total number of keyword categories in the sample text; P(D_i) is the occurrence probability of keyword D_i in the sample text; P(c_i|D_i) is the occurrence probability of category c_i in texts containing keyword D_i; P(D̄_i) is the probability that keyword D_i does not occur in the sample text; and P(c_i|D̄_i) is the occurrence probability of category c_i in texts without keyword D_i.
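A minimal sketch of the expected-entropy-reduction idea, with the probabilities passed in directly (the toy numbers are assumptions for illustration):

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum p*log2(p) over nonzero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(p_classes, p_term, p_c_given_term, p_c_given_no_term):
    """Expected reduction in class entropy from observing keyword D_i:
    IG = H(C) - [P(D_i)*H(C|D_i) + P(not D_i)*H(C|not D_i)]."""
    h_conditional = (p_term * entropy(p_c_given_term)
                     + (1 - p_term) * entropy(p_c_given_no_term))
    return entropy(p_classes) - h_conditional

# A keyword that perfectly separates two equally likely fault categories
# carries one full bit of information gain.
print(information_gain([0.5, 0.5], 0.5, [1.0, 0.0], [0.0, 1.0]))  # 1.0
```

A keyword whose presence leaves the class distribution unchanged yields zero gain, so ranking keywords by this value filters out uninformative terms.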

Grey Correlation Analysis
The grey correlation analysis is a general method to judge the correlation degree between various elements, which can be processed as follows [37]:

(1) Determining the Sequence of Analysis: Two series should be determined in this step: the reference series reflecting the feature of system behavior and the comparison series that can affect system behavior. The reference series Y can be formulated as:

Y = {X_0(k), k = 1, 2, ..., n}

where X_0(k) is the kth variable in the initial sequence X_0. The comparison series X_i can be established as:

X_i = {X_i(k), k = 1, 2, ..., n}, i = 1, 2, ..., m

where X_i(k) is the kth variable in X_i.

(2) Nondimensionalization of Variables: Given the data sequence X = {x(1), x(2), x(3), ..., x(n)}, the mean-change method can be employed. First, the average of each sequence is calculated. Then, each datum in the original sequence is divided by that average to generate a new data sequence. Last, the sequence average can be used to reflect the dynamic changes in the data.

(3) Calculating the Difference Series, Extreme Values, and Grey Correlation Coefficient: Based on the dimensionless transformation, the related quantities can be obtained:

Δ_i(k) = |Y(k) − X_i(k)|,  a = max_i max_k Δ_i(k),  b = min_i min_k Δ_i(k)

ε_i(k) = (b + δa) / (Δ_i(k) + δa)

where Δ_i(k) is the difference sequence; a is the maximum value of Δ_i(k); b is the minimum of Δ_i(k); ε_i(k) is the grey correlation coefficient; and δ ∈ (0, ∞) is the resolution coefficient.

(4) Calculating the Correlation Value and Sorting: The correlation degree R_i can be defined as the average value of the grey correlation coefficients between the comparison series and the reference series:

R_i = (1/n) Σ_{k=1}^{n} ε_i(k)
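The steps above can be sketched in Python (the paper's grey correlation module is Python-based; this simplified standalone version, its δ = 0.5 default, and the guard for identical series are assumptions of the sketch):

```python
def grey_correlation(reference, comparison, delta=0.5):
    """Grey correlation degree between the reference series Y and one
    comparison series X_i (single-series case)."""
    # Step (2): mean-change nondimensionalization (divide by the series mean)
    y = [v * len(reference) / sum(reference) for v in reference]
    x = [v * len(comparison) / sum(comparison) for v in comparison]
    # Step (3): difference series, extreme values, correlation coefficients
    diff = [abs(a - b) for a, b in zip(y, x)]
    a, b = max(diff), min(diff)
    if a == 0:                       # identical normalized series: perfect correlation
        return 1.0
    eps = [(b + delta * a) / (d + delta * a) for d in diff]
    # Step (4): correlation degree = average of the coefficients
    return sum(eps) / len(eps)

print(grey_correlation([1, 2, 3], [2, 4, 6]))  # proportional series -> 1.0
print(grey_correlation([1, 2, 3], [3, 2, 1]))  # reversed trend -> lower degree
```

Proportional series are indistinguishable after mean normalization and so score 1.0, while a reversed trend scores markedly lower; ranking the comparison series by this degree is what selects the key feature parameters later in the paper.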

SVM Algorithm
Fault prediction can be regarded as a binary classification problem. As a widely used supervised algorithm, the support vector machine (SVM) provides a unique solution to the binary classification problem, with quite satisfactory prediction accuracy and operational efficiency [38].
In the SVM model, the sample set can be given as {x_i, y_i}, i = 1, 2, 3, ..., n, x_i ∈ R^d, y_i ∈ {−1, +1}, where x_i is the sample data (feature vector), and y_i is the class label of the sample data.
The optimal separating hyperplane is obtained by solving the soft-margin optimization problem:

min_{w,b,ε} (1/2)‖w‖² + C Σ_{i=1}^{n} ε_i
s.t. y_i(w·x_i + b) ≥ 1 − ε_i, ε_i ≥ 0, i = 1, ..., n

where C is a positive scalar called the penalty coefficient, and ε_i is the slack variable for x_i. C is a hyperparameter to be tuned, while the slack variables are determined by the optimization.
To solve this objective function, the Lagrange function can be introduced, with a Lagrange multiplier λ_i applied to each constraint. The original problem can then be transformed into the dual optimization problem [40]:

max_λ Σ_{i=1}^{n} λ_i − (1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} λ_i λ_j y_i y_j K(x_i, x_j)
s.t. Σ_{i=1}^{n} λ_i y_i = 0, 0 ≤ λ_i ≤ C

SVM can be turned into a variety of completely different function models through different kernel functions K(x_i, x_j). In this paper, different penalty factors C are chosen for testing and comparison, from which the solution with higher model prediction accuracy and lower model training time can be identified.

Model Selection and Parameter Tuning for Machine Learning
SVM includes a key penalty factor C and a variety of kernel functions. According to the characteristics of the key faults in electric buses, the kernel functions tested in this study are the linear kernel function (Linear), the Gaussian radial basis kernel function (RBF), the polynomial kernel function (Poly), and the sigmoid kernel function (Sigmoid), which mimics the nonlinear activation of neurons. These kernel functions share the same hyperparameter γ with the default value 1/k (k is the number of categories). In addition, the kernel function Poly has hyperparameters c and d, and the kernel function Sigmoid has a hyperparameter c that needs to be set manually; their selection can refer to [41]. These two parameters also have system default values c = 0 and d = 3, which are used in this work to preliminarily compare the advantages and disadvantages of the different kernel functions in the specified fault prediction without tuning hyperparameters. Figure 2 provides a graphic explanation of model selection and parameter tuning, and the detailed descriptions are as follows.
Based on the default setting hyperparameters γ = 1/k, c = 0, d = 3, only the key penalty factor C will be changed first, and the key fault predictions of the four kernel functions on the same batch of electric buses will be compared. According to the prediction accuracy and the time consumption of the model operation, the appropriate kernel function can be selected.
For further parameter tuning of the selected kernel functions, changing the value of the default hyperparameters and testing are necessary. The kernel function Linear has no additional hyperparameters, and it is only necessary to adjust the penalty factor C further to find the most accurate and efficient model. The kernel function RBF has an additional hyperparameter γ, so it is necessary to adjust C and γ concurrently, check the operation results for the parameters, analyze the relationship between different hyperparameter values and model effects, and find the most accurate and efficient model. The kernel function Poly needs to adjust the value of C, γ, c, and d simultaneously, and the kernel function Sigmoid has to adjust the values of C, γ, and c at the same time. After the comparison and analysis of different combinations of C and other hyperparameters, the optimal model can then be selected.

Key Fault and Feature Parameter Extraction of Electric Buses
The sample data used in this study were taken in March 2021 from a batch of electric buses produced in 2017 and operating in Zhenjiang, China. These data include vehicle maintenance data, alarm data, real-time monitoring data of the bus state, etc., stored as CAN bus records or manually filled documents. The data are applied to the scheme depicted in Section 2 to extract the key faults and feature parameters of electric buses, as described in Figure 3.



Data Preprocessing
Since the data attributes and registration time are quite different, these data cannot be used directly. Data integration and preprocessing are necessary first.
After the vehicle data are transferred and stored, the following preprocessing can be carried out. Firstly, vehicle data are extracted from the database according to the vehicle production serial number. Then, the content of the data fields is analyzed by the atlas, and the quality of the field data is preliminarily inspected; fields that have little impact on the research can be deleted. If a variable has many missing values but does not affect the research goal, the entire variable can also be deleted. (1) Missing Data Completion or Removal: Due to the inconsistency of the time axis of the intercepted data, the real-time state data of the vehicle may miss individual records. The data acquisition frequency is set as 10 s. Data that change rapidly cannot easily be completed; still, for data whose state does not change quickly, such as the battery voltage and the state of charge (SOC), the interpolation mean method can be used for numerical substitution, or the last adjacent value can be used.
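The two filling strategies above (interpolation between neighbours for slowly changing signals, or carrying the last adjacent value forward) can be sketched with pandas; the field names and values are illustrative assumptions, not the bus data set:

```python
import numpy as np
import pandas as pd

# Hypothetical 10 s telemetry samples with gaps (field names are illustrative).
df = pd.DataFrame({
    "battery_voltage": [540.2, np.nan, 539.8, 540.0, 540.1],
    "soc":             [81.0, 80.9, np.nan, 80.7, 80.6],
})

# Slowly changing signal: fill the gap with the interpolation mean of neighbours.
df["battery_voltage"] = df["battery_voltage"].interpolate()
# Alternative: replace the gap with the last adjacent value.
df["soc"] = df["soc"].ffill()

print(df)
```

Either strategy is only safe for signals such as voltage and SOC that vary slowly relative to the 10 s sampling interval; rapidly changing signals are better dropped than reconstructed.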
(2) Noise Data Cleaning: Noise data refer to erroneous data caused by system logic errors or information interference during acquisition. Noise data severely impact the results of later data processing, so they need to be located first in data cleaning, together with the adjacent problem data, and removed. Cleaning these noise data requires screening them one by one according to the features of electric buses and their correlation with the faults, and special tools are used for prevention and elimination.

Key Fault Extraction of Electric Buses Based on Short-Text Mining
In the actual vehicle maintenance and repair records, the content is filled in by the document-filling personnel of service stations or large group companies. The personnel who fill in the reports are generally not the actual maintenance personnel of the vehicle; they do not know the vehicle itself and only summarize and refine the "dictation" of the maintenance personnel and record it in the document. Document examiners who master the vehicle knowledge can still comprehend the correct meaning when checking these incorrectly described words; however, the workload of order-by-order modification would be too large, and the economic benefits would be limited in the short term. Therefore, word segmentation should be performed on these spoken data. In word segmentation, the scattered data should be aggregated by vehicle production serial number and converted into .txt text format to facilitate processing.
The .txt file can be analyzed with the ROSTCM6 text-mining tool, in which the user data set can be loaded and processed according to the user's needs and text characteristics. The special fault database of new energy vehicles is loaded into the tool, and records not in the user-defined package are cleared, such as prepositions, auxiliary words, faults in non-pure-electric systems, etc. The number of original fault records used in the study is 580,000. After automatic word segmentation, the fault data are reduced to 100,000 records, and the number of key fault items is 103. Finally, valuable fault information is extracted for subsequent analysis. The top 10 key faults are listed in Table 1. It can be clearly observed that "poor consistency of battery cells" is the main fault in electric buses due to the inconsistency of production and use environments. Therefore, the feature parameters of this fault are further extracted in the follow-up study.
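As a much-simplified stand-in for the ROSTCM6 workflow above (segment, drop words outside the user-defined package, count key fault items), the core counting step can be sketched with the standard library; the records and stopword list are illustrative assumptions:

```python
from collections import Counter

# Illustrative segmented maintenance records (the real data use ROSTCM6).
records = [
    "poor consistency of battery cells",
    "battery voltage sensor failure",
    "poor consistency of battery cells",
    "communication failure of BMS",
]
# Stand-in for the user-defined package: drop prepositions, auxiliary words, etc.
stopwords = {"of", "the", "and"}

fault_counts = Counter(records)                 # frequency of key fault items
keyword_counts = Counter(
    w for rec in records for w in rec.split() if w not in stopwords
)

print(fault_counts.most_common(1))   # the dominant fault item
print(keyword_counts.most_common(3))
```

Ranking the counted items is what produces a table of top key faults like Table 1, with the most frequent item selected for feature extraction.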

Feature Parameter Extraction Based on Grey Correlation Analysis
The grey correlation algorithm module is developed in Python to normalize the vehicle state data, analyze the correlation degree between each vehicle state parameter and the decision factor "whether there is a fault", and calculate the correlation degree value. The main calculating process of the developed script is as follows. The fault state of electric buses is represented by "0" and "1" [42]; that is, "0" indicates that the electric bus has "no failure", and "1" indicates "failure". The relevant parameters of the vehicle state and power battery state are selected as the influencing variables, denoted by X, where the total battery voltage is X1, the total battery current is X2, the battery SOC is X3, the vehicle speed is X4, the travel of the accelerator pedal is X5, and the state of the brake pedal is X6. The fault situation after 1 h is the decision variable, expressed by Y. The standardized data set can then be obtained by dimensionless processing of the above data, as shown in Table 2. According to the grey correlation analysis, the decision variable Y is used as the feature parameter of the system. Each attribute parameter is used as the sequence of the relevant factors for the failure "poor consistency of battery cells" to establish a sub-sequence, which is solved in the Python model to obtain the results shown in Table 3. According to Table 3, the grey correlation degree sorted from large to small is: the total battery current X2 > the total battery voltage X1 > the battery SOC X3 > the travel of the accelerator pedal X5 > the state of the brake pedal X6 > the vehicle speed X4. Generally, when the grey correlation degree is greater than 0.5, the parameter has a great impact on the system and should receive special attention.
Combined with the calculation results of the grey correlation degree, the state data of the battery itself have a relatively large correlation with the occurrence of "poor consistency of battery cells". At the same time, the correlation degree of the three parameters in the vehicle state attributes is between 0.45 and 0.5, which is relatively large, indicating that the correlation between each system and the occurrence of the fault in the initial stage is consistent with the result of the data analysis.

Results and Discussion of Early Fault Warning Based on SVM
In this section, a prediction model is obtained for early fault warning of batteries, of which the effectiveness is verified. The establishment of the prediction model based on SVM can be generally divided into two steps: kernel function selection and hyperparameter tuning.

Kernel Function Selection
Using the sklearn module in the Python scripting tool, a comparison module for different kernel functions can be built. To facilitate manual parameter adjustment, only one value of the penalty factor C is used in each test. The next value of C can be adjusted according to the tested results. The designed Python script operation process is as follows: (1) Read the data and divide it into the training set and test set.
(2) Set the penalty factor C; evaluate the training accuracy and the computing efficiency of the model. Each time a value of C is set for a test, the results are recorded, and the results of multiple tests are compared to obtain a proper model. After testing, the values of the penalty factor C are chosen as {0.1, 0.5, 1, 3, 5, 15, 30, 50}, and the performance of the four kernel functions with the same value of C is compared.
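The two-step script process above can be sketched with sklearn, as used in the paper; the synthetic data set below is an assumption standing in for the labelled bus data, so the printed numbers will not match the paper's figures:

```python
import time

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the labelled bus data (1 = failure, 0 = no failure).
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 6))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=400) > 0).astype(int)

# Step (1): read the data and divide it into the training set and test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Step (2): for each penalty factor C, record accuracy and training time
# of the four kernel functions.
for C in [0.1, 0.5, 1, 3, 5, 15, 30, 50]:
    for kernel in ["linear", "rbf", "poly", "sigmoid"]:
        t0 = time.perf_counter()
        clf = SVC(kernel=kernel, C=C).fit(X_train, y_train)
        ms = (time.perf_counter() - t0) * 1000
        print(f"C={C:<4} {kernel:<8} "
              f"train={clf.score(X_train, y_train):.2f} "
              f"test={clf.score(X_test, y_test):.2f} time={ms:.1f} ms")
```

Comparing the recorded train/test accuracies and timings row by row is exactly the basis on which the paper selects the RBF kernel.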
When the penalty factor C is 0.1, the prediction results are shown in Figure 4. The kernel function Poly has the highest training accuracy under this data set and penalty factor, reaching 93%, but its test accuracy is only 76%, and its training time is longer than those of the other kernel functions. The test accuracy of the kernel functions Linear and RBF is 78%; RBF is better in training accuracy but takes longer to train. The overall prediction effect of the kernel function Sigmoid is the worst: its training and test accuracy are only 58%, and its training time is up to 2 ms. It can also be seen from the binary classification diagram that the kernel function Sigmoid does not correctly separate the "red faulty vehicles" from the "green non-faulty vehicles", performing the worst; in this round of tests, the kernel functions RBF and Poly distinguish the "red faulty vehicles" from the "green non-faulty vehicles" accurately, while the effect of the kernel function Linear is not good. Figure 5 shows the prediction accuracy, training time, and prediction effect of the four kernel functions when the penalty factor C = 0.5. With the increase in C, the overall prediction effect is consistent with that when C = 0.1, and the comparison of the four kernel functions changes only slightly. According to Figures 4 and 5, the kernel function Sigmoid is not suitable for the binary analysis of this data set, while the kernel functions Linear, RBF, and Poly perform well. At the same time, the effect improves as the penalty factor C increases; the kernel function RBF improves the most and takes the least time.
As the penalty factor C increases, the effect when C is set to 1 and 3 is shown in Figures 6 and 7; with the continuous increase in C, the prediction effect keeps improving. With C increased to 5 and even 50, the effect is shown in Figures 8 and 9. When the penalty factor C increases up to 50, a further increase in C has little effect on the kernel functions RBF and Poly, and the training time increases slightly.
According to the test results and analysis, it can be concluded that the kernel function RBF is more suitable for the fault prediction of "poor consistency of battery cells".


Tuning Hyperparameters of the Kernel Function RBF
The sklearn module in the Python scripting tool is used to establish the comparison module of the selected kernel function RBF under different hyperparameters. In each test, only one value of the penalty factor C is used, and a different hyperparameter γ is set for testing at the same time. In the study, six groups of γ are tested under the same value of the penalty factor C. Then, the next value C will be selected to compare the tested results.
The designed Python script operation process is as follows: (1) Reading the data and dividing it into the training set and test set.
(2) Setting the penalty factor C and passing it to six models with different hyperparameters γ, respectively. According to the above-depicted research, the value of the penalty factor C is selected as {1, 3, 5, 10, 20, 30}, and the kernel function RBF has another hyperparameter γ; combined with the experimental tests, the value of γ is likewise selected as {1, 3, 5, 10, 20, 30}, with all six values tested and compared for each penalty factor C. When the penalty factor C is 1 and 3, the results of the six groups of γ are shown in Figures 10 and 11.
The prediction results are presented in Figures 12-15 when C is 5, 10, 20, and 30,  Figure 10 shows the prediction accuracy, training time, and prediction effect of six groups of γ with penalty factor C = 1. It shows that the training accuracy increases from 92% with γ = 1 to 97% with γ = 20, but the training accuracy does not increase anymore when the value of γ further increased. The overall change trend of test accuracy also increases from 78% when γ = 1 to 90% when γ = 10. However, the test accuracy does not increase anymore when the value of γ is further increased. The training time is 2 ms when γ = 1, and as the value of γ increases, the training time is kept around 1.2 ms. In addition, there is a flection point, which is 2.1 ms when γ = 5. Therefore, we can conclude that γ = 20 is the optimal hyperparameter when the penalty factor C = 1. Figure 11 shows the results with six groups of γ when the penalty factor C = 3. It can be observed that the prediction accuracy and training time have similar change trends to those when C = 1, and it can be concluded that γ = 10 is the optimal hyperparameter for penalty factor C = 3.
The prediction results are presented in Figures 12-15 when C is 5, 10, 20, and 30, respectively. It can be concluded that γ = 10 is the optimal hyperparameter for penalty factor C = 5, and γ = 5 is the optimal one for penalty factor C = 10, 20, and 30.
92% with = 1 to 97% with = 20, but the training accuracy does not increase anymore when the value of further increased. The overall change trend of test accuracy also increases from 78% when = 1 to 90% when = 10. However, the test accuracy does not increase anymore when the value of is further increased. The training time is 2 ms when = 1, and as the value of increases, the training time is kept around 1.2 ms. In addition, there is a flection point, which is 2.1 ms when = 5. Therefore, we can conclude that = 20 is the optimal hyperparameter when the penalty factor C = 1. Figure 11 shows the results with six groups of when the penalty factor C = 3. It can be observed that the prediction accuracy and training time have similar change trends to those when C = 1, and it can be concluded that = 10 is the optimal hyperparameter for penalty factor C = 3. The prediction results are presented in Figures 12-15 when C is 5, 10, 20, and 30, respectively. It can be concluded that = 10 is the optimal hyperparameter for penalty factor C = 5, and = 5 is the optimal one for penalty factor C = 10, 20, and 30.  According to the above test results, we can conclude that the kernel function RBF is the most suitable for the fault data set "poor consistency of battery cells", and the kernel function RBF with penalty factor C = 3 and hyperparametric = 10 is the optimized model for predicting the specified faulty. It has the advantages in curve fitting, test set According to the above test results, we can conclude that the kernel function RBF is the most suitable for the fault data set "poor consistency of battery cells", and the kernel function RBF with penalty factor C = 3 and hyperparametric = 10 is the optimized model for predicting the specified faulty. It has the advantages in curve fitting, test set accuracy, and model training time. 
According to the above test results, we can conclude that the kernel function RBF is the most suitable for the fault data set "poor consistency of battery cells", and the kernel function RBF with penalty factor C = 3 and hyperparametric = 10 is the optimized model for predicting the specified faulty. It has the advantages in curve fitting, test set accuracy, and model training time. According to the above test results, we can conclude that the kernel function RBF is the most suitable for the fault data set "poor consistency of battery cells", and the kernel function RBF with penalty factor C = 3 and hyperparametric γ = 10 is the optimized model for predicting the specified faulty. It has the advantages in curve fitting, test set accuracy, and model training time.
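The C/γ search described above can be sketched as a simple grid search. The snippet below is only an illustration of the procedure, assuming scikit-learn's `SVC`; the synthetic data stand in for the paper's feature matrix (key feature variables one hour before a fault) and binary fault labels.

```python
# Sketch of the C/gamma grid search over an RBF-kernel SVM.
# The data are synthetic placeholders, not the paper's bus data.
import time
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=600, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

C_values = [1, 3, 5, 10, 20, 30]
gamma_values = [1, 3, 5, 10, 20, 30]

results = []
for C in C_values:
    for gamma in gamma_values:
        clf = SVC(kernel="rbf", C=C, gamma=gamma)
        t0 = time.perf_counter()
        clf.fit(X_train, y_train)
        train_time = time.perf_counter() - t0
        results.append((C, gamma,
                        clf.score(X_train, y_train),  # training accuracy
                        clf.score(X_test, y_test),    # test accuracy
                        train_time))

# Pick the (C, gamma) pair with the best test accuracy,
# breaking ties by shorter training time.
best = max(results, key=lambda r: (r[3], -r[4]))
print("best C=%d, gamma=%d, test accuracy=%.3f" % (best[0], best[1], best[3]))
```

In the paper, the comparison additionally weighs training accuracy and the prediction curves (Figures 10-15) rather than test accuracy alone, which is how C = 3, γ = 10 is selected.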

Experimental Verification
To verify the effectiveness of the key fault prediction model, data from the same batch of electric buses are selected for verification, with a test system from the Higer Bus Company shown in Figure 16. The test system consists of the electric buses, the vehicle terminals, a cloud server, and the management system. The battery state data of the electric buses are transmitted to the cloud server through the vehicle terminals, and the cloud server forwards the data to the management system to carry out the experiment. Since the actual fault probability of a vehicle is small, and the one-hour-ahead label corresponding to most of the vehicle state data is "no fault", this binary classification model is best evaluated with a confusion matrix [43]. The verification proceeds as follows:
(1) Extract the state parameters of faulty electric buses one hour before the fault to form a real-vehicle fault data set.
(2) Feed the fault data set into the model, record the early-fault-warning results, and fill the correct and wrong predictions into the confusion matrix.
(3) Extract the state parameters of non-faulty electric buses to form a real-vehicle fault-free data set.
(4) Feed the fault-free data set into the model, record the early-fault-warning results, and fill the correct and wrong predictions into the confusion matrix.
(5) From the confusion matrix, compute the harmonic mean and determine the prediction accuracy.
As mentioned above, the data used to build the model and tune its parameters were taken in March 2021 from a batch of electric buses operating in Zhenjiang; the vehicle state and fault data of the same batch in April 2021 are used for verification in this test. The SVM with the RBF kernel, penalty factor C = 3, and hyperparameter γ = 10 is selected. The confusion matrix of the fault prediction results is shown in Table 4. The harmonic mean of the fault prediction is 85.2%, which shows that the model meets the fault-prediction expectations.
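The scoring in steps (2)-(5) amounts to filling a confusion matrix from the two data sets and taking the harmonic mean of precision and recall (the F1 score). A minimal sketch, with illustrative placeholder labels rather than the paper's data:

```python
# Confusion matrix plus harmonic mean (F1 score) for a binary
# fault-warning model. Label vectors are illustrative placeholders.
from sklearn.metrics import confusion_matrix, f1_score

# 1 = "fault within the next hour", 0 = "no fault"
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # real outcomes
y_pred = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]   # model warnings

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"TP={tp} FP={fp} FN={fn} TN={tn}, F1={f1:.3f}")
# Sanity check against the library implementation.
assert abs(f1 - f1_score(y_true, y_pred)) < 1e-12
```

With these placeholder labels, four of the five faults are caught with one false alarm, giving precision = recall = 0.8 and hence F1 = 0.8; the paper's reported harmonic mean of 85.2% is the analogous quantity computed on the April 2021 data.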


Conclusions
This paper presents a data-driven early fault warning method that combines the advantages of reliability statistics and information processing based on big data analysis. Through short-text mining, the fault information recorded in various vehicle maintenance and service documents is identified, and the key fault types and names of electric buses are statistically analyzed. Through the grey correlation algorithm, the monitored bus state data are processed and analyzed, and the key feature variables related to failures are extracted. The values of these key feature variables in the vehicle state one hour before a failure are used to establish a machine learning model that predicts the probability of a vehicle fault in the coming hour. Based on the SVM algorithm, the process of building the prediction model and its evaluation indicators are discussed. First, only the value of the penalty factor C is adjusted to find the kernel function with higher accuracy and lower operation time in fault prediction by comparing the effects of various kernel functions. Then, through the establishment of a data test matrix, the penalty factor C and the other hyperparameter γ are adjusted simultaneously to obtain the optimized model. Finally, the fault prediction model is verified by comparing its predictions with subsequently obtained real fault data of electric buses. The results show that the key fault prediction model of electric buses based on big data can effectively predict vehicle faults and carry out early fault warning.
It should be mentioned that the described scheme is a data-driven method, which provides high prediction performance but still depends on the volume of data; the results might be inaccurate when applied to a limited data set. Moreover, the presented method can be developed further to improve prediction performance. In future work, it is necessary to distinguish and effectively cluster vehicles in different states before carrying out the data-driven fault prediction, which can improve the effectiveness of the model prediction. In addition, the analysis process can be further optimized: the analysis system could perform the data processing and model adjustment by itself, further improving the reliability of the prediction model and reducing the application cost.