Electric Vehicle Batteries: Status and Perspectives of Data-Driven Diagnosis and Prognosis

Abstract: Mass marketing of battery-electric vehicles (EVs) will require that car buyers have high confidence in the performance, reliability and safety of the battery in their vehicles. Over the past decade, steady progress has been made towards the development of advanced battery diagnostic and prognostic technologies using data-driven methods that can be used to inform EV owners of the condition of their battery over its lifetime. The research has shown promise for accurately predicting battery state of health (SOH), state of safety (SOS), cycle life, remaining useful life (RUL), and indicators of cells with a high risk of failure (i.e., weak cells). These methods yield information about the battery that would be of great interest to EV owners, but at present it is not shared with them. This paper is concerned with the present status of the information available on the battery, with a focus on data-driven diagnostic and prognostic approaches, and with how the information would be generated in the future for the millions of EVs that will be on the road in the next decade. Finally, future trends and key challenges for the prognostics and health management of batteries in real-world EV applications are presented from four perspectives (cloud-edge full-scale diagnosis, artificial intelligence and electronic health reports).


Introduction
Global sales of electric vehicles (EVs) have exhibited strong growth in the past year, reaching a new record of 6.6 million in 2021 [1]. The advancement of lithium-ion battery energy storage technology plays a pivotal role in the mass marketing of EVs. Estimates have shown that global lithium-ion battery demand would rise over fivefold to 2000 gigawatt-hours (GWh) between 2022 and 2030 (Figure 1). The largest market for lithium-ion batteries is, and will remain, the diverse range of EV application scenarios [2,3]. To seize the market opportunity, billions of dollars have been spent on research and development of battery technologies to improve energy density and cycle life [4]. However, automotive batteries encounter complex, harsh operating environments during the daily operation of EVs [5]. Battery reliability, longevity, and safety may be compromised under realistic conditions [6]. Hence, full-time battery prognostics and health management (PHM) can play an important role in increasing battery life and safety. Despite relentless progress, the information provided to vehicle owners is still very limited owing to the difficulty of accurately predicting the evolution of nonlinear multiphysics battery systems [7].
Data-driven, machine learning-based approaches have received much attention from both academia and industry over the past decade [8][9][10][11]. The research has shown promise for accurately predicting the dynamics of nonlinear multiscale and multiphysics electrochemical systems in applications to battery health and safety, including state of health (SOH) estimation [12], cycle life prediction [13,14], remaining useful life (RUL) prediction [15,16], internal short circuit detection [17] and safety envelope and risk prediction [18,19]. The canonical predictive methods that use experimental datasets generated under well-controlled laboratory test conditions provide an organizational framework for the investigation of a specific mechanism and for a deeper understanding of the physicochemical nature of the charge storage processes. Therefore, most of the current approaches for battery diagnosis and prognosis are based on relatively small but well-designed experiments. However, in real-world EV applications, the batteries experience diverse aging mechanisms and complex operating conditions. The challenge is further complicated by the cell-to-cell variation in the battery pack [20]. As a result, a new solution for the prognostics and health management of lithium-ion batteries for EV applications has been developed: cloud-based, AI-powered battery diagnosis and prognosis. The interconnected framework that links cell behavior under various realistic conditions to cell performance (health and safety) enables more intelligent data processing and analysis for AI modeling by uploading the battery operating data to a cloud [21]. Over the past few years, cloud-based battery prognostics and health management has attracted great attention from both industry and academia [22][23][24]. This closed-loop digital solution opens up new opportunities for accurate battery diagnosis and prognosis, and for improving safe operation in different environments over the battery lifetime.
This paper begins with a brief review and assessment of the recent advances in data-driven, machine learning approaches for the diagnosis and prognosis of battery health and safety using experimental data and proceeds to provide a comprehensive review of the battery diagnostic and prognostic results using in-vehicle field data. Then, the value and key issues of using the in-vehicle battery data for battery prognostics and health management are presented. Finally, we provide perspectives and outlooks for battery full-time diagnosis and prognosis using machine learning techniques and cloud-to-edge interaction approaches to providing battery health reports to vehicle users.

Approaches for Battery Diagnosis and Prognosis Using Laboratory Data
Battery degradation is a complex physicochemical process that involves a variety of electrochemical side reactions in the anode, electrolyte, and cathode [25][26][27]. Over the past decade, the performance of lithium-ion batteries has greatly improved, as a wide variety of advanced electrode active materials and new cell designs have been developed [27]. Yet, battery condition remains a concern to EV drivers in terms of battery Ah capacity fade and safety risk. Battery prognostics and health management has two important tasks: assessing the state of health (SOH) [28][29][30] and the state of safety (SOS) [6,31]. Accurate battery SOH estimation is essential for quantitatively predicting the expected battery lifespan and driving range as the battery degrades. The safety issue is more uncertain and relates to safe operation of the EV and possibly the life of the vehicle driver. This section focuses on recent advances in data-driven techniques for battery diagnosis and prognosis in terms of health and safety using experimental data (Figure 2).

Figure 2.
Framework of data-driven battery diagnosis and prognosis. Abbreviations: CC-CV, constant current-constant voltage; DST, dynamic stress test; HPPC, hybrid pulse power characterization; ISC, internal short circuit; ESC, external short circuit; ARC, accelerating rate calorimetry.

State of Health
Battery SOH describes the performance of a battery at present or in the foreseeable future. For EV applications, battery performance is generally measured as Ah capacity or cycle life. A battery is generally considered to have reached the end of its useful operating life when it retains 80% of its initial capacity. Over the past decade, considerable progress has been made towards accurate estimation of battery SOH. For example, well-designed machine learning techniques have been developed for battery SOH estimation using parametric and non-parametric algorithms [12]. To probe the parameter space, a comprehensive dataset of 179 cells cycled under various conditions, drawn from openly shared datasets, was used to train, validate and test the machine learning models. The model achieved a root mean squared percent error of 0.45% in predicted capacity (Ah) and benefited from the use of uncertainty management techniques (calibration error and accuracy measures). RUL is another quantity often used to characterize SOH; it is commonly expressed as the number of complete charge-discharge cycles remaining before the end of life. A representative example using data-driven tools to both predict and classify cells by cycle life for a dataset of 124 lithium-ion batteries is given in [13]. In this case, the machine learning model achieved a 9.1% test error when using the first 100 cycles to predict the cycle life of cells under fast-charging conditions. The proposed early-prediction model, combined with Bayesian optimization, demonstrated good success in accurately predicting the final cycle life and in optimizing the parameter space of charging protocols [32]. Deriving from the Bayesian framework, Gaussian process regression (GPR) is another commonly used probabilistic machine learning technique for battery diagnosis and prognosis. For example, complex degradation behaviors of lithium-ion batteries were fitted through systematic kernel function selection for the GPR model [33].
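To make the quantities used in this subsection concrete, the following minimal sketch computes SOH as capacity retention, locates the end of life at the 80% threshold mentioned above, derives a cycle-based RUL, and evaluates a root mean squared percent error. The function names and the synthetic linear fade curve are illustrative assumptions and are not taken from [12] or [13].

```python
import numpy as np

def soh(capacity_ah, initial_capacity_ah):
    """State of health as the fraction of initial capacity retained."""
    return np.asarray(capacity_ah, dtype=float) / float(initial_capacity_ah)

def end_of_life_cycle(capacity_ah, threshold=0.8):
    """First cycle index (0-based) at which SOH falls below the threshold.

    Returns None if the threshold is never crossed.
    """
    s = soh(capacity_ah, capacity_ah[0])
    below = np.flatnonzero(s < threshold)
    return int(below[0]) if below.size else None

def remaining_useful_life(capacity_ah, current_cycle, threshold=0.8):
    """Cycles left between current_cycle and the end-of-life cycle."""
    eol = end_of_life_cycle(capacity_ah, threshold)
    return None if eol is None else max(eol - current_cycle, 0)

def rmspe(y_true, y_pred):
    """Root mean squared percent error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(((y_true - y_pred) / y_true) ** 2)) * 100.0)

# Synthetic example (assumption): a 2.0 Ah cell losing 0.3 mAh per cycle.
cycles = np.arange(2000)
capacity = 2.0 - 0.0003 * cycles

eol = end_of_life_cycle(capacity)
rul_at_100 = remaining_useful_life(capacity, current_cycle=100)
print(eol, rul_at_100)
```

Real capacity fade is rarely linear, so in practice the capacity trajectory passed to these helpers would come from measurements or from a fitted model rather than from a closed-form expression.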

Over the past decade, a fast-emerging branch of artificial intelligence (AI) named deep learning has been helping both industry groups and academic researchers [34,35]. Deep learning has spread across different segments of the market and has proved to be a major step forward in the electrochemical and physical realms [36,37]. For example, an encoder-decoder model based on deep learning was proposed for mapping the relationship between battery charging data and the corresponding SOH [38]. Experimental results show that the hybrid neural network architectures have good transferability and generalization for different types of batteries. More recently, transfer learning has demonstrated remarkable power in achieving accurate health prediction for unseen battery discharge protocols [39]. The model can adaptively predict the battery health condition (capacity and RUL) based on a transfer learning model pre-trained on a comprehensive dataset of 77 cells. These studies demonstrated that domain knowledge-based learning provides tools for learning the complex battery system. However, although purely data-driven models may fit observed data very well, their projections may be physically implausible as a result of extrapolation or observational biases. Therefore, a promising direction for intelligent learning on multiphysics and multiscale systems is physics-informed machine learning (PIML), which can automatically satisfy some of the physical invariants by embedding physics and related domain knowledge into machine learning [40]. Such physics-informed learning is also referred to as physics-informed neural networks (PINNs) when the physical laws are enforced in specialized neural networks [41]. In applications to battery research, PIML has been demonstrated to be a powerful tool for learning nonlinear electrochemical systems in a supervised data-driven manner, from electrode-level state estimation [42] to nondestructive battery SOH prognostics [43,44]. These studies demonstrated that combining data generation with machine learning offers great opportunities to learn and predict battery behavior. However, owing to model complexity and computing cost, accelerating technology transfer from academia to industrial applications has long been a problem.
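The idea of embedding physics into the training objective, as described above, is commonly written as a composite loss. The generic form below is a sketch under stated assumptions: the symbols are chosen here for illustration (network output $u_\theta$, measured values $u_i$, differential operator $\mathcal{N}$ of the governing equation written in residual form, weighting factor $\lambda$) and do not reproduce the specific formulation of any of the cited battery studies.

```latex
\mathcal{L}(\theta)
  = \underbrace{\frac{1}{N_d}\sum_{i=1}^{N_d}
      \bigl\| u_\theta(x_i, t_i) - u_i \bigr\|^2}_{\text{data misfit}}
  + \lambda\,
    \underbrace{\frac{1}{N_r}\sum_{j=1}^{N_r}
      \bigl\| \mathcal{N}[u_\theta](x_j, t_j) \bigr\|^2}_{\text{physics residual}}
```

The first term fits the $N_d$ labeled measurements, while the second penalizes violations of the governing equation at $N_r$ collocation points, which is what discourages physically implausible extrapolation.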
Considering that in EV applications only a few time-resolved parameters (e.g., voltage, current, and temperature) are available and that onboard computing power is limited, methods that offer simple, effective tools are more desirable. In this regard, incremental capacity (IC) and differential voltage (DV) analysis have been widely used over the past decade. Differential methods have been demonstrated to be a powerful tool for pinpointing degradation mechanisms and quantifying the degree of degradation for a variety of commercial lithium-ion batteries, including LiNiCoAl (NCA) [45], LiNiMnCo (NMC) [46][47][48][49], LiFePO4 (LFP) [50][51][52], LiMnO2 composite (LMO/NMC) [53], and LiCoO2 (LCO) [54]. Studies have shown that the features (e.g., the location of the peaks) extracted from differential curves can effectively reflect degradation modes and mechanisms: conductivity loss, loss of active material, and loss of lithium inventory [55,56]. The combination of IC/DV features and data-driven methods has demonstrated remarkable power in achieving accurate estimation of battery SOH in real-world applications, as discussed in Section 3.1.
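As an illustration of the differential methods described above, the following minimal sketch computes an incremental capacity curve (dQ/dV) from a charge record and extracts its main peak as a candidate health feature. The function names, the voltage step of 5 mV, and the synthetic charge curve are assumptions made for this example; the sketch also assumes that voltage rises monotonically, as in a constant-current charge.

```python
import numpy as np

def incremental_capacity(voltage_v, capacity_ah, dv=0.005):
    """Approximate dQ/dV (Ah per volt) on a uniform voltage grid.

    Assumes voltage_v is monotonically increasing, which np.interp requires.
    """
    v = np.asarray(voltage_v, dtype=float)
    q = np.asarray(capacity_ah, dtype=float)
    grid = np.arange(v.min(), v.max(), dv)
    q_on_grid = np.interp(grid, v, q)   # resample charge onto the grid
    ic = np.gradient(q_on_grid, grid)   # finite-difference derivative
    return grid, ic

def main_peak(grid, ic):
    """Voltage and height of the largest IC peak."""
    i = int(np.argmax(ic))
    return float(grid[i]), float(ic[i])

# Synthetic charge curve (assumption): a plateau near 3.7 V on a linear slope.
v = np.linspace(3.0, 4.2, 1201)
q = 0.5 * (1.0 + np.tanh((v - 3.7) / 0.05)) + 0.5 * (v - 3.0)

grid, ic = incremental_capacity(v, q)
peak_v, peak_ic = main_peak(grid, ic)
print(round(peak_v, 3), round(peak_ic, 2))
```

Measured data are noisy, and differentiation amplifies noise, so a smoothing or filtering step before the derivative is usually needed; it is omitted here to keep the sketch short.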

State of Safety
SOS is a relatively new term, and so far there is no commonly accepted core set of metrics to describe it. One definition was proposed under the assumption that battery safety is inversely related to abuse, such as mechanical, electrical, and thermal abuse [57]. In that case, SOS can be projected from a series of abuse experiments according to how difficult it is to cause battery failure. Another approach to battery safety is to attempt to identify conditions in a cell that indicate a high probability of thermal runaway occurring in the near future. This approach would require the analysis of battery data during the system's operational lifetime. The objectives would be to identify cells exhibiting abnormal behavior indicating self-discharge [58], internal short circuit [59], external short circuit [60], lithium plating [61], oxygen release from the charged electrodes [62] and expansion force [63] at the cell level, and inconsistency information at the pack level, as well as data analysis at the system level (Figure 3).
Owing to the rarity of battery failure (estimates for 18650-NCA batteries that fail catastrophically are 1 in several million cells [65]) and the privacy of consumer data [66], one method commonly used by researchers is to trigger an internal short circuit artificially, for example by intentionally inducing an internal defect or an abuse condition such as a mechanical, electrical, or thermal event. For example, an internal short circuit was induced by mechanical abuse, and the data generated by the abuse tests were used for supervised machine learning modeling [67]. The short circuit fault was identified based on equivalent electric circuit models developed for both healthy and faulty batteries. A random forest-based classifier effectively learned the healthy and faulty features from the charge-discharge data. In addition to equivalent electric circuit models, there has been progress in understanding multiphysics systems using innovative approaches, such as finite element models, to solve the partial differential equations that characterize material properties under mechanical abuse conditions [68][69][70][71]. For example, the safety envelope of a lithium-ion battery under mechanical loading conditions was developed by leveraging three finite element models (2D axisymmetric, 2D plane-strain, and the 3D full model) [18]. The classification models provided an accurate and efficient approach to predicting the short circuit and safe conditions, while the regression models quantitatively predicted the intrusion, force, and kinetic energy related to the resulting internal short circuit. Another work also highlighted the advantage of combining the finite element model with machine learning tools to predict the behavior of battery systems under mechanical abuse conditions [19]. Beyond the internal short circuit, an external short circuit is another electrical situation that can trigger battery failure. An extreme learning machine-based thermal (ELMT) model has been used to estimate cell temperature under external short circuit [72]. At the pack level in EV applications, not all cells are equipped with current sensors because of space limitations and manufacturing cost. Therefore, estimating the current provides a means of recognizing the risk of an external short circuit. For example, an artificial neural network (ANN)-based method was proposed to estimate the current of the short-circuited cell using voltage information [73]. Based on the estimated current, an electro-thermal coupling model was developed and used to predict the temperature distribution and temperature increase in the faulty cell.
More recently, a data-driven fusion model, named the multimode and multi-task thermal propagation forecasting method, was proposed for quantitative, multi-step advance prediction of battery thermal runaway propagation [74]. The early warning strategy was developed based on detailed data generated from accelerating rate calorimetry (ARC) tests and thermal runaway propagation experiments on a hand-made battery module (5 cells in series) in an explosion-proof box. The method provides tools for analyzing thermal runaway propagation at the module level and would be helpful in designing fire propagation suppression methods.
Unsupervised learning has demonstrated remarkable power in automatically discerning multiple categories (risk of failure/fault and safety) in a collection of observed data [75]. Clustering analysis is the most common unsupervised learning task. For example, a K-shape-based time series hierarchical clustering method was developed to identify whether batteries are at risk of failure from a large amount of operating data in a data center [76]. In another work, unsupervised learning based on principal component analysis (PCA) offered an effective and simple method for detecting battery voltage anomalies in a large battery system consisting of 432 lithium-ion cells [77]. The core idea of the anomaly detection is cell-to-cell variation and comparison at the pack level. Beyond clustering analysis and PCA, the autoencoder-based ANN is an emerging unsupervised learning technique for capturing latent representations from the input data. A recent study showed that graph-based autoencoders have strong fault detection abilities for sensor faults, connection faults and external short circuit faults [78]. Based on a dataset generated from abuse tests on a well-designed battery pack consisting of 5 cells in series, this work achieved reliable and fast detection of mixed faults that occur simultaneously.
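The pack-level comparison idea described above can be sketched with a small PCA residual check. In this illustrative example, PCA is fitted on a window of healthy cell voltages, where the dominant component captures the behavior shared by all cells, and a cell is flagged when its voltage departs from that shared behavior. The function names, the synthetic 12-cell pack, and the injected 50 mV sag are assumptions made for this sketch; it is not the procedure used in [77].

```python
import numpy as np

def fit_pca(X, n_components=1):
    """Fit PCA on X with shape (time steps, cells) using the SVD."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]      # rows are principal directions

def cell_residuals(X, mean, components):
    """Part of each cell voltage not explained by the principal subspace."""
    centered = X - mean
    reconstructed = centered @ components.T @ components
    return centered - reconstructed     # shape (time steps, cells)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 500)
common = 3.6 + 0.3 * np.sin(2.0 * np.pi * t)   # shared pack behavior
n_cells = 12

# Healthy reference window: every cell follows the common profile plus noise.
healthy = common[:, None] + 0.002 * rng.standard_normal((500, n_cells))
mean, comps = fit_pca(healthy, n_components=1)

# Test window: cell 7 develops a 50 mV sag halfway through (injected fault).
test = common[:, None] + 0.002 * rng.standard_normal((500, n_cells))
test[250:, 7] -= 0.05

res = cell_residuals(test, mean, comps)
score = np.sqrt((res[250:] ** 2).mean(axis=0))  # RMS residual per cell
suspect = int(np.argmax(score))
print(suspect)
```

Because the detection relies only on differences between cells, it needs no labeled fault data, which is the main attraction of this family of methods for large packs.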
In addition to failures under abuse conditions, foreign object debris or material defects introduced during manufacturing may also trigger hazardous battery failure during the system's operational lifetime [79,80]. Therefore, there is a pressing need for the analysis and detection of internal structures and defects at the material level. In a recent work, this difficult task was accomplished by physics-informed learning [81]. By integrating known PDEs in solid mechanics with neural networks, the physics-informed deep learning approach can be effectively used to predict the topology, geometry, material properties, and nonlinear deformation of an internal void/inclusion, as well as the elastic modulus of the inclusion. However, real-life physical problems such as a lithium-ion battery, with hundreds of uncertain parameters and boundary conditions, often render such specialized network architectures infeasible in practice.

Battery Diagnosis and Prognosis Using Field Data from EVs
The research findings discussed in Section 2 provide valuable insights into physical mechanisms and offer tools for methodological exploration. However, what matters is the battery performance under realistic conditions. Learning multiphysics systems using data-driven techniques can be computationally expensive. Big data storage and modelling can quickly become costly and time consuming, which makes them operationally burdensome for the onboard BMS. This is where cloud-based solutions come into play [82,83]. Cloud-based solutions using in-vehicle field battery data with proper privacy protection [84] offer new opportunities for battery diagnosis in real-world EV applications. Cloud computing has attracted great attention in academia. A big data-based monitoring platform, named the Service and Management Center for Electric Vehicles (SMC-EV), was established in Beijing, China in 2011 [85]. Its main function is to support in-depth analysis and research on battery systems by collecting the operating data of EVs, such as voltage and temperature. Since the center was established, a series of studies covering a significant number of battery faults have been conducted. For example, a data-driven machine learning technique was developed for fault diagnosis of battery systems [86]. It can effectively detect abnormal voltage by using a multi-level screening strategy. Further, the statistical analysis supported the hypothesis that the frequency of battery faults drops sharply at low temperature (winter). The application of machine learning to battery failure analysis using field data is illustrated in Figure 4.

Figure 4.
[...] based on the pre-trained models. Cell balancing and thermal management provided by the BMS are fundamental to the safe operation of the battery system. The field data can be recorded and uploaded to a cloud. Cloud BMS offers opportunities for learning the complex battery system using data-driven, machine learning approaches combined with data processing and feature engineering/selection. The data-driven models on the cloud offer a viable path for analyzing and interpreting the learning tools used for battery performance evaluation and thus for generating a diagnosis report, which plays a key role in optimizing the protocols used in prognostics and health management.

State of Health
Researchers in general do not have access to the massive in-vehicle battery data being collected by battery manufacturers, auto companies and national-level monitoring center from the millions of EVs on the road worldwide. Those data could be used by researchers worldwide as datasets needed to accelerate the adoption of machine learning and digital intelligence to automotive battery applications. However, existing research on data-driven techniques for battery diagnosis and prognosis cannot utilize those data due to the ownership and privacy of companies generating the data during the daily operation of the EVs. With the improvement of the computational power and data storage capability, the field data can be uploaded to a Cloud, which offer new opportunities for researchers and engineers to develop diagnostics and prognostics technologies towards battery fulllifespan management. For example, a comprehensive dataset consisting of 147 vehicles cycled under realistic conditions was generated using cloud-based solution [87]. The result demonstrated that daily charging data (voltage, current and temperature vs. time) with 30 s sampling interval are capable of mapping the battery ageing process. Another example was the SOH estimation for a battery pack using feedforward neural network models based on the data collected by Shanghai Electric Vehicle Public Data Collecting, Monitoring and Research Center [88]. Four health features were extracted from historical operating data, including accumulated mileage of vehicles, C-rates distribution during battery cycling, intensity of SOC ranges and distribution of cell temperatures. The result showed that the analytical methods used achieved a maximum relative error of 4.5% for the 700 vehicles (300 pure EVs and 400 HEVs using NMC type cells) for the data collected by the big data platform for different driving mode. 
In addition to the SOH of passenger vehicles, accurate battery aging assessment for electric buses is valuable information for bus companies. One study has shown that feature-based neural networks can effectively establish the relationship between the battery aging level (the third peak values of IC curves) and real-world operation features (accumulated mileage of vehicles, initial charging SOC, average charging current, and average operating temperature of bus battery systems) [89].
In the case of cloud-based battery diagnosis, our research team has achieved accurate estimation and prediction of the SOH of battery systems using field data on the cloud platform (Figure 5). The work demonstrated that feature-based (IC and DV analysis) machine learning is robust and highly compatible with different charging-discharging cycles after well-designed data processing (e.g., filter design for denoising of the field data) and feature extraction using differential methods, which provides powerful tools to predict the health condition (capacity) of every battery cell within a pack for real-life EV applications. Broadly speaking, the work highlights the promise of combining in-vehicle battery data and data-driven methods for modelling and predicting the evolution of multiphysics and multiscale electrochemical systems with missing and noisy data in a supervised manner. Reliable and accurate capacity estimation and prediction using sensor data collected from the device will create great value for prognostics and health management by ensuring the efficient and safe operation of battery systems under various dynamic operating conditions.
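As an illustration of the feature-based approach, the following minimal sketch computes an incremental capacity (IC) curve, dQ/dV, from a constant-current charging segment and locates its largest peak. The moving-average filter, the 0.01 V voltage step, the function names and the synthetic charging data are assumptions made for illustration only; they are not the processing chain used in the work described above.

```python
import math


def moving_average(values, window=5):
    """Centred moving average, used here as a stand-in denoising filter."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def ic_curve(time_s, current_a, voltage_v, dv=0.01):
    """Return (voltage, dQ/dV in Ah/V) points, one per voltage rise of at least dv."""
    volts = moving_average(voltage_v)
    # Coulomb counting: cumulative charge in Ah (trapezoidal rule).
    charge = [0.0]
    for k in range(1, len(time_s)):
        dt_h = (time_s[k] - time_s[k - 1]) / 3600.0
        charge.append(charge[-1] + 0.5 * (current_a[k] + current_a[k - 1]) * dt_h)
    points = []
    start = 0
    for k in range(1, len(volts)):
        step = volts[k] - volts[start]
        if step >= dv:
            mid_voltage = 0.5 * (volts[k] + volts[start])
            points.append((mid_voltage, (charge[k] - charge[start]) / step))
            start = k
    return points


def main_peak(points):
    """Largest IC peak as (voltage at peak, peak height)."""
    return max(points, key=lambda p: p[1])


# Synthetic 10 A charge sampled every 30 s, with a voltage plateau near 3.8 V.
n = 600
time_s = [30.0 * k for k in range(n)]
current_a = [10.0] * n
voltage_v = [3.4 + 0.8 * k / (n - 1) + 0.1 * math.sin(2 * math.pi * k / (n - 1))
             for k in range(n)]
peak_voltage, peak_height = main_peak(ic_curve(time_s, current_a, voltage_v))
```

In such a scheme, the peak position and height would serve as candidate health features for a downstream regression model; with real field data the filter design and voltage step would need to be tuned to the noise level and sampling interval.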

Figure 5.
Cloud-based battery SOH estimation and prediction for real-world EV applications. Infrastructure and software constitute essential elements of a cloud platform. The sensor data generated by Internet of things (IoT)-enabled vehicles can be seamlessly delivered to a cloud. Learning algorithms and physics-based modelling can be run on a cloud using both the battery parameters generated during the daily operation of the battery system and the end-use behaviors during the daily operation of the vehicle. Over the air (OTA) technology provides tools for distributing new software developed in an offline environment to the onboard actuators. The machine learning pipeline, composed of data processing, feature engineering, feature selection and machine learning modeling, plays a crucial role in mapping from inputs (raw data or engineered features) to the target variable (capacity, cycle life, etc.).

State of Safety
As discussed earlier, due to the rarity of battery failure, it is very difficult to generate enough experimental data in the laboratory to cover the entire spectrum of battery failure conditions and mechanisms. This would require cycling millions of normal cells for several months to years [90,91]. The best sources of such data are the millions of EVs operating daily on the roads around the world. Cloud-based solutions offer new opportunities for battery diagnosis and prognosis in real-world applications. As noted in the previous section, a big data-based monitoring platform named SMC-EV was established in Beijing, China in 2011. In 2017, a national-level big-data platform, the National Monitoring and Management Platform for New Energy Vehicles (NMMP-NEV), was established as an upgrade of the city-level SMC-EV [92]. The new monitoring platform provides more flexible data-driven models for fault diagnosis and prognosis of millions of EVs (6 million by 2022). For example, a multi-scale entropy method was developed for fault feature extraction, which improves the prognosis sensitivity by avoiding entropy fluctuations and information redundancy; it was analyzed and verified using an electric bus thermal runaway accident [93]. After a full fast charge of 90 min, the bus started to smoke after travelling about one kilometer and then spontaneously ignited. Using data provided by the NMMP-NEV, the analysis method effectively extracted and located abnormal signals before the battery anomaly became apparent. The authors of the study also claim that it is possible to detect the high-risk, abnormal cells as early as one week before the failure. As a follow-up study, the research team presented a real-time diagnosis and prognosis technique using the normalized discrete wavelet decomposition algorithm based on real failure case analysis and vibration tests [94].
The experiments verified that battery connection faults can be a triggering factor of thermal runaway. The proposed method provides effective tools to extract and locate the early hidden fault signals. Another real-world thermal runaway case was also reported using NMMP-NEV [95]. In order to investigate the evolution of nonlinear lithium-ion battery failure under overcharge abuse, a 3D electrochemical-thermal coupled model was developed. Based on the results obtained with the coupled model, the study concluded that overcharge, resulting from long-term voltage inconsistencies within the battery pack, is a primary cause of real-world battery thermal runaway. It demonstrated that cell balancing and adaptive control of the operating voltage window as capacity fades play important roles in the safe operation of battery systems over their long-term service life. Recently, a dataset covering 3 real-world EVs (2 thermal runaway cases and 1 normal one) was extracted from the monitoring platform for data-driven modeling of an early warning system [96]. By applying the discrete Fréchet distance and the local outlier factor to the cell voltage and temperature, the early warning system provided tools to quantify the correlation between the normal and faulty cells. The high-frequency (0.1 Hz) sampling monitoring system is a major step forward for this task.
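The idea of comparing each cell's voltage trajectory with the rest of the pack can be sketched as follows. The discrete Fréchet distance is implemented with the standard dynamic-programming recurrence; the pack-median reference curve, the fixed 0.05 V threshold (used here in place of the local outlier factor for brevity), the function names and the example voltages are all assumptions for illustration and do not reproduce the early warning system of [96].

```python
import statistics


def discrete_frechet(p, q):
    """Discrete Frechet distance between two 1-D sequences (point distance |a - b|)."""
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(p[i] - q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[-1][-1]


def flag_outlier_cells(cell_voltages, threshold_v=0.05):
    """Flag cells whose voltage curve is far from the pack-median curve.

    cell_voltages maps a cell name to a list of voltages sampled at common times.
    The threshold is a hypothetical value, not one taken from the literature.
    """
    names = list(cell_voltages)
    n_steps = len(cell_voltages[names[0]])
    reference = [statistics.median(cell_voltages[c][k] for c in names)
                 for k in range(n_steps)]
    return [c for c in names
            if discrete_frechet(cell_voltages[c], reference) > threshold_v]


# Hypothetical charging segment: cell "c4" drifts away from the other three.
pack = {
    "c1": [3.60, 3.62, 3.65, 3.70],
    "c2": [3.60, 3.63, 3.65, 3.70],
    "c3": [3.61, 3.62, 3.66, 3.70],
    "c4": [3.60, 3.58, 3.55, 3.50],
}
suspect_cells = flag_outlier_cells(pack)
```

A density-based score such as the local outlier factor would replace the fixed threshold when the spread of normal cells varies with temperature, SOC or pack age.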
In the same period, the China Southwest Branch of the National Monitoring and Management Center for New Energy Vehicles was established to monitor and diagnose the EVs in southwest China [97]. Based on real-time investigation and targeted analysis of real cases, a data-driven fault diagnosis and early warning system was proposed [17]. Four EVs with three types of health conditions (potential failures, thermal runaway and normal) were adopted for the model validation. The results showed that the proposed state representation methodology can effectively detect subtle changes in the cell voltage. In the case of battery failure analysis, we have more recently developed a semi-supervised machine learning system [24]. The proposed data-driven models can be used both to predict battery health and to classify batteries by health condition based on the observational, empirical, physical, and statistical understanding of the multiphysics and multiscale systems. The cloud-based framework (Figure 5) provides powerful tools for seamlessly learning from the historical battery data and generating longitudinal electronic health records in cyberspace. Our findings highlight the need for cloud-based, AI-powered technology tailored to predict battery failure in real-world applications.

The Value and Key Issues for the Use of in-Vehicle Data to Monitor Battery Condition
There are two primary reasons that tracking the performance and health of the battery in an EV is important. First, there are concerns regarding the safety of the battery pack and the risk of a sudden failure of a cell in it. Second, the performance (Ah capacity) of the battery pack will degrade with use, and the EV owner needs to know the extent of the degradation, both for their own use and as information to transfer to a second owner of the EV. In addition, the battery and auto manufacturers need to know how well their new products are functioning in real-world use. The manufacturers are presently tracking the performance of the EVs they have sold, but the data are not available to the car owner or the public.

Tracking EV Battery Performance and Health/Safety
Monitoring and storing the data from which the performance and health of the batteries in EVs can be determined is critical to their safe operation over their lifetime of 10-15 years. The performance and health of a battery pack and its cells depend on many factors, and no two battery packs will have the same history. The fact that the battery will be charged and discharged 1000-2000 times over its lifetime, and that no two charge-discharge cycles will be the same, makes prediction of the health of a particular battery nearly impossible. Further, the battery pack consists of hundreds of cells in series and parallel, and no two cells are exactly the same due to the complexity of their design and construction. Small defects in manufacture and/or impurities/nano-metal particles in the electrode of a single cell can lead to catastrophic failure (thermal runaway) of the pack after an unknowable time (several months to years). Hence, the only realistic approach to tracking the performance and health of the cells is to measure their voltage and temperature, and possibly current, as the EV is being driven and as the battery is being charged, and to store the data for later analysis. The volume of data from a single battery for even several days or weeks is very large, and the analysis of the data requires considerable computing power.
Laboratory testing can provide insights into specific mechanisms that can guide physics-based models to detect risks of battery failure, and into the physical mechanisms that govern the electrochemical degradation of the cells. However, when battery diagnosis and prognosis for real-world EV applications is the core task, there is a huge gap between laboratory and real-world battery tracking data in terms of quantity and relevance (Figure 6). Therefore, one promising area of research is the combination of a small number of experiments with a large volume of field data, which offers opportunities for solving real-life physical problems with high-dimensional parameter spaces in the continuous space-time domain.

Figure 6. Combination of physics-informed modelling with in-vehicle battery data.

Issues in Using Cloud-Stored Battery Field Data
Measurement and collection of the battery tracking data are difficult in the vehicle environment. Difficulties are exacerbated because, in the case of EV applications, data must be taken for both the cells and the pack. In the pack, hundreds or even thousands of cells are connected in series (and parallel), making installation of instrumentation difficult. Data are taken amid the electronic noise from the electric drive unit of the vehicle, which makes the battery data noisy in some cases. Battery data are taken while the EV is being driven and while the vehicle is parked at a battery charger. The battery voltage and current change rapidly as the EV changes speed in stop-go traffic, resulting in uncertainty in the cycling pattern of the battery at any particular time step.
The data taken while the battery is being charged are particularly important because most of the data-driven battery analyses utilize charging data to evaluate battery health. Taking data during charging is less difficult than during driving because the currents are much lower, the changes in voltage much slower, and the pattern of battery use more predictable. Much of the degradation of the cells occurs during charging and the results of that degradation are more evident by studying the charging data. The actions of the BMS are very important during charging as it sets the charging protocol including cell balancing and setting cell voltage limits at the conclusion of the charge. Since actions (normal and malfunctions) of the BMS can affect battery degradation, the expected control of the charge by the BMS should be included in the tracking data.
The classical machine learning routine requires a dataset to train the model on the battery before the model is applied to a large dataset to project future characteristics of the battery [7,12,13]. In the case of in-vehicle batteries, the model would be trained using test data from the initial charge-discharge cycles in the testing. Ideally, nominal information on the new cells and the pack will be available from the battery manufacturer. Careful attention must be given to how the machine learning models are trained before the tracking testing is started. This could include some limited laboratory testing.
The battery parameters to be determined and the strategy for data transmission to the cloud should be carefully designed. For example, the sampling frequency (e.g., 1 s) and transmission cycle (e.g., 1 to several weeks) play a crucial role in the accuracy of the tracking testing. A higher sampling frequency and a shorter transmission cycle place higher demands on the cloud infrastructure for data storage and computing. In the absence of a commonly accepted core group of metrics, the diagnosis reports from different cloud suppliers may reach very different conclusions, especially regarding the risk of battery failure. Therefore, there is a compelling need to establish more sophisticated criteria for evaluating battery systems when they are working in an EV environment.
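The trade-off between sampling interval, transmission cycle and cloud resources can be made concrete with a simple estimate of the raw data volume per upload. The sketch below is a back-of-the-envelope calculation; the channel count, logged hours per day and 4 bytes per value are hypothetical figures chosen for illustration, and compression or protocol overhead is ignored.

```python
def upload_size_bytes(n_channels, sample_period_s, hours_logged_per_day,
                      days_per_upload, bytes_per_value=4):
    """Raw size of one upload: samples per day x days x channels x bytes per value."""
    samples_per_day = int(hours_logged_per_day * 3600 / sample_period_s)
    return samples_per_day * days_per_upload * n_channels * bytes_per_value


# Hypothetical pack with 100 logged channels, 2 h of logging per day, weekly upload.
size_1s = upload_size_bytes(100, 1, 2, 7)    # 1 s sampling interval
size_30s = upload_size_bytes(100, 30, 2, 7)  # 30 s sampling interval
ratio = size_1s / size_30s
```

Under these assumptions, moving from a 30 s to a 1 s sampling interval multiplies the raw volume per vehicle by 30, which scales directly with fleet size on the cloud side.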

Outlook for Battery Prognosis in EV Applications
There seems to be little doubt that cloud-based storage and computing can be utilized to improve the operation of vehicle and battery systems [21]. Cloud-based software is being developed by several large companies, among them Bosch [98], Panasonic [99] and Huawei [100]. The Bosch product, named Battery in the Cloud, uses big data from vehicle fleets and digital twins; it is now on the market and is claimed to increase cycle life by 20%. The universal battery management cloud (UBMC) service offered by Panasonic provides tools for ascertaining the battery state and optimizing battery operation. The project launched by Huawei aims to provide a public cloud service for EV companies focusing on the remote monitoring and diagnosis of battery systems. In discussions of the software (often termed software as a service, SaaS), machine learning and AI technologies are often mentioned. These developments are in progress worldwide, in China, Europe, and the United States. Edge computing is playing an increasing role in these cloud-based systems [101,102]. Various aspects of the application of cloud-based prognostics and health management (Figure 7) to the battery systems in real-world EV applications are presented and analyzed in this section.

Cloud-Edge Applications and Battery Digital Twins
Batteries are complex electrochemical systems that are difficult to analyze and model. Most battery models have many uncertain parameters relating to material properties, cell dimensions, and/or component values, as in equivalent circuit models. The model accuracy can be improved if test data for the battery are available to determine the parameters, so that the battery model predicts with high accuracy the performance of the battery over a wide range of vehicle operating conditions and time scales (seconds to years). The resulting model of the battery is termed its digital twin [103,104]. The physical battery of interest is in the EV, and the digital twin resides on a cloud server, which can store the very large volumes of data needed to run the physical battery model software.
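A minimal example of determining model parameters from test data is sketched below for the simplest equivalent circuit, a voltage source in series with an ohmic resistance, V = OCV - R0·I, with discharge current taken as positive. The model order, the sign convention, the assumption of a roughly constant SOC over the fitting window and the function name are illustrative choices; practical digital twins use far richer models and fitting procedures.

```python
def fit_ocv_r0(currents_a, voltages_v):
    """Least-squares fit of V = OCV - R0 * I; returns (OCV in V, R0 in ohm).

    Assumes discharge current is positive and SOC is roughly constant
    over the samples, so that OCV can be treated as a single value.
    """
    n = len(currents_a)
    mean_i = sum(currents_a) / n
    mean_v = sum(voltages_v) / n
    sxx = sum((i - mean_i) ** 2 for i in currents_a)
    sxy = sum((i - mean_i) * (v - mean_v)
              for i, v in zip(currents_a, voltages_v))
    slope = sxy / sxx            # dV/dI, negative for a resistive drop
    r0 = -slope
    ocv = mean_v - slope * mean_i
    return ocv, r0


# Hypothetical measurements generated from OCV = 3.7 V and R0 = 2 mOhm.
currents = [0.0, 10.0, 20.0, 30.0]
voltages = [3.7 - 0.002 * i for i in currents]
ocv_est, r0_est = fit_ocv_r0(currents, voltages)
```

Refitting such parameters periodically as new field data arrive is one simple way in which a cloud-hosted model could track the ageing of its physical counterpart, since ohmic resistance typically grows as the cell degrades.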
In all respects, tracking changes in battery performance and health in the laboratory is less difficult than in-vehicle testing and data interpretation. The cloud/digital twin/physical battery processes can be applied to develop approaches for battery diagnosis and prognosis [23,105]. The battery data from the EV can be stored in the cloud and used to analyze the battery with the machine learning techniques discussed in Sections 2 and 3. The first products being developed are directed at assessing the battery state and at optimizing the control of battery operation during both driving and charging in order to maximize cycle life. An example of such a product is the software being developed in the UK by Watt Electric Vehicle Company [106]. The EVs involved are electric taxis in London. The digital twins of the taxi batteries are utilized to determine the battery SOH and predicted cycle life.

During the battery system's daily operation, a large volume of data is generated for the hundreds or thousands of cells under various working conditions. Real-time monitoring and diagnosis of battery systems cannot be achieved using only models deployed on the cloud platform. This inherent limitation stimulates the development of edge computing, which collects and processes data closer to the sensors and actuators and is therefore markedly more efficient [102]. Edge computing combined with cloud computing in a mesh-like topology (e.g., 1 cloud server connected to N edge devices) opens new possibilities for solving real-life physical problems with high-dimensional parameter spaces. For example, complex machine learning models using large, deep neural networks can be trained on the cloud to handle large volumes of in-vehicle battery data, after which the pre-trained parameters and indicators for prognostics and health management of battery systems can be transmitted to an edge computing node to achieve real-time monitoring and control. Cloud-edge interactions and battery digital twins provide a promising avenue for advances in solving real-life physical problems.
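The division of labor between cloud and edge can be sketched as follows: a model is fitted on the cloud from historical data, its parameters are serialized, and an edge node uses the received parameters for lightweight on-board inference. In this sketch a one-feature linear model stands in for the deep neural network, and a JSON string stands in for the transmitted payload; the class, function and field names are hypothetical.

```python
import json


def train_on_cloud(feature_values, capacities_ah):
    """Cloud side: least-squares fit capacity = intercept + slope * feature.

    Returns the fitted parameters as a JSON string ready for transmission.
    """
    n = len(feature_values)
    mean_x = sum(feature_values) / n
    mean_y = sum(capacities_ah) / n
    sxx = sum((x - mean_x) ** 2 for x in feature_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(feature_values, capacities_ah))
    slope = sxy / sxx
    return json.dumps({"slope": slope, "intercept": mean_y - slope * mean_x})


class EdgeNode:
    """Edge side: stores the latest parameters and runs inference locally."""

    def __init__(self):
        self.params = None

    def receive(self, payload):
        # Replace the on-board parameters with those sent by the cloud.
        self.params = json.loads(payload)

    def estimate_capacity(self, feature_value):
        if self.params is None:
            raise RuntimeError("no model parameters received yet")
        return self.params["intercept"] + self.params["slope"] * feature_value


# Hypothetical history: capacity (Ah) versus a health feature.
payload = train_on_cloud([1.0, 2.0, 3.0], [98.0, 96.0, 94.0])
node = EdgeNode()
node.receive(payload)
estimate = node.estimate_capacity(4.0)
```

The expensive step (fitting on the full fleet history) stays on the cloud, while the edge node only evaluates a small parameterized function, which is what makes real-time operation feasible on limited hardware.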

Full-Scale Battery Diagnosis and Prognosis for Health and Safety
The discussion in the previous section indicates that the development of products/software to assess SOH, RUL or safety using in-vehicle battery data stored on a cloud server is underway. Those products will provide valuable information to EV owners and to battery and vehicle manufacturers concerning their batteries. In Section 3.2, a brief review was given of the analysis of a limited number of battery failures for which in-vehicle data were available. The intent of those analyses was to investigate the battery conditions just before the failures in order to have a means of identifying in-vehicle batteries at high risk of failure. All the studies cited were done in China using cloud data from national cloud data centers. Similar cloud data centers that can be utilized for studies of in-vehicle battery failures do not seem to exist outside of China. Whether or not such studies are being made by the battery/auto industries or DOE National Labs in the United States is unclear at present. Battery failure studies using large volumes of in-vehicle data are needed to develop criteria for evaluating the risk of failure for different battery chemistries and pack configurations. At the present time, those criteria do not exist for any battery chemistry. A full-scale battery diagnosis and prognosis for EV applications should be conducted at the material level and the cell level as well as at the pack/system level. Further, predefined software used for the onboard BMS is not applicable to battery systems that have degraded with age. Adaptive battery management and optimization of the onboard control strategy, such as the operating voltage window, thermal management and cell balancing, can be very beneficial in reducing the risk of abuse and improving safe battery operation.

Advanced Artificial Intelligence and Machine Learning Techniques
Data-driven approaches play a central part in battery diagnosis and prognosis. Although steady progress has been made towards this goal through the development of machine learning techniques with or without physics, accurately predicting cell behavior under realistic conditions is still rather challenging. In recent years, how to endow a machine with expert human ability at certain tasks has been considered a very promising direction that may lead to a conceptual leap for scientific discovery and optimization [107]. This work can be classified into two main categories: (1) concept learning [108,109], and (2) lifelong learning [110][111][112].
Concept learning endows humans with the ability to learn new concepts from just one or a few examples, which allows us to make predictions and take reasonable decisions for the purposes of generalization, discrimination, and inference. Concept-cognitive learning (CCL) can be considered an immersive and active process that focuses on maximizing concept learning and dynamic knowledge transfer in the context of dynamical systems and chaotic states. In addition, recent research has demonstrated that the meta-learning approach provides a powerful framework for solving few-shot learning tasks [113]. The learning-to-learn mechanism (meta-learning) offers tools to learn a new prediction task from a small amount of observed data, and it is possible to exploit a wealth of observations using the new model trained by the meta-learner [114].
In most biological respects, continual lifelong learning is the instinct of living organisms to adapt to a changing world or cope with vital environmental contingencies. Inspired by this unique ability, the design of specialized machine learning models that achieve lifelong learning has earned significant attention in recent years. To address this issue, one may first need to learn how to prevent catastrophic forgetting and avoid catastrophic interference during the continual acquisition of massive amounts of multi-fidelity observational data, using methods such as elastic weight consolidation [115], brain-inspired replay-based approaches [116] and Bayesian-based learning methods [117] in a supervised or reinforced manner. Another school of thought focuses on designing input-driven self-organizing neural networks in an unsupervised fashion [118,119].
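To make the idea of preventing catastrophic forgetting concrete, the sketch below evaluates the quadratic penalty used in elastic weight consolidation: when training on a new task, each parameter is pulled back towards its value after the old task, with a stiffness given by the diagonal Fisher information estimated on the old task. The flat parameter lists, the function names and the example numbers are illustrative assumptions, and the estimation of the Fisher terms is outside the scope of the sketch.

```python
def ewc_penalty(params, old_params, fisher_diag, lam):
    """EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2."""
    return 0.5 * lam * sum(f * (p - p_old) ** 2
                           for p, p_old, f in zip(params, old_params, fisher_diag))


def ewc_loss(new_task_loss, params, old_params, fisher_diag, lam=1.0):
    """Total objective on the new task: its own loss plus the EWC penalty."""
    return new_task_loss + ewc_penalty(params, old_params, fisher_diag, lam)


# Hypothetical two-parameter model: only the first parameter has moved.
penalty = ewc_penalty([1.0, 2.0], [0.0, 2.0], [4.0, 10.0], lam=0.5)
total = ewc_loss(0.25, [1.0, 2.0], [0.0, 2.0], [4.0, 10.0], lam=0.5)
```

In a battery setting, the "old task" could be, for example, a model trained on one fleet or chemistry and the "new task" a later batch of field data, although such a mapping is an assumption rather than something established in the cited works.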

Battery Health Reports to Electric Vehicle Owners
EV owners presently receive reports from the vehicle manufacturer regarding the operation and condition of their EVs over the past month. For example, vehicle diagnostic reports are sent by email, text or the "myChevrolet Mobile" app to the owners of the Chevrolet Bolt EV [120]. The electronic health records currently include diagnostic information on the vehicle systems, such as driving and charging history, as well as required maintenance items such as tire pressure. In addition to the basic maintenance of the car, the vehicle manufacturer is also monitoring the operation and condition of the battery by wireless communications. The detailed results of the battery monitoring are not currently shared with the vehicle owner unless they lead to a recall to examine the battery. The manufacturers could share information concerning the battery, such as its current capacity and the number of cells showing abnormal voltage characteristics. This information would give the consumer peace of mind and more confidence in their vehicle. As the cloud-based, data-driven analytical methods are further developed, the manufacturers will have more detailed information to share with the vehicle owner. When transfer of ownership takes place, the auto manufacturer will have detailed information on the history and condition of the battery to give to the new owner. This will facilitate the transfer of EVs into the second/used car market and of battery packs into second-use applications [121].

Summary and Conclusions
Data-driven, machine learning-based approaches have received much attention from both academia and industry over the past decade. The research has shown promise for accurately predicting the dynamics of nonlinear multiscale and multiphysics electrochemical systems in application to battery health and safety, including SOH estimation, cycle life prediction, RUL prediction, internal short circuit detection, and safety envelope and risk prediction. The application of these methods to give meaningful information to vehicle owners is still very limited. This paper is concerned with the present status of the information available, its reliability, and how the information would be generated for the millions of EVs that will be on the road in the next decade. The information desired by EV drivers concerns the SOH of the battery and the possibility of an unexpected battery failure. Battery SOH is of continuing concern as it affects vehicle range, battery cycle life, vehicle trade-in value, and total ownership cost (TOC). Battery safety, especially a sudden failure, is also a continuing concern, but it is likely to be experienced by only a relatively small fraction of EV owners over the lifetime of their vehicle.
Analysis to determine the SOH and the risk of failure depends on the availability of appropriate battery test data. The data can be taken in the laboratory from cells or modules, or from batteries in EVs as they are used in daily service. The volume of data to be stored and analyzed from laboratory testing is very much smaller than that from studies involving battery data from fleets of EVs in use. In the latter studies, the data are stored on cloud-based servers. Laboratory testing is done to study electrochemical mechanisms that can lead to cell degradation and conditions that can lead to sudden thermal runaway events. In-vehicle battery data are taken to study the effects of real-world conditions, such as variable driving cycles, uncertain charging patterns, and small differences in cell manufacturing, on battery pack SOH and the possibility of cell failure in the pack.
Researchers have been analyzing laboratory data for many decades. The analysis of cloud-based datasets is much more recent. The cloud-based datasets can be analyzed using the data-driven approaches discussed in Sections 2.1 and 3.1 to determine the SOH of a battery, usually expressed as changes in cell capacity as the battery is used in the vehicle. The cloud-based datasets can also be combined with a battery digital twin to optimize the control of the in-vehicle battery to increase driving range and cycle life.
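As a concrete illustration of capacity-based SOH, the following minimal Python sketch estimates capacity from a logged charging segment by coulomb counting and expresses SOH as the ratio of that capacity to the rated capacity. It is not the method of any particular study reviewed here; the function names, the fixed sampling interval, and the assumption that the start and end states of charge are known are all hypothetical simplifications.

```python
def estimate_capacity_ah(currents_a, dt_s, soc_start, soc_end):
    """Estimate cell capacity (Ah) by coulomb counting over a charging segment.

    currents_a : charging current samples in amperes (assumed evenly spaced)
    dt_s       : sampling interval in seconds (hypothetical fixed value)
    soc_start, soc_end : state of charge at segment start and end, as fractions
    """
    delta_soc = soc_end - soc_start
    if delta_soc <= 0:
        raise ValueError("segment must cover a positive change in SOC")
    # Integrate current over time (rectangle rule), converting A*s to Ah.
    charge_ah = sum(currents_a) * dt_s / 3600.0
    # Scale the charge throughput to a full 0-100% SOC window.
    return charge_ah / delta_soc


def soh_percent(capacity_ah, rated_capacity_ah):
    """Capacity-based SOH: present capacity as a percentage of rated capacity."""
    return 100.0 * capacity_ah / rated_capacity_ah


if __name__ == "__main__":
    # One hour at a constant 30 A, taking the cell from 20% to 80% SOC.
    capacity = estimate_capacity_ah([30.0] * 3600, 1.0, 0.2, 0.8)
    print(round(capacity, 2), round(soh_percent(capacity, 60.0), 1))
```

In practice the SOC endpoints are themselves estimates, so the accuracy of such a calculation depends on the SOC estimation method and on current-sensor drift over the integration window.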
Unexpected, sudden battery failures occur very infrequently, most often when the EV is parked during or after battery charging. Battery failure situations can be analyzed from cloud-based data using the data-driven approaches discussed in Sections 2.2 and 3.2 applied to the cells. Risk of cell failure may be indicated by large inconsistencies in a cell's voltage compared to the other cells in the pack. These inconsistencies often become apparent during charging. There is much interest in developing methods and computer software that, when used with a cloud-based dataset, will identify cells at risk.
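The idea of flagging cells by voltage inconsistency can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm of any specific work cited in this review: each cell's voltage is compared with the pack median at every sample, and a cell is flagged when its mean absolute deviation exceeds a threshold. The function name and the 0.05 V threshold are hypothetical and would need to be tuned for a given pack.

```python
from statistics import median


def flag_inconsistent_cells(voltage_samples, threshold_v=0.05):
    """Return indices of cells whose voltage deviates persistently from the pack.

    voltage_samples : list of samples, each a list of per-cell voltages in volts
    threshold_v     : hypothetical limit on the mean absolute deviation (V)
    """
    n_cells = len(voltage_samples[0])
    total_deviation = [0.0] * n_cells
    for sample in voltage_samples:
        # The median is used as the pack reference because it is robust
        # to a small number of outlying cells.
        reference = median(sample)
        for i, v in enumerate(sample):
            total_deviation[i] += abs(v - reference)
    n_samples = len(voltage_samples)
    return [i for i, d in enumerate(total_deviation) if d / n_samples > threshold_v]


if __name__ == "__main__":
    # Three samples from a hypothetical four-cell charging record;
    # the last cell lags the others throughout.
    samples = [
        [3.70, 3.71, 3.70, 3.55],
        [3.90, 3.91, 3.90, 3.72],
        [4.10, 4.11, 4.10, 3.90],
    ]
    print(flag_inconsistent_cells(samples))
```

A fixed threshold is the simplest possible choice; the data-driven approaches surveyed in Sections 2.2 and 3.2 use richer features and statistical or learned decision rules.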
The vehicle manufacturers monitor the operation of the vehicle and track the performance of the battery as it is charged and discharged. The detailed results are not currently shared with the vehicle owner unless they lead to a recall to examine the battery. Software is now becoming available that would allow the manufacturers to share information concerning the battery condition with EV owners. This information could include the current capacity of the battery and some indicator of abnormal characteristics. This information would give the consumer peace of mind and more confidence in their vehicle. As the cloud-based, data-driven analytical methods are further developed, the manufacturers will have more detailed information to share with the vehicle owner. When transfer of ownership takes place, the auto manufacturer will have detailed information on the history and condition of the battery to give to the new owner. This will facilitate the transfer of EVs into the used car market and of battery packs into second-use applications.
Author Contributions: All authors contributed to writing the manuscript. All authors have read and agreed to the published version of the manuscript.