Lithium–Ion Battery Data: From Production to Prediction

Abstract: In our increasingly electrified society, lithium–ion batteries are a key element. To design, monitor or optimise these systems, data play a central role and are gaining increasing interest. This article is a review of data in the battery field. The authors are experimentalists who aim to provide a comprehensive overview of battery data. From data generation to the most advanced analysis techniques, this article addresses the concepts, tools and challenges related to battery informatics with a holistic approach. The different types of data production techniques are described and the most commonly used analysis methods are presented. The cost of data production and the heterogeneity of data production and analysis methods are presented as major challenges for the development of data-driven methods in this field. By providing an understandable description of battery data and their limitations, the authors aim to bridge the gap between battery experimentalists, modellers and data scientists. As a perspective, open science practices are presented as a key approach to reduce the impact of data heterogeneity and to facilitate the collaboration between battery scientists from different institutions and different branches of science.


Introduction
Energy storage systems are key to reducing greenhouse gas emissions in both the power and transport sectors. A wide range of technologies is being investigated [1]. Some examples are hydrogen-based technologies, sodium-ion batteries, lithium-ion capacitors or aqueous ammonium-ion batteries [2][3][4]. Lithium-ion batteries are the most widely used and represent the cornerstone of two growing markets: renewable energy and electric mobility [5]. Research is underway to develop more sustainable batteries that will be safer and cheaper for many uses. Experiments play a central role in this goal by informing decisions at different stages of the battery life cycle. As interest in batteries grows, so does the amount of experimental data being produced to characterise them. For example, the Licit-Eco7 laboratory at Gustave Eiffel University generates 550 gigabytes of battery data per year, while the French Electrochemical Energy Storage Network (RS2E) with its 17 academic partners generates 1 petabyte of battery data per year [6].
The amount of existing data enables the development of battery informatics, a field that aims to develop methods and software tools for understanding battery data. However, processing experimental battery data remains a challenge. As battery research is carried out by a large number of scientists who perform experiments with different approaches, experimental setups and conditions of use, data production can serve very different purposes. From battery design to defining their optimal use, data-driven approaches have been used to develop future generations of batteries and optimise the use of existing batteries. Although data-driven methods have recently gained significant interest, the field still lacks common practices, tools and standards for data generation and analysis [7][8][9].
Although standard methodologies and best practice recommendations for battery experiment design, data sharing and research publication are emerging [10][11][12], important gaps remain in the field of data generation and analysis. To the best of the authors' knowledge, no article describes the complete workflow from battery data generation to simulation and prediction, including processing techniques. This work aims to fill this gap by presenting the analysis techniques needed to process battery data, the available tools for data processing and the barriers to the adoption of open science practices in the field.

Contributions
The authors are experimentalists who wish to provide a clear description of concepts, approaches, tools and challenges related to battery data generation and analysis. The ultimate goal of this article is to accelerate battery research by facilitating the collaboration between experimentalists, modellers and data scientists. The main contributions of this review can be summarised as follows:

1. Description of data generation techniques and implications: This article provides a comprehensive description of the major battery electrical experiments. The types of experiments and the equipment needed to perform them are described. The economic and environmental costs of battery experiments are also discussed. To the best of the authors' knowledge, this is the first article to describe such implications.

2. Description of data analysis techniques: This article describes data processing for energy storage systems using the mathematical theory of time series analysis. This article lists and exhaustively describes the possible data analyses of the main battery testing methods: capacity, impedance and low current tests.

3. Description of battery data uses and related tools: This work describes the possible uses of battery data. Data modelling and prediction for energy storage systems are introduced. Existing software for data processing and usage is also described, with particular attention given to open software.

4. Discussion of open science practices in the field: This article describes open science as an important practice to be considered in the field of energy storage. This section discusses how open science practices could be made available to every researcher working on battery data, from data generation to analysis and prediction.

Layout
This article discusses several important and increasingly common questions: how battery data are produced, what data analysis techniques are needed, what the existing data analysis tools are and what tool development is needed to advance the field of battery science. This article is structured as follows: Section 2 describes the types of experiments, the equipment needed to perform them and their environmental and economic impacts. In Section 3, data processing techniques are discussed in detail. Section 4 describes the possible uses of battery data and the software available to carry them out. Finally, Section 5 describes the importance of open science practices in the battery field and gives some recommendations for their development.

Where Do Battery Data Come From?
Battery data are most often derived from either laboratory experiments or field use. Field data are essential to capture the non-regular cycling patterns and varying operating conditions that batteries experience in real-world applications [13]. However, it is difficult to understand the mechanisms occurring in a battery with such data. Field data are generated under uncontrolled conditions, which can lead to overlapping effects that are difficult to interpret. The reproducibility of data can also be complicated by this uncontrolled nature. Therefore, laboratory studies are a necessary complement to field data. In the literature, the vast majority of studies favour this approach because it allows for the complete control of experimental conditions and the ability to accurately measure battery states at any time, which eases the development of accurate models. However, laboratory testing is costly as it requires equipment and skilled personnel to ensure the correct definition of test procedures and the proper operation of the experiment.

What Experimental Setup Is Required to Perform Battery Electrical Tests?
Experiments on batteries are performed for different purposes, depending on the background and expertise of the experimenter. Electrochemists characterise the battery at the electrode or cell scale to determine material properties and their evolution over time. Their expertise is needed to understand the electrochemical behaviour of the battery, which is essential for design purposes. Electrical engineers characterise the battery at the cell or module scale to determine system properties and their evolution over time. Their expertise, along with that of mechanical and electronic engineers, is required to ensure long, safe and optimal use of the battery inside a system. The experimental setup of any battery experimenter consists of two important pieces of equipment. Firstly, a climate chamber is required, as battery behaviour is temperature-dependent. In most battery testing campaigns, this equipment is used to create a controlled atmosphere in terms of temperature and humidity.
Secondly, a test device capable of evaluating battery performance is also required, commonly referred to as a cycler. The general principle of a battery cycler is to apply a current and measure a voltage (galvanostatic mode) or the opposite (potentiostatic mode). Most cyclers use a direct current signal, but some devices can also handle alternating current signals. As the name suggests, this device is often used for cycling test campaigns. Three features make this hardware central to battery testing. First, it can handle a variety of usage modes. In real life, a battery has a wide range of applications and goes through a variety of usage phases such as rest, constant current, constant voltage, constant resistance or constant power phases. A cycler should therefore be modular and responsive enough to perform this wide range of test types, and programmable, as the experimenter should be able to define a sequence of operating modes with conditions to move from one step to the next depending on measured values (voltage, current, temperature, etc.) or calculated values (capacity, cycle count, etc.).
Time constraints are also a major challenge in battery testing. As experimental studies can take months to complete, experimenters can hardly wait for one test to be completed before starting the next. It is therefore useful to have multiple channels to run several tests independently at the same time. Most modern battery cyclers offer this feature, so that each channel can be used as if it were separate equipment and users can run experiments on different projects or batteries at the same time. Battery testing is not only time-consuming, but also involves some risks. Safety precautions must be taken to avoid incidents. Temperature, voltage or current must be carefully monitored to detect any abnormal levels and stop the test if risky conditions arise. As battery testing is often conducted over a long period of time, the user cannot always be present to monitor it. A cycler should include several safety systems, such as an emergency stop button. The ability to communicate with external systems such as a battery management system, an external alarm or a climate chamber is another key feature to detect potentially hazardous situations and automatically stop the ongoing test. For battery cyclers designed for module or battery pack testing, the presence of multiple auxiliary measurement channels (voltage, temperature, etc.) as well as digital input/output and communication modules (e.g., CAN) is essential to facilitate external communication with safety devices.
Due to the diversity of battery cells, modules and packs in the field, there is no "one size fits all" solution for cyclers. Their performance, such as maximum current and accuracy, should be matched to the test being performed. Although most cyclers can be defined as two-quadrant power supplies (positive voltage and positive or negative current), some of them also include additional test equipment. Potentiostats, galvanostats and spectrometers are the three most important in the battery field. They allow for electrochemical tests to be carried out on battery cells. As shown in Figure 1, a potentiostat is a device used to control the potential of an electrode by adjusting the electrical current supplied. Unlike a simple DC power supply or fixed voltage source, the potentiostat allows the potential of an electrode to be measured independently of the circuit used to power the cell. A galvanostat is a device used to control the electrical current supplied to a cell by adjusting the applied potential. Figure 2 shows its principle. Galvanostats allow for the control and measurement of both positive and negative currents through the electrode. Spectrometers are used for electrochemical impedance spectroscopy (EIS), which studies the response of a system to a sinusoidal potential or current perturbation as a function of frequency. This principle is shown in Figure 3. The frequency sweep gives access to the processes taking place in the battery cell across their different time scales.
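As a rough illustration of what a frequency sweep measures, the sketch below computes the complex impedance of a hypothetical equivalent circuit (a series resistance followed by one parallel resistor-capacitor branch) over a logarithmic frequency range. The circuit topology, the function name and the parameter values are assumptions chosen for illustration; they are not taken from this article or from any specific instrument.

```python
import cmath
import math

def impedance_r_rc(freq_hz, r0, r1, c1):
    """Impedance of R0 in series with (R1 parallel to C1) at one frequency."""
    omega = 2 * math.pi * freq_hz
    return r0 + r1 / (1 + 1j * omega * r1 * c1)

# Hypothetical parameters (ohm, ohm, farad), for illustration only.
R0, R1, C1 = 0.001, 0.002, 5.0

# Logarithmic sweep from 10 kHz down to 10 mHz, 5 points per decade.
frequencies = [10 ** (4 - k / 5) for k in range(31)]
spectrum = [(f, impedance_r_rc(f, R0, R1, C1)) for f in frequencies]

# Print one point every two decades: modulus in milliohm and phase in degrees.
for f, z in spectrum[::10]:
    phase_deg = math.degrees(cmath.phase(z))
    print(f"{f:10.3f} Hz  |Z| = {abs(z) * 1e3:.3f} mOhm  phase = {phase_deg:.1f} deg")
```

With this simple circuit, the high-frequency end of the sweep approaches R0 alone and the low-frequency end approaches R0 + R1, which is one way to see why different frequencies probe different processes.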

What Type of Electrical Tests Are Conducted?
Ageing campaigns and characterisation tests are the two main experimental approaches that can be performed in the laboratory.
An ageing test consists of subjecting a battery cell to stress conditions and periodically measuring the evolution of its performance. This performance measurement is known as a check-up or reference performance test (RPT) and aims to determine the evolution of battery performance degradation over a given period of time and under specific ageing conditions. The most common stressors (also known as ageing factors) are temperature, current rate and state of charge [14,15], but other factors such as mechanical stress are also gaining interest [16,17]. Ageing campaigns are conducted to determine the influence of each ageing factor on the performance evolution of the battery. In practice, a campaign consists of a series of ageing tests designed to determine the influence of each ageing factor on the development of the various ageing mechanisms affecting the battery. In order to guarantee the reproducibility of the test, several battery cells are usually tested under the same experimental conditions. In the literature, the influence of a variety of ageing factors has been assessed by varying their levels and tracking their effect on battery performance degradation [18,19]. The influence of temperature is assessed almost systematically, as temperatures above 40 °C and below 5 °C are known to accelerate battery degradation by promoting side reactions [20][21][22]. As battery ageing is a long process whose effects are only quantifiable after several years, accelerated ageing by thermal stress is necessary to follow the degradation over the time scale of a research project, i.e., a few months or years. However, it may raise doubts about the representativeness of the experiments carried out. High current is another important and popular stress factor as it triggers several degradation mechanisms [23][24][25]. The influence of the state of charge is also often assessed as high levels and significant variations in this stress factor lead to accelerated ageing [26][27][28]. To determine the smallest and most informative set of ageing test conditions, but also to obtain the parameters to support empirical model development, the design of experiments is a useful methodology [29][30][31][32]. Combining laboratory and field data is another interesting approach to improve the representativeness of the experiments [33][34][35].
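To make the size of an ageing campaign concrete, the sketch below enumerates a full factorial test matrix, which is the simplest possible design and not necessarily the one used in the cited studies; design-of-experiments methods typically aim to reduce such a matrix. The factor names, the levels and the number of replicate cells are hypothetical values chosen for illustration.

```python
from itertools import product

# Hypothetical ageing factors and levels (illustrative values only).
factors = {
    "temperature_degC": [0, 25, 45],
    "c_rate": [0.5, 1.0, 2.0],
    "mean_soc_percent": [30, 50, 80],
}

# Several cells per condition, to check reproducibility (assumed value).
REPLICATES = 3

# Full factorial design: every combination of every level.
conditions = [dict(zip(factors, levels)) for levels in product(*factors.values())]
n_cells = len(conditions) * REPLICATES

print(f"{len(conditions)} test conditions, {n_cells} cells required")
print("First condition:", conditions[0])
```

Even with three factors at three levels, the number of cells and cycler channels grows quickly, which is the practical motivation for reducing the number of test conditions.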
Characterisation tests are the other important tests performed in the battery field. They aim to evaluate the electrical and thermal behaviour of the battery at a given time and under specific experimental conditions. The ambient temperature, the electrical connection and the pressure applied to the battery under test are key elements to be controlled during the experiment. Climate chambers are the devices used to control and monitor the ambient temperature during the test. For electrical connection and applied pressure, battery holders are commonly used to ensure the reproducibility of the test [36,37]. A characterisation test typically consists of three important characteristic measurements: capacity, impedance and low-current behaviour.
Capacity is the amount of charge a battery can store or deliver at a given current rate. It is typically measured by performing a constant-current discharge and a constant-current charge between the minimum and maximum voltage thresholds; during charge, once the maximum voltage is reached, the voltage is maintained at this upper limit by reducing the current down to a threshold. This procedure is often called "CC-CV charging". The definition of the voltage thresholds depends on the battery chemistry and the manufacturer's recommendations [38][39][40]. The capacity is then calculated by current integration during the discharge phase. Although this method is popular, several variants exist. The current rates and the number of repeated measurements can vary significantly between studies [41]. Some authors also add a constant-voltage phase at the end of the discharge, as this step allows the battery to be fully discharged and yields a more accurate capacity measurement that is less affected by polarisation [42]. Other authors have also studied current profiles that could accelerate the charging process [43,44]. Figure 4 shows two variations of a capacity test.
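The current integration mentioned above can be sketched in a few lines. The example below is a minimal illustration using the trapezoidal rule on synthetic data; the function name and the sign convention (discharge current negative) are assumptions, since conventions differ between cyclers.

```python
def discharged_capacity_ah(time_s, current_a):
    """Trapezoidal integration of the discharge current, returned in Ah.

    Assumed sign convention: discharge current is negative, charge is positive.
    """
    charge_as = 0.0  # accumulated charge in ampere-seconds
    for k in range(1, len(time_s)):
        dt = time_s[k] - time_s[k - 1]
        # Keep only the discharge part of the current (negative values).
        i_prev = min(current_a[k - 1], 0.0)
        i_curr = min(current_a[k], 0.0)
        charge_as += -0.5 * (i_prev + i_curr) * dt
    return charge_as / 3600.0

# Synthetic example: 1 h discharge at 2 A, sampled every 10 s.
time_s = [10.0 * k for k in range(361)]
current_a = [-2.0] * 361
capacity_ah = discharged_capacity_ah(time_s, current_a)
print(f"Discharged capacity: {capacity_ah:.3f} Ah")
```

In practice the integration would be restricted to the discharge segment of the capacity test, which requires the time series to be segmented first.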
Impedance testing is used to characterise deliverable power and thermal behaviour. Figure 5 presents the two approaches to measure impedance. The typical approach to time-domain impedance measurement consists of applying a current pulse or a series of current pulses and measuring the voltage drop during each pulse. As impedance is very dependent on temperature and state of charge, the same pattern may be repeated over a range of states of charge and temperatures. In a review on impedance measurement, Piłatowicz showed that impedance depends on many parameters such as current rate, temperature, state of health and state of charge, that there are no standardised methods and that many different definitions and ways of measuring impedance are used [47]. Another approach for impedance characterisation is to measure impedance in the frequency domain. Electrochemical impedance spectroscopy is a popular technique that allows for impedance to be measured at different frequencies. By covering a wide range of frequencies, it is possible to characterise the different physical phenomena occurring in a battery. Typically, this technique consists of applying a series of sinusoidal electrical excitations (from high frequency to low frequency or vice versa) to the battery and measuring its response as a function of frequency. When the excitation is a voltage, this technique is called potentiostatic electrochemical impedance spectroscopy (PEIS), and when a current signal is used it is called galvanostatic electrochemical impedance spectroscopy (GEIS).
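A time-domain resistance can be estimated from a pulse as the ratio of the voltage change to the current change. The sketch below is one possible definition among the many used in the literature; the function name, the choice of reference sample and the synthetic data are assumptions made for illustration.

```python
def pulse_resistance_ohm(time_s, voltage_v, current_a, pulse_start_s, delay_s):
    """Resistance from the voltage and current change `delay_s` after pulse start.

    The reference point is the last sample before the pulse (assumed definition).
    """
    before = max(k for k, t in enumerate(time_s) if t < pulse_start_s)
    after = min(k for k, t in enumerate(time_s) if t >= pulse_start_s + delay_s)
    delta_v = voltage_v[after] - voltage_v[before]
    delta_i = current_a[after] - current_a[before]
    return delta_v / delta_i

# Synthetic 20 A discharge pulse starting at t = 2 s (illustrative values).
time_s = [0, 1, 2, 3, 4, 5, 12]
current_a = [0, 0, -20, -20, -20, -20, -20]
voltage_v = [3.700, 3.700, 3.660, 3.655, 3.652, 3.650, 3.640]

r_2s = pulse_resistance_ohm(time_s, voltage_v, current_a, pulse_start_s=2, delay_s=2)
r_10s = pulse_resistance_ohm(time_s, voltage_v, current_a, pulse_start_s=2, delay_s=10)
print(f"R(2 s) = {r_2s * 1e3:.2f} mOhm, R(10 s) = {r_10s * 1e3:.2f} mOhm")
```

The two values differ because the voltage keeps relaxing during the pulse, which illustrates why the measurement time must be reported alongside any pulse resistance.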
Low-current tests are used to determine the relationship between open-circuit voltage and state of charge, and to perform differential analyses such as incremental capacity analysis or differential voltage analysis. Open-circuit voltage is the voltage of the battery at rest. It is measured using either a low current rate or a sequence of partial current or voltage charges and discharges to vary the state of charge [48,49]. In the latter technique, charges and discharges are followed by a rest period of a few hours before voltage measurement. The low current rate technique consists of charging or discharging at a low current rate; the voltage measured in this way is known as pseudo-open-circuit voltage, since the battery is not at rest. Figure 6 shows these two approaches. The voltage variations with capacity can be used as a diagnostic tool to determine the ongoing degradation mechanisms in the cell [50].
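Incremental capacity analysis amounts to computing dQ/dV from a low-current voltage curve. The sketch below uses a finite difference over a minimum voltage step, which is one simple way to limit the noise amplification of the derivative; the function name, the step size and the synthetic linear voltage curve are assumptions for illustration, and real curves usually need additional filtering.

```python
def incremental_capacity(voltage_v, capacity_ah, dv_min=0.005):
    """Return (mid-point voltage, dQ/dV in Ah/V) pairs.

    A new point is emitted each time the voltage has moved by at least dv_min.
    """
    points = []
    k0 = 0  # index of the last retained sample
    for k in range(1, len(voltage_v)):
        dv = voltage_v[k] - voltage_v[k0]
        if abs(dv) >= dv_min:
            dq = capacity_ah[k] - capacity_ah[k0]
            v_mid = 0.5 * (voltage_v[k] + voltage_v[k0])
            points.append((v_mid, dq / dv))
            k0 = k
    return points

# Synthetic low-current charge: voltage rises linearly with charged capacity,
# so dQ/dV is constant (a real cell would show peaks instead).
capacity_ah = [0.001 * k for k in range(2001)]
voltage_v = [3.0 + 0.5 * q for q in capacity_ah]

curve = incremental_capacity(voltage_v, capacity_ah)
print(f"{len(curve)} points, first dQ/dV = {curve[0][1]:.3f} Ah/V")
```

Differential voltage analysis is the reciprocal quantity, dV/dQ, and can be obtained with the same approach by stepping in capacity instead of voltage.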
The previous three types of test are the most common, but a wide variety of other tests exists. As the battery is an electrochemical system, many electrochemical tests such as linear and cyclic voltammetry or titration techniques are used in the field. Linear sweep voltammetry applies a voltage that varies linearly with time while the current is measured. Cyclic voltammetry is a similar technique, but when the cell reaches the maximum voltage, the applied voltage is swept back to the minimum value to complete a cycle. Galvanostatic intermittent titration consists of applying a series of constant current pulses followed by pauses, while potentiostatic intermittent titration is a similar technique using voltage levels instead of current pulses. The diversity of techniques and test definitions in the field has led several institutions to define standards for battery testing [51]. FreedomCAR, US ABC and ISO 12405-4:2018 are some examples of such initiatives to promote best and common practices for characterisation techniques [52][53][54]. However, as most of the experiments conducted are aimed at testing a specific battery technology and application, these standards are not always followed.

What Are the Costs of Battery Testing?
Battery testing is often presented as a necessary but costly approach in terms of time, money and environmental impact [29,55,56]. This article presents an estimate of these costs using a simple approach. It aims to provide a better understanding of the origin and magnitude of these costs.
As a representative example, the costs associated with a characterisation dataset previously shared by the authors are estimated [42,45,57]. In that study, a SAMSUNG 94 Ah cell was tested at three temperatures (0, 25 and 40 °C) in France in 2022. In terms of equipment production costs, a Bitrode cycler with a Votsch VT 3050 climate chamber was used to perform the experiment. The equipment was purchased in 2011 and 2006 for approximately EUR 40,000 and EUR 20,000, respectively. According to the open source web application GES 1point5, it can be estimated that the production, distribution and end-of-life of these devices would produce 11,200 ± 5600 kg CO2 eq and 7800 ± 3900 kg CO2 eq, respectively, over their lifetime [58]. Regarding the cost of data production, the characterisation test is a sequence of capacity, resistance and low current measurements, and it lasted 91 h at each temperature. The authors estimate that, for the characterisation of a cell at 0, 25 and 40 °C for 91 h at each temperature, the climate chamber consumed 136.5 kWh and the cycler 94.5 kWh, resulting in a total consumption of 231 kWh. In France, at the time of the experiment, electricity cost around 0.21 EUR/kWh and emitted around 0.063 kg CO2 eq/kWh; therefore, this electrical consumption cost about 14.7 kg CO2 eq and EUR 48 [59]. Regarding storage after the characterisation test, the tested cell was stored in a refrigerator at 10 °C, consuming 1.1 kWh per week. After one year, the consumption was 57.2 kWh. In France, this electrical consumption cost about 3.6 kg CO2 eq and EUR 13 [59]. Finally, the data produced were stored in a public data centre [57]. According to [60] and assuming a lifespan of 5 years for the server, data storage production, distribution and end-of-life cost about 550 kg CO2 eq and EUR 137.
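The arithmetic behind the energy figures can be checked in a few lines. The sketch below uses only numbers quoted in the paragraph above; the emission factor is not entered directly but back-calculated from the reported energy and emissions, which is an assumption about how the totals were obtained. Variable names are illustrative.

```python
# Figures quoted in the text.
TEST_DURATION_H = 91          # duration of the test at each temperature
N_TEMPERATURES = 3            # 0, 25 and 40 degC
CHAMBER_KWH = 136.5           # climate chamber consumption
CYCLER_KWH = 94.5             # cycler consumption
PRICE_EUR_PER_KWH = 0.21      # electricity price
REPORTED_TEST_KG_CO2EQ = 14.7 # reported emissions for the test
FRIDGE_KWH_PER_WEEK = 1.1     # cell storage after the test

total_hours = TEST_DURATION_H * N_TEMPERATURES
test_kwh = CHAMBER_KWH + CYCLER_KWH
mean_power_kw = test_kwh / total_hours
test_cost_eur = test_kwh * PRICE_EUR_PER_KWH

# Emission factor implied by the reported totals (back-calculated, an assumption).
implied_factor_kg_per_kwh = REPORTED_TEST_KG_CO2EQ / test_kwh

# One year of refrigerated storage.
storage_kwh = FRIDGE_KWH_PER_WEEK * 52

print(f"Test energy: {test_kwh:.1f} kWh over {total_hours} h "
      f"(mean power {mean_power_kw:.2f} kW)")
print(f"Test electricity cost: EUR {test_cost_eur:.2f}")
print(f"Implied emission factor: {implied_factor_kg_per_kwh:.4f} kg CO2 eq/kWh")
print(f"Storage energy over one year: {storage_kwh:.1f} kWh")
```

Repeating the same calculation with another country's electricity price and emission factor gives a quick first estimate of how the running costs of the same test would change elsewhere.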
The authors would like to point out that these estimates are made to highlight the costs associated with the experiments, as well as to raise collective awareness of the importance of data sharing in order to make the best use of resources dedicated to battery testing. This estimate is a first approach to indicate some orders of magnitude, but it does not take into account important factors such as the manufacture and power consumption of computers and room lighting, the manufacture of the cold room for post-test cell storage and the time taken to prepare the test protocol, which can represent several days [61]. A more comprehensive and robust assessment of laboratory impact is therefore needed and is being undertaken by many laboratories in initiatives such as labo1point5 [58].
This estimate highlights the prominent role of equipment manufacturing in the cost of battery testing. The purchase of equipment is certainly an important barrier to the widespread adoption of an experimentally based research approach. The environmental cost of running battery tests is primarily a function of the amount of energy consumed and of its emission factor. The emission factor associated with electricity consumption in turn depends on the country's electricity generation mix. Table 1 shows cost estimates for an experiment conducted in France, where the emission factor is low due to the large share of nuclear power in the energy mix. The impact of battery experiments can then vary significantly from the values shown, as carbon intensities can vary by a factor of 10 or more between countries. For example, the carbon intensity of the European Union in 2020 was 123.5 gCO2/kWh, while the United States, China and India reached 352.5, 580 and 707.2 gCO2/kWh, respectively, in the same year [62]. This estimate also highlights the importance of establishing collaborations between laboratories for experimental campaigns. Sharing equipment and data helps to limit redundant experiments and their associated costs. SIMSTOCK, SIMCAL, MOBICUS, COMUTES², BATTERIES2020, Battery 2030+ or BattNutzung are some examples of such collaborations in the field [18,63–67].

What Do Battery Data Look Like?
As in many areas of science, battery data are mostly derived from experiments over time. Battery experimental data consist of an ordered sequence of variables such as current, voltage and temperature, measured at uniformly spaced points in time according to a given sampling rate. This description corresponds to the definition of a multivariate time series [68]. In order to take advantage of the tests carried out, the analysis of these time series is usually carried out in three stages: the preparation of the data, their segmentation and the identification of patterns useful for the analysis of the data.

Data Preparation
In the battery field, time series typically come from field observations or battery cyclers in a variety of file formats and sizes. Tables A1–A3 show the variety of file formats and cyclers used for the public battery datasets.
Table A1 shows a list of public ageing datasets. The analysis of these 26 datasets shows that data sharing is a practice that is gaining interest, as the number of public ageing datasets has been multiplied by 5 since 2020 (21 more); consequently, most of the shared data concern recent battery technologies. More than 80% of the studies examined the ageing evolution of cylindrical cells (21 out of 26 studies). In spite of being a popular format for electric vehicle batteries, ageing data for prismatic cells have not been reported in any public datasets. Small capacity cells were also widely favoured, with 25 of the 26 datasets conducting experiments on cells with a capacity of 5 Ah or less. This can be explained by economic constraints, as the cost of equipment is directly related to the power it can deliver, so testing small capacity cells helps to limit current levels. As an illustrative example, the cell test cycler manufactured by Bitrode Corporation can control the cell within potentiostatic limits of ±18 V and galvanostatic limits of ±300 A, with data collected every 100 ms, and is advertised for several thousand dollars. The module test cycler has potentiostatic limits of ±100 V and galvanostatic limits of ±1000 A, with data collected every 10 ms, and is advertised for tens of thousands of dollars. Finally, the pack test cycler has potentiostatic limits of ±1500 V and galvanostatic limits of ±2400 A, with data collected every 10 ms, and is advertised for several hundred thousand dollars [69]. In the available public ageing datasets, cycling ageing has also mostly been tested using constant current profiles, and only five campaigns have been conducted using realistic profiles. This can be explained by the complexity of interpreting experimental results with a combination of calendar and cycling ageing [70].
A wide variety of cyclers was used to perform the experiments, as 10 different brands of cyclers are listed in Table A1. Arbin is the most widely used among laboratories. Regarding the data format, over 90% of the experimental files are in tabular form, but the file format, i.e., column names and organisation, varies depending on the cycler brand and model used. Recently, initiatives have emerged to provide tools for format harmonisation, such as BatteryArchive, a public repository for the visualisation, analysis and comparison of battery data across institutions, or software such as Cellpy, BEEP or DATTES [71][72][73]. All shared data processing scripts are available at least in Matlab. Python and R scripts are also shared in some repositories.
Table A2 contains a list of publicly available characterisation data. As with the ageing datasets, cylindrical and low-capacity cells make up the majority of the shared data. The datasets in [57,74] stand out because they tested second-life cells. In terms of data format, the characterisation datasets also suffer from inhomogeneity, as the 25 available datasets were produced on 11 different cyclers. Prior to any analysis, it is therefore necessary to determine the structure of the results file to be analysed. As experimental results from different cyclers may need to be compared, battery software should define its own result structure to facilitate the comparison of the results.
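One way to picture such a common result structure is a mapping from each cycler's column names to a shared set of names and units. The sketch below is a minimal illustration: the cycler labels, column names and unit conversions are hypothetical and do not reproduce the export format of any specific brand or the internal structure of any of the tools cited above.

```python
import csv
import io

# Hypothetical column names for two cycler export formats.
COLUMN_MAPS = {
    "cycler_a": {"Test_Time(s)": "time_s", "Current(A)": "current_a", "Voltage(V)": "voltage_v"},
    "cycler_b": {"time/s": "time_s", "I/mA": "current_a", "Ecell/V": "voltage_v"},
}

# Unit conversions towards the common structure (cycler_b reports milliamperes).
UNIT_SCALES = {"cycler_b": {"current_a": 1e-3}}

def harmonise(csv_text, cycler):
    """Read a CSV export and return rows using the common column names and units."""
    mapping = COLUMN_MAPS[cycler]
    scales = UNIT_SCALES.get(cycler, {})
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {}
        for source_name, common_name in mapping.items():
            row[common_name] = float(raw[source_name]) * scales.get(common_name, 1.0)
        rows.append(row)
    return rows

# The same measurement exported in the two hypothetical formats.
file_a = "Test_Time(s),Current(A),Voltage(V)\n0,0.0,3.70\n1,-2.0,3.65\n"
file_b = "time/s,I/mA,Ecell/V\n0,0.0,3.70\n1,-2000.0,3.65\n"

rows_a = harmonise(file_a, "cycler_a")
rows_b = harmonise(file_b, "cycler_b")
print(rows_a[1])
print(rows_b[1])
```

Once every file is expressed with the same names and units, the later processing steps can be written once and applied to data from any cycler.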

Segmentation
Segmentation aims to divide the time series into appropriate, internally homogeneous segments to facilitate analysis. Figure 7 illustrates the segmentation principle. In this figure, the segmentation consisted of determining full charges and discharges by identifying the points at which the battery cut-off voltages were reached. These points are highlighted in the figure with red dots above the test profile (blue line). Other segmentation conditions can be defined depending on the test and the purpose of the segmentation.
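The cut-off based segmentation described above can be sketched as follows. This is a minimal illustration, not the procedure used to produce Figure 7: the function names, the voltage tolerance and the synthetic voltage profile are assumptions.

```python
def cutoff_indices(voltage_v, v_min, v_max, tol=0.005):
    """Indices where the voltage reaches a cut-off, alternating between limits."""
    marks = []
    last = None  # last cut-off reached: "max", "min" or None
    for k, v in enumerate(voltage_v):
        if v >= v_max - tol and last != "max":
            marks.append((k, "max"))
            last = "max"
        elif v <= v_min + tol and last != "min":
            marks.append((k, "min"))
            last = "min"
    return marks

def segments(marks):
    """Turn consecutive cut-off points into (start, end, label) segments."""
    out = []
    for (k0, kind0), (k1, _) in zip(marks, marks[1:]):
        label = "discharge" if kind0 == "max" else "charge"
        out.append((k0, k1, label))
    return out

# Synthetic voltage profile between 3.0 V and 4.2 V cut-offs (illustrative).
voltage_v = [4.2, 4.0, 3.6, 3.2, 3.0, 3.3, 3.8, 4.1, 4.2, 3.9, 3.0]

marks = cutoff_indices(voltage_v, v_min=3.0, v_max=4.2)
for start, end, label in segments(marks):
    print(f"samples {start:2d} to {end:2d}: full {label}")
```

Other criteria, such as a change in the sign of the current or a change of cycler step, can be substituted for the cut-off test without changing the overall structure.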

Pattern Discovery
Pattern discovery is the process of finding each subsequence with specific characteristics (called a pattern) from a time series or set of time series. In the battery field, there are two types of salient patterns that provide useful information for the battery researcher to model or analyse the data: the recurrent patterns and the task-specific patterns. A recurrent pattern is a sub-sequence that occurs repeatedly in a longer time series or in a set of time series. This type of pattern occurs during ageing campaigns and is commonly referred to as a "cycle". Table A3 contains a list of publicly available usage profiles. To date, only a limited number of usage and/or field data profiles have been made available. Figure 8 shows some of the shared cycles.
A task-specific pattern is a sub-sequence that corresponds to a particular test, here a capacity, impedance or low-current test, and that must be detected within the time series. This type of pattern occurs during characterisation or ageing campaigns and is used for modelling and prediction. The principle is shown in Figure 9.

Data Analysis
A time series is usually long and contains complex information, so an automatic summary can be useful to gain insight into it. In the battery field, capacity, impedance and low-current measurements are often used to summarise the corresponding task-specific patterns.

Capacity Test
The analysis of capacity measurement data is central to the battery field. Capacity is the keystone of a large number of battery state variables. State of health (SoH) and state of charge (SoC) are two very popular battery state variables based on capacity.
State of health is an indicator that describes the capacity available in a battery compared to its capacity at the beginning of its life. Although standards have proposed definitions, the methods of calculating this quantity can vary in the literature [79]. In ageing studies, the evolution of this parameter over time and as a function of ageing conditions is generally monitored [80,81]. As capacity is a key indicator of battery performance, its analysis can also allow for the prediction of the remaining useful life (RUL) [82]. A growing number of papers have also presented some features within a capacity test as particularly interesting for the rapid diagnosis of battery health. Recently, Li et al. analysed a capacity test to extract possible ageing signatures [83]. They compared 69 features and showed that the energy of the current during the discharge phase is a particularly relevant indicator of battery health. Other studies have shown that partial charge or constant-voltage phase kinetics during a capacity test are also relevant tools for monitoring battery ageing [84].
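A capacity-based state of health can be written as the ratio of the measured capacity to the initial capacity. The sketch below uses this common definition, which is only one of the variants found in the literature; the function name and the check-up capacities are hypothetical.

```python
def state_of_health(capacity_ah, initial_capacity_ah):
    """Capacity-based state of health, in percent of the initial capacity."""
    return 100.0 * capacity_ah / initial_capacity_ah

# Hypothetical capacities measured at successive check-ups (Ah).
rpt_capacities_ah = [50.0, 48.5, 47.0, 45.0]

soh_percent = [state_of_health(q, rpt_capacities_ah[0]) for q in rpt_capacities_ah]
for checkup, soh in enumerate(soh_percent):
    print(f"Check-up {checkup}: SoH = {soh:.1f} %")
```

Using the nominal capacity from the datasheet instead of the first measured capacity as the reference is another frequent choice, and it changes the resulting values.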
State of charge is another important battery parameter that can be obtained thanks to a capacity test analysis.It quantifies the remaining capacity available in the battery at a given time and in relation to a given state of ageing.It can be calculated in a variety of ways [85] and there is a wealth of literature on the optimal definition of this key parameter.In a review article, Espedal et al. highlighted the importance of hardware and software choices as well as analysis methodology to accurately estimate this state value [86].Good state of charge determination is particularly important because battery behaviour is highly dependent on this parameter.Consequently, the accuracy of battery models and simulations relies heavily on this state of charge value.
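As a rough illustration of these two state variables, the sketch below integrates the current of a capacity test to obtain the discharged capacity, derives a state of health as the ratio to the initial capacity, and tracks the state of charge by Ah counting. Since definitions vary in the literature, this is only one possible convention: the function names and the sign convention (negative current for discharge) are assumptions made for the example.

```python
def discharged_capacity_ah(time_s, current_a):
    """Integrate the current over time (trapezoidal rule) and return the
    discharged capacity in Ah. Assumed convention: discharge current < 0."""
    total_as = 0.0
    for k in range(1, len(time_s)):
        dt = time_s[k] - time_s[k - 1]
        total_as += 0.5 * (current_a[k] + current_a[k - 1]) * dt
    return -total_as / 3600.0


def state_of_health(capacity_now_ah, capacity_initial_ah):
    """Capacity-based SoH: present capacity relative to beginning-of-life capacity."""
    return capacity_now_ah / capacity_initial_ah


def soc_by_ah_counting(time_s, current_a, soc_start, capacity_now_ah):
    """Track SoC (0..1) by Ah counting, relative to the present capacity."""
    soc = [soc_start]
    for k in range(1, len(time_s)):
        dt = time_s[k] - time_s[k - 1]
        delta_ah = 0.5 * (current_a[k] + current_a[k - 1]) * dt / 3600.0
        soc.append(soc[-1] + delta_ah / capacity_now_ah)
    return soc


if __name__ == "__main__":
    # Hypothetical 1 h discharge at 2 A of a cell that initially delivered 2.5 Ah
    time_s = [0, 1800, 3600]
    current_a = [-2.0, -2.0, -2.0]
    capacity = discharged_capacity_ah(time_s, current_a)
    print("capacity (Ah):", capacity)
    print("SoH:", state_of_health(capacity, 2.5))
    print("SoC:", soc_by_ah_counting(time_s, current_a, 1.0, capacity))
```

Normalising the state of charge by the present capacity rather than the initial one reflects the "given state of ageing" mentioned above; other conventions exist.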

Impedance Test
Impedance measurements are commonly used to determine battery performance and, in particular, its power capability [87]. This measurement is highly dependent on the ageing level, current rate, state of charge and temperature, so in most test campaigns the experimenter measures the impedance under several experimental conditions. The analysis of the measurement mainly consists of determining the impedance evolution as a function of the test conditions. Impedance models are then used to mimic the dynamic behaviour of the battery. However, as with other battery characteristics, impedance data acquisition, analysis and modelling suffer from methodological heterogeneity. Schweiger et al. presented several possible alternatives for temporal impedance measurement [88]. The authors showed that the duration and amplitude of the current pulse significantly influence the value of the impedance. They also presented the variety of methods used to perform the experiment. The use of discharge or charge pulses, and of the current rise or the current decrease phase, are some examples of the methods found in the literature. The time at which the voltage drop is measured can also differ between two studies. This heterogeneity is further exacerbated by modelling, as a large number of impedance models exist and modelling methods often vary between studies [89]. In a review of impedance modelling, Wang et al. presented the wide variety of battery impedance models available in the literature [90].
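The dependence of a temporal impedance value on the instant at which the voltage drop is read can be illustrated with a short sketch. The function below computes a pulse resistance as the ratio of voltage change to current change between the last sample before the pulse and the first sample at or after a chosen delay. The function name, arguments and the sample data are hypothetical and only serve the example.

```python
def pulse_resistance(time_s, voltage_v, current_a, pulse_start_s, delay_s):
    """Resistance (ohm) seen `delay_s` seconds after the start of a current pulse.

    R = (V(t0 + delay) - V(before pulse)) / (I(t0 + delay) - I(before pulse))
    """
    # Last sample strictly before the pulse starts
    before = max(k for k, t in enumerate(time_s) if t < pulse_start_s)
    # First sample at or after the chosen delay
    after = min(k for k, t in enumerate(time_s) if t >= pulse_start_s + delay_s)
    delta_v = voltage_v[after] - voltage_v[before]
    delta_i = current_a[after] - current_a[before]
    return delta_v / delta_i


if __name__ == "__main__":
    # Hypothetical 10 A discharge pulse starting at t = 2 s
    time_s = [0, 1, 2, 3, 4, 7, 12]
    current_a = [0, 0, -10, -10, -10, -10, -10]
    voltage_v = [3.70, 3.70, 3.65, 3.64, 3.63, 3.62, 3.60]
    for delay in (0, 2, 10):
        r = pulse_resistance(time_s, voltage_v, current_a, 2, delay)
        print(f"R at {delay} s: {r * 1000:.1f} mOhm")
```

With these made-up data the resistance doubles between the instantaneous reading and the 10 s reading, which is why the measurement delay should always be reported alongside the value.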
Impedance is also a popular tool for diagnosing battery degradation. When a cell is cycled, various time-dependent over-voltage effects occur, resulting in a dynamic voltage response. Their evolution over time can be tracked to identify the main ageing mechanisms occurring in the battery. Figure 10 shows the different over-voltage effects, and detailed descriptions of these phenomena are available in the literature [91]. In practice, however, electrochemical impedance spectroscopy is far more popular for separating impedance contributions and understanding the various physical phenomena taking place in the battery. To ensure the quality of the analysis, the measurement data should be carefully evaluated. The Kramers-Kronig residuals method can be used to verify the quality of the measurement data as well as the time invariance and linearity conditions of the impedance measurements [92]. Several studies have used the frequency method at different stages of battery ageing to obtain an evolution law of the impedance components. This method allows one to estimate the power loss of the battery over time, in other words, a state of health or remaining useful life [93]. For example, Mingant et al. used the ohmic resistance R_Ω as a health indicator to track ageing [94], while Wang et al. preferred the charge-transfer resistance R_CT [95].

Low-Current Test
The analysis of low-current tests is particularly useful for determining the degradation mechanisms occurring in a battery. Incremental capacity analysis (ICA) and differential voltage analysis (DVA) are two popular non-invasive techniques well suited to establishing a battery diagnosis from laboratory tests or field data. The ICA curve (dQ/dV) is plotted against cell voltage, while the DVA curve (dV/dQ) is plotted against cell capacity. These methods are based on transforming the voltage plateaus of the charge and discharge curves into clearly identifiable peaks. The peaks in the ICA curve represent phase equilibria, while in the DVA curve they represent phase transitions. The intensity and position of the peaks can be interpreted as signatures of specific electrochemical processes taking place in the battery. Ageing can then be monitored by tracking the evolution of peak values, shape and position. These analytical tools have gained interest for various purposes such as degradation mode detection [96], embedded diagnostics [97] or post-mortem investigation [98].
In a review of characterisation techniques, Barai et al. compared the ICA and DVA techniques [99]. According to the authors, both techniques provide the same information with only minor differences. Differential voltage curves are described as more visual but difficult to compare between two studies because the abscissa is capacity, which changes with age. The incremental capacity curve requires more advanced knowledge to derive valuable information at first glance, but its abscissa is voltage and therefore constant with degradation. When analysing experiments conducted at medium-to-high rates, IC curves may be a better alternative because, unlike DV curves, the changes in resistance can be visualised and quantified. To maximise diagnostic accuracy, Dubarry and Ansean recommend optimising data quality by avoiding high currents, low-resolution cyclers and noise [100]. The authors therefore recommend performing the experiment in a thermally controlled environment and using a well-calibrated instrument with a resolution of 1 mV or better. Figure 12 shows the influence of the choice of cycler and filter on the noise of the incremental capacity analysis. Once the data are generated, advanced analysis methods should be applied, because the peaks of the incremental capacity curve are difficult to derive directly: their corresponding points on the charge/discharge curve lie in the voltage plateau region, which is flat and sensitive to measurement noise. Liu et al. distinguish two methods for processing the low-current data [101]: filtering methods and fitting methods.
Filtering methods make it possible to obtain smooth IC curves, which may improve the accuracy of the analysis. A wide variety of filters have been presented in the literature for this purpose. Moving average, wavelet transform, Butterworth and Gaussian window filters are some examples [97,[102][103][104]. The lack of a standardised approach to data manipulation makes it difficult to compare studies. In an article collecting best practices for ICA, Dubarry and Ansean recommended the use of a filter that keeps one data point every fixed number of mV, typically 2 mV [100].
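The voltage-step filter mentioned above can be sketched as follows: one point is kept each time the voltage has moved by at least a fixed step from the last kept point, and the incremental capacity is then computed by finite differences on the thinned data. This is a simplified reading of the recommendation, not the implementation of any cited tool; the function names and the small floating-point tolerance are assumptions made for the example.

```python
def resample_by_voltage_step(voltage_v, capacity_ah, step_v=0.002):
    """Keep one (V, Q) point each time the voltage has changed by at least
    `step_v` (2 mV by default) since the last kept point."""
    kept_v = [voltage_v[0]]
    kept_q = [capacity_ah[0]]
    for v, q in zip(voltage_v[1:], capacity_ah[1:]):
        # Small tolerance so that exact multiples of the step are not lost to rounding
        if abs(v - kept_v[-1]) >= step_v - 1e-12:
            kept_v.append(v)
            kept_q.append(q)
    return kept_v, kept_q


def incremental_capacity(voltage_v, capacity_ah):
    """Finite-difference dQ/dV (Ah/V), returned with the mid-point voltages.

    Call it on voltage-resampled data so that consecutive voltages differ."""
    v_mid, dq_dv = [], []
    for k in range(1, len(voltage_v)):
        dv = voltage_v[k] - voltage_v[k - 1]
        dq = capacity_ah[k] - capacity_ah[k - 1]
        v_mid.append(0.5 * (voltage_v[k] + voltage_v[k - 1]))
        dq_dv.append(dq / dv)
    return v_mid, dq_dv


if __name__ == "__main__":
    # Hypothetical excerpt of a low-current charge
    voltage = [3.500, 3.5005, 3.501, 3.503, 3.5035, 3.506]
    capacity = [0.00, 0.01, 0.02, 0.05, 0.06, 0.10]
    v_f, q_f = resample_by_voltage_step(voltage, capacity)
    print(incremental_capacity(v_f, q_f))
```

Differentiating over a fixed voltage step bounds the denominator away from zero, which is what limits noise amplification on the flat plateau regions.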
Fitting methods transform experimental signals with discrete levels and random noise into smoothed and differentiable functions. The approach is called "curve fitting" because the fitted curves are continuous functions that are easier to differentiate [105]. As with filtering, there is no standard method yet. Weng et al. and Wang et al. used support vector machine techniques to perform curve fitting [106,107]. Samad et al. successfully applied the Savitzky-Golay method [108], while Maures et al. used logistic equations [109]. Feng et al. proposed a novel method called "Level Evaluation ANalysis" (LEAN), based on counting the number of points at each sampling level, whose accuracy and reproducibility are proven by mathematical arguments [105]. Several studies have also used features of a low-current measurement as ageing signatures. Weng et al. proposed tracking ICA peaks to monitor battery health [110], while Rivière et al. used peak areas [97] and Zheng et al. used peak and valley positions [111].

Tools for Data Processing, Modelling and Prediction
The previous sections have shown that the main challenge is to address the heterogeneity in data formats and experimental methodology, as this limits the ability to compare studies. This section presents the software and tools available to process heterogeneous battery experimental data and transform them into valuable analysis or simulation results.
Software is the cornerstone of this key activity. In the field, most research teams have developed their own software to process their private datasets and meet their specific analysis needs. While this organisation allows researchers to process their data rapidly, it hampers the reproducibility of their studies. In addition, it increases the amount of time researchers spend programming and the risk of producing faulty analyses, as most researchers are self-taught software developers. It is also difficult to compare research from two different laboratories, as each team is likely to use its own approach to collecting and analysing data. Sharing and comparing experimental results is then severely hampered by the lack of widely available data processing tools. Recently, several teams have made efforts to release their tools under open licenses. Open software refers to tools that have been released under an open license, meaning that they are freely available for modification and redistribution. By allowing any user to see, use and modify the code of the software, a standard, peer-reviewed methodology for data processing can emerge. This review presents the available tools and software that promote open science practices. Unless otherwise stated, the selection is based on three criteria: the software described in this section is distributed under an open license, it has been actively developed since 2020 and it provides tools capable of processing data from a variety of projects and cyclers. Table A4 lists some of the existing open and free software in the battery field. Among the software that does not meet these criteria, commercial software is very popular. Table A5 gives a non-exhaustive list of such software. Some software produced by researchers in public laboratories is also not included, as it is no longer actively developed. Alawa [112,113], DiffCapAnalyzer [114,115] and Lionsimba [116,117] are examples of such software.
As the battery is a constantly evolving research topic, continuous development efforts are required to keep the tools up to date with the latest analysis methods available in the literature. The need for reproducible research has also led some researchers to publish the data processing code associated with their papers. In the field of batteries, EISFitting is an example of such a repository [118,119]. This practice is particularly welcome, but these types of repositories are not presented in this article because the shared tools only work on a specific dataset. They do not make it possible to manage the diversity of battery data.

Experimental Data Processing
In the battery field, three open software packages can handle raw experimental data from many different cyclers: BEEP, Cellpy and DATTES.
The Battery Evaluation and Early Prediction (BEEP) software package is open source software written in Python that allows for raw data preparation, data integrity validation, structuring of data into Python objects and extraction of features that serve as input for machine learning. This software focuses on cyclic ageing data and provides a bridge between raw experimental data and data-driven prediction methods such as machine learning [120,121].
Cellpy is a Python package designed to facilitate the task of interpreting and processing data from battery and cell cycling tests. The software performs data preparation and data processing, including ICA and OCV analysis. It also allows for data comparison and visualisation [71].
DATTES is a Matlab/GNU Octave software designed to help researchers get the most out of experimental data. DATTES provides a complete and customisable framework for data extraction, standardisation, analysis and visualisation, as well as bridges to other existing open source tools [73,122].

Battery and System Models
Experimental data can be used to develop models of batteries that allow for a safe and optimal management of the battery within a system. Battery models can be divided into two categories: physics-based or empirical. The first category is based on an understanding of the underlying physical phenomena that occur during battery operation. The models that attempt to mimic the dynamics of these mechanisms are called physics-based models [123]. In the battery field, several open software packages exist for physics-based modelling.
Dualfoil.py is open source Python software that wraps the Doyle et al. dualfoil model in Python [124]. Leveraging the object-oriented nature of Python, dualfoil.py allows the user to generate, organise and visualise the electrochemical responses of different rechargeable battery systems [125].
PyBaMM (Python Battery Mathematical Modelling) is a tool for the fast and flexible simulation of battery models. It provides a modular framework to solve continuum models for batteries [126,127].
Liionpack is a battery pack simulator that allows PyBaMM simulations to be run at the pack level. Thermal effects can also be included [128].
PETLION is battery modelling software written in Julia that can run pseudo-2D porous electrode theory (PET) simulations [129,130].
MPET is Python software designed to run simulations of porous electrode batteries using porous electrode theory, a volume-averaged, multi-scale approach that captures the coupled behaviour of the electrolyte and active material within the electrodes [131,132].
NEOLAB [133,134], Spectral Li-Ion SPM [135,136] and Supercapacitor-Model [137,138] are some examples of other existing software. They are not presented in detail as they do not meet the above criteria.
A second type of model uses experimental data and an electrical circuit analogy to emulate the electrical and thermal response of the battery to a current demand. These simpler models are called equivalent circuit models. The vast majority of battery management systems for battery packs use equivalent circuit models. Their simplicity, robustness and ability to account for thermal and ageing effects are the main reasons for this choice [123]. In the field, some software packages offer functions based on this modelling approach.
Impedance.py is a Python package designed to make the analysis of electrochemical impedance spectroscopy (EIS) data easier and more reproducible. Impedance.py provides several useful functions such as data preparation and integrity checking, model fitting and validation, and visualisation tools. Its EIS modelling function fits the parameters of an electrical circuit model to impedance data. This function uses non-linear least squares regression to simultaneously fit the real and imaginary components of the impedance and allows users to customise their own electrical circuit [139,140]. The Lin-KK tool is another free but not open software with similar features [141,142].
LiBEIS is an open Matlab/Python software for EIS analysis. By analysing the voltage and current time series obtained from a battery under test, LiBEIS calculates and plots EIS data. LiBEIS also fits the EIS data to an equivalent circuit model. Finally, LiBEIS uses machine learning techniques and exploratory data analysis tools to estimate the state of charge from the EIS data [143]. In addition to this software, the same research team proposed the open software EasyEIS, a tool for processing EIS data generated by a custom impedance measurement system [144].
DATTES also uses experimental data to calibrate electrical circuit models [73,122]. Model parameters are provided directly and systematically as a result of the data analysis process. The static behaviour is modelled from low-current test data, while the dynamic behaviour is fitted from an impedance test. DATTES models are also designed to be easily used in the energy management software VEHLIB [145,146].
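To make the equivalent circuit approach concrete, the sketch below implements a first-order circuit: a series resistance R0 followed by one parallel R1-C1 branch, with a constant open-circuit voltage. It evaluates the impedance in the frequency domain and the voltage response to a current profile in the time domain. The circuit topology, the parameter values, the function names and the sign convention (positive current for charge) are assumptions chosen for illustration; none of the tools listed above is reproduced here.

```python
import math


def ecm_impedance(freq_hz, r0, r1, c1):
    """Complex impedance of R0 in series with (R1 parallel to C1)."""
    omega = 2.0 * math.pi * freq_hz
    return r0 + r1 / (1.0 + 1j * omega * r1 * c1)


def ecm_voltage(current_a, dt_s, ocv_v, r0, r1, c1):
    """Terminal voltage for a current sampled every `dt_s` seconds.

    Assumptions: constant OCV, current held constant over each step,
    positive current = charge. The RC branch is updated with its exact
    discrete-time solution."""
    alpha = math.exp(-dt_s / (r1 * c1))
    v_rc = 0.0  # voltage across the R1-C1 branch, relaxed at the start
    voltage = []
    for i_amp in current_a:
        voltage.append(ocv_v + r0 * i_amp + v_rc)
        v_rc = alpha * v_rc + r1 * (1.0 - alpha) * i_amp
    return voltage


if __name__ == "__main__":
    r0, r1, c1 = 0.005, 0.003, 1000.0  # hypothetical values (ohm, ohm, farad)
    for f in (0.001, 0.05, 100.0):
        print(f, "Hz ->", ecm_impedance(f, r0, r1, c1))
    # 10 A discharge pulse applied from the second sample onwards
    profile = [0.0] + [-10.0] * 30
    print(ecm_voltage(profile, 1.0, 3.70, r0, r1, c1)[:5])
```

In this sketch the instantaneous voltage drop is set by R0 alone and the long-term drop by R0 + R1, which mirrors the dependence of pulse resistance on measurement time discussed in the impedance test subsection.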

Simulation and Prediction
Once produced and processed, battery data are reused in a variety of scientific communities. This subsection is not intended to be exhaustive, but to show how battery data can be reused. In most battery-powered applications, optimising energy management and estimating the evolution of battery performance are of paramount importance. Simulations and predictions play this central role. Battery end-of-life is defined as the point at which the battery can no longer provide the energy or power required for its application. To ensure the economic viability of a product, it is necessary to predict the remaining useful life before this point. In the battery community, state of health estimation is the preferred technique for tracking the actual performance of batteries in service. As presented in the previous sections, state of health is an indicator of the battery's current performance in terms of energy and power compared to its performance at the beginning of its life. This topic is very popular and several reviews have presented the challenge [14,[147][148][149]. Model-based and data-driven prediction are often defined as the two most common approaches [150,151]. Model-based prediction involves the use of physics-based or equivalent circuit models to estimate the state of health. Data-driven methods take advantage of the increasing amount of battery ageing data available. By fitting a large amount of data collected under predefined experimental conditions, lifetime estimation models can be defined. Machine learning techniques are also popular [152].
Slide is an open software, written mainly in C++, for simulating the degradation of lithium-ion batteries. Users can program their degradation procedures by loading the battery with their own current profile or by selecting the standard procedures already implemented in the software. Users can also simulate various characterisation tests, including the main electrical test types described in Section 3.3 [153,154].
To date, most prediction studies have been conducted with Python or Matlab code written specifically for a project [155][156][157]. The BEEP software is therefore an excellent initiative to promote open science practices for predictive studies [120,121]. The BEEP modelling module is designed for prediction. It makes it possible to aggregate data from ageing campaigns and train machine learning models. The resulting models can be used to predict the cycle life or the number of cycles needed to reach a certain level of performance degradation [120,121].
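The simplest form of data-driven lifetime estimation can be sketched as a straight-line fit of state of health against cycle number, extrapolated to an end-of-life threshold. This is a deliberately naive illustration and not the method used by BEEP or any cited study: the linear fade assumption, the 80% threshold and the function names are all assumptions made for the example, and real capacity fade is often non-linear.

```python
def fit_line(x, y):
    """Ordinary least squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


def remaining_useful_life(cycles, soh, end_of_life_soh=0.8):
    """Cycles left before the fitted line crosses the end-of-life SoH.

    Returns None when no fade is observed (non-negative slope)."""
    slope, intercept = fit_line(cycles, soh)
    if slope >= 0:
        return None
    end_of_life_cycle = (end_of_life_soh - intercept) / slope
    return max(0.0, end_of_life_cycle - cycles[-1])


if __name__ == "__main__":
    # Hypothetical ageing campaign: SoH measured every 100 cycles
    cycles = [0, 100, 200, 300]
    soh = [1.00, 0.98, 0.96, 0.94]
    print("estimated remaining cycles:", remaining_useful_life(cycles, soh))
```

As recommended later in this article, expressing degradation against time or cumulative capacity rather than cycle number would make such an estimate easier to compare between studies.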

Promoting Open Science
Generating battery data through experimentation is costly in terms of time, equipment and environmental impact. Several approaches can be used to reduce these costs and improve our understanding of battery behaviour. This section presents a number of good practices to maximise the sharing and reuse of valuable experimental data, but also to facilitate the comparison of data generated by different research facilities using different test equipment and protocols. Perspectives for future research on comparing different cell formats and sizes are also given.
Many published articles present experimental results on batteries. It is often the case that an article presents experimental work on a technology or application close to the one of interest. Where the underlying data are not available, tools that extract data from images, such as Webplotdigitizer or Grabit, can be particularly useful [158] and are increasingly used in the battery literature [159,160]. Although the number of public data repositories is still limited and they lack visibility, recent articles have highlighted their existence and content [161,162]. Before starting a new test campaign, it is also worth investigating whether experimental data on similar battery technologies or applications have already been shared. Table 2 gives a non-exhaustive list of repositories where battery data are deposited. As the number of open data repositories is large and they may be difficult to discover, a recent initiative involving several institutes has emerged: the Battery Data Genome [172]. This initiative aims to create a collection of datahubs to promote data sharing and structure open data. After having developed a proof-of-concept of integration between Battery Archive [173], BEEP [121] and PyBaMM [127], this community may extend the initiative to other existing software and tools. There are also long-term archives comparable to Battery Archive on the Web [173]. Liiondb is an initiative that aims to highlight the many parameter measurements that are available in the literature and could be used in physics-based models [174,175]. The Materialsproject is another open database that presents the properties of a wide range of materials that could be used in battery design [176,177]. NREL has proposed an open library of three-dimensional lithium-ion battery electrode microstructures for microstructure characterisation and modelling [178,179]. A database containing data from hundreds of abuse tests conducted on commercial lithium-ion batteries has also been released by NREL [180,181].
After reviewing the existing literature on a battery technology, data generation should take into account the cost and time constraints of the experiments. To this end, Design of Experiments (DoE) is an interesting methodology that is increasingly being used in the field. Encouraging collaboration between research teams is another effective way to reduce the need for experiments. However, working methods and equipment may differ between research teams. It is then essential to ensure that the equipment and methods used are comparable between teams. In terms of data production methodology, Dubarry and Baure proposed a set of best practices for battery testing that can help experimenters produce accurate and complete experimental data [10]. Specific recommendations have also been made for incremental capacity analysis studies [100]. As shown in Section 2, battery data production methods suffer from heterogeneity. To minimise the impact of this problem, standard test methods should be preferred. In [182], Gabbar et al. provide an overview of the various existing standards for battery testing.
Metadata generation is another key element to maximise the interpretability and reusability of the data generated. The Battery Data Genome initiative proposed a metadata convention to make data generated by different instruments and experimenters more easily interoperable and comparable. Based on these recommendations, the open software DATTES includes metadata generation functions to associate data descriptors with raw and processed data.
To ensure a reliable result, data used for battery modelling or prediction should be limited to datasets for which the production methodology is well known. Therefore, only measured data such as time, current, voltage or temperature should be collected from cyclers. The use of data calculated by the test equipment needs to be weighed. The advantage of using data calculated by the cycler lies in the fact that the cycler's central processing unit generally has access to data at a very high sampling frequency compared to the data recording rate. On the other hand, the calculation methodology inside the cycler is often poorly described. One recommendation could be to use the variables calculated by the cycler when the methodology is well described or simple, as in the case of Ah counting (integration of current over time), but to prefer the use of well-documented software (often open source) for more complex data processing such as the calculation of equivalent resistances or the identification of impedance parameters. The use of open software to determine such characteristics is indeed a good practice, as the methodology can be peer-reviewed and described in detail in future publications. Conversely, processing battery data with non-open software limits the quality of the research, as there is no way to review, debug or modify the analysis methodology. Several open software packages are already available for battery data analysis. BEEP, Cellpy and DATTES are some examples of software that aim to facilitate the reproducibility of experiments. Although these software packages are well documented, their documentation is generally not very accessible to non-developers, as it is often included in the source code. One way to improve this would be to include "methodology documentation" that does not require users to search for this information in the source code. A detailed description of the data processing methodology would also make debugging and adding new functionality to the software easier.
The development of user-friendly tools is also one of the most effective ways of facilitating a wider adoption of open software. Supporting researchers in their daily work leads them to feed these tools with large amounts of data, which increases the robustness of the software. To date, the most comprehensive software available requires programming skills. The development of simple tools with graphical user interfaces would be an important addition to existing tools and could remove barriers to the widespread adoption of open science in this area. Lithium Inventory, WattRank and Batemo are three examples of such initiatives [183][184][185].
A number of good practices can be identified in relation to publication and data sharing. The Journal of Power Sources has published a set of guidelines and best practices for publishing battery research articles [11]. In addition to these recommendations, it is good practice to use a different colour for each experimental data plot, as this greatly facilitates the use of tools that extract data from images. This is particularly true for ageing studies, in which the plots should also be clearly labelled with the chosen colours. Presenting battery degradation only as a function of the number of cycles should also be avoided, as cycle definitions may differ between studies. Instead, time and charged or discharged capacity should be preferred. For modelling and prediction studies, it is good practice to provide a full description of the calibration methodology and to share model parameter values or data processing code. Sharing easily interpretable performance information is also good practice. Presenting the computational load, maximum error, mean absolute error, mean square error and the probability distribution of the error in each publication may greatly facilitate performance comparisons between models.
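The error metrics listed above are straightforward to compute once measured and predicted values are aligned sample by sample. The following sketch shows one way to report them together; the function name and dictionary keys are arbitrary choices for the example, and the sample values are made up.

```python
def error_metrics(measured, predicted):
    """Maximum absolute error, mean absolute error and mean square error
    between two equally long sequences."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("sequences must be non-empty and of equal length")
    errors = [p - m for m, p in zip(measured, predicted)]
    n = len(errors)
    return {
        "max_abs_error": max(abs(e) for e in errors),
        "mean_abs_error": sum(abs(e) for e in errors) / n,
        "mean_square_error": sum(e * e for e in errors) / n,
    }


if __name__ == "__main__":
    measured_v = [3.60, 3.70, 3.80]   # hypothetical measured voltages
    predicted_v = [3.61, 3.68, 3.80]  # hypothetical model output
    for name, value in error_metrics(measured_v, predicted_v).items():
        print(f"{name}: {value:.6f}")
```

Reporting the unit of each metric (V and V² in this example) alongside the value avoids ambiguity when models from different studies are compared.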
To maximise the impact of data sharing in public repositories, several good practices should be considered. Raw experimental data should always be shared, as they are the keystone for the reliability of future data exploitation work. Galvanalyser is an open source tool that facilitates the sharing of raw data. This platform helps users to automatically store the data generated by their battery cyclers in a database and to present these data to users via a Web application and a REST API [186,187]. Processed data can also be shared with open software to accelerate data reuse. Another good practice is to share a data repository descriptor text file, which should contain the information presented in Table 3. The Battery Data Genome initiative presents a list of principles to improve the accessibility, interpretability and understandability of data shared in the field [172]. To maximise the visibility of data repositories, the use of a common vocabulary for metadata is crucial. Clark et al. have proposed a battery data ontology that may facilitate the emergence of a common vocabulary in the community [72].
Figure 13 presents a graphical synthesis of these recommendations to promote open sharing and reuse of data, but also to facilitate comparison of data from different research facilities using different experimental equipment and protocols. This review has highlighted the fact that battery data suffer from high production costs and significant heterogeneity in production and use methods. Open science can help address this challenge and accelerate the development of battery data-driven science [6]. However, given the diversity of battery technologies, comparing data collected using different cell formats, sizes or assembly architectures will remain challenging. This difficulty offers interesting prospects for future research. The development and testing of data preparation and processing techniques, such as filtering or feature extraction, which can be applied to different battery technologies, will be of paramount importance [188]. The influence of battery technology, format or architecture on model performance also needs to be further investigated [189].

Conclusions and Recommendations
This review provides a critical analysis of data generation and processing techniques for lithium-ion batteries.
A comprehensive description of the main battery electrical experiments has been provided. The types of experiments and the equipment required to perform them have been described, and the different costs associated with battery experiments have been discussed. Data processing for energy storage systems has also been described using the mathematical theory of time series analysis. The possible data analyses for the main battery test methods (capacity, impedance and low-current tests) were described. Data modelling and prediction for energy storage systems were also introduced, together with the existing software for data processing and use.
The ultimate goal of the article is to accelerate battery research by facilitating collaboration between experimentalists, modellers and data scientists. The authors hope that it will raise awareness of two major challenges in the field: the cost of data production and the heterogeneity of data production and processing methods. To help raise standards from data production to data-driven prediction, and to democratise the practice of open science, a number of good practices have been presented.

Figure 4. Voltage (a) and current (c) profiles during a capacity test in [45]; voltage (b) and current (d) profiles during a capacity test in [46].

Figure 5. Voltage (a) and current (b) profiles during an impedance test conducted with a temporal approach, and impedance measurement with a frequency approach (c).

Figure 6. Voltage (a) and current (c) profiles during a low-current-rate test, and voltage (b) and current (d) profiles during a low-current-rate test measured with the sequence-of-pulses approach.

Figure 7. Segmentation of a battery test by highlighting points at which the battery cut-off voltages are reached.

Figure 10. Different phenomena occurring in a battery with their dynamics and the type of test used for their characterisation.

Figure 11 shows that temporal and frequency analysis methods can be used to extract the different over-voltage effects generated by battery phenomena.

Figure 11. Different over-voltage effects illustrated on a time-domain impedance measurement (left) and a frequency-domain impedance measurement (right).

Figure 12. Influence of the cycler (left) and filter (right) choices on the noisiness of incremental capacity analysis.

Figure 13. Best practices to promote open science in the battery field.

Table 1. Estimation of the order of magnitude of the various costs associated with a battery electrical test.

Table 2. Overview of open dataset repositories including battery data.
*: Number of results indexing the term "lithium battery".

Table 3. Minimal set of information to include in a data repository descriptor text file.
?: Data not available.

Table A2. Overview of public characterisation data sets.

Table A4. Open software in the battery field.

Table A5. Commercial software in the battery field.