Clustered Based Prediction for Batteries in the Data Centers

This paper proposes an ARIMA approach to battery health forecasting whose accuracy is improved by K-shape-based clustered predictors. Health prediction of the battery pack is an important function of a battery management system in data centers. In real life, accurate forecasting of battery life is very difficult because failure data for training a good forecasting model are not available. ARIMA forecasts made with no external predictors, with all batteries as predictors (total predictors), and with same-cluster batteries as predictors (clustered predictors) are compared for battery health forecasting. Results for 40 batteries in a real data center show that the forecasting accuracy of the ARIMA model improves significantly when the clustered predictors are used. One year of actual historical data from the 40 batteries of a large-scale data center is used to validate the effectiveness of the proposed methodology.


Introduction
A battery uninterruptible power supply (UPS) is an integral part of a data center: it keeps the data center running stably during fail-over transitions between the power grid and diesel generators [1]. Data centers require steady power, which the battery UPS provides; the UPS is installed between the main power grid and the servers [2]. Data centers are now major consumers of electrical energy, and their electricity bills constitute a significant portion of their overall operational costs [3]. In 2013, data centers in the U.S. consumed 91 billion kilowatt-hours of electricity, and consumption is expected to continue rising [4]. In 2017, nearly 8 million data centers required 416.2 terawatt-hours of electricity [5,6]. The layout of the data center design is illustrated in Figure 1.
Health assessment and remaining cycle life estimation of batteries remain a challenge when the battery system operates in a data center, despite continuing improvements in battery manufacturing and storage technology [7]. Many traditional battery management methods have been studied for predicting the life of battery packs, such as voltage fault diagnosis, charge regimes, and state of health (SOH) estimation. Kristen et al. [8] used the discharge voltage curves of 124 batteries to build a data-driven model that predicts battery cycle life before degradation. Tang et al. [9] predicted battery voltage and power with a model-based extreme learning machine for electric vehicles. Xiaosong et al. [10] used the sample entropy of short voltage sequences as an effective signature of capacity loss, and employed an advanced sparse Bayesian predictive modeling (SBPM) methodology to capture the underlying correspondence between capacity loss and sample entropy. You et al. [11] proposed a data-driven approach that tracks battery SOH using current, voltage, and temperature data as well as their historical distributions. Song et al. [12] proposed a data-driven hybrid remaining useful life estimation approach for spacecraft lithium-ion batteries that fuses an IND-AR model and an empirical model through a state-space model in RPF, using iterative updating to improve the prediction capability of the ND-AR model. Yape et al. [13] combined empirical mode decomposition (EMD) and the autoregressive integrated moving average (ARIMA) model to predict the remaining useful life (RUL) of lithium-ion batteries in battery management systems (BMS) for electric vehicles. Luping et al. [14] proposed a hybrid approach combining the variational mode decomposition (VMD) denoising technique, ARIMA, and GM(1,1) models for battery RUL prediction.
The autoregressive integrated moving average (ARIMA) model has been one of the most widely used models in time series forecasting [15]. Kavasseri et al. [16] examined the use of fractional-ARIMA (f-ARIMA) models to model and forecast wind speeds on day-ahead (24 h) and two-day-ahead (48 h) horizons. Mehdi et al. [17] proposed a hybridization of artificial neural networks (ANNs) and the ARIMA model to overcome the limitations of ANNs and to yield a more general and more accurate forecasting model than traditional hybrid ARIMA-ANN models. Sasan et al. [18] forecast the annual energy consumption of Iran using three patterns of the ARIMA-ANFIS model.
ARIMA has found applications in forecasting social, economic, engineering, foreign exchange, and stock market problems. It predicts future values of a time series as a linear combination of its past values and past errors [19,20]. Because batteries in a data center are almost always on charge, deep discharges are rare, and differences in internal chemistry make individual batteries behave differently, some stationary and some stochastic. In addition, failure data are not available in real life, which makes it challenging to predict battery health accurately before a battery fails. In this paper, we develop a cluster-assisted ARIMA model to improve the accuracy of battery health prediction. Our groundwork on cluster consistency for battery data is presented in [21]. Clustering has been used in many applications to improve forecasting accuracy. The proposed K-shape-based cluster-assisted forecasts are compared with actual battery data and with ARIMA forecasts made without clustering [22,23].
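The linear combination of past values and past errors described above is, in its standard textbook form, the ARIMA(p, d, q) model. It is shown here for reference rather than taken from this paper. In backshift notation (B y_t = y_{t-1}), with white-noise errors ε_t and an optional constant c:

```latex
\left(1 - \sum_{i=1}^{p} \phi_i B^i\right)(1 - B)^d \, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j B^j\right)\varepsilon_t
```

Here p is the autoregressive order, d the number of differences, and q the moving-average order; with d = 1 the model is fitted to the first differences y_t - y_{t-1}.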
The rest of the paper is organized as follows. Section 2 describes the data center and the data set used in this study. Section 3 describes data preprocessing and explains the methodology, introducing the algorithms for cluster consistency and clustered ARIMA forecasting. Section 4 gives the steps to implement the proposed clustered forecasting method. Section 5 presents the battery cluster consistency detection and cluster-assisted ARIMA forecasting results, and discusses the effectiveness of the method by comparing the results with actual data and with an ARIMA model without cluster assistance. Section 6 concludes this work.

Overview of the data set
In this paper, data were collected from a large-scale social media company located in China. One year of data, comprising 407,266 data points at a sampling interval of 1 min, is used in this study. The data set includes variables for the data center main power, transmission units, battery units, cooling systems, and DC load values; the data set variables are shown in Table 1. Our objective is to develop an accurate and scalable method for forecasting battery life degradation. Voltage is measured by even the simplest BMS, in applications ranging from small vehicles to large-scale data centers, and our data contain the voltages of 40 batteries. Battery aging features are selected based on domain knowledge of batteries [8].

Methodology
Figure 2 shows the flowchart of the proposed method, whose steps are as follows.
Step 1 Data preprocessing: First, separate the battery voltage data from the data set. Extract the historical first-month battery voltages, and keep updating the real-time voltage values.
Step 2 Cluster consistency: Apply K-shape-based clustering separately to the first-month data and to the updated real-time data. Compare the first-month and updated-month clusterings for cluster consistency. If the clusters are inconsistent, go to Step 3.
Step 3 Clustered ARIMA forecasting: Choose a single battery from the cluster and use the other cluster members as predictors to fit an ARIMA model. If clustering separates a single battery into a cluster of its own, fit an ARIMA model without predictors to predict that battery's health. If a declining trend is predicted, the battery is degrading; otherwise, go back to Step 2.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 31 January 2020 doi:10.20944/preprints202001.0387.v1
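A minimal sketch of the Step 2 comparison follows, assuming each monthly K-shape run yields one cluster label per battery. K-shape numbers its clusters arbitrarily from run to run, so the sketch compares which batteries share a cluster instead of comparing raw labels. The function names and the pairwise test are illustrative assumptions, not the authors' implementation of Algorithm 1; the returned set plays the role of DA.

```python
from itertools import combinations


def co_membership(labels):
    """Pairs (i, j), i < j, of batteries that share a cluster.

    Comparing pairs instead of raw labels makes the check independent of
    how a clustering run happens to number its clusters.
    """
    return {
        (i, j)
        for i, j in combinations(sorted(labels), 2)
        if labels[i] == labels[j]
    }


def inconsistent_batteries(first_month, latest_month):
    """Batteries whose cluster partners differ between two clusterings.

    Both arguments map battery id -> cluster label. An empty set means the
    clusterings are consistent (DA is empty).
    """
    changed_pairs = co_membership(first_month) ^ co_membership(latest_month)
    return {battery for pair in changed_pairs for battery in pair}


# Toy labels (hypothetical): battery 6 leaves the cluster it shared with 36 and 39.
first = {6: 1, 36: 1, 39: 1, 15: 2}
latest = {6: 3, 36: 1, 39: 1, 15: 2}
print(sorted(inconsistent_batteries(first, latest)))  # [6, 36, 39]
```

The set contains every battery whose cluster partners changed, so batteries 36 and 39 appear alongside battery 6; singling out the outlier would need an extra rule, for example (hypothetically) picking the battery that ends up in the smaller cluster.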

Data preprocessing
Data cleaning is the first step of data preprocessing: missing values are identified and the raw data are corrected for analysis. See Section 2 for the features of the data set. Battery voltage data are used to forecast battery health. The first-month data are extracted from the data set and used as a baseline against which the clustering and voltage status of the real-time updated data are compared. Because a data center's real-time data update continuously, we used one year of data and divided it into 12 months, one for each clustering iteration (see Subsection 4.1).
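A minimal preprocessing sketch, assuming each battery's readings arrive as a list with `None` marking missing samples. It fills gaps by linear interpolation, a choice made here for illustration since the paper does not state how missing values were corrected, and cuts the year into 12 contiguous windows of nearly equal length. Equal windows only approximate calendar months because the sketch carries no timestamps; the function names are hypothetical.

```python
from bisect import bisect_left


def fill_missing(values):
    """Linearly interpolate interior gaps (None); hold the nearest reading at the ends."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        pos = bisect_left(known, i)  # number of observed samples before index i
        if pos == 0:
            filled[i] = values[known[0]]
        elif pos == len(known):
            filled[i] = values[known[-1]]
        else:
            lo, hi = known[pos - 1], known[pos]
            frac = (i - lo) / (hi - lo)
            filled[i] = values[lo] + frac * (values[hi] - values[lo])
    return filled


def monthly_windows(values, months=12):
    """Split a series into `months` contiguous windows of nearly equal length."""
    n = len(values)
    bounds = [k * n // months for k in range(months + 1)]
    return [values[bounds[k]:bounds[k + 1]] for k in range(months)]


voltages = [13.6, None, 13.4, 13.5, None, None, 13.2]  # hypothetical readings
print(fill_missing(voltages))
print([len(w) for w in monthly_windows(list(range(25)))])  # [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3]
```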

Cluster consistency
We now present our proposed Algorithm 1, based on K-shape clustering, for battery cluster consistency in data centers. For a detailed description of the K-shape-based clustering algorithm, see Section 3.2.1.
The K-shape-based algorithm takes as input the battery voltage data set V_ij, the first-month historical voltage data, and LV_ij, the latest updated voltage data, where i indexes time and j indexes the batteries. B is the set of battery clusters obtained when clustering is applied to the first month, and LB is the set obtained when clustering is applied to the latest month. DA is the set of inconsistent battery clusters that results from comparing LB with B; if DA ≠ ∅, there is an inconsistent (outlier) battery cluster. MB and MC are the mean cluster voltages of the first and latest months, respectively; these sets also represent each cluster's voltage status relative to the other clusters. The difference between MB and MC gives DM; if DM ≠ ∅, the cluster voltage status has changed.

K-shape-based clustering
K-shape clustering is an iterative refinement algorithm that isolates each cluster while preserving the shapes of the time series. K-shape uses a cross-correlation measure to compute the centroid of each cluster and then updates the members of each cluster [24].
In the assignment step, the algorithm updates cluster memberships by comparing every time series with every computed centroid and assigning each time series to the cluster of its closest centroid, as shown in Equation 1. In the refinement step, the cluster centroids are updated to reflect the changes in cluster membership from the previous step. The algorithm repeats these two steps until cluster membership no longer changes or the maximum number of iterations is reached.
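The cross-correlation measure in K-shape is the shape-based distance (SBD) of [24]: SBD(x, y) = 1 - max_w NCC_w(x, y), where NCC_w is the cross-correlation of the z-normalized series at shift w divided by the product of their norms. The minimal NumPy sketch below implements SBD and the assignment step only; the refinement step, which in K-shape extracts each new centroid as the leading eigenvector of a matrix built from the aligned cluster members, is omitted. The function names are illustrative and are not the dtwclust API.

```python
import numpy as np


def z_normalize(x):
    """Zero mean, unit variance, so that series are compared by shape, not level."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else x - x.mean()


def shape_based_distance(x, y):
    """SBD(x, y) = 1 - max_w NCC_w(x, y); 0 means the same shape up to a shift."""
    x, y = z_normalize(x), z_normalize(y)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0:
        return 1.0  # a flat series has no shape; treat it as uncorrelated
    cc = np.correlate(x, y, mode="full")  # cross-correlation at every shift w
    return 1.0 - cc.max() / denom


def assign(series, centroids):
    """Assignment step: index of the closest centroid (by SBD) for each series."""
    return [
        int(np.argmin([shape_based_distance(s, c) for c in centroids]))
        for s in series
    ]


bump = [0, 0, 0, 1, 3, 1, 0, 0, 0, 0]
shifted_bump = [0, 1, 3, 1, 0, 0, 0, 0, 0, 0]
ramp = list(range(10))
print(assign([bump, shifted_bump, ramp], centroids=[bump, ramp]))  # [0, 0, 1]
```

The shifted bump joins the bump's cluster because SBD searches over all shifts, which is the shape-preserving property the text attributes to K-shape.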

Clustered ARIMA forecasting
A clustered ARIMA forecasting method is proposed for the batteries flagged by a change in cluster consistency (DA) or in cluster voltage status (DM). Algorithm 2 forecasts battery health using cluster members as predictors to achieve better accuracy than simple ARIMA forecasting. ARIMA is widely used in numerous applications, including finance, engineering, social sciences, and agriculture. ARIMA models integrate autoregressive (AR) and moving average (MA) models, and they give good accuracy in forecasting relatively stationary time series [25]. Either DA or DM can serve as the input set. Extracting a battery element v_j from the input set forms a new set DC. The elements that remain after v_j is extracted form R, the set of predictors used to forecast the battery in DC. An ARIMA model is fitted with the R predictors to forecast DC, and AF denotes the forecasted battery voltage values. If AF shows a declining trend, the battery is degrading; if it shows a stable trend, the battery is stable.
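The paper fits auto.arima in R with the cluster members passed through xreg. To keep the sketch below dependency-light, it swaps in a simpler model: an autoregression of order p on the first differences of the target voltage, plus the first differences of each predictor in R, fitted by ordinary least squares (an ARX model, roughly an ARIMA(p,1,0) with exogenous inputs). This is not what forecast::auto.arima does internally, since that fits a regression with ARIMA errors and selects the orders automatically. The order p = 2, the battery ids, the data, and all function names are hypothetical.

```python
import numpy as np


def split_target(cluster, target):
    """Algorithm 2 split: DC holds the target battery, R the other cluster members."""
    return [target], [b for b in cluster if b != target]


def _design(dy, dxs, p):
    """Rows [1, dy[t-1], ..., dy[t-p], dx_1[t], ..., dx_k[t]] and targets dy[t]."""
    rows = [
        [1.0] + [dy[t - i] for i in range(1, p + 1)] + [dx[t] for dx in dxs]
        for t in range(p, len(dy))
    ]
    return np.array(rows), np.asarray(dy[p:])


def fit_arx(y, predictors=(), p=2):
    """OLS fit of AR(p) on the first differences of y, with the first
    differences of each cluster predictor as extra regressors."""
    dy = np.diff(np.asarray(y, dtype=float))
    dxs = [np.diff(np.asarray(x, dtype=float)) for x in predictors]
    X, target = _design(dy, dxs, p)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef


def forecast_arx(y, coef, h, predictors=(), future_predictors=(), p=2):
    """Recursive h-step forecast of the voltage level (AF). As with xreg in R,
    future values of every predictor must be supplied."""
    y = np.asarray(y, dtype=float)
    dy = list(np.diff(y))
    future_dxs = [
        np.diff(np.concatenate(([x[-1]], np.asarray(fx, dtype=float))))
        for x, fx in zip(predictors, future_predictors)
    ]
    level, out = y[-1], []
    for step in range(h):
        row = [1.0] + [dy[-i] for i in range(1, p + 1)] + [d[step] for d in future_dxs]
        change = float(np.dot(coef, row))
        dy.append(change)  # feed the forecast difference back in as a lag
        level += change
        out.append(level)
    return np.array(out)


# Hypothetical data: one cluster partner and a target battery that tracks it.
rng = np.random.default_rng(0)
partner = 13.5 + np.cumsum(rng.normal(0.0, 0.01, size=60))
target = 2 * partner - 13.5
dc, r = split_target(["b15", "b2"], "b15")  # r == ["b2"] supplies `partner`
coef = fit_arx(target[:50], [partner[:50]], p=2)
af = forecast_arx(target[:50], coef, h=10, predictors=[partner[:50]],
                  future_predictors=[partner[50:]], p=2)
```

When a battery is alone in its cluster, as battery 6 is in Section 5, R is empty and `fit_arx` reduces to a plain AR model on the differenced voltage.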

Cluster consistency detection
Import the time series data, saved in CSV format during the data preprocessing step, into R. The dtwclust package is used for time series clustering in R; for K-shape-based clustering, the battery data frame must be converted into a matrix (see Section 3.2.1). The plot function is used to visualize the K-shape clustering results. This process is repeated every month until an inconsistent cluster is detected, and clustered ARIMA forecasting is then performed (see Subsection 4.2). An overview of the cluster inconsistency detection procedure is shown in Figure 3.
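The monthly repetition described above can be sketched as a loop over the clusterings of months 1 to 12, assuming the labels for each month have already been produced (for example by dtwclust in R). Treating each clustering as a set of member sets makes the check independent of arbitrary cluster numbering. The function names and toy labels are illustrative.

```python
def as_partition(labels):
    """Turn {battery: label} into a set of clusters, each a frozenset of batteries."""
    clusters = {}
    for battery, label in labels.items():
        clusters.setdefault(label, set()).add(battery)
    return {frozenset(members) for members in clusters.values()}


def first_inconsistent_month(monthly_labels):
    """1-based month whose clustering first differs from month 1, else None."""
    baseline = as_partition(monthly_labels[0])
    for month, labels in enumerate(monthly_labels[1:], start=2):
        if as_partition(labels) != baseline:
            return month
    return None


# Toy example (hypothetical labels): months 1-8 agree, month 9 separates battery 6.
stable = {6: 0, 36: 0, 39: 0, 15: 1}
changed = {6: 2, 36: 0, 39: 0, 15: 1}
print(first_inconsistent_month([stable] * 8 + [changed]))  # 9
```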

Implementing clustered ARIMA forecasting
The objective of this procedure is to forecast the voltage of the batteries flagged by the cluster consistency check. An overview of the method is shown in Figure 3. Load the forecast package in R and select a battery from the inconsistent cluster to forecast. Use the ACF, the PACF, and the augmented Dickey-Fuller test to check the stationarity of the data. Use the auto.arima function to build the fitted model for the selected battery, passing the other cluster members as predictors through the xreg argument; if the cluster contains only one battery, xreg is not required. Use the forecast function to forecast the battery voltage. If a declining trend is shown, the battery is degrading; if the trend is stable, the battery is expected to remain stable.
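The final decision rule (declining versus stable trend) is stated only qualitatively. One simple way to make it concrete, assumed here rather than taken from the paper, is to fit a least-squares line through the forecasted voltages AF and call the battery degrading when the slope falls below a tolerance. The tolerance of 1e-3 V per forecast step is a hypothetical value that would need tuning on real data.

```python
def trend_slope(values):
    """Least-squares slope of values against their step index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two forecast points")
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def classify_forecast(forecast_voltages, tolerance=1e-3):
    """'degrading' if the fitted slope is below -tolerance (V per step), else 'stable'.

    The tolerance is a hypothetical threshold, not a value from the paper.
    """
    if trend_slope(forecast_voltages) < -tolerance:
        return "degrading"
    return "stable"


print(classify_forecast([13.50, 13.46, 13.41, 13.37]))  # degrading
print(classify_forecast([13.50, 13.49, 13.51, 13.50]))  # stable
```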

Battery voltage time series clustering
K-shape-based clustering of the first month of the battery data set yields three clusters (see Figure 5). Cluster consistency is checked every month. Figure 6 shows that the cluster members remain consistent from the 1st to the 8th month. An inconsistent cluster appears after 9 months, as shown in Figure 7: battery 6 is now separated from batteries 36 and 39, which were in the same cluster in the first month. In other words, battery 6 has become an outlier from its original cluster. This change in cluster consistency indicates a change in battery voltage behavior. To use this information and predict the health of the batteries in each cluster, an improved-accuracy forecasting model is discussed in Subsection 5.3.

ARIMA forecasting
The proposed clustered ARIMA approach is evaluated by comparing actual voltage with forecasts made using clustered predictors (predictors within the cluster), single predictors (without clustering), and total predictors (complete data). The metrics used are root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). Batteries 6, 15, and 36, one from each cluster, are selected for demonstration. Cluster inconsistency was detected in the 9th month, so the 9th-month data are transformed for the forecasting model. The ACF and PACF of the transformed data are shown in Figure 8, and Table 2 shows the augmented Dickey-Fuller test results for the selected batteries. Because the batteries come from different clusters, each shows different voltage behavior and requires a different fitted model. The auto.arima function of the forecast package automatically selects the best-fitting model by comparing candidate models. AIC and BIC, both penalized-likelihood criteria, are used as fit criteria [26]. Table 3 shows the AIC and BIC values of the best-fitting models for the batteries in the total, single, and clustered predictor scenarios. Battery 6 is the only member of cluster 2, so it has no external predictors within its cluster at the point where K-shape clustering detects the inconsistency. This makes battery 6 (cluster 2) a special case in which the clustered-predictor and single-predictor cases are identical. The predictions for battery 6 with single or clustered predictors are more accurate than those with total predictors. This result is further verified for battery 15 (cluster 1) and battery 36 (cluster 3) by the comparison of clustered, single, and total predictor metrics in Table 4.
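For reference, the three error metrics can be computed as follows. This is plain Python with the standard definitions; MAPE is reported in percent and assumes no actual value is zero. The voltages in the usage lines are hypothetical.

```python
from math import sqrt


def _check(actual, predicted):
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty series of equal length")


def rmse(actual, predicted):
    """Root mean square error."""
    _check(actual, predicted)
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    _check(actual, predicted)
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


actual = [13.50, 13.40, 13.30]  # hypothetical voltages
predicted = [13.45, 13.42, 13.25]
print(rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```

RMSE penalizes large deviations more heavily than MAE, while MAPE expresses the error relative to the voltage level.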
The overall prediction accuracy after applying K-shape clustering improves when clustered predictors are used. Forecast battery voltage is compared with actual voltage for clustered and single predictors in Figure 9, Figure 10, and Figure 11. Battery 6 has only one comparison, with the clustered predictor, because it is the only battery in cluster 2. Batteries 15 and 36 are compared with single predictor, clustered predictors, and actual voltage in Figure 9 and Figure 11. Actual voltage is plotted against total-predictor forecasts for batteries 6, 15, and 36 in Figure 12, Figure 13, and Figure 14. Table 4 and these figures show that the clustered predictor model fits the battery voltage data better.

Effectiveness of clustered ARIMA approach
The forecasting model predicted a declining trend for battery 6, which is confirmed by the actual resistance data in Figure 15: battery 6 (cluster 2) shows an exponential increase in resistance after nine months. This degradation was accurately predicted by the proposed method. Similarly, the stable resistance of battery 15 (cluster 1) and battery 36 (cluster 3) verifies the predicted results. The actual voltages of the battery pack after the cluster inconsistency, together with the predicted results, are shown in Figure 16. A clear drop in battery 6 voltage, predicted by the proposed method, can be seen. Furthermore, batteries 15 and 36 from clusters 1 and 3 were predicted to be stable, as shown in Figure 16.

Conclusion
ARIMA forecasting with clustered predictors is proposed to predict battery health, and it improves the forecasting accuracy of the ARIMA model for the 40 batteries in the data center. The K-shape-based clustering-assisted model significantly improves ARIMA forecasting accuracy compared with the single-predictor and total-data-predictor models. Challenges in applying this data-driven technique include cleaning and preparing the data set and handling data loss and missing values.