A Hybrid ASW-UKF-TRF Algorithm for Efficient Data Classification and Compression in Lithium-Ion Battery Management Systems

Huang, Bowen; Xie, Xueyuan; Yi, Jiangteng; Yu, Qian; Xu, Yong; Liu, Kai

doi:10.3390/electronics14193780



by Bowen Huang, Xueyuan Xie, Jiangteng Yi, Qian Yu, Yong Xu and Kai Liu *

State Grid Hunan Integrated Energy Service Co., Ltd., Changsha 410007, China

* Author to whom correspondence should be addressed.

Electronics 2025, 14(19), 3780; https://doi.org/10.3390/electronics14193780

Submission received: 1 August 2025 / Revised: 15 September 2025 / Accepted: 22 September 2025 / Published: 24 September 2025

(This article belongs to the Special Issue IoT Applications for Renewable Energy Management and Control, 2nd Edition)


Abstract

Electrochemical energy storage technology, primarily lithium-ion batteries, has been widely applied in large-scale energy storage systems. However, differences in assembly structures, manufacturing processes, and operating environments introduce parameter inconsistencies among cells within a pack, producing complex, high-volume datasets with redundant and fragmented charge–discharge records that hinder efficient and accurate system monitoring. To address this challenge, we propose a hybrid ASW-UKF-TRF framework for the classification and compression of battery data collected from energy storage power stations. First, an adaptive sliding-window Unscented Kalman Filter (ASW-UKF) performs online data cleaning, imputation, and smoothing to ensure temporal consistency and recover missing or corrupted samples. Second, a temporally aware time-series random forest (TRF) segments the time series and applies an importance-weighted, multi-level compression that formally prioritizes diagnostically relevant features while compressing low-information segments. The novelty of this work lies in combining deployment-oriented engineering robustness with methodological innovation: the ASW-UKF provides context-aware, online consistency restoration, while the TRF compression formalizes diagnostic value in its retention objective. This hybrid design preserves transient fault signatures that are frequently removed by conventional smoothing or generic compressors, while also bounding computational overhead to enable online deployment. Experiments on real operational station data demonstrate classification accuracy above 95% and an overall data volume reduction of more than 60%, indicating that the proposed pipeline achieves substantial gains in monitoring reliability and storage efficiency compared to standard denoising-plus-generic-compression baselines. The result is a practical, scalable workflow that bridges algorithmic advances and engineering requirements for large-scale battery energy storage monitoring.

Keywords:

data preprocessing; sliding window; unscented Kalman filter; random forest; data classification processing

1. Introduction

In the context of increasingly severe global climate change and environmental pollution, the traditional energy structure dominated by fossil fuels must gradually transition to a low-carbon energy system centered on renewable energy sources [1]. As social electricity demand continues to grow and large-scale renewable energy sources are gradually integrated into the grid, the complexity of the power grid structure also increases [2], placing higher demands on grid stability, efficiency, and economic viability. However, annual losses from curtailed wind and solar power in the new energy sector still amount to nearly 10 billion kilowatt-hours [3]. Additionally, since thermal power plants provide most of the power auxiliary services, their own operational load rates are relatively low, and the costs of flexibility upgrades are substantial [4], resulting in industry-wide losses. Energy storage systems can provide various services to power grid operations, including peak shaving, frequency regulation, black start, and demand response support [5], making them an important means to enhance the flexibility, economic efficiency, and safety of traditional power systems. Energy storage can significantly enhance the utilization rate of renewable energy sources, such as wind and solar power, support distributed generation and microgrids [6], and is a crucial technology for promoting the transition from fossil fuels to renewable energy sources [7].

However, in actual operation [8,9], energy storage systems face challenges such as the large number of battery cells, their complex structure, and prolonged operational periods [10]. This necessitates frequent monitoring and assessment of battery status to ensure the stable and safe operation of energy storage power plants. In particular, the automated classification of cell- and module-level states—for example, distinguishing between regular operation and anomalous behaviors, identifying early-stage faults, categorizing degradation stages (SOH levels), and detecting imminent end-of-life indicators—directly supports critical operational functions. Timely and reliable state classification enables online condition monitoring and early fault detection to prevent safety incidents (e.g., thermal runaway), supports predictive maintenance and remaining useful life (RUL) estimation to reduce unplanned outages and maintenance costs, and provides inputs for dispatch and power-optimization strategies that increase renewable energy utilization and overall economic returns. Moreover, state-aware classification facilitates hierarchical control and fault isolation in PACK-level systems, allowing operators to selectively disengage or re-route affected modules and thereby maintain service continuity. Under prolonged standby conditions and non-standard operational conditions, a significant amount of redundant and ineffective data is generated. These data are also unable to accurately reflect the operational status of the energy storage system itself [11], thereby increasing transmission and storage costs while impairing system monitoring, maintenance, and optimization. Targeted preprocessing and classification must therefore both reduce data volume and explicitly preserve diagnostically relevant signatures (e.g., transient features and SOH-related patterns) so that downstream decision-support algorithms receive high-fidelity inputs. 
Additionally, deployment-oriented metrics such as inference latency, memory footprint, and false-alarm rates should be reported to demonstrate practical applicability. Therefore, conducting data processing on energy storage system data and exploring methods to enhance data quality and reduce data volume hold significant importance [12].

The primary purpose of battery data extraction is to study the state of charge (SOC), state of health (SOH), and remaining useful life (RUL) of lithium batteries [13,14]. Currently, research on battery data preprocessing focuses on various data features that reflect battery characteristics. Reference [15] proposes a neural network-based algorithm that uses actual time series data measured in vehicles to achieve continuous automated estimation of battery health status. This method simplifies the traditional modeling process and offers greater versatility and application flexibility. Reference [16] proposes a battery modeling method based on an improved LSTM neural network, which uses unsupervised algorithms to extract typical recurring loads from in-vehicle time series data, achieving data compression and balanced sampling for battery modeling. This method maintains accuracy even with significantly compressed data volumes, making it suitable for in-vehicle battery environments with limited computational power. Reference [17] proposes a data compression method that converts raw monthly electric vehicle charging sequence data into feature maps, effectively reducing data volume while retaining key information, thereby reducing model training time and achieving a balance between feature representation and information preservation. Reference [18] introduces multi-stage data compression processing in battery capacity prediction, using PCA to compress the dimensionality of battery aging features, then employing t-SNE to preserve local structure and map data to a visualizable space, combined with DBSCAN to remove noise from compressed data. This achieves effective dimensionality reduction and data cleaning for high-dimensional data, enhancing model training efficiency and prediction accuracy. These methods are highly applicable to vehicle power batteries but are not fully compatible with data processing for energy storage batteries.
References [19,20,21,22] modify and optimize the Kalman filter algorithm to improve data preprocessing effectiveness, achieving good results with minimal computational effort. However, these methods are sensitive to initial conditions and low-quality data, and may fail when initial estimation errors increase. Reference [22] (Peng et al., Applied Energy, 2019) develops an improved cubature Kalman filter (CKF) for more accurate SOC estimation under nonlinear battery dynamics. That work is primarily concerned with model-based state estimation for individual cells or packs and focuses on improving estimation accuracy under model nonlinearity. It does not address large-scale data management tasks such as time-series segmentation, diagnostic-aware compression, or the preservation of transient fault signatures in compressed archives. By contrast, the present work targets station-scale operational data pipelines: the ASW-UKF module is designed for online consistency restoration and robust handling of missing or corrupted samples in streaming data, and the TRF performs temporally aware, importance-weighted multi-level compression to retain diagnostically relevant events while reducing storage. Thus, although both lines of work use Kalman-filtering ideas for robustness in estimation, their goals and end-to-end functionality are different. Reference [23] employs unsupervised autoencoders for battery data denoising, leveraging their robust feature extraction capabilities to achieve satisfactory denoising results. However, the output data may suffer from over-smoothing and loss of original battery data features, and the computational cost is high. Reference [24] proposes a bidirectional LSTM encoder–decoder for SOC sequence estimation, leveraging sequence learning to predict SOC trajectories.
That approach is essentially a data-driven sequence modeling solution aimed at improving SOC prediction accuracy; it typically relies on labeled training data, offline model training, and relatively heavy inference cost. It likewise does not provide an integrated solution for online data cleaning, missing-data recovery, segmentation, or compression tuned to preserve transient diagnostic signatures. In contrast, our ASW-UKF-TRF pipeline combines an online, light-weight consistency-restoration module (ASW-UKF) with a compression module (TRF) that is explicitly designed to balance compression efficiency and diagnostic fidelity for large heterogeneous station datasets.

Recent research has further expanded in two directions that are directly relevant to our work. First, adaptive and online variants of Kalman filtering—including sliding-window and context-aware Unscented Kalman Filter (UKF) schemes—have been proposed to provide robust, low-latency state reconciliation under non-stationary field conditions, making them more suitable for continuous station operation than conventional offline filters [25,26,27]. Second, there is growing interest in temporally aware segmentation and diagnostic-aware (importance-weighted) compression techniques that explicitly prioritize retention of transient fault signatures and diagnostically valuable segments while aggressively compressing low-information intervals; such approaches aim to balance downstream diagnostic performance with storage constraints [28,29]. Notably, a small but rising body of work has begun to explore integrated pipelines that jointly consider online consistency restoration, transient preservation, and compression—a combination rarely addressed by vehicle-focused studies and one that is crucial for energy-storage-scale deployments where data volume, heterogeneity, and the operational duty cycle differ markedly [30]. In this manuscript, we therefore position ASW-UKF as the online consistency-restoration component and TRF as the temporally aware, importance-driven compression component, arguing that their joint design better meets the specific requirements of energy storage stations than approaches that treat preprocessing and compression as separate, offline steps. The algorithm comparison content is shown in Table 1 and Table 2.

Recent advances in preprocessing and compression techniques for lithium-ion battery data have been primarily driven by research on electric-vehicle (EV) power batteries, where the principal constraints are limited on-board computing power and strict latency requirements. Those studies have developed effective denoising, feature-extraction, and compact-representation methods (e.g., sequence models, autoencoders, and model-based filters) that are suitable for vehicle-scale deployments. However, grid-scale energy storage systems present fundamentally different challenges: station-level datasets are several orders of magnitude larger, contain heterogeneous behavior from thousands of cells or modules operating in parallel, and include long-term trends, intermittent faults, and environmental disturbances that do not commonly appear in EV datasets. Consequently, algorithms optimized for EV scenarios often fail to deliver the required combination of scalability, information preservation (especially of transient events), and operational interpretability that power-plant operators require for online monitoring, fault diagnosis, and asset management.

To address these gaps, we propose an integrated pipeline—Adaptive Sliding Window/Unscented Kalman Filter/Time-series Random Forest (ASW–UKF–TRF)—specifically tailored to the data and operational characteristics of energy storage power plants. The pipeline is designed with three linked objectives: (1) reliably remove redundant and low-value observations while preserving diagnostically relevant signatures; (2) reconstruct and smooth incomplete or noisy signals with model-based filtering to retain physical interpretability; and (3) provide a hierarchical, importance-weighted compression that supports different downstream consumers (real-time alarms, medium-term diagnostics, and archival analytics).

The first stage, ASW, performs lightweight, data-driven cleaning and deduplication using an adaptive windowing strategy. Rather than applying a fixed-length segmentation, ASW dynamically adjusts the window size in response to local signal stability (for example, using a per-sample voltage-change-rate statistic). Within each adaptive window, triplet-based equality checks (voltage, current, SOC—rounded to a configurable number of decimals) identify exact or near-duplicate samples for removal or collapse. This design preserves short transient events by skipping deduplication when sign reversals or abrupt rate changes are detected, while aggressively compressing during long, steady-state segments. ASW’s behavior is governed by a small set of interpretable hyperparameters (min/max window, change threshold, and rounding precision), which we tune empirically and report for reproducibility.
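The deduplication rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record layout, the function names `has_transient` and `dedup_window`, the default rounding of 3 decimals, the threshold value, and the choice to collapse only consecutive duplicates are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical record layout: (voltage, current, soc)
Sample = Tuple[float, float, float]


def has_transient(window: List[Sample], change_threshold: float) -> bool:
    """Return True if the window holds a current sign reversal or an abrupt voltage step."""
    for prev, cur in zip(window, window[1:]):
        if prev[1] * cur[1] < 0:                      # current changes sign
            return True
        if abs(cur[0] - prev[0]) > change_threshold:  # abrupt per-sample voltage change
            return True
    return False


def dedup_window(window: List[Sample], decimals: int = 3,
                 change_threshold: float = 0.05) -> List[Sample]:
    """Collapse consecutive near-duplicate samples inside one window.

    A window that contains a transient is returned unchanged so that
    short events are preserved (assumed behaviour for this sketch).
    """
    if has_transient(window, change_threshold):
        return list(window)
    kept: List[Sample] = []
    last_key = None
    for sample in window:
        key = tuple(round(v, decimals) for v in sample)  # rounded (V, I, SOC) triplet
        if key != last_key:
            kept.append(sample)
            last_key = key
    return kept


steady = [(3.3001, 10.0, 50.0), (3.3002, 10.0, 50.0),
          (3.3001, 10.0, 50.0), (3.3100, 10.0, 50.1)]
event = [(3.30, 10.0, 50.0), (3.30, -10.0, 50.0), (3.30, -10.0, 50.0)]
print(len(dedup_window(steady)), len(dedup_window(event)))  # prints "2 3"
```

In the steady window the first three samples round to the same triplet and collapse to one, while the window containing a current reversal is kept in full.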

The second stage, UKF, is a model-based smoothing and imputation module that operates on the ASW-pruned stream. We select the Unscented Kalman Filter because it propagates nonlinear state uncertainties with minimal linearization error and provides an explicit uncertainty estimate for each imputed value. This property facilitates downstream decision thresholds and confidence-aware compression. In practice, the UKF reduces high-frequency noise, fills intermittent missing samples produced by ASW, and preserves the physical coherence of derived quantities (e.g., SOC trajectories). UKF tuning focuses on process and measurement noise scales (Q, R) and unscented transform parameters (α, β, κ). We provide recommended ranges and a simple data-driven rule for selecting the change threshold and Q/R scaling in Section 3.

The third stage, TRF, combines time-series feature extraction with a random forest classifier that is trained to categorize each processed sample (or sample window) into operationally meaningful classes (e.g., regular, transient event, degradation indicator, redundant). TRF differs from off-the-shelf static classifiers by: (a) using windowed temporal features (rolling statistics, short-term trend slopes, spectral-energy proxies) that capture both steady and transient behavior; (b) incorporating event-importance weighting in the loss function to prioritize rare but critical events; and (c) producing class-specific importance scores used to drive hierarchical compression (high-importance samples are retained at fine granularity; low-importance data are coarsely aggregated). This graded compression strategy yields substantial reductions in storage and transmission load while preserving the information that operators and diagnostic pipelines need. The complete algorithm flow is shown in Figure 1.
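The graded compression idea in item (c) can be illustrated with a short Python sketch. It is a simplified two-level version written for illustration only: the function name `hierarchical_compress`, the importance threshold of 0.5, and the block length of 4 are assumptions, and the importance scores are supplied by hand rather than produced by a trained TRF.

```python
from statistics import mean
from typing import List, Sequence


def hierarchical_compress(values: Sequence[float], importance: Sequence[float],
                          threshold: float = 0.5, block: int = 4) -> List[float]:
    """Keep high-importance samples unchanged and replace each run of
    low-importance samples by averages over blocks of at most `block` samples."""
    if len(values) != len(importance):
        raise ValueError("values and importance must have the same length")
    out: List[float] = []
    pending: List[float] = []  # low-importance samples waiting to be aggregated

    def flush() -> None:
        for start in range(0, len(pending), block):
            out.append(mean(pending[start:start + block]))
        pending.clear()

    for value, weight in zip(values, importance):
        if weight >= threshold:
            flush()            # close the current low-importance run first
            out.append(value)  # fine granularity: sample kept as-is
        else:
            pending.append(value)
    flush()
    return out


values = [1.0, 1.0, 1.0, 1.0, 5.0, 6.0, 2.0, 2.0]
importance = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.2, 0.2]  # hypothetical scores
compressed = hierarchical_compress(values, importance)
print(compressed)  # [1.0, 5.0, 6.0, 2.0]
```

Eight samples shrink to four here: the two high-importance samples survive unchanged, and each low-importance run is replaced by its block average.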

Evaluation of the pipeline utilizes multi-faceted metrics, including reconstruction error (RMAE/RMSE), coefficient of determination (R²), classification F1 score for important event classes, and deployment-oriented indicators such as compressed data ratio, model parameter count, and inference latency (measured on a reference CPU). We also conduct ablation studies (e.g., ASW off, UKF replaced by EKF, and TRF replaced by standard RF) and sensitivity analyses for key hyperparameters (ASW thresholds, UKF Q/R, TRF window length, and minimum segment length). These analyses demonstrate that ASW–UKF–TRF achieves consistent improvements in accuracy retention and compression efficiency relative to representative baselines (unsupervised LSTM, MCC-EKF, autoencoder, FSW-UKF, ASW-EKF), while maintaining interpretable uncertainty estimates that aid operational decision-making.

Finally, we consider practical deployment aspects. The pipeline is implemented in Python 3.12.2 with a modest computational footprint (experimental runs reported on an Intel i5 machine with 8 GB RAM), and the modular design allows computationally intensive components (e.g., TRF training) to be offloaded to cloud or batch infrastructure. At the same time, lightweight ASW/UKF can operate near-real-time on edge controllers. Remaining limitations include the need for broader cross-site validation (different chemistries and usage patterns), further optimization for embedded deployment, and automated hyperparameter adaptation for heterogeneous fleets—directions that we discuss in the Conclusion and future work sections.

The remainder of this paper is organized as follows. Section 2 describes the methodology (ASW, UKF, and TRF, how they are combined, and the role of each component), Section 3 reports the simulation experiments, and Section 4 presents the conclusions.

2. Adaptive Sliding Window-Unscented Kalman Filter-Time Series Random Forest Algorithm

2.1. Adaptive Sliding Window (ASW)

To enhance the flexibility and fidelity of time series data cleaning, this paper proposes an adaptive sliding window method. This method dynamically adjusts the window length to adapt to the characteristics of data changes at different stages, thereby achieving more accurate data redundancy removal and detail retention.

ASW slides a variable-sized window over the time series data and adjusts the window based on the following principles:

Data change rate: Calculate the data change rate within the window. Use a small window in rapidly changing areas to capture details and a large window in stable and smooth areas to improve compression efficiency.
Key event detection: Identify key events such as charging transitions and abnormal points through rules (e.g., voltage exceeding thresholds, temperature anomalies), and force the use of the smallest window to retain details.

w = \begin{cases} w_{m}, & \text{if } \left| \frac{y_{t} - y_{t - 1}}{t_{t} - t_{t - 1}} \right| > θ \text{ or } (y > V_{c} \text{ or } T > T_{c}) \\ \min (w_{M}, w \cdot k), & \text{otherwise} \end{cases}

(1)

In this formula, y_{t} and y_{t - 1} represent the measured values at times t_{t} and t_{t - 1}, respectively, while θ denotes the threshold for the data change rate. V_{c} and T_{c} correspond to the critical limits of voltage and temperature. The parameter w indicates the current window size, w_{m} denotes the minimum window size, w_{M} denotes the maximum window size, and k is a scaling factor that controls the expansion of the window.
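A direct transcription of Equation (1) into Python may help make the update rule concrete. The sketch below is illustrative only: the function name `next_window`, the integer truncation of the window size, and all numeric parameter values (threshold, critical limits, window bounds, growth factor) are assumptions, with only the 30 s sample spacing taken from the experimental setup.

```python
def next_window(w: int, y_prev: float, y_cur: float, t_prev: float, t_cur: float,
                temp: float, *, theta: float, v_crit: float, t_crit: float,
                w_min: int, w_max: int, k: float) -> int:
    """One application of Equation (1).

    Shrink to w_min on a fast change or a key event; otherwise grow the
    window by the factor k, capped at w_max.
    """
    rate = abs((y_cur - y_prev) / (t_cur - t_prev))
    key_event = y_cur > v_crit or temp > t_crit
    if rate > theta or key_event:
        return w_min
    return min(w_max, int(w * k))  # integer window size is an assumption


# Hypothetical parameter values chosen for the example
params = dict(theta=0.01, v_crit=3.65, t_crit=45.0, w_min=4, w_max=64, k=2.0)
voltages = [3.300, 3.301, 3.302, 3.303, 3.304, 3.305, 3.900]

w = params["w_min"]
history = []
for i in range(1, len(voltages)):
    w = next_window(w, voltages[i - 1], voltages[i],
                    30.0 * (i - 1), 30.0 * i, temp=25.0, **params)
    history.append(w)
print(history)  # [8, 16, 32, 64, 64, 4]
```

The window doubles while the voltage is stable, saturates at the maximum, and collapses to the minimum as soon as the last sample exceeds the critical voltage.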

2.2. Unscented Kalman Filter (UKF)

Extended Kalman filtering is a commonly used method for estimating the state of nonlinear systems. It linearizes the system function using a first-order Taylor expansion and combines it with the standard Kalman filtering framework for prediction and updating. The state transition and observation models of EKF are as follows:

\begin{cases} x_{k} = f (x_{k - 1}, u_{k - 1}) + w_{k - 1}, \\ z_{k} = h (x_{k}) + v_{k} \end{cases}

(2)

where w_{k - 1} and v_{k} represent the process noise and observation noise, respectively, x_{k} represents the system state at time step k, u_{k - 1} is the control input, and z_{k} is the observation at time step k.

However, this method is prone to introducing linearization errors when the system is highly nonlinear or model uncertainty is substantial, which affects estimation accuracy and stability. To overcome this limitation, this paper adopts the Unscented Kalman Filter (UKF). The unscented transformation generates a set of representative Sigma points, and the mean and covariance are propagated through them directly in the original nonlinear space. This avoids explicit linearization and captures the statistical characteristics of nonlinear systems more accurately. The primary process of the unscented transformation is as follows:

The following Sigma points are generated:

\begin{cases} χ_{k}^{(i)} = {\hat{x}}_{k - 1} + {(\sqrt{(n + λ) P_{k - 1}})}_{i}, & i = 1, \dots, n \\ χ_{k}^{(i)} = {\hat{x}}_{k - 1} - {(\sqrt{(n + λ) P_{k - 1}})}_{i - n}, & i = n + 1, \dots, 2 n \\ χ_{k}^{(0)} = {\hat{x}}_{k - 1} \end{cases}

(3)

Sigma point prediction:

χ_{k | k - 1}^{(i)} = f (χ_{k}^{(i)}, u_{k - 1})

(4)

Predicting the mean and covariance:

\begin{cases} {\hat{x}}_{k | k - 1} = \sum_{i = 0}^{2 n} W_{m}^{(i)} χ_{k | k - 1}^{(i)} \\ P_{k | k - 1} = \sum_{i = 0}^{2 n} W_{c}^{(i)} [χ_{k | k - 1}^{(i)} - {\hat{x}}_{k | k - 1}] {[χ_{k | k - 1}^{(i)} - {\hat{x}}_{k | k - 1}]}^{T} + Q \end{cases}

(5)

Predicted observation values:

\begin{cases} γ_{k}^{(i)} = h (χ_{k | k - 1}^{(i)}) \\ {\hat{z}}_{k} = \sum_{i = 0}^{2 n} W_{m}^{(i)} γ_{k}^{(i)} \end{cases}

(6)

Observed covariance and cross-covariance:

\begin{cases} P_{z z} = \sum_{i = 0}^{2 n} W_{c}^{(i)} [γ_{k}^{(i)} - {\hat{z}}_{k}] {[γ_{k}^{(i)} - {\hat{z}}_{k}]}^{T} + R \\ P_{x z} = \sum_{i = 0}^{2 n} W_{c}^{(i)} [χ_{k | k - 1}^{(i)} - {\hat{x}}_{k | k - 1}] {[γ_{k}^{(i)} - {\hat{z}}_{k}]}^{T} \end{cases}

(7)

Kalman gain and update:

\begin{cases} K_{k} = P_{x z} P_{z z}^{- 1} \\ {\hat{x}}_{k} = {\hat{x}}_{k | k - 1} + K_{k} (z_{k} - {\hat{z}}_{k}) \\ P_{k} = P_{k | k - 1} - K_{k} P_{z z} K_{k}^{T} \end{cases}

(8)

In the above equations, χ_{k}^{(i)} is the i-th sigma point at step k, generated from the estimate {\hat{x}}_{k - 1} and its covariance P_{k - 1}. The parameters n and λ are the state dimension and scaling factor, respectively. The weights W_{m}^{(i)} and W_{c}^{(i)} are the mean and covariance weights of each sigma point. Q and R represent the process noise covariance and observation noise covariance. {\hat{x}}_{k | k - 1} and P_{k | k - 1} denote the predicted state and its covariance, while {\hat{z}}_{k} is the predicted observation. The variables P_{z z} and P_{x z} are the observation covariance and cross-covariance. Finally, K_{k} is the Kalman gain used for updating the state estimate {\hat{x}}_{k} and its covariance P_{k}.
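To make the unscented transformation of Equations (3) to (5) concrete, the following Python sketch propagates a mean and variance through a nonlinear function for a scalar state (n = 1). It is a minimal illustration under stated assumptions: the scalar restriction avoids a matrix square root, the function names are hypothetical, and the weights use the common scaled parameterisation λ = α²(n + κ) − n with assumed default values α = 1, β = 2, κ = 2, which the paper does not specify.

```python
import math
from typing import Callable, List, Tuple


def sigma_points_1d(mean: float, var: float, lam: float) -> List[float]:
    """Sigma points of Equation (3) for a scalar state (n = 1)."""
    spread = math.sqrt((1.0 + lam) * var)
    return [mean, mean + spread, mean - spread]


def ut_weights_1d(alpha: float, beta: float,
                  kappa: float) -> Tuple[float, List[float], List[float]]:
    """Scaling factor and mean/covariance weights (assumed scaled-UT form)."""
    n = 1
    lam = alpha ** 2 * (n + kappa) - n
    w_side = 1.0 / (2.0 * (n + lam))
    w_mean = [lam / (n + lam), w_side, w_side]
    w_cov = [lam / (n + lam) + (1.0 - alpha ** 2 + beta), w_side, w_side]
    return lam, w_mean, w_cov


def unscented_transform_1d(mean: float, var: float, g: Callable[[float], float],
                           alpha: float = 1.0, beta: float = 2.0, kappa: float = 2.0,
                           noise_var: float = 0.0) -> Tuple[float, float]:
    """Propagate (mean, var) through g as in Equations (4) and (5)."""
    lam, w_mean, w_cov = ut_weights_1d(alpha, beta, kappa)
    propagated = [g(x) for x in sigma_points_1d(mean, var, lam)]
    new_mean = sum(w * y for w, y in zip(w_mean, propagated))
    new_var = sum(w * (y - new_mean) ** 2 for w, y in zip(w_cov, propagated)) + noise_var
    return new_mean, new_var


# Nonlinear example: g(x) = x^2 with mean 1.0 and variance 0.04.
# A first-order linearisation would predict the mean g(1.0) = 1.0,
# whereas the exact mean of x^2 is mean^2 + var = 1.04.
ut_mean, ut_var = unscented_transform_1d(1.0, 0.04, lambda x: x * x)
print(round(ut_mean, 6))  # 1.04
```

For this quadratic example the sigma points recover the exact mean, which is the kind of linearization error the UKF is meant to avoid.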

2.3. Time Series Random Forest (TRF)

Random Forest (RF) is an ensemble learning method composed of multiple decision trees (classification and regression trees), each of which is a relatively weak learner on its own. Each tree is trained on a random selection of samples and features, and the ensemble output is obtained by majority voting (for classification) or averaging (for regression). Its structure is shown in Figure 2. RF not only possesses strong nonlinear modeling capabilities and robustness, but can also be used to address issues with missing data: each tree independently predicts the missing samples, and the final estimate is determined through a voting process.

However, traditional RF assumes that samples are independent of each other and does not consider dynamic dependencies in time series, which limits its performance in time series modeling.

To address this issue, this paper introduces a time window mechanism based on RF. It incorporates historical information into the modeling process by constructing lagged features, thereby forming a time series random forest algorithm that enhances the model’s ability to characterize the system’s evolution characteristics. The model structure and training and prediction process of TRF are shown below:

Model structure:

Let the original time series be {\{x_{t}\}}_{t = 1}^{T} and the target variable be {\{y_{t}\}}_{t = 1}^{T}. During the training phase, TRF constructs input samples in the following format:

X_{t} = [x_{t}, x_{t - 1}, \dots, x_{t - p + 1}]

(9)

Here y_{t} is the target output at time t, and p is the time window length (lag order); the training set is generated using a sliding window method:

D = {\{(X_{t}, y_{t})\}}_{t = p}^{T}

(10)

D represents the training set generated using a sliding window method. The final prediction {\hat{y}}_{t} is computed as the average (for regression) or majority vote (for classification) of all trees. Each sample contains the current observation and the previous p − 1 observations, which can be used to predict the current or a future state.

The overall training and prediction process is as follows: TRF trains multiple decision trees on the above-constructed sample set, with each tree constructed based on different data subsets and feature subsets. The final prediction result is the average output (regression) or majority vote (classification) of all trees:

{\hat{y}}_{t} = \frac{1}{N} \sum_{i = 1}^{N} f_{i} (X_{t})

(11)

where f_{i} (\cdot) is the i-th decision tree, and N is the total number of trees. Its structure is shown in Figure 3.
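The sample construction of Equations (9) and (10) and the aggregation of Equation (11) can be sketched in plain Python. This is an illustration only: the function names are hypothetical, and the "trees" are hand-written stand-in functions rather than trained decision trees, so the sketch shows the data layout and the voting or averaging step, not forest training itself.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def make_lagged_samples(x: Sequence[float], y: Sequence,
                        p: int) -> List[Tuple[List[float], object]]:
    """Build D = {(X_t, y_t)} for t = p..T with X_t = [x_t, x_{t-1}, ..., x_{t-p+1}].

    The paper indexes time from 1; Python lists are 0-based, so paper
    index t corresponds to list index t - 1.
    """
    if len(x) != len(y) or p < 1 or p > len(x):
        raise ValueError("need len(x) == len(y) and 1 <= p <= len(x)")
    samples = []
    for end in range(p - 1, len(x)):             # end = t - 1
        window = [x[end - j] for j in range(p)]  # newest value first
        samples.append((window, y[end]))
    return samples


def majority_vote(trees: Sequence[Callable], features: List[float]):
    """Classification output: the most frequent tree prediction."""
    votes = Counter(tree(features) for tree in trees)
    return votes.most_common(1)[0][0]


def average_output(trees: Sequence[Callable], features: List[float]) -> float:
    """Regression output of Equation (11): the mean of the tree predictions."""
    return sum(tree(features) for tree in trees) / len(trees)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = ["a", "a", "b", "b", "b"]
samples = make_lagged_samples(x, y, p=3)
print(len(samples), samples[0])  # 3 ([3.0, 2.0, 1.0], 'b')

# Stand-in "trees" (hypothetical rules, not trained models)
class_trees = [
    lambda f: "charge" if f[0] > f[-1] else "standby",
    lambda f: "charge" if f[0] - f[1] > 0.5 else "standby",
    lambda f: "standby",
]
label = majority_vote(class_trees, samples[0][0])
print(label)  # charge
```

With T = 5 observations and lag order p = 3, the sliding window yields T − p + 1 = 3 training samples.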

2.4. Adaptive Sliding Window-Unscented Kalman Filter Data Preprocessing (ASW-UKF)

By cleaning and deduplicating lithium battery time series data using the adaptive sliding window (ASW) method, noise (such as high-frequency random fluctuations and spike noise) can be effectively removed from the original data while retaining key trends. However, the compression process may result in data point loss or discontinuity, especially in areas with high rates of change or during critical events. Additionally, battery data may contain missing values due to sensor failures or communication interruptions. To address these issues and further smooth the data to enhance the accuracy of subsequent analyses, the Unscented Kalman Filter (UKF) is employed as a subsequent step to the ASW method to fill in missing data and achieve data smoothing. Together, these two methods complete the entire data preprocessing phase.

The data smoothing process is as follows:

1. State equation:

x_{k} = f (x_{k - 1}, u_{k - 1}, w_{k - 1})

(12)

In the equation, x_{k} is the state at time k, f is the nonlinear state transition function, u_{k - 1} is the control input, and w_{k - 1} is the process noise.

2. Observation equation:

z_{k} = h (x_{k}, v_{k})

(13)

In the equation, z_{k} represents the compressed data points output by ASW, h represents the nonlinear observation function, and v_{k} represents the observation noise.

3. Fill in missing values:

When z_{k} is missing, UKF skips the observation update step and directly uses the state prediction value {\hat{x}}_{k | k - 1} as the estimate to fill in the missing points. The prediction value is based on the state transition function f and the state {\hat{x}}_{k - 1} at the previous moment. Finally, a continuous estimate sequence is generated to fill in the missing points in the ASW output.

{\hat{x}}_{k | k - 1} = \sum_{i = 0}^{2 n} W_{i} \cdot f (σ_{i, k - 1}, u_{k - 1})

(14)

Here, σ_{i, k - 1} is the Sigma point, W_{i} is the weight, and n is the state dimension. If z_{k} is missing, then {\hat{x}}_{k} = {\hat{x}}_{k | k - 1}.

4. Smooth the data:

Generate a smooth state estimate {\hat{x}}_{k} by fusing the predicted value {\hat{x}}_{k | k - 1} and the observed value z_{k} using the Kalman gain.

{\hat{x}}_{k} = {\hat{x}}_{k | k - 1} + K_{k} \cdot (z_{k} - {\hat{z}}_{k | k - 1})

(15)

In this case, K_{k} = P_{x z, k} \cdot {P_{z z, k}}^{- 1} is the Kalman gain, and {\hat{z}}_{k | k - 1} is the predicted observation value.

The preprocessing of lithium battery time series data is accomplished by combining an adaptive sliding window (ASW) with an unscented Kalman filter (UKF). The two techniques work together to achieve efficient compression, noise removal, missing value filling, and trend retention, making it suitable for battery health status estimation.
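The fill-and-smooth procedure of this section can be sketched as a scalar UKF loop in Python, where a missing observation is represented by `None`. The sketch is illustrative and rests on several assumptions that are not taken from the paper: a scalar state, a single weight set for both mean and covariance (equivalent to α = 1, β = 0), a scaling factor λ = 2, no control input, and identity models for f and h in the usage example.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def _sigma(mean: float, var: float, lam: float) -> List[float]:
    """Three sigma points for a scalar state."""
    spread = math.sqrt((1.0 + lam) * var)
    return [mean, mean + spread, mean - spread]


def ukf_fill_and_smooth(z: Sequence[Optional[float]],
                        f: Callable[[float], float], h: Callable[[float], float],
                        x0: float, p0: float, q: float, r: float,
                        lam: float = 2.0) -> Tuple[List[float], List[bool]]:
    """Run a scalar UKF over z; entries that are None are filled by prediction."""
    w0 = lam / (1.0 + lam)
    wi = 1.0 / (2.0 * (1.0 + lam))
    weights = [w0, wi, wi]  # assumed: same weights for mean and covariance
    x, p = x0, p0
    estimates: List[float] = []
    filled: List[bool] = []
    for zk in z:
        # Prediction step, Equations (4), (5) and (14)
        chi = [f(s) for s in _sigma(x, p, lam)]
        x_pred = sum(w * c for w, c in zip(weights, chi))
        p_pred = sum(w * (c - x_pred) ** 2 for w, c in zip(weights, chi)) + q
        if zk is None:
            # Missing sample: skip the update and keep the prediction
            x, p = x_pred, p_pred
            filled.append(True)
        else:
            # Update step, Equations (6) to (8) and (15)
            gamma = [h(c) for c in chi]
            z_pred = sum(w * g for w, g in zip(weights, gamma))
            p_zz = sum(w * (g - z_pred) ** 2 for w, g in zip(weights, gamma)) + r
            p_xz = sum(w * (c - x_pred) * (g - z_pred)
                       for w, c, g in zip(weights, chi, gamma))
            gain = p_xz / p_zz
            x = x_pred + gain * (zk - z_pred)
            p = p_pred - gain * p_zz * gain
            filled.append(False)
        estimates.append(x)
    return estimates, filled


# Hypothetical example: random-walk state observed directly, one sample missing
estimates, filled = ukf_fill_and_smooth([1.0, None, 1.2], f=lambda s: s, h=lambda s: s,
                                        x0=0.0, p0=1.0, q=0.01, r=0.1)
print(filled)  # [False, True, False]
```

Because the example uses an identity transition, the filled value at the missing step equals the previous estimate; with a physical battery model for f, the prediction would follow the modelled dynamics instead.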

2.5. Adaptive Sliding Window-Unscented Kalman Filter-Time Series Random Forest (ASW-UKF-TRF) Algorithm

After undergoing adaptive sliding window (ASW) and unscented Kalman filter (UKF) preprocessing steps, lithium-ion battery time-series data (such as voltage, current, and temperature) have been efficiently compressed, missing values filled, and noise smoothed, generating high-quality continuous data. These data serve as the foundation for subsequent classification and intelligent decision-making tasks. Then, using a time-series random forest (TRF), classification is performed and feature importance is evaluated to provide decision support for the battery management system. Figure 4 shows the ASW-UKF-TRF algorithm flowchart.

3. Experimental Simulation and Results

The experimental data used in this paper are derived from the actual operational data of the Battery Management System (BMS) of a lithium iron phosphate battery, assembled by China Xiamen Haichen Energy Storage Co., Ltd., (Xiamen, China) in a storage cabinet at a specific energy storage company in Hunan, China. The data were collected continuously over one month, with uninterrupted acquisition of multidimensional battery pack time-series data throughout the entire process. The sampling interval was 30 s, and a total of 80,628 raw battery operation data points were recorded during the collection period. Data were acquired using CAN-bus Professional tools with a CANalyst-11 analyzer (serial number 31F00033448) integrated into the station CAN network. During logging, communication irregularities occasionally produced blank frames or error frames, which manifest as missing, corrupted, or noisy records in the raw logs. These fault cases are documented in the revised manuscript. The proposed ASW-UKF-TRF preprocessing pipeline performs targeted cleaning, imputation, and smoothing to remove such noise and recover usable samples for subsequent segmentation, classification, and compression. All collected data were used for full-scale training of the algorithm model to validate the end-to-end performance of the ASW-UKF-TRF method in a real-world operational environment. The code was run on a device equipped with an Intel(R) Core(TM) i5-9300H CPU at 2.40 GHz, a Windows 10 Professional 64-bit operating system, and 8.00 GB of RAM. The software environment is Python 3.12.2 with PyTorch 2.7.1. This framework implements the data classification and compression simulation model based on the ASW-UKF-TRF algorithm, providing corresponding measurement and testing environments for each module of the algorithm. The experimental test environment is shown in Figure 5, Figure 6 and Figure 7.

First, the ASW-UKF algorithm was employed to preprocess the data, and a total of 495,686 data points were marked and processed. After cleaning, denoising, filling, and smoothing, clean data were obtained. The original file size was 18,513,304 bytes and the processed file size was 11,450,196 bytes, a reduction of about 38.15% at this stage. The fitting results between the processed data and the original data are shown in Figure 8, Figure 9 and Figure 10.
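
The size reduction quoted for the preprocessing stage is plain arithmetic on the two reported file sizes. The sketch below shows the calculation; the function name `reduction_percent` is illustrative and not taken from the authors' code.

```python
def reduction_percent(original: int, reduced: int) -> float:
    """Return how much smaller `reduced` is than `original`, in percent."""
    if original <= 0:
        raise ValueError("original must be positive")
    return 100.0 * (original - reduced) / original


# File sizes reported in the text for the ASW-UKF preprocessing stage (bytes).
raw_bytes = 18_513_304
cleaned_bytes = 11_450_196

stage_reduction = reduction_percent(raw_bytes, cleaned_bytes)
print(f"ASW-UKF stage: {stage_reduction:.2f}% fewer bytes")
```

The same helper applies to sample counts, for example `reduction_percent(80_628, 40_766)` for the sample-level reduction reported later in this section.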

A comparative experiment was conducted using the same data but different methods. Compared with the algorithms in [17,20,24] and with traditional algorithm combinations, the proposed method improved the fitting degree and compression ratio under similar operating conditions. The computational costs of the different algorithms were also compared, showing that our method is more efficient without compromising performance. The fitting results are compared in Table 3 and the computational costs in Table 4.
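
As a rough illustration of how fitting metrics of the kind reported in Table 3 can be computed, the sketch below implements R² in its standard form. The RMAE formula is an assumption: this section does not restate the definition, so the code takes RMAE to be the mean absolute error divided by the mean absolute reference value. The function names are illustrative.

```python
from statistics import fmean


def r_squared(reference, estimate):
    """Coefficient of determination of `estimate` against `reference`."""
    mean_ref = fmean(reference)
    ss_res = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


def rmae(reference, estimate):
    """ASSUMED definition: MAE normalized by the mean absolute reference."""
    mae = fmean(abs(r - e) for r, e in zip(reference, estimate))
    return mae / fmean(abs(r) for r in reference)


# Tiny synthetic example (not station data): one sample is off by 1.
reference = [1.0, 2.0, 3.0, 4.0]
estimate = [1.0, 2.0, 3.0, 5.0]
print(r_squared(reference, estimate), rmae(reference, estimate))
```

If the paper's RMAE uses a different normalization (for example the signal range), only the denominator in `rmae` would change.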

Taking the total voltage as an example, the preprocessed data from each method were plotted for intuitive comparison, as shown in Figure 11. The ASW-UKF output follows the original curve more closely than the other algorithms and is therefore more suitable for further processing.

After ASW-UKF data preprocessing, the data is further classified and compressed using the TRF algorithm. Based on key features such as the instantaneous change rate of voltage and the rolling standard deviation of current, the entire operation process is dynamically divided into four sequential segments: the load-discharge segment, the regular charging segment, the standby discharge segment, and the charge–discharge change segment. The processing results are shown in Figure 12 and Figure 13.
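
The two features named above can be sketched as follows. This is a minimal illustration, not the authors' feature extractor: the 30 s sample period comes from the text, while the rolling window of 10 samples and the use of a trailing window with population standard deviation are assumptions.

```python
from statistics import pstdev

SAMPLE_PERIOD_S = 30.0  # sampling interval reported in the text


def voltage_change_rate(voltage, dt=SAMPLE_PERIOD_S):
    """First difference of voltage per second; the first sample gets 0.0."""
    if not voltage:
        return []
    return [0.0] + [(b - a) / dt for a, b in zip(voltage, voltage[1:])]


def rolling_std(values, window=10):
    """Population std over a trailing window (assumed window of 10 samples).

    The window is shorter at the start of the series, where fewer
    samples are available.
    """
    return [
        pstdev(values[max(0, i - window + 1): i + 1])
        for i in range(len(values))
    ]


# Synthetic example values, not station data.
voltage = [700.0, 703.0, 700.0]
current = [0.0, 2.0, 2.0, 2.0]
print(voltage_change_rate(voltage))
print(rolling_std(current, window=2))
```

Feature vectors built this way, one per time step, would then be passed to the classifier that assigns each sample to one of the four segments.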

Compared with traditional random forest algorithms, time-series random forests offer better performance in processing time-series data. The classification results of RF are characterized by widespread state transitions, which are neither smooth nor stable. In contrast, TRF can significantly reduce meaningless short-term fluctuations, maintain long-term stable segments, and more closely reflect the actual operating status of energy storage power stations. A comparison of the classification performance of the two algorithms is shown in Figure 14.
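
One way such suppression of short-lived state changes can be realized is the minimum-segment-length post-processing listed in Table 5 (`TRF_MIN_SEG_LEN` = 10 samples, shorter segments merged). The sketch below is an assumption about how the merge could work: short runs are absorbed into the preceding run, and the label names are made up for the example.

```python
from itertools import groupby


def merge_short_segments(labels, min_len=10):
    """Absorb label runs shorter than `min_len` into the preceding run.

    A short run at the very start is kept, because it has no predecessor.
    """
    runs = [(label, len(list(group))) for label, group in groupby(labels)]
    merged = []  # list of [label, length] pairs
    for label, length in runs:
        if merged and (length < min_len or merged[-1][0] == label):
            merged[-1][1] += length  # extend the previous run
        else:
            merged.append([label, length])
    # Expand the runs back into a per-sample label sequence.
    return [label for label, length in merged for _ in range(length)]


# A 2-sample "standby" blip inside a long "charge" phase is smoothed away.
raw = ["charge"] * 12 + ["standby"] * 2 + ["charge"] * 12
smoothed = merge_short_segments(raw, min_len=10)
print(set(smoothed), len(smoothed))
```

Merging into the preceding segment is only one possible policy; merging into the longer neighbor or the more probable class would be equally plausible readings of the table entry.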

Figure 15, Figure 16 and Figure 17 illustrate the effects of the proposed ASW-UKF-TRF processing on representative station time series. Figure 15 shows an overlay of the original and compressed/reconstructed signals for a typical measurement channel; the close alignment indicates that the primary waveform and key transient features are preserved. Figure 16 presents the residual analysis, including a histogram and time series of reconstruction errors, along with summary statistics (mean error, RMSE, MAE, and maximum absolute error); the residuals remain small and concentrated, except for rare transient peaks. Figure 17 summarizes the classification-aware compression performance across the four data categories: per-class sampling reduction, per-class compression ratio (bytes retained/bytes original), and the number of preserved diagnostically relevant events per class (events detected before and after compression).

Both sample-level and file-size reductions are reported: the dataset was reduced from 80,628 samples to 40,766 samples (a 49.44% reduction in sample count), and the stored file size decreased from 18,513,304 bytes to 7,143,949 bytes (a 61.46% reduction in bytes). In addition to these aggregate figures, quantitative metrics support the visual plots, including reconstruction statistics (RMSE, MAE, maximum error), classification accuracy (exceeding 95% on the evaluated station dataset), anomaly F1 score for preserved events, and per-class compression ratios. These metrics confirm that the compression achieves substantial storage savings while maintaining monitoring fidelity and preserving transient diagnostic signatures.

Hyperparameter Settings

To ensure reproducibility and methodological transparency, key hyperparameters were explicitly specified for each module of the proposed ASW-UKF-TRF framework. The sampling interval was fixed at 30 s, yielding a uniform data resolution that underpinned subsequent feature extraction and segmentation. To ensure consistent labeling, the classification targets were defined by state-of-health (SOH) thresholds. Within the adaptive sliding window (ASW) module, the minimum window size, the maximum window size, and the change threshold governed the flexibility of dynamic segmentation. Additionally, values were rounded to a fixed number of decimal places to stabilize equality checks during voltage, current, and SOC comparisons.
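
The ASW settings can be illustrated with a small sketch. The window bounds and the number of rounding decimals are the values listed in Table 5. The threshold multiplier `k = 3.0` and the window update rule in `next_window` are hypothetical stand-ins, since the exact ASW update rule is defined in Section 2.1 rather than here.

```python
from statistics import median

ROUNDING_DECIMALS = 2            # Table 5: decimals used in equality checks
MIN_WINDOW, MAX_WINDOW = 1, 20   # Table 5: window bounds, in samples


def same_record(a, b, decimals=ROUNDING_DECIMALS):
    """Compare two (voltage, current, soc) tuples after rounding."""
    return all(round(x, decimals) == round(y, decimals) for x, y in zip(a, b))


def change_threshold(values, k=3.0):
    """Data-driven threshold: k times the median absolute first difference.

    k = 3.0 is a hypothetical choice; Table 5 only states a k x median rule.
    """
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return k * median(diffs)


def next_window(current, change, threshold):
    """HYPOTHETICAL update rule: reset on a fast change, otherwise grow by 1."""
    if change > threshold:
        return MIN_WINDOW
    return min(current + 1, MAX_WINDOW)


# Two readings that differ only beyond the second decimal count as duplicates.
print(same_record((3.3141, 10.004, 55.001), (3.3149, 10.0, 55.0)))
print(change_threshold([0, 1, 2, 6], k=2.0))
```

Rounding before comparison avoids treating sensor jitter in the last digits as a genuine change, which is the stabilizing effect described above.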

For state estimation with the unscented Kalman filter (UKF), the process noise covariance (Q) and measurement noise covariance (R) were set as scaled identity matrices to balance system dynamics and measurement fidelity. In the time-series random forest (TRF) classifier, the minimum segment length was introduced to suppress spurious fluctuations, while the maximum tree depth regulated model complexity. Collectively, these hyperparameters constitute a transparent configuration that facilitates both the evaluation of model performance and the reproducibility of results in independent studies. The hyperparameters are listed in Table 5.
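
To show how the Q and R scales enter the filter, the following is a minimal scalar UKF sketch that smooths a single channel and fills missing samples with the prediction. It is not the authors' implementation: the random-walk state model, the identity measurement model, the sigma-point scaling λ = 2, and the initial variance are all assumptions made for the example. Only Q = 0.01 and R = 0.05 come from Table 5.

```python
import math

Q = 0.01   # process-noise scale (Table 5), scalar case of Q = I x 0.01
R = 0.05   # measurement-noise scale (Table 5), scalar case of R = I x 0.05
N = 1      # state dimension of this scalar sketch
LAM = 2.0  # assumed sigma-point scaling, so that N + LAM = 3


def sigma_points(mean, var):
    """Return the 2N+1 sigma points and their weights for a scalar state."""
    spread = math.sqrt((N + LAM) * var)
    w0 = LAM / (N + LAM)
    wi = 1.0 / (2.0 * (N + LAM))
    return [mean, mean + spread, mean - spread], [w0, wi, wi]


def ukf_step(x, p, z, f=lambda s: s, h=lambda s: s):
    """One predict/update cycle; a missing z (None or NaN) skips the update."""
    pts, w = sigma_points(x, p)
    fx = [f(s) for s in pts]
    x_pred = sum(wi * s for wi, s in zip(w, fx))
    p_pred = sum(wi * (s - x_pred) ** 2 for wi, s in zip(w, fx)) + Q
    if z is None or math.isnan(z):
        return x_pred, p_pred  # imputation: keep the prediction
    pts, w = sigma_points(x_pred, p_pred)
    hz = [h(s) for s in pts]
    z_pred = sum(wi * s for wi, s in zip(w, hz))
    p_zz = sum(wi * (s - z_pred) ** 2 for wi, s in zip(w, hz)) + R
    p_xz = sum(wi * (s - x_pred) * (t - z_pred) for wi, s, t in zip(w, pts, hz))
    gain = p_xz / p_zz
    return x_pred + gain * (z - z_pred), p_pred - gain * p_zz * gain


def ukf_fill(measurements, p0=1.0):
    """Filter a series, replacing missing samples by the predicted state."""
    valid = [m for m in measurements if m is not None and not math.isnan(m)]
    if not valid:
        raise ValueError("need at least one valid measurement")
    x, p = valid[0], p0
    out = []
    for z in measurements:
        x, p = ukf_step(x, p, z)
        out.append(x)
    return out


print(ukf_fill([3.30, 3.31, float("nan"), 3.32, 3.31]))
```

A larger Q makes the estimate follow the measurements more closely, while a larger R smooths more strongly; with linear models as used here the sketch behaves like an ordinary Kalman filter, and the unscented transform only matters once `f` or `h` is nonlinear.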

4. Discussion

The ASW-UKF-TRF algorithm proposed in this paper achieves end-to-end processing and efficient compression of massive battery data from energy storage power stations. It uses an adaptive sliding window (ASW) to clean noise and duplicate data, an unscented Kalman filter (UKF) to fill in missing values and smooth the time series, and a time-series random forest (TRF) to classify and compress data based on importance. Simulation experiments on real-world data from Hunan Province demonstrate that this method significantly reduces the total data volume while maintaining high accuracy (the lowest average RMAE and the highest average R² among the compared methods in Table 3), providing lightweight yet high-quality time-series data support for applications such as online monitoring, fault diagnosis, and intelligent scheduling, thereby demonstrating clear engineering application value.

Theoretically, this paper integrates the ASW, UKF, and TRF paradigms for the first time, proposing a new framework that combines adaptive sliding window cleaning, filtering interpolation smoothing, and importance-based hierarchical compression. To address the characteristics of sudden changes and missing values in battery data, a dedicated UKF filtering-interpolation strategy was designed to enhance data reconstruction accuracy; the interpretability of TRF was utilized to assess the importance of different data categories, providing a quantifiable method for prioritizing time-series data management. These theoretical innovations offer new quantitative tools and methodologies for data processing and optimized scheduling in large-scale energy storage systems.

Although the method demonstrated notable improvements in accuracy and compression efficiency under the present test conditions, several limitations related to scale and battery ageing remain. First, computational cost and memory usage increase with the number of monitored variables, the length of sliding windows, and the number of sigma points used in the UKF stage. For continuous station streams, this implies higher processing time and an increased memory footprint, unless bounded by windowing or reduced-order models. Second, data from aged or faulty batteries tend to exhibit more frequent transients, nonstationary bias, and higher noise levels, which can degrade classification accuracy and compression fidelity if the importance metric or model parameters are not adapted. To make these considerations explicit, we propose the following evaluation and mitigation plan: (i) quantify runtime (throughput in samples/sec) and peak memory as functions of data volume using representative workloads (e.g., datasets of ~10⁵, ~10⁶ and ~10⁷ samples); (ii) evaluate algorithmic performance across SOH levels (e.g., nominal 100%, 90%, 80% and 70%) and for injected fault cases, reporting classification accuracy, anomaly F1, compression ratio and reconstruction error as primary metrics; (iii) explore mitigation strategies such as reduced-order state models or sparse sigma-point selection to cut per-update cost, adaptive thresholding and online parameter updates to track ageing-induced distribution shifts, and parallel/streaming implementations (batching or GPU acceleration for the TRF) to increase throughput. These steps will allow quantifying how accuracy and compression scale with dataset size and battery degradation. They will inform practical deployment choices (e.g., update frequency, model complexity, and hardware provisioning) for long-term station operation.
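
Step (i) of the plan above could be prototyped with a small harness such as the following. It is only a sketch: `moving_average` is a placeholder workload standing in for the real pipeline, the data generator is synthetic, and the sizes used in the demo are far below the ~10⁵ to ~10⁷ samples mentioned in the text so that it runs quickly.

```python
import time
import tracemalloc


def benchmark(process, make_data, sizes):
    """Measure throughput and peak traced memory of `process` per data size."""
    results = []
    for n in sizes:
        data = make_data(n)

        # Timing run, with tracing off because tracemalloc slows execution.
        start = time.perf_counter()
        process(data)
        elapsed = time.perf_counter() - start

        # Separate memory run with allocation tracing enabled.
        tracemalloc.start()
        process(data)
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()

        results.append({
            "samples": n,
            "samples_per_sec": n / elapsed if elapsed > 0 else float("inf"),
            "peak_bytes": peak_bytes,
        })
    return results


def moving_average(values, window=10):
    """Placeholder workload: trailing moving average over `window` samples."""
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out


for row in benchmark(moving_average,
                     lambda n: [float(i % 50) for i in range(n)],
                     sizes=(10**3, 10**4)):
    print(row)
```

Note that `tracemalloc` only tracks Python-level allocations, so memory used inside native libraries would need a process-level measurement instead.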

5. Conclusions

This study systematically evaluated the ASW-UKF-TRF pipeline on real operational data from an energy storage platform. Quantitatively, averaged over the voltage, current, and SOC channels in Table 3, the proposed method reduced the relative mean absolute error (RMAE) by ≈30.8% and improved R² by ≈0.07 percentage points (≈0.0007 in absolute R²) compared with the unsupervised LSTM baseline; by ≈14.6% and ≈0.08 percentage points versus MCC-EKF; by ≈37.9% and ≈0.09 percentage points against the autoencoder; by ≈18.3% and ≈0.08 percentage points compared to FSW-UKF; and by ≈2.9% and ≈0.03 percentage points relative to ASW-EKF. On data volume metrics, the pipeline reduced the sample count from 80,628 to 40,766 (≈49.44% fewer samples) and decreased the stored file size from 18,513,304 bytes to 7,143,949 bytes (≈61.46% reduction in bytes). Classification performance on the evaluated station dataset exceeded 95% accuracy, and visual plus residual analyses indicate that key transient diagnostic signatures are largely preserved after compression. Taken together, these results demonstrate that ASW-UKF-TRF achieves substantial dual benefits in both compression efficiency and information retention, supporting improved online monitoring, fault diagnosis, and scheduling for energy storage plants under the tested conditions.

Author Contributions

Conceptualization, B.H.; methodology, X.X. and J.Y.; software, J.Y. and Q.Y.; validation, Y.X. and K.L.; formal analysis, J.Y.; investigation, K.L.; resources, X.X.; data curation, Q.Y.; writing—original draft preparation, B.H.; writing—review and editing, B.H.; visualization, X.X.; supervision, B.H.; project administration, B.H.; funding acquisition, B.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research is funded by the Science and Technology Project of State Grid Integrated Energy Service Group Co., Ltd. [Project No. 647853240339], and the project is titled “Research on the Monitoring of the Operating Status of Energy Storage Power Stations and Station-level Intelligent Control Technology”.

Data Availability Statement

The datasets generated and analyzed during the current study are proprietary operational data owned by the collaborating company and contain commercially sensitive information. Consequently, the raw data are not publicly available. De-identified and aggregated data that are sufficient to reproduce the principal findings of this study may be provided upon reasonable request, subject to approval by the data provider and completion of a formal data-use and confidentiality agreement, including a non-disclosure agreement. Requests for access should be directed to the corresponding author, who will forward legitimate requests to the data provider. The form of any provided data and the access modality, such as secure on-site review or controlled electronic transfer, will be determined by the data provider and governed by the executed agreements.

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Nomenclature

This section defines the key terms and symbols used throughout the manuscript.

$y_{t}$ , $y_{t - 1}$	Measured values at times t and t − 1, respectively.
θ	Threshold for the data change rate.
$V_{w}$ , $T_{c}$	Critical limits of voltage and temperature, respectively.
$w^{'}$	Current window size.
$w$	Minimum window size.
$k$	Scaling factor that controls the expansion of the window.
$χ_{k}$	System state at time step k.
$z_{k}$	Observation at time step k.
$w_{k - 1}$	Process noise at time step k − 1.
$v_{k}$	Observation noise at time step k.
$u_{k - 1}$	Control input at time step k − 1.
$χ_{k}^{(i)}$	Sigma point at step k.
$W^{(i)}$	Weight associated with the i-th sigma point.
n	State dimension.
$λ$	Scaling factor.
$Q$	Process noise covariance.
$R$	Observation noise covariance.
$\hat{x}_{k \mid k-1}$	Predicted state estimate at time step k.
$P_{k \mid k-1}$	Predicted state covariance at time step k.
$\hat{z}_{k \mid k-1}$	Predicted observation at time step k.
$P_{z z}$	Observation covariance.
$P_{x z}$	Cross-covariance.
$K_{k}$	Kalman gain at time step k.
$P_{k}$	Updated state covariance at time step k.
${\hat{x}}_{k}$	Updated state estimate at time step k.
$X_{t}$	Input sample at time t with time window length p.
$y_{t}$	Target variable at time t.
$D$	Training set generated using a sliding window method.
$f_{i} (\cdot)$	i-th decision tree.
N	Total number of trees.
${\hat{y}}_{t}$	Final prediction result (average for regression or majority vote for classification).
p	Time window length (lag order).

References

1. Adeyinka, A.M.; Esan, O.C.; Ijaola, A.O.; Farayibi, P.K. Advancements in hybrid energy storage systems for enhancing renewable energy-to-grid integration. Sustain. Energy Res. 2024, 11, 26.
2. Aslam, S.; Aung, P.P.; Rafsanjani, A.S.; Majeed, A.P.P.A. Machine learning applications in energy systems: Current trends, challenges, and research directions. Energy Inform. 2025, 8, 62.
3. Zhao, H.; Cui, C.; Zhang, Z. Assessing the dynamics of power curtailment in China: Market insights from wind, solar, and nuclear energy integration. Int. J. Hydrogen Energy 2025, 118, 209–216.
4. Graf, C.; Marcantonini, C. Renewable energy and its impact on thermal generation. Energy Econ. 2017, 66, 421–430.
5. Jacobson, M.Z.; Delucchi, M.A.; Bauer, Z.A.F.; Goodman, S.C.; Cameron, M.A.; Bozonnat, C.; Chobadi, L.; Enevoldsen, P.; Erwin, J.R.; Fobi, S.N.; et al. 100% Clean and Renewable Wind, Water, and Sunlight All-Sector Energy Roadmaps for 139 Countries of the World. Joule 2017, 1, 108–121.
6. Naboni, E.; Natanian, J.; Brizzi, G.; Florio, P.; Chokhachian, A.; Galanos, T.; Rastogi, P. A digital workflow to quantify regenerative urban design in the context of a changing climate. Renew. Sustain. Energy Rev. 2019, 113, 109255.
7. Wang, J.; Conejo, A.J.; Wang, C.; Yan, J. Smart grids, renewable energy integration, and climate change mitigation—Future electric energy systems. Appl. Energy 2012, 96, 1–3.
8. Worku, M.Y. Recent Advances in Energy Storage Systems for Renewable Source Grid Integration: A Comprehensive Review. Sustainability 2022, 14, 5985.
9. Wei, P.; Abid, M.; Adun, H.; Awoh, D.K.; Cai, D.; Zaini, J.H.; Bamisile, O. Progress in Energy Storage Technologies and Methods for Renewable Energy Systems Application. Appl. Sci. 2023, 13, 5626.
10. Gao, T.; Lu, W. Machine learning toward advanced energy storage devices and systems. iScience 2021, 24, 101936.
11. Lipu, M.H.; Ansari, S.; Miah, S.; Meraj, S.T.; Hasan, K.; Shihavuddin, A.; Hannan, M.; Muttaqi, K.M.; Hussain, A. Deep learning enabled state of charge, state of health and remaining useful life estimation for smart battery management system: Methods, implementations, issues and prospects. J. Energy Storage 2022, 55, 105752.
12. Liu, S.; Li, K.; Yu, J. Battery pack condition monitoring and characteristic state estimation: Challenges, techniques, and future prospectives. J. Energy Storage 2024, 105, 114446.
13. Lei, X.; Xie, F.; Wang, J.; Zhang, C. A review of lithium-ion battery state of health and remaining useful life estimation methods based on bibliometric analysis. J. Traffic Transp. Eng. 2024, 11, 1420–1446.
14. Xia, F.; Wang, K.; Chen, J. State of health and remaining useful life prediction of lithium-ion batteries based on a disturbance-free incremental capacity and differential voltage analysis method. J. Energy Storage 2023, 64, 107161.
15. Min, H.; Yan, Y.; Sun, W.; Yu, Y.; Jiang, R.; Meng, F. Construction and Estimation of Battery State of Health Using a De-LSTM Model Based on Real Driving Data. Energies 2023, 16, 8088.
16. Heinrich, F.; Noering, F.-D.; Pruckner, M.; Jonas, K. Unsupervised data-preprocessing for Long Short-Term Memory based battery model under electric vehicle operation. J. Energy Storage 2021, 38, 102598.
17. Yang, Y.; Yang, J.; Wu, X.; Fu, L.; Gao, X.; Xie, X.; Ouyang, Q. Battery pack capacity prediction using deep learning and data compression technique: A method for real-world vehicles. J. Energy Chem. 2025, 106, 553–564.
18. Ma, Y.; Li, J.; Gao, J.; Chen, H. Prognostication of lithium-ion battery capacity fade based on data space compression visualization and SMA-ISVR. Appl. Energy 2024, 380, 124974.
19. Liu, X.; Li, K.; Wu, J.; He, Y.; Liu, X. An extended Kalman filter based data-driven method for state of charge estimation of Li-ion batteries. J. Energy Storage 2021, 40, 102655.
20. Zhang, S.; Guo, X.; Zhang, X. An improved adaptive unscented kalman filtering for state of charge online estimation of lithium-ion battery. J. Energy Storage 2020, 32, 101980.
21. Peng, J.; Luo, J.; He, H.; Lu, B. An improved state of charge estimation method based on cubature Kalman filter for lithium-ion batteries. Appl. Energy 2019, 253, 113520.
22. Hu, X.; Sun, F.; Zou, Y. Comparison between two model-based algorithms for Li-ion battery SOC estimation in electric vehicles. Simul. Model. Pract. Theory 2013, 34, 1–11.
23. Chen, J.; Feng, X.; Jiang, L.; Zhu, Q. State of charge estimation of lithium-ion battery using denoising autoencoder and gated recurrent unit recurrent neural network. Energy 2021, 227, 120451.
24. Bian, C.; He, H.; Yang, S.; Huang, T. State-of-charge sequence estimation of lithium-ion battery based on bidirectional long short-term memory encoder-decoder architecture. J. Power Sources 2020, 449, 227558.
25. Li, C.; Kim, G.-W. Improved State-of-Charge Estimation of Lithium-Ion Battery for Electric Vehicles Using Parameter Estimation and Multi-Innovation Adaptive Robust Unscented Kalman Filter. Energies 2024, 17, 272.
26. Costa, H.; Silva, M.; Sánchez-Gendriz, I.; Viegas, C.M.D.; Silva, I. An Evolving Multivariate Time Series Compression Algorithm for IoT Applications. Sensors 2024, 24, 7273.
27. Pandian, A.; Subbian, S.; Natarajan, P. Robust GenUT-UKF for state of charge estimation of Li-ion battery against data and model uncertainty. J. Energy Storage 2025, 119, 116360.
28. Nachimuthu, S.; Alsaif, F.; Devarajan, G.; Vairavasundaram, I. Real time SOC estimation for Li-ion batteries in Electric vehicles using UKBF with online parameter identification. Sci. Rep. 2025, 15, 1714.
29. Wang, S.; Huang, P.; Lian, C.; Liu, H. Multi-interest adaptive unscented Kalman filter based on improved matrix decomposition methods for lithium-ion battery state of charge estimation. J. Power Sources 2024, 606, 234547.
30. Wang, L.; Wang, F.; Xu, L.; Li, W.; Tang, J.; Wang, Y. SOC estimation of lead–carbon battery based on GA-MIUKF algorithm. Sci. Rep. 2024, 14, 3347.

Figure 1. Overall algorithm flow chart.

Figure 2. Random Forest algorithm structure diagram.

Figure 3. Time Series Random Forest algorithm structure diagram.

Figure 4. Adaptive Sliding Window-Unscented Kalman Filter-Time Series Random Forest Algorithm Flowchart.

Figure 5. Experimental energy storage battery structure diagram.

Figure 6. Experimental battery platform system.

Figure 7. Data acquisition CAN analyzer.

Figure 8. Voltage data processing results.

Figure 9. Current data processing results.

Figure 10. SOC data processing results.

Figure 11. Comparison of voltage data processing.

Figure 12. Continuous total voltage curve after Time Series Random Forest processing.

Figure 13. Continuous total voltage curve after Random Forest processing.

Figure 14. Comparison of Time Series Random Forest and Random Forest classification performance.

Figure 15. Compressed voltage results.

Figure 16. Compressed current results.

Figure 17. Compressed SOC results.

Table 1. Comparison table of algorithm advantages and disadvantages.

Method	Strength	Weakness
ASW-UKF-TRF (this work)	Joint online consistency + diagnostic-aware compression	Requires importance metric design and tuning
LSTM	Models complex temporal patterns well	Needs large labeled data, high compute
PCA/t-SNE	Fast dimensionality reduction, interpretable	May miss nonlinear/temporal details
Feature-map compression	Large volume reduction for aggregated logs	May lose fine transient events
Autoencoder	Strong nonlinear feature learning	Risk of over-smoothing; computationally heavy
Kalman variants (EKF/UKF/adaptive)	Low-latency, interpretable, online-capable	Sensitive to model mismatch/initialization
Online time-series compression	Streamable; can preserve local dynamics	Trade-off between compression and transient retention

Table 2. Comprehensive comparison of research gaps.

Reference	Covered	Gap	Relevance to Our Work
Neural network for vehicle time-series	NN-based continuous automated SOH estimation using vehicle time-series data	Tailored to vehicle signals and operating profiles; may not generalize to stationary energy storage	We borrow automation ideas but will validate on ES data and adapt features for PACKs
Improved LSTM + unsupervised recurring-load extraction	Unsupervised extraction of recurring loads, compressed sampling for LSTM-based battery modeling (vehicle context)	Designed for in-vehicle recurring patterns; optimized for low computing on vehicles, not PACK heterogeneity	Use compression/balanced-sampling idea, but adapt windowing and aggregation for many-cell ES
Feature-map compression of monthly charging sequences	Convert monthly EV charging sequences into feature maps to reduce volume while retaining key info	Works well for periodic charging profiles; may lose transient/pack-level details relevant for ES	Consider feature-map approach but test information retention on ES operational traces
Multi-stage compression: PCA → t-SNE → DBSCAN	Dimensionality reduction + local structure preservation + noise removal for capacity prediction	t-SNE scales poorly to very large datasets; DBSCAN sensitive to parameters; focused on EV capacity prediction	Adopt multi-stage idea but require scalable, parameter-robust pipeline for large ES datasets
Kalman-filter variants and Autoencoders	KF modifications: low-computation preprocessing; Autoencoders: denoising via learned features	KF variants sensitive to initial and low-quality data; Autoencoders risk over-smoothing, loss of original features and higher compute cost	Our ASW–UKF–TRF approach aims to retain KF efficiency while improving robustness; compare against autoencoder denoising and evaluate feature preservation and computing cost

Table 3. Comparison of different algorithm effects.

Processing Algorithm	Voltage	Current	SOC
ASW-UKF	RMAE: 0.5822	RMAE: 2.7615	RMAE: 0.3929
ASW-UKF	R²: 0.9994	R²: 0.9983	R²: 0.9999
Unsupervised LSTM	RMAE: 0.6426	RMAE: 3.7592	RMAE: 0.9042
Unsupervised LSTM	R²: 0.9993	R²: 0.9969	R²: 0.9994
MCC-EKF	RMAE: 0.7471	RMAE: 4.0629	RMAE: 0.3559
MCC-EKF	R²: 0.9991	R²: 0.9963	R²: 0.9999
Autoencoder	RMAE: 0.8139	RMAE: 3.9836	RMAE: 0.8657
Autoencoder	R²: 0.9989	R²: 0.9965	R²: 0.9995
FSW-UKF	RMAE: 0.7476	RMAE: 4.0633	RMAE: 0.3959
FSW-UKF	R²: 0.9991	R²: 0.9963	R²: 0.9999
ASW-EKF	RMAE: 0.6309	RMAE: 3.3187	RMAE: 0.3392
ASW-EKF	R²: 0.9993	R²: 0.9976	R²: 0.9999
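
The relative improvements quoted in the Conclusions can be reproduced from the values in this table. The sketch below assumes the averaging convention that matches those figures: the RMAE reduction is the mean of the per-channel relative reductions, and the R² gain is the difference of the channel-averaged R² values expressed in percentage points. The dictionary and function names are illustrative.

```python
from statistics import fmean

# (voltage, current, SOC) values copied from Table 3.
RMAE = {
    "ASW-UKF": (0.5822, 2.7615, 0.3929),
    "Unsupervised LSTM": (0.6426, 3.7592, 0.9042),
    "MCC-EKF": (0.7471, 4.0629, 0.3559),
    "Autoencoder": (0.8139, 3.9836, 0.8657),
    "FSW-UKF": (0.7476, 4.0633, 0.3959),
    "ASW-EKF": (0.6309, 3.3187, 0.3392),
}
R2 = {
    "ASW-UKF": (0.9994, 0.9983, 0.9999),
    "Unsupervised LSTM": (0.9993, 0.9969, 0.9994),
    "MCC-EKF": (0.9991, 0.9963, 0.9999),
    "Autoencoder": (0.9989, 0.9965, 0.9995),
    "FSW-UKF": (0.9991, 0.9963, 0.9999),
    "ASW-EKF": (0.9993, 0.9976, 0.9999),
}


def mean_rmae_reduction(baseline, proposed="ASW-UKF"):
    """Mean of the per-channel relative RMAE reductions, in percent."""
    return 100.0 * fmean(
        (b - p) / b for b, p in zip(RMAE[baseline], RMAE[proposed])
    )


def mean_r2_gain(baseline, proposed="ASW-UKF"):
    """Difference of channel-averaged R^2, in percentage points."""
    return 100.0 * (fmean(R2[proposed]) - fmean(R2[baseline]))


for name in RMAE:
    if name != "ASW-UKF":
        print(f"{name}: RMAE -{mean_rmae_reduction(name):.1f}%, "
              f"R2 +{mean_r2_gain(name):.2f} pp")
```

Because the averages are taken per channel, a baseline can still be better on a single channel (for example SOC for MCC-EKF and ASW-EKF) while the proposed method is ahead on average.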

Table 4. Comparison of computational costs of different algorithms.

Method	Training Time (CPU)	Peak RAM	CPU Util. (avg)	Notes
Autoencoder	~25–40 min	~2.0–3.0 GB	60–80%	Training-heavy; dense ops but small model size; reasonable inference latency on CPU.
Unsupervised LSTM	~30–50 min	~2.5–4.0 GB	70–95%	Sequence model; higher training and inference cost, higher CPU load; sensitive to window length.
ASW-UKF (proposed)	no training	~50–300 MB	5–25%	Minimal training cost; very low per-sample latency and low memory footprint; best suited for constrained BMS.

Table 5. List of model hyperparameters.

Parameter	Module	Value	Short Description
data sample resolution	Data resolution	1 sample/30 s	Raw acquisition cadence.
window_overlap	Data segmentation	50%	Sliding-window overlap fraction.
classification_targets	Label definitions	Normal: SOH ≥ 95%; Mild: 90–95%; Severe: <90%	Class thresholds/labeling rules (report exact rules in Methods)
min_window	ASW (minimum)	1 (sample)	Minimum adaptive-window length (in samples)
max_window	ASW (maximum)	20 (samples) ≈ 600 s	Maximum adaptive-window length (in samples)
change_threshold	ASW (voltage-rate)	data-driven (rule: k × median rate)	Threshold on the voltage change rate.
rounding_decimals	ASW equality check	2 (decimals)	Decimal places used when comparing (voltage, current, SOC)
UKF_Q_scale	UKF (process noise scale)	0.01 (Q = I × 0.01)	Scales process-noise covariance.
UKF_R_scale	UKF (measurement noise scale)	0.05 (R = I × 0.05)	Scales measurement-noise covariance.
TRF_MIN_SEG_LEN	TRF postproc smoothing	10 (samples) ≈ 300 s	Minimum segment length to keep; shorter segments merged.
TRF_max_depth	TRF classifier complexity	10	RandomForest maximum tree depth (None = unlimited).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
