Predictive Active Cell Balancing for Li-Ion Batteries Using GRU-Based Voltage Estimation
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This paper, titled “Predictive Active Cell Balancing for Li-Ion Batteries Using GRU-Based Voltage Estimation” and submitted by Olteanu M. and Petreus D., presents a practical predictive active cell-balancing approach for series Li-ion packs, in which a GRU-based model forecasts short-horizon cell voltages from recent voltage/current/temperature history and those predictions drive balancing decisions on a flyback-based active balancer. The work is framed around reducing unnecessary/reactive transfers and improving convergence behavior, and it includes an experimental comparison on a 12-cell NCR18650B pack. However, the authors are encouraged to address the following concerns to make their scientific statements clearer and more reader-friendly:
Overall, I think the authors have done solid work here. The manuscript addresses a real control limitation of voltage-threshold balancing under dynamic conditions, and it goes beyond a purely algorithmic study by tying model development to a real balancing platform and reporting an experimental comparison between a reactive rule and a prediction-driven rule under matched timing/stabilization logic. The prediction component is also evaluated across multiple horizons with error metrics reported, which is helpful context before the controller comparison.
The main technical point that would benefit from one clean clarification is how the multi-step forecasts are produced in practice. The problem formulation is written as a multi-step forecasting task over a horizon H, and the manuscript discusses direct versus recursive multi-step approaches. Later, the model architecture is described as producing the “next step” voltage output via a single-neuron dense layer, while results are reported for t+1, t+5, and t+10. A short, explicit statement explaining whether t+5/t+10 comes from (i) separately trained models, (ii) a direct multi-output head, or (iii) recursive roll-out would remove ambiguity and make it easier to assess whether the deployed controller matches the reported horizon evaluation.
The manuscript should make the sampling/feature alignment between training and deployment more explicit. The dataset description states a 1 s sampling interval. In the experimental balancing loop, the controller collects sixty samples per cell at 100 Hz, and during balancing intervals voltages are sampled at 10 ms to support current estimation. If the GRU is trained on 1 Hz sequences but fed higher-rate experimental traces, the paper should briefly state what preprocessing is applied (downsampling/averaging/feature extraction) so the inputs presented to the model have the same meaning/statistics as in training. This is a reproducibility detail, but it’s directly tied to the credibility of the deployment.
The approach used to infer balancing current from the slope of the voltage trace and an empirically determined gain factor is a reasonable engineering solution, but the manuscript would benefit from a couple of sentences on how that gain factor was obtained or calibrated. Similarly, the assumption that cell temperature equals ambient (due to lack of per-cell sensors) should be presented explicitly as a limitation, and the ambient temperature used in the balancing comparison should be stated clearly, since temperature is one of the model inputs.
On terminology: the control loop is referred to as “MPC cycles,” but what is described in the comparison reads as a dead band rule applied to measured voltages (reactive) versus the same dead band applied to predicted voltages (predictive). Since MPC is also introduced earlier in the manuscript in the standard optimization-based sense, I would suggest slightly tightening the wording here (or defining “MPC” as used in this paper) to avoid readers expecting an explicit optimization formulation that isn’t part of the contribution.
The results are promising and directionally clear: the predictive strategy reduces the reported voltage spread substantially more than the reactive method over the same 100 s window, and the ΔVmax trajectory suggests faster convergence. However, I’d recommend tightening the definitions and ensuring consistency between “voltage spread / voltage difference” and ΔVmax, especially given the manuscript’s emphasis on measuring only when balancing switches are disabled and after a stabilization check. Also, the abstract highlights reduced “balancing command reconfigurations,” but this benefit isn’t clearly quantified alongside the voltage metrics; adding a simple count of command updates or state transitions (already available from the controller log) would make that claim much stronger without expanding scope.
Author Response
Comments 1: The main technical point that would benefit from one clean clarification is how the multi-step forecasts are produced in practice. The problem formulation is written as a multi-step forecasting task over a horizon H, and the manuscript discusses direct versus recursive multi-step approaches. Later, the model architecture is described as producing the “next step” voltage output via a single-neuron dense layer, while results are reported for t+1, t+5, and t+10. A short, explicit statement explaining whether t+5/t+10 comes from (i) separately trained models, (ii) a direct multi-output head, or (iii) recursive roll-out would remove ambiguity and make it easier to assess whether the deployed controller matches the reported horizon evaluation.
Response 1: Thank you for this insightful comment. We agree that the manuscript required a clearer explanation of how the multi-step forecasts are generated. To address this, we have revised the text to explicitly describe the forecasting strategy and the model configuration used for each prediction horizon. Specifically, we now clarify that separate models were trained for H = 1, H = 5, and H = 10, each using a direct multi-output dense layer to produce the corresponding voltage forecasts. The revised text has been added on page 23, paragraph 4, lines 693–702.
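As an illustration of the direct multi-output setup described in this response, the sketch below builds training pairs in which each input window is mapped to the next H voltage values in one shot, with one dataset per horizon. It is a minimal sketch, not the authors' pipeline: the function name `make_direct_windows`, the window length of 20 samples, and the synthetic feature array are all assumptions introduced here.

```python
import numpy as np

def make_direct_windows(features, target, window, horizon):
    """Build (X, Y) pairs for direct multi-step forecasting.

    features: array of shape (T, F), e.g. voltage, current, temperature
    target:   array of shape (T,), the cell voltage to forecast
    Returns X with shape (N, window, F) and Y with shape (N, horizon),
    where Y[i] holds the `horizon` target values that follow window i.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(target) - window - horizon + 1
    if n <= 0:
        raise ValueError("series too short for this window and horizon")
    X = np.stack([features[i:i + window] for i in range(n)])
    Y = np.stack([target[i + window:i + window + horizon] for i in range(n)])
    return X, Y

# Synthetic 1 Hz trace (hypothetical values): voltage ramp, zero current, 25 degC.
T = 100
feats = np.column_stack([np.linspace(3.6, 3.7, T), np.zeros(T), np.full(T, 25.0)])

# One dataset per horizon, mirroring "separate models for H = 1, 5, 10".
# The window length of 20 is an assumption, not a value from the manuscript.
datasets = {H: make_direct_windows(feats, feats[:, 0], window=20, horizon=H)
            for H in (1, 5, 10)}
```

Under this scheme a model trained on `datasets[H]` would need an output layer with H units, which is what distinguishes it from a recursive roll-out of a single-step model.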
Comments 2: The manuscript should make the sampling/feature alignment between training and deployment more explicit. The dataset description states a 1 s sampling interval. In the experimental balancing loop, the controller collects sixty samples per cell at 100 Hz, and during balancing intervals voltages are sampled at 10 ms to support current estimation. If the GRU is trained on 1 Hz sequences but fed higher-rate experimental traces, the paper should briefly state what preprocessing is applied (downsampling/averaging/feature extraction) so the inputs presented to the model have the same meaning/statistics as in training. This is a reproducibility detail, but it’s directly tied to the credibility of the deployment.
Response 2: Thank you for this valuable observation. We agree that the manuscript required a clearer explanation of how the sampling rates used during deployment are aligned with the 1 Hz sampling rate of the training dataset. To address this, we have added a detailed clarification describing the preprocessing steps applied to the experimental measurements before they are used as inputs to the GRU model.
Specifically, we now explain that although the controller acquires 60 voltage samples at 100 Hz during each control cycle, these measurements are collected under near-zero current conditions and after a short relaxation interval, resulting in a quasi-steady voltage sequence that is statistically consistent with the 1 Hz training data. We also clarify that voltage measurements are filtered to reduce noise, and that the high-frequency 10 ms sampling used during active balancing is employed solely for current estimation and is not fed directly to the predictive model. The revised text has been added on page 30, paragraphs 3–4, lines 880–889.
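One simple way to reduce a quasi-steady 100 Hz burst to a noise-filtered voltage value can be sketched as follows. The response does not specify which filter is used, so the trimmed mean here is purely an assumption for illustration, as are the function name `burst_to_sample` and the example values.

```python
import numpy as np

def burst_to_sample(burst_volts, trim=0.1):
    """Collapse one rest-period burst into a single voltage via a trimmed mean.

    The lowest and highest `trim` fraction of samples are discarded before
    averaging, which suppresses isolated spikes. This is an assumed filter,
    not necessarily the one used in the manuscript.
    """
    v = np.sort(np.asarray(burst_volts, dtype=float))
    k = int(len(v) * trim)
    core = v[k:len(v) - k] if k > 0 else v
    return float(core.mean())

# Hypothetical burst: 60 samples at rest with one measurement spike.
burst = [3.700] * 59 + [4.000]
filtered = burst_to_sample(burst)
```

With a steady cell voltage, the single spike falls in the discarded tail, so the filtered value stays at the rest voltage.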
Comments 3: The approach used to infer balancing current from the slope of the voltage trace and an empirically determined gain factor is a reasonable engineering solution, but the manuscript would benefit from a couple of sentences on how that gain factor was obtained or calibrated. Similarly, the assumption that cell temperature equals ambient (due to lack of per-cell sensors) should be presented explicitly as a limitation, and the ambient temperature used in the balancing comparison should be stated clearly, since temperature is one of the model inputs.
Response 3: Thank you for this helpful comment. We agree that additional clarification regarding the calibration of the gain factor and the temperature assumption improves the transparency and reproducibility of the proposed method. To address this, we have added two explanatory paragraphs in the revised manuscript.
First, we now specify that the gain factor used for current estimation was selected empirically to ensure consistency between the estimated current and the observed voltage response during balancing. This clarification has been added on page 29, paragraph 1, lines 819-820.
Second, we explicitly state that, due to the absence of per-cell temperature sensors, the cell temperature was approximated using the ambient temperature. We justify this assumption by noting that the balancing currents are relatively low and the test intervals short, making significant self-heating unlikely. We also acknowledge this as a limitation of the current implementation, since potential temperature gradients between cells cannot be captured. This explanation has been added on page 29, paragraph 4, lines 839–841.
Comments 4: On terminology: the control loop is referred to as “MPC cycles,” but what is described in the comparison reads as a dead band rule applied to measured voltages (reactive) versus the same dead band applied to predicted voltages (predictive). Since MPC is also introduced earlier in the manuscript in the standard optimization-based sense, I would suggest slightly tightening the wording here (or defining “MPC” as used in this paper) to avoid readers expecting an explicit optimization formulation that isn’t part of the contribution.
Response 4: Thank you for this important observation. We agree that the use of the term “MPC cycles” may unintentionally suggest the presence of a full optimization-based Model Predictive Control scheme, which is not implemented in this work. The balancing strategy used in our experiments relies on a dead band decision rule applied either to measured voltages (reactive) or to GRU-predicted voltages (predictive), without solving an optimization problem. To avoid any confusion, we have removed the term “MPC cycles” from the experimental description and replaced it with the more accurate expression “balancing cycles” throughout the relevant sections. This terminology better reflects the actual control logic used in the study and prevents misinterpretation regarding the presence of an MPC formulation. These changes have been applied in the following locations of the revised manuscript: page 30, paragraph 2, line 874; page 30, paragraph 5, line 891; page 30, paragraph 7, line 898; page 30, paragraph 8, line 911.
Comments 5: The results are promising and directionally clear: the predictive strategy reduces the reported voltage spread substantially more than the reactive method over the same 100 s window, and the ΔVmax trajectory suggests faster convergence. However, I’d recommend tightening the definitions and ensuring consistency between “voltage spread / voltage difference” and ΔVmax, especially given the manuscript’s emphasis on measuring only when balancing switches are disabled and after a stabilization check. Also, the abstract highlights reduced “balancing command reconfigurations,” but this benefit isn’t clearly quantified alongside the voltage metrics; adding a simple count of command updates or state transitions (already available from the controller log) would make that claim much stronger without expanding scope.
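The difference between the reactive and predictive variants of the dead band rule can be sketched as the same function applied to two different inputs. This is a minimal sketch under stated assumptions: the pack mean is taken as the reference voltage, the command names are invented for illustration, and the voltage values are hypothetical; only the ±10 mV dead band comes from the review record.

```python
def deadband_commands(cell_volts, deadband=0.010):
    """Return one command per cell: 'discharge', 'charge', or 'idle'.

    A cell is commanded only when it deviates from the pack mean by more
    than `deadband` volts. Using the pack mean as reference is an assumption.
    """
    mean_v = sum(cell_volts) / len(cell_volts)
    commands = []
    for v in cell_volts:
        if v > mean_v + deadband:
            commands.append("discharge")
        elif v < mean_v - deadband:
            commands.append("charge")
        else:
            commands.append("idle")
    return commands

# Hypothetical values: measured voltages are still inside the dead band,
# while the forecast voltages have already drifted outside it.
measured = [3.700, 3.708, 3.694, 3.702]
predicted = [3.700, 3.716, 3.686, 3.702]

reactive = deadband_commands(measured)     # rule applied to measurements
predictive = deadband_commands(predicted)  # same rule applied to forecasts
```

In this example the reactive rule leaves every cell idle, whereas the predictive rule already issues commands for the two cells forecast to leave the dead band; no optimization problem is solved in either case.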
Response 5: Thank you for these constructive suggestions. We agree that the terminology related to voltage imbalance metrics required clarification and consistency. To address this, we have revised the manuscript to use the term “voltage difference” consistently throughout the text and to explicitly define ΔVmax as the maximum instantaneous voltage difference between any two cells, measured only after stabilization and with all balancing switches disabled. These clarifications have been incorporated on page 31, paragraph 3, line 931; page 32, paragraph 1, line 937; page 32, paragraph 3, lines 947–949. Regarding the second part of the comment, we appreciate the recommendation to quantify the number of balancing command reconfigurations. We have now included this metric in the results section and added the corresponding values extracted from the controller log. This addition strengthens the comparison between reactive and predictive strategies by explicitly showing the reduction in command updates. The new information has been added on page 32, paragraph 2, lines 939–941, and the numerical values have also been included in Table 3 (page 32).
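The two metrics discussed in this response can be written down compactly. The sketch below assumes that ΔVmax is the spread between the highest and lowest cell voltage and that a "reconfiguration" is any balancing cycle whose command vector differs from the previous one; the log format and all numbers are hypothetical and are not taken from Table 3.

```python
def delta_v_max(cell_volts):
    """Maximum voltage difference between any two cells (max minus min)."""
    return max(cell_volts) - min(cell_volts)

def count_reconfigurations(command_log):
    """Count cycles whose command vector differs from the previous cycle's."""
    return sum(1 for prev, cur in zip(command_log, command_log[1:]) if prev != cur)

def percent_reduction(initial, final):
    """Relative reduction of a metric, in percent of its initial value."""
    return 100.0 * (initial - final) / initial

# Hypothetical controller log: one tuple of per-cell commands per balancing cycle.
log = [
    ("idle", "charge"),
    ("idle", "charge"),   # unchanged, not counted
    ("idle", "idle"),     # changed
    ("charge", "idle"),   # changed
]
n_reconfig = count_reconfigurations(log)
spread = delta_v_max([3.70, 3.75, 3.72])
reduction = percent_reduction(40.0, 10.0)  # e.g. 40 mV down to 10 mV
```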
Reviewer 2 Report
Comments and Suggestions for Authors
The paper addresses a critical challenge in Battery Management Systems (BMS): the limitations of reactive cell balancing. By proposing a predictive approach using Gated Recurrent Units (GRU) to estimate short-term voltage evolution, the authors aim to improve the efficiency and speed of active balancing in flyback converter topologies. The integration of deep learning with hardware-in-the-loop validation (DC2100B-C module) is a significant strength. The results demonstrate a clear improvement in voltage equalization (88% reduction in ΔV for the predictive method vs. 64% for the reactive method). The paper presents high-quality experimental work and a promising application of RNNs in power electronics. However, the structural errors and the need for a more robust justification of the 100s test limit and temperature assumptions must be resolved to meet the standards of Electronics. Therefore, a major revision is necessary.
- On page 4, the "State of the Art" is labeled Section 1.2, but the next sub-header is 2.1.1. On page 11, the "Hardware Platform" is Section 2.1, but on page 27, "Experimental Setup" is Section 3.3. There is a general lack of a cohesive numbering hierarchy.
- Line 106 states Section 5 discusses findings, while line 108 says Section 5 discusses findings/limitations and Section 6 concludes. Please synchronize the "Organization of the Paper" paragraph with the actual headers.
- The authors state that cell temperature was assumed equal to ambient temperature because balancing currents are low (Line 784). While this may be acceptable for short 100s tests, internal resistance changes significantly with even small temperature gradients. The authors should justify why this simplification does not compromise the GRU model’s accuracy in a real-world scenario where cells in the center of a pack typically run hotter.
- The comparison is conducted over a 100s interval. While this shows "convergence behavior," it does not show full equalization. Why was 100s chosen? A longer test would more clearly demonstrate the "number of command reconfigurations" mentioned in the abstract.
- The "Classic Reactive" strategy uses a ±10 mV deadband. Is this the industry standard for the LTC3300-1? The authors should clarify if the reactive strategy was optimized for the specific cell chemistry (NCR18650B) to ensure a fair comparison.
- The model uses Voltage, Current, and Temperature. However, the current is "estimated" rather than measured (Line 774). The authors should provide a brief error analysis of this current estimation method, as the GRU’s performance is heavily dependent on the quality of this input.
- The subscript for CO2 in Figure 1 is wrong; there are many such issues in the article.
Author Response
Comments 1: On page 4, the "State of the Art" is labeled Section 1.2, but the next sub-header is 2.1.1. On page 11, the "Hardware Platform" is Section 2.1, but on page 27, "Experimental Setup" is Section 3.3. There is a general lack of a cohesive numbering hierarchy.
Response 1: Thank you for pointing out these inconsistencies in the section numbering. We fully agree that the manuscript contained several formatting oversights that affected the clarity and coherence of the hierarchy. We have carefully reviewed and corrected the numbering of all affected sections and subsections to ensure a consistent and logical structure throughout the manuscript. The corrections have been applied in the following locations: page 4, line 124; page 11, line 322; page 27, line 792.
Comments 2: Line 106 states Section 5 discusses findings, while line 108 says Section 5 discusses findings/limitations and Section 6 concludes. Please synchronize the "Organization of the Paper" paragraph with the actual headers.
Response 2: Thank you for highlighting this inconsistency. We fully agree that the “Organization of the Paper” paragraph must accurately reflect the structure of the manuscript. During earlier revisions, several section titles were updated, and we inadvertently omitted to synchronize this introductory paragraph accordingly. We appreciate the reviewer’s attention to this detail and apologize for the oversight. We have now corrected the paragraph to ensure full alignment with the actual section headers. The updated text has been revised on page 4, paragraph 3, lines 114–122.
Comments 3: The authors state that cell temperature was assumed equal to ambient temperature because balancing currents are low (Line 784). While this may be acceptable for short 100s tests, internal resistance changes significantly with even small temperature gradients. The authors should justify why this simplification does not compromise the GRU model’s accuracy in a real-world scenario where cells in the center of a pack typically run hotter.
Response 3: Thank you for this insightful comment. We agree that assuming cell temperature equal to ambient temperature represents a simplification, and we now state this explicitly as a limitation of the current implementation. To support the validity of this assumption within the scope of our experiments, we note that the balancing currents used in the 100-second tests are relatively low, and infrared thermometer measurements performed during the experiments did not indicate significant temperature variations across the cells. Although even small gradients can influence internal resistance, the short duration and low thermal load minimize this effect. Importantly, both the reactive and predictive balancing experiments were conducted under identical thermal conditions. Therefore, while the assumption limits the generality of the implementation, it does not affect the comparative validity of the GRU-based predictive strategy relative to the classical reactive method. This limitation has been added on page 29, paragraph 4, lines 836–847 of the revised manuscript.
Comments 4: The comparison is conducted over a 100s interval. While this shows "convergence behavior," it does not show full equalization. Why was 100s chosen? A longer test would more clearly demonstrate the "number of command reconfigurations" mentioned in the abstract.
Response 4: Thank you for this observation. We agree that the 100-second interval does not capture full equalization, and we clarify this explicitly in the manuscript. The purpose of the experiment was not to demonstrate complete balancing, but rather to compare the initial convergence behavior and the number of command reconfigurations between the reactive and predictive strategies under controlled and repeatable conditions. A longer experiment was not used because extended balancing intervals may introduce non-negligible temperature gradients between cells, which would affect internal resistance and compromise the fairness of the comparison. This consideration is closely related to the temperature-related limitation discussed in Comment 3. By restricting the evaluation to a short interval with low balancing currents, we ensured that both strategies operated under identical thermal conditions, preserving the validity of the comparative analysis. This clarification has been added on page 31, paragraph 2, lines 924–926 of the revised manuscript.
Comments 5: The "Classic Reactive" strategy uses a ±10 mV deadband. Is this the industry standard for the LTC3300-1? The authors should clarify if the reactive strategy was optimized for the specific cell chemistry (NCR18650B) to ensure a fair comparison.
Response 5: Thank you for this valuable comment. We fully agree that the manuscript should explicitly justify the choice of the ±10 mV deadband used in the classical reactive strategy. In the original version, this rationale was not clearly stated, and we appreciate the reviewer drawing attention to this omission. In the revised manuscript, we now clarify that the ±10 mV threshold was not selected based on LTC3300-1 defaults, but rather based on published electrochemical analyses of lithium-ion cells. Specifically, this value is consistent with the voltage deviation ranges reported for lithium-ion cells in reference [31]. This clarification has been added on page 30, paragraph 5, lines 893–894 of the revised manuscript.
Comments 6: The model uses Voltage, Current, and Temperature. However, the current is "estimated" rather than measured (Line 774). The authors should provide a brief error analysis of this current estimation method, as the GRU’s performance is heavily dependent on the quality of this input.
Response 6: We have added a brief discussion explaining that the estimation relies on the voltage slope derivative, which inherently amplifies measurement noise and introduces uncertainty. We also note that the resulting discrepancies between estimated and nominal balancing current are consistent with the known limitations of derivative-based methods, and that the estimate should be interpreted as an approximate dynamic indicator rather than an exact measurement. These clarifications have been added on page 29, paragraphs 2–3, lines 826–835.
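The slope-based current estimate referred to here can be sketched as a gain multiplied by dV/dt. The example below fits a least-squares line over the whole sampling window rather than differencing adjacent samples, which is one common way to limit the noise amplification mentioned above. The function name, the gain value of 500 A·s/V, and the synthetic ramp are assumptions for illustration, not the calibrated values used in the study.

```python
import numpy as np

def estimate_current(volts, dt=0.010, gain=1.0):
    """Estimate balancing current as gain * dV/dt.

    volts: voltage samples taken every `dt` seconds (10 ms by default)
    gain:  empirical factor in A*s/V (hypothetical value supplied by caller)
    The slope comes from a least-squares line fit over the whole window.
    """
    v = np.asarray(volts, dtype=float)
    t = np.arange(len(v)) * dt
    slope = np.polyfit(t, v, 1)[0]  # volts per second
    return gain * slope

# Synthetic 60-sample trace rising at 2 mV/s (hypothetical).
t = np.arange(60) * 0.010
trace = 3.7 + 0.002 * t
i_est = estimate_current(trace, gain=500.0)
```

With the assumed gain of 500 A·s/V, a 2 mV/s slope corresponds to an estimated current of about 1 A; on real data the result depends directly on how well the gain was calibrated.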
Comments 7: The subscript for CO2 in Figure 1 is wrong; there are many such issues in the article.
Response 7: Thank you for pointing out this formatting issue. We fully agree that the subscripts were rendered incorrectly. The specific issue in Figure 1 has been corrected on page 2, line 52.
Reviewer 3 Report
Comments and Suggestions for Authors
This work presents a method for predicting short-term evolution of battery cell voltages, intended for active battery management systems. It is based on a model trained on experimental datasets derived from measurements of Li-ion batteries and it is experimentally validated on a battery pack with 12 cells connected in series. The authors are invited to address the following points to improve their work in view of possible publication.
- The novelties introduced in this work should be better highlighted in comparison to previously published works.
- Section 2.3: the benefits of the adopted neural network architecture should be highlighted in comparison to other alternative implementations.
- To expand the literature background on technologies for battery monitoring and management, the authors should mention and briefly discuss other recently published works (such as doi: 10.3390/batteries9050239, doi: 10.1016/j.rineng.2025.104524), highlighting improvements enabled by their proposed solution.
- Page 24: further theoretical insight should be given concerning the cause of the residual prediction errors.
- Figures 27-28: quantitative metrics should be added to highlight how the traces shown in fig. 28 are preferable to those of fig. 27.
Author Response
Comments 1: The novelties introduced in this work should be better highlighted in comparison to previously published works.
Response 1: Thank you for this constructive comment. We agree that the original version of the manuscript did not sufficiently emphasize the specific novelties of our approach relative to previously published work. We have now revised the introduction to more clearly articulate the unique contributions of this study and to contrast them with existing methods. We highlight that prior approaches often rely on advanced measurement capabilities or complex control architectures that are difficult to deploy in real-time embedded systems. We added a dedicated clarification on page 4, paragraph 2, lines 105–112.
Comments 2: Section 2.3: the benefits of the adopted neural network architecture should be highlighted in comparison to other alternative implementations.
Response 2: Thank you for this insightful comment. We agree that the advantages of the selected neural network architecture should be clearly articulated. In the revised manuscript, we expanded the discussion in Section 2.3 to highlight why the GRU architecture is particularly suitable for the proposed application. Specifically, we now emphasize that GRUs offer an effective balance between modeling capability and computational efficiency. Unlike conventional RNNs, which struggle to capture long-term temporal dependencies, and LSTM networks, which introduce additional architectural complexity and higher computational overhead, GRUs achieve comparable performance with a more compact structure. This makes them well suited for real-time battery management applications, where computational resources are limited and low-latency operation is essential. The relevance of this trade-off is supported by recent findings reported in [26], which demonstrate the efficiency and robustness of GRU-based models for battery state estimation. These additions have been incorporated on page 22, paragraph 7, lines 664–670 of the revised manuscript.
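The "more compact structure" argument can be made concrete with the textbook parameter counts for a single recurrent layer: an LSTM has four gate-like weight blocks and a GRU has three, each of size hidden × (input + hidden) plus a bias vector. The sketch below uses three input features (voltage, current, temperature) and a hidden size of 64; the hidden size is an assumption, since the layer width is not stated in this record, and specific frameworks may add extra bias terms.

```python
def recurrent_block_params(input_dim, hidden_dim):
    """Weights and biases of one gate block: W (h x d), U (h x h), b (h)."""
    return hidden_dim * (input_dim + hidden_dim) + hidden_dim

def lstm_params(input_dim, hidden_dim):
    """LSTM: input, forget, and output gates plus the candidate cell state."""
    return 4 * recurrent_block_params(input_dim, hidden_dim)

def gru_params(input_dim, hidden_dim):
    """GRU (textbook form): update and reset gates plus the candidate state."""
    return 3 * recurrent_block_params(input_dim, hidden_dim)

# Three input features; hidden size 64 is a hypothetical choice.
n_lstm = lstm_params(3, 64)
n_gru = gru_params(3, 64)
ratio = n_gru / n_lstm
```

For equal layer width, the GRU layer therefore carries three quarters of the parameters of the LSTM layer in this formulation, which is the source of the lower memory and compute cost on embedded targets.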
Comments 3: To expand the literature background on technologies for battery monitoring and management, the authors should mention and briefly discuss other recently published works (such as doi: 10.3390/batteries9050239, doi: 10.1016/j.rineng.2025.104524), highlighting improvements enabled by their proposed solution.
Response 3: Thank you for this valuable recommendation. We agree that the literature background can be strengthened by including recent developments in battery monitoring and machine-learning-based balancing strategies. In the revised manuscript, we have incorporated a concise discussion of the two suggested works, emphasizing their relevance to advanced sensing and data-driven control approaches. Specifically, we now reference recent progress in high-resolution battery monitoring techniques, such as electrochemical impedance spectroscopy (EIS) for enhanced internal state estimation, as well as machine-learning-based active balancing frameworks that leverage reinforcement learning and neural network controllers to improve charge redistribution and SoC equalization. These additions help contextualize our contribution within the broader landscape of emerging BMS technologies. The corresponding updates have been added on page 4, paragraph 1, lines 98–104 of the revised manuscript.
Comments 4: Page 24: further theoretical insight should be given concerning the cause of the residual prediction errors.
Response 4: Thank you for this helpful comment. We agree that the manuscript benefits from a clearer explanation of the underlying causes of the residual prediction errors. In the revised version, we have added a concise theoretical discussion outlining the main factors contributing to these errors, including the nonlinear and history-dependent behavior of lithium-ion cells, the accumulation of uncertainty with increasing prediction horizon, the smoothing tendency of neural networks during fast transients, and the impact of uncertainties in the estimated current input. These clarifications have been incorporated on page 27, starting from paragraph 1, lines 771–789 of the revised manuscript.
Comments 5: Figures 27-28: quantitative metrics should be added to highlight how the traces shown in fig. 28 are preferable to those of fig. 27.
Response 5: Thank you for this comment. We agree that quantitative metrics are essential to clearly demonstrate the performance differences between the two balancing strategies. These metrics have now been explicitly included in Table 3, where we report the initial and final voltage differences, convergence behavior, and the number of command reconfigurations for both methods. This table provides a direct quantitative comparison that complements Figures 27 and 28 and highlights why the predictive balancing traces are preferable.
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
No more questions. Good luck!
