Article
Peer-Review Record

An Energy Management Optimization Method for Arctic Space Environment Monitoring Buoys Based on Deep Reinforcement Learning

Energies 2026, 19(6), 1487; https://doi.org/10.3390/en19061487
by Hui Zhu 1,2,3, Bingrui Li 2,*, Yan Chen 1, Yinke Dou 1, Yi Tian 1,2,4, Yahao Li 1,2,5, Huiguang Li 6 and Zepeng Gao 1,2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 12 December 2025 / Revised: 30 December 2025 / Accepted: 5 January 2026 / Published: 17 March 2026
(This article belongs to the Section F5: Artificial Intelligence and Smart Energy)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper entitled "An Energy Management Optimization Method for Arctic Space Environment Monitoring Buoys Based on Deep Reinforcement Learning" deals with the optimization of the energy management of arctic measurement buoys using deep reinforcement learning. The topic is current and relevant, especially in the context of long-term monitoring in extreme conditions and reducing dependence on lithium batteries. The system model is detailed, includes PV, wind, battery storage and real Arctic data, and the results show a significant reduction in lithium battery consumption when integrating wind generators. However, in addition to the above, I propose a minor revision and give guidelines for improving the work.

A quality overview of the literature and a larger number of recent references are missing. You can consult the Pyramid of Contribution Review and cite references accordingly. Please also expand the reference list with more recent papers; I suggest, for example, https://doi.org/10.18280/ijepm.100203, which would broaden the introduction and literature review. In addition, most of the cited literature consists of theses and regional works; several references from leading international journals are missing.

Although TD3 is a justified choice, it lacks a comparison with, for example: DDPG, PPO, SAC, classical heuristic methods. This would significantly strengthen the scientific contribution.

The cost of a lithium battery (per kWh or monetary cost) is not precisely defined, which makes reproducibility and realistic economic interpretation difficult.

All results are simulation-based. Limited experimental validation would significantly increase credibility.

Battery degradation is mentioned but not included in the optimization. Long-term work in the Arctic makes this a key factor.

Symbols are mixed throughout the work (e.g., WB is used both as SOC and as capacity). Please check this throughout the paper.

Lithium battery as "power supply only" without charging is unusual and requires additional explanation.

The paper contains a large number of grammatical and stylistic errors (e.g., repetitions, missing words, inconsistent spacing). Detailed language proofreading is recommended.

Some references at the end are not properly formatted.

The paper has good potential for publication, but requires moderate refinements, especially in the literature review section, as well as validation, model clarity, and technical precision. After addressing the above points, the contribution would be significantly stronger and more competitive for publication.

Author Response

Comments 1: A quality overview of the literature and a larger number of recent references are missing. You can consult the Pyramid of Contribution Review and cite references accordingly. Please also expand the reference list with more recent papers; I suggest, for example, https://doi.org/10.18280/ijepm.100203, which would broaden the introduction and literature review. In addition, most of the cited literature consists of theses and regional works; several references from leading international journals are missing.

Response 1: Thank you for pointing this out. We agree with this comment. Therefore, we have rewritten the introduction, added a list of recent literature and incorporated citations, thereby enhancing the breadth of both the introduction and the literature review.

Comments 2: Although TD3 is a justified choice, it lacks a comparison with, for example: DDPG, PPO, SAC, classical heuristic methods. This would significantly strengthen the scientific contribution.

Response 2: Thank you for pointing this out. We agree with this comment. We have therefore revised the rationale for the algorithm selection: the Introduction now gives the reasons for choosing TD3 (reduced overestimation bias, off-policy efficiency), and the Conclusions explicitly list the absence of a cross-algorithm comparison as a limitation. We plan to complete comparative analyses with other algorithms in subsequent work.
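For readers unfamiliar with the overestimation issue mentioned in this response, the following is a minimal sketch of the clipped double-Q target that TD3 uses in place of the single-critic target of DDPG. It is an illustration only, not the authors' implementation; the function name `td3_target`, the default discount factor, and the example numbers are assumptions made for this sketch.

```python
def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target used by TD3.

    q1_next and q2_next are the two target-critic estimates for the next
    state-action pair (hypothetical scalar inputs in this sketch).
    """
    # Use the smaller estimate so a single over-optimistic critic
    # cannot inflate the bootstrap value.
    bootstrap = min(q1_next, q2_next)
    # No bootstrapping past a terminal transition.
    return reward + gamma * (1.0 - float(done)) * bootstrap


# Example with assumed values: the critics disagree (10.0 vs 8.0),
# and the target is built from the lower one.
print(td3_target(reward=1.0, done=False, q1_next=10.0, q2_next=8.0, gamma=0.5))
```

Taking the minimum of the two target critics biases the target downward rather than upward, which is the mechanism behind the reduced overestimation cited in the response.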

Comments 3: The cost of a lithium battery (per kWh or monetary cost) is not precisely defined, which makes reproducibility and realistic economic interpretation difficult.

Response 3: Thank you for pointing this out. We agree with this comment. Therefore, we have calculated the acquisition cost of lithium batteries to be $839.412 per kilowatt-hour and have made revisions and additions to Formula 10 in the article.

Comments 4: All results are simulation-based. Limited experimental validation would significantly increase credibility.

Response 4: Thank you for pointing this out. We agree with this comment. Therefore, we have supplemented the results of this year's field deployment tests by adding Section 4.3 to the paper, providing strong support for the reliability and credibility of the algorithm.

Comments 5: Battery degradation is mentioned but not included in the optimization. Long-term work in the Arctic makes this a key factor.

Response 5: Thank you for pointing this out. We agree with this comment. Battery degradation is accounted for in the computational model presented herein. Based on battery test results, degradation rates at different temperatures were selected to calculate battery power supply capacity. The battery power supply results obtained in this paper are derived considering battery degradation, thus requiring a greater number of lithium batteries for power supply compared to normal conditions.

Comments 6: Symbols are mixed throughout the work (e.g., WB is used both as SOC and as capacity). Please check this throughout the paper.

Response 6: Thank you for pointing this out. We agree with this comment. Therefore, we have thoroughly reviewed instances of mixed symbols in the article. WB denotes capacity, while SOCB denotes state of charge. Corrections have been made in equations 4 and 21.

Comments 7: Lithium battery as "power supply only" without charging is unusual and requires additional explanation.

Response 7: Thank you for pointing this out. We agree with this comment. Non-rechargeable lithium batteries are commonly used as power sources for disposable buoys. The buoy system described in this paper incorporates renewable energy supply and lead-acid battery energy storage to enhance the system's endurance. This configuration is based on an actual deployed buoy system. Additional details have been provided in Section 1.1 of the article.

Point 1: The paper contains a large number of grammatical and stylistic errors (e.g., repetitions, missing words, inconsistent spacing). Detailed language proofreading is recommended.

Response 1: Thank you for pointing this out. We agree with this comment. We have thoroughly reviewed the content of the article and conducted a detailed linguistic proofreading to address the numerous grammatical and stylistic errors present in the paper.

Point 2: Some references at the end are not properly formatted.

Response 2: Thank you for pointing this out. We agree with this comment. We have rechecked the formatting of the references and revised any non-standard formats.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

The manuscript proposes an intelligent energy management optimization framework for Arctic space environment monitoring buoys using a Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. The approach integrates photovoltaic (PV), wind turbine, lithium-ion, and lead-acid battery systems to improve energy efficiency and reliability under harsh Arctic conditions. Simulation experiments conducted using MATLAB with real Arctic data (irradiance, wind speed, temperature) show that integrating wind turbines with PV significantly reduces lithium battery consumption and extends operational endurance.

  1. The paper does not compare TD3 with other reinforcement learning algorithms, such as DDPG, PPO, or SAC, to validate performance superiority (noted in Conclusions, p.12, lines 427–430).
  2. The study remains simulation-based. Field test data or hardware-in-the-loop validation would significantly enhance reliability claims.
  3. The design of the reward function (Eq. 23, p.9) is briefly presented but lacks sensitivity analysis or justification of its penalty terms and scaling factor δ.
  4. Figures 3–5 (pp.10–12) lack axis labeling units, language, and have small font sizes. Descriptions in captions could better explain trends and transitions.
  5. Several grammatical inconsistencies (e.g., “coupled with strong winds” → “which, coupled with strong winds,”) and minor typographical errors should be corrected.
  6. The discussion could be enriched by quantifying algorithm convergence metrics, runtime efficiency, and computational requirements.

Author Response

Comments 1: The paper does not compare TD3 with other reinforcement learning algorithms, such as DDPG, PPO, or SAC, to validate performance superiority (noted in Conclusions, p.12, lines 427–430).

Response 1: Thank you for pointing this out. We agree with this comment. We have therefore revised the rationale for the algorithm selection: the Introduction now gives the reasons for choosing TD3 (reduced overestimation bias, off-policy efficiency), and the Conclusions explicitly list the absence of a cross-algorithm comparison as a limitation. We plan to complete comparative analyses with other algorithms in subsequent work.

Comments 2: The study remains simulation-based. Field test data or hardware-in-the-loop validation would significantly enhance reliability claims.

Response 2: Thank you for pointing this out. We agree with this comment. Therefore, we have supplemented the results of this year's field deployment tests by adding Section 4.3 to the paper, providing strong support for the reliability and credibility of the algorithm.

Comments 3: The design of the reward function (Eq. 23, p.9) is briefly presented but lacks sensitivity analysis or justification of its penalty terms and scaling factor δ.

Response 3: Thank you for pointing this out. We agree with this comment. Therefore, we have conducted a sensitivity analysis on the penalty factor and scaling parameter δ. The relevant analysis results have been supplemented in Section 3.3, further supporting the rationality of parameter settings and the robustness of the model.
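A sensitivity analysis of this kind can be organized as a grid sweep over the penalty factor and δ. The sketch below is a hedged illustration only: Eq. 23 is not reproduced in this record, so `toy_reward` is a hypothetical stand-in (a scaled lithium cost plus a constraint penalty), the operating point is assumed, and in a real study the `evaluate` callable would retrain and evaluate the agent for each parameter pair instead of scoring a single step.

```python
from itertools import product

COST_USD_PER_KWH = 839.412  # lithium acquisition cost quoted in the responses


def toy_reward(lithium_kwh, violation, penalty, delta):
    """Hypothetical reward, NOT Eq. 23 of the paper: scaled cost plus penalty."""
    return -delta * COST_USD_PER_KWH * lithium_kwh - penalty * violation


def sensitivity_grid(evaluate, penalties, deltas):
    """Evaluate every (penalty, delta) pair and return the results keyed by pair."""
    return {(p, d): evaluate(p, d) for p, d in product(penalties, deltas)}


# Assumed operating point for illustration: 0.1 kWh of lithium energy drawn
# and 0.2 units of constraint violation in one step.
grid = sensitivity_grid(
    lambda p, d: toy_reward(0.1, 0.2, p, d),
    penalties=[1.0, 10.0, 100.0],
    deltas=[0.01, 0.1],
)

for (p, d), r in sorted(grid.items()):
    print(f"penalty={p:>6} delta={d:<5} reward={r:.4f}")
```

Keeping the evaluation behind a callable means the same sweep can be reused once a trained-agent metric (for example, total lithium energy over an episode) replaces the toy reward.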

Comments 4: Figures 3–5 (pp.10–12) lack axis labeling units, language, and have small font sizes. Descriptions in captions could better explain trends and transitions.

Response 4: Thank you for pointing this out. We agree with this comment. Therefore, we have revised the article to address issues such as missing axis labels, language expression problems, and excessively small font sizes in the figures.

Comments 5: Several grammatical inconsistencies (e.g., “coupled with strong winds” → “which, coupled with strong winds,”) and minor typographical errors should be corrected.

Response 5: Thank you for pointing this out. We agree with this comment. We have fixed similar grammatical structure issues in the text.

Comments 6: The discussion could be enriched by quantifying algorithm convergence metrics, runtime efficiency, and computational requirements.

Response 6: Thank you for pointing this out. We agree with this comment. Therefore, we have added descriptions regarding algorithm convergence (100 episodes) and single-step decision time (< 10 ms) in the Results and Discussion section.

Author Response File: Author Response.docx

Reviewer 3 Report

Comments and Suggestions for Authors

The authors should address the following concerns.

  1. While the abstract summarizes the approach, it does not provide enough quantitative outcomes. For example, stating the percentage reduction in lithium battery usage or endurance improvement would strengthen the abstract and make the contribution clearer.
  2. The introduction discusses DRL applications in general terms but does not clearly distinguish this work from existing DRL-based microgrid or isolated power system studies. What specific limitations of existing DRL energy management approaches prevent their direct application to Arctic buoys?
  3. TD3 is introduced later as the chosen algorithm, but the introduction does not clearly explain why TD3 is more suitable than DDPG, PPO, or SAC for this application. A brief justification here would help guide the reader.
  4. The separation between lithium batteries (as power supply only) and lead-acid batteries (as energy storage) is unusual and not sufficiently justified. Is this configuration based on real deployed buoy systems, or is it a modeling abstraction?
  5. The paper mixes W and kW units, and the stated maximum outputs (e.g., lithium battery at 10 W vs. storage capacity at 19.2 kWh) raise questions about scale consistency. Please verify and clarify whether these values reflect a realistic buoy system.
  6. Several constraints (e.g., Equations 7–9) are mathematically presented but not intuitively explained. A short physical interpretation of each constraint would significantly improve readability.
  7. The lithium battery cost function is not clearly defined in physical or economic terms (e.g., cost per kWh, degradation cost, or replacement cost). This makes it difficult to interpret the reported “cost reduction.”
  8. The selected state variables (renewable output, load, SOC) are reasonable, but environmental factors such as temperature and wind speed are not directly included. Did the authors consider including temperature or forecasted wind/irradiance as part of the state?
  9. The penalty factors and scaling parameter δ are crucial to performance, yet their selection is not justified. A brief sensitivity analysis would improve confidence in the results.
  10. The authors correctly note the lack of algorithm comparison and simplified modeling. However, these are not minor issues; they significantly affect the strength of the conclusions and should be emphasized more clearly.

Author Response

Comments 1: While the abstract summarizes the approach, it does not provide enough quantitative outcomes. For example, stating the percentage reduction in lithium battery usage or endurance improvement would strengthen the abstract and make the contribution clearer.

Response 1: Thank you for pointing this out. We agree with this comment. Therefore, we have incorporated specific figures into the Abstract (87.5% reduction, $68,830 savings) to strengthen its content and more clearly articulate the study's contributions.

Comments 2: The introduction discusses DRL applications in general terms but does not clearly distinguish this work from existing DRL-based microgrid or isolated power system studies. What specific limitations of existing DRL energy management approaches prevent their direct application to Arctic buoys?

Response 2: Thank you for pointing this out. We agree with this comment. Therefore, we have specifically added a paragraph in the Introduction to explain the differences between polar buoys and conventional microgrids (polar night, unmaintainable conditions, low-temperature degradation).

Comments 3: TD3 is introduced later as the chosen algorithm, but the introduction does not clearly explain why TD3 is more suitable than DDPG, PPO, or SAC for this application. A brief justification here would help guide the reader.

Response 3: Thank you for pointing this out. We agree with this comment. We have therefore revised the rationale for the algorithm selection: the Introduction now gives the reasons for choosing TD3 (reduced overestimation bias, off-policy efficiency), and the Conclusions explicitly list the absence of a cross-algorithm comparison as a limitation. We plan to complete comparative analyses with other algorithms in subsequent work.

Comments 4: The separation between lithium batteries (as power supply only) and lead-acid batteries (as energy storage) is unusual and not sufficiently justified. Is this configuration based on real deployed buoy systems, or is it a modeling abstraction?

Response 4: Thank you for pointing this out. We agree with this comment. Non-rechargeable lithium batteries are commonly used as power sources for disposable buoys. The buoy system described in this paper incorporates renewable energy supply and lead-acid battery energy storage to enhance the system's endurance. This configuration is based on an actual deployed buoy system. Supplementary details have been added to the article.

Comments 5: The paper mixes W and kW units, and the stated maximum outputs (e.g., lithium battery at 10 W vs. storage capacity at 19.2 kWh) raise questions about scale consistency. Please verify and clarify whether these values reflect a realistic buoy system.

Response 5: Thank you for pointing this out. We agree with this comment. Therefore, we have revised instances where watts (W) and kilowatts (kW) were used interchangeably throughout the article, standardizing all measurements to kilowatts (kW). Consequently, the maximum output value for the lithium battery (10 W) and the energy storage capacity (19.2 kWh) are now uniformly expressed in kilowatt-hours (kWh). Additionally, the article consistently uses kWh as the unit for lithium battery energy consumption.
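Because this exchange touches both power (W, kW) and energy (kWh), a short sketch of the conversions may help. The helper names are hypothetical, and the 24-hour duration is an assumed example rather than a figure from the paper.

```python
def watts_to_kilowatts(power_w):
    """Convert a power value from watts to kilowatts."""
    return power_w / 1000.0


def energy_kwh(power_kw, hours):
    """Energy delivered at a constant power over a given duration."""
    return power_kw * hours


# The 10 W lithium battery limit quoted in the comment, expressed in kW.
lithium_max_kw = watts_to_kilowatts(10.0)

# Assumed example: energy drawn over 24 h at that constant output.
daily_kwh = energy_kwh(lithium_max_kw, 24.0)

print(lithium_max_kw, daily_kwh)
```

Power converts by a fixed factor of 1000, whereas an energy figure in kWh only arises once a duration is specified, so a 10 W output limit and a 19.2 kWh capacity are different kinds of quantity and need not look similar in scale.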

Comments 6: Several constraints (e.g., Equations 7–9) are mathematically presented but not intuitively explained. A short physical interpretation of each constraint would significantly improve readability.

Response 6: Thank you for pointing this out. We agree with this comment. Therefore, we have supplemented each constraint in Equations 7-9 with a brief physical explanation, expanded Equation 7 into Equations 7-10, and added physical commentary to each of these equations.

Comments 7: The lithium battery cost function is not clearly defined in physical or economic terms (e.g., cost per kWh, degradation cost, or replacement cost). This makes it difficult to interpret the reported “cost reduction.”

Response 7: Thank you for pointing this out. We agree with this comment. We have calculated the acquisition cost of lithium batteries to be $839.412 per kilowatt-hour, so reducing lithium battery energy consumption by 1 kWh saves $839.412. This has been incorporated into Formula 10 of the article.
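The linear cost relationship described in this response can be sketched as follows. Only the $839.412 per kWh figure comes from the response; the function names and the example energy values are assumptions for illustration, and the actual form of Formula 10 is not reproduced in this record.

```python
LITHIUM_COST_USD_PER_KWH = 839.412  # acquisition cost stated in the response


def lithium_cost_usd(energy_kwh):
    """Cost of the lithium battery energy consumed, assuming a linear cost model."""
    if energy_kwh < 0:
        raise ValueError("energy_kwh must be non-negative")
    return energy_kwh * LITHIUM_COST_USD_PER_KWH


def savings_usd(baseline_kwh, optimized_kwh):
    """Cost difference between a baseline and an optimized energy consumption."""
    return lithium_cost_usd(baseline_kwh) - lithium_cost_usd(optimized_kwh)


# Assumed example: cutting consumption from 2 kWh to 1 kWh.
print(f"{savings_usd(2.0, 1.0):.3f} USD saved")
```

Under this linear assumption, any reported cost reduction maps directly back to an energy reduction by dividing by the per-kWh figure, which makes the economic claim straightforward to check.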

Comments 8: The selected state variables (renewable output, load, SOC) are reasonable, but environmental factors such as temperature and wind speed are not directly included. Did the authors consider including temperature or forecasted wind/irradiance as part of the state?

Response 8: Thank you for pointing this out. We agree with this comment. This paper incorporates environmental factors such as temperature and wind speed into the algorithmic calculations, converting wind speed and irradiance into daily photovoltaic and wind-solar power generation. However, this approach was not explicitly stated in the original article. We have now revised and clarified this methodology within the paper.

Comments 9: The penalty factors and scaling parameter δ are crucial to performance, yet their selection is not justified. A brief sensitivity analysis would improve confidence in the results.

Response 9: Thank you for pointing this out. We agree with this comment. Therefore, we have conducted a sensitivity analysis on the penalty factor and scaling parameter δ. The relevant analysis results have been supplemented in Section 3.3, further supporting the rationality of parameter settings and the robustness of the model.

Comments 10: The authors correctly note the lack of algorithm comparison and simplified modeling. However, these are not minor issues; they significantly affect the strength of the conclusions and should be emphasized more clearly.

Response 10: Thank you for pointing this out. We agree with this comment. In the Results and Conclusions sections, we adopted more rigorous wording to downplay the assertion that “a new algorithm was invented,” while emphasizing the claim that “TD3 was successfully applied to highly complex scenarios.” We also acknowledged the limitation of not conducting algorithmic comparisons.

Author Response File: Author Response.docx

Round 2

Reviewer 3 Report

Comments and Suggestions for Authors

Authors have made significant changes. I have no further comments.
