
Open Access Article

Peer-Review Record

FS-DDPG: Optimal Control of a Fan Coil Unit System Based on Safe Reinforcement Learning

Buildings 2025, 15(2), 226; https://doi.org/10.3390/buildings15020226

by Chenyang Li^1, Qiming Fu^1,2,3,*, Jianping Chen^1,2,3,4,*, You Lu^1,3, Yunzhe Wang^1,3 and Hongjie Wu^1

Reviewer 1: Torkan Shafighfard

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Submission received: 4 November 2024 / Revised: 31 December 2024 / Accepted: 9 January 2025 / Published: 14 January 2025

(This article belongs to the Section Building Energy, Physics, Environment, and Systems)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This study proposes a new FCU control method based on FS-DDPG, which can achieve the optimal control policy while suppressing dramatic fluctuations in action during the control process. It claims to innovatively model the FCU control problem as a constrained Markov decision process. The paper should be revised, and all the comments should be addressed:

1- The abstract should be restructured. The novelty is not highlighted enough. The main contribution of the study should be stated. The size of the dataset gathered is important. What is the advantage of your model?

2- Provide a nomenclature for this study.

3- In your recent-work section, explain how those studies relate to your work and what gaps remain.

4- A reference is needed for the statement: "However, machine learning relies too heavily on data. If there is insufficient data, poor quality data, or overfitting, system performance might be impacted severely." https://doi.org/10.1016/j.engappai.2024.109053

5- How did you ensure that the data are sufficient for your models?

6- How did you validate your result?

7- How much data did you use? Provide descriptive statistics for those data.

8- Where are the performance metrics for each model?

9- Describe the differences between the various models used in this study.

10- The conclusions should be presented as bullet points, highlighting the most important findings.

11- The pros and cons of each model should be mentioned.

12-

Comments on the Quality of English Language

N/A

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Dear Author(s),

The article is well structured and the topic is interesting. However, the following comments should be addressed before the article is processed further.

1) Refer to abstract: What is FS-DDPG? The authors need to ensure that each abbreviation is defined at its first occurrence.

2) Refer to line # 44, 47 & 50: The authors need to proofread the article for typing mistakes such as “control methods.[3]. However, ….”, “conditions of the FCU Therefore, optimizing…” and “based control (MBC) methods[4], model free…”.

3) Refer to line # 64: Authors need to check this sentence “RL is a method to ML without relying on physical models.” What is ML here?

4) Refer to figure 1: Is the x-axis Episode or Time? If it is Episode, the authors need to explain it.

5) Refer to figure 2: Authors need to show the exhaust air as well.

6) Refer to Introduction section: The last paragraph of the Introduction should describe the structure of the article, which is missing in the current version. The authors need to include such a paragraph.

7) Refer to figure 3: Is there a unidirectional relationship between the cooling load data and reinforcement learning? The authors are required to verify the relationships among the entities shown in figure 3.

8) Refer to line # 261: How is the 12 W value calculated?

9) Refer to line # 269: Always include a valid reference for each claim, such as the data source (US National Ocean Service). Provide a reference here.

10) Refer to line # 270: How were the weather data imported into the ClimateData.accdb file in the DeST directory? The authors need to include sufficient detail in the revised version.

11) Refer to figure 6: The same task can be done using a simple motorized chilled water regulating valve that takes a temperature signal as its input, without altering the fan frequency. Why is the proposed model required, and how does it differ from the mentioned chilled water control scheme (motorized control)?

12) Refer to figure 11: The power consumption of MBC is lower than that of the proposed model. How do the authors justify this?

Good luck.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Clearly articulate the specific limitations of traditional RL methods for FCU control. For instance, what type of fluctuations occur, and how do these create safety risks? Providing examples or quantifying these issues would improve comprehension.

Expand on the FS-DDPG algorithm. Include pseudocode or a step-by-step explanation to help readers understand the modifications made to standard DDPG and how constraint tightening and penalty terms are implemented.
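For illustration only, the penalty-term and constraint-tightening idea this comment asks about can be sketched generically as below. This is a minimal sketch, not the FS-DDPG authors' implementation: it assumes each action component (for example pump flow and fan flow) has a known per-step change limit, and the names `penalized_reward`, `max_step`, `tightening`, and `penalty_weight` are hypothetical.

```python
from typing import Sequence


def penalized_reward(
    base_reward: float,
    prev_action: Sequence[float],
    action: Sequence[float],
    max_step: Sequence[float],
    tightening: float = 0.1,
    penalty_weight: float = 10.0,
) -> float:
    """Subtract a penalty when an action component changes faster than a tightened bound.

    Hypothetical sketch: each component (e.g. pump flow, fan flow) has a true
    per-step limit in max_step; constraint tightening shrinks that limit by the
    fraction `tightening`, so the agent is penalized before the true limit is hit.
    """
    if not (len(prev_action) == len(action) == len(max_step)):
        raise ValueError("prev_action, action and max_step must have equal length")
    if not 0.0 <= tightening < 1.0:
        raise ValueError("tightening must be in [0, 1)")
    penalty = 0.0
    for prev, cur, limit in zip(prev_action, action, max_step):
        tightened_limit = (1.0 - tightening) * limit  # stricter than the true limit
        violation = max(0.0, abs(cur - prev) - tightened_limit)
        penalty += penalty_weight * violation  # linear penalty on the excess change
    return base_reward - penalty
```

With `tightening=0.1`, a change exactly at the true limit is already penalized, which is the safety margin that tightening is meant to provide; with `tightening=0.0` it is not.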

Provide a more detailed theoretical explanation for how the constrained Markov decision process (CMDP) framework reduces action fluctuations. Discuss why the proposed changes effectively mitigate risks compared to unconstrained RL methods.

Elaborate on the variable operating conditions simulation platform. Include specific details about the actual parameters, data sources, and scenarios used for testing the algorithm under wet and dry conditions.

Include quantitative performance metrics (e.g., percentage reduction in fluctuations, energy consumption savings) in a tabular or graphical format. Compare FS-DDPG with DDPG and RBC in a way that highlights the proposed method’s advantages.

Conduct and report an ablation study to isolate the contributions of constraint tightening, penalty terms, and other modifications. This will demonstrate which components of FS-DDPG are most impactful.

Discuss the generalizability of FS-DDPG to other HVAC systems or similar control systems. Highlight any assumptions that might limit its applicability in broader contexts.

Address areas for improvement or open questions, such as exploring alternative reward functions, tuning hyperparameters, or applying the method to multi-zone FCU systems. This shows awareness of the field's broader challenges and motivates further research.

Please avoid citing sources published before 2019. Cite current research that is directly pertinent to your topic; the study also lacks sufficient citations. Another critical step is to compare the topic of the article with other relevant recent publications in order to broaden the impact of the research beyond the specific issue addressed. The authors can draw on these key works when addressing the topic of their paper and current issues.

Comments on the Quality of English Language

The English could be improved to more clearly express the research.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

well done

Author Response

We sincerely thank you for your constructive feedback, which has helped us improve the clarity and rigor of our manuscript.

Reviewer 3 Report

Comments and Suggestions for Authors

The proposed method, FS-DDPG, appears to be a variation of the DDPG algorithm with added penalty terms and constraints. The abstract does not highlight any fundamentally new or groundbreaking techniques beyond known RL modifications like constraint addition or action space restriction.

While the paper mentions introducing constraints and penalties, it lacks details on any theoretical advancement or rigorous analysis, such as proofs or derivations that differentiate the approach from existing methods.

The statement that FS-DDPG "ensures the safety of the equipment" is overly generalized and lacks quantified metrics or evidence to back up the claim, making it speculative.

The description of the experimental setup is ambiguous. The abstract does not specify the parameters of the FCU simulation platform, the size of the dataset, or the metrics used for comparison, leaving the validation process questionable.

The abstract only mentions comparisons with DDPG and RBC. It does not explore performance against other advanced RL methods or optimization techniques, raising concerns about the comprehensiveness of the evaluation.

While the abstract claims to use "real FCU system parameters and historical data," it does not clarify how representative or extensive these are. This omission makes it difficult to gauge the practical relevance of the findings.

The reliance on a simulation platform, without clear evidence of real-world deployment or testing, raises the possibility of overfitting to simulated conditions that may not translate to practical scenarios.

The abstract does not address how FS-DDPG performs under varying or unexpected conditions, nor does it discuss the scalability of the method for larger or more complex FCU systems.

Although the paper emphasizes safety, the abstract lacks a robust discussion on how safety is quantified and validated beyond general references to reduced flow fluctuations. There is no mention of specific safety metrics or failure cases tested in the experiments.

Comments on the Quality of English Language

It falls short in quality. It should be rejected.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 3

Reviewer 3 Report

Comments and Suggestions for Authors

Provide a detailed mathematical formulation of the constrained Markov decision process, including state space, action space, transition probabilities, and constraints.
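As a reference point for the formulation requested here, the standard (textbook) CMDP problem can be written as follows. This is a generic sketch, not the manuscript's own definitions: the cost signal c (for example, a cost on large changes in pump or fan flow) and the tightening margin ε are assumptions, with ε = 0 recovering the ordinary constraint and ε > 0 expressing one common form of constraint tightening.

```latex
\begin{aligned}
&\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, r, c, \gamma, d), \qquad P(s' \mid s, a)\ \text{the transition probability},\\
&\max_{\pi}\; J_r(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]\\
&\text{s.t.}\quad J_c(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \le d - \epsilon, \qquad \epsilon \ge 0 .
\end{aligned}
```

A reward of the form r − λc with a fixed weight λ ≥ 0 corresponds to a fixed-multiplier Lagrangian relaxation of this problem, L(π, λ) = J_r(π) − λ (J_c(π) − (d − ε)).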

Elaborate on the design of the reward function, specifically how the penalty term for process constraints is formulated and its impact on the optimization objectives.

Clarify the methodology and parameters used for constraint tightening, and provide justification for the chosen values or thresholds.

Include a pseudo-code or a detailed algorithmic description of FS-DDPG to improve reproducibility.

Explain why DDPG and RBC were chosen as baselines for comparison. Include a discussion on the limitations of these methods to contextualize the improvements introduced by FS-DDPG.

Provide quantitative metrics or graphs to show how FS-DDPG reduces fluctuations in pump and fan flow compared to DDPG.
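One simple way to quantify the fluctuation reduction requested here is to summarize step-to-step changes in each control signal and report the percentage reduction relative to a baseline. The sketch below is illustrative; the metric choice and the function names `fluctuation_metrics` and `percent_reduction` are assumptions, not metrics defined in the manuscript.

```python
from statistics import mean
from typing import Sequence


def fluctuation_metrics(signal: Sequence[float]) -> dict:
    """Summarize step-to-step changes of one control signal (e.g. pump or fan flow)."""
    if len(signal) < 2:
        raise ValueError("need at least two samples")
    steps = [abs(b - a) for a, b in zip(signal, signal[1:])]
    return {
        "mean_abs_change": mean(steps),  # typical jitter per control step
        "max_abs_change": max(steps),    # worst-case jump, relevant to equipment stress
    }


def percent_reduction(baseline: float, candidate: float) -> float:
    """Percentage by which `candidate` is lower than `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (baseline - candidate) / baseline
```

Reporting both the mean and the maximum change separates routine jitter from rare large jumps, which matter differently for equipment safety.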

Include technical details about the variable operating conditions FCU simulation platform, such as modeling assumptions, equations, and validation of the simulation framework.

Provide more information about the historical data used, including the range, resolution, and preprocessing steps, to ensure the reliability of the results.

Discuss how well the simulation scenarios reflect real-world wet and dry conditions, and whether extreme or edge-case scenarios were considered.

Define the specific metrics used for "system energy consumption," "operational performance," and "satisfaction," and explain how they are measured or calculated.

Add statistical analysis (e.g., confidence intervals, p-values) to validate the significance of performance improvements shown in the experimental results.
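As an example of the statistical analysis requested, a percentile-bootstrap confidence interval for the difference in mean performance (for example, energy consumption per evaluation run) between two controllers needs only the Python standard library. This is a sketch under the assumption that the two samples are independent, such as results from separately seeded runs; `bootstrap_mean_diff_ci` is a hypothetical name.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def bootstrap_mean_diff_ci(
    a: Sequence[float],
    b: Sequence[float],
    n_resamples: int = 10_000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, Tuple[float, float]]:
    """Percentile-bootstrap CI for mean(a) - mean(b).

    Assumes a and b are independent samples, e.g. per-run energy use of two
    controllers evaluated with different random seeds.
    """
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each sample needs at least two observations")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping the original group sizes.
        ra = rng.choices(a, k=len(a))
        rb = rng.choices(b, k=len(b))
        diffs.append(mean(ra) - mean(rb))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(a) - mean(b), (lo, hi)
```

A 95% interval that excludes zero is consistent with a difference that is significant at roughly the 5% level; a p-value could be obtained in a similar resampling style with a permutation test.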

Discuss the trade-offs, if any, between energy consumption, operational performance, and satisfaction when optimizing the control policy.

Address the potential long-term impacts of using FS-DDPG on equipment wear and maintenance requirements compared to traditional methods.

Discuss whether the proposed FS-DDPG algorithm can generalize to other HVAC systems or industrial control problems beyond FCU systems.

Conduct a sensitivity analysis to evaluate how changes in key parameters (e.g., penalty term weights, constraint tightening thresholds) affect system performance and stability.

Comments on the Quality of English Language

The English could be improved to more clearly express the research.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 4

Reviewer 3 Report

Comments and Suggestions for Authors

The novelty of the proposed FS-DDPG algorithm is not clearly established, as it appears to be a minor variation of existing methods without significant innovation.

The paper does not provide sufficient details about the implementation of the proposed FS-DDPG algorithm, making it difficult to replicate the work.

The modeling of the FCU control problem as a constrained Markov Decision Process is described in a high-level manner without enough mathematical rigor or detailed formulation.

The penalty term introduced in the reward function is not adequately justified, and its parameters are not explained or optimized.

The constraint tightening mechanism is mentioned, but the process for determining the constraint bounds and their impact on system performance is unclear.

The simulation platform lacks validation against real-world FCU system performance, raising concerns about the practical applicability of the results.

The experimental setup and evaluation metrics are not adequately detailed, making it hard to assess the validity and reliability of the results.

The comparisons with DDPG and RBC are insufficiently detailed. The paper does not specify the tuning parameters or experimental setup for these baselines, which could impact the fairness of the comparison.

There is no discussion of the computational complexity or scalability of the FS-DDPG algorithm, which is critical for real-world application.

The safety improvements claimed by the algorithm are not quantified in a meaningful way, and there is no clear evidence to support the reduced risk of equipment damage.

The paper lacks a thorough sensitivity analysis to show how changes in parameters affect the performance and robustness of FS-DDPG.

The authors do not adequately address the limitations of their approach or discuss potential failure cases.

Key technical terms, such as "satisfaction" and "operational performance," are not clearly defined, leading to ambiguity in the evaluation.

The paper does not provide insights into the convergence behavior of the FS-DDPG algorithm or its stability during training.

The discussion of related work is limited and fails to place the proposed approach in the context of existing methods or highlight significant advances over prior art.

The energy consumption improvements are mentioned but not sufficiently broken down or analyzed for different operating conditions.

There is no consideration of real-world constraints such as sensor inaccuracies, network delays, or hardware limitations, which could affect the algorithm’s performance.

The writing style is overly technical in some sections and overly vague in others, making it difficult to follow the core contributions and methodology.

The paper lacks visualizations or detailed explanations of the results, such as plots comparing performance metrics across different methods.

The conclusions drawn from the experiments seem overly optimistic and are not fully supported by the evidence presented in the paper.

Comments on the Quality of English Language

The English could be improved to more clearly express the research.

Author Response

Please see the attachment.

Author Response File: Author Response.docx
