Climate Change and Assessing Thermal Comfort in Social Housing of Southeastern Mexico: A Prospective Study Using Machine Learning and Global Sensitivity Analysis
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Q1: The abstract states that Regression Trees demonstrated "superior performance" and that ambient temperature and solar radiation were the "most influential variables," but it lacks the specific numbers that quantify these findings. An effective abstract should provide a concise summary of the most critical results. Including the best-fit model's performance (e.g., "R^2 > 0.98") and the sensitivity percentages for the dominant variables (e.g., "ambient temperature accounted for ~49% of the variance") would make the abstract more informative.
Q2: The introduction, spanning nearly three pages and divided into 3 subsections (1.1, 1.2, 1.3), feels somewhat fragmented. The transition between the discussion of thermal comfort challenges (Section 1.1) and the introduction of ML/GSA methods (Section 1.2) is abrupt.
Q3: The current text introduces the prominence of ML models solely within the context of building science. While accurate, it misses an opportunity to frame this trend within the larger landscape of scientific research, which would add more weight to the methodological choice.
Modified Paragraph: “In recent years, Artificial Intelligence (AI) tools, particularly supervised Machine Learning (ML) models, have gained prominence in various research fields—from geosciences to building science—including the study of thermal comfort in buildings [1] [2].”
References:
[1] K. Luo et al., “Shale pore pressure seismic prediction based on the hydrogen generation and compaction-based rock-physics model and Bayesian Hamiltonian Monte Carlo inversion method,” GEOPHYSICS, vol. 90, no. 2, pp. M15–M30, Feb. 2025, doi: https://doi.org/10.1190/geo2024-0325.1.
[2] Q. Q. Liu et al., “An efficient thin layer equivalent technique of SETD method for thermo-mechanical multi-physics analysis of electronic devices,” International Journal of Heat and Mass Transfer, vol. 192, p. 122816, Apr. 2022, doi: https://doi.org/10.1016/j.ijheatmasstransfer.2022.122816.
Q4: Section 2.1, "Phase 1: Model design of case study", provides good detail on the building geometry and construction materials. However, it omits other critical simulation inputs that heavily influence thermal performance, such as occupancy schedules, internal heat gains (from occupants, lighting, and equipment), and air infiltration rates.
Q5: In Section 2.2, "Phase 2: Database variables for training", the manuscript states that Representative Concentration Pathways (RCPs) were used to create future scenarios. However, it does not describe the specific process or tool used to morph the baseline typical meteorological year (TMY) weather data into future weather files for 2050 and 2100.
Q6: Equation (3) on page 13 defines the annual CDD (\overline{CDD}_{yearly}) as the yearly average of daily CDD values. This deviates from the standard industry and academic definition, where annual CDD is the cumulative sum of daily values over the year.
Q7: The manuscript provides a general justification for choosing RT, SVM, and ANN models and briefly introduces the PAWN method for GSA. However, it does not explain why these specific methods were chosen over other popular alternatives (e.g., Gradient Boosting for ML, Sobol indices for GSA).
Q8: Section 3.2 clearly demonstrates the superior performance of the Regression Tree (RT) model. However, the diagnostic plots reveal interesting characteristics that go unmentioned. Specifically, the error histogram for the RT model (Figure 7, plot a4) shows a distinct bimodal distribution, which is unusual and suggests a specific model behavior. This bimodal error could indicate that the model has learned two slightly different predictive pathways based on certain input conditions.
Q9: Figures 7 and 8 are very information-dense, with each containing 12 separate plots to diagnose model performance. The text walks through them well, but it is easy for a reader to get lost in the details and miss the main point. Readability could be significantly improved by adding a concluding summary sentence to the caption of each figure. For example, the caption for Figure 7 could end with: "Overall, these plots collectively illustrate the superior predictive accuracy and minimal residual error of the Regression Trees model compared to the SVM and ANN alternatives."
Author Response
Reviewer #1 comments
The authors are grateful for the reviewer's comments, which significantly improved the manuscript. Both the reviewer's comments and the authors' responses are presented below.
Q1: The abstract states that Regression Trees demonstrated "superior performance" and that ambient temperature and solar radiation were the "most influential variables," but it lacks the specific numbers that quantify these findings. An effective abstract should provide a concise summary of the most critical results. Including the best-fit model's performance (e.g., "R^2 > 0.98") and the sensitivity percentages for the dominant variables (e.g., "ambient temperature accounted for ~49% of the variance") would make the abstract more informative.
Response: We thank the reviewer for this valuable observation. We have revised the abstract to incorporate key numerical values that quantify our main results. The revised text now explicitly includes the model performance metrics (R² > 0.98 for both comfort temperature and cooling degree days), the specific sensitivity percentages for ambient temperature (45-49%) and solar radiation (17-22%), and quantitative projections for future cooling demands. Please see the Abstract in the new version of the manuscript.
Q2: The introduction, spanning nearly three pages and divided into 3 subsections (1.1, 1.2, 1.3), feels somewhat fragmented. The transition between the discussion of thermal comfort challenges (Section 1.1) and the introduction of ML/GSA methods (Section 1.2) is abrupt.
Response: We appreciate the reviewer's observation regarding the flow and structure of the introduction. To address this concern, we have added transitional paragraphs at the end of Section 1.1 and the beginning of Section 1.2 that explicitly connect the thermal comfort challenges identified in tropical climates with the need for advanced computational approaches. Specifically, we have inserted text that explains how the complexity and scale of thermal comfort assessment under multiple climate change scenarios necessitate the integration of machine learning and sensitivity analysis methods, thereby creating a clear bridge between the problem statement and the methodological solutions. Additionally, we have revised the opening sentence of Section 1.2 to reference the challenges discussed in Section 1.1, ensuring that readers can follow the logical progression from identifying the problem to introducing the tools that can address it. Please see the Introduction section in the new version of the manuscript.
Q3: The current text introduces the prominence of ML models solely within the context of building science. While accurate, it misses an opportunity to frame this trend within the larger landscape of scientific research, which would add more weight to the methodological choice.
Modified Paragraph: "In recent years, Artificial Intelligence (AI) tools, particularly supervised Machine Learning (ML) models, have gained prominence in various research fields—from geosciences to building science—including the study of thermal comfort in buildings [1] [2]."
References:
[1] K. Luo et al., "Shale pore pressure seismic prediction based on the hydrogen generation and compaction-based rock-physics model and Bayesian Hamiltonian Monte Carlo inversion method," GEOPHYSICS, vol. 90, no. 2, pp. M15–M30, Feb. 2025, doi: https://doi.org/10.1190/geo2024-0325.1.
[2] Q. Q. Liu et al., "An efficient thin layer equivalent technique of SETD method for thermo-mechanical multi-physics analysis of electronic devices," International Journal of Heat and Mass Transfer, vol. 192, p. 122816, Apr. 2022, doi: https://doi.org/10.1016/j.ijheatmasstransfer.2022.122816.
Response: We thank the reviewer for this insightful suggestion and for providing the relevant references. We have incorporated the reviewer's suggested text and references into the manuscript as recommended (references [19,20]). The revised paragraph now explicitly acknowledges the widespread adoption of ML methods across multiple scientific domains, from geosciences to building science, thereby providing a more comprehensive context for our application of these techniques to thermal comfort assessment. Please see Section 1.2 in the new version of the manuscript.
Q4: Section 2.1, "Phase 1: Model design of case study", provides good detail on the building geometry and construction materials. However, it omits other critical simulation inputs that heavily influence thermal performance, such as occupancy schedules, internal heat gains (from occupants, lighting, and equipment), and air infiltration rates.
Response: We very much appreciate the reviewer's observation regarding simulation parameters. We acknowledge that occupancy patterns, internal heat gains, and infiltration rates are important factors in comprehensive building energy simulations. However, we want to clarify that our study deliberately focuses on evaluating the passive thermal performance of the building envelope under various climate scenarios and construction systems, rather than modeling actual energy consumption or occupant behavior. This methodological choice was made for several reasons. First, our primary objective is to assess how building typology, geographical location, roof construction systems, and climatic variables influence thermal comfort indicators across multiple climate change scenarios, which requires isolating the effects of these design and environmental parameters. Second, social housing in Mexico exhibits highly variable occupancy patterns and usage profiles, depending on household composition, economic activities, and cultural practices, making it difficult to define representative occupancy schedules that are applicable across the diverse contexts of our four study cities. Third, by excluding occupancy-related variables, our framework provides a baseline assessment of the building envelope's intrinsic thermal behavior that can be later adjusted for specific occupancy scenarios in future research or practical applications. The outdoor operating temperature calculated by DesignBuilder under free-running conditions represents the thermal response of the building envelope to external climatic conditions, which is the focus of our sensitivity analysis and machine learning models. We have added clarifying text in Section 2.1 to explicitly state these modeling assumptions and their justification, ensuring that readers understand the scope and limitations of our simulation approach. Please see Section 2.1 in the new version of the manuscript.
Q5: In Section 2.2, "Phase 2: Database variables for training", the manuscript states that Representative Concentration Pathways (RCPs) were used to create future scenarios. However, it does not describe the specific process or tool used to morph the baseline typical meteorological year (TMY) weather data into future weather files for 2050 and 2100.
Response: We thank the reviewer for requesting this important methodological clarification. The current baseline scenario and the future climate weather data for RCP scenarios (2.6, 4.5, and 8.5) for both 2050 and 2100 were obtained directly from Meteonorm software. Meteonorm generates synthetic weather files in EnergyPlus Weather (EPW) format based on climate models that incorporate IPCC RCP projections. The weather files contain Typical Meteorological Year (TMY) type data with hourly temporal resolution for all required climatic variables (ambient temperature, relative humidity, solar radiation, and wind velocity). We have added this methodological detail to Section 2.2 in subsection "Climatic Data". Please see this Section in the new version of the manuscript.
Q6: Equation (3) on page 13 defines the annual CDD (\overline{CDD}_{yearly}) as the yearly average of daily CDD values. This deviates from the standard industry and academic definition, where annual CDD is the cumulative sum of daily values over the year.
Response: We thank the reviewer for this important clarification. We acknowledge that the standard definition of annual CDD in building energy analysis is the cumulative sum of daily values (Σ CDDday), and we recognize that our terminology may have been confusing. Our metric, which calculates the mean of daily CDD values across the year, serves a specific purpose within our machine learning framework; however, it should not be referred to as "annual CDD" to avoid conflicting with established conventions. We have revised the manuscript to clarify this distinction by renaming our metric as "mean daily CDD" ( ) throughout the document. This metric represents the average daily cooling demand over the year and provides a normalized indicator that facilitates comparison across different climate scenarios and geographical locations within our machine learning models. The mathematical notation with the overbar explicitly indicates that this is a mean value rather than a cumulative sum. We have also added explanatory text in Section 2.3 to clarify that this metric differs from the conventional annual CDD. Please see section 2.3 in the new version of the manuscript and Figure 6 of the results section.
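The distinction drawn in this response can be illustrated with a minimal sketch. It assumes the common daily definition CDD_day = max(0, T_mean − T_base); the 18 °C base temperature, the function names, and the four sample temperatures are hypothetical and are not taken from the manuscript. Under that assumption, the mean daily CDD is simply the conventional annual sum divided by the number of days.

```python
def daily_cdd(t_mean_c, t_base_c=18.0):
    """Cooling degree days for one day; the 18 °C base is an assumed value."""
    return max(0.0, t_mean_c - t_base_c)


def annual_cdd(daily_means_c, t_base_c=18.0):
    """Conventional annual CDD: cumulative sum of the daily values."""
    return sum(daily_cdd(t, t_base_c) for t in daily_means_c)


def mean_daily_cdd(daily_means_c, t_base_c=18.0):
    """Mean daily CDD: the annual sum divided by the number of days."""
    return annual_cdd(daily_means_c, t_base_c) / len(daily_means_c)


# Hypothetical daily mean temperatures (°C), for illustration only
temps = [26.0, 30.0, 17.0, 28.0]
print(annual_cdd(temps))      # cumulative sum over the sample days
print(mean_daily_cdd(temps))  # average per day over the same sample
```

The two quantities differ only by the constant factor N (the number of days), so they rank scenarios identically when N is the same, but their magnitudes are not interchangeable.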
Q7: The manuscript provides a general justification for choosing RT, SVM, and ANN models and briefly introduces the PAWN method for GSA. However, it does not explain why these specific methods were chosen over other popular alternatives (e.g., Gradient Boosting for ML, Sobol indices for GSA).
Response: We appreciate the reviewer's request for a more detailed justification of our methodological choices. The selection of Regression Trees, Support Vector Machines, and Artificial Neural Networks was based on their demonstrated effectiveness and widespread adoption in building energy and thermal comfort research. These three algorithms represent distinct modeling paradigms: tree-based, kernel-based, and neural network-based approaches, enabling comprehensive evaluation of different learning strategies for thermal performance prediction. While ensemble methods like Gradient Boosting and Random Forest offer potential advantages, RT, SVM, and ANN were selected due to their proven robustness, interpretability, and computational efficiency for building thermal analysis applications. Regarding the global sensitivity analysis, we selected the PAWN method over variance-based approaches like Sobol indices because PAWN is a distribution-based method that achieves stable results with significantly smaller sample sizes than Sobol indices, often requiring 5-10 times fewer model evaluations, while being more robust for the non-normal output distributions typical of thermal comfort analysis. We have added this expanded justification, along with appropriate references, to Section 2.3 in the new version of the manuscript.
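To make the distribution-based idea behind PAWN concrete, the following is a minimal sketch, not the authors' implementation. PAWN compares the unconditional output distribution with output distributions conditioned on one input, using the Kolmogorov-Smirnov (KS) distance, and then summarizes the distances over the conditioning values. Here the conditioning is approximated by equal-count bins of an existing sample, the summary statistic is the median, and the toy model, bin count, sample size, and function names are all assumptions made for illustration.

```python
import random
from bisect import bisect_right
from statistics import median


def ks_distance(sample_a, sample_b):
    """Largest absolute gap between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in a + b
    )


def pawn_index(x, y, n_bins=10):
    """Median KS distance between all outputs and outputs within bins of x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    size = len(order) // n_bins  # leftover samples beyond n_bins * size are ignored
    ks_values = []
    for k in range(n_bins):
        in_bin = order[k * size:(k + 1) * size]
        ks_values.append(ks_distance(y, [y[i] for i in in_bin]))
    return median(ks_values)


# Hypothetical toy model: the output depends strongly on x1 and weakly on x2
rng = random.Random(0)
n = 2000
x1 = [rng.random() for _ in range(n)]
x2 = [rng.random() for _ in range(n)]
y = [3.0 * a + 0.1 * b for a, b in zip(x1, x2)]

s_x1 = pawn_index(x1, y)
s_x2 = pawn_index(x2, y)
print(f"PAWN-style index for x1: {s_x1:.3f}")
print(f"PAWN-style index for x2: {s_x2:.3f}")
```

Because the index is built from CDF distances rather than variances, it does not rely on the output being symmetric or unimodal, which is the property the response appeals to.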
Q8: Section 3.2 clearly demonstrates the superior performance of the Regression Tree (RT) model. However, the diagnostic plots reveal interesting characteristics that go unmentioned. Specifically, the error histogram for the RT model (Figure 7, plot a4) shows a distinct bimodal distribution, which is unusual and suggests a specific model behavior. This bimodal error could indicate that the model has learned two slightly different predictive pathways based on certain input conditions.
Response: We thank the reviewer for this insightful observation regarding the bimodal error distribution in the Regression Tree model. The reviewer is correct in identifying this characteristic pattern, which indeed reflects the inherent structure of decision tree algorithms. This bimodal distribution arises from the RT model's hierarchical splitting mechanism, where predictions are made based on discrete decision rules that partition the input space into distinct regions. The two peaks in the error histogram likely correspond to predictions made at different terminal nodes or branches of the tree, reflecting the model's ability to capture different thermal performance regimes in our dataset. This behavior is characteristic of tree-based models and demonstrates the algorithm's ability to effectively segment the complex relationships between climate variables, building parameters, and thermal comfort outcomes. However, both error peaks are tightly clustered near zero, which confirms that the model maintains high accuracy across both pathways. We have added this interpretation to Section 3.2 to provide readers with a deeper understanding of the RT model's predictive behavior. Please see Section 3.2 in the new version of the manuscript.
Q9: Figures 7 and 8 are very information-dense, with each containing 12 separate plots to diagnose model performance. The text walks through them well, but it is easy for a reader to get lost in the details and miss the main point. Readability could be significantly improved by adding a concluding summary sentence to the caption of each figure. For example, the caption for Figure 7 could end with: "Overall, these plots collectively illustrate the superior predictive accuracy and minimal residual error of the Regression Trees model compared to the SVM and ANN alternatives."
Response: We appreciate the reviewer's comment and agree that adding concluding summary sentences to the figure captions will significantly improve readability, helping readers avoid getting lost in the details and missing the main point. Therefore, we have revised the captions for both Figures 7 and 8 to include concise summary statements that highlight the main conclusions drawn from each set of plots. Please see Figures 7 and 8 in the new version of the manuscript.
Reviewer 2 Report
Comments and Suggestions for Authors
This is a very interesting study that covers the topic of thermal comfort in social housing in a city in Mexico. The flow is smooth. The results are comprehensive and almost discussed properly. The literature summary (Table 1) as well as the contribution of the study are well presented. However, I have some comments and questions as shown below:
- Please improve your abstract. What is the problem? What are the main numerical findings?
- Please justify the selection of the following: Tcomfort = 0.31Toutdoor + 17.8.
- In Table 2, the EPS insulation conductivity is 0.04 W/m·K. Please elaborate on the selection of this value specifically.
- Please state heat gains in Figure 2. Am I missing something?
- In Figure 5, why does Campeche consistently exhibit the highest comfort temperatures compared to Cancún and Tuxtla?
- Figure 2 (a1): tipology or typology?
- Some trends require further justification, such as: 1- Please discuss why the CDD growth is more pronounced in Tuxtla than in Cancún (Figure 52). 2- Why does solar radiation rank lower than wind speed in influence, despite expectations that radiation dominates thermal loads in tropical climates? (Figure 4). 3- Please provide more explanation of why Regression Trees outperformed ANN and SVM. (Table 5).
- The Conclusions should be concise.
Please revise the English. There are some typos.
Author Response
Reviewer #2 comments
This is a very interesting study that covers the topic of thermal comfort in social housing in a city in Mexico. The flow is smooth. The results are comprehensive and almost discussed properly. The literature summary (Table 1) as well as the contribution of the study are well presented. However, I have some comments and questions as shown below:
The authors are grateful for the reviewer's comments, which significantly improved the manuscript. Both the reviewer's comments and the authors' responses are presented below.
- Please improve your abstract. What is the problem? What are the main numerical findings?
Response: We appreciate the reviewer's comment on improving the clarity of the abstract. We have revised the abstract to state the research problem at the beginning explicitly and to present the main numerical findings more prominently. Please see the abstract in the new version of the manuscript.
- Please justify the selection of the following: Tcomfort = 0.31Toutdoor + 17.8.
Response: We thank the reviewer for requesting clarification on the comfort temperature equation. The equation Tcomfort = 0.31Toutdoor + 17.8 is the adaptive thermal comfort model established by ASHRAE Standard 55. This justification has been added to Section 2.3 in the new version of the manuscript.
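As a quick numerical illustration of the equation quoted above, the sketch below evaluates Tcomfort = 0.31Toutdoor + 17.8 for a few outdoor temperatures. The function name and the sample temperatures are hypothetical; which outdoor temperature measure the manuscript actually supplies as Toutdoor is not specified in this response.

```python
def comfort_temperature(t_outdoor_c):
    """Comfort temperature (°C) from Tcomfort = 0.31 * Toutdoor + 17.8."""
    return 0.31 * t_outdoor_c + 17.8


# Hypothetical outdoor temperatures (°C), for illustration only
for t_out in (24.0, 28.0, 32.0):
    t_comf = comfort_temperature(t_out)
    print(f"Toutdoor = {t_out:.1f} °C -> Tcomfort = {t_comf:.2f} °C")
```

The slope of 0.31 means that each 1 °C rise in the outdoor temperature shifts the comfort temperature by only about 0.3 °C.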
- In Table 2, the EPS insulation conductivity is 0.04 W/m·K. Please elaborate on the selection of this value specifically.
Response: We appreciate the reviewer's comment. The thermal conductivity value of 0.04 W/m·K for expanded polystyrene (EPS) insulation was selected based on standard reference values for this material as reported in established heat transfer literature and Mexican building standards. This value corresponds to typical EPS insulation used in construction applications. The specific value of 0.04 W/m·K has been previously validated in building thermal performance studies in similar Mexican climatic contexts and is consistent with values specified in the Mexican Official Standard NOM-020-ENER-2011 for building thermal insulation materials. We have added this clarification as a footnote to Table 2 and included the appropriate references to support this material property selection. Please see Table 2 in the new version of the manuscript.
- Please state heat gains in Figure 2. Am I missing something?
Response: We greatly appreciate the reviewer's comment. We completely agree that heat gain information is crucial for understanding a building's thermal behavior. However, we would like to clarify that Figure 2 was designed specifically as a schematic architectural diagram to illustrate the spatial configuration and geometry of the two housing typologies analyzed in this study. Given the complexity of our analysis framework, which includes two typologies, four cities, and seven climate scenarios per city, we opted to present the comprehensive thermal performance results through monthly heat maps in Figure 5, which visualize comfort temperature variations across all configurations. Nevertheless, in response to this valuable comment, we have added a new paragraph in Section 3.1 that explicitly addresses heat gain mechanisms, including the contribution of solar radiation, and explains how the outdoor operating temperatures used to calculate comfort temperature represent the building's overall thermal balance. We hope these additions adequately address the reviewer's comment and strengthen the manuscript by providing the thermal performance context that the reviewer rightly identified as important. Please see Section 3.1 in the new version of the manuscript.
- In Figure 5, why does Campeche consistently exhibit the highest comfort temperatures compared to Cancún and Tuxtla?
Response: We appreciate the reviewer's observation. We have added this clarification to Section 3.1 to explain these geographic thermal performance differences. Please see the explanation of Figure 5 in Section 3.1 in the new version of the manuscript.
- Figure 2 (a1): tipology or typology?
Response: We appreciate the reviewer's comment. The spelling error has been corrected. Please see Figure 2 in the new version of the manuscript.
- Some trends require further justification, such as: 1- Please discuss why the CDD growth is more pronounced in Tuxtla than in Cancún (Figure 52). 2- Why does solar radiation rank lower than wind speed in influence, despite expectations that radiation dominates thermal loads in tropical climates? (Figure 4). 3- Please provide more explanation of why Regression Trees outperformed ANN and SVM. (Table 5).
Response: We thank the reviewer for these insightful observations. Regarding point 1, Tuxtla Gutiérrez's inland valley location at a higher elevation results in greater temperature variability and sensitivity to climate change compared to Cancún's maritime-moderated climate, leading to more pronounced proportional CDD increases despite lower absolute values. For point 2, we believe the reviewer meant to refer to ambient temperature rather than radiation. The reason is that the ambient temperature's higher sensitivity reflects its comprehensive nature in the adaptive comfort model, capturing the cumulative effect of solar radiation absorbed by surfaces along with other climatic factors. In contrast, the independent contribution of solar radiation (17-22%) represents its direct influence beyond what is already captured through ambient temperature. Regarding point 3, Regression Trees' superior performance stems from their ability to efficiently exploit the categorical structure of our dataset (encoded typologies, cities, and scenarios). We have expanded the discussion of these three points in Sections 3.1, 3.3, and 3.2, respectively. Please see these sections in the new version of the manuscript.
- The Conclusions should be concise.
Response: We agree with the reviewer's observation. We have revised the Conclusions section to be more concise while retaining the key findings and implications of the study. Please see the conclusion section in the new version of the manuscript.
- The English could be improved to more clearly express the research.
Response: We appreciate the reviewer's feedback regarding language clarity. The manuscript has been thoroughly revised and edited by a native English speaker to improve the expression and clarity of the research throughout all sections.
Round 2
Reviewer 2 Report
Comments and Suggestions for AuthorsThe authors have addressed all the comments.

