Machine Learning-Based Temperature Forecasting for Sustainable Climate Change Adaptation and Mitigation
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The study compares artificial neural networks (ANNs) and machine learning models for temperature forecasting in Istanbul. However, the distinct contribution of the study compared to similar works in the literature needs to be more explicitly highlighted. Beyond the methods used, the study should clarify what makes these applications unique or innovative.
The analysis is based on meteorological data from 1950-2023. More details about the quality and representativeness of the dataset would strengthen the study. For instance, how were missing data points handled? Were there any concerns about the reliability or completeness of the data sources?
While the study compares the performance of ANN and machine learning models, the parameter-tuning strategies could be further detailed. Additionally, a more comprehensive discussion of the methodological limitations would be beneficial. For instance, which environmental or systemic factors might affect the accuracy of the modeled parameters?
The application of k-fold cross-validation is appropriate; however, the performance degradation at lower k-fold values raises questions about the robustness of the models. This highlights the potential need for larger datasets or alternative validation techniques to improve generalization.
The Random Forest (RF) model outperformed others, but the reasons behind its superior performance could be explored further. For example, the structure of the dataset or the model's resistance to overfitting might have contributed to its success. This could be discussed in more depth.
While the study demonstrates the effectiveness of the methods for Istanbul’s climate data, the generalizability of these findings to other regions or climate systems is not thoroughly discussed. Suggestions for adapting the methodology to different geographic or climatic conditions would enhance its impact.
The findings are clearly presented with graphs, but the readability of some visualizations could be improved. Establishing a clearer connection between tables, graphs, and the narrative would help readers better understand the results.
The article suggests exploring advanced deep learning models like LSTM in future studies. These recommendations could be more detailed, such as proposing specific approaches to hyperparameter optimization or including other geographic or climatic parameters.
The study emphasizes potential applications in areas such as energy planning and agricultural activities. Including concrete examples or case studies demonstrating the practical utility of the findings would enhance the paper’s relevance.
The article provides sufficient references to the literature but could benefit from citing more recent studies. Expanding the bibliography with references to closely related or highly impactful works would improve the context of the research.
Author Response
Response to Reviewer 1
The authors thank the editor and the reviewers for their precious time and invaluable comments. We have carefully addressed all the comments.
Thank you for your in-depth reading of the paper and your respectable comments. In the revised version, the authors have made every effort to answer and satisfy your concerns and further enhance the manuscript following your recommendations. Please find below, a detailed account of the answers and changes introduced in response to your comments.
- The study compares artificial neural networks (ANNs) and machine learning models for temperature forecasting in Istanbul. However, the distinct contribution of the study compared to similar works in the literature needs to be more explicitly highlighted. Beyond the methods used, the study should clarify what makes these applications unique or innovative.
The original contribution of this study is a comparative analysis of artificial neural networks (ANN) and several machine learning methods (LM, SVM, KNN, RF) using meteorological data specific to Istanbul (humidity, wind, and precipitation). Studies in the literature have generally used a single method, or have focused on other regions and shorter periods. By combining a long time span (1950-2023) with multivariate data, our study provides a comprehensive model for Istanbul, a city heavily affected by climate change. We believe this makes it a useful reference for regional climate analysis and for improving forecast accuracy. The article has been updated accordingly.
- The analysis is based on meteorological data from 1950-2023. More details about the quality and representativeness of the dataset would strengthen the study. For instance, how were missing data points handled? Were there any concerns about the reliability or completeness of the data sources?
The dataset used here was supplied by the Muş Province Meteorology Directorate. The measurements were recorded, from the past to the present, with devices calibrated by an official state institution, so the data are considered robust and reliable. Nevertheless, before the dataset was used, it was checked for empty values, and records containing such values were excluded from the study. No reliability problems were detected in the data sources, so the dataset was judged sufficient to represent the climate characteristics of Istanbul.
- While the study compares the performance of ANN and machine learning models, the parameter-tuning strategies could be further detailed. Additionally, a more comprehensive discussion of the methodological limitations would be beneficial. For instance, which environmental or systemic factors might affect the accuracy of the modeled parameters?
The ANN and machine learning models used in the study were structured with certain strategies for parameter optimization. The Levenberg-Marquardt algorithm was used for the ANN model and hyperparameters such as learning rate, epoch number and hidden layer number were experimentally optimized. In machine learning models, hyperparameter selection was made with the cross-validation (k-fold) approach, and the generalization performance of the models was increased by selecting the most appropriate values.
In terms of methodological limitations, environmental and systemic factors can affect model accuracy. For example:
- Environmental Factors: Sudden climate changes, microclimatic effects or anthropogenic changes (e.g. urban heat island effect) can reduce the accuracy of the predictions.
- Data Set Representativeness: The limited data provided by the data set used for some years or seasons can affect model performance, especially at extreme values.
- Model Complexity: The complexity of some models can lead to overfitting in high-dimensional data, which can cause errors in real-world predictions.
- The application of k-fold cross-validation is appropriate; however, the performance degradation at lower k-fold values raises questions about the robustness of the models. This highlights the potential need for larger datasets or alternative validation techniques to improve generalization.
K-fold cross-validation is an effective method to evaluate model generalization performance. In the study, it was observed that the performance decreases with decreasing k-fold value. This situation may occur due to an imbalance of training and test datasets at low k-fold values. Using larger datasets can reduce this effect by improving the training-test distribution.
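The relation between the k value and the train/test balance can be illustrated with a short calculation. The sketch below is only an illustration: the record count `n_samples` is an assumed round figure, not a number taken from the manuscript, and `fold_summary` is a hypothetical helper. The k values are the ones reported for the study.

```python
# Share of the data used for training and testing in each round of k-fold CV.
# n_samples is an assumed, illustrative record count (not from the manuscript).
n_samples = 670
k_values = [130, 65, 50, 25, 10, 5]


def fold_summary(n, k):
    """Return (approximate test-fold size, training fraction) for k-fold CV."""
    test_size = n / k             # records held out in each round
    train_fraction = (k - 1) / k  # share of records used for fitting
    return test_size, train_fraction


for k in k_values:
    test_size, train_fraction = fold_summary(n_samples, k)
    print(f"k={k:>3}: ~{test_size:5.1f} test records per fold, "
          f"{train_fraction:.1%} of the data used for training")
```

As k decreases, each model is fitted on a smaller share of the records, which is one plausible reason for the lower scores observed at low k values.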
- The Random Forest (RF) model outperformed others, but the reasons behind its superior performance could be explored further. For example, the structure of the dataset or the model's resistance to overfitting might have contributed to its success. This could be discussed in more depth.
The superior performance of the Random Forest (RF) model is due to both the structural features of the model and the nature of the dataset:
- Model Resistance: The RF model is an ensemble learning method that combines the results of multiple decision trees. This approach reduces the tendency of a single model to overfit and increases generalization ability. Random sampling (bagging) and random selection of feature subsets allow the model to show strong performance even on high-dimensional and complex datasets.
- Dataset Structure: The dataset used in the study consists of interrelated meteorological variables such as humidity, wind and precipitation. The RF model can effectively capture dependencies in such complex and multivariate datasets. The fact that RF is more tolerant to missing data and nonlinear relationships between variables has increased its performance.
- Resistance to Overfitting: Random sub-datasets are used in the training of each tree in the RF model, and this increases the generalization ability of the model and reduces the risk of overfitting.
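The two sources of randomness mentioned above (bagging and random feature subsets) can be sketched with the standard library alone. This is a simplified illustration, not the implementation used in the study: the helper names and the record count are assumptions, and real Random Forest implementations usually draw the feature subset at every split rather than once per tree.

```python
import random


def bootstrap_sample(n_records, rng):
    """Draw n_records row indices with replacement, as bagging does per tree."""
    return [rng.randrange(n_records) for _ in range(n_records)]


def random_feature_subset(features, size, rng):
    """Pick a random subset of the candidate features (without replacement)."""
    return rng.sample(features, size)


rng = random.Random(0)                            # fixed seed for repeatability
features = ["humidity", "wind", "precipitation"]  # inputs named in the study
n_records = 670                                   # assumed record count

for tree in range(3):
    rows = bootstrap_sample(n_records, rng)
    unique_share = len(set(rows)) / n_records     # distinct records in the sample
    subset = random_feature_subset(features, 2, rng)
    print(f"tree {tree}: {unique_share:.0%} distinct records, features {subset}")
```

Because each tree sees a different resample of the records and a different view of the features, the errors of individual trees are less correlated, and averaging them reduces variance.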
- While the study demonstrates the effectiveness of the methods for Istanbul’s climate data, the generalizability of these findings to other regions or climate systems is not thoroughly discussed. Suggestions for adapting the methodology to different geographic or climatic conditions would enhance its impact.
Although the study focuses on the climate data of Istanbul, the methods used can be adapted to other regions and climate systems. The climatic and geographical characteristics of each region are different. For example, the effects of variables such as precipitation and humidity in arid regions may be different from those in Istanbul. The dataset should be expanded to accurately represent the climatic conditions of the target region. For different regions, the hyperparameters of the models used should be re-optimized. In particular, region-specific hyperparameter optimization can be performed to increase the accuracy of the ANN and RF models. Additional region-specific variables can be added to the model inputs by considering sectoral effects such as agriculture, energy or water management. Since RF and ANN models can successfully work with complex and multivariate datasets, they can be generalized to different regions and climate systems. However, the performance of the models may vary depending on the quality, length and diversity of the dataset.
- The findings are clearly presented with graphs, but the readability of some visualizations could be improved. Establishing a clearer connection between tables, graphs, and the narrative would help readers better understand the results.
An effort was made to improve the quality of the graphics used in the study.
- The article suggests exploring advanced deep learning models like LSTM in future studies. These recommendations could be more detailed, such as proposing specific approaches to hyperparameter optimization or including other geographic or climatic parameters.
This study has shown the usability of ANN and RF models for temperature predictions in Istanbul, and it was concluded that this methodology can also be applied to other regional climate analyses. Future studies can focus on optimizing the model's hyperparameters and further increasing the prediction accuracy by using different input variables. In addition, comparing deep learning models with different network structures and algorithms is among the potential research areas. Such studies will contribute to a better understanding of the effects of climate change and to taking effective measures.
- The study emphasizes potential applications in areas such as energy planning and agricultural activities. Including concrete examples or case studies demonstrating the practical utility of the findings would enhance the paper’s relevance.
To emphasize the importance of the study’s findings in practical applications, we can add concrete examples and potential scenarios as follows:
- Energy Planning: Future temperature estimates of Istanbul can be used in energy demand management. For example, increasing temperatures in the summer months increase energy consumption due to air conditioning use. These estimates can help energy distribution companies to prepare in advance and ensure supply-demand balance.
- Agriculture and Irrigation Management: Temperature, humidity and precipitation estimates can be used in planning crop patterns in agricultural areas around Istanbul. For example, farmers can be recommended appropriate plant species and irrigation methods according to the predicted temperature increase.
- The article provides sufficient references to the literature but could benefit from citing more recent studies. Expanding the bibliography with references to closely related or highly impactful works would improve the context of the research.
In line with your suggestions, the literature section has been updated and expanded with new and current publications.
Reviewer 2 Report
Comments and Suggestions for Authors
The authors compare the effectiveness of ANN and linear models, support vector machines, K-nearest neighbor, and random forests in predicting temperatures in Istanbul province. The results show that ANN and RF models are good predictive models. It is a detailed and valuable work that fits with the aims and scope of the journal. It can be published after some revisions. Specific comments are as follows:
(1) Are there any anomalies in the raw data? If so, please state how the anomalous data were handled.
(2) It is well known that input variables are critical to model predictions. The impact of different variables on the model needs to be assessed.
(3) The literature review is superficial, as some essential papers dealing with more advanced machine learning techniques are omitted; see, e.g.:
https://doi.org/10.3390/atmos13020180
http://doi.org/10.1016/j.applthermaleng.2024.124923
(4) How were the key variables selected?
(5) Was the data randomly split, or was it divided based on the sequence of data collection?
(6) The English and format of the manuscript should be further improved. For example, R2.
Author Response
The authors thank the editor and the reviewers for their precious time and invaluable comments. We have carefully addressed all the comments. Thank you for your in-depth reading of the paper and your respectable comments. In the revised version, the authors have made every effort to answer and satisfy your concerns and further enhance the manuscript following your recommendations. Please find below, a detailed account of the answers and changes introduced in response to your comments.
Answer to Reviewer
The authors compare the effectiveness of ANN and linear models, support vector machines, K-nearest neighbor, and random forests in predicting temperatures in Istanbul province. The results show that ANN and RF models are good predictive models. It is a detailed and valuable work that fits with the aims and scope of the journal. It can be published after some revisions. Specific comments are as follows:
1- Are there any anomalies in the raw data? If so, please state how the anomalous data were handled.
While processing the raw data in the article, the record lines containing missing data were completely deleted and the data obtained with the most accurate methods available were used in the study. In other words, missing or blank data were not included in the study.
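The deletion of incomplete records described above could look like the following minimal sketch. The field names, the `drop_incomplete` helper, and all values are invented for illustration; the actual file layout of the dataset is not described in this record.

```python
def drop_incomplete(records, fields):
    """Keep only records in which every listed field is present (not None)."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]


fields = ["humidity", "wind", "precipitation", "temperature"]

# Invented monthly records; None marks a missing measurement.
records = [
    {"month": "1950-01", "humidity": 78.0, "wind": 3.1,
     "precipitation": 95.0, "temperature": 5.2},
    {"month": "1950-02", "humidity": None, "wind": 3.4,
     "precipitation": 70.0, "temperature": 6.0},
    {"month": "1950-03", "humidity": 74.0, "wind": 3.0,
     "precipitation": 60.0, "temperature": 7.9},
]

complete = drop_incomplete(records, fields)
print(f"{len(records)} records read, {len(complete)} kept")
```

Note that this removes records with missing values only. It does not detect anomalous but present values, which would need a separate check such as a range or outlier test.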
2- It is well known that input variables are critical to model predictions. The impact of different variables on the model needs to be assessed.
The article states that input variables are of critical importance in model predictions and that the impact of these variables should be evaluated.
In the study, humidity, wind speed and precipitation were used as input variables for temperature prediction. The impact of these variables on temperature was also examined visually in the article.
As a result, the impact of input variables on the model was evaluated in the article using visual analysis and different machine learning models. It was determined that humidity, wind and precipitation variables had a negative impact on temperature. It was also stated that the Random Forest (RF) model performed better than other models.
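One simple way to put a number on the direction of such a relationship, alongside the visual analysis, is the Pearson correlation coefficient. The sketch below is an assumption about how this could be done, not the procedure used in the article, and the twelve monthly values are synthetic.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Synthetic monthly values, for illustration only (not the Istanbul data).
humidity = [80, 78, 75, 70, 68, 64, 62, 63, 67, 72, 76, 79]
temperature = [6, 6, 8, 12, 17, 22, 24, 24, 21, 16, 12, 8]

r = pearson(humidity, temperature)
print(f"correlation between humidity and temperature: {r:.2f}")
```

A coefficient below zero indicates an inverse relationship. It measures linear association only and does not by itself establish that one variable affects the other.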
3- The literature review is superficial, as some essential papers dealing with more advanced machine learning techniques are omitted; see, e.g.:
https://doi.org/10.3390/atmos13020180
http://doi.org/10.1016/j.applthermaleng.2024.124923
In line with your suggestions, the literature section has been updated and expanded with new and current publications.
4- How were the key variables selected?
Humidity, wind speed and precipitation data were used as input variables for the temperature prediction model. These variables are widely used parameters known to be related to temperature in meteorological studies. The importance of these variables in temperature prediction is supported by the evaluation of regional climate data and visual analysis.
5- Was the data randomly split, or was it divided based on the sequence of data collection?
In the article, k-fold cross-validation technique is used for data division. This technique is used to evaluate the generalization ability of the model by dividing the dataset into random subsets.
In the k-fold cross-validation method, the dataset is divided into k random subsets of equal or approximately equal size. Each of these subsets is used as the test dataset in turn, while the other k-1 subsets are used as the training dataset. This process is repeated until all subsets are used as the test dataset.
In the article, different values such as 130, 65, 50, 25, 10 and 5 are used as k-fold values. This shows that the dataset is randomly divided into different numbers of subsets.
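The difference between a random split and a split that follows the order of data collection can be made concrete with a small index generator. This is a sketch under stated assumptions: `k_fold_indices` is a hypothetical helper written for this illustration, not a function from the software used in the study.

```python
import random


def k_fold_indices(n_samples, k, shuffle=True, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    With shuffle=True the records are assigned to folds at random; with
    shuffle=False the folds are contiguous blocks in collection order.
    """
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    base, extra = divmod(n_samples, k)  # the first `extra` folds get one more
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Each record appears in exactly one test fold, and in the training set otherwise.
for train, test in k_fold_indices(n_samples=12, k=4):
    print(f"train size {len(train)}, test indices {sorted(test)}")
```

With shuffling, neighbouring months can fall on both sides of the split. For time-ordered data, it is worth stating explicitly which of the two variants was used.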
6- The English and format of the manuscript should be further improved. For example, R2.
The work was checked from beginning to end and attempts were made to correct spelling errors and English grammatical errors.
Reviewer 3 Report
Comments and Suggestions for Authors
After reading the whole article I found a number of issues related to:
Abstract. The author does not clearly state the main objective of his research, nor the methodology used, the results obtained or the final conclusions. I suggest the author to realize these aspects.
Introduction. The author does not specify what is the gap covered in the literature and what differentiates this study from others already done. I suggest the author to clarify this aspect.
Literature review. The author presents only a few significant studies related to ANN forecasting models but does not present a clear structuring of the advantages and disadvantages offered by this model through the studies carried out by specialists. I suggest the author to improve the literature by clearly structuring this section and highlighting the advantages and limitations of this forecasting model.
Material and Methods. The author describes the two branches of the analysis carried out by means of equations and graphs used as examples.
Findings. The author describes the results of the performed analysis and tests by means of data tables, graphs and appropriate comments.
Conclusions. The author presents some of his own contributions to the realization of the study and some research directions but does not present the limits of his study nor the practical implications that the model used and researched by him would have. I suggest the author to realize these aspects
Author Response
The authors thank the editor and the reviewers for their precious time and invaluable comments. We have carefully addressed all the comments. Thank you for your in-depth reading of the paper and your respectable comments. In the revised version, the authors have made every effort to answer and satisfy your concerns and further enhance the manuscript following your recommendations. Please find below, a detailed account of the answers and changes introduced in response to your comments.
Answer to Reviewer
Abstract. The author does not clearly state the main objective of his research, nor the methodology used, the results obtained or the final conclusions. I suggest the author to realize these aspects.
According to your instructions, the abstract section has been updated as follows:
In this study, using monthly humidity, wind, precipitation and temperature data for the province of Istanbul between 1950-2023, temperature estimates were modeled with artificial neural networks (ANN) and machine learning models (Linear Model, Support Vector Machine, K-Nearest Neighbor, Random Forest). Estimates with approximately 96% accuracy were obtained with the ANN model, and the Random Forest (RF) model showed the best performance among the machine learning models. The generalization ability of the models was increased with the K-fold cross-validation technique. It was observed that the input variables (humidity, wind, precipitation) had a negative effect on temperature. The results obtained show that artificial intelligence and machine learning methods are effective in temperature estimates and that ANN and Random Forest models are reliable tools. These methods can guide similar studies in areas such as climate change, energy planning and agriculture. The methodology of the study is also applicable to other regional climate analyses.
Introduction. The author does not specify what is the gap covered in the literature and what differentiates this study from others already done. I suggest the author to clarify this aspect.
In the literature, the use of artificial neural networks and machine learning methods for temperature predictions is becoming widespread, but studies comparing different methods are limited. Most studies focus on a specific region or time period, creating a need to present a general methodology and demonstrate its applicability to different regions. In particular, comprehensive temperature prediction studies that include meteorological variables such as humidity, wind and precipitation for Istanbul are lacking in the literature. This study compares the temperature prediction performance of artificial neural networks (ANN) and different machine learning models (Linear Model, Support Vector Machine, K-Nearest Neighbor, Random Forest) using monthly meteorological data for the province of Istanbul between 1950-2023. The generalization ability of the models is evaluated with the K-fold cross-validation technique, and a comparative performance analysis of the different algorithms is presented. The study aims to fill this gap by increasing the accuracy of predictions made with meteorological data in a period when the effects of climate change are intensifying. By comparing the performance of ANN and different machine learning methods on the same dataset, it offers a comprehensive comparison and a reliable methodology that can be used in regional climate analyses.
Literature review. The author presents only a few significant studies related to ANN forecasting models but does not present a clear structuring of the advantages and disadvantages offered by this model through the studies carried out by specialists. I suggest the author to improve the literature by clearly structuring this section and highlighting the advantages and limitations of this forecasting model.
In line with your suggestions, the literature section has been updated and expanded with new and current publications.
Material and Methods. The author describes the two branches of the analysis carried out by means of equations and graphs used as examples.
Findings. The author describes the results of the performed analysis and tests by means of data tables, graphs and appropriate comments.
Conclusions. The author presents some of his own contributions to the realization of the study and some research directions but does not present the limits of his study nor the practical implications that the model used and researched by him would have. I suggest the author to realize these aspects
This section has been updated within the framework you stated.
Reviewer 4 Report
Comments and Suggestions for Authors
This paper compares the suitability of artificial neural networks (ANN) and machine learning models (linear model, support vector machine, K nearest neighbor, random forest) in predicting the temperature in Istanbul province. Some comments are as follows:
1. Lines 153-156 are duplicates of the above (lines 145-147); please delete and reorganize the content.
2. Line 183-185, why does this study use the feed-forward topology not recurrent topology? It seems like the recurrent topology can predict the temperature more accurately. Please describe the reasons for selecting feed-forward topology.
3. Line 211-213, what is the source of the dataset? Please provide relevant information.
4. Why are these parameters (humidity, wind, precipitation) chosen as inputs?
5. Figure 4, the units of the data are inconsistent, and it is suggested that they be split into four graphs and labeled with x-axis titles. Also, please explain why the horizontal coordinate of Figure 4 only shows roughly 670 months when, in fact, 1950-2023 totals 888 months.
6. Line 230, in section 3.2, there is a lot of repetition and redundancy in the descriptions here, and it is recommended that the models be tabulated or briefly described.
7. Line 292-296, it is suggested that the content can be moved to the corresponding method section. Only the results are shown here.
8. Figure 5, the horizontal and vertical coordinates of the graph should show the units.
9. Line 329-337, the content of this paragraph is duplicated above, please reorganize the content. Also, Figure 7 shows the prediction results of the ANN method and suggests deleting Table 1.
10. Line 369-370, this trend is insignificant, please provide the relevant literature to support it.
11. Figure 9, this figure is confusing. Should the horizontal coordinate represent CV k-fold?
12. Line 451-462, the content of this paragraph is duplicated. Please reorganize the expression and have a more in-depth and valuable discussion.
13. The conclusions section is too wordy, so please list the results of the most critical findings of this study in this section.
14. The following reference should be considered:
https://doi.org/10.1177/0361198119846473
https://doi.org/10.1016/j.conbuildmat.2022.127029
Comments on the Quality of English Language
Can be approved
Author Response
The authors thank the editor and the reviewers for their precious time and invaluable comments. We have carefully addressed all the comments. Thank you for your in-depth reading of the paper and your respectable comments. In the revised version, the authors have made every effort to answer and satisfy your concerns and further enhance the manuscript following your recommendations. Please find below, a detailed account of the answers and changes introduced in response to your comments.
Answer to Reviewer
- Lines 153-156 are duplicates of the above (lines 145-147); please delete and reorganize the content.
Thank you for pointing this out. We have edited the duplicate lines (153–156) and reorganized the section to avoid redundancy. The revised content now presents the necessary information concisely.
- Line 183-185, why does this study use the feed-forward topology not recurrent topology? It seems like the recurrent topology can predict the temperature more accurately. Please describe the reasons for selecting feed-forward topology.
We appreciate your comment. The feed-forward topology was selected for its simplicity and suitability for our dataset and modeling objective. While recurrent networks (RNNs) are indeed powerful for time-series data, they require extensive computational resources and are prone to vanishing gradient issues, especially for datasets with long time horizons. Feed-forward networks, coupled with effective preprocessing and cross-validation techniques, offered sufficient accuracy and generalization capability in this case. We have updated the manuscript to state these reasons clearly.
- Line 211-213, what is the source of the dataset? Please provide relevant information.
The dataset was sourced from the Muş Province Meteorology Directorate, an official governmental institution, ensuring high accuracy and reliability. This information has been added to the manuscript (Line 295-300).
- Why are these parameters (humidity, wind, precipitation) chosen as inputs?
These parameters were chosen based on their strong meteorological correlation with temperature. Humidity, wind, and precipitation directly influence temperature variations and are widely used in similar studies for predictive modeling. The justification for selecting these inputs has been clarified in the text.
- Figure 4, the units of the data are inconsistent, and it is suggested that they be split into four graphs and labeled with x-axis titles. Also, please explain why the horizontal coordinate of Figure 4 only shows roughly 670 months when, in fact, 1950-2023 totals 888 months.
Thank you for your detailed observation. Figure 4 has been split into four graphs, each labeled with appropriate x-axis titles. Regarding the horizontal coordinate, only non-missing data points were plotted, as months with missing or incomplete data were excluded during preprocessing.
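The month count behind this question is a short piece of arithmetic. In the sketch below, the value of roughly 670 plotted months is the reviewer's own approximate reading of Figure 4, so the resulting number of excluded months is only a rough estimate.

```python
first_year, last_year = 1950, 2023

# Both end years are included, so the span covers 74 calendar years.
total_months = (last_year - first_year + 1) * 12
print(total_months)  # 888

plotted_months = 670  # approximate value read from the figure by the reviewer
excluded_months = total_months - plotted_months
print(f"roughly {excluded_months} months excluded as missing or incomplete")
```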
- Line 230, in section 3.2, there is a lot of repetition and redundancy in the descriptions here, and it is recommended that the models be tabulated or briefly described.
We appreciate this suggestion. The descriptions of the models have been condensed.
- Line 292-296, it is suggested that the content can be moved to the corresponding method section. Only the results are shown here.
Thank you for your review. The content in lines 292–296 has been moved to the method section as per your suggestion. We believe that with this change, the structure and flow of the study have become clearer and more readable.
- Figure 5, the horizontal and vertical coordinates of the graph should show the units.
We have updated Figure 5 to include appropriate units for both the horizontal and vertical axes.
- Line 329-337, the content of this paragraph is duplicated above, please reorganize the content. Also, Figure 7 shows the prediction results of the ANN method and suggests deleting Table 1.
Thank you very much for your attention. Duplicate content has been rearranged.
- Line 369-370, this trend is insignificant, please provide the relevant literature to support it.
Thank you for your review. Appropriate references have been added to the article. In addition, our arguments have been more clearly stated with the additions to this section.
- Figure 9, this figure is confusing. Should the horizontal coordinate represent CV k-fold?
Yes, the horizontal coordinate represents CV k-fold. It is clearly stated in the figure.
- Line 451-462, the content of this paragraph is duplicated. Please reorganize the expression and have a more in-depth and valuable discussion.
The duplicate content has been edited, and the discussion has been expanded with additional insights to make it more in-depth and valuable.
- The conclusions section is too wordy, so please list the results of the most critical findings of this study in this section.
There were also criticisms from other reviewers on this issue. The conclusion has been revised to focus on the most critical findings. All necessary arrangements have been made. Thank you very much for your interest and attention.
- The following reference should be considered:
https://doi.org/10.1177/0361198119846473
https://doi.org/10.1016/j.conbuildmat.2022.127029
Thank you for these suggestions. Both references have been reviewed and incorporated into the manuscript where relevant to enhance the context and support our arguments.
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The author has made good revisions to the paper and suggests accepting it.
Reviewer 3 Report
Comments and Suggestions for Authors
Thanks to the authors for realizing all the suggestions indicated by me.
Reviewer 4 Report
Comments and Suggestions for Authors
My comments during the first round of review have been well-addressed.