Open Access | Editor’s Choice | Article

Peer-Review Record

Prediction of Cloud Fractional Cover Using Machine Learning

Big Data Cogn. Comput. 2021, 5(4), 62; https://doi.org/10.3390/bdcc5040062

by Hanna Svennevik ¹, Michael A. Riegler ²,³, Steven Hicks ³,⁴, Trude Storelvmo ¹ and Hugo L. Hammer ³,⁴,*

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 20 September 2021 / Revised: 20 October 2021 / Accepted: 29 October 2021 / Published: 3 November 2021

(This article belongs to the Special Issue Multimedia Systems for Multimedia Big Data)

Round 1

Reviewer 1 Report

Abstract:

Line 9: "...features, and outperformed the regression model en some geographic areas." Rewrite as "...features, and outperformed the regression model in some geographic areas."

Introduction

Line 37: "to predict CFC. The closest is Han et al." Insert a citation number for this reference.

Spell out MAE (mean absolute error) in all figure legends. Legends should be informative enough that the reader does not need to go back to the text.

Author Response

The reply is provided in the uploaded PDF file.

Author Response File: Author Response.pdf

Reviewer 2 Report

Review of Prediction of Cloud Fractional Cover Using Machine Learning

The paper is quite short, with much machine learning detail left out. This makes it hard for people who may not have a strong ML background to understand what is being done. Also, the ML model being developed needs to be explained better (see comments below). The results really don’t show much promise for ML methods as presented.

Major Issues

One major issue I have with this study is that it is trying to relate CFC to global warming, but the study covers only a small, limited part of the globe. It is very difficult to extend any results from such a small area to other areas of the globe, or to draw any sort of conclusion (or derive any sort of model) from such a limited area.
I also have an issue with using (what I assume are) surface based measurements of temperature, pressure, and humidity to estimate CFC. Cloud formation is an atmospheric column issue, dependent upon the conditions at different layers of the atmosphere. A cloud forming in an atmospheric layer is probably not related to the surface conditions unless that layer/level is very near the surface.
The authors assume readers know a lot about machine learning (ML). Some explanation as to why a ConvLSTM model is better than other ML model types is really needed. Most ML models can learn complex spatial and temporal patterns. What makes it uniquely suited to your study?
The development of the ConvLSTM model is unclear (see below). I am uncertain how the various steps are related to each other; this needs to be clarified.

Minor Issues

Abstract

L2: Climate change is not a “big challenge”; it is an issue or consequence. Dealing with it is a challenge.

L8: Replace “but also” with “however”.

L9: “en” should be “in”

L10: Replace “All parts of the analysis pipeline is explained” with “All aspects of the research analyses are explained”.

Introduction

L2: See above w.r.t. “challenges”.

L6: Replace “expand” with “increase”. The ocean will not expand as it gets warmer.

L11: Remove “same”

L12: Replace “going on” with “occurring”

L14: Replace “solved” with “estimated”. If there is no known solution, then it cannot be “solved”. Also remove “again”.

L15: It is true that clouds interact with radiation, but clouds do different things regarding incoming and outgoing radiation, so to say they will affect global warming is too broad and not entirely accurate.

L16: Add a comma after “global warming” and insert “as a result” between “and” and “different”.

L21: Remove “e.g.”

L24: Change “in the first step of the procedure above” to “between CFCs and other variables”.

L26: What “documented impressive performance to address problems” exist? What problems are addressed and what is unique to the data that lends itself to deep learning (or in general Machine Learning)? This sentence is too broad and needs references and documentation. Also, I think you should replace “deep learning” with “machine learning” since that is what you use in the article title. There are differences between machine and deep learning, which you don’t describe.

L28: Why is the application of ML within climate research limited? Also, remove the last sentence of this paragraph. It is not needed and out of place since no background is given to what each part is and how they are examined/addressed in ML.

L34: Replace “Given the high spatial and temporal resolution… x 5)” with something more general regarding the ECC dataset, perhaps like “Given the availability of the ECC dataset…”. You have not described the ECC dataset at all other than stating it consists of satellite observations of CFC and other observations. You describe it more in the next section, so this sentence here is out of place.

L42: Replace “all the parts” and “pipeline” with something like “we outline the entire analysis technique”. Also, “pis” should be “is”.

Section 2

L50: Add “humidity” after “relative”.

L51: I assume the variables in the ECC dataset are all surface-based measurements (outside of CFC)? See the Major Issues above.

L62: Which satellites? I know it is Meteosat-8 through 11 but it should be stated.

L63: Remove “such as humidity”. Also, you should just say the cloud mask was regridded to the same grid resolution as the ERA5 data.

L68: ECC should be ERA5 data, correct? ECC includes the cloud mask data along with the four variables in the ERA5 data. How is the cloud mask data utilized? Is it part of the models?

L74: High humidity is not always a requirement of cloud formation. It may be a requirement of cloud formation at the surface or perhaps cumulus clouds but not necessarily a cloud at higher levels (stratus and cirrus clouds).

L77: It should be “… a 24-hour predicted sequence of values…”, correct? This leads to another issue I have with your model development: I am uncertain what you are using as inputs and what you are outputting. In model CFC(H) you are using historical values (0-24 hours prior to time t0, correct?) to train a model, but what is the model producing as output: a CFC prediction for t0 to t0+24 hours, or just for t0? Then you are applying the CFC(H) model with predicted values as input to produce the CFC(F) model forecast (CFC at t0 to t0+24 hours)? I don’t think you can do that. The temporal relationships will be completely different for historical and predicted values of ERA5 data. If I am wrong, you need to explain what you are doing in greater detail. I can see you developing a model that uses historical ERA5 data to predict future CFC values, or perhaps one that uses future ERA5 parameters to predict CFC at the same predicted times (ERA5 values at t0+15 hours to predict CFC at t0+15 hours, for example), but I don’t think you can take a model developed with historical values as input and then apply it to predicted values as input. I am unsure how steps 1 and 2 (lines 68 and 70) are related. Also, are you using forecast values from periods after the analysis time to predict that time (e.g., using t0+20 hours to predict the CFC value at t0+15 hours)? That isn’t clear.
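
To make the first input/output setup in the comment above concrete (historical values in, future values out), here is a minimal sketch. The function name `make_windows`, the window lengths, and the toy series are illustrative assumptions and are not taken from the manuscript.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], n_in: int = 24, n_out: int = 24
) -> List[Tuple[List[float], List[float]]]:
    """Split an hourly series into (history, future) pairs.

    For each analysis time t0, the history covers hours t0-n_in .. t0-1
    and the future covers hours t0 .. t0+n_out-1.
    """
    pairs = []
    for t0 in range(n_in, len(series) - n_out + 1):
        history = list(series[t0 - n_in:t0])
        future = list(series[t0:t0 + n_out])
        pairs.append((history, future))
    return pairs


# Toy example: 10 hourly values, 3 hours of history, 2 hours of forecast.
hourly = list(range(10))
pairs = make_windows(hourly, n_in=3, n_out=2)
```

With this layout no value at or after t0 ever appears in the history, which is the separation the comment asks the authors to state explicitly.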

L81: Here are areas that need to be explained better for non-ML experts. What are batch normalizations? What are hidden states? What are convolutional filters? Weight initializations? Learning rates and dropouts? Hyperparameters? Overfitting? These are all things that need some explanation to those who may not know.

L83: This is a run-on sentence. Should have a colon or semicolon after “benefits” and then commas between the three points (and add “, the model had a” before “higher learning rate” and change “and dropout” to “, and the use of dropouts”).

L85: This is an incomplete sentence. You can fix it with something like “… expensive, so disabling…”.

L86: What is “Padding Same”? Remove the comma after “layers”.

L88: Not sure why the three lines in this section are not numbered. Must be a formatting issue. Also, this is really not a model. It is just a regression equation. There are “regression ML models”, but this is not one of them. What is the “e” at the end of the equation? The a0 is the y-intercept, with a1-a4 being the regression coefficients for each variable. These should be defined. So again, in this model, you are using a forecast prediction of the ERA5 values at a certain time to derive a CFC value at the same forecast time, correct?
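
For readers unfamiliar with the term, "same" padding conventionally means zero-padding the input so that, at stride 1, a convolution returns an output of the same length as its input (Keras exposes this as `padding="same"`). The one-dimensional sketch below illustrates the idea; the function `conv1d` and the toy values are assumptions for illustration, not the authors' implementation.

```python
def conv1d(signal, kernel, padding="valid"):
    """Stride-1 sliding dot product (cross-correlation, as used in deep learning)."""
    k = len(kernel)
    padded = list(signal)
    if padding == "same":
        total = k - 1              # zeros needed to keep the output length
        left = total // 2
        right = total - left
        padded = [0.0] * left + padded + [0.0] * right
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(padded) - k + 1)
    ]


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 1.0, 1.0]
valid_out = conv1d(signal, kernel, "valid")  # shorter than the input
same_out = conv1d(signal, kernel, "same")    # same length as the input
```

Without padding the output shrinks by `len(kernel) - 1` values, so stacked layers would progressively lose the edges of the grid.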
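
For reference, a generic four-predictor linear regression of the kind this comment describes can be written as below. The symbols x_1 to x_4 are placeholders for the four ERA5 variables, and reading the trailing e as the residual (error) term is the conventional interpretation, assumed here rather than taken from the manuscript.

```latex
\[
\mathrm{CFC}(t) = a_0 + a_1 x_1(t) + a_2 x_2(t) + a_3 x_3(t) + a_4 x_4(t) + e(t)
\]
```

Here a_0 is the intercept and a_1 to a_4 are the coefficients, with the predictors and CFC taken at the same time t.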

L94: Keras should be capitalized.

L96: Again, the various parameters should be explicitly identified and should probably be explained a bit. What does changing the learning rate do? What are B1, B2, and e (and why did you change e from the default value)?

L97: “Training” and “terms” are both misspelled.
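
The parameter names in this comment match those of the Adam optimizer. Assuming that is the optimizer meant (the comment does not say so), the sketch below shows where the learning rate, B1, B2, and e enter a single-parameter update. The defaults are those of the original Adam formulation; the function name and the toy objective are illustrative assumptions.

```python
import math


def adam_minimize(grad, x0, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimise a one-dimensional function given its gradient, using Adam."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients (B1)
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients (B2)
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)  # eps avoids division by zero
    return x


def toy_gradient(x):
    """Gradient of the toy objective f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


first_step = adam_minimize(toy_gradient, x0=0.0, lr=0.1, steps=1)
converged = adam_minimize(toy_gradient, x0=0.0, lr=0.1, steps=500)
```

The first update has a size close to the learning rate regardless of the gradient's magnitude, which is one reason the learning rate is usually the first hyperparameter to tune.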

L99: You have 5 different models in Figure 3, but you really have not explained the differences between them or why you selected the five you did display. Did you have more but just not show them?

L99: So, you just ran the model at t0 each day and predicted out 24 hours, or are you running it at every time and predicting 24 hours out for any t0 time? What is t0? Local midnight? 00Z?

L101: The intercept is the average CFC value from the training and validation data sets at a certain time? From which model… the ConvLSTM or the regression equation? How do you get an average value from an ML model? If you are somehow using an average CFC model value, you should use only the validation or the training value. You should not combine the values. The validation data set is independent of the training data set. Did the ML models all converge and not overfit the data?

L103: How do you know the ConvLSTM model learned from the features? The MAE differences between the Intercept and the ConvLSTM models are so small; how do you know they are significant? Did you do significance testing between the data sets to make sure what you are seeing is real and not just a data artifact?

L104: Cloud formations are not necessarily due to thunderstorms. Clouds form more during the day due to surface heating? And getting back to my comment above, are you just initializing the model at a set time every day (e.g., 00Z)? If so, the “improved performance” starting around t0+10 hours may not be “afternoon” but more related to surface heating increasing as the sun rises.

L111: The local variations seem to be tied to areas where there are land/sea interactions or places where convergence may initiate convection (e.g., along mountain ridgelines, leading to orographic lifting). It would be good to see if one variable in the ERA5 data is more important to the accuracy of the model than another. This might help to determine why the models performed better in some areas than others (temperature or humidity related). Wind measurements would also aid in this, but they are not part of your model currently.

L117: The inability of the regression model to learn the spatiotemporal complexities is only because you have set it up to consider only the current (forecast) time being analyzed. It could easily be set up to include other time periods prior to the time being analyzed, but would most likely require different regression equations for each time step.

L119: Not really sure what is concluded with this study. Basically you have stated that the results contain a lot of error and that the ML methods are worse overall than the basic regression equation and neither is much better than just taking a bulk average, but you try to reason that using ML methods might improve the results in the future. Not sure there is much info to back this up based on your results. Perhaps adding a plot of the difference between the two model errors (showing where the ML model MAE is better than the regression model) would make it easier to identify areas where the ML model improves over the regression. As is, the reader needs to compare shadings in two side-by-side plots.
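
One way to read this comment is that a constant "intercept" baseline should be fitted on the training data only and then scored on the independent validation data. The sketch below illustrates that separation together with the MAE computation; the function name `mae` and the toy CFC values are made up for illustration and do not come from the study.

```python
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Hypothetical cloud fractions, each between 0 and 1.
train_cfc = [0.2, 0.4, 0.6, 0.8]
valid_cfc = [0.3, 0.5, 0.9]

# The baseline is estimated from the training split only ...
baseline = mean(train_cfc)

# ... and evaluated on the held-out validation split.
baseline_mae = mae(valid_cfc, [baseline] * len(valid_cfc))
```

Any model whose validation MAE is not clearly below `baseline_mae` has learned little beyond the average.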
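
The significance question raised here could be approached in several ways. One simple option, sketched below under stated assumptions, is a paired sign-flip permutation test on the per-sample absolute errors of two models. The function name and the toy errors are hypothetical, and the test assumes exchangeable samples, which spatially or temporally correlated data may violate.

```python
import random
from statistics import mean


def paired_permutation_pvalue(err_a, err_b, n_perm=2000, seed=0):
    """Two-sided p-value for a difference in mean error between paired samples."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(err_a, err_b)]
    observed = abs(mean(diffs))
    hits = 0
    for _ in range(n_perm):
        # Under the null hypothesis the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Identical errors: no evidence of any difference.
p_same = paired_permutation_pvalue([0.3] * 20, [0.3] * 20)

# Consistently larger errors for model A: strong evidence of a difference.
p_diff = paired_permutation_pvalue([1.0] * 20, [0.0] * 20)
```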
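
The difference plot suggested in this comment would rest on a per-grid-cell subtraction of the two MAE maps. The sketch below shows that step on a hypothetical 2 x 2 grid; the function name and all numbers are made up for illustration and are not results from the study.

```python
def mae_difference(mae_ml, mae_reg):
    """Per-cell difference (ML minus regression); negative means ML is better."""
    return [
        [m - r for m, r in zip(row_ml, row_reg)]
        for row_ml, row_reg in zip(mae_ml, mae_reg)
    ]


# Hypothetical MAE maps on a 2 x 2 grid.
mae_ml = [[0.30, 0.25], [0.40, 0.35]]
mae_reg = [[0.28, 0.30], [0.40, 0.31]]

diff = mae_difference(mae_ml, mae_reg)

# Grid cells (row, column) where the ML model has the lower error.
better_cells = [
    (i, j)
    for i, row in enumerate(diff)
    for j, value in enumerate(row)
    if value < 0
]
```

Plotting `diff` with a diverging colour scale centred on zero would show the sign of the difference directly, instead of asking the reader to compare two shaded panels.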

Author Response

The reply is provided in the uploaded PDF file.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The paper is much improved and clearer now. Authors took great care to answer questions in the comments and explain their answers. The new section regarding the different machine learning techniques is a good summary for people just learning about ML and also provides a platform for the authors to explain why they chose what they did for the study. The Conclusion section is much improved. Even a result that does not convey what you hoped is a good result as long as you explain what can be gleaned from it, which you have done.
