Article
Peer-Review Record

Cloud Nowcasting with Structure-Preserving Convolutional Gated Recurrent Units

Atmosphere 2022, 13(10), 1632; https://doi.org/10.3390/atmos13101632
by Samuel A. Kellerhals 1,2,*, Fons De Leeuw 2 and Cristian Rodriguez Rivero 1
Reviewer 1:
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 29 July 2022 / Revised: 24 September 2022 / Accepted: 30 September 2022 / Published: 7 October 2022

Round 1

Reviewer 1 Report

The authors propose the use of Convolutional Gated Recurrent Unit networks (ConvGRUs) to produce short-term cloudiness forecasts for the next hours over Europe, along with an optimisation criterion able to preserve image structure across predicted sequences. This approach is compared against state-of-the-art optical flow algorithms using over two and a half years of observations from the Spinning Enhanced Visible and Infrared Imager (SEVIRI) instrument onboard the Meteosat Second Generation satellite. The topic is interesting. However, there are still problems in the structure and content of this paper. Below are more specific comments:

1. When discussing the related work, I think it is important to mention some of the most recent research efforts.

2. The summary in the introduction does not highlight the innovative points. It is suggested to replace the generic term 'deep learning' with the specific deep learning methods used.

3. Page 3: the concept of Lagrangian persistence could be better explained.

4. For a comprehensive understanding, please discuss the “worst case” that may cause poor performance.

5. Formulas (3), (4), and (5) are not explained. Some of the notation needs to be clarified.

6. The connection between Sections 3.2 and 3.3 is not well explained.

7. There are too few evaluation indicators in the experiment. The authors use R2, MAE, and SSIM to evaluate prediction accuracy, but other indicators may be more suitable; for recommendation-system algorithms, for example, accuracy is usually measured with indicators such as Recall, Precision, and Hit Rate.

8. Comparative experiments with other deep learning models are lacking.

9. Ablation experiments are lacking.

10. The format of the references is confusing; please modify it according to the journal template.

Author Response

Dear Reviewer,

First of all, we would like to thank you for taking the time to review our paper and provide us with valuable feedback. We have made an effort to consider and address all of your comments to the best of our abilities. Please see below for a summary of our responses to your comments. Also please note that in the revised submission new changes are presented in a blue font (except for new equations and figure captions, which could not be made blue).

1. When discussing the related work, I think it is important to mention some of the most recent research efforts.

 

        Thank you for the suggestion. More related research papers are now included and more thoroughly discussed in the introduction, including other relevant studies as well as currently deployed nowcasting systems.

 

2. The summary in the introduction does not highlight the innovative points. It is suggested to replace the generic term 'deep learning' with the specific deep learning methods used.

        The deep learning methods proposed in the article are now described more clearly in the introduction, with a more specific focus on explaining the chosen architecture (autoencoders).

 

3. Page 3: the concept of Lagrangian persistence could be better explained.

 

        Indeed, this topic should be better explained; thank you for noticing. Lagrangian persistence is now explained more thoroughly in the methodology section.

 

4. For a comprehensive understanding, please discuss the “worst case” that may cause poor performance.

 

        We have now clarified that the worst-case model is Eulerian persistence and that this is what we use as the baseline model.
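        For reference, the two persistence baselines can be written in their standard textbook forms (a sketch in generic notation that may differ from the revised manuscript; \Psi denotes the cloud field, \mathbf{x} a pixel position, and \mathbf{u} the motion field estimated by optical flow):

            \hat{\Psi}(\mathbf{x},\, t + \Delta t) = \Psi(\mathbf{x},\, t)    (Eulerian persistence: the last observed field is simply repeated)

            \hat{\Psi}(\mathbf{x},\, t + \Delta t) = \Psi(\mathbf{x} - \mathbf{u}(\mathbf{x})\,\Delta t,\, t)    (Lagrangian persistence: the last field is advected along the motion field)

        Eulerian persistence therefore serves as a natural lower bound on skill, since it assumes no cloud motion at all.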

 

5. Formulas (3), (4), and (5) are not explained. Some of the notation needs to be clarified.

 

        To address this point, all of the optical flow equations were rewritten more clearly, and the equations (and their derivation) are now better explained and clarified in the methodology section.
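        For readers unfamiliar with the derivation, these equations build on the standard brightness-constancy assumption of optical flow; a generic sketch (not necessarily the manuscript's exact notation) is:

            I(x + u\,\Delta t,\; y + v\,\Delta t,\; t + \Delta t) = I(x, y, t)

        which, after a first-order Taylor expansion, yields the optical flow constraint equation

            I_x u + I_y v + I_t = 0

        where I_x, I_y, and I_t are the partial derivatives of the image brightness I and (u, v) is the flow vector. Because this is one equation in two unknowns, additional constraints (e.g. the Lucas-Kanade or Horn-Schunck formulations) are required to solve for the flow.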

 

6. The connection between Sections 3.2 and 3.3 is not well explained.

 

        This is a good point. The connection between the two sections is now better explained, in particular the transition from an optical-flow-based model to an autoencoder (ConvGRU) model.

 

7. There are too few evaluation indicators in the experiment. The authors use R2, MAE, and SSIM to evaluate prediction accuracy, but other indicators may be more suitable; for recommendation-system algorithms, for example, accuracy is usually measured with indicators such as Recall, Precision, and Hit Rate.

 

        For a machine learning problem focused on classification, you are right that evaluation metrics such as Recall, Precision, and Hit Rate are used. However, for a regression problem such as the one in this study, these metrics are not applicable. A variety of other papers in the domain of cloud and precipitation nowcasting use MSE and MAE as their primary evaluation metrics; in our case we extend this to also include SSIM.
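        As an illustration only (this is not the authors' evaluation code; the function name and the toy data below are made up), such regression-oriented scores can be computed per predicted frame with scikit-learn and scikit-image:

            import numpy as np
            from sklearn.metrics import r2_score, mean_absolute_error
            from skimage.metrics import structural_similarity

            def evaluate_nowcast(y_true, y_pred):
                # y_true, y_pred: 2-D float arrays (observed and predicted cloud fields).
                r2 = r2_score(y_true.ravel(), y_pred.ravel())
                mae = mean_absolute_error(y_true.ravel(), y_pred.ravel())
                # SSIM needs the dynamic range of the data to set its stabilising constants.
                ssim = structural_similarity(y_true, y_pred,
                                             data_range=float(y_true.max() - y_true.min()))
                return r2, mae, ssim

            # Toy example: score a noisy copy of a random "observation".
            rng = np.random.default_rng(0)
            obs = rng.random((128, 128))
            pred = obs + 0.05 * rng.standard_normal(obs.shape)
            print(evaluate_nowcast(obs, pred))

        Unlike Recall or Precision, all three scores operate directly on continuous pixel values, which is what makes them suitable for a regression task.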

 

8. Comparative experiments with other deep learning models are lacking.

 

        Comparisons are now made with other published research whose results are comparable to ours, such as the study by Knol et al. The focus of this study is the comparison between optical flow and a state-of-the-art autoencoder model; carrying out experiments with additional deep learning models is therefore out of scope here, but is something we are considering for a future study.

 

9. Ablation experiments are lacking.

 

        Ablation experiments were conducted by varying the loss functions of our model, which is the central ablation study in this research. Given that further ablation studies are of interest for future research, the discussion section has been extended with a paragraph about future ablation studies and hyperparameter optimisation.
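        As a minimal sketch of what such a loss-function ablation can look like (illustrative only: the weighting alpha, the function name, and the availability of a differentiable ssim_fn, e.g. from a package such as pytorch-msssim, are assumptions rather than the paper's actual configuration):

            import torch.nn.functional as F

            def ssim_mae_loss(pred, target, ssim_fn, alpha=0.5):
                # Locally oriented term: SSIM is a similarity score, so 1 - SSIM acts as a loss.
                structural = 1.0 - ssim_fn(pred, target)
                # Globally oriented term: pixel-wise mean absolute error.
                pointwise = F.l1_loss(pred, target)
                # A convex combination trades structure preservation against pixel accuracy;
                # setting alpha to 0 or 1 recovers the pure MAE and pure SSIM variants.
                return alpha * structural + (1.0 - alpha) * pointwise

        Sweeping alpha (or swapping in MSE or Huber terms) yields the family of loss variants compared in the ablation.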

10. The format of the references is confusing; please modify it according to the journal template.

 

        We double-checked the reference format once again (and corrected some references) and used the official MDPI template provided by Overleaf to ensure the formatting is correct.




Reviewer 2 Report

In this work, the authors present a deep learning-based methodology to address the cloud nowcasting task, which is a challenging problem due to the dynamics of the sky. The article compares two strategies: the first is based on the well-known optical flow, whereas the second is based on the implementation of a deep neural network.

The authors have made a good effort to describe the methodology; the writing is clear and the structure satisfactory. Nevertheless, although the document is well presented, from this reviewer's point of view there are some important details that need to be addressed:

1) My main concern is about the way the authors present the Results and Discussion section, especially due to the fact that figures and tables are not referenced in the main text, which makes it difficult to appreciate the results.

2) The variables used in the equations are confusing; for example, initially "y" represents a position, but later it indicates a non-negative image signal and then a cloud field. Thus, it is suggested to review the variables being used and to be consistent with their definition throughout the entire document.

3) Although the parameters of the CNN are mentioned, it is not established how it is trained; that is, the authors do not mention how many images were used to train the model and how many to validate it. Furthermore, the loss graphs resulting from the training are not shown; these graphs are relevant because they allow observing the behavior of the model and whether it will present overfitting.

 

- Shortcomings 

1) In line 122: it looks like the variable "I" refers to the image, not to the partial derivatives of the image. Am I right?

2) It is necessary to clearly define each of the elements of Equation 8; for instance, the sigma definition is missing.

3) In line 178 the authors discuss the SSIM metric but do not define it until line 186; it is suggested to define it first and then discuss it.

4) It would be nice to see some representative images of the training set.

5) Indicate how the database is made up in terms of the number of images.

Author Response

Dear Reviewer,

 

First of all, we would like to thank you for taking the time to review our paper and provide us with valuable feedback. We have made an effort to consider and address all of your comments to the best of our abilities. Please see below for a summary of our responses to your comments. Also please note that in the revised submission new changes are presented in a blue font (except for new equations and figure captions, which could not be made blue).


1. My main concern is about the way the authors present the Results and Discussion section, especially due to the fact that figures and tables are not referenced in the main text, which makes it difficult to appreciate the results.

 

       Thank you for noticing this. An effort has been made to reference all figures and tables throughout the text in the relevant sections, such as the Results and Discussion section, which should now make it easier to interpret the results.

 

2. The variables used in the equations are confusing; for example, initially "y" represents a position, but later it indicates a non-negative image signal and then a cloud field. Thus, it is suggested to review the variables being used and to be consistent with their definition throughout the entire document.

 

        Thank you for the observation. We double-checked all equations and ensured that the use of variables is consistent across the entire article. Cloud fields are now consistently denoted by Psi, and inside the SSIM equation a and a prime are used to denote image signals. All other variables across equations should now also be consistent.
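        For completeness, the standard SSIM definition in that notation (Wang et al., 2004) reads:

            SSIM(a, a') = \frac{(2\mu_a \mu_{a'} + c_1)(2\sigma_{a a'} + c_2)}{(\mu_a^2 + \mu_{a'}^2 + c_1)(\sigma_a^2 + \sigma_{a'}^2 + c_2)}

        where \mu_a, \mu_{a'} are local means, \sigma_a^2, \sigma_{a'}^2 local variances, and \sigma_{a a'} the covariance of the image signals a and a', with c_1, c_2 small constants that stabilise the division. (The manuscript's exact window and constant settings may differ.)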

 

3. Although the parameters of the CNN are mentioned, it is not established how it is trained; that is, the authors do not mention how many images were used to train the model and how many to validate it. Furthermore, the loss graphs resulting from the training are not shown; these graphs are relevant because they allow observing the behavior of the model and whether it will present overfitting.

 

        Although we understand your concern, we believe that we have thoroughly explained how the ConvGRU model is trained in Section 3.5 (Model training and evaluation). There we state how many images were used in the training and test sets, describe the partitioning of the cloud sequences, and list all the hyperparameters used for training, the optimiser, the loss functions, and the infrastructure used to train the model. Regarding the loss graphs, we chose to omit them for space reasons, as we would have to include five additional graphs which, in our opinion, do not add enough value to justify the space required.

 

4. In line 122: it looks like the variable "I" refers to the image, not to the partial derivatives of the image. Am I right?

 

        The optical flow equations have now been rewritten to make them easier to understand.

 

5. It is necessary to clearly define each of the elements of Equation 8; for instance, the sigma definition is missing.

 

        Indeed, you are correct; thank you for noticing. All elements of this equation (now Equation 9) have been fully defined.

 

6. In line 178 the authors discuss the SSIM metric but do not define it until line 186; it is suggested to define it first and then discuss it.

 

        We have now defined SSIM before its first use in the text.

 

7. It would be nice to see some representative images of the training set.

 

        We have now included representative images belonging to the training set in the appendix.

 

8. Indicate how the database is made up in terms of the number of images.

 

        This is thoroughly explained in Section 3.5 (Model training and evaluation).

 

Reviewer 3 Report

Review for Cloud Nowcasting with Structure-Preserving Convolutional Gated Recurrent Units

The authors present a study aimed at analysing short-term cloudiness forecasts with a ConvGRU and an ensemble OF model for 3 hours over Europe, along with an optimisation criterion able to preserve image structure across predicted sequences. While the methodology for cloud tracking is widely used, any input to improve results is of value. However, there are some considerations that, in my opinion, should be improved and better explained before the manuscript is considered for publication. The manuscript is generally well written, although there are sentences that could be better written (some details): for example, "page 2, line 56: relatively little studies .." or page 6, line 168 "information forget". The introduction could be more comprehensive, without adding many more pages, by going into more detail about the cases where images are blurred in other approaches. There are basic elements that need to be corrected, for example the fact that no figures or tables are mentioned in the text, which makes interpreting the results a complicated task. The authors should not only include the figures and tables but also mention and describe the results intended to be shown there. In addition, there are abbreviations in these figures and tables that are not specified in the manuscript, making reading and interpreting the results difficult and thus hindering validation of the statements. Some figures even lack the description of any axis.

There are certain inconsistencies in the text that need to be reviewed, as they impact the core of the issue. For example, the abstract says that this approach improves predictions for t < 1 hour, while the rest of the manuscript refers to intervals of 3-6 hours and improved results for t < 3 hours, and even the tables and Figure 5 show times of 3 hours. I find it redundant to show figures and tables with the performance of each model tested; it would be more useful to incorporate the results into a table, including the values of R2, MAE, and SSIM in addition to the percentages. Regarding this, where do the percentages expressed in the Results and Discussion (Section 4) for R2, MAE, and SSIM come from? They do not seem to be compatible with the values shown in the tables.

On the other hand, although the text mentions the better performance of the ConvGRU models with respect to the ensemble, this result is somewhat confusing in Figure 5. Is OF the one with the worst performance? Why does the last row, with the SSIM+MAE modification, show blurred images at the last t? Why is t 3 hours? What does this image belong to? Where is the validation supposed to be done? What kind of clouds are you trying to capture? This is not a minor issue. There are no specifications or comments about what type of cloudiness is being tracked. It is not the same to study low or high clouds, or even thin or thick clouds. What about semi-transparent and transparent clouds? The next section mentions regional effects and even mentions results from various regions in Europe, none of which are shown. It is not possible to observe such differences or to appreciate the sensitivity of the proposals when there are no results in sight. Therefore, this section does not add value to the analysis if the method cannot be tested on at least some examples or samples. A single example is not enough to support such conclusions. The last part of the paper discusses the G (noise) function. Where in the text is this included in the analysis or results? Where can the better representation of geographical variations be observed?

The conclusion also states that the results are in agreement with Ayzel's results related to precipitation. It does not seem appropriate to compare cloud results with precipitation.

In summary, the authors state conclusions in the manuscript that exceed the results shown. There is little detail on cloud tracking and prediction, and missing details make it difficult to contrast the methodology with the achievements. It is not clear that the improvement in blurred images is a multi-scenario result, because only one example is shown. It is necessary to adjust the prediction times in accordance with what is expressed in the text and to extend the results shown to different situations, both spatially and temporally.

Author Response

Dear Reviewer,

 

First of all, we would like to thank you for taking the time to review our paper and provide us with valuable feedback. We have made an effort to consider and address all of your comments to the best of our abilities. Please see below for a summary of our responses to your comments. Also please note that in the revised submission new changes are presented in a blue font (except for new equations and figure captions, which could not be made blue).



1. There are sentences that could be better written, for example "page 2, line 56: relatively little studies .." or page 6, line 168 "information forget".

 

        Thank you for your suggestions. We now go into greater detail regarding the relatively small number of studies conducted on this subject, and also better explain the information disposal mechanism of the ConvGRU model.

 

2. The introduction could be more comprehensive, without adding many more pages, by going into more detail about the cases where images are blurred in other approaches.

 

        We have now made the introduction significantly more comprehensive, giving further background on the models used and going into more detail on relevant studies. We also explain more clearly the problem that we aim to tackle with our study.

 

3. There are basic elements that need to be corrected, for example the fact that no figures or tables are mentioned in the text, which makes interpreting the results a complicated task. The authors should not only include the figures and tables but also mention and describe the results intended to be shown there. In addition, there are abbreviations in these figures and tables that are not specified in the manuscript, making reading and interpreting the results difficult and thus hindering validation of the statements. Some figures even lack the description of any axis.

 

        Thank you for noticing this. An effort has been made to reference all figures and tables throughout the text in the relevant sections, such as the Results and Discussion section, which should make it easier to interpret the results. We also cross-checked all figures and tables for abbreviations and ensured that all of these are properly defined. However, regarding the missing axis labels: the only figure containing subfigures without axis labels is Figure 5, and this was done deliberately to save space and avoid repeating the same axes across multiple subfigures, as that would be redundant here.

 

4. There are certain inconsistencies in the text that need to be reviewed, as they impact the core of the issue. For example, the abstract says that this approach improves predictions for t < 1 hour, while the rest of the manuscript refers to intervals of 3-6 hours and improved results for t < 3 hours, and even the tables and Figure 5 show times of 3 hours.

 

        We understand that there may be some confusion around the lead time. We have now clarified that, according to the WMO, nowcasting is defined as forecasting with a time horizon from 0 up to 6 hours, and made clear in the paper that we chose a lead time of 3 hours.

 

5. I find it redundant to show figures and tables with the performance of each model tested; it would be more useful to incorporate the results into a table, including the values of R2, MAE, and SSIM in addition to the percentages. Regarding this, where do the percentages expressed in the Results and Discussion (Section 4) for R2, MAE, and SSIM come from? They do not seem to be compatible with the values shown in the tables.

 

       Thank you for your feedback on this. Although putting the evaluation metrics into a table is also a valid approach, we prefer to show them as part of a plot, as this makes it easier to see visually how the metrics change over time. Regarding the percentages: in addition to the values already shown in the results tables, we also report the average percentage improvement across all timesteps. As you correctly say, these overarching results are not part of the tables, but they were still part of our analysis and serve as a way to interpret the general performance of each tested model.

 

6. On the other hand, although the text mentions the better performance of the ConvGRU models with respect to the ensemble, this result is somewhat confusing in Figure 5. Is OF the one with the worst performance? Why does the last row, with the SSIM+MAE modification, show blurred images at the last t? Why is t 3 hours? What does this image belong to? Where is the validation supposed to be done?

 

        We modified the figure caption to clarify which model is best in that specific case. As mentioned previously, our nowcasting lead time is 3 hours, meaning that all of our models forecast 3 hours into the future; that is why t equals 3 hours. The figure serves as an illustration to visually validate that models trained with locally oriented loss functions (SSIM and SSIM + MAE) produce structurally more accurate nowcasts than models trained with globally oriented loss functions (MSE, MAE, Huber) and optical flow.

7. What kind of clouds are you trying to capture? This is not a minor issue. There are no specifications or comments about what type of cloudiness is being tracked. It is not the same to study low or high clouds, or even thin or thick clouds. What about semi-transparent and transparent clouds?

 

        We have now added a description in the introduction of which clouds are captured by this method, to better explain this.

 

8. The next section mentions regional effects and even mentions results from various regions in Europe, none of which are shown. It is not possible to observe such differences or to appreciate the sensitivity of the proposals when there are no results in sight. Therefore, this section does not add value to the analysis if the method cannot be tested on at least some examples or samples.

 

        In addition to Figure 5, which shows the regional differences in cumulative mean accuracy metrics for all lead times across all ConvGRU models, we have now also included a variety of example predictions across all tested geographical regions in Appendix B, so that there are visual samples in addition to the analytical results.

 

9. A single example is not enough to support such conclusions. The last part of the paper discusses the G (noise) function. Where in the text is this included in the analysis or results? Where can the better representation of geographical variations be observed?

 

        We have included multiple visual examples in Appendix B to illustrate the reduction in blurring compared with MSE. The discussion of the noise function serves as a mental model for understanding why the ensemble optical flow model outperforms the individual optical flow models.
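        The underlying reasoning is the usual variance-reduction argument for ensembles: if each member's prediction error is modelled as zero-mean noise \varepsilon_i with variance \sigma^2, and the member errors are roughly independent, averaging n members reduces the error variance:

            \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i\right) = \frac{\sigma^2}{n}

        (This sketch assumes independent member errors; the manuscript's G function may be formalised differently.)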

 

10. The conclusion also states that the results are in agreement with Ayzel's results related to precipitation. It does not seem appropriate to compare cloud results with precipitation.

 

        Whilst precipitation nowcasting and cloud nowcasting are not the same, when framed as a generalised pattern recognition problem they share more similarities than differences. Both rainfall and cloud cover nowcasting aim at forecasting the future positions of moving objects, and both consider time series of images as inputs and targets of computational models. We therefore deem it justified to compare the two problem domains.

 

11. In summary, the authors state conclusions in the manuscript that exceed the results shown. There is little detail on cloud tracking and prediction, and missing details make it difficult to contrast the methodology with the achievements. It is not clear that the improvement in blurred images is a multi-scenario result, because only one example is shown.

 

       We believe that sufficient evidence is presented in the results section to justify the claim that our approach reduces blurring of predictions. We agree that looking at SSIM scores alone is not as intuitive as looking at visual examples, which is why we have now included multiple examples from each region.

 

12. It is necessary to adjust the prediction times in accordance with what is expressed in the text, and to extend the results shown to different situations both spatially and temporally.


        We have now made clear in the text that our lead time is 3 hours. Although longer lead times are of active interest to us, such an adjustment is out of scope for this study; it is something we are strongly considering for future research with a dataset that gives us access to longer sequences and thus lets us make predictions further into the future.

Round 2

Reviewer 1 Report

The paper has been revised according to my requirements, and the current version is suitable for publication.

Author Response

Dear Reviewer,

Thanks again for taking the time to go through our manuscript. We are happy to hear that all your concerns have been addressed and that you deem the current version suitable for publication.

Kind regards,
Samuel Kellerhals

Reviewer 2 Report

I appreciate the authors' efforts in responding to my questions and concerns. The revision clarifies almost all the points I raised and helps me (and hopefully readers) understand the current version of the manuscript. Nevertheless, there are some points that I think the authors may still consider.

 

1. Figures 1 and 2 are still not referenced in the text.

2. When referring to an equation, I suggest doing it with a capital letter (lines 183, 189...).

3. Replace Figure 2 with one of better quality.

4. I suggest adding a brief description (in the main text) of the figures in the appendices (not just figure captions).

Author Response

Dear Reviewer,

 

Thank you for the additional time spent going through the manuscript and for providing your feedback; it has been very valuable in improving our work. We have once again made an effort to respond to your comments. As the changes requested this time were smaller in nature, we reverted to the original font style (removed all blue markings) and indicate the changes made directly in our responses using line numbers.

 

1. Figures 1 and 2 are still not referenced in the text.

Thanks for noticing this; we must have missed it initially. Figures 1 and 2 are now referenced in the text; please see lines 151 and 239, respectively.

2. When referring to an equation, I suggest doing it with a capital letter (lines 183, 189...).

We have now ensured that all in-text references to equations start with a capital letter. Please see lines 166, 174, 184, 190, 192-194, and 216.

3. Replace Figure 2 with one of better quality.

Figure 2 has now been replaced with a higher-quality version of the previous figure, rendered at a resolution of 600 dots per inch (dpi).

4. I suggest adding a brief description (in the main text) of the figures in the appendices (not just figure captions).

This is a good point. We now discuss these images in the main text, mentioning the image warping effects of optical flow (lines 402-405) as well as the different visual effects arising in the different ConvGRU model predictions (lines 439-446).

Reviewer 3 Report

The manuscript is better constructed in this new version and some things have been included in the introduction. If figures are not going to be mentioned in the text where they appear, what is the purpose of including them? Better to delete them.

I still think the subject of cloud type needs to be better addressed, including some discussion of semi-transparent clouds or cirrus. Even in the examples shown, it would be of interest to know some characteristics of the clouds. Does the method have the same accuracy for all tracked clouds, regardless of type, height, etc.? Could the mean accuracy metrics be affected by these kinds of characteristics?

 

 

Author Response

Dear Reviewer,

 

Thank you for the additional time spent going through the manuscript and for providing your feedback; it has been very valuable in improving our work. We have once again made an effort to respond to your comments. As the changes requested this time were smaller in nature, we reverted to the original font style (removed all blue markings) and indicate the changes made directly in our responses using line numbers.

 

 

1. If figures are not going to be mentioned in the text where they appear, what is the purpose of including them? Better to delete them.

You are correct; Figures 1 and 2 were previously not referenced in the text. We have now changed this; please see lines 151 and 239, respectively. We have also included descriptions of the appendix figures: we now discuss these images in the main text, mentioning the image warping effects of optical flow (lines 402-405) as well as the different visual effects arising in the different ConvGRU model predictions (lines 439-446).

 

2. I still think the subject of cloud type needs to be better addressed, including some discussion of semi-transparent clouds or cirrus. Even in the examples shown, it would be of interest to know some characteristics of the clouds. Does the method have the same accuracy for all tracked clouds, regardless of type, height, etc.? Could the mean accuracy metrics be affected by these kinds of characteristics?

In this study the focus is on developing a generalisable deep learning model for nowcasting all types of clouds, as measured through the lack of available solar radiation at a given position within the spatial extent of our dataset. One limitation of this study, which we now discuss in lines 457-462, is that with our current solar radiation dataset we cannot evaluate model performance on different cloud types or on clouds at different heights. Given that this merits future research, we have included a statement on this and will address it in a future study.

 

Round 3

Reviewer 3 Report

I find the manuscript improved in this latest version. It would have been desirable to include in the conclusion a discussion of the cloud-related methodology, and I believe this issue should be improved in future attempts; it is not a minor issue. I also understand that some of the figures shown in the appendix are at the core of the topic, so perhaps at least some of them should have been included in the main manuscript with their corresponding explanation. In any case, in the present format I understand that the paper could be accepted for publication.

 

Author Response

Dear Reviewer,

Thank you for your feedback; we appreciate it.

We have now moved some of the most relevant figures from the Appendix directly into the discussion section. This should further aid the reader in following the arguments made.

We acknowledge your point about including a discussion in the conclusion that highlights this limitation of the study and the need to address it in future research. In addition to the treatment of this point in the discussion section (lines 458-461), we have now also included a paragraph in the conclusion section (lines 484-492) which highlights this limitation.

Kind Regards,
Samuel Kellerhals
