Super-Resolution for Renewable Energy Resource Data with Wind from Reanalysis Data and Application to Ukraine
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
General Comments:
The manuscript proposes a new deep learning-based spatiotemporal downscaling method using GAN for generating historical high-resolution wind resource data from the low-resolution ERA5. The training high-resolution dataset is Wind Integration National Dataset Toolkit.
Generally, the manuscript is properly organized and well written. The methodology appears quite sound, and the datasets for training and validating the deep learning model are comprehensive and robust. The model performance seems satisfactory. It is recommended that the manuscript be accepted for publication after minor revision. Some comments and suggestions are given below.
Specific Comments:
- While the title says "application to Ukraine", the study domain includes other countries such as Moldova and Romania.
- In the introduction, the current status of wind energy development in the study domain should be briefly mentioned.
- Why did the authors choose Ukraine, Moldova, and Romania as the application region? Are there any specific reasons? (e.g., lack of wind resource assessments, demand for renewable energy). Please clarify in the manuscript.
- The authors argue that this approach is different from previous ones in terms of the use of ERA5 as the low-resolution data. What is the advantage of using ERA5 over coarsened high-resolution data? Please clarify.
- Including a table to summarize (and compare) the key physical configurations of ERA5 and WTK would benefit the methodology section.
- Wind speed and resource are climate dependent, so their characteristics are expected to vary from one geographical region to another. The training and validation domains are all located within North America and East Europe. Could this model be directly applied to other geographical regions of the world? Please include some discussions.
Author Response
We thank the reviewer for the helpful comments and suggestions. Our responses are included below.
While the title says "application to Ukraine", the study domain includes other countries such as Moldova and Romania.
The main focus was on Ukraine but surrounding countries were included to keep the dataset domain rectangular. A clarification on this was added in the introduction.
In the introduction, the current status of wind energy development in the study domain should be briefly mentioned.
A few sentences were added about the impact of the war on Ukrainian energy capacity, and on wind assets in particular.
Why did the authors choose Ukraine, Moldova, and Romania as the application region? Are there any specific reasons? (e.g., lack of wind resource assessments, demand for renewable energy). Please clarify in the manuscript.
This domain was selected based on stakeholder interest and funding support to help rebuild Ukraine’s energy infrastructure. Some clarification on this was added to the abstract and the introduction.
The authors argue that this approach is different from previous ones in terms of the use of ERA5 as the low-resolution data. What is the advantage of using ERA5 over coarsened high-resolution data? Please clarify.
Thank you for pointing this out, as this is a crucial point which deserves more discussion. A paragraph has been added to the introduction describing the advantages of using ERA5 over coarsened high-res data.
Including a table to summarize (and compare) the key physical configurations of ERA5 and WTK would benefit the methodology section.
A table with the output variables, run conditions, and resolution for ERA5 and WTK was added to the methods section.
Wind speed and resource are climate dependent, so their characteristics are expected to vary from one geographical region to another. The training and validation domains are all located within North America and East Europe. Could this model be directly applied to other geographical regions of the world? Please include some discussions.
While this is not definitive, the model demonstrated good performance when generalizing to Eastern Europe while being trained only on data from North America. Some discussion has been added on how the use of region-specific bias correction, the conditioning on terrain data, and the substantial set of low-resolution training features should enable generalization to other regions as well.
Reviewer 2 Report
Comments and Suggestions for Authors
This is an excellently written manuscript and a nice piece of work on downscaling of wind data to synoptic 10 m or rotor heights using multiple sources of data (ERA5, a downscaled dataset, synoptic and wind farm data) and a well-described and concise production method including pre-bias correction, training and running in spatio-temporal batches, etc.
The manuscript can be published essentially as it is. I suggest making one clarification and adding one relevant reference.
- Clarification: you mentioned the different performance/improvements with respect to ERA5 for flat and more mountainous terrain (which was expected); please also add a sentence on the possible different performance for near-surface (10 m) and rotor-height (> 80 m) data. Note: the ERA5 winds are post-processed from the model's 40 m winds using roughness length for grass over land.
-you mentioned plans to add more ERA5 variables into training, a useful overview of relevant variables and on the diurnal cycle is available from this paper
Bouallègue, Z. B., F. Cooper, M. Chantry, P. Düben, P. Bechtold, I. Sandu, 2023. Statistical modelling of 2m temperature and 10m wind speed forecast errors. https://doi.org/10.1175/MWR-D-22-0107.1
Author Response
Thank you for the kind words and suggestions.
This is an excellently written manuscript and a nice piece of work on downscaling of wind data to synoptic 10 m or rotor heights using multiple sources of data (ERA5, a downscaled dataset, synoptic and wind farm data) and a well-described and concise production method including pre-bias correction, training and running in spatio-temporal batches, etc.
The manuscript can be published essentially as it is. I suggest making one clarification and adding one relevant reference.
- Clarification: you mentioned the different performance/improvements with respect to ERA5 for flat and more mountainous terrain (which was expected); please also add a sentence on the possible different performance for near-surface (10 m) and rotor-height (> 80 m) data. Note: the ERA5 winds are post-processed from the model's 40 m winds using roughness length for grass over land.
Excellent point. We included a caveat on this in the Sup3rWind vs MADIS results section.
-you mentioned plans to add more ERA5 variables into training, a useful overview of relevant variables and on the diurnal cycle is available from this paper
Bouallègue, Z. B., F. Cooper, M. Chantry, P. Düben, P. Bechtold, I. Sandu, 2023. Statistical modelling of 2m temperature and 10m wind speed forecast errors. https://doi.org/10.1175/MWR-D-22-0107.1
Thanks for this suggestion. We added a citation to this work in the future work section.
Reviewer 3 Report
Comments and Suggestions for Authors
I would like to thank the authors for writing up the results of the work and the editor for entrusting me with the review.
The manuscript presents a method of generating high-resolution wind data from low-resolution global fields using machine learning. Compared to dynamical downscaling, this approach requires a couple orders of magnitude less computation, which is a major benefit. The obtained high-resolution data are validated and found to be of comparable accuracy to the ones obtained with dynamical downscaling. The software code is made available, and so is the obtained high-resolution wind data for a region of choice.
The impression I get is that this is a well-written report on a very important research contribution. It is challenging to provide suggestions on a work of such quality, as I'm more likely to go wrong in a comment than to recognize an actual problem. However, let me try anyway – please ignore the comments that aren't reasonable.
The meaning of the phrase "high-resolution wind data" is not self-evident to a non-expert reader (such as me) – one could wrongly assume that they are a result of measurement, not of an algorithm. You may want to clarify it sooner in the text, at the beginning of Introduction if not in Abstract.
On first reading, it was not clear to me why the 24-year data record over Ukraine, Moldova, and part of Romania produced with sup3r would be important. You train your GAN model on WTK data, and your model reproduces it imperfectly (with an additional error); therefore, it only makes sense to use it in times and spaces where WTK data is not available. After carefully reading the manuscript again, I'm under the impression that WTK is only available for the U.S. and not for Ukraine and neighboring countries. You train your GAN on the U.S. data exactly because you have WTK there, and then successfully transfer it to Ukraine without performing any training on Ukraine data. Am I correct, and if so, can you make it more obvious in the text?
In line 93, the phrase "order of magnitude" should probably be "orders of magnitude" (or "an order of magnitude").
Bias correction of ERA5 seems to be an important step in your algorithm, but it is not obvious to a layperson why it would be necessary or beneficial – if you didn't do it, GAN should learn to do it automatically. Did you include bias correction because you tried it and it improved the results, or is it a well-known fact that bias correction should be done in computations like this? Please provide a sentence of comment on the topic, if you agree that it would be appropriate.
Regarding Ukraine Wind Farm Observations (lines 186–197), you don't specify measurement heights for farm B. Are they the same as in A and C?
Can you specify the loss function you use in training? The choice of the loss function seems to be an original contribution of your work (line 205) so it deserves to be described.
Topography injection (line 207) is not explained in enough detail for me to understand it. Do you mean you use elevation above sea level at the location as an input feature? Where do you get the topography data from? Table 2 distinguishes between "training features" and "input": don't you use both in the same way, as inputs to the neural network, and if not, what is the difference? What does "U/V" among the Training Features stand for, and what is your data source for it and for the other Training Features?
In line 357, you probably mean Table 5, not Table 6.
Last but not least, I'd comment on the Data and Software Availability. It is excellent that you provide both the data and the software source code (it is also necessary, as the software you developed is the main point of the publication if I understand it correctly). However, I believe it could be improved further by providing more detail. The datasets such as ERA5 provide a lot of data; could you specify which are the exact quantities, variables and versions that you use? It is similar with the software: which version exactly do you use? In the ideal case, the reader would be able to install the software from source, download the data, click "run", and their computer would repeat the work of your computer all the way to the figures. I can't tell if the case presented in the manuscript is available among Examples in the sup3r repository (I'm looking at https://github.com/NREL/sup3r/blob/main/examples/sup3rwind/README.rst), but you should consider offering it (and making clear which example it is). Also make the example as unambiguous as possible. It is cool that you provide the trained model in addition to the training from scratch instructions, but don't leave any choices such as "Download the ERA5 data that you want to downscale from" to the reader. I do realize that having a very precise description of steps needed to reproduce your results would be a relatively tiny step toward better reproducibility, as repeating your computations is a major challenge requiring major computing resources – but it would at least make it easier to start ...
Another hypothetical challenge to the enthusiastic reader excited about repeating your work is the identity of Wind Farms A–E. I understand that you cannot provide either the locations or the data, the reader will have to contact the wind farm companies (and prove to them that they are trustworthy) by themselves. But what data should they ask for? How will the wind company know which farms you're talking about in the manuscript? Can you provide some kind of internal IDs of the wind farms that don't contain any sensitive information yet would make it clear to the wind companies which farm is which?
Thank you again for such a well-written manuscript on such an important contribution. I hope at least some of my comments make sense and enable you to make it even better without being too much of a burden.
Author Response
The meaning of the phrase "high-resolution wind data" is not self-evident to a non-expert reader (such as me) – one could wrongly assume that they are a result of measurement, not of an algorithm. You may want to clarify it sooner in the text, at the beginning of Introduction if not in Abstract.
Some clarification on this was added to the abstract.
On first reading, it was not clear to me why the 24-year data record over Ukraine, Moldova, and part of Romania produced with sup3r would be important. You train your GAN model on WTK data, and your model reproduces it imperfectly (with an additional error); therefore, it only makes sense to use it in times and spaces where WTK data is not available. After carefully reading the manuscript again, I'm under the impression that WTK is only available for the U.S. and not for Ukraine and neighboring countries. You train your GAN on the U.S. data exactly because you have WTK there, and then successfully transfer it to Ukraine without performing any training on Ukraine data. Am I correct, and if so, can you make it more obvious in the text?
Yes, you're exactly right. We clarified that WTK is only available for North America and added discussion on how using the model over Eastern Europe represents a significant geographic generalization.
In line 93, the phrase "order of magnitude" should probably be "orders of magnitude" (or "an order of magnitude").
Fixed
Bias correction of ERA5 seems to be an important step in your algorithm, but it is not obvious to a layperson why it would be necessary or beneficial – if you didn't do it, GAN should learn to do it automatically. Did you include bias correction because you tried it and it improved the results, or is it a well-known fact that bias correction should be done in computations like this? Please provide a sentence of comment on the topic, if you agree that it would be appropriate.
Great suggestion. This is an example of something we took for granted and failed to explain fully. We added references on ERA5's underestimation of wind speed and described why we opted for region-specific bias correction instead of relying on model training to learn this.
Regarding Ukraine Wind Farm Observations (lines 186–197), you don't specify measurement heights for farm B. Are they the same as in A and C?
Good catch. Added this in and included reference to the table with the list of sites and heights.
Can you specify the loss function you use in training? The choice of the loss function seems to be an original contribution of your work (line 205) so it deserves to be described.
Great suggestion. We added the explicit form of the loss function as Equation 1.
Topography injection (line 207) is not explained in enough detail for me to understand it. Do you mean you use elevation above sea level at the location as an input feature? Where do you get the topography data from? Table 2 distinguishes between "training features" and "input": don't you use both in the same way, as inputs to the neural network, and if not, what is the difference? What does "U/V" among the Training Features stand for, and what is your data source for it and for the other Training Features?
Thanks for pointing this out. We added additional explanation on the difference between topography injection and standard model input. We also changed “input” to “source data” to clarify the difference between variables and the sources of those variables. “u/v” was changed to “u/v wind vector components”.
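For readers unfamiliar with the term, the relation between wind speed/direction and the u/v components can be sketched as follows. This is a minimal illustration assuming the standard meteorological convention (u is the eastward component, v is the northward component, and direction is the bearing the wind blows from, measured clockwise from north). The function names are hypothetical and are not taken from sup3r.

```python
import numpy as np


def speed_dir_to_uv(speed, direction_deg):
    """Convert wind speed and meteorological direction to u/v components.

    Assumes direction is the bearing the wind blows FROM, in degrees
    clockwise from north (hypothetical helper, not part of sup3r).
    """
    rad = np.deg2rad(direction_deg)
    u = -speed * np.sin(rad)  # eastward component
    v = -speed * np.cos(rad)  # northward component
    return u, v


def uv_to_speed_dir(u, v):
    """Inverse of speed_dir_to_uv; direction is returned in [0, 360)."""
    speed = np.hypot(u, v)
    direction_deg = np.rad2deg(np.arctan2(-u, -v)) % 360.0
    return speed, direction_deg


# A 10 m/s westerly wind (blowing from 270 degrees) moves air eastward,
# so u is positive and v is approximately zero.
u, v = speed_dir_to_uv(10.0, 270.0)
speed, direction = uv_to_speed_dir(u, v)
```

Working with u/v rather than speed and direction avoids the discontinuity at 0/360 degrees, which is one common reason the components are used as training features.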
In line 357, you probably mean Table 5, not Table 6.
Fixed
Last but not least, I'd comment on the Data and Software Availability. It is excellent that you provide both the data and the software source code (it is also necessary, as the software you developed is the main point of the publication if I understand it correctly). However, I believe it could be improved further by providing more detail. The datasets such as ERA5 provide a lot of data; could you specify which are the exact quantities, variables and versions that you use? It is similar with the software: which version exactly do you use? In the ideal case, the reader would be able to install the software from source, download the data, click "run", and their computer would repeat the work of your computer all the way to the figures. I can't tell if the case presented in the manuscript is available among Examples in the sup3r repository (I'm looking at https://github.com/NREL/sup3r/blob/main/examples/sup3rwind/README.rst), but you should consider offering it (and making clear which example it is). Also make the example as unambiguous as possible. It is cool that you provide the trained model in addition to the training from scratch instructions, but don't leave any choices such as "Download the ERA5 data that you want to downscale from" to the reader. I do realize that having a very precise description of steps needed to reproduce your results would be a relatively tiny step toward better reproducibility, as repeating your computations is a major challenge requiring major computing resources – but it would at least make it easier to start ...
Thanks for these suggestions. We added the specific sup3r version used, references to the full python environment file, references to the configuration files used to run inference, and pointed to the utilities provided by sup3r to help download and pre-process ERA5 data.
Another hypothetical challenge to the enthusiastic reader excited about repeating your work is the identity of Wind Farms A–E. I understand that you cannot provide either the locations or the data, the reader will have to contact the wind farm companies (and prove to them that they are trustworthy) by themselves. But what data should they ask for? How will the wind company know which farms you're talking about in the manuscript? Can you provide some kind of internal IDs of the wind farms that don't contain any sensitive information yet would make it clear to the wind companies which farm is which?
Unfortunately, due to security concerns and the NDA with the data providers, I don't think there is anything more I can add here.
Reviewer 4 Report
Comments and Suggestions for Authors
Review for paper "Super Resolution for Renewable Energy Resource Data With Wind From Reanalysis Data and Application to Ukraine"
The manuscript presents a well-structured and relevant study that proposes an innovative deep learning approach for generating high-resolution wind data using GANs. The application to Ukraine and surrounding regions is timely and impactful, especially given the increasing need for resilient energy infrastructure. However, several concerns and suggestions are outlined below to further improve the quality and clarity of the work.
Strengths
- The paper tackles a critical challenge in renewable energy planning: the generation of high-resolution, long-term wind resource data at reduced computational cost.
- The use of a GAN-based multi-step approach combining spatial and temporal super-resolution is novel and well-executed.
- The methodology is well explained, and the work provides valuable open-access datasets and code.
Major Suggestions for Improvement
- Expand the Related Work Section
- Please provide a more robust discussion of prior work in both meteorological downscaling and deep learning for super-resolution.
- Highlight existing limitations of traditional methods and clearly articulate the gap this study aims to fill.
- Terminology: Use “Performance Measure” Instead of “Metric”
- Replace the term “metric” throughout the manuscript with “performance measure”, which is more precise in this context.
- Comparative Baselines
- Discuss how alternative methods (e.g., U-Net, CNNs, physics-informed models, or ensemble downscaling) could be employed for the same task.
- This will help the reader better understand the relative merits of your proposed GAN-based method.
- Hyperparameter Tuning
- Clarify how hyperparameters were selected. Did the authors use grid search, random search, or expert heuristics?
- Was cross-validation employed for model selection or to estimate generalization performance? If not, please explain why.
- Hypothesis Testing
- Include statistical hypothesis tests to support claims of improvement over ERA5 and WTK, such as:
- Paired t-tests
- Wilcoxon signed-rank tests
- This will strengthen the claim that observed differences are significant.
- Training Data Scope
- Consider expanding the training dataset beyond the current 6-year period, which is relatively limited.
- If expanding is not feasible, please provide a clear justification (e.g., computational constraints or data availability).
Minor Issues
- Figures
- Several figures (e.g., Figures 4–7) appear low in resolution and difficult to interpret in detail.
- Please provide higher resolution figures, especially for scatter plots and distribution comparisons.
- Typographical and Formatting Errors
- The manuscript contains several typographical issues and formatting inconsistencies, particularly in the metadata, section headers, and tables.
- Examples include:
- Placeholder text (“Firstname Lastname” for editor)
- Broken URLs
- Inconsistent use of spacing or line breaks
- A thorough proofreading pass is recommended.
Conclusion
This is an important and promising contribution to the field of renewable energy data generation. By addressing the concerns outlined above, the authors will further enhance the rigor, reproducibility, and impact of their work. I look forward to seeing the revised version.
Author Response
We thank the reviewer for this very thorough review and all the helpful suggestions. Our responses are found below.
Expand the Related Work Section
- Please provide a more robust discussion of prior work in both meteorological downscaling and deep learning for super-resolution.
- Highlight existing limitations of traditional methods and clearly articulate the gap this study aims to fill.
We’ve expanded the discussion on dynamical and statistical downscaling, and on some of the latest work on super-resolution. We added many additional references, discussed the limitations of both dynamical and statistical downscaling, and further described the advantages of our methods over previous approaches. A discussion of the advantages of GANs over standard regression with UNets or CNNs was added. We also further emphasized how our approach of training with ERA5 and WTK, instead of coarsened WTK, improves over previous work.
Terminology: Use “Performance Measure” Instead of “Metric”
- Replace the term “metric” throughout the manuscript with “performance measure”, which is more precise in this context.
Thanks for this suggestion. We used this replacement throughout the paper.
Comparative Baselines
- Discuss how alternative methods (e.g., U-Net, CNNs, physics-informed models, or ensemble downscaling) could be employed for the same task.
- This will help the reader better understand the relative merits of your proposed GAN-based method.
Great suggestion. We added some discussion on the advantages of GANs over standard regression approaches to the previous work section.
Hyperparameter Tuning
- Clarify how hyperparameters were selected. Did the authors use grid search, random search, or expert heuristics?
- Was cross-validation employed for model selection or to estimate generalization performance? If not, please explain why.
Thanks for pointing out that this required further clarification. Limited time, combined with the time needed to train each model, prevented an extensive hyperparameter search and cross-validation, but we did select from a few models based on performance in the CONUS validation regions, which were outside the training domain. We added clarification on this at the beginning of the results section.
Hypothesis Testing
- Include statistical hypothesis tests to support claims of improvement over ERA5 and WTK, such as:
- Paired t-tests
- Wilcoxon signed-rank tests
- This will strengthen the claim that observed differences are significant.
Thanks for this suggestion and the help in strengthening the claims in the paper. We used bootstrapping to estimate p-values for the performance measure differences between Sup3rWind and both ERA5 / WTK. We also added p-values from Wilcoxon signed-rank tests performed on the time series directly.
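The two kinds of test mentioned in this response can be sketched as follows. This is a minimal illustration on synthetic data, not the analysis code used for the paper: the series `obs`, `model_a`, and `model_b`, the choice of mean absolute error as the performance measure, and the helper `bootstrap_pvalue` are all assumptions made for the example. The Wilcoxon test is applied here to paired absolute errors, which is one possible reading of a test "on the time series directly".

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions, not data from the paper).
n = 500
obs = rng.weibull(2.0, n) * 8.0            # "observed" wind speeds
model_a = obs + rng.normal(0.0, 1.0, n)    # hypothetical downscaled series
model_b = obs + rng.normal(0.5, 1.5, n)    # hypothetical baseline series


def mae(pred, truth):
    """Mean absolute error, used here as the performance measure."""
    return float(np.mean(np.abs(pred - truth)))


def bootstrap_pvalue(truth, a, b, n_boot=2000, seed=1):
    """One-sided bootstrap p-value for H0: MAE(a) >= MAE(b).

    Resamples time steps with replacement and counts how often the
    resampled MAE difference (a minus b) is non-negative.
    """
    boot_rng = np.random.default_rng(seed)
    idx = boot_rng.integers(0, len(truth), size=(n_boot, len(truth)))
    err_a = np.abs(a[idx] - truth[idx]).mean(axis=1)
    err_b = np.abs(b[idx] - truth[idx]).mean(axis=1)
    diffs = err_a - err_b
    # The +1 terms keep the estimate strictly above zero.
    return float((np.sum(diffs >= 0) + 1) / (n_boot + 1))


p_boot = bootstrap_pvalue(obs, model_a, model_b)

# Paired Wilcoxon signed-rank test on the absolute errors of each series.
stat, p_wil = wilcoxon(np.abs(model_a - obs), np.abs(model_b - obs))

print(f"MAE a={mae(model_a, obs):.3f}, b={mae(model_b, obs):.3f}")
print(f"bootstrap p={p_boot:.4f}, Wilcoxon p={p_wil:.2e}")
```

Note that resampling individual time steps treats them as independent. Wind time series are typically autocorrelated, so a block bootstrap (resampling contiguous segments) would usually be a more conservative choice for real data.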
Training Data Scope
- Consider expanding the training dataset beyond the current 6-year period, which is relatively limited.
- If expanding is not feasible, please provide a clear justification (e.g., computational constraints or data availability).
Unfortunately, WTK is only available for 2007-2013. We added clarification of this.
Figures
- Several figures (e.g., Figures 4–7) appear low in resolution and difficult to interpret in detail.
- Please provide higher resolution figures, especially for scatter plots and distribution comparisons.
All figures were redone at a higher DPI.
Typographical and Formatting Errors
- The manuscript contains several typographical issues and formatting inconsistencies, particularly in the metadata, section headers, and tables.
- Examples include:
- Placeholder text (“Firstname Lastname” for editor)
- Broken URLs
- Inconsistent use of spacing or line breaks
- A thorough proofreading pass is recommended.
Thanks for the attention to detail here. Some of this (the placeholder text) will be changed after final acceptance. Otherwise, we have fixed the broken links and inconsistent formatting.
Round 2
Reviewer 4 Report
Comments and Suggestions for Authors
My comments were addressed.
The paper can be accepted for publication in its current form.