Article
Peer-Review Record

Geographically Weighted Machine Learning and Downscaling for High-Resolution Spatiotemporal Estimations of Wind Speed

Remote Sens. 2019, 11(11), 1378; https://doi.org/10.3390/rs11111378
by Lianfa Li 1,2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 27 March 2019 / Revised: 1 June 2019 / Accepted: 7 June 2019 / Published: 10 June 2019
(This article belongs to the Section Atmospheric Remote Sensing)

Round 1

Reviewer 1 Report

The manuscript presents a case study of wind speed prediction for mainland China using the geographically weighted regression.

In general, the paper is interesting, but needs changes. The quality of the manuscript presentation should be improved in accordance with Remote Sensing template.

What is the usefulness of the obtained data?

You should also specify the height above the ground at which you have estimated the wind speed in your study (Fig.8, 11 and 12). For example, recovery of the wind potential for energy purposes involves estimating the wind speed at heights between 50-100 m above the ground.

Additional observations:

The abstract should contain no more than 200 words.

The equations are not clearly edited: especially (1), (3), (4), (5), (8).

Does the paper have multiple authors? You should make the appropriate corrections. Lines: 16 (we develop); 28 (we show); 18, 31, 89, 93, 95, 96, 485, 489, 542, 555, 561, 563 (our approach); 81, 497, 554 (we developed); 98 (We conducted); 100 (we demonstrated); 81 (we developed); 119, 354 (We collected); 140, 144, 149, 202, 287, 308, 335 (we used); 159 (we employed); 162, 372 (we obtained); 193 (we selected); 202, 277 (our case study); 209 (we added); 214 (we have); 271, 483 (we propose); 283 (we can obtain); 293 (we can also test); 320 (we adjusted); 329 (We also published); 404 (we predicted); 408 (we also showed); 495 (our case); 503 (we leveraged); 527 (Our example); 537 (our downscaling); 543, 551 (we introduced); 547 (we did not embed); 549 (we also used), 581 (The authors declare …).


Author Response

Response to Reviewer 1

 

The manuscript presents a case study of wind speed prediction for mainland China using the geographically weighted regression.

In general, the paper is interesting, but needs changes. The quality of the manuscript presentation should be improved in accordance with Remote Sensing template.

Response 1.1: Thanks for the constructive comments and suggestions. This manuscript was revised accordingly to improve its presentation quality.

 

What is the usefulness of the obtained data?

You should also specify the height above the ground at which you have estimated the wind speed in your study (Fig.8, 11 and 12). For example, recovery of the wind potential for energy purposes involves estimating the wind speed at heights between 50-100 m above the ground.

Response 1.2: Thanks for the informative comment. The ground monitoring station measurement data originated from the China Meteorological Data Service Center (http://data.cma.cn), and the wind data were primarily gathered at a height of 10-12 m above the ground. This height was added to the three maps (Figures 9, 12 and 13 in the revision). Since the recovery of wind potential involves estimating the wind speed at heights of 50-100 m above the ground, the predicted wind speed surfaces cannot be directly applied to recover the wind potential for energy purposes. This paper focuses on machine learning methods for high-spatiotemporal-resolution mapping of wind speed rather than on practical applications such as wind potential recovery. For the latter, new wind speed measurement data may be gathered to retrain the models for an appropriate evaluation of wind potential. This limitation was recognized and discussed in the revision (Lines 631-638).

 

Additional observations:

The abstract should contain no more than 200 words.

Response 1.3: Thanks. The online instructions for authors state that the abstract should be written as one paragraph of approximately 300 words (https://www.mdpi.com/journal/remotesensing/instructions#front). An abstract of fewer than 200 words might not sufficiently summarize this paper; thus, without violating the length requirement, the abstract was shortened to 258 words.

 

The equations are not clearly edited: especially (1), (3), (4), (5), (8).

Response 1.4: I apologize for this oversight. The equations were clarified in the revision.    

 

Does the paper have multiple authors? You should make the appropriate corrections. Lines: 16 (we develop); 28 (we show); 18, 31, 89, 93, 95, 96, 485, 489, 542, 555, 561, 563 (our approach); 81, 497, 554 (we developed); 98 (We conducted); 100 (we demonstrated); 81 (we developed); 119, 354 (We collected); 140, 144, 149, 202, 287, 308, 335 (we used); 159 (we employed); 162, 372 (we obtained); 193 (we selected); 202, 277 (our case study); 209 (we added); 214 (we have); 271, 483 (we propose); 283 (we can obtain); 293 (we can also test); 320 (we adjusted); 329 (We also published); 404 (we predicted); 408 (we also showed); 495 (our case); 503 (we leveraged); 527 (Our example); 537 (our downscaling); 543, 551 (we introduced); 547 (we did not embed); 549 (we also used), 581 (The authors declare …).

 

Response 1.5: Thanks for this comment. The relevant descriptions have been revised throughout the manuscript to correctly reflect its authorship. 


Author Response File: Author Response.docx

Reviewer 2 Report

In order to modify the paper into a more appropriate form for publication, the following comments should be addressed:


1) There is no clear justification of why the proposed model was used. Has the author tested other machine learning algorithms, such as feed-forward neural networks, neuro-fuzzy systems, etc.?


2) Can the clustering tool be applied in the specific problem so as to cluster data of similar patterns?


3) Evaluate the results with additional metrics, such as Mean Absolute Error and Mean Absolute Percentage Error.


4) Provide a discussion, in the form of steps or a flow chart, on how the proposed analysis can be adopted and applied by other researchers to other geographical regions.

Author Response

Response to Reviewer 2

In order to modify the paper into a more appropriate form for publication, the following comments should be addressed:

1) There is no clear justification of why the proposed model was used. Has the author tested other machine learning algorithms, such as feed-forward neural networks, neuro-fuzzy systems, etc.?

Response 2.1: Thanks. There are several reasons for use of the proposed model:

1) Compared with the feed-forward neural network, the proposed model (an autoencoder-based deep residual network) has better learning efficiency (convergence) and accuracy owing to the introduction of residual connections, which are known to address vanishing/exploding gradients and accuracy degradation [1,2]. The proposed approach leverages the internal structure of the autoencoder to implement residual connections from the encoding to the decoding layers, as demonstrated in tests on several public datasets, including the UCI machine learning repository, MAIAC AOD imputation and PM2.5 prediction [3]. In this revision, the learning curves and accuracy of the deep residual network were also compared with those of the feed-forward neural network (Table 2 and Figure 6); a minimal code sketch of the architecture is given after Figure 6. For a fair comparison, except for the residual connections, the feed-forward neural network has a network structure and number of parameters (100,959) similar to those of the deep residual network. The results showed better convergence and accuracy for the proposed deep residual network: the residual connections did not increase the number of parameters but considerably improved learning efficiency and generalizability.

Table 2. Performances of the individual models (learners).

| Base model | Training R2 | Training adjusted R2 | Training RMSE (a) | Training RAE (b) | Test R2 | Test adjusted R2 | Test RMSE | Test RAE |
|---|---|---|---|---|---|---|---|---|
| ARN (c) | 0.68 | 0.68 | 0.76 | 0.49 | 0.66 | 0.66 | 0.72 | 0.51 |
| XGBoost | 0.76 | 0.76 | 0.60 | 0.46 | 0.67 | 0.67 | 0.71 | 0.51 |
| RF (d) | 0.69 | 0.69 | 0.76 | 0.49 | 0.63 | 0.63 | 0.77 | 0.53 |
| GAM (e) | 0.43 | 0.43 | 0.95 | 0.67 | 0.42 | 0.42 | 0.96 | 0.67 |
| FFNN (f) | 0.58 | 0.58 | 0.83 | 0.57 | 0.58 | 0.58 | 0.82 | 0.57 |

Note: (a) RMSE: root mean square error; (b) RAE: mean absolute error; (c) ARN: autoencoder-based deep residual network; (d) RF: random forest; (e) GAM: generalized additive model; (f) FFNN: feed-forward neural network. "Training" and "Test" refer to the training samples and the independent test samples, respectively.

 

Figure 6. Learning curves (a: validation loss; b: validation R2) of the deep residual network vs. the feed-forward neural network.
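As mentioned above, the following is a minimal sketch of an autoencoder-like regression network with residual (shortcut) connections from the encoder to the mirrored decoder layers. It is written in PyTorch for illustration; the layer widths and framework choice are assumptions and do not reproduce the paper's resautonet implementation.

```python
import torch
import torch.nn as nn

class AutoencoderResidualNet(nn.Module):
    """Autoencoder-style regressor with encoder-to-decoder residual connections."""
    def __init__(self, n_features, hidden=(128, 64, 32)):
        super().__init__()
        # Encoder: n_features -> 128 -> 64 -> 32 (illustrative widths)
        self.enc1 = nn.Linear(n_features, hidden[0])
        self.enc2 = nn.Linear(hidden[0], hidden[1])
        self.enc3 = nn.Linear(hidden[1], hidden[2])
        # Decoder mirrors the encoder: 32 -> 64 -> 128
        self.dec2 = nn.Linear(hidden[2], hidden[1])
        self.dec1 = nn.Linear(hidden[1], hidden[0])
        self.out = nn.Linear(hidden[0], 1)   # regression output (e.g., wind speed)
        self.act = nn.ReLU()

    def forward(self, x):
        e1 = self.act(self.enc1(x))
        e2 = self.act(self.enc2(e1))
        e3 = self.act(self.enc3(e2))
        # Residual connections: encoder activations are added to the decoder
        # layers of matching width, improving gradient flow without adding
        # any parameters.
        d2 = self.act(self.dec2(e3) + e2)
        d1 = self.act(self.dec1(d2) + e1)
        return self.out(d1).squeeze(-1)

# Usage sketch: predictions for a batch of 8 samples with 20 covariates
model = AutoencoderResidualNet(n_features=20)
y_hat = model(torch.randn(8, 20))
```

The same network without the two `+ e2` / `+ e1` additions gives a plain feed-forward counterpart with an identical parameter count, mirroring the comparison reported in Table 2 and Figure 6.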

2) Neuro-fuzzy systems are based on fuzzy logic and artificial neural networks. Their main drawbacks are the slow convergence and convergence to local minima [4,5] inherited from the feed-forward neural network. The case dataset of this paper has a large number of samples (255,209), and the test showed slow convergence and low accuracy for the neuro-fuzzy system. Therefore, the fuzzy neural network was not used, and the implication was discussed (Lines 96-98, 562-565).

3) Compared with XGBoost, the deep residual network has similar generalization but better spatial continuity, as demonstrated in the paper's tests. Thus, XGBoost was used as one of the unrelated models in Stage 1 for the ensemble predictions, whereas the deep residual network was used in Stage 2 to avoid spatial discontinuity.

4) The dataset was also tested using other methods including GAM and support vector machine (SVM). For GAM, the proposed method has much better accuracy (Table 2); for SVM, the large size of the dataset made learning difficult, even when using parallel SVM (parallelSVM).     

In summary, the use of the proposed approach was justified (Lines 92-101), the results were presented in Table 2, Figure 6 and Lines 428-432, and the implications were discussed (Lines 560-565): "In comparison with GAM and the feed-forward neural network, the test showed that the three base learners achieved much better generalization and more efficient learning. Compared with SVM and the fuzzy neural system, which presented very slow convergence in the test using the wind speed samples, the three learners were convenient to use, with high generalization and fewer feature engineering operations."

References

1. He, K., Sun, J., 2015. Convolutional neural networks at constrained time cost. In: Proceedings of CVPR 2015.

2. He, K.M., Zhang, X.Y., Ren, S.Q., Sun, J., 2016. Deep residual learning for image recognition. In: Proceedings of CVPR 2016, 770-778.

3. Li, L., Fang, Y., Wu, J., Wang, J., 2018. Autoencoder based residual deep networks for robust regression prediction and spatiotemporal estimation. arXiv:1812.11262 (https://arxiv.org/abs/1812.11262).

4. Rovithakis, A.G., Christodoulou, A.M., 1994. Adaptive control of unknown plants using dynamical neural networks. IEEE Trans. Syst. Man Cybern. 24, 400-412.

5. Yu, W., 2018. PID Control with Intelligent Compensation for Exoskeleton Robots. Elsevier Inc. ISBN: 978-0-12-813380-4.

2) Can the clustering tool be applied in the specific problem so as to cluster data of similar patterns?

Response 2.2: Currently, this paper does not use a clustering method. Clustering could be used to split the samples into training and testing sets or within a fuzzy neural network; however, this task is beyond the scope of this paper.

3) Evaluate the results with additional metrics, such as Mean Absolute Error and Mean Absolute Percentage Error.

Response 2.3: Thanks. In the revision, the mean absolute error (MAE) was added (Table 2). For the mean absolute percentage error, the zero values in the observed wind speed would cause division by zero (an overflow exception); thus, no value was reported. The relevant description was changed accordingly in the revision.
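To illustrate the point with a toy example (invented numbers, not study data): MAE stays finite when the observations include calm (zero wind speed) records, whereas MAPE does not.

```python
import numpy as np

obs = np.array([0.0, 1.2, 3.5, 0.0, 2.1])    # observed wind speed (m/s), incl. calm records
pred = np.array([0.3, 1.0, 3.9, 0.2, 2.4])   # model predictions (m/s)

mae = np.mean(np.abs(obs - pred))            # mean absolute error: well defined
with np.errstate(divide="ignore", invalid="ignore"):
    mape = np.mean(np.abs((obs - pred) / obs)) * 100  # division by zero -> inf

print(f"MAE  = {mae:.3f} m/s")
print(f"MAPE = {mape}")                      # not reportable when obs contains zeros
```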

4) Provide a discussion, in the form of steps or a flow chart, on how the proposed analysis can be adopted and applied by other researchers to other geographical regions.

Response 2.4: Thanks for this good suggestion. In the revision, a flow chart was added (Figure 2) and the generalizability of the proposed method to other variables and regions was discussed (Lines 612-620): "For application to other meteorological or surface variables and other regions, the proposed approach is divided into two stages (Figure 2). Stage 1 aims to train the three base learners (deep residual network, XGBoost and random forest) and GWR using X and y from the training samples. Stage 2 involves inference (prediction) and downscaling using the new and reanalysis datasets to obtain reliable high-resolution grid predictions. The new dataset first supplies X to the base learners and GWR trained in Stage 1 to obtain the initial finely resolved predictions. Then, the coarse-resolution reanalysis data are used to adjust these predictions (or those inferred by the downscaling model in each iteration), and X together with the adjusted predictions is used to train or retrain the downscaling model. This process is repeated until the preselected SCV is attained, as shown in Figure 5." A schematic code sketch of this two-stage flow is given after the caption of Figure 2 below.

Figure 2.  Systematic framework: training of base models and GWR (Stage 1), and inference and downscaling (Stage 2).
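The following is a schematic sketch of the two-stage flow described in Response 2.4, with deliberately simple stand-ins: two scikit-learn regressors replace the paper's base learners, a plain average replaces the GWR fusion, and matching of coarse-cell means replaces the reanalysis-based adjustment and stopping criterion. All names and data are illustrative assumptions, not the paper's code.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

rng = np.random.default_rng(0)

# --- Stage 1: train base learners on the station samples (X, y) -------------
X = rng.normal(size=(500, 5))                       # covariates at stations
y = 1.5 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.3, size=500)
base_learners = [RandomForestRegressor(n_estimators=100, random_state=0),
                 GradientBoostingRegressor(random_state=0)]
for m in base_learners:
    m.fit(X, y)

# --- Stage 2: inference and iterative downscaling ---------------------------
X_grid = rng.normal(size=(400, 5))                  # covariates on the fine grid
y_fine = np.mean([m.predict(X_grid) for m in base_learners], axis=0)  # fused prediction

cell = np.repeat(np.arange(40), 10)                 # each coarse cell covers 10 fine pixels
coarse_reanalysis = np.array([y_fine[cell == c].mean() for c in range(40)]) \
                    + rng.normal(scale=0.1, size=40)  # stand-in coarse reanalysis field

downscaler = GradientBoostingRegressor(random_state=0)
for _ in range(3):                                  # iterate until the stopping criterion is met
    coarse_pred = np.array([y_fine[cell == c].mean() for c in range(40)])
    y_adj = y_fine + (coarse_reanalysis - coarse_pred)[cell]  # adjust toward the coarse data
    downscaler.fit(X_grid, y_adj)                   # (re)train the downscaling model
    y_fine = downscaler.predict(X_grid)             # updated fine-resolution predictions
```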


Author Response File: Author Response.docx

Reviewer 3 Report

The manuscript by L. Li proposes, in a first step, a machine learning algorithm (MLA) ensemble combined by GWR and, in a second step, downscaling with a deep residual network. This is applied to estimate spatio-temporal wind speed maps based on meteorological monitoring network measurements combined with reanalysis data in mainland China.

 

This paper is well structured and presents original and innovative ideas, which are very interesting. However, it needs to be deepened. In my opinion, the main problems are the following:

1) The author specifies that the MLAs used at stage 1 do not take into account the spatial correlation, which is recovered with GWR. In my opinion, this only reflects the residual spatial correlation in the MLA prediction. The author should strongly motivate this argument.

2) The testing procedure is unclear and not sufficiently described: the testing points cannot be chosen randomly due to the spatio-temporal nature of the monitoring data. In particular, the testing monitoring stations should be set apart from the very beginning of the overall procedure and then used at the very end to evaluate the procedure at each time. This avoids overestimation of the spatial results due to temporal correlation.

3) Globally, the machine learning methodology should be more precise and formal.

 

Moreover, reference [29], on which a part of the method is based, is from arXiv (which is not a problem). I don't know whether the method has already been validated by the machine learning community (peer-reviewed journal).

 

For all these reasons, the paper could not be accepted in this form. However, it contains clever ideas, such as using GWR to do spatial ensemble prediction by fusing MLAs, and I encourage the author to persevere.

 

Also, I have the following comments:

 

1)      The first sentence of the abstract is almost the same as the first sentence of the introduction. This kind of internal copy/paste should be avoided.

2) The author says, "finely resolved variability of wind speed also cannot be well captured (low accuracy) by traditional approaches, including multiple linear regression, nonlinear regression and spatial interpolation such as inverse distance weighting (IDW) or kriging" (lines 48-51). Following this remark, it would be interesting to explain why these techniques don't (shouldn't?) work with small-scale variability.

3) Can the author explain what an "optimal learner" is? Optimal with respect to what? (lines 83, 202, etc.). The same remark applies to line 311, "optimal performance in capturing nonlinear associations…".

4) When introducing a machine learning algorithm, a reference to an academic paper must appear, not only a web link to a Python package, e.g., line 254. In this particular case, the author can cite, e.g., L. Breiman for random forest. The library used can then be added as a comment at the end of the paragraph. The same remark applies to XGBoost and GWR.

5) Section 3.3: I suppose the author means hyperparameters and not superparameters…?

6) Mathematical notations should be clarified, e.g.:

a. At the beginning of Section 3.1.1, "Assume m models with similar errors, with E[epsilon_i] = nu, ...": does the index i run from 1 to m, or over the data? If the errors are similar, why is there an index? By "similar", does the author mean the same, or the same up to a scalar multiplication, which is the meaning of "similar" to me?

b. The LHS of Eq. (1) should contain parentheses to clarify the square.

c. The Greek letter nu (lines 181-183) and the Latin letter v (lines 186-190) are used to describe the same quantity and should be unified.

7) Which "test" does the author speak about in line 364?

8) Section 4.2 is rather unclear and shows a lack of terminology. This section should be rewritten with machine learning terminology (training/testing error). Moreover, it is not specified whether Figure 7 shows the accuracy plots of the training or the testing points.


Author Response

Response to Reviewer 3

The manuscript by L. Li proposes, in a first step, a machine learning algorithm (MLA) ensemble combined by GWR and, in a second step, downscaling with a deep residual network. This is applied to estimate spatio-temporal wind speed maps based on meteorological monitoring network measurements combined with reanalysis data in mainland China.

 

This paper is well structured and presents original and innovative ideas, which are very interesting. However, it needs to be deepened. In my opinion, the main problems are the following:

Response 3.1: Thanks for the constructive comments. The manuscript was revised accordingly. I hope this revision can address these concerns.

 

1) The author specifies that the MLAs used at stage 1 do not take into account the spatial correlation, which is recovered with GWR. In my opinion, this only reflects the residual spatial correlation in the MLA prediction. The author should strongly motivate this argument.

Response 3.2: Thanks for this comment. For machine learning algorithms such as the deep residual network and XGBoost, spatial correlation cannot be directly embedded into the models. However, the coordinates and their derivatives (their squares and the product of latitude and longitude) were added as proxy variables to partially capture spatial correlation (a small sketch is given below), and GWR was further used to capture spatial autocorrelation and heterogeneity. Although the kriging method can be used to capture spatial autocorrelation, it involves variogram fitting, which is considerably limited by the sparse distribution of monitoring stations and the volatile nature of wind speed; the kriging test also showed low performance compared with the proposed approach. The advantages and justification of the three base learners were introduced (Lines 92-101) and discussed (Lines 428-432, 560-565, 607-611) in comparison with other methods, including kriging.
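For concreteness, a tiny pandas sketch of the coordinate-derived proxy variables mentioned above; the station values and column names are made up for illustration.

```python
import pandas as pd

# Toy station covariate table (longitude, latitude in degrees; elevation in m)
df = pd.DataFrame({"lon": [116.4, 121.5, 113.3],
                   "lat": [39.9, 31.2, 23.1],
                   "elev": [44.0, 4.0, 21.0]})

# Coordinate derivatives used as proxy variables for spatial trends
df["lat2"] = df["lat"] ** 2
df["lon2"] = df["lon"] ** 2
df["latxlon"] = df["lat"] * df["lon"]
print(df)
```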

       In the leave-one-site-out cross-validation for GWR, Moran's I was calculated for the residuals of the predictions for every day, and either no spatial correlation [p-value >= 0.05, indicating that the null hypothesis of complete spatial randomness (CSR) cannot be rejected] or a small Moran's I (mean: 0.06; range: 0.001 to 0.15) was found, indicating very weak spatial correlation. This demonstrates that most of the spatial autocorrelation of the residuals was accounted for by the proposed approach. The results were presented (Lines 456-459) and discussed (Lines 577-579) in the revision.
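A minimal numpy sketch of the global Moran's I computation for one day's residuals is given below. The inverse-distance, row-standardised weights are an illustrative assumption (the response does not state the weight specification), and the coordinates and residuals are simulated; a permutation test would supply the p-value mentioned above.

```python
import numpy as np

rng = np.random.default_rng(1)
coords = rng.uniform(0, 100, size=(50, 2))      # station locations (arbitrary units)
resid = rng.normal(size=50)                     # one day's prediction residuals

# Inverse-distance weight matrix with a zero diagonal, row-standardised
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
with np.errstate(divide="ignore"):
    w = np.where(d > 0, 1.0 / d, 0.0)
w /= w.sum(axis=1, keepdims=True)

z = resid - resid.mean()
n = len(resid)
morans_i = (n / w.sum()) * (z @ w @ z) / (z @ z)
print(f"Moran's I = {morans_i:.3f}")            # values near 0 indicate little spatial autocorrelation
```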

 

2) The testing procedure is unclear and not sufficiently described: the testing points cannot be chosen randomly due to the spatio-temporal nature of the monitoring data. In particular, the testing monitoring stations should be set apart from the very beginning of the overall procedure and then used at the very end to evaluate the procedure at each time. This avoids overestimation of the spatial results due to temporal correlation.

Response 3.3: Thanks for this comment. The sampling procedure was double-checked: the testing points were not randomly selected but were sampled according to region (Figure 1) and month to ensure an even distribution of samples across space and time. This stratified sampling approach is imperfect but mitigates the overestimation of the spatial results pointed out in the comment. I apologize for this oversight. The procedure has been clarified in the revision (Lines 384-387).
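A small pandas sketch of this kind of region-and-month stratified hold-out is shown below; the column names and the 15% hold-out fraction are assumptions for illustration, not the paper's exact settings.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
samples = pd.DataFrame({
    "region": rng.choice(["NE", "NW", "SE", "SW"], size=1000),
    "month":  rng.integers(1, 13, size=1000),
    "wind":   rng.gamma(2.0, 1.5, size=1000),   # toy wind speed observations (m/s)
})

# Hold out ~15% of the samples within every (region, month) stratum so that the
# test set is spread evenly across space and time.
test = samples.groupby(["region", "month"]).sample(frac=0.15, random_state=0)
train = samples.drop(test.index)
print(len(train), len(test))
```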

 

3) Globally, the machine learning methodology should be more precise and formal.

Response 3.4: I apologize for the issues with the formulas, which are due to conversion from Word to PDF. In the revision, the formulas have been adjusted to make them clearer and more formal.

 

Moreover, reference [29], on which a part of the method is based, is from arXiv (which is not a problem). I don't know whether the method has already been validated by the machine learning community (peer-reviewed journal).

Response 3.5: There were extensive tests and experiments using the open-source package (the R and Python package resautonet) and publicly available datasets [i.e., the UCI machine learning repository and the online test dataset (https://github.com/lspatial/resautonet)] for the method proposed in [29]; however, the formal publication of this reference has been delayed. To support this paper, extensive tests were conducted to compare the proposed method with other methods, including GAM and the feed-forward neural network (Table 2 and Figure 6). Support vector machine and the fuzzy neural network were also tested. Given the large sample size (255,209), the tests showed very slow convergence for the fuzzy neural network [1,2] and the support vector machine [3], due to the drawbacks of gradient descent optimization for feed-forward neural networks and the sample size limitations of SVM (i.e., it is difficult to scale SVM to a large sample size). Compared with these methods, the proposed approach is easier to implement, requires fewer feature engineering operations, and achieves higher learning efficiency and better generalizability. The tests were summarized (Lines 428-432) and the proposed approach was justified (Lines 92-101, 560-565) in this revision. Please also refer to Response 2.1 to Reviewer 2 for additional information.

 

References

1. Rovithakis, A.G., Christodoulou, A.M., 1994. Adaptive control of unknown plants using dynamical neural networks. IEEE Trans. Syst. Man Cybern. 24, 400-412.

2. Yu, W., 2018. PID Control with Intelligent Compensation for Exoskeleton Robots. Elsevier Inc. ISBN: 978-0-12-813380-4.

3. scikit-learn, 2018. The sklearn package documentation: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

For all these reasons, the paper could not be accepted in this form. However, it contains clever ideas, such as using GWR to do spatial ensemble prediction by fusing MLAs, and I encourage the author to persevere.

Response 3.6: Thanks for this comment. I hope that this revision addressed these concerns.  

 

Also, I have the following comments:

1) The first sentence of the abstract is almost the same as the first sentence of the introduction. This kind of internal copy/paste should be avoided.

Response 3.7: I apologize for this issue. The first sentences of the abstract and the introduction have been changed so that they differ.

 

2) The author says, "finely resolved variability of wind speed also cannot be well captured (low accuracy) by traditional approaches, including multiple linear regression, nonlinear regression and spatial interpolation such as inverse distance weighting (IDW) or kriging" (lines 48-51). Following this remark, it would be interesting to explain why these techniques don't (shouldn't?) work with small-scale variability.

Response 3.8: This statement was made based on the tests with the dataset and the comparison of these methods (see Table 2, Lines 428-430 and 607-611, although not all results are reported). There may be two reasons for this phenomenon: "the sparse spatial distribution of the wind speed monitoring stations, and limited generalization of these methods, compared with advanced machine learning methods such as XGBoost and deep learning" (Lines 55-57). Please also see Response 2.1 for additional information.

 

3) Can the author explain what an "optimal learner" is? Optimal with respect to what? (lines 83, 202, etc.). The same remark applies to line 311, "optimal performance in capturing nonlinear associations…".

Response 3.9: Here, "optimal learner" refers to good performance (generalization), which is reflected by the test accuracy (high R2 or low RMSE). However, some uses of "optimal" (lines 83, 202, etc.) were misleading. I apologize for the confusion. These issues were corrected to make the descriptions clearer; see the version of the manuscript with tracked changes for details.

 

4) When introducing a machine learning algorithm, a reference to an academic paper must appear, not only a web link to a Python package, e.g., line 254. In this particular case, the author can cite, e.g., L. Breiman for random forest. The library used can then be added as a comment at the end of the paragraph. The same remark applies to XGBoost and GWR.

Response 3.10: Thanks for this good advice. The corresponding changes have been made for random forest, XGBoost and GWR in this revision.  

 

5) Section 3.3: I suppose the author means hyperparameters and not superparameters…?

Response 3.11: Thanks for the correction. I have made the corresponding changes. 

 

6) Mathematical notations should be clarified, e.g.:

a. At the beginning of Section 3.1.1, "Assume m models with similar errors, with E[epsilon_i] = nu, ...": does the index i run from 1 to m, or over the data? If the errors are similar, why is there an index? By "similar", does the author mean the same, or the same up to a scalar multiplication, which is the meaning of "similar" to me?

Response 3.12: Thanks for this comment. Here, "similar" was misleading; thus, it has been removed in this revision. The index i denotes the model index (running from 1 to m). The errors of the models are assumed to be drawn from a zero-mean multivariate normal distribution with a common error variance and pairwise covariances. Corresponding changes (Lines 208-220) were made to facilitate understanding of the argument that "models with no correlations or weak correlations can theoretically generate better ensemble predictions with less error".
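For completeness, a short derivation of the textbook ensemble-error identity underlying this argument is sketched below; the notation (v for the common error variance, c for the pairwise covariance) may differ slightly from the manuscript's Eq. (1).

```latex
% Assume m models with zero-mean errors \epsilon_i, E[\epsilon_i^2] = v and
% E[\epsilon_i \epsilon_j] = c for i \neq j. For the simple ensemble average:
\begin{align}
  E\!\left[\left(\frac{1}{m}\sum_{i=1}^{m}\epsilon_i\right)^{2}\right]
  &= \frac{1}{m^{2}}\left(\sum_{i=1}^{m}E[\epsilon_i^{2}]
     + \sum_{i\neq j}E[\epsilon_i\epsilon_j]\right)
   = \frac{v}{m} + \frac{m-1}{m}\,c .
\end{align}
% With uncorrelated errors (c = 0) the expected squared ensemble error shrinks to v/m,
% which is why weakly correlated base learners yield better ensemble predictions.
```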

 

b. The LHS of Eq. (1) should contain parentheses to clarify the square.

Response 3.13: I apologize for the missing parenthesis. The conversion from Word to PDF caused this issue. The original formula in MathType was replaced with an image to avoid this issue.    

 

c. The Greek letter nu (lines 181-183) and the Latin letter v (lines 186-190) are used to describe the same quantity and should be unified.

Response 3.14: I apologize for this oversight. I have changed v to ν (Lines 213-223) as suggested.   

 

7) Which "test" does the author speak about in line 364?

Response 3.15: The test referred to is the calculation of skewness. Because the wording may have been misleading, I have revised this sentence as follows: “The result (…) showed less skewness” (Line 414).  

 

8) Section 4.2 is rather unclear and shows a lack of terminology. This section should be rewritten with machine learning terminology (training/testing error). Moreover, it is not specified whether Figure 7 shows the accuracy plots of the training or the testing points.

Response 3.16: Thanks for this constructive comment. Machine learning terminology has been adopted in Section 4.2 and the other parts. Figure 8 (Figure 7 in the original manuscript) shows the test results, and the caption has been adjusted to make this explicit.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

The authors dealt with the comments. 

Author Response

Thanks for the constructive comments.

Reviewer 3 Report

Thank you for your replies and corrections.

I still have one question regarding your Response 3.2: Is it possible to see some spatial variograms of the spatial residuals, drawn randomly in time?

Author Response

Thank you for your replies and corrections.

I still have one question regarding your Response 3.2: Is it possible to see some spatial variograms of the spatial residuals, drawn randomly in time?

 

Response: Thanks. For the variogram, an exponential model was selected through sensitivity analysis and fitted in ArcGIS (version 10.2) to each day's residuals of the 2015 ensemble predictions by GWR. The resultant variogram models were then used in universal kriging to estimate the corresponding day's residuals (a small sketch of the exponential variogram model fitting is given at the end of this response).

       The results for four typical days (spring: 04/01/2015; summer: 07/01/2015; autumn: 10/01/2015; winter: 12/20/2015) are presented in Supplementary Materials Table S1, which lists the optimal parameters (nugget, range and partial sill) of the exponential variogram models and the LOOCV R2 and RMSE between the original residuals and the residuals estimated by universal kriging, and in Supplementary Materials Figures S3 and S4, which show the variogram plots and the scatter plots of the original vs. estimated residuals. The very small (negative) LOOCV R2 values and the almost random patterns of the scatter plots show that variogram-based universal kriging contributed little to the estimation of the residuals. Together with the very small Moran's I values, this illustrates that most of the spatial autocorrelation was actually accounted for by the proposed approach.

Table S1. Variogram fitting and cross-validation for the estimation of the residuals of the predicted wind speed for four typical seasonal days.

| Date | Nugget (m/s) | Range (km) | Partial sill (m/s) | LOOCV (a) R2 | LOOCV RMSE (m/s) |
|---|---|---|---|---|---|
| 04/01/2015 | 0.14 | 54 | 1.05 | -0.24 | 0.72 |
| 07/01/2015 | 0.21 | 273 | 0.00 | -0.06 | 0.55 |
| 10/01/2015 | 0.56 | 199 | 0.23 | -0.25 | 0.72 |
| 12/20/2015 | 0.24 | 582 | 0.05 | -0.03 | 0.54 |

Note: the nugget, range and partial sill are the parameters of the exponential variogram models; (a) LOOCV: leave-one-site-out cross-validation for universal kriging.

       The relevant introduction was added (Lines 371-375), the results were described (Lines 434-442) and the implications were discussed (Lines 558-561) in the revision. I hope this revision can address the reviewer's concern.
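As referenced above, the following is a brief scipy sketch of fitting the exponential variogram model, gamma(h) = nugget + psill * (1 - exp(-h / range)), to empirical semivariances. The lag distances and semivariance values are invented for illustration, and software packages (including ArcGIS) may parameterize the range differently.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_variogram(h, nugget, psill, rng_km):
    """Exponential variogram model: nugget + partial sill * (1 - exp(-h / range))."""
    return nugget + psill * (1.0 - np.exp(-h / rng_km))

lags = np.array([25, 50, 100, 200, 400, 800], dtype=float)   # lag distance (km)
gamma_emp = np.array([0.30, 0.55, 0.80, 0.95, 1.05, 1.10])   # empirical semivariance

popt, _ = curve_fit(exp_variogram, lags, gamma_emp,
                    p0=[0.1, 1.0, 100.0], bounds=(0, np.inf))
nugget, psill, rng_km = popt
print(f"nugget = {nugget:.2f}, partial sill = {psill:.2f}, range = {rng_km:.0f} km")
```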


Author Response File: Author Response.docx
