Peer-Review Record

Modelling and Predicting the Growing Stock Volume in Small-Scale Plantation Forests of Tanzania Using Multi-Sensor Image Synergy

Forests 2019, 10(3), 279; https://doi.org/10.3390/f10030279
by Ernest William Mauya 1,2,*, Joni Koskinen 1, Katri Tegel 3, Jarno Hämäläinen 3, Tuomo Kauranne 3 and Niina Käyhkö 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 29 January 2019 / Revised: 18 March 2019 / Accepted: 19 March 2019 / Published: 21 March 2019

Round 1

Reviewer 1 Report

General comments

English language review is needed to clarify sentence structure and help with word selection. The writing is understandable but needs improvement.

You don’t mention any accuracy information for plot locations. Given the small plot size and the use of handheld GPS receivers, you could easily have position errors larger than your plot radius. How might these errors affect your models and overall results? You cover this in your discussion, so there is no need to address this comment.

Did you evaluate interaction terms for your models or possible transformations to the response variable? While not always applicable, these can result in improved models.

Specific comments (referenced by line numbers)

Line 49: What is meant by volume?

Figure 3: You need to do something to make the numbers on the images more readable. You could simply move the numbers to the left of each image (in the white space) or put the numbers on a white square within the image (black text on a white background).

Lines 168-169: Figure 2 caption: It is unclear why you define DBH in the caption but none of the other acronyms used in the schematic (e.g. VV, VH, HH, HV, PCA)

Lines 178-185: It is unclear how you go from 3 sample blocks with 20 stands in each and 3 plots per stand to 77 plots. Were stands smaller than 0.5 ha excluded prior to sampling, or were the smaller stands dropped from the 20 chosen for the block?

Line 184: How were plot locations determined? You mention using handheld GPS receivers to navigate to the plots but don’t mention if you obtained more accurate locations for the actual plot center.

Lines 187-188: Was there a minimum DBH or were seedlings and small saplings measured?

Line 218: PCA should be PC. How many components were used and what was the criteria used to select the number of components?

Tables 5 & 6: the “Notes” sections at the bottom of the tables don’t match the variables in each of the tables. Also the model names are not explained. I get the MC and MW but A1, A2, and A3 are all year but what do the number correspond to? Same for DS1, DS2 and DS3 and RS2. I think the numbers correspond to Sentinel-1, Sentinel-2 and mixed but I don’t see this explained in the text.

Lines 408-409: Looking at Table 5, the RMSEr value for model MC_DS2 is lower than that for model MC_RS2. This seems to contradict the statement that Sentinel-2 models fit using RS variables had lower RMSEr than models fit using DS variables.


Author Response

Reviewer 1

Comments and Suggestions for Authors

 

General comments

English language review is needed to clarify sentence structure and help with word selection. The writing is understandable but needs improvement.

The paper has now been checked for English, and all suggestions have been incorporated in the manuscript. The changes are presented using track changes, as suggested.

You don’t mention any accuracy information for plot locations. Given the small plot size and the use of handheld GPS receivers, you could easily have position errors larger than your plot radius. How might these errors affect your models and overall results? You cover this in your discussion, so there is no need to address this comment.

We appreciate this comment.

Did you evaluate interaction terms for your models or possible transformations to the response variable? While not always applicable, these can result in improved models.

Yes, both square-root and logarithmic transformations were tested. However, when back-transforming the predictions, also accounting for the correction factors, the accuracy of the models did not improve compared to the linear models without transformation. It was therefore more objective to fit the models without transformation.
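For illustration, a minimal sketch of the back-transformation comparison described above, assuming a log-transformed response and the standard exp(σ²/2) correction factor for retransformation bias; the data, predictors and variable names are synthetic stand-ins, not the study's actual variables.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(77, 2))                    # illustrative predictors
gsv = 50 + 200 * X[:, 0] + rng.gamma(2.0, 10.0, 77)    # synthetic volume, m3/ha

# Fit the model on the log-transformed response
model = LinearRegression().fit(X, np.log(gsv))
resid = np.log(gsv) - model.predict(X)
sigma2 = np.var(resid, ddof=X.shape[1] + 1)            # residual variance

# A naive exp() back-transform is biased low for the mean;
# multiply by the correction factor exp(sigma^2 / 2)
pred_naive = np.exp(model.predict(X))
pred_corrected = pred_naive * np.exp(sigma2 / 2.0)

rmse = lambda y, p: np.sqrt(np.mean((y - p) ** 2))
print(f"RMSE, naive back-transform:     {rmse(gsv, pred_naive):.1f}")
print(f"RMSE, corrected back-transform: {rmse(gsv, pred_corrected):.1f}")
```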

Specific comments (referenced by line numbers)

Line 49: What is meant by volume?

Here, we meant the large size of the data, which requires larger storage infrastructure such as servers.

Figure 3: You need to do something to make the numbers on the images more readable. You could simply move the numbers to the left of each image (in the white space) or put the numbers on a white square within the image (black text on a white background).

We assume this comment was meant for Figure 1. We have moved the numbers to the left of each image in Figure 1, as suggested.

Lines 168-169: Figure 2 caption: It is unclear why you define DBH in the caption but none of the other acronyms used in the schematic (e.g. VV, VH, HH, HV, PCA)

We have now decided not to explain any of the abbreviations in Figure 2, as they are all explained in the sections (2.4 and 2.5) that describe the approach in more detail.

Lines 178-185: it is unclear how you go from 3 sample blocks with 20 stands in each and 3 plots per stand to 77 plots. Were stands smaller than 0.5 ha excluded prior to sampling or were the smaller stands dropped from the 20 chosen for the block?

The 20 stands randomly chosen from each block represented stand candidates. In the field we omitted some of the candidates because of recent clearings and field time constraints, ending up with roughly 10 stands in each block and 3 sample plots in each stand. This is clarified in LINES 195-197 and 204-206.

Line 184: How were plot locations determined? You mention using handheld GPS receivers to navigate to the plots but don’t mention if you obtained more accurate locations for the actual plot center.

We navigated to the previously defined plot locations with a handheld GPS. The plot centre location was then recorded with the handheld GPS, and these coordinates were used in the analysis. We have added text to clarify this on LINES 208-209.

Lines 187-188: Was there a minimum DBH or were seedlings and small saplings measured?

Yes, there was a minimum DBH: only trees with DBH ≥ 5 cm were measured. A sentence clarifying this has been inserted on lines 210-211.

Line 218: PCA should be PC. How many components were used and what was the criteria used to select the number of components?

PCA has been changed to PC (LINE 241). We used the first 3 PCs, as they explained 98.8% and 98.9% of the variance of all 10 bands for Sentinel-2 20170916 and Sentinel-2 20180213, respectively (LINES 247-248).
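A minimal sketch of how such a component cutoff can be applied, using scikit-learn's PCA on a synthetic stand-in for the 10-band stack; the 98% threshold and array shapes are illustrative assumptions, not the study's actual values.

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative stand-in for 10 Sentinel-2 bands flattened to (n_pixels, n_bands)
rng = np.random.default_rng(0)
bands = rng.normal(size=(5000, 10)) @ rng.normal(size=(10, 10))

pca = PCA().fit(bands)
cumvar = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components explaining at least 98% of the total variance
n_keep = int(np.searchsorted(cumvar, 0.98) + 1)
print(f"components kept: {n_keep}, cumulative variance: {cumvar[n_keep - 1]:.3f}")
pcs = pca.transform(bands)[:, :n_keep]                 # the retained PC layers
```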

Tables 5 & 6: the “Notes” sections at the bottom of the tables don’t match the variables in each of the tables. Also the model names are not explained. I get the MC and MW but A1, A2, and A3 are all year but what do the number correspond to? Same for DS1, DS2 and DS3 and RS2. I think the numbers correspond to Sentinel-1, Sentinel-2 and mixed but I don’t see this explained in the text.

We have added descriptions to Table 4, which should explain the variables within each category (LINE 314).

Lines 408-409: Looking at Table 5, the RMSEr value for model MC_DS2 is lower than that for model MC_RS2. This seems to contradict the statement that Sentinel-2 models fit using RS variables had lower RMSEr than models fit using DS variables.

This oversight has now been corrected. With the new analysis, the results now agree with the above statement; see Tables 5 & 6.

 


Author Response File: Author Response.docx

Reviewer 2 Report

The study evaluated the ability of ALOS PALSAR-2, Sentinel-1 (SAR) and Sentinel-2 in predicting growing stock volume in a small region of Tanzanian forest. The overall design of the method is reasonable. My main suggestion is to discuss in more detail how the new approach advances on previous studies, since the authors claim the approach brings new insights into evaluating the best practice of estimating GSV using satellite-based remote sensing techniques. My detailed comments are below:


Line 246-249: Please add more description of the preprocessing, e.g. how you deal with data distortion in this mountainous area.

Line 263-265: what's the distribution of weights and how many neighbor pixels are used?

Table 4: Please describe abbreviations. The table should stand alone without reference back to the main text.

Line 269: The model design is essentially searching variables that have the best overall linear relationship with GSV, but have you tested other machine learning methods?

Line 327-328: I would suggest displaying the variation of RMSE and RMSEr from the 10-fold results. This helps readers understand the stability of the model.


Author Response

Reviewer 2

 

Comments and Suggestions for Authors

 

The study evaluated the ability of ALOS PALSAR-2, Sentinel-1 (SAR) and Sentinel-2 in predicting growing stock volume in a small region of Tanzanian forest. The overall design of the method is reasonable. My main suggestion is to discuss in more detail how the new approach advances on previous studies, since the authors claim the approach brings new insights into evaluating the best practice of estimating GSV using satellite-based remote sensing techniques. My detailed comments are below:

 

Line 246-249: Please add more description of the preprocessing, e.g. how you deal with data distortion in this mountainous area.

 

The pre-processing steps were standard SAR pre-processing steps. Radiometric normalization was used to decrease the topography-induced radiometric distortions. We have added a more precise explanation of the steps on LINES 274-279.
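As a rough illustration of the principle only, the sketch below applies a simplified cosine correction from σ⁰ to γ⁰ using a DEM-derived local incidence angle. Operational chains (e.g. terrain flattening in dedicated SAR software) are considerably more involved, and all values here are made up.

```python
import numpy as np

def gamma0_from_sigma0(sigma0_db, local_incidence_deg):
    """Simplified cosine-based normalization: gamma0 = sigma0 / cos(theta_loc)."""
    sigma0 = 10.0 ** (np.asarray(sigma0_db) / 10.0)           # dB -> linear power
    gamma0 = sigma0 / np.cos(np.deg2rad(local_incidence_deg))
    return 10.0 * np.log10(gamma0)                            # back to dB

# Toy example: identical backscatter observed at different local incidence
# angles, showing how the correction grows with the angle
sigma0_db = np.full(3, -8.0)
theta = np.array([25.0, 40.0, 55.0])
print(gamma0_from_sigma0(sigma0_db, theta))
```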

Line 263-265: what's the distribution of weights and how many neighbor pixels are used?

This is a very important comment, which in the end led us to re-calculate the values and re-fit the models. Previously, as mentioned in the text, we had used the nearest neighbour algorithm in resampling, because it is a standard algorithm in image co-registration where the image displacements are relatively small. However, we simultaneously rescaled the images to a common 20 m resolution, and nearest neighbour is not an optimal algorithm for upscaling continuous layers. Upscaling Sentinel-1 and Sentinel-2 images from 10 m to 20 m resolution with the nearest neighbour algorithm causes each 20 m cell to take the value of a single 10 m cell, although it contains four of them. This caused a displacement of the cells compared to the original image. When we discovered this, we resampled the image layers again using bilinear interpolation, in which the value for the new cell is a weighted average of the four closest cells (LINES 294-295). This change also resulted in a slight improvement in model performance, as presented in Tables 5 & 6, LINES 420 and 442. The new analysis also changed all the figures previously presented in the results section. All changes are in track changes and can easily be seen in the document; slight changes in the DISCUSSION resulting from the re-analysis are also presented in track changes.
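A minimal sketch of the resampling difference, using scipy.ndimage.zoom as a stand-in for the GIS tooling actually used (order=0 is nearest neighbour, order=1 is bilinear); the toy grid is illustrative only.

```python
import numpy as np
from scipy.ndimage import zoom

# Toy "10 m" band: a smooth gradient
band_10m = np.add.outer(np.arange(8.0), np.arange(8.0))

# Nearest neighbour propagates a single 10 m value into each 20 m cell,
# whereas bilinear interpolation takes a weighted average of the four
# closest cells, as described above
nearest_20m = zoom(band_10m, 0.5, order=0)
bilinear_20m = zoom(band_10m, 0.5, order=1)

print(nearest_20m)
print(bilinear_20m)
```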

Table 4: Please describe abbreviations. The table should stand alone without reference back to the main text.

We have now revised the table and added descriptions for the categories.

Line 269: The model design is essentially searching variables that have the best overall linear relationship with GSV, but have you tested other machine learning methods?

Yes, we tested Random Forest, but the performance of the linear models was still the best. We therefore chose to focus on linear models, which are also the most common method for modelling forest attributes using remote sensing data.
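The comparison described could be reproduced along the lines of the sketch below, contrasting a linear model with Random Forest under 10-fold cross-validation in scikit-learn; the synthetic data and all settings are illustrative assumptions, not the study's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
X = rng.normal(size=(77, 5))                                  # illustrative image predictors
gsv = 120 + X @ np.array([30.0, -10.0, 5.0, 0.0, 0.0]) + rng.normal(0, 15, 77)

for name, model in [("Linear", LinearRegression()),
                    ("Random Forest", RandomForestRegressor(n_estimators=500, random_state=0))]:
    scores = cross_val_score(model, X, gsv, cv=10, scoring="neg_root_mean_squared_error")
    print(f"{name}: mean CV RMSE = {-scores.mean():.1f}")
```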

Line 327-328: I would suggest displaying the variation of RMSE and RMSEr from the 10-fold results. This helps readers understand the stability of the model.

Yes, the RMSE and RMSEr are displayed in Tables 5 & 6 as well as in Figure 7.

Additional revision by the authors

All additional revisions by the authors, together with the language check, are in track changes in the respective sections.

 


Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

Concerns are clarified. Appreciate the authors' feedback.

Author Response

Round 2

One small comment regarding Reviewer #2's comment on Lines 327-328: “I would suggest displaying the variation of RMSE and RMSEr from the 10-fold results. This helps readers understand the stability of the model.” Please read that comment more carefully, because I don't think you have addressed it satisfactorily. While in Tables 5 & 6 and Figure 7 you express some variation in the resulting RMSEs and their relative counterparts, this is not what the reviewer requested. The reviewer requests you to express the variation within each of those figures in terms of the ten different RMSEs you obtain within your 10-fold procedure. This concerns the reliability of the cross-validation itself and the possibility of using it as evidence that your models are not simply overfitted to the sample. I myself recently published on that topic (Valbuena et al., 2017, reference below), in terms of comparing cross-validated sums of squares against non-cross-validated ones. But there is still a lot to do and say about this topic, and there is an ongoing discussion on how reliable the methods we are all traditionally using for accuracy assessment are. I thus wish to still give you a chance to provide a more appropriate answer to that comment and perhaps put forward in your paper your own conclusions and insights about these issues.

Valbuena et al. (2017) Enhancing of Accuracy Assessment for Forest Above-Ground Biomass Estimates Obtained from Remote Sensing via Hypothesis Testing and Overfitting Evaluation. Ecological Modelling 366: 15-26

Thanks for the clarifications from both the reviewer and the editor. Figures 5 and 6 have been inserted to display the variability in RMSEr across the different folds, and additional sentences have been inserted in the methodology section (lines 348-349) and in the results section (lines 383-385). In addition, in the methodology section we state clearly why k-fold cross-validation is favoured especially with k values of 10 or 5: these values have been shown empirically (see James et al. 2013) to yield test error rate estimates that suffer neither from excessively high bias nor from very high variance. This is why we previously focused on showing the results for k = 10 only. This explanation is now also supported by Figures 5 & 6, where the RMSEr values at k = 5 and k = 10 are lower. However, for a fair comparison of the models, it is advisable to use a common k value for all models.
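For concreteness, a hedged sketch of how the per-fold variation can be extracted: each of the ten folds yields its own relative RMSE, whose spread Figures 5 and 6 are meant to convey. The data, model and fold settings below are illustrative assumptions, not the study's actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(3)
X = rng.normal(size=(77, 4))                       # illustrative predictors
gsv = 150 + X @ np.array([25.0, -8.0, 4.0, 2.0]) + rng.normal(0, 20, 77)

fold_rmser = []
for train, test in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    pred = LinearRegression().fit(X[train], gsv[train]).predict(X[test])
    rmse = np.sqrt(np.mean((gsv[test] - pred) ** 2))
    fold_rmser.append(100.0 * rmse / gsv[test].mean())        # relative RMSE, %

print("RMSEr per fold (%):", np.round(fold_rmser, 1))
print(f"mean = {np.mean(fold_rmser):.1f}%, sd = {np.std(fold_rmser):.1f}%")
```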

 


Author Response File: Author Response.docx
