Mapping the Potential Presence of the Spotted Wing Drosophila Under Current and Future Scenario: An Update of the Distribution Modeling and Ecological Perspectives
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This paper uses a random forest algorithm to model the current distribution of SWD and predict its future global distribution, classifying environmental suitability into three levels: high, medium, and low. This research has implications for the prevention and control of agricultural pests and diseases. I offer the following comments:
- Why use the random forest algorithm to predict SWD distribution? I recommend that you elaborate on why you chose this algorithm.
- This paper discusses the distribution of current and future climatically suitable areas for SWD based on prediction results. However, how can you show that the prediction results are correct? Also, the two supplementary file links provided are invalid.
- What does the increase in node purity in Table 1 represent? The paper does not analyze the increase in node purity or the increase in mean squared error.
Author Response
Comments 1: Why use a random forest algorithm to predict SWD distribution? Please elaborate on why you chose this algorithm.
R: We agree. We have added a referenced explanation of why the Random Forest algorithm was used and the significance of the number of pseudo-absences adopted, which is crucial for balancing model performance and computational efficiency.
Comments 2: This paper discussed the distribution of current and future climatically suitable areas for the SWD, which were based on prediction results. However, how can you prove that the prediction results are correct? The two supplementary file links provided are invalid.
R: We have emphasized the importance of model validation in evaluating the results of a species distribution model. This involves an approach that uses several assessment metrics. To ensure the model's accuracy, the data must be divided into a training subset (to fit the model) and a testing subset (to validate the predictions). Metrics such as the Area Under the ROC Curve (AUC) are used to assess how well the model discriminates between presences and absences. Values above 0.8 indicate good model performance. Other metrics, including the True Skill Statistic (TSS), Kappa, and the Mean Squared Error (MSE), also contribute to model evaluation.
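As an illustration of the metrics named in this response, the following minimal sketch computes TSS, Cohen's Kappa, and a rank-based AUC on a held-out test set. The labels, scores, and the 0.5 threshold are hypothetical values chosen for the example; they are not taken from the manuscript, and the authors' actual validation workflow is not described in this record.

```python
def confusion_counts(observed, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    return tp, tn, fp, fn


def true_skill_statistic(tp, tn, fp, fn):
    """TSS = sensitivity + specificity - 1, ranging from -1 to 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def cohen_kappa(tp, tn, fp, fn):
    """Agreement between observed and predicted labels, corrected for chance."""
    n = tp + tn + fp + fn
    observed_agreement = (tp + tn) / n
    expected_agreement = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return (observed_agreement - expected_agreement) / (1 - expected_agreement)


def auc(observed, scores):
    """Probability that a random presence outscores a random absence (ties count half)."""
    presences = [s for o, s in zip(observed, scores) if o == 1]
    absences = [s for o, s in zip(observed, scores) if o == 0]
    wins = sum(
        1.0 if p > a else 0.5 if p == a else 0.0
        for p in presences
        for a in absences
    )
    return wins / (len(presences) * len(absences))


# Hypothetical test subset: 1 = presence, 0 = (pseudo-)absence.
observed = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]  # predicted suitability
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, tn, fp, fn = confusion_counts(observed, predicted)
print("TSS:", true_skill_statistic(tp, tn, fp, fn))
print("Kappa:", cohen_kappa(tp, tn, fp, fn))
print("AUC:", auc(observed, scores))
```

TSS and Kappa depend on the threshold used to convert suitability into presence or absence, whereas AUC is threshold-independent, which is one reason the metrics are often reported together.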
Comments 3: What does the Increase in node purity in Table 1 represent? This paper did not analyze the Increase in node purity and increase in mean square error.
R: We have clarified the role of node purity and mean squared error in the Methodology section. These metrics are used by machine learning-based algorithms to assess the importance of predictor variables in the models. The mean squared error measures the impact of removing a variable on model performance. An increase in mean squared error indicates the variable's importance for the prediction, while a decrease may suggest little or no importance to the model.
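The idea behind the increase in mean squared error can be sketched as follows. In common Random Forest implementations this quantity is usually estimated by permuting a variable's values and measuring how much the prediction error grows, rather than by refitting the model without the variable; the record does not state which implementation the authors used, so that is an assumption here. The model below is a hypothetical stand-in (a fixed function of temperature only), not a fitted forest, and the variable names and value ranges are invented for the example.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def increase_in_mse(model, rows, y, column, rng):
    """Error after shuffling one predictor column, minus the baseline error."""
    baseline = mse(y, [model(row) for row in rows])
    shuffled = [row[column] for row in rows]
    rng.shuffle(shuffled)
    permuted = [
        row[:column] + (value,) + row[column + 1:]
        for row, value in zip(rows, shuffled)
    ]
    return mse(y, [model(row) for row in permuted]) - baseline


def toy_model(row):
    """Hypothetical stand-in for a fitted model: suitability depends on temperature only."""
    temperature, precipitation = row
    return 0.1 * temperature


rng = random.Random(0)
# Hypothetical predictors: (temperature in deg C, precipitation in mm).
rows = [(rng.uniform(0, 30), rng.uniform(0, 2000)) for _ in range(200)]
y = [toy_model(row) for row in rows]

temperature_importance = increase_in_mse(toy_model, rows, y, 0, rng)
precipitation_importance = increase_in_mse(toy_model, rows, y, 1, rng)
print("Increase in MSE, temperature:", temperature_importance)
print("Increase in MSE, precipitation:", precipitation_importance)
```

Shuffling temperature breaks its link to the response and raises the error, while shuffling precipitation leaves the error unchanged because the stand-in model ignores it. Node purity is a separate, split-based measure (the total reduction in node impurity from splits on a variable) and is not computed in this sketch.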
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
I have thoroughly reviewed the manuscript entitled “Mapping the Potential Presence of the Spotted-Wing Drosophila under Current and Future Scenarios: An Update of the Distribution Modeling and Ecological Perspectives.” The study investigates the current and future potential distribution patterns of Drosophila suzukii by integrating pest occurrence data from multiple databases and climate variables from the WorldClim platform. A Random Forest algorithm was employed to construct current distribution models and predict future scenarios. Below are my detailed suggestions for improvement:
- Introduction
To my understanding, several studies have investigated the distribution of spotted-wing drosophila (Drosophila suzukii) under future climate change scenarios. However, the current manuscript (Lines 73-84) merely lists these previous studies without detailing their specific methodologies and findings. This omission makes it difficult to see how the present research differs from and innovates upon prior work. It is recommended that the authors clearly articulate in the Introduction section their rationale for re-evaluating the distribution patterns of D. suzukii under climate change scenarios, explicitly highlighting the novel contributions and distinguishing features of their current investigation compared to existing literature.
- Materials and Methods
2.1 Occurrence of D. suzukii
The manuscript does not mention incorporating occurrence data from the EPPO Global Database (https://gd.eppo.int/taxon/DROSSU/distribution), which contains 38 records of the species’ native distribution. Native-range data are critical for robust model calibration, as they reflect the pest’s fundamental ecological niche. Integrating these data could significantly enhance model accuracy.
2.2 Historical Climate Data
This manuscript uses historical climate data from WorldClim v. 2.1 covering 1970 to 2000. A 30-year dataset appears insufficient in duration. I recommend extending the temporal coverage to include post-invasion climate data (e.g., 2000–2020) to better align the model with the species’ invasive dynamics.
Table 1 should be relocated to the Methods section. The manuscript lacks clarity on how the contribution rates of the 19 bioclimatic variables were calculated and why only 9 variables were retained. A detailed explanation of variable selection criteria is essential.
2.3 Spatial Resolution
I am very concerned about the resolution used (2.5 arc minutes, 4.5 km2). WorldClim variables are currently available at 1 km resolution, which, considering the minimum number of presence data used, is the best resolution to use. Other authors have used the highest possible resolution (30 s = 0.93 × 0.93 km = 0.86 km2 at the equator) (Ørsted & Ørsted, 2018).
I suggest re-running the models at this resolution to ensure that the resulting maps are of the quality needed for decision making.
- Results
3.1 Current Distribution of D. suzukii
Figure 1 inadequately represents the species’ native range, with only a few occurrence points in East Asia (China, Korean Peninsula, and Japan). Native-range data are pivotal for calibrating models to predict future distributions, particularly under climate change. Expanding this dataset is strongly advised.
- Discussion
It is recommended to incorporate host plant distribution in the discussion. For instance, the highly suitable areas predicted by the model based on climate data and current distribution information might lack appropriate host plants, which could also prevent the establishment of spotted-wing drosophila (Drosophila suzukii) populations.
Additionally, it would be valuable to highlight the novel aspects and methodological improvements of this study compared to previous research. For example, does the predicted distribution range demonstrate greater spatial extent or higher accuracy? What supplementary value do the current findings provide to existing studies? Specific comparisons could be made regarding how this prediction addresses the limitations of previous research, enhances model precision through integrated multi-factor analysis, or reveals new ecological adaptation patterns.
Author Response
- Introduction
Comments 1: To my understanding, several studies have investigated the distribution of spotted-wing drosophila (Drosophila suzukii) under future climate change scenarios. However, the current manuscript (Lines 73-84) merely lists these previous studies without detailing their specific methodologies and findings. This omission makes it difficult to see how the present research differs from and innovates upon prior work. It is recommended that the authors clearly articulate in the Introduction section their rationale for re-evaluating the distribution patterns of D. suzukii under climate change scenarios, explicitly highlighting the novel contributions and distinguishing features of their current investigation compared to existing literature.
R: We agree with the suggestion to not only list previous studies on the distribution modeling of SWD under different climate change scenarios. Therefore, we have included a new paragraph (highlighted in yellow) discussing the latest findings on the SDM of D. suzukii and its applications. Additionally, another paragraph has been added to emphasize the use of different methodologies for monitoring the pest’s distribution.
It is important to highlight that the data collection period spans from the beginning of the invasion (first records) to January 2024, providing an updated assessment of potential distribution and supplying data for re-evaluating distribution in future scenarios.
- Materials and Methods
2.1 Occurrence of D. suzukii
Comments 2: The manuscript does not mention incorporating occurrence data from the EPPO Global Database (https://gd.eppo.int/taxon/DROSSU/distribution), which contains 38 records of the species’ native distribution. Native-range data are critical for robust model calibration, as they reflect the pest’s fundamental ecological niche. Integrating these data could significantly enhance model accuracy.
R: We understand that incorporating occurrence records from the species' native range allows for more precise calibration of distribution models.
However, we note that our study already includes data from six well-established databases: the Global Biodiversity Information Facility (GBIF), iNaturalist (iNat), VertNet, Berkeley Ecoinformatics Engine (Ecoengine), Integrated Digitized Biocollections (iDigBio), and TaxoDros. These datasets already encompass the 38 native-range records of D. suzukii. Our models, which are based on 3,561 occurrence records, are therefore precisely calibrated to enable new insights into the pest’s distribution.
Integrating these additional 38 EPPO occurrence coordinates into the calibration spectrum of the model would not result in significant changes, as these native-range coordinates have already been considered. We recommend reviewing the 38 occurrence records in the Asian region, available in Supplementary Material S1 (coordinates used).
2.2 Historical Climate Data
Comments 3: This manuscript uses historical climate data from WorldClim v. 2.1 covering 1970 to 2000. A 30-year dataset appears insufficient in duration. I recommend extending the temporal coverage to include post-invasion climate data (e.g., 2000–2020) to better align the model with the species’ invasive dynamics.
R: We understand that the ideal period for evaluating bioclimatic variables should align with the time frame in which the species became invasive in various regions worldwide.
However, according to the literature, the second version of WorldClim (v. 2.1), which includes the monthly averages of abiotic variables from 1970 to 2000, remains the most comprehensive climate dataset available and is widely used in global studies. Updating the pest's distribution should maintain consistency with past studies to ensure reliable data and allow for meaningful comparisons with previous findings.
Furthermore, as per the latest CMIP6 report, these variables are compatible with most climate models used in future studies.
Thus, we recognize the necessity of continuing with this dataset to enable the reproduction of environmental conditions and infer new findings in species distribution modeling.
Comments 4: Table 1 should be relocated to the Methods section. The manuscript lacks clarity on how the contribution rates of the 19 bioclimatic variables were calculated and why only 9 variables were retained. A detailed explanation of variable selection criteria is essential.
R: We have added an additional explanation in the Methodology section detailing how the contribution of each variable was calculated.
The contribution rates of each climatic variable in the D. suzukii model were obtained through the Variance Inflation Factor (VIF) analysis to prevent independent variables from exhibiting linear relationships and generating inaccurate models.
Since VIF depends on species distribution (the variable), we believe that placing this information in the Methods section may not be the most appropriate, as it also constitutes a scientific result.
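For readers unfamiliar with the Variance Inflation Factor mentioned in this response, the sketch below shows one standard way to compute it: the VIF of each predictor equals the corresponding diagonal element of the inverse of the predictors' correlation matrix, which is equivalent to 1 / (1 - R²) from regressing that predictor on the others. The three variables and their values are hypothetical, and the cutoff of 10 is a commonly used rule of thumb assumed for illustration; neither comes from the manuscript.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long predictor columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
            for j in range(k)
        ]
        for i in range(k)
    ]


def invert(matrix):
    """Gauss-Jordan matrix inversion with partial pivoting."""
    k = len(matrix)
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(k)]
        for i, row in enumerate(matrix)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def variance_inflation_factors(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical predictors sampled at four sites.
mean_temp = [20.0, 20.0, 10.0, 10.0]
warm_quarter_temp = [20.5, 19.5, 10.5, 9.5]    # nearly collinear with mean_temp
annual_precip = [1200.0, 800.0, 800.0, 1200.0]  # uncorrelated with both

names = ["mean_temp", "warm_quarter_temp", "annual_precip"]
vifs = variance_inflation_factors([mean_temp, warm_quarter_temp, annual_precip])

VIF_CUTOFF = 10  # assumed rule-of-thumb threshold
for name, vif in zip(names, vifs):
    status = "collinear" if vif > VIF_CUTOFF else "ok"
    print(f"{name}: VIF = {vif:.1f} ({status})")
```

In a stepwise procedure, the variable with the highest VIF would typically be dropped and the factors recomputed until all remaining values fall below the chosen cutoff. Note that VIF measures collinearity among predictors, which is distinct from the importance scores discussed for Table 1.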
2.3 Spatial Resolution
Comments 5: I am very concerned about the resolution used (2.5 arc minutes, 4.5 km2). WorldClim variables are currently available at 1 km resolution, which, considering the minimum number of presence data used, is the best resolution to use. Other authors have used the highest possible resolution (30 s = 0.93 × 0.93 km = 0.86 km2 at the equator) (Ørsted & Ørsted, 2018).
I suggest re-running the models at this resolution to ensure that the resulting maps are of the quality needed for decision making.
R: We acknowledge that distribution modeling studies aiming to provide decision-makers with insights for pest management programs would benefit from higher-resolution data (~1 km² pixels).
However, at a global scale, such a resolution requires significant computational power, which is beyond the capacity of our research group. Running models at this high level of detail (30s resolution, ~1 km²) would significantly increase data volume and computational demands, making analysis infeasible. Additionally, we are concerned about spatial bias in model accuracy, as models trained with large but highly concentrated datasets may not accurately predict species occurrences in unsampled regions.
This issue of data concentration in models has been discussed in the study: DOI 10.7717/peerj.10411
Furthermore, there appears to be a consensus among D. suzukii modeling researchers to use resolutions between 2.5 and 5 arc minutes for global studies. Higher-resolution data are typically employed for regional studies.
Here are some examples of related studies and the resolutions they used:
Santos et al. (2017) – Global study on SWD – 5 arc min resolution
Viana et al. (2023) – SWD study in Brazil – 2.5 arc min resolution
Reyes & Lira-Noriega (2020) – Global SWD study – 10 arc min resolution
Nair & Peterson (2023) – Global SWD and parasitoid study – 5 or 10 arc min resolution
De la Vega & Corley (2019) – SWD distribution in Patagonia/Argentina – 2.5 arc min resolution
Fraimout & Monnet (2018) – Global SWD study – 10 arc min resolution
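The computational argument in the response to Comment 5 can be checked with simple arithmetic. The sketch below compares a 30 arc-second grid with a 2.5 arc-minute grid, assuming a full-globe raster (360° × 180°, ocean cells included) and an equatorial circumference of about 40,075 km. These are assumptions made for the example; a land-only grid would have fewer cells, but the ratio between the two resolutions would be the same.

```python
EQUATORIAL_CIRCUMFERENCE_KM = 40075.017  # approximate value, assumed here


def cell_edge_km(arc_seconds):
    """Edge length of one grid cell at the equator, in kilometres."""
    degrees = arc_seconds / 3600
    return EQUATORIAL_CIRCUMFERENCE_KM * degrees / 360


def global_cell_count(arc_seconds):
    """Number of cells in a full-globe raster (360 x 180 degrees)."""
    columns = 360 * 3600 // arc_seconds
    rows = 180 * 3600 // arc_seconds
    return columns * rows


FINE = 30      # 30 arc-seconds
COARSE = 150   # 2.5 arc-minutes = 150 arc-seconds

for label, resolution in (("30 arc-sec", FINE), ("2.5 arc-min", COARSE)):
    edge = cell_edge_km(resolution)
    print(
        f"{label}: edge {edge:.2f} km, area {edge ** 2:.2f} km2, "
        f"{global_cell_count(resolution):,} cells"
    )

ratio = global_cell_count(FINE) // global_cell_count(COARSE)
print("Cells per layer, fine vs coarse:", ratio)
```

Each layer at 30 arc-seconds holds 25 times as many cells as at 2.5 arc-minutes, and that factor applies to every bioclimatic variable and every future climate model projected.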
- Results
3.1 Current Distribution of D. suzukii
Comments 6: Figure 1 inadequately represents the species’ native range, with only a few occurrence points in East Asia (China, Korean Peninsula, and Japan). Native-range data are pivotal for calibrating models to predict future distributions, particularly under climate change. Expanding this dataset is strongly advised.
R: The occurrence points in D. suzukii’s native distribution are overlapped on the map due to the global scale used. These occurrence coordinates for the Asian region are available in Supplementary File S1.
- Discussion
Comments 7: It is recommended to incorporate host plant distribution in the discussion. For instance, the highly suitable areas predicted by the model based on climate data and current distribution information might lack appropriate host plants, which could also prevent the establishment of spotted-wing drosophila (Drosophila suzukii) populations.
R: We understand that incorporating the distribution of cultivated plants as biotic resources can be a limiting factor for the pest’s distribution range. However, D. suzukii has an extensive variety of host plants. For example, in Latin America, Garcia et al. (2022) listed more than 60 host plants for the pest (doi: 10.1093/jee/toac052).
Furthermore, Zhou et al. (2024) concluded that the relative importance of climate and host availability in determining potential SWD areas may not solely depend on feeding habits (polyphagy). Therefore, the role of host availability as a driver of changes in potential areas may be underestimated (doi: https://doi.org/10.3390/f15010206).
Additionally, we believe that the scope of this study is to predict distribution based on abiotic variables using correlative modeling, focusing on environmental suitability according to the insect’s tolerances.
Comments 8: Additionally, it would be valuable to highlight the novel aspects and methodological improvements of this study compared to previous research. For example, does the predicted distribution range demonstrate greater spatial extent or higher accuracy? What supplementary value do the current findings provide to existing studies? Specific comparisons could be made regarding how this prediction addresses the limitations of previous research, enhances model precision through integrated multi-factor analysis, or reveals new ecological adaptation patterns.
R: We agree with the suggestion and have added a new paragraph in the discussion section highlighting the novel aspects and methodological improvements implemented in this study compared to previous research. We emphasize the use of the Random Forest algorithm as a more robust and accurate machine learning algorithm, as well as the extensive data collection from six occurrence databases. With these new insights, we provide more reliable and precise information, enhancing decision-making processes for policymakers and researchers working on Drosophila suzukii distribution and management.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The present manuscript provides a comprehensive and well-structured analysis of the current and future potential distribution of Drosophila suzukii, using species distribution modeling (SDM). The study is highly relevant given the global agricultural impact of SWD, and the use of the Random Forest algorithm for modeling is appropriate and well-justified. The manuscript is well-written, but there are areas where improvements can be made to enhance clarity, scientific rigor, and readability.
The use of the Random Forest algorithm is well-suited for this type of ecological modeling. The inclusion of multiple climate models (CMIP6) and the SSP585 scenario adds robustness to the future projections. However, while the AUC value of 0.99 indicates excellent model performance, it would be beneficial to include additional validation metrics (e.g., kappa statistic, true skill statistic) to further confirm the model's accuracy. This is especially important given the high stakes of pest management decisions based on these predictions.
Moreover, while the generation of pseudo-absences is a common practice in SDM, the manuscript could benefit from a more detailed explanation of how these pseudo-absences were selected and why a ratio of 2:1 (pseudo-absences to occurrences) was chosen. This would help readers understand the potential biases introduced by this method.
Furthermore, given the importance of integrated pest management (IPM), it would be beneficial to expand on potential biological control strategies and how they might interact with the predicted distribution of SWD.
There is also some repetition in the introduction and discussion sections. For example, the importance of climate change in pest distribution is mentioned multiple times. Streamlining these sections would improve readability.
In Figure 2, the color scheme for suitability levels is effective, but the caption could be more descriptive. For example, it could explain what the different colors represent in terms of suitability (e.g., red = low suitability, green = high suitability).
Finally, the manuscript could benefit from a more extensive literature review of recent papers about the use of the Random Forest model, or others on insect pest prediction or early detection.
Overall, this is a well-executed study that makes a significant contribution to the understanding of the current and future distribution of D. suzukii.
Some additional comments/suggestions can be found in the attached pdf.
Comments for author File: Comments.pdf
Author Response
Review Report Form (Reviewer 3)
Comments and Suggestions for Authors
The present manuscript, which provides a comprehensive and well-structured analysis of the current and future potential distribution of Drosophila suzukii using species distribution modeling (SDM), is of significant importance in the field of ecology and pest management. The global agricultural impact of SWD makes this study highly relevant, and using the Random Forest algorithm for modeling is appropriate and well-justified. While the manuscript is well-written, there are areas where improvements can be made to enhance clarity, scientific rigor, and readability.
Comments 1: The Random Forest algorithm, well-suited for this type of ecological modeling, has been used in a robust manner in this study. The inclusion of multiple climate models (CMIP6) and the SSP585 scenario adds further robustness to the future projections. The AUC value of 0.99, indicating excellent model performance, is a testament to the reliability of the model. However, it would be beneficial to include additional validation metrics (e.g., kappa statistic, true skill statistic) to further confirm the model's accuracy. This is especially important given the high stakes of pest management decisions based on these predictions.
R: We agree with the suggestion. In the methodology and results sections, we added a validation metric, the True Skill Statistic (TSS), to explain the model's performance. We chose TSS because it is a reliable metric that takes into account both sensitivity and specificity, providing a comprehensive evaluation of the model's performance.
Comments 2: Moreover, while the generation of pseudo-absences is a common practice in SDM, the manuscript could benefit from a more detailed explanation of how these pseudo-absences were selected and why a ratio of 2:1 (pseudo-absences to occurrences) was chosen. This would help readers understand the potential biases introduced by this method.
R: We agree with the suggestion and have included an explanation in the methodology section detailing the reason for using twice the number of pseudo-absences to improve the models' robustness.
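To make the 2:1 ratio concrete, the sketch below draws twice as many pseudo-absences as presences by random background sampling, rejecting candidates that fall within a buffer around any presence. This is only one common strategy and is an assumption here: the record does not describe how the authors selected their pseudo-absences. The coordinates, extent, and buffer size are hypothetical.

```python
import random


def sample_pseudo_absences(presences, ratio, extent, buffer_deg, rng):
    """Draw ratio * len(presences) random points outside a buffer around presences.

    Points are uniform in longitude/latitude degrees, which oversamples high
    latitudes; an equal-area scheme would be preferable in a real analysis.
    """
    lon_min, lon_max, lat_min, lat_max = extent
    target = ratio * len(presences)
    pseudo_absences = []
    while len(pseudo_absences) < target:
        lon = rng.uniform(lon_min, lon_max)
        lat = rng.uniform(lat_min, lat_max)
        too_close = any(
            abs(lon - p_lon) < buffer_deg and abs(lat - p_lat) < buffer_deg
            for p_lon, p_lat in presences
        )
        if not too_close:
            pseudo_absences.append((lon, lat))
    return pseudo_absences


# Hypothetical presence records as (longitude, latitude).
presences = [(139.7, 35.7), (127.0, 37.5), (-122.3, 38.5), (11.1, 46.1), (-51.2, -30.0)]

rng = random.Random(42)
pseudo_absences = sample_pseudo_absences(
    presences,
    ratio=2,                     # twice as many pseudo-absences as presences
    extent=(-180, 180, -60, 75),  # assumed study extent in degrees
    buffer_deg=1.0,               # assumed exclusion buffer around presences
    rng=rng,
)
print(len(presences), "presences,", len(pseudo_absences), "pseudo-absences")
```

The ratio controls class balance in the training data, so it affects both model behavior and prevalence-sensitive metrics such as Kappa.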
Comments 3: Furthermore, given the importance of integrated pest management (IPM), it would be beneficial to expand on potential biological control strategies and how they might interact with the predicted distribution of SWD.
R: We agree. We have added mentions in the introduction section discussing the benefits of potential biological control strategies in interaction with the predicted distribution of SWD.
These strategies, such as introducing natural predators or using pheromones to disrupt mating, can significantly reduce the population of SWD and mitigate its impact on agriculture.
Comments 4: The introduction and discussion sections also contain some repetition. For example, the importance of climate change in pest distribution is mentioned multiple times. Streamlining these sections would improve readability.
R: We have modified the text to reduce the repetition of the importance of climate change in pest distribution, instead stating that certain climatic factors can contribute to ecological modeling to predict the expansion or contraction of the pest under these climatic scenarios.
Comments 5: The color scheme for suitability levels in Figure 2 is effective, but the caption could be more descriptive. For example, it could explain what the different colors represent regarding suitability (e.g., red = low suitability, green = high suitability).
R: We agree with the suggested modification of the caption for Figure 2 and have adjusted the explanation of the color scheme used.
Comments 6: Finally, the manuscript could benefit from a more extensive literature review of recent papers about using the Random Forest model or others on insect pest prediction or early detection.
R: We have added studies on the detection and prediction of SWD in the introduction section. These studies, which focus on species distribution modeling and the Random Forest algorithm, provide valuable insights into insect pest prediction and early detection.
Comments 7: Overall, this is a well-executed study that significantly contributes to understanding the current and future distribution of D. suzukii. The authors' efforts in conducting this study are highly appreciated and have not gone unnoticed.
Some additional comments/suggestions can be found in the attached pdf.
R: All suggestions and comments from the attached PDF have been addressed and corrected in the text.
Author Response File: Author Response.pdf