Review Reports - Improved Prediction of Aquatic Beetle Diversity in a Stagnant Pool by a One-Dimensional Convolutional Neural Network Using Variational Autoencoder Generative Adversarial Network-Generated Data

Round 1

Reviewer 1 Report (Previous Reviewer 2)

Now that a series of corrections and explanations have been made, I think that the paper could be considered for publication. Nevertheless, although the work is well explained and developed, I think it suffers from a lack of scientific soundness and significance: we are very used to papers that develop neural models in order to show the relation between variables. Even though this is very interesting from the user's point of view, it represents only a small advance at the frontier of machine learning knowledge.

A final check would be worthwhile because some expressions could be written in a more formal way, but I think English is not an issue in this paper.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report (New Reviewer)

The reviewed manuscript focuses on research aimed at enhancing biodiversity prediction through utilising a variational autoencoder generative adversarial network. The topic is interesting and relevant, and the manuscript is well-structured, encompassing all necessary components for this type of publication. However, several shortcomings must be corrected before the manuscript can be accepted. I have outlined my remarks below:

1. In the introduction section, it would be preferable to identify the unresolved aspects of the general problem and emphasize the main contributions of the authors' research. This would give readers a clearer understanding of the research gaps and the study's novelty.

2. Please ensure the formulas in the manuscript are aligned according to the specified formatting guidelines.

3. Figure 2: The pooling layer is missing. Is this intentional? Additionally, it is unclear whether the output represents the loss or the prediction result. Please correct this figure by presenting the correct CNN structure used within your research framework.

4. Please explain how the CNN hyperparameters were determined. This information is currently absent, yet the values of these hyperparameters significantly impact the operation of the CNN.

5. Section 2 (Materials and Methods) contains no information on evaluating the CNN's effectiveness using quantitative criteria calculated on the test subset. Merely including the loss function is insufficient. Please add the necessary information, such as Accuracy, F1-score, MCC, and other relevant metrics.

6. At the end of the conclusion, please include further perspectives on the authors' research. This would provide readers with insights into potential future directions for the study.
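
The criteria named in remark 5 can be computed directly from a confusion matrix. The following is only a minimal sketch for the binary case, with made-up labels; the function name `binary_metrics` and the data are assumptions for illustration and are not taken from the manuscript, whose outputs are multi-valued.

```python
import math


def binary_metrics(y_true, y_pred):
    """Return accuracy, F1-score and MCC for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Made-up test-subset labels, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is one reason it is often requested alongside accuracy and F1 when the test subset is small.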

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report (New Reviewer)

Thanks for the response and manuscript correction. The paper is interesting. I have no other questions about the manuscript.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.

Round 1

Reviewer 1 Report

1. In this MS, the authors tried to depict many facets of the biological communities in pools. Simultaneously, they tried to prove the superiority of CNN over MLR and to detect the importance of several ecological variables. However, MLR is merely one technique, and it has been superseded by other methods such as ‘glmm’ (in the R environment), which are capable of coping with nonlinear data, many variables, discontinuities in the data, and various predictions (Pinna et al., C. R. Biologies 337 (2014) 338–344).

2. In addition, they state that ‘Currently, the effective method of processing experimental data is a neural network method’ following the conference paper by Nikiforov et al., 2020. This statement is too strong; verifying it would certainly require many detailed papers and conferences.

3. Currently, the effective method of processing experimental data is a neural network method

4. The ‘Abstract’ section needs thorough rewriting since the reader is not told how the 132 groups were obtained. On this, the authors simply say ‘A total of 132 groups of data were obtained by field collection and measurements (experimental data)’.

5. Another point is the following. The authors state that ‘The prediction accuracy to individuals is about 70%, and 55% to species’. The question is whether the prediction is made to the individual or to the number of individuals. This is extremely confusing for any reader.

6. In Section 3.2 of the MS, the authors do not make clear what is meant by ‘Grade of the number of individuals’. Also, what is meant by ESM (Expanded Sample Model)? Is this the ‘fully connected layer’ of the CNN model?

7. VRS (very rare species, with 1 or 2 occurrences) are already excluded in virtually all studies since they introduce some noise or uncertainty into the data. However, a one-year data set is expected to have many such occurrences.

8. In general, the MS fails to show clearly the superiority of CNN over MLR, which is a simple and primitive technique. The various functions, such as LeakyReLU, are not explained and, in effect, the ‘black box’ of the model is not explained.

For all the above reasons, I suggest rewriting the MS; the authors should bear in mind all the above, together with other explanations, in order to make the MS clear for all readers.
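
For readers unfamiliar with the LeakyReLU function mentioned in point 8, it is the standard activation that passes positive inputs unchanged and scales negative inputs by a small slope. A minimal sketch follows; the slope of 0.01 is an assumed, commonly used value, not one reported in the manuscript.

```python
def leaky_relu(x, negative_slope=0.01):
    """Return x for positive inputs and negative_slope * x otherwise.

    negative_slope=0.01 is an assumption for illustration only.
    """
    return x if x > 0 else negative_slope * x


# Arbitrary example inputs.
values = [-2.0, -0.5, 0.0, 1.5]
activated = [leaky_relu(v) for v in values]
print(activated)
```

Because negative inputs keep a small non-zero slope, the unit still passes a gradient when its input is negative, which is the usual motivation for preferring it to a plain ReLU.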

Author Response

Dear Reviewers:

Thank you for your letter and for the reviewers' comments concerning our manuscript entitled “Evaluation of water beetle diversity of a stagnant pool using convolutional neural networks” (ID: applsci-1927364). Those comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections and explanations which we hope will meet with approval. All modifications have been left marked in the revised manuscript. The main revisions of the paper and the responses to the reviewer comments are shown in the attachment.

We sincerely appreciate the reviewers' earnest work and hope that the corrections will meet with approval.

Once again, thank you very much for your comments and suggestions.

Kind regards,

Xiaomei Yang

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors develop a comparison between a classical regression model and a 1-D convolutional model for the prediction of beetle species diversity in a specific area of standing water. The problem is interesting with regard to its ultimate objective, although it does not represent a relevant contribution to scientific knowledge: it seeks to improve the accuracy of existing models and does not develop a new methodology. The amount of data is insufficient to draw conclusions from a convolutional model. It is true that the authors themselves present an improvement on the initial approach, consisting of generating a series of synthetic data (even so, the results presented show very high and dissimilar error values between the different months). However, there is no detailed description of how these data were constructed or of the validity of the generated set. There is also no proper explanation of the output called "grade of the number of individuals". It seems like a percentage measure of the individuals in relation to the species to which they belong, but from reading the paper it is not possible to understand what it means nor, therefore, how the data have been generated. On the other hand, the second output, which consists of a single numerical value corresponding to the number of different species, does not represent solid evidence of what the authors stated in the motivation of the article. If we are talking about a measure of the diversity of species, and said diversity is only measured based on the number of species, how then can we extrapolate results? For example, we could find two scenarios where the number of species is the same at two times of the year, but the species are different. In that case we would have an important variation in the species diversity of the ecosystem that would not be reflected in the results obtained by the model.
This reviewer considers that a rethinking of the model to be implemented should be carried out so that it really offers an accurate measure of the variability of species.
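
The scenario described in this report, equal species counts with different species, can be made concrete with a small sketch. The species labels are hypothetical placeholders, and the Jaccard similarity is used here only as one familiar way to compare composition; it is not a measure used or proposed in the manuscript.

```python
def richness(sample):
    """Number of distinct species in a sample."""
    return len(set(sample))


def jaccard_similarity(sample_a, sample_b):
    """Shared species divided by all species seen in either sample."""
    a, b = set(sample_a), set(sample_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical placeholder labels, not species from the study.
spring = ["species_A", "species_B", "species_C"]
autumn = ["species_C", "species_D", "species_E"]

same_richness = richness(spring) == richness(autumn)
similarity = jaccard_similarity(spring, autumn)
print(same_richness, similarity)
```

Both samples have a richness of 3, so a model that outputs only the number of species would treat them as identical, while the composition overlap is just one species out of five.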

Author Response

Dear Reviewers:

Thank you for your letter and for the reviewers' comments concerning our manuscript entitled “Evaluation of water beetle diversity of a stagnant pool using convolutional neural networks” (ID: applsci-1927364). We have studied the comments carefully and have made corrections and explanations which we hope will meet with approval. All modifications have been left marked in the revised manuscript. The main revisions of the paper and the responses to the reviewer comments are shown in the attachment.

We sincerely appreciate the reviewers' earnest work and hope that the corrections and explanations will meet with approval.

Once again, thank you very much for your comments and suggestions.

Kind regards,

Xiaomei Yang

Author Response File: Author Response.docx

Reviewer 3 Report

Dear Authors,

Below is my report on the research article titled "Evaluation of water beetle diversity of a stagnant pool using convolutional neural networks".

In this study, the diversity of water beetles of the Nanshe pool, a large artificial pool dating back more than 30 years without human interference, situated on the Dapeng Peninsula, Shenzhen, Guangdong Province, China, is studied.

Multiple Linear Regression (MLR) and one-dimensional convolutional neural networks (1-D CNN) were used to assess and predict the species and individual diversities of water beetles.

The following eight ecological factors were considered: water temperature (WT), salinity, pH, water depth (WD), ratio area of emergent plants (EP), ratio area of submerged plants (SP), water area (WA), water level (WL). A total of 132 groups of data were obtained by field collection and measurements.

A total of 39 species of aquatic beetles were collected, of which 19 species are assigned to Hydrophilidae, 16 to Dytiscidae, 3 to Noteridae and 1 to Gyrinidae.

Based on only 132 groups of experimental data, the prediction accuracy is about 70% for individuals and 55% for species, using MLR on all 132 groups and 1-D CNN with 112 groups as training data and the other 20 groups for validation and testing.

To improve the accuracy, a new 1-D CNN prediction model was trained on 224 groups of expanded sample data obtained using the above MLR model; it achieves 82.3% for individuals and 80.0% for species.

If species for which only one individual was collected during the study are excluded, the prediction accuracy is 85.2% for individuals and 80.0% for species.

The prediction accuracy is clearly higher using 1-D CNN than MLR for both the number of individuals and the number of species when enough data are available.

1-D CNN is a suitable method to predict the number of species and abundance of aquatic insects based on relevant environmental factors.

This manuscript contains sufficient original and new scientific data.

The introduction, materials and methods, results, discussion and reference list are each sufficient. The statistical analysis was also carried out appropriately.

This manuscript can be published in this journal.

Best regards

Author Response

Dear Reviewers:

Thank you for your letter and for the reviewers' comments concerning our manuscript entitled “Evaluation of water beetle diversity of a stagnant pool using convolutional neural networks” (ID: applsci-1927364). Thank you for taking time out of your busy schedule to carefully review our article and give a comprehensive review.

We sincerely appreciate your earnest work.

Kind regards,

Xiaomei Yang

Round 2

Reviewer 1 Report

The authors skipped saying that CNN does not make any inference about the process of water beetle biodiversity construction. This is acceptable for processes and terms that do not contain any mathematical description, like the ‘nanofibers’ in the paper written by three Iranian scientists, “Kalantary, S., Jahani, A. & Jahani, R. MLR, and ANN Approaches for Prediction of Synthetic/Natural Nanofibers Diameter in the Environmental and Medical Applications. Sci Rep 10, 8117 (2020). https://doi.org/10.1038/s41598-020-65121-x”. However, biodiversity is a sophisticated term that describes the process of co-existence of many species having various ecological roles and responding in a myriad of ways to any biotic or abiotic factor. This is missing from the CNN analysis, and for this the authors rely on MLR to predict the best independent variables for the FNOI and NOS dependent variables. CNN is very good for applying artificial intelligence to image recognition, where each pixel does not have any (evolutionary) history or any kind of biological depth, a situation that differs a lot from biological systems such as that of water beetle biodiversity. That is why we are not interested in the information loss or gain when the ‘Pooling layer is used to reduce the dimensions of the input matrix’ (line 167). What this reduction means in biological terms is absolutely unknown, and it certainly differs from the information loss in principal and other ‘factor analysis’ and ‘ordination methods’.

For all these reasons, I declare my opposition to such approaches, but I agree to the publication of the MS in the ‘Applied Sciences’ journal in its revised form.
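
The dimension reduction questioned in this report can be illustrated with a small sketch of 1-D pooling. Max pooling and a window of 2 are assumptions for illustration; the pooling type and size used in the manuscript are not stated in this report. The eight input values are made up and merely mirror the count of eight ecological factors.

```python
def max_pool_1d(values, pool_size=2):
    """Non-overlapping 1-D max pooling; a trailing partial window is dropped."""
    return [
        max(values[i:i + pool_size])
        for i in range(0, len(values) - pool_size + 1, pool_size)
    ]


# Eight made-up feature values, for illustration only.
features = [0.2, 0.9, 0.4, 0.1, 0.7, 0.3, 0.5, 0.8]
pooled = max_pool_1d(features)
print(pooled)
```

Each output keeps only the largest value in its window, so the smaller neighbour is discarded. This is the information loss the reviewer refers to, and it has no direct ecological interpretation.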

Author Response

Dear Reviewers:

Thank you for your letter and for the reviewers' comments concerning our manuscript entitled “Evaluation of water beetle diversity of a stagnant pool using convolutional neural networks” (ID: applsci-1927364). Thank you for taking time out of your busy schedule to carefully review our article and give a comprehensive review.

We sincerely appreciate your earnest work.

Kind regards,

Xiaomei Yang

Author Response File: Author Response.docx

Reviewer 2 Report

Even when the explanations contained within the cover letter are really helpful, there are some aspects regarding the paper that could be discussed:

For example, it is true that synthetic data generation is needed in order to improve the performance of the proposal. Nevertheless, the method used for generating these data is questionable. Several proposals for synthetic data generation can be found in the literature, and it would be highly desirable to study and use them.

Also, I don't understand the role of the output regarding "The individuals of water beetles were graded based on the proportion of each sample to all number of the pool collected each sample". Does it imply that, for example, if 1000 individuals were collected in a specific sample, and 200 of them belong to the most frequent species, the output is going to be grade 2?

Even though I think that the paper could be interesting, I think that there are several methodological issues that should be revised.
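
The reading proposed in this question can be written out as simple arithmetic. The sketch below encodes only the reviewer's hypothetical interpretation (share of the sample mapped onto ten equal steps); the function name `grade_from_share` and the ten-step scale are assumptions, and this is not the authors' definition of the grade.

```python
def grade_from_share(species_count, sample_total, grades=10):
    """Hypothetical mapping: share of the sample, floored onto 0..grades."""
    if sample_total <= 0:
        raise ValueError("sample_total must be positive")
    # Integer arithmetic avoids floating-point rounding at step boundaries.
    return (species_count * grades) // sample_total


# The reviewer's example: 200 of 1000 individuals is a 20% share.
grade = grade_from_share(200, 1000)
print(grade)
```

Under this reading, 200 of 1000 individuals gives grade 2, which is the value the reviewer asks the authors to confirm or correct.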

Author Response

Dear Reviewers:

Thank you for your letter and for the reviewers' comments concerning our manuscript entitled “Evaluation of water beetle diversity of a stagnant pool using convolutional neural networks” (ID: applsci-1927364). Thank you for taking time out of your busy schedule to carefully review our article and give a comprehensive review.

We sincerely appreciate your earnest work.

Kind regards,

Xiaomei Yang

Author Response File: Author Response.docx