Article
Peer-Review Record

Unsupervised Machine Learning and Data Mining Procedures Reveal Short Term, Climate Driven Patterns Linking Physico-Chemical Features and Zooplankton Diversity in Small Ponds

Water 2021, 13(9), 1217; https://doi.org/10.3390/w13091217
by Nicolò Bellin *, Erica Racchetti, Catia Maurone, Marco Bartoli and Valeria Rossi
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 17 March 2021 / Revised: 20 April 2021 / Accepted: 26 April 2021 / Published: 28 April 2021
(This article belongs to the Special Issue Species Richness and Diversity of Aquatic Ecosystems)

Round 1

Reviewer 1 Report

The overall objectives are clear; however, I have some questions that need to be addressed in a revision.

1) The motivation for using fuzzy sets is still a bit unclear to me. It should be compared to other approaches in the literature, and the benefits need to be presented more clearly.

2) Using machine learning for monitoring of microorganisms in environmental samples is not new; it has already been used in some studies. While traditional studies count organisms using a microscope, novel approaches also take deep sequencing into account. Please discuss the use of ML + sequencing in contrast to ML + microscope in more detail. See, e.g., pubmed:33682183 as an example.

3) English needs to be improved. There are many grammar and typing errors.

Author Response

Comments and Suggestions for Authors

The overall objectives are clear; however, I have some questions that need to be addressed in a revision.

 

1) The motivation for using fuzzy sets is still a bit unclear to me. It should be compared to other approaches in the literature, and the benefits need to be presented more clearly.

We rewrote a sentence in the Introduction and added further references (see below).

Clusters with fuzzy boundaries better reflect the continuous character of ecological features. In our study, shallow ponds represented a system with high inter-annual variability in water chemistry and zooplankton community. The uncertainty that characterizes this kind of habitat was described by an unsupervised machine learning algorithm (fuzzy c-means), which is a helpful tool for partitioning ecological data into clusters with similar properties.

We added a sentence in the introduction.

 

2) Using machine learning for monitoring of microorganisms in environmental samples is not new; it has already been used in some studies. While traditional studies count organisms using a microscope, novel approaches also take deep sequencing into account. Please discuss the use of ML + sequencing in contrast to ML + microscope in more detail. See, e.g., pubmed:33682183 as an example.

We added a sentence in the introduction:

“In the study of microbial communities, Sperlea [] used a machine learning-based framework for the quantification of the covariation between the microbiome and 27 environmental variables of lake ecosystems. Suppa et al. [] applied Random Forest models to identify correlations between transcriptome and Daphnia magna microbiome changes”.

 

3) English needs to be improved. There are many grammar and typing errors.

We did it. 

Reviewer 2 Report

This article deals with the factors influencing the assemblage composition and distribution patterns of several zooplankton taxa in a series of ponds in a Mediterranean country, Italy. The authors use the fuzzy c-means algorithm to classify the ponds according to the taxa they support and to define the influence of several physico-chemical parameters on the assemblage patterns of the taxa. Although their analysis is based on data covering a relatively short period, their results are interesting. Unsupervised machine learning and data mining techniques, together with data fuzzification, prove efficient and adequate for dealing with freshwater biological questions. In general, the manuscript is well written, and the presentation of the results is satisfactory. In this context, I propose some minor comments and corrections to improve the manuscript.

General comments

I propose shortening the abstract; I find it too long, with many details and introductory material that should be deleted. The length should be 10-15 lines, focusing on the methodology you followed and what your findings are.

The conclusions should be a take-home message, highlighting how your study progressed beyond the state of the art and why the findings are intriguing.

Line-specific comments

Line 25: you can replace “chlorophyll a” with “chlorophyll-a” whenever you mention this specific word.

Line 42: “stochasticity” instead of “stocasticity”

Line 53: You can add a relevant paper on supervised machine learning algorithms in freshwater ecosystems after the 10th reference. “Mellios, N.; Moe, S.J.; Laspidou, C. Machine Learning Approaches for Predicting Health Risk of Cyanobacterial Blooms in Northern European Lakes. Water 2020, 12, 1191. https://doi.org/10.3390/w12041191”

Line 68: Instead of comma use a semicolon.

Line 141: m² instead of m2. In general, check for similar mistakes throughout the whole manuscript.

Figures 1 and 7 are not so clear for the reader, so I propose to replace them with clearer pics.

To this end, I recommend the manuscript for publication with minor revisions as proposed above.

Author Response

General comments

 

I propose shortening the abstract; I find it too long, with many details and introductory material that should be deleted. The length should be 10-15 lines, focusing on the methodology you followed and what your findings are.

We shortened the abstract and added a graphical abstract.

 

The conclusions should be a take-home message, highlighting how your study progressed beyond the state of the art and why the findings are intriguing.

We shortened and modified the conclusions.

 

Line-specific comments

 

Line 25: you can replace “chlorophyll a” with “chlorophyll-a” whenever you mention this specific word.

We did it

 

Line 42: “stochasticity” instead of “stocasticity”

We did it

 

 

Line 53: You can add a relevant paper on supervised machine learning algorithms in freshwater ecosystems after the 10th reference. “Mellios, N.; Moe, S.J.; Laspidou, C. Machine Learning Approaches for Predicting Health Risk of Cyanobacterial Blooms in Northern European Lakes. Water 2020, 12, 1191. https://doi.org/10.3390/w12041191”

We did it

 

Line 68: Instead of comma use a semicolon.

We did it

Line 141: m² instead of m2. In general, check for similar mistakes throughout the whole manuscript.

Ok, we checked and did it

Figures 1 and 7 are not so clear for the reader, so I propose to replace them with clearer pics.

OK, thank you for the suggestion; we redrew Figures 1 and 7.

 

To this end, I recommend the manuscript for publication with minor revisions as proposed above.

Reviewer 3 Report

This manuscript proposes a procedure for applying ML techniques to heterogeneous ecological data. The authors provide a case study of an unsupervised machine learning application with ecological data to characterize and cluster Italian ponds based on relevant physicochemical and environmental features and zooplankton diversity, considering different geographical sources and a 2-year time evolution. The computational protocol includes fuzzy c-means and association rules applied to evaluate the underlying factors affecting the composition and distribution of different zooplankton taxa in the selected ponds. The study was conducted professionally and the results enrich the portfolio of ML in ecology. However, the results need to be made clearer and more objective to improve understanding of the study. The results would be very informative if these changes were made. Therefore, I recommend a revision before the manuscript can be accepted for publication in the journal Water.

 

In particular, this ms. misses the innovation and novelty of the work presented. It is unclear whether the authors present an overarching methodology, an ecological problem-solving ML task, or both.

The introduction should identify specific knowledge gaps in the existing literature.

It also discusses a specific case study without generality and, more importantly, without contrasting it with existing approaches. Is there anything specific to ecology that prevents authors from using standard unsupervised and supervised ML methods (e.g., Hierarchical Clustering, PCA, decision trees, neural networks, etc.)? Please comment on this.

 

Minor concerns:

1- Line 22: please clarify this vague sentence: "Results indicate that machine-learning approaches deliver consistent, accurate, and stable results."

2- The ms. would benefit from a schematic illustration of the proposed analysis procedure.

3- How many variables were collected and selected? How was the impact of dimensionality reduction evaluated? Please clarify.

4- Figure 2 (PCA) should include both scores and loadings, i.e., a biplot representation to make the interpretation and visualization of the ponds and most discriminating features in each year more intuitive and clear.

5 - Please explain how the authors handled the validation of the results.

Author Response

In particular, this ms. misses the innovation and novelty of the work presented. It is unclear whether the authors present an overarching methodology, an ecological problem-solving ML task, or both.

We re-wrote a sentence in the introduction (see below).

The introduction should identify specific knowledge gaps in the existing literature.

In the introduction, we cited further references as suggested by Reviewers 1 and 2.

It also discusses a specific case study without generality and, more importantly, without contrasting it with existing approaches. Is there anything specific to ecology that prevents authors from using standard unsupervised and supervised ML methods (e.g., Hierarchical Clustering, PCA, decision trees, neural networks, etc.)? Please comment on this.

Actually, fuzzy c-means is a standard method of unsupervised learning. With unsupervised learning, no response variable is specified; algorithms identify patterns or groupings in the data based on its structure alone. Clusters with fuzzy boundaries better reflect the continuous character of ecological features. In our study, shallow ponds represented a system with high inter-annual variability in water chemistry and zooplankton community. The uncertainty that characterizes this kind of habitat was described by this unsupervised machine learning algorithm (fuzzy c-means), which is a helpful tool for partitioning ecological data into clusters with similar properties.

We re-wrote a sentence in the introduction.
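
For readers unfamiliar with the method named in this response, the textbook formulation of fuzzy c-means is summarized below. This is a sketch of the standard algorithm, not a statement of the authors' exact implementation; the symbols (N observations x_i, c cluster centers v_k, memberships u_ik, fuzzifier m > 1) are the usual ones and are assumptions here.

```latex
% Objective minimized by fuzzy c-means (memberships of each observation sum to 1)
J_m(U, V) = \sum_{i=1}^{N} \sum_{k=1}^{c} u_{ik}^{\,m} \, \lVert x_i - v_k \rVert^2,
\qquad \sum_{k=1}^{c} u_{ik} = 1, \quad u_{ik} \in [0, 1], \quad m > 1

% Alternating updates: cluster centers, then memberships
v_k = \frac{\sum_{i=1}^{N} u_{ik}^{\,m} \, x_i}{\sum_{i=1}^{N} u_{ik}^{\,m}},
\qquad
u_{ik} = \left[ \sum_{j=1}^{c}
  \left( \frac{\lVert x_i - v_k \rVert}{\lVert x_i - v_j \rVert} \right)^{\frac{2}{m-1}}
\right]^{-1}
```

As m approaches 1 the memberships approach a hard (k-means-like) partition; larger m gives softer cluster boundaries, which is the property the response appeals to for continuous ecological gradients.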

 

Minor concerns:

 

1- Line 22: please clarify this vague sentence: "Results indicate that machine-learning approaches deliver consistent, accurate, and stable results."

As suggested by Reviewer 2, we shortened the abstract and deleted the sentence.

2- The ms. would benefit from a schematic illustration of the proposed analysis procedure.

We produced a graphical abstract to clarify the proposed analysis. Thank you for the suggestion.

3- How many variables were collected and selected? How was the impact of dimensionality reduction evaluated? Please clarify.

Twelve variables were collected, as reported in Table S2. Of these, 9 were selected in 2014 and 11 in 2015 according to the VIF analysis, as reported in Table 1. We added a sentence.
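
The variable selection described in this response can be illustrated with a minimal sketch of a variance inflation factor (VIF) screen. The function names, the stepwise dropping strategy, and the threshold of 10 are assumptions chosen for illustration; the response does not state which threshold or software the authors used.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R_j^2) for each column j of X.

    R_j^2 comes from regressing column j on all other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        # 1 / (1 - R^2) equals ss_tot / ss_res; guard against a perfect fit
        vifs[j] = np.inf if ss_res <= 1e-12 * ss_tot else ss_tot / ss_res
    return vifs

def drop_collinear(X, names, threshold=10.0):
    """Repeatedly drop the variable with the largest VIF until all are <= threshold.

    The threshold of 10 is a common rule of thumb, assumed here.
    """
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        vifs = variance_inflation_factors(X)
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return X, names
```

Running such a screen separately on each year's data would explain how the number of retained variables can differ between years, since collinearity among the same variables can change from one sampling year to the next.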

 

4- Figure 2 (PCA) should include both scores and loadings, i.e., a biplot representation to make the interpretation and visualization of the ponds and most discriminating features in each year more intuitive and clear.

We modified Figure 2. We added right-hand panels to avoid superimposition and to show the variables’ loadings at a proper scale.

 

5 - Please explain how the authors handled the validation of the results.

As reported in Materials and Methods:

The quality of the clustering procedure is evaluated with a function that is maximized or minimized according to the number of clusters c [58]. These procedures indicate how well the algorithm fitted the data structure (the cluster validity problem). The most common measures for this task are the partition coefficient (PC) and the partition entropy (PE). In this study, fuzzy c-means was applied to the environmental dataset. The environmental features were standardized and a grid search procedure was used: the fuzzy c-means was run multiple times. For each run, a combination of the parameter c (number of clusters) and the fuzzifier exponent m was set. The best partition was selected according to the maximum value of PC or, alternatively, the minimum value of PE. To improve the clustering procedure, the algorithm was randomly initialized 50 times for each run.

In the revised version of the manuscript, we added “The evaluation of the quality of the clustering was made considering the best partition of the maximum value of partition coefficient (PC) and the minimum value of partition entropy (PE). In 2014 and 2015, the maximum value of PC, 0.68 and 0.63, respectively, and the minimum value of PE, 0.49 and 0.56, respectively, were obtained for c = 2.”
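
The validation procedure quoted above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code: the function names, the grid values for c and m, the convergence settings, and the use of the natural logarithm in PE are all assumptions. PC is computed as the mean of the summed squared memberships and PE as the mean of -u ln u summed over clusters.

```python
import numpy as np

def fuzzy_c_means(X, c, m, n_init=50, max_iter=300, tol=1e-6, seed=0):
    """Run fuzzy c-means n_init times; return (centers, U) of the lowest-objective run."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        # Random initial memberships, shape (n, c), each row summing to 1
        U = rng.dirichlet(np.ones(c), size=len(X))
        for _ in range(max_iter):
            W = U ** m
            centers = (W.T @ X) / W.sum(axis=0)[:, None]
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            d2 = np.fmax(d2, 1e-12)  # avoid division by zero
            inv = d2 ** (-1.0 / (m - 1.0))
            U_new = inv / inv.sum(axis=1, keepdims=True)
            converged = np.abs(U_new - U).max() < tol
            U = U_new
            if converged:
                break
        objective = ((U ** m) * d2).sum()
        if best is None or objective < best[0]:
            best = (objective, centers, U)
    return best[1], best[2]

def partition_coefficient(U):
    """PC: ranges from 1/c (fully fuzzy) to 1 (crisp); higher is better."""
    return float((U ** 2).sum() / U.shape[0])

def partition_entropy(U):
    """PE with natural log: ranges from 0 (crisp) to ln(c); lower is better."""
    return float(-(U * np.log(np.fmax(U, 1e-12))).sum() / U.shape[0])

def grid_search(X, cs=(2, 3, 4, 5), ms=(1.5, 2.0, 2.5), n_init=50):
    """Standardize X, then score every (c, m) pair; the grid values are assumed."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    results = []
    for c in cs:
        for m in ms:
            _, U = fuzzy_c_means(Z, c, m, n_init=n_init)
            results.append((c, m, partition_coefficient(U), partition_entropy(U)))
    best_pc = max(results, key=lambda r: r[2])  # maximum PC
    best_pe = min(results, key=lambda r: r[3])  # minimum PE
    return results, best_pc, best_pe
```

Each entry of `results` is a tuple `(c, m, PC, PE)`, so the selected partition can be read directly from `best_pc` or `best_pe`. Both indices tend to favor fewer clusters, which is one reason they are usually reported together with the chosen c rather than used as the sole evidence of cluster structure.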

Round 2

Reviewer 1 Report

The authors addressed my concerns adequately. I have only some minor comments left.

Figure 6 looks a bit squeezed and the text on the right side is a bit small and distorted.

Same for Figure 2.

Author Response

Figure 6 looks a bit squeezed and the text on the right side is a bit small and distorted.

Same for Figure 2.

We resized and redrew the right side of Figure 2 and Figure 6 (see attached file).

Author Response File: Author Response.pdf
