Article
Peer-Review Record

Machine Learning Applications of Convolutional Neural Networks and Unet Architecture to Predict and Classify Demosponge Behavior

Water 2021, 13(18), 2512; https://doi.org/10.3390/w13182512
by Dominica Harrison 1,2, Fabio Cabrera De Leo 2,3, Warren J. Gallin 1, Farin Mir 1, Simone Marini 4,5 and Sally P. Leys 1,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 5 August 2021 / Revised: 27 August 2021 / Accepted: 6 September 2021 / Published: 13 September 2021
(This article belongs to the Special Issue Pattern Analysis, Recognition and Classification of Marine Data)

Round 1

Reviewer 1 Report

The manuscript "Machine learning applications of Convolutional Neural Networks and Unet architecture to predict and classify demosponge behavior" is within the scope of Water and can be considered for publication, after going through Minor Revision.
- Insert research hypotheses and objectives clearly in the Abstract and at the end of the Introduction.
- It is necessary to make clear at the end of the Introduction what the novelty of this research is in relation to other similar studies. What scientific gap does it fill?
- The authors presented only one accuracy metric in the manuscript: the confusion matrix in Figure 4. I would like to see other metrics, such as the apparent error rate (as a percentage). This would make the accuracy of the tested CNNs clearer to the reader.

Author Response

The manuscript "Machine learning applications of Convolutional Neural Networks and Unet architecture to predict and classify demosponge behavior" is within the scope of Water and can be considered for publication, after going through Minor Revision.
- Insert research hypotheses and objectives clearly in the Abstract and at the end of the Introduction.


Response: We appreciate this comment from the reviewer. This work did not test hypotheses, however; rather, it attempts to formulate a method for describing an object that changes over time. Had we used hypotheses, we would have applied statistics to test them, which was not done here. We have, however, clarified the objectives of the study in both the abstract and the introduction.

- We added to the description of our objectives in the abstract (lines 20-25):

“This study is a first step towards analysing trends in the behavior of a demosponge in an environment that experiences severe seasonal and inter-annual changes in climate. The end objective will be to correlate changes in sponge size (activity) over seasons and years with environmental variables collected from the same observatory platform.”


To the introduction we added the following text (lines 70-76). Three additional citations need to be added after the word 'methods'. Because the version of the manuscript we were provided with is already formatted, we list those citations here for the copy editor to add.

“This investigation focuses on understanding the behavior of a sponge that has been monitored over three years by cameras at the Ocean Networks Canada Folger Observatory. Our goal is to automate the classification of sponge activity over time by using machine learning to predict the changes in its size. Previous analyses of sponge behavior have largely used software such as ImageJ with few attempts to use machine learning methods (Nickel, 2004; Leys et al 2019; Kahn et al 2020). No other work has generated an automated method that can extract complex coloured objects from a changing coloured background over time.”

- It is necessary to make clear at the end of the Introduction what the novelty of this research is in relation to other similar studies. What scientific gap does it fill?

We have added text to indicate the novelty, and we insert three references that show how previous work quantified changes in the behaviour of sponges. The additional text is below (lines 94-99):

“In this investigation we aim to isolate the image of a sponge from a complex background in which the sponge and background change over seasons and years. We have two objectives: we first clearly outline a methodology using convolutional neural networks with large image datasets that is effective for use by other biologists and ecologists; next, we apply this approach to study the behavior of a sponge in situ to provide an example of how this type of machine learning is applicable to a range of image sets.”

- The authors presented only one accuracy metric in the manuscript: the confusion matrix in Figure 4. I would like to see other metrics, such as the apparent error rate (as a percentage). This would make the accuracy of the tested CNNs clearer to the reader.

Response: We reported three types of validation to test the performance of the models and elaborate on how they describe error rate on lines 322-324: "The accuracy is the mean percentage of correct predictions, the dice coefficient score is the precision of the model, or the amount of overlap between the actual class and the predictions, and the loss is the mean error rate."
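For concreteness, here is a minimal sketch (our illustration, not the authors' code) of how pixel accuracy and the Dice coefficient can be computed for a predicted binary segmentation mask, assuming NumPy arrays of 0/1 pixel labels:

```python
import numpy as np

def pixel_accuracy(y_true, y_pred):
    """Mean fraction of pixels whose predicted label matches the ground truth."""
    return float(np.mean(y_true.astype(bool) == y_pred.astype(bool)))

def dice_coefficient(y_true, y_pred):
    """Dice score 2|A∩B| / (|A| + |B|): overlap between prediction and label."""
    a, b = y_true.astype(bool), y_pred.astype(bool)
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else 2.0 * np.logical_and(a, b).sum() / total
```

The loss reported during training is framework-dependent; for a sigmoid segmentation output it is typically binary cross-entropy.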

Validation is also discussed in Section 2.3.3 (lines 320-329), in Section 3.4, and again in the discussion on lines 646-652. These mechanisms of testing error are well established in the machine learning literature. We have added the following citation to support this: Fawcett T. 2006. An introduction to ROC analysis. Pattern Recognition Letters 27(8):861-874.

Reviewer 2 Report

  1. Is the Unet used in the paper 2D or 3D? Figure 2 shows a 2D-Unet architecture.
  2. Figure 3 appears before Figure 2.
  3. Why is 40% of the data used for training? The percentage of training data is lower than that of testing data.
  4. The image data all serve one purpose: to identify Belinda. Why are the image data used to train 6 models instead of 1 model?
  5. For TL2, there are only 46 images for training. The outcome, of course, would not be good.

Author Response

  1. Is the Unet used in the paper 2D or 3D? Figure 2 shows a 2D-Unet architecture.

Response: The Unet used was 2D, which we clarify on line 220.
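To illustrate what a 2D Unet looks like, below is a minimal sketch in tf.keras. This is our illustration only: the depth, filter counts, and input size are placeholders, not the authors' actual architecture, which is described in the manuscript.

```python
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    """Two 3x3 convolutions with ReLU, the basic U-Net building block."""
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet_2d(input_shape=(256, 256, 3)):
    """Minimal 2-level 2D U-Net producing a single-channel sponge/background mask."""
    inputs = layers.Input(shape=input_shape)
    c1 = conv_block(inputs, 16)                              # encoder level 1
    p1 = layers.MaxPooling2D()(c1)
    c2 = conv_block(p1, 32)                                  # encoder level 2
    p2 = layers.MaxPooling2D()(c2)
    b = conv_block(p2, 64)                                   # bottleneck
    u2 = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(b)
    c3 = conv_block(layers.concatenate([u2, c2]), 32)        # skip connection
    u1 = layers.Conv2DTranspose(16, 2, strides=2, padding="same")(c3)
    c4 = conv_block(layers.concatenate([u1, c1]), 16)        # skip connection
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)  # per-pixel probability
    return Model(inputs, outputs)
```

All operations here are 2D (Conv2D, MaxPooling2D), which is what distinguishes a 2D from a 3D U-Net.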


  2. Figure 3 appears before Figure 2.

Response: We now refer to Figure 2 first, and it appears before Figure 3 in the formatted manuscript.


  3. Why is 40% of the data used for training? The percentage of training data is lower than that of testing data.

Response: We originally used the Matlab Image Labeler, which recommends using a minimum of 30% of the data for training. We used 40% because it took a great deal of time to manually delineate each mask. We tested the performance using that method and found it very good where the number of images was high, so we kept 40% for all image series. In fact, using a smaller training set makes our performance evaluation more conservative and helps prevent overfitting. The results could have been better with a smaller testing set; however, we found the 97% accuracy achieved in some of our image series to be very high.
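As an illustration of such a split (a hypothetical sketch, not the authors' pipeline), a 40/60 train/test partition of a time-lapse series could be drawn as:

```python
import random

def split_series(frames, train_fraction=0.4, seed=0):
    """Randomly assign ~40% of a time-lapse series to training, the rest to testing."""
    rng = random.Random(seed)
    indices = list(range(len(frames)))
    rng.shuffle(indices)
    n_train = int(len(frames) * train_fraction)
    train = [frames[i] for i in indices[:n_train]]
    test = [frames[i] for i in indices[n_train:]]
    return train, test
```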


  4. The image data all serve one purpose: to identify Belinda. Why are the image data used to train 6 models instead of 1 model?

Response: We used images from each time-lapse series to train the model for that series, and it is true that performance was lower for the TL series with the fewest images (TL6). We chose this approach because the images differ with season and year, and we found that performance was best when each model was trained only on images from the TL series its training set came from. As we point out in the discussion, a next step in analysing the behaviour of the sponge could be to use the entire dataset. However, as no changes were requested, none were made.
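A per-series training loop along these lines (a hypothetical sketch reusing the build_unet_2d placeholder above; the data layout, hyperparameters, and function names are illustrative, not the authors' code) could look like:

```python
def train_per_series(series_data):
    """series_data: {series_name: (images, masks)} of float32 arrays in [0, 1].
    Trains an independent segmentation model for each time-lapse series."""
    models = {}
    for name, (images, masks) in series_data.items():
        model = build_unet_2d(input_shape=images.shape[1:])
        model.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=["accuracy"])
        model.fit(images, masks, epochs=10, batch_size=4, verbose=0)
        models[name] = model
    return models
```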

  5. For TL2, there are only 46 images for training. The outcome, of course, would not be good.

Response: This comment is similar to the previous one. Although TL2 had only 46 images for training, the model performed quite well. The results for the TL6 model were the lowest, and they still performed extremely well. We could, in another iteration, try applying all images to one model, but that is outside the scope of this paper. No changes to the text were made.
