Peer-Review Record

An Improved Mask RCNN Model for Segmentation of ‘Kyoho’ (Vitis labruscana) Grape Bunch and Detection of Its Maturity Level

Agriculture 2023, 13(4), 914; https://doi.org/10.3390/agriculture13040914
by Yane Li 1,2,3,†, Ying Wang 1,2,3,†, Dayu Xu 1, Jiaojiao Zhang 4 and Jun Wen 5,*
Submission received: 2 April 2023 / Revised: 17 April 2023 / Accepted: 19 April 2023 / Published: 21 April 2023
(This article belongs to the Section Digital Agriculture)

Round 1

Reviewer 1 Report

Line 92: what is "image method"? Do you mean image processing method?

The research goal must be provided at the end of the Introduction.

Line 155: Add the scientific name of ’Kyoho’ grapes.

Lines 158-159: The meaning is unclear. The sentence must be rewritten.

Provide the model, company, and country of the mobile phone.

Line 170: “... with bunches of 163, 170 214, 204 and 75 respectively” can be “… with 163, 214, 204 and 75 bunch samples, respectively”.

In which software were image processing and data analysis done?

Line 188: “is” must be changed to “was”.

Introduce mAP and mAP0.75 for all tables (at the bottom of the tables).

The results must be compared with the results of previous research.

Some corrections must be done.

Author Response

Yane Li, Ph.D.

College of Mathematics and Computer Science

Zhejiang A&F University

Hangzhou 311300

P.R. China

Email: [email protected]

April 17, 2023

Re: Manuscript ID 19-2636 

Dear Editor,

Thank you very much for your letter dated April 10, 2023. We would like to thank the reviewers for their helpful comments and suggestions regarding our manuscript entitled “An Improved Mask RCNN Model for Segmentation of ‘Kyoho’ Grape Bunch and Detection of its Maturity Level”. We have addressed the issues raised by the reviewers point-by-point and revised our manuscript accordingly. Please find our responses to the reviewers’ specific comments and suggestions below.

We hope our manuscript is now suitable for publication in Agriculture. Thank you very much for your time and consideration. We look forward to hearing from you soon.

Yours sincerely,

Yane Li, Ph.D.

Response to Reviewers

We sincerely appreciate the reviewers’ constructive comments and suggestions. We have carefully addressed each of the comments, and our specific responses to each individual question or comment are as follows.

Response to Reviewer 1

Comment 1. Line 92: what is "image method"? Do you mean image processing method?

Response: Thanks for the reviewer’s valuable comment. The “image method” here means “image processing and computational intelligence methods”. We have revised it in the revised manuscript.

Comment 2. The research goal must be provided at the end of the Introduction.

Response: Thanks for the reviewer’s helpful comment. The goal of this research is to investigate an improved Mask RCNN-based algorithm that adds an attention mechanism module to establish a grape bunch segmentation and maturity level detection model. This model is of practical significance for judging the maturity of ‘Kyoho’ grapes in AI-assisted grape picking.

We have added this accordingly, as shown in the highlighted text at the end of the Introduction in the revised manuscript.

Comment 3. Line 155: Add the scientific name of ’Kyoho’ grapes.

Response: Thanks for the reviewer’s helpful suggestion. ‘Kyoho’ (Vitis labruscana) is a very important cultivar that originated in Japan.

We have added the scientific name of ’Kyoho’ grapes (Vitis labruscana) in the revised manuscript.

Comment 4. Lines 158-159: The meaning is unclear. The sentence must be rewritten.

Response: Thanks for the reviewer’s valuable comment. In this sentence we intended to describe the environment of the vineyard where the images were collected. We have rewritten it as “In this study, all ’Kyoho’ grape images were collected in July 2022 from a ’Kyoho’ grape base with good light transmission in Pujiang County, Zhejiang Province, China.” in the revised manuscript.

Comment 5. Provide the model, company, and country of the mobile phone.

Response: Thanks for the reviewer’s helpful suggestion. RGB images of ’Kyoho’ grapes were obtained using a mobile phone (HUAWEI P40, Huawei Technologies Co., Ltd., China), which has a 50-megapixel camera. The phone model, company, and country have been added to the revised manuscript.

Comment 6. Line 170: “... with bunches of 163, 170 214, 204 and 75 respectively” can be “… with 163, 214, 204 and 75 bunch samples, respectively”.

Response: Thanks for the reviewer’s helpful suggestion. The sentence “Grape bunches were divided into four groups according to the maturity levels based on the color of bunch skin, from maturity level 1 to maturity level 4, with bunches of 163, 214, 204 and 75 respectively.” has been revised to “Grape bunches were divided into four groups according to the maturity levels based on the color of bunch skin, from maturity level 1 to maturity level 4, with 163, 214, 204 and 75 bunch samples, respectively.” in the revised manuscript.

Comment 7. In which software were image processing and data analysis done?

Response: Thanks for the reviewer’s helpful suggestion. PyCharm (version 2022.2) was used for image processing and data analysis. We have added this in the “Experimental environment and parameter settings” section of the revised manuscript.

Comment 8. Line 188: “is” must be changed to “was”.

Response: Thanks for the reviewer’s helpful suggestion. We have revised it in the revised manuscript.

Comment 9. Introduce mAP and mAP0.75 for all tables (at the bottom of the tables).

Response: Thanks for the reviewer’s valuable comment. mAP is the mean average precision; mAP0.75 is the mean average precision when the intersection-over-union (IoU) threshold is set to 0.75. We have added these definitions at the bottom of Tables 2-6 in the revised manuscript.
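For reference, a compact statement of these metrics under the standard COCO-style definitions (assumed here; not quoted from the manuscript):

```latex
% Per-class average precision AP_c is the area under the
% precision-recall curve p_c(r); mAP averages AP_c over the C classes.
% mAP_{0.75} is the same average with the IoU threshold for counting
% a detection as correct fixed at 0.75.
\mathrm{AP}_c = \int_0^1 p_c(r)\,\mathrm{d}r,
\qquad
\mathrm{mAP} = \frac{1}{C}\sum_{c=1}^{C} \mathrm{AP}_c
```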

Comment 10. The results must be compared with the results of previous research.

Response: Thanks for the reviewer’s valuable comment. In previous studies, Solov2, Yolov3, Yolact, and Mask RCNN-based deep learning methods have commonly been used for target recognition. To compare the performance of models established with these methods, we built models with Solov2, Yolov3, Yolact, and Mask RCNN on the same dataset collected in this study. Results are shown in Table 2 in Section “3.2. Performance comparison between models established with different CNN networks”. In addition, we also compared the performance of models established with Mask RCNN ResNet50/101 combined with the three attention mechanisms SE, CBAM, and CA, respectively. Results are shown in Tables 3 and 5 in Section “3.3. Performance comparison of models established by combining Mask RCNN with different attention mechanisms”.

Author Response File: Author Response.docx

Reviewer 2 Report

See the attachment 

Comments for author File: Comments.pdf


Author Response


Response to Reviewer 2

Comment 1. The abstract is written well, with enough information to understand what is presented in the manuscript. However, the error rate is missing from the results section of the abstract.

Response: Thanks for the reviewer’s valuable comment. The error rate of this model was 5.6%, which is lower than that of the ResNet 101-based model. We have added it to the results section of the abstract in the revised manuscript.

Comment 2. The contribution of the article is missing in the introduction. The author should show the contribution of the manuscript in bullet points.

Response: Thanks for the reviewer’s valuable comment. The main contributions of this work are as follows. First, we collected a dataset containing grape bunches at four maturity levels, captured from different views and against different backgrounds in the real-world environment of the vineyard. Second, we designed an improved segmentation and classification model by combining Mask RCNN with the coordinate attention mechanism, which achieves higher precision for segmenting grape bunches and evaluating their maturity level. The mean average precision (mAP), mAP0.75, and average accuracy of the model reached 0.934, 0.891, and 0.944, respectively. In designing this model, we compared the performance of models established with YoloV3, Solov2, Yolact, and Mask RCNN to select the backbone network. Then, three attention mechanism modules, Squeeze-and-Excitation (SE), Convolutional Block Attention Module (CBAM), and Coordinate Attention (CA), were introduced into the backbone network of Mask RCNN, respectively. In addition, the performance of models constructed with ResNet 50 and ResNet 101 network layers combined with these attention mechanism modules was compared. The experimental results show that the segmentation and classification abilities of the proposed model are higher than those of the above models. Finally, feature visualization was analyzed.

We have added this in the Introduction of the revised manuscript.
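As background on the attention modules named in this response, the following is a minimal PyTorch sketch of the simplest of the three, the Squeeze-and-Excitation (SE) block; the reduction ratio of 16 and the example channel count are common defaults, not values confirmed by the paper.

```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention (Hu et al., CVPR 2018)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Squeeze: collapse each channel's feature map to a single value.
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Excitation: a bottleneck MLP produces per-channel weights in (0, 1).
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # reweight channels, leaving the spatial layout intact


# Example: reweight a 256-channel feature map from a ResNet stage.
feat = torch.randn(2, 256, 64, 64)
assert SEBlock(256)(feat).shape == feat.shape
```

CBAM adds a spatial-attention branch on top of this kind of channel attention, while coordinate attention replaces the single global pooling with separate pooling along the height and width axes so that positional information is retained.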

Comment 3. The research goal must be provided at the end of the Introduction. The literature is not sufficient to cover the said area; many of the latest references have not been mentioned. The author should include the latest literature in the manuscript and highlight its contribution. Some of the studies are as follows:

An Artificial Intelligence-Based Stacked Ensemble Approach for Prediction of Protein Subcellular Localization in Confocal Microscopy Images; Fruit Image Classification Model Based on MobileNetV2 with Deep Transfer Learning Technique; Enhancing Image Annotation Technique of Fruit Classification Using a Deep Learning Approach; Smart seed classification system based on MobileNetV2 architecture.

Response: Thanks for the reviewer’s valuable comment. The goal of this research is to develop a model that can not only accurately segment the grape bunch but also evaluate its maturity, which has practical significance for judging the maturity of ‘Kyoho’ grapes in AI-assisted picking. In addition, the model can judge the maturity level of ’Kyoho’ grapes and provide a basis for further evaluating the time required for ripening, avoiding the waste caused by picking too early or too late. In this study, an improved Mask RCNN-based algorithm with an added attention mechanism module was investigated to establish a grape bunch segmentation and maturity level detection model. We first collected a dataset of grape bunches at different maturity levels, captured from different angles and backgrounds in natural environments. Then, an improved grape bunch segmentation and maturity evaluation model was proposed by combining the Mask RCNN network with an attention mechanism.

We have added this in the Introduction of the revised manuscript.

In addition, more literature covering this area has been cited, as shown in [XX], [XX] and [] in the revised manuscript.

Comment 4. It would be better to mention how many megapixels the camera has.

Response: Thanks for the reviewer’s helpful suggestion. The camera used in this study has 50 megapixels. We have added this in the “Experimental data set” section of the revised manuscript.

Comment 5. The dataset distribution according to maturity is missing from the manuscript. The author must show the dataset distribution across the different classes and the number of images per class in Section 2.1.

Response: Thanks for the reviewer’s helpful comment. A total of 601 images were used, each containing one or more grape bunches at different maturity levels. As a result, 656 ’Kyoho’ grape bunches were included.

Grape bunches were divided into four groups according to the maturity levels based on the color of bunch skin, from maturity level 1 to maturity level 4, with 163, 214, 204 and 75 bunch samples, respectively.

We have added this in Section “2.1. Experimental data set” of the revised manuscript.

Comment 6. A dataset of 646 total images is relatively small; what precautions has the author taken to overcome this issue?

Response. Thank you for your valuable comment. To address the small dataset, we used pre-trained weights (i.e., transfer learning) during the experiments to alleviate the problem of having too few labeled samples. The pre-trained weights of Mask RCNN are described in the experiments in Section 3.1 of the revised manuscript.
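As an illustration of this kind of transfer-learning setup, here is a minimal sketch using torchvision's Mask R-CNN. The authors' actual framework, weight source, and head configuration are not specified in this record; the class count (four maturity levels plus background) is inferred from the dataset description.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

# Start from COCO pre-trained weights rather than training from scratch,
# which mitigates the small size of the labeled grape dataset.
# ("weights='DEFAULT'" requires torchvision >= 0.13.)
num_classes = 5  # 4 maturity levels + background (assumption)
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box-classification head for the new class set.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Replace the mask head likewise; 256 hidden channels is the common default.
in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask, 256, num_classes)
```

Fine-tuning then proceeds as with any detection model, with only the replaced heads starting from random initialization.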

Comment 7. Accuracy graphs are missing for all the models.

Response. Thank you for your valuable comment. The loss and accuracy graphs are shown in Figure 9 (a) to (d). We have added them in the revised manuscript.

Figure 9. Loss and accuracy variation. (a) Loss and accuracy changes of Mask RCNN_ResNet50 and Mask RCNN_ResNet101; (b) Loss and accuracy changes of Mask RCNN_ResNet50+SE and Mask RCNN_ResNet101+SE; (c) Loss and accuracy changes of Mask RCNN_ResNet50+CBAM and Mask RCNN_ResNet101+CBAM; (d) Loss and accuracy changes of Mask RCNN_ResNet50+CA and Mask RCNN_ResNet101+CA.

Comment 8. I don’t see any kind of qualitative or quantitative table comparison between the proposed work and existing work. The author must provide qualitative and quantitative tables.

Response. Thank you for your valuable comment. We have added the qualitative and quantitative comparisons between the proposed work and existing work. Specifically, Figure 10 shows a qualitative comparison of performance between the proposed model (Mask RCNN ResNet101 combined with coordinate attention) and the models constructed with the original Mask RCNN ResNet 50/101, as well as Mask RCNN ResNet 50/101 combined with the Squeeze-and-Excitation and Convolutional Block Attention Module attention mechanisms, respectively.

 

Figure 10. Qualitative comparison of performance between different models established with Mask RCNN ResNet50/101+SE/CBAM/CA, respectively.

Figure 10 shows that when the three different attention mechanisms were introduced, mAP, mAP0.75, and accuracy were all higher than those of the original Mask RCNN-based models. In addition, the models built on the deeper ResNet101 network outperformed the corresponding ResNet50-based models.

Tables 5 and 6 show the quantitative comparison of performance between different models established with Mask RCNN ResNet50/101 combined with the SE, CBAM, and CA attention mechanisms, respectively.

Table 5. Incremental comparison between Mask RCNN ResNet 50 based model and combining of Mask RCNN ResNet 50 with three attention mechanisms based models.

Table 5 shows that when the three attention mechanisms were introduced (ResNet50+SE, ResNet50+CBAM, and ResNet50+CA), mAP was 1.8%, 5.2%, and 6.1% higher than ResNet50, respectively; mAP0.75 was 3.2%, 5.1%, and 8.6% higher; and accuracy was 13%, 6.7%, and 14.7% higher.

Table 6. Incremental comparison between Mask RCNN ResNet 101 based model and combining of Mask RCNN ResNet 101 with three attention mechanisms based models.

Table 6 shows that when the three attention mechanisms were introduced (ResNet101+SE, ResNet101+CBAM, and ResNet101+CA), mAP was 3.6%, 5.1%, and 6.5% higher than ResNet101, respectively; mAP0.75 was 2.1%, 3.0%, and 4.4% higher; and accuracy was 6.7%, 8.3%, and 9.4% higher.

We have added the contents and results of these qualitative and quantitative comparisons in Section 3.3 of the revised manuscript.

Comment 9. The practical usage of the model in the real world is mentioned neither in the introduction nor in the conclusion.

Response. Thank you for your valuable comment. We have added the practical usage of the model in the real world to the Introduction and Conclusion sections. Specifically, in the Introduction we have added: “The model has practical significance for judging the maturity of ‘Kyoho’ grapes in AI-assisted picking. In addition, the model can judge the maturity level of ’Kyoho’ grapes and provide a basis for further evaluating the time required for ripening, avoiding the waste caused by picking too early or too late.” In the Conclusion we have added: “Rapid and accurate segmentation of grape bunches and detection of the maturity level can contribute to the construction of wisdom vineyards, helping artificial intelligence to pick grapes and to evaluate when to pick them. This not only helps to improve the commodity rate of ‘Kyoho’ grapes, but also provides a research basis for robotic picking.”

Comment 10. Future directions are missing from the conclusion.

Response. Thank you for your helpful comment. The proposed model needs to be validated and improved on larger datasets with more complicated backgrounds. In addition, more accurate and faster models need to be studied and developed in future work.

We have added the future directions in the revised manuscript.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

Author has addressed all the comments 

